Head of AI Infrastructure Engineering

Consultant: Yien Quek
Job Reference No.:
Registration No.: R1109830
License No.: 16S8060
Function: Technology Leadership
Industry: IT & Telco

We are seeking a senior infrastructure leader to direct the deployment, performance, and reliability of large-scale bare-metal GPU clusters at the core of next-generation AI factory environments. This individual will play a pivotal role in bringing new compute capacity into production, ensuring infrastructure is commissioned effectively, operated reliably, and continuously optimised to support mission-critical AI workloads.

This is a high-impact leadership role at the intersection of infrastructure engineering, systems performance, production reliability, and operational scale-up.

Role
This role is responsible for leading the end-to-end deployment and operational management of bare-metal GPU clusters, from initial setup through to ongoing optimisation. You will drive infrastructure performance across compute, networking, and storage layers while establishing operational frameworks for monitoring, incident response, and system reliability. The position also involves building and guiding engineering teams, developing scalable deployment processes, and collaborating with cross-functional stakeholders including data centre, platform, and hardware partners. In addition, you will contribute to infrastructure planning, vendor evaluation, and capacity scaling initiatives to support continued platform growth.

Requirements
The ideal candidate brings a strong track record in managing large-scale compute environments, such as GPU clusters, HPC systems, or distributed infrastructure. You should have deep expertise in infrastructure engineering, performance optimisation, and production reliability, along with experience in Linux-based systems and cluster orchestration tools. Familiarity with infrastructure automation, observability practices, and incident management is essential. Prior experience in leading technical teams, supporting hardware selection, and working within high-performance or data centre environments will be advantageous, along with exposure to GPU ecosystems and high-speed networking technologies.

To Apply
To apply, please submit your resume to Yien Quek at yq@kerryconsulting.com. We regret that only shortlisted candidates will be notified. Licence No: 16S8060 | Registration No: R1109830
