Head of AI Infrastructure Engineering Jobs in Singapore

    Head of AI Infrastructure Engineering

      Registration No.: R1109830
      Licence No.: 16S8060
      Function: Technology Leadership
      Industry: IT & Telco

      We are seeking a senior infrastructure leader to drive the deployment, performance, and reliability of large-scale bare-metal GPU clusters at the core of next-generation AI factory environments. This individual will play a pivotal role in bringing new compute capacity into production, ensuring that infrastructure is commissioned effectively, operated reliably, and continuously optimised to support mission-critical AI workloads.

      This is a high-impact leadership role at the intersection of infrastructure engineering, systems performance, production reliability, and operational scale-up.

      Role
      This role is responsible for leading the end-to-end deployment and operational management of bare-metal GPU clusters, from initial setup through to ongoing optimisation. You will drive infrastructure performance across compute, networking, and storage layers while establishing operational frameworks for monitoring, incident response, and system reliability. The position also involves building and guiding engineering teams, developing scalable deployment processes, and collaborating with cross-functional stakeholders including data centre, platform, and hardware partners. In addition, you will contribute to infrastructure planning, vendor evaluation, and capacity scaling initiatives to support continued platform growth.

      Requirements
      The ideal candidate brings a strong track record in managing large-scale compute environments, such as GPU clusters, HPC systems, or distributed infrastructure. You should have deep expertise in infrastructure engineering, performance optimisation, and production reliability, along with experience in Linux-based systems and cluster orchestration tools. Familiarity with infrastructure automation, observability practices, and incident management is essential. Prior experience in leading technical teams, supporting hardware selection, and working within high-performance or data centre environments will be advantageous, along with exposure to GPU ecosystems and high-speed networking technologies.

      To Apply
      To apply, please submit your resume to Yien Quek at yq@kerryconsulting.com. We regret that only shortlisted candidates will be notified. Licence No.: 16S8060 | Registration No.: R1109830
