Overview

Senior DevOps Engineer, GPUaaS

Date: 23 Dec 2025

Location:

Singapore, Singapore

Company:
Singtel Group

About Singtel Digital InfraCo – RE:AI

 

Singtel Digital InfraCo’s RE:AI division is building Asia’s most advanced and sustainable AI infrastructure ecosystem. RE:AI enables enterprises, research institutions, and digital-native businesses to accelerate innovation through responsible, high-performance AI compute and connectivity solutions.

 

Be a Part of Something BIG!

 

As a DevOps Engineer for Singtel’s GPU-as-a-Service (GPUaaS), you will help implement processes and integrate operations to advance customers’ AI and HPC capabilities. You will be exposed to both physical data center implementation and software solutions in Singtel’s GPUaaS environment. This position requires a forward-thinking individual who thrives in dynamic environments and is committed to driving continuous improvement in GPU infrastructure for AI and HPC. This is an excellent opportunity for someone eager to start their career in DevOps and grow their expertise in AI and HPC cloud platforms.

 

Responsibilities

  • Design, deploy and support large-scale, distributed GPU clusters for AI and ML workloads.
  • Manage and automate provisioning of GPU resources in both on-prem and cloud platforms.
  • Design, implement and manage CI/CD pipelines for AI models and GPU-accelerated applications.
  • Monitor cluster usage, health, performance and availability.
  • Improve infrastructure provisioning, management, and monitoring through automation.
  • Troubleshoot system-level issues in compute resources, including Slurm, Kubernetes, GPU drivers, CUDA and InfiniBand (IB) networking.
  • Optimize system parameters (e.g., OS, drivers, networking, libraries) for AI workload performance.
  • Conduct GPU cluster benchmarks and keep up with the latest advancements in GPU technology.
  • Set up monitoring and logging for GPU resources using Zabbix, Prometheus, NVIDIA DCGM and other tools.
  • Implement security best practices for a multi-tenant GPU-as-a-Service (GPUaaS) environment.
  • Collaborate with software engineers and system administrators to streamline workflows and improve collaboration.
  • Provide technical support and guidance to users of GPU-accelerated systems.
  • Work with senior DevOps engineers to identify bottlenecks and improve development and operational processes for the AI and HPC GPU cloud.
  • Learn to solve problems in high-performance distributed computation for AI and HPC GPU cloud computing.
  • This role may require availability outside standard work hours, including nights, weekends and public holidays.

 

Requirements

  • Bachelor’s degree in Computer Science/Engineering, Information Technology, Systems Engineering, or a related field.
  • Strong Linux system administration skills on distributions such as Ubuntu, CentOS and Rocky Linux.
  • Experience with DevOps tools such as Jenkins, Kubernetes, Ansible and Terraform.
  • Solid understanding of DevOps practices, including CI/CD, automation, and monitoring.
  • Proficiency in scripting languages (e.g., Python, Bash).
  • Experience in implementing monitoring solutions such as Zabbix and Prometheus.
  • Familiarity with AI frameworks such as TensorFlow and PyTorch.
  • Understanding of cloud architectures (IaaS, PaaS), GPU architecture and NVIDIA GPUs.
  • Strong verbal, written, and presentation skills in English.
  • Team player with experience in cross-functional coordination.
  • Strong technical problem-solving and analytical skills for system optimization.

 

Desirable qualifications

  • Understanding of how collective communications (MPI, RDMA and NCCL) work, as well as how GPU-specific acceleration works on a GPU cluster.
  • Knowledge of DevOps/MLOps technologies in GPU clusters, such as Docker/containers, Kubernetes and data center deployments.
  • Familiarity with Slurm or other HPC workload managers to manage GPU clusters.
  • Understanding of AI & HPC networking technologies such as InfiniBand, RoCE, DPUs.
  • System-level experience, specifically with GPU-based systems (NVIDIA GPUs and SDKs).
  • Understanding of how AI and HPC workloads interact with both GPU hardware and software infrastructure.

 

Rewards that Go Beyond 

  • Flexible work arrangements
  • Full suite of health and wellness benefits 
  • Ongoing training and development programs 
  • Internal mobility opportunities

 

Your Career Growth Starts Here. Apply Now!


About Singtel

Headquartered in Singapore, Singtel has 140 years of operating experience and played a pivotal role in the country’s development as a major communications hub. Optus, our subsidiary in Australia, is a leader in integrated telecommunications, constantly raising the bar in innovative products and services.

We are also strategically invested in leading companies in Asia and Africa, including Bharti Airtel (India, South Asia and Africa), Telkomsel (Indonesia), Globe Telecom (the Philippines) and Advanced Info Service (Thailand). We work closely with our associates, leveraging our scale in networks, customer reach and extensive operational experience to lead and shape the communications industry.

Together, the Group serves over 700 million mobile customers around the world. Singtel is one of the largest listed Singapore companies on the Singapore Exchange by market capitalisation.

The Group has a vast network of offices throughout Asia Pacific, Europe and the USA, and employs more than 23,000 staff worldwide.