Microsoft
Member of Technical Staff, High Performance Computing Engineer - MAI SuperIntelligence Team
This role is based in London, United Kingdom, with a focus on building and scaling infrastructure for AI models.
Responsibilities:
- Design, operate, and maintain large-scale HPC environments.
- Own the deployment and operation of HPC schedulers and orchestrators (e.g., Slurm, Kubernetes).
- Serve as a technical owner for core HPC domains, including maintenance and performance tuning.
- Develop automation and tooling using Bash and/or Python.
- Collaborate with researchers to troubleshoot and optimize workloads.
Qualifications:
- Bachelor’s degree in computer science or related field with 4+ years of relevant experience.
- Experience with high-scale training clusters and public cloud infrastructure (e.g., Azure, AWS, GCP).
- Preferred: Master’s degree and 6+ years of experience.
- Experience with LLM training clusters and AI platforms.