Microsoft
Member of Technical Staff, HPC Operations Engineering Manager - MAI SuperIntelligence Team
Found: February 23, 2026
This role is based in Mountain View, California, requiring in-office presence 4 days a week.
Compensation:
USD $139,900 - $274,800 per year
Responsibilities:
- Lead a team of SREs to ensure uptime and resiliency of AI systems.
- Design and maintain monitoring and alerting systems for model serving.
- Build automation for deployments and incident response.
- Manage on-call rotations and troubleshoot production issues.
- Ensure data privacy and compliance across environments.
- Collaborate with ML engineers to enhance developer experience.
Qualifications:
- Bachelor's Degree in Computer Science or related field with 8+ years of experience in SRE, DevOps, or Infrastructure Engineering.
- Experience with Kubernetes, Docker, and public cloud platforms.
- Programming skills in Python, Go, or Bash.
- 6+ years of people management experience preferred.