Nvidia
Engineering Manager, DGX Cloud Production Engineering
This role is remote within the US, with eligible locations including California, Texas, and Washington.
Compensation:
$224,000 - $356,500/year
Responsibilities:
- Lead a team of software and production engineers focused on Kubernetes-based operations and automation.
- Drive execution across cluster operations, GitOps, observability, and incident response.
- Define team priorities, roadmap, and operational ownership.
- Collaborate across teams to improve production readiness.
- Foster an on-call and incident-review culture focused on learning and ownership.
- Coach engineers and create clear ownership in complex problem spaces.
Requirements:
- 8+ years of industry experience, including 2+ years in a leadership role.
- Experience with production infrastructure, cloud platforms, and Kubernetes environments.
- Strong understanding of reliability engineering and operational excellence.
- Ability to influence teams without direct authority.
- Excellent communication and prioritization skills.
- BS/MS in Computer Science or equivalent experience.
Preferred Qualifications:
- Experience leading SRE or production engineering teams.
- Familiarity with GPU infrastructure and multi-cloud environments.
- Proven track record of improving operational efficiency.