Nvidia
Senior Production Engineer - DGX Cloud
Found: Today
This role is remote, with multiple locations available, including CA, NC, TX, CO, and WA.
Compensation:
$168,000 - $333,500 per year, based on experience and level.
Responsibilities:
- Work on production systems for scalable GPU clusters that run AI workloads.
- Implement monitoring and health management for GPU assets.
- Collaborate with teams to ensure reliable AI cluster performance.
Requirements:
- 8+ years in Production Engineering/DevOps/SRE roles.
- Experience with large-scale production systems.
- BS in Computer Science, Engineering, or related field.
- Proficiency in systems programming languages (Go, Python).
Tech stack:
GPU, Kubernetes, Slurm, Bright Cluster Manager.