OpenAI
Software Engineer, Hardware Health
Found: Today
This role is based in San Francisco.
Compensation:
$250K – $445K + Offers Equity
Responsibilities:
- Define and maintain health signals across GPUs, CPUs, networking, and platform infrastructure.
- Build and evolve health checks that detect, remediate, and verify failures at scale.
- Investigate hardware failures and system-level issues across large-scale compute environments.
- Build automation and tooling for global cluster management.
Requirements:
- 7+ years of industry experience in software or infrastructure engineering.
- Strong proficiency with Python and shell scripting.
- Experience building large-scale distributed systems.