Nvidia

Senior Platform and EngOps Engineer - Cluster Operations

Bengaluru, India

Found: December 10, 2025

This role is based in Bengaluru, India.

What you'll do:

Develop automated tools for deploying and maintaining GPU clusters interconnected via NVLink and InfiniBand.
Implement DevOps tools for software updates, maintenance tasks, and cluster availability monitoring.
Troubleshoot cluster failures day to day to maintain optimal performance.
Manage software and firmware updates for clusters.
Collaborate with engineering and product teams across time zones.

What we need to see:

BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
5+ years of experience in deploying and managing clusters and infrastructure.
Expertise in Ansible, Python, and shell scripting.
Deep understanding of operating systems and high-performance applications.
Proficiency with Linux fundamentals.

Ways to stand out: