Senior DevOps Service Reliability Operations Engineer - DGX Cloud

Nvidia

📍 2 Locations

Location:

US, CA, Santa Clara or Remote

Compensation:

$144,000 - $270,250/year

Responsibilities:

  • Design, develop, and implement a Service Reliability Operations Center.
  • Provide 24/7 support, working with global teams.
  • Develop monitors, alarms, and alerts to enhance service reliability.
  • Perform systems and network administration tasks.
  • Collaborate with developers to create and update runbooks.
  • Manage incidents and improve service quality.

Requirements:

  • 5+ years of experience with large-scale production systems.
  • Expertise in Linux, Ansible, Python, and networking.
  • BS in Computer Science or equivalent experience.
  • Experience with Kubernetes and cloud environments is a plus.
