Engineering Manager, Ads ML Efficiency

Remote - United States

About the Role

Reddit is building a dedicated Ads ML Efficiency function to make model training and inference materially faster, cheaper, safer, and more scalable. As the Engineering Manager for this team, you will lead a group focused on model optimization, training efficiency, GPU enablement, load testing, model performance tooling, and efficiency guardrails across Ads ML.

What you’ll do:

Lead & Grow: Hire, mentor, and retain a high-performing team of ML and systems-oriented engineers working on model optimization and ML efficiency.
Set Technical Direction: Define the roadmap for training optimization, inference optimization, launch-readiness tooling, and reusable efficiency primitives across Ads ML.
Deliver Measurable Wins: Drive reductions in model training time, online latency, serving cost, and infra-driven launch risk.
Build Systems and Tooling: Guide the development of profiling, benchmarking, load testing, observability, cost analysis, debugging, and efficiency certification systems.
Operate in the Critical Path: Partner with model owners and platform teams to accelerate high-priority launches and remove bottlenecks from the path to production.
Shape the Team’s Evolution: Balance near-term white-glove optimization work with medium-term platformization and automation.
Build Cross-Functional Alignment: Work closely with MLP, AMP, Ranking, and serving teams to clarify boundaries, upstream generic wins, and keep Ads needs on track.
Raise the Bar: Establish engineering rigor around measurement, performance debugging, launch safety, and technical decision-making for efficiency work.

What we’re looking for:

Deep ML Engineering Experience: You have worked close to the models themselves and understand training, serving, debugging, and optimization in depth.
Hands-on Optimization Background: Direct experience improving training loops, serving systems, profiling workflows, model/inference efficiency, or GPU utilization.
Strong Managerial Ability: Experience building and leading teams, coaching engineers, managing delivery, and making prioritization tradeoffs under ambiguity.
Distributed Systems Fluency: Proven ability to reason about production-scale ML systems and the tradeoffs that govern reliability, speed, cost, and scale.
Customer and Platform Instincts: Able to work as a service provider to modeling teams while still building reusable systems rather than only heroic one-offs.
Strong Communication: Can explain technical tradeoffs clearly to engineers, PMs, and senior stakeholders.
Ads Experience: Experience in ads ranking, recommender systems, marketplace ML, or adjacent production ML domains is strongly preferred.

Nice-to-have:

Experience with GPU training and serving migrations.
Experience with PyTorch, distributed training frameworks, or kernel/performance optimization.
Experience building efficiency benchmarking or launch certification frameworks.
Experience working in organizations where ML platform and applied modeling responsibilities are split across multiple teams.
