Reddit
Engineering Manager, Ads ML Efficiency
About the Role
Reddit is building a dedicated Ads ML Efficiency function to make model training and inference materially faster, cheaper, safer, and more scalable. As the Engineering Manager for this team, you will lead a group focused on model optimization, training efficiency, GPU enablement, load testing, model performance tooling, and efficiency guardrails across Ads ML.
What you’ll do:
- Lead & Grow: Hire, mentor, and retain a high-performing team of ML engineers and systems-oriented engineers working on model optimization and ML efficiency.
- Set Technical Direction: Define the roadmap for training optimization, inference optimization, launch-readiness tooling, and reusable efficiency primitives across Ads ML.
- Deliver Measurable Wins: Drive reductions in model training time, online latency, serving cost, and infra-driven launch risk.
- Build Systems and Tooling: Guide the development of profiling, benchmarking, load testing, observability, cost analysis, debugging, and efficiency certification systems.
- Operate in the Critical Path: Partner with model owners and platform teams to accelerate high-priority launches and remove bottlenecks from the path to production.
- Shape the Team’s Evolution: Balance near-term white-glove optimization work with medium-term platformization and automation.
- Build Cross-Functional Alignment: Work closely with MLP, AMP, Ranking, and serving teams to clarify boundaries, upstream generic wins, and keep Ads needs on track.
- Raise the Bar: Establish engineering rigor around measurement, performance debugging, launch safety, and technical decision-making for efficiency work.
What we’re looking for:
- Deep ML Engineering Experience: You have worked close to the models themselves and understand training, serving, debugging, and optimization in depth.
- Hands-on Optimization Background: Direct experience improving training loops, serving systems, profiling workflows, model/inference efficiency, or GPU utilization.
- Strong Managerial Ability: Experience building and leading teams, coaching engineers, managing delivery, and making prioritization tradeoffs under ambiguity.
- Distributed Systems Fluency: Proven ability to reason about production-scale ML systems and the tradeoffs that govern reliability, speed, cost, and scale.
- Customer and Platform Instincts: Able to work as a service provider to modeling teams while still building reusable systems rather than only heroic one-offs.
- Strong Communication: Can explain technical tradeoffs clearly to engineers, PMs, and senior stakeholders.
- Ads Experience: Experience in ads ranking, recommender systems, marketplace ML, or adjacent production ML domains is strongly preferred.
Nice-to-have:
- Experience with GPU training and serving migrations.
- Experience with PyTorch, distributed training frameworks, or kernel/performance optimization.
- Experience building efficiency benchmarking or launch certification frameworks.
- Experience working in organizations where ML platform and applied modeling responsibilities are split across multiple teams.