Microsoft
Research Intern - Reliability of Cloud and AI Systems
Found: February 28, 2026
This position is based in Redmond, Washington.
Compensation:
USD $6,710 - $13,270 per month
Overview:
Join the Systems Reliability Group at Microsoft Research to work on innovative reliability mechanisms and scalable debugging tools for cloud and AI systems.
Responsibilities:
- Work with large-scale codebases and configurations powering Microsoft Azure and Office 365.
- Analyze production data to discover failure patterns and design prevention strategies.
- Develop tools for monitoring, logging, and troubleshooting at scale.
- Integrate and evaluate solutions on real Microsoft services.
Qualifications:
- Currently enrolled in a PhD program in Computer Science or a related STEM field.
- Experience building scalable and reliable systems is preferred.