Canva
Engineering Manager (Infra) - AI Reliability (ANZ Remote)
This role is based in Melbourne, VIC, Australia, with remote work options available.
Responsibilities:
- Building world-class AI infrastructure to support a 100+ person research team.
- Designing and scaling multi-cloud systems for high-performance model training and inference.
- Partnering across AWS, GCP, Cloudflare, and GCore to optimize GPU compute environments.
- Enhancing CI/CD pipelines and developer velocity.
- Improving monitoring, alerting, and system observability for AI workloads.
- Driving alignment on DevOps best practices across teams.
- Leading a high-impact engineering team in a fast-paced environment.
Requirements:
- Experience leading DevOps or infrastructure teams, ideally in AI or high-performance computing.
- Proficiency with AWS and multi-cloud environments.
- Experience with Kubernetes, SLURM, or similar distributed training infrastructure.
- Fluency in infrastructure-as-code (IaC) tools such as Terraform.
- Strong grasp of containerization, Linux fundamentals, and cloud networking.
- Collaborative and passionate about enabling others.
About the team:
Join CORE (Canva Original Research & Exploration) to build world-class AI models that unlock creativity at scale.