Google DeepMind is seeking a Staff ML Engineer to work on our foundation model training infrastructure. You will be responsible for designing and implementing the systems that train the next generation of Gemini models.
This role requires deep expertise in large-scale distributed training, training-efficiency optimization, and model architecture innovation. You will work with some of the world's leading AI researchers and have access to cutting-edge compute infrastructure.
You will help push the boundaries of what's possible with AI, working on models that will be used by billions of people worldwide.
Requirements
- 8+ years of ML engineering experience
- Expert-level proficiency in JAX or PyTorch
- Deep experience with distributed training at scale (1000+ GPUs)
- Strong understanding of transformer architectures
- Track record of shipping ML systems at scale
- PhD in ML/CS preferred
Required Skills
Python, JAX, Distributed Training, Transformers, CUDA
About Google DeepMind
Building AI systems that can solve complex problems and advance scientific discovery.