San Francisco, CA · Hybrid · $250,000 - $450,000 · Posted 2 weeks ago
Full-time · Senior · Claude · Custom
About the Role
Anthropic is hiring an AI Safety Researcher to work on our core alignment research agenda. You will develop new techniques for making AI systems more reliable, interpretable, and aligned with human values.
This role involves both theoretical and empirical research. You will design experiments, analyze model behavior, and develop new training techniques that improve the safety properties of our models.
We are looking for someone who combines strong ML engineering skills with careful thinking about AI safety challenges. You will help shape the direction of safety research at one of the world's leading AI labs.
Requirements
- 5+ years of ML/AI research experience
- Deep understanding of alignment techniques (RLHF, Constitutional AI, debate)
- Strong publication record in ML safety or related fields
- Proficiency in Python and PyTorch
- Experience analyzing and interpreting model behavior
- PhD in ML, CS, or related field strongly preferred
Required Skills
Python · PyTorch · RLHF · Transformers
About Anthropic
Anthropic is an AI safety company building reliable, interpretable, and steerable AI systems, and the maker of Claude.