San Francisco, CA · Hybrid · $250,000 - $450,000 · Posted 2 weeks ago
Full-time · Senior · Claude · Custom
About the Role
Anthropic is hiring an AI Safety Researcher to work on our core alignment research agenda. You will develop new techniques for making AI systems more reliable, interpretable, and aligned with human values.
This role involves both theoretical and empirical research. You will design experiments, analyze model behavior, and develop new training techniques that improve the safety properties of our models.
We are looking for someone who combines strong ML engineering skills with careful thinking about AI safety challenges. You will help shape the direction of safety research at one of the world's leading AI labs.
Requirements
- 5+ years of ML/AI research experience
- Deep understanding of alignment techniques (RLHF, Constitutional AI, debate)
- Strong publication record in ML safety or related fields
- Proficiency in Python and PyTorch
- Experience analyzing and interpreting model behavior
- PhD in ML, CS, or related field strongly preferred
Required Skills
Python · PyTorch · RLHF · Transformers
About Anthropic
Anthropic is an AI safety company building reliable, interpretable, and steerable AI systems, and the maker of Claude.