ElevenLabs is looking for a Senior Speech ML Engineer to work on our text-to-speech and voice cloning models. You will push the boundaries of speech synthesis quality, developing models that produce natural, expressive, and emotionally nuanced speech.
In this role you will develop novel architectures for speech generation, prosody modeling, and multi-speaker adaptation. You will work with large-scale speech data and train models that serve millions of API requests daily.
The ideal candidate has deep expertise in speech synthesis, audio signal processing, and generative models.
Requirements
- 5+ years of experience in speech/audio ML
- Deep knowledge of TTS architectures (e.g., Tacotron, VITS)
- Strong PyTorch and CUDA optimization skills
- Experience with voice cloning and speaker adaptation
- Understanding of audio signal processing
- PhD in Speech Processing or related field preferred
Required Skills
Python, PyTorch, CUDA, Transformers
About ElevenLabs
The most realistic AI voice platform. Text-to-speech, voice cloning, and audio AI.