San Francisco, CA · Hybrid · $190,000 - $300,000 · 1 month ago
full-time · senior · llama · mistral · stable-diffusion
About the Role
Help build the fastest AI inference platform. You will optimize the serving of LLMs and diffusion models at scale.
Responsibilities:
- Optimize model serving latency and throughput
- Implement custom CUDA kernels
- Build batching and scheduling systems
- Support multiple model architectures
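To give a flavor of the batching and scheduling work above, here is a minimal sketch in Python. The names (`Request`, `form_batches`) and the size/token-budget heuristic are illustrative assumptions, not Fireworks' actual scheduler:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request shape: prompt text plus a token budget.
    prompt: str
    max_tokens: int


def form_batches(queue, max_batch_size=8, max_batch_tokens=2048):
    """Greedily group queued requests into batches, capping both
    the batch size and the total token budget -- a common serving
    heuristic to keep GPU memory and latency bounded."""
    batches = []
    current, budget = [], 0
    while queue:
        req = queue[0]
        # Flush the current batch if adding this request would
        # exceed either cap.
        if current and (len(current) >= max_batch_size
                        or budget + req.max_tokens > max_batch_tokens):
            batches.append(current)
            current, budget = [], 0
        current.append(queue.popleft())
        budget += req.max_tokens
    if current:
        batches.append(current)
    return batches
```

Production systems (e.g. continuous batching in vLLM-style servers) re-form batches every decoding step rather than once per queue drain; this sketch only shows the static grouping idea.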
Requirements
- 3+ years ML systems experience
- Strong C++/CUDA programming
- Experience with model quantization
- Knowledge of transformer architectures
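As a flavor of the quantization requirement, the sketch below shows symmetric int8 weight quantization in plain Python. It is illustrative only (function names are assumptions); real inference stacks typically use per-channel scales and calibrated activation ranges:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127]
    using a single scale derived from the max absolute value."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]
```

The round-trip error per weight is bounded by half a quantization step (`scale / 2`), which is why outlier weights, by inflating the scale, degrade accuracy and motivate per-channel or group-wise schemes.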
Required Skills
Python, C++, CUDA, PyTorch
About Fireworks AI
Fast and affordable AI inference platform for production workloads.