About the Role

<div class="content-intro"><h3><strong><span style="font-family: arial, helvetica, sans-serif;">About xAI</span></strong></h3> <p><span style="font-family: arial, helvetica, sans-serif;">xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. </span><span style="font-family: arial, helvetica, sans-serif;">Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. </span><span style="font-family: arial, helvetica, sans-serif;">We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. </span><span style="font-family: arial, helvetica, sans-serif;">All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.</span></p></div><h3><strong>ABOUT THE ROLE:</strong></h3> <p>You will join the multimodal team to push toward superhuman multimodal intelligence. Advance understanding and generation across modalities—image, video, audio, and text—spanning the full stack: data curation/acquisition, tokenizer training, large-scale pre-training, post-training/alignment, infrastructure/scaling, evaluation, tooling/demos, and end-to-end product experiences.</p> <p>Collaborate cross-functionally with pre-training, post-training, reasoning, data, applied, and product teams to deliver frontier capabilities in multimodal reasoning, world modeling, tool use, agentic behaviors, and interactive human-AI collaboration. 
Contribute to building models that can see, hear, reason about, and interact with the world in real time at unprecedented levels.</p> <h3><strong>RESPONSIBILITIES:</strong></h3> <ul> <li>Design, build, and optimize large-scale distributed systems for multimodal pre-training, post-training, inference, data processing, and tokenization at web/petabyte scale.</li> <li>Develop high-throughput pipelines for data acquisition, preprocessing, filtering, generation, decoding, loading, crawling, visualization, and management (images, videos, audio + text).</li> <li>Advance multimodal capabilities including spatial-temporal compression, cross-modal alignment, world modeling, reasoning, emergent abilities, audio/image/video understanding & generation, real-time video processing, and noisy data handling.</li> <li>Drive data quality and studies: curation (human/synthetic), filtering techniques, analysis, and scalable pipelines to support trillion-parameter models.</li> <li>Create evaluation frameworks, internal benchmarks, reward models, and metrics that capture real-world usage, failure modes, interactive dynamics, and human-AI synergy.</li> <li>Innovate on algorithms, modeling approaches, hardware/software/algorithm co-design, and scaling paradigms for state-of-the-art performance.</li> <li>Build research tooling, user-friendly interfaces, prototypes/demos, full-stack applications, and enable rapid iteration based on feedback.</li> <li>Work across the stack (pre-training → SFT/RL/post-training) to enable reasoning, tool calling, agentic behaviors, orchestration, and seamless real-time interactions.</li> </ul> <h3><strong>BASIC QUALIFICATIONS:</strong></h3> <ul> <li>Hands-on experience with multimodal pre-training, post-training, or fine-tuning (vision, audio, video, or cross-modal).</li> <li>Expert-level proficiency in Python (core language), with strong experience in at least one of: JAX / PyTorch / XLA.</li> <li>Proven track record building or optimizing large-scale 
distributed ML systems (training/inference optimization, GPU utilization, multi-GPU/TPU setups, hardware co-design).</li> <li>Deep experience designing and running data pipelines at scale: curation, filtering, generation, quality studies, especially for noisy/real-world multimodal data.</li> <li>Strong fundamentals in evaluation design, benchmarks, reward modeling, or RL techniques (particularly for interactive/agentic behaviors).</li> <li>Proactive self-starter who thrives in high-intensity environments and is passionate about pushing multimodal AI frontiers.</li> <li>Willingness to own end-to-end initiatives and do whatever it takes to deliver breakthrough user experiences.</li> </ul> <h3>PREFERRED SKILLS AND EXPERIENCE:</h3> <ul> <li>Experience leading major improvements in model capabilities through better data, modeling, algorithms, or scaling.</li> <li>Familiarity with state-of-the-art in multimodal LLMs, scaling laws, tokenizers, compression techniques, reasoning, or agentic systems.</li> <li>Proficiency in Rust and/or C++ for performance-critical components.</li> <li>Hands-on work with large-scale orchestration tools such as Spark, Ray, or Kubernetes.</li> <li>Background building full-stack tooling: performant interfaces, real-time research demos/apps, or end-to-end product ownership.</li> <li>Passion for end-to-end user experience in interactive, real-time multimodal AI systems.</li> </ul> <h3><strong>COMPENSATION AND BENEFITS:</strong></h3> <p>$180,000 - $440,000 USD</p> <p class="p1">Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.</p><div class="content-conclusion"><p><em>xAI is an equal opportunity employer. 
For details on data processing, view our </em><em><a href="https://x.ai/legal/recruitment-privacy-notice" target="_blank">Recruitment Privacy Notice</a>.</em></p></div>

Member of Technical Staff - Multimodal Understanding
