<div class="content-intro"><h3><strong><span style="font-family: arial, helvetica, sans-serif;">ABOUT xAI</span></strong></h3> <p><span style="font-family: arial, helvetica, sans-serif;">xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. </span><span style="font-family: arial, helvetica, sans-serif;">Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. </span><span style="font-family: arial, helvetica, sans-serif;">We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. </span><span style="font-family: arial, helvetica, sans-serif;">All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.</span></p></div><h3>About the Role:</h3> <ul> <li>We are building the high-performance inference platform that serves Grok to millions of users every day with lightning speed and perfect reliability.</li> <li>As a Member of Technical Staff - Inference, you will design and optimize large-scale model serving systems end-to-end. You will own everything from distributed infrastructure (global KV cache, continuous batching, load balancing, auto-scaling) to deep low-level optimizations (GPU kernels, quantization, speculative decoding, tail latency).</li> <li>This is a high-impact role where your work directly determines how fast and reliably users interact with Grok at massive scale</li> </ul> <p><span style="font-size: 14pt;"><strong>Responsibilities: </strong></span></p> <ul> <li>Architect and implement scalable distributed infrastructure for model serving (load balancing, auto-scaling, batch scheduling, global KV cache).</li> <li>Optimize latency and throughput of model inference under real production workloads.</li> <li>Build reliable, high-concurrency serving systems that serve billions of users with 100% uptime, 0% error rate, and excellent tail latency.</li> <li>Benchmark, fine-tune, and accelerate inference engines (including low-level GPU kernel work and code generation).</li> <li>Develop custom tools to trace, replay, and fix issues across the full stack — from orchestration down to GPU kernels.</li> <li>Create robust CI/CD infrastructure for seamless endpoint deployment, image publishing, and inference engine updates.</li> <li>Accelerate research on scaling test-time compute, RL rollout, and model-hardware co-design for next-generation systems.</li> </ul> <h3>BASIC QUALIFICATIONS:</h3> <ul> <li>Deep low-level systems programming (C/C++ or Rust)</li> <li>Experience with large-scale, high-concurrent production serving.</li> <li>Experience with GPU inference engines (vLLM, SGLang, Triton, TensorRT-LLM, etc.).</li> <li>Strong background in system optimizations: batching, caching, load balancing, parallelism.</li> <li>Low-level inference optimizations: GPU kernels, code generation.</li> <li>Algorithmic inference optimizations: quantization, speculative decoding, distillation, low-precision numerics.</li> <li>Experience with testing, benchmarking, and reliability of inference services.</li> <li>Experience designing and implementing CI/CD infrastructure for inference.</li> </ul> 
<h3>Compensation and Benefits:</h3> <p>$180,000 - $440,000 USD</p> <p>Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.</p><div class="content-conclusion"><p><em>xAI is an equal opportunity employer. For details on data processing, view our <a href="https://x.ai/legal/recruitment-privacy-notice" target="_blank">Recruitment Privacy Notice</a>.</em></p></div>