RDMA Engineer - Supercomputing job opportunity at xAI.



xAI RDMA Engineer - Supercomputing
Experience: General
Pattern: full-time

Infrastructure

Degree: General
Location: Palo Alto, CA; San Francisco, CA, United States of America

About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role
RDMA Engineers on xAI’s Supercomputing team design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters drive AI training and inference workloads, demanding cutting-edge performance and scalability.

Focus
Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Ideal Experience
Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).

Tech Stack
NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
RDMA protocols (e.g., GPUDirect RDMA, RoCEv2)
Kubernetes
Rust and C/C++
MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)

Annual Salary Range
$180,000 - $440,000 USD

Benefits
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Other AI Matches

Mechanical Engineer (HVAC / Chilled Water): Applicants are expected to have solid experience in handling Data Center Operations related tasks
Chemistry Tutor (Remote): Applicants are expected to have solid experience in handling Human Data related tasks
Global Supply Manager - SaaS: Applicants are expected to have solid experience in handling Finance related tasks
Manager, Law Enforcement Response Team: Applicants are expected to have solid experience in handling Legal related tasks
Senior Sourcing Specialist - Indirect: Applicants are expected to have solid experience in handling Finance related tasks
Member of Technical Staff - Multimodal Interactions Post-training: Applicants are expected to have solid experience in handling Foundation Model related tasks
Member of Technical Staff, Pre-training Data Infrastructure: Applicants are expected to have solid experience in handling Foundation Model related tasks
Mission Manager - International Government: Applicants are expected to have solid experience in handling Engineering related tasks
Client Partner: Applicants are expected to have solid experience in handling Sales related tasks
Member of Technical Staff, Ads Product: Applicants are expected to have solid experience in handling Product related tasks
Legal Director, X Payments: Applicants are expected to have solid experience in handling Legal related tasks
Network Engineer - Backbone: Applicants are expected to have solid experience in handling Engineering related tasks
Facilities Maintenance Technician: Applicants are expected to have solid experience in handling Data Center Operations related tasks
Member of Technical Staff, Image Generation - Agent, RL: Applicants are expected to have solid experience in handling Foundation Model related tasks
System Design Specialist (Remote): Applicants are expected to have solid experience in handling Human Data related tasks
Member of Technical Staff - Reasoning Post-training: Applicants are expected to have solid experience in handling Foundation Model related tasks
Site Ops Lead: Applicants are expected to have solid experience in handling Data Center Operations related tasks
Software Engineer - Reliability: Applicants are expected to have solid experience in handling Infrastructure related tasks
Medicine Tutor (Remote): Applicants are expected to have solid experience in handling Human Data related tasks
Member of Technical Staff - Search Post Training: Applicants are expected to have solid experience in handling Foundation Model related tasks
Member of Technical Staff - Government - Cleared: Applicants are expected to have solid experience in handling Engineering related tasks
Materials Science Tutor (Remote): Applicants are expected to have solid experience in handling Human Data related tasks
Data Science Tutor (Remote): Applicants are expected to have solid experience in handling Human Data related tasks