Bachelor’s degree in computer science or a related STEM field
Experience programming AI accelerators (e.g., GPUs, custom silicon) using AI frameworks such as PyTorch or similar
Experience developing custom kernels and compiler infrastructure to improve performance, using low-level programming models such as CUDA, OpenCL, or similar
Minimum of 5 years of experience developing and optimizing performance-critical code in modern C/C++
Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
Preferred Qualifications
Master’s degree or PhD in computer science or a related STEM field
A proven track record of impactful contributions to pre-training of AI models at scale using GPUs, custom ASICs, or similar hardware (publications, relevant work experience, shipped products, patents, etc.)
Experience with neural network training using ML frameworks such as PyTorch
Experience with distributed AI systems, including communication protocols such as MPI or collective communication libraries such as NCCL
Experience with or knowledge of LLMs, recommender systems, or both
What You'll Be Doing
Applying state-of-the-art optimization techniques to our latest large-scale AI workloads running on Meta’s fleet of accelerators
Profiling, analyzing, debugging, and optimizing large-scale workloads on our next-generation training superclusters
Working closely with our customers to co-design models that maximize pre-training and inference efficiency
Setting direction and goals for the team related to project impact, capacity, and developer efficiency
Collaborating cross-functionally with the compiler, framework, communication, and firmware teams to identify and resolve performance bottlenecks
Implementing custom kernels to maximize model performance
Leading large and complex technical efforts spanning many engineers and teams
Perks and Benefits
Meta builds technologies that help people connect, find communities, and grow businesses
Opportunity to work on some of the most crucial and exciting problems in the field
Be part of a team dedicated to maximizing the training and inference performance of generative AI and recommendation models