What we need to see:
MSc or PhD in Computer Science, Computer Engineering, or a related field with 8+ years of experience in machine learning or computer vision, or equivalent experience
A passion for optimizing AI models and understanding low-level GPU processes
Strong programming skills in Python, C++, and CUDA (Slang is a plus)
Experience running and optimizing distributed training on multi-node, multi-GPU systems
Strong software engineering fundamentals including source control, testing/validation, and containerization
Excellent communication and interpersonal skills
Nice to Haves:
Strong software architecture skills demonstrated through contributions to large internal or open-source projects
Hands-on experience with NCCL, MPI, or UCX communication libraries
Experience with advanced CUDA and Slang optimization for graphics or vision applications
History of multidisciplinary creativity and innovation with hardware and software projects in graphics or robotics
Experience with robotic systems such as autonomous vehicles or humanoid robotics
What you'll be doing:
Profile model training and inference to identify bottlenecks in efficiency, latency, and memory usage
Optimize models with respect to GPU utilization and throughput by improving kernel efficiency
Work with researchers to scale prototypes of models for generative video creation, segmentation, 3D reconstruction, and more
Collaborate in a large codebase to build robust models for synthetic data and world generation
Interact closely with different research, performance, and product teams at NVIDIA
Contribute to NVIDIA NuRec and other core NVIDIA products and libraries
Perks and Benefits:
Highly competitive salaries
Comprehensive benefits package
Equal opportunity employer promoting diversity and inclusivity
Accommodations for individuals with disabilities during the application process