Requirements:

An advanced degree in Computer Science, Computer Engineering, or a related computationally focused science field (or equivalent experience)
15+ years of relevant experience in software development or research work
Programming fluency in C/C++ with a deep understanding of algorithms and software development
Background in parallel programming models such as CUDA, OpenACC, OpenMP, MPI, or pthreads
Hands-on experience with low-level performance optimization
In-depth expertise in CPU and GPU architecture fundamentals
Effective communication and organizational skills, a logical approach to problem solving, and strong time management and prioritization skills

Nice to Haves:

Expertise in parallelizing and optimizing the performance of deep learning models for natural language processing, computer vision, recommender systems, and similar domains
Excellent understanding of linear algebra

What You'll Be Doing:

Researching and developing techniques to GPU-accelerate workloads in deep learning, machine learning, and other AI domains
Working with technical experts to analyze complex AI and HPC algorithms and optimize them for modern CPU and GPU architectures
Publishing optimization techniques in developer blogs and presenting them at conferences
Influencing the design of next-generation hardware architectures, software, and programming models

Why Join Us:

The opportunity to work on cutting-edge technologies at the forefront of AI and GPU acceleration
Engagement with the developer community and collaboration with experts from industry and academia
The chance to influence design decisions for next-generation hardware and software
A team of forward-thinking colleagues at NVIDIA, a leading technology company
A diverse work environment and commitment to equal opportunity employment