Requirements
- 5+ years of professional experience in local GPU deployment, profiling, and optimization.
- BS or MS degree in Computer Science, Engineering, or a related field.
- Strong proficiency in C/C++ and Python, along with solid software design and programming skills.
- Familiarity with and development experience on the Windows operating system.
- Proven theoretical understanding of Transformer architectures (specifically LLMs and Generative AI) as well as convolutional neural networks.
- Experience working with open-source LLM and GenAI software, e.g., PyTorch or llama.cpp.
- Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite.
- Strong verbal and written communication skills in English, good organization, and a logical approach to problem-solving, time management, and task prioritization.
- Excellent interpersonal skills.
- Some travel required for conferences and on-site visits with external partners.
Nice to Haves
- Experience with GPU-accelerated AI inference using NVIDIA APIs such as cuDNN, CUTLASS, and TensorRT.
- Expert knowledge of Vulkan and/or DX12.
- Familiarity with WSL2 and Docker.
- Detailed knowledge of the latest-generation GPU architectures.
- Experience with AI deployment on NPUs and ARM architectures.
What You'll Be Doing
- Improve the Windows LLM & GenAI user experience on NVIDIA RTX by delivering feature and performance enhancements to OSS software, including but not limited to projects such as PyTorch, llama.cpp, and ComfyUI.
- Engage with internal product teams and external OSS maintainers to align on and prioritize OSS enhancements.
- Work closely with internal engineering teams and external app developers to solve local end-to-end LLM & Generative AI GPU deployment challenges, using techniques such as quantization and distillation.
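To illustrate one of the deployment techniques named above, here is a minimal sketch of symmetric int8 weight quantization in plain Python. The scale-and-round scheme shown is illustrative only; it is not the implementation used by any particular framework, and the function names are hypothetical:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto the integer range [-127, 127]."""
    # Scale is chosen so the largest-magnitude weight maps to +/-127.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.08, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value lies within one quantization step (scale) of the original.
```

Schemes like this trade a small, bounded rounding error per weight for a 4x reduction in memory footprint versus float32, which is often the difference between a model fitting in local GPU memory or not.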
- Apply profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications, detecting insufficient GPU utilization that leads to suboptimal runtime performance.
- Conduct hands-on training, develop sample code, and host presentations that provide guidance on efficient end-to-end AI deployment targeting optimal runtime performance.
- Guide AI application developers in applying methodologies for efficient adoption of DL frameworks, targeting maximal utilization of GPU Tensor Cores for the best possible inference performance.
- Collaborate with GPU driver and architecture teams, as well as NVIDIA Research, to influence next-generation GPU features by providing real-world workflows and feedback on partner and customer needs.
Perks and Benefits
NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.