Senior DevTech Engineer - Windows LLM and GenAI Open-Source Ecosystem at NVIDIA

Requirements:

5+ years of professional experience in local GPU deployment, profiling, and optimization
BS or MS degree in Computer Science, Engineering, or a related field
Strong proficiency in C/C++, Python, software design, and programming techniques
Familiarity with and development experience on the Windows operating system
Proven theoretical understanding of Transformer architectures - specifically LLMs and Generative AI - and convolutional neural networks
Experience working with open-source LLM and GenAI software, e.g., PyTorch or llama.cpp
Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite
Strong verbal and written communication skills in English, good organizational, time-management, and task-prioritization skills, and a logical approach to problem-solving
Excellent interpersonal skills
Some travel is required for conferences and for on-site visits with external partners

Responsibilities:

Improve the Windows LLM & GenAI user experience on NVIDIA RTX by working on feature and performance enhancements of open-source software, including but not limited to projects like PyTorch, llama.cpp, and ComfyUI
Engage with internal product teams and external OSS maintainers to align on and prioritize OSS enhancements
Work closely with internal engineering teams and external app developers on solving local end-to-end LLM & Generative AI GPU deployment challenges, using techniques like quantization or distillation
Apply powerful profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications and detect insufficient GPU utilization that results in suboptimal runtime performance
Conduct hands-on training sessions, develop sample code, and give presentations that provide guidance on efficient end-to-end AI deployment with optimal runtime performance
Guide AI application developers in applying methodologies for efficient adoption of DL frameworks, targeting maximal utilization of GPU Tensor Cores for the best possible inference performance
Collaborate with GPU driver and architecture teams as well as NVIDIA research to influence next-generation GPU features by providing real-world workflows and giving feedback on partner and customer needs

Preferred qualifications:

Experience with GPU-accelerated AI inference driven by NVIDIA APIs, specifically cuDNN, CUTLASS, and TensorRT
Demonstrated expert knowledge of Vulkan and/or DX12
Familiarity with WSL2 and Docker
Detailed knowledge of the latest-generation GPU architectures
Experience with AI deployment on NPUs and ARM architectures

What we offer:

Highly competitive salaries
Extensive benefits package
Work environment that promotes diversity, inclusion, and flexibility
Equal opportunity employer committed to fostering a supportive and empowering workplace for all