Senior System Software Engineer, NCCL - Partner Enablement

Requirements:

  • B.S. or M.S. degree in Computer Science/Computer Engineering (or equivalent experience) and 5+ years of relevant experience. Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Experience working with engineering or academic research communities that support HPC or AI
  • Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
  • Expert knowledge of Linux fundamentals and a scripting language, preferably Python
  • Familiarity with containers, cloud provisioning, and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and a passion for learning new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Nice to haves:

  • Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for large clusters. Experience debugging network configuration issues in large-scale deployments
  • Familiarity with CUDA programming and/or GPUs. Good understanding of machine learning concepts and experience with deep learning frameworks such as PyTorch and TensorFlow
  • A deep understanding of technology and a passion for what you do

What you'll be doing:

  • Engage with our partners and customers to root-cause functional and performance issues reported in NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Guide our customers and support teams on HPC practices and standard methodologies for running applications on multi-node clusters
  • Write documentation and conduct trainings and webinars on NCCL
  • Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure, and support

Perks and benefits:

  • NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization
  • Dedicated teams of driven, innovative professionals pushing the boundaries of technology
  • Highly competitive salaries
  • Extensive benefits package
  • Work environment that promotes diversity, inclusion, and flexibility
  • Equal opportunity employer committed to fostering a supportive and empowering workplace for all

NVIDIA

Remote (Germany)

Experience: Senior
Posted: May 23, 2025
Tags: AWS, Azure, Docker, GCP, Kubernetes, Node.js, Python, backend
