Research Engineer (LLM Training and Performance)

Requirements:

  • Strong PyTorch and PyTorch Distributed experience, having run multi-node jobs with tens to hundreds of GPUs.
  • Hands-on experience with Megatron-LM/Megatron-Core/NeMo, DeepSpeed, or serious FSDP/ZeRO expertise.
  • Real profiling expertise (Nsight Systems/Compute, nvprof) and experience with NVTX-instrumented workflows.
  • GPU programming skills with Triton and/or CUDA, and the ability to write, test, and debug kernels.
  • A solid understanding of NCCL collectives, as well as topology and fabric effects (IB/RoCE), and how they show up in traces.

Nice to Haves:

  • FlashAttention-2 and 3, CUTLASS and CuTe, TransformerEngine and FP8, Inductor, AOTAutograd, and torch.compile.
  • MoE at scale (expert parallel, router losses, capacity management) and long-context tricks (ALiBi/YaRN/NTK scaling).
  • Kubernetes or SLURM at scale, placement and affinity tuning, as well as AWS, GCP, and Azure GPU fleets.
  • Web-scale data plumbing (streaming datasets, Parquet and TFRecord, tokenizer perf), eval harnesses, and benchmarking.
  • Safety and post-training methods, such as DPO, ORPO, GRPO, and reward models.
  • Inference ecosystems such as vLLM and paged KV.

What You'll Be Doing:

  • Improve end-to-end performance for multi-node LLM pre-training and post-training pipelines.
  • Profile hotspots (Nsight Systems/Compute, NVTX) and fix them using compute/comm overlap, kernel fusion, scheduling, etc.
  • Design and evaluate architecture choices (depth/width, attention variants including GQA/MQA/MLA/Flash-style, RoPE scaling/NTK, and MoE routing and load-balancing).
  • Implement custom ops (Triton and/or CUDA C++), integrate via PyTorch extensions, and upstream when possible.
  • Push memory/perf levers: FSDP/ZeRO, activation checkpointing, FP8/TE, tensor/pipeline/sequence/expert parallelism, NCCL tuning.
  • Harden large runs by building elastic and fault-tolerant training setups, ensuring robust checkpointing, strengthening reproducibility, and improving resilience to preemption.
  • Keep the data path fast with streaming and sharded data loaders and tokenizer pipelines, and improve overall throughput and cache efficiency.
  • Define the right metrics, build dashboards, and deliver steady improvements.
  • Run both pre-training and post-training (including SFT, RLHF, and GRPO-style methods) efficiently across sizable clusters.

Perks and Benefits:

  • We are an equal opportunity employer.
  • We know great ideas can come from anyone, anywhere. That’s why we do our best to create an open and inclusive workplace – one that welcomes everyone regardless of their background, identity, religion, age, accessibility needs, or orientation.
  • We process the data provided in your job application in accordance with the Recruitment Privacy Policy.

JetBrains

Netherlands, Germany, Cyprus, UK, Czech Republic, Poland, Armenia, Serbia, Spain

Remote
Experience: Senior
Posted: February 19, 2026

Why we track JetBrains

JetBrains makes developer tools that many developers love: IDEs such as IntelliJ IDEA, WebStorm, and PyCharm, as well as the Kotlin programming language. Headquartered in Prague with offices across Europe. If you want to build developer tools used by millions of engineers, there are few better places.
