AI Research Engineer - Datadog AI Research (DAIR)

AI Summary ✨

Requirements:

  • You have strong software engineering skills with experience in domains such as observability, SRE, or security
  • You have deep experience in distributed computing and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a plus
  • You are proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and you are comfortable with modern cloud and data infrastructure
  • You have practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration
  • You are familiar with efficient training, fine-tuning, and inference techniques for large foundation models
  • You can explain design and performance trade-offs clearly to both technical and non-technical audiences
  • You have a strong interest in open science and open-source contributions, including establishing rigorous benchmarks and sharing artifacts with the community

What You'll Be Doing:

  • Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling
  • Implement models, run experiments at scale, and profile for reliability, performance, and cost
  • Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery
  • Make the research stack observable, reproducible, and easier to use
  • Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks
  • Collaborate with Research Scientists, Product, and Engineering to integrate advanced AI capabilities into Datadog’s product ecosystem and to harden prototypes into reliable services
  • Contribute high-quality code, documentation, and open-source artifacts that enable the community and internal teams to reproduce, extend, and evaluate results

Nice to Haves (Bonus Points):

  • You have a demonstrated ability to bridge cutting-edge research prototypes and real-world product applications, ideally with large foundation models, generative AI agents, or domain-specific LLM deployments
  • You are passionate about pushing the boundaries of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies
  • You have hands-on experience with GPU programming and optimization, including CUDA
  • You have experience writing production data pipelines and applications
  • You have experience supporting or contributing to research publications

Perks and Benefits:

  • Competitive global benefits
  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Opportunity to collaborate closely with colleagues across the Datadog offices in New York City and Paris
  • Opportunity to attend and present at conferences and meetups
  • Intra-departmental mentor and buddy program for in-house networking
  • An inclusive company culture, with the ability to join our Community Guilds (Datadog employee resource groups)

Datadog

France

Experience: Mid-level
Posted: August 27, 2025
Skills: Go, Python, Rust, machine learning
