Senior Machine Learning Engineer, Ads Training Platform

Requirements

  • 5+ years in infrastructure/platform engineering or large-scale distributed systems.
  • 2+ years of hands-on experience with the Ray platform.
  • Strong understanding of distributed computing principles (task scheduling, fault tolerance, state management).
  • Experience with distributed storage systems and large-scale data processing.
  • Proven ability to debug and profile distributed jobs.

What You'll Be Doing

  • Design, build, and maintain large-scale distributed training infrastructure for Ads ML models.
  • Develop tools and frameworks on top of the Ray platform.
  • Build tools to debug, profile, and tune distributed training jobs for performance and reliability.
  • Integrate with object storage systems and improve data access patterns.
  • Collaborate with ML engineers to reduce model training time, improve training efficiency, and lower GPU training costs.
  • Drive improvements in scheduling, state management, and fault tolerance within the training platform to enhance overall performance.

Nice to Haves

  • Experience with deep learning frameworks (PyTorch, TensorFlow) is a big plus.
  • Bonus: experience optimizing models for distributed training, or Ads ML experience.

Perks and Benefits

  • Pension Scheme
  • Private Medical and Dental Scheme
  • Life Assurance, Income Protection
  • Workspace benefit for your home office
  • Personal & Professional development funds
  • Family Planning Support
  • Commuter Benefits
  • Flexible Vacation & Reddit Global Days Off

Reddit

Remote - UK (Remote)

Experience: Senior
Posted: August 19, 2025
dataengineering