Site Reliability Engineer, ML Infrastructure, Large Models SRE

AI Summary ✨

Requirements:

  • Bachelor's degree in Computer Science or a related technical field or equivalent practical experience.
  • 5 years of experience with software development in one or more programming languages.
  • 3 years of experience in designing, analyzing, and troubleshooting distributed systems.
  • 2 years of experience leading projects and providing technical leadership.

Nice to haves:

  • Experience in Large Language Models/Machine Learning tooling and infrastructure.
  • Experience in automation, monitoring, and incident response.
  • Experience in C++, Java, Python, or Go.
  • Understanding of Site Reliability Engineering (SRE) principles and best practices.
  • Excellent communication, project management, and stakeholder management skills.

What you'll be doing:

  • Design, build, and maintain scalable and reliable Large Model infrastructure.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Participate in the on-call rotation and respond to incidents.
  • Practice sustainable incident response and blameless postmortems.
  • Implement best practices in SRE, including automation, monitoring, and incident response.

Perks and benefits:

  • Google is proud to be an equal opportunity and affirmative action employer.
  • Opportunity to work on challenging projects with unique scale at Google.
  • Collaborative and intellectually stimulating work environment.

Google

London, UK

Experience: Mid-level
Posted: August 1, 2025
Tags: Golang, Java, Nodejs, Python, sitereliability
