Site Reliability Engineer, Platforms Infrastructure Engineering
Requirements:
Bachelor's degree in Computer Science or a related technical field or equivalent practical experience.
Experience analyzing and troubleshooting systems.
Experience with one or more of the following programming languages: C, C++, Java, Python, Go, Perl, or Ruby.
Nice to haves:
Experience with oncall and production incident management.
Experience with Unix-based OS.
What you'll be doing:
Drive an understanding of production reliability into platform design and development through consulting, model development, and automation.
Own the characterization and qualification of new platforms. Build reliability through an understanding of the platform's performance and capabilities.
Develop per-platform, capability-focused Service Level Objectives (SLOs), monitoring, and alerts to keep reliability practices coherent and consistent as the platform fleet becomes more heterogeneous.
Address challenges created by introducing new technologies into Google’s production systems, as well as by software experiments that could affect future infrastructure planning.
Learn about the software and hardware that underpins Google’s production systems and interact with the development and SRE teams.
Perks and Benefits:
Opportunity to manage the complex challenges of scale unique to Google Cloud.
Opportunity to apply expertise in coding, algorithms, complexity analysis, and large-scale system design.
Culture of intellectual curiosity, problem-solving, and openness.
Collaborative work environment with support and mentorship for learning and growth.