Senior Software Engineer, Site Reliability Engineering, Cloud IRT
Requirements
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
5 years of experience with software development in one or more programming languages.
3 years of experience in designing, analyzing, and troubleshooting distributed systems.
2 years of experience leading projects and providing technical leadership.
Nice to Haves
Experience in telemetry systems and incident and risk management.
Ability to work across organizational boundaries.
Ability to balance product/development velocity with architectural hygiene.
Excellent problem-solving and communication skills, with a passion for learning from experience.
What You'll Be Doing
Define and escalate risks in Cloud and reduce incident probabilities with strategic and tactical/pragmatic approaches as appropriate.
Focus on high-quality customer outcomes and continuous collaboration across GCP teams.
Create IMAG training and end-to-end processes for the incident management lifecycle, and partner with Cloud SRE UTLs and the Cloud Support leadership team.
Build systems and tooling to support the Cloud IRT team. Improve visibility for Cloud, detection of large-scale issues, and communications to customers, stakeholders, and customer-facing teams.
Participate in the on-call rotation supporting critical incident response for all of GCP.