Staff Systems Engineer, Site Reliability Engineering, Google Cloud
Requirements:
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
5 years of programming experience in one or more languages.
3 years of experience working with Unix/Linux systems internals and administration (e.g. filesystems, inodes, system calls) or networking (e.g. TCP/IP, routing, network topologies and hardware, SDN).
3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
Nice to haves:
Experience working in computing, distributed systems, storage, or networking.
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
Ability to debug and optimize code and to automate routine tasks.
Systematic problem-solving approach, coupled with effective verbal and written communication skills.
What you'll be doing:
Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of Google's services.
Lead sustainable incident response, blameless postmortems, and production improvements that result in direct business opportunities for Google.
Provide guidance to other team members on managing availability and performance of mission-critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.
Manage individual project priorities, deadlines, and deliverables.
Perks and Benefits:
Development and maintenance of Google's data centers and platforms.
Opportunity to work on meaningful projects and learn and grow with mentorship.
Diverse and inclusive culture fostering collaboration and innovation.
Proud engineering team supporting Google's product portfolio.