Staff Systems Engineer, Site Reliability Engineering, Google Cloud
Requirements:
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
5 years of experience programming in one or more programming languages.
4 years of experience leading projects.
3 years of experience designing, analyzing, and troubleshooting distributed systems, and working with systems administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, Software-Defined Networking (SDN)).
Nice to have:
Experience troubleshooting and supporting large-scale applications on Linux/Unix operating systems, such as web services, data storage, databases, data pipelines, high-performance computing, and commerce engines.
Experience with network protocols and large-scale networking architectures.
Excellent problem-solving skills.
What you'll be doing:
Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of Google's services.
Lead incident response, postmortems, and production improvements that result in direct business opportunities for Google.
Provide guidance to other team members on managing the availability and performance of mission-critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.
Manage individual project priorities, deadlines, and deliverables.
Perks and Benefits:
Opportunity to work on large-scale, massively distributed, fault-tolerant systems.
A culture of intellectual curiosity, problem-solving, and openness.
Collaboration, innovation, and risk-taking in a blame-free environment.