Requirements:

Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
5 years of experience with software development in one or more programming languages.
3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems.
2 years of experience leading projects and providing technical leadership.

Nice to haves:

Master's degree in Computer Science or Engineering.
Experience developing and supporting Google-scale production systems.
Experience enhancing and supporting large production systems on a cluster management system.
Experience in software engineering and development with C++, Python, Go, General Configuration Language (GCL), and APIs.
Experience with networking, capacity, and performance.
Experience in large-scale system and architecture design, and in system integrations or migrations.

Responsibilities:

Drive improvements across the entire service life-cycle, from inception and design through deployment, operation, and refinement.
Enable services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Monitor and measure availability, latency, and overall system health to maintain live services.
Automate and evolve systems sustainably, pushing for scalable changes that improve reliability and velocity.
Respond to incidents sustainably and conduct blameless postmortems.

Collaborate with individuals passionate about shaping the future of artificial intelligence, generative AI, and machine learning platforms.
Drive production excellence through SRE principles.
Support groundbreaking AI/ML tools on the rapidly growing Vertex GenAI platform.
Work in a blame-free environment that encourages collaboration, problem-solving, and risk-taking.