Requirements
- In-depth experience in a Site Reliability Engineering, DevOps, or Infrastructure-focused role
- Expertise and professional experience with cloud operations, focusing on "infrastructure-as-a-service" (compute, storage, and network virtualization)
- Proficiency in Go and Python
- Experience operating large-scale, multi-tenant infrastructure as a managed service
- Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.)
Nice to Haves
- Automation advocate - believes in removing operational load via software
- Strong sense of ownership, plus great teamwork and communication skills
- Experience managing, scaling, and troubleshooting Java and Go applications
- Capable of collaborating with multiple engineering teams and mentoring others
What You'll Be Doing
- Operate, monitor, and triage all aspects of production and non-production environments
- Design, build, and implement innovative solutions around Kubernetes in a distributed environment
- Prepare alert-handling procedures and runbooks, and collaborate with other SRE teams
- Participate in on-call rotations to troubleshoot and resolve production issues
- Automate routine processes, as well as the deployment and orchestration of services into the cloud environment
Perks and Benefits
Not specified in the job posting.