Experience with critical, large-scale distributed systems
Experience building and leading engineering teams, ideally in SRE or Production Engineering
Strong emphasis on SRE as an engineering subject area, with proficiency in at least one of the following languages: Golang, Rust, Python, or Swift
Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
Superb interpersonal skills, with the ability to work with cross-functional technical and business teams and varying levels of management, and to influence decision-making
Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent experience
Nice to Haves
Experience working with large bare-metal infrastructure and release management
Experience with large-scale server provisioning, fleet management, and maintenance
Experience with development within the Kubernetes ecosystem, including the Operator Framework, controllers, and CRDs