Summary

Requirements

Experience with critical, large-scale distributed systems
Experience building and leading engineering teams; ideally SRE or Production Engineering
Strong emphasis on SRE as an engineering discipline, with proficiency in at least one of the following languages: Go, Rust, Python, or Swift
Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
Superb interpersonal skills, with the ability to work with multi-functional technical and business teams and varying levels of management, and to influence decision-making
Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent experience

Working with large bare-metal infrastructure and release management
Experience with large-scale server provisioning, fleet management, and maintenance
Experience developing within the Kubernetes ecosystem, including the Operator Framework, controllers, and CRDs
Hardware bootstrap and associated security (PXE, BIOS, TPM, secure boot, trusted computing)
Automating operations processes via services and tools
Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others

Act as the Service Owner, designing and mapping key performance indicators to achieve the organization’s mission
Lead the definition of requirements and priorities, and the planning of engineering deliverables
Implement structured engineering and operations processes
Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables
Optimise velocity and efficiency of delivery, and drive continuous improvement

High-visibility role collaborating with multiple teams
Invest in and build good relations with key partners within Apple
Collaboration with internal customers, product engineering, and development groups
Coaching and mentoring opportunities for team members
Empowerment to provide appropriate context and timely feedback