Site Reliability Engineer (SRE), Data Infrastructure
Requirements
BS or MS degree in Computer Science with years of experience in a Site Reliability Engineering (SRE) and/or DevOps role
Years of experience running services in a large-scale *nix environment, an understanding of SRE principles and goals, and prior on-call experience
Deep understanding of and experience with one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS
The ability to design, author, and release code in a language such as Go, Python, Ruby, or Java
Nice to Haves
Fast learner with excellent analytical problem-solving and interpersonal skills
Experience supporting Java applications
Experience with big data technologies
Experience working with geographically distributed teams and implementing high-level projects and migrations
Strong communication skills and the ability to deliver high-quality results on time
What You'll Be Doing
Support Java-based applications and Spark/Flink jobs on bare metal, AWS, and Kubernetes
Understand application requirements (performance, security, scalability, etc.) and assess the right services and topology on AWS, bare metal, and Kubernetes
Build automation to enable self-healing systems
Build tools to monitor and alert on high-performance, low-latency applications
Troubleshoot application-specific, core network, system, and performance issues
Take part in challenging, fast-paced projects that support Apple's business by delivering innovative solutions
Monitor production, staging, test, and development environments for a wide range of applications in an agile and dynamic organization