6+ years of technical experience in software engineering, network engineering, or systems administration
5+ years of operational experience improving service reliability, availability, and performance
Ability to handle the ambiguity that comes with working in a fast-paced environment
Systematic problem-solving approach, coupled with effective communication skills and a sense of curiosity
Expertise in troubleshooting incidents that impact large-scale distributed systems, and in automating their root cause analysis and mitigation
Ability to travel regularly to a customer site in South East UK
Nice to Haves:
Prior knowledge of high-performance computing (HPC)
Experience influencing product architecture and roadmap to ensure that supportability, as experienced by customers, remains a key consideration as the product evolves
What You'll Be Doing:
Collaborating closely with existing SRE teams on building and enhancing tooling and automation solutions
Collaborating with customers to understand their pain points and formulate strategies for addressing recurring issues
Communicating on a deeply technical level with a large enterprise customer
Designing and implementing changes to service telemetry to support automation
Improving the customer-facing experience through proactive alerting and analysis
Providing operational insights into customer experience to Design and Product teams
Perks and Benefits:
Microsoft is an equal opportunity employer
Qualified applicants will receive consideration for employment regardless of various characteristics
Benefits and perks may vary depending on the nature of employment and location