Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience working with large-scale distributed systems (e.g., cloud computing providers or SaaS services, ideally with millions or billions of users) or similarly complex environments.
Awareness of, and ability to reason about, modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, and caching.
Experience with C#/Java/C/C++/Golang.
Experience in building, shipping and operating reliable solutions.
Nice to Haves
Familiarity with modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, and caching.
Experience as a technical lead or engineering manager.
Experience working on large and unfamiliar codebases (millions of lines of code).
Experience with open-source projects, Kubernetes, Linux, and containers.
Proven track record in building, shipping, and operating reliable solutions.
Proficiency in programming languages like C#/Java/Python.
Experience with data technologies (SQL/NoSQL/etc.).
Experience with Azure is a plus.
Experience in AI adoption with tools like GitHub Copilot, Azure OpenAI and custom copilots to streamline development and reduce toil.
What You'll Be Doing
Defining system reliability goals through Service Level Objectives (SLOs).
Enhancing production posture with targeted improvements in observability and operability (telemetry, alerting, incident/change management, safe deployment practices).
Building reusable automation and processes that help multiple teams meet their reliability goals.
Influencing product architecture and roadmaps to ensure customer-experienced reliability is a core design principle.
Contributing directly to product code to achieve reliability outcomes.
Leveraging AI to proactively detect anomalies, predict incidents, and automate operational workflows, scaling reliability efforts across complex systems.
Providing technical leadership across multiple Azure teams.
Mentoring others on SRE principles, practices, and tools, as well as AI usage to boost software development productivity.
Designing and developing large-scale distributed software services and solutions.
Delivering "best-in-class" engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
Collaborating with internal and external partners to support team goals.
Balancing pragmatism with vision, driving continuous improvements in process and codebase.
Building automation to prevent or remediate service issues before they impact users.
Driving innovation in large-scale operations by applying cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems.
Gaining a working understanding of Microsoft businesses and contributing to cohesive, end-to-end user experiences.
Perks and Benefits
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.