Proven expertise designing and operating distributed systems at massive scale in SaaS or cloud platforms
Strong background in backend, reliability, or platform engineering, with hands-on experience in Go, Ruby, or similar languages
Deep understanding of observability, incident response, and strategies for ensuring reliability in complex systems
Ability to drive alignment across teams and influence senior leaders without formal authority
Track record of delivering technical outcomes that shaped business direction
Strong communicator who can simplify complex topics for engineers, executives, and customers alike
Passion for mentoring and developing senior technical talent
What you'll be doing
Own the long-term roadmap for Production Engineering, driving modernization and scale initiatives that directly impact GitLab.com availability and performance
Lead design for complex distributed systems challenges such as sharding, multi-tenant isolation, observability, and failure recovery
Collaborate with leaders across engineering and product to ensure infrastructure choices deliver both technical and business value
Anticipate evolving needs at scale and define patterns that make our platform more resilient, efficient, and cost-effective
Mentor senior and staff engineers, multiplying impact by raising technical excellence across teams
Champion observability and production readiness practices that improve the speed and quality with which we ship features, as well as reliability for customers worldwide
Perks and Benefits
Benefits to support your health, finances, and well-being
All-remote, asynchronous work environment
Flexible Paid Time Off
Team Member Resource Groups
Equity Compensation & Employee Stock Purchase Plan