Design, develop, and operate highly available and scalable distributed systems.
Collaborate with development teams to implement best practices for CI/CD, infrastructure as code, automated testing, and security.
Troubleshoot and debug issues across the entire stack, including application code, networking, and infrastructure.
Build, maintain, and optimize monitoring and alerting solutions to ensure high availability and performance of services.
Automate repetitive tasks and processes, focusing on reliability and efficiency improvements.
Participate in on-call rotations and incident management processes.
Contribute to team and organizational strategy, participating in architectural reviews and decision-making processes.
3+ years of experience in designing, building, and operating reliable distributed systems.
Hands-on experience with cloud platforms such as Google Cloud Platform (GCP) or Amazon Web Services (AWS).
Strong understanding of Linux/UNIX operating system fundamentals and the TCP/IP networking stack.
Experience operating Kubernetes clusters in production.
Knowledge of monitoring and logging tools.
Proficiency in at least one programming language.
Bachelor's degree in Computer Science, Electrical or Computer Engineering, or equivalent experience.
Nice to Haves
Security: Knowledge of security best practices for cloud-based infrastructure.
DevOps Tools: Experience deploying software to production, implementing and managing CI/CD pipelines, working with infrastructure as code, and using software release tooling.
Database experience: Familiarity with databases (e.g., PostgreSQL, Cassandra, Redis).
Team Leadership: Prior experience leading a team of engineers.
Additional Requirements
A dedicated lifelong learner.
A professional engineer who loves crafting, analyzing, and troubleshooting large software systems.
An excellent communicator who builds collaborative relationships.
An engineer with excellent analytical and problem-solving skills.
A great teammate who can also work independently.
Someone who actively looks for ways to improve services.
Someone who demonstrates personal accountability.
What You'll Be Doing
Work with development teams to ensure scalability, reliability, and performance of core services.
Evangelize best practices and improve systems that power the company.
Participate in every stage of the development cycle, from feature design to production release.
Write and review code, and develop a deep understanding of how applications work.
Collaborate with teams to build both engineering expertise and scalable systems.