Requirements:

Have experience operating cloud-native production systems and services
Write production-quality code (e.g. Python, Go, or similar) to automate operations and improve reliability
Understand common failure modes in distributed systems, such as dependency failures, resource exhaustion, and partial outages
Have experience working with containerized workloads and platforms (e.g. Kubernetes) in production environments
Are comfortable participating in on-call rotations and diagnosing straightforward production issues
Have experience using observability tools and responding to alerts
Are familiar with SRE concepts such as SLIs, SLOs, and error budgets, and are learning how to apply them in practice
Have hands-on experience with infrastructure as code or declarative configuration (e.g. Terraform, Kubernetes manifests)
Can follow incident response processes and contribute meaningfully during outages
Are comfortable receiving feedback, learning from incidents, and improving your systems over time

Responsibilities:

Build, operate, and improve production systems with a focus on reliability, scalability, and performance
Apply software engineering principles to automate operational tasks and reduce manual toil
Contribute to the design and implementation of systems using established SRE best practices
Help define and measure SLIs and SLOs for services you support
Improve observability through metrics, dashboards, logging, and tracing
Participate in on-call rotations and respond to production incidents with guidance and support
Assist with incident investigation and contribute to post-incident reviews and follow-up actions
Perform basic analysis around system behavior, capacity usage, and scaling characteristics
Identify reliability issues or operational pain points and work with teammates to address them
Collaborate with product, platform, and security engineers to ship reliable systems
Write and maintain clear operational runbooks and system documentation

This role is based in Dublin, Ireland, and follows a hybrid working model.
Klaviyo supports work authorization and relocation for this position.
The company’s total compensation package may include participation in the company’s annual cash bonus plan, equity, and a comprehensive range of health, welfare, and wellbeing benefits, based on eligibility.