Requirements
- 4+ years of experience in the observability domain or in a relevant platform/infrastructure domain.
- Observability Stack Expertise: You have hands-on experience operating core telemetry data stores at scale, e.g. Elasticsearch/OpenSearch/VictoriaLogs/ClickHouse for logging, Prometheus/VictoriaMetrics for metrics, and Grafana Tempo for distributed tracing.
- Linux Experience: You understand the operating system at a kernel level and can debug complex networking, file system, and performance issues on both bare metal and virtualized hardware.
- Production Kubernetes Experience: Proven hands-on experience operating and troubleshooting production workloads on Kubernetes (on-prem and/or cloud), including strong day-to-day use of kubectl and Kubernetes primitives (e.g. Namespaces, Pods, Deployments/StatefulSets, Services, Ingress, ConfigMaps/Secrets).
- Software Engineering Mindset: You are proficient in Go or Python and do not just write scripts; you build tools and automation platforms that treat infrastructure as code.
Nice to have
- Experience with large-scale multi-tenant isolation and with quota or cost governance approaches for telemetry platforms.
- Familiarity with regulated environments where security, auditability, and data handling requirements shape platform design decisions.
What you'll be doing
- Build the next generation of our platform: Design and implement the future architecture of our logging and metrics systems.
- Own infrastructure operations: You will take full ownership of our hybrid infrastructure.
- Automate to reduce toil: Write code in Go or Python to eliminate manual operational tasks.
- Optimize for scale and performance: Dive deep into performance bottlenecks within our distributed tracing and logging pipelines.
- Reliability and Engineering: Participate in on-call rotations and engineer solutions that prevent alerts from firing.
Perks and benefits
This role is based out of our Amsterdam office. We are an office-first company with in-person collaboration.