4+ years of experience in the observability domain or in a relevant platform/infrastructure domain.
Observability Stack Expertise: You have hands-on experience operating core telemetry data stores at scale, e.g. Elasticsearch/OpenSearch/VictoriaLogs/ClickHouse for logging, Prometheus/VictoriaMetrics for metrics, and Grafana Tempo for distributed tracing.
Linux Experience: You understand the operating system at a kernel level and can debug complex networking, file system, and performance issues on both bare metal and virtualized hardware.
Production Kubernetes Experience: Proven hands-on experience operating and troubleshooting production workloads on Kubernetes (on-prem and/or cloud), including strong day-to-day use of kubectl and Kubernetes primitives (e.g. Namespaces, Pods, Deployments/StatefulSets, Services, Ingress, ConfigMaps/Secrets).
Software Engineering Mindset: You are proficient in Go or Python and do not just write scripts; you build tools and automation platforms that treat infrastructure as code.
Nice to have
Experience with large-scale multi-tenant isolation and with quota or cost governance approaches for telemetry platforms.
Familiarity with regulated environments where security, auditability, and data handling requirements shape platform design decisions.
What you'll be doing
Build the next generation of our platform: Design and implement the future architecture of our logging and metrics systems.
Own infrastructure operations: You will take full ownership of our hybrid infrastructure.
Automate to reduce toil: Write code in Go or Python to eliminate manual operational tasks.
Optimize for scale and performance: Dive deep into performance bottlenecks within our distributed tracing and logging pipelines.
Reliability and engineering: Participate in on-call rotations and engineer solutions that prevent alerts from firing.
Perks and benefits
Role based out of our Amsterdam office. Office-first company with in-person collaboration.