7+ years of experience as a software engineer, with 3+ years focused on building and maintaining observability platforms or highly distributed systems
Familiarity with monitoring, alerting, and incident response best practices
Expertise in designing and implementing APIs and data pipelines for high-throughput, real-time data ingestion
A practical understanding of distributed systems and their unique observability challenges
Hands-on experience with core observability tools, such as Prometheus, Grafana, Loki, the ELK stack (Elasticsearch, Logstash, Kibana), Jaeger, and OpenTelemetry
Experience with containerization and orchestration technologies (Docker, Kubernetes) and infrastructure as code tools (e.g., Ansible, Terraform)
Proficiency in Python as your primary engineering language
Nice to Have
Previous experience in a DevOps, SRE, or developer experience role
Experience with multiple cloud platforms (AWS, GCP, Azure) and their native observability services
Contributions to open-source observability projects
A track record of prototyping and sketching new solutions to complex problems
What You'll Be Doing
Designing, implementing, and assembling scalable and resilient observability solutions across logs, metrics, and traces, either by leveraging existing market solutions or by building new technologies from scratch
Building robust APIs and data pipelines to ingest, process, and expose observability data to product teams
Collaborating closely with product teams to understand their observability needs and integrate solutions that empower them to monitor, alert, and debug their components effectively
Optimizing the observability infrastructure for performance, accuracy, cost-effectiveness, and an exceptional user experience
Developing and maintaining tooling that automates onboarding components to, and sunsetting them from, the observability platform, and that streamlines data collection
Contributing to the strategic roadmap of the observability platform, identifying and implementing new features and improvements