Senior Site Reliability Engineer, Database Operations: ClickHouse

Requirements:

  • Advanced database platform management experience, preferably using Postgres and ClickHouse at scale
  • Advanced cloud infrastructure automation and management experience, preferably using Ansible, Chef, Terraform, Helm charts, Operators, and Kubernetes
  • Solid experience with at least one programming language: Go, Ruby or Python
  • Advanced experience with Linux
  • Extensive on-call experience as an SRE supporting mission-critical systems
  • Solid incident management experience across all phases: analysis, remediation, root cause analysis (RCA), and corrective actions
  • Solid experience implementing monitoring at scale (preferably Prometheus and Grafana)

What You'll Be Doing:

  • Design, build, and maintain ClickHouse and PostgreSQL clusters to support high-demand, enterprise-scale workloads
  • Provision and orchestrate cloud infrastructure in GCP using configuration management tools (Ansible, Chef), IaC (Terraform), the Kubernetes ecosystem (Helm charts, Operators), and distributed consensus (etcd)
  • Design and implement enterprise-grade, high-availability ClickHouse solutions with ClickHouse Keeper, sharding, and replication, optimized for large-scale and dynamic datasets
  • Optimize and scale high-transaction PostgreSQL clusters with Patroni and streaming replication for GitLab’s core applications on GCP
  • Build and maintain early warning systems, monitoring, and alerting tools (e.g., Prometheus/Grafana) to predict capacity needs, monitor query latency and replication lag, and ensure resource optimization across platforms
  • Enable cross-database integrations and workflows, such as ClickHouse-to-PostgreSQL data federation, CDC, and logical replication, to support hybrid analytics
  • Respond to platform alerts, user emergencies, and support requests while ensuring strict adherence to SLOs, including during SRE on-call rotations
  • Enhance infrastructure security by implementing and updating measures that protect GitLab’s systems and ensure compliance with regulatory requirements (e.g., GDPR, FedRAMP, SOC 2, ISO)
  • Partner with internal and external compliance assessors as Subject Matter Experts during certifications and recertifications
  • Collaborate with engineering teams to address architectural bottlenecks, plan service rollouts and migrations, and shape the future roadmap while maintaining strong operational readiness

Nice to Haves:

  • Willingness and ability to live and promote GitLab's unique CREDIT Values in one's day-to-day work and interactions with teammates
  • Superior verbal and written communication skills
  • Cool, collected and composed under pressure
  • Comfortable and productive working asynchronously across time zones and cultures, at the speed and scale of business
  • Enable others to excel
  • Be a Leader of One
  • Act Like an Owner with GitLab's resources

Perks and Benefits:

  • Benefits to support your health, finances, and well-being
  • All remote, asynchronous work environment
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and development budget
  • Parental leave
  • Home office support

GitLab

Remote EMEA

Experience: Senior
Posted: January 31, 2025
GCP
Git
Golang
Kubernetes
PostgreSQL
Python
Terraform
Site Reliability
