Professional Experience: 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence.
System Design Expertise: 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems.
SRE / Operational Experience: 1+ years of operational responsibility for production-grade Ceph clusters.
Educational Background: Bachelor’s degree in Computer Science or equivalent practical experience.
What We Value
Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage, ideally with Rook, and a deep understanding of Ceph's architecture and operational nuances.
Automation Proficiency: Strong skills with infrastructure automation tools such as Terraform and Kubernetes Operators, plus coding proficiency in Go, Java, or an equivalent language.
Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages.
Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools.
Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred.
On-premises Data Centre Experience: Experience working with on-premises hardware, or as a sysadmin/SRE in data centres.
Core Responsibilities
Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints.
Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling.
Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community.
Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects.
Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure.