4+ years of experience with Linux system administration (RHEL or equivalent preferred).
Experience with cloud-based hosting platforms like AWS, Azure, or GCP and/or experience with hardware-based environments.
Familiarity with monitoring systems using tools like Prometheus, and with writing health checks.
Proficiency in at least one programming language, such as Java, Go, Python, or JavaScript.
What We Value
Confidence in troubleshooting complex systems issues independently using stack traces and observability & systems tools.
Comfort with managing large-scale production systems and technologies, including configuration management, load balancing, monitoring & alerting infrastructure, and container orchestration.
Ability to work with a high level of autonomy and responsibility in a rapidly changing environment with dynamic objectives and frequent iteration with users.
Experience with containers (Docker/Podman) and orchestration (OpenShift/Kubernetes) at scale is a plus.
Core Responsibilities
Maintain the availability of cloud & physical Linux servers that power the Palantir platform in air-gapped production environments.
Design, deploy, and operate infrastructure to support customer & product requirements via modern orchestration & monitoring platforms.
Collaborate closely with product teams on requirements & SLOs for deploying software into air-gapped environments.
Identify, troubleshoot, and resolve network & systems issues.
Write scripts to automate routine operational tasks.
Provide technical troubleshooting support for production issues, ensuring timely resolution and minimal impact on operations. Participate in an on-call support rotation.