Requirements:

BS or MS in Computer Science or equivalent program from an accredited University/College.
8+ years of hands-on software engineering or equivalent experience.
Demonstrate understanding of cloud design in the areas of virtualization and global infrastructure, distributed systems, and security.
Expertise in Kubernetes (K8s) & KubeVirt and building RESTful web services.
Understanding of building AI Agentic solutions preferably Nvidia open source AI solutions. Demonstrate working experiences in SRE principles like metrics emission for observability, monitoring, alerting using logs, traces and metrics.
Hands-on experience working with Docker, Containers and Infrastructure as a Code like terraform deployment CI/CD.
Exhibit knowledge in concepts of working with CSPs, for example: AWS (Fargate, EC2, IAM, ECR, EKS, Route53 etc...), Azure etc.

Nice to Haves:

Expertise in technologies such as Stack-storm, OpenStack, Redhat OpenShift, AI DBs like Milvus.
A track record of solving complex problems with elegant solutions.
Prior experience with Go & Python, React.
Demonstrate delivery of complex projects in previous roles.
Showcase ability in developing Frontend application with concepts of SSA, RBAC.

Design, build, and implement scalable cloud-based systems for PaaS/IaaS.
Work closely with other teams on new products or features/improvements of existing products.
Develop, maintain and improve cloud deployment of our software.
Participate in the triage & resolution of complex infra-related issues.
Collaborate with developers, QA and Product teams to establish, refine and streamline our software release process, software observability to ensure service operability, reliability, availability.
Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces.
Develop, maintain and improve automation tools that can help improve efficiency of SRE operations.
Practice balanced incident response and blameless postmortems.
Be part of an on-call rotation to support production systems.