Senior System Software Engineer, Platform Operations
Requirements:
Bachelor’s degree in Computer Science, a related technical field, or equivalent experience
Over 6 years of DevOps experience optimizing, deploying, and running containerized applications (Docker, Kubernetes) across AWS, Azure, and GCP, including hands-on work with EKS, AKS, and GKE
Proficient in Python and Linux shell scripting for automation, application development, system administration, and troubleshooting
Validated experience architecting, implementing, and managing cloud infrastructure using Terraform
Demonstrated ability as a meticulous problem-solver with strong analytical skills, capable of diagnosing and resolving complex technical challenges under pressure
Excellent communication, teamwork, and collaboration skills, with an ability to articulate technical concepts clearly to diverse audiences and lead technical responses during incidents
Nice to haves:
Proven experience designing and implementing event-driven architectures using pub/sub patterns with platforms like AWS SNS/SQS, Google Pub/Sub, or Azure Service Bus
Knowledge of generative AI architectures (LLMs, diffusion models) and concepts such as Retrieval Augmented Generation (RAG) and vector databases
Hands-on experience with the NVIDIA AI stack (NeMo, Triton Inference Server, TensorRT) for model development, serving, and optimization. Production experience with NVIDIA NIM is a strong plus
Experience building and running CI/CD pipelines (Jenkins, GitLab CI) and managed software development environments, applying SRE principles to automate operations, enhance reliability, and improve performance
Familiarity with Python-based Learning Management Systems (LMS) such as Open edX
What you’ll be doing:
Architect, build, and evolve the scalable technology stack for global learner and instructor technical support
Lead the global operationalization of support systems to ensure high availability, performance, and efficient resource utilization
Provide technical leadership and mentorship to a distributed operations team, driving excellence in the use of support technologies and processes
Collaborate cross-functionally to translate support insights and user feedback into systemic improvements to shared NVIDIA services, the DLI platform, and the overall experience for enterprises, learners, and instructors
Perks and benefits:
Competitive salaries
Generous benefits package
Considered one of the technology world’s most desirable employers
Best-in-class teams that are growing rapidly