Senior AI/ML Infrastructure Engineer

AI Summary ✨

Requirements:

  • 3-7 years of experience in AI/ML infrastructure, DevOps, or HPC environments.
  • Strong expertise in Linux, Kubernetes, and container orchestration.
  • Hands-on experience with GPUs, TPUs, or AI accelerators.
  • Knowledge of storage architectures, distributed file systems (Lustre), and data lakes.
  • Experience with monitoring, logging, and automation tools (Prometheus, ELK, Grafana, Python, Bash).
  • Strong understanding of AI/ML frameworks (TensorFlow, PyTorch).
  • Excellent problem-solving skills and ability to work in a fast-paced AI/ML environment.
  • Expert-level C programming experience.

What you'll be doing:

  • Architect, develop and deploy scalable AI/ML infrastructure using Kubernetes, Docker, and containerized solutions.
  • Configure and optimize GPU, TPU, and high-performance compute resources for efficient training and inference.
  • Implement cloud-based AI solutions and integrate with on-prem environments.
  • Design high-speed networking and storage solutions to support AI workloads, drawing on a strong understanding of the RDMA and RoCE v2 protocols.
  • Work with and manage NVIDIA SuperPOD systems.
  • Apply a deep understanding of, and hands-on experience with, parallel file systems (Lustre).
  • Optimize distributed training workflows.
  • Implement automated deployment pipelines.
  • Ensure cost-efficient resource utilization by tuning cloud auto-scaling, spot instances, and job scheduling.
  • Develop observability and monitoring tools using Prometheus and Grafana.
  • Ensure AI workloads comply with security best practices (RBAC, IAM, encryption).
  • Maintain high availability, fault tolerance, and disaster recovery strategies for AI infrastructure.
  • Work closely with AI/ML engineers, data scientists, and DevOps teams to streamline AI workflows.
  • Stay ahead of emerging AI infrastructure trends and evaluate new technologies.

Nice to haves:

  • (None specified)

Perks and Benefits:

  • (None specified)

eBay

Amsterdam, Netherlands

Experience: Senior
Posted: February 26, 2025
Docker
Kubernetes
Python
Machine Learning
