Senior AI Compute Infrastructure Engineer

Requirements

  • 5+ years of infrastructure engineering experience, with significant time spent on GPU compute, ML infrastructure, distributed systems, high-performance computing, or large-scale production platforms
  • Hands-on experience operating GPU clusters or accelerator-backed infrastructure in production or production-like environments, including scheduling, orchestration, utilization monitoring, and cost optimization
  • Strong systems engineering fundamentals across Linux, networking, storage, containers, Kubernetes, distributed runtimes, and production debugging
  • Experience with ML serving frameworks such as vLLM, Triton Inference Server, TensorRT, TorchServe, KServe, Ray Serve, or equivalent systems
  • Proficiency in Python for infrastructure automation, tooling, debugging, integration, and operational workflows
  • Practical understanding of performance tradeoffs across batching, concurrency, memory usage, GPU utilization, model size, latency, throughput, availability, and cost
  • Track record of optimizing compute costs while maintaining clear performance, reliability, and availability expectations
  • Experience building observable systems with useful metrics, logs, traces, dashboards, alerts, and incident workflows
  • Comfortable working in high-stakes, always-on environments where uptime, throughput, correctness, and operational discipline are critical
  • Clear communicator who can translate infrastructure tradeoffs for researchers, product teams, platform engineers, security stakeholders, and engineering leadership

Nice to Haves

  • Experience at a frontier AI lab, hyperscaler, high-frequency trading firm, research platform, or high-scale ML organization
  • Familiarity with custom silicon or specialized accelerators such as TPUs, AWS Trainium, Gaudi, or similar platforms
  • Background in capacity planning, procurement input, reserved capacity strategy, cloud accelerator economics, or GPU fleet cost management
  • Experience with distributed training frameworks such as DeepSpeed, Megatron-LM, FSDP, Ray, or equivalent systems
  • Experience debugging CUDA, NCCL, kernel, driver, runtime, memory, networking, or low-level performance issues
  • Experience with Rust, C++, Go, CUDA, or other systems languages used for performance-critical infrastructure
  • Crypto, financial services, trading infrastructure, or security-sensitive production infrastructure experience

What You'll Be Doing

  • Own and operate GPU and accelerator clusters used for training, inference, evaluation, and experimentation
  • Design infrastructure that enables running models locally on GPUs
  • Build and improve scheduling, orchestration, and utilization systems
  • Optimize inference pipelines for latency, throughput, and cost
  • Partner with ML engineers to remove bottlenecks
  • Build observability for GPU utilization and more
  • Drive reliability and incident response
  • Evaluate and integrate new hardware and cloud instance families
  • Build tooling for GPU usage visibility
  • Contribute to long-term architecture decisions

Perks and Benefits

  • Global team with diverse talents and backgrounds
  • Equal opportunity employer that does not discriminate on the basis of a range of characteristics
  • Celebration of all Krakenites for their unique perspectives
  • Candidates are encouraged to apply even if they do not meet every requirement, especially if they are passionate about crypto
  • Applications are accepted on an ongoing basis
  • Job-related skills or work-style assessments may form part of the hiring process

Kraken

UK, Canada, Portugal, Spain, Poland, Ireland, Germany, United Arab Emirates, Brazil, Romania, Czech Republic, Cyprus, Lithuania, Switzerland, Mexico

Remote
Experience: Senior
Posted: May 7, 2026
AWS
Golang
Kubernetes
Node.js
Python
Rust
Machine learning

Why we track Kraken

Kraken is one of the oldest and most established crypto exchanges. It takes security seriously and has a strong engineering reputation. The company is remote-friendly and hires for EU roles, making it a more serious option in the crypto space.
