Software Reliability Engineer - LPU Hardware DataFlow

Requirements:

  • BS or higher degree or equivalent experience with 8+ years in reliability engineering, hardware testing, driver testing, or SRE with a focus on hardware/drivers.
  • Functional programming experience (Haskell, Nix).
  • Strong system-level programming experience (C++, Rust, Java).
  • Strong experience with Linux and scripting (Python, Shell) for test automation, result parsing, and tooling.
  • Proficiency in building automated test pipelines; experience with CI/CD and with running tests at scale (e.g. test farms, lab automation).
  • Ability to prioritize failures, examine logs and dumps, and collaborate with driver or hardware teams to identify root causes of issues.
  • Strong communication skills in English; capable of collaborating with distributed teams across EMEA and worldwide.

Nice to haves:

  • Experience with GPU or accelerator reliability testing; familiarity with NVIDIA or other GPU/driver ecosystems.
  • Experience with hardware durability or certification testing (stress, longevity, thermal, power) and/or driver consistency and regression testing.
  • Background in driver development, kernel debugging, or low-level software; ability to read driver code and correlate behavior with test failures.
  • Experience with hardware testing tools, lab automation, or DUT (device-under-test) management at scale.
  • Knowledge of reliability standards and methods (e.g. FIT rates, accelerated life testing, failure analysis).
  • Experience with firmware or BIOS reliability testing; understanding of hardware–software interaction and error reporting (e.g. AER, MCE).

What you'll be doing:

  • Prevent logic bugs before they occur by providing formal correctness proofs.
  • Develop and sustain driver reliability test frameworks: automated stability evaluations, regression test suites, and compatibility assessments across OS, driver versions, and hardware SKUs.
  • Diagnose and identify driver and hardware failures: investigate crashes, freezes, and errors; collaborate with driver and hardware groups to resolve problems and enhance test coverage.
  • Establish and track reliability metrics and SLOs for hardware and drivers; perform post-mortems and encourage advancements in test automation and coverage.
  • Build, implement, and run hardware reliability and qualification tests: stress tests, longevity tests, thermal/power cycling, and environmental tests on GPUs and accelerators.
  • Automate test running, result gathering, and reporting; incorporate reliability tests into CI and release workflows; manage lab or farm infrastructure for reliability testing across EMEA and worldwide.

Perks and benefits:

  • Join our team of world-class engineers at NVIDIA.
  • Collaborative and inclusive environment.
  • Opportunity to thrive and make a significant impact.

NVIDIA

UK, Hungary

Experience: Senior
Posted: March 16, 2026
Java
Python
Rust
backend

Why we track NVIDIA

NVIDIA has become one of the most important companies in tech thanks to AI and GPU computing. They have EU roles across several countries. If you're interested in hardware, CUDA, or ML infrastructure, they're hard to beat.
