Requirements:
- 5+ years of experience.
- MSc or PhD in CS, EE, or CSEE, or equivalent experience.
- Strong background in Deep Learning.
- Strong programming skills in Python and PyTorch.
- Experience with inference optimization techniques (e.g., quantization) and with at least one inference optimization framework: TensorRT, TensorRT-LLM, vLLM, or SGLang.
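To illustrate the quantization technique named above, here is a minimal pure-Python sketch of symmetric int8 post-training quantization. The helper names are illustrative only and do not come from TensorRT, vLLM, or any framework listed; production systems use fused GPU kernels rather than Python loops.

```python
def quantize_int8(values):
    """Map floats to int8 using a single shared symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # map the largest magnitude to +/-127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

# Example: weights round-trip with error bounded by scale / 2 per element.
weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

The point of the sketch is the trade-off the role works with: int8 storage and arithmetic cut memory traffic and enable faster GPU math at a small, bounded accuracy cost.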
Nice to Haves:
- Familiarity with deploying Deep Learning models in production settings (e.g., Docker, Triton Inference Server).
- CUDA programming experience.
- Familiarity with diffusion models.
- Proven experience in analyzing, modeling, and tuning the performance of GPU workloads, both inference and training.
What you'll be doing:
- Improve inference speed for Cosmos WFMs on GPU platforms.
- Drive the production deployment of Cosmos WFMs.
- Profile and analyze deep learning workloads to identify and remove bottlenecks.
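The profiling work described above can be sketched in miniature: time each stage of a pipeline and surface the slowest one. This is a hypothetical stand-in for real tools such as Nsight Systems or torch.profiler; the function name and stage names are illustrative, not from any NVIDIA tooling.

```python
import time

def profile_stages(stages):
    """Run each (name, callable) stage, record wall-clock time,
    and return the timings plus the name of the slowest stage."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    bottleneck = max(timings, key=timings.get)
    return timings, bottleneck

# Example: a deliberately heavier "inference" stage dominates.
timings, bottleneck = profile_stages([
    ("preprocess", lambda: sum(range(1_000))),
    ("inference", lambda: sum(range(5_000_000))),
])
```

In practice GPU work is asynchronous, so real profiling relies on device-side timing and kernel traces rather than host wall-clock timers, but the workflow is the same: measure per-stage cost, rank, and attack the largest contributor.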
Perks and Benefits:
Your base salary will be determined by your location, experience, and the pay of employees in similar positions. For Poland, the base salary ranges are:
- Level 3: 221,250 - 383,500 PLN
- Level 4: 292,500 - 507,000 PLN