Requirements:
- 5+ years of experience.
- MSc or PhD in CS, EE, or CSEE, or equivalent experience.
- Strong background in Deep Learning.
- Strong programming skills in Python and PyTorch.
- Experience with inference optimization techniques (e.g., quantization) and with at least one inference optimization framework: TensorRT, TensorRT-LLM, vLLM, or SGLang.
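To illustrate the quantization technique named above, here is a minimal pure-Python sketch of symmetric int8 post-training quantization. The helper names are illustrative only and do not come from TensorRT, vLLM, or any framework listed; production systems use fused GPU kernels rather than Python loops.

```python
def quantize_int8(values):
    """Map floats to int8 using a single shared symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # map the largest magnitude to +/-127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

# Example: weights round-trip with error bounded by scale / 2 per element.
weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

The point of the sketch is the trade-off the role works with: int8 storage and arithmetic cut memory traffic and enable faster GPU math at a small, bounded accuracy cost.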
Nice to Haves:
- Familiarity with deploying Deep Learning models in production settings (e.g., Docker, Triton Inference Server).
- CUDA programming experience.
- Familiarity with diffusion models.
- Proven experience in analyzing, modeling, and tuning the performance of GPU workloads, both inference and training.
What you'll be doing:
- Improve inference speed for Cosmos WFMs on GPU platforms.
- Drive the production deployment of Cosmos WFMs.
- Profile and analyze deep learning workloads to identify and remove bottlenecks.
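The profiling work described above can be sketched in miniature: time each stage of a pipeline and surface the slowest one. This is a hypothetical stand-in for real tools such as Nsight Systems or torch.profiler; the function name and stage names are illustrative, not from any NVIDIA tooling.

```python
import time

def profile_stages(stages):
    """Run each (name, callable) stage, record wall-clock time,
    and return the timings plus the name of the slowest stage."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    bottleneck = max(timings, key=timings.get)
    return timings, bottleneck

# Example: a deliberately heavier "inference" stage dominates.
timings, bottleneck = profile_stages([
    ("preprocess", lambda: sum(range(1_000))),
    ("inference", lambda: sum(range(5_000_000))),
])
```

In practice GPU work is asynchronous, so real profiling relies on device-side timing and kernel traces rather than host wall-clock timers, but the workflow is the same: measure per-stage cost, rank, and attack the largest contributor.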
Perks and Benefits:
Your base salary will be determined by your location, experience, and the pay of employees in similar positions. For Poland, the base salary ranges are:
- Level 3: 221,250 - 383,500 PLN
- Level 4: 292,500 - 507,000 PLN