MSc or PhD in CS, EE, or CSEE, or equivalent experience.
Strong background in Deep Learning.
Strong programming skills in Python and PyTorch.
Experience with inference optimization techniques (such as quantization) and with at least one inference optimization framework: TensorRT, TensorRT-LLM, vLLM, or SGLang.
Nice to haves:
Familiarity with deploying Deep Learning models in production settings (e.g., Docker, Triton Inference Server).
CUDA programming experience.
Familiarity with diffusion models.
Proven experience analyzing, modeling, and tuning the performance of GPU workloads, for both inference and training.
What you'll be doing:
Improve inference speed for Cosmos WFMs on GPU platforms.
Carry out the production deployment of Cosmos WFMs.
Profile and analyze deep learning workloads to identify and remove bottlenecks.