Requirements
- BS, MS, or PhD in Computer Science, Computer Engineering, or related field
- Software engineering experience focusing on systems programming, ML infrastructure, or AI compilers
- Expertise in Python: Deep understanding of memory management and concurrent programming
- Experience with LLM Inference Engines: Hands-on experience modifying or extending frameworks like SGLang, vLLM, DeepSpeed-FastGen, or TensorRT-LLM
- PyTorch Internals: Strong experience writing PyTorch C++ extensions and custom operators (a minimal sketch follows this list)
- Hardware Interfacing: Proven track record of integrating ML workloads with accelerators using custom SDKs, APIs, or low-level drivers
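To give a sense of the extension work this role involves, below is a minimal sketch of a PyTorch C++ extension compiled inline via `torch.utils.cpp_extension.load_inline`. The op name `scaled_add` and its semantics are hypothetical, chosen only to illustrate the mechanism; it requires a working C++ toolchain.

```python
import torch
from torch.utils.cpp_extension import load_inline

# Hypothetical op: out = a + alpha * b, implemented in C++ and exposed to Python.
cpp_source = """
#include <torch/extension.h>

torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
  return a + alpha * b;
}
"""

# load_inline compiles the C++ source at import time and auto-generates
# pybind11 bindings for the listed functions.
ext = load_inline(
    name="scaled_add_ext",  # hypothetical module name
    cpp_sources=cpp_source,
    functions=["scaled_add"],
)

a = torch.randn(4)
b = torch.randn(4)
print(ext.scaled_add(a, b, 0.5))
```

Production extensions for a custom accelerator would dispatch to device-specific kernels rather than ATen CPU ops, but the registration and binding flow is the same.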
Nice to Haves
- Prior experience working on non-CUDA software ecosystems (e.g., AMD ROCm, AWS Neuron, Google XLA)
- Familiarity with AI compilers and intermediate representations (MLIR, Apache TVM, OpenAI Triton)
- Strong understanding of underlying LLM architectures (Transformers, MoE) and attention algorithms (see the sketch after this list)
- Previous experience at an AI silicon startup or working on custom accelerators (Google TPU, AWS Trainium)
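As a concrete reference for the attention algorithms mentioned above, here is a minimal scaled dot-product attention in PyTorch. The function name and tensor shapes are illustrative, and masking and KV caching, which real inference engines layer on top, are deliberately omitted.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(1, 8, 16, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```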
What You'll Be Doing
- Framework Integration: Architect and develop the backend integration for a custom AI chip in SGLang
- Custom Operator Development: Write custom C++/PyTorch extensions
- Performance Optimization: Profile and optimize end-to-end LLM inference (see the profiling sketch after this list)
- Cross-Functional Collaboration: Work closely with hardware and compiler teams
- Testing & Deployment: Build robust testing pipelines for model validation
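For the profiling work, a minimal sketch using `torch.profiler` is shown below; the `Linear` layer is a stand-in for a real LLM block, and an actual inference engine would profile full decode steps across CPU and accelerator activities.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in workload for illustration only.
model = torch.nn.Linear(4096, 4096)
x = torch.randn(8, 4096)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        model(x)

# Aggregate results and rank operators by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```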
Perks & Benefits
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.