5+ years' experience as an ML Engineer or Data Scientist
5+ years' experience writing production code in Python
Experience with tools such as Git, Docker, Kubernetes and CircleCI
Experience productionising Generative AI workstreams or Agentic AI projects
You know the fundamentals of Generative AI and have a good understanding of the science behind it, including LLMs, VLMs, transformers and fine-tuning techniques
A robust understanding of traditional ML and its evaluation techniques, and familiarity with current research and developments in Generative AI evaluation
You get satisfaction from seeing your work shipped and from driving measurable impact for the business
Experience mentoring others in the team
You have a bias towards simplicity and care most about achieving impact
Nice to Haves
Experience with evaluation harnesses and frameworks for Generative AI
Experience with observability, monitoring, and safety techniques for deployed GenAI systems
Experience with statically typed languages such as Go
What you'll be doing
Developing, assessing and evaluating agentic systems built with a variety of LLM providers, including OpenAI, Anthropic & Google
Researching and implementing new GenAI evaluation techniques
Contributing to software packages and evaluation harnesses that allow downstream teams to build their own evaluation datasets and metrics
Developing shared tooling to support this work, including labeling UIs and their supporting tools
Defining best practices for GenAI evaluations across the business and working closely with downstream teams and product partners to align evaluation strategies with business needs
Perks and Benefits
Benefits vary by country and include healthcare, well-being support, parental leave, pensions and generous annual leave allowances
Time off to support a charitable cause of your choice
Commitment to diversity, equity, and inclusion in all aspects of the hiring process