Requirements:
- Extensive experience with end-to-end distributed data systems, especially ML-centric ones
- Previous experience as a Data Scientist on a large-scale product team or business
- Excellent Python and software engineering skills, with the ability to work in Java when needed; demonstrable experience collaborating with engineers on services
- Strong drive to solve problems for Data Scientists, with the ability to work independently in a cross-functional and cross-team environment
- Good communication skills: the ability to get a point across to non-technical audiences, back it up with data and statistical analysis, and engage and manage project stakeholders
- Strong problem-solving skills, with the ability to help refine problem statements and propose solutions that weigh effort, impact, and scalability
Nice to have:
- Apache Spark, Airflow, Iceberg, Kafka, dbt
- scikit-learn, XGBoost, MLflow, Ray, PyTorch, graph-tool (or similar)
- AWS (S3, EMR, SageMaker, Lake Formation), Terraform, Docker, GitHub CI/CD
- Knowledge Graphs (+ RAG), graph ML, probabilistic programming, A/B testing
What you'll be doing:
- Software engineering: e.g. testing and CI/CD, monitoring and alerting, disaster recovery
- MLOps: Terraform and AWS infrastructure, ML governance for hundreds of models
- Data Engineering: distributed processing at terabyte scale
- Science: prove the value of new methodologies and algorithms applied to cross-team domains, estimate and measure impact, and mentor junior members in experiment design