Bachelor’s or Master’s degree in Data Science, Computer Science, Linguistics, Cognitive Science, HCI, Psychology, or a related field
5+ years of relevant job experience
Proficiency in Python for data analysis (pandas, NumPy, Jupyter, etc.)
Experience working with large datasets and designing model-evaluation pipelines, taxonomies, categorization schemes, or structured rating frameworks
Ability to interpret unstructured data (text, transcripts, user sessions) and stitch together qualitative and quantitative findings into actionable guidance
Nice-to-Haves
Experience working directly with LLMs, generative AI systems, or NLP models
Familiarity with evaluations specific to AI quality, hallucination detection, or model alignment
Experience building internal tools, scripts, or dashboards for evaluation workflows
Familiarity with prompt engineering, RAG systems, or model fine-tuning
Experience evaluating LLMs, multimodal models, or other generative AI systems at scale
Expertise in designing annotation guidelines and managing large-scale annotation projects
Background in human factors, social science, or qualitative assessment methodologies
What You'll Be Doing
Analyze AI outputs
Develop evaluation frameworks
Design qualitative
Translate findings into actionable improvements for product and engineering teams
Work cross-functionally with Engineering, Project Management, Product, and Research teams
Perks and Benefits
Opportunity to work with the Human-Centered AI team at Apple
Collaborate with Data Scientists, Researchers, and Engineers