Requirements

Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND experience in business analytics, data science, software development, data modeling, or data engineering work
OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND experience in business analytics, data science, software development, or data engineering work
OR equivalent experience
Experience using data processing technologies for multi-modal dataset scalability, parallel processing, data handling, streaming/batch processing, etc.
Experience working with distributed computing tools such as Spark, Kubernetes, TensorFlow, Flink, and PySpark
Experience conducting research in machine learning or working as an ML engineer, MLOps engineer, or software engineer (SWE)
Experience designing and developing data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video) and building infrastructure to support this work from the ground up

Responsibilities

Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
Partner with the pretraining and post-training teams to improve the data recipe through experimentation
Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models
Embody Microsoft's culture and values

Benefits and perks may vary depending on the nature of employment with Microsoft and the country of work.