PhD or MS degree in Computer Science, Electrical Engineering, Computer Engineering, or a related technical field.
Extensive experience in system-level programming and optimization, with a deep understanding of computer architecture, interconnect fabrics, memory systems, and toolchains.
Proficiency in designing, evaluating, and optimizing hardware systems for machine learning applications, with specific knowledge of machine learning accelerators.
Strong proficiency in programming languages such as Python and C, and experience with machine learning frameworks such as PyTorch2.
Understanding of machine learning algorithms, data structures, and software-hardware interaction principles.
Experience with parallel programming, multithreading, and synchronization.
Ability to solve complex problems in software systems and optimize code for performance.
Strong communication skills, with the ability to work collaboratively in a multi-disciplinary team environment.
What you'll be doing:
Design and optimize software and hardware systems to improve the performance of machine learning workloads.
Collaborate with cross-functional teams to develop scalable, efficient, high-performance machine learning solutions that use advanced machine learning accelerators.
Develop and optimize toolchains that improve the efficiency of machine learning models on specialized hardware.
Analyze and optimize the interconnects between different hardware components to minimize latency and improve throughput in machine learning applications.
Conduct in-depth performance analysis, identify bottlenecks, and implement solutions at both the software and hardware layers.
Stay abreast of the latest advancements in machine learning technologies, toolchains, computer architecture, and hardware accelerators to drive innovation within the company.