Job Description
* Responsibilities:
* Design and develop compiler-based optimizations for the Metal backend in ML frameworks such as torch.compile for PyTorch
* Work on a cutting-edge ML inference framework project and optimize code for efficient, scalable ML inference using distributed techniques
* Implement Metal device backend features for ML training acceleration technologies
* Work with the core teams of PyTorch, JAX, or TensorFlow to provide Metal runtime and device backend support
* Tune GPU-accelerated training across products
* Perform in-depth analysis and compiler- and kernel-level optimizations to ensure the best possible performance across hardware families
* Intended deliverables:
* GPU-accelerated ML frameworks technology
* Optimized ML training across products
If this sounds of interest to you, we would love to hear from you!
Minimum Qualifications
- 3+ years of programming and problem-solving experience with C/C++/Objective-C
- Experience with distributed training or inference techniques
- Experience with GPU compute programming models and optimization techniques
- Experience with system-level programming and computer architecture
Preferred Qualifications
- Contributions to an AI framework such as PyTorch, JAX, or TensorFlow are a plus
- Experience with ML compiler stacks such as Triton, OpenXLA, or LLVM/MLIR is a plus
- Good understanding of machine learning fundamentals