Job Description
Summary
We are looking for someone who thrives on collaboration and wants to push the boundaries of what is possible today! The Video Computer Vision org is a centralized applied research and engineering organization responsible for developing real-time, on-device Computer Vision and Machine Perception technologies across Apple products. We balance research and product to deliver Apple-quality, state-of-the-art experiences. We innovate across the full stack and partner with HW, SW, and ML teams to influence the sensor and silicon roadmap that brings our vision to life.
Description
• Conduct research and development on multimodal large language models, focusing on exploring and utilizing diverse data modalities
• Design, implement, and evaluate algorithms and models to enhance the performance and capabilities of our AI systems
• Collaborate with cross-functional teams, including researchers, data scientists, and software engineers, to translate research into practical applications
• Stay up-to-date with the latest advancements in AI, machine learning, and computer vision, and apply this knowledge to drive innovation within the company
Minimum Qualifications
- Experience developing, training, and tuning multimodal LLMs.
- Programming skills in Python and C++.
- Bachelor's degree and a minimum of 3 years of relevant industry experience.
Preferred Qualifications
- Expertise in one or more of: computer vision, NLP, multimodal fusion, Generative AI.
- Experience with at least one deep learning framework, such as JAX or PyTorch.
- Publication record in relevant venues.
- PhD in Computer Science, Electrical Engineering, or a related field with a focus on AI, machine learning, or computer vision.