Job Description
Summary
Description
- Designing and implementing semi-supervised and self-supervised representation learning techniques to get more value from both limited labeled data and large-scale unlabeled data.
- Developing evaluation protocols centered on the end-to-end user experience, with a focus on anticipating potential failure modes, edge cases, and anomalies.
- Employing data selection techniques such as novelty detection, active learning, and core-set selection for diverse data types like images, 3D models, natural language, and audio.
- Uncovering patterns in data, setting performance targets, and using modern statistical and ML-based methods to model data distributions. This will aid in reducing redundancy and addressing out-of-distribution samples.
Minimum Qualifications
- Demonstrated expertise in machine learning, with a passion for data-centric approaches.
- Experience with natural language processing (NLP) and large language models, such as BERT, GPT, or other Transformer-based models.
- Strong programming skills and hands-on experience with the following languages or deep learning frameworks: Python, PyTorch, or JAX.
- Strong problem-solving and communication skills.
- 5+ years of experience developing and evaluating ML applications, and demonstrated experience in understanding and improving data quality.
- MS degree in Machine Learning, Natural Language Processing, Computer Vision, Data Science, Statistics, or related areas.
Preferred Qualifications
- Ph.D. preferred.
- Demonstrated publication record in relevant conferences (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR).
- Up-to-date knowledge of emerging trends in LLMs.