Role overview

This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features. These evaluations are powered by both human grading and model-based auto-grading.

Additionally, this role researches and develops auto-grading methodology & infrastructure to benefit ongoing and future Apple Intelligence safety evaluations.

Producing safety evaluations that uphold Apple’s Responsible AI values requires three things. First, thoughtful data sampling, creation, and curation for evaluation datasets. Second, high-quality, detailed annotations and careful auto-grading to assess feature performance. Third, mindful analysis to understand what the evaluation means for the user experience.

This role heavily draws on applied data science, scientific investigation and interpretation, cross-functional communication and collaboration, and metrics reporting and presentation.

Responsibilities

Develop metrics for evaluation of safety and fairness risks inherent to generative AI features.

Design datasets, identify data needs, and develop creative solutions that scale and expand data coverage through human and synthetic data generation methods.

Develop auto-grading technologies and approaches for application in safety evaluations of generative AI features.

Provide technical direction and expertise to team-wide initiatives in safety auto-grading.

Use and implement data pipelines, and collaborate cross-functionally to execute end-to-end safety evaluations.

Work with highly sensitive material, including exposure to offensive and controversial content.

What you'll work on

As a member of Apple’s Responsible AI group you will be working on a wide array of new features and research in the generative AI space.

Our team is currently interested in large generative models for vision and language, with particular interest in Responsible AI, safety, fairness, robustness, explainability, and uncertainty in models.

What we're looking for

MS or PhD in Computer Science, Machine Learning, Statistics, or a related field, or an equivalent qualification acquired through other avenues.

Experience working with generative models for evaluation and/or product development, and up-to-date knowledge of common challenges and failures.

Strong engineering skills and experience in writing production-quality code in Python.

Deep experience in foundation-model-based AI programming (e.g., using DSPy to optimize foundation model prompts) and a drive to innovate in this space.

Experience working with noisy, crowd-based data labels and human evaluations.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Tags & focus areas


AI, Machine Learning, Generative AI

AIML - Machine Learning Engineer, Responsible AI
