Role overview
Design and develop machine learning and LLM-based solutions for ML model and system evaluation use cases such as:
Automatic large-scale data generation
Automatic UI and non-UI test evaluation
Running evaluation jobs at scale
Building and optimizing LLM judges
Intelligent log summarization and anomaly detection
Fine-tune or prompt-engineer foundation models (e.g., Apple, GPT, Claude) for evaluation-specific applications
Collaborate with QA teams to integrate models into testing frameworks
Continuously evaluate and improve model performance through A/B testing, human feedback loops, and retraining
Monitor advances in LLMs and NLP and propose innovative applications within the ML evaluation domain
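The LLM-judge responsibility above can be illustrated with a minimal sketch. This is not Apple's framework; the names (`JudgeResult`, `build_judge_prompt`, `evaluate_outputs`, `judge_fn`) are hypothetical, and the judge itself is abstracted as any callable that maps a prompt to a numeric score (in practice, a wrapper around a hosted LLM API):

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of an LLM-judge harness; names and structure are
# illustrative, not from any specific evaluation framework.

@dataclass
class JudgeResult:
    passed: bool
    score: float
    rationale: str

def build_judge_prompt(criterion: str, output: str) -> str:
    """Render a rubric-style prompt asking the judge for a 0-10 score."""
    return (
        f"Rate the following output against this criterion: {criterion}\n"
        f"Output:\n{output}\n"
        "Respond with a single number from 0 to 10."
    )

def evaluate_outputs(
    outputs: List[str],
    criterion: str,
    judge_fn: Callable[[str], float],
    threshold: float = 7.0,
) -> List[JudgeResult]:
    """Score each candidate output with the judge and apply a pass threshold."""
    results = []
    for out in outputs:
        score = judge_fn(build_judge_prompt(criterion, out))
        results.append(
            JudgeResult(
                passed=score >= threshold,
                score=score,
                rationale=f"score={score} vs threshold={threshold}",
            )
        )
    return results
```

In a real deployment the pass/fail results would feed the A/B testing and human-feedback loops mentioned above, with judge scores periodically calibrated against human labels.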
What we're looking for
3+ years of proven experience in machine learning, including hands-on work with LLMs.
Strong programming skills in Python and experience with ML/NLP libraries
Experience building or fine-tuning LLMs for software engineering tasks
Understanding of prompt engineering and retrieval-augmented generation (RAG)
Experience developing LLM-based automated evaluation frameworks
Excellent knowledge of software testing methodologies & practices
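For the RAG requirement above, a minimal sketch of the retrieve-then-prompt pattern follows. All names (`retrieve`, `build_rag_prompt`) are illustrative, and word overlap stands in for a real embedding-based retriever purely to keep the example self-contained:

```python
from typing import List

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query.

    Word overlap is a toy stand-in for embedding similarity; a production
    retriever would use a vector index over document embeddings.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Augment the query with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )
```

The same pattern applies to evaluation work: retrieving relevant logs or test specifications as context before asking an LLM to summarize failures or judge an output.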
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.