Role overview
About The Job
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, Mercor is backed by investors including Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: AI Task Evaluation & Statistical Analysis Specialist
Type: Contract
Compensation: $100–$120/hour
Location: Remote
Role Responsibilities
- Conduct comprehensive statistical failure analysis to identify patterns in AI agent failures across task components such as prompts, rubrics, and templates.
- Perform root cause analysis to determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations.
- Analyze performance variations across finance sub-domains, file types, and task categories to enhance understanding of AI model performance.
- Create dashboards and reports to highlight failure clusters, edge cases, and improvement opportunities.
- Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings.
- Present insights to data labeling experts and technical teams to foster collaboration and drive improvements.
What we're looking for
- Experience with AI/ML model evaluation or quality assurance.
- Background in finance or willingness to learn finance domain concepts.
- Experience with multi-dimensional failure analysis.
- Familiarity with benchmark datasets and evaluation frameworks.
- 2–4 years of relevant experience.
Interview process
- Upload resume
- AI interview based on your resume
- Submit form
- For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
- For any help or support, reach out to: support@mercor.com
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.