
Data Scientist Fully Remote

Mercor · $208k–$249k

Actively hiring · Posted 24 days ago

Role overview

**About The Job**
**Mercor** connects elite creative and technical talent with leading AI research labs. Mercor is headquartered in San Francisco, and our investors include **Benchmark**, **General Catalyst**, **Peter Thiel**, **Adam D'Angelo**, **Larry Summers**, and **Jack Dorsey**.


**Position:** AI Task Evaluation & Statistical Analysis Specialist

**Type:** Contract

**Compensation:** $100–$120/hour

**Location:** Remote

**Role Responsibilities**

* Conduct comprehensive statistical failure analysis to identify patterns in AI agent failures across task components such as prompts, rubrics, and templates.
* Perform root cause analysis to determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations.
* Analyze performance variations across finance sub-domains, file types, and task categories to enhance understanding of AI model performance.
* Create dashboards and reports to highlight failure clusters, edge cases, and improvement opportunities.
* Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings.
* Present insights to data labeling experts and technical teams to foster collaboration and drive improvements.

**Qualifications**
**Must-Have**

* Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition.
* Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis.
* Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets.
* AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics.
* Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL.

**Preferred**

* Experience with AI/ML model evaluation or quality assurance.
* Background in finance or willingness to learn finance domain concepts.
* Experience with multi-dimensional failure analysis.
* Familiarity with benchmark datasets and evaluation frameworks.
* 2–4 years of relevant experience.

**Application Process (Takes 20–30 mins to complete)**

* Upload resume
* AI interview based on your resume
* Submit form

**Resources & Support**

* For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
* For any help or support, reach out to: support@mercor.com

*PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.*

Tags & focus areas

Used for matching and alerts on DevFound
Part-time · Remote · AI · Data Science · Generative AI