Role overview
Location - Fully remote
Contract - Independent Contractor (Short-term Project)
This is an **independent contractor role only**.
What you'll work on
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness
- Conduct fact-checking using trusted public sources and authoritative references
- Conduct accuracy testing by executing code and validating outputs using appropriate tools
- Annotate model responses by identifying strengths, areas for improvement, and factual or conceptual inaccuracies
- Assess code quality, readability, algorithmic soundness, and explanation quality
- Ensure model responses align with expected conversational behavior and system guidelines
- Apply consistent evaluation standards by following clear taxonomies, benchmarks, and detailed evaluation guidelines
What we're looking for
- You hold a BS, MS, or PhD in Computer Science or a closely related field
- You have significant (5+ years) real-world experience in software engineering or related technical roles
- You are an expert in at least two relevant programming languages (e.g., Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS)
- You are able to independently solve HackerRank or LeetCode problems at Medium and Hard difficulty
- You have experience contributing to well-known open-source projects, including merged pull requests
- You have significant experience using LLMs while coding and understand their strengths and failure modes
- You have strong attention to detail and are comfortable evaluating complex technical reasoning and identifying subtle bugs or logical flaws
Tags & focus areas
Used for matching and alerts on DevFound: Remote, AI, Data Science