Senior Software Engineer | Up to $80/hr
About the role
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: Conversational AI Systems Evaluator
Type: Full-time or Part-time Contract Work
Compensation: $45–$80/hour
Location: Remote
Role Responsibilities
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness.
- Conduct fact-checking using trusted public sources and authoritative references.
- Execute code and validate outputs using appropriate tools to ensure accuracy.
- Annotate model responses by identifying strengths, areas of improvement, and factual or conceptual inaccuracies.
- Assess code quality, readability, algorithmic soundness, and explanation quality.
- Ensure model responses align with expected conversational behavior and system guidelines.
- Apply consistent evaluation standards by following clear taxonomies, benchmarks, and detailed evaluation guidelines.
Qualifications
Must-Have
- BS, MS, or PhD in Computer Science or a closely related field
- Significant real-world experience in software engineering or related technical roles
- Expertise in at least one relevant programming language (e.g., Python, Java, C++, JavaScript, Go, Rust)
- Ability to independently solve HackerRank or LeetCode Medium- and Hard-level problems
- Experience contributing to well-known open-source projects, including merged pull requests
- Significant experience using LLMs while coding and understanding their strengths and failure modes
- Strong attention to detail and comfort evaluating complex technical reasoning, identifying subtle bugs or logical flaws
Preferred
- Prior experience with RLHF, model evaluation, or data annotation work
- Track record in competitive programming
- Experience reviewing code in production environments
- Familiarity with multiple programming paradigms or ecosystems
- Experience explaining complex technical concepts to non-expert audiences
Application Process (Takes 20–30 mins to complete)
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
- For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
- For any help or support, reach out to: support@mercor.com
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.