About the role
About The Job
Mercor connects elite creative and technical talent with leading AI research labs. We are headquartered in San Francisco, and our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: SWE Expert
Type: Contract
Compensation: $70–$150/hour
Role Responsibilities
- Convert high-level objectives into tightly scoped, testable deliverables with clear inputs/outputs and measurable success criteria.
- Create structured documentation defining expected behavior, constraints, and edge cases for reuse by other evaluators.
- Build lightweight automation scripts to support evaluation flows, such as generating required artifacts and validating outputs.
- Write deterministic Python verifier scripts for completion checks via final state or output validation.
- Design prompts/tasks to reliably elicit target workflow behavior while avoiding leakage of internal instructions.
- Implement robust error handling and actionable failure messages in verification tooling.
- Develop plausible but ineffective “baseline” or “distractor” approaches to confirm that the evaluation distinguishes correct solutions from incorrect ones.
- Maintain clean artifact hygiene with versionable structure, consistent naming, and reproducible execution.
Qualifications
Must-Have
- Strong Python skills in file system operations, parsing, validation, and deterministic execution.
- Experience with evaluation harnesses, automated grading, or QA-style verification.
- Familiarity with prompt design and LLM evaluation methodologies.
- Comfort with structured specs and documentation conventions like Markdown and YAML.
- Working knowledge of Git, CLI workflows, virtual environments, and dependency management.
Preferred
- Knowledge of embeddings/similarity concepts like cosine similarity for negative-control design.
- Ability to communicate clearly and control scope without relying on domain-specific context.
Application Process (Takes 20–30 mins to complete)
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
- For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
- For any help or support, reach out to: support@mercor.com
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.