Jobs.ca

Bilingual Japanese AI Evaluation Specialist.

AuraOne
1 day ago
Remote
Scott, Saskatchewan, Canada
$49 - $98/hour
Entry Level
Full-Time


TRACK

Evaluation & annotation

Aligned to AuraOne's specialist routing.

TYPE

Contractor

Remote-first specialist work, paid per accepted task.

LOCATION

Remote

Independent specialist contractor

Remote — US-eligible

About The Role

Bilingual Japanese AI Evaluation Specialist is a remote evaluation track for reviewing Japanese generalist evaluation prompts and responses against AuraOne's quality rubric. Reviewers compare paired outputs, label edge cases, and write structured feedback that the modeling team can use for retraining.

AI data reviewers help turn Japanese generalist evaluation outputs into auditable labels, rationales, and regression cases for AuraOne Human Data.

Review model outputs, label edge cases, and improve training quality across high-volume AI workflows.

Responsibilities

↳Evaluate Japanese generalist evaluation model outputs against a versioned rubric and assign severity tags.
↳Compare paired responses and pick the stronger answer with a written rationale.
↳Label hallucinations, instruction-following failures, and unsafe content with structured tags.
↳Capture ambiguous prompts and route them back to the program team for rubric updates.
↳Maintain reviewer-quality scores by calibrating against gold-standard examples each week.
↳Document recurring failure modes so the modeling team can target them in the next training run.

Requirements

↳Prior evaluation, annotation, or human-rater experience on Japanese generalist evaluation or adjacent content.
↳Comfort applying multi-page rubrics consistently across long batches.
↳Clear written reasoning that names the issue and the rubric clause being applied.
↳Strong attention to detail and the ability to flag when a prompt itself is the problem.
↳Reliable async availability for at least 10 hours per week.

EXAMPLE TASKS

↳Compare two Japanese generalist evaluation model responses to the same prompt and pick the stronger one with a rationale.
↳Tag an unsafe response with the correct policy category and severity.
↳Audit a 50-row batch for rubric consistency and report drift to the program lead.
↳Propose a rubric clarification after spotting a recurring failure mode.

NICE TO HAVE

↳Background in linguistics, content moderation, or trust & safety review.
↳Experience with inter-rater agreement metrics and calibration cycles.
↳Domain expertise that lets you spot subject-matter errors automated checks miss.

Compensation

$49–$98 / hr

Expected schedule: contractor, remote specialist work with program-defined task volume and review pacing.

Skills Used In Matching

↳Model output evaluation
↳Rubric-based annotation
↳Severity tagging
↳Inter-rater calibration
↳Japanese generalist evaluation

How To Apply

AuraOne uses a shared specialist intake to confirm track fit, review readiness, and the best queue for your profile. Applications submitted from partner job boards carry the source, role, and category on the apply URL.


About AuraOne

Technology, Information and Internet
51-200 employees

AI is becoming the new way work gets done.

Not someday. Now.

It is reviewing evidence. Training robots. Inspecting products. Evaluating risk. Helping teams make decisions faster than ever before.

But intelligence is not enough.

A model can produce an answer. A company still needs to trust the work.

That is why we built AuraOne.

AuraOne is the platform for building, testing, and running AI in the real world.

For AI labs, we turn human feedback, expert review, evaluation, and model failures into a system that improves with every run.

For enterprises, we turn private data and existing workflows into AI products that learn from the people, decisions, and outcomes inside the business.

The result is simple:

Work gets cleaner. Models get stronger. Failures are remembered. Decisions are easier to trust. And customers keep what they build.

AuraOne is not another chatbot. Not another dashboard. Not another layer of AI theater.

It is the system behind AI that makes it work.

Your work. Your data. Your AI.
