About the role
Who you are
- Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
- 2+ years of experience in computational linguistics, language data processing, or AI data creation
- Experience with language data annotation systems and other forms of data markup
- Proficient with scripting languages, such as Python
- Experience working with speech, text, and multimodal data in multiple languages
- Excellent communication and organizational skills, with strong attention to detail
- Comfortable working in a fast-paced, highly collaborative, dynamic work environment
- PhD in Computational Linguistics (or equivalent field with computational emphasis)
- Expertise in bootstrapping AI data collections for quickly evolving requirements
- Extensive experience working with speech, text, and multimodal data in multiple languages
- Experience in data creation for complex agentic workflows
- Practical experience with Machine Learning
- Familiarity with technical concepts such as APIs
- Practical knowledge of version control and agile development
- Familiarity with database queries and data analysis processes (SQL, R, Matlab, etc.)
- Willingness to support several projects at one time, and to accept reprioritization as necessary
- Able to think creatively, with strong analytical and problem-solving skills
What the job involves
- The Amazon Artificial General Intelligence (AGI) Data Services organization is responsible for developing diverse datasets to train and evaluate Amazon's AI models
- We are looking for Language Engineers to join our science and engineering team to support the development of complex, multimodal datasets, using a range of approaches including synthetic data generation, model-supported data generation, and human-in-the-loop data collections
- You will play a critical role in driving innovation and advancing the state of the art in evaluating and training AI models
- You will work closely with cross-functional teams, including product managers, engineers, and data scientists, to ensure that our AI systems are best in class
- Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
- Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
- Analyze and extract insights from large amounts of data
- Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
- Use modeling tools to bootstrap or test new AI functionalities
- Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models
About Amazon
Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. We are driven by the excitement of building technologies, inventing products, and providing services that change lives. We embrace new ways of doing things, make decisions quickly, and are not afraid to fail. We have the scope and capabilities of a large company, and the spirit and heart of a small one.
Together, Amazonians research and develop new technologies from Amazon Web Services to Alexa on behalf of our customers: shoppers, sellers, content creators, and developers around the world.
Our mission is to be Earth's most customer-centric company. Our actions, goals, projects, programs, and inventions begin and end with the customer top of mind.
You'll also hear us say that at Amazon, it's always "Day 1." What do we mean? That our approach remains the same as it was on Amazon's very first day - to make smart, fast decisions, stay nimble, invent, and focus on delighting our customers.