About the role
*Please note: our roles are open to hybrid work at our Montreal and Quebec City locations, as well as to remote work across the province of Quebec.
Are you ready to play a key role in simplifying the deployment of Machine Learning models?
Are you passionate about cloud-native technologies, automation, and developer experience? Coveo is looking for a Senior Developer to join our ML Model Training team! Your mission? Build and evolve the infrastructure that powers thousands of model rebuilds every day, enabling our Data Scientists and Applied Scientists to train their models at scale, reliably, and efficiently.
You’ll focus on simplifying the ML model development experience, designing tools and systems that abstract away complexity while giving internal users the visibility and control they need to iterate with confidence. Your work will directly impact how fast, how often, and how safely models are trained across Coveo’s AI ecosystem.
Here’s what you’ll be responsible for:
- Design simple, powerful interfaces and tools that enable scientists to configure and launch training jobs with minimal friction, whether for prototyping or production.
- Develop smart orchestration and automation mechanisms to prioritize, batch, retry, or roll back training jobs at massive scale.
- Champion performance and cost optimization, helping the organization manage compute usage responsibly without sacrificing velocity or quality.
- Implement robust observability layers so users can monitor performance, track metrics, and debug model training workflows.
- Collaborate with applied scientists and data engineers to understand their needs, improve developer experience, and continuously raise the bar on reliability and efficiency.
Here is what will qualify you for the role:
- 8+ years of backend or platform engineering experience, with a strong focus on cloud-native and distributed systems (Java, Python, AWS preferred).
- Deep understanding of scalable system design, CI/CD, and container orchestration (Kubernetes, ECS, or similar).
- Passion for developer experience: you care about ergonomics and eliminating friction for internal users.
- A problem-solving mindset, with the resourcefulness to analyze, optimize, and debug large-scale systems while continuously embracing a growth-oriented approach.
Here is what would make you stand out:
- Familiarity with Terraform & Kubernetes for infrastructure automation and container orchestration.
- Experience building ML infrastructure or internal platforms used by data science teams.
- Hands-on experience with job orchestration, task queues, or pipelines at scale.
- Solid grasp of observability practices (logs, metrics, traces), and how to build systems that are easy to monitor and debug.
Do you think you can bring this role to life? Send us your application. We want to get to know you!
Join the Coveolife!
We encourage all qualified candidates to apply regardless of, for example, age, gender, disability, gaps in CV, or national or ethnic background. We know that applying for a new role is a lot of work, and we really appreciate your time.
About Coveo
Coveo powers the digital experiences of the world’s most innovative brands, serving millions of people across billions of interactions. After a decade of enriching our market-leading platform with forward-thinking global enterprises, we know what it takes to gain a trusted AI-experience advantage.
We strongly believe that the future is business-to-person and that experience is today’s competitive front line, a make-or-break factor for every business.
For enterprises to achieve this AI-experience advantage at scale, it is imperative to have an Enterprise Spinal and composable ability to deliver AI semantic search and generative experiences at each customer and employee interaction.