Senior MLOps Engineer
About the role
- You will own and evolve the infrastructure that powers our ML pipelines – from cloud environments and CI/CD systems to workflow orchestration and model deployment
- You will work closely with ML scientists, bioinformaticians, and software engineers to keep our platform reliable, reproducible, and scalable
- Maintain and improve cloud infrastructure on GCP using Infrastructure-as-Code tools (Terraform)
- Manage IAM, RBAC, and permission policies across cloud environments
- Own and evolve CI/CD pipelines (CircleCI, GitHub Actions) and ensure best practices are followed across the engineering and ML teams
- Administer and support workflow orchestration platforms (e.g., Seqera/Nextflow, Argo, Kubeflow)
- Operate and configure ML experiment tracking and registry tooling (e.g., W&B, MLflow)
- Build and maintain containerized environments (Docker) and manage Kubernetes clusters
- Manage GPU resources – provisioning, scheduling, and debugging hardware and driver issues
- Write and maintain Python tooling, scripts, and integrations that support ML infrastructure
- Help deploy ML models to production environments and monitor their performance
If this sounds like you, we would love to hear from you
- You have 4+ years of experience in production infrastructure or MLOps, you write solid Python, and you are curious about the ML and scientific workflows your work supports
- You are someone who enjoys keeping the infrastructure running smoothly so that scientists can focus on their research
- Above all, you are a collaborative, kind team member who communicates clearly, adapts to evolving needs, and is happy to help colleagues grow their own infrastructure skills along the way
- You are comfortable working across cloud platforms, CI/CD systems, containers, and GPUs – and you take pride in making these systems reliable and easy for others to use
- Extensive hands-on experience with Kubernetes and containerization (Docker)
- Familiarity with Python package and environment management (e.g., pip, conda, pixi)
- Strong Python programming skills
- Experience managing GPU compute (provisioning, debugging, driver management)
- 4+ years of experience operating production infrastructure
- Proficiency with cloud platforms (GCP preferred; AWS/Azure acceptable) and Infrastructure-as-Code (Terraform)
- Self-motivated problem solver with excellent communication skills
- Solid background in CI/CD systems (CircleCI, GitHub Actions, or similar)
- Understanding of ML frameworks (e.g., PyTorch, PyTorch Lightning), ML workflows (training, inference, evaluation), and the model lifecycle
- Familiarity with Kubernetes CRDs and batch/gang schedulers (e.g., Volcano, Kueue)
- Experience working with large-scale datasets (storage, versioning, efficient access patterns)
- Experience working directly with scientists and researchers in an interdisciplinary setting
- Knowledge of biology and/or machine learning science
- Familiarity with data compliance and governance frameworks (e.g., HIPAA, SOC 2)
- Previous startup experience
- Familiarity with MLOps tooling (e.g., W&B, Ray, Vertex AI) and distributed compute patterns (e.g., DDP, real-time/batch inference, multi-node training)
About Deep Genomics
Deep Genomics is using artificial intelligence to build a new universe of life-saving genetic therapies.
The future of medicine will rely on artificial intelligence, because biology is too complex for humans to understand. At Deep Genomics, our geneticists, molecular biologists and chemists develop new ways of detecting and treating disease using our biologically accurate artificial intelligence technology.