About the role
LLMOps Lead (Canada)
Location: Remote / Toronto (Canada)
Level: Senior Engineer / Staff Engineer (depending on experience)
Role Overview
We are seeking an experienced LLMOps Lead to architect, manage, and scale the operational infrastructure behind our large language model workflows. You will be responsible for building robust pipelines, monitoring systems, and tooling that empower our AI teams to deploy, test, and iterate models rapidly and reliably.
Key Responsibilities
- Design, build, and maintain LLM infrastructure (model serving, versioning, rollout, inference pipelines)
- Automate model training, fine-tuning, evaluation, and retraining workflows
- Implement observability, monitoring, logging, and alerting around model performance, drift, latency, and reliability
- Work cross-functionally with ML research, product, and software engineering to integrate LLMs into production services
- Optimize cost, scaling, caching, batching, and hardware utilization for inference
- Lead best practices in MLOps: reproducibility, infrastructure as code, model lineage, experiment tracking
- Mentor junior engineers; set engineering standards and guidelines for LLM operations
Required Skills & Experience
- Master's or PhD in Computer Science, Machine Learning, or a related field (Waterloo / U of Toronto / Queen's preferred)
- 5+ years of backend / MLOps / infrastructure experience, with 2+ years working specifically with large language models or deep learning production systems
- Deep familiarity with frameworks such as PyTorch, TensorFlow, Transformers, Hugging Face, etc.
- Experience with model serving platforms (e.g., Triton, TorchServe, KServe (formerly KFServing), TensorFlow Serving)
- Strong knowledge of distributed systems, containerization (Docker), orchestration (Kubernetes), and cloud infrastructure (AWS, GCP, Azure)
- Hands-on experience with workflow orchestration and ML lifecycle tools (Airflow, Kubeflow, MLflow, etc.)
- Proven track record of building scalable, reliable, low-latency inference systems
- Familiar with cost optimization strategies for model inference (quantization, pruning, batching, caching)
- Excellent communication, problem-solving, debugging, and leadership skills
- Energetic and proactive, with the ability to adapt in a fast-paced, startup-style environment
Nice-to-Have
- Experience with agentic systems, autonomous AI, or multi-agent coordination
- Background in prompt engineering, reinforcement learning from human feedback (RLHF)
- Exposure to edge / on-device inference
- #IT2025