Jobs.ca

ML Infrastructure Specialist

Toronto, ON
$100,600 - $125,800 per year
Senior Level
Full-Time

Top Benefits

Vacation time
Floater days
GRRSP program

About the role

Machine Learning Infrastructure Specialist

POSITION SUMMARY

As an ML Infrastructure Specialist focused on systems and scalable AI infrastructure, you will build and improve efficient, reusable systems to train, deploy, monitor, and serve large-scale machine learning models, including large language models (LLMs). Working at the intersection of applied research and production systems, you will collaborate with Vector’s AI Engineering team members, researchers, and industry partners to bring advanced AI capabilities into real-world use. You will contribute to initiatives that strengthen software and systems supporting state-of-the-art AI development and deployment, owning well-scoped projects end to end.

KEY RESPONSIBILITIES

  • Design and implement distributed systems for scalable ML training, inference, and serving on multi-GPU/multi-node environments, with a focus on large foundation models;
  • Configure and maintain LLM inference systems using modern serving frameworks (e.g., vLLM, TGI, SGLang, TritonRT-LLM), including performance tuning;
  • Collaborate with researchers and Applied ML Scientists to turn model innovations into production-ready services (i.e., containerized, tested, and observable);
  • Develop reusable, open-source-friendly modules and tooling that scale ML experimentation and deployment across diverse environments (e.g., Slurm, Kubernetes, and major cloud providers);
  • Provide code reviews, documentation, and mentorship to junior team members; collaborate with partner teams on best practices for reliability, reproducibility, CI/CD, and hardware-aware optimization;
  • Contribute to technical design discussions and roadmapping related to ML infrastructure, serving pipelines, and research-to-production workflows;
  • Present technical work via demos and deep dives; contribute to open-source where appropriate; and,
  • Other responsibilities as assigned or amended from time to time.

KEY SUCCESS MEASURES

  • Delivery of high-performance, reliable, and maintainable ML infrastructure used across teams and projects;
  • Adoption and reuse of core infrastructure components across internal applications;
  • Clear design and technical documentation materials that improve developer experience; and,
  • Effective mentorship and collaboration that improves team capability and velocity.

PROFILE OF THE IDEAL CANDIDATE

  • Bachelor's or Master’s degree in Computer Science, Electrical Engineering, or a related field; advanced degrees or relevant equivalent experience preferred;
  • 3+ years of experience developing scalable systems or infrastructure for machine learning workflows, ideally involving large-scale models and GPU workloads;
  • Deep expertise in Python and systems programming; fluency with performance profiling, distributed computing, and containerized environments (e.g., Docker, Kubernetes);
  • Experience with one or more modern LLM inference/serving frameworks (e.g., vLLM, SGLang, TritonRT-LLM);
  • Familiarity with GPU-accelerated inference, memory optimization strategies, and batching/scheduling techniques;
  • Practical knowledge of cloud platforms (e.g., GCP, AWS, Azure) and orchestration of multi-node training or serving systems; and,
  • Strong understanding of software engineering best practices including CI/CD, testing, observability, and DevOps automation.

**TOTAL REWARDS:** The expected salary for this position will be $100,600 - $125,800 per year, plus benefits if applicable. The final salary offer will reflect the successful candidate's experience, skills, and qualifications, in alignment with the Vector Institute's Compensation Policy, and may differ from the range above.

The Vector Institute’s Total Rewards approach extends beyond traditional compensation and benefits. Full-time employees are eligible for a comprehensive suite of supports that recognize and value employees, including vacation time, floater days, GRRSP, a Health Spending Account, a Summer Hours program, and flexible work arrangements.

**POSITION STATUS:** This posting is for an existing vacancy.

**USE OF ARTIFICIAL INTELLIGENCE:** Vector may use both internal and external third-party AI-based tools to assist in the screening of applications for this posting. Any data collected will be used solely for recruitment purposes and handled in accordance with Vector’s External Privacy Policy and Use of AI-Based Tools in Recruitment and Selection Policy.

**INCLUSION AND EQUAL OPPORTUNITY EMPLOYMENT:** Vector believes AI powers possibility by advancing cutting-edge research and translating it into real-world impact through collaboration with research, industry, and government. Vector is committed to fostering a diverse and inclusive culture that reflects its values.

The Vector Institute welcomes applications from all qualified candidates, including those who are Indigenous, 2SLGBTQIA+, racialized persons/visible minorities, women, and people with disabilities.

If you require an accommodation at any stage of the recruitment or selection process, please contact hr@vectorinstitute.ai. The Vector Institute team will be happy to work with you to ensure your experience is as inclusive and accessible as possible.

**JOIN OUR COMMUNITY:** Check out the Vector Institute’s Careers Page to explore open opportunities at Vector, and follow Vector on X, LinkedIn, and Bluesky to stay connected with the latest developments in Ontario's AI ecosystem and the Vector Institute.

About Vector Institute

Research Services
201-500

Vector Institute is an independent, not-for-profit corporation dedicated to research in the field of artificial intelligence (AI), excelling in machine and deep learning. We work with institutions, industry, start-ups, incubators and accelerators to advance AI research and drive its application, adoption and commercialization across Canada.

Vector launched in March 2017 with generous support from the Government of Canada, the Government of Ontario, and private industry, in partnership with the University of Toronto and other universities.

Vector prioritizes transparency. Viewers will be made aware of any AI-generated content before they listen, view or read it.
