
Lead Engineer, Generative AI and Machine Learning

RBC
7 days ago
Urgently Hiring
Toronto, ON
Senior Level
Full-time

About the role

Job Description

What is the opportunity?

This role offers a unique chance to pioneer the integration of Generative AI and machine learning (ML) into Site Reliability Engineering (SRE), driving transformative improvements in system reliability, efficiency, and scalability. You will work at the intersection of AI/ML innovation and cloud-native infrastructure, addressing critical challenges like anomaly detection, incident prediction, and automation. By leveraging cutting-edge technologies, you will empower organizations to minimize downtime, enhance observability, and optimize operational workflows, directly impacting business continuity and performance.

What will you do?

  • Design and deploy end-to-end AI/ML solutions to solve SRE challenges (e.g., log analysis, auto-remediation, and predictive maintenance).
  • Develop models using supervised/unsupervised learning and Generative AI tools (e.g., LLMs, text-generation frameworks) to improve system resilience.
  • Fine-tune models, engineer prompts, and integrate AI solutions with SRE tooling (monitoring systems, CI/CD pipelines).
  • Collaborate with SRE, DevOps, and data science teams to scale solutions across cloud platforms (OCP, Azure).
  • Translate AI insights into strategies for reducing downtime, automating tasks, and aligning with SRE principles (SLOs, error budgets).
  • Build and maintain ML pipelines using Python, TensorFlow, PyTorch, and OpenAI APIs.
  • Evaluate emerging AI technologies to advance reliability engineering practices.

What do you need to succeed?

  • Technical Expertise: Strong experience in ML/Generative AI and Python, with frameworks and APIs such as TensorFlow, PyTorch, or OpenAI APIs.
  • SRE Knowledge: Familiarity with SRE concepts (SLOs, error budgets) and cloud-native environments (OCP, Azure).
  • Problem-Solving Skills: Ability to address complex reliability challenges with AI-driven solutions.
  • Collaboration: Effective teamwork with cross-functional teams (SRE, DevOps, data science).
  • Innovation: Passion for exploring emerging AI technologies and advocating for novel approaches.
  • Operational Focus: Commitment to ensuring scalable, production-ready deployments and optimizing model performance.

Must haves:

  • Proven expertise in machine learning (ML) and Generative AI: Hands-on experience with frameworks like TensorFlow, PyTorch, or Hugging Face, and tools such as OpenAI APIs or LLMs.
  • Strong programming skills in Python: Proficiency in developing and deploying ML models and pipelines.
  • SRE/DevOps fundamentals: Familiarity with Site Reliability Engineering principles (e.g., SLOs, error budgets) and cloud-native infrastructure (OCP, Azure).
  • Model deployment and scalability: Experience operationalizing ML models in production environments, including monitoring, maintenance, and optimization.
  • Collaborative problem-solving: Ability to work with cross-functional teams (SRE, DevOps, data science) to translate technical insights into actionable solutions.
  • Data analysis and engineering: Skills in preprocessing data, feature engineering, and working with large-scale datasets.

Nice to haves:

  • Prompt engineering and fine-tuning: Experience optimizing Generative AI models (e.g., LLMs) for domain-specific tasks.
  • MLOps/AIOps tools: Familiarity with ML pipeline orchestration (e.g., Kubeflow, MLflow) and SRE tooling (e.g., Prometheus, Kubernetes).
  • Anomaly detection/time-series analysis: Prior work in predictive maintenance, incident forecasting, or log analysis for infrastructure systems.
  • Open-source contributions: Active participation in AI/ML or SRE-related open-source projects.
  • Cloud certifications: Advanced credentials (e.g., AWS Machine Learning Specialty, Google Cloud AI Engineer).
  • Domain knowledge in observability: Experience with tools like Grafana, ELK Stack, or Splunk for enhancing system visibility.

What’s in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Ability to make a difference and lasting impact
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • Flexible work/life balance options
  • Opportunities to do challenging work
  • Opportunities to take on progressively greater accountabilities
  • Access to a variety of job opportunities across business and geographies

About RBC

Industry: Banking
Company size: 10,000+ employees

Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. Our success comes from the 94,000+ employees who leverage their imaginations and insights to bring our vision, values and strategy to life so we can help our clients thrive and communities prosper. As Canada's biggest bank and one of the largest in the world, based on market capitalization, we have a diversified business model with a focus on innovation and providing exceptional experiences to our more than 17 million clients in Canada, the U.S. and 27 other countries. Learn more at rbc.com. We are proud to support a broad range of community initiatives through donations, community investments and employee volunteer activities. See how at www.rbc.com/community-social-impact.

http://rbc.com/legalstuff.
