Job Description
What is the opportunity?
This role offers a unique chance to pioneer the integration of Generative AI and machine learning (ML) into Site Reliability Engineering (SRE), driving transformative improvements in system reliability, efficiency, and scalability. You will work at the intersection of AI/ML innovation and cloud-native infrastructure, addressing critical challenges like anomaly detection, incident prediction, and automation. By leveraging cutting-edge technologies, you will empower organizations to minimize downtime, enhance observability, and optimize operational workflows, directly impacting business continuity and performance.
What will you do?
- Design and deploy end-to-end AI/ML solutions to solve SRE challenges (e.g., log analysis, auto-remediation, and predictive maintenance).
- Develop models using supervised/unsupervised learning and Generative AI tools (e.g., LLMs, text-generation frameworks) to improve system resilience.
- Fine-tune models, engineer prompts, and integrate AI solutions with SRE tooling (monitoring systems, CI/CD pipelines).
- Collaborate with SRE, DevOps, and data science teams to scale solutions across cloud platforms (OCP, Azure).
- Translate AI insights into strategies for reducing downtime, automating tasks, and aligning with SRE principles (SLOs, error budgets).
- Build and maintain ML pipelines using Python, TensorFlow, PyTorch, and OpenAI APIs.
- Evaluate emerging AI technologies to advance reliability engineering practices.
What do you need to succeed?
- Technical Expertise: Strong experience in ML/Generative AI, Python, and frameworks like TensorFlow, PyTorch, or OpenAI APIs.
- SRE Knowledge: Familiarity with SRE concepts (SLOs, error budgets) and cloud-native environments (OCP, Azure).
- Problem-Solving Skills: Ability to address complex reliability challenges with AI-driven solutions.
- Collaboration: Effective teamwork with cross-functional teams (SRE, DevOps, data science).
- Innovation: Passion for exploring emerging AI technologies and advocating for novel approaches.
- Operational Focus: Commitment to ensuring scalable, production-ready deployments and optimizing model performance.
Must haves:
- Proven expertise in machine learning (ML) and Generative AI: Hands-on experience with frameworks like TensorFlow, PyTorch, or Hugging Face, and tools such as OpenAI APIs or LLMs.
- Strong programming skills in Python: Proficiency in developing and deploying ML models and pipelines.
- SRE/DevOps fundamentals: Familiarity with Site Reliability Engineering principles (e.g., SLOs, error budgets) and cloud-native infrastructure (OCP, Azure).
- Model deployment and scalability: Experience operationalizing ML models in production environments, including monitoring, maintenance, and optimization.
- Collaborative problem-solving: Ability to work with cross-functional teams (SRE, DevOps, data science) to translate technical insights into actionable solutions.
- Data analysis and engineering: Skills in preprocessing data, feature engineering, and working with large-scale datasets.
Nice to haves:
- Prompt engineering and fine-tuning: Experience optimizing Generative AI models (e.g., LLMs) for domain-specific tasks.
- MLOps/AIOps tools: Familiarity with ML pipeline orchestration (e.g., Kubeflow, MLflow) and SRE tooling (e.g., Prometheus, Kubernetes).
- Anomaly detection/time-series analysis: Prior work in predictive maintenance, incident forecasting, or log analysis for infrastructure systems.
- Open-source contributions: Active participation in AI/ML or SRE-related open-source projects.
- Cloud certifications: Advanced credentials (e.g., AWS Certified Machine Learning - Specialty, Google Cloud Professional Machine Learning Engineer).
- Domain knowledge in observability: Experience with tools like Grafana, ELK Stack, or Splunk for enhancing system visibility.
What’s in it for you?
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- Flexible work/life balance options
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Access to a variety of job opportunities across business and geographies
Job Skills
Agile Methodology, Group Problem Solving, IT Systems Integration, Organizational Leadership, Product Services, Software Development Life Cycle (SDLC), System Applications, System Integration Testing (SIT), Systems Software
Additional Job Details
Address:
RBC WATERPARK PLACE, 88 QUEENS QUAY W, TORONTO
City:
Toronto
Country:
Canada
Work hours/week:
37.5
Employment Type:
Full time
Platform:
TECHNOLOGY AND OPERATIONS
Job Type:
Regular
Pay Type:
Salaried
Posted Date:
2025-10-23
Application Deadline:
2025-10-31
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.
Inclusion and Equal Opportunity Employment
At RBC, we believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.
Join our Talent Community
Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and recruitment events that matter to you.
Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.
About RBC
Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. Our success comes from the 94,000+ employees who leverage their imaginations and insights to bring our vision, values and strategy to life so we can help our clients thrive and communities prosper. As Canada's biggest bank and one of the largest in the world, based on market capitalization, we have a diversified business model with a focus on innovation and providing exceptional experiences to our more than 17 million clients in Canada, the U.S. and 27 other countries. Learn more at rbc.com. We are proud to support a broad range of community initiatives through donations, community investments and employee volunteer activities. See how at www.rbc.com/community-social-impact.