About the role
What is the Opportunity?
We are seeking an experienced Senior Site Reliability Engineer and Systems Specialist to join our team and ensure the stability, reliability, and performance of our mission-critical application.
The ideal candidate will possess a strong technical background in Linux administration, scripting, automation, and database management. This is a critical role that requires a high degree of technical expertise, attention to detail, and excellent problem-solving skills.
What will you do?
- Provide expert-level support and maintenance for our mission-critical application, ensuring high availability and performance
- Collaborate with cross-functional teams to identify and resolve technical issues, and implement preventative measures to minimize downtime
- Develop and maintain automation scripts using Python or shell scripting to streamline application maintenance and deployment tasks
- Design and implement DevOps/SRE automation solutions to improve application reliability, scalability, and efficiency
- Administer and troubleshoot Linux-based systems, including configuration, security, and performance optimization
- Develop and maintain SQL scripts to support data analysis, reporting, and application functionality
- Participate in on-call rotations to provide 24/7 support for critical application issues
- Collaborate with development teams to ensure smooth deployment of new features and updates
- Develop and maintain technical documentation to support application maintenance and troubleshooting
Reliability & Performance Engineering:
- Design, implement, and maintain scalable systems with high availability, reliability, and performance.
- Define and monitor SLAs, SLOs, and SLIs; drive observability improvements.
- Conduct capacity planning, performance tuning, and system optimization.
- Develop and implement disaster recovery and business continuity strategies.
Automation & Infrastructure as Code:
- Develop and maintain Infrastructure as Code (IaC) using tools such as Copilot and RBC Assist.
- Build automation for CI/CD pipelines to streamline software delivery and deployment.
- Automate routine operational tasks to improve efficiency and reduce human error.
- Create and maintain reliable deployment processes, including blue-green and canary releases.
Monitoring, Incident Response & Root Cause Analysis:
- Own on-call responsibilities and develop processes to reduce alert fatigue.
- Lead incident response efforts, including communication and postmortem documentation.
- Implement and enhance monitoring and alerting systems (e.g., Prometheus, Grafana, Datadog).
- Champion blameless postmortems and drive systemic fixes to recurring issues.
Collaboration, Governance & Mentorship:
- Collaborate closely with development, security, and operations teams to embed reliability practices.
- Drive SRE best practices across teams and influence architecture and design decisions.
- Participate in internal audits and compliance activities related to infrastructure and availability.
- Mentor junior SREs and contribute to internal knowledge-sharing and documentation.
What do you need to succeed?
Must-Have:
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
- 3+ years’ experience with system administration in Red Hat Linux OS or Apache, Solar
- 1+ years’ experience with Ansible
- Strong experience with Python scripting
- Experience providing Production Support
- Experience with monitoring/SRE tools like Dynatrace, PagerDuty, ELK Stack
Nice-to-Have:
- Knowledge of or experience with AI (agents, LLMs, etc.)
- Knowledge of Managed File Transfer (MFT) platforms.
- Knowledge of DevOps tools like Git, Docker, Jenkins, and Kubernetes.
What’s in it for you?
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Flexible work/life balance options
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Opportunities to build close relationships with clients
About RBC
Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. Our success comes from the 94,000+ employees who leverage their imaginations and insights to bring our vision, values and strategy to life so we can help our clients thrive and communities prosper. As Canada's biggest bank and one of the largest in the world, based on market capitalization, we have a diversified business model with a focus on innovation and providing exceptional experiences to our more than 17 million clients in Canada, the U.S. and 27 other countries. Learn more at rbc.com. We are proud to support a broad range of community initiatives through donations, community investments and employee volunteer activities. See how at www.rbc.com/community-social-impact.