Jobs.ca

Senior Site Reliability Engineer

RBC, about 15 hours ago
Urgently Hiring
Verified
Mississauga, ON
Senior Level
Full-time

Top Benefits

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
Leaders who support your development through coaching and managing opportunities
Ability to make a difference and lasting impact

About the role

Job Description

What is the Opportunity?

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. With a unique blend of technical expertise and industry-specific knowledge, this team plays a critical role in ensuring the seamless operations of digital services that cater to both the business's internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring the engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation. This role will be responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by RBC Insurance Technology. You'll leverage your proficiency in Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

  • Set the vision for the SRE product base (monitoring, alerting, machine-learning anomaly detection, self-healing, reliability testing).
  • Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
  • Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.
  • Develop, maintain, and refine custom tooling and automation scripts in languages such as Bash, Python, and PowerShell to enhance operational efficiency and support the infrastructure's scalability and reliability needs.
  • Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
  • Actively contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining our alerting and monitoring strategies accordingly.
  • Document and maintain comprehensive runbooks, facilitating quick resolution of incidents and reducing mean time to recovery (MTTR).
  • Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
  • Mentor team members in building out robust monitoring and alerting strategies based on well-defined SLIs and SLOs.
  • Act as the portfolio Subject Matter Expert (SME): understand and document the common components, core functionality, and infrastructure of supported applications.
  • Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.
  • Drive transformation by continuously looking for ways to automate existing processes.
  • Debug production issues across services and levels of the stack and provide primary operational support.
  • Provide production support, including off-hours support as part of an on-call rotation.

Must-have:

  • 4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Expertise in infrastructure-as-code and configuration management, particularly Ansible.
  • Advanced scripting capabilities in Bash, Python, PowerShell, or other similar languages.
  • In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka, and their role in system reliability.
  • Knowledge of creating, maintaining, and alerting on SLIs, SLOs, and other reliability metrics.

Nice-to-have:

  • Insurance industry experience.
  • In-depth, hands-on experience with a variety of SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).
  • Familiarity with containerization technologies such as Docker.
  • Hands-on experience with DevOps CI/CD tools such as Jenkins, Artifactory, and Vault.

Soft Skills:

  • Excellent communication skills to foster collaboration across departments.
  • A resilient problem-solving approach, capable of leading the charge during high-stress incidents.
  • Strategic thinking and analytical prowess, with a focus on delivering reliable and performant systems.
  • Organizational skills to manage multiple priorities in a fast-paced environment.

What’s in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Ability to make a difference and lasting impact
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • A world-class training program in financial services
  • Flexible work/life balance options
  • Opportunities to do challenging work

Job Skills

Agile Methodology, Application Infrastructure, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, Production Support, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software

Additional Job Details

Address: Meadowvale Business Park, 6880 Financial Dr, Mississauga
City: Mississauga
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: Technology and Operations
Job Type: Regular
Pay Type: Salaried
Posted Date: 2025-03-07
Application Deadline: 2025-09-30

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.

About RBC

Industry: Banking
Company size: 10,000+

Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. Our success comes from the 94,000+ employees who leverage their imaginations and insights to bring our vision, values and strategy to life so we can help our clients thrive and communities prosper. As Canada's biggest bank and one of the largest in the world, based on market capitalization, we have a diversified business model with a focus on innovation and providing exceptional experiences to our more than 17 million clients in Canada, the U.S. and 27 other countries. Learn more at rbc.com. We are proud to support a broad range of community initiatives through donations, community investments and employee volunteer activities. See how at www.rbc.com/community-social-impact.

http://rbc.com/legalstuff.
