Jobs.ca

Senior IAM Resiliency Engineer - Observability (Global Security)

RBC
about 16 hours ago
Urgently Hiring
Verified
Toronto, ON
Senior Level
Full-time

Top Benefits

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable.
Leaders who support your development through coaching and managing opportunities.
Ability to make a difference and lasting impact

About the role

Job Description

What is the opportunity?

We are seeking an expert Senior Observability Engineer to own the resilience and "see-ability" of our mission-critical Identity and Access Management (IAM) platform. Your primary mission will be to design, build, and scale an end-to-end observability stack that provides deep, actionable insights into our distributed systems.

You will be the team's subject matter expert on monitoring, logging, tracing, and detection. By leveraging a diverse toolset including Elastic Stack, Dynatrace, Prometheus, Grafana, Splunk, and Catchpoint, your work will directly strengthen our detection capabilities and aggressively reduce our Mean Time to Detect (MTTD). This isn't just about collecting data; it's about transforming data into automated intelligence that proactively identifies and mitigates failures before they impact our users.

What will you do?

  • Architect & Build: Design and implement a unified, multi-layered observability framework that provides a "single pane of glass" for our IAM services.
  • Strengthen Detection: Develop sophisticated, high-signal/low-noise alerting strategies. This includes building anomaly detection models, predictive monitoring, and critical integrity checks for unexpected configuration drift, potential privilege escalation events, and expiring certificates and keys, to prevent security-related outages.
  • Reduce MTTD: Be the primary driver for initiatives, tooling, and process improvements focused on minimizing Mean Time to Detect and Mean Time to Resolution (MTTR).
  • Tool Integration & Management: Master and integrate our full stack of observability tools:
    • Metrics & Dashboards: Prometheus & Grafana for time-series metrics and visualization.
    • Logging: Elastic Stack and/or Splunk for centralized logging, query optimization, and trend analysis.
    • APM & Tracing: Dynatrace for deep application performance monitoring and distributed tracing across microservices.
    • Synthetic & RUM: Catchpoint for proactive, outside-in monitoring of critical IAM user journeys (like login, token issuance, and password reset).
  • Define "Normal": Establish and evangelize key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for the IAM platform and build the dashboards that track them.
  • Champion Resiliency: Partner with Infrastructure and Engineering teams to use observability data to inform chaos engineering tests, performance tuning, and capacity planning.
  • Evangelize Best Practices: Train and mentor system engineers on observability best practices, instrumentation (e.g., OpenTelemetry), and building "observable-by-default" applications.

What do you need to succeed?

Must Have:

  • Experience: 7+ years in a senior Observability, SRE, or DevOps role with a focus on monitoring highly available, distributed systems.

  • Metrics & Dashboards: Deep, hands-on expertise with Prometheus (incl. PromQL) and building complex, actionable dashboards in Grafana.

  • Logging Expertise: Proven experience managing and extracting value from large-scale logging platforms like ELK (Elasticsearch, Logstash, Kibana) or Splunk.

  • APM Mastery: Demonstrable experience using an APM tool like Dynatrace, New Relic, or AppDynamics to trace, debug, and optimize application performance.

  • Synthetic Monitoring: Experience with synthetic monitoring tools like Catchpoint to model and validate critical user flows.

  • Core Concepts: A strong "three pillars" foundation (metrics, logs, traces) and a passion for data-driven reliability.

  • Automation: Strong scripting skills (e.g., Python, Go, Bash) and experience with Infrastructure as Code (Terraform, Ansible) for managing your monitoring stack.

  • Communication: Excellent ability to communicate complex technical concepts to diverse audiences, from junior engineers to senior leadership.

Nice to Have:

  • IAM Context: Experience monitoring IAM-specific protocols and services (e.g., OAuth2, OIDC, SAML, LDAP, SCIM).
  • Trust & Integrity Monitoring: Experience building monitors for configuration drift, anomalous privilege escalation, and certificate lifecycle management.
  • Anomaly Detection: Practical experience implementing or using AIOps and machine-learning-based anomaly detection systems.
  • Cloud Native: Deep experience with observability in a Kubernetes and/or public cloud (AWS, GCP, Azure) environment.
  • Distributed Tracing: Experience with OpenTelemetry, Jaeger, or Zipkin.
  • Chaos Engineering: Familiarity with chaos engineering principles and tools (e.g., ChaosToolkit, Gremlin).

What’s in It for You?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable.

  • Leaders who support your development through coaching and managing opportunities.

  • Ability to make a difference and lasting impact

  • Work in a dynamic, collaborative, progressive, and high-performing team

  • A world-class training program in financial services

  • Opportunities to do challenging work

  • Opportunities to take on progressively greater accountabilities

  • Access to a variety of job opportunities across business and geographies


Job Skills

Agile Working, Application Security, Automation Tools, Bash (Programming Language), Cloud Platform, Cyber Security Management, Decision Making, Dynatrace APM, Elastic Logstash, ElasticSearch, Grafana, High Reliability, Identity Access Management (IAM), Information Security Management, Information Technology Security, Infrastructure Penetration Testing, Interpersonal Communication, IT Security Architecture, IT Systems Integration, Kubernetes, Prometheus (Software), Python (Programming Language), Red Hat Ansible, Security Information and Event Management (SIEM) {+ 5 more}

Additional Job Details

Address:

16 York St, Toronto

City:

Toronto

Country:

Canada

Work hours/week:

37.5

Employment Type:

Full time

Platform:

TECHNOLOGY AND OPERATIONS

Job Type:

Regular

Pay Type:

Salaried

Posted Date:

2025-11-14

Application Deadline:

2025-11-30

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.

About RBC

Banking
10,000+

Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. Our success comes from the 94,000+ employees who leverage their imaginations and insights to bring our vision, values and strategy to life so we can help our clients thrive and communities prosper. As Canada's biggest bank and one of the largest in the world, based on market capitalization, we have a diversified business model with a focus on innovation and providing exceptional experiences to our more than 17 million clients in Canada, the U.S. and 27 other countries. Learn more at rbc.com. We are proud to support a broad range of community initiatives through donations, community investments and employee volunteer activities. See how at www.rbc.com/community-social-impact.

http://rbc.com/legalstuff.
