Jobs.ca

Senior Site Reliability Engineer

Kong Inc.
15 days ago
Remote
Toronto
CA$103,414 - CA$144,836/yearly
Senior Level

Top Benefits

Flexible time off: Take time for yourself and priorities.
Stock options: Share in company success.
U-First Fridays: 4 hours/month for learning.

About the role

Who you are

  • If you don’t think you meet all of the criteria below but are still interested in the job, please apply. Nobody checks every box - we’re looking for candidates who are particularly strong in a few areas and have some interest and capabilities in others
  • This is a hands-on role ideal for engineers who thrive on running production SaaS systems at scale, automating operations, and continuously improving performance, resilience, and deployment pipelines
  • BS in Computer Science or equivalent practical experience
  • Proven experience managing SaaS or PaaS systems at enterprise scale (multi-region, multi-tenant, secure environments)
  • Deep expertise in Kubernetes, including debugging cluster/networking issues and designing for fault tolerance and scalability
  • Strong proficiency with Infrastructure as Code tools like Terraform or Terragrunt
  • Experience with CI/CD pipelines and GitOps workflows (ArgoCD, Atlantis, Helm)
  • Proficiency in one or more programming languages (Go, Python, Bash) for automation and tooling
  • Solid understanding of Linux/Unix systems, networking (DNS, TLS/SSL, HTTP), load balancers and distributed systems
  • Experience working with API gateway and service mesh technologies
  • Familiarity with streaming systems like Kafka and observability platforms (Datadog, Prometheus, Grafana)
  • Experience working in a 24/7/365 production support environment
  • Hands-on experience with Kong Gateway, Kong Mesh, or similar service connectivity technologies
  • Experience operating ClickHouse, Druid, or other time-series and analytics databases
  • Experience managing PostgreSQL and Redis in multi-region configurations
  • Working knowledge of AWS networking (PrivateLink, Transit Gateway, VPC Peering, Firewalls), Azure VNet, or GCP NCC
  • Strong understanding of disaster recovery, resiliency testing, and compliance-driven reliability practices

What the job involves

  • As a Site Reliability Engineer, you’ll join the global Platform SRE team responsible for building, operating, and scaling Kong’s multi-region SaaS platform that powers the world’s API connectivity
  • You’ll design, automate, and run production systems serving thousands of customers across AWS, GCP, and Azure
  • You’ll work on everything from multi-region Kubernetes clusters to service mesh and gateway architectures, ensuring the reliability, scalability, and security of Kong’s SaaS offerings
  • Operate and scale Kong’s global SaaS platform (Konnect), ensuring reliability, availability, and performance across regions and clouds
  • Build, automate, and maintain Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD
  • Design, maintain, and optimize multi-region data and caching layers — including PostgreSQL, Redis, ClickHouse, and Druid — for high availability and low latency
  • Operate and improve Kong Gateway and Kong Mesh environments supporting hybrid and distributed architectures
  • Develop and maintain CI/CD pipelines and GitOps workflows to automate service delivery and ensure consistent infrastructure changes
  • Enhance observability and incident response readiness through systems like Datadog, Prometheus, Grafana, and Thanos, defining and tracking SLOs
  • Collaborate closely with development and security teams to ensure smooth operation of SaaS services in compliance with reliability, security, and regulatory standards
  • Participate in a global 24/7 on-call rotation and drive continuous improvement of operational playbooks and postmortem practices
  • Lead and contribute to scaling initiatives that improve elasticity, reliability, and cost-efficiency across the SaaS platform

Benefits

  • Flexible time off: Take time to take care of yourself and the things that matter most
  • Stock options: We want you to share in our success. That's why stock options are offered to most Kongers
  • U-First Fridays: Get 4 hours a month for continuous learning with a book, podcast, or course of your choice
  • Virtual events: Stay connected with Donut chats, trivia, fitness challenges, guided meditations, and more
  • Dedicated unplug days: Silence those notifications. Enjoy well-deserved long weekends when the entire team unplugs
  • Home office stipend: Build a home office environment tailored to support your productivity

About Kong Inc.

Software Development
501-1000

Powering the API World. No AI without APIs.

Kong enables any company to become an API-first company. Kong’s unified cloud native API platform is easy to use and works in any environment — unleashing developer productivity, automating security, and boosting performance of APIs and microservices at scale.