Senior Site Reliability Engineer

Kong Inc.

2 months ago

Remote

Toronto

CA$103,414 - CA$144,836/yearly

Senior Level

Top Benefits

Flexible time off: Take time for yourself and priorities.

Stock options: Share in company success.

U-First Fridays: 4 hours/month for learning.

About the role

Who you are

If you don’t think you meet all of the criteria below but are still interested in the job, please apply. Nobody checks every box - we’re looking for candidates who are particularly strong in a few areas and have some interest and capabilities in others
This is a hands-on role ideal for engineers who thrive on running production SaaS systems at scale, automating operations, and continuously improving performance, resilience, and deployment pipelines
BS in Computer Science or equivalent practical experience
Proven experience managing SaaS or PaaS systems at enterprise scale (multi-region, multi-tenant, secure environments)
Deep expertise in Kubernetes, including debugging cluster/networking issues and designing for fault tolerance and scalability
Strong proficiency with Infrastructure as Code tools like Terraform or Terragrunt
Experience with CI/CD pipelines and GitOps workflows (ArgoCD, Atlantis, Helm)
Proficiency in one or more programming languages (Go, Python, Bash) for automation and tooling
Solid understanding of Linux/Unix systems, networking (DNS, TLS/SSL, HTTP), load balancers and distributed systems
Experience working with API gateway and service mesh technologies
Familiarity with streaming systems like Kafka and observability platforms (Datadog, Prometheus, Grafana)
Experience working in a 24/7/365 production support environment
Hands-on experience with Kong Gateway, Kong Mesh, or similar service connectivity technologies
Experience operating ClickHouse, Druid, or other time-series and analytics databases
Experience managing PostgreSQL and Redis in multi-region configurations
Working knowledge of AWS networking (PrivateLink, Transit Gateway, VPC Peering, Firewalls), Azure VNet, or GCP NCC
Strong understanding of disaster recovery, resiliency testing, and compliance-driven reliability practices

What the job involves

As a Site Reliability Engineer, you’ll join the global Platform SRE team responsible for building, operating, and scaling Kong’s multi-region SaaS platform that powers the world’s API connectivity
You’ll design, automate, and run production systems serving thousands of customers across AWS, GCP, and Azure
You’ll work on everything from multi-region Kubernetes clusters to service mesh and gateway architectures, ensuring the reliability, scalability, and security of Kong’s SaaS offerings
Operate and scale Kong’s global SaaS platform (Konnect), ensuring reliability, availability, and performance across regions and clouds
Build, automate, and maintain Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD
Design, maintain, and optimize multi-region data and caching layers — including PostgreSQL, Redis, ClickHouse, and Druid — for high availability and low latency
Operate and improve Kong Gateway and Kong Mesh environments supporting hybrid and distributed architectures
Develop and maintain CI/CD pipelines and GitOps workflows to automate service delivery and ensure consistent infrastructure changes
Enhance observability and incident response readiness through systems like Datadog, Prometheus, Grafana, and Thanos, defining and tracking SLOs
Collaborate closely with development and security teams to ensure smooth operation of SaaS services in compliance with reliability, security, and regulatory standards
Participate in a global 24/7 on-call rotation and drive continuous improvement of operational playbooks and postmortem practices
Lead and contribute to scaling initiatives that improve elasticity, reliability, and cost-efficiency across the SaaS platform

Benefits

Flexible time off: Take time to take care of yourself and the things that matter most
Stock options: We want you to share in our success. That's why stock options are offered to most Kongers
U-First Fridays: Get 4 hours a month for continuous learning with a book, podcast, or course of your choice
Virtual events: Stay connected with Donut chats, trivia, fitness challenges, guided meditations, and more
Dedicated unplug days: Silence those notifications. Enjoy a well-deserved long weekend where the entire team unplugs
Home office stipend: Build a home office environment tailored to support your productivity

About Kong Inc.

Software Development

501-1000

Powering the API World. No AI without APIs.

Kong enables any company to become an API-first company. Kong’s unified cloud native API platform is easy to use and works in any environment — unleashing developer productivity, automating security, and boosting performance of APIs and microservices at scale.
