Lead Site Reliability Engineer

FactSet 4 months ago

Toronto

Senior Level

Top Benefits

Generous paid time off for personal, vacation, parental, and medical leave.

Comprehensive health coverage for employees and families at little or no cost.

About the role

Who you are

10+ years as a Site Reliability Engineer, DevOps, or similar role in cloud-native environments (AWS focus)
Deep technical proficiency with AWS services (EC2, EKS, S3, RDS, IAM, etc.)
Expert-level experience managing, tuning, and scaling PostgreSQL databases
Advanced skill in Terraform (modular design, environment promotion, CI/CD integration)
Proficient in building and operating CI/CD systems (GitLab CI, GitHub Actions, or equivalent)
Hands-on experience with GitOps workflows (Argo CD, Flux, etc.)
Strong knowledge of Kubernetes (deployment, scaling, networking, security)
Experience with monitoring and logging stacks (Datadog, Prometheus, Grafana, ELK, etc.)
Track record in designing, communicating, and executing complex infrastructure roadmaps
Experience mentoring and enabling engineering teams
Strong written and verbal communication skills
Professional certifications (AWS Solutions Architect, Kubernetes, Terraform)
Experience in fintech, SaaS, or high-compliance industries
Exposure to data privacy regulations and secure software development practices
Bachelor's degree in computer science or a similar field

What the job involves

We are seeking a seasoned Senior Site Reliability Engineer with deep expertise in AWS to own, architect, and continuously evolve Irwin’s core infrastructure
You will plan, build, and optimize the systems that support our web applications and internal tools, ensuring scalability, reliability, observability, and security
Your technical judgment, roadmap planning skills, and hands-on expertise will enable our engineering teams to ship features with velocity and confidence
Design and execute long-term strategies for scalable, secure infrastructure to host the Irwin web application and associated tooling on AWS/EKS with PostgreSQL
Architect and manage highly available cloud environments on EKS/Kubernetes using best practices for cost, performance, and security
Oversee, tune, and ensure the high availability of large-scale PostgreSQL databases; optimize for performance, backup, disaster recovery, and observability. Bonus points for experience using Snowflake or other OLAP systems
Lead the adoption and maintenance of Terraform workflows to manage infrastructure; ensure reproducibility, modularity, and CI/CD integration
Build, maintain, and scale CI/CD pipelines using GitOps principles to automate deployments, reduce risk, and speed up delivery cycles
Design, deploy, and manage production-grade Kubernetes clusters; automate scaling and implement robust security practices
Implement monitoring, logging, and alerting solutions; establish best practices for incident detection and resolution
Apply industry best practices for infrastructure and data security; ensure governance and compliance with relevant standards (e.g., SOC 2, GDPR)
Mentor SRE peers and engineering teams on DevOps/SRE methodologies; document, communicate, and evangelize infrastructure best practices

The application process

Please attach your resume and a cover letter describing your approach to architecting scalable infrastructure and your experience with AWS, PostgreSQL, Terraform, GitOps, CI/CD, and Kubernetes

Benefits

A competitive package offering generous paid time off for personal, vacation, parental, and medical leave.
Comprehensive health coverage for employees and their families, at little or no cost to employees.
Discounted services at gyms and wellness facilities.
Free working lunch in the office Monday through Thursday.
A social community involved in sports, charities, and in-office events.
Certification reimbursement for eligible expenses related to the CFA, IPM, CAIA, and FRM exams.

About FactSet

10,000+

