Senior Site Reliability Engineer
About the role
Who you are
- 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
- Deep expertise in Kubernetes administration and architecture
- Strong track record of leading CI/CD and platform engineering initiatives
- Demonstrated experience leading technical projects and mentoring engineers
- Advanced experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
- Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk, Grafana)
- Advanced experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
- Proficiency in at least one programming language (e.g. Python, Ruby, Go)
- Experience with GitOps practices and tools like ArgoCD
- Experience building and maintaining platform engineering solutions at scale
- Experience implementing and managing observability solutions
- Experience with cost optimization and capacity planning
- Knowledge of emerging trends in platform engineering and DevOps practices
- Strong technical writing skills for documentation and knowledge sharing
- Experience with developer portals and internal platform products
What the job involves
- We are seeking a Senior Site Reliability Engineer to join our Release Engineering team
- As a senior member of our Release Engineering team, you will drive the developer experience for releases and deployments, along with observability and platform engineering initiatives that enable teams across PagerDuty engineering
- In this role you will lead technical decisions and mentor team members while building robust, scalable infrastructure solutions that enhance our developer experience and platform reliability
- Lead the design and implementation of complex platform engineering solutions
- Drive architectural decisions for our CI/CD infrastructure and Kubernetes platform
- Mentor junior team members and provide technical leadership in platform engineering practices
- Develop and implement strategic initiatives to improve developer experience and platform reliability
- Design and implement scalable solutions for infrastructure automation using Terraform and other IaC tools
- Lead post-incident reviews and drive systematic improvements to prevent recurring issues
- Collaborate with other engineering teams globally to define and implement platform standards
- Champion observability and monitoring best practices across the organization
- Participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules
Benefits
- 20 hours per year of paid volunteer time
- Health insurance
- Wellness Days and mid-year Wellness Week: extra time off for the whole company to unplug and recharge at the same time
- Generous paid parental leave and a return-to-work policy to help with the transition back
- Generous paid time off
- Hands-on career and leadership development programs
- Flexible workplace/WFH
About PagerDuty
In an always-on world, teams trust PagerDuty to help them deliver an optimal digital experience to their customers, every time. PagerDuty is the central nervous system for a company’s digital operations. We identify issues and opportunities in real time and bring together the right people to respond to problems faster and prevent them in the future. From digital disruptors to Fortune 500 companies, over 18,000 businesses rely on PagerDuty to help them continually improve their digital operations, so their teams can spend less time reacting to incidents and more time building for the future.
Dutonians believe that we are part of a bigger movement of businesses built to benefit everyone: the customer and the employee, as well as our community. We are go-getters fueled by the fire to reinvent how people and companies work together. We take the lead and get creative to be first in the hearts of our customers. Whether it’s keeping the world on or changing it entirely, Dutonians work together to deliver in real time, across the globe.
Join us to lead uncharted efforts and reinvent how companies run.