Staff Site Reliability Engineer (SRE)
About the role
Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Staff Site Reliability Engineer (SRE) to join us at Thinkific.
As a Staff SRE, you’ll work closely with engineers, domain experts, and stakeholders to improve the performance, reliability, and security of our systems. You’ll bring deep technical skills and a collaborative mindset to drive forward important projects and help other engineers grow through mentorship and coaching.
You’ll be a core contributor on your team and a go-to expert in your domain. You’ll collaborate with engineering leadership and cross-functional partners to solve complex problems and ensure our systems scale reliably to support our growing platform.
Your goal will be to help guide and execute on projects related to your technical domain. Here’s how you’ll accomplish this:
- Own a technical domain within our system and be accountable for operations and SLOs related to performance, reliability, and security, as well as architectural evolution and technical documentation aligned with broader strategy
- Contribute to the planning and execution of technical projects within and across your team, helping ensure that initiatives are well-scoped, aligned with organizational priorities, and effectively delivered
- Partner with product managers, designers, and other engineers to define system requirements, propose implementation strategies, and make tradeoffs visible
- Champion operational excellence, observability, and incident response across your team and adjacent services
- Write high-quality, maintainable, and efficient code with a focus on long-term scalability and performance
- Share your expertise by mentoring other engineers, supporting code reviews, and guiding others through architectural and debugging challenges
- Promote a culture of continuous improvement by encouraging experimentation, learning from failure, and driving engineering best practices in reliability, performance, and software quality
- Participate in our on-call rotation and incident response processes to help maintain a high level of service reliability
The person we have in mind likely:
- Has 6+ years of experience in the software or infrastructure engineering profession, including time spent in a reliability or platform-focused role
- Has experience owning services in production, and feels comfortable with infrastructure as code, container orchestration, and cloud-native development practices
- Understands the operational needs of complex distributed systems and has experience with monitoring, observability, incident management, and system hardening
- Writes infrastructure code in tools like Terraform with an eye toward security, modularity, and collaboration
- Has experience with languages like Ruby, Python, or Bash, and is proficient in working with relational and non-relational databases such as Postgres or AWS Aurora
- Can identify root causes of complex issues across multiple systems and work with stakeholders to develop resilient solutions
- Has experience with queueing systems like SNS, SQS, or Sidekiq and understands patterns for asynchronous processing and fault tolerance
- Enjoys collaborating across teams, sharing knowledge, and helping shape the team’s technical roadmap
- Is a thoughtful communicator who proactively shares context, feedback, and plans with their team and stakeholders
- Brings a continuous improvement mindset by seeking out opportunities to streamline workflows, reduce toil, and enable team success
- Loves to learn and grow. They’ve found (and keep looking for) ways to level up their skills in this field, whether that’s through formal education, gaining professional experience, or maybe even building their own business
These things would also be nice, but we think you could learn them on the job:
- Experience working with AWS services and infrastructure at scale
- Knowledge of networking fundamentals and related cloud services such as Cloudflare, load balancing, and TLS
The recruitment compensation range for this position is $135,000 – $165,000 CAD. Your specific compensation within this range is determined based on your job-related skills, knowledge, experience, and our internal equity assessment.
Diversity, Equity, Inclusion and Belonging & Accessibility
This is just our initial idea of who we’re looking for! At Thinkific, we know that people have unique career journeys. If your experience is close to what we’ve described but you feel that you might be missing a few of the requirements, please still apply! We believe in equal opportunity and are committed to diversity, equity, inclusion, and belonging across every facet of our business.
We’re also committed to providing a comfortable and accessible interview experience for every candidate. If there are any accommodations our team can make throughout our hiring process (big or small), please let us know.
About Thinkific
Thinkific is the leading Creator Educator Platform & Ecosystem: an all-in-one platform that helps you easily create, market, and sell your digital learning products.
We transform lives by helping our Creator Educators amplify their passion, impact, and abundance. Our mission is to make it simple for entrepreneurs and established businesses (our Creator Educators) to scale and generate revenue by teaching what they know.
With an ecosystem encompassing 50K+ Creator Educators worldwide and 3K+ Thinkific Partners & Experts, it’s no wonder the largest and most successful Creator Educators on the planet count on Thinkific every day.
Sign up for our Talent Community to learn more about new jobs, interview tips, exciting news, what our team is up to, and more: https://thnk.cc/talent-community. Join our team: http://thinkific.com/resources/careers
Sign up now for a free trial at http://www.thinkific.com