About the role
Who you are
- 8–10+ years of experience in SRE, DevOps, Cloud Engineering, Platform Engineering, or Software Development roles
- Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, or similar
- Experience building AI/LLM-powered developer tools or integrations
- Demonstrated ability to drive org-wide tooling adoption, including change management, training, and measuring outcomes
- Proficiency in prompt engineering techniques
- Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
- Hands-on experience operating production environments in AWS
- Strong experience with Terraform
- Experience with container orchestration platforms like ECS or Kubernetes
- Familiarity with CI/CD tools such as GitHub Actions
- Solid understanding of observability practices, including system metrics, distributed tracing, and SLOs; experience with Datadog is a plus
- Exceptional communication and presentation skills, both written and verbal
- Experience troubleshooting complex distributed systems in a high-traffic production environment
- Exposure to event streaming systems such as Kafka or Kinesis
- Experience building Internal Developer Platforms (IDP) or designing self-service infrastructure workflows
- Familiarity with systems security, compliance requirements, or infrastructure hardening
- Experience with agentic AI workflows, MCP frameworks, or AI-powered automation beyond code generation
- Track record of leading incident response or driving post-incident review processes
What the job involves
We are looking for a Staff Site Reliability Engineer to lead AI enablement across our engineering organization. As AI-assisted development reshapes how software gets built, a new platform layer is emerging underneath: one that requires guardrails, quality gates, security standards, and tooling infrastructure to ensure AI-generated output is reliable, secure, and production-worthy. This role owns that layer.
This role blends building and buying: you'll design and develop custom tools and frameworks where the market doesn't meet our needs, while continuously evaluating the evolving landscape to ensure we're leveraging the best solutions available. We aim to be on the cutting edge, not the bleeding edge, investing deliberately in what delivers real value and staying ready to pivot when the market shifts meaningfully.
You will define and drive the strategy for embedding AI-native tools and practices into the software development lifecycle, from AI-assisted code review and developer workflow automation to establishing security standards for emerging frameworks like MCP. You'll own AI tooling standards for the engineering org, evaluate and adopt the best platforms, use data to measure impact and prioritize where to invest next, and partner with teams to automate repetitive workflows using agentic tools. This is a visible, high-influence role: you'll run lunch-and-learns, shape best practices, and be the go-to voice for how we leverage AI to multiply engineering output while keeping the foundations trustworthy.
This role sits within our Platform SRE team, and you'll participate in the team's ad-hoc support rotation, providing infrastructure guidance and troubleshooting for engineering teams. This means you bring deep SRE fundamentals (AWS, Terraform, production operations) alongside your AI enablement focus.
- AI Enablement Strategy: Define and own the standards and best practices for AI-assisted development across the engineering organization, from tool selection to workflow integration
- Tooling Development: Evaluate, build, or adopt AI-powered tools that improve code quality, catch vulnerabilities earlier in the development process, and reduce review cycle times — whether that means evolving internal solutions or identifying and integrating third-party platforms
- Adoption & Advocacy: Partner with engineering teams to understand what's impacting their AI tool adoption, guide them through improvements, and lead org-wide enablement efforts such as lunch-and-learns, workshops, and documentation
- Measuring Impact: Establish metrics and feedback loops to quantify the impact of AI tooling on developer productivity, code quality, and delivery speed
- Infrastructure Automation: Contribute to the design and scaling of production environments using AWS and Terraform when on rotation or as needs arise
- Mentorship & Standards: Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization
- On-Call: Participate in a low-volume on-call rotation
Benefits
- Paid parental leave
- 401k plan
- Stock option plan
- Open vacation days
- Flexible working hours
- Work from home opportunities
- Health insurance
About Coalition, Inc.
Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. By combining comprehensive insurance coverage and cybersecurity tools, Coalition helps businesses manage and mitigate potential cyber attacks. Coalition offers its Active Insurance products to policyholders in the U.S., the U.K., Canada, and Australia through Coalition’s relationships with leading global insurers and cyber capacity through its own carrier, Coalition Insurance Company. Coalition also provides automated cyber alerts, expert guidance and advice, and third-party risk management to businesses worldwide through its holistic cyber risk management platform, Coalition Control.
Coalition is also home to Coalition Security, which helps protect small businesses from the expanding universe of cyber threats. Coalition Security cyber tools and services are built and managed by cybersecurity experts invested in your risk.