Staff Site Reliability Engineer

Coalition, Inc.

about 1 month ago

Remote

Canada

$153,400 - $220,400 per year

Staff

About the role

Who you are

8–10+ years of experience in SRE, DevOps, Cloud Engineering, Platform Engineering, or Software Development roles
Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, or similar
Experience building AI/LLM-powered developer tools or integrations
Demonstrated ability to drive org-wide tooling adoption, including change management, training, and measuring outcomes
Proficiency in prompt engineering techniques
Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
Hands-on experience operating production environments in AWS
Strong experience with Terraform
Experience with container orchestration platforms like ECS or Kubernetes
Familiarity with CI/CD tools such as GitHub Actions
Solid understanding of observability practices, including system metrics, distributed tracing, and SLOs; Datadog experience is a plus
Exceptional communication and presentation skills, both written and verbal
Experience troubleshooting complex distributed systems in a high-traffic production environment
Exposure to event streaming systems such as Kafka or Kinesis
Experience building Internal Developer Platforms (IDP) or designing self-service infrastructure workflows
Familiarity with systems security, compliance requirements, or infrastructure hardening
Experience with agentic AI workflows, MCP frameworks, or AI-powered automation beyond code generation
Track record of leading incident response or driving post-incident review processes

What the job involves

We are looking for a Staff Site Reliability Engineer to lead AI enablement across our engineering organization. As AI-assisted development reshapes how software gets built, a new platform layer is emerging underneath: one that requires guardrails, quality gates, security standards, and tooling infrastructure to ensure AI-generated output is reliable, secure, and production-worthy. This role owns that layer. It blends building and buying: you'll design and develop custom tools and frameworks where the market doesn't meet our needs, while continuously evaluating the evolving landscape to ensure we're using the best solutions available. We aim to be on the cutting edge, not the bleeding edge, investing deliberately in what delivers real value and staying ready to pivot when the market shifts meaningfully.

You will define and drive the strategy for embedding AI-native tools and practices into the software development lifecycle, from AI-assisted code review and developer workflow automation to establishing security standards for emerging frameworks like MCP. You'll own AI tooling standards for the engineering org, evaluate and adopt the best platforms, use data to measure impact and prioritize where to invest next, and partner with teams to automate repetitive workflows using agentic tools. This is a visible, high-influence role: you'll run lunch-and-learns, shape best practices, and be the go-to voice for how we leverage AI to multiply engineering output while keeping the foundations trustworthy.

This role sits within our Platform SRE team, and you'll participate in the team's ad-hoc support rotation, providing infrastructure guidance and troubleshooting for engineering teams. This means you bring deep SRE fundamentals (AWS, Terraform, production operations) alongside your AI enablement focus.

AI Enablement Strategy: Define and own the standards and best practices for AI-assisted development across the engineering organization, from tool selection to workflow integration
Tooling Development: Evaluate, build, or adopt AI-powered tools that improve code quality, catch vulnerabilities earlier in the development process, and reduce review cycle times — whether that means evolving internal solutions or identifying and integrating third-party platforms
Adoption & Advocacy: Partner with engineering teams to understand what's impacting their AI tool adoption, guide them through improvements, and lead org-wide enablement efforts such as lunch-and-learns, workshops, and documentation
Measuring Impact: Establish metrics and feedback loops to quantify the impact of AI tooling on developer productivity, code quality, and delivery speed
Infrastructure Automation: Contribute to the design and scaling of production environments using AWS and Terraform when on rotation or as needs arise
Mentorship & Standards: Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization
On-Call: Participate in a low-volume on-call rotation

Benefits

Paid parental leave
401k plan
Stock option plan
Open vacation days
Flexible working hours
Work from home opportunities
Health insurance

About Coalition, Inc.

Insurance

501-1000

Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. By combining comprehensive insurance coverage and cybersecurity tools, Coalition helps businesses manage and mitigate potential cyber attacks. Coalition offers its Active Insurance products to policyholders in the U.S., the U.K., Canada, and Australia through Coalition’s relationships with leading global insurers and cyber capacity through its own carrier, Coalition Insurance Company. Coalition also provides automated cyber alerts, expert guidance and advice, and third-party risk management to businesses worldwide through its holistic cyber risk management platform, Coalition Control.

Coalition is also home to Coalition Security, which helps protect small businesses from the expanding universe of cyber threats. Coalition Security cyber tools and services are built and managed by cybersecurity experts invested in your risk.
