Site Reliability Engineer
About the role
Who you are
- Experience: 4+ years in SRE, DevOps, Cloud Engineering, or Software Development roles
- Cloud Proficiency: Hands-on experience operating and scaling production environments within AWS
- Infrastructure as Code: Strong expertise with Terraform for managing complex cloud infrastructure
- Programming: Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
- Containers & Orchestration: Experience with ECS or Kubernetes
- CI/CD: Familiarity with modern deployment tools, specifically GitHub Actions
- Communication: Strong written and verbal skills with a knack for evangelizing reliability best practices across the organization
- Experience troubleshooting complex distributed systems in a high-traffic production environment
- Exposure to event streaming systems such as Kafka or Kinesis
- Experience contributing to Internal Developer Platforms (IDP) or automating self-service infrastructure workflows
- Familiarity with systems security, compliance requirements, or infrastructure hardening
What the job involves
We are looking for a Site Reliability Engineer to join our Platform SRE team. In this role, you will build and operate the infrastructure, tools, and "paved roads" that empower our developers to deliver scalable, secure, and reliable software with speed and confidence. You’ll work across the entire stack, from infrastructure automation and observability to developer enablement and system reliability. You will be a key collaborator with software engineering and security teams, helping to evolve our Infrastructure as Code (IaC), enhance CI/CD pipelines, and scale our internal developer platform. We value pragmatism and engineering excellence, primarily using Python, Go, and AWS to reduce toil and build self-service capabilities.
- Infrastructure Automation: Design, build, and scale production environments using AWS and Terraform
- System Reliability: Improve the resilience and operability of our platform through failure-based testing and automated recovery strategies
- Developer Enablement: Design and implement reusable platform components and self-service tools to streamline the developer experience
- Observability: Implement and maintain robust observability practices, including system metrics, distributed tracing, and SLO management
- Mentorship & Standards: Guide junior engineers, uphold high infrastructure quality, and contribute to the team’s evolving best practices
- Collaboration: Participate in technical design discussions, sharing feedback and adapting strategies based on team input and evolving requirements
Benefits
- Paid parental leave
- 401k plan
- Stock option plan
- Open vacation days
- Flexible working hours
- Work from home opportunities
- Health insurance
About Coalition, Inc.
Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. By combining comprehensive insurance coverage and cybersecurity tools, Coalition helps businesses manage and mitigate potential cyber attacks. Coalition offers its Active Insurance products to policyholders in the U.S., the U.K., Canada, and Australia through Coalition’s relationships with leading global insurers and cyber capacity through its own carrier, Coalition Insurance Company. Coalition also provides automated cyber alerts, expert guidance and advice, and third-party risk management to businesses worldwide through its holistic cyber risk management platform, Coalition Control.
Coalition is also home to Coalition Security, which helps protect small businesses from the expanding universe of cyber threats. Coalition Security cyber tools and services are built and managed by cybersecurity experts invested in your risk.