Site Reliability Engineer
Remote
United States, Canada
$150,000 - $200,000/yearly
Mid Level
About the role
Who you are
- 8+ years of experience in SRE, DevOps, or Platform Engineering, managing distributed cloud-native systems in production
- Proficiency in Infrastructure as Code (IaC) tools like Terraform/Pulumi
- Experience with TypeScript or other modern programming languages (our stack is heavily TypeScript-based)
- Strong background in cloud platforms (GCP, AWS, Azure) - hands-on experience with at least one is required
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.)
- Understanding of CI/CD best practices and experience with pipeline tools like GitHub Actions
- Strong troubleshooting skills and experience with incident management
- Experience with cloud security solutions, IAM, secrets management (HashiCorp Vault, GCP Secret Manager), identity-based authentication, and Zero Trust
- Knowledge of security compliance frameworks (SOC 2, ISO 27001)
- Experience with Kubernetes, serverless architectures, or container security
- Exposure to data and data platforms, e.g. Snowflake, Databricks and Spark engines like AWS EMR and GCP Dataproc
What the job involves
- We are looking for an experienced Site Reliability Engineer to drive the reliability, scalability, and operational excellence of Bobsled's data-sharing platform. You'll apply your expertise to complex technical and business challenges, ensuring that our infrastructure and pipelines are highly available and performant
- You will play a key role in maintaining and improving Bobsled's multi-cloud environment (GCP, AWS, Azure, Cloudflare, Snowflake, Databricks). Your work will have a direct and massive impact on the way organizations share and collaborate on data across the world
- As an early hire, you will also play a pivotal role in shaping our team culture, fostering a collaborative environment, and assessing engineering candidates
- Infrastructure Reliability: Design, build, and maintain highly available, scalable infrastructure using modern IaC practices such as Terraform/Pulumi
- Multi-Cloud Operations: Manage and optimize Bobsled's infrastructure across GCP, AWS, Azure, and other cloud providers
- CI/CD Pipelines: Build and maintain robust pipelines that ensure safe, reliable, and automated deployment of infrastructure and applications
- Monitoring & Observability: Develop comprehensive monitoring, logging, and alerting systems to ensure visibility into infrastructure and application health
- Incident Response: Establish and continuously improve incident response processes, ensuring rapid detection and resolution of production issues
- Performance Optimization: Identify and resolve performance bottlenecks, and drive capacity planning and cost optimization across our cloud environments
- On-Call & Reliability: Participate in on-call rotations and drive improvements to reduce toil and improve system reliability
About Bobsled
Software Development
11-50
Bobsled is a cross-cloud data sharing platform that makes it painless to share data between any data lake or warehouse.
We enable product and data teams to share data products directly into a customer or partner’s preferred analytical environment without ever leaving their own.