Jobs.ca

Site Reliability Engineer

Bobsled
5 days ago
Remote
United States, Canada
$150,000 - $200,000/yearly
Mid Level

About the role

Who you are

  • 8+ years of experience in SRE, DevOps, or Platform Engineering, managing distributed cloud-native systems in production
  • Proficiency in Infrastructure as Code (IaC) tools like Terraform/Pulumi
  • Experience with TypeScript or other modern programming languages (our stack is heavily TypeScript-based)
  • Strong background in cloud platforms (GCP, AWS, Azure) - hands-on experience with at least one is required
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.)
  • Understanding of CI/CD best practices and experience with pipeline tools like GitHub Actions
  • Strong troubleshooting skills and experience with incident management
  • Experience with cloud security solutions, IAM, secrets management (HashiCorp Vault, GCP Secret Manager), identity-based authentication, and Zero Trust
  • Knowledge of security compliance frameworks (SOC 2, ISO 27001)
  • Experience with Kubernetes, serverless architectures, or container security
  • Exposure to data platforms such as Snowflake and Databricks, and to Spark engines like AWS EMR and GCP Dataproc

What the job involves

We are looking for an experienced Site Reliability Engineer to drive the reliability, scalability, and operational excellence of Bobsled's data-sharing platform. You'll apply your expertise to complex technical and business challenges, ensuring that our infrastructure and pipelines are highly available and performant.

You will play a key role in maintaining and improving Bobsled's multi-cloud environment (GCP, AWS, Azure, Cloudflare, Snowflake, Databricks). Your work will have a direct and massive impact on the way organizations share and collaborate on data across the world.

As an early hire, you will also play a pivotal role in shaping our team culture, fostering a collaborative environment, and assessing engineering candidates.

  • Infrastructure Reliability: Design, build, and maintain highly available, scalable infrastructure using modern IaC practices such as Terraform/Pulumi
  • Multi-Cloud Operations: Manage and optimize Bobsled's infrastructure across GCP, AWS, Azure, and other cloud providers
  • CI/CD Pipelines: Build and maintain robust pipelines that ensure safe, reliable, and automated deployment of infrastructure and applications
  • Monitoring & Observability: Develop comprehensive monitoring, logging, and alerting systems to ensure visibility into infrastructure and application health
  • Incident Response: Establish and continuously improve incident response processes, ensuring rapid detection and resolution of production issues
  • Performance Optimization: Identify and resolve performance bottlenecks, and lead capacity planning and cost optimization across our cloud environments
  • On-Call & Reliability: Participate in on-call rotations and drive improvements to reduce toil and improve system reliability

About Bobsled

Software Development
11-50

Bobsled is a cross-cloud data sharing platform that makes it painless to share data between any data lake or warehouse.

We enable product and data teams to share data products directly into a customer or partner’s preferred analytical environment without ever leaving their own.