Jobs.ca

Senior Site Reliability Engineer

Pinterest
25 days ago
Toronto
Senior Level

About the role

Who you are

  • Strong knowledge of Kubernetes (especially EKS), including deployment patterns, rollout safety, and core debugging workflows
  • 4+ years of experience with programming languages (Python or Golang preferred)
  • Strong experience managing projects and initiatives end-to-end
  • Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot or Claude for code generation, debugging, and documentation
  • Demonstrated ability to write effective prompts to get high-quality, reliable outputs from LLMs
  • Demonstrated ability to use AI to improve the speed and quality of your day-to-day work
  • Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
  • High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables
  • Experience with technologies such as Terraform, Buildkite, and/or ArgoCD is required
  • Bachelor’s or Master’s degree in a relevant field such as Computer Science, or equivalent experience

What the job involves

  • The Site Reliability Engineering organization at Pinterest is accountable for ensuring overall Pinterest availability as well as enhancing Engineering teams’ capability to design, build and operate robust systems at scale
  • We are hiring a Sr. SRE to join our Compute SRE team
  • This team is responsible for ensuring that all compute workloads run smoothly on Pinterest
  • We're building the future on Kubernetes and our job is to connect it with what Pinterest needs
  • Pinterest’s applications and infrastructure handle billions of monthly page views and petabytes of data as Pinterest continues to grow and scale
  • As a Pinterest SRE, you will design and build systems, platforms, tools, frameworks and methodologies to assure the reliability of our large-scale distributed systems
  • Tackle project challenges on EKS, such as implementing Karpenter. This work affects how every developer codes, tests, and improves their work
  • Collaborate across various teams to drive projects forward using open-source tools
  • Build a deep understanding of how Pinterest’s systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
  • Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across Pinterest Engineering
  • Build meaningful, insightful and actionable SLIs
  • Automate critical portions of Pinterest’s engineering processes to minimize risk and maximize the speed of innovation
  • Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world
  • Use AI to analyze incidents, operational signals, and system behaviors to help identify patterns, generate plans, and propose remediation approaches
  • Leverage AI to speed development of runbooks, automation workflows, and reliability tooling by drafting, iterating, and refining approaches

About Pinterest

Software Development
10,000+

Pinterest's mission is to bring everyone the inspiration to create a life they love. It's the visual inspiration platform where 482 million monthly active users worldwide come to search, save, and shop the best ideas in the world for all of life’s moments.
