Senior Manager

Tubi 5 days ago

Toronto

Senior Level

Top Benefits

Medical, dental, and vision coverage from day one

Generous parental leave, childcare, and eldercare support

Monthly wellness reimbursement, generous time off, extra holidays

About the role

Who you are

8+ years of experience in a technical field, with at least 3 years in an engineering leadership position managing SRE, DevOps, or Production Engineering teams
A deep, principled understanding of SRE tenets, including Service Level Indicators (SLIs), SLOs, error budgets, toil reduction, and capacity planning
Exceptional communication, negotiation, and influencing skills, with the ability to articulate complex technical concepts and strategies to both technical and non-technical stakeholders at all levels of the organization
A strong technical background as a hands-on software engineer or site reliability engineer prior to moving into management. Deep knowledge of AWS services (especially networking, IAM, EKS, ALBs/NLBs, Route 53, CloudWatch). Proven experience with Kubernetes in production (EKS preferred), including service exposure, networking, and availability engineering
Hands-on familiarity with modern SRE tools and technologies, including Infrastructure as Code (e.g., Terraform, Ansible), container orchestration (Kubernetes), observability platforms (e.g., Prometheus, Grafana, Datadog, Splunk), incident tooling (e.g., PagerDuty, FireHydrant), deployment-safety tooling (e.g., Argo Rollouts, LaunchDarkly), and observability standards (e.g., OpenTelemetry)
Executive-caliber incident communication/storytelling skills (clear status, stakeholder alignment, and post-incident narratives)
Demonstrated success in hiring, developing, and mentoring high-performing engineers, including managing senior and principal-level talent
Experience managing globally distributed teams and developing equitable and sustainable on-call rotation practices
Experience in financial planning, budget management, and vendor contract negotiation for technical infrastructure and tooling

What the job involves

Site Reliability Engineering (SRE) at Tubi is not a traditional operations team
We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems
Our mission is to engineer resilience from the ground up, enabling our product teams to innovate rapidly while ensuring our users have a stellar experience
We own the availability, latency, performance, and capacity of our platform, and we achieve our goals through a culture of data-driven decision-making, blameless learning, and relentless automation
We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability Engineering team
You are more than a people manager or a tech lead; you are the strategic leader responsible for architecting our reliability roadmap
You will build and mentor a team of talented engineers, foster a culture of blameless learning and continuous improvement, and champion the engineering practices that allow us to balance rapid innovation with rock-solid stability
You will be a key influencer in our engineering leadership, partnering with peers across the organization to ensure reliability is a shared responsibility and a core tenet of our engineering culture
Team Leadership & Mentorship:
Lead, mentor, and grow a team of Site Reliability Engineers
Foster a culture of innovation and technical excellence where engineers feel empowered to do their best work
Provide personalized coaching, create professional development plans, and guide the careers of senior and emerging talent within the team
Establish equitable, sustainable on-call practices (including global coverage where applicable) that protect focus time and avoid burnout
Define team rituals - runbook reviews, game days, and incident retros - that reinforce quality and learning
Strategic Planning & Vision:
Define and drive the multi-year technical strategy and vision for Tubi’s observability and automation platforms
Partner with the infra lead to align Tubi’s infrastructure and SRE roadmaps
Partner with tech leaders to align the SRE roadmap with business objectives
Champion a data-driven approach to reliability, using Service Level Objectives (SLOs) and error budgets to facilitate productive conversations about risk and feature velocity
Operational Excellence & Incident Management:
Own the end-to-end availability, performance, and efficiency of our critical user-facing services
Evolve our incident response practice to reduce Mean Time to Resolution (MTTR) and increase Mean Time Between Failures (MTBF)
Champion a rigorous, blameless, and data-driven post-mortem culture to ensure we learn from both successes and failures, driving engineering teams toward systemic fixes and automation that prevent incidents from recurring
Streamline our existing processes and practices, and collaborate with other teams to raise our production release standards
Define and tune a 24×7 on-call rotation for low noise and fast response; act as executive escalation partner during major incidents
Own disaster-recovery strategy (playbooks, failover drills, recovery simulations) and track SLO gaps with time-bound remediations
Financial & Vendor Management:
Own the SRE budget, tooling, and headcount
Manage relationships with key third-party vendors for our observability and SRE-related AI platforms, and work with the infra lead and finance team on contract negotiations to ensure we derive maximum value from our investments
Cross-Functional Collaboration:
Act as a key influencer and strategic partner to leaders in Software Engineering, Product Management, and Infra/Sec
Drive the adoption of SRE best practices and principles throughout the organization, ensuring new services are designed for reliability, scalability, and observability from day one

Benefits

Healthcare Coverage: We offer medical, dental, and vision coverage, effective from day one
Family Support: We’re proud to support families of all kinds, and offer generous parental leave, childcare support, and eldercare assistance whenever you need it
Wellness Programs: Monthly wellness reimbursement, generous time off, and additional Tubi Holidays help us support mental and physical wellbeing for you and your family
Continuing Education: From education reimbursement to leadership development to certification support, we’re invested in developing our talent so you can take your career to the next level
Financial Support: We offer resources to help keep you financially fit and invested in your future, from our highly-rated retirement savings matches to financial advisors and planning services

About Tubi

Entertainment Providers

501-1000

Tubi is the most watched free TV and movie streaming service in the U.S., dedicated to providing all people access to all the world's stories. As a leading ad-supported video-on-demand service, the company engages diverse audiences through a personalized experience and the world’s largest content library of over 275,000 movies and TV episodes, a growing collection of Tubi Originals, and nearly 250 FAST channels. Tubi is part of the Tubi Media Group, a division of Fox Corporation that oversees the company’s digital businesses
