Jobs.ca

Senior Site Reliability Engineer - Remote

Kablamo, posted 10 days ago
Toronto, Ontario
Senior Level
Full time

About the role

SENIOR SITE RELIABILITY ENGINEER

Reports to: Product Care Manager

Location: Toronto (hybrid)

Role Type: Full time (Permanent)

Level: Individual Contributor

Introduction

Kablamo is a fast-growing cloud digital product development company. Founded in Australia in 2017, the business has grown quickly over the last several years, including expanding the team to Canada in 2021. We are proud to have assembled an impressive list of customers in Australia and Canada, including some of the best-known enterprise and government organizations. We're looking to further accelerate our growth in both markets, and we're seeking a Sr. Site Reliability Engineer to help us bring new products to market.

Kablamo is proud to be an Advanced AWS Consulting partner, and we have recently been recognised as a global leader in designing and building cloud-based data and AI/ML solutions. At the 2021 AWS Global Public Sector conference, Kablamo won the award for “Most Innovative AI/ML Solution” for our work building bushfire prediction data platforms in Australia - we were selected from more than 1,800 AWS global partners.

The Role

As we expand our Product Care offering, we are looking for a Sr. Site Reliability Engineer (SRE) to help us build our capability and deliver insights from massive-scale data in real time. The Sr. SRE is responsible for developing automated solutions for operational concerns such as on-call monitoring, performance and capacity planning, and disaster response. The role will complement our existing development teams, with a focus on continuous delivery and infrastructure automation.

As the bridge between development and operations, you will be our primary escalation point across key customer accounts.

Key Responsibilities

  • Contribute to the design, implementation, and maintenance of our AWS infrastructure
  • Be proactive in anticipating production issues: assess risks and mitigate them, planning contingencies and counter-measures in advance
  • Restore systems to a steady state by quickly investigating and fixing performance, stability, and scalability issues, ensuring Kablamo meets its SLA and SLO requirements
  • Ensure that the underlying infrastructure runs smoothly and that systems and tools work as expected
  • Develop or implement visual tools that let technical and business teams observe system health, and support the Technical Account Manager in reporting on reliability metrics
  • Use automation tools to solve problems, writing code to automate processes such as analysing logs and testing production environments
  • Work with the engineering and/or development teams to identify recurring problems that can be resolved through automation
  • Improve the performance, efficiency, and monitoring of software development processes
  • Act on system incidents: as the SRE, you are a key contact in incident response and resolution, including active collaboration in PIRs/post-mortems
  • Collaborate closely with product developers to ensure that the designed solution meets non-functional requirements such as availability, performance, security, and maintainability, and work actively with the development team to define fields for logging and tracing
  • Advocate for reliability against competing priorities
  • Help prepare for production releases, including facilitating training and enablement of client technical teams and/or attending the appropriate meetings (Technical Working Groups, Architecture Review Boards, Change Advisory Boards)

Required Skills And Experience

  • 5+ years’ experience in an SRE or DevOps role
  • Deep understanding of system architecture and design principles
  • Ability to think critically and solve problems, performing well under pressure
  • Troubleshooting experience, with the ability to communicate clearly with customers or the engineering team
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Experience with AWS and its services (Serverless, Deployment Tools, Networking, Containerization, Security, Cost Management)
  • Familiarity with tools such as AWS CloudWatch, Datadog, Grafana, Prometheus, Scalyr, PagerDuty, OpsGenie, Jira Service Management
  • Ability to work cross-functionally with support engineering, development teams, and/or client vendors to deliver sound outcomes and suggest system improvements
  • Understanding of security requirements and their implications, and the ability to conform to applicable security frameworks
  • In-depth knowledge of version control
  • CI/CD implementation expertise
  • Experience with production rollback
  • Knowledge of fundamental network concepts and protocols
  • The ability to program in one or more high-level languages, such as Python, Go, Java, C/C++, or JavaScript
  • A good understanding of DevOps concepts and best practices including Infrastructure-as-Code

Bonus Points For

  • Bachelor’s degree in computer science or a similar technical qualification
  • AWS Associate and/or Professional Level Certifications
  • Strong grasp of networking, security, and reliability fundamentals
  • Solid understanding of Agile methodologies and practices

Career Progression

  • Lead SRE
  • Principal/Staff SRE

Hiring Process

  • 30-min intro chat with our TA team
  • 1-hr Technical interview
  • 1-hr Final Interview
  • References
  • Offer!

Why Work at Kablamo?

Our Culture

We recognize that a diverse and inclusive workplace enables greater innovation and brings benefits including improved performance, greater employee happiness and wellbeing, and superior outcomes for our customers. We attribute our success to all our unique and charismatic Kablamites. Through our fortnightly back to base sessions and our debate Thunderdomes, we enable Kablamites to provide feedback, share ideas, challenge the status quo, and challenge each other constructively on technical matters.

The PERKS!!!

  • Kablamo bonus scheme
  • Remote first with a downtown Toronto office available
  • Work abroad for up to 3 weeks per year (some restrictions apply)
  • Career growth (we really do promote from within!)
  • Individual training budget
  • Online rewards platform
  • Regular social events
  • Blogging rewards
  • Paid birthday leave
  • Anniversary bonus
  • Referral bonus
  • Parental Leave top up
  • Employee Assistance Program
  • Swag

Kablamo is a proud equal opportunity employer. We make our hiring decisions solely on the basis of your skills and experience, as well as the perspectives and value you can bring to our team. Kablamo believes that diversity is vital to providing the best service to our clients, and we are committed to fostering a varied and inclusive work environment. On request, we will make every effort to accommodate candidates' accessibility needs. Information received about accommodations will be handled confidentially.

Kablamo thanks all candidates for their interest; however, only qualified applicants will be shortlisted.

About Kablamo

Industry: IT Services and IT Consulting
Company size: 51-200 employees

We design and build beautiful cloud-based software.

The genesis of our business was a team of specialists who cut their teeth in the digital transformation of some of Australia’s largest media organisations. Those visionaries, and our growing team around the world, now bring that expertise and knowledge to unlocking the digital futures for enterprise and government.

Our people set us apart. Only the most qualified specialists, with the natural curiosity and confidence to take nothing for granted, can uncover solutions to problems that others barely glimpse.

We're embedded explorers who thrive on the trust we foster. Clients trust us to evaluate all aspects of data management, networking, security, architecture, processes, and strategy to discover the best path toward becoming a true digital enterprise.

We're silo-busting, legacy-busting data liberators. We break down the walls keeping enterprises from their potential. We deliver them the future they’ve been promised.

We combine all the elements of successful delivery for a complete and consistent approach to cloud-based strategy:

SECURITY AND NETWORKING We deliver scalable, high performance cloud-based solutions. We integrate and embed automated security early into the development process.

LEVERAGE We have access to a wealth of pre-built cloud services and components. We've led multiple cloud transformation programs across the utilities, media, banking and retail sectors.

AGILE AND AUTOMATED We build code that's easy to maintain and extend, so your business can launch new products quickly. Automation drives speed and maintains quality. Testing ensures the right choice for you.

HUMAN-CENTERED Our focus is people-first. Through our Innovation Squads we offer a unique and low-risk way to quickly engage and experiment with initiatives without having to make long-term commitments.