Jobs.ca

Site Reliability Engineer (SRE)

Montréal, QC
Senior Level
Full-time

About the role

Ride the next mile with us!

Reporting to the Site Reliability Engineering Manager within the Information Technology department, the Site Reliability Engineer (SRE) will play a key role in ensuring our systems are reliable, scalable, and high-performing. Working closely with both development and operations teams, you will help design, build, and maintain resilient infrastructure and applications that support our business objectives.

We are looking for a collaborative problem-solver who is passionate about operational excellence and eager to contribute to a high-impact team. Please note that this role may require occasional off-hours availability until processes are fully matured.

Responsibilities

  • Incident Management: Detect and respond to issues, ensuring rapid recovery to minimize downtime. Current on-call contributors need better coordination and more structured investigations. This role involves off-hours events, but these are cyclical, with quieter periods. Define and implement an escalation process, communicate it to stakeholders across the business, and ensure they adhere to it. Write incident reports and conduct post-mortems to promote continuous improvement.
  • Collaboration: Work closely with development and operations teams to ensure smooth deployment and operation of applications. Provide primary operational support and engineering for large-scale distributed software applications. Collaborate with development teams to improve services through rigorous testing and release procedures. Participate in system design consulting, platform management, and capacity planning. This requires diligent follow-up and close collaboration with all teams.
  • Influence: Create sustainable systems and services through automation and enhancements. Promote a culture of innovation and continuous improvement within the SRE team and the broader organization. Work with the SRE team manager to establish and execute operational policies that promote agility and scalability. Coordinate and mentor other SRE team members, fostering their professional growth and development.
  • Automation: Automate repetitive tasks to improve efficiency and reduce human error. Improve the reliability, quality, and time-to-market of our software solutions. Measure and optimize system performance in anticipation of business needs.
  • Monitoring and Alerting: Implement and enhance monitoring systems (e.g., Datadog) to track the health and performance of applications and infrastructure. Some monitoring is already in place, but additional coverage is needed. Monitor and maintain the production environment, ensuring high availability and system health. Gather and process metrics from operating systems and applications to assist in performance tuning and fault finding. Develop a health monitoring dashboard that gives our various stakeholders visibility into the production environment.
  • Disaster Recovery: Prepare and implement disaster recovery plans to manage unexpected outages.
  • Performance Optimization: Continuously improve system performance and scalability.
  • Capacity Planning: Ensure the infrastructure can handle current and future demands.
  • Chaos Engineering: Intentionally introduce failures to test system resilience and improve robustness.

Qualifications

  • Bachelor's degree in software engineering, computer science, or equivalent.
  • 3+ years of experience in cloud management, development, and/or SRE responsibilities.
  • Experience with Agile methodology and technical project execution. Knowledge of DevOps concepts, AWS, Azure, GCP, observability tools (Datadog, Cloudflare), Terraform, and PagerDuty, and of how to integrate them.

Other Skills:

  • Strong initiative and resilience, with a demonstrated ability to explore new ideas and innovative approaches to solving complex problems.
  • Excellent interpersonal and communication skills in both French and English.
  • Comfortable working in a fast-moving environment.
  • Schedule: Primarily daytime hours, but on-call availability is required for the initial months to observe and refine existing processes.

Join Our Team

Be part of a dynamic and innovative company at the forefront of the last-mile delivery industry. If you are a strategic thinker and results-driven leader who is passionate about driving business growth, we’d love to hear from you.

Intelcom is a leading last-mile carrier in the e-commerce sector. Our teams across Canada as well as our network of independent contractors contribute to Intelcom’s daily operations.

Our goal is simple: in a constantly evolving business sector, we don't just follow, we get ahead. In addition to standing out through innovative services and delivery methods, Intelcom is undergoing a technological transformation in which the integration of customer experience and logistics technologies is at the heart of its evolution.

At Intelcom, we know experience comes in many forms and are committed to building a culture where difference is valued. We are always looking for talented and diverse individuals to join our teams. With over 60 delivery centers across Canada, we may have the right opportunity for you.

About Intelcom | Dragonfly

Transportation, Logistics, Supply Chain and Storage
1001-5000 employees

Intelcom is a leading last-mile carrier in the e-commerce sector. Our teams across Canada as well as our network of independent contractors contribute to Intelcom’s daily operations. Through constant innovation and a unique approach to logistics technology, Intelcom is focused on what’s ahead to continue delivering new levels of operational efficiency.