About the role
Job Title: Azure Cloud Site Reliability Engineer (SRE) Job Title: -Calgary ,AB (Onsite )
Must Have ; -Infrastructure deploy(VMs, AKS, networks )CI/CD pipelines setu pMonitoring, scaling, patchin gTroubleshooting production issue sAutomation : Terraform, Scripti g
Role Overview: The SRE will be responsible for the reliability, availability, and performance of Azure/AWS PaaS and IaaS workloads. They bridge the gap between development and operations, focusing on building automated systems that prevent failures, managing incident responses, and optimizing cloud costs . Key Responsibilitie s System Reliability & Monitoring: Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytic s.Automation & Toil Reduction: Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling. Use IaC tools like Terraform and Ansible to manage infrastructur e.Incident Response & Management: Actively manage incident responses, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR ).Cloud SRE Agent Integration: Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine task s.Capacity Planning & Scalability: Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Managemen t.CI/CD & DevOps Collaboration: Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployment
s. Required Skills & Qualificatio ns Cloud Platforms: Expert knowledge of Microsoft Azure infrastructure services (Compute, Storage, Networking, AK S).Scripting & Programming: Proficiency in Python, Bash, or PowerShell for building automation too ls.Infrastructure as Code (IaC): Extensive experience with Terraform and ARM templates/Bic ep.Observability Tools: Experience with Azure Monitor, Grafana, Prometheus, or Datad og.Containers & Orchestration: Solid understanding of Kubernetes/AKS (Azure Kubernetes Servic e).Operating Systems: Proficient in Windows/Linux environmen ts.Azure Certification is a +Exposure to multi Cloud environment is mu
st. Typical "Day in the Life" Activit ies Reviewing Service Level Objectives (SLOs) and error budg ets.Refining auto-scaling rules for Kubernetes clusters based on traffic tre nds.Working with developers to review service architecture and ensure fault tolera nce.Configuring AI-driven alert suppression to reduce alert fati gue.Creating Azure Dashboards to visualize key performance indicators (KP
Is).
Not the right fit? Search for Cloud Consultant jobs in Calgary, Alberta, Canada
About HCLTech
HCLTech is a global technology company, home to more than 223,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending March 2025 totaled $13.8 billion. To learn how we can supercharge progress for you, visit hcltech.com.
Similar Jobs
About the role
Job Title: Azure Cloud Site Reliability Engineer (SRE) Job Title: -Calgary ,AB (Onsite )
Must Have ; -Infrastructure deploy(VMs, AKS, networks )CI/CD pipelines setu pMonitoring, scaling, patchin gTroubleshooting production issue sAutomation : Terraform, Scripti g
Role Overview: The SRE will be responsible for the reliability, availability, and performance of Azure/AWS PaaS and IaaS workloads. They bridge the gap between development and operations, focusing on building automated systems that prevent failures, managing incident responses, and optimizing cloud costs . Key Responsibilitie s System Reliability & Monitoring: Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytic s.Automation & Toil Reduction: Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling. Use IaC tools like Terraform and Ansible to manage infrastructur e.Incident Response & Management: Actively manage incident responses, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR ).Cloud SRE Agent Integration: Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine task s.Capacity Planning & Scalability: Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Managemen t.CI/CD & DevOps Collaboration: Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployment
s. Required Skills & Qualificatio ns Cloud Platforms: Expert knowledge of Microsoft Azure infrastructure services (Compute, Storage, Networking, AK S).Scripting & Programming: Proficiency in Python, Bash, or PowerShell for building automation too ls.Infrastructure as Code (IaC): Extensive experience with Terraform and ARM templates/Bic ep.Observability Tools: Experience with Azure Monitor, Grafana, Prometheus, or Datad og.Containers & Orchestration: Solid understanding of Kubernetes/AKS (Azure Kubernetes Servic e).Operating Systems: Proficient in Windows/Linux environmen ts.Azure Certification is a +Exposure to multi Cloud environment is mu
st. Typical "Day in the Life" Activit ies Reviewing Service Level Objectives (SLOs) and error budg ets.Refining auto-scaling rules for Kubernetes clusters based on traffic tre nds.Working with developers to review service architecture and ensure fault tolera nce.Configuring AI-driven alert suppression to reduce alert fati gue.Creating Azure Dashboards to visualize key performance indicators (KP
Is).
Not the right fit? Search for Cloud Consultant jobs in Calgary, Alberta, Canada
About HCLTech
HCLTech is a global technology company, home to more than 223,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending March 2025 totaled $13.8 billion. To learn how we can supercharge progress for you, visit hcltech.com.