Jobs.ca

Site Reliability Engineer

Payfirma · 15 days ago
Vancouver
Mid Level

About the role

Who you are

  • 8+ years in IT operations, including focused SRE or DevOps roles
  • Deep hands-on experience with AWS (EC2, S3, Lambda, EKS, VPC, IAM, etc.)
  • Proficiency in scripting languages such as PowerShell
  • Familiarity with observability tools such as Prometheus, Grafana, CloudWatch, Sumo Logic, and AppDynamics
  • Strong grasp of Infrastructure as Code (Terraform, CloudFormation)
  • Solid foundations in networking, security, and database management
  • Experience with APIs and file exchange interfaces
  • Fluency in Agile methodologies (Scrum, Kanban)
  • Excellent problem-solving and root cause analysis skills
  • Strong leadership and communication, with a track record of stakeholder engagement
  • Familiarity with ITIL frameworks and incident management practices
  • Proficiency in .NET, SQL, IIS, and SSIS
  • Fintech or payments domain experience
  • AWS certifications (Solutions Architect, DevOps Engineer)
  • Experience managing digital certificates and secure communications
  • ITIL foundational certifications
  • Master’s degree in Computer Science or related field

What the job involves

As a Site Reliability Engineer, you will be responsible for ensuring the resilience, scalability, and performance of our mission-critical systems. Acting as a guardian of uptime and operational excellence, you’ll blend DevOps expertise, automation, and cloud-native tools to ensure our platforms remain fast, secure, and available.

Welcome to KORT Payments, where innovation meets excellence! We specialize in providing a state-of-the-art omnichannel payments platform designed to make business transactions a breeze.

As we expand our presence in the U.S. market, we're excited to bring our proven solutions and innovative approach to new industries, while continuing to operate under the KORT Payments banner alongside Merrco, Payfirma, and Barnet.

Our mission? To empower businesses with top-notch capabilities in compliance, risk management, and payment processing. Our trailblazing, enterprise-grade platform, coupled with a veteran management team, ensures we stay ahead of the curve in delivering unparalleled service and satisfaction.

As a Site Reliability Engineer, you will:
  • Manage reliability, availability, and performance across production systems
  • Lead production releases using AWS and Azure DevOps
  • Troubleshoot and resolve Tier 3 support issues; participate in on-call rotations
  • Maintain and validate production and non-production environments
  • Develop automation to eliminate repetitive issues and strengthen system resilience
  • Build proactive monitoring and alerting solutions (e.g., AppDynamics, Sumo Logic)
  • Conduct root cause analysis and implement permanent fixes
  • Operate within ITIL best practices (incident, problem, change management)
  • Ensure successful data loads and exchanges across platforms
  • Process change requests through Jira Service Management, maintaining governance and compliance
  • Collaborate with cross-functional teams to ensure seamless deployments
  • Drive improvements in observability, tooling, and response procedures
  • Manage cloud infrastructure (AWS) and CI/CD pipelines

About Payfirma

1-10