Senior DevOps & Site Reliability Engineer (Americas)
About the role
- Our Cloud Operations team is seeking a Senior DevOps & Site Reliability Engineer who will play a critical role in ensuring the reliability, performance, and scalability of our diverse SaaS applications
- This role is a specialized hybrid, bridging the gap between legacy VM-based architectures and modern cloud-native standards through aggressive automation and development-focused operations
- Unlike a traditional SRE position, this role is deeply integrated with the software development lifecycle, focusing on the consolidation and optimization of platform operations
- You will be responsible for building the CI/CD frameworks, self-service tools, and AI-driven automation that allow our engineering teams to move faster while maintaining rock-solid stability
- Your mission is to maximize the ROI of our existing infrastructure by “automating away” manual toil
- On-call coverage will be required on a weekly rotation basis
- In this role, you will be the technical anchor for a global platform footprint that includes a mix of Azure IaaS/PaaS, Google Cloud Platform (GCP), Kubernetes, and various data platforms. Your day will consist of:
- Intelligent Automation & DevOps: Identifying manual “toil” and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments to ensure a positive ROI
- AI-Enhanced Operations: Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency
- Scalable CI/CD & Provisioning: Designing and maintaining “self-service” deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform)
- Strategic ROI Projects: Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures
- Unified Observability: Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects
- Cross-Functional Collaboration: Partnering with engineering, security, and operations teams to ensure new features are “born” with reliability, security, and automated delivery in mind; ensuring adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001); and balancing operational excellence with cost efficiency
- Root Cause Analysis & Forensics: Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL)
- Governance & Security: Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP
Benefits
- Generous PTO
- Flexible work schedules
- A casual dress work environment
- Paid company holidays
- Remote work opportunities
- Appspace Quiet Fridays (no non-essential internal meetings scheduled)
Qualifications
- Experience with AI-driven log analysis or automated incident remediation
- Knowledge of database tuning (SQL Server, MySQL, MongoDB)
- 6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments
- Expert-level PowerShell and Python skills. Hands-on experience with Bicep or Terraform is required
- You are a problem-solver and an automator at heart
- Experience with Atlassian suite (Jira, Confluence, Bitbucket)
- Familiarity with middleware and PaaS technologies (e.g., Event Hubs, Service Bus, Cosmos DB, RabbitMQ, MongoDB)
- Must have a passion for lifelong learning
- Familiarity with compliance standards (SOC2, HIPAA, GDPR)
- Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP)
- Strong background in Windows/Linux Server OS, Kubernetes (AKS/GKE), Helm, and container orchestration
- Expert-level troubleshooting and the ability to reason through complex process workflows to identify faults in large-scale platform environments
About Appspace
Connect your people, places, and spaces.
Appspace is the workplace experience platform for your whole team that lets you manage it all – from employee communications to your physical office spaces. So, work-from-anywhere becomes an experience everyone loves. With offices in the US, UK, UAE, and Malaysia, plus additional experts in a dozen other countries, we provide global support to thousands of customers and help companies modernize their workplace experience.