Senior Software Engineer II - Observability (Remote - Canada)
About the role
We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them.
It takes a certain kind of person to join this team. Those who ask hard questions, give honest feedback, and show up for each other. No egos, no solo acts. Just smart, curious humans pushing toward something bigger, together.
One Confluent. One Team. One Data Streaming Platform.
About the Role:
We’re seeking a Senior Software Engineer II with a passion for observability and a desire to define their career by contributing to our mission-critical observability platform, which operates at global scale: across the big three cloud providers, a hundred regions, thousands of clusters, and tens of thousands of nodes. At this scale, every contribution you make will have an enormous impact on how we develop and operate our global data streaming platform. This role will let you flex a wide range of skills: UX design and mega-scale system design are equally important, and improving developer velocity and compute resource efficiency can be equally satisfying.
What You Will Do:
- Architect, design, build and operate end-to-end solutions for collecting, shipping, storing and querying OpenTelemetry signals from infrastructure, application containers, and k8s clusters, with a heavy focus on self-service, multitenancy, reliability and velocity
- Operate global and regional storage and query backends for metrics, traces, and logs
- Define and implement the building blocks for querying, visualizing and acting on 300M+ active time series using Grafana, Prometheus, Alertmanager, and PagerDuty
- Evaluate and implement new capabilities for logging, trace analytics and application profiling
- Work directly with product engineering teams, on-call engineers, and incident commanders to evangelize and deliver enhancements to our observability platform
What You Will Bring:
- 5+ years building distributed systems in Java, Golang or Python and running them in k8s
- 2+ years on an SRE, DevOps, observability, or similar platform engineering team delivering capabilities to multiple product engineering teams
- Deep experience with Prometheus, Alertmanager, and Grafana
- Experience with operating in-house observability infrastructure and being on-call for it
- BS, MS, or PhD in computer science or a related field, or equivalent work experience
What Gives You an Edge:
- Experience operating highly scalable observability backends (VictoriaMetrics, Cortex, Thanos, Mimir, Tempo, Loki, Elastic, etc.)
- Experience federating multiple observability backends that cross cloud regions
- You’ve built and operated k8s clusters on an infrastructure or SRE team in AWS, GCP, and Azure
- You’ve taken a platform approach to problem solving by building your own OTel collector components, k8s custom resources and k8s operators
Ready to build what’s next? Let’s get in motion.
Come As You Are
Belonging isn’t a perk here. It’s the baseline. We work across time zones and backgrounds, knowing the best ideas come from different perspectives. And we make space for everyone to lead, grow, and challenge what’s possible.
We’re proud to be an equal opportunity workplace. Employment decisions are based on job-related criteria, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by law.
About Confluent
Confluent is pioneering a fundamentally new category of data infrastructure focused on data in motion. Our cloud-native offering is the foundational platform for data in motion, designed to be the intelligent connective tissue enabling real-time data, from multiple sources, to constantly stream across the organization. With Confluent, our customers can meet the new business imperative of delivering rich, digital customer experiences and real-time business operations. Our mission is to help every organization harness data in motion so they can compete and thrive in the modern world.