Senior Infrastructure Engineer
About the role
Who you are
- You thrive on high-impact engineering, love solving complex distributed systems challenges, and want to shape the foundation of one of the most widely used developer platforms on the planet
- Strong software development skills in Go, Python, Java or similar (design, testing, and code review)
- Significant experience shipping and operating cloud infrastructure and distributed systems at scale in production (typically 5+ years of relevant work)
- Solid foundation in Linux, networking, and cloud security
- Excellent cross-team collaboration and strong written and verbal communication in a remote environment
- Kubernetes ecosystem and containerization (EKS, ingress, CNI, service mesh)
- Observability tooling (OpenTelemetry, Prometheus, Grafana)
- CI/CD & release automation (GitHub Actions, Argo CD)
- Cost optimization at scale (FinOps, capacity modelling)
What the job involves
- Our Infrastructure Engineering team is the backbone of Docker’s cloud-native platform, powering products such as Docker Hub, Docker Build Cloud, and Docker Scout for millions of developers worldwide
- We don’t just keep the lights on — we design, build, and operate the infrastructure services and platforms that make Docker fast, reliable, and secure at global scale
- We own and operate the core building blocks of Docker’s application platform:
  - Compute – Multi-tenant EKS clusters, autoscaling, and capacity management
  - Edge & Internal Networking – Ingress, rate limiting, VPN, and secure inter-cluster connectivity
  - Observability – End-to-end metrics, logs, tracing, probes, and alerting
  - Deployment – GitOps workflows powered by Argo CD
  - Security – IAM for services and humans, plus robust secret management
  - Cloud Infra Provisioning & FinOps – Automated cloud resource provisioning and cost transparency
- In this role you'll architect and run globally distributed platform services that hundreds of engineers rely on every day
- Continuously evolve our compute, edge, observability and deployment layers for maximum resilience, performance, and cost-efficiency at a global scale
- Lead with automation, Infrastructure as Code, and SLO-driven operations to deliver reliability through software, not toil
- Code first: we tackle infra problems with software, design docs, and rigorous code review
- Async & remote‑first: decisions are documented in RFCs; incident reviews are blameless and written
- Cross‑functional: platform, product, and security engineers collaborate daily to unblock each other
- Continuous improvement: we ship small, measure impact, and iterate quickly
- You'll design, develop, and ship internal platform services (e.g. provisioning, cost insights, rate‑limiting) in Go or Python
- Partner with product and engineering teams to provide paved‑road patterns for deployment, observability, and security
- Codify infrastructure with Terraform and Go; champion GitOps best practices
- Define and own SLOs, lead on‑call rotations, conduct blameless post‑mortems, and implement remediations
- Advance observability by operating metrics, logs, tracing, probes, and alerting pipelines at cloud scale
- Evolve Docker’s ingress stack—Envoy Gateway, ALB/NLB, AWS VPC CNI—to deliver secure, reliable, and cost‑efficient request routing
- Operate and scale multi‑tenant EKS clusters; guide the evaluation and adoption of new infrastructure technologies
- Build and operate self-serve cloud resource provisioning platforms at scale
- Deliver real-time cost visibility and lead company-wide cost-efficiency initiatives
- First 30 Days
  - Complete onboarding and build relationships across Engineering, Security, and Product
  - Ship your first Terraform or internal service change and shadow on-call
  - Gain a deep understanding of our platform architecture, SLOs, and current reliability initiatives
- First 60 Days
  - Take ownership of a critical service or infrastructure component
  - Lead a medium-complexity project from design to production
  - Rotate fully into the on-call schedule and confidently lead incident response when needed
- First 90 Days
  - Lead a high-impact project from design to production
  - Contribute to refining our platform roadmap and champion initiatives that reduce toil and accelerate delivery
- First Year
  - Lead the design and launch of a major, company-wide infrastructure initiative
  - Become a recognized subject matter expert in Docker’s cloud infrastructure
  - Mentor newer engineers and influence engineering culture through technical leadership and continuous improvement
Benefits
- 100% company-paid medical premiums for employees and dependents
- Flexible Time Off Policy
- “Whaleness” Days: at least one company-wide day off per month
- Employer Paid Holidays
- Generous Maternity and Parental Leave
- Home Office Setup Budget
- Monthly Technology Stipend
- Training Allowances
- Life and Disability Insurance
- Retirement Plans
- Virtual and In-Person Social Events
- Docker Swag
- Quarterly Hackathons
- Virtual Coffee with Co-Workers
About Docker, Inc.
At Docker, we simplify the lives of developers who are making world-changing apps. Docker helps developers bring their ideas to reality by conquering the complexity of app development. We simplify and accelerate workflows with an integrated development pipeline and application components. Actively used by millions of developers around the world, Docker Desktop and Docker Hub provide unmatched simplicity, agility and choice.