Distributed Storage Expert
Remote
Cambridge
£95,757 - £164,154 per year
Senior level
About the role
Who you are
- Strong Linux system administration background
- Knowledge of GPU compute environments or AI training infrastructure
- Proven experience installing, configuring, and maintaining Ceph clusters or similar technologies in a production environment
- Experience with monitoring and observability tools (Prometheus, Grafana, etc.)
- Familiarity with distributed filesystems (e.g. Lustre, BeeGFS) and cloud-based storage services (e.g. Amazon EBS, EFS)
- Contributions to open-source storage, data management, or infrastructure projects
- Experience with tiered storage management and lifecycle data policies
- Familiarity with object storage systems (S3, RADOS Gateway, MinIO, etc.)
- Scripting and automation proficiency (e.g. Bash, Python, Terraform/OpenTofu, Ansible)
- Understanding of data security best practices and compliance considerations
- Experience working with container technologies (e.g. Docker, Kubernetes) and image storage registries
- Strong analytical, troubleshooting, communication and documentation skills
What the job involves
- We support technology-focused startups, each with unique data management challenges, and are seeking an experienced Storage Architect to help them design, deploy, and maintain high-performance storage systems for their AI and data-driven workloads. The successful candidate will combine deep experience architecting and managing distributed, cloud, and tiered storage solutions with strong Linux and automation skills.
- In this role you will:
- Design, implement, and maintain storage platforms that support large-scale AI and data pipelines
- Manage distributed storage systems such as Ceph, Lustre, or BeeGFS
- Oversee tiered storage architectures, optimizing data movement across high-performance, object, and archival tiers
- Ensure data integrity, availability, and security across on-premises and cloud environments
- Develop automation and monitoring tools using Bash, Python, or similar scripting languages
- Manage and secure container images and related storage used for AI and ML workloads
- Integrate storage systems with public cloud services (AWS, Azure, GCP) and hybrid environments
- Troubleshoot complex storage and data flow issues, collaborating closely with AI platform and infrastructure teams
- Contribute to ongoing architecture improvements, performance tuning, and capacity planning
The application process
- We work on many joint projects with our sister organization, CommonAI CIC. By applying for this role, you give us permission to share your details with CommonAI CIC so that we may consider you for open roles across both organizations. If you prefer not to have your details shared with CommonAI CIC, or would not like to be considered for other roles, please let us know when you apply. This will not affect your application for this role.