GPU Performance Engineer
Remote
Cambridge
£94,642 - £175,764 per year
Mid level
About the role
Who you are
- Strong experience with GPU computing and performance analysis
- Experience with large-scale or distributed GPU systems
- Hands-on experience running and interpreting AI benchmarks and workloads
- Familiarity with scientific computing or HPC workloads
- Proven background in performance optimization and profiling
- Experience developing internal tooling or benchmarking frameworks
- Solid understanding of the full stack, including GPU architectures and memory hierarchies; drivers, runtimes, and system software; and AI frameworks and numerical libraries
- Knowledge of modern AI workloads (training and inference) and model architectures
- Strong coding and debugging skills (e.g., C/C++, Python, CUDA or similar)
- Experience with Linux systems and low-level debugging
- Experience using performance analysis and profiling tools
- Ability to reason about complex systems and explain performance behavior clearly
What the job involves
- We are seeking a GPU Performance Engineer to evaluate, analyze, and optimize performance across AI workloads and scientific computing applications. In this role, you will run and develop open-source AI benchmarks, build tooling to automate benchmarking and result collection, and deeply analyze performance results to identify and resolve bottlenecks across the full hardware and software stack
- This position is ideal for someone who enjoys low-level performance work, hands-on experimentation, and turning complex performance data into actionable insights
- Run, develop, and maintain open-source AI benchmarks (training and inference workloads) as well as custom AI and scientific computing workloads
- Design and implement benchmarking and automation tools to execute workloads, collect results, and ensure reproducibility
- Analyze and interpret performance data to identify compute, memory, communication, and I/O bottlenecks
- Perform performance optimization across models, kernels, libraries, and system configurations
- Troubleshoot performance issues across the full hardware/software stack, including GPUs, CPUs, interconnects, drivers, runtimes, and frameworks
- Collaborate with researchers, engineers, and systems teams to improve performance and efficiency
- Document findings and clearly communicate performance insights and recommendations
The application process
- We start with a 30-minute screening call
- We then follow with a 45-minute introductory call with Dr Rosemary Francis, our CTO
- Finally, a 2-hour technical deep dive with the other team members
- During the hiring process we may ask you technical questions about any technologies or experience that you list on your CV. You will be expected to write short snippets of code to solve specific problems
- We work on many joint projects with our sister organization, CommonAI CIC. By applying for this role you give us permission to share your details with CommonAI CIC so that we may consider you for open roles across both organizations. If you prefer not to have your details shared with CommonAI CIC or would not like to be considered for other roles then please let us know when you apply. This will not affect your application for this role