About the role
DRW is a diversified trading firm with over three decades of experience bringing sophisticated technology and exceptional people together to operate in markets around the world. We value autonomy and the ability to quickly pivot to capture opportunities, so we operate using our own capital and trade at our own risk.
Headquartered in Chicago with offices throughout the U.S., Canada, Europe, and Asia, we trade a variety of asset classes including Fixed Income, ETFs, Equities, FX, Commodities and Energy across all major global markets. We have also leveraged our expertise and technology to expand into three non-traditional strategies: real estate, venture capital and cryptoassets.
We operate with respect, curiosity and open minds. The people who thrive here share our belief that it's not just what we do that matters; it's how we do it. DRW is a place of high expectations, integrity, innovation and a willingness to challenge consensus.
We are looking for an HPC Specialist to join our AI and Multi Asset Systematic Strategies team. This team builds and operates GPU infrastructure that powers AI and ML workloads. You'll work on the infrastructure stack from bare metal to model serving, combining systems engineering, performance optimization, and infrastructure automation to solve complex problems at the intersection of hardware, networking, and distributed systems.
Responsibilities:
- Deploy, maintain, and optimize GPU infrastructure for large-scale LLM inference workloads, including provisioning, configuration, and deployment of GPU server fleets.
- Architect and implement distributed serving solutions for multi-node, multi-GPU model deployments.
- Manage GPU-enabled Kubernetes clusters for LLM and ML workloads.
- Configure network infrastructure including load balancers, firewalls, and inter-node communication for GPU clusters.
- Implement and optimize storage solutions for model weights and inference caches.
- Troubleshoot performance bottlenecks across the stack: hardware, drivers, networking, and application layer.
- Research and evaluate emerging GPU technologies, model serving frameworks, and infrastructure optimizations.
- Collaborate with ML engineers to profile model performance and implement inference acceleration techniques.
- Drive reliability improvements through monitoring, alerting, capacity planning, and incident response.
Requirements:
- Bachelor's or Master's degree in Computer Science, Systems Engineering, or a related field.
- 5+ years in DevOps, SRE, or infrastructure engineering roles.
- Strong experience with GPU infrastructure, model serving frameworks (vLLM, SGLang), and GPU driver management.
- Hands-on experience optimizing deep learning workloads (inference or training) on GPU clusters.
- Deep Linux systems knowledge including network configuration, storage optimization, and Kubernetes orchestration.
- Experience with infrastructure as code tools (Ansible, Terraform, or similar).
- Strong understanding of distributed systems, networking protocols (TCP/IP, HTTP/2), and load balancing.
- Proficiency in Python and Bash scripting for automation.
- Experience with monitoring and observability tools (Prometheus, Grafana, or similar).
For more information about DRW's processing activities and our use of job applicants' data, please view our Privacy Notice at https://drw.com/privacy-notice.
California residents, please review the California Privacy Notice for information about certain legal rights at https://drw.com/california-privacy-notice.
About DRW
At DRW, we identify and capture trading and investment opportunities globally. What sets us apart is our diversified approach—trading across many asset classes and instruments, in markets around the world, with horizons from seconds to years. We succeed by leveraging technology, research and risk management.
We offer the best of both worlds: the opportunity and spirit of a startup and the benefits and stability of an established, experienced firm. Our employees work hard to solve interesting problems, and their results are rewarded. We value continuous learning from our outcomes, from the environment and from each other. It’s a place of high expectations, deep curiosity, and constant collaboration, with some of the smartest, most passionate people you’ll meet.