Top Benefits
Health, dental, vision, and life insurance coverage
401(k) plan with up to 5% employer match
Free tax advice via Carta
About the role
Who you are
- Experience with HPC programming and accelerator languages such as CUDA, OpenCL, or SYCL
- In-depth knowledge of low-level (micro)architectural performance (required)
- 4+ years of experience working on complex code and systems
- Experience with performance modeling and performance data analysis
- Understanding of parallelization techniques for ML/HPC acceleration
- Deep interest in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
- Experience with the layout optimizations found in libraries such as CUTLASS and CuTe
- Experience with performance profilers, performance data analysis tools, visualization tools, and debuggers, or experience working with embedded systems
- Experience working with distributed/parallel programming models and an understanding of parallel hardware
- We organize regular team onsites and local meetups in Los Altos, CA, as well as in other cities. Traveling 2-4 times a year is expected for all roles
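To give a flavor of the low-level microarchitectural performance work this role calls for, here is a minimal, illustrative C++ sketch (not Modular's code; all names are hypothetical) of loop tiling, a classic kernel optimization that keeps a small working set of the input matrices resident in cache and reduces DRAM traffic for large problem sizes:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tiled matrix multiply: C += A * B for row-major N x N matrices.
// Iterating over T x T blocks reuses each loaded tile of A and B many
// times while it is still hot in cache, unlike the naive triple loop.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t N, std::size_t T = 64) {
    for (std::size_t ii = 0; ii < N; ii += T)
        for (std::size_t kk = 0; kk < N; kk += T)
            for (std::size_t jj = 0; jj < N; jj += T)
                // Compute one T x T tile of C from tiles of A and B.
                for (std::size_t i = ii; i < std::min(ii + T, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + T, N); ++k) {
                        float a = A[i * N + k];
                        for (std::size_t j = jj; j < std::min(jj + T, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

In production kernels the tile size would be chosen per target (cache sizes, register file, vector width), which is exactly the kind of microarchitectural tuning the bullets above describe.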
What the job involves
- ML developers today face significant friction in taking trained models into deployment
- They work in a highly fragmented space, with incomplete, patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements
- At Modular, we are building the next-generation AI platform that will radically improve the way developers build and deploy AI models
- As an AI Kernel Engineer, you'll be instrumental in building our core platform that enables developers to leverage deployment-specific optimizations across diverse model families and frameworks
- As a kernel engineer you will be crafting high-performance kernels for CPUs, GPUs, and emerging hardware architectures
- You'll drive critical performance improvements through kernel optimizations that:
  - Reduce model latency and maximize throughput
  - Minimize communication overhead and optimize resource utilization
  - Streamline memory usage through reduced activation volumes
  - Enhance data pre- and post-processing pipelines
  - Deliver measurable end-to-end performance gains across our model ecosystem
- Design and optimize high-performance ML numeric and data manipulation kernels/operators
- Utilize low-level C/C++/assembly programming to achieve state-of-the-art performance. Your work may also entail introducing novel compiler and tooling support
- Work with compiler, framework, runtime and performance teams to deliver end-to-end performance that fully utilizes today’s complex server and mobile systems
- Collaborate with architects and hardware engineers to co-design future accelerators, including defining ISA extensions for new hardware features
- Collaborate with machine learning researchers to guide system development for future ML trends
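One concrete example of "streamlining memory usage through reduced activation volumes" is kernel fusion. This illustrative C++ sketch (hypothetical names, not Modular's code) contrasts an unfused bias-add + ReLU, which makes two passes over the activation buffer, with a fused version that makes one:

```cpp
#include <algorithm>
#include <vector>

// Unfused: two passes over the activations. The intermediate result of
// the bias add is written to memory and then re-read by the ReLU pass.
void bias_relu_unfused(std::vector<float>& x, float bias) {
    for (float& v : x) v += bias;             // pass 1: bias add
    for (float& v : x) v = std::max(v, 0.0f); // pass 2: ReLU
}

// Fused: one pass. The intermediate value stays in a register, roughly
// halving memory traffic and eliminating the intermediate activation.
void bias_relu_fused(std::vector<float>& x, float bias) {
    for (float& v : x) v = std::max(v + bias, 0.0f);
}
```

On memory-bound element-wise operators like these, the fused form's benefit scales directly with the size of the activation buffer it avoids round-tripping through memory.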
Benefits
- A variety of health benefits, including health, dental, vision, and life insurance
- A 401(k) plan with up to 5% match
- Free tax advice on Carta
- Generous work-from-home stipend of $1500 to help you improve your home office
- Unlimited paid time off and flexible work hours