Top Benefits
Health, dental, vision, and life insurance coverage
401(k) plan with up to 5% employer match
Free tax advice via Carta
About the role
Who you are
- Experience with HPC programming and accelerator languages such as CUDA, OpenCL, or SYCL
- In-depth knowledge of low-level (micro)architectural performance (required)
- 4+ years of experience working on complex code and systems
- Experience with performance modeling and performance data analysis
- Understanding of parallelization techniques for ML/HPC acceleration
- Deep interest in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
- Experience with the layout optimizations found in libraries such as CUTLASS and CuTe
- Experience with performance profilers, performance data analysis tools, visualization tools, and debuggers, or experience working with embedded systems
- Experience working with distributed/parallel programming models and an understanding of parallel hardware
- We organize regular team onsites and local meetups in Los Altos, CA, as well as in other cities. Traveling 2-4 times a year is expected for all roles
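To give a flavor of the low-level microarchitectural performance work this role calls for, here is a minimal, illustrative C++ sketch (not Modular's code; all names are hypothetical) of loop tiling, a classic kernel optimization that keeps a small working set of the input matrices resident in cache and reduces DRAM traffic for large problem sizes:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tiled matrix multiply: C += A * B for row-major N x N matrices.
// Iterating over T x T blocks reuses each loaded tile of A and B many
// times while it is still hot in cache, unlike the naive triple loop.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t N, std::size_t T = 64) {
    for (std::size_t ii = 0; ii < N; ii += T)
        for (std::size_t kk = 0; kk < N; kk += T)
            for (std::size_t jj = 0; jj < N; jj += T)
                // Compute one T x T tile of C from tiles of A and B.
                for (std::size_t i = ii; i < std::min(ii + T, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + T, N); ++k) {
                        float a = A[i * N + k];
                        for (std::size_t j = jj; j < std::min(jj + T, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

In production kernels the tile size would be chosen per target (cache sizes, register file, vector width), which is exactly the kind of microarchitectural tuning the bullets above describe.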
What the job involves
- ML developers today face significant friction in taking trained models into deployment
- They work in a highly fragmented space, with incomplete, patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements
- At Modular, we are building the next-generation AI platform that will radically improve the way developers build and deploy AI models
- As an AI Kernel Engineer, you'll be instrumental in building our core platform that enables developers to leverage deployment-specific optimizations across diverse model families and frameworks
- As a kernel engineer you will be crafting high-performance kernels for CPUs, GPUs, and emerging hardware architectures
- You'll drive critical performance improvements through kernel optimizations that:
  - Reduce model latency and maximize throughput
  - Minimize communication overhead and optimize resource utilization
  - Streamline memory usage through reduced activation volumes
  - Enhance data pre- and post-processing pipelines
  - Deliver measurable end-to-end performance gains across our model ecosystem
- Design and optimize high-performance ML numeric and data manipulation kernels/operators
- Utilize low-level C/C++/assembly programming to achieve state-of-the-art performance. Your work may also entail introducing novel compiler and tooling support
- Work with compiler, framework, runtime and performance teams to deliver end-to-end performance that fully utilizes today’s complex server and mobile systems
- Collaborate with architects and hardware engineers to co-design future accelerators, including defining ISA extensions for new hardware features
- Collaborate with machine learning researchers to guide system development for future ML trends
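One concrete example of "streamlining memory usage through reduced activation volumes" is kernel fusion. This illustrative C++ sketch (hypothetical names, not Modular's code) contrasts an unfused bias-add + ReLU, which makes two passes over the activation buffer, with a fused version that makes one:

```cpp
#include <algorithm>
#include <vector>

// Unfused: two passes over the activations. The intermediate result of
// the bias add is written to memory and then re-read by the ReLU pass.
void bias_relu_unfused(std::vector<float>& x, float bias) {
    for (float& v : x) v += bias;             // pass 1: bias add
    for (float& v : x) v = std::max(v, 0.0f); // pass 2: ReLU
}

// Fused: one pass. The intermediate value stays in a register, roughly
// halving memory traffic and eliminating the intermediate activation.
void bias_relu_fused(std::vector<float>& x, float bias) {
    for (float& v : x) v = std::max(v + bias, 0.0f);
}
```

On memory-bound element-wise operators like these, the fused form's benefit scales directly with the size of the activation buffer it avoids round-tripping through memory.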
Benefits
- A variety of health benefits, including health, dental, vision, and life insurance
- A 401(k) plan with up to 5% match
- Free tax advice on Carta
- Generous work-from-home stipend of $1500 to help you improve your home office
- Unlimited paid time off and flexible work hours