Senior Machine Learning Engineer, Post Training & Speculative Decoding
About the role
Mission: We are seeking a highly skilled Machine Learning Engineer to join our advanced model development team. This role focuses on pre-training, continued training, and post-training of models, with a particular emphasis on draft model optimization for speculative decoding and quantization-aware training (QAT). The ideal candidate has deep experience with training methodologies, open-weight models, and performance tuning for inference.
Responsibilities & Opportunities In This Role
- Lead pre-training and post-training efforts for draft models tailored to speculative decoding architectures.
- Conduct continued training and post-training of open-weight models for non-draft (standard) inference scenarios.
- Implement and optimize quantization-aware training pipelines to enable low-precision inference with minimal accuracy loss.
- Collaborate with model architecture, inference, and systems teams to evaluate model readiness across training and deployment stages.
- Develop tooling and evaluation metrics for training effectiveness, draft model fidelity, and speculative hit-rate optimization.
- Contribute to experimental designs for novel training regimes and speculative decoding strategies.
Ideal Candidates Have/Are
- 5+ years of experience in machine learning, with a strong focus on model training.
- Proven experience with transformer-based architectures (e.g., LLaMA, Mistral, Gemma).
- Deep understanding of speculative decoding and draft model usage.
- Hands-on experience with quantization-aware training, including PyTorch QAT workflows or similar frameworks.
- Familiarity with open-weight foundation models and continued/pre-training techniques.
- Proficient in Python and ML frameworks such as PyTorch, JAX, or TensorFlow.
Preferred Qualifications
- Experience optimizing models for fast inference and sampling in production environments.
- Exposure to distributed training, low-level kernel optimizations, and inference-time system constraints.
- Publications or contributions to open-source ML projects.
Attributes Of a Groqster
- Humility - Egos are checked at the door
- Collaborative & Team Savvy - We make up the smartest person in the room, together
- Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
- Curious & Innovative - Take a creative approach to projects, problems, and design
- Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking
Compensation: Groq is committed to providing competitive compensation through our Total Cash philosophy, which incorporates potential bonus value directly into base pay. The total cash salary range for this position, which is inclusive of the potential bonus value, is $196,800–$266,200 USD, with individual placement determined by your geographic location, experience, skills, and alignment with internal compensation standards. This range is specific to candidates located in the United States. Compensation for international candidates will vary based on local market dynamics. Beyond cash compensation, Groq also offers equity participation and a robust suite of employee benefits.
This position may require access to technology and/or information subject to U.S. export control laws and regulations, as well as applicable local laws and regulations, including the Export Administration Regulations (EAR). To comply with these requirements, candidates for this role must meet all relevant export control eligibility criteria.
About Groq
Groq is fast AI Inference. The Groq LPU™ AI Inference Technology delivers exceptional compute speed, quality, and energy efficiency. Groq, headquartered in Silicon Valley, provides cloud and on-prem inference at scale for AI applications. The LPU and related systems are designed, fabricated, and assembled in North America. Learn more and try Groq speed today at groq.com.