LLM Infrastructure Data Scientist
About the Role
We are seeking a highly motivated LLM Infrastructure Data Scientist to help us build, scale, and optimize the infrastructure powering our large language model (LLM) development and deployment. You will work at the intersection of data science, systems engineering, and machine learning, with a focus on improving the efficiency, reliability, and performance of large-scale model training and inference.
Key Responsibilities
- Analyze and optimize end-to-end training and inference pipelines for LLMs, focusing on performance, cost, and scalability.
- Monitor and diagnose model infrastructure issues across GPUs/TPUs, distributed training systems, and data pipelines.
- Design and implement robust telemetry, logging, and metrics collection systems for LLM workloads.
- Work closely with ML engineers, research scientists, and infrastructure teams to drive data-informed decisions.
- Develop dashboards and automated reports to track key metrics such as utilization, throughput, model convergence, and error rates.
- Run A/B experiments on infrastructure changes and evaluate their impact using rigorous statistical methods.
- Contribute to the design and testing of new systems for model versioning, reproducibility, and governance.
Minimum Qualifications
- Bachelor’s or Master’s degree in Computer Science, Statistics, Applied Mathematics, or a related field.
- 3+ years of experience in data science, ML infrastructure, or related technical roles.
- Strong skills in Python and data analysis libraries (e.g., Pandas, NumPy, Scikit-learn).
- Experience with distributed computing frameworks (e.g., Ray, Spark, or Dask).
- Familiarity with ML model training workflows, especially deep learning and LLMs (e.g., using PyTorch, TensorFlow, or JAX).
- Experience with visualization tools (e.g., Plotly, Dash, Grafana, or Tableau).
- Knowledge of cloud infrastructure (AWS, GCP, Azure) and containerized environments (Docker, Kubernetes).
Job Type: Full-time
Pay: $40,000.00-$60,000.00 per year
Experience:
- Machine learning: 1 year (preferred)
About Luxolis
AI-Powered 2D and 3D Vision Solutions
Luxolis AI provides a comprehensive suite of services and cutting-edge AI-powered 2D and 3D Vision Solutions. From system planning to vision implementation, defect detection, and hardware solutions, we offer end-to-end support to optimize manufacturing processes and enhance product quality. With our innovative technologies and expert guidance, businesses can achieve operational excellence, improve efficiency, and gain deep insights into their physical environments.
Our in-house product suite includes the 3D Vision Camera, Depth Camera, 3D Capture, and 3D Connect platform, tailored to improve efficiency and accuracy while minimizing operational disruptions.
Luxolis AI’s smart solutions empower businesses to achieve greater flexibility, accuracy, and cost-effectiveness in their production processes.