Senior Research Engineer

Autodesk about 1 month ago

London, Toronto

Senior Level

About the role

Who you are

Bachelor’s degree in Computer Science, Electrical Engineering, Robotics, or related field (or equivalent practical experience)
4+ years of experience building computer vision systems using Python
Strong experience with deep learning for computer vision (detection, segmentation, and/or video understanding) using modern frameworks such as PyTorch
Experience taking ML prototypes into reliable pipelines, including evaluation, monitoring, and failure analysis
Experience building or integrating ML systems into cloud or backend workflows (batch processing and/or services)
Strong collaboration and communication skills; ability to work across teams and stakeholders
Experience with vision-language models (VLMs) and multimodal systems (for example: grounded vision, open-vocabulary recognition, retrieval-augmented multimodal reasoning)
Experience with multimodal fusion (combining imagery/video with metadata, documents, and sensor signals)
Experience with video pipelines (tracking, temporal aggregation, long-video processing)
Experience with real-world datasets, including data curation, labelling strategy, augmentation, and quality control under limited data constraints
Experience developing reusable platform components adopted across multiple teams

What the job involves

We are hiring a Senior Software Engineer focused on Computer Vision and Multimodal AI to build robust perception and understanding systems used across multiple teams and product areas.
You will develop end-to-end pipelines that transform images and video into structured, reliable observations by combining modern vision models with multimodal reasoning and contextual signals (for example: domain metadata, documents, and sensor inputs).
This role blends applied research with strong software engineering: rapid iteration, rigorous evaluation, and production-minded implementation for cloud-scale batch processing and interactive workflows.
Design, build, and improve multi-stage computer vision pipelines that may include segmentation, detection, tracking, and VLM-based analysis, producing structured outputs (entities, attributes, actions/events, confidence, provenance)
Build systems that handle real-world variability in visual inputs (for example: low resolution, poor lighting, motion blur, cluttered scenes, inconsistent capture devices)
Work with diverse media types such as photos, video, timelapse, 360 video, and RGB-D when available
Fuse visual evidence with contextual inputs such as metadata, documents, and sensor streams to improve recognition quality and reduce ambiguity
Evaluate and integrate state-of-the-art vision and vision-language foundation models, including open-vocabulary recognition, grounded perception, segmentation, and multimodal reasoning
Apply fine-tuning or adaptation approaches when needed; partner with ML teams on training, data strategy, and infrastructure best practices
Define measurable acceptance criteria and benchmarking for accuracy, robustness, latency/cost, and reliability across datasets and domains
Build scalable cloud workflows for batch processing and integrate outputs with APIs and downstream consumers
Improve operational performance and cost via batching, caching, model selection, and pipeline observability
Write maintainable code and contribute to design docs, code reviews, shared libraries, and cross-team technical decisions

What success looks like

Delivered an end-to-end system that ingests real-world image/video inputs and outputs a structured, queryable set of observations (objects plus activities/events), with clear accuracy and reliability metrics
Demonstrated robustness to common visual failure modes (lighting, occlusion, clutter, camera variation) and measurable improvements when contextual signals are available
Built a modular pipeline architecture (segmentation/detection/VLM reasoning components) that can be reused and extended across domains and teams
Maintained strong engineering quality: reproducible experiments, documented decisions, maintainable code, and dependable integrations

About Autodesk

10,000+
