Research Data Engineer GEMINI Data Quality
About the role
This role will contribute to the Vital real-time health data for Trials, Artificial Intelligence and a Learning Health System (VITAL) platform, a federally funded, secure, high-performance computing environment being developed in Ontario, Quebec and Alberta. VITAL will allow hospital data to be collected every 24 hours, linked, de-identified, and accessed by researchers through a unified portal. VITAL expands upon the GEMINI platform at Unity Health Toronto, a collaborative data and analytics platform that enables more than 35 Ontario hospitals to accelerate research and quality improvement in support of excellent hospital care.
The VITAL team is seeking an experienced Data Engineer to join this innovative network. The Data Engineer will extract, transform and load data from source systems (EHRs, administrative databases, registries, etc.) into repositories or pipelines so that the data can be used for analytics (reporting, business intelligence, statistical modeling, machine learning, etc.). The role is end-to-end: it covers data collection, requirements gathering and database analysis, pipeline design and programming, and ongoing QA and maintenance of data pipelines and target data systems.
The incumbent will play an integral role liaising with various departments at Unity Health Toronto and across VITAL's collaborating institutions and hospitals, driving operational excellence for both GEMINI and VITAL. This will be a challenging, rewarding, and fast-paced environment involving interaction with clinicians, researchers, data scientists, hospital IT teams, and hospital leaders. The ideal candidate will have an exceptional understanding of health data systems and technologies, excellent skills in working with inter-institutional stakeholders, and a strong sense of initiative.
Duties and Responsibilities:
Requirements gathering and data analysis
Engages with stakeholders to understand their analytic/research/business needs, and works collaboratively with them to develop and validate concrete business requirements for data collection and data pipeline design
Documents business requirements as needed
Carries out analysis of source system databases to discover the structure, flows, functions, and interdependencies of data within the system using custom-written SQL scripts or an enterprise tool
Documents findings and presents them to stakeholders
Data pipeline development
Based on requirements and analysis findings, conducts detailed interviews with subject matter experts, data scientists, and hospital IT teams to support data collection and inform the design of a logical and scalable pipeline model
Documents the data pipeline model using mock-ups or partial outputs as appropriate, and validates the model with stakeholders
Constructs and automates the data pipeline from the validated model
Leverages HL7/FHIR data standards to ingest standardized clinical data into target data models and pipelines
Designs and implements robust error and exception handling procedures
Documents data pipeline architecture
Monitoring and troubleshooting of data pipelines and target data systems
Routinely monitors system logs/alerts for error and exception detection
Manually runs jobs/restarts pipelines when automation fails
Maintains ongoing quality assurance to correct for data drift
Collaborates with the IT Security team to monitor and protect pipelines against data leakage and ensure compliance with healthcare data privacy policies and regulations
Patches/updates data pipeline software, scripting tools, etc. as needed
Routinely monitors data pipeline health and optimizes code when necessary
Qualifications:
Undergraduate degree in Computer Science, Engineering, Biostatistics, or a related discipline, along with relevant experience managing large-scale projects, ideally involving healthcare data systems and datasets (e.g., EHR, CIHI, clinical registries)
Extensive knowledge of the design and development of data pipelines required
Mastery of SQL and of at least one of R or Python required; working knowledge of relational database systems (Postgres, MySQL, etc.) preferred
Experience designing and running testing scenarios required
Knowledge of Linux commands and shell scripting required
Working knowledge of version control and collaboration platforms (GitHub, GitLab, etc.) required
Good knowledge of data warehousing and experience with BI/DW concepts (e.g., facts, dimensions, star/snowflake schemas, 3NF modeling, metadata management) required
Knowledge of HL7 and FHIR standards, integration engines, and FHIR servers preferred
Familiarity with cloud architecture (AWS, Azure, GCP, etc.) preferred
Experience with data orchestration (e.g., Airflow) and containerization technologies preferred
Good judgment about which issues to escalate and which to resolve independently, and the ability to suggest possible resolutions, required
Ability to learn new technology quickly
Ability to maintain responsibility for all aspects of pipeline development, automation and improvements/operations for a project
Ability to work effectively in a team environment and across all organizational levels, where flexibility, collaboration and adaptability are important
Note: As part of our recruitment process, automated tools may be used to assist in the initial review of application materials.
Unity Health Toronto is committed to creating an accessible and inclusive organization. We strive to provide a recruitment process that is barrier-free and in compliance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code. We understand that you may require an accommodation at any stage of the recruitment process. When you are contacted, please inform the Talent Acquisition Specialist and we will work with you to meet your accommodation needs. We want to emphasize that all accommodation requests are handled with the utmost confidentiality, respecting your privacy and dignity.
About Unity Health Toronto
Unity Health Toronto, composed of Providence Healthcare, St. Joseph's Health Centre and St. Michael's Hospital, works to advance the health of everyone in our urban communities and beyond. Our health network serves patients, residents and clients across the full spectrum of care, spanning primary care, secondary community care, and tertiary and quaternary care services, through to post-acute care, rehabilitation, palliative care and long-term care, while investing in world-class research and education.