Jobs.ca

Data Platform Developer

Newforma · 19 days ago
Hybrid
Quebec
Mid level
Full time

About the role

We're seeking a talented Data Platform Developer to join our Platform Engineering team and architect the data foundation that will power Newforma's next generation of AI-driven capabilities and analytics. You'll design and implement modern data architectures including medallion/lakehouse patterns, build event-driven data pipelines that process billions of project documents and communications in real-time, and create the analytics infrastructure that enables both business intelligence and AI/ML initiatives. This is a foundational role at an exciting time—as we migrate to AWS and invest heavily in AI, you'll establish the data practices and infrastructure that will serve the company for years to come.

Newforma manages billions of emails, documents, RFIs, submittals, drawings, and project files for thousands of construction projects worldwide. This rich dataset represents an incredible opportunity for AI-powered insights, intelligent automation, and advanced analytics. You'll build the data infrastructure to unlock this potential, creating pipelines that transform raw project data into clean, structured, and AI-ready datasets while also enabling real-time analytics and business intelligence. Working closely with our Director of AI Engineering and Platform Engineering team, you'll establish data architecture patterns that support everything from semantic search and RAG systems to executive dashboards and predictive analytics.

In this role, your responsibilities will include:

Data Architecture & Strategy

  • Design and implement medallion architecture (bronze, silver, gold layers) or lakehouse patterns on AWS to organize and transform data at scale
  • Establish data modeling standards, governance practices, and quality frameworks across the organization
  • Define data retention, archival, and lifecycle management policies for massive volumes of project data
  • Create reference architectures and best practices for data engineering across teams
  • Partner with the Director of AI Engineering to design data pipelines optimized for AI/ML workloads including vector embeddings and model training
  • Work with the Lead Software Architect to ensure data architecture aligns with overall platform strategy
  • Design data schemas and structures that support both analytical queries and AI applications
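The bronze/silver/gold layering above can be sketched in a few lines. This is a minimal, in-memory illustration of the pattern only; in practice each layer would live in S3 and be transformed by Glue jobs, and all record fields here are invented for the example.

```python
"""Minimal sketch of a medallion (bronze/silver/gold) flow.
In-memory records stand in for S3/Glue tables; fields are illustrative."""

from collections import defaultdict

# Bronze: raw events exactly as ingested, duplicates and all.
bronze = [
    {"doc_id": "d1", "project": "p1", "type": "RFI", "size_kb": "120"},
    {"doc_id": "d2", "project": "p1", "type": "rfi", "size_kb": None},
    {"doc_id": "d1", "project": "p1", "type": "RFI", "size_kb": "120"},  # duplicate
]

def to_silver(rows):
    """Silver: deduplicated, typed, normalized records."""
    seen, out = set(), []
    for r in rows:
        if r["doc_id"] in seen:
            continue
        seen.add(r["doc_id"])
        out.append({
            "doc_id": r["doc_id"],
            "project": r["project"],
            "type": r["type"].upper(),
            "size_kb": int(r["size_kb"]) if r["size_kb"] else 0,
        })
    return out

def to_gold(rows):
    """Gold: business-level aggregate (document count per project and type)."""
    agg = defaultdict(int)
    for r in rows:
        agg[(r["project"], r["type"])] += 1
    return dict(agg)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {('p1', 'RFI'): 2}
```

The key idea the sketch captures: bronze is never mutated, silver enforces types and uniqueness, and gold answers business questions directly.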

Event-Driven Architecture

  • Design and implement event-driven data architectures using AWS EventBridge, Kinesis, MSK (Kafka), SNS, and SQS
  • Build real-time data streaming pipelines that capture, process, and route project events across the platform
  • Architect event schemas and patterns for domain events (document uploads, email filing, RFI submissions, etc.)
  • Implement change data capture (CDC) patterns to stream database changes to data lakes and analytics systems
  • Design event-driven workflows that trigger AI processing, notifications, and downstream system updates
  • Establish event governance including versioning, documentation, and monitoring
  • Optimize event processing for low latency and high throughput at scale
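A versioned event envelope and type-based routing, as described above, might look like the following. This is an in-process stand-in for what EventBridge rules or Kafka topics would do in production; the event names and payload fields are assumptions for illustration.

```python
"""Illustrative domain-event envelope with explicit versioning, plus an
in-process router playing the role an EventBridge rule or Kafka consumer
group would play in production."""

from collections import defaultdict

def make_event(event_type, version, payload):
    """Versioned event envelope, as one might publish to a bus."""
    return {"type": event_type, "version": version, "payload": payload}

class EventRouter:
    """Routes events to subscribed handlers by event type."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self.handlers[event["type"]]:
            handler(event)

router = EventRouter()
processed = []
router.subscribe("document.uploaded",
                 lambda e: processed.append(e["payload"]["doc_id"]))

router.publish(make_event("document.uploaded", 1, {"doc_id": "d42"}))
print(processed)  # ['d42']
```

Carrying the version in every envelope is what makes the governance point above workable: consumers can accept or reject schema versions explicitly instead of breaking silently.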

Data Pipeline Development

  • Build robust, scalable ETL/ELT pipelines using AWS Glue, Step Functions, Lambda, and EMR
  • Develop data transformation jobs that cleanse, enrich, and structure unstructured project data
  • Implement data quality checks, validation rules, and monitoring throughout pipelines
  • Create reusable pipeline components and frameworks that teams can leverage
  • Optimize pipeline performance and cost efficiency for processing billions of documents
  • Handle diverse data formats including emails, PDFs, CAD drawings, images, and structured databases
  • Implement data lineage tracking and metadata management
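The quality-check bullet above can be made concrete with a small rule-based validator of the kind one might embed in a Glue or Lambda transform. The rules and field names here are made up for the example; real pipelines would quarantine failures to a dead-letter location rather than a list.

```python
"""Sketch of per-record quality checks inside a transform step:
records failing any rule are quarantined with the reasons attached."""

def check_record(rec, rules):
    """Return the names of all rules the record violates."""
    return [name for name, rule in rules.items() if not rule(rec)]

RULES = {
    "has_doc_id": lambda r: bool(r.get("doc_id")),
    "positive_size": lambda r: isinstance(r.get("size_kb"), int) and r["size_kb"] >= 0,
}

def run_pipeline(records):
    """Split records into clean rows and quarantined (record, reasons) pairs."""
    clean, quarantine = [], []
    for rec in records:
        failures = check_record(rec, RULES)
        (quarantine if failures else clean).append((rec, failures))
    return clean, quarantine

clean, bad = run_pipeline([
    {"doc_id": "d1", "size_kb": 10},
    {"doc_id": "", "size_kb": -5},
])
print(len(clean), len(bad))  # 1 1
```

Keeping the failure reasons alongside quarantined records is what feeds the monitoring and data-quality dashboards described later in this posting.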

Analytics & Business Intelligence

  • Design and build data warehouses and data marts using Amazon Redshift, Athena, or similar technologies
  • Create dimensional models and star schemas optimized for analytical queries
  • Build datasets and aggregations that power executive dashboards and operational reports
  • Implement BI solutions using tools like QuickSight, Tableau, PowerBI, or similar platforms
  • Partner with product and business teams to understand analytics requirements and deliver insights
  • Create self-service analytics capabilities that empower teams to explore data independently
  • Establish KPIs, metrics, and reporting frameworks for product and business analytics
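A star schema of the kind described above reduces to facts joined to dimensions. The sketch below uses SQLite purely as a stand-in for Redshift or Athena, and every table, column, and project name is invented for illustration.

```python
"""Tiny star-schema sketch: one fact table keyed to a project dimension,
with a typical analytical rollup. SQLite stands in for Redshift/Athena."""

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_project (project_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_document (doc_id TEXT, project_key INTEGER, size_kb INTEGER);
    INSERT INTO dim_project VALUES (1, 'Airport Terminal'), (2, 'Bridge Retrofit');
    INSERT INTO fact_document VALUES
        ('d1', 1, 120), ('d2', 1, 80), ('d3', 2, 300);
""")

# The kind of query a dashboard issues: documents and total size per project.
rows = conn.execute("""
    SELECT p.name, COUNT(*) AS docs, SUM(f.size_kb) AS total_kb
    FROM fact_document f JOIN dim_project p USING (project_key)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('Airport Terminal', 2, 200), ('Bridge Retrofit', 1, 300)]
```

The point of the dimensional split is visible even at this scale: facts stay narrow and append-only, while descriptive attributes live once in the dimension.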

AI/ML Data Infrastructure

  • Prepare and structure data to support AI initiatives including document classification, semantic search, and intelligent agents
  • Build pipelines for generating and storing vector embeddings for RAG (Retrieval-Augmented Generation) systems
  • Create training datasets and feature stores for machine learning models
  • Implement data versioning and experiment tracking for AI/ML workflows
  • Design scalable inference pipelines that serve AI models with fresh, contextualized data
  • Collaborate with the AI Engineering team to optimize data formats and access patterns for LLM applications
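The embedding-pipeline bullets above boil down to: chunk documents, embed each chunk, and store vectors with metadata. A hedged sketch follows; the `embed()` stub is a placeholder (a real pipeline would call a model endpoint), and all names and sizes are assumptions.

```python
"""Sketch of preparing a document for a RAG vector store: overlapping
chunks, each paired with a vector and retrieval metadata. embed() is a
placeholder for a real embedding model call."""

def chunk_text(text, size=50, overlap=10):
    """Fixed-size character chunks with overlap to preserve context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk):
    """Placeholder embedding: in practice, call a model endpoint."""
    return [float(ord(c)) for c in chunk[:4]]

def to_vector_records(doc_id, text):
    """Rows shaped for upsert into a vector database."""
    return [
        {"id": f"{doc_id}#{i}", "vector": embed(c), "text": c, "doc_id": doc_id}
        for i, c in enumerate(chunk_text(text))
    ]

records = to_vector_records(
    "rfi-7",
    "Contractor requests clarification on beam detail 4A." * 3,
)
print(len(records), records[0]["id"])  # 4 rfi-7#0
```

Storing the source `doc_id` and chunk index with each vector is what lets retrieval results be traced back to the original project document.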

Data Operations & Monitoring

  • Implement comprehensive monitoring, alerting, and observability for data pipelines and systems
  • Build data quality dashboards and anomaly detection systems
  • Create operational runbooks and documentation for data platform components
  • Optimize costs across data storage, processing, and querying
  • Ensure data security, encryption, and compliance with privacy regulations
  • Participate in on-call rotation to support production data systems
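The anomaly-detection bullet above can be as simple as a sigma test on pipeline metrics. This is one common baseline approach, not a prescription; the metric (daily row counts) and the threshold are illustrative.

```python
"""Sketch of a data-quality anomaly check: flag today's metric if it
deviates from the trailing history by more than N standard deviations."""

from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """True if today's value is a >threshold-sigma outlier vs. history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

daily_row_counts = [1000, 1020, 980, 1010, 995]
print(is_anomalous(daily_row_counts, 1005))  # False: within normal range
print(is_anomalous(daily_row_counts, 200))   # True: pipeline likely dropped data
```

Checks like this, run after every pipeline load, are what turn the monitoring bullets above from dashboards into actionable alerts.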

Collaboration

  • Collaborate with other Platform Engineering team members to deliver shared platform initiatives
  • Participate in agile ceremonies including daily stand-ups, sprint planning, and retrospectives
  • Work closely with development teams and the software architect to establish sound data engineering practices for newly developed features

Requirements for the position include:

  • 5+ years of experience in data engineering, analytics engineering, or related roles
  • Strong hands-on experience with AWS data services including S3, Glue, Athena, Redshift, Kinesis, EventBridge, Lambda, and EMR
  • Proven expertise designing and implementing event-driven architectures using streaming technologies (Kafka/MSK, Kinesis, EventBridge)
  • Experience building medallion architectures, lakehouse platforms, or similar modern data architectures (bronze/silver/gold patterns, Delta Lake, Iceberg)
  • Proficiency with SQL and database design including both relational (PostgreSQL, MySQL) and analytical databases (Redshift, Snowflake)
  • Strong programming skills in Python for data processing, transformation, and automation
  • Experience with data orchestration tools such as Apache Airflow, AWS Step Functions, or Prefect
  • Knowledge of data modeling techniques including dimensional modeling, star schemas, and data vault
  • Experience with analytics and BI tools (QuickSight, Tableau, PowerBI, Looker) and building reports/dashboards
  • Understanding of data quality, data governance, and master data management principles
  • Familiarity with infrastructure-as-code (Pulumi, Terraform, CloudFormation) for managing data infrastructure
  • Strong problem-solving skills and ability to optimize complex data workflows
  • Excellent communication skills with ability to explain technical concepts to diverse audiences
  • Team player who collaborates effectively across engineering, product, and business teams

Nice-to-have qualifications for this position include:

  • AWS certifications (AWS Data Analytics - Specialty, AWS Solutions Architect, or similar)
  • Experience with Azure data services (Data Factory, Synapse, Event Hubs) and Azure-to-AWS data migrations
  • Knowledge of real-time stream processing frameworks (Apache Spark Streaming, Flink, Kafka Streams)
  • Experience preparing data for AI/ML applications including vector databases (Pinecone, Weaviate, pgvector)
  • Familiarity with document processing, OCR, and unstructured data extraction techniques
  • Experience with data catalog and metadata management tools (AWS Glue Data Catalog, Alation, Collibra)
  • Knowledge of .NET/C# and integrating data pipelines with .NET applications
  • Understanding of SaaS multi-tenancy patterns in data architecture
  • Experience with data privacy and compliance frameworks (GDPR, SOC 2, CCPA)
  • Background in the AECO industry or project management domain
  • Familiarity with graph databases (Neptune, Neo4j) for relationship modeling
  • Experience with serverless data architectures and cost optimization strategies
  • Knowledge of dbt (data build tool) or similar transformation frameworks
  • Bilingual in English and French

About Newforma

Industry: Software Development
Company size: 51-200 employees

Newforma provides information management and collaboration software for the Architecture, Engineering, Construction, and Owner/Operator (AECO) industry. We empower AECO firms by delivering technology solutions that drive better project outcomes at every stage of the construction project lifecycle, from design to ribbon-cutting and beyond. Over 4.3M users across more than 16M projects worldwide have streamlined their communication, simplified their administration, and enabled real-time collaboration thanks to Newforma's platforms. Visit us at newforma.com.