OMOP CDM Conversion

Convert Source Data into Reliable OMOP CDM-Format Databases

Integrating data from diverse sources and running meaningful cross-dataset analytics depends on harmonizing the source datasets, which means adopting standardized vocabularies and a common data model. The OMOP CDM is a widely accepted standard for observational health data. At The Hyve, we offer a comprehensive data harmonization service that provides an efficient and cost-effective way to convert source datasets (e.g. EHR or claims data) into the OMOP CDM format. Our consultancy and data engineering experts ensure that the final OMOP data is reliable, accurate and fit for purpose.

Our approach

OMOP CDM as a Service

The Hyve’s Real World Data team manages the complex data conversion process, so that client teams can dedicate their efforts to evidence-generating data analysis rather than to time-consuming conversion management.

Our extract-transform-load (ETL) development service is designed to be user-friendly while remaining budget-conscious, ensuring a swift start to data analysis utilizing the OMOP CDM format. With our support, you can confidently transition into impactful analysis without the hassle of extensive conversion management, focusing on the insights that matter most.

ETL development framework

Using Delphyne, The Hyve's ETL development framework, to create multiple ETLs across various source datasets offers significant advantages:

  • Simplified ETL Management: Delphyne simplifies the ETL process, making it easier to handle large volumes of data and providing a robust solution for data conversion needs.

  • Efficient Transformation: Delphyne streamlines data conversion into the OMOP CDM format using reusable tools and standardized processes.

  • Data Quality Enhancement: Delphyne ensures consistent rules for data loading and OMOP CDM transformation, leading to overall improved data quality.

Iterative Data Quality

Data quality is an integral part of OMOP CDM data conversion projects. We follow an iterative process to ensure data quality at every step:

  • Profiling the source data (source data quality)

  • Designing, building, implementing and automating the ETL pipeline (ETL process quality)

  • Utilizing OHDSI data quality tools to assess the quality of the converted data in OMOP CDM format (target data quality)

  • Applying checks that compare source and mapped data to demonstrate the accuracy and reliability of the source-to-target mapping.
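One way to picture the last check is a simple coverage report that compares source records with their mapped concepts. The sketch below is illustrative only and is not part of Delphyne or the OHDSI tools: the function name, the input shapes and the concept IDs are assumptions. It relies on the OMOP convention that concept_id 0 means "no matching concept".

```python
from collections import Counter


def mapping_coverage(source_codes, code_to_concept):
    """Compare source records with their mapped concepts.

    source_codes: iterable of source code values, one per source record.
    code_to_concept: dict from source code to OMOP concept_id
                     (hypothetical mapping; 0 or missing means unmapped).
    Returns (fraction of records mapped, list of (unmapped code, count)).
    """
    counts = Counter(source_codes)
    total = sum(counts.values())
    # A code is unmapped if it is absent from the mapping or mapped to 0.
    unmapped = Counter(
        {code: n for code, n in counts.items() if code_to_concept.get(code, 0) == 0}
    )
    mapped = total - sum(unmapped.values())
    coverage = mapped / total if total else 0.0
    return coverage, unmapped.most_common()


if __name__ == "__main__":
    # Placeholder codes and concept IDs, for illustration only.
    codes = ["A", "A", "B", "C"]
    mapping = {"A": 1000001, "B": 0}
    coverage, unmapped = mapping_coverage(codes, mapping)
    print(f"Record-level coverage: {coverage:.0%}")
    print("Most frequent unmapped codes:", unmapped)
```

Reporting the most frequent unmapped codes first helps prioritize which source values to review in the next mapping iteration.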


Mapping the UK Biobank to the OMOP CDM

Delphyne OHDSI integration

The Hyve is a prominent OMOP/OHDSI services provider and a technology leader in the European Health Data Evidence Network (EHDEN), a pre-competitive initiative around OMOP/OHDSI. Within the framework of EHDEN, The Hyve was contracted by University King's College to facilitate access to the UK Biobank dataset for COVID-19 research. In this endeavor, The Hyve successfully converted the UK Biobank into the OMOP CDM, achieving extensive data mapping coverage despite several challenges. These obstacles included mapping non-standard ontologies, dealing with data heterogeneity from various data providers, converting wide-format tables, and adapting to a dynamic and evolving data source.



What people say about The Hyve

"The Hyve's acknowledgment of the FAIR principles and close connection with the OHDSI community result in an efficient support of open science with respect for data privacy and sensitivity. Our cooperation with The Hyve on harmonization of two health data sources within the international projects BigData@Heart and EHDEN was very smooth. Perfect communication, together with The Hyve’s experience in iterative and agile development using synthetic data helped to separate the development process and the ETL deployment on a client side."

Spiros Denaxas and Václav Papež

"The Hyve is one of Europe’s leading technology IT services providers who have established an international reputation within the biomedical informatics domain, from open standards such as OHDSI to working with FAIR principles. With their passionate leadership, they have been involved in numerous projects including the European Health Data & Evidence Network (EHDEN). I have no doubt that all of these projects have benefited greatly from their thinking, insights and hands-on expertise."

Nigel Hughes, Industry Lead at IMI EHDEN Project

"I approached The Hyve to develop the ‘progress visualisation’ aspect of RADAR-base − a dynamic graph that visualises weekly completion rates of tasks in real time. The whole journey from conception to completion was extremely efficient and well-managed, and I worked with a software engineer who was able to produce exactly what I had envisioned in a very quick turnaround."

Katie White, Research Assistant at King's College London

Ready to harmonize your data?

Convert source datasets into OMOP CDM format efficiently and affordably. Our consultancy and data engineering experts will ensure that the final OMOP data is reliable, accurate and fit for purpose.


What are the key aspects in the data harmonization process?

Projects focused on harmonizing source data typically involve the following stages and tools:

  • Source data profiling using WhiteRabbit (a part of the OHDSI toolkit maintained by The Hyve)

  • Syntactic mapping using Rabbit in a Hat (a part of the OHDSI toolkit maintained by The Hyve)

  • Semantic mapping using USAGI and ATHENA vocabularies

  • ETL development using The Hyve's proprietary Delphyne framework

  • Data quality verification using Data Quality Dashboard and Achilles
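To make the ETL stage more concrete, here is a minimal sketch of transforming one source row into an OMOP-style condition record. It does not use the Delphyne API. The source column names (patient_id, diagnosis_code, diagnosis_date), the lookup table and the concept IDs are assumptions for illustration, and only a subset of the condition_occurrence fields is shown.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source-code-to-concept lookup. In a real project this would
# come from a reviewed semantic mapping; the concept IDs are placeholders.
SOURCE_TO_CONCEPT = {"DX-001": 1000001, "DX-002": 1000002}


@dataclass
class ConditionOccurrence:
    """Subset of the OMOP CDM condition_occurrence fields."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def to_condition_occurrence(row: dict) -> ConditionOccurrence:
    """Map one source row (assumed column names) to a condition record."""
    code = row["diagnosis_code"]
    return ConditionOccurrence(
        person_id=int(row["patient_id"]),
        # Unmapped codes get concept_id 0, the OMOP "no matching concept".
        condition_concept_id=SOURCE_TO_CONCEPT.get(code, 0),
        condition_start_date=date.fromisoformat(row["diagnosis_date"]),
        # The original code is kept so the mapping can be audited later.
        condition_source_value=code,
    )


if __name__ == "__main__":
    source_row = {
        "patient_id": "42",
        "diagnosis_code": "DX-001",
        "diagnosis_date": "2020-03-01",
    }
    print(to_condition_occurrence(source_row))
```

Keeping the source value next to the mapped concept is what makes later source-to-target checks possible.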

What types of data can The Hyve team harmonize to OMOP CDM?

The Hyve specializes in harmonizing various types of source data to the OMOP CDM, including:

  • EHR

  • Healthcare claims databases

  • Biobank data

  • Registry data

Additionally, we assist clients in integrating OMOP data with real-world data from sources such as wearable devices, patient-reported outcomes data, and in some cases, even clinical trial data.

Are custom data fields or custom data models necessary?

That depends on the analytics use cases and data quality requirements. Where custom fields or custom model layers are needed, The Hyve team has substantial experience in providing consultancy and technical solutions for implementing them. These solutions are tailored to be fit for purpose, enabling novel analytics and evidence generation.

What is Delphyne?

Delphyne is a Python package designed to simplify and standardize the process of converting source data to the OMOP Common Data Model (CDM). It offers an easy setup for ETL development projects, flexible source data loading and transformation implementation, quick mapping of source values to standard OMOP concepts, and efficient data extraction and loading options (e.g. caching). Delphyne also includes a ready-to-use ETL framework (Delphyne-template), ensuring a cost-effective and streamlined ETL development process.

Does The Hyve need to get access to the source data?

Granting The Hyve access to the source data is optional. We understand the importance of maintaining the security of your data, and as an ISO 27001-certified company, The Hyve prioritizes the protection of sensitive information. We can work either with the source data directly or with an intermediary, fully privacy-preserving synthetic dataset that is used for ETL development purposes only. In short, we respect your preferences and collaborate in a way that aligns with your data security requirements.
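As a rough illustration of the synthetic-data route, the sketch below generates rows from per-field value frequencies, such as those a source data profile might summarize. It is an assumption-laden example, not The Hyve's actual method: the function name and input format are hypothetical, each field is sampled independently, and sampling of this kind is not in itself a privacy guarantee.

```python
import random


def synthesize_rows(field_profiles, n_rows, seed=0):
    """Generate synthetic rows from per-field value frequencies.

    field_profiles: dict of field name -> {value: frequency} (hypothetical
                    format, e.g. derived from a source data profile).
    Fields are sampled independently, so cross-field relationships in the
    real data are not reproduced.
    """
    rng = random.Random(seed)  # seeded for reproducible development data
    rows = []
    for _ in range(n_rows):
        row = {}
        for field, freq in field_profiles.items():
            values = list(freq.keys())
            weights = list(freq.values())
            row[field] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up profile, for illustration only.
    profiles = {
        "sex": {"F": 3, "M": 1},
        "diagnosis_code": {"DX-001": 5, "DX-002": 2},
    }
    for row in synthesize_rows(profiles, n_rows=3, seed=1):
        print(row)
```

Because the rows share the structure and value domains of the source, an ETL can be developed and tested against them before it is deployed where the real data resides.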