Much of The Hyve's day-to-day business can be summarized as helping customers make their biomedical data FAIR. In practice, this means that we have now executed dozens of projects harmonizing biomedical data, with healthcare sources ranging from local GP offices to clinical trials run by the largest pharmaceutical companies in the world, and with biology data ranging from whole genome sequencing to physical activity data. In the course of these projects, we've also encountered and worked with a dozen different models for representing clinical data, such as i2b2/tranSMART, OMOP, FHIR, RDF, CDISC SDTM and ODM. Using a standard, instead of just an Excel file and codebook, greatly helps to implement FAIR principle R1.3: (meta)data meet domain-relevant community standards.
In this blog post, I will address some of the questions that people often ask me about common data models for FAIR biomedical data.
"How do you choose which data model applies to your data management process?"
This is not an easy question to answer. It depends on many different factors, such as the nature (observational? interventional?) and source of the data, update frequency, intended usage, current and expected data modalities, and applicable rules, regulations and best practices. A nice example of a context-based comparison of data models is the recent EMA report comparing OMOP and Sentinel for European healthcare data. But of course there are general patterns: a national health data network will take a different data integration approach than a bench-side cell line experiment. I would like to share a few insights based on questions we frequently get about common data models as a step towards making biomedical and healthcare data FAIR.
"What models can you recommend?"
Let's look at a generalized use case, where we try to find a data model and tooling to harmonize data from several biomedical studies (say clinical trials or investigator-initiated studies) for analytics purposes. Important questions here are who the beneficiaries of this data integration are and what type of analysis they are planning to do with the data (the 'use case' in IT lingo). Especially when starting out, you should try to demonstrate and maximize the utility of the approach directly to the end users (whether these are translational medicine scientists, department heads, medical doctors, citizens/patients etc.). If you approach The Hyve with such a request, we can help you find a suitable technical integration approach, for example:
- using OHDSI tooling to query observational health datasets in a uniform way, leveraging the OMOP data model
- using an interface such as Glowing Bear on top of conformed clinical trial data using the i2b2/tranSMART data model
- using Avro schemas from RADAR-BASE to persist data from mobile questionnaires and wearable sensors
- using cBioPortal to provide oncology researchers with direct access to the integrated clinical (survival status, cancer staging etc.) and omics (SNPs, CNVs etc.) data
- using PhUSE or a similar RDF-based approach to ensure that the data is queryable using semantic web and search tools such as Disqover
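To make the idea of harmonization a bit more concrete, here is a minimal Python sketch of what mapping two studies onto a common shape can look like. The two source studies, their column names and their codebooks are made up for illustration. The target fields (`person_source_value`, `gender_concept_id`, `year_of_birth`) are a small subset of the OMOP `person` table, and 8507/8532 are the OMOP standard concepts for male and female. A real conversion would involve far more than this (vocabulary mapping, ETL tooling, data quality checks), so treat it as a sketch, not as how The Hyve or OHDSI implements it.

```python
# Hypothetical source records: each study has its own column names and codes.
study_a = [
    {"subject": "A-01", "sex": "M", "birth_year": 1961},
    {"subject": "A-02", "sex": "F", "birth_year": 1958},
]
study_b = [
    {"pid": 17, "gender": 2, "yob": "1975"},  # codebook: 1 = male, 2 = female
    {"pid": 18, "gender": 1, "yob": "1980"},
]

# OMOP standard concept IDs for gender.
MALE, FEMALE = 8507, 8532


def from_study_a(rec):
    """Map a study A record onto a simplified subset of the OMOP person table."""
    return {
        "person_source_value": rec["subject"],
        "gender_concept_id": {"M": MALE, "F": FEMALE}[rec["sex"]],
        "year_of_birth": rec["birth_year"],
    }


def from_study_b(rec):
    """Map a study B record onto the same target shape."""
    return {
        "person_source_value": f"B-{rec['pid']}",
        "gender_concept_id": {1: MALE, 2: FEMALE}[rec["gender"]],
        "year_of_birth": int(rec["yob"]),
    }


# After mapping, both studies live in one list with one vocabulary.
persons = [from_study_a(r) for r in study_a] + [from_study_b(r) for r in study_b]


def count_by_gender(rows):
    """One query that works across all harmonized sources."""
    counts = {}
    for row in rows:
        key = row["gender_concept_id"]
        counts[key] = counts.get(key, 0) + 1
    return counts


print(count_by_gender(persons))
```

The point is that the per-study logic lives only in the mapping functions. Once the data is in the common shape, the analysis code no longer needs to know which study a record came from.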
"Which services does The Hyve provide?"
We help academic hospitals and our big pharma customers improve their level of data harmonization, which benefits analytics and data reuse per the FAIR principles. This work ranges from a proof of concept (POC) based on a few studies to a company-wide strategy for making R&D data FAIR. In this outtake from a Pistoia webinar, I discuss a few examples (see the video below). In the rest of this post, I will highlight OHDSI as one example.