Overview
The Open Targets Platform (OTP) is a resource that connects potential drug targets to diseases using evidence integrated from multiple data sources. These sources are highly diverse and include genetic evidence, expression data, and pathway and drug databases, among others. The Platform offers an Association Page, which displays one-to-many relationships between targets and diseases based on the integrated evidence, and a Profile Page, which delivers detailed, source-rich information about a specific target or disease. These features support researchers and bioinformaticians in making data-driven decisions for drug discovery and development.
Challenge
While the OTP provides robust out-of-the-box data, many users need to incorporate and analyze custom datasets, either from proprietary research or from niche sources not included in the default evidence base. Previously, integrating such custom data into OTP workflows required considerable manual effort and technical workarounds. The challenge was to make loading custom datasets easier.
Solution: Enhanced Data Loader
The Hyve developed a highly flexible, enhanced data loader, built to simplify and standardize the process of ingesting custom data into OTP:
- Custom Evidence Data: Load custom evidence either to supplement existing data sources or as a new evidence source, displayed as an additional column in the Associations On The Fly (AOTF) table.

- Custom Profile Data Support: The loader uses the built-in ClickHouse database to support any type of tabular custom data that can be displayed on the Profile Page, which helps with scalability and performance. For example, you can load and display custom cBioPortal omics data, as illustrated by our OTP instance.

- Flexible Preprocessing: New evidence can be passed through custom preprocessing workflows that clean, transform, or format the data before ingestion. For example, EBI’s Platform ETL backend transformer can be used to generate OTP-compliant evidence from raw custom data, which can then be loaded into the platform.
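
As an illustration of the preprocessing step, the following is a minimal sketch of a script that turns a raw CSV file into evidence-like JSON Lines records. It is not the loader's actual code. The input column names (gene_id, disease_id, score), the default source and datatype labels, and the function names are hypothetical, and the output field names are assumed to follow the public Open Targets evidence schema; check them against the schema version used by your instance.

```python
import csv
import io
import json


def rows_to_evidence(csv_text, datasource_id="custom_source",
                     datatype_id="custom_datatype"):
    """Convert raw CSV rows into evidence-like dicts.

    Rows with a missing identifier or a non-numeric score are skipped.
    Input column names are hypothetical; output keys are an assumption
    based on the public Open Targets evidence schema.
    """
    evidence = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = (row.get("gene_id") or "").strip()
        disease = (row.get("disease_id") or "").strip()
        try:
            score = float((row.get("score") or "").strip())
        except ValueError:
            continue  # empty or malformed score
        if not (target and disease):
            continue  # incomplete row
        evidence.append({
            "datasourceId": datasource_id,
            "datatypeId": datatype_id,
            "targetFromSourceId": target,
            # Assumption: disease IDs use an underscore (EFO_0000311).
            "diseaseFromSourceMappedId": disease.replace(":", "_"),
            "resourceScore": score,
        })
    return evidence


def to_json_lines(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


if __name__ == "__main__":
    sample = (
        "gene_id,disease_id,score\n"
        "ENSG00000141510,EFO:0000311,0.8\n"
        ",EFO:0000311,0.5\n"  # skipped: no target identifier
    )
    print(to_json_lines(rows_to_evidence(sample)))
```

Keeping the cleaning rules in one small function makes it straightforward to swap in a different transformer, such as the Platform ETL mentioned above, without changing the loading step.
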
To integrate custom datasets, the only requirements are:
- Raw custom data (in any parsable format, such as CSV and JSON)
- Preprocessing workflows/scripts (optional)
- ClickHouse schema files (in SQL format)
- OpenSearch document mapping (in JSON format)
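
To give a sense of what the last two items might look like, the sketch below generates a ClickHouse table definition and an OpenSearch mapping body from a single column specification. Everything in it is an assumption made for illustration: the table name, the column names and types, and the helper functions are hypothetical, and the exact file layout the loader expects may differ. Whether a given dataset needs one or both files is not specified here.

```python
import json

# Hypothetical column spec: (name, ClickHouse type, OpenSearch field type)
COLUMNS = [
    ("target_id", "String", "keyword"),
    ("sample_id", "String", "keyword"),
    ("alteration", "String", "keyword"),
    ("frequency", "Float64", "double"),
]


def clickhouse_ddl(table, columns, order_by):
    """Build a CREATE TABLE statement using the MergeTree engine."""
    cols = ",\n".join(f"    `{name}` {ch_type}" for name, ch_type, _ in columns)
    keys = ", ".join(f"`{key}`" for key in order_by)
    return (
        f"CREATE TABLE IF NOT EXISTS {table}\n"
        f"(\n{cols}\n)\n"
        f"ENGINE = MergeTree\n"
        f"ORDER BY ({keys});"
    )


def opensearch_mapping(columns):
    """Build an index mapping body with one typed property per column."""
    properties = {name: {"type": os_type} for name, _, os_type in columns}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    # "custom_profile" is a placeholder table name.
    print(clickhouse_ddl("custom_profile", COLUMNS, ["target_id"]))
    print(json.dumps(opensearch_mapping(COLUMNS), indent=2))
```

Deriving both files from one specification is a design choice of this sketch, intended to keep column names consistent between the two stores; the files can equally be written by hand.
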
Impact
The enhanced data loader lowers the technical barrier to enriching a privately deployed Open Targets Platform instance with public or proprietary datasets. Proprietary databases and in-house data lakes can be integrated into the Platform and combined with the existing Open Targets public data. Data types similar to existing ones are the easiest to add, but novel data types can also be added: for example, The Hyve added cBioPortal as a data source to Open Targets, demonstrating how private and public datasets from cBioPortal can be brought into drug discovery workflows.