Data Modeling & Semantic Interoperability
FAIR-compliant, semantically meaningful data models ready for human and machine consumption.
Do you work for a life science organization? We can help you unlock the full potential of your data by making it structured, FAIR, and machine-actionable, then connecting it to proven open-source AI tools.
FAIR-compliant, semantically meaningful data models ready for human and machine consumption.
Aligning data to established biomedical ontologies — enabling interoperability, automated reasoning, and AI-assisted querying.
Connected, queryable graphs linking drugs, targets, pathways, and samples — navigable at scale by both analysts and AI systems.
Understand what you have, how it’s structured, and where the gaps are, before any AI talk begins.
Modeling, FAIR alignment, ontology standardization. The work that determines whether AI delivers.
Connect to Agents, implement MCP, configure BioChatter — proven tools. No black boxes, no vendor lock-in.
Check whether the integrated solution delivers the expected high-quality results.
A working pilot with real outcomes first. Fixed deliverables, then broader rollout.
A Model Context Protocol (MCP) server connecting an LLM directly to the Open Targets Platform database, allowing scientists to query drug, target, and disease association data in plain language as an extension to the platform itself. This is based on the official Open Targets MCP repository.
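To give a sense of the mechanics, here is a minimal sketch of how a database lookup can be exposed as an MCP tool using the official MCP Python SDK (FastMCP). The tool name, SQL, and column names are illustrative assumptions for this example and are not taken from the Open Targets MCP repository.

```python
# Minimal sketch: exposing a parameterised database lookup as an MCP tool.
# The tool name, table, and columns are illustrative, not the actual
# Open Targets schema or repository code.
import sqlite3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("open-targets-demo")

@mcp.tool()
def targets_for_disease(disease_id: str, limit: int = 10) -> list[dict]:
    """Return the top associated targets for a disease identifier."""
    conn = sqlite3.connect("open_targets_demo.db")  # hypothetical local extract
    rows = conn.execute(
        "SELECT target_symbol, association_score "
        "FROM associations WHERE disease_id = ? "
        "ORDER BY association_score DESC LIMIT ?",
        (disease_id, limit),
    ).fetchall()
    conn.close()
    return [{"target": r[0], "score": r[1]} for r in rows]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so an LLM client can call it
```

Once a tool like this is registered, an LLM client connected to the server can call it in response to a plain-language question, without the scientist writing any SQL.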
Automated extraction, transformation, and loading (ETL) pipelines that parse and standardize heterogeneous biomedical data sources without manual intervention, reducing errors and accelerating data availability for analysis. This replaces manual extraction, format wrangling, and slow data refresh cycles.
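As a rough illustration of one standardization step in such a pipeline, the sketch below maps differently named source columns onto a shared target schema. The file names, columns, and mapping are assumptions for the example, not a specific client pipeline.

```python
# Minimal sketch of one ETL standardisation step: mapping differently named
# source columns onto a shared target schema. All names are illustrative.
import pandas as pd

COLUMN_MAP = {
    "lab_a.csv": {"gene": "gene_symbol", "expr": "expression_value"},
    "lab_b.csv": {"GeneSymbol": "gene_symbol", "Expression": "expression_value"},
}

def standardize(path: str) -> pd.DataFrame:
    """Load one source file and rename its columns to the target schema."""
    df = pd.read_csv(path)
    df = df.rename(columns=COLUMN_MAP[path])
    return df[["gene_symbol", "expression_value"]]

# Concatenate all sources into a single, consistently named table.
combined = pd.concat([standardize(p) for p in COLUMN_MAP], ignore_index=True)
```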
The Hyve developed the OQL AI Helper, enabling users to translate natural language queries into OQL, cBioPortal's Onco Query Language. For more details, read this article.
Scientists upload PDFs — field reports, lab notebooks, study summaries. BioChatter extracts and structures metadata against a predefined model, then uses interactive chat to fill gaps and fix inconsistencies. How does it work?
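The underlying idea can be sketched as follows. This is an illustration only, not BioChatter's actual API: metadata is extracted against a predefined model, and any fields the model could not populate drive the follow-up questions in the interactive chat. The field names and extraction call are assumptions.

```python
# Illustrative only: extracting metadata against a predefined model and
# flagging gaps for interactive follow-up. Not BioChatter's actual API.
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class StudyMetadata:
    study_title: Optional[str] = None
    organism: Optional[str] = None
    sample_count: Optional[int] = None
    assay_type: Optional[str] = None

def missing_fields(record: StudyMetadata) -> list[str]:
    """Return the fields that could not be populated from the PDF text."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]

# Suppose an LLM-backed extractor returned this partially filled record:
extracted = StudyMetadata(study_title="Tumour biopsy pilot", organism="Homo sapiens")

# Gaps drive the follow-up questions asked in the interactive chat.
for field_name in missing_fields(extracted):
    print(f"Please provide a value for '{field_name}'.")
```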
The Hyve developed the OQL AI Helper, a tool that translates natural language into Onco Query Language (OQL) commands. This smart chat interface is available for testing on The Hyve demo server, embedded directly into the cBioPortal Query page: users write their question in natural language, and the AI translates it into OQL, making advanced genomic data querying accessible to everyone. This lowers the barrier for users new to OQL and lets you explore complex data quickly and intuitively. Read more here.
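As a rough illustration of the kind of translation involved (the question and output here are hypothetical examples, not taken from the tool's documentation): a question such as "Which samples have a BRAF V600E mutation or an EGFR amplification?" corresponds to OQL along the lines of `BRAF: MUT = V600E` and `EGFR: AMP`.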
The Open Targets MCP server lets you explore Open Targets data in a new, conversational way — querying it through natural language rather than manual database lookups. The LLM generates SQL queries under the hood, translating your scientific questions directly into structured database calls.
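To make the translation step concrete, the snippet below pairs a scientific question with the kind of SQL an LLM might plausibly generate for it. The table and column names are assumptions for this example, not the actual Open Targets Platform schema.

```python
# Illustration of the natural-language-to-SQL translation step.
# Table and column names are assumed for the example only.
question = "What are the top 5 targets associated with asthma?"

generated_sql = """
SELECT target_symbol, association_score
FROM disease_target_associations
WHERE disease_name = 'asthma'
ORDER BY association_score DESC
LIMIT 5;
"""

print(f"Question: {question}\nGenerated SQL:{generated_sql}")
```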
What sets the official MCP server apart from earlier community-built alternatives is its compatibility with custom data sources. Open Targets is inherently flexible, and the MCP server is built to match — meaning you can incorporate your own proprietary or internal datasets alongside the core platform data.
It also includes smart JQ-based filtering: database responses are filtered before being passed back to the LLM, keeping token usage lean and outputs focused. Combined with straightforward local deployment, it works as a true out-of-the-box solution — easy to test, easy to run on-premises, and ready to integrate into your existing environment.
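The sketch below shows what this kind of trimming looks like in practice: a verbose database response is reduced to the fields the LLM actually needs before it is handed back. The response shape and the jq filter are illustrative assumptions, not the server's actual output.

```python
# Illustration of trimming a database response before it reaches the LLM.
# The response shape and the jq filter below are assumptions.
full_response = {
    "disease": "asthma",
    "associations": [
        {"target": "IL13", "score": 0.82, "evidence": ["...lots of detail..."]},
        {"target": "TSLP", "score": 0.79, "evidence": ["...lots of detail..."]},
    ],
    "metadata": {"query_time_ms": 42, "release": "..."},
}

# A jq filter like this keeps only the fields the LLM actually needs:
#   .associations[] | {target, score}
trimmed = [
    {"target": a["target"], "score": a["score"]}
    for a in full_response["associations"]
]
# `trimmed` is what gets passed back to the LLM, keeping token usage low.
```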
We believe AI should be a smart tool in the hands of scientists, not a replacement for their expertise. That means starting with your data: understanding what you have and ensuring it's FAIR and properly governed before any AI conversation begins. From there, we choose the right approach for the right problem: sometimes that's a large language model, other times it's classical machine learning, and sometimes it's neither. We favor transparency and openness, using local AI where privacy matters and clearly defining use cases up front so we can validate whether AI genuinely solves the problem. Every solution is tested against real outcomes before it scales. The scientist stays in control. Always.
Our data is a mess. Are we even ready for AI?
Messy data is exactly where we start. You don't need AI-ready data to work with us; that's what we help you build. Most of our engagements begin with a data landscape assessment to understand what you have and what needs fixing before anything AI-related happens.
We've had bad experiences with AI vendors overpromising. How are you different?
We'll tell you what we can't do as readily as what we can. Our pilots are tasked with delivering a tangible, working output, not just a slide deck. If the data isn't ready for AI, we'll let you know before you spend anything.
Can you help us build an internal business case for this kind of investment?
Yes, we do this regularly. We can help you frame the ROI in terms of time saved and data quality improvements in language that resonates with your leadership team.
How is this different from just using ChatGPT on our data?
ChatGPT doesn't know your ontologies, your data models, or your domain standards. We make your data structured and semantically rich first, then connect it to AI tools, not a general-purpose chatbot guessing at your context.
Is our data safe? Can this run on our own infrastructure?
Yes. All our solutions can be deployed fully on-premise or in your private cloud. For BioChatter, the LLM is fully swappable: you can connect your own internal model or run everything locally. Your data never leaves your infrastructure unless you choose otherwise.
Our data comes from 10 different sources in different formats. Can you handle that?
That heterogeneity is exactly the problem that ETL automation is designed to address. We map, transform, and standardize across sources into a single consistent output.
How do we know the automated pipeline isn't introducing errors?
Every pipeline includes automated validation and quality checks at each step. Exceptions are flagged for human review; we don't silently pass bad data through.
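As a minimal sketch of what a per-step quality check can look like, the example below routes records that fail validation to a review queue instead of passing them on. The field names and rules are illustrative assumptions, not a specific pipeline.

```python
# Minimal sketch of a per-step quality check: records failing validation are
# routed to human review rather than passed on silently. Rules are illustrative.
def validate(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty = clean)."""
    problems = []
    if not record.get("gene_symbol"):
        problems.append("missing gene_symbol")
    if record.get("expression_value") is not None and record["expression_value"] < 0:
        problems.append("negative expression_value")
    return problems

clean, needs_review = [], []
for rec in [{"gene_symbol": "TP53", "expression_value": 2.1},
            {"gene_symbol": "", "expression_value": -1.0}]:
    issues = validate(rec)
    (needs_review if issues else clean).append((rec, issues))
# `needs_review` is surfaced to a human; `clean` continues down the pipeline.
```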
Can I integrate the Open Targets Platform into my AI workflow?
Yes. You can do this by using the public Open Targets MCP server, or by deploying an MCP server for your own Open Targets Platform instance.
If I add my own data to the Open Targets database, can I query it using the Open Targets MCP?
In most cases, yes. The Open Targets MCP server is designed to work out-of-the-box with extended datasets. Minor adaptations might be needed for uncommon data types.
How can I try out cBioPortal AI?
The cBioPortal community is actively working on making this more widely available. In the meantime, you can test it at chat.cbioportal.org.
Exploring AI for your data programme? Let’s start with a conversation about your data — not a sales pitch about AI. We’ll tell you honestly what’s possible and what isn’t.