Rich metadata in cBioPortal: a step towards FAIRness

Sjoerd van Hagen

04-06-2019

The cBioPortal community and The Hyve have always championed Open Source and Open Data. Now that the FAIR principles for data stewardship are gaining momentum, we do not want to fall behind. As a small step towards FAIR compliance, cBioPortal 2.1.0 adds rich metadata functionality.

Development of this feature was sponsored by one of our pharma clients, who wanted to keep track of data provenance for their clinical trials. Its structured but flexible design means it can be used in any context. The feature brings cBioPortal one step closer to FAIR compliance by addressing FAIR principle F2: data are described with rich metadata.

Use case

Many questions on the cBioPortal community's discussion group concern how the data in the public cBioPortal was preprocessed before it was loaded into the portal. Before this feature was released, the only supported study metadata fields were a title, a description, and a reference to the accompanying publication, if any.

Rich Metadata

cBioPortal is used in many different contexts, with both public and private data coming from a variety of sources. Clinical trials run by pharma companies need different metadata than data from public sources. To support all of these use cases, metadata can have any structure: cBioPortal imposes no constraints on it. Users can, however, enforce their own constraints on the metadata they attach to their studies, so that their study metadata stays uniform.
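Because cBioPortal leaves the metadata structure open, any uniformity rules live on the user's side. Below is a minimal sketch of such a check in Python, meant to run before a study is loaded. The required field names (`sponsor`, `protocol_id`, `processing_steps`) and their types are hypothetical house rules chosen for illustration, not anything cBioPortal defines.

```python
from typing import Any

# Hypothetical house rules: these field names and types are illustrative
# only; cBioPortal itself does not require any of them.
REQUIRED_FIELDS: dict[str, type] = {
    "sponsor": str,
    "protocol_id": str,
    "processing_steps": list,
}


def check_metadata(metadata: dict[str, Any],
                   required: dict[str, type] = REQUIRED_FIELDS) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    for field, expected_type in required.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected_type):
            actual = type(metadata[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    # Extra fields are deliberately allowed, in line with the free-form
    # metadata structure that cBioPortal accepts.
    return problems


if __name__ == "__main__":
    study_tags = {
        "sponsor": "Example Pharma",
        "protocol_id": 12345,  # wrong type on purpose
        "notes": "free-form extras are fine",
    }
    for problem in check_metadata(study_tags):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a curator fix everything in a single pass, while unknown extra fields pass through untouched.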

How does it work?

It is quite simple, actually. Metadata can be included when a study is loaded and is then shown in the cBioPortal front end. Two import formats are supported: JSON, suited to metadata that is generated automatically, and YAML, suited to metadata written by hand.

The image on the left side shows the metadata that is imported, and the image on the right shows the result rendered in the front end.
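For the automatically generated case, a processing pipeline can write its own provenance as JSON at the end of a run. The sketch below uses only Python's standard library; the field names, the pipeline name and the file name `study_tags.json` are hypothetical. Hand-written YAML would instead be edited directly and parsed with a YAML library, which is left out here to keep the example dependency-free. How the file is attached to a study may depend on your cBioPortal version: as far as we can tell from the file-format documentation, `meta_study.txt` accepts an optional `tags_file` field for this, but check the documentation before relying on it.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_metadata(pipeline_name: str, pipeline_version: str,
                   steps: list[str]) -> dict:
    """Collect provenance facts a pipeline knows at run time.

    Hypothetical field names; adapt them to your own metadata conventions.
    """
    return {
        "pipeline": {"name": pipeline_name, "version": pipeline_version},
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "processing_steps": steps,
    }


def write_metadata(metadata: dict, path: Path) -> None:
    # Indented, key-sorted JSON keeps machine-generated metadata diff-friendly.
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True) + "\n",
                    encoding="utf-8")


if __name__ == "__main__":
    tags = build_metadata(
        "example-pipeline", "0.1.0",
        ["align reads", "call variants", "filter low-quality calls"],
    )
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "study_tags.json"  # hypothetical file name
        write_metadata(tags, out)
        # Reading the file back yields the same structure that was written.
        assert json.loads(out.read_text(encoding="utf-8")) == tags
        print(out.read_text(encoding="utf-8"))
```

Generating the file inside the pipeline, rather than writing it afterwards, means the recorded steps and versions always match the run that produced the data.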

Next step

The next step would be to create metadata for all studies in the public portal. This is a task for the data curators: they know the data and metadata inside out and can retrace the steps taken before the data was loaded into cBioPortal. That way, data provenance is recorded in a structured fashion.

cBioPortal

cBioPortal is an open-source platform that makes large-scale cancer genomics datasets accessible to researchers and clinicians. At The Hyve, we help pharmaceutical companies, hospitals, research institutions, and data providers unlock the full potential of cBioPortal by enabling secure, integrated analysis of both public and proprietary data tailored to their research needs. As a trusted contributor to the cBioPortal community since 2015, we combine deep technical expertise with hands-on experience to ensure smooth deployment, seamless integration, and actionable insights from your biomedical data.

Need help with cBioPortal feature development?

The Hyve provides services to develop and improve features in cBioPortal. Implemented features are released to the community via the cBioPortal repository on GitHub. For inquiries on cBioPortal feature development projects or other services around cBioPortal, feel free to contact us via this form.
