Accelerating Knowledge Discovery and Innovation with FAIR principles

Julia Kurps

04-11-2016 4 min read

Technological developments enable researchers in academia and industry to produce and collect tremendous amounts of scientific data. Most of this data, however, disappears into some form of internal storage once it has been analyzed for its original purpose and the findings have been published in a peer-reviewed article. Preventing this data loss and allowing secondary (re-)use and strategic analysis will help reveal additional information buried in the data and thereby allow its full potential to be exploited. Moreover, a large part of the available data is produced by publicly funded research and should therefore be publicly accessible to facilitate knowledge discovery.

To realize these ambitions and enable optimal data sharing, The Hyve supports the application of the FAIR (Findable, Accessible, Interoperable, Reusable) Data principles. Data that conforms to the FAIR principles should be:

Findable

To make research data easily findable, it needs to be stored in a repository and accompanied by a set of relevant, meaningful, and domain-specific metadata tags. An overview of minimal metadata standards for biological and biomedical research can be found here. Furthermore, each individual data set needs a unique, globally registered, and persistent identifier that is indexed in a common register. The pan-European collaborative data infrastructure EUDAT offers support for assigning unique identifiers to research data. Data sets that are described and indexed in this way can be searched for, which is what makes them findable.
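
As a rough illustration of these two ingredients, metadata tags and a unique identifier in a shared index, the following minimal Python sketch uses a toy in-memory registry. The class names, the field names, and the identifier "example:0001" are hypothetical placeholders; a real register (for instance one offered through EUDAT) would look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata record for one data set."""
    identifier: str        # persistent identifier (placeholder values only)
    title: str
    keywords: tuple = ()   # domain-specific metadata tags


class Registry:
    """Toy in-memory index standing in for a common register."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Identifiers must be unique, so an existing entry is never overwritten.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def resolve(self, identifier):
        # Look up a record directly by its identifier.
        return self._records[identifier]

    def search(self, keyword):
        # Case-insensitive match against the metadata tags.
        wanted = keyword.lower()
        return [r for r in self._records.values()
                if wanted in (k.lower() for k in r.keywords)]


registry = Registry()
registry.register(
    DatasetRecord("example:0001", "Liver RNA-seq pilot", ("transcriptomics", "liver"))
)
hits = registry.search("Liver")
```

The point of the sketch is only that a data set becomes findable along two routes: by its identifier (`resolve`) and by its descriptive tags (`search`).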

Accessible

Data and/or the corresponding metadata need to be accessible via their unique identifier. It is essential that the data is both human- and machine-readable and can be accessed via a standard protocol. To guarantee long-term data access, the use of certified repositories is recommended. Accessibility depends heavily on the licenses assigned to the research data. It should be emphasized here that accessible data is not open data by default.
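
The difference between accessible and open can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed policy: the record layout, the `describe_access` function, and the choice of which licenses count as open are all hypothetical, and the sketch assumes that metadata is always retrievable while the data itself may require authorization.

```python
# Assumption: these two license identifiers are treated as "open" here.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


def describe_access(record, user_is_authorised=False):
    """Return what a requester may retrieve for a given record."""
    data_allowed = record["license"] in OPEN_LICENSES or user_is_authorised
    return {
        "identifier": record["identifier"],
        "metadata": record["metadata"],   # always retrievable in this sketch
        "license": record["license"],
        "data_available": data_allowed,   # depends on license or authorization
    }


# Hypothetical record with a non-open license.
restricted = {
    "identifier": "example:0002",
    "metadata": {"title": "Patient cohort measurements"},
    "license": "restricted",
}

anonymous_view = describe_access(restricted)
authorised_view = describe_access(restricted, user_is_authorised=True)
```

In both calls the record is reachable through its identifier and its metadata is returned, yet only the authorized requester gets the data.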

Interoperable

Interoperability of data can be achieved by using a formal, accessible, shared, and broadly applicable language, standard data formats, and widely used ontologies and terminologies. Adding qualified references to linked metadata and data sets will enable researchers to benefit optimally from FAIR data sets.
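
A small Python sketch of what using a widely used terminology can mean in practice: free-text species labels are replaced by NCBI Taxonomy identifiers, so that two data sets that spell the species differently still agree. The mapping table, the `annotate_species` function, and the column names are assumptions made for this illustration.

```python
# Free-text labels mapped to NCBI Taxonomy identifiers.
SPECIES_TO_ID = {
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
    "mouse": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
}


def annotate_species(rows, column="species"):
    """Add a 'species_id' field holding the ontology identifier, if known."""
    annotated = []
    for row in rows:
        label = row[column].strip().lower()
        # None marks labels that still need manual curation.
        annotated.append({**row, "species_id": SPECIES_TO_ID.get(label)})
    return annotated


samples = [
    {"sample": "S1", "species": "Human"},
    {"sample": "S2", "species": " homo sapiens"},
    {"sample": "S3", "species": "zebrafish"},
]
annotated = annotate_species(samples)
```

Samples S1 and S2 end up with the same identifier despite different spellings, while S3 is flagged as unmapped rather than silently guessed.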

Reusable

To ensure (re-)usability of scientific data, the data needs to be clearly documented so that researchers can understand not only the data structure, but also the experimental setup and data manipulations. Documenting provenance, preferably in a self-documenting fashion, is therefore a key requirement. The open source workflow building tool Galaxy is an excellent example: it documents all individual steps in an analysis workflow as well as the tool versions used for the analysis.
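
To make the idea of self-documenting provenance concrete, here is a minimal Python sketch in which every processing step is recorded as it runs. It is not how Galaxy works internally; the `ProvenanceLog` class, the step functions, and the version strings are hypothetical and only illustrate the principle.

```python
import hashlib
import json


def fingerprint(value):
    """Stable SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    """Runs processing steps and records what was done to the data."""

    def __init__(self):
        self.steps = []

    def run(self, name, version, func, data, **params):
        result = func(data, **params)
        # Record the step, its version, its parameters, and data fingerprints.
        self.steps.append({
            "step": name,
            "version": version,
            "parameters": params,
            "input_sha256": fingerprint(data),
            "output_sha256": fingerprint(result),
        })
        return result


def drop_missing(values):
    return [v for v in values if v is not None]


def scale(values, factor):
    return [v * factor for v in values]


log = ProvenanceLog()
raw = [1, None, 3]
clean = log.run("drop_missing", "0.1", drop_missing, raw)
scaled = log.run("scale", "0.1", scale, clean, factor=10)
```

Because the log is written by the same call that executes the step, the documentation cannot drift away from what was actually done, and the matching fingerprints show that the output of one step was the input of the next.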

The Dutch Techcentre for Life Sciences (DTL), a driving force in the implementation of the FAIR principles, organized a workshop during which The Hyve and other DTL partners discussed the status quo, the roadmap, and joint contributions to facilitate the dissemination of the FAIR principles and the development of technical applications. Additional information on the technical infrastructure currently being built by DTL partners can be found in this paper, which was recently published in the PeerJ Journal.

After a comprehensive introduction by Ruben Kok (Head of DTL) and Jaap Heringa (VU, Amsterdam, Head of Elixir Node NL), multiple partners shared their experiences of implementing the FAIR principles in daily practice. Interesting examples of the FAIR-ification of widely used life science applications focused on Molgenis and phenotypeDB. It became clear that converting non-FAIR data into data that fully complies with the FAIR principles is a stepwise process requiring joint efforts from data owners, data producers, FAIR data experts, and domain experts. To further support the dissemination of the FAIR principles, DTL offers hackathons to make life science applications compliant with FAIR standards, and bring-your-own-data (BYOD) workshops to convert non-FAIR data sets and train data owners in making their data FAIR.

The Hyve strongly believes in the added value of pre-competitive collaboration and joint efforts and therefore supports all facets of open science. We are actively engaged in open source software communities and contribute to the dissemination of the FAIR principles in the life sciences. If you want to learn more about the FAIR principles and FAIR data stewardship, or need assistance in making your research data compliant with those principles, feel free to contact us to discuss how The Hyve can help you.