Fairspace is an open-source research data management platform that incorporates the FAIR principles and enables scientific collaboration. It offers an environment for managing research data (files, collections) and a metadata catalog with a flexible data model and search interface. It is built on semantic web technologies, such as RDF and SHACL, and is FAIR-by-design. The Hyve created Fairspace in 2016 and has been developing it ever since. Over the years, we’ve customized the platform for several organizations’ use cases, such as the Food Nutrition Security Cloud (FNS-Cloud) microbiome use case, which is presented here.
FNS-Cloud is a consortium of 36 European partners from both academia and industry. One of its ambitions is to bring together food & nutrition security (FNS) research resources such as datasets and tools. Most of these resources have been developed independently and therefore have different interfaces and/or data formats. The FNS-Cloud project aims to set up a first-generation food & nutrition security cloud that integrates existing research resources and shares them in a uniform manner. Several demonstrators are planned in the project to provide the use-case requirements that guide the development and integration of the FNS-Cloud tools.
The consortium will utilize Fairspace for the development of a Microbiome Demonstrator, a ‘catalog’ in which researchers can find studies regarding diet interventions. The Microbiome Demonstrator will also enable them to combine these studies with gut microbiome data for further analysis.
In addition, the consortium partners want to link Fairspace with the ELIXIR Infrastructure, which already includes a wide range of European food and nutrition resources. However, ELIXIR does not yet have a solution for food and microbiome data that allows users to find data for analysis across multiple public data sources. Therefore, The Hyve designed a customized solution using Fairspace that could potentially fill this gap.
How we solved it
Several characteristics of Fairspace make it a good fit for the challenges mentioned above. Firstly, the tool uses a (semantic) metadata model developed by the consortium to facilitate semantic data integration: FNS-Harmony. FNS-Harmony is used to map source-specific fields and entities to classes and attributes in the ontology. In addition, The Hyve developed a SHACL data model based on FNS-Harmony and user requirements. While the FNS-Harmony OWL ontology describes the knowledge of the domain and assumes an open world, the SHACL model is more restrictive: it assumes the data lives in a closed world, such as the Fairspace metadata store, and prescribes what this data should look like. Fairspace uses this SHACL model as its content data model, both for validation (data integrity) and for user interface generation. With this setup, the (meta)data can be searched by both users and machines.
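To make the open-world versus closed-world distinction concrete, here is a minimal sketch in plain Python. It is an analogy only, not SHACL and not Fairspace code: the STUDY_SHAPE dictionary, its property names and the validate function are all hypothetical. Under a closed-world reading, a missing required value is reported as a violation instead of being treated as merely unknown. The check for undeclared properties is an extra assumption, similar in spirit to a closed shape.

```python
# Hypothetical, simplified stand-in for a SHACL node shape: each property
# declares an expected Python type and whether it is required.
STUDY_SHAPE = {
    "title": {"type": str, "required": True},
    "dietIntervention": {"type": str, "required": True},
    "sampleCount": {"type": int, "required": False},
}


def validate(record, shape):
    """Return a list of violation messages; an empty list means the record conforms."""
    violations = []
    for name, rule in shape.items():
        if name not in record:
            # Closed-world reading: absence of a required value is an error.
            if rule["required"]:
                violations.append(f"missing required property: {name}")
            continue
        if not isinstance(record[name], rule["type"]):
            violations.append(f"wrong type for property: {name}")
    # Assumption: properties that the shape does not declare are rejected too.
    for name in record:
        if name not in shape:
            violations.append(f"undeclared property: {name}")
    return violations


conforming = {"title": "Fibre intake study", "dietIntervention": "high fibre"}
incomplete = {"title": "Fibre intake study", "colour": "red"}
print(validate(conforming, STUDY_SHAPE))
print(validate(incomplete, STUDY_SHAPE))
```

An OWL reasoner looking at the second record would not conclude that the diet intervention is absent, only that it has not been stated. A validator in the style sketched here rejects the record, which is what a metadata store needs for data integrity.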
Secondly, The Hyve implemented a set of ETL (extract, transform, load) processes to fetch data from several public sources on food and nutrition, such as ENA, MGnify, MetaboLights, and dbNP, and load it into Fairspace. The data is mapped to the common data model as defined in Fairspace, using selected ontologies and vocabularies.
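The transform and load steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's ETL code (which is not open source): the field names in FIELD_MAP, the common-model property names, the sample rows and the in-memory store are all invented for the example, and the extract step is stubbed instead of calling any source API.

```python
# Hypothetical mapping from source-specific field names to common-model properties.
FIELD_MAP = {
    "ENA": {"study_accession": "identifier", "study_title": "title"},
    "MGnify": {"accession": "identifier", "study_name": "title"},
}


def transform(source, raw):
    """Rename source-specific fields to common-model properties and tag the origin."""
    mapping = FIELD_MAP[source]
    record = {target: raw[field] for field, target in mapping.items() if field in raw}
    record["source"] = source
    return record


def load(records, store):
    """Load records into an in-memory store keyed by (source, identifier)."""
    for record in records:
        store[(record["source"], record["identifier"])] = record
    return store


# Extract is stubbed with invented rows; a real process would query each source.
raw_rows = [
    ("ENA", {"study_accession": "EXAMPLE-1", "study_title": "Fibre diet study"}),
    ("MGnify", {"accession": "EXAMPLE-2", "study_name": "Gut microbiome survey"}),
]
store = load([transform(source, row) for source, row in raw_rows], {})
print(store[("ENA", "EXAMPLE-1")])
```

Keeping the per-source mapping in a single table means that adding a new source is mostly a matter of adding one entry, while the transform and load logic stays the same.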
Lastly, a JupyterHub environment integrated with Fairspace allows users to analyze their data and metadata directly using R, Python or Julia, without leaving Fairspace. After selecting metadata, users can move to this environment and run predefined scripts for data pre-processing and analysis. From there, users can access the harmonized metadata through SPARQL and a custom API, as well as the (file) data.
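As an illustration of machine access to the harmonized metadata, the sketch below prepares a SPARQL query from a notebook. Everything specific in it is an assumption: the endpoint URL, the ex: vocabulary and the property names are placeholders and are not taken from Fairspace or FNS-Harmony. The request follows the SPARQL 1.1 Protocol (form-encoded POST with a query parameter), is only constructed and never sent, and leaves out authentication.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary; placeholders, not real Fairspace values.
SPARQL_ENDPOINT = "https://fairspace.example.org/api/sparql"
EX = "https://example.org/fns#"


def build_study_query(diet_keyword, limit=10):
    """Build a SPARQL SELECT for studies whose diet label contains a keyword."""
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    escaped = diet_keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?study ?title WHERE {\n"
        "  ?study a ex:Study ;\n"
        "         ex:title ?title ;\n"
        "         ex:dietIntervention ?diet .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?diet)), "{escaped.lower()}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query):
    """Prepare a form-encoded SPARQL POST request; nothing is sent here."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    return Request(SPARQL_ENDPOINT, data=body, headers=headers, method="POST")


query = build_study_query("Fibre")
request = build_request(query)
print(query)
```

In a real notebook the prepared request would be sent with the user's credentials and the JSON result set loaded into a data frame for the pre-processing scripts.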
This work is still in progress. The FNS-Cloud project will run until September 2023, and the ETL customization is not open source.
For the FNS-Cloud consortium, Fairspace has become the user-facing browser that allows microbiome and food scientists to explore data from public resources that have been mapped to a common (meta)data model and vocabularies/ontologies. Using the built-in faceted search interface, researchers can select data and then export it or analyze it in a secure analysis environment such as Jupyter Notebook. Finally, in line with the open-source philosophy of The Hyve, these Fairspace extensions have been developed with code that is readable, easy to maintain, and has high test coverage.
This use case shows that Fairspace is an easy-to-use open-source research data management tool that can resolve a range of data management challenges faced by various organizations. Moreover, it does so in a way that complies with the FAIR principles.
Food Nutrition Security Cloud (FNS-Cloud) has received funding from the European Union’s Horizon 2020 Research and Innovation programme (H2020-EU.3.2.2.3. – A sustainable and competitive agri-food industry) under Grant Agreement No. 863059 – www.fns-cloud.eu