CINECA: Sharing patient genomic and biomolecular data across continents

Elisa Cirillo

26-03-2020

The Common Infrastructure for National Cohorts in Europe, Canada, and Africa (CINECA) project has been running since January 2019. In this blog post, we’d like to give you an impression of what the consortium aims to achieve and of the activities that The Hyve has been involved in over the past year.

The CINECA project aims to bring together patient genomic and biomolecular data from 18 research institutions and biobanks in ten countries across three continents. This creates a virtual cohort of no fewer than 1.4 million individuals located all over the world.

Performing analyses on such a large cohort should help scientists better explore, for instance, the effect of a range of mutations in humans and establish links between genetic variants and certain diseases. It should also speed up the development of drugs tailored to the genetic variants that cause cancer, cystic fibrosis, and other conditions.

To achieve this goal, CINECA is developing a federated cloud-based network. Some of the partners have already successfully implemented workflows for analyzing patient genomic and biomolecular data. The challenge is now to scale up the effort and make the analysis available in different institutions and biobanks participating in the CINECA project.

Personal Health Train

One issue you quickly encounter with such an initiative is that patient data often cannot be exported from the hospital or research institution for safety and privacy reasons. The solution that the CINECA project will adopt is the Personal Health Train concept. The idea behind this concept is that the data stays in the same physical location, while scientists from participating institutions access the relevant data via a workflow that the different partners have adopted. The advantage of this approach is that data can be shared selectively, without exposing privacy-sensitive information. At the same time, pooling cohorts means that researchers can now analyze data from thousands or even a million individuals. This is a huge advantage for research areas such as genomics, where big data analysis is essential to determine whether a mutation is associated with an increased or decreased risk of certain diseases or types of cancer.
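To make the idea concrete, here is a minimal sketch of how such a workflow could behave. It is an illustration only, not CINECA's actual implementation: the names `SiteSummary`, `summarize_site`, and `pooled_carrier_frequency` are hypothetical, and the genotype lists are made-up toy data. Each site runs the same function on its own data and returns only aggregate counts, which are then pooled centrally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts that leave a site; no individual-level data."""
    carriers: int  # individuals carrying at least one copy of the variant
    total: int     # individuals with a genotype call at this site


def summarize_site(genotypes):
    """Runs locally at one site.

    `genotypes` holds one value per individual: the number of copies
    (0, 1, or 2) of the variant of interest.
    """
    carriers = sum(1 for g in genotypes if g > 0)
    return SiteSummary(carriers=carriers, total=len(genotypes))


def pooled_carrier_frequency(summaries):
    """Runs centrally and sees only the per-site summaries."""
    total = sum(s.total for s in summaries)
    if total == 0:
        raise ValueError("no genotyped individuals at any site")
    return sum(s.carriers for s in summaries) / total


# Toy data for two hypothetical sites; the raw lists never leave the site.
site_a = summarize_site([0, 1, 0, 2])
site_b = summarize_site([0, 0, 1, 0, 0, 0])
frequency = pooled_carrier_frequency([site_a, site_b])
```

In a real deployment, the summaries themselves would also need review, since very small counts can still identify individuals.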

Work packages

To keep the work manageable, task forces focusing on different work packages have been created: Federated Data Discovery, Authentication & Authorization, Harmonized Metadata, Federated Data Analysis, and the ELSI (ethical, legal, and social implications) framework. In the past year, The Hyve has been overseeing the activities of the Federated Data Analysis group. Of course, the activities of the different work packages cannot be completely separated, as the decisions of one group often have implications for the others. Think about questions like: Which types of metadata and data do we need to collect from this cohort? What are the minimal requirements for running the workflow across all participating institutions, hospitals, or biobanks? What communication barriers need to be resolved before the systems can be aligned? Needless to say, this is all very much a work in progress.

Intercontinental

The CINECA project is led by the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI). What makes the project stand out is that it goes beyond Europe. It is truly a transcontinental effort, linking European cohorts to patient groups in Canada and Africa. This approach should enhance our global understanding of which mutations have a significant impact on disease.

In the past year, the focus of the project was mainly on making an inventory: listing the cohorts at the various institutions and the genomic and biomolecular data available for their participants. Another important aspect was describing the workflows for data analysis that have already been implemented successfully at one or more institutions, that is, in a relatively small-scale setting. The challenge is now to scale up, merge, and roll out these workflows at different partner institutions. Of course, you then have to overcome the problem that the IT infrastructure differs from country to country and across continents.

Use cases

In the past year, several use cases were identified for federated workflows that provide a meaningful scientific output. Here are three examples of use cases in which The Hyve will be involved.

The first use case involves running federated joint variant genotyping across cohorts. Joint genotyping can be performed and implemented in various ways, depending on the data access levels of the participating cohorts. Data requirements could involve raw sequencing data (FASTQ/long read formats), reference-aligned data (BAM/CRAM), and/or individual genotypes (VCF/GVCF).

The second use case concerns running a federated Expression Quantitative Trait Loci (eQTL) analysis across different cohorts. eQTL profiles are based on genetic variants that explain variation in gene expression levels. An eQTL may regulate different genes depending on the tissue type and disease state. Cohort data required to perform eQTL analysis are genotype (DNA data) and gene expression data (RNA data).
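As a rough illustration of what an eQTL test measures, the sketch below fits a simple least-squares line of expression level against genotype for one variant and one gene. This is a deliberately simplified assumption, not the consortium's workflow: the function name `eqtl_slope` and the toy values are hypothetical, and real analyses also adjust for covariates and test many variant-gene pairs.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage.

    `dosages` holds the number of copies (0, 1, or 2) of one allele per
    individual; `expression` holds the matching expression levels.
    A slope far from zero suggests the variant is associated with expression.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Toy data: expression rises by one unit per extra copy of the allele.
slope = eqtl_slope([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
```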

The third use case is running a federated Polygenic Risk Score (PRS) analysis across different cohorts, estimating the likelihood that an individual develops a particular phenotype given their genotype. A PRS sums trait-associated alleles across many loci, each weighted by its effect size. Data requirements from these cohorts are genotypes and phenotypes for the reference data, or a pre-existing PRS, plus genotypes and phenotypes from individuals of similar ethnic background to evaluate the efficacy of the PRS.
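The weighting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `polygenic_risk_score`, the variant identifiers, and the weights are all hypothetical, and variants missing for an individual are simply counted as zero copies, which a real pipeline would handle more carefully.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele counts across loci.

    `weights` maps a variant ID to its effect size; `dosages` maps a
    variant ID to the individual's count (0, 1, or 2) of the effect allele.
    Variants absent from `dosages` contribute zero (a simplifying assumption).
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical effect sizes and one individual's genotype.
weights = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.5}
individual = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_risk_score(individual, weights)
```

Here the score works out to 0.2 × 2 − 0.1 × 1 + 0.5 × 0 = 0.3, up to floating-point rounding. On its own the number means little; it becomes informative when compared with the distribution of scores in a reference cohort of similar background.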

Future plans

All participating partners were supposed to meet in Toronto, Canada, from 17th to 19th March to discuss the current state of affairs. Unfortunately, the conference had to be canceled because of the coronavirus outbreak and was replaced by an online meeting on the same dates. Despite the changes, the partners involved were able to provide meaningful updates on the work done and to discuss how to proceed. News and future events can be found on the CINECA website.