The Hyve has initiated a project to explore the integration of single-cell RNA sequencing (scRNA-seq) data into cBioPortal, an open-access resource for interactive exploration of multidimensional cancer genomics data sets. Adding single-cell data and visualisations to cBioPortal would reduce the time spent finding, downloading, and processing data, both for users new to single-cell data and for experts in single-cell technologies. This blog post outlines the work done so far towards visualising and exploring scRNA-seq data in cBioPortal.
Why is extending bulk data with single-cell data helpful?
Combining scRNA-seq data with bulk data and clinical information in cBioPortal can help correct assumptions about cellular heterogeneity within tumours and provide insights into tumours at single-cell resolution. cBioPortal offers several ways of visualising discrete genetic events (copy number alterations [CNAs] or mutations) as well as data on mRNA expression, protein abundance or DNA methylation. Adding single-cell metadata to these studies can reveal distinct cell populations within the tumour tissue, help identify specific markers, and shed light on processes such as tumorigenesis and metastasis. Moreover, linking information derived from single-cell data, such as the abundance of cell types or the proliferative activity of specific cell types, to clinical markers, e.g. the response to different kinds of treatment, could support the development of new diagnostic methods and anti-tumour therapies.
Why have we extended cBioPortal with scRNA-seq data?
Currently, there are over 400 scRNA-seq visualisation tools available (as reported by the scRNA-tools website), each with different strengths and weaknesses, which makes it hard to find the most suitable tool for a given need. Using multiple tools to meet all user requirements duplicates the effort spent on data curation, installation and maintenance. There is therefore a pressing need to focus attention on one or a few open-source tools that meet the requirements of most users. By selecting one tool and integrating it with cBioPortal, we aim to rally the cBioPortal community's support behind it. We hope this will lead to a production-quality open-source tool with an extensive compendium of curated public single-cell datasets.
What did we do?
With this extension, we have made single-cell data (highlighted in yellow) available as a new cBioPortal data type (“Genomic Profile Sample Counts”, red box in the image below). For single-cell visualisation and exploration, we compared twenty-one existing open-source tools based on their capabilities and selected cellxgene and VITESSCE for integration into cBioPortal's Patient View page. For more details on why these tools were chosen, see the section on open-source scRNA-seq visualisation tools later in this article. In addition, we have added metadata derived from single-cell studies to help users subset samples based on their cell type composition and cell cycle phases.
Example: filtering samples with high T-cell infiltration and exploring them further with scRNA-seq visualisation tools
Users can access single-cell data in cBioPortal through the Study View page and the Patient View page. On the Study View page, users can visualise the metadata of single-cell studies as histograms, for example to explore the frequencies of the different cell types present in the tumour or to examine the cell cycle states of the cells. A user interested in samples with high T-cell infiltration, for instance, can add a chart of T-cell frequencies to the dashboard by selecting it from the “Charts” menu at the top right of the Study View page.
The Cell Type Fractions option lists all the cell types found in the study. Selecting a cell type adds a histogram of its frequency across all samples in the study. The user can then filter for the samples with the highest frequency of T-cells in the tissue. To subset the samples further, the user can also use the cell cycle phase charts, for example to select samples that combine high T-cell infiltration with a high number of T-cells in the S phase.
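Conceptually, this filtering normalises each sample's raw cell type counts to fractions and then applies thresholds. The Python sketch below illustrates that logic outside cBioPortal under stated assumptions: the dictionary layout, sample IDs, cell type labels, S-phase counts and thresholds are all hypothetical, and this is not how cBioPortal implements its Study View filters.

```python
# Illustrative sketch only: the data layout, sample IDs and thresholds are
# hypothetical and do not reflect cBioPortal's internal data model or API.
from typing import Dict, List

# Hypothetical raw cell counts per cell type for each sample.
cell_counts: Dict[str, Dict[str, int]] = {
    "SAMPLE_A": {"T cell": 450, "B cell": 150, "Tumour cell": 400},
    "SAMPLE_B": {"T cell": 50, "B cell": 100, "Tumour cell": 850},
    "SAMPLE_C": {"T cell": 300, "B cell": 200, "Tumour cell": 500},
}

# Hypothetical number of T-cells assigned to the S phase in each sample.
t_cells_in_s_phase: Dict[str, int] = {"SAMPLE_A": 40, "SAMPLE_B": 5, "SAMPLE_C": 90}


def cell_type_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts per cell type into fractions of all cells in a sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: n / total for cell_type, n in counts.items()}


def select_samples(min_t_fraction: float, min_s_phase_fraction: float = 0.0) -> List[str]:
    """Return samples whose T-cell fraction, and share of T-cells in S phase,
    both meet the given thresholds."""
    selected = []
    for sample, counts in cell_counts.items():
        t_fraction = cell_type_fractions(counts).get("T cell", 0.0)
        n_t_cells = counts.get("T cell", 0)
        # Share of this sample's T-cells that are in the S phase.
        s_fraction = t_cells_in_s_phase.get(sample, 0) / n_t_cells if n_t_cells else 0.0
        if t_fraction >= min_t_fraction and s_fraction >= min_s_phase_fraction:
            selected.append(sample)
    return selected
```

With this toy data, `select_samples(0.25)` keeps `SAMPLE_A` and `SAMPLE_C` (T-cell fractions 0.45 and 0.30), and adding `min_s_phase_fraction=0.2` narrows the selection to `SAMPLE_C`, mirroring the two-step refinement described above.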
Adding single-cell metadata thus lets users take cellular heterogeneity into account and fine-tune their analyses.
After creating a subset of patients, the user can open the Patient View to explore each patient's single-cell data individually. A new “Single Cell Data” tab in the Patient View lists the scRNA-seq samples associated with that patient.
The user can look at a description of the study, the number of cells sequenced, a summary of the raw counts of the different cell types, and cell cycle phase assignments. To interactively visualise and explore a sample in more depth, cBioPortal’s Single Cell Data tab contains links to open each sample in cellxgene or VITESSCE.
Open-source scRNA-seq visualisation tools
After comparing twenty-one tools, we found cellxgene and VITESSCE to be the most promising candidates for integration with cBioPortal. Both tools have active communities, render visualisations quickly, and are customisable. Cellxgene is one of the most popular visualisation tools and is used by many organisations around the world. It can handle large datasets and even allows users to perform differential expression analysis. VITESSCE is highly configurable, serverless, and supports hierarchical cell annotation, e.g. for cell subtypes. We also expect that integrating these tools into a larger platform such as cBioPortal will make them accessible to a wider audience, leading to more user feedback and community efforts to implement further enhancements.
Integrating single-cell visualisation tools with cBioPortal allows users to examine a wide range of biologically and clinically relevant hypotheses. This work is a first step towards including single-cell assays in cBioPortal. The next step is to load publicly available datasets that contain both bulk genomic sequencing data (to identify copy number aberrations, mutations, etc.) and scRNA-seq data from the same samples, to demonstrate the biological insights that this integration can yield.