Simple data loading into cBioPortal with Docker (Updated 2019)

In a recent blog post, we discussed an easy way to set up a local instance of cBioPortal using Docker. The next step is to load a study into the database. The cBioPortal Datahub contains plenty of public studies whose staging files can be loaded directly into a private cBioPortal instance. An up-to-date list of public studies is maintained here.

In this tutorial we will load a study using the same Docker image that was used to set up cBioPortal.

Requirements

Before loading the study, make sure that the cBioPortal, MySQL, MongoDB and session service containers are all up and running, and locate the portal.properties file you configured earlier. During loading we will also generate a validation report, which points out potential issues in the data. In this tutorial, we use the staging files of the TCGA Bladder Cancer study (blca_tcga).
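One way to check the first requirement from the command line is to compare the names printed by docker ps --format '{{.Names}}' with the containers you expect. The function below is a minimal sketch of that check. require_containers is our own helper name, and apart from cbioportal-container, which is used later in this post, the container names in the example are hypothetical placeholders for the names you picked during setup.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every name given as an argument appears in the
# list of running container names read from stdin (one name per line).
require_containers() {
  local running name missing=0
  running=$(cat)   # e.g. the output of: docker ps --format '{{.Names}}'
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "not running: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (only cbioportal-container comes from this post; the other names
# are hypothetical placeholders for your MySQL, MongoDB and session service):
#   docker ps --format '{{.Names}}' \
#     | require_containers cbioportal-container cbio-mysql cbio-mongodb cbio-session
```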

Commands

# Download the TCGA Bladder Cancer staging files from GitHub. For this step you need to have git and git-lfs installed.

$ git clone https://github.com/cBioPortal/datahub.git
$ cd datahub
$ git pull origin master
$ git lfs pull -I public/blca_tcga
$ cd ..
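Because the data files in Datahub are stored with Git LFS, an incomplete git lfs pull can leave small pointer files in place of the real data, and those will not load correctly. The sketch below checks the study folder before loading. find_lfs_pointers and check_study_dir are our own helper names, and the check assumes the study-level meta file is called meta_study.txt, as it is in the Datahub studies.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a downloaded study folder before loading it.
# Git LFS pointer files (left behind when "git lfs pull" did not fetch the
# real data) start with this line:
LFS_POINTER_HEADER='version https://git-lfs.github.com/spec/v1'

# Print every regular file under $1 that is still an LFS pointer.
find_lfs_pointers() {
  find "$1" -type f | while IFS= read -r f; do
    if [ "$(head -n 1 "$f")" = "$LFS_POINTER_HEADER" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Fail if meta_study.txt is missing or any file is still an LFS pointer.
check_study_dir() {
  local dir=$1 pointers
  if [ ! -f "$dir/meta_study.txt" ]; then
    echo "missing: $dir/meta_study.txt" >&2
    return 1
  fi
  pointers=$(find_lfs_pointers "$dir")
  if [ -n "$pointers" ]; then
    echo "still Git LFS pointers (run git lfs pull again):" >&2
    printf '%s\n' "$pointers" >&2
    return 1
  fi
}

# Example, run from the directory that contains the datahub clone:
#   check_study_dir datahub/public/blca_tcga && echo "study files look complete"
```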

# Start a Docker container to load the study

$ docker run -it --rm \
    --net cbio-net \
    -v "$PWD/datahub/public/blca_tcga:/study:ro" \
    -v "$PWD:/outdir" \
    -v "$PWD/portal.properties:/cbioportal/portal.properties:ro" \
    cbioportal/cbioportal:latest \
    metaImport.py -u http://cbioportal-container:8080 -s /study \
    --html=/outdir/report_blca.html -v -o
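If you plan to load more than one study, it can help to wrap the command above in a small function. The sketch below is one way to do that. load_study and the DRY_RUN switch are our own additions; the function assumes the study ID matches the folder name and names the report report_<study_id>.html rather than report_blca.html. It needs bash because it builds the command as an array.

```shell
#!/usr/bin/env bash
# Sketch: the docker run command above as a reusable function, so other
# Datahub studies can be loaded the same way. Assumptions (ours, not part of
# cBioPortal): the study ID equals the folder name, the report is called
# report_<study_id>.html, and portal.properties sits in the current directory.
# Set DRY_RUN=1 to print the command instead of running it.
load_study() {
  local study_dir study_id
  study_dir=$(cd "$1" && pwd) || return 1   # absolute path for the bind mount
  study_id=$(basename "$study_dir")
  local cmd=(docker run -it --rm --net cbio-net
    -v "$study_dir:/study:ro"
    -v "$PWD:/outdir"
    -v "$PWD/portal.properties:/cbioportal/portal.properties:ro"
    cbioportal/cbioportal:latest
    metaImport.py -u http://cbioportal-container:8080 -s /study
    --html="/outdir/report_${study_id}.html" -v -o)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show the command, do not execute it
  else
    "${cmd[@]}"
  fi
}

# Example:
#   DRY_RUN=1 load_study datahub/public/blca_tcga   # inspect first
#   load_study datahub/public/blca_tcga             # then import
```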

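The import command first runs the data validator, which checks the staging files for common issues and lists them in the HTML report (report_blca.html in your current directory, which is mounted as /outdir). If the validator finds no errors, the importer continues and loads the data; the -o flag tells it to proceed despite validation warnings. The study has been loaded successfully when you see “Updating study status to : 'AVAILABLE' for study: blca_tcga”.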
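If you save the importer output to a file, for example by appending 2>&1 | tee import_blca.log to the docker run command, the success line can also be checked by a script. This is a minimal sketch; study_available and the log file name are illustrative choices of ours, and the function only searches for the exact status line quoted above.

```shell
#!/usr/bin/env bash
# Sketch: check a saved importer log for the success line quoted above.
# -F: fixed string; -w: the match must end on a word boundary, so
# "blca" does not accidentally match "blca_tcga".
study_available() {
  local study_id=$1 log_file=$2
  grep -qwF "Updating study status to : 'AVAILABLE' for study: ${study_id}" "$log_file"
}

# Example, after saving the output to import_blca.log:
#   study_available blca_tcga import_blca.log && echo "blca_tcga is loaded"
```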

The last thing to do is restart Tomcat so that cBioPortal picks up the updates made to the database. Tomcat runs inside the cBioPortal container, so this is done by restarting that container:

# Restart the cBioPortal container

$ docker restart cbioportal-container
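Tomcat takes a little while to start again after the restart, so the portal may not respond right away. A small retry helper such as the hypothetical wait_until below can poll it until it answers; the number of attempts and the delay in the example are arbitrary values.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to <tries> times, sleeping <delay> seconds
# between attempts; succeed as soon as the command succeeds.
wait_until() {
  local tries=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    # no need to sleep after the final attempt
    if [ "$i" -lt "$tries" ]; then sleep "$delay"; fi
  done
  return 1
}

# Example: poll the portal (up to 60 attempts, 5 seconds apart):
#   wait_until 60 5 curl -sf -o /dev/null http://localhost:8081 \
#     && echo "cBioPortal is back up"
```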

The TCGA Bladder Cancer study should now be visible in your local cBioPortal instance, e.g. at http://localhost:8081.
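To check this without a browser, you can query the portal's web API. The sketch below assumes that your cBioPortal version exposes the study endpoint /api/studies/<study_id> and returns an HTTP error for unknown studies. study_in_portal is our own helper name, and port 8081 is the example port used in this post.

```shell
#!/usr/bin/env bash
# Sketch: ask the portal's web API whether a study exists. Assumes the REST
# endpoint /api/studies/<study_id> of your cBioPortal version, which answers
# with an HTTP error (e.g. 404) for unknown studies; curl -f turns that into
# a non-zero exit status.
study_in_portal() {
  local base_url=$1 study_id=$2
  # ${base_url%/} drops a trailing slash so the URL has no "//"
  curl -sf -o /dev/null "${base_url%/}/api/studies/${study_id}"
}

# Example:
#   study_in_portal http://localhost:8081 blca_tcga && echo "blca_tcga is visible"
```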

Further information on loading data and on the parameters used can be found on GitHub and in the cBioPortal documentation. For commercial support with data loading or transformation, you can contact us here.
