In the MRC MATURA project, The Hyve has been working with Queen Mary University of London (QMUL) to enable GWAS analysis within tranSMART on Postgres, sponsored by Janssen. As detailed below, we have ported the GWAS functionality from Oracle to Postgres, set up pipelines for loading the reference and analysis data, and provided training and example code to the end users. The work was done in collaboration with Mike Barnes and Evan Tzanis of the William Harvey Research Institute Bioinformatics team at Queen Mary University of London, and Anthony Rowe from Janssen. The team from The Hyve that worked on this project consisted of Wibo Pipping, Gustavo Lopes and Sjoerd van Hagen.
The goal of the project was to enable GWAS analysis in the MATURA project. To establish what this required, Wibo and I gathered requirements with Mike and Evan during a productive and pleasant two-day visit to QMUL.
We came up with the following requirements:
- end users have to be able to perform a GWAS analysis end to end: create the cohorts from data in tranSMART, run the analysis with plink, and load the results back into tranSMART for visualization
- it has to work with Postgres
- Evan has to be able to set up the servers within the MATURA infrastructure
- any additional requirements were to be collected from the end users
To meet these requirements, we:
- gathered user feedback on the R client and the GWAS viewer
- trained the end users in using the R client to create cohorts and to turn those cohorts into a phenotype file for plink
- gave an example of how to run the plink tool to do the GWAS analysis
- wrote a Python script to transform the plink output into an input file to transmart-batch
- adapted transmart-batch to be able to load GWAS analyses
- ported the GWAS code to Postgres (previously it was only implemented for Oracle)
- provided Evan with documentation on how to install GWAS-enabled tranSMART
- provided documentation for the end users on how to use the system
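The cohort-to-plink steps in the list above can be illustrated with a minimal Python sketch. It is not the code from the training. It assumes the two cohorts have already been exported from tranSMART as lists of subject identifiers (in the project this was done with the R client). The function names, the file names and the choice to use the subject ID as both family and individual ID are assumptions made for illustration. The phenotype file follows the standard plink layout (FID, IID, phenotype, with 1 for controls and 2 for cases), and the command uses the plink options `--bfile`, `--pheno`, `--assoc` and `--out`.

```python
from pathlib import Path


def write_phenotype_file(cases, controls, path):
    """Write a plink phenotype file with FID, IID and a case/control column.

    plink encodes controls as 1 and cases as 2. Using the subject ID as both
    FID and IID is an assumption; in practice take the IDs from the .fam file.
    """
    lines = ["FID\tIID\tPHENO"]
    lines += [f"{subject}\t{subject}\t2" for subject in cases]
    lines += [f"{subject}\t{subject}\t1" for subject in controls]
    Path(path).write_text("\n".join(lines) + "\n")


def plink_assoc_command(bfile, pheno_path, out_prefix):
    """Build the argument list for a basic case/control association test."""
    return [
        "plink",
        "--bfile", bfile,            # prefix of the .bed/.bim/.fam genotype files
        "--pheno", str(pheno_path),  # phenotype file written above
        "--assoc",                   # basic allelic association test
        "--out", out_prefix,         # results are written to <out_prefix>.assoc
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the analysis, provided `plink` is installed and on the `PATH`.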
Below is the diagram describing the flow.
- transmart-batch has been updated to be able to load GWAS analysis data (https://github.com/thehyve/transmart-batch/blob/master/docs/data_formats/gwas.md)
- transmart-data has been updated to load the reference data (https://github.com/transmart/transmart-data/commit/974b6e0b70ec9efc90afa82061d941d5d62582b2)
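To give an idea of what the step between plink and transmart-batch involves, here is a hedged Python sketch of such a transformation. It is not the script written in the project. The input columns are the ones plink writes for `--assoc` (CHR, SNP, BP, A1, F_A, F_U, A2, CHISQ, P, OR). The output column names and the function name are hypothetical placeholders; the format that transmart-batch actually expects is described in the gwas.md document linked above.

```python
import csv


def assoc_to_tsv(assoc_path, tsv_path):
    """Convert a whitespace-delimited plink .assoc file to a tab-separated file.

    Rows without a p-value ("NA") are skipped. The output column names are
    hypothetical; check the transmart-batch GWAS documentation for the real ones.
    Returns the number of data rows written.
    """
    out_columns = ["rs_id", "chromosome", "position",
                   "allele1", "allele2", "p_value", "odds_ratio"]
    written = 0
    with open(assoc_path) as src, open(tsv_path, "w", newline="") as dst:
        header = src.readline().split()
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(out_columns)
        for line in src:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            row = dict(zip(header, fields))
            if row["P"] == "NA":
                continue  # no test result for this SNP
            writer.writerow([row["SNP"], row["CHR"], row["BP"],
                             row["A1"], row["A2"], row["P"], row["OR"]])
            written += 1
    return written
```

Reading the header line instead of relying on fixed column positions keeps the sketch tolerant of the padded, fixed-width layout plink uses.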