Easy data import into tranSMART using the Arborist and tranSMART toolkit

Loading data into the tranSMART data warehouse can be a complicated task, even for data scientists and researchers who are experienced tranSMART users. The Hyve therefore developed two tools to make importing data easier: the Arborist and the tmtk (tranSMART toolkit). We developed these tools in close cooperation with BBMRI, TraIT and Health-RI.

The Arborist

The Arborist visual editor is a secure web application that enables data managers to collaborate on data modelling with non-technical data experts. It allows them to:

  • restructure the tranSMART tree with drag and drop
  • rename variables and values
  • add and edit metadata for any tree node
  • work with both low- and high-dimensional data

Try the Arborist for yourself here.

The code can be found on GitHub, under the GPL v3 license.

tmtk Python library

The tmtk (tranSMART toolkit) Python library allows users to create and load studies without the need for in-depth tranSMART knowledge. It enables:

  • fast creation of studies from tabular files (e.g. XLS, TSV, CSV)
  • extensive dataset validation
  • direct, embedded use of the Arborist in Jupyter Notebook
  • loading studies into the Arborist web application for collaboration
  • many functions for working with low- and high-dimensional data

Documentation on the tmtk can be found here.

The code can be found on GitHub, under the GPL v3 license.

Importing data into tranSMART in five simple steps

So how do you transform your data and import it into tranSMART using the Arborist and the tranSMART toolkit? It takes five simple steps, or six if you still need to install the tmtk Python library for use in Jupyter Notebook.

Step 1: Import

Start the import wizard in the tmtk to create a study based on the study data you want to import.

Step 2: Validate

Let the tmtk check the tranSMART-specific requirements.
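To give an impression of what automated validation of a tabular file involves, here is a minimal sketch in plain Python. It does not use the tmtk and does not reproduce its actual checks: the function name `validate_clinical_table`, the `SUBJ_ID` column name and the two rules shown (consistent column counts, unique non-empty subject identifiers) are assumptions chosen purely for illustration.

```python
import csv
import io


def validate_clinical_table(text, id_column="SUBJ_ID", delimiter="\t"):
    """Return a list of human-readable problems found in a tabular data file.

    Hypothetical helper for illustration only; not part of the tmtk.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    header, body = rows[0], rows[1:]
    if id_column not in header:
        return [f"missing subject identifier column '{id_column}'"]
    id_index = header.index(id_column)

    problems = []
    seen = set()
    # Data rows start on line 2, directly after the header line.
    for line_number, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} columns, "
                f"found {len(row)}"
            )
            continue
        subject = row[id_index].strip()
        if not subject:
            problems.append(f"line {line_number}: empty subject identifier")
        elif subject in seen:
            problems.append(
                f"line {line_number}: duplicate subject identifier '{subject}'"
            )
        else:
            seen.add(subject)
    return problems


# Made-up sample data: one repeated subject and one short row.
sample = "SUBJ_ID\tAge\tGender\nP1\t34\tF\nP1\t51\tM\nP3\t47\n"
for problem in validate_clinical_table(sample):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets a data manager fix a file in a single pass before moving on to the next step.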

Step 3: Edit

Make the required changes to your tree with the visual Arborist editor or the appropriate functions in the tmtk.

Step 4: Save

Store the study files on your computer as "tranSMART-ready staging files".

Optional Step 4b: The Arborist

Transfer the study to the Arborist web application for easy collaboration!

Step 5: Load

Use the tranSMART-batch loading tool in the tmtk from within your Jupyter notebook to load your data into tranSMART.

Contact us for more information on how to use the tranSMART toolkit and the Arborist with your data set!
