Over the past years, RADAR-base has grown into a stable, user-friendly platform. In this blog post, we give you an impression of the technical problems our team encountered along the way and how we overcame them.
The open-source software platform RADAR-base, an initiative of the RADAR-CNS consortium, was built to support clinical studies with remote monitoring technologies, using sensor data from wearable devices such as smartwatches and Android smartphones. Remote monitoring gives scientists more insight into the health status of patients and should enable early detection of relapse, allowing patients and health care personnel to act quickly and adequately.
The platform was officially launched in April 2018, but a number of pilot studies were already running at that time. In the past year, The Hyve gained lots of hands-on experience with RADAR-base as the number of patients contributing sensor data increased from dozens to hundreds. At the Bio-IT World Conference (15-17 May 2018 in Boston, MA), RADAR-base won the Best of Show Award in the category Data Integration & Management, partly because of the pilot studies that were successfully running.
While developing the platform, we often ran into the problem that not all Android devices are the same ‘under the hood’. This meant that data we expected to collect was not available on all devices. For example, not all tablets provide location data, not all smartphones have a light sensor next to the front camera (used to keep track of day and night), and not all devices have a magnetic field sensor (used for more accurate orientation calculation). These variations meant that the code had to be adjusted frequently to work with a range of smartphone brands and types. Sometimes, it was even necessary to adjust the code to an updated version of the Android software.
The main mechanism for pairing Android apps to the RADAR-base platform, a QR code scan, turned out to work well only with high-resolution cameras. To make the QR code easier to read, we reduced its size. In addition, we changed the pairing mechanism to allow for shorter codes. This creates a fallback in case the QR verification does not work after all and the user needs to enter the pairing code manually. These measures enabled us to maintain a pairing mechanism that is both easy to use and secure.
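To illustrate the idea of a short pairing code that can also be typed by hand, here is a minimal Python sketch. It is not the RADAR-base implementation: the class name `PairingCodes`, the code length, the ten-minute lifetime and the reduced alphabet are all assumptions made for this example.

```python
import secrets
import time

# Assumed alphabet: characters that are easily confused (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


class PairingCodes:
    """Hypothetical registry of short, single-use pairing codes that expire."""

    def __init__(self, length=8, ttl_seconds=600, clock=time.monotonic):
        self.length = length
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._pending = {}  # code -> (subject_id, expiry time)

    def issue(self, subject_id):
        """Create a new code; the same string can go in a QR code or be typed."""
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
            if code not in self._pending:
                break
        self._pending[code] = (subject_id, self.clock() + self.ttl_seconds)
        return code

    def redeem(self, typed):
        """Return the subject for a valid code, or None. A code works only once."""
        # Normalise manual input: ignore case, spaces and dashes.
        code = typed.upper().replace(" ", "").replace("-", "")
        entry = self._pending.pop(code, None)
        if entry is None:
            return None
        subject_id, expires_at = entry
        if self.clock() > expires_at:
            return None
        return subject_id
```

Because the code is short-lived and single-use, it can be kept short enough to type without carrying the long-lived credentials itself; in this sketch the app would exchange it for those credentials after a successful `redeem`.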
Presenting the collected data in a number of comprehensible graphs on the dashboard also proved a challenge. The first platform setup had a relatively simple backend that could handle data from thousands of users in parallel. However, the scope of the dashboard was later expanded to show near-raw data, with charts displaying years of data at weekly intervals or a few minutes of data at 10-second intervals.
At first, the backend naively collected all data in an interval and then computed the aggregates. With a limited number of users providing data, that worked fine, but as the platform expanded from dozens to hundreds of participants, this approach caused memory and CPU issues in production. We therefore decided to precompute the intervals to prevent the REST API from being overloaded. A second improvement was the introduction of reservoir sampling, which means that not all data points are used but only a representative sample. This has the advantage that fewer elements need to be kept in memory.
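The combination of precomputed intervals and reservoir sampling can be sketched as follows. This is a minimal Python illustration using the classic "Algorithm R", not the platform's own code; the names `Reservoir`, `window_start` and `precompute`, and the default capacity of 100, are assumptions for this example.

```python
import random


class Reservoir:
    """Keeps a uniform random sample of at most `capacity` values (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.count = 0  # number of values seen so far
        self.sample = []
        self.rng = rng or random.Random()

    def add(self, value):
        self.count += 1
        if len(self.sample) < self.capacity:
            # The reservoir is not full yet: keep every value.
            self.sample.append(value)
            return
        # Keep the new value with probability capacity / count,
        # replacing a uniformly chosen element of the sample.
        j = self.rng.randrange(self.count)
        if j < self.capacity:
            self.sample[j] = value


def window_start(timestamp, width_seconds):
    """Start of the fixed-width interval that contains `timestamp`."""
    return timestamp - (timestamp % width_seconds)


def precompute(readings, width_seconds, capacity=100, seed=None):
    """Group (timestamp, value) readings into intervals, one reservoir each."""
    rng = random.Random(seed)
    windows = {}
    for timestamp, value in readings:
        key = window_start(timestamp, width_seconds)
        if key not in windows:
            windows[key] = Reservoir(capacity, rng)
        windows[key].add(value)
    return windows
```

Each interval holds at most `capacity` values no matter how many readings arrive, so memory use is bounded per interval. Statistics such as a median can then be estimated from the sample instead of from all raw values.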
Still not content with the time it took to calculate and present all graphs on the dashboard, we eventually solved the issue by changing the interval at which Kafka Streams commits updates to the platform. The end result is an informative dashboard with multiple graphs that provide insight into the patient’s health values and, most importantly, loads within an acceptable amount of time.
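In Kafka Streams, the commit interval is controlled by the `commit.interval.ms` setting. The Python sketch below only renders a properties fragment with that setting made explicit. The helper name `streams_properties` and the application id, broker address and interval value in the usage note are placeholders; this post does not state which value RADAR-base ended up using.

```python
def streams_properties(application_id, bootstrap_servers, commit_interval_ms):
    """Render a Kafka Streams properties fragment with an explicit commit interval."""
    if commit_interval_ms < 0:
        raise ValueError("commit_interval_ms must not be negative")
    settings = {
        "application.id": application_id,
        "bootstrap.servers": bootstrap_servers,
        # How often Kafka Streams commits its progress; cached aggregate
        # updates are flushed downstream on commit.
        "commit.interval.ms": str(commit_interval_ms),
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"
```

For example, `streams_properties("example-aggregator", "kafka-1:9092", 10000)` yields a fragment in which updates are committed every ten seconds. A shorter interval makes results visible sooner at the cost of more writes; a longer one batches more updates per commit.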
Data storage and processing
With an increasing number of users sending data from smartphones and wearable devices to the platform, the storage space and data management capacity of our servers became an issue. All RADAR-base data is contained within four silos. For each silo, measures were taken to keep the data flow manageable:
- Apache Kafka, the open-source stream-processing platform, is used to collect real-time sensor data and compute aggregates. The data is stored for a month. This ensures that whenever a failure occurs downstream, we have enough time to correct it while retaining all collected data.
- The aggregates computed in Kafka are stored in the database MongoDB. There we ran into the problem that MongoDB has its own bookkeeping and failsafe mechanisms to ensure no data is lost in case of a failure. Some of these measures were redundant with the back-up Kafka provides. We decided to reduce MongoDB’s bookkeeping as this reduced the data storage requirements.
- Raw data is copied from Kafka and stored in the Hadoop Distributed File System (HDFS), retaining all previous values. Once the data has been verified and published within the RADAR-CNS consortium, there is no need to keep hold of those records. To free up valuable disk space, a removal process was put in place for old, published HDFS data.
- To keep track of changes, data that is extracted from HDFS and readied for internal publication gets a version number. Because of the nature of streaming data, each version is just a snapshot of some moment in time, and each subsequent version largely overlaps with previous versions for older data. Therefore, we are looking into filesystem-level enhancements such as snapshots and copy-on-write files to ensure we only copy the data that changes, not all data. Otherwise, we would need to add a new hard disk for every few publications we make. Another alteration to the platform is that the data extraction from HDFS is now extensible: it is ready to be coupled with Amazon S3 to store releases off-site, and it can be extended with a custom directory layout, file compression or file format. To top it off, the HDFS extraction process does all of this in a multithreaded way.
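The reasoning behind storing only what changes between versions can be illustrated with a small Python sketch. Note that this is an application-level stand-in (content-addressed storage kept in memory), not the filesystem-level snapshots or copy-on-write files mentioned above; the class name `ReleaseStore` and the file paths in the usage note are hypothetical.

```python
import hashlib


class ReleaseStore:
    """Hypothetical store in which overlapping releases share unchanged files."""

    def __init__(self):
        self.blobs = {}     # SHA-256 hex digest -> file content, stored once
        self.versions = []  # one manifest per release: path -> digest

    def publish(self, files):
        """Record a release from {path: bytes}; return how many blobs were new."""
        manifest = {}
        added = 0
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                # Only content that was not seen in an earlier release takes space.
                self.blobs[digest] = content
                added += 1
            manifest[path] = digest
        self.versions.append(manifest)
        return added

    def read(self, version, path):
        """Return the content of `path` as it was in release number `version`."""
        return self.blobs[self.versions[version][path]]
```

If a second release repeats a file such as `p1/2018-01.csv` unchanged and only adds or modifies newer files, `publish` stores just the new or modified content, while `read` still returns every earlier version exactly as it was published.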
Last, but not least: we raised the legal standards of RADAR-base. The privacy policies of RADAR-base apps have been updated to be GDPR compliant. In addition, the management portal is now fully auditable, with a change log of all changes made to the database. This ensures that the platform is usable in tightly controlled environments, such as (academic) research facilities, pharmaceutical companies, contract research organizations (CROs), hospitals, and other health care institutions.
If you have any questions on RADAR-base, contact us.
If you would like to read more on this platform, read our previous blog posts: