Six quick tips to start making your data FAIR

01-02-2018 5 min read

This is an update of a previous blog post, which listed four quick and easy tips. We have updated the old tips and added two new tips at the end of the post.

You hear more and more about making your data “FAIR”. Research now produces large volumes of data in a wide variety of types, and after running into data management difficulties in large-scale projects, researchers have realized that making their data Findable, Accessible, Interoperable, and Reusable is becoming a necessity.

So you want to follow the best practices for data management and want to make your data FAIR. Where do you start? With these six quick and easy tips you can start your FAIR journey today.

Describe the origin of your dataset in the metadata

Many datasets lack a description of their origin. Where was the data created? How, and by whom? Has it been processed, or is it the raw dataset?

Other researchers need answers to these questions before they can actually use your data in their own research. A genomic dataset created with the Illumina next-generation sequencing platform is different from one created with a Roche sequencer, and a researcher's pipeline may only be compatible with data from a specific machine.

Recording this in the metadata (the information that describes your dataset, so basically data about your data) will greatly help others filter for relevant and usable datasets. A simple place to store this information is a README file that you add to your dataset.
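If you want to write such a README consistently for every dataset, a small script can help. The Python sketch below assumes a handful of hypothetical field names (`created_by`, `created_at`, `method`, `processing`); they are only illustrations, not a metadata standard. It lists each of the origin questions above with its answer and marks missing answers as UNKNOWN, so gaps stay visible instead of silently disappearing.

```python
# A minimal sketch; the field names are hypothetical, not a metadata standard.
ORIGIN_FIELDS = [
    ("created_by", "Who created the data?"),
    ("created_at", "Where was the data created?"),
    ("method", "How was the data created (instrument, platform, protocol)?"),
    ("processing", "Is this the raw data, or has it been processed (and how)?"),
]


def render_origin_readme(title: str, origin: dict) -> str:
    """Render a plain-text README section describing where a dataset came from.

    Unanswered questions are written as UNKNOWN so the gaps stay visible
    instead of being silently left out.
    """
    lines = [title, "", "Origin of the data", ""]
    for key, question in ORIGIN_FIELDS:
        answer = origin.get(key, "").strip() or "UNKNOWN"
        lines.append(question)
        lines.append("  " + answer)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own.
    print(render_origin_readme(
        "Example genomic dataset",
        {
            "created_by": "Example Lab, Example University",
            "method": "Illumina next-generation sequencing platform",
            "processing": "Raw reads, no filtering applied",
        },
    ))
```

You could write the result to a README.txt file in the root of your dataset, for example with `pathlib.Path("README.txt").write_text(...)`.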

Describe the purpose of your dataset in the metadata

Why was (or is) this dataset created? What was your goal when you created this dataset?

Many datasets are created for a single purpose. FAIRifying your data prompts you to think about other possible purposes or applications for it. While thinking about these purposes, you may come up with someone else who could benefit from your data. Sharing is caring.

Add a license if needed

Is your data freely available for use? For what purposes may it be used? Maybe you do not want it to be used in a publication, but it could still help other researchers narrow down their search.

Or maybe you want to get something in return for the use of your dataset. By adding a license to your data, you state the terms under which others may legally use it, so that it cannot lawfully be used for purposes for which it is not intended.

Some examples of these licenses are:

Public Domain Dedication and License (PDDL) — “Public Domain for data/databases”
Attribution License (ODC-By) — “Attribution for data/databases”
Open Database License (ODC-ODbL) — “Attribution Share-Alike for data/databases”

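Each license suits a different goal. The "Public Domain Dedication and License" places your data in the public domain, so anyone can use it for any purpose without conditions. The "Attribution License" lets others use your data as long as they attribute it to you, and the "Open Database License" additionally requires that anyone who publicly shares an adapted version of your database does so under the same license.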
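To make the license machine-readable too, you can record it in your metadata by its SPDX identifier: `PDDL-1.0`, `ODC-By-1.0` and `ODbL-1.0` for the three licenses above. The Python sketch below assumes your metadata is a simple dictionary that you later save as JSON; the `add_license` helper and the short names used as keys are hypothetical.

```python
import json

# SPDX identifiers of the three Open Data Commons licenses listed above.
ODC_LICENSES = {
    "PDDL": "PDDL-1.0",
    "ODC-By": "ODC-By-1.0",
    "ODC-ODbL": "ODbL-1.0",
}


def add_license(metadata: dict, short_name: str) -> dict:
    """Return a copy of `metadata` with a 'license' entry holding an SPDX identifier.

    An unknown short name raises KeyError, so a typo cannot slip in as an
    unrecognizable license string.
    """
    return {**metadata, "license": ODC_LICENSES[short_name]}


if __name__ == "__main__":
    record = add_license({"title": "Example dataset"}, "ODC-ODbL")
    print(json.dumps(record))  # {"title": "Example dataset", "license": "ODbL-1.0"}
```

Using a standard identifier instead of free text means that tools which know the SPDX list can recognize the license without a human reading it.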

Store your data in a FAIR way

Once you have gone through all the steps above, you will of course want to share your amazing FAIR dataset with others in an easy way.

Storing your data in a place that does not require a login or review by a data access committee reduces the effort needed to download it, making it easily accessible.

Some examples of these storage places are:

For a more in-depth perspective, read our blog post on archiving in a FAIR way.

Describe how someone can get access to your data

Some data can be stored openly and made available to everyone. However, a lot of data is too sensitive to share in that way.

Once your metadata describes what is inside your dataset well enough for a researcher to judge whether it could be valuable to them, it is good practice to think about, and describe, how that person can actually get access to your data. Should they send an email to someone? What should such an email definitely include? Bonus points if you can describe this in a way that a machine can interpret as well!
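One way to earn those bonus points is to publish the access procedure as structured data next to the human-readable text. The Python sketch below is only an illustration: the keys (`access`, `contact`, `request_must_include`), the email address and the helper `draft_access_request` are all hypothetical. Because the requirements are stored as a list, a script can turn them into a request template automatically.

```python
import json

# Hypothetical access description; the keys are illustrative, not a standard.
ACCESS_INFO = {
    "access": "on-request",  # or "open"
    "contact": "data-steward@example.org",
    "request_must_include": [
        "name and affiliation",
        "intended use of the data",
        "confirmation that the data will not be redistributed",
    ],
}


def draft_access_request(info: dict, requester: str) -> str:
    """Draft a request email that lists every item the data owner asks for."""
    if info["access"] == "open":
        return "No request needed: the data is openly available."
    checklist = "\n".join(f"- {item}: <fill in>" for item in info["request_must_include"])
    return (
        f"To: {info['contact']}\n"
        f"Subject: Data access request from {requester}\n"
        "\n"
        f"Please find the requested information below.\n{checklist}\n"
    )


if __name__ == "__main__":
    # Publishing ACCESS_INFO as JSON next to the README lets scripts read it too.
    print(json.dumps(ACCESS_INFO, indent=2))
    print(draft_access_request(ACCESS_INFO, "Jane Doe"))
```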

Add machine-readable metadata

So, at this point, you have a lot of extra metadata, but it may not yet be used for anything. If you describe your metadata in RDF (the Resource Description Framework), you and others can query it alongside similarly described data through a so-called SPARQL endpoint. This benefits you as well as other researchers!
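To give an idea of what RDF metadata looks like, the Python sketch below writes dataset descriptions as N-Triples, using terms from two existing vocabularies: W3C DCAT (`dcat:Dataset`) and Dublin Core terms (`dct:title`, `dct:description`, `dct:license`). The dataset and license IRIs under example.org are hypothetical. In practice you would use an RDF library and load the triples into a triple store that offers a SPARQL endpoint; the `match` function here is only a toy stand-in that answers a single triple pattern, with the equivalent SPARQL query kept as a string for comparison.

```python
# A minimal sketch using only the standard library; it is not a full RDF toolkit.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


class IRI(str):
    """Marks a string as an IRI instead of a plain text literal."""


def dataset_triples(iri: IRI, title: str, description: str, license_iri: IRI):
    """Describe one dataset as (subject, predicate, object) triples."""
    return [
        (iri, RDF_TYPE, IRI(DCAT + "Dataset")),
        (iri, DCT + "title", title),
        (iri, DCT + "description", description),
        (iri, DCT + "license", license_iri),
    ]


def _escape(text: str) -> str:
    # N-Triples escapes for backslash, double quote and line breaks.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if isinstance(o, IRI) else f'"{_escape(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def match(triples, s=None, p=None, o=None):
    """Toy lookup for one triple pattern; None plays the role of a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# The SPARQL query that the toy lookup in __main__ mimics.
SPARQL_EQUIVALENT = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset WHERE { ?dataset dct:license <https://example.org/licenses/odbl> . }
"""

if __name__ == "__main__":
    odbl = IRI("https://example.org/licenses/odbl")  # hypothetical license IRI
    pddl = IRI("https://example.org/licenses/pddl")  # hypothetical license IRI
    graph = (dataset_triples(IRI("https://example.org/dataset/1"),
                             "Sequencing run A", "Raw Illumina reads", odbl)
             + dataset_triples(IRI("https://example.org/dataset/2"),
                               "Sequencing run B", "Processed variant calls", pddl))
    print(to_ntriples(graph))
    # Which datasets are published under the (hypothetical) ODbL IRI?
    print([s for s, _, _ in match(graph, p=DCT + "license", o=odbl)])
```

Because both datasets use the same shared terms, one query finds every dataset with a given license, no matter who described it. That shared vocabulary is what makes the metadata interoperable.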

And one extra tip: SCREAM IT FROM THE ROOFTOPS.

Let other researchers know that your data is open and available. Data for them, citations for you: it’s a win-win situation.