Ensuring the Reproducibility of Science

On September 11, 2019, a one-day gathering of data professionals, led by CENDI, NISO/NFAIS, The National Academies, and RDA, set out to better understand the role, pitfalls, and opportunities of the FAIR data principles. The meeting, titled Implementing FAIR Data for People and Machines: Impacts and Implications, covered what counts as FAIR data, how the FAIR principles might change governmental use of data, the connections between FAIR and scholarly outputs, and what changes must take place in research funding and scientific processes for FAIR data to become an integrated part of the research effort.

For those who may be unfamiliar with the acronym, FAIR stands for Findable, Accessible, Interoperable, and Reusable; it represents a set of principles originally published in 2016 by a consortium of scientists and researchers in Scientific Data. Growing out of a largely European beginning, FAIR data principles are being embraced by researchers around the world, and for fairly obvious reasons: if one of the foundational principles of science is reproducibility, and data is the foundation of modern science, then we cannot have a healthy worldwide scientific ecosystem unless researchers are able to find, access, and reuse one another's data to further their own research.

Keynote speaker Barend Mons, President of CODATA (a commission of the International Science Council) and Professor of Bioinformatics at Leiden University in the Netherlands, joked that FAIR could just as easily stand for “Future AI Ready” data, because the FAIR principles are concerned only with the machine side of data. For each of the principles, the question is whether it is being met for machines, not for human consumption. Does the data carry machine-readable metadata that makes it discoverable? Can other computers access the data through well-described and open APIs? Can the data be transformed into other formats for reuse in other services and systems? These are the sorts of questions that determine whether data is considered FAIR.
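To make that machine-side framing concrete, here is a minimal, illustrative sketch in Python of what such checks might look like. It is not an official FAIR assessment tool; the field names loosely follow schema.org-style dataset metadata, and the particular signals checked are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: scan a dataset's metadata record for a few
# machine-actionable signals behind each FAIR principle. Field names and
# the signals themselves are assumptions, not an authoritative checklist.
from typing import Dict, List

def fair_signals(metadata: Dict) -> Dict[str, List[str]]:
    """Return the machine-actionable FAIR signals found in a metadata record."""
    found = {"Findable": [], "Accessible": [], "Interoperable": [], "Reusable": []}

    # Findable: a persistent identifier and descriptive keywords
    if metadata.get("identifier", "").startswith("https://doi.org/"):
        found["Findable"].append("persistent identifier (DOI)")
    if metadata.get("keywords"):
        found["Findable"].append("descriptive keywords")

    # Accessible: a resolvable location for the data itself
    if metadata.get("distribution", {}).get("contentUrl"):
        found["Accessible"].append("resolvable download URL")

    # Interoperable: an open, non-proprietary format
    fmt = metadata.get("distribution", {}).get("encodingFormat", "")
    if fmt in ("text/csv", "application/json", "application/x-netcdf"):
        found["Interoperable"].append(f"open format ({fmt})")

    # Reusable: an explicit license and provenance information
    if metadata.get("license"):
        found["Reusable"].append("explicit license")
    if metadata.get("provenance"):
        found["Reusable"].append("provenance statement")

    return found


# Hypothetical metadata record for a dataset landing page
record = {
    "identifier": "https://doi.org/10.1234/example",
    "keywords": ["ocean temperature", "2019"],
    "distribution": {"contentUrl": "https://example.org/data.csv",
                     "encodingFormat": "text/csv"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

for principle, signals in fair_signals(record).items():
    print(principle, "->", signals or ["no machine-actionable signal found"])
```

The point of the sketch is simply that every check runs against metadata a machine can parse; at no step does a human need to read a data-availability statement to answer the questions above.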

Even that framing was too binary for some at the meeting, and there was a robust conversation around degrees of FAIRness as well. Some recommended that funders assess data by its degree of FAIRness across multiple aspects, rather than by a binary FAIR-or-not-FAIR verdict. One might easily imagine data that meets every test of one or two of the principles but clears only the lowest possible bar for the others. As was also pointed out during the day's discussions, the FAIR data principles say nothing about the actual quality of the data in question. It’s possible to have perfectly FAIR, and entirely false, data. It’s also possible to have FAIR data that was produced unethically, or to use FAIR data for what most would consider unethical projects. These additional concerns, which we hope are being tracked and noted for data in the modern era, will have to be governed by another set of principles entirely.
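As a rough illustration of the degrees-of-FAIR idea, one could imagine turning per-principle signals like those in the sketch above into fractional scores rather than a single yes/no verdict. The rubric below is purely hypothetical, standing in for whatever weighting a funder might actually adopt.

```python
# Companion sketch: score each principle as the fraction of checked signals
# present, instead of a binary FAIR/not-FAIR verdict. Both the signal lists
# and the maximum counts are placeholders for a real rubric.
MAX_SIGNALS = {"Findable": 2, "Accessible": 1, "Interoperable": 1, "Reusable": 2}

def fairness_degrees(signals: dict) -> dict:
    """Turn per-principle signal lists into scores between 0.0 and 1.0."""
    return {p: len(signals.get(p, [])) / MAX_SIGNALS[p] for p in MAX_SIGNALS}

# Example: a dataset that is highly findable but only partially reusable.
example = {
    "Findable": ["persistent identifier (DOI)", "descriptive keywords"],
    "Accessible": ["resolvable download URL"],
    "Interoperable": ["open format (text/csv)"],
    "Reusable": ["explicit license"],  # no provenance statement
}

print(fairness_degrees(example))
# {'Findable': 1.0, 'Accessible': 1.0, 'Interoperable': 1.0, 'Reusable': 0.5}
```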

It is clear from the interest and conversation that we’re going to be hearing a lot more about FAIR standards here in the US as scholars and publishers wrestle with how to meet these expectations. Funders are beginning to demand that outputs from their awards qualify as FAIR, and researchers themselves are ever more aware of the need for FAIR data in the modern scientific process. How these principles become instantiated in scholarship via publishing and citation is a much larger question without a good answer yet. More events like this one, bringing all the stakeholders together, will be necessary to sort out the complicated details of how FAIR principles will shape modern and future science.

The joint sponsors are working out the logistics of sharing the presentations and a recording of the event.