Report: Data Tracking in Research

Summary

Issued by the Committee on Scientific Library Services and Information Systems of the Deutsche Forschungsgemeinschaft (DFG), the German Research Foundation, this 12-page briefing paper addresses data tracking in digital research resources. It focuses on the transformation of scholarly publishers from content providers into providers of analytics, and on the problems this may pose for academic institutions.

The report consists of four sections:

  1. Description of the current situation
  2. Transformation of Major Publishers and Their Relationship with the Academic Community
  3. Types of Data Mining
  4. Conclusion

The intention behind its release was to encourage debate on “the practice of tracking, its legality, the measures required for compliance with data protections and the consequences of the aggregation of usage data, thereby enabling such measures to be adopted.”

Pull quote: Research tracking is carried out using an ensemble of tools ranging from tracking site visits via authentication systems to detailed real-time data on the information behaviour of individuals and institutions ... The result is that comprehensive data collections about research activities of individuals and entire institutions are being assembled by commercially operated corporations.

The report singles out, as an underlying cause for alarm, the rise of transformative licensing agreements, which grant the publisher rights of data access in return for supporting an institutional shift away from the subscription model of paid access. As a specific example, the paper references a Publish & Read agreement signed between Elsevier and the Netherlands in 2020. As Lisa Janicke Hinchliffe explains in a post on the Scholarly Kitchen blog, “a Publish-and-Read agreement is an agreement in which the publisher receives payment only for publishing and reading is included for no additional cost. Again, the library goal is typically a cost-neutral contract in comparison with the previous subscription-based reading agreement or perhaps a decrease in price; however, these goals are not always realized”. The authors of the briefing paper distrust the introduction of Seamless Access or, as they refer to it, the GetFTR strategy, whereby publishers, by introducing more reliable user authentication and system navigation, gather data on usage and workflow in the research cycle. The publishers then analyze and repackage the data in order to license it back to academic institutions. The concern is that this shift means “it will no longer be the public sector but increasingly private companies that are privy to knowledge about research content and trends, its institutions and stakeholders.”

The authors examine three types of data mining that publishers use to gather and analyze data. In their conclusion, they call for “clear-cut, transparent guidelines” to ensure that academic and research entities retain some degree of control over the gathering and use of data.