Report on Open Access: The Role and Impact of Preprint Servers: A NFAIS Forethought Event

Prelude: How NISO in-person events are changing

While virtual events have a place in discussing emerging trends and industry issues, their proliferation, the emphasis on lowering cost-related barriers to participation, and their technological limitations mean that they are not entirely successful in allowing the various stakeholder groups to engage fully in problem solving.

NISO sees a need to enable face-to-face networking in our cross-sector environment, both as a means of fostering a sense of community and as a way to focus attention on emerging issues and their resolution. This November event was an initial experiment in learning how best to foster industry engagement and springboard community interest and cooperation in developing needed solutions.

As noted in Todd Carpenter’s Letter appearing in the December mailing to members and non‑members, “NISO’s in-person events should be focused more on conversation than on presentation. We should take advantage of our colocation to drive discussion, interaction, and development of ideas...Events will now be focused on engaging the participants in a conversation about the topic, with a goal of producing tangible outputs to advance the community.”

Summary of presentations and small group discussions

The objective for the Fall on-site event was to look at the current status of preprint services and identify areas where best practices and guidelines might better assure practical value to the community. Approximately thirty paying participants attended this day-and-a-half-long seminar.

As a foundation for discussion, the November program included nine presentations from industry professionals. These presentations addressed:

  • Current use and function of preprints in general and discipline-specific communities
  • Funding models and long-term sustainability of preprint services
  • Institutional repositories (IRs) as an adjacent but not duplicative element of the scholarly infrastructure, and whether IRs might serve as an effective additional avenue for providing access
  • Impact of preprints on scholarly societies’ and commercial publishing activities
  • Platform requirements and related technical concerns associated with hosting preprints

Individual decks may be viewed on the NISO event page here.

Kent Anderson of Caldera Publishing Solutions presented his research into how preprints are actually functioning in the marketplace and the systemic gaps that have opened as a result. Preprints, he argued, are being used not as a means of enabling early peer review, but as a means of driving early awareness of work and fueling citations for the researcher. Anderson noted concerns over the financial sustainability of these services and offered examples of harm to society arising from access to unvetted scientific findings, poor quality manuscripts, and predatory behaviors. He offered six potential mechanisms that might be introduced to improve the value of preprint services and minimize negative uses. These included:

  • Eliminating the assignment of DOIs or other persistent identifiers to preprints
  • Limiting access to preprints to those members of the research community with the expertise necessary to understand and evaluate the content
  • Rescinding access to preprints that remain unpublished after a specified time period has elapsed
  • Offering a standard set of collaborative features across the spectrum of preprint services
  • Charging submission fees for uploading content to preprint services
  • Creating a point of editorial accountability

Other speakers responded to these suggestions in their subsequent talks; external reactions to Anderson’s recommendations may be seen in this Twitter thread of responses.

Later in the day, an attendee tweeted: “IDEA! What if #preprint servers had a Twitter style #bluecheck mark only available to researchers with ‘verified’ status in the community? Could this help mitigate the concerns around bogus science living online forever? Question is: how do you get the check?” Responses to that single comment suggest that this interest in verifying expertise is not an isolated concern.

Oya Rieger, Senior Advisor, Ithaka, spoke about continued reliance on preprint services as an unfunded solution to funder requirements for open access (OA) to research. While there is tremendous enthusiasm for broader access to publicly funded research, the full infrastructure needed to support such access at scale is not yet in place, nor is it evident who will pay to engineer it. Such funding gaps become problematic if preprint repositories are to survive as archives. Both Rieger and Anderson cited the cost of running arXiv at $1.5 million annually. Other significant concerns from the libraries’ perspective were the potential fragmentation of the scholarly record and the lack of preservation strategies. Tweets from the event captured additional concerns that Rieger put forward:

  • There are no automated quality control systems to effectively link preprint articles to their final published versions.
  • @arxiv can be #PlanS compliant but what would that take? @arxiv is successful because the barrier to participate is low; it's easy. Not that many metadata fields (#sigh) and acquiring better metadata is expensive.
  • At least 33 preprint servers are now identifiable ... not counting repositories. So, they are proliferating. Can they play a role in compliance infrastructure?

Trying to ascertain who is using preprint services and to what extent is challenging on a number of levels. Thomas Narock, Associate Professor of Data Science at Goucher College, presented research indicating the differences in acceptance and use of preprints across a variety of disciplines. Narock and his co-author looked at differences in usage across domains, use of preprints vs. postprints, topic overlap across domains, access to peer reviewed versions, and network connections among authors (full text article here). Unsurprisingly, the adoption and use of preprints is high in many scientific disciplines but less so in the humanities and social sciences. Similar research had been released earlier in 2019 by Richard Abdill and Ran Blekhman.

IRs serve a related purpose, but operate in a space adjacent to that of preprint repositories. Where IRs generally house the output of a single institution’s faculty, preprint repositories tend to be more focused and subject-oriented. As expressed by Tyler Walters of Virginia Tech, IRs are populated and maintained primarily by library staff, whereas preprint servers have emerged from researchers themselves and their associated communities, associations, and societies. Preprint servers also offer some degree of support for commentary and/or peer review, as well as notification alerts, neither of which IRs provide. At the same time, Walters noted that research universities have an ongoing responsibility for building and maintaining infrastructure (IRs, but potentially more). Some issues arise from this largely unfunded mandate, but there is also an opportunity for cross-sector collaboration in finding solutions:

  • What should the relationship be between government repositories (e.g., PAGES, PubAg, PubMed Central), preprint services, and IRs?
  • What about research data and the content of these repositories? What linkages are being built between them? How will they be sustained?
  • What impact does depositing research data have on a preprint server? Should the data be deposited with the preprint?

From the audience came a question regarding the interaction between university press publishing and the considerable amount of library publishing now under way. Walters responded that there is much siloed activity that could be improved and made more efficient. But from the Twitterverse watching output from this NISO event came a response from a research scientist at OCLC: Are academic IRs silos? Or “cylinders of excellence”?

Angela Cochran of the American Society of Civil Engineers noted that societies currently have three options for dealing with the proliferation of preprints. The first is to wait and see, adhering to existing policies of not accepting articles that have previously been made available. This may create a perception of being stodgy and/or behind the times, but it does protect subscription revenue. The second option is to acquiesce, allowing preprint submissions to the society’s journals. This shows support for preprints, but may well pose a threat to some portion of journal income. Finally, societies can get out in front by building their own preprint service or by partnering with an existing one.

Cochran pointed out that preprints currently represent only a “sliver” of publishing activities, but one that is growing. As Narock had previously noted, scholarly communities and disciplines are adopting “preprint culture” with varying degrees of enthusiasm. She recommended that societies monitor what is happening in their particular space, noting in closing that the impact of preprint services on society revenues is still unclear.

Greg Gordon of SSRN did not go deeply into the revenue implications of preprint servers, focusing instead on another problematic question: the definition of what constitutes a preprint. In his experience, scholarly output might include an idea paper, a working paper, a conference proceeding or poster, or the more rigidly defined pre-publication version of a completed article. Preprint servers may or may not be ingesting this variety, but the lack of a common vocabulary seemed to be a stumbling block to full integration of the preprint into the scholarly ecosystem. Gordon looked to a future in which a model for descriptive, linked metadata assigned to preprints might better support the creation of “knowledge cards,” such as those generated by Google.
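Gordon did not specify a concrete format, but as a purely hypothetical illustration, descriptive, linked metadata for a preprint could be expressed as schema.org JSON-LD, the kind of structured data search engines draw on when generating knowledge cards. Every value below is a placeholder.

```python
# A hypothetical sketch (not Gordon's proposal) of descriptive, linked
# metadata for a preprint, expressed as schema.org JSON-LD -- the kind of
# structured data search engines use when building knowledge cards.
import json

preprint_metadata = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Example Preprint Title",                 # placeholder title
    "author": {"@type": "Person", "name": "A. Researcher"},
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "datePublished": "2019-11-01",
    # Explicitly flag review status so downstream consumers can surface it.
    "creativeWorkStatus": "Preprint - not peer reviewed",
}

print(json.dumps(preprint_metadata, indent=2))
```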

On day two of the event, the focus shifted from the current status of preprints and stakeholder concerns to issues pertaining to systems and technology. Sara Rouhi of PLOS opened with several questions posed to attendees:

  1. What are preprints and are we talking about the same things when we say @SSRN vs. @MedArXiv, etc.?
  2. Who are they for and what are they meant to do? What are the distinctions between preprints and different types of repositories? They have different approaches, and they’re for different people.
  3. What is the ultimate benefit?

Gerry Grenier of IEEE addressed his organization’s experience in adopting Cochran’s third option of “Get Out In Front.” Reinforcing Kent Anderson’s point about the true function of preprints, Grenier noted that IEEE members wanted to drive early attention to their work, and the recent beta launch of TechRxiv for the engineering and computer science communities was the society’s response. IEEE determined that it had neither the time nor the funding to build an experimental service from scratch, and instead adopted a platform-as-a-service approach, looking at potential partners with existing options, such as Atypon, Figshare, and Research Square. One particular point of discussion dealt with access to the content and the need for moderators as mediators on TechRxiv. Referencing Kent Anderson’s point from the day before about limiting access to pre-publication content, Grenier noted that IEEE was still wrestling with the possibility of limiting access solely to members, but that doing so would introduce issues both for IEEE and for Figshare, its partner.

Kathleen Shearer of the Confederation of Open Access Repositories (COAR) to some extent echoed the points put forward by Tyler Walters of Virginia Tech in suggesting that institutions have a responsibility to support the research infrastructure. COAR’s vision of a global knowledge commons means that IRs might become the “foundation for a distributed, globally networked infrastructure for scholarly communication, on top of which layers of value added services will be deployed, thereby transforming the system, making it more research-centric, open to and supportive of innovation, while also collectively managed by the scholarly community.” This would necessitate adoption of new technologies, common behaviors, and value-added services. She referenced the proposed PubFair model, envisioned as a means of minimizing publishing costs and honoring academic standards while connecting communities with services linked to a preferred repository. (Version Two of the PubFair white paper, not available at the time of the event, is currently accessible here.) The model emphasizes interoperability, a decentralized network of platforms, and community-defined peer review and assessment processes. However, recognized issues with this model include its reliance on an unfunded mandate and on a continuing commitment to participation by global research communities. COAR expects to sponsor a multi-stakeholder meeting in January 2020 to begin defining vocabulary, standards, and protocols in support of its vision.

Katie Funk and Jeff Beck of NIH discussed the critical role of metadata in communicating to users the status of a particular version and the provenance of the service through which it was made available. Among their recommendations: that the preprint server be identified in the <journal title> field; that the <publisher> field name the organization responsible for enforcing the policies and practices of the server; that indicators of peer-reviewed or unvetted status be included; and that the vocabulary term “preprint” not be listed as the article type.
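Funk and Beck did not prescribe a specific serialization, but a minimal sketch of a record along the lines they recommend, using JATS-style element names, might look like the following; the server name, the responsible organization, and the <review-status> element are all hypothetical.

```python
# A minimal, hypothetical sketch of preprint metadata along the lines Funk
# and Beck recommend: the server identified as the journal title, the
# responsible organization as the publisher, and review status flagged
# explicitly. Element names approximate JATS conventions; <review-status>
# is an invented element, not part of any official schema.
import xml.etree.ElementTree as ET

front = ET.Element("front")

journal_meta = ET.SubElement(front, "journal-meta")
# The preprint server itself is identified as the <journal-title>.
ET.SubElement(journal_meta, "journal-title").text = "ExampleRxiv"
# The <publisher> names the organization enforcing the server's policies.
publisher = ET.SubElement(journal_meta, "publisher")
ET.SubElement(publisher, "publisher-name").text = "Example Research Society"

article_meta = ET.SubElement(front, "article-meta")
# Indicate unvetted status explicitly rather than relying on "preprint"
# as the article type.
ET.SubElement(article_meta, "review-status").text = "unvetted"

print(ET.tostring(front, encoding="unicode"))
```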

Countering the point made by Kent Anderson in his opening keynote, Funk and Beck noted that Crossref recommends that each version of a paper be assigned a unique DOI as a mechanism for differentiating between versions.

Their concluding recommendations included the following:

  • Facilitate linking from preprint to version of record / published version by including the preprint PID/DOI in the article metadata; a sketch of such linking follows this list. (NIH expectation)
  • Establish best practices for maintenance of the scientific record, e.g., withdrawals or retractions to deal with plagiarism or scientific misconduct. (NIH expectation)
  • Build a common vocabulary for preprints, e.g., publishing vs. posting, that can enable more productive discussions around standards.
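As a sketch of the first recommendation in practice: where the preprint server has actually deposited the link, Crossref exposes it on the preprint’s DOI record as an “is-preprint-of” relation, which a downstream service can follow to the version of record. The DOI below is a placeholder, and the relation is present only when it has been deposited.

```python
# Follow a deposited preprint-to-published link via the Crossref REST API.
# Returns the version-of-record DOI if the server/publisher has deposited
# the "is-preprint-of" relation, else None. The example DOI is a placeholder.
import json
import urllib.request

def version_of_record_doi(preprint_doi: str) -> str | None:
    url = f"https://api.crossref.org/works/{preprint_doi}"
    with urllib.request.urlopen(url) as resp:
        record = json.load(resp)["message"]
    # Crossref expresses preprint-to-article links as "relation" entries.
    for link in record.get("relation", {}).get("is-preprint-of", []):
        if link.get("id-type") == "doi":
            return link["id"]
    return None

print(version_of_record_doi("10.0000/example-preprint"))  # placeholder DOI
```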

The event closed with a visionary presentation by Jessica Polka of ASAPbio, which posed the idea that focusing on definitions might not be the right approach to thinking about preprints (see the full text here, posted to NISO IO in December). In her write-up of the event on the PLOS blog, attendee Sara Rouhi of PLOS noted the following:

“...we should focus our efforts to generate a clear ‘vocabulary for the full suite of peer review and screening checks that can be applied to any version of an article in the publishing continuum.’

“Much of the justified concern about the dissemination of research findings that aren’t peer reviewed can be mitigated by using the checks and taxonomies appropriate to the field, clearly indicating the moderation strategies used by that community/server, and effective version monitoring so readers understand how the version they’re reading fits into a wider scheme of community feedback.”

Output and Potential Work Items

The three small group discussions among attendees yielded the following observations and recommendations:

(1) Recognition that each research community is dealing with preprints according to its own established set of protocols and academic culture. This fragmentation across the spectrum of disciplines may result in confusion for those who seek to take advantage of OA materials (journalists, those dealing with a medical condition, students, etc.). Preprints, as one group noted, represent merely a form of available access.

(2) All three groups independently noted that it would be useful for the information community to come to grips with the need for a common nomenclature and set of definitions surrounding scholarly outputs, along with indicators of where a specific output is in the research work cycle (as in interim research data, preliminary results, initial article draft, preprint, continuing through version of record).

Other ideas arising from the different groups as areas requiring attention included best practices for:

  • Metrics and appropriate tracking of usage
  • Licensing of preprint submissions and content
  • Degree of validation associated with a particular submission
  • Implementation of single sign-on access

Potential work item:

(3) There is a need to educate the broader community about appropriate indicators of content trustworthiness, of proper vetting by the scientific community, and of appropriateness for mass consumption. (There was serious concern that effort be put toward making clear that a purely mechanical identifier or service [such as an assigned DOI] ought not to be perceived as an indicator of quality, of responsible peer review, and/or of a version of record, the final version of scholarship.)

NISO's Plan for the Future

NISO will continue to foster this type of engagement and to support these events, not simply to educate the membership, but to use them as springboards to common understanding and insight into emerging gaps for the community.