Fall CNI 2021 Suggests Trends for '22
One of the stranger aspects of the pandemic is how small our worlds have become. As one who would regularly spend time flying to distant locations, I found it odd to reflect that I hadn’t been to Washington, DC, less than 50 miles from my home, in almost two years. Yet in December, I ventured to DC for one of the very few events of 2021 that I attended in person, the fall meeting of the Coalition for Networked Information (CNI).
As Clifford Lynch, executive director of CNI, said in his opening remarks, “It’s almost strange” to be back together in person. Lynch’s keynote has always been—and continues to be—one of the highlights of the year, as he traverses the myriad of developments and trends affecting scholarly communications and information management. In roughly 45 minutes, Lynch deftly connected the dots among a set of trends as diverse as repositories, information security, humanities, research assessment, reproducibility, and artificial intelligence, all the while setting the stage for the program sessions that were to follow.
Lynch began by noting that many of the past two years’ changes in practice haven’t necessarily been innovations in technology, but rather wider applications of existing technology, such as controlled digital lending, virtual meetings, remote work structures, and digital signatures. Each of these had been piloted or was in regular use among pockets of the community but was not ubiquitously deployed. It was a perfect representation of the William Gibson quote, “The future has arrived—it’s just not evenly distributed yet.” Lynch left us to ponder which of these many new applications will remain when we return to something like the normal we knew two years ago.
It is very likely that many of the tools and resources that have positioned us to navigate the pandemic will become increasingly embedded in practice. Transformative technologies, Lynch noted, have a way of sneaking up on us; their nature isn’t fully realized until the situation changes and necessitates their application in new ways. Lynch highlighted HathiTrust as one example, but there have been many others, such as Zoom, ORCIDs, open repositories, and linked data.
Yet with all the potentially positive changes, there was also a mix of unfinished work and warning signs on the horizon. On the unfinished side remain the problems of discovery and attention management. With an ever-increasing pool of content, whether formally published or semi-formally released as preprints or grey literature, it is increasingly challenging to discern where a scholar should focus their limited attention. The proliferation of formats, repositories, and publication outlets only exacerbates this problem. Driving it is the rising expectation of publication quantity, without any corresponding assurance of higher-quality output. Of potentially greater concern is the increasing politicization of scholarship and its impact on the scholarly record. What areas of research are not being pursued because they are politically damaging or hot-button topics? This has always been the case in some areas (e.g., firearms, abortion), but the domains of science that become politicized seem to keep expanding, from the social sciences and pollution research to, now, epidemiology and the geosciences. Long-term funding of science, and the independence of the scientific ecosystem, could fall victim to these fracturing societal forces, as many other things have.
One challenge of attending a live event in person, somewhat mitigated in virtual meetings, is that one can follow only a single thread of sessions, rather than viewing recordings (though these are now often available after the fact) or bouncing between multiple concurrent sessions. The in-person sessions that particularly caught my attention at CNI focused on updates from the FOLIO initiative, a discussion of the future of DPLA, and the recently published FORCE11 data publishing ethics guidelines, each of which resonates with NISO initiatives in some fashion.
Of course, library systems and their evolution are a core element of NISO’s work. Open source projects, particularly ones at the scale and complexity of an ILS, are no small undertaking. Despite the substantial investments in FOLIO to date, the project’s achievements so far are modest, although significant progress has been made. FOLIO had its roots in the Open Library Environment (OLE) project, led by Duke University Libraries and launched in 2008; the OLE project team partnered with EBSCO in 2016 to begin work on FOLIO. The CNI project briefing focused on six institutions and their experience moving toward implementation of FOLIO, describing their decision-making and their engagement with implementation. While every institution needs systems to manage its library services, few have the resources to dedicate to contributing to an open source project, which has limited the project’s capacity. Even fewer seem willing or able to assume the risk of moving forward, with only 15 institutions as early implementers of the system, though again, it is early days. The presenters made clear early in the session that FOLIO is being designed as an extensible platform, not simply an ILS, to which new service modules can be added for things like reporting or controlled digital lending as needs evolve. This flexibility carries some challenges: in that model, the definition of a minimum viable service becomes vague, which can inhibit adoption. The chief value of the session was in hearing library leaders’ thinking about applications, their decision-making process for advancing new systems, and where they believe library systems will need to develop further in the future.
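To picture what that extensible-platform model implies, here is a minimal, hypothetical sketch of a plugin-style registry in which service modules declare what they provide and what they require before being enabled. The class and field names are illustrative inventions, not FOLIO’s actual module descriptors or APIs.

```python
# A minimal, hypothetical sketch of a plugin-style module registry, meant only to
# illustrate the "extensible platform" idea; the classes and field names are
# invented for illustration, not FOLIO's actual module descriptors or APIs.
from dataclasses import dataclass, field


@dataclass
class ServiceModule:
    """A self-describing service module, e.g., reporting or controlled digital lending."""
    module_id: str
    provides: list[str]                                # interfaces this module exposes
    requires: list[str] = field(default_factory=list)  # interfaces it depends on


class PlatformRegistry:
    """A central registry that wires modules together as needs evolve."""

    def __init__(self) -> None:
        self._modules: dict[str, ServiceModule] = {}

    def register(self, module: ServiceModule) -> None:
        # Refuse to enable a module whose declared dependencies are not yet available.
        available = {i for m in self._modules.values() for i in m.provides}
        missing = [r for r in module.requires if r not in available]
        if missing:
            raise ValueError(f"{module.module_id} is missing interfaces: {missing}")
        self._modules[module.module_id] = module


# Core inventory first, then an optional reporting module layered on top of it.
registry = PlatformRegistry()
registry.register(ServiceModule("mod-inventory", provides=["inventory"]))
registry.register(ServiceModule("mod-reporting", provides=["reporting"],
                                requires=["inventory"]))
```

The trade-off the presenters described follows directly from this shape: almost anything can be added as a module, which is exactly why the “minimum viable” set of services is hard to pin down.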
DPLA is also at a crossroads as it deals with the challenges of being a tool for navigating a distributed collection of repositories without hosting the highest-quality versions of those digital objects. During the session, presenters demonstrated the indexing problems this structure creates with Google. Google’s indexing favors high-quality original content, which is not what DPLA possesses at this point; rather, DPLA contains an aggregation of metadata and lower-resolution images for discovery purposes. Google is already crawling the content in the host institution’s repository and links users to that source material rather than to DPLA. As described during the session, this was tested with a set of travel photographs posted only to DPLA. Whether DPLA should adjust its model and become a centralized host of the original content was the central question of the session. From my perspective, the discussion was also indicative of two issues with the repository community and the Web more generally.
The first has to do with the nature of generalist repositories that collect material from any domain or on any theme and thereby become simply dumping grounds for items, which creates challenges for curation, enhancement, discovery, and reuse. The second is that this approach reinforces the power of Google as the intermediary of all things on the internet. If information managers presume that they don’t need to do any curation because all problems will be solved by the power of search, they cede control over the user experience to the search engine and must significantly alter their systems to accommodate its requirements.
For DPLA, this could mean something as radical as a reengineering of its entire service model, which would be no small undertaking—either from the perspective of how DPLA currently operates or with regard to how it engages with its participating members. Would DPLA members hand over to the central repository the original versions of all of the content they host in their own repositories, essentially replicating them? Some might, for some content, but I expect not all would and certainly not for all content. Institutional ownership barriers run long and deep.
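To make the underlying structure concrete, here is a minimal, hypothetical sketch of an aggregator-style record: descriptive metadata and a low-resolution thumbnail for discovery, with a pointer back to the contributing institution’s full-quality object. The field names and URLs are illustrative only, not DPLA’s actual metadata profile.

```python
# Hypothetical sketch of an aggregator record: metadata and a thumbnail for
# discovery, while the full-resolution object remains at the source institution.
# Field names and URLs are illustrative, not DPLA's actual metadata profile.
aggregator_record = {
    "title": "Travel photograph, undated",
    "description": "Snapshot from a personal travel collection",
    "thumbnail_url": "https://aggregator.example.org/thumbs/abc123.jpg",  # low-resolution copy
    "source_object_url": "https://repository.example.edu/items/abc123",   # original, full-quality item
    "contributing_institution": "Example University Libraries",
}

# A crawler that favors original, high-quality content will follow the source
# link and send searchers to the host repository rather than to the aggregator.
print(aggregator_record["source_object_url"])
```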
Finally, in the session on FORCE11’s recently published research data publishing ethics recommendations, the discussion again focused on the role of organizations that host repositories. The recommendations themselves address the roles and responsibilities of actors in the data publishing ecosystem, but several of them hinge not on the scientists creating or hosting the data, but on the repositories that curate it. Fundamentally, if an organization takes on the responsibility for publishing content, whether formally as a publisher or by hosting a repository, it assumes responsibility for stewardship and for the integrity and curation of that data. This involves oversight of the content’s quality at some level, support for access and reuse (such as the FAIR and TRUST principles), and management of retraction—a project NISO has recently launched to support. Connecting this ecosystem and its various elements will be a challenge for the coming years, as I noted in my introduction to I/O this month. This oversight and management certainly come with attendant costs for staffing and curation, but those costs are an outcome of assuming the responsibility. I’ve long said that posting things to the Web isn’t publishing and that maintaining a repository of scholarly materials demands the same degree of attention to quality as found in other scholarly publishing environments.
The conference closed with a plenary session by Rebecca Doerge, Brian Frezza, and Keith Webster on Carnegie Mellon University’s work to launch an automated lab for its campus. Known as a cloud lab, the installation will be the first of its kind at an academic institution. These labs allow for fully automated processing of laboratory experiments, using 211 instruments running 24 hours a day, year-round, and in parallel. This yields a tremendous increase in productivity, along with improvements in replicability and process sharing, since all of these elements are encoded and precisely defined by the system. Industrial applications of this technology have seen sevenfold increases in productivity, which could significantly impact the pace of discovery in laboratory science. This vision of the future of laboratory science comes at no small cost, with CMU investing close to $40 million. In the near term, few institutions will have the resources to invest in these scientific resources, creating a situation where those that have these tools advance at a far faster rate than others, which will consolidate top talent and resources at “elite” institutions. Eventually, these cloud labs could become regional research tools supporting networks of institutions, much like the shared satellites or large-scale telescope arrays that support astronomical science. Because all of these experiments are encoded, it is worth considering whether the future of science lies in automated processes. Might a new “data paper” that is the outcome of a cloud lab experiment simply be the algorithmic code that produced the result? How will publication change if that becomes the new normal?
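As a way to picture what an “encoded experiment” might look like, here is a minimal, hypothetical sketch of a protocol expressed as code. The instrument names, operations, and parameters are invented for illustration and do not reflect any actual cloud lab’s programming interface.

```python
# Hypothetical sketch of an "encoded experiment" -- the kind of machine-readable
# protocol a cloud lab might execute and that a future "data paper" might publish.
# All instrument names, operations, and parameters are illustrative inventions.
from dataclasses import dataclass


@dataclass
class Step:
    instrument: str
    operation: str
    parameters: dict


def pcr_protocol(sample_id: str, cycles: int = 30) -> list[Step]:
    """Return a fully specified, repeatable protocol for a simple PCR run."""
    return [
        Step("liquid-handler", "dispense",
             {"sample": sample_id, "volume_uL": 25, "reagent": "master-mix"}),
        Step("thermocycler", "run",
             {"cycles": cycles, "denature_C": 95, "anneal_C": 55, "extend_C": 72}),
        Step("plate-reader", "measure",
             {"sample": sample_id, "assay": "fluorescence"}),
    ]


# Every step and parameter is explicit, so the protocol can be re-executed or
# shared verbatim -- the source of the replicability gains described above.
for step in pcr_protocol("SAMPLE-0001"):
    print(step.instrument, step.operation, step.parameters)
```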
Truly, the future exists today. As is typical at a CNI meeting, one can get a glimpse of some of the advances that may be widely adopted years into the future. It is also a place where participants can ponder the institutional implications and the ecosystem demands that will require greater collaboration and integration. How can we make all of these nascent ideas work at scale, functioning seamlessly and efficiently? That is where NISO comes into play to support the broader implementation of those ideas. Based on this year’s CNI, we have a lot on our plates.