Note: Part One of this feature article was published in January 2020 and may be found here.
As we unofficially enter the next decade, we can reflect on what the teens were like and consider what information distribution will be like in the 20s. Often, we don’t notice significant change that happens over a long period of time, because the changes are incremental as we live through them.
Last month, I put forward five of ten technology trends that began transforming our landscape over the past decade and will continue to impact the information creation, distribution and management ecosystem well into the next. The first set was primarily, though not exclusively, focused on technical tools and applications. This set, again not exclusively, focuses on the sociological aspects of the technology environment we inhabit. When discussing standards, I often say that the technology is not the primary thing we spend most of our time arguing about. The most challenging issues are social. It is in this list that we see the areas related to our ecosystem that will cause the most heartache and provide the greatest opportunity to improve our community in the decade ahead.
Two of the trends I mentioned last month were security and metadata. Both of these are related to privacy, which people began to grasp as a critical feature of the technological ecosystem in the past decade. How we grapple with this in the next is among our biggest socio-technical challenges.
In the early 2010s, with the revelations of Edward Snowden (see Security last month), it came to light that the government had long been gathering information about people’s online and telephonic behavior, en masse, for everyone. It was defended at the time as being “just metadata”. The problem is that metadata are the keys to understanding the universe, as every librarian will tell you. As just one simple example, your browser metadata can uniquely identify you about 99% of the time. And over the past decade, the ability to gather, store and analyze those data has come to provide organizations with a vivid and accurate picture of who you are, what your interests are, and potentially what your intentions are. Combine these trends with the increase in internet-connected cameras and facial recognition, and the opportunities to surveil the population have never been more ubiquitous or more troubling. People are only now beginning to recognize the implications of these data being gathered, analyzed and monitored.
Anyone who gave it much thought understood that the advertising model was supporting much of how the internet functioned. Far fewer understood the power and ubiquity of the tracking of people’s online behavior. As more and more users bought into the mobile device ecosystem (see last month’s Mobile trend), the digital tracking of our behavior ceased to be just virtual; it now also impacts our experience IRL. Just this week, the FCC initiated enforcement action against the major telecommunications companies over the breach of law that cell providers were engaged in by selling real-time geolocation data. This is likely to be the first of many revelations of how traditional notions of privacy have been breached and left to rot on the side of the broken dikes of traditional norms that we thought protected us.
In recognition of these vast troves of information being gathered, in the early 2010s many tech CEOs began invoking the phrase “Data is the New Oil”. A few years later, there began a steady stream of traditional media stories about it and its implications. Scholarly communications always had its foundations in data. These data were normally not shared or made public; authors often relied on summary tables in articles to convey their most salient elements. The past decade saw the rapid arrival of research data as a first-class scientific publication object. A movement that began in the early 2010s to advance data citation grew into a larger movement to support data publication and referencing, and eventually into the FAIR initiative. The Research Data Alliance began in 2013 to consolidate discussions around the various aspects of data. From their first meeting of only about 100 people in Gothenburg, Sweden, RDA has grown to number nearly 10,000 people worldwide. DataCite, a service for providing DOIs to research data sets that launched in December of 2009, has currently registered more than 2.2 million DOIs for data objects. In the past decade, data certainly has become a first-class object, as those in Amsterdam first envisioned.
Moving into the next decade, there is a lot of work to do to more fully integrate data in the traditional publication process. Many challenges exist related to sharing and working with data at scale. Scholars need to develop confidence in engaging with it to advance their own research goals. Understanding what data is appropriate, discoverable and reusable remains a challenge. Beyond this, many issues need to be resolved, from how to share sensitive data, to rights and control, as well as peer review and quality assessment. Last week, in her closing keynote at the NISO Plus Conference, danah boyd of Data & Society commented that we need to do a better job “interrogating the data” before we rely on it for processing and informing our decisions. Now that data has arrived, understanding how to apply it is critical.
Many of the people driving the early advancement of network computing and personal computing were radicals and revolutionaries, if not in the traditional sense of either word. There was a strong thread of independence and counter-culture among many of the leading voices in information technology, who thought that technology had the capacity to overturn existing cultural structures. One of these is the model of subscriptions and controlled access to information. Stewart Brand’s iconic quote, “Information wants to be free”, is the summation of some of these ideas. While Open Access began many years before 2010, in the past decade the model gained significant traction. Gone are the days when conferences would include pro-versus-con arguments about the value of open access and its potential positive or negative impacts. Some of the formerly strongest opponents are now among the world’s largest open access publishers. Nearly every publisher of significant scale provides an open-access publication option. Government and philanthropic funders advanced policies to drive researchers to publish the results of funded research as open access articles and share other research outputs freely.
The drive to advance open access has led to the creation of transformative agreements and even more aggressive OA policies. As we look forward, we will begin to face the myriad implications of this shift in the underlying business model. Publishers have begun to shift their services to include a broader range of researcher workflow solutions. How will societies and publishers who are outside of the well-funded world of STEM be impacted, and how will they position themselves, when they are forced to either defend their subscription model or survive with fewer resources? What will be the impact on library budgets and the role of the library as an increasing amount of the content they traditionally subscribed to is freely available? How will discovery services function in an environment where the boundaries between traditional journals and “everything else on the web” begin to fade? How will users easily distinguish between vetted, quality content and potentially biased, flawed, or un-vetted research?
Openness became not just the domain of publications and articles. The decade saw rapid growth of open access to data, to ever more open sources, even open ledgers, which traditionally had been kept hidden either for reasons of scale, of profitability, or of security. Because of many of the technological advances in storage, processing, and interconnectivity discussed last month, many more have the technical wherewithal to share these objects freely on the web.
The next decade will be the time when the implications of this movement to openness come to light. We will have to adapt our infrastructure to support more openness in a variety of ways. It will also have broader implications that we will all need to address in ways we don’t yet comprehend.
In the formative years of the internet, there was a push to keep regulation out of online interactions. Online sales and services were exempted from sales taxes, platform providers were protected from the implications of the behavior of their users, and the FCC stood behind a set of principles that supported access to the basic infrastructures of the net. As the many issues around access, around security, around privacy have exposed themselves, demand for increased regulation has grown. Content creators tried to deal with the growth of online piracy through the Stop Online Piracy Act (SOPA) and its Senate counterpart the PROTECT IP Act (PIPA), which would have created a "blacklist" of censored websites. This met with strong opposition from the community of online users, supported in large part by platform providers, which culminated in the Internet Blackout on January 18, 2012, the first wide-scale internet political movement. People’s concerns about privacy (as described above) in the European Union led to the eventual adoption of the General Data Protection Regulation (GDPR). In the United States, California led in the adoption of similar types of privacy protections, and others are likely to follow. Efforts to meet the needs of those with visual or reading impairments made a significant advance in the past decade with the adoption of the Marrakesh Treaty in June of 2013. Control of data and the flows of information around the internet are also squarely in the eyes of regulators and legislators. The more that hacking, data breaches, and privacy abuses remain in the headlines, the more pressure there will be to address these issues via regulation.
Looking forward, one can easily see a continuation of the legal and regulatory battles around Net Neutrality, around the concept of contractual preemption of rights enshrined by copyright, around what it means to have machines create content, and around the many issues surrounding reuse and republication in a digital age.
What had been a relatively freewheeling community is increasingly becoming a place of large and competing interests. These will be battled out via regulation, in the courts, or in legislation; most likely in all three over the coming decade.
As our world has become more digital, the way in which we analyze the information available also needs to adapt and change. In 2011, NISO participated in the first altmetrics meeting, which subsequently led to a multi-year initiative to advance new forms of assessment.
Traditional bibliometrics, grounded in reference linking and citation counting in scientific articles, aggregated at a publication level, drove a tremendous amount of thinking about publication and research assessment for many years. These were always imperfect measures.
The altmetrics movement of the 2010s was, and still is, important, not so much for the focus on social media metrics, which gained it a lot of attention, but for the value of questioning why these data are important and what we are trying to measure through the statistics we capture. Assessing the average level of a journal’s citations is important when assessing the quality of a publication generally. If, however, you are considering a particular author, a particular paper, or a particular research project, there are many ways to assess that at different levels of granularity and in different contexts.
As institutions, particularly libraries, grow into more digital-first organizations, we need to rethink some of the traditional metrics by which we assess success. For example, a traditional metric of a library was how many items were in its collection. I recall a funny episode about a decade ago when the Directors of four prestigious institutions were bragging about the respective size of their collections, each successively trying to outdo the others. But in a world of open access, Wikipedia and the world wide web, what does it mean to ‘have a collection’? Even more, institutions have begun ‘sharing their collections’ in novel ways, jointly collecting materials and arranging for seamless interlibrary loan among the collections, such that searching one catalog means searching them all. What do you measure in that context?
Similarly, institutions would measure their community by how many patrons they served. Realistically, isn’t it more important to understand the community that the institution is not serving? Understanding who your patrons are is wonderful, but understanding who is not served is also critical. An institution might be providing excellent service to one group, while another might be marginalized and not engaged.
We increasingly live in a world that has a tremendous amount of data about user behavior, about content interactions, about the network of how things interrelate, and about how knowledge graphs develop. And yet, most organizations lack people with the data skills to analyze these data in meaningful ways beyond the most rudimentary statistics of counting and averaging. Ideally, in the next decade we will move beyond using data to answer the easy questions and toward using metrics and data to answer more challenging queries.
Economics (a bonus)
Behind all of this innovation are the fundamentals of our economy. The past decade has been one of prosperity that began from the wreckage of the Great Recession of 2008-2009. The success of the economy propelled a lot of innovation and investment. This isn’t the place, and I am not the right person, to prognosticate about where the economy will head, but there are obvious realities we shouldn’t forget. First, things do not go up forever. History has shown that economies don’t grow forever without faltering. Whether the next recession is deep or shallow, there is no doubting it will come. When it does come, will our investments prove useful and successful? Did we take the time when times were good to make improvements to our infrastructure that will last? Was enough attention focused on the right problems, on the appropriate solutions?
The second reality to focus on is demographics. Our world is constantly changing, as is the community that participates in it. As the world economy has grown, have we collectively done enough to encourage worldwide engagement in information creation, dissemination and organization? Have we reached out, when times were good, to those who could use support in their attempts to engage in our marketplace of ideas, or provided a welcoming place for them to engage? Beyond this, there are population trends in the USA that will impact the scholarly and educational communities. A demographic ‘baby bust’ resulting from fertility rates dropping to historic lows resulted in some 4 million fewer children born between 2008 and 2016. This will have significant implications for institutions that rely on growing numbers of students and patrons, as well as for the larger society. While a lot of attention has been garnered by the Millennials and the Gen Zs that followed, the subsequent demographic dip that coincided with the 2010s will reverberate for much of the next decade. Institutions, both academic institutions and libraries, will be squeezed by the shortage of students, unless there is a greater openness to students from outside the USA. Whether there will be a political environment receptive to allowing this is, sadly, an open question.