Introducing the Newest ISO Identifier Standard

June 2024

Letter from the Executive Director, June 2024

The ISO Subcommittee on Identification and Description, for which NISO serves as secretariat, saw the publication of a new standard in May: the International Standard Content Code (ISCC).

Each May, the ISO Technical Committee on Information and Documentation (ISO TC 46) gathers experts from around the world for a plenary meeting. This year, the meeting was held in Berlin at the kind invitation of DIN, the German national standards body. The weeklong meeting covered a wide range of issues, from country codes to paper permanence to distributed ledger technology for trusted records management. Many international projects now underway relate to and affect standards work in the US library and information community, such as transliteration, thesauri development, and EPUB file preservation. We will provide a summary of the meeting outcomes during our next monthly open teleconference on Monday, June 17.

While NISO represents US interests to all of the subcommittees of ISO TC 46, the group we are most connected to is the Subcommittee on Identification and Description (ISO TC 46/SC 9). This is because NISO serves as the secretariat for the group; I serve as chair of the subcommittee; and Keondra Bailey, NISO’s assistant standards program manager, serves as committee manager. In these roles, we provide support for the various working groups and projects underway in the subcommittee. This includes managing the process of launching projects, balloting, comment oversight, procedural compliance, and interfacing with the ISO Central Secretariat on various issues. 

Last month, TC 46/SC 9 was pleased to celebrate the publication of the newest ISO identifier system, ISO 24138, the International Standard Content Code (ISCC). The ISCC is an identification system for digital assets, including digital representations of text, images, audio, video, or other content across all media sectors. An ISCC is derived algorithmically from the digital file being identified. It is composed of several component strings that denote its type, its metadata, and its content. The resulting string is therefore bound to the content and can support data integrity verification and other file-specific use cases. Unlike most other ISO identifiers, an ISCC can be generated by anyone for any file, so it is not centrally managed, nor does the standard specify a system for managing authoritative metadata about the referent. As a result, unrelated parties can independently derive the same ISCC from the same file.
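To illustrate what it means for an identifier to be bound to content and independently derivable, here is a minimal Python sketch. It assumes SHA-256 as a stand-in for the ISCC's actual algorithms, which are defined in ISO 24138 and not reproduced here; the function names `content_id` and `verify` and the `DEMO` prefix are hypothetical.

```python
import hashlib


def content_id(data: bytes, prefix: str = "DEMO") -> str:
    """Derive a hypothetical content-bound identifier from raw file bytes.

    SHA-256 stands in for the ISCC's real algorithms, and the "DEMO"
    prefix is an invented type marker, not part of ISO 24138.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}:{digest[:32]}"


def verify(data: bytes, claimed_id: str) -> bool:
    """Data integrity check: recompute the identifier and compare."""
    return content_id(data) == claimed_id


original = b"Chapter 1. It was a dark and stormy night."

# Two unrelated parties hashing the same bytes get the same identifier,
# with no registry involved.
alice_id = content_id(original)
bob_id = content_id(bytes(original))

print(alice_id == bob_id)                  # True
print(verify(original, alice_id))          # True
print(verify(original + b" ", alice_id))   # False: any change breaks the binding
```

Because nothing but the file's bytes goes into the calculation, no central authority is needed to issue or check the identifier; the trade-off is that any change to the file, however small, yields a different string.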

As more content is distributed in digital form, a native identification structure that can uniquely and independently confirm the identity of a referent is a powerful tool. In 2022, the W3C published a Recommendation describing Decentralized Identifier (DID) systems. The ISCC, which was still in development at the time, is included among the 103 experimental DID method specifications that existed when the Recommendation was published. DID systems are designed to enable individuals and organizations to generate their own identifiers using systems they trust. Entities can then prove control over these identifiers by authenticating with cryptographic proofs such as digital signatures.

With the development of generative AI and image manipulation tools, the ability to compare files and track their provenance is becoming vital to information exchange and to trust in the digital information we consume. The rapidly growing Coalition for Content Provenance and Authenticity (C2PA) project seeks to embed provenance information in file metadata to help users understand how a file has been manipulated and where its content came from. This embedded metadata model aims to use cryptographic signatures to connect a content object back to its source, such as a camera, a recording device, or a computer. The model can also capture changes made over time, so that an end user can double-check the provenance and authenticity of the content.
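The idea of linking content back to its source and recording changes over time can be sketched as a hash-linked log. This is a toy model, not the C2PA manifest format: each hypothetical entry stores a hash of the content and a link to the previous entry, and an HMAC with a shared, invented key stands in for the public-key signatures that real provenance systems rely on (Python's standard library has no public-key signing). The names `add_entry`, `verify_log`, and `DEVICE_KEY` are illustrative only.

```python
import hashlib
import hmac
import json

# Hypothetical shared key standing in for a device's private signing key.
DEVICE_KEY = b"hypothetical-camera-key"


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _tag(body: dict, key: bytes) -> str:
    # Canonical JSON so the same entry always produces the same bytes.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def add_entry(log: list, action: str, content: bytes, key: bytes = DEVICE_KEY) -> None:
    """Append an entry bound to the current content and to the previous entry."""
    body = {
        "action": action,
        "content_hash": _digest(content),
        "prev": log[-1]["tag"] if log else None,
    }
    log.append({**body, "tag": _tag(body, key)})


def verify_log(log: list, content: bytes, key: bytes = DEVICE_KEY) -> bool:
    """Check every tag and back-link, and that the newest entry matches the content."""
    prev = None
    for entry in log:
        body = {k: entry[k] for k in ("action", "content_hash", "prev")}
        if entry["prev"] != prev or not hmac.compare_digest(entry["tag"], _tag(body, key)):
            return False
        prev = entry["tag"]
    return bool(log) and log[-1]["content_hash"] == _digest(content)


photo = b"raw sensor data from a hypothetical camera"
cropped = photo[:15]

log: list = []
add_entry(log, "captured", photo)
add_entry(log, "cropped", cropped)

ok_before = verify_log(log, cropped)      # True: intact history, matching content
wrong_content = verify_log(log, photo)    # False: newest entry describes the crop
log[0]["action"] = "generated"            # tamper with the recorded history
ok_after = verify_log(log, cropped)       # False: the first tag no longer matches
print(ok_before, wrong_content, ok_after)
```

In this sketch, verification fails if any entry is altered, if the chain is reordered, or if the content in hand does not match the latest recorded state.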

The ISCC can also support content similarity comparisons, allowing connections to be made between semantically similar content, an association called a soft binding. Recently, the C2PA added the ISCC to its list of authoritative soft binding algorithms. Identifying works and grouping semantically similar content has long been a challenge, because most hashing protocols generate completely different outputs if two digital files differ in even the most minor way. For example, editing one page of an e-book would produce two slightly different EPUB files, and under a traditional hashing protocol they would not appear to be related; looking only at the hash outputs, one could not tell how similar or different the files were. Although its use is optional, the ISCC can capture and statistically describe the similarity of two files, providing a similarity check based on the output of the ISCC assignment algorithm.
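The contrast between a traditional hash and a similarity-preserving one can be sketched in Python. This example uses SimHash over character 4-grams, a well-known similarity-preserving technique swapped in purely for illustration; it is not the ISCC's algorithm, and the sample texts and function names are hypothetical. The number of differing bits (the Hamming distance) between two fingerprints serves as a rough similarity measure.

```python
import hashlib
from collections import Counter


def _feature_hash(feature: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str, n: int = 4, bits: int = 64) -> int:
    """SimHash fingerprint over character n-grams (illustrative, not ISCC)."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    grams = Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))
    totals = [0] * bits
    for gram, weight in grams.items():
        h = _feature_hash(gram)
        for b in range(bits):
            # Each n-gram votes +weight or -weight on every bit position.
            totals[b] += weight if (h >> b) & 1 else -weight
    # A bit is set in the fingerprint where the vote total is positive.
    return sum(1 << b for b in range(bits) if totals[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def sha256_int(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


original = ("It was the best of times, it was the worst of times, it was the age of "
            "wisdom, it was the age of foolishness, it was the epoch of belief, it was "
            "the epoch of incredulity.")
edited = original.replace("foolishness", "folly")  # a one-word "page edit"
unrelated = ("Persistent identifiers let libraries, publishers, and archives refer to "
             "the same object over decades, even as platforms and file formats change.")

d_edit = hamming(simhash(original), simhash(edited))
d_unrelated = hamming(simhash(original), simhash(unrelated))
sha_diff = hamming(sha256_int(original), sha256_int(edited))

print(f"SimHash bits differing, edited vs original:    {d_edit} of 64")
print(f"SimHash bits differing, unrelated vs original: {d_unrelated} of 64")
print(f"SHA-256 bits differing, edited vs original:    {sha_diff} of 256")
```

With inputs like these, the one-word edit should leave the SimHash fingerprints only a few bits apart, while the SHA-256 digests differ in roughly half of their 256 bits, which is why a conventional hash alone cannot say how closely two files are related.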

Distributed identifier systems are not likely to replace existing centralized registry systems, such as the ISBN, ISSN, ORCID, or DOI systems, for several reasons. First, not all objects, such as a serial publication or a person, can easily be represented in a digital form that can be hashed. Second, centralization and curation provide value, such as credentialing and assurance of validity. The value of the DOI system lies not in the assignment of the identifier, nor even in the resolution capabilities of the Handle System, but in the curation of a centralized registry that binds the identifier to metadata about the object. That curation is what keeps links persistent. Finally, many DID proponents advocate for the independent, distributed control of a self-sovereign identity system, claiming that the best entity (be it a person, an organization, or an object) to assert the characteristics of a thing is that entity itself. However, much of what is asserted, particularly when it comes to credentials, is assigned by third parties. One might claim to have attended a prestigious institution or to hold a license to do something, but those credentials are conferred by the authorities that grant them.

While some are trying to create overlay systems that attach DIDs cryptographically tied to credentialing systems, implementation and adoption have been slow, though the momentum behind these systems could accelerate. DID systems may also find innovative applications in the search and discovery of objects, where registries are created and maintained not centrally but through crawling and capture techniques, like those used to build the indexes of services such as Google.

Many advances are happening in the landscape of identification and identifier systems. Some of these issues will take center stage at the PIDFest conference later this month in Prague. I will be discussing another persistent identifier project underway at ISO, on unique media identification, as well as participating in the discussions about national PID strategies. It should be a great meeting, and while the in-person program is sold out, you can still register and participate virtually.

We look forward to seeing you at one of these meetings, at NISO’s upcoming annual meeting, or at the ALA conference, where we will host a number of sessions. As always, there’s so much afoot in June!

Sincerely,

Todd A. Carpenter
Executive Director