The Role of Metadata and Persistent Identifiers (PIDs) in Open Science
In today’s evolving research ecosystem, metadata and persistent identifiers (PIDs) are the backbone of discoverability, transparency, and reproducibility in science. Whether it’s ensuring the proper attribution of research outputs or complying with new data sharing policies, metadata and PIDs are foundational to open science practices.
At the Center for Open Science (COS), metadata and PIDs are crucial to our mission of fostering openness, integrity, and reproducibility in research. COS is a non-profit organization focused on creating systems that promote culture change in research through three mutually reinforcing activities: policy advocacy and awareness, community action and culture change, and infrastructure and software tools. These activities work together to drive greater adoption of open and reproducible practices across the research community.
Through our flagship platform, the Open Science Framework (OSF), we offer tools and services that allow researchers to manage and share their work openly and transparently, across the entire research lifecycle. By integrating PIDs into OSF and enhancing our metadata practices, we are helping researchers, institutions, and funders meet new demands for data sharing and accountability while fostering a more connected research community.
Supporting Compliance and Discoverability with Expanded Metadata
Metadata is often overlooked in the research process, yet it is fundamental to making research outputs usable and discoverable. Without well-structured metadata—covering essential elements like titles, descriptions, keywords, and contributor roles—research can become difficult to find and reuse. To address the increasing complexity of data-sharing requirements and to support greater transparency, OSF has expanded its metadata fields to better align with emerging standards and funder mandates.
For example, the updated NIH Data Management and Sharing Policy, which took effect in January 2023, requires researchers to include specific metadata, such as funder information and resource types, with their data. In response, we added new fields in OSF for funder names, award titles, and resource types so that researchers can meet these requirements. These enhancements not only help researchers comply with policy but also make their work more visible and accessible to collaborators, funders, and the broader research community. Our recent metadata enhancements also bring OSF into alignment with the desirable characteristics for data repositories outlined by the NIH, the White House Office of Science and Technology Policy (OSTP), and the National Science and Technology Council (NSTC).
Beyond compliance, metadata plays a critical role in supporting the FAIR principles (Findable, Accessible, Interoperable, Reusable). By enhancing metadata fields to include information like the language of research materials, subject areas, and the type of research outputs, we make it easier for others to find, access, and build upon the work being shared on OSF. This has implications for the entire research lifecycle, from early planning and collaboration to dissemination and long-term preservation.
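To make the idea of well-structured metadata concrete, here is a minimal sketch of a completeness check on a single record. The field names and the split between required and recommended fields are assumptions made for illustration; they are not OSF's actual schema or API.

```python
# Hypothetical field names for illustration only; not OSF's actual schema.
REQUIRED_FIELDS = [
    "title", "description", "keywords", "contributors",
    "funder_name", "award_title", "resource_type",
]
RECOMMENDED_FIELDS = ["language", "subjects"]


def missing_fields(record, fields):
    """Return the names of fields that are absent or empty in the record."""
    return [name for name in fields if not record.get(name)]


# A made-up record with one blank field and one field left out entirely.
record = {
    "title": "Example survey dataset",
    "description": "Responses from a made-up survey.",
    "keywords": ["survey", "example"],
    "contributors": [{"name": "A. Researcher", "role": "Data curation"}],
    "funder_name": "Example Funder",
    "award_title": "",  # left blank on purpose
    "resource_type": "Dataset",
    "language": "en",
}

print(missing_fields(record, REQUIRED_FIELDS))     # ['award_title']
print(missing_fields(record, RECOMMENDED_FIELDS))  # ['subjects']
```

A check like this could run before sharing, so that gaps in funder or resource-type information are caught while they are still easy to fix.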
Leveraging OSF's Full Capabilities for the Research Lifecycle
OSF is designed to support the entire research lifecycle, offering tools for researchers to share their work at every stage, from initial project creation to final publication. It is organized around three core types of objects:
- Registrations: These are time-stamped, publicly accessible records of a research plan, allowing researchers to pre-register their hypotheses and methods, fostering transparency. Registrations can also be used to archive research project outcomes in a persistent, citable container with many additional PID relationships that can be expressed in registration metadata.
- Project Spaces: These living, collaborative workspaces allow research teams to share files, data, and other materials in an environment that maintains a clear audit trail of contributions and interactions. Projects can also be archived using the Registrations workflow.
- Preprints: OSF offers preprint services that allow researchers to share articles openly before or in place of traditional journal publication.
Each of these objects is enriched with metadata, which connects it to the larger research ecosystem. The OSF Metadata Application Profile (OSF MAP) outlines all of the standard metadata fields available for OSF objects and how they can be used. Because OSF can link all of these parts of the research lifecycle together, the result is a powerful, interconnected archive of a research project.
PIDs: Ensuring Persistent Connections Across Research
While metadata provides the necessary context for understanding and using research outputs, PIDs serve as unique identifiers that ensure the long-term tracking and connection of research outputs, researchers, institutions, and funders.
At OSF, we integrate several key PIDs:
- ORCID iDs for researchers, linking their work consistently.
- DOIs for research outputs, enabling persistent citation and discovery.
- ROR IDs for institutions, maintaining accurate affiliation records.
- Crossref Funder IDs for funders, connecting research to its funding sources for tracking impact.
By assigning these PIDs to research objects and contributors on OSF, we are able to create robust information graphs that depict relationships across datasets, publications, researchers, institutions, and funding bodies. For instance, a dataset uploaded to OSF can be assigned a DOI, linked to a researcher’s ORCID iD, associated with their institution’s ROR ID, and connected to the funder’s Crossref Funder ID. These relationships allow for the seamless tracking, discovery, and reuse of research, both within OSF and across external systems.
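The dataset example above can be sketched as a handful of graph edges. This is an illustrative model only: the `Edge` class, the relation labels, and the record layout are hypothetical, and every identifier except the ORCID iD (which is ORCID's fictitious sample iD) is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One directed relationship between two identifiers."""
    source: str
    relation: str  # hypothetical relation label
    target: str


def edges_for(record):
    """Turn the PIDs attached to one record into a list of graph edges."""
    edges = []
    doi = record["doi"]
    for creator in record.get("creators", []):
        edges.append(Edge(doi, "created_by", creator["orcid"]))
        if creator.get("ror"):
            edges.append(Edge(creator["orcid"], "affiliated_with", creator["ror"]))
    for funder_id in record.get("funder_ids", []):
        edges.append(Edge(doi, "funded_by", funder_id))
    return edges


# Placeholder identifiers, not real registrations.
dataset = {
    "doi": "10.1234/example-dataset",
    "creators": [
        {"orcid": "0000-0002-1825-0097", "ror": "https://ror.org/example"},
    ],
    "funder_ids": ["10.13039/example-funder"],
}

for edge in edges_for(dataset):
    print(f"{edge.source} --{edge.relation}--> {edge.target}")
```

Because each edge is keyed by identifiers rather than names, the same researcher or funder appearing in another record resolves to the same node in the graph.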
The power of these PID-driven graphs is that they enable us to infer and expand relationships between research entities. By aggregating records that share the same identifiers across multiple repositories, we can generate a more comprehensive picture of the research landscape. This interconnected data enhances discoverability, facilitates collaboration, and supports more complex queries about research impact and influence. For example, we are developing resources that guide researchers through connecting critical PID workflows, such as syncing their ORCID record with DataCite and Crossref, to make attribution of their work faster and easier.
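As a rough sketch of this kind of aggregation, the snippet below groups records from several made-up repositories by a shared ORCID iD. The record layout, repository names, and identifiers are assumptions for illustration and do not reflect how OSF search or indexing is implemented.

```python
from collections import defaultdict

# Made-up records harvested from three hypothetical repositories.
records = [
    {"source": "repo_a", "doi": "10.1234/DATASET-1", "orcid": "0000-0002-1825-0097"},
    {"source": "repo_b", "doi": "10.1234/article-9", "orcid": "0000-0002-1825-0097"},
    {"source": "repo_b", "doi": "10.1234/dataset-1", "orcid": "0000-0002-1825-0097"},
    {"source": "repo_c", "doi": "10.1234/preprint-4", "orcid": "0000-0000-0000-0000"},
]


def outputs_by_researcher(records):
    """Group DOIs by ORCID iD, merging duplicates seen in several sources."""
    grouped = defaultdict(set)
    for rec in records:
        # DOIs are case-insensitive, so normalize before comparing.
        grouped[rec["orcid"]].add(rec["doi"].lower())
    return {orcid: sorted(dois) for orcid, dois in grouped.items()}


for orcid, dois in outputs_by_researcher(records).items():
    print(orcid, dois)
```

Here the dataset reported by two repositories collapses into one entry, and the article and dataset become related through their shared researcher, which is the kind of inferred relationship the paragraph above describes.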
Our partnerships with DataCite and Crossref ensure that PIDs assigned through OSF are integrated into global systems for citation and discovery. By adhering to community standards like the DataCite Metadata Schema, OSF enables research outputs to be indexed in services such as Google Scholar, Web of Science, and DataCite Commons, further expanding their reach and impact.
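To illustrate what adhering to a shared schema can look like, here is a hedged sketch that maps a simple internal record to a DataCite-style structure. The input layout and the `to_datacite_style` function are hypothetical, and the output property names follow our reading of the DataCite JSON representation; check them against the published schema before relying on them. This is not OSF's actual export code.

```python
def to_datacite_style(record):
    """Map a simple internal record to a DataCite-style dictionary (sketch)."""
    return {
        "doi": record["doi"],
        "titles": [{"title": record["title"]}],
        "creators": [
            {
                "name": creator["name"],
                "nameIdentifiers": [
                    {
                        "nameIdentifier": creator["orcid"],
                        "nameIdentifierScheme": "ORCID",
                    }
                ],
            }
            for creator in record["creators"]
        ],
        "publisher": record["publisher"],
        "publicationYear": record["year"],
        "types": {"resourceTypeGeneral": record["resource_type"]},
        "fundingReferences": [
            {"funderName": funder["name"], "awardTitle": funder["award_title"]}
            for funder in record.get("funders", [])
        ],
    }


# Placeholder values throughout; the DOI and funder are not real.
example = {
    "doi": "10.1234/example-dataset",
    "title": "Example survey dataset",
    "creators": [{"name": "Researcher, A.", "orcid": "0000-0002-1825-0097"}],
    "publisher": "Example Repository",
    "year": 2024,
    "resource_type": "Dataset",
    "funders": [{"name": "Example Funder", "award_title": "Example Award"}],
}

datacite_record = to_datacite_style(example)
print(datacite_record["types"])
```

Keeping the mapping in one function makes it easier to adjust when the schema changes, since indexing services consume the shared structure rather than any repository's internal layout.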
Through our ongoing reliance on PIDs and the constant evolution of our search and indexing tools, OSF continues to build an expansive, multi-disciplinary, and multi-institutional network of research objects, creating new opportunities for insight and discovery.
The Future of Metadata and PIDs on OSF
We participate in national strategy efforts and working groups to advance the adoption and support of PIDs across the research community. We also take an active part in international conversations about PID development through conferences and other events. For OSF users, we develop help guides, documentation, and other forms of support so they can make the most of PIDs on the platform. Looking to the future, we continue to monitor and evaluate emerging PID standards and communities for inclusion in OSF.
Through the continued adoption of metadata standards and the integration of PIDs, we envision a research landscape where data is more discoverable, more connected, and ultimately, more impactful.
We are grateful to the Center for Open Science for their generous sponsorship of NISO Plus Global Online 2024.