Working with Scholarly APIs: A NISO Training Series

Course Objective

To provide consistency of training and a baseline of knowledge across the information community for appropriate use of APIs using the HTTP REST paradigm for scholarly resources across multiple information services and systems.  

Web APIs provide interfaces to enable developers, technologists and researchers to interact with third-party applications through a set of common protocols and standards. By doing so, they enable complex functionality to be developed and information to be exchanged with relative ease and reliability. In this course, we will cover the role of APIs in workflow integrations for publishing and application development, as well as how analysts, researchers, and business intelligence professionals use APIs to aggregate and synthesize data for bibliometrics, topic modeling, data visualization and trend identification.

Course Moderator: Phill Jones

Dr. Phill Jones is a technologist, entrepreneur, product leader, strategic analyst and consultant. His current role is co-founder for digital and technology at the MoreBrains Cooperative, a consultancy working at the forefront of scholarly infrastructure, information management, and research dissemination. He is a Scholarly Kitchen Chef, a member of the Learned Publishing Editorial Board, a member of the Researcher to Reader advisory board, and a Judge of the Karger Vesalius Innovation Awards.

Previously, Phill was the CTO at Emerald Publishing. Before that, he spent 6 years at Digital Science in a variety of roles including VP of Business Development at ReadCube, Director of Publishing Innovation and also as a Bibliometric Consultant. He was also an early employee and the first editorial director at JoVE.

In a former life, Phill had a successful cross-disciplinary research career at Imperial College London, where he earned a PhD in Physics, and at Harvard Medical School, where he was a research faculty member working in stroke, Alzheimer's disease and molecular optical imaging.

Course Duration and Dates

Thursday, April 28, 2022 – Thursday, June 16, 2022. The series consists of eight (8) segments, one per week and each lasting approximately 60-90 minutes. Each segment is intended to cover a Thursday lunch period (11:00am - 12:30pm, Eastern Daylight Time, US & Canada).

Guest lecturers will be featured in specific segments, as the course moderator deems appropriate. 

Each session will be recorded, and a link to the archived recording will be sent to course registrants within 2 business days of the close of that session.

Basic Student Requisites

Those who plan to register for this training should be able to understand and execute the following:

  • Required:
    • Computer with a stable internet connection
    • Installation of and familiarity with the free API client software Postman (https://www.postman.com/)
    • Familiarity with scholarly publishing metadata and infrastructure (titles, authors, institutions, DOIs, ORCID, etc.)
    • Awareness of web technology and terminology (HTTPS, POST, GET, headers, status codes such as 404, etc.)
    • Familiarity with JSON and XML data structures
  • Desirable but not essential:
    • Familiarity with the cURL command-line utility
    • Text editor software, e.g. Notepad++ (Windows), Atom (cross-platform), or Visual Studio Code (cross-platform)
    • Familiarity with a scripting language, e.g. Python or R

Who Can Benefit from This Online Training:

This training series has been arranged to meet the needs of:

  • Early career content professionals working in editorial/production environments of small to mid-size scholarly societies or similar publishing entities.
  • Early or mid-career programmers and developers working in academic institutions and libraries who want to make use of a variety of APIs provided by organizations in the scholarly communications ecosystem.
  • Mid-career managers or supervisors whose roles require them to be familiar with multiple information systems and platforms and the relevant APIs that support transfer of information between those systems.

Event Sessions

Foundational Specifics - Thursday, April 28, 2022

Speaker

Phill Jones

Co-Founder and Lead, Digital and Technology
MoreBrains Cooperative

The first session on the course will serve to establish a common vocabulary and baseline understanding. It is likely that most attendees will already be familiar with many of the concepts discussed, but everybody will get the opportunity to both broaden their view of the role of APIs and also identify the areas most relevant to their daily work and career progression.

In this introductory session, attendees will…

  • hear a brief overview of the function and role of APIs on the web
  • learn what it means for an API to be RESTful
  • learn how APIs apply to scholarly publishing, infrastructure and analytics
  • install and configure the Postman REST client
  • test out the Postman client with a couple of simple queries
  • discuss which APIs they, or their organisations, use or may want to use in the future for various use cases
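For readers who prefer a script to the Postman client, a simple query of the kind tried in this session might look like the following minimal sketch. It assumes the public Crossref REST API endpoint (listed in the resources below) and uses only the Python standard library; the helper names build_url and get_json are illustrative, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for works (assumed here as the example target).
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_url(base, params):
    """Return the resource URL with a URL-encoded query string appended."""
    return base + "?" + urllib.parse.urlencode(params)


def get_json(url, timeout=30):
    """Send a GET request and return (HTTP status code, decoded JSON body).

    Needs network access; it is defined here but not called.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.load(response)


# A GET request is fully described by its URL: a resource plus query parameters.
url = build_url(CROSSREF_WORKS, {"query": "peer review", "rows": 2})
print(url)
```

Calling get_json(url) would send the request; a status of 200 indicates success, while codes such as 404 indicate that the resource was not found.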

Resources shared by course moderator Phill Jones and attendees:

Crossref API

Crossref Documentation

The "First" World Wide Web Page

ORCID - Thursday, May 5, 2022

ORCID, which stands for Open Researcher and Contributor ID, provides a unique, persistent digital identifier free of charge to researchers, so that they can be uniquely identified and connected to their contributions & affiliations. Additionally, ORCID provides researchers with an ORCID record, which is a store of connections between identifiers. Lastly, ORCID provides a set of APIs that enable transparent and trustworthy connections between those researchers, their contributions, and their affiliations. ORCID APIs can be integrated into applications in order to help users find information and to help simplify reporting and analysis among many other use cases. This workshop session provides an introduction to ORCID's APIs, including:

  • ORCID API types & features
  • Obtaining access to ORCID APIs
  • Searching & retrieving publicly available data
  • User permissions - introduction to OAuth 2.0
  • Adding & updating data
  • Support resources 
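As a rough illustration of retrieving publicly available data, the sketch below validates an ORCID iD and prepares a request for its public record. The check character follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use. The base URL and the /record path are assumptions about the public API that should be confirmed against the ORCID documentation, which also describes when an access token is expected. The function names are illustrative only.

```python
import re

# Assumed base URL of the ORCID public API; confirm against the ORCID documentation.
ORCID_PUBLIC_API = "https://pub.orcid.org/v3.0"


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-0000 layout and the final check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def record_request(orcid):
    """Return (url, headers) for reading a public record as JSON."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid}")
    return f"{ORCID_PUBLIC_API}/{orcid}/record", {"Accept": "application/json"}


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
url, headers = record_request("0000-0002-1825-0097")
print(url, headers)
```

Validating the iD locally avoids sending requests that can only fail. Adding or updating data is a separate matter that requires user permission through OAuth 2.0.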

Shared Resources, Documentation, and More!

ORCID Integration and API FAQ

How do I find ORCID record holders at my institution? by Paula Demain

W3Schools: HTML URL Encoding Reference

Crossref - Thursday, May 12, 2022

Crossref makes research objects easy to find, cite, link, assess, and reuse. Crossref members provide metadata about a range of scholarly objects, and this metadata is all made freely available through Crossref APIs.  

This workshop will help you learn:

  • the breadth and depth of the metadata Crossref collects

  • what APIs are available and how they are used

  • how to search, facet, filter, or sample metadata using the Crossref REST API

  • how to discover connections between scholarly objects and the discussion that surrounds them using the Event Data API

  • how APIs will help build a Research Nexus of interconnected scholarly works and metadata
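The search, facet, filter and sample operations mentioned above map onto query parameters of the Crossref REST API's /works route. The following is a minimal sketch that only builds the request URLs; the helper crossref_works_url is hypothetical, and the filter and facet names used in the examples should be checked against the Crossref documentation before use.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_works_url(query=None, filters=None, facet=None,
                       sample=None, rows=None, mailto=None):
    """Build a /works URL; filters is a dict joined as name:value,name:value."""
    params = {}
    if query:
        params["query"] = query
    if filters:
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    if facet:
        params["facet"] = facet
    if sample is not None:
        params["sample"] = sample
    if rows is not None:
        params["rows"] = rows
    if mailto:
        params["mailto"] = mailto  # optional contact address for the request
    return CROSSREF_WORKS + "?" + urlencode(params)


# Search combined with filters, returning five records.
search_url = crossref_works_url(
    query="open access",
    filters={"from-pub-date": "2021-01-01", "type": "journal-article"},
    rows=5,
)
# A random sample of ten records.
sample_url = crossref_works_url(sample=10)
# Facet counts only (rows=0 asks for no individual records).
facet_url = crossref_works_url(facet="type-name:*", rows=0)

for u in (search_url, sample_url, facet_url):
    print(u)
```

The colons and commas appear percent-encoded in the printed URLs, which is why a URL encoding reference is among the shared resources.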

Resources shared by course moderator Phill Jones, guest lecturer Patricia Feeney, and attendees:

API Case Study: Crossref metadata for bibliometrics

HTML URL Encoding Reference

DOI Name Values 

OpenRefine (formerly known as Google Refine)

Crossref Unified Resource API

Crossref Ticket of the month - March 2022 - Getting started with REST API queries

Wikipedia: Semantic triple

Wikipedia: Triplestore

Crossref Documentation, Content Markup Guide: Required, recommended, and optional elements

Crossref API: Members

Digital Science Dimensions - Thursday, May 19, 2022

Dimensions aggregates and enriches data about the scholarly cycle, from grants to publications, datasets, clinical trials, patents, and policy documents. Researchers and their research institutions are automatically extracted and unified, while the text is categorised using pre-existing classifications (the Australian Fields of Research, the British Units of Assessment, the UN Sustainable Development Goals, and more specialised health classifications such as the Research, Condition, and Disease Categorization (RCDC) and ICRP Cancer Types).

The presentation will introduce the participants to the use of the Dimensions API. It will show:

  • how to easily connect to the API using Postman
  • how to use the Python library dimCLI in Google Colab
  • what can be extracted and classified from it
  • where to find more information about the data sources 
  • the Dimensions API Lab, to learn more about how to use the API

Resources shared by course moderator Phill Jones and guest lecturer, Dr. Hélène Draux:

The Dimensions Search Language: Query Syntax

COVID-19: Dataset of Global Research by Dimensions

Google Cloud: Datasets

Web of Science - Thursday, May 26, 2022

This session will cover the available Web of Science APIs, show examples of popular requests using Postman, and discuss the retrieved content. In this session, attendees will:

  • get an overview of the Clarivate Developer Portal and how to find and learn more about Web of Science APIs
  • know how to search and retrieve Web of Science documents with the Expanded API and the Starter API
  • learn about the InCites API, with its schema-specific metrics
  • learn about the Journal API, which allows searching and retrieving complete metrics from Journal Citation Reports (JCR)
  • learn about Clarivate's open-source offerings on GitHub
  • get a brief demo of the Web of Science API Exporter, an open-source tool that allows data exports without coding skills

Shared Resources, Documentation, and More!

Online JSON Viewer

Term frequency - Inverse Document Frequency (tf–idf)

OpenAPI Specification

An Easier Way to Access the Web of Science API: Using the Web of Science Excel Converter tool by Eric Schares of Iowa State University

OpenAlex - Thursday, June 2, 2022

The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. There are five types of entities: Works, Authors, Venues, Institutions, and Concepts. Together, these comprise a graph of hundreds of millions of entities and billions of connections. Using the OpenAlex API, anyone can query and access this fully open (under a CC0 license) catalog of the global research system.

In this session, participants will:

  • Get up and running with the API using Postman, including authentication to the so-called "polite pool" for best performance;
  • Get single entities and review the structure and semantics of entity metadata;
  • Get lists of entities by leveraging basic filtering, full-text search, sorting, and pagination; and
  • Get aggregate information on groups of entities directly from the API without any need for post-processing.
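The list and aggregate requests described above can be sketched as URL construction. This example assumes the OpenAlex parameter names filter, search, sort, group_by, per-page, page and mailto (the last being how a request joins the "polite pool"); the helper openalex_url and the example address are hypothetical, and the filter fields should be checked against the OpenAlex documentation.

```python
from urllib.parse import urlencode

OPENALEX = "https://api.openalex.org"


def openalex_url(entity, mailto, filters=None, search=None, sort=None,
                 group_by=None, per_page=None, page=None):
    """Build a list or group-by URL for an entity type such as 'works'."""
    params = {}
    if filters:
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    if search:
        params["search"] = search
    if sort:
        params["sort"] = sort
    if group_by:
        params["group_by"] = group_by
    if per_page is not None:
        params["per-page"] = per_page
    if page is not None:
        params["page"] = page
    params["mailto"] = mailto  # identifies the caller for the polite pool
    return f"{OPENALEX}/{entity}?" + urlencode(params)


EMAIL = "you@example.org"  # placeholder; replace with a real contact address

# A filtered, searched, sorted and paginated list of works.
list_url = openalex_url(
    "works", EMAIL,
    filters={"publication_year": 2021, "is_oa": "true"},
    search="bibliometrics",
    sort="cited_by_count:desc",
    per_page=25, page=1,
)
# Aggregate counts computed by the API itself, with no post-processing.
group_url = openalex_url(
    "works", EMAIL,
    filters={"publication_year": 2021},
    group_by="oa_status",
)
print(list_url)
print(group_url)
```

Paging through results then amounts to repeating the list request with an increasing page value.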

Shared Resources, Documentation, and More!

OpenAlex Documentation

OurResearch - nonprofit organization which creates and distributes tools and services for libraries, institutions and researchers

CORE - the world’s largest aggregator of open access research papers from repositories and journals

Getting citation data from OpenAlex by DOI - Python notebook on how to retrieve total citations per paper and citations per year for a set of DOIs

OpenAlex Twitter

Methods and Tools for Scholarly Data Analytics - Thursday, June 9, 2022

When working with scholarly data, the analyst must consider many different technology aspects. In terms of data integration, knowledge of the available datasets and how to link across them is crucial. For effective data enrichment, experience with widely used libraries and APIs can add additional value to the data. Finally, visualization of the outcomes is essential for proper interpretation and communication of findings.
 
For this presentation, I will demonstrate several technology solutions relating to these important steps, showing how Python, open-source libraries and public data sources can be used effectively for custom analysis. In particular, topic modelling and geoparsing will be discussed, along with network visualization using Gephi.

Shared Resources, Documentation, and More!

Gephi: The Open Graph Viz Platform - Gephi is the leading visualization and exploration software for all kinds of graphs and networks. Gephi is open-source and free.

ASIS&T Digital Library: Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence?

OECD Data - Thursday, June 16, 2022

You will learn a little bit about the OECD and what it does, about the official Data API that the OECD provides for use by external audiences, and about the standards underpinning the API (RESTful, SDMX, JSON-LD). The OECD Data API provides statistical data on the many subject areas in which OECD experts are active, including Economics, Environment, Education, Trade, Agriculture, Labor and more.

We will then proceed to look at the API in more detail, in particular understanding the query formats as well as what the returned responses contain. This will include trying out several queries using Postman or cURL. Finally, for inspiration, we will look at existing implementations and how they use (or are going to use) the Data API. This will include data exchanges between organizations, online data visualizations as well as the automatic generation of statistical publications (print and PDF).
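To give a feel for the query format, here is a minimal sketch based on the general SDMX 2.1 REST convention, in which a data query names a dataflow and a key whose dimensions are separated by dots, with "+" joining alternative values and an empty position meaning "all values". The base URL, dataflow name and indicator code below are hypothetical placeholders, and the OECD endpoint may use different paths or parameter names, so the OECD developer documentation remains the reference.

```python
from urllib.parse import urlencode


def sdmx_key(dimensions):
    """Join dimension values: '+' between alternatives, '.' between dimensions.

    An empty list leaves the position blank, which acts as a wildcard.
    """
    return ".".join("+".join(values) for values in dimensions)


def sdmx_data_url(base, flow, dimensions, start=None, end=None):
    """Build an SDMX 2.1 style data query URL (generic convention, not OECD-specific)."""
    url = f"{base.rstrip('/')}/data/{flow}/{sdmx_key(dimensions)}"
    params = {}
    if start:
        params["startPeriod"] = start
    if end:
        params["endPeriod"] = end
    return url + ("?" + urlencode(params) if params else "")


# Hypothetical endpoint and dataflow; AUS and AUT are ISO 3166 alpha-3 country codes.
url = sdmx_data_url(
    "https://example.org/sdmx/rest",
    "EXAMPLE_FLOW",
    [["AUS", "AUT"], ["GDP"], []],
    start="2015",
    end="2020",
)
print(url)
```

The same URL can be pasted into Postman or passed to cURL once the real base URL and dataflow identifier are substituted.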

Shared Resources, Documentation, and More!

OECD data for developers - The OECD has application programming interfaces (APIs) that provide access to datasets in the catalogue of OECD databases. The APIs allow you to query the data in several ways, using parameters to specify your request so that you can create innovative software applications which use OECD datasets. 

OECD.Stat - OECD.Stat includes data and metadata for OECD countries and selected non-member economies.

IBAN COUNTRY CODES ALPHA-2 & ALPHA-3 -  A complete list of all country ISO codes as described in the ISO 3166 international standard.

Statistical Data and Metadata eXchange (SDMX) for the Python data ecosystem - pandaSDMX is an Apache 2.0-licensed Python library that implements SDMX 2.1 (ISO 17369:2013), a format for exchange of statistical data and metadata used by national statistical agencies, central banks, and international organisations.

Additional Information

  • Cancellations made by April 28, 2022 will receive a refund, less a $35 cancellation fee. After that date, there are no refunds.

  • Registrants will receive detailed instructions about accessing the eight training series sessions via e-mail the Monday prior to the specific session of the series. (Anyone registering between Monday and the close of registration will receive the message shortly after the registration is received, within normal business hours.) Due to the widespread use of spam blockers, filters, out of office messages, etc., it is your responsibility to contact the NISO office if you do not receive login instructions before the start of the webinar.

  • If you have not received your Login Instruction e-mail by 10 a.m. (ET) on the day before the series segment, please contact the NISO office at nisohq@niso.org for immediate assistance.

  • Registration is per site (access for one computer) and includes access to the online recorded archive of the conference. You may have as many people as you like from the registrant's organization view the conference from that one connection. If you need additional connections, you will need to enter a separate registration for each connection needed.

  • If you are registering someone else from your organization, either use that person's e-mail address when registering or contact nisohq@niso.org to provide alternate contact information.

  • Conference presentation slides and Q&A will be posted to this event webpage following the live conference.

  • Registrants will receive an e-mail message containing access information to the archived conference recording within 48 hours after the event. This recording access is only to be used by the registrant's organization.

For Online Events

  • You will need a computer for the presentation and Q&A.
     
  • Audio is available through the computer (broadcast) and by telephone. We recommend that you set up telephone audio as a back-up even if you plan to use the broadcast audio, as voice over Internet isn't always 100% reliable.

It is your responsibility to ensure that your system is properly set up before each webinar begins.