Connecting the Dots: Constellations in the Linked Data Universe

Webinar

About the Webinar

The universe of linked data is rapidly expanding, and our community is finding innovative ways to link and apply data. This session will cover several initiatives and projects using linked data to improve discovery and reuse of information.

Event Sessions

Introduction

Speaker

OCLC Linked Data Initiatives and Activities

Speaker

Richard Wallis

Richard will start us off with a high-level overview of the advancements and activities in Linked Data as it relates to bibliographic data resources. He will update us on OCLC initiatives in this area and how they relate to broader developments.

The Linked Data Catalog: Small Steps Toward a Web of Library Data

Speaker

Tom Johnson

Digital Applications Librarian, Oregon State University

Tom will provide a glimpse into the practical requirements of a Linked Data catalog, how libraries can use existing Linked Open Data, and what opportunities exist to participate in a bibliographic web.

Event Q&A With Our Speakers

Q: So what will the cataloging tools look like when we stop copying and start linking? 


A: Wallis: That will ultimately be up to the system suppliers, but I would predict that they would include features to help suggest authorities to link to, ways to follow links to confirm that you have chosen the correct person/place/concept, and a route to create/suggest new authorities, to name a few.

 

Q: With the decomposition of the traditional bibliographic record, as Richard Wallis suggested, what are some scenarios for description workflows and tools needed for implementing linked data in production environments? 


A: Wallis: See cataloguing tools answer above.

 

Q: How are the big .org datasets financed? 


A: Wallis: Subscription, sponsorship, or as part of other worthy/important projects; the full range of funding models can be found across these datasets.


A: Johnson: Often they are individual or academic projects. Sometimes, those will come with grant funding or general community support; sometimes not. As more important infrastructure pops up relying on these datasets, more organizations will be compelled to ensure their long-term stability. Compare this to Free/Open Source software funding models.

 

Q: Can Tom talk a bit about linked data using DSpace? 


A: Johnson: DSpace has some serious limitations in terms of how it handles descriptive metadata for its objects. We've had success using linked data on DSpace collections in two ways: First, our SKOS name authority work uses the standard DSpace authority system. URIs go into the database as "authority keys" and the authority system has been implemented to use SKOS and FOAF via SPARQL queries. This doesn't really give us linked data in DSpace, but it lets us hook what we have into our objects.
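A minimal sketch of the lookup step described above, assuming the SKOS/FOAF name data sits behind a SPARQL endpoint that returns the standard SPARQL 1.1 JSON results format. The function names and the `http://data.example.edu/names/` URIs are hypothetical; this is not OSU's DSpace authority plugin, only an illustration of how a name label can be turned into an authority key (a URI).

```python
import json

# SKOS supplies the labels; FOAF types the name records as people.
PREFIXES = (
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)


def build_authority_query(label: str) -> str:
    """Return a SPARQL query matching `label` against preferred or alternate labels.

    Assumes labels are stored as plain literals (no language tag).
    """
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    lit = '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return (
        PREFIXES
        + "SELECT DISTINCT ?uri ?name WHERE {\n"
        + "  ?uri a foaf:Person ; skos:prefLabel ?name .\n"
        + f"  {{ ?uri skos:prefLabel {lit} }} UNION {{ ?uri skos:altLabel {lit} }}\n"
        + "}"
    )


def parse_authority_keys(results_json: str) -> list:
    """Extract (uri, preferred label) pairs from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [(b["uri"]["value"], b["name"]["value"]) for b in bindings]


# Hypothetical endpoint response for the label "Smith, J."
sample = json.dumps({
    "head": {"vars": ["uri", "name"]},
    "results": {"bindings": [{
        "uri": {"type": "uri", "value": "http://data.example.edu/names/n42"},
        "name": {"type": "literal", "value": "Smith, Jane"},
    }]},
})
print(build_authority_query("Smith, J."))
print(parse_authority_keys(sample))
```

Matching on both `skos:prefLabel` and `skos:altLabel` is what lets a variant form typed by a cataloger resolve to the same URI as the preferred form.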

The other thing we've done is to extract metadata from DSpace, process it, and load it into a standalone triplestore. This is what we've done with our theses and dissertations, and it has helped us unify those records which exist in QDC in DSpace and in MARC in the catalog. The linked data gives us a canonical serialization so all the data we have about the objects is in one place.
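The extract-and-merge step can be pictured as follows: records from two systems are mapped onto one shared vocabulary, attached to a single subject URI, and written out as N-Triples for loading into a triplestore. The field names, the mapping to DCMI Terms, and the `http://data.example.edu/etd/1` URI are all hypothetical; a real pipeline would need much more careful crosswalking (for example, stripping ISBD punctuation from MARC fields).

```python
# Hypothetical crosswalks from source fields to DCMI Terms properties.
DCTERMS = "http://purl.org/dc/terms/"
QDC_MAP = {"dc.title": "title", "dc.contributor.author": "creator", "dc.date.issued": "issued"}
MARC_MAP = {"245a": "title", "100a": "creator", "502a": "description"}


def to_triples(subject, record, field_map):
    """Map one record (field -> list of values) onto (subject, property, literal) triples."""
    triples = set()
    for field, values in record.items():
        prop = field_map.get(field)
        if prop is None:
            continue  # no mapping for this field; skip it
        for value in values:
            triples.add((subject, DCTERMS + prop, value.strip()))
    return triples


def literal(value):
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """One N-Triples statement per line, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in sorted(triples))


subject = "http://data.example.edu/etd/1"
qdc = {"dc.title": ["A Study of Things"], "dc.contributor.author": ["Smith, Jane"]}
marc = {"245a": ["A Study of Things"], "502a": ["Thesis (Ph.D.)"]}

# The set union collapses statements both systems agree on (here, the title).
merged = to_triples(subject, qdc, QDC_MAP) | to_triples(subject, marc, MARC_MAP)
print(to_ntriples(merged))
```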

You can see the actual data here: http://data.library.oregonstate.edu/

...and there is a paper we presented earlier this year about these topics: http://hdl.handle.net/1957/32977

 

Q: How is this topic related to the implementation of RDA? 


A: Wallis: In two ways. First, as a record-based cataloguing format, RDA records could be considered a source for producing linked data in the same way that MARC can be. Second, the RDA community has made RDA elements available as a vocabulary that can be used as linked data properties.

 

Q: Any tips for creating URIs locally that are stable and linkable? 


A: Johnson:  Be organized about your namespace. Choose something which won't collide with your other web activities (e.g. http://data.example.edu/ or http://example.edu/data/) and have a domain policy which ensures you will hold onto the namespace for as long as the URIs are relevant. It's really about good management.
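One way to act on this advice is to mint opaque identifiers under a single namespace reserved for data, so that URIs never embed facts (such as a person's name) that might later change. The sketch below assumes the `http://data.example.edu/` namespace from the answer above; the class name and the 12-character hex identifiers are illustrative choices, and the in-memory set stands in for whatever persistent store would actually guarantee that an identifier is never reissued.

```python
import uuid

# Namespace reserved for data, as in the example above.
BASE = "http://data.example.edu/"


class UriMinter:
    """Mint opaque, never-reused URIs under one dedicated namespace (sketch)."""

    def __init__(self, base: str, kind: str):
        if not base.endswith("/"):
            raise ValueError("base namespace must end with '/'")
        self.prefix = f"{base}{kind}/"
        self.issued = set()  # a real system would persist this

    def mint(self) -> str:
        while True:
            uri = self.prefix + uuid.uuid4().hex[:12]
            if uri not in self.issued:  # never hand out the same URI twice
                self.issued.add(uri)
                return uri


names = UriMinter(BASE, "names")
print(names.mint())  # a URI of the form http://data.example.edu/names/<12 hex digits>
```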



Q: Are there concerns in the area of trust? i.e. too many participants taking too many liberties and relations like "sameAs" being misused. 


A: Wallis: As was said, on the linked data web (as for the web itself) anyone can say anything about anything. SameAs relationships have been [ab]used in many ways. However, it is for the data consumer or application provider to apply their editorial judgment about which resources they trust to add value to their data – for example, the BBC trust Wikipedia for descriptions of animals, but they do not trust it for music (they use MusicBrainz) or politics (they use their own reporters).


A: Johnson: There is a ton of interesting work going on surrounding both of these issues. If you care to, you can dig pretty deep. The kind of ad hoc trust that Richard describes will work pretty well for most applications. More formal and generalized trust systems might pop up in the future.

Re: The semantics of SameAs, a good starting place is: www.w3.org/2009/12/rdf-ws/papers/ws21
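The consumer-side editorial judgment described in Richard's answer can be as simple as a per-topic list of sources an application is willing to follow. The sketch below is modelled loosely on his BBC example; the topic names and host lists are assumptions (dbpedia.org stands in for Wikipedia-derived data), and matching on host name is only one crude way to express trust.

```python
from urllib.parse import urlparse

# Hypothetical editorial policy: which hosts we trust, per topic.
TRUSTED_SOURCES = {
    "animals": {"dbpedia.org"},
    "music": {"musicbrainz.org"},
}


def trusted_links(topic: str, same_as_targets: list) -> list:
    """Keep only sameAs targets whose host is trusted for this topic."""
    allowed = TRUSTED_SOURCES.get(topic, set())  # unknown topic: trust nothing
    return [t for t in same_as_targets if urlparse(t).hostname in allowed]


links = [
    "http://musicbrainz.org/artist/hypothetical-id",
    "http://dbpedia.org/resource/Example_Band",
]
print(trusted_links("music", links))
```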

 

Q: A question about the OSU authorities’ project - how did you set about initially cleaning/reconciling the names you had in-hand? 


A: Johnson: We used a heuristic approach to do an initial name matching/URI minting pass against the existing names. The result is far from perfect, but is still an improvement over the status quo. The long term approach is to introduce quality control into the ingest/review workflow. We have a simple web application for editing (and splitting/merging) data for URIs, and we'll be doing some research on how expensive and effective name review turns out to be as we roll this project out in the New Year.

The web app is free software (though not yet in use even here and not at all supported) and the code is up here:

https://github.com/no-reply/SKOS-Name-Authority-Editor
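A first heuristic pass of the kind described above might normalize name strings, treat strings that normalize identically as the same person, and mint one URI per group. This is a sketch, not OSU's actual heuristic: the normalization rules and the sequential `n1`, `n2`, ... identifiers under `http://data.example.edu/names/` are assumptions. It also shows why such a result is "far from perfect": two different people with the same name collapse into one URI, which is exactly what a review and split/merge workflow is for.

```python
import re
import unicodedata
from itertools import count


def normalize(name: str) -> str:
    """Crude match key: strip accents, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def match_and_mint(names, base="http://data.example.edu/names/"):
    """Assign one minted URI to every group of names sharing a match key."""
    ids = count(1)
    uri_for_key = {}
    assignments = {}
    for name in names:
        key = normalize(name)
        if key not in uri_for_key:
            uri_for_key[key] = f"{base}n{next(ids)}"
        assignments[name] = uri_for_key[key]
    return assignments


result = match_and_mint(["Smith, Jane", "smith, jane.", "Smíth, Jane", "Doe, John"])
for name, uri in result.items():
    print(f"{name!r:16} -> {uri}")
```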


Q: URIs for artworks? Any progress here? 


A: Wallis: Europeana publishes linked data for its resources, many of which are art works.

 

Q: Is OCLC planning or considering implementing linked data in cataloging -- via Connexion for example? 


A: Wallis: OCLC is continuously reviewing the tools and services it provides in the context of the evolving library landscape – the experiences gained from publishing linked data are being reflected in those reviews.

 

Q: You haven't mentioned the kind of inferences and logical queries that can be done with linked data. Is that considered not so important for libraries, or do you see that coming, maybe later? 


A: Wallis: There are great potential benefits to be gained from analyzing and creating inferences from linked data resources, once we have established some consistent publishing practices. Today the major benefits will flow from publishing and following links. 


A: Johnson: This is a very promising line for research and future development. I think perhaps it often gets avoided in sessions like this because it's a hard topic to handle quickly and most people get skeptical or overwhelmed when you start to talk too much about 'semantics', 'ontologies' and other knowledge representation and AI-related stuff. It's true that the most immediate benefits come from sharing data at scale; but if you are interested in use cases for inference, you shouldn't be discouraged. You're just ahead of the game.

As a simple example of how inference might be useful, think about trying to work with unexpected data properties. If you get data like:

<http://example.org/some_item> <http://id.loc.gov/vocabulary/relators/anm> <http://dbpedia.org/resource/Tex_Avery> .

You might not have a clue what marcrel:anm (the MARC relator for "animator") means or what to do with it. But you probably know exactly what to do with dc:contributor. So when you "follow your nose" and find:

<http://id.loc.gov/vocabulary/relators/anm> <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://purl.org/dc/elements/1.1/contributor> .

Your application can be smart enough to act accordingly and, for example, index Tex Avery as a contributor.
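The "follow your nose" step can be sketched as a tiny forward-chaining reasoner that applies a single RDFS entailment rule (rdfs7: if p is a subproperty of q, then every `s p o` also implies `s q o`) until nothing new is derived. It uses the two statements from the example above; the function names are hypothetical, and a production system would more likely rely on an RDF library or a triplestore's built-in reasoning than on this loop.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
MARCREL_ANM = "http://id.loc.gov/vocabulary/relators/anm"


def infer_superproperties(triples):
    """Forward-chain RDFS rule rdfs7 to a fixpoint; returns a new set of triples."""
    graph = set(triples)
    while True:
        sub_of = {(s, o) for s, p, o in graph if p == RDFS_SUBPROPERTY_OF}
        derived = {(s, sup, o) for s, p, o in graph for sub, sup in sub_of if p == sub}
        new = derived - graph
        if not new:
            return graph
        graph |= new


def contributors(graph, item):
    """Everything the graph says is a dc:contributor of `item`."""
    return {o for s, p, o in graph if s == item and p == DC_CONTRIBUTOR}


data = {
    # The statement we received, using a property our indexer doesn't know:
    ("http://example.org/some_item", MARCREL_ANM, "http://dbpedia.org/resource/Tex_Avery"),
    # What we find by following the property URI:
    (MARCREL_ANM, RDFS_SUBPROPERTY_OF, DC_CONTRIBUTOR),
}
graph = infer_superproperties(data)
print(contributors(graph, "http://example.org/some_item"))
```

Because the loop runs to a fixpoint, chains of subproperties (p1 under p2 under dc:contributor) are also followed all the way up.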

 

Q: Richard, has OCLC done an assessment to determine whether the use of schema.org has increased the visibility of your resources in search engines, i.e. higher rankings or increased hits on your resources? 


A: Wallis: At this early experimental stage most results are anecdotal. As one site, even if a large one, WorldCat.org can only have a small impact on the rankings – once many libraries are publishing and [most importantly] interlinking their resources, network effects should come into play.

 

Q: Is the control headings function in OCLC a step toward linked data? Linking that enables correction of the name as copied data? 


A: Wallis: It is one part of a continuously evolving picture.

 

Q: Is training on XML necessary for current catalogers? 


A: Wallis: As XML is only one way that RDF can be packaged for exchange, it ‘should’ be hidden behind normal workflow tools as they evolve, thus removing the need for cataloguers to interact directly with XML.


A: Johnson: Exactly. XML is just one format here (and, in my opinion, one of the hardest to work with for RDF). I do think catalogers, current and aspiring, should work to understand data formats in general. Understanding XML certainly wouldn't hurt, but I wouldn't emphasize it over, say, JSON or N3/Turtle.
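To make the point about formats concrete, here is the single `marcrel:anm` statement from the inference example above written three ways: N-Triples (a subset of Turtle), expanded JSON-LD, and RDF/XML. This is a hand-rolled illustration using only the Python standard library; the helper names are hypothetical, and real applications would use an RDF library's serializers. Note that only the XML form forces you to split the property URI into a namespace and a local name.

```python
import json
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S = "http://example.org/some_item"
P = "http://id.loc.gov/vocabulary/relators/anm"
O = "http://dbpedia.org/resource/Tex_Avery"


def as_ntriples(s, p, o):
    # N-Triples (also valid Turtle): one statement per line, full URIs.
    return f"<{s}> <{p}> <{o}> ."


def as_jsonld(s, p, o):
    # Expanded JSON-LD: full URIs as keys, no @context needed.
    return json.dumps([{"@id": s, p: [{"@id": o}]}], indent=2)


def as_rdfxml(s, p, o, prefix="marcrel"):
    # RDF/XML must split the predicate URI into a namespace plus an XML local name;
    # this assumes the URI ends in "/localname".
    ns, local = p.rsplit("/", 1)
    ns += "/"
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace(prefix, ns)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": s})
    ET.SubElement(desc, f"{{{ns}}}{local}", {f"{{{RDF_NS}}}resource": o})
    return ET.tostring(root, encoding="unicode")


for render in (as_ntriples, as_jsonld, as_rdfxml):
    print(render(S, P, O), end="\n\n")
```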



Q: How would a cataloger handle it when the authoritative system flat out denies the existence of the thing you are describing? 


A: Wallis: An authority system ‘should’ have a way to at least suggest new authorities if it is to be relevant in a linked data world. Alternatively, local authority systems could be linked to if published as linked data. As we move towards a linked library data world, one would expect to mix authorities from several sources.
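A cataloging tool that mixes authorities from several sources, as suggested above, might try each source in a preferred order and fall back to proposing a new local authority when none of them knows the entity. In this sketch the sources are plain dictionaries standing in for real lookup services; every source name and URI is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A lookup takes a heading and returns an authority URI, or None if unknown.
Lookup = Callable[[str], Optional[str]]


def resolve(heading: str, sources: List[Tuple[str, Lookup]], suggestions: list) -> Tuple[str, str]:
    """Return (source name, URI) from the first source that knows `heading`.

    If no source recognizes it, queue a suggested local authority for review.
    """
    for source_name, lookup in sources:
        uri = lookup(heading)
        if uri is not None:
            return source_name, uri
    # Hypothetical local namespace for suggested (not yet approved) authorities.
    local_uri = f"http://data.example.edu/suggested/{len(suggestions) + 1}"
    suggestions.append((heading, local_uri))
    return "local (suggested)", local_uri


national = {"Avery, Tex": "http://authority.example.org/names/avery-tex"}
local = {"Smith, Jane": "http://data.example.edu/names/n1"}
sources = [("national", national.get), ("local", local.get)]

pending = []
print(resolve("Avery, Tex", sources, pending))
print(resolve("Smith, Jane", sources, pending))
print(resolve("Unknown, Author", sources, pending))
print(pending)
```

Ordering the sources expresses a preference (shared authorities first, local ones second), while the suggestion queue gives the cataloger somewhere to go when every source says the entity does not exist.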

Additional Information

  • Registration closes at 12:00 pm Eastern on December 12, 2012. Cancellations made by December 5, 2012 will receive a refund, less a $20 cancellation fee. After that date, there are no refunds.
  • Registrants will receive detailed instructions about accessing the webinar via e-mail the Monday prior to the event. (Anyone registering between Monday and the close of registration will receive the message shortly after the registration is received, within normal business hours.) Due to the widespread use of spam blockers, filters, out of office messages, etc., it is your responsibility to contact the NISO office if you do not receive login instructions before the start of the webinar.
  • Registration is per site (access for one computer) and includes access to the online recorded archive of the webinar. If you are registering someone else from your organization, either use that person's e-mail address when registering or contact the NISO office to provide alternate contact information.
  • Webinar presentation slides and Q&A will be posted to the site following the live webinar.
  • Registrants will receive access information to the archived webinar following the event. An e-mail message containing archive access instructions will be sent within 48 hours of the event.