The concept of linked data has been bubbling around in the library consciousness for more than a decade. However, for most front-line librarians, linked data is still a bit of a mystery. Our goal today is to demystify linked data and explain, in plain language, why it’s important and useful to libraries. We’ll also provide info on how and why Aspen publishes linked data.
What is Linked Data?
In the case of Linked Data, analogies are our friends.
Let’s use my friend Candice as an example. Candice owns a 2016 Toyota Prius. BOOM! We’re off to a good start with our linked data. We have two things, Candice and a Prius, and a relationship, in this case that Candice OWNS that Prius.
This concept of two things and a relationship is easily extendable to libraries:
- The Main Street Branch owns a hardcover copy of the book The Hunger Games.
- The Main Street Branch is a location of The City Library.
- The City Library subscribes to Cloud Library.
- Cloud Library offers an ebook of The Hunger Games.
- The Hunger Games ebook belongs to the Grouped Work The Hunger Games.
- The hardcover copy of the book The Hunger Games belongs to the Grouped Work The Hunger Games.
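The statements above can be sketched as simple (thing, relationship, thing) triples. This is only a rough Python illustration: the relationship names such as `owns` and `belongs_to` are made up for this example and do not come from any standard vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: two things and the relationship between them."""
    subject: str
    predicate: str
    obj: str


# Illustrative only: the predicate names are invented for this sketch.
STATEMENTS = [
    Triple("Main Street Branch", "owns", "The Hunger Games (hardcover)"),
    Triple("Main Street Branch", "is_location_of", "The City Library"),
    Triple("The City Library", "subscribes_to", "Cloud Library"),
    Triple("Cloud Library", "offers", "The Hunger Games (ebook)"),
    Triple("The Hunger Games (ebook)", "belongs_to", "The Hunger Games (grouped work)"),
    Triple("The Hunger Games (hardcover)", "belongs_to", "The Hunger Games (grouped work)"),
]


def things_related_to(target: str, predicate: str) -> List[str]:
    """Return every subject whose `predicate` points at `target`."""
    return [t.subject for t in STATEMENTS if t.predicate == predicate and t.obj == target]


# Which formats belong to the grouped work?
formats = things_related_to("The Hunger Games (grouped work)", "belongs_to")
print(formats)
```

Once the statements are stored this way, a question like "which formats make up this grouped work?" becomes a simple filter over the list.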
So is that all there is to Linked Data? Nope, there are a few more requirements.
According to Tim Berners-Lee, there are four expectations/rules that allow us to define the relationship between two things as linked data. The following four “rules” are taken from his article, Linked Data:
Use URIs as names for things
URI stands for Uniform Resource Identifier, and it’s most easily thought of in this context as the unique ID for each object or concept we’re describing relationships between.
Use HTTP URIs so people can look up those names
Here the requirement is that your URIs are ordinary web addresses (HTTP), so that anyone can look them up on the web.
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Now we’re talking about describing those relationships. Specifically, how do we use standards so that other applications will understand what we’re trying to describe about the relationship between our two objects or concepts?
Include links to other URIs so that they can discover more things.
Do you link your objects or concepts to anything else? Linking outside of your own data is where the magic happens, creating connections across datasets.
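To make a few of these rules more concrete, here is a small Python sketch that names things with HTTP URIs (rules one and two) and then finds the links that point outside our own data (rule four). It does not cover rule three, which is about what a server sends back when a URI is looked up. Every catalog and dataset URI below is a placeholder on example.org or example.com, and the Schema.org properties `owns` and `sameAs` are used loosely for illustration.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical host for our own catalog; not a real library address.
OUR_HOST = "catalog.example.org"

UriTriple = Tuple[str, str, str]

# (subject URI, relationship URI, object URI); catalog URIs are placeholders.
URI_TRIPLES: List[UriTriple] = [
    (
        "https://catalog.example.org/branch/main-street",
        "https://schema.org/owns",
        "https://catalog.example.org/record/12345",
    ),
    (
        "https://catalog.example.org/record/12345",
        "https://schema.org/sameAs",
        "https://data.example.com/works/hunger-games",
    ),
]


def is_http_uri(uri: str) -> bool:
    """Rules one and two: the name is a URI that can be looked up over HTTP(S)."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def external_links(triples: List[UriTriple]) -> List[str]:
    """Rule four: objects that live on a host other than our own."""
    return [obj for _, _, obj in triples if urlparse(obj).netloc != OUR_HOST]


all_named_with_http = all(is_http_uri(uri) for triple in URI_TRIPLES for uri in triple)
outside = external_links(URI_TRIPLES)
print(all_named_with_http)
print(outside)
```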
Aspen and Linked Data
Let’s take a look at Aspen data and compare it to the rules above. Every title in Aspen that comes from the ILS uses the bibliographic number from the MARC record in the ILS as its unique identifier and that identifier is published in the URL. Unique identifiers are used or created and added to the URL for any titles coming from sources outside the ILS.
Grouped works, which combine all of the formats of a single title into the same search result/page, also each have their own unique identifier in the URL. Library systems and branches each have unique identifiers published in URLs, and branches include address information. By creating unique identifiers and publishing each of these identifiers as URLs, Aspen easily satisfies Berners-Lee’s rules one and two.
The third requirement is about publishing your data. Aspen publishes data using two different standards: BIBFRAME and Schema.org, both of which are expressed in JSON-LD, which is based on RDF. BIBFRAME is managed by the Library of Congress and is specific to library data. Schema.org is an initiative driven by Google and several other search engines to provide website creators with a standard to publish structured data that their search engines can understand. By publishing data in these formats, Aspen ensures that information about your materials is available to any search engine, library application, or other entity on the web that is capable of reading those formats. How cool is that?
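To see how JSON-LD relates back to the "two things and a relationship" idea, here is a deliberately simplified Python sketch that reads a tiny Schema.org document as triples. The record URL is a placeholder and the document is not Aspen's actual output. The `simple_triples` helper is hypothetical and only handles a flat document; a real JSON-LD processor does much more, such as expanding the context and handling nested objects.

```python
import json

# A tiny, made-up JSON-LD document using the Schema.org vocabulary.
doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://catalog.example.org/record/12345",
  "@type": "Book",
  "name": "The Hunger Games",
  "bookFormat": "https://schema.org/Hardcover"
}
""")


def simple_triples(node):
    """Crude reading of a flat JSON-LD node as (subject, predicate, object)."""
    subject = node["@id"]
    triples = [(subject, "rdf:type", "schema:" + node["@type"])]
    for key, value in node.items():
        if not key.startswith("@"):  # skip JSON-LD keywords like @context
            triples.append((subject, "schema:" + key, value))
    return triples


book_triples = simple_triples(doc)
for triple in book_triples:
    print(triple)
```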
The fourth expectation is a little bit tougher and definitely a direction for future growth. While Aspen references URIs extensively across its own data, the data it publishes in BIBFRAME and Schema.org does not yet reference outside datasets. However, as more and more organizations with rich datasets understand the value of linked data and publish their data using these standards, Aspen is primed to take advantage of any data available to enhance and extend the online experience for our library partners and their patrons.
Why BIBFRAME and Schema.org
The work in Aspen’s code to publish BIBFRAME data was first done in 2016. Although the standard was less than 5 years old at the time, there was a sense that the more applications that adopted it, the more useful it would become, which is why Aspen continues to publish BIBFRAME data today.
The Schema.org work was done around the same time, but with a different goal: to provide data in the structured format that search engines understand and prefer. We wanted to make sure that Google not only knew which branches had copies of which titles, but also the addresses of those branches, the cost of the items (free!), and whether or not those materials were in stock. This, of course, is no guarantee that materials from the library will appear at the top of Google searches. Search Engine Optimization is a field entirely unto itself, but by publishing in Schema.org we at least know that search engines are starting with the right information.
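As a rough illustration of what that kind of structured data can look like, here is a Python sketch that builds a Schema.org description of a title with a branch address, a price of zero, and an in-stock flag. This is not Aspen's actual markup: the `build_book_jsonld` function, the record URL, the address, and the currency are all assumptions made for the example. The Schema.org types and properties used (`Book`, `Offer`, `availability`, `availableAtOrFrom`, `Library`, `PostalAddress`) are real vocabulary terms.

```python
import json


def build_book_jsonld(title, record_url, branch_name, street, city, in_stock):
    """Build a Schema.org JSON-LD dict for one title at one branch (hypothetical helper)."""
    availability = (
        "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    )
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": record_url,
        "name": title,
        "offers": {
            "@type": "Offer",
            "price": "0.00",  # borrowing from the library is free
            "priceCurrency": "USD",  # assumed currency for the example
            "availability": availability,
            "availableAtOrFrom": {
                "@type": "Library",
                "name": branch_name,
                "address": {
                    "@type": "PostalAddress",
                    "streetAddress": street,
                    "addressLocality": city,
                },
            },
        },
    }


# All values below are made up for illustration.
book = build_book_jsonld(
    title="The Hunger Games",
    record_url="https://catalog.example.org/record/12345",
    branch_name="Main Street Branch",
    street="100 Main Street",
    city="Anytown",
    in_stock=True,
)
print(json.dumps(book, indent=2))
```

Markup like this is typically embedded in a page inside a `script` tag with the type `application/ld+json`, which is where search engines look for it.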
This is just the very tip of the iceberg as far as linked data goes, but if it has piqued your interest, there’s so much more information on linked data for libraries out there. Great professional resources include the American Library Association’s Linked Data Interest Group and the 2020 book Linked Data for the Perplexed Librarian.
Read more by Jordan Fields