
We are leaving MARC and transitioning to RDA in order to make our bibliographic information accessible on the Semantic Web.

“…the role of the national library is increasingly moving from a focus on ‘stored knowledge’ to one where ‘smart knowledge’ is paramount.” Libraries will increasingly be linking and connecting to content, which means that local collections will need to be placed within a larger national, and even international, context.

Jisc. (2015, March 17). Report: A national monograph strategy roadmap. Downloaded March 30, 2015 from: http://www.jisc.ac.uk/reports/a-national-monograph-strategy-roadmap.

1 The Web, as it is familiar to us, uses the HTTP protocol to retrieve information resources. Everything that lives on the Web is an information resource: documents, videos, images, music files, and so on.

2 Information pulled from our MARC records using our OPAC interface is based on this model. This silos our information.

3 The Semantic Web uses the HTTP protocol to identify real-world, non-information resources and the relationships between resources and non-information resources.

People, places, and abstract concepts can be linked to other non-information resources and to information resources, and the relationships between them can be expressed.

The Semantic Web is a collaborative effort led by the World Wide Web Consortium (W3C) to provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries.

W3C. What is the Semantic Web? Downloaded February 12, 2014 from http://www.w3.org/2001/sw/

4 The goal of the Semantic Web is to move from a Web of Documents to an open, interconnected Web of Data by doing the following using Linked Open Data:

• Provide valuable, agreed-upon information in a standard, open format.

• Provide mechanisms to link individual schemas and vocabularies so that people can note when their ideas are "similar" and related, even if they are not exactly the same.

• Bring all this information into an environment where it can be used by most, if not all, of us (make data available free of proprietary software, single social networks, or web applications).

Bauer, Florian and Kaltenböck, Martin. (2012). Linked Open Data: The Essentials. Downloaded December 30, 2014 from: http://www.reeep.org/LOD-the-Essentials.pdf, p. 25.

5 One way for us to do this is by connecting our records to the global Web of Data. This is the Linking Open Data (LOD) Cloud Diagram. At the center is DBpedia, the Linked Data version of Wikipedia. The colors on the diagram represent broad topic areas. Library and cultural institutions are in the green area.

The LOD project is a community activity started in 2007 by the W3C's Semantic Web Education and Outreach (SWEO) Interest Group (www.w3.org/wiki/SweoIG), whose goal is to make data freely available to everyone. The collection of Linked Data published on the Web is referred to as the LOD Cloud. Open-content projects as diverse as encyclopedias and dictionaries, government statistics, information regarding chemical and biological collections and endangered species, bibliographic data, music artists and information about their songs, and academic research papers are all available using the same data format and reachable using the same API. The LOD cloud has doubled in size every 10 months since 2007. More than 40% of the Linked Data in the LOD cloud is contributed by governments (mainly from the United Kingdom and the United States), followed by geographical data (22%) and data from the life sciences domain (almost 10%). Life sciences (including some large pharmaceutical companies) contribute over 50% of the links between datasets. Publication data (from books, journals, and the like) comes in second with 19%, and the media domain (the BBC, the New York Times, and others) provides another 12%. The original data owners themselves publish one-third of the data contained in the LOD cloud, whereas third parties publish 67%. For example, many universities republish data from their respective governments in Linked Data formats, often cleaning and enhancing data descriptions in the process. Wood, David, Zaidman, Marsha, and Ruth, Luke. (2014). Linked Data: Structured Data on the Web. Shelter Island, NY: Manning, p. 14.

6 Linked Data refers to a set of best practices and techniques for publishing and connecting structured data on the Semantic Web using international standards of the W3C.

Data that conforms to these practices and techniques is also called Linked Data.

Wood, David, Zaidman, Marsha, and Ruth, Luke. (2014). Linked Data: Structured Data on the Web. Shelter Island, NY: Manning, pp. 4-5.

The four Linked Data principles proposed by Tim Berners-Lee are:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so people can discover more things.

Berners-Lee, Tim. (2009, June 18). Linked Data. Downloaded December 26, 2014 from http://www.w3.org/DesignIssues/LinkedData.html.
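As a rough sketch of principles 2 through 4 in practice, the Python snippet below uses the rdflib library (an assumption; it is not mentioned in these notes) to look up an HTTP URI and read back the RDF the publisher serves. The DBpedia URI is only an example and assumes that endpoint still answers content negotiation with RDF.

from rdflib import Graph

# Principles 2-3: dereference an HTTP URI and get useful RDF back.
# rdflib negotiates an RDF content type when parse() is given a URL,
# assuming the publisher serves Linked Data (as DBpedia historically has).
g = Graph()
g.parse("http://dbpedia.org/resource/Mark_Twain")

# Principle 4: the returned triples themselves contain HTTP URIs that a
# client can follow to discover related resources.
for subj, pred, obj in list(g)[:10]:
    print(subj, pred, obj)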

7 RDF (Resource Description Framework) is used to implement Linked Data on the Semantic Web. The Semantic Web Stack, aka the Semantic Web Layer Cake, displays the architecture of the Semantic Web. The Semantic Web relies on the combination of the following technologies:

• Explicit metadata: it allows web pages to carry their meaning on their sleeves.
• Ontologies: they describe the main concepts of a domain and their relationships.
• Logical reasoning: it makes it possible to draw conclusions from combining data with ontologies (a sketch follows below).
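Here is a minimal sketch of that last point, drawing a conclusion by combining data with an ontology. It assumes the owlrl package alongside rdflib, and the vocabulary URIs are invented for illustration.

import owlrl
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

ex = Namespace("http://example.org/vocab/")
g = Graph()

# Ontology: every Novel is a Book.
g.add((ex.Novel, RDFS.subClassOf, ex.Book))

# Data: one resource typed as a Novel.
g.add((ex.HuckFinn, RDF.type, ex.Novel))

# Reasoning: expand the graph with the RDFS entailments.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# The inferred triple (ex:HuckFinn rdf:type ex:Book) is now in the graph.
print((ex.HuckFinn, RDF.type, ex.Book) in g)  # True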

TRUST is also a major component of the Semantic Web: it relies on folks providing good, accurate information. I emphasize this for two reasons: 1) trusting that the information accessed in the Web of Data is good information is what makes the Web of Data work; and 2) it is important for us to make the case for the importance of good data within our own institutions.

Media Map. State-of-the-art Situation. Downloaded February 12, 2014 from http://mediamapproject.org/project_situation.html

8 The basic building block is a simple statement called a triple, made up of a subject, a predicate, and an object. Subjects, predicates, and objects are assigned unique HTTP identifiers (URIs). An object can also take the form of text that is not assigned a unique identifier; such an object is known as a literal, and a literal that carries a datatype is a typed literal.

Subjects and objects with HTTP identifiers are displayed as circles. Literals are shown in rectangles.

Predicates are the arcs, or arrows, between them. The predicate denotes the relationship between the subject and the object, and the arrow points from the subject to the object. Two resources can also be related in both directions: the subject of a triple in one graph can appear as the object of a triple in another graph, and vice versa.

A set of triples is called an RDF graph.

"Is a" should be read as "is an instance of." Dotted arrows imply an inferred relationship.
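To make the triple model concrete, here is a minimal sketch using Python's rdflib (an assumed tool, not one named in these notes); every URI and property below is invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, XSD

# Invented URIs: in a graph diagram both would be drawn as circles.
author = URIRef("http://example.org/person/twain")
novel = URIRef("http://example.org/work/huck-finn")
ex = Namespace("http://example.org/vocab/")

g = Graph()  # a set of triples: an RDF graph

# Triple whose object is another resource.
g.add((author, ex.created, novel))

# Triple whose object is a plain literal (drawn as a rectangle).
g.add((author, FOAF.name, Literal("Mark Twain")))

# Triple whose object is a typed literal (here typed as a year).
g.add((novel, ex.published, Literal("1884", datatype=XSD.gYear)))

print(g.serialize(format="turtle"))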

9 An RDF parser reads text in one or more RDF serializations (formats such as RDF/XML, N-Triples, or Turtle) and interprets it as triples in the RDF model. An RDF serializer does the reverse; it takes a set of triples and creates a file that expresses that content in one of the serialization forms (Allemang & Hendler, pp. 51-53). The parser/serializer has no direct counterpart in a relational data-backed system, at least as far as standards go. (This is a key advantage of RDF stores over traditional data stores.) An RDF parser inputs a file with a .rdf extension and converts it into an internal representation of the triples that are expressed in that file. The triples are stored in the triple store and are available for all the operations of that store.

An RDF store (aka triple store) is a database that is tuned for storing and retrieving data in the form of triples. In addition to the familiar functions of a database, an RDF store has the additional ability to merge information from multiple data sources, as defined by the RDF standard. The query engine provides the capability to retrieve information from an RDF store according to structured queries.

An application has some work that it performs with the data it processes: analysis, user interaction, archiving, and so forth. These capabilities are accomplished using some programming that accesses the RDF store via queries (processed with the RDF query engine).

Converters convert information from one form into RDF, often into a standard form of RDF like Turtle. Most RDF systems include a table converter of some sort. (Tabular data can be mapped into triples in a natural way.) (Allemang & Hendler, p. 53).
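The sketch below walks that parse, store, query, and serialize pipeline end to end, using rdflib's in-memory triple store; the Turtle data and all URIs are invented stand-ins for a converter's output.

from rdflib import Graph

# Invented Turtle data standing in for a converter's output.
turtle_data = """
@prefix ex: <http://example.org/vocab/> .
<http://example.org/person/twain> ex:created <http://example.org/work/huck-finn> .
<http://example.org/work/huck-finn> ex:title "Adventures of Huckleberry Finn" .
"""

g = Graph()                                  # the (in-memory) RDF store
g.parse(data=turtle_data, format="turtle")   # the RDF parser

# The query engine: a SPARQL query against the store.
results = g.query("""
    PREFIX ex: <http://example.org/vocab/>
    SELECT ?title WHERE {
        ?person ex:created ?work .
        ?work ex:title ?title .
    }
""")
for row in results:
    print(row.title)

# The serializer: the same triples re-expressed as N-Triples.
print(g.serialize(format="nt"))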

10 Libraries around the world are contributing datasets to the LOD Cloud. You can access these datasets on the Datahub.

About the Datahub: The Datahub provides the means to freely search data, register published datasets, create and manage groups of datasets, and get updates from datasets and groups you're interested in. Accessed April 5, 2015 from: http://datahub.io/organization?q=Library&sort=name+asc

11 The Library of Congress’s Authorities and Vocabularies are linked in the LOD Cloud and LC provides a search interface so you can access their unique identifiers.

12 OCLC administers the Virtual International Authority File (VIAF), which provides a unique identifier linking the authority records of national libraries throughout the world that have submitted their records to VIAF.

OCLC is building tools and services based on these datasets. Scroll down a VIAF page and open the “About” bar.

13 Click on WorldCat Identities, and you will open an OCLC WorldCat Discovery page.

14 These VIAF and National Library identifiers are now available on Wikipedia.

15 When you scroll down to the very bottom of a Wikipedia page, you will find the Authority control section, where you will see the unique identifiers assigned by VIAF, LC, Getty, ISNI, ULAN, and national libraries. These are the identifiers used in RDF graphs to uniquely identify the resources that we manage.

16 So when we update an Authority Record, that information is already linked in the LOD Cloud. This increases the exposure of library data to a broader linked global community.

This adheres to one of the major principles of Linked Data: provide useful links to other datasets.
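One common way such links are expressed is with owl:sameAs statements asserting that a local authority URI and external identifiers name the same entity. The sketch below is illustrative only: every identifier in it is a placeholder, not a real VIAF or LC record number.

from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Placeholder identifiers standing in for real authority URIs.
local = URIRef("http://example.org/authority/0001")
viaf = URIRef("http://viaf.org/viaf/000000000")
lcnaf = URIRef("http://id.loc.gov/authorities/names/n00000000")

g = Graph()
g.add((local, OWL.sameAs, viaf))   # our record and VIAF name the same person
g.add((local, OWL.sameAs, lcnaf))  # likewise for the LC authority
print(g.serialize(format="turtle"))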

17 The British Library is one of the forerunners in experimenting with how to represent library‐related information on the web. This is an image of one of their tests from 2010.

British Library. Journal Articles in RDF/XML in the Datahub. Downloaded April 18, 2015 from

18 The Library of Congress is developing BIBFRAME as a general model for expressing and connecting bibliographic data.

19 It provides the means to replace a MARC record

20 with an RDF serialization (or format), thereby connecting bibliographic information to the LOD Cloud.

What you see on this slide is a small portion of the RDF graph for the MARC record on the previous slide. The record was downloaded from OCLC as a MARC XML file, converted into a BIBFRAME record using MarcEdit, opened in Notepad++, and then copied into an RDF validator, which parsed it and produced the graph.
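A quick way to check such a conversion programmatically is to load the output and inspect it, as in the hedged sketch below. It assumes MarcEdit wrote the BIBFRAME output as RDF/XML; "bibframe.rdf" is a placeholder filename.

from rdflib import Graph

# Load the converted record, assuming RDF/XML output from the converter.
g = Graph()
g.parse("bibframe.rdf", format="xml")

print(f"{len(g)} triples in the converted record")

# Listing the distinct predicates gives a quick view of which BIBFRAME
# properties the conversion produced.
for pred in sorted(set(g.predicates())):
    print(pred)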

Many of these slides contain links to their sources, which can be accessed by right-clicking on the slide and opening the link.

Resources I suggest checking out:

• The British Library Data Model for a monograph
• The British Library Data Model for a serial
• The two Stanford resources: Report of the Stanford Linked Data Workshop and Stanford Linked Data Workshop Technology Plan
• The Relationship between BIBFRAME and OCLC's Linked-Data Model of Bibliographic Description: A Working Paper

And from the Reference List: Bauer, Florian and Kaltenböck, Martin. (2012). Linked Open Data: The Essentials. Downloaded December 30, 2014 from: http://www.reeep.org/LOD-the-Essentials.pdf
