LINKED OPEN DATA for MEMORY INSTITUTIONS: IMPLEMENTATION HANDBOOK YEAH! Project Deliverable No: 3
YEAH! is supported by:
Vinnova
NordForsk
Icelandic Centre for Research (RANNIS)
Estonian Ministry for Economic Affairs and Communications

Luleå – Reykjavik – Stockholm – Tartu
April 2014

TABLE OF CONTENTS

Introduction
Introducing Linked Open Data
The business case for using linked open data in memory institutions
    What are the risks?
    Who else is doing it?
Getting started with Linked Open Data
Making your data ready for Linked Open Data
    Step 1. Setting objectives
    Step 2. Analysing and cleaning data
    Step 3. Creating an RDF data model and mapping it to source data
    Step 4. Selecting ontologies
    Step 5. Converting data to RDF
    Step 6. Publishing Open Data
YEAH! experience with Linked Open Data
    National Archives of Estonia
        The scenario
        Challenges and lessons learned
        Creating the RDF structure
        Matching two different description practices
    National Archives of Iceland
    National Archives of Sweden
        Working with Open Refine to convert data into RDF
    Creating a new ontology with RDF and SKOS
Challenges for the LOD project
Glossary
References

INTRODUCTION

Sharing information between memory institutions and collating it into portals for users is not a new concept. Libraries were sharing OPAC catalogue data long before the World Wide Web was invented. There are a number of domain-specific standards that support the exchange of data between digital repositories and databases that serve as finding aids: EAD for archives, Z39.50 for libraries, CIDOC-CRM primarily in museums and OAI-PMH for all types of institutions. These standards have served memory institutions well and have supported the development of access services to cultural heritage collections that were not possible using the standard Internet protocols and software.

Now, however, the semantic web has taken off and the mainstream Internet technologies are beginning to include mechanisms for sharing and accessing data that surpass the cultural sector tools based on domain-specific standards. Open Data and Linked Open Data are creating a new “underground layer” of the World Wide Web where computers can interact with our data and bring it to the “surface” for human users in new ways that hitherto have been difficult and expensive to create.

The new access methods that the semantic web offers for using cultural heritage collections are not (yet) going to replace the existing catalogues, finding aids, databases and portals that memory institutions have arduously developed over the past decades. Rather, these new methods can be seen as additional “windows” that we create for looking at our rich collections, and it will not be just people looking into these windows, but also software that will help us connect our collections with those of our peers.
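As a rough illustration of how software might connect two collections, the sketch below stores a few statements as (subject, predicate, object) triples and finds every item, in either collection, that refers to the same place. It is a minimal sketch and not the tooling used in the YEAH! project: all URIs, the property names "depicts" and "type", and the functions match and items_depicting are hypothetical and exist only for this example. A real deployment would normally use RDF vocabularies and a query language such as SPARQL instead of plain Python lists.

```python
# Minimal sketch: two tiny "collections" expressed as (subject, predicate, object)
# triples. All URIs and property names are hypothetical, for illustration only.

PLACE = "http://example.org/place/town-hall"  # hypothetical shared identifier

ARCHIVE_TRIPLES = [
    ("http://example.org/archive/photo/17", "depicts", PLACE),
    ("http://example.org/archive/photo/17", "type", "photo"),
    ("http://example.org/archive/photo/18", "depicts", "http://example.org/place/harbour"),
]

LIBRARY_TRIPLES = [
    ("http://example.org/library/postcard/204", "depicts", PLACE),
    ("http://example.org/library/postcard/204", "type", "postcard"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def items_depicting(place, *collections):
    """Find every item, in any of the given collections, that depicts the place."""
    merged = [triple for collection in collections for triple in collection]
    return sorted(s for (s, _, _) in match(merged, predicate="depicts", obj=place))


if __name__ == "__main__":
    # One question answered across both institutions' data.
    for uri in items_depicting(PLACE, ARCHIVE_TRIPLES, LIBRARY_TRIPLES):
        print(uri)
```

The link between the two collections comes entirely from both institutions using the same identifier for the place. Neither collection needs to know about the other in advance, which is the property that makes linked data attractive for cross-institution portals.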
The benefits of using these new technologies are manifold, but as with any innovation, the promise of the “brave new world” appeals to different stakeholders at a different pace. Memory institutions have embraced Linked Open Data with enthusiasm, mainly because it allows them to reach out to new audiences, especially younger generations whose information consumption habits are primarily influenced by the Internet, and to connect the heritage collections with e-government services that are increasingly built on open data. For those who consider map-based user interfaces mere gimmicks, or interconnected data timelines from historic collections and current public registries “just more data”, the semantic web can still offer new tools for interacting with cultural heritage collections and for asking new types of intelligent questions in one query that would previously have taken three or ten or more in different databases and portals.

Although little appreciated today, perhaps the biggest advantage of the open data protocols for memory institutions will in the future be the machine-to-machine query and retrieval functions. Current query and access tools of memory institutions are designed with human users in mind and mostly support retrieval of single objects: the repository serves one file at a time that the user can view or save. Bulk retrieval of objects (e.g., “all archduke’s tax records from 1712 and 1716 to 1720”), selective retrieval (e.g., “files of pages 13, 45 and 173 from book N”) and content streaming (e.g., “all sections of recordings that contain birdsong”) are difficult to achieve with current finding aids. Open Data holds the promise of making this as easy as selecting ‘person’, ‘title’ or ‘year’ in a catalogue portal today.

This report tells the story of three archives that collaborated in the YEAH! project to learn how to make use of Linked Open Data (LOD).
The six sections of the report first describe the main concepts of Open Data and Linked Open Data and the business case for using these in memory institutions. This is followed by a recommendation for a six-stage process for making a cultural heritage collection available as linked open data. The report does not describe the entire LOD project planning and execution cycle in memory institutions, because these projects will serve different purposes in different institutions and will be built around different use cases. Instead, the handbook presents the core activities that every project will have to undertake, split into six stages that build on one another yet allow for iterations in the process. Each workflow stage is described as a task list that can be checked off at the end of the stage, and is supported with references to tools and additional information sources.

The last sections contain three case studies from the three archives that created linked open data collections and demonstrators based on them. These case studies can be used as practical guides to the decisions that had to be made along the way and to the tools that were used to overcome incongruities in historical data, incompatibility of databases and different levels of granularity in the available descriptions. Even if the described technical details of using the LOD tools change over time together with the software products, the decision points and solutions found by the archives have a lasting value and can be reused by anyone.

The three case studies present three different use cases of linking historic collections: the same types of records across time (census records), different types of records from the same archive (census and building permits) and collections from different memory institutions (photos from an archive and postcards from a library). These three cover well the current range of LOD applications