Delft's History Revisited
Semantic Web applications in the cultural heritage domain

Martijn van Egdom

THESIS
submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE in COMPUTER SCIENCE
by Martijn van Egdom
born in Rhenen

Web Information Systems
Department of Software Technology
Faculty EEMCS, Delft University of Technology
Delft, the Netherlands
www.wis.ewi.tudelft.nl

Erfgoed Delft en Omstreken
Schoolstraat 7
Delft, the Netherlands
www.erfgoed-delft.nl

© 2012 Martijn van Egdom.
Cover picture: View on Delft, painted by Daniel Vosmaer, Erfgoed Delft.

Author: Martijn van Egdom
Student id: 1174444
Email: [email protected]

Abstract

On the one hand, there is an ever-increasing movement within cultural heritage organizations to offer public access to their collection data on the Web; on the other, the Semantic Web, fueled by ongoing research, is growing into a mature and successful addition to the Web. Nowadays, these two sides are joining forces, combining the large collections of mostly public data held by cultural heritage institutions with the revolutionary methods and techniques developed by Semantic Web researchers.

This thesis is the result of this rather symbiotic collaboration and provides contributions for both the cultural heritage institutions and the Semantic Web researchers. Of special note are: the description of search techniques currently applied by cultural heritage organizations to their published data; the discussion of a generic method to transform legacy data into linked data, including a detailed analysis of each step of the process; and the development of a prototype of a faceted browser that uses the transformed data.
The products of this research, a fully functional transformation method and a working prototype of a faceted browser, show how cultural heritage organizations can benefit from the new technologies provided by the Semantic Web.

Thesis Committee:
Chair: Prof.dr.ir. G.J.P.M. Houben, Faculty EEMCS, TU Delft
University supervisor: Dr. L. Hollink, Faculty EEMCS, TU Delft
Company supervisor: Drs. M. Beumer, Erfgoed Delft en Omstreken
Committee Member: Dr. M. Pinzger, Faculty EEMCS, TU Delft

Preface

Writing a thesis is like hiking a mountain. While some parts of the trail are steep and slippery, others provide a beautiful view and a place to sit down and relax. The goal of such a trip is often to reach that phenomenal scenic vista point, and to overcome one's personal limits. For me, this thesis was a journey from tricky definitions towards understanding the beauty of the Semantic Web. Furthermore, this hike marks the personal achievement of completing my master's and moving on to a new chapter in life.

When going off to hike, some company is always pleasant, as companions can warn of sharp rocks, dangerous curves and other hazardous situations, and can provide the support needed to press on. I would therefore like to thank several people for their help, support and great ideas:

Joseba, my beloved wife, thanks for all your support.
Marjolein Beumer, I will remember the discussions we had about data.
Laura Hollink, always sharp, trying to figure out what the real thing was that I meant.
Peter de Klerk, the brainstorming sessions on how to create RDF were helpful.
Kim Schouten, I hope I can enlist you again to help me write proper English.

Of course there are many more people who have been an asset in conducting the research and writing this thesis.
These people are (in alphabetical order): Bennie Blom, Frans Bridié, Arthur Hanselman, Geert-Jan Houben, Anita Jansen, Karin Kievit, Frank Meijer, Wim van Rotterdam, Michel van Tol, Wout van Wezel, and Ivo Zandhuis.

Martijn van Egdom
Voorburg, the Netherlands
February 2, 2012

Contents

Contents
List of Figures

I Context
1 Introduction
  1.1 Research Questions
  1.2 Scope
  1.3 Relationship with Erfgoed Delft en Omstreken
  1.4 Related Projects
  1.5 A small introduction into the Semantic Web
  1.6 Why Linked Open Data?
  1.7 Thesis structure

II Cultural Heritage Search
2 A survey of search techniques for cultural heritage
  2.1 A brief history
  2.2 Methodology
  2.3 Search techniques
  2.4 Considerations & conclusion
3 Currently applied search techniques at Erfgoed Delft
  3.1 Methodology
  3.2 Systems
  3.3 Observations & conclusion

III Semantic Web for Cultural Heritage
4 Transforming legacy data
  4.1 Semantic value
  4.2 Characteristics of high quality RDF
  4.3 The transformation recipe: a generic method
  4.4 General issues & guidelines
  4.5 An extended case study
  4.6 Limitations
  4.7 Evaluation
  4.8 Conclusion
5 A Faceted Browser
  5.1 Faceted browsing requirements
  5.2 Architecture
  5.3 Optimizing facets
  5.4 Search Performance
  5.5 Feedback on Facet
  5.6 Results & conclusion

IV Conclusions
6 Conclusions and future work
  6.1 Contributions
  6.2 Research conclusions
  6.3 Summary per research question
  6.4 Future work

Bibliography
Glossary
List of Abbreviations
A List of Archives
B Full list of top 50 museums
C Diagrams of the transformation of the Mierenvelt Dataset
D Baptism in Delft data-sample
E Diagram Linked Open Data Cloud

List of Figures

1.1 The original architecture of the web.
1.2 Dynamic pages using databases.
1.3 Web 2.0
1.4 Basic architecture of the Semantic Web
2.1 Museum using Google (www.habitot.org)
2.2 History of Chicago (encyclopedia.chicagohistory.org)
2.3 Reina Sofia museum - VR Tour (http://www.googleartproject.com)
2.4 Basic search in the collection (http://www.britishmuseum.org/)
2.5 Looking for... (http://www.tante.org.uk)
2.6 Thesaurus term of Stichting Volkenkundige Collectie Nederland (SVCN) (www.svcn.org)
2.7 Searching for Medals (http://collections.vam.ac.uk/)
3.1 A charter (copyright Erfgoed Delft)
3.2 Detail of the last will and testament of A.J. van Brouwershaven - 1508 (copyright Erfgoed Delft)
5.1 Overview Architecture of Facet
5.2 Search versus Facet Filters
C.1 Step 2: Convert to plain RDF
C.2 Step 3: Complete the RDF
C.3 Step 4: Link to other resources within the data itself
C.4 Step 6: Link with more common ontologies
C.5 Step 7: Enrich by linking to other datasets on the web
E.1 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Part I
Context

Chapter 1
Introduction

The introduction of the World Wide Web set off a revolution of astronomical proportions. Within just two decades it completely changed businesses and people's lifestyles [9]. This revolution is based on the observation (and a bit of frustration) of just one man: Tim Berners-Lee. In the late 1980s he realized: "It’s not the computers which are interesting, it’s the documents!".

"The Web is more a social creation than a technical one. I designed it for a social effect to help people work together and not as a technical toy. The ultimate goal of the Web is to support and improve our weblike existence in the world. We clump into families, associations, and companies. We develop trust across the miles and distrust around the corner." [2]
Tim Berners-Lee

Nowadays, mankind is making the next revolutionary move: "It’s not the documents, it is the things they are about, which are important". This is the basic philosophy behind the Semantic Web.

As quoted, Tim Berners-Lee states that the Web is a social rather than a technical concept. Working on Semantic Web technologies at Erfgoed Delft, a cultural heritage institution in Delft, may seem to be primarily a technology effort, but the intended results are nevertheless of a social kind. People are influenced by their culture and history, both as individuals and as a society. With the technology described in this thesis and applied in a prototype, Erfgoed Delft is able to serve people by offering them tools to explore their history.

1.1 Research Questions

The research is conducted in cooperation with Erfgoed Delft. The main research question is:

How can cultural heritage organizations, like Erfgoed Delft, benefit from new Semantic Web technologies with respect to their collections on the Web?

In order to answer the main research question, the following research questions have been formulated:

1. What methods are available on the Web to offer the public insight into the collections of cultural heritage organizations?
2. What methods are currently used by Erfgoed Delft to enable the public to search through its collections?
3. What is a suitable method to transform ‘legacy collection data’ into Semantic Web formats?
4. What methods can cultural heritage organizations use to allow people to explore their semantic collection data?

In this thesis, collections on the Web are seen as the combination of both data and the applications presenting that data in some form. This definition is reflected in all four research questions. The first two questions are primarily background questions: they are required to gain insight into the current state of collection websites.
The third question forms the core of this research, as seen from the point of view of the academic Semantic Web domain.