Eindhoven University of Technology MASTER Dynamic Integration Of
Total Page:16
File Type:pdf, Size:1020Kb
Eindhoven University of Technology MASTER Dynamic integration of web and TV content for personalized information retrieval in interactive TV Smeets, C.J.J.(Chris) Award date: 2007 Link to publication Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain TECHNISCHE UNIVERSITEIT EINDHOVEN Department of Mathematics and Computer Science Dynamic Integration of Web and TV content for Personalized Information Retrieval in Interactive TV By C.J.J. Smeets Supervisor: Dr. Lora Aroyo (TU/e) Mentor: Ir. Pieter Bellekens (TU/e) Eindhoven, November 2007 2 Abstract This thesis concludes my work on the ITEA Passepartout project. The Passepartout project envisions an ambient interactive personalized connected home environment. In this thesis a semantics-based framework, called SenSee, for next generation television is extended. SenSee provides personalized access to TV content in a cross-media environment. It allows for an integrated view on data harvested from heterogeneous and distributed Web sources. The ultimate goal is to support individual and group TV viewers (operating multiple devices, e.g. PC, set-top box and a mobile) in finding programs that suit best their interests. In this paper, I focus on the integration of multiple data sources, such as BBC data from BBC backstage, XMLTV and IMDB. Moreover, these sources are interconnected and mapped to external vocabularies, like OWL Time, Geo Ontology, TV Anytime genre classifications and WordNet. The integration of these sources lead to a gigantic dataset, it is necessary to do some performance tests, with the purpose to guarantee a fast performance of operations on the dataset. The resulting dataset is controlled by a client-application, called iFanzy, by offering the user a faceted browsing view on the data. Users can search and browse the data based on facets for time, location and genre. 3 4 Preface This master thesis is the final project of my study at the Department of Mathematics and Computer Science at the Eindhoven University of Technology (TU/e). The work described in this thesis was carried out at the Information System group at the Department of Mathematics and Computer Science at the Eindhoven University of Technology. Special thanks to Lora Aroyo and Pieter Bellekens for their guidance throughout this project, sharing their experience in the field and for making this work an enjoyable experience. Also thanks to all those who contributed to this work, especially Geert-Jan Houben en Kees van der Sluijs for their input and evaluation during the work. My final word of thanks goes to my family who supported me unconditionally throughout this entire thesis. September 2007, Chris Smeets 5 6 Presentations and Collaborations The work described in this thesis is also used in the following papers: • SenSee Framework for Personalized Access to TV Content Lora Aroyo, Pieter Bellekens, Martin Bjorkman, Geert-Jan Houben, Paul Akkermans, and Annelies Kaptein. In: EuroITV 2007. Pages 156-165. • Engineering Semantic-Based Interactive Multi-device Web Applications Pieter Bellekens, Kees van der Sluijs, Lora Aroyo, Geert-Jan Houben. In: 7th International Conference on Web Engineering (ICWE 2007), 16-20 July 2007, Como, Italy, LNCS 4607, pp. 328 - 343 • Semantics-based Framework for Personalized Access to TV Content: Personalized TV Guide Use Case. Pieter Bellekens, Kees van der Sluijs, Lora Aroyo, Geert-Jan Houben and Annelies Kaptein. In: SWChallenge 2007. The work described in this thesis also was used in the following presentations: • EuroITV 2007 Conference • Semantic Web Challenge 2007 A demo of the client application can be found at: • http://wwwis.win.tue.nl:8888/SenSee/SenSeeWebClient.html The work done in this thesis is also used in iFanzy. • http://www.stoneroos.nl/solutions/productdetails/3101 • http://en.wikipedia.org/wiki/IFanzy 7 8 Table of Contents Chapter 1 Introduction .........................................................................................................................................................11 1.1 Project overview..........................................................................................................................................................12 1.2 Document overview ...................................................................................................................................................13 Chapter 2 Goal and Problem Definition ...........................................................................................................................15 2.1 Research Questions....................................................................................................................................................16 2.2 Goals and deliverables................................................................................................................................................17 Chapter 3 SenSee Framework.............................................................................................................................................19 3.1.1 iFanzy (deliverable 3) .........................................................................................................................................21 Chapter 4 Design aspects for the SenSee content modelling ......................................................................................23 4.1 Data sources.................................................................................................................................................................23 4.1.1 Broadcasts .............................................................................................................................................................24 4.1.2 Video on demand.................................................................................................................................................25 4.1.3 Additional sources...............................................................................................................................................27 4.2 Vocabularies..................................................................................................................................................................28 4.2.1 Geographical .........................................................................................................................................................29 4.2.2 Wordnet................................................................................................................................................................30 4.2.3 Wikipedia...............................................................................................................................................................30 4.2.4 TV-Anytime...........................................................................................................................................................32 4.2.5 OWL-Time............................................................................................................................................................33 4.3 Extended common scheme.......................................................................................................................................35 4.3.1 Sources to be integrated....................................................................................................................................35 4.3.2 Integration process..............................................................................................................................................39 4.3.3 Conclusion.............................................................................................................................................................47 4.4 Presentation & Faceted browsing: Client aspects................................................................................................48 Chapter 5 Implementation....................................................................................................................................................51 5.1.1 Implementation tools..........................................................................................................................................51 5.1.2 XMLTV Parser......................................................................................................................................................52 5.1.3 IMDb Parser..........................................................................................................................................................52 5.1.4 Performance