Thesis Sduarte

Total Page:16

File Type:pdf, Size:1020Kb

Thesis Sduarte Information Retrieval for Children: Search Behavior and Solutions Sergio Ra´ulDuarte Torres Graduation committee: Chairman and Secretary Prof. dr. P. M. G. Apers University of Twente, NL Promoters Prof. dr. P. M. G. Apers University of Twente, NL Prof. dr. T. W. C. Huibers University of Twente, NL Assistant promoter Dr. ir. D. Hiemstra University of Twente, NL Members Prof. dr. Alan Smeaton Dublin City University, Ireland Prof. dr. Arjen de Vries TU Delft / CWI, NL Prof. dr. Franciska de Jong University of Twente, NL Prof. dr. Dirk Heylen University of Twente, NL Dr. Jaime Arguello University of North Carolina at Chapel Hill, USA CTIT Ph.D. Thesis Series No. 14-295 Centre for Telematics and Information Technology University of Twente P.O. Box 217, 7500 AE Enschede, NL SIKS Dissertation Series No. 2014-03 The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems. ISBN 978-90-365-3618-9 ISSN 1381-3617 DOI 10.3990./1.9789036536189 Cover Design Avril Follega/Sergio Duarte Printed by Gildeprint Drukkerijen Copyright c 2014, Sergio Ra´ulDuarte Torres, Enschede, The Netherlands All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, without the prior written permission of the author. INFORMATION RETRIEVAL FOR CHILDREN: SEARCH BEHAVIOR AND SOLUTIONS DISSERTATION to obtain the degree of doctor at the University of Twente, on the authority of the rector magnificus, Prof. dr. H. Brinksma, on account of the decision of the graduation committee to be publicly defended on Friday, 14th of February, 2014 at 16.45 by Sergio Ra´ulDuarte Torres born on December 19, 1983 in Bogot´a,Colombia This dissertation is approved by: Prof. dr. P. M. G. Apers (promotor) Prof. dr. T. W. C. Huibers (promotor) Dr. ir. Djoerd Hiemstra (assistent-promotor) This thesis is dedicated to the loving memory of my beloved father. May his memory forever be a source of inspiration and blessings. Thanks for everything. Acknowledgements I have finally completed my doctoral thesis after something more than four years. It has been a long way with ups and downs, and foremost, with plenty of learning. In these pages I want to express my gratitude to the persons that sup- ported me, gave me a hand, and in general to the persons that provided me their company in this journey. All of them played and important role in the completion of this thesis and a very important role in making fun the time I spent developing this work. First at all, I want to thank my parents who did not only raise me and take care of me, but also always make their best efforts to make my dreams possible. I deeply appreciate the advices and company of my sister during the difficult moments. I am also very thankful to Avril for sharing her playful attitude and joy towards life, and of course, for designing the cover of this thesis. I would like to express my deepest gratitude and appreciation to my pro- moter Peter Apers and my co-promoter Theo Huibers, who gave me the opportunity to carry out the research presented in this thesis, and more importantly, for giving me the chance of becoming a doctor. Foremost, I want to thank my daily supervisor, Djoerd Hiemstra, for his continuous guidance, motivation, patience and vast knowledge. I am certain I would have not reached this point without his advises, and I am certain that I am better professional and person after his mentoring. Besides my supervisor, I would like to thank the other members of the committee: Dr. Alan Smeaton, Dr. Jaime Arguello, Dr. Arjen de Vries, Dr. Franciska de Jong and Dr Dirk Heylen, for their generous time and good will through the preparation and review of this thesis. I feel very honored to have such a distinguished and tough committee. I am also very grateful to all the members of the DB group. I would particularly like to thank Ida and Suse, who helped me in a myriad ways, from giving me tips about the Dutch life to let my PhD going all these months. I also really appreciate Jan's support with the IT issues. I greatly enjoyed the conversations about his trips, which were very encouraging. Many thanks to Riham who helped me to get along with the Dutch life during my first months in the Netherlands; to Rezwan for his company, especially during the first months of my staying in the UT; to Juan for all those conversations about technical and random stuff that came out every time I visited his office; to Zhemin and Mohammad for the nice conversations, and to Christoph, Mena, Brend, Victor, Iwe, Almer and Lei for their company during the daily life in the group. I own a very important debt to all the members of the PuppyIR project, from whom I did not only receive some tasks, but also plenty of feedback and enlightening moments. Particularly, I highly appreciate the construc- tive comments and warm encouragement of Arjen de Vries, Franciska de Jong, Ian Ruthven, Leif Azzopardi, Andreas Lingnau and Marie-Francine Moens. Many of those comments shape the structure of my thesis. I want to extend my appreciation to Carsten Eickhoff, Ric Glassey, Frans Van der Sluis and Tamara Polajnar for giving me the chance of cooperat- ing with them in some of the endeavors of the project and the publications that were the results of that effort. I also highly appreciate the guidance and tips provided by Hanna Jockmann-Mannak these months. I woud like to show my gratitude to Pavel, as a former member of the DB group and the PuppyIR project, for his mentoring at the beginning of my PhD, for his ideas, suggestions and most of all for his encouragement. An important part of my research was carried out at Yahoo! Labs in Barcelona. I want to thank all the members of the lab. Particularly, I am very thankful to Ricardo Baeza-Yates for facilitating my stay in the labs and I highly appreciate the discussions and feedback offered by Mounia Lalmas, first in the PuppyIR meetings and then in the labs. I owe sincere and earnest thankfulness to Ingmar, for his assistance and mentoring. I really appreciate his patience and tenacity, which helped me make the most of my time in the lab. I learned a big deal from him. I also want to thank Hugues Bouchard for making my stay in Barcelona lively and showing me all those funky places in G´otico.Thanks a lot to Marek, Luca, Eraldo, Bart, Ruth, Giorgos, Jannete and Michele for the fussball matches and for hanging out in the city. Special thanks to Erik for his company and especially for his geeky tips and jokes during our stay. I have many people to thank beyond the scope of the campus and the research world. Firstly, I want to say thank you to the Piepke community, a.k.a my housemates. I am very thankful to Robin, as a colleague and even more as a housemate, I really appreciate his advices and his friendship. I also want to thank Marga for her kindness and everlasting tolerance of all my Latin habits. I also appreciate the time as housemates with Danielle, and more recently with Hans, Diego and Vincent. In these long months there were two persons that were always there when I need them. I will be always in depth debt with Maral for keeping my sanity through all our conversations and good coffee moments. I am also very glad to have met Juan Jimenez from the very early stages of my PhD, he has been a great friend and a person I can always count on. I am also glad to be part of the Latin move of Enschede, including my year as a board member of La Voz. It was very rewarding to be part of the board along with Oscar, Teo, Diruji, John, Juan Carlos and Daniela. During my stay in Enschede I met wonderful people. I feel blessed for meeting Andrea, Andreita, Andreea and Adriana, for their friendship and dancing lessons. It was a pleasure to attend the gigs of Chilangos and Rimon, so thanks a lot to David, Oscar, Jorge, Diego and Gerard. It was great to count with Abraham, Cesar, Norma and Israel at lunchtime, especially because everybody else likes to have lunch too early. I thank Eduardo and Nayeli, for their company during my first months in En- schede and for encouraging me to play f´utbol, although it did not last long, it was great fun. I also enjoyed greatly the meetings with Vicky, Lorenzo, Jorge and Maite in all the parties and dinners. Talking about parties, I had some very nice moments in places like the Molly Malone and, of course, the Paddy's, particularly with Desmond, Juan Pablo, Nico, Ignacio, Marina, Pablo, Jenny, Arturo (both), Javier, and many of the persons I have mentioned so far. For the times I was back home, I have to thank los compas, for all the catching up, the f´utbol games, the drinking and their immense encour- agement. The company of Goyes, Barrantes, Juan Manuel, Juan Carlos, Lina, Lis, Raul and Camilo were refreshing and completely necessary to recharge my energies prior to every year in the Netherlands. Without those moments I am sure I would have taken much longer to finish this thesis. Among my compas, I am always in debt with David, who has filled me with motivation for so many years and who has been there all the time during my PhD, even in the long distance, his friendship has always been invaluable.
Recommended publications
  • DEWS: a Decentralized Engine for Web Search
    DEWS: A Decentralized Engine for Web Search Reaz Ahmed, Rakibul Haque, Md. Faizul Bari, Raouf Boutaba David R. Cheriton School of Computer Science, University of Waterloo [r5ahmed|m9haque|mfbari|rboutaba]@uwaterloo.ca Bertrand Mathieu Orange Labs, Lannion, France [email protected] (Technical Report: CS-2012-17) Abstract The way we explore the Web is largely governed by centrally controlled, clustered search engines, which is not healthy for our freedom in the Internet. A better solution is to enable the Web to index itself in a decentralized manner. In this work we propose a decentralized Web search mechanism, named DEWS, which will enable the existing webservers to collaborate with each other to form a distributed index of the Web. DEWS can rank the search results based on query keyword relevance and relative importance of websites. DEWS also supports approximate matching of query keywords and incremental retrieval of search results in a decentralized manner. We use the standard LETOR 3.0 dataset to validate the DEWS protocol. Simulation results show that the ranking accuracy of DEWS is very close to the centralized case, while network overhead for collaborative search and indexing is logarithmic on network size. Simulation results also show that DEWS is resilient to changes in the available pool of indexing webservers and works efficiently even in presence of heavy query load. 1 Introduction Internet is the largest repository of documents that man kind has ever created. Voluntary contributions from millions of Internet users around the globe, and decentralized, autonomous hosting infrastructure are the sole factors propelling the continuous growth of the Internet.
    [Show full text]
  • Towards the Ontology Web Search Engine
    TOWARDS THE ONTOLOGY WEB SEARCH ENGINE Olegs Verhodubs [email protected] Abstract. The project of the Ontology Web Search Engine is presented in this paper. The main purpose of this paper is to develop such a project that can be easily implemented. Ontology Web Search Engine is software to look for and index ontologies in the Web. OWL (Web Ontology Languages) ontologies are meant, and they are necessary for the functioning of the SWES (Semantic Web Expert System). SWES is an expert system that will use found ontologies from the Web, generating rules from them, and will supplement its knowledge base with these generated rules. It is expected that the SWES will serve as a universal expert system for the average user. Keywords: Ontology Web Search Engine, Search Engine, Crawler, Indexer, Semantic Web I. INTRODUCTION The technological development of the Web during the last few decades has provided us with more information than we can comprehend or manage effectively [1]. Typical uses of the Web involve seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores and ordering products by filling out forms, and viewing adult material. Keyword-based search engines such as YAHOO, GOOGLE and others are the main tools for using the Web, and they provide with links to relevant pages in the Web. Despite improvements in search engine technology, the difficulties remain essentially the same [2]. Firstly, relevant pages, retrieved by search engines, are useless, if they are distributed among a large number of mildly relevant or irrelevant pages.
    [Show full text]
  • Merging Multiple Search Results Approach for Meta-Search Engines
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by D-Scholarship@Pitt MERGING MULTIPLE SEARCH RESULTS APPROACH FOR META-SEARCH ENGINES By Khaled Abd-El-Fatah Mohamed B.S-------- Cairo University, Egypt, 1995 M.A-------Cairo University, Egypt, 1999 M.A------- University of Pittsburgh 2001 Submitted to the Graduate Faculty of School of Information Sciences in Partial Fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2004 UNIVERSITY OF PITTSBURGH INFORMATION SCIENCES This dissertation was presented by Khaled Abd-El-Fatah Mohamed It was defended on Janauary 29, 2004 and approved by Chris Tomer, PhD, Associate Professor, DLIS Jose-Marie Griffiths, PhD, Professor, DLIS Don King, Research Professor, DLIS Amy Knapp, PhD, ULS Dissertation Director: Chris Tomer, PhD, Associate Professor MERGING MULTIPLE SEARCH RESULTS APPROACH FOR META-SEARCH ENGINES Khaled A. Mohamed, PhD University of Pittsburgh, 2004 Meta Search Engines are finding tools developed for enhancing the search performance by submitting user queries to multiple search engines and combining the search results in a unified ranked list. They utilized data fusion technique, which requires three major steps: databases selection, the results combination, and the results merging. This study tries to build a framework that can be used for merging the search results retrieved from any set of search engines. This framework based on answering three major questions: 1. How meta-search developers could define the optimal rank order for the selected engines. 2. How meta-search developers could choose the best search engines combination. 3. What is the optimal heuristic merging function that could be used for aggregating the rank order of the retrieved documents form incomparable search engines.
    [Show full text]
  • Organizing User Search Histories
    Global Journal of Computer Science and Technology Network, Web & Security Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172 & Print ISSN: 0975-4350 Organizing user Search Histories By Ravi Kumar Yandluri Gokaraju Rangaraju Institute of Engineering & Technology, India Abstract - Internet userscontinuously make queries over web to obtain required information. They need information about various tasks and sub tasks for which they use search engines. Over a period of time they make plenty of related queries. Search engines save these queries and maintain user’s search histories. Users can view their search histories in chronological order. However, the search histories are not organized into related groups. In fact there is no organization made except the chronological order. Recently Hwang et al. studied the problem of organizing historical search information of users into groups dynamically. This automatic grouping of user search histories can help search engines also in various applications such as collaborative search, sessionization, query alterations, result ranking and query suggestions. They proposed various techniques to achieve this. In this paper we implemented those techniques practically using a prototype web application built in Java technologies. The experimental results revealed that the proposed application is useful to organize search histories. Indexterms : search engine, search history, click graph, query grouping. GJCST-E Classification : H.3.5 Organizing user Search Histories Strictly as per the compliance and regulations of: © 2013. Ravi Kumar Yandluri. This is a research/review paper, distributed under the terms of the Creative Commons Attribution- Noncommercial 3.0 Unported License http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction inany medium, provided the original work is properly cited.
    [Show full text]
  • A Community-Based Approach to Personalizing Web Search
    COVER FEATURE A Community-Based Approach to Personalizing Web Search Barry Smyth University College Dublin Researchers can leverage the latent knowledge created within search communities by recording users’search activities—the queries they submit and results they select—at the community level.They can use this data to build a relevance model that guides the promotion of community-relevant results during regular Web search. ver the past few years, current Web search through rates should be preferred during ranking. engines have become the dominant tool for Unfortunately, Direct Hit’s so-called popularity engine accessing information online. However, even did not play a central role on the modern search stage today’s most successful search engines struggle (although the technology does live on as part of the O to provide high-quality search results: Approx- Teoma search engine) largely because the technology imately 50 percent of Web search sessions fail to find proved inept at identifying new sites or less well- any relevant results for the searcher. traveled ones, even though they may have had more The earliest Web search engines adopted an informa- contextual relevance to a given search. tion-retrieval view of search, using sophisticated term- Despite Direct Hit’s fate, the notion that searchers based matching techniques to identify relevant docu- themselves could influence the ranking of results by virtue ments from repeated occurrences of salient query terms. of their search activities remained a powerful one. It res- Although such techniques proved useful for identifying onates well with the ideas that underpin the social web, a set of potentially relevant results, they offered little a phrase often used to highlight the importance of a suite insight into how such results could be usefully ranked.
    [Show full text]
  • Do Metadata Models Meet IQ Requirements?
    Do Metadata Mo dels meet IQ Requirements Claudia Rolker Felix Naumann Humb oldtUniversitat zu Berlin Forschungszentrum Informatik FZI Unter den Linden HaidundNeuStr D Berlin D Karlsruhe Germany Germany naumanndbisinformatikhub erli nde rolkerfzide Abstract Research has recognized the imp ortance of analyzing information quality IQ for many dierent applications The success of data integration greatly dep ends on the quality of the individual data In statistical applications p o or data quality often leads to wrong conclusions High information quality is literally a vital prop erty of hospital information systems Po or data quality of sto ck price information services can lead to economically wrong decisions Several pro jects have analyzed this need for IQ metadata and have prop osed a set of IQ criteria or attributes which can b e used to prop erly assess information quality In this pap er we survey and compare these approaches In a second step we take a lo ok at existing prominent prop osals of metadata mo dels esp ecially those on the Internet Then we match these mo dels to the requirements of information quality mo deling Finally we prop ose a quality assurance pro cedure for the assurance of metadata mo dels Intro duction The quality of information is b ecoming increasingly imp ortant not only b ecause of the rapid growth of the Internet and its implication for the information industry Also the anarchic nature of the Internet has made industry and researchers aware of this issue As awareness of quality issues amongst information
    [Show full text]
  • A New Algorithm for Inferring User Search Goals with Feedback Sessions
    International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 3 Issue 11, November 2014 A Survey on: A New Algorithm for Inferring User Search Goals with Feedback Sessions Swati S. Chiliveri, Pratiksha C.Dhande a few words to search engines. Because these queries are short Abstract— For a broad-topic and ambiguous query, different and ambiguous, how to interpret the queries in terms of a set users may have different search goals when they submit it to a of target categories has become a major research issue. search engine. The inference and analysis of user search goals can be very useful in improving search engine relevance and For example, when the query ―the sun‖ is submitted to a user experience. This paper provides search engine, some people want to locate the homepage of a an overview of the system architecture of proposed feedback United Kingdom newspaper, while some people want to session framework with their advantages. Also we have briefly discussed the literature survey on Automatic Identification of learn the natural knowledge of the sun. Therefore, it is User Goals in Web Search with their pros and cons. We have necessary and potential to capture different user search goals, presented working of various classification approaches such as user needs in information retrieval. We define user search SVM, Exact matching and Bridges etc. under the web query goals as the information on different aspects of a query that classification framework. Lastly, we have described the existing user groups want to obtain. Information need is a user’s web clustering engines such as MetaCrawler, Carrot 2, particular desire to obtain information to satisfy users need.
    [Show full text]
  • Developing Semantic Digital Libraries Using Data Mining Techniques
    DEVELOPING SEMANTIC DIGITAL LIBRARIES USING DATA MINING TECHNIQUES By HYUNKI KIM A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2005 Copyright 2005 by Hyunki Kim To my family ACKNOWLEDGMENTS I would like to express my sincere gratitude to my advisor, Dr. Su-Shing Chen. He has provided me with financial support, assistance, and active encouragement over the years. I would also like to thank my committee members, Dr. Gerald Ritter, Dr. Randy Chow, Dr. Jih-Kwon Peir, and Dr. Yunmei Chen. Their comments and suggestions were invaluable. I would like to thank my parents, Jungza Kim and ChaesooKim, for their spiritual support from thousand miles away. I would also like to thank my beloved wife, Youngmee Shin, and my sweet daughter, Gayoung Kim, for their constant love, encouragement, and patience. I sincerely apologize to my family for having not taken care of them for so long. I would never have finished my study without them. Finally, I would like to thank my friends, Meongchul Song, Chee-Yong Choo, Xiaoou Fu, Yu Chen, for their help. iv TABLE OF CONTENTS page ACKNOWLEDGMENTS ................................................................................................. iv LIST OF TABLES........................................................................................................... viii LIST OF FIGURES ..........................................................................................................
    [Show full text]
  • Investigating Search Processes in Collaborative Exploratory Web Search
    INVESTIGATING SEARCH PROCESSES IN COLLABORATIVE EXPLORATORY WEB SEARCH by Zhen Yue B.S., Nanjing University, 2005 M.S., Peking University, 2007 Submitted to the Graduate Faculty of School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2014 1 UNIVERSITY OF PITTSBURGH SCHOOL OF INFORMATION SCIENCES This dissertation was presented by Zhen Yue It was defended on April 1, 2014 and approved by Bernard J. Jansen, Ph.D., Associate Professor, College of Information Sciences and Technology, Pennsylvania State University Peter Brusilovsky, Ph.D., Professor, School of Information Sciences, University of Pittsburgh Ellen Detlefsen, D.L.S., Associate Professor, School of Information Sciences, University of Pittsburgh Jung Sun Oh, Ph.D., Asistant Professor, School of Information Sciences, University of Pittsburgh Dissertation Advisor: Daqing He, Ph.D., Associate Professor, School of Information Sciences, University of Pittsburgh 2 Copyright © by Zhen Yue 2014 3 INVESTIGATING SEARCH PROCESSES IN COLLABORATIVE EXPLORATORY WEB SEARCH Zhen Yue, PhD University of Pittsburgh, 2014 People are often engaged in collaboration in workplaces or daily life due to the complexity of tasks. In the information seeking and retrieval environment, the task can be as simple as fact- finding or a known-item search, or as complex as exploratory search. Given the complex nature of the information needs, exploratory searches may require the collaboration among multiple people who share the same search goal. For instance, students may work together to search for information in a collaborative course project; friends may search together while planning a vacation. There are demands for collaborative search systems that could support this new format of search (Morris, 2013).
    [Show full text]
  • Study of Crawlers and Indexing Techniques in Hidden Web
    Sweety Mangla et al, International Journal of Computer Science and Mobile Computing, Vol.4 Issue.4, April- 2015, pg. 598-606 Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320–088X IJCSMC, Vol. 4, Issue. 4, April 2015, pg.598 – 606 RESEARCH ARTICLE Study of Crawlers and Indexing Techniques in Hidden Web Sweety Mangla1, Geetanjali Gandhi2 1M.Tech Student, Department of CSE, B.S.Anangpuria Institute of Technology and Management, Alampur, India 2Assistant Professor, Department of CSE, B.S.Anangpuria Institute of Technology and Management, Alampur, India 1 [email protected] 2 [email protected] Abstract- The traditional search engine work to crawl, index and query the “Surface web”. The Hidden Web is the willing on the Web that is accessible through a html form. not general search engines. This Hidden web forms 96% of the total web .The hidden web carry the high quality data and has a wide coverage. So hidden web has always stand like a golden egg in the eyes of the researcher. This high quality information can be restored by hidden web crawler using a Web query front-end to the database with standard HTML form attribute. The documents restored by a hidden web crawler are more proper, as these documents are attainable only through dynamically generated pages, delivered in response to a query. A huge amount of data is hidden in the databases behind the search interfaces which need to be indexed so in order to serve user’s query.
    [Show full text]
  • Searching for People on Web Search Engines Amanda Spink and Bernard J
    The Emerald Research Register for this journal is available at The current issue and full text archive of this journal is available at www.emeraldinsight.com/researchregister www.emeraldinsight.com/0022-0418.htm JD 60,3 Searching for people on Web search engines Amanda Spink and Bernard J. Jansen 266 School of Information Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA Received 20 October 2003 Jan Pedersen Revised 21 November 2003 Overture Web Search Division, Palo Alto, California, USA Accepted 23 November 2003 Keywords Information retrieval, Information searches, Worldwide web Abstract The Web is a communication and information technology that is often used for the distribution and retrieval of personal information. Many people and organizations mount Web sites containing large amounts of information on individuals, particularly about celebrities. However, limited studies have examined how people search for information on other people, using personal names, via Web search engines. Explores the nature of personal name searching on Web search engines. The specific research questions addressed in the study are: “Do personal names form a major part of queries to Web search engines?”; “What are the characteristics of personal name Web searching?”; and “How effective is personal name Web searching?”. Random samples of queries from two Web search engines were analyzed. The findings show that: personal name searching is a common but not a major part of Web searching with few people seeking information on celebrities via Web search engines; few personal name queries include double quotations or additional identifying terms; and name searches on Alta Vista included more advanced search features relative to those on AlltheWeb.com.
    [Show full text]
  • Web Miningmining –– Datadata Miningmining Imim Internetinternet
    WebWeb MiningMining –– DataData MiningMining imim InternetInternet Vorlesung SS 2010 Johannes Fürnkranz TU Darmstadt Hochschulstrasse 10 D-64289 Darmstadt 06151/166238 [email protected] 1 © J. Fürnkranz GeneralGeneral InformationInformation ● Web-page: http://www.ke.informatik.tu-darmstadt.de/lehre/ss10/web-mining/ ● Text: ● Soumen Chakrabarti: Mining the Web – Discovering Knowlege from Hypertext Data, Morgan Kaufmann Publishers 2003. http://www.cse.iitb.ac.in/~soumen/mining-the-web/ readable online in htttp://books.google.de ● Christopher D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008 complete book freely available at http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html ● Johannes Fürnkranz: Web Mining. The Data Mining and Knowledge Discovery Handbook, Springer-Verlag 2005. Book chapter with many pointers to the literature ● Various other articles available from the Web-page ● Lecture Slides: ● available from course page (additional slides at book pages) 2 © J. Fürnkranz ÜbungenÜbungen ● 6 Aufgaben Programmierung ist notwendig ● aber die Programme sind nur Mittel zum Zweck ca. alle 2 Wochen eine Abgabe ● Ausarbeitung der Lösungen ● Übungsstunden Durchbesprechen der abgegebenen Lösungen Jeder der abgibt, muß anwesend sein, und die Lösung vorführen können ● Beurteilung: Bonuspunkte bei bestandener Klausur Verbesserungen bis zu einem Notengrad sind möglich ● Gruppenarbeit möglich Gruppengröße max. 3 3 © J. Fürnkranz OverviewOverview ● Motivation
    [Show full text]