LAICOS: An Open Source Platform for Personalized Social Web Search
∗ Mohamed Reda∗ Hakim Hacid Mokrane Bouzeghoub Bouadjenek Sidetrade PRiSM Laboratory PRiSM Laboratory 114 Rue Gallieni, 92100 Versailles University Versailles University Boulogne-Billancourt, France mokrane.bouzeghoub@ [email protected] [email protected] prism.uvsq.fr
ABSTRACT sents an interesting opportunity for enhancing existing data In this paper, we introduce LAICOS, a social Web search management systems or proposing new ones. engine as a contribution to the growing area of Social In- Information Retrieval (IR) is performed every day in an formation Retrieval (SIR). Social information and personal- obvious way over the Web [1], typically under a search en- ization are at the heart of LAICOS. On the one hand, the gine. However, finding relevant information on the Web still social context of documents is added as a layer to their tex- becomes harder for end-users as: (i) usually, the user doesn’t tual content traditionally used for indexing to provide Per- necessarily know what he is looking for until he reaches it, sonalized Social Document Representations. On the other and (ii) even if the user knows what he is looking for, he hand, the social context of users is used for the query expan- doesn’t always know how to formulate the right query to sion process using the Personalized Social Query Expansion find it (except if in cases of navigational queries [7]). In ex- framework (PSQE) proposed in our earlier works. We de- isting IR systems, queries are usually interpreted and pro- cessed using document indexes and/or ontologies, which are scribe the different components of the system while relying 1 on social bookmarking systems as a source of social infor- hidden for users. The resulting documents are not neces- mation for personalizing and enhancing the IR process. We sarily relevant from an end-user perspective, in spite of the show how the internal structure of indexes as well as the ranking performed by the Web search engine. query expansion process operated using social information. To improve the IR process and reduce the amount of irrel- evant documents, there are mainly three possible improve- Categories and Subject Descriptors: H.3.3 [Informa- ment tracks: (i) query reformulation using extra knowledge, tion Systems]: Information Search and Retrieval i.e. expansion or refinement of the user query, (ii) post fil- General Terms: Algorithms, Experimentation. tering or re-ranking of the retrieved documents (based on the user profile or context), and (iii) improvement of the IR Keywords: Information Retrieval, Social networks. model, i.e. the way documents and queries are represented and matched to quantify their similarities. 1. INTRODUCTION In this demonstration, we introduce LAICOS, an open source Personalized Social Web Search Engine prototype, The Web 2.0 has strengthened end-users position in the Web in which social information and personalization are at the through their integration in the heart of the ecosystem of heart of the IR process. Actually, this prototype is relying on content generation. This has been made possible mainly social bookmarking systems as a source of social information through the availability of tools such as social networks, so- for personalizing and enhancing the IR process but can be cial bookmarking systems, social news sites, etc. impacting extended to use any source of social metadata, i.e. tweets, the way information is produced, processed, and consumed comments, etc. by both humans and machines. As a result, on the one hand, the user is no longer able to digest the large quantity of in- formation he has access to, and is generally overwhelmed 2. ANATOMY OF LAICOS by it. On the other hand, this abundance of information captures in general an explicit feedback of users and repre- Figure 1 illustrates the different components of LAICOS, which are: (i) a set of connectors, crawlers, and a database ∗ storing the data, (ii) data indexing engines, and (iii) a query This work has been mainly done when the authors was at processing engine. In the following, we briefly describe some Bell Labs France, Centre de Villarceaux. of these components. 2.1 Crawlers in LAICOS Permission to make digital or hard copies of all or part of this work for personal or LAICOS possesses two crawlers: (i) a crawler for web classroom use is granted without fee provided that copies are not made or distributed 2 for profit or commercial advantage and that copies bear this notice and the full cita- pages based on Heritrix , which was specifically designed for tion on the first page. Copyrights for components of this work owned by others than web archiving and crawling. For each crawled web page, (ii) ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- a folksonomy crawler engine will download all the annota- publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. 1We also refer to documents as web pages or resources. KDD’13, August 11–14, 2013, Chicago, Illinois, USA. 2 Copyright 2013 ACM 978-1-4503-2174-7/13/08 ...$15.00. http://crawler.archive.org/index.html
1446 • Tags Docs: it stores the posting list of web pages for tags. In particular, for each tag, this structure stores: Web Interface for the id of the web page which is tagged with this tag search and results and the amount of users who have used this tag to Crawling Query-time process Folksonomy components annotate this web page. crawler PSQE Web crawler