Aalborg Universitet Spatial Keyword Querying: Ranking Evaluation And
Total Page:16
File Type:pdf, Size:1020Kb
Aalborg Universitet Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing Keles, Ilkcan DOI (link to publication from Publisher): 10.5278/vbn.phd.tech.00039 Publication date: 2018 Document Version Publisher's PDF, also known as Version of record Link to publication from Aalborg University Citation for published version (APA): Keles, I. (2018). Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing. Aalborg Universitetsforlag. Ph.d.-serien for Det Tekniske Fakultet for IT og Design, Aalborg Universitet https://doi.org/10.5278/vbn.phd.tech.00039 General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. ? Users may download and print one copy of any publication from the public portal for the purpose of private study or research. ? You may not further distribute the material or use it for any profit-making activity or commercial gain ? You may freely distribute the URL identifying the publication in the public portal ? Take down policy If you believe that this document breaches copyright please contact us at [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from vbn.aau.dk on: October 08, 2021 SPATIAL KEYWORD QUERYING: RANKING EVALUATION AND EFFICIENT QUERY PROCESSING AND EFFICIENT QUERY RANKING EVALUATION KEYWORD QUERYING: SPATIAL SPATIAL KEYWORD QUERYING RANKING EVALUATION AND EFFICIENT QUERY PROCESSING BY ILKCAN KELES DISSERTATION SUBMITTED 2018 ILKCAN KELES Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing Ph.D. Dissertation Ilkcan Keles Dissertation submitted April, 2018 Dissertation submitted: April, 2018 PhD supervisor: Prof. Christian Søndergaard Jensen Aalborg University Assistant PhD supervisor: Assoc. Prof. Simonas Šaltenis Aalborg University PhD committee: Associate Professor Christian Thomsen (chairman) Aalborg University Associate Professor Maria Luisa Damiani University of Milan Associate Professor Vladimir I. Zadorozhny University of Pittsburgh PhD Series: Technical Faculty of IT and Design, Aalborg University Department: Department of Computer Science ISSN (online): 2446-1628 ISBN (online): 978-87-7210-184-2 Published by: Aalborg University Press Langagervej 2 DK – 9220 Aalborg Ø Phone: +45 99407140 [email protected] forlag.aau.dk © Copyright: Ilkcan Keles The author has obtained the right to include the published and accepted articles in the thesis, with a condition that they are cited, DOI pointers and/or copyright/credits are placed prominently in the references. Printed in Denmark by Rosendahls, 2018 Abstract Due to the widespread adoption of mobile devices with positioning capa- bilities, notably smartphones, users increasingly search for geographically nearby information on search engines. Further, an analysis of user behavior finds that users not only search for local content, but also take action with respect to search results. In step with these developments, the research com- munity has proposed various kinds of spatial keyword queries that return ranked lists of relevant points of interest. These proposals generally come with advanced query processing techniques, the goal being to make it pos- sible for users to find relevant information quickly. Most of the proposals employ a simple ranking function that takes only textual relevance and spa- tial proximity into account. While these proposals study the query processing efficiency, they are generally weak when it comes to evaluation of the result rankings. We believe that ranking evaluation for spatial keyword queries is important since it is directly related to the user satisfaction. The thesis addresses several challenges related to ranking evaluation for spatial keyword queries. The first challenge we address is forming ground- truth rankings for spatial keyword queries that reflect user preferences. The main idea is that the more similar an output ranking is to the ground-truth ranking, the better the output ranking is. The thesis proposes methods based on crowdsourcing and vehicle trajectories to address this challenge. These methods make it possible for researchers to propose novel ranking functions and to assess the performance of these functions. As such, the thesis makes a step towards more advanced and complex ranking functions that correspond better to user preferences. The contributions of the thesis can also be used to evaluate hypotheses regarding different keywords and geographical regions. Along these lines, it might be possible to employ different ranking functions for different queries in the same system. The thesis also addresses the prob- lem of detecting the visited points of interest in a GPS dataset and proposes algorithms to tackle this problem. These visits offer insight into which points of interest are of interest to drivers and offer a means of ranking for points of interest. More specifically, the thesis first proposes a technique based on crowd- iii sourcing to obtain partial rankings corresponding to user preferences. The method employs pairwise relevance questions and uses a gain function to decrease the number of questions that need to be processed to form a rank- ing, thus reducing the cost of crowdsourcing. The resulting partial rankings can be used to assess the quality of ranking functions. Next, the thesis proposes a system, CrowdRankEval, to evaluate ranking functions for a given set of queries and corresponding query results obtained using various ranking functions. The system utilizes the aforementioned crowdsourcing-based method to form ground-truth rankings for queries. The system provides an easy-to-use interface that allows users to visualize the rankings obtained from crowdsourcing and to view the evaluation results. Further, the thesis proposes a crowdsourcing-based approach to evaluate two ranking functions on a set of spatial keyword queries. The approach takes budget constraints into account and utilizes a learn-to-rank method and an entropy definition to determine the most important questions to compare two ranking functions for a given query. The thesis also proposes a method that uses GPS data to extract ground- truth rankings for spatial keyword queries. The idea is to use historical trips to points of interest to determine the relative popularity of points of interest. The experimental findings suggest that the proposed method is capable of capturing user preferences. Finally, the thesis formalizes a so-called k-TMSTC query that targets users looking for groups of points of interest instead of single points of interest. Two algorithms based on density-based clustering are proposed to process this query. Experiments show that the proposed methods support interactive search. Resumé Udbredelsen af mobile enheder, i særdeleshed smartphones med indbygget GPS, har medført at brugere i stigende grad søger efter information om det område de befinder sig i. Derudover viser analyse af brugeradfærd at brugere ikke alene søger efter information om deres omgivelser, men også agerer på den. Som følge deraf har forskere foreslået en række forskellige spatiale sø- geordsforespørgsler der tager både brugerens søgeord og lokation i betragt- ning og returnerer en rangeret list af interessepunkter. Disse forslag kommer med typisk med avancerede teknikker til processering af forespørgsler med henblik på hurtigt at give brugere svar på deres forespørgsler. De fleste af forslagene bruger en simpel rangeringsfunktion, der kun tager tekst og spa- tial relevans i betragtning og processerer forespørgsler hurtigt, men overvejer ikke kvaliteten af svaret på forespørgslen. Vi mener at evalueringen af spa- tiale søgeordsforespørgslers resultater er vigtig da kvaliteten af resultaterne er direkte relateret til brugertilfredshed. Denne afhandling adresserer adskillige udfordringer i forbindelse med evalueringen af spatiale søgeordsforespørgsler. Den første udfordring vi adresserer er at etablere sande rangeringer af forespørgselsresultater baseret på brugerpræferencer. Jo tættere et forespørgselsresultat er på den sande rangering jo bedre er den. Afhandlingen foreslår metoder til at disse at adressere denne udfordring baseret på crowdsourcing og bilers færden i ve- jnet. Disse metoder tillader forskere at foreslå og evaluere nye rangerings- funktioner. Derfor bringer denne afhandling os tættere på mere avancerede og komplekse rangeringsfunktioner der stemmer bedre overens med bruger- præferencer. Afhandlingens bidrag kan endvidere bruges til at evaluere hypoteser vedrørende forskellige søgeord og geografiske lokationer. Det kan måske også lade sig gøre at bruge forskellige rangeringsfunktioner til forskellige typer af forespørgsler. Afhandlingen foreslår også algoritmer til at adressere problemet med detektering af besøgte punkter fra GPS-data. Besø- gene giver indsigt i hvilke punkter er interessante for bilister og en måde at rangere interessepunkterne. Mere specifikt foreslår afhandlingen først en teknik baseret på crowd- sourcing til at opnå delvise rangeringer svarende til brugerpræferencer. Tek- v nikken stiller en række spørgsmål til crowdsourcingarbejdere og bruger en funktion til at afgøre hvilket spørgsmål der skal stilles næste gang for at få mest mulig indsigt i brugerpræferencer og reducerer derfor omkostnin- gen ved crowdsourcing. De resulterende delvis