Utilizing Query Performance Predictors for Early Termination in Meta-Search
Total Page:16
File Type:pdf, Size:1020Kb
UTILIZING QUERY PERFORMANCE PREDICTORS FOR EARLY TERMINATION IN META-SEARCH A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY BY EMRE ŞENER IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER ENGINEERING AUGUST 2016 Approval of the thesis: UTILIZING QUERY PERFORMANCE PREDICTORS FOR EARLY TERMINATION IN META-SEARCH submitted by EMRE ŞENER in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering Department, Middle East Technical University by, Prof. Dr. Gülbin Dural Ünver _____________ Dean, Graduate School of Natural and Applied Sciences Prof. Dr. Adnan Yazıcı _____________ Head of Department, Computer Engineering Assoc. Prof. Dr. İsmail Sengör Altıngövde _____________ Supervisor, Computer Engineering Department, METU Examining Committee Members: Prof. Dr. İsmail Hakkı Toroslu _____________ Computer Engineering Department, METU Assoc. Prof. Dr. İsmail Sengör Altıngövde _____________ Computer Engineering Department, METU Assoc. Prof. Dr. Pınar Karagöz _____________ Computer Engineering Department, METU Assist. Prof. Dr. Selim Temizer _____________ Computer Engineering Department, METU Assist. Prof. Dr. Engin Demir _____________ Computer Engineering Department, UTAA Date: 22.08.2016 I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work. Name, Last name: Emre ŞENER Signature: iv ABSTRACT UTILIZING QUERY PERFORMANCE PREDICTORS FOR EARLY TERMINATION IN META-SEARCH Şener, Emre M.S., Department of Computer Engineering Supervisor: Assoc. Prof. Dr. İsmail Sengör Altıngövde August 2016, 153 pages In the context of web, a meta-search engine is a system that forwards an incoming user query to all the component search engines (aka, resources); and then merges the retrieved results. Given that hundreds of such resources may exist, it is mandatory for a meta-search engine to avoid forwarding a query to all available resources, but rather focus on a subset of them. In this thesis, we first introduce a novel incremental query forwarding strategy for meta-search. More specifically, given a ranked list of N search engines, our strategy operates in rounds, such that in each round, we retrieve the results of the next k “unvisited” resources in the list (where k<N), asses the quality of the intermediate merged list, and stop if any further quality improvement seems unlikely. As our second contribution, we introduce a novel incremental query result merging strategy. In this strategy, we forward query to all search engines but we assess the quality of intermediate merged lists as early as we retrieve the results from an engine and stop if any further quality improvements are not likely. In order to assess the result quality, we utilize post-retrieval query performance prediction (QPP) techniques. Our experiments using the standard FedWeb 2013 dataset reveal that the proposed strategies can reduce the response v time and/or network bandwidth usage, while the quality of the result is comparable to, or sometimes, even better than the baseline strategy. Keywords: Meta-search Engines, Query Performance Prediction, Evaluation vi ÖZ META-ARAMA İÇİN SORGU PERFORMANS TAHMİNİ YÖNTEMLERİYLE ERKEN SONUÇ OLUŞTURMA Şener, Emre Yüksek Lisans, Bilgisayar Mühendisliği Bölümü Tez Yöneticisi: Doç. Dr. İsmail Sengör Altıngövde Ağustos 2016, 153 sayfa Meta-arama motorları diğer kaynaklardan gelen sorgu sonuçlarını kullanarak sonuç listesi oluştururlar. Bu tez ile meta-arama motorlarındaki sorgu maliyetini düşürmeyi hedeflemekteyiz. Bu amaçla, ilk olarak seçilen arama kaynaklarının hepsine birden sorgunun yönlendirilmesi yerine, artırımlı olarak daha küçük k elemanlı alt kümelere sorgunun yönlendirilmesini öneriyoruz. Artırımlı olarak yönlendirilen alt kümedeki kaynaklardan gelen sonuçlara sorgu performans tahmini adımını ekleyerek, sorgu performans tahmini değerine göre cevap kalitesini artırmayacağına karar verdiğimiz durumlarda kalan kaynaklara sorgunun gönderilmemesi ile sorgu maliyetinin azaltılmasını hedefliyoruz. Önerdiğimiz ikinci yöntemde ise, sorgu seçilen kaynakların hepsine birden yönlendiriliyor ancak, herhangi bir kaynaktan sonuç gelir gelmez ara listeleri birleştirip, sorgu performans tahmini ile daha sonraki kaynaklardan gelecek cevapların sonuç kalitesini artırmayacağına karar verdiğimiz durumlarda, bu ara listeyi kullanıcıya dönerek sorgu zamanını azaltmayı hedefliyoruz. Yöntemlerimizi FedWeb 2013 verisi üstünde test ettik ve deney sonuçlarından artırımlı sorgu işleme yöntemlerimizin tüm seçilen kaynaklara tek seferde göndermeye göre sorgu maliyetini ve/veya ağ kaynakları kullanımını düşürdüğünü gördük. vii Anahtar Kelimeler: Meta-arama Motorları, Sorgu Performans Ölçümü, Performans Değerlendirmesi viii To My Precious Wife ix ACKNOWLEDGEMENTS I would like to enounce my deep gratitude to my supervisor Assoc. Prof Dr. İsmail Sengör Altıngövde for his valuable supervision, advice, useful critics and discussions throughout this study. I am also grateful to my thesis committee members for their criticism and advices. This research project is jointly supported by Ministry of Science, Industry and Technology of Turkey and Huawei Telekomünikasyon Ltd. Co. under SAN-TEZ funding program with the project number 0441.STZ.2013-2. x TABLE OF CONTENTS ABSTRACT ................................................................................................................. v ÖZ .............................................................................................................................. vii ACKNOWLEDGEMENTS ......................................................................................... x TABLE OF CONTENTS ............................................................................................ xi LIST OF TABLES ..................................................................................................... xv LIST OF FIGURES ................................................................................................ xxxi LIST OF ABBREVATIONS ................................................................................. xxxii CHAPTERS ................................................................................................................. 1 1. INTRODUCTION ................................................................................................ 1 2. RELATED WORK ............................................................................................... 5 3. UTILIZING QUERY PERFORMANCE PREDICTORS FOR EARLY TERMINATION IN META-SEARCH ..................................................................... 11 3.1. QPP Based Adaptive Incremental Query Forwarding in Meta-Search ....... 11 3.2. QPP Based Adaptive Incremental Query Result Merging in Meta-Search . 13 3.3. Resource Selection ...................................................................................... 15 xi 3.3.1. TWF-IRF ................................................................................................. 16 3.3.2. UISP ........................................................................................................ 16 3.3.3. Oracle ...................................................................................................... 16 3.3.4. FedWeb2013 Baseline ............................................................................. 17 3.3.5. ReDDE .................................................................................................... 17 3.3.6. Rank-S ..................................................................................................... 18 3.3.7. Adaptive-k ............................................................................................... 18 3.4. Result Merging ............................................................................................ 18 3.4.1. Oracle Merging ........................................................................................ 19 3.4.2. ISR ........................................................................................................... 19 3.4.3. RRF ......................................................................................................... 19 3.4.4. CombSUM ............................................................................................... 20 3.5. Query Performance Prediction (QPP) ......................................................... 21 3.5.1. SUM QPP ................................................................................................ 21 3.5.2. NDCG QPP ............................................................................................. 21 3.6. Learning to Stop .......................................................................................... 22 3.6.1. Adhoc Stopping Policies ......................................................................... 23 xii 3.6.2. Machine Learning Model ........................................................................ 24 3.6.2.1. QPP Ratio ..................................................................................... 25 3.6.2.2. QPP Difference ............................................................................. 25 4. EXPERIMENTS AND RESULTS ..................................................................... 27 4.1. Data Set ......................................................................................................