On Search Engine Evaluation Metrics

Inaugural dissertation for the degree of Doctor of Philosophy (Dr. phil.), conferred by the Philosophische Fakultät of the Heinrich-Heine-Universität Düsseldorf

Submitted by Pavel Sirotkin, of Düsseldorf

Supervisor: Prof. Wolfgang G. Stock

Düsseldorf, April 2012

- Oh my God, a mistake!
- It’s not our mistake!
- Isn’t it? Whose is it?
- Information Retrieval.
“BRAZIL”

Acknowledgements

One man deserves the credit, one man deserves the blame…
TOM LEHRER, “LOBACHEVSKY”

I would like to thank my supervisor, Wolfgang Stock, who provided me with patience, support and the occasional much-needed prod to my derrière. He gave me the opportunity to write part of this thesis as part of my research at the Department of Information Science at Düsseldorf University, and it was also he who arranged for undergraduate students to act as raters for the study described in this thesis.

I would like to thank my co-supervisor, Wiebke Petersen, who bravely delved into a topic not directly connected to her research, and took the thesis on sightseeing tours in India and to winter beaches in Spain. Wiebke did not spare me any mathematical rod, and many a faulty formula has been spotted thanks to her.

I would like to thank Dirk Lewandowski, in whose undergraduate seminar I first encountered the topic of web search evaluation, and who provided me with encouragement and education on the topic. I am also indebted to him for valuable comments on a draft of this thesis.

I would like to thank the aforementioned undergraduates from the Department of Information Science for their time and effort in providing the data on which this thesis’ practical part stands.

Last, but definitely not least, I thank my wife Alexandra, to whom I am indebted for far more than I can express. She even tried to read my thesis, which just serves to show.
As is the custom, I happily refer to the acknowledged all good that I have derived from their help, while offering to blame myself for any errors they might have induced.

Contents

1 Introduction
  1.1 What It Is All About
  1.2 Web Search and Search Engines
  1.3 Web Search Evaluation
Part I: Search Engine Evaluation Measures
2 Search Engines and Their Users
  2.1 Search Engines in a Nutshell
  2.2 Search Engine Usage
3 Evaluation and What It Is About
4 Explicit Metrics
  4.1 Recall, Precision and Their Direct Descendants
  4.2 Other System-based Metrics
  4.3 User-based Metrics
  4.4 General Problems of Explicit Metrics
5 Implicit Metrics
6 Implicit and Explicit Metrics
Part II: Meta-Evaluation
7 The Issue of Relevance
8 A Framework for Web Search Meta-Evaluation
  8.1 Evaluation Criteria
  8.2 Evaluation Methods
    8.2.1 The Preference Identification Ratio
    8.2.2 PIR Graphs
9 Proof of Concept: A Study
  9.1 Gathering the Data
  9.2 The Queries
  9.3 User Behavior
  9.4 Ranking Algorithm Comparison
10 Explicit Metrics
  10.1 (N)DCG
  10.2 Precision
  10.3 (Mean) Average Precision
  10.4 Other Metrics
  10.5 Inter-metric Comparison
  10.6 Preference Judgments and Extrinsic Single-result Ratings
  10.7 PIR and Relevance Scales
    10.7.1 Binary Relevance
    10.7.2 Three-point Relevance
11 Implicit Metrics
  11.1 Session Duration Evaluation
  11.2 Click-based Evaluations
    11.2.1 Click Count
    11.2.2 Click Rank
12 Results: A Discussion
  12.1 Search Engines and Users
  12.2 Parameters and Metrics
    12.2.1 Discount Functions
    12.2.2 Thresholds
      12.2.2.1 Detailed Preference Identification
    12.2.3 Rating Sources
    12.2.4 Relevance Scales
    12.2.5 Cut-off Ranks
    12.2.6 Metric Performance
  12.3 The Methodology and Its Potential
  12.4 Further Research Possibilities
Executive Summary
Bibliography
Appendix: Metrics Evaluated in Part II

1 Introduction

You shall seek all day ere you find them, and when you have them, they are not worth the search.
WILLIAM SHAKESPEARE, “THE MERCHANT OF VENICE”

1.1 What It Is All About

The present work deals with certain aspects of the evaluation of web search engines. This does not sound too exciting; but for some people,
Recommended publications
  • WebNet 96 Conference Proceedings
    DOCUMENT RESUME ED 427 649 IR 019 168 AUTHOR Maurer, Hermann, Ed. TITLE WebNet 96 Conference Proceedings (San Francisco, California, October 15-19, 1996). INSTITUTION Association for the Advancement of Computing in Education, Charlottesville, VA. PUB DATE 1996-10-00 NOTE 930p.; For selected individual papers, see IR 019 169-198. Many figures and tables are illegible. AVAILABLE FROM Web site: http://aace.virginia.edu/aace/conf/webnet/proc96.html; also archived on WebNet 98 CD-ROM (includes 1996, 1997, 1998) AACE Membership/CD orders, P.O. Box 2966, Charlottesville, VA 22902; Fax: 804-978-7449 ($35, AACE members, $40, nonmembers). PUB TYPE Collected Works Proceedings (021) EDRS PRICE MF06/PC38 Plus Postage. DESCRIPTORS Access to Information; Authoring Aids (Programming); Computer Science; Computer Software; Courseware; Databases; Distance Education; Educational Media; Educational Strategies; *Educational Technology; Electronic Libraries; Elementary Secondary Education; *Hypermedia; Information Technology; Instructional Design; Multimedia Materials; Postsecondary Education; *World Wide Web IDENTIFIERS Electronic Commerce; Software Tools; Virtual Classrooms; *Web Sites ABSTRACT This proceedings contains 80 full papers, 12 posters/demonstrations, 108 short papers, one panel, and one tutorial, all focusing on World Wide Web applications. Topics include: designing hypertext navigation tools; Web site design; distance education via the Web; instructional design; the world-wide market and censorship on the Web; customer support via the Web; VRML;
  • Regulating Search Engines: Taking Stock and Looking Ahead
    GASSER: REGULATING SEARCH ENGINES
    REGULATING SEARCH ENGINES: TAKING STOCK AND LOOKING AHEAD
    "To exist is to be indexed by a search engine" (Introna & Nissenbaum)
    URS GASSER
    TABLE OF CONTENTS
    I. INTRODUCTION
    II. A BRIEF (AND CASUAL) HISTORY OF SEARCH ENGINES
    III. SEARCH ENGINE REGULATION: PAST AND PRESENT
      A. OVERVIEW OF SEARCH ENGINE-RELATED CASES
      B. LEGISLATION AND REGULATION
      C. SUMMARY
    IV. POSSIBLE FUTURE: HETEROGENEOUS POLICY DEBATES AND THE NEED FOR A NORMATIVE FRAMEWORK
      A. THEMES OF FUTURE POLICY DEBATES
      B. CHALLENGES AHEAD
      C. NORMATIVE FOUNDATIONS
    V. CONCLUSION
    * Associate Professor of Law, S.J.D. (St. Gallen), J.D. (St. Gallen), LL.M. (Harvard), Attorney at Law, Director, Research Center for Information Law, Univ. of St. Gallen, Faculty Fellow, Berkman Center for Internet & Society, Harvard Law School. I owe special thanks to my colleague James Thurman and the
  • Search Engines : Tools for Exploring the Internet
    CALIBER-98. 4-5 March 1998. Bhubaneswar. pp.193-199 © INFLIBNET Centre, Ahmedabad. Search Engines: Tools For Exploring The Internet. SWAPAN KUMAR DASGUPTA, Central Library, Kalyani University, Klyuniv@giascl01.vsnl.net.in
    Abstract: A software package for searching for particular information or a topic in the vast amount of information available on the Internet is called a search engine. The common search engines are Altavista, Webcrawler, Yahoo, Lycos, Infoseek, Aliweb. The author provides a list of Internet search engines covering wide areas of interest, followed by a brief description of URLs. After discussing the role of INFLIBNET in modernising university libraries and in improving on-line access by creating its web page, the author says that, in order to improve the education and research infrastructure of the country, some changes are necessary in our present thinking and approach.
    Introduction: The Internet is a global mesh and may be called a large repository of information put up by its users. Searching for particular information or a topic of interest is an intricate task owing to the enormous size of the Internet, the vast amount of information, and its many possible methods of storage. A software package for this purpose is called a search engine.
    Common Search Engines: The common WWW search engines are Altavista, Webcrawler, Yahoo, Lycos, Infoseek, Aliweb. Some of these sites may be very busy, in which case the user has to try another site, or may press the G key to go to another URL. People are often unaware that the net can be surfed in various ways. They may say it seems to be time consuming, but the fact is that it is free from the traditional constraints of time and space.
  • Evaluation of Web-Based Search Engines Using User-Effort Measures
    Evaluation of Web-Based Search Engines Using User-Effort Measures Muh-Chyun Tang and Ying Sun 4 Huntington St. School of Information, Communication and Library Studies Rutgers University, New Brunswick, NJ 08901, U.S.A. [email protected] [email protected] Abstract This paper presents a study of the applicability of three user-effort-sensitive evaluation measures —“first 20 full precision,” “search length,” and “rank correlation”—on four Web-based search engines (Google, AltaVista, Excite and Metacrawler). The authors argue that these measures are better alternatives than precision and recall in Web search situations because of their emphasis on the quality of ranking. Eight sets of search topics were collected from four Ph.D. students in four different disciplines (biochemistry, industrial engineering, economics, and urban planning). Each participant was asked to provide two topics along with the corresponding query terms. Their relevance and credibility judgment of the Web pages were then used to compare the performance of the search engines using these three measures. The results show consistency among these three ranking evaluation measures, more so between “first 20 full precision” and search length than between rank correlation and the other two measures. Possible reasons for rank correlation’s disagreement with the other two measures are discussed. Possible future research to improve these measures is also addressed. Introduction The explosive growth of information on the World Wide Web poses a challenge to traditional information retrieval (IR) research. Other than the sheer amount of information, some structural factors make searching for relevant and quality information on the Web a formidable task.
  • How to Choose a Search Engine Or Directory
    How to Choose a Search Engine or Directory
    Fields & File Types
    If you want to search for... / Choose...
    Audio/Music: AllTheWeb | AltaVista | Dogpile | Fazzle | FindSounds.com | Lycos Music Downloads | Lycos Multimedia Search | Singingfish
    Date last modified: AllTheWeb Advanced Search | AltaVista Advanced Web Search | Exalead Advanced Search | Google Advanced Search | HotBot Advanced Search | Teoma Advanced Search | Yahoo Advanced Web Search
    Domain/Site/URL: AllTheWeb Advanced Search | AltaVista Advanced Web Search | AOL Advanced Search | Google Advanced Search | Lycos Advanced Search | MSN Search Search Builder | SearchEdu.com | Teoma Advanced Search | Yahoo Advanced Web Search
    File Format: AllTheWeb Advanced Web Search | AltaVista Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Yahoo Advanced Web Search
    Geographic location: Exalead Advanced Search | HotBot Advanced Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search
    Images: AllTheWeb | AltaVista | The Amazing Picture Machine | Ditto | Dogpile | Fazzle | Google Image Search | IceRocket | Ixquick | Mamma | Picsearch
    Language: AllTheWeb Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Google Language Tools | HotBot Advanced Search | iBoogie Advanced Web Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search
    Multimedia & video: AllTheWeb | AltaVista | Dogpile | Fazzle | IceRocket | Singingfish | Yahoo Video Search
    Page Title/URL: AOL Advanced
  • United States Patent (19) 11 Patent Number: 6,094,649 Bowen Et Al
    US006094649A
    United States Patent [19], Bowen et al.
    [11] Patent Number: 6,094,649
    [45] Date of Patent: Jul. 25, 2000
    [54] KEYWORD SEARCHES OF STRUCTURED DATABASES
    [75] Inventors: Stephen J Bowen, Sandy; Don R Brown, Salt Lake City, both of Utah
    [73] Assignee: PartNet, Inc., Salt Lake City, Utah
    [21] Appl. No.: 08/995,700
    [22] Filed: Dec. 22, 1997
    [51] Int. Cl.: G06F 17/30
    [52] U.S. Cl.: 707/3; 707/5; 707/4
    [58] Field of Search: 707/1, 2, 3, 4, 5, 531, 532, 500
    “Charles Schwab Broadens Deployment of Fulcrum-Based Corporate Knowledge Library Application”, Unknown, Fulcrum Technologies Inc., Mar. 3, 1997, pp. 1-3. (List continued on next page.)
    Primary Examiner: Hosain T. Alam
    Assistant Examiner: Thuy Pardo
    Attorney, Agent, or Firm: Computer Law----
    [56] References Cited, U.S. PATENT DOCUMENTS
    5,375,235 12/1994 Berry et al. ...
    5,469,354 11/1995 Hatakeyama et al. ... 707/3
    5,546,578 8/1996 Takada ... 707/5
    5,685,003 11/1997 Peltonen et al. ... 707/531
    5,787,295 7/1998 Nakao ... 707/500
    5,787,421 7/1998 Nomiyama ..
    [57] ABSTRACT
    Methods and systems are provided for supporting keyword searches of data items in a structured database, such as a relational database. Selected data items are retrieved using an SQL query or other mechanism. The retrieved data values are documented using a markup language such as HTML. The documents are indexed using a web crawler or other indexing agent. Data items may be selected for indexing by identifying them in a data dictionary. The indexing agent
  • Instrumentalizing the Sources of Attraction. How Russia Undermines Its Own Soft Power
    INSTRUMENTALIZING THE SOURCES OF ATTRACTION. HOW RUSSIA UNDERMINES ITS OWN SOFT POWER By Vasile Rotaru
    Abstract: The 2011-2013 domestic protests and the 2013-2015 Ukraine crisis have brought to the forefront of Russian politics an increasing preoccupation with soft power. The concept started to be used in official discourses and documents, and a series of measures have been taken both to avoid the ‘dangers’ of and to streamline Russia’s soft power. This dichotomous approach towards the ‘power of attraction’ has revealed the differences in how soft power is perceived by Russian officials and by their Western counterparts. The present paper will analyse Russia’s efforts to control and to instrumentalize the sources of soft power, trying to assess the effectiveness of such an approach.
    Keywords: Russian soft power, Russian foreign policy, public diplomacy, Russian mass media, Russian internet
    Introduction: The use of the term soft power is relatively new in Russian political circles; however, it has recently become increasingly popular among Russian analysts, policy makers and politicians. The term per se was used for the first time in Russian political discourse in February 2012 by Vladimir Putin. In the presidential election campaign, the then candidate Putin drew attention to the fact that soft power – “a set of tools and methods to achieve foreign policy goals without the use of arms but by exerting information and other levers of influence” – is used frequently by “big countries, international blocks or corporations” “to develop and provoke extremist, separatist and nationalistic attitudes, to manipulate the public and to directly interfere in the domestic policy of sovereign countries” (Putin 2012).
  • Functional Differences Between Northern Light Singlepoint And
    Functional Differences Between Northern Light SinglePoint and Microsoft SharePoint For Strategic Research Portals
    Northern Light® Whitepaper, April 2011
    TABLE OF CONTENTS
    Background On Northern Light SinglePoint
    Third-Party Licensed External Content
    Internally Produced Research Content
    User Authentication
    Differences Between Northern Light SinglePoint and Microsoft SharePoint
    Investment in a Research-optimized Portal and Supporting Systems
    Ability To Technically Integrate With Third-Parties To Index Research Content
    Content Liability
    Document Security Conventions Reflecting Licensing Arrangements
    Benefit of Having the
  • An Empirical Analysis of Internet Search Engine Choice
    An Empirical Analysis of Internet Search Engine Choice Rahul Telang ([email protected]) Tridas Mukhopadhyay ([email protected]) Ronald T. Wilcox ([email protected]) Send correspondence to Rahul Telang H. John Heinz III School of Public Policy and Management Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3890 [email protected] December 2001 An Empirical Analysis of Internet Search Engine Choice Abstract We investigate consumers’ choice behavior for Internet search engines. Within this broad agenda, we focus on two interrelated issues. First, we examine whether consumers develop loyalty to a particular search engine. If loyalty does indeed develop, we seek to understand the role of loyalty in the search engine choice. We also explore how the use of non-search features such as email, news, etc. provided by the engines enhances or inhibits customer loyalty. Second, we seek to determine how search engine performance affects the user choice behavior. To accomplish our research objective, we first develop a conceptual model of search engine choice based on the literature of human-computer interaction and cognitive psychology. Our model reflects the fact that information goods such as search engines are a fundamentally different class of products than common household items. We posit that the ability to learn various search engine features and the ease (or difficulty) of transferring this learning to other engines would determine loyalty in this context. Indeed, we expect the user to exhibit differing levels of loyalty to search and non-search features of the engines. We also expect that dissatisfaction with search results would negatively affect search engine choice.
  • Meta Search Engine Examples
    Meta Search Engine Examples mottlesMarlon istemerariously unresolvable or and unhitches rice ichnographically left. Salted Verney while crowedanticipated no gawk Horst succors underfeeding whitherward and naphthalising. after Jeremy Chappedredetermines and acaudalfestively, Niels quite often sincipital. globed some Schema conflict can be taken the meta descriptions appear after which result, it later one or can support. Would result for updating systematic reviews from different business view all fields need to our generated usually negotiate the roi. What is hacking or hacked content? This meta engines! Search Engines allow us to filter the tons of information available put the internet and get the bid accurate results And got most people don't. Best Meta Search array List The Windows Club. Search engines have any category, google a great for a suggestion selection has been shown in executive search input from health. Search engine name of their booking on either class, the sites can select a search and generally, meaning they have past the systematisation of. Search Engines Corner Meta-search Engines Ariadne. Obsession of search engines such as expedia, it combines the example, like the answer about search engines out there were looking for. Test Embedded Software IC Design Intellectual Property. Using Research Tools Web Searching OCLS. The meta description for each browser settings to bing, boolean logic always prevent them the hierarchy does it displays the search engine examples osubject directories. Online travel agent Bookingcom has admitted that playing has trouble to compensate customers whose personal details have been stolen Guests booking hotel rooms have unwittingly handed over business to criminals Bookingcom is go of the biggest online travel agents.
  • Computational Propaganda in Russia: the Origins of Digital Misinformation
    Working Paper No. 2017.3
    Computational Propaganda in Russia: The Origins of Digital Misinformation
    Sergey Sanovich, New York University
    Table of Contents
    Abstract
    Introduction
    Domestic Origins of Russian Foreign Digital Propaganda
    Identifying Russian Bots on Twitter
    Conclusion
    Author Acknowledgements
    About the Author
    References
    Citation
  • Applications That Changed the World
    Applications That Changed The World
    Some slides adapted from UC Berkeley CS10 – Dan Garcia
    Lecture Overview
    • What counts?
    • For each application
      – Historical context: what the world was like before; on what shoulders does it stand?
      – Key players: sometimes origins fuzzy
      – How it changed world
    • Summary
    Applications that Changed the World
    • Lots of applications changed the world
      – Electricity, Radio, TV, Cars, Planes, AC, ...
    • We’ll focus on those utilizing Computing
    • Important to consider historical apps
      – Too easy to focus on recent N years!
    Email (1965)
    • Fundamentally changed the way people interact!
    • 1965: MIT’s CTSS – Compatible Time-Sharing Sys
    • Exchange of digital info
      – Model: “Store and Forward”
      – “Push” technology
    • How
      – Alice composes email to [email protected]
      – Domain Name System looks up where b.org is
      – DNS server with the mail exchange server for b.org
      – Mail is sent to mx.b.org
      – Bob reads email from there
    • Pros
      – Solves logistics (where) & synchronization (when)
    • Cons
      – “Email Fatigue”
      – Information Overload
      – Loss of Context
    The Personal Computer (1970s)
    • First PCs sold as kits to hobbyists
      – Altair 8800 (1975)
    • Early mass-prod PCs
      – Apple I, II (Jobs & Woz)
      – Commodore PET
      – IBM ran away w/market
    • Microprocessor key
    • Laptops portability
    • Created industry, wealth
      – Silicon Valley!
      – Bill Gates worth $50 Billion
    [Images: Altair 8800, Apple II, Commodore PET, IBM PC]
    en.wikipedia.org/wiki/Personal_computer
    The World Wide Web (1989)
    • “System of interlinked hypertext documents on the Internet”
    • History
      – 1945: Vannevar Bush describes hypertext system called “memex” in article
      – 1989: Tim Berners-Lee proposes, gets system up ’90
      – ~2000 Dot-com entrepreneurs rushed in, 2001 bubble burst
    • Wayback Machine (www.archive.org)
      – Snapshots of web over time
    • Today: Access anywhere!
    [Images: World’s First web server in 1990, Tim Berners-Lee]
    WWW Search & Browser (1993)
    • Browser
      – Marc L.