Web Searcher Interaction With the Dogpile.com Metasearch Engine
Bernard J. Jansen
College of Information Sciences and Technology, The Pennsylvania State University, 329F IST Building, University Park, PA 16802. E-mail: [email protected]

Amanda Spink
Faculty of Information Technology, Queensland University of Technology, Gardens Point Campus, 2 George Street, GPO Box 2434, Brisbane QLD 4001, Australia. E-mail: [email protected]

Sherry Koshman
School of Information Sciences, University of Pittsburgh, 610 IS Building, 135 N. Bellefield Avenue, Pittsburgh, PA 15260. E-mail: [email protected]

Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.

Introduction

Metasearch engines have an intuitive appeal as a method of improving the retrieval performance for Web searches. Unlike single source Web search engines, metasearch engines do not crawl the Internet themselves to build an index of Web documents. Instead, a metasearch engine sends queries simultaneously to multiple other Web search engines, retrieves the results from each, and then combines the results from all into a single results listing, at the same time avoiding redundancy. In effect, Web metasearch engine users are not using just one engine, but many search engines at once to effectively utilize Web searching. The ultimate purpose of a metasearch engine is to diversify the results of the queries by utilizing the innate differences of single source Web search engines and provide Web searchers with the highest ranked search results from the collection of Web search engines. Although one could certainly query multiple search engines, a metasearch engine distills these top results automatically, giving the searcher a comprehensive set of search results within a single listing, all in real time.

We know that there is little overlap among typical search engine result listings (Ding & Marchionini, 1996), and single search engines index a relatively small percentage of the Web (Lawrence & Giles, 1999). Research shows that results retrieved from multiple sources have a higher probability of being relevant to the searcher’s information needs (Gauch, Wang, & Gomez, 1996). Finally, a single search engine may have inherent biases that influence what results are returned (Gerhart, 2004; Introna & Nissenbaum, 2000). By combining results from several sources, a metasearch engine addresses all three concerns.
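The introduction describes a metasearch engine as sending one query to several engines at once and combining their results into a single listing without redundancy. That process can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how Dogpile.com works: the three engine functions are hypothetical stubs that return fixed URL lists, and reciprocal rank fusion (with the commonly used constant k = 60) is an assumed merging rule chosen for simplicity. The article does not say how any real metasearch engine ranks its merged results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search engines. Each takes a query and
# returns a ranked list of URLs; a real system would call remote services.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]


def engine_c(query):
    return ["http://c.example/1", "http://shared.example/x", "http://b.example/1"]


def metasearch(query, engines, k=60):
    """Query every engine concurrently and fuse the ranked lists.

    Assumed merging rule: reciprocal rank fusion, where each URL earns
    1 / (k + rank) from every engine that returns it.
    """
    # Fan out: submit the same query to all engines at the same time.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        ranked_lists = list(pool.map(lambda engine: engine(query), engines))

    # Merge: keying the scores by URL removes duplicates across engines.
    scores = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        for rank, url in enumerate(ranked, start=1):
            if url in seen:
                continue  # count each URL once per engine
            seen.add(url)
            scores[url] += 1.0 / (k + rank)

    # Highest fused score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


results = metasearch("metasearch engines", [engine_a, engine_b, engine_c])
for position, url in enumerate(results, start=1):
    print(position, url)
```

In this sketch a URL returned by several engines accumulates score from each of them, so it rises in the merged listing, while each URL still appears only once.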
Chignell, Gwizdka, and Bodner (1999) found little overlap in the results returned by various Web search engines. They describe a metasearch engine as useful, since different engines employ different means of matching queries to relevant items, and also have different indexing coverage. Selberg and Etzioni (1997) further suggested that no single search engine is likely to return more than 45% of the relevant results. Subsequently, the design and performance of metasearch engines have become an ongoing area of study (Buzikashvili, 2002; Chignell, Gwizdka & Bodner, 1999; Dreilinger & Howe, 1997; Meng, Yu, & Lui, 2002; Selberg & Etzioni, 1997; Spink, Lawrence, & Giles, 2000).

However, there has been little investigation into how searchers interact with Web metasearch engines. If metasearch provides an improved Web searching environment, one may expect differences in interactions when compared to Web searching on other search engines. What are the interaction patterns between searchers and a metasearch engine? This question motivates our research.

In the following sections, we review the related studies and list our research questions. We then discuss the Dogpile.com Web metasearch engine and the research design that was used in our study. We then discuss the findings from multiple levels of analysis, concluding with implications for Web metasearching.

Received October 25, 2005; revised May 18, 2006; accepted May 18, 2006
© 2007 Wiley Periodicals, Inc. • Published online 2 February 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20555
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 58(5):744–755, 2007

Related Studies

Web research is now a major interdisciplinary area of study, including the modeling of user behavior and Web search engine performance (Spink & Jansen, 2004). Web search engine crawling and retrieving studies have evolved as an important area of Web research since the mid-1990s. Many metasearch tools have been developed and commercially implemented, but little research has investigated the usage and performance of Web metasearch engines. Selberg and Etzioni (1997) developed one of the first metasearch engines, Metacrawler (http://www.metacrawler.com). Largely focusing on the system design, the researchers discuss usage, reporting on 50,878 queries submitted between July 7 and September 30, 1995, with 46.67% (24,253 queries) being unique. The top 10 queries represented 3.37% (1,716) of all queries. The top queries were all one term in length, and commonly occurring natural language terms (e.g., the, of, and, or) reported in later Web user studies were not present.

Gauch, Wang, and Gomez (1996) designed the ProFusion metasearch engine and evaluated its performance in a lab setting. The researchers used 12 students who submitted queries and compared ProFusion to the six underlying search engines using the number of relevant documents retrieved, the number of irrelevant documents retrieved, the number of broken links, the number of duplicates, the number of unique relevance documents and precision. How the study participants utilized the metasearch engine was not discussed.

The SavvySearch (Dreilinger & Howe, 1997; Howe & Dreilinger, 1997) is a metasearch engine that selects the most promising search engines [...] load as the metric of comparison. Searching characteristics were not presented.

Developers of the Mearf metasearch engine (Oztekin, Karypis, & Kumar, 2002) collected transaction logs from November 22, 2000 to November 10, 2001, using click-through as a mechanism for evaluating Mearf performance. They report on the mean documents returned per query, user reranking of results, and the number of documents clicked on by searchers. Approximately 64% of queries included a click on a document, with a mean of 2.02 clicks per query. However, there were a total of 17,055 queries submitted during the one year period, so this may not be a representative sample of metasearch engine users.

Many studies have examined the performance of single Web search engines such as AltaVista, Excite, AlltheWeb (Spink & Jansen, 2004), and NAVER (Park, Bae, & Lee, 2005). Spink, Jansen, Blakely, and Koshman (2006) found little results overlap and uniqueness among major Web search engines. However, limited large-scale studies have examined how searchers interact with Web metasearch engines. An understanding of how searchers utilize these systems is critical for the future refinement of metasearch engine design and the evaluation of Web metasearch engine performance. These are the motivators for our research.

Research Questions

The research questions driving our study are as follows:

1. What are the characteristics of search interactions on the Dogpile.com metasearch engine? To address this research question, we investigated session length, query length, query structure, query formulation, result pages viewed and term usage of these Web searchers.
2. What are the temporal characteristics of metasearching on Dogpile.com? For this research question, we investigated the duration of sessions and the frequency of interactions during these sessions.
3. What are the topical characteristics of searches on the Dogpile.com metasearch engine? To address this research question, we investigated a subset of queries submitted by searchers on Dogpile.com to gain insight into the nature of their search topics using a qualitative analysis.

Research Design

Dogpile.com

Dogpile.com (http://www.Dogpile.com/) is owned by Infospace, a market leader in the metasearch engine business. Dogpile.com incorporates into its search result listings the results from other search engines, including results from the four leading Web search indices (i.e., Ask Jeeves, Google,