Web Searching: A Quality Measurement Perspective
To appear in: Zimmer, M.; Spink, A. (eds.): Web Search: Interdisciplinary perspectives. Dordrecht: Springer, 2007.

Dirk Lewandowski
Hamburg University of Applied Sciences, Department Information, Berliner Tor 5, D-20099 Hamburg, Germany. E-mail: [email protected]

Nadine Höchstötter
Institute for Decision Theory and Management Science, Universitaet Karlsruhe (TH), Kaiserstrasse 12, D-76128 Karlsruhe, Germany. E-mail: [email protected]

Abstract

The purpose of this paper is to describe various quality measures for search engines and to ask whether they are suitable. We focus especially on user needs and users' interaction with web search engines. The paper also presents an extensive literature review and a first quality measurement model. Findings include that search engine quality cannot be measured by retrieval effectiveness (the quality of the results) alone, but must also take into account index quality, the quality of the search features, and search engine usability. For each of these areas, we present empirical results from studies conducted in the past as well as from our own research. These results have implications for the evaluation of search engines and for the development of better search systems that give the user the best possible search experience.

Introduction

Web search engines have become important for information seeking in many different contexts (e.g., personal, business, and scientific). Questions not yet answered satisfactorily are how well these engines perform with regard to user expectations and which measures should be used to obtain an overall picture of search engine quality. It is well known that search engine quality in its entirety cannot be measured with traditional retrieval measures alone.
But the development of new, search-engine-specific measures, as proposed in Vaughan (2004), is not sufficient either. Search engine quality must be defined more broadly and must integrate factors beyond retrieval performance, such as index quality and the quality of the search features. One neglected aspect is the user. To discuss and judge the quality of search engines, it is important to focus on the users of such systems as well. Better ranking algorithms or additional services do not always lead to user satisfaction or to better search results. We will focus on search engine user behaviour to derive strategies for measuring search engine quality.

Additionally, quality assurance is an important aspect of improving customer satisfaction and loyalty. This is fundamental to protecting market shares and advertising revenues. Furthermore, quality measurement helps to identify potential improvements to search engines. We are convinced that only an integrated approach to quality measurement can lead to results usable for the development of better search engines. As in information retrieval in general, we find a paradigm shift from the more technical (document-oriented) perspective to the user-oriented perspective (cf. Ingwersen & Järvelin, 2005). Our goal in this chapter is to define the scope of our perspective in comparison to other approaches and to give a literature overview of quality measurements for search engines. We will also examine each individual factor identified in studies dealing with user interaction with search engines and user expectations of search engines. The integrated view of user and technical aspects shows that there are many possibilities, but they are not yet widely adopted.

Our chapter first gives an overview of studies conducted to derive quality measures and presents the state of the art.
The second focus in this section lies on user surveys and analyses, to give an impression of what users actually do when placing search queries. In section 3 we give a general overview of the parameters we deduced from our literature research and explain them briefly. In section 4 we present empirical results that reflect the current quality standard of search engines according to our individual measures. In the last section we summarize our findings and suggest potential strategies for improving search engines.

Many of the empirical findings stem from our own research conducted over the past years. Our integrated view of search engine quality measurement is reflected in the different research areas of the authors.

Related studies

In this section, we discuss studies dealing with search engines in the given context. The two areas relevant for extensive search engine quality measurement are, first, the concept of information quality in general and its transfer to search engines as a technical background, and, second, user studies that show what happens at the front end. Each will be discussed under a separate heading.

Search engine quality

Referring to information quality, one usually appraises information on the basis of a single document or a set of documents. Two perspectives have to be differentiated: firstly, information quality in the production of a database, i.e., how documents or sources are appropriately selected; and secondly, the information quality of the results retrieved by a certain IR system.

While the latter can easily be applied to Web search engines, assuring the quality of the database itself is more difficult. The approach of the major search engines is to index not only a part of the Web but as much of it as possible (or as much as is reasonable under economic aspects). Only certain fractions of the Web (such as spam sites) should be deliberately omitted from the database.
While in the production of databases the process of selecting documents (or sources of documents) can be seen as an important quality aspect, in the context of search engines this process is shifted to the ranking process. Therefore, classic criteria for the selection of documents from a library context do not fit search engines. Only specialized search engines rely on a selection of quality sources (Web sites or servers) for building their indices.

An important point is that quality measurement of search results gives only limited insight into the reliability and correctness of the information presented in the documents. A popular example is documents from Wikipedia, which are often ranked highly by search engines. But there seems to be no agreement among experts on whether Wikipedia content is trustworthy or not. A normal user has only a limited chance of scrutinising these documents. In this context, perceived information quality is more a matter of trust. Within the wider context of search engine evaluation, it is possible to build models based entirely on trust (Wang, Xie, & Goh, 1999), as explained later on.

When discussing the quality of search results, one should also keep in mind how search engines determine relevance. They mainly focus on popularity (or authority) rather than on what is commonly regarded as quality. It should be emphasized that no human reviews are involved, neither in the process of selecting documents to be indexed by the engines nor in the ranking process. But a certain bias can be found inherent in the ranking algorithms (Lewandowski, 2004b). These rate Web pages (apart from classic IR calculations) mainly by determining their popularity based on the link structure of the Web. The basic assumption is that a link to a page is a vote for that page. But not all links are counted equally; link-based measures take into account the popularity of the linking page itself and its number of outgoing links as well.
This holds true for both of the main link-based ranking algorithms, Google's PageRank (Page, Brin, Motwani, & Winograd, 1998) and HITS (Kleinberg, 1999).

Table 1. Query-independent ranking factors (taken from Lewandowski, 2005a)

Directory hierarchy: Documents on a higher hierarchy level are preferred.
Number of incoming links: The higher the number of incoming links, the more important the document.
Link popularity: Quality/authority of a document is measured according to its linking within the Web graph.
Click popularity: Documents visited by many users are preferred.
Up-to-dateness: Current documents are preferred to older documents.
Document length: Documents within a certain length range are preferred.
File format: Documents written in standard HTML are preferred to documents in other formats such as PDF or DOC.
Size of the Website: Documents from larger Web sites (or within a certain size range) are preferred.

Link-based measures are commonly calculated query-independently, i.e., no computing power is needed to calculate these measures at the moment users place their search queries. Therefore, these measures can be applied very quickly in the ranking process. Other query-independent factors are used as well (see Table 1 and, for a detailed discussion, Lewandowski, 2005a).

The important point here is that the process of ranking Web pages has evolved from query-document matching, based on term frequency and similar factors, to a process in which several quality measurements are also taken into account.

Link-based algorithms are well suited to pushing some highly relevant results to the top of the results list. This approach is oriented towards typical user behaviour: users often view only a few results from the top of the list and seldom proceed to the second or even third page of the results list. Another problem with the calculation of appropriate result lists is the shortness of search queries.
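The link-as-vote idea behind query-independent, link-based measures can be illustrated with a few lines of power iteration in the style of PageRank. This is a simplified sketch, not the production algorithm: the toy graph, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
# Minimal PageRank-style power iteration on a toy link graph.
# A page's score is split evenly among its outgoing links, so a
# vote from a page with many outgoing links counts for less.

def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += d * rank[page] / n
            else:
                share = d * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical toy Web graph: A and C both link to B,
# so B collects the most votes and ranks first.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Because the scores depend only on the link graph, not on any query, they can be computed offline in exactly the sense described above: at query time the engine merely combines the precomputed value with its query-dependent factors.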
Therefore, most ranking algorithms prefer popular pages and the presence of search terms in anchor texts. Although the general user rarely uses advanced search features, this does not make them unnecessary or useless. On the one hand, there are special user groups, such as librarians or information professionals, who conduct complex searches.