Information Science and Knowledge Management 14
Web Search: Multidisciplinary Perspectives
Edited by Amanda Spink, Michael Zimmer
1st edition 2008. Book. xii, 352 pp. Hardcover. ISBN 978 3 540 75828 0
Format (W x L): 15.5 x 23.5 cm. Weight: 703 g

16 Web Searching: A Quality Measurement Perspective
D. Lewandowski and N. Höchstötter

Summary
The purpose of this paper is to describe various quality measures for search engines and to ask whether they are suitable. We focus especially on user needs and on how users use Web search engines. The paper also presents an extensive literature review and a first quality measurement model. Our findings include that Web search engine quality cannot be measured by retrieval effectiveness (the quality of the results) alone, but should also take into account index quality, the quality of the search features, and Web search engine usability. For each of these areas, we present empirical results from earlier studies as well as from our own research. These results have implications for the evaluation of Web search engines and for the development of better search systems that give the user the best possible search experience.

16.1 Introduction
Web search engines have become important for information seeking in many different contexts (e.g., personal, business, and scientific).
Research questions that have not yet been answered satisfactorily are how well these engines perform with regard to user expectations and which measures should be used to obtain an overall picture of search engine quality. It is well known that search engine quality in its entirety cannot be measured with traditional retrieval measures. But the development of new, search-engine-specific measures, as proposed in Vaughan (2004), is not sufficient either. Search engine quality must be defined more broadly and must integrate factors beyond retrieval performance, such as index quality and the quality of the search features.

One neglected aspect is the user. To discuss and judge the quality of search engines, it is important to focus on the users of such systems as well. Better performance of ranking algorithms or the provision of additional services does not always lead to user satisfaction and better search results. We focus on Web search engine user behaviour to derive strategies for measuring Web search engine quality.

Additionally, quality assurance is an important means of improving customer satisfaction and loyalty. This is fundamental to protecting market shares and advertising revenues. Furthermore, quality measurement helps to identify potential improvements of search engines. We are convinced that only an integrated approach to quality measurement can lead to results usable for the development of better search engines. As in information retrieval in general, we find a paradigm shift from the more technical (document-oriented) perspective to the user-oriented perspective (Ingwersen and Järvelin 2005).

A. Spink and M. Zimmer (eds.), Web Search, Springer Series in Information Science and Knowledge Management 14. © Springer-Verlag Berlin Heidelberg 2008
Our goal in this chapter is to define the scope of our perspective in comparison to other approaches and to give a literature overview of quality measurements for search engines. We will also focus on each individual factor stated in studies dealing with user interaction with search engines and with user expectations of search engines. The integrated approach combining user and technical aspects shows that there are many possibilities, but they have not yet been widely adopted.

Our chapter first gives an overview of studies conducted to derive quality measures and presents the state of the art. The other focus of this section lies on user surveys and analyses, which give an idea of what users actually do when placing search queries. In Sect. 3 we give a general overview of the parameters we deduced from our literature research and explain them briefly. In Sect. 4 we present empirical results that reflect the current quality of search engines according to our individual measures. In the last section we summarize our findings and suggest potential strategies for improving search engines. Many of the empirical findings stem from our own research conducted over the past years. Our integrated view on search engine quality measurement reflects the different research areas of the authors.

16.2 Related Studies
In this section, we discuss studies dealing with search engines in the given context. Two areas are relevant for comprehensive search engine quality measurement: the concept of information quality in general and its transfer to search engines as a technical background, and user studies that show what happens at the front end. Each is discussed under a separate heading.

16.2.1 Search Engine Quality
Referring to information quality, one usually appraises information on the basis of a single document or a set of documents.
Two perspectives have to be differentiated: first, information quality in the production of a database, i.e., how documents or sources are appropriately selected; and second, the information quality of the results retrieved by a given IR system. While the latter can easily be applied to Web search engines, assuring the quality of the database is more difficult. The approach of the major search engines is to index not just a part of the Web, but as much of it as possible (or as much as is reasonable from an economic standpoint). Only certain fractions of the Web (such as spam sites) should be deliberately omitted from the database. While in database production the process of selecting documents (or sources of documents) can be seen as an important quality aspect, in the context of search engines this process is reassigned to the ranking process. Therefore, classic criteria for selecting documents in a library context do not apply to search engines. Only specialized search engines rely on a selection of quality sources (Websites or servers) for building their indices.

An important point is that quality measurement of search results gives only limited insight into the reliability and correctness of the information presented in a document. A popular example is documents from Wikipedia, which are often ranked highly by search engines. However, experts do not seem to agree on whether Wikipedia content is trustworthy. A normal user has only a limited chance of scrutinising these documents. In this context, perceived information quality is more a matter of trust. Within the wider context of search engine evaluation, it is possible to build models based entirely on trust (Wang et al. 1999), as explained later on.

When discussing the quality of search results, one should also keep in mind how search engines determine relevance.
They focus mainly on popularity (or authority) rather than on what is commonly regarded as quality. It should be emphasized that neither the selection of documents to be indexed nor the ranking process involves human review. Nevertheless, a certain bias is inherent in the ranking algorithms (Lewandowski 2004b). Apart from classic IR calculations, these algorithms rate Web pages mainly by determining their popularity based on the link structure of the Web. The basic assumption is that a link to a page is a vote for that page. But not all links should be counted equally; link-based measures also take into account the popularity of the linking page itself and its number of outgoing links. This holds true for both of the main link-based ranking algorithms, Google's PageRank (Page et al. 1998) and HITS (Kleinberg 1999).

Link-based measures are commonly calculated query-independently, i.e., no computing power is needed to calculate them at the moment users place their search queries. Therefore, these measures can be applied very quickly in the ranking process. Other query-independent factors are used as well (see Table 16.1 and, for a detailed discussion, Lewandowski 2005a). The important point here is that the ranking of Web pages has evolved from query-document matching, based on term frequency and similar factors, to a process that also takes several quality measurements into account.

Link-based algorithms are useful for pushing some highly relevant results to the top of the results list. This approach is oriented towards typical user behaviour: users often view only a few results from the top of the list and seldom proceed to the second or even third page of results. Another problem in calculating appropriate result lists is the shortness of search queries. Therefore, most ranking algorithms prefer popular pages and the presence of search terms in anchor

Table 16.1 Query-independent ranking factors (taken from Lewandowski 2005a)

Factor                      Description
Directory hierarchy         Documents on a higher hierarchy level are preferred.
Number of incoming links    The higher the number of incoming links, the more important the document.
Link popularity             Quality/authority of a document is measured according to its linking within the Web graph.
Click popularity            Documents visited by many users are preferred.
Up-to-dateness              Current documents are preferred to older documents.
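The split between query-independent link popularity and query-time matching described above can be illustrated with a minimal sketch. It uses the standard power-iteration formulation of PageRank on a toy link graph given as a Python dict, with a damping factor of 0.85 (the value commonly quoted for PageRank; this chapter does not state one) and with pages lacking outgoing links spreading their score evenly (an assumption). The scores are computed once, offline; at query time they are only looked up and combined with a hypothetical term-frequency score. The names `text_score` and `rank_results`, the normalization by the maximum static score, and the linear weighting are illustrative assumptions, not the method of any actual search engine.

```python
from collections import Counter


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Query-independent link popularity via power iteration (sketch).

    `links` maps each page to the pages it links to. Every page passes its
    current score on in equal shares over its outgoing links, so a link from
    a popular page with few outlinks is worth more than one from an obscure
    page with many outlinks.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: pages without outgoing links spread their score evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


def text_score(query: str, text: str) -> int:
    """Hypothetical query-dependent score: raw frequency of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank_results(query: str, docs: dict[str, str],
                 static: dict[str, float], weight: float = 1.0) -> list[str]:
    """Combine a query-time text score with precomputed static scores.

    Static scores are looked up, not recomputed, at query time. They are
    scaled by the maximum so that the top page gets 1.0 (an assumption).
    """
    top = max(static.values())
    scored = []
    for page, text in docs.items():
        tf = text_score(query, text)
        if tf > 0:  # only documents matching the query are ranked
            scored.append((tf + weight * static.get(page, 0.0) / top, page))
    return [page for _, page in sorted(scored, reverse=True)]


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    docs = {
        "a": "web search engines",
        "b": "search engine quality",
        "c": "web search quality measures",
        "d": "search search",
    }
    static = pagerank(links)  # computed once, offline
    print(rank_results("search quality", docs, static))
    # → ['c', 'b', 'd', 'a']
```

In this toy example, pages b, c and d match the query equally well by term frequency, so the precomputed link scores decide their order: c is linked from three pages, and b outranks d because it receives a link from the popular page a, while d receives no links at all.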