
Webometrics and Altmetrics: Home Birth vs. Hospital Birth

Michael Thelwall

1 Introduction

Almost two decades ago it became apparent that scientometric techniques that had been developed for citation databases could also be applied to the web. This triggered two reactions: optimism and pessimism. On the one hand, a number of optimistic articles were written by Ray Larson, Tomas Almind, Josep-Manuel Rodríguez-Gairín, Peter Ingwersen, Blaise Cronin, Judit Bar-Ilan, Isidro Aguillo, Lennart Björneborn, Alastair Smith, and Ronald Rousseau that described or developed new ways in which the web could be used for scientometrics. On the other hand, more critical articles were written by Anthony van Raan and Leo Egghe that focused on the problems that webometric methods were likely to face. This chapter argues that these two approaches represented different theoretical perspectives on scientometrics and that time has shown both to be valid. It concludes by contrasting the academic literature that gave birth to webometrics with the coherent set of goals that announced altmetrics on a public website.

2 The Birth of Webometrics

Webometrics is an information science research field concerned with quantitative analyses of web data for various purposes. It was born from scientometrics, the quantitative analysis of science, with the realization that, using commercial search engines, some standard scientometric techniques could, and perhaps should, be applied to the web. The basic idea for webometrics apparently dawned on several people at almost the same time (Aguillo, 1998; Almind & Ingwersen, 1997; Cronin, Snyder, Rosenbaum, Martinson, & Callahan, 1998; Larson, 1996; Rodríguez-Gairín, 1997), although the most developed concept was that of Almind and Ingwersen (1997), who explicitly compared the web to traditional citation indexes and coined the term webometrics.
DOI 10.1515/9783110308464-019, © 2020 Michael Thelwall, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License.

The idea driving the birth of webometrics was that many scientometric studies had analyzed information about sets of scientific documents extracted from a publication database or citation index, and that the same methods could be adapted for the web. For example, the productivity of one or more departments could be assessed by counting the number of documents produced by them, as recorded in an international scientific database, but on the web the number of web pages produced by them could also be counted, either directly or using an appropriate search engine query. Perhaps more usefully, evaluative scientometric studies often assessed the impact of the documents produced by one or more researchers by counting citations to them. On the web, the impact of collections of documents could be assessed to some extent by counting the number of hyperlinks pointing to them (Ingwersen, 1998). In addition, new types of study of scientific impact were possible with the web because of the increasingly many activities that were recorded online, such as education, public engagement, and talks. Hence, for example, it might be possible to trace the wider impact of individual academics by counting and analysing how often they were mentioned online (Cronin et al., 1998).

There were two different reactions to the emergence of webometrics: enthusiasm and scepticism.
The enthusiastic perspective described the range of studies that might be possible with webometrics and concentrated on its potential applications and strengths (Björneborn & Ingwersen, 2001; Borgman & Furner, 2002; Cronin, 2001; Leydesdorff & Curran, 2000); the critical perspective instead emphasized the problems inherent in web data and focused on the limitations that webometric methods would necessarily face (Björneborn & Ingwersen, 2001; Egghe, 2000; van Raan, 2001). Drawing from both strands, webometrics emerged as an empirical field that sought to identify limitations with web data, to develop methods to circumvent these limitations, and to evaluate webometric methods through comparisons with related offline sources of evidence (e.g., comparing counts of hyperlinks to journal websites with counts of citations to the associated journals), when possible (Bar-Ilan, 1999; Rousseau, 1999; Smith, 1999).

3 Webometric Theories

Like its mother field scientometrics, webometrics has not succeeded in creating a single unifying theory for its main tasks, but it has produced a few theoretical contributions and one specific new theory. Within scientometrics, a detailed theory to deal with the practice of citation analysis has long been needed (Cronin, 1981), but has not been produced. Instead, practicing scientometricians draw upon a range of different theories (e.g., Merton, 1973; de Solla Price, 1976) and empirical evidence (Oppenheim & Renn, 1978) about why people cite and the biases in citation analysis (e.g., Cronin, 1984) in order to effectively process and interpret citation data (van Raan, 1998). Within webometrics and altmetrics there are speculations about why people cite on the web, as well as some empirical evidence, but both are weaker than for citation analysis.
Although early webometric papers were mainly quite theoretical in terms of discussing the potential for webometrics, the first major clearly articulated theoretical contribution to the mature field was an article that gave it a formal definition as “the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches” (Björneborn, 2004, p. vii), placed the name of the field within a research taxonomy, systematically named the main objects of study for link analysis, and introduced a standard diagram style for link networks (Björneborn & Ingwersen, 2004).

While no article has attempted to create a unifying theory of hyperlinks (or any other aspect of webometrics), a third article proposed a unifying “theoretical framework” for link analysis by specifying the stages that an effective link analysis would have to include, as follows:

1. Link interpretation is required in any link analysis exercise if conclusions are to be drawn about underlying reasons for link creation. An exception would be made for evaluative link analysis if it could be consistently demonstrated that inlink counts correlate with the phenomenon desired to be evaluated.
2. No single method for link interpretation is perfect. Method triangulation is required, ideally including a direct method and a correlation testing method.
3. Fundamental problems, including the “rich get richer” property of link creation and web dynamics, mean that definitive answers cannot be given to most research questions. As a result, research conclusions should always be expressed cautiously.
4. Extensive interpretation exercises are not appropriate because of the reasons given in point 3 above.
(Thelwall, 2006)

The purposes of the above theoretical framework were to create a consensus about the minimum requirements for a published link analysis study and to ensure that the conclusions drawn from future studies would always be tied to the evidence provided in them. In this sense it copies the earlier work of van Raan (1998). Although no new link analysis theories have been proposed since the above, more powerful link analysis methods have been developed (e.g., Seeber, Lepori, Lomi, Aguillo, & Barberio, 2012) and the field has continued to progress.

Within webometrics, link analysis’ younger sibling is search engine evaluation, the analysis of properties of commercial web search engines. This area of study was conceived from its parent fields—scientometrics and informetrics—when it became apparent that link analysis could not survive alone because the results that web search engines gave for queries were inconsistent and sometimes illogical. Search engine evaluation studies attempted to cure this childhood disease by identifying strategies for extracting information that was as reliable as possible (e.g., Bar-Ilan, 2001; Bar-Ilan, 2004), such as by conducting multiple searches and aggregating the results (Rousseau, 1999). In addition, by understanding and mapping the extent of the problem of search engine reliability (e.g., Vaughan & Thelwall, 2004), the results of link analysis studies could be interpreted with an appropriate degree of caution. From a theoretical perspective, this is a very odd type of information science research because the object of study is a computer system (Google, Bing, etc.) created by a small number of humans (mainly computer scientists) that is, in theory, perfectly understandable to researchers. Nevertheless, the exact details of the functioning of all the major commercial search engines are proprietary secrets, thus necessitating empirical analysis.
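The strategy of conducting multiple searches and aggregating the results can be sketched as below. The daily hit counts and the query string are invented for illustration; a real study would collect the figures from a live search engine over several days:

```python
# Illustrative sketch of repeating a search and aggregating the reported
# hit counts to smooth out search engine inconsistency. All figures and
# the query string are hypothetical.

from statistics import median

def aggregate_hits(hit_counts, method="median"):
    """Aggregate repeated hit-count estimates for one query."""
    if method == "median":
        # the median discounts occasional wildly inconsistent estimates
        return median(hit_counts)
    if method == "max":
        # the maximum is sometimes preferred on the assumption that
        # engines under-report rather than over-report matches
        return max(hit_counts)
    raise ValueError(f"unknown aggregation method: {method}")

# Hypothetical hit counts for one query collected on five different days
daily_counts = [1340, 1290, 2050, 1310, 1335]

print(aggregate_hits(daily_counts))           # median smooths the outlier: 1335
print(aggregate_hits(daily_counts, "max"))    # 2050
```

Either aggregate is more stable than any single day's figure, which is the point of the strategy: the analysis then proceeds on the aggregated estimates rather than on one unreliable snapshot.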
This area of research has generated no formal theories at all, perhaps because it seems illogical to theorize about a very small number of complex computer systems. To fill this gap, here are some new theoretical hypotheses about commercial search engines. These are consistent with the history of search engines over the past decade but are clearly not directly testable until one of them fails:

– Incompleteness hypothesis: No major commercial search engine will ever be designed to give an accurate and complete set of