11

Quantitative Web History Methods

Anthony Cocciolo

Introduction

This chapter explores how historical research questions, including research questions about the history of the web, can be addressed through quantitative research methods applied to web archives. At a basic level, quantitative methods involve applying mathematical or statistical analysis to numerical data. These techniques can range from simply adding up the occurrences of some word to more sophisticated procedures such as analysis of variance (ANOVA), which will be described in more depth in this chapter. Through the use of web archives, quantitative methods can be used to show patterns and changes over time, thus having utility in addressing historical research questions.

In this chapter, a personal use of quantitative research methods with web archives will be discussed as a way of illustrating how they can be used more broadly (Cocciolo, 2015). In 2014, I was interested in what I perceived as a decreased use of text on the web in favor of image-based content, such as video and photographs. I was particularly interested in what seemed like an erosion of written content online in favor of a form of communication that seemed to share commonalities with children's books, where photographs are accompanied by small amounts of text. In seeing what looked like a movement away from the written word, I was reminded of the work of Walter Ong, who noted the tenacity of orality, or a tendency to attempt to return to an oral culture despite the success and obvious benefits of literacy (Ong, 2002). An oral culture is one without knowledge of literacy, where information, knowledge, and culture are communicated and passed down through means other than the written word, such as oral storytelling, music, and other non-written means. Was the internet, with its newfound ability to easily stream video and high-resolution imagery, allowing for Ong's return to orality?


Although I realized that I could not make such sweeping arguments in an academic study – reviewers would be none too pleased – I was still interested in developing a sound method for determining changes in the amount of text delivered to users over time. To study this, web archives are essential because they contain copies of webpages from the past. Although some countries have extensive web archives of their national domain or other collecting areas, in the United States – which was my main study site – the most extensive web archive is the one kept by the Internet Archive, which it displays through its WayBack Machine. Thus, I knew that if I was to study changes in the presentation of text on websites used by people in the United States, the web archives kept by the Internet Archive would be an essential resource.

Before I discuss how I applied web archives in my research, I will outline the general steps for engaging in quantitative research using web archives. Each of these steps will be described in more detail in the following sections and will draw on this personal example. These steps are:

1 Developing a research question – First, a research question should be developed that can be addressed in full or in part through web archives. Types of questions that can be explored through such methods will be discussed, as well as those that are better suited for other methods.
2 Securing a corpus – Second, a corpus of web archived content that provides coverage of the areas appropriate to the research question ought to be secured, and ways to gain access to such corpora will be discussed.
3 Numerical translation – Third, the corpus needs to be translated into numerical data based on the research question.
4 Analysis – Fourth, using the numerical datasets created in the earlier step, mathematical or statistical analysis techniques can be employed. These can range from simple functions such as summation, average, and standard deviation to more complex analysis, including statistical techniques such as analysis of variance (ANOVA).
5 Drawing conclusions – Fifth, like all research, conclusions should be drawn based on the analysis.

The aim of this chapter is to offer a starting point for historians interested in applying quantitative research methods to web archives to answer historical research questions, using personal experience as an illustration. However, before the stages are discussed, relevant literature on using quantitative research methods with web archives will be introduced.

Relevant Literature

Using quantitative research methods with web archives may be somewhat new to historians. In a traditional sense, historical research involves the close examination of textual records to address historical research questions. The question of whether to use quantitative or qualitative research methods is not generally directed at the historian but rather at the social scientist, such as the psychologist, sociologist, and education researcher. Quantitative and qualitative research – when referred to by social scientists – typically involves studying living people, which may or may not be the case for historians. In social science, qualitative research methods such as interviews and focus groups are often used to get at people's understanding of something that is not well understood, such as motivations or opinions on some topic. Quantitative research can be used to study an issue or topic that may be better understood, but where there is greater interest in seeing how wide or generalizable a given view is. Whereas qualitative research may involve analysis of data such as interview transcripts, quantitative research may involve analysis of numerical data such as that generated from a survey. In this paper, 'quantitative research' is used to refer to performing analysis using numerical and statistical techniques on data, specifically web archives. While this method may not be commonly used by historians, it can be used to help address historical research questions in conjunction with other sources of evidence.


Studying webpages and web-based phenomena using quantitative methods is not new, as it is captured in the research subfield of information science known as webometrics. According to Thelwall and Vaughan, 'Webometrics encompasses all quantitative studies of web-related phenomenon' (2004: 1213). Thelwall (2009) notes that webometrics can be used for studying a variety of web-based phenomena, such as issues relating to election websites, online academic communication, bloggers as amateur journalists, and social networking. The methods can be used for understanding aspects like web impact assessment, citation impact, trend detection, and search engine optimization, among other possible uses. Webometrics grows out of the subfield of information science known as bibliometrics, which uses quantitative analysis to make measurements related to published books and articles, such as citation analysis to determine impact. Related subfields include infometrics, which is the quantitative study of information and can combine analysis of information in whatever form it may occur.

Björneborn and Ingwersen (2004) highlight four main areas of webometric research: 1) webpage content analysis; 2) web link structure analysis; 3) web usage analysis; and 4) web technology analysis. Notably missing from this list is a longitudinal or time-based dimension. However, webometrics researchers highlight the possibilities opened up by web archives. Björneborn and Ingwersen note that 'Web archaeology… could in this webometric context be important for recovering historical Web developments, for example, by means of the Internet Archive (www.archive.org)' (2004: 1217). When webometrics was in its early development in the 2000s, web archives such as the Internet Archive only contained a few years of content, making them less appealing for long-term, longitudinal analysis. However, as web archives have persisted, and notable web archives such as the Internet Archive have surpassed 20 years of crawling websites, new opportunities for historical and longitudinal analysis are increasingly possible.

In the field of communication and media studies, researchers have begun to use web archives to create web histories, which Brügger defines as 'a necessary condition for the understanding of the Internet of the present as well as of new, emerging Internet forms' (2011: 24). Web histories can include studies of multiple facets of the web, such as national domains, which may look at factors such as volume, space, structure, and content, among others (Brügger, 2014).

Stage 1 – Developing a research question or questions

Before progressing further, I must make a quick note on language used in this article, as it varies to some degree by country. By 'homepage', I am referring to the start page or initial page of a website. I also use the term website, which refers to an entire collection of webpages under a given domain. For example, the website 'pepsi.com' is composed of a homepage and other webpages that are hyperlinked together to form the website.

When engaging in quantitative research using web archives, it is necessary to have a research question that lends itself to such methods. For my project mentioned in the introduction, my research questions are the following:

Is the use of text on the World Wide Web declining? If so, when did it start declining, and by how much has it declined?

The above research questions are well suited for quantitative methods using web archives. The first reason is that the questions are essentially quantitative in nature: a 'decline' and by 'how much' are things that can be readily measured numerically by comparing data from some specific year in the past against data from a more recent year.


The second reason is that web archives are the essential resource for seeking answers to the above questions. As the Internet Archive has been archiving the web since 1996 (Goel, 2016), it is possible to use it to analyze homepages for nearly the entire lifespan of the World Wide Web. Although web archives are generally not available for the first few years of the World Wide Web, by the end of the 1990s good web archives – specifically through the WayBack Machine – exist. Thus, web archives work well for studying content from the late 1990s to the present day.

The major limitation when making comparisons between the past and present is that not all present-day webpages existed in the past, and not all past websites continue into the future. Further, some webpages blocked web crawlers because they feared losing control of their content, thus leaving sites like Time Magazine (time.com) poorly represented in 1990s web archives. A further limitation is that most web archives do not copy every webpage within a given domain, but only go a few levels deep off of the homepage. While the Internet Archive has very extensive copies of homepages of top-level domain names for many months and years, webpages several levels below the homepage are less well-represented. Thus, using web archives to make comparisons between homepages is much more feasible than making comparisons against some webpage many levels below the homepage. Understanding the strengths and limitations of a particular web archive necessarily impacts the types of research questions and analysis that it can be used to address.

Although there are an infinite number of possible research questions that may make use of quantitative research methods with web archives, some particular components are more appropriate than others. Research questions that are looking into the occurrence of some 'thing' are particularly noteworthy, which can include the occurrence of a word, image, phrase, visual element, hyperlink, person or place. Basically, if it can be readily identified by a computer or human, it can be used in the research question. Another component can be the co-occurrence of one 'thing' with another. Co-occurrence analysis can factor in the distance between each 'thing', such as word distance, pixel distance, or number of links away. More sophisticated relationships between one or more 'things' can also be studied, such as the nature of sentiments relating one element to another (Liu, 2012). The more complicated the phenomenon – such as sentiment – the more complex the algorithms for identifying it must be. With the increased complexity, there is a greater chance that the algorithm could perform poorly and not correctly identify the sentiment. Thus, simpler phenomena, such as occurrence or co-occurrence, are more straightforward to determine than more sophisticated relationships. Emerging practices that use machine learning and artificial intelligence in algorithms have the potential for identifying complex phenomena and relationships (Kelleher et al., 2015). Although well beyond the scope of this article, machine learning techniques that leverage artificial intelligence have potential application to research questions that are inherently historical in nature.
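To make occurrence and co-occurrence concrete, the short Python sketch below counts how often one word appears in the text extracted from an archived page, and how often it appears within a fixed word distance of a second word. The function names and the sample sentence are illustrative only and are not part of the original study.

    import re

    def occurrences(text, term):
        # Count case-insensitive occurrences of a single word.
        words = re.findall(r"[a-z0-9']+", text.lower())
        return sum(1 for w in words if w == term)

    def co_occurrences(text, term_a, term_b, window=10):
        # Count pairs of term_a and term_b appearing within `window` words of each other.
        words = re.findall(r"[a-z0-9']+", text.lower())
        positions_a = [i for i, w in enumerate(words) if w == term_a]
        positions_b = [i for i, w in enumerate(words) if w == term_b]
        return sum(1 for a in positions_a for b in positions_b if abs(a - b) <= window)

    sample = "The election website published election results alongside video of the results."
    print(occurrences(sample, "election"))                 # 2
    print(co_occurrences(sample, "election", "results"))   # 4 pairs within ten words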
Questions well suited for web archives research could cover the timeframe of the late 1990s to the present, which are the years that are well-represented in some web archives. If earlier years are being included, web archives may need to be augmented with more traditional sources, such as newspapers, magazines, and books. Analysis of such sources can be expedited to some extent by using digitized copies of such works, but print holdings may be needed, as not everything has been digitized nor, if it has, is it necessarily available to researchers.

In a perfectly linear world, researchers move from creating research questions, to devising methods for addressing those questions, to implementing the methodology, analyzing the data, generating results, and drawing conclusions.


However, as many researchers know, the questions are developed in dialectic with the research methods and data sources available; thus each aspect influences the other. Hence, it would not be unusual to refine research questions based on the data that can be secured, or the analysis options available. In the next section, securing the web archive corpus will be discussed.

Stage 2 – Securing a web archive corpus

The next step in the research process is securing or identifying a corpus of web archived webpages that can be used or analyzed to address the research questions. Before obtaining a corpus, it is important to understand the ways in which web archives come into existence. Some of the methods used to archive web content are client-side archiving, server-side archiving, and non-web archiving (Masanès, 2006). Client-side archiving is the most popular form of web archiving and is used by the Internet Archive (2018) to collect webpages for display on the WayBack Machine. In this approach, web crawlers act like normal web users and 'start from seed pages, parse them, extract links, and fetch the linked document', then re-iterate (Masanès, 2006: 23). This method works well for simpler webpages, but can run into difficulty with webpages that exchange content in between webpage loads, an approach to creating web interfaces popularly known as Asynchronous JavaScript and XML (AJAX). Many social media sites make extensive use of this approach, and without special provisions for web archiving, retrieving this content can be challenging. This can be overcome, but may require manual intervention by a skilled web archivist. This approach is also challenging when attempting to download large collections of webpages, which can take a long time to completely download. For example, Masanès notes that 'it will take more than three days to archive a site with 100,000 pages' (2006: 24).
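A minimal sketch of the crawling loop Masanès describes – start from seed pages, extract links, fetch the linked documents, and re-iterate – is given below. It is written in Python with the requests library rather than being a production crawler; the seed URL, depth limit, and delay are illustrative assumptions, and it only follows absolute links.

    import re
    import time
    import requests

    def crawl(seed, max_depth=2, delay=1.0):
        # Breadth-first fetch of pages reachable from the seed, a few levels deep.
        seen = set()
        frontier = [(seed, 0)]
        pages = {}
        while frontier:
            url, depth = frontier.pop(0)
            if url in seen or depth > max_depth:
                continue
            seen.add(url)
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            pages[url] = html
            # Extract absolute links; a fuller crawler would also resolve relative URLs.
            for link in re.findall(r'href="(https?://[^"]+)"', html):
                frontier.append((link, depth + 1))
            time.sleep(delay)  # be polite to the server being archived
        return pages

    # archive = crawl("http://example.com/")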
Some of the limitations of client-side web archiving are overcome by server-side web archiving, in which files are copied directly from the server in conjunction with the site owner's cooperation. This method was used by the Library of Congress to create an archive of Twitter (Osterberg, 2013). The limitations of this approach are re-creating the webpages so that they are authentic to what the user would have experienced, and the extensive effort required to negotiate the transfer of data.

Perhaps the simplest form of web archiving is to create non-web archives, where web content is printed out or converted to a format like Adobe Acrobat PDF or PNG files and stored using something other than the web (e.g., file folders, directories on a computer) (Masanès, 2006). Although this method has some appeal because of its simplicity, it loses the context in which users experienced the content and the way it was navigated using hyperlinks. It also can lose some of the graphical look and feel of the webpage, which is readily evident when most webpages are saved as PDFs or printed out.

In the case of my research project on the decline of text on the web, I was interested in making comparisons between webpages from today and those in the past. Thus, I needed to use webpages that I knew existed in the past and persist until today. I developed a list of 100 popular and prominent websites in the United States that existed from 1999 to 2014 and were available through the WayBack Machine, using indexes like Alexa's Top 500 English-language Website index (Alexa, 2003). Popular and prominent websites were selected – rather than websites that may be obscure and unused – because they may better reflect the interests and desires of the general user population. The list is repeated below in Table 11.1.


Table 11.1  Website categories with respective websites

Consumer products and retail (10): Amazon.com, Pepsi.com, Lego.com, Bestbuy.com, Mcdonalds.com, Barbie.com, Coca-cola.com, Intel.com, Cisco.com, Starbucks.com
Government (11): Whitehouse.gov, SSA.gov, CA.gov, USPS.com, NASA.gov, NOAA.gov, Navy.mil, CDC.gov, NIH.gov, USPS.com, NYC.gov
Higher education (10): Berkeley.edu, Harvard.edu, NYU.edu, MIT.edu, UMich.edu, Princeton.edu, Stanford.edu, Columbia.edu, Fordham.edu, Pratt.edu
Libraries (8): NYPL.org, LOC.gov, Archive.org, BPL.org, Colapubliclib.org, Lapl.org, Detroit.lib.mi.us, Queenslibrary.org
Magazines (12): USNews.com, TheAtlantic.com, NewYorker.com, Newsweek.com, Economist.com, Nature.com, Forbes.com, BHG.com, FamilyCircle.com, Rollingstone.com, NYMag.com, Nature.com
Museums (10): SI.edu, MetMuseum.org, Guggenheim.org, Whitney.org, Getty.edu, Moma.org, Artic.edu, Frick.org, BrooklynMuseum.org, AMNH.org
Newspapers (9): NYTimes.com, ChicagoTribune.com, LATimes.com, NYDailyNews.com, Chron.com, NYPost.com, Suntimes.com, DenverPost.com, NYPost.com
Online service (8): IMDB.com, MarketWatch.com, NationalGeographic.com, WebMD.com, Yahoo.com, Match.com
Technology site (11): CNet.com, MSN.com, Microsoft.com, AOL.com, Apple.com, HP.com, Dell.com, Slashdot.org, Wired.com, PCWorld.com, IBM.com
Television (11): CBS.com, ABC.com, NBC.com, Weather.com, PBS.org, BBC.co.uk, CNN.com, Nick.com, MSNBC.com, CartoonNetwork.com, ESPN.go.com
Total: 100

Although the Internet Archive has archived webpages from 1996 onward, the range of webpages archived improved as years advanced, and by 1999 many more websites were being archived than in 1996. Thus, I would begin my comparisons at the year 1999. To show changes over time, I would analyze those websites every three years (1999, 2002, 2005, 2008, 2011, and 2014). In sum, all websites needed to continuously exist between 1999 and 2014, and all those in Table 11.1 met that criterion.

The way that I chose to secure the archived webpages was to use Memento (2018). Memento is a technical framework aimed at a better integration of the current and the past Web, and provides a way to issue requests and receive responses from web archives (Van de Sompel et al., 2009). For example, you can submit a URL and date to the web service, and it will bring back the URL for the web archived content. The URL returned can be from a variety of web archives, but for all my sites submitted, the Internet Archive was the web archive that was brought back by Memento. I developed a PHP script that issued requests to the Memento web service for the 100 websites for six years, and in each case it returned a URL of the content from the Internet Archive, thus producing 600 web archived pages. Note that websites originally included in the 100 that were not archived for any given year were removed and replaced with a website from the respective category that had all years archived. Thus, each homepage was inspected manually to ensure that it was there and was not some type of error page that would throw off the analysis.
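The original script was written in PHP; as an illustration, the Python sketch below performs the same kind of lookup against the Memento aggregator's TimeGate, which answers a dated request with a redirect to the closest archived copy (Van de Sompel et al., 2009). The endpoint pattern and the reliance on the Location header are assumptions that should be checked against the service's current documentation.

    import requests

    def closest_memento(url, accept_datetime):
        # Ask the Memento aggregator's TimeGate for the archived copy closest to a date.
        # accept_datetime must be an HTTP date, e.g. 'Mon, 01 Feb 1999 00:00:00 GMT'.
        # Assumes the aggregator TimeGate lives at timetravel.mementoweb.org/timegate/<uri>.
        timegate = "http://timetravel.mementoweb.org/timegate/" + url
        response = requests.get(timegate,
                                headers={"Accept-Datetime": accept_datetime},
                                allow_redirects=False)
        # A TimeGate typically replies with a redirect whose Location header points
        # at the closest memento (often a web.archive.org URL).
        return response.headers.get("Location")

    # Example: the pepsi.com homepage as close to 1 February 1999 as possible.
    # print(closest_memento("http://www.pepsi.com/", "Mon, 01 Feb 1999 00:00:00 GMT"))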


Web archived webpages (the HTML and binary files) can be downloaded in a web browser window (using the File -> Save As option), or a script can be used to download the HTML file and related files needed to render the page. For example, the command-line based tool 'wget' makes downloading webpages and related binary files relatively straightforward. Issuing the following command via the Windows command line or Macintosh terminal will download the Internet Archive's July 1997 copy of the homepage of the Pratt Institute's website:

    wget -p -e robots=off https://web.archive.org/web/19970713123416/http://www.pratt.edu/

Note that in the above example, an option is passed to wget to ignore the robots.txt file on the Internet Archive's WayBack Machine, which explicitly blocks all crawlers. It is a strange irony that the site that was built on crawling websites does not allow crawling! However, this is likely not so problematic as long as large amounts of content are not downloaded all at once, which can place strain on a webserver. In fact, the above request will only download a small amount of data. If too many requests are being issued too quickly, it is likely that you could be temporarily blocked. The 'p' option included with the wget command ensures that the crawler downloads all the page pre-requisites, such as GIF and JPG files referenced in the HTML page.

In my case, I was not interested in getting the HTML and related binary files (e.g., JPGs, GIFs), but wanted large visual presentations of the webpage. This was because the method I intended to use to determine which parts of the webpage were graphics and which were text was a computer vision algorithm, which will be discussed more in the next section. To download the full print-screens, I used a simple extension called 'Grab Them All', which creates full webpage screenshots as PNG files using a seed list of URLs (Grab Them All, 2018). I was able to give it a seed list of 600 URLs, and in half an hour I had large PNG visual representations of those archived webpages.

Brügger writes that a web archived website is typically faulty and deficient compared with the original (Brügger, 2008). Web archived websites can have problems with the links or with the content displayed, among other possible issues. One advantage of creating a graphic version of a web archived webpage is that it stabilizes it: since a PNG file can only be opened and rendered one way, there is no chance that it will be displayed differently on different computers or browsers. However, it has the disadvantage of eliminating the special functions of webpages (e.g., hyperlinks, interactive content, moving image content, etc.). Thus, while it eliminates some problems (e.g., browser obsolescence that may render some functions inoperable), it introduces new limitations (e.g., transforming an inherently interactive medium into a static image).

For projects where only the text is important and not the HTML or related binary files, scripts can be developed which remove the HTML. For example, a regular expression script, implemented in PHP or Python, can remove all HTML markup from a file, leaving only the readable text. Further, the PHP function strip_tags() can remove HTML information from a webpage, leaving only the visible text. Lastly, specialized libraries, such as the Beautiful Soup library for Python, can be used for extracting information from HTML and XML files (Beautiful Soup, 2018; PHP.net, 2018).
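As an illustration of the Beautiful Soup route, the Python sketch below reads a saved HTML file, drops script and style elements, and keeps only the visible text; the file name is hypothetical.

    from bs4 import BeautifulSoup  # the Beautiful Soup library (Beautiful Soup, 2018)

    def visible_text(html):
        # Return only the human-readable text of an HTML document.
        soup = BeautifulSoup(html, "html.parser")
        for element in soup(["script", "style"]):
            element.decompose()  # markup for the browser, not text shown to users
        return " ".join(soup.get_text(separator=" ").split())

    with open("pratt_19970713.html", encoding="utf-8", errors="ignore") as f:
        print(visible_text(f.read()))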
When engaging in such tasks as stripping the HTML from a file, it is always a good idea to maintain copies of your data in stages. This is because if you realize further in the process that you made an error (e.g., stripped out more content than was expected), you can refer to copies of the data from the earlier stages.

Note that this is only one method to get access to web archived content. This method diverges from the 'big data' approaches to working with web archives, such as using gigabytes, terabytes or even petabytes of web archived content in analyses. To gain access to big datasets of web archived content, it is necessary to work more directly with providers, rather than downloading small bits from the web.


For example, the non-profit organization Common Crawl provides researchers access to web archived data (Common Crawl, 2018). Since these datasets are so large, it can be inefficient to make copies of the data, and thus sites like Common Crawl allow users to run their analysis against the cloud-based data using computing services provided by Amazon.

Before discussing big-data approaches in more detail, it is important to make a distinction between data and metadata. Metadata can be defined as 'data about data'. A familiar example is that a telephone conversation could be considered the data, but the phone number dialed to enable the connection and the duration of the call might be considered metadata. In the case of web archives, the data could be the HTML of a webpage, where the metadata may be the URL and the date the copy of the HTML was made.

Common Crawl provides users with access to the web archive data and metadata in three formats: WARC files, a metadata-only format (WAT) and a text-only format (WET). WARC files are a standards-based, text-based format for representing web-crawled webpages. WARC files include the HTML for crawled webpages, metadata on the crawl (e.g., what day and time the site was crawled), and the binary files encoded in the text-based format. WARC files can be large and difficult to deal with, especially if you are not using all the data provided in them. For example, all the JPGs, GIFs or other binary data that are part of a webpage get encoded in WARC files, easily making the majority of WARC files comprise nonsensical content, because the binary data is not human-readable but machine-readable. If the binary data, such as images, are not necessary for a particular research project, the other formats Common Crawl offers may be better options. These include the metadata-only format (WAT), which provides information like the page title and outgoing links on the page, making it useful for creating networks of webpage linkages. The last format provided is the text-only format (WET), which removes the HTML, JavaScript, and other parts of a webpage that are not shown to users via a web browser. This format can be especially useful if you are interested in the textual content on a webpage.

One limitation of Common Crawl is that its earliest data is from 2008 and 2009, stored in ARC format, an earlier web archiving format that preceded WARC. ARC is very similar to WARC, and many tools support both formats, so this should not be a major impediment. However, because this data only begins in 2008, it would not be useful for studying the earliest years of the web. To study webpages from the 1990s, the Internet Archive is likely the best source.
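WARC and ARC files can also be read programmatically. The chapter does not prescribe a particular tool; the sketch below assumes the open-source Python library warcio, which handles both formats, and simply lists the URL and crawl date of every archived HTML response in a file (the file name is hypothetical).

    from warcio.archiveiterator import ArchiveIterator  # assumes the warcio library is installed

    def list_html_captures(path):
        # Print the target URL and crawl date of each HTML response in a WARC/ARC file.
        with open(path, "rb") as stream:
            for record in ArchiveIterator(stream):
                if record.rec_type != "response":
                    continue  # skip request records, metadata, and so on
                headers = record.http_headers
                content_type = headers.get_header("Content-Type") if headers else ""
                if content_type and "text/html" in content_type:
                    print(record.rec_headers.get_header("WARC-Target-URI"),
                          record.rec_headers.get_header("WARC-Date"))

    # list_html_captures("example.warc.gz")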
Stage 3 – Numerical translation

Once the corpus has been secured, whether this comprises WARC files, metadata of web crawls, text from crawls, screenshots of webpages, or other manifestations of data from web archives, it is necessary to begin to prepare the data for numerical analysis. Web archives have many facets, and thus the way in which the data gets translated into numerical data is going to depend very much on the research questions. In the case of my research project described earlier, I was interested in using the 600 full-length screenshots, representing 15 years of homepages from 100 popular and prominent websites, to see if the amount of text presented to users was declining. Thus, I needed a method to decipher the textual content from other content, such as images, videos or whitespace.

I ended up modifying an open-sourced extension called Project Naptha (2018), which could be used for detecting blocks of text within an image. This extension implemented an innovative computer vision algorithm called the Stroke Width Transform (SWT). SWT was created by a Microsoft research team who observed that the 'one feature that separates text from other elements of a scene is its nearly constant stroke width' (Epshtein et al., 2010: 2963). During their initial evaluation, the algorithm was able to identify text regions within natural images with 90% accuracy.


For example, Figure 11.1 shows this process used on the Library of Congress webpage from the Internet Archive's 2002 collection, with the black boxes identifying the text regions. The algorithm is not without minor inaccuracies. It has identified part of the dome incorrectly as a text area, as well as some other very small areas. Nevertheless, it has an accuracy that is consistent with the findings of the Microsoft researchers. A second example provided is that of the White House website from 2002 using this same process (shown in Figure 11.2).

Using the bounding boxes produced by Project Naptha, a percentage of webpage text to non-text was computed and recorded into a MySQL database.

Figure 11.1  Library of Congress website from year 2002, with text areas highlighted with black bounding boxes. Webpage is 23.33% text using this method.


For example, the webpage shown in Figure 11.1 is 23.33% text, whereas the webpage shown in Figure 11.2 is 46.10% text, which indicates what is readily visible: that Figure 11.2 is more text-heavy than Figure 11.1.

Thus, at this point in my research project, I had created numerical data from 600 screenshots of webpages. The data included URL, percentage of text to non-text, and year.

Figure 11.2  WhiteHouse.gov from 2002, with text areas highlighted with black bounding boxes. Webpage is 46.10% text using this method.
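The percentage itself is straightforward to compute once the bounding boxes are known. The sketch below is an illustration rather than the modified Project Naptha code: it rasterizes the detected text boxes onto a pixel mask with NumPy, so overlapping boxes are not double counted, and reports the share of the page they cover. The page dimensions and box coordinates are invented values.

    import numpy as np

    def text_percentage(page_width, page_height, boxes):
        # Percentage of the page covered by text bounding boxes, where boxes is a
        # list of (x, y, width, height) tuples in pixels from a text detector.
        mask = np.zeros((page_height, page_width), dtype=bool)
        for x, y, w, h in boxes:
            mask[y:y + h, x:x + w] = True  # mark text pixels; overlaps count once
        return 100.0 * mask.mean()

    # A 1024 x 3000 pixel screenshot with two detected text regions.
    print(round(text_percentage(1024, 3000, [(50, 100, 600, 40), (50, 200, 400, 300)]), 2))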


This is only one narrow slice of data that can be generated, and other numerical data can be derived. For example, this could include recording information like word counts, properties of images or videos, relative amount of executable code to HTML on webpages, and file sizes, among many other possible numerical properties.

Other methods for translating web archives into numerical data include the use of topic modeling, text mining, and natural language processing (NLP) tools (Graham et al., 2016). NLP is a research field within the discipline of Computer Science, and has a number of applications that can readily produce numeric data. These can include named entity recognition, such as identifying the number of references to a specific person, place or thing; sentiment analysis; and topic identification (Jurafsky and Martin, 2008). A number of free tools and open toolkits are available for engaging in natural language processing (Bird et al., 2018; Natural Language Toolkit, 2018; Open NLP, 2018; Stanford Core NLP, 2018). All of these tools require some experimentation, as well as verification that they are producing the desired result. If such tools are being used, it is important to verify that they are working correctly, such as correctly identifying items from a pre-existing list or accurately identifying a sentiment. Evaluating the utility of these NLP tools can be accomplished through sampling the source material and the resulting outcome to verify that they are working as expected. This is especially important if the method being used has not been proven to work elsewhere. Although there is no hard and fast rule of how large the sample must be, I suggest at least 10% for unproven methodologies. The evaluation can be enhanced by using two independent evaluators to measure how well the tools are working on the sample data. Ensuring this consistency is often referred to as interrater reliability.
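A sketch of that verification step follows: draw a 10% random sample of the tool's outputs for manual review, have two evaluators judge each sampled item, and report how often they agree. Simple percentage agreement is shown here; more formal interrater statistics, such as Cohen's kappa, can be computed from the same labels. The example labels are illustrative.

    import random

    def sample_for_review(items, fraction=0.10, seed=42):
        # Draw a reproducible random sample (default 10%) of items for manual checking.
        rng = random.Random(seed)
        k = max(1, int(len(items) * fraction))
        return rng.sample(items, k)

    def percent_agreement(labels_a, labels_b):
        # Share of sampled items on which two evaluators gave the same judgement.
        matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
        return 100.0 * matches / len(labels_a)

    # Each evaluator marks whether the NLP tool's output was correct for a sampled item.
    evaluator_1 = [True, True, False, True, True]
    evaluator_2 = [True, True, True, True, True]
    print(percent_agreement(evaluator_1, evaluator_2))  # 80.0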
Stage 4 – Analysis

Once web archives have been used to generate numerical data, this data can be used in analysis. In the research project described here, I was interested in knowing if the amount of text online was declining. I had 600 data points, which included URL, year, and percentage of text to non-text, generated using the computer vision algorithm described in the earlier step. An analysis step that I was interested in was computing the average percentages for each year available: 1999, 2002, 2005, 2008, 2011, and 2014. Although this analysis could be readily achieved in any spreadsheet program such as Microsoft Excel, I used the statistics package SPSS. However, average values alone are not enough evidence that text is rising and falling. For example, say the year 2014 was 30% text because all webpages were approximately 30% text, whereas in 1999 the average was also 30% but half of the webpages were 15% text and the other half 45% text. In cases such as this one, the standard deviation statistic is important because it clarifies the extent to which the data diverges from the mean, and indicates how the mean should be interpreted. Microsoft Excel can also be used for computing standard deviation, but again I used SPSS. Table 11.2 shows the results of both these computations, and a visualization of the values is shown in Figure 11.3.

Table 11.2  Mean percentage of text on a webpage per year, with standard deviation values

Year    Mean percentage of text on a webpage    Standard deviation
1999    22.36                                    15.45
2002    30.89                                    14.93
2005    32.43                                    14.60
2008    31.31                                    15.88
2011    28.51                                    15.47
2014    26.88                                    13.23
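Outside SPSS or Excel, the same summary can be produced with a few lines of Python: group the data points by year and report the mean and sample standard deviation of the text percentage for each group. The records below are placeholders for the full dataset (the first two percentages are the values reported for Figures 11.1 and 11.2; the rest are invented).

    import statistics
    from collections import defaultdict

    # Each record is (url, year, percentage of text).
    records = [
        ("http://www.loc.gov/", 2002, 23.33),
        ("http://www.whitehouse.gov/", 2002, 46.10),
        ("http://www.pratt.edu/", 1999, 18.20),
        ("http://www.nypl.org/", 1999, 26.75),
    ]

    by_year = defaultdict(list)
    for url, year, pct in records:
        by_year[year].append(pct)

    for year in sorted(by_year):
        values = by_year[year]
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0  # sample standard deviation
        print(f"{year}: mean {mean:.2f}, standard deviation {sd:.2f}")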


Figure 11.3  Percentage of text on webpages.
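Figure 11.3 itself is not reproduced here, but a comparable visualization can be generated from the Table 11.2 values with a plotting library; the sketch below uses matplotlib, which is not mentioned in the chapter, purely as an illustration.

    import matplotlib.pyplot as plt

    years = [1999, 2002, 2005, 2008, 2011, 2014]
    mean_pct = [22.36, 30.89, 32.43, 31.31, 28.51, 26.88]   # means from Table 11.2
    std_dev = [15.45, 14.93, 14.60, 15.88, 15.47, 13.23]    # standard deviations from Table 11.2

    plt.errorbar(years, mean_pct, yerr=std_dev, fmt="o-", capsize=4)
    plt.xlabel("Year")
    plt.ylabel("Mean percentage of text on a webpage")
    plt.title("Percentage of text on webpages")
    plt.savefig("figure_11_3.png")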

In addition to mean and standard deviation statistics, additional statistical work can be undertaken to provide greater meaning to the average values. In this case, I was interested in whether the means (or the average amount of text on a webpage) were dependent on the year they were produced, or if these means were simply random. Although standard deviation can measure variation, statistical significance tests can ensure that this variation is not merely chance but is dependent on some other variable or variables. In this case, that variable is the year the webpage was produced, which is referred to as the independent variable because it is a fact that does not depend on other variables. The percentage of text on a webpage could then be considered a dependent variable, whose value is hypothesized to be dependent on the year the webpage was produced.

When engaging in statistical tests, a finding of 'statistical significance' indicates that the dependent variable's values are not pure chance but are influenced one way or another by the independent variable. In this project, the specific test I used was the one-way analysis of variance, which is often referred to as one-way ANOVA. SPSS facilitates the computation work for this test, and provides results that can be interpreted to conclude whether there is indeed a statistically significant relationship between these variables. When using statistical tests like ANOVA, the existence of a statistically significant relationship is determined by the 'p-value' or probability value. Although explaining how p-values work is beyond the scope of this article, the ANOVA for this particular research produced a p-value of less than .0005, which led to the conclusion that there was a statistically significant relationship between the percentage of text on a webpage and the year it was produced. In the case of the research project discussed here, a 'one-way ANOVA revealed that the percentage of text on a webpage are not chance occurrences but rather this percentage is dependent on the year the website was produced' (Cocciolo, 2015).
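SPSS is only one way to run the test. The sketch below performs a one-way ANOVA with SciPy, which is not used in the original study, treating each year's list of text percentages as one group; the values shown are invented, but the F statistic and p-value are interpreted in the same way as the SPSS output.

    from scipy import stats

    # One list of text percentages per year (illustrative values, not the study data).
    groups = {
        1999: [15.2, 22.4, 30.1, 18.9],
        2002: [28.7, 35.0, 26.4, 33.2],
        2005: [31.9, 36.5, 29.8, 30.7],
    }

    f_statistic, p_value = stats.f_oneway(*groups.values())
    print(f"F = {f_statistic:.2f}, p = {p_value:.4f}")
    # A p-value below a chosen threshold (commonly .05) indicates that the
    # differences between the yearly means are unlikely to be due to chance alone.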


It should be acknowledged that many historians may not be particularly well-versed in statistics. I received doctoral training at a school of education, where statistics coursework is generally required. However, historians interested in using big data can become more experienced in statistics through their own independent study or through formal coursework. As I do not always use statistical tests in my research, I find myself having to brush up on how to use such tests and how to implement them in computer software, such as SPSS. However, I have found some resources useful, most notably the Laerd Statistics (2018) tutorials on the web. Although they provide some free information, most of it is available via a monthly subscription. It provides comprehensive discussions of all the different types of statistical tests, with plenty of examples, as well as information on how to implement the tests in SPSS and interpret the results. The subscription costs are low and well worth the small investment. Consulting resources such as this, as well as other resources such as statistics textbooks (e.g., Mendenhall et al., 2012), can be useful in understanding how statistical tests work and how the results should be interpreted, including knowing whether there is a statistically significant relationship between the variables. A further option is to explore online courses, such as those available through Khan Academy and Coursera.

Note that there are some limitations to using statistics. If the dataset is small, such as under 30 data points, statistics may not be the best tool and results can be augmented with qualitative information. The study described here could be significantly enhanced by making use of a dataset larger than 600 data points. When using statistics, in general, large datasets are better than small ones, so 6,000, or even six million, data points could enhance the overall significance of the study.

Stage 5 – Drawing Conclusions

Before drawing conclusions from research using quantitative methods, it is necessary to describe the limitations. As mentioned earlier, quantitative methods are well-suited for large datasets, such as ones with at least 30 data points. Beyond concerns of dataset size, other limitations can be articulated in the conclusion. For example, in the research study described here, one limitation was that it only included websites that were popular and prominent in the United States, and thus the changes to the composition of those webpages over time may not be a universal phenomenon but rather US-specific. Although webpages in the United States have significant use from individuals outside of the United States, being able to highlight how aspects like website selection impact the conclusions that can be drawn is necessary. Further, issues around the limitations of web archiving – or what got archived and what got missed – could lead to erroneous conclusions. For example, in this study I noted that some popular and prominent websites – such as Time Magazine – are poorly web archived in the WayBack Machine because of limitations defined by Time in its robots.txt file.

The conclusion reached in the study was that 'the percentage of text on the Web climbed during the turn of the twentieth century, peaked in 2005, and has been on the decline ever since' – with the caveat that this finding is based on popular and prominent websites in the United States (Cocciolo, 2015). Other issues arising from limitations of the web archive corpus can also be discussed when drawing conclusions from the research.

Conclusion

In conclusion, this paper offered a starting point for historians interested in using quantitative research methods with web archives.


Although the use of such methods requires some study of statistical research methods, as well as implementing scripts using tools like PHP and Python, these can all be readily learned using resources described in this chapter, including online resources. Such methods allow for making sense of large amounts of data and addressing historical research questions with precision. In some cases, the best option may be engaging with others who have the necessary expertise to study the phenomena. These can include computer scientists, statisticians, and those with access to and facility with web archives. Through such collaborations, historians can open up exciting new research avenues using web archives.

References

Alexa (2003) 'Alexa's Top 500 English-language Website index (2003 web archive)'. Available at: https://web-beta.archive.org/web/20031209132250/http://www.alexa.com/site/ds/top_sites?ts_mode=lang&lang=en [18 February 2018].
Beautiful Soup Python Library (2018). Available at: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ [18 February 2018].
Bird, S., Klein, E., and Loper, E. (2018) Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. Available at: http://www.nltk.org/book/ [19 February 2018].
Björneborn, L., and Ingwersen, P. (2004) 'Toward a basic framework for webometrics', Journal of the American Society for Information Science and Technology 55(14): 1216–1227.
Brügger, N. (2008) 'The archived website and website philology: A new type of historical document', Nordicom Review 29(2): 155–175.
Brügger, N. (2011) 'Web Archiving – Between Past, Present and Future', in M. Consalvo and C. Ess (Eds.), The Handbook of Internet Studies. Malden, MA: Wiley-Blackwell. pp. 24–42.
Brügger, N. (2014) 'Probing a Nation's Web Domain: A New Approach to Web History and a New Kind of Historical Source', in G. Goggin and M. McLelland (Eds.), The Routledge Companion to Global Internet Histories. New York: Routledge. pp. 61–73.
Cocciolo, A. (2015) 'The rise and fall of text on the Web: A quantitative study of Web archives', Information Research 20(3). Available at: http://www.informationr.net/ir/20-3/paper682.html [19 February 2018].
Common Crawl (2018). Available at: http://commoncrawl.org/ [18 February 2018].
Epshtein, B., Ofek, E., and Wexler, Y. (2010) 'Detecting text in natural scenes with stroke width transform', in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA. New York, NY: IEEE. pp. 2963–2970.
Goel, V. (2016) 'Defining Web pages, Web sites and Web captures', Internet Archive Blogs. Available at: https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ [19 February 2018].
Grab Them All (2018). Available at: https://addons.mozilla.org/en-US/firefox/addon/grab-them-all/ [19 February 2018].
Graham, S., Milligan, I., and Weingart, S. (2016) Exploring Big Historical Data: The Historian's Macroscope. London: Imperial College Press.
Internet Archive (2018) The WayBack Machine (http://archive.org).
Jurafsky, D., and Martin, J.H. (2008) Speech and Language Processing, 2nd edition. New York: Pearson Prentice Hall.
Kelleher, J.D., Mac Namee, B., and D'Arcy, A. (2015) Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples and Case Studies. Cambridge, MA: MIT Press.
Laerd Statistics (2018). Available at: https://statistics.laerd.com/ [19 February 2018].
Liu, B. (2012) Sentiment Analysis and Opinion Mining. San Rafael, CA: Morgan & Claypool.
Masanès, J. (2006) 'Web Archiving: Issues and Methods', in J. Masanès (Ed.), Web Archiving. Berlin: Springer.
Memento (2018) 'Time Travel'. Available at: http://timetravel.mementoweb.org/ [19 February 2018].


Mendenhall, W., Beaver, R.J., and Beaver, B.M. (2012) Introduction to Probability and Statistics. Stamford, CT: Duxbury Press.
Natural Language Toolkit (2018). Available at: http://www.nltk.org/ [19 February 2018].
Ong, W.J. (2002) Orality and Literacy: The Technologizing of the Word. London: Routledge.
Open NLP (2018). Available at: http://opennlp.sourceforge.net/projects.html [19 February 2018].
Osterberg, G. (2013) 'Update on the Twitter Archive at the Library of Congress', Library of Congress Blog. Available at: http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/ [19 February 2018].
PHP.net (2018) 'strip_tags'. Available at: http://php.net/manual/en/function.strip-tags.php [18 February 2018].
Project Naptha (2018). Available at: https://projectnaptha.com/ [18 February 2018].
Stanford Core NLP (2018). Available at: http://stanfordnlp.github.io/CoreNLP/ [19 February 2018].
Thelwall, M. (2009) Introduction to Webometrics: Quantitative Web Research for the Social Sciences. San Rafael, CA: Morgan & Claypool.
Thelwall, M., and Vaughan, L. (2004) 'Webometrics: An introduction to the special issue', Journal of the American Society for Information Science and Technology 55(14): 1213–1215.
Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., and Shankar, H. (2009) 'Memento: Time travel for the Web'. Available at: http://arxiv.org/abs/0911.1112 [18 February 2018].
