Understanding Computational Web Archives Research Methods Using Research Objects Emily Maemura, Christoph Becker Ian Milligan Digital Curation Institute Department of History University of Toronto University of Waterloo Toronto, Canada Waterloo, Canada
[email protected],
[email protected] [email protected] Abstract—Use of computational methods for exploration and traditionally focused on close readings of source materials analysis of web archives sources is emerging in new disciplines are moving to the use of distant reading methods [2]. such as digital humanities. This raises urgent questions about We highlight three critical factors to be considered for how such research projects process web archival material using computational methods to construct their findings. humanities researchers using web archives: This paper aims to enable web archives scholars to document 1) interrogating sources. – Understanding how these web their practices systematically to improve the transparency of their methods. We adopt the Research Object framework to archives collections were created is key to judging characterize three case studies that use computational methods the adequacy, appropriateness and limitations of the to analyze web archives within digital history research. We then source material. discuss how the framework can support the characterization 2) understanding new methods. – The use of emerging of research methods and serve as a basis for discussions of computational methods is a key prerequisite to work- methods and issues such as reuse and provenance. The results suggest that the framework provides an effective ing with large scale data sets. conceptual perspective to describe and analyze the computa- 3) transparency of the research process. – Findings are tional methods used in web archive research on a high level dependent on the validity of the computational pro- and make transparent the choices made in the process.