Web Archiving and You Web Archiving and Us

Amy Wickner
University of Maryland Libraries
Code4Lib 2018
Slides & Resources: https://osf.io/ex6ny/

Hello, thank you for this opportunity to talk about web archives and archiving. This talk is about what stakes the code4lib community might have in documenting particular experiences of the live web. In addition to slides, I’m leading with a list of material, tools, and trainings I read and relied on in putting this talk together. Despite the limited scope of the talk, I hope you’ll each find something of personal relevance to pursue further.

“the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use”
International Internet Preservation Consortium

To begin, here’s how the International Internet Preservation Consortium or IIPC defines web archiving. Let’s break this down a little. “Collecting portions” means not collecting everything: there’s generally a process of selection. “Archival format” implies that long-term preservation and stewardship are the goals of collecting material from the web. And “serving the archives for access and use” implies a stewarding entity conceptually separate from the bodies of creators and users of archives. It also implies that there is no web archiving without access and use. As we go along, we’ll see examples that both reinforce and trouble these assumptions.

A point of clarity about wording: when I say, for example, “critique,” “question,” or “trouble” as a verb, I mean inquiry rather than judgement or condemnation.

we are collectors

So, preambles mostly over. Between us, we here in this room and on the livestream and so on represent collectors of web-based material, subjects of captured websites, and users of web archives. To confirm, let’s do a quick poll. Please raise your hand if you are a collector of web-based material. Thank you.
Web design & development, Labor, Browser technology, Federal policy, Personal tech, Corporate policy, Digital culture, Information ethics, Web archiving technology, Collaboration, Costs & impact of storage, Attention

As individuals, collectives, and agents of institutions building web archives, we manage many moving parts in attempting to document even a small part of the living web. Here’s a short list of areas that influence our practices. Web development affects the archivability of websites. Personal tech influences how people produce and consume web-based material. Regime change leads to federal policy change like the end of net neutrality, content appearing and disappearing, large-scale collaborations like the End of Term archive, and the Internet Archive moving servers to Canada. Corporate policies and practices like terms of service, DRM, startup churn, and data selling influence both archiving and live use of the web, as we heard Wednesday in Mark Matienzo’s overview of IndieWeb. And trends in ethics include growing discussion of privacy as contextual integrity, particularly in online spaces, as well as the right to be forgotten.

What constitutes archival value is, and will always be, “specific to place, time, culture and individual subjectivity. It does not dangle somewhere outside of humanity, immutable, pristine, transcendent. The appraiser creates, or recreates, archival value with every appraisal exercise.”
Harris 1998

As collectors, we also work within the specific contexts of our biases. This is an appraisal practice, in which collectors assign value to material and take actions accordingly. We aren’t always able to articulate these criteria, nor do we always itemize the actions taken -- colloquially lumping it all together in the word “save.” Appraisal as a fundamental archival practice has been hotly and insularly contested for more than a century -- and I have a syllabus to share if you’re at all curious.
This [POINTS TO SLIDE] is one of the more accessible and increasingly relevant approaches, articulated by Verne Harris in 1998. He argues that appraisal is where power is most concentrated in archivists, and that it’s closer to storytelling than to a science. Trying to articulate that story is one way we can grow as web archivists.

How can we better use web archiving technologies?
Competent, critical, curious
Learn, teach how it works
Foreground labor
Put faces to names behind infrastructure
Blewer 2017; Arquivo.pt 2018

Improving our appraisal also comes down to being not only competent but also critical and curious users of web archiving tools. As starting points, I recommend two blog posts by Ashley Blewer: one that approachably introduces the technical side of popular web archiving frameworks, and one that explains the links between archivability and accessibility. If you find yourself in a position to teach with web archiving, try to scaffold learning around not just how to use things but also how they work.

Let’s also look at how labor impacts web archiving: How much time do you spend on different parts of the process? What kind of work is web archiving? Put faces to the names of developers, archivists, and designers behind what you collect and how. The developers at Arquivo.pt, the Portuguese web archive, put out a pretty honest video last week describing the process behind their work, including some recent struggles and decisions around improving services. Documentation like this gives us insight into the care of web archives.

we are subjects

So that’s an overview of how collectors and web archiving mutually shape one another. Next let’s consider how web archiving impacts subjects represented in web archives. And in fact, much of the power archivists wield is in the description or metadata that tells the story of a collection and its subjects. Please raise your hand if you’re a subject represented in web archives. Trick question: it’s all of us.
How are we represented as subjects?
identity, safety, privacy, access, accessibility, exclusion, harm

To get at why representation in web archives even matters, consider how you and I are represented on the internet. Many of us come face to face every day with the reality that the web is not so friendly to us, that it’s not built for us to use, and that it is designed to propagate and privilege limited, harmful representations, all of which have real impacts on our well-being. Spaces nominally designed for participation -- Twitter, Wikipedia, reddit, I don’t really want to go on -- can be some of the most unwelcoming. There’s the tumultuous experience of trying to manage our own identities and safety online. Appeals to and for the right to be forgotten elicit cries of “Shame!”, of government overreach, and of the dangers of censorship for accountability and democracy. Protecting privacy is now treated as an individual rather than a collective responsibility: the admirable work of the Electronic Frontier Foundation, Library Freedom Project, and more only emphasizes that institutions and their policies currently trend towards a lack of respect for privacy. And maybe they always have.

So. We know the power of archival representation. We know that bias, misrepresentation, and underrepresentation are rampant on the web, including in highly participatory spaces. We’re aware of access and accessibility issues in all of the above. So what warrant could we possibly have to assume web archives would be any different?

Future users, Current users, Audience, Designated community

Web archives perpetuate unresolved issues that affect us as subjects and for which we bear responsibility as collectors. Communicating the context of web archiving to people or robots who might use the results is one way to confront these issues.
There were some raging Western archival debates in the mid-to-late 20th century about whether and how archivists should envision a future user or user community when building collections, and I assume there are small fires of this kind smoldering today. In the digital curation world, and in any system even nominally based on the Open Archival Information System (OAIS), we assume a designated community to justify preserving certain data. It can be illuminating to examine one’s assumptions about audience.

we are users

Now, please raise your hand if you use web archives. Thank you.

How do people use web archives?
remix, critique, “plural and heterogeneous archives,” legal evidence, historical evidence, receipts
Post 2017; Taylor 2017; Belovari 2017; Milligan 2016; Zannettou, et al. 2018

How do people use web archives, anyway? Artists use them as source material for remix and critique, as what Colin Post calls “plural and heterogeneous archives.” Courts are starting to understand the legal uses and limits of web archives as evidence, including whose interests such evidence and cases tend to serve. Historians have used web archives to study political discourse and engagement, among other topics, but it’s also been pointed out that today’s web archives are so little conducive to historical research methods that it may not be strictly accurate to refer to them as “the historical record.” And of course, journalists and the general public use them for RECEIPTS. Maybe you see yourself in one of these categories, although I’ve left so many out.

Let’s think about lines of inquiry we can take as users of web archives, to critically read, much as we learn to critically build.

EXERCISE: postcolonial critique
● How do web archives reflect or suppress values of the people they represent?
● How are the decisions behind designs, appraisal, and access obscured or revealed?
● What are the labor practices behind web archives and archiving technologies?
● What are the environmental impacts of web archiving?
Anderson 2002

Fundamentally, this means approaching them as having been constructed in different ways and for a variety of purposes. Just like other archives, the narrative of how they’re built is closely tied to how they represent the world.
