Musso the Digital Dark

Today I would like to talk about DIGITAL RESOURCES and what they can give to historical research. Digital sources may seem like something that does not concern us historians today, but twenty years from now we will not be able to investigate any aspect of history without taking these sources into consideration, because this is what our era is producing now. Above all, they represent possibilities that no previous generation of historians could enjoy. The utopia of Alexandria's library, the Universal Library which holds all knowledge and is accessible to anyone, is potentially something that the internet could give us. However, just as Alexandria's library burned to the ground, our universal and fast-growing digital heritage is in danger. I called this presentation "The digital dark age" to focus on the problem of preserving our digital cultural heritage, which has some specific features that make it particularly vulnerable. I would also like to stress that when I talk about digital sources I don't just mean documents created since the arrival of computers and the internet, which will be useful historical sources to contemporary historians today or to historians in general 100 years from now. I mean sources that matter to anyone engaging in historical research today.

I would like to start with this: this is a digitized copy of the Magna Carta from the British Library, http://www.bl.uk/collection-items/magna-carta-1215
- Zoom: you can access the whole document in a simulation of what it looks like
- Metadata on the document: where, when, copyright, description
- Transcript
So this makes the Magna Carta a digital source, like any other document, from any era, that has been digitized.
I like to divide digital sources into three big categories:
a) DIGITISED SOURCES: any type of document that was not originally created as a digital object (whether text, video, or sound) and that was subsequently digitized and made into a digital object, whether a pdf, mov, jpeg, or whatever format a computer can read. EX: MAGNA CARTA
b) BORN-DIGITAL SOURCES: all files that were created digitally: pictures, texts, music scores, sketches, this PowerPoint… pretty much anything that is produced today, from offices to artists' studios to government assemblies.
c) ONLINE SOURCES: sources that are created not only as digital files but as online objects, on the internet basically: tweets, Facebook statuses, blogs, newspapers, etc. These objects have the further characteristic of being publicly accessible and much more interactive: think of how many times a day a newspaper website is updated, or the process of retweeting tweets, or the comments that we can all leave on the web in many ways. The Guardian website is nothing like its paper copy, and it is also very different from a pdf version of the paper copy. Also, online sources are composed of different media: text, images, video,

I would like to say something about the process of digitization, which potentially affects all historical sources created before the arrival of digital records, since until then they are the only sources available. Digitisation is important for many reasons:
- providing online, potentially universal access to information held on paper
- reducing wear and tear on historical records (imagine how many more people can access the Magna Carta online without having to actually touch it)
- retaining the appearance of the original artifact
- reducing the problem of SPACE
- improving searchability (e.g., OCR)

There is no single digitization technique: methods vary a lot according to the type of medium you want to digitize. Standards are under review, but good guidelines are provided, for example, by JISC, the UK association for the usage of digital media.
The easiest documents to digitize are text and still images, which don't require particularly complicated technologies and are often digitized DIY by archives, usually with digital cameras or scanners. These are two examples:
- an old picture of people at King's College
- a picture of a document that I took in the historical archive of the Italian oil company, which is a terrible example of how to digitize: I just did it with a small camera for personal use.

From the JISC guidelines:
- Copies must not be enhanced or modified: no cropping, no Photoshop.
- Each page must be copied on its own.
- Weights or sheets may be needed to flatten documents.
- A colour checker and ruler must be included on every page to show the actual dimensions and colour of the object.
- The entire page should be included; the edge of the paper must not be cropped out of view. If you are photographing a bound volume, the margin should be included.
Resolution: 300 dpi for reference-only copies, 600 dpi for actual preservation. Usually two files are created, one at lower and one at higher resolution.
Colour settings: bit depth relates to the level of colour that will be captured. A 'bit' is the binary digit that represents the tonal value of the pixel. Generally speaking, a 1-bit image is black and white, an 8-bit image has 256 shades of either grey or colour, and a 24-bit image has millions of shades of colour.
Format: these are jpeg, tiff, pdf, etc. The difference lies in whether the format is compressed or uncompressed and, if compressed, whether it is lossy or lossless. Usually jpeg is used for low-res, compressed storage and tiff for high-res storage. Usually each

Digitisation of audio and video objects is much more complex and often requires professional equipment. Usually only archives that specialize in audiovisual material (like the BBC archive) do the digitization themselves. Most archives outsource the job to specialized companies.
The original object (tape, record, film) must be played on its original player (tape recorder, record player, etc.), which is connected to a computer with special cables and run through a programme that converts the information into digital form. For example, AUDACITY is free, easy-to-learn software.

One great advantage of digital files is that they do not degrade or lose quality with repeated use (as tapes, record albums, or books do). They can also be copied repeatedly without any loss or alteration. This leads to big problems related to copyright and intellectual property, but I will leave them for another time. The digital object will not be catalogued in a folder or on a shelf anymore, but it will still be in a folder in a directory. These folder trees are set up by the archives according to specific guidelines. However, researchers usually do not access the folder tree; they carry out their research through a search engine interface, whether online or in the archive's offline catalogue.

We should never forget that these documents are kept on servers and hard drives, so they do occupy a physical space, just a different one. This physical space is actually very fragile (I'm sure you have all broken a hard drive by dropping it or just through a power malfunction). One big difference between physical archives and digital archives is that in digital archives we do not tend to preserve the objects, say through restoration, but rather the data. The hard drive is not a historical source; it is its content that matters. The new frontier of historical preservation is DATA MIGRATION, the process of transferring data between storage types, formats, or computer systems. As digital storage technology progresses, data will be migrated periodically to new formats. I think the suggested standard is to migrate every 10 years.
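Since preservation here means preserving the data rather than the carrier, a migration is only as good as the check that the copy really matches the original. One common way to make that check is to compare checksums before and after copying. What follows is a minimal sketch in Python, not a description of any archive's actual workflow: the choice of SHA-256, the function names and the file paths are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the copy is bit-for-bit identical.

    Hypothetical helper: returns the checksum so it can be recorded
    alongside the file for later checks.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps where possible
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"checksum mismatch after copying {source} to {destination}")
    return after
```

A call such as `migrate_file(Path("old_drive/scan.tif"), Path("new_drive/scan.tif"))` (hypothetical paths) would copy one file and raise an error if the copy differed from the original. Storing the returned checksum would let the same comparison be repeated at the next migration.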
The file size is the size of the computer file of your image. It is measured in bytes. The larger the file size, the more disk space (storage space) it will take up on your computer. Bytes (1 byte = 8 bits) are often expressed in kilobytes, or KB (1 KB = 1000 bytes).

This in red is the old one, these in black are the new drives. This goes on and on for several kilometres. This is "The Internet Archive", a non-profit organization founded in San Francisco in 1996, whose purpose is to collect, preserve, and make available to the general public all historical collections that exist in digital format. The Internet Archive includes pictures, websites, music, moving images, and over three million public-domain books. It is an umbrella archive, as it both acquires digital sources itself and links to different collections around the world.

This is additional storage purchased for the archive. The main problem with these archives is that they need constant power and produce excess heat; for example, the Internet Archive's Petabox system uses the heat the hard drives generate to heat the building.
- Also, there are separate data centres, so that physical damage affects just one part.
- Old drives are kept as an extra copy, not thrown away.
The Internet Archive alone currently holds 50 petabytes, that is 50,000 terabytes (1 petabyte = 1000 terabytes), which corresponds to roughly:
- 6 million books
- 400 billion webpages
- 3,800 films
- 350,000 news programmes
- 200,000 audio recordings
- 100,000 pictures
This order of magnitude takes me to the next problem I would like to describe: the

This is what happens on the internet in 60 seconds. And this covers just these monitored websites; on top of that there are all the non-online digital sources and the digitized sources. The amount of information available is skyrocketing. Big data, sampling and social science approaches often seem to be the only way to navigate this ocean of digital sources. If we consider pictures, for example, 5 billion pictures are uploaded to the internet every day.
Getty Images, one of the largest photography archives in the world, had 80 million pictures in total.
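The figures quoted in this talk (bit depth, resolution, bytes, petabytes, pictures per day) can be connected with some simple arithmetic. The Python sketch below estimates the uncompressed size of a scanned page from its resolution and bit depth, converts petabytes to terabytes, and compares the daily picture uploads with the Getty Images total. The A4 page size and all function names are assumptions made for illustration; real files are usually smaller than the uncompressed estimate, because formats such as jpeg compress the data.

```python
# Illustrative arithmetic only; the A4 page size and all names below are
# assumptions made for this sketch, not part of the talk.

A4_INCHES = (8.27, 11.69)  # assumed page size: width, height in inches


def shades(bit_depth: int) -> int:
    """Number of distinct tonal values a pixel can take at this bit depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Size of an uncompressed scan in bytes (1 byte = 8 bits), rounded up."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return -(-total_bits // 8)  # ceiling division


def petabytes_to_terabytes(petabytes: float) -> float:
    """Convert petabytes to terabytes using 1 petabyte = 1000 terabytes."""
    return petabytes * 1000


if __name__ == "__main__":
    for depth in (1, 8, 24):
        print(f"{depth}-bit image: {shades(depth):,} shades")
    for dpi in (300, 600):
        size = uncompressed_bytes(*A4_INCHES, dpi, 24)
        print(f"A4 page at {dpi} dpi, 24-bit: about {size / 1_000_000:.0f} MB uncompressed")
    print(f"50 PB = {petabytes_to_terabytes(50):,.0f} TB")
    ratio = 5_000_000_000 / 80_000_000
    print(f"Daily uploads are {ratio:.1f} times the Getty Images total")
```

Doubling the resolution from 300 to 600 dpi quadruples the number of pixels, which is one reason a lower-resolution reference copy is usually kept alongside the preservation copy.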