
Mathematics for safeguarding the nation’s digital memory

Martine J. Barons, CMath
Director, Applied Statistics & Risk Unit
University of Warwick
[email protected]

ECMI21, 13th April 2021

© Martine J. Barons 2021

Overview

1 Archives
  The digital preservation problem
2 IDSS
3 The National Archives Project
  Soft Elicitation
  Data
  SEJ
  The Tool
4 Lessons
  Lockdown
  Digital Preservation Awards

What are archives?

Collections of information known as records

Records
  letters
  reports
  minutes
  registers
  maps
  photographs and films
  digital files
  sound recordings

Records in an archive are primary sources. Archives provide first-hand information or evidence relating to historical events or figures.

Archives are managed by a variety of types of institutions and the materials they collect differ.

Archives
  Government or national archives: materials related to all levels of government
  Corporate archives: manage and preserve business records
  College and university archives: preserve materials related to the institution
  Historical societies: preserve materials related to a specific region, event, or industry
  Museum archives: diverse collections typically consisting of artwork or artifacts
  Religious archives: collect and preserve materials related to a faith, denomination or place of worship
  Special collections: a collection of items that are either irreplaceable or rare, usually in a library

The National Archives

TNA
  Official archive of the UK Government
  Preserve the key records created by around 250 government departments, e.g. multiple email threads on a decision around public safety, which will have rich embedded metadata, multiple attachments and image footers; AI algorithms used to decide who has the right to remain in the country
  At last count the digital archive held 4759.5TB of records
  Records are text, videos, sound recordings, databases, 3D models, images etc.
  Have to preserve those records forever
  Socially important records, e.g. London Olympics
  Legally important records, e.g. Hillsborough, 15 April 1989

Digital Preservation

The act of preserving digital records is termed digital preservation. There is a wide variety of risks to digital files. The risks are connected and influence one another.
  Digital preservation is a relatively new concern, especially for archivists, many of whom are still working primarily with analogue materials.
  It takes time to get to a point where professionals know what data they need to start monitoring in order to better make decisions.
  Sharing data on risks means admitting something has gone wrong, which understandably might be something many organisations and individuals will be cautious to do.
  Obsolescence and limited lifespan: Archivists like to say that digital records last for ever - or 5 years, whichever comes first.

Digital Preservation challenges
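The requirement on this slide to preserve the original bitstream, and the copying errors it lists among the challenges, are commonly checked by comparing cryptographic checksums before and after a copy. The following is a minimal sketch of that idea, not a description of how The National Archives does it; the function names `sha256_of` and `copy_is_faithful` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large records do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_faithful(original: Path, copy: Path) -> bool:
    """True when both files have identical bitstreams (same SHA-256 digest)."""
    return sha256_of(original) == sha256_of(copy)
```

A matching digest only shows that the bits are unchanged; whether the file can still be rendered 'sufficiently' depends on software and hardware, which is a separate risk.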

Need to preserve the original bitstream and be able to render it ‘sufficiently’ for use.

Challenges
  Software - WordStar, bespoke software - emulators
  Storage medium - Facebook, Myspace, Google (reduced volume), cloud, account access
  Storage hardware - floppy disc, 8-track, VHS / Betamax, phone
  Storage life - laptop, flash drive, hard drive, SD card, magnetic tape, malware corruption
  Copying errors
  Storage compression - photographs on Facebook or Google
  Natural disaster or accident - fire, flood, earthquake - location

Integrating Decision Support systems

A formal & defensible statistical methodology to draw together inferences when:
  Users are decision Centres
  Centres motivated to act as a single coherent unit for a common goal
  Consensus about utility structure to scrutinise efficacy of candidate policies
  Consensus about an overarching description of dynamics driving the process
  Consensus about who is expert about what, to identify appropriate expert panels
  Expert judgements from disparate panels of experts
  Each component panel informed by complex models & huge data sets

Integrating Decision Support systems

A single, comprehensive probabilistic model is inappropriate
  infeasibly large
  no shared structural assumptions so no centre can ‘own’ the full joint distribution
  dynamic revisions lead to fast obsolescence

Full technical details in
Coherent Frameworks for Statistical Inference serving Integrating Decision Support Systems, Jim Q. Smith, Martine J. Barons & Manuele Leonelli, submitted & on arXiv: 1507.07394

Theorem

Theorem: Suppose an IDSS for a CK class $(U, D, S)$ is adequate, where $U$ and $D$ are arbitrary and $S$ includes the consensus that the IDSS is delegable, separately informed, cutting and commonly separated at time $t$. Then it will also be sound and distributed at time $t$. Furthermore, it is common knowledge that the SB’s beliefs about each panel’s parameter vector $\theta_i$ are the same as those of the corresponding expert panel $G_i$, $i \in [m]$, for all $d \in D$ and at any time $t \ge 0$.

Examples of sound and distributive frameworks
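For Bayesian networks, one of the frameworks this slide names, the standard factorisation gives some intuition for why a model can be shared out across panels. The notation below is assumed for illustration and is not taken from the slides: $y_1, \dots, y_m$ are the variables, $\mathrm{pa}(y_i)$ denotes the parents of $y_i$ in the graph, and $\theta_i$ is the parameter vector that panel $i$ would be responsible for.

```latex
p(y_1, \dots, y_m \mid \theta_1, \dots, \theta_m)
  = \prod_{i=1}^{m} p\bigl(y_i \mid \mathrm{pa}(y_i), \theta_i\bigr)
```

Read this way, each factor depends on one panel's parameters only, so no single centre needs to own the full joint distribution. The precise conditions under which this holds for an IDSS are those of the theorem and the cited paper, not of this sketch.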

Staged trees
Bayesian Networks
Chain event graphs
Decomposable graphs
Multiregression dynamic models

Figure: Manuele Leonelli & James Q. Smith (2015) Bayesian Decision Support for complex systems with many distributed experts, Ann Op Res

The National Archives Project

Soft Elicitation
Model building
Model quantification - Data
Model quantification - SEJ
Evaluation / feedback
Software engineering
Launch & adoption

Soft Elicitation

An essential starting point with any problem is to interact with problem-owners, their advisers, experts and close stakeholders to understand their perspectives, views, values, uncertainties, worldview

Structure the problem
  What are the processes, inputs, outputs, actors, perceptions of cause and effect?
  How do they interact?
  What are the specific objectives, uncertainties, challenges to address?
  How might these be modelled?
  What relevant data and expertise are available?
  How should outputs be presented?
Iterate

Soft Elicitation with TNA

Problem identification
  Digital preservation is a mission to send messages to the future
  Those messages are to be faithfully transmitted, to retain their meaning and to be useful for generations to come.
  Our world is not static. Threats are constantly changing.
  Archives’ resources and ability to deal with threats change too, and it’s not always easy to see what to do next.
  Archives all have a long list of good things to do, but the reality is that few of them will ever have the luxury of doing all of them.
  Even the best resourced archives must make choices about how and when to invest in digital preservation.

Soft Elicitation with TNA

A new framework required
A new framework for managing digital preservation risk is required that:
  Describes and explains a complex and interdependent map of risk events, risk management actions and their impact on preservation outcomes.
  Allows archivists to compare and prioritise very different types of threats to the digital archive with potential impact in different areas.
  Operates even where we have limited data or imperfect evidence.

Soft Elicitation with TNA

Workshops
  TNA June 2017: review existing (qualitative) approaches to risk management within digital preservation
  TNA 2017: risk mapping exercise to identify risks within the Digital Repository Infrastructure, and establish relationships between those risks
  November 2018: internal TNA workshop with AS&RU on Bayesian Networks and Structured Expert Judgement
  November 2019: workshop with TNA and partner archives to identify risks involved in digital preservation and the relationships between them
  Draft network: AS&RU with TNA - Dr Thais Fonseca and Hannah Merwood
  January 2020: formal kick-off and structure review with TNA and partner archives

Network

[Figure: the network]

Utility: Renderability and Intellectual Control

Quantification
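To make concrete what quantifying such a network involves, here is a minimal sketch of a three-node Bayesian network feeding a renderability outcome. Everything in it is an assumption for illustration: the node names, the independence of the two parent nodes, and all probabilities are hypothetical and are not the project's actual model or data.

```python
from itertools import product

# Hypothetical prior probabilities for two binary risk nodes.
P_OBSOLETE = 0.30        # file format can no longer be opened
P_MEDIA_FAILURE = 0.10   # storage medium has failed

# Hypothetical conditional table: P(renderable | obsolete, media_failure).
P_RENDERABLE = {
    (False, False): 0.99,
    (False, True): 0.20,
    (True, False): 0.40,
    (True, True): 0.05,
}


def prob_renderable(p_obsolete: float, p_media_failure: float) -> float:
    """Marginal P(renderable), summing over both (independent) parent nodes."""
    total = 0.0
    for obsolete, failed in product([False, True], repeat=2):
        p_o = p_obsolete if obsolete else 1 - p_obsolete
        p_f = p_media_failure if failed else 1 - p_media_failure
        total += p_o * p_f * P_RENDERABLE[(obsolete, failed)]
    return total


baseline = prob_renderable(P_OBSOLETE, P_MEDIA_FAILURE)
# Scenario: a hypothetical intervention halves the obsolescence risk.
with_action = prob_renderable(P_OBSOLETE / 2, P_MEDIA_FAILURE)
print(f"baseline {baseline:.4f}, with action {with_action:.4f}")
```

Each entry in the prior and conditional tables is a quantity that has to come from somewhere, either from data or from structured expert judgement, and comparing the two outputs is the kind of quantitative comparison between scenarios that a network of this sort supports.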

Finding relevant data
  Node definitions (measurable)
  Surveys
  Previous research
  Available statistics at TNA
  Data granularity - where data do exist, they are often related to storage medium and too specific
Identify data gaps
SEJ

Structured Expert Judgement - The IDEA protocol

Investigate, Discuss, Estimate, Aggregate

Private estimate
Facilitated Discussion
Second private estimate
Aggregation

Calibration questions
Questions of interest

SEJ-Range graphs for discussion

Question 36: Out of 1,000 born-digital files, for how many would you expect an archive to know their conditions of use?

A.M. Hanea, M.F. McBride, M.A. Burgman & B.C. Wintle (2018) Classical meets modern in the IDEA protocol for structured expert judgement, Journal of Risk Research, 21:4, 417-433, DOI: 10.1080/13669877.2016.1215346
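The Aggregate step of the IDEA protocol can be sketched as follows, assuming the simplest possible rule: an equal-weight average of each expert's second-round lowest, best and highest values. The `Estimate` class, the expert labels and all numbers are hypothetical; the slides do not say which aggregation rule the project used, and performance-based weights derived from the calibration questions are an alternative.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Estimate:
    expert: str
    lowest: float
    best: float
    highest: float


def aggregate(estimates: list[Estimate]) -> Estimate:
    """Equal-weight average of each expert's bounds and best guess."""
    return Estimate(
        expert="group",
        lowest=mean(e.lowest for e in estimates),
        best=mean(e.best for e in estimates),
        highest=mean(e.highest for e in estimates),
    )


# Hypothetical second-round answers to a question of the form
# "out of 1,000 born-digital files, for how many ...?"
second_round = [
    Estimate("A", 200, 450, 700),
    Estimate("B", 300, 500, 800),
    Estimate("C", 100, 400, 600),
]
group = aggregate(second_round)
print(group)
```

Plotting each expert's range next to the group range gives the kind of range graph used to prompt the facilitated discussion between the two private estimates.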

Developing DiAGRAM

The National Archive
Workshops to identify variables of interest, granularity, etc.
  The National Archives
  Dorset History Centre
  Gloucestershire Archives
  Transport for London Archives
  University of Brighton Design Archives
  University of Leeds Special Collections

The Utility
  Renderability
  Intellectual control

Digital Preservation Awards 2020 feedback

The judges said ...
  Provides a good summary of the existing DP tools and landscape, and clearly articulates the unique element this project brings (quantitative dimension to risk management).
  Looked at existing DP tools, recognized that a better tool was needed, and found it (Bayesian Networks).
  Outcomes are credible.
  Proof that digital projects are worthwhile in their own right is a benefit perhaps unique to this project.
  COVID created more opportunities to funnel project resources into development. The uniqueness in this project’s funding is a stand-out positive impact for potential advocacy for the field

Digital Preservation Awards 2020 feedback

The judges also said ...
  Answers a pressing need with a multidisciplinary approach.
  Clear understanding of audience and what they need from this tool
  The prototype shows clear innovation in the way it displays information and attempts to create quantitative comparisons
  Clear immediate benefits and the starting point for potential new ways of assessing risk
  Very strong, convincing and as far as I can tell entirely new basis for executives to understand the provision of funding, policy and priority to investment in digital preservation, as well as the practical outcomes of preservation actions.
  This is a step change from anything that has come before.

TNA spending review

Baseline current digital preservation risk score at The National Archives
Show current value of digital preservation practice in comparison to model for simple backup
Model changes to risk scores over various time periods with and without requested funding
Achieved 33.5% increase in overall funding

Figure: Present risk score at The National Archives versus a default average model and just backing up

Acknowledgements

[email protected]
go.warwick.ac.uk/MJBarons

Supported by EPSRC grant EP/K039628/1