Facilitating the Aggregation of Dispersed Personal Archives: A Proposed Functional, Technical, and Business Model


Christopher J. Prom
Assistant University Archivist and Associate Professor, University of Illinois at Urbana-Champaign

Abstract

We keep records in archives because such institutions are dedicated to preserving authentic evidence of human activity. ‘Cloud’ services pose a direct challenge to the archival mission. Archivists and all of humanity have a direct interest in building tools that help people aggregate, use, and control records they created. This paper outlines the conceptual model of one such service, which is dubbed “myKive,” and which is currently undergoing proof-of-concept development at the University of Illinois. After describing its necessity, the paper lists the proposed service’s functions, outlines its core architecture, and describes its development/business framework.

Author

Christopher J. (Chris) Prom is Assistant University Archivist and Associate Professor of Library Administration at the University of Illinois at Urbana-Champaign. He holds a PhD in history from the University of Illinois and also studied at the University of York (United Kingdom). He is a Fellow of the Society of American Archivists and has received several other research fellowships including most recently a 2009-10 Fulbright Distinguished Scholar Award. He maintains the Practical E-Records Blog and an active publication portfolio. His research describes the ways in which archival users seek information relevant to their needs and assesses methods that archivists can use to efficiently meet those needs.
He most recently authored a technical watch report for the Digital Preservation Coalition, “Preserving Email.” Chris is also co-director of the Archon™ project, which developed an open source application for managing archival descriptive information and digital objects, and he is a member of the ArchivesSpace project, which is developing a next-generation archival management system. He has served the Society of American Archivists in several capacities. He is currently a member of the editorial board of The American Archivist.

I would like to begin my paper with a story. The story demonstrates key challenges faced by archives and archivists in what we might term the cloud era, the era of dispersed digital archives.

Last November, I boarded a train at Union Station in Chicago, Illinois. I had just left a meeting of the Society of American Archivists’ Fundamental Change Working Group. This group was charged with revising the Fundamentals Series, which comprises the heart of our society’s publishing program.1 Everyone at the meeting was acutely aware of two facts: 1) that newly trained archivists need a sophisticated set of digital skills, and 2) that our new instructional manuals must facilitate these skills.

1 The six books that comprise this series are available by visiting the publications pages at the Society of American Archivists website, at http://bit.ly/LyVy06 (Accessed June 26, 2012).

Moving quickly to find a seat on the train, I spotted a person from my University. I’ll call this person “Dr. Important.” After the requisite chit-chat, Dr. Important asked me what I had been working on lately.

“Well, I’ve been writing a guide to email preservation.”

“Oh, that’s interesting. Maybe you can help me.”

Who doesn’t like to be asked for help? Maybe I could tell Dr. Important how to organize email and export it to a preservation-ready format. If lucky, I might even convince Dr.
Important to transfer email to the University Archives, where it would become a public research resource. In this way, it would be accessible much like the handwritten or typescript correspondence from many other important people who had worked or studied at the University of Illinois in past years.

“You see, I went to look for something I sent back in 2009,” Dr. Important continued. “I’ve been keeping a copy of all of my important emails, one folder for each month. But when I went back to find the message I needed, all the folders were gone.”

Dr. Important told me that technical staff could not restore the emails, which likely went missing during a system migration that had taken place several months prior. As an archivist, I mourned the death of the evidence and information that Dr. Important had created and cared for over many years. But I felt helpless, and I let the conversation drift to another topic.

This incident, and many others that I could tell from my time at the University of Illinois, illustrate one of the greatest challenges that archivists face: ensuring the preservation of evidence when people’s communication tools have, in effect, become their unofficial recordkeeping mechanisms. This problem is particularly pressing because, in most institutions, centralized systems to manage correspondence and other communications are dead or at least have one foot firmly in the grave. Given this fact, what can we (as a profession) do to make sure that usable records are fixed into a medium that will facilitate their preservation and use?

In order to answer that question, we must understand the ways in which information is dispersed within modern organizations and external social networks. More to the point, we must understand the way in which technology makes information into records that subsist within human social networks.
With a better understanding of how records are formed and used within the technologies that facilitate such networks, we will be better positioned to capture and preserve not only information, but also contextual data about how that information was dispersed, used, and reused.

1. ‘Record-ness,’ Archives, and the Need for a Personal Archives Service

In the cloud era, record capture and preservation systems must take three factors into account: 1) the perceived lack of value accorded to preserving digital communications; 2) the communication and information management practices used by individuals; and 3) the specific ways in which contextual data transforms information into evidence, within human social networks and the technologies that support them. Taken as a whole, the implications of these three factors call into question the continued existence of archives in the cloud environment, if by archives we mean a group of records that are maintained as a collective using the principles of provenance, original order, and collective control.

1.1 The perceived lack of value accorded to preserving digital communications

In 1899, the American sociologist Thorstein Veblen wrote that “the cheap, and therefore indecorous, articles of daily consumption in modern industrial communities are commonly machine products.”2 Such articles are much used but little valued, at least in a monetary sense. For that reason, they are easily lost or discarded. Any American who has eaten at a Fourth of July picnic knows how easy it is to throw dirty plastic utensils and plates into the trash, in spite of their utility when the hot dogs and watermelon were being served.

In post-industrial societies, digital communications comprise one of the cheap, and therefore indecorous, articles of daily consumption. We are familiar with the forms that these materials take: email messages, blog posts, Facebook updates, tweets, online videos.
Each can be inexpensively produced with the help of an electronic device. Each is arguably less decorous than the format for communication that it replaced, such as the handwritten letters, illustrated diaries, or professionally produced films in which archives like to traffic.3

Given this fact, one may expect that the greatest challenge in preserving such materials might consist simply in convincing people that their personal digital communications are important enough to preserve. But this is not the case. In the abstract, many people value digital materials highly and keep everything they send or create. However, most of them do not much concern themselves when a system crash sweeps digital records away as in a flood.4 This points to an important truism: the broader information ecology in which people work makes it very difficult for both organizations and individuals to identify, capture, and preserve the records that have the most long-term archival value, unless extraordinary actions are taken.

Let me provide a few examples. My own institution, the University of Illinois, formerly made extensive use of college and departmental subject files, documenting faculty teaching, research, service, and administration. I say ‘formerly’ because over the past twenty years these paper-based files have largely disappeared. During the same period, most of our distinguished faculty members stopped keeping systematic correspondence files, aside from messages fortuitously retained within active email accounts.

Asking administrators or faculty members to keep records outside of their communication applications (either in paper or in digital form) seems like a fruitless task. First, it would require that the institution implement an expensive software and hardware product, such as an Electronic Records Management (ERM) application.
More to the point, implementing such a system would require that people make extensive changes to their work habits and procedures, something that is extremely unlikely in the Facebook Era, with its emphasis on immediate communication and response.

Where ERM or ‘document management’ systems have been implemented, we see numerous problems follow. For example, staff in the office of our chief administrative officer (the Chancellor) are worried that email messages documenting critical policy decisions never make their way into the document management system since administrators don’t like to change their work habits and deposit email. Staff members are also worried that the system will not survive the departure of the current records manager.

2 Thorstein Veblen, The Theory of the Leisure Class (New York: Viking Press, 1967), 161, http://www.gutenberg.org/ebooks/833.

3 For one attempt to provide a more decorous platform for personal reminiscence and storytelling in the digital era, see the Cowbird service, founded by Jonathan Harris, at http://cowbird.com (Accessed June 26, 2012).