A Digital Dark Ages? Challenges in the Preservation of Electronic Information
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Archaeology and the Big Data Challenge
Think big about data: Archaeology and the Big Data challenge Gabriele Gattiglia Abstract – Usually defined as high volume, high velocity, and/or high variety data, Big Data permit us to learn things that we could not comprehend using smaller amounts of data, thanks to the empowerment provided by software, hardware and algorithms. This requires a novel archaeological approach: to use a lot of data; to accept messiness; to move from causation to correlation. Do the imperfections of archaeological data preclude this approach? Or are archaeological data perfect because they are messy and difficult to structure? Normal- ly archaeology deals with the complexity of large datasets, fragmentary data, data from a variety of sources and disciplines, rarely in the same format or scale. If so, is archaeology ready to work more with data-driven research, to accept predictive and probabilistic techniques? Big Data inform, rather than explain, they expose patterns for archaeological interpretation, they are a resource and a tool: data mining, text mining, data visualisations, quantitative methods, image processing etc. can help us to understand complex archaeological information. Nonetheless, however seductive Big Data appear, we cannot ignore the problems, such as the risk of considering that data = truth, and intellectual property and ethical issues. Rather, we must adopt this technology with an appreciation of its power but also of its limitations. Key words – Big Data, datafication, data-led research, correlation, predictive modelling Zusammenfassung – Üblicherweise als Hochgeschwindigkeitsdaten (high volume, high velocity und/oder high variety data) bezeichnet, machen es Big Data möglich, dank dem Einsatz von Software, Hardware und Algorithmen historische Prozesse zu studieren, die man anhand kleinerer Datenmengen nicht verstehen kann. -
Archives First: Digital Preservation Further Investigations Into Digital
Archives First: digital preservation Further investigations into digital preservation for local authorities Viv Cothey * 2020 * Gloucestershire County Council ii Not caring about Archives because you have nothing to archive is no different from saying you don’t care about freedom of speech because you have nothing to say. Or that you don’t care about freedom of the press because you don’t like to read. (after Snowden, 2019, p 208) Disclaimer The views and opinions expressed in this report do not necessarily represent those of the institutions to which the author is affiliated. iii iv Executive summary This report is about an investigation into digital preservation by (English) local authorities which was commissioned by the Archives First consortium of eleven local authority record offices or similar memory organisations (Archives). The investigation is partly funded by The National Archives. Archival institutions are uniquely able to serve the public by providing current and future generations with access to authentic unique original records. In the case of local authority Archives these records will include documents related to significant decision making processes and events that bear on individuals and their communities. Archival practice, especially relating to provenance and purposeful preservation, is instrumental in supporting continuing public trust and essential to all of us being able to hold authority to account. The report explains how Archival practice differs from library practice where provenance and purposeful preservation are absent. The current investigation follows an earlier Archives First project in 2016-2017 that investigated local authority digital preservation preparedness. The 2016-2017 investigation revealed that local authority line of business systems in respect of children services, did not support the statutory requirement to retain digital records over the long- term (at least 100 years). -
Data Mining and Data Warehousing Unit 1: Introduction
https://genuinenotes.com Data Mining and Data Warehousing Unit 1: Introduction Data Mining Databases today can range in size into the terabytes — more than 1,000,000,000,000 bytes of data. Within these masses of data lies hidden information of strategic importance. But when there is such enormous amount of data how could we draw any meaningful conclusion? We are data rich, but information poor. The abundance of data, coupled with the need for powerful data analysis tools, has been described as a data rich but information poor situation. Data mining is being used both to increase revenues and to reduce costs. The potential returns are enormous. Innovative organizations worldwide are using data mining to locate and appeal to higher-value customers, to reconfigure their product offerings to increase sales, and to minimize losses due to error or fraud. What is data mining? Mining is a brilliant term characterizing the process that finds a small set of precious bits from a great deal of raw material. Definition: Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in unique ways that are both understandable and useful to the data owner. Data mining is an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the issue of information extraction from large database. -
Digital Preservation Handbook
Digital Preservation Handbook Digital Preservation Briefing Illustrations by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark Who is it for? Senior administrators (DigCurV Executive Lens), operational managers (DigCurV Manager Lens) and staff (DigCurV Practitioner Lens) within repositories, funding agencies, creators and publishers, anyone requiring an introduction to the subject. Assumed level of knowledge Novice. Purpose To provide a strategic overview and senior management briefing, outlining the broad issues and the rationale for funding to be allocated to the tasks involved in preserving digital resources. To provide a synthesis of current thinking on digital preservation issues. To distinguish between the major categories of issues. To help clarify how various issues will impact on decisions at various stages of the life-cycle of digital materials. To provide a focus for further debate and discussion within organisations and with external audiences. Gold sponsor Silver sponsors Bronze sponsors Reusing this information You may re-use this material in English (not including logos) with required acknowledgements free of charge in any format or medium. See How to use the Handbook for full details of licences and acknowledgements for re-use. For permission for translation into other languages email: [email protected] Please use this form of citation for the Handbook: Digital Preservation Handbook, 2nd Edition, http://handbook.dpconline.org/, Digital Preservation Coalition © 2015. 2 Contents Why Digital Preservation Matters -
A New Digital Dark Age? Collaborative Web Tools, Social Media and Long-Term Preservation Stuart Jeffrey Version of Record First Published: 05 Dec 2012
This article was downloaded by: [University of York] On: 10 December 2012, At: 04:01 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK World Archaeology Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rwar20 A new Digital Dark Age? Collaborative web tools, social media and long-term preservation Stuart Jeffrey Version of record first published: 05 Dec 2012. To cite this article: Stuart Jeffrey (2012): A new Digital Dark Age? Collaborative web tools, social media and long-term preservation, World Archaeology, 44:4, 553-570 To link to this article: http://dx.doi.org/10.1080/00438243.2012.737579 PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material. A new Digital Dark Age? Collaborative web tools, social media and long-term preservation Stuart Jeffrey Abstract This paper examines the impact of exciting new approaches to open data sharing, collaborative web tools and social media on the sustainability of archaeological data. -
Data Lakes, Data Hubs and AI
Data Lakes, Data Hubs and AI Dan McCreary Distinguished Engineer in Artificial Intelligence Optum Advanced Applied Technologies Background for Dan McCreary • Co-founder of "NoSQL Now!" conference • Coauthor (with Ann Kelly) of "Making Sense of NoSQL" • Guide for managers and architects • Focus on NoSQL architectural tradeoff analysis • Basis for 40 hour course on database architectures • How to pick the right database architecture • http://manning.com/mccreary • Focus on metadata and IT strategy (capabilities) • Currently focused on NLP and Artificial Intelligence Training Data Management 2 The Impact of Deep Learning Predictive Precision Deep Learning Traditional Machine Learning (Linear Regression) Training Set Size Large datasets create competitive advantage 3 High Costs of Data Sourcing for Deep Learning $ % OF TIME % OF THE BILLION IN 80 WASTED 60 COST 3.5 SPENDING By data scientists just getting Of data warehouse projects In 2016 on data integration access to data and preparing is on ETL software data for analysis Six Database Core Architecture Patterns Relational Analytical (read-mostly OLAP) Key-Value key value key value key value key value Column-Family Graph Document Which architectures are best for data lakes, data hubs and AI? 5 Role of the Solution Architect Relational Analytical (OLAP) Key-Value Help! k v k v k v k v Column-Family Graph Document Business Unit Sally Solutions Title: Solution Architect • Non-bias matching of business problem to the right data architecture before we begin looking at a specific products 6 Google Trends Data Lake Data Hub 7 Data Science and Deep Learning Data Science Deep Learning 8 Data Lake Definition ~10 TB and up $350/10TB Drive A storage repository that holds a vast amount of raw data in its native format until it is needed. -
Digital Preservation.Pdf
Digital Preservation By Jean-Yves Le Meur project leader of CERN Digital Memory 100 AD End of cuneiform on tablets 350-50 BC Jupiter orbit Tablets -> British Museum 1800- 1300-1400 Jupiter orbit 2014 (again) The missing 29 Jan 2016 ‘rosetta’ tablet Digital preservation in a nutshell ● World wide Landscape ○ Rationale ○ Interesting initiatives ○ Good practices: OAIS ● The different Approaches The Digital “Dark Age” "We are nonchalantly throwing all of our data into what could become an information black hole without realizing it" Vint Cerf (vice-president of Google in Feb 2015) The Digital “Dark Age” ● Very large community worrying about the preservation of digital content ● Digital Preservation Coalition ● Open Preservation Foundation ● UNESCO PERSIST project, EU e-ARK project, National Libraries and Archives ● Many related conferences: iPRES series, etc. “This is not about preserving bits, It is about preserving meaning, much like the Rosetta Stone.” More than 70 major libraries destroyed over time: accidents, disasters, ethnocides How digital data evaporates (I) 1. Physical Obsolescence: Bit rot Ten 2. Redundancy failure Major 3. Technological Obsolescence of readers, formats, OS, HWs New 4. Lost in migrations ! Risks 5. Missing context: no codec ! How digital data evaporates (II) 6. Redundancy failure Ten 7. Economical Failures Major 8. Lost in transitions: people ! New 9. Corruption, mistake or attack 10. Dissipation: out of reach Risks Some examples at CERN ● The very first WWW pages ○ Reconstructed in 2013 - found again in 2018! -
Follow-Up Questions
Follow-Up Questions ASERL Webinar: “Intro to Digital Preservation #2 -- Forbearing the Digital Dark Age: Capturing Metadata for Digital Objects” Speaker = Chris Dietrich, National Park Service Session Recording: https://vimeo.com/63669010 Speaker’s PPT: http://bit.ly/10PKvu8 UPDATED – May 23, 2013 Tools 1. Is photo watermarking available using Windows Explorer? Microsoft Paint, which comes installed with Microsoft Windows, provides basic (albeit inelegant) watermarking capabilities. 2. Do Microsoft tools capture basic metadata automatically, without user intervention? Microsoft Office products capture very basic metadata automatically. The Author, Initials, and Company are captured automatically from a user’s Windows User Account settings. File system properties like File Size, Date, etc. are also automatically captured. The following Microsoft Knowledge Base articles provide details for each Microsoft Office product: http://office.microsoft.com/en-us/access-help/view-or-change-the-properties-for-an-office-file- HA010354245.aspx, http://office.microsoft.com/en-us/help/about-file-properties-HP003071721.aspx. Microsoft SharePoint can be configured to automatically capture metadata for items uploaded to libraries: http://office.microsoft.com/en-us/sharepoint-help/introduction-to-managed-metadata-HA102832521.aspx. 3. Can you recommend tools/services that leverage geospatial data that do not provide latitude & longitude information? For example, I want to plot a photo of “Mt Doom” on a map but have no coordinates…. Embedding geospatial coordinates in digital objects (often called “geotagging”) can be done with a number of software tools. GPS Photo Link (http://www.geospatialexperts.com/gps-photo%20link.php) allows users to add coordinates to embedded metadata manually, or by selecting a photo(s) and then clicking a point on a Bing Maps satellite image. -
Supervised Machine Learning Models for Fake News Detection
CCT College Dublin ARC (Academic Research Collection) ICT Student Achievement Summer 6-15-2019 Supervised Machine Learning Models for Fake News Detection Andrea Lopez CCT College Dublin Adelo Vieira CCT College Dublin Zafar Ahsan CCT College Dublin Farooq Sabib CCT College Dublin Shirley Marinho CCT College Dublin Follow this and additional works at: https://arc.cct.ie/ict Part of the Graphics and Human Computer Interfaces Commons, Numerical Analysis and Scientific Computing Commons, and the Theory and Algorithms Commons Recommended Citation Lopez, Andrea; Vieira, Adelo; Ahsan, Zafar; Sabib, Farooq; and Marinho, Shirley, "Supervised Machine Learning Models for Fake News Detection" (2019). ICT. 5. https://arc.cct.ie/ict/5 This Dissertation is brought to you for free and open access by the Student Achievement at ARC (Academic Research Collection). It has been accepted for inclusion in ICT by an authorized administrator of ARC (Academic Research Collection). For more information, please contact [email protected]. CCT COLLEGE Supervised Machine Learning Models for Fake News Detection A Supervised Machine Learning Models for Fake News Detection by Gofaas group by GoSupervisedfaas group Machine Learning Models for Fake News Detection [Figure.1]by Gofaas group A Supervised Machine Learning by GoModelsfaas group for Fake News Detection 1 INDEX: Executive Summary Chapter 1. Introduction ................................................................................................. 6 Initial Proposal .............................................................................................................. -
Problems of Digital Sustainability
Acta Polytechnica Hungarica Vol. 7, No. 3, 2010 Problems of Digital Sustainability Tamás Szádeczky Department of Measurement and Automation, Kandó Kálmán Faculty of Electrical Engineering, Óbuda University Tavaszmező u. 17, H-1084 Budapest, Hungary [email protected] Abstract: The article introduces digital communication by drawing comparisons between the histories of digital and conventional written communication. It also shows the technical and legal bases and the currently reached achievements. In relation to the technical elements, it acquaints the reader with the development and current effects of computer technology, especially cryptography. In connection with the legal basis, the work presents the regulations which have emerged and made possible the legal acceptance of the digital signature and electronic documents in the United States of America, in the European Union and among certain of its member countries, including Hungary. The article reviews the regulations and the developed practices in the fields of e-commerce, electronic invoices, electronic records management and certain e-government functions in Hungary which are necessary for digital communication. The work draws attention to the importance of secure keeping and processing of electronic documents, which is also enforced by the legal environment. The author points to the technical requirements and practical troubles of digital communication, called digital sustainability. Keywords: electronic archive; digital sustainability; preservation; electronic signature; data security 1 Introduction We may distinguish three revolutions in the development of written communication [1]. The first revolution was the invention of alphabetical writing carrying phonetic value around 1300 BC., which segregated the text from the content. The second revolution in the 15th Century was the book printing invented by Gutenberg, Johannes Gensfleisch, which made written material widely available. -
A Data Lake for Archaeological Data Management and Analytics
ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics Pengfei Liu Camille Noûs Sabine Loudcher [email protected] Jérôme Darmont Laboratoire Cogitamus [email protected] France [email protected] [email protected] Université de Lyon, Lyon 2, UR ERIC Lyon, France ABSTRACT 1 INTRODUCTION With new emerging technologies, such as satellites and drones, Over the past decade, new forms of data such as geospatial data and archaeologists collect data over large areas. However, it becomes aerial photography have been included in archaeology research [8], difficult to process such data in time. Archaeological data also have leading to new challenges such as storing massive, heterogeneous many different formats (images, texts, sensor data) and canbe data, high-performance data processing and data governance [4]. structured, semi-structured and unstructured. Such variety makes As a result, archaeologists need a platform that can host, process, data difficult to collect, store, manage, search and analyze effectively. analyze and share such data. A few approaches have been proposed, but none of them covers the In this context, a multidisciplinary consortium of archaeologists full data lifecycle nor provides an efficient data management system. and computer scientists proposed the HyperThesau project1, which Hence, we propose the use of a data lake to provide centralized data aims at designing a data management and analysis platform. Hyper- stores to host heterogeneous data, as well as tools for data quality Thesau has two main objectives: 1) the design and implementation checking, cleaning, transformation and analysis. In this paper, we of an integrated platform to host, search, analyze and share archaeo- propose a generic, flexible and complete data lake architecture. -
Archaeological Archives
Archaeological Archives A guide to best practice in creation, compilation, transfer and curation Archaeological Archives A guide to best practice in creation, compilation, transfer and curation Duncan H. Brown Supported by: Archaeological Archives Forum Archaeology Data Service Association of Local Government Archaeological Officers UK Department of Environment for Northern Ireland English Heritage Historic Scotland Institute for Archaeologists Royal Commission on the Ancient and Historical Monuments of Scotland Society of Museum Archaeologists This guide has been published by the Institute for Archaeologists on behalf of the Archaeological Archives Forum. Development and production of this guide was funded by grant-aid from English Heritage, Historic Scotland, the Royal Commission on the Ancient and Historical Monuments of Scotland, the Environment and Heritage Service (Northern Ireland) and the Society of Museum Archaeologists, with support in kind from the Archaeology Data Service and the Association of Local Government Archaeological Officers (UK). Original photographs by John Lawrence. Others reproduced with the kind permission of Mark Bowden (page 5), English Heritage (page 40), the Environment and Heritage Service (Northern Ireland) (page 1), Southampton City Council (page 29), and the Royal Commission on the Ancient and Historical Monuments of Scotland (page 26). First published July 2007. Second Edition September 2011. ISBN 0948393912 Design and layout by Maria Geals. Foreword The creation of stable, consistent, logical, and accessible archives from fieldwork is a fundamental building block of archaeological activity. Since the discipline emerged in the late 19th and early 20th centuries, it has been recognised that the process of excavation is destructive and that no archaeological interpretations are sustainable unless they can be backed up with the evidence of field records and post-excavation analysis.