Module 8 Wiki Guide
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Jamie Shiers, CERN
Investing in Curation A Shared Path to Sustainability [email protected] Data Preservation in HEP (DPHEP) Outline 1. Pick 2 of the messages from the roadmap & comment – I could comment on all – but not in 10’ 1. What (+ve) impact has 4C already had on us? – Avoiding overlap with the above 1. A point for discussion – shared responsiblity / action THE MESSAGES The 4C Roadmap Messages 1. Identify the value of digital assets and make choices 2. Demand and choose more efficient systems 3. Develop scalable services and infrastructure 4. Design digital curation as a sustainable service 5. Make funding dependent on costing digital assets across the whole lifecycle 6. Be collaborative and transparent to drive down costs IMPACT OF 4C ON DPHEP International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics “LHC Cost Model” (simplified) Start with 10PB, then +50PB/year, then +50% every 3y (or +15% / year) 10EB 1EB 6 Case B) increasing archive growth Total cost: ~$59.9M (~$2M / year) 7 1. Identify the value of digital assets and make choices • Today, significant volumes of HEP data are thrown away “at birth” – i.e. via very strict filters (aka triggers) B4 writing to storage To 1st approximation ALL remaining data needs to be kept for a few decades • “Value” can be measured in a number of ways: – Scientific publications / results; – Educational / cultural impact; – “Spin-offs” – e.g. superconductivity, ICT, vacuum technology. Why build an LHC? BEFORE! 1 – Long Tail of Papers 2 – New Theore cal Insights 3 4 3 – “Discovery” to “Precision” Volume: 100PB + ~50PB/year (+400PB/year from 2020) 11 Zimmermann( Alain Blondel TLEP design study r-ECFA 2013-07-20 5 Balance sheet – Tevatron@FNAL • 20 year investment in Tevatron ~ $4B • Students $4B • Magnets and MRI $5-10B } ~ $50B total • Computing $40B Very rough calculation – but confirms our gut feeling that investment in fundamental science pays off I think there is an opportunity for someone to repeat this exercise more rigorously cf. -
Collecting and Preserving Digital Materials
COLLECTING AND PRESERVING DIGITAL MATERIALS A HOW-TO GUIDE FOR HISTORICAL SOCIETIES BY SOPHIE SHILLING CONTENTS Foreword Preface 1 Introduction 2 Digital material creation Born-digital materials Digitisation 3 Project planning Write a plan Create a workflow Policies and procedures Funding Getting everyone on-board 4 Select Bitstream preservation File formats Image resolution File naming conventions 5 Describe Metadata 6 Ingest Software Digital storage 7 Access and outreach Copyright Culturally sensitive content 8 Community 9 Glossary Bibliography i Foreword FOREWORD How the collection and research landscape has changed!! In 2000 the Federation of Australian Historical Societies commissioned Bronwyn Wilson to prepare a training guide for historical societies on the collection of cultural materials. Its purpose was to advise societies on the need to gather and collect contemporary material of diverse types for the benefit of future generations of researchers. The material that she discussed was essentially in hard copy format, but under the heading of ‘Electronic Media’ Bronwyn included a discussion of video tape, audio tape and the internet. Fast forward to 2018 and we inhabit a very different world because of the digital revolution. Today a very high proportion of the information generated in our technologically-driven society is created and distributed digitally, from emails to publications to images. Increasingly, collecting organisations are making their data available online, so that the modern researcher can achieve much by simply sitting at home on their computer and accessing information via services such as Trove and the increasing body of government and private material that is becoming available on the web. This creates both challenges and opportunities for historical societies. -
View Presentation
a centre of expertise in data curation and preservation The Digital Curation Centre Hugh Corley DCC Services and Outreach Liaison Officer Funders: 1 University College London Information Day :: 16th January 2007 a centre of expertise in data curation and preservation What is an Information Day • Learn • Discuss • Action Funders: 2 University College London Information Day :: 16th January 2007 a centre of expertise in data curation and preservation Overview • Digital Curation • Aims of the DCC • Work of the DCC Funders: 3 University College London Information Day :: 16th January 2007 a centre of expertise in data curation and preservation Digital Curation • Active management of data over life- cycle of scholarly and scientific interest – reproducibility – reuse • Appreciation of differences between disciplines • Ubiquitous relationship with digital information Funders: 4 University College London Information Day :: 16th January 2007 a centre of expertise in data curation and preservation UK Digital Curation Centre • CCLRC • University of Edinburgh • HATII • UKOLN Funders: 5 University College London Information Day :: 16th January 2007 a centre of expertise in data curation and preservation UK Digital Curation Centre Support and promote continuing improvement in the quality of data curation and digital preservation activity Drivers – increasing awareness that digital assets are reusable – continuing access vital to ensure contemporary scholarship is reproducible and verifiable – digital assets are fragile Funders: 6 University College -
Metadata Demystified: a Guide for Publishers
ISBN 1-880124-59-9 Metadata Demystified: A Guide for Publishers Table of Contents What Metadata Is 1 What Metadata Isn’t 3 XML 3 Identifiers 4 Why Metadata Is Important 6 What Metadata Means to the Publisher 6 What Metadata Means to the Reader 6 Book-Oriented Metadata Practices 8 ONIX 9 Journal-Oriented Metadata Practices 10 ONIX for Serials 10 JWP On the Exchange of Serials Subscription Information 10 CrossRef 11 The Open Archives Initiative 13 Conclusion 13 Where To Go From Here 13 Compendium of Cited Resources 14 About the Authors and Publishers 15 Published by: The Sheridan Press & NISO Press Contributing Editors: Pat Harris, Susan Parente, Kevin Pirkey, Greg Suprock, Mark Witkowski Authors: Amy Brand, Frank Daly, Barbara Meyers Copyright 2003, The Sheridan Press and NISO Press Printed July 2003 Metadata Demystified: A Guide for Publishers This guide presents an overview of evolving classified according to a variety of specific metadata conventions in publishing, as well as functions, such as technical metadata for related initiatives designed to standardize how technical processes, rights metadata for rights metadata is structured and disseminated resolution, and preservation metadata for online. Focusing on strategic rather than digital archiving, this guide focuses on technical considerations in the business of descriptive metadata, or metadata that publishing, this guide offers insight into how characterizes the content itself. book and journal publishers can streamline the various metadata-based operations at work Occurrences of metadata vary tremendously in their companies and leverage that metadata in richness; that is, how much or how little for added exposure through digital media such of the entity being described is actually as the Web. -
Front Page Title of Paper in Full: Centered
Preservation Is Not a Place 1 4th International Digital Curation Conference December 2008 Preservation Is Not a Place Stephen Abrams, Patricia Cruse, John Kunze California Digital Library, University of California December 2008 Abstract The Digital Preservation Program of the California Digital Library (CDL) is engaged in a process of reinvention involving significant transformations of its outlook, effort, and infrastructure. This includes a re-articulation of its mission in terms of digital curation, rather than preservation; encouraging a programmatic, rather than a project-oriented approach to curation activities; and a renewed emphasis on services, rather than systems. This last shift was motivated by a desire to deprecate the centrality of the repository as place. Having the repository as the locus for curation activity has resulted in the deployment of a somewhat cumbersome monolithic system that falls short of desired goals for responsiveness to rapidly changing user needs and operational and administrative sustainability. The Program is pursuing a path towards a new curation environment based on the principle of devolving curation function to a set of small, simple, loosely-coupled services. In considering this new infrastructure the Program is relying upon a highly deliberative process starting from first principles drawn from library and archival science. This is followed by a stagewise progressesion of identifying core preservable values, devising strategies promoting those values, defining abstract services embodying those strategies, and, finally, developing systems that instantiate those services. This paper presents a snapshot of the Program’s transformative efforts in its early phase. The 4th International Digital Curation Conference taking place in Edinburgh, Scotland over 1-3 December 2008 will address the theme Radical Sharing: Transforming Science?. -
Undergraduate Biocuration: Developing Tomorrow’S Researchers While Mining Today’S Data
The Journal of Undergraduate Neuroscience Education (JUNE), Fall 2015, 14(1):A56-A65 ARTICLE Undergraduate Biocuration: Developing Tomorrow’s Researchers While Mining Today’s Data Cassie S. Mitchell, Ashlyn Cates, Renaid B. Kim, & Sabrina K. Hollinger Biomedical Engineering, Georgia Institute of Technology & Emory University, Atlanta, GA 30332. Biocuration is a time-intensive process that involves managerial positions. We detail the undergraduate extraction, transcription, and organization of biological or application and training processes and give detailed job clinical data from disjointed data sets into a user-friendly descriptions for each position on the assembly line. We database. Curated data is subsequently used primarily for present case studies of neuropathology curation performed text mining or informatics analysis (bioinformatics, entirely by undergraduates, namely the construction of neuroinformatics, health informatics, etc.) and secondarily experimental databases of Amyotrophic Lateral Sclerosis as a researcher resource. Biocuration is traditionally (ALS) transgenic mouse models and clinical data from ALS considered a Ph.D. level task, but a massive shortage of patient records. Our results reveal undergraduate curators to consolidate the ever-mounting biomedical “big biocuration is scalable for a group of 8-50+ with relatively data” opens the possibility of utilizing biocuration as a minimal required resources. Moreover, with average means to mine today’s data while teaching students skill accuracy rates greater than 98.8%, undergraduate sets they can utilize in any career. By developing a biocurators are equivalently accurate to their professional biocuration assembly line of simplified and counterparts. Initial training to be completely proficient at compartmentalized tasks, we have enabled biocuration to the entry-level takes about five weeks with a minimal be effectively performed by a hierarchy of undergraduate student time commitment of four hours/week. -
2016 Technical Guidelines for Digitizing Cultural Heritage Materials
September 2016 Technical Guidelines for Digitizing Cultural Heritage Materials Creation of Raster Image Files i Document Information Title Editor Technical Guidelines for Digitizing Cultural Heritage Materials: Thomas Rieger Creation of Raster Image Files Document Type Technical Guidelines Publication Date September 2016 Source Documents Title Editors Technical Guidelines for Digitizing Cultural Heritage Materials: Don Williams and Michael Creation of Raster Image Master Files Stelmach http://www.digitizationguidelines.gov/guidelines/FADGI_Still_Image- Tech_Guidelines_2010-08-24.pdf Document Type Technical Guidelines Publication Date August 2010 Title Author s Technical Guidelines for Digitizing Archival Records for Electronic Steven Puglia, Jeffrey Reed, and Access: Creation of Production Master Files – Raster Images Erin Rhodes http://www.archives.gov/preservation/technical/guidelines.pdf U.S. National Archives and Records Administration Document Type Technical Guidelines Publication Date June 2004 This work is available for worldwide use and reuse under CC0 1.0 Universal. ii Table of Contents INTRODUCTION ........................................................................................................................................... 7 SCOPE .......................................................................................................................................................... 7 THE FADGI STAR SYSTEM ....................................................................................................................... -
A Brief Introduction to the Data Curation Profiles
Data Curation Profiles http://www.datacurationprofiles.org Access. Knowledge. Success A Brief Introduction to the Data Curation Profiles Jake Carlson Data Services Specialist Data Curation Profiles http://www.datacurationprofiles.org Access. Knowledge. Success Data Curation Profiles http://www.datacurationprofiles.org Access. Knowledge. Success Setting the stage Since 2004: Purdue Interdisciplinary Research Initiative revealed many data needs on campus What faculty said… • Not sure how or whether to share data • Lack of time to organize data sets • Need help describing data for discovery • Want to find new ways to manage data • Need help archiving data sets/collections Data Curation Profiles http://www.datacurationprofiles.org Access. Knowledge. Success “Investigating Data Curation Profiles across Research Domains” This lead to a research award from IMLS • Goals of the project: – A better understanding of the practices, attitudes and needs of researchers in managing and sharing their data. – Identify possible roles for librarians and the skill sets they will need to facilitate data sharing and curation. – Develop “data curation profiles” – a tool for librarians and others to gather information on researcher needs for their data and to inform curation services. – Purdue: Brandt—PI, Carlson—Proj Mgr, Witt—coPI; GSLIS UIUC: Palmer—co-PI, Cragin—Proj Mgr Data Curation Profiles http://www.datacurationprofiles.org Access. Knowledge. Success Data Curation Profiles Methodology • Initial interview— based on “Conducting a Data Interview” poster, then pre-profile based on model of data in a few published papers • Interviews were transcribed, reviewed • Coding, wanted more detail -> 2nd Interview • Developed and revised a template for Profile “In our experience, one of the most effective tactics for eliciting datasets for the collection is a simple librarian- researcher interview. -
Repurposing Archival Theory in the Practice of Data Curation
Repurposing Archival Theory in the Practice of Data Curation Elizabeth Rolando| Wendy Hagenmaier |Susan Wells Parham Introduction Methodology • Expansion of data curation and digital archiving services at the Georgia Tech Library • Process the same digital collection, once by data curator, once by digital archivist and Archives. • Data curation processing informed by OAIS Reference Model1, ICPSR workflow2, and • How do data curation and archival science intersect? UK Data Archive workflow3 • How can comparing data curation and archival science lead to improvements in • Archival processing informed by concepts, such as appraisal, respect des fonds, local workflows and practices? original order, and archival value4, as well documented practices at peer institutions • Compare processing plans to discover areas of agreement and areas of conflict Data Transfer Data Processing Metadata Processing Preservation Access Unique Data Curation Processing Steps -Deposit agreement modeled on -Format transformation policies -Review and enhancement of -Varied retention periods, -Datasets treated as active and institutional repository license guided by reuse over preservation README file, used to accommodate determined by Board of Regents reusable -Funding model for sustainability -Create derivatives to promote diverse depositor needs Retention Schedule and funding -Datasets linked to publications access and re-use model -Bulk or individual file download -Correct erroneous or missing data Common Processing Steps -Data quarantine -Format identification -
Applying the ETL Process to Blockchain Data. Prospect and Findings
information Article Applying the ETL Process to Blockchain Data. Prospect and Findings Roberta Galici 1, Laura Ordile 1, Michele Marchesi 1 , Andrea Pinna 2,* and Roberto Tonelli 1 1 Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy; [email protected] (R.G.); [email protected] (L.O.); [email protected] (M.M.); [email protected] (R.T.) 2 Department of Electrical and Electronic Engineering (DIEE), University of Cagliari, Piazza D’Armi, 09100 Cagliari, Italy * Correspondence: [email protected] Received: 7 March 2020; Accepted: 7 April 2020; Published: 10 April 2020 Abstract: We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, elaborate and make it available for further analysis. The study aims to satisfy the need for increasingly efficient data extraction strategies and effective representation methods for blockchain data. For this reason, we conceived a system to make scalable the process of blockchain data extraction and clustering, and to provide a SQL database which preserves the distinction between transaction and addresses. The proposed system satisfies the need to cluster addresses in entities, and the need to store the extracted data in a conventional database, making possible the data analysis by querying the database. In general, ETL processes allow the automation of the operation of data selection, data collection and data conditioning from a data warehouse, and produce output data in the best format for subsequent processing or for business. We focus on the Bitcoin blockchain transactions, which we organized in a relational database to distinguish between the input section and the output section of each transaction. -
Don't WARC Away: Preservation Metadata & Web Archives
Don't WARC Away: Preservation Metadata & Web Archives! Jefferson Bailey & Maria LaCalle, Internet Archive ALA 2015 | ALCTS PARS | June 27, 2015 @jefferson_bail | [email protected] Don't WARC Away: Preservation Metadata & Web Archives! Jefferson Bailey & Maria LaCalle, Internet Archive ALA 2015 | ALCTS PARS | June 27, 2015 @jefferson_bail | [email protected] •! We are a non-profit Digital Library & Archive founded in 1996 •! 20+PB unique data: 10PB web, ~8m text, 2m vid, 2m aud, 100K soft, etc •! We work in a former church and it’s awesome •! Developed: Heritrix, Wayback, warcprox, Umbra, NutchWax, ARC format •! Engineers, librarians/archivists, program staff •! https://archive.org/web •! Largest and oldest publicly available web archive in existence •! 485,000,000,000+ URLs (that’s billions) •! Like a billion websites, domain agnostic •! Content in 40+ Languages •! Periodic snapshot; 1b+ URLs per week •! https://archive-it.org/ •! Web archiving service used by 370+ institutions •! 3500+ collection, 10 billion+ URLs •! 49 states and 19 countries •! Libraries, archives, museums, governments, non-profits, etc. •! User groups, Annual Meeting, collaborative and educational projects What is a web archive? •! Web archiving is the process of collecting portions of web content, preserving the collections, and then providing access to the archives - for use and re use. •! A web archive is a collection of archived URLs grouped by theme, event, subject area, or web address. •! A web archive contains as much as possible from the original resources and documents the change over time. It recreates the experience a user would have had if they!had visited the live site on the day it was archived. -
SCONUL Focus Number 38 Summer/Autumn 2006
SCONUL Focus Number 38 Summer/Autumn 2006 Contents ISSN 1745-5782 (print) ISSN 1745-5790 (online) 3 The 3Ss 4 The Learning Grid at the University of Warwick: a library innovation to support learning in higher education Rachel Edwards 8 The Learning Gateway: opening the doors to a new generation of learners at St Martin’s College, Carlisle campus Margaret Weaver 11 Middlesex University: the impressive rejuvenation of Hendon campus Paul Beaty-Pownall 14 Poor design equals poor health questionnaire: the final results Jim Jackson 20 Human resourcing in academic libraries: the ‘lady librarian’, the call for flexible staff and the need to be counted A. D. B. MacLean, N. C. Joint 26 Taking steps that make you feel dizzy: personal reflections on module 1 of the Future Leaders programme John Cox, Annie Kilner, Dilys Young 30 Evolution: the Oxford trainee scheme Gill Powell, Katie Robertson 34 A week in the life Kim McGowan 36 Got the knowledge? Focusing on the student: Manchester Metropolitan University’s (MMU) library welcome campaign David Matthews, Emily Shields, Rosie Jones, Karen Peters 41 Ask the audience: e-voting at the University of Leeds Lisa Foggo, Susan Mottram, Sarah Taylor 44 Information literacy, the link between second and tertiary education: project origins and current developments Christine Irving 47 Review of how libraries are currently supporting the research process Ruth Stubbings, Joyce Bartlett, Sharon Reid 51 Researchers, information and libraries: the CONUL national research support survey John Cox 55 Creating a new Social Science Library at Oxford University based on reader consultation Louise Clarke 58 The use of personal scanners and digital cameras within OULS reading rooms Steve Rose, Gillian Evison 60 Copyright, digital resources and IPR at Brunel University Monique Ritchie 64 Secure electronic delivery: ‘get the world’s knowledge with less waiting’ Alison E.