Curating Research Data: Volume
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Jamie Shiers, CERN
Investing in Curation A Shared Path to Sustainability [email protected] Data Preservation in HEP (DPHEP) Outline 1. Pick 2 of the messages from the roadmap & comment – I could comment on all – but not in 10’ 1. What (+ve) impact has 4C already had on us? – Avoiding overlap with the above 1. A point for discussion – shared responsiblity / action THE MESSAGES The 4C Roadmap Messages 1. Identify the value of digital assets and make choices 2. Demand and choose more efficient systems 3. Develop scalable services and infrastructure 4. Design digital curation as a sustainable service 5. Make funding dependent on costing digital assets across the whole lifecycle 6. Be collaborative and transparent to drive down costs IMPACT OF 4C ON DPHEP International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics “LHC Cost Model” (simplified) Start with 10PB, then +50PB/year, then +50% every 3y (or +15% / year) 10EB 1EB 6 Case B) increasing archive growth Total cost: ~$59.9M (~$2M / year) 7 1. Identify the value of digital assets and make choices • Today, significant volumes of HEP data are thrown away “at birth” – i.e. via very strict filters (aka triggers) B4 writing to storage To 1st approximation ALL remaining data needs to be kept for a few decades • “Value” can be measured in a number of ways: – Scientific publications / results; – Educational / cultural impact; – “Spin-offs” – e.g. superconductivity, ICT, vacuum technology. Why build an LHC? BEFORE! 1 – Long Tail of Papers 2 – New Theore cal Insights 3 4 3 – “Discovery” to “Precision” Volume: 100PB + ~50PB/year (+400PB/year from 2020) 11 Zimmermann( Alain Blondel TLEP design study r-ECFA 2013-07-20 5 Balance sheet – Tevatron@FNAL • 20 year investment in Tevatron ~ $4B • Students $4B • Magnets and MRI $5-10B } ~ $50B total • Computing $40B Very rough calculation – but confirms our gut feeling that investment in fundamental science pays off I think there is an opportunity for someone to repeat this exercise more rigorously cf. -
Module 8 Wiki Guide
Best Practices for Biomedical Research Data Management Harvard Medical School, The Francis A. Countway Library of Medicine Module 8 Wiki Guide Learning Objectives and Outcomes: 1. Emphasize characteristics of long-term data curation and preservation that build on and extend active data management ● It is the purview of permanent archiving and preservation to take over stewardship and ensure that the data do not become technologically obsolete and no longer permanently accessible. ○ The selection of a repository to ensure that certain technical processes are performed routinely and reliably to maintain data integrity ○ Determining the costs and steps necessary to address preservation issues such as technological obsolescence inhibiting data access ○ Consistent, citable access to data and associated contextual records ○ Ensuring that protected data stays protected through repository-governed access control ● Data Management ○ Refers to the handling, manipulation, and retention of data generated within the context of the scientific process ○ Use of this term has become more common as funding agencies require researchers to develop and implement structured plans as part of grant-funded project activities ● Digital Stewardship ○ Contributions to the longevity and usefulness of digital content by its caretakers that may occur within, but often outside of, a formal digital preservation program ○ Encompasses all activities related to the care and management of digital objects over time, and addresses all phases of the digital object lifecycle 2. Distinguish between preservation and curation ● Digital Curation ○ The combination of data curation and digital preservation ○ There tends to be a relatively strong orientation toward authenticity, trustworthiness, and long-term preservation ○ Maintaining and adding value to a trusted body of digital information for future and current use. -
Archival Representation
Archival Science 3: 1–25, 2003. 1 © 2003 Kluwer Academic Publishers. Printed in the Netherlands. Archival Representation ELIZABETH YAKEL University of Michigan, School of Information (E-mail: [email protected]) Abstract. This paper defines and discusses archival representation and its role in archival practice. Archival representation refers to both the processes of arrangement and description and is viewed as a fluid, evolving, and socially constructed practice. The paper analyzes organizational and descriptive schemas, tools, and systems as a means of uncovering repre- sentational practices. In conclusion the paper argues that the term ‘archival representation’ more precisely captures the actual work of archivists in (re)ordering, interpreting, creating surrogates, and designing architectures for representational systems. Keywords: archival arrangement and description, archival cataloging, archival practice, archival processing, finding aids A man hath perished and his corpse has become dirt. All his kindred have crumbled to dust. But writings cause him to be remembered in the mouth of the reciter.1 In The Design of Everyday Things Donald Norman argues for a user- centered approach to the design of the daily artifacts we take for granted.2 While archives and archival collections are not everyday things for most people, they are embedded in everyday archival practice. Furthermore, archival representations and representational systems must characterize these everyday things for potential researchers. The term ‘representation’ is used to refer both to the process or activity of representing and to the object(s) produced by an instance of that activity. The process of representing seeks to establish systematic corre- spondence between the target domain and the modeling domain and to capture or ‘re-present,’ through the medium of the modeling domain, the object, the data, or information in the target domain .. -
Digital Data Archive and Preservation in the Cloud – What to Do and What Not to Do 2 © 2013 Storage Networking Industry Association
DigitalPRESENTATION Data Archive TITLE and GOES PreservationHERE in the Cloud - What To Do and What Not To Do Chad Thibodeau / Cleversafe Sebastian Zangaro / HP SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. Digital Data Archive and Preservation in the Cloud – What to do and What Not to Do 2 © 2013 Storage Networking Industry Association. All Rights Reserved. 2 Abstract Cloud Archive Challenges and Best Practices This session will appeal to Storage Vendors, Datacenter Managers, Developers, and those seeking a basic understanding of how best to implement a Cloud Storage Digital Data Archive and Cloud Storage Digital Data Preservation service. -
Course Schedule
Library & Information Science Program University of Hawaiʻi at Mānoa Information & Computer Sciences Department COURSE SYLLABUS Fall 2017 LIS 693: Archival & Special Collections Management Instructor: Dr. Andrew Wertheimer Contact Information: 002G Hamilton Library, 2550 McCarthy Mall, Honolulu Hawaii 96822 TEL: e-mail: [email protected] 808. 956. 5812 Please add LIS 693 in the subject line. Chat: Google Chat: [email protected] (by appointment) Skype: andrew.b.wertheimer (by appointment) Course Portal (Laulima): https://laulima.hawaii.edu/portal Office Hours: Tuesdays: 4:00-5:30 PM Thursdays: 3:30-5:00 PM or by appointment Class Time/ Location: Tuesdays 1-3:40 pm @ Hamilton 3G Course Catalog Description: LIS 693 (Special Topics): Archival & Special Collections Management introduces students to the proper management of an archive, manuscript collection or special collection using management approaches and best practices from archival studies. This course introduces management theory, appraisal theory, and relevant issues concerning facilities and legal issues related to privacy, intellectual property, records management, as well as advocacy, fundraising, reference and educational outreach functions. This course, along with Archival Ethics, Archival Processing, is part of the professional preparation for the archival field. 3 credits. Prerequisites: None. Beginning in Fall 2018, pending approval, this course will be offered as LIS 658: Archival & Special Collections Management. Textbook & Readings Required Textbook: Kate Theimer, Management: Innovative Practices for Archives and Special Collections. Lanham: Rowman & Littlefield Publishers, 2014 ISBN 978-1-59884- 864-9 (at the University Bookstore). (also available as an e-book) Additional Required Readings: Additional required readings will be posted in Laulima. Articles are available via UHM’s electronic resources. -
From Coalition to Commons: Plan S and the Future of Scholarly Communication
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Copyright, Fair Use, Scholarly Communication, etc. Libraries at University of Nebraska-Lincoln 2019 From Coalition to Commons: Plan S and the Future of Scholarly Communication Rob Johnson Research Consulting Follow this and additional works at: https://digitalcommons.unl.edu/scholcom Part of the Intellectual Property Law Commons, Scholarly Communication Commons, and the Scholarly Publishing Commons Johnson, Rob, "From Coalition to Commons: Plan S and the Future of Scholarly Communication" (2019). Copyright, Fair Use, Scholarly Communication, etc.. 157. https://digitalcommons.unl.edu/scholcom/157 This Article is brought to you for free and open access by the Libraries at University of Nebraska-Lincoln at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Copyright, Fair Use, Scholarly Communication, etc. by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Insights – 32, 2019 Plan S and the future of scholarly communication | Rob Johnson From coalition to commons: Plan S and the future of scholarly communication The announcement of Plan S in September 2018 triggered a wide-ranging debate over how best to accelerate the shift to open access. The Plan’s ten principles represent a call for the creation of an intellectual commons, to be brought into being through collective action by funders and managed through regulated market mechanisms. As it gathers both momentum and critics, the coalition must grapple with questions of equity, efficiency and sustainability. The work of Elinor Ostrom has shown that successful management of the commons frequently relies on polycentricity and adaptive governance. The Plan S principles must therefore function as an overarching framework within which local actors retain some autonomy, and should remain open to amendment as the scholarly communication landscape evolves. -
Open Access Availability of Scientific Publications
Analytical Support for Bibliometrics Indicators Open access availability of scientific publications Analytical Support for Bibliometrics Indicators Open access availability of scientific publications* Final Report January 2018 By: Science-Metrix Inc. 1335 Mont-Royal E. ▪ Montréal ▪ Québec ▪ Canada ▪ H2J 1Y6 1.514.495.6505 ▪ 1.800.994.4761 [email protected] ▪ www.science-metrix.com *This work was funded by the National Science Foundation’s (NSF) National Center for Science and Engineering Statistics (NCSES). Any opinions, findings, conclusions or recommendations expressed in this report do not necessarily reflect the views of NCSES or the NSF. The analysis for this research was conducted by SRI International on behalf of NSF’s NCSES under contract number NSFDACS1063289. Analytical Support for Bibliometrics Indicators Open access availability of scientific publications Contents Contents .............................................................................................................................................................. i Tables ................................................................................................................................................................. ii Figures ................................................................................................................................................................ ii Abstract ............................................................................................................................................................ -
A Data Management Life-Cycle
science for a changing world Central Energy Resources Team Data Management A Data Management Life-Cycle Introduction Documented, reliable, and accessible data and information are essential building blocks supporting scientific research and applications that enhance society's knowledge base (fig. 1). The U.S. Geological Survey (USGS), a leading provider of science data, information, and knowledge, is uniquely positioned to integrate science and natural resource information to address societal needs. The USGS Central Energy Resources Team (USGS-CERT) provides critical information and knowledge on tnu iuu the quantity, quality, and distribution of the Nation's and the 10101010110 world's oil, gas, and coal resources. 1C001111001 By using a life-cycle model, the USGS-CERT Data Manage 1001001"*: ment Project is developing an integrated data management system 10^19*1- to (1) promote access to energy data and information, (2) increase data documentation, and (3) streamline product delivery to the public, scientists, and decision makers. The project incorporates web-based technology, data cataloging systems, data processing Data Information routines, and metadata documentation tools to improve data Figure 1. Access to organized and documented data and access, enhance data consistency, and increase office efficiency. information increases knowledge. Implementation Summary and Future Directions In a life-cycle model, data and information are transformed into tangible products and knowledge by a continuous flow in which The Data Management Project addresses USGS science the output of one process becomes the input of others (fig. 2). \ goals and affects how the Central Energy Resources Team manages data and carries out its programs. Our task-oriented "Find, Get, Use, Deliver, and Maintain" The tools and processes developed are based on strategy incorporates life-cycle concepts and directs USGS-CERT the life-cycle approach and our "Find, Get, Use, data management tasks and implementation. -
Measuring the Impact of Digital Repositories
Measuring the Impact of Digital Repositories February 28-March 1, 2017 Participant Introductions Bruce Ambacher’s career spans four decades at NARA, George Mason University, and the University of Maryland. His responsibilities included service as acting chief of NARA’s digital preservation unit and as court-appointed preservation officer for the PROFS, Iran-Contra, and Clinton email collections. He represented NARA on the Federal Geographic Data Committee and helped develop federal and international geospatial standards. He was NARA’s representative for the OAIS Reference Model. He co-chaired the development of TRAC. He helped develop the trustworthy digital repositories standards ISO 16363 and ISO 16919. He began teaching courses in archives and digital preservation at George Mason University in 1984, became an adjunct professor at the University of Maryland in 2000 and a Visiting Professor between 2007 and 2013. He has consulted on digital preservation for industry and cultural humanities institutions. He is a Research Affiliate of the Digital Curation Innovation Center at the University of Maryland Mary Barlow is head of the Strategic Project Management Office at the European Bioinformatics Institute (EMBL-EBI), and has oversight of impact reporting and the development of infrastructure business cases. Prior to taking on this role, Mary served as a Programme Manager for the multi-million pound investment in EMBL-EBI by the UK government's Large Facilities Capital Fund. This programme included the construction of new office space and the on-going public procurement of ICT infrastructure to support EMBL-EBI's growing public databases. Mary's work prior to EMBL-EBI focused on ICT integration and intelligent buildings. -
United States Court of Appeals for the District of Columbia Circuit
USCA Case #17-7035 Document #1694255 Filed: 09/25/2017 Page 1 of 58 Nos. 17-7035 (Lead Case), 17-7039 In the United States Court of Appeals for the District of Columbia Circuit AMERICAN SOCIETY FOR TESTING AND MATERIALS; NATIONAL FIRE PROTECTION ASSOCIATION, INC.; and AMERICAN SOCIETY OF HEATING, REFRIGERATING, AND AIR-CONDITIONING ENGINEERS, INC., Plaintiffs-Appellees, v. PUBLIC.RESOURCE.ORG, INC., Defendant-Appellant. (Full caption on inside cover) Appeal from the United States District Court for the District of Columbia BRIEF OF SIXTY-SIX LIBRARY ASSOCIATIONS, NONPROFIT ORGANIZATIONS, LEGAL TECHNOLOGY COMPANIES, FORMER SENIOR GOVERNMENT OFFICIALS, LIBRARIANS, INNOVATORS, AND PROFESSORS OF LAW AS AMICI CURIAE IN SUPPORT OF DEFENDANT-APPELLANT (AMENDED TO ADD FURTHER SIGNATORIES) Charles Duan Counsel of Record Meredith F. Rose Public Knowledge 1818 N Street NW, Suite 410 Washington, DC 20036 (202) 861-0020 [email protected] Counsel for amici curiae Rev. c9e1c40d USCA Case #17-7035 Document #1694255 Filed: 09/25/2017 Page 2 of 58 AMERICAN SOCIETY FOR TESTING AND MATERIALS; NATIONAL FIRE PROTECTION ASSOCIATION, INC.; and AMERICAN SOCIETY OF HEATING, REFRIGERATING, AND AIR-CONDITIONING ENGINEERS, INC., Plaintiffs-Appellees, v. PUBLIC.RESOURCE.ORG, INC., Defendant-Appellant. AMERICAN EDUCATIONAL RESEARCH ASSOCIATION, INC.; AMERICAN PSYCHOLOGICAL ASSOCIATION, INC.; and NATIONAL COUNCIL ON MEASUREMENT IN EDUCATION, INC., Plaintiffs-Appellees, v. PUBLIC.RESOURCE.ORG, INC., Defendant-Appellant, AMERICAN SOCIETY FOR TESTING AND MATERIALS; NATIONAL FIRE PROTECTION ASSOCIATION, INC.; and AMERICAN SOCIETY OF HEATING, REFRIGERATING, AND AIR-CONDITIONING ENGINEERS, INC., Intervenors-Appellees. USCA Case #17-7035 Document #1694255 Filed: 09/25/2017 Page 3 of 58 CERTIFICATE AS TO PARTIES, RULINGS, AND RELATED CASES Pursuant to Circuit Rule 28(a)(1), amici curiae certify as follows. -
Comparing Tape and Cloud Storage for Long-Term Data Preservation 1
Comparing Tape and Cloud Storage for Long-Term Data Preservation November 2018 Table of Contents Abstract ......................................................................................................................................................... 2 Introduction .................................................................................................................................................. 2 Perceptions of Leading Edge Technology ..................................................................................................... 2 Reliability ....................................................................................................................................................... 3 Security ......................................................................................................................................................... 4 Speed ............................................................................................................................................................ 5 Storing Data .............................................................................................................................................. 5 Retrieving Data ......................................................................................................................................... 6 Data Growth .................................................................................................................................................. 6 Redundancy ............................................................................................................................................. -
Undergraduate Biocuration: Developing Tomorrow’S Researchers While Mining Today’S Data
The Journal of Undergraduate Neuroscience Education (JUNE), Fall 2015, 14(1):A56-A65 ARTICLE Undergraduate Biocuration: Developing Tomorrow’s Researchers While Mining Today’s Data Cassie S. Mitchell, Ashlyn Cates, Renaid B. Kim, & Sabrina K. Hollinger Biomedical Engineering, Georgia Institute of Technology & Emory University, Atlanta, GA 30332. Biocuration is a time-intensive process that involves managerial positions. We detail the undergraduate extraction, transcription, and organization of biological or application and training processes and give detailed job clinical data from disjointed data sets into a user-friendly descriptions for each position on the assembly line. We database. Curated data is subsequently used primarily for present case studies of neuropathology curation performed text mining or informatics analysis (bioinformatics, entirely by undergraduates, namely the construction of neuroinformatics, health informatics, etc.) and secondarily experimental databases of Amyotrophic Lateral Sclerosis as a researcher resource. Biocuration is traditionally (ALS) transgenic mouse models and clinical data from ALS considered a Ph.D. level task, but a massive shortage of patient records. Our results reveal undergraduate curators to consolidate the ever-mounting biomedical “big biocuration is scalable for a group of 8-50+ with relatively data” opens the possibility of utilizing biocuration as a minimal required resources. Moreover, with average means to mine today’s data while teaching students skill accuracy rates greater than 98.8%, undergraduate sets they can utilize in any career. By developing a biocurators are equivalently accurate to their professional biocuration assembly line of simplified and counterparts. Initial training to be completely proficient at compartmentalized tasks, we have enabled biocuration to the entry-level takes about five weeks with a minimal be effectively performed by a hierarchy of undergraduate student time commitment of four hours/week.