Seeking a Digital Preservation Storage Container Format for Web Archiving
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Recommended Formats Statement 2019-2020
Library of Congress Recommended Formats Statement 2019-2020 For online version, see https://www.loc.gov/preservation/resources/rfs/index.html Introduction to the 2019-2020 revision ....................................................................... 2 I. Textual Works and Musical Compositions ............................................................ 4 i. Textual Works – Print .................................................................................................... 4 ii. Textual Works – Digital ................................................................................................ 6 iii. Textual Works – Electronic serials .............................................................................. 8 iv. Digital Musical Compositions (score-based representations) .................................. 11 II. Still Image Works ............................................................................................... 13 i. Photographs – Print .................................................................................................... 13 ii. Photographs – Digital ................................................................................................. 14 iii. Other Graphic Images – Print .................................................................................... 15 iv. Other Graphic Images – Digital ................................................................................. 16 v. Microforms ................................................................................................................ -
ROOT I/O Compression Improvements for HEP Analysis
EPJ Web of Conferences 245, 02017 (2020) https://doi.org/10.1051/epjconf/202024502017 CHEP 2019 ROOT I/O compression improvements for HEP analysis Oksana Shadura1;∗ Brian Paul Bockelman2;∗∗ Philippe Canal3;∗∗∗ Danilo Piparo4;∗∗∗∗ and Zhe Zhang1;y 1University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States 2Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States 3Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States 4CERN, Meyrin 1211, Geneve, Switzerland Abstract. We overview recent changes in the ROOT I/O system, enhancing it by improving its performance and interaction with other data analysis ecosys- tems. Both the newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiment’s software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically in- creased over the last couple of years. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly, because there are significant trade-offs between the increased CPU cost for reading and writing files and the reduces storage space. 1 Introduction In the past years, Large Hadron Collider (LHC) experiments are managing about an exabyte of storage for analysis purposes, approximately half of which is stored on tape storages for archival purposes, and half is used for traditional disk storage. Meanwhile for High Lumi- nosity Large Hadron Collider (HL-LHC) storage requirements per year are expected to be increased by a factor of 10 [1]. -
ACS – the Archival Cytometry Standard
http://flowcyt.sf.net/acs/latest.pdf ACS – the Archival Cytometry Standard Archival Cytometry Standard ACS International Society for Advancement of Cytometry Candidate Recommendation DRAFT Document Status The Archival Cytometry Standard (ACS) has undergone several revisions since its initial development in June 2007. The current proposal is an ISAC Candidate Recommendation Draft. It is assumed, however not guaranteed, that significant features and design aspects will remain unchanged for the final version of the Recommendation. This specification has been formally tested to comply with the W3C XML schema version 1.0 specification but no position is taken with respect to whether a particular software implementing this specification performs according to medical or other valid regulations. The work may be used under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported license. You are free to share (copy, distribute and transmit), and adapt the work under the conditions specified at http://creativecommons.org/licenses/by-sa/3.0/legalcode. Disclaimer of Liability The International Society for Advancement of Cytometry (ISAC) disclaims liability for any injury, harm, or other damage of any nature whatsoever, to persons or property, whether direct, indirect, consequential or compensatory, directly or indirectly resulting from publication, use of, or reliance on this Specification, and users of this Specification, as a condition of use, forever release ISAC from such liability and waive all claims against ISAC that may in any manner arise out of such liability. ISAC further disclaims all warranties, whether express, implied or statutory, and makes no assurances as to the accuracy or completeness of any information published in the Specification. -
Arxiv:2004.10531V1 [Cs.OH] 8 Apr 2020
ROOT I/O compression improvements for HEP analysis Oksana Shadura1;∗ Brian Paul Bockelman2;∗∗ Philippe Canal3;∗∗∗ Danilo Piparo4;∗∗∗∗ and Zhe Zhang1;y 1University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States 2Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States 3Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States 4CERN, Meyrin 1211, Geneve, Switzerland Abstract. We overview recent changes in the ROOT I/O system, increasing per- formance and enhancing it and improving its interaction with other data analy- sis ecosystems. Both the newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly to improve experiment’s software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically in- creased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are sig- nificant trade-offs between the increased CPU cost for reading and writing files and the reduce storage space. 1 Introduction In the past years LHC experiments are commissioned and now manages about an exabyte of storage for analysis purposes, approximately half of which is used for archival purposes, and half is used for traditional disk storage. Meanwhile for HL-LHC storage requirements per year are expected to be increased by factor 10 [1]. arXiv:2004.10531v1 [cs.OH] 8 Apr 2020 Looking at these predictions, we would like to state that storage will remain one of the major cost drivers and at the same time the bottlenecks for HEP computing. -
Building and Installing Xen 4.X and Linux Kernel 3.X on Ubuntu and Debian Linux
Building and Installing Xen 4.x and Linux Kernel 3.x on Ubuntu and Debian Linux Version 2.3 Author: Teo En Ming (Zhang Enming) Website #1: http://www.teo-en-ming.com Website #2: http://www.zhang-enming.com Email #1: [email protected] Email #2: [email protected] Email #3: [email protected] Mobile Phone(s): +65-8369-2618 / +65-9117-5902 / +65-9465-2119 Country: Singapore Date: 10 August 2013 Saturday 4:56 A.M. Singapore Time 1 Installing Prerequisite Software sudo apt-get install ocaml-findlib sudo apt-get install bcc bin86 gawk bridge-utils iproute libcurl3 libcurl4-openssl-dev bzip2 module-init-tools transfig tgif texinfo texlive-latex-base texlive-latex-recommended texlive-fonts-extra texlive-fonts-recommended pciutils-dev mercurial build-essential make gcc libc6-dev zlib1g-dev python python-dev python-twisted libncurses5-dev patch libvncserver-dev libsdl-dev libjpeg62-dev iasl libbz2-dev e2fslibs-dev git-core uuid-dev ocaml libx11-dev bison flex sudo apt-get install gcc-multilib sudo apt-get install xz-utils libyajl-dev gettext sudo apt-get install git-core kernel-package fakeroot build-essential libncurses5-dev 2 Linux Kernel 3.x with Xen Virtualization Support (Dom0 and DomU) In this installation document, we will build/compile Xen 4.1.3-rc1-pre and Linux kernel 3.3.0-rc7 from sources. sudo apt-get install aria2 aria2c -x 5 http://www.kernel.org/pub/linux/kernel/v3.0/testing/linux-3.3-rc7.tar.bz2 tar xfvj linux-3.3-rc7.tar.bz2 cd linux-3.3-rc7 Page 1 of 25 (C) 2013 Teo En Ming (Zhang Enming) 3 Configuring the Linux kernel cp /boot/config-3.0.0-12-generic .config make oldconfig Accept the defaults for new kernel configuration options by pressing enter. -
A Statistical Report on State Archives and Records Management Programs in the United States
THE STATE OF STATE RECORDS A Statistical Report on State Archives and Records Management Programs in the United States 2021 EDITION BASED ON FY2020 SURVEY JULY 2021 RECENT REPORTS AND PUBLICATIONS FROM THE COUNCIL OF STATE ARCHIVISTS These and other publications may be found at the CoSA website, https://www.statearchivists.org/ SERI Digital Best Practices Series Number 2: Developing Managing Electronic Communications in Government. 2019 Successful Electronic Records Grants Projects. May 2021 When is Information Misinformation? 2019. MoVE-IT: Modeling Viable Electronic Information Transfers. A collaborative publication of CoSA, the Chief Officers of State February 2021 Library Agencies (COSLA), and the National Association of The Modeling Viable Electronic Information Transfers (MoVE-IT) Secretaries of State (NASS), this handy infographic highlights project builds upon work surveying state and territorial archives common pitfalls for online users and offers five tips for regarding the efficacy of their electronic records management avoiding them. and digital preservation programs. This report analyzes seven electronic records transfer projects with regard to the elements State Archiving in the Digital Era: A Playbook for the that challenged them and, ultimately, made them successes. Preservation of Electronic Records. OctOber 2018 Published with support from Preservica and AVP. Developed by CoSA and the National Association of State Chief Information Officers, the playbook presents eleven “plays” Public Records and Remote Work: Advice for State to help state leaders think about the best ways to preserve Governments. SepteMber 2020 archives in the digital era. This report was supported in part by Social Media as State Public Records: A Collections Practices a National Leadership Grant from the Institute of Museum and Scan. -
Metadefender Core V4.12.2
MetaDefender Core v4.12.2 © 2018 OPSWAT, Inc. All rights reserved. OPSWAT®, MetadefenderTM and the OPSWAT logo are trademarks of OPSWAT, Inc. All other trademarks, trade names, service marks, service names, and images mentioned and/or used herein belong to their respective owners. Table of Contents About This Guide 13 Key Features of Metadefender Core 14 1. Quick Start with Metadefender Core 15 1.1. Installation 15 Operating system invariant initial steps 15 Basic setup 16 1.1.1. Configuration wizard 16 1.2. License Activation 21 1.3. Scan Files with Metadefender Core 21 2. Installing or Upgrading Metadefender Core 22 2.1. Recommended System Requirements 22 System Requirements For Server 22 Browser Requirements for the Metadefender Core Management Console 24 2.2. Installing Metadefender 25 Installation 25 Installation notes 25 2.2.1. Installing Metadefender Core using command line 26 2.2.2. Installing Metadefender Core using the Install Wizard 27 2.3. Upgrading MetaDefender Core 27 Upgrading from MetaDefender Core 3.x 27 Upgrading from MetaDefender Core 4.x 28 2.4. Metadefender Core Licensing 28 2.4.1. Activating Metadefender Licenses 28 2.4.2. Checking Your Metadefender Core License 35 2.5. Performance and Load Estimation 36 What to know before reading the results: Some factors that affect performance 36 How test results are calculated 37 Test Reports 37 Performance Report - Multi-Scanning On Linux 37 Performance Report - Multi-Scanning On Windows 41 2.6. Special installation options 46 Use RAMDISK for the tempdirectory 46 3. Configuring Metadefender Core 50 3.1. Management Console 50 3.2. -
Bagit File Packaging Format (V0.97) Draft-Kunze-Bagit-07.Txt
Network Working Group A. Boyko Internet-Draft Expires: October 4, 2012 J. Kunze California Digital Library J. Littman L. Madden Library of Congress B. Vargas April 2, 2012 The BagIt File Packaging Format (V0.97) draft-kunze-bagit-07.txt Abstract This document specifies BagIt, a hierarchical file packaging format for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive "tags" and a "payload" but does not require knowledge of the payload's internal semantics. This BagIt format should be suitable for disk-based or network-based storage and transfer. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 4, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Boyko, et al. -
Freebsd File Formats Manual Libarchive-Formats (5)
libarchive-formats (5) FreeBSD File Formats Manual libarchive-formats (5) NAME libarchive-formats —archive formats supported by the libarchive library DESCRIPTION The libarchive(3) library reads and writes a variety of streaming archive formats. Generally speaking, all of these archive formats consist of a series of “entries”. Each entry stores a single file system object, such as a file, directory,orsymbolic link. The following provides a brief description of each format supported by libarchive,with some information about recognized extensions or limitations of the current library support. Note that just because a format is supported by libarchive does not imply that a program that uses libarchive will support that format. Applica- tions that use libarchive specify which formats theywish to support, though manyprograms do use libarchive convenience functions to enable all supported formats. TarFormats The libarchive(3) library can read most tar archives. However, itonly writes POSIX-standard “ustar” and “pax interchange” formats. All tar formats store each entry in one or more 512-byte records. The first record is used for file metadata, including filename, timestamp, and mode information, and the file data is stored in subsequent records. Later variants have extended this by either appropriating undefined areas of the header record, extending the header to multiple records, or by storing special entries that modify the interpretation of subsequent entries. gnutar The libarchive(3) library can read GNU-format tar archives. It currently supports the most popular GNU extensions, including modern long filename and linkname support, as well as atime and ctime data. The libarchive library does not support multi-volume archives, nor the old GNU long filename format. -
Electronic Data Deliverables Reference Guide EPA Region 4
Electronic Data Deliverables Reference Guide Version 1.0 EPA Region 4 Prepared By: for Region 4 Superfund Division Environmental Protection Agency March 2010 DISCLAIMER OF ENDORSEMENT Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government, and shall not be used for advertising or product endorsement purposes. STATUS OF DOCUMENT As of March 2010, this document and all contents contained herein are considered DRAFT and are subject to revision and subsequent republication. Ecological EDD specifications do not appear in this guidance as they are currently under development, and will appear in future addenda. CONTACTS For questions and comments, contact: Your RPM or, DART Coordinator Superfund Division, 11th Floor East United States Environmental Protection Agency, Region 4 Sam Nunn Atlanta Federal Center 61 Forsyth Street, SW Atlanta, GA 30303-8960 (404) 562-8558 [email protected] Acronyms CAS RN – Chemical Abstracts Service Registry Number DART – Data Archival and ReTrieval EDD – Electronic Data Deliverable EDP – EQuIS Data Processor EPA – Environmental Protection Agency O&M – Operation and Maintenance SESD – Science and Ecosystem Support Division SRS – Substance Registry System CLP – Contract Laboratory Program PRP – Potentially Responsible Party Definitions Darter - Darter is a set of software utilities written by EPA that assist in moving data from other platforms such as FORMS, Niton, YSI and Scribe to the Region 4 EDD format. Data Provider – It is important to distinguish between “Data Provider” and “Sample Provider” with regard to EDD submittals. -
Application Software
Software 2 Computer Software ซอรฟแวร หมายถึง กลุมของโปรแกรมคอมพิวเตอรหรือชุดคําสั่งที่สั่งการใหคอมพิวเตอรทํางานที่ตองการ และทํางาน โดยวิธีใด โดยชุดคําสั่งเหลานี้จะเรียงลําดับขั้นตอนการทํางานตางๆของคอมพิวเตอรโดยเขียนขึ้นจากภาษาคอมพิวเตอร ซึ่ง ชุดคําสั่งเหลานี้เรียงกันเปนโปรแกรมคอมพิวเตอร ซึ่ง Software แบงออกไดเปน 2ประเภท ไดแก 1. System Software 2. Application software System Software คือ ซอฟตแวรที่บริษัทผูผลิตสรางขึ้นมาเพื่อใชจัดการกับระบบคอมพิวเตอร หนาที่การทํางานของซอฟตแวรระบบคือ ดําเนินงานพื้นฐานตาง ๆ ของระบบคอมพิวเตอรทั้งในดานการติดตอประสานงานกับอุปกรณคอมพิวเตอรหรือ Hardware และ เปนสวนที่ดําเนินการพื้นฐานของคอมพิวเตอรในการทํางานของโปรแกรมตางๆอีกดวย โดย System Software ถูกแบงไดเปน 2สวนใหญๆ ไดแก 1. ระบบปฏิบัติการ (OS : Operating System) คือ ซอฟตแวรหรือโปรแกรม ที่ควบคุมการประมวลผลของโปรแกรมตางๆ และบริหารการใชทรัพยากรของ ระบบและติดตอการทํางานกับ Hardware เชน การบริหารการใชหนวยความจําของโปรแกรมตางๆ Software 3 หนาที่ของระบบปฏิบัติการ o แสดงรายระเอียดและตรวจสอบการเชื่อมตอ ขอมูลและการทํางานเกี่ยวกับอุปกรณตางๆภายในคอมพิวเตอรเมื่อมี การ Boot !! การ Boot คือ กระบวนการ Start และ Restart คอมพิวเตอร แบงเปน 2 ประเภท คือ Cold Boot - เปดดวยการกดปุมPower Warm Boot - เปนการเปดเครื่องใหมจากการ Restart หรือ Reset เครื่อง ***Sleep Mode กับ Hibernate Mode คืออะไร? - Sleep Mode - เปนการพักเครื่อง ดวยการตัดไฟสวนที่ไมจําเปน - Hibernate Mode - เปนการปดเครื่องโดยการยายขอมูลทั้งหมดไปที่ Hard disk แลวตัดไฟ ทั้งหมด เมื่อเปดเครื่องกลับมาอีกครั้ง ระบบจะดึงขอมูลเดิมกลับมาทํางานตอ โปรแกรมตางๆ ที่ เปดไว จะอยูดังเดิม o ตอบสนองการใชงาน สั่งการของ User ผานทางการปอนหรือสงออก -
(Arcp) URI Scheme
The Archive and Package (arcp) URI scheme 1st Stian Soiland-Reyes 2nd Marcos C´aceres School of Computer Science Mozilla Corporation The University of Manchester Melbourne, Australia Manchester, UK https://marcosc.com/ https://orcid.org/0000-0001-9842-9718 Abstract—The arcp URI scheme is introduced for location- downloaded application manifest together with relative links, independent identifiers to consume or reference hypermedia and while also allowing a web application to work offline. linked data resources bundled inside a file archive, as well as to resolve archived resources within programmatic frameworks II. THE ARCHIVE AND PACKAGE (ARCP) URI SCHEME for Research Objects. The Research Object for this article is available at http://s11.no/2018/arcp.html#ro Inspired by the app URL scheme we defined the Index Terms—Uniform resource locators, Semantic Web, Per- Archive and Package (arcp) URI scheme [10], an IETF sistent identifiers, Identity management systems, Data com- pression, Hypertext systems, Distributed information systems, Internet-Draft which specifies how to mint URIs to reference Content-based retrieval resources within any archive or package, independent of archive format or location. I. BACKGROUND The primary use case for arcp is for consuming applications, which may receive an archive through various ways, like file Archive formats like BagIt [1] have been recognized as upload from a web browser or by reference to a dataset in a important for preservation and transferring of datasets and repository like Zenodo or FigShare. In order to parse Linked other digital resources [2]. More specific examples include Data resources (say to expose them for SPARQL queries), they COMBINE archives [3] for systems biology, CDF [4] for will need to generate a base URL for the root of the archive.