Redundancy and Access Permissions in Decentralized File Systems

Total Page:16

File Type:pdf, Size:1020Kb

Redundancy and Access Permissions in Decentralized File Systems Lehrstuhl für Netzarchitekturen und Netzdienste Fakultät für Informatik Technische Universität München Redundancy and Access Permissions in Decentralized File Systems Johanna Amann Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Johann Schlichter Prüfer der Dissertation: 1. Univ.-Prof. Dr. Georg Carle 2. Univ.-Prof. Dr. Kurt Tutschku, Universität Wien 3. TUM Junior Fellow Dr. Thomas Fuhrmann Die Dissertation wurde am 20.06.2011 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 05.09.2011 angenommen. Zusammenfassung Verteilte Dateisysteme sind ein Thema mit dem sich die Informatik immer wieder aufs Neue auseinandersetzt. Die ersten verteilten Dateisysteme sind schon sehr kurz nach dem Aufkommen von Computernetzen entstanden. Traditionell sind solche Systeme serverbasiert, d. h. es gibt eine strikte Unterscheidung zwischen den Systemen welche die Daten speichern und den Systemen welche die Daten abrufen. Diese Architektur hat Vorteile; es wird zum Beispiel angenommen, dass die Server vertrauenswürdig sind. Außerdem gibt es eine inhärente lineare Ordnung der Dateioperationen. Auf der anderen Seite sind solche zentralisierten Systeme schlecht skalierbar, haben einen hohen Administrationsaufwand, eine geringe Fehlertoleranz oder sind teuer im Betrieb. Im Verlauf des letzten Jahrzehnts wurde eine neue Art von verteilten Dateisystemen wiederholt diskutiert. Diese werden dezentrale oder Peer-to-Peer Dateisysteme genannt. Bei diesen Systemen verschwimmt die strenge Trennung zwischen datenspeichernden und datenabrufenden Systemen. Die Daten werden auf allen am Dateisystem teilnehmenden Rechnern gespeichert. Eine solche Systemarchitektur kann viele der Problemstellen server- basierter Systeme ausgleichen. Sie ist unter anderem leicht skalierbar, benötigt wegen der verteilten Architektur eine geringere Bandbreite und hat einen niedrigeren Administrati- onsaufwand. Sie bringt aber neue Herausforderungen mit sich: ihre interne Architektur ist weit komplexer, die benötigten Algorithmen werden noch immer erforscht oder verbessert. Die verschiedenen existierenden Prototypensysteme spielen in der Praxis keine Rolle, da in den bisherigen Betrachtungen einige für die Praxis wichtige Systembestandteile nicht einbezogen wurden. Die vorliegende Arbeit präsentiert einige dieser fehlenden Systembestandteile. Um den Ausfall einzelner Rechner tolerieren zu können, müssen dezentrale Dateisysteme Daten redundant speichern. Diese Arbeit präsentiert ein ressourcensparendes Verfahren, welches eine kontinuierliche Verfügbarkeit der Daten sicherstellt und mit verschiedenen Redundanzalgorithmen verwendet werden kann. In Simulationen, die auf Verkehrsdaten echter Peer-to-Peer Systeme basieren, wird die Effektivität des Verfahrens demonstriert und die Tauglichkeit verschiedener Redundanzalgorithmen für dezentrale Dateispeicherung verglichen. Eine weitere Problematik dezentraler Dateisysteme ist die Durchsetzung von Dateirech- ten. Da es keine vertrauenswürdigen Instanzen gibt, muss die Sicherung der Integrität und Vertraulichkeit durch kryptografische Verfahren erfolgen. Die existierenden Ansätze skalieren nicht, unterstützen keine Benutzergruppen oder benötigen zumindest teilweise vertrauenswürdige Rechner. Diese Arbeit präsentiert ein System, welches Dateirechte nur mittels Kryptografie durchsetzt und gleichzeitig alle diese Voraussetzungen erfüllt. Das 3 System wurde in einem Prototypen implementiert um die Funktionsfähigkeit des Ansatzes zu demonstrieren. Die in dieser Arbeit entwickelten Algorithmen schließen zwei der wichtigsten Funktionali- tätslücken der existierenden Systeme. Es können nun vollkommen dezentrale Dateisysteme, welche für den Nutzer in ihrer Funktionalität nicht von serverbasierenden Systemen un- terscheidbar sind, entwickelt werden. 4 Abstract Distributed file systems are a recurring topic in computer science. The first distributed file systems surfaced shortly after the inception of computer networking. Traditionally these systems are server based; there is a strict distinction between systems that store data and systems that access data. This architecture has advantages; it is e.g. assumed that the servers are trusted and there is an implicit linear order in which the commands are executed. On the other hand, such systems do not scale well, have a high administrative cost, a low tolerance for failure or are expensive to operate. In the last decade, a new kind of distributed file system has been frequently discussed. They are called decentralized or peer-to-peer file systems. In these systems, the strict separation between data storing and data accessing peers becomes blurred. The infor- mation stored in the system is spread among all peers that are part of the file system. Such system architectures are not impeded by many of the problems of server based systems. They are e.g. rapidly scalable, have lower bandwidth requirements because of their decentralized architecture and have a lower administrative overhead. However, these systems also pose new challenges: their internal architecture is much more complex and the required algorithms are still being researched or improved. The existing prototype systems are rarely used in practice because they are missing key features required in the real world. This thesis presents some of these missing system components. To be able to tolerate node failures, decentralized file systems have to use data redundancy. This work shows a resource efficient method that guarantees the continuous availability of the data and can be used with different redundancy algorithms. In simulations that use real-world peer-to-peer network traces, the effectivity of the method is demonstrated and the suitability of the different redundancy algorithms for decentralized data storage is compared. A further problem of decentralized file systems is the enforcement of file access permis- sions. Because there are no trusted nodes in the network, data integrity and confidentiality have to be enforced solely by cryptography. The existing approaches do not scale, do not support user groups or need at least partially trusted network nodes. This thesis presents a cryptographic approach that enforces file permissions and satisfies all these requirements. The system was implemented in a prototype to demonstrate the validity of the approach. The algorithms developed in this thesis close two of the most important functionality gaps of existing systems. They can be used to implement decentralized file systems which are indistinguishable from server based systems from the user’s perspective. 5 Pre-Publications Parts of this dissertation have been pre-published in earlier versions: Adding Cryptographically Enforced Permissions to Fully Decentralized \bullet File Systems Johanna Amann and Thomas Fuhrmann Technical Report TUM-I1107, Technische Universität München, München, Germany, 6. April 2011 A Quantitative Analysis of Redundancy Schemes for Peer-to-Peer Storage \bullet Systems Yaser Houri, Johanna Amann, and Thomas Fuhrmann Proceedings of the 12th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’10), New York City, USA, September 20-22, 2010 Cryptographically Enforced Permissions for Fully Decentralized File \bullet Systems Johanna Amann and Thomas Fuhrmann Proceedings of the 10th IEEE International Conference on Peer-to-Peer Computing 2010 (P2P’10), Delft, the Netherlands, August 25-27, 2010 Unix-like Access Permissions in Fully Decentralized File Systems \bullet Johanna Amann and Thomas Fuhrmann Poster Presentation at the 8th USENIX Conference on File and Storage Technologies (FAST’10), San Jose, California, USA, February 23-26, 2010 IgorFs: A Distributed P2P File System \bullet Johanna Amann, Benedikt Elser, Yaser Houri and Thomas Fuhrmann Proceedings of the 8th IEEE International Conference on Peer-to-Peer Computing (P2P’08), Aachen, Germany, September 8-11, 2008 7 Danksagungen Ich schulde vielen Menschen einen herzlichen Dank, die beim Schreiben dieser Arbeit auf verschiedenste Weise unterstützt haben. Ich danke meinem Doktorvater Dr. Thomas Fuhrmann, ohne dessen Betreuung die- se Dissertation nicht möglich gewesen wäre. Weiterhin danke ich den Mitgliedern der Prüfungskommission Prof. Dr.-Ing. Georg Carle, V.-Prof Dr. Kurt Tutschku sowie dem Vorsitzenden der Prüfungskommission Prof. Dr. Johann Schlichter, ohne deren schnelle Korrektur und zeitliche Flexibilität bei der Terminfindung der Antritt meiner Folgestelle nicht möglich gewesen wäre. Vielen Dank auch an meine Kollegen Benedikt Elser, Björn Saballus und Thomas Nitsche, die sehr zu meiner langfristigen Motivation beigetragen haben. Ich danke weiterhin Florian Sesser, Jean Ledwon, meinem Großvater Horst Rockinger und meinem Vater Anton Amann für die große Hilfe bei der Suche nach Grammatik und Rechtschreibfehlern und für die Kommentare zur Doktorarbeit. Einen besonderen Dank möchte ich noch an Dr. Georg Groh für seine Unterstützung in der Endphase richten. Vielen Dank auch an meine Familie und sowie alle anderen die mich in dieser Zeit unterstützt haben. 9 Contents Zusammenfassung 3 Abstract 5 Pre-publications 7 Danksagungen 9 1. Introduction 17 1.1. Contribution . 19 1.2. Organization . 19 I. Background 23 2. Cryptographic algorithms 25 2.1. Hashing Algorithms . 25 2.2. Symmetric Encryption Algorithms . 26 2.3. Public
Recommended publications
  • CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N
    CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N. Gutierrez Center for Education and Research Information Assurance and Security Purdue University, West Lafayette, IN 47907-2086 DECEPTIVE MEMORY SYSTEMS ADissertation Submitted to the Faculty of Purdue University by Christopher N. Gutierrez In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy December 2017 Purdue University West Lafayette, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF DISSERTATION APPROVAL Dr. Eugene H. Spa↵ord, Co-Chair Department of Computer Science Dr. Saurabh Bagchi, Co-Chair Department of Computer Science Dr. Dongyan Xu Department of Computer Science Dr. Mathias Payer Department of Computer Science Approved by: Dr. Voicu Popescu by Dr. William J. Gorman Head of the Graduate Program iii This work is dedicated to my wife, Gina. Thank you for all of your love and support. The moon awaits us. iv ACKNOWLEDGMENTS Iwould liketothank ProfessorsEugeneSpa↵ord and SaurabhBagchi for their guidance, support, and advice throughout my time at Purdue. Both have been instru­ mental in my development as a computer scientist, and I am forever grateful. I would also like to thank the Center for Education and Research in Information Assurance and Security (CERIAS) for fostering a multidisciplinary security culture in which I had the privilege to be part of. Special thanks to Adam Hammer and Ronald Cas­ tongia for their technical support and Thomas Yurek for his programming assistance for the experimental evaluation. I am grateful for the valuable feedback provided by the members of my thesis committee, Professor Dongyen Xu, and Professor Math­ ias Payer.
    [Show full text]
  • Katalog Elektronskih Knjiga
    KATALOG ELEKTRONSKIH KNJIGA Br Autor Naziv Godina ISBN Str. Porijeklo izdavanja 1 Peter Kent Pay Per Click Search 2006 0-471-74594-3 130 Kupovina Engine Marketing for Dummies 2 Terry Large Access 1 2007 Internet Freeware 3 Kevin Smith Excel Lassons & Tutorials 2004 Internet Freeware 4 Terry Michael Photografy Tutorials 2006 Internet Freeware Janine Peterson Phil Pivnick 5 Jake Ludington Converting Vinyl LPs 2003 Internet Freeware to CD 6 Allen Wyatt Cleaning Windows XP 2004 0-7645-7311-X Poklon for Dummies 7 Peter Kent Sarch Engine Optimization 2006 0-4717-5441-2 Kupovina for Dummies 8 Terry Large Access 2 2007 Internet Freeware 9 Dirk Dupon How to write, create, 2005 Internet Freeware promote and sell E-books on the Internet 10 Chayden Bates eBook Marketing 2000 Internet Freeware Explained 11 Kevin Sinclair How To Choose A 1999 Internet Freeware Homebased Bussines 12 Bob McElwain 101 Newbie-Frendly Tips 2001 Internet Freeware 13 Windows Basics 2004 Poklon 14 Michael Abrash Zen of Graphic 2005 Poklon Programming, 2. izdanje 15 13 Hot Internet 2000 Internet Freeware Moneymaking Methods 16 K. Williams The Complete HTML 1998 Poklon Teacher 17 C. Darwin On the Origin of Species Internet Freeware 2/175 Br Autor Naziv Godina ISBN Str. Porijeklo izdavanja 18 C. Darwin The Variation of Animals Internet Freeware 19 Bruce Eckel Thinking in C++, Vol 1 2000 Internet Freeware 20 Bruce Eckel Thinking in C++, Vol 2 2000 Internet Freeware 21 James Parton Captains of Industry 1890 399 Internet Freeware 22 Bruno R. Preiss Data Structures and 1998 Internet
    [Show full text]
  • File Formats
    man pages section 4: File Formats Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–3945–10 September 2004 Copyright 2004 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
    [Show full text]
  • HTTP-FUSE Xenoppix
    HTTP-FUSE Xenoppix Kuniyasu Suzaki† Toshiki Yagi† Kengo Iijima† Kenji Kitagawa†† Shuichi Tashiro††† National Institute of Advanced Industrial Science and Technology† Alpha Systems Inc.†† Information-Technology Promotion Agency, Japan††† {k.suzaki,yagi-toshiki,k-iijima}@aist.go.jp [email protected], [email protected] Abstract a CD-ROM. Furthermore it requires remaking the entire CD-ROM when a bit of data is up- dated. The other solution is a Virtual Machine We developed “HTTP-FUSE Xenoppix” which which enables us to install many OSes and ap- boots Linux, Plan9, and NetBSD on Virtual plications easily. However, that requires in- Machine Monitor “Xen” with a small bootable stalling virtual machine software. (6.5MB) CD-ROM. The bootable CD-ROM in- cludes boot loader, kernel, and miniroot only We have developed “Xenoppix” [1], which and most part of files are obtained via Internet is a combination of CD/DVD bootable Linux with network loopback device HTTP-FUSE “KNOPPIX” [2] and Virtual Machine Monitor CLOOP. It is made from cloop (Compressed “Xen” [3, 4]. Xenoppix boots Linux (KNOP- Loopback block device) and FUSE (Filesys- PIX) as Host OS and NetBSD or Plan9 as Guest tem USErspace). HTTP-FUSE CLOOP can re- OS with a bootable DVD only. KNOPPIX construct a block device from many small block is advanced in automatic device detection and files of HTTP servers. In this paper we describe driver integration. It prepares the Xen environ- the detail of the implementation and its perfor- ment and Guest OSes don’t need to worry about mance. lack of device drivers.
    [Show full text]
  • Software Distributor Administration Guide for HP-UX 11I
    Software Distributor Administration Guide for HP-UX 11i HP Computers Manufacturing Part Number: B2355-90754 June 2002, Edition 3 © Copyright 2002 Hewlett-Packard Company. Legal Notices The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material. Warranty A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office. Restricted Rights Legend Use, duplication or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR 52.227-19 for other agencies. HEWLETT-PACKARD COMPANY 3000 Hanover Street Palo Alto, California 94304 U.S.A. Use of this document and any supporting software media supplied for this pack is restricted to this product only. Additional copies of the programs may be made for security and back-up purposes only. Resale of the programs, in their present form or with alterations, is expressly prohibited. Copyright Notice Copyright 1997-2002 Hewlett-Packard Company. All rights reserved.
    [Show full text]
  • Printed Program
    rd International Conference 43 on Software Engineering goes Virtual PROGRAM May 25th –28th 2021 Co-located events: May 17th –May 24th Workshops: May 24th , May 29th –June 4th https://conf.researchr.org/home/icse-2021 @ICSEconf Table of Contents Conference overviews ............................................................... 3 Sponsors and Supporters ......................................................... 11 Welcome letter ........................................................................ 13 Keynotes ................................................................................. 18 Technical Briefings ................................................................. 28 Co-located events .................................................................... 35 Workshops .............................................................................. 36 New Faculty Symposium ........................................................ 37 Doctoral Symposium .............................................................. 39 Detailed Program - Tuesday, May 25th .................................................... 42 - Wednesday, May 26th ............................................... 53 - Thursday, May 27th .................................................. 66 - Friday, May 28th ....................................................... 80 Awards .................................................................................... 91 Social and Networking events ................................................. 94 Organizing Committee ........................................................
    [Show full text]
  • File Permissions Do Not Restrict Root
    Filesystem Security 1 General Principles • Files and folders are managed • A file handle provides an by the operating system opaque identifier for a • Applications, including shells, file/folder access files through an API • File operations • Access control entry (ACE) – Open file: returns file handle – Allow/deny a certain type of – Read/write/execute file access to a file/folder by – Close file: invalidates file user/group handle • Access control list (ACL) • Hierarchical file organization – Collection of ACEs for a – Tree (Windows) file/folder – DAG (Linux) 2 Discretionary Access Control (DAC) • Users can protect what they own – The owner may grant access to others – The owner may define the type of access (read/write/execute) given to others • DAC is the standard model used in operating systems • Mandatory Access Control (MAC) – Alternative model not covered in this lecture – Multiple levels of security for users and documents – Read down and write up principles 3 Closed vs. Open Policy Closed policy Open Policy – Also called “default secure” • Deny Tom read access to “foo” • Give Tom read access to “foo” • Deny Bob r/w access to “bar” • Give Bob r/w access to “bar • Tom: I would like to read “foo” • Tom: I would like to read “foo” – Access denied – Access allowed • Tom: I would like to read “bar” • Tom: I would like to read “bar” – Access allowed – Access denied 4 Closed Policy with Negative Authorizations and Deny Priority • Give Tom r/w access to “bar” • Deny Tom write access to “bar” • Tom: I would like to read “bar” – Access
    [Show full text]
  • A Dataset of Open-Source Safety-Critical Software⋆
    A Dataset of Open-Source Safety-Critical Software? Rafaila Galanopoulou[0000−0002−5318−9017] and Diomidis Spinellis[0000−0003−4231−1897] Department of Management Science and Technology Athens University of Economics and Business Patission 76, Athens, 10434, Greece ft8160018,[email protected] https://www.aueb.gr Abstract. We describe the method used to create a dataset of open- source safety-critical software, such as that used for autonomous cars, healthcare, and autonomous aviation, through a systematic and rigorous selection process. The dataset can be used for empirical studies regard- ing the quality assessment of safety-critical software, its dependencies, and its development process, as well as comparative studies considering software from other domains. Keywords: open-source · safety-critical · dataset 1 Introduction Safety-critical systems (SCS) are those whose failure could result in loss of life, significant property damage, or damage to the environment [5]. Over the past decades ever more software is developed and released as open-source software (OSS) | with licenses that allow its free use, study, change, and distribution [1]. The increasing adoption of open-source software in safety-critical systems [10], such as those used in the medical, aerospace, and automotive industries, poses an interesting challenge. On the one hand, it shortens time to delivery and lowers development costs [6]. On the other hand, it introduces questions regarding the system's quality. For a piece of software to be part of a safety-critical application it requires quality assurance, because quality is a crucial factor of an SCS's software [3]. This assurance demands that evidence regarding OSS quality is supplied, and an analysis is needed to assess if the certification cost is worthwhile.
    [Show full text]
  • Files and Processes (Review)
    Files and Processes (review) Files and Processes (review) 1/61 Learning Objectives Files and Processes (review) I Review of files in standard C versus using system call interface for files I Review of buffering concepts I Review of process memory model I Review of bootup sequence in Linux and Microsoft Windows I Review of basic system calls under Linux: fork, exec, wait, exit, sleep, alarm, kill, signal I Review of similar basic system calls under MS Windows 2/61 Files Files and I Recall how we write a file copy program in standard C. Processes (review) #include <stdio.h> FILE *fopen(const char *path, const char *mode); size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream); size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream); int fclose(FILE *fp); I We can also use character-based functions such as: #include <stdio.h> int fgetc(FILE *stream); int fputc(int c, FILE *stream); I With either approach, we can write a C program that will work on any operating system as it is in standard C. 3/61 Standard C File Copy Files and Processes (review) I Uses fread and fwrite. I files-processes/stdc-mycp.c 4/61 POSIX/Unix Files Files and Processes (review) I "On a UNIX system, everything is a file; if something is not a file, it is a process." I A directory is just a file containing names of other files. I Programs, services, texts, images, and so forth, are all files. I Input and output devices, and generally all devices, are considered to be files.
    [Show full text]
  • DMFS - a Data Migration File System for Netbsd
    DMFS - A Data Migration File System for NetBSD William Studenmund Veridian MRJ Technology Solutions NASAAmes Research Center" Abstract It was designed to support the mass storage systems de- ployed here at NAS under the NAStore 2 system. That system supported a total of twenty StorageTek NearLine ! have recently developed DMFS, a Data Migration File tape silos at two locations, each with up to four tape System, for NetBSD[I]. This file system provides ker- drives each. Each silo contained upwards of 5000 tapes, nel support for the data migration system being devel- and had robotic pass-throughs to adjoining silos. oped by my research group at NASA/Ames. The file system utilizes an underlying file store to provide the file The volman system is designed using a client-server backing, and coordinates user and system access to the model, and consists of three main components: the vol- files. It stores its internal metadata in a flat file, which man master, possibly multiple volman servers, and vol- resides on a separate file system. This paper will first man clients. The volman servers connect to each tape describe our data migration system to provide a context silo, mount and unmount tapes at the direction of the for DMFS, then it will describe DMFS. It also will de- volman master, and provide tape services to clients. The scribe the changes to NetBSD needed to make DMFS volman master maintains a database of known tapes and work. Then it will give an overview of the file archival locations, and directs the tape servers to move and mount and restoration procedures, and describe how some typi- tapes to service client requests.
    [Show full text]
  • The Application of File Identification, Validation, and Characterization Tools in Digital Curation
    THE APPLICATION OF FILE IDENTIFICATION, VALIDATION, AND CHARACTERIZATION TOOLS IN DIGITAL CURATION BY KEVIN MICHAEL FORD THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Library and Information Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2011 Urbana, Illinois Advisers: Research Assistant Professor Melissa Cragin Assistant Professor Jerome McDonough ABSTRACT File format identification, characterization, and validation are considered essential processes for digital preservation and, by extension, long-term data curation. These actions are performed on data objects by humans or computers, in an attempt to identify the type of a given file, derive characterizing information that is specific to the file, and validate that the given file conforms to its type specification. The present research reviews the literature surrounding these digital preservation activities, including their theoretical basis and the publications that accompanied the formal release of tools and services designed in response to their theoretical foundation. It also reports the results from extensive tests designed to evaluate the coverage of some of the software tools developed to perform file format identification, characterization, and validation actions. Tests of these tools demonstrate that more work is needed – particularly in terms of scalable solutions – to address the expanse of digital data to be preserved and curated. The breadth of file types these tools are anticipated to handle is so great as to call into question whether a scalable solution is feasible, and, more broadly, whether such efforts will offer a meaningful return on investment. Also, these tools, which serve to provide a type of baseline reading of a file in a repository, can be easily tricked.
    [Show full text]
  • Interesting Things You Didn't Know You Could Do With
    Interesting Things You Didn’t Know You Could Do With ZFS Allan Jude -- ScaleEngine Inc. [email protected] twitter: @allanjude Introduction Allan Jude ● 13 Years as FreeBSD Server Admin ● FreeBSD src/doc committer (focus: ZFS, bhyve, ucl, xo) ● Co-Author of “FreeBSD Mastery: ZFS” and upcoming “FreeBSD Mastery: Advanced ZFS” with M. W. Lucas ● Architect of the ScaleEngine CDN (HTTP and Video) ● Host of BSDNow.tv & TechSNAP.tv Podcasts ● Use ZFS for large collections of videos, extremely large website caches, mirrors of PC-BSD pkgs and RaspBSD ● Single Handedly Manage Over 1000TB of ZFS Storage The Power of ZFS ● Integrated Redundancy (Mirroring, RAID-Z) ● Data Integrity Checking (Checksums, Scrub) ● Pooled Storage (Hot Add Disks) ● Multi-Level Cache (ARC, L2ARC, SLOG) ● Copy-on-Write (no fsck) ● Snapshots and Clones ● Quotas and Reservations ● Transparent Compression (LZ4, GZIP1-9) ● Incremental Replication (zfs send/recv) ● Datasets with Individual Inherited Properties ● Custom Properties ● Fine Grained Delegation Applying That Power ZFS has many features, but how can I use them to solve my problems? ZFS has a very well designed command line user interface, making it very easy for a sysadmin to perform common tasks (add more storage, create new datasets, change settings and properties), accomplish things that were not possible before, as well as extract a great deal more information from the storage system. ZFS Was Meant To Be Scripted # zfs list -Hp -r -o name,refer,logicalreferenced sestore5/mysql02 22001288628 24331078144 sestore5/omicron
    [Show full text]