Integrating Preservation Functions Into the Web Server Joan A

Total Page:16

File Type:pdf, Size:1020Kb

Integrating Preservation Functions Into the Web Server Joan A Old Dominion University ODU Digital Commons Computer Science Theses & Dissertations Computer Science Summer 2008 Integrating Preservation Functions Into the Web Server Joan A. Smith Old Dominion University Follow this and additional works at: https://digitalcommons.odu.edu/computerscience_etds Part of the Computer Sciences Commons, and the Digital Communications and Networking Commons Recommended Citation Smith, Joan A.. "Integrating Preservation Functions Into the Web Server" (2008). Doctor of Philosophy (PhD), dissertation, Computer Science, Old Dominion University, DOI: 10.25777/jr7m-6g71 https://digitalcommons.odu.edu/computerscience_etds/22 This Dissertation is brought to you for free and open access by the Computer Science at ODU Digital Commons. It has been accepted for inclusion in Computer Science Theses & Dissertations by an authorized administrator of ODU Digital Commons. For more information, please contact [email protected]. INTEGRATING PRESERVATION FUNCTIONS INTO THE WEB SERVER by Joan A. Smith B.A. 1986 University of the State of New York M.A. 1988 Hampton University A Dissertation Submitted to the Faculty of Old Dominion University in Partial Fulfillment of the Requirement for the Degree of DOCTOR OF PHILOSOPHY COMPUTER SCIENCE OLD DOMINION UNIVERSITY August 2008 Approved by: Michael L. Nelson (Director) Kurt Maly Steven J. Zeil Mohammed K. Zubair Simeon Warner ABSTRACT INTEGRATING PRESERVATION FUNCTIONS INTO THE WEB SERVER Joan A. Smith Old Dominion University, 2008 Director: Dr. Michael L. Nelson Digital preservation of the World Wide Web poses unique challenges, different from the preservation issues facing professional Digital Libraries. The complete list of a website’s resources cannot be cited with confidence, and the descriptive metadata available for the resources is so minimal that it is sometimes insufficient for a browser to recognize. In short, the Web suffers from a counting problem and a representation problem. Refreshing the bits, migrating from an obsolete file format to a newer format, and other classic digital preservation problems also affect the Web. As digital collections devise solutions to these problems, the Web will also benefit. But the core World Wide Web problems of Counting and Representation need a targeted solution. As the host of web content, the web server is uniquely positioned to assist in the preservation of the resources it serves. It both knows the resources it has, and knows what kind of resources they are. This dissertation presents research in which preservation functions have been integrated into the web server itself. The CRATE Model defines a method for addressing the Counting Problem and the Representation Problem using existing web server-compatible technology. A series of experiments which evaluated this approach are presented, along with a technical review of the MODOAI web server module which acts as the preservation agent. The feasibility of this approach is demonstrated by a quantitative analysis of its use in a commercial web testing environment. iii ©Copyright, 2008, by Joan A. Smith, All Rights Reserved. iv Dedicated to my mother and to the memory of her mother. a barren field gives birth to the fertile ground — Byzantine (Septuagint) Psalter v ACKNOWLEDGMENTS Considering how many people influenced this research or directly contributed to it through creative discussion, commentary, or plain critique, it seems unfair to not have them as co-authors or at least co-inspirators, so to speak. If not for Michael Nelson, my eternally patient advisor, this whole project would never have begun. He and Johan Bollen took the time to convince me that pursuing a PhD was a worthwhile endeavor. More importantly, Michael then followed up by securing funding for this research from both the Andrew Mellon Foundation and the Library of Congress. His guid- ance was always on target; he knew somehow when to apply pressure and when to step back. I am indebted to him on many levels and for many reasons. Still, the process is long and courage falters. Stephan Olariu and Hussein Abdel-Wahab provided much-needed encouragement and support at critical points along the way, particularly in overcoming the first hurdle, the Diagnostic (PhD qualifying) Exam. The Candidacy Exam committee members, Kurt Maly, Simeon Warner, Steven Zeil and Mohammed Zubair, provided very helpful feedback and commentary on the initial proposal and I greatly appreciate the time and effort they put into reviewing that as well as this dissertation. Throughout these years I’ve also had the good fortune to have Martin Klein and Frank Mc- Cown as project partners, research colleagues and office mates. Working with them has been both productive and rewarding. The pseudo-competition to get papers accepted at conferences added a dimension of fun to what was otherwise just plain hard work. In addition, I have benefited from the software development efforts of Terry Harrison and Aravind Elango, who wrote the initial prototype of MODOAI, from which the author’s current modular version was derived. I owe Howard Smith (while at Symantec) and Jim Gray (of Kronos) many thanks for suggesting testing methods and for providing commercial test environments for MODOAI. It isn’t often that such an academic endeavor can be tested in a commercial environment. The experience was especially helpful in defining criteria for metadata utility compatibility and in designing off-line tests for the software. When it comes to software engineering, however, one individual stands above the rest. If I have learned anything at all about the art and practice of software engineering, it is thanks to John Owen. As a professional colleague, business partner, and friend, his depth of knowledge and love of the craft have been an inspiration and a guide. The version of MODOAI developed for this dissertation owes its new architecture and implementation style to the debates and discussions we’ve had. From moral support to technical opinions his help has been invaluable. Michael Nelson guided the development of the CRATE concept and its focus on the counting and description problems. The complex object implementation, particularly MPEG-21 DIDL, is based on the work of Herbert Van de Sompel and Jeroen Bekaert. Their research significantly vi helped clarify the concepts and goals of the CRATE model. There have, of course, been many others whose professionalism and support have made a differ- ence during my time at Old Dominion University. Janet Brunelle gave me an opportunity to teach the Computer Science capstone course, renewing my interest in teaching while simultaneously re- minding me just how much work goes into preparing lectures. The department’s administrative staff, particularly Phyillis Woods, kept the paperwork straight and the paychecks coming. Ajay Gupta and his systems staff assisted, supported, and restored as only "Sys Admins" can. Among these system administrators, Chris Robinson, Ian Gullet and Joshua Robertson were especially important to the "Counting Problem" experiments. I appreciate their help with both hardware and software. Ultimately, of course, it all comes down to family, and here I have been blessed with good fortune beyond the normal measure. My parents and siblings have continually expressed both con- fidence in my abilities as well as a belief in the value of education for its own sake. Last, but most importantly, I have had the unflagging devotion of a wonderful husband. Howard Smith has been with me through it all, with patience and love. There is no way to express how much this has meant to me every day. This PhD belongs to him. vii ACRONYMS AIP Archival Information Package (OAIS) API Application Programming Interface CCSDS Consultative Committee for Space Data Systems DIDL Digital Item Declaration Language DOI Digital Object Identifier DIP Dissemination Information Package (OAIS) DL Digital Library GIF Graphics Interchange Format HTML HyperText Markup Language HTTP Hypertext Transfer Protocol JPEG Joint Photographic Experts Group KWF Kahn-Wilensky Framework LANL Los Alamos National Laboratory METS Metadata Encoding and Transmission Standard MIME Multipurpose Internet Mail Extensions MPEG Moving Picture Experts Group NDIIPP National Digital Information Infrastructure and Preservation Program NSSDC National Space Sciences Data Center OAI Open Archives Initiative OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting OAIS Open Archival Information System PDF Portable Document Format (Adobe) PNG Portable Network Graphics PREMIS Preservation Metadata Implementation Strategies RDF Resource Description Framework RSS Really Simple Syndication SIP Submission Information Package (OAIS) URI Uniform Resource Identifier URL Uniform Resource Locator URN Unform Resource Name W3C The World Wide Web Consortium WWW The World Wide Web XHTML Extensible HyperText Markup Language XML Extensible Markup Language viii TABLE OF CONTENTS Page LISTOFTABLES ....................................... xi LISTOFFIGURES...................................... xiii CHAPTER I INTRODUCTION .................................... 1 1 The Challenge of Digital Preservation . 1 2 Scope ...................................... 4 3 Approach .................................... 5 4 Organization................................... 7 II CURRENT PRACTICE IN DIGITAL PRESERVATION . ... 9 1 Preservation: Definitions and Limitations . 9 2 Digital Preservation Models & Implementations . 12 3 LOCKSS..................................... 19 4 OAI-PMH...................................
Recommended publications
  • Gpsbabel Documentation Gpsbabel Documentation Table of Contents
    GPSBabel Documentation GPSBabel Documentation Table of Contents Introduction to GPSBabel ................................................................................................... xx The Problem: Too many incompatible GPS file formats ................................................... xx The Solution ............................................................................................................ xx 1. Getting or Building GPSBabel .......................................................................................... 1 Downloading - the easy way. ....................................................................................... 1 Building from source. .................................................................................................. 1 2. Usage ........................................................................................................................... 3 Invocation ................................................................................................................. 3 Suboptions ................................................................................................................ 4 Advanced Usage ........................................................................................................ 4 Route and Track Modes .............................................................................................. 5 Working with predefined options .................................................................................. 6 Realtime tracking ......................................................................................................
    [Show full text]
  • CONNX Installation Guide Version 11.5
    CONNX Installation Guide Version 11.5 Table Of Contents Preface .......................................................................................................................................................... 1 About the Installation Guide .......................................................................................................................... 1 Database Terminology .................................................................................................................................. 2 Installation Overview ..................................................................................................................................... 3 CONNX Installation Checklist ....................................................................................................................... 4 Upgrade Installation Checklist ...................................................................................................................... 5 Displaying Your CONNX Version .................................................................................................................. 6 To display the CONNX version and build numbers ............................................................................... 6 How CONNX Works ...................................................................................................................................... 7 Related Topics ......................................................................................................................................
    [Show full text]
  • Enterprise Peopletools 8.50 Peoplebook: System and Server Administration
    Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration September 2009 Enterprise PeopleTools 8.50 PeopleBook: System and Server Administration SKU pt850pbr0 Copyright © 1988, 2009, Oracle and/or its affiliates. All rights reserved. Trademark Notice Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. License Restrictions Warranty/Consequential Damages Disclaimer This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. Warranty Disclaimer The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. Restricted Rights Notice If this software or related documentation is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007).
    [Show full text]
  • HTMLDOC 1.8.23 Software Users Manual ESP−003−20021023
    HTMLDOC 1.8.23 Software Users Manual ESP−003−20021023 Easy Software Products Copyright 1997−2002, See the GNU General Public License for Details. HTMLDOC 1.8.23 Software Users Manual Table of Contents Introduction..................................................................................................................................................IN−1 History...............................................................................................................................................IN−1 Organization of This Manual............................................................................................................IN−2 Support..............................................................................................................................................IN−2 Encryption Support............................................................................................................................IN−2 Copyright, Trademark, and License Information..............................................................................IN−2 Chapter 1 − Installing HTMLDOC..............................................................................................................1−1 Installing a Binary Distribution...........................................................................................................1−1 Installing HTMLDOC from the Source Distribution..........................................................................1−3 Chapter 2 − Getting Started..........................................................................................................................2−1
    [Show full text]
  • By Michael L. Black Dissertation
    TRANSPARENT CULTURES: IMAGINED USERS AND THE POLITICS OF SOFTWARE DESIGN (1975-2012) BY MICHAEL L. BLACK DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in English in the Graduate College of the University of Illinois at Urbana-Champaign, 2014 Urbana, Illinois Doctoral Committee: Professor Robert Markley, Chair Associate Professor Ted Underwood Associate Professor Melissa Littlefield Associate Professor Spencer Schaffner Associate Professor John T. Newcomb ii Abstract The rapid pace of software’s development poses serious challenges for any cultural history of computing. While digital media studies often sidestep historicism, this project asserts that computing’s messy, and often hidden, history can be studied using digital tools built to adapt text-mining strategies to the textuality of source code. My project examines the emergence of personal computing, a platform underlying much of digital media studies but that itself has received little attention outside of corporate histories. Using an archive of technical papers, professional journals, popular magazines, and science fiction, I trace the origin of design strategies that led to a largely instrumentalist view of personal computing and elevated “transparent design” to a privileged status. I then apply text-mining tools that I built with this historical context in mind to study source code critically, including those features of applications hidden by transparent design strategies. This project’s first three chapters examine how and why strategies of information hiding shaped consumer software design from the 1980s on. In Chapter 1, I analyze technical literature from the 1970s and 80s to show how cognitive psychologists and computer engineers developed an ideal of transparency that discouraged users from accessing information structures underlying personal computers.
    [Show full text]
  • HTMLDOC 1.8.27 Software Users Manual ESP-003-20060801
    HTMLDOC 1.8.27 Software Users Manual ESP-003-20060801 Easy Software Products Copyright 1997-2006, All Rights Reserved. HTMLDOC 1.8.27 Software Users Manual Table of Contents Introduction...................................................................................................................................................IN-1 History................................................................................................................................................IN-1 Organization of This Manual.............................................................................................................IN-2 Support...............................................................................................................................................IN-2 Encryption Support.............................................................................................................................IN-2 Copyright, Trademark, and License Information...............................................................................IN-2 Chapter 1 - Installing HTMLDOC................................................................................................................1-1 Requirements........................................................................................................................................1-1 Installing HTMLDOC..........................................................................................................................1-1 Installing HTMLDOC on Microsoft Windows..............................................................................1-1
    [Show full text]
  • Die Postscript- & PDF-Bibel
    Die PostScript- Thomas Merz Thomas Olaf Drümmer & PDF-Bibel 2. Auflage colophon.fm Page 640 Friday, December 7, 2001 4:01 PM Die PostScript- & PDF-Bibel Die Deutsche Bibliothek CIP-Einheitsaufnahme Merz, Thomas unter Mitarbeit von Drümmer, Olaf: Die PostScript- & PDF-Bibel / Thomas Merz. - 2. Aufl.. - München : PDFlib; Heidelberg : dpunkt-Verl., 2002 1. Aufl. u.d.T.: Die PostScript- & Acrobat-Bibel ISBN 3-935320-01-9 © 2002 PDFlib GmbH, München 2. Auflage 2002 Verlagsadresse PDFlib GmbH Tal 40, 80331 München Fax 089/29 16 46 86 http://www.pdflib.com Buchbestellungen: bookpdflib.com Webseite zum Buch http://www.pdflib.com/bibel Umschlag, Illustrationen und Gestaltung Alessio Leonardi, Leonardi.Wollein Berlin Schriften Luc(as) de Groot, Berlin Lektorat Katja Karsunke, Wolnzach Korrektorat Susanne Spitzer, München Verpflegung Riva Pizza, München Belichtung, Druck und Bindung Druckerei Kösel, Kempten Alle Angaben in diesem Buch wurden mit größter Sorgfalt zusammengestellt. Dennoch sind Fehler nicht ganz auszuschließen. Autor und Verlag übernehmen keine juristische Veranortung oder Haftung für Schäden, die durch eventuell verbliebene Fehler entstehen. Alle Warenbezeichnungen werden ohne Gährleistung der freien Verwendbarkeit benutzt und sind möglicher- weise eingetragene Warenzeichen. Dieses Buch ist urheberrechtlich geschützt. Alle Rechte ein- schließlich Vervielfältigung, Übersetzung, Mikroverfilmung und Einspeicherung in elektronischen Systemen vorbehalten. 1 2 3 4 5 6 03 02 01 Thomas Merz Die PostScript- Thomas Merz Thomas Olaf Drümmer
    [Show full text]
  • A Concise Guide to CHARMM and the Analysis of Protein Structure and Function
    A Concise Guide to CHARMM and the Analysis of Protein Structure and Function Robert Schleif Biology Department Johns Hopkins University 3400 N. Charles St. Baltimore, MD 21218 1/8/06 9/17/13 Preface Increasingly, biologists and biochemists are faced with understanding how their favorite proteins work. The structures of many of these proteins have been determined, and the structures of many more will be determined in the next few years. Once a protein's structure has been determined, it becomes possible and also enticing to design experiments probing the protein's mechanism of action. Tools for the graphical display of structure, the manipulation of the structure, and the calculation of various interaction energies all become interesting and important. Additionally, some properties of a protein may best be revealed by modeling the protein in water and simulating its molecular thermal motion at 300 K. A researcher interested in protein structure and function faces the question of whether to use one of the complete, but expensive, computer programs for the manipulation and analysis of protein structure, use a number of the highly specialized but almost completely undocumented programs that are available on the web, or to learn and use a powerful and general program that can perform most of the manipulations and calculations one might need. This book is written for those who decide to follow the latter course and to learn the program CHARMM (Chemistry at Harvard Macromolecular Mechanics) that was initiated in the laboratory of Dr. Martin Karplus. The program has been continuously refined and extended by many workers over the years since the initial publication, "CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations", J.
    [Show full text]
  • Mini-XML Programmers Manual, Version
    Mini−XML Programmers Manual, Version 2.0 Michael Sweet Copyright 2003−2004 Mini−XML Programmers Manual, Version 2.0 Table of Contents Introduction.........................................................................................................................................................1 Legal Stuff...............................................................................................................................................2 History.....................................................................................................................................................2 Organization of This Document...............................................................................................................3 Notation Conventions..............................................................................................................................3 Abbreviations...........................................................................................................................................4 Other References......................................................................................................................................4 1 − Building, Installing, and Packaging Mini−XML.......................................................................................5 Compiling Mini−XML............................................................................................................................5 Installing Mini−XML..............................................................................................................................6
    [Show full text]
  • Server-Side PDF There’S More to PDF Than Just Printing Documents
    Server-Side PDF there’s more to PDF than just printing documents Leonard Rosenthol Senior Software Engineer Digital Applications, Inc. Copyright©1999-2000 Digital Applications, Inc. Overview You are here because... · You’re currently working with PDF, but on the · PDF Creation client side · PDF Manipulation · You’re interested in doing things server side with · PDF Data Extraction PDF files · PDF Forms · You’re already doing some server side PDF · PDF Rendering & Printing solutions but looking for others · Libraries · You were already awake and had to find something to kill time · You’re a friend of mine, and wanted to heckle Copyright©1999-2000 Digital Applications, Inc. Copyright©1999-2000 Digital Applications, Inc. What is “Server-side How I do things PDF”? · You already have a draft version of this document in your proceedings, so you shouldn’t need to take · too many notes Any program that works with PDF documents and runs in an unattended, most likely batch, mode. · A copy of the final document is on Digital · Application’s website (<http://www.digapp.com>) Does NOT require use of any portion of Adobe’s for your downloading pleasure Acrobat software · since that would be a violation of the Acrobat EULA · Although I’ve left time at the end for Q & A, I’m more than happy to take questions at any time. Copyright©1999-2000 Digital Applications, Inc. Copyright©1999-2000 Digital Applications, Inc. PDF Creation Postscript · Postscript · HTML/XML · Ghostscript · Images · PStill · “Office” documents · Distiller Server · TeX Copyright©1999-2000 Digital Applications, Inc. Copyright©1999-2000 Digital Applications, Inc.
    [Show full text]
  • Deplate 0.8.3 – Convert Wiki-Like Markup to Latex, Docbook, Html, Or “Html-Slides”
    Deplate 0.8.3 – convert wiki-like markup to latex, docbook, html, or “html-slides” Thomas Link 10. Jul 2008 Keywords: Converter, Text, LaTeX, HTML, Docbook, Wiki, Presentation, Web Publishing Contents 1 Introduction 7 1.1 Goals . 8 2 Getting deplate 8 2.1 Requirements . 8 2.2 Download . 9 2.3 Support . 9 3 Installation 9 3.1 Gem............................................. 9 3.2 Win32 . 9 3.3 ZIP Archive . 10 3.4 Related Software . 11 4 Usage 11 4.1 Editor support . 15 4.1.1 General remark . 15 4.1.2 VIM . 16 5 Configuration 16 5.1 User configuration of the standard setup . 17 5.2 Configuration of wiki names . 17 5.3 Configuration via deplate.ini . 18 5.4 Options . 19 5.5 Templates . 20 5.5.1 Template variables (obsolete) . 20 5.5.2 Template files . 21 1 6 Input Formats 22 6.1 deplate . 23 6.2 deplate-restricted . 23 6.3 deplate-headings . 23 6.4 rdoc ............................................. 23 6.4.1 heading (rdoc, level 3) . 24 6.5 play – Hypothetical support for screen-plays and stage-plays . 25 6.5.1 EXT. Beach – Day . 27 6.6 template . 27 7 Output Formats 27 7.1 HTML, single-file (no chunks) output . 27 7.2 HTML Site, multi-page (chunky) output . 28 7.3 HTML Slides, abbreviated online presentations . 28 7.4 HTML Website . 28 7.5 XHTML 1.0-transitional (xhtml10t) . 28 7.6 XHTML 1.1 with MathML (xhtml11m) . 28 7.7 Php, PhpSite . 28 7.8 LaTeX . 29 7.8.1 latex-dramatist: Typeset stage plays with the dramatist package .
    [Show full text]
  • NCURSES-Programming-HOWTO.Pdf
    NCURSES Programming HOWTO Pradeep Padala <[email protected]> v1.9, 2005−06−20 Revision History Revision 1.9 2005−06−20 Revised by: ppadala The license has been changed to the MIT−style license used by NCURSES. Note that the programs are also re−licensed under this. Revision 1.8 2005−06−17 Revised by: ppadala Lots of updates. Added references and perl examples. Changes to examples. Many grammatical and stylistic changes to the content. Changes to NCURSES history. Revision 1.7.1 2002−06−25 Revised by: ppadala Added a README file for building and instructions for building from source. Revision 1.7 2002−06−25 Revised by: ppadala Added "Other formats" section and made a lot of fancy changes to the programs. Inlining of programs is gone. Revision 1.6.1 2002−02−24 Revised by: ppadala Removed the old Changelog section, cleaned the makefiles Revision 1.6 2002−02−16 Revised by: ppadala Corrected a lot of spelling mistakes, added ACS variables section Revision 1.5 2002−01−05 Revised by: ppadala Changed structure to present proper TOC Revision 1.3.1 2001−07−26 Revised by: ppadala Corrected maintainers paragraph, Corrected stable release number Revision 1.3 2001−07−24 Revised by: ppadala Added copyright notices to main document (LDP license) and programs (GPL), Corrected printw_example. Revision 1.2 2001−06−05 Revised by: ppadala Incorporated ravi's changes. Mainly to introduction, menu, form, justforfun sections Revision 1.1 2001−05−22 Revised by: ppadala Added "a word about window" section, Added scanw_example. This document is intended to be an "All in One" guide for programming with ncurses and its sister libraries.
    [Show full text]