SGML As a Framework for Digital Preservation and Access. INSTITUTION Commission on Preservation and Access, Washington, DC

DOCUMENT RESUME ED 417 748 IR 056 976 AUTHOR Coleman, James; Willis, Don TITLE SGML as a Framework for Digital Preservation and Access. INSTITUTION Commission on Preservation and Access, Washington, DC. ISBN ISBN-1-887334-54-8 PUB DATE 1997-07-00 NOTE 55p. AVAILABLE FROM Commission on Preservation and Access, A Program of the Council on Library and Information Resources, 1400 16th Street, NW, Suite 740, Washington, DC 20036-2217 ($20). PUB TYPE Reports Evaluative (142) EDRS PRICE MF01/PC03 Plus Postage. DESCRIPTORS *Access to Information; Computer Oriented Programs; *Electronic Libraries; *Information Retrieval; Library Automation; Online Catalogs; *Preservation; Standards IDENTIFIERS Digital Technology; *SGML ABSTRACT This report explores the suitability of Standard Generalized Markup Language (SGML) as a framework for building, managing, and providing access to digital libraries, with special emphasis on preservation and access issues. SGML is an international standard (ISO 8879) designed to promote text interchange. It is used to define markup languages, which can then encode the logical structure and content of any so-defined document. The connection between SGML and the traditional concerns of preservation and access may not be immediately apparent, but the use of descriptive markup tools such as SGML is crucial to the quality and long-term accessibility of digitized materials. Beginning with a general exploration of digital formats for preservation and access, the report provides a staged technical tutorial on the features and uses of SGML. The tutorial covers SGML and related standards, SGML Document Type Definitions in current use, and related projects now under development. A tiered metadata model is described that could incorporate SGML along with other standards to facilitate discovery and retrieval of digital documents. Endnotes and a bibliography provide further resources. Appendices include: a discussion of practical concerns related to the uses of SGML in conversion and authoring projects, descriptions of markup format and SGML tools, and a vendor's look at cost metrics. The report concludes that SGML meets current preservation and access requirements for digital libraries and that SGML solves the widest range of archival database problems today: SGML is a standard; it is non-proprietary and platform-independent; and it allows rich, full access to content-based digital documents. The SGML framework should be seriously considered when planning digital library projects.(Author) ******************************************************************************** Reproductions supplied by EDRS are the best that can be made from the original document. ******************************************************************************** _U.S.DEPARTMENTOFEDUCATION "PERMISSION TO REPRODUCETHIS , Office of Educational Research and Improvement MATERIAL HAS BEEN GRANTED BY EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC), This docuMeni has been reprothiqed as E . B . Timmer received from the person or organization originating it. The Commission on Minor changes have been made to improve reproduction quality. TO THE EDUCATIONAL RESOURCES Points of view or opinions.stated in this document do not necessarily represent res rvation INFORMATION CENTER (ERIC)." official OERI position or policy': ccess SGML as a Framework for Digital Preservation and Access I.2><TEIHEADER><FILE of the Hay Draft Lincoln's Gettysburgl coln</AUTHOR></TITLE Text vice</PUBLISHER><ADD raries</ADDRLINE><ADDE <AVAILABILITY>No tribution restrictions :EDESC>Getiysburg tress, Hay Draft, Ma; gress</SOURCEDE: ing TEI lite,as a tonstration of manu4 oding.</ENCODINi 17</DATE><RESPSTMT>< eman</NAME><RESP>E :oding</ITEM></CHAN ><DIV1 E="Manuscript"> [brought forth, upon s continent, a new id to the proposition .t all men are crea Now we are engaged .1,it nation,or any .ion so conceived, 1LNmet here on a great .tlefield of that e sent</DEL><ADD kCE="supralinear" tion of it as<DEL E="overstrike" HAND D="AL">a</ADD>final ting place <DEL T PLACE="supralinear" ID="AL">for</ADD> natiorCmight live. is altogether fit /P> 3ut in a larger s' consecrate—we k not hallow this ho struggldd, here, '.--oe 'e consecrated it p--- ND= "AL " >poor< /ADD> ,er to add or det; , tember, what we si . I...4c. kd here. It is for the living, rath 11, v at" 4,1.4 DD PLACE="supralinear" ID="AL">work</ADO ly..fr Iried on. It is rather us to be here 5 el"s.t. taming beforethe</EEL><ADE kCE="supralinear here gave <DEL 'E= "overstrike" /DEL>the last full .sure of devotio these dead shall not 'e died in vain; pf freedom; and that ^nxr^rnm,ni- of I c.hmll n"....-F 7-..rich frnrr, July1997 The Commission on Preservation and Access is a program of the Council on Library and Information Resources. 2 OUT COPY AVAILABLE Published by The Commission on Preservation and Access A program of the Council on Library and Information Resources 1400 16th Street, NW, Suite 740 Washington, DC 20036-2117 E-mail: info @CLlR.org WWW: http://www-cpa.stanford.edu/cpa.html Reports from the Commission on Preservation and Access are intended to stimulate thought and discussion. They do not necessarily reflect the views of the Commission or the Council on Library and Information Resources. The image of the Gettysburg. Address (known as The Hay Draft or Second Draft) is used with permission. It can be viewed on the Library of Congress' Exhibitions site. Exhibitions Home Page: http://lcweb.loc.gov/exhibits/archives/intro.html Library of Congress Home Page: http://www.loc.gov/ Additional copies are available from the above address for $20.00. Orders must be prepaid, with checks (in U.S. funds) made payable to "The Commission on Preservation and Access." This report has been submitted to the ERIC Clearinghouse on Information Resources. scpThe paper in this publication meets the minimum requirements of the American National Standard for Information Sciences-Permanence of Paper for Printed Library Materials ANSI Z39.48-1992. -ii-- 1 ISBN 1-887334-13er 0 COPYRIGHT 1997 by the Commission on Preservation and Access. No part of this publication may be reproduced or transcribed in any form without permission of the publisher. Requests for reproduction for noncommercial purposes, including educational advancement, private study, or research, will be granted. Full credit must be given to the author(s) and the publisher. SGML as a Framework for Digital Preservation and Access I.2><TEIHEADER><F/LE 1 of the Hay Draft Lincoln's Gettysburg: coln</AUTHOR></TITLE vice< /PUBLISHER><ADt raries</ADDRLINE><ADD1 ..---'----P.'.4 'I"' ''' ,"...-4,=, (IEText<AVAILABILITY>N tribution restrictioni, ^-.r.-r.-.. DESC>Gettysbur ress, Hay Draft, Mai, gress</SOURCEDE1 onstration of manus oding.</ENCODIN 7</DATE><RESPSTMT>1 'n eman</NAME><RESP>Er oding< /ITEM>< /CHAN( ><DIV1 E="Manuscript"> brought forth, upo a continent, a new to the propositio t all men are crea Now we are engaged at nation, or any ion so conceived,j '',/'-''''A". 1. 4,..,,,,,,,..,,,,,,_,, met here on a grea tlefield of that esent</DEL><ADD CE-"supralinear. tion of it as<DEL E-"overstrike" HAN D="AL">a</ADD>fina ting place <DEL T LACE-"supralinear" D="AL">for</ADD> :::1 ' ,.... I nation might live. is altogether fitll . '-' -`9---,,e, But in a larger E r.. .e,. , / cp 0>neecrate&m da sh;we not hallow this Iho struggled, here, e consecrated it 4;77 D="AL">poor</ADD> er to add or det ember, what we s d here. It is for the living, rath D PLACE - "supralinear' D-=AL">work</ADIt ried on. It is rather us to be here ci aining before </ h; that from these ored dead we take HAND-"AL">the</DEL><AD CE-"supralinear here gave <DEL E-"overstrike" /DEL>the last full sure of devotio, these dead shall not e died in vain ;, of freedom; and that by James Coleman Head, Academic Computing for the Humanities and Area Studies Stanford University and Don Willis President, Connectex July 1997 Commission Preface In June 1996, the Commission on Preservation and Access began an investigation into the methods and costs involved in using SGML (Standard Generalized Markup Language) for the creation and use of digital documents, with particular attention to preservation and access issues. Technical consultant Don Willis was engaged to conduct research structured to obtain practical results. In addition to reviewing printed and electronic information sources, Willis interviewed SGML users from industry and the library and preservation communities throughout the summer and fall of 1996. The technical report was then used as the basis for this final report, which is directed pri- marily to library and archives communities. Among other reviewers, Barclay Ogden, Head, Conservation Department and Digital Library R&D Department, University of California at Berkeley, made substantial contributions to its content and approach. Information in the report has been updated throughout its preparation. However, it necessarily reflects the situation as of the date of publication. This report is one in a series of Commission publications intended to spark additional discussion and contribute to a collective understanding of how preservation and access needs can be addressed within an evolving technological environment. About the Authors James Coleman is the Head of Academic Computing for the Humanities at Stanford University, and runs an SGML-based electronic text center there. He has been involved in developing information retrieval and access systems for research and scholarly use at Stanford University and the Research Libraries Group, Inc. for more than 10 years. Don Willis is one of the founders of Connectex (www.connectex.com). He has developed information systems for The Library of Congress, Bell & Howell, and Sony. Willis, formerly Vice President, Electronic Product Development, University Microfilm International, is the author of the CPA report A Hybrid Systems Approach to Preservation of Printed Materials (November 1992), one of a series of technical reports from the Commission's Technology Assessment Advisory Committee.

SGML As a Framework for Digital Preservation and Access. INSTITUTION Commission on Preservation and Access, Washington, DC

Extensible Markup Language (XML) and Its Role in Supporting the Global Justice XML Data Model

Transformation Frameworks and Their Relevance in Universal Design

Odfweave Manual

Extensible Markup Language (XML)

The Text Encoding Initiative Nancy Ide

XXX Format Assessment

Metadefender Core V4.12.2

Information Processing — Text and Office Systems — Standard Generalized Markup Language (SGML)

HTML 4.0 Specification

Cumulus User Guide

Supported File Types and Size Limits

Acrobat Glossary