Global Digital Format Registry (GDFR)


Global Digital Format Registry (GDFR) Data Model v.4 Rev. 2004-01-12

1 Introduction

The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, made on a format-specific basis. The existence of a sustainable registry of authoritative representation information about digital formats has been identified as a crucial component of the research agenda for effective digital preservation [NSF-DELOS]. The Digital Library Federation (DLF) has sponsored a series of invitational workshops to investigate the technical and policy questions surrounding the establishment of a Global Digital Format Registry (GDFR).

2 Scope

The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats.

3 Definitions

• Format. A fixed, byte-serialized encoding of an information model.
• Information model. A formal expression of exchangeable knowledge [ISO 14721].
• Representation information. Information that maps formatted content streams into more meaningful concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO 14721].

4 Data Types

4.1 Primitive Data Types

• ByteStream. A sequence of arbitrary octets.
• Enumeration. A set of unique values.
• Integer. An integer numeric value.
• String. A sequence of characters represented in the UTF-8 encoding [UTF-8].

4.2 Derived Data Types

• Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as constrained by [Wolf].
• Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].
• MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].
• NonNegative. A non-negative integer, i.e., 0, 1, 2, …
• Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].
• URI. A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].
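Because the derived types are constrained strings or integers, a registry could check them syntactically before accepting a value. The following is a minimal Python sketch, assuming that [Wolf] denotes the W3C date and time profile of ISO 8601 with its six levels of granularity. The names W3C_DATETIME, is_date, and is_non_negative are hypothetical and are not part of the data model.

```python
import re

# Granularities of the W3C profile: YYYY, YYYY-MM, YYYY-MM-DD,
# YYYY-MM-DDThh:mmTZD, YYYY-MM-DDThh:mm:ssTZD, YYYY-MM-DDThh:mm:ss.sTZD,
# where TZD is "Z" or a +hh:mm / -hh:mm offset.
W3C_DATETIME = re.compile(
    r"\d{4}"                                # YYYY
    r"(-(0[1-9]|1[0-2])"                    # -MM
    r"(-(0[1-9]|[12]\d|3[01])"              # -DD
    r"(T([01]\d|2[0-3]):[0-5]\d"            # Thh:mm
    r"(:[0-5]\d(\.\d+)?)?"                  # optional :ss and fraction
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"      # TZD, required once a time is given
    r")?)?)?"
)


def is_date(value: str) -> bool:
    """Syntactic check only; impossible dates such as 2004-02-31 still pass."""
    return W3C_DATETIME.fullmatch(value) is not None


def is_non_negative(value) -> bool:
    """True for the integers 0, 1, 2, ... (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0
```

The check is deliberately shallow: it validates the shape of the string and leaves calendar validity to the caller.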

5 Data Model

All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and definition. Obligation is indicated as: 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional. Cardinality is indicated as 'R' for (arbitrarily) repeatable.
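As an illustration of the five-part attribute definition, the following Python sketch models one table row and checks a list of supplied values against its obligation and cardinality. It assumes that an attribute without 'R' takes at most one value, which the text implies but does not state; the names Obligation, Attribute, parse_row, and check are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    M = "mandatory"
    MA = "mandatory-if-applicable"
    O = "optional"


@dataclass(frozen=True)
class Attribute:
    name: str
    type: str
    obligation: Obligation
    repeatable: bool
    definition: str


def parse_row(name, type_, obligation, cardinality, definition):
    """Build an Attribute from the five columns of a data model table row."""
    return Attribute(name, type_, Obligation[obligation], cardinality == "R", definition)


def check(attr, values):
    """Return a list of problems for the values supplied for one attribute."""
    problems = []
    if attr.obligation is Obligation.M and not values:
        problems.append(f"{attr.name}: mandatory attribute has no value")
    # Assumption: no 'R' in the cardinality column means at most one value.
    if not attr.repeatable and len(values) > 1:
        problems.append(f"{attr.name}: attribute is not repeatable")
    return problems
```

Whether a mandatory-if-applicable attribute applies depends on other attribute values, so the sketch does not try to enforce 'MA' generically.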

5.1 Primitive Properties

Access
Name              Type            Obl.  Card.  Definition
Type              Enumeration     M            Access type:
                                                 Escrow      Inaccessible copy on file
                                                 License     Access by license only
                                                 On-site     On-site access only
                                                 Public      Unrestricted access
                                                 Restricted  No access
                                                 Other       Requires informative note
Start             Date            O            Starting date
End               Date            O            Ending date
Note              String          MA    R      Informative note
LastModified      Date            M            Modification date/timestamp

Agent
Name              Type            Obl.  Card.  Definition
Name              String          M            Personal or corporate name of agent
Type              Enumeration     M            Agent type:
                                                 Commercial    Commercial (for-profit) entity
                                                 Government    Governmental agency
                                                 Education     Educational institution
                                                 Non-profit    Non-profit entity
                                                 Professional  Professional organization
                                                 Standard      Accredited standards body
                                                 Trade         Trade association
                                                 Other         Requires informative note
Address           String          O            Postal address
Telephone         Telephone       O            Telephone number
Fax               Telephone       O            Facsimile number
Email             Email           O            Email address
Web               URI             O            Web site
Note              String          MA    R      Informative note
LastModified      Date            M            Modification date/timestamp

Application
Name              Type            Obl.  Card.  Definition
Name              String          M            Application name
Version           String          M            Version identifier
Release           Date            M            Release date
Vendor            Agent           O            Vendor
Process           Process         O     R      Process
HWDependency      Platform        O     R      Hardware dependency
SWDependency      Application     O     R      Software dependency
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Authority
Name              Type            Obl.  Card.  Definition
Agent             Agent           M            Authority agent
Start             Date            MA           Starting date of effective authority
End               Date            MA           Ending date of effective authority
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Class
Name              Type            Obl.  Card.  Definition
Identifier        Cognomen        M            Class identifier
Description       String          M            Description
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Cognomen
Name              Type            Obl.  Card.  Definition
Value             String          M            Cognomen value
Type              Enumeration     M            Cognomen type:
                                                 AFNOR         AFNOR standard
                                                 ANSI          ANSI standard
                                                 ARK           CDL Archival Resource Key
                                                 BSI           BSI standard
                                                 CCITT         CCITT standard
                                                 DDC           Dewey Decimal Classification
                                                 DOI           Digital Object Identifier
                                                 ECMA          ECMA standard
                                                 GDFRClass     GDFR classification identifier
                                                 GDFRFormat    GDFR format identifier
                                                 GDFRRegistry  GDFR registry identifier
                                                 Handle        CNRI handle
                                                 Informal      No defined syntax or embedded semantics
                                                 ISO           ISO standard
                                                 ISBN          International Standard Book Number
                                                 ISSN          International Standard Serial Number
                                                 ITU           ITU recommendation
                                                 JEITA         JEITA standard
                                                 LCC           Library of Congress Classification
                                                 LCCN          Library of Congress Control Number
                                                 MIME          MIME media type [MIME]
                                                 NISO          NISO standard
                                                 PII           Publisher's Item Identification [PII]
                                                 PURL          Persistent URL
                                                 RFC           IETF Request for Comment
                                                 SICI          Serial Item and Contribution Identifier [SICI]
                                                 TOM           Typed Object Model identifier
                                                 UUID/GUID     Universally/globally-unique Identifier [UUID]
                                                 URI           Uniform Resource Identifier [URI]
                                                 URL           Uniform Resource Locator
                                                 URN           Uniform Resource Name [URN]
                                                 Other         Requires informative note
Note              String          MA    R      Informative note
LastModified      Date            M            Modification date/timestamp

Document
Name              Type            Obl.  Card.  Definition
Title             String          M            Document title
Type              Enumeration     M            Document type:
                                                 Article
                                                 Correspondence
                                                 Manual
                                                 Monograph
                                                 Report
                                                 Standard
                                                 Thesis
                                                 Web
                                                 Other           Requires informative note
Author            Agent           O     R      Author
Edition           String          O            Edition
Publisher         Agent           O     R      Publisher
Date              Date            O            Publication date
Accessibility     Access          M     R      Access regime
Identifier        Cognomen        O     R      Identifier
Note              String          MA    R      Informative note
LastModified      Date            M            Modification date/timestamp

Event
Name              Type            Obl.  Card.  Definition
Agent             Agent           M            Agent effecting the event
Type              Enumeration     M            Event type:
                                                 Delete        Deletion of a format
                                                 Initial       Initial registration of a format
                                                 Obsolescence  Declaration of format obsolescence
                                                 Update        Update format representation information
                                                 Other         Requires informative note
Scope             Enumeration     M            Scope of the event:
                                                 Editorial     Non-substantive editorial change
                                                 Technical     Substantive technical change
Review            Enumeration     M            Review type:
                                                 Full          Full technical review
                                                 Partial       Requires informative note
                                                 None          No review
Date              Date            M            Date/timestamp
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Interface
Name              Type            Obl.  Card.  Definition
Protocol          Enumeration     M            Interface protocol:
                                                 HTTP
                                                 .NET
                                                 RMI    Remote method invocation
                                                 SOAP   Web Service
                                                 Other  Requires informative note
Connection        String          MA           Protocol-specific connection parameters
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Ontology
Name              Type            Obl.  Card.  Definition
Class             Class           M            Ontological class
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Platform
Name              Type            Obl.  Card.  Definition
Name              String          M            Platform name
Version           String          M            Version identifier
Release           Date            M            Release date
Vendor            Agent           O            Vendor
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Process
Name              Type            Obl.  Card.  Definition
Type              Enumeration     M            Process type:
                                                 Create         Create new instantiation of formatted object
                                                 Render         Media type-specific rendering of formatted object
                                                 TransformFrom  Requires source auxiliary format
                                                 TransformTo    Requires target auxiliary format
                                                 Validate       Validation of formatted object
                                                 Other          Requires informative note
Auxiliary         Cognomen        MA    R      Source or target format of transformation
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Registry
Name              Type            Obl.  Card.  Definition
Identifier        Cognomen        M            Registry identifier
Service           Service         M     R      Supported GDFR service
LastHarvestedBy   Date            O            Date/timestamp of last harvest by this registry
LastHarvest       Date            O            Date/timestamp of last harvest of this registry
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Relation
Name              Type            Obl.  Card.  Definition
Identifier        Cognomen        M            Target format identifier
Registry          Cognomen        O            Target registry identifier
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Service
Name              Type            Obl.  Card.  Definition
Type              Enumeration     M            Service type:
                                                 Approval         Technical review
                                                 Description      Query for specific format
                                                 Export           Bulk export of registry data
                                                 Introspection    Information about registry instance
                                                 Maintenance      Maintain format representation information
                                                 Notification
                                                 Synchronization  Distributed synchronization
Interface         Interface       M     R      Service interface
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

Signature
Name              Type            Obl.  Card.  Definition
Value             ByteStream      M            Signature value
Obligation        Enumeration     M            Signature obligation:
                                                 Mandatory
                                                 MandatoryIfApplicable  Requires informative note
                                                 Optional
Note              String          MA    R      Informative note
LastModified      Date            M            Modification date/timestamp

5.2 Derived Properties

Derived properties inherit all of the attributes of their parent.

ExternalSignature IS-A Signature
Name              Type            Obl.  Card.  Definition
Type              Enumeration     M            External signature type:
                                                 Extension  File extension
                                                 Type       Mac OS data type
                                                 Other      Requires informative note

FormatRelation IS-A Relation
Name              Type            Obl.  Card.  Definition
Type              Enumeration     M            Format relation type:
                                                 EquivalentTo           Equivalent to target
                                                 IsPreviousVersionOf    Previous version of target
                                                 IsSubsequentVersionOf  Subsequent version of target
                                                 IsSubtypeOf            Subtype of target
                                                 IsSupertypeOf          Supertype (parent) of target
                                                 MayContain             May encapsulate target
                                                 UsedBy                 May be encapsulated by target
                                                 Other                  Requires informative note
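The format relation types other than Other read as inverse pairs (for example, MayContain and UsedBy), with EquivalentTo as its own inverse. The data model does not say whether both directions must be recorded, so the following Python sketch is only an assumption-based illustration of how the reverse relation could be derived. The names INVERSE and inverse_relation are hypothetical, and the identifiers in the usage example are placeholders.

```python
# Pairing inferred from the relation type definitions; not stated in the data model.
INVERSE = {
    "EquivalentTo": "EquivalentTo",
    "IsPreviousVersionOf": "IsSubsequentVersionOf",
    "IsSubsequentVersionOf": "IsPreviousVersionOf",
    "IsSubtypeOf": "IsSupertypeOf",
    "IsSupertypeOf": "IsSubtypeOf",
    "MayContain": "UsedBy",
    "UsedBy": "MayContain",
}


def inverse_relation(source_id, relation_type, target_id):
    """Return the same relationship as seen from the target format."""
    if relation_type not in INVERSE:
        # "Other" carries only an informative note, so no inverse can be derived.
        raise ValueError(f"no inverse defined for relation type {relation_type!r}")
    return (target_id, INVERSE[relation_type], source_id)


# Placeholder identifiers, for illustration only.
print(inverse_relation("info:gdfr/f/a", "IsSubtypeOf", "info:gdfr/f/b"))
```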

InternalSignature IS-A Signature
Name              Type            Obl.  Card.  Definition
Position          Enumeration     M            Signature position:
                                                 Fixed      Fixed position; requires offset
                                                 Arbitrary  Arbitrary position
Offset            NonNegative     MA           Byte offset
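The IS-A inheritance and the mandatory-if-applicable Offset can be sketched with Python dataclasses: InternalSignature inherits the attributes of Signature and requires an offset only when the position is Fixed. This is an illustrative sketch, not a prescribed implementation; LastModified is omitted for brevity, the matches method is a hypothetical helper, and the "%PDF-" value in the usage example is chosen only as a familiar illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Signature:
    value: bytes                     # Value (ByteStream, M)
    obligation: str                  # Mandatory | MandatoryIfApplicable | Optional
    notes: List[str] = field(default_factory=list)


@dataclass
class InternalSignature(Signature):  # IS-A Signature: inherits value, obligation, notes
    position: str = "Arbitrary"      # Fixed | Arbitrary
    offset: Optional[int] = None     # NonNegative, applicable when position is Fixed

    def __post_init__(self):
        if self.position == "Fixed" and self.offset is None:
            raise ValueError("a Fixed signature requires an offset")
        if self.offset is not None and self.offset < 0:
            raise ValueError("offset must be non-negative")

    def matches(self, data: bytes) -> bool:
        """Hypothetical helper: test whether the signature occurs in data."""
        if self.position == "Fixed":
            end = self.offset + len(self.value)
            return data[self.offset:end] == self.value
        return self.value in data


sig = InternalSignature(b"%PDF-", "Mandatory", position="Fixed", offset=0)
print(sig.matches(b"%PDF-1.4\n"))
```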

Person IS-A Agent
Name              Type            Obl.  Card.  Definition
Title             String          O            Personal title
Affiliation       Agent           O            Organizational affiliation

5.3 Registry Properties

GDFR IS-A Registry
Name              Type            Obl.  Card.  Definition
Version           String          M            Version identifier for registry code base and data model
Date              Date            M            Build date for registry code base and data model
Aegis             Authority       M     R      Responsible authority
ExternalRegistry  Registry        O     R      Known external registry
Ontology          Ontology        M            Ontological classification scheme
Format            Format          O     R      Format representation information

5.4 Format Properties

Format
Name              Type            Obl.  Card.  Definition
Identifier        Cognomen        M            Format canonical identifier
Description       String          M            Short description of format
Alias             Cognomen        O     R      Variant identifier
Version           String          O            Format version identifier
Author            Agent           O     R      Author
Owner             Authority       M     R      Legal owner
Maintainer        Authority       O     R      Maintainer
Classification    Cognomen        O     R      Ontological classification
Relationship      FormatRelation  O     R      Typed relationship with other format
Specification     Document        M     R      Specification document
Signature         Signature       O     R      External or internal signature
Application       Application     O     R      Application system using format
Provenance        Event           M     R      Provenance event
Note              String          O     R      Informative note
LastModified      Date            M            Modification date/timestamp

6 Identifiers

GDFR requires three types of identifiers: for ontological classifications, for formats, and for registries. If these identifiers serve strictly for identification, i.e., no resolution is necessary, they should be defined in a registered gdfr namespace of the info URI scheme [INFO]:

    info:gdfr/c/classid
    info:gdfr/f/formatid
    info:gdfr/r/registryid

If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme [URN]:

    urn:gdfr:c:classid
    urn:gdfr:f:formatid
    urn:gdfr:r:registryid
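The two identifier patterns can be built and parsed mechanically. The following Python sketch assumes that the local part (classid, formatid, registryid) is an opaque, non-empty string, since the data model does not define its syntax; the function names make_identifier and parse_identifier and the local identifier used in the example are hypothetical.

```python
# Namespace letters taken from the identifier patterns: c, f, r.
KINDS = {"c": "class", "f": "format", "r": "registry"}


def make_identifier(kind, local_id, resolvable=False):
    """Build an info: identifier, or a urn: identifier if resolution is desired."""
    if kind not in KINDS:
        raise ValueError(f"unknown identifier kind {kind!r}")
    if not local_id:
        raise ValueError("local identifier must not be empty")
    if resolvable:
        return f"urn:gdfr:{kind}:{local_id}"
    return f"info:gdfr/{kind}/{local_id}"


def parse_identifier(identifier):
    """Split a GDFR identifier into (scheme, kind name, local identifier)."""
    if identifier.startswith("info:gdfr/"):
        scheme = "info"
        kind, sep, local_id = identifier[len("info:gdfr/"):].partition("/")
    elif identifier.startswith("urn:gdfr:"):
        scheme = "urn"
        kind, sep, local_id = identifier[len("urn:gdfr:"):].partition(":")
    else:
        raise ValueError(f"not a GDFR identifier: {identifier!r}")
    if not sep or kind not in KINDS or not local_id:
        raise ValueError(f"malformed GDFR identifier: {identifier!r}")
    return scheme, KINDS[kind], local_id


# "example-format" is a made-up local identifier.
print(make_identifier("f", "example-format"))
print(parse_identifier("urn:gdfr:r:example-registry"))
```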

References

[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, Internet draft, December 2003.

[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.

[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for information interchange.

[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of dates and times.

[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements – Part 3: basic attributes of data elements.

[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system – Reference model.

[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, November 1996.

[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation, 2003.

[PII] Elsevier Science, Publisher Item Identifier as a means of document identification.

[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).

[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998.

[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001.

[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call (RPC).

[URN] R. Moats, URN Syntax, RFC 2141, May 1997.

[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).

[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997, http://www.w3.org/TR/NOTE-datetime.
