The ENRICH Schema
A Reference Guide
edited by Lou Burnard
October 2008

Contents

1 Manuscript Description Metadata
  1.1 Phrase-level Elements
    1.1.1 Origination
    1.1.2 Material
    1.1.3 Watermarks and Stamps
    1.1.4 Dimensions
    1.1.5 References to Locations within a Manuscript
    1.1.6 Names of Persons, Places, and Organizations
    1.1.7 Catchwords, Signatures, Secundo Folio
    1.1.8 Heraldry
  1.2 The Manuscript Identifier
  1.3 The Manuscript Heading
  1.4 Intellectual Content
    1.4.1 The <msItem> Element
    1.4.2 Authors and Titles
    1.4.3 Rubrics, Incipits, Explicits, and Other Quotations from the Text
    1.4.4 Filiation
    1.4.5 Text Classification
    1.4.6 Languages and Writing Systems
  1.5 Physical Description
    1.5.1 Object Description
    1.5.2 Writing, Decoration, and Other Notations
    1.5.3 Bindings, Seals, and Additional Material
  1.6 History
  1.7 Additional information
    1.7.1 Administrative information
    1.7.2 Surrogates
  1.8 Manuscript Parts
2 Metadata about digital facsimiles
3 Customization Section
  3.1 Macros
  3.2 Model classes
  3.3 Attribute classes
  3.4 Elements

Introduction

This Reference Guide defines an XML format for the structure of the data which all ENRICH partners will contribute to the Manuscriptorium, either directly or indirectly by means of a harvester or transformation process. The format is formally expressed by a schema which is generated from the XML source of this guide. The guide itself is a conformant subset of Release 1.4 of The Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange (TEI P5).

The schema defined by this document addresses three distinct aspects of a digitized manuscript:

1. metadata describing the original source manuscript (1. Manuscript Description Metadata)
2. metadata describing digitized images of the original source manuscript (2. Metadata about digital facsimiles)
3. a transcription of the text contained by the original source manuscript

Within Manuscriptorium, only the first two are required. However, the schema documented here also provides for the third, in the interest of completeness and for the assistance of ENRICH partners wishing to provide richer access facilities to their holdings.

The schema defined by this document is available in DTD, RELAX NG, and W3C Schema languages. Along with the present documentation, this forms one of the key deliverables for Work Package 3 of the ENRICH project.

Earlier versions of Manuscriptorium used schemas based on MASTER, notably one known as MASTER-X. These specifications both defined comparatively unconstrained XML formats, which permitted a very wide range of possibilities and did not attempt to constrain (for example) values to any predefined set of values.
While appropriate for an interchange format, this approach has some drawbacks:

• there may be wide variation in approaches taken to represent essentially the same phenomenon;
• the format appears over-complex to novice users, who will only ever want to use a very small subset of the possible tags;
• developing software (e.g. stylesheets) for the format becomes unnecessarily complex, since every possibility must be allowed for even though it is unlikely to appear;
• accurate searching of the data may be needlessly complicated by the large number of ways of representing e.g. attribute values such as dates.

In the ENRICH schema, we have tried to address these concerns by reducing the number of choices and constraining the possible values for several attributes. Nevertheless,

• the resulting schema remains fully TEI Conformant: we are only defining a subset;
• all constraints introduced have the full consent of all partners in the project.

The overall structure of an ENRICH-conformant XML document may be summarized as follows:

  <TEI>
    <teiHeader>
      <!-- ... metadata describing the manuscript -->
    </teiHeader>
    <facsimile>
      <!-- ... metadata describing the digital images -->
    </facsimile>
    <text>
      <!-- (optional) transcription of the manuscript -->
    </text>
  </TEI>

The remainder of this Guide describes each of these aspects in more detail, using material derived from the P5 release of the TEI Guidelines.

1 Manuscript Description Metadata

Each distinct manuscript must be described using a distinct TEI-conformant <teiHeader> element, as specified in the TEI Guidelines, chapter 2. This element may contain many components, depending on the needs of the creator, which may be provided in either structured or (relatively) unstructured form.

For Manuscriptorium purposes, the following components of the TEI Header must be provided, and must conform to the constraints specified here.

<fileDesc> (file description) contains a full bibliographic description of an electronic file.
<titleStmt> (title statement) groups information about the title of a work and those responsible for its intellectual content.
<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text.
<sourceDesc> (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
<revisionDesc> (revision description) summarizes the revision history for a file.

Other header components, if present, will be ignored by Manuscriptorium; they will be retained for storage in the system and returned on request, but their content is not processed for any purpose, and they may not be visible for searching purposes.

The following example shows the minimal required structure:

  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[Title of manuscript]</title>
      </titleStmt>
      <publicationStmt>
        <distributor>[name of data provider]</distributor>
        <idno>[project-specific identifier]</idno>
      </publicationStmt>
      <sourceDesc>
        <msDesc xml:id="ex5" xml:lang="en">
          <!-- [full manuscript description] -->
        </msDesc>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2008-01-01">
        <!-- [revision information] -->
      </change>
    </revisionDesc>
  </teiHeader>

Taking these in turn,

• the title of the manuscript is used to identify it in short summary displays; it should correspond with the information used for the same purpose in the <head> element within the <msDesc> element below.
• the name of the data provider may be given in any conventional form but should be consistent across all data provided.
• the project-specific identifier has two parts: it consists of the short alphabetic code used to identify the partner (e.g. OCS for OUCS), followed by a four-digit sequence number.
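
As an illustration only (this is not part of the ENRICH tooling), the required header components and the two-part identifier described above could be checked with a short Python sketch. The names check_header and is_valid_record_id are hypothetical, and the pattern assumes that partner codes consist of upper-case ASCII letters, as in OCS; the guide itself says only "short alphabetic code".

```python
import re
import xml.etree.ElementTree as ET

# Assumption: partner codes are upper-case ASCII letters, as in "OCS".
# The guide only specifies a "short alphabetic code" plus four digits.
IDNO_PATTERN = re.compile(r"[A-Z]+[0-9]{4}")

# Header components the guide lists as required for Manuscriptorium,
# written as paths relative to <teiHeader>.
REQUIRED_PATHS = (
    "fileDesc",
    "fileDesc/titleStmt",
    "fileDesc/publicationStmt",
    "fileDesc/sourceDesc",
    "revisionDesc",
)


def is_valid_record_id(value):
    """Return True if value is an alphabetic code followed by four digits."""
    return IDNO_PATTERN.fullmatch(value.strip()) is not None


def strip_namespaces(root):
    """Remove '{namespace}' prefixes in place so plain paths match."""
    for element in root.iter():
        if isinstance(element.tag, str) and "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]


def check_header(header_xml):
    """Return a list of problems found in a serialized <teiHeader>."""
    root = ET.fromstring(header_xml)
    strip_namespaces(root)
    if root.tag != "teiHeader":
        return ["expected <teiHeader>, found <%s>" % root.tag]
    problems = []
    for path in REQUIRED_PATHS:
        if root.find(path) is None:
            problems.append("missing required component: " + path)
    idno = root.find("fileDesc/publicationStmt/idno")
    if idno is None or not is_valid_record_id(idno.text or ""):
        problems.append("idno is missing or not of the form CODE + 4 digits")
    return problems
```

Calling check_header on a header that follows the minimal structure above, with a real identifier in <idno>, returns an empty list; each missing component or malformed identifier adds one message. The placeholder text [project-specific identifier] would itself be reported as malformed.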
  For example, OCS0002 would be the second digital record contributed to the Manuscriptorium project by partner OCS. Note that this identifier has nothing to do with the manuscript shelfmark or other identifier. When ingesting records, Manuscriptorium will assume that if a record with this identifier already exists, the intention is to replace it.
• the manuscript description provided must follow the specification given in the remainder of this section.
• at least one <change> element must be provided, giving the date on which this record was last revised before being submitted. As elsewhere, dates must be provided in the ISO format yyyy-mm-dd. The content of the <change> element is free text, which may be used to indicate the scope of any revision and the person(s) responsible for it.

<msDesc> (manuscript description) contains a description of a single identifiable manuscript.
  @xml:id (identifier) provides a unique identifier for the element bearing the attribute.
  @xml:lang (language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.

The <msDesc> element is used to provide detailed information about a single manuscript. For ENRICH purposes, this must carry the attributes mentioned above, to supply a unique internal identifier for the manuscript, and to specify the language of its description respectively.

The value for xml:id may be the same as the value supplied for the <idno> element in the <teiHeader>, or it may be some other project-specific identifier used for cross-reference. It should, however, be prefixed by an identifier for the partner concerned, so as to avoid possible identifier collisions. The value for xml:lang, as elsewhere, must be supplied in the form of a valid language identifier (see below). If no value is supplied, the assumption is that