Misconceptions of The

Downloaded from https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqy071/5248221 by Joint Library of the Hellenic and Roman Societies user on 01 February 2019

A world of difference: Myths and misconceptions about the TEI

James Cummings
Newcastle University, UK

Abstract

The Guidelines of the Text Encoding Initiative are generally recognized in the digital humanities as important and foundational standards for many types of research in the field. The TEI Guidelines are generalistic, seeking to enable the largest possible user base encoding digital texts for a wide range of purposes. Consulting on many TEI-based projects, teaching TEI workshops, and volunteering as part of the TEI Technical Council, I have encountered many myths, misconceptions, and misunderstandings about the TEI. Indeed, one plenary lecturer once claimed ‘the problem with the TEI is it has too many tags and there is no way to change it’. Inspired by myths such as this, this article will detail common misconceptions about the TEI that I have encountered, concentrating on those technical myths that will help increase knowledge about the TEI misconceptions along the way. The article ends with a consideration of why these myths might have arisen, and what might be able to be done about them.

Correspondence: James Cummings, School of English, Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK. E-mail: james.cummings@newcastle.ac.uk
Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018. © The Author(s) 2018. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: [email protected] doi:10.1093/llc/fqy071

1 Introduction

The Text Encoding Initiative is a mature international consortium of institutions, projects, and individual members. It is a community of users and volunteers that produces a freely available manual of regularly maintained and updated recommendations for encoding digital text: ‘The TEI Guidelines’.1 The TEI and its community have become an important aspect of Digital Humanities for those undertaking digital textual studies. However, the TEI is more than just a community that creates a set of guidelines: in doing so it formalizes a history of the community’s concerns for textual distinctions and exemplifies understandings of how to encode them and how these have developed over its existence; it acts as a slowly developing consensus-based method of structuring those distinctions; it is a mechanism for producing customized schemas that reflect an individual project’s needs; it produces methods for transformation to and from numerous other formats; and it is a well-documented format for archival long-term preservation.

This article presents some of the myths, misconceptions, and misunderstandings that I have personally encountered while working with TEI research projects and serving on the TEI Technical Council for more than a decade following the release of TEI P5. The myths or misconceptions I will look at include:

- The TEI is XML (and XML is broken or dead)
- The TEI is too big and complex
- The TEI is too simple or general
- There is no way to change the TEI
- You have to be a TEI guru to customize the TEI
- The TEI is too small (or does not have <mySpecialElement>)
- You cannot get from TEI to $myPreferredFormat
- If you use TEI you must learn other technologies
- You cannot do stand-off markup in XML (or TEI)
- XML (and TEI) cannot handle overlapping hierarchies
- There are no tools for the TEI
- Interoperability is impossible with the TEI
- The TEI is only for digital editions, I am doing $otherThing
- The TEI is only for Anglophone or Western works

This list of myths is not meant to be exhaustive, and there are indeed many more, but these seem most important to address for those new to the TEI. I will not ‘name and shame’ those repeating these misconceptions, since that would be counter-productive. The real intention here is to record these misconceptions, to explode them, and to consider how they might be avoided in future.

2 The TEI Is XML (and XML Is Broken or Dead)

One of the first myths to investigate is the claim that the TEI is XML and XML is broken or dead. The TEI Guidelines were first expressed in SGML as a markup language and only as of TEI P4 moved to recommending XML, but even this recommendation may change in the future. As new languages, technologies, and methodologies for text encoding emerge in future, the TEI Guidelines may move to them or include them as one of a set of ways to serialize digital text, so long as they meet the basic requirements for easy long-term preservation, expressiveness, validation, integration, and mass adoption that are seen with XML. It is important to note that it is the prose of the TEI Guidelines that is considered normative, not the current markup language they are written in or recommend, nor the schemas generated from them. What is written in the Guidelines in prose is more important than the rules of any generated schema. There are constraints in the prose of the TEI Guidelines (such as honest adherence to the abstract model) which will never be able to be modelled in any schema language, and this is precisely one of the strengths of these community-developed recommendations.

While the TEI Guidelines are currently formulated as XML, it is important to look at the insistence by some that XML is inherently broken or dead as a technology. This claim is often linked with the desire to herald the arrival of new technologies. This does not mean that XML does not have some limitations (the problem of overlapping hierarchies, discussed later, being one often mentioned, and the TEI Guidelines have a whole chapter on this), but there are also many solutions to these. While markup theorists may wish to argue the finer points of whether one possible solution to this or that problem is better, most pragmatic projects just wish to undertake their encoding as efficiently as possible. In my experience any inherent limitations, some of which are discussed later, are rarely prohibitive, and projects often prefer to accept the most straightforward solution. There is a natural temptation for people to latch onto the new, especially in computer technology, and to want to dismiss the old, but they do so at a cost. Championing a new format does not necessitate the denigration of existing formats, and they can and do happily co-exist. Moreover, the TEI Guidelines and project encoding concerns are not usually about the format, but instead they are about the rich granularity of information and the use of appropriate technologies for specific tasks.2

3 The TEI Is Too Big and Complex

The TEI Guidelines do indeed consist of a large set of recommendations, and there are many levels at which they can be used and understood, which can be daunting. The TEI Guidelines do sometimes enable multiple methods of encoding the same phenomena but strive to do so only to accommodate differing approaches or methodologies. However, the TEI is a modular framework that allows a project or a sub-community to choose precisely which elements to make available and, where appropriate, build processing workflows based on only those elements. The current TEI P5 Guidelines (version 3.4.0) have 569 elements, but no one expects any encoding project to use all of these or necessarily be fully aware of the underlying infrastructure of inter-linked classes that form the TEI infrastructure.3 In proper use of the TEI, customization is strongly recommended:

    These Guidelines provide an encoding scheme suitable for encoding a very wide range of texts, and capable of supporting a wide variety of applications. For this reason, the TEI scheme supports a variety of different approaches to solving similar problems and also defines a much richer set of elements than is likely to be necessary in any given project. Furthermore, the TEI scheme may be extended in well-defined and documented ways for texts that cannot be conveniently or appropriately encoded using what is provided. For these reasons, it is almost impossible to use the TEI scheme without customizing it in some way.4

To address this misunderstanding, it is necessary to provide a brief introduction to TEI customization.

[...] place to store project-specific encoding documentation. Indeed, the TEI customization file acts not only as a source of documentation but also defines an encoding scheme and may provide recommendations for processing files encoded with that scheme, and it provides a historical record for all of these (documentation, formal encoding scheme, and processing information). Thus a customization file such as this is called an ODD file for ‘One Document Does-it-all’. This TEI ODD file acts as a meta-schema source not only for the generation of a schema (to validate your documents) but also for your local encoding manual, customized and internationalized as necessary for your project. The figure below (Fig. 1) shows a collection of chosen TEI elements documented by a TEI ODD Customization file which is used to generate schemas in RELAX NG (RNG) or XML Schema (XSD) format, as well as documentation outputs such as PDF or HTML versions of the local encoding guide-
