Downloaded from https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqy071/5248221 by Joint Library of the Hellenic and Roman Societies user on 01 February 2019

A world of difference: Myths and misconceptions about the TEI

James Cummings
Newcastle University, UK

Abstract

The Guidelines of the Text Encoding Initiative (TEI) are generally recognized in the Digital Humanities as important and foundational standards for many types of research in the field. The TEI Guidelines are generalistic, seeking to enable the largest possible user base encoding digital texts for a wide range of purposes. Consulting on many TEI-based projects, teaching TEI workshops, and volunteering as part of the TEI Technical Council, I have encountered many myths, misconceptions, and misunderstandings about the TEI. Indeed, one plenary lecturer once claimed ‘the problem with the TEI is it has too many tags and there is no way to change it’. Inspired by myths such as this, this article will detail common misconceptions about the TEI that I have encountered, concentrating on those technical myths that will help increase knowledge about the TEI and its misconceptions along the way. The article ends with a consideration of why these myths might have arisen, and what might be done about them.

Correspondence: James Cummings, School of English, Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK. E-mail: james.cummings@newcastle.ac.uk

1 Introduction

The Text Encoding Initiative is a mature international consortium of institutions, projects, and individual members. It is a community of users and volunteers that produces a freely available manual of regularly maintained and updated recommendations for encoding digital text: ‘The TEI Guidelines’.1 The TEI and its community have become an important aspect of Digital Humanities for those undertaking digital textual studies. However, the TEI is more than just a community that creates a set of guidelines: in doing so it formalizes a history of the community’s concerns for textual distinctions and exemplifies understandings of how to encode them and how these have developed over its existence; it acts as a slowly developing consensus-based method of structuring those distinctions; it is a mechanism for producing customized schemas that reflect an individual project’s needs; it produces methods for transformation to and from numerous other formats; and it is a well-documented format for archival long-term preservation.

This article presents some of the myths, misconceptions, and misunderstandings that I have personally encountered while working with TEI research projects and serving on the TEI Technical Council for more than a decade following the release of TEI P5. The myths or misconceptions I will look at include:

The TEI is XML (and XML is broken or dead)
The TEI is too big and complex
The TEI is too simple or general
There is no way to change the TEI
You have to be a TEI guru to customize the TEI
The TEI is too small (or does not have )
You cannot get from TEI to $myPreferredFormat
If you use TEI you must learn other technologies
You cannot do stand-off markup in XML (or TEI)

Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018. © The Author(s) 2018. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: [email protected] doi:10.1093/llc/fqy071


XML (and TEI) cannot handle overlapping hierarchies
There are no tools for the TEI
Interoperability is impossible with the TEI
The TEI is only for digital editions, I am doing $otherThing
The TEI is only for Anglophone or Western works

This list of myths is not meant to be exhaustive, and there are indeed many more, but these seem most important to address for those new to the TEI. I will not ‘name and shame’ those repeating these misconceptions, since that would be counterproductive. The real intention here is to record these misconceptions, to explode them, and to consider how they might be avoided in future.

2 The TEI Is XML (and XML Is Broken or Dead)

One of the first myths to investigate is the claim that the TEI is XML and XML is broken or dead. The TEI Guidelines were first expressed in SGML and only as of TEI P4 moved to recommending XML, but even this recommendation may change in the future. As new languages, technologies, and methodologies for text encoding emerge, the TEI Guidelines may move to them or include them as one of a set of ways to serialize digital text, so long as they meet the basic requirements for easy long-term preservation, expressiveness, validation, integration, and mass adoption that are seen with XML. It is important to note that it is the prose of the TEI Guidelines that is considered normative, not the current markup language they are written in or recommend, nor the schemas generated from them. What is written in the Guidelines in prose is more important than the rules of any generated schema. There are constraints in the prose of the TEI Guidelines (such as honest adherence to the abstract model) which will never be able to be modelled in any schema language, and this is precisely one of the strengths of these community-developed recommendations.

While the TEI Guidelines are currently formulated as XML, it is important to look at the insistence by some that XML is inherently broken or dead as a technology. This claim is often linked with the desire to herald the arrival of new technologies. This does not mean that XML does not have some limitations (the problem of overlapping hierarchies, discussed later, being one often mentioned, and the TEI Guidelines have a whole chapter on this), but there are also many solutions to these. While markup theorists may wish to argue the finer points of whether one possible solution to this or that problem is better, most pragmatic projects just wish to undertake their encoding as efficiently as possible. In my experience any inherent limitations, some of which are discussed later, are rarely prohibitive, and projects often prefer to accept the most straightforward solution. There is a natural temptation for people to latch onto the new, especially in computer technology, and to want to dismiss the old, but they do so at a cost. Championing a new format does not necessitate the denigration of existing formats, and they can and do happily co-exist. Moreover, the TEI Guidelines and project encoding concerns are not usually about the format, but instead they are about the rich granularity of information and the use of appropriate technologies for specific tasks.2

3 The TEI Is Too Big and Complex

The TEI Guidelines do indeed consist of a large set of recommendations, and there are many levels at which they can be used and understood, which can be daunting. The TEI Guidelines do sometimes enable multiple methods of encoding the same phenomena but strive to do so only to accommodate differing approaches or methodologies. However, the TEI is a modular framework that allows a project, or a sub-community, to choose precisely which elements to make available and, where appropriate, build processing workflows based on only those elements. The current TEI P5 Guidelines (version 3.4.0) have 569 elements, but no one expects any encoding project to use all of these or necessarily be fully aware of the underlying infrastructure of interlinked classes that form the TEI infrastructure.3 In proper use of the TEI, customization is strongly recommended:

    ‘These Guidelines provide an encoding scheme suitable for encoding a very wide range of texts, and capable of supporting a wide variety of applications. For this reason, the TEI scheme supports a variety of different approaches to solving similar problems and also defines a much richer set of elements than is likely to be necessary in any given project. Furthermore, the TEI scheme may be extended in well-defined and documented ways for texts that cannot be conveniently or appropriately encoded using what is provided. For these reasons, it is almost impossible to use the TEI scheme without customizing it in some way.’4

To address this misunderstanding, it is necessary to provide a brief introduction to TEI customization. Most customization is undertaken by individuals or projects, but in some cases a number of projects band together to use identical or similar customizations of the TEI Guidelines to enable interchange (or even interoperability) between their resources. That is, a community of practice may define a subset of the TEI Guidelines for use within that community, whether the size of that community is one person or thousands worldwide. An excellent example of this is that users of the EpiDoc Guidelines (produced by an international collaborative effort and intended to be used to encode scholarly and educational editions of ancient documents) do not need to learn all of the TEI Guidelines.5 A more important benefit is that it has enabled the creation of a variety of tools for the publication of EpiDoc resources, and for conversion of texts encoded using the Leiden Conventions to and from EpiDoc. These are conventions often used in print editions of epigraphical or papyrological documents.

Choosing which elements to include or exclude from the final scheme helps to enforce consistency among the encoders on that project, even if that project has just one encoder. Moreover, the TEI customization file records the information about the customization for future users and is a place to store project-specific encoding documentation. Indeed, the TEI customization file acts not only as a source of documentation but also defines an encoding scheme and may provide recommendations for processing files encoded with that scheme, and it provides a historical record for all of these (documentation, formal encoding scheme, and processing information). Thus a customization file such as this is called an ODD file for ‘One Document Does-it-all’. This TEI ODD file acts as a meta-schema source not only for the generation of a schema (to validate your documents) but also for your local encoding manual, customized and internationalized as necessary for your project. The figure below (Fig. 1) shows a collection of chosen TEI elements documented by a TEI ODD customization file which is used to generate schemas in RELAX NG (RNG) or XML schema (XSD) format, as well as documentation outputs such as PDF or HTML versions of the local encoding guidelines reflecting the customization undertaken.6

Fig. 1 TEI ODD customization file

The elements of the TEI Guidelines are organized in a modular manner so that those undertaking customization do not need, necessarily, to pick and choose elements on an individual basis. The TEI ODD customization file may include:

a module as a whole (thus getting all of its elements)
a module, and ask for only some of its elements (thus getting only those)
a module, but list those that are not wanted (thus getting any others)


Or exclude:

a module as a whole (thus getting none of its elements)

The figure below (Fig. 2) shows one way of using module-based customization.

There is an important difference between the inclusion and exclusion of particular elements in a TEI ODD customization. When a TEI ODD file includes some modules and specifies which elements are part of that inclusion, then a schema generated from this will only ever include those elements from those modules. Conversely, if a TEI ODD customization file includes modules but excludes particular elements from that inclusion, then a schema that is regenerated from that TEI ODD file at a future date will automatically include any new elements that the TEI Consortium has added to those modules in the intervening releases of the TEI Guidelines. The choice of approach for a particular encoding project will depend on whether the project wishes to benefit from new elements introduced to that module in the future or remain static with its choices from the beginning.7 A project is always free to base its TEI P5 customization on any previous version of the TEI Guidelines, or it can specify the ‘current’ version and always get new improvements when regenerating its schemas.

While the TEI Consortium provides Web-based methods for including and excluding elements, the underlying TEI ODD customization file uses the TEI’s own vocabulary to record these choices in an XML file. Some TEI customizers choose to edit the XML directly, and in doing so gain more fine-grained control of a customization than using any interface. However, doing this can be complex, and it is not required if the Web-based interfaces suffice. The figure below (Fig. 3) shows the TEI ODD XML required to include a variety of elements.

What this XML tells a TEI ODD processor through these <moduleRef> elements is that it should include the ‘tei’, ‘transcr’, and ‘tagdocs’ modules wholesale.8 The ‘tei’ module does not provide any elements but instead is one which instantiates the TEI class structure and should usually be included. TEI modules are usually created in conjunction with a particular chapter of the TEI Guidelines. The ‘transcr’ module is related to Chapter 11 of the TEI Guidelines, ‘Representation of Primary Sources’. The ‘tagdocs’ module is associated with Chapter 22, ‘Documentation Elements’, and provides the elements which customization files themselves use. The ‘core’, ‘header’, and ‘textstructure’ modules include only specific elements using the @include attribute. This means, for example, that if the TEI Guidelines of the future include a new element in the ‘textstructure’ module, any later outputs generated from this TEI ODD customization file will not know about it. Compare the above figure (Fig. 3) with the figure below (Fig. 4).

The ODD file in Fig. 4 tells any TEI ODD processor that when including the ‘textstructure’ module, it should include all elements except for those listed in the @except attribute. Currently this would end up with the same list of elements being included, but if the TEI Consortium introduces more elements to the ‘textstructure’ module in the future, these will appear in the outputs regenerated from this customization.9 An encoding project need only customize to the degree that assists it in accomplishing (and documenting) its goals, and this kind of flexibility of customization is one of the hallmarks of the TEI. While the TEI is indeed complex, it can be made as small and simple as required.
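The difference between inclusion and exclusion described in this section can be sketched in TEI ODD roughly as follows. This is a minimal illustration rather than a reproduction of the article's figures: the `ident` value and the particular element lists are assumptions chosen for brevity, and the `@except` list is deliberately partial.

```xml
<!-- Illustrative sketch only: the ident value and the element lists are assumptions. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myProject" start="TEI">
  <!-- Modules taken wholesale -->
  <moduleRef key="tei"/>
  <moduleRef key="transcr"/>
  <moduleRef key="tagdocs"/>
  <!-- Inclusion: only the named elements are ever available,
       even if later releases add more elements to these modules -->
  <moduleRef key="core" include="p hi name title"/>
  <moduleRef key="header"
             include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <!-- Exclusion alternative for the textstructure line (use one or the other, not both):
       <moduleRef key="textstructure"
                  except="div1 div2 div3 div4 div5 div6 div7 floatingText"/>
       Everything else in the module is kept, including elements added in later releases. -->
</schemaSpec>
```

A project that wants a frozen vocabulary would tend towards the `@include` form; one that wants to pick up new elements automatically on regeneration would tend towards `@except`.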

Fig. 2 Including modules in a TEI ODD customization

Fig. 3 ODD fragment for including elements

Fig. 4 ODD fragment for including elements with @except attribute

4 The TEI Is Too Simple or General

In almost complete contradiction to the previous misconception, another criticism often levelled at the TEI is that it is ‘too simple or too general’. This most frequently comes up when discussing encoding needs with a potential project (preferably but rarely before applying for funding). In my experience this tends to occur when the project claims that the TEI Guidelines are not sufficient to cope with the unique and special research requirements of the particular project. Often this comment is a symptom of a superficial understanding of the scope of the TEI, or a misunderstanding of the generalistic nature of the scheme and the way in which any individual element can be further refined by using attributes or nested levels of encoding.

More than once I have had a conversation with someone with a very basic knowledge of the TEI saying something like ‘The <damage> element is far too general, for my research I need something like a more specific element to say the damage was caused by water!’.10 One answer is, of course, that one should use the @agent attribute of the <damage> element to specify the damage was caused by water.11 The possible values of the @agent attribute are not defined by the TEI Guidelines (although some sample suggestions are made, which indeed do not include ‘water’), but rather a TEI customization is expected to document its own controlled vocabulary (which may include ‘water’). This expectation is expressed by the use of the datatype ‘teidata.enumerated’ to provide an open list of suggested values in the TEI Guidelines.

There are, however, real instances where the TEI is ‘too simple’ in that it provides only a general level of markup for some phenomena, but this is usually in cases either where there are existing XML standards that the TEI recommends for these or where the TEI community has not yet pushed the standard forward to greater detail in this area. There is an inevitable tension between the desire to add new elements to the TEI scheme to increase its expressivity, and the need to avoid expanding the Guidelines unnecessarily. As a result the TEI strives to recommend related standards where reasonable. An example of this might be the <notatedMusic> element used to encode the presence of music notation in a text. In this case the TEI Guidelines state that ‘It is also recommended, when useful, to embed XML-based music notation formats, such as the Music Encoding Initiative format as content of <notatedMusic>. This must be done by means of customization’.12 While the TEI framework is indeed general, one is able to change the TEI to be as specific as any individual project needs, and while it is simple in some places, it can be constrained further, expanded, changed, or even mixed with other standards.

5 There Is No Way to Change the TEI

It seems ridiculous, after the previous misconceptions, that anyone would claim that there is no way to change the TEI, and yet I have sat through an important plenary lecture where the speaker claimed something like ‘the problem with the TEI is it has too many tags and there is no way to change it’.

The TEI Guidelines document a literate programming methodology to customize the TEI framework for specific projects.13 While there are Web-based tools for creating TEI customizations, in the background they store the information using the TEI ODD XML format. It can take some time to learn the markup needed for TEI customization, but the power over a customization that this gives a project with regard to its encoding, validation, and documentation is well worth the effort: the return on investment for those creating customizations is significant.

Fig. 5 ODD fragment for changing the <name> element

The <elementSpec> element in the TEI ODD XML customization file shown in Fig. 5 documents a change to the <name> element from the core module. In this case the specifications are missing, but we fill in some of the possibilities in the figures below (Figs 6–8).

Fig. 6 ODD initial fragment for changing gloss, description, and classes

The snippet of the customization file depicted in Fig. 6 specifies a change to the specification of the <name> element (hence, the main element is called <elementSpec>). In particular, the gloss of the modified <name> element becomes ‘a name’, where the original in the TEI Guidelines is ‘name, proper noun’. Furthermore, the description of the modified <name> element is changed from the original ‘contains a proper noun or noun phrase’ to something more jovial.14 The ODD code in Fig. 6 also replaces the existing class memberships of the <name> element (here commented out). Classes are underlying structures of the TEI framework that elements (or other classes) claim membership of to determine what attributes an element may have (here @type and @subtype from the class ‘att.typed’) and where it appears in other content models (here anywhere the ‘model.nameLike.agent’ class is allowed).15 The figures below (Figs 7 and 8) continue where the previous (not well-formed) figure left off.

Fig. 7 ODD fragment documenting constraints and processing models

The TEI ODD customization file documents constraints and potential processing models. The constraints are used to generate additional forms of schema validation (Schematron, in this instance). Here we have embedded a Schematron rule to report an error when our customized <name> element is used somewhere that is not a descendant of a particular ancestor element. (This is a fabricated pedagogical example: names obviously could appear in many other places, but for the purposes of this customization that would be marked as an error.)

The documentation of processing models is a promising recent addition to the infrastructure of TEI ODD customization. In Fig. 7, the <model> element is used to indicate the intention that when a <name> element that has a @type attribute is being processed for ‘web’ output, the (intentionally) vaguely specified ‘alternate’ behaviour should be used. This might change in different forms of Web output or different media of output (such as in print). Here two parameters to the ‘alternate’ processing behaviour are provided as <param> elements: the ‘alternate’ parameter and the ‘default’ parameter. Each has a @value attribute which is an XPath expression to be interpreted in the context of each <name> element in a document instance. The first <param> documents that the <name> element’s @type attribute should be used for the ‘alternate’ value, while the second records that the content of the <name> element itself (here ‘.’ in XPath) is to be used as the value for the ‘default’ parameter. A processing toolchain could take <name> elements that match these conditions and produce Web-based tooltips, or in print output the same ‘alternate’ behaviour might generate a footnote. Once this basic customization format has been explained, even those editors who barely understand XPath can see that swapping the values would swap which part of the document is used in the tooltip and which in the output text. Developers building infrastructures on top of the processing model have reported that using it, and reading the TEI ODD customization file, has significantly shrunk and improved their code (Turska et al., 2016).
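Taken together, the changes to the <name> element walked through above might be sketched as a single well-formed specification along the following lines. This is a hedged reconstruction, not the article's own code: the wording of the description, the constraint identifier, the choice of `tei:p` as the required ancestor, the `scheme` value, and the Schematron namespace binding are all assumptions made for illustration.

```xml
<!-- Illustrative sketch only; text content and identifiers are assumptions. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="name" module="core" mode="change">
  <!-- Replacement gloss and description -->
  <gloss>a name</gloss>
  <desc>contains the name of anyone or anything this project cares about</desc>
  <!-- Replacement class memberships: attributes from att.typed,
       placement wherever model.nameLike.agent is allowed -->
  <classes mode="replace">
    <memberOf key="att.typed"/>
    <memberOf key="model.nameLike.agent"/>
  </classes>
  <!-- Extra validation beyond the grammar; the ancestor tested here
       (a paragraph) is an assumed example -->
  <constraintSpec ident="name-inside-p" scheme="schematron">
    <constraint>
      <sch:rule context="tei:name">
        <sch:assert test="ancestor::tei:p">
          In this project a name may only appear inside a paragraph.
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <!-- Processing model: for web output, typed names use the 'alternate' behaviour -->
  <model predicate="@type" output="web" behaviour="alternate">
    <param name="alternate" value="@type"/>
    <param name="default" value="."/>
  </model>
</elementSpec>
```

Swapping the two `@value` expressions on the <param> elements would swap which piece of information is shown inline and which is offered as the alternate, which is the point made above about editors being able to read such a specification without deep XPath knowledge.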


Fig. 8 Final ODD fragment replacing examples and remarks

In the closing fragment of this <elementSpec> (Fig. 8), the examples and remarks of the <name> element are modified using the <exemplum> and <remarks> elements. These elements are very useful for projects that wish to provide examples which match the project’s materials and additional element-specific encoding notes for the documentation that they will generate from this TEI ODD.

A common practice that the example in the figures above does not include is the modification of attributes and lists of their possible values. To record that a project is changing the values of an attribute involves a number of steps inside an <elementSpec> element. In the example below (Fig. 9), a TEI ODD customization changes the @agent attribute of the <damage> element of the ‘transcr’ module (on the representation of primary sources). Inside a list of attributes (using the <attList> element), a single attribute definition (using the <attDef> element) is changed by providing a replacement list of values (using the <valList> element) detailed with the <valItem> element. In this case the attribute values are in a list of type ‘semi’, using the @type attribute on the <valList> element, which means that these should be considered as strongly suggested values to encourage standardization, but one can include new values where necessary. If a value of ‘closed’ had been used instead, schemas generated from this customization file would allow only the listed values.

Fig. 9 Providing a list of values for the @agent attribute of the <damage> element

A TEI ODD customization file is also able to contain as much prose description and other local encoding information as needed by the project. Element specifications such as these are usually stored inside a <schemaSpec> element but can also be stored elsewhere in the TEI ODD customization file and referenced from within the <schemaSpec> element. Indeed, the <elementSpec> elements in the figures in this section all have @xml:id attributes which would enable them to be embedded alongside the prose and pointed to from the schema specification using a <specGrpRef>. (Though more properly these <elementSpec> elements would themselves be embedded inside a <specGrp> element.) The benefits of doing this should not be underestimated: for example, this enables documentation writers to situate the change in the schema precisely at the point where this change is being documented in prose. If later a project wanted to update the available values for the @agent attribute on the <damage> element (perhaps to include ‘fire’ alongside ‘water’), they could change the <valList> above and any relevant accompanying prose at the same point in the documentation.

The TEI customization demonstrated in this section shows conclusively that one can indeed change the TEI. However, this has demonstrated the more complex TEI ODD XML format, which not all TEI users may wish to grapple with, and raises the question of whether you need to be some sort of guru to customize the TEI.

6 You Have to Be a TEI Guru to Customize the TEI

There are those that feel the TEI can only be controlled by some elite priesthood, but this is simply not the case. While it is true that some of the methods for customizing the TEI, such as those shown above, can be intimidating at first, they still can be mastered with a little practice. Moreover, the TEI Consortium provides Web-based tools to create TEI customizations. At time of writing, most customizers use the Web-based Roma tool provided by the TEI Consortium (http://roma.tei-c.org/); however, since this is showing its age, a new tool is being created.16

In Fig. 10, the Roma interface is being used to remove three elements from this particular customization. Tools such as this allow users to browse through the options available from the TEI framework and select those modules or elements that fit their needs. The customization undertaken by a project can start from scratch or could use a number of existing TEI customizations as starting points.

The overall quality of the customization, regardless of the method by which it is created, does depend somewhat on the knowledge the customizer has of the TEI Guidelines. However, customizations may be modified, updated, and evolved over the lifetime of an encoding project as the needs of the project (and TEI experience of the customizer) change. Those new to customizing the TEI are encouraged to discuss any concerns or questions they may have openly on the TEI’s mailing list. Moreover, one project may have different customizations (and thus schemas and documentation) for different stages in its workflow. For example, it might have one for initial data entry, a different one for proofreading, and a tighter one for pre-publication validation.

Although it would be difficult for current or future Web interfaces for editing TEI ODD customization files to allow for the full expressivity possible in the TEI ODD vocabulary, it is certain that these will provide tools for the most common tasks in customizing the TEI. The existence of such tools and their support by the TEI Technical Council or the TEI community means that one does not need to be a TEI guru to customize the TEI.
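For readers who do want to see what lies behind such interfaces, the change to the @agent attribute of the <damage> element described in the previous section might be written by hand roughly as below. This is a sketch under stated assumptions: the particular values and their descriptions are invented for illustration, and only ‘water’ is taken from the discussion above.

```xml
<!-- Illustrative sketch only; the value list and descriptions are assumptions. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="damage" module="transcr" mode="change">
  <attList>
    <attDef ident="agent" mode="change">
      <!-- type="semi": strongly suggested values, others still permitted;
           type="closed" would make generated schemas reject any other value -->
      <valList type="semi" mode="replace">
        <valItem ident="water">
          <desc>damage caused by water</desc>
        </valItem>
        <valItem ident="rubbing">
          <desc>damage caused by abrasion of the writing surface</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

Adding a further <valItem> (for instance one for fire) is then a one-line change in the same place as the prose that documents it.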


Fig. 10 The (dated) TEI Roma Web interface: http://roma.tei-c.org/

7 physically arranged), this is suitable for a prose de- The TEI Is Too Small (or Does scription or compiled collation formula rather than Not Have ) a more structured record of the information.18 Both of these are ways in which the TEI is too When someone argues that the TEI Guidelines are ‘small’, not having an element and only too small what they usually mean is that the TEI having a limited element. However, Guidelines do not include a special element whose what makes the TEI different from most other stand- mere name implies that its existence might make ards is that any user may add new elements to it and their particular encoding easier. Occasionally they do so in a manner (using the TEI ODD customiza- might really mean that the TEI does not yet have tion format) which fully integrates it into the TEI elements for a particular form of encoding. In some infrastructure and acts as documentation for how instances, the TEI provides a general solution where their project’s outputs vary from the version of the others might prefer more specific markup. For ex- TEI that they are using. This also acts as a useful ample, while the element for describing method to suggest changes to the TEI by providing manuscripts can also be used for early-printed them a copy of a new customization.19 If a user really books, some are reluctant to do so simply because wants to be part of the TEI of the name and its development for manuscript Guidelines, there is a straightforward, open, commu- cataloguing. Moreover, there is currently no gen- nity-based process for proposing that.20 The figure eral-purpose element for the description below (Fig. 11) demonstrates how to add an entirely of non-manuscript or print objects in as much new element, in this case a . detail.17 While there is a element (for In this case the element docu- describing how the leaves of a manuscript are ments the creation of a element, in

10 Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018 Downloaded from https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqy071/5248221 by Joint Library of the Hellenic and Roman Societies user on 01 February 2019

A world of difference

a new namespace that is not the usual TEI namespace. Any new elements that customizations create should not claim they are in the TEI namespace and should instead provide a new one. This provides a gloss and description, makes it a member of the same TEI classes as the element, and allows only text nodes as its content model.21 Any resulting documentation or schemas generated would allow this element in the same places that elements are allowed.

Fig. 11 Creating an entirely new name element

8 You Cannot Get from TEI to $myPreferredFormat

There are many reasons why someone might want to process TEI P5 XML files into other formats. The TEI Guidelines currently recommend encoding texts in XML, which is an easily processable markup language with libraries for handling it in dozens of programming languages. Other simple scripting languages like XSLT and XQuery exist specifically for transforming and querying XML structures. It is very unlikely to be impossible to get programmatically from the TEI to any other format, though in some cases the cost of doing so may be prohibitive. At time of writing, the TEI Consortium provides best-effort XSLT stylesheets for transformations to or from around forty other formats.22 Many of these conversions are also available via the TEI Consortium’s OxGarage convertor.23 It is impossible for the TEI Consortium to provide conversions to every known format, nor is this desirable. In most cases the target format will be specific to the needs of a particular project, and maintaining such conversions should necessarily be the responsibility of that project (or sub-community) rather than the elected volunteers of the TEI Technical Council, who are busy dealing with bugs and feature requests for the TEI Guidelines and maintaining the infrastructure for the TEI Guidelines and associated software.24

From a project point of view, the important thing with the creation of a data model is ensuring that the granularity of information properly expresses the project’s understanding of the phenomena it is encoding. In some cases, it may be better to encode materials in TEI and convert to a format necessary for a later stage in a workflow, for example for an analysis or publication stage. In other cases, the TEI might not be as useful an encoding format as the desired format, and TEI might simply be exported later for long-term preservation. If the other format fits better with the research being undertaken then the solution is simple: use that format! It is more important to do the research enabled by your creation of digital text than it is to use any particular open international standard; however, reinventing the wheel should always be avoided if possible. If your preferred format is an XML vocabulary, then, if desired, a project could even use the TEI ODD

Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018

customization language to document this vocabulary as the ODD language ‘is not tied to the TEI and can be used to define schemas for any XML language’ (Rahtz and Burnard, 2013). There will always be the possibility of converting from the chosen format to the TEI for long-term preservation. But where the TEI is truly not suitable, then it is more important to use the appropriate technology for that research.

This does not mean that the conversion between formats is unproblematic or always simple. Indeed, it is rare that there is not some complication in moving between any two formats, since they may encode semantics that are significantly different, or information at different levels of granularity. Creating careful conversions is a skill, and the limited resources available to a project will determine how much effort may be put towards this sort of work; up-converting data to add greater richness of markup will also always be more difficult than so-called ‘lossy’ conversions. Such ‘lossy’ conversions may not always preserve distinctions, granularity, or semantics in the output format. This is not always a bad thing. For example, if a piece of research software only takes data in a less-granular format, it is common to convert a copy to that less-rich format on which to run the analysis while keeping the original as the ‘real’ copy of the data. That research software might be used for the publication or analysis of the data, while retaining the richer version of the data for long-term preservation enables other processes to undertake different analyses in the future. In many cases projects use TEI as their base format and convert to a number of different formats for publication and analysis as needed. In other cases, projects store their data in a different format and create an export to TEI to aid interchange. Both of these are completely reasonable ways to use the TEI and help to demonstrate that one usually can get to or from the TEI to other formats.

9 If You Use TEI You Must Learn Other Technologies

One barrier to using the TEI for some is the assumption that, if they create resources following the recommendations of the TEI Guidelines, they must then learn all of the other related technologies for the processing, display, and publication of XML. This is a misunderstanding that either rejects the notion of modern research projects as collaborative enterprises or fails to factor the real economic costs of doing so into a project’s funding model. While there are those digital scholars who will master both the TEI Guidelines and a variety of tools for analysis and querying of TEI resources, there are others who can use technologies to transform and manipulate these resources to a researcher’s specifications with only a minimal knowledge of the TEI itself. It is often better to collaborate with technical developers who have the requisite skills needed to produce interfaces which can help answer the research questions at hand.

While it is true that some of those using TEI also go on to learn other XML technologies, this is because of the power that it gives them for publication and analysis. If an encoder is not involved in those aspects of a project, they do not need to learn other technologies. However, when asked by researchers familiar with the TEI but who do not know any programming languages, I have often recommended that they start with related technologies like XPath, XSLT, and XQuery depending on their needs. Indeed, I would recommend at least a very basic knowledge of XPath for encoders working on large XML text collections, especially if they are proofreading or revising them, because of the power it gives them in locating structures and inconsistencies in the marked-up text. XPath also has the benefit of being a fundamental component of other technologies such as XSLT and XQuery, so the few hours of experimentation with it that will enable someone to become effectively proficient in XPath can also be leveraged if the researcher eventually decides to learn more. That said, there is no requirement for those using TEI to learn these technologies or other software. There are many projects where the editorial acts of encoding texts are kept completely separate from those undertaking the analysis or publication of them.

Increasingly there are also general purpose tools like TEI Boilerplate or CETEIcean for lightweight publication, TEI Publisher built on top of eXist-db, as well as more specific technologies such as the Edition Visualization Technology for those producing that form of critical digital edition.25 The TEI Consortium provides stylesheets for conversion to/from the TEI, as mentioned earlier, and these are also available for use in editors such as the oXygen XML Editor.26 The TEI Archiving, Publishing, and Access Service (TAPAS) is a project that gives TEI users an easy method to publish their TEI files without infrastructure of their own.27

One of the more recent introductions to the TEI Guidelines, the ability to document intended processing models inside a TEI ODD customization file, gives developers a method to generate software based on implementation-agnostic instructions stored in the customization file. The eXist-db TEI Publisher makes good use of this to enable those without development skills to modify the display of their editions through changes to the TEI ODD customization. While it might significantly increase the benefit one can derive from TEI resources, it is not necessary to learn other technologies to make use of the TEI. The modern nature of research collaboration and generalized tool development are at least two reasons why one does not necessarily need to learn other technologies.

10 You Cannot Do Stand-off Markup in XML (or TEI)

This myth displays not only a misunderstanding of XML but also an unfamiliarity with the TEI. While many encoding practices are done using embedded or inline markup (and indeed that was how most XML was originally conceived), it is also possible, and indeed increasingly common, to create resources as a set of interlinked files, often encoding information in a stand-off or out-of-line method.28 Using a variety of techniques, any XML document can point to fragments or even individual characters of other parts of itself or other local or remote documents.

The TEI Guidelines have a variety of elements and attributes specifically for use with stand-off and fragmentary structures (such as the and elements, and the @part attribute).29 Other examples might include the ability to store as out-of-line markup a critical apparatus entry of variant readings (using the <app> element). This element may be stored completely separate from a flatter textual edition which is pointed into from the element.30

One of the major changes in TEI P5 is the move of many attributes to take URI-based values, which means they can point both internally and externally to the TEI document. With the subsequent introduction of the element, encoders are now able to document private URI syntaxes making such pointing attributes even easier to use.

While you can certainly do stand-off markup in the current version of the TEI, there is still room for improvement. Piotr Bański notes some of these limitations of stand-off markup within the TEI framework and has been working to improve this (Bański, 2010). For example, there are significantly advanced proposals for additional elements for embedding stand-off markup and linked data. What is really needed is more and better documentation on using these features of the TEI, and better user-friendly general tools for creating and processing stand-off markup.31 While there is much more that could be said about stand-off and out-of-line markup versus embedded markup, and how the TEI should deal with this better, it should be evident that the TEI Guidelines do provide the ability to handle stand-off markup.

11 XML (and TEI) Cannot Handle Overlapping Hierarchies

This is an often-touted problem by developers seeking to find problems with XML and replace it either with a new technology or sometimes a different old technology with which they are more familiar. When fear of XML technology is the cause of their objection, then this should be treated with scepticism. That XML has difficulty with overlapping hierarchies is not, in itself, strictly a myth. This concern, indeed, pre-dates XML and existed as a problem noted in SGML (Renear et al., 1996). However, it is usually a misunderstanding that assumes that, since there is a limitation in one area, a whole
standard should be abandoned. While it is true that XML as a data structure limits embedded expressions to a single hierarchy, it has a variety of solutions for this built in, such as the use of empty elements as boundary markers for other hierarchies and URI-based pointing for stand-off or out-of-line markup. This is not some secret or new problem: the TEI Guidelines contain a whole chapter on Non-hierarchical Structures which documents some of the more popular solutions.32 Others are embedded throughout the TEI Guidelines, and users of the TEI generally follow Renear’s suggestion that analytical perspectives often determine the hierarchies one uses, and where non-hierarchical perspectives exist these can often be represented through hierarchical sub-perspectives (Renear et al., 1996). For example, while there is often a conflict between the intellectual structures of paragraphs and the physical structure of pages, the TEI Guidelines recommend the <pb> (page beginning) element as an empty element to record page breaks. To some users, not being able to enclose both of these simultaneously in elements which wrap around them with start and end tags appears to be a major limitation, while almost all the projects I have been involved with did not find this a significant problem at all. The concern of overlapping hierarchies in embedded markup may be felt more keenly by markup theorists (DeRose, 2004). Some in the markup community long for a more elegant markup system that does not have this limitation, whereas pragmatists tend to doubt that the proposed solutions will achieve XML’s wide-spread adoption and benefits. If one is alternating between two set hierarchies (for example physical and intellectual), then there are a variety of XSLT stylesheets which can transform overlapping hierarchies such as these to swap to the other hierarchy.33 That in most cases XML only represents overlapping hierarchies in oblique methods proves to be almost no problem for pragmatic projects. Most are happy to prioritize one hierarchy (usually the intellectual) over others (such as the physical) or move to a basic out-of-line markup solution. Those projects for which it is a significant concern are also those likely to be capable of implementing the various mitigating techniques proposed over the years.34 Given solutions such as these, and the ability of the TEI to use stand-off or out-of-line markup, it is reasonable to claim that the TEI can handle overlapping hierarchies.

12 There Are No Tools for the TEI

As the TEI is currently formulated in XML, and there are thousands of tools which understand XML, this claim is obviously nonsensical at one level. However, those who believe this myth often mean something slightly different: they will claim that there are no tools that understand the TEI vocabulary itself and leverage this to produce the output they desire. The many TEI Consortium Stylesheets have already been mentioned along with TEI Publisher, TEI Boilerplate, and CETEIcean, so clearly this myth is also false. This is not to say that more tool development would not be beneficial.

Figure 12 shows a screenshot of a fragment of the TEI Consortium Wiki listing tools placed in the ‘Tools’ category. Not all of these tools are for use with only TEI files, nor is this in any way an exhaustive list of tools. More accurately, these are the tools that people have chosen to list on the TEI Consortium Wiki and so are a very small subset of what is actually available. Most tools that projects create are for their own bespoke purposes rather than generalized tools for any TEI document. Sadly, even many of these are not openly released. There are both open source and commercial XML editors, such as the oXygen XML Editor, which have built-in support for the TEI.

Fig. 12 A snapshot of the Tools category from the TEI Consortium wiki.44

13 Interoperability Is Impossible with the TEI

This misunderstanding has sometimes been given as a justification of why a particular project might prefer to use its own bespoke format rather than a better known format like the TEI. The idea suggested is that as TEI use is so variable, it is near impossible for resources from different TEI customizations to interoperate with each other, thus they might as well reinvent the wheel. The TEI Guidelines are different from many static standards because they accept the inevitable truth that projects using them need the ability to customize, constrain, and extend the standard they are using. A conflict will always exist between the expressive freedom the TEI gives any individual project and the notion of seamless interoperability with others (Bauman, 2011). The TEI Guidelines provide a mechanism (the TEI ODD customization format discussed earlier) for documenting any local project changes to the standard. The more documentation (human and machine-readable) is provided, the more likely any barriers to interchange and interoperability can be overcome. The existence of TEI customization does mean that two TEI projects might be using very different views of the TEI framework, and this inevitably leads to a tension between the freedom of customization and the fragmentation of the standard within the community (Cummings, 2008). However, that does not imply that interoperability (or at least interchange) is impossible, merely that it can be difficult depending on the degree of differences between the schemas. Luckily, these differences should be recorded in the TEI ODD customization which can be programmatically compared (Burnard and Rahtz, 2004). The TEI Guidelines also include a notion of ‘TEI Conformance’ that one might think would help with interoperability.35 While these rules do help in setting some basic ground rules for what constitutes a TEI document, the real challenge for interoperability is the variation between completely valid and conformant TEI documents.

However, seamless interoperability is not what the TEI Guidelines are meant to achieve, rather the possibility for the interchange of texts (Unsworth, 2011). The difference between interchange, one of the original goals of the TEI, and interoperability mostly relies on the degree of human intervention required in the mediation between formats. The degree of interoperability needed between two resources also depends on whether this is bidirectional interoperability (both resources using each other’s materials) or more commonly unidirectional. In my view the latter really becomes a form of mediated interchange regardless of whether that is ‘negotiated’ or ‘blind’.36 Elsewhere I have argued that the notion that unmediated interoperability between such varied resources should be automatic is a deluded fantasy, as there will always be a necessary step of mediation between the resources (Cummings, 2014). This means someone (or some software program) needs to understand the differing formats, perhaps through analysis of the TEI ODD customization. The solution to the problem is proper (machine-readable) documentation, which expresses the differences between each of the projects and the full TEI. While programmatically reading these two customization files will identify where the differences are, there is always a human element in determining how to handle them. When trying to interchange resources, projects often find it useful to downgrade their richer local copies to common TEI subsets (such as TEI Lite or TEI simplePrint). This also has many other benefits. As Martin Holmes notes in reference to downsampling their materials to less-rich TEI formats, this ‘constitutes what we might call “enacted documentation”’ forcing them to reflect on the complexities of their encoding choices (Holmes, 2017). Holmes has also created a ‘CodeSharing API’ which gives an interoperable method for sampling the encoding practices of a project using it.37

Interoperability will always remain a problem for any standard with a significant degree of expressivity in the creation of document instances. However, it is much easier to convert between the outputs of two well-documented projects (with both prose and schema specifications in a TEI ODD) than it is between documents from projects following their own bespoke markup systems. Experience shows that the benefit of a shared adherence to a general underlying framework, even when using it to different ends, far outweighs the difficulty in disentangling bespoke systems of encoding.

14 The TEI Is Only for Digital Editions, I Am Doing $otherThing

One of the popular uses for the recommendations of the TEI Guidelines is for the production of digital editions. However, it is also used for the creation of many other resources. For example, while the
recommendations in Chapter 10 ‘Manuscript Description’ are useful when properly describing a manuscript as metadata for a digital edition, they are also used by libraries for fully detailed manuscript catalogues.38 There are also modules in the TEI for dictionaries, linguistic corpora, and graphs, networks, and trees.

When creating output from a TEI-encoded file, there is a natural assumption by some that there is a one-to-one relationship between this file and a Web-based ‘digital edition’ as output. However, that is a problematic way in which to view and undertake TEI encoding (Turska, Cummings, and Rahtz, 2016). If one is using the recommendations of the TEI properly, then it is possible to create so much more than a single edition view of this output. One can generate not only multiple editions but also such outputs as supplementary files, indices, databases, interactive visualizations, glossaries, and camera-ready print copy, amongst many other possibilities. TEI is used for creating born-digital ancillary resources as well, such as bibliographies, working papers, meeting minutes, and slides for lectures, as well as other teaching materials. Some research projects create large TEI resources which are never intended for a one-to-one Web publication, but as text-bases for querying and analysis.

However, if there is a good, open, international standard for doing ‘$otherThing’ (whatever that might be), then it may be a better idea to use that standard if TEI does not cater for a project’s specific needs, but it is vital to make sure that really is the case by consulting the TEI Guidelines and its community. Even though the TEI Guidelines are for many more things than digital editions, the choice of standards to follow should be based on using the appropriate formats for that particular research, in full knowledge of the benefits and drawbacks of the possible formats.

15 The TEI Is Only for Anglophone or Western Works

The TEI Guidelines strive to be applicable to encoding any form of text, from any time period, in any language and writing system. While the TEI has indubitably arisen from a western context, the community strives to broaden the scope of its examples, to extend the coverage of various forms of non-western textual phenomena, to improve its internationalization and localization mechanisms, and to deliberately expand the diversity of the community itself.39 Those encoding documents following the TEI Guidelines use Unicode to represent characters in digital form. However, not all characters for all human languages for all times are already available in Unicode, so the TEI Guidelines also have built-in recommendations for describing non-Unicode characters as well as recording scribal glyph-variants. The TEI has mechanisms for internationalization and localization which enable both the TEI Guidelines and individual project customizations to provide descriptions of their markup in any desired language. The TEI Guidelines are written in English and have generally not been translated as a whole, but for a selection of languages where volunteers have undertaken the work, the glosses and descriptions of elements, attributes, and attribute values have been translated. This means that these can be displayed within the conventional outputs of the TEI Guidelines (Web pages, , , etc.), or even from the generated schemas as tooltips or pop-ups in some XML editors, in the encoder’s preferred languages. Currently, many of the glosses and descriptions have been translated into Chinese (Taiwanese), French, German, Italian, Japanese, Korean, and Spanish. While the TEI has the infrastructure in its TEI ODD customization to enable users to rename objects like elements themselves, users report that encoders who are not fluent in English are not resistant to the (mostly English-derived) element names as long as the descriptions of them are in their preferred language.

Although these mechanisms are a good start, there is always room for improvement, but this requires volunteers fluent in the language willing to donate a significant amount of time for translation. The figure below (Fig. 13) shows the reference page for the element in Chinese, demonstrating the kind of benefits given to those encoding in a different language.


Fig. 13 The TEI Guidelines reference page for the element in Chinese.45
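The localization mechanism described in this section, where glosses and descriptions carry a language so that documentation can be shown in an encoder’s preferred language, can be illustrated with a minimal Python sketch. It rests on several assumptions: the `<elementSpec>` fragment is hypothetical and simplified, its English and French wording was written for this illustration rather than quoted from the TEI Guidelines, and the helper name `localized` is invented here. Only the TEI namespace URI and the standard `xml:lang` attribute are taken as given.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified specification fragment. The wording of the
# glosses and descriptions is illustrative only, not the official text.
SPEC = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="pb">
  <gloss xml:lang="en">page beginning</gloss>
  <gloss xml:lang="fr">début de page</gloss>
  <desc xml:lang="en">marks the start of a new page.</desc>
  <desc xml:lang="fr">marque le début d'une nouvelle page.</desc>
</elementSpec>
"""


def localized(spec, tag, preferred, fallback="en"):
    """Return the text of the child `tag` in the preferred language,
    falling back to `fallback` when no translation exists."""
    by_lang = {
        child.get(XML_LANG): (child.text or "").strip()
        for child in spec.findall(f"{{{TEI_NS}}}{tag}")
    }
    return by_lang.get(preferred, by_lang.get(fallback))


spec = ET.fromstring(SPEC)
french_gloss = localized(spec, "gloss", "fr")
# No Korean text exists in this fragment, so the English one is used.
korean_desc = localized(spec, "desc", "ko")
```

The fallback to English mirrors the situation described above, in which only some languages have volunteer translations; a real tool would more likely work from the published TEI source than from a hand-written fragment like this one.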

16 Conclusion

16.1 Why do these myths exist?

Some of these myths and misconceptions have grains of truth in them. If it seems as though I have skimmed over the complexities of some problems, this is merely because of the number of myths examined. Although the TEI Guidelines cover a large number of textual phenomena in general ways, there are ways to control the size and specificity of the scheme for any project. While XML does not encode multiple hierarchies when used for embedded markup, there are built-in methods for dealing with these, stand-off markup options, and scripts for alternating between hierarchies. Although there is truth in these myths, it is also possible that there are other reasons why these misconceptions exist.

When it started as an academic project, the Text Encoding Initiative was not part of the mainstream, but now, given its near ubiquitous acceptance in Digital Humanities for digital textual studies, it can truly be considered so. It is not a ragtag group of rebels but has become the establishment which people naturally want to challenge.40 Another factor leading to misconceptions may be that the TEI Guidelines are often taught in very intensive workshops: this means that people are required to ingest large amounts of information and new concepts in a very short period of time. It is therefore unsurprising that some misunderstandings might occur with such necessarily compressed teaching methods. As part of that teaching, it is sometimes easier to focus on the facts as instantiated by rules and precepts (e.g. ‘each TEI document must have a with a inside’) rather than explaining the underlying concepts of why something is being encoded in this manner. Ideally TEI pedagogy would have the leisure to explain both the precepts (the ‘how’) and the underlying concepts (the ‘why’). The best teachers I have known do this as much as possible. Similarly, misunderstandings occur because encoders have only read part of the TEI Guidelines (say one or two chapters) and so are unaware of some of the overall infrastructure or possibilities instantiated by other parts of the recommendations. Finally there is the, thankfully rare, wilful misunderstanding for more strategic reasons (such as promoting one’s own solutions to the supposed strawmen problems).

16.2 What can we do about these myths?

Combating ignorance of the TEI Guidelines is not an easy task, and clearly more and better teaching materials should improve this. The open training materials already produced by the TEI community go a long way to rectifying this, but perhaps more of these should focus on the why of text encoding in conjunction with the how. Since TEI pedagogy is often compressed into intensive training followed by practical application for a very specific research project, it might be helpful to encourage longer term TEI learning, through creating more self-tuition materials and building teaching into other longer courses. It might also be useful to provide a greater number of shorter, easily digested, introductions to the TEI, accompanied perhaps with surveys of types of encoding and larger full-worked examples demonstrating the whole project arc from encoding to publication and analysis. The TEI By Example project was a good idea and is a reasonable place for those wishing to teach themselves the TEI to start to get basic introductions to a variety of TEI encoding tasks. However, at almost a decade old, and as the TEI continues to develop, TEI By Example becomes increasingly out of date.41 This is not to say that the TEI Guidelines themselves could not make better use of their own examples, building a corpus of more fully worked detailed encoding samples.42 Another approach would be the encouragement of new users to discuss more, and more freely, on TEI-L (the TEI community’s discussion list), so that misunderstandings can be openly rectified in a spirit of cooperation. However, encouraging that kind of open participation by users new to a domain of knowledge, like the TEI Guidelines, is always difficult. The TEI is a community, with elected volunteers from that community helping to steer it, not a priesthood with special gurus.43 It is an important aspect of the TEI that anyone can change it, not only in their own local customizations but through participation in the community, submitting feature requests, training others to understand it, or helping to dispel myths and misconceptions in public forums.

References

Bański, P. (2010). Why TEI stand-off annotation doesn’t quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference 2010. https://www.balisage.net/Proceedings//vol5/html/Banski01/BalisageVol5-Banski01.html.

Bauman, S. (2011). Interchange vs. interoperability. In Proceedings of Balisage: The Markup Conference 2011. http://www.balisage.net/Proceedings/vol7/html/Bauman01/BalisageVol7-Bauman01.html.

Burnard, L. and Rahtz, S. P. Q. (2004). RelaxNG with Son of ODD. In Proceedings of Extreme Markup Languages 2004 Conference. http://conferences.idealliance.org/extreme/html/2004/Burnard01/EML2004Burnard01.html.

Burnard, L. (2013). Resolving the Durand Conundrum. Journal of the Text Encoding Initiative, 6. http://jtei.revues.org/842.

Cummings, J. (2008). The text encoding initiative and the study of literature. In Schreibman, S. and Siemens, R. (eds), A Companion to Digital Literary Studies. Oxford: Blackwell. http://www.digitalhumanities.org/companionDLS/.

Cummings, J. (2014). The compromises and flexibility of TEI Customisation. In Mills, C., Pidd, M. and Ward, E. (eds), Proceedings of the Digital Humanities Congress 2012. Studies in the Digital Humanities. Sheffield: HRI Online Publications. https://www.dhi.ac.uk/openbook/chapter/dhc2012-cummings.

DeRose, S. (2004). Markup overlap: a review and a horse. In Proceedings of the Extreme Markup Languages 2004 Conference. http://xml.coverpages.org/DeRoseEML2004.pdf.

Holmes, M. (2017). Whatever happened to interchange?. Digital Scholarship in the Humanities, 32(Suppl 1): i63–8. https://doi.org/10.1093/llc/fqw048.

Rahtz, S. P. Q. and Burnard, L. (2013). Reviewing the TEI ODD System. In Proceedings of the 2013 ACM Symposium on Document Engineering, DocEng ’13. ACM. http://doi.acm.org/10.1145/2494266.2494321.

Renear, A., Mylonas, E., and Durand, D. (1996). Refining our notion of what text really is: The problem of overlapping hierarchies. In Hockey, S. and Ide, N. (eds), Research in Humanities Computing 4: Selected Papers from the ALLC/ACH Conference, Christ Church, Oxford, April 1992, pp. 263–80. See also an earlier (1993) version of this at http://cds.library.brown.edu/resources/stg/monographs/ohco.html.

Turska, M., Cummings, J., and Rahtz, S. P. Q. (2016). Challenging the myth of presentation in digital editions. Journal of the Text Encoding Initiative.

Unsworth, J. (2011). Computational work with very large text collections. Journal of the Text Encoding Initiative.

Notes

1 This article is based on a paper given at the Digital Humanities 2017 conference in Montreal, Canada. The original slides that accompanied the article are available at https://slides.com/jamescummings/tei-myths. From the first release of the TEI P5 Guidelines (28 October 2007), the TEI Consortium moved away from having major releases every few years to a system of rolling version releases every 6 months. At time of writing, the current version of the TEI Guidelines is TEI P5 3.4.0. The experiences of other members of the TEI Technical Council and TEI Community at large over many years have been invaluable to this article.

2 I discuss this frustrating temptation of the ‘religious’ zealots who often argue that we should latch on to their preferred new technology, dismiss the old, and ignore any costs of doing so in a blog post at: https://faqingperplxd.wordpress.com/2015/05/14/childish-toys/.

3 The number of elements the TEI Guidelines currently include is available on the element reference page from version 3.2.0 onwards, http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/en/html/REF-ELEMENTS.html.

4 TEI Guidelines, Chapter 23: ‘Using the TEI’, Section 23.3 ‘Customization’ http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/en/html/USE.html#MD.

5 See the latest EpiDoc Guidelines at http://www.stoa.org/epidoc/gl/latest/.

markup system for classical epigraphical documents but have since been expanded to encompass other ancient documents with similar textual phenomena and challenges (such as papyri).

6 Document Type Definitions (DTDs) should be considered deprecated in my opinion, since they are no longer fit for purpose with modern XML technologies (e.g. with multi-namespaced documents).

7 There is another approach not mentioned here whereby a TEI ODD customization file may include elements directly from a module while not including the module itself. However, this can be dangerous, since the element will not necessarily benefit from its membership in, for example, attribute classes that are locally defined in the unreferenced module.

8 The TEI ODD XML snippets in the figures here make the assumption that they are nested inside a <schemaSpec> element as part of a well-formed and valid TEI ODD customization.

9 Similar mechanisms, some discussed further below, exist for customizing elements themselves, or indeed the classes and modules of which they are a part.

10 As with all the examples given, this is a real but anonymized example with the name of the desired element changed to protect the naive researcher. I do not feel naming and shaming is in the best interests of the community.

11 Another answer is to extend the TEI through use of customization as described later, but this should not be preferred when there is a satisfactory element that already exists, such as in this case.

12 TEI Guidelines, ‘<notatedMusic> reference page’, http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/en/html/ref-notatedMusic.html.

13 The literate programming methodology is based on the work of Donald Knuth in which human-readable program documentation and code are interspersed and used as a source for both documentation and compiled source code. In the case of TEI customization, the source documentation may be mixed with schema specifications, from which documentation, schema specifications, and even processing model pipelines, may be generated.

14 In both these cases, the TEI ODD customization has @xml:lang and @versionDate attributes on these elements. The first of these specifies the language of the content (to aid the TEI and others in their desires to internationalize) and the second a date at which this was last changed or updated. These attributes are useful for those wanting to track when particular
The experiences 12 TEI Guidelines, ‘<notatedMusic> reference page’, of other members of the TEI Technical Council and http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/ TEI Community at large over many years have been en/html/ref-notatedMusic.html. invaluable to this article. 13 The <a href="/tags/Literate_programming/" rel="tag">literate programming</a> methodology is based on 2 I discuss this frustrating temptation of the ‘religious’ the work on Donald Knuth in which human-readable zealots who often argue that we should latch on to program documentation and code are interspersed their preferred new technology, dismiss the old, and and used as a source for both documentation and ignore any costs of doing so in a blog post at: https:// compiled source code. In the case of TEI customiza- faqingperplxd.wordpress.com/2015/05/14/childish- tion, the source documentation may be mixed with toys/. schema specifications, from which documentation, 3 The number of elements the TEI Guidelines currently schema specifications, and even processing model include is available on the element reference page pipelines, may be generated. from version 3.2.0 onwards, http://www.tei-c.org/ 14 In both these cases, the TEI ODD customization has Vault/P5/3.4.0/doc/tei-p5-doc/en/html/REF- @xml:lang and @versionDate attributes on these elem- ELEMENTS.html. ents. The first of these specifies the language of the 4 TEI Guidelines, Chapter 23: ‘Using the TEI’, Section content (to aid the TEI and others in their desires to 23.3 ‘Customization’ http://www.tei-c.org/Vault/P5/3. internationalize) and the second a date at which this 4.0/doc/tei-p5-doc/en/html/USE.html#MD. was last changed or updated. These attributes are 5 See the latest EpiDoc Guidelines at http://www.stoa. useful for those wanting to track when particular org/epidoc/gl/latest/. 
The EpiDoc Guidelines are a changes were made which is useful when providing pure TEI subset which were originally conceived as a translations of content in more than one language.</p><p>20 Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018 Downloaded from https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqy071/5248221 by Joint Library of the Hellenic and Roman Societies user on 01 February 2019</p><p>A world of difference</p><p>15 The original class memberships of the <name> elem- maintenance or support compared with those neces- ent, shown here for completeness, are commented out. sary for TEI Consortium outputs. 16 Raffaele Viglianti is creating a new customization tool 25 See http://teiboilerplate.org, http://teic.github.io/ (currently referred to as RomaJS) on behalf of the TEI CETEIcean/, http://teipublisher.com and http://visua Technical Council and TEI community. The new tool lizationtechnology.wordpress.com/ for more informa- is undergoing testing and will be provided for open tion about these open source software tools. testing in due course from the TEI Consortium web- 26 See http://www.oxygenxml.com/ for information site http://www.tei-c.org/. about this editor. The oxygen-tei package is provided 17 Though work has been ongoing on the creation of an by the TEI Consortium; see https://github.com/TEIC/ element for describing objects for many years, see oxygen-tei. https://github.com/TEIC/TEI/issues/327 for more 27 See http://tapasproject.org for more information information. about TAPAS. 18 See for example Dot Porter’s VisColl tool https:// 28 I choose to differentiate here between ‘stand-off github.com/leoba/VisColl which provides a more markup’ where annotations are stored separately and structured approach to collation. 
often used to create new structures that cross hierar- 19 The elected TEI Technical Council considers all re- chies from a number of existing annotations (for quests for changes to the TEI Guidelines submitted example with the <join/> element) and a simpler as issues to the GitHub repository https://github. ‘out-of-line’ markup where additional annotation com/teic/tei. Having a working customization is not (whether part of a conflicting hierarchy or not) is a requirement but can speed up the process. stored separately and points to a single location in 20 If really arguing for this new element, one would usu- the original. ally provide a number of examples of how it was to be 29 See http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5- used and why it was needed, as well as countering doc/en/html/SA.html#SAAG for more information obvious queries such as why existing mechanisms on the aggregation of fragmentary structures. like using <name type¼"special"> were not sufficient. 30 More detailed discussion of this approach is available 21 This uses a pure TEI content model, in some earlier in the TEI Guidelines, Chapter 12 ‘Critical Apparatus’, versions of the TEI these were formulated in RELAX Section 12.2.4 ‘Other Linking Methods’ http://www. NG. For information on the move away from RELAX tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/en/html/TC. NG content models, see Burnard, 2013. html#TCAPLN. 22 Formats currently supported include bibtex, cocoa, 31 There do exist some generalized tools for working in csv, <a href="/tags/DocBook/" rel="tag">docbook</a>, docx, dtd, epub, html(5), <a href="/tags/XSL/" rel="tag">xsl</a>-fo, json, an out-of-line method, see for example Raffaele InDesign, <a href="/tags/LaTeX/" rel="tag">latex</a>, <a href="/tags/Markdown/" rel="tag">markdown</a>, mediawiki, nlm, odd, pdf, Viglianti’s Core builder http://raffazizzi.github.io/ rdf, relaxng, slides, txt, wordpress, xlsx, xsd, and coreBuilder/. others. 
While many of these are legacy conversions 32 See the chapter of the TEI Guidelines on Non-hier- created for particular applications with a number of archical Structures at http://www.tei-c.org/Vault/P5/3. built-in assumptions, they often provide basic conver- 4.0/doc/tei-p5-doc/en/html/NH.html for more con- sions or examples for further customization. sideration of strategies for handling overlap in a TEI 23 The OxGarage conversion framework provides a fron- context. tend and RESTful Web API to the TEI Consortium’s 33 A simple demonstration of this is visible in https:// XSLT stylesheets and enables the pipelining of mul- github.com/TEIC/Stylesheets/blob/dev/tools/process tiple conversions going through a number of different pb.xsl which creates files with a non-TEI <page> ele- formats. In general, it uses TEI as a pivot format where ments fragmenting any other hierarchies based on feasible. It is used by other services like Roma (and the <pb/> elements in the source document. new RomaJS). See http://oxgarage.tei-c.org/. 34 There has been much (digital) ink spilt over solutions 24 The TEI Technical Council maintains the TEI to the overlap problem over the years, see also https:// Guidelines, the transformations necessary to produce www.balisage.net/Proceedings/topics/Concurrent_Ma various Web versions of them, associated software, rkup�Overlap.html. build infrastructure, and various GitHub repositories 35 See http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5- for all these. Additional stylesheet conversions can be doc/en/html/USE.html#CF for more information provided to the TEI Technical Council for inclusion in about TEI Conformance. The term is most often their offerings but will be a very low priority for abused in research funding proposals.</p><p>Digital Scholarship in the Humanities, Vol. 0, No. 
0, 2018 21 Downloaded from https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqy071/5248221 by Joint Library of the Hellenic and Roman Societies user on 01 February 2019</p><p>J. Cummings</p><p>36 Syd Bauman’s definitions (see Bauman, 2011) are 40 That the TEI Community won ADHO’s Antonio useful starting points, and in distinguishing between Zampolli Award is a particularly fitting sign of this, ‘negotiated’ and ‘blind’ interchange he raises the prob- given Zampolli’s involvement as one of the founders lem of what forms of mediation are necessary and on of the TEI. whose part. Bauman sees interchange as ‘negotiated’ 41 This is not to cast any blame on the TEI By Example (requiring human interaction) or ‘blind’ (only need- project, it is just a necessary occurrence for any such ing documentation). His hope is that we will support endeavour funded as a single project rather than an ‘blind interchange’ instead of ‘mindless interoperabil- ongoing service. For more information about TEI By ity’, since this enables a balance between expressivity Example, see http://teibyexample.org. and adherence to standards. 42 I have suggested as much in a paper at the TEI 2018 37 Martin Holmes’ ‘CodeSharing API’ is also a step in the conference: ‘ ‘‘Examples work more forcibly on the right direction. It was used to expose the TEI created mind than precepts’’—Expanding and improving the by the Map of Early Modern London project. See use of examples by the TEI Guidelines’. https://github.com/martindholmes/CodeSharing for 43 Indeed, the views of the TEI I have presented here are more information. my own and not necessarily representative of the TEI 38 See for example the manuscript catalogues of the Consortium, and certainly not the TEI community, as Bodleian Libraries which use a common consolidated a whole. TEI ODD customization stored at https://github.com/ 44 See https://wiki.tei-c.org/ for the TEI Consortium wiki bodleian/consolidated-tei-schema/. 
Several catalogues or https://wiki.tei-c.org/index.php/Category:Tools for following this one schema are presented separately, this ‘Tools’ category. 45 See http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5- e.g. https://medieval.bodleian.ox.ac.uk/. doc/zh-TW/html/ref-abbr.html for this page. Note 39 See for example the recent creation of the East Asian that the values remain as English tokens, but the TEI Special Interest Group: http://www.tei-c.org/ glosses and some website architecture appear as Activities/SIG/EastAsian/. Chinese.</p><p>22 Digital Scholarship in the Humanities, Vol. 0, No. 0, 2018</p> </div> </article> </div> </div> </div> <script type="text/javascript" async crossorigin="anonymous" src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8519364510543070"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.1/jquery.min.js" crossorigin="anonymous" referrerpolicy="no-referrer"></script> <script> var docId = '71ef265a849443c0bbf59c280e10298e'; var endPage = 1; var totalPage = 22; var pfLoading = false; window.addEventListener('scroll', function () { if (pfLoading) return; var $now = $('.article-imgview .pf').eq(endPage - 1); if (document.documentElement.scrollTop + $(window).height() > $now.offset().top) { pfLoading = true; endPage++; if (endPage > totalPage) return; var imgEle = new Image(); var imgsrc = "//data.docslib.org/img/71ef265a849443c0bbf59c280e10298e-" + endPage + (endPage > 3 ? 
".jpg" : ".webp"); imgEle.src = imgsrc; var $imgLoad = $('<div class="pf" id="pf' + endPage + '"><img src="/loading.gif"></div>'); $('.article-imgview').append($imgLoad); imgEle.addEventListener('load', function () { $imgLoad.find('img').attr('src', imgsrc); pfLoading = false }); if (endPage < 7) { adcall('pf' + endPage); } } }, { passive: true }); </script> <script> var sc_project = 11552861; var sc_invisible = 1; var sc_security = "b956b151"; </script> <script src="https://www.statcounter.com/counter/counter.js" async></script> </html><script data-cfasync="false" src="/cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js"></script>