The Text Encoding Initiative: Flexible and Extensible Document Encoding


The Text Encoding Initiative: Flexible and Extensible Document Encoding
David T. Barnard and Nancy M. Ide
Technical Report 96-396, ISSN 0836-0227-96-396
Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada K7L 3N6
December 1995

Abstract

The Text Encoding Initiative is an international collaboration aimed at producing a common encoding scheme for complex texts. The diversity of the texts used by members of the communities served by the project led to a large specification, but the specification is structured to facilitate understanding and use. The requirement for generality is sometimes in tension with the requirement to handle specialized text types. The texts that are encoded often can be viewed or interpreted in several different ways. While many electronic documents can be encoded in very simple ways, some documents and some users will tax the limits of any fixed scheme, so a flexible, extensible encoding is required to support research and to facilitate the reuse of texts.

1 Introduction

Computerized processing of structured documents is an important theme in information science [1, 13]. To take full advantage of progress in this field it is necessary to have convenient electronic representations of structured documents. The publication of Guidelines for Electronic Text Encoding and Interchange [14] by the Text Encoding Initiative (TEI) [16] was the result of several years of collaborative effort by members of the humanities and linguistics research and academic community in Europe, North America and Asia to derive a common encoding scheme for complex textual structures [11].

The TEI was conceived at a meeting held in 1987 at Vassar College in Poughkeepsie, New York. The Vassar meeting drew together researchers involved in the collection, organization, and analysis of electronic texts for linguistic and humanistic research. Some of those present were involved in research programs that included the production of large corpora of texts, while others had written innovative pieces of software to process electronic texts in various ways. There was a shared view that the enormous variety of mutually incomprehensible encoding schemes in use was a hindrance to research. To facilitate the interchange and reuse of texts, and to give guidance to those creating new electronic texts, some new scheme was required that could synthesize the best ideas and aspects of existing schemes.

The Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing jointly sponsored a project to devise a new encoding scheme. Funding was obtained from the National Endowment for the Humanities (USA), Directorate XIII of the Commission of the European Communities, the Social Science and Humanities Research Council of Canada, the Andrew W. Mellon Foundation, and the various organizations (companies, government agencies, universities and institutes) where the participants worked. The work of the project was carried out by committees dealing with specialized topics, and coordinated by a Steering Committee, a technical review committee, and two editors. Information about the Initiative appears primarily in the Guidelines [14] and on the TEI's World Wide Web page [16].
The first three numbers of the 1995 volume of Computers and the Humanities deal with various aspects of the TEI; these have been reprinted in monograph form [12].

The new encoding scheme was required to:
- adequately represent all the textual features needed for research,
- be simple, clear and concrete,
- be easy for researchers to use without special-purpose software,
- allow the rigorous definition and efficient processing of texts,
- provide for user-defined extensions, and
- conform to existing and emerging standards.

The Standard Generalized Markup Language (SGML) [9, 10, 7] was a relatively new standard when the project began. Although SGML was thus not in wide use within the communities served by the project, SGML was chosen as the basis for the new encoding scheme. The TEI encoding scheme is a complex application of SGML. To promote acceptability of the scheme, it was necessary to ensure that:
- a common core of textual features could be easily shared,
- additional specialist features could be easily added,
- multiple parallel encodings of the same feature were possible,
- users could decide how rich the markup in a document would be, with a very small minimal requirement, and
- there should be adequate documentation of the text and the markup of any electronic text.

Here is a simple document encoded using the TEI scheme.

<!DOCTYPE tei.2 system 'tei2.dtd' [
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY english.wsd system 'teien.wsd' SUBDOC>
]>
<tei.2>
<teiHeader>
<fileDesc>
<titleStmt><title>Short document.</title>
<publicationStmt><p>Unpublished.</p></publicationStmt>
<sourceDesc><p>Electronic form is original.</p></sourceDesc>
</fileDesc>
<profileDesc>
<langUsage><language id=eng wsd=english.wsd usage='100'></langUsage>
</profileDesc>
</teiHeader>
<text>
<body>
<p>A very short TEI document.</p>
</body>
</text>
</tei.2>

The first line of this document identifies the document type by pointing to an external definition. The next two lines (contained within the square brackets on lines 1 and 4) contain definitions that override those in the external definition; their purpose here is to include the definitions of the tag set for prose documents, and to indicate that the document being encoded is written in English. From line 5 through to the end of the document, there are two parts. The teiHeader contains information describing the electronic document; this is akin to an extended bibliographic description. The text contains the encoded text itself; in this case, there is a single line of text. Realistic documents have these same constituent parts: a reference to the TEI definitions, some local selections of parts of those definitions, a header, and the text itself.
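To make these constituent parts a little more concrete, here is a sketch (not taken from the report) of what a slightly richer text body might look like. It uses a handful of the core elements described in the next section (nested divisions, paragraphs, emphasis, quotations, lists, notes); the document type declaration and header would be the same as in the example above.

<text>
<body>
<div1 type='section'><head>A short section</head>
<p>A paragraph with <emph>emphasized</emph> text and a
quotation: <q>a flexible, extensible encoding</q>.</p>
<p>A second paragraph with a note<note place='foot'>The note text.</note>
and a short list:</p>
<list>
<item>the first item</item>
<item>the second item</item>
</list>
</div1>
</body>
</text>

Because only the prose base was selected in the document type declaration, a validating SGML parser would accept these core tags but reject tags from tag sets that were not included.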
The remainder of this paper addresses three aspects of the TEI Guidelines. In the next section we describe the structure of the definitions, a flexible collection whose parts can be selected as required for a document. We then discuss the tension between generality and specificity in this definition that is intended to serve well for many document classes. The last aspect we address is the need to represent multiple views of a single electronic text.

2 The Structure of the TEI Specifications

From the beginning of the project it was clear that the specification produced by the TEI would be very large. It was also clear that most documents would only make use of some parts of the specification. It was thus necessary to structure the specifications in such a way that users could select those parts that apply to a specific document. The mechanism for making this selection should be easy to understand and use, and it should not require any formalism or software beyond what is provided in SGML.

The final specification arranges features to be encoded into several categories. Each set of features is encoded using a set of SGML tags. The DTD is put together in such a way that sets of tags can be included in the DTD or excluded from it, and thus the tags are allowed in a document or prohibited, respectively [15].

These are the core features that appear in all documents:
- provision for character sets to be used in documents
- tags for specifying the TEI header, including the file description and encoding description
- tags for specifying basic elements, including paragraphs, emphasis, quotations, lists, notes, simple pointers, references
- simple text structure, including front and back matter and possibly nested divisions

Two sets of tags cover the core features. The first contains the tags for specifying the TEI header. The second contains tags for basic elements like paragraphs. The defining aspect of the core is that it appears in all bases.

There is a base tag set for each major document category that can be encoded. These are:
- prose: contains only the core features
- verse: structure within lines, rhyme, metrical structure
- drama: cast lists, performances, stage directions
- transcriptions of speech: utterances, pauses, temporal information
- print dictionaries: structure of entries, grammatical and typographical information
- terminological databases: terms and definitions

Each document explicitly selects one of these base tag sets. Each of these bases includes the core features. Some documents are more complex; they contain material that is from more than one of these bases. For example, a critical essay that included lengthy quotations from poems under discussion would require both prose and verse structures. There are mechanisms for combining base tag sets in a single document; the details can be found in the Guidelines.

Tag sets for additional features can be used with any of the base types. These are included as necessary, depending on the information to be represented in the encoded text. The additional tag sets include:
- linking, segmentation and alignment: extended pointers, segments, anchors
- simple analytic mechanisms: linguistic segments, spans of text
- feature structures: designed for linguistic analysis but useful for recording any values of any named features detected in the text
- certainty and responsibility: who tagged this and how certain is the encoding?
- transcription of primary sources: deletions, illegibility
- critical apparatus: sources for the text, attaching witnesses to spans of text
- names and dates
- graphs, networks and trees: encoding data structures to represent information about text
- tables, formulae and graphics
- language corpora: collections of text, sources

For example, to encode a technical paper, one would choose the prose base.
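As a sketch (not taken from the report) of how such a selection might look, the document type declaration for a technical paper could include the prose base together with some additional tag sets, for example the set for tables, formulae and graphics and the set for linking. The parameter-entity names below follow the TEI.prose pattern used in the earlier example; the exact name of each tag set's entity is given in the Guidelines.

<!DOCTYPE tei.2 system 'tei2.dtd' [
<!ENTITY % TEI.prose   'INCLUDE'>
<!ENTITY % TEI.figures 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
]>

Declaring a tag set as 'INCLUDE' makes its element and attribute declarations part of the effective DTD, so the corresponding tags become legal in the document; tag sets that are not selected remain excluded and their tags are prohibited.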
Recommended publications
  • Transformation Frameworks and Their Relevance in Universal Design
    Universal Access in the Information Society manuscript No. (will be inserted by the editor)
    Transformation frameworks and their relevance in universal design
    Silas S. Brown and Peter Robinson, University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK. e-mail: fSilas.Brown,[email protected]
    Received: date / Revised version: date. Category: Long Paper
    Key words: notations, transformation, conversion, education, tools, 4DML
    Abstract. Music, engineering, mathematics, and many other disciplines have established notations for writing their documents. Adjusting these notations can contribute to universal access by helping to address access difficulties such as disabilities, cultural backgrounds, or restrictive hardware. Tools that support the programming of such transformations can also assist by allowing the creation of new notations on demand, which is an under-explored option in the relief of educational difficulties. This paper reviews some programming tools that can be used to effect such transformations. It also introduces a tool, called “4DML”, that allows the programmer to
    – Some algorithms can be simplified if the data is first transformed into a convenient structure. Many algorithms can be regarded as transformations in their own right.
    – Transformation can be important when presenting data to the user and when accepting user input.
    This last point, namely the importance of transformation in user interaction, is relevant to universal design and will be elaborated here.
    1.1 Transformation in universal design. Universal design aims to develop “technologies which are accessible and usable by all citizens. thus avoiding the need for a posteriori adaptations or specialised design” [37].
  • The Text Encoding Initiative Nancy Ide
    Encoding standards for large text resources: The Text Encoding Initiative
    Nancy Ide
    Department of Computer Science, Vassar College, Poughkeepsie, New York 12601 (U.S.A.); Laboratoire Parole et Langage, CNRS/Université de Provence, 29, Avenue Robert Schuman, 13621 Aix-en-Provence Cedex 1 (France). e-mail: ide@cs.vassar.edu
    Abstract. The Text Encoding Initiative (TEI) is an international project established in 1988 to develop guidelines for the preparation and interchange of electronic texts for research, and to satisfy a broad range of uses by the language industries more generally. The need for standardized encoding practices has become increasingly critical as the need to use and, most importantly, reuse vast amounts of electronic text has dramatically increased for both research and industry, in particular for natural language processing. In January 1994, the TEI issued its Guidelines for the Encoding and Interchange of Machine-Readable Texts, which provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications.
    considerable work to develop adequately large and appropriately constituted textual resources still remains. The demand for extensive reusability of large text collections in turn requires the development of standardized encoding formats for this data. It is no longer realistic to distribute data in ad hoc formats, since the effort and resources required to clean up and reformat the data for local use is at best costly, and in many cases prohibitive. Because much existing and potentially available data was originally formatted for the purposes of printing, the information explicitly represented in the encoding concerns a particular physical realization of a text rather than its logical structure (which is of greater interest for most NLP applications), and the
  • SGML as a Framework for Digital Preservation and Access (Commission on Preservation and Access, Washington, DC)
    DOCUMENT RESUME ED 417 748 IR 056 976 AUTHOR Coleman, James; Willis, Don TITLE SGML as a Framework for Digital Preservation and Access. INSTITUTION Commission on Preservation and Access, Washington, DC. ISBN ISBN-1-887334-54-8 PUB DATE 1997-07-00 NOTE 55p. AVAILABLE FROM Commission on Preservation and Access, A Program of the Council on Library and Information Resources, 1400 16th Street, NW, Suite 740, Washington, DC 20036-2217 ($20). PUB TYPE Reports Evaluative (142) EDRS PRICE MF01/PC03 Plus Postage. DESCRIPTORS *Access to Information; Computer Oriented Programs; *Electronic Libraries; *Information Retrieval; Library Automation; Online Catalogs; *Preservation; Standards IDENTIFIERS Digital Technology; *SGML ABSTRACT This report explores the suitability of Standard Generalized Markup Language (SGML) as a framework for building, managing, and providing access to digital libraries, with special emphasis on preservation and access issues. SGML is an international standard (ISO 8879) designed to promote text interchange. It is used to define markup languages, which can then encode the logical structure and content of any so-defined document. The connection between SGML and the traditional concerns of preservation and access may not be immediately apparent, but the use of descriptive markup tools such as SGML is crucial to the quality and long-term accessibility of digitized materials. Beginning with a general exploration of digital formats for preservation and access, the report provides a staged technical tutorial on the features and uses of SGML. The tutorial covers SGML and related standards, SGML Document Type Definitions in current use, and related projects now under development. A tiered metadata model is described that could incorporate SGML along with other standards to facilitate discovery and retrieval of digital documents.
  • A Standard Format Proposal for Hierarchical Analyses and Representations
    A standard format proposal for hierarchical analyses and representations
    David Rizo, Department of Software and Computing Systems, University of Alicante; Instituto Superior de Enseñanzas Artísticas de la Comunidad Valenciana, Ctra. San Vicente s/n. E-03690 San Vicente del Raspeig, Alicante, Spain. [email protected]
    Alan Marsden, Lancaster Institute for the Contemporary Arts, Lancaster University, Bailrigg, Lancaster LA1 4YW, UK. [email protected]
    ABSTRACT. In the realm of digital musicology, standardization efforts to date have mostly concentrated on the representation of music. Analyses of music are increasingly being generated or communicated by digital means. We demonstrate that the same arguments for the desirability of standardization in the representation of music apply also to the representation of analyses of music: proper preservation, sharing of data, and facilitation of digital processing. We concentrate here on analyses which can be described as hierarchical and show that this covers a broad range of existing analytical formats. We propose an extension of MEI (Music Encoding Initiative) to al-
    portant place in historical research, in pedagogy, and even in such humble publications as concert programme notes. Standard text is generally a crucial part of music analyses, often augmented by specialist terminology, and frequently accompanied by some form of diagram or other graphical representation. There are well established ways of representing texts in digital form, but not for the representation of analytical diagrams. In some cases the diagrams are very like music notation, with just the addition of textual labels, brackets or a few other symbols (see Figure 1). In other cases they are more elaborate, and might not make direct use of musical notation at all.
  • Syllabus FREN379 Winter 2020
    Department of French and Italian Studies University of Washington Winter 2020 FRENCH 379 Eighteenth-Century France Through Digital Archives and Tools Tues, Thurs 1:30-3:20 Denny 159 Geoffrey Turnovsky ([email protected]) Padelford C-255; 685-1618 Office Hours: M 1-3pm and by appointment Description. The last decade or two has witnessed a huge migration of texts and data onto digital platforms, where they can be accessed, in many cases, by anyone anywhere. This is a terrific benefit to students and teachers alike, who otherwise wouldn't be able to consult these materials; and it has transformed the kind of work and research we can do in the French program and in the Humanities. We can now discover obscure, archival documents which we would never have been able to find in the past. And we can look at classic works in their original forms, rather than in contemporary re-editions that often change and modernize the works. Yet this ease of access brings challenges: to locate these resources on the web, to assess their quality and reliability, and to understand how to use them, as primary sources and “data”, and as new research technologies. The PDF of a first edition downloaded through Google Books certainly looks like the historical printed book it reproduces; but it is not that printed book. It is a particular image of one copy of it, created under certain conditions and it can be a mistake to forget the difference. In this course, we'll explore a variety of digital archives, databases and tools that are useful for studying French cultural history.
  • THE TEXT ENCODING INITIATIVE Edward Vanhoutte & Ron Van den Branden
    THE TEXT ENCODING INITIATIVE
    Edward Vanhoutte & Ron Van den Branden, Centre for Scholarly Editing and Document Studies, Royal Academy of Dutch Language and Literature - Belgium, Koningstraat 18, 9000 Gent, Belgium. [email protected] [email protected]
    KEYWORDS: text-encoding, markup, markup languages, XML, humanities
    ABSTRACT. The result of community efforts among computing humanists, the Text Encoding Initiative or TEI is the de facto standard for the encoding of texts in the humanities. This article explains the historical context of the TEI, its ground principles, history, and organisation.
    INTRODUCTION. The Text Encoding Initiative (TEI) is a standard for the representation of textual material in digital form through the means of text encoding. This standard is the collaborative product of a community of scholars, chiefly from the humanities, social sciences, and linguistics who are organized in the TEI Consortium (TEI-C <http://www.tei-c.org>). The TEI Consortium is a non-profit membership organisation and governs a wide variety of activities such as the development, publication, and maintenance of the text encoding standard documented in the TEI Guidelines, the discussion and development of the standard on the TEI mailing list (TEI-L) and in Special Interest Groups (SIG), the gathering of the TEI community on yearly members meetings, and the promotion of the standard in publications, on workshops, training courses, colloquia, and conferences.
    Preprint © Edward Vanhoutte & Ron Van den Branden - The Text Encoding Initiative. Bates & Maack (eds.), Encyclopedia of Library and Information Sciences.
  • Towards a Content-Based Billing Model: the Synergy Between Access Control and Billing
    TOWARDS A CONTENT-BASED BILLING MODEL: THE SYNERGY BETWEEN ACCESS CONTROL AND BILLING
    Peter J. de Villiers and Reinhardt A. Botha, Faculty of Computer Studies, Port Elizabeth Technikon, Port Elizabeth. [email protected], [email protected]
    Abstract. In the internet environment the importance of content has grown. This has brought about a change in emphasis to provide content-based services in addition to traditional Internet transactions and services. The introduction of content-based services is the key to increase revenues. To effectively increase revenues through content-based services efficient billing systems need to be implemented. This paper will investigate a presumed synergy between access control and billing. The investigation identifies three phases. Subsequently the similarities and differences between access control and billing in each of these phases are investigated. This investigation assumes that information delivery takes place in an XML format.
    Keywords: access control, billing, XML
    1. INTRODUCTION. The global economy has made a transition from an economy with an industrial focus to being an economy based on knowledge and information. This new paradigm is enabled by the use of Information and Communication Technologies (ICT). The increasing pace of technological innovations in the field of ICT has given rise to new ways of communicating, learning and conducting business. The Internet has facilitated the establishment of a "borderless" environment for communications and the electronic delivery of certain services. This is known as electronic business (e-business). Electronic business is the key to doing business in the new global economy that is based on knowledge and information [15].
  • 3.4 the Extensible Markup Language (XML)
    TEI BY EXAMPLE. MODULE 0: INTRODUCTION TO TEXT ENCODING AND THE TEI
    Edward Vanhoutte, Ron Van den Branden, Melissa Terras
    Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium, Gent, 9 July 2010. Last updated September 2020. Licensed under a Creative Commons Attribution ShareAlike 3.0 License.
    Module 0: Introduction to Text Encoding and the TEI. Table of contents: 1. Introduction; 2. Text Encoding in the Humanities; 3. Markup Languages in the Humanities; 3.1 Procedural and Descriptive Markup; 3.2 Early Attempts; 3.3 The Standard Generalized Markup Language (SGML); 3.4 The eXtensible Markup Language (XML); 4. XML: Ground Rules.
  • Deliverable 5.2 Score Edition Component V2
    Ref. Ares(2021)2031810 - 22/03/2021
    TROMPA: Towards Richer Online Music Public-domain Archives. Deliverable 5.2 Score Edition Component v2
    Grant Agreement nr: 770376. Project runtime: May 2018 - April 2021. Document Reference: TR-D5.2-Score Edition Component v2. Work Package: WP5 - TROMPA Contributor Environment. Deliverable Type: Report. Dissemination Level: PU. Document due date: 28 February 2021. Date of submission: 28 February 2021. Leader: GOLD. Contact Person: Tim Crawford (GOLD). Authors: Tim Crawford (GOLD), David Weigl, Werner Goebl (MDW). Reviewers: David Linssen (VD), Emilia Gomez, Aggelos Gkiokas (UPF).
    Executive Summary. This document describes tooling developed as part of the TROMPA Digital Score Edition deliverable, including an extension for the Atom text editor desktop application to aid in the finalisation of digital score encodings, a Javascript module supporting selection of elements upon a rendered digital score, and the Music Scholars Annotation Tool, a web app integrating this component, in which users can select, view and interact with musical scores which have been encoded using the MEI (Music Encoding Initiative) XML format. The most important feature of the Annotation Tool is the facility for users to make annotations to the scores in a number of ways (textual comments or cued links to media such as audio or video, which can then be displayed in a dedicated area of the interface, or to external resources such as html pages) which they ‘own’, in the sense that they may be kept private or ‘published’ to other TROMPA users of the same scores. These were the main requirements of Task 5.2 of the TROMPA project.
  • Introduction
    C. M. SPERBERG-MCQUEEN, Editor, Text Encoding Initiative, University of Illinois at Chicago
    The Text Encoding Initiative: Electronic Text Markup for Research
    ABSTRACT. This paper describes the goals and work of the Text Encoding Initiative (TEI), an international cooperative project to develop and disseminate guidelines for the encoding and interchange of electronic text for research purposes. It begins by outlining some basic problems that arise in the attempt to represent textual material in computers and some problems that arise in the attempt to encourage the sharing and reuse of electronic textual resources. These problems provide the necessary background for a brief review of the origins and organization of the Text Encoding Initiative itself. Next, the paper describes the rationale for the decision of the TEI to use the Standard Generalized Markup Language (SGML) as the basis for its work. Finally, the work accomplished by the TEI is described in general terms, and some attempt is made to clarify what the project has and has not accomplished.
    INTRODUCTION. This paper describes the goals and work of the Text Encoding Initiative (TEI), an international cooperative project to develop and disseminate guidelines for the encoding and interchange of electronic text for research purposes. In the simplest possible terms, the TEI is an attempt to find better ways to put texts into computers for the purposes of doing research that uses those texts. The paper will first discuss some basic problems involved in that process, then some practical aspects of the reuse and reusability of textual resources. With the context thus clarified, the origins and organization of the TEI itself can then be described briefly, along with the reasons behind the decision of the TEI to use the Standard Generalized Markup Language (SGML) as the basis for its work.
  • Text Encoding Initiative Semantic Modeling. a Conceptual Workflow Proposal
    Text Encoding Initiative semantic modeling. A conceptual workflow proposal
    Fabio Ciotti (1), Marilena Daquino (2), and Francesca Tomasi (2)
    (1) Department of Humanities, University of Roma, Italy. [email protected]
    (2) Department of Classical Philology and Italian Studies, University of Bologna, Italy. {marilena.daquino2, francesca.tomasi}@unibo.it
    Abstract. In this paper we present a proposal for the XML TEI semantic enhancement, through an ontological modelization based on a three level approach: an ontological generalization of the TEI schema; an intensional semantics of TEI elements; an extensional semantics of the markup content. A possible TEI semantic enhancement will be the result of these three levels dialogue and combination. We conclude with the ontology mapping issue and a Linked Open Data suggestion for digital libraries based on XML TEI semantically enriched model. Keywords: ontology, TEI, XML, interoperability, LOD, digital libraries.
    1 Introduction. Digital libraries are moving towards a radical identity redefinition. Semantic Web technologies are, in particular, contributing to increase the expressivity of the digital library as an unexplored reservoir of raw data, on which keywords such as ontologies, RDF, OWL, metadata and controlled vocabularies play a crucial role. "Yet semantic technologies offer a new level of flexibility, interoperability, and relationships for digital repositories" [1]. Typically XML is the meta-language used in order to populate libraries with documents aiming at ensuring the maximum syntactic interchange, even between technologically heterogeneous repositories. In the 'humanistic community' XML is commonly used in combination with TEI (Text Encoding Initiative), the 'controlled vocabulary' for declaring humanistic-domain-based interpretations. TEI is a flexible and customizable schema that represents a shared approach in the community, demonstrated by the fact that most of the existent digital libraries of national textual traditions are based on this standard.
  • Markup Languages and TEI XML Encoding
    Methods and tools for Digital Philology: markup languages and TEI XML encoding
    Digital Tools for Humanists Summer School 2019, Pisa, 10-14 June 2019
    Roberto Rosselli Del Turco, Dipartimento di Studi Umanistici, Università di Torino. [email protected]
    Markup languages. There are many markup languages, which differ greatly. A fundamental distinction: procedural markup vs. descriptive markup. Procedural markup is typical of word processors: instructions specifying where the characters should appear on the page, their appearance, etc.; a WYSIWYG approach (but also see LaTeX); the user doesn't see or modify the markup directly (but again see LaTeX). Descriptive markup describes text. This distinction isn't as neat as one would love to think; see for instance the structural aspect of text.
    Descriptive markup allows the scholar to do a semantic annotation of text. The current standard is the XML language (← SGML), in spite of the multiple hierarchies problem. XML has been used to produce many different encoding schemas: TEI schemas for all types of texts; TEI-derived schemas such as EpiDoc, MEI, CEI, etc.; other schemas such as DOCBOOK, MML (Music Markup Language), MathML, SVG, etc. It is also possible to create a personal encoding schema, but you would need a very good reason not to use TEI XML.
    Markup languages: XML. SGML is the "father" of XML (eXtensible Markup Language). XML was created to replace both SGML, offering similar characteristics but a much lower complexity, and also HTML, going beyond the intrinsic
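    As a small illustration of the distinction drawn in this excerpt (the fragment below is not part of the original slides; the tags are generic examples), procedural markup records how a string should be rendered, while descriptive markup records what the string is:
    <B><I>Paradise Lost</I></B>      (procedural: bold and italic rendering instructions)
    <title>Paradise Lost</title>     (descriptive: identifies the string as a title)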