Introduction to XML-TEI

10/26/2016 Silvia Stoyanova, Introduction to XML-TEI.

What is XML?
= eXtensible Markup Language
• makes explicit what is implicit, so that it can be processed by machines
• stores and transports data of any type
• platform independent
• focuses on what the data is, not on how to display it
• extensible: the tag set is not pre-defined and can be expanded
• hierarchical tree structure of nested elements
• XML documents must be well-formed and can additionally be validated against a schema
• XML vocabularies determine the semantics of the elements: https://en.wikipedia.org/wiki/List_of_XML_markup_languages

What is TEI?
= Text Encoding Initiative
• an XML vocabulary (encoding schema)
• an organization and community of users: http://www.tei-c.org/index.xml
• the TEI Guidelines: the most recent version, Proposal 5 (P5), was released in 2007
• Text encoding is a modeling activity: a process of creating an analytical representation of a document or an information system
• an analytical process with embedded scholarly interpretation
• modeling and processing textual data

What types of “processing”?
Website, document, map

XML-TEI in the humanities: digital editions and archives
• Dictionaries: Digital Dictionary of the German language
• Classical texts: Perseus
• Works of major authors: Shelley-Godwin, Blake, Dante, Newton, Nietzsche
• Thematic archives: Early Women Writers Project, Mapping the Republic of Letters
• Letter collections: Van Gogh’s Letters, Letters of 1916
• List of projects using TEI: http://www.tei-c.org/Activities/Projects/

Stages of a text encoding project
• document analysis: determine the hierarchical structure of the documents that you wish to model in XML.
• design a schema: a formal grammar describing where the elements and attributes you want to use can and cannot appear.
• mark up your documents according to the schema and validate them.
• refine the document analysis and its schema during the markup phase.
• process different selections of the same data: transform, publish and analyze the results.

Document analysis and XML markup
• XML is used to model some of the structural and semantic properties of the cultural documents used in humanities research according to a hierarchical tree structure:
  - chapter, paragraph, footnotes
  - canto, stanza, line
  - act, scene, speech
• Analyze the document structure based on your research questions, which determine how fine-grained the markup needs to be:
  - a bibliographic reference as a whole vs. its author, title, publisher, publication date, etc.
  - foreign language in general vs. which particular languages
• Markup can be:
  - descriptive (title, foreign word, emphasis): multipurpose
  - presentational (italics)
  - procedural (change font to italics)
• Descriptive markup separates meaning from presentation
• The computer extracts lists of marked-up elements for processing

THE TEI GRAMMAR STRUCTURE

Ordered hierarchy of content objects (OHCO)
• 1980s-1990s
• Text as nesting objects
• Problem of overlapping hierarchies!
  <l>text text</l>
  <l>text text <metaphor> text text</l>
  <l>text text</metaphor></l>
  (the <metaphor> element opens in one line and closes in the next, so its tags overlap with the <l> tags)
  Zibaldone:
  <quote>text text text text</quote>
  <quote>text text</quote>

Basic TEI Guidelines
• Schema
• Header
• Elements
• Attributes
Well-formed
Valid

Zibaldone Schema
Zibaldone Header
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>An Encoding of <bibl>Zibaldone di pensieri</bibl> by Giacomo Leopardi</title>
      <principal>
        <persName xml:id="Silvia">Silvia Stoyanova</persName>
      </principal>
      <respStmt>
        <persName xml:id="Ben">Ben Johnston</persName>
        <resp>conversion</resp>
      </respStmt>
    </titleStmt>
    <publicationStmt> In preparation </publicationStmt>
    <sourceDesc> Project Manuzio at www.liberliber.it </sourceDesc>
  </fileDesc>
</teiHeader>

Elements
Elements are delimited by tags and either enclose content or are empty:
<element>content</element>
<pb xml:id="p102" n="102"/> = page break (empty element)
The content can be:
- a string of text: <persName>Casa</persName>
- other elements: <ref type="bibl"><persName type="author">Casa,</persName> <title>Galat.</title></ref>
- text + elements: <ref type="bibl"><persName type="author">Casa,</persName><title>Galat.</title>, cap. 26. princip.</ref>

Attributes
Attributes qualify elements. Each attribute has a name and a value. Attribute values are enclosed in quotation marks.
<ref type="bibl"><persName type="author">Casa</persName>, <title>Galat.</title>, cap. 26. princip.</ref>
<note place="inline" type="ink" subtype="dissimilar">(Luglio o Agosto 1817.)</note>
Identifier attributes make it possible to refer from one textual element to another:
<pb xml:id="p1473" n="1473"/> = page break
id = paragraph id
<note place="margin" xml:id="note35.1">
Here note is the element, place and xml:id are attributes, and "margin" and "note35.1" are attribute values.

Well-formed and valid XML document
Well-formedness
• An XML tree has one root element containing all other elements
• All XML elements must have a closing tag (or be written as an empty element, e.g. <pb/>)
• XML tags are case sensitive
• All XML elements must be properly nested, with no overlapping tags
• Attribute values must always be quoted
• In XML, white space is preserved
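The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as a stand-in for a dedicated XML editor. The helper name is_well_formed and the sample strings are illustrative and do not come from the Zibaldone project. The sketch tests well-formedness only and says nothing about validity against a TEI schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested: <metaphor> opens and closes inside a single <l>.
nested = "<lg><l>text <metaphor>text</metaphor></l><l>text</l></lg>"

# Overlapping: <metaphor> opens in the first <l> and closes in the second.
overlapping = "<lg><l>text <metaphor>text</l><l>text</metaphor></l></lg>"

# Case mismatch between the opening and the closing tag.
case_mismatch = "<persName>Casa</persname>"

# Attribute value without quotation marks.
unquoted = "<pb n=102/>"

samples = {
    "nested": nested,
    "overlapping": overlapping,
    "case_mismatch": case_mismatch,
    "unquoted": unquoted,
}

for label, sample in samples.items():
    print(label, is_well_formed(sample))
```

Only the first sample parses. The other three each break one of the rules listed above, which is why overlapping hierarchies need a workaround in XML rather than overlapping tags.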
Valid XML:
• only certain elements are allowed in certain contexts, as defined by the schema

Intro to the Oxygen XML editor
• a cross-platform application designed for document development using structured mark-up languages such as XML
• supports output to multiple target formats, including PDF, TXT, HTML and XML
• imports data from a database, Excel, HTML or text file
• checks for well-formedness and validates
• easy error tracking: locate the source of an error by clicking on it
• XPath search and evaluation support
• content completion (attribute suggestions)

Transformations after the encoding

CSS
• Cascading Style Sheets describe how XML (and HTML) should be rendered.
• Descriptive markup describes what the elements in a document mean, but not how they look. CSS is intended to let the designer specify the rendering separately from the XML.
XML: <ref target="p37_1" type="other">p.37,1</ref>
CSS: “other” = []

XSLT
• eXtensible Stylesheet Language Transformations: a programming language used to transform XML into other forms (other XML, HTML, plain text, etc.).
• XSLT can be used to style an XML document, but it can also be used to create entirely new documents by transforming existing ones in almost unlimited ways. XSLT stylesheets can define complex transformations including sorting, grouping, filtering, aggregation, and selection.

Queries
• XQuery: a language used to query XML databases.
• XPath: a formal method of navigating the XML hierarchy, used by XSLT and XQuery.
  - Example: to produce a list of all person names, you use XPath to find all the person names in your document and XSLT to generate an output document that contains the newly constructed list.

SVG
Scalable Vector Graphics.
An XML vocabulary (schema) for describing graphics. For example, one could use XSLT to transform a Shakespearean play into a bar graph illustrating how much each character speaks.

References
• David J. Birnbaum, “What is XML and why should humanists care? An even gentler introduction to XML”: http://dh.obdurodon.org/what-is-xml.xhtml
• Gentle introduction to TEI: http://xml.coverpages.org/TEI-GentleIntroXML.pdf
• http://www.w3schools.com/xml/default.asp
• TEI By Example: http://www.teibyexample.org/

Identify elements, element contents, attributes and attribute values
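One way to check an answer to the identification task above is to let a parser list the parts of a fragment. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and reusing the <ref> example from the Elements slide. The helper name describe is illustrative, and the path expression uses the limited XPath subset that ElementTree supports, not full XPath.

```python
import xml.etree.ElementTree as ET

# The <ref> example from the Elements slide, with straight quotation marks.
fragment = (
    '<ref type="bibl">'
    '<persName type="author">Casa,</persName>'
    '<title>Galat.</title>, cap. 26. princip.'
    '</ref>'
)

root = ET.fromstring(fragment)


def describe(element):
    """Return (element name, leading text content, attributes) for one element.

    Only the text before the first child element is reported; text that
    follows a child element is stored in that child's .tail and is not shown.
    """
    text = (element.text or "").strip()
    return element.tag, text, dict(element.attrib)


# iter() walks the element and all of its descendants in document order.
parts = [describe(el) for el in root.iter()]
for name, text, attributes in parts:
    print(f"element={name!r} content={text!r} attributes={attributes}")

# A simple path expression: every <persName> anywhere below the root.
person_names = [el.text for el in root.findall(".//persName")]
print(person_names)
```

The last two lines mirror the person-name query described on the Queries slide: a path expression selects the elements, and the surrounding code builds the list.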
