Introduction to XML-TEI

10/26/2016 Silvia Stoyanova, Introduction to XML-TEI.

What is XML? = eXtensible Markup Language
• make explicit what is implicit → to be processed by machines
• store and transport data of any type → platform independent
• focus on what the data is → not how to display it
• extensible → can be expanded; tags are not predefined
• hierarchical tree structure → nested elements
• XML documents must be well-formed; valid documents also conform to a predefined schema
• XML vocabularies → determine the semantics of the elements: https://en.wikipedia.org/wiki/List_of_XML_markup_languages

What is TEI? = Text Encoding Initiative

• an XML vocabulary (encoding schema)
• an organization and community of users: http://www.tei-c.org/index.xml
• the TEI Guidelines: the most recent version, Proposal 5 (P5), was released in 2007
• text encoding is a modeling activity: a process of creating an analytical representation of a document or an information system
• analytical process / embedded scholarly interpretation
• modeling and processing textual data


What types of “processing”?

[Figure labels: Website, document, Map]

XML-TEI in the humanities: digital editions and archives
• Dictionaries: Digital Dictionary of the German language
• Classical texts: Perseus
• Works of major authors: Shelley-Godwin, Blake, Dante, Newton, Nietzsche
• Thematic archives: Early Women Writers Project, Mapping the Republic of Letters
• Letter collections: Van Gogh’s Letters, Letters of 1916
• List of projects using TEI: http://www.tei-c.org/Activities/Projects/

Stages of a text encoding project

• analyze the documents: determine the hierarchical structure of the documents that you wish to model in XML.
• design a schema: a formal grammar describing where the various elements and attributes you want to use can and cannot be employed.
• mark up your document according to the schema and validate it.
• refine the document analysis and its schema during the markup phase.
• process different selections of the same data: transform, publish and analyze the results.


Document analysis and XML markup

• XML is used to model some of the structural and semantic properties of the cultural documents used in humanities research according to a hierarchical tree structure:
  - chapter → paragraph → footnotes
  - canto → stanza → line
  - act → scene → speech
• Analyze the document structure based on your research questions, e.g.:
  - bibliographic reference vs. author, title, publisher, publication date, etc.
  - foreign language vs. which particular languages
• Markup:
  - descriptive (title, foreign word, emphasis) → multipurpose
  - presentational (italics)
  - procedural (change font to italics)
  - separation of meaning and presentation
• The computer extracts lists of marked-up elements for processing
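The last point, extracting lists of marked-up elements, can be sketched in a few lines of Python using the standard library's ElementTree module. The sample sentence is invented for illustration; only the element names (title, foreign, emph) follow the descriptive markup named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the sentence is invented; the element names follow
# the descriptive categories listed above (title, foreign word, emphasis).
SAMPLE = (
    '<p>In <title>Zibaldone</title> the phrase '
    '<foreign xml:lang="la">carpe diem</foreign> is '
    '<emph>not</emph> merely italicised.</p>'
)

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract(root, tag):
    """Return the text of every descendant element named `tag`."""
    return [el.text for el in root.iter(tag)]


root = ET.fromstring(SAMPLE)
titles = extract(root, "title")
# Recording the language answers "which particular languages", not just
# "is this a foreign word".
foreign_by_lang = [(el.get(XML_LANG), el.text) for el in root.iter("foreign")]

print(titles)
print(foreign_by_lang)
```

Because the markup says what each span is rather than how it looks, the same file can feed a title index, a language survey, or a rendering step without being re-encoded.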

THE TEI GRAMMAR STRUCTURE

Ordered hierarchy of content objects (OHCO)
• 1980s-1990s
• Text as nesting objects → problem of overlapping hierarchies!

[Figure: overlapping hierarchies shown on placeholder text, with an example from the Zibaldone]
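A minimal Python sketch of the overlap problem, assuming a paragraph that runs across a page boundary (the element names page, p and pb are chosen for illustration). Treating both pages and paragraphs as containers forces the tags to overlap, which an XML parser rejects. Recording the page boundary as an empty milestone element, such as the TEI page break pb, is one common workaround.

```python
import xml.etree.ElementTree as ET

# Hypothetical text: pages and paragraphs both encoded as containers.
# The paragraph starts on one page and ends on the next, so tags overlap.
OVERLAPPING = "<text><page><p>text text</page><page>text text</p></page></text>"

# Same content with the page boundaries recorded as empty milestone elements.
MILESTONE = '<text><pb n="1"/><p>text text <pb n="2"/>text text</p></text>'

try:
    ET.fromstring(OVERLAPPING)
    overlapping_parses = True
except ET.ParseError:
    # The parser reports a mismatched tag: </page> arrives while <p> is open.
    overlapping_parses = False

root = ET.fromstring(MILESTONE)
page_breaks = [pb.get("n") for pb in root.iter("pb")]
paragraph_text = "".join(root.find("p").itertext())

print(overlapping_parses)
print(page_breaks)
print(paragraph_text)
```

The milestone version keeps the paragraph as the single containing hierarchy and demotes the page to a point in the text, so one of the two structures has to be chosen as primary.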


Basic TEI Guidelines

• Schema
• Header
• Elements
• Attributes
Well-formed / Valid

Zibaldone Schema

Zibaldone Header

An Encoding of <bibl>Zibaldone di pensieri</bibl> by Giacomo Leopardi Silvia Stoyanova Ben Johnston conversion

In preparation

Project Manuzio at www.liberliber.it
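As a hedged sketch of how header text like this might sit inside a TEI header, the Python snippet below places some of the strings above in the standard teiHeader sections (fileDesc, titleStmt, publicationStmt, sourceDesc). Which string belongs in which element is an assumption made for illustration, not the project's actual header file.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed layout: the assignment of each string to an element is a guess
# for illustration, and only a subset of the header text is used.
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>An Encoding of <bibl>Zibaldone di pensieri</bibl> by Giacomo Leopardi</title>
      <respStmt>
        <name>Ben Johnston</name>
        <resp>conversion</resp>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <p>In preparation</p>
    </publicationStmt>
    <sourceDesc>
      <p>Project Manuzio at www.liberliber.it</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

header = ET.fromstring(HEADER)

# Elements live in the TEI namespace, so paths need the "tei:" prefix.
title = header.find("tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
title_text = "".join(title.itertext())  # flattens the nested <bibl>
status = header.find(".//tei:publicationStmt/tei:p", TEI_NS).text

print(title_text)
print(status)
```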

Elements

Elements are delimited by tags and either enclose content or are empty:
<element>content</element>
<pb/> = page break

The content can be:
- a string of text: Casa
- other elements: Casa, Galat.
- text + elements: Casa, Galat., cap. 26. princip.
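The content types above can be told apart programmatically. In this Python sketch the element names (author, title, bibl) are assumed for illustration; only the text strings and the empty page break come from the examples above.

```python
import xml.etree.ElementTree as ET

# Element names (<author>, <title>, <bibl>) are assumed for illustration.
TEXT_ONLY = "<author>Casa</author>"
ELEMENTS_ONLY = "<bibl><author>Casa</author><title>Galat.</title></bibl>"
MIXED = "<bibl><author>Casa</author>, <title>Galat.</title>, cap. 26. princip.</bibl>"
EMPTY = "<pb/>"


def describe(xml_string):
    """Classify an element's content as empty, text, elements, or mixed."""
    el = ET.fromstring(xml_string)
    has_children = len(el) > 0
    # Text directly inside the element: its own .text plus each child's .tail.
    own_text = (el.text or "") + "".join(child.tail or "" for child in el)
    has_text = bool(own_text.strip())
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "text"
    return "empty"


for sample in (TEXT_ONLY, ELEMENTS_ONLY, MIXED, EMPTY):
    print(describe(sample), sample)
```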

Attributes

Attributes qualify elements. Attributes have a name and a value. Attribute values are enclosed in quotation marks.
Casa, Galat., cap. 26. princip. (Luglio o Agosto 1817.)
→ Identifier attributes for referring from one textual element to another:
- page break id
- paragraph id

[Diagram labels: element, attribute, attribute value]
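A small Python sketch of identifier attributes in use. The identifier values ("p1", "p1_1") and the ref element with its target attribute are hypothetical choices for illustration. The xml:id attribute itself is the standard XML way to give an element a unique identifier.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical identifiers: "p1" and "p1_1" are invented values.
SAMPLE = (
    '<div>'
    '<pb xml:id="p1"/>'
    '<p xml:id="p1_1">Casa, Galat., cap. 26. princip. (Luglio o Agosto 1817.)</p>'
    '<ref target="#p1_1">see the first paragraph</ref>'
    '</div>'
)

root = ET.fromstring(SAMPLE)

# Index every element that carries an identifier: {id value: element}.
by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

# Follow the pointer from <ref> to the element it refers to.
ref = root.find("ref")
target = by_id[ref.get("target").lstrip("#")]

print(sorted(by_id))
print(target.tag, target.text)
```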

Well-formed and valid XML document

Well-formedness
• The XML tree has one root containing all other elements
• All XML elements must have a closing tag (or be written as an empty element, e.g. <pb/>)
• XML tags are case sensitive
• All XML elements must be properly nested, with no overlapping tags
• Attribute values must always be quoted
• In XML, white space is preserved
Valid XML
• Only certain elements are allowed in certain contexts, as defined by the schema
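The well-formedness rules above can be tried out with any XML parser. The following Python sketch uses the standard library's ElementTree module. The sample strings are invented, one per rule. Checking validity against a schema is a separate step that ElementTree does not provide, so it is not shown here.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_string):
    """Return (True, "") if the string parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_string)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# One invented sample per rule; only the first is well-formed.
CASES = {
    "ok": '<text><p n="1">text</p></text>',
    "two roots": "<p>text</p><p>text</p>",
    "no closing tag": "<text><p>text</text>",
    "case mismatch": "<text><P>text</p></text>",
    "overlap": "<p><hi>text</p></hi>",
    "unquoted attribute": "<text><p n=1>text</p></text>",
}

results = {name: well_formed(doc)[0] for name, doc in CASES.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")

# White space inside an element is handed to the application unchanged.
preserved = ET.fromstring("<p>  a  b  </p>").text
print(repr(preserved))
```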


Intro to the Oxygen XML editor

• a cross-platform application designed for document development using structured markup languages such as XML
• supports output to multiple target formats, including PDF, TXT, HTML and XML
• imports data from a database, Excel, HTML or text file
• checks for well-formedness and validates
• easy error tracking: locate the error source by clicking on it
• XPath search and evaluation support
• content completion (attribute suggestions)

Transformations after the encoding

CSS
• Cascading Style Sheets are a strategy for describing how XML (and HTML) should be rendered.
• Descriptive markup describes what the elements in a document mean, but not how they look. CSS is intended to let the designer specify the rendering separately from the XML.
Example: XML p.37,1; CSS “other” = []

XSLT

• eXtensible Stylesheet Language Transformations: a programming language used to transform XML to other forms (other XML, HTML, plain text, etc.).
• XSLT can be used to style an XML document, but it can also be used to create entirely new documents by transforming existing ones in almost unlimited ways. XSLT stylesheets can define complex transformations including sorting, grouping, filtering, aggregation, and selection.

Queries

• XQuery: a language used to query XML databases.
• XPath: a formal method of navigating the XML hierarchy, used by XSLT and XQuery.
  - Example: to produce a list of all person names, you use XPath to find all the person names in your document and XSLT to generate an output document that contains the newly constructed list.
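The person-name example can be approximated in Python. This is a sketch under stated assumptions: ElementTree supports only a subset of XPath and has no XSLT, so plain Python code stands in for the XSLT step that builds the output document. The sample text is invented; persName is the TEI element for personal names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; <persName> is the TEI element for personal names.
SAMPLE = (
    "<text>"
    "<p><persName>Leopardi</persName> cites <persName>Casa</persName>.</p>"
    "<p><persName>Leopardi</persName> writes in 1817.</p>"
    "</text>"
)

root = ET.fromstring(SAMPLE)

# ".//persName" is XPath for "every persName descendant of this node".
# The set removes duplicates; sorted() gives a stable, alphabetical list.
names = sorted({el.text for el in root.findall(".//persName")})

# Build the output document: an HTML unordered list of the distinct names.
# (In an XSLT workflow a stylesheet template would do this step.)
ul = ET.Element("ul")
for name in names:
    ET.SubElement(ul, "li").text = name
html = ET.tostring(ul, encoding="unicode")

print(html)
```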

SVG

Scalable Vector Graphics. An XML vocabulary (schema) for describing graphics. For example, one could use XSLT to transform a Shakespearean play into a bar graph illustrating how much each character speaks.
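The bar-graph idea can be sketched as follows, with Python standing in for XSLT. The scene is a hypothetical fragment with invented lines, encoded with the TEI drama elements sp (speech), speaker and l (verse line). The bar sizes and layout numbers are arbitrary choices.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical scene with invented lines, using TEI drama elements.
PLAY = """
<div type="scene">
  <sp><speaker>Hamlet</speaker><l>To be or not to be</l></sp>
  <sp><speaker>Ophelia</speaker><l>Good my lord</l></sp>
  <sp><speaker>Hamlet</speaker><l>I thank you</l></sp>
</div>
"""


def words_per_speaker(root):
    """Count the words spoken by each speaker across all speeches."""
    counts = Counter()
    for sp in root.iter("sp"):
        speaker = sp.findtext("speaker")
        for line in sp.iter("l"):
            counts[speaker] += len((line.text or "").split())
    return counts


def bar_chart(counts, bar_height=20, scale=10):
    """Serialize one horizontal SVG bar per speaker, in alphabetical order."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg")
    for i, (speaker, n) in enumerate(sorted(counts.items())):
        y = i * (bar_height + 5)
        ET.SubElement(svg, "rect", x="100", y=str(y),
                      width=str(n * scale), height=str(bar_height))
        label = ET.SubElement(svg, "text", x="0", y=str(y + 15))
        label.text = f"{speaker}: {n}"
    return ET.tostring(svg, encoding="unicode")


counts = words_per_speaker(ET.fromstring(PLAY))
svg = bar_chart(counts)
print(dict(counts))
print(svg)
```

Saving the printed string to a file with an .svg extension gives an image that a browser can display, which shows that the graphic is itself just another XML document.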

References

• David J. Birnbaum, “What is XML and why should humanists care? An even gentler introduction to XML”: http://dh.obdurodon.org/what-is-xml.xhtml
• Gentle introduction to TEI: http://xml.coverpages.org/TEI-GentleIntroXML.pdf
• W3Schools XML tutorial: http://www.w3schools.com/xml/default.asp
• TEI By Example: http://www.teibyexample.org/


identify elements, element contents, attributes and attribute values
