Digital Humanities 2010

The MLCD Overlap Corpus (MOC)

Huitfeldt, Claus
[email protected]
Department of Philosophy, University of Bergen

Sperberg-McQueen, C. M.
[email protected]
Black Mesa Technologies LLC, USA

Marcoux, Yves
[email protected]
Université de Montréal, Canada

For some time, theorists and practitioners of descriptive markup have been aware that the strict hierarchical organization of elements provided by SGML and XML represents a potentially problematic abstraction. The nesting structures of SGML and XML capture an important property of real texts and represent a successful combination of expressive power and tractability. But not all textual phenomena appear in properly nested form, and for more […]

[…] semantic openness of SGML and XML to supply non-hierarchical interpretation of what are often thought to be inescapably hierarchical notations.

The SGML/XML-based approaches to overlap fall, roughly, into three groups: milestones, fragmentation-and-reconstitution, and stand-off annotation. Milestones (described as early as Barnard et al. 1988, and used in Sperberg-McQueen and Burnard 1990 and later versions of the TEI Guidelines) use empty elements to mark the boundaries of regions which cannot be marked simply as elements because they overlap the boundaries of other elements. More recently, approaches to milestone markup have been generalized in the Trojan Horse and CLIX markup idioms (DeRose 2004).

Fragmentation is the technique of dividing a logical unit which overlaps other units into several smaller units, which do not; the consuming application can then re-aggregate the fragments.

Stand-off annotation addresses the overlap problem by removing the markup from the main data stream of the document, at the same time adding pointers back into the base data.
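To make the three groups of approaches concrete, here is a minimal Python sketch that encodes one toy overlap (a sentence running across a verse-line boundary) in each of the three styles. Everything in it is an illustrative assumption rather than the authors' encoding: the element names (`l`, `s`, `sStart`, `sEnd`), the `frag` attribute, the use of character offsets as stand-off "pointers", and the single-space separator used when re-aggregating fragments are all hypothetical. The milestone variant is plain start/end milestones, not the Trojan Horse or CLIX idioms.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Toy base text: two verse lines, plus one sentence that crosses the line break.
TEXT = "one two three four five six"
LINES = [(0, 13), (14, 27)]   # half-open offsets: "one two three", "four five six"
SENTENCE = (4, 23)            # "two three four five" overlaps both lines

# Ranks break ties at one offset so tags nest properly:
# inner close, outer close, outer open, inner open.
CLOSE_INNER, CLOSE_OUTER, OPEN_OUTER, OPEN_INNER = range(4)


def serialize(text, inserts):
    """Interleave (offset, rank, markup) insertions into text; wrap in <doc>."""
    out, pos = [], 0
    for off, _rank, markup in sorted(inserts, key=lambda t: (t[0], t[1])):
        out.append(escape(text[pos:off]))
        out.append(markup)
        pos = off
    out.append(escape(text[pos:]))
    return "<doc>" + "".join(out) + "</doc>"


def line_tags(lines):
    """The dominant hierarchy: each line is a real <l> element."""
    tags = []
    for start, end in lines:
        tags += [(start, OPEN_OUTER, "<l>"), (end, CLOSE_OUTER, "</l>")]
    return tags


def milestones(text, lines, span):
    """Milestones: the overlapping unit is marked only by two empty elements."""
    start, end = span
    marks = [(start, OPEN_INNER, "<sStart/>"), (end, CLOSE_INNER, "<sEnd/>")]
    return serialize(text, line_tags(lines) + marks)


def fragmentation(text, lines, span, frag_id="s1"):
    """Fragmentation: split the unit at line boundaries into nested <s> pieces."""
    tags = line_tags(lines)
    for ls, le in lines:
        fs, fe = max(span[0], ls), min(span[1], le)
        if fs < fe:  # this line holds part of the sentence
            tags += [(fs, OPEN_INNER, f'<s frag="{frag_id}">'),
                     (fe, CLOSE_INNER, "</s>")]
    return serialize(text, tags)


def reaggregate(xml, frag_id="s1", sep=" "):
    """Consumer side: rejoin fragments sharing a frag id (sep is an assumption)."""
    root = ET.fromstring(xml)
    return sep.join(el.text or "" for el in root.iter("s")
                    if el.get("frag") == frag_id)


def standoff(text, lines, span):
    """Stand-off: markup is kept apart from the text as offset pointers."""
    anns = [{"type": "l", "start": s, "end": e} for s, e in lines]
    anns.append({"type": "s", "start": span[0], "end": span[1]})
    return {"text": text, "annotations": anns}


if __name__ == "__main__":
    # <doc><l>one <sStart/>two three</l> <l>four five<sEnd/> six</l></doc>
    print(milestones(TEXT, LINES, SENTENCE))
    frag = fragmentation(TEXT, LINES, SENTENCE)
    # <doc><l>one <s frag="s1">two three</s></l> <l><s frag="s1">four five</s> six</l></doc>
    print(frag)
    print(reaggregate(frag))  # two three four five
    doc = standoff(TEXT, LINES, SENTENCE)
    for a in doc["annotations"]:
        print(a["type"], repr(doc["text"][a["start"]:a["end"]]))
```

In this sketch both XML serializations stay well-formed because only the lines are true elements; the sentence survives either as a pair of empty markers or as fragments that the consumer must stitch back together. The stand-off form sidesteps nesting entirely, at the cost of keeping the offsets synchronized with the base text.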