Semantic Web 0 (0) 1 1 IOS Press

OLiA – Ontologies of Linguistic Annotation

Christian Chiarcos Information Sciences Institute, University of Southern California, [email protected]

Abstract. This paper describes the Ontologies of Linguistic Annotation (OLiA) as one of the data sets currently available as part of Linguistic Linked Open Data (LLOD) cloud. The OLiA ontologies represent a repository of annotation terminology for various linguistic phenomena on a great band-width of languages, they have been used to facilitate interoperability and information integration of linguistic annotations in corpora, NLP pipelines, and lexical-semantic resources.

Keywords: linguistic terminology, grammatical categories, annotation schemes, corpora, NLP, tag sets, annotation schemes

1. Background 2. Architecture

The heterogeneity of linguistic annotations has been The Ontologies of Linguistic Annotations [3] rep- recognized as a key problem limiting the interoper- resent a modular architecture of OWL/DL ontologies ability and reusability of NLP tools and linguistic data that formalize several intermediate steps of the map- collections. Several repositories of linguistic annota- ping between annotations, a ‘Reference Model’ and tion terminology have been developed to facilitate an- existing terminology repositories (‘External Reference notation interoperability by means of a joint level of Models’). representation, or an ‘interlingua’, the most prominent The OLiA ontologies were developed as part of an probably being the General Ontology of Linguistic De- infrastructure for the sustainable maintenance of lin- scription [13, GOLD] and the ISO TC37/SC4 Data guistic resources [27], and their primary fields of appli- Category Registry [18, ISOcat]. cation include the formalization of annotation schemes Still, these repositories are developed by different and concept-based querying over heterogeneously an- communities, and are thus not always compatible with notated corpora [23,9]. each other, neither with respect to their definitions, or In the OLiA architecture, four different types of on- their technologies (e.g., there is no commonly agreed tologies are distinguished: formalism to link linguistic annotations to terminol- – The OLIAREFERENCE MODEL specifies the ogy repositories), and harmonization efforts are still in common terminology that different annotation their early stages [17]. schemes can refer to. It is derived from existing The Ontologies of Linguistic Annotation (OLiA) repositories of annotation terminology and ex- have been developed to facilitate the development of tended in accordance with the annotation schemes applications that take benefit of a well-defined termi- that it was applied to. nological backbone even before the GOLD and ISOcat – Multiple OLIAANNOTATION MODELs formal- repositories have converged into a generally accepted ize annotation schemes and tagsets. Annotation reference terminology: They introduce an intermedi- Models are based on the original documentation, ate level of representation between ISOcat, GOLD and so that they provide an interpretation-independent other repositories of linguistic reference terminology representation of the annotation scheme. and are interconnected with these resources, and they – For every Annotation Model, a LINKING MODEL provide not only means to formalize reference cate- defines v relationships between concepts/properties gories, but also annotation schemes, and the way that in the respective Annotation Model and the Ref- these are linked with reference categories. erence Model. Linking Models are interpretations

1570-0844/0-1900/$27.50 c 0 – IOS Press and the authors. All rights reserved 2 Chiarcos /

of Annotation Model concepts and properties in ent values for 16 MorphosyntacticFeatures, 4 terms of the Reference Model. MorphosyntacticFeatures, 4 SyntacticFea- – Existing terminology repositories can be inte- tures and 4 SemanticFeatures (for glosses, grated as EXTERNAL REFERENCE MODELs, if part-of-speech annotation and for edge labels in syntax they are represented in OWL/DL. Then, Link- annotation). ing Models specify v relationships between Ref- As for morphological, morphosyntactic and syntac- erence Model concepts and External Reference tic annotations, the OLiA ontologies include 32 An- Model concepts. notation Models for about 70 different languages, in- cluding several multi-lingual annotation schemes, e.g., The OLiA Reference Model specifies classes for EAGLES [3] for 11 Western European languages, olia:Determiner linguistic categories (e.g., ) and and MULTEXT/East [7] for 15 (mostly) Eastern Eu- olia:Accusative grammatical features (e.g., ), as ropean languages. As for non-(Indo-)European lan- well as properties that define relations between these guages, the OLiA ontologies include morphosyntac- olia:hasCase (e.g., ). Far from being yet another tic annotation schemes for languages of the Indian annotation terminology ontology, the OLiA Reference subcontinent, for Arabic, Basque, Chinese, Estonian, Model does not introduce its own view on the linguis- Finnish, Hausa, Hungarian and Turkish, as well as tic world, but rather, it is a derivative of EAGLES [19], multi-lingual schemes applied to languages of Africa, MULTEXT/East [12], and GOLD [13] that was intro- the Americas, the Pacific and Australia. The OLiA on- duced as a technical means to interpret linguistic anno- tologies also cover historical language stages, includ- tations with respect to these terminological reposito- ing Old High German, Old Norse and Old/Classical ries, and further enriched with information drawn from Tibetan. Additionally, 7 Annotation Models for differ- the annotation schemes it was applied to. ent resources with discourse annotations have been de- Conceptually, Annotation Models differ from the veloped. Reference Model in that they include not only concepts External reference models currently linked to the and properties, but also individuals: Individuals rep- OLiA Reference Model include GOLD [3], the Onto- resent concrete tags, while classes represent abstract Tag ontologies [1], an ontological remodeling of ISO- concepts similar to those of the Reference Model. cat [4], and the Typological Database System (TDS) ontologies [25]. Unfortunately, neither of these exter- nal reference models can currently be redistributed be- 3. Data Set Description cause of an uncertain licensing situation (GOLD and ISOcat lack explicit license information)1 or other re- The OLiA ontologies are available from http: strictions (the OntoTag and TDS ontologies are avail- //purl.org/olia under a Creative Commons At- able to the author but have not been publicly released). tribution license (CC-BY). In this context, the function of the OLiA Reference The OLiA ontologies cover different grammatical Model is not to provide a novel and independent view phenomena, including inflectional morphology, word on linguistic terminology, but rather to serve as a sta- classes, phrase and edge labels of different syntax an- ble intermediate representation between (ontological notations, as well as prototypes for discourse annota- models of) annotation schemes and these terminology tions (coreference, discourse relations, discourse struc- repositories. This allows any concept that can be ex- ture and information structure). Annotations for lexical pressed in terms of the OLiA Reference Model also to semantics are only covered to the extent that they are be interpreted in the context of ISOcat, GOLD, Onto- encoded in syntactic and morphosyntactic annotation Tag or TDS. schemes. For lexical semantic annotations in general, As compared to a direct linking between annotation a number of reference resources is already available, models and these terminology repositories, the mod- including RDF versions of WordNet and FrameNet. ular structure limits the number of linkings that need In recent years, the OLiA ontologies have been to be defined (if a new Annotation Model is linked to substantially extended. At the time of writing, the Morpho- OLiA Reference Model distinguishes 14 1 logicalCategory Morpho- Nevertheless, the developers are sympathetic to the idea of s (morphemes), 263 releasing this data under an open license, Helen Aristar-Dry (for syntacticCategorys (word classes), 83 Syn- GOLD) and Menzo Windhouwer (for ISOcat), pers. communication, tacticCategorys (phrase labels), and 326 differ- June 2012. Chiarcos / 3 the Reference Model, it inherits its linking with ISO- cat, GOLD, OntoTag and TDS), and also, it provides stability (GOLD and ISOcat are developed in com- munity processes with occasional revisions), a clear and non-redundant taxonomical organization (similar to GOLD, TDS and OntoTag, but very different from the semi-structured ISOcat) and establishes interoper- ability between GOLD and ISOcat (that – despite on- going harmonization efforts [17] – are maintained by different communities and developed independently). Using the OLiA Reference Model, it is thus possible to develop applications that are interoperable in terms of GOLD and ISOcat even though both are still under de- velopment and both differ in their conceptualizations. Such applications are briefly described in the follow- Fig. 1. Interpreting annotations in terms of the OLiA Reference ing section. Model

not new) insight’ from the Potsdam Commentary 4. Application Corpus [30, file 4794], with part-of-speech annota- tions according to the STTS scheme [26]: The tag Initially, the OLiA ontologies have been intended PDAT matches the surface string of the individual to serve a documentation function, i.e., as a formal stts:PDAT from the STTS Annotation Model.4 The means to specify the semantics of annotation schemes superconcept stts:AttributiveDemonstra- [27]. From the ontologies, dynamic HTML can be gen- tivePronoun is a subconcept of olia:Demon- erated,2 and tags in the annotation can be represented strativeDeterminer (STTS Linking Model).5 as hyperlinks pointing to the corresponding definition The word diese ‘this’ from the example can thus be [9]. described in terms of the OLiA Reference Model as In earlier corpus query systems, e.g., ANNIS [8], olia:DemonstrativeDeterminer, etc. and SPLICR [23], OLiA was used to formulate in- These ontology-based descriptions are compara- teroperable corpus queries: Instead of querying for ble across different corpora and/or NLP tools, across cat="NX" to retrieve noun phrases from the TüBa- different languages, and even across different types D/Z corpus [31] or cat="NP" on the NEGRA cor- of language resources: Recently, the OLiA ontolo- pus [29, both are corpora of German newspaper text], gies have also been applied to represent grammatical a query for cat in {olia:NounPhrase} was specifications of machine-readable dictionaries, that expanded into a disjunction of possible tags [8]. If are thus interoperable with OLiA-linked corpora [20]. corpora are represented as Linked Data, they can be Moreover, through the linking with External Refer- directly linked with OLiA Annotation Models, and ence Models like GOLD and ISOcat, OLiA-linked re- queried with SPARQL without a query preprocessor sources are also interoperable with resources directly [5]. grounded in either GOLD or ISOcat. In a similar vein, OLiA can be employed in NLP Using Semantic Web formalisms to represent cor- pipeline systems and other NLP pipeline systems pora and annotations also provides us with the pos- for tagset-independent, interoperable information pro- sibility to develop novel, ontology-based NLP algo- cessing [1]. In this function, OLiA is part of the speci- rithms. One application are ensemble combination ar- fications of the NLP Interchange Format (NIF).3 chitectures, where different NLP modules (say, part- Figure 1 illustrates how annotations can be mapped of-speech taggers) are applied in parallel, so that they onto Reference Model concepts for the German phrase produce annotations for one particular phenomenon, Diese nicht neue Erkenntnis ‘this well-known (lit. and that these annotations are then integrated. Using

2http://code.google.com/p/ co-ode-owl-plugins/wiki/OWLDoc 4http://purl.org/olia/stts.owl 3http://nlp2rdf.org/nif-1-0 5http://purl.org/olia/stts-link.rdf 4 Chiarcos /

OLiA Reference Model specifications to integrate the disjunction (t). As compared to tagset-specific so- analyses of multiple NLP tools for German, [10] could lutions [24, | for ambiguities, and + for cliticiza- show that a simple majority-based combination could tion/fusion], OWL/DL provides a W3C-standardized increase both the robustness and the level of detail of vocabulary to express these relationships, that also morphosyntactic and morphological analyses. Similar extends beyond individual tagsets. Another differ- results have been obtained with the OntoTag ontolo- ence is that negation (owl:complementOf) is gies for Spanish [22]. available in the linking. This is of particular im- portance for the linking between External Refer- ence Models and the OLiA Reference Model. For 5. Discussion example, an olia:ProQuantifier (pronominal quantifier, can substitute for an independent noun This paper summarized the development of the phrase, e.g., someone) can be defined as subclass OLiA ontologies since 2006, their current status, and of gold:Quantifier. According to its definition, a number of applications that have been developed on however, gold:Quantifier primarily pertains to this basis. Determiners, so that a more appropriate superclass The fundamental idea of the OLiA architecture is would be gold:Quantifieru¬gold:Determiner. that annotation schemes are linked to community- The physical separation of Linking Models from maintained terminology repositories through an inter- Annotation Models and Reference Model introduces mediate ‘Reference Model’, thereby minimizing the a clear distinction between externally provided infor- number of mappings necessary establish interoperabil- mation and the ontology engineer’s interpretation. An- ity of one annotation scheme with multiple termi- notation Models formalize annotation documentation, nology repositories. Further, annotation schemes and and the Reference Model is based on a generalization their linking to the Reference Model are formalized as of a broad band-width of resources. However, there separate OWL/DL ontologies, so that interpretation- may be different terminological traditions involved, independent conceptualization (annotation documen- so that apparently similar concepts found in Refer- tation) and its interpretation in terms of the Reference ence Model and Annotation Model are in fact unre- Model (linking) are properly distinguished. lated. If nevertheless an incorrect identification takes The OLiA ontologies differ from related approaches place, the linking can be inspected by standard ontol- in that they take a focus on modeling annotation ogy browsers, and corrected independently from the schemes and their linking with reference categories interpretation-invariant Annotation Model and Refer- rather than merely providing reference categories. ence Model. Furthermore, multiple linkings between The differentiation of Annotation Models, the OLiA an Annotation Model and the Reference Model can be Reference Model and External Reference Models implemented, e.g., to accommodate for systematic tag- (community-maintained terminology repositories) rep- ger errors (i.e., more extensive usage of owl:join), resents increasing levels of abstraction, and, possibly, or for multiple dialects of the same tagset (e.g., the loss of information. However, no information about STTS tagset distinguishes indefinite attributive pro- the original annotation is lost, and tools may chose nouns in indefinite noun phrases [PIAT] and in defi- the appropriate level of abstraction. Unlike a direct nite noun phrases [PIDAT], but in the TüBa-D/Z cor- mapping approach, OLiA allows to recover informa- pus [31], PIAT covers both uses). tion about sources of mismatches between Reference In ISOcat, the problem of conflicting interpretations Model concepts and Annotation Model concepts, be- of data categories is currently not addressed, and the cause a declarative linking is provided that allows in- definitions provided are not always sufficient to distin- spection and refinement using standard RDF/OWL guish classes, e.g., the category definite/DC-2004 tools. is defined as ‘value referring to the capacity of identi- The relationship between annotations and refer- fication of an entity’. The concept is (at least partially) ence concept is not only represented in a transpar- grounded in MULTEXT/East [15], which, however, ent way, but also, conceptual mismatches can be rep- conflate different uses of ‘definite’: (1) postfixed De- resented. Many tagsets for part-of-speech annotation, terminer in Romanian, Bulgarian and Persian nouns or for example, introduce hybrid categories to represent adjectives, (2) difference between ‘full’ and ‘reduced’ either conceptual overlap/fusion or ambiguity using adjectives in Slavic (diachronically, full forms reflect OWL/DL constructs to represent conjunction (u) or a clitic pronominal), (3) a pattern of quantifier agree- Chiarcos / 5 ment in Slavic, and (4) the so-called ‘definite conjunc- plied to develop NLP applications on the basis of on- tion’ of Hungarian verbs. Even though the generic def- tological representations of linguistic annotations [22]. inition captures most of these different meanings, they One important difference is that the OntoTag ontolo- remain incompatible with each other. However, with- gies are considering only the languages of the Iberian out modeling relations between different language- peninsula (in particular Spanish), that they are partially specific annotation schemes and a data category reg- designed with a top-down perspective (whereas the de- istry from a global perspective, it is possible that such velopment of the OLiA Reference Model is guided by ill-defined data categories and/or links remain unde- the annotation schemes it is applied to) and are thus tected. Within MULTEXT/East, for example, only the richer in consistency constraints (that are, however, of- ontological modeling of language-specific annotation ten language-specific), and that the OntoTag ontolo- schemes and the common morphosyntactic specifica- gies are not publicly available at the moment. Within tions led to the proper differentiation between these the OLiA architecture, the morphosyntactic layer of different conceptions of ‘definite’ [7]. The OLiA Ref- the OntoTag ontologies is integrated as an External erence Model provides such a fully developed taxon- Reference Model [1]. omy of linguistic categories. Recent activities to aug- The OLiA ontologies may play an important role ment ISOcat with a relation category registry [28] may in NLP, corpus and annotation interoperability in that eventually lead to a comparable global perspective, so they relate these activities to initiatives in different lin- that the problem of conflicting interpretations of data guistic communities to establish reference repositories categories may become more obvious to ISOcat devel- for linguistic annotation terminology, e.g., recent de- opers, but these are still on-going developments. velopments towards the creation of a Linguistic Linked In comparison to GOLD, OLiA is more focused Open Data (LLOD) cloud. In this context, the OLiA on NLP and corpus interoperability, whereas GOLD ontologies are used to provide linguistic reference ter- originates from the language documentation commu- minology for lexical-semantic resources such as lemon nity. Therefore, a number of data categories com- [21] and Uby [11] as well as for linguistic corpora such monly assumed in NLP were not originally repre- as the Manually Annotated Sub-Corpus of the Open sented in GOLD. For example, gold:CommonNoun American National Corpus [16].6 was added only recently (between 2006 and 2008), fol- lowing a suggestion by the author. While the GOLD community process will eventually lead to a compen- Acknowledgments sation of such coverage issues, a more fundamental problem is that the views of academic linguists and The OLiA ontologies have originally been devel- NLP engineers may deviate with respect to the over- oped at the Collaborative Research Center (SFB) 441 arching taxonomy of concepts. GOLD, for example, “Linguistic Data Structures” (University of Tübingen) seems to conflate both semantic roles (‘case’ in the in the context of the project “Sustainability of Linguis- sense of [14], e.g., gold:BenefactiveCase) and tic Resources” in cooperation with SFB 632 “Infor- syntactic roles under gold:CaseProperty. There- mation Structure” (University of Potsdam, Humboldt- fore, OLiA adopts a relatively agnostic view on the University Berlin) and SFB 538 “Multilingualism” taxonomical order of concepts. While the taxonomy (University of Hamburg) from 2006 to 2008. From is modeled in a specific way (mostly following estab- 2007 to 2011, they have been maintained and further lished annotation schemes), it is not assumed that this developed in the context of SFB 632 in the context of way of modeling is the only possibility. In fact, alterna- the project “Linguistic Data Base”. In 2012, the author continued his research on OLiA in the context of Post- tive taxonomies can be formulated as External Refer- Doc fellowship at the Information Sciences Institute ence Models, and OWL/DL-based allows to formulate of the University of Southern California funded by the specific conditions for the linking, including the use of German Academic Exchange Service (DAAD). negation and disjunction. Consequently, mismatches can be represented. (As opposed to this, GOLD Com- munity of Practice Extensions are assumed to adopt the 6A Linked Data version of the corpus, MASC 1.0.3, was GOLD hierarchy and only to extend it, not to redefine generated from data available under http://datahub.io/ dataset/masc in preparation of the MLODE workshop. It is part it.) of the LLOD cloud diagram, but not yet linked with other resources Conceptually, the OLiA ontologies are closer re- as it will soon be deprecated by the release of a new version of the lated to the OntoTag ontologies [2], that were also ap- corpus and a revision of its data model. 6 Chiarcos /

In parts, this data set description is based on [6], [16] Ide N, Baker C, Fellbaum C, Fillmore C, Passonneau R (2008) shortened, updated and thoroughly revised. MASC: The Manually Annotated Sub-Corpus of American English. In: Proc. 6th Language Resources and Evaluation Conference (LREC 2008), Marrakesh, Morocco [17] Kemps-Snijders M (2010) RELISH: Rendering endangered References languages lexicons interoperable through standards harmoni- sation. In: 7th SaLTMiL Workshop on Creation and use of ba- [1] Buyko E, Chiarcos C, Pareja-Lora A (2008) Ontology-based sic lexical resources for less-resourced languages, held in con- interface specifications for a NLP pipeline architecture. In: junction with LREC 2010, Valetta, Malta Proc. LREC 2008, Marrakech, Morocco [18] Kemps-Snijders M, Windhouwer M, Wittenburg P, Wright S [2] Aguado de Cea G, Gomez-Perez A, Alvarez de Mon I, Pareja- (2009) ISOcat: Remodelling metadata for language resources. Lora A (2004) OntoTag’s linguistic ontologies. In: Proc. Infor- International Journal of Metadata, Semantics and Ontologies mation Technology: Coding and Computing (ITCC’04), Wash- 4(4):261–276 ington, DC, USA [19] Leech G, Wilson A (1996) EAGLES recommendations for the [3] Chiarcos C (2008) An ontology of linguistic annotations. LDV morphosyntactic annotation of corpora. URL http://www. Forum 23(1):1–16 ilc.cnr.it/EAGLES/annotate/annotate.html, [4] Chiarcos C (2010) Grounding an ontology of linguistic annota- version of March 1996 tions in the Data Category Registry. In: LREC 2010 Workshop [20] McCrae J, Spohr D, Cimiano P (2011) Linking lexical re- on and Language Technology Standards sources and ontologies on the semantic web with Lemon. The (LT<S), Valetta, Malta, pp 37–40 Semantic Web: Research and Applications pp 245–259 [5] Chiarcos C (2012) Interoperability of Corpora and Annota- [21] McCrae J, Montiel-Ponsoda E, Cimiano P (2012) Integrating tions. In: Chiarcos C, Nordhoff S, Hellmann S (eds) Linked and with lemon. Linked Data in Linguis- Data in Linguistics. Representing and Connecting Language tics pp 25–34 Data and Language Metadata, Springer, Heidelberg, pp 161– [22] Pareja-Lora A, Aguado de Cea G (2010) Ontology-based inter- 179 operation of linguistic tools for an improved lemma annotation [6] Chiarcos C (2012) Ontologies of linguistic annotation: Survey in Spanish. In: Proc. LREC 2010, Valetta, Malta and perspectives. In: Proceedings of the Eighth International [23] Rehm G, Eckart R, Chiarcos C (2007) An OWL-and XQuery- Conference on Language Resources and Evaluation (LREC), based mechanism for the retrieval of linguistic patterns from pp 303–310 XML-corpora. In: Proc. RANLP 2007, Borovets, Bulgaria [7] Chiarcos C, Erjavec T (2011) OWL/DL formalization of the [24] Santorini B (1990) Part-of-speech tagging guidelines for the MULTEXT-East morphosyntactic specifications. In: Proc. 5th Penn Project. Department of Computer and Informa- Linguistic Annotation Workshop, held in conjunction with tion Science, University of Pennsylvania, technical report MS- ACL-HTL 2011, Portland, pp 11–20 CIS-90-47 [8] Chiarcos C, Götze M (2007) A linguistic database with [25] Saulwick A, Windhouwer M, Dimitriadis A, Goedemans R ontology-sensitive corpus querying. system demonstration at (2005) Distributed tasking in ontology mediated integration Datenstrukturen für linguistische Ressourcen und ihre Anwen- of typological databases for linguistic research. In: Proc. dungen. Frühjahrstagung der Gesellschaft für Linguistische 17th Conf. on Advanced Information Systems Engineering Datenverarbeitung (GLDV 2007). T (CAiSE’05), Porto [9] Chiarcos C, Dipper S, Götze M, et al (2008) A flexible frame- [26] Schiller A, Teufel S, Stöckert C, Thielen C (1999) Guidelines work for integrating annotations from different tools and tag für das Tagging deutscher Textcorpora mit STTS. Tech. rep., sets. TAL (Traitement Automatique des Langues) 49(2) Universities of Stuttgart and Tübingen [10] Chiarcos C, Hellmann S, Nordhoff S (2011) Towards a linguis- [27] Schmidt T, Chiarcos C, Lehmberg T, Rehm G, Witt A, Hinrichs tic linked open data cloud: The open linguistics working group. E (eds) (2006) Avoiding data graveyards: From heterogeneous Traitement automatique des langues 52(3):245–275 data collected in multiple research projects to sustainable lin- [11] Eckle-Kohler J, McCrae J, Chiarcos C (this vol.) lemonUby - guistic resources, Proceedings of the E-MELD workshop on a large, interlinked, syntactically-rich resource for ontologies. Digital Language Documentation submitted to this volume [28] Schuurman I, Windhouwer M (2011) Explicit semantics for en- [12] Erjavec T (2004) MULTEXT-East version 3. In: Proc. LREC riched documents. What do ISOcat, RELcat and SCHEMAcat 2004, Lisboa, Portugal, pp 1535–1538 have to offer? In: Proc. 2nd Supporting Digital Humanities [13] Farrar S, Langendoen DT (2010) An OWL-DL implementa- conference (SDH 2011), Copenhagen, Denmark tion of GOLD: An ontology for the Semantic Web. In: Witt [29] Skut W, Brants T, Krenn B, Uszkoreit H (1998) A linguisti- AW, Metzing D (eds) Linguistic Modeling of Information and cally interpreted corpus of German newspaper text. In: Proc. Markup Languages: Contributions to Language Technology, ESSLLI Workshop on Recent Advances in Corpus Annotation, Springer, Dordrecht Saarbrücken, Germany [14] Fillmore CJ (1968) The case for case. In: Bach E, Harms R [30] Stede M (2004) The Potsdam Commentary Corpus. In: Proc. (eds) Universals in Linguistic Theory, Holt, Rinehart, and Win- ACL-2004 Workshop on Discourse Annotation, Barcelona, ston, New York, pp 1–88 Spain, pp 96–102 [15] Francopoulo G, Declerck T, Sornlertlamvanich V, et al (2008) [31] Telljohann H, Hinrichs EW, Kübler S (2003) Stylebook for the Data Category Registry: Morpho-syntactic and syntactic pro- Tübingen treebank of written German (TüBa-D/Z). Tech. rep., files. In: Proc. LREC-2008 Workshop on Uses and Usage of Seminar für Sprachwissenschaft, Universität Tübingen, Tübin- Language Resource-Related Standards, Marrakech, Morocco gen, Germany