Standardizing Lexical-Semantic Resources – Fleshing out the abstract standard LMF

Judith Eckle-Kohler‡, Iryna Gurevych†‡
‡ Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt
† Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information
www.ukp.tu-darmstadt.de

Abstract

This paper describes the application of the Lexical Markup Framework (LMF) for standardizing lexical-semantic resources in the context of NLP. More specifically, we highlight the question of how lexical-semantic resources can be made semantically interoperable by means of LMF and ISOCat. The LMF model UBY-LMF, an instantiation of LMF specifically for NLP, serves as an example to illustrate the path towards semantic interoperability of lexical resources.

1 Introduction

Lexical-semantic resources (LSRs) are used in major NLP tasks, such as word sense disambiguation and information extraction. In recent years, reusing and merging LSRs has gained significance, mainly because LSRs are expensive to build. Standardization of LSRs plays an important role in this context, because it facilitates the integration and merging of LSRs and makes their reuse easy. NLP systems that are built according to standards can simply plug in standardized LSRs and are thus able to switch easily between different standardized LSRs. In other words, standardizing LSRs makes them interoperable.

Two aspects of interoperability are to be distinguished in NLP: syntactic interoperability and semantic interoperability (Ide and Pustejovsky, 2011). While NLP systems can perform the same kind of processing with syntactically interoperable LSRs, there is no guarantee that the results can be interpreted the same way. Two syntactically interoperable LSRs might use the same term to denote different meanings. Semantically interoperable LSRs, on the other hand, use terms that share a common definition of their meaning. Consequently, NLP systems that switch between semantically interoperable LSRs can perform the same kind of processing, and the results produced can still be interpreted the same way.

In this paper, we focus on the question of how to achieve semantic interoperability by means of the ISO 24613:2008 LMF (Francopoulo et al., 2006) and ISOCat (http://www.isocat.org/). The comprehensive LMF lexicon model UBY-LMF (Eckle-Kohler et al., 2012) serves as an example to show how the abstract LMF standard is to be fleshed out and instantiated in order to make LSRs semantically interoperable for NLP purposes.

UBY-LMF covers very heterogeneous LSRs in two languages, English and German, and has been used to standardize a range of LSRs resulting in the large-scale LSR UBY (Gurevych et al., 2012), see http://www.ukp.tu-darmstadt.de/uby/. UBY currently contains ten resources in two languages: English WordNet (Fellbaum, 1998), Wiktionary (http://www.wiktionary.org/), Wikipedia (http://www.wikipedia.org/), FrameNet (Baker et al., 1998), and VerbNet (Kipper et al., 2008); German Wiktionary, Wikipedia, GermaNet (Kunze and Lemnitzer, 2002) and IMSlex-Subcat (Eckle-Kohler, 1999); and the English and German entries of OmegaWiki (http://www.omegawiki.org/).
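The difference between syntactic and semantic interoperability can be made concrete with a minimal Python sketch. It assumes two hypothetical resources whose linguistic terms are each linked to a definition. The resource contents and the function names are invented for illustration. The ISOCat URLs are the ones quoted elsewhere in this paper, but which resource links to which URL is an assumption.

```python
# Illustrative sketch only: the two resources and their term-to-definition
# links are hypothetical. The ISOCat URLs are the ones quoted in this paper.
DC_POS = "http://www.isocat.org/datcat/DC-396"
DC_DIROBJ_PASSIVE = "http://www.isocat.org/datcat/DC-1274"
DC_DIROBJ_PLAIN = "http://www.isocat.org/datcat/DC-2263"

# Both resources use the same terms, so a system can process them in the
# same way (syntactic interoperability).
resource_a = {"partOfSpeech": DC_POS, "directObject": DC_DIROBJ_PASSIVE}
resource_b = {"partOfSpeech": DC_POS, "directObject": DC_DIROBJ_PLAIN}


def shared_terms(a, b):
    """Return the terms that occur in both resources, sorted by name."""
    return sorted(set(a) & set(b))


def divergent_terms(a, b):
    """Return shared terms whose linked definitions differ."""
    return [term for term in shared_terms(a, b) if a[term] != b[term]]


print(shared_terms(resource_a, resource_b))
print(divergent_terms(resource_a, resource_b))
```

In this sketch both resources share the terms partOfSpeech and directObject, but only partOfSpeech points to the same definition in both. Results involving directObject therefore cannot be interpreted the same way across the two resources.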


Proceedings of KONVENS 2012 (SFLR 2012 workshop), Vienna, September 21, 2012

2 LMF and semantic interoperability

First, we give an overview of the LMF standard and briefly describe how to use it. We put a special focus on the question of how LSRs can be made semantically interoperable by means of LMF.

LMF – an abstract standard. LMF defines a meta-model of lexical resources, covering both NLP lexicons and machine readable dictionaries. The standard specifies this meta-model in the Unified Modeling Language (UML) by providing a set of UML diagrams. UML packages are used to organize the meta-model, and each diagram given in the standard corresponds to a UML package. LMF defines a mandatory core package and a number of extension packages for different types of resources, e.g., morphological resources. The core package models a lexicon in the traditional headword-based fashion, i.e., organized by lexical entries. Each lexical entry is defined as the pairing of one to many forms and zero to many senses.

Instantiating LMF. The abstract meta-model given by the LMF standard is not immediately usable as a format for encoding (i.e., converting) an existing LSR (Tokunaga et al., 2009). It has to be instantiated first, i.e., a full-fledged lexicon model has to be developed by choosing LMF classes and by specifying suitable attributes for these LMF classes.

According to the standard, developing a lexicon model involves

1. selecting classes from the UML packages,
2. defining attributes for these classes and
3. linking the attributes and other linguistic terms introduced (e.g., attribute values) to standardized descriptions of their meaning.

Selecting a combination of LMF classes from the LMF core package and from the extension packages establishes the structure of a lexicon model. While the LMF core package models a lexicon in terms of lexical entries, the LMF extensions provide classes for different types of lexicon organization, e.g., covering the synset-based organization of wordnets or the semantic frame-based organization of FrameNet.

Fixing the structure of a lexicon model by choosing a set of classes contributes to syntactic interoperability of LSRs, as it fixes the high-level organization of lexical knowledge in an LSR, e.g., whether synonymy is encoded by grouping senses into synsets (using the Synset class) or by specifying sense relations (using the SenseRelation class), which connect synonymous senses.

Defining attributes for the LMF classes and specifying the attribute values is far more challenging than choosing from a given set of classes, because the standard gives only a few examples of attributes and leaves the specification of attributes to the user in order to allow maximum flexibility.

Finally, the attributes and values have to be linked to a description of their meaning in an ISO 12620:2009 compliant Data Category Registry (DCR), see (Broeder et al., 2010). ISOCat is the implementation of the ISO 12620:2009 DCR providing descriptions of terms used in language resources.

These descriptions in ISOCat are standardized, i.e., they comply with a predefined format and provide some mandatory information types, including a unique administrative identifier (e.g., partOfSpeech) and a unique and persistent identifier (PID, e.g., http://www.isocat.org/datcat/DC-396) which can be used to link to the descriptions. The standardized descriptions of terms are called Data Categories (DCs).

Semantic Interoperability. Connecting the linguistic terms used for attributes and their values in a lexicon model with their meaning defined externally in ISOCat contributes to semantic interoperability of LSRs (see also Windhouwer and Wright (2012)). The definitions of DCs in ISOCat constitute an interlingua that can be used to map idiosyncratically used linguistic terms to a set of reference definitions (Chiarcos, 2010). Different LSRs that share a common definition of their linguistic vocabulary are said to be semantically interoperable (Ide and Pustejovsky, 2010).

Consider as an example the LexicalEntry class of two different lexicon models A and B. Lexicon model A could have an attribute partOfSpeech (POS), while lexicon model B could have an attribute pos. Linking both attributes to the meaning “A category assigned to a word based on its grammatical and semantic properties.” given in ISOCat (http://www.isocat.org/datcat/DC-396) makes the two lexicon models semantically interoperable with respect to the POS attribute.

A human can look up the meaning of a term occurring in a lexicon model by following the link to the ISOCat DC and consulting its description in ISOCat. Linking the attributes and their values to ISOCat DCs results in a so-called Data Category Selection.

It is important to stress that the notion of “semantic interoperability” in the context of LMF has a limited scope: it only refers to the meaning of the linguistic vocabulary used in an LMF lexicon model – not to the meaning of the lexemes listed in an LSR.

3 UBY-LMF – Instantiating LMF

Since only a fleshed-out LMF lexicon model, i.e., an instantiation of the LMF standard, can be used for actually standardizing LSRs, LMF-compliant LSRs are not necessarily interoperable, neither syntactically nor semantically.

Therefore it is important to develop a single, comprehensive instantiation of LMF, which can immediately be used for standardizing LSRs. The LMF lexicon model UBY-LMF strives to be such a comprehensive instantiation of LMF to be used in NLP.

3.1 UBY-LMF characteristics

UBY-LMF covers a wide range of lexical information types, since it has been designed as a uniform format for standardizing heterogeneous types of LSRs, including both expert-constructed resources – wordnets, FrameNet, VerbNet – and collaboratively constructed resources – Wikipedia, Wiktionary, OmegaWiki.

In UBY-LMF, there is one Lexicon per integrated resource, i.e., one Lexicon for FrameNet, one for WordNet and so on. This way, the Lexicon instances can be aligned at the sense level by linking pairs of senses or synsets using instances of the SenseAxis class.

The full model consists of 39 classes and 129 attributes. Please refer to (Eckle-Kohler et al., 2012) and (Gurevych et al., 2012) for detailed information on UBY-LMF and the corresponding large-scale LSR UBY.

UBY-LMF is represented by a DTD, so that any given resource can be automatically converted into the corresponding XML format. Converters for ten LSRs to UBY-LMF format are publicly available on Google Code, see http://code.google.com/p/uby/.

3.2 UBY-LMF attributes

In UBY-LMF, the definition of attributes for the LMF classes was guided by two requirements that we identified as important in the context of NLP: (i) comprehensiveness and (ii) extensibility (Eckle-Kohler et al., 2012).

Comprehensiveness implies that the model should be able to represent all the lexical information present in a wide range of LSRs, because NLP applications usually require different types of lexical knowledge and it is difficult to decide in advance which type of lexical information will be useful for a particular NLP application.

Extensibility is also crucial in the NLP domain, because UBY-LMF should be applicable across languages (Gurevych et al., 2012), (Eckle-Kohler and Gurevych, 2012) and should also be able to accommodate automatically extracted lexical-semantic knowledge.

4 UBY-LMF and semantic interoperability

In Section 2, we have pointed out that linking attributes and values used in an LMF lexicon model to DCs in ISOCat is crucial for semantic interoperability.

Now we will take a closer look at the semantic interoperability of UBY-LMF compliant resources by describing in detail the grounding of UBY-LMF attributes and values in the ISOCat repository. First, we introduce ISOCat; then, we describe the process of selecting and creating an ISOCat Data Category Selection for UBY-LMF; and finally, we look at some limitations of the current version of UBY-LMF.

4.1 Overview of ISOCat

The Data Category Registry ISOCat is a collaboratively constructed repository where everybody


can register and create DCs. Users can assign their DCs to so-called Thematic Domains, such as Morphosyntax, Syntax or Lexical Resources. The ISOCat Web interface provides a form that guides the user through the process of creating a DC, also indicating which kind of information is mandatory. The well-formedness of a newly created DC is automatically checked and displayed by a flag – a green marking indicating well-formedness.

Users can also group DCs, including self-created ones, into a Data Category Selection. Typically, Data Category Selections are created for projects or resources, e.g., there are Data Category Selections for RELISH (http://tla.mpi.nl/relish/) or CLARIN (http://www.clarin.eu) or for resources, such as the STTS tagset (http://www.isocat.org/rest/dcs/376) or UBY (http://www.isocat.org/rest/dcs/484). Data Category Selections can be made publicly available, in order to allow for linking to particular DCs.

It is possible to submit subsets of well-formed DCs to standardization. While the standardization of ISOCat DCs has not yet started, standardized DCs might be important for resources where sustainability is an issue, because standardized DCs can be considered stable. Non-standardized DCs, on the other hand, could in principle be changed at any time by their owners, which might also involve changes in their meaning.

Two types of DCs distinguished in ISOCat are relevant for LMF lexicon models (Windhouwer and Wright, 2012): first, complex DCs which have a typed value domain and typically correspond to attributes in an LMF lexicon model. According to the size of the value domain, these DCs are classified further into open DCs (they can take an arbitrary number of values), closed DCs (their values can be enumerated) and constrained DCs (the number of values is too large to be enumerated, yet constrained). Second, there are simple DCs which describe values of a closed DC.

The attributes and values defined in UBY-LMF refer to 175 ISOCat DCs; most of them are simple DCs. The corresponding Data Category Selection is publicly available in ISOCat, see http://www.isocat.org/rest/dcs/484. Figure 1 shows a screenshot of the public Data Category Selection for UBY as of 2012.

4.2 UBY-LMF: Selecting DCs from ISOCat

Selecting DCs from ISOCat which are suitable descriptions of terms used in a lexicon model such as UBY-LMF is a task that requires the identification of fine-grained and subtle differences in meaning and hence is a task where humans are superior to machines in terms of quality. For UBY-LMF, the intended meanings of the lexicon model terms and the textual descriptions of the DCs were manually compared in order to first identify candidate DCs with equivalent or similar meaning and then to select one of the candidate DCs as reference for a specific term.

Searching for DCs in ISOCat. Identifying candidate DCs in ISOCat means accessing the ISOCat Web interface (http://www.isocat.org/interface/index.html) and searching for the lexicon model term or variants thereof. Searching for candidate DCs in ISOCat is time-consuming and not particularly user-friendly, because there are many DCs with similar names and equivalent or near-equivalent meaning. Currently ISOCat does not display relations between such terms (e.g., the equivalence relation).

This is offered to a limited extent by RelCat, a companion registry to ISOCat (Windhouwer, 2012). However, RelCat is still at an early stage of development and currently provides only relations between selected Data Category Selections, such as GOLD (http://linguistics-ontology.org/) and RELISH. Therefore, we did not use it for the selection of DCs for UBY-LMF.

There are basically two ways of looking for DCs in ISOCat: first, by entering a search term and second, by browsing Thematic Domains, such as Syntax, or by browsing Data Category Selections published by particular groups or projects, such as CLARIN or RELISH.

Choosing among several candidate DCs. Typically, an ISOCat search query for a term (using the option exact match) yields a list of DCs that have slightly different names, but are very similar in meaning. Consider as an example the linguistic


term determiner, which is a possible value of the attribute partOfSpeech in UBY-LMF. There are 25 DCs in ISOCat that exactly match the term determiner.

Figure 1: Screenshot of the public Data Category Selection “Uby 2012” in ISOCat, see http://www.isocat.org/interface/index.html.

Another example is the term direct object that is sometimes used for specifying the accusative noun phrase argument of transitive verbs in German. In ISOCat, there are two different specifications of this term, one explicitly stating that this accusative noun phrase argument can become the clause subject in passivization (http://www.isocat.org/datcat/DC-1274), the other not mentioning passivization at all (http://www.isocat.org/datcat/DC-2263).

We tried to select a DC with a meaning as close as possible to the intended meaning in UBY-LMF. When selecting a particular DC from a list of candidates, we followed two simple strategies:

• We sorted the search results by Owner, i.e., by the person who created the DC, and preferred DCs which are owned by experts, either in the standardization community or in the linguistics community.
• We preferred those DCs that had passed the ISOCat test for well-formedness.

Sometimes, selecting an existing DC from ISOCat involved making compromises. For instance, the attribute value taxonomic of the attribute relType of the SenseRelation class links to the ISOCat DC taxonomy (http://www.isocat.org/rest/dc/4039) which has a narrower meaning than the meaning intended in UBY-LMF: While taxonomic in UBY-LMF denotes a type of sense relation that defines a taxonomy in a general sense, covering


all kinds of lexemes (see Cruse (1986)), the DC 4039, taxonomy, is restricted to controlled vocabulary terms organized in a taxonomy.

Note that many attributes or attribute values in UBY-LMF refer to ISOCat DCs with a different (so-called admitted) name. This is explicitly supported by ISOCat, because each DC definition may optionally contain Data Element Name Sections in order to record other names for the DC as used in different sources, such as a given database, format or application. In this manner, the number of parallel DCs in ISOCat with equivalent meaning can be limited. We left the specification of Data Element Name Sections in the newly created DCs (holding a deviating UBY name) to future work.

Semantic Drift. As ISOCat is a collaboratively constructed and thus dynamically evolving repository, DCs that are not standardized yet could change their meaning over time, e.g., they might be further fleshed out and specified in more detail.

Since the public release of UBY in March 2012, we have already encountered such a case of semantic drift for the DC verbFormMood (see http://www.isocat.org/rest/dc/1427) which we selected as a description of the attribute verbForm attached to the class SyntacticCategory in the Syntax part of UBY-LMF.

The DC verbFormMood has been changed after March 2012 (according to the change log, this DC was changed on 2012-06-09) and is no longer similar enough to the intended meaning. Therefore, we will create a new DC which specifies verbForm as a property of verb phrase complements in the context of subcategorization.

4.3 UBY-LMF: Creating new ISOCat DCs

The majority of attributes and values in UBY-LMF refer to already existing ISOCat DCs. Yet, we had to create 38 new DCs in ISOCat, as particular definitions were missing. We decided to create new DCs in those domains where much standardization work has already been done or large sources of linguistic expert knowledge are available. In particular, these are the domains lexical syntax (related to subcategorization), derivational morphology, and frame semantic information.

We used the following expert-based sources as references for newly created DCs:

• the EAGLES synopsis on morphosyntactic phenomena (http://www.ilc.cnr.it/EAGLES96/morphsyn/) (Calzolari and Monachini, 1996), as well as the EAGLES recommendations on subcategorization (http://www.ilc.cnr.it/EAGLES96/synlex/), which have been used to identify DCs relevant for lexical syntax
• traditional grammars such as the English grammar by Quirk (1980)
• a Web-based encyclopedia of linguistic terms, collaboratively built by linguists, see http://www.glottopedia.de
• the detailed documentation of the FrameNet resource (Ruppenhofer et al., 2010)

Fifteen of the newly created DCs are required for representing subcategorization frames in UBY-LMF. Most of these DCs are simple DCs that are linked to attribute values in UBY-LMF. The corresponding DCs were either missing in ISOCat or there were only DCs with a related, but not sufficiently similar meaning.

For instance, we created a DC for to-infinitive complements as in He tried to address all questions. While both infinitive and infinitiveParticle are present in ISOCat, none of the available definitions of infinitive explicitly excludes the presence of a particle, neither http://www.isocat.org/datcat/DC-1312 nor http://www.isocat.org/datcat/DC-2753. What is required, however, for the specification of verbal complements are individual DCs for bare infinitives used without particle and for infinitives used with particle (i.e., to in English).

Ten newly created DCs are necessary to represent FrameNet in LMF and primarily describe specific properties of frame-semantic frames, e.g., coreness, incorporated semantic argument or transparent meaning.

4.4 UBY-LMF: Limitations of semantic interoperability

The semantic interoperability of a UBY-LMF compliant resource is naturally limited to those lexical domains within UBY-LMF where attributes and


their values are fleshed out at a fine-grained level and refer to ISOCat DCs. Consequently, attributes of UBY-LMF classes that are string-valued currently limit the semantic interoperability of UBY-LMF. There are three main areas in UBY-LMF where attributes or their values are currently string-valued due to a lack of standardization: first, the names of semantic relations, second, the names of semantic roles, and third, the types of arbitrary semantic classification schemes, as well as the labels used in these schemes.

Semantic Relations. Names of semantic relations are values of the attribute relName of the SenseRelation and SynsetRelation classes. It is not clear whether the set of possible names for semantic relations is restricted, considering the names of semantic relations found in Wiktionary, OmegaWiki, WordNet and FrameNet. For instance, in OmegaWiki, relations such as “works in a”, “partners with”, “is practiced by a”, “orbits around” (for moons moving around a planet) are listed. These do not fit into the set of classical lexical-semantic relations described by Cruse (1986).

Semantic Roles. Names of semantic roles are values of the attribute semanticRole of the SemanticArgument class. Although DCs for semantic roles have been proposed by Schiffrin and Bunt (2007), we have not entered them into ISOCat, because there is still ongoing work in ISO committees on the standardization of semantic roles.

Semantic classification schemes. Finally, names of semantic classification schemes and the labels used in these schemes are values of the attributes type and label of the SemanticLabel class. On the one hand, the type attribute covers such divergent classification schemes as Wikipedia categories, WordNet semantic fields (i.e., the lexicographer file names), VerbNet classes, selectional preference schemes or the register classification which is present in Wiktionary.

On the other hand, the string values of the label attribute are even more diverse. For instance, the selectional preference schemes in FrameNet and VerbNet employ different label sets for selectional preferences. While VerbNet makes use of a combination of high-level concepts from the EuroWordNet hierarchy (Kipper-Schuler, 2005), e.g., +animate & +organization, FrameNet selectional preferences (called ontological types in FrameNet) correspond to synset nodes of WordNet (Ruppenhofer et al., 2010), such as Message, Container, Speed.

5 Discussion

In this section, we first compare UBY-LMF as an LMF instantiation with the TEI guidelines for Dictionaries. For a detailed discussion of related work, please refer to (Eckle-Kohler et al., 2012). Then, we give an outlook on future work on providing alternative formats for UBY-LMF which make the linking to ISOCat accessible to larger communities.

5.1 The TEI guidelines for Dictionaries

The Text Encoding Initiative (TEI) provides guidelines which define standardized representation formats for various kinds of digitized texts in the Digital Humanities. In particular, the TEI provides guidelines for Dictionaries which are in principle applicable to all kinds of lexical resources, but primarily to those that are intended for human use.

Romary (2010a) has suggested developing the TEI guidelines for Dictionaries into a full LMF instantiation. In their current form, the TEI guidelines already cover core classes of LMF which are required for representing dictionaries. Turning the TEI guidelines for Dictionaries into an LMF instantiation would first require selecting the subset of these guidelines that can be mapped to LMF and then extending the guidelines, e.g., in order to also cover lexical syntax in more detail (Romary, 2010b).

We did not follow this proposal for UBY-LMF, however, because we aimed at representing very heterogeneous resources, which are important for NLP systems, by a single, uniform lexicon model. This particular goal could be achieved mainly due to the high degree of flexibility offered by the LMF standard. Using the TEI guidelines as a starting point seemed to bring about too many


constraints, because many aspects of the lexicon model would have been fixed already.

5.2 Linking to ISOCat – Outlook

The actual representation format for the linking of UBY-LMF terms and ISOCat DCs should be useful both for humans and for machines. For humans, it is helpful to be able to look up the meaning of the terms used in UBY by reading the descriptions of these terms in ISOCat. Machines, on the other hand, can determine the degree of semantic interoperability of two lexical resources by comparing the ISOCat PIDs each lexical resource links to.

At the time of writing this paper, UBY-LMF is the only LMF lexicon model with a publicly accessible Data Category Selection in ISOCat. We hope that Data Category Selections used in other LMF models will be added to ISOCat soon, because referring to ISOCat DCs is the only way to automatically determine the degree of semantic interoperability of any two LMF-compliant LSRs.

Currently, UBY-LMF provides human-readable links to ISOCat DCs at the schema level (as suggested by Windhouwer and Wright (2012)), i.e., at the level of the UBY-LMF DTD. Pairs of attributes or attribute values and ISOCat DCs are listed in XML comments. As part of future work, we plan to provide additional versions of the UBY-LMF model with machine-readable links to ISOCat, in particular an XSD version as well as an RDFS version for the Semantic Web community.

In addition, we plan to create a metadata instance for UBY-LMF based on the component metadata infrastructure which has been adopted by large communities developing infrastructures of language resources, such as CLARIN and META-SHARE (Broeder et al. (2012), Labropoulou and Desipri (2012), Gavrilidou et al. (2011)). Such a metadata instance specifies the characteristics of UBY-LMF, also including the linking of UBY-LMF terms to ISOCat.

6 Conclusion

We have shown how LMF contributes to semantic interoperability of LSRs, which is crucial for NLP systems using LSRs. The two key aspects are first, using a single instantiation of LMF for multiple resources, such as UBY-LMF, and second, establishing a linking between an LMF lexicon model and ISOCat DCs.

Many DCs required for LSRs are still missing in ISOCat, mainly due to ongoing standardization in various areas. Yet, for UBY-LMF, we have fleshed out a Data Category Selection of considerable size and detail in such areas as morphosyntax and subcategorization. We made the UBY-LMF Data Category Selection available in ISOCat as a starting point for further work on standardizing lexical resources according to LMF.

Acknowledgments

This work has been supported by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806. We thank Silvana Hartmann, Michael Matuschek and Christian M. Meyer for their contributions to this work.

References

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL’98), pages 86–90, Montreal, Canada.

Daan Broeder, Marc Kemps-Snijders, Dieter Van Uytvanck, Menzo Windhouwer, Peter Withers, Peter Wittenburg, and Claus Zinn. 2010. A Data Category Registry- and Component-based Metadata Framework. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC), pages 43–47, Valletta, Malta.

Daan Broeder, Dieter van Uytvanck, Maria Gavrilidou, Thorsten Trippel, and Menzo Windhouwer. 2012. Standardizing a component metadata infrastructure. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 1387–1390, Istanbul, Turkey.

Nicoletta Calzolari and Monica Monachini. 1996. EAGLES Proposal for Morphosyntactic Standards: in view of a ready-to-use package. In G. Perissinotto, editor, Research in Humanities Computing, volume 5, pages 48–64. Oxford University Press, Oxford, UK.

Christian Chiarcos. 2010. Towards robust multi-tool tagging. An OWL/DL-based approach. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 659–670. Association for Computational Linguistics.

D.A. Cruse. 1986. Lexical Semantics. Cambridge Textbooks in Linguistics. Cambridge University Press.

Judith Eckle-Kohler and Iryna Gurevych. 2012. Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 550–560, Avignon, France.

Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, and Christian M. Meyer. 2012. UBY-LMF - A Uniform Format for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 275–282, Istanbul, Turkey.

Judith Eckle-Kohler. 1999. Linguistisches Wissen zur automatischen Lexikon-Akquisition aus deutschen Textcorpora. Logos-Verlag, Berlin, Germany. PhD Thesis.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, USA.

Gil Francopoulo, Nuria Bel, Monte George, Nicoletta Calzolari, Monica Monachini, Mandy Pet, and Claudia Soria. 2006. Lexical Markup Framework (LMF). In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC), pages 233–236, Genoa, Italy.

Maria Gavrilidou, Penny Labropoulou, Stelios Piperidis, Monica Monachini, Francesca Frontini, Gil Francopoulo, Victoria Arranz, and Valérie Mapelli. 2011. A Metadata Schema for the Description of Language Resources (LRs). In Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm, pages 84–92, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, and Christian Wirth. 2012. Uby - A Large-Scale Unified Lexical-Semantic Resource. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 580–590, Avignon, France.

Nancy Ide and James Pustejovsky. 2010. What Does Interoperability Mean, anyway? Toward an Operational Definition of Interoperability. In Proceedings of the Second International Conference on Global Interoperability for Language Resources, Hong Kong.

Nancy Ide and James Pustejovsky. 2011. Issues and Strategies for Interoperability among NLP Software Systems and Resources. Tutorial at the 5th International Joint Conference on Natural Language Processing (IJCNLP).

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A Large-scale Classification of English Verbs. Language Resources and Evaluation, 42:21–40.

Karin Kipper-Schuler. 2005. VerbNet: A broad-coverage, comprehensive verb lexicon. PhD Thesis. Computer and Information Science Dept., University of Pennsylvania, Philadelphia, PA.

Claudia Kunze and Lothar Lemnitzer. 2002. GermaNet representation, visualization, application. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), pages 1485–1491, Las Palmas, Canary Islands, Spain.

Penny Labropoulou and Elina Desipri. 2012. Documentation and User Manual of the META-SHARE Metadata Model, http://www.meta-net.eu.

Randolph Quirk. 1980. A Grammar of contemporary English. Longman.

Laurent Romary. 2010a. Standardization of the formal representation of lexical information for NLP. In Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent developments with special focus on computational lexicography. Mouton de Gruyter.

Laurent Romary. 2010b. Using the TEI framework as a possible serialization for LMF. Presentation at the RELISH workshop, August.

Josef Ruppenhofer, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. 2010. FrameNet II: Extended Theory and Practice.

Amanda Schiffrin and Harry Bunt. 2007. Documented compilation of semantic data categories. LIRICS Deliverable D4.3, http://lirics.loria.fr/documents.html.

Takenobu Tokunaga, Dain Kaplan, Nicoletta Calzolari, Monica Monachini, Claudia Soria, Virach Sornlertlamvanich, Thatsanee Charoenporn, Yingju Xia, Chu-Ren Huang, Shu-Kai Hsieh, and Kiyoaki Shirai. 2009. Query Expansion using LMF-Compliant Lexical Resources. In Proceedings of the 7th Workshop on Asian Language Resources, pages 145–152, Suntec, Singapore.

Menzo Windhouwer and Sue Ellen Wright. 2012. Linking to linguistic data categories in ISOcat. Pages 99–107. Springer, Heidelberg.


Menzo Windhouwer. 2012. RELcat: a Relation Registry for ISOcat data categories. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 3661–3664, Istanbul, Turkey.
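Section 5.2 states that machines can determine the degree of semantic interoperability of two lexical resources by comparing the ISOCat PIDs each resource links to. The paper does not define how this degree is computed. The following Python sketch therefore assumes one possible measure, the Jaccard overlap of the two PID sets. The function name and the two example PID sets are hypothetical. Only the PID URLs themselves are quoted from this paper.

```python
# Illustrative sketch only: the paper does not specify a formula for the
# "degree" of semantic interoperability. Jaccard overlap of the linked PID
# sets is an assumption made for this example.
def interoperability_degree(pids_a, pids_b):
    """Return |A & B| / |A | B| for two collections of ISOCat PIDs."""
    a, b = set(pids_a), set(pids_b)
    union = a | b
    if not union:
        # Two models without any links share no documented vocabulary.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical PID sets for two lexicon models.
model_a_pids = {
    "http://www.isocat.org/datcat/DC-396",
    "http://www.isocat.org/datcat/DC-1274",
}
model_b_pids = {
    "http://www.isocat.org/datcat/DC-396",
    "http://www.isocat.org/datcat/DC-2263",
}

print(interoperability_degree(model_a_pids, model_b_pids))
```

A set-based measure of this kind only rewards identical PIDs. Two DCs with near-equivalent meaning but different PIDs count as a mismatch, which is the situation that relation registries such as RelCat are meant to address.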
