Proceedings of KONVENS 2012 (Main Track: Poster Presentations), Vienna, September 19, 2012
Navigating Sense-Aligned Lexical-Semantic Resources: The Web Interface to UBY

Iryna Gurevych¹,², Michael Matuschek¹, Tri-Duc Nghiem¹, Judith Eckle-Kohler¹, Silvana Hartmann¹, Christian M. Meyer¹

¹Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt
²Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information
http://www.ukp.tu-darmstadt.de

Abstract

In this paper, we present the Web interface to UBY, a large-scale lexical resource based on the Lexical Markup Framework (LMF). UBY contains interoperable versions of nine resources in two languages. The interface makes it convenient to examine and navigate the information encoded in UBY across resource boundaries. Its main contributions are twofold: 1) The visual view lets users examine the sense clusters for a lemma that are induced by alignments between different resources at the level of word senses. 2) The textual view uniformly presents senses from different resources in detail and makes it possible to compare them directly in a parallel view. The Web interface is freely available on our website¹.

1 Introduction

Lexical-semantic resources (LSRs) are the foundation of many Natural Language Processing (NLP) tasks. Recently, the limited coverage of LSRs has led to a number of independent efforts to align existing LSRs at the word sense level. However, it is very inconvenient to explore the resulting sense-aligned LSRs, because there are no APIs or user interfaces (UIs) and the data formats are heterogeneous. Yet, easy access to sense-aligned LSRs would be crucial for their acceptance and use in NLP, as researchers face the problem of determining the added value of sense-aligned LSRs for particular tasks.

In this paper, we address these issues by presenting UBY-UI, an easy-to-use Web-based UI to the large sense-aligned LSR UBY (Gurevych et al., 2012). UBY is represented in compliance with the ISO standard LMF (Francopoulo et al., 2006) and currently contains interoperable versions of nine heterogeneous LSRs in two languages, as well as pairwise sense alignments for a subset of them: English WordNet (WN), Wiktionary (WKT-en), Wikipedia (WP-en), FrameNet (FN), and VerbNet (VN); German Wiktionary (WKT-de), Wikipedia (WP-de), and GermaNet (GN); and the English and German entries of OmegaWiki (OW-en/de).

The novel aspects of our interface can be summarized as follows: 1) A graph-based visualization of sense alignments between the LSRs integrated in UBY. Different senses of the same lemma which are aligned across LSRs are grouped. This makes it intuitive to explore and assess the individual senses across resource boundaries. 2) A textual view for uniformly examining lexical information in detail. For a given lemma, all senses available in UBY can be retrieved and the information attached to them can be inspected in detail. Additionally, this view lets users compare any two senses in a detailed contrasting view.

2 Related Work

Single Resource Interfaces. Web interfaces have been traditionally used for electronic dictionaries, such as the Oxford Dictionary of English. Lew (2011) reviews the interfaces of the most prominent English dictionaries. These interfaces have also largely influenced the development of Web interfaces for LSRs, such as the ones for WN, FN, WKT, or the recently presented DANTE (Kilgarriff, 2010), which directly adapted the dictionary interface models. All of these Web interfaces have been designed in strict adherence to a specific, single LSR. The UBY-UI is, in contrast, designed for multiple heterogeneous LSRs.

Multi Resource Interfaces. Only a few other Web interfaces are able to display information from multiple LSRs. The majority of them are limited to showing preformatted lexical entries one after another without interconnecting them. Popular examples are Dictionary.com² and TheFreeDictionary³. Similarly, the DWDS interface (Klein and Geyken, 2010) displays its entries in small rearrangeable boxes. The Wörterbuchnetz (Burch and Rapp, 2007) is an example of a Web interface that connects its entries by hyperlinks, but only at the level of lemmas and not word senses. In contrast, UBY-UI provides hyperlinks to navigate between different word senses, as UBY provides mono- and cross-lingual sense alignments between its LSRs. Additionally, UBY-UI supports the direct comparison of two arbitrary word senses present in UBY. In the other multi-resource interfaces, this is only possible for whole lexical entries.

Graph-based Interfaces. Two examples for visualizing WN are Visuwords⁴ and WordNet explorer⁵, which allow browsing the WN synset structure. An example of a cross-lingual graph-based interface is VisualThesaurus⁶, which shows related words in six different languages. UBY-UI provides a similar graph-based interface, but combines the information from multiple types of LSRs interlinked by means of sense alignments.

3 UBY – A Sense-Aligned LSR

LMF Model. The large-scale multilingual resource UBY holds standardized and hence interoperable versions of the nine LSRs previously listed. UBY is represented according to the UBY-LMF lexicon model (Eckle-Kohler et al., 2012), an instantiation and extension of the meta lexicon model defined by LMF. Developing a lexicon model such as UBY-LMF involves first selecting appropriate classes (e.g. LexicalEntry) from the LMF packages, second defining attributes for these classes (e.g. part of speech), and third linking the attributes and other linguistic terms (such as attribute values) to ISOCat⁷. UBY-LMF is capable of representing a wide range of information types from heterogeneous LSRs, including both expert-constructed resources and collaboratively constructed resources. Representing them according to the class structure of UBY-LMF makes them structurally interoperable. The linking of linguistic terms with their meaning as defined in ISOCat contributes to semantic interoperability.

Sense Alignments. UBY-LMF models a LexicalResource as consisting of one Lexicon per integrated resource. These Lexicon instances can be aligned at the sense level by linking pairs of senses or synsets using instances of the SenseAxis class. The resource UBY features pairwise sense alignments between a subset of LSRs. Both monolingual and cross-lingual alignments are present in UBY: WN–WP-en (Niemann and Gurevych, 2011), WN–WKT-en (Meyer and Gurevych, 2011), VN–WN (Kipper et al., 2006), VN–FN (Palmer, 2009), OW-de–WN (Gurevych et al., 2012), and OW–WP, OW-en–OW-de, and WP-en–WP-de, which are part of the original resources.

The WN–WP-en, WN–WKT-en and OW-de–WN alignments have been automatically created. Please refer to the papers mentioned above for details on the alignment algorithm and detailed statistics.

UBY 1.0. UBY currently contains more than 4.5 million lexical entries, 4.9 million senses, 5.4 million semantic relations between senses and more than 700,000 alignments between senses. There are 890,000 unique German and 3.1 million unique English lemma-POS combinations⁸.

4 UBY Web Interface

Technical Basis. UBY is deployed in an SQL database via Hibernate, which is also the foundation of the UBY-API. This makes it easy to query all information entities within UBY. More details on the UBY-API can be found on the UBY website⁹. The frontend of the Web application is based on Apache Wicket¹⁰.

Visual View. The natural entry point to the visual view is the search box for a lemma¹¹, and the result is a graph, with the query lemma as the central node and the retrieved senses as nodes attached to it (see Figure 1). The sense nodes are coloured according to the source LSRs. To keep the view compact, the definition is only shown when a node is clicked.

The sense alignments between LSRs available in UBY are represented by alignment nodes, which are displayed as hubs connecting aligned senses. For generating the alignment nodes, we cluster senses based on their pairwise alignments and include all senses which are directly or transitively aligned. Thus, the visual view provides a visualization of which and how many senses from different LSRs are aligned in UBY. In Figure 1, we show the grouping of senses for the verb align. If a user wants to inspect a specific sense in more detail, a click on the link within a sense node opens the textual view described below.

Figure 1: Search result for the verb align in the visual view. The aligned senses are connected by sense alignment nodes. Nodes are coloured by resource.

Textual View. While the query mechanism for the textual view is the same as for the visual view, in this case the interface returns a list of senses (see (1) in Figure 2), including definitions, available for this lemma either in all LSRs, or only

Figure 2: The textual view: (1) Senses of align, grouped by resource. (2) Area for selecting resources. (3) Detail view for a selected sense. (4) Drag & drop area for sense comparison. (5) Links to other senses.

¹ https://uby.ukp.informatik.tu-darmstadt.de
² http://www.dictionary.com
³ http://www.thefreedictionary.com
⁴ http://www.visuwords.com
⁵ http://faculty.uoit.ca/collins/research/wnVis.html
⁶ http://www.visualthesaurus.com
⁷ http://www.isocat.org/, the implementation of the ISO 12620 Data Category Registry (Broeder et al., 2010).
⁸ Note that for homonyms there may be more than one LexicalEntry for a lemma-POS combination.
⁹ http://www.ukp.tu-darmstadt.de/uby
¹⁰ http://wicket.apache.org/
¹¹ Filtering by POS is to be included in a future release.
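
The clustering step described under Visual View, in which senses that are directly or transitively aligned end up behind the same alignment node, amounts to finding connected components in the graph of pairwise alignments. The text does not say how UBY-UI implements this, so the following is only a minimal sketch under stated assumptions: senses are represented by plain string identifiers, the grouping uses a union-find structure, and the function name `cluster_senses` as well as the example identifiers are hypothetical and not taken from the UBY-API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_senses(
    sense_ids: Iterable[str],
    alignments: Iterable[Tuple[str, str]],
) -> List[Set[str]]:
    """Group senses that are directly or transitively aligned.

    Hypothetical helper: identifiers and signature are assumptions
    made for illustration, not part of the UBY-API.
    """
    # Every sense starts out as its own cluster representative.
    parent: Dict[str, str] = {s: s for s in sense_ids}

    def find(s: str) -> str:
        # Walk up to the representative, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in alignments:
        # Senses that occur only in an alignment are registered on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, Set[str]] = defaultdict(set)
    for s in list(parent):
        groups[find(s)].add(s)
    # Deterministic order: largest cluster first, ties broken by smallest ID.
    return sorted(groups.values(), key=lambda g: (-len(g), min(g)))


# Made-up sense identifiers for the verb "align" (illustration only).
senses = ["WN:align#1", "WKT-en:align#2", "VN:align#1", "FN:align#1"]
pairs = [("WN:align#1", "WKT-en:align#2"), ("VN:align#1", "WN:align#1")]

clusters = cluster_senses(senses, pairs)
# Only clusters with at least two senses would be drawn as alignment nodes.
alignment_nodes = [c for c in clusters if len(c) > 1]
```

In this toy input, the WKT-en and VN senses are not aligned with each other directly, but both are aligned with the WN sense, so all three fall into one cluster, while the unaligned FN sense remains a singleton attached only to the lemma node.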