Computational Linguistics in Practice: WebLicht and GermaNet
Two Projects at the Department of Linguistics at the University of Tübingen

Verena Henrich
University of Tübingen, Department of Linguistics
November 30, 2010

Who I am: Verena Henrich
• 2009: Master in Computer Science at h_da
  - Lecture about Natural Language Processing (NLP)
  - Two semesters in Iceland
  - Master's thesis on an NLP topic
• Since 2009: Researcher at the Department of General and Computational Linguistics at the University of Tübingen
  - First task: development of an editor for the German wordnet (GermaNet)
  - Further project that I will introduce today: WebLicht
  - PhD plans: word sense disambiguation with GermaNet

GermaNet – A German Wordnet

GermaNet: A German Wordnet
• GermaNet is a lexical resource covering the German base vocabulary
• It is a lexical semantic network
• It belongs to the family of wordnets modeled after the Princeton WordNet for English
• GermaNet is divided into 3 word categories:
  - Adjectives
  - Nouns
  - Verbs
• Words are organized according to their meaning

GermaNet: Lexical Units
• Word meanings are represented by lexical units
• A lexical unit specifies one form and one meaning (i.e. reading) of a word
• Examples:
  - “Bank” has 2 readings
    - Reading 1: [Bank, {Sitzbank}] (bench)
    - Reading 2: [Bank, {Geldinstitut}] (financial institution)
  - “Leiter” has 3 readings
    - Reading 1: [Leiter, {Steiggerät}] (ladder)
    - Reading 2: [Leiter, {Verantwortlicher, Anführer}] (leader)
    - Reading 3: [Leiter, {stromleitender Stoff}] (electric conductor)
• Lexical units are grouped into semantic concepts according to their meaning

GermaNet: Synsets
• Semantic concepts are represented by synsets
• A synset is a set of (near-)synonymous words

GermaNet: Synset Examples
• Verb examples:
  [rennen, laufen, sprinten, spurten] (to run)
  [klingeln, bimmeln, schellen, gongen, läuten] (to ring)
• Adjective examples:
  [stark, kräftig] (strong/powerful)
  [eckig, kantig, zackig] (square-shaped/jagged)
  [ausgeprägt, hervorstechend, markant] (distinctive)
• Noun examples:
  [Witz, Scherz, Jux, Ulk, Spaß, Schabernack, Gag] (joke)
  [Substantiv, Hauptwort, Nomen] (noun)
  [Textil, Gewebe, Webware, Stoff] (cloth/material)

GermaNet: Synsets
• Each lexical unit belongs to exactly one synset
• A literal, however, can belong to many synsets
  [Chip, Kartoffelchip] (potato crisp)
  [Chip, Mikrochip] (computer chip)
  [Kohle, Geld, Kies, Knete, Moneten] (money)
  [Kohle, Kohlegestein] (coal)
  [Golf, VW Golf] (car)
  [Golf] (Küstengebiet) (gulf)
  [Golf, Golfspiel] (golf)
  [gehen, laufen] (to walk)
  [gehen, funktionieren] (to work)
• A synset contains 1.37 lexical units on average

GermaNet: Relations
• GermaNet has two types of semantic relations
  - Lexical relations are established between lexical units
    - Synonymy
    - Antonymy
    - Pertainymy
  - Conceptual relations are established between synsets
    - Hypernymy and hyponymy
    - Part-whole relations (meronymy and holonymy)
    - Entailment
    - Causation
    - Association

GermaNet: Lexical Relations
• Lexical relations hold between two lexical units
  - Synonymy
  - Antonymy
  - Pertainymy

GermaNet: Conceptual Relations
• Conceptual relations hold between two synsets
  - Hypernymy and hyponymy
  - Part-whole relations (meronymy and holonymy)
  - Entailment
  - Causation
  - Association

GermaNet: Conceptual Relations
• GermaNet is hierarchically structured in terms of the hypernymy-hyponymy relation of synsets

GermaNet: Conceptual Relations
• Part-whole relations are conceptual relations

GermaNet: Relations

GermaNet: Readings for “unterhalten”
1. (v) [unterhalten, pflegen] (to cultivate) -- über etwas verfügen
   • [unterhalten] -- NN.AN.Pp -- Sie unterhalten gute Beziehungen zu ihren Nachbarn.
   • [pflegen]
   Hypernyms: [haben, besitzen]
2. (v) [unterhalten] (to keep oneself amused) -- sich auf angenehme Weise die Zeit vertreiben
   • [unterhalten] -- NN.AR.BM -- Sie hat sich blendend unterhalten. (NN.AR.BM)
   Hypernyms: [vergnügen]
3. (v) [unterhalten] (to entertain) -- für Zerstreuung/Zeitvertreib sorgen
   • [unterhalten] -- NN.AN.Bs -- Er unterhielt seine Gäste mit Musik. (NN.AN.Bs)
   Hypernyms: [vergnügen, amüsieren]
4. (v) [unterhalten] (to maintain sth.) -- etw. halten/einrichten/betreiben und dafür aufkommen
   • [unterhalten] -- NN.AN -- Er unterhält einen Reitstall. (NN.AN)
   Hypernyms: [führen]
   Hyponyms: [instandhalten] [bewirtschaften]
5. (v) [unterhalten] (to talk) -- ein Gespräch führen
   • [unterhalten] -- NN.AR.Pp.Bo
     -- Er unterhielt sich den ganzen Abend über seine Prüfungen. (NN.AR.Pp)
     -- Er unterhielt sich nur mit mir. (NN.AR.Bo)
   Hypernyms: [austauschen]
   Hyponyms: [klönen] [labern] [palavern] [philosophieren] [plauschen] [plaudern, schwatzen, schnattern]
6. (v) [unterhalten, alimentieren] (to support sb.) -- für jmds.
   Lebensunterhalt aufkommen
   • [unterhalten] -- NN.AN -- Er unterhält eine siebenköpfige Familie. (NN.AN)
   • [alimentieren]
   Hypernyms: [ernähren, nähren]

GermaNet: Purpose
• GermaNet development started in 1997 at the Department of Linguistics at the University of Tübingen
• Developed to serve as an electronic lexicographic reference database for German word senses
• Primarily intended to serve as a resource for word sense disambiguation, which is crucial for natural language applications like
  - Information retrieval
  - Construction of language technology tools
  - Annotation of corpora
  - Machine translation

GermaNet: Size
• Number of lexical units: 84,600
  - Adjectives: 8,100 lexical units
  - Nouns: 64,100 lexical units
  - Verbs: 12,300 lexical units
• Number of synsets: 61,700
  - Adjectives: 5,600 synsets
  - Nouns: 46,900 synsets
  - Verbs: 9,200 synsets
• 84,600 literals (1.10 readings per literal)
• Lexical relations: 3,500
• Conceptual relations: 73,700

Tools for GermaNet
• Application Programming Interfaces
  - Java API
  - Perl API
• Web application: http://weblicht.sfs.uni-tuebingen.de:8080/gnet/
• Web service: as part of WebLicht
• GermaNet-Explorer: visualisation tool (developed at the University of Dortmund)
• GernEdiT: GermaNet editing tool

GermaNet: Data Formats
• Former:
  - Lexicographer files: complex legacy format
• Now:
  - Relational database
• Export formats:
  - Proprietary XML format: distribution format
  - Lexical Markup Framework: XML, ISO standard
  - Princeton WordNet format

GermaNet: Lexicographer Files
(*** Nüsse ***)
{Nuss, Nuß*o, Nusskern, ?festes_Nahrungsmittel,@ nomen.Pflanze:Nuss,@ ('der essbare Kern einer Nuss')}
{Haselnuss, Haselnuß*o, Haselnusskern, Haselnußkern*o, Nuss,@ nomen.Pflanze:Haselstrauch,#}
{Kokosnuss, Kokosnuß*o, Nuss,@ nomen.Pflanze:Kokospalme,#}
{Betelnuss, Betelnuß*o, Nuss,@ Genussmittel,@}
{Erdnuss, Erdnuß*o, Erdnusskern, Erdnußkern*o, Nuss,@ nomen.Pflanze:Erdnusspflanze,#}
{Cashewkern, Cashewnuss, Cashewnuß*o, Nuss,@ nomen.Pflanze:Acajubaum,#}
...

GermaNet: Lexicographer Files
• Lexicographer files have three main shortcomings:
  1. No visualization → difficult to insert new items
  2. Complex data format → syntax errors and semantic inconsistencies
  3. No versioning → impossible to trace changes back

GernEdiT – The GermaNet Editing Tool
• Developed to overcome the shortcomings of the lexicographer files
  1. No visualization → graphical tool (search and browse GermaNet)
  2. Complex data format → user-friendly tool (with internal consistency checks)
  3. No versioning → editing history

GernEdiT – The GermaNet Editing Tool

GermaNet: Links & References
• GermaNet homepage: http://www.sfs.uni-tuebingen.de/GermaNet/
• GermaNet web application: http://weblicht.sfs.uni-tuebingen.de:8080/gnet/
• Verena Henrich and Erhard Hinrichs: GernEdiT - The GermaNet Editing Tool. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), Valletta, Malta, 2010. http://www.lrec-conf.org/proceedings/lrec2010/pdf/264_Paper.pdf
• Verena Henrich and Erhard Hinrichs: GernEdiT: A Graphical Tool for GermaNet Development. In Proceedings of the ACL 2010 System Demonstrations, Uppsala, Sweden, 2010. http://www.aclweb.org/anthology/P10-4004
• Fellbaum, C. (ed.): WordNet – An Electronic Lexical Database. The MIT Press, 1998.
• Princeton WordNet homepage: http://wordnet.princeton.edu/
• Princeton WordNet web application: http://wordnetweb.princeton.edu/perl/webwn

WebLicht – Web-Based Linguistic Chaining Tool

WebLicht: Motivation
• Many linguistic resources (corpora, dictionaries, …) and tools (tokenizer, tagger, parser, …) are available
• Most of them are implemented to run on local machines
  - This can be inconvenient, time-consuming, and error-prone because a user has to install all necessary tools
• Requirement: avoid the “download-first” paradigm
  - One possible solution: make tools and resources available on the web

WebLicht: