What a Beautiful Multilingual World: BabelNet 2.0 & Friends!

Roberto Navigli
David Jurgens, Tiziano Flati, Andrea Moro, Simone Ponzetto, Daniele Vannella

http://lcl.uniroma1.it
TIA 2013, Paris, France, 04/11/2013

It’s all about knowledge!

• Intuitively, we all know what knowledge is…
• …and why we need it

• But can we expect computers to know?
• Can’t computers just use, e.g., statistical techniques?

State-of-the-art Machine Translation

• EN: These are movies in which the music genre, e.g. rock, is an important element but not necessarily central to the plot. Examples are Easy Rider (1969), The Graduate (1969), and Saturday Night Fever (1978).
• IT: Questi sono i film in cui il genere musicale, ad es roccia, è un elemento importante, ma non necessariamente al centro della trama.

• FR: J'aime le chocolat, donc j'ai acheté un bar dans un supermarché.

• EN: Knowledge of the distribution of underground rock densities can assist in interpreting subsurface geologic structure and rock type.
Danger here!
• IT: La conoscenza della distribuzione di densità di rock underground può aiutare a interpretare in sottosuolo struttura geologica e tipo di roccia.

It’s not that the “big data” approach is bad, it’s just that mere statistics is not enough

The Knowledge Acquisition Bottleneck

• Knowledge is crucial in language-related research areas
– Sense Disambiguation
– Named Entity Recognition/Linking
– (your favourite area here)
On a large scale, I mean
• However, providing knowledge is difficult and costly
AKA: The Hamster Wheel

Resources to the rescue!

• Various projects undertaken to make lexical knowledge available in machine readable form
– WordNet [Fellbaum, 1998]
– Open Mind Word Expert [Chklovski & Mihalcea, 2002]
– EuroWordNet [Vossen, 1998]
– Multilingual Central Repository [Atserias et al. 2004]
– The WordNetPlus project [Boyd-Graber et al., 2006]
– OntoNotes [Hovy et al., 2006]
– Wikipedia (collaborative effort)
– Wiktionary (collaborative effort)
– Omega Wiki (collaborative effort)
– …
Wisdom of the Crowd

But we need an ontology, not just an encyclopedia!

• And, ideally, we need it to be large-scale, wide-coverage

Word Sense Disambiguation in a Nutshell

spring (target word) + “Spring water can be found at different altitudes” (context)
→ WSD system (drawing on knowledge)
→ sense of target word
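The target word, context and knowledge boxes of the nutshell slide can be sketched with a simplified gloss-overlap disambiguator (in the spirit of the Lesk algorithm, which the slides do not name). The sense ids and glosses below are illustrative assumptions, not entries copied from WordNet or BabelNet.

```python
# Toy sense inventory for "spring" (assumed glosses, for illustration only).
SENSES = {
    "spring#season": "the season of growth between winter and summer",
    "spring#water": "a natural flow of ground water found at some place",
    "spring#coil": "an elastic metal coil that returns to its shape",
}

STOPWORDS = {"a", "an", "the", "of", "at", "be", "can", "to", "its", "that", "and", "some"}


def tokens(text):
    """Lowercase the text, strip punctuation and drop stopwords."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda sense: len(ctx & tokens(senses[sense])))


best = disambiguate("Spring water can be found at different altitudes")
print(best)
```

Here the "knowledge" is just three glosses; the point of the following slides is that a richer, more interconnected resource gives the WSD system far more to work with.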

The Richer, The Better

• Highly-interconnected semantic networks have a great impact on knowledge-based WSD even in a fine-grained setting [Navigli & Lapata, IEEE TPAMI 2010]

[Figure annotated with "divergence point", "nirvana point!!!" and "State-of-the-art WSD"; source: Navigli and Lapata, 2010]

State of the Art “in a nutshell”

• Supervised approaches
– Require large amounts of training data
– Do not generalize across domains and languages

• Knowledge-based approaches have a higher potential
– Lexical knowledge resources only partly available
– Only for few languages (e.g. not all 23 EU official languages)
– Heterogeneous and with low coverage

[Diagram: lexical knowledge resources MultiWordNet, BalkaNet, WOLF, MCR, GermaNet, WordNet]

This is where the ERC (and my project) comes into play

A 5-year ERC Starting Grant (2011-2016) on Multilingual Word Sense Disambiguation

Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Key Objective 1: create knowledge for all languages

[Diagram: MultiWordNet, BalkaNet, WOLF, MCR, GermaNet, WordNet]

Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Key Objective 2: use all languages to disambiguate one

The Vision

Input text in *any* language → MultiJEDI → Disambiguated text

• Multilingual Joint WSD: central research objective
• Automatic Acquisition of a Wide-Coverage Multilingual Semantic Network: BabelNet
[Diagram labels: WordNet, Wikipedia, Multilingual?]

Objective 1: Creating a Multilingual Semantic Network

• Start from two large complementary resources:
– WordNet: full-fledged taxonomy
– Wikipedia: multilingual and continuously updated
Get the best from both worlds

[Figure: excerpt of the WordNet graph, with is-a and has-part edges among synsets such as {wheeled vehicle}, {motor vehicle} and {car, auto, automobile, machine, motorcar}]

BabelNet [Navigli and Ponzetto, AIJ 2012]

• A wide-coverage multilingual semantic network including both encyclopedic (from Wikipedia) and lexicographic (from WordNet) entries
[Figure labels: NEs and specialized concepts from Wikipedia; Concepts from WordNet; Concepts integrated from both resources]

BabelNet integrates the best of both worlds

[Figure: the entry "balloon" in WordNet and in Wikipedia]

Taming the long tail…

WordNet [Miller et al., 1990; Fellbaum, 1998]

[Figure: excerpt of the WordNet graph; nodes are concepts such as {wheeled vehicle}, {motor vehicle} and {car, auto, automobile, machine, motorcar}, edges are semantic relations (is-a, has-part)]

Wikipedia [The Web Community, 2001-today]

[Figure: Wikipedia pages (e.g. "Playing with senses") as concepts, connected by (unspecified) semantic relations]

BabelNet: concepts and semantic relations (1)

• Concepts and relations in BabelNet are harvested from WordNet and Wikipedia:
– WordNet → BabelNet: synsets → concepts; lexico-semantic relations → semantic relations
– Wikipedia → BabelNet: pages → concepts; hyperlinks → semantic relations

BabelNet: concepts and semantic relations (2)

• We encode knowledge as a labeled directed graph:
– Each vertex is a Babel synset: balloon (EN), Ballon (DE), aerostato (ES), aerostato (IT), pallone aerostatico (IT), montgolfière (FR)
– Each edge is a semantic relation between synsets:
• is-a (balloon is-a aircraft)
• part-of (gasbag part-of balloon)
• instance-of (Einstein instance-of physicist)
• …
• unspecified/relatedness (balloon related-to flight)
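A minimal sketch of the labeled directed graph described above, using the slide's own example vertices and edges. The class and method names are hypothetical and do not mirror the BabelNet API; only a subset of the lexicalizations is included.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BabelSynsetSketch:
    """Hypothetical stand-in for a Babel synset: an id plus lemmas per language."""
    synset_id: str
    senses: dict = field(default_factory=dict)  # language code -> list of lemmas


class SemanticGraph:
    """Labeled directed graph: vertices are synsets, edges carry a relation label."""

    def __init__(self):
        self.synsets = {}
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_synset(self, synset):
        self.synsets[synset.synset_id] = synset

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def related(self, source, relation=None):
        """Targets reachable from source, optionally restricted to one relation."""
        return [t for r, t in self.edges[source] if relation is None or r == relation]


graph = SemanticGraph()
graph.add_synset(BabelSynsetSketch("balloon", {
    "EN": ["balloon"], "DE": ["Ballon"], "ES": ["aerostato"],
    "IT": ["aerostato", "pallone aerostatico"],
}))
graph.add_edge("balloon", "is-a", "aircraft")
graph.add_edge("gasbag", "part-of", "balloon")
graph.add_edge("Einstein", "instance-of", "physicist")
graph.add_edge("balloon", "related-to", "flight")

print(graph.related("balloon", "is-a"))
print(graph.related("balloon"))
```

Storing the relation label on each edge, rather than keeping one adjacency list per relation, keeps the "unspecified/relatedness" edges harvested from hyperlinks in the same structure as the typed WordNet relations.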

BabelNet: objectives

1. Provide a unified resource
– By establishing an automated mapping between Wikipedia pages and WordNet senses
2. Enable multilinguality
– By collecting the lexicalizations of concepts in different languages using:
a) Wikipedia interlanguage links
b) Statistical Machine Translation

An example of mapping

Creation of the Wikipedia disambiguation contexts

• Start:
ctx(Balloon (aircraft)) = { }
• Sense label:
ctx(Balloon (aircraft)) = { aircraft }
• Hyperlinks:
ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola }
• Categories:
ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola, ballooning, hydrogen, aeronautics }
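The three build steps above (sense label, hyperlinks, categories) can be sketched as follows. The function name is hypothetical, and the sketch assumes that links and categories have already been reduced to lemmas; the items elided by "…" on the slide are simply left out.

```python
import re


def disambiguation_context(title, hyperlinks, categories):
    """Collect context lemmas from the sense label, then links, then categories."""
    ctx = []
    # The sense label is the parenthetical at the end of the page title, if any.
    match = re.search(r"\(([^)]+)\)\s*$", title)
    if match:
        ctx.append(match.group(1).lower())
    for lemma in list(hyperlinks) + list(categories):
        lemma = lemma.lower()
        if lemma not in ctx:  # keep first occurrence, preserve order
            ctx.append(lemma)
    return ctx


ctx = disambiguation_context(
    "Balloon (aircraft)",
    hyperlinks=["Aerostat", "Buoyancy", "Airship", "Gondola"],
    categories=["Ballooning", "Hydrogen", "Aeronautics"],
)
print(ctx)
```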

Building BabelNet: Mapping Wikipedia to WordNet

• Given a Wikipage w and its disambiguation context ctx(w):
– For each WordNet sense s of w, calculate score(s, w) as follows:
[Formula shown as an image on the slide]

The Wikipedia page context in the WordNet graph

ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola }

[Figure: WordNet graph around balloon#n#1, with the context senses aircraft#n#1, gondola#n#1, buoyancy#n#1, airship#n#1 and aerostat#n#1]

balloon#n#1 -> aircraft#n#1
balloon#n#1 -> aircraft#n#1 -> airship#n#1
balloon#n#1 -> gondola#n#1
balloon#n#1 -> gondola#n#1 -> flight#n#1 -> buoyancy#n#1
balloon#n#1 -> aerostat#n#1

Value shown on the final slide: 0.35
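One way to turn the listed paths into a number is to let shorter paths count more. The sketch below assumes an exponential decay in path length; this weighting is an assumption for illustration, since the slide's own formula did not survive extraction, and the code does not attempt to reproduce the 0.35 shown on the slide (which may involve a normalization step).

```python
import math

# Paths from the candidate sense to the context senses, as listed on the slide.
PATHS = [
    ["balloon#n#1", "aircraft#n#1"],
    ["balloon#n#1", "aircraft#n#1", "airship#n#1"],
    ["balloon#n#1", "gondola#n#1"],
    ["balloon#n#1", "gondola#n#1", "flight#n#1", "buoyancy#n#1"],
    ["balloon#n#1", "aerostat#n#1"],
]


def path_weight(path):
    """Assumed weighting: exp(-(edges - 1)), so a direct edge counts 1.0."""
    edges = len(path) - 1
    return math.exp(-(edges - 1))


def raw_score(paths):
    """Unnormalized score of a candidate sense: sum of its path weights."""
    return sum(path_weight(p) for p in paths)


print(round(raw_score(PATHS), 4))
```

Under this assumption the three direct edges contribute 1.0 each, and the two longer paths add progressively less.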

Building BabelNet: Translating Babel synsets

1. Exploiting Wikipedia interlanguage links
[Figure: interlanguage links yielding Ballon, globo aerostático, pallone aerostatico]

2. Filling the lexical translation gaps using a Machine Translation system to translate the English lexicalizations of a concept
• On August 27, 1783 in Paris, Franklin witnessed the world's first hydrogen [[Balloon (aircraft)|balloon]] flight.
→ Google Translate →
• Le 27 Août, 1783 à Paris, Franklin vu le premier vol en ballon d'hydrogène.

• For each word sense s, we translate:
– sentences from SemCor (a corpus annotated with WordNet senses) which contain s
– sentences from Wikipedia linked to the Wikipage of s
• The most frequent translation of s is selected for each target language

The most frequent translation of a word in a given meaning

left context | term | right context
 | wikification | may refer to: the…
geoinformatics services' and ' | wikification | of GIS by the masses'
the process may be called | wikification | (as in ...
which is then called " | wikification | and to the related problem
reason needs copyediting, | wikification | , reduction of POV, work on references
huge amount of cleanup, | wikification | , etc. Version of 12 Nov

left context | term | right context
 | wikificazione | potrebbe riferirsi a: il…
servizi geoinformatici' e ' | wikification | di GIS dalle masse'
il processo chiamato | wikificazione | (come in ...
che è quindi chiamato | wikificazione | e al problema correlato…
ragione richiede copyediting, | wikification | , riduzione di POV, lavoro su reference
grandi quantità di pulizia, | wikificazione | , ecc. Versione del 12 Novembre
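The selection step can be sketched as a simple majority vote over the term column of the translated concordance. The list below copies the six Italian term occurrences from the slide; the function name is hypothetical.

```python
from collections import Counter

# Term column of the machine-translated sentences, in slide order.
translated_terms = [
    "wikificazione",
    "wikification",
    "wikificazione",
    "wikificazione",
    "wikification",
    "wikificazione",
]


def most_frequent_translation(terms):
    """Return the translation seen most often for the sense (case-insensitive)."""
    return Counter(t.lower() for t in terms).most_common(1)[0][0]


print(most_frequent_translation(translated_terms))
```

Untranslated occurrences ("wikification") are outvoted here, which is how the step tolerates sentences the MT system left partly in English.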

BabelNet: a multilingual encyclopedic dictionary!

• Available online: http://babelnet.org
For research purposes…

BabelNet 2.0 is online: http://babelnet.org

BabelNet knows Paris 13!

The BabelNet API

• Retrieve all synsets with the English lemma “bank”
• Print information about each synset
• Print each German sense in the synset
• Get the (relation, synsets) map of the synset neighbours
• Print the information of each related synset
• Get the synsets related by a given relation type
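The steps annotated above can be mimicked on a toy in-memory index. Everything in this sketch is hypothetical: the function names, the "toy:" ids and the data are made up for illustration and are not the BabelNet API, which should be consulted for the real classes and calls.

```python
# Toy data standing in for a lexical knowledge base (all entries are assumptions).
TOY_SYNSETS = {
    "toy:bank-1": {"EN": ["bank"], "DE": ["Bank", "Kreditinstitut"]},
    "toy:bank-2": {"EN": ["bank", "riverbank"], "DE": ["Ufer"]},
    "toy:institution": {"EN": ["financial institution"], "DE": ["Finanzinstitut"]},
    "toy:money": {"EN": ["money"], "DE": ["Geld"]},
}
TOY_EDGES = {
    "toy:bank-1": {"is-a": ["toy:institution"], "related-to": ["toy:money"]},
}


def get_synsets(language, lemma):
    """Retrieve all synsets that contain the lemma in the given language."""
    return [sid for sid, senses in TOY_SYNSETS.items() if lemma in senses.get(language, [])]


def get_senses(synset_id, language):
    """Senses (lemmas) of a synset in one language."""
    return TOY_SYNSETS[synset_id].get(language, [])


def get_related_map(synset_id):
    """(relation -> synsets) map of the synset's neighbours."""
    return TOY_EDGES.get(synset_id, {})


def get_related(synset_id, relation):
    """Synsets related by a given relation type."""
    return get_related_map(synset_id).get(relation, [])


for synset_id in get_synsets("EN", "bank"):
    print(synset_id, "DE senses:", get_senses(synset_id, "DE"))
    for relation, targets in get_related_map(synset_id).items():
        for target in targets:
            print("   ", relation, target, get_senses(target, "EN"))
```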

BabelNet goes at a faster pace than I can cope with

Anatomy of BabelNet 2.0

• 50 languages covered (including Latin!) (Key fact! Previous version had 6!)
• List at http://babelnet.org/stats.jsp
• 9.3M Babel synsets (concepts and named entities)
• 50M word senses
• 262M semantic relations (28 edges per synset on avg.)
• 7.7M synset-associated images
• 18M textual definitions
• Integrates:
– WordNet 3.0
– Wikipedia (2012 dump)
– OmegaWiki: a collaborative multilingual dictionary
– Open Multilingual WordNet [Bond and Foster, 2013]
• for all open-class parts of speech

WordNet+Open Multilingual WordNet+Wikipedia+…
+OmegaWiki+automatic translations…
+textual definitions
+Wikipedia categories
+images

Evaluations: I have to go fast here!

WordNet-Wikipedia mapping accuracy

• Overall quality of the mapping: ~84%
– On a random sample of 1k Wikipages
– Note: this concerns only those 50k synsets in the intersection
• Quality of the mapping of frequent : ~91%

Evaluation of BabelNet against gold standard resources

Extra-coverage: up to +2300% new senses!

Coarse-grained Word Sense Disambiguation with BabelNet

State of the art results!

Current state of the art


Annotating with BabelNet: all in one!

Key fact!
• Annotating with BabelNet implies annotating with WordNet and Wikipedia
• (now also OmegaWiki and Open Multilingual WordNet!)

Dataset for Multilingual Word Sense Disambiguation

We are not alone in the (resource) universe!

• DBpedia [Bizer et al. 2009] - a resource obtained from structured information in Wikipedia
– «Describes 3.77M things»
– Core of the Linked Open Data Cloud
• YAGO [Suchanek et al. 2007]
– «Contains 10M entities and 120M facts about these entities»
– Links Wikipedia categories to WordNet synsets
• MENTA [de Melo and Weikum, 2010]
– A «multilingual taxonomy with 5.4M entities»
• WikiNet [Nastase and Strube, 2013]
– Semantic network connecting Wikipedia entities
– «3M concepts and 38+M relations»
• Freebase (http://freebase.com): collaborative effort
– Structured data; started from Wikipedia, MusicBrainz, ChefMoz, etc.

So where is the novelty?

• We provide a unified, integrated inventory and network for both word senses and named entities

Now in the Linked Open Data cloud…


Actually, in the *Linguistic* Linked Open Data cloud…

SPred: Semantic Predicates on a large scale [Flati & Navigli, ACL 2013]

• Choose a set of semantic classes
• Given a lexical predicate
w_1 w_2 … w_i * w_{i+1} … w_n
identify the semantic class distribution for *
• Classify new filling arguments

The idea in a nutshell, for the lexical predicate "cup of *":

(1/6) start from text
(2/6) focus on lexical predicates
(3/6) focus on filling arguments
(4/6) gather all the filling arguments
(5/6) group similar arguments
(6/6) associate semantic labels

…pounded millet mixed with two cups of butter, every day…
…senior citizens would linger over cups of coffee and exchange news…
...began over a cup of yogurt and a broken printer that …
…in the quarterfinal cup of Yugoslavia as a renowned team…
…which is equivalent to 3 cups of regular coffee, he will be able to…
He drank several more cups of wine and started behaving wildly…
…or fruit, as well as small cups of kosher wine or other beverages…
…two Euroleague titles, two Cups of Italy, one Koraş Cup and one…
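The six steps can be sketched on the slide's own sentences. The argument lexicon and its class labels ("beverage", "food", "country") are assumptions made for illustration; SPred itself derives classes from a semantic resource rather than from a hand-written dictionary, and the regex lookup below is only a stand-in for its argument extraction.

```python
import re
from collections import Counter

SENTENCES = [
    "…pounded millet mixed with two cups of butter, every day…",
    "…senior citizens would linger over cups of coffee and exchange news…",
    "...began over a cup of yogurt and a broken printer that …",
    "…in the quarterfinal cup of Yugoslavia as a renowned team…",
    "…which is equivalent to 3 cups of regular coffee, he will be able to…",
    "He drank several more cups of wine and started behaving wildly…",
    "…or fruit, as well as small cups of kosher wine or other beverages…",
    "…two Euroleague titles, two Cups of Italy, one Koraş Cup and one…",
]

# Hypothetical lexicon mapping known filling arguments to semantic classes.
CLASS_OF = {
    "butter": "food", "yogurt": "food",
    "coffee": "beverage", "regular coffee": "beverage",
    "wine": "beverage", "kosher wine": "beverage",
    "yugoslavia": "country", "italy": "country",
}

PREDICATE = re.compile(r"\bcups? of (.+)", re.IGNORECASE)


def filling_argument(sentence, lexicon=CLASS_OF):
    """Return the longest known argument that fills the * slot, or None."""
    match = PREDICATE.search(sentence)
    if not match:
        return None
    rest = match.group(1).lower()
    candidates = [a for a in lexicon if re.match(re.escape(a) + r"\b", rest)]
    return max(candidates, key=len) if candidates else None


arguments = [filling_argument(s) for s in SENTENCES]
distribution = Counter(CLASS_OF[a] for a in arguments if a)
print(arguments)
print(distribution)
```

The resulting class distribution is what lets a new filling argument be classified: an unseen beverage fits the dominant class of "cup of *", while a country name points to the sports-trophy reading.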

cup of *

Next feature in BabelNet!
Classes sorted by relevance!
• Wine, Sack, White wine, Red wine, Claret, Kosher wine, Madeira wine, Wine in China, …
• Coffee, Turkish coffee, Drip coffee, Espresso, Cappuccino, Caffè latte, Decaffeinated coffee, …
• Earl Grey tea, Green tea, Indian tea, Black tea, Tea, …
• Water, Seawater, …

Another Plan for Knowledge Acquisition:

• Knowledge acquisition without relying on hand-crafted knowledge
• Deals with technical domains
• Current approaches have limits:
– Kozareva & Hovy [2010]: lexico-syntactic patterns + pruning based on the longest path
– Yang & Callan [2009]: based on incremental clusters of terms
• 82% F1 for small-scale WordNet is-a sub-hierarchies (39 terms on average)
• 61% F1 on part-of sub-hierarchies
– Snow et al. [2006]: hyponym acquisition based on a probabilistic model
• 58% P, 21% R
• These approaches have not been shown to be able to extract large specialized domain ontologies

OntoLearn Reloaded [Navigli, Faralli & Velardi, IJCAI 2011; Computational Linguistics 2013]

Unlike other approaches, we learn both concepts and relations entirely from scratch for any domain of interest

OntoLearn Reloaded: Workflow

[Workflow diagram. Steps: Terminology extraction, Definition & hypernym extraction, Domain filtering, Graph pruning. Other labels: Domain Corpus, Web glossaries & documents, Domain terms, Upper terms, Hypernym graph, Induced taxonomy]

Terminology Extraction

Domain Corpus → Domain terms: maximum likelihood, flow network, mesh generation, hash function, pattern recognition, information processing

Definition & Hypernym Extraction + Domain Filtering

Inputs: Domain Corpus, Web glossaries & documents
Domain term: flow network
Definition extraction (WCL):
• domain: In graph theory, a flow network is a directed graph.
• non domain: Global Cash Flow Network is a business opportunity to make money online.
• domain: A flow network is a network with two distinguished vertices.

Definition & Hypernym Extraction + Domain Filtering (iterations)

Domain term: flow network
– definition extraction (WCL):
In graph theory, a flow network is a directed graph.
A flow network is a network with two distinguished vertices.
– hypernym extraction: flow network → directed graph, network

Terms from previous iteration: directed graph, network
– definition extraction (WCL):
A directed graph is a graph where ...
A directed graph is a data structure ...
– hypernym extraction: directed graph → graph, data structure

Hypernym Extraction Algorithm: Word Class Lattices

Based on Word-Class Lattices (WCLs), i.e. lattice-based definition models learned by means of a greedy definition alignment algorithm [Navigli & Velardi, ACL 2010]
• Determine whether a sentence is definitional
• If so, return the hypernym(s) of the defined term

The iterative growth of the hypernym graph
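The iterative growth of the hypernym graph can be sketched on the definitions quoted in the slides. Two simplifications are assumptions of this sketch: a naive "X is a Y" regular expression stands in for the Word-Class Lattice classifier, and a small keyword set stands in for domain filtering. All function names are hypothetical.

```python
import re

# Candidate definitions per term, copied from the slides ("..." kept as-is).
DEFINITIONS = {
    "flow network": [
        "In graph theory, a flow network is a directed graph.",
        "Global Cash Flow Network is a business opportunity to make money online.",
        "A flow network is a network with two distinguished vertices.",
    ],
    "directed graph": [
        "A directed graph is a graph where ...",
        "A directed graph is a data structure ...",
    ],
}
DOMAIN_WORDS = {"graph", "network", "vertices", "structure"}  # assumed vocabulary


def in_domain(sentence, term):
    """Keep a sentence if, apart from the term itself, it uses a domain word."""
    rest = sentence.lower().replace(term, " ")
    return bool(DOMAIN_WORDS & set(re.findall(r"[a-z]+", rest)))


def extract_hypernym(sentence, term):
    """Naive stand-in for WCL: take the noun phrase after '<term> is a'."""
    pattern = rf"\b{re.escape(term)} is an? ([a-z ]+?)(?= with\b| where\b|\s*\.|$)"
    match = re.search(pattern, sentence.lower())
    return match.group(1) if match else None


def grow_hypernym_graph(seed_terms):
    """Iteratively add (term, hypernym) edges, feeding hypernyms back as new terms."""
    edges, seen, frontier = [], set(seed_terms), list(seed_terms)
    while frontier:
        term = frontier.pop(0)
        for sentence in DEFINITIONS.get(term, []):
            if not in_domain(sentence, term):
                continue
            hypernym = extract_hypernym(sentence, term)
            if hypernym and (term, hypernym) not in edges:
                edges.append((term, hypernym))
                if hypernym not in seen:
                    seen.add(hypernym)
                    frontier.append(hypernym)
    return edges


for edge in grow_hypernym_graph(["flow network"]):
    print(edge)
```

The loop stops on its own once no new hypernym has definitions left to process, which in the workflow is where the upper terms come in.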

From the Noisy Hypernym Graph...

...to a Tree-like Taxonomy

An excerpt of the AI taxonomy

Semantically-Enhanced Open Information Extraction [Moro and Navigli, IJCAI 2013]

• Moving to semantically enhanced OIE:

Ontologizing the relation strings

Improving traditional IE with BabelNet [Moro et al., ISWC 2013]

• Using rich semantics increases precision while keeping recall high

Thanks or…

(grazie)

Roberto Navigli
Linguistic Computing Laboratory
http://lcl.uniroma1.it