Universal Dependencies as a Resource for Linguistic Research

Joakim Nivre

UD in a Nutshell

• Cross-linguistically consistent grammatical annotation
• Support multilingual research in NLP and linguistics
  • Meaningful linguistic analysis within and across languages
  • Syntactic parsing in monolingual and cross-lingual settings
  • Useful information for downstream language understanding tasks
• Build on common usage and existing de facto standards
• Complement – not replace – language-specific schemes

UD for Linguistic Research

Theory
• Can we make linguistic sense of UD representations?
• Are dependencies syntactic or semantic?
Data
• What kind of linguistic data is available in UD?
• How diverse are the data sets?
Studies
• Two case studies using UD resources

Theory

Syntax

The cat could have chased all the dogs down the street .
DET NOUN AUX AUX VERB DET DET NOUN ADP DET NOUN PUNCT

root(ROOT, chased)  nsubj(chased, cat)  aux(chased, could)  aux(chased, have)
obj(chased, dogs)  obl(chased, street)  punct(chased, .)
det(cat, The)  det(dogs, all)  det(dogs, the)  case(street, down)  det(street, the)
(UD v1 used dobj where UD v2 has obj, and nmod where UD v2 has obl.)

• Content words are related by dependency relations
• Function words attach to the content word they modify
• Punctuation attaches to the head of the phrase or clause
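UD treebanks store trees like the one for "The cat could have chased all the dogs down the street ." in the CoNLL-U format. Each token takes one line with ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC. The sketch below is a minimal stdlib-only reader, not official UD tooling. It hand-encodes the example, leaves unused columns as `_`, and uses hypothetical helper names (`Token`, `parse_conllu`, `arcs`). It prints the arcs and checks that the function words are leaves. The set of function-word relations is an assumption chosen for this sentence.

```python
from typing import NamedTuple

# Hand-encoded CoNLL-U for the example; "_" marks columns left unspecified.
CONLLU = """\
# text = The cat could have chased all the dogs down the street .
1\tThe\t_\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\t_\tNOUN\t_\t_\t5\tnsubj\t_\t_
3\tcould\t_\tAUX\t_\t_\t5\taux\t_\t_
4\thave\t_\tAUX\t_\t_\t5\taux\t_\t_
5\tchased\t_\tVERB\t_\t_\t0\troot\t_\t_
6\tall\t_\tDET\t_\t_\t8\tdet\t_\t_
7\tthe\t_\tDET\t_\t_\t8\tdet\t_\t_
8\tdogs\t_\tNOUN\t_\t_\t5\tobj\t_\t_
9\tdown\t_\tADP\t_\t_\t11\tcase\t_\t_
10\tthe\t_\tDET\t_\t_\t11\tdet\t_\t_
11\tstreet\t_\tNOUN\t_\t_\t5\tobl\t_\t_
12\t.\t_\tPUNCT\t_\t_\t5\tpunct\t_\t_
"""


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def parse_conllu(text: str) -> list[Token]:
    """Read ID, FORM, UPOS, HEAD and DEPREL from each basic token line."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def arcs(tokens: list[Token]) -> list[str]:
    """Render each token as relation(head, dependent)."""
    forms = {0: "ROOT", **{t.id: t.form for t in tokens}}
    return [f"{t.deprel}({forms[t.head]}, {t.form})" for t in tokens]


FUNCTION_RELS = {"aux", "case", "det", "punct"}  # assumed for this example

tokens = parse_conllu(CONLLU)
print("  ".join(arcs(tokens)))
heads = {t.head for t in tokens}
# Function words attach to content words and take no dependents of their own.
print(all(t.id not in heads for t in tokens if t.deprel in FUNCTION_RELS))  # True
```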

“Content-Head Dependencies” (UD)
we have come to Osaka
nsubj(come, we)  aux(come, have)  obl(come, Osaka)  case(Osaka, to)

“Function-Head Dependencies”
we have come to Osaka
nsubj(have, we)  vg(have, come)  obl(come, to)  pobj(to, Osaka)

Dubious Linguistics?
“Such an approach to the syntax of natural languages is contrary to most work in theoretical syntax in the past 35 years, regardless of whether this work is constituency- or dependency-based.” (Groß and Osborne, 2015)

What is a head?
[Figure: head criteria from Zwicky (1985), as summarised by Hudson (1987)]

Why choose one?
• Head properties may be shared by several elements
  • So neither content-head nor function-head can be quite right
• Linguistic theories capture this in different ways
  • Lexical vs. functional heads (Chomsky, 1995)
  • Surface syntax vs. deep syntax (Sgall et al., 1986; Mel’čuk, 1988)
  • Dissociated nucleus (Tesnière, 1959)
• What about UD?

UD Syntax

• UD representations are mono-stratal – single tree
  • Facilitates annotation, parsing and downstream tasks
• Tree structure primarily reflects lexical dependencies
  • Brings out parallelism between typologically diverse languages
  • Reveals predicate-argument structure for downstream tasks

The dog was chased by the cat .
DET NOUN AUX VERB ADP DET NOUN PUNCT
root(ROOT, chased)  nsubj:pass(chased, dog)  det(dog, The)  aux:pass(chased, was)
obl(chased, cat)  case(cat, by)  det(cat, the)  punct(chased, .)

Hunden jagades av katten .  (Swedish)
NOUN VERB ADP NOUN PUNCT
Features: Hunden Definite=Def, jagades Voice=Pass, katten Definite=Def
root(ROOT, jagades)  nsubj:pass(jagades, Hunden)  obl(jagades, katten)  case(katten, av)  punct(jagades, .)

Reddy et al. (2016), Transforming Dependency Structures to Logical Forms for Semantic Parsing
Disney acquired Pixar
root(ROOT, acquired)  nsubj(acquired, Disney)  dobj(acquired, Pixar)
(nsubj (dobj acquired Pixar) Disney)
Composition:
$$\lambda z.\, \exists x\, y.\; \mathit{acquired}(z_e) \wedge \mathit{Pixar}(y_a) \wedge \mathit{Disney}(x_a) \wedge \mathit{arg}_1(z_e, x_a) \wedge \mathit{arg}_2(z_e, y_a)$$
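In the Reddy et al. (2016) example, the tree for "Disney acquired Pixar" is first folded into the binary s-expression (nsubj (dobj acquired Pixar) Disney) before composition. The sketch below shows only that folding step, under stated assumptions. The `PRIORITY` table is a hypothetical ordering in which the object combines with the verb before the subject; it is not the paper's actual hierarchy. Words serve as node keys, so the sketch assumes each word occurs once.

```python
# Dependency tree for "Disney acquired Pixar": head -> [(relation, dependent), ...]
TREE = {
    "acquired": [("nsubj", "Disney"), ("dobj", "Pixar")],
    "Disney": [],
    "Pixar": [],
}

# Hypothetical combination order: lower numbers attach to the head first.
PRIORITY = {"dobj": 0, "nsubj": 1}


def binarize(word: str, tree: dict[str, list[tuple[str, str]]]) -> str:
    """Fold a head and its dependents into a nested binary s-expression."""
    expr = word
    deps = sorted(tree.get(word, []), key=lambda rd: PRIORITY.get(rd[0], len(PRIORITY)))
    for rel, dep in deps:
        # Each step wraps the expression built so far with one more dependent.
        expr = f"({rel} {expr} {binarize(dep, tree)})"
    return expr


print(binarize("acquired", TREE))  # (nsubj (dobj acquired Pixar) Disney)
```

Choosing a different priority order would give a different bracketing of the same tree. That is why some ordering of relations has to be fixed before logical forms can be composed.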

UD Syntax

Other relations encoded in labels – not tree structure
• Functional relations link functional heads to lexical heads
• Coordination relations link equivalent heads/dependents
• Multiword relations link elements of lexicalized expressions

the man in the moon: det(man, the)  nmod(man, moon)  case(moon, in)  det(moon, the)
Jack and Jill left: nsubj(left, Jack)  conj(Jack, Jill)  cc(Jill, and)
in spite of this: case(this, in)  fixed(in, spite)  fixed(in, of)
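Coordination and multiword expressions are recoverable from the labels rather than from the shape of the tree. A minimal sketch, using hand-encoded tokens for "in spite of this" and "Jack and Jill left" in a hypothetical (id, form, head, relation) layout, collects fixed expressions and conjunct pairs by filtering on the label.

```python
from collections import defaultdict

# Each token: (id, form, head id, relation); hand-encoded from the examples above.
IN_SPITE = [(1, "in", 4, "case"), (2, "spite", 1, "fixed"),
            (3, "of", 1, "fixed"), (4, "this", 0, "root")]
JACK_JILL = [(1, "Jack", 4, "nsubj"), (2, "and", 3, "cc"),
             (3, "Jill", 1, "conj"), (4, "left", 0, "root")]


def fixed_expressions(tokens):
    """Group each head with its fixed dependents into one lexicalized unit."""
    groups = defaultdict(list)
    for tid, _, head, rel in tokens:
        if rel == "fixed":
            groups[head].append(tid)
    forms = {tid: form for tid, form, _, _ in tokens}
    return [" ".join(forms[i] for i in sorted([h, *deps])) for h, deps in groups.items()]


def conjuncts(tokens):
    """Pairs (first conjunct, later conjunct) linked by conj."""
    forms = {tid: form for tid, form, _, _ in tokens}
    return [(forms[head], form) for _, form, head, rel in tokens if rel == "conj"]


print(fixed_expressions(IN_SPITE))  # ['in spite of']
print(conjuncts(JACK_JILL))         # [('Jack', 'Jill')]
```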


we have come to Osaka
Relations between content words: nsubj(come, we)  obl(come, Osaka)
Function words attached to them: aux(come, have)  case(Osaka, to)

This split between relations among content words and the function words grouped with them has counterparts in other traditions:
• Tesnière: dependency relations between nuclei
• Paninian grammar: karaka relations, marked by vibhakti
• Japanese grammar: kakariuke relations between bunsetsu

Linguistic Typology

Nonverbal predication
• Copula strategy: she is nice, with nsubj(nice, she)  cop(nice, is)
• Null strategy: ona mila (Russian), with nsubj(mila, ona)
Croft et al. (2017), Linguistic Typology Meets Universal Dependencies

Ivan is the best dancer: nsubj(dancer, Ivan)  cop(dancer, is)  det(dancer, the)  amod(dancer, best)
Ivan lučšij tancor (Russian): nsubj(tancor, Ivan)  amod(tancor, lučšij)

Verb Groups
Ivan will participate in the show: nsubj(participate, Ivan)  aux(participate, will)  obl(participate, show)  case(show, in)  det(show, the)
Ivan participera à le spectacle (French): nsubj(participera, Ivan)  obl(participera, spectacle)  case(spectacle, à)  det(spectacle, le)

Adpositions and Case
the Chair ’s office: det(Chair, the)  case(Chair, ’s)  nmod(office, Chair)
the office of the Chair: det(office, the)  nmod(office, Chair)  case(Chair, of)  det(Chair, the)
to grafeío tou proédrou (Greek): det(grafeío, to)  nmod(grafeío, proédrou)  det(proédrou, tou)
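The parallelism above can be checked mechanically. Drop the function-word relations, then compare what is left of each tree. The sketch below is minimal and makes several assumptions. `FUNCTION_RELS` is an assumed set of function-word labels, and the tokens are hand-encoded from the English, French and Greek examples. The `skeleton` helper is hypothetical: it keeps only (relation, head's relation) pairs, so it is a deliberately crude comparison that ignores word forms and morphology.

```python
# Token: (id, form, head id, relation); hand-encoded from the examples above.
FUNCTION_RELS = {"aux", "aux:pass", "case", "cop", "det", "mark", "punct"}  # assumed set


def skeleton(tokens):
    """Content-word relations as sorted (relation, head's relation) pairs."""
    rel_of = {tid: rel for tid, _, _, rel in tokens}
    rel_of[0] = "ROOT"
    return sorted((rel, rel_of[head]) for _, _, head, rel in tokens if rel not in FUNCTION_RELS)


EN = [(1, "Ivan", 3, "nsubj"), (2, "will", 3, "aux"), (3, "participate", 0, "root"),
      (4, "in", 6, "case"), (5, "the", 6, "det"), (6, "show", 3, "obl")]
FR = [(1, "Ivan", 2, "nsubj"), (2, "participera", 0, "root"),
      (3, "à", 5, "case"), (4, "le", 5, "det"), (5, "spectacle", 2, "obl")]
EN_OF = [(1, "the", 2, "det"), (2, "office", 0, "root"), (3, "of", 5, "case"),
         (4, "the", 5, "det"), (5, "Chair", 2, "nmod")]
EL = [(1, "to", 2, "det"), (2, "grafeío", 0, "root"),
      (3, "tou", 4, "det"), (4, "proédrou", 2, "nmod")]

print(skeleton(EN))                     # [('nsubj', 'root'), ('obl', 'root'), ('root', 'ROOT')]
print(skeleton(EN) == skeleton(FR))     # True
print(skeleton(EN_OF) == skeleton(EL))  # True
```

The English verb group "will participate" and the French future form "participera" end up with the same skeleton. The same holds for the English preposition "of" and the Greek genitive.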

Syntax or Semantics?

Aren't UD dependencies semantic rather than syntactic?
• Dependencies capture predicate-argument relations
• Dependencies do not (directly) capture agreement and government
A functional view of syntax:
• UD is based on grammatical functions, not semantic roles
• UD models relations that are encoded morphosyntactically
• UD does not model semantic role alternations

Grammatical Functions

Thomas Alva Edison invented the light bulb as well as the phonograph
PROPN PROPN PROPN VERB DET NOUN NOUN ADP ADV ADP DET NOUN
nsubj(invented, Thomas)  flat(Thomas, Alva)  flat(Thomas, Edison)  obj(invented, bulb)  det(bulb, the)
conj(bulb, phonograph)  cc(phonograph, as)  fixed(as, well)  fixed(as, as)  det(phonograph, the)

Mary/PROPN sent/VERB Peter/PROPN a/DET book/NOUN: nsubj(sent, Mary)  iobj(sent, Peter)  obj(sent, book)  det(book, a)
Mary/PROPN sent/VERB a/DET book/NOUN to/ADP Peter/PROPN: nsubj(sent, Mary)  obj(sent, book)  det(book, a)  obl(sent, Peter)  case(Peter, to)
Mary/PROPN sent/VERB a/DET book/NOUN on/ADP Sunday/NOUN: nsubj(sent, Mary)  obj(sent, book)  det(book, a)  obl(sent, Sunday)  case(Sunday, on)

Mary/PROPN broke/VERB the/DET vase/NOUN: nsubj(broke, Mary)  obj(broke, vase)  det(vase, the)
the/DET vase/NOUN broke/VERB: nsubj(broke, vase)  det(vase, the)

Valency-Changing Operations
the/DET vase/NOUN was/AUX broken/VERB by/ADP Mary/PROPN: nsubj:pass(broken, vase)  det(vase, the)  aux:pass(broken, was)  obl:agent(broken, Mary)  case(Mary, by)
Hasan/PROPN koştu/VERB (Hasan) (ran): nsubj(koştu, Hasan)
(ben) Hasanı/PROPN koşturdum/VERB (I) (Hasan) (made-run): nsubj(koşturdum, ben)  obj:caus(koşturdum, Hasanı)

UD Representations
• Mono-stratal but multi-relational representations
• Grammatical functions take priority
• Both lexical and functional heads can be extracted

But you need to be aware of this!
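The claim that functional heads can be extracted from UD trees can be illustrated with a small conversion. It turns the content-head tree for "we have come to Osaka" into the function-head tree shown earlier. The sketch makes these assumptions:
• The helper names are hypothetical.
• The `vg` and `pobj` labels come from the function-head example.
• The rule that moves the subject to the auxiliary was chosen to reproduce that example.
• Each content word has at most one aux or case dependent.

Real converters also have to handle chains of auxiliaries, copulas and coordination.

```python
# token id -> (form, head id, relation); UD content-head analysis of the example
UD = {
    1: ("we", 3, "nsubj"),
    2: ("have", 3, "aux"),
    3: ("come", 0, "root"),
    4: ("to", 5, "case"),
    5: ("Osaka", 3, "obl"),
}


def to_function_heads(tree):
    """Promote aux and case words to heads, relabelling the demoted content word.
    Simplifying assumption: at most one aux or case dependent per content word."""
    out = {i: list(v) for i, v in tree.items()}  # mutable copy; input is untouched
    for f, (_, c, rel) in tree.items():
        if rel not in ("aux", "case"):
            continue
        # The function word inherits the content word's incoming arc.
        out[f][1], out[f][2] = out[c][1], out[c][2]
        # The content word now depends on the function word.
        out[c][1] = f
        out[c][2] = "vg" if rel == "aux" else "pobj"
        if rel == "aux":
            # In this scheme the subject attaches to the auxiliary.
            for d, (_, h, r) in tree.items():
                if h == c and r.startswith("nsubj"):
                    out[d][1] = f
    return {i: tuple(v) for i, v in out.items()}


def show(tree):
    """Render a tree as relation(head, dependent) in token order."""
    names = {0: "ROOT", **{i: v[0] for i, v in tree.items()}}
    return "  ".join(f"{r}({names[h]}, {w})" for _, (w, h, r) in sorted(tree.items()))


print(show(UD))
print(show(to_function_heads(UD)))
# nsubj(have, we)  root(ROOT, have)  vg(have, come)  obl(come, to)  pobj(to, Osaka)
```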

Head-Initial or Head-Final?
we have come to Osaka
Content heads (UD): nsubj(come, we)  aux(come, have)  obl(come, Osaka)  case(Osaka, to)
Function heads: nsubj(have, we)  vg(have, come)  obl(come, to)  pobj(to, Osaka)
Under content heads, three of the four arcs are head-final (we, have and to precede their heads). Under function heads, three of the four are head-initial. Head-direction counts therefore depend on which analysis is used.
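The head-direction comparison is simple arithmetic over arcs. A minimal sketch, hand-encoding both analyses of "we have come to Osaka", computes the share of head-initial arcs. The helper name and data layout are illustrative, not taken from any UD tool.

```python
# token id -> (form, head id, relation) for "we have come to Osaka"
CONTENT_HEADS = {1: ("we", 3, "nsubj"), 2: ("have", 3, "aux"), 3: ("come", 0, "root"),
                 4: ("to", 5, "case"), 5: ("Osaka", 3, "obl")}
FUNCTION_HEADS = {1: ("we", 2, "nsubj"), 2: ("have", 0, "root"), 3: ("come", 2, "vg"),
                  4: ("to", 3, "obl"), 5: ("Osaka", 4, "pobj")}


def head_initial_share(tree):
    """Fraction of non-root arcs in which the head precedes its dependent."""
    arcs = [(head, dep) for dep, (_, head, _) in tree.items() if head != 0]
    return sum(head < dep for head, dep in arcs) / len(arcs)


print(head_initial_share(CONTENT_HEADS))   # 0.25
print(head_initial_share(FUNCTION_HEADS))  # 0.75
```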

Data
• 50 languages
• 70 treebanks
• http://universaldependencies.org

Treebank Size
[Figure: treebank size in tokens (axis 0 to 1,500,000) with the median marked; Czech, Russian and Arabic are labelled]

Language Family
Indo-European 34, Afro-Asiatic 3, Uralic 3, Turkic 3, Isolates 3, Other 4
Indo-European subgroups: Germanic (7), Romance (8), Slavonic, Indian, Iranian, Greek, Baltic and Celtic

Genre
[Figure: number of treebanks per genre (axis 0 to 50): News, Fiction, Non-Fiction, Wikipedia, Legal, Blog, Spoken, Bible, Medical, Reviews, Grammar, Web, Other]

Studies
• Word order freedom studied in UD treebanks
  • Conditional entropy of order given dependencies
  • Test hypotheses about case and word order freedom

Thanks to Richard for sharing slides!

Word Order Freedom
[Figure: English vs. German]

Conditional Entropy
• Local delexicalized trees to avoid data sparsity

Word Order and Morphology
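"Conditional entropy of order given dependencies" can be made concrete as H(O | R) = −Σ_{r,o} p(r,o) log2 p(o | r). Here R is a delexicalized dependency description and O is the observed order. A value of 0 bits means the order is fully predictable from the dependency. The study itself works with local delexicalized subtrees. This sketch simplifies that to the direction of single arcs keyed by relation label, and the toy observations are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(order | relation) in bits from (relation, order) observations."""
    n = len(pairs)
    joint = Counter(pairs)                         # counts of (r, o)
    per_relation = Counter(rel for rel, _ in pairs)  # counts of r
    # -sum over (r, o) of p(r, o) * log2 p(o | r)
    return -sum(c / n * log2(c / per_relation[rel]) for (rel, _), c in joint.items())


# Hypothetical toy observations: position of each dependent relative to its head.
OBS = [("nsubj", "before")] * 4 + [("obj", "after")] * 3 + [("obj", "before")]
print(round(conditional_entropy(OBS), 4))  # 0.4056
```

In this toy data, subjects always precede their heads and contribute nothing. All of the entropy comes from the objects, whose position varies.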

Semantics and Word Order
• Word order typology based on Bible translations
  • Massively parallel alignment and annotation projection
  • UD tags and dependencies projected from English

Thanks to Robert for sharing slides!

Methodology

Word alignment of parallel texts:
• New Testament in 986 languages (1144 translations)
• Bayesian word alignment with interlingua (Östling, 2015)
Annotation projection:
• UD tags and dependencies from 5 English translations
• Sparse, high-precision projection (80% of links must agree)
Word order statistics:
• Count frequency of different word order variants (tokens or types)
• Constructions: SOV, SV, OV, Adp-Noun, Adj-Noun
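The projection filter and the token/type counts can be sketched as below. This is a minimal illustration under stated assumptions, not the study's implementation. Each of the 5 English translations proposes at most one head for a target word, via a hypothetical upstream alignment. A head is kept only if at least 80% of all five proposals agree, with unaligned translations counted as disagreeing. The source does not specify this detail, so it is an assumption. The observation tuples and both helper names are hypothetical.

```python
from collections import Counter


def project_head(proposals, threshold=0.8):
    """Keep a projected head only if at least `threshold` of all source
    translations propose it; proposals holds one head per source (None if unaligned)."""
    votes = Counter(h for h in proposals if h is not None)
    if not votes:
        return None
    head, count = votes.most_common(1)[0]
    return head if count / len(proposals) >= threshold else None


def adj_noun_share(observations, by_type=False):
    """Share of Adj-Noun order among (adjective, noun, adj_first) observations,
    counted over tokens or over distinct types."""
    items = set(observations) if by_type else list(observations)
    return sum(adj_first for _, _, adj_first in items) / len(items)


# Hypothetical proposals from 5 English translations for one target word.
print(project_head([7, 7, 7, 7, 3]))     # 7 (4 of 5 agree = 80%)
print(project_head([7, 7, 3, None, 7]))  # None (3 of 5 = 60%)

# Hypothetical projected adjective-noun observations.
obs = [("big", "house", True), ("big", "house", True), ("red", "car", False)]
print(adj_noun_share(obs), adj_noun_share(obs, by_type=True))  # about 0.67 by token, 0.5 by type
```

The filter trades coverage for precision: most links are discarded, but those that survive are supported by nearly all translations.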

New information for ≈ 600 languages

Conclusion

• Large collection of languages – but biased sample and mostly small corpora

• Cross-linguistically consistent grammatical annotation – but you have to know the quirks

http://universaldependencies.org