Universal Dependencies as a Resource for Linguistic Research
Joakim Nivre

UD in a Nutshell
Cross-linguistically consistent grammatical annotation
Support multilingual research in NLP and linguistics
• Meaningful linguistic analysis within and across languages
• Syntactic parsing in monolingual and cross-lingual settings
• Useful information for downstream language understanding tasks
Build on common usage and existing de facto standards
Complement – not replace – language-specific schemes

UD for Linguistic Research
Theory
• Can we make linguistic sense of UD representations?
• Are dependencies syntactic or semantic?
Data
• What kind of linguistic data is available in UD treebanks?
• How diverse are the data sets?
Studies
• Two case studies using UD resources

Theory

Syntax
[Figure: dependency tree for "The cat could have chased all the dogs down the street ." with tags DET NOUN AUX AUX VERB DET DET NOUN ADP DET NOUN PUNCT. Relations: root (chased), nsubj (cat), obj (dogs), obl (street), aux (could, have), det (The, all, the, the), case (down), punct (.). The slides show the tree first with the labels dobj and nmod, then with obj and obl.]
• Content words are related by dependency relations
• Function words attach to the content word they modify
• Punctuation attaches to the head of the phrase or clause
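The three attachment principles can be checked mechanically on the example sentence. The following is a minimal Python sketch, assuming the tree is encoded as (ID, FORM, UPOS, HEAD, DEPREL) tuples with the obj/obl labels; the names `Token`, `FUNCTION_RELS`, `content_skeleton` and `function_words_are_leaves`, and the choice of which relations count as function-word attachments, are illustrative assumptions rather than part of the slides.

```python
from collections import namedtuple

Token = namedtuple("Token", "id form upos head deprel")

# (ID, FORM, UPOS, HEAD, DEPREL) for the example sentence; HEAD 0 marks the root.
ROWS = [
    (1, "The", "DET", 2, "det"),
    (2, "cat", "NOUN", 5, "nsubj"),
    (3, "could", "AUX", 5, "aux"),
    (4, "have", "AUX", 5, "aux"),
    (5, "chased", "VERB", 0, "root"),
    (6, "all", "DET", 8, "det"),
    (7, "the", "DET", 8, "det"),
    (8, "dogs", "NOUN", 5, "obj"),
    (9, "down", "ADP", 11, "case"),
    (10, "the", "DET", 11, "det"),
    (11, "street", "NOUN", 5, "obl"),
    (12, ".", "PUNCT", 5, "punct"),
]
TOKENS = [Token(*row) for row in ROWS]

# Assumption: relations treated here as function-word or punctuation attachments.
FUNCTION_RELS = {"det", "aux", "case", "cop", "mark", "cc", "punct"}


def content_skeleton(tokens):
    """Return (head form, relation, dependent form) for relations between content words."""
    by_id = {t.id: t for t in tokens}
    return [
        (by_id[t.head].form, t.deprel, t.form)
        for t in tokens
        if t.head != 0 and t.deprel not in FUNCTION_RELS
    ]


def function_words_are_leaves(tokens):
    """True if no function word or punctuation mark governs another token."""
    heads = {t.head for t in tokens}
    return all(t.id not in heads for t in tokens if t.deprel in FUNCTION_RELS)


print(content_skeleton(TOKENS))
# [('chased', 'nsubj', 'cat'), ('chased', 'obj', 'dogs'), ('chased', 'obl', 'street')]
print(function_words_are_leaves(TOKENS))  # True
```

Stripping the function-word relations leaves only the predicate and its three content-word dependents, which is the part of the tree that the first principle describes.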
“Content-Head Dependencies”
[Figure: we have come to Osaka, with the arc labels nsubj, aux, obl, case]

“Function-Head Dependencies”
[Figure: we have come to Osaka, with the arc labels nsubj, vg, obl, pobj]

Dubious Linguistics?
“Such an approach to the syntax of natural languages is contrary to most work in theoretical syntax in the past 35 years, regardless of whether this work is constituency- or dependency-based.” (Groß and Osborne, 2015)

What is a head?
Zwicky (1985), summarised by Hudson (1987)

Why choose one?
Head properties may be shared by several elements
• So neither content-head nor function-head can be quite right
Linguistic theories capture this in different ways
• Lexical vs. functional heads (Chomsky, 1995)
• Surface syntax vs. deep syntax (Sgall et al., 1986; Mel’čuk, 1988)
• Dissociated nucleus (Tesnière, 1959)
What about UD?

UD Syntax
UD representations are mono-stratal – single tree
• Facilitates annotation, parsing and downstream tasks
Tree structure primarily reflects lexical dependencies
• Brings out parallelism between typologically diverse languages
• Reveals predicate-argument structure for downstream tasks

[Figure: parallel passive clauses.
The dog was chased by the cat . (DET NOUN AUX VERB ADP DET NOUN PUNCT): root, nsubj:pass, aux:pass, obl, case, det, det, punct
Hunden jagades av katten . (NOUN VERB ADP NOUN PUNCT; Definite=Def, Voice=Pass, Definite=Def): root, nsubj:pass, obl, case, punct]

[Figure: Reddy et al. (2016), Transforming Dependency Structures to Logical Forms for Semantic Parsing. "Disney acquired Pixar" with nsubj and dobj is composed as (nsubj (dobj acquired Pixar) Disney), which yields the logical form
$\lambda z.\ \exists x y.\ \mathrm{acquired}(z_e) \land \mathrm{Pixar}(y_a) \land \mathrm{Disney}(x_a) \land \mathrm{arg}_1(z_e, x_a) \land \mathrm{arg}_2(z_e, y_a)$]
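The claim that the tree structure brings out parallelism across languages can be illustrated with the English and Swedish passive clauses from the slide. This is a small Python sketch under stated assumptions: the two trees are hand-encoded as (FORM, HEAD, DEPREL) tuples, and `FUNCTION_RELS` and `content_relations` are hypothetical helper names introduced only for this example.

```python
# Trees as (FORM, HEAD, DEPREL); HEAD is a 1-based token position, 0 marks the root.
ENGLISH = [
    ("The", 2, "det"),
    ("dog", 4, "nsubj:pass"),
    ("was", 4, "aux:pass"),
    ("chased", 0, "root"),
    ("by", 7, "case"),
    ("the", 7, "det"),
    ("cat", 4, "obl"),
    (".", 4, "punct"),
]
SWEDISH = [
    ("Hunden", 2, "nsubj:pass"),
    ("jagades", 0, "root"),
    ("av", 4, "case"),
    ("katten", 2, "obl"),
    (".", 2, "punct"),
]

# Assumption: relations treated as function-word or punctuation attachments.
FUNCTION_RELS = {"det", "aux", "case", "punct"}


def content_relations(tokens):
    """Sorted relation labels between content words, ignoring the root arc."""
    return sorted(
        rel
        for _, _, rel in tokens
        if rel != "root" and rel.split(":")[0] not in FUNCTION_RELS
    )


print(content_relations(ENGLISH))  # ['nsubj:pass', 'obl']
print(content_relations(SWEDISH))  # ['nsubj:pass', 'obl']
```

English marks the passive with an auxiliary and definiteness with articles, Swedish marks both morphologically, yet the relations between content words come out identical.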
Other relations encoded in labels – not tree structure
• Functional relations link functional heads to lexical heads
• Coordination relations link equivalent heads/dependents
• Multiword relations link elements of lexicalized expressions

[Figure: three example phrases.
the man in the moon: det, nmod, case, det
Jack and Jill left: nsubj, conj, cc
in spite of this: case, fixed, fixed]
[Figure: we have come to Osaka. In the full tree the labels are nsubj, aux, obl, case (contrasted with nsubj, vg, obl, pobj). In the reduced view only nsubj and obl remain as relations between units, with aux and case kept inside the units. Terms shown alongside: dependency and nucleus; karaka and vibhakti; kakariuke and bunsetsu.]
Linguistic Typology
[Figure: nonverbal predication. "she is nice" uses the copula strategy (nsubj, cop); "ona mila" uses the null strategy (nsubj only).]
Croft et al. (2017) Linguistic Typology Meets Universal Dependencies
[Figure: Ivan is the best dancer (nsubj, cop, det, amod) alongside Ivan lučšij tancor (nsubj, amod)]

Verb Groups
[Figure: Ivan will participate in the show (nsubj, aux, obl, case, det) alongside Ivan participera à le spectacle (nsubj, obl, case, det)]

Adpositions and Case
[Figure: the Chair ’s office (det, nmod, case) and the office of the Chair (det, nmod, case, det) alongside to grafeío tou proédrou (det, nmod, det)]
Syntax or Semantics?
Are not UD dependencies semantic rather than syntactic?
• Dependencies capture predicate-argument relations
• Dependencies do not (directly) capture agreement and government
A functional view of syntax:
• UD is based on grammatical functions, not semantic roles
• UD models relations that are encoded morphosyntactically
• UD does not model semantic role alternations

Grammatical Functions
[Figure: example sentences.
Thomas Alva Edison invented the light bulb as well as the phonograph (PROPN PROPN PROPN VERB DET NOUN NOUN ADP ADV ADP DET NOUN): nsubj, obj, conj, cc, flat, flat, compound, fixed, fixed, det, det
Mary sent Peter a book (PROPN VERB PROPN DET NOUN): nsubj, iobj, obj, det
Mary sent a book to Peter (PROPN VERB DET NOUN ADP NOUN): nsubj, obj, det, obl, case
Mary sent a book on Sunday (PROPN VERB DET NOUN ADP NOUN): nsubj, obj, det, obl, case]
Valency-Changing Operations
[Figure: example sentences.
Mary broke the vase (PROPN VERB DET NOUN): nsubj, obj, det
the vase broke (DET NOUN VERB): det, nsubj
the vase was broken by Mary (DET NOUN AUX VERB ADP PROPN): det, nsubj:pass, aux:pass, obl:agent, case
Hasan koştu, glossed (Hasan) (ran) (PROPN VERB): nsubj
(ben) Hasanı koşturdum, glossed (I) (Hasan) (made-run) (PROPN VERB): nsubj, obj:caus]
UD Representations
• Mono-stratal but multi-relational representations
• Grammatical functions take priority
• Both lexical and functional heads can be extracted
But you need to be aware of this!
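That functional heads can be extracted from a content-head tree can be sketched with a small conversion on "we have come to Osaka". This is an illustrative Python sketch only: the target labels `vg` and `pobj` are taken from the function-head figure, while the function name `to_function_head`, the `PROMOTE` table, and the decision to move the subject to the promoted auxiliary are assumptions made for this example. It handles at most one promoted function word per content word.

```python
# Content-head (UD-style) tree: id -> form, head id (0 = root), relation.
UD_TREE = {
    1: {"form": "we", "head": 3, "deprel": "nsubj"},
    2: {"form": "have", "head": 3, "deprel": "aux"},
    3: {"form": "come", "head": 0, "deprel": "root"},
    4: {"form": "to", "head": 5, "deprel": "case"},
    5: {"form": "Osaka", "head": 3, "deprel": "obl"},
}

# Assumption: function relation -> label given to the demoted content word.
PROMOTE = {"aux": "vg", "case": "pobj"}


def to_function_head(tree):
    """Promote aux and case dependents to heads of their content words."""
    out = {i: dict(tok) for i, tok in tree.items()}
    for i, tok in tree.items():
        if tok["deprel"] not in PROMOTE:
            continue
        content = tok["head"]
        # The function word takes over the content word's attachment ...
        out[i]["head"] = out[content]["head"]
        out[i]["deprel"] = out[content]["deprel"]
        # ... and the content word becomes its dependent.
        out[content]["head"] = i
        out[content]["deprel"] = PROMOTE[tok["deprel"]]
        # Assumption: the subject attaches to the promoted auxiliary.
        if tok["deprel"] == "aux":
            for dep in out.values():
                if dep["head"] == content and dep["deprel"] == "nsubj":
                    dep["head"] = i
    return out


converted = to_function_head(UD_TREE)
for i in sorted(converted):
    tok = converted[i]
    print(i, tok["form"], tok["head"], tok["deprel"])
```

The conversion needs no information beyond the relation labels, which is the sense in which the single UD tree is multi-relational.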
Head-Initial or Head-Final?
[Figure: we have come to Osaka, shown with the labels nsubj, aux, obl, case and with the labels nsubj, vg, obl, pobj]
Data
50 languages
70 treebanks
http://universaldependencies.org

Treebank Size
[Figure: chart of treebank sizes on a scale from 0 to 1,500,000 with the median marked; the largest treebanks are labeled Czech, Russian and Arabic]

Language Family
[Figure: chart of languages by family. Indo-European 34, Other 4, Isolates 3, Turkic 3, Afro-Asian 3, Uralic 3. Indo-European branches: Slavonic 10, Romance 8, Germanic 7, Indian 3, and Baltic, Celtic, Greek and Iranian with 1 or 2 each]

Genre
[Figure: chart of genres on a scale from 0 to 50: News, Fiction, Non-Fiction, Wikipedia, Legal, Blog, Spoken, Bible, Medical, Reviews, Grammar, Web, Other]

Studies
• Word order freedom studied in UD treebanks
• Conditional entropy of order given dependencies
• Test hypotheses about case and word order freedom
Thanks to Richard for sharing slides!

Word Order Freedom
[Figure: English and German compared]

Conditional Entropy
Local delexicalized trees to avoid data sparsity

Word Order and Morphology
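As a rough illustration of the measure, conditional entropy of order given the dependency can be estimated from counts as H(order | context) = -Σ p(context, order) log2 p(order | context). The Python sketch below is a simplification, not the estimator used in the study: it conditions on a single arc (relation label plus the UPOS tags of head and dependent) instead of a full local tree, and the names `arc_observations` and `conditional_entropy` are hypothetical.

```python
import math
from collections import Counter


def arc_observations(tokens):
    """Turn (id, upos, head, deprel) tuples into (context, order) pairs.

    The context is delexicalized: relation label plus UPOS of head and dependent.
    """
    upos = {i: tag for i, tag, _, _ in tokens}
    observations = []
    for i, tag, head, rel in tokens:
        if head == 0:  # skip the root, which has no head to be ordered against
            continue
        context = (rel, upos[head], tag)
        order = "head-first" if head < i else "dep-first"
        observations.append((context, order))
    return observations


def conditional_entropy(observations):
    """Maximum-likelihood estimate of H(order | context) in bits."""
    joint = Counter(observations)
    context_totals = Counter()
    for (context, _), n in joint.items():
        context_totals[context] += n
    total = sum(joint.values())
    return -sum(
        (n / total) * math.log2(n / context_totals[context])
        for (context, _), n in joint.items()
    )


# Toy clauses: object before the verb, and object after the verb.
OV = [(1, "NOUN", 2, "obj"), (2, "VERB", 0, "root")]
VO = [(1, "VERB", 0, "root"), (2, "NOUN", 1, "obj")]

fixed = arc_observations(OV) + arc_observations(OV)
mixed = arc_observations(OV) + arc_observations(VO)
print(conditional_entropy(fixed))  # no uncertainty about the order
print(conditional_entropy(mixed))  # both orders equally likely
```

A corpus where a relation always appears in the same order contributes zero entropy, while one where both orders are equally frequent contributes one bit; plain count-based estimates like this one are sensitive to corpus size, which is why sparsity matters.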
Semantics and Word Order

• Word order typology based on Bible translations
• Massively parallel alignment and annotation projection
• UD tags and dependencies projected from English
Thanks to Robert for sharing slides!

Methodology
Word alignment of parallel texts:
• New Testament in 986 languages (1144 translations)
• Bayesian word alignment with interlingua (Östling, 2015)
Annotation projection:
• UD tags and dependencies from 5 English translations
• Sparse, high-precision projection (80% of links must agree)
Word order statistics:
• Count frequency of different word order variants (tokens or types)
• Constructions: SOV – SV – OV – Adp-Noun – Adj-Noun
New information for ≈ 600 languages

Conclusion
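The projection and counting steps can be sketched in a few lines of Python. This is a hedged illustration, not the study's implementation: it reads "80% of links must agree" as a vote among the labels projected from the English translations, and the names `project_label` and `order_counts`, the vote format, and the pattern names for orders other than SV and OV are assumptions.

```python
from collections import Counter


def project_label(votes, threshold=0.8):
    """Keep the majority label only if its share of the votes reaches the threshold."""
    if not votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n / len(votes) >= threshold else None


def order_counts(arcs):
    """Count word order variants from (dependent position, head position, relation)."""
    names = {"nsubj": ("SV", "VS"), "obj": ("OV", "VO")}
    counts = Counter()
    for dep, head, rel in arcs:
        if rel in names:
            dep_first, head_first = names[rel]
            counts[dep_first if dep < head else head_first] += 1
    return counts


# Labels projected onto one target token from 5 English translations.
print(project_label(["obj", "obj", "obj", "obj", "nsubj"]))    # obj
print(project_label(["obj", "obj", "obj", "nsubj", "nsubj"]))  # None

# Projected arcs from two toy clauses: S O V, then S V O.
ARCS = [(1, 3, "nsubj"), (2, 3, "obj"), (1, 2, "nsubj"), (3, 2, "obj")]
print(order_counts(ARCS))
```

Discarding links below the threshold trades coverage for precision, which suits a setting where each language contributes many verses and only aggregate order frequencies are needed.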
• Large collection of languages – but biased sample and mostly small corpora
• Cross-linguistically consistent grammatical annotation – but you have to know the quirks
http://universaldependencies.org