The data Chomsky hierarchy Bottom up Problems Conclusion

Expressive power Learnability and Language Acquisition

Alexander Clark

Department of Computer Science Royal Holloway, University of London

Tuesday The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion

(Ed Stabler, Greg Kobele, Makoto Kanazawa) The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion More precisely, we observe sounds/gestures. • We don’t observe the meanings, but we are aware of • entailment relations between propositions. (or something similar) The sound/meaning relation is not one-to-one, but there is • a lot of structure. (knowing what was said tells you a lot about the meaning!)

The data Chomsky hierarchy Bottom up Problems Conclusion

What are the linguistic data?

We observe sound/meaning pairs. • The data Chomsky hierarchy Bottom up Problems Conclusion

What are the linguistic data?

We observe sound/meaning pairs. • More precisely, we observe sounds/gestures. • We don’t observe the meanings, but we are aware of • entailment relations between propositions. (or something similar) The sound/meaning relation is not one-to-one, but there is • a lot of structure. (knowing what was said tells you a lot about the meaning!) The data Chomsky hierarchy Bottom up Problems Conclusion

Sound/meaning pairs s5 m5

s4 m4

s3

m3

s2

m2

s1

m1 The data Chomsky hierarchy Bottom up Problems Conclusion

Structural descriptions Traditional view

grammar

g

SD

φ ψ

sound/SM meaning/CI The data Chomsky hierarchy Bottom up Problems Conclusion

Example

Grammar might be a context-free grammar, augmented • with some compositional semantics. Structure might be a labeled derivation (parse) tree. • φ is just the yield of the tree (a function). • ψ just computes the semantics (another function). • The data Chomsky hierarchy Bottom up Problems Conclusion

Standard model s5 m5

s4 m4

s3

m3

s2

m2

s1

m1 Everybody here speaks two languages. • Either way is fine. • I gave him a cake and her a biscuit/I gave him a cake. • John likes but Marys hates Sibelius/John likes Sibelius •

The data Chomsky hierarchy Bottom up Problems Conclusion

Issues

Description If you are just describing the data, then you can stipulate this as a requirement. If you are learning then you may not be able to learn grammars that satisfy this stipulation. So you need to decide whether this stipulation is necessary. The data Chomsky hierarchy Bottom up Problems Conclusion

Issues

Description If you are just describing the data, then you can stipulate this as a requirement. If you are learning then you may not be able to learn grammars that satisfy this stipulation. So you need to decide whether this stipulation is necessary.

Everybody here speaks two languages. • Either way is fine. • I gave him a cake and her a biscuit/I gave him a cake. • John likes but Marys hates Sibelius/John likes Sibelius • The data Chomsky hierarchy Bottom up Problems Conclusion

Alternative model s5 m5

s4 m4

s3

m3

s2

m2

s1

m1 The data Chomsky hierarchy Bottom up Problems Conclusion

Invalid structures s5 m5

s4 m4

s3

m3

s2

m2

s1

m1 The data Chomsky hierarchy Bottom up Problems Conclusion

Weak and Strong

Weak generative capacity Set of strings that a grammar can generate Mathematically tractable Strong generative capacity Set of structures that a grammar can generate Linguistically interesting but less tractable From a learnability point of view, this is not so crucial: the data that the child sees consists of semantically well formed utterances. The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion The data Chomsky hierarchy Bottom up Problems Conclusion

Chomsky hierarchy of rewriting systems Chomsky, 1959

Rewriting rules of the form α β → Type 0 Unrestricted rewriting systems Type 1 Context sensitive grammars Type 2 Context free grammars Type 3 Regular grammars/finite automata

Classes of representations • Purely “syntactic” hierarchy • The data Chomsky hierarchy Bottom up Problems Conclusion

Rules

Terminal symbols Σ: a, b, c observed nonterminal symbols V : N, P, Q unobserved Rules: α β → N abc N → Pbc N → aPb N → PaQ aNb→ aPQb NabP→ QcRS → The data Chomsky hierarchy Bottom up Problems Conclusion

Finite languages

Grammar rules Simple list of all sentences in the language N abc analogy:→ Monkey vocalisations Language class All finite languages Languages with a finite number of sentences The data Chomsky hierarchy Bottom up Problems Conclusion

Regular languages

Grammar rules N aP Only→ one nonterminal on the rhs, and all terminals on the left. Language class All regular languages DFAs, NFAs, regular languages, generated by a , finite Myhill-Nerode congruence, . . . (Predates the Chomsky Hierarchy) The data Chomsky hierarchy Bottom up Problems Conclusion

Linear languages

Grammar rules N aPbc Only→ one nonterminal on the rhs Language class All linear languages Between regular languages and context-free languages Example L = anbn n > 0 { | } The data Chomsky hierarchy Bottom up Problems Conclusion

Context-free grammars

Grammar rules N aPbQc Only→ one nonterminal on the lhs, any number on the rhs Language class Context-free languages Polynomially parsable regular grammars applied to trees Grammar 1 S ab S → abS → S S

a b S a b S

a b a b S

a b

The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of context-free languages

(ab)+ The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of context-free languages

(ab)+ Grammar 1 S ab S → abS → S S

a b S a b S

a b a b S

a b The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of context-free languages

(ab)+ Grammar 2 S ab and S aTb T → ba and T → bSa → → S S

a T b a T b

b a b S a

a b The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of context-free languages

(ab)+ Grammar 3 S ab and S SS → → S S

S S S S

a b a b S S a b

a b a b The data Chomsky hierarchy Bottom up Problems Conclusion

Which is the ’right’ structure? S S

a T b S S

b S a S S a b

a b a b a b S

a b S

a b S

a b The data Chomsky hierarchy Bottom up Problems Conclusion

English example Which structure gives you the right dependencies?

Mary kicked the ball and Jill ate the cake • S

S and S

Mary VP Jill VP

kicked NP ate NP

the ball the cake The data Chomsky hierarchy Bottom up Problems Conclusion

English example Which structure gives you the right dependencies?

Mary kicked the ball and Jill ate the cake • S

NP VP

Mary kicked X NP

NP and Y the cake

the ball Jill ate The data Chomsky hierarchy Bottom up Problems Conclusion

Semantically well-formed

Mary kicked the ball and Jill ate the cake • # Mary fed the ball and Jill ate the cake • # Mary kicked the cake and Jill ate the cake • # Mary fed the ball and Jill ate the cat • Mary fed the cat and Jill ate the cake • Crucial to consider semantic wellformedness, not just syntactic wellformedness, especially when we come to think about learnability. Grammar 1 S ab S → aSb → S

a S b

a b

The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of (properly) context-free languages

anbn n > 0 | The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of (properly) context-free languages

anbn n > 0 | Grammar 1 S ab S → aSb → S

a S b

a b The data Chomsky hierarchy Bottom up Problems Conclusion

Examples of context-free languages

anbn n > 0 | Grammar 2 S ab and S aT T → b and T →aTb → → S

a T

a T b

b The data Chomsky hierarchy Bottom up Problems Conclusion

Argument that a language is not regular

Facts: If L and R are regular languages, then L R is also regular. • ∩ General structure of correct argument We want to show that L is not regular, but L is very complex Take a simple regular language R L R is a simple but infinite subset of L Show∩ that L R is not regular Therefore L is∩ not regular The data Chomsky hierarchy Bottom up Problems Conclusion

English is not regular

Example – Shuly Wintner A white male hired another white male A white male – whom a white male hired – hired another white male Define R R = A white male(whom a white male)∗(hired)∗hired another white male { } Subset of English L R = A∩ white male(whom a white male)n(hired)nhired another white male n > {0 | } The data Chomsky hierarchy Bottom up Problems Conclusion

Argument that a language is not context free

Facts: If L is CF and R is regular, then L R is also CF. • ∩ General structure of correct argument We want to show that L is not CF, but L is very complex Take a simple regular language R L R is a simple but infinite subset of L Show∩ that L R is not CF Therefore L is∩ not CF In principle an unbounded number of crossed dependencies is possible. However, except for the first and last verb any permutation of the NPs and the verbs is grammatical as well (even though with a completely different dependency structure since the dependencies are always cross-serial). Therefore, the string language of Dutch cross-serial dependencies amounts roughly to nkvk k>0 which is a context-free language. { | } Bresnan et al. (1982) argue that the SGC of CFGs is too limited for the Dutch examples. Weakness of the argument: an argument about syntactic structure makes always certain theoretical stipulations. Although it is very probable, it does not absolutely prove that, even using different syntactic theories, there is no context-free analysis for the Dutch examples. It only shows that the Bresnan et al. think the appropriate ones cannot be obtained with a CFG.

Shieber’sThe data argumentChomsky about hierarchy Swiss German cross-serialBottom up dependenciesProblems is more convincingConclusion since it relies only on the string language, i.e., it concerns the WGC of CFGs. The Swiss German data:

(6) ... das mer em Hans esSwiss huus h¨alfed German aastriiche ... that we Hans-DAT house-ACCShieber, helped 1985 paint

‘... that we helped Hans paint the house’

(7) ... das mer d’chind em Hans es huus l¨ond h¨alfe aastriiche ... that we the children-ACC Hans-DAT house-ACC let help paint

‘... that we let the children help Hans paint the house’

Swiss German uses case marking and displays cross-serial dependencies.

Proposition 5 The language L of Swiss German is not context-free (Shieber 1985).

Argumentation goes as follows: Assume that L is context-free. Then the intersection of a regular language with the image of L under a homomorphism must be context-free as well. Find a homomorphism and a regular language such that the result is a non context-free language. Contradiction. Further possible sentence:

(8) ... das mer d’chind em Hans es huus haend wele laa h¨alfe aastriiche ... that we the children-ACC Hans-DAT house-ACC have wanted let help paint ‘... that we have wanted to let the children help Hans paint the house’

Swiss German allows constructions of the form (Jan s¨ait) (‘Jan says’)das mer (d’chind)i (em Hans)j es huus haend wele (laa)i (h¨alfe)j aastriiche. In these constructions the number of accusative NPs d’chind must equal the number of verbs (here laa) selecting for an accusative and the number of dative NPs em Hans must equal the number of verbs (here h¨alfe) selecting for a dative object. Furthermore, the order must be the same in the sense that if all accusative NPs precede all dative NPs, then all verbs selecting an accusative must precede all verbs selecting a dative. Homomorphism f: f(“d’chind”) = af(“Jan s¨ait das mer”) = w f(“em Hans”) = bf(“es huus haend wele”) = x f(“laa”) = cf(“aastriiche”) = y f(“h¨alfe”) = df(s)=z otherwise

6 Pick regular language a is accusative noun, b is dative noun c is verb that needs accusative, d is verb that needs dative R = a∗b∗c∗d ∗

The data Chomsky hierarchy Bottom up Problems Conclusion

Proof

This language is not CF abcd, aabccd, aaabcccd, abbcdd, . . . number of as equals number of cs and number of bs equals number of ds anbmcnd m The data Chomsky hierarchy Bottom up Problems Conclusion

Proof

This language is not CF abcd, aabccd, aaabcccd, abbcdd, . . . number of as equals number of cs and number of bs equals number of ds anbmcnd m Pick regular language a is accusative noun, b is dative noun c is verb that needs accusative, d is verb that needs dative R = a∗b∗c∗d ∗ The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion The data Chomsky hierarchy Bottom up Problems Conclusion

Derivation tree Top-down versus Bottom-up

S

a S b

a S b

c The data Chomsky hierarchy Bottom up Problems Conclusion

Top down versus bottom up Kanazawa (2011)

Top down Chomsky (1956, 1959) N PQ • → Rewrite N as PQ • Bottom up derivations Smullyan (1961), Chomsky (1995), Stabler (1997) N PQ • ← Combine P and Q to make N • N(xy) := P(x), Q(y) • The data Chomsky hierarchy Bottom up Problems Conclusion

Minimalist program and MERGE

External merge Combine X and Y to make a new object. Internal merge Y is part of X Minimalist Grammars (Stabler, 1997) Tractable formalisation of this approach. The data Chomsky hierarchy Bottom up Problems Conclusion

Mildly context sensitive languages Joshi

Slightly vague and informal definition: Definition A class of languages is MCS if: Not too complex: Efficiently parsable (in P) • Constant growth property/semilinearity • Includes some classes of crossing dependencies • (includes all context-free languages) • The data Chomsky hierarchy Bottom up Problems Conclusion

Convergence of MCS formalisms Joshi et al. (1990)

A crucial result for linguistics: all of the then current proposals turned out to be weakly equivalent: Linear indexed grammars • Tree adjoining grammars • Head grammars • Combinatory categorial grammars • The derivation structures can all be described using LCFRs (we will use Multiple Context Free Grammars instead) The data Chomsky hierarchy Bottom up Problems Conclusion

Movement MCFGs and MGs

See handout. The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion The data Chomsky hierarchy Bottom up Problems Conclusion

Argument that a language is not MCF

Facts (Seki et al.): If L is MCF and R is regular, then L R is also MCF. • ∩ General structure of correct argument We want to show that L is not MCF, but L is very complex Take a simple regular language R L R is a simple but infinite subset of L Show∩ that L R is not MCF Therefore L is∩ not MCF The data Chomsky hierarchy Bottom up Problems Conclusion

Old Georgian Kracht and Michaelis

Suffixaufnahme/Stacking in Old Georgian A word can have a number of case-suffixes that depends on how deeply embedded it is WME- word marking English Verb ’run’, and noun ’son’, case marker ’gen’ son run • son songen run • son songen songengen run • son songen songengen songengengen run. • See Joshi’s rebuttal and Pesetsky’s recent work The data Chomsky hierarchy Bottom up Problems Conclusion

Chinese number names Radzinski, 1991

Data wu = 5, zhao = 1012 wu zhao zhao wu zhao = 5 1024 + 5 1012 *wu zhao wu zhao zhao × × Language intersected with wu, zhao ∗ k k { kn } wuzhao 1 wuzhao 2 ... wuzhao k > k > kn { | 1 2 ··· } This is not semilinear The data Chomsky hierarchy Bottom up Problems Conclusion

Yoruba Kobele, 2006 (and MCFG2 2011)

Relative clauses Olu ra adie ti o go • Olu buy chicken that 3s dumb • Rira adie ti o go to Olu ra adie ti o go ko da • buying chicken that 3s dumb that Olu buy chicken that 3s • dumb not good * Rira adie ti o go to Olu ra adie ti o kere ko da • Recursive copying: intersection with appropriate R gives 2n copies of ’adie’ The data Chomsky hierarchy Bottom up Problems Conclusion

Outline

The data

Chomsky hierarchy Swiss German

Bottom up

Problems

Conclusion The data Chomsky hierarchy Bottom up Problems Conclusion

Conclusion

Broad consensus that somewhere in the MCFG hierarchy • is correct Some problematic cases but all a bit marginal • Possibly we need some true copying operation as well • N1(uu) P1(u) Also for← full stem reduplication in morphology? Alternatively there might be some nontrivial operation at • Spellout. We will make the default assumption here that MCFGs are • adequate.