
Lecture 11: Computational Psycholinguistics

Paula Buttery

Dept of Theoretical & Applied Linguistics, University of Cambridge

Paula Buttery (DTAL) Computational Psycholinguistics 1 / 47

What is Psycholinguistics?

Psycholinguistics is concerned with: how we acquire, comprehend and produce language; understanding how language is stored and processed in the brain.

Paula Buttery (DTAL) Computational Psycholinguistics 2 / 47 What is Psycholinguistics?

Psycholinguistics is concerned with: how we acquire, comprehend and produce language; understanding how language is stored and processed in the brain.

Paula Buttery (DTAL) Computational Psycholinguistics 2 / 47 Example research questions in psycholinguistics:

How are words organised in the brain? Full listing: cat cat + N + Sing Minimum redundancy: cats cat + N + PL cat N hope hope + V hope V hopes hope + V + 3P + Sing vs. fox N fox fox + N + Sing fox V fox fox + V ^s +PL foxes fox + N + PL ^s +3P + Sing foxes fox + V + 3P + Sing ^ed +PastPart foxed fox + V + PastPart

Example research questions in psycholinguistics: Morphology

How are words organised in the brain?

Full listing:
cat: cat +N +Sing
cats: cat +N +PL
hope: hope +V
hopes: hope +V +3P +Sing
fox: fox +N +Sing; fox +V
foxes: fox +N +PL; fox +V +3P +Sing
foxed: fox +V +PastPart

vs. Minimum redundancy:
stems: cat N; hope V; fox N; fox V
affixes: ^s +PL; ^s +3P +Sing; ^ed +PastPart
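The contrast between the two storage models can be sketched in code. This is a minimal illustration, not a claim about the lecture's formalism: the full-listing model stores every inflected form whole, while the minimum-redundancy model stores stems and affixes separately and composes them at lookup. All dictionary entries below are toy data drawn from the slide's examples.

```python
# Minimum-redundancy lexicon: stems and affixes stored separately (toy data).
STEMS = {"cat": "N", "hope": "V", "fox": "N"}
AFFIXES = {"es": "PL", "s": "PL", "ed": "PastPart"}

def analyse(word):
    """Decompose a word into (stem, category, feature); bare stems get feature None."""
    if word in STEMS:
        return (word, STEMS[word], None)
    for affix, feature in AFFIXES.items():
        stem = word[: -len(affix)]
        if word.endswith(affix) and stem in STEMS:
            return (stem, STEMS[stem], feature)
    return None  # not in the lexicon at all

print(analyse("foxes"))  # ('fox', 'N', 'PL')
```

A full-listing model would instead need a dictionary entry for every one of cat, cats, hope, hopes, fox, foxes, foxed; the psycholinguistic question is which organisation better predicts human reaction times.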


Example research questions in psycholinguistics: Syntax

Syntactic complexity: What makes a sentence difficult to process?

The cat the dog licked ran away.
The cat the dog the rat chased licked ran away.
The fact that the employee who the manager hired stole office supplies worried the executive. (complement clause then relative clause)
The executive who the fact that the employee stole office supplies worried hired the manager. (relative clause then complement clause)
(T. Gibson)


Example research questions in psycholinguistics: Syntax

Parsing: How does the brain perform parse disambiguation?

Choosing from multiple parses: He saw the boy with the telescope.
(He saw (the boy with the telescope)) vs. ((He saw the boy) with the telescope)

Online parsing ambiguity: The student forgot the solution was in the back of the book.

Strong garden path effect: The horse raced past the barn fell.


Example research questions in Psycholinguistics: Semantics

In what manner is word meaning stored in the brain? For concepts (nouns):

categories theory (Lakoff): women, fire and dangerous things
decompositional feature-based model: bird: +feathers +fly +beak (what shall we use as features?)
prototype theory (canonical examples): bird: crow (rather than penguin)
exemplar theory (multiple good examples): bird: {crow, parrot, sparrow}
semantic networks (is-a or has-a): bird: is-a animal, has-a feathers ...


Paula Buttery (DTAL) Computational Psycholinguistics 6 / 47 0.57::melt v 0.33::amount n+in p() 0.29::pot n+and c 0.44::pron rel +smoke v 0.33::ceramic a 0.28::bottom n+of p() 0.43::of p()+gold n 0.33::hot a 0.28::of p()+flower n 0.41::porous a 0.32::boil v 0.28::of p()+water n 0.40::of p()+tea n 0.31::bowl n+and c 0.28::food n+in p() 0.39::player n+win v 0.31::ingredient n+in p() 0.39::money n+in p() 0.30::plant n+in p() 0.38::of p()+coffee n 0.30::simmer v

Finally, note that distributions contain many contexts which arise from multiword expressions of various types, and these often have high weights. The distribution of pot contains several examples, as does the following distribution for time:

0.46::of p()+death n 0.40::pron rel +spend v 0.37::country n+at p() 0.45::same a 0.39::sand n+of p() 0.37::age n+at p() 0.45::1 n+at p(temp) 0.39::pron rel +waste v 0.37::space n+and c 0.45::Nick n+of p() 0.38::place n+around p() 0.37::in p()+career n 0.42::spare a 0.38::of p()+arrival n 0.37::world n+at p() Example0.42::playoffs n+for researchp() questions0.38::of p()+completion in Psycholinguistics:n Semantics 0.42::of p()+retirement n 0.37::after p()+time n 0.41::of p()+release n 0.37::of p()+arrest n

InTosum what up,thereis manner a widerangeofchoicein is word meaning constructingdist storedributional in the models. brain? Manually examining the characteristic contexts gives us a good idea of how sensible different weighting measures are, for instance, but we need to look at Forhow other distributions words: are actually used to evaluate how well theymodelmeaning. Distributional models (statistical co-occurrence of with other words 8.4and Similarity grammatical contexts) Calculating similarity in a distributional space is done by calculating the distance between the vectors.

The most common method is cosine similarity. Law of cosines: c2 = a2 + b2 2ab cos γ • − Paula Buttery (DTAL) Computational Psycholinguistics 7 / 47
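Cosine similarity over sparse context vectors can be sketched as follows. The context labels and weights are hypothetical, merely in the style of the pot distribution; this is an illustration of the measure, not the lecture's actual data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse context-weight vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v))

# Hypothetical weighted contexts, in the style of the pot distribution above.
pot    = {"melt_v": 0.57, "ceramic_a": 0.33, "boil_v": 0.32, "of+tea_n": 0.40}
kettle = {"boil_v": 0.60, "of+tea_n": 0.50, "melt_v": 0.10}
print(round(cosine(pot, kettle), 3))  # high similarity: shared boiling/tea contexts
```

Words sharing many high-weighted contexts get a cosine near 1; words with disjoint contexts get 0.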

Psycholinguists use a range of methodologies

Questionnaires:
Rating experiments, e.g. how do you rate the grammaticality of this sentence?
Self evaluations, e.g. how were you carrying out the task? (discovering participant knowledge of the task)


Paula Buttery (DTAL) Computational Psycholinguistics 9 / 47 Evidence may be obtained from observation

Observations: study of speech errors; study of the language of aphasics; study of language acquisition

Speech Error Data:
It's not only us who have screw looses. (screws loose)
He has already trunked two packs. (packed two trunks)

Evidence may be obtained from observation

Aphasic Data:
it's all happenining in the kitchen. an' he's fallenining off the stool

Evidence may be obtained from observation

Language Acquisition Data

Brown's stages:
Stage 2 (2.0–2.5): -s plurals
Stage 3 (2.5–3.0): 's possessive
Stage 4 (3.0–3.75): regular past tense
Stage 5 (3.75–4.5): irregular 3rd person verbs (dos → does, haves → has)
(show CHILDES example)
The wug test (elicitation)

Jean Berko-Gleason designed the Wug Test

This is a man who knows how to SPOW. He is SPOWING. He did the same thing yesterday. Yesterday he .....

Psycholinguists use a range of methodologies cont.

Questionnaires: rating experiments; self evaluations

Observations: study of speech errors; study of the language of aphasics; study of language acquisition

Experimental observation as a response to stimulus: measurement of brain response; measurement of reading times; measurement of reaction time to a linguistic task

Brain responses may be measured by several methods

EEG, MEG

High temporal resolution; problematic spatial resolution

Brain responses may be measured by several methods

fMRI

BOLD (blood-oxygen-level dependent) response: measures the change in magnetization between oxygen-rich and oxygen-poor blood. High spatial resolution; low temporal resolution.

Measuring reading and reacting times

Eye tracking

Button pressing: self-paced reading; completion of a task (e.g. a lexical decision task)

For all reaction time experiments we assume that the time taken to react to a task reflects the ‘difficulty’ of the cognitive processes involved.

Evaluating responses

Shadowing: participant repeats the stimulus. E.g. lexical correction (Marslen-Wilson and Welsh); missing auxiliaries (Caines).

Experimental Strategies: Priming Effect

Established that exposure to a stimulus influences a response to a later stimulus. Caused by spreading activation: the priming stimulus activates part(s) of the brain, so when the second stimulus is encountered less additional activation is needed. Priming manifests itself as a measurable change in reaction.

Lexical priming: priming experiments show us that lifting primes lift and burned primes burn, but selective does not prime select (this maybe tells us something about derivational vs. inflectional morphology).

Syntactic priming: e.g. get the participant to read "the ghoul sold a vacuum cleaner to a witch"; then ask the participant to describe a picture of a vampire handing a ghost a hat; the participant is more likely to use the to-construction (i.e. "the vampire hands a hat to the ghost"). Also shown for passive constructions (Branigan, Pickering and Cleland) and dialogue modelling (Pickering and Garrod).

So what is a Computational Psycholinguist?

Computational Psycholinguists generate testable hypotheses by building computational models of language processes and also by drawing on information theory.

Note that Information Theoretic predictions are not always explanatory in terms of processing mechanisms e.g. Uniform Information Density—Jaeger

the girl I saw last Saturday vs the girl that I saw last Saturday

How are words organised in the brain?

Computational psycholinguists predict reaction times to words based on various organisational models:

Morphological family size models (Baayen): ratio of a lexical item to its morphological family size (a larger family co-activates and speeds reaction times).

Cohort and lexical isolation point models (Marslen-Wilson): fast recognition of high-frequency words with low-frequency neighbours (recognition point vs. uniqueness point).

Information residuals (Moscoso del Prado Martin): showed response latencies in visual lexical decision based on the frequency of the word in a corpus and also the entropy of the morphological paradigm.
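The entropy term in the last model can be sketched as follows. The paradigm counts below are invented for illustration, not from Moscoso del Prado Martin's data; the point is only how a paradigm's frequency distribution yields an entropy in bits.

```python
import math

def paradigm_entropy(counts):
    """Shannon entropy (bits) of a morphological paradigm's frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

# Hypothetical corpus counts for the forms of 'fox'.
fox_paradigm = {"fox": 60, "foxes": 30, "foxed": 10}
print(round(paradigm_entropy(fox_paradigm), 3))
```

A paradigm whose forms are used equally often has maximal entropy; one dominated by a single form has entropy near zero, and the model relates this quantity to lexical decision latencies.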

What makes a sentence difficult to process?

Parsing models as predictors for observed patterns in language—Yngve

A sentence is constructed top-down and left-to-right. The model consists of a register for the current node being explored and a stack for all the nodes left to explore. The size of the stack is an approximation to working-memory load. Yngve predicted that sentences which require many items to be placed on the stack would be difficult to process and also less frequent in the language. He also predicted that when multiple parses are possible we should prefer the one with the minimised stack.

What makes a sentence difficult to process?

Parsing models as predictors for observed patterns in language—Yngve

Grammar: S → NP VP; NP → Det N; VP → V NP; Det → the; N → dog; N → cat; V → chased

Stepping through the parse of "the dog chased the cat" (the register holds the current node; the stack holds the nodes left to explore):

Register  Stack
S
NP        VP
Det       N VP
the       N VP
N         VP
dog       VP
VP
V         NP
chased    NP
Det       N
the       N
N
cat

What makes a sentence difficult to process?

Yngve's model makes correct predictions about centre embedding (the stack grows to N VP VP VP). Consider:

This is the malt that the rat that the cat that the dog worried killed ate.

as opposed to:

This is the malt that was eaten by the rat that was killed by the cat that was worried by the dog.

Yngve evaluated his predictions by looking at frequencies of constructions in corpus data.
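The register-and-stack procedure above can be sketched in code. The grammar and traversal details are a minimal reconstruction, and maximum stack depth stands in for Yngve's working-memory load measure:

```python
# Toy grammar in the style of the slides; terminals have no expansion entry.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["Det", "N"],
    "VP": ["V", "NP"],
}

def max_stack_depth(symbol="S"):
    """Top-down, left-to-right expansion: register holds the current node,
    the stack holds right sisters still to be explored."""
    max_depth = 0
    register, stack = symbol, []
    while register is not None or stack:
        if register is None:
            register = stack.pop()          # fetch the next node to explore
            continue
        children = GRAMMAR.get(register)
        if children:                        # non-terminal: expand it
            stack.extend(reversed(children[1:]))  # right sisters wait on the stack
            max_depth = max(max_depth, len(stack))
            register = children[0]          # explore the leftmost child first
        else:                               # terminal category: emit and move on
            register = None
    return max_depth

print(max_stack_depth())  # 2: deepest stack while parsing S -> NP VP, NP -> Det N, VP -> V NP
```

A right-branching sentence keeps this depth small; centre embedding forces many unexplored VPs onto the stack at once, which is exactly Yngve's difficulty prediction.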

What makes a sentence difficult to process?

Dependency Locality Theory—Gibson

Processing cost of integrating a new word is proportional to the distance between the word and the item with which it is integrating. Distance is measured in words plus new phrases and discourse referents. DLT predicts that object relative clauses are harder to process because they have two nouns that appear before any verb:

The girl who likes me went to the party.
The girl who Peter likes went to the party.

What makes a sentence difficult to process?

Dependency Locality Theory—Gibson

DLT can also explain:

The fact that the employee who the manager hired stole office supplies worried the executive.

vs.

The executive who the fact that the employee stole office supplies worried hired the manager.
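A much simplified sketch of the DLT distance count follows. This is an illustration, not Gibson's exact cost function: it counts only intervening discourse referents (here, words tagged N or V), and the tag sequence and indices are assumptions made for the example.

```python
def integration_cost(tags, dep, head):
    """Count discourse referents (N/V tags) strictly between a dependent and its head."""
    lo, hi = sorted((dep, head))
    return sum(1 for t in tags[lo + 1 : hi] if t in ("N", "V"))

# "The girl who Peter likes went to the party" (object relative clause):
tags = ["Det", "N", "Rel", "N", "V", "V", "P", "Det", "N"]
#        The    girl  who   Peter likes went  to   the    party
print(integration_cost(tags, 2, 4))  # 1: 'Peter' intervenes between 'who' and 'likes'
```

In the subject relative ("who likes me") no referent intervenes between "who" and "likes", so the cost is lower, matching the observed processing asymmetry.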

Surprisal as a predictor of reading times in sentence comprehension—Levy

How does the brain perform parse disambiguation?

Surprisal as a measure of lexical and syntactic complexity:

S(wi) = log 1/P(wi)   (1)

Online parsing ambiguity: The student forgot the solution was in the back of the book.

Strong garden path effect: The horse raced past the barn fell.

Online parsing ambiguity: The student forgot the solution was in the back of the book. Strong garden path effect: The horse raced past the barn fell.

.

Paula Buttery (DTAL) Computational Psycholinguistics 34 / 47 How does the brain perform parse disambiguation?

Surprisal as a measure of lexical and syntactic complexity: e.g. S(wi ) = log 1/P(wi ) (1)

Online parsing ambiguity: The student forgot the solution was in the back of the book. Strong garden path effect: The horse raced past the barn fell.

Surprisal as a predictor of reading times in sentence comprehension—Levy.
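The surprisal measure on the slide can be computed in a couple of lines. A minimal sketch, assuming log base 2 (the slide leaves the base unspecified; base 2 gives surprisal in bits):

```python
import math

def surprisal(prob):
    """Surprisal of a word: S(w) = log(1/P(w)) = -log P(w).
    Base 2 is assumed here, so the result is in bits."""
    return -math.log2(prob)

# Predictable words are cheap, surprising words are costly:
print(surprisal(0.5))   # 1.0 bit
print(surprisal(0.01))  # ~6.64 bits
```

A word the comprehender assigns probability 0.5 incurs one bit of surprisal; a word at probability 0.01 incurs over six bits, which is the sense in which garden-path continuations are costly.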

Paula Buttery (DTAL) Computational Psycholinguistics 34 / 47 How does the brain perform parse disambiguation? Hale, 2001

Predictability as a measure of difficulty:
Probabilistic context-free grammars (PCFGs) are a good model of how human sentence comprehension works.
A probabilistic Earley parser is a good model of online, eager sentence comprehension for PCFGs.
The cognitive effort associated with a word in a sentence can be measured by the word's negative log conditional probability: log 1/P(wi | w1...i−1)

Paula Buttery (DTAL) Computational Psycholinguistics 35 / 47 Hale, 2001

Example PCFG:
S → NP VP 1.0
NP → N PP 0.2
NP → N 0.8
PP → P NP 1.0
VP → VP PP 0.1
VP → V VP 0.2
VP → V NP 0.4
VP → V 0.3
N → {it, fish, rivers, December, they} 0.2 each
P → {in} 1.0
V → {can, fish} 0.5 each
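The toy grammar can be written down and sanity-checked in a few lines. A minimal sketch (the representation is mine; the rule probabilities are read straight off the slide):

```python
# The toy PCFG as a Python dict: {lhs: [(rhs, prob), ...]}.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("N", "PP"), 0.2), (("N",), 0.8)],
    "PP": [(("P", "NP"), 1.0)],
    "VP": [(("VP", "PP"), 0.1), (("V", "VP"), 0.2),
           (("V", "NP"), 0.4), (("V",), 0.3)],
    "N":  [(("it",), 0.2), (("fish",), 0.2), (("rivers",), 0.2),
           (("December",), 0.2), (("they",), 0.2)],
    "P":  [(("in",), 1.0)],
    "V":  [(("can",), 0.5), (("fish",), 0.5)],
}

# A PCFG is well-formed only if the probabilities of each
# non-terminal's rules sum to 1:
for lhs, rules in PCFG.items():
    total = sum(p for _, p in rules)
    assert abs(total - 1.0) < 1e-9, (lhs, total)
```

The per-non-terminal sum-to-one check is exactly what licenses reading a tree's probability as the product of its rule probabilities.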

Paula Buttery (DTAL) Computational Psycholinguistics 36 / 47 How does the brain perform parse disambiguation? Hale, 2001

Probabilistic Earley Parsing. Prefix probability: makes use of the fact that the probabilities of all the trees generated by the grammar must sum to 1.

Paula Buttery (DTAL) Computational Psycholinguistics 37 / 47

An Earley chart edge records a dotted rule, its span, and its history:

edge_n   dotted rule      [starting at, waiting at]   history
e0       S → ● NP VP      [0,0]
e_n      S → NP ● VP      [0,1]                       (e_m)
e_z      S → NP VP ●      [0,2]                       (e_y)

Now that we know what the edges look like in the Earley chart, the next question is how they get into the chart (this is called populating the chart). The algorithm proceeds as follows:

Earley is a top-down algorithm, so we initiate the chart with the S rule containing the leftmost dot: at this point we have explored no nodes of this rule.

For each word in the sentence we proceed through the following steps:

Prediction - This step adds new edges to the chart and can be thought of as expanding a node. A non-terminal (tree node) is unexplored if it occurs in any edge with a dot on its LHS. So for all edges in the chart, find any non-terminals with a dot on their LHS and expand them according to the rule set. They will appear with a span of [n,n], where n is the waiting location of the edge that is being expanded.

Scan - This step allows us to check if we have a part-of-speech node that is consistent with the input sentence. We have reached a part-of-speech node to be scanned if we find any non-terminal belonging to the set N_PoS = {N, V, P} ⊂ N with a dot on its LHS. So for all edges in the chart, find any non-terminals in N_PoS with a dot on their LHS and check if the word at the waiting location is consistent with the part-of-speech found. If it is consistent (i.e. the word may adopt that part-of-speech) then we add an edge to the chart. The span of this edge will be from the waiting location of the expanded rule to the location at the end of the word.

(As an aide-memoire, the grammar rules are those of the toy PCFG above; the input is ● they ● can ● fish ● with its numbered locations 0-3.) The chart begins:

edge_n   dotted rule      [starting at, waiting at]   history
e0       S → ● NP VP      [0,0]
e1       NP → ● N         [0,0]
e2       NP → ● N PP      [0,0]
e3       N → they ●       [0,1]
e4       NP → N ●         [0,1]   (e3)
e5       NP → N ● PP      [0,1]   (e3)
e6       S → NP ● VP      [0,1]   (e4)
e7       PP → ● P NP      [1,1]
e8       VP → ● V         [1,1]
e9       VP → ● V NP      [1,1]
e10      VP → ● V VP      [1,1]
e11      VP → ● VP PP     [1,1]
e12      V → can ●        [1,2]
e13      VP → V ●         [1,2]   (e12)
e14      VP → V ● NP      [1,2]   (e12)
e15      VP → V ● VP      [1,2]   (e12)
e16      S → NP VP ●      [0,2]   (e4, e13)
e17      VP → VP ● PP     [1,2]   (e13)
e18      NP → ● N         [2,2]
e19      NP → ● N PP      [2,2]
e20      VP → ● V         [2,2]
e21      VP → ● V NP      [2,2]
e22      VP → ● V VP      [2,2]
e23      VP → ● VP PP     [2,2]
e24      PP → ● P NP      [2,2]
e25      N → fish ●       [2,3]
e26      V → fish ●       [2,3]
e27      NP → N ●         [2,3]   (e25)
e28      NP → N ● PP      [2,3]   (e25)
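The prediction and scan steps above (together with the completion step that closes off finished rules) can be sketched as a minimal, non-probabilistic Earley recogniser for the toy grammar. The representation is my own, not the handout's; edges carry no probabilities here:

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["N", "PP"], ["N"]],
    "PP": [["P", "NP"]],
    "VP": [["VP", "PP"], ["V", "VP"], ["V", "NP"], ["V"]],
}
LEXICON = {"N": {"it", "fish", "rivers", "December", "they"},
           "P": {"in"}, "V": {"can", "fish"}}

def earley(words):
    """Earley recognition. An edge is (lhs, rhs, dot, start);
    chart[i] holds the edges whose waiting location is i."""
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in GRAMMAR["S"]:                      # initiate with S rules
        chart[0].add(("S", tuple(rhs), 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot == len(rhs):                   # Completion
                for (l2, r2, d2, s2) in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, s2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
            else:
                nxt = rhs[dot]
                if nxt in GRAMMAR:                # Prediction: span [i,i]
                    for prod in GRAMMAR[nxt]:
                        new = (nxt, tuple(prod), 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(words) and words[i] in LEXICON.get(nxt, ()):
                    # Scan: the word is consistent with the part of speech
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
    return chart

chart = earley(["they", "can", "fish"])
# The input is grammatical iff a completed S edge spans [0, 3]:
print(any(e[0] == "S" and e[2] == len(e[1]) and e[3] == 0
          for e in chart[3]))
```

Scanning here advances the parent edge directly past the pre-terminal rather than creating a separate N → they ● edge; that is a simplification of the handout's chart, not a change to the algorithm's logic.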

With a PCFG, each edge also carries a probability, accumulated as the chart is populated:

edge_n   dotted rule     [S,W]   hist        Prob / MaxProb
e0       S → ● NP VP     [0,0]               P(S → NP VP) = 1
e1       NP → ● N        [0,0]               P(e0)·P(NP → N) = 1×0.8 = 0.8
e2       NP → ● N PP     [0,0]               P(e0)·P(NP → N PP) = 1×0.2 = 0.2
e3       N → they ●      [0,1]               P(N → they) = 0.2
e4       NP → N ●        [0,1]   (e3)        P(e3)·P(NP → N) = 0.2×0.8 = 0.16
e5       NP → N ● PP     [0,1]   (e3)
e6       S → NP ● VP     [0,1]   (e4)
e7       PP → ● P NP     [1,1]               P(N → they)·P(e2)·P(PP → P NP) = 0.2×0.2×1 = 0.04
e8       VP → ● V        [1,1]               P(N → they)·P(e1)·P(VP → V) = 0.2×0.8×0.3 = 0.048
e9       VP → ● V NP     [1,1]               P(N → they)·P(e1)·P(VP → V NP) = 0.2×0.8×0.4 = 0.064
e10      VP → ● V VP     [1,1]               P(N → they)·P(e1)·P(VP → V VP) = 0.2×0.8×0.2 = 0.032
e11      VP → ● VP PP    [1,1]               P(N → they)·P(e1)·P(VP → VP PP) = 0.2×0.8×0.1 = 0.016
e12      V → can ●       [1,2]               P(V → can) = 0.5
e13      VP → V ●        [1,2]   (e12)       P(e12)·P(VP → V) = 0.5×0.3 = 0.15
e14      VP → V ● NP     [1,2]   (e12)
e15      VP → V ● VP     [1,2]   (e12)
e16      S → NP VP ●     [0,2]   (e4,e13)    P(e4)·P(e13)·P(S → NP VP) = 0.2×0.8×0.5×0.3×1 = 0.024
e17      VP → VP ● PP    [1,2]   (e13)
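Given the prefix probabilities that a probabilistic Earley parser accumulates, Hale's per-word difficulty measure is the drop in log prefix probability from one word to the next. A small sketch; the prefix-probability values in the example are invented for illustration, not taken from the chart:

```python
import math

def surprisals(prefix_probs):
    """Hale (2001): the surprisal incurred at word i is
        S(w_i) = log P(w_1..i-1) - log P(w_1..i),
    in bits here (base 2). prefix_probs[i] is the probability mass of
    all parses compatible with the first i words; prefix_probs[0] is
    1.0 for the empty prefix."""
    return [math.log2(prefix_probs[i - 1] / prefix_probs[i])
            for i in range(1, len(prefix_probs))]

# Illustrative (made-up) prefix probabilities for a three-word input:
print(surprisals([1.0, 0.2, 0.1, 0.05]))  # [~2.32, 1.0, 1.0] bits
```

A garden-path disambiguation point is one where the prefix probability collapses, producing a large surprisal spike at exactly the word where reading times slow down.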

Paula Buttery (DTAL) Computational Psycholinguistics 39 / 47 What makes a sentence difficult to process?

Predicting brain response from a probabilistic parser: We used functional Magnetic Resonance Imaging (fMRI) to monitor brain activation while subjects passively listened to short narratives. The texts were written so as to introduce various syntactic complexities (relative clauses, embedded questions, etc.) not usually found (in such density) in actual corpora. With the use of a computationally implemented probabilistic parser (taken to represent an ideal listener) we calculated a number of temporally dense (one per word) parametric measures reflecting different aspects of the incremental processing of each sentence. We used the resulting measures to model the observed brain activity (BOLD). We were able to identify different brain networks that support incremental linguistic processing and characterize their particular function. Asaf Bachrach

Paula Buttery (DTAL) Computational Psycholinguistics 40 / 47 What makes a sentence difficult to process?

‘Identifying computable functions and their spatiotemporal distribution in the human brain’—Andrew Thwaites et al.

Paula Buttery (DTAL) Computational Psycholinguistics 41 / 47 Modelling acquisition of syntax as a process

state S: {}
state a: {s\np}
state b: {s\np, s/s}
state c: {s\np, (s\np)/np}
state d: {s\np, (s\np)/(s\np)}
state e: {s\np, s/s, (s\np)/np}
state f: {s\np, (s\np)/np, ((s\np)/np)/np}
state g: {s\np, (s\np)/(s\np), s/s}
state h: {s\np, (s\np)/(s\np), (s\np)/np}
state i: {s\np, s/s, (s\np)/np, (s\np)/(s\np)}
state j: {s\np, s/s, (s\np)/np, ((s\np)/np)/np}
state k: {s\np, (s\np)/np, ((s\np)/np)/np, (s\np)/(s\np)}
state l: {s\np, (s\np)/np, ((s\np)/np)/np, (s\np)/(s\np), s/s}
Figure 6.10: The CGL as a Markov structure.

Paula Buttery (DTAL) Computational Psycholinguistics 42 / 47

How is meaning represented in the brain?

Vector space models (VSMs) and semantic priming—Pado and Lapata:
take word pairs from the psychological literature;
compute vector representations for target words and for related and unrelated prime words;
the distance between related prime and target should be smaller than the distance between unrelated prime and target.
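The priming prediction can be illustrated with cosine similarity over co-occurrence vectors. The vectors and word choices below are invented for illustration; Pado and Lapata's actual models are dependency-based and built from corpus data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Made-up co-occurrence counts over three context words:
target    = [4.0, 1.0, 0.0]   # e.g. a target like "doctor"
related   = [3.0, 2.0, 0.0]   # e.g. a related prime like "nurse"
unrelated = [0.0, 1.0, 5.0]   # e.g. an unrelated prime like "bread"

# The priming prediction: the related prime is closer to the target
# (higher cosine similarity means smaller semantic distance).
print(cosine(target, related) > cosine(target, unrelated))  # True
```

If the model's distances mirror human priming effects, the vector space is capturing something about how meaning is organised.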

Paula Buttery (DTAL) Computational Psycholinguistics 43 / 47 How is meaning represented in the brain?

Towards Unrestricted, Large-Scale Acquisition of Feature-Based Conceptual Representations from Corpus Data—Devereux et al.

Paula Buttery (DTAL) Computational Psycholinguistics 44 / 47 Psycholinguistics is concerned with understanding how language is stored and processed in the brain. Computational Psycholinguistics contributes to the field by making predictions using information theory or computational models of language. These predictions are tested through observations or various experimental measurements.

Paula Buttery (DTAL) Computational Psycholinguistics 45 / 47 To find out more:
Harley, T. (2001) The Psychology of Language: From Data to Theory.
Berko, J. (1958) The Child's Learning of English Morphology.
Marslen-Wilson, W. & Welsh, A. (1978) Processing interactions and lexical access during word recognition in continuous speech.
Pickering, M.J. & Garrod, S. (2004) The interactive-alignment model.
Jaeger, T. (2010) Redundancy and Reduction: Speakers Manage Syntactic Information Density.
de Jong, N., Schreuder, R. & Baayen, H. (2000) The morphological family size effect and morphology.
Moscoso del Prado Martín, F., Kostic, A. & Baayen, H. (2004) Putting the bits together: An information theoretical perspective on morphological processing.

Paula Buttery (DTAL) Computational Psycholinguistics 46 / 47
Gibson, E. (2000) The Dependency Locality Theory: A distance-based theory of linguistic complexity.
Hale, J. (2001) A Probabilistic Earley Parser as a Psycholinguistic Model.
Levy, R. (2008) Expectation-Based Syntactic Comprehension.
Bachrach, A. (2008) Imaging Neural Correlates of Syntactic Complexity in a Naturalistic Context.
Pado, S. & Lapata, M. (2007) Dependency-based Construction of Semantic Space Models.
Devereux, B. et al. (2013) The Centre for Speech, Language and the Brain (CSLB) concept property norms.

Paula Buttery (DTAL) Computational Psycholinguistics 47 / 47