
Representation Learning: What is it and how do you teach it?

Clayton Greenberg

Spoken Language Systems Group (LSV), Graduate School of Computer Science (GSCS), Saarland Informatics Campus, Saarland University (UdS)

May 8, 2017

Who are you?

“I . . . I hardly know, sir, just at present . . . at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then.”

Representation learning in a neural network

x_t is the set of features about you that is given to the system. A is where other sources of information are incorporated.

h_t is the task-specific output.

Representation learning is how the computer decides on the x_t-to-A arrow.
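As a concrete (if simplified) sketch, the x_t-to-A arrow can be thought of as learned weights in a recurrent cell. The tanh cell and all names below are illustrative assumptions, not the specific architecture on the slide:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # The learned matrix W_xh is the "x_t to A arrow": it maps raw input
    # features into the network's internal representation. W_hh carries
    # the other incorporated information (the previous state).
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)
```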

Some natural language processing (NLP) applications

• Speech Recognition
• Machine Translation
• Information Retrieval

The disciplines of NLP

• Computer Science
• Mathematics
• Linguistics

Educational objectives for representation learning

CS: Students can collect high-quality data in an organized fashion.

M: Students can extrapolate from exploring representative samples.

L: Students can recognize and describe useful patterns in data.

Students can build appropriate models of data.

Students can apply their models to relevant tasks.

Outline

1 Characters and words: SWORDSS

2 Words and events: thematic fit modeling

3 General teaching philosophy

Characters and words: SWORDSS

The Collect objective: Limited data

This is one of very few occurrences of the word “unbirthday”.

• It is especially difficult to learn a good representation for a rare word.
• Distributional hypothesis: the context of the word captures its meaning.
• Fewer occurrences ⇒ fewer contexts ⇒ less meaning

The Collect objective: Sub-word units

• unbirthday ≈ un + birthday

• Linguists would separate the word into its morphemes

• Morphological analysis is high precision, but also expensive

• Data that can be collected automatically: subword units of fixed length

• unbirthday = {unb, nbi, bir, irt, rth, thd, hda, day}
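A minimal sketch of this collection step: fixed-length character n-grams, gathered automatically. The function name is ours:

```python
def subword_units(word, n=3):
    """All character n-grams of fixed length n from a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

print(sorted(subword_units("unbirthday")))
# ['bir', 'day', 'hda', 'irt', 'nbi', 'rth', 'thd', 'unb']
```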

The Extrapolate objective: Rare words are common

Language   Vocabulary   Rare Words   Representation Not Found
German     37k          16k          13k
Tagalog    22k          11k          8k
Turkish    25k          14k          10k

• Roughly half of the words in the vocabulary are rare.

• Most rare words fail to receive a representation.

• We extrapolate that humans must have backoff strategies for rare words.

The Recognize objective: Form and function

[Two creature pictures] Which is the Jabberwocky and which is the Bandersnatch?

We recognize universal form-function correspondences.

The Build objective: SWORDSS model: better rare word embeddings

[Workflow diagram: pre-trained lexicon embeddings are used to induce enhanced embeddings, which feed NLP tasks such as word similarity/relatedness and language modelling.]

Workflow for applying enhanced word embeddings (Singh et al., 2016).

• Builds better embeddings in 4 steps: map, index, search, and combine.
• At the end of the search step: obtain a list W of words w overlapping the target t by at least 3 characters.
• Combine step: $v_t = \sum_{w \in W} \mathrm{StringSimilarity}(t, w) \times v_w$
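A sketch of the combine step under stated assumptions: SequenceMatcher stands in for the paper's StringSimilarity measure, and the variable names are ours:

```python
import numpy as np
from difflib import SequenceMatcher

def string_similarity(t, w):
    # Stand-in similarity; Singh et al. (2016) define their own measure.
    return SequenceMatcher(None, t, w).ratio()

def induce_embedding(target, W, embeddings):
    """Combine step: v_t = sum over w in W of StringSimilarity(t, w) * v_w."""
    dim = len(next(iter(embeddings.values())))
    v_t = np.zeros(dim)
    for w in W:
        v_t += string_similarity(target, w) * embeddings[w]
    return v_t
```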

The Apply objective: SWORDSS Results

Applied the enhanced representations to two specialized tasks (a sketch of the evaluation computations follows the list):

1 Rare word similarity task
• Ask humans to rate how similar two words are on a scale from 1 to 10.
• Compute the cosine of the angle between the two representations.
• Test the correlation between the ratings and cosines.
• Achieved near state-of-the-art (Spearman’s ρ = 0.51) despite no hard-coded morphological information.

2 Rare word perplexity task
• Incorporate the new representations into a language model.
• Compute surprisals for each rare word in a corpus.
• Perplexity is the exponential of the average surprisal value.
• Outperformed many high-performing language model architectures.
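A minimal sketch of both evaluation computations, assuming NumPy/SciPy; the variable names are illustrative:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine of the angle between two word representations."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def perplexity(surprisals):
    """Exponential of the average surprisal (surprisal = -log p)."""
    return float(np.exp(np.mean(surprisals)))

# Task 1: correlate human ratings with cosines over word pairs, e.g.
# rho, _ = spearmanr(human_ratings, [cosine(u, v) for u, v in pairs])
```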

Words and events: thematic fit modeling

The Collect objective: McRae et al. (1997) procedure for patients

On a scale from 1 (not common) to 7 (very common), how common is it for
• soccer
• croquet
• the harpsichord
• cheese
to be played?

The Collect objective: Linear modeling results

[Coefficient plot: estimates for (Intercept), logV, logN, logP_v, logP_n on a scale from −1 to 5.]

Using log polysemy and frequency to predict thematic fit on the McRaeNN dataset.
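A sketch of such a linear model, with a hypothetical file and column names (logV/logN for log frequency and logP_v/logP_n for log polysemy of the verb and noun):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset of human thematic fit ratings with predictors.
df = pd.read_csv("mcrae_nn_ratings.csv")
model = smf.ols("rating ~ logV + logN + logP_v + logP_n", data=df).fit()
print(model.summary())
```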

The Collect objective: Greenberg et al. (2015a) procedure for patients

On a scale from 1 (strongly disagree) to 7 (strongly agree), how much do you agree that
• soccer
• croquet
• the harpsichord
• cheese
is something that is played?

The Collect objective: Linear modeling results

[Coefficient plot: estimates for (Intercept), logV, logN, logP_v, logP_n on a scale from 0 to 4.]

Confounds become insignificant due to less biased data collection.

The Extrapolate objective: ANOVA results: Polysemy-Fit interaction

[Interaction plot: mean thematic fit rating (y-axis, roughly 2–5) for MONOSEMOUS and POLYSEMOUS verbs across Bad and Good Fit conditions.]

Polysemy and Fit effects on thematic fit scores from Greenberg et al. (2015a)

We extrapolate that unequal sense frequencies harm word representations.

The Extrapolate objective: Comparing means of conditions

[Bar plot: mean thematic fit rating (0–5) by condition: MONO.GOOD, POLY.SENSE1, POLY.SENSE2, POLY.BAD, MONO.BAD.]

Extrapolate: having one sense that fits is much better than having none.
Extrapolate: fitting the most preferred sense is a little better than that.
Extrapolate: polysemy makes judgements less extreme.

The Extrapolate objective: Linear mixed effects modeling results

[Coefficient plot: estimates for (Intercept), POLY.SENSE1, POLY.SENSE2, POLY.BAD, MONO.BAD, logV, logN on a scale from −4 to 6.]

Extrapolate: sense frequency distinctions look much less important.
Extrapolate: the softening effect of polysemy is weaker for bad fillers.
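A sketch of a comparable linear mixed effects model, again with hypothetical file and column names; the grouping factor (a random intercept per participant) is our assumption:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("greenberg_2015a_ratings.csv")  # hypothetical file
# Condition dummies (POLY.SENSE1, ...) plus frequency controls as fixed
# effects; a random intercept per participant as the grouping structure.
m = smf.mixedlm("rating ~ condition + logV + logN",
                data=df, groups=df["participant"]).fit()
print(m.summary())
```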

The Recognize objective: An “instrument” example

[Illustration built from the phrases: “a gavel”, “hit the hedgehog with a flamingo”, “pink quills”, “the queen”.]

The Recognize objective: Instrument thematic fit judgements

Ferretti et al. (2001): “[On a scale from 1 to 7, h]ow common is it to use each of the following to perform the action of eating?”

cup        3.3
fork       6.7
knife      6.3
napkin     3.8
pliers     1.0
spoon      6.3
toothpick  2.1

We recognize that a verb may need a separate representation for each role.

The Build objective: The existing method, step 1: count dependencies

Count verb-role-filler triples & adjust counts by local mutual information (LMI).

$\mathrm{LMI}(V, R, F) = O_{VRF} \log \dfrac{O_{VRF}}{E_{VRF}}$

[Dependency parse of “the hedgehog was hit by Alice”, with arcs labelled NMOD, SBJ, VC, LGS, PMOD.]
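A sketch of the LMI adjustment. The expected count E_VRF from independence of the verb, role, and filler marginals is our assumption, and the triple representation is illustrative:

```python
import math
from collections import Counter

def lmi(counts: Counter, v, r, f):
    """LMI(V,R,F) = O_VRF * log(O_VRF / E_VRF) over (verb, role, filler) counts."""
    total = sum(counts.values())
    o = counts[(v, r, f)]
    if o == 0:
        return 0.0
    c_v = sum(c for (vv, _, _), c in counts.items() if vv == v)
    c_r = sum(c for (_, rr, _), c in counts.items() if rr == r)
    c_f = sum(c for (_, _, ff), c in counts.items() if ff == f)
    e = c_v * c_r * c_f / total ** 2  # expected count under independence
    return o * math.log(o / e)
```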

The Build objective: The existing method, step 2: compute centroid from top 20

Query the top 20 highest scoring fillers and compute the centroid.

[Vector-space plot: fillers friend, gusto, family, hand, knife, fork, spoon with their centroid marked. Verb “eat”, “with”-prepositional object.]

The most typical with-PP arguments of the verb “eat” according to TypeDM.

The Build objective: The existing method, step 3: calculate thematic fit score

Return cosine similarity of test role-filler and centroid.

[Vector-space plot: test fillers “nurse” (score = 0.33) and “fork” (score = 0.30) relative to the centroid. Verb “eat”, “with”-prepositional object.]

Sample thematic fit scores using the Baroni and Lenci (2010) method.
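Steps 2 and 3 as a minimal sketch (NumPy; the function and argument names are ours):

```python
import numpy as np

def thematic_fit(test_vec, filler_lmis, embeddings, top_n=20):
    """Step 2: centroid of the top-N fillers by LMI score.
    Step 3: cosine similarity of the test role-filler and the centroid."""
    top = sorted(filler_lmis, key=filler_lmis.get, reverse=True)[:top_n]
    centroid = np.mean([embeddings[f] for f in top], axis=0)
    return float(test_vec @ centroid /
                 (np.linalg.norm(test_vec) * np.linalg.norm(centroid)))
```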

The Build objective: A new vector-space framework for thematic fit modeling

Key idea: Build a model in which each verb-role has multiple prototypes (vectors). Use only the closest prototype to determine the thematic fit score.

The Build objective: The Centroid method

[Vector-space plot: a single prototype for the fillers friend, gusto, family, hand, knife, fork, spoon. Verb “eat”, “with”-prepositional object.]

Illustration of the Centroid method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM.

The Build objective: The OneBest method

[Vector-space plot: the fillers friend, gusto, family, hand, knife, fork, spoon, with no single centroid. Verb “eat”, “with”-prepositional object.]

Illustration of the OneBest method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM.

The Build objective: The kClusters method

[Vector-space plot: the fillers friend, gusto, family, hand, knife, fork, spoon grouped into three clusters, each with its own prototype. Verb “eat”, “with”-prepositional object.]

Illustration of the kClusters method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM (Greenberg et al., 2015b).
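A sketch of the kClusters idea under stated assumptions: scikit-learn's KMeans as the clustering step and cosine as the closeness measure; Greenberg et al. (2015b) may differ in those details:

```python
import numpy as np
from sklearn.cluster import KMeans

def kclusters_fit(test_vec, filler_vecs, k=3):
    """Cluster the typical fillers into k prototypes, then score the
    test filler against the closest prototype only (the key idea above)."""
    km = KMeans(n_clusters=k, n_init=10).fit(np.asarray(filler_vecs))
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(cos(test_vec, c) for c in km.cluster_centers_)
```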

The Apply objective: Greenberg et al. (2015a) dataset: results by verb type

Method      POLYSEMOUS   MONOSEMOUS
Centroid    0.43         0.66
OneBest     0.47         0.66
kClusters   0.47         0.68

Correlation between human judgements from the Greenberg et al. (2015a) dataset (patients) and automatic scores using LMIs from TypeDM, by prototype generation method and verb type.

Applying the kClusters method correlates best with the human ratings.

General teaching philosophy

Interactive Lectures

My lectures are interactive, energetic, and dynamic, with built-in exercises and activities.

Green statements

CS: Students can collect high-quality data in an organized fashion.

M: Students can extrapolate from exploring representative samples.

L: Students can recognize and describe useful patterns in data.

Students can build appropriate models of data.

Students can apply their models to relevant tasks.

We recite short definitions of the most important terms in class.

Class discussions

Method      POLYSEMOUS   MONOSEMOUS
Centroid    0.43         0.66
OneBest     0.47         0.66
kClusters   0.47         0.68

Correlation between human judgements from the Greenberg et al. (2015a) dataset (patients) and automatic scores using LMIs from TypeDM, by prototype generation method and verb type.

After laying the foundation, we discuss advanced concepts in class.

Quick feedback

Which is the Jabberwocky and which is the Bandersnatch?

Receiving feedback as soon as possible after an activity reinforces learning.

Math has a clear punchline

• Builds better embeddings in 4 steps: map, index, search, and combine.
• At the end of the search step: obtain a list W of words w overlapping the target t by at least 3 characters.
• Combine step: $v_t = \sum_{w \in W} \mathrm{StringSimilarity}(t, w) \times v_w$

Count verb-role-filler triples & adjust counts by local mutual information (LMI):

$\mathrm{LMI}(V, R, F) = O_{VRF} \log \dfrac{O_{VRF}}{E_{VRF}}$

Sometimes there is no replacement for board work.

Data Science at UCSD

• Computer Science
• Mathematics
• Cognitive Science

Data Science: a map of the field

[A map of the data science field from the European Data Science Summer School, Saarland University.]

References

Greenberg, C., Demberg, V., and Sayeed, A. (2015a). Verb polysemy and frequency effects in thematic fit modeling. In Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics, pages 48–57, Denver, Colorado. Association for Computational Linguistics.

Greenberg, C., Sayeed, A., and Demberg, V. (2015b). Improving unsupervised vector-space thematic fit evaluation via role-filler prototype clustering. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 21–31, Denver, Colorado. Association for Computational Linguistics.

Singh, M., Greenberg, C., Oualil, Y., and Klakow, D. (2016). Sub-word similarity based search for embeddings: Inducing rare-word embeddings for word similarity tasks and language modelling. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2061–2070, Osaka, Japan. The COLING 2016 Organizing Committee.
