Representation Learning: What is it and how do you teach it?
Clayton Greenberg
Spoken Language Systems Group (LSV) Graduate School of Computer Science (GSCS) Saarland Informatics Campus, Saarland University (UdS)
May 8, 2017
Who are you?
“I . . . I hardly know, sir, just at present . . . at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then.”
Representation learning in a neural network
x_t is the set of features about you that is given to the system. A is where other sources of information are incorporated.
ht is the task-specific output.
Representation learning is how the computer decides on the x_t → A arrow.
Some natural language processing (NLP) applications
• Speech Recognition
• Machine Translation
• Information Retrieval
The disciplines of NLP
• Computer Science
• Mathematics
• Linguistics
Educational objectives for representation learning
CS Students can collect high-quality data in an organized fashion.
M Students can extrapolate from exploring representative samples.
L Students can recognize and describe useful patterns in data.
Students can build appropriate models of data.
Students can apply their models to relevant tasks.
Outline
1 Characters and words: SWORDSS
2 Words and events: thematic fit modeling
3 General teaching philosophy
Outline
1 Characters and words: SWORDSS
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
2 Words and events: thematic fit modeling
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
3 General teaching philosophy
Characters and words: SWORDSS | The Collect objective | Limited data: unbirthday
This is one of very few occurrences of the word “unbirthday”.
• It is especially difficult to learn a good representation for a rare word.
• Distributional hypothesis: the context of the word captures its meaning.
• Fewer occurrences ⇒ fewer contexts ⇒ less meaning (see the sketch below).
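To make the distributional hypothesis concrete, here is a minimal count-based sketch (the toy corpus and function names are my own, not from the talk): a word seen only once collects contexts from just that one occurrence.

    from collections import Counter, defaultdict

    def context_vectors(corpus, window=2):
        # Count, for each word, the neighbouring words within +/- window.
        vectors = defaultdict(Counter)
        for sentence in corpus:
            for i, word in enumerate(sentence):
                for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                    if j != i:
                        vectors[word][sentence[j]] += 1
        return vectors

    corpus = [
        "a very merry unbirthday to you".split(),
        "happy birthday to you".split(),
        "how old are you".split(),
    ]
    vecs = context_vectors(corpus)
    print(vecs["unbirthday"])  # contexts from a single occurrence
    print(vecs["you"])         # contexts pooled over three occurrences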
Characters and words: SWORDSS | The Collect objective | Sub-word units
• unbirthday ≈ un + birthday
• Linguists would separate the word into its morphemes
• Morphological analysis is high precision, but also expensive
• Data that can be collected automatically: subword units of fixed length
• unbirthday = {unb, nbi, bir, irt, rth, thd, hda, day} (see the sketch below)
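A sketch of how these fixed-length subword units can be collected automatically (plain Python, function name is mine):

    def char_ngrams(word, n=3):
        # All character n-grams of fixed length n.
        return {word[i:i + n] for i in range(len(word) - n + 1)}

    print(sorted(char_ngrams("unbirthday")))
    # ['bir', 'day', 'hda', 'irt', 'nbi', 'rth', 'thd', 'unb']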
Characters and words: SWORDSS | The Extrapolate objective | Rare words are common
Language   Vocabulary   Rare Words   Representation Not Found
German     37k          16k          13k
Tagalog    22k          11k          8k
Turkish    25k          14k          10k
• Roughly half of the words in the vocabulary are rare.
• Most rare words fail to receive a representation.
• We extrapolate that humans must have backoff strategies for rare words.
Characters and words: SWORDSS | The Recognize objective | Form and function
Which is the Jabberwocky and which is the Bandersnatch?
We recognize universal form-function correspondences.
Characters and words: SWORDSS | The Build objective | SWORDSS model: better rare word embeddings
[Workflow diagram: pre-trained lexicon embeddings → induce enhanced embeddings → NLP tasks (word similarity/relatedness, language modelling)]
Workflow for applying enhanced word embeddings (Singh et al., 2016).
• Builds better embeddings in 4 steps: map, index, search, and combine.
• At the end of the search step: obtain a list W of words w that overlap the target t by at least 3 characters.
• Combine step: v_t = Σ_{w ∈ W} StringSimilarity(t, w) · v_w (see the sketch below)
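A minimal sketch of the combine step. I use difflib's SequenceMatcher ratio as a stand-in for StringSimilarity; the measure actually used by Singh et al. (2016) may be defined differently.

    import numpy as np
    from difflib import SequenceMatcher

    def combine(target, overlap_words, embeddings):
        # v_t = sum over w in W of StringSimilarity(t, w) * v_w
        v_t = np.zeros(len(next(iter(embeddings.values()))))
        for w in overlap_words:
            sim = SequenceMatcher(None, target, w).ratio()  # stand-in similarity
            v_t += sim * embeddings[w]
        return v_t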
Characters and words: SWORDSS | The Apply objective | SWORDSS Results
Applied the enhanced representations to two specialized tasks:
1 Rare word similarity task
• Ask humans to rate how similar two words are on a scale from 1 to 10.
• Compute the cosine of the angle between the two representations.
• Test the correlation between the ratings and cosines (see the sketch below).
• Achieved near state-of-the-art (Spearman’s ρ = 0.51) despite no hard-coded morphological information.
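A sketch of that evaluation loop (the names are mine; assumes scipy is available): one cosine per word pair, then Spearman's ρ against the human ratings.

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def similarity_correlation(pairs, human_ratings, embeddings):
        # One cosine per word pair, correlated with the human ratings.
        cosines = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
        rho, _ = spearmanr(human_ratings, cosines)
        return rho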
2 Rare word perplexity task
• Incorporate the new representations into a language model.
• Compute surprisals for each rare word in a corpus.
• Perplexity is the exponential of the average surprisal value (see the sketch below).
• Outperformed many high-performing language model architectures.
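The perplexity computation itself is small enough to sketch (function name is mine):

    import math

    def perplexity(log_probs):
        # Surprisal of a word is -log p(word | context); perplexity is
        # the exponential of the average surprisal.
        avg_surprisal = sum(-lp for lp in log_probs) / len(log_probs)
        return math.exp(avg_surprisal)

    print(perplexity([math.log(0.1)] * 5))  # uniform 1/10 guesses -> 10.0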
Outline
1 Characters and words: SWORDSS
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
2 Words and events: thematic fit modeling
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
3 General teaching philosophy
Words and events: thematic fit modeling | The Collect objective | McRae et al. (1997) procedure for patients
On a scale from 1 (not common) to 7 (very common), how common is it for
• soccer
• croquet
• the harpsichord
• cheese
to be played?
Words and events: thematic fit modeling | The Collect objective | Linear modeling results
[Coefficient plot: estimates for (Intercept), logV, logN, logP_v, logP_n; x-axis from −1 to 5]
Using log polysemy and frequency to predict thematic fit on the McRaeNN dataset.
Words and events: thematic fit modeling | The Collect objective | Greenberg et al. (2015a) procedure for patients
On a scale from 1 (strongly disagree) to 7 (strongly agree), how much do you agree that
• soccer
• croquet
• the harpsichord
• cheese
is something that is played?
Words and events: thematic fit modeling | The Collect objective | Linear modeling results
[Coefficient plot: estimates for (Intercept), logV, logN, logP_v, logP_n; x-axis from 0 to 4]
Confounds become insignificant due to less biased data collection.
Words and events: thematic fit modeling | The Extrapolate objective | ANOVA results: Polysemy-Fit interaction
[Interaction plot: mean thematic fit rating (y-axis, 2 to 5) for MONOSEMOUS vs. POLYSEMOUS verbs across Bad and Good fit conditions]
Polysemy and Fit effects on thematic fit scores from Greenberg et al. (2015a).
We extrapolate that unequal sense frequencies harm word representations.
Words and events: thematic fit modeling | The Extrapolate objective | Comparing means of conditions
[Bar plot: mean thematic fit rating (y-axis, 0 to 5) for conditions MONO.GOOD, POLY.SENSE1, POLY.SENSE2, POLY.BAD, MONO.BAD]
Extrapolate: having one sense that fits is much better than having none.
Extrapolate: fitting the most preferred sense is a little better than that.
Extrapolate: polysemy makes judgements less extreme.
Words and events: thematic fit modeling | The Extrapolate objective | Linear mixed effects modeling results
[Coefficient plot: estimates for (Intercept), POLY.SENSE1, POLY.SENSE2, POLY.BAD, MONO.BAD, logV, logN; x-axis from −4 to 6]
Extrapolate: sense frequency distinctions look much less important. Extrapolate: the softening effect of polysemy is weaker for bad fillers.
Words and events: thematic fit modeling | The Recognize objective | An “instrument” example
Alice hit the hedgehog with ___ (candidate instruments: a gavel, a flamingo, pink quills, the queen)
Words and events: thematic fit modeling | The Recognize objective | Instrument thematic fit judgements
Ferretti et al. (2001): “[On a scale from 1 to 7, h]ow common is it to use each of the following to perform the action of eating?”
cup        3.3
fork       6.7
knife      6.3
napkin     3.8
pliers     1.0
spoon      6.3
toothpick  2.1
We recognize that a verb may need a separate representation for each role.
Words and events: thematic fit modeling | The Build objective | The existing method, step 1: count dependencies
Count verb-role-filler triples and adjust the counts by local mutual information (LMI):
LMI(V, R, F) = O_VRF · log(O_VRF / E_VRF)
where O_VRF is the observed count of the triple and E_VRF its expected count (see the sketch below).
[Dependency parse of “the hedgehog was hit by Alice”, with arcs SBJ, NMOD, VC, LGS, PMOD]
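A sketch of step 1 (names are mine; the expected count E_VRF is estimated here under an independence assumption, which may differ from TypeDM's exact estimator):

    import math
    from collections import Counter

    def lmi_scores(triples):
        # LMI(V, R, F) = O_VRF * log(O_VRF / E_VRF)
        O = Counter(triples)
        N = sum(O.values())
        c_v, c_r, c_f = Counter(), Counter(), Counter()
        for (v, r, f), c in O.items():
            c_v[v] += c
            c_r[r] += c
            c_f[f] += c
        # Expected count if verb, role and filler were independent.
        return {(v, r, f): o * math.log(o / (c_v[v] * c_r[r] * c_f[f] / N ** 2))
                for (v, r, f), o in O.items()}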
Words and events: thematic fit modeling | The Build objective | The existing method, step 2: compute centroid from top 20
Query the top 20 highest-scoring fillers and compute their centroid (see the sketch after the figure).
[Vector-space diagram: fillers friend, gusto, family, hand, knife, fork, spoon and their centroid; verb “eat”, “with”-prepositional object]
The most typical with-PP arguments of the verb “eat” according to TypeDM.
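A sketch of step 2, assuming the LMI scores from step 1 and a lookup table of filler vectors (names are mine):

    import numpy as np

    def centroid_prototype(verb, role, lmi, embeddings, top_n=20):
        # Rank the fillers of (verb, role) by LMI, average the top-n vectors.
        fillers = sorted((f for (v, r, f) in lmi if v == verb and r == role),
                         key=lambda f: lmi[(verb, role, f)], reverse=True)[:top_n]
        return np.mean([embeddings[f] for f in fillers], axis=0)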
Words and events: thematic fit modeling | The Build objective | The existing method, step 3: calculate thematic fit score
Return the cosine similarity of the test role-filler and the centroid (see the sketch after the figure).
[Vector-space diagram, verb “eat”, “with”-prepositional object: nurse vs. centroid, score = 0.33; fork vs. centroid, score = 0.30]
Sample thematic fit scores using the Baroni and Lenci (2010) method.
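And a sketch of step 3, completing the pipeline (names are mine):

    import numpy as np

    def thematic_fit(filler_vec, prototype):
        # Cosine between the candidate filler and the role prototype.
        return float(np.dot(filler_vec, prototype) /
                     (np.linalg.norm(filler_vec) * np.linalg.norm(prototype)))

    # e.g. thematic_fit(embeddings["nurse"], proto) vs.
    #      thematic_fit(embeddings["fork"], proto)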
Words and events: thematic fit modeling | The Build objective | A new vector-space framework for thematic fit modeling
Key idea: build a model in which each verb-role has multiple prototypes (vectors). Use only the closest prototype to determine the thematic fit score.
Words and events: thematic fit modeling | The Build objective | The Centroid method
[Vector-space diagram: fillers friend, gusto, family, hand, knife, fork, spoon collapsed into a single prototype; verb “eat”, “with”-prepositional object]
Illustration of the Centroid method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM.
Words and events: thematic fit modeling | The Build objective | The OneBest method
[Vector-space diagram: fillers friend, gusto, family, hand, knife, fork, spoon; verb “eat”, “with”-prepositional object]
Illustration of the OneBest method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM.
Words and events: thematic fit modeling | The Build objective | The kClusters method
[Vector-space diagram: fillers friend, gusto, family, hand, knife, fork, spoon grouped into three clusters, each with its own prototype; verb “eat”, “with”-prepositional object]
Illustration of the kClusters method for prototype generation, using the most typical with-PP arguments of the verb “eat” according to TypeDM (Greenberg et al., 2015b).
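A sketch of the kClusters idea using k-means (scikit-learn); Greenberg et al. (2015b) may use a different clustering algorithm or way of choosing k. With k = 1 this reduces to the Centroid method.

    import numpy as np
    from sklearn.cluster import KMeans

    def kclusters_fit(filler_vec, top_filler_vecs, k=3):
        # Cluster the top fillers into k prototypes, then score against
        # the closest prototype only.
        km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(top_filler_vecs))
        v = filler_vec / np.linalg.norm(filler_vec)
        return max(float(np.dot(v, p / np.linalg.norm(p)))
                   for p in km.cluster_centers_)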
Words and events: thematic fit modeling | The Apply objective | Greenberg et al. (2015a) dataset: results by verb type
Method     POLYSEMOUS   MONOSEMOUS
Centroid   0.43         0.66
OneBest    0.47         0.66
kClusters  0.47         0.68
Correlation between human judgements from the Greenberg et al. (2015a) dataset (patients) and automatic scores using LMIs from TypeDM, by prototype generation method and verb type.
The kClusters method correlates best with the human ratings.
Outline
1 Characters and words: SWORDSS
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
2 Words and events: thematic fit modeling
  The Collect objective
  The Extrapolate objective
  The Recognize objective
  The Build objective
  The Apply objective
3 General teaching philosophy
General teaching philosophy | Interactive Lectures
My lectures are interactive, energetic, and dynamic, with built-in exercises and activities.
General teaching philosophy | Green statements
CS Students can collect high-quality data in an organized fashion.
M Students can extrapolate from exploring representative samples.
L Students can recognize and describe useful patterns in data.
Students can build appropriate models of data.
Students can apply their models to relevant tasks.
We recite short definitions of the most important terms in class.
General teaching philosophy | Class discussions
Method     POLYSEMOUS   MONOSEMOUS
Centroid   0.43         0.66
OneBest    0.47         0.66
kClusters  0.47         0.68
Correlation between human judgements from the Greenberg et al. (2015a) dataset (patients) and automatic scores using LMIs from TypeDM, by prototype generation method and verb type.
After laying the foundation, we discuss advanced concepts in class.
General teaching philosophy | Quick feedback
Which is the Jabberwocky and which is the Bandersnatch?
Receiving feedback as soon as possible after an activity reinforces learning.
General teaching philosophy | Math has a clear punchline
• Builds better embeddings in 4 steps: map, index, search, and combine.
• At the end of the search step: obtain a list W of words w that overlap the target t by at least 3 characters.
• Combine step: v_t = Σ_{w ∈ W} StringSimilarity(t, w) · v_w
Count verb-role-filler triples and adjust the counts by local mutual information (LMI):
LMI(V, R, F) = O_VRF · log(O_VRF / E_VRF)
Sometimes there is no replacement for board work.
General teaching philosophy | Data Science at UCSD
• Computer Science
• Mathematics
• Cognitive Science
General teaching philosophy | Data Science: a map of the field
A map of the data science field from the European Data Science Summer School, Saarland University
General teaching philosophy | References
Greenberg, C., Demberg, V., and Sayeed, A. (2015a). Verb polysemy and frequency effects in thematic fit modeling. In Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics, pages 48–57, Denver, Colorado. Association for Computational Linguistics.

Greenberg, C., Sayeed, A., and Demberg, V. (2015b). Improving unsupervised vector-space thematic fit evaluation via role-filler prototype clustering. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 21–31, Denver, Colorado. Association for Computational Linguistics.

Singh, M., Greenberg, C., Oualil, Y., and Klakow, D. (2016). Sub-word similarity based search for embeddings: Inducing rare-word embeddings for word similarity tasks and language modelling. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2061–2070, Osaka, Japan. The COLING 2016 Organizing Committee.