<<

Zaidan & Eisner – Modeling ModelingAnnotators Annotators: Zaidan & Eisner – Modeling Annotators A Generative Approach to Annotators are useful. Learning from Annotator Rationales They provide us with data sets that we use everyday in the context of statistical learning. Omar F. Zaidan Jason Eisner Annotators are smart! ☺ The Center for Language We achieve good performance using such data in and Speech Processing many NLP tasks. Johns Hopkins University Annotators are underutilized… EMNLP 2008 – Honolulu, HI A lot of thought goes into annotation, but little of that is th Saturday October 25 , 2008 captured.

{ozaidan |jason} @cs.jhu.edu

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Annotators are Underutilized? Annotators are Underutilized! م اب ,Hey annotator, Armageddon “Hey annotator“ ها ا ار ه آر . ا، إاج is this review This disaster flick is a disaster alright. Directed by is this review Tony Scott (Top Gun), it's the story of an asteroid the ت ( ج ب ) ، وي آ positive or size of Texas caught on a collision course with Earth. positive or و س ام ة ار . negative?” After a great opening, in which an American negative?” ا را، ء أ، إ إ spaceship, plus NYC, are completely destroyed by a رك، ا اآ و . comet shower, NASA detects said asteroid and go ن ا ا ( وس و ) ، :Annotator says: into a frenzy. They hire the world's best oil driller Annotator says (Bruce Willis), and send him and his crew up into و و إ اء ح ا ه. .space to fix our global problem ه ارة واآ و و ي The action scenes are over the top and too ludicrous آت، إ وت أ، و for words. So much so, I had to sigh and hit my head و ات . و أ ن رؤ ب with my notebook a couple of times. Also, to see a رن آا ه اه . ا ا wonderful actor like in a film like (y = –1) this is a waste of his talents. The only real reason for (y = –1) ها ا ه و اق د اآ making this film was to somehow out-perform Deep . ا، م اب . ??Useful! Impact. Bottom line is, Armageddon is a failure. Useful

1 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Annotators are Underutilized! Annotators are Underutilized! Hey annotator, Armageddon“ م اب ,Hey annotator“ is this review This disaster flick is a disaster alright. Directed by ها ا ار ه آر . ا، إاج is this review Tony Scott (Top Gun), it's the story of an asteroid the .positive or size of Texas caught on a collision course with Earth ت ( ج ب ) ، وي آ positive or و س ام ة ار . negative?” negative?” After a great opening, in which an American spaceship, plus NYC, are completely destroyed by a ا را، ء أ، إ إ comet shower, NASA detects said asteroid and go رك، ا اآ و . Annotator says: into a frenzy. They hire the world's best oil driller ن ا ا ( وس و ) ، :Annotator says (Bruce Willis), and send him and his crew up into .space to fix our global problem و و إ اء ح ا ه. The action scenes are over the top and too ludicrous ه ارة واآ و و ي for words. So much so, I had to sigh and hit my head آت، إ وت أ، و with my notebook a couple of times. Also, to see a و ات . و أ ن رؤ ب wonderful actor like Billy Bob Thornton in a film like رن آا ه اه . ا ا (y = –1) (y = –1) this is a waste of his talents. The only real reason for making this film was to somehow out-perform Deep ها ا ه و اق د اآ .Useful! Impact. Bottom line is, Armageddon is a failure . ا، م اب . !Useful

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Why would Rationales be Useful? Related Work • “Annotators are underutilized . Let’s ask them to do more than just annotate class.” • Let’s ask annotator to identify relevant features . – Haghighi & Klein (2006), Raghavan et al. (2006), Rationales Class labels Druck et al. (2008). • But: – Features could be hard to describe. Apple shares up 3% apple apple- noun

– Features could be hard for annotators to understand. Annotator – We might want different features in the future.

2 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Saving Private Ryan War became a reality to me after seeing Saving Private Ryan. Steve Spielberg goes beyond reality with his latest production. Keep the kids home as the R rating is for Reality. Tom Hanks is stunning as Capt John Miller, set out in France during WW II to . . . Nonannotated . . . rescue and return home a soldier, Private Ryan (Matt Damon) who lost three brothers in the war. documents Spielberg takes us inside the heads of these individuals as they face death during the horrific battle scenes. Private Ryan is not for everyone, but I felt the time was right for a movie like this to be made. The movie reminds us of the sacrifices made by our fighting men and women. For this I thank them and for Steve Spielberg for making a movie that I will never forget. And I’m sure the Academy ...... will not forget Tom Hanks come April, as another well deserved Oscar with be in Tom's possession.

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators The Postman Question: after the disaster that was Waterworld, Class and also what the fuck were the execs who gave Costner the money to make another movie thinking?? “rationales ” In this 3 hour advertisement for his new hair weave, Costner plays a nameless drifter who dons a long ...... dead postal employee's uniform (I shit you not) and Annotated gradually turns a nuked-out USA into an idealized hippy-dippy society. (The main accomplishment of documents this brave new world is in re-inventing polyester.) t?? When he's not pointing the camera directly at ha himself, director Costner does have a nice visual w w sense, but by the time the second hour rolled no around, I was reduced to sitting on my hands to … keep from clawing out my own eyes. Mark this one OK "return to sender". . . .

3 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Linear Classifier Linear Classifier

• In our experiments, we use a linear classifier. • At test time, when presented with document x: r r • Classifier is represented by a weight vector . +1 , θ ⋅ f (x) > 0 Choose y* =  r r  −1 , θ ⋅ f (x) < 0 • Rationales will play a role when learning . • Where:r • Improvements solely due to a betterlearned . ⋅ –fr )( : feature vector (18k binary unigram features). (vs. no rationales) –θ : corresponding weights (i.e. 18k weights). • At test time: Positive weights favor y = +1 – No change to decision rule. Negative weights favor y = –1 – No new features. – No need for rationales.

Zaidan & Eisner – Modeling Annotators x Zaidan & Eisner – Modeling Annotators is the directoral debut from one of my favorite actors, Steve Buscemi. He gave memorable performences in , Fargo, and . Now ! " ( ) , . a acting actors after all alot am an and around as at awfully bar being better he tries his hand at writing, directing and acting all in the same flick. The movie starts brother buscemi but capable career chloe cream debbie debut decides did dies out awfully slow with Tommy (Buscemi) hanging around a local bar the "trees directing director dogs drive even excellent expectation fargo favorite finding flick lounge" and him pestering his brother. It's obvious he a loser. But as he says "it's for from gave girl good hand hanging having he him his i ice i'm in interest is it it's better I'm a loser and know I am, then being a loser and not thinking I am." Well put. job know liked local loser lounge love memorable michael movie mr my named not The story starts to take off when his uncle dies, and Tommy, not having a job, decides now obvious of off old one out pick put reach reservoir same says seen sevigny slow to drive an ice cream truck. Well, the movie starts to pick up with him finding a love soup starts steve story take the then think thinking this though to tommy trees tries interest in a 17 year old girl named Debbie (Chloe Sevigny) and ... I liked this movie truck uncle up well when with writing year you you've alot even though it did not reach my expectation. After you've seen him in Fargo and Reservoir Dogs, you know he is capable of a better performence. I think his brother, abc abc abc abc abc favor y = +1 Michael, did an excellent job for his debut performence. Mr. Buscemi is off to a good 0.2 0.1 0.0 +0.1 +0.2 favor y = –1 career as a director!

! " () , . a acting actors after all alot am an and around as at awfully bar being better brother buscemi but capable career chloe cream debbie . f (x): 0/1 vector; 18k elements fand = 1, freach = 1, fterrific = 0, …

! " ( ) , . a acting actors after all alot am an and around as at awfully bar being better debut decides did dies directing director dogs drive even excellent expectation fargo favorite finding flick for from gave girl . brother buscemi but capable career chloe cream debbie debut decides did dies directing director dogs drive even excellent expectation fargo favorite finding flick good hand hanging having he him his i ice i'm in interest is it it's job know liked local loser lounge love memorable michael movie . for from gave girl good hand hanging having he him his i ice i'm in interest is it it's job know liked local loser lounge love memorable michael movie mr my named not mr my named not now obvious of off old one out pick put reach reservoir same says seen sevigny slow soup starts steve story take . now obvious of off old one out pick put reach reservoir same says seen sevigny slow soup starts steve story take the then think thinking this though to tommy trees tries the think thinking this though to tommy trees tries truck uncle up when with writing year you you've truck uncle up well when with writing year you you've then well .

4 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators –1 +1

! " () , . a acting actors after all alot am an and around as at awfully bar being better brother buscemi but capable career chloe cream debbie . ! “ ( ) , . a acting actors after all alot am an and around as at awfully bar being better brother buscemi but capable career chloe cream debbie .

debut decides did dies directing director dogs drive even excellent expectation fargo favorite finding flick for from gave girl . debut decides did dies directing director dogs drive even excellent expectation fargo favorite finding flick for from gave girl

good hand hanging having he him his i ice i'm in interest is it it's job know liked local loser lounge love memorable michael movie . good hand hanging having he him his I ice i'm in interest is it it's job know liked local loser lounge love memorable michael movie .

mr my named not now obvious of off old one out pick put reach reservoir same says seen sevigny slow soup starts steve story take . mr my named not now obvious of off old one out pick put reach reservoir same says seen sevigny slow soup starts steve story take .

the then think thinking this though to tommy trees tries truck uncle up well when with writing year you you've . the then think thinking this though to tommy trees tries truck uncle up well when with writing year you you've . This was learned in a standard way: This was learned in a novel way (our algorithm!):

choose that models class labels well choose that models class labels & rationales well (of training data) (of training data)

i.e. loglinear model

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators r n r r r n r r

Choose θ = arg maxr ∏ p( yi | xi ,θ ) ⋅ p(ri | yi , xi ,θ ) Choose θ = arg maxr ∏ p( yi | xi ,θ ) ⋅ p(ri | yi , xi ,θ ) θ i=1 θ i=1

From a review for “Prince of Egypt” ( y = +1 ): From a review for “Prince of Egypt” ( y = +1 ):

O O O O O I I I I O O O O O O I I I I O r

51 weeks into '98 , a champ has emerged . 51 weeks into '98 , a champ has emerged . x

O O O O I I I I I O O O O O I I I I I O r the prince of egypt succeeds where other movies failed . the prince of egypt succeeds where other movies failed . x • We encode rationales as a tag sequence. r • The model p ( r | y , x , θ ) r should predict tag sequence with some help from θ . Let’s describe it…

5 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Designing the Model . Designing the Model . • Mission: model { I,O} tag sequence with ’s help. • Mission: model { I,O} tag sequence with ’s help. r r r r r r r r 1 2 3 … M (Tags) 1 2 3 … M (Tags) ( … ) … (Words) x1 x2 x3 xM … (Words) x1 x2 x3 xM Hmm… has no role here. • Makes sense: if for some word w, , then annotator is likely to mark it I. • How likely? We learn a parameter from that annotator’s rationale data. • is learned jointly with .

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Designing the Model . Designing the Model . • Mission: model { I,O} tag sequence with ’s help. • Mission: model { I,O} tag sequence with ’s help.

r1 r2 r rM r1 r2 r rM 3 … (Tags) LM 3 … (Tags) CM ( … ) ( … )

… (Words) … (Words) x1 x2 x3 xM x1 x2 x3 xM • Problem with this model: it is led to believe that • So, learn 4 more weights: any word w marked with I has high . • Better! Now: “w is marked with I either because of … temptation to start shouting slogans, delivering high or being around others marked with I.” • BUT: its message in an artistically interesting way without being overly manipulative. • CRF: Used in other labeling tasks too (POS, NER).

6 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Recap Experiments • Linear classifier: • Dataset: Pang & Lee’s IMDB movie review dataset (Pang & Lee, 2004). • Our work: choose that models all of annotator’s data well: both class labels and rationales . • 2000 document, each annotated with class label and enriched with rationales (Zaidan et al., 2007).

Learned • We train a classifier jointly maximizing likelihood of jointly : Standard loglinear model class and rationale data. : CRF predicting I/O tag sequence • Compared it to 2 baseline models that account • Remember, at test time: only for class data: loglinear model and SVM. – No change to decision rule. – …and to “SVM contrast” method of Zaidan et al. (2007). – No new features. Improvement due solely – No need for rationales. to betterlearned .

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators 95

Our Method I like ‘em curves, but… nt 93 ffere tly di ifican “SVM Contrast” • Q: “I see that including rationales helps with Sign 91 Method (2007) performance. But they do take extra time to LogLinear collect. Is it worth the extra time?” nt Baseline 89 ffere tly di can SVM Baseline ignifi 87 S • A: “Yes!”

85

Accuracy (%) Accuracy • Question addressed in Zaidan et al. (2007). 83 – Collecting rationales doesn’t take too long. – You don’t even need so many to get much of benefit. 81 • You should collect some in your next annotation 79 0 400 800 1200 1600 project! Training Set Size (Documents)

7 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators Fancy Features . Captures Annotator’s Style • Remember this? r r r r • is a noisy channel, parameterized by . 1 2 3 … M (Tags) • Noisy? Annotators use r to tell us about , but they … don’t do it “perfectly”: – They may mark too little (or too much). … (Words) – They may prefer to start rationales at syntactic x1 x2 x3 xM boundaries, and/or end around punctuation. • Actually, we used an expanded set that includes about 100 “conditional ” transition features, e.g.: • To what degree? Remember that we train both – How often do you see IO around punctuation? models jointly: – How often do you see IO around syntactic boundaries?

• Extra features model rationales better, but don’t help learn any better than basic feature set. • So, learned captures style of a specific annotator.

Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators . Captures Annotator’s Style Conclusions • So, learned captures style of a specific annotator. • A model that accounts for class and rationale annotations outperforms two strong baselines. • Empirical evidence? – An annotator’s own predicts their rationales best. • Generative approach generalizes to other classifiers (just incorporate in objective). • “So, perhaps you got lucky with A0 (whose data you used in experiments). Would this work with others?” • …and to other domains: – E.g. vision: just model how relevant portions of image tend A0 A3 A4 A5 to trigger rationales (“channel model”) and what size/shape rationales tend to be (“language model”). Loglinear baseline 71.0 73.0 71.0 70.0 • Doing an annotation project? Collect rationales! Our method 76.0 76.0 77.0 74.0 – Even small number could help. Jason Eisner Salesman SVM baseline 72.0 72.0 72.0 70.0 –“You may get more bang for the buck! ” “SVM Contrast” method 75.0 73.0 74.0 72.0 1800forRATS

8 Zaidan & Eisner – Modeling Annotators Zaidan & Eisner – Modeling Annotators And finally… • Jason and I thank Christine Piatko (coauthor on On The Internets Zaidan et al., 2007) for helpful discussions and • Slides and more: feedback on paper and presentation. http://cs.jhu.edu/~ozaidan/rationales

• Thank you! FYI–BibTeX entry: @InProceedings{zaidan-eisner:2008:EMNLP, author = { Zaidan , Omar and Eisner , Jason}, • A woman in peril. A confrontation. An explosion. The title = { Modeling Annotators: {A} Generative Approach to Learning from Annotator Rationales }, end. Yawn. Yawn. Yawn. ( Metro ) booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, month = {October}, • As directed by , B&R is one long excuse year = {2008}, for a Taco Bell promotion. ( & Robin ) address = {Honolulu, Hawaii}, publisher = {Association for Computational Linguistics}, pages = {31--40}, • I was reduced to sitting on my hands to keep from url= {{http://www.aclweb.org/anthology/D08-1004}http://www.aclweb.org/anthology/D08-1004 } } clawing out my own eyes. ( The Postman )

9