Part-of-Speech Tagging

CS 341: Natural Language Processing
Prof. Heather Pon-Barry
www.mtholyoke.edu/courses/ponbarry/cs341.html

Announcements
• Lit Review Part 2
  • Written review of 2 articles, due April 1
• Final Project Proposal
  • Due Monday April 6

Today
• POS Tagging

POS Tagging
• Process of assigning a part-of-speech marker to each word in a collection
  She/pronoun found/verb herself/pronoun falling/verb ...

POS Tagging
• Words often have more than one POS: e.g., back
  • The back door = adjective (JJ)
  • On my back = noun (NN)
  • Win the voters back = adverb (RB)
  • Promised to back the bill = verb (VB)
• The POS tagging problem is to determine the POS tag for a particular instance of a word.

Penn Treebank Tagset
[Figure: Penn Treebank tagset table]

Applications
• Speech synthesis
  • “I object” vs. “This object...”
• Parsing
• Machine translation
• Named entity recognition
• Word sense disambiguation

POS Tagging Performance
• How many tags are correct? (Tag accuracy)
  • State of the art: about 97%
  • But baseline is already 90%
• Baseline performance is:
  • Tag every word with its most frequent tag
  • Tag unknown words as nouns
• Partly easy because
  • Many words are unambiguous
  • You get points for them (the, a, etc.) and for punctuation marks!

How difficult is POS Tagging?
• In the Brown corpus:
  • ~11% of the word types are ambiguous with regard to part of speech
  • ~40% of the word tokens are ambiguous
• But they tend to be very common words. E.g., that
  • I know that he is honest = preposition/subordinating conjunction (IN)
  • Yes, that play was nice = determiner (DT)
  • You can’t go that far = adverb (RB)

Automatic POS Tagging
• Symbolic
  • Rule-based
  • Transformation-based
• Probabilistic
  • Hidden Markov models
  • Log-linear models

Rule-based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags
• Leaving the correct tag for each word

Rule-based Example
Candidate tags from the dictionary for “She promised to back the bill”:
  She       PRP
  promised  VBN, VBD
  to        TO
  back      NN, RB, JJ, VB
  the       DT
  bill      VB, NN

Rule-based Example
Eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”
  She       PRP
  promised  VBD (VBN eliminated)
  to        TO
  back      NN, RB, JJ, VB
  the       DT
  bill      VB, NN

Transformation-based
• Combines rule-based and probabilistic tagging
  • Rule-based: rules are used to specify tags in a certain environment
  • Probabilistic: we use a tagged corpus to find the best-performing rules (supervised learning)
• Input
  • tagged corpus
  • dictionary (with most frequent tags)
• Example: Brill tagger

HMM: Part-of-Speech Transition Probabilities
[Figure: tag transition probabilities]

HMM: Observation Likelihoods: P(word|tag)
[Figure: observation likelihoods]

Maxent P(tag|word)
• Can do surprisingly well just looking at a word by itself:
  • Word             the: the → DT
  • Prefixes         unfathomable: un- → JJ
  • Suffixes         Importantly: -ly → RB
  • Capitalization   Meridian: CAP → NNP
  • Word shapes      35-year: d-x → JJ
• Then build a classifier to predict tag
  • Maxent P(tag|word): 93.7% overall / 82.6% unknown
Slide adapted from Dan Jurafsky

MEMMs
• Maximum Entropy Markov Model
• A sequence version of the maximum entropy classifier.
[Figure: “<s> Janet will back the bill” tagged NNP MD VB, with labels t_{i-2}, t_{i-1} over the tags and w_{i-1}, w_i, w_{i+1} under the words]

MEMMs: More Features
[Figure: the same “<s> Janet will back the bill” example, tagged NNP MD VB]
Slide adapted from Dan Jurafsky

MEMM Decoding
• Simplest algorithm
  • Greedy: at each step in the sequence, select the tag that maximizes P(tag | nearby words, nearby tags)
• In practice
  • Viterbi algorithm
  • Beam search

POS Tagging Accuracies
• Rough accuracies:
  • Baseline (most frequent tag): ~90%
  • Trigram HMM: ~95%
  • Maxent P(t|w): 93.7%
  • MEMM tagger: 96.9%
  • Bidirectional MEMM: 97.2%
  • Upper bound: ~98% (human agreement)
Slide adapted from Dan Jurafsky

More Resources
• Stanford POS Tagger (cyclic dependency network, bidirectional version of MEMM)
  • http://nlp.stanford.edu/software/tagger.shtml
• CMU Twitter POS tagger
  • http://www.ark.cs.cmu.edu/TweetNLP/

References
• Log-linear models
  • Ratnaparkhi, EMNLP 1996
  • Toutanova et al., NAACL 2003
• Excellent recent survey: “Part-of-speech tagging from 97% to 100%: is it time for some linguistics?” (Manning, 2011)

Summary
• Penn Treebank: standard tagset
• Approaches to POS tagging:
  • Symbolic: rule-based, transformation-based
  • Probabilistic: HMMs, MEMMs

Training a Tagger
• Input
  • tagged corpus
  • dictionary (with most frequent tags)
• These are available for English
• What about other languages?

Research in POS Tagging
• Low resource languages
  • Learning a Part-of-Speech Tagger from Two Hours of Annotation (Garrette and Baldridge, 2013) [video].
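Sketch: most-frequent-tag baseline
The baseline from the POS Tagging Performance slide (tag every word with its most frequent tag, tag unknown words as nouns) might be sketched roughly as follows. The function names and the three-sentence toy corpus are invented for illustration; a real baseline would be trained on a tagged corpus such as the Penn Treebank.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Build a word -> most frequent tag lookup from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Keep only the single most frequent tag for each word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_baseline(words, lexicon, unknown_tag="NN"):
    """Tag known words with their most frequent tag; unknown words as nouns."""
    return [(word, lexicon.get(word, unknown_tag)) for word in words]


def accuracy(predicted, gold):
    """Tag accuracy: fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


# Toy training data (hypothetical, not from any real corpus).
toy_corpus = [
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
]

lexicon = train_baseline(toy_corpus)
tagged = tag_baseline(["the", "back", "bill"], lexicon)
```

Here "back" is tagged NN because that is its most frequent tag in the toy data, and the unseen word "bill" falls back to NN. The sketch ignores case and tie-breaking, which a real baseline would need to handle.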

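Sketch: rule-based tag elimination
The two Rule-based Example slides can be approximated in a few lines: look up every candidate tag in a dictionary, then apply the hand-written rule "eliminate VBN if VBD is an option when VBN|VBD follows <start> PRP". The LEXICON contents and function names below are assumptions made for this illustration, and only that single rule is implemented.

```python
# Hypothetical mini-dictionary: every tag each word can take.
LEXICON = {
    "she": ["PRP"],
    "promised": ["VBN", "VBD"],
    "to": ["TO"],
    "back": ["NN", "RB", "JJ", "VB"],
    "the": ["DT"],
    "bill": ["VB", "NN"],
}


def assign_candidates(words):
    """Step 1: assign all possible tags from the dictionary to each word."""
    return [list(LEXICON[word.lower()]) for word in words]


def eliminate_vbn_after_start_prp(candidates):
    """Step 2: one hand-written rule that removes a tag in a given context.

    If the sentence starts with an unambiguous PRP and the next word could be
    either VBN or VBD, drop VBN. The input is left unmodified.
    """
    result = [list(tags) for tags in candidates]
    if len(result) >= 2 and result[0] == ["PRP"]:
        if "VBN" in result[1] and "VBD" in result[1]:
            result[1].remove("VBN")
    return result


words = "She promised to back the bill".split()
candidates = assign_candidates(words)
pruned = eliminate_vbn_after_start_prp(candidates)
```

After this one rule, "promised" is left with VBD only, while "back" and "bill" remain ambiguous. A full rule-based tagger would need further rules to resolve them.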
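Sketch: word-level features for a maxent tagger
The word-only features listed on the Maxent P(tag|word) slide (word identity, prefixes, suffixes, capitalization, word shape) could be extracted along these lines. The feature names, the fixed affix length of 2, and the shape alphabet (d = digit, x = lowercase, X = uppercase, repeats collapsed) are assumptions chosen to reproduce the slide's examples; the slides do not specify an exact feature set.

```python
from itertools import groupby


def word_shape(word):
    """Map digits to 'd', lowercase to 'x', uppercase to 'X'; collapse repeats."""
    mapped = []
    for ch in word:
        if ch.isdigit():
            mapped.append("d")
        elif ch.islower():
            mapped.append("x")
        elif ch.isupper():
            mapped.append("X")
        else:
            mapped.append(ch)  # keep punctuation such as '-' unchanged
    # groupby merges consecutive identical symbols, e.g. 'dd-xxxx' -> 'd-x'.
    return "".join(symbol for symbol, _ in groupby(mapped))


def word_features(word, affix_len=2):
    """Features computed from the word alone, with no context."""
    return {
        "word": word.lower(),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
        "is_capitalized": word[:1].isupper(),
        "shape": word_shape(word),
    }


examples = {w: word_features(w) for w in ["the", "unfathomable", "Importantly", "Meridian", "35-year"]}
```

Each feature dictionary would then be passed to a classifier that predicts the tag. An MEMM extends the same idea by adding features over the previous tags and neighboring words.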