4/23/19

Pipeline of NLP Tools

Lecture 13a (CS540, 4/23/19): Shallow Parsing

◦ Scraping (not covered here)
◦ Sentence splitting
◦ Tokenization
◦ Part-of-speech tagging (Thursday)
◦ Chunking / shallow parsing (today)
◦ Named entity recognition
◦ Syntactic parsing

Material borrowed (with permission) from James Pustejovsky & Marc Verhagen of Brandeis. Mistakes are mine.


POS Tagging as an HMM

POS tagging with Hidden Markov Models. What are the states?
◦ POS tags (CC/CD/…/NN/…/WRB)
◦ Because the goal is to find the most likely sequence of tags

What are the observations?
◦ Words

P(t1…tn | w1…wn) = P(w1…wn | t1…tn) P(t1…tn) / P(w1…wn)
                 ∝ P(w1…wn | t1…tn) P(t1…tn)
                 ≈ ∏ i=1..n  P(wi | ti) P(ti | ti-1)

What is in the transition table?
◦ Maps POS tags to POS tags
◦ Entries are probabilities, e.g. how likely is a singular noun (NN) to be followed by an adjective (JJ)?
◦ Trained using a labeled corpus

What about the observation matrix?
◦ Maps words (observations) to states (POS tags)
◦ Entries: P(word | POS tag)
◦ Trained on a labeled corpus

In the product over positions i, P(wi | ti) is the output (observation) probability and P(ti | ti-1) is the transition probability.
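The most likely tag sequence under this model can be found with the Viterbi algorithm. Below is a minimal sketch in Python; the toy tag set and the probability tables are illustrative assumptions, not values trained from a corpus.

```python
# Minimal Viterbi decoder for HMM POS tagging.
# trans[t1][t2] = P(t2 | t1), emit[t][w] = P(w | t),
# start[t] = P(t at sentence start). All values below are toy numbers.

def viterbi(words, tags, trans, emit, start):
    """Return the most likely tag sequence for `words`."""
    # best[t] = probability of the best path ending in tag t
    best = {t: start[t] * emit[t].get(words[0], 1e-8) for t in tags}
    back = []  # backpointers, one dict per position after the first
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            p, arg = max((prev[s] * trans[s][t], s) for s in tags)
            best[t] = p * emit[t].get(w, 1e-8)  # output probability
            ptr[t] = arg
        back.append(ptr)
    # Follow backpointers from the best final tag
    tag = max(best, key=best.get)
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]

tags = ["DT", "NN", "VBD"]
start = {"DT": 0.8, "NN": 0.1, "VBD": 0.1}
trans = {"DT":  {"DT": 0.01, "NN": 0.9, "VBD": 0.09},
         "NN":  {"DT": 0.1,  "NN": 0.2, "VBD": 0.7},
         "VBD": {"DT": 0.6,  "NN": 0.3, "VBD": 0.1}}
emit = {"DT": {"the": 1.0},
        "NN": {"cat": 0.5, "mat": 0.5},
        "VBD": {"sat": 1.0}}

print(viterbi(["the", "cat", "sat"], tags, trans, emit, start))
# → ['DT', 'NN', 'VBD']
```

A real tagger would work in log space to avoid underflow and would estimate the tables by counting over a labeled corpus, as the slides describe.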


POS Tagging Algorithms

Performance on the Wall Street Journal corpus (— = not given):

Method                            Training cost   Speed   Accuracy
Dependency Net (2003)             Low             Low     97.2
Conditional Random Fields         High            High    97.1
Support Vector Machines (2003)    —               —       97.1
Bidirectional MEMM (2005)         Low             —       97.1
Brill's tagger (1995)             Low             —       96.6
HMM (2000)                        Very low        High    96.7

Warning: this table predates BERT (from Google) and other new techniques trained on very large corpora.

Chunking (Shallow Parsing)

A chunker (shallow parser) segments a sentence into non-recursive phrases:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .



The Noun Phrase (NP)

Examples:
◦ He
◦ Barack Obama
◦ The President
◦ The former Congressman from Illinois

They can all appear in a similar context: ___ was born in Hawaii.

Prepositional Phrases

Examples:
◦ the man in the white suit
◦ Come and look at my paintings
◦ Are you fond of animals?
◦ Put that thing on the floor


Verb Phrases

Examples:
◦ He went
◦ He was trying to keep his temper.
◦ She quickly showed me the way to hide.

Chunking

Text chunking is dividing sentences into non-overlapping phrases. Noun phrase chunking deals with extracting the noun phrases from a sentence. While NP chunking is much simpler than parsing, it is still a challenging task to build an accurate and very efficient NP chunker.


What Kind of Structures Should a Partial Parser Identify?

Different structures are useful for different tasks:
◦ Partial constituent structure: [NP I] [VP saw [NP a tall man in the park]].
◦ Prosodic segments: [I saw] [a tall man] [in the park].
◦ Content word groups: [I] [saw] [a tall man] [in the park].

What Is It Good For?

Chunking is useful in many applications:
◦ Information retrieval & extraction
◦ Preprocessing before full syntactic analysis
◦ Text to speech
◦ …



Chunk Parsing

Goal: divide a sentence into a sequence of chunks.
◦ Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].
◦ Chunks are non-recursive: a chunk can not contain other chunks.
◦ Chunks are non-exhaustive: not all words must be included in chunks.

Chunk Parsing Examples

Noun-phrase chunking: [I] saw [a tall man] in [the park].
Verb-phrase chunking: The man who [was in the park] [saw me].
Prosodic chunking: [I saw] [a tall man] [in the park].


Chunks and Constituency

Constituents: [a tall man in [the park]].
Chunks: [a tall man] in [the park].

Chunks are not constituents:
◦ Constituents are recursive
◦ Chunks are typically subsequences of constituents
◦ Chunks do not cross constituent boundaries

Chunk Parsing: Accuracy

Chunk parsing achieves higher accuracy:
◦ Smaller solution space
◦ Less word-order flexibility within chunks than between chunks
◦ Better locality: fewer long-range dependencies, less context dependence
◦ No need to resolve attachment ambiguity
◦ Less error propagation


Chunk Parsing: Domain Specificity

Chunk parsing is less domain specific:
◦ Dependencies on lexical/semantic information tend to occur at levels "higher" than chunks: attachment, argument selection, movement
◦ Fewer stylistic differences within chunks

Chunk Parsing: Efficiency

Chunk parsing is more efficient:
◦ Smaller solution space
◦ Relevant context is small and local
◦ Chunks are non-recursive
◦ Chunk parsing can be implemented with a finite state machine
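Because chunks are non-recursive, an NP chunker really can be run as a finite-state scan over POS tags, with no stack. A minimal sketch in Python, assuming the toy rule that an NP is an optional determiner, any number of adjectives, and at least one noun (DT? JJ* NN+):

```python
# Toy finite-state NP chunker over POS tags. The chunk rule
# (NP = DT? JJ* NN+) and the tag set are illustrative assumptions.

def fsm_chunk(tagged):
    """Greedily mark maximal DT? JJ* NN+ spans in a (word, tag) list;
    returns (start, end) token spans."""
    spans, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":      # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NN":   # one or more nouns
            k += 1
        if k > j:          # accept: saw at least one noun
            spans.append((i, k))
            i = k
        else:              # reject: resume scanning at the next token
            i += 1
        # No stack is needed anywhere: chunks cannot nest.
    return spans

tagged = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
          ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(fsm_chunk(tagged))  # → [(0, 3), (5, 7)]
```

Each state transition inspects only the current tag, which is what makes the relevant context "small and local".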



Psycholinguistic Motivations

Chunk parsing is psycholinguistically motivated:
◦ Chunks as processing units: humans tend to read texts one chunk at a time (eye-movement tracking studies)
◦ Chunks are phonologically marked: pauses, stress patterns
◦ Chunking might be a first step in full parsing

Chunk Parsing Techniques

Chunk parsers usually ignore lexical content; they only need to look at part-of-speech tags.

Techniques for implementing chunk parsing:
◦ Regular expression matching / finite state machines (see next)
◦ Transformation Based Learning
◦ Memory Based Learning
◦ Other tagging-style methods


Regular Expression Matching

Define a regular expression that matches the sequences of tags in a chunk.
◦ A simple noun phrase chunk regexp: <DT>? <JJ>* <NN>*
◦ Chunk all matching subsequences:
  the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
◦ If matching subsequences overlap, the first one or longest one gets priority

Chunking as Tagging

Map part-of-speech tag sequences to {I,O,B}*:
◦ I – tag is part of an NP chunk
◦ O – tag is not part of an NP chunk
◦ B – the first tag of an NP chunk which immediately follows another NP chunk
◦ Alternative tags: Begin, End, Outside

Example:
◦ Input: The teacher gave Sara the book
◦ Output: I I O I B I
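This kind of tag-pattern matching can be sketched in plain Python by encoding each tag as "<TAG>", so the chunk rule becomes an ordinary regular expression. The pattern below requires at least one noun ((?:<NN>)+ rather than *) so that the regex cannot match an empty span; that tightening is my assumption, not part of the slides.

```python
import re

# NP chunking by regular expression matching over a POS-tag string.
# Each tag is rendered as "<TAG>" so tag boundaries are explicit.

def regex_chunk(tagged, pattern=r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+"):
    """Return (start, end) token spans of NP chunks in a (word, tag) list."""
    tag_str = "".join(f"<{t}>" for _, t in tagged)
    spans = []
    for m in re.finditer(pattern, tag_str):
        # Map character offsets back to token indices: each tag
        # contributes exactly one '<'.
        start = tag_str[:m.start()].count("<")
        spans.append((start, start + m.group().count("<")))
    return spans

tagged = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
          ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(regex_chunk(tagged))  # → [(0, 3), (5, 7)]
```

Python's re.finditer scans left to right, returns non-overlapping matches, and uses greedy quantifiers, which for this pattern implements the slide's "first one or longest one gets priority" policy.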

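The {I,O,B} encoding can be produced mechanically from chunk spans. A small sketch, with the NP spans for "The teacher gave Sara the book" taken from the slides' example ([The teacher], [Sara], [the book]):

```python
# Convert NP chunk spans to {I, O, B} tags:
# I = inside an NP chunk, O = outside any chunk, B = first token of an
# NP chunk that immediately follows another NP chunk.

def to_iob(n_tokens, chunks):
    """chunks: sorted, non-overlapping list of (start, end) token spans."""
    tags = ["O"] * n_tokens
    prev_end = None
    for start, end in chunks:
        for i in range(start, end):
            tags[i] = "I"
        if start == prev_end:   # adjacent to the previous chunk
            tags[start] = "B"
        prev_end = end
    return tags

# "The teacher gave Sara the book"
print(to_iob(6, [(0, 2), (3, 4), (4, 6)]))
# → ['I', 'I', 'O', 'I', 'B', 'I'], matching the slide's "I I O I B I"
```

Once chunks are encoded this way, any sequence tagger (HMM, MEMM, CRF, …) can be trained to predict the I/O/B labels directly.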

Chunking: State of the Art

When addressed as tagging, methods similar to POS tagging can be used:
◦ HMM – combining POS and IOB tags
◦ TBL – rules based on POS and IOB tags

Depending on task specification and test set: 90-95% for NP chunks.

Chunking performance on the Penn Treebank:

Method                                       Recall   Precision   F-score
Winnow, basic features (Zhang, 2002)         93.60    93.54       93.57
Perceptron (Carreras, 2003)                  93.29    94.19       93.74
SVM + voting (Kudoh, 2003)                   93.92    93.89       93.91
SVM (Kudo, 2000)                             93.51    93.45       93.48
Bidirectional MEMM (Tsuruoka, 2005)          93.70    93.70       93.70



Named-Entity Recognition

Recognize named entities in a sentence:
◦ Gene/protein names
◦ Entity types: protein, DNA, RNA, cell_line, cell_type

Example (entities bracketed with their types):
We have shown that [protein interleukin-1] ([protein IL-1]) and [protein IL-2] control [DNA IL-2 receptor alpha (IL-2R alpha) gene] transcription in [cell_line CD4-CD8-murine T lymphocyte precursors].

Syntactic Parsing

[Figure: phrase-structure parse tree (S, VP, NP, QP over the tags VBN NN VBD DT JJ CD CD NNS .) for "Estimated volume was a light 2.4 million ounces ."]


Phrase Structure + Head Information / Dependency Relations

[Figures: the sentence "Estimated volume was a light 2.4 million ounces ." shown both as a phrase-structure tree annotated with head information and as a set of dependency relations over the tag sequence VBN NN VBD DT JJ CD CD NNS .]


Parse Tree / Semantic Structure

[Figures: a parse tree and the corresponding predicate-argument (semantic) structure for "normal serum CRP measurement does not exclude deep vein thrombosis", with ARG1/ARG2/MOD relations linking predicates to their arguments.]


Feature-Based Parsing

HPSG:
◦ A few schemas (e.g. the subject-head schema and the head-modifier schema)
◦ Many lexical entries
◦ Deep syntactic analysis

Grammar: corpus-based grammar construction (Miyao et al., 2004)
Parser: beam search (Tsuruoka et al.)

[Figure: HPSG derivation of "Mary walked slowly" — lexical entries with HEAD/SUBJ/COMPS features combine via the subject-head and head-modifier schemas into a sign with HEAD: verb, SUBJ: <>, COMPS: <>.]

