CS 294-5: Statistical Natural Language Processing

Last time:
• Syntactic trees + lexical translations → (unambiguous) logical statements
• Symbolic deep (?) semantics
• Often used as part of a logical NLP interface or in database / dialog systems
• Applications like question answering and information extraction often don’t need such expressiveness

Lecture 18: 11/7/05

Today: Semantic Roles, Empty Elements, and Coreference
• Statistically extracting shallow semantics
• Semantic role labeling
• Coreference resolution

Includes examples from Johnson, Jurafsky and Gildea, Luo, Palmer

Semantic Role Labeling (SRL)

• Characterize clauses as relations with roles:
• Want to know more than which NP is the subject (but not much more):
  • Relations like subject are syntactic; relations like agent or message are semantic
• Typical pipeline (a minimal sketch follows below):
  • Parse, then label roles
  • Almost all errors locked in by parser
  • Really, SRL is quite a lot easier than parsing
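A toy illustration of this parse-then-label pipeline, assuming hypothetical `parse` and `classify_role` stand-ins for a real parser and a trained role classifier (spans and labels are invented for illustration):

```python
# A toy parse-then-label SRL pipeline. The parser's constituents
# define the candidate role fillers, so any argument the parse gets
# wrong is locked in before role labeling ever runs.

def parse(tokens):
    """Stub constituency parser: returns (span, category) nodes."""
    return [((0, 1), "NP"), ((1, 4), "VP"), ((2, 4), "NP")]

def classify_role(node, predicate_span):
    """Stub role classifier for one constituent."""
    span, category = node
    if category != "NP":
        return None                      # only NPs considered here
    return "ARG0" if span[1] <= predicate_span[0] else "ARG1"

def label_roles(tokens, predicate_span):
    return [(node, classify_role(node, predicate_span))
            for node in parse(tokens)]

# "He saw the dog": predicate "saw" occupies span (1, 2)
print(label_roles("He saw the dog".split(), (1, 2)))
# [(((0, 1), 'NP'), 'ARG0'), (((1, 4), 'VP'), None), (((2, 4), 'NP'), 'ARG1')]
```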

SRL Example

PropBank / FrameNet

• FrameNet: roles shared between verbs
• PropBank: each verb has its own roles
• PropBank is more used, because it’s layered over the treebank (and so has greater coverage, plus parses)
• Note: some linguistic theories postulate even fewer roles than FrameNet (e.g. 5-20 total: agent, patient, instrument, etc.)

PropBank Example


Shared Arguments

Path Features

• Features (a path-feature sketch follows below):
  • Path from target to filler
  • Filler’s syntactic type, headword, case
  • Target’s identity
  • Sentence voice, etc.
  • Lots of other second-order features
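A minimal sketch of the classic path feature (in the style of Gildea and Jurafsky): the chain of syntactic categories from the target predicate up to the lowest common ancestor and back down to the candidate filler. The tiny `Node` class is ad hoc, not from any toolkit, and ASCII `^`/`!` stand in for the usual up/down arrows:

```python
# Compute the up-then-down category path between two tree nodes,
# e.g. 'VB^VP^S!NP' ('^' = up one level, '!' = down one level).

class Node:
    def __init__(self, label, *children):
        self.label, self.children, self.parent = label, children, None
        for child in children:
            child.parent = self

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def path_feature(target, filler):
    up, down = ancestors(target), ancestors(filler)
    lca = next(n for n in up if n in down)        # lowest common ancestor
    up_labels = [n.label for n in up[:up.index(lca) + 1]]
    down_labels = [n.label for n in reversed(down[:down.index(lca)])]
    return "^".join(up_labels) + "!" + "!".join(down_labels)

# (S (NP He) (VP (VB saw) (NP it)))
vb, obj, subj = Node("VB"), Node("NP"), Node("NP")
tree = Node("S", subj, Node("VP", vb, obj))
print(path_feature(vb, subj))   # VB^VP^S!NP
```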

Results

• Gold vs. parsed source trees
• SRL is fairly easy on gold trees
• Harder on automatic parses

Interaction with Empty Elements

Empty Elements

• In the PTB, three kinds of empty elements:
  • Null items (usually complementizers)
  • Dislocation (WH- traces, topicalization, relative clause and heavy NP extraposition)
  • Control (raising, passives, control, shared arguments)
• Need to reconstruct these (and resolve any indexation)

Example: English

Example: German

Types of Empties

A Pattern-Matching Approach

• [Johnson 02]
• Something like transformation-based learning
• Extract patterns
  • Details: transitive verb marking, auxiliaries
  • Details: legal subtrees
• Rank patterns
  • Pruning / ranking: by correct / match rate
  • Application priority: by depth
• Pre-order traversal
• Greedy match

Pattern-Matching Details
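A minimal sketch of the greedy, pre-order application step, under heavy simplification: the single pattern is hand-written rather than extracted and ranked from the treebank, and trees are plain (label, children) tuples:

```python
# Greedy, pre-order empty-element restoration in the spirit of
# [Johnson 02]: walk the tree top-down; at each node, apply the
# first (i.e. highest-ranked) matching pattern, then recurse.

# A tree is (label, [children]); leaves are plain strings.
# Each pattern: (parent label, child-label sequence, empty node to prepend).
PATTERNS = [
    ("S", ("VP",), ("NP", ["*"])),   # subjectless S -> empty NP subject
]

def restore_empties(tree):
    label, children = tree
    kid_labels = tuple(c[0] for c in children if isinstance(c, tuple))
    for parent_label, lhs, empty in PATTERNS:   # ranked best-first
        if label == parent_label and kid_labels == lhs:
            children = [empty] + children       # greedy: first match wins
            break
    return (label, [restore_empties(c) if isinstance(c, tuple) else c
                    for c in children])

tree = ("S", [("VP", [("VB", ["sleep"])])])
print(restore_empties(tree))
# ('S', [('NP', ['*']), ('VP', [('VB', ['sleep'])])])
```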

Top Patterns Extracted

Results

A Machine-Learning Approach

• [Levy and Manning 04]
• Build two classifiers (a sketch follows below):
  • First one predicts where empties go
  • Second one predicts if/where they are bound
• Use syntactic features similar to SRL (paths, categories, heads, etc.)
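A minimal sketch of that two-stage design; both classifiers are hypothetical rule stubs standing in for trained models over path/category/head features, and the node encoding is invented:

```python
# Stage 1 proposes sites for empty elements; stage 2 decides whether
# each inserted empty is bound and, if so, by which antecedent.

def predict_empty_sites(nodes):
    """Stub classifier 1: insert an empty subject under any
    subjectless S node."""
    return [n for n in nodes
            if n["category"] == "S" and not n["has_subject"]]

def predict_binder(site, candidates):
    """Stub classifier 2: pick a binding NP antecedent, or None
    if the empty element is unbound."""
    nps = [c for c in candidates if c["category"] == "NP"]
    return nps[0] if nps else None

nodes = [
    {"id": 0, "category": "NP", "has_subject": True},
    {"id": 1, "category": "S",  "has_subject": False},
]
for site in predict_empty_sites(nodes):
    binder = predict_binder(site, [n for n in nodes if n is not site])
    print(site["id"], "bound by", binder and binder["id"])  # 1 bound by 0
```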

Reference Resolution

• Noun phrases refer to entities in the world; many pairs of noun phrases co-refer:

  John Smith, CFO of Prime Corp. since 1986, saw his pay jump 20% to $1.3 million as the 57-year-old also became the financial services co.’s president.

Kinds of Reference

• Referring expressions (more common in newswire, generally harder in practice):
  • John Smith
  • President Smith
  • the president
  • the company’s new executive
• Free variables (more interesting grammatical constraints, more linguistic theory, easier in practice):
  • Smith saw his pay increase
• Bound variables:
  • Every company trademarks its name.

Grammatical Constraints

• Gender / number (a filtering sketch follows below):
  • Jack gave Mary a gift. She was excited.
  • Mary gave her mother a gift. She was excited.
• Position (cf. binding theory):
  • The company’s board polices itself / it.
  • Bob thinks Jack sends email to himself / him.
• Direction (anaphora vs. cataphora):
  • She bought a coat for Amy.
  • In her closet, Amy found her lost coat.
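A minimal sketch of gender/number agreement as a hard filter on candidate antecedents before any scoring; the tiny pronoun lexicon is illustrative:

```python
# Rule out candidate antecedents that clash with the pronoun in
# gender or number (None = unconstrained in that slot).

PRONOUNS = {"he": ("m", "sg"), "she": ("f", "sg"),
            "it": ("n", "sg"), "they": (None, "pl")}

def agrees(pronoun, gender, number):
    g, n = PRONOUNS[pronoun]
    return (g is None or g == gender) and (n is None or n == number)

# "Jack gave Mary a gift. She was excited."
candidates = [("Jack", "m", "sg"), ("Mary", "f", "sg"), ("gift", "n", "sg")]
print([name for name, g, n in candidates if agrees("she", g, n)])  # ['Mary']
```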

Discourse Constraints

• Recency:
  • Peter Watters was named CEO. Watters’ promotion came six weeks after his brother, Eric Watters, stepped down.
• Salience
• Focus
• Centering Theory [Grosz et al. 86]

Other Constraints

• Style / Usage Patterns
• Semantic Compatibility:
  • Smith had bought a used car that morning. The used car dealership assured him it was in good condition.

Two Kinds of Models

• Mention Pair models
  • Treat coreference chains as a collection of pairwise links
  • Make independent pairwise decisions and reconcile them in some way (e.g. clustering or greedy partitioning)
• Entity-Mention models
  • A cleaner, but less studied, approach
  • Posit single underlying entities
  • Each mention links to a discourse entity [Pasula et al. 03], [Luo et al. 04]

Mention Pair Models

• Most common machine learning approach
• Build classifiers over pairs of NPs
  • For each NP, pick a preceding NP or NEW (a decoding sketch follows below)
  • Or, for each NP, choose link or no-link
• Clean up non-transitivity with clustering or graph partitioning algorithms
• E.g.: [Soon et al. 01], [Ng and Cardie 02]
• Some work has done the classification and clustering jointly [McCallum and Wellner 03]
• Kind of a hack; results in the 50’s to 60’s on all NPs
• Better numbers on proper names and pronouns
• Better numbers if tested on gold entities
• Failures are mostly because of insufficient knowledge or features for hard common noun cases
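A minimal sketch of the “pick a preceding NP or NEW” decoding in the style of [Soon et al. 01]; the pairwise scorer is a hand-written stub standing in for a trained classifier, and the mention features are invented:

```python
# Closest-first mention-pair decoding: for each mention, scan the
# preceding mentions right-to-left and link to the first one the
# pairwise classifier accepts, else start a NEW entity. Clusters
# are the transitive closure of the independent pairwise links.

def pair_score(antecedent, mention):
    """Stand-in for a trained pairwise classifier (coref probability)."""
    score = 0.0
    if antecedent["head"].lower() == mention["head"].lower():
        score += 0.8                      # string / head match
    if antecedent["gender"] == mention["gender"]:
        score += 0.2                      # agreement feature
    return score

def closest_first_decode(mentions, threshold=0.5):
    """Return a cluster id for each mention index."""
    cluster, next_id = {}, 0
    for j, m in enumerate(mentions):
        link = None
        for i in range(j - 1, -1, -1):    # nearest antecedent first
            if pair_score(mentions[i], m) >= threshold:
                link = i
                break
        if link is None:                  # NEW entity
            cluster[j], next_id = next_id, next_id + 1
        else:
            cluster[j] = cluster[link]
    return cluster

mentions = [
    {"head": "Smith", "gender": "m"},
    {"head": "pay",   "gender": "n"},
    {"head": "Smith", "gender": "m"},
]
print(closest_first_decode(mentions))  # {0: 0, 1: 1, 2: 0}
```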

Pairwise Features

An Entity-Mention Model

• Example: [Luo et al. 04]
• Bell Tree (link vs. start decision list); a decoding sketch follows below
• Entity centroids, or not?
  • Not for [Luo et al. 04]; see [Pasula et al. 03]
  • Some features work on the nearest mention (e.g. recency and distance)
  • Others work on a “canonical” mention (e.g. spelling match)
• Lots of pruning; the model is highly approximate
• (Actually ends up being like a greedy-link system in the end)
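A minimal sketch of Bell-tree decoding in the spirit of [Luo et al. 04], with a small beam standing in for the heavy pruning mentioned above; `link_score` and `START_PRIOR` are invented stubs for the learned link-vs-start model:

```python
# Each mention either LINKs to an existing partial entity or STARTs
# a new one; hypotheses are partial partitions of the mentions seen
# so far, pruned to a beam after every mention.

import math

START_PRIOR = 0.3   # stub probability of starting a new entity

def link_score(entity, mention):
    """Stub: high if the new mention's string matches any mention
    already in the entity (real models also use features of the
    nearest and 'canonical' mentions)."""
    return 0.9 if any(s == mention[1] for _, s in entity) else 0.1

def bell_tree_decode(mentions, beam=5):
    hyps = [(0.0, ())]                    # (log-prob, partition)
    for m in mentions:
        new_hyps = []
        for logp, partition in hyps:
            # START a new entity
            new_hyps.append((logp + math.log(START_PRIOR),
                             partition + (frozenset([m]),)))
            # LINK to each existing entity
            for i, entity in enumerate(partition):
                p = (1 - START_PRIOR) * link_score(entity, m)
                updated = partition[:i] + (entity | {m},) + partition[i+1:]
                new_hyps.append((logp + math.log(p), updated))
        hyps = sorted(new_hyps, key=lambda h: -h[0])[:beam]   # prune
    return hyps[0][1]

mentions = list(enumerate(["Smith", "he", "Smith"]))
print(bell_tree_decode(mentions))
# groups the two "Smith" mentions; "he" gets its own entity
```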
