
CS388: Natural Language Processing

Lecture 14: Semantics I

Greg Durrett

Recall: Dependencies
‣ Dependency syntax: syntactic structure is defined by dependencies
‣ Head (parent, governor) connected to dependent (child, modifier)
‣ Each word has exactly one parent except for the ROOT symbol
‣ Dependencies must form a directed acyclic graph

DT  NN  VBD  TO  DT  NN
the dog ran  to  the house   (dependency arcs from ROOT)

Recall: Shift-Reduce Parsing

I ate some spaghetti bolognese
‣ State: Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]
‣ Left-arc (reduce operation): Let σ denote the stack
‣ “Pop two elements, add an arc, put them back on the stack”

[σ | w2, w1] → [σ | w1], w2 is now a child of w1
‣ Train a classifier to make these decisions sequentially — that classifier can parse sentences for you

Where are we now?

‣ Early in the class: bags of words (classifiers) => sequences of words (sequence modeling)

‣ Now we can understand sentences in terms of tree structures as well

‣ Why is this useful? What does this allow us to do?

‣ We’re going to see how parsing can be a stepping stone towards more formal representations of language meaning

Today

‣ Montague semantics:

‣ Model theoretic semantics

‣ Compositional semantics with first-order logic

‣ CCG parsing for database queries

‣ Lambda-DCS for question answering

Model Theoretic Semantics

Model Theoretic Semantics
‣ Key idea: can ground out natural language expressions in set-theoretic expressions called models of those sentences
‣ Natural language statement S => interpretation of S that models it
  She likes going to that restaurant
‣ Interpretation: defines who she and that restaurant are, makes it able to be concretely evaluated with respect to a world
‣ Entailment (statement A implies statement B) reduces to: in all worlds where A is true, B is true
‣ Our modeling language is first-order logic

First-order Logic
‣ Powerful logic formalism including things like entities, relations, and quantifications
  Lady Gaga sings
‣ sings is a predicate (with one argument), function f: entity → true/false
‣ sings(Lady Gaga) = true or false, have to execute this against some database (world)
‣ [[sings]] = denotation, set of entities which sing (found by executing this predicate on the world — we’ll come back to this)

Quantification
‣ Universal quantification: “forall” operator
  ∀x sings(x) ∨ dances(x) → performs(x)   “Everyone who sings or dances performs”
‣ Existential quantification: “there exists” operator
  ∃x sings(x)   “Someone sings”
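The denotation and quantifier machinery above can be sketched directly in Python, treating a world as a set of entities and each predicate as the set of entities it is true of. The entities are toy stand-ins (including a made-up e999 that neither sings nor dances), not anything from a real database:

```python
# Toy world: predicates are sets of entities (made-up IDs for illustration)
world = {"e470", "e728", "e999"}
sings = {"e470", "e728"}
dances = {"e470"}
performs = {"e470", "e728"}

# [[sings]] = denotation: the set of entities for which sings(x) is true
denotation_sings = {x for x in world if x in sings}

# ∀x sings(x) ∨ dances(x) → performs(x): check the implication at every entity
forall_holds = all(
    (x not in sings and x not in dances) or x in performs
    for x in world
)

# ∃x sings(x): at least one entity satisfies the predicate
exists_holds = any(x in sings for x in world)
```

Entailment in this setting would mean checking the formula not just in this one world but in all worlds; here we only evaluate against a single model.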

‣ Source of ambiguity! “Everyone is with someone”
  ∀x ∃y friend(x,y)
  ∃y ∀x friend(x,y)

Logic in NLP
‣ Question answering: Who are all the American singers named Amy?
  λx. nationality(x,USA) ∧ sings(x) ∧ firstName(x,Amy)
‣ Function that maps from x to true/false, like filter. Execute this on the world to answer the question
‣ Lambda calculus: powerful system for expressing these functions
‣ Information extraction: Lady Gaga and Eminem are both musicians
  musician(Lady Gaga) ∧ musician(Eminem)
‣ Can now do reasoning. Maybe know: ∀x musician(x) => performer(x)
  Then: performer(Lady Gaga) ∧ performer(Eminem)

Compositional Semantics with First-Order Logic

Montague Semantics
  Id   | Name               | Alias     | Birthdate  | Sings?
  e470 | Stefani Germanotta | Lady Gaga | 3/28/1986  | T
  e728 | Marshall Mathers   | Eminem    | 10/17/1972 | T
‣ Database containing entities, predicates, etc.
  Lady Gaga sings   (parse: S → NP VP; NNP NNP VBP)
‣ Sentence expresses something about the world which is either true or false
‣ Denotation: evaluation of some expression against this database
  [[Lady Gaga]] = e470   (denotation of this string is an entity)
  [[sings(e470)]] = True   (denotation of this expression is T/F)

Montague Semantics
  S: sings(e470)   (function application: apply λy. sings(y) to e470)
  NP: e470   VP: λy. sings(y)
  Lady Gaga (NNP NNP)   sings (VBP)
‣ λy. sings(y) takes one argument (y, the entity) and returns a logical form sings(y)
‣ We can use the syntactic parse as a bridge to the lambda-calculus representation, build up a logical form (our model) compositionally

Parses to Logical Forms
  S: sings(e470) ∧ dances(e470)
  NP: e470   VP: λy. sings(y) ∧ dances(y)
  VP: λy. sings(y)   CC: and   VP: λy. dances(y)
  Lady Gaga (NNP NNP)   sings (VBP)   dances (VBP)
‣ General rules:
  VP: λy. a(y) ∧ b(y) -> VP: λy. a(y) CC VP: λy. b(y)
  S: f(x) -> NP: x VP: f

Parses to Logical Forms
  S: born(e470, 3/28/1986)
  NP: e470   VP: λy. born(y, 3/28/1986)
  Lady Gaga (NNP NNP)   was (VBD)   born (VBN: λx.λy. born(y,x))   March 28, 1986 (NP: 3/28/1986)
‣ Function takes two arguments: first x (date), then y (entity)
‣ How to handle tense: should we indicate that this happened in the past?

Tricky things
‣ Adverbs/temporality: Lady Gaga sang well yesterday
  sings(Lady Gaga, time=yesterday, manner=well)
‣ “Neo-Davidsonian” view of events: things with many properties:
  ∃e. type(e,sing) ∧ agent(e,e470) ∧ manner(e,well) ∧ time(e,…)
‣ Quantification: Everyone is friends with someone
  ∃y ∀x friend(x,y)  (one friend)
  ∀x ∃y friend(x,y)  (different friends)
‣ Same syntactic parse for both! So syntax doesn’t resolve all ambiguities
‣ Indefinite: Amy ate a waffle   ∃w. waffle(w) ∧ ate(Amy,w)
‣ Generic: Cats eat mice (all cats eat mice? most cats? some cats?)

Semantic Parsing
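As a toy illustration of the compositional rules on the preceding slides, here is a sketch in Python where NP meanings are entities, VP meanings are one-argument functions, and logical forms are rendered as strings purely for display:

```python
# Leaf meanings, mirroring the parse of "Lady Gaga sings and dances"
np_lady_gaga = "e470"                  # [[Lady Gaga]] = e470
vp_sings = lambda y: f"sings({y})"     # λy. sings(y)
vp_dances = lambda y: f"dances({y})"   # λy. dances(y)

# Rule VP -> VP CC VP: λy. a(y) ∧ b(y)
def conj_vp(a, b):
    return lambda y: f"{a(y)} ∧ {b(y)}"

# Rule S -> NP VP: apply the VP's function to the NP's entity
s = conj_vp(vp_sings, vp_dances)(np_lady_gaga)
```

Running this yields the logical form `sings(e470) ∧ dances(e470)`, exactly the S-level meaning built on the slide.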

‣ For question answering, syntactic parsing doesn’t tell you everything you want to know, but indicates the right structure

‣ Solution: semantic parsing: many forms of this task depending on the semantic formalism

‣ Two today: CCG (looks like what we’ve been doing) and lambda-DCS

‣ Applications: database querying/question answering: produce lambda-calculus expressions that can be executed in these contexts

CCG Parsing

Combinatory Categorial Grammar
‣ Steedman+Szabolcsi (1980s): formalism bridging syntax and semantics
‣ Parallel derivations of syntactic parse and lambda calculus expression
‣ Syntactic categories (for this lecture): S, NP, “slash” categories
‣ S\NP: “if I combine with an NP on my left side, I form a sentence” — verb
  NP: e728   S\NP: λy. sings(y)
  Eminem     sings
  => S: sings(e728)
‣ When you apply this, there has to be a parallel instance of function application on the semantics side

Combinatory Categorial Grammar
‣ Steedman+Szabolcsi 1980s: formalism bridging syntax and semantics
‣ Syntactic categories (for this lecture): S, NP, “slash” categories
‣ S\NP: “if I combine with an NP on my left side, I form a sentence” — verb
‣ (S\NP)/NP: “I need an NP on my right and then an NP on my left” — verb with a direct object
  Eminem sings:
    NP: e728   S\NP: λy. sings(y)
    => S: sings(e728)
  Oklahoma borders Texas:
    NP: e101   (S\NP)/NP: λx.λy. borders(y,x)   NP: e89
    => S\NP: λy. borders(y,e89)
    => S: borders(e101,e89)

CCG Parsing
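The two-step derivation of “Oklahoma borders Texas” can be mimicked with curried Python functions, a minimal sketch (string-rendered logical forms, for illustration only):

```python
# NP denotations (toy entity IDs from the slide)
oklahoma, texas = "e101", "e89"

# (S\NP)/NP for "borders": λx.λy. borders(y,x), as a curried function
borders = lambda x: lambda y: f"borders({y},{x})"

vp = borders(texas)   # (S\NP)/NP applies rightward to NP => S\NP: λy. borders(y,e89)
s = vp(oklahoma)      # S\NP applies leftward to NP => S: borders(e101,e89)
```

Each syntactic combination is mirrored by exactly one function application on the semantics side, which is the key property of CCG derivations.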

‣ “What” is a very complex type: needs a noun and needs an S\NP to form a sentence. S\NP is basically a verb phrase (border Texas)

Zettlemoyer and Collins (2005)

CCG Parsing

‣ “What” is a very complex type: needs a noun and needs an S\NP to form a sentence. S\NP is basically a verb phrase (border Texas)
‣ Lexicon is highly ambiguous — all the challenge of CCG parsing is in picking the right lexicon entries

Zettlemoyer and Collins (2005)

CCG Parsing

‣ “to” needs an NP (destination) and N (parent)

Slide credit: Dan Klein

CCG Parsing

‣ Many ways to build these parsers

‣ One approach: run a “supertagger” (tags the sentence with complex labels), then run the parser

‣ Parsing is easy once you have the tags, so we’ve reduced parsing to a (hard) tagging problem

Zettlemoyer and Collins (2005)

Building CCG Parsers

‣ Model: log-linear model over derivations, with features on rules:
  P(d | x) ∝ exp( w⊤ Σ_{r ∈ d} f(r, x) )
  Example derivation for “Eminem sings”: NP: e728, S\NP: λy. sings(y) => S: sings(e728)
  f = Indicator(S -> NP S\NP)
  f = Indicator(S\NP -> sings)
‣ Can parse with a variant of CKY

Zettlemoyer and Collins (2005)

Building CCG Parsers

‣ Training data looks like pairs of sentences and logical forms
  What states border Texas   λx. state(x) ∧ borders(x, e89)
‣ Problem: we don’t know the derivation
‣ Texas corresponds to NP | e89 in the logical form (easy to figure out)

‣ What corresponds to (S/(S\NP))/N | λf.λg.λx. f(x) ∧ g(x)
‣ How do we infer that without being told it?
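To see that this category does the right work, here is a sketch applying the “What” semantics λf.λg.λx. f(x) ∧ g(x) to the meanings of “states” and “border Texas”, again with string-rendered logical forms for illustration:

```python
# λf.λg.λx. f(x) ∧ g(x), curried
what = lambda f: lambda g: lambda x: f"{f(x)} ∧ {g(x)}"

states = lambda x: f"state({x})"              # N: λx. state(x)
border_texas = lambda x: f"borders({x},e89)"  # S\NP: λx. borders(x,e89)

# "What" consumes the noun, then the verb phrase, yielding the body of
# the question's lambda expression
body = what(states)(border_texas)("x")
```

The result, `state(x) ∧ borders(x,e89)`, is the body of the target logical form λx. state(x) ∧ borders(x, e89) from the training pair above.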

Zettlemoyer and Collins (2005)

Lexicon

‣ GENLEX: takes sentence S and logical form L. Break up the logical form into chunks C(L); assume any substring of S might map to any chunk
  What states border Texas   λx. state(x) ∧ borders(x, e89)
‣ Chunks inferred from the logical form based on rules:
  ‣ NP: e89
  ‣ (S\NP)/NP: λx. λy. borders(x,y)
‣ Any substring can pair with any of these in the lexicon:
  ‣ Texas -> NP: e89 is correct
  ‣ border Texas -> NP: e89
  ‣ What states border Texas -> NP: e89 …

Zettlemoyer and Collins (2005)

GENLEX
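The over-generation step can be sketched as a cross product of substrings and chunks. The two chunks below are the ones from the slide's example; the real GENLEX derives chunks with hand-written rules rather than a hard-coded list:

```python
from itertools import combinations

sentence = "What states border Texas".split()
# Chunks C(L) pulled out of λx. state(x) ∧ borders(x, e89)
chunks = ["NP: e89", "(S\\NP)/NP: λx.λy. borders(x,y)"]

# Every contiguous substring of the sentence (choose a start < end index)
substrings = [" ".join(sentence[i:j])
              for i, j in combinations(range(len(sentence) + 1), 2)]

# Candidate lexicon: every substring paired with every chunk
lexicon = [(s, c) for s in substrings for c in chunks]
```

The candidate set contains the correct entry ("Texas", "NP: e89") alongside many spurious ones like ("border Texas", "NP: e89"); learning has to pick out the good entries.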

‣ Very complex and hand-engineered way of taking lambda calculus expressions and “backsolving” for the derivation

Zettlemoyer and Collins (2005)

Learning

‣ Iterative procedure like the EM algorithm: estimate the “best” parses that derive each logical form, retrain the parser using these parses with supervised learning

‣ We’ll talk about a simpler form of this in a few slides

Zettlemoyer and Collins (2005)

Applications

‣ GeoQuery: answering questions about states (~80% accuracy)

‣ Jobs: answering questions about job postings (~80% accuracy)

‣ ATIS: flight search

‣ Can do well on all of these tasks if you handcraft systems and use plenty of training data: these domains aren’t that rich

‣ What about broader QA?

Lambda-DCS

Lambda-DCS
‣ Dependency-based compositional semantics — the original version was less powerful than lambda calculus; lambda-DCS is as powerful

‣ Designed in the context of building a QA system from Freebase

‣ Freebase: set of entities and relations
  Bob Cooper —PlaceOfBirth→ Seattle, —DateOfBirth→ March 15, 1961
  Alice Smith —PlaceOfBirth→ Seattle
  Seattle —CapitalOf→ Washington
‣ [[PlaceOfBirth]] = set of pairs of (person, location)

Liang et al. (2011), Liang (2013)

Lambda-DCS
  Lambda-DCS                                     Lambda calculus
  Seattle                                        λx. x = Seattle
  PlaceOfBirth                                   λx.λy. PlaceOfBirth(x,y)
  PlaceOfBirth.Seattle                           λx. PlaceOfBirth(x,Seattle)
  Profession.Scientist ∧ PlaceOfBirth.Seattle    λx. Profession(x,Scientist) ∧ PlaceOfBirth(x,Seattle)
‣ Looks like a tree fragment over Freebase: ??? —PlaceOfBirth→ Seattle

Liang et al. (2011), Liang (2013)

Lambda-DCS
  Profession.Scientist ∧ PlaceOfBirth.Seattle   “list of scientists born in Seattle”
‣ Execute this fragment against Freebase, returns Alice Smith (and others)

Liang et al. (2011), Liang (2013)

Parsing into Lambda-DCS
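Executing a fragment like Profession.Scientist ∧ PlaceOfBirth.Seattle can be sketched over a toy triple store modeled on the slide's Freebase fragment (the Profession triple for Alice Smith is an assumption added so the query has an answer):

```python
# Toy knowledge base: (subject, relation, object) triples from the slide,
# plus an assumed Profession triple for Alice Smith
triples = {
    ("Bob Cooper", "PlaceOfBirth", "Seattle"),
    ("Bob Cooper", "DateOfBirth", "3/15/1961"),
    ("Alice Smith", "PlaceOfBirth", "Seattle"),
    ("Alice Smith", "Profession", "Scientist"),
    ("Seattle", "CapitalOf", "Washington"),
}

def join(relation, obj):
    """Relation.Object in lambda-DCS: all x such that (x, relation, obj) holds."""
    return {s for (s, r, o) in triples if r == relation and o == obj}

# ∧ in lambda-DCS is set intersection of the two denotations
scientists_born_in_seattle = (join("Profession", "Scientist")
                              & join("PlaceOfBirth", "Seattle"))
```

The conjunction becomes a set intersection, which is why lambda-DCS expressions look like tree fragments that get matched against the knowledge graph.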

‣ Derivation d on sentence x:

‣ No more explicit syntax in these derivations like we had in CCG

‣ Building the lexicon: more sophisticated process than GENLEX, but can handle thousands of predicates

‣ Log-linear model with features on rules: P(d | x) ∝ exp( w⊤ Σ_{r ∈ d} f(r, x) )
‣ Similar to CRF parsers

Berant et al. (2013)

Parsing with Lambda-DCS

‣ Learn just from question-answer pairs: maximize the likelihood of the right denotation y, with the derivation d marginalized out

(sum over derivations d such that the denotation of d on knowledge base K is yi)

For each example:
  Run beam search to get a set of derivations
  Let d = highest-scoring derivation in the beam
  Let d* = highest-scoring derivation in the beam with the correct denotation
  Do a structured perceptron update towards d*, away from d

Berant et al. (2013)

Takeaways
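The update in that loop can be sketched as follows; beam search, feature extraction, and execution against the knowledge base are passed in as stubs, since only the shape of the perceptron update comes from the slide:

```python
def perceptron_update(w, beam, gold_denotation, features, execute):
    """One update for one (question, answer) training example.

    beam: list of candidate derivations, assumed sorted by model score
    features(d): list of feature names firing on derivation d
    execute(d): denotation of d on the knowledge base
    """
    d = beam[0]  # highest-scoring derivation overall
    correct = [c for c in beam if execute(c) == gold_denotation]
    if not correct:
        return w          # beam contains nothing to update toward
    d_star = correct[0]   # highest-scoring derivation with correct denotation
    if d != d_star:
        for f in features(d_star):
            w[f] = w.get(f, 0.0) + 1.0   # move toward d*
        for f in features(d):
            w[f] = w.get(f, 0.0) - 1.0   # move away from d
    return w
```

When the top derivation already has the correct denotation, d equals d* and the weights are left unchanged.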

‣ Can represent meaning with first-order logic and lambda calculus

‣ Can bridge syntax and semantics and create semantic parsers that can interpret language into lambda-calculus expressions

‣ Useful for querying databases, question answering, etc.

‣ Next time: neural net methods for doing this that rely less on having explicit grammars