<<

PACLIC 28

Encoding Generalized Quantifiers in Dependency-based Compositional

Yubing Dong⇤ Ran Tian Department of Computer Science Graduate School of Information Sciences University of Southern California Tohoku University [email protected] [email protected]

Yusuke Miyao National Institute of Informatics, Japan [email protected]

Abstract Example 1. P H but H ; P , where ) P At most 5 students like noodles. For textual entailment recognition systems, it H At most 5 Japanese students like udon noodles. is often important to correctly handle Gen- eralized quantifiers (GQ). In this paper, we Example 2. P H but H ; P , where ) explore ways of encoding GQs in a recent P At least 5 Japanese students like udon noodles. framework of Dependency-based Composi- H At least 5 students like noodles. tional Semantics, especially aiming to cor- rectly handle linguistic knowledge like hy- Example 3. P ; H and H ; P , where ponymy when GQs are involved. We use both P Most Japanese students like udon noodles. the selection operator mechanism and a new H Most students like noodles. relation to implement some major properties of GQs, reducing 69% errors of a In this paper, we explore ways of encoding GQs previous system, and a further error analy- in a recent framework of Dependency-based Com- sis suggests extensions towards more power- positional Semantics (DCS) (Liang et al., 2013; Tian ful logical systems. et al., 2014a), especially aiming to correctly handle linguistic knowledge like hyponymy when GQs are involved. We use selection operators, an extension 1 Introduction mechanism described in Tian et al. (2014a), to im- Dependency-based Compositional Semantics (DCS) plement a sub-type of GQs (Section 3.1). To deal provides a formal yet intuitive way to model nat- with downward monotonicity of the ar- ural language semantics. It was initially proposed gument, we also propose a simple extension called in Liang et al. (2011) as a relational database query- “relation” to the framework (Section 3.2). This ap- ing protocol, and later used for logical inference proach does not encode the exact semantics of ev- in Tian et al. (2014a). Although the DCS inference ery specific GQ, but instead captures some major framework provided decent support for both quanti- properties that are both easily implementable with fiers all (universal quantifier) and no (negated exis- the current technology and useful in many cases. tential quantifier), attention is required for an RTE As in Tian et al. (2014a), we empirically tested the system to cope with generalized quantifiers (GQ), extended system on the “Generalized Quantifiers” including “at most n”, “at least n”, “most”, etc., section of the FraCaS corpus (The Fracas Consor- which can affect the direction or even the existence tium et al., 1996), and reduced 69% of the previous of an entailment relation, as demonstrated in Exam- errors. A further error analysis reveals some limita- ples 1 to 3. tions of the current approach, suggesting extensions

⇤ This work was conducted during an internship at the Na- towards more powerful logical systems. We hope tional Institute of Informatics, Japan. this research could make linguistic knowledge like

Copyright 2014 by Yubing Dong, Ran Tian, and Yusuke Miyao 28th Pacific Asia Conference on Language, Information and Computation pages 585–594 !585 PACLIC 28 hyponymy a more effective resource for textual en- A B F (A)(B), or whether it entails the ✓ ) tailment tasks, and also shed some light on the han- two arguments having a non-empty intersection (for dling of more complicated natural language infer- short, “entails ”), i.e. F (A)(B) A B = . 9 ) \ 6 ; ence phenomena. The extended system is publicly There are three cases: released at https://github.com/tomtung/ A B F (A)(B) A B = tifmo. ✓ ) ) \ 6 ; as “most” in Example 4, 2 Background A B F (A)(B) A B = 2.1 Properties of Generalized Quantifiers ✓ 6) ) \ 6 ; as “a lot of” in Example 5, and In this paper, “generalized quantifiers” refers to quantity-denoting such as “few”, A B F (A)(B) A B = ✓ 6) 6) \ 6 ; “most”, “at least 5”, etc. They can bind with a as “at most 5” in Example 6.1 property-denoting common phrase (e.g. “stu- Example 4. A B C, where dents”) to form a quantified noun phrase (e.g. “few ) ) students”), which can then bind with a predicate A All students like noodles. (e.g. “like noodles”) to form a sentence (e.g. “few B Most students like noodles. students like noodles”). We regard the meanings C There are students who like noodles. of both the common noun phrase and the predi- Example 5. A B C, where 6) ) cate as their , i.e. let W be the uni- A All students like noodles. verse containing all entities (a.k.a. the “world” ), B A lot of students like noodles. then the meaning of “students” is regarded as a set C There are students who like noodles. student W containing all entities being students, Example 6. A B C, where ✓ 6) 6) and the meaning of “(someone) likes noodles” is re- A All students like noodles. garded as a subset of W which contains all entities B At most 5 students like noodles. who like noodles. Thus, if we denote the power set C There are students who like noodles. W of W as 2 , then a GQ can be seen as a binary rela- The conservativity property of GQs results from W tion over 2 , or in other words, a F from the “domain restraining” role of the noun argument, (A, B) 2W 2W to F (A)(B) 2 = 0, 1 , in 2 ⇥ 2 { } which effectively eliminates objects that do not have which the sets A and B are called noun argument the noun property, so that we only need to consider and predicate argument, respectively. which of the rest has the predicate property. For ex- Usually, the relation imposed by a GQ is based ample: on the notion of set ; for example, |·| Example 7. Few apples are toxic. Few apples AtMost[30%](A)(B) represents the relation that () are toxic apples. A B / A 30%. However, in practice it often | \ | | |  requires a large amount of effort to introduce cardi- The intuition here is that to know whether few ap- nalities into logical inference. Hence, in this paper ples are toxic, it is sufficient to know which apples we make a compromise by encoding properties of are toxic; those non-apple toxicants are irrelevant. GQs that are most relevant to semantic relations like We formally define the conservativity property as hyponymy and are useful for solving RTE problems. follows. We mainly on three major properties, namely Definition 1 (Conservativity). A GQ F is conserva- interaction with universal and existential quantifica- tive if for any A, B W , ✓ tions, conservativity, and monotonicity. F (A)(B) F (A)(A B). Given a GQ, say F , one most basic semantic () \ 1 property is its interaction with universal and ex- We have made a convenient and practical assumption here: for an English GQ denoted as F ( )( ), F (A)(B) presupposes istential quantifications—whether F (A)(B) is en- · · A = . Therefore we ignore the cases when A B 6 ; ✓ ) tailed by the noun argument being a subset of the F (A)(B) A B = , because A = A B 6) \ 6 ; 6 ;^ ✓ ) predicate argument (for short, “entailed by ”), i.e. A B = . 8 \ 6 ;

!586 PACLIC 28

student like like Tom! Mary! Teddy bear! SBJ OBJ X12! Kate! John! kitsune udon! ARG ARG like John! Mike! spaghetti! ✓ 1 2 ···! ···! ···! student noodle 1 1 udon noodle ARG student noodle kitsune udon! kitsune udon! 1 ARG Q E! tempura udon! spaghetti! ! 1 ···! ···! udon all udon Figure 2: Adapted DCS tree for logical inference Figure 1: A DCS tree of the sentence “all students like udon noodles”, with a given database. calculates the intersection of entries in table udon and noodle to get the of “udon noo- Another important property that textual entail- dles”, and stores the result at place “2”. The ex- ment could rely on is monotonicity. ecute marker “X12” on the root edge imposes the Definition 2 (Monotonicity). A GQ F is upward- wide reading of the quantifier “all”, and guides a entailing (resp. downward-entailing) in the noun ar- calculation that first joins the result stored at place gument if, for any A, B W and A A (resp. “2” with the second column of like table, pro- ✓ 0 ✓ A A), ducing the denotation of “like udon noodles”; then 0 ◆ projects this denotation into the first column to get F (A0)(B) F (A)(B). ) the denotation of “subjects who like udon noodles”; and finally checks if this is a superset of the deno- F being upward/downward-entailing in the predi- tation of “students” stored in place “1”. The nar- cate argument can be defined in a similar manner. row reading of “all” (i.e. “there is a specific udon For example, the GQ “at most 5” is downward- noodle liked by all students”) can be produced by entailing in each argument as shown in Example 1; replacing the execute marker X12 with X21, which and the GQ “at least 5” is upward-entailing as will first assembles each entry x in the second col- shown in Example 2. umn of the like table such that all students like In Section 3, we will explore ways of encoding x, then intersects the result with the denotation of the properties discussed in this section in the DCS “udon noodles”, and finally checks if the intersec- inference framework. tion is an empty set. For the precise calculation of a DCS tree and details of this “mark-execute” mecha- 2.2 Dependency-based Compositional nism, please consult Liang et al. (2013). Semantics In Tian et al. (2014a), the DCS framework was DCS (Liang et al., 2013) was originally proposed adapted to deal with open-domain textual inference. as a natural language interface for querying con- The idea is to use relational algebra operators (Codd, crete relational databases. The meanings of a nat- 1970) to formalize the calculation process used in ural language sentence in DCS are represented by DCS trees, so we can perform logical inference on a DCS tree, which is designed to be both semanti- this abstract level, without given a closed-domain re- cally precise for execution on a database, and struc- lational database. For example, Figure 2 shows an turally straightforward for easy alignment to a syn- adapted DCS tree representing the same sentence, tactic dependency tree. For example, Figure 1 shows “all students like udon noodles”; and it guides a cal- the DCS tree for the sentence “all students like udon culation of the meaning parallel to the original DCS. noodles”, with the corresponding tables in a given Concretely, the abstract denotation of “udon noo- relational database. dles” is formulated as the following: When executed on databases, a DCS tree calcu- lates denotations in a bottom-up manner. For ex- D1 = noodle udon, ample, the DCS tree in Figure 1 first takes the ta- \ ble student, and stores it at place “1”; then it where noodle and udon are no longer given tables

!587 PACLIC 28 in a relational database but abstract sets (treated as of such denotations, we mainly consider their satisfi- symbols) representing denotations of the words, and ability, i.e. whether D (or D ) = . Here, by defini- 4 5 6 ; “ ” is a relational algebra operator representing “in- tion of the division operator, D4 = student \ SBJ 6 ;, ✓ tersection”. D3 and D5 = q (D2, student) = 6 ;, ✓ 6 ;, Similarly, the abstract denotation of “like udon x; studentSBJ x D2. 9 ⇥ { }OBJ ✓ noodles” is formulated by: As we can see from the previous description, many intermediate or related denotations are pro- D2 = like (WSBJ (D1) ), \ ⇥ OBJ duced during the processing of DCS trees. In Tian where W is the “world set” as mentioned in Sec- et al. (2014a), a special kind of auxiliary denota- tion 2.1, and “ ” denotes the Cartesian product. tions is considered, which integrates the in- ⇥ Subscripts SBJ and OBJ are used to denote different formation of an entire DCS tree, and is naturally dimensions. linked to a single pairing of a syntactic/semantic la- Finally, the abstract denotation of “subjects who bel and a node in the DCS tree. Such a pair is called a germ, denoted by (like, SBJ) , (like, OBJ) , like udon noodles” is: T T (noodle, ARG) , etc., where the subscript is used T T D3 = ⇡SBJ (D2) , to denote the whole DCS tree and emphasize the context awareness of the germ object. Abstract where ⇡ is the projection operator. Here we use ⇡r r denotations linked to germs are closely related to to denote a projection into dimension r, whereas ⇡ the concept of feasible values defined in Liang et denotes a projection to all dimensions other than r. al. (2013). For example, if we consider the DCS The adapted DCS tree in Figure 2 uses syntac- tree in Figure 2 and assume the wide reading of SBJ OBJ T tic/semantic labels ( , , etc.) instead of num- “all”, then the denotations linked to (like, OBJ) bers in the original DCS tree to denote different di- T and (noodle, ARG) both equal to ⇡OBJ (D2)= mensions (i.e. different columns in the tables of the T ⇡OBJ (like) D1, “udon noodles that are liked by \ relational database), because they provide database- somebody”; the denotation linked to (like, SBJ) independent explanations for these dimensions. In T is D3, “subjects who like udon noodles”; and the addition, the involved “mark-execute” mechanism denotation linked to (student, ARG) is student. for representing quantifier “all” (as illustrated by T The final result D4 can then be seen as been calcu- the Q, E and X12 markers in Figure 1) is simplified lated from the abstract denotations linked to germs to a quantification marker “ ” on the student-like (like, SBJ) and (student, ARG) . Abstract de- ✓ T T edge (Figure 2), and explained as the division oper- notations linked to germs are useful for encoding ator q in relational algebra2: ✓ GQs in the DCS framework, as we describe in Sec- r tion 3.2. q (R, C)= x = R ( x Wr) x Cr ✓ { | ;6 \ { } ⇥ ✓ { } ⇥ } Another useful mechanism for implementing GQs Therefore, the abstract denotations is the selection operator sf introduced in Tian et al. SBJ OBJ (2014a), which are marked on a DCS tree node and D4 = q ⇡ (D2) , student ✓ wrap an abstract denotation D to form a new ab- SBJ = q (D3, student) stract denotation sf (D) during the calculation pro- ✓ cess. Selection operators were introduced as an ex- and OBJ SBJ tension mechanism to represent the generalized se- D5 = ⇡ q (D2, student) , ✓ lection operation in relational algebra, which selects correspond to the final results calculated by the orig- a subset of specific properties from a given set; the inal DCS tree according to the wide reading and nar- axioms characterizing such properties can be user- row reading of “all”, respectively. For logical infer- defined. For example, in Tian et al. (2014a), se- ence, instead of the database-dependent evaluations lection operators are used to implement superlatives

2 r such as “highest”, so that s ( ) de- When R and C have the same dimension, q (R, C) is ei- highest mountain ther the 0-dimension point set (if R C)✓ or (otherwise) notes the set of the highest mountains. Effectively, a {⇤} ✓ . selection operator s is a user-defined map from any ; f

!588 PACLIC 28

abstract denotation D to a new denotation sf (D). based on a chain of shallow syntactic edit operations In Section 3.1 we will use this mechanism to encode linking premise to hypothesis, which not only failed GQs. to include various inference patterns, but also was unreliable when there are multiple sentences in the 2.3 Related Work premise, or when the premise is relatively long. The GQs have been a topic of interest in study of logic DCS inference framework, on the other hand, grace- since ancient time: Aristotle’s syllogism could be fully handles such cases in a uniform way, thanks to seen as concerning the meanings and properties of the more sophisticated inference engine. An empiri- four basic quantifiers, namely “all”, “no”, “some”, cal comparison is shown in Section 4. and “not all”. Gottlob Frege (Zalta, 2014), one of The logical inference engine described in Tian et the founders of modern logic, in 1870s introduced 8 al. (2014a) treats abstract denotations as terms and and , and formulated the notion of a quantifier as a 9 represent meanings by atomic sentences, which is second order relation. The idea of generalized quan- shown to be very efficient compared to first order tifiers was introduced by Mostowski (1957) and gen- logic provers (Tian et al., 2014b). The idea be- eralized in Lindstrom¨ (1966), forming the standard hind this is actually very similar to description logics definition we use nowadays. Later, Barwise and (DL) (Baader et al., 2003); indeed, the gener- DLR Cooper (1981), following Montague (1973), showed alization of DLs towards n-ary relations (Calvanese the importance of GQs in the formal analysis of lin- et al., 1998) was proposed to deal with inference guistic phenomena. By and large, these works cover problems on database schemata expressed in rela- the logical and linguistic background involved in tional models, which shares the same setting with this paper. the logical system proposed in Tian et al. (2014a). Although it has been recognized that it is impor- The system includes intersections, Cartesian DLR tant to encode GQs for solving textual entailment products of 1-dimensional sets and projections into problems, this remains a big challenge. MacCart- 1-dimensional sets, as well as constructors not pre- ney et al. (2006), for example, tried to capture the sented in Tian et al. (2014a)’s system such as com- use of GQs in feature vectors, but the capabilities of plement, union, and qualified number restrictions. which are greatly limited without an inference en- is also shown to be reducible to the tradi- DLR gine. Even for systems that are backed by inference tional DL (with binary relations) , for which ALCQI engines like in this paper, the focus still needs to be many complete DL inference engines are available3. put on practical NLP rather than logic, linguistics, In comparison, Tian et al. (2014a)’s inference en- or semantic theory, and model complexity may need gine is not complete and lacks a thorough explo- to be purposely traded for computation efficiency. ration from the theoretical side, e.g. on the decid- For example, Lewis and Steedman (2013) used first- ability and complexity of the logical system, but order logic for semantic representation, which is the- it has a working natural language interface inher- oretically very expressive, but still unable to define ited from the DCS framework, and supports spe- GQs without some extensions (Barwise and Cooper, cific constructors tailored for textual inference, e.g. 1981) that are nontrivial especially for practical in- the division operator, which seems not easily en- ference. coded in . Description logics have been ap- DLR Some works made the compromise similar to plied to natural language processing since the early ours: only encode the important properties of GQs days, but were used mostly for semantic interpre- rather than their perfect semantics. A notable re- tation (Brachman, 1985; Sowa, 1991; Knight and cent work that focused on monotonicity is MacCart- Luk, 1994), in which knowledge on syntactic, se- ney and Manning (2008), in which the notion of mantic, and pragmatic elements of natural language monotonicity was generalized to support recursive are encoded in DL to drive the process of converting determination of entailments of a compound expres- utterances into deep and context-dependent logical sion from its constituents. To a large extent, this approach handled the interaction between multiple 3http://www.cs.man.ac.uk/˜sattler/ GQs in a single sentence. However, inference was reasoners.html

!589 PACLIC 28

like SBJ OBJ both hold if we add axiom:

sF (A) A ARG ✓ ARG ✓ sAtLeast[5](student) noodle On the other hand, entailment to existential quantifi- ARG cation (Example 5) ARG s (A) B A B = udon F ✓ ) \ 6 ; Figure 3: Encoding “at least 5” as selection can be implied from the custom axiom:

sF (A) A = forms. Tian et al. (2014a)’s work on the other hand \ 6 ; directly uses a DL-like logical system to represent The monotonicity in the noun argument can be semantics and perform textual inference, benefiting implemented as well. If F is upward-entailing in from the efficiency of DL logical inference. We ex- the noun argument, we should add the axiom plore ways of extending Tian et al. (2014a)’s system A0 A s (A0) s (A). to deal with more advanced linguistic phenomena ◆ ) F ✓ F in this work, while trying to preserve its algebraic Note that the direction of is reversed because ✓ fashion to ensure efficiency, because we believe it s (A) serves as the subset in the form F (A)(B) F ⌘ is important to investigate to what extent the “natu- s (A) B. Similarly, downward-entailment in the F ✓ ral” textual inference requires from a logical system. noun argument can be achieved by the axiom Some limitations of Tian et al. (2014a)’s framework has actually been revealed; for which we will discuss A0 A sF (A0) sF (A). ✓ ) ✓ in Section 4. A proof tree for Example 2 is shown in Figure 4, 3 Encoding Generalized Quantifiers where D3 is the denotation for “subjects who like udon noodles”, as defined in Section 2.2, and sim- 3.1 Encoding as Selections ilarly D0 = ⇡SBJ (like (WSBJ noodleOBJ)) for 3 \ ⇥ Selections are used to encode GQs in the form of “subjects who like noodles”. 3.2 Encoding as Relations F (A)(B) sF (A) B, ⌘ ✓ As mentioned in Section 2.1, a GQ can be seen a W where sF is the specific selection operator defined binary relation over 2 . From this point of view, for the GQ F —that is, a selection operator sF is we introduce a new extension called relation as a always used together with a quantification marker new type of statement into the framework. A rela- “ ”, as exemplified in Figure 3. Note that s (A) tion rf can be used to represent an arbitrary relation ✓ F can be defined as any set related to A, not necessar- between two abstract denotations. A relation marker ily being its subset. can be marked on a DCS tree edge to denote some The basic requirement for encoding a GQ F in relation between the child germ and the parent germ this way is that F should be upward-entailing in its (Figure 5). In our implementation, the core infer- predicate argument, because the form s (A) B ence engine keeps track of which term pairs are la- F ✓ implies such monotonicity. Entailment from univer- beled with which relations: not only can it answer sal quantification (Example 4) whether two terms have been claimed to have a cer- tain relation, but also can it look up all terms that

A B sF (A) B have a certain relation with a certain term. Similar ✓ ) ✓ to selections, we can also specify different sets of and conservativity axioms for different relations—an axiom could be about either what a relation statement entails or what s (A) (A B) s (A) B it is entailed by. F ✓ \ , F ✓

!590 PACLIC 28

Algebraic Property Upward-entailment for sAtLeast[5] student Japanese student A0 A s (A0) s (A) ◆ \ ◆ ) AtLeast[5] ✓ AtLeast[5] Premise sAtLeast[5](student) sAtLeast[5](student Japanese) sAtLeast[5](student Japanese) D3 Algebraic Property ✓ \ \ ✓ s (student) D3 D3 D0 AtLeast[5] ✓ ✓ 3 s (student) D0 AtLeast[5] ✓ 3 Figure 4: An example of proof with generalized quantifiers encoded as selections.

premises entail. For example, from “few dinosaurs � are pterosaurs”, with the knowledge of GQ conser- like � SBJ OBJ vativity the system should figure out “few dinosaurs can fly” without being explicitly instructed to prove ARG � ARG so. However, to implement r (A, A B) student noodle F \ ) ARG rF (A, B), the forward-chaining strategy would re- quire the engine to find all Bs that satisfy X = A B ARG \ udon whenever a relation rF (A, X) is claimed, in order to claim the relation rF (A, B). Though it is quite easy to check if X = A B for a given triple (X, A, B), Figure 5: Encoding “at most 5” as relation \ issues arise when B is not given and we need to find all possibilities. It is generally impractical to enu- Intuitively, a GQ F can be represented by a rela- merate all possible forms that a set X can be written tion rF : as intersections; the number of possibilities easily F (A)(B) r (A, B) 4 ⌘ F explodes even for small-size problems . Hence, we To enable the entailment from universal or to ex- implement the rule r (A, A B) r (A, B) as F \ ) F istential quantification, we simply add the axiom the following: if r (A, X), and if X A, then F ✓ A B rF (A, B) or rF (A, B) A B = , we take every B X and check if X = A B. ✓ ) ) \ 6 ; ◆ \ respectively. The necessary conditions X A and B X limit ✓ ◆ The axioms for monotonicity are also very intu- the search space at first. We would like to empha- itive. For GQs that are downward-entailing in both size this detailed implementation issue here because arguments (e.g. “at most 5” in Example 1), we put formal semantics researchers are often not aware of such difficulties. r (A, B) A A0 B B0 r (A0,B0). f ^ ◆ ^ ◆ ) f A shortcoming of the relation implementation is Other kinds of monotonicity can be achieved in a that, when processing DCS trees with relations, our similar way. extended system simply discards the edges marked As for conservativity, we can simply implement as relations, then calculate the abstract denotations of germs in the resulting DCS forest, and finally rF (A, B) rF (A, A B), ) \ use the denotations of corresponding germs as ar- but the reverse guments of the relations to form statements. For example, in Figure 5, we calculate the denota- rF (A, A B) rF (A, B) \ ) tions of germs (student, ARG) 2 and (like, SBJ) 1 , T T is a little tricky. This is because Tian et al. (2014a)’s which are student and D3 (as defined in Sec- inference engine is based on forward-chaining: it al- tion 2.2), respectively; then we form the statement ways tries to deduce all possible implications from rAtMost[5](student,D3) as the meaning of this sen- given premises. This strategy is employed not tence. This procedure implies that, relations in only because of its efficiency, but also because it DCS trees are always explained as having the widest opens the possibility of adapting DCS for entail- 4For example, X = X C for every C X; even we ment generation (Androutsopoulos and Malakasio- \ ◆ only consider minimal intersections such that X = A B but \ tis, 2010), in which case without any given hypothe- X = A and X = B, the possibilities could be exponential, e.g. 6 6 ses the system needs to actively explore what the consider X =(A B) C = A (B C). \ \ \ \

!591 PACLIC 28

Monotonicity and hence we cannot deal with multiple re- GQ Entailed by Entails lations in a single sentence. It causes errors when 8 9 Noun Arg. Predicate Arg. there are multiple GQs encoded as relations appear many 337 in the same sentence; we analyze this case in detail " a lot of 337 " in Section 4. few 73 ## a few 33 "" most 337 4 Evaluation and Analysis " at most n 77 ## at least n 73 The FraCaS corpus (The Fracas Consortium et al., "" 5 1996) was built in the mid 1990s by the FraCaS Table 1: Properties of GQs appear in FraCaS corpus, Consortium, which contains a set of hand-crafted including the interaction with universal and existential entailment problems covering a wide range of se- quantifications, and the monotonicity in noun and predi- cate arguments, where “ ”, “ ”, and “7” denote upward- mantic phenomena, organized in nine sections. The " # first section is titled “Generalized Quantifiers”, and entailing, downward-entailing, and non-monotone, re- can serve as a good empirical test suite for RTE sys- spectively. tems that handle general properties of GQs. This 6 Accuracy section contains 74 problems ; 44 of them have one System premise sentence while the other 30 have multiple Single Multi Overall premises. The involved GQs and their properties MacCartney07 84.1% NatLog N/A are listed in Table 1.7 Our implementation extends MacCartney08 97.7% the TIFMO system publicly released with Tian et Parser Syntax 70.5% 50.0% 62.2% CCG-Dist al. (2014a)8. Since we mainly focus on the perfor- Gold Syntax 88.6% 80.0% 85.1% mance of the DCS framework as formal semantics, Baseline 79.5% 86.7% 82.4% on-the-fly knowledge and WordNet are not used. Selection 90.9% 93.3% 91.9% TIFMO Major GQ properties can be implemented as com- Relation 88.6% 93.3% 90.5% posable and reusable units9, so that each GQ can be Selection+Relation 93.2% 96.7% 94.6% created by simply composing the units that corre- Table 2: Accuracies achieved on the first section of Fra- sponds to the properties it has. This makes imple- CaS corpus using different systems. menting new GQs very easy. TIFMO uses the Stanford Parser10 to obtain Stan- ford dependencies (de Marneffe et al., 2006) and tion+Relation”. GQs are simply dropped in the POS tags, which are used to construct DCS trees “Baseline” setting. The “Selection” and “Relation” based on a set of pre-defined rules. We extend settings use the same DCS trees as in “Baseline”, those rules in order to recognize GQs in this step, except for selection or relation markers on DCS and encode them under one of four settings, namely trees to represent GQs. The “Selection” approach “Baseline”, “Selection”, “Relation”, and “Selec- implements all GQs as selections (even for those are downward-entailing in the predicate argument), 5 We used the version converted to XML format by MacCart- whereas “Relation” approach implements all GQs as ney and Manning (2007). 66 problems that do not have a defined solution are excluded. relations. In the “Selection+Relation” setting, we 7FraCaS dubiously interpreted “many” as denoting “a large use relations only for the GQs that are downward- proportion” rather than “a large absolute number”, whereas entailing in the predicate argument (i.e. “few” and “few” as denoting “a small absolute number” rather than “a “at most n”), and implement the rest of the GQs as small proportion”. We also treat “a lot of” as a synonym of “many”. selections. We evaluate the system under each set- 8http://kmcs.nii.ac.jp/tifmo/ ting; the test results are shown in Table 2. 9 We implement GQ properties as stackable traits in We compare our results with two previous tex- Scala (Odersky et al., 2011), each consists of no more than a few dozen lines of code. tual inference systems, CCG-Dist (Lewis and Steed- 10http://nlp.stanford.edu/software/ man, 2013) and NatLog (MacCartney and Manning, lex-parser.shtml 2007; MacCartney and Manning, 2008), also shown

!592 PACLIC 28 in Table 2. CCG-Dist uses rule-based conversion where D is the abstract denotation for “spend at from CCG parses to first order logic formulas, and home” results are given using both parser syntax and gold D = spend (WSBJ WOBJ homeMOD) syntax. The resulting accuracies are not very high, \ ⇥ ⇥ even for gold syntax, showing that implementing which is correct and solves this case. However, in GQs is not an easily accomplishable task although general we need to further extend the notion of “re- first order logic is theoretically very expressive. Nat- lation” to handle different scopes, or at least we need Log is a system based on natural logic, which has something similar to the division operator but can be almost perfect performance on single premise prob- used to implement downward entailment in the pred- lems but faces difficulties dealing with premises of icate argument. multiple sentences. In , our extension of the If we recall the definition of division operator q , ✓ TIFMO system achieves the best overall accuracy. it is natural to consider a similar operator as In each setting of our extension, almost all of the r errors are related to the handling of GQs. The “Se- q (R, C)= x R ( x Wr) x Cr . ◆ { | \ { } ⇥ ◆ { } ⇥ } lection” approach cannot encode downward entail- Fortunately, q can be defined algebraically as ment in the predicate argument, as shown in Exam- ◆ r r r r ple 8; whereas the “Relation” approach fails to han- q (R, C)=⇡ (R) ⇡ (R¯ (W Cr)), dle multiple GQs in a single sentence, as shown in ◆ \ \ ⇥ r Example 9. where W denotes the Cartesian product of W s on all dimensions other than r. This is imple- Example 8. P1 P2 P3 H, where ¯ ^ ^ ) mentable if we introduce complement X into Tian P1 Few committee members are from southern Eu- et al. (2014a)’s logical system. Operator “q ” can ◆ rope. be combined with selection operators to encode GQs P2 All committee members are people. that are downward-entailing in predicate argument, P3 All people who are from Portugal are from e.g. “at most ten”. We may also be tempted southern Europe. to introduce free variables or higher order opera- H There are few committee members from Portu- tors, especially when we begin to consider donkey gal. (Heim, 1982) and other advanced infer- Example 9. P H, where ence phenomena. Such decisions should be made 6) P At most ten commissioners spend a lot of time at with caution because unguarded free variables easily home. lead to undecidability, as suggested by the research H At most ten commissioners spend time at home. on description logics. However, further exploration In Example 9, when both “at most ten” and “a on this topic should be a future direction but out of lot of” are encoded as relations, both of them take the scope of this work. the widest scope and the meaning of P is calculated 5 Conclusion as the of Pa “at most ten commission- ers spend (something) at home”, and Pb “(some- Encoding the semantics of a generalized quantifier body) spend a lot of time at home”. Then, Pa im- is often crucial to correctly capturing the seman- plies H since “at most ten” is downward-entailing tics of a sentence and making the right textual en- in the predicate argument; the system produces a tailment. We have shown in this paper that major wrong answer. On the other hand, in the “Selec- properties of GQs can be implemented in the DCS tion+Relation” setting, “a lot of” is encoded as inference framework to correctly handle semantic selection and accompanied with the quantification relations like hyponymy. This tested and demon- marker “ ”, which can take a narrow scope and is strated the capabilities and potentials of the DCS ✓ explained as a division operator. Hence the calcu- framework, and suggested extensions towards more lated meaning of P becomes powerful logical systems for handling more sophis- ticated linguistic phenomena. OBJ rAtMost[10](comm’r,q (D, sALotOf(time))) ✓

!593 PACLIC 28

References and Paraphrasing, pages 193–200. Association for Computational Linguistics. Ion Androutsopoulos and Prodromos Malakasiotis. Bill MacCartney and Christopher D. Manning. 2008. 2010. A survey of paraphrasing and textual entailment Modeling semantic containment and exclusion in natu- methods. J. Artif. Int. Res., 38(1):135–187, May. ral language inference. In Proceedings of the 22Nd In- Franz Baader, Diego Calvanese, Deborah L. McGuin- ternational Conference on Computational Linguistics ness, Daniele Nardi, and Peter F. Patel-Schneider, edi- - Volume 1, COLING ’08, pages 521–528, Strouds- tors. 2003. The Description Logic Handbook: Theory, burg, PA, USA. Association for Computational Lin- Implementation, and Applications. Cambridge Uni- guistics. versity Press, New York, NY, USA. Bill MacCartney, Trond Grenager, Marie-Catherine Jon Barwise and Robin Cooper. 1981. Generalized quan- de Marneffe, Daniel Cer, and Christopher D Manning. tifiers and natural language. Linguistics and Philoso- 2006. Learning to recognize features of valid textual phy, 4(2):159–219. entailments. In Proceedings of the main conference Ronald J. Brachman. 1985. On the epistemological sta- on Human Language Technology Conference of the tus of semantic networks. In Ronald J. Brachman and North American Chapter of the Association of Com- Hector J. Levesque, editors, Readings in Knowledge putational Linguistics, pages 41–48. Association for Representation. Morgan Kaufmann. Computational Linguistics. Diego Calvanese, Giuseppe De Giacomo, and Maurizio . 1973. The proper treatment of quan- Lenzerini. 1998. On the decidability of query con- tification in ordinary english. In Patrick Suppes, Julius tainment under constraints. In Proceedings of the 17th Moravcsik, and Jaakko Hintikka, editors, Approaches ACM SIGACT SIGMOD SIGART Symposium on Prin- to Natural Language, volume 49, pages 221–242. Dor- ciples of Database Systems (PODS98). drecht. Edgar F Codd. 1970. A relational model of data for large Andrzej Mostowski. 1957. On a generalization of quan- Commun. ACM shared data banks. , 13(6):377–387, tifiers. Fundamenta mathematicae, 44:12–36. June. Martin Odersky, Lex Spoon, and Bill Venners. 2011. Marie-Catherine de Marneffe, Bill MacCartney, and Traits as stackable modifications. In Programming in Christopher D. Manning. 2006. Generating typed de- Scala: A Comprehensive Step-by-Step Guide, chapter pendency parses from phrase structure parses. In Pro- 12.5, pages 267–271. Artima Inc, 2nd edition. ceedings of the 5th Language Resources and Evalua- John F. Sowa, editor. 1991. Principles of Semantic Net- tion Conference (LREC 2006), pages 449–454. works: Explorations in the Representation of Knowl- . 1982. The semantics of definite and indefi- edge. Morgan Kaufmann. nite noun phrases. Ph.D. thesis. The Fracas Consortium, Robin Cooper, Dick Crouch, Kevin Knight and Steve K. Luk. 1994. Building a large- Jan Van Eijck, Chris Fox, Josef Van Genabith, Jan scale knowledge base for machine translation. In Pro- Jaspars, , David Milward, Manfred Pinkal, ceedings of AAAI ’94. Massimo Poesio, Steve Pulman, Ted Briscoe, Holger Mike Lewis and Mark Steedman. 2013. Combined Maier, and Karsten Konrad. 1996. Using the frame- distributional and logical semantics. Transactions of work. Fracas project LRE 62, 51. the Association for Computational Linguistics, 1:179– Ran Tian, Yusuke Miyao, and Takuya Matsuzaki. 2014a. 192. Logical inference on dependency-based compositional Percy Liang, Michael Jordan, and Dan Klein. 2011. semantics. In Proceedings of Association for Compu- Learning Dependency-based Compositional Seman- tational Linguistics (ACL) 2014. tics. In Proceedings of the 49th Annual Meeting of Ran Tian, Yusuke Miyao, and Takuya Matsuzaki. 2014b. the Association for Computational Linguistics: Hu- Efficient logical inference for semantic processing. In man Language Technologies, pages 590–599, Strouds- Proceedings of the ACL 2014 Workshop on Semantic burg, PA, USA. Association for Computational Lin- Parsing. guistics. Edward N. Zalta. 2014. Gottlob frege. In Edward N. Percy Liang, Michael I. Jordan, and Dan Klein. 2013. Zalta, editor, The Stanford Encyclopedia of Philoso- Learning dependency-based compositional semantics. phy. Spring 2014 edition. Computational Linguistics, 39(2). Per Lindstrom.¨ 1966. First order predicate logic with generalized quantifiers. Theoria, 32(3):186–195. Bill MacCartney and Christopher D. Manning. 2007. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment

!594