EACL 2014
14th Conference of the European Chapter of the Association for Computational Linguistics
Proceedings of the Workshop on Type Theory and Natural Language Semantics (TTNLS)
April 27, 2014, Gothenburg, Sweden. © 2014 The Association for Computational Linguistics
Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360, USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]
ISBN 978-1-937284-74-9
Introduction
Type theory has been a central area of research in logic, the semantics of programming languages, and natural language semantics over the past fifty years. Recent developments in type theory have been used to reconstruct the formal foundations of computational semantics. The treatments are generally intensional and polymorphic in character. They allow for structured, fine-grained encoding of information across a diverse set of linguistic domains.
The work in this area has opened up new approaches to modelling the relations between, inter alia, syntax, semantic interpretation, dialogue, inference, and cognition, from a largely proof theoretic perspective. The papers in this volume cover a wide range of topics on the application of type theory to modelling semantic properties of natural language.
TTNLS 2014 provides a forum for the presentation of leading-edge research in this fast-developing subfield of computational linguistics. To the best of our knowledge, it is the first major conference on this topic hosted by the ACL.
We received a total of 13 relevant submissions, 10 of which were accepted for presentation. Each submission was reviewed by two members of our Programme Committee. We thank these people for their detailed and helpful reviews.
We are also including Aarne Ranta’s invited paper here. We are honoured that he has accepted our invitation to present this paper at TTNLS, and we look forward to his talk.
We would like to thank the organisers of EACL 2014 for both their financial assistance and their organisational support. We are also grateful to the Dialogue Technology Lab at the Centre for Language Technology of the University of Gothenburg, and to the Wenner-Gren Foundations for funding Shalom Lappin's research visits to the University of Gothenburg, during which this workshop was organised.
Robin Cooper, Simon Dobnik, Shalom Lappin, and Staffan Larsson
University of Gothenburg and London
March 2014
Workshop Organisation
Organising Committee:

Robin Cooper (University of Gothenburg)
Simon Dobnik (University of Gothenburg)
Shalom Lappin (King's College London)
Staffan Larsson (University of Gothenburg)
Programme Committee:

Krasimir Angelov (University of Gothenburg and Chalmers University of Technology)
Patrick Blackburn (University of Roskilde)
Stergios Chatzikyriakidis (Royal Holloway, University of London)
Stephen Clark (University of Cambridge)
Philippe de Groote (Inria Nancy - Grand Est)
Jan van Eijck (University of Amsterdam)
Raquel Fernández (University of Amsterdam)
Tim Fernando (Trinity College, Dublin)
Chris Fox (University of Essex)
Jonathan Ginzburg (Université Paris-Diderot (Paris 7))
Zhaohui Luo (Royal Holloway, University of London)
Uwe Mönnich (University of Tübingen)
Bruno Mery (LaBRI, Université Bordeaux 1)
Glyn Morrill (Universitat Politècnica de Catalunya, Barcelona)
Larry Moss (Indiana University)
Reinhard Muskens (Tilburg University)
Bengt Nordström (University of Gothenburg and Chalmers University of Technology)
Valeria de Paiva (Nuance Communications, Inc., Sunnyvale, California)
Carl Pollard (The Ohio State University)
Ian Pratt-Hartmann (University of Manchester)
Stephen Pulman (University of Oxford)
Matt Purver (Queen Mary, University of London)
Aarne Ranta (University of Gothenburg and Chalmers University of Technology)
Christian Retoré (LaBRI, Université Bordeaux 1)
Scott Martin (Nuance Communications, Inc., Sunnyvale, California)
Ray Turner (University of Essex)
Invited speaker: Aarne Ranta (University of Gothenburg and Chalmers University of Technology)
Table of Contents
Types and Records for Predication Aarne Ranta ...... 1
System with Generalized Quantifiers on Dependent Types for Anaphora Justyna Grudzinska and Marek Zawadowski ...... 10
Monads as a Solution for Generalized Opacity Gianluca Giorgolo and Ash Asudeh ...... 19
The Phenogrammar of Coordination Chris Worth ...... 28
Natural Language Reasoning Using Proof-Assistant Technology: Rich Typing and Beyond Stergios Chatzikyriakidis and Zhaohui Luo ...... 37
A Type-Driven Tensor-Based Semantics for CCG Jean Maillard, Stephen Clark and Edward Grefenstette ...... 46
From Natural Language to RDF Graphs with Pregroups Antonin Delpeuch and Anne Preller ...... 55
Incremental semantic scales by strings Tim Fernando ...... 63
A Probabilistic Rich Type Theory for Semantic Interpretation Robin Cooper, Simon Dobnik, Shalom Lappin and Staffan Larsson ...... 72
Probabilistic Type Theory for Incremental Dialogue Processing Julian Hough and Matthew Purver ...... 80
Propositions, Questions, and Adjectives: a rich type theoretic approach Jonathan Ginzburg, Robin Cooper and Tim Fernando ...... 89
Conference Program
Sunday, 27 April 2014
8:15 Registration
8:45 Opening remarks by Robin Cooper
(9:00 - 10:30) Session 1
9:00 Types and Records for Predication Aarne Ranta
10:00 System with Generalized Quantifiers on Dependent Types for Anaphora Justyna Grudzinska and Marek Zawadowski
10:30 Coffee break
(11:00 - 12:30) Session 2
11:00 Monads as a Solution for Generalized Opacity Gianluca Giorgolo and Ash Asudeh
11:30 The Phenogrammar of Coordination Chris Worth
12:00 Natural Language Reasoning Using Proof-Assistant Technology: Rich Typing and Beyond Stergios Chatzikyriakidis and Zhaohui Luo
12:30 Lunch
(14:00 - 15:30) Session 3
14:00 A Type-Driven Tensor-Based Semantics for CCG Jean Maillard, Stephen Clark and Edward Grefenstette
14:30 From Natural Language to RDF Graphs with Pregroups Antonin Delpeuch and Anne Preller
15:00 Incremental semantic scales by strings Tim Fernando
15:30 Coffee break
Sunday, 27 April 2014 (continued)
(16:00 - 17:30) Session 4
16:00 A Probabilistic Rich Type Theory for Semantic Interpretation Robin Cooper, Simon Dobnik, Shalom Lappin and Staffan Larsson
16:30 Probabilistic Type Theory for Incremental Dialogue Processing Julian Hough and Matthew Purver
17:00 Propositions, Questions, and Adjectives: a rich type theoretic approach Jonathan Ginzburg, Robin Cooper and Tim Fernando
17:30 Concluding remarks
Types and Records for Predication
Aarne Ranta Department of Computer Science and Engineering, University of Gothenburg [email protected]
Abstract

This paper studies the use of records and dependent types in GF (Grammatical Framework) to build a grammar for predication with an unlimited number of subcategories, also covering extraction and coordination. The grammar is implemented for Chinese, English, Finnish, and Swedish, sharing the maximum of code to identify similarities and differences between the languages. Equipped with a probabilistic model and a large lexicon, the grammar has also been tested in wide-coverage machine translation. The first evaluations show improvements in parsing speed, coverage, and robustness in comparison to earlier GF grammars. The study confirms that dependent types, records, and functors are useful from both engineering and theoretical perspectives.

1 Introduction

Predication is the basic level of syntax. In logic, it means building atomic formulas by predicates. In linguistics, it means building sentences by verbs. Categorial grammars (Bar-Hillel, 1953; Lambek, 1958) adapt logical predication to natural language. Thus for instance transitive verbs are categorized as (n\s)/n, which is the logical type n → n → s with the information that one argument comes before the verb and the other one after. But most approaches to syntax and semantics, including (Montague, 1974), introduce predicate categories as primitives rather than as function types. Thus transitive verbs are a category of their own, related to logic via a semantic rule. This gives more expressive power, as it permits predicates with different syntactic properties and variable word order (e.g. inversion in questions). A drawback is that a grammar may need a large number of categories and rules. In GPSG (Gazdar et al., 1985), and later in HPSG (Pollard and Sag, 1994), this is solved by introducing a feature called subcat for verbs. Verbs taking different arguments differ in the subcat feature but otherwise share the characteristic of being verbs.

In this paper, we study the syntax and semantics of predication in GF, Grammatical Framework (Ranta, 2011). We generalize both over subcategories (as in GPSG and HPSG) and over languages (as customary in GF). We use dependent types to control the application of verbs to legitimate arguments, and records to control the placement of arguments in sentences. The record structure is inspired by the topological model of syntax in (Diderichsen, 1962).

The approach is designed to apply to all languages in the GF Resource Grammar Library (RGL, (Ranta, 2009)), factoring out their typological differences in a modular way. We have tested the grammar with four languages from three families: Chinese, English, Finnish, and Swedish. As the implementation reuses old RGL code for all parts but predication, it can be ported to new languages with just a few pages of new GF code. We have also tested it in wide-coverage tasks, with a probabilistic tree model and a lexicon of 60,000 lemmas.

We will start with an introduction to the abstraction mechanisms of GF and conclude with a summary of some recent research. Section 2 places GF on the map of grammar formalisms. Section 3 works out an example showing how abstract syntax can be shared between languages. Section 4 shows how parts of concrete syntax can be shared as well. Section 5 gives the full picture of predication with dependent types and records, also addressing extraction, coordination, and semantics. Section 6 gives a preliminary evaluation. Section 7 concludes.
Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 1–9, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics
2 GF: an executive summary

GF belongs to a subfamily of categorial grammars inspired by (Curry, 1961). These grammars make a distinction between tectogrammar, which specifies the syntactic structures (tree-like representations), and phenogrammar, which relates these structures to linear representations, such as sequences of characters, words, or phonemes. Other formalisms in this family include ACG (de Groote, 2001) and Lambda grammars (Muskens, 2001).

GF inherits its name from LF, Logical Frameworks, which are type theories used for defining logics (Harper et al., 1993). GF builds on the LF called ALF, Another Logical Framework (Magnusson, 1994), which implements Martin-Löf's higher-level type theory (first introduced in the preface of (Martin-Löf, 1984); see Chapter 8 of (Ranta, 1994) for more details). Before GF was introduced as an independent formalism in 1998, GF-like applications were built as plug-ins to ALF (Ranta, 1997). The idea was that the LF defines the tectogrammar, and the plug-in defines the phenogrammar. The intended application was natural language interfaces to formal proof systems, in the style of (Coscoy et al., 1995).

GF was born via two additions to the natural language interface idea. The first one was multilinguality: one and the same tectogrammar can be given multiple phenogrammars. The second addition was parsing: the phenogrammar, which was initially just linearization (generating strings from type theoretical formulas), was reversed to rules that parse natural language into type theory. The result was a method for translation, which combines parsing the source language with linearization into the target language. This idea was indeed suggested in (Curry, 1961), and applied before GF in the Rosetta project (Landsbergen, 1982), which used Montague's analysis trees as tectogrammar.

GF can be seen as a formalization and generalization of Montague grammar. Formalization, because it introduces a formal notation for the linearization rules that in Montague's work were expressed informally. Generalization, because of multilinguality and also because the type system for analysis trees has dependent types.

Following the terminology of programming language theory, the tectogrammar is in GF called the abstract syntax whereas the phenogrammar is called the concrete syntax. As in compilers and logical frameworks, the abstract syntax encodes the structure relevant for semantics, whereas the concrete syntax defines "syntactic sugar".

The resulting system turned out to be equivalent to parallel multiple context-free grammars (Seki et al., 1991) and therefore parsable in polynomial time (Ljunglöf, 2004). Comprehensive grammars have been written for 29 languages, and later work has optimized GF parsing and also added probabilistic disambiguation and robustness, resulting in state-of-the-art performance in wide-coverage deep parsing (Angelov, 2011; Angelov and Ljunglöf, 2014).

3 Example: subject-verb-object sentences

Let us start with an important special case of predication: the subject-verb-object structure. The simplest possible rule is

  fun PredTV : NP -> TV -> NP -> S

that is, a function that takes a subject NP, a transitive verb TV, and an object NP, and returns a sentence S. This function builds abstract syntax trees. Concrete syntax defines linearization rules, which convert trees into strings. The above rule can give rise to different word orders, such as SVO (as in English), SOV (as in Hindi), and VSO (as in Arabic):

  lin PredTV s v o = s ++ v ++ o
  lin PredTV s v o = s ++ o ++ v
  lin PredTV s v o = v ++ s ++ o

where ++ means concatenation.

The above rule builds a sentence in one step. A more flexible approach is to do it in two steps: complementation, forming a VP (verb phrase) from the verb and the object, and predication proper, which provides the subject. The abstract syntax is

  fun Compl : TV -> NP -> VP
  fun Pred : NP -> VP -> S

These functions are easy to linearize for the SVO and SOV orders:

  lin Compl v o = v ++ o  -- SVO
  lin Compl v o = o ++ v  -- SOV
  lin Pred s vp = s ++ vp -- both

where -- marks a comment. However, the VSO order cannot be obtained in this way, because the two parts of the VP are separated by the subject. The solution is to generalize linearization from strings to records. Complementation can then return a record that has the verb and the object as separate fields. Then we can also generate VSO:

  lin Compl v o = {verb = v ; obj = o}
  lin Pred s vp = vp.verb ++ s ++ vp.obj

The dot (.) means projection, picking the value of a field in a record.

Records enable the abstract syntax to abstract away not only from word order, but also from whether a language uses discontinuous constituents. VP in VSO languages is one example. Once we enable discontinuous constituents, they turn out to be useful almost everywhere, as they enable us to delay the decision about linear order. It can then be varied even inside a single language, if it depends on syntactic context (as e.g. in German; cf. (Müller, 2004) for a survey).

The next thing to abstract away from is inflection and agreement. Given the lexicon

  fun We, She : NP
  fun Love : TV

we can build the abstract syntax tree

  Pred We (Compl Love She)

to represent we love her. If we swap the subject and the object, we get

  Pred She (Compl Love We)

for she loves us. Now, these two sentences are built from the same abstract syntax objects, but no single word is shared between them! This is because the noun phrases inflect for case and the verb agrees with the subject.

In contrast to English, Chinese just reorders the words:

  women ai ta - "we love her"
  ta ai women - "she loves us"

Thus the above rules for SVO languages work as they are for Chinese. But in English, we must include case and agreement as features in the concrete syntax. Thus the linearization of an NP is a record that includes a table producing the case forms, and agreement as an inherent feature:

  lin She = {
    s = table {
      Nom => "she" ;
      Acc => "her"
    } ;
    a = {n = Sg ; p = P3}
  }

The agreement feature (field a) is itself a record, with a number and a person. In other languages, case and agreement can of course have different sets of values.

Verbs likewise include tables that inflect them for different agreement features:

  lin Love = {
    s = table {
      {n = Sg ; p = P3} => "loves" ;
      _ => "love"
    }
  }

We can now define English linearization:

  lin Compl v o = {s = table {a => v.s ! a ++ o.s ! Acc}}
  lin Pred np vp = {s = np.s ! Nom ++ vp.s ! np.a}

using the same type of records for VP as for TV, and a one-string record for S. The Compl rule passes the agreement feature to the verb of the VP, and selects the Acc form of the object (with ! denoting selection from a table). The Pred rule selects the Nom form of the subject, and attaches to this the VP form selected for np.a, i.e. the agreement feature of the subject.

4 Generalized concrete syntax

To see the full power of GF, we now take a look at its type and module system. Figure 1 shows a complete set of grammar modules implementing transitive verb predication for Finnish and Chinese with a maximum of shared code.

The first module in Figure 1 is the abstract syntax Pred, where the fun rules are preceded by a set of cat rules defining the categories of the grammar, i.e. the basic types. Pred defines five categories: S, Cl, NP, VP, and TV. S is the top-level category of sentences, whereas Cl (clause) is the intermediate category of predications, which can be used as sentences in many ways—here, as declaratives and as questions.

The concrete syntax has corresponding lincat rules, which equip each category with a linearization type, i.e. the type of the values returned when linearizing trees of that category. The module PredFunctor in Figure 1 contains five such rules. In lincat NP, the type Case => Str is the type of tables that produce a string as a function of a case, and Agr is the type of agreement features.

When a GF grammar is compiled, each lin rule is type checked with respect to the lincats of the categories involved, to guarantee that, for every

  fun f : C1 → ··· → Cn → C

we have

  lin f : C1* → ··· → Cn* → C*
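The linearization homomorphism can be imitated outside GF. The following Python sketch (an informal analogue with invented names, not GF code) evaluates one and the same abstract tree compositionally under three different concrete rules, reproducing the SVO, SOV, and VSO orders of Section 3; the VP record is what makes the discontinuous VSO order expressible.

```python
# Illustrative Python analogue of GF linearization (not GF itself):
# an abstract tree is linearized compositionally, and Compl returns
# a record so that VSO can split the VP around the subject.

def compl(verb, obj):
    # VP as a record with separate fields, as in the paper
    return {"verb": verb, "obj": obj}

def pred(order, subj, vp):
    # one abstract Pred node, three concrete word orders
    if order == "SVO":
        return " ".join([subj, vp["verb"], vp["obj"]])
    if order == "SOV":
        return " ".join([subj, vp["obj"], vp["verb"]])
    if order == "VSO":
        return " ".join([vp["verb"], subj, vp["obj"]])
    raise ValueError(order)

tree = ("we", compl("love", "her"))   # Pred We (Compl Love She), uninflected
print(pred("SVO", *tree))  # we love her
print(pred("SOV", *tree))  # we her love
print(pred("VSO", *tree))  # love we her
```

The point of the sketch is only that the tree is fixed and the word order is a property of the linearization rule, not of the abstract syntax.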
abstract Pred = {
  cat S ; Cl ; NP ; VP ; TV ;
  fun Compl : TV -> NP -> VP ;
  fun Pred  : NP -> VP -> Cl ;
  fun Decl  : Cl -> S ;
  fun Quest : Cl -> S ;
}

incomplete concrete PredFunctor of Pred = open PredInterface in {
  lincat S  = {s : Str} ;
  lincat Cl = {subj,verb,obj : Str} ;
  lincat NP = {s : Case => Str ; a : Agr} ;
  lincat VP = {verb : Agr => Str ; obj : Str} ;
  lincat TV = {s : Agr => Str} ;
  lin Compl tv np = {verb = tv.s ; obj = np.s ! objCase} ;
  lin Pred np vp = {subj = np.s ! subjCase ; verb = vp.verb ! np.a ; obj = vp.obj} ;
  lin Decl cl  = {s = decl cl.subj cl.verb cl.obj} ;
  lin Quest cl = {s = quest cl.subj cl.verb cl.obj} ;
}

interface PredInterface = {
  oper Case, Agr : PType ;
  oper subjCase, objCase : Case ;
  oper decl, quest : Str -> Str -> Str -> Str ;
}

instance PredInstanceFin of PredInterface = {
  oper Case = ... ; -- Nom | Acc | ...
  oper Agr = {n : Number ; p : Person} ;
  oper subjCase = Nom ; objCase = Acc ;
  oper decl s v o = s ++ v ++ o ;
  oper quest s v o = v ++ "&+ ko" ++ s ++ o ;
}

concrete PredFin of Pred = PredFunctor with (PredInterface = PredInstanceFin) ;

instance PredInstanceChi of PredInterface = {
  oper Case, Agr = {} ;
  oper subjCase, objCase = <> ;
  oper decl s v o = s ++ v ++ o ;
  oper quest s v o = s ++ v ++ o ++ "ma" ;
}

concrete PredChi of Pred = PredFunctor with (PredInterface = PredInstanceChi) ;
Figure 1: Functorized grammar for transitive verb predication.
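To give a feel for how PredFunctor is instantiated, here is a loose Python analogue (invented names, purely illustrative, not GF): the shared quest rule is written once and parametrized by per-language operations, echoing PredInstanceFin (verb-initial question with the glued particle ko) and PredInstanceChi (SVO with the final particle ma) from Figure 1.

```python
# Shared "functor": one Quest rule, parametrized by per-language opers.
def quest(lang, subj, verb, obj):
    return lang["quest"](subj, verb, obj)

# Two "instances" of the interface, echoing PredInstanceFin/PredInstanceChi.
fin = {"quest": lambda s, v, o: f"{v}ko {s} {o}"}   # verb-initial + particle ko
chi = {"quest": lambda s, v, o: f"{s} {v} {o} ma"}  # SVO + final particle ma

print(quest(fin, "hän", "rakastaa", "meitä"))  # rakastaako hän meitä
print(quest(chi, "ta", "ai", "women"))         # ta ai women ma
```

The design point is the same as in the functor: the question rule itself is language-independent, and only the small table of operations varies.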
where A* is the linearization type of A. Thus linearization is a homomorphism. It is actually an instance of denotational semantics, where the lincats are the domains of possible denotations.

Much of the strength of GF comes from using different linearization types for different languages. Thus English needs case and agreement, Finnish needs many more cases (in the full grammar), Chinese needs mostly only strings, and so on. However, it is both useful and illuminating to unify the types. The way to do this is by the use of functors, also known as parametrized modules.

PredFunctor in Figure 1 is an example; functors are marked with the keyword incomplete. A functor depends on an interface, which declares a set of parameters (PredInterface in Figure 1). A concrete module is produced by giving an instance to the interface (PredInstanceFin and PredInstanceChi).

The rules in PredFunctor in Figure 1 are designed to work for both languages, by varying the definitions of the constants in PredInterface. And more languages can be added to use it. Consider for example the definition of NP. The experience from the RGL shows that, if a language has case and agreement, its NPs inflect for case and have inherent agreement. The limiting case of Chinese can be treated by using the unit type {} (i.e. the record type with no fields) for both features. This would not be so elegant for Chinese alone, but makes sense in the code-sharing context.

Discontinuity now appears as another useful generalization. With the lincat definition in PredFunctor, we can share the Compl rule in all of the languages discussed so far. In clauses (Cl), we continue on similar lines: we keep the subject, the verb, and the object in separate fields. Notice that verb in Cl is a plain string, since the value of Agr gets fixed when the subject is added.

The final sentence word order is created as the last step, when converting Cl into S. As Cl is discontinuous, it can be linearized in different orders. In Figure 1, this is used in Finnish for generating the SVO order in declaratives and VSO in questions (with an intervening question particle ko glued to the verb). It also supports the other word orders of Finnish (Karttunen and Kay, 1985).

By using an abstract syntax in combination with unordered records, parameters, and functors for the concrete syntax, we follow a kind of "principles and parameters" approach to language variation (Chomsky, 1981). The actual parameter set for the whole RGL is of course larger than the one shown here.

Mathematically, it is possible to treat all differences in concrete syntax by parameters, simply by declaring a new parameter for every lincat and lin rule! But this is both vacuous as a theory and an unnecessary detour in practice. It is more illuminating to keep the functor simple and the set of parameters small. If the functor does not work for a new language, it usually makes more sense to override it than to grow the parameter list, and GF provides a mechanism for this. Opposite to "principles and parameters", this is "a model in which language-particular rules take over the work of parameter settings" (Newmeyer, 2004). A combination of the two models enables language comparison by measuring the amount of overrides.

5 The full predication system

So far we have dealt with only one kind of verb, TV. But we need more: intransitive, ditransitive, sentence-complement, etc. The general verb category is a dependent type, which varies over argument type lists:

  cat V (x : Args)

The list x : Args corresponds to the subcat feature in GPSG and HPSG. Verb phrases and clauses have the same dependencies. Syntactically, a phrase depending on x : Args has "holes" for every argument in the list x. Semantically, it is a function over the denotations of its arguments (see Section 5.3 below).

5.1 The code

Figure 2 shows the essentials of the resulting grammar, and we will now explain this code. The full code is available at the GF web site.

1. Argument lists and dependent categories. The argument of a verb can be an adjectival phrase (AP, become old), a clause (Cl, say that we go), a common noun (CN, become a president), a noun phrase (NP, love her), a question (QCl, wonder who goes), or a verb phrase (VP, want to go). The definition allows an arbitrary list of arguments. For example, NP+QCl is used in verbs such as ask (someone whether something).

What about PP (prepositional phrase) complements? The best approach in a multilingual setting is to treat them as NP complements with designated cases. Thus in Figure 2.5, the linearization type of VP has fields of type ComplCase. This covers cases and prepositions, often in combination. For instance, the German verb lieben ("love") takes a plain accusative argument, folgen ("follow") a plain dative, and warten ("wait") the preposition auf with the accusative. From the abstract syntax point of view, all of them are NP-complement verbs. Cases and prepositions, and thereby transitivity, are defined in concrete syntax.

The category Cl, clause, is the discontinuous structure of sentences before word order is determined. Its instance Cl (c np O) corresponds to the slash categories S/NP and S/PP in GPSG. Similarly, VP (c np O) corresponds to VP/NP and VP/PP, Adv (c np O) to Adv/NP (prepositions), and so on.

2. Initial formation of verb phrases. A VP is formed from a V by fixing its tense and polarity. In the resulting VP, the verb depends only on the agreement features of the expected subject. The complement case comes from the verb's lexical entry, but the other fields—such as the objects—are left empty. This makes the VP usable in both complementation and slash operations (where the subject is added before some complement).

VPs can also be formed from adverbials, adjectival phrases, and common nouns, by adding a copula. Thus was in results from applying UseAdv to the preposition (i.e. Adv/NP) in, and expands to a VP with ComplNP (was in France) and to a slash clause with PredVP (she was in).

3. Complementation, VP slash formation, reflexivization. The Compl functions in Figure 2.3 provide each verb phrase with its "first" complement. The Slash functions provide the "last" complement, leaving a "gap" in the middle. For instance, SlashCl provides the slash clause used in the question whom did you tell that we sleep. The Refl rules fill argument places with reflexive pronouns.

4. NP-VP predication, slash termination, and adverbial modification. PredVP is the basic NP-VP predication rule. With x = c np O, it becomes the rule that combines NP with VP/NP to form S/NP. SlashTerm is the GPSG "slash termination" rule.
1. Argument lists and some dependent categories

  cat Arg ; Args                      -- arguments and argument lists
  fun ap, cl, cn, np, qcl, vp : Arg   -- AP, Cl, CN, NP, QCl, VP argument
  fun O : Args                        -- no arguments
  fun c : Arg -> Args -> Args         -- one more argument
  cat V (x : Args)                    -- verb in the lexicon
  cat VP (x : Args)                   -- verb phrase
  cat Cl (x : Args)                   -- clause
  cat AP (x : Args)                   -- adjectival phrase
  cat CN (x : Args)                   -- common noun phrase
  cat Adv (x : Args)                  -- adverbial phrase

2. Initial formation of verb phrases

  fun UseV   : (x : Args) -> Temp -> Pol -> V x   -> VP x -- loved (X)
  fun UseAP  : (x : Args) -> Temp -> Pol -> AP x  -> VP x -- was married to (X)
  fun UseCN  : (x : Args) -> Temp -> Pol -> CN x  -> VP x -- was a son of (X)
  fun UseAdv : (x : Args) -> Temp -> Pol -> Adv x -> VP x -- was in (X)

3. Complementation, VP slash formation, reflexivization

  fun ComplNP : (x : Args) -> VP (c np x) -> NP -> VP x               -- love her
  fun ComplCl : (x : Args) -> VP (c cl x) -> Cl x -> VP x             -- say that we go
  fun SlashNP : (x : Args) -> VP (c np (c np x)) -> NP -> VP (c np x) -- show (X) to him
  fun SlashCl : (x : Args) -> VP (c np (c cl x)) -> Cl x -> VP (c np x) -- tell (X) that..
  fun ReflVP  : (x : Args) -> VP (c np x) -> VP x                     -- love herself
  fun ReflVP2 : (x : Args) -> VP (c np (c np x)) -> VP (c np x)       -- show (X) to herself

4. NP-VP predication, slash termination, and adverbial modification

  fun PredVP    : (x : Args) -> NP -> VP x -> Cl x         -- she loves (X)
  fun SlashTerm : (x : Args) -> Cl (c np x) -> NP -> Cl x  -- she loves + X

5. The functorial linearization type of VP

  lincat VP = {
    verb : Agr => Str * Str * Str ;        -- finite: would,have,gone
    inf : VVType => Str ;                  -- infinitive: (not) (to) go
    imp : ImpType => Str ;                 -- imperative: go
    c1 : ComplCase ;                       -- case of first complement
    c2 : ComplCase ;                       -- case of second complement
    vvtype : VVType ;                      -- type of VP complement
    adj : Agr => Str ;                     -- adjective complement
    obj1 : Agr => Str ;                    -- first complement
    obj2 : Agr => Str ;                    -- second complement
    objagr : {a : Agr ; objCtr : Bool} ;   -- agreement used in object control
    adv1 : Str ;                           -- pre-verb adverb
    adv2 : Str ;                           -- post-verb adverb
    ext : Str ;                            -- extraposed element e.g. that-clause
  }

6. Some functorial linearization rules

  lin ComplNP x vp np = vp ** {obj1 = \\a => appComplCase vp.c1 np}
  lin ComplCl x vp cl = vp ** {ext = that_Compl ++ declSubordCl cl}
  lin SlashNP2 x vp np = vp ** {obj2 = \\a => appComplCase vp.c2 np}
  lin SlashCl x vp cl = vp ** {ext = that_Compl ++ declSubordCl cl}

7. Some interface parameters

  oper Agr, ComplCase : PType                -- agreement, complement case
  oper appComplCase : ComplCase -> NP -> Str -- apply complement case to NP
  oper declSubordCl : Cl -> Str              -- subordinate question word order
Figure 2: Dependent types, records, and parameters for predication.
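The ** record update used in the lin rules of Figure 2.6 can be read as a functional record merge: copy the old VP and override a few fields. A minimal Python sketch of this reading (illustrative only, with invented names):

```python
# vp ** {obj1 = ...} in GF copies vp and overrides one field.
def update(record, **fields):
    new = dict(record)    # the old record is left unchanged
    new.update(fields)
    return new

vp = {"verb": "love", "obj1": "", "ext": ""}
vp2 = update(vp, obj1="her")   # like: vp ** {obj1 = "her"}
print(vp2["obj1"])  # her
print(vp["obj1"])   # '' -- original untouched
```

This is why the verb-phrase expanding rules can each touch only the fields they care about while the rest of the VP record passes through unchanged.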
nation” rule.

5. The functorial linearization type of VP. This record type contains the string-valued fields that can appear in different orders, as well as the inherent features that are needed when complements are added. The corresponding record for Cl has similar fields with constant strings, plus a subject field.

6. Some functorial linearization rules. The verb-phrase expanding rules typically work with record updates, where the old VP is left unchanged except for a few fields that get new values. GF uses the symbol ** for record updates. Notice that ComplCl and SlashCl have exactly the same linearization rules; the difference comes from the argument list x in the abstract syntax.

7. Some interface parameters. The code in Figures 2.5 and 2.6 is shared by different languages, but it depends on an interface that declares parameters, some of which are shown here.

5.2 More constructions

Extraction. The formation of questions and relatives is straightforward. Sentential (yes/no) questions, formed by QuestCl in Figure 3.1, in many languages do not need any changes in the clause, but just a different ordering in final linearization. Wh questions typically put one interrogative (IP) in the focus, which may be in the beginning of the sentence even though the corresponding argument place in declaratives is later. The focus field in QCl is used for this purpose. It carries a Boolean feature saying whether the field is occupied. If its value is True, the next IP is put into the “normal” argument place, as in who loves whom.

Coordination. The VP conjunction rules in Figure 3.2 take care both of intransitive VPs (she walks and runs) and of verb phrases with arguments (she loves and hates us). Similarly, Cl conjunction covers both complete sentences and slash clauses (she loves and we hate him). Some VP coordination instances may be ungrammatical, in particular with inverted word orders. Thus she is tired and wants to sleep works as a declarative, but the question is not so good: ?is she tired and wants to sleep. Preventing this would need much more complex rules. Since the goal of our grammar is not to define grammaticality (as in formal language theory), but to analyse and translate existing texts, we opted for a simple system in this case (but did not need to do so elsewhere).

5.3 Semantics

The abstract syntax has a straightforward denotational semantics: each type in the Args list of a category adds an argument to the type of denotations. For instance, the basic VP denotation type is Ent -> Prop, and the type for an arbitrary subcategory VP x is

  (x : Args) -> Den x (Ent -> Prop)

where Den is a type family defined recursively over Args:

  Den : Args -> Type -> Type
  Den O t = t
  Den (c np xs) t = Ent -> Den xs t
  Den (c cl xs) t = Prop -> Den xs t

and so on for all values of Arg. The second argument t varies over the basic denotation types of VP, AP, Adv, and CN.

Montague-style semantics is readily available for all rules operating on these categories. As a logical framework, GF has the expressive power needed for defining semantics (Ranta, 2004). The types can moreover be extended to express selectional restrictions, where verb arguments are restricted to domains of individuals. Here is a type system that adds a domain argument to NP and VP:

  cat NP (d : Dom)
  cat VP (d : Dom)(x : Args)
  fun PredVP : (d : Dom) -> (x : Args) -> NP d -> VP d x -> Cl x

The predication rule checks that the NP and the VP have the same domain.

6 Evaluation

Coverage. The dependent type system for verbs, verb phrases, and clauses is a generalization of the old Resource Grammar Library (Ranta, 2009), which has a set of hard-wired verb subcategories and a handful of slash categories. While it covers “all usual cases”, many logically possible ones are missing. Some such cases even appear in the Penn treebank (Marcus et al., 1993), requiring extra rules in the GF interpretation of the treebank (Angelov, 2011). An example is a function of type

  V (c np (c vp O)) -> VPC (c np O) -> VP (c np O)

which is used 12 times, for example in This is designed to get the wagons in a circle and defend the smoking franchise. It has been easy to write conversion rules showing that the old coverage is preserved. But it remains future work to see what new cases are covered by the increased generality.
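The Den recursion of Section 5.3 can be mimicked outside GF. The sketch below is an illustrative rendering only, not part of the grammar: GF types are encoded as plain strings and the Args list as a Python list, so that Den x t becomes den(x, t).

```python
# Sketch of the Den type family from Section 5.3 (illustrative encoding:
# types as strings, argument lists as Python lists).

def den(args, t):
    """Den : Args -> Type -> Type, computed over an argument list."""
    if not args:                      # Den O t = t
        return t
    head, *rest = args
    if head == "np":                  # Den (c np xs) t = Ent -> Den xs t
        return "Ent -> " + den(rest, t)
    if head == "cl":                  # Den (c cl xs) t = Prop -> Den xs t
        return "Prop -> " + den(rest, t)
    raise ValueError(f"unknown Arg value: {head}")

# The basic VP denotation type is Ent -> Prop; a transitive verb
# (one np in its Args list) then denotes at Ent -> Ent -> Prop.
assert den([], "Ent -> Prop") == "Ent -> Prop"
assert den(["np"], "Ent -> Prop") == "Ent -> Ent -> Prop"
assert den(["cl"], "Ent -> Prop") == "Prop -> Ent -> Prop"
```

Extending den with the remaining Arg values (vp, ap, adv) follows the same one-clause-per-constructor pattern.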
1. Extraction.

  cat QCl (x : Args)   -- question clause
  cat IP               -- interrogative phrase

  fun QuestCl    : (x : Args) -> Cl x -> QCl x                -- does she love him
  fun QuestVP    : (x : Args) -> IP -> VP x -> QCl x          -- who loves him
  fun QuestSlash : (x : Args) -> IP -> QCl (c np x) -> QCl x  -- whom does she love

  lincat QCl = Cl ** {focus : {s : Str ; isOcc : Bool}}       -- focal IP, whether occupied

2. Coordination.

  cat VPC (x : Args)   -- VP conjunction
  cat ClC (x : Args)   -- Clause conjunction

  fun StartVPC : (x : Args) -> Conj -> VP x -> VP x -> VPC x  -- love or hate
  fun ContVPC  : (x : Args) -> VP x -> VPC x -> VPC x         -- admire, love or hate
  fun UseVPC   : (x : Args) -> VPC x -> VP x                  -- [use VPC as VP]
  fun StartClC : (x : Args) -> Conj -> Cl x -> Cl x -> ClC x  -- he sells and I buy
  fun ContClC  : (x : Args) -> Cl x -> ClC x -> ClC x         -- you steal, he sells and I buy
  fun UseClC   : (x : Args) -> ClC x -> Cl x                  -- [use ClC as Cl]
Figure 3: Extraction and coordination.
Multilinguality. How universal are the concrete syntax functor and interface? In the standard RGL, functorization has only been attempted for families of closely related languages, with Romance languages sharing 75% of syntax code and Scandinavian languages 85% (Ranta, 2009). The new predication grammar shares code across all languages. The figure to compare is the percentage of shared code (abstract syntax + functor + interface) of the total code written for a particular language (shared + language-specific). This percentage is 70 for Chinese, 64 for English, 61 for Finnish, and 76 for Swedish, when calculated as lines of code. The total amount of shared code is 760 lines. One example of overrides is negation and questions in English, which are complicated by the need of auxiliaries for some verbs (go) but not for others (be). This explains why Swedish shares more of the common code than English.

Performance. Dependent types are not integrated in current GF parsers, but are checked by post-processing. This implies a loss of speed, because many trees are constructed just to be thrown away. But when we specialized the dependent types and rules to the non-dependent instances needed by the lexicon (using them as metarules in the sense of GPSG), parsing became several times faster than with the old grammar. A detailed analysis remains to be done, but one hypothesis is that the speed-up is due to fixing tense and polarity earlier than in the old RGL: when starting to build VPs, as opposed to when using clauses in full sentences. Dependent types made it easy to test this refactoring, since they reduced the number of rules that had to be written.

Robustness. Robustness in GF parsing is achieved by introducing metavariables (“question marks”) when tree nodes cannot be constructed by the grammar (Angelov, 2011). The subtrees under a metavariable node are linearized separately, just like a sequence of chunks. In translation, this leads to a decrease in quality, because dependencies between chunks are not detected. The early application of tense and polarity is an improvement, as it makes verb chunks contain information that was previously detected only if the parser managed to build a whole sentence.

7 Conclusion

We have shown a GF grammar for predication allowing an unlimited variation of argument lists: an abstract syntax with a concise definition using dependent types, a concrete syntax using a functor and records, and a straightforward denotational semantics. The grammar has been tested with four languages and has shown promising results in speed and robustness, also in large-scale processing. A more general conclusion is that dependent types, records, and functors are powerful tools both for computational grammar engineering and for the theoretical study of languages.

Acknowledgements. I am grateful to Krasimir Angelov and Robin Cooper for comments, and to the Swedish Research Council for support under grant nr. 2012-5746 (Reliable Multilingual Digital Communication).
References

K. Angelov and P. Ljunglöf. 2014. Fast statistical parsing with parallel multiple context-free grammars. In Proceedings of EACL-2014, Gothenburg.

K. Angelov. 2011. The Mechanics of the Grammatical Framework. Ph.D. thesis, Chalmers University of Technology.

Y. Bar-Hillel. 1953. A quasi-arithmetical notation for syntactic description. Language, 29:27–58.

N. Chomsky. 1981. Lectures on Government and Binding. Mouton de Gruyter.

Y. Coscoy, G. Kahn, and L. Théry. 1995. Extracting text from proofs. In M. Dezani-Ciancaglini and G. Plotkin, editors, Proc. Second Int. Conf. on Typed Lambda Calculi and Applications, volume 902 of LNCS, pages 109–123.

H. B. Curry. 1961. Some logical aspects of grammatical structure. In Roman Jakobson, editor, Structure of Language and its Mathematical Aspects: Proceedings of the Twelfth Symposium in Applied Mathematics, pages 56–68. American Mathematical Society.

Ph. de Groote. 2001. Towards Abstract Categorial Grammars. In Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Toulouse, France, pages 148–155.

P. Diderichsen. 1962. Elementær dansk grammatik. Gyldendal, København.

G. Gazdar, E. Klein, G. Pullum, and I. Sag. 1985. Generalized Phrase Structure Grammar. Basil Blackwell, Oxford.

R. Harper, F. Honsell, and G. Plotkin. 1993. A Framework for Defining Logics. JACM, 40(1):143–184.

L. Karttunen and M. Kay. 1985. Parsing in a free word order language. In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, pages 279–306. Cambridge University Press.

J. Lambek. 1958. The mathematics of sentence structure. American Mathematical Monthly, 65:154–170.

J. Landsbergen. 1982. Machine translation based on logically isomorphic Montague grammars. In COLING-1982.

P. Ljunglöf. 2004. The Expressivity and Complexity of Grammatical Framework. Ph.D. thesis, Dept. of Computing Science, Chalmers University of Technology and Gothenburg University. http://www.cs.chalmers.se/~peb/pubs/p04-PhD-thesis.pdf.

L. Magnusson. 1994. The Implementation of ALF - a Proof Editor based on Martin-Löf's Monomorphic Type Theory with Explicit Substitution. Ph.D. thesis, Department of Computing Science, Chalmers University of Technology and University of Göteborg.

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

P. Martin-Löf. 1984. Intuitionistic Type Theory. Bibliopolis, Napoli.

R. Montague. 1974. Formal Philosophy. Yale University Press, New Haven. Collected papers edited by Richmond Thomason.

S. Müller. 2004. Continuous or Discontinuous Constituents? A Comparison between Syntactic Analyses for Constituent Order and Their Processing Systems. Research on Language and Computation, 2(2):209–257.

R. Muskens. 2001. Lambda Grammars and the Syntax-Semantics Interface. In R. van Rooy and M. Stokhof, editors, Proceedings of the Thirteenth Amsterdam Colloquium, pages 150–155, Amsterdam. http://let.uvt.nl/general/people/rmuskens/pubs/amscoll.pdf.

F. J. Newmeyer. 2004. Against a parameter-setting approach to language variation. Linguistic Variation Yearbook, 4:181–234.

C. Pollard and I. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press.

A. Ranta. 1994. Type Theoretical Grammar. Oxford University Press.

A. Ranta. 1997. Structures grammaticales dans le français mathématique. Mathématiques, informatique et Sciences Humaines, 138/139:5–56/5–36.

A. Ranta. 2004. Computational Semantics in Type Theory. Mathematics and Social Sciences, 165:31–57.

A. Ranta. 2009. The GF Resource Grammar Library. Linguistics in Language Technology, 2. http://elanguage.net/journals/index.php/lilt/article/viewFile/214/158.

A. Ranta. 2011. Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, Stanford.

H. Seki, T. Matsumura, M. Fujii, and T. Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88:191–229.
System with Generalized Quantifiers on Dependent Types for Anaphora
Justyna Grudzińska
Institute of Philosophy, University of Warsaw
Krakowskie Przedmieście 3, 00-927 Warszawa
[email protected]

Marek Zawadowski
Institute of Mathematics, University of Warsaw
Banacha 2, 02-097 Warszawa
[email protected]
Abstract

We propose a system for the interpretation of anaphoric relationships between unbound pronouns and quantifiers. The main technical contribution of our proposal consists in combining generalized quantifiers with dependent types. Empirically, our system allows a uniform treatment of all types of unbound anaphora, including the notoriously difficult cases such as quantificational subordination, cumulative and branching continuations, and donkey anaphora.

1 Introduction

The phenomenon of unbound anaphora refers to instances where anaphoric pronouns occur outside the syntactic scopes (i.e. the c-command domain) of their quantifier antecedents. The main kinds of unbound anaphora are regular anaphora to quantifiers, quantificational subordination, and donkey anaphora, as exemplified by (1) to (3) respectively:

(1) Most kids entered. They looked happy.
(2) Every man loves a woman. They kiss them.
(3) Every farmer who owns a donkey beats it.

Unbound anaphoric pronouns have been dealt with in two main semantic paradigms: dynamic semantic theories (Groenendijk and Stokhof, 1991); (Van den Berg, 1996); (Nouwen, 2003) and the E-type/D-type tradition (Evans, 1977); (Heim, 1990); (Elbourne, 2005). In the dynamic semantic theories pronouns are taken to be (syntactically free, but semantically bound) variables, and context serves as a medium supplying values for the variables. In the E-type/D-type tradition pronouns are treated as quantifiers. Our system combines aspects of both families of theories. As in the E-type/D-type tradition, we treat unbound anaphoric pronouns as quantifiers; as in the systems of dynamic semantics, context is used as a medium supplying (possibly dependent) types as their potential quantificational domains. Like Dekker's Predicate Logic with Anaphora and more recent multidimensional models (Dekker, 1994); (Dekker, 2008), our system lends itself to the compositional treatment of unbound anaphora, while keeping a classical, static notion of truth. The main novelty of our proposal consists in combining generalized quantifiers (Mostowski, 1957); (Lindström, 1966); (Barwise and Cooper, 1981) with dependent types (Martin-Löf, 1972); (Ranta, 1994).

The paper is organized as follows. In Section 2 we introduce informally the main features of our system. Section 3 sketches the process of English-to-formal language translation. Finally, Sections 4 and 5 define the syntax and semantics of the system.

2 Elements of system

2.1 Context, types and dependent types

The variables of our system are always typed. We write x : X to denote that the variable x is of type X and refer to this as a type specification of the variable x. Types, in this paper, are interpreted as sets. We write the interpretation of the type X as ‖X‖.

Types can depend on variables of other types. Thus, if we already have a type specification x : X, then we can also have a type Y(x) depending on the variable x, and we can declare a variable y of type Y by stating y : Y(x). The fact that Y depends on X is modeled as a projection

  π : ‖Y‖ → ‖X‖.

So that if the variable x of type X is interpreted as an element a ∈ ‖X‖, ‖Y‖(a) is interpreted as the fiber of π over a (the preimage of {a} under π)

  ‖Y‖(a) = {b ∈ ‖Y‖ : π(b) = a}.
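The fiber construction is directly computable over finite sets. The sketch below is illustrative only: the two-month universe is invented, and a dependent type Y(x) over X is modelled by a projection function from ‖Y‖ to ‖X‖.

```python
# Fibers of a projection, following Section 2.1: a dependent type Y(x)
# over X is modelled as pi : ||Y|| -> ||X||, and ||Y||(a) is the
# preimage of {a} under pi. (Invented finite model for illustration.)

def fiber(Y, pi, a):
    """||Y||(a) = {b in ||Y|| : pi(b) = a}."""
    return {b for b in Y if pi(b) == a}

months = {"Jan", "Feb"}                                   # ||M||
days = {(m, d) for m in months                            # ||D||, days tagged by month
        for d in range(1, 32 if m == "Jan" else 29)}
proj = lambda md: md[0]                                   # pi : ||D|| -> ||M||

assert len(fiber(days, proj, "Jan")) == 31
assert len(fiber(days, proj, "Feb")) == 28
```

The fibers partition ‖D‖: each day belongs to exactly one month's fiber, which is what lets the system later quantify over a fiber rather than over the whole set of pairs.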
10 Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 10–18, Gothenburg, Sweden, April 26-30 2014. c 2014 Association for Computational Linguistics
One standard natural language example of such a dependence of types is that if m is a variable of the type of months M, there is a type D(m) of the days of the month m. Such type dependencies can be nested, i.e., we can have a sequence of type specifications of the (individual) variables:

  x : X, y : Y(x), z : Z(x, y).

Context for us is a partially ordered sequence of type specifications of the (individual) variables, and it is interpreted as a parameter set, i.e. as a set of compatible n-tuples of the elements of the sets corresponding to the types involved (compatible wrt all projections).

2.2 Quantifiers, chains of quantifiers

Our system defines quantifiers and predicates polymorphically. A generalized quantifier Q is an association to every set Z of a subset of the power set of Z. If we have a predicate P defined in a context Γ, then for any interpretation of the context ‖Γ‖ it is interpreted as a subset of its parameter set. Quantifier phrases, e.g. every man or some woman, are interpreted as follows:

  ‖every_{m:man}‖ = {‖man‖}  and  ‖some_{w:woman}‖ = {X ⊆ ‖woman‖ : X ≠ ∅}.

The interpretation of quantifier phrases is further extended into the interpretation of chains of quantifiers. Consider an example in (2):

(2) Every man loves a woman. They kiss them.

Multi-quantifier sentences such as the first sentence in (2) are known to be ambiguous, with different readings corresponding to how various quantifiers are semantically related in the sentence. To account for the readings available for such multi-quantifier sentences, we raise quantifier phrases to the front of a sentence to form (generalized) quantifier prefixes - chains of quantifiers. Chains of quantifiers are built from quantifier phrases using three chain-constructors: pack-formation rule (?, ..., ?), sequential composition ?|?, and parallel composition ?/?. The semantical operations that correspond to the chain-constructors (known as cumulation, iteration and branching) capture in a compositional manner cumulative, scope-dependent and branching readings, respectively.

The idea of chain-constructors and the corresponding semantical operations builds on Mostowski's notion of quantifier (Mostowski, 1957), further generalized by Lindström to a so-called polyadic quantifier (Lindström, 1966); see (Bellert and Zawadowski, 1989). To use a familiar example, a multi-quantifier prefix like ∀_{m:M}|∃_{w:W} is thought of as a single two-place quantifier obtained by an operation on the two single quantifiers, and it has as denotation:

  ‖∀_{m:M}|∃_{w:W}‖ = {R ⊆ ‖M‖ × ‖W‖ : {a ∈ ‖M‖ : {b ∈ ‖W‖ : ⟨a, b⟩ ∈ R} ∈ ‖∃_{w:W}‖} ∈ ‖∀_{m:M}‖}.

In this paper we generalize the three chain-constructors and the corresponding semantical operations to (pre-)chains defined on dependent types.

2.3 Dynamic extensions of contexts

In our system language expressions are all defined in context. Thus the first sentence in (2) (on the most natural interpretation where a woman depends on every man) translates (via the process described in Section 3) into a sentence with a chain of quantifiers in a context:

  Γ ⊢ ∀_{m:M}|∃_{w:W} Love(m, w),

and says that the set of pairs, a man and a woman he loves, has the following property: the set of those men that love some woman each is the set of all men. The way to understand the second sentence in (2) (i.e. the anaphoric continuation) is that every man kisses the women he loves rather than those loved by someone else. Thus the first sentence in (2) must deliver some internal relation between the types corresponding to the two quantifier phrases.

In our analysis, the first sentence in (2) extends the context Γ by adding new variable specifications on newly formed types for every quantifier phrase in the chain Ch = ∀_{m:M}|∃_{w:W} - for the purpose of the formation of such new types we introduce a new type constructor T. That is, the first sentence in (2) (denoted as ϕ) extends the context by adding:

  t_{ϕ,∀m} : T_{ϕ,∀m:M} ;  t_{ϕ,∃w} : T_{ϕ,∃w:W}(t_{ϕ,∀m})

The interpretations of types (that correspond to quantifier phrases in Ch) from the extended context Γ_ϕ are defined in a two-step procedure, using the inductive clauses through which we define Ch but in the reverse direction.
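The displayed denotation of the iterated prefix ∀_{m:M}|∃_{w:W} can be checked extensionally on a finite model. In the sketch below the universes and the Love relation are invented for illustration; only the membership condition comes from the definition above.

```python
# Membership in the polyadic quantifier ||forall_{m:M} | exists_{w:W}||
# of Section 2.2: R is in the denotation iff the set of a in M whose
# fiber {b in W : (a,b) in R} satisfies "exists" itself satisfies "forall".

def forall_q(M):
    return lambda A: A == M            # ||forall||(M): only M itself qualifies

def exists_q(W):
    return lambda B: len(B) > 0        # ||exists||(W): any nonempty subset

def in_iteration(R, M, W, Q1, Q2):
    """R in ||Q1 | Q2|| iff {a in M : {b in W : (a,b) in R} in Q2} in Q1."""
    outer = {a for a in M if Q2({b for b in W if (a, b) in R})}
    return Q1(outer)

M, W = {"m1", "m2"}, {"w1", "w2"}
love = {("m1", "w1"), ("m2", "w1")}                  # every man loves w1
assert in_iteration(love, M, W, forall_q(M), exists_q(W))
assert not in_iteration({("m1", "w1")}, M, W, forall_q(M), exists_q(W))
```

Swapping in other quantifier denotations (e.g. a cardinality test for five) changes nothing in the composition itself, which is the point of treating the prefix as a single polyadic quantifier.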
Step 1. We define fibers of new types by inverse induction.

Basic step. For the whole chain Ch = ∀_{m:M}|∃_{w:W} we put:

  ‖T_{ϕ,∀m:M|∃w:W}‖ := ‖Love‖.

Inductive step.

  ‖T_{ϕ,∀m:M}‖ = {a ∈ ‖M‖ : {b ∈ ‖W‖ : ⟨a, b⟩ ∈ ‖Love‖} ∈ ‖∃_{w:W}‖}

and for a ∈ ‖M‖

  ‖T_{ϕ,∃w:W}‖(a) = {b ∈ ‖W‖ : ⟨a, b⟩ ∈ ‖Love‖}

Step 2. We build dependent types from fibers.

  ‖T_{ϕ,∃w:W}‖ = ⋃ { {a} × ‖T_{ϕ,∃w:W}‖(a) : a ∈ ‖T_{ϕ,∀m:M}‖ }

Thus the first sentence in (2) extends the context Γ by adding the type T_{ϕ,∀m:M}, interpreted as ‖T_{ϕ,∀m:M}‖ (i.e. the set of men who love some women), and the dependent type T_{ϕ,∃w:W}(t_{ϕ,∀m}), interpreted for a ∈ ‖T_{ϕ,∀m:M}‖ as ‖T_{ϕ,∃w:W}‖(a) (i.e. the set of women loved by the man a).

Unbound anaphoric pronouns are interpreted with reference to the context created by the foregoing text: they are treated as universal quantifiers, and the newly formed (possibly dependent) types incrementally added to the context serve as their potential quantificational domains. That is, the unbound anaphoric pronouns they_m and them_w in the second sentence of (2) have the ability to pick up and quantify universally over the respective interpretations. The anaphoric continuation in (2) translates into:

  Γ_ϕ ⊢ ∀_{t_{ϕ,∀m}:T_{ϕ,∀m:M}} | ∀_{t_{ϕ,∃w}:T_{ϕ,∃w:W}(t_{ϕ,∀m})} Kiss(t_{ϕ,∀m}, t_{ϕ,∃w}),

where:

  ‖∀_{t_{ϕ,∀m}:T_{ϕ,∀m:M}} | ∀_{t_{ϕ,∃w}:T_{ϕ,∃w:W}(t_{ϕ,∀m})}‖ =
  {R ⊆ ‖T_{ϕ,∃w:W}‖ : {a ∈ ‖T_{ϕ,∀m:M}‖ : {b ∈ ‖T_{ϕ,∃w:W}‖(a) : ⟨a, b⟩ ∈ R} ∈ ‖∀_{t_{ϕ,∃w}:T_{ϕ,∃w:W}(t_{ϕ,∀m})}‖(a)} ∈ ‖∀_{t_{ϕ,∀m}:T_{ϕ,∀m:M}}‖},

yielding the correct truth conditions: Every man kisses every woman he loves.

Our system also handles intra-sentential anaphora, as exemplified in (3):

(3) Every farmer who owns a donkey beats it.

To account for the dynamic contribution of modified common nouns (in this case common nouns modified by relative clauses) we include in our system *-sentences (i.e. sentences with dummy quantifier phrases). The modified common noun gets translated into a *-sentence (with a dummy quantifier phrase ∃_{f:F}):

  Γ ⊢ ∃_{f:F}|∃_{d:D} Own(f, d)

and we extend the context by dropping the specifications of the variables (f : F, d : D) and adding new variable specifications on newly formed types for every (dummy) quantifier phrase in the chain Ch*:

  t_{ϕ,∃f} : T_{ϕ,∃f:F} ;  t_{ϕ,∃d} : T_{ϕ,∃d:D}(t_{ϕ,∃f}).

The interpretations of types (that correspond to the quantifier phrases in the Ch*) from the extended context Γ_ϕ are defined in our two-step procedure. Thus the *-sentence in (3) extends the context by adding the type T_{ϕ,∃f:F}, interpreted as ‖T_{ϕ,∃f:F}‖ (i.e. the set of farmers who own some donkeys), and the dependent type T_{ϕ,∃d:D}(t_{ϕ,∃f}), interpreted for a ∈ ‖T_{ϕ,∃f:F}‖ as ‖T_{ϕ,∃d:D}‖(a) (i.e. the set of donkeys owned by the farmer a). The main clause translates into:

  Γ_ϕ ⊢ ∀_{t_{ϕ,∃f}:T_{ϕ,∃f:F}} | ∀_{t_{ϕ,∃d}:T_{ϕ,∃d:D}(t_{ϕ,∃f})} Beat(t_{ϕ,∃f}, t_{ϕ,∃d}),

yielding the correct truth conditions: Every farmer who owns a donkey beats every donkey he owns. Importantly, since we quantify over fibers (and not over ⟨farmer, donkey⟩ pairs), our solution does not run into the so-called 'proportion problem'.

Dynamic extensions of contexts and their interpretation are also defined for cumulative and branching continuations. Consider a cumulative example in (4):

(4) Last year three scientists wrote (a total of) five articles (between them). They presented them at major conferences.

Interpreted cumulatively, the first sentence in (4) translates into a sentence:

  Γ ⊢ (Three_{s:S}, Five_{a:A}) Write(s, a).

The anaphoric continuation in (4) can be interpreted in what Krifka calls a 'correspondence'
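On a finite model the two-step procedure is directly computable. The sketch below uses an invented Love relation and universes; it builds the fiber types for (2) and shows why the continuation quantifies over each man's own fiber.

```python
# The two-step procedure of Section 2.3 on an invented finite model.

M, W = {"m1", "m2", "m3"}, {"w1", "w2", "w3"}
love = {("m1", "w1"), ("m1", "w2"), ("m2", "w3")}

# Step 1: fibers. T_forall_m collects the men who love some woman;
# T_exists_w[a] is the fiber of women loved by the man a.
T_forall_m = {a for a in M if any((a, b) in love for b in W)}
T_exists_w = {a: {b for b in W if (a, b) in love} for a in T_forall_m}

# Step 2: glue the fibers into the dependent type, a set of pairs.
T_dep = {(a, b) for a in T_forall_m for b in T_exists_w[a]}

# "They kiss them" quantifies universally over these types: every man in
# T_forall_m kisses every woman in his own fiber, never someone else's.
assert T_forall_m == {"m1", "m2"}          # m3 loves no one, so he is excluded
assert T_exists_w["m1"] == {"w1", "w2"}
assert T_dep == love
```

The same computation, read with Own in place of Love, gives the farmer/donkey types for (3); quantifying over the per-farmer fibers is what avoids the proportion problem.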
fashion (Krifka, 1996). For example, Dr. K wrote one article, co-authored two more with Dr. N, who co-authored two more with Dr. S, and the scientists that cooperated in writing one or more articles also cooperated in presenting these (and no other) articles at major conferences. In our system, the first sentence in (4) extends the context by adding the type corresponding to (Three_{s:S}, Five_{a:A}):

  t_{ϕ,(Three_s,Five_a)} : T_{ϕ,(Three_{s:S} ; Five_{a:A})},

interpreted as a set of tuples

  ‖T_{ϕ,(Three_{s:S},Five_{a:A})}‖ = {⟨c, d⟩ : c ∈ ‖S‖ & d ∈ ‖A‖ & c wrote d}.

The anaphoric continuation then quantifies universally over this type (i.e. a set of pairs), yielding the desired truth conditions: The respective scientists cooperated in presenting at major conferences the respective articles that they cooperated in writing.

3 English-to-formal language translation

We assume a two-step translation process.

Representation. The syntax of the representation language - for the English fragment considered in this paper - is as follows:

  S → Prd^n(QP_1, ..., QP_n);
  MCN → Prd^n(QP_1, ..., CN, ..., QP_n);
  MCN → CN;
  QP → Det MCN;
  Det → every, most, three, ...;
  CN → man, woman, ...;
  Prd^n → enter, love, ...

Common nouns (CNs) are interpreted as types, and common nouns modified by relative clauses (MCNs) as *-sentences determining some (possibly dependent) types.

Disambiguation. Sentences of English, contrary to sentences of our formal language, are often ambiguous. Hence one sentence representation can be associated with more than one sentence in our formal language. The next step thus involves disambiguation. We take the quantifier phrases of a given representation, e.g.:

  P(Q_1 X_1, Q_2 X_2, Q_3 X_3)

and organize them into all possible chains of quantifiers in suitable contexts, with some restrictions imposed on particular quantifiers concerning the places in prefixes at which they can occur (a detailed elaboration of the disambiguation process is left for another place):

  Q_{1 x_1:X_1} | (Q_{2 x_2:X_2} / Q_{3 x_3:X_3}) P(x_1, x_2, x_3).

4 System - syntax

4.1 Alphabet

The alphabet consists of:
type variables: X, Y, Z, ...;
type constants: M, men, women, ...;
type constructors: Σ, Π, T;
individual variables: x, y, z, ...;
predicates: P, P_0, P_1, ...;
quantifier symbols: ∃, ∀, five, Q_1, Q_2, ...;
three chain constructors: ?|?, ?/?, (?, ..., ?).

4.2 Context

A context is a list of type specifications of (individual) variables. If we have a context

  Γ = x_1 : X_1, ..., x_n : X_n(⟨x_i⟩_{i∈J_n})

then the judgement

  ⊢ Γ : cxt

expresses this fact. Having a context Γ as above, we can declare a type X_{n+1} in that context

  Γ ⊢ X_{n+1}(⟨x_i⟩_{i∈J_{n+1}}) : type

where J_{n+1} ⊆ {1, ..., n} such that if i ∈ J_{n+1}, then J_i ⊆ J_{n+1}, J_1 = ∅. The type X_{n+1} depends on the variables ⟨x_i⟩_{i∈J_{n+1}}. Now, we can declare a new variable of the type X_{n+1}(⟨x_i⟩_{i∈J_{n+1}}) in the context Γ

  Γ ⊢ x_{n+1} : X_{n+1}(⟨x_i⟩_{i∈J_{n+1}})

and extend the context Γ by adding this variable specification, i.e. we have

  ⊢ Γ, x_{n+1} : X_{n+1}(⟨x_i⟩_{i∈J_{n+1}}) : cxt

Γ' is a subcontext of Γ if Γ' is a context and a sublist of Γ. Let ∆ be a list of variable specifications from a context Γ, and ∆' the least subcontext of Γ containing ∆. We say that ∆ is convex iff ∆' − ∆ is again a context.

The variables the types depend on are always explicitly written down in specifications. We can think of a context as (a linearization of) a partially ordered set of declarations such that the declaration of a variable x (of type X) precedes the declaration of the variable y (of type Y) iff the type Y depends on the variable x.

The formation rules for both Σ- and Π-types are as usual.
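The cumulative condition behind (4) is a finite check over the Write relation. The sketch below is illustrative only: the correspondence-style model (who co-authored what) is invented in the spirit of the Krifka example above.

```python
# Cumulative reading of "(Three_{s:S}, Five_{a:A}) Write(s, a)": the Write
# relation involves exactly three scientists and five articles in total.
# (Invented correspondence-style model for illustration.)

def cumulative(R, n_s, n_a):
    """Cumulation: the two projections of R have the required cardinalities."""
    writers = {s for (s, a) in R}
    written = {a for (s, a) in R}
    return len(writers) == n_s and len(written) == n_a

# Dr. K wrote a1 alone, co-authored a2 with Dr. N, who wrote a3, and
# Dr. S wrote a4 and a5: three scientists, five articles between them.
write = {("K", "a1"), ("K", "a2"), ("N", "a2"), ("N", "a3"),
         ("S", "a4"), ("S", "a5")}
assert cumulative(write, n_s=3, n_a=5)
assert not cumulative(write, n_s=3, n_a=4)
```

The set of pairs write is exactly the interpretation ‖T_{ϕ,(Three_s,Five_a)}‖ over which the anaphoric continuation then quantifies universally.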
4.3 Language

Quantifier-free formulas. Here, we need only predicates applied to variables. So we write

  Γ ⊢ P(x_1, ..., x_n) : qf-f

to express that P is an n-ary predicate and the specifications of the variables x_1, ..., x_n form a subcontext of Γ.

Quantifier phrases. If we have a context Γ, y : Y(~x), ∆ and a quantifier symbol Q, then we can form a quantifier phrase Q_{y:Y(~x)} in that context. We write

  Γ, y : Y(~x), ∆ ⊢ Q_{y:Y(~x)} : QP

to express this fact. In a quantifier phrase Q_{y:Y(~x)}, the variable y is the binding variable and the variables ~x are indexing variables.

Packs of quantifiers. Quantifier phrases can be grouped together to form a pack of quantifiers. The pack of quantifiers formation rule is as follows:

  Γ ⊢ Q_{i y_i:Y_i(~x_i)} : QP,  i = 1, ..., k
  --------------------------------------------
  Γ ⊢ (Q_{1 y_1:Y_1(~x_1)}, ..., Q_{k y_k:Y_k(~x_k)}) : pack

where, with ~y = y_1, ..., y_k and ~x = ⋃_{i=1}^k ~x_i, we have that y_i ≠ y_j for i ≠ j and ~y ∩ ~x = ∅. In the so constructed pack, the binding variables are ~y and the indexing variables are ~x. We can denote such a pack Pc_{~y:~Y(~x)} to indicate the variables involved. A one-element pack will be denoted and treated as a quantifier phrase. This is why we denote such a pack as Q_{y:Y(~x)} rather than (Q_{y:Y(~x)}).

Pre-chains and chains of quantifiers. Chains and pre-chains of quantifiers have binding variables and indexing variables. By Ch_{~y:~Y(~x)} we denote a pre-chain with binding variables ~y and indexing variables ~x, so that the type of the variable y_i is Y_i(~x_i) with ⋃_i ~x_i = ~x. Chains of quantifiers are pre-chains in which all indexing variables are bound. Pre-chains of quantifiers arrange quantifier phrases into N-free pre-orders, subject to some binding conditions. Mutually comparable QPs in a pre-chain sit in one pack. Thus the pre-chains are built from packs via the two chain-constructors of sequential ?|? and parallel ?/? composition. The chain formation rules are as follows.

1. Packs of quantifiers. Packs of quantifiers are pre-chains of quantifiers with the same binding variables and the same indexing variables, i.e.

  Γ ⊢ Pc_{~y:~Y(~x)} : pack
  --------------------------
  Γ ⊢ Pc_{~y:~Y(~x)} : p-ch

2. Sequential composition of pre-chains

  Γ ⊢ Ch_{1 ~y_1:~Y_1(~x_1)} : p-ch,   Γ ⊢ Ch_{2 ~y_2:~Y_2(~x_2)} : p-ch
  ---------------------------------------------------------------------
  Γ ⊢ Ch_{1 ~y_1:~Y_1(~x_1)} | Ch_{2 ~y_2:~Y_2(~x_2)} : p-ch

provided ~y_2 ∩ (~y_1 ∪ ~x_1) = ∅; the specifications of the variables (~x_1 ∪ ~x_2) − (~y_1 ∪ ~y_2) form a context, a subcontext of Γ. In the so obtained pre-chain, the binding variables are ~y_1 ∪ ~y_2 and the indexing variables are ~x_1 ∪ ~x_2.

3. Parallel composition of pre-chains

  Γ ⊢ Ch_{1 ~y_1:~Y_1(~x_1)} : p-ch,   Γ ⊢ Ch_{2 ~y_2:~Y_2(~x_2)} : p-ch
  ---------------------------------------------------------------------
  Γ ⊢ Ch_{1 ~y_1:~Y_1(~x_1)} / Ch_{2 ~y_2:~Y_2(~x_2)} : p-ch

provided ~y_2 ∩ (~y_1 ∪ ~x_1) = ∅ = ~y_1 ∩ (~y_2 ∪ ~x_2). As above, in the so obtained pre-chain, the binding variables are ~y_1 ∪ ~y_2 and the indexing variables are ~x_1 ∪ ~x_2.

A pre-chain of quantifiers Ch_{~y:~Y(~x)} is a chain iff ~x ⊆ ~y. The following

  Γ ⊢ Ch_{~y:~Y(~x)} : chain

expresses the fact that Ch_{~y:~Y(~x)} is a chain of quantifiers in the context Γ.

Formulas, sentences and *-sentences. The formulas have binding variables, indexing variables and argument variables. We write ϕ_{~y:Y(~x)}(~z) for a formula with binding variables ~y, indexing variables ~x and argument variables ~z. We have the following formation rule for formulas

  Γ ⊢ A(~z) : qf-f,   Γ ⊢ Ch_{~y:~Y(~x)} : p-ch
  ---------------------------------------------
  Γ ⊢ Ch_{~y:~Y(~x)} A(~z) : formula

provided ~y is final in ~z, i.e. ~y ⊆ ~z and the variable specifications of ~z − ~y form a subcontext of Γ. In the so constructed formula, the binding variables are ~y, the indexing variables are ~x, and the argument variables are ~z.

A formula ϕ_{~y:Y(~x)}(~z) is a sentence iff ~z ⊆ ~y and ~x ⊆ ~y. So a sentence is a formula without free variables, neither individual nor indexing. The following

  Γ ⊢ ϕ_{~y:Y(~x)}(~z) : sentence

expresses the fact that ϕ_{~y:Y(~x)}(~z) is a sentence formed in the context Γ.

We shall also consider some special formulas that we call *-sentences. A formula ϕ_{~y:Y(~x)}(~z) is a *-sentence if ~x ⊆ ~y ∪ ~z but the set ~z − ~y is possibly
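The disjointness provisos on the two composition constructors reduce to simple set checks. The sketch below is illustrative (variable names invented); it mirrors the side conditions, not the full formation rules.

```python
# Variable side conditions on the chain constructors of Section 4.3,
# over binding (y) and indexing (x) variable lists (illustrative names).

def seq_ok(y1, x1, y2, x2):
    """Ch1 | Ch2 requires y2 disjoint from y1 ∪ x1."""
    return not (set(y2) & (set(y1) | set(x1)))

def par_ok(y1, x1, y2, x2):
    """Ch1 / Ch2 additionally requires y1 disjoint from y2 ∪ x2."""
    return seq_ok(y1, x1, y2, x2) and not (set(y1) & (set(y2) | set(x2)))

# forall_{m:M} | exists_{w:W(m)}: sequentially, w's type may be indexed by m ...
assert seq_ok(["m"], [], ["w"], ["m"])
# ... but not in parallel, where neither side may depend on the other.
assert not par_ok(["m"], [], ["w"], ["m"])
```

This asymmetry is exactly what lets sequential composition express scope dependence while parallel composition yields independent (branching) quantification.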
not empty and moreover the type of each variable in ~z − ~y is constant, i.e., it does not depend on variables of other types. In such case we use

  Γ ⊢ ϕ_{~y:Y(~x)}(~z) : *-sentence

to express the fact that ϕ_{~y:Y(~x)}(~z) is a *-sentence formed in the context Γ.

Having formed a *-sentence ϕ, we can form a new context Γ_ϕ defined in the next section.

Notation. For semantics we need some notation for the variables in the *-sentence. Suppose we have a *-sentence

  Γ ⊢ Ch_{~y:~Y(~x)} A(~z) : *-sentence

We define: (i) the environment of the pre-chain Ch: Env(Ch) = Env(Ch_{~y:~Y(~x)}) is the context defining the variables ~x − ~y; (ii) the binding variables of the pre-chain Ch: Bv(Ch) = Bv(Ch_{~y:~Y(~x)}) is the convex set of declarations in Γ of the binding variables in ~y; (iii) env(Ch) = env(Ch_{~y:~Y(~x)}) is the set of variables in the environment of Ch, i.e. ~x − ~y; (iv) bv(Ch) = bv(Ch_{~y:~Y(~x)}) is the set of binding variables ~y; (v) the environment of a pre-chain Ch' in a *-sentence ϕ = Ch_{~y:Y(~x)} P(~z), denoted Env_ϕ(Ch'), is the set of binding variables in all the packs in Ch* that are <_ϕ-smaller than all packs in Ch'. Note Env(Ch') ⊆ Env_ϕ(Ch'). If Ch' = Ch_1|Ch_2 is a sub-pre-chain of the chain Ch_{~y:Y(~x)}, then Env_ϕ(Ch_2) = Env_ϕ(Ch_1) ∪ Bv(Ch_1) and Env_ϕ(Ch_1) = Env_ϕ(Ch').

4.4 Dynamic extensions

Suppose we have constructed a *-sentence in a context

  Γ ⊢ Ch_{~y:~Y(~x)} A(~z) : *-sentence

We write ϕ for Ch_{~y:~Y(~x)} A(~z). We form a context Γ_ϕ by dropping the specifications of the variables ~z and adding one type and one variable specification for each pack in Packs_{Ch*}. Let Γ̌ denote the context Γ with the specifications of the variables ~z deleted. Suppose Φ ∈ Packs_{Ch*} and Γ' is an extension of the context Γ̌ such that one variable specification t_{Φ',ϕ} : T_{Φ',ϕ} was already added for each pack Φ' ∈ Packs_{Ch*} such that Φ' <_{Ch*} Φ. Then we can extend Γ' with

  ⊢ Γ', t_{Φ,ϕ} : T_{Φ,ϕ}(⟨t_{Φ',ϕ}⟩_{Φ' ∈ Packs_{Ch*}, Φ' <_{Ch*} Φ}) : cxt

The context obtained from Γ̌ by adding the new variables corresponding to all the packs in Packs_{Ch*} as above will be denoted by

  Γ_ϕ = Γ̌ ∪ T(Ch_{~y:~Y(~x)} A(~z)).

At the end we add another context formation rule

  Γ ⊢ Ch_{~y:Y(~x)} P(~z) : *-sentence
  ------------------------------------
  ⊢ Γ_ϕ : cxt

Then we can build another formula starting in the context Γ_ϕ. This process can be iterated. Thus in this system a sentence ϕ in a context Γ is constructed via a specifying sequence of formulas, with the last formula being the sentence ϕ. However, for lack of space we are going to describe here only one step of this process. That is, a sentence ϕ in a context Γ can be constructed via a specifying *-sentence ψ extending the context as follows

  Γ ⊢ ψ : *-sentence
  Γ_ψ ⊢ ϕ : sentence

For short, we can write

  Γ ⊢ Γ_ψ ⊢ ϕ : sentence

5 System - semantics

5.1 Interpretation of dependent types

The context

  ⊢ x : X(...), ..., z : Z(..., x, y, ...) : cxt

gives rise to a dependence graph. A dependence graph DG_Γ = (T_Γ, E_Γ) for the context Γ has the types of Γ as vertices and an edge π_{Y,x} : Y → X for every variable specification x : X(...) in Γ and every type Y(..., x, ...) occurring in Γ that depends on x.

The dependence diagram for the context Γ is an association ‖−‖ : DG_Γ → Set of a set ‖X‖ to every type X in T_Γ and of a function ‖π_{Y,x}‖ : ‖Y‖ → ‖X‖ to every edge π_{Y,x} : Y → X in E_Γ, so that whenever we have a triangle of edges in E_Γ, π_{Y,x}, π_{Z,y} : Z → Y, π_{Z,x} : Z → X, we have ‖π_{Z,x}‖ = ‖π_{Y,x}‖ ∘ ‖π_{Z,y}‖.

The interpretation of the context Γ, the parameter space ‖Γ‖, is the limit of the dependence diagram ‖−‖ : DG_Γ → Set. More specifically,

  ‖Γ‖ = ‖x : X(...), ..., z : Z(..., x, y, ...)‖ =
  {~a : dom(~a) = var(Γ), ~a(z) ∈ ‖Z‖(~a↾env(Z)), ‖π_{Z,x}‖(~a(z)) = ~a(x), for z : Z in Γ, x ∈ env(Z)}

where var(Γ) denotes the variables specified in Γ and env(Z) denotes the indexing variables of the type Z. The interpretation of the Σ- and Π-types are as usual.

5.2 Interpretation of language

Interpretation of predicates and quantifier symbols. Both predicates and quantifiers are interpreted polymorphically. If we have a predicate P defined in a context Γ:

  x_1 : X_1, ..., x_n : X_n(⟨x_i⟩_{i∈J_n}) ⊢ P(~x) : qf-f

then, for any interpretation of the context ‖Γ‖, it is interpreted as a subset of its parameter set, i.e. ‖P‖ ⊆ ‖Γ‖. A quantifier symbol Q is interpreted as a quantifier ‖Q‖, i.e. an association to every set Z of a subset ‖Q‖(Z) ⊆ P(Z) (this association can be partial).

Interpretation of pre-chains and chains of quantifiers. We interpret QPs, packs, pre-chains, and chains in the environment of a sentence Env_ϕ. This is the only case that is needed. We could interpret the aforementioned syntactic objects in their natural environment Env (i.e. independently of any given sentence) but it would unnecessarily complicate some definitions. Thus having a (*-)sentence ϕ = Ch_{~y:Y(~x)} P(~z) (defined in a context Γ) and a sub-pre-chain (QP, pack) Ch', for ~a ∈ ‖Env_ϕ(Ch')‖ we define the meaning of

  ‖Ch'‖(~a)

Notation. Let ϕ = Ch_{~y:~Y} P(~y) be a *-sentence built in a context Γ, and Ch' a pre-chain used in the construction of the (*-)chain Ch. Then Env_ϕ(Ch') is a sub-context of Γ disjoint from the convex set Bv(Ch'), and Env_ϕ(Ch'), Bv(Ch') is a sub-context of Γ. For ~a ∈ ‖Env_ϕ(Ch')‖ we define ‖Bv(Ch')‖(~a) to be the largest set such that, as before,

  {~a} × ‖Bv(Ch')‖(~a) ⊆ ‖Env_ϕ(Ch'), Bv(Ch')‖

Interpretation of quantifier phrases. If we have a quantifier phrase

  Γ ⊢ Q_{y:Y(~x)} : QP

and ~a ∈ ‖Env_ϕ(Q_{y:Y(~x)})‖, then it is interpreted as ‖Q‖(‖Y‖(~a↾~x)).

Interpretation of packs. If we have a pack of quantifiers in the sentence ϕ

  Pc = (Q_{1 y_1:Y_1(~x_1)}, ..., Q_{n y_n:Y_n(~x_n)})

and ~a ∈ ‖Env_ϕ(Pc)‖, then its interpretation with the parameter ~a is

  ‖Pc‖(~a) = ‖(Q_{1 y_1:Y_1(~x_1)}, ..., Q_{n y_n:Y_n(~x_n)})‖(~a) =
  {A ⊆ ∏_{i=1}^n ‖Y_i‖(~a↾~x_i) : π_i(A) ∈ ‖Q_i‖(‖Y_i‖(~a↾~x_i)), for i = 1, ..., n}

where π_i is the i-th projection from the product.

Interpretation of chain constructors.

1. Parallel composition. For a pre-chain of quantifiers in the sentence ϕ

  Ch' = Ch_{1 ~y_1:~Y_1(~x_1)} / Ch_{2 ~y_2:~Y_2(~x_2)}

and ~a ∈ ‖Env_ϕ(Ch')‖ we define

  ‖Ch_{1 ~y_1:~Y_1(~x_1)} / Ch_{2 ~y_2:~Y_2(~x_2)}‖(~a) =
  {A × B : A ∈ ‖Ch_{1 ~y_1:~Y_1(~x_1)}‖(~a↾~x_1) and B ∈ ‖Ch_{2 ~y_2:~Y_2(~x_2)}‖(~a↾~x_2)}

2. Sequential composition. For a pre-chain of quantifiers in the sentence ϕ

  Ch' = Ch_{1 ~y_1:~Y_1(~x_1)} | Ch_{2 ~y_2:~Y_2(~x_2)}

and ~a ∈ ‖Env_ϕ(Ch')‖ we define

  ‖Ch_{1 ~y_1:~Y_1(~x_1)} | Ch_{2 ~y_2:~Y_2(~x_2)}‖(~a) =
  {R ⊆ ‖Bv(Ch')‖(~a) : {~b ∈ ‖Bv(Ch_1)‖(~a) : {~c ∈ ‖Bv(Ch_2)‖(~a, ~b) : ⟨~b, ~c⟩ ∈ R} ∈ ‖Ch_{2 ~y_2:~Y_2(~x_2)}‖(~a, ~b)} ∈ ‖Ch_{1 ~y_1:~Y_1(~x_1)}‖(~a)}

Validity. A sentence

  ~x : ~X ⊢ Ch_{~y:Y} P(~y)

is true under the above interpretation iff

  ‖P‖(‖~Y‖) ∈ ‖Ch_{~y:~Y}‖

5.3 Interpretation of dynamic extensions

Suppose we obtain a context Γ_ϕ from Γ by the following rule

  Γ ⊢ Ch_{~y:~Y(~x)} A(~z) : *-sentence
  ------------------------------------
  ⊢ Γ_ϕ : cxt

where ϕ is Ch_{~y:~Y(~x)} A(~z). Then

  Γ_ϕ = Γ̌ ∪ T(Ch_{~y:~Y(~x)} A(~z)).

From the dependence diagram ‖−‖_Γ : DG_Γ → Set we shall define another dependence diagram

  ‖−‖_{Γ_ϕ} = ‖−‖ : DG_{Γ_ϕ} → Set

Thus, for Φ ∈ Pack_{Ch*} we need to define ‖T_{Φ,ϕ}‖, and for Φ' <_{Ch*} Φ we need to define

  ‖π_{T_{Φ,ϕ}, t_{Φ',ϕ}}‖ : ‖T_{Φ,ϕ}‖ → ‖T_{Φ',ϕ}‖

This is done using the inductive clauses through which we have defined Ch*, but in the reverse direction.

The basic case is when Ch' = Ch*. We put

  ‖T_{ϕ,Ch}‖(∅) = ‖P‖

The inductive step. Now assume that the set ‖T_{ϕ,Ch'}‖(~a) is defined for ~a ∈ ‖Env_ϕ(Ch')‖. Note

  ‖T_{ϕ,Ch'}‖(~a) ⊆ ‖Bv(Ch')‖(~a)

Parallel decomposition. If we have

  Ch' = Ch_{1 ~y_1:~Y_1(~x_1)} / Ch_{2 ~y_2:~Y_2(~x_2)}

then we define sets ‖T_{ϕ,Ch_i}‖(~a) ∈ ‖Ch_i‖(~a) for i = 1, 2 so that

  ‖T_{ϕ,Ch'}‖(~a) = ‖T_{ϕ,Ch_1}‖(~a) × ‖T_{ϕ,Ch_2}‖(~a)

if such sets exist, and these sets (‖T_{ϕ,Ch_i}‖(~a)) are undefined otherwise.

Sequential decomposition. If we have

  Ch' = Ch_{1 ~y_1:~Y_1(~x_1)} | Ch_{2 ~y_2:~Y_2(~x_2)}

then we put

  ‖T_{ϕ,Ch_1}‖(~a) = {~b ∈ ‖Bv(Ch_1)‖(~a) : {~c ∈ ‖Bv(Ch_2)‖(~a, ~b) : ⟨~b, ~c⟩ ∈ ‖T_{ϕ,Ch'}‖(~a)} ∈ ‖Ch_2‖(~a, ~b)}

For ~b ∈ ‖Bv(Ch_1)‖ we put

  ‖T_{ϕ,Ch_2}‖(~a, ~b) = {~c ∈ ‖Bv(Ch_2)‖(~a, ~b) : ⟨~b, ~c⟩ ∈ ‖T_{ϕ,Ch'}‖(~a)}

Step 2. (Building dependent types from fibers.) If Φ is a pack in Ch* and ~a ∈ ‖Env_ϕ(Φ)‖, then we put

  ‖T_{ϕ,Φ}‖ = ⋃ { {~a} × ‖T_{ϕ,Φ}‖(~a) : ~a ∈ ‖Env_ϕ(Φ)‖ }

6 Conclusion

It was our intention in this paper to show that adopting a typed approach to generalized quantification allows a uniform treatment of a wide array of anaphoric data involving natural language quantification.

Acknowledgments

The work of Justyna Grudzińska was funded by the National Science Center on the basis of decision DEC-2012/07/B/HS1/00301. The authors would like to thank the anonymous reviewers for valuable comments on an earlier version of this paper.

References

Barwise, Jon and Robin Cooper. 1981. Generalized Quantifiers and Natural Language. Linguistics and Philosophy 4: 159-219.

Bellert, Irena and Marek Zawadowski. 1989.
Formal- ization of the feature system in terms of pre-orders. In Irena Bellert Feature System for Quantification Structures in Natural Language. Foris Dordrecht. 155-172. Dekker, Paul. 1994. Predicate logic with anaphora. In Lynn Santelmann and Mandy Harvey (eds.), Pro- ceedings SALT IX. Ithaca, NY: DMLL Publications, Cornell University. 79-95. Dekker, Paul. 2008. A multi-dimensional treatment of quantification in extraordinary English. Linguistics and Philosophy 31: 101-127. Elworthy, David A. H. 1995. A theory of anaphoric information. Linguistics and Philosophy 18: 297- 332. Elbourne, Paul D. 2005. Situations and Individuals. Cambridge, MA: MIT Press. Evans, Gareth 1977. Pronouns, Quantifiers, and Rela- tive Clauses (I). Canadian Journal of Philosophy 7: 467-536. Heim, Irene. 1990. E-type pronouns and donkey anaphora. Linguistics and Philosophy 13: 137-78. Groenendijk, Jeroen and Martin Stokhof. 1991. Dy- namic Predicate Logic. Linguistics and Philosophy 14: 39-100. Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Kluwer Academic Publishers, Dordrecht. Krifka, Manfred. 1996. Parametrized sum individu- als for plural reference and partitive quantification. Linguistics and Philosophy 19: 555-598. Lindstrom,¨ Per. 1966. First-order predicate logic with generalized quantifiers. Theoria 32: 186-95. Martin-Lof,¨ Per. 1972. An intuitionstic theory of types. Technical Report, University of Stockholm. Mostowski, Andrzej. 1957. On a generalization of quantifiers. Fundamenta Mathematicae 44: 12-36. Nouwen, Rick. 2003. Plural pronominal anaphora in context: dynamic aspects of quantification. Ph.D. thesis, UiL-OTS, Utrecht, LOT dissertation series, No. 84. Ranta, Aarne. 1994. Type-Theoretical Grammar. Ox- ford University Press, Oxford. Van den Berg, Martin H. 1996. The Internal Structure of Discourse. Ph.D. thesis, Universiteit van Amster- dam, Amsterdam. 
Monads as a Solution for Generalized Opacity

Gianluca Giorgolo (University of Oxford) and Ash Asudeh (University of Oxford and Carleton University)
{gianluca.giorgolo,ash.asudeh}@ling-phil.ox.ac.uk

Abstract

In this paper we discuss a conservative extension of the simply-typed lambda calculus in order to model a class of expressions that generalize the notion of opaque contexts. Our extension is based on previous work in the semantics of programming languages aimed at providing a mathematical characterization of computations that produce some kind of side effect (Moggi, 1989), and is based on the notion of monads, a construction in category theory that, intuitively, maps a collection of "simple" values and "simple" functions into a more complex value space, in a canonical way. The main advantages of our approach with respect to traditional analyses of opacity are the fact that we are able to explain in a uniform way a set of different but related phenomena, and that we do so in a principled way that has been shown to also explain other linguistic phenomena (Shan, 2001).

1 Introduction

Opaque contexts have been an active area of research in natural language semantics since Frege's original discussion of the puzzle (Frege, 1892). A sentence like (1) has a non-contradictory interpretation despite the fact that the two referring expressions Hesperus and Phosphorus refer to the same entity, the planet we know as Venus.

(1) Reza doesn't believe Hesperus is Phosphorus.

The fact that a sentence like (1) includes the modal believe has influenced much of the analyses proposed in the literature, and has linked the phenomenon with the notion of modality. In this paper we challenge this view and try to position data like (1) inside a larger framework that also includes other types of expressions.

We decompose examples like (1) along two dimensions: the presence or absence of a modal expression, and the way in which we multiply refer to the same individual. In the case of (1), we have a modal and we use two different co-referring expressions. Examples (2)-(4) complete the landscape of possible combinations:

(2) Dr. Octopus punched Spider-Man but he didn't punch Spider-Man.

(3) Mary Jane loves Peter Parker but she doesn't love Spider-Man.

(4) Reza doesn't believe Jesus is Jesus.

(2) is clearly a contradictory statement, as we predicate of Dr. Octopus that he has the property of having punched Spider-Man and its negation. Notice that in this example there is no modal and the exact same expression is used twice to refer to the object individual. In the case of (3) we still have no modal and we use two different but co-referring expressions to refer to the same individual. However in this case the statement has a non-contradictory reading. Similarly (4) has a non-contradictory reading, which states that, according to the speaker, Reza doesn't believe that the entity he (Reza) calls Jesus is the entity that the speaker calls Jesus (e.g., is not the same individual or does not have the same properties). This case is symmetrical to (3), as here we have a modal expression but the same expression is used twice to refer to the same individual.

If the relevant reading of (4) is difficult to get, consider an alternative kind of scenario, as in (5), in which the subject of the sentence, Kim, suffers from Capgras Syndrome¹ and thinks that Sandy is an impostor. The speaker says:

(5) Kim doesn't believe Sandy is Sandy.

Given the definition of Capgras Syndrome (in fn. 1), there is a clear non-contradictory reading available here, in which the speaker is stating that Kim does not believe that the entity in question, that the speaker and (other non-Capgras-sufferers) would call Sandy, is the entity that she associates with the name Sandy.

We propose an analysis of the non-contradictory cases based on the intuition that the apparently co-referential expressions are in fact interpreted using different interpretation functions, which correspond to different perspectives that are pitted against each other in the sentences. Furthermore, we propose that modal expressions are not the only ones capable of introducing a shift of perspective, but also that verbs that involve some kind of mental attitude of the subject towards the object have the same effect.

Notice that this last observation distinguishes our approach from one where a sentence like (3) is interpreted as simply saying that Mary Jane loves only one guise of the entity that corresponds to Peter Parker but not another one. The problem with this interpretation is that if it is indeed the case that different co-referring expressions simply pick out different guises of the same individual, then a sentence like (6) should have a non-contradictory reading, while this seems not to be the case.

(6) Dr. Octopus killed Spider-Man but he didn't kill Peter Parker.

While a guise-based interpretation is compatible with our analysis,² it is also necessary to correctly model the different behaviour of verbs like love with respect to others like punch or kill. In fact, we need to model the difference between, for example, kill and murder, since murder does involve a mental attitude of intention and the corresponding sentence to (6) is not necessarily contradictory:

(7) Dr. Octopus murdered Spider-Man but he didn't murder Peter Parker.

The implementation of our analysis is based on monads. Monads are a construction in category theory that defines a canonical way to map a set of objects and functions that we may consider as simple into a more complex object and function space. They have been successfully used in the semantics of programming languages to characterise computations that are not "pure". By pure we mean code objects that are totally referentially transparent (i.e. do not depend on external factors and return the same results given the same input independently of their execution context), and also that do not have effects on the "real world". In contrast, monads are used to model computations that for example read from or write to a file, that depend on some random process or whose return value is non-deterministic.

In our case we will use the monad that describes values that are made dependent on some external factor, commonly known in the functional programming literature as the Reader monad.³ We will represent linguistic expressions that can be assigned potentially different interpretations as functions from interpretation indices to values. Effectively we will construct a different type of lexicon that does not represent only the linguistic knowledge of a single speaker but also her (possibly partial) knowledge of the language of other speakers. So, for example, we will claim that (4) can be assigned a non-contradictory reading because the speaker's lexicon also includes the information regarding Reza's interpretation of the name Jesus and therefore makes it possible for the speaker to use the same expression, in combination with a verb such as believe, to actually refer to two different entities. In one case we will argue that the name Jesus is interpreted using the speaker's interpretation while in the other case it is Reza's interpretation that is used.

Notice that we can apply our analysis to any natural language expression that may have different interpretations. This means that, for example, we can extend our analysis, which is limited to referring expressions here for space reasons, to other cases, such as the standard examples involving ideally synonymous predicates like groundhog and woodchuck (see, e.g., Fox and Lappin (2005)).

The paper is organised as follows: in section 2 we discuss the technical details of our analysis; in section 3 we discuss our analyses of the motivating examples; section 4 compares our approach with a standard approach to opacity; we conclude in section 5.

2 Monads and interpretation functions

To avoid introducing the complexities of the categorical formalism, we introduce monads as they are usually encountered in the computer science literature. A monad is defined as a triple ⟨♦, η, ?⟩. ♦ is what we call a functor, in our case a mapping between types and functions. We call the component of ♦ that maps between types ♦1, while the one that maps between functions is ♦2. In our case ♦1 will map each type to a new type that corresponds to the original type with an added interpretation index parameter. Formally, if i is the type of interpretation indices, then ♦1 maps any type τ to i → τ. In terms of functions, ♦2 maps any function f : τ → δ to a function f′ : (i → τ) → i → δ. ♦2 corresponds to function composition:

  ♦2(f) = λg.λi.f(g(i))   (8)

In what follows the component ♦2 will not be used, so we will use ♦ as an abbreviation for ♦1. This means that we will write ♦τ for the type i → τ.

η (pronounced "unit") is a polymorphic function that maps inhabitants of a type τ to inhabitants of its image under ♦, formally η : ∀τ.τ → ♦τ. Using the computational metaphor, η should embed a value in a computation that returns that value without any side-effect. In our case η should simply add a vacuous parameter to the value:

  η(x) = λi.x   (9)

? (pronounced "bind") is a polymorphic function of type ∀τ.∀δ.♦τ → (τ → ♦δ) → ♦δ, and acts as a sort of enhanced functional application.⁴ Again using the computational metaphor, ? takes care of combining the side effects of the argument and the function and returns the resulting computation. In the case of the monad we are interested in, ? is defined as in (10).

  a ? f = λi.f(a(i))(i)   (10)

Another fundamental property of ? is that, by imposing an order of evaluation, it will provide us with an additional scoping mechanism distinct from standard functional application. This will allow us to correctly capture the multiple readings associated with the expressions under consideration.

We thus add two operators, η and ?, to the lambda calculus and the reductions work as expected for (9) and (10). These reductions are implicit in our analyses in section 3.

2.1 Compositional logic

For composing the meanings of linguistic resources we use a logical calculus adapted for the linear case⁵ from the one introduced by Benton et al. (1998). The calculus is based on a language with two connectives corresponding to our type constructors: ⊸, a binary connective, that corresponds to (linear) functional types, and ♦, a unary connective, that represents monadic types.

The logical calculus is described by the proof rules in figure 1. The rules come annotated with lambda terms that characterise the Curry-Howard correspondence between proofs and meaning terms. Here we assume a Lexical Functional Grammar-like setup (Dalrymple, 2001), where a syntactic and functional grammar component feeds the semantic component with lexical resources already marked with respect to their predicate-argument relationships. Alternatively we could modify the calculus to a categorial setting, by introducing a structural connective, and using directional versions of the implication connective together with purely structural rules to control the compositional process.

We can prove that the Cut rule is admissible, therefore the calculus becomes an effective (although inefficient) way of computing the meaning of a linguistic expression.

¹ From Wikipedia: "[Capgras syndrome] is a disorder in which a person holds a delusion that a friend, spouse, parent, or other close family member has been replaced by an identical-looking impostor."
² Indeed, one way to understand guises is as different ways in which we interpret a referring term (Heim, 1998).
³ Shan (2001) was the first to sketch the idea of using the Reader monad to model intensional phenomena.
⁴ We use for ? the argument order as it is normally used in functional programming. We could swap the arguments and make it look more like standard functional application. Also, we write ? in infix notation.
⁵ Linearity is a property that has been argued for in the context of natural language semantics by various researchers (Moortgat (2011), Asudeh (2012), among others).

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 19–27, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

A key advantage we gain from the monadic approach is that we are not forced to generalize all lexical entries to the "worst case". With the logical setup we have just described we can in fact freely mix monadic and non monadic resources. For example, in our logic we can combine a pure version of a binary function with arguments that are either pure or monadic, as the following are all provable theorems in this logic:
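As a concrete illustration of the operators in (9) and (10) — our own sketch, not part of the paper's formal apparatus — η and ? can be transcribed directly into Python, with a monadic value of type ♦τ modelled as a one-argument function from an interpretation index to a τ-value; the example names (hesperus, shout) are purely illustrative:

```python
# Sketch of the paper's Reader monad: a "monadic value" of type <>tau
# is modelled as a function from an interpretation index i to a value.

def unit(x):
    """eta in (9): wrap a pure value, ignoring the index."""
    return lambda i: x

def bind(a, f):
    """? in (10): pass the same index to both the argument and the function."""
    return lambda i: f(a(i))(i)

# An index-dependent value (cf. the contentious names of section 3)
# and a pure function lifted pointwise via unit.
hesperus = lambda i: "evening-star" if i == "reza" else "venus"
shout = lambda x: unit(x.upper())

print(bind(hesperus, shout)("reza"))     # EVENING-STAR
print(bind(hesperus, shout)("speaker"))  # VENUS
```

The monad laws hold for these definitions — e.g. the left identity bind(unit(x), f) behaves exactly like f(x) — which corresponds to the reductions the authors leave implicit in their analyses.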
λx.t : B ` ♦ ` ♦ Figure 1: Sequent calculus for a fragment of multiplicative linear logic enriched with a monadic modality, together with a Curry-Howard correspondence between formulae and meaning terms. provable theorems in this logic. the entire set of resources available. This shows that the compositionality of our monadic approach A B C, A, B C (11) ( ( ` ♦ cannot be equivalently recapitulated in a simple A B C, A, B C (12) type theory. ( ( ♦ ` ♦ A B C, A, B C (13) ( ( ♦ ` ♦ A B C, A, B C (14) 3 Examples ( ( ♦ ♦ ` ♦ The starting point for the analysis of examples (1)- In contrast, the following is not a theorem in the (4) is the lexicon in table 1. The lexicon represents logic: the linguistic knowledge of the speaker, and at A ( B ( C,I ( A, I ( B I ( C (15) the same time her knowledge about other agents’ 6` grammars. In general, then, if we were to simply lift the type Most lexical entries are standard, since we do of the lexical resources whose interpretation may not need to change the type and denotation of lex- be dependent on a specific point of view, we would ical items that are not involved in the phenomena be forced to lift all linguistic expressions that may under discussion. So, for instance, logical opera- combine with them, thus generalizing to the worst tors such as not and but are interpreted in the stan- case after all. dard non-monadic way, as is a verb like punch or The monadic machinery also achieves a higher kill. Referring expressions that are possibly con- level of compositionality. In principle we could tentious, in the sense that they can be interpreted directly encode our monad using the type con- → differently by the speaker and other agents, instead structor, with linear implication, (, on the logical have the monadic type ♦e. This is reflected in side. However this alternative encoding wouldn’t their denotation by the fact that their value varies have the same deductive properties. Compare the according to an interpretation index. 
We use a pattern of inferences we have for the monadic special index σ to refer to the speaker’s own per- type, in (11)–(14), with the equivalent one for the spective, and assume that this is the default index simple type: used whenever no other index is specifically in- troduced. For example, in the case of the name A B C, A, B C (16) ( ( ` Spider-Man, the speaker is aware of his secret A B C,I A, B I C (17) ( ( ( ` ( identity and therefore interprets it as another name A B C, A, I B I C (18) for the individual Peter Parker, while Mary Jane ( ( ( ` ( A ( B ( C,I ( A, I ( B (19) and Dr. Octopus consider Spider-Man a different ` entity from Peter Parker. I ( I ( C We assume that sentences are interpreted in a In the case of the simple type, the final formula we model in which all entities are mental entities, i.e. derive depends in some non-trivial way on the en- that there is no direct reference to entities in the tire collection of resources on the left-hand side of world, but only to mental representations. Enti- the sequent. In contrast in the case of the monadic ties are therefore relativized with respect to the type, the same type can be derived for all config- agent that mentally represents them, where non- urations. What is important is that we can pre- contentious entities are always relativized accord- dict the final formula without having to consider ing to the speaker. This allows us to represent the 22 WORD DENOTATION TYPE Reza rσ e Kim kσ e Dr. Octopus oσ e Mary Jane mjσ e Peter Parker ppσ e not λp. p t t ¬ → but λp.λq.p q t t t ∧ → → is λx.λy.x = y e e t → → punch λo.λs.punch(s)(o) e e t → → believe λc.λs.λi.B(s)(c(κ(s))) t e t ♦ → → ♦ love λo.λs.λi.love(s)(o(κ(s))) e e t ♦ → → ♦ esr if i = r, Hesperus λi. ♦e (vσ if i = σ msr if i = r, Phosphorus λi. ♦e (vσ if i = σ smi if i = o or i = mj, Spider-Man λi. ♦e (ppσ if i = σ jr if i = r, Jesus λi. ♦e (jσ if i = σ impk if i = k, Sandy λi. 
♦e (sσ if i = σ Table 1: Speaker’s lexicon fact that different agents may have distinct equiv- while love changes the interpretation of its object alencies between entities. For example, Reza in relative to the perspective of its subject. However our model does not equate the evening star and we will see that the interaction of these lexical en- the morning star, but the speaker equates them tries and the evaluation order imposed by ? will al- with each other and with Venus. Therefore, the low us to let the complement of a verb like believe speaker’s lexicon in table 1 represents the fact and the object of a verb like love escape the spe- that the speaker’s epistemic model includes what cific effect of forcing the subject point of view, and the speaker knows about other agents’ models, instead we will be able to derive readings in which e.g. that Reza has a distinct denotation (from the the arguments of the verb are interpreted using the speaker) for Hesperus, that Mary Jane has a dis- speaker’s point of view. tinct representation for Spider-Man, that Kim has Figure 2 reports the four non-equivalent read- a distinct representation for Sandy, etc. ings that we derive in our system for example (1), repeated here as (20).6 The other special lexical entries in our lexicon are those for verbs like believe and love. The two (20) Reza doesn’t believe that Hesperus is entries are similar in the sense that they both take Phosphorus. an already monadic resource and actively supply Reading (21) assigns to both Hesperus and a specific interpretation index that corresponds to Phosphorus the subject interpretation and results, the subject of the verbs. The function κ maps each after contextualising the sentence by applying it entity to the corresponding interpretation index, to the standard σ interpretation index, in the truth i.e. κ : e i. For example, in the lexical en- → conditions in (25), i.e. 
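Before walking through the derived readings, the effect of the perspective-shifting entry for believe can be made concrete with a small executable sketch. This is our own simplification, not the authors' implementation: entity names, the direct passing of κ(s), and the mock doxastic operator B (which treats belief in an identity as coincidence of entities in the subject's model) are all illustrative assumptions.

```python
# Illustrative model of table 1: interpretation indices are agent tags,
# and a monadic value is a function from an index to a value.
SIGMA, R = "sigma", "r"          # speaker's index and Reza's index

def unit(x):
    return lambda i: x

def bind(a, f):
    return lambda i: f(a(i))(i)

# Contentious names (type <>e): their referent depends on the index.
hesperus   = lambda i: "es_r" if i == R else "v_sigma"
phosphorus = lambda i: "ms_r" if i == R else "v_sigma"

def is_(x, y):                   # pure identity predicate; a proposition
    return ("eq", x, y)          # is represented here as a tagged pair

def B(index, prop):              # mock doxastic operator: an agent believes
    _, a, b = prop               # an identity iff the two entities coincide
    return a == b                # in that agent's own model

def believe(c, kappa_s):
    # believe = lam c. lam s. lam i. B(s)(c(kappa(s))): the monadic
    # complement c is evaluated at the subject's own index kappa(s).
    return lambda i: B(kappa_s, c(kappa_s))

# Reading where both names are bound inside believe -> Reza's perspective.
r_inside = bind(believe(bind(hesperus, lambda x:
                        bind(phosphorus, lambda y: unit(is_(x, y)))), R),
                lambda z: unit(not z))

# Reading where both names are bound outside believe -> speaker's perspective.
r_outside = bind(hesperus, lambda x:
            bind(phosphorus, lambda y:
            bind(believe(unit(is_(x, y)), R),
                 lambda z: unit(not z))))

print(r_inside(SIGMA))   # True  -- the non-contradictory reading
print(r_outside(SIGMA))  # False -- Reza does believe Venus is Venus
```

The only difference between the two terms is whether ? binds the names inside or outside the complement handed to believe; this is exactly the extra scoping mechanism attributed to ? in section 2.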
that Reza does not be- tries for believe and love, κ maps the subject to the lieve that the evening star is the morning star. This interpretation index of the subject. Thus, the entry 6The logic generates six different readings but the monad for believe uses the subject’s point of view as the we are using here has a commutative behaviour, so four of perspective used to evaluate its entire complement, these readings are pairwise equivalent. 23 believe ( Hesperus ? λx. Phosphorus ? λy.η( is (x)(y)))( Reza ) ? λz.η( not (z)) (21) Hesperus ? λx. believe ( Phosphorus ? λy.η( is (x)(y)))( Reza ) ? λz.η( not (z)) (22) JPhosphorusK J ? λx. KbelieveJ ( HesperusK ? λy.η(JisK (y)(x)))(JRezaK) ? λz.η(JnotK (z)) (23) JHesperusK ? λx. JPhosphorusK J ? λy. believeK (η(J isK (x)(y)))(J RezaK ) ? λz.η(J notK (z)) (24) J K J K J K J K J K J K J FigureK 2: Non-equivalentJ K readingsJ for RezaK doesn’tJ K believeJ HesperusK is Phosphorus.J K reading would not be contradictory in an epistemic The verb punch is not a verb that can change model (such as Reza’s model) where the evening the interpretation perspective and therefore the po- star and the morning star are not the same entity. tentially controversial name Spider-Man is inter- preted in both instances using the speaker’s inter- B(r)(esr = msr) (25) pretation index. The result are unsatisfiable truth ¬ conditions, as expected: In the case of (22) and (23) we get a similar ef- fect although here we mix the epistemic models, punch(oσ)(ppσ) punch(oσ)(ppσ) (30) and one of the referring expressions is interpreted ∧ ¬ under the speaker perspective while the other is In contrast a verb like love is defined in our lex- again interpreted under Reza’s perspective. For icon as possibly changing the interpretation per- these two readings we obtain respectively the truth spective of its object to that of its subject. There- conditions in (26) and (27). 
fore in the case of a sentence like (3), we ex- pect one reading where the potentially contentious B(r)(v = ms ) (26) ¬ σ r name Spider-Man is interpreted according to the B(r)(v = es ) (27) ¬ σ r subject of love, Mary Jane. This is in fact the re- sult we obtain. Figure 3 reports the two readings Finally for (24) we get the contradictory reading that our framework generates for (3). that Reza does not believe that Venus is Venus, as Reading (31), corresponds to the non contradic- both referring expressions are evaluated using the tory interpretation of sentence (3), where Spider- speaker’s interpretation index. Man is interpreted according to Mary Jane’s per- spective and therefore is assigned an entity differ- B(r)(v = v ) (28) ¬ σ σ ent from Peter Parker: The different contexts for the interpretation of love(mj )(pp ) love(mj )(sm ) (33) referring expressions are completely determined σ σ ∧ ¬ σ mj by the order in which we evaluate monadic re- sources. This means that, just by looking at the Reading (32) instead generates unsatisfiable truth linear order of the lambda order, we can check conditions, as Spider-Man is identified with Peter whether a referring expression is evaluated inside Parker according to the speaker’s interpretation: the scope of a perspective changing operator such love(mj )(pp ) love(mj )(pp ) (34) as believe, or if it is interpreted using the standard σ σ ∧ ¬ σ σ interpretation. If we consider a case like sentence (2), we ought Our last example, (4), repeated here as (35), is to get only a contradictory reading as the statement particularly interesting as we are not aware of pre- is intuitively contradictory. Our analysis produces vious work that discusses this type of sentence. a single reading that indeed corresponds to a con- The non-contradictory reading that this sentence tradictory interpretation: has seems to be connected specifically to two dif- ferent interpretations of the same name, Jesus, Spider-Man ? λx. Spider-Man ? 
both under the syntactic scope of the modal be- lieve. λy.η( but ( punch ( Dr. Octopus )(x)) J K J K ( not ( punch ( Dr. Octopus )(y)))) (29) (35) Reza doesn’t believe Jesus is Jesus. J K J K J K J K J K J K 24 love (η( Peter Parker ))( Mary Jane ) ? λp. love ( Spider-Man )( Mary Jane ) ? λq.η( but (p)( not (q))) (31) JloveK (η(JPeter ParkerK))(JMary JaneK) ? λp. JSpider-ManK J ? λx. KloveJ (η(x))( KMary Jane ) ? λq.η(JbutK (p)(JnotK (q))) (32) J K J K J K J K J K J K Figure 3:J Non-equivalentK J K readings for Mary Jane loves Peter Parker but she doesn’t love Spider-Man. believe ( Jesus ? λx. Jesus ? λy.η( is (x)(y)))( Reza ) ? λz.η( not (z)) (36) Jesus ? λx. Jesus ? λy. believe (η( is (x)(y)))( Reza ) ? λz.η( not (z)) (37) JJesus ?K λx.J believeK (JJesusK ? λy.η(JisK (x)(y)))(JRezaK) ? λz.η(JnotK (z)) (38) J K J K J K J K J K J K J FigureK 4: Non-equivalentJ K J readingsK forJ RezaK doesn’tJ believeK Jesus isJ Jesus.K Our system generates three non-equivalent read- 4 Comparison with traditional ings, reported here in figure 4.7 approaches Reading (36) and (37) corresponds to two con- tradictory readings of the sentence: in the first In this section we try to sketch how a traditional case both instances of the name Jesus are inter- approach to opaque contexts, such as one based preted from the subject perspective and therefore on a de dicto/de re ambiguity with respect to a attribute to Reza the non-belief in a tautology, sim- modal operator, would fare in the analysis of (4), ilarly in the second case, even though in this case our most challenging example. the two names are interpreted from the perspec- To try to explain the two readings in the con- tive of the speaker. In contrast the reading in (38) text of a standard possible worlds semantics, we corresponds to the interpretation that assigns two could take (4) to be ambiguous with respect to a different referents to the two instances of the name de dicto/de re reading. 
In the case of the de dicto Jesus, producing the truth conditions in (39) which reading (which corresponds to the non-satisfiable are satisfiable in a suitable model. reading) the two names are evaluated under the scope of the doxastic operator believe, i.e. they B(r)(jσ = jr) (39) ¬ both refer to the same entity that is assigned to the The analysis of the Capgras example (5), re- name Jesus in each accessible world. Clearly this peated in (40), is equivalent; the non-contradictory is always the case, and so (4) is not satisfiable. In reading is shown in (41). the case of the de re reading, we assume that the two names are evaluated at different worlds that (40) Kim doesn’t believe Sandy is Sandy. assign different referents to the two names. One of these two worlds will be the actual world and B(k)(sσ = impk) (41) ¬ the other one of the accesible worlds. The reading We use impk as the speaker’s representation of is satisfiable if the doxastic modality links the ac- the “impostor” that Kim thinks has taken the place tual world with one in which the name Jesus refers of Sandy. to a different entity. Notice that for this analysis to More generally, there are again three non- work we need to make two assumptions: 1. that equivalent readings, including the one above, names behave as quantifiers with the property of which are just those in figure 4, with Jesus re- escaping modal contexts, 2. that names can be as- placed by Sandy and Reza replaced by Kim . signed different referents in different worlds, i.e. 7Again, there are six readings that correspondJ to differentK we have to abandon the standard notion that names proofs, but givenJ theK commutativeJ K behaviour of theJ ReaderK are rigid designators (Kripke, 1972). In contrast, monad, the fact that equality is commutative, and the fact that we have in this case two identical lexical items, only three of in our approach we do not need to abandon the them are non-equivalent readings. 
However, such an approach would present a number of rather serious problems. The first is connected with the assumption that names are scopeless. This is a common hypothesis in natural language semantics, and indeed if we model names as generalized quantifiers they can be proven to be scopeless (Zimmermann, 1993). But this is problematic for our example. In fact we would predict that both instances of the name Jesus escape the scope of believe. The resulting reading would bind the quantified individual to the interpretation of Jesus in the actual world. In this way we only capture the non-satisfiable reading. To save the scopal approach we would need to assume that names are in fact sometimes interpreted in the scope of modal operators. One way to do this would be to set up our semantic derivations so that they allow different scopal relations between quantifiers and other operators. The problem with this solution is that for sentences like (4) we generate twelve different derivations, some of which do not correspond to valid readings of the sentence.

Even assuming that we find a satisfactory solution for these problems, the scopal approach cannot really capture the intuitions behind opacity in all contexts. Consider again (4) and assume that there are two views about Jesus: Jesus as a divine being and Jesus as a human being. Assume that Jesus is a human being in the actual world and that Reza is an atheist; then the only possible reading is the non-satisfiable one, as the referent for Jesus will be the same in the actual world and all accessible Reza-belief-worlds. The problem is that the scopal approach assumes a single modal model, while in this case it seems that there are two doxastic models under discussion, Reza's model and the speaker's model. In contrast, in our approach the relevant part of Reza's model is embedded inside the speaker's model, and interpretation indices indicate which interpretation belongs to Reza and which to the speaker.

Finally, an account of modality in terms of scopal properties is necessarily limited to cases in which modal operators are present. While this may be a valid position in the case of typical intensional verbs like seek or want, it is not clear how we could extend this approach to cases like (3), as the verb love has no clear modal connotation. Thus, the scopal approach would not be sufficiently general.

5 Conclusion

We started by discussing a diverse collection of expressions that share the common property of showing nontrivial referential behaviours. We have proposed a common analysis of all these expressions in terms of a combination of different interpretation contexts. We have claimed that the switch to a different interpretation context is triggered by specific lexical items, such as modal verbs but also verbs that express some kind of mental attitude of the subject of the verb towards its object. The context switch is not obligatory, as witnessed by the multiple readings that the sentences discussed seem to have. We implemented our analysis using monads. The main idea of our formal implementation is that referring expressions that have a potential dependency on an interpretation context can be implemented as functions from interpretation indices to fully interpreted values. Similarly, the linguistic triggers for a context switch are implemented in the lexicon as functions that can modify the interpretation context of their arguments. Monads allow us to freely combine these "lifted" meanings with standard ones, avoiding in this way the need to generalize our lexicon to the worst case. We have also seen how more traditional approaches, while capable of dealing with some of the examples we discuss, are not capable of providing a generalised explanation of the observed phenomena.

Acknowledgements

The authors would like to thank our anonymous reviewers for their comments. This research is supported by a Marie Curie Intra-European Fellowship from the European Commission under contract number 327811 (Giorgolo) and an Early Researcher Award from the Ontario Ministry of Research and Innovation and NSERC Discovery Grant #371969 (Asudeh).

References

Ash Asudeh. 2012. The Logic of Pronominal Resumption. Oxford Studies in Theoretical Linguistics. Oxford University Press, New York.

Nick Benton, G. M. Bierman, and Valeria de Paiva. 1998. Computational types from a logical perspective. Journal of Functional Programming, 8(2):177–193.

Mary Dalrymple. 2001. Lexical Functional Grammar. Academic Press, San Diego, CA.

Chris Fox and Shalom Lappin. 2005. Foundations of Intensional Semantics. Blackwell, Oxford.

Gottlob Frege. 1892. Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik, 100:25–50.

Gottlob Frege. 1952. On sense and reference. In Peter T. Geach and Max Black, editors, Translations from the Philosophical Writings of Gottlob Frege, pages 56–78. Blackwell, Oxford. Translation of Frege (1892).

Irene Heim. 1998. Anaphora and semantic interpretation: A reinterpretation of Reinhart's approach. In Uli Sauerland and Orin Percus, editors, The Interpretive Tract, volume 25 of MIT Working Papers in Linguistics, pages 205–246. MITWPL, Cambridge, MA.

Saul Kripke. 1972. Naming and necessity. In Donald Davidson and Gilbert Harman, editors, Semantics of Natural Language, pages 253–355. Reidel.

Eugenio Moggi. 1989.
Computational lambda-calculus and monads. In LICS, pages 14–23. IEEE Computer Society.

Michael Moortgat. 2011. Categorial type logics. In Johan van Benthem and Alice ter Meulen, editors, Handbook of Logic and Language, pages 95–179. Elsevier, second edition.

Chung-chieh Shan. 2001. Monads for natural language semantics. In Kristina Striegnitz, editor, Proceedings of the ESSLLI-2001 Student Session, pages 285–298. 13th European Summer School in Logic, Language and Information.

Thomas Ede Zimmermann. 1993. Scopeless quantifiers and operators. Journal of Philosophical Logic, 22(5):545–561, October.

The Phenogrammar of Coordination

Chris Worth
Department of Linguistics
The Ohio State University
[email protected]

Abstract

Linear Categorial Grammar (LinCG) is a sign-based, Curryesque, relational, logical categorial grammar (CG) whose central architecture is based on linear logic. Curryesque grammars separate the abstract combinatorics (tectogrammar) of linguistic expressions from their concrete, audible representations (phenogrammar). Most of these grammars encode linear order in string-based lambda terms, in which there is no obvious way to distinguish right from left. Without some notion of directionality, grammars are unable to differentiate, say, subject and object for purposes of building functorial coordinate structures. We introduce the notion of a phenominator as a way to encode the term structure of a functor separately from its "string support". This technology is then employed to analyze a range of coordination phenomena typically left unaddressed by Linear Logic-based Curryesque frameworks.

1 Overview

Flexibility to the notion of constituency in conjunction with introduction (and composition) rules has allowed categorial grammars to successfully address an entire host of coordination phenomena in a transparent and compositional manner. While "Curryesque" CGs as a rule do not suffer from some of the other difficulties that plague Lambek CGs, many are notably deficient in one area: coordination. Lest we throw the baby out with the bathwater, this is an issue that needs to be addressed. We take the following to be an exemplary subset of the relevant data, and adopt a fragment methodology to show how it may be analyzed.

(1) Tyrion and Joffrey drank.
(2) Joffrey whined and sniveled.
(3) Tyrion slapped and Tywin chastised Joffrey.

The first example is a straightforward instance of noun phrase coordination. The second and third are both instances of what has become known in the categorial grammar literature as "functor coordination", that is, the coordination of linguistic material that is in some way incomplete. The third is particularly noteworthy as being an example of a "right node raising" construction, whereby the right argument Joffrey serves as the object to both of the higher NP-verb complexes. We will show that all three examples can be given an uncomplicated account in the Curryesque framework of Linear Categorial Grammar (LinCG), and that (2) and (3) have more in common than not.

Section 1 provides an overview of the data and central issues surrounding an analysis of coordination in Curryesque grammars. Section 2 introduces the reader to the framework of LinCG, and presents the technical innovations at the heart of this paper. Section 3 gives lexical entries and derivations for the examples in section 1, and section 4 discusses our results and suggests some directions for research in the near future, with references following.

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 28–36, Gothenburg, Sweden, April 26-30 2014. c 2014 Association for Computational Linguistics

1.1 Curryesque grammars and Linear Categorial Grammar

We take as our starting point the Curryesque (after Curry (1961)) tradition of categorial grammars, making particular reference to those originating with Oehrle (1994) and continuing with the Abstract Categorial Grammar (ACG) of de Groote (2001), Muskens (2010)'s Lambda Grammar (λG), Kubota and Levine's Hybrid Type-Logical Categorial Grammar (Kubota and Levine, 2012), and to a lesser extent the Grammatical Framework of Ranta (2004), and others. These dialects of categorial grammar make a distinction between Tectogrammar, or "abstract syntax", and Phenogrammar, or "concrete syntax". Tectogrammar is primarily concerned with the structural properties of grammar, among them co-occurrence, case, agreement, tense, and so forth. Phenogrammar is concerned with computing a pre-phonological representation of what will eventually be produced by the speaker, and encompasses word order, morphology, prosody, and the like.

Linear Categorial Grammar (LinCG) is a sign-based, Curryesque, relational, logical categorial grammar whose central architecture is based on linear logic. Abbreviatory overlap has been a regrettably persistent problem, and LinCG is the same in essence as the framework varyingly called Linear Grammar (LG) and Pheno-Tecto-Differentiated Categorial Grammar (PTDCG), and developed in Smith (2010), Mihalicek (2012), Martin (2013), Pollard and Smith (2012), and Pollard (2013). In LinCG, the syntax-phonology and syntax-semantics interfaces amount to noting that the logics for the phenogrammar, the tectogrammar, and the semantics operate in parallel. This stands in contrast to 'syntactocentric' theories of grammar, where syntax is taken to be the fundamental domain within which expressions combine, and then phonology and semantics are 'read off' of the syntactic representation. LinCG is conceptually different in that it has relational, rather than functional, interfaces between the three components of the grammar. Since we do not interpret syntactic types into phenogrammatical or semantic types, this allows us a great deal of freedom within each logic, although in practice we maintain a fairly tight connection between all three components. Grammar rules take the form of derivational rules which generate triples called signs, and they bind together the three logics so that they operate concurrently. While the invocation of a grammar rule might simply be, say, point-wise application, the ramifications for the three systems can in principle be different; one can imagine expressions which exhibit type asymmetry in various ways.

By way of example, one might think of 'focus' as an operation which has reflexes in all three aspects of the grammar: it applies pitch accents to the target string(s) in the phenogrammar (the difference between accented and unaccented words being reflected in the phenotype), it creates 'lowering' operators in the tectogrammar (that is, expressions which scope within a continuation), and it 'focuses' a particular meaningful unit in the semantics. A focused expression might share its tectotype ((NP ⊸ S) ⊸ S) with, say, a quantified noun phrase, but the two could have different phenotypes, reflecting the accentuation or lack thereof by placing the resulting expression in the domain of prosodic boundary phenomena or not. Nevertheless, the system is constrained by the fact that the tectogrammar is based on linear logic, so if we take some care when writing grammar rules, we should still find resource sensitivity to be at the heart of the framework.

1.2 Why coordination is difficult for Curryesque grammars

Most Curryesque CGs encode linear order in lambda terms, and there is no obvious way to distinguish 'right' from 'left' by examining the types (be they linear or intuitionistic).¹ This is not a problem when we are coordinating strings directly, as de Groote and Maarek (2007) show, but an analysis of the more difficult case of functor coordination remains elusive.² Without some notion of directionality, grammars are unable to distinguish between, say, subject and object. This would seem to predict, for example, that λs. s · SLAPPED · JOFFREY and λs. TYRION · SLAPPED · s would have the same syntactic category (NP ⊸ S in the tectogrammar, and St → St in the phenogrammar), and would thus be compatible under coordination, but this is generally not the case. What we need is a way to examine the structure of a lambda term independently of the specific string constants that comprise it. To put it another way, in order to coordinate functors, we need to be able to distinguish between what Oehrle (1995) calls their string support, that is, the string constants which make up the body of a particular functional term, and the linearization structure such functors impose on their arguments.

¹ A noteworthy exception is Ranta's Grammatical Framework (GF), explored in, e.g., Ranta (2004) and Ranta (2009). GF also makes distinctions between tectogrammar and phenogrammar, though it has a somewhat different conception of each.
² A problem explicitly recognized by Kubota (2010) in section 3.2.1.

2 Linear Categorial Grammar (LinCG)

Curryesque grammars separate the notion of linear order from the abstract combinatorics of linguistic
Linear logic is generally de- lambda terms: scribed as being “resource-sensitive”, owing to the λx : A. b = λy : A. [y/x]b (α-conversion) ` lack of the structural rules of weakening and con- (λx. b a) = [a/x]b (β-reduction) ` traction. Resource sensitivity is an attractive no- As is common to any number of Curryesque tion, theoretically, since it allows us to describe frameworks, we encode the phenogrammatical processes of resource production, consumption, parts of LinCG signs with typed lambda terms and combination in a manner which is agnostic consisting of strings, and functions over strings.4 about precisely how resources are combined. Cer- We axiomatize our theory of strings in the familiar tain problems which have been historically tricky way: for Lambek categorial grammars (medial extrac- : St tion, quantifier scope, etc.) are easily handled by ` : St St St LinCG. ` · → → stu : St. s (t u) = (s t) u Since a full introduction to the framework is re- ` ∀ · · · · s : St. s = s = s grettably impossible given current constraints, we ` ∀ · · refer the interested reader to the references in sec- The first axiom asserts that the empty string is tion 1.1, which contain a more in-depth discus- a string. The second axiom asserts that concate- nation, written , is a (curried) binary function on sion of the potential richness of the architecture · of LinCG. We do not wish to say anything new strings. The third axiom represents the fact that about the semantics or the tectogrammar of coor- concatenation is associative, and the fourth, that dination in the current discussion, so we will ex- the empty string is a two-sided identity for con- pend our time fleshing out the phenogrammatical catenation. Because of the associativity of con- component of the framework, and it is to this topic catenation, we will drop parentheses as a matter that we now turn. of convention. 
The phenogrammar of a typical LinCG sign will 2.1 LinCG Phenogrammar resemble the following (with one complication to LinCG grammar rules take the form of tripartite be added shortly): inference rules, indicating what operations take λs. s SNIVELED : St St place pointwise within each component of the ` · → signs in question. There are two main gram- Since we treat St as the only base type, we will generally omit typing judgments in lambda terms mar rules, called application (App) for combining when no confusion will result. Furthermore, we signs, and abstraction (Abs) for creating the po- tential for combination through hypothetical rea- use SMALLCAPS to indicate that a particular con- soning. Aside from the lexical entries given as stant is a string. So, the preceding lexical entry s axioms of the theory, it is also possible to obtain provides us with a function from some string , to strings, which concatenates the string SNIVELED typed variables using the rule of axiom (Ax), and s we make use of this rule in the analysis of right to the right of . node raising found in section 3.4. While the tec- 2.1.1 Phenominators togrammar of LinCG is based on a fragment of The center of our analysis of coordination is linear logic, the phenogrammatical and semantic the notion of a phenominator (short for pheno- components are based on higher order logic. Since combinator), a particular variety of typed lambda we are concerned only with the phenogrammatical term. Intuitively, phenominators serve the same component here, we have chosen to simplify the purpose for LinCG that bilinear (slash) types do exposition by presenting only the phenogrammat- for Lambek categorial grammars. Specifically, ical part of the rules of application and abstraction: 3 Ax The exact status of the rule of η-conversion with respect f : A f : A to this framework is currently unclear, and since we do not ` make use of it, we omit its discussion here. 
Γ f : A B ∆ a : A 4 ` → ` App although other structures have been proposed, e.g. the Γ, ∆ (f a): B node sets found in Muskens (2001). ` 30 they encode the linearization structure of a func- long as the arguments in question are immediately tor, that is, where arguments may eventually oc- adjacent to the string support at each successive cur with respect to its string support. To put it an- application, it is possible to permute them to some other way, a phenominator describes the structure extent without losing the general thrust of the anal- a functor “projects”, in terms of linear order. ysis. For example, the choice to have transitive From a technical standpoint, we would like to verbs take their object arguments first is insignifi- define a phenominator as a closed monoidal linear cant.5 Since strings are implicitly under the image lambda term, i.e. a term containing no constants of the identity phenominator λs.s, we will consis- other than concatenation and the empty string. tently omit this subscript. The idea is that phenominators are the terms of We will be able to define a function we call the higher order theory of monoids, and they in say, so that it will have the following property: some ways describe the abstract “shape” of pos- s : St. ϕ :Φ.say (ϕ s) = s sible string functions. For those accustomed to ` ∀ ∀ thinking of “syntax” as being word order, then That is, say is a left inverse for unary phenomi- phenominators can be thought of as a kind of syn- nators. tactic combinator. In practice, we will make use The function say is defined recursively via cer- only of what we call the unary phenominators, the tain objects we call vacuities. The idea of a vacu- types of which we will refer to using the sort Φ ity is that it be in some way an “empty argument” (with ϕ used by custom as a metavariable over to which a functional term may apply. If we are unary phenominators, i.e. terms whose type is in dealing with functions taking string arguments, it Φ). 
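The "shape" encoding that phenominators provide can be illustrated with a small sketch. The following is a toy Python rendering, assuming space-separated concatenation and invented string constants; the VP and RNR shapes correspond to the paper's λvs.s · v and λvs.v · s:

```python
# A sketch of phenominators as Python functions on strings. Space-
# joining stands in for the paper's concatenation "·"; the string
# constants are invented. The first argument v is the string support;
# the phenominator records where further arguments land around it.

def cat(*parts):
    # Concatenation over St, ignoring empty strings (the identity ε).
    return " ".join(p for p in parts if p)

VP  = lambda v: lambda s: cat(s, v)               # λvs. s · v
TV  = lambda v: lambda s: lambda t: cat(t, v, s)  # λvst. t · v · s
RNR = lambda v: lambda s: cat(v, s)               # λvs. v · s

# Two functors with the same type St -> St but different linearization
# structure, the distinction the types alone cannot see:
f1 = VP(cat("SLAPPED", "JOFFREY"))   # λs. s · SLAPPED · JOFFREY
f2 = RNR(cat("TYRION", "SLAPPED"))   # λs. TYRION · SLAPPED · s

print(f1("TYRION"))   # TYRION SLAPPED JOFFREY
print(f2("JOFFREY"))  # TYRION SLAPPED JOFFREY
```

Both functors can yield the same sentence, but the phenominator under whose image each falls (VP versus RNR) records exactly the directionality difference discussed in section 1.2.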
These are not unary in the strict sense, but they will have as their centerpiece one particular string variable, which will be bound with the highest scope. We will generally abbreviate phenominators by the construction with which they are most commonly associated: VP for verb phrases and intransitive verbs, TV for transitive verbs, DTV for ditransitive verbs, QNP for quantified noun phrases, and RNR for right node raising constructions. Here are examples of some of the most common phenominators we will make use of and the abbreviations we customarily use for them:

  Phenominator          Abbreviation
  λs.s                  (omitted)
  λvs.s · v             VP
  λvst.t · v · s        TV
  λvstu.u · v · s · t   DTV
  λvP.(P v)             QNP
  λvs.v · s             RNR

As indicated previously, the first argument of a phenominator always corresponds to what we refer to (after Oehrle (1995)) as the string support of a particular term. With the first argument dispensed with, we have chosen the argument order of the phenominators out of general concern for what we perceive to be fairly uncontroversial categorial analyses of English grammatical phenomena. That is, transitive verbs take their object arguments first, and then their subject arguments; ditransitives take their first and second object arguments, followed by their subject argument, etc. As long as the arguments in question are immediately adjacent to the string support at each successive application, it is possible to permute them to some extent without losing the general thrust of the analysis. For example, the choice to have transitive verbs take their object arguments first is insignificant.⁵ Since strings are implicitly under the image of the identity phenominator λs.s, we will consistently omit this subscript.

⁵ Since we believe it is possible to embed Lambek categorial grammars in LinCG, this fact reflects that the calculus we are dealing with is similar to the associative Lambek Calculus.

We will be able to define a function we call say, so that it will have the following property:

  ⊢ ∀s : St. ∀ϕ : Φ. say (ϕ s) = s

That is, say is a left inverse for unary phenominators.

The function say is defined recursively via certain objects we call vacuities. The idea of a vacuity is that it be in some way an "empty argument" to which a functional term may apply. If we are dealing with functions taking string arguments, it seems obvious that the vacuity on strings should be the empty string ε. If we are dealing with second-order functions taking St → St arguments, for example, quantified noun phrases like everyone, then the vacuity on St → St should be the identity function on strings, λs.s. Higher vacuities than these become more complicated, and defining all of the higher-order vacuities is not entirely straightforward, as certain types are not guaranteed to have a unique vacuity. Fortunately, we can do it for any higher function taking as an argument another function under the image of a phenominator; then the vacuity on such a function is just the phenominator applied to the empty string.⁶ The central idea is easily understood when one asks what, say, a vacuous transitive verb sounds like. The answer seems to be: by itself, nothing, but it imposes a certain order on its arguments. One practical application of this clause is in analyzing so-called "argument cluster coordination", where this definition will ensure that the argument cluster gets linearized in the correct manner. This analysis is regrettably just outside the scope of the current inquiry, though the notion of the phenominator can be profitably employed to provide exactly such an analysis by adopting and reinterpreting a categorial account along the lines of the one given in Dowty (1988).

⁶ A reviewer suggests that this concept may be related to the "context passing representation" of Hughes (1995), and the association of a nil term with its continuation with respect to contexts is assuredly evocative of the association of the vacuity on a phenominator-indexed type with the continuation of ε with respect to a phenominator.

We formally define vacuities as follows:

  vac_{St→St} =def λs.s
  vac_{τϕ} =def (ϕ ε)

The reader should note that as a special case of the second clause, we have

  vac_{St_{λs.s}} = (λs.s ε) = ε = vac_St

This in turn enables us to define say:

  say_St =def λs.s
  say_{τ1→τ2} =def λk : τ1 → τ2. say_{τ2} (k vac_{τ1})
  say_{(τ1→τ2)ϕ} =def say_{τ1→τ2}

2.1.2 Subtyping by unary phenominators

In order to augment our type theory with the relevant subtypes, we turn to Lambek and Scott (1986), who hold that one way to do subtyping is by defining predicates that amount to the characteristic function of the particular subtype in question, and then ensuring that these predicates meet certain axioms embedding the subtype into the supertype. We will be able to write such predicates using phenominators. A unary phenominator is one which has under its image a function whose string support is a single contiguous string. With this idea in place, we are able to assign subtypes to functional types in the following way. For τ a (functional) type, we write τϕ (with ϕ a phenominator) as shorthand for τ_{ϕ′}, where:

  ϕ′ = λf : τ. ∃s : St. f = (ϕ s)

Then ϕ′ constitutes a subtyping predicate in the manner of Lambek and Scott (1986). For example, let τ = St → St and ϕ = λvs.s · v. Let us consider the following (putative) lexical entry (pheno only):

  ⊢ λs′. s′ · SNIVELED : (St → St)_VP

Then our typing is justified along the following lines:

  τϕ ::= (St → St)_VP
      ::= (St → St)_{λvs.s·v}
      ::= (St → St)_{λf:St→St. ∃t:St. f = (λvs.s·v t)}

So applying the subtyping predicate to the term in question, we have

  (λf : St → St. ∃t : St. f = (λvs.s·v t)) (λs′. s′ · SNIVELED)
  = ∃t : St. λs′. s′ · SNIVELED = (λvs.s·v t)
  = ∃t : St. λs′. s′ · SNIVELED = λs. s · t

which is true with t = SNIVELED, and the term is shown to be well-typed.

For an expedient example, we can apply say to our putative lexical entry from earlier, and verify that it will reduce to the string SNIVELED as desired:

  say_{St→St} (λs. s · SNIVELED)
  = (λk : St → St. say_St (k vac_St)) (λs. s · SNIVELED)
  = say_St ((λs. s · SNIVELED) vac_St)
  = say_St ((λs. s · SNIVELED) ε)
  = say_St (ε · SNIVELED)
  = say_St SNIVELED
  = (λs.s) SNIVELED
  = SNIVELED

3 Analysis

The basic strategy underlying our analysis of coordination is that in order to coordinate two linguistic signs, we need to track two things: their linearization structure, and their string support. If we have access to the linearization structure of each conjunct, then we can check to see that it is the same, and that the signs are compatible for coordination. Furthermore, we will be able to maintain this structure independent of the actual string support of the individual signs. Phenominators simultaneously allow us to check the linearization structure of coordination candidates and to reconstruct the relevant linearization functions after coordination has taken place. The function say addresses the second point. For a given sign, we can apply say to it in order to retrieve its string support. Then, we will be able to directly coordinate the resulting strings by concatenating them with a conjunction in between. Finally, we can apply the phenominator to the resulting string and retrieve the new linearization function, containing the entire coordinate structure as its string support.

3.1 Lexical entries

In LinCG, lexical entries constitute the (nonlogical) axioms of the proof theory. First we consider the simplest elements of our fragment, the phenos for the proper names Joffrey, Tyrion, and Tywin:

  (4) a. ⊢ JOFFREY : St
      b. ⊢ TYRION : St
      c. ⊢ TYWIN : St

Next, we consider the intransitive verbs drank, sniveled and whined:
c2 AND c1 : St St St The transitive verbs chastised and slapped seek ` · · → → to linearize their first string argument to the right, Since our terms at times become rather large, resulting in a function under the image of the VP we will adopt a convention where proof trees are phenominator, and their second argument to the given with numerical indexes instead of sequents, left, resulting in a string. with the corresponding sequents following below (at times on multiple lines). We will from time (6) a. λst. t CHASTISED s to time elide multiple steps of reduction, noting in ` · · :(St St St)TV passing the relevant definitions to consider when → → b. λst. t SLAPPED s reconstructing the proof. ` · · :(St St St)TV 1 2 → → 3 4 Technically, this type could be written (St → 6 5 (St St) ) , but for the purposes of coordina- 7 → VP TV tion, the present is sufficient. Each of these entries 1. λc1 : St. λc2 : St. ` is under the image of the “transitive verb” phe- λs.s ((saySt c2) AND (saySt c1)) · · nominator λvst.t v s. : St St St · · → → Finally, we come to the lexical entry schema for 2. JOFFREY : St and: ` 3. λc : St. ` 2 (7) λc : τ . λc : τ . λs.s ((saySt c2) AND (saySt JOFFREY)) ` 1 ϕ 2 ϕ · · ϕ ((say c ) AND (say c )) = λc2 : St. τϕ 2 · · τϕ 1 : τ τ τ λs.s ((saySt c2) AND (λs.s JOFFREY)) ϕ → ϕ → ϕ · · = λc : St. λs.s ((say c ) AND JOFFREY) 2 St 2 · · We note first that it takes two arguments of iden- : St St tical types τ, and furthermore that these must be → 4. TYRION : St under the image of the same phenominator ϕ. It ` 7 then returns an expression of the same subtype. 5. λs.s ((saySt TYRION) AND JOFFREY): ` · · This mechanism bears more detailed examination. St First, each conjunct is subjected to the function = λs.s ((λs.s TYRION) AND JOFFREY): St · · say, which, given its type, will return the string = λs.s (TYRION AND JOFFREY): St · · support of the conjunct. 
Then, the resulting strings = TYRION AND JOFFREY : St · · AND are concatenated to either side of the string . 6. λs. s DRANK :(St St)VP Finally, the phenominator of each argument is ap- ` · → 7. TYRION AND JOFFREY DRANK : St plied to the resulting string, creating a function ` · · · identical to the linearization functions of each of 3.3 Functor coordination the conjuncts, except with the coordinated string Here, in order to understand the term appearing in the relevant position. in each conjunct, it is helpful to notice that the following equality holds (with f a function from 3.2 String coordination strings to strings, under the image of the VP phe- String coordination is direct and straightforward. nominator): Since string-typed terms are under the image of f :(St St)VP say(St St) f → ` → VP 7 Since ϕ occurs within both the body of the term and the = saySt St f → subtyping predicate, we note that this effectively takes us into = saySt (f vacSt) the realm of dependent types. Making the type theory of the = say (f ) phenogrammar precise is an ongoing area of research, and we St are aware that constraining the type system is of paramount = λs.s (f ) importance for computational tractability. = (f ): St 33 This says that to coordinate VPs, we will first need The difference is that we encode directionality in to reduce them to their string support by feeding the phenominator, rather than in the type. Since their linearization functions the empty string. For our system does not include function composition the sake of brevity, this term reduction will be as a rule, but as a theorem, we will need to make elided from steps 5 and 8 in the derivations be- use of hypothetical reasoning in order to permute low. Steps 2 and 6 constitute the hypothesizing the order of the string arguments in order to con- and subsequent withdrawal of an ‘object’ string struct expressions with the correct structure.8 argument t0, as do steps 10 and 14 (s0). 
Formatting restrictions prohibit rule-labeling on the proof trees, so we note that the hypothesis and withdrawal steps are instances of the rules of axiom (Ax) and abstraction (Abs), respectively.

(proof tree omitted; the numerical indices 1-7 label the sequents below)

1. ⊢ λc1 : (St → St)_VP. λc2 : (St → St)_VP. (λvs.s · v) ((say_(St→St)_VP c2) · AND · (say_(St→St)_VP c1)) : (St → St)_VP → (St → St)_VP → (St → St)_VP
2. ⊢ λs. s · SNIVELED : (St → St)_VP
3. ⊢ λc2 : (St → St)_VP. (λvs.s · v) ((say_(St→St)_VP c2) · AND · (say_(St→St)_VP (λs. s · SNIVELED)))
   ⋮
   = λc2 : (St → St)_VP. (λvs.s · v) ((say_(St→St)_VP c2) · AND · SNIVELED) : (St → St)_VP → (St → St)_VP
4. ⊢ λs. s · WHINED : (St → St)_VP
5. ⊢ (λvs.s · v) ((say_(St→St)_VP (λs. s · WHINED)) · AND · SNIVELED)
   ⋮
   = (λvs.s · v) (WHINED · AND · SNIVELED)
   = λs. s · WHINED · AND · SNIVELED : (St → St)_VP
6. ⊢ JOFFREY : St
7. ⊢ JOFFREY · WHINED · AND · SNIVELED : St

3.4 Right node raising

In the end, 'right node raising' constructions prove only to be a special case of functor coordination. The key here is the licensing of the 'rightward-looking' functors, which are under the image of the phenominator λvs. v · s. As was the case with the 'leftward-looking' functor coordination example in section 3.3, this analysis is essentially the same as the well-known Lambek categorial grammar analysis originating in Steedman (1985) and continuing in Dowty (1988) and Morrill (1994).

As with the functor coordination example in section 3.3, applying say to the conjuncts passes them the empty string, reducing them to their string support:

f : (St → St)_RNR ⊢ say_(St→St)_RNR f
  = say_(St→St) f
  = say_St (f vac_St)
  = λs.s (f vac_St)
  = (f vac_St) : St

As before, this reduction is elided in the proof given below, occurring in steps 8 and 15.

(proof tree omitted; the numerical indices 1-17 label the sequents below)

1. ⊢ λst. t · CHASTISED · s : (St → St → St)_TV
2. t0 : St ⊢ t0 : St
3. t0 : St ⊢ λt. t · CHASTISED · t0 : (St → St)_VP
4. ⊢ TYWIN : St
5. t0 : St ⊢ TYWIN · CHASTISED · t0 : St
6. ⊢ λt0. TYWIN · CHASTISED · t0 : (St → St)_RNR
7. ⊢ λc1 : (St → St)_RNR. λc2 : (St → St)_RNR. (λvs.v · s) ((say_(St→St)_RNR c2) · AND · (say_(St→St)_RNR c1)) : (St → St)_RNR → (St → St)_RNR → (St → St)_RNR
8. ⊢ λc2 : (St → St)_RNR. (λvs.v · s) ((say_(St→St)_RNR c2) · AND · (say_(St→St)_RNR (λt0. TYWIN · CHASTISED · t0)))
   ⋮
   = λc2 : (St → St)_RNR. (λvs.v · s) ((say_(St→St)_RNR c2) · AND · TYWIN · CHASTISED) : (St → St)_RNR → (St → St)_RNR
9. ⊢ λst. t · SLAPPED · s : (St → St → St)_TV
10. s0 : St ⊢ s0 : St
11. s0 : St ⊢ λt. t · SLAPPED · s0 : (St → St)_VP
12. ⊢ TYRION : St
13. s0 : St ⊢ TYRION · SLAPPED · s0 : St
14. ⊢ λs0. TYRION · SLAPPED · s0 : (St → St)_RNR
15. ⊢ (λvs.v · s) ((say_(St→St)_RNR (λs0. TYRION · SLAPPED · s0)) · AND · TYWIN · CHASTISED)
    ⋮
    = (λvs.v · s) (TYRION · SLAPPED · AND · TYWIN · CHASTISED)
    = λs. TYRION · SLAPPED · AND · TYWIN · CHASTISED · s : (St → St)_RNR
16. ⊢ JOFFREY : St
17. ⊢ TYRION · SLAPPED · AND · TYWIN · CHASTISED · JOFFREY : St

[Footnote 8: Regrettably, space constraints prohibit a discussion verifying the typing for the 'right node raised' terms. The reader can verify that the terms are in fact well-typed given the subtyping schema in section 2.1.2. It is possible to write inference rules that speak directly to the introduction and elimination of the relevant functional subtypes, but these are omitted here for the sake of brevity.]

4 Discussion

We have provided a brief introduction to the framework of Linear Categorial Grammar (LinCG). One of the primary strengths of categorial grammar in general has been its ability to address coordination phenomena. Coordination presents a particular problem for grammars which distinguish between structural combination (tectogrammar) and the actual linear order of the strings generated by such grammars (part of phenogrammar). Due to the inability to distinguish 'directionality' in string functors within a standard typed lambda calculus, a general analysis of coordination seems difficult.

We have elaborated LinCG's concept of phenogrammar by introducing phenominators, closed monoidal linear lambda terms. We have shown how the recursive function say provides a left inverse for unary phenominators, and we have defined a more general notion of an 'empty category', known as a vacuity, in terms of which say is defined. It is then possible to describe subtypes of functional types suitable to make the relevant distinctions. These technologies enable us to give analyses of various coordination phenomena in LinCG, extending the empirical coverage of the framework.

4.1 Future work

It is possible to give an analysis of argument cluster coordination using phenominators, instantiating the lexical entry for and with τ as the type (St → St → St → St)_DTV → (St → St)_VP and ϕ as λvPs. s · (P …) · v, and using hypothetical reasoning. Regrettably, the necessity of brevity prohibits a detailed account here.

Given that phenominators provide access to the structure of functional terms which concatenate strings to the right and left of their string support, it is our belief that any Lambek categorial grammar analysis can be recast in LinCG by an algorithmic translation of directional slash types into phenominator-indexed functional phenotypes, and we are currently in the process of evaluating a potential translation algorithm from directional slash types to phenominators. This should in turn provide us with most of the details necessary to describe a system which emulates the HTLCG of Kubota and Levine (2012), which provides analyses of various gapping phenomena, greatly increasing the overall empirical coverage.

There are a number of coordination phenomena that require modifications to the tectogrammatical component. We would like to be able to analyze unlike category coordinations like rich and an excellent cook in the manner of Bayer (1996) and Morrill (1996), which would require the addition of some variety of sum types in the tectogrammar. Further muddying the waters is so-called "iterated" or "list" coordination, which requires the ability to generate coordinate structures containing a number of conjuncts with no coordinating conjunction, as in Thurston, Kim, and Steve.

It is our intent to extend the use of phenominators to the analysis of intonation as well, and we expect that they can be fruitfully employed to give accounts of focus, association with focus, contrastive topicalization, "in-situ" topicalization, alternative questions, and any number of other phenomena which are at least partially realized prosodically.

Acknowledgements

I am grateful to Carl Pollard, Bob Levine, Yusuke Kubota, Manjuan Duan, Gerald Penn, the TTNLS 2014 committee, and two anonymous reviewers for their comments. Any errors or misunderstandings rest solely on the shoulders of the author.

References

Samuel Bayer. 1996. The coordination of unlike categories. Language, pages 579–616.

Haskell B. Curry. 1961. Some Logical Aspects of Grammatical Structure. In Roman O. Jakobson, editor, Structure of Language and its Mathematical Aspects, pages 56–68. American Mathematical Society.

Philippe de Groote and Sarah Maarek. 2007. Type-theoretic Extensions of Abstract Categorial Grammars. In Reinhard Muskens, editor, Proceedings of Workshop on New Directions in Type-Theoretic Grammars.

Reinhard Muskens. 2010. New Directions in Type-Theoretic Grammars. Journal of Logic, Language and Information, 19(2):129–136. DOI 10.1007/s10849-009-9114-9.

Richard Oehrle. 1994. Term-Labeled Categorial Type Systems. Linguistics and Philosophy, 17:633–678.

Dick Oehrle. 1995. Some 3-dimensional systems of labelled deduction. Logic Journal of the IGPL, 3(2-3):429–448.

Carl Pollard and E. Allyn Smith. 2012. A unified analysis of the same, phrasal comparatives, and superlatives. In Anca Chereches, editor, Proceedings of SALT 22, pages 307–325. eLanguage.
Philippe de Groote. 2001. Towards Abstract Categorial Grammars. In Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Proceedings of the Conference, pages 148–155.

David Dowty. 1988. Type raising, functional composition, and non-constituent conjunction. In Richard T. Oehrle, Emmon Bach, and Deirdre Wheeler, editors, Categorial Grammars and Natural Language Structures, volume 32 of Studies in Linguistics and Philosophy, pages 153–197. Springer Netherlands.

John Hughes. 1995. The design of a pretty-printing library. In Advanced Functional Programming, pages 53–96. Springer.

Yusuke Kubota and Robert Levine. 2012. Gapping as like-category coordination. In D. Béchet and A. Dikovsky, editors, Logical Aspects of Computational Linguistics (LACL) 2012.

Yusuke Kubota. 2010. (In)flexibility of Constituency in Japanese in Multi-Modal Categorial Grammar with Structured Phonology. Ph.D. thesis, The Ohio State University.

J. Lambek and P.J. Scott. 1986. Introduction to higher order categorical logic. Cambridge University Press.

Scott Martin. 2013. The Dynamics of Sense and Implicature. Ph.D. thesis, The Ohio State University.

Vedrana Mihalicek. 2012. Serbo-Croatian Word Order: A Logical Approach. Ph.D. thesis, The Ohio State University.

Glyn V. Morrill. 1994. Type Logical Grammar. Kluwer Academic Publishers.

Glyn Morrill. 1996. Grammar and logic. Theoria, 62(3):260–293.

Reinhard Muskens. 2001. Lambda grammars and the syntax-semantics interface. In Proceedings of the Thirteenth Amsterdam Colloquium, pages 150–155. Universiteit van Amsterdam.

Carl Pollard. 2013. Agnostic Hyperintensional Semantics. Synthese. To appear.

Aarne Ranta. 2004. Grammatical Framework: A Type-Theoretical Grammar Formalism. Journal of Functional Programming, 14:145–189.

Aarne Ranta. 2009. The GF resource grammar library. Linguistic Issues in Language Technology, 2(1).

E. Allyn Smith. 2010. Correlational Comparison in English. Ph.D. thesis, The Ohio State University.

Mark Steedman. 1985. Dependency and coordination in the grammar of Dutch and English. Language, pages 523–568.


Natural Language Reasoning Using proof-assistant technology: Rich Typing and beyond*

Stergios Chatzikyriakidis
Dept of Computer Science, Royal Holloway, Univ of London
Egham, Surrey TW20 0EX, U.K.; Open University of Cyprus
[email protected]

Zhaohui Luo
Dept of Computer Science, Royal Holloway, Univ of London
Egham, Surrey TW20 0EX, U.K.
[email protected]

[Footnote *: This work is supported by the research grant F/07-537/AJ of the Leverhulme Trust in the U.K.]

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 37–45, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

Abstract

In this paper, we study natural language inference based on the formal semantics in modern type theories (MTTs) and their implementations in proof-assistants such as Coq. To this end, the type theory UTT with coercive subtyping is used as the logical language into which natural language semantics is translated, followed by the implementation of these semantics in the Coq proof-assistant. Valid inferences are treated as theorems to be proven via Coq's proof machinery. We shall emphasise that the rich typing mechanisms in MTTs (much richer than those in the simple type theory as used in the Montagovian setting) provide very useful tools in many respects in formal semantics. This is exemplified via the formalisation of various linguistic examples, including conjoined NPs, comparatives, adjectives as well as various linguistic coercions. The aim of the paper is thus twofold: a) to show that the use of proof-assistant technology has indeed the potential to be developed into a new way of dealing with inference, and b) to exemplify the advantages of having a rich typing system for the study of formal semantics in general and natural language inference in particular.

1 Introduction

Natural Language Inference (NLI), i.e. the task of determining whether an NL hypothesis can be inferred from an NL premise, has been an active research theme in computational semantics in which various approaches have been proposed (see, for example, (MacCartney, 2009) and some of the references therein). In this paper, we study NLI based on formal semantics in MTTs with coercive subtyping (Luo, 2012b) and its implementation in the proof assistant Coq (Coq, 2007).

A Modern Type Theory (MTT) is a dependent type theory consisting of an internal logic, which follows the propositions-as-types principle. This latter feature, along with the availability of powerful type structures, makes MTTs very useful for formal semantics. The use of MTTs for NL semantics has been proposed with exciting results as regards various issues of NL semantics, ranging from quantification and anaphora to adjectival modification, co-predication, belief and context formalization (Sundholm, 1989; Ranta, 1994; Boldini, 2000; Cooper, 2005; Fox and Lappin, 2005; Retoré, 2013; Ginzburg and Cooper, forthcoming; Luo, 2011a; Luo, 2012b; Chatzikyriakidis and Luo, 2012; Chatzikyriakidis and Luo, 2013a). Recently, there has been a systematic study of MTT semantics using Luo's UTT with coercive subtyping (type theory with coercive subtyping, henceforth TTCS) (Luo, 2010; Luo, 2011a; Luo, 2012b; Chatzikyriakidis and Luo, 2012; Chatzikyriakidis and Luo, 2013a; Chatzikyriakidis and Luo, 2013b; Chatzikyriakidis and Luo, 2014). This is the version of MTT used in this paper. More specifically, the paper concentrates on one of the key differences between MTTs and simple typed ones, i.e. rich typing. Rich typing will be shown to be a key ingredient for both formal semantics in general and the study of NLI in particular.

A proof assistant is a computer system that assists the users to develop proofs of mathematical theorems. A number of proof assistants implement MTTs. For instance, the proof assistant Coq (Coq, 2007) implements pCIC, the predicative Calculus of Inductive Constructions, and supports some very useful tactics that can be used to help the users automate (parts of) their proofs. [Footnote 1: pCIC is a type theory that is rather similar to UTT, especially after its universe Set became predicative since Coq 8.0. A main difference is that UTT does not have co-inductive types. The interested reader is directed to Goguen's PhD thesis.] Proof assistants have been used in various applications in computer science (e.g., program verification) and formalised mathematics (e.g., the formalisation of the proof of the 4-colour theorem in Coq).

The above two developments, the use of MTT semantics on the one hand and the implementation of MTTs in proof assistants on the other, have opened a new research avenue: the use of existing proof assistants in dealing with NLI. In this paper, two different goals are to be achieved: a) on a more practical level, to show how proof-assistant technology can be used in order to deal with NLI, and b) on a theoretical level, to show the significance of rich typing for formal semantics and NLI in particular. These two different aspects of the paper will be studied on a par, by concentrating on a number of NLI cases (quite a lot, actually) that are adequately dealt with on a theoretical level via rich typing, and, on a more practical level, via the implementation of the account making use of rich type structures in Coq. We shall also consider how to employ dependent typing in the coercive subtyping framework to formalise linguistic coercions.

2 Rich typing in MTTs

A Modern Type Theory (MTT) is a variant of a class of type theories in the tradition initiated by the work of Martin-Löf (Martin-Löf, 1975; Martin-Löf, 1984), which have dependent and inductive types, among others. We choose to call them Modern Type Theories in order to distinguish them from Church's simple type theory (Church, 1940) that is commonly employed within the Montagovian tradition in formal semantics.

Among the variants of MTTs, we are going to employ the Unified Theory of dependent Types (UTT) (Luo, 1994) with the addition of the coercive subtyping mechanism (see, for example, (Luo, 1999; Luo et al., 2012) and below). UTT is an impredicative type theory in which a type Prop of all logical propositions exists. This gives the system greater expressivity compared to classical simple typed systems as these are used in mainstream Montagovian semantics.

2.1 Type many-sortedness and CNs as types

In Montague semantics (Montague, 1974), the underlying logic (Church's simple type theory (Church, 1940)) can be seen as 'single-sorted' in the sense that there is only one type e of all entities. The other types, such as t of truth values and the function types generated from e and t, do not stand for types of entities. In this respect, there are no fine-grained distinctions between the elements of type e, and as such all individuals are interpreted using the same type. For example, John and Mary have the same type in simple type theories, the type e of individuals. An MTT, on the other hand, can be regarded as a 'many-sorted' logical system in that it contains many types, and as such one can make fine-grained distinctions between individuals and further use those different types to interpret subclasses of individuals. For example, one can have John : [[man]] and Mary : [[woman]], where [[man]] and [[woman]] are different types.

An important trait of MTT-based semantics is the interpretation of common nouns (CNs) as types (Ranta, 1994) rather than sets or predicates (i.e., objects of type e → t), as is the case within the Montagovian tradition. The CNs man, human, table and book are interpreted as types [[man]], [[human]], [[table]] and [[book]], respectively. Then, individuals are interpreted as being of one of the types used to interpret CNs. The interpretation of CNs as types is also a prerequisite for the subtyping mechanism to work: assuming CNs to be predicates, subtyping would go wrong given the contravariance of function types.

2.2 Subtyping

Coercive subtyping (Luo, 1999; Luo et al., 2012) provides an adequate framework to be employed for MTT-based formal semantics (Luo, 2010; Luo, 2012b). It can be seen as an abbreviation mechanism: A is a (proper) subtype of B (A < B) if there is a coercion c from type A to type B and, if so, an object a of type A can be used in any context C_B[_] that expects an object of type B: C_B[a] is legal (well-typed) and equal to C_B[c(a)].

As an example, assuming that both [[man]] and [[human]] are base types, one may introduce the following as a basic subtyping relation:

(1) [[man]] < [[human]]

In case [[man]] is defined as a composite Σ-type (see §2.3 below for details), where male : [[human]] → Prop:

(2) [[man]] = Σh : [[human]]. male(h)

we have that (1) is the case because the above Σ-type is a subtype of [[human]] via the first projection π1:

(3) (Σh : [[human]]. male(h)) <_π1 [[human]]

We will see in the next section the importance of the coercive subtyping mechanism when dealing with NLI.

2.3 Dependent typing and universes

One of the basic features of MTTs is the use of dependent types. A dependent type is a family of types depending on some values. Here we explain two basic constructors for dependent types, Σ and Π, both highly relevant for the study of linguistic semantics.

The constructor/operator Σ is a generalization of the Cartesian product of two sets that allows the second set to depend on values of the first. For instance, if [[human]] is a type and male : [[human]] → Prop, then the Σ-type Σh : [[human]]. male(h) is intuitively the type of humans who are male. More formally, if A is a type and B is an A-indexed family of types, then Σ(A, B), sometimes written as Σx : A.B(x), is a type, consisting of pairs (a, b) such that a is of type A and b is of type B(a). When B(x) is a constant type (i.e., always the same type no matter what x is), the Σ-type degenerates into the product type A × B of non-dependent pairs. Σ-types (and product types) are associated with projection operations π1 and π2, so that π1(a, b) = a and π2(a, b) = b, for every (a, b) of type Σ(A, B) or A × B.

The linguistic relevance of Σ-types can be directly appreciated once we understand that, in its dependent case, Σ-types can be used to interpret linguistic phenomena of central importance, like for example adjectival modification (Ranta, 1994). For example, handsome man is interpreted as a Σ-type (4), the type of handsome men (or, more precisely, of those men together with proofs that they are handsome):

(4) Σm : [[man]]. [[handsome]](m)

where [[handsome]](m) is a family of propositions/types that depends on the man m. [Footnote 5: Adjectival modification is a notoriously difficult issue, and as such not all cases of adjectives can be captured via a Σ type analysis. For a proper treatment of adjectival modification within this framework, see (Chatzikyriakidis and Luo, 2013a).]

The other basic constructor for dependent types is Π. Π-types can be seen as a generalization of the normal function space where the second type is a family of types that might be dependent on the values of the first. A Π-type degenerates to the function type A → B in the non-dependent case. In more detail, when A is a type and P is a predicate over A, Πx : A.P(x) is the dependent function type that, in the embedded logic, stands for the universally quantified proposition ∀x : A.P(x). For example, the following sentence (5) is interpreted as (6):

(5) Every man walks.

(6) Πx : [[man]]. [[walk]](x)

Type Universes. An advanced feature of MTTs, which will be shown to be very relevant in interpreting NL semantics, is that of universes. Informally, a universe is a collection of (the names of) types put into a type (Martin-Löf, 1984). [Footnote 6: There is quite a long discussion on how these universes should be like. In particular, the debate is largely concentrated on whether a universe should be predicative or impredicative. A strongly impredicative universe U of all types (with U : U and Π-types) is shown to be paradoxical (Girard, 1971) and as such logically inconsistent. The theory UTT we use here has only one impredicative universe Prop (representing the world of logical formulas) together with infinitely many predicative universes, which as such avoids Girard's paradox (see (Luo, 1994) for more details).] For example, one may want to collect all the names of the types that interpret common nouns into a universe CN : Type. The idea is that for each type A that interprets a common noun, there is a name A̅ in CN with T_CN(A̅) = A. In practice, we do not distinguish a type in CN and its name, omitting the overlines and the operator T_CN and simply writing, for instance, [[man]] : CN. Thus, the universe includes the collection of the names that interpret common nouns.
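The Σ-type and coercion machinery just described can be imitated in a small executable model. The sketch below is our own illustration, not the paper's formalisation: a finite set stands in for the type [[human]], 'proofs' are crudely modelled as booleans, and [[man]] is built as the set of pairs of a human together with a (true) proof of male(h), with the first projection π1 acting as the coercion of (3).

```python
# Finite toy model of Sigma-types and first-projection coercions
# (illustrative names and data; proofs modelled as booleans).

Human = ["John", "Mary"]                 # the type [[human]] as a finite set
male = {"John": True, "Mary": False}     # male : [[human]] -> Prop

# [[man]] = Sigma h:[[human]]. male(h): pairs (h, proof) with a true proof
Man = [(h, male[h]) for h in Human if male[h]]

def pi1(pair):
    """First projection: also the coercion [[man]] < [[human]]."""
    h, _proof = pair
    return h

def pi2(pair):
    """Second projection: the (toy) proof component."""
    return pair[1]

def walk(h):
    """A predicate over [[human]]; a man is accepted via the coercion pi1."""
    return h == "John"

print([pi1(m) for m in Man])             # ['John']
print(all(walk(pi1(m)) for m in Man))    # True
```

The point of the sketch is only the shape of the encoding: an inhabitant of the modified type carries its modifier proof with it, and dropping that proof (π1) is exactly what lets it be used wherever an unmodified [[human]] is expected.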
For example, In this section we present how implementing MTT in CN, we shall find the following types: NL semantics in Coq can deal with various cases (7) [[man]], [[woman]], [[book]], ... of NLI inference. For this reason, we use exam- (8) Σm: [[man]].[[handsome]](m) ples from the FraCas test suite. The FraCas Test Suite (Cooper et al., 1996) arose out of the FraCas (9) G + G R F Consortium, a huge collaboration with the aim to where the Σ-type in (8 is the proposed inter- develop a range of resources related to computa- pretation of ‘handsome man’ and the disjoint sum tional semantics. The FraCas test suite is specifi- type in (9) is that of ‘gun’ (the sum of real guns cally designed to reflect what an adequate theory and fake guns – see above).7 Interesting appli- of NL inference should be able to capture. It com- cations of the use of universes can be proposed prises NLI examples formulated in the form of a like for example, their use in giving the types for premise (or premises) followed by a question and quantifiers and VP adverbs as extending over the an answer. For instance, universe CN (Luo, 2011b) as well as coordination extending over the universe of all linguistic types (12) Either Smith, Jones and Anderson signed the LType (Chatzikyriakidis and Luo, 2012). contract. Did Jones sign the contract? [Yes] 3 NL Inference in Coq The examples are quite simple in format but are Coq is a dependently typed interactive theorem designed to cover a very wide spectrum of seman- prover implementing the calculus of Inductive tic phenomena, e.g. generalized quantifiers, con- Constructions (pCiC, see (Coq, 2007)). Coq, and joined plurals, tense and aspect related phenom- in general proof-assistants, provide assistance in ena, adjectives and ellipsis, among others. In what the development of formal proofs. 
The idea is sim- follows, we show how the use of a rich type sys- ple: you use Coq in order to see whether state- tem can deal with NLI adequately (at least for the ments as regards anything that has been either pre- cases looked at) from both a theoretical and an im- defined or user-defined (definitions, parameters, plementational point of view. variables) can be proven or not. In the case of NLI, the same idea applies: once the semantics of NL 3.2 Rich typing and NLI words are defined, then these semantics can be rea- 3.2.1 Quantifiers soned about by using Coq’s proof mechanism. In A great deal of the FraCas examples are cases of this sense, valid NLIs can be seen as theorems, or inference that result from the monotone properties better valid NLIs must be theorems. of quantifiers. Examples concerning monotonic- A very simple case of semantic entailment, that ity on the first argument are very easily treated in of example (10), will therefore be formulated as a system encoding an MTT with coercive subtyp- the following theorem in Coq (11): ing, by employing the subtyping relations between (10) John walks some man walks ⇒ CNs. To put this claim in context, let us look at (11) Theorem x: John walks some man walks the following example (3.55) from the FraCas test → Now, depending on the semantics of the indi- suite: vidual lexical items one may or may not prove the theorem that needs to be proven in each case. In- (13) Some Irish delegates finished the survey on ferences like the one shown in (11) are easy cases time. in Coq. Assuming the semantics of some which Did any delegate finish the report on time specify that given any A of type CN and a predi- [Yes] cate of type A P rop, there exists an x: A such → Treating adjectival modification as involving a 7 The use of disjoint sum types was proposed by Σ type where the first projection is always a coer- (Chatzikyriakidis and Luo, 2013a) in order to deal with priva- tive modification. 
The interested reader is directed there for cion as in (Luo, 2011a), we get Irish delegate to details. be a subtype of delegate, i.e. [[Irishdelegate]] < 40 [[delegate]]. This is basically all that Coq needs in 3.2.2 Conjoined NPs 8 order to prove the inference. Inference involving conjoined NPs concerns cases Moving on to quantifier cases involving mono- like the one shown below: tonicity on the second argument, we notice that these are more difficult to get since an adjunct (e.g. a PP) is involved in deriving the inference: (18) Smith, Jones and Anderson signed the con- tract. (14) Some delegates finished the survey on time. Did Jones sign the contract? [Yes] Did any delegate finish the survey? [Yes] In (Chatzikyriakidis and Luo, 2012), a polymor- The type proposed for VP adverbs by Luo (Luo, phic type for binary coordinators that extends over 2011b) is based on the idea of a type universe of the constructed universe LType, the universe of CNs. As already said in the introduction, type uni- linguistic types was proposed. This can be ex- verses a universe is a collection of (the names of) tended to n-ary coordinators. For example, the types put into a type. In this respect, one can form coordinator and may take three arguments, as in the premise of (18). In such cases, the type of the the universe CN which basically stands for the collection of names interpreting common nouns. coordinator, denoted as and3 in semantics, is: The type proposed for VP adverbs makes use of (19) and : ΠA: LT ype.A A A A. 3 → → → this CN universe and assumes quantification over Intuitively, we may write this type as it (Chatzikyriakidis and Luo, 2013a; Chatzikyri- ΠA: LT ype.A3 A. For instance, the akidis and Luo, 2012): → semantics of (18) is (20), where c is ‘the contract’: (15) ΠA: CN. 
(A P rop) (A P rop) → → → However, in order to derive the inference (20) [[sign]](and3(s, j, a), c) needed in cases of monotonicity on the second ar- In order to consider such coordinators in rea- gument cases, this typing alone is not enough. Σ soning, we consider the following auxiliary object types can be used in order to slightly modify the (similarly to the auxiliary object ADV ) and define typing. In order to do this, we first introduce an and as follows: auxiliary object ADV as follows: 3 (21) AND3 : ΠA : LT ype. Πx,y,z : A. Σa : (16) ADV : ΠA: CN.Πv : A P rop.Σp: A → → A. p : A P rop. p(a) p(x) p(y) P rop. x : A.p(x) v(x) ∀ → ⊃ ∧ ∧ ∀ ⊃ p(z). This reads as follows: for any common noun A (22) and3 = λA : LType.λx,y,z : A. and any predicate v over A, ADV (A, v) is a pair π1(AND3(A,x,y,z)) (p,m) such that for any x: A, p(x) implies v(x). Having defined the coordinators such as and in Taking the sentence (14) as an example, for the such a way, we can get the desired inferences. For 9 CN delegate and predicate [[finish]] , we define example, from the semantics (20), we can infer on time to be the first projection of the auxiliary that ‘Jones signed the contract’, the hypothesis in object (16) which is of type (15): (18).10 Coordinators such as or can be defined in a similar way. (17) on time = λA : CN.λv : A P rop. → π1(ONTIME(A, v)) 3.2.3 Comparatives As a consequence, for instance, any delegate Inference with comparatives can also be treated by who finished the survey on time (p(x)) in (16) did using Σ types. Two ways of doing this will be pro- finish the survey (v(x)). posed, one not involving and one involving mea- sures. We shall consider shorter than as a typi- 8For details on the semantics of the other lexical items like cal example. Intuitively, shorter than should be e.g. VP adverbs in the sentence, see the following discussion. 
3.2.3 Comparatives

Inference with comparatives can also be treated by using Σ types. Two ways of doing this will be proposed, one not involving and one involving measures. We shall consider shorter than as a typical example. Intuitively, shorter than should be of type Human → Human → Prop, as in the following example:

(23) Mary is shorter than John.

We assume that there is a predicate short : Human → Prop, expressing that a human is short. Intuitively, if Mary is shorter than John and John is short, then so is Mary. Furthermore, one should be able to take care of the transitive properties of comparatives: if A is COMP than B and B is COMP than C, then A is also COMP than C. All these can be captured by considering COMP of the following Σ-type and defining shorter than to be its first projection:

(24) COMP : Σp : Human → Human → Prop. (∀h1, h2, h3 : Human. p(h1, h2) ∧ p(h2, h3) ⊃ p(h1, h3)) ∧ (∀h1, h2 : Human. p(h1, h2) ⊃ short(h2) ⊃ short(h1))

(25) [[shorter than]] = π1(COMP)

With the above, we can easily show that inferences like (26) can be obtained as expected. [Footnote 11: In giving a full analysis of comparatives, one may further consider measures. Such an account is also possible using Σ types, in effect extending the account just proposed for comparatives. The idea is basically to extend the above account using dependent typing over measures. Such an account can be found in (Chatzikyriakidis and Luo, 2013b).]

(26) John is shorter than George.
     George is shorter than Stergios.
     Is John shorter than Stergios? [Yes]

Given the definition of COMP: if two elements stand in the COMP relation (meaning that the first argument is shorter than the second one), and the second element stands in the COMP relation with a third, then by the transitivity encoded in COMP the first element also stands in the COMP relation with the third, i.e. the first element is shorter than the third.

3.2.4 Factive/Implicative verbs

This section concerns inference cases with various types of verbs that presuppose the truth of their complement, like for example factive or implicative verbs. Example (27) involves such a verb, while (28) does not:

(27) Smith knew that Itel had won the contract in 1991.
     Did Itel win the contract in 1991? [Yes]

(28) Smith believed that Itel had won the contract in 1991.
     Did Itel win the contract in 1991? [Don't know]

What we need is to encode that verbs like know presuppose their argument's truth, while verbs like believe do not. For instance, know belongs to the former class and its semantics is given as follows:

(29) KNOW : Σp : Human → Prop → Prop. ∀h : Human. ∀P : Prop. p(h, P) ⊃ P

(30) [[know]] = π1(KNOW)

In effect, a similar reasoning to the one used in dealing with VP adverbs is proposed: an auxiliary object is first used, followed by the definition of know as the first projection of the Σ type involved in the auxiliary object. With this, the inference (27) can be obtained as expected.
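The contrast between (27) and (28) can be imitated in a toy executable model. The sketch below is our own (invented names and data): adding a knowledge fact enforces the veridicality component of (29) by making the complement true in the model, whereas adding a belief carries no such commitment.

```python
# Toy model of factive vs non-factive attitude verbs (illustrative only).

facts = set()      # propositions true in the model
knows = set()      # (agent, proposition) pairs
believes = set()

def add_knowledge(agent, prop):
    """Factive 'know': the pi2 clause of (29), know(h, P) > P,
    forces the complement into the facts."""
    knows.add((agent, prop))
    facts.add(prop)

def add_belief(agent, prop):
    """Non-factive 'believe': no commitment to the truth of prop."""
    believes.add((agent, prop))

add_knowledge("Smith", "Itel won the contract in 1991")
add_belief("Jones", "Itel won the contract in 1992")

print("Itel won the contract in 1991" in facts)  # True: (27) follows
print("Itel won the contract in 1992" in facts)  # False: (28) stays underivable
```

The asymmetry is thus located entirely in the lexical semantics of the verb, matching the paper's strategy of packaging the veridicality condition into the Σ-typed auxiliary object.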
not imply their arguments and inferences like (28) (25) [[shorter than]] = π (COMP ) 1 cannot be shown to be valid inferences. With the above, we can easily show that the in- ferences like (26) can be obtained as expected.11 3.2.5 Adjectival inference As a last example of the use of rich typing in order (26) John is shorter than George. to deal with NLI, we discuss NLI cases involving George is shorter than Stergios. adjectives. In (Chatzikyriakidis and Luo, 2013a) Is John shorter than Stergios? [Yes] we have shown that the use of subtyping, Σ types and universes can give us a correct account of at Given the definition in COMP according to least intersective and subsective adjectives. Note which if two elements stand in a COMP relation that the original Σ type analysis proposed by re- (meaning that the first argument is shorter than searchers like Ranta (Ranta, 1994) is inadequate to the second one), and there is also a third element capture the inferential properties of either intersec- standing in a COMP relation with the second, tive or subsective adjectives. The FraCas test suite then by transitivity defined in COMP , this third has a rather different classification. One major dis- element also stands in a COMP relation with the tinction is between affirmative and non-affirmative first, i.e. the third element is shorter than the first. adjectives shown below: 3.2.4 Factive/Implicative verbs (31) Affirmative: Adj(N) (N) This section concerns inference cases with various ⇒ (32) Non-affirmative: Adj(N) ; (N) types of verbs that presuppose the truth of their Concentrating on affirmative adjectives for the complement like for example factive or implica- moment, we see that a Σ type analysis is enough tive verbs. Example (27) is an example of such a in these cases.Cases of affirmative adjectives are verb, while (28) is not: handled well with the existing record mechanism (27) Smith knew that Itel had won the contract already used for adjectives. 
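The comparative inference in (26) can be mimicked outside the type theory. The following Python sketch (with invented names, standing in only loosely for the Coq development described in (Chatzikyriakidis and Luo, 2013b)) encodes the two conjuncts packaged in COMP, transitivity and the downward entailment into short, and checks the FraCas-style query:

```python
# Illustrative sketch only: names and data are invented. COMP's first conjunct
# is modelled as transitive closure; its second as propagation into `short`.
from itertools import product

def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # product() snapshots `closure`, so mutating it inside the loop is safe
        for (a, b), (c, d) in product(closure, repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

# Premises of (26): John is shorter than George; George is shorter than Stergios.
facts = {("John", "George"), ("George", "Stergios")}
shorter_than = transitive_closure(facts)

# Query: is John shorter than Stergios?  [Yes]
assert ("John", "Stergios") in shorter_than

# Second conjunct of COMP: p(h1, h2) and short(h2) imply short(h1).
# One pass suffices because shorter_than is already transitively closed.
short = {"George"}
short |= {h1 for (h1, h2) in shorter_than if h2 in short}
assert "John" in short
assert "Stergios" not in short
```

The two set operations play the role of the two conjuncts of the Σ-type body in (24); the type-theoretic account additionally guarantees these properties by construction rather than by closure computation.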
The following inference, as well as similar inferences, is correctly captured, given that a CN modified by an intersective adjective is interpreted as a Σ-type which is a subtype of the CN via the first projection.

Cases of subsective adjectives are discussed in the section dubbed extensional comparison classes in the FraCas test suite. There, cases of adjectival inference involving adjectives like small and large are discussed. Cases like these can be handled using a typing which quantifies over a universe. In the case of large and small, this universe is the universe CN:12

(33) ΠA : CN. (A → Prop)

With this typing, cases like the one shown below are correctly treated:

(34) All mice are small animals. Mickey is a large mouse. Is Mickey a large animal? [No]

Lastly, one should be able to take care of inferences associated with intersective adjectives like …

The MTT-semantics of (36) is (37):

(37) ∃x : [[book]]. [[enjoy]](j, x)

where

(38) [[enjoy]] : Human → Event → Prop

However, the domain type of [[enjoy]](j) is Event, which is different from Book! How, then, can [[enjoy]](j, x) in (37) be well-typed? The answer is that it is well-typed in the framework of coercive subtyping and, in particular, under the assumption of the following coercion:

(39) Book …

… modification to conjoined/disjoined NPs, comparatives, as well as factive/implicative verbs and type coercions.

References

Z. Luo, S. Soloviev, and T. Xue. 2012. Coercive subtyping: theory and implementation. Information and Computation, 223:18–42.

Z. Luo. 1994.
Computation and Reasoning: A Type Theory for Computer Science. Oxford Univ Press.

N. Asher and Z. Luo. 2012. Formalisation of coercions in lexical semantics. Sinn und Bedeutung 17, Paris.

P. Boldini. 2000. Formalizing context in intuitionistic type theory. Fundamenta Informaticae, 42(2):1–23.

S. Chatzikyriakidis and Z. Luo. 2012. An account of natural language coordination in type theory with coercive subtyping. In Y. Parmentier and D. Duchier, editors, Proc. of Constraint Solving and Language Processing (CSLP12), LNCS 8114, pages 31–51, Orleans.

S. Chatzikyriakidis and Z. Luo. 2013a. Adjectives in a modern type-theoretical setting. In G. Morrill and J.-M. Nederhof, editors, Proceedings of Formal Grammar 2013, LNCS 8036, pages 159–174.

S. Chatzikyriakidis and Z. Luo. 2013b. Natural language inference in Coq. Submitted.

S. Chatzikyriakidis and Z. Luo. 2014. Hyperintensionality in modern type theories. Submitted manuscript.

A. Church. 1940. A formulation of the simple theory of types. J. Symbolic Logic, 5(1).

R. Cooper, D. Crouch, J. van Eijck, C. Fox, J. van Genabith, J. Jaspars, H. Kamp, D. Milward, M. Pinkal, M. Poesio, and S. Pulman. 1996. Using the framework. Technical Report LRE 62-051r. http://www.cogsci.ed.ac.uk/ fracas/.

R. Cooper. 2005. Records and record types in semantic theory. J. Logic and Computation, 15(2).

The Coq Development Team. 2007. The Coq Proof Assistant Reference Manual (Version 8.1). INRIA.

C. Fox and S. Lappin. 2005. Foundations of Intensional Semantics. Blackwell.

J. Ginzburg and R. Cooper. Forthcoming. TTR for natural language semantics. In C. Fox and S. Lappin, editors, Handbook of Contemporary Semantic Theory. Blackwell.

J.-Y. Girard. 1971. Une extension de l'interprétation fonctionnelle de Gödel à l'analyse et son application à l'élimination des coupures dans l'analyse et la théorie des types. Proc. 2nd Scandinavian Logic Symposium. North-Holland.

H. Goguen. 1994. A Typed Operational Semantics for Type Theory. Ph.D. thesis, University of Edinburgh.

Z. Luo. 1999. Coercive subtyping. Journal of Logic and Computation, 9(1):105–130.

Z. Luo. 2010. Type-theoretical semantics with coercive subtyping. Semantics and Linguistic Theory 20 (SALT 20), Vancouver, 84(2):28–56.

Z. Luo. 2011a. Contextual analysis of word meanings in type-theoretical semantics. In Logical Aspects of Computational Linguistics (LACL 2011), LNAI 6736, pages 159–174.

Z. Luo. 2011b. Adjectives and adverbs in type-theoretical semantics. Notes.

Z. Luo. 2012a. Common nouns as types. In D. Bechet and A. Dikovsky, editors, Logical Aspects of Computational Linguistics (LACL 2012), LNCS 7351, pages 173–185.

Z. Luo. 2012b. Formal semantics in modern type theories with coercive subtyping. Linguistics and Philosophy, 35(6):491–513.

B. MacCartney. 2009. Natural Language Inference. Ph.D. thesis, Stanford University.

P. Martin-Löf. 1975. An intuitionistic theory of types: predicative part. In H. Rose and J. C. Shepherdson, editors, Logic Colloquium '73.

P. Martin-Löf. 1984. Intuitionistic Type Theory. Bibliopolis.

R. Montague. 1974. Formal Philosophy. Yale University Press.

A. Ranta. 1994. Type-Theoretical Grammar. Oxford University Press.

C. Retoré. 2013. The Montagovian generative lexicon λTyn: an integrated type-theoretical framework for compositional semantics and lexical pragmatics.

G. Sundholm. 1989. Constructive generalized quantifiers. Synthese, 79(1):1–12.

A Type-Driven Tensor-Based Semantics for CCG

Jean Maillard, University of Cambridge, Computer Laboratory ([email protected])
Stephen Clark, University of Cambridge, Computer Laboratory ([email protected])
Edward Grefenstette, University of Oxford, Department of Computer Science

Abstract

This paper shows how the tensor-based semantic framework of Coecke et al.
can be seamlessly integrated with Combinatory Categorial Grammar (CCG). The integration follows from the observation that tensors are linear maps, and hence can be manipulated using the combinators of CCG, including type-raising and composition. Given the existence of robust, wide-coverage CCG parsers, this opens up the possibility of a practical, type-driven compositional semantics based on distributional representations.

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 46–54, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

1 Introduction

In this paper we show how tensor-based distributional semantics can be seamlessly integrated with Combinatory Categorial Grammar (CCG, Steedman (2000)), building on the theoretical discussion in Grefenstette (2013). Tensor-based distributional semantics represents the meanings of words with particular syntactic types as tensors whose semantic type matches that of the syntactic type (Coecke et al., 2010). For example, the meaning of a transitive verb with syntactic type (S\NP)/NP is a 3rd-order tensor from the tensor product space N ⊗ S ⊗ N. The seamless integration with CCG arises from the (somewhat trivial) observation that tensors are linear maps (a particular kind of function) and hence can be manipulated using CCG's combinatory rules.

Tensor-based semantics arises from the desire to enhance distributional semantics with some compositional structure, in order to make distributional semantics more of a complete semantic theory, and to increase its utility in NLP applications. There are a number of suggestions for how to add compositionality to a distributional semantics (Clarke, 2012; Pulman, 2013; Erk, 2012). One approach is to assume that the meanings of all words are represented by context vectors, and then combine those vectors using some operation, such as vector addition, element-wise multiplication, or tensor product (Clark and Pulman, 2007; Mitchell and Lapata, 2008). A more sophisticated approach, which is the subject of this paper, is to adapt the compositional process from formal semantics (Dowty et al., 1981) and attempt to build a distributional representation in step with the syntactic derivation (Coecke et al., 2010; Baroni et al., 2013). Finally, there is a third approach using neural networks, which perhaps lies in between the two described above (Socher et al., 2010; Socher et al., 2012). Here compositional distributed representations are built using matrices operating on vectors, with all parameters learnt through a supervised learning procedure intended to optimise performance on some NLP task, such as syntactic parsing or sentiment analysis. The approach of Hermann and Blunsom (2013) conditions the vector combination operation on the syntactic type of the combinands, moving it a little closer to the more formal semantics-inspired approaches.

The remainder of the Introduction gives a short summary of distributional semantics. The rest of the paper introduces some mathematical notation from multi-linear algebra, including Einstein notation, and then shows how the combinatory rules of CCG, including type-raising and composition, can be applied directly to tensor-based semantic representations. As well as describing a tensor-based semantics for CCG, a further goal of this paper is to present the compositional framework of Coecke et al. (2010), which is based on category theory, to a computational linguistics audience using only the mathematics of multi-linear algebra.

1.1 Distributional Semantics

We assume a basic knowledge of distributional semantics (Grefenstette, 1994; Schütze, 1998). Recent introductions to the topic include Turney and Pantel (2010) and Clark (2014).

A potentially useful distinction for this paper, and one not commonly made, is between distributional and distributed representations. Distributional representations are inherently contextual, and rely on the frequently quoted dictum from Firth that "you shall know a word from the company it keeps" (Firth, 1957; Pulman, 2013). This leads to the so-called distributional hypothesis, that words that occur in similar contexts tend to have similar meanings, and to various proposals for how to implement this hypothesis (Curran, 2004), including alternative definitions of context; alternative weighting schemes which emphasize the importance of some contexts over others; alternative similarity measures; and various dimensionality reduction schemes such as the well-known LSA technique (Landauer and Dumais, 1997). An interesting conceptual question is whether a similar distributional hypothesis can be applied to phrases and larger units: is it the case that sentences, for example, have similar meanings if they occur in similar contexts? Work which does extend the distributional hypothesis to larger units includes Baroni and Zamparelli (2010), Clarke (2012), and Baroni et al. (2013).

Distributed representations, on the other hand, can be thought of simply as vectors (or possibly higher-order tensors) of real numbers, where there is no a priori interpretation of the basis vectors. Neural networks can perhaps be categorised in this way, since the resulting vector representations are simply sequences of real numbers resulting from the optimisation of some training criterion on a training set (Collobert and Weston, 2008; Socher et al., 2010). Whether these distributed representations can be given a contextual interpretation depends on how they are trained.

One important point for this paper is that the tensor-based compositional process makes no assumptions about the interpretation of the tensors. Hence in the remainder of the paper we make no reference to how noun vectors or verb tensors, for example, can be acquired (which, for the case of the higher-order tensors, is a wide open research question). However, in order to help the reader who would prefer a more grounded discussion, one possibility is to obtain the noun vectors using standard distributional techniques (Curran, 2004), and learn the higher-order tensors using recent techniques from "recursive" neural networks (Socher et al., 2010). Another possibility is suggested by Grefenstette et al. (2013), extending the learning technique based on linear regression from Baroni and Zamparelli (2010), in which "gold-standard" distributional representations are assumed to be available for some phrases and larger units.

2 Mathematical Preliminaries

The tensor-based compositional process relies on taking dot (or inner) products between vectors and higher-order tensors. Dot products, and a number of other operations on vectors and tensors, can be conveniently written using Einstein notation (also referred to as the Einstein summation convention). In the rest of the paper we assume that the vector spaces are over the field of real numbers.

2.1 Einstein Notation

The squared amplitude of a vector v ∈ R^n is given by:

|v|^2 = Σ_{i=1}^{n} v_i v_i

Similarly, the dot product of two vectors v, w ∈ R^n is given by:

v · w = Σ_{i=1}^{n} v_i w_i

Denote the components of an m × n real matrix A by A_ij, for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Then the matrix-vector product of A and v ∈ R^n gives a vector Av ∈ R^m with components:

(Av)_i = Σ_{j=1}^{n} A_ij v_j

We can also multiply an n × m matrix A and an m × o matrix B to produce an n × o matrix AB with components:

(AB)_ij = Σ_{k=1}^{m} A_ik B_kj

The previous examples are some of the most common operations in linear algebra, and they all involve sums over repeated indices. They can be simplified by introducing the Einstein summation convention: summation over the relevant range is implied on every component index that occurs twice. Pairs of indices that are summed over are known as contracted, while the remaining indices are known as free.
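The convention just introduced can be tried out directly: numpy's einsum function implements exactly this notation, contracting every repeated index. A small sketch with toy data:

```python
# numpy.einsum implements the Einstein summation convention: indices that
# appear twice in the subscript string are contracted (summed over).
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
A = np.arange(6.0).reshape(2, 3)     # a 2x3 matrix
B = np.arange(12.0).reshape(3, 4)    # a 3x4 matrix

sq  = np.einsum('i,i->', v, v)       # |v|^2 = v_i v_i
dot = np.einsum('i,i->', v, w)       # v . w = v_i w_i
Av  = np.einsum('ij,j->i', A, v)     # (Av)_i  = A_ij v_j
AB  = np.einsum('ik,kj->ij', A, B)   # (AB)_ij = A_ik B_kj

assert sq == 14.0 and dot == 32.0
assert np.allclose(Av, A @ v)
assert np.allclose(AB, A @ B)
```

Note how the subscripts to the right of `->` are precisely the free indices, matching the index-counting rule described below.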
Using this convention, the above operations can be written as:

|v|^2 = v_i v_i

v · w = v_i w_i

(Av)_i = A_ij v_j, i.e. the contraction of v with the second index of A

(AB)_ij = A_ik B_kj, i.e. the contraction of the second index of A with the first index of B

Note how the number of free indices is always conserved between the left- and right-hand sides in these examples. For instance, while the last equation has two indices on the left and four on the right, the two extra indices on the right are contracted. Hence counting the number of free indices can be a quick way of determining what type of object is given by a certain mathematical expression in Einstein notation: no free indices means that an operation yields a scalar number, one free index means a vector, two a matrix, and so on.

2.2 Tensors

Linear Functionals. Given a finite-dimensional vector space R^n over R, a linear functional is a linear map a : R^n → R. Let a vector v have components v_i in a fixed basis. Then the result of applying a linear functional a to v can be written as:

a(v) = a_1 v_1 + … + a_n v_n = (a_1 … a_n)(v_1, …, v_n)^T

The numbers a_i are the components of the linear functional, which can also be pictured as a row vector. Since there is a one-to-one correspondence between row and column vectors, the above equation is equivalent to:

v(a) = a_1 v_1 + … + a_n v_n = (v_1 … v_n)(a_1, …, a_n)^T

Using the Einstein convention, the equations above can be written as:

a(v) = v_i a_i = v(a)

Thus every finite-dimensional vector is a linear functional, and vice versa. Row and column vectors are examples of first-order tensors.

Definition 1 (First-order tensor). Given a vector space V over the field R, a first-order tensor T can be defined as:
• an element of the vector space V,
• a linear map T : V → R,
• a |V|-dimensional array of numbers T_i, for 1 ≤ i ≤ |V|.

These three definitions are all equivalent. Given a first-order tensor described using one of these definitions, it is trivial to find the two other descriptions.

Matrices. An n × m matrix A over R can be represented by a two-dimensional array of real numbers A_ij, for 1 ≤ i ≤ n and 1 ≤ j ≤ m. Via matrix-vector multiplication, the matrix A can be seen as a linear map A : R^m → R^n, mapping a vector v ∈ R^m to a vector with components

A(v)_i = A_ij v_j.

We can also contract a vector with the first index of the matrix, which gives us a map A : R^n → R^m. This corresponds to contracting a row vector (w_1 … w_n) with the first index of A, resulting in a vector with components

(w^T A)_i = A_ji w_j.

We can combine the two operations and see a matrix as a map A : R^n × R^m → R, defined by w^T A v. In Einstein notation, this operation can be written as

w_i A_ij v_j,

which yields a scalar (constant) value, consistent with the fact that all the indices are contracted.

Finally, matrices can also be characterised in terms of Kronecker products. Given two vectors v ∈ R^n and w ∈ R^m, their Kronecker product v ⊗ w is a matrix with components

(v ⊗ w)_ij = v_i w_j.

It is a general result in linear algebra that any n × m matrix can be written as a finite sum of Kronecker products Σ_k x^(k) ⊗ y^(k) of a set of vectors x^(k) and y^(k). Note that the sum over k is written explicitly, as it would not be implied by Einstein notation: this is because the index k does not range over vector/matrix/tensor components, but over a set of vectors, and hence that index appears in brackets.
An n × m matrix is an element of the tensor space R^n ⊗ R^m, and it can also be seen as a linear map A : R^n ⊗ R^m → R. This is because, given a matrix B with decomposition Σ_k x^(k) ⊗ y^(k), the matrix A can act as follows:

A(B) = Σ_k A_ij x_i^(k) y_j^(k) = A_ij B_ij.

Again, counting the number of free indices in the last line tells us that this operation yields a scalar. Matrices are examples of second-order tensors.

Definition 2 (Second-order tensor). Given vector spaces V, W over the field R, a second-order tensor T can be defined as:
• an element of the vector space V ⊗ W,
• a |V| × |W|-dimensional array of numbers T_ij, for 1 ≤ i ≤ |V| and 1 ≤ j ≤ |W|,
• a (multi-)linear map: T : V → W, T : W → V, T : V × W → R, or T : V ⊗ W → R.

Again, these definitions are all equivalent. Most importantly, the four types of maps given in the definition are isomorphic. Therefore specifying one map is enough to specify all the others.

Tensors. We can generalise these definitions to the more general concept of tensor.

Definition 3 (Tensor). Given vector spaces V_1, …, V_k over the field R, a kth-order tensor T is defined as:
• an element of the vector space V_1 ⊗ … ⊗ V_k,
• a |V_1| × … × |V_k|-dimensional array of numbers T_{i_1 … i_k}, for 1 ≤ i_j ≤ |V_j|,
• a multi-linear map T : V_1 × … × V_k → R.

3 Tensor-Based CCG Semantics

In this section we show how CCG's syntactic types can be given tensor-based meaning spaces, and how the combinators employed by CCG to combine syntactic categories carry over to those meaning spaces, maintaining what is often described as CCG's "transparent interface" between syntax and semantics. Here are some example syntactic types, and the corresponding tensor spaces containing the meanings of the words with those types (using the notation syntactic type : semantic type).

We first assume that all atomic types have meanings living in distinct vector spaces:

• noun phrases, NP : N
• sentences, S : S

The recipe for determining the meaning space of a complex syntactic type is to replace each atomic type with its corresponding vector space and the slashes with tensor product operators:

• Intransitive verb, S\NP : S ⊗ N
• Transitive verb, (S\NP)/NP : S ⊗ N ⊗ N
• Ditransitive verb, ((S\NP)/NP)/NP : S ⊗ N ⊗ N ⊗ N
• Adverbial modifier, (S\NP)\(S\NP) : S ⊗ N ⊗ S ⊗ N
• Preposition modifying NP, (NP\NP)/NP : N ⊗ N ⊗ N

Hence the meaning of an intransitive verb, for example, is a matrix in the tensor product space S ⊗ N. The meaning of a transitive verb is a "cuboid", or 3rd-order tensor, in the tensor product space S ⊗ N ⊗ N. In the same way that the syntactic type of an intransitive verb can be thought of as a function (taking an NP and returning an S), the meaning of an intransitive verb is also a function (linear map), taking a vector in N and returning a vector in S. Another way to think of this function is that each element of the matrix specifies, for a pair of basis vectors (one from N and one from S), what the result is on the S basis vector given a value on the N basis vector.

Now we describe how the combinatory rules carry over to the meaning spaces.

3.1 Application

The function application rules of CCG are forward (>) and backward (<) application:

X/Y  Y  ⇒  X   (>)
Y  X\Y  ⇒  X   (<)

In a traditional semantics for CCG, if function application is applied in the syntax, then function application applies also in the semantics (Steedman, 2000). This is also true of the tensor-based semantics. For example, the meaning of a subject NP combines with the meaning of an intransitive verb via matrix multiplication, which is equivalent to applying the linear map corresponding to the matrix to the vector representing the meaning of the NP. Applying (multi-)linear maps in (multi-)linear algebra is equivalent to applying tensor contraction to the combining tensors. Here is the case for an intransitive verb:

Pat    walks
NP     S\NP     ⇒  S
N      S ⊗ N

We have maintained the order of indices given by the type: the first index corresponds to the type X and the second to the type Y. That is why, when performing the contraction corresponding to Pat walks, P ∈ N is contracted with the second index of W ∈ S ⊗ N, and not the first.1 The first index of W is then the only free index, telling us that the above operation yields a first-order tensor (vector). Since this index corresponds to S, we know that applying backward application to Pat walks yields a meaning vector in S.

Forward application is performed in the same manner. Consider the following example:

Pat    kisses         Sandy
NP     (S\NP)/NP      NP
N      S ⊗ N ⊗ N      N

with corresponding tensors P ∈ N for Pat, K ∈ S ⊗ N ⊗ N for kisses, and Y ∈ N for Sandy. The forward application deriving the type of kisses Sandy corresponds to

K_ijk Y_k,

where Y is contracted with the third index of K because we have maintained the order defined by the type (S\NP)/NP: the third index then corresponds to an argument NP coming from the right. Counting the number of free indices in the above expression tells us that it yields a second-order tensor. Looking at the types corresponding to the free indices tells us that this second-order tensor is of type S ⊗ N, which is the semantic type of a verb phrase (or intransitive verb), as we have already seen in the walks example.

3.2 Composition

The forward (>B) and backward (<B) composition rules are:

X/Y  Y/Z  ⇒  X/Z   (>B)
Y\Z  X\Y  ⇒  X\Z   (<B)
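The application cases of Section 3.1 (Pat walks, kisses Sandy) can be checked numerically. The dimensionalities and random tensors below are arbitrary placeholders, since the paper deliberately makes no assumptions about how the tensors are acquired:

```python
# CCG application as tensor contraction (toy dimensions, random placeholders).
import numpy as np

rng = np.random.default_rng(0)
N, S = 4, 3                       # noun and sentence space dimensionalities

pat    = rng.random(N)            # P in N
sandy  = rng.random(N)            # Y in N
walks  = rng.random((S, N))       # W in S (x) N,       type S\NP
kisses = rng.random((S, N, N))    # K in S (x) N (x) N, type (S\NP)/NP

# Backward application, Pat walks: contract P with the second index of W;
# the remaining free index corresponds to S, so we get a sentence vector.
s1 = np.einsum('ij,j->i', walks, pat)
assert s1.shape == (S,)

# Forward application, kisses Sandy: contract Y with the third index of K,
# leaving a verb-phrase matrix in S (x) N; then apply it to Pat as above.
vp = np.einsum('ijk,k->ij', kisses, sandy)
s2 = np.einsum('ij,j->i', vp, pat)
assert vp.shape == (S, N) and s2.shape == (S,)
```

The free indices of each einsum subscript string reproduce the result type predicted by the index-counting rule in the text.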
MijklKklm, yielding a tensor in S N N, which is the Both ways of parsing this sentence yield an item ⊗ ⊗ correct semantic type for a phrase with syntactic of type X, and crucially the meaning of the result- type (S NP)/NP. Backward composition is per- ing item should be the same in both cases.2 This \ formed analogously. property of type-raising provides an avenue into determining what the tensor representation for the 3.3 Backward-Crossed Composition type-raised category should be, since the tensor English also requires the use of backward-crossed representations must also be the same: composition (Steedman, 2000): AjBij = Aijk0 Bjk. X/Y Z X = Z/Y ( (S NP)/NP (S NP) (S NP) T) and backward ( 51 a cubiod in which the noun vector is repeated a For each i, j, k, two matrices — corresponding to number of times (once for each sentence index), the l, m indices above — are “cancelled”. resulting in a series of “steps” progressing diag- This intuitive explanation extends to arguments onally from the bottom of the cuboid to the top with any number of slashes. For example, a (assuming a particular orientation). composition where the cancelling categories are The discussion so far has been somewhat ab- (N /N )/(N /N ) would require inner products be- stract, so to finish this section we include some tween 4th-order tensors in N N N N. more examples with CCG categories, and show ⊗ ⊗ ⊗ that the tensor contraction operation has an intu- itive similarity with the “cancellation law” of cat- 4 Related Work egorial grammar which applies in the syntax. First consider the example of a subject NP The tensor-based semantics presented in this pa- with meaning A, combining with a verb phrase per is effectively an extension of the Coecke et al. S NP with meaning B, resulting in a sentence (2010) framework to CCG, re-expressing in Ein- \ with meaning C. In the syntax, the two NPs can- stein notation the existing categorical CCG exten- cel. 
In the semantics, for each basis of the sentence sion in Grefenstette (2013), which itself builds space S we perform an inner product between two on an earlier Lambek Grammar extension to the vectors in N: framework by Coecke et al. (2013). This work also bears some similarity to the Ci = AjBij treatment of categorial grammars presented by Ba- Hence, inner products in the tensor space corre- roni et al. (2013), which it effectively encompasses spond to cancellation in the syntax. by expressing the tensor contractions described by This correspondence extends to complex argu- Baroni et al. as Einstein summations. However, ments, and also to composition. Consider the sub- this paper also covers CCG-specific operations not ject type-raising case, in which a subject NP with discussed by Baroni et al., such as type-raising and meaning A in S S N combines with a verb composition. ⊗ ⊗ phrase S NP with meaning B, resulting in a sen- One difference between this paper and the orig- \ tence with meaning C. Again we perform inner inal work by Coecke et al. (2010) is that they use product operations, but this time the inner product pregroups as the syntactic formalism (Lambek, is between two matrices:3 2008), a context-free variant of categorial gram- mar. In pregroups, cancellation in the syntax is Ci = AijkBjk always between two atomic categories (or more precisely, between an atomic category and its “ad- Note that two matrices are “cancelled” for each joint”), whereas in CCG the arguments in complex basis vector of the sentence space (i.e. for each categories can be complex categories themselves. index i in Ci). To what extent this difference is significant re- As a final example, consider the forward com- mains to be seen. For example, one area where this position from earlier, in which a modal verb with may have an impact is when non-linearities are meaning A in S N S N combines with a tran- ⊗ ⊗ ⊗ added after contractions. 
Since the CCG contrac- sitive verb with meaning B in S N N to give ⊗ ⊗ tions with complex arguments happen “in one go”, a transitive verb with meaning C in S N N. ⊗ ⊗ whereas the corresponding pregroup cancellation Again the cancellation in the syntax corresponds in the semantics would be a series of contractions, to inner products between matrices, but this time many more non-linearities would be added in the we need an inner product for each combination of pregroup case. 3 indices: Krishnamurthy and Mitchell (2013) is based on a similar insight to this paper – that CCG provides C = A B ijk ijlm lmk combinators which can manipulate functions op- 3To be more precise, the two matrices can be thought of erating over vectors. Krishnamurthy and Mitchell as vectors in the tensor space S N and the inner product is consider the function application case, whereas we between these vectors. Another⊗ way to think of this opera- tion is to “linearize” the two matrices into vectors and then have shown how the type-raising and composition perform the inner product on these vectors. operators apply naturally in this setting also. 52 5 Conclusion of what the meaning spaces represent. The lat- ter question is particularly pressing in the case of This paper provides a theoretical framework for the sentence space, and providing an interpretation the development of a compositional distributional of such spaces remains a challenge for the distri- semantics for CCG. Given the existence of ro- butional semantics community, as well as relating bust, wide-coverage CCG parsers (Clark and Cur- distributional semantics to more traditional topics ran, 2007; Hockenmaier and Steedman, 2002), in semantics such as quantification and inference. together with various techniques for learning the tensors, the opportunity exists for a practical im- Acknowledgments plementation. However, there are significant engi- neering difficulties which need to be overcome. 
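The scale of one such difficulty, sheer parameter count, is easy to quantify. The sketch below uses an arbitrary dimensionality of 100 for the noun space; the rank-R decomposition figure is a rough illustration of the kind of saving tensor decomposition offers, not a claim about any particular method:

```python
# Back-of-the-envelope parameter counts for high-order category tensors.
# The category ((N/N)/(N/N))/((N/N)/(N/N)) has eight atomic Ns, hence an
# 8th-order tensor. The dimensionality 100 is an arbitrary illustrative choice.
noun_dim = 100
order = 8                       # one tensor index per atomic N in the category
params_full = noun_dim ** order
assert params_full == 10 ** 16  # ten quadrillion parameters for one word

# A rank-R CP-style decomposition (cf. Kolda and Bader, 2009) would instead
# store R vectors per mode: R * order * noun_dim parameters.
rank = 50
params_cp = rank * order * noun_dim
assert params_cp == 40_000      # vastly fewer than 10**16
```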
Consider adapting the neural-network learning techniques of Socher et al. (2012) to this problem.4 In terms of the number of tensors, the lexicon would need to contain a tensor for every word-category pair; this is at least an order of magnitude more tensors than the number of matrices learnt in existing work (Socher et al., 2012; Hermann and Blunsom, 2013). Furthermore, the order of the tensors is now higher. Syntactic categories such as ((N/N)/(N/N))/((N/N)/(N/N)) are not uncommon in the wide-coverage grammar of Hockenmaier and Steedman (2007), which in this case would require an 8th-order tensor. This combination of many word-category pairs and higher-order tensors results in a huge number of parameters. As a solution to this problem, we are investigating ways to reduce the number of parameters, for example using tensor decomposition techniques (Kolda and Bader, 2009). It may also be possible to reduce the size of some of the complex categories in the grammar. Many challenges remain before a type-driven compositional distributional semantics can be realised, similar to the work of Bos for the model-theoretic case (Bos et al., 2004; Bos, 2005), but in this paper we have set out the theoretical framework for such an implementation.

Finally, we repeat a comment made earlier that the compositional framework makes no assumptions about the underlying vector spaces, or how they are to be interpreted. On the one hand, this flexibility is welcome, since it means the framework can encompass many techniques for building word vectors (and tensors). On the other hand, it means that a description of the framework is necessarily abstract, and it leaves open the question of what the meaning spaces represent. The latter question is particularly pressing in the case of the sentence space, and providing an interpretation of such spaces remains a challenge for the distributional semantics community, as well as relating distributional semantics to more traditional topics in semantics such as quantification and inference.

4 Non-linear transformations are inherent to neural networks, whereas the framework in this paper is entirely linear. However, as hinted at earlier in the paper, non-linear transformations can be applied to the output of each tensor, turning the linear networks in this paper into extensions of those in Socher et al. (2012) (extensions in the sense that the tensors in Socher et al. (2012) do not extend beyond matrices).

Acknowledgments

Jean Maillard is supported by an EPSRC MPhil studentship. Stephen Clark is supported by ERC Starting Grant DisCoTex (306920) and EPSRC grant EP/I037512/1. Edward Grefenstette is supported by EPSRC grant EP/I037512/1. We would like to thank Tamara Polajnar, Laura Rimell, Nal Kalchbrenner and Karl Moritz Hermann for useful discussion.

References

M. Baroni and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Conference on Empirical Methods in Natural Language Processing (EMNLP-10), Cambridge, MA.

M. Baroni, R. Bernardi, and R. Zamparelli. 2013. Frege in space: A program for compositional distributional semantics (to appear). Linguistic Issues in Language Technologies.

Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a CCG parser. In Proceedings of COLING-04, pages 1240–1246, Geneva, Switzerland.

Johan Bos. 2005. Towards wide-coverage semantic interpretation. In Proceedings of the Sixth International Workshop on Computational Semantics (IWCS-6), pages 42–53, Tilburg, The Netherlands.

Stephen Clark and James R. Curran. 2007. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552.

Stephen Clark and Stephen Pulman. 2007. Combining symbolic and distributional models of meaning. In Proceedings of AAAI Spring Symposium on Quantum Interaction, Stanford, CA. AAAI Press.

Stephen Clark. 2014. Vector space models of lexical meaning (to appear). In Shalom Lappin and Chris Fox, editors, Handbook of Contemporary Semantics, second edition. Wiley-Blackwell.

Daoud Clarke. 2012. A context-theoretic framework for compositionality in distributional semantics. Computational Linguistics, 38(1):41–71.

B. Coecke, M. Sadrzadeh, and S. Clark. 2010. Mathematical foundations for a compositional distributional model of meaning. In J. van Bentham, M. Moortgat, and W. Buszkowski, editors, Linguistic Analysis (Lambek Festschrift), volume 36, pages 345–384.

Bob Coecke, Edward Grefenstette, and Mehrnoosh Sadrzadeh. 2013. Lambek vs. Lambek: Functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic.

R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In International Conference on Machine Learning, ICML, Helsinki, Finland.

James R. Curran. 2004. From Distributional to Semantic Similarity. Ph.D. thesis, University of Edinburgh.

D. R. Dowty, R. E. Wall, and S. Peters. 1981. Introduction to Montague Semantics. Dordrecht.

Katrin Erk. 2012. Vector space models of word meaning and phrase meaning: a survey. Language and Linguistics Compass, 6(10):635–653.

Jayant Krishnamurthy and Tom M. Mitchell. 2013. Vector space semantic parsing: A framework for compositional vector space models. In Proceedings of the 2013 ACL Workshop on Continuous Vector Space Models and their Compositionality, Sofia, Bulgaria.

Joachim Lambek. 2008. From Word to Sentence. A Computational Algebraic Approach to Grammar. Polimetrica.

T. K. Landauer and S. T. Dumais. 1997. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2):211–240.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL-08, pages 236–244, Columbus, OH.

Stephen Pulman. 2013. Distributional semantic models. In Heunen, Sadrzadeh, and Grefenstette, editors, Compositional Methods in Physics and Linguistics. Oxford University Press.

Hinrich Schütze. 1998.
Automatic word sense dis- crimination. Computational Linguistics, 24(1):97– J. R. Firth. 1957. A synopsis of linguistic theory 1930- 124. 1955. In Studies in Linguistic Analysis, pages 1–32. Oxford: Philological Society. Richard Socher, Christopher D. Manning, and An- Edward Grefenstette, Georgiana Dinu, YaoZhong drew Y. Ng. 2010. Learning continuous phrase Zhang, Mehrnoosh Sadrzadeh, and Marco Baroni. representations and syntactic parsing with recursive 2013. Multistep regression learning for composi- neural networks. In Proceedings of the NIPS Deep tional distributional semantics. In Proceedings of Learning and Unsupervised Feature Learning Work- the 10th International Conference on Computational shop, Vancouver, Canada. Semantics (IWCS-13), Potsdam, Germany. Richard Socher, Brody Huval, Christopher D. Man- Gregory Grefenstette. 1994. Explorations in Auto- ning, and Andrew Y. Ng. 2012. Semantic composi- matic Thesaurus Discovery. Kluwer. tionality through recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Meth- Edward Grefenstette. 2013. Category-Theoretic ods in Natural Language Processing, pages 1201– Quantitative Compositional Distributional Models 1211, Jeju, Korea. of Natural Language Semantics. Ph.D. thesis, Uni- versity of Oxford. Mark Steedman. 2000. The Syntactic Process. The MIT Press, Cambridge, MA. Karl Moritz Hermann and Phil Blunsom. 2013. The role of syntax in vector space models of composi- Peter D. Turney and Patrick Pantel. 2010. From tional semantics. Proceedings of ACL, Sofia, Bul- frequency to meaning: Vector space models of se- garia, August. Association for Computational Lin- mantics. Journal of Artificial Intelligence Research, guistics. 37:141–188. Julia Hockenmaier and Mark Steedman. 2002. Gen- erative models for statistical parsing with Combi- natory Categorial Grammar. In Proceedings of the 40th Meeting of the ACL, pages 335–342, Philadel- phia, PA. Julia Hockenmaier and Mark Steedman. 2007. 
CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3):355-396.

T. G. Kolda and B. W. Bader. 2009. Tensor decompositions and applications. SIAM Review, 51(3):455-500.

From Natural Language to RDF Graphs with Pregroups

Antonin Delpeuch, École Normale Supérieure, 45 rue d'Ulm, 75005 Paris, France, [email protected]
Anne Preller, LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5, France, [email protected]

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 55-62, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

Abstract

We define an algorithm translating natural language sentences to the formal syntax of RDF, an existential conjunctive logic widely used on the Semantic Web. Our translation is based on pregroup grammars, an efficient type-logical grammatical framework with a transparent syntax-semantics interface. We introduce a restricted notion of side effects in the semantic category of finitely generated free semimodules over {0, 1} to that end. The translation gives an intensional counterpart to previous extensional models. We establish a one-to-one correspondence between extensional models and RDF models such that satisfaction is preserved. Our translation encompasses the expressivity of the target language and supports complex linguistic constructions like relative clauses and unbounded dependencies.

1 Introduction

There is a general agreement that Natural Language Processing has two aspects. One is syntax: rules for how words are put together to form a grammatical string. The other is semantics: rules for how meanings of strings are computed from the meanings of words. To this we add a third aspect, namely that semantics must include rules of logic for computing the truth of the whole string from the truth of its parts.

The Resource Description Framework (RDF) (Hayes and McBride, 2004) is an artificial language that takes the intuitive form of knowledge graphs. Its semantics has the expressive power of the fragment of multisorted first order logic that uses conjunction and existential quantification only. This restricted expressive power limits the statements of natural language that can be interpreted as RDF graphs. Typically, statements with negation words are excluded.

We use pregroup grammars as the linguistic and mathematical tool to recognise and interpret natural language strings in RDF. Pregroup Calculus, also known as Compact Bilinear Logic (Lambek, 1993; Lambek, 1999), is a simplification of Lambek's Syntactic Calculus (Lambek, 1958). Pregroup grammars are based on a lexicon, like all categorial grammars. Syntactical analysis consists in the construction of a proof (derivation) in the calculus.

All semantics proposed so far use functors from the lexical category of (Preller and Lambek, 2007) into some symmetric compact closed category. They include the compositional distributional vector space models of (Clark et al., 2008; Kartsaklis et al., 2013), based on context, and the functional logical models of (Preller, 2012).

We proceed as follows. Recalling that words and grammatical strings recognised by the grammar are represented by meaning-morphisms in the lexical category, we propose a "syntactic" functor from the latter into a symmetric monoidal category C_S of 'morphisms with side effects' over the category C of finite dimensional semimodules over the lattice B = {0, 1}. The elements of the semimodule S identify with RDF graphs. The value of the syntactic functor at a statement is the RDF graph of the statement. The extensional models of logic are recast as "semantic" functors from the lexical category to C. We associate to any semantic functor an RDF interpretation and show that a statement is true in the semantic functor if and only if the corresponding RDF graph is true in the RDF interpretation.

2 Preliminaries

2.1 RDF graphs

RDF is a widely-adopted framework introduced in (Hayes and McBride, 2004) as a standard for linked information representation on the Web. Informally, an RDF graph is a set of labeled links between objects, which represent statements concerning the linked objects. Throughout this article we will simply consider that objects are represented by strings of characters without spaces, written in a mono-spaced font: the entity John is denoted by the string John.

A link between two objects is a triple, made of one entity (the subject of the predicate), a property (the type of the link, also represented by a string) and a second entity (the object of the predicate). Graphically, we represent a triple as a directed edge from the subject to the object, labeled by the predicate.

RDF allows the use of nodes without labels, called blank nodes. Concretely, this means that we can always pick a fresh node, named blank-n where n is some number, such that this node is not involved in any triple yet.

An example. We can represent the sentence John owns a white boat using a fresh blank node for a white boat:

    John owns blank-1
    blank-1 rdf:type boat
    blank-1 is_white true

Here, rdf:type is a special RDF predicate indicating the type of its subject. The graph representing this set of triples is shown in Figure 1.

[Figure 1: A graph with a blank node -- the three triples above drawn as a directed labelled graph.]

Recall that our goal is to translate natural language sentences to graphs. Graphs can indeed be seen as semantic representations of sentences, in the sense that they can be used to assign a truth value to a sentence, through the notion of entailment (Hayes and McBride, 2004).

A graph H is an instance of a graph G when H can be obtained from G by assigning labels to some blank nodes (possibly none of them). A graph G entails another graph G' if an instance of G' is a subgraph of G. With these definitions, we can define true RDF graphs as the graphs entailed by some reference graph, storing all the true facts.

2.2 Pregroup grammars

A pregroup grammar (Lambek, 1999) consists of the free pregroup C(B) generated by a partially ordered set B and a lexicon D_B. The objects of C(B) are called types. They include the elements of B, called basic types.

We start with the formal definition of a monoidal category, followed by the conditions it must satisfy to be compact closed.

Definition 1. A category C is

1. monoidal if there is a bifunctor ⊗ : C × C → C and a distinguished object I, the unit of ⊗, satisfying (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) and (f ⊗ g) ⊗ h = f ⊗ (g ⊗ h);

2. compact closed if it is monoidal and there are contravariant endofunctors ( )^r and ( )^l, called right adjoint and left adjoint, such that for every object A and every morphism f : A → B there are morphisms η_f : I → A^r ⊗ B, the name of f, and ε_f : A ⊗ B^r → I, the coname of f, satisfying (A ⊗ B)^r = B^r ⊗ A^r, (A ⊗ B)^l = B^l ⊗ A^l and, for any g : B → C,

    (ε_f ⊗ 1_C) ∘ (1_A ⊗ η_g) = g ∘ f
    (1_(A^r) ⊗ ε_g) ∘ (η_f ⊗ 1_(C^r)) = (g ∘ f)^r;

3. symmetric if it is monoidal and there is a natural isomorphism σ_AB : A ⊗ B → B ⊗ A such that σ_AB^(-1) = σ_BA.

Recall that ⊗ is a bifunctor if and only if the following equalities are satisfied for all objects A, B and all morphisms f_i : A_i → B_i, g_i : B_i → C_i, i = 1, 2:

    1_A ⊗ 1_B = 1_(A⊗B)
    (g_1 ∘ f_1) ⊗ (g_2 ∘ f_2) = (g_1 ⊗ g_2) ∘ (f_1 ⊗ f_2)    (Interchange Law) (1)

The morphisms of C(B) and L_B identify with graphs, which we describe below without invoking category theory.
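Since the morphisms considered here are linear maps, the tensor product can be realised concretely as a Kronecker product, and the Interchange Law (1) can then be checked numerically. The following is a hypothetical sketch (our own, not from the paper), using ordinary integer arithmetic rather than a boolean semiring:

```python
# Hypothetical check of the Interchange Law (1) for linear maps:
# (g1 . f1) (x) (g2 . f2)  =  (g1 (x) g2) . (f1 (x) f2),
# with (x) realised as the Kronecker product of matrices.
import numpy as np

rng = np.random.default_rng(0)
f1, g1 = rng.integers(0, 2, (3, 2)), rng.integers(0, 2, (4, 3))
f2, g2 = rng.integers(0, 2, (2, 2)), rng.integers(0, 2, (3, 2))

lhs = np.kron(g1 @ f1, g2 @ f2)                 # compose first, then tensor
rhs = np.kron(g1, g2) @ np.kron(f1, f2)         # tensor first, then compose
print(np.array_equal(lhs, rhs))
```

The equality holds on the nose for matrices (the mixed-product property of the Kronecker product), which is the concrete counterpart of the coherence remarks in the text.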
By 'free pregroup' we mean the free compact closed category C(B) generated by B, following the version given in (Preller and Lambek, 2007). (The free pregroup of (Lambek, 1999) is a partial preorder, which confounds all morphisms with identical domain and codomain, whereas semantics requires more than one morphism between objects. Strictly speaking, the monoidal equalities of Definition 1 hold only up to natural isomorphism, but the coherence theorem of (Mac Lane, 1971) makes it possible to replace the isomorphisms by equalities without loss of generality.)

The lexicon D_B introduces the monoidal category L_B, freely generated by the inequalities and the lexical morphisms given by the lexicon. Lexical entries have "formal meanings" in the lexical category, the free compact closed category generated by B and the morphisms introduced by the words. They are analogues of the lambda-terms intervening in categorial grammars (Steedman, 1996).

The working of pregroup grammars can be described without explicit use of category theory. The main result of this section, however, the decomposition lemma, invokes properties of the graphs proved in (Preller and Lambek, 2007).

For instance, the set B may include the basic types s for statements, d for determiners, n for noun phrases and o for direct objects, with corresponding types in relative clauses r, n̂ and ô. There is only one strict inequality, n < o. Assimilating B to a category, we write i_ab : a → b instead of a ≤ b, and 1_a for i_aa. A simple type is an iterated left adjoint or right adjoint of a basic type: ..., a^ll = a^(-2), a^l = a^(-1), a = a^(0), a^r = a^(1), a^rr = a^(2), .... The iterator of t = a^(z) is the integer z. An arbitrary type is a string of simple types separated by the symbol ⊗. In particular, the empty string is a type, denoted I.

The lexicon of a pregroup grammar consists of a set of pairs word : T, where the type T ∈ C(B) captures the different grammatical roles a word may play. For instance:

    proper name      : n
    transitive verb  : n^r ⊗ s ⊗ o^l
    transitive verb  : n̂^r ⊗ r ⊗ o^l
    transitive verb  : n^r ⊗ ô^r ⊗ r
    determiner       : d
    adjective        : d^r ⊗ d
    count noun       : d^r ⊗ n
    relative pronoun : n^r ⊗ n ⊗ r^l ⊗ n̂
    relative pronoun : n^r ⊗ n ⊗ r^l ⊗ ô

[Diagrams omitted: each entry w : T gives rise to a meaning morphism w_T : I → T and a lexical morphism, drawn as oriented labelled graphs for the type shapes T = a^r ⊗ b, T = a^r ⊗ b ⊗ c^l and T = a^r ⊗ c^r ⊗ b.]

If T has two factors that are basic types, there is, besides the main lexical morphism, an auxiliary lexical morphism j_T. This is the case for the relative pronoun types T = n^r ⊗ n ⊗ r^l ⊗ d, with d = n̂, ô. We omit the subscript T if this does not lead to confusion.

A pregroup grammar analyses a string of words by constructing a graph: choose an entry w_i : T_i for each word w_i and align the types in the order of the words. Place non-crossing oriented underlinks such that the tail of an underlink is a basic type with an even number of iterations, say t = a^(z) with z = ±2n. If the head is to the left of the tail then it is a left adjoint b^(z-1), for some basic type b ≥ a. If it is to the right then it is a right adjoint b^(z+1). Complete the graph by repeating the nodes that are not head or tail of an underlink in a line below and adding a vertical link between corresponding nodes.

The string is said to be grammatical if exactly one simple type remains that is not the tail or head of an underlink, and if it is a basic type. The resulting graph is called a reduction and denotes a unique morphism r : T_1 ⊗ ... ⊗ T_n → b of C(B). More generally, graphs standing for morphisms align the domain above and the codomain below.

The nodes of the meaning graphs are the simple types of T. They form the lower line of the graph; the upper line is the empty string I. The corresponding morphism has domain I and codomain T. An overlink may have several tails but only one head.
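The cancellation procedure behind reductions can be mocked up in a few lines. The sketch below is hypothetical (our own, not the paper's algorithm): simple types are (basic type, iterator) pairs following the convention a^l = a^(-1), a = a^(0), a^r = a^(1); cancellation is greedy and approximates the partial order on basic types with an explicit inequality table, which suffices for the running example:

```python
# Hypothetical sketch: check a pregroup reduction by repeatedly cancelling
# an adjacent pair of simple types (a, z), (b, z+1) when a and b are
# comparable in the partial order on basic types (here only n <= o).
LEQ = {("n", "n"), ("o", "o"), ("s", "s"), ("d", "d"), ("n", "o")}

def reduce_types(types):
    """Greedily cancel adjacent adjoint pairs; return the leftover types."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            (a, z1), (b, z2) = types[i], types[i + 1]
            if z2 == z1 + 1 and ((a, b) in LEQ or (b, a) in LEQ):
                del types[i:i + 2]      # one underlink: drop both endpoints
                changed = True
                break
    return types

# john owns a white boat:
# (n) (n^r s o^l) (d) (d^r d) (d^r n)
string_type = [("n", 0), ("n", 1), ("s", 0), ("o", -1),
               ("d", 0), ("d", 1), ("d", 0), ("d", 1), ("n", 0)]
print(reduce_types(string_type))   # a single basic type survives: s
```

The string is accepted because exactly one basic type, s, remains -- mirroring the grammaticality condition above. A full implementation would also respect the orientation constraints on underlinks, which this greedy version ignores.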
The tails of overlinks are right or left adjoints of basic types; the head is a basic type.[3]

[3] "Basic type" is replaced by "simple type with an even iterator" in the general case.

For instance, the graph below exhibits a reduction to the sentence type s:

    john  owns             a    white      boat
    (n)   (n^r ⊗ s ⊗ o^l)  (d)  (d^r ⊗ d)  (d^r ⊗ n)

[Reduction diagram omitted: the underlinks cancel all simple types except s.]

The following two graphs are reductions to the noun phrase type n. The first graph corresponds to the case where the relative pronoun is the object of the verb in the relative clause, whereas it is the subject in the second graph:

    a    cat        that                  bob  hates
    (d)  (d^r ⊗ n)  (n^r ⊗ n ⊗ r^l ⊗ ô)  (n)  (n^r ⊗ ô^r ⊗ r)

    a    cat        that                  hates             bob
    (d)  (d^r ⊗ n)  (n^r ⊗ n ⊗ r^l ⊗ n̂)  (n̂^r ⊗ r ⊗ o^l)  (n)

[Reduction diagrams omitted: in both cases the underlinks leave the single basic type n.]

The lexical category L_B generated by the lexicon D_B is the compact closed category freely generated by B, the symmetry morphisms σ_ab and the lexical morphisms introduced by D_B.

Strings of words also have meaning(s) in the lexical category. The lexical meaning of a grammatical string word_1 ... word_n recognised by the reduction r : T_1 ⊗ ... ⊗ T_n → b is, by definition, the composite of the tensor product of the word meanings and the chosen reduction:

    r ∘ (word_1T_1 ⊗ ... ⊗ word_nT_n) : I → b.

The meaning of a composite morphism g ∘ f can be simplified graphically. Stack the graph of f above the graph of g and walk the paths of the graph, starting at a node that is not the head of a link, until you arrive at a node that is not the tail of a link. Compose the labels in the order they are encountered along the path. Replace the whole path by a single link, labelling it by the composite label of the path. The resulting graph represents the morphism g ∘ f (Selinger, 2011).

The meanings of words are also represented by graphs that correspond to morphisms of the lexical category L_B. In fact, every entry w : T in the lexicon gives rise to a meaning morphism w_T : I → T and a lexical morphism, represented by oriented labelled graphs.

For instance, the grammatical strings mentioned above simplify to

    own ∘ (john ⊗ (boat ∘ white ∘ a)) : I → s

for john owns a white boat, and to composites of the form

    [that ∘ hate_T ∘ ((cat ∘ a) ⊗ bob)] : I ⊗ I → n

for a cat that hates Bob, with T = n̂^r ⊗ r ⊗ o^l, and similarly with T' = n^r ⊗ ô^r ⊗ r for a cat that Bob hates.

All unlabelled links in the graphs above correspond to inequalities of basic types. Inserting the strict inequalities at the appropriate place and applying the interchange law, we decompose the label into minimal building blocks that correspond one-to-one to the resources occurring in the RDF graph associated to the statement. For example,

    own ∘ (john ⊗ (boat ∘ white ∘ a))
      = own ∘ (1 ⊗ i_n) ∘ (1 ⊗ boat) ∘ (1 ⊗ white) ∘ (john ⊗ a).    (2)

The expression after the equality symbol above is a composite of tensor products, the factors of which are either inequalities between basic objects or lexical morphisms. Only the rightmost tensor product contains more than one lexical morphism. In fact, all of its factors are lexical morphisms with domain I.

The translation of statements to RDF graphs rests on the existence of this decomposition. Borrowing the terminology of RDF graphs, call any lexical morphism with domain I a node word and any other lexical morphism a property word.

Lemma 1 (Decomposition). Let word_1 ... word_n be a grammatical string with lexical morphisms word_1, ..., word_n and a reduction r : T_1 ⊗ ... ⊗ T_n → b. Then there is an enumeration of the node words word_i1 : I → b_1, ..., word_im : I → b_m such that the lexical meaning of the string satisfies

    r ∘ (word_1T_1 ⊗ ... ⊗ word_nT_n) = p_1 ∘ ... ∘ p_m' ∘ (word_i1 ⊗ ... ⊗ word_im),

where each p_k is either a tensor product of inequalities of basic types or a tensor product of inequalities and one property word word_jk. Moreover, k < k' implies j_k < j_k'. In particular, the meaning of the string belongs to the monoidal category generated by the lexicon.

This is a straightforward consequence of the characterisation of morphisms as normal graphs in the free category (Preller and Lambek, 2007) and the interchange law.

We map lexical morphisms to "unfinished" RDF triples:

    lexical morphism          RDF triple
    noun : d → n              ? rdf:type noun
    adjective : d → d         ? is-adjective true
    verb : a ⊗ b → c          ? verb ?
    determiner : I → d        blank ? ?
    proper name : I → n       propername ? ?        (3)

The question marks designate unoccupied positions in the triple. Node words occupy either the subject or the object position, unary property words leave only the subject position open, and binary property words occupy the centre position, leaving subject and object position unoccupied.

Finally, the noun phrases a cat that hates Bob and a cat that Bob hates are respectively translated to the following graphs:

    (a) A cat that hates Bob:  blank-1 rdf:type cat,  blank-1 hates Bob
    (b) A cat that Bob hates:  blank-1 rdf:type cat,  Bob hates blank-1

3 The Translation

Let B = <{0, 1}, +, ·> be the commutative semiring in which the addition + is the lattice operator ∨ and the multiplication · the lattice operator ∧ on {0, 1}. The semantic category C which hosts the RDF graphs and our models of grammatical strings is the category of finitely generated semimodules over B. It is compact closed, satisfying A^l = A = A^r for each object A. Every object A has a 'canonical' basis (e_i)_i such that every element v ∈ A can be written uniquely as v = Σ_i α_i e_i, where α_i ∈ B. We refer to elements of A as vectors. Morphisms of C are maps that commute with addition and scalar multiplication.

We interpret RDF graphs as elements of a semimodule S determined by the RDF vocabulary, see below. The translation of grammatical strings is given by a functor from the monoidal category generated by the lexical morphisms into a category of morphisms with side effects, mapping the decomposition of a grammatical string to a vector of S.

Let L be a set of labels that includes the property words and a 'very large', but finite, number n_0 of labels blank_i, for i = 1, ..., n_0. The other elements of L are node names and property names of an RDF vocabulary.

Define N as the semimodule over the semiring B freely generated by L, and denote by e_label the basis vector of N corresponding to label ∈ L. We represent RDF triples by basis vectors of S = N ⊗ N ⊗ N and RDF graphs by sums of basis vectors of S, for instance

    e_bob ⊗ e_hate ⊗ e_blank + e_blank ⊗ e_rdf-type ⊗ e_cat.

Hence, the vector sum models the union of RDF graphs.

We want to interpret the lexical morphisms such that they construct a triple when applied to an argument and add it to the vector of S already constructed. As an illustration, consider the following pairs of a linear map and a vector of S (made precise as morphisms with side effects below):

    m_john = (e_john, 0),    m_blanki = (e_blanki, 0)

The arrow of a proper noun consists of the node representing this entity, e.g. e_john, paired with the empty graph, represented by 0 ∈ S. Choosing the empty graph means that nothing is said about this node. A similar remark holds for determiners. We present

    m_white = (1_N, 1_N ⊗ e_is-white ⊗ e_true),
    m_boat  = (1_N, 1_N ⊗ e_rdf-type ⊗ e_boat)

An adjective or a noun is a morphism that maps a node word to itself and to the empty slot in the corresponding triple. As a side effect, it adds this triple to the second component.
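These side-effect pairs can be mocked up concretely. The sketch below is hypothetical (the function names are ours, not the paper's): the first component is represented by a node label, the second by a set of triples, and set union plays the role of the vector sum; the verb case anticipates the transitive-verb morphism m_own of the text.

```python
# Hypothetical sketch: each morphism returns (value, graph), where graph is
# the set of RDF triples accumulated so far ("side effects"); composition
# threads the value through and unions the graphs.

def m_blank(i):
    return (f"blank-{i}", set())          # determiner: fresh node, empty graph

def m_proper(name):
    return (name, set())                  # proper name: labelled node

def m_adjective(adj, arg):
    node, graph = arg                     # maps the node to itself and adds
    return (node, graph | {(node, f"is_{adj}", "true")})

def m_noun(noun, arg):
    node, graph = arg
    return (node, graph | {(node, "rdf:type", noun)})

def m_verb(verb, subj, obj):
    (s_node, s_graph), (o_node, o_graph) = subj, obj
    triple = (s_node, verb, o_node)       # subject, predicate, object
    return (triple, s_graph | o_graph | {triple})

# john owns a white boat:
boat = m_noun("boat", m_adjective("white", m_blank(1)))
_, graph = m_verb("own", m_proper("John"), boat)
for t in sorted(graph):
    print(t)
```

Running this reproduces the three triples t_1, t_2, t_3 of the running example: the blank node is typed as a boat, marked white, and linked to John by own.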
Composi- mown = (1N eown 1N , 1N eown 1N ) tion of the category alone is not powerful enough to ⊗ ⊗ ⊗ ⊗ C achieve this. We define a new category in which A transitive verb is a morphism that maps an ordered CS the entity a white boat will be denoted by the pair pair of nodes to a triple making the first the subject and (e , e e e + e e the second the object of the triple. blank-1 blank-1 ⊗ rdf:type ⊗ boat blank-1 ⊗ is-white ⊗ etrue). Compose the morphisms Therefore we switch from to the monoidal cate- C mwhite mblanki = (eblanki , t1) gory S below in which arrows have two components. ◦ C t1 = (eblanki eis-white etrue) The first component creates the triple and the second ⊗ ⊗ mboat mwhite mblank = (eblank , t2 + t1) component adds the new triple to the graph as a ‘side ◦ ◦ i i t2 = (eblanki erdf-type eboat) effect’. m (m ⊗(m m⊗ m )) = own ◦ john ⊗ boat ◦ white ◦ blanki Definition 2 (Category with Side Effects). Let mown (ejohn eblanki , 0 + t2 + t1) = 5 ◦ ⊗ a , . . . , a and b , . . . , b be the basis of A and (ejohn eown eblank , t3 + t2 + t1) { 1 m} { 1 n} ⊗ ⊗ i B. For any p (A, S), q (B,S), define p q t3 = ejohn eown eblank ∈ C ∈ C ⊗+ ∈ ⊗ ⊗ i (A B,S) as the unique linear map satisfying for an (4) C ⊗ arbitrary basis vector ai bj of A B The effect of composition is to create a new triple on ⊗ ⊗ the left of the comma and to store it on the right. (p + q): ai bj p(ai) + q(bj) . Proposition 1. The category with side effects S is a ⊗ ⊗ 7→ C monoidal category. The category of morphisms with side effects in S CS has: The proof of this proposition is given in appendix A. objects as in Define a monoidal structure preserving functor • C F morphisms (f, p): A B where f (A, B), from the monoidal category generated by the lexical • → ∈ C Ker(f) = 0 and p (A, S). morphisms to S thus { } ∈ C (s) = S =C N N N arrows 1A = (1A, 0). •F ⊗ ⊗ • (a) = N if a = s an operation defined by (f, p) (h, q) = (f h, p •F 6 • ◦ ◦ ◦ ◦ (1a) = (ia b) = (ja b) = 1 (a), for all basic h + q). 
•F F F F an operation on objects defined as in types a, b • ⊗ C ∈ B an operation on arrows defined by (f, p) (that : r n) = (1, 0) • ⊗ ⊗ •F → (h, q) = (f h, p q). (name : I d) = (ename, 0) ⊗ ⊗+ •F → (determiner : I d) = (eblanki , 0) Examples of morphisms in the first component are •F r → (word : d n) = (1, 1 erdf-type eword) the symmetry σ : N N N N, π1, π2 : N N •F r ⊗ ⊗ ⊗ ⊗ → ⊗ ⊗ → (word : d d) = (1, 1 eis-word etrue) N defined by σ(ai bj) = bj ai, π1(ai bj) = ai •F ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ (wordnr s ol ) = (1 eword 1, 1 eword 1) and π2(ai bj) = bj. •F ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ (wordnˆ r r ol ) = (π1, 1 eword 1) The new operation + introduced above concerns F ⊗ ⊗ ⊗ ⊗ ⊗ (wordnr oˆr r) = (π2, (1 eword 1) σ). the second component only. The morphism p q F ⊗ ⊗ ⊗ ⊗ ◦ ⊗+ Note that the morphisms in the example above satisfy 4 m = (word). Computation (4) shows that it suffices that n0 exceeds the number of occurrences of word F F determiners in the set of digitalised documents maps the lexical meaning of john owns a white boat to 5In , every object has a unique basis: the canonical one. the three RDF triples t , t , t . C 1 2 3 59 Lemma 2. Let word 1 ... word n be a statement with ... Ui is said to be a selector if for any basis vector ⊗ m corresponding lexical entries word : T ... word : e ... e U ... U 1 1 n 1 ⊗ ⊗ n ∈ 1 ⊗ ⊗ n Tn and r : T1 ...Tn s a reduction to the sentence → q(e ... e ) = e ... e or q(e ... e ) = 0 . type. Then second component of 1⊗ ⊗ n i1 ⊗ ⊗ im 1⊗ ⊗ n (r (word ... word )) = We say that q selects the i-th factor if m = 1 and i = i1 F ◦ 1 ⊗ ⊗ n (f, t1 + + tm) S(I,S) and that it is a projector if m = n. If the latter is ··· ∈ C the case then q is an idempotent and self-adjoint en- is a sum of RDF triples t S, for k = 1, . . . , m. domorphism, hence a ‘property’ in quantum logic, see i ∈ Moreover, tk has the form (3) determined by the prop- (Abramsky and Coecke, 2004). erty word word such that the unlabelled nodes are If the domain V and the codomain U ... 
U are jk i1 ⊗ ⊗ im filled by node words of the statement. fixed then the selectors are in a one-to-one correspon- dence with the ‘truth-value’ functions p : A , The proof is given in Appendix B. → {> ⊥} on the set A of basis vectors of V related by the condi- Definition 3 (RDF Translation). The RDF graph trans- tion lating the statement word 1 ... word n with reduction p(a) = if and only if q(a) = 0 r : T ...T s is the second component of (r > 6 1 n → F ◦ (word ... word )). for all a A . 1 ⊗ ⊗ n ∈ k Let v = l=1 ajl be an arbitrary vector of V .A For instance, the translation of the statement John selector q : V W is said to be true at v if q(a ) = 0, → jl 6 owns a white boat is for l = 1, . . .P , k. e e e + e e e john own blanki blanki is-white true Lemma 3. Selectors are closed under composition and + e ⊗ e⊗ e , ⊗ ⊗ blanki ⊗ rdf-type ⊗ boat the tensor product. Every identity is a selector. Let p : V W and q : W X be selectors and because → → v V . Then q p is true at v if and only if p is true at (own (john (boat white a)) ∈ ◦ F ◦ ⊗ ◦ ◦ v and q is true at w = p(v). = mown (mjohn (mboat mwhite mblanki )) . ◦ ⊗ ◦ ◦ Proof. The first assertion is straight forward. To show The translation algorithm from text to RDF graphs the second, assume that a is a basis vector for which starts with a parsing algorithm of pregroup grammars (q p)(a) = 0. Then p(a) = 0 because q(0) = 0. ◦ 6 6 that chooses a type for each word of the statement and Hence p(a) is a basis vector of W selected by p. The provides a reduction to the sentence type. Next it com- property now follows for an arbitrary vector from the putes the decomposition of the formal meaning in the definitions. lexical category by ‘yanking’. Finally it computes the RDF graph by applying the translation functor to the Definition 4. A compact closed structure preserving F functor : is a -model with universe U if decomposition. The parsing algorithm is cubic poly- M LB → C C nomial in the number of words. 
Decomposition is lin- it satisfies (s) = U U ear, because the number of links is proportional to the •M ⊗ (a) = U for any basic type a = s number of words. Finally, translation again is linear, •M 6 (1a) = (ia b) = (ja b) = 1 (a), for all because the sum of the number of property words and •M M M M of the number of node words in the decomposition is basic types a, b (that) = 1 ∈ B proportional to the number of words. •M U (wordar b) and (wordnr s ol ) are projec- •M ⊗ M ⊗ ⊗ 4 -Models and RDF Interpretations tors C (wordnˆ r ol ) = π1 (wordnr s ol ) •M ⊗ ⊗ ◦ M ⊗ ⊗ In this section, we establish the connection between the (wordnr oˆr r) = π2 (wordnr s ol ) σ M ⊗ ⊗ ◦ M ⊗ ⊗ ◦ extensional models of meaning of (Preller, 2012) with maps determiners and proper names in the sin- •M the RDF interpretations of (Hayes, 2004) via the trans- gular to basis vectors. lation from natural language to RDF graphs. We F The last condition guarantees that the interpretations show that a statement is true in an extensional model of a transitive verb in a statement and in a relative if and only if the RDF graph computed by is true F clause are equal if the relative pronoun is the subject of in the RDF interpretation associated to the extensional the verb in the relative clause and that they differ only model. by the symmetry isomorphism if the relative pronoun Choose an object U of , the ‘universe of discourse’ C is the object. of the fragment of natural language. The basis vectors of U stand for individuals or concepts. Properties are Definition 5. A statement word1 ... wordn with re- duction r : T ... T b is true in if represented by maps that test (n-tuples of) entities and 1 ⊗ ⊗ n → M (r (word ... word ) = 0. let (part of) them pass if the test succeeds. M ◦ 1 ⊗ ⊗ n 6 Let 1 i < < i n be a strictly increasing Truth in a model can be reformulated in terms of se- ≤ 1 ··· m ≤ sequence. A linear map q : U ... U U lector truth with the help of Lemma 1. 1 ⊗ ⊗ n → i1 ⊗ 60 Lemma 4. 
Lemma 4. Let p_1 ∘ … ∘ p_{m′} ∘ (word_{i_1} ⊗ … ⊗ word_{i_m}) be the decomposition of the formal meaning of the statement word_1 … word_n. Then M(p_l) is a selector and M(word_{i_k}) a vector in U, for 1 ≤ l ≤ m′ and 1 ≤ k ≤ m. Moreover, the statement is true in M if and only if the selector M(p_l) is true at

M(p_{l+1}) ∘ … ∘ M(p_{m′}) ∘ (M(word_{i_1}) ⊗ … ⊗ M(word_{i_m})), for 1 ≤ l ≤ m′.

Proof. M maps the meaning of the string to M(p_1) ∘ … ∘ M(p_{m′}) ∘ (M(word_{i_1}) ⊗ … ⊗ M(word_{i_m})). Note that for k = 1, …, m′ any factor of M(p_k) is the identity of U unless it is M(word_{j_k}) = q_k. Thus M(p_k) is a tensor product of selectors. Hence both assertions follow from Lemma 3.

Any C-model M defines an RDF interpretation. The vocabulary consists of the basis vectors e ∈ N, labelled as in the previous section. The set of property symbols is given by

P = {e_is-adjective : adjective ∈ D} ∪ {rdf:type} ∪ {e_verb : verb ∈ C}.

Define an RDF interpretation I_M as follows

• set of properties
  IP = {M(word) : e_word ∈ P} ∪ {rdf:type}

• set of resources
  IR = IP ∪ U ∪ {true}

• mapping IS and its extension to blank nodes
  IS(e_word) = M(word), IS(e_blanki) = M(determiner_i)

• mapping IEXT from IP into the power set of IR × IR
  IEXT(rdf-type) = {⟨e, M(noun)⟩ : M(noun) is true at e}
  IEXT(IS(e_is-adjective)) = {⟨e, true⟩ : M(adjective) is true at e}
  IEXT(IS(e_verb)) = {⟨e_1, e_2⟩ : M(verb) is true at e_1 ⊗ e_2}.

Proposition 2. A statement word_1 … word_n is true in a C-model M if and only if every triple of its RDF translation G_M is true in the RDF interpretation I_M.

The proof is given in Appendix B.

5 Conclusion

The modelling of natural language by RDF graphs remains limited by the expressivity of RDF. The latter goes way beyond the few examples presented here. So far, we have only considered the extensional branch of a word. Future work will take advantage of the distinction between a property and its extension in RDF to introduce the conceptual branch of a plural, which does not refer to an extension, e.g. Tom likes books, or capture the difference between Eve and John own a boat and Eve and John are athletes.

Acknowledgments

This work was supported by the École Normale Supérieure and the LIRMM. The first author wishes to thank David Naccache, Alain Lecomte, Antoine Amarilli and Hugo Vanneuville, and both authors thank the members of the TEXTE group at the LIRMM for their interest in the project.

References

Samson Abramsky and Bob Coecke. 2004. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, pages 415–425.

Stephen Clark, Bob Coecke, and Mehrnoosh Sadrzadeh. 2008. A compositional distributional model of meaning. In W. Lawless, P. Bruza, and J. van Rijsbergen, editors, Proceedings of the Conference on Quantum Interactions. University of Oxford, College Publications.

Patrick Hayes and Brian McBride. 2004. RDF semantics. Technical report, Hewlett Packard Labs.

Patrick Hayes. 2004. RDF semantics. Technical report, W3C Recommendation, W3C.

Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. 2013. Reasoning about Meaning in Natural Language with Compact Closed Categories and Frobenius Algebras. Cambridge University Press.

Joachim Lambek. 1958. The mathematics of sentence structure. American Mathematical Monthly, 65:154–170.

Joachim Lambek. 1993. From categorial grammar to bilinear logic. In Substructural Logics, pages 207–237. Oxford University Press.

Joachim Lambek. 1999. Type grammar revisited. In Alain Lecomte, editor, Logical Aspects of Computational Linguistics, volume 1582 of LNAI, pages 1–27, Heidelberg. Springer.

Saunders Mac Lane. 1971. Categories for the Working Mathematician. Springer.

Anne Preller and Joachim Lambek. 2007. Free compact 2-categories. Mathematical Structures in Computer Science, 17(1):1–32.

Anne Preller. 2012. From Sentence to Concept: Predicate Logic and Quantum Logic in Compact Closed Categories, pages 00–29. Oxford University Press.

Peter Selinger. 2011. A survey of graphical languages for monoidal categories. In New Structures for Physics, volume 813 of Lecture Notes in Physics, pages 289–355. Springer.

Mark Steedman. 1996. Surface Structure and Interpretation, volume 30 of Linguistic Inquiry Monographs. MIT Press, Cambridge, Massachusetts.

A Proof of Proposition 1

Note first that composition and the tensor are well defined. Assume (f, p) : A → B and (g, q) : B → C. Then q ∘ f : A → S and p : A → S, so q ∘ f + p : A → S. Therefore (g, q) ∘ (f, p) = (g ∘ f, q ∘ f + p) : A → C is well defined. Similarly, if (f_i, p_i) : A_i → B_i for i = 1, 2 then p_1 ⊗₊ p_2 : A_1 ⊗ A_2 → S and therefore (f_1 ⊗ f_2, p_1 ⊗₊ p_2) : A_1 ⊗ A_2 → B_1 ⊗ B_2, as required.

The operation ⊗₊ is associative on arrows. Indeed, let (a_i)_i, (b_j)_j and (c_k)_k be the bases of A, B and C, respectively. Then for r : C → S,

((p ⊗₊ q) ⊗₊ r)(a_i ⊗ b_j ⊗ c_k)
= (p ⊗₊ q)(a_i ⊗ b_j) + r(c_k)
= p(a_i) + q(b_j) + r(c_k)
= (p ⊗₊ (q ⊗₊ r))(a_i ⊗ b_j ⊗ c_k).

Hence the tensor product on arrows of C_S is associative.

To show the interchange law (1), we need a lemma:

Lemma 5. Let p : C → S, q : D → S and f : A → C, g : B → D with Ker(f) = Ker(g) = {0}. Then

(p ⊗₊ q) ∘ (f ⊗ g) = (p ∘ f) ⊗₊ (q ∘ g).

Indeed, let (a_i)_i, (b_i)_i, (c_i)_i and (d_i)_i be the bases of A, B, C and D respectively. For all i and j, we decompose f(a_i) on the basis (c_i)_i and similarly g(b_j) on (d_i)_i:

f(a_i) = Σ_r c_{i_r} and g(b_j) = Σ_s d_{i_s}.

Each family of indices (i_r)_r and (i_s)_s is non-empty, because Ker(f) = Ker(g) = {0}. Hence,

((p ⊗₊ q) ∘ (f ⊗ g))(a_i ⊗ b_j)
= (p ⊗₊ q)(f(a_i) ⊗ g(b_j))
= (p ⊗₊ q)((Σ_r c_{i_r}) ⊗ (Σ_s d_{i_s}))
= (p ⊗₊ q)(Σ_{r,s} c_{i_r} ⊗ d_{i_s})
= Σ_{r,s} (p(c_{i_r}) + q(d_{i_s}))
= Σ_r p(c_{i_r}) + Σ_s q(d_{i_s})
= p(f(a_i)) + q(g(b_j))
= ((p ∘ f) ⊗₊ (q ∘ g))(a_i ⊗ b_j).

The fifth equality uses the fact that 1 + 1 = 1 and the sum is non-empty. This terminates the proof of the lemma.

Now let (f_1, p_1) : A → C, (f_2, p_2) : C → E, (g_1, q_1) : B → D and (g_2, q_2) : D → F. We show first the following equality

(f_1 ⊗₊ g_1) + (f_2 ⊗₊ g_2) = (f_1 + f_2) ⊗₊ (g_1 + g_2).  (5)

Indeed, let (e_i)_i and (f_j)_j be the bases of A and B respectively. Then, for all i and j,

((f_1 ⊗₊ g_1) + (f_2 ⊗₊ g_2))(e_i ⊗ f_j)
= (f_1 ⊗₊ g_1)(e_i ⊗ f_j) + (f_2 ⊗₊ g_2)(e_i ⊗ f_j)
= f_1(e_i) + g_1(f_j) + f_2(e_i) + g_2(f_j)
= ((f_1 + f_2) ⊗₊ (g_1 + g_2))(e_i ⊗ f_j).

Finally, the Interchange Law follows from (5) and the definitions, thus

((f_2, p_2) ⊗ (g_2, q_2)) ∘ ((f_1, p_1) ⊗ (g_1, q_1))
= (f_2 ⊗ g_2, p_2 ⊗₊ q_2) ∘ (f_1 ⊗ g_1, p_1 ⊗₊ q_1)
= ((f_2 ⊗ g_2) ∘ (f_1 ⊗ g_1), (p_2 ⊗₊ q_2) ∘ (f_1 ⊗ g_1) + (p_1 ⊗₊ q_1))
= ((f_2 ∘ f_1) ⊗ (g_2 ∘ g_1), (p_2 ∘ f_1) ⊗₊ (q_2 ∘ g_1) + (p_1 ⊗₊ q_1))
= ((f_2 ∘ f_1) ⊗ (g_2 ∘ g_1), (p_2 ∘ f_1 + p_1) ⊗₊ (q_2 ∘ g_1 + q_1))
= (f_2 ∘ f_1, p_2 ∘ f_1 + p_1) ⊗ (g_2 ∘ g_1, q_2 ∘ g_1 + q_1)
= ((f_2, p_2) ∘ (f_1, p_1)) ⊗ ((g_2, q_2) ∘ (g_1, q_1)).

Therefore C_S is a monoidal category.

B Proof of Lemma 2 and Proposition 2

Proof. Note that both F and M map inequalities of basic types to identities, hence also any atom without a lexical morphism to an identity. Let word_{j_l} be the property word occurring in p_l, say as the k_l-th factor, q_l = M(word_{j_l}) and m_l = F(word_{j_l}), for l = 1, …, m. Then M(p_l) = 1_U ⊗ … ⊗ q_l ⊗ … ⊗ 1_U and F(p_l) = 1_N ⊗ … ⊗ m_l ⊗ … ⊗ 1_N. Suppose that e_i is a basis vector of U and e_{node_i} a basis vector of N satisfying e_i = I(e_{node_i}). (0 can be replaced by any vector of S without changing the proof.) Use induction on n − m − l and assume that

e_1 ⊗ … ⊗ e_{r_l} = M(p_{l+1}) ∘ … ∘ M(p_{n−m}) ∘ (M(node_{i_1}) ⊗ … ⊗ M(node_{i_m}))

and

(e_{node_1}, 0) ⊗ … ⊗ (e_{node_{r_l}}, 0) = F(p_{l+1}) ∘ … ∘ F(p_{n−m}) ∘ (F(e_{node_{i_1}}) ⊗ … ⊗ F(e_{node_{i_m}})).

Let t_l be the triple created when composing F(p_l) with (e_{node_1}, 0) ⊗ … ⊗ (e_{node_{r_l}}, 0). We want to show that t_l is true in I_M if and only if M(p_l) is true at e_1 ⊗ … ⊗ e_{r_l}.

Consider the case where word_{j_l} : d → d. The other cases are shown similarly. Recall that m_l ∘ (e_{node_{k_l}}, 0) = (e_{node_{k_l}}, t_l), with t_l = e_{node_{k_l}} ⊗ e_{is-word_{j_l}} ⊗ e_true. Hence,

F(p_l)((e_{node_1}, 0) ⊗ … ⊗ (e_{node_{r_l}}, 0)) = (e_{node_1}, 0) ⊗ … ⊗ (e_{node_{k_l}}, t_l) ⊗ … ⊗ (e_{node_{r_l}}, 0) = (e_{node_1} ⊗ … ⊗ e_{node_{r_l}}, t_l).

Then t_l is true in I_M if and only if (I(e_{node_{k_l}}), true) ∈ IEXT(I(e_{is-word_l})), if and only if q_l is true at e_{k_l}, by definition of I. On the other hand, 1_U ⊗ … ⊗ q_l ⊗ … ⊗ 1_U is true at e_1 ⊗ … ⊗ e_{k_l} ⊗ … ⊗ e_r if and only if q_l is true at e_{k_l}. If that is the case then M(p_l)(e_1 ⊗ … ⊗ e_{k_l} ⊗ … ⊗ e_r) = e_1 ⊗ … ⊗ e_{k_l} ⊗ … ⊗ e_r.

Incremental semantic scales by strings

Tim Fernando
Computer Science Department
Trinity College Dublin
Dublin, Ireland
[email protected]

Abstract

Scales for natural language semantics are analyzed as moving targets, perpetually under construction and subject to adjustment. Projections, factorizations and constraints are described on strings of bounded but refinable granularities, shaping types by the processes that put semantics in flux.

1 Introduction

An important impetus for recent investigations into type theory for natural language semantics is the view of "semantics in flux," correcting "the impression" from, for example, Montague 1973 "of natural languages as being regimented with meanings determined once and for all" (Cooper 2012, page 271). The present work concerns scales for temporal expressions and gradable predicates. Two questions that loom large from the perspective of semantics in flux are how to construct scales and how to align them against one another (e.g. Klein and Rovatsos 2011). The formal study carried out below keeps scales as simple as possible, whilst allowing for necessary refinements and adjustments. The basic picture is that a scale is a moving target finitely approximable as a string over an alphabet which we can expand to refine granularity. Reducing a scale to a string comes, however, at a price; indivisible points must give way to refinable intervals (embodying underspecification).

Arguments for a semantic reorientation around intervals (away from points) are hardly new. Best known within linguistic semantics perhaps are those in tense and aspect from Bennett and Partee 1972, which seem to have met less resistance than arguments in the degree literature from Kennedy 2001 and Schwarzschild and Wilkinson 2002 (see Solt 2013). At the center of the present argument for intervals is a notion of finite approximability, plausibly related to cognition. What objection might there be to it? The fact that no finite linear order is dense raises the issue of compatibility between finite approximability and density — no small worry, given the popularity of dense linear orders for time (e.g. Kamp and Reyle 1993, Pratt-Hartmann 2005, Klein 2009) as well as measurement (e.g. Fox and Hackl 2006).

Fortunately, finite linear orders can be organized into a system of approximations converging at the limit to a dense linear order. The present work details ways to form such systems and limits, with density reanalyzed as refinability of arbitrary finite approximations. A familiar example provides some orientation.

Example A (calendar) We can represent a calendar year as the string

s_mo := [Jan][Feb][Mar]⋯[Dec]

of length 12, or, were we interested also in days d1, d2, …, d31, the string

s_mo,dy := [Jan,d1][Jan,d2]⋯[Jan,d31][Feb,d1]⋯[Dec,d31]

of length 365 for a non-leap year (Fernando 2011).¹ In contrast to the points in the real line ℝ, a box can split, as Jan in s_mo does (30 times) to

[Jan,d1][Jan,d2]⋯[Jan,d31]

in s_mo,dy, on introducing days d1, d2, …, d31 into the picture.

¹We draw boxes (instead of the usual curly braces { and }) around sets-as-symbols, stringing together "snapshots" much like a cartoon/film strip.

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 63–71, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

Reversing direction and generalizing from

mo := {Jan, Feb, …, Dec}

to any set A, we define the function ρ_A on strings (of sets) to componentwise intersect with A

ρ_A(α_1 ⋯ α_n) := (α_1 ∩ A) ⋯ (α_n ∩ A)

(throwing out non-A's from each box) so that

ρ_mo(s_mo,dy) = [Jan]³¹[Feb]²⁸⋯[Dec]³¹.

Next, the block compression bc(s) of a string s compresses all repeating blocks αⁿ (for n ≥ 1) of a box α in a string s to α

bc(s) := α bc(s′) if s = ααs′; α bc(βs′) if s = αβs′ with α ≠ β; s otherwise

so that if bc(s) = α_1 ⋯ α_n then α_i ≠ α_{i+1} for i from 1 to n − 1. In particular,

bc([Jan]³¹[Feb]²⁸⋯[Dec]³¹) = s_mo.

Writing bc_A for the function mapping s to bc(ρ_A(s)), we have

bc_mo(s_mo,dy) = s_mo.

Forming the componentwise unions of strings α_1 ⋯ α_n and β_1 ⋯ β_n of the same number n of sets gives their superposition

α_1 ⋯ α_n & β_1 ⋯ β_n := (α_1 ∪ β_1) ⋯ (α_n ∪ β_n)

and we superpose languages L and L′ over Fin(Φ) by superposing strings in L and L′ of the same length

L & L′ := {s & s′ | s ∈ L, s′ ∈ L′ and length(s) = length(s′)}

(Fernando 2004). For example,

s_mo,dy = ρ_mo(s_mo,dy) & ρ_dy(s_mo,dy)

where dy := {d1, d2, …, d31}. More generally, writing L_A for the image of L under ρ_A

L_A := {ρ_A(s) | s ∈ L},

observe that for L ⊆ (2^B)* and A ⊆ B, L is included in the superposition of L_A and L_{B−A}

L ⊆ L_A & L_{B−A}.
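The projection ρ_A and block compression bc are directly executable. Here is a small sketch in Python (our own encoding, not code from the paper: boxes as frozensets, strings as lists), checked on a three-month fragment of Example A:

```python
def rho(A, s):
    """rho_A: componentwise intersection with A (throw non-A fluents out of each box)."""
    return [box & A for box in s]

def bc(s):
    """Block compression: collapse each maximal run of identical boxes to one box."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return out

def bc_A(A, s):
    """bc_A maps s to bc(rho_A(s))."""
    return bc(rho(A, s))

# A fragment of Example A: months with days, then compressed back to months.
lengths = [("Jan", 31), ("Feb", 28), ("Mar", 31)]
s_mo_dy = [frozenset({m, f"d{d}"}) for m, n in lengths for d in range(1, n + 1)]
mo = frozenset(m for m, _ in lengths)
print(bc_A(mo, s_mo_dy))  # three boxes: Jan, Feb, Mar
```

Note that bc_A recovers the coarse string s_mo from the fine-grained s_mo,dy, exactly as in Example A.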
In general, we can refine a string s_A of granularity A to one s_{A′} of granularity A′ ⊇ A with

bc_A(s_{A′}) = s_A.

Iterating over a chain

A ⊆ A′ ⊆ A″ ⊆ ⋯,

we can glue together strings s_A, s_{A′}, s_{A″}, … such that

bc_X(s_{X′}) = s_X for X ∈ {A, A′, A″, …}.

Details in section 2.

The next step is to identify a language L_0 such that

L = (L_A & L_{B−A}) ∩ L_0  (1)

other than L_0 = L. For a decomposition (1) of L into (generic) contextual constraints L_0 separate from the (specific) components L_A and L_{B−A}, it will be useful to sharpen L_A, L_{B−A} and &, factoring in bc and variants of bc (not to mention ∩).

We shall refer to the expressions we can put in a box as fluents (short for temporal propositions), and assume they are the elements of a set Φ. While the set Φ of fluents might be infinite, we restrict the boxes that we string together to finite sets of fluents. Writing Fin(Φ) for the set of finite subsets of Φ and 2^X for the powerset of X (i.e. the set of X's subsets), we organize the strings over the infinite alphabet Fin(Φ) around finite alphabets 2^A, for A ∈ Fin(Φ)

Fin(Φ)* = ⋃_{A ∈ Fin(Φ)} (2^A)*.

Measurements ranging from crude comparisons (of order) to quantitative judgments (multiplying unit magnitudes with real numbers) can be expressed through fluents in Φ. We interpret the fluents relative to suitable strings in Fin(Φ)*, presented below in category-theoretic terms connected to type theory (e.g. Mac Lane and Moerdijk 1992). Central to this presentation is the notion of a presheaf on Fin(Φ) — a functor from the opposite category Fin(Φ)^op (a morphism in which is a pair (B, A) of finite subsets of Φ such that A ⊆ B) to the category Set of sets and functions. The Fin(Φ)-indexed family of functions bc_A (for A ∈ Fin(Φ)) provides an important example that we generalize in section 2.
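The superposition & of languages and the inclusion L ⊆ L_A & L_{B−A} behind decomposition (1) can likewise be checked mechanically. A small sketch under our own encoding (boxes as frozensets, strings as tuples):

```python
def sup(s, t):
    """Superpose two strings of the same length: componentwise union of boxes."""
    return tuple(a | b for a, b in zip(s, t))

def sup_lang(L, M):
    """L & M: superpose all equal-length pairs of strings."""
    return {sup(s, t) for s in L for t in M if len(s) == len(t)}

def rho_lang(A, L):
    """L_A: image of L under the projection rho_A."""
    return {tuple(box & A for box in s) for s in L}

# A toy language over B = {a, b}: L is included in L_A & L_{B-A}.
B = frozenset({"a", "b"})
A = frozenset({"a"})
L = {(frozenset({"a"}), frozenset({"a", "b"})),
     (frozenset(), frozenset({"b"}))}
assert L <= sup_lang(rho_lang(A, L), rho_lang(B - A, L))
```

The inclusion can be proper in general, which is precisely why a constraint language L_0 as in (1) earns its keep.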
In addition to projecting Fin(Φ)* down to (2^A)* for some A ∈ Fin(Φ), we can build up, forming superpositions of strings of the same length. An example of linguistic semantic interest to which block compression bc applies is

Example B (continuous change) The pair (a), (b) below superposes two events, soup cooling and an hour passing, in different ways (Dowty 1979).

(a) The soup cooled in an hour.

(b) The soup cooled for an hour.

A common intuition is that in an hour requires an event that culminates, while for an hour requires a homogeneous event. In the case of (a), the culmination may be that some threshold temperature (supplied by context) was reached, while in (b), the homogeneity may be the steady drop in temperature over that hour. We might track soup cooling by a descending sequence of degrees, d_1 > d_2 > ⋯ > d_n, with d_1 at the beginning of the hour, and d_n at the end. But no sample of finite size n can be complete. To overcome this limitation, it is helpful to construe the i-th box in a string as a description of an interval I_i over the real line ℝ. We call a sequence I_1 ⋯ I_n of intervals a segmentation if ⋃_{i=1}^{n} I_i is an interval and for 1 ≤ i < n, I_i < I_{i+1}, where < is full precedence

I < I′ iff (∀r ∈ I)(∀r′ ∈ I′) r < r′.

Now, assuming an assignment of degrees sDg(r) to real numbers r representing temporal instants, the idea is to define satisfaction ⊨ between intervals I and fluents sDg < d according to

I ⊨ sDg < d iff (∀r ∈ I) sDg(r) < d

and similarly for d ≤ sDg. We then lift ⊨ to segmentations I_1 ⋯ I_n and strings α_1 ⋯ α_n ∈ Fin(Φ)* of the same length n such that

I_1 ⋯ I_n ⊨ α_1 ⋯ α_n iff whenever 1 ≤ i ≤ n and ϕ ∈ α_i, I_i ⊨ ϕ

and analyze (a) above as (c) below, where d is the contextually given threshold required by in an hour, and x is the start of that hour, the end of which is marked by hour(x).

(c) [x, d ≤ sDg][d ≤ sDg][hour(x), sDg < d]

All fluents ϕ in (c) have the stative property

(†) for all intervals I and I′ whose union I ∪ I′ is an interval, I ∪ I′ ⊨ ϕ iff I ⊨ ϕ and I′ ⊨ ϕ

(Dowty 1979). (†) holds also for the fluents in the string (d) below for (b), where the subinterval relation ⊑ is inclusion restricted to intervals

I ⊨ [⊑]ϕ iff (∀I′ ⊑ I) I′ ⊨ ϕ

and ↓sDg is the fluent ∃x(sDg < x ∧ Prev(x ≤ sDg)) saying the degree drops (with I ⊨ Prev(ϕ) iff I′ ⊨ ϕ for some I′ < I such that I ∪ I′ is an interval).

(d) [x, [⊑]↓sDg][hour(x), [⊑]↓sDg]

(†) is intimately related to block compression bc (Fernando 2013b), supporting derivations of (c) and (d) by a modification &_bc of & defined in §2.3 below.

Our third example directly concerns computational processes, which we take up in section 3.

Example C (finite automata) Given a finite alphabet A, a (non-deterministic) finite automaton 𝒜 over A is a quadruple (Q, δ, F, q_0) consisting of a finite set Q of states, a transition relation δ ⊆ Q × A × Q, a subset F of Q consisting of final (accepting) states, and an initial state q_0 ∈ Q. 𝒜 accepts a string a_1 ⋯ a_n ∈ A* precisely if there is a string q_1 ⋯ q_n ∈ Qⁿ such that

q_n ∈ F and δ(q_{i−1}, a_i, q_i) for 1 ≤ i ≤ n  (2)

(where q_0 is 𝒜's designated initial state). The accepting runs of 𝒜 are strings of the form

[a_1, q_1] ⋯ [a_n, q_n] ∈ (2^{A∪Q})*

satisfying (2). While we can formulate such runs as strings over the alphabet A × Q, we opt for the alphabet 2^{A∪Q} (formed from A ∪ Q ∈ Fin(Φ)) to link up smoothly with examples where more than one automaton may be running, not all necessarily known nor in perfect harmony with others. Such examples are arguably of linguistic interest, the so-called Imperfective Paradox (Dowty 1979) being a case in point (Fernando 2008).

That said, the attention below is largely on certain category-theoretic preliminaries for type theory.²

²Only the most rudimentary category-theoretic notions are employed; explanations can be found in any number of introductions to category theory available online (and in print).

We adopt the following notational conventions. Given a function f and a set X, we write

- f↾X for f restricted to X ∩ domain(f)
- image(f) for {f(x) | x ∈ domain(f)}
- fX for image(f↾X)
- f⁻¹X for {x ∈ domain(f) | f(x) ∈ X}

and if g is a function for which image(f) ⊆ domain(g),

- f; g for f composed (left to right) with g

(f; g)(x) := g(f(x))

for all x ∈ domain(f). We say f is a function on X if

domain(f) = X ⊇ image(f)

— i.e., f : X → X. The kernel of f, ker(f), is the equivalence relation on domain(f) that holds between s, s′ such that f(s) = f(s′). Clearly,

ker(f) ⊆ ker(f; g)

when f; g is defined.

2 Some presheaves on Fin(Φ)

Given a function f on Fin(Φ)* and A ∈ Fin(Φ), let us write f_A for the function ρ_A; f on Fin(Φ)*

f_A(s) := f(ρ_A(s))

(recalling ρ_A(α_1 ⋯ α_n) := (α_1 ∩ A) ⋯ (α_n ∩ A), and generalizing bc_A from Example A). To extract a presheaf on Fin(Φ) from the Fin(Φ)-indexed family of functions f_A, certain requirements on f are helpful. Toward that end, let us agree that

- f preserves a function g with domain Fin(Φ)* if g = f; g
- f is idempotent if f preserves itself (i.e., f = f; f)
- the vocabulary voc(s) of s ∈ Fin(Φ)* is the set of fluents that occur in s

voc(α_1 ⋯ α_n) := ⋃_{i=1}^{n} α_i

whence s ∈ voc(s)*.

Note that for idempotent f, image(f) consists of canonical representatives f(s) of ker(f)'s equivalence classes {s′ ∈ Fin(Φ)* | f(s′) = f(s)}.

2.1 Φ-preserving functions

A function f on Fin(Φ)* is Φ-preserving if f preserves voc and f_A, for all A ∈ Fin(Φ). Note that bc is Φ-preserving, as is the identity function id on Fin(Φ)*.

Proposition 1. If f is Φ-preserving then f is idempotent and

f_B; f_A = f_{A∩B}

for all A, B ∈ Fin(Φ).

Let P_f be the function with domain Fin(Φ) ∪ {(B, A) ∈ Fin(Φ) × Fin(Φ) | A ⊆ B} mapping A ∈ Fin(Φ) to f((2^A)*)

P_f(A) := {f(s) | s ∈ (2^A)*}

and a Fin(Φ)^op-morphism (B, A) to the restriction of f_A to P_f(B)

P_f(B, A) := f_A ↾ P_f(B).

Corollary 2. If f is Φ-preserving then P_f is a presheaf on Fin(Φ).

Apart from bc, we get a Φ-preserving function by stripping off any initial or final empty boxes

unpad(s) := unpad(s′) if s = [∅]s′ or s = s′[∅]; s otherwise

so that unpad(s) neither begins nor ends with [∅]. Notice that bc; unpad = unpad; bc.

Proposition 3. If f and g are Φ-preserving and f; g = g; f, then f; g is Φ-preserving.

2.2 The Grothendieck construction

Given a presheaf F on Fin(Φ), the category ∫F of elements of F (also known as the Grothendieck construction for F) has

- objects (A, s) ∈ Fin(Φ) × F(A) (making Σ_{X ∈ Fin(Φ)} F(X) the set of objects in ∫F)
- morphisms (B, s′, A, s) from objects (B, s′) to (A, s) when A ⊆ B and F(B, A)(s′) = s

(e.g. Mac Lane and Moerdijk 1992).
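The Φ-preserving functions bc and unpad, and the commutation bc; unpad = unpad; bc noted above, can be checked directly. A small sketch in Python (our own encoding: boxes as frozensets, strings as lists):

```python
def bc(s):
    """Block compression: collapse each maximal run of identical boxes to one box."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return out

def unpad(s):
    """Strip any initial or final empty boxes from s."""
    i, j = 0, len(s)
    while i < j and not s[i]:
        i += 1
    while j > i and not s[j - 1]:
        j -= 1
    return s[i:j]

# bc and unpad commute (bc; unpad = unpad; bc), e.g.:
e, a, b = frozenset(), frozenset({"a"}), frozenset({"b"})
s = [e, a, a, e, e, b, e]
print(bc(unpad(s)) == unpad(bc(s)))  # True
```

Note that internal empty boxes survive both maps: only leading and trailing padding is stripped, and only adjacent duplicates are compressed.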
Let π_f be the left projection

π_f(A, s) = A

from ∫P_f back to Fin(Φ). The inverse limit of P_f, lim← P_f, is the set of (∫P_f)-valued presheaves p on Fin(Φ) (i.e. functors p : Fin(Φ)^op → ∫P_f) that are inverted by π_f

π_f(p(A)) = A for all A ∈ Fin(Φ).

That is, p(A) = (A, s_A) for some s_A ∈ f((2^A)*) such that

(‡) s_A = f_A(s_B) whenever A ⊆ B ∈ Fin(Φ).

(‡) is the essential restriction that lim← P_f adds to objects {s_X}_{X ∈ Fin(Φ)} of the dependent type Π_{X ∈ Fin(Φ)} P_f(X).

2.3 Superposition and non-determinism

Taking the presheaf P_id induced by the identity function id on Fin(Φ)*, observe that in ∫P_id, there is a product of

(∅, [∅]) and ({ϕ}, [ϕ])

but not of

({ϕ}, [∅]) and ({ϕ}, [ϕ]).

The tag A in (A, s) differentiating (∅, [∅]) from ({ϕ}, [∅]) cannot be ignored when forming products in ∫P_id. A necessary and sufficient condition for (A, s) and (B, s′) to have a product is

ρ_B(s) = ρ_A(s′)

presupposed by the pullback of

(A, s) → (A ∩ B, ρ_B(s)) ← (B, s′).

By comparison, the superposition s & s′ exists (as a string) if and only if

ρ_∅(s) = ρ_∅(s′)

(or length(s) = length(s′), as ρ_∅(s) = [∅]^{length(s)}) for

(voc(s), s) → (∅, ρ_∅(s)) ← (voc(s′), s′).

Products in ∫P_id are superpositions, but superpositions need not be products.

Next, we step from id to other Φ-preserving functions f such as bc and bc; unpad. A pair (A, s) and (B, s′) of ∫P_f-objects may fail to have a product not because there is no ∫P_f-object (A ∪ B, s″) such that

(A, s) ← (A ∪ B, s″) → (B, s′)

but too many non-isomorphic choices for such s″. Consider the case of bc; unpad, with (∅, ε) terminal in ∫P_{bc;unpad} (where ε is the null string of length 0). For distinct fluents a and b ∈ Φ, there are 13 strings s ∈ P_{bc;unpad}({a, b}) such that

({a}, [a]) ← ({a, b}, s) → ({b}, [b])

corresponding to the 13 interval relations in Allen 1983 (Fernando 2007). The explosion of solutions s″ ∈ P_f(A ∪ B) to the equations

f_A(s″) = s and f_B(s″) = s′

given

(A, s) → (A ∩ B, f_B(s)) ← (B, s′)

(i.e., f_B(s) = f_A(s′)) is paralleled by the transformation, under f, of a language L to

L_f := f⁻¹fL

used to turn the superposition L & L′ of languages L and L′ into

L &_f L′ := f(L_f & L′_f).

For f := bc; unpad, the set [a] &_f [b] consists of the 13 strings mentioned above. (We follow the usual practice of conflating a string s with the singleton language {s} whenever convenient.)

Stepping from strings to languages, we lift the presheaf P_f to the presheaf Q_f mapping A ∈ Fin(Φ) to

Q_f(A) := {fL | L ⊆ (2^A)*}

and a Fin(Φ)^op-morphism (B, A) to the function

Q_f(B, A) := (λL ∈ Q_f(B)) f_A L

sending L ∈ Q_f(B) to f_A L ∈ Q_f(A). Then, for non-identity morphisms between ∫Q_f-objects (A, L) and (A, L′) where L ⊆ L′, we add inclusions from (A, L) to (A, L′) to the ∫Q_f-morphisms for the category C(Φ, f) with

- objects the same as those in ∫Q_f, and
- morphisms (B, L′, A, L) from objects (B, L′) to (A, L) whenever A ⊆ B and f_A L′ ⊆ L.

As is the case with ∫Q_f-morphisms, the sources (domains) of C(Φ, f)-morphisms entail their targets (codomains). To make these entailments precise, we can identify the space of possible worlds with the inverse limit of P_f, and reduce (A, L) to

[[A, L]]_f := {p ∈ lim← P_f | (∃s ∈ L) p(A) = (A, s)}.

The inclusion

[[B, L′]]_f ⊆ [[A, L]]_f

can then be pronounced: (B, L′) f-entails (A, L).

Proposition 4. Let f be a Φ-preserving function and (A, L) and (B, L′) be ∫Q_f-objects such that A ⊆ B. Then (B, L′) f-entails (A, L) iff there is a C(Φ, f)-morphism from (B, L′) to (A, L).

Relaxing the assumption A ⊆ B, one can also check that for f ∈ {bc, unpad, (bc; unpad)}, pullbacks of

(A, L) → (A ∩ B, f_∅L ∩ f_∅L′) ← (B, L′)

in C(Φ, f) are given by

(A, L) ← (A ∪ B, L &_f L′) → (B, L′)  (3)

although (3) need not hold for L &_f L′ to be well-defined.

3 Constraints and finite automata

We now bring finite automata into the picture, recalling from section 1 Example C's superpositions

a_1 ⋯ a_n & q_1 ⋯ q_n  (4)

where a_1 ⋯ a_n is accepted by a finite automaton going through the sequence q_1 ⋯ q_n of (internal) states. We can assume the tape alphabet A ⊇ {a_1, …, a_n} and the state set Q ⊇ {q_1, …, q_n} are two disjoint subsets of the set Φ of fluents; fluents in A are "observable" (on a tape), while fluents in Q are "hidden" (inside a black box). Disjoint though they may be, A and Q are tightly coupled by 𝒜's transition table δ ⊆ Q × A × Q (not to mention the other components of 𝒜, its initial and final states). That coupling can hardly be recreated by superposition & (or some simple modification &_f) without the help of some machinery encoding δ. But first, there is the small matter of formulating the map a_1 ⋯ a_n ↦ [a_1] ⋯ [a_n] implicit in (4) above as a natural transformation.

3.1 Bottom ⊥ naturally

If the function η_A such that for a_1 ⋯ a_n ∈ A*,

η_A(a_1 ⋯ a_n) = [a_1] ⋯ [a_n]

is to be the A-th component of a natural transformation η : S ⇒ P_id, we need to specify the presheaf S on Fin(Φ). To form a function S(B, A) : S(B) → S(A) for A ⊆ B ∈ Fin(Φ) with B* ⊆ S(B) and A* ⊆ S(A), it is handy to introduce a bottom ⊥ for B − A, adjoining ⊥ to a finite subset X of Φ for X_⊥ := X + {⊥} before forming the strings in S(X) := X_⊥*. We then set S(B, A) : B_⊥* → A_⊥*

S(B, A)(ε) := ε
S(B, A)(βs) := β S(B, A)(s) if β ∈ A, and ⊥ S(B, A)(s) otherwise

(e.g. S({a, b}, {a})(ba⊥) = ⊥a⊥) and let η_A^⊥ : A_⊥* → (2^A)* map

η_A^⊥(ε) := ε
η_A^⊥(αs) := [∅] η_A^⊥(s) if α = ⊥, and [α] η_A^⊥(s) otherwise

(e.g. η_{{a}}^⊥(⊥a⊥) = [∅][a][∅]).

Proposition 5. η^⊥ is a natural transformation from S to P_id.

3.2 Another presheaf and category

Turning now to finite automata, we recall a fundamental result about languages that are regular (i.e., accepted by finite automata),³ the Büchi-Elgot-Trakhtenbrot theorem (e.g. Thomas 1997)

for every finite alphabet A ≠ ∅, a language L ⊆ A⁺ is regular iff there is a sentence ϕ of MSO_A such that L = {s ∈ A⁺ | s ⊨_A ϕ}.

³Whether or not this sense of regular has an interesting connection with regular categories (which are, among other things, finitely complete), I do not know.

MSO_A is Monadic Second Order logic with a unary relation symbol U_a for each a ∈ A, plus a binary relation symbol S for successors. The predicate ⊨_A treats a string a_1 a_2 ⋯ a_n over A as an MSO_A-model with universe {1, 2, …, n}, U_a as its subset {i | a_i = a}, and S as

{(1, 2), (2, 3), …, (n − 1, n)}

so that, for instance,

a_1 ⋯ a_n ⊨_A ∃x ∃y S(x, y) iff n ≥ 2  (5)

for all finite A ≠ ∅. Notice that no a ∈ A is required to interpret ∃x ∃y S(x, y), which after all is an MSO_∅-sentence suited to strings ⊥ⁿ ∈ S(∅). Furthermore, for a ≠ b and {a, b} ⊆ A,

no string in A⁺ satisfies ∃x U_a(x) ∧ U_b(x)  (6)

which makes it awkward to extend ⊨_A to formulas with free variables (requiring variable assignments on top of strings in A⁺).

A simple way to accommodate variables is to include them in A and interpret MSO_A-formulas not over A⁺ but over (2^A)⁺, lifting ⊨_A from strings s over A to a predicate ⊨^A on strings over 2^A such that

s ⊨_A ϕ iff η_A(s) ⊨^A ϕ  (7)

for every MSO_A-sentence ϕ (Fernando 2013a). For all s ∈ (2^A)⁺, we set

s ⊨^A S(x, y) iff ρ_{{x,y}}(s) ∈ [∅]*[x][y][∅]*  (8)

for A ⊇ {x, y}, and

s ⊨^A U_a(x) iff ρ_{{a,x}}(s) ∈ E_a [a, x] E_a  (9)

for A ⊇ {a, x}, where E_a := ([∅] + [a])*. We must be careful to incorporate into the clauses defining s ⊨^A ϕ the presupposition that each first-order variable x free in ϕ occurs uniquely in s — i.e. s ⊨^A x = x, where

s ⊨^A x = y iff ρ_{{x,y}}(s) ∈ [∅]*[x, y][∅]*  (10)

for x, y ∈ A. In particular, we restrict negation ¬ϕ to strings ⊨^A-satisfying x = x, for each first-order variable x free in ϕ. We can then put

s ⊨^A ∃x ϕ iff (∃s′) ρ_A(s′) = ρ_A(s) and s′ ⊨^{A∪{x}} ϕ

and similarly for second-order existential quantification. The equivalence (5) above then becomes

s ⊨^A ∃x ∃y S(x, y) iff ρ_∅(s) ∈ [∅][∅]⁺  (11)

and in place of (6), we have

s ⊨^A ∃x U_a(x) ∧ U_b(x) iff ρ_{{a,b}}(s) ∈ (2^{{a,b}})* [a, b] (2^{{a,b}})*  (12)

Working back from (7) to the Büchi-Elgot-Trakhtenbrot theorem, one can check that for every finite A and MSO_A-formula ϕ, the set

L_A(ϕ) := {s ∈ (2^A)⁺ | s ⊨^A ϕ}

of strings over 2^A that ⊨^A-satisfy ϕ is regular, using the fact that for all A′ ⊆ A, the restriction of ρ_{A′} to (2^A)* is computable by a finite state transducer.⁴ But for A ⊆ Φ, ρ_{A′} ↾ (2^A)* is just P_id(A, A′). In recognition of the role of these functions in ⊨^A, we effectivize the presheaf Q_id from §2.3 as follows. Let R_Φ be the presheaf on Fin(Φ) mapping

⁴Note an MSO_A-formula ϕ is not strictly a fluent in Φ but is formed in part from fluents.

- A ∈ Fin(Φ) to the set of languages over the alphabet 2^A that are regular

R_Φ(A) := {L ∈ Q_id(A) | L is regular}

- a Fin(Φ)^op-morphism (B, A) to the restriction of Q_id(B, A) to R_Φ(B)

R_Φ(B, A) := (λL ∈ R_Φ(B)) ρ_A L.

∫R_Φ-objects are then pairs (A, L) where A ∈ Fin(Φ) and L is a regular language over the alphabet 2^A, while ∫R_Φ-morphisms are quadruples (B, L, A, ρ_A L) from (B, L) to (A, ρ_A L) for A ⊆ B ∈ Fin(Φ). To account for the Boolean operations in MSO (as opposed to the predications (8)–(10) involving ρ_A), we add inclusions for a category R(Φ) with

- the same objects as ∫R_Φ
- morphisms all of those in C(Φ, id) between objects in ∫R_Φ — i.e., quadruples (B, L′, A, L) such that A ⊆ B ∈ Fin(Φ), L′ ⊆ (2^B)* is regular, L ⊆ (2^A)* is regular, and ρ_A L′ ⊆ L.
It is a par- s = x = x iff (A, s ) ; ( x , ∗ x ∗) | { } { } ticular instance of the satisfaction condition in in- and similarly for x = x replaced by the differ- stitutions, expressing the invariance of truth under ent MSOA-formulas specified in clauses (8)–(12) change of notation (Goguen and Burstall 1992). above. The MSOA-sentence Proposition 6 breaks down if we replace ρA by bcA or unpadA, as can be seen with A = , and spec(A) := x (U (x) U (x)) ∅ ∀ a ∧ ¬ b ϕ = x y S(x, y), for which recall (11). a A b A a ∃ ∃ _∈ ∈ ^−{ } 3.3 Varying grain and span associating a unique a A with each string po- ∈ sition (presupposed in = but not in =A) fits the Troublesome as they are, the maps bcA and | A | same pattern unpadA have some use. Just as we can vary tem- poral grain through bc (Examples A and B in sec- s =A spec(A) iff ρ (s) a a A + tion 1), we can vary temporal span through unpad. | A ∈ { | ∈ } ; For instance, we can combine runs of automata 1 iff (A voc(s), s ) A ∪ { } + over A and over A in (A, a a A ) 1 A2 2 { | +∈ } iff ρ (s) η A . L( 1, 2) := AcRun( 1)&unpad AcRun( 2) A ∈ A A A A A Let us define a string s Fin(Φ)+ to be with the subscript unpad on & relaxing the re- ∈ quirement that and start and finish together A A1 A2 - A-specified if s = spec(A) (running in lockstep throughout). For i 1, 2 , | ∈ { } + + Q - A-underspecified if ρA(s) ηA(A A ) and i the state set for i, ∈ ⊥ − A AcRun( ) = unpad L( , ) - A-overspecified if ρ (s) image(η ) i Ai Qi 1 2 A 6∈ A A ∪ A A A ,A ,Q Q so that for a = a0 and A = a, a0 , a a is A- assuming the sets 1 2 1 and 2 are pair- 6 { } wise disjoint. The disjointness assumption rules specified, a is A-underspecified, and a, a0 a out any communication (or interference) between is A-overspecified. Given a finite automaton A and . 
As subsets of one large set Φ of fluents, however, it is perfectly natural for these sets to intersect (and communicate through a common vocabulary), and we might express very partial constraints involving them through, for example, MSO-formulas. Recalling the definition L_A(ϕ) := {s ∈ (2^A)⁺ | s |=_A ϕ}, we can rewrite the satisfaction condition

  s |=_B ϕ iff f_A(s) |=_A ϕ

on MSO_A-formulas ϕ, A ⊆ B ∈ Fin(Φ) and s ∈ (2^B)⁺ as

  L_B(ϕ) = {s ∈ (2^B)⁺ | f_A(s) ∈ L_A(ϕ)}.

This equation lifts any regular language L_A(ϕ) to a regular language L_B(ϕ), provided f_A is computed by a finite-state transducer (as in the case of bc or unpad). Inverse images under such relations are a useful addition to the stock of operations constituting MSO-formulas as well as regular expressions.

Given a finite automaton 𝒜 over A with set Q of states, its set AcRun(𝒜) of accepting runs (Example C) is both A-specified and Q-specified, provided A ∩ Q = ∅ (and otherwise risks being A-overspecified). The language accepted by 𝒜 is the η_A⁻¹-image of the language ρ_A AcRun(𝒜) that is Q-underspecified, in accordance with the intuition that the states are hidden. From the regularity of AcRun(𝒜), however, it is clear that we can make these states visible, with AcRun(𝒜) as the language accepted by a finite automaton 𝒜′ (over 2^{A∪Q}) that may (or may not) have the same set Q of states.

The maps ρ_A and inclusions ⊆ underlying the morphisms of R(Φ) represent the two ways information may grow from R_Φ-objects (A, L) to (B, L′) — expansively with A ⊆ B and L = ρ_A L′, and eliminatively with L′ ⊆ L and A = B. The same notion of f-entailment defined in §2.3 through the sets [[A, L]]_f applies, but we have been careful here to fix f to id, in view of

References

Richard Montague. 1973. The proper treatment of quantification in ordinary English. In Approaches to Natural Language, pages 221–242. D. Reidel, Dordrecht.

James F. Allen. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843.
Michael Bennett and Barbara Partee. 1972. Toward the logic of tense and aspect in English. Technical report, System Development Corporation, Santa Monica, California. Reprinted in Partee 2008.

Ian Pratt-Hartmann. 2005. Temporal prepositions and their logic. Artificial Intelligence, 166:1–36.

Roger Schwarzschild and Karina Wilkinson. 2002. Quantifiers in comparatives: A semantics of degree based on intervals. Natural Language Semantics, 10(1):1–41.

Robin Cooper. 2012. Type theory and semantics in flux. In Handbook of the Philosophy of Science. Volume 14: Philosophy of Linguistics, pages 271–323.

Stephanie Solt. 2013. Scales in natural language. Manuscript.

David R. Dowty. 1979. Word Meaning and Montague Grammar. Reidel, Dordrecht.

Wolfgang Thomas. 1997. Languages, automata and logic. In Handbook of Formal Languages: Beyond Words, volume 3, pages 389–455. Springer-Verlag.

Tim Fernando. 2004. A finite-state approach to events in natural language semantics. Journal of Logic and Computation, 14(1):79–92.

Tim Fernando. 2007. Observing events and situations in time. Linguistics and Philosophy, 30(5):527–550.

Tim Fernando. 2008. Branching from inertia worlds. Journal of Semantics, 25(3):321–344.

Tim Fernando. 2011. Regular relations for temporal propositions. Natural Language Engineering, 17(2):163–184.

Tim Fernando. 2013a. Finite state methods and description logics. In Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing, pages 63–71.

Tim Fernando. 2013b. Dowty's aspect hypothesis segmented. In Proceedings of the 19th Amsterdam Colloquium, pages 107–114.

Danny Fox and Martin Hackl. 2006. The universal density of measurement. Linguistics and Philosophy, 29(5):537–586.

Joseph Goguen and Rod Burstall. 1992. Institutions: Abstract model theory for specification and programming. Journal of the ACM, 39(1):95–146.

Hans Kamp and Uwe Reyle. 1993. From Discourse to Logic. Kluwer.

Christopher Kennedy. 2001. Polar opposition and the ontology of degrees. Linguistics and Philosophy, 24:33–70.
Ewan Klein and Michael Rovatsos. 2011. Temporal vagueness, coordination and communication. In Vagueness in Communication 2009, LNAI 6517, pages 108–126.

Wolfgang Klein. 2009. How time is encoded. In W. Klein and P. Li, editors, The Expression of Time, pages 39–81. Mouton De Gruyter.

Saunders Mac Lane and Ieke Moerdijk. 1992. Sheaves in Geometry and Logic: A First Introduction to Topos Theory. Springer.

A Probabilistic Rich Type Theory for Semantic Interpretation

Robin Cooper¹, Simon Dobnik¹, Shalom Lappin², and Staffan Larsson¹
¹University of Gothenburg, ²King's College London
{cooper,sl}@ling.gu.se, [email protected], [email protected]

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 72–79, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

Abstract

We propose a probabilistic type theory in which a situation s is judged to be of a type T with probability p. In addition to basic and functional types it includes, inter alia, record types and a notion of typing based on them. The type system is intensional in that types of situations are not reduced to sets of situations. We specify the fragment of a compositional semantics in which truth conditions are replaced by probability conditions. The type system is the interface between classifying situations in perception and computing the semantic interpretations of phrases in natural language.

1 Introduction

Classical semantic theories (Montague, 1974), as well as dynamic (Kamp and Reyle, 1993) and underspecified (Fox and Lappin, 2010) frameworks, use categorical type systems. A type T identifies a set of possible denotations for expressions in T, and the system specifies combinatorial operations for deriving the denotation of an expression from the values of its constituents.

These theories cannot represent the gradience of semantic properties that is pervasive in speakers' judgements concerning truth, predication, and meaning relations. In general, predicates do not have determinate extensions (or intensions), and so, in many cases, speakers do not make categorical judgements about the interpretation of an expression. Attributing gradience effects to performance mechanisms offers no help, unless one can show precisely how these mechanisms produce the observed effects.

Moreover, there is a fair amount of evidence indicating that language acquisition in general crucially relies on probabilistic learning (Clark and Lappin, 2011). It is not clear how a reasonable account of semantic learning could be constructed on the basis of the categorical type systems that either classical or revised semantic theories assume. Such systems do not appear to be efficiently learnable from the primary linguistic data (with weak learning biases), nor is there much psychological data to suggest that they provide biologically determined constraints on semantic learning.

A semantic theory that assigns probability rather than truth conditions to sentences is in a better position to deal with both of these issues. Gradience is intrinsic to the theory by virtue of the fact that speakers assign values to declarative sentences in the continuum of real numbers [0,1], rather than Boolean values in {0,1}. In addition, a probabilistic account of semantic learning is facilitated if the target of learning is a probabilistic representation of meaning. Both semantic representation and learning are instances of reasoning under uncertainty.

Probability theorists working in AI often describe probability judgements as involving distributions over worlds. In fact, they tend to limit such judgements to a restricted set of outcomes or events, each of which corresponds to a partial world which is, effectively, a type of situation (Halpern, 2003). A classic example of the reduction of worlds to situation types in probability theory is the estimation of the likelihood of heads vs tails in a series of coin tosses. Here the world is held constant except along the dimension of a binary choice between a particular set of possible outcomes. A slightly more complex case is the probability distribution for possible results of throwing a single die, which allows for six possibilities corresponding to each of its numbered faces. This restricted range of outcomes constitutes the sample space.

We are making explicit the assumption, common to most probability theories used in AI, with clearly defined sample spaces, that probability is distributed over situation types (Barwise and Perry, 1983), rather than over sets of entire worlds.

An Austinian proposition is a judgement that a situation is of a particular type, and we treat it as probabilistic. In fact, it expresses a subjective probability in that it encodes the belief of an agent concerning the likelihood that a situation is of that type. The core of an Austinian proposition is a type judgement of the form s : T, which states that a situation s is of type T. On our account this judgement is expressed probabilistically as p(s : T) = r, where r ∈ [0,1].¹

On the probabilistic type system that we propose, situation types are intensional objects over which probability distributions are specified. This allows us to reason about the likelihood of alternative states of affairs without invoking possible worlds.

Complete worlds are not tractably representable. Assume that worlds are maximal consistent sets of propositions (Carnap, 1947). If the logic of propositions is higher-order, then the problem of determining membership in such a set is not complete. If the logic is classically first-order, then the membership problem is complete, but undecidable.

Alternatively, we could limit ourselves to propositional logic, and try to generate a maximally consistent set of propositions from a single finite proposition P in Conjunctive Normal Form (CNF, a conjunction of disjunctions), by simply adding conjuncts to P. But it is not clear what (finite) set of rules or procedures we could use to decide which propositions to add in order to generate a full description of a world in a systematic way. Nor is it obvious at what point the conjunction will constitute a complete description of the world.

Moreover, all the propositions that P entails must be added to it, and all the propositions with which P is inconsistent must be excluded, in order to obtain the maximal consistent set of propositions that describe a world. But then testing the satisfiability of P is an instance of the ksat problem, which, in the general case, is NP-complete.²

By contrast, situation types can be as large or as small as we need them to be. They are not maximal in the way that worlds are, and so the issue of completeness of specification does not arise. Therefore, they can, in principle, be tractably represented.

¹ Beltagy et al. (2013) propose an approach on which classical logic-based representations are combined with distributional lexical semantics and a probabilistic Markov logic, in order to select among the set of possible inferences from a sentence. Our concern here is more foundational. We seek to replace classical semantic representations with a rich probabilistic type theory as the basis of both lexical and compositional interpretation.
² The ksat problem is to determine whether a formula in propositional logic has a satisfying set of truth-value assignments. For the complexity results of different types of ksat problem see Papadimitriou (1995).

2 Rich Type Theory and Probability

Central to standard formulations of rich type theories (for example, (Martin-Löf, 1984)) is the notion of a judgement a : T, that object a is of type T. We represent the probability of this judgement as p(a : T). Our system (based on Cooper (2012)) includes the following types.

Basic Types are not constructed out of other objects introduced in the theory. If T is a basic type, p(a : T) for any object a is provided by a probability model, an assignment of probabilities to judgements involving basic types.

PTypes are constructed from a predicate and an appropriate sequence of arguments. An example is the predicate 'man' with arity ⟨Ind, Time⟩, where the types Ind and Time are the basic types of individuals and of time points respectively. Thus man(john,18:10) is the type of situation (or eventuality) where John is a man at time 18:10. A probability model provides probabilities p(e : r(a1, . . . , an)) for ptypes r(a1, . . . , an). We take both common nouns and verbs to provide the components out of which PTypes are constructed.

Meets and Joins give, for T1 and T2, the meet T1 ∧ T2 and the join T1 ∨ T2, respectively. a : T1 ∧ T2 just in case a : T1 and a : T2. a : T1 ∨ T2 just in case either a : T1 or a : T2 (possibly both).³ The probabilities for meet and join types are defined by the classical (Kolmogorov, 1950) equations p(a : T1 ∧ T2) = p(a : T1)p(a : T2 | a : T1) (equivalently, p(a : T1 ∧ T2) = p(a : T1, a : T2)), and p(a : T1 ∨ T2) = p(a : T1) + p(a : T2) − p(a : T1 ∧ T2), respectively.

Subtypes A type T1 is a subtype of type T2, T1 ⊑ T2, just in case a : T1 implies a : T2 no matter what we assign to the basic types. If T1 ⊑ T2, then a : T1 ∧ T2 iff a : T1, and a : T1 ∨ T2 iff a : T2. Similarly, if T2 ⊑ T1, then a : T1 ∧ T2 iff a : T2 and a : T1 ∨ T2 iff a : T1. If T2 ⊑ T1, then p(a : T1 ∧ T2) = p(a : T2), and p(a : T1 ∨ T2) = p(a : T1). If T1 ⊑ T2, then p(a : T1) ≤ p(a : T2).

³ This use of intersection and union types is not standard in rich type theories, where product and disjoint union are preferred following the Curry-Howard correspondence for conjunction and disjunction.
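The Kolmogorov equations for meet and join types can be sketched directly; the judgement probabilities in this example are hypothetical:

```python
def p_meet(p_t1, p_t2_given_t1):
    # p(a : T1 ∧ T2) = p(a : T1) · p(a : T2 | a : T1)
    return p_t1 * p_t2_given_t1

def p_join(p_t1, p_t2, p_t1_and_t2):
    # p(a : T1 ∨ T2) = p(a : T1) + p(a : T2) − p(a : T1 ∧ T2)
    return p_t1 + p_t2 - p_t1_and_t2

# Hypothetical judgements: p(a : T1) = 0.5, p(a : T2) = 0.5,
# and p(a : T2 | a : T1) = 0.5.
meet = p_meet(0.5, 0.5)
join = p_join(0.5, 0.5, meet)
print(meet, join)  # 0.25 0.75
```

Note that the meet probability can never exceed either conjunct's probability, and the join probability can never fall below either disjunct's, matching the subtype inequalities above.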
These definitions also entail that p(a : T1 ∧ T2) ≤ p(a : T1), and p(a : T1) ≤ p(a : T1 ∨ T2).

We generalize probabilistic meet and join types to probabilities for unbounded conjunctive and disjunctive type judgements, again using the classical equations.

Let ⋀p(a0 : T0, . . . , an : Tn) be the conjunctive probability of judgements a0 : T0, . . . , an : Tn. Then

  ⋀p(a0 : T0, . . . , an : Tn) = ⋀p(a0 : T0, . . . , an−1 : Tn−1) p(an : Tn | a0 : T0, . . . , an−1 : Tn−1).

If n = 0, ⋀p(a0 : T0, . . . , an : Tn) = 1.

We interpret universal quantification as an unbounded conjunctive probability, which is true if it is vacuously satisfied (n = 0) (Paris, 2010).

Let ⋁p(a0 : T0, a1 : T1, . . .) be the disjunctive probability of judgements a0 : T0, a1 : T1, . . . . It is computed by

  ⋁p(a0 : T0, . . . , an : Tn) = ⋁p(a0 : T0, . . . , an−1 : Tn−1) + p(an : Tn) − ⋁p(a0 : T0, . . . , an−1 : Tn−1) p(an : Tn | a0 : T0, . . . , an−1 : Tn−1).

If n = 0, ⋁p(a0 : T0, . . . , an : Tn) = 0.

We take existential quantification to be an unbounded disjunctive probability, which is false if it lacks a single non-nil probability instance (n = 0).

Conditional Conjunctive Probabilities are computed by

  ⋀p(a0 : T0, . . . , an : Tn | a : T) = ⋀p(a0 : T0, . . . , an−1 : Tn−1 | a : T) p(an : Tn | a0 : T0, . . . , an−1 : Tn−1, a : T).

If n = 0, ⋀p(a0 : T0, . . . , an : Tn | a : T) = 1.

Function Types give, for any types T1 and T2, the type (T1 → T2). This is the type of total functions with domain the set of all objects of type T1 and range included in the objects of type T2. The probability that a function f is of type (T1 → T2) is the probability that everything in its domain is of type T1 and that everything in its range is of type T2, and furthermore that everything not in its domain which has some probability of being of type T1 is not in fact of type T1.

  p(f : (T1 → T2)) = ⋀p_{a ∈ dom(f)}(a : T1, f(a) : T2) (1 − ⋁p_{a ∉ dom(f)}(a : T1))

Suppose that T1 is the type of event where there is a flash of lightning and T2 is the type of event where there is a clap of thunder. Suppose that f maps lightning events to thunder events, and that it has as its domain all events which have been judged to have probability greater than 0 of being lightning events. Let us consider that all the putative lightning events were clear examples of lightning (i.e. judged with probability 1 to be of type T1) and are furthermore associated by f with clear events of thunder (i.e. judged with probability 1 to be of type T2). Suppose there were four such pairs of events. Then the probability of f being of type (T1 → T2) is (1 × 1)⁴, that is, 1.

Suppose, alternatively, that for one of the four events f associates the lightning event with a silent event, that is, one whose probability of being of T2 is 0. Then the probability of f being of type (T1 → T2) is (1 × 1)³ × (1 × 0) = 0. One clear counterexample is sufficient to show that the function is definitely not of the type.

In cases where the probabilities of the antecedent and the consequent type judgements are higher than 0, the probability of the entire judgement on the existence of a functional type f will decline in proportion to the size of dom(f). Assume, for example, that there are k elements a ∈ dom(f), where for each such a, p(a : T1) and p(f(a) : T2) are at least .5. Every ai that is added to dom(f) will reduce the value of p(f : (T1 → T2)), even if it yields higher values for p(a : T1) and p(f(a) : T2). This is due to the fact that we are treating the probability p(f : (T1 → T2)) as the likelihood of there being a function that is satisfied by all objects in its domain. The larger the domain, the less probable that all elements in it fulfill the functional relation.

We are, then, interpreting a functional type judgement of this kind as a universally quantified assertion over the pairing of objects in dom(f) and range(f). The probability of such an assertion is given by the conjunction of assertions corresponding to the co-occurrence of each element a in f's domain as an instance of T1 with f(a) as an instance of T2. This probability is the product of the probabilities of these individual assertions.

This seems reasonable, but it only deals with functions whose domain is all objects which have been judged to have some probability, however low, of being of type T1. Intuitively, functions which leave out some of the objects with lower likelihood of being of type T1 should also have a probability of being of type (T1 → T2). This factor in the probability is represented by the second element of the product in the formula.
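Assuming independence of the individual judgements (as in the lightning example), the function-type probability and the two calculations above can be sketched as follows; the helper name is hypothetical:

```python
from math import prod

def p_function_type(pairs, outside_probs):
    """p(f : (T1 -> T2)): the conjunctive probability that every a in
    dom(f) is of type T1 and f(a) of type T2 (a product, under
    independence), times 1 minus the disjunctive probability that
    something outside dom(f) is of type T1."""
    conj = prod(p1 * p2 for p1, p2 in pairs)
    disj = 0.0
    for q in outside_probs:  # classical unbounded disjunction
        disj = disj + q - disj * q
    return conj * (1 - disj)

# Four clear lightning events, each mapped to a clear thunder event:
print(p_function_type([(1, 1)] * 4, []))             # 1.0
# One of the four mapped instead to a silent event (p(f(a) : T2) = 0):
print(p_function_type([(1, 1)] * 3 + [(1, 0)], []))  # 0.0
```

As the text observes, a single certain counterexample drives the whole product, and hence the function-type probability, to 0.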
For sentences it produces ¬¬ classical Boolean negation and disjunction, in con- probability values. The probabilistic evaluation function [[ ]] pro- trast to Martin-Lof’s¨ (1984) intuitionistic type the- · p ory. duces a probabilistic interpretation based on a Dependent Types are functions from objects to classical compositional semantics. For sentences types. Given appropriate arguments as functions it will return the probability that the sentence is they will return a type. Therefore, the account of true. For categories that are interpreted as func- probabilities associated with functions above ap- tions it will return functions from (categorical) in- plies to dependent types. terpretations to probabilities. We are not propos- Record Types A record in a type system asso- ing strict compositionality in terms of probabili- ciated with a set of labels is a set of ordered pairs ties. Probabilities are like truth-values (or rather, (fields) whose first member is a label and whose truth-values are the limit cases of probabilities). second member is an object of some type (possibly We would not expect to be able to compute the a record). Records are required to be functional on probability associated with a complex constituent labels (each label in a record can only occur once on the basis of the probabilities associated with its in the record’s left projection). immediate constituents, any more than we would A dependent record type is a set of fields (or- expect to be able to compute a categorical inter- dered pairs) consisting of a label ` followed by T as above. The set of record types is defined by: pretation entirely in terms of truth-functions and extensions. However, the simultaneous computa- 1. [], that is the empty set or Rec, is a record type. r : Rec tion of categorical and probabilistic interpretations just in case r is a record. provides us with a compositional semantic system 2. 
If T1 is a record type, ` is a label not occurring in T1, that is closely related to the simultaneous com- and T2 is a type, then T1 `, T2 is a record type. ∪ {h i} putation of intensions and extensions in classical r : T1 `, T2 just in case r : T1, r.` is defined (` ∪ {h i} occurs as a label in r) and r.` : T2. Montague semantics. The following definition of [[ ]]p for a fragment 3. If T is a record type, ` is a label not occuring in · T , is a dependent type requiring n arguments, and of English is specified on the basis of our proba- T 4 π1, . . . , πn is an n-place sequence of paths in T , bilistic type system and a non-probabilistic inter- h i then T `, , π1, . . . , πn is a record type. ∪ {h hT h iii} pretation function [[ ]], which we do not give in r : T `, , π1, . . . , πn just in case r : T , ∪ {h hT h iii} · r.` is defined and r.` : (r.π1, . . . , r.πn). this version of the paper. (It’s definition is given T by removing the probability p from the definition The probability that an object r is of a record below.) type T is given by the following clauses: e1:[[ S1 ]] [[ [ S1 and S2] ]]p = p( ) 1. p(r : Rec) = 1 if r is a record, 0 otherwise S e2:[[ S2 ]] [[ [ S1 or S2] ]]p = p( e:[[ S1 ]] [[ S2 ]] ) S ∨ ^ [[ [ Neg S] ]]p = [[ Neg ]]p([[ S ]]) 2. p(r : T1 `, T2 ) = (r : T1, r.` : T2) S ∪ {h i} p [[ [S NP VP] ]]p = [[ NP ]]p([[ VP ]]) [[ [NP Det N] ]]p = [[ Det ]]p([[ N ]]) 3. If :(T1 (... (Tn Type) ...)), then [[ [NP Nprop ] ]]p = [[ Nprop ]]p T → → → ^ [[ [VP Vt NP] ]]p = [[ Vt ]]p([[ NP ]]) p(r : T `, , π1, . . . , πn ) = (r : T, r.` : ∪ {h hT h iii} p [[ [VP Vi] ]]p = [[ Vi ]]p (r.π1, . . . , r.πn) r.π1 : T1, . . . , r.πn : Tn) [[ [Neg “it’s not true that”] ]]p = λT :RecType(p( e: T )) T | ¬ [[ ]] λQ λP Q P 4 [Det “some”] p = :Ppty( :Ppty(p( e:some( , ) ))) In the full version of TTR we also allow absolute paths which point to particular records, but we will not include [[ [Det “every”] ]]p = λQ:Ppty(λP :Ppty(p( e:every(Q, P ) ))) them here. 
[[ [Det “most”] ]]p = λQ:Ppty(λP :Ppty(p( e:most(Q, P ) ))) 75 [[ [N “boy”] ]]p = λr: x:Ind (p( e:boy(r.x) )) Suppose that pd(s1:smile(kim)) = .7, [[ [N “girl”] ]]p = λr: x:Ind (p( e:girl(r.x) )) pd(s2:smile(kim)) = .3, pd(s3:smile(kim)) = [[ [Adj “green”] ]]p = .4, and there are no other situations si such λP :Ppty(λr: x:Ind (p(( e:green(r.x,P ) ))))) that pd(si:smile(kim)) > 0. Furthermore, let [[ [Adj “imaginary”] ]]p = us assume that these probabilities are indepen- 5 λP :Ppty(λr: x:Ind (p(( e:imaginary(r.x,P ) ))))) dent of each other, that is, pd(s3:smile(kim)) = [[ [ “Kim”] ]]p = λP :Ppty(p(P ( x=kim ))) p (s :smile(kim) s :smile(kim), s :smile(kim)) Nprop d 3 | 1 2 [[ ]] λP P x=sandy [Nprop “Sandy”] p = :Ppty(p( ( ))) and so on. Then [[ ]] [Vt “knows”] p = pd(smile(kim))= _p λ :Quant(λr1: x:Ind (p( (λr2:( e:know(r1.x,r2.x) ))))) P P d(s1 : smile(kim), s2 : smile(kim), s3 : smile(kim)) = [[ ]] [Vt “sees”] p = _p _p λ :Quant(λr1: x:Ind (p( (λr2:( e:see(r1.x,r2.x) ))))) d(s1 : smile(kim), s2 : smile(kim)) + .4 .4 d(s1 : P P − [[ ]] λr x:Ind e:smile(r.x) [Vi “smiles”] p = : (p( )) smile(kim), s2 : smile(kim)) = [[ [ “laughs”] ]]p = λr: x:Ind (p( e:laugh(r.x) )) Vi (.7 + .3 .7 .3) + .4 .4(.7 + .3 .7 .3) = A probability distribution d for this fragment, − × − − × .874 based on a set of situations , is such that: S 6 This means that pd( e:smile(kim) ) = .874. pd(a : Ind) = 1 if a is kim or sandy [[ ]] .874 Hence [S [NP [Nprop Kim]] [VP [Vi smiles]]] pd = pd(s : T ) [0, 1] if s and T is a ptype ∈ ∈ S 7 (where [[ α ]]pd is the result of computing [[ α ]]p pd(s : T ) = 0 if s and T is a ptype τ 6∈ S with respect to the probability distribution d). pd(a :[ P ]) = pd(P ( x=a )) τ τ Just as for categorical semantics, we can con- pd(some(P,Q)) = pd([ P ] [ Q]) τ ∧ τ struct type theoretic objects corresponding to pd(every(P,Q)) = pd([ P ] [ Q]) → τ τ pd([ P ] [ Q] probabilistic judgements. 
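The .874 figure above follows the classical disjunction recursion under the independence assumption; a minimal sketch:

```python
def disj(probs):
    """Unbounded disjunctive probability under independence:
    v <- v + p - v*p for each judgement probability p (0 when n = 0)."""
    v = 0.0
    for p in probs:
        v = v + p - v * p
    return v

# pd(si : smile(kim)) for the three situations in the text:
print(round(disj([0.7, 0.3, 0.4]), 3))  # 0.874
```

Equivalently, .874 = 1 − (1 − .7)(1 − .3)(1 − .4): the probability that at least one of the three independent smiling-situations obtains.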
We call these probabilistic Austinian propositions. These are records of type

  sit : Sit
  sit-type : Type
  prob : [0,1]

where [0,1] is used to represent the type of real numbers between 0 and 1. They assert that the probability that a situation s is of type Type is the value of prob. The definition of [[·]]p specifies a compositional procedure for generating an Austinian proposition (record) of this type from the meanings of the syntactic constituents of a sentence.

The probability that an event e is of the type in which the relation some holds of the properties P and Q is the probability that e is of the conjunctive type P ∧ Q. The probability that e is of the every type for P and Q is the likelihood that it instantiates the functional type P → Q. As we have defined the probabilities associated with functional types in terms of universal quantification (an unbounded conjunction of the pairings between the elements of the domain P of the function and its range Q), this definition sustains the desired reading of every. The likelihood that e is of the type most for P and Q is the likelihood that e is of type P ∧ Q, factored by the product of the contextually determined parameter θmost and the likelihood that e is of type P, where this fraction is less than 1, and 1 otherwise.

Consider a simple example.

[[[S [NP [Nprop Kim]] [VP [Vi smiles]]]]]p =
λP:Ppty(p(P([x = kim])))(λr:[x : Ind]([e : smile(r.x)])) =
p(λr:[x : Ind]([e : smile(r.x)])([x = kim])) =
p([e : smile(kim)])

⁵ Notice that we characterize adjectival modifiers as relations between records of individuals and properties. We can then invoke subtyping to capture the distinction between intersective and non-intersective modifier relations.
⁶ This seems an intuitive assumption, though not a necessary one.
⁷ Again this seems an intuitive, though not a necessary assumption.

4 An Outline of Semantic Learning

We outline a schematic theory of semantic learning on which agents acquire classifiers that form the basis for our probabilistic type system. For simplicity and ease of presentation we take these to be Naive Bayes classifiers, which an agent acquires from observation. In future developments of this theory we will seek to extend the approach to Bayesian networks (Pearl, 1990).

We assume that agents keep records of observed situations and their types, modelled as probabilistic Austinian propositions. For example, an observation of a man running might yield the following Austinian proposition for some a : Ind, s1 : man(a), s2 : run(a):

  sit = [ref = a, cman = s1, crun = s2]
  sit-type = [ref : Ind, cman : man(ref), crun : run(ref)]
  prob = 0.7

An agent, A, makes judgements based on a finite string of probabilistic Austinian propositions, J, corresponding to prior judgements held in memory. For a type, T, J^T represents the set of Austinian propositions j such that j.sit-type ⊑ T. If T is a type and J a finite string of probabilistic Austinian propositions, then ||T||_J represents the sum of all probabilities associated with T in J (Σ_{j ∈ J^T} j.prob). P(J) is the sum of all probabilities in J (Σ_{j ∈ J} j.prob).

Take J^{Tman} to have five members, corresponding to judgements by A that there was a man in s1, . . . , s5, and assume that the Austinian propositions assigning Tman to s1, . . . , s5 all have probability 0.7. Assume also that J^{Tman ∧ Trun} has three members, corresponding to judgements by A that a man was running in three observed situations s1, s3, and s4, and that these Austinian propositions have the probabilities 0.6, 0.6, and 0.5 respectively. Given these assumptions, the conditional probability that A will assign, on the basis of J, to someone running, given that he is a man, is

  pA,J(r : Trun | r : Tman) = ||Tman ∧ Trun||_J / ||Tman||_J = (0.6 + 0.6 + 0.5) / (0.7 + 0.7 + 0.7 + 0.7 + 0.7) = .486

We use conditional probabilities to construct a Naive Bayes classifier.
A classifies a new situa- ∈ We use priorJ(T ) to represent the prior proba- tion s based on the prior judgements J, and what- bility that anything is of type T given J, that is ever evidence A can acquire about s. This evi- T || ||J p s : T ... p s : T (J) if (J) > 0, and 0 otherwise. dence has the form A,J( e1 ), , A,J( en ), P P pA,J(s : T ) denotes the probability that agent A where Te1 ,...,Ten are the evidence types. The assigns with respect to prior judgements J to s be- Naive Bayes classifier assumes that the evidence is ing of type T . Similarly, pA,J(s : T1 s : T2) is independent, in that the probability of each piece | the probability that agent A assigns with respect of evidence is independent of every other piece of to prior judgements J to s being of type T1, given evidence. that A judges s to be of type T2. We first formulate Bayes’ rule of conditional When an agent A encounters a new situation probability. This rule defines the conditional prob- s and considers whether it is of type T , he/she ability of a conclusion r : Tc, given evidence r : uses probabilistic reasoning to determine the value Te1 , r : Te2 , . . . , r : Ten , in terms of conditional prob- of pA,J(s : T ). A uses conditional probabilities abilities of the form p(si : Te si : Tc), 1 i n, i | ≤ ≤ to calculate this value, where A computes these and priors for conclusion and evidence: conditional probabilities with the equation pA,J(s : pA,J(r : Tc r : Te1 , . . . , r : Ten ) = T1 T2 J Te | Tc Te Tc || ∧ || || 1 ∧ ||J ... || n ∧ ||J T1 s : T2) = T , if T2 J= 0. Otherwise, T T | || 2||J || || 6 || c||J || c||J priorJ(Tc) prior (T )...prior (T ) pA,J(s : T1 s : T2) = 0. J e1 J en | This is our type theoretic variant of the stan- dard Bayesian formula for conditional probabili- The conditional probabilities are computed A&B ties: p(A B) = | | . But instead of counting from observations as indicated above. 
The rule of | B categorical instances,| | we sum the probabilities of conditional probability allows the combination of judgements. This is because our “training data” is several pieces of evidence, without requiring pre- not limited to categorical observations. Instead it vious observation of a situation involving all the consists of probabilistic observational judgements evidence types. that situations are of particular types.8 We formulate a Naive Bayes classifier as a func- Assume that we have the following types: tion from evidence types Te1 ,Te2 ,...,Ten (i.e. from ref : Ind a record of type Te1 Te2 ... Ten ) to conclusion Tman = and ∧ ∧ ∧ cman : man(ref) T ,T ,...,T types c1 c2 cm . The conclusion is a disjunc- ref : Ind Trun = tion of one or more T Tc1 ,Tc2 ,...,Tcm , where crun : run(ref) ∈ { } m ranges over all possible non-disjunctive conclu- 8As a reviewer observes, by using an observer’s previous sions distinguished by the classifier. This function judgements for the probability of an event being of a partic- ular type, as the prior for the rule that computes the proba- is specified as follows. bility of a new event being of that type, we have, in effect, κ :(Te1 ... Ten ) (Tc1 ... Tcm ) such that κ(r) = compressed information that properly belongs in a Bayesian ∧ ∧ → ∨ ∨ (W argmax p (r : T r : T , . . . , r : T ) network into our specification of a naive Bayesian classifier. A,J e1 en T Tc ,...,Tcm | This is a simplification that we adopt here for ease of expo- ∈h 1 i sition. In future work, we will characterise classifier learning The classifier returns the type T which max- through full Bayesian networks. imises the conditional probability of r : T given 77 the evidence provided by r. The argmax operator negation. This has permitted us to sustain classi- here takes a sequence of arguments and a func- cal equivalences and Boolean negation for com- tion and yields a sequence containing the argu- plex types within an intensional type theory. 
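The classifier κ amounts to an argmax over the conclusion types, returning a disjunction (modelled here as a set of type names) in case of ties; the conditional probabilities in this sketch are hypothetical:

```python
def classify(cond_probs):
    """kappa(r): the disjunction of conclusion types T maximising
    pA,J(r : T | r : Te1, ..., r : Ten), given as a dict from type
    name to conditional probability.  Ties yield several disjuncts."""
    best = max(cond_probs.values())
    return {T for T, p in cond_probs.items() if p == best}

print(sorted(classify({"Tman": 0.486, "Tdog": 0.2})))  # ['Tman']
print(sorted(classify({"Tman": 0.4, "Tdog": 0.4})))    # ['Tdog', 'Tman']
```

The second call illustrates the tie case: with equal conditional probabilities the classifier outputs the disjunction of both conclusion types.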
We ments which maximise the function (if there are have replaced the truth of a type judgement with more than one). the probability of it being the case, and we have The classifier will output a disjunction in case applied this approach to judgements that a situa- both possibilities have the same probability. The tion if of type T . W operator takes a sequence and returns the dis- Our probabilistic formulation of a rich type the- junction of all elements of the sequence. ory with records provides the basis for a compo- In addition to computing the conclusion which sitional semantics in which functions apply to cat- receives the highest probability given the evi- egorical semantic objects in order to return either dence, we also want the posterior probability of functions from categorical interpretations to prob- the judgement above, i.e. the probability of the abilistic judgements, or, for sentences, to proba- judgement in light of the evidence. We obtain the bilistic Austinian propositions. One of the inter- nn non-normalised probabilities (pA,J) of the different esting ways in which this framework differs from possible conclusions by factoring in the probabili- classical model theoretic semantics is that the ba- ties of the evidence: sic types and type judgements at the foundation of nn pA,J(r : κ(r)) = the type system correspond to perceptual judge- P T W 1 κ(r) pA,J(r : T r : Te1 , . . . , r : Ten )pA,J(r : ments concerning objects and events in the world, ∈ − | Te1 ) . . . pA,J(r : Ten ) rather than to entities in a model and set theoretic W 1 W where − is the inverse of , i.e. a function that constructions defined on them. takes a disjunction and returns the set of disjuncts. We have offered a schematic view of semantic We then take the probability of r : κ(r) and learning. On this account observations of situa- normalise over the sum of the probabilities of tions in the world support the acquisition of naive all the possible conclusions. 
This gives us the Bayesian classifiers from which the basic proba- normalised probability of the judgement resulting nn bilistic types of our type theoretical semantics are pA,J(r:κ(r)) from classification p(r : κ(r)) = P nn . 1 i m pA,J(r:Tci ) extracted. Our type theory is, then, the interface ≤ ≤ between observation-based learning of classifiers However, since the probabilities of the evidence for objects and the situations in which they figure are identical for all possible conclusions, we can on one hand, and the computation of complex se- ignore them and instead compute the normalised mantic values for the expressions of a natural lan- probability with the following equation (where m guage from these simple probabilistic types and ranges over all possible non-disjunctive conclu- type judgements on the other. Therefore our gen- sions distinguished by the classifier, as above). eral model of interpretation achieves a highly in- tegrated bottom-up treatment of linguistic mean- P W p (r:T r:T ,...,r:T ) T 1 κ(r) A,J | e1 en pA,J(r : κ(r)) = P∈ − ing and perceptually-based cognition that situates 1 i m pA,J(r:Tci r:Te1 ,...,r:Ten ) ≤ ≤ | meaning in learning how to make observational The result of classification can be represented as judgements concerning the likelihood of situations an Austinian proposition obtaining in the world. The types of our semantic theory are inten- sit = s sional. They constitute ways of classifying situ- sit-type = κ(s) prob = pA,J(s : κ(s)) ations, and they cannot be reduced to set of situa- tions. The theory achieves fine-grained intension- which A adds to J as a result of observing and ality through a rich and articulated type system, classifying s, and is thus made available for sub- where the foundation of this system is anchored in sequent probabilistic reasoning. perceptual observation. 
5 Conclusions and Future Work The meanings of expressions are acquired on the basis of speakers’ experience in the applica- We have presented a probabilistic version of a rich tion of classifiers to objects and events that they type theory with records, relying heavily on classi- encounter. Meanings are dynamic and updated in cal equations for types formed with meet, join, and light of subsequent experience. 78 Probability is distributed over alternative situ- A. Clark and S. Lappin. 2011. Linguistic Nativism ation types. Possible worlds, construed as maxi- and the Poverty of the Stimulus. Wiley-Blackwell, mal consistent sets of propositions (ultrafilters in a Chichester, West Sussex, and Malden, MA. proof theoretic lattice of propositions) play no role Robin Cooper. 2012. Type theory and semantics in in this framework. flux. In Ruth Kempson, Nicholas Asher, and Tim Bayesian reasoning from observation provides Fernando, editors, Handbook of the Philosophy of Science, volume 14: Philosophy of Linguistics. El- the incremental basis for learning and refining sevier BV, 271–323. General editors: Dov M. Gab- predicative types. These types feed the combina- bay, Paul Thagard and John Woods. torial semantic procedures for interpreting the sen- tences of a natural language. C. Fox and S. Lappin. 2010. Expressiveness and complexity in underspecified semantics. Linguistic In future work we will explore implementations Analysis, Festschrift for Joachim Lambek, 36:385– of our learning theory in order to study the viabil- 417. ity of our probabilistic type theory as an interface J. Halpern. 2003. Reasoning About Uncertainty. MIT between perceptual judgement and compositional Press, Cambridge MA. semantics. We hope to show that, in addition to its cognitive and theoretical interest, our proposed H. Kamp and U. Reyle. 1993. 
From Discourse to Logic: Introduction to Modeltheoretic Semantics framework will yield results in robotic language of Natural Language, Formal Logic and Discourse learning, and dialogue modelling. Representation Theory. Kluwer, Dordrecht. Acknowledgments A.N. Kolmogorov. 1950. Foundations of Probability. Chelsea Publishing, New York. We are grateful to two anonymous reviewers for very helpful comments on an earlier draft of this Per Martin-Lof.¨ 1984. Intuitionistic Type Theory. Bib- liopolis, Naples. paper. We also thank Alex Clark, Jekaterina Denissova, Raquel Fernandez,´ Jonathan Ginzburg, Richard Montague. 1974. Formal Philosophy: Se- Noah Goodman, Dan Lassiter, Michiel van Lam- lected Papers of Richard Montague. Yale University Press, New Haven. ed. and with an introduction by balgen, Poppy Mankowitz, Aarne Ranta, and Pe- Richmond H. Thomason. ter Sutton for useful discussion of ideas presented in this paper. Shalom Lappin’s participation in C. Papadimitriou. 1995. Computational Complexity. the research reported here was funded by grant Addison-Wesley Publishing Co., Readin, MA. ES/J022969/1 from the Economic and Social Re- J. Paris. 2010. Pure inductive logic. Winter School in search Council of the UK, and a grant from the Logic, Guangzhou, China. Wenner-Gren Foundations. We also gratefully ac- J. Pearl. 1990. Bayesian decision methods. In knowledge the support of Vetenskapsradet,˚ project G. Shafer and J. Pearl, editors, Readings in Uncer- 2009-1569, Semantic analysis of interaction and tain Reasoning, pages 345–352. Morgan Kaufmann. coordination in dialogue (SAICD); the Depart- ment of Philosophy, Linguistics, and Theory of Science; and the Centre for Language Technology at the University of Gothenburg. References Jon Barwise and John Perry. 1983. Situations and Attitudes. Bradford Books. MIT Press, Cambridge, Mass. I. Beltagy, C. Chau, G. Boleda, D. Garrette, K. Erk, and R. Mooney. 2013. 
Montague meets Markov: Deep semantics with probabilistic logical form. In Second Joint Conference on Lexical and Computational Semantics, Volume 1, pages 11–21. Association for Computational Linguistics, Atlanta, GA.

R. Carnap. 1947. Meaning and Necessity. University of Chicago Press, Chicago.

Probabilistic Type Theory for Incremental Dialogue Processing

Julian Hough and Matthew Purver
Cognitive Science Research Group
School of Electronic Engineering and Computer Science
Queen Mary University of London
{j.hough,m.purver}@qmul.ac.uk

Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 80–88, Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics

Abstract

We present an adaptation of recent work on probabilistic Type Theory with Records (Cooper et al., 2014) for the purposes of modelling the incremental semantic processing of dialogue participants. After presenting the formalism and dialogue framework, we show how probabilistic TTR type judgements can be integrated into the inference system of an incremental dialogue system, and discuss how this could be used to guide parsing and dialogue management decisions.

1 Introduction

While classical type theory has been the predominant mathematical framework in natural language semantics for many years (Montague, 1974, inter alia), it is only recently that probabilistic type theory has been discussed for this purpose. Similarly, type-theoretic representations have been used within dialogue models (Ginzburg, 2012); and probabilistic modelling is common in dialogue systems (Williams and Young, 2007, inter alia), but combinations of the two remain scarce. Here, we attempt to make this connection, taking Cooper et al. (2014)'s probabilistic Type Theory with Records (TTR) as our principal point of departure, with the aim of modelling incremental inference in dialogue.

To our knowledge there has been no practical integration of probabilistic type-theoretic inference into a dialogue system so far; here we discuss computationally efficient methods for implementation in an extant incremental dialogue system. This paper demonstrates their efficacy in simple referential communication domains, but we argue the methods could be extended to larger domains and additionally used for on-line learning in future work.

2 Previous Work

Type Theory with Records (TTR) (Betarte and Tasistro, 1998; Cooper, 2005) is a rich type theory which has become widely used in dialogue models, including information state models for a variety of phenomena such as clarification requests (Ginzburg, 2012; Cooper, 2012) and non-sentential fragments (Fernández, 2006). It has also been shown to be useful for incremental semantic parsing (Purver et al., 2011), incremental generation (Hough and Purver, 2012), and recently for grammar induction (Eshghi et al., 2013).

While the technical details will be given in section 3, the central judgement in type theory, s : T (that a given object s is of type T), is extended in TTR so that s can be a (potentially complex) record and T can be a record type — e.g. s could represent a dialogue gameboard state and T could be a dialogue gameboard state type (Ginzburg, 2012; Cooper, 2012). As TTR is highly flexible with a rich type system, variants have been considered with types corresponding to real-number-valued perceptual judgements used in conjunction with linguistic context, such as visual perceptual information (Larsson, 2011; Dobnik et al., 2012), demonstrating its potential for embodied learning systems. The possibility of integrating perceptron learning (Larsson, 2011) and naive Bayes classifiers (Cooper et al., 2014) into TTR shows how linguistic processing and probabilistic conceptual inference can be treated in a uniform way within the same representation system.

Probabilistic TTR as described by Cooper et al. (2014) replaces the categorical s : T judgement with the real-number-valued p(s : T) = v, where v ∈ [0,1]. The authors show how standard probability-theoretic and Bayesian equations can be applied to TTR judgements, and how an agent might learn from experience in a simple classification game. The agent is presented with instances of a situation, and it learns with each round by updating its set of probabilistic type judgements to best predict the type of object in focus — in this case updating the probability judgement that something is an apple given its observed colour and shape, p(s : Tapple | s : TShp, s : TCol), where Shp ∈ {shp1, shp2} and Col ∈ {col1, col2}. From a cognitive modelling perspective, these judgements can be viewed as probabilistic perceptual information derived from learning. We use similar methods in our toy domain, but show how prior judgements could be constructed efficiently, and how classifications can be made without exhaustive iteration through individual type classifiers.

There has also been significant experimental work on simple referential communication games in psycholinguistics, and in computational and formal modelling. In terms of production and generation, Levelt (1989) discusses speaker strategies for generating referring expressions in a simple object naming game. He showed how speakers use informationally redundant features of the objects, violating Grice's Maxim of Quantity. In natural language generation (NLG), referring expression generation (REG) has been widely studied (see (Krahmer and Van Deemter, 2012) for a comprehensive survey). The incremental algorithm (IA) (Dale and Reiter, 1995) is an iterative feature selection procedure for descriptions of objects, based on computing the distractor set of referents that each adjective in a referring expression could cause to be inferred. More recently, Frank and Goodman (2012) present a Bayesian model of optimising referring expressions based on surprisal, the information-theoretic measure of how much descriptions reduce uncertainty about their intended referent, a measure which they claim correlates strongly with human judgements.

The element of the referring expression domain we discuss here is incremental processing. There is evidence from Brennan and Schober (2001)'s experiments that people reason at an incredibly time-critical level from linguistic information. They demonstrated that self-repair can speed up semantic processing (or at least object reference) in such games: an incorrect object being partly vocalized and then repaired in the instructions (e.g. "the yell-, uh, purple square") yields quicker response times from the onset of the target ("purple") than in the case of the fluent instructions ("the purple square"). This example will be addressed in section 5. First we will set out the framework in which we want to model such processing.

3 Probabilistic TTR in an incremental dialogue framework

In TTR (Cooper, 2005; Cooper, 2012), the principal logical form of interest is the record type ('RT' from here), consisting of sequences of fields of the form [l : T], containing a label l and a type T.1 RTs can be witnessed (i.e. judged as inhabited) by records of that type, where a record is a set of label-value pairs [l = v]. The central type judgement in TTR that a record s is of (record) type R, i.e. s : R, can be made from the component type judgements of individual fields; e.g. the one-field record [l = v] is of type [l : T] just in case v is of type T. This is generalisable to records and RTs with multiple fields: a record s is of RT R if s includes fields with labels matching those occurring in the fields of R, such that all fields in R are matched, and all matched fields in s must have a value belonging to the type of the corresponding field in R. Thus it is possible for s to have more fields than R and for s : R to still hold, but not vice-versa: s : R cannot hold if R has more fields than s.

[Figure 1: Example TTR record types: R1 = [l1 : T1, l2 : T2, l3 : T3(l1)], R2 = [l1 : T1, l2 : T2′], R3 = []]

Fields can have values representing predicate types (ptypes), such as T3 in Figure 1, and consequently fields can be dependent on fields preceding them (i.e. higher) in the RT; e.g. l1 is bound in the predicate type field l3, so l3 depends on l1.

1 We only introduce the elements of TTR relevant to the phenomena discussed below. See (Cooper, 2012) for a detailed formal description.

Subtypes, meets and joins. A relation between RTs we wish to explore is ⊑ ('is a subtype of'), which can be defined for RTs in terms of fields as simply: R1 ⊑ R2 if for all fields [l : T2] in R2, R1 contains [l : T1] where T1 ⊑ T2. In Figure 1, both R1 ⊑ R3 and R2 ⊑ R3; and R1 ⊑ R2 iff T2 ⊑ T2′. The transitive nature of this relation (if R1 ⊑ R2 and R2 ⊑ R3 then R1 ⊑ R3) can be used effectively for type-theoretic inference.

We also assume the existence of manifest (singleton) types, e.g. Ta, the type of which only a is a member. Here, we write manifest RT fields such as [l : Ta], where Ta ⊑ T, using the syntactic sugar [l=a : T]. The subtype relation effectively allows progressive instantiation of fields (as addition of fields to R leads to R′ where R′ ⊑ R), which is practically useful for an incremental dialogue system, as we will explain.

We can also define meet and join types of two or more RTs. The representation of the meet type of two RTs R1 and R2 is the result of a merge operation (Larsson, 2010), which in simple terms here can be seen as union of fields. A meet type is also equivalent to the extraction of a maximal common subtype, an operation we will call MaxSub(Ri..Rn):2

if R1 = [l1 : T1, l2 : T2] and R2 = [l2 : T2, l3 : T3]
then R1 ∧ R2 = [l1 : T1, l2 : T2, l3 : T3] = MaxSub(R1, R2)

R1 and R2 here are common supertypes of the resulting R1 ∧ R2. On the other hand, the join of two RTs R1 and R2, the type R1 ∨ R2, cannot be represented by field intersection. It is defined in terms of type checking, in that s : R1 ∨ R2 iff s : R1 or s : R2. It follows that if R1 ⊑ R2 then s : R1 ∧ R2 iff s : R1, and s : R1 ∨ R2 iff s : R2.

While technically the maximal common supertype of R1 and R2 is the join type R1 ∨ R2, here we introduce the maximally common simple (non-disjunctive) supertype of two RTs R1 and R2 as field intersection:

if R1 = [l1 : T1, l2 : T2] and R2 = [l2 : T2, l3 : T3]
then MaxSuper(R1, R2) = [l2 : T2]

We will explore the usefulness of this new operation in terms of RT lattices in sec. 4.

2 Here we concern ourselves with simple examples that avoid label-type clashes between two RTs (i.e. where R1 contains [l1 : T1] and R2 contains [l1 : T2]); in these cases the operations are more complex than field concatenation/sharing.

3.1 Probabilistic TTR

We follow Cooper et al. (2014)'s recent extension of TTR to include probabilistic type judgements of the form p(s : R) = v, where v ∈ [0,1], i.e. the real-valued judgement that a record s is of RT R. Here we use probabilistic TTR to model a common psycholinguistic experimental set-up in section 5. We repeat some of Cooper et al.'s calculations here for exposition, but demonstrate efficient graphical methods for generating and incrementally retrieving probabilities in section 4.

Cooper et al. (2014) define the probability of the meet and join types of two RTs as follows:

p(s : R1 ∧ R2) = p(s : R1) p(s : R2 | s : R1)
p(s : R1 ∨ R2) = p(s : R1) + p(s : R2) − p(s : R1 ∧ R2)    (1)

It is practically useful, as we will describe below, that the join probability can be computed in terms of the meet. Also, there are equivalences between meets, joins and subtypes in terms of type judgements as described above: assuming that if R1 ⊑ R2 then p(s : R2 | s : R1) = 1, we have

if R1 ⊑ R2:
p(s : R1 ∧ R2) = p(s : R1)
p(s : R1 ∨ R2) = p(s : R2)
p(s : R1) ≤ p(s : R2)    (2)

The conditional probability of a record being of type R2, given it is of type R1, is:

p(s : R2 | s : R1) = p(s : R1 ∧ R2) / p(s : R1)    (3)

We return to an explanation for these classical probability equations holding within probabilistic TTR in section 4.

Learning and storing probabilistic judgements. When dealing with referring expression games, or indeed any language game, we need a way of storing perceptual experience. In probabilistic TTR this can be achieved by positing a judgement set J in which an agent stores probabilistic type judgements.3 We refer to the sum of the values of probabilistic judgements that a situation has been judged to be of type Ri within J as ||Ri||J, and to the sum of all probabilistic judgements in J simply as P(J); thus the prior probability that anything is of type Ri under the set of judgements J is ||Ri||J / P(J). The conditional probability p(s : R1 | s : R2) under J can be reformulated in terms of these sets of judgements:

pJ(s : R1 | s : R2) = ||R1 ∧ R2||J / ||R2||J  iff ||R2||J ≠ 0, and 0 otherwise    (4)

where the sample spaces ||R1 ∧ R2||J and ||R2||J constitute the observations of the agent so far. J can have new judgements added to it during learning. We return to this after introducing the incremental semantics needed to interface therewith.

3 (Cooper et al., 2014) characterise a type judgement as an Austinian proposition that a situation is of a given type with a given probability, encoded in a TTR record.

3.2 DS-TTR and the DyLan dialogue system

In order to permit type-theoretic inference in a dialogue system, we need to provide suitable TTR representations for utterances and the current pragmatic situation from a parser, dialogue manager and generator, as instantaneously and accurately as possible. For this purpose we use an incremental framework, DS-TTR (Eshghi et al., 2013; Purver et al., 2011), which integrates TTR representations with the inherently incremental grammar formalism Dynamic Syntax (DS) (Kempson et al., 2001).

Following (Eshghi et al., 2013), DS-TTR tree nodes include a field head in all RTs, which corresponds to the DS tree node type. We also assume a neo-Davidsonian representation of predicates, with fields corresponding to an event term and to each semantic role; this allows all available semantic information to be specified incrementally in a strict subtyping relation (e.g. providing the subj() field when the subject but not the object has been parsed) — see Figure 2. Within a given parse path, due to DS-TTR's monotonicity, each maximal RT of the tree's root node is a subtype of the parser's previous maximal output.

We implement DS-TTR parsing and generation mechanisms in the DyLan dialogue system5 within Jindigo (Skantze and Hjalmarsson, 2010), a Java-based implementation of the incremental unit (IU) framework of (Schlangen and Skantze, 2009).
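Before turning to the incremental framework itself, the record-type operations and the judgement-set probability of equation (4) in section 3.1 can be made concrete. The following is a toy sketch of our own (not the papers' or DyLan's implementation): simple non-dependent RTs are encoded as label-to-type dictionaries, and a judgement set J as a list of (situation, type, probability) triples; all function names are ours.

```python
# Toy encoding (ours, not the papers' implementation): simple non-dependent
# RTs as label -> type dicts; a judgement set J as (situation, type,
# probability) triples.

def subtype(r1, r2):
    # R1 ⊑ R2: R1 contains every field [l : T] of R2 (field types compared
    # by equality, since nested subtyping of field types is not modelled).
    return all(r1.get(l) == t for l, t in r2.items())

def meet(r1, r2):
    # R1 ∧ R2 (MaxSub): field union, assuming no label-type clashes.
    return {**r1, **r2}

def max_super(r1, r2):
    # Maximally common simple supertype (MaxSuper): field intersection.
    return {l: t for l, t in r1.items() if r2.get(l) == t}

def mass(J, r):
    # ||R||_J: summed probability of judgements whose type entails R.
    return sum(p for _s, rt, p in J if subtype(rt, r))

def cond(J, r1, r2):
    # Equation (4): p_J(s : R1 | s : R2) = ||R1 ∧ R2||_J / ||R2||_J
    m2 = mass(J, r2)
    return mass(J, meet(r1, r2)) / m2 if m2 else 0.0
```

With three uniformly judged game situations (a purple square, a yellow square and a yellow circle), `cond` returns 0.5 for the yellow square given only a "yellow object" judgement, mirroring the worked example in section 5.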
DS produces an incrementally specified, partial logical tree as words are parsed/generated; following Purver et al. (2011), DS tree nodes are decorated not with simple atomic formulae but with RTs, and with corresponding lambda abstracts representing functions of type RT → RT (e.g. λr : [l1 : T1].[l2=r.l1 : T1], where r.l1 is a path expression referring to the label l1 in r) — see Figure 2. Using the idea of manifestness of fields as mentioned above, we have a natural representation for underspecification of leaf node content: e.g. [x : e] is unmanifest, whereas [x=john : e] is manifest,4 and the latter is a subtype of the former. Functional application can apply incrementally, allowing an RT at the root node to be compiled for any partial tree, which is incrementally further specified as parsing proceeds (Hough and Purver, 2012).

[Figure 2: DS-TTR tree. The root node ◊, of type Ty(t), carries the RT [x=john : e, e=arrive : es, p=subj(e,x) : t, head=p : t]; the Ty(e) subject node carries [x=john : e, head=x : e]; and the Ty(e → t) function node carries λr : [head : e].[x=r.head : e, e=arrive : es, p=subj(e,x) : t, head=p : t].]

4 This is syntactic sugar for [x : e_john], and the = sign is not the same semantically as that in a record.

5 Available from http://dylan.sourceforge.net/

In this framework, each module has input and output IUs which can be added as edges between vertices in module buffer graphs, and become committed should the appropriate conditions be fulfilled — a notion which becomes important in light of hypothesis change and repair situations. Dependency relations between different graphs within and between modules can be specified by groundedIn links (see (Schlangen and Skantze, 2009) for details).

The DyLan interpreter module (Purver et al., 2011) uses Sato (2011)'s insight that the context of DS parsing can be characterized in terms of a Directed Acyclic Graph (DAG), with trees for nodes and DS actions for edges. The module's state is characterized by three linked graphs, as shown in Figure 3:

• input: a time-linear word graph posted by the ASR module, consisting of word hypothesis edge IUs between vertices Wn

• processing: the internal DS parsing DAG, which adds parse state edge IUs between vertices Sn, groundedIn the corresponding word hypothesis edge IU

• output: a concept graph consisting of domain concept IUs (RTs) as edges between vertices Cn, groundedIn the corresponding path in the DS parsing DAG

[Figure 3: Normal incremental parsing in DyLan]

Here, our interest is principally in the parser output, to support incremental inference; a DS-TTR generator is also included, which uses RTs as goal concepts (Hough and Purver, 2012) and uses the same parse graph as the interpreter to allow self-monitoring and compound contributions, but we omit the details here.

4 Order theoretic and graphical methods for probabilistic TTR

RT lattices to encode domain knowledge.

[Figure 4: Record type lattice ordered by the subtype relation: ⊥ = R1 = [a : b, c : d, e : f]; above it R10 = [a : b, c : d], R11 = [a : b, e : f], R12 = [c : d, e : f]; above these R120 = [a : b], R121 = [c : d], R110 = [e : f]; and ⊤ = R1200 = [].]

In an RT lattice such as that of Figure 4, the unique least upper bound of two elements is their join, which in TTR terms is MaxSuper(Rx, Ry), but not necessarily their join type Rx ∨ Ry, as here we concern ourselves with simple RTs. One element covers another if it is a direct successor to it in the subtype ordering relation hierarchy. The lattice has a greatest element (⊤) and least element (⊥), with the atoms being the elements that cover ⊥; in Figure 4, if R1 is viewed as ⊥, the atoms are R10, R11 and R12.
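The graphical search just described can be sketched programmatically. In this illustration of ours (not DyLan's code; names such as `up_cone` are invented), simple RTs are frozensets of (label, type) pairs, supertypes are reached by removing fields, and the join is found by walking cover edges upward until the two upward cones first converge — which for simple RTs equals field intersection (MaxSuper):

```python
# Our own illustration (not DyLan's code): simple RTs as frozensets of
# (label, type) pairs; a cover edge removes exactly one field.

def up_cone(rt):
    # All supertypes of rt, found by walking cover edges upward.
    frontier, seen = {rt}, {rt}
    while frontier:
        parents = {node - {f} for node in frontier for f in node}
        frontier = parents - seen
        seen |= frontier
    return seen

def graphical_join(r1, r2):
    # Follow edges upward until the cones converge: the largest-field
    # common supertype, which is unique for simple RTs.
    common = up_cone(r1) & up_cone(r2)
    return max(common, key=len)

# Figure 4's lattice, with bottom R1 = [a : b, c : d, e : f]:
R10 = frozenset({("a", "b"), ("c", "d")})
R12 = frozenset({("c", "d"), ("e", "f")})
R121 = graphical_join(R10, R12)  # the one-field RT [c : d]
```

The meet is found dually by following edges downward until they connect; for simple RTs this is just field union, so `R10 | R12` gives the three-field bottom element.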
To support efficient inference in DyLan, we represent dialogue domain concepts via partially ordered sets (posets) of RT judgements, following similar insights used in inducing DS-TTR actions (Eshghi et al., 2013). A poset has several advantages over an unordered list of un-decomposed record types: the possibility of incremental type-checking; increased speed of type-checking, particularly for pairs of (or multiple) type judgements; immediate use of type judgements to guide system decisions; inference from negation; and the inclusion of learning within a domain. We leave the final challenge for future work, but discuss the others here.

We can construct a poset of type judgements for any single RT by decomposing it into its constituent supertype judgements in a record type lattice. Representationally, as per set-theoretic lattices, this can be visualised as a Hasse diagram such as Figure 4; however, here the ordering arrows show ⊑ ('subtype of') relations from descendant to ancestor nodes.

To characterize an RT lattice G ordered by ⊑, we adapt Knuth (2005)'s description of lattices in line with standard order theory: for a pair of RT elements Rx and Ry, their lower bound is the set of all Rz ∈ G such that Rz ⊑ Rx and Rz ⊑ Ry. In the event that a unique greatest lower bound exists, this is their meet, which in G happily corresponds to the TTR meet type Rx ∧ Ry. Dually, if their unique least upper bound exists, this is their join. An RT element Rx has a complement if there is a unique element ¬Rx such that MaxSuper(Rx, ¬Rx) = ⊤ and Rx ∧ ¬Rx = ⊥ (the lattice in Figure 4 is complemented, as this holds for every element).

Graphically, the join of two elements can be found by following the connecting edges upward until they first converge on a single RT, giving us MaxSuper(R10, R12) = R121 in Figure 4; and the meet can be found by following the lines downward until they connect, giving their meet type, i.e. R10 ∧ R12 = R1.

If we consider R1 to be a domain concept in a dialogue system, we can see how its RT lattice G can be used for incremental inference. As incrementally specified RTs become available from the interpreter, they are matched to those in G to determine how far down towards the final domain concept R1 our current state allows us to be. Different sequences of words/utterances lead to different paths. However, any practical dialogue system must entertain more than one possible domain concept as an outcome; G must therefore contain multiple possible final concepts, constituting its atoms, each with several possible dialogue move sequences, which correspond to possible downward paths — e.g. see the structure of Figure 5. Our aim here is to associate each RT in G with a probabilistic judgement.

Initial lattice construction. We define a simple bottom-up procedure in Algorithm 1 to build an RT lattice G of all possible simple domain RTs and their prior probabilistic judgements, initialised by the disjunction of possible final state judgements (the priors), along with the absurdity ⊥, stipulated a priori as the least element with probability 0 and the meet type of the atomic priors.6 The algorithm recursively removes one field at a time from the RT being processed (except fields referenced in a remaining dependent ptype field), then orders the new supertype RT in G appropriately. Each node in G contains its RT Ri and a sum of probability judgements {||Rk||J + .. + ||Rn||J} corresponding to the probabilities of the priors it stands in a supertype relation to. These sums are propagated up from child to parent node as G is constructed. The algorithm terminates when all simple maximal supertypes have been processed,7 leaving the maximally common supertype as ⊤ (possibly the empty type []), associated with the entire probability mass P(J), which constitutes the denominator of all judgements — given this, only the numerator ||Ri||J of equation (4) needs to be stored at each node.

Algorithm 1: Probabilistic TTR record type lattice construction algorithm
  INPUT: priors  ▷ use the initial prior judgements for G's atoms
  OUTPUT: G
  G = newGraph(priors)  ▷ P(J) set to equal sum of prior probs
  agenda = priors  ▷ initialise agenda
  while not agenda is empty do
    RT = agenda.pop()
    for field ∈ RT do
      if field ∈ RT.paths then  ▷ do not remove bound fields
        continue
      superRT = RT − field
      if superRT ∈ G then  ▷ not new: order w.r.t. RT and inherit RT's priors
        G.order(RT.address, G.getNode(superRT), ⊑)
      else  ▷ new: create new node with empty priors
        superNode = G.newNode(superRT)
        for node ∈ G do  ▷ order superNode w.r.t. other nodes in G
          if superRT.fields ⊂ node.fields then
            G.order(node, superNode, ⊑)  ▷ superNode inherits node's priors
        agenda.append(superRT)  ▷ add to agenda for further supertyping

6 Although the priors' disjunctive probability sums to 1 after G is constructed, i.e. in Figure 5 (||R1||J + ||R2||J + ||R3||J) / P(J) = 1, the real values initially assigned to them need not sum to unity, as they form the atoms of G (see (Knuth, 2005)).

Direct inference from the lattice. To explain how our approach models incremental inference, we assume Brennan and Schober (2001)'s experimental referring game domain described in section 2: three distinct domain situation RTs, R1, R2 and R3, correspond to a purple square, a yellow square and a yellow circle, respectively. The RT lattice G constructed initially upon observation of the game (by instructor or instructee), shown in Figure 5, uses a uniform distribution for the three disjunctive final situations. Each node shows an RT Ri on the left and, on the right, the derivation of its prior probability pJ(Ri) — that any game situation record will be of type Ri — purely in terms of the relevant priors and the global denominator P(J).

G can be searched to make inferences in light of partial information from an ongoing utterance. We model inference as predicting the likelihood of relevant type judgements s : Ry, for Ry ∈ G, of a situation s, given the judgement s : Rx we have so far. To do this we use conditional probability judgements following Knuth's work on distributive lattices, using the ⊑ relation to give a choice function:

pJ(s : Ry | s : Rx) =
  1 if Rx ⊑ Ry
  0 if Rx ∧ Ry = ⊥
  p otherwise, where 0 ≤ p ≤ 1    (5)

The third case is the degree of inclusion of Ry in Rx, and can be calculated using the conditional probability calculation (4) in sec. 3. For negative RTs, a lattice generated from Algorithm 1 will be distributive but not guaranteed to be complemented; however, we can still derive pJ(s : Ry | s : ¬Rx) by obtaining pJ(s : Ry) in G modulo the probability mass of Rx and that of its subtypes:

pJ(s : Ry | s : ¬Rx) =
  0 if Ry ⊑ Rx
  (pJ(s : Ry) − pJ(s : Rx ∧ Ry)) / (pJ(s : ⊤) − pJ(s : Rx)) otherwise    (6)

The subtype relations, and the probabilities of atomic, join and meet types required for (1)–(6), can be calculated efficiently through graphical search algorithms by characterising G as a DAG: the reverse direction of the subtype ordering edges can be viewed as reachability edges, making ⊤ the source and ⊥ the sink. With this characterisation, if Rx is reachable from Ry then Rx ⊑ Ry.

In DAG terms, the probability of the meet of two RTs Rx and Ry can be found at their highest common descendant node — e.g. pJ(R4 ∧ R5) in Figure 5 can be found directly at R1. Note that if Rx is reachable from Ry, i.e. Rx ⊑ Ry, then due to the equivalences listed in (2), pJ(Rx ∧ Ry) can be found directly at Rx. If the meet of two nodes is ⊥ (e.g. R4 and R3 in Figure 5), then their meet probability is 0, as pJ(⊥) = 0. While the lattice does not have direct access to the join types of its elements, a join type probability pJ(Rx ∨ Ry) can be calculated in terms of the meet, via the join equation in (1).
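In the spirit of Algorithm 1, a lattice of field-subset supertypes with child-to-parent propagated prior masses can be sketched as follows. This is a simplification of our own (simple RTs as frozensets of (label, type) fields, no dependent-field handling), not the system's implementation:

```python
from itertools import combinations

# Bottom-up lattice construction in the spirit of Algorithm 1 (our own
# simplification, no dependent-field handling). Node masses are the
# summed priors ||Ri||_J of the atoms each node stands above.
def build_lattice(priors):
    mass = {}
    for atom, p in priors.items():
        fields = sorted(atom)
        for k in range(len(fields) + 1):
            for subset in combinations(fields, k):
                node = frozenset(subset)
                # each supertype node accumulates the priors of the
                # atoms below it (child-to-parent propagation)
                mass[node] = mass.get(node, 0.0) + p
    return mass  # mass[frozenset()] is the top node's mass, i.e. P(J)

def cond(mass, ry, rx):
    # Choice-function-style lookup per (5)/(4): the meet is the field
    # union; a meet absent from the lattice is the absurdity, probability 0.
    meet = ry | rx
    return mass.get(meet, 0.0) / mass[rx] if mass.get(rx) else 0.0
```

With the three uniform priors of Figure 5, conditioning on the "yellow object" node reproduces pJ(s : R1 | s : R6) = 0 and pJ(s : R2 | s : R6) = pJ(s : R3 | s : R6) = 0.5.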
of p R ( ∧ R by) the join equation in (1), 7 J x y Note that it does not generate the join types but maximal which( holds for) all probabilistic distributive lat- common supertypes defined by field intersection. 85 R + R + R R = x ∶ind 1 J 2 J 3 J = ⊤ = 1 PRIORS: 8 ∥ ∥ ∥ P J∥ ∥ ∥ = 1 ( ) R1 J 3 ∥ ∥ = 1 R2 J 3 ∥ ∥ 1 R3 J = ∥ ∥ 3 ∶ + ∶ ∶ + ∶ = x ind R1 J R2 J = x ind R1 J = x ind R2 J R3 J = x ind R3 J R4 ∶ ∥ ∥ ∥ ∥ R5 ∶ ∥ ∥ R6 ∶ ∥ ∥ ∥ ∥ R7 ∶ ∥ ∥ shpsq square x P J colp purple x P J coly yellow x P J shpc circle x P J ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) x ∶ ind x ∶ ind x ∶ ind = ⎡ ∶ ⎤ R1 J = ⎡ ∶ ⎤ R2 J = ⎡ ∶ ⎤ R3 J R1 ⎢ colp purple x ⎥ ∥ ∥ R2 ⎢ coly yellow x ⎥ ∥ ∥ R3 ⎢ coly yellow x ⎥ ∥ ∥ ⎢ ∶ ( ) ⎥ P J ⎢ ∶ ( ) ⎥ P J ⎢ ∶ ( ) ⎥ P J ⎢ shpsq square x ⎥ ( ) ⎢ shpsq square x ⎥ ( ) ⎢ shpc circle x ⎥ ( ) ⎣⎢ ( ) ⎦⎥ ⎣⎢ ( ) ⎦⎥ ⎣⎢ ( ) ⎦⎥ R0 = ⊥ = 0 Figure 5: Record type lattice with initial uniform prior probablities tices (Knuth, 2005).8 As regards efficiency, worst `yell-', the best partial word hypothesis is now case complexity for finding the meet probability at “yellow”;10 the interpreter therefore outputs an RT the common descendant of Rx and Ry is a linear which matches the type judgement s ∶ R6 (i.e. that O m + n where m and n are the number of edges the object is a yellow object). Taking this judge- ( ) in the downward (possibly forked) paths Rx → ⊥ ment as the conditioning evidence using function 9 and Ry → ⊥. (5) we get pJ s ∶ R1 s ∶ R6 = 0 and us- ( ∣ ) ing (4) we get pJ s ∶ R2 s ∶ R6 = 0.5 and 5 Simulating incremental inference and ( ∣ ) pJ s ∶ R3 s ∶ R6 = 0.5 (see the schematic self-repair processing probability( distribution∣ ) at stage T2 in Figure 6 for Interpretation in DyLan and its interface to the the three objects). The meet type probabilities RT lattice G follows evidence that dialogue agents required for the conditional probabilities can be parse self-repairs efficiently and that repaired di- found graphically as described above. 
alogue content (reparanda) is given special sta- At T3:`uh purple', low probability in the in- tus but not removed from the discourse context. terpreter output causes a self-repair to be recog- To model Brennan and Schober (2001)’s findings nised, enforcing backtracking on the parse graph of disfluent spoken instructions speeding up ob- which informally operates as follows (see Hough ject recognition (see section 2), we demonstrate and Purver (2012)) : a self-repair parse in Figure 6 for “The yell-, uh, Self-repair: purple square” in the simple game of predicting IF from parsing word W the edge SEn is in- the final situation from R1,R2,R3 continuously sufficiently likely to be constructed from ver- given the type judgements{ made so} far. We de- tex Sn OR IF there is no sufficiently likely scribe the stages T1-T4 in terms of the current judgement p s ∶ Rx for Rx ∈ G word being processed- see Figure 6: ( ) THEN parse word W from vertex Sn−1. IF At T1:`the' the interpreter will not yield a sub- successful add a new edge to the top path, type checkable in G so we can only condition on without removing any committed edges be- 1 R8 (⊤), giving us pJ s ∶ Ri s ∶ R8 = for 3 ginning at Sn−1; ELSE set n=n−1 and repeat. i ∈ 1, 2, 3 , equivalent( to the∣ priors.) At T2: { } This algorithm is consistent with a local model 8The search for the meet probability is generalisable to conjunctive types by searching for the conjuncts’ highest for self-repair backtracking found in corpora common descendant. The join probability is generalisable to (Shriberg and Stolcke, 1998; Hough and Purver, the disjunctive probability of multiple types, used, albeit pro- 2013). As regards inference in G, upon detection gramatically, in Algorithm 1 for calculating a node’s proba- bility from its child nodes. of a self-repair that revokes s ∶ R6, the type judge- 9 While we do not give details here, simple graphical ment s ∶ ¬R6, i.e. 
that this is not a yellow object, search algorithms for conjunctive and disjunctive multiple 10 types are linear in the number of conjuncts and disjuncts, sav- In practice, ASR modules yielding partial results are less ing considerable time in comparison to the algebraic calcula- reliable than their non-incremental counterparts, but progress tions of the sum and product rules for distributive lattices. is being made here (Schlangen and Skantze, 2009). 86 Figure 6: Incremental DS-TTR self-repair parsing. Inter-graph groundedIn links go top to bottom. is immediately available as conditioning evidence. Learning in a dialogue While not our focus Using (6) our distribution of RT judgements now here, lattice G’s probabilities can be updated shifts: pJ s ∶ R1 s ∶ ¬R6 = 1, pJ s ∶ R2 through observations after its initial construction. ( ∣ ) ( ∣ s ∶ ¬R6 = 0 and pJ s ∶ R3 s ∶ ¬R6 = 0 be- If a reference game is played over several rounds, fore “purple”) has been( parsed∣ – thus providing) a the choice of referring expression can change probabilistic explanation for increased subsequent based on mutually salient functions from words processing speed. Finally at T4: `square' given to situations- see e.g. (DeVault and Stone, 2009). pJ s ∶ R1 s ∶ R1 = 1 and R1∧R2 = R1∧R3 = ⊥, Our currently frequentist approach to learning is: ( ∣ ) the distribution remains unchanged. given an observation of an existing RT Ri is made The system’s processing models how listen- with probability v, then Ri J , the overall denom- ers reason about the revocation itself rather than inator P J , and the nodes∥ ∥ in the upward path ( )⊤ predicting the outcome through positive evidence from Ri to are incremented by v. The approach alone, in line with (Brennan and Schober, 2001)’s could be converted to Bayesian update learning by results. using the prior probabilities in G for calculating v before it is added. 
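The frequentist update just described (increment ∥Ri∥J, every node on the upward path from Ri to ⊤, and the denominator P(J) by v) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the `RTLattice` class, its field names and the use of the R1-R8 labels from Figure 5 are assumptions made for the example.

```python
# Illustrative sketch (not the authors' code) of the frequentist update on a
# probabilistic record-type lattice G, treated as a DAG. Each node stores only
# the numerator ||R||_J; P(J) is the shared denominator.

class RTLattice:
    def __init__(self):
        self.mass = {}    # node -> ||R||_J (numerator)
        self.up = {}      # node -> immediate supertypes (subtype ordering edges)
        self.total = 0.0  # P(J), the shared denominator

    def add_node(self, name, supertypes=()):
        self.mass[name] = 0.0
        self.up[name] = set(supertypes)

    def ancestors(self, name):
        """All supertypes reachable upward from `name` (excluding itself)."""
        seen, stack = set(), list(self.up[name])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.up[n])
        return seen

    def prob(self, name):
        """p_J(R) = ||R||_J / P(J)."""
        return self.mass[name] / self.total

    def observe(self, name, v=1.0):
        """Increment ||R||_J, every node on the upward path to the top,
        and the denominator P(J) by the observation mass v."""
        self.mass[name] += v
        for anc in self.ancestors(name):
            self.mass[anc] += v
        self.total += v

# The referring-game domain of Figure 5: atoms R1-R3 under the supertypes
# R4-R7 (square, purple, yellow, circle) and the top element R8.
G = RTLattice()
G.add_node("R8")                           # top: [x : ind]
for sup in ("R4", "R5", "R6", "R7"):
    G.add_node(sup, supertypes=("R8",))
G.add_node("R1", supertypes=("R4", "R5"))  # purple square
G.add_node("R2", supertypes=("R4", "R6"))  # yellow square
G.add_node("R3", supertypes=("R6", "R7"))  # yellow circle
for atom in ("R1", "R2", "R3"):
    G.observe(atom, 1.0)                   # one observation per situation
```

After these three equal observations the distribution is uniform: `G.prob("R1")` is 1/3, the square supertype `G.prob("R4")` is 2/3, and `G.prob("R8")` is 1, matching the priors of Figure 5.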
Furthermore, observations can 6 Extensions be added to G that include novel RTs: due to the Dialogue and self-repair in the wild To move DAG structure of G, their subtype ordering and towards domain-generality, generating the lattice probability effects can be integrated efficiently. of all possible dialogue situations for interesting 7 Conclusion domains is computationally intractable. We in- tend instead to consider incrementally occurring We have discussed efficient methods for construct- issues that can be modelled as questions (Lars- ing probabilistic TTR domain concept lattices or- son, 2002). Given one or more issues manifest in dered by the subtype relation and their use in the dialogue at any time, it is plausible to gener- incremental dialogue frameworks, demonstrating ate small lattices dynamically to estimate possible their efficacy for realistic self-repair processing. answers, and also assign a real-valued relevance We wish to explore inclusion of join types, the measure to questions that can be asked to resolve scalability of RT lattices to other domains and the issues. We are exploring how this could be their learning capacity in future work. implemented using the inquiry calculus (Knuth, 2005), which defines information theoretic rele- Acknowledgements vance in terms of a probabilistic question lattice, We thank the two TTNLS reviewers for their com- and furthermore how this could be used to model ments. Purver is supported in part by the European the cause of self-repair as a time critical trade-off Community’s Seventh Framework Programme un- between relevance and accuracy. der grant agreement no 611733 (ConCreTe). 87 References J. Hough and M. Purver. 2013. Modelling expectation in the self-repair processing of annotat-, um, listen- G. Betarte and A. Tasistro. 1998. Extension of Martin- ers. In Proceedings of the 17th SemDial Workshop L¨of type theory with record types and subtyping. In on the Semantics and Pragmatics of Dialogue (Di- G. 
Sambin and J. Smith, editors, 25 Years of Con- alDam). structive Type Theory. Oxford University Press. R. Kempson, W. Meyer-Viol, and D. Gabbay. 2001. S. Brennan and M. Schober. 2001. How listeners Dynamic Syntax: The Flow of Language Under- compensate for disfluencies in spontaneous speech. standing. Blackwell. Journal of Memory and Language, 44(2):274–296. K.H. Knuth. 2005. Latticeduality: Theoriginof prob- R. Cooper, S. Dobnik, S. Lappin, and S. Larsson. 2014. ability and entropy. Neurocomputing, 67:245–274. A probabilistic rich type theory for semantic inter- pretation. In Proceedings of the EACL Workshop E. Krahmer and K. Van Deemter. 2012. Computa- on Type Theory and Natural Language Semantics tional generation of referring expressions: A survey. (TTNLS). Computational Linguistics, 38(1):173–218. R. Cooper. 2005. Records and record types in se- S. Larsson. 2002. Issue-based Dialogue Management. mantic theory. Journal of Logic and Computation, Ph.D. thesis, G¨oteborg University. Also published 15(2):99–112. as Gothenburg Monographs in Linguistics 21. R. Cooper. 2012. Type theory and semantics in flux. S. Larsson. 2010. Accommodating innovative mean- In R. Kempson, N. Asher, and T. Fernando, edi- ing in dialogue. Proc. of Londial, SemDial Work- tors, Handbook of the Philosophy of Science, vol- shop, pages 83–90. ume 14: Philosophy of Linguistics, pages 271–323. S. Larsson. 2011. The TTR perceptron: Dynamic North Holland. perceptual meanings and semantic coordination. In R. Dale and E. Reiter. 1995. Computational interpreta- Proceedings of the 15th Workshop on the Semantics tions of the gricean maxims in the generation of re- and Pragmatics of Dialogue (SemDial 2011 - Los ferring expressions. Cognitive Science, 19(2):233– Angelogue). 263. W. Levelt. 1989. Speaking: From Intention to Articu- lation. MIT Press. D. DeVault and M. Stone. 2009. Learning to interpret utterances using dialogue history. In Proceedings of R. Montague. 1974. 
Formal Philosophy: Selected Pa- the 12th Conference of the European Chapter of the pers of Richard Montague. Yale University Press. Association for Computational Linguistics (EACL). M. Purver, A. Eshghi, and J. Hough. 2011. Incremen- S. Dobnik, R. Cooper, and S. Larsson. 2012. Mod- tal semantic construction in a dialogue system. In elling language, action, and perception in type the- J. Bos and S. Pulman, editors, Proceedings of the ory with records. In Proceedings of the 7th Inter- 9th International Conference on Computational Se- national Workshop on Constraint Solving and Lan- mantics. guage Processing (CSLP12). Y. Sato. 2011. Local ambiguity, search strate- A. Eshghi, J. Hough, and M. Purver. 2013. Incre- gies and parsing in Dynamic Syntax. In E. Gre- mental grammar induction from child-directed di- goromichelaki, R. Kempson, and C. Howes, editors, alogue utterances. In Proceedings of the 4th An- The Dynamics of Lexical Interfaces. CSLI Publica- nual Workshop on Cognitive Modeling and Compu- tions. tational Linguistics (CMCL). D. Schlangen and G. Skantze. 2009. A general, ab- R. Fern´andez. 2006. Non-Sentential Utterances in Di- stract model of incremental dialogue processing. In alogue: Classification, Resolution and Use. Ph.D. Proceedings of the 12th Conference of the European thesis, King’s College London, University of Lon- Chapter of the ACL (EACL 2009). don. E. Shriberg and A. Stolcke. 1998. How far do speakers M. C. Frank and N. D. Goodman. 2012. Predicting back up in repairs? A quantitative model. In Pro- pragmatic reasoning in language games. Science, ceedings of the International Conference on Spoken 336(6084):998–998. Language Processing. J. Ginzburg. 2012. The Interactive Stance: Meaning G. Skantze and A. Hjalmarsson. 2010. Towards incre- for Conversation. Oxford University Press. mental speech generation in dialogue systems. In Proceedings of the SIGDIAL 2010 Conference. J. Hough and M. Purver. 2012. 
Processing self-repairs in an incremental type-theoretic dialogue system. In J. Williams and S. Young. 2007. Scaling POMDPs Proceedings of the 16th SemDial Workshop on the for spoken dialog management. IEEE Transac- Semantics and Pragmatics of Dialogue (SeineDial). tions on Audio, Speech, and Language Processing, 15(7):2116–2129. 88 Propositions, Questions, and Adjectives: a rich type theoretic approach Jonathan Ginzburg CLILLAC-ARP & Laboratoire d’Excellence (LabEx)–Empirical Foundations of Linguistics Universite´ Paris-Diderot, Sorbonne Paris-Cite´ [email protected] Robin Cooper Tim Fernando University of Gothenburg Trinity College, Dublin [email protected] [email protected] Abstract 1. Recently there have been a number of pro- posals that questions and propositions are of a We consider how to develop types cor- single ontological category (see (Nelken and responding to propositions and questions. Francez, 2002; Nelken and Shan, 2006)) and Starting with the conception of Proposi- most influentially work in Inquisitive Seman- tions as Types, we consider two empirical tics (IS) (Groenendijk and Roelofsen, 2009). challenges for this doctrine. The first re- A significant argument for this is examples lates to the putative need for a single type like (1), where propositions and questions encompassing questions and propositions can apparently be combined by boolean con- in order to deal with Boolean operations. nectives. The second relates to adjectival modifica- tion of question and propositional entities. (1) If Kim is not available, who should We partly defuse the Boolean challenge we ask to give the talk? by showing that the data actually argue against a single type covering questions In Inquisitive Semantics, such data are han- and propositions. We show that by ana- dled by postulating a common type for ques- lyzing both propositions and questions as tions and propositions as sets of sets of records within Type Theory with Records worlds. 
It is not a priori clear how propo- (TTR), we can define Boolean operations sitions as types can account for such cases. over these distinct semantic types. We ac- 2. Adjectives pose a challenge to all existing count for the adjectival challenge by em- theories of questions and propositions, pos- bedding the record types defined to deal sible worlds based (e.g., (Karttunen, 1977; with Boolean operations within a theory of Groenendijk and Stokhof, 1997; Groenendijk semantic frames formulated within TTR. and Roelofsen, 2009), or type theoretic, as in 1 Introduction Type Theory with Records (TTR, (Cooper, 2012; Ginzburg, 2012)). There is nothing Propositions as types has long been viewed as a in the semantic entity associated with a po- sine qua non of many a type theoretic approach to lar question as in (2), be it a two cell parti- semantics (see e.g., the seminal work of (Ranta, tion (as in partition semantics) or a constant 1994)). Although this has lead to a variety of function from records into propositions (as in very elegant formal accounts, one can question its Ginzburg 2012) that will allow it to distin- appropriateness as a type for NL propositions— guish difficult from easy questions. Similarly, the denotata of declaratives and of nouns such as since the denotation of a question is not con- ‘claim’ and the objects of assertion. One imme- ceived as an event, this denotation is not ap- diate issue concerns semantic selection—how to propriate for the adjective quick: specify the semantic types of predicates such as ‘believe’ and ‘assert’ so that they will not select (2) A: I have a quick question: is every for e.g., the type of biscuits or the type of natural number above 2 the sum of two numbers, given their inappropriateness as objects primes? of belief or assertion. However, one resolves this issue, we point to two other significant challenges: B: That’s a difficult question. 
89 Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 89–96, Gothenburg, Sweden, April 26-30 2014. c 2014 Association for Computational Linguistics And yet, these two distinct classes of adjec- (6) a. Bo asked/investigated/wondered/# tives can simultaneously apply to a question believed /# claimed who came yesterday. together with ‘resolved’, a target of all exist- b. Bo # asked/# investigated/# wondered/ ing theories of questions, as in (3), calling for believed /claimed that Mary came yester- a unified notion of question: day. (3) The quick question you just posed We argue that although speech acts involving is difficult and for the moment unre- questions and propositions can be combined by solved. boolean connectives they are not closed under ‘Difficult’ and ‘silly’ apply to both proposi- boolean operations. Furthermore, we argue that tional and question entities, suggesting the the propositions and questions qua semantic ob- need for a unified meaning for the adjective jects cannot be combined by boolean operations and a means of specifying its selection so at all. This, together with the examples above, that it can modify both questions and propo- strongly suggests that questions and propositions sitions: are distinct types of semantic objects. We use embedding under attitude verbs as a test (4) a. silly claim (a claim silly to assert) for propositions and questions as semantic objects. b. silly question (a question silly to Here we do not find mixed boolean combinations ask); of questions and propositions. Thus, for exam- c. 
difficult claim (a claim difficult to ple, wonder selects for an embedded question and prove) believe for an embedded proposition but a mixed conjunction does not work with either, showing In this paper we partly defuse the Boolean that it is neither a question nor a proposition: challenge by showing that the data actually ar- gue against a single type covering questions and (7) The manager *wonders/*believes that propositions. We show that by analyzing both several people left and what rooms we propositions and questions as records within TTR, need to clean. we can define Boolean operations over these dis- The verb know is compatible with both tinct semantic types. We then propose to deal interrogative and declarative complements, with the adjectival challenge by embedding the though(Vendler, 1972; Ginzburg and Sag, 2000) types initially defined within a theory of semantic argue that such predicates do not take questions or frames (Fillmore, 1985; Pustejovsky, 1995) for- propositions as genuine arguments (i.e. not purely mulated within TTR. referentially), but involve coercions which leads to a predication of a fact. The well formedness 2 Questions and Propositions: a unified of these coercion processes require that sentences semantic type? involving decl/int conjunctions such as (8) can Although there has been a recent trend to assume only be understood where the verb is distributed a commonality of type for questions and propo- over the two conjuncts: “knows that John’s smart sitions, both Hamblin and Karttunen gave argu- and knows what qualifications he has”: ments for distinguishing questions as an ontologi- (8) The manager knows that John’s smart and cal category from propositions—(Hamblin, 1958) what qualifications he has. 
pointing out that interrogatives lack truth values; to which one can add their incompatibility with a Compare (9a,b)—in the second mixed case wider scoping alethic modality: there is only a reading which entails that it is sur- prising the conference was held at the usual time (5) a. It’s true/false who came yesterday whereas arguably in the first sentence only the conjunction but not the individual conjuncts need b. # Necessarily, who will leave tomorrow? be surprising. Whereas (Karttunen, 1977) pointed to the exis- (9) a. It’s surprising that the conference was tence of predicates that select interrogatives, but held at the usual time and so few people not for declaratives and vice versa: registered. 90 b. It’s surprising that the conference was However we treat these facts, it seems clear that held at the usual time and how few peo- it would be dangerous to collapse questions and ple registered. propositions into the same type of semantic object and allow general application of semantic boolean Embedded conditional questions are impossible operators. This would seem to force you into a sit- although, of course, embedded questions contain- uation where you have to predict acceptability of ing conditionals are fine: these sentences purely on the basis of a theory of syntax, although semantically/pragmatically they (10) *The manager wonders if Hollande left, would have made perfect sense. It seems to us that whether we need to clean the west wing. distinguishing between questions and propositions and combinations of speech acts offers a more ex- a. The manager wonders whether, if Hol- planatory approach. lande left, we need to clean the west wing. Why, then, do apparent mixed boolean com- 3 Austinian Types for Propositions and binations appear in root sentences? 
Our answer Questions is that natural language connectives, in addition 3.1 TTR as synthesizing Constructive Type to their function as logical connectives combin- Theory and Situation Semantics ing propositions, can be used to combine speech The system we sketch is formulated in TTR acts into another single speech act. This, however, (Cooper, 2012). TTR is a framework that draws its can only be expressed in root sentences and speech inspirations from two quite distinct sources. One acts are not closed under operations correspond- source is Constructive Type Theory, whence the ing to boolean connectives. For example in (11a), repertory of type constructors, and in particular where a query follows an assertion is fine whereas records and record types, and the notion of wit- the combination of an assertion with a preceding nessing conditions. The second source is situa- query is not, as in (11b): tion semantics (Barwise and Perry, 1983; Barwise, (11) a. John’s very smart but does he have any 1989) which TTR follows in viewing semantics as qualifications? ontology construction. This is what underlies the emphasis on specifying structures in a model the- b. *Does John have any qualifications oretic way, introducing structured objects for ex- and/but he’s smart plicating properties, propositions, questions etc. It also takes from situation semantics an emphasis on This is puzzling because a discourse corre- partiality as a key feature of information process- sponding to a string of the same separate speech ing. This aspect is exemplified in a key assumption acts works well: of TTR—the witnessing relation between records and record types: the basic relationship between (12) Does John have any qualifications? (no the two is that a record r is of type RT if each answer) But he’s smart. 
value in r assigned to a given label li satisfies the typing constraints imposed by RT on li: Similarly, while we can apparently conditionalize a query with a proposition, we cannot conditional- (14) record witnessing ize an assertion with a question, nor can we condi- tionalize a query with a question: The record: l1 = a1 is of type: (13) a. If Hollande left, do we need to clean the l = a west wing? ( “If Hollande left, I ask you 2 2 ... whether we need to clean the west wing”), l = a n n b. *If whether Hollande left/did Hollande l1 : T1 leave, we need to clean the west wing? l2 : T2(l1) c. *If who left, do we need to clean the west ... wing? ln : Tn(l1, l2, . . . , ln 1) − 91 iff a1 : T1, a2 : T2(a1), . . . , an : propositions are defined in (17b): Tn(a1, a2, . . . , an 1) − (17) a. AustProp =def This allows for cases where there are fields in sit : Rec the record with labels not mentioned in the record sit-type : RecType† type. This is important when e.g., records are used " # to model contexts and record types model rules b. A proposition p = about context change—we do not want to have to sit = s0 predict in advance all information that could be is true iff " sit-type = ST0 # in a context when writing such rules. (15) illus- s0 : ST0 trates this: the record (15a) is of the type (15b), though the former has also a field for FACTS; We introduce negative types by the clause in (15b) constitutes the preconditions for a greeting, (18a). Motivated in part by data concerning nega- where FACTS—the contextual presuppositions— tive perception complements ((Barwise and Perry, has no role to play. 1983; Cooper, 1998), we can characterize wit- nesses for negative types by (18b). (15) a. spkr = A addr = B (18) a. If T is a type then T is a type ¬ utt-time = t1 c1 = p1 b. a : T iff there is some T 0 such that a : T 0 ¬ Moves = and T 0 precludes T . 
We assume the exis- DE tence of a binary, irreflexive and symmet- qud = ric relation of preclusion which satisfies facts =no cg1 also the following specification: T 0 precludes T iff either (i) T = T 0 or, b. spkr : IND ¬ (ii) T,T 0 are non-negative and there is no addr : IND a such that a : T and a : T 0 for any mod- utt-time : TIME els assigning witnesses to basic types and c1 : addressing(spkr,addr,utt-time) p(red)types Moves = : list(LocProp) (19a) and (19b) follow from these two defini- DE tions: qud = : set(Question) no (19) a. a : T iff a : T 3.2 Propositions ¬¬ Our starting point is the situation semantics no- b. a : T T is not necessary (a may not be ∨ ¬ tion of an Austinian proposition (Barwise and of type T and there may not be any type Etchemendy, 1987). (Ginzburg, 2012) introduces which precludes T either). Austinian propositions as records of the form: sit = s Thus this negation is a hybrid of classical and (16) sit-type = T intuitionistic negation in that (19a) normally holds " # for classical negation but not intuitionistic whereas This gives us a type theoretic object correspond- (19b), that is failure of the law of the excluded ing to a judgement. The type of Austinian proposi- middle, normally holds for intuitionistic negation tions is the record type (17a),where the type Rec- but not classical negation. Type† is a basic type which denotes the type of The type of negative (positive) Austinian propo- (non-dependent) record types closed under meet, sitions can be defined as (20a,b), respctively: join and negation.1 Truth conditions for Austinian 1 sit : Rec When we say ‘the type of record types’, this should be (20) a. understood in a relative, not absolute way. That is, this means " sit-type : RecType¬† # the type of record types up to some level of stratification, oth- erwise foundational problems such as russellian paradoxes sit : Rec can potentially ensue. See (Cooper, 2012) for discussion and b. a more careful development. 
" sit-type : RecType # 92 If p:Prop and p.sit-type is T T (22) AustQuestion(T) = 1 ∧ 2 def (T1 T2) we say that p is the conjunction sit : Rec ∨ sit = p.sit abstr : (T RecType) (disjunction) of and " → # " sit-type = T1 # sit = p.sit Given this, we define the following relation be- . tween a situation and a function, which is the ba- " sit-type = T2 # sis for defining key coherence answerhood no- 3.3 Questions tions such as resolvedness and aboutness (weak partial answerhood (Ginzburg and Sag, 2000)) Extensive motivation for the view of questions and question dependence (cf. erotetic implica- as propositional abstracts has been provided in tion,(Wisniewski,´ 2001)): (Ginzburg, 1995; Ginzburg and Sag, 2000)—TTR contributes to this by providing an improved no- (23) s resolves q, where q is λr :(T1)T2, (in tion of simultaneous, restricted abstraction: A (ba- symbols s?q) iff either sic, non-compound) question is a function from (i) for some a : T1 s : q(a), records into propositions. In particular, a polar or question is a 0-ary propositional abstract, which (ii) a : T1 implies s : q(a) in TTR makes it a constant function from the uni- ¬ verse of all records into propositions. We pro- Austinian questions can be conjoined and dis- pose a refinement of this view which we believe joined though not negated. The definition for maintains the essential insights of the proposi- conj/disj-unction, from which it follows that q1 tional function approach, motivated in part by the and (or) q2 is resolved iff q1 is resolved and (or) need to enable conjunction and disjunction to be q2 is resolved, is as follows: defined for questions. sit = s We introduce a notion of Austinian questions (24) ( ) defined as records containing a record and a func- " abstr = λr : T1 (T2) # ∧ ∨ tion into record types, the latter associated with sit = s = the label ‘abstr(act)’. 
The role of wh-words on this view is to specify the domains of these functions; in the case of polar questions there is no restriction, hence the function component of such a question is a constant function. (21) exemplifies this for a unary 'who' question and a polar question:

(21) a. Who = [ x1 : Ind, c1 : person(x1) ]; Whether = Rec

b. 'Who runs' ↦ [ sit = r1, abstr = λr:Who . [ c : run(r.x1) ] ]

c. 'Whether Bo runs' ↦ [ sit = r1, abstr = λr:Whether . [ c : run(b) ] ]

We characterize the type AustQuestion within TTR by means of the parametric type given in (22); the parametric component of the type characterizes the range of abstracts that build up questions:

(22) [ sit = s, abstr = λr : T3 . T4 ]

and, for conjoined or disjoined questions:

[ sit = s, abstr = λr : [ left : T1, right : T3 ] . (q1(r.left) ∧/∨ q2(r.right)) ]

Following (Cooper and Ginzburg, 2012) we argue that "negative questions" involve questions relating to negative propositions rather than negations of positive questions. As Cooper and Ginzburg show, such negative questions are crucially distinct from the corresponding positive questions. Since we have a clear way of distinguishing negative and positive propositions, we do not conflate positive and negative polar questions.

4 Connectives in dialogue

We assume a gameboard dialogue semantics (Ginzburg, 2012) which keeps track of questions under discussion (QUD). One of the central conversational rules in KoS is QSPEC, a conversational rule that licenses either speaker to follow up q, the maximal element in QUD, with assertions and queries whose QUD update depends on q; these in turn become MaxQUD. Consequently, QSPEC seems able to handle the commonest case of successive questions, as in (25).

(25) a. Ann: Anyway, talking of over the road, but where is she? Is she home?
Betty: No. She's in the Cottage.

b. Arthur: How old is she? Forty?
Evelyn: Forty one!

Nonetheless, not all cases of successive questions involve a second question which is a subquestion of the first, as exemplified in (26):

(26) On the substantive front, we now have preliminary answers to two key questions: What did the agency do wrong? And who ordered it to target conservative groups? Notwithstanding Miller's resignation, which the President himself announced on Tuesday evening, the answers appear to be: not nearly as much as recent headlines suggest; and, nobody in the Obama Administration. (The New Yorker, May 16, 2013)

In contrast to cases covered by QSPEC, these cases are strange if the second question is posed by the addressee of the first question; one gets the feeling that the original question was ignored:

(27) A: What did the agency do wrong? B: Who ordered it to target conservative groups?

(Ginzburg, 2012) postulates an additional conversational rule that allows a speaker to follow up an initial question with a non-influencing question, where the initial question remains QUD-maximal. We believe this basic treatment allows one to explain how the mixed cases involving conjunctions of assertions and queries can be captured. and, but and or can be used as discourse particles which express a relationship between a speech act and the one preceding it:

• and can indicate that the following question is independent of MaxQUD.

• but indicates that the following question is not independent, but unexpected given MaxQUD:

  – John's smart (no response) But what qualifications does he have?

  – John's smart might be offered as an enthymematic argument (Breitholtz, 2011; Breitholtz and Cooper, 2011) to a conclusion, e.g. "we should hire John"; but indicates that the answer to the question might present an enthymematic argument against this conclusion.

• or can indicate that q1 addresses the same ultimate issue as MaxQUD, so both are retained as MaxQUD; it is sufficient to address one issue, since doing so will resolve both simultaneously:

(28) a. Would you like coffee and biscuits or would you like some fruit or a piece of bread and jam or what do you fancy?

b. are you gonna stay on another day or what are you doing?

c. David Foster Wallace is overrated or which novel by him refutes this view?

5 Abstract Entities and Adjectives

How should one deal with adjectival modification of propositional and question entities, exemplified in (3,4) above? The extended notion of question required can be explicated within Cooper (2012)'s theory of semantic frames, inspired by (Fillmore, 1985; Pustejovsky, 1995). Neither Ty2 (Groenendijk and Stokhof, 1997) nor inquisitive semantics in propositional or first order formulation supports the development of such an ontology. Cooper formulates a frame as a record type (RT). In (29) we exemplify a possible frame for question. Here the illoc role represents a question's role in discourse, whereas the telic role describes the goal of the process associated with resolving a question, namely finding a resolving answer. The frame represents a 'default' view of a question, which various in effect non-subsective adjectives can modify (e.g., 'unspoken question' negates the existence of an associated utterance, while 'open question' negates the end point of the resolution event).[2]

[2] Here Resolve maps an austinian proposition and an austinian question to a predicate type. In a more detailed account one would add an additional argument for an information state, given the arguments that this notion is agent-relative (Ginzburg, 1995) and much subsequent literature.

(29) Question =def
[ T : Type,
  external : AustQuestion(T),
  illoc : [ u : Event, A : Ind, c2 : Ask(A,external,u) ],
  telic : [ p : AustProp, c1 : Resolve(p,external) ] ]

A type-driven compositional analysis is formulated with adjectives as record type modifiers (functions from RTs to RTs) that pick out frame elements of the appropriate type (for a related view cf. Asher & Luo 2012). For example, difficult question has the record type in (30):

(30) [ T : Type,
  external : AustQuestion(T),
  telic : [ p : AustProp, c1 : difficult(Resolve(p,external)) ] ]

Records and record types come with a well-known notion of subtyping, often construed syntactically (see e.g. (Betarte and Tasistro, 1998)). However, given our ontological perspective on semantics, we take a semantic perspective on subtyping (see e.g. (Frisch et al., 2008) for a detailed exposition of such an approach), wherein T < T′ iff {s | s : T} ⊂ {s | s : T′}. Given this, a record of the type (29) above can be viewed as also having the type:

(31) [ T : Type, external : AustQuestion(T) ]

This forms the basis of our account of how an adjective such as difficult applies simultaneously to question and to path. Difficult is specified as in (32): a function from record types subsumed by the record type given in the domain, whose output involves a modification of the restriction field of the telic role. This yields (32b) when combined with question and (32c) when combined with path:[3]

(32) a. f : (RT < [ external : Type, telic : [ P : Type, c1 : P ] ]) . RT[P ↦ difficult(P)]

b. [ T : Type,
  external : AustQuestion(T),
  telic : [ p : AustProp, c1 : difficult(Resolve(p,external)) ] ]

c. [ external : PhysTrajectory,
  telic : [ a : Ind, c1 : difficult(Cross(a,external)) ] ]

[3] Here difficult maps any type P into the predicate type difficult(P). One probably needs to narrow this specification somewhat.

Turning to propositions, we postulate (33) as the type for proposition.

(33) Proposition =def
[ external : AustProp,
  illoc : [ u : Event, A : Ind, c2 : Assert(A,external,u) ],
  telic : [ f : Fact, c1 : Prove(f,external) ] ]

This allows us, for instance, to specify the adjective silly as modifying along the illoc dimension, thereby capturing silly claim (a claim silly to assert) and silly question (a question silly to ask); given the specification of the telic dimension and our lexical entry for difficult, difficult claim is correctly predicted to mean 'a claim difficult to prove'.
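The adjective-as-record-type-modifier analysis can be made concrete in a small computational sketch. The following Python fragment is purely illustrative, not the authors' formalization: record types are approximated as dictionaries from labels to types, the subtyping test mirrors the field-level consequences of T < T′ iff {s | s : T} ⊂ {s | s : T′}, and difficult rewrites the telic restriction in the spirit of (32); all function and variable names here are our own assumptions.

```python
# Illustrative sketch only (not the authors' implementation): a record type
# is approximated as a dict from labels to types, where a type is either a
# base type name (str) or a nested record type (dict).

def subtype(t1, t2):
    """Field-level analogue of semantic subtyping: t1 < t2 iff every field
    demanded by t2 occurs in t1 with a matching type (t1 may add fields)."""
    for label, ty2 in t2.items():
        ty1 = t1.get(label)
        if isinstance(ty2, dict):
            if not isinstance(ty1, dict) or not subtype(ty1, ty2):
                return False
        elif ty1 != ty2:  # base types: require identity (a simplification)
            return False
    return True

def difficult(rt):
    """Record-type modifier in the spirit of (32): wrap the telic
    restriction c1 of a frame in difficult(...), leaving the rest intact."""
    assert "telic" in rt and "c1" in rt["telic"], "frame lacks a telic.c1 field"
    out = {**rt, "telic": {**rt["telic"]}}  # shallow copies; input unchanged
    out["telic"]["c1"] = "difficult({})".format(rt["telic"]["c1"])
    return out

# A (29)-style frame for question (illoc role omitted for brevity) and a
# frame for path, whose external component is a physical trajectory.
question = {"external": "AustQuestion",
            "telic": {"p": "AustProp", "c1": "Resolve(p,external)"}}
path = {"external": "PhysTrajectory",
        "telic": {"a": "Ind", "c1": "Cross(a,external)"}}
```

Applied to question this produces the telic restriction difficult(Resolve(p,external)), and applied to path it produces difficult(Cross(a,external)), mirroring (32b,c); the subtype check confirms that the question frame also inhabits the weaker type containing only an external field, as in (31).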
Subject matter adjectives such as political, personal, moral, philosophical, as in (34), lead us to another intrinsic advantage of rich type theories such as TTR over possible worlds based type theories, relating to the types AustQuestion/Prop.

(34) a. A: Are you involved with Demi Lovato? B: That's a personal question.

b. A: One shouldn't eat meat. B: That's a moral claim.

Subject matter adjectives target the external role of a question/proposition. This can be explicated on the basis of the predicate types which constitute the sit-type (abstr type) field in propositions (questions). Given the coarse granularity of possible worlds, it is unclear how to do so in ontologies based on sets of possible worlds.

Acknowledgments

Earlier versions of portions of this work were presented at the Conference on Logic, Questions and Inquiry (LoQI) in Paris and at a course on TTR given in June 2013 in Gothenburg. We thank audiences on those occasions, as well as two anonymous referees, for very stimulating comments. This work is supported by the French Investissements d'Avenir–Labex EFL program (ANR-10-LABX-0083).

References

Jon Barwise and John Etchemendy. 1987. The Liar. Oxford University Press, New York.

Jon Barwise and John Perry. 1983. Situations and Attitudes. Bradford Books. MIT Press, Cambridge.

Jon Barwise. 1989. The Situation in Logic. CSLI Lecture Notes. CSLI Publications, Stanford.

Gustavo Betarte and Alvaro Tasistro. 1998. Martin-Löf's type theory with record types and subtyping. In G. Sambin and J. Smith, editors, 25 Years of Constructive Type Theory. Oxford University Press.

Ellen Breitholtz and Robin Cooper. 2011. Enthymemes as rhetorical resources. In Ron Artstein, Mark Core, David DeVault, Kallirroi Georgila, Elsi Kaiser, and Amanda Stent, editors, SemDial 2011 (Los Angelogue): Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue.

Ellen Breitholtz. 2011. Enthymemes under Discussion: Towards a micro-rhetorical approach to dialogue modelling. In Proceedings of SPR-11 ILCLI International Workshop on Semantics, Pragmatics, and Rhetoric, Donostia, 9-11 November 2011.

R. Cooper and J. Ginzburg. 2012. Negative inquisitiveness and alternatives-based negation. In Maria Aloni, Vadim Kimmelman, Floris Roelofsen, Galit W. Sassoon, Katrin Schulz, and Matthijs Westera, editors, Logic, Language and Meaning, volume 7218 of Lecture Notes in Computer Science, pages 32–41. Springer Berlin Heidelberg.

Robin Cooper. 1998. Austinian propositions, davidsonian events and perception complements. In Jonathan Ginzburg, Zurab Khasidashvili, Jean Jacques Levy, Carl Vogel, and Enric Vallduvi, editors, The Tbilisi Symposium on Logic, Language, and Computation: Selected Papers, Foundations of Logic, Language, and Information, pages 19–34. CSLI Publications, Stanford.

Robin Cooper. 2012. Type theory and semantics in flux. In Ruth Kempson, Nicholas Asher, and Tim Fernando, editors, Handbook of the Philosophy of Science, volume 14: Philosophy of Linguistics. Elsevier, Amsterdam.

C. J. Fillmore. 1985. Frames and the semantics of understanding. Quaderni di semantica, 6(2):222–254.

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2008. Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types. Journal of the ACM (JACM), 55(4):19.

Jonathan Ginzburg and Ivan A. Sag. 2000. Interrogative Investigations: the form, meaning and use of English Interrogatives. Number 123 in CSLI Lecture Notes. CSLI Publications, Stanford, California.

Jonathan Ginzburg. 1995. Resolving questions, I. Linguistics and Philosophy, 18:459–527.

Jonathan Ginzburg. 2012. The Interactive Stance: Meaning for Conversation. Oxford University Press, Oxford.

J. Groenendijk and F. Roelofsen. 2009. Inquisitive semantics and pragmatics. In Meaning, Content, and Argument: Proceedings of the ILCLI International Workshop on Semantics, Pragmatics, and Rhetoric. www.illc.uva.nl/inquisitive-semantics.

Jeroen Groenendijk and Martin Stokhof. 1997. Questions. In Johan van Benthem and Alice ter Meulen, editors, Handbook of Logic and Linguistics. North Holland, Amsterdam.

C. L. Hamblin. 1958. Questions. Australasian Journal of Philosophy, 36:159–168.

Lauri Karttunen. 1977. Syntax and semantics of questions. Linguistics and Philosophy, 1:3–44.

Rani Nelken and Nissim Francez. 2002. Bilattices and the semantics of natural language questions. Linguistics and Philosophy, 25:37–64.

Rani Nelken and Chung-Chieh Shan. 2006. A modal interpretation of the logic of interrogation. Journal of Logic, Language, and Information, 15:251–271.

James Pustejovsky. 1995. The Generative Lexicon. MIT Press, Cambridge.

Aarne Ranta. 1994. Type Theoretical Grammar. Oxford University Press, Oxford.

Zeno Vendler. 1972. Res Cogitans. Cornell University Press, Ithaca.

Andrzej Wiśniewski. 2001. Questions and inferences. Logique et Analyse, 173:5–43.

Author Index

Asudeh, Ash, 19
Chatzikyriakidis, Stergios, 37
Clark, Stephen, 46
Cooper, Robin, 72, 89
Delpeuch, Antonin, 55
Dobnik, Simon, 72
Fernando, Tim, 63, 89
Ginzburg, Jonathan, 89
Giorgolo, Gianluca, 19
Grefenstette, Edward, 46
Grudzinska, Justyna, 10
Hough, Julian, 80
Lappin, Shalom, 72
Larsson, Staffan, 72
Luo, Zhaohui, 37
Maillard, Jean, 46
Preller, Anne, 55
Purver, Matthew, 80
Ranta, Aarne, 1
Worth, Chris, 28
Zawadowski, Marek, 10