Quick viewing(Text Mode)

An Implementation of Formal Semantics in the Formalism of Relational Databases

An Implementation of Formal Semantics in the Formalism of Relational Databases

AN IMPLEMENTATION OF FORMAL IN THE FORMALISM OF RELATIONAL DATABASES.

Claire VANDERHOEFT D~partement de Linguistique G~n(rale, CP175 UniversitY, Libre de Bruxelles 50, avenue Roosevelt 1050 Bruxelles Belgique

E-mail: [email protected]

I describe the database. Then, I explain how the principles used to design the database meet ABSTRACT the requirements of formal semantics. The fourth part is concerned with entailment while This paper presents an implementation the last part mainly shows how one proceeds to of formal semantics as described in Keenan interpret sentences. and Fa!tz's Boolean Semantics for Natural Langt~age [4]. The main characteristic of this implementation is that it avoids the intermediate step of translating NL into a , 1. BACKGROUND. such as an extended version of predicate calculus. My choice of not using any The topics which this work is intermediate language, which Montague already concerned with have mainly been studied from suggested in Universal [5], makes three points of view. my implementation free of the problems related A first class of studies covers the to the of such a language like binding problems encountered in trying to translate NL the variables and resolving scope ambiguities. into a formal language. On the one hand, there On the other hand, not translating NL into an is theoretical research aiming at such a intermediate language requires every translation, like PATR [7]. On the other hand, denotation (i.e. semantic value) to be various kinds of inaccuracies of NL explicitly and accurately represented in a translations into in view of database. accessing databases have been discussed, see [6] for example. A second field of research that need be 0. INTRODUCTION. mentioned is concerned with NL interfaces. Famous systems are described in [9] and [1]. In extensional semantics, each There are important differences between these denotation corresponds to an of the systems and my work since I am not aiming at world° The world is the set of all the accessing a knowledge base at all. The denotations. In the implementation that I shall database that I use encodes NL meanings and it present in this paper, the world will be does so according to linguistic constraints. represented by means of a database, more Traditionally, the database rather encodes a precisely a relational database. certain knowledge independent of the language The structure of the database is used to talk about it. Problems specific to NL designed in such a way that it makes explicit interfaces can be found in [3] and [81. the semantic type of each denotation. From another point of view, there are Although I will not always stick to the standard works which are concerned with the question version of formal semantics when assigning of the organisation of the knowledge base semantic types to syntactic categories, I aim at constituted by NL meanings, see [2]. The accounting for the same range of phenomena difference between my approach and ones like that formal semantics deals with. [2], is that I am sticking to the theory of formal The paper will be divided into five semantics. Consequently, I do not (yet) parts. First, I shall trace back research results address questions about the structure of the to which my contribution can be related. Next,

i 377 lexicon nor do I treat pragmatic phenomena like corresponding type. For example, the relation common sense . which corresponds to the type of noun phrases has an attribute whose values are denotations. Each denotation is an atomic value of the attribute of the relation. 2. THE DATABASE. Furthermore, each such value actually belongs to an n-tuple belonging to the extension of the The structure of the database is relation. (This is due to the fact that the set- dependent on the semantic properties of the theoretical model of the world, i.e. the denotations. More specifically, the structure of database, contains all the denotations built on the database is dependent on the fact that an ontology constituted by a set of entities, IJ, denotations are classified into different types and the values.) and specifically recognized as the denotations The structure of the database captures of such and such syntactic categories. the (degree of) complexity of a denotation by Each denotation of each constituent is a connecting the relation which represents the value in the database. Some of the denotations semantic type assigned to the corresponding result from the composition of other syntactic category with the relations which denotations. Which denotations can be represent the semantic types assigned to the composed with which other ones are properties constituents of a complex expression of the of their type. These properties are not encoded same syntactic category. For example, a as such. The overall structure of the database proper has a complex denotation because shows how the semantic types combine with it belongs to the syntactic category of noun each other. Consequently, complex phrases. Therefore, its denotation belongs to denotations (denotations of complex the extension of the relation representing the expressions) are represented by atomic type of noun phrases. Since there are noun values, but the fact that they are complex is phrases constituted by a determiner and a deduced from the structure of the database. common noun, the relation representing the Consider the case of noun phrase denotations. type of noun phrases actually connects to the The denotation of a determiner combines with relations representing respectively the types of the denotation of a common noun. This determiners and of common nouns. Hence, the combination yields the denotation of a noun structure of the relation associated to the type of phrase, i.e., an atomic value in the database. noun phrases and, in particular, of proper The representation of this denotation is , shows that they are complex connected (in the sense of relational expressions. databases) to the representations of the denotations of the noun and of the determiner. To the extent that we need to define the Therefore, it can be recognized as a complex connections which show the respective denotation. complexity of each type of denotation, a relation is actually defined for each semantic The design of the database is dependent type. For example, we shall define relations on the fact that we need an explicit means to like Tn, Tdet, Tnp, Tvp standing, respectively, recognize the type of each denotation for the type of denotations of common nouns, represented in it. Within the formalism of determiners, noun phrases, verb phrases. relational databases, defining types of Still, there is a problem in defining denotations amounts to defining a relation for connections. The problem is that, in formal each such type. semantics, expressions of different syntactic A relation is formally defined as an n- categories can have the same semantic type. tuple of formal attributes. By formal For example, common nouns and intransitive attribute is meant a way to identify the verbs share the same type. Now, the attribute (a position in the relation or a name) complexity of the denotation of a noun phrase and the of the set of its possible is encoded in the fact that it is connected to the values. The extension of a relation is the set denotation of a common noun. On the of all well-formed n-tuples of attribute values contrary, the denotation of a must for the corresponding formal attributes. (An be connected to the denotation of a simple verb ill-formed n-tuple has at least one non-possible and to the denotations of complements. In value for a formal attribute.) general, we not only need to define relations as counterparts of semantic types, but, where Relations each represent a type. Each expressions of different syntactid categories of them has (at least) one attribute whose collapse into the same type, their types must domain is the set of denotations of the nevertheless correspond to different relations.

3 7~ 2 With respect to the example, the relation Tv - ii) relations that correspond to types of non° (the type; of simple verbs) cannot be the same lexical categories have one attribute whose as the relation Tn, because Tvp connects to Tv domain is the set consisting of all the while Tnp connects to Tn. denotations of all these expressions. Notice that it is true of all the syntactic Moreover, they have other attributes, one for categories that they have one and only one each of their constituents. These attributes relation as semantic counterpart. Sometimes have as domain the extension of the relations however, the relation could be defined as the corresponding to the types of these sum of several relations. For example the constituents. complex relation Tvp has several mutually These principles ensure that denotations exclusive sets of connections. It connects to are submitted to the principle of the relation Tv and the relation Tnp, or to the compositionality which states that the relation Tv and the relation Tpp or the relation denotation of a complex expression is a Tvp and the relation Tpp, etc. compound of the denotations of' its constituents. (It is important to understand that Let me illustrate these principles by we want to be able to check that denotations are showing the definition of two relations. The submitted to compositionality.) How Trip relation is defined as a triple of formal compositionality constrains the of attributes: the relations will now be illustrated on Tn and Tnp= <[np],Tn,Tdet> Tnp. where it is understood that the possible values In the extension of a relation like Tn, all of the first attribute are noun phrase the pairs of values have the property that the denotations, the possible values of the second first value is the denotation of the second one. attribute are pointers to noun denotations, and We augment the schema of the database with the possible values of the third attribute are the constraint on Tn that: [["n"]] = [n]. In the pointers to determiner denotations. Notice that extension of relations like Tnp, all the n-tuples proper names have dummy values for Tn and are required to satisfy the constraint that the Tdet. value of the first attribute, i.e. the np Relations which encode simple types, denotation, is the denotation of an np whose i.e. types of lexical categories, cannot be constituents, i.e. the determiner and the noun, encoded the same way as relations have as respective denotations the ones corresponding to complex expressions: they connected to by the remaining attributes of the have no connections since they do not have any n-tuple. For Tnp, we augment the schema of constituents. Instead, they are generally the database with the constraint that: defined by pairs of attributes, the first one C(Tn,Tdet) = [rip], where C operates the instantiates to a denotation of the type in composition of its arguments. question and the second one to the symbolic expression which is the item. For example, Tn, which represents the type of simple common nouns, will be defined by the pair: 3. THEORETICAL PRINCIPLES MET BY Tn = <[n],"n"> THE DATABASE. where it is understood that the possible values of the first attribute are common noun Let us summarize how principles of denotations while the possible values of the formal semantics are taken into account in second attribute are the nouns themselves designing the database: considered as symbolic expressions. As i) we restrict ourselves to extensional expected, the role of the second attribute in a semantics. Therefore, every denotation must relation such as Tn is to anchor the denotation be represented and must correspond to one of simple expressions into the lexicon. object of the (unique) world. ii) the is the smallest unit that receives a In summary, the design of the database denotation, the sentence is the biggest one. meets the two following principles: iii) all the expressions that are well-formed

- i) relations that correspond to types of lexical syntactic consUtuents have a denotation. categories have two attributes: the first one has iv) all the denotations are encoded in the as domain the set of denotations of all the extension of a specific relation, that is, all the lexical items which belong to the lexical denotations have a type defined i1~ ~:i~edatabase. category in question, and the second one has as v) denotations of complex expressions are domain those items themselves regarded as connected to the denotations of the constituents symbolic expressions. that are contained in those expressions.

3 379 The theory of databases states that names (the lexical items) or connections. relational databases are logically equivalent to a Now, representing functions by symbols rather first order language whose predicates are the than by the sets of pairs argument-result relations. In this first order language, the implies that entailment cannot be defined on extensions of the predicates are the sets of the relations representing functional types. tuples of argument values on which the For relations not corresponding to "relation-predicates" evaluate to true. functional types, i.e. Tn, Tnp, Tv and Tvp, Therefore, the fact that the database is entailment is defined by means of set inclusion. interpreted as a first order language ensures that Let tl and t2 be two tuples of Tnp, tl entails t2 all the denotations have a type and their type is if the attribute value which is the np denotation explicitly attached to them. in tl is a subset of the attribute value which is the np denotation in t2. Take another example. Assume that Tvp has four attributes: the first one is the denotation of the vp, e.g. eat an 4. ENTAILMENT. apple, the third one is the denotation of the verb without the complement whose denotation is What kinds of things are the denotations the value of the fourth attribute, i.e. eat : is indispensable to know in order to define Tvp([eat an apple],W,[eat], [an apple]) what it means for an expression to entail Now, anything that has necessarily the another expression (of the same category). property of eating an apple has the property of Let us distinguish between attributes eating. So, for Tvp=(y 1,W,y2,Z), we set the that point to other relations, attributes that are general constraint that y2 ~ yl. We will say instantiated to symbolic expressions and attributes that take as values the denotations of that y2 D yl is an that belongs to the the type represented by the relation they are definition of the Tvp relation. Crucially, attributes of. Only the last kind of attributes are objects that would not meet the concerned with entaihnenc characterizing the extension of the relation to According to formal semantics, attribute which they belong cannot correspond to objects values that represent denotations are sets. (Do of the world. In the example, if eating an apple not forget that they are atomic from the point of does not entail eating, then the two expressions view of the structure of the database.) Some fail to have acceptable denotations. (primitive) entities are implicitly defined. Then, all the denotations (except for the denotations of sentences) are sets of entities, or sets of sets of entities, or functions whose 5. THE SKETCH OF A SEMANTIC domain and range are such kinds of sets. Let INTERPRETOR AND SENTENCE me use the meta-variable X which ranges over DENOTATIONS. the sets of entities, while Y ranges over sets of sets of entities. The set structure of the world The denotations of sentences are truth is the following: values. I have not insisted on the way sentence Tn X denotations are encoded but one might expect Trip (rip's in position) y that there is a relation Ts = <[s],Tnp,Tvp>. Tnpconnection(complement np's) X -> X Although this agrees with the principles of the Tv X database, it is not the solution that I have Tvp X adopted. Tdet X-> Y Assume that there is no Ts relation. We Tprep Y -> (X -> X) must nevertheless ensure that the interpretor Tpp X -> X will provide sentences with the truth values Tadv X -> X they denote. Furthermore, we would like to show how these truth values depend on the What is entailment, in the denotations of the constituents of the sentence implementation? First, for expressions which because we want to represent how denote functions, the fact that a certain compositionality is respected. expression entails another one is given by the fact that the respective expressions in which The basic operation for interpreting an each of them appears (with other constituents) expression, that is for assigning it its entail each other. For example, we will not say denotation, is the selection of values in the that "most" entails "some", but rather that database. How is the interpretor meant to "most Xs" entails "some Xs". This being so, select denotations actually encoded in the functions can be represented by symbols, either database?

38o 4 The interpretor proceeds in parallel with checking whether the property, i.e. the set, a syntactic parser which yields (at least) the denoted by the verb phrase is a member of the constituent structure of the expressions. set of properties denoted by the noun phrase. Imagine that the information that the parser The outputs of Z Tvp,Tnp 0. are more sends Io the interpretor is the context-free rule used by the parser in such expression than just ~. Indeed, in order to show that to be interpreted. For example, suppose that compositionality is respected, we must show the rule: explicitly what are the denotations of np -> det, n constituents which combine to yield the truth parses the given expression. And suppose that value. The latter denotations involve the interpretor knows that the category np is the denotations of their own constituents. category of an expression of the type of noun Therefore, the denotation of a sentence will be phrases, hence of the relation Tnp. Likewise, logically represented by an assertion. This n corresponds to Tn and det to Tdet. assertion is the logical conjunction of the Knowing that the above context-free denotations of all the constituents of the rule applies, the interpretor can perform the sentence. For example, the sentence: Most selection of a tuple in the Tnp relation. The men eat an apple denotes: schema of the selection to perform will be written: Z Tn,Tdet (Tnp). The specific selection Tdet("most") A Tn(yl,"men") A to perform in order to interpret a specific noun Tnp(y2,yl,"most") A [most](yl) = (y2) A phrase requires the interpretor to instantiate the Tv(y3,"eat",l) ^ parameters of the selection, Tn and Tdet, to the denotations of the noun and the determiner, TdetCa") ^ Tn(y4,"apple") ^ respectively. The output of the instantiated Tnp(y5,y4,"a") A [a](y4) = (y5) a selection: Tvp(y6,"eat",y3,y5) A y3 _D y6 A Z [men],[mostl(Tnp) is: y6 ~; y2 Tnp([most men],[men],[most]) ^ [most]([men]) = [most men] where ~ = y6 e y2. Notice that I use a logical notation to note the output of the interpretor. This output is, itself, It is easy to predict that sentences a relational database containing one having the same constituent structure as Most relation, the extension of which is constituted men eat an apple will each be interpreted by an by one tuple which satisfies the second assertion of the same form as this one. conjunct of the formula. In the example, the tuples consists of attribute values such that The computational counterpart of such [most]([men]) = [most men] an assertion is a database contained in the is true. original database. (We call such a database a view in the computational terminology.) In The way the interpretor selects the summary, all the sentences that share the same denotation of a noun phrase can be easily syntactic structure denote assertions equivalent generalized to the other types of expressions. I to databases having the same structure (but immediately turn to the case of sentences. different extensions, of course). Thus, the Let us assume that there is only one rule denotation of a sentence has iconic properties to parse sentences, namely : and its structure is of the same kind as that of S -> np,vp the representation of the world. We shall say Since there is no Ts type, there is no selection. that it is a possible fact, where "fact" means Nevertheless, the pseudo-schema of selection that the denotation of the sentence is a part of corresponding to this rule is defined by the world, while "possible" means that its structure conforms to that of the world. I; Tvp.Vnp 0 (to use a notation coherent with the one used Since ~ has been defined independently for the: other selections). The denotation that from any relation of the database, false such a "selection" will yield is a formula sentences can have the same kind of denotation interpretable as the truth value denoted by the as do true sentences° By this, I want to sentence. I shall represent this formula by ~. emphasize the fact that, when ~ does not yield The way in which (~ is assumed to the value true, it does not follow that the yield either truth value conforms to the account assertion is ill-formed. On the contrary, the of standard formal semantics: it consists in fact that a false sentence fails to denote the

5 381 actual state of the world does not prevent it 3- Jones, K.S. (1984). Natural Language and from denoting a possible fact as long as its Databases, Again. Proceedings of Coling84. denotation is a well-fomled assertion. 182-183. 4- Keenan, E. and Leonard M. Faltz. (1984). Boolean Semantics for Natural Language. D. Reidel Piblishing Company, Vol 23. 6. CONCLUSION. 5- Montague, Richard. (1974b). Universal Grammar. In R. Thomason (ed.) Formal I have presented the main principles of . Selected Papers of Richard my implementation of formal semantics. If I Montague. Yale University Press. New Haven had more space to do so, I could now develop and London. 222-246. two crucial issues: I could show how the 6- Moore, R.C. (1982). Natural-Language process of interpreting an expression parallels Access to Databases-- Theoretical/Technical its syntactic parsing and prove that the structure Issues. Proceedings of the 20th Annual of the database allows to cover the same range Meeting of the Association for Computational of phenomena as formal semantics does. . 44-45. Other important issues that must be 7- Rosenschein, S.J.and Schieber, S.M. dealed with include further phenomena related (1982). Translating English into Logical to coordination, negation, passive and other Form. Proceedings of t he 20th Annual constructions in which quantifiers appear to Meeting of the Association for Computational have a non trivial behavior. I am currently Linguistics. 1-8. pursuing these developments on the basis of 8- Templeton, M., & Burger, J. (1983) empirical linguistic data. My hypothesis is that Problems in Natural-Language Interface to they can be accounted for without changing the DBMS with examples from EUFID. design of the system presented so far, and even Proceedings of the Conference on Applied without augmenting it much. Natural Language Processing. 3-16. Let me finally stress that it is useful to 9- Woods, W.A. (1978). Semantics and -know that the database is logically equivalent to Quantification In Natural Language Question a first-order language. Indeed, this fact gives a Answering. In M Yovits (ed) Advances in synthetic view of the behavior of the system Computers. Vol. 17, New York. Academic and allows us to envisage further developments Press. 2-64. in the direction of non-monotonic .

ACKNOWLEDGMENTS.

The author is thankful to Prof. M. Dominicy for providing the opportunity to conduct this research. This text presents research results which were supported by the Belgian National incentive-program for Fundamental research in artificial intelligence initiated by the Belgian State, Prime Minister's Office, Science Policy Programming The scientific responsability is assumed by the author.

REFERENCES.

1- Grosz, B.J., Appelt, D.E., Martin, P.A. and Pereira, F.C.N. (1987). TEAM: An experiment in the Design of Transportable Natural-Language Interfaces. Artificial Intelligence. Vol. 32, No2, May 1987. 173- 244. 2- Hobbs, J.R. (1984). Building a Large Knowledge Base for a Natural Language System. Proceedings of Coling84. 283-286.

382 6