Modal Expressions in Natural Language Sentence and Their Similarity

Modal Expressions in Natural Language Sentence and Their Similarity Toshifumi Tanabe*, Yasuo Koyama†, Kenji Yoshimura*, Kosho Shudo* * Department of Electronics Engineering and Computer Science, Fukuoka University 8-19-1 Nanakuma, Jonan-Ku, Fukuoka, 814-0180, JAPAN {tanabe, yosimura, shudo}@tl.fukuoka-u.ac.jp †Corporate Research & Development, Seiko Epson Corporation 3-3-5 Owa, Suwa, Nagano, 392-8502, JAPAN [email protected] Abstract This paper is concerned with the treatment of modal information in natural language processing (NLP). Modal information, which fleshes out the kernel sentence, providing temporal, interpersonal, contingent or subjective information, i.e., polarity, tense, aspect, mood, modality in narrow sense, specific kinds of speaker’s judgment or attitude, etc plays an important role especially in discourse understanding, man-machine dialogue, inference system, etc. On the other hand, It is important for future NLP systems to formulate the semantic similarity of natural language expressions. In particular, paraphrasing, full text information retrieval, example-based MT and document compression technology require the effective similarity criterion for linguistic expressions. In this paper, first, we discuss the meaning of Japanese sentence-final modality expressions (ME) and second, we present similarity rules. position and a kind of ”reversal” isomorphism between 1. Introduction the formalism and the general structure of Japanese This paper is concerned with the treatment of modal sentence is described. information in natural language processing (NLP). The Next, in section 4., we present a semantical classification term “modal” used in this paper is not closed to that of of approximately one thousand five hundred modal modal logic or present linguistic theories, but stretched expressions extracted from the final position of sentences from NLP’s viewpoints. The information which is to be in various corpus. It is shown in section 5. that this derives handled in earlier stages of NLP will be classified into the hypothetical list of approximately one hundred thirty three major classes, i.e., 1) conceptual information given “modality” or “modal operators” for NLP. by conceptual words in the input sentence such as noun, In section 6., we present semantic and pragmatic verb, adjective, etc., 2) information about the relationship, similarity rules among the sequences of modal operations i.e., case-relation, cause-effect relation, etc., between Last, in section 7., we comment future works and open concepts ordinarily given by postposition, preposition or problems concerning e.g., the completeness of our other syntactical structure of the sentence. 3) information modality set and the universality of the rules as other than 1 or 2, which is given typically by auxiliary concluding remarks. verb or adverb. While information 1 combined with 2 comprises the kernel, propositional meaning of a sentence, 2. Formal Aspect of Modalization the information 3, which we call “modal” information, We, first, assume that the modal structure of natural fleshes out the kernel. It provides temporal, interpersonal, language sentence is generically formulated as: contingent or subjective information, i.e., polarity, tense, aspect, mood, modality in narrow sense, specific kinds of Mn [Mn-1…[M2 [M1 [S ]]]…]. (1) speaker’s judgment or attitude, etc. Thus, the term “modal” or “modality” is used generically for the sake of Here, S is a kernel sentence without modal meaning. Mi (1 our convenience in this paper. Although the modal ≦ i ≦ n) is a modal operator. M[X] denotes the information plays an important role, especially in modalization of X specified by the operator M. discourse understanding, man-machine dialogue, For example, English sentence “Mary should not have inference system, etc., we have not achieved enough gone to school.” may be interpreted as: knowledge about it in NLP field yet. NEGATIVE-OBLIGATION [PAST-TENSE [Mary goes On the other hand, it is also important for forthcoming to school]] in this formalism. Here, modal operator NLP systems to formulate the semantic similarity of NEGATIVE-OBLIGATION denotes what is the speaker’s natural language expressions. It is required not only for judgment at the utterance time for PAST-TENSE the text retrieval, paraphrasing, document summarization, [Mary…school]. This formalism, which is rather simple, document compression, example-based MT, but also for might give rise to the intermediate expression in the the automatic acquisition of the concept and language. In course of logical interpretation of natural sentence. We, this paper, we discuss the modal expressions in natural however, are not concerned with the interpretation itself, language and their semantic similarity from a viewpoint of in this paper. Although the interpretation presupposes the NLP, adopting Japanese as a natural language model. strongest and sophisticated semantics closely related to the In section 2., we introduce a formalization of the modal syntax that covers the whole natural language phenomena, structure of natural sentence. In section 3., we introduce we have not achieved those yet. From the viewpoint of the the left-branching characteristic of Japanese sentence that state-of-the-art technology of NLP, we are interested in carries the modal information at the sentence-final what kind of series of modal operations Mn Mn-1…M2 M1 Here, P0 denotes a verb, an adjective, an adjective verb or should be considered in NLP. a nominal predicate. A and B denotes propositional and non-propositional ME, respectively. φ denotes the 3. General Structure of Japanese Sentence empty set. Thus, non-propositional ME can not precede The global surface structure of a Japanese sentence is propositional ME. described by production rules: 4.1. Propositional Modal Expression Si → Si-1・ei (1≦i≦n) (2) Propositional MEs are classified into following two types: where, each Si is a non-terminal symbol for a sentence, S0 is the initial symbol, each ei is a terminal symbol for a A1. dynamic ME which means operation, movement modality expression that is not limited to a word but or action. possibly a string of words that provides an A2. static ME which means state or situation. undecomposable modal information and n is the number of modal expressions located at successive final positions MEs of type A1 is further classified into A1-1; voice, of a sentence. Thus, the general form of a Japanese A1-2; aspect and A1-3; others. Table 1 shows some sentence S as the terminal string is shown as follows: examples. ME of type A1-1 is used to change the predicate to, e.g., S = S０・e1・e2・ … ・en (0≦n) (3) passive or causative one. It requires the conversion of specific case particle in the complement sentence in where, S０ denotes a kernel sentence without modality Japanese. ME of type A1-2 picks up some part of the which is governed by a verb, an adjective, an adjective time duration of operation, action, etc., denoted by the verb or a nominal predicate(a noun followed by a linking complement sentence. verb) located in its final position, and most modal MEs of type A2 is classified into many subclasses as information is packed within its final part in Japanese shown in Table 2 with examples. A2-1 is for negation. sentence. In addition, a sentence is approximately a literal A2-2 is similar to A1-2 but it is static or observatory. realization of formula (1) in reversed order. This is one of A2-3 is for the past-tense. A2-4 is for the speaker’s the remarkable features of Japanese language. Left judgment for the necessity, possibility and frequency. branching property or the linearity of rule (2) is achieved They function just as the necessity operator □ and the by that every ei is chosen so as not to have the fragmental possibility operator ◇ of modal logic(Hughes,1968) do. but have the complete modal meaning. That is, ei is sometimes a word and sometimes a sequence of words which might include conceptual words such as noun, verb, etc., as sub-strings in itself. We call ei “modal expression class marker example ME”. The independence of each e ’s meaning makes each i <PASSIVE-VOICE-IN れる(reru), S (1≦i≦n) a complete sentence with modality. We do i -NARROW-SENSE> (rareru) not care about, in this paper, the inner-structure of S and られる０ <PASSIVE-VOICE 1 > てくれる(tekureru) each ei. Si-1 is called a “complement sentence” of ei. <PASSIVE-VOICE 2 > てもらう(temorau) 4. Modal Expression A1-1 <CAUSATIVE-VOICE> せる(seru), (saseru) We made a list of approximately one thousand five させる hundred modal expressions extracted from the final <BENEFITIAL-VOICE > てやる(teyaru), position of sentences in various corpus by close てあげる(teageru) investigation. <SPONTANEITY-VOICE> れる(reru) First, sentences can be classified into two types. One is <STARTING> はじめる propositional sentence, which is objective of truth-false (hajimeru) evaluation and the other is non-propositional sentence, <CONTINUATION> つづける which is not. According to this, MEs are classified into (tudukeru), two types as well: A1-2 ていく(teiku) <ENDING> おわる(owaru), A. ME which changes a propositional sentence(Si-1) to おえる(oeru), another propositional sentence(S ). i てしまう B. ME which changes a propositional sentence(Si-1) to (teshimau) a non-propositional sentence(S ). i <FAILURE> そこなう(sokonau) <MISSING-CHANCE> We call ME of type A and B, propositional and そびれる(sobireru) non-propositional ME, respectively. Apart from the A1-3 <RELUCTANCE> しぶる(siburu) structure of the kernel sentence, the general form of the <PREPARATION>

Modal Expressions in Natural Language Sentence and Their Similarity

Romani Syntactic Typology Evangelia Adamou, Yaron Matras

Serial Verb Constructions Revisited: a Case Study from Koro

Tagalog Pala: an Unsurprising Case of Mirativity

A Cross-Linguistic Study of Grammatical Organization

The Syntax of Answers to Negative Yes/No-Questions in English Anders Holmberg Newcastle University

Schur Complement Trick for Positive Semi-Definite Energies

The Serial Verb Construction in Chinese: a Tenacious Myth and a Gordian Knot Waltraud Paul

Chapter 1/2 Sentence Types, Nom, and Acc. Cases Chapter 4 Singular And

The Production of Finite and Nonfinite Complement Clauses by Children with Specific Language Impairment and Their Typically Developing Peers

Russian Case Morphology and the Syntactic Categories David Pesetsky (MIT)1 • Terminology: I Will Use the Abbreviations NGEN, DNOM, VACC, PDAT, Etc

Chapter 1 Typology of Complement Clauses Magdalena Lohninger University of Vienna Susi Wurmbrand University of Vienna

"Structural Markedness in Formal Features: Deriving Interpretability"