Language Model and Sentence Structure Manipulations for Natural Language Applications Systems
Zenshiro Kawasaki, Keiji Takida, and Masato Tajima
Department of Intellectual Information Systems
Toyama University
3190 Gofuku, Toyama 930-0887, Japan
{kawasaki, takida, and tajima}@ecs.toyama-u.ac.jp

Zenshiro Kawasaki, Keiji Takida and Masato Tajima (1998) Language Model and Sentence Structure Manipulations for Natural Language Applications Systems. In D.M.W. Powers (ed.) NeMLaP3/CoNLL98 Workshop on Human Computer Conversation, ACL, pp 281-286.

Abstract

This paper presents a language model and its application to sentence structure manipulations for various natural language applications including human-computer communications. Building a working natural language dialog system requires the integration of solutions to many of the important subproblems of natural language processing. In solving any of these subproblems, the handling of natural language expressions plays a central role; natural language manipulation facilities are indispensable for any natural language dialog system. Concept Compound Manipulation Language (CCML) proposed in this paper is intended to provide a practical means of manipulating sentences through formal, uniform operations.

1 Introduction

Sentence structure manipulation facilities such as transformation, substitution, translation, etc., are indispensable for developing and maintaining natural language application systems in which language structure operation plays an essential role. For this reason structural manipulability is one of the most important factors to be considered in designing a sentence structure representation scheme, i.e., a language model. The situation can be compared to database management systems; each system is based on a specific data model, and a data manipulation sublanguage designed for the data model is provided to handle the data structure (Date, 1990).

In the Concept Coupling Model (CCM) proposed in this paper, the primitive building block is a Concept Frame (CF), which is defined for each phrasal or sentential conceptual unit. Sentence analysis is carried out as a CF instantiation process, in which several CFs are combined to form a Concept Compound (CC), a nested relational structure in which the syntactic and semantic properties of the sentence are encoded. The simplicity and uniformity of the CC representation format lead to a corresponding simplicity and uniformity in the CC structure operation scheme, i.e., CC Manipulation Language (CCML).

Another advantage of the CCM formalism is that it allows inferential facilities to provide flexible human-computer interactions for various natural language applications. For this purpose conceptual relationships, including synonymous and implicational relations established among CFs, are employed. Such knowledge-based operations are under development and will not be discussed in this paper.

In Section 2 we present the basic components of CCM, i.e., the concept frame and the concept compound. Section 3 introduces the CC manipulation language; the major features of each manipulation statement are explained with illustrative examples. Concluding observations are drawn in Section 4.

2 Concept Coupling Model

2.1 Concept Compound and Concept Frame

It is assumed that each linguistic expression such as a sentence or a phrase is mapped onto an abstract data structure called a concept compound (CC), which encodes the syntactic and semantic information corresponding to the linguistic expression in question. The CC is realized as an instance of a data structure called the concept frame (CF), which is defined for each conceptual unit, such as an entity, a property, a relation, or a proposition, and serves as a template for permissible CC structures. CFs are distinguished from one another by the syntactic and semantic properties of the concepts they represent, and are assigned unique identifiers. CFs are classified according to their syntactic categories as sentential, nominal, adjectival, and adverbial. The CCM lexicon is a set of CFs; each entry of the lexicon defines a CF. It should be noted that in this paper inflectional information attached to each CF definition is left out for simplicity.

2.2 Syntax

In this section we define the syntax of the formal description scheme for the CF and the CC, and explain how it is interpreted. A CF comprises four types of tokens. The first is the concept identifier, which is used to indicate the relation name of the CF structure. The second token is the key-phrase, which establishes the links between the CF and the actual linguistic expressions. The third is a list of attribute values which characterize the syntactic and semantic properties of the CF. Control codes for the CCM processing system may also be included in the list. The last token is the concept pattern, which is a syntactic template to be matched to linguistic expressions. The overall structure of the CF is defined as follows:

(1) C(K, A, P),

where C and K are the concept identifier and the key-phrase respectively, A represents a list of attribute values of the concept, and P is the concept pattern, which is a sequence of several terms: variables, constants, and the symbol * which represents the key-phrase itself or one of its derivative expressions. The constant term is a word string. The variable term is accompanied by a set of codes which represent the syntactic and semantic properties imposed on a CF to be substituted for it. These codes, each of which is preceded by the symbol +, are classified into three categories: (a) constraints, (b) roles, and (c) instruction codes to be used by the CCM processing system. No reference is made to the sequencing of these codes, i.e., the code names are uniquely defined in the whole CCM code system.

The CF associated with the word break in the sense meant by John broke the box yesterday is shown in (2):

(2) break010('break', [sent, dyn, base], '$1(+nomphrs+hum+subj+agent) * $2(+nomphrs+cncrt+obj+patnt)').

In this example the identifier and the key-phrase are break010 and break respectively. The attribute list indicates that the syntactic category of this CF is sent(ential) and the semantic feature is dyn(amic). The attribute base is a control code for the CCM processing system, which will not be discussed further in this paper. The concept pattern of this CF corresponds to a subcategorization frame of the verb break. Besides the symbol * which represents the key-phrase break or one of its derivatives, the pattern includes two variable terms ($1 and $2), which are called the immediate constituents of the concept break010. The attributes appended to these variables impose conditions on the CFs substituted for them. For example, the first variable should be matched to a CF which is a nom(inal-)phr(a)s(e) with the semantic feature hum(an), and the syntactic role subj(ect) and the semantic role agent are to be assigned to the instance of this variable.

The CC is an instantiated CF and is defined as shown in (3):

(3) C(H, R, A, [X1, ..., Xn, M]),

where the concept identifier C is used to indicate the root node of the CC and represents the whole CC structure (3), and H, R, and A are the head, role, and attribute slots respectively. The head slot H is occupied by the identifier of C's head, i.e., C itself or the identifier of the component of C which determines the essential properties of C. The role slot R, which is absent in the corresponding CF definition, is filled in by a list of syntactic and semantic role names which are to be assigned to C in the concept coupling process described in Section 2.3. The last slot represents the structure of C, an instance of the concept pattern P of the corresponding CF, and is occupied by the constituent list. The members of the list, X1, X2, ..., and Xn, are the CCs corresponding to the immediate constituents of C. The tail element M of the constituent list, which is absent in non-sentential CCs, has the form md_(H, R, A, [M1, ..., Mm]), where M1, ..., Mm represent CCs which are associated with optional adverbial modifiers.

By way of example, the concept structure corresponding to the sentence in (4a) is shown in (4b), which is an instance of the CF in (2).

(4a) John broke the box yesterday.

(4b) break010
       (break010,
        [],
        [sent, dyn, fntcls, past, agr3s],
        [john0060
           (john0060,
            [subj, agent],
            [nomphrs, prop, hum, agr3s, mascln],
            []),
         box00010
           (box00010,
            [obj, patnt],
            [the_, cncrt, nomphrs, agr3s],
            []),
         md_
           ([yeste010],
            [modyr],
            [advphrs, mo~],
            [yeste010
               (yeste010,
                [],
                [advphrs, timeAdv],
                [])])]).

In (4b) three additional attributes, i.e., f(i)n(i)t(e-)cl(au)s(e), past, and agr(eement-)3(rd-person-)s(ingular), which are absent in the CF definition, enter the attribute list of break010. Also note that the constituent list of break010 contains an optional modifier component with the identifier md_, which does not have its counterpart in the corresponding CF definition given in (2).

2.3 Concept Coupling

As sketched in the last section, each linguistic expression such as a sentence or a phrase is mapped onto an instance of a CF.

Another important feature of the CC representation is that structural transformation relations can easily be established between CCs with different syntactic and semantic properties in tense, voice,
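The CF and CC formats in (1) and (3) of Section 2.2 can be illustrated with a small data-structure sketch. The Python code below is not part of the CCM system; the class names, the helper functions pattern_variables and satisfies, and the set ROLE_CODES are hypothetical names introduced here for illustration. The sketch assumes that every code other than the four role names seen in (2) acts as a constraint, ignores instruction codes, and omits the md_ modifier element of (4b).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptFrame:
    """A CF in the form C(K, A, P) of definition (1)."""
    identifier: str        # C: concept identifier
    key_phrase: str        # K: key-phrase
    attributes: List[str]  # A: attribute values and control codes
    pattern: str           # P: concept pattern, kept as a plain string here


@dataclass
class ConceptCompound:
    """A CC in the form C(H, R, A, constituent list) of definition (3)."""
    identifier: str        # C: root node of the CC
    head: str              # H: identifier of the head
    roles: List[str]       # R: roles assigned during concept coupling
    attributes: List[str]  # A: attribute values
    constituents: List["ConceptCompound"] = field(default_factory=list)

    def render(self) -> str:
        """Serialize the CC in the bracketed notation of example (4b)."""
        parts = [c.render() for c in self.constituents]
        return (f"{self.identifier}({self.head}, {_fmt(self.roles)}, "
                f"{_fmt(self.attributes)}, {_fmt(parts)})")


def _fmt(items: List[str]) -> str:
    """Format a list of strings as [a, b, c]."""
    return "[" + ", ".join(items) + "]"


# Assumption: these four codes are roles; all other codes are treated as constraints.
ROLE_CODES = {"subj", "agent", "obj", "patnt"}


def pattern_variables(pattern: str) -> Dict[int, List[str]]:
    """Map each variable number in a concept pattern to its list of + codes."""
    found: Dict[int, List[str]] = {}
    for number, body in re.findall(r"\$(\d+)\(([^)]*)\)", pattern):
        found[int(number)] = [c.strip() for c in body.split("+") if c.strip()]
    return found


def satisfies(cc: ConceptCompound, codes: List[str]) -> bool:
    """True if the CC carries every non-role code as one of its attributes."""
    return all(code in cc.attributes for code in codes if code not in ROLE_CODES)


# The CF of (2) and the nominal constituents of (4b).
break_cf = ConceptFrame(
    "break010", "break", ["sent", "dyn", "base"],
    "$1(+nomphrs+hum+subj+agent) * $2(+nomphrs+cncrt+obj+patnt)",
)
john = ConceptCompound("john0060", "john0060", ["subj", "agent"],
                       ["nomphrs", "prop", "hum", "agr3s", "mascln"])
box = ConceptCompound("box00010", "box00010", ["obj", "patnt"],
                      ["the_", "cncrt", "nomphrs", "agr3s"])
sentence = ConceptCompound("break010", "break010", [],
                           ["sent", "dyn", "fntcls", "past", "agr3s"],
                           [john, box])
variables = pattern_variables(break_cf.pattern)
```

Under these assumptions, satisfies(john, variables[1]) holds because john0060 carries both nomphrs and hum, whereas satisfies(john, variables[2]) fails because john0060 lacks cncrt. Calling sentence.render() reproduces the bracketed form of (4b) without the modifier element.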