
A GRAMMAR AND A PARSER FOR SPONTANEOUS SPEECH

Mikio Nakano, Akira Shimazu, and Kiyoshi Kogure
NTT Basic Research Laboratories
3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa, 243-01 Japan
{nakano, shimazu, kogure}@atom.ntt.jp

ABSTRACT

This paper classifies distinctive phenomena occurring in Japanese spontaneous speech and proposes a grammar and processing techniques for handling them. Parsers using a grammar for written sentences cannot deal with spontaneous speech, because spontaneous speech contains phenomena that do not occur in written sentences. A grammar based on an analysis of dialogue transcripts was therefore developed. It has two distinctive features: it uses short units as input units instead of the sentences used in grammars for written sentences, and it covers utterances including phrases peculiar to spontaneous speech. Since the grammar is an augmentation of a grammar for written sentences, it can also be used to analyze complex utterances. Incorporating the grammar into the distributed natural language processing model described elsewhere enables the handling of utterances including a variety of phenomena peculiar to spontaneous speech.

1 INTRODUCTION

Most dialogue understanding studies have focused on the mental states, plans, and intentions of the participants (Cohen et al., 1990). These studies have presumed that utterances can be analyzed syntactically and semantically and that the representation of the speech acts performed by those utterances can be obtained. Spontaneously spoken utterances differ considerably from written sentences, however, so it is not possible to analyze them syntactically and semantically when using a grammar for written sentences.

Spontaneous speech, a sequence of spontaneously spoken utterances, can be distinguished from well-planned utterances such as radio news and movie dialogues. Much effort has been put into incorporating grammatical information into speech understanding (e.g., Hayes et al. (1986), Young et al. (1989), Okada (1991)), but because this work has focused on well-planned utterances, spontaneously spoken utterances have received little attention. This has partly been due to the lack of a grammar and processing technique that can be applied to spontaneous speech. Consequently, to attain an understanding of dialogues it is necessary to develop a way to analyze spontaneous speech syntactically and semantically.

There are two approaches to developing this kind of analysis method: one is to develop a grammar and analysis method for spontaneous speech that do not depend on syntactic constraints as much as the conventional methods for written sentences do (Den, 1993), and the other is to augment the grammar used for written sentences and modify the conventional analysis method to deal with spontaneous speech. The former method would fail, however, when new information is conveyed in the dialogue; that is, when the semantic characteristics of the dialogue topic are not known to the hearer. In such cases, even in a dialogue, the syntactic constraints are used for understanding utterances. Because methods that disregard syntactic constraints would not work well in these kinds of cases, we took the latter approach.

We analyzed more than a hundred dialogue transcripts and classified the distinctive phenomena in spontaneous Japanese speech. To handle those phenomena, we developed a computational model called the Ensemble Model (Shimazu et al., 1993b), in which syntactic, semantic, and pragmatic processing modules, and modules that combine some or all of those processings, analyze the input in parallel and independently. Even if some of the modules are unable to analyze the input, the other modules still output their results. This model can handle various kinds of irregular expressions, such as case particle omission, inversions, and fragmentary expressions.

We also developed Grass-J (Grammar for spontaneous speech in Japanese), which enables the syntactic and semantic processing modules of the Ensemble Model to deal with some of the phenomena peculiar to spontaneous speech. Since Grass-J is an augmentation of a grammar used to analyze written sentences (Grat-J, Grammar for texts in Japanese), Grass-J-based parsers can be used for syntactically complex utterances.

There are two distinctive features of Grass-J. One is that its focus is on the short units in spontaneous speech, called utterance units. An utterance unit, instead of a sentence as in Grat-J, is used as a grammatical category and is taken as the start symbol. A Grass-J-based parser takes an utterance unit as input and outputs the representation of the speech act (illocutionary act) performed by the unit. The other distinctive feature is a focus on expressions peculiar to spontaneous speech, and here we explain how to augment Grat-J so that it can handle them. Previous studies of spontaneous speech analysis have focused mainly on repairs and ellipses (Bear et al., 1992; Langer, 1990; Nakatani & Hirschberg, 1993; Otsuka & Okada, 1992), rather than on expressions peculiar to spontaneous speech.

This paper first describes Grat-J and then classifies distinctive phenomena in Japanese spontaneous speech. It then describes Grass-J and presents several analysis examples.

2 A GRAMMAR FOR WRITTEN SENTENCES

Grat-J, a grammar for written sentences, is a unification grammar loosely based on Japanese Phrase Structure Grammar (JPSG) (Gunji, 1986). Of the six phrase structure rules used in Grat-J, the three related to the discussion in the following sections are shown in Fig. 1 in a PATR-II-like notation (Shieber, 1986). (Footnote 1: Rules for relative clauses and for noun-phrase coordinations are not shown here.) Lexical items are represented by feature structures, an example of which is shown in Fig. 2.

1. Subcategorization rule (rule for NP-with-particle + VP constructions)
   M -> C H
   (M head) = (H head)
   (H subcat) = (M subcat) U {C}
   (M adjacent) = nil
   (H adjacent) = nil
   (M adjunct) = (H adjunct)
   (M lexical) = -
   (M sem index) = (H sem index)
   (M sem restric) = (C sem restric) U (H sem restric)
   Symbols M, C, and H are not names of categories but variables, or identifiers of root nodes in the graphs representing feature structures. M, C, and H correspond to mother, complement daughter, and head daughter. The head daughter's subcat feature value is a set of feature structures.

2. Adjacency rule (rule for VP + AUXV constructions, NP + particle constructions, etc.)
   M -> A H
   (M head) = (H head)
   (M subcat) = (H subcat)
   (H adjacent) = {A}
   (M adjacent) = nil
   (M adjunct) = (H adjunct)
   (M lexical) = -
   (M sem index) = (H sem index)
   (M sem restric) = (A sem restric) U (H sem restric)
   M, A, and H correspond to mother, adjacent daughter, and head daughter. The head daughter's adjacent feature value is unified with the adjacent daughter's feature structure.

3. Adjunction rule (rule for modifier + modifiee constructions)
   M -> A H
   (M head) = (H head)
   (M subcat) = (H subcat)
   (H adjacent) = nil
   (A adjunct) = {H}
   (M lexical) = -
   (M sem index) = (H sem index)
   (M sem restric) = (A sem restric) U (H sem restric)
   M, A, and H correspond to mother, adjunct daughter (modifier), and head daughter (modifiee). The adjunct daughter's adjunct feature value is the feature structure for the head daughter.

Fig. 1: Phrase structure rules in Grat-J.

   head     [pos verb, infl sentence-final]
   subcat   { [head [pos noun, case ga (NOM)], sem [index *x]],
              [head [pos noun, case o (ACC)],  sem [index *y]] }
   adjacent nil
   adjunct  nil
   lexical  +
   sem      [index *e, restric {(love *e) (agent *e *x) (patient *e *y)}]

Fig. 2: Feature structure for the verb 'aisuru' (love).

Grat-J-based parsers generate semantic representations in logical form in Davidsonian style. The semantic representation in each lexical item consists of a variable called an index (the feature (sem index)) and restrictions placed on it (the feature (sem restric)). Every time a phrase structure rule is applied, these restrictions are aggregated and a logical form is synthesized.

For example, let us again consider 'aisuru' (love). If, in the feature structure for the phrase 'Taro ga' (Taro-NOM), the (sem index) value is *x and the (sem restric) value is {(taro *x)}, then after the subcategorization rule is applied the (sem restric) value in the resulting feature structure for the phrase 'Taro ga aisuru' (Taro loves) is {(taro *x) (love *e) (agent *e *x) (patient *e *y)}.

Grat-J covers such fundamental Japanese phenomena as subcategorization, passivization, interrogation, coordination, and negation, and also covers copulas, relative clauses, and conjunctions. We developed a parser based on Grat-J by using bottom-up chart parsing (Kay, 1980). Unification operations are performed by using constraint projection, an efficient method for unifying disjunctive feature descriptions (Nakano, 1991). The parser is implemented in Lucid Common Lisp ver. 4.0.
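To make the restriction-aggregation step concrete, the following Python sketch mimics it under strong simplifications: feature structures are flattened to small dictionaries, subcategorization is reduced to discharging one requirement, and no real unification (let alone constraint projection over disjunctive descriptions) is performed. It is an illustration of the idea only, not the Lucid Common Lisp implementation mentioned above.

def lex(pos, index, restric, subcat=()):
    """Toy lexical entry: a semantic index plus Davidsonian restrictions."""
    return {"pos": pos, "index": index,
            "restric": set(restric), "subcat": list(subcat)}

# 'aisuru' (love): event index *e; subcategorizes for ga- and o-marked NPs.
aisuru = lex("verb", "*e",
             [("love", "*e"), ("agent", "*e", "*x"), ("patient", "*e", "*y")],
             subcat=[("ga", "*x"), ("o", "*y")])

# 'Taro ga' (Taro-NOM): index *x, restriction {(taro *x)}.
taro_ga = lex("np", "*x", [("taro", "*x")])

def apply_subcat_rule(complement, head):
    """Toy subcategorization rule: the mother keeps the head's index,
    takes the union of both daughters' restrictions, and discharges one
    of the head's subcat requirements."""
    return {"pos": head["pos"],
            "index": head["index"],
            "restric": complement["restric"] | head["restric"],
            "subcat": head["subcat"][1:]}

taro_ga_aisuru = apply_subcat_rule(taro_ga, aisuru)
print(sorted(taro_ga_aisuru["restric"]))
# -> (agent *e *x), (love *e), (patient *e *y), (taro *x), as in the text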

3 DISTINCTIVE PHENOMENA IN JAPANESE SPONTANEOUS SPEECH

3.1 Classification of Phenomena

We analyzed 97 telephone dialogues (about 300,000 bytes) about using LaTeX to prepare documents and 26 dialogues (about 160,000 bytes) obtained from three radio listener call-in programs (Shimazu et al., 1993a). We found that augmenting the grammars and analysis methods requires taking into account at least the following six phenomena in Japanese spontaneous speech.

(p1) expressions peculiar to Japanese spontaneous speech, including fillers (or hesitations)
(ex.) 'etto aru ndesukedomo ...', 'kono fairu wa ...' (well, we have them..., this file is...)

(p2) particle (case particle) omission
(ex.) 'sore watashi yarimasu' (I will do it.)

(p3) main verb ellipsis, or fragmentary utterances
(ex.) 'aa, shinkansen de Kyoto kara.' (uh, from Kyoto by Shinkansen line.)

(p4) repairing phrases
(ex.) 'ano chosya no, chosya no arufabetto jun ni naranda, indekkusu nai?' (well, are there, aren't there indices ordered alphabetically by authors' names?)

(p5) inversion
(ex.) 'kopii shite kudasai, sono ronbun.' (That paper, please copy.)

(p6) semantic mismatch of the theme/subject and the main verb
(ex.) 'rikuesuto no uketsukejikan wa, 24-jikan jouji uketsuke teori masu.' (The hours we receive your requests, they are received 24 hours a day.)

A: 1 anoo kisoken
      well the Basic Research Labs.
      eno ikikata o desune
      to  how to go ACC
      'well, how to go to the Basic Research Labs.'
B: 2 hai
      uh-huh
      'uh-huh'
A: 3 chotto shira nai nde
      well   know  NOT because
      'because I don't know well'
   4 oshie teitadaki tai ndesukedo
      tell  HAVE-A-FAVOR want
      'I'd like you to tell me it'

Fig. 4: Dialogue 1.

3.2 Treatment of the Phenomena by the Ensemble Model

These kinds of phenomena can be handled by the Ensemble Model. As described in Section 1, the Ensemble Model has syntactic, semantic, and pragmatic processing modules, and modules that combine some or all of those processings, to analyze the input in parallel and independently. Their output is unified, and even if some of the modules are unable to analyze the input, the other modules output their own results. This makes the Ensemble Model robust. Moreover, even if some of the modules are unable to analyze the input in real time, the others output their results in real time.

The Ensemble Model has been partially implemented. Ensemble/Trio-1 consists of syntactic, semantic, and syntactic-semantic modules; it can handle (p2) above, as described in detail elsewhere (Shimazu et al., 1993b). Phenomena (p3) through (p6) can be partly handled by another implementation of the Ensemble Model, Ensemble/Quartet-1, which has a pragmatic processing module as well as the three modules of Ensemble/Trio-1. The pragmatic processing module uses plan and domain knowledge to handle not only well-structured sentences but also ill-structured sentences, such as those including inversion and omission (Kogure et al., 1994).

To make the system more robust by enabling the syntactic and semantic processing modules to handle phenomena (p1) and (p3) through (p6), we incorporated Grass-J into those modules. Grass-J differs from Grat-J in two ways: Grass-J has lexical entries for expressions peculiar to spontaneous speech, so that it can handle (p1); and because sentence boundaries are not clear in spontaneous speech, it uses the concept of an utterance unit (Shimazu et al., 1993a) instead of a sentence. This allows it to handle phenomena (p3) through (p6). For example, an inverted sentence can be handled by decomposing it, at the point where the inversion occurs, into two utterance units.

Fig. 3 shows the architecture of Ensemble/Quartet-1. Each processing module is based on the bottom-up chart analysis method (Kay, 1980) and a disjunctive feature description unification method called constraint projection (Nakano, 1991). The syntactic-semantic processing module uses Grass-J, the syntactic processing module uses Grass-J without semantic constraints such as sortal restrictions, the semantic processing module uses Grass-J without syntactic constraints such as case information, and the pragmatic processing module uses a plan-based grammar.


Fig. 3: Architecture of Ensemble/Quartet-1 (diagram omitted).
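As a rough illustration of the parallel-and-independent behavior sketched in Fig. 3, and not the actual Ensemble/Quartet-1 code, the Python fragment below runs placeholder modules concurrently and merges whatever results come back; the module bodies and the merging step are assumptions made only for this sketch.

from concurrent.futures import ThreadPoolExecutor

def syntactic(utt):
    # Placeholder: the real module parses with Grass-J minus semantic constraints.
    return {"syn": "parse(%s)" % utt}

def semantic(utt):
    # Placeholder: the real module builds a logical form.
    return {"sem": "logical_form(%s)" % utt}

def pragmatic(utt):
    # Simulate a module that fails on this input.
    raise ValueError("no plan found")

MODULES = [syntactic, semantic, pragmatic]

def ensemble_analyze(utt):
    """Run all modules independently; a failing module does not block the rest."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        futures = [pool.submit(module, utt) for module in MODULES]
        for future in futures:
            try:
                results.update(future.result())
            except Exception:
                pass  # the other modules still deliver their output
    return results

print(ensemble_analyze("sore watashi yarimasu"))
# -> syntactic and semantic results even though the pragmatic module failed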

4 A GRAMMAR FOR SPONTANEOUS SPEECH

This section describes Grass-J.

4.1 Processing Units

'Sentence' is used as the start symbol in grammars for written language, but sentence boundaries are not clear in spontaneous speech. 'Sentence' therefore cannot be used as the start symbol in grammars for spontaneous speech. Many studies, though, have shown that utterances are composed of short units (Levelt, 1989: pp. 23-24) that need not be sentences in written language. Grass-J uses such units instead of sentences.

Consider, for example, Dialogue 1 in Fig. 4. Utterances 1 and 3 cannot be regarded as sentences in written language. Let us, however, consider 'hai' in Utterance 2. It expresses participant B's confirmation of the contents of Utterance 1. (Footnote 2: The roles of 'hai', an interjectory response corresponding to a back-channel utterance such as uh-huh in English but which occurs more frequently in Japanese dialogue, are discussed in Shimazu et al. (1993a) and Katagiri (1993).) Each utterance in Dialogue 1 can thus be considered to be a speech act (Shimazu et al., 1993a). These utterances are the processing units we call utterance units. They are used in Grass-J instead of the sentences used in Grat-J. One feature of these units is that 'hai' can be interjected by the hearer at the end of the unit.

The boundaries for these units can be determined by using pauses, linguistic clues described in the next section, syntactic form, and so on. In using syntactic form to determine utterance unit boundaries, Grass-J first stipulates what an utterance unit actually is. This stipulation is based on an investigation of dialogue transcripts, and in the current version of Grass-J the following syntactic constituents are recognized as utterance units:

• verb phrases (including ... phrases) that may be followed by conjunctive particles and sentence-final particles
• noun phrases, which may be followed by particles
• interjections
• conjunctions

Grass-J includes a bundle of phrase structure rules used to derive a speech act representation from the logical form of these constituents. A Grass-J-based parser inputs an utterance unit and outputs the representation of the speech act performed by the unit, which is then input to the discourse processing system.

Consider the following simple dialogue.

A: 1 genkou o
      manuscript ACC
      'The manuscript'
B: 2 hai
      uh-huh
      'uh-huh'
A: 3 okut tekudasai
      send please
      'please send me'

The logical form for Utterance 1 is ((manuscript *x)), so its resulting speech act representation is

(1) ((refer *e) (agent *e *a) (speaker *a) (object *e *x) (manuscript *x))

(Footnote 3: 'Refer' stands for the surface referring act in Allen and Perrault (1980).)

or, as written in the usual notation,

(2) Refer(speaker, ?x:manuscript(?x)).

In the same way, the speech act representation for Utterance 3 is

(3) Request(speaker, hearer, send(hearer, speaker, ?y)).

The discourse processor would find that ?x in (2) is the same as ?y in (3). A detailed explanation of this discourse processing is beyond the scope of this paper.
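The following hedged sketch shows one way the mapping from an utterance unit to a speech-act representation in the spirit of (1)-(3) could be written down; the unit categories, predicate names, and the rule for choosing the act type are simplifications introduced here for illustration, not the actual Grass-J phrase structure rules.

def speech_act(unit_category, logical_form):
    """Wrap an utterance unit's logical form in a speech-act representation."""
    if unit_category == "NP":
        # A bare noun phrase unit performs a surface referring act.
        return ("refer", "speaker", logical_form)
    if unit_category == "VP-request":
        # A '... te kudasai' unit performs a request addressed to the hearer.
        return ("request", "speaker", "hearer", logical_form)
    # Default: treat other declarative units as informing acts.
    return ("inform", "speaker", logical_form)

# Utterance 1, 'genkou o': Refer(speaker, ?x:manuscript(?x))
print(speech_act("NP", {("manuscript", "?x")}))

# Utterance 3, 'okut tekudasai': Request(speaker, hearer, send(hearer, speaker, ?y))
print(speech_act("VP-request",
                 {("send", "?e"), ("agent", "?e", "hearer"),
                  ("recipient", "?e", "speaker"), ("object", "?e", "?y")}))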

4.2 Treatment of Expressions Peculiar to Spontaneous Speech

Classification

The underlined expressions in Dialogue 1 in Fig. 4 do not normally appear in written sentences. We analyzed the dialogue transcripts to identify expressions that frequently appear in spoken sentences (which include spontaneous speech) but that do not appear in written sentences, and we classified them as follows.

1. words phonologically different from those in written sentences (words in parentheses are the corresponding written-sentence words)
(ex.) 'shinakya' ('shinakereba', if someone does not do), 'shichau' ('shiteshimau', have done)

2. fillers (or hesitations, such as well in English)
(ex.) 'etto', 'anoo'

3. particles peculiar to spoken language
(ex.) 'tte', 'nante', 'toka'

4. interjectory particles (words inserted interjectorily after phrases and adverbial/adnominal-form verb phrases)
(ex.) 'ne', 'desune', 'sa'

5. expressions introducing topics
(ex.) '(na)ndesukedo', '(na)ndesukedomo', '(na)ndesuga'

6. words appearing after main verb phrases (these words take the sentence-final form of verbs/auxiliary verbs)
(ex.) 'yo', 'ne', 'yone', 'keredo', 'kedo', 'keredomo', 'ga', 'kedomo', 'kara'

Nagata and Kogure (1990) addressed Japanese sentence-final expressions peculiar to spoken Japanese sentences but did not deal with all the spontaneous-speech expressions listed above. These expressions may be analyzed morphologically (Takeshita & Fukunaga, 1991). Because some expressions peculiar to spontaneous speech do not affect the propositional content of the sentences, disregarding those expressions might be a way to process spontaneous speech. Such cascaded processing of morphological analysis and syntactic and semantic analysis, however, disables the incremental processing required for real-time dialogue understanding. Another approach is to treat these kinds of expressions as extra, 'noisy' words. Although this can be done by using a robust parsing technique, such as the one developed by Mellish (1989), it requires the sentence to be processed more than two times, and it is therefore not suitable for real-time dialogue understanding. In Grass-J these expressions are handled in the same way as expressions appearing in written language, so no special techniques are needed.

1. expressions related to aspect
   teku (teiku in written language), teru (teiru), chau (teshimau), etc.
2. expressions related to the topic marker 'wa'
   cha (tewa), chaa (tewa), ccha (tewa), ja (dewa), etc.
3. expressions related to the conjunctive particle 'ba'
   nakerya (nakereba), nakya (nakereba), etc.
4. expressions related to formal nouns
   n (no), mon (mono), toko (tokoro), etc.
5. expressions related to demonstratives
   kocchi (kochira), korya (korewa), so (sou), soshitara (soushitara), sokka (souka), socchi (sochira), son (sono), soreja (soredewa), sorejaa (soredewa), etc.
6. expressions related to the interrogative 'nani'
   nanka (nanika), nante (nanito), etc.
7. other
   mokkai (mouikkai), etc.

Fig. 5: Words that in spoken language differ from corresponding words in written language.

Words phonologically different from corresponding words in written language

The words 'teru' and 'ndesu' in 'shit teru ndesu ka' (do you know that?) correspond semantically to 'teiru' and 'nodesu' in written sentences. We investigated such words in the dialogue data (Fig. 5). One way to handle these words is to translate them into their corresponding written-language words, but because this requires several steps it is not suitable for incremental dialogue processing. We therefore regard these words as independent of their corresponding words in written language, even though their lexical entries have the same content.

Fillers

Fillers such as 'anoo' and 'etto', which roughly correspond to well in English, appear frequently in spontaneous speech (Arita et al., 1993) and do not affect the propositional content of the sentences in which they appear. (Footnote 4: Although Sadanobu and Takubo (1993) investigated the discourse management function of fillers, we do not discuss it here.) One way to handle them is to disregard them after morphological analysis is completed. As noted above, however, such an approach is not suitable for dialogue processing. We therefore treat them directly in parsing.

In Grass-J, fillers modify the following words, whatever their grammatical categories are. The feature structure for fillers is as follows.

   head     [pos interjection]
   subcat   { }
   adjunct  [lexical +]
   adjacent nil
   lexical  +
   sem      [restric { }]

The value of the feature lexical is either + or -: it is + in lexical items and - in feature structures for phrases composed, by phrase structure rules, of subphrases. Because these words do not affect propositional contents, the value of the feature (sem restric) is empty.

For example, let us look at the parse tree for 'etto 400-yen desu' (well, it's 400 yen). Symbols I (Interjection), NP, and VP are abbreviations for the complex feature structures.

   VP
   |-- NP
   |   |-- I: etto
   |   |-- N: 400-yen
   |-- V: desu

The filler 'etto' modifies the following word '400-yen', and the logical form of the sentence is the same as that of '400-yen desu'.

Particles peculiar to spoken language

Words such as 'tte' in 'Kyoto tte Osaka no tsugi no eki desu yone' (Kyoto is the station next to Osaka, isn't it?) work in the same way as case-marking/topic-marking particles. Because they have no corresponding words in written language, lexical entries for them are required. These words do not correspond to any specific surface case, such as 'ga' and 'o'. Like the topic marker 'wa', the semantic relationships they express depend on the meaning of the phrases they connect.

Interjectory particles

Interjectory particles, such as 'ne' and 'desune', follow noun phrases and adverbial/adnominal-form verb phrases, and they do not affect the meaning of the utterances. The interjectory particle 'ne' differs from the sentence-final particle 'ne' in the sense that the latter follows sentence-final-form verb phrases. These kinds of words can be treated by regarding them as particles following noun phrases and verb phrases. The following is the feature structure for these words.

   head     *1
   subcat   { }
   adjunct  nil
   adjacent [head *1, sem [index *2]]
   lexical  +
   sem      [index *2, restric { }]

The interjectory particles indicate the end of utterance units; they do not appear in the middle of utterance units. They function as, so to speak, utterance-unit-final particles. Therefore, a noun phrase followed by an interjectory particle forms a (surface) referring speech act in the same way as noun phrase utterances do. Interjectory particles add nothing to logical forms. For example, the speech act representation of 'genkou o desune' is the same as (2) in Section 4.1.
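To summarize how these spoken-language items could sit in the lexicon, here is a hedged sketch in the same flattened style as before: a filler and an interjectory particle carry an empty (sem restric), so composition leaves the logical form untouched, while a phonological variant such as 'teru' is an independent entry whose content equals that of written-language 'teiru'. The attribute names and the 'transparent' flag are an encoding invented for this sketch, not part of the Grass-J grammar source.

def entry(pos, restric=(), index=None, transparent=False):
    # 'transparent' marks items (fillers, interjectory particles) whose
    # phrase keeps the semantics of the word or phrase they attach to.
    return {"pos": pos, "index": index,
            "restric": frozenset(restric), "transparent": transparent}

LEXICON = {
    "etto":   entry("interjection", transparent=True),          # filler
    "anoo":   entry("interjection", transparent=True),          # filler
    "desune": entry("interjectory-particle", transparent=True), # after NP/VP
    # phonological variants: independent entries with identical content
    "teru":   entry("auxv", [("progressive", "*e")], index="*e"),
    "teiru":  entry("auxv", [("progressive", "*e")], index="*e"),
}

def attach(item, phrase):
    """Toy composition: a transparent item adds nothing to the logical form,
    so 'etto 400-yen desu' ends up with the logical form of '400-yen desu'."""
    if item["transparent"]:
        return phrase
    return {"pos": phrase["pos"], "index": phrase["index"],
            "restric": item["restric"] | phrase["restric"],
            "transparent": False}

yen_400_desu = entry("vp", [("price", "*e", "400-yen")], index="*e")
assert attach(LEXICON["etto"], yen_400_desu) == yen_400_desu
assert LEXICON["teru"]["restric"] == LEXICON["teiru"]["restric"]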

Expressions introducing topics

As in Utterance 4 of Dialogue 1, an expression such as '(na)ndesukedo(mo)' frequently appears in dialogues, especially at the beginning. This expression introduces a new topic. One way to handle an expression such as this is to break it down into na + ndesu + kedo + mo. This process, however, prevents the system from detecting its role in topic introduction. We therefore consider each of these expressions to be one word. The reason these expressions are used is to make a topic explicit by introducing a discourse referent (Thomason, 1990). Consequently, an 'introduce-topic' speech act is formed. These expressions indicate the end of an utterance unit, as an interjectory particle does.

Words appearing after main verb phrases

It has already been pointed out that sentence-final particles, such as 'yo' and 'ne', frequently appear in spoken Japanese sentences (Kawamori, 1991). Conjunctive particles, such as 'kedo' and 'kara', are also used as sentence-final particles (Hosaka et al., 1991), and they are treated as such in Grass-J. They perform the function of anticipating the hearer's reaction, as a trial expression does (Clark & Wilkes-Gibbs, 1990). They also indicate the end of utterance units.

5 ANALYSIS EXAMPLES

Below we show results obtained by using a Grass-J-based parser to analyze some of the utterances in Dialogue 1. UU means the utterance unit category.

• Utterance 1: 'anoo kisoken eno ikikata o desune' (well, how to go to the Basic Research Labs.)

parse tree:

   UU
   |-- NP
       |-- NP
       |   |-- NP
       |   |   |-- NP
       |   |   |   |-- NP
       |   |   |   |   |-- I: anoo
       |   |   |   |   |-- N: kisoken
       |   |   |   |-- P: eno
       |   |   |-- N: ikikata
       |   |-- P: o
       |-- P: desune

speech act representation:

   index = *X29
   restriction =
   ((REFER *X29) (OBJECT *X29 *X30)
    (AGENT *X29 *X31) (SPEAKER *X31)
    (BASIC-RESEARCH-LABS *X32)
    (DESTINATION *X30 *X32)
    (HOW-TO-GO *X30))

• Utterance 4: 'oshie teitadaki tai ndesukedo' (I'd like you to tell me it)

parse tree:

   UU
   |-- VP
       |-- VP
       |   |-- VP
       |   |   |-- V: oshie
       |   |   |-- AUXV: teitadaki
       |   |-- AUXV: tai
       |-- P: ndesukedo

speech act representation:

   index = *X777
   restriction =
   ((INTRODUCE-TOPIC *X777)
    (OBJECT *X777 *X778)
    (AGENT *X777 *X779)
    (SPEAKER *X779)
    (TELL *X780)
    (AGENT *X780 *X808)
    (OBJECT *X780 *X809)
    (PATIENT *X780 *X810)
    (HAVE-A-FAVOR *X784)
    (OBJECT *X784 *X780)
    (AGENT *X784 *X811)
    (SOURCE *X784 *X808)
    (WANT *X778)
    (OBJECT *X778 *X784)
    (AGENT *X778 *X811))
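As a final hedged sketch tying the Utterance 4 output above to the 'introduce-topic' analysis of Section 4.2, the fragment below wraps a unit's content in an introduce-topic act. The helper name, the variable names, and the flat set encoding are invented for this illustration and only mirror the shape of the representation shown above.

def introduce_topic(content_restric, content_index, speaker="*s"):
    """A topic-introducing expression such as '(na)ndesukedo' closes the
    utterance unit and wraps its content in an introduce-topic act."""
    act = "*a"  # index of the introduce-topic act itself
    return frozenset({("introduce-topic", act),
                      ("agent", act, speaker),
                      ("speaker", speaker),
                      ("object", act, content_index)}) | content_restric

# Simplified content of 'oshie teitadaki tai': want(have-a-favor(tell(...)))
want = frozenset({("want", "*w"), ("object", "*w", "*f"),
                  ("have-a-favor", "*f"), ("object", "*f", "*t"),
                  ("tell", "*t")})

for term in sorted(introduce_topic(want, "*w")):
    print(term)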

6 CONCLUSION

We have developed a grammar, called Grass-J, for handling distinctive phenomena in spontaneous speech. The grammatical analysis of spontaneous speech is useful in combining the fruits of dialogue understanding research with those of speech processing research. As described earlier, Grass-J is used as the grammar for the experimental systems Ensemble/Trio-1 and Ensemble/Quartet-1, which are based on the Ensemble Model. It enables the processing of several kinds of spontaneous speech, such as speech lacking particles.

We focused on processing transcripts because a grammar and an analysis method for spontaneous speech can be combined with speech processing systems more accurately than can those for written languages.

Finally, although we focused only on Japanese spontaneous speech, most of the techniques described in this paper can also be used to analyze spontaneous speech in other languages.

ACKNOWLEDGEMENTS

We thank Chung Pai Ling, Yuiko Otsuka, Miyoko Sou, Kaeko Matsuzawa, and Sanae Nagata for helping us analyze dialogue data.

REFERENCES

Allen, J. F., & Perrault, C. R. (1980). Analyzing Intention in Utterances. Artificial Intelligence, 15, 143-178.

Arita, H., Kogure, K., Nogaito, I., Maeda, H., & Iida, H. (1993). Media-Dependent Conversation Manners. In SIG-NL, Information Processing Society of Japan. (in Japanese).

Bear, J., Dowding, J., & Shriberg, E. (1992). Integrating Multiple Knowledge Sources for the Detection and Correction of Repairs in Human-Computer Dialog. In ACL-92, pp. 56-63.

Clark, H. H., & Wilkes-Gibbs, D. (1990). Referring as a Collaborative Process. In Cohen, P. R., Morgan, J., & Pollack, M. E. (Eds.), Intentions in Communication, pp. 463-493. MIT Press.

Cohen, P. R., Morgan, J., & Pollack, M. E. (Eds.). (1990). Intentions in Communication. MIT Press.

Den, Y. (1993). A Study on Spoken Dialogue Grammar. SIG-SLUD-9302-5, Japanese Society of AI, 33-40. (in Japanese).

Gunji, T. (1986). Japanese Phrase Structure Grammar. Reidel, Dordrecht.

Hayes, P. J., Hauptmann, A. G., Carbonell, J. G., & Tomita, M. (1986). Parsing Spoken Language: A Semantic Caseframe Approach. In COLING-86, pp. 587-592.

Hosaka, J., Takezawa, T., & Ehara, T. (1991). Constructing Syntactic Constraints for Speech Recognition Using Empirical Data. In SIG-NL-83, Information Processing Society of Japan, pp. 97-104. (in Japanese).

Katagiri, Y. (1993). Dialogue Coordination Functions of Japanese Sentence-Final Particles. In Proceedings of International Symposium on Spoken Dialogue, pp. 145-148.

Kawamori, M. (1991). Japanese Sentence Final Particles and Epistemic Modality. SIG-NL, Information Processing Society of Japan, 41-48. (in Japanese).

Kay, M. (1980). Algorithm Schemata and Data Structures in Syntactic Processing. Tech. rep. CSL-80-12, Xerox PARC.

Kogure, K., Shimazu, A., & Nakano, M. (1994). Plan-Based Utterance Understanding. In Proceedings of the 48th Conference of Information Processing Society of Japan, Vol. 3, pp. 189-190. (in Japanese).

Langer, H. (1990). Syntactic Normalization of Spontaneous Speech. In COLING-90, pp. 180-183.

Levelt, W. J. M. (1989). Speaking. MIT Press.

Mellish, C. (1989). Some Chart-Based Techniques for Parsing Ill-Formed Input. In ACL-89, pp. 102-109.

Nagata, M., & Kogure, K. (1990). HPSG-Based Lattice Parser for Spoken Japanese in a Spoken Language Translation System. In ECAI-90, pp. 461-466.

Nakano, M. (1991). Constraint Projection: An Efficient Treatment of Disjunctive Feature Descriptions. In ACL-91, pp. 307-314.

Nakatani, C., & Hirschberg, J. (1993). A Speech-First Model for Repair Detection and Correction. In ACL-93, pp. 46-53.

Okada, M. (1991). A Unification-Grammar-Directed One-Pass Search Algorithm for Parsing Spoken Language. In Proceedings of ICASSP-91.

Otsuka, H., & Okada, M. (1992). Incremental Elaboration in Generating Spontaneous Speech. SIG-NLC92, Institute of Electronics, Information and Communication Engineers. (in Japanese).

Sadanobu, T., & Takubo, Y. (1993). The Discourse Management Function of Fillers: A Case of "eeto" and "ano(o)". In Proceedings of International Symposium on Spoken Dialog, pp. 271-274.

Shieber, S. M. (1986). An Introduction to Unification-Based Approaches to Grammar. CSLI Lecture Notes Series No. 4. Stanford: CSLI.

Shimazu, A., Kawamori, M., & Kogure, K. (1993a). Analysis of Interjectory Responses in Dialogue. SIG-NLC-93-9, Institute of Electronics, Information and Communication Engineers. (in Japanese).

Shimazu, A., Kogure, K., & Nakano, M. (1993b). An Experimental Distributed Natural Language Processing System and Its Application to Robust Processing. In Proceedings of the Symposium on Implementation of Natural Language Processing. Institute of Electronics, Information and Communication Engineers / Japan Society for Software Science and Technology. (in Japanese).

Takeshita, A., & Fukunaga, H. (1991). Morphological Analysis for Spoken Language. In Proceedings of the 42nd Conference of Information Processing Society of Japan, Vol. 3, pp. 5-6. (in Japanese).

Thomason, R. H. (1990). Accommodation, Meaning, and Implicature: Interdisciplinary Foundations for Pragmatics. In Cohen, P. R., Morgan, J., & Pollack, M. E. (Eds.), Intentions in Communication, pp. 325-364. MIT Press.

Young, S. R., Hauptmann, A. G., Ward, W. H., Smith, E. T., & Werner, P. (1989). High Level Knowledge Sources in Usable Speech Recognition Systems. Communications of the ACM, 32(2), 183-194.
