
Stochastic Natural Language Generation for Spoken Dialog Systems

Alice H. Oh

July 10, 2000

The two dominant approaches to language generation, template-based and rule-based (linguistic) NLG, have limitations when applied to spoken dialog systems. For both techniques, there is a tradeoff between development and maintenance efforts and the quality of the output. This tradeoff can be lessened by a corpus-based technique that takes advantage of certain characteristics of a task-oriented spoken dialog. In this paper I will discuss corpus-based stochastic language generation at two levels: content selection and surface realization. At the content selection level, the utterances are modeled by bigrams, and the appropriate attributes are chosen using the bigram statistics. For surface realization, the utterances in the corpus are modeled by n-grams and each new utterance is generated stochastically. This paper presents the details of the implementation in the CMU Communicator, some preliminary evaluation results, and the potential contribution to the general problem of natural language generation for spoken dialog, written dialog, and text generation.

1 Introduction

As the field of natural language generation (NLG) is maturing and many useful technologies are being developed for text generation, the time is ripe for the community to broaden its horizon and look at other potential applications. Spoken dialog is a field where good NLG techniques are very important. However, we feel that the current NLG technologies are not adequate for use in spoken dialog systems.

In developing and maintaining a natural language generation (NLG) module for a spoken dialog system, we realized that the current NLG technologies are impractical for our purposes. While several general-purpose rule-based generation systems have been developed (cf. Elhadad and Robin, 1996), they are often quite difficult to adapt to small, task-oriented applications because of their generality. To solve this problem, several people have proposed different solutions. Bateman and Henschel (1999) have described a lower-cost and more efficient generation system for a specific application using an automatically customized subgrammar. Busemann and Horacek (1998) describe a system that mixes templates and rule-based generation. This approach takes advantage of templates and rule-based generation as needed by specific sentences or utterances. Stent (1999) has also proposed a similar approach for a spoken dialog system. However, for all of these, there is still the burden of writing and maintaining grammar rules. In addition, we suspect that generating sentences with grammar rules at run-time would still be too slow to be usable in spoken dialog systems (only the average time for template-based and rule-based sentences combined is reported in Busemann and Horacek, 1998).

Because comparatively less effort is needed, many current dialog systems use template-based generation. But there is one obvious disadvantage to templates: the quality of the output depends entirely on the set of templates. Even in a relatively simple domain, such as travel reservations, the number of templates necessary for reasonable quality can become so large that maintenance becomes a serious problem. There is an unavoidable trade-off between the amount of time and effort in creating and maintaining templates and the variety and quality of the output utterances.

Given these shortcomings of the above approaches, we developed a corpus-based stochastic generation system, in which we model language spoken by domain experts performing the task of interest, and use that model to stochastically generate system utterances. We have applied this technique to sentence realization and content planning, and have incorporated the resulting generation component into a working natural dialog system. In our evaluation experiments, this technique performed well for our spoken dialog system. This shows that the corpus-based approach is a promising avenue to further explore.

2 Natural Language Generation for Spoken Dialog Systems

Natural Language Generation (NLG) and Spoken Dialog Systems are two distinct and non-overlapping research fields within language technologies. Most researchers in these two fields had not worked together until very recently. However, since every spoken dialog system needs an NLG component, spoken dialog researchers have started to look at what technologies the NLG community can provide them. While that may be a good approach, it is also possible for the spoken dialog community to take the initiative and contribute to the NLG community. This work is a first attempt at such a contribution. It is not merely an application of NLG techniques to spoken dialog; it goes beyond that to introduce a novel technique effective for a subset of NLG applications. In the next two sections, ……

2.1 Natural Language Generation

Natural language generation is the process of generating text from a meaning representation. It can be thought of as the reverse of natural language understanding (see Figure 1). Researchers acknowledge the importance of NLG as one half of natural language processing, but compared to NLU, there has been a definite shortage of research in NLG. This can be partly explained by the fact that NLU, at least until now, has had more application potential, due to the enormous amount of text present in the world (Mellish and Dale, 1998). In contrast, it is unclear what the input to NLG should be, and other than in systems where the input to NLG is automatically created by another module, all input must be created somehow just for this purpose.

Just as there are different sub-tasks within NLU, there are sub-tasks within NLG. There has been no consensus among NLG researchers as to where the boundaries are, and what each of the sub-tasks spans.[1] Table 1 illustrates three different segmentations.

Figure 1: Comparing natural language understanding (NLU) and natural language generation (NLG). NLU maps text to a semantic (syntactic) representation; NLG maps a semantic (syntactic) representation to text.

Reiter, 1995            Nirenburg, et al. 1989    Mellish and Dale, 1998
Content/text Planning   Content Delimitation      Content Determination
                        Text Structuring          Document Structuring
Sentence Planning       Lexical Selection         Lexicalization
                        Syntactic Selection
                        Coreference Treatment     Referring Expression Generation
                        Constituent Ordering      Aggregation
Syntactic Realization   Realization               Surface Realization

Table 1: Three different segmentations of sub-modules of NLG

2.2 Spoken Dialog Systems

A spoken dialog system enables human-computer interaction via spoken natural language. A task-oriented spoken dialog system speaks as well as understands natural language to complete a well-defined task. This is a relatively new research area, but many task-oriented spoken dialog systems are already fairly advanced. Examples include a complex travel planning system (Rudnicky, et al. 1999), a publicly available worldwide weather information system (Zue, et al. 2000), and an automatic call routing system (Gorin, et al. 1997).

[1] In response to this problem, the RAGS project is aiming to establish a reference architecture (a more functional model) of NLG. See Cahill, et al., 1999.

Building a generation module for a spoken dialog system differs from the more traditional NLG problem of generating documents, but it is a very interesting problem that can provide a novel way of looking at NLG. The following are some characteristics of task-oriented spoken dialog systems:

1. The language used in spoken dialog is different from the language used in written text in terms of vocabulary, sentence length, and syntax.
2. The language used in task-oriented dialogs tends to be very domain-specific. The domains for these systems are fairly narrow.
3. NLG is usually not the main focus in building/maintaining these systems. Yet the NLG module is critical in development and system performance.

Taking those characteristics into account, NLG for task-oriented spoken dialog systems must be able to

1. generate language appropriate for spoken interaction,
2. generate domain-specific language; the lexicon must contain appropriate words for the domain, and
3. enable fast prototyping; development of the NLG module should not be the bottleneck in developing the whole system.

Also, to satisfy the overall goal of the task-oriented spoken dialog system, the NLG component must be able to carry out a natural conversation, elicit appropriate responses from the user, prevent user confusion, and guide the user in cases of confusion.

3 Modeling Human-Human Interaction

Since it is not clear how best to design human-computer spoken interactions, especially in deciding what the system prompts should be, the obvious choice is to use models of human-human interactions. Boyce and Gorin (1996) support this argument by their definition of a natural dialog: “[a dialog] that closely resembles a conversation two humans might have”. Applying this definition to the NLG module of the spoken dialog system, we can build computational models of a domain expert having a conversation with a client, and use those models to generate system utterances.

Knowledge acquisition is another name for building these models of human-human interaction. For many domains, acquiring the correct lexicon items or grammar rules is not a trivial task, and to date, most researchers have relied on informal methods of knowledge acquisition (KA). Reiter, Robertson, and Osman (2000) have begun exploring more principled ways of KA with their recent experiment with structured knowledge acquisition techniques. Although the technique presented here is much simpler than theirs, concentrating mostly on acquisition of the lexicon, it can be thought of as an efficient and effective way of automatically acquiring knowledge needed for NLG.

3.1 Corpora

We used two corpora in the travel reservations domain to build n-gram language models. One corpus (henceforth, the CMU corpus) consists of 39 dialogs between a travel agent and clients (Eskenazi, et al. 1999). Another corpus (henceforth, the SRI corpus) consists of 68 dialogs between a travel agent and users in the SRI community (Kowtko and Price 1989).

query_arrive_city          inform_airport
query_arrive_time          inform_confirm_utterance
query_arrive_time          inform_epilogue
query_confirm              inform_flight
query_depart_date          inform_flight_another
query_depart_time          inform_flight_earlier
query_pay_by_card          inform_flight_earliest
query_preferred_airport    inform_flight_later
query_return_date          inform_flight_latest
query_return_time          inform_not_avail
hotel_car_info             inform_num_flights
hotel_hotel_chain          inform_price
hotel_hotel_info           other
hotel_need_car
hotel_need_hotel
hotel_where

Figure 2: utterance classes

airline           depart_date
arrive_airport    depart_time
arrive_city       flight_num
arrive_date       hotel_city
arrive_time       hotel_price
car_company       name
car_price         num_flights
depart_airport    pm
depart_city       price

Figure 3: word classes

3.2 Tagging

The utterances in the two corpora were tagged with utterance classes and word classes (see Figure 2 and Figure 3). The CMU corpus was tagged manually, and back-off trigram models were built from it (using Clarkson and Rosenfeld, 1997). These language models were then used to automatically tag the SRI corpus; the tags were checked manually.

3.2.1 Issues for Human-Computer Interaction

Several issues arise when using computational models of human-human interaction for spoken dialog systems. First, as Boyce and Gorin (1996) also point out, there are some user and system utterances in human-computer interactions that do not occur in normal human-human interactions. These include more frequent confirmations, error and help messages, and less frequent backchanneling responses. Also, the quality of the output depends very much on the expert whose language is modeled. This means the selection process for the expert is crucial for system performance. Another issue is that, while the models of human-human interaction may result in natural dialogs, they may not lead to an efficient dialog.

4 Content Planning

Content planning is a process where the system decides which attributes (represented as word classes, see Figure 3) should be included in an utterance. In a task-oriented dialog, the number of attributes generally increases during the course of the dialog. Therefore, as the dialog progresses, the system needs to decide which ones to include at each system turn. If the system includes all of them every time (indirect echoing, see Hayes and Reddy, 1983), the utterances become overly lengthy, but if we remove all unnecessary attributes, the user may get confused. With a fairly high recognition error rate, this becomes an even more important issue.

The problem, then, is to find a compromise between the two. We compared two ways to systematically generate system utterances with only selected attributes, such that the user hears repetition of some of the constraints he/she has specified, at appropriate points in the dialog, without sacrificing naturalness and efficiency. The specific problems, then, are deciding what should be repeated, and when. We first describe a simple heuristic of old versus new information. Then we present a statistical approach, based on bigram models.

4.1 Using Heuristics

As a simple solution, we can use the previous dialog history, by tagging the attribute-value pairs as old (previously said by the system) information or new (not said by the system yet) information. The generation module would select only new information to be included in the system utterances. Consequently, information given by the user is repeated only once in the dialog, usually in the utterance immediately following the user utterance in which the new information was given.[2]
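This old/new heuristic can be sketched in a few lines. The attribute names and data structures below are illustrative, not those of the actual system:

```python
# Hypothetical sketch of the old/new heuristic: keep only attributes
# the system has not yet echoed, then mark them as old.
def select_new_attributes(frame, already_said):
    """Return only the attribute-value pairs not yet said by the system."""
    selected = {k: v for k, v in frame.items() if k not in already_said}
    already_said.update(selected)  # everything echoed now counts as old
    return selected

said = set()
turn1 = select_new_attributes({"depart_city": "New York"}, said)
turn2 = select_new_attributes({"depart_city": "New York",
                               "arrive_city": "Boston"}, said)
# turn1 echoes depart_city; turn2 echoes only the newly given arrive_city
```

As the surrounding text notes, each user-given constraint is echoed exactly once under this scheme.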

Although this approach seems to work fairly well, echoing the user’s constraints only once may not be the right thing to do. Looking at human-human dialogs, we observe that this is not very natural for a conversation: humans often repeat mutually known information, and they also often do not repeat some information at all. Also, this model does not capture the close relationship between two consecutive utterances within a dialog. The second approach tries to address these issues.

4.2 Statistical Approach

For this approach, we built a two-stage statistical model of human-human dialogs using the CMU corpus. The model first predicts the number of attributes in the system utterance given the utterance class, then predicts the attributes given the attributes in the previous user utterance.

4.2.1 Models

4.2.1.1 The number of attributes model

The first model predicts the number of attributes in a system utterance given the utterance class. The model is the probability distribution P(n_k | c_k), where n_k is the number of attributes and c_k is the utterance class for system utterance k.

4.2.1.2 The bigram model of the attributes

This model predicts which attributes to use in a system utterance. Using a statistical model, what we need to do is find the set of attributes A* = {a_1, a_2, ..., a_n} such that

A* = arg max P(a_1, a_2, ..., a_n)

We assume that the distributions of the a_i’s are dependent on the attributes in the previous utterances. As a simple model, we look only at the utterance immediately preceding the current utterance and build a bigram model of the attributes. In other words, A* = arg max P(A | B), where B = {b_1, b_2, ..., b_m}, the set of m attributes in the preceding user utterance.

If we took the above model and tried to apply it directly, we would run into a serious data sparseness problem, so we make two independence assumptions. The first assumption is that the attributes in the user utterance contribute independently to the probabilities of the attributes in the system utterance following it. Applying this assumption to the model above, we get the following:

A* = arg max Σ_{k=1}^{m} P(b_k) P(A | b_k)

The second independence assumption is that the attributes in the system utterance are independent of each other. This gives the final model that we used for selecting the attributes:

A* = arg max Σ_{k=1}^{m} P(b_k) Π_{i=1}^{n} P(a_i | b_k)

Although this independence assumption is an oversimplification, this simple model is a good starting point for our initial implementation of this approach.

[2] When the system utterance uses a template that does not contain the slots for the new information given in the previous user utterance, that new information will be confirmed in the next available system utterance in which the template contains those slots.
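The two-stage model can be sketched as follows. Under the independence assumptions, each candidate attribute a can be scored individually by Σ_k P(b_k) P(a | b_k) and the top n kept, which is a further simplification of the arg max over whole sets. All probability tables here are toy values invented for illustration, not estimates from the CMU corpus:

```python
# Toy sketch of the two-stage content model; tables are illustrative only.
def select_attributes(utt_class, user_attrs, p_num, p_attr, p_user):
    # Stage 1: most likely number of attributes n for this utterance class
    n = max(p_num[utt_class], key=p_num[utt_class].get)
    # Stage 2: score each candidate a by sum_k P(b_k) * P(a | b_k)
    candidates = {a for b in user_attrs for a in p_attr.get(b, {})}
    def score(a):
        return sum(p_user.get(b, 0.0) * p_attr.get(b, {}).get(a, 0.0)
                   for b in user_attrs)
    return sorted(candidates, key=score, reverse=True)[:n]

p_num = {"query_confirm": {1: 0.2, 2: 0.7}}          # P(n | c)
p_attr = {"depart_city": {"depart_city": 0.9, "arrive_city": 0.3},
          "arrive_city": {"arrive_city": 0.8, "depart_date": 0.4}}
p_user = {"depart_city": 0.5, "arrive_city": 0.5}    # P(b_k)
chosen = select_attributes("query_confirm",
                           ["depart_city", "arrive_city"],
                           p_num, p_attr, p_user)
# scores: arrive_city 0.55, depart_city 0.45, depart_date 0.20 → top 2 kept
```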

5 Surface Realization

A definition of surface realization given in Mellish and Dale (1998) is as follows:

Determining how the underlying content of a text should be mapped into a sequence of grammatically correct sentences. … An NLG system has to decide which syntactic form to use, and it has to ensure that the resulting text is syntactically and morphologically correct.

One technique for surface realization is using templates, which is probably the most popular in terms of actual text generation applications. At the other end of the spectrum is a technique based on generation grammar rules, which most research systems use. Recently, there has been work on hybrid systems and stochastic methods. The next section describes these techniques, gives examples, and then analyzes the characteristics of the techniques to compare them to one another.

5.1 Existing Approaches

5.1.1 Template-based Technique

In a template-based generation system, the developer hand-crafts a set of templates and canned expressions to be used. The Apple Macintosh Balloon Help system, an example of a template system, produces the sentence

This is the kind of item displayed at left. This shows that test data is a(n) Microsoft Word document.

where “Microsoft Word” has been inserted into the appropriate slot.

The template-based technique is popular among business applications where similar documents are produced in large quantities (e.g., customer service letters). It is also used often in dialog systems where there is only a finite and limited set of system output sentences.

Input to a template-based system can be minimally specified. In the example above, only the sentence type and the slot-filler “Microsoft Word” are needed. However, the syntax and morphology of the output are not as sophisticated, as evidenced by the “a(n)” in the example above. The quality and flexibility of the output sentences depend very much on the templates, as there needs to be a template for every kind of sentence to be generated. In principle, a template-based system can generate every sentence a rule-based system can generate, if the set of templates covers all of the possible sentences the grammar rules can generate. Obviously, that is not very practical.
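A minimal illustration of template-based generation: a lookup of a template by utterance class followed by slot substitution. The template strings and class names below are invented for this sketch, not taken from any deployed system:

```python
# Toy template-based generator: one template per utterance class,
# slots filled by keyword substitution.
templates = {
    "query_depart_time": "What time would you like to leave {depart_city}?",
    "inform_num_flights": "I found {num_flights} flights from {depart_city}.",
}

def generate(utt_class, slots):
    return templates[utt_class].format(**slots)

generate("query_depart_time", {"depart_city": "New York"})
# → "What time would you like to leave New York?"
```

As the text notes, the quality of such a system is bounded by the template set: any sentence type without a template simply cannot be produced.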

Depending on the domain, handcrafting the templates can be a trivial task or a very difficult one. However, no special knowledge (e.g., computational linguistics) other than domain knowledge is necessary to build templates, which makes them much easier to write than generation grammar rules, for which an expert grammar writer is needed. As mentioned above, every sentence to be generated must have its own template, so maintenance of templates is nontrivial if the set of output sentences must change on a regular basis. This is comparatively more work than updating a generation grammar, where one addition of a rule can yield a whole set of new sentences.

5.1.2 Grammar Rule-Based Technique

Many research NLG systems use generation grammar rules, much like parsers with semantic or syntactic grammars. Most generation grammars are syntactic, and rely on one or more linguistic theories. A good example of a rule-based surface realizer is SURGE (Elhadad and Robin, 1996). It incorporates a couple of different linguistic theories, namely systemic grammar and lexicalist linguistics. One drawback of relying on specific linguistic theories is that no one theory covers all possible sentence constructions. SURGE, even with its combination of theories, still cannot generate some sentences for which the underlying theories do not provide rules. Nevertheless, generation grammar rules enable an NLG system to have wide coverage, be domain independent, and be reusable, as proven by the many very different applications that use SURGE as their surface realizer.

Input to SURGE, as to all other rule-based systems, needs to be very richly specified, with features such as verb tense, number, NP heads, categorization, definiteness, and so on (see Figure 2). This is definitely the biggest disadvantage of the rule-based technique. Most systems do provide default values, but using the default values too often defeats the purpose of the rule-based technique; it would produce output equivalent to that of a poorly developed set of templates. With richly specified input, though, SURGE produces a wide variety of high-quality sentences. Some examples include (Elhadad and Robin, 1999):

Michael Jordan scored two-thirds of his 36 points with 3-point shots, to enable Chicago to hand New York a season-high sixth straight loss.

Dallas, TX – Charles Barkley matched his season record with 42 points Friday night as the Phoenix Suns handed the Dallas Mavericks their franchise-worst 27th defeat in a row at home, 123-97.

cat clause
process       type material
              effect-type creative
              lex "score"
tense past
participants
    agent     cat proper
              head cat person-name
                   first-name [lex "Michael"]
                   last-name  [lex "Jordan"]
    created   cat np
              cardinal [value 36]
              definite no
              head [lex "point"]

Figure 2: Input for the sentence “Michael Jordan scored 36 points.” (Elhadad and Robin, 1999)

As is true for parsing grammars, generation grammar rules take much effort and time to develop. The set of grammar rules is the core knowledge source of any rule-based NLG system, and the quality of the output and the coverage of sentence types depend on it. Not only do the rules take time and effort to develop; only a highly skilled grammar writer can write them. Hence, if the set of output sentences changes periodically, new rules must be added, and such maintenance is costly. One advantage, though, is that the rules are domain independent, so they can be reused in many applications, provided the input specification conforms.

5.1.3 Hybrid Technique: Rules + Statistics

This technique was developed to overcome the drawback of the rule-based technique that the input needs to be richly specified. This technique uses bigram language models to compensate for missing features in the input to generate the correct syntax and morphology in the output sentences. Nitrogen, a generation engine within a machine translation system, is the first to apply this technique (Langkilde and Knight, 1998). It first generates a lattice of possible sentences, then uses the bigram statistics to choose the best one.

5.2 Corpus-based Approach

If a natural human-computer dialog is one that closely resembles a human-human conversation, the best method for generating natural system utterances would be to mimic human utterances. In our case, where the system is acting as a travel agent, the solution would be to use a human travel agent’s utterances. The computational model we chose to use is the simple n-gram model used in speech recognition.

5.2.1 Implementation

We have implemented a hybrid NLG module combining three different techniques: canned expressions, templates, and corpus-based stochastic generation. For example, at the beginning of the dialog, a system greeting can be generated simply by a “canned” expression. Other short, simple utterances can be generated efficiently by templates. Then, for the remaining utterances, where there is a good match between human-human interaction and human-computer interaction, we use the statistical language models.

There are four aspects to our stochastic surface realizer: building language models, generating candidate utterances, scoring the utterances, and filling in the slots. We explain each of these below.

5.2.1.1 Building Language Models

Using the tagged utterances described in section 3.2, we built an unsmoothed n-gram language model for each utterance class. Tokens that belong in word classes (e.g., “U.S. Airways” in class “airline”) were replaced by the word classes before building the language models. We selected 5 as the n in n-gram to introduce some variability into the output utterances while preventing nonsense utterances.
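Building an unsmoothed n-gram model per utterance class amounts to counting continuations of each (n-1)-word history. The corpus format below (one string per tagged utterance, word-class tokens already substituted) is an assumption of this sketch:

```python
from collections import defaultdict

# Sketch of building an unsmoothed n-gram count table for one
# utterance class; histories are padded with <s> and closed with </s>.
def build_lm(utterances, n=5):
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = ["<s>"] * (n - 1) + utt.split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])
            counts[history][tokens[i]] += 1  # count of w_i after its history
    return counts

lm = build_lm(["what time would you like to leave {depart_city}"])
```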

Note that language models are not used here in the same way as in speech recognition. In speech recognition, the language model probability acts as a prior in determining the most probable sequence of words given the acoustics. In other words,

W* = arg max P(W | A) = arg max P(A | W) P(W)

where W is the string of words w_1, ..., w_n, and A is the acoustic evidence (Jelinek 1998).

Although we use the same statistical tool, we compute and use the language model probability directly to predict the next word. In other words, the most likely utterance is W* = arg max P(W|u), where u is the utterance class. We do not, however, look for the most likely hypothesis, but rather generate each word randomly according to the distribution, as illustrated in the next section.

5.2.1.2 Generating Utterances

The input to NLG from the dialog manager is a frame of attribute-value pairs. The first two attribute-value pairs specify the utterance class. The rest of the frame contains word classes and their values. Figure 4 is an example of an input frame to NLG. The generation engine uses the appropriate language model for the utterance class and generates word sequences randomly according to the language model distributions. As in speech recognition, the probability of a word under the n-gram language model is P(w_i) = P(w_i | w_{i-1}, w_{i-2}, ..., w_{i-(n-1)}, u), where u is the utterance class. Since we have built separate models for each of the utterance classes, we can drop u and say that P(w_i) = P(w_i | w_{i-1}, w_{i-2}, ..., w_{i-(n-1)}) using the language model for u.
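The word-by-word sampling can be sketched as follows; `lm` maps an (n-1)-token history to next-word counts, as built above. The paper uses n=5, but a 3-gram toy model keeps the example small; the toy model here is deterministic by construction:

```python
import random

# Sketch of stochastic generation: sample each word according to the
# unsmoothed n-gram distribution until </s> or a length cap.
def generate_utterance(lm, n, max_words=30):
    tokens = ["<s>"] * (n - 1)
    for _ in range(max_words):
        history = tuple(tokens[-(n - 1):])
        nexts = lm[history]                       # only seen continuations
        word = random.choices(list(nexts),
                              weights=list(nexts.values()))[0]
        if word == "</s>":
            break
        tokens.append(word)
    return " ".join(tokens[n - 1:])

toy_lm = {("<s>", "<s>"): {"what": 3},
          ("<s>", "what"): {"time": 2},
          ("what", "time"): {"</s>": 1}}
generate_utterance(toy_lm, n=3)
# this toy model has one continuation per history, so always "what time"
```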

{
  act          query
  content      depart_time
  depart_city  New York
  arrive_city  San Francisco
  depart_date  19991117
}

Figure 4: an input frame to NLG

Since we use unsmoothed 5-grams, we will not generate any unseen 5-grams (or smaller n-grams at the beginning and end of an utterance). This precludes generation of nonsense utterances, at least within the 5-word window. Using a smoothed n-gram would result in more randomness, but using the conventional back-off methods (Jelinek 1998), the probability mass assigned to unseen 5-grams would be very small, and those rare occurrences of unseen n-grams may not make sense anyway. There is the problem, as in speech recognition using n-gram language models, that long-distance dependency cannot be captured.

5.2.1.3 Scoring Utterances

For each randomly generated utterance, we compute a penalty score. The score is based on heuristics we have empirically selected. Various penalty scores are assigned for an utterance that

1. is too short or too long (determined by utterance-class dependent thresholds),
2. contains repetitions of any of the slots,
3. contains slots for which there is no valid value in the frame, or
4. does not have some required slots (see section 4 for deciding which attributes are required).

The generation engine generates candidate utterances, scoring each and keeping only the best-scored utterance so far. It stops and returns the best utterance when it finds one with a zero penalty score, or when it runs out of time.
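This generate-and-score loop can be sketched as below. The sampler and the penalty function are stand-ins for the stochastic generator and the four heuristics above; the candidate strings and the toy penalty are invented for illustration:

```python
# Sketch of the generate-and-score loop: keep the lowest-penalty
# candidate, stop early when a zero-penalty utterance is found.
def best_utterance(sample, penalty, max_tries=50):
    best, best_pen = None, float("inf")
    for _ in range(max_tries):
        utt = sample()
        pen = penalty(utt)
        if pen < best_pen:
            best, best_pen = utt, pen
        if pen == 0:
            break                 # perfect candidate found; stop early
    return best

candidates = iter(["leave leave {depart_city}",      # repeated slot word
                   "what time would you like to leave {depart_city}"])
penalty = lambda u: 10 if "leave leave" in u else 0  # toy heuristic
best = best_utterance(lambda: next(candidates), penalty)
```

In the real system the time budget, rather than a fixed try count, bounds the loop.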

5.2.1.4 Filling Slots

The last step is filling slots with the appropriate values. For example, the utterance “What time would you like to leave {depart_city}?” becomes “What time would you like to leave New York?”.
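Slot filling is plain string substitution over the {slot} syntax shown above; a minimal sketch:

```python
# Fill each {slot} placeholder with its value from the input frame.
def fill_slots(utterance, frame):
    for slot, value in frame.items():
        utterance = utterance.replace("{" + slot + "}", value)
    return utterance

filled = fill_slots("What time would you like to leave {depart_city}?",
                    {"depart_city": "New York"})
# → "What time would you like to leave New York?"
```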

6 Evaluation and Discussions

It is generally difficult to evaluate a generation system. In the context of spoken dialog systems, evaluation of NLG becomes an even more difficult problem. One reason is simply that there has been little effort in building sophisticated generation engines for spoken dialog systems. Another reason is that it is hard to separate the NLG module from the rest of the system. It is especially hard to separate evaluation of language generation and speech synthesis.

6.1 Methods used in text generation

6.2 Methods used in spoken dialog systems

6.3 Comparative Evaluation

As a simple solution, we have conducted a comparative evaluation by running two identical systems varying only the generation component. In this section we present results from two preliminary evaluations of our generation algorithms described in the previous sections.

6.3.1 Experiment

6.3.2 Results

7 Conclusion

8 Acknowledgements

9 References

S. Axelrod. 2000. Natural language generation in the IBM Flight Information System. In Proceedings of the ANLP/NAACL 2000 Workshop on Conversational Systems, May 2000, pp. 21-26.

J. Bateman and R. Henschel. 1999. From full generation to ‘near-templates’ without losing generality. In Proceedings of the KI'99 workshop, "May I Speak Freely?"

S. Boyce and A. Gorin. 1996. User interface issues for natural spoken dialog systems. In Proceedings of the International Symposium on Spoken Dialog, pp. 65-68. October 1996.

S. Busemann and H. Horacek. 1998. A flexible shallow approach to text generation. In Proceedings of the International Natural Language Generation Workshop. Niagara-on-the-Lake, Canada.

Cahill, Doran, Evans, Mellish, Paiva, Reape, Scott, and Tipper. 1999. In search of a reference architecture for NLG systems. In Proceedings of European Workshop on Natural Language Generation, 1999.

P. Clarkson and R. Rosenfeld. 1997. Statistical Language Modeling using the CMU-Cambridge toolkit. In Proceedings of Eurospeech97.

M. Elhadad and J. Robin. 1996. An Overview of SURGE: A reusable comprehensive syntactic realization component, Technical Report 96-03, Dept of Mathematics and Computer Science, Ben Gurion University, Beer Sheva, Israel.

M. Eskenazi, A. Rudnicky, K. Gregory, P. Constantinides, R. Brennan, C. Bennett, J. Allen. 1999. Data Collection and Processing in the Carnegie Mellon Communicator. In Proceedings of Eurospeech, 1999, 6, 2695-2698.

A.L. Gorin, G. Riccardi, and J.H. Wright. 1997. How May I Help You? Speech Communication, 23, pp. 113-127.

J. Kowtko and P. Price. 1989. Data collection and analysis in the air travel planning domain. In Proceedings of DARPA Speech and Natural Language Workshop, October 1989.

I. Langkilde and K. Knight. 1998. The practical value of n-grams in generation. In Proceedings of the International Natural Language Generation Workshop, 1998.

C. Mellish and R. Dale. 1998. Evaluation in the context of natural language generation. Computer Speech and Language, vol. 12, pp. 349-373.

S. Nirenburg, V. Lesser, and E. Nyberg. 1989. Controlling a language generation planner. In Proceedings of IJCAI-89. Detroit, MI.

E. Reiter. 1995. NLG vs. Templates. In Proceedings of European Natural Language Generation Workshop.

E. Reiter, R. Robertson, and L. Osman. 2000. Knowledge acquisition for natural language generation. In Proceedings of the First International Natural Language Generation Conference, June 2000.

A. Rudnicky, E. Thayer, P. Constantinides, C. Tchou, R. Shern, K. Lenzo, W. Xu, and A. Oh. 1999. Creating natural dialogs in the Carnegie Mellon Communicator system. Proceedings of Eurospeech, 1999, 4, 1531-1534.

A. Rudnicky and W. Xu. 1999. An agenda-based dialog management architecture for spoken language systems. IEEE Automatic Speech Recognition and Understanding Workshop, 1999, p I-337.

A. Stent. 1999. Content planning and generation in continuous-speech spoken dialog systems. In Proceedings of the KI'99 workshop, "May I Speak Freely?"

V. Zue, et al. 2000. JUPITER: A Telephone-Based Conversational Interface for Weather Information. IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000.
