Proc. Natl. Acad. Sci. USA Vol. 92, pp. 9983-9988, October 1995 Colloquium Paper
This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9, 1993.
Integration of speech with natural language understanding

ROBERT C. MOORE

Artificial Intelligence Center, SRI International, Menlo Park, CA 94025
ABSTRACT    The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The goal of integrating speech recognition with natural language understanding is to produce spoken-language-understanding systems: systems that take spoken language as their input and respond in an appropriate way depending on the meaning of the input. Since speech recognition (1) aims to transform speech into text, and natural-language-understanding systems (2) aim to understand text, it might seem that spoken-language-understanding systems could be created by the simple serial connection of a speech recognizer and a natural-language-understanding system. This naive approach is less than ideal for a number of reasons, the most important being the following:

* Spontaneous spoken language differs in a number of ways from standard written language, so that even if a speech recognizer were able to deliver a perfect transcription to a natural-language-understanding system, performance would still suffer if the natural language system were not adapted to the characteristics of spoken language.

* Current speech recognition systems are far from perfect transcribers of spoken language, which raises questions about how to make natural-language-understanding systems robust to recognition errors and whether higher overall performance can be achieved by a tighter integration of speech recognition and natural language understanding.

* Spoken language contains information that is not necessarily represented in written language, such as the distinctions between words that are pronounced differently but spelled the same, or syntactic and semantic information that is encoded prosodically in speech. In principle it should be possible to extract this information to solve certain understanding problems more easily using spoken input than using a simple textual transcription of that input.

This paper looks at how these issues are being addressed in current research in the ARPA Spoken Language Program. I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.

COPING WITH SPONTANEOUS SPOKEN LANGUAGE

Language Phenomena in Spontaneous Speech

The participants in the ARPA Spoken Language Program have adopted the interpretation of requests for air travel information as a common task to measure progress in research on spoken language understanding. In support of this effort, over 15,000 utterances have been collected from subjects by using either a simulated or actual spoken language Air Travel Information System (ATIS). Interpreting these utterances requires dealing with a number of phenomena that one would not encounter often in dealing with linguists' examples or even real written texts.

Among the most common types of nonstandard utterances in the data are sentence fragments, sequences of fragments, or fragments combined with complete sentences:

    six thirty a m from atlanta to san francisco what type of aircraft
    on the delta flight number ninety eight what type of aircraft
    i would like information on ground transportation city of boston between airport and downtown

A particular subclass of these utterances might be dubbed "afterthoughts." These consist of an otherwise well-formed sentence followed by a fragment that further restricts the initial request:

    i'd like a return flight from denver to atlanta  evening flights
    i need the cost of a ticket going from denver to baltimore  a first class ticket on united airlines
    what kind of airplane goes from philadelphia to san francisco monday stopping in dallas in the afternoon  first class flight

Another important group of nonstandard utterances can be classified as verbal repairs or self-corrections, in which the speaker intends that one or more words be replaced by subsequently uttered words. In the following examples, groups of words that are apparently intended for deletion are enclosed in brackets:

    i'd like [to] a flight from washington [to] that stops in denver and goes on to san francisco
    [do any of these flights] [do] are there any flights that arrive after five p m
    can you give me information on all the flights [from san francisco no] from pittsburgh to san francisco on monday
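The bracketed examples above suggest how strongly patterned many repairs are: the reparandum often ends just where a word is about to be repeated. The following toy sketch is illustrative only; it is not the detection method of any system discussed in this paper, and real repair detectors use much richer patterns and acoustic cues. It simply brackets a candidate reparandum whenever a word recurs within a few positions:

```python
def bracket_repairs(words, max_gap=3):
    """Toy repair detector: if a word recurs within max_gap positions,
    bracket the span from its first occurrence up to (but not
    including) the repeat as a candidate reparandum."""
    out, i = [], 0
    while i < len(words):
        for j in range(i + 1, min(i + 1 + max_gap, len(words))):
            if words[j] == words[i]:           # word is about to repeat
                out.append("[" + " ".join(words[i:j]) + "]")
                i = j                          # resume at the repeat
                break
        else:                                  # no repeat found nearby
            out.append(words[i])
            i += 1
    return " ".join(out)
```

On "i need a flight a flight to denver" this yields "i need [a flight] a flight to denver". It would, of course, also bracket legitimate repetitions, which is one reason actual systems combine such patterns with other evidence.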
Some utterances involve use of metonymy, in which a word or phrase literally denoting one type of entity is used to refer to a related type of entity:

    i need flight information between atlanta and boston
    what is the flight number that leaves at twelve twenty p m
    what delta leaves boston for atlanta

In the first two utterances, properties of flights are attributed to flight information and flight numbers; in the third, the name delta is used to refer to flights on Delta Airlines.

Some utterances that perhaps could be viewed as cases of metonymy might better be interpreted simply as slips of the tongue:

    does delta aircraft fly d c tens

In this utterance aircraft has simply been substituted for airlines, perhaps because of the phonetic similarity between the two words and semantic priming from the information being requested.

Extending the linguistic rules of a system to include nonstandard but regular patterns still leaves disfluencies and truly novel uses of language unaccounted for. To deal with these, virtually all systems developed for the ARPA ATIS task incorporate some sort of language-understanding method that does not depend on deriving a complete analysis of the input that accounts for every word. Such methods are usually described as "robust," although they are actually robust only along certain dimensions and not others. A common strategy is to have predefined patterns (case frames or templates) for the most common sorts of queries in the ATIS task and to scan the input string for words or phrases that match the elements of the pattern, allowing other words in the utterance to be skipped. The Carnegie-Mellon University (CMU) Phoenix system (3) and SRI International's Template Matcher (4) rely exclusively on this approach. In the SRI system, for example, a key word such as flight, fly, go, or travel is used to trigger the template for a request for flight information and phrases matching patterns such as on
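The keyword-and-pattern strategy just described can be made concrete with a small sketch. This is illustrative only: the trigger words flight, fly, go, and travel come from the text above, but the city lexicon, the slot rules, and the match_template function are invented for the example, not taken from Phoenix or the Template Matcher:

```python
# Toy sketch of keyword-triggered template matching: trigger words
# select a template, slot values are picked out locally, and every
# other word in the utterance is simply skipped.
FLIGHT_TRIGGERS = {"flight", "flights", "fly", "go", "travel"}
CITIES = {"denver", "atlanta", "boston", "baltimore", "pittsburgh",
          "philadelphia", "dallas"}   # illustrative lexicon only

def match_template(utterance):
    words = utterance.split()
    if not set(words) & FLIGHT_TRIGGERS:
        return None                    # no template triggered
    template = {"request": "flight_info"}
    for i in range(len(words) - 1):
        w, nxt = words[i], words[i + 1]
        if w == "from" and nxt in CITIES:
            template["origin"] = nxt
        elif w == "to" and nxt in CITIES:
            template["destination"] = nxt
    return template
```

Applied to the afterthought example "i'd like a return flight from denver to atlanta evening flights", it returns {"request": "flight_info", "origin": "denver", "destination": "atlanta"}, skipping every word that matches no pattern, which is exactly what makes this style of processing robust to fragments and disfluencies.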
ROBUSTNESS TO RECOGNITION ERRORS

Since even the best speech recognition systems make at least some errors in a substantial proportion of utterances, coping with speech recognition errors is one of the major challenges for correctly interpreting spoken language. This problem can be approached in a number of ways. If there are particular types of errors that the recognizer is especially prone to make, the natural-language-understanding system can be modified to accommodate them. For instance, four can be allowed in place of for, or the deletion of short, frequently reduced words, such as to, can be permitted.

For the most part, however, current systems rely on their general robust understanding capability to cope with noncritical recognition errors. Although it has not been carefully measured, anecdotally it appears that a high proportion of recognition errors made by the better-performing recognizers for the ATIS task occur in portions of the utterance that are noncritical for robust interpretation methods. It may be conjectured that this is due to the fact that most of the critical key words and phrases are very common in the training data for the task and are therefore well modeled both acoustically and in the statistical language models used by the recognizers.

NATURAL LANGUAGE CONSTRAINTS IN RECOGNITION

Models for Integration

Despite the surprising degree of robustness of current ATIS systems in coping with speech recognition errors, Table 1 also reveals that rates for understanding errors are still substantially higher with speech input than with text input, ranging from 1.26 to 1.76 times higher, depending on the system. One possible way to try to close this gap is to use information from the natural language processor as an additional source of constraint for the speech recognizer. Until recently, most attempts to do this have followed what might be called "the standard model":

    Pick as the preferred hypothesis the string with the highest recognition score that can be completely parsed and interpreted by the natural language processor.

This model was embodied in several systems developed under the original ARPA Speech Understanding Program of the 1970s (11) and also in some of the initial research in the current ARPA Spoken Language Program (12-15). However, the model depends on having a natural language grammar that accurately models the speech being recognized. For the kind of messy, ill-formed spoken language presented here in the section "Language Phenomena in Spontaneous Speech," this presents a serious problem. It is highly unlikely that any conventional grammar could be devised that would cover literally everything a person might actually utter.

The dilemma can be seen in terms of the kind of natural language system, discussed in the section "Strategies for Handling Spontaneous Speech Phenomena," that first attempts a complete linguistic analysis of the input and falls back on robust processing methods if that fails. If the grammar used in attempting the complete linguistic analysis is incorporated into the speech recognizer according to the standard model, the recognizer will be overconstrained, and the robust processor will never be invoked, because only recognition hypotheses that can be completely analyzed linguistically will ever be selected. This means that, for cases in which the robust processor should have been used, the correct recognition hypothesis will not even be considered. On the other hand, if the robust language processor were incorporated into the speech recognizer according to the standard model, it would provide very little information, since it is designed to try to make sense out of almost any word string.

A number of modifications of the standard model have been proposed to deal with this problem. One method is to use a highly constraining grammar according to the standard model but to limit the number of recognition hypotheses the grammar is allowed to consider. The idea is that, if none of the top N hypotheses produced by the recognizer can successfully be analyzed by the grammar, it is likely that the correct recognition hypothesis is not allowed by the grammar, and a second attempt should be made to interpret the recognizer output using robust processing. The parameter N can be empirically estimated to give optimal results for a particular grammar on a particular task. Another possibility is to allow parsing a hypothesis as a sequence of grammatical fragments, with a scoring metric that rewards hypotheses that can be analyzed using fewer fragments.

An additional problem with the standard model is that it does not take into account relative likelihoods of different hypotheses. All utterances that are allowed by the grammar are treated as equally probable. This differs from the N-gram statistical language models commonly used in speech recognition, which estimate the probability of a given word at a particular point in the utterance based on the one or two immediately preceding words. Baker (16) developed an automatic training method, the inside-outside algorithm, that allows such techniques to be extended to probabilistic context-free grammars. Since most grammars used in natural-language-processing systems are based to some extent on context-free grammars, Baker's or related methods may turn out to be useful for developing probabilistic natural language grammars for use in speech recognition. This approach appears to leave at least two important problems to be addressed, however.

First, while the grammars used in natural-language-processing systems are usually based on context-free grammars, they also usually have augmentations that go beyond simple context-free grammars. Recently, grammars based on the unification of grammatical categories incorporating feature-value structures (17) have been widely used. If the value spaces are finite for all the features used in a particular grammar, the grammar is formally equivalent to a context-free grammar, but for any realistic unification grammar for natural language the corresponding context-free grammar would be so enormous (due to all the possible combinations of feature values that would have to be considered) that it is extremely doubtful that enough training data could either be obtained or processed to provide reliable models via the inside-outside algorithm. This suggests that, at best, a carefully selected context-free approximation to the full grammar would have to be constructed.

A second problem derives from the observation (18) that particular lexical associations, such as subject-verb, verb-object, or head-modifier, appear to be a much more powerful source of constraint in natural language than the more abstract syntactic patterns typically represented in natural language grammars. Thus, in predicting the likelihood of a combination such as departure time, one cannot expect to have much success by estimating the probability of such noun-noun combinations, independently of what the nouns are, and combining that with context-independent estimates of an arbitrary noun being departure or time. Yet in the model of probabilistic context-free grammar to which the inside-outside algorithm applies, this is precisely what will happen, unless the grammar is carefully designed to do otherwise. If probabilistic models of natural language are not constructed in such a way that lexical association probabilities are captured, those models will likely be of little benefit in improving recognition accuracy.

Architectures for Integration

Whatever model is used for integration of natural language constraints into speech recognition, a potentially serious search problem must be addressed. The speech recognizer can no longer simply find the best acoustic hypothesis; it must keep track of a set of acoustic hypotheses for the natural language processor to consider. The natural language processor similarly has to consider multiple recognition hypotheses, rather than a single determinate input string. Over the past 5 years, three principal integration architectures for coping with this search problem have been explored within the ARPA Spoken Language Program: word lattice parsing, dynamic grammar networks, and N-best filtering or rescoring.

Word Lattice Parsing

Word lattice parsing was explored by BBN (12, 13) in the early stages of the current ARPA effort. In this approach the recognizer produces a set of word hypotheses, with an acoustic score for each potential pair of start and end points for each possible word. A natural language parser is then used to find the grammatical utterance spanning the input signal that has the highest acoustic score. Word lattice parsing incorporates natural language constraints in recognition according to the standard model, but it results in extremely long processing times, at least in recent implementations. The problem is that the parser must deal with a word lattice containing thousands of word hypotheses rather than a string of just a few words. More particularly, the parser must deal with a large degree of word boundary uncertainty. Normally, a word lattice of adequate size for accurate recognition will contain dozens of instances of the same word with slightly different start and end points. A word lattice parser must treat these, at least to some extent, as distinct hypotheses. One approach to this problem (13) is to associate with each word or phrase a set of triples of start points, end points, and scores. Each possible parsing step is then performed only once, but a dynamic programming procedure must also be performed to compute the best score for the resulting phrase for each possible combination of start and end points for the phrase.

Dynamic Grammar Networks

Dynamic grammar networks (14, 15) were developed to address the computational burden in word lattice parsing posed by the need for the parser to deal with acoustic scores and multiple possible word start and end points. In this approach a natural language parser is used to incrementally generate the grammar-state-transition table used in the standard hidden Markov model (HMM) speech recognition architecture. In an HMM speech recognizer, a finite-state grammar is used to predict what words can start in a particular recognition state and to specify what recognition state the system should go into when a particular word is recognized in a given predecessor state. Dynamic programming is used to efficiently assign a score to each recognition state the system may be in at a particular point in the signal.

In HMM systems the finite-state grammar is represented as a set of state-word-state transitions. Any type of linguistic constraints can, in fact, be represented as such a set, but for a nontrivial natural language grammar the set will be infinite. The dynamic-grammar-network approach computes the state-transition table needed by the HMM system incrementally, generating just the portion necessary to guide the pruned search carried out by the recognizer for a particular utterance. When a word is successfully recognized beginning in a given grammar state, the recognizer sends the word and the state it started in to the natural language parser, which returns the successor state. To the parser, such a state encodes a parser configuration. When the parser receives a state-word pair from the recognizer, it looks up the configuration corresponding to the state, advances that configuration by the word, creates a name for the new configuration, and passes back that name to the recognizer as the name of the successor state. If it is impossible, according to the grammar, for the word to occur in the initial parser configuration, the parser sends back an error message to the recognizer, and the corresponding recognition hypothesis is pruned out. Word boundary uncertainty in the recognizer means that the same word starting in the same state can end at many different points in the signal, but the recognizer has to communicate with the parser only once for each state-word pair. Because of this, the parser does not have to consider either acoustic scores or particular start and end points for possible words, those factors being confined to the recognizer.

The dynamic-grammar-net approach succeeds in removing consideration of acoustic scores and word start and end points from the parser, but it too has limitations. The state space tends to branch from left to right, so if a group of utterance hypotheses differ in their initial words but have a later substring in common, that substring will be analyzed multiple times independently, since it arises in different parsing configurations. Also, in the form presented here, the dynamic-grammar-net approach is tied very much to the standard model of speech and natural language integration. It is not immediately clear how to generalize it so as not to overconstrain the recognizer in cases where the utterance falls outside the grammar.

N-Best Filtering or Rescoring

N-best filtering and rescoring were originally proposed by BBN (19, 20). This is a very simple integration architecture in which the recognizer enumerates the N-best full recognition hypotheses, which the natural language processor then selects from. The standard model of speech and natural language integration can be implemented by N-best filtering, in which the recognizer simply produces an ordered list of hypotheses, and the natural language processor chooses the first one on the list that can be completely parsed and interpreted. More sophisticated models can be implemented by N-best rescoring, in which the recognition score for each of the N-best recognition hypotheses is combined with a score from the natural language processor, and the hypothesis with the best overall score is selected.

The advantage of the N-best approach is its simplicity. The disadvantage is that it seems impractical for large values of N. The computational cost of the best-known method for exact enumeration of the N-best recognition hypotheses (19) increases linearly with N, but an approximate method exists (20) that increases the computational cost of recognition only by a small constant factor independent of N. There is no reported method, however, for carrying out the required natural language processing in time less than linear in the number of hypotheses. In practice, there seem to be no experiments that have reported using values of N greater than 100, and the only near-real-time demonstrations of systems based on the approach have limited N to 5. To put this in perspective, in information-theoretic terms, an N-best system that selects a single hypothesis from a set of 64 hypotheses would be providing at most 6 bits of information per utterance. On the other hand, it has not proved difficult to develop purely statistical language models for particular tasks that provide 6 bits or more of information per word.† However, if the basic recognizer is good enough that there is a very high probability of the correct hypothesis being in that set of 64, those 6 bits per utterance may be enough to make a practical difference.

†For a 1000-word recognition task, a perplexity-15 language model reduces the effective vocabulary size by a factor of about 67, which is about a 6-bit-per-word reduction in entropy.

Integration Results

In 1992, BBN (21) and MIT (22) reported results on the integration of natural language constraints into speech recognition for the ATIS task. BBN used an N-best integration scheme in which strict grammatical parsing was first attempted on the top five recognition hypotheses, choosing the hypothesis with the best recognition score that could be fully interpreted. If none of those hypotheses was fully interpretable, the process was repeated using a robust processing strategy. On a development test set, BBN found that this reduced the weighted understanding error from 64.6%, when only the single best recognizer output was considered, to 56.6%, a 12.4% reduction in the error rate. Taking the top 20 hypotheses instead of the top five also improved performance compared with taking only the single top hypothesis, but the improvement was less than when consideration was limited to the top five.

The MIT experiment was more complex, because it used natural language constraints in two different ways. First, a probabilistic LR parser was integrated into the recognition search (using an architecture that somewhat resembled dynamic grammar nets) to incorporate a language probability into the overall recognition score. The LR parser was modified to allow parsing an utterance as a sequence of fragments if no complete parse allowed by the grammar could be found. Then the top 40 recognition hypotheses were filtered using the complete natural language system. This reduced the word recognition error from 20.6 to 18.8% (an 8.7% reduction) and the utterance recognition error from 67.7 to 58.1% (a 13.4% reduction) on a development test set, compared with the best version of their recognizer incorporating no natural language constraints.

SPEECH CONSTRAINTS IN NATURAL LANGUAGE UNDERSTANDING

While most efforts toward integration of speech and natural language processing have focused on the use of natural language constraints to improve recognition, there is much information in spoken language beyond the simple word sequences produced by current recognizers that would be useful for interpreting utterances if it could be made available. Prosodic information in the speech signal can have important effects on utterance meaning. For example, since in the ATIS task the letters B, H, and BH are all distinct fare codes, the two utterances

    What do the fare codes BH and K mean?
    What do the fare codes B, H, and K mean?

would differ only prosodically but have very different meanings.

In a preliminary study of the use of prosodic information in natural language processing, Bear and Price (23) reported on an experiment in which the incorporation of prosodic information into parsing reduced syntactic ambiguity by 23% on a set of prosodically annotated sentences. Another area where information from the speech signal would undoubtedly be useful in natural language processing is in the detection of verbal repairs. While no experiment has yet been reported that actually used speech information to help correct repairs, Bear et al. (10) and Nakatani and Hirschberg (24) have identified a number of acoustic cues that may be useful in locating repairs.

CONCLUSIONS

Work on the ATIS task within the ARPA Spoken Language Program is ongoing, but a number of conclusions can be drawn on the basis of what has been accomplished so far. Perhaps the most significant conclusion is that natural-language-understanding systems for the ATIS task have proved surprisingly robust to recognition errors. It might have been thought a priori that spoken language utterance understanding would be significantly worse than utterance recognition, since recognition errors would be compounded by understanding errors that would occur even when the recognition was perfect. The result has been quite different, however, with the robustness of the natural language systems to recognition errors more than offsetting language-understanding errors.

Nevertheless, understanding error remains 20 to 70% higher with speech input than with text input. Attempts to integrate natural language constraints into recognition have produced only modest results so far, improving performance by only about 9 to 13%, depending on how performance is measured.

Is it possible to do better? So far, relatively few ideas for the incorporation of natural language constraints into recognition of spontaneous speech have been tested, and there may be many ways in which the approaches might be improved. For example, published reports of experiments conducted so far do not make it clear whether strong semantic constraints were used or how well important word associations were modeled. Compared with where the field stood only 3 or 4 years ago, however, great progress has certainly been made, and there seems no reason to believe it will not continue.

Preparation of this paper was supported by the Advanced Research Projects Agency under Contract No. N00014-93-C-0142 with the Office of Naval Research.

1. Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963.
2. Bates, M. (1995) Proc. Natl. Acad. Sci. USA 92, 9977-9982.
3. Ward, W., et al. (1992) in Proceedings of the Speech and Natural Language Workshop, Harriman, NY (Kaufmann, San Mateo, CA), pp. 78-83.
4. Jackson, E., et al. (1991) in Proceedings of the Speech and Natural Language Workshop, Pacific Grove, CA (Kaufmann, San Mateo, CA), pp. 190-194.
5. Stallard, D. & Bobrow, R. (1992) in Proceedings of the Speech and Natural Language Workshop, Harriman, NY (Kaufmann, San Mateo, CA), pp. 305-310.
6. Seneff, S. (1992) in Proceedings of the Speech and Natural Language Workshop, Harriman, NY (Kaufmann, San Mateo, CA), pp. 299-304.
7. Dowding, J., et al. (1993) in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (Columbus, OH), pp. 54-61.
8. Pallett, D. S., et al. (1993) in Proceedings of the ARPA Workshop on Human Language Technology, Plainsboro, NJ (Kaufmann, San Mateo, CA), pp. 7-18.
9. Levelt, W. (1983) Cognition 14, 41-104.
10. Bear, J., Dowding, J. & Shriberg, E. (1992) in Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (Newark, NJ), pp. 56-63.
11. Klatt, D. H. (1977) J. Acoust. Soc. Am. 62, 1345-1366.
12. Boisen, S., et al. (1989) in Proceedings of the Speech and Natural Language Workshop, Philadelphia (Kaufmann, San Mateo, CA), pp. 106-111.
13. Chow, Y. L. & Roukos, S. (1989) in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (Glasgow, Scotland), pp. 727-730.
14. Moore, R., Pereira, F. & Murveit, H. (1989) in Proceedings of the Speech and Natural Language Workshop, Philadelphia (Kaufmann, San Mateo, CA), pp. 243-247.
15. Murveit, H. & Moore, R. (1990) in Proceedings of the 1990 International Conference on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 573-576.
16. Baker, J. (1979) in Speech Communication Papers Presented at the 97th Meeting of the Acoustical Society of America, eds. Wolf, J. J. & Klatt, D. H. (Massachusetts Institute of Technology, Cambridge, MA).
17. Shieber, S. M. (1986) An Introduction to Unification-Based Approaches to Grammar (Center for the Study of Language and Information, Stanford Univ., Stanford, CA).
18. Church, K., et al. (1989) in Proceedings of the Speech and Natural Language Workshop, Cape Cod, MA (Kaufmann, San Mateo, CA), pp. 75-81.
19. Chow, Y. L. & Schwartz, R. (1989) in Proceedings of the Speech and Natural Language Workshop, Cape Cod, MA (Kaufmann, San Mateo, CA), pp. 199-202.
20. Schwartz, R. & Austin, S. (1990) in Proceedings of the Speech and Natural Language Workshop, Hidden Valley, PA (Kaufmann, San Mateo, CA), pp. 6-11.
21. Kubala, F., et al. (1992) in Proceedings of the Speech and Natural Language Workshop, Harriman, NY (Kaufmann, San Mateo, CA), pp. 72-77.
22. Zue, V., et al. (1992) in Proceedings of the Speech and Natural Language Workshop, Harriman, NY (Kaufmann, San Mateo, CA), pp. 84-88.
23. Bear, J. & Price, P. (1990) in Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (Pittsburgh), pp. 17-22.
24. Nakatani, C. & Hirschberg, J. (1993) in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (Columbus, OH), pp. 46-53.
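The N-best rescoring scheme discussed in the section "N-Best Filtering or Rescoring" can be illustrated with a minimal sketch. This is a toy example only: the hypothesis strings, the scores, and the nl_score function are all invented for illustration, not drawn from any of the systems cited above.

```python
import math

def rescore(nbest, nl_score, weight=1.0):
    """N-best rescoring: combine each hypothesis's recognition score
    with a natural language score and return the best overall string."""
    best, best_total = None, -math.inf
    for words, rec_score in nbest:
        total = rec_score + weight * nl_score(words)
        if total > best_total:
            best, best_total = words, total
    return best

# Invented example: the acoustically second-best hypothesis wins once a
# (toy) natural language score penalizes the uninterpretable string.
nbest = [("show me flights two boston", -9.5),
         ("show me flights to boston", -10.0)]
nl_score = lambda s: 0.0 if " to " in f" {s} " else -3.0
```

Here rescore(nbest, nl_score) returns "show me flights to boston": the 0.5-point acoustic deficit is outweighed by the 3-point language penalty on the misrecognized "two", which is precisely the effect that distinguishes rescoring from simple filtering.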