Spoken Language Generation and Un- Derstanding by Machine: a Problems and Applications-Oriented Overview

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/256982222 Spoken Language Generation and Understanding by Machine: A Problems and Applications Oriented Overview Chapter · January 1980 DOI: 10.1007/978-94-009-9091-3_1 CITATIONS READS 6 826 1 author: David Hill The University of Calgary 58 PUBLICATIONS 489 CITATIONS SEE PROFILE Some of the authors of this publication are also working on these related projects: Computer-Human Interaction View project Animating speech View project All content following this page was uploaded by David Hill on 03 December 2015. The user has requested enhancement of the downloaded file. Spoken Language Generation and Un- derstanding by Machine: a Problems and Applications-Oriented Overview David R. Hill Department of Computer Science, University of Calgary, AB, Canada © 1979, 2015 David R. Hill Summary Speech offers humans a means of spontaneous, convenient and effective communication for which neither preparation nor tools are required, and which may be adapted to suit the developing requirements of a communication situation. Although machines lack the feedback, based on understanding and shared experience that is essential to this form of communication in general, speech is so fundamental to the psychology of the human, and offers such a range of practical advantages, that it is profitable to develop and apply means of speech communication with machines under constraints that allow the provision of adequate dialog and feedback. The paper details the advantages of speech communication and outlines speech production and perception in terms relevant to understanding the rest of the paper. The hierarchy/heterar- chy of processes involved in both automatic speech understanding and the machine generation of speech is examined from a variety of points of view and the current capabilities in terms of viable applications are noted. It is concluded that there are many economically viable applications for both the machine generation of speech and the machine recognition of speech. It is also suggested that, although fundamental research problems still exist in both areas, the barriers to increased use of speech for communication with machines are mainly psychological and social, rather than technological or economic, though problems of understanding and natural dialogue design remain. This paper is largely based on an earlier paper with the same title, originally presented at the NATO Advanced Summer Institute, held in Bonas, France, June 26 to July 7, 1979 and published in J. C. Si- mon (ed.) Spoken Language Generation and Understanding, D. Reidel Publishing Co. Dordrecht, Holland/Boston/London 1980, pp 3-38. 2 D.R. Hill 1. Introduction: History and Background Historically, speech has dominated the human scene as a basis for person to person communication, writing and printing being used chiefly for record-keeping (especially for legal and govern- ment purposes) and long-distance communication. It is only com- paratively recently that written language skills have been widely acquired and practised, even in the developed western world, and it is still true that large numbers of people in the world are unable to read or write. With the advent of television and radio speech has started to gain ground again even in the west, and it is a peren- nial complaint amongst educators that the rising generation is not at ease with the written forms of language. Even universities are finding it necessary to set tests of students’ ability to comprehend and write their native language. It is in this sense that speech may be considered a more fundamental means of human communication than other forms of linguistic expression. Face to face speech allows a freer and more forceful use of language, because it is aided by contextual, situational and all man- ner of linguistic and para-linguistic features (voice tone, rhythm, the rise and fall of pitch, gestures, pauses and emphasis) that are simply absent from written forms. Grammatical rules of all kinds may be broken with impunity in speech to give a stream of language that would be incomprehensible in written form. Even for careful speech (as in an important meeting) it is sobering to read a verbatim transcript of the proceedings afterwards. However, ev- eryone understands at the time and (perhaps more important) if they don’t, they can seek immediate clarification. The planning load for speech communication is less, even over a telephone, because ideas can be added to and expanded as required. A written document must anticipate and deal with all questions and obscu- rities that could arise for arbitrary readers, whenever they read it, and under whatever circumstances, without all the additional resources inherent in speech. This is why written documents fol- low such a mass of conventions and rules so precisely. That kind of redundancy and familiarity is necessary to overcome the many disadvantages with which written documents are burdened. But, in this sense, written documents can remain precise despite the test of time and circumstances. A problems and applications oriented overview 3 Human beings can usually communicate far more readily, conveniently, and effectively using speech than they can by writing, providing there is adequate feedback, and responses do not overload the human short-term memory. Under these conditions, the subject feels more comfortable and is less encumbered. This is what people really mean when they say that speech is more natural and convenient as a means of communication. With re- duced feedback, speech must put on some of the formal trappings of written language, which explains the difference in character between a lecture, a seminar and a conversation. With increasing interaction and feedback, the constraints of language may be increasingly relaxed, short cuts taken, and the communication experience greatly improved. Books record material, but teach- ers and lecturers are likely to communicate what is recorded in a more effective, succinct (if different) form, depending on their familiarity with the subject(s) by speaking. It is, of course, easier to close a book (and do something else) than to walk out of a well- delivered lecture or seminar. Speech also has immediacy, impact, and holds attention better. Thus the first reason for wanting a speech interface, as op- posed to any other, is based in the psychology of the human. It is natural, convenient, and indeed almost expected of any system worth communicating with. There are, however, a number of very practical advantages: (a) it leaves the human body free for other activities providing an extra channel for multi-modal communication in the process; (b) it is well-suited to providing “alert” or break-in messages even for quite large groups and may communicate urgency in a way only rivalled by real flames; (c) no tool is required; (d) it is omnidirectional, so does not require a fixed operator position; (e) it is equally effective in the light or the dark, or, in fact, under many conditions when visual or tactile communication would be impeded such as in smoke, or under conditions of high acceleration or vibra- tion (though the latter may cause some interference with human speech production); (f) it avoids the use of keyboards and displays thereby taking up less panel space and avoiding the need for special operator training to cope with the key layouts and display interpretation; (g) it allows simple monitoring of command/response sequences by remote personnel or recording devices; 4 D.R. Hill (h) it can allow a single channel to serve more than one purpose (e.g. communication with aircraft at the same time as data entry to an Air Traffic Control computer); (i) it allows blind people, or those with manual disabilities to integrate into Human-Computer Interaction (HCI) systems with ease; (j) it allows security checking on the basis of voice characteristics; and perhaps most important, (k) it is compatible with the widely available and inexpensive public telephone system, including mobile devices. Against these we must set a number of disadvantages: (a) it may be subject to interference by noise; (b) it is not easily scanned for purposes of selecting infrequent items from some large body of information; (c) special precautions must be taken for verification and feedback, par- ticularly for messages initiating critical actions, for reasons previously discussed; (d) unrestricted speech input to machines is likely to be unavailable for a long time to come (in fairness, it should be pointed out that there are severe restrictions on other forms of input, the removal of which would involve the solution of similar problems); (e) speech input to machines seems likely always to require additional ex- pense, compared to written input, simply because of the considerable extra processing power and memory space needed to decode the inher- ently ill-defined and ambiguous acoustic evidence—a step more akin to using handwriting for input in place of well defined machine generated character codes. However, these costs may be tolerable, given the falling price and increasing power of the necessary hardware, just as Optical Character Recognition (OCR) is now viable. It is concluded that we should use speech to communicate with machines. People need to interact with machine systems in the most effective way under a variety of circumstances in which speech offers solutions to problems, or communication enhancing advantages. Especially,

Spoken Language Generation and Un- Derstanding by Machine: a Problems and Applications-Oriented Overview

Part 2: RHYTHM – DURATION and TIMING Część 2: RYTM – ILOCZAS I WZORCE CZASOWE

SPEECH ACOUSTICS and PHONETICS Text, Speech and Language Technology VOLUME 24

Estudios De I+D+I

Half a Century in Phonetics and Speech Research

Speech Generation: from Concept and from Text

Aplicació De Tècniques De Generació Automàtica De La Parla En Producció Audiovisual

C H a P T E R

Gnuspeech Tract Manual 0.9

DOCUMENT RESUME ED 052 654 FL 002 384 TITLE Speech Research

Michael Studdert-Kennedy

1. the Speech Chain

“Síntese Computacional De Voz”