CHAPTER 3 SELECTION OF TUTORIALS AND RELATED MATERIALS FOR SPOKEN LANGUAGE ENGINEERING

Klaus Fellbaum Brandenburg Technical University of Cottbus, Germany

Marian Boldea, Universitatea Politehnica Timisoara, Romania
Andrzej Drygajlo, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Mircea Giurgiu, Technical University of Cluj-Napoca, Romania
Phil Green, University of Sheffield, United Kingdom
Ruediger Hoffmann, Technische Universität Dresden, Germany
Michael McTear, University of Ulster, Northern Ireland
Bojan Petek, University of Ljubljana, Slovenia
Victoria Sanchez, University of Granada, Spain
Kyriakos Sgarbas, University of Patras, Greece

Spoken Language Engineering

1 Introduction

This chapter summarises work by the Spoken Language Engineering (SLE) Working Group of the Socrates Thematic Network in Speech Communication Sciences. The SLE Working Group, now in its third funding year, has surveyed SLE course provision in Europe (Green et al., 1997) and has made proposals for SLE curriculum development at both undergraduate and postgraduate levels (Espain et al., 1998). The thematic network has shown that computer-based teaching aids (on-line tutorials, demonstration packages and so on) are vital to the future development of SLE education. This follows from the multidisciplinary and technical nature of SLE, which requires novel ways of presenting unfamiliar material. In recent years, such software has begun to appear, partly as a result of initiatives taken within the network and associated projects, and partly independently. The increasing interest in SLE courseware was demonstrated at the recent MATISSE workshop, on which much of the following review material is based (Hazan and Holland, 1999). The chapter analyses the available software resources in relation to curricular requirements and educational criteria, and makes recommendations for modules in an SLE curriculum. In addition, we identify areas for which high-quality courseware is, to our knowledge, unavailable, and propose actions to fill these remaining gaps.

Following the structure of the second book (Bloothooft et al., 1998), we used the sections:
• Introduction to Speech Communication and Speech Technology
• Speech Analysis
• Natural Language Processing
• Speech Production and Perception
• Speech Coding
• Speech Recognition
• Spoken Dialogue Modelling
• Language Resources
Concerning the subchapters on Applications and Current Research in SLE, we did not find tutorials or other relevant teaching material. This is not very surprising, since applications in speech processing are usually a commercial matter, and a company presenting applications normally has no strong interest in a detailed, tutorial-like presentation. For current research, too little time has passed for the results to be transformed into a didactically oriented form, so the usual presentation remains the scientific article or conference proceedings.

Generally speaking, we found a very heterogeneous coverage of the speech communication area, heterogeneous in both the subjects covered (see above) and the media used (Web, CD-ROM, books, etc.). We identified an accumulation of introductory material, mainly in speech production and perception, signal processing and linguistics, while other areas (for example speech coding and synthesis) are not covered satisfactorily. As to the media, the Web presentations were very often only test versions of a rather promotional character, offering CD-ROMs, books or download of the complete material after payment. Finally, both the technical and the didactic quality of the material vary strongly.


As a general remark, it must be stated that speech input is still a difficult problem on the Web, whereas speech output is quite easy. Consequently, there are only very few tutorials using speech input.

For the moment, there are only two possibilities to perform speech input:

• The shareware SoundBite (from the SCRAWL company), which works only with Windows 95/98/NT and Netscape Navigator 4.04 or higher. For more details and download, visit http://www.scrawl.com/store.
• The Tcl/Tk plugin, which can be used to add speech input (and output) to an existing tutorial. To produce a new tutorial, the Tcl/Tk libraries are necessary. For more details see http://www.scriptics.com/plugin/ .

A third possibility, based on Java 2 tools, is in preparation, with a release announced for the end of 1999. For now, only pre-release versions are available. One module is the Java Sound API (http://java.sun.com/products/java-media/sound), which offers recording and playback but no storage features. Another Java product is the Java Media Framework, which has storage capabilities and, in addition, network transmission (RTP) features; it too is a pre-release. For more details see http://java.sun.com/products/java-media/jmf .

The next sections will present a selection of tutorials in detail.


2 Introduction to Speech Communication and Speech Technology

This section deals with introductory material. The subjects cover a wide area, ranging from speech signal representations in the time and frequency domains, through signal processing techniques (windowing, FFT, parameter extraction, etc.), to basic principles of acoustics and physiology. Although most of the tutorials are far from being complete speech courses, they are very useful appetizers and can motivate beginners to dive into the speech area.

Speech Visualisation Tutorial
http://isl.ira.uka.de/~maier/speech/vistut/
University of Karlsruhe, Interactive Systems Laboratories
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: The tutorial covers visualisation of speech waveforms and spectrograms. It presents the waveform and a spectrogram of the utterance "speech lab", with labels marking the beginning of each phoneme (or speech sound) in the utterance.
Impression: May be used as a short introductory text on spectrograms.

comp.speech Frequently Asked Questions WWW site
http://svr-www.eng.cam.ac.uk/comp.speech/
University of Cambridge, Department of Engineering
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: The site provides a range of information on speech technology, including speech synthesis, speech recognition, speech coding, and related material. The information is regularly posted to the comp.speech newsgroup as the "comp.speech FAQ". The site is mirrored at several other WWW sites around the world (Australia, UK, Japan and USA), and the information is also available in plain text format. There are 250 comp.speech WWW pages, including over 500 hyperlinks to speech technology web sites, ftp servers, mailing lists, and newsgroups.
Impression: A very useful tool for getting oriented in the world of speech technology. The pages are not suited as teaching material, but they present a collection of interesting speech themes and very many links to speech products.

Speech Analysis Tutorial
http://www.ling.lu.se/research/speechtutorial/tutorial.html
University of Cambridge, Department of Engineering
Author: Tony Robinson
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: A very brief, very introductory tutorial on speech analysis, introducing fundamental speech signal representations (waveform, F0 contour, spectrum, spectrogram, waterfall spectrogram, phonetic transcription), suitable for a first exposure to these topics.
Impression: The tutorial covers many details but gives only short explanations; thus it is suited to support lectures in speech signal analysis.

Spectrogram Reading Tutorial
http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: A more extended introduction to speech signal representations, stressing spectrograms and transcription, with many practical exercises.

Das Lesen von Sonagrammen V0.2. Begleitendes Hypertext-Dokument zur Vorlesung (in German; "Reading sonagrams", companion hypertext to a lecture)
http://www.phonetik.uni-muenchen.de/SGL/SGLHome.html
Institut für Phonetik und Sprachliche Kommunikation der Ludwig-Maximilians-Universität München
Authors: K. Machelett, H.G. Tillmann
Availability: free
Requirements: WWW browser
Description: The tutorial deals with spectrogram reading, following the chapters
• Fundamentals
• The sound classes in the sonagram
• On the differentiation of sounds within the sound classes
• Reading sonagrams in practice
Impression: Companion material to a complete lecture series in spectrogram reading, with high expertise and good pictures.

Das SPRACHLABOR - eine multimediale Einführung in die Welt des Sprechens/der Phonetik (in German; "The speech laboratory", a multimedia introduction to the world of speaking and phonetics)
http://www.media-enterprise.de/sprachla/sprachla.htm
Availability: demo version only
Requirements: WWW browser with sound replay capabilities.
Description: Physiology of the speech organs, acoustic fundamentals of the speech process, spectrogram reading, speech analysis.
Impression: Professional program.

Tutorien und Skripte der Universität Kiel (tutorials and lecture notes; in German, English and Swedish)
http://www.ipds.uni-kiel.de/links/skripte.de.html
Availability: free
Requirements: WWW browser with sound replay capabilities
Description: Course papers and audio demonstrations on acoustic phonetics, plus an interactive course on linguistics, speech synthesis and speech recognition.
Impression: A very useful selection of courses in different languages, covering many speech areas. Many links to other tutorials.

3 Speech Analysis

For the area of speech analysis, some (but not very many) useful tutorials and similar material exist. The following survey of educational resources that may be used to support speech analysis training in Spoken Language Engineering courses reflects current practice. Its objective is to give a clear idea of the material available and to provide a useful starting point for the next phase: the development of new teaching and learning reference material, preferably in electronic format.

The key to successful speech analysis training is to create tools which open up the field to interactive investigation by the student. These tools should allow the student to interact with parameters in order to acquire practical skills in listening, analysis and interpretation, as well as to create new algorithms. The aim is to provide dedicated educational software (executables and sources) instead of exercises based on commonly used, executable-only research tools such as Waves+ (Entropic) or MultiSpeech (Kay Elemetrics). Another goal is to create platform-independent, interactive, on-line laboratories accessible on the Internet, to increase the efficiency of student self-study in the speech analysis domain. Recently, the MATLAB programming environment has been employed by the speech science community to develop educational aids for teaching and learning. Examples are the MAD programme from the University of Sheffield [section 5] and the set of exercises from the Czech Technical University in Prague [J. Uhlir, in Hazan and Holland, 1999, pp. 53-56]. MATLAB provides facilities for numerical computation, user interface creation and data visualisation, and its cheap academic edition allows students to use the demonstration tools at home. Unfortunately, MATLAB is neither completely portable, nor can it run within a Web browser.
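The kind of exercise such environments support can be sketched outside MATLAB as well. The following Python/NumPy fragment, written as an illustration for this survey (it is not part of any package named above, and the frame length and hop size are arbitrary choices), computes the log-magnitude spectrogram that these tools typically display:

```python
# Minimal spectrogram sketch: frame the signal, apply a Hamming window,
# take the magnitude FFT of each frame, and convert to decibels.
import numpy as np

def spectrogram(x, frame_len=256, hop=64):
    """Return a (frames x bins) matrix of log-magnitude spectra in dB."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(spectra + 1e-10)   # floor avoids log(0)

# Example: 100 ms of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
S = spectrogram(np.sin(2 * np.pi * 440 * t))
print(S.shape)   # (frames, frequency bins)
```

Displaying `S` as an image (time on one axis, frequency on the other) gives the familiar spectrogram view discussed throughout this chapter.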


The nearest contender to MATLAB appears to be Java. The ability of Java applets to be embedded in WWW pages certainly offers advantages for platform-independent distance learning. Purely Java-based speech analysis software was developed at EPFL Lausanne [below]. A second example is the Snack speech analysis module from KTH Stockholm, which uses Java applets, the Tcl/Tk language and C/C++ [below]. Unfortunately, Java is still in its infancy, and educators working with it must create many of their own classes to handle tasks for which ready-to-use libraries would be available in other languages.

The educational speech analysis software packages available today, written in MATLAB, Java or other languages (e.g. KHOROS: Z. Kacic, in Hazan and Holland, 1999, pp. 117-120), are far from complete, and their development can be seen as an ongoing effort to produce tools for teaching and learning speech analysis concepts.

As mentioned in the introduction, the problems of speech input create a bottleneck and hinder the flexible and easy construction of tutorials. In addition, the following points from the module's syllabus seem not to be (well) covered by the existing material:
• speech signal analysis in the time domain: amplitude, zero crossings, autocorrelation, statistical parameters;
• speech signal analysis in the frequency domain: general time-frequency descriptions (wavelets etc.);
• homomorphic speech processing: cepstrum and its applications;
• determination of vocal tract parameters: estimation of vocal tract geometry.
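For the first of these gaps, the basic time-domain measures are easy to demonstrate with a few lines of code. The sketch below (Python/NumPy, written for this survey; signal and frame sizes are illustrative) computes short-time energy, zero-crossing rate and the autocorrelation function, and shows the autocorrelation peak at the pitch period of a test tone:

```python
# Time-domain speech measures: short-time energy, zero-crossing rate,
# and (mean-removed) autocorrelation of a single analysis frame.
import numpy as np

def short_time_energy(frame):
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def zero_crossing_rate(frame):
    signs = np.sign(frame)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def autocorrelation(frame, max_lag):
    frame = frame - np.mean(frame)
    return np.array([np.sum(frame[: len(frame) - k] * frame[k:])
                     for k in range(max_lag + 1)])

# A 100 Hz sine at 8 kHz has a period of 80 samples, so the
# autocorrelation should peak again at lag 80.
fs = 8000
t = np.arange(400) / fs
x = np.sin(2 * np.pi * 100 * t)
r = autocorrelation(x, 120)
lag = int(np.argmax(r[40:]) + 40)   # skip the lag-0 region
print(lag)
```

Picking the autocorrelation peak in this way is the core of the simplest pitch determination methods, one of the classic applications of these measures.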

Tony Robinson's Speech Analysis Course
http://svr-www.eng.cam.ac.uk/~ajr/SpeechAnalysis/
Cambridge University, Department of Engineering
Availability: free.
Requirements: WWW browser.
Description: Introductory course by Tony Robinson of the Cambridge University Engineering Department, Speech, Vision and Robotics group, intended for students who come to speech processing from many directions. It presents DSP fundamentals (sampling theory, linear systems/filters, convolution, z and Fourier transforms) and aspects of speech production and perception (source-filter model, non-linear frequency scales) before discussing speech analysis methods.
Impression: For a course taught according to the SLE recommendations, it should be accompanied by a more detailed text.

The SNACK Tcl/Tk sound extension
http://www.speech.kth.se/SNACK/
Technical University Stockholm, Department of Speech, Hearing and Music
Availability: free
Requirements: Tcl/Tk 8.0.3 or later; optional: WWW browser with the Tcl plug-in 2.0; NIST SPHERE library; GNU autoconf and a C compiler to build from sources.
Description: Snack is a Tcl/Tk extension for sound which allows rapid development of interactive speech processing and visualisation tools. In conjunction with the Tcl plugin, it also allows interactive experiments to be included in educational materials in HTML format. Although in its present state it provides only basic functionality for recording, playback, spectral analysis, and waveform, spectrum and spectrogram display of speech signals, it has the great advantage of being extensible.
Impression: Very interesting as an ODL authoring/experimental aid for teachers.

The SNACK example web pages
http://www.speech.kth.se/labs/analysis/
Technical University Stockholm, Department of Speech, Hearing and Music
Availability: free.
Requirements: Tcl/Tk 8.0.3 or later; WWW browser with the Tcl plug-in 2.0 (Netscape 4 / IE 4.0).
Description: A few laboratory experiments using Snack: a demonstration of the window length effect in FFT analysis, wide and narrow band spectrograms, real-time spectrum and spectrogram, formant measurements, etc.
Impression: An excellent appetizer for the educational potential of the SNACK package.

JavaSpeechLab
http://scgwww.epfl.ch/JavaSpeechLab

Availability: free.

Requirements: Java-enabled WWW browser.
Reference: Drygajlo, A. [1999]. "Speech Processing", Part I, EPFL, Lausanne.

Description: The JavaSpeechLab demonstrator has been designed to transform a WWW page into a click-and-play, interactive, platform-independent speech analysis workstation for distance learning applications. The main double window allows the user to load and edit a speech signal; a menu selection and pull-down lists allow the user to specify a speech analysis application in terms of control blocks. Once the user has chosen the parameters in a control block, the program can execute, display and play back the results. In this software laboratory, a common "frame-window" environment is provided for all types of analysis: from the simplest in the time domain, through the classical short-term Fourier transform, to the most complex in the time-spectral domain, based on multi-resolution wavelet packet transforms which use different windows and frames at different frequency subbands. This modular package can be extended by the user, who can add new speech analysis or processing modules using a plug-in technique.

Impression: JavaSpeechLab can be used for conventional classroom experiments or in the students' laboratory, and can increase the efficiency of student self-study in the speech analysis domain. It transforms a WWW page into an easy-to-use speech analysis workstation that is interactive, easy to learn, and encourages self-exploration.

The Speech Filing System (SFS)
http://www.phon.ucl.ac.uk/resource/sfs.htm
University College London, Department of Linguistics and Phonetics
Availability: free sources and binaries for DOS/Windows.
Requirements: DOS, Windows or Unix computer with sound I/O; GNU C compiler to build from sources.
Description: The Speech Filing System is one of the top resources. Besides general utilities for recording, replaying, and manipulating both acoustic and laryngographic signals, it includes many programs to process them and present the results in graphical form. It also includes processors for sml, a specialised speech measurement language, and span, a language for controlling a parallel formant synthesiser. There is also one for spc, a Pascal dialect adapted for speech processing, unfortunately not documented.

Although SFS does not cover all the topics in this module, being available in C source form, it has the advantage that it can be extended easily to illustrate most of them. The Speech Filing System includes also some programs that allow simple isolated word recognition experiments and results evaluation:

dp - simple DTW recogniser, with or without slope constraints;
dprec - the same, but with training facilities;
conmat - prints the confusion matrix and the percentage correct.
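The algorithm behind such DTW recognisers can be sketched in a few lines. The following Python fragment is an illustration of dynamic time warping on one-dimensional feature sequences written for this survey; it is not code taken from SFS, and real recognisers compare vectors of spectral features rather than scalars:

```python
# Dynamic time warping: fill a cumulative-cost matrix D where D[i][j]
# is the cheapest alignment of a[:i] with b[:j], allowing the usual
# insertion / deletion / match steps.
def dtw_distance(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])       # local distance (scalars here)
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A time-stretched version of a template should align at zero cost:
template = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 1, 1, 2, 3, 3, 2, 1, 0]
print(dtw_distance(template, stretched))   # → 0.0
```

In an isolated word recogniser of the `dp` kind, the unknown utterance is compared against each stored template in this way, and the template with the lowest warped distance wins.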

Impression: One of the best software resources for speech education, well documented and extensible.

The INTEL Signal Processing Library
http://www.intel.com/vtune/perflibst/spl/
Availability: free Windows DLLs.
Requirements: Intel-architecture PC (Pentium II/III recommended), min. 16 MB RAM (32 MB recommended), Windows 95/98/NT, C++ compiler.
Description: The Intel Signal Processing Library is a collection of functions useful for exploring signal and speech processing in a Windows environment. It supports vector operations (arithmetic, logical, threshold, square root, standard deviation, exponential, power spectrum), windowing (Bartlett, Blackman, Hamming, Hann, Kaiser), transforms (DFT, FFT, DCT, wavelets), filters (FIR, IIR, LMS), signal generation (pseudo-random, uniform, Gaussian), auto- and cross-correlation, and convolution.
Impression: Very good documentation, but not always in sync with the actual package contents.

The TSP signal processing library
http://tsp.ee.mcgill.ca/software/
Availability: free sources and Windows binaries.
Requirements: Unix/Windows system, C compiler.
Description: The Telecommunications and Signal Processing Laboratory at McGill University makes this signal processing library available. It includes functions for signal file manipulation, math support, filtering, autocorrelation, FFT, DCT, LPC, and spectral distance measures.
Impression: Although biased towards speech coding, it has the advantage of being very well documented.

The comp.speech ftp site
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
Availability: free.
Requirements: variable from package to package.
Description: An eclectic collection of data and software related to speech processing. The quality varies greatly, many packages being provided on an "as-is" basis, often with little or no documentation. A few packages have been found to work, e.g.:
• analysis/Pitch_Tracker-1.0.tar.Z - pitch tracking software with sources;
• tools/ep.1.2.tar.gz - endpointing software with sources;
• tools/gvqprog.zip - miscellaneous vector quantisation methods (LBG, LVQ, GVQ).
Impression: Don't expect too much!

RASTA speech analysis
http://www.icsi.berkeley.edu/real/rasta.html
Availability: free sources.
Requirements: C compiler.
Description: Home page of the RASTA speech signal feature extraction method, with links to articles and sources. RASTA is a term used to describe a range of speech recognition front-end algorithms developed by Nelson Morgan, Hynek Hermansky and others at ICSI. The "rasta" program, which implements some of the algorithms in addition to basic PLP processing, is available from the ICSI FTP site.
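The RASTA principle, band-pass filtering the temporal trajectories of log-spectral parameters so that slowly varying channel effects are suppressed, can be illustrated in a few lines. The sketch below was written for this survey and is not ICSI's "rasta" program; the filter coefficients follow the form commonly quoted for the RASTA filter, but treat them as an assumption rather than a definitive reference:

```python
# RASTA-style trajectory filtering: an IIR band-pass filter applied to
# the time trajectory of one log-spectral parameter.  A constant offset
# in the log domain (i.e. a fixed channel) is filtered out.
def rasta_filter(x):
    """y[n] = 0.98*y[n-1] + 0.1*(2*x[n] + x[n-1] - x[n-3] - 2*x[n-4])"""
    y = []
    xp = [0.0, 0.0, 0.0, 0.0]   # xp holds x[n-1] .. x[n-4]
    prev_y = 0.0
    for s in x:
        prev_y = 0.98 * prev_y + 0.1 * (2 * s + xp[0] - xp[2] - 2 * xp[3])
        y.append(prev_y)
        xp = [s] + xp[:3]
    return y

# A constant trajectory (a fixed channel offset) decays towards zero:
y = rasta_filter([1.0] * 200)
print(abs(y[-1]) < 0.05)   # → True
```

Because convolutional channel distortion becomes additive in the log-spectral domain, removing the near-constant component in this way is what makes RASTA features robust to channel variation.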

4 Natural Language Processing

Tutorial: Data Structures and Algorithms II
http://www.cee.hw.ac.uk/~alison/ds.html
Description: A tutorial on data structures and algorithms containing information about string processing algorithms, parsing, context-free grammars, and simple parsing methods from a software engineer's point of view.

Tutorial: Databases and Artificial Intelligence 3 - Artificial Intelligence Segment
http://www.cee.hw.ac.uk/~alison/ai3notes/all.html
Description: This tutorial contains a Natural Language Processing section covering issues of syntax, writing a grammar, grammars in Prolog, parsers, returning the parse tree, multiple parsers, semantics and pragmatics.

Course in Natural Language Processing
http://www.cs.bham.ac.uk/~pjh/sem1a5/sem1a5.html
Description: The lectures cover some introductory material, in particular the meaning of the title, some history, current applications, a review of the state-of-the-art in Natural Language Processing, and research directions. The accompanying notes take some of the topics further, suggesting directions for extended study by students. The tutorial notes are exercises to be completed in time for small-group tutorials.

Lexical Functional Grammar
http://clwww.essex.ac.uk/LFG/
Description: An informative site on LFG (Lexical Functional Grammar) covering introductory issues, books/articles, standard papers, collections, a bibliography of works, publicly available LFG systems, an archive of papers about LFG, current grammar development efforts, current research directions in LFG, an LFG mailing list, recent LFG news (The LFG Bulletin), LFG conferences, ILFGA (the International LFG Association), and slides and teaching material from various conferences and summer institutes.

The AGFL Grammar Work Lab
http://www.cs.kun.nl/agfl/
Requirements: The programs run on different platforms (SunOS, Solaris, MS-DOS and Linux).
Description: Tutorial and resources for the AGFL formalism (Affix Grammars over Finite Lattices), developed by the Department of Software Engineering, University of Nijmegen. It is a formalism in which context-free grammars can be described. AGFLs are two-level grammars: a first, context-free level is augmented with features for expressing agreement between parts of speech.

Finite-State HomePage
http://www.xrce.xerox.com/research/mltt/fst/
Description: Provides theory for regular expressions, finite-state automata and transducers. Includes a demonstrator, the Xerox Finite-State Compiler, which allows the user to enter a regular expression in a text window and compile it into a finite-state network. Depending on the expression typed, the result is either a simple automaton or a transducer.
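As a toy counterpart to such a compiler, the following Python sketch simulates a hand-built deterministic automaton for the regular expression a(b|c)*. The state numbering and transition-table encoding are illustrative choices made for this survey, not the Xerox tool's output format:

```python
# A deterministic finite automaton for the language of a(b|c)*,
# encoded as state -> {symbol -> next state}; state 1 is accepting.
def make_dfa():
    return {0: {"a": 1}, 1: {"b": 1, "c": 1}}

def accepts(dfa, accepting, s):
    """Run the DFA over string s; reject on any undefined transition."""
    state = 0
    for ch in s:
        if state not in dfa or ch not in dfa[state]:
            return False
        state = dfa[state][ch]
    return state in accepting

dfa = make_dfa()
for word in ["a", "abc", "abcb", "b", "", "aa"]:
    print(word, accepts(dfa, {1}, word))
```

A regular-expression compiler like the Xerox demonstrator automates exactly this construction, turning an arbitrary expression into such a transition network (and, for two-level rules, into a transducer).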

Courses in Computational Linguistics
http://clwww.essex.ac.uk/course/
Description: This page gives access to information, course handouts, source code for programs, etc. for a number of different courses taught at Essex: Head-Driven Phrase Structure Grammar, Lexical Functional Grammar, Prolog and Natural Language Processing, Machine Translation, and Computational Linguistics 1 and 2.

Computational Syntax and Semantics at New York University
http://www.nyu.edu/pages/linguistics/
Availability: free
Description: The course consists of 5 chapters:
1. Research opportunities in computational syntax and semantics.
2. Courses offered in computational syntax and semantics.
3. The HTML Gesellschaft mit Stammtische.
4. Beginner's Workbook in Computational Linguistics.
5. New York University Natural Language Computing Project.
Impression: Very well organised and attractive. It contains valuable introductory information, tutorials, and free NLP software.

WEB-Course: Corpus Linguistics
http://www.ling.lancs.ac.uk/monkey/ihe/linguistics/contents.htm
Description: Web pages to supplement the book "Corpus Linguistics" (Edinburgh University Press) by Tony McEnery and Andrew Wilson. The document consists of the sections:
• Early Corpus Linguistics and the Chomskyan Revolution
• What is a Corpus and What is in it?
• Quantitative Data
• The Use of Corpora in Language Studies

Progress Report 1: Sources on Connectionist Natural Language Processing - second draft
http://www.tardis.ed.ac.uk/~james/CNLP/report1/report1.html
Description: The report provides pointers to useful sources on connectionist Natural Language Processing systems, covering parsing, NL understanding, NL generation, NL learning, script processing, text processing, machine translation and speech processing.

The Simon Laven Page
http://www.toptown.com/hp/sjlaven/
Description: A collection of chatterbots. A chatterbot is a program that attempts to simulate conversation, with the aim of at least temporarily fooling a real human into thinking they are talking to another person. The collection includes ELIZA, FRED and many more.

German Transducers Demo
http://www.wordmanager.com/
Availability: free
Description: The Multifunctional Morphological Dictionary was developed and designed in a collaboration between the University of Basel, the Vrije Universiteit Amsterdam and the institute for artificial intelligence IDSIA in Lugano. The dictionary is a demo result of a project in the field of Natural Language Processing (NLP). The major task of the project was the development of a framework of reusable finite-state automata for word form analysis and word generation. The application uses as its source a database created with the NLP program package Word Manager (WM), developed at the Department of Computer Science, University of Basel. Finite-state systems represent the state-of-the-art in morphological analysis because of their efficiency in time and space.

Johns Hopkins Demos
http://bigram.cs.jhu.edu/~demos/index.html
Description: Part-of-speech tagger; natural language interface to databases.
Impression: No tutorial, just a demo.

The Natural Language Playground
http://www.link.cs.cmu.edu/dougb/playground.html
Description: Collection of NLP demos offered by the Link Grammar group at Carnegie Mellon. Among others:
• Parse and extract lexical relationships from unrestricted English sentences.
• Insert commas at appropriate places into a sentence stripped of punctuation.
• Find relationships between words and concepts across a variety of different relation types.
• Find rhyming words that are close in meaning to a given target.
• The CMU Pronouncing Dictionary - an English pronunciation dictionary with over 100,000 words.
• QuickLM compiler - compile a back-off language model on sparse data.
• Hypertext Webster - a hypertext interface to various Webster's dictionary services; a great Web page filter with many "collaborative dictionaries".
• A number of dictionary search services - find words that match a given definition, or answer a given question.

5 Speech Production and Perception

MAD (Matlab Auditory Demonstrations)
http://www.dcs.shef.ac.uk/~martin
Sheffield University, Department of Computer Science
Availability: free
Requirements: MATLAB and audio components
Description: Sheffield's software is a growing resource for interactive student learning in speech and hearing. A number of the demos (more than 20 MATLAB applications) address topics in hearing, for instance basilar membrane motion, sine-wave speech and auditory scene analysis.
Impression: Strong features of MAD are its uniform look-and-feel and documentation style. In summary, one of the best tutorials for psycho-acoustic experiments.

Cochlear Mechanics
http://btnrh.boystown.org/cel/
Boys Town National Research Hospital, Omaha, Nebraska, Communication Engineering Laboratory
Availability: free
Requirements: WWW browser
Description: From the web page of the Communication Engineering Laboratory, the user may select the following topics:
• Introduction to Cochlear Mechanics
• Piezoelectric model of the Outer Hair Cell (PDF)
• Notes on cochlear mechanics (PDF)
• Loudness FAQ
The site also offers free software for speech analysis.
Impression: Under "Introduction to Cochlear Mechanics", the user will find hair cell animations which are very useful for supporting lectures on the functioning of the inner ear.

A Pictorial Guide to the Cochlear Fluids
http://oto.wustl.edu/cochlea/
Washington University Medical School, St. Louis, Department of Otolaryngology
Availability: free
Requirements: WWW browser
Description: The tutorial-like text, enhanced with numerous pictures, consists of the parts
• "Fluid in Your Ears"
• Anatomy of the Inner Ear
• Cochlear Anatomy
• Cochlear Fluids Composition
• Endolymphatic Hydrops
Furthermore, a free Cochlear Fluids Simulator program may be downloaded.
Impression: The material is presented from a rather medical perspective. It may be useful for explaining details within a lecture on the functioning of the inner ear.


Audite - Ein Multimediaprogramm zum Thema Gehör und Hören (in German; a multimedia program on the ear and hearing)
http://www.dasp.uni-wuppertal.de/audite
Bergische Universität Gesamthochschule Wuppertal, Fachgebiet Digitale Signalverarbeitung und Elektroakustik
Author: Martina Kremer
Requirements: Netscape Navigator 4.5 or Microsoft Internet Explorer 4.0, ActiveX components
Description: Fundamentals of acoustics, physiology of the ear, transfer functions, psychoacoustics.
Impression: An excellent tutorial, well suited to support a lecture series on fundamentals of hearing and psychoacoustics. Numerous acoustic demonstrations and experiments are included.

Sprache und Gehirn. Ein neurolinguistisches Tutorial (in German; "Language and the Brain", a neurolinguistic tutorial)
http://www.ims.uni-stuttgart.de/phonetik/joerg/sgtutorial/
Experimental Phonetics Group at the Institute of Natural Language Processing, University of Stuttgart, Germany
Authors: G. Dogil, J. Mayer
Requirements: WWW browser. The tutorial contains some audio examples which are data from patients; these data, as well as some graphics, are password protected.
Description: The tutorial offers an overview of the most important topics in neurolinguistics, clinical linguistics and phonetics, with the chapters
• The brain
• Localisation of speech in the brain
• Speech and speaking disorders
• Linguistic diagnostics
Impression: The tutorial is mainly directed towards neurolinguistics, but may also be used to illustrate some facts about higher processes in speech perception.

Akustische Phonetik (in German; "Acoustic Phonetics")
http://www.phonetik.uni-muenchen.de/AP/APHome.html
Institut für Phonetik und Sprachliche Kommunikation der Ludwig-Maximilians-Universität München
Authors: H.G. Tillmann, F. Schiel
Requirements: WWW browser with sound replay capabilities
Description: A companion hypertext document to the lecture ('Begleitendes Hypertext-Dokument zur Vorlesung'). Besides an introductory chapter, the document is subdivided into the three main chapters
• What is sound?
• What are speech sounds?
• What turns sounds into speech sounds?
Impression: Companion material to a complete lecture series in acoustic phonetics, with high expertise, good pictures and audio examples.


Auditory scales of frequency representation
http://www.ling.su.se/staff/hartmut/bark.htm
Stockholm University, Department of Linguistics
Author: Hartmut Traunmüller
Availability: free
Requirements: WWW browser and audio components
Description: Basic auditory processes, physical scales, auditory scales, conversion equations.
Impression: A comprehensive text with tables and examples which can serve as a hand-out for students. Acoustic examples are included.

En tur i fonetikens marker (A Tour in the Domains of Phonetics)
http://www.ling.su.se/staff/hartmut/tur.htm
Stockholm University, Department of Linguistics
Author: Hartmut Traunmüller
Description: In Swedish (some sections available in English)
1. Grundtonen (The fundamental)
2. Formanterna och spektrogram (Formants and spectrograms)
3. Svenska vokaler (Swedish vowels)
4. Språklig och utomspråklig variation i tal (Linguistic and extra-linguistic variation in speech)
5. Manipulation av talares ålder och kön (Speaker age and sex manipulations)
6. Grundtonens betydelse för varseblivningen av vokaler (The role of F0 in vowel perception)
7. Modulationsteorin (The Modulation Theory of Speech)
8. Nolla-halloneffekten
9. Kan vi lita på våra öron? (Can we trust our ears?)
10. Besläktade presentationer på nätet (Related presentations on the Web)
Impression: A nice tutorial on Swedish phonetics (which is why it is written mainly in Swedish).
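The conversion equations covered by the first tutorial can be illustrated with a short sketch. The Bark formula below is Traunmüller's own published (1990) approximation; the mel formula is the common 2595·log10 variant. Both are standard, but treat the exact constants as assumptions of this sketch rather than something taken from the tutorial itself:

```python
import math

def hz_to_bark(f):
    """Critical-band rate in Bark; Traunmueller's (1990) approximation."""
    return 26.81 * f / (1960.0 + f) - 0.53

def hz_to_mel(f):
    """Mel scale; the common 2595 * log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

if __name__ == "__main__":
    # Both scales compress high frequencies relative to low ones.
    for f in (100.0, 500.0, 1000.0, 4000.0):
        print(f, round(hz_to_bark(f), 2), round(hz_to_mel(f), 1))
```

Around 1000 Hz both scales give values near their nominal reference points (about 8.5 Bark and 1000 mel), which is a convenient sanity check for students.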

Speech Production and Perception
http://www.sens.com/SPP1.htm
Availability: Demo version
Requirements: commercial CD-ROM
Description: Speech Production and Perception I (SPP1) [1] is interactive multimedia CD-ROM courseware that enables students to understand the correspondence between sound, spectrum and articulation. One of the main strengths of using this tool in teaching an introductory course in acoustic phonetics is its excellent and stimulating interactivity. The courseware includes hundreds of interactive models and simulations that motivate the user for self- and teacher-guided instruction through the study units on Spectrograms, Vowel Acoustics, Consonant Acoustics, Speech Perception, and Vowel Perception. Additionally, SPP1 includes a library of IPA vowel and consonant charts with spectrograms and pronunciations, an interactive glossary defining the basic terminology, and reference links to textbook material. The courseware also provides interactive student evaluations as well as separately printed student worksheets. Each worksheet contains questions that require about paragraph-length answers, enabling written assessment of study goals at the introductory as well as advanced levels of evaluation.
Impression: Information Presentation, Knowledge Space Compatibility, Navigation and Mapping (for definitions, see [2]) are the strong dimensions of SPP1's user interface. The Media Integration dimension could be improved by the judicious addition of video clips, e.g. to support the animated vocal-tract cross sections with visual feedback of the human articulatory setup during consonant or vowel pronunciations. The Cognitive Load dimension could be improved by telling the user which concepts are prerequisites and are better studied from textbooks before entering an SPP1 courseware unit. In conclusion, SPP1 is an excellent example of successful synergy between the teachers, speech scientists, artists and programmers who developed it.

References:
[1] Berkovitz, R. (1999). Design, development and evaluation of computer-assisted learning for Speech Science education. Proc. ESCA/SOCRATES Tutorial and Research Workshop on Method and Tool Innovations for Speech Science Education (MATISSE), pp. 9-16.
[2] Reeves, T.C. and Harmon, S.W. (April 1999). User interface rating tool for interactive multimedia. http://itech18.coe.uga.edu/edit8350/UIRF.html

Human Speech Production Based on a Linear Predictive Vocoder. An Interactive Tutorial
http://www.kt.tu-cottbus.de/speech-analysis
BTU Cottbus and TU Berlin

Authors: K. Fellbaum and J. Richter

Availability: free

Requirements: WWW Browser (best: NETSCAPE 4.5), sound replay capabilities

Description: The tutorial explains the principle of human speech production with the aid of a Linear Predictive vocoder (LPC vocoder) and the use of interactive learning procedures. The user can replay the signal and compare it with the reference speech signal. For visual comparison, the reference speech signal and the reconstructed speech signal are depicted in both the time and frequency domains. For the reconstructed signal, the pitch frequency contour is also presented graphically, and the user can manipulate this contour directly. The main advantages of the tutorial are its numerous interactive functions and its speech input feature. The tutorial is based on HTML pages and Java applets and can be downloaded.
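The LPC analysis underlying such a vocoder can be sketched in a few lines. This is a generic autocorrelation-method implementation with the Levinson-Durbin recursion, written for illustration; it is not the code of the Cottbus tutorial itself:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients by the autocorrelation method with the
    Levinson-Durbin recursion.  Returns the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p and the residual energy."""
    n = len(frame)
    # Autocorrelation sequence r[0..order]
    r = [float(np.dot(frame[:n - k], frame[k:])) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]  # update previous coefficients
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)                # shrink the prediction error
    return np.array(a), err
```

For a speech frame, an order of 10-14 at 8 kHz gives the vocal-tract filter that a vocoder of this kind transmits per frame, together with the pitch and voicing information.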

6 Speech Coding

The area of speech coding is not well covered by tutorials on the Web. Most tutorials are mainly theoretical and rather superficial, with general descriptions and some block diagrams. They can be useful for a first impression of the subject, but they are not designed to let you play and experiment with the ideas and gain more profound knowledge. Some tutorials include demo files, but even those are of limited use, as they do not usually allow you to generate speech files. One exception is a tutorial on an LPC vocoder (see http://www.kt.tu-cottbus.de/speech-analysis); although it explains the principle of LPC vocoders, it is more concerned with aspects of human speech production. For details see the description in the sub-section "Speech Production and Perception". Second, the different coding techniques are not covered uniformly: there is relatively wide coverage of waveform coding techniques, analysis-by-synthesis coding techniques and the LPC vocoder, but little material on very low bit rate speech coding (apart from the LPC vocoder) and on perceptual matters as applied to speech coding. The following list describes some interesting web pages that can usefully complement a speech coding course.
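As a concrete taste of the waveform-coding end of the spectrum, the mu-law companding characteristic that underlies G.711 telephone coding can be sketched. This uses the smooth analytical form of the curve, not the segmented table of the actual standard:

```python
import math

MU = 255.0  # mu-law parameter used by G.711

def mu_compress(x):
    """Map a sample in [-1, 1] through the mu-law characteristic,
    expanding small amplitudes before uniform quantization."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Invert the compression at the decoder."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Quantizing the compressed value uniformly gives finer effective steps for small amplitudes, which is why 8-bit mu-law speech sounds comparable to roughly 12-bit linear PCM.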

Speech Coding
http://www-mobile.ecs.soton.ac.uk/jason/speech_codecs/
Availability: free
Description: A qualitative introduction to speech coding techniques from high to low bit rates. No demos.
Impression: Good for an overview of the most common speech coding techniques, with references to the standards that use them.

On-line Tutorial on Subband Coding
http://www.apocalypse.org/pub/u/howitt/sbc.tutorial.html
Otolith (Will Howitt)
Description: This is a short description, mainly of the MPEG-1 standard. Sub-Band Coding (SBC) is a powerful and general method of encoding audio signals efficiently. Unlike source-specific methods (like LPC, which works only on speech), SBC can encode any audio signal from any source, making it ideal for music recordings, movie soundtracks, and the like. MPEG Audio is the most popular example of SBC. The document describes the basic ideas behind SBC and discusses some of the issues involved in its use.
Impression: Helpful to support a teaching unit on MPEG-1.

On-line Tutorial on Linear Predictive Coding
http://www.apocalypse.org/pub/u/howitt/lpc.tutorial.html
Otolith (Will Howitt)
Availability: free
Requirements: WWW browser and audio components
Description: This is a short discussion of selected problems concerning the application of LPC. Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good-quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters and is relatively efficient to compute. The document describes the basic ideas behind linear prediction and discusses some of the issues involved in its use.
Impression: Helpful to support a teaching unit on application aspects of LPC.
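The sub-band principle described in the SBC tutorial above can be illustrated with a minimal two-band filter bank. This sketch uses the 2-tap Haar filter pair for simplicity; real codecs such as MPEG audio use much longer polyphase filters, but the split-quantize-reconstruct structure is the same:

```python
import numpy as np

def analysis(x):
    """Split a signal (even length) into a low band and a high band,
    each decimated by 2, using the 2-tap Haar filter pair."""
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    low = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    high = (x[:, 0] - x[:, 1]) / np.sqrt(2.0)
    return low, high

def synthesis(low, high):
    """Perfectly reconstruct the signal from the two subbands."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x
```

In a real coder, the two bands would be quantized with different numbers of bits according to a perceptual model before `synthesis` is applied at the decoder; with no quantization, reconstruction is exact.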

Wideband Speech and Audio Coding
http://www.umiacs.umd.edu/users/desin/Speech/new.html
Availability: free
Description: An introduction to wideband speech and audio coding, with emphasis on perceptual matters. No demos.
Impression: Good for an introduction to the speech and audio coding techniques used for bandwidths wider than the telephone bandwidth.

Speech Coding
http://wwwdsp.ucd.ie/speech_tut.htm
Description: An introduction to speech coding with reference to speech production, perception and quantization. It treats all types of speech coding techniques. No demos.
Impression: A fairly comprehensive tutorial.

Speech-related software: LPC vocoder implemented in Simulink, MELP source code, subband coder, pitch shifting and detection
http://www.cteh.ac.il/staff/noama
Author: Noam Amir, [email protected]
Availability: free
Requirements: WWW browser and audio components; partly needs Matlab and Simulink
Description: This site contains an ongoing accumulation of speech-related projects, software and tutorial matter, spread over a number of areas of the site. The "student projects" page contains project reports in HTML on speech coding and processing, along with source code in C, MATLAB and SIMULINK and the relevant sound files. "My tutorials" contains various programs written to demonstrate speech and DSP material. One of the most interesting parts is a complete LPC vocoder implemented in SIMULINK, along with GUIs for modifying pitch, intensity and duration. This site is continuously evolving as the students and the teacher add material.

The following links are not tutorials, but they contain demos of different speech coders and can be used to complement the tutorials above that have no demos.
http://www-mobile.ecs.soton.ac.uk/speech_codecs/voicedemo (digital/analogue voice demo)
http://www-mobile.ecs.soton.ac.uk/clare/index.html (low bit-rate speech coders)
http://www.eas.asu.edu/~speech/table.html (speech coding demonstrations from many different speech coders)
http://people.qualcomm.com/karn/voicedemo/ (short online demo with speech examples)
http://rainbow.ece.ucsb.edu/demos/wideband_new/Readme.html (demo of multi-band CELP (MBC) wide-band speech coding)
http://www.ee.cityu.edu.hk/~cfchan/demo.html (very short description with online speech examples)
http://www-isl.stanford.edu/people/earl/speech_coding.html (Earl Levine's Speech Coding Research Page)

Links, also for further search:
http://cslu.cse.ogi.edu/HLTsurvey/indextop.html
http://www-mobile.ecs.soton.ac.uk/jason/speech_codecs/
http://wwwdsp.ucd.ie/speech_tut.htm
http://ourworld.compuserve.com/homepages/Peter_Meijer/link.htm
http://www-mobile.ecs.soton.ac.uk/speech_codecs/hot_links.html

7 Speech Synthesis

Speech synthesis belongs to the areas of speech processing that are only weakly covered by tutorials. Most of the Web material consists of acoustic examples of synthesised speech: the user types a text, which is transcribed into phonetic symbols; these are transformed into acoustic elements, and the elements are finally concatenated into continuous speech. Although the generation of synthetic speech is very impressive, because each synthesiser has its specific sound and pronunciation, detailed information about the transcription, segmentation and concatenation aspects is largely missing from the Web material (exception: the Interactive Course on Speech Synthesis of the TU Dresden and the BTU Cottbus, http://www.ias.et.tu-dresden.de/kom/lehre).

The user is normally referred to books and articles. A selection of useful references has already been given in book 2.
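The text-to-phonetic-symbols-to-concatenation pipeline described above can be caricatured in a few lines. Everything here is a toy assumption: a hypothetical two-word lexicon and sine-tone "units" stand in for a real pronunciation dictionary and recorded diphone waveforms:

```python
import numpy as np

# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"hello": ["h", "e", "l", "o"], "world": ["w", "o", "r", "l", "d"]}

def make_unit(phone, fs=8000, dur=0.08):
    """Stand-in 'acoustic element': a short tone whose frequency is
    derived from the phone label.  A real system would look up a
    recorded diphone waveform here instead."""
    f = 200 + 40 * (ord(phone) % 10)
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * f * t)

def synthesise(text):
    """Transcribe words to phones, map phones to units, concatenate."""
    phones = [p for w in text.lower().split() for p in LEXICON[w]]
    return np.concatenate([make_unit(p) for p in phones])
```

Real systems differ at every stage (letter-to-sound rules instead of a closed lexicon, diphones instead of phones, smoothing at the joins), but the three-step structure is exactly what the online demos in this section implement.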

Text-to-Speech System - Bell Labs
http://www.bell-labs.com/project/tts/
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: High-quality text-to-speech system. Languages: American English, German, Mandarin Chinese, Spanish, French, Italian. The system performs synthesis via the Internet using diphone segments of natural speech.
Impression: This is not a tutorial, but very useful material.

Online Synthesis with HADIFIX
http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
Institut für Kommunikationsforschung und Phonetik of the Universität Bonn.
Author: Thomas Portele
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: Synthesis online via the Internet; maximum length 512 characters. Male and female voice. Different formats (.au, .wav, .pcm) and resolutions (8 bit linear, 8 bit u-law and 16 bit linear).
Impression: This is not a tutorial, but very useful material.

Automatic Transcription of German Text
http://asl1.ikp.uni-bonn.de/~tpo/O2p.en.html
Institut für Kommunikationsforschung und Phonetik of the Universität Bonn.
Author: Thomas Portele
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: The transcription from orthographic input to SAMPA is generated using the methods of the speech synthesis system HADIFIX.

Multilingual Text-to-Speech System
http://www.fb9-ti.uni-duisburg.de/demos/speech.html
Gerhard-Mercator-Universität Duisburg

Availability: free.

Requirements: WWW browser with sound replay capabilities.

Description: Moderate speech quality. Synthesis online via the Internet.
Languages: German, English, Japanese. 8-bit u-law, 8-bit linear, 16-bit linear, different sampling frequencies. Formats: .wav, .au, .aiff, .raw

Impression: This is not a tutorial, but very useful material.

Vienna Concept-to-Speech system – VIECTOS http://www.ai.univie.ac.at/oefai/nlu/viectos/ Austrian Research Institute for Artificial Intelligence (ÖFAI) in cooperation with the Institute of Communications and Radio-Frequency Engineering at Vienna University of Technology.

Availability: free.

Requirements: WWW browser with sound replay capabilities.

Description: Language: German. Input: SAMPA format; the synthesizer output is returned as a 16-bit/16-kHz Wave file (MIME type: audio/x-wav). Synthesis online via the Internet.

Impression: This is not a tutorial, but very useful material.

Interactive Demo of SVOX Text-to-Speech System http://www.tik.ee.ethz.ch/cgi-bin/w3svox

Availability: free.

Requirements: WWW browser with sound replay capabilities.

Description: Synthesis online via the Internet, different sound formats.

Impression: This is not a tutorial, but very useful material.

Festival Speech Synthesis System
Centre for Speech Technology Research, University of Edinburgh
http://www.cstr.ed.ac.uk/projects/festival.html (description)
http://www.cstr.ed.ac.uk/projects/festival/userin.html (online demo)
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Version 1.3.1 (26 January 1999), online demos via the Internet. Very good quality. Audio files are 8-bit u-law 8 kHz (audio/basic), 16-bit Sun headered, or 16-bit Microsoft WAV format.


Description: Festival is a general multi-lingual speech synthesis system developed at CSTR. It offers a full text-to-speech system with various APIs and an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control.
Impression: This is not a tutorial, but very useful material.

Klatt Synthesizer
http://www.asel.udel.edu/speech/tutorials/synthesis/KlattSynth/index.htm
http://www.asel.udel.edu/speech/tutorials/synthesis/vowels.html
http://www.asel.udel.edu/speech/tutorials/synthesis/ceevees.html
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: Introduction to the design of the Klatt synthesizer. Vowels, consonants and syllables can be produced by input of parameters like pitch, formant frequencies etc.
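The resonators at the heart of a Klatt-style formant synthesizer are second-order digital filters. The sketch below follows the standard difference equation y[n] = A·x[n] + B·y[n-1] + C·y[n-2]; the formant frequencies and bandwidths are textbook approximations for an /a/-like vowel, not Klatt's exact parameter tables:

```python
import math

def resonator(x, f, bw, fs):
    """Second-order digital resonator with centre frequency f and
    bandwidth bw (both in Hz): y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * f * T)
    A = 1.0 - B - C                      # unity gain at DC
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = A * s + B * y1 + C * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def vowel(f0=100, formants=((700, 130), (1220, 70), (2600, 160)),
          fs=8000, dur=0.3):
    """Crude /a/-like vowel: an impulse train (glottal source) passed
    through cascaded formant resonators."""
    n = int(fs * dur)
    period = int(fs / f0)
    x = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in formants:
        x = resonator(x, f, bw, fs)
    return x
```

A full Klatt synthesizer adds a shaped glottal pulse, parallel branches for frication, and per-frame parameter updates, but the cascade of resonators above is the core that the tutorial's parameter sliders control.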

Examples of Synthesised Speech
http://www.ims.uni-stuttgart.de/phonetik/gregor/synthspeech/examples.html
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: A nice and large collection of different synthesis examples. Many links to TTS systems.

A research version of Next-Generation Text-To-Speech (TTS)
http://www.research.att.com:80/~mjm/cgi-bin/ttsdemo
AT&T Labs
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: Excellent synthesis system for English.

Zur Geschichte der Sprachsynthese (History of Speech Synthesis; in German)
http://www.ling.su.se/staff/hartmut/kempln.htm
Stockholm University, Department of Linguistics
Author: Hartmut Traunmüller
Availability: free.


Requirements: WWW browser with sound replay capabilities.
Description: in German
• Wolfgang von Kempelens sprechende Maschine (Wolfgang von Kempelen's speaking machine; this part also in English)
• Homer Dudleys VODER (Homer Dudley's VODER)
• Frank Coopers Pattern Playback (Frank Cooper's Pattern Playback)
• Elektrische Modelle der Sprachproduktion (Electrical models of speech production)
• Computergesteuerte Sprachsynthese (Computer-controlled speech synthesis)
Impression: This text provides the user with historical material and demos, mainly on the Kempelen machine but also on later developments. Further benefits come from the numerous links to other sources and demos contained in the HTML material. A wonderful document with a clear description and excellent graphical material.

An Interactive Course on Speech Synthesis
http://www.ias.et.tu-dresden.de/kom/lehre
Authors: R. Hoffmann, U. Kordon et al., TU Dresden and BTU Cottbus
Availability: free.
Requirements: WWW browser with sound replay capabilities.
Description: In the first part, the components of a TTS system are explained in general. On entering this part, a block diagram of a TTS system is presented to the user, who can click on the different building blocks to obtain an explanation of the corresponding block. A special section is devoted to the crucial problem of correct segmentation of the speech elements used for the synthesis. First, the rules and the problems associated with the segmentation are explained. In a second, experimental part, the user may select his own diphone segments from a given speech database. The quality of the segments may be evaluated acoustically, and hints are given on how to avoid cutting errors. Thus, the user learns how to select segments of good quality. Another special section offers a complete TTS system to the user for experimental purposes: the user may type any text and observe how the system processes it, from the first linguistic preprocessing to the acoustic synthesis. The TTS system presented is the "Dresden Speech Synthesizer" (DreSS).
Impression: One of the very rare tutorials in speech synthesis which goes beyond pure synthesis examples.

More links:
http://www.ai.univie.ac.at/~hannes/lv_bookmarks.html
http://rice.ecs.soton.ac.uk/hth97r/links/speechlink/sp_tutorial.html
http://www.isis.ecs.soton.ac.uk/~pjbj96r/sgrs/sgrslink.html
http://wwwam.hhi.de/hotlist/speech.htm
http://www.cstr.ed.ac.uk/~awb/synthesizers.html
http://svr-www.eng.cam.ac.uk/comp.speech/Section5/Q5.4.html (online synthesis)
http://www.ipds.uni-kiel.de/links/skripte.en.html
http://asa.aip.org/links.html
http://www.tue.nl/ipo/hearing/webspeak.htm (links, online synthesis, spoken samples of synthetic speech)
http://www.is.cs.cmu.edu/
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

8 Speech Recognition

In order to allow more flexibility in curriculum design and implementation, this topic is split into two modules.

8.1 Basic topics
The first module presents the basic topics and stops at the level of isolated word recognition based on either dynamic time warping or hidden Markov models.
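The dynamic time warping half of this module can be made concrete with a short reference implementation. The one-dimensional "templates" in the usage test are purely illustrative; in practice each frame would be a vector of cepstral features:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two feature sequences
    (one entry per frame), using the standard local path with
    insertion, deletion and match steps."""
    na, nb = len(a), len(b)
    D = np.full((na + 1, nb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            # Local cost: Euclidean distance between the two frames.
            cost = np.linalg.norm(np.atleast_1d(a[i - 1]) -
                                  np.atleast_1d(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[na, nb]

def recognise(obs, templates):
    """Isolated word recognition: pick the template with minimal DTW cost."""
    return min(templates, key=lambda w: dtw(obs, templates[w]))
```

This is exactly the template-matching scheme that systems such as VISPER (below) visualise step by step: one stored template per vocabulary word, and classification by minimal accumulated warping cost.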

The minimal theoretical support for this first module can be found in the following books:

Deller, J.R., Proakis, J.G., and Hansen, J.H.L. [1993]. Discrete Time Processing of Speech Signals. New York: Macmillan, chapters 10-12.
Rabiner, L.R. and Juang, B.H. [1993]. Fundamentals of Speech Recognition. Prentice Hall, chapters 1-2, 4-6.

More detailed or alternative treatments of some topics can be found in:

O'Shaughnessy, D. [1987]. Speech Communication: Human and Machine. Addison-Wesley.
Huang, X.D., Ariki, Y. and Jack, M. [1990]. Hidden Markov Models for Speech Recognition. Edinburgh University Press.
Jelinek, F. [1997]. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press.
De Mori, R. (Ed.) [1998]. Spoken Dialogues with Computers. New York: Academic Press.

The experimental support for this module includes, besides the software tools in the list of Internet resources below, isolated-word speech databases, which are easy to collect for a moderate number of speakers and a moderate vocabulary size, or can be obtained at moderate cost from specialized agencies, e.g. the Linguistic Data Consortium's TI 46-Word database (http://www.ldc.upenn.edu).


The inventory of Internet resources for the first module includes:

Joe Picone's Course Materials
http://www.isip.msstate.edu/publications/courses/ece_8993_speech/
http://www.isip.msstate.edu/publications/courses/isip_0000/
Availability: free.
Requirements: Deller, J.R., Proakis, J.G., and Hansen, J.H.L. [1993]. Discrete Time Processing of Speech Signals. New York: Macmillan, chapters 10-12. Software: Acrobat Reader, Word.
Description: Material for two courses on the fundamentals of speech recognition taught by Joe Picone at Mississippi State University, based mostly on the textbook by Deller, Proakis, and Hansen. They cover more or less the modules on both speech analysis and speech recognition, up to large vocabulary continuous speech recognition.
Impression: Not exactly what the green book recommends, but nevertheless an example of how to teach all of this in a one-semester course.

The CSLU Tutorials
http://cslu.cse.ogi.edu/tutordemos
Availability: free.
Requirements: computer with WWW browser.
Description: Collection of tutorials on HMM, NN, and hybrid automatic speech recognition, using the CSLU toolkit.
Impression: Useful material for a wide range of expertise (from beginners to advanced-level users).

The VISPER System
http://www.fm.vslib.cz/~kes/visper.html
Availability: 200 US$ for academic institutions, 500 US$ for others, for an unlimited number of Windows licenses.
Requirements: Minimum Pentium PC at 100 MHz, 16 MB RAM, SVGA 800x600, 16-bit sound card, microphone, speakers, Windows 95.
Description: Available at a moderate cost for an unlimited number of binary licenses under Windows, the VIsual SPEech pRocessing (VISPER) system from the Speech Lab, Technical University of Liberec, Czech Republic, deals with all stages of isolated word recognition using DTW or CDHMM, and has facilities to visually examine all the processing it does.
Impression: Although limited to isolated word recognition, it has good educational value.


The CSLU Toolkit
http://cslu.cse.ogi.edu/toolkit/
Availability: free Windows beta binaries.
Requirements: minimum Windows 95/98/NT/2000 PC, Pentium Pro/II/III 200 MHz, 64 MB RAM, sound I/O.
Description: The Oregon Graduate Institute Center for Spoken Language Understanding Speech Toolkit is a very good resource, possibly the best, given the documentation and the availability of some tutorials (see above).
Impression: This could become an exemplary educational tool, provided it were made available on a wider range of platforms, open-sourced, and given a more stable status (at the time this article was written - June 1999 - only Windows binaries were available, with a time-limited license).

The INTEL Recognition Primitives Library
http://www.intel.com/vtune/perflibst/rpl/
Availability: free Windows DLLs.
Requirements: Intel-architecture CPU PC (Pentium II/III recommended), min. 16 MB RAM (32 MB recommended), Windows 95/98/NT, C++ compiler.
Description: The Intel Recognition Primitives Library, useful for exploring isolated word recognition in a Windows environment, includes functions for speech feature extraction (pre-emphasis, windowing, cepstral and linear prediction analysis) and recognition (distance computation, Gaussian mixture estimation, MLP and Kohonen neural networks, vector quantization, DTW, HMMs - discrete, continuous and semicontinuous).
Impression: Very good documentation, but not always in sync with the actual package contents.

University of Maryland Discrete HMMs
http://www.cfar.umd.edu/~kanungo/software/software.html
Availability: free sources.
Requirements: C compiler.
Description: Simple discrete HMM software, with training based on a single sequence of symbols. Used to teach HMM-based POS tagging in a statistical NLP course.

Mississippi State University Discrete HMMs
http://www.isip.msstate.edu/projects/speech/software/discrete_hmm
Availability: free sources.
Requirements: C compiler.
Description: Discrete HMM demo based on the book by Deller, Proakis, and Hansen, illustrating how the theory can be implemented.
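The evaluation problem for a discrete HMM of the kind both packages implement fits in a few lines via the forward algorithm. The two-state examples in the test are made up for illustration:

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(obs | model) for a discrete HMM by the forward algorithm.
    pi: (N,) initial state probabilities, A: (N, N) state transition
    matrix, B: (N, M) symbol emission matrix, obs: symbol indices."""
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    alpha = pi * B[:, obs[0]]          # initialise with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and emit
    return float(alpha.sum())
```

Training (Baum-Welch) reuses these forward variables together with the analogous backward pass; for word recognition, one such model is trained per vocabulary word and the model with the highest likelihood wins.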


8.2 Advanced topics
The second module is oriented towards advanced topics, including continuous speech recognition (realized either as connected word recognition or based on subword acoustic modeling units), speaker modeling and adaptation, robust recognition, etc.

The theoretical support for this module can be found in the following books:
Deller, J.R., Proakis, J.G., and Hansen, J.H.L. [1993]. Discrete Time Processing of Speech Signals. New York: Macmillan, chapters 13-14.
Rabiner, L.R. and Juang, B.H. [1993]. Fundamentals of Speech Recognition. Prentice Hall, chapters 1, 7-9.

More detailed or alternative treatments of some topics can be found in:
Huang, X.D., Ariki, Y. and Jack, M. [1990]. Hidden Markov Models for Speech Recognition. Edinburgh University Press.
Keller, E. (Ed.) [1994]. Fundamentals of Speech Synthesis and Speech Recognition. John Wiley & Sons.
Bourlard, H.A. and Morgan, N. [1994]. Connectionist Speech Recognition: A Hybrid Approach. Kluwer.
Junqua, J.C. and Haton, J.P. [1996]. Robustness in Automatic Speech Recognition. Kluwer.
Jelinek, F. [1997]. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press.
De Mori, R. (Ed.) [1998]. Spoken Dialogues with Computers. New York: Academic Press.

The experimental support is more complicated than for the first module: continuous speech recognition needs speech databases to build acoustic models, text corpora to train language models, and pronunciation dictionaries. Given the nontrivial effort needed to produce such resources, and the prices at which they can be obtained from specialized agencies, the most suitable for this module are speech databases like the Linguistic Data Consortium's (http://www.ldc.upenn.edu/) TIDIGITS (connected digits recognition), TIMIT, and Resource Management (continuous speech recognition), or the European Language Resources Association's (http://www.icp.inpg.fr/ELRA/) EUROM databases. The same agencies can provide larger, more complex speech databases and the associated text corpora, but the increased costs are not justified by purely educational use.

The ABBOT Hybrid Continuous Speech Recognition Demo System
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo/
Availability: free binaries.
Requirements: Unix (HP-UX/IRIX/Linux/Solaris/SunOS) machine with sound I/O.
Description: Binary demo versions of the Abbot hybrid ASR system from Cambridge University Engineering Department, with a graphical interface to all its operations: speech input calibration, acquisition, replay, recognition, display of some recognition internals.
Impression: Good (but not more than a) demo.


Language Identification Tutorial and Quiz
http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol3/gac1/test.html#gq1
Availability: free.
Requirements: WWW browser.
Description: A report on automatic language identification, tutorial in nature, plus a quiz.

The ISIP Public Domain ASR System http://www.isip.msstate.edu/projects/speech/software/asr/ Availability: free sources. Requirements: Unix system; GNU C++ compiler and make; Tcl/Tk 8.0 or newer; NIST sctk package. Description: The Large Vocabulary Conversational Speech Recognition project at the Institute for Signal and Information Processing of Mississippi State University makes available a public domain ASR system using standard HMM technology. It includes programs for all the steps from feature extraction (MFCC, CMS, energy, delta, acceleration), through context independent and dependent Baum-Welch and Viterbi HMM training, up to decoding (forced alignment, word lattice generation, n-gram decoding), and results evaluation. The documentation so far consists of only a short user manual, including a tutorial on continuous speech recognition, and many aspects could be more detailed. Impression: As the project is in progress, and other developments are announced, it can be expected to become a very interesting resource.

The CMU-Cambridge Statistical Language Modeling Toolkit
http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html
Availability: free sources.
Requirements: Unix system, C compiler.
Description: The CMU-Cambridge Statistical Language Modeling toolkit is a very well documented suite of UNIX software tools for constructing standard n-gram statistical language models: word frequency lists, vocabularies, general and vocabulary-specific n-gram counts, n-gram-related statistics, and backoff language models (Good-Turing, Witten-Bell, absolute, and linear discounting).
Impression: A very useful tool; a must if you work in language modeling or want to build large vocabulary continuous speech recognition systems.
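The kind of counts and smoothed estimates such a toolkit produces can be sketched minimally. This uses add-one smoothing for brevity; the toolkit itself implements the more refined Good-Turing, Witten-Bell and backoff schemes listed above:

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences,
    with <s>/</s> sentence boundary markers."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def prob(uni, bi, w1, w2, vocab_size):
    """Add-one smoothed bigram probability P(w2 | w1)."""
    return (bi[(w1, w2)] + 1) / (uni[w1] + vocab_size)
```

Smoothing is what makes the model usable in a recognizer: unseen bigrams receive a small non-zero probability instead of vetoing an otherwise plausible word sequence.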


The Speech Training and Recognition Unified Tool (STRUT)
http://tcts.fpms.ac.be/speech/strut.html
Availability: free Unix (HP-UX, DEC-Alpha, Linux, IRIX, SunOS) and Windows NT binaries.
Requirements: Unix or Windows computer with optional sound I/O.
Description: The Speech Training and Recognition Unified Tool (STRUT) is made available in binary form by the Circuit Theory and Signal Processing Laboratory of the Mons Polytechnic Faculty, Belgium, accompanied by a user manual. It includes programs for feature extraction (LPC, MFC, RASTA-PLP, cepstral mean subtraction), acoustic modeling using discrete, continuous, and hybrid HMMs, isolated word and continuous speech recognition experiments, and results evaluation.
Impression: Pretty good coverage of the most important ASR methods, but also quite involved configuration and command-line options.
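Cepstral mean subtraction, one of the front-end options listed above, is simple enough to show in full. A stationary convolutional channel appears as an additive constant in the cepstral domain, so subtracting the per-utterance mean of each coefficient cancels it:

```python
import numpy as np

def cms(features):
    """Cepstral mean subtraction.  features: (frames, coefficients)
    array of cepstral vectors for one utterance; returns the same
    array with the per-coefficient utterance mean removed."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0)
```

The key property is invariance: shifting every frame by a constant offset (the cepstral signature of a fixed microphone or telephone channel) leaves the CMS output unchanged.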

The NICO Neural Network Toolkit http://www.speech.kth.se/NICO/ Availability: free sources. Requirements: C compiler. Description: The NICO toolkit, available in source form from the speech group of the Royal Institute of Technology, Stockholm, includes tools to build (topology definition, I/O format specification), train (input normalization, back-propagation training, pruning), and evaluate artificial neural network classifiers for ASR systems. Impression: A well documented work, but additional software has to be written to build a full automatic speech recognition system.

The Stuttgart Neural Network Simulator (SNNS) http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/ Availability: free sources and binaries for SunOS, Solaris, Linux, Windows95/NT. Requirements: X-windows server. Description: SNNS (Stuttgart Neural Network Simulator) is a software simulator for neural networks developed at the Institute for Parallel and Distributed High Performance Systems (IPVR) at the University of Stuttgart. It consists of a kernel simulator, operating on internal data structures of the neural networks and performing all learning and recall, and a graphical user interface under X11. The kernel can also be used without the other parts, as a C program embedded in custom applications and it is extensible. Many network architectures and learning procedures are included. Impression: A very comprehensive package, very well documented, worth studying if you're thinking about NN in speech processing.
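The recall step of such a simulator (a plain feed-forward pass) can be sketched without any toolkit. The weights below are hand-set, illustrative values chosen so that a two-hidden-unit network computes XOR; they are not anything SNNS produces, and training would normally find such weights by back-propagation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer feed-forward pass ('recall' in SNNS terms)."""
    h = sigmoid(x @ W1 + b1)
    return sigmoid(h @ W2 + b2)

# Hand-set weights: hidden unit 0 approximates OR, hidden unit 1
# approximates AND, and the output computes OR AND NOT AND, i.e. XOR.
W1 = np.array([[10.0, 10.0], [10.0, 10.0]])
b1 = np.array([-5.0, -15.0])
W2 = np.array([[10.0], [-10.0]])
b2 = np.array([-5.0])
```

In hybrid HMM/NN recognizers the same forward pass, with far larger layers, maps a window of acoustic features to per-phone posterior probabilities.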


The NIST Speech Recognition Evaluation Software
http://www.nist.gov/speech/software.htm
Availability: free sources.
Requirements: Unix machine with C compiler.
Description: The NIST speech recognition scoring packages SCORE and SCTK were designed to evaluate results of recognition experiments run on standard test data sets from a few corpora used in the DARPA evaluations (Resource Management, Air Travel Information Systems, Wall Street Journal, Switchboard). They include programs for performance evaluation and for statistical significance tests of performance differences among speech recognition systems evaluated on a common test set.
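The core quantity any such scoring package computes is the word error rate, obtained from a Levenshtein alignment of the reference and hypothesis word strings. A minimal version (without the NIST tools' alignment reports and significance tests) looks like:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions)
    divided by the number of reference words, via edit distance
    over word sequences."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                       # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub,            # substitution / match
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(r)][len(h)] / len(r)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is one reason a proper scoring package also reports the three error types separately.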

9 Spoken Dialogue Modelling

Spoken dialogue systems can be viewed as an advanced application of spoken language technology. They provide an interface between the user and a computer-based application that permits spoken interaction with the application in a relatively natural manner. In so doing, they provide a stringent test for the major fields of spoken language technology, including speech recognition and speech synthesis, language processing, and dialogue management.

Spoken dialogue modelling is concerned with the dialogue management component of a spoken dialogue system. A module for spoken dialogue modelling should cover theoretical background from linguistics and artificial intelligence on the nature of dialogic interaction; present some case studies of representative systems; examine methodologies for the development and evaluation of spoken dialogue systems; and recommend tools and development environments.

The resources presented below provide good coverage of most sections of the module, ranging from books for the more theoretical sections to Web-based materials and software.
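Before turning to the individual resources, the core job of a dialogue manager can be made concrete with a minimal system-initiative, slot-filling sketch. The states, prompts, and travel domain below are invented for illustration; a real system would sit between a recogniser and a synthesiser rather than exchanging plain text.

```python
# Minimal finite-state dialogue manager: a state machine over prompts and slots.
PROMPTS = {
    "ask_origin": "Where are you travelling from?",
    "ask_destination": "Where are you travelling to?",
    "confirm": "So you want to travel from {origin} to {destination}?",
    "done": "Thank you, looking up connections.",
}

def dialogue(turns):
    """Run the dialogue over a list of scripted user turns; return system prompts."""
    state, slots, log = "ask_origin", {}, []
    for user in turns:
        log.append(PROMPTS[state].format(**slots))
        if state == "ask_origin":
            slots["origin"] = user
            state = "ask_destination"
        elif state == "ask_destination":
            slots["destination"] = user
            state = "confirm"
        elif state == "confirm":
            # Restart on disconfirmation; a real manager would repair instead.
            state = "done" if user == "yes" else "ask_origin"
    log.append(PROMPTS[state].format(**slots))
    return log

transcript = dialogue(["London", "Paris", "yes"])
```

Finite-state control of this kind underlies many of the deployed telephone services surveyed below; the theoretical material in sections 9.2 and 9.5 covers the richer plan- and agent-based alternatives.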

9.1 Introduction to spoken dialogue systems Introductory material on spoken dialogue systems can be found in the following texts:

Bernsen, N.O., Dybkjaer, H. and Dybkjaer, L. (1998). Designing Interactive Speech Systems: From First Ideas to User Testing. New York: Springer Verlag (chapters 1 and 2).

This book provides a detailed overview of spoken dialogue design methodologies, based on the results of the Danish Dialogue Project. The book covers topics such as: dialogue design guidelines, evaluation methodologies, and speech functionality analysis. The book is a useful resource for dialogue designers as it presents a detailed methodology for dialogue design and evaluation, with a number of worked examples.

Cole, R.A., Mariani, J., Uszkoreit, H., Zaenen, A. and Zue, V. (Eds.) (1997). Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. Online version: http://cslu.cse.ogi.edu/HLTsurvey

This book, which is available online and can also be downloaded as a postscript file, contains chapters and bibliographies on all aspects of human language technology. Chapter 6, Discourse and Dialogue, which is the most relevant to this module, has brief but informative sections on discourse modelling, dialogue modelling, and spoken language dialogue.

Fraser, N. (1997). Assessment of interactive systems. In D. Gibbon, R. Moore and R. Winski (Eds.) Handbook of Standards and Resources for Spoken Language Systems. New York: Mouton de Gruyter, 564-614.

The main goal of the Handbook is to collect and catalogue information on spoken language resources and standards, providing a reference work for speech technology development. This chapter covers the development and assessment of interactive systems, giving explicit recommendations on good practice, and complements the Bernsen et al. book listed above. The Handbook is available as a Library Handbook edition (including a hypertext version on CD-ROM), in four paperback parts, or with free Web access to the hypertext version upon user registration. Availability: http://www.degruyter.de/EAGLES/eaglefly.html

Giachin, E. and McGlashan, S. (1997). Spoken Language Dialogue Systems. In S. Young and G. Bloothooft (Eds.) Corpus-based methods in language and speech processing. Dordrecht: Kluwer Academic Publishers, 69-117.

This chapter covers the essential components of a spoken dialogue system and discusses in detail the approaches to dialogue modelling that informed the Esprit SUNDIAL project. The chapter is useful as a documentation of the SUNDIAL approach that is otherwise restricted to less accessible sources. This is a good overview for an advanced reader.

Smith, R.W. and Hipp, D.R. (1994). Spoken Natural Language Dialog Systems: A Practical Approach, New York: Oxford University Press (chapters 1 and 2).

This book describes the Circuit Fix-It Shop system, which assists users in fixing an electronic circuit using spoken language (see below). The first two chapters provide a good overview of the field of spoken dialogue systems.

9.2 Theories of dialogue modelling

Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press.


This is the standard text for linguistic theories of dialogue, including comprehensive discussions of speech acts, Gricean maxims and the co-operative principle, and conversation analysis. Although the book does not discuss spoken dialogue systems, some of the chapters describe the theoretical underpinnings, for example, the chapters on speech acts and conversation analysis.

McTear, M. (1987). The Articulate Computer. Oxford: Blackwell.

This book describes in a readable fashion the work on dialogue modelling that was conducted in the late 70s and early 80s, including script-based and plan-based approaches. While somewhat outdated in the light of more recent developments in spoken language systems, the book provides an insight into the more complex dialogue modelling that will be required by more advanced dialogue agents.

Allen, J. (1995). Natural Language Processing. Redwood, Ca.: Benjamin/Cummings Publishing Company, Inc.

Part III of this book presents an AI view of discourse and dialogue, with chapters on knowledge representation and reasoning; local discourse context and reference; using world knowledge; discourse structure; defining a conversational agent. This material forms the basis for most of the more theoretically motivated dialogue systems that have been developed in the past decade.

9.3 Case studies: overview and comparison of some representative systems

The following is a selection of links to representative spoken dialogue systems. Most of the sites contain comprehensive lists of publications as well as demos.

The TRAINS Project: Natural Spoken Dialogue and Interactive Planning http://www.cs.rochester.edu/research/trains/ Description: The TRAINS project at the University of Rochester Department of Computer Science is a long-term effort to develop an intelligent planning assistant that is conversationally proficient in natural language. The goal is a fully integrated system involving online spoken and typed natural language together with graphical displays and GUI-based interaction. The primary application has been a planning and scheduling domain involving a railroad freight system, where the human manager and the system must cooperate to develop and execute plans. The TRAINS site provides links to a number of online publications, access to the TRAINS Dialogue Corpus and the Dialogue Annotation Project, and links to promotional videos. The successor to TRAINS is TRIPS, a project involving more complex planning and multi-agent interaction (see http://www.cs.rochester.edu/research/trips/trains.html). Impression: This is an excellent site that provides access to a large number of relevant publications (some of which are online), as well as short introductory movies that require a RealAudio player.


The ARISE Project http://www2.echo.lu/langeng/projects/arise/summary.html Availability: free Requirements: WWW Browser and audio components Description: ARISE is a project funded by the EU Language Engineering programme. ARISE is developing an automatic train schedule enquiry service which will be accessed via an ordinary telephone and which will handle the bulk of routine enquiries automatically. These enquiries amount to more than 200 million calls annually to European railway centres, of which 20% currently go unanswered due to the cost of a manual service. This Web page provides a detailed description of the different elements of the project, including its objectives, the technology used, and the results. An example dialogue in French can be found at: http://www.limsi.fr/Recherche/TLP/demos.html

Circuit Fix-it Shop http://www.cs.duke.edu/~msf/voicelab/circuit/ Description: The Circuit Fix-It Shop system assists users in fixing an electronic circuit. The only mode of interaction between the user and the computer is spoken natural language dialogue. The site includes a brief demo and a reference to the book Smith, R.W. & Hipp, D.R. (1994) Spoken Natural Language Dialog Systems: A Practical Approach listed in section 9.1.

The MASK Project http://www.limsi.fr/Recherche/TLP/mask.html Availability: free Requirements: WWW Browser and audio components Description: The aim of the Multimodal-Multimedia Automated Service Kiosk (MASK) project is to pave the way for more advanced public service applications through user interfaces employing multimodal, multimedia input and output. The project has analyzed the technological requirements in the context of users and the tasks they perform in carrying out travel enquiries, and has developed a prototype information kiosk to be installed in the Gare St. Lazare in Paris. The Web site contains a number of pages describing the project, including several online publications (Postscript).


Speech Applications Project (SUN) http://www.sunlabs.com/research/speech/projects/SpeechActs/index.html Availability: free Requirements: WWW Browser and audio components Description: The Speech Applications Project at Sun Microsystems Laboratories performs research in speech technology, including creating tools for building speech applications, designing techniques for managing spoken language discourse, prototyping sample applications, and studying speech user interface issues. The site includes a number of online publications, several interesting demos in au, wav, RealAudio, and text formats, and useful guidelines on speech user interface design.

Spoken Language Systems - MIT Laboratory for Computer Science http://www.sls.lcs.mit.edu/sls/ Description: Since 1994, the SLS group has been developing a conversational platform called GALAXY. Users may engage in GALAXY-based conversations about weather forecasts (JUPITER), airline scheduling (PEGASUS), Cambridge city locations (VOYAGER), Boston area restaurants (DINEX), online automobile classified ads (WHEELS), and selected Web-based information (WebGALAXY). This site provides information about the projects at SLS and access to a number of online publications.

WAXHOLM http://www.speech.kth.se/waxholm/waxholm.html Description: The demonstrator application, WAXHOLM, gives information on boat traffic in the Stockholm archipelago. Besides the speech recognition and synthesis components, the system contains modules that handle graphic information such as pictures, maps, charts, and time-tables, which can be presented to the user on request. The site includes online publications and a video playback of a real interaction between the August system and a user (mpg format, 1.44M or 12M; both are Swedish only); more videos can be found on the August homepage: http://www.speech.kth.se/august/ . There are also a number of movies in MPEG, MOV and AVI formats, as well as links to other sites that make use of animated talking heads.

9.4 Methodologies for development and evaluation of spoken dialogue systems

Some of the most comprehensive work on methodologies has been carried out by members of the Danish Dialogue Project within EU funded projects such as DISC and REWARD.

The PARADISE tool for evaluation of spoken dialogue systems http://www.research.att.com/~diane/TOOT.html Description: This page describes a framework for evaluating spoken dialogue agents, and evaluations of cooperative responses, the use of tutorial dialogues, and adaptable dialogue behaviour in TOOT and other real-time dialogue systems. TOOT is a spoken dialogue agent that allows users to access train schedules stored on the web via a telephone conversation. TOOT has served as a testbed for evaluating spoken dialogue systems using the PARADISE evaluation framework. There is a recorded dialogue from a first experimental evaluation with TOOT, along with a transcript of the dialogue. A list of online publications about the PARADISE tool can be found at: http://www.research.att.com/~diane/evaluation-pubs.html

DISC Project - Spoken Language Dialogue Systems and Components: Best practice in development and evaluation http://www.elsnet.org/disc/

Description: DISC is an Esprit Long-Term Research project that is performing an in-depth examination of a broad selection of state-of-the-art spoken dialogue systems and their components in order to identify current development and evaluation practices and pinpoint their deficiencies. This site provides some introductory information about the project. DISC documents are currently available only to members of the DISC Advisory Panel.
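In the PARADISE framework mentioned above, dialogue performance is modelled as a weighted combination of a task-success measure and dialogue costs, each normalised to zero mean and unit variance across dialogues. The sketch below illustrates that combination; the weights and per-dialogue data are invented, and in PARADISE proper the weights are estimated by regression against user satisfaction rather than chosen by hand.

```python
import statistics

def zscore(values):
    """Normalise per-dialogue measurements to zero mean, unit variance."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def performance(success, costs, alpha, weights):
    """performance_d = alpha * N(success_d) - sum_i w_i * N(cost_i_d)"""
    ns = zscore(success)
    ncs = [zscore(c) for c in costs]
    return [
        alpha * ns[d] - sum(w * nc[d] for w, nc in zip(weights, ncs))
        for d in range(len(success))
    ]

# Three hypothetical dialogues: task success score, number of turns, seconds.
perf = performance(
    success=[0.9, 0.6, 0.8],
    costs=[[12, 20, 14], [90, 150, 100]],
    alpha=1.0,
    weights=[0.5, 0.5],
)
```

Here the first dialogue scores best (highest success, lowest costs) and the second worst, which is the ranking one would expect.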

REWARD Project - Real World Application of Robust Dialogues http://www.cpk.auc.dk/speech/reward.html Description: The REWARD project addresses the needs of organisations which do business over the telephone: to automate certain telephone services using spoken language dialogue technology and to automate the process of creating such services. Documents from the REWARD project are not publicly available. A brief description of the project can be found at the web site listed above. See also the conference paper: Brøndsted, T., Bai, B. and Olsen, J. "The REWARD Service Creation Environment, an overview" Proceedings of ICSLP98, Sydney, Australia, Dec. 1998, pp. 1175-1178.

Now you're talking (BT guide for the design of voice services) http://www.labs.bt.com/projects/voice/talk/index.htm Description: This guide is an introduction to designing speech-based dialogues for interactive telephone services, aiming at a friendly, logical and consistent dialogue style. The guide details factors for consideration before a voice service can be set up and outlines the underlying technology. It provides a set of useful and practical guidelines for developers, although no theoretical motivation for the guidelines is given.


9.5 Dialogue control and dialogue modelling

Dialogue management http://www.sics.se/~scott/papers/twlt96/twlt96.html Description: This online paper describes dialogue management techniques developed in a speech-only dialogue system and how they are being extended for a multimodal system which combines a direct manipulation interface with a spoken dialogue interface for a simple consumer information service. The paper is written by one of the authors of the Giachin and McGlashan chapter 'Spoken Language Dialogue Systems' described in section 9.1.

9.6 Toolkits and development environments

The CSLU Web site http://cslu.cse.ogi.edu/ Description: The CSLU Web site is designed to facilitate learning about, and obtaining, language resources and technologies. The site provides access to publications, demonstrations, courses and interactive tutorials resulting from work at CSLU. The CSLU Toolkit and speech corpora, which are free of charge for researchers and educators, can be obtained from this site. This is an excellent site that provides a vast range of materials and tools; the CSLU toolkit, together with the tutorials provided, can be used to teach a complete course on spoken dialogue systems.

IMM course in Spoken Dialogue Systems http://www.kom.auc.dk/~lbl/IMM/S9_98/SDS_course_overview.html Description: The goal of this Web-based course, which is a module in the IMM (Intelligent MultiMedia) Masters Degree at the University of Aalborg, Denmark, is to give ‘an understanding of the design and implementation of speech-based interfaces through hands-on experience in building a spoken dialogue system.’ Students taking the course are required to design, implement and test a spoken dialogue system for a telephone-based service. The dialogues, which are developed in a textual script language, are tested on the REWARD dialogue platform, which is connected to the telephone network. The course assumes that the students have taken prerequisite modules in ‘Spoken Language Processing’ and ‘Design of Multi Modal HCI Systems’. The literature supporting the course includes reports from the DISC and REWARD projects; this material, which includes details of the development platform to be used for implementation, is not publicly available. A useful site, although access to most of the course materials is restricted.

GULAN: A System for Teaching Spoken Dialogue Systems Technology http://www.speech.kth.se/~joakim_g/plan/gulan.html


Description: The aim of this work has been to put a fully functioning spoken dialogue system into the hands of the students as an instructional aid. They can test it themselves, examine the system in detail, and are shown how to extend and develop its functionality. In this way, the authors hope to increase the students' understanding of the problems and issues involved and to spur their interest in this technology and its possibilities. The TMH speech toolkit, including a broker system with distributed servers, has been used to create an integrated lab environment that can be used on Unix machines. The system has been used in the courses on spoken language technology given at Masters level at the Royal Institute of Technology (KTH), at Linköping University and at Uppsala University in Sweden. The system, which is in Swedish but due to be ported to English, is currently only runnable locally. The site includes various links to screenshots as well as online publications. There is also a link to information about the research project 'Swedish Dialogue Systems': http://www.ida.liu.se/labs/nlplab/sds/

SpeechMania™ http://www.speech.be.philips.com:100/ud/get/Pages/09A_T_Prod01.htm Description: SpeechMania™ is a natural language recognition and understanding engine developed at Philips Speech Processing to provide human-to-human-like automatic services over the telephone. The software allows people to talk with computers over the phone, so that information services or transactions such as railway and flight timetables, bank statements, stock exchange quotations or mail-order articles can be fully automated. This site provides information about SpeechMania™ and about applications that have been developed with it. SpeechMania™ is not in the public domain, but it may be possible to obtain an academic licence for research purposes by contacting Philips Speech Processing.

9.7 Additional Resources

List of Spoken Dialogue Systems in Operation http://www.elsnet.org/disc/tools/opSLDSs.html Description: This page provides a list of spoken dialogue systems from France, Germany, Switzerland, the UK and the USA. In many cases phone numbers are given so that the systems can be called and tested. These sources are useful for students who wish to test working spoken dialogue systems.

SIGdial - Special Interest Group on Dialogue http://www.iet.com/Projects/sigdial/index.html Description: SIGdial is a Special Interest Group of the Association for Computational Linguistics (ACL). SIGdial is a non-profit cooperative organization sponsored by an international community of Discourse and Dialogue researchers in government, industry, and education. SIGdial has a number of aims that are of interest to those studying spoken dialogue systems, including: the promotion, development and distribution of reusable discourse processing components; exploration of techniques for evaluation of dialogue systems; sharing resources and data among the international community; encouraging empirical methods in research; agreeing upon standards for discourse transcription, segmentation, and annotation; promoting collaboration among developers of various dialogue system components; and supporting student participation in the discourse and dialogue community.

Further information, including instructions on how to join SIGdial, can be found at the URL listed above.

9.8 Similar courses

CSE561 Dialogue Spring 1999 http://www.cse.ogi.edu/cse561/ Description: This is a course taught at CSLU by Phil Cohen and Peter Heeman. The site provides a detailed schedule for the course, which includes slides, lists of readings, and other materials. Contact: Peter Heeman email: [email protected]

10 Language Resources

Language resources (LRs) play an infrastructural role, supporting system development for speech recognition, speech synthesis, natural language processing, and dialogue modeling, with each area having its own peculiarities as to the necessary LRs. This might make questionable both the placement of this module in the green book after those treating all the areas above, and its lumping together of topics.

In any case, the modules are devices to support curriculum organization; although in a particular curriculum this one may not appear as a separate unit of teaching, it should draw attention to subjects relevant to other modules, in conjunction with which they have to be at least touched upon.

As theoretical support, the only comprehensive book so far is

Gibbon, D., Moore, R. and Winski, R. (Eds.) (1997). Handbook of Standards and Resources for Spoken Language Systems. Berlin: Mouton de Gruyter.

For dialogue systems, more information relevant to LRs is included in

Bernsen, N.O., Dybkjaer, H. and Dybkjaer, L. (1998). Designing Interactive Speech Systems: From First Ideas to User Testing. New York: Springer Verlag; chapter 5 is dedicated to Wizard of Oz simulation, and chapter 7 to corpus handling.

The Internet resources listed below do not include already available LRs, which are mentioned in conjunction with the modules they belong to, but rather tools and proposed or standard formats and annotation/description formalisms of interest for the design, collection, and annotation of new LRs.
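To make the notion of an annotation format concrete, the sketch below parses a simple time-aligned label file into segment tuples. The three-column format shown here is hypothetical, invented for illustration rather than taken from any of the standards discussed.

```python
def parse_labels(text):
    """Parse 'start end label' lines (times in seconds) into (start, end, label) tuples."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        start, end, label = line.split(None, 2)
        segments.append((float(start), float(end), label))
    return segments

example = """\
# utterance 001, hypothetical label format
0.00 0.32 sil
0.32 0.61 hh
0.61 0.95 eh
"""
segs = parse_labels(example)
```

Real annotation formalisms add speaker, channel, and hierarchy information on top of such time-aligned records, which is where the tools below come in.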

From the list it can be noticed that the design and collection stages are not covered at all by software tools, for various reasons: for example, the lack of theoretically founded methodologies for LR design most often results in rather loose specifications, not implementable as such by generic tools, and collection depends on the particular recording equipment used.

The OGI Speech Tools ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ogitools.v1.0.tar.Z Availability: free Requirements: Unix machine with C compiler and X-windows server. Description: The OGI Speech Tools, precursors of the CSLU Toolkit, include two programs, lyre and autolyre, very useful for visualizing speech signals, spectrograms, and labels, and for signal annotation. Autolyre in particular is highly recommendable as a production tool, due to its batch-processing support. There are also a number of utilities for speech signal manipulation (byte swapping, rate and format conversion, filtering) and analysis (DFT, PLP).
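The simplest of these manipulation utilities are easy to illustrate. The sketch below swaps the byte order of a buffer of 16-bit samples, the operation needed when moving raw audio between big-endian and little-endian machines; it uses only the Python standard library and is an illustration of the operation, not the OGI code.

```python
import struct

def byteswap16(raw):
    """Swap the byte order of a buffer of 16-bit samples
    (here: big-endian in, little-endian out)."""
    n = len(raw) // 2
    samples = struct.unpack(f">{n}h", raw)   # interpret as big-endian shorts
    return struct.pack(f"<{n}h", *samples)   # re-emit as little-endian shorts

big = struct.pack(">3h", 1000, -2000, 3000)  # three big-endian samples
little = byteswap16(big)
```

Rate and format conversion are analogous resampling and re-quantisation passes over the same sample buffers.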

The ISIP Switchboard Segmenter and Transcriber http://www.isip.msstate.edu/resources/software/swb_segmenter/ http://www.isip.msstate.edu/resources/software/transcriber/ Availability: free Requirements: Unix machine with X-windows server, GNU make and C++ compiler, Tcl/Tk 8.0 or newer; optionally the Network Audio Server (NAS). Description: The Institute for Signal and Information Processing, Mississippi State University, has published two related programs for speech signal segmentation, transcription, and annotation. They are oriented towards telephone recordings, either conversations as in the Switchboard corpus, for which a two-channel segmenter was specifically designed, or utterances of a single speaker (SpeechDat-like), for which a transcriber was adapted. Both support the manipulation of pronunciation dictionaries. Impression: Good production tools.


The ETCA Transcriber http://www.etca.fr/CTA/gip/Projets/Transcriber/ Availability: free sources and binaries (Windows NT, Linux, Solaris, IRIX). Requirements: Tcl/Tk 8.0 or newer, SNACK 1.5 (see above), the tcLex 1.1 Tcl/Tk lexical analyzer generator; optionally the NIST Sphere library ver. 2.6a. Description: Although oriented towards the annotation of long single-channel signals, like those collected from radio and TV broadcasts, in which multiple speakers and topics can appear, this Transcriber supports other types of recordings too. Spectrographic displays of the signal are not yet supported, but they could easily be incorporated, as SNACK is used as the display engine.

The Rochester Dialogue Segmentation and Annotation Tools ftp://ftp.cs.rochester.edu/pub/packages/dialog-tools/ ftp://ftp.cs.rochester.edu/pub/packages/dialog-annotation/ Availability: free Requirements: Tcl/Tk 8.0, C compiler, awk, perl 5.02 or newer, perlTk; the Entropic ESPS/waves package (for segmentation). Description: The segmentation tools are intended to cut dialogues into utterances, and are based on the Entropic ESPS/waves package; they consist of a few special C programs, plus shell scripts. The dialog annotation tool is written in Tcl/Tk, and supports a specific annotation formalism. Impression: Although not necessarily appropriate for every project, they can be useful at least as examples.

The Nb (Nota bene) System http://www.sls.lcs.mit.edu/flammia/Nb.html Availability: free Requirements: Tcl 7.4 and Tk 4.0 or newer. Description: Nb is a graphical user interface for annotating the discourse structure of spoken dialogue, monologue, and text. Different annotation instructions and different theories about discourse interpretation and generation can easily be incorporated in the annotation process without the need to change the graphical user interface. The instructions and the annotated text are displayed in a clear-cut way, and typing is reduced to a minimum.
