SPOKEN LANGUAGE SYSTEMS

LABORATORY for COMPUTER SCIENCE
Massachusetts Institute of Technology

JULY 1998

© 1998 Massachusetts Institute of Technology

For information or copies of this report, please contact:

Victoria L. Palay
MIT Laboratory for Computer Science
545 Technology Square, NE43-601
Cambridge, MA 02139 USA
[email protected]

Please visit the Spoken Language Systems Group on the World Wide Web at http://www.sls.lcs.mit.edu

SUMMARY OF RESEARCH

JULY 1998

SPOKEN LANGUAGE SYSTEMS

Table of Contents

1 Summary of Research
Research, Technical, Administrative and Support Staff ...... viii-xi
Research Assistants ...... xii-xv
Post-Doctoral Associates, Undergraduate Students, Transitions ...... xv-xvi
Research Sponsorship ...... xvii

2 Research Highlights
Research Highlights, 1997-1998, Victor Zue ...... 3

3 Research Projects
JUPITER Data Collection and Analysis, Joseph Polifroni, James Glass and Sally Lee ...... 9
Natural Language Processing in the JUPITER Domain, Stephanie Seneff and Joseph Polifroni ...... 12
Spontaneous in the JUPITER Domain, James Glass ...... 18
Confidence Scoring for Speech Understanding, Christine Pao, Philipp Schmid and James Glass ...... 22
PEGASUS: Flight Departure/Arrival/Gate Information System, Stephanie Seneff, Joseph Polifroni and Philipp Schmid ...... 25
Using Aggregation to Improve the Performance of Mixture Gaussian Acoustic Models, T.J. Hazen and Andrew Halberstadt ...... 27
BIANCA: A Dialogue Management Engine for PEGASUS, Philipp Schmid, Stephanie Seneff and Joseph Polifroni ...... 29
ANGIE-Based Pronunciation Server, Aarati Parmar and Stephanie Seneff ...... 32

4 Thesis Research
A Model for Segment-Based Speech Recognition, Jane Chang ...... 37
Hierarchical Duration Modelling for a Speech Recognition System, Grace Chung ...... 40
Discourse Segmentation of Spoken Dialogue: An Empirical Approach, Giovanni Flammia ...... 42
Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition, Andrew Halberstadt ...... 45
The Use of Speaker Correlation Information for Automatic Speech Recognition, T. J. Hazen ...... 47
The Mole: A Robust Framework for Accessing Information from the World Wide Web, Hyung-Jin Kim ...... 50
Sublexical Modelling for Word-Spotting and Speech Recognition Using ANGIE, Raymond Lau ...... 52
Probabilistic Segmentation for Segment-Based Speech Recognition, Steven Lee ...... 56
A Model for Interactive Computation: Applications to Speech Research, Michael McCandless ...... 57
Subword Approaches to Spoken Document Retrieval, Kenney Ng ...... 60
A Semi-Automatic System for the Syllabification and Stress Assignment of Large Lexicons, Aarati Parmar ...... 62
A Segment-Based Speaker Verification System Using SUMMIT, Sridevi Sarma ...... 64
Context Dependent Modelling in a Segment-Based Speech Recognition System, Benjamin Serridge ...... 66
Toward the Automatic Transcription of General Audio Data, Michelle Spina ...... 67
Porting the GALAXY System to Mandarin Chinese, Chao Wang ...... 70
Concatenative Speech Synthesis of Isolated Words Using Sub-Word Units, Jon Yi ...... 75

5 Theses, Publications, Presentations and Seminars
Ph.D. and Masters Theses ...... 79
Publications ...... 80
Presentations ...... 82
SLS Seminar Series ...... 83

Research Staff


VICTOR ZUE

Victor Zue has been associated with MIT since 1970, as a graduate student, teacher and researcher. He is now a Senior Research Scientist, Associate Director of the MIT Laboratory for Computer Science, and the head of the SLS group. His main research interest is in the development of conversational systems to facilitate graceful human/computer interactions. He has taught courses at MIT and abroad, written over 150 papers, and delivered numerous talks on this subject. He is a Fellow of the Acoustical Society of America, and currently chairs the Information Science and Technology (ISAT) Study Group for DARPA. In 1994, he was elected Distinguished Lecturer by the IEEE Signal Processing Society.

JAMES GLASS

James Glass is a Principal Research Scientist and Associate Head of the SLS group. He received his Ph.D. in Electrical Engineering and Computer Science from MIT in 1988. Over the past fifteen years, his research has covered many different areas of the speech communication chain, centered on computer speech recognition and spoken language understanding. In addition to publishing extensively in these areas, he has supervised S.M. and Ph.D. students, and co-taught courses in spectrogram reading and speech recognition. He is one of the original developers of the segment-based SUMMIT speech recognition system.

T.J. HAZEN

Timothy James (T. J.) Hazen arrived at MIT in 1987, where he received his S.B. degree in 1991, S.M. degree in 1993 and Ph.D. in 1998, all in Electrical Engineering. T.J. joined the SLS group as an undergraduate in 1991 and has been with the group ever since. He is currently working as a research scientist in the group. His primary research interests include acoustic modeling, speaker adaptation, automatic language identification, and phonological modeling.

LEE HETHERINGTON

Lee Hetherington received his S.B., S.M., and Ph.D. degrees from MIT's Department of Electrical Engineering and Computer Science. He completed his doctoral thesis, "A Characterization of the Problem of New, Out-of-Vocabulary Words in Continuous-Speech Recognition and Understanding," and joined the SLS group in October 1994. His research interests include many aspects of speech recognition, including search techniques, acoustic measurement discovery, and recently the use of weighted finite-state transduction for context-dependent phonetic models, phonological rules, lexicons, and language models in an integrated search.

RAYMOND LAU

Raymond Lau received the B.S. in Computer Science and Engineering, the M.S. degree in Electrical Engineering and Computer Science, and the Ph.D. degree in Computer Science, all from the Massachusetts Institute of Technology in 1993, 1994, and 1998, respectively. He was a National Science Foundation fellow and is a member of Eta Kappa Nu. His current research interests are in the area of speech recognition and spoken language systems, with a particular focus on subword modelling, search strategies and language modelling.

HELEN MENG

Helen Meng is a Research Scientist in the SLS group. She received her S.B., S.M. and Ph.D. degrees from MIT's Department of Electrical Engineering and Computer Science. Her doctoral thesis, entitled "Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation," was completed in February 1995, and her master's thesis, entitled "The Use of Distinctive Features for Automatic Speech Recognition," was completed in June 1991. Her research interests include front-end processing for speech recognition, lexical phonology, language modelling, discourse and dialog modelling, as well as multi-linguality for conversational systems.

STEPHANIE SENEFF

Stephanie Seneff has a B.S. degree in Biophysics and M.S., E.E., and Ph.D. degrees in Electrical Engineering and Computer Science from MIT. Her research interests span a wide spectrum of topics related to conversational systems, including phonological modelling, auditory modelling, computer speech recognition, statistical language modelling, natural language understanding and generation, discourse and dialogue modelling, and prosodic analysis. She has published numerous papers in these areas, and she is currently supervising several students at both master's and doctoral levels.

Technical Staff


ED HURLEY

Ed Hurley received his B.S. in Physics from MIT in 1985. After working in semiconductor fabrication and parallel processing, he joined the SLS group in October 1994 doing application programming and system administration. His interests are in using the Web as a mechanism for the delivery of spoken language systems, as well as for speech data collection. He is also actively involved in developing telephone-based spoken language systems.

CHRISTINE PAO

Christine Pao has been a member of the technical research staff since 1992. She is primarily involved in the development and maintenance of the GALAXY system. Her research interests are in discourse and dialog, systems integration with a focus on multilingual systems and language learning, and open microphone issues such as rejection and channel normalization. Christine has a bachelor's degree in Physics from MIT.

JOSEPH POLIFRONI

Joseph Polifroni's interests include language generation, human-computer interaction, and multilingual systems. He has worked on the back-end components of many of the SLS systems, including GALAXY and DINEX, in addition to his work on GENESIS, the natural language generation system that is part of the overall GALAXY architecture. He has also contributed to the Spanish and Mandarin Chinese systems. Before joining SLS, Joe worked in the Speech Group at Carnegie Mellon University and was also a consultant for Carnegie Group Inc. in Pittsburgh. In addition, Joe spent two years living in China, teaching English at Shandong University in Jinan.

Administrative & Support Staff


VICTORIA PALAY

Victoria Palay has been a member of the Spoken Language Systems group since 1988. As SLS program administrator, she manages personnel, fiscal, publication and contractual matters as well as space and other group resources. In addition, she supports Victor Zue's duties as LCS Associate Director by coordinating equipment donations made to the Laboratory. Victoria has a B.A. in Government and French Studies from Smith College.

SALLY LEE

Sally Lee joined the Spoken Language Systems group as senior secretary in 1993. She received a B.A. in Studio Art/Art History from Colby College in 1984. She also studied at the Art Institute of Boston and the New York Studio School. In addition to her secretarial duties, Sally has made many of the animated and still icons for SLS programs including GALAXY and JUPITER. She is also responsible for transcribing sentences that are recorded from people calling into the JUPITER system.

Graduate Students


JANE CHANG

Jane Chang is a doctoral student working on a framework for feature-based speech recognition that better models the inherent variability in human speech. Currently, she is exploring how to use phonological and pronunciation constraints in lexical access. In the past year, she has also worked on other aspects of segmentation, classification and recognition. Jane receives support from an AT&T Fellowship.

Advisors: Victor Zue and James Glass

GRACE CHUNG

Grace Chung graduated in Electrical Engineering and Mathematics from the University of New South Wales, Sydney, Australia. She earned a Fulbright scholarship to attend MIT and completed her master's degree in June 1997. Her interests are in acoustic modelling and prosodic modelling for speech recognition.

Advisor: Stephanie Seneff

GIOVANNI FLAMMIA

Giovanni Flammia completed an M.Eng. (Laurea) in Electrical Engineering at the University of Rome in 1988 and an M.S. in Computer Science at McGill University in 1991, funded by a Government of Canada Award. Before joining the SLS group, he did speech processing research at the Center for Personal Communication at the University of Aalborg, Denmark, and at CNET France Telecom labs in Lannion. His doctoral research focuses on developing dialogue models and user interfaces for spoken language systems that gather information from the Internet.

Advisor: Victor Zue

ANDREW HALBERSTADT

Andrew Halberstadt received the B.S. and M.S. degrees in Electrical Engineering from the University of Rochester in 1992 and 1993, respectively. In addition, he received the Bachelor of Music degree in 1991 from the Eastman School of Music in Rochester, NY. He was the recipient of a Sproul fellowship at the University of Rochester, and is a member of the engineering honor society Tau Beta Pi. His research interests include time-frequency representations, phonetic classification and recognition, speech and audio processing, and pattern recognition.

Advisor: James Glass


HYUNG-JIN KIM

Hyung-Jin Kim spent his undergraduate years at MIT and is currently pursuing a Master of Engineering degree through the SLS group. His research interests include Java, XML, and other web technologies. Currently, he is working on a system called the Mole, which is a framework for robustly accessing information on HTML pages.

Advisor: Lee Hetherington

STEVEN LEE

Steven Lee is pursuing his master's degree with SLS. He received his S.B. degree from MIT in 1997 and expects to receive his M.Eng. degree in June 1998. He is a member and former president of the Tau Beta Pi Engineering Honor Society, as well as a member and vice-president of the Eta Kappa Nu Honor Society. His current research interest is in probabilistic segmentation.

Advisor: James Glass

KAREN LIVESCU

Karen Livescu received her B.A. in Physics at Princeton University in 1996. She spent the following year at the Technion in Haifa, Israel, as a visiting student in the Electrical Engineering department. Karen started graduate study in the SLS group in September 1997. She is a National Science Foundation fellow and plans to pursue research in speech recognition.

Advisor: James Glass

MICHAEL MCCANDLESS

Michael McCandless is working towards a doctoral degree in the area of speech recognition. He is constructing a novel framework which will unify all stages in the speech recognition process. The framework is cast within an interactive tool, based on the Python language, which enables rapid prototyping of new recognition domains and exploration of design tradeoffs. His master's thesis was in the area of automatic learning of language structure for improving speech recognition. Michael is co-author of the IEEE Expert Internet Services Department and a member of the American Association for the Advancement of Science.

Advisor: James Glass


KENNEY NG

Kenney Ng's current research interest is in the area of information retrieval of spoken documents, which is the task of identifying those speech messages stored in a large collection that are relevant to a query provided by a user. Prior to his return to MIT in 1995, Kenney was a member of the Speech and Language Department at BBN Systems and Technologies, where he did research on large vocabulary recognition of conversational speech, word spotting, topic spotting, probabilistic segmental speech models, and noise compensation. He received his B.S. and M.S. degrees in EECS from MIT in 1990.

Advisor: Victor Zue

SRIDEVI SARMA

Sridevi Sarma received her bachelor's degree in Electrical Engineering from Cornell University in 1994. She completed her master's thesis, which investigates speaker verification using a segment-based approach, in June 1997. Sridevi is a National Science Foundation fellow.

Advisor: Victor Zue

MICHELLE SPINA

Michelle Spina received the B.S. in Electrical Engineering from the Rochester Institute of Technology in 1991, and the S.M. in Electrical Engineering from MIT in 1994. She is currently pursuing a Ph.D. degree in the SLS group. Michelle's research interests include automatic indexing of audio content, speech recognition and understanding, and biomedical issues of speech processing as they relate to automatic speech recognition. Her current research involves general sound understanding and orthographic analysis of general audio data. Michelle was a 1995 Intel Foundation Graduate Fellow, and is a member of Tau Beta Pi, Eta Kappa Nu, and Phi Kappa Phi.

Advisor: Victor Zue

CHAO WANG

Chao Wang received her bachelor's degree in Biomedical Engineering, with a minor in Computer Science, from Tsinghua University, Beijing, China in 1994. She started her graduate study at MIT in September 1995 and joined the SLS group in April 1996. Her master's research, completed in June 1997, involved porting the GALAXY system to Mandarin Chinese.

Advisor: Stephanie Seneff

JON YI

Jon Yi received the S.B. and the M.Eng. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 1997 and 1998, respectively. He also graduated in 1997 with a minor in Music. At SLS he has worked on developing a Mandarin Chinese concatenative speech synthesizer and a UNICODE/Java World Wide Web interface for the GALAXY system. His research interests include speech synthesis, communications systems, and multilingual speech understanding systems.

Advisor: James Glass

Post-Doctoral Associates

PHILIPP SCHMID

Philipp Schmid received his Ph.D. in Computer Science and Engineering from the Oregon Graduate Institute of Science and Technology in December 1996. In his Ph.D. thesis he investigated the use of "Explicit N-Best Formant Features for Segment-Based Speech Recognition." He joined SLS in January 1997 as a Postdoctoral Associate. He is interested in building conversational systems for real users. His main focus has been in the area of dialogue management, information retrieval from web-based knowledge sources, and flow of control issues. He has been working on PEGASUS, the flight arrival and departure information system, where he helped develop a new dialogue control mechanism.

NIKKO STRÖM

Nikko Ström received the Master of Science (Engineering Physics) degree in 1991, and the Ph.D. degree in Electrical Engineering (Department of Speech, Music, and Hearing) in 1997 at the Royal Institute of Technology (KTH), Stockholm, Sweden. He joined SLS in May 1998 as a Postdoctoral Associate. His main areas of interest are human/machine dialogue, lexical search in automatic speech recognition, and acoustic/phonetic modeling.

Undergraduate Students

Ivan Gonzalez-Gallo
Simo Kamppari
Fernando Perez
Rafael Schloming
Archit Shah
Aleem Siddiqui
Samuel Wong
James Wood
Minnan Xu

Transitions

Jane Chang, Ph.D., June 1998
Giovanni Flammia, Ph.D., June 1998
T.J. Hazen, Ph.D., January 1998/SLS Research Scientist
Jim Hugunin
Raymond Lau, Ph.D., March 1998/SLS Research Scientist
Steve Lee, M.Eng., June 1998
Michael McCandless, Ph.D., June 1998
Helen Meng, April 1998, joined the Chinese University of Hong Kong
Aarati Parmar, M.Eng., June 1997
Sridevi Sarma, S.M., June 1997
Benjamin Serridge, M.Eng., August 1997

Research Sponsorship

Defense Advanced Research Projects Agency¹
Bell Atlantic Corporation
BellSouth Intelliventures
National Science Foundation²

In addition, discretionary funds for research are provided by ATR Interpreting Telecommunications Research Laboratories and Intel Corporation.

1. Contract No. N66001-96-C-8526 from the Information Technology Office, monitored by the Naval Command, Control, and Ocean Surveillance Center, and contract no. DAAN02-98-K0003, monitored through the US Army Soldier Systems Command.

2. This material is based upon work supported by NSF grant no. IRI-9618731.
