March 10 - 12, 2008 San Diego, CA


San Diego Marriott Hotel & Marina
www.VoiceSearchConference.com

Primary Sponsors and Supporting Sponsors [sponsor logos]
Caller Experience Analytics: Another Innovative Solution From BBN Technologies

Welcome to the inaugural Voice Search Conference. Thank you for helping to move this fundamental development forward.

Voice Search leverages advances in speech technology to make applications and services easier to use. The basic Voice Search principle is to reduce complexity and the number of steps that a user must take to achieve an objective.

Our guiding principle in producing this conference is to reduce the number of steps you need to take to make the most of this development in advancing your business. Our objective has been to draw together experts and practical examples of what is being done and what can be done in applying Voice Search. We believe that Voice Search can simplify user interfaces and accelerate the adoption of many creative products and services. Our many sponsors agree and have made it easier for us to deliver a top-notch user experience for attendees. We hope you agree.

Enjoy the conference!

The program committee:
Bill Meisel, President, TMA Associates
K.W. “Bill” Scholz, President, AVIOS, and President, NewSpeech LLC
Tom Schalk, Vice President, Voice Technology, ATX Group

AVIOS Board of Directors

Sara Basson, Program Director - Human Ability, IBM - T.J. Watson Research Center
Neal Bernstein, Senior Director, Local & Mobile Search, Microsoft
Michael Cohen, Manager, Speech Technology Group, Google
Susan Hura, Principal, SpeechUsability
Alan Knipe, Founder, StarNet Systems
James Larson, Columnist, Information Today, Inc.
John Oberteuffer, Chairman Advisory Committee, Fonix
Ron Owens, Director Multimedia Applications PSO, Nortel
Bruce Pollock, Vice President, Professional Services, West Corporation
Patti Price, Principal, PPRICE Speech and Language
Alexander Rudnicky, Principal Systems Scientist, Carnegie Mellon University
Thomas Schalk, Vice President, Voice Technology, ATX Group, Inc.
K.W. “Bill” Scholz, President, AVIOS, and President, NewSpeech LLC
Marketta Silvera, Chief Executive Officer, Apptera
Kim Silverman, Manager, Spoken Language Technologies, Apple Computer
Nava Shaked, CRM & CC Practice Leader, Global Business Services, IBM Israel
Michael Wehrs, Vice President Evangelism & Industry Affairs, Nuance
Matt Yuschik, Human Factors Specialist, Convergys Corporation

Conference organizers

The new Voice Search Conference is organized by the Applied Voice Input Output Society (AVIOS), the non-profit educational organization; and Bill Meisel, president, TMA Associates, and Editor, Speech Strategy News. The organizers’ deep and long experience in the practical applications and business of speech technology, as well as in delivering successful conferences, is reflected in the program.

AVIOS is a not-for-profit educational organization founded in 1981. For many years, the AVIOS annual conference was the only forum dedicated to practical applications of advanced speech technology. Most recently, AVIOS helped organize content for other conferences, and now has launched the Voice Search Conference. As a bonus, attendees of the conference receive AVIOS membership and benefits. See www.avios.org for more information.

Bill Meisel’s TMA Associates publishes Speech Strategy News, a paid-subscription, no-ads business newsletter written by Meisel. Bill also provides consulting services and other resources for businesses impacted by voice search and the maturing of speech recognition, text-to-speech, speaker verification, and other advanced speech technologies. See www.tmaa.com for more information.

Conference at-a-glance

For program updates & further details, see www.voicesearchconference.com

Monday, March 10

7:30 am  Continental Breakfast (Marina Ballroom Foyer)
8:15  K1: Keynote panel: Welcome and introduction to the AVT Seminar (Marina G)
8:20 - 9:20  K2: Keynote panel: What is different about Voice Search technology (and what isn’t) (Marina G)
Tracks: AVIOS Applied Voice Technology Seminar; Demonstration Derby
Rooms: Marina G; Marina F
9:30 - 10:30
    A102: Delivering high-volume Voice Search applications
    B102: Directory assistance
10:30  Break
11:00 - 12:30
    A103: Dealing with unstructured searches
    B103: Contact center automation & analytics
12:30 pm  Lunch
1:30 - 2:30
    A104: The Voice User Interface in Voice Search
    B104: Local & general mobile search
2:45 - 3:45
    A105: Standards & multimodality
    B105: Innovative speech technology
3:45  Break sponsored by LNTS
4:00 - 5:15
    A106: Special topics in Voice Search
    B106: Supporting speech technology
5:15  Wine & cheese reception and AVIOS Student Application Contest Awards Ceremony

Tuesday, March 11

7:30 am  Continental Breakfast (Marina Ballroom Foyer)
8:00  K3: Keynote panel: Conference introduction: Why we’re here (Marina G)
8:15 - 9:30  K4: Keynote panel: How big is the Voice Search opportunity & how high the hurdles? (Marina G)
Tracks: Applications, strategy, business models, & marketing; Call Center automation & Unified Communications; Design strategies, tools, & delivery platforms
Rooms: Coronado; Marina G; Marina F
9:45 - 11:00
    A201: Business models in Voice Search & mobile ads
    B201: Multimodal user interfaces
    C201: The impact of Voice Search on call centers
11:00  Break sponsored by TalkHouse
11:15 - 12:15
    A202: Applications of Voice Search I
    B202: Delivery platform options
    C202: Customer-friendly call center applications
12:30 pm  Lunch sponsored by LumenVox (Coronado Terrace)
1:30 - 2:30
    A203: Searching audio/video sources on the Web
    B203: Dialog strategies for Voice Search
    C203: Infrastructure for contact centers adapting to Voice Search
2:45 - 3:45
    A204: Directory Assistance & Local Search
    B204: The role of standards
    C204: Supporting mobile devices in call center applications
3:45  Break sponsored by CallMiner
4:00 - 5:15
    A205: Delivering relevant ads
    B205: Voice hosting: Outsourcing the voice infrastructure or application
    C205: Speech analytics for business intelligence
7:00  Casino Night and Dinner sponsored by Call Genie, IBM, Nuance, vlingo, and VoiceBox Technologies

Wednesday, March 12

7:30 am  Continental Breakfast sponsored by Voice Compass (Marina Ballroom Foyer)
8:00 - 9:30  K5: Keynote panel: How will contact centers evolve in the Voice Search era? (Marina G)
Tracks: Applications, strategy, business models, & marketing; Call Center automation & Unified Communications; Dimensions of voice search
Rooms: Coronado; Marina G; Marina F
9:45 - 11:00
    A301: Speech & mobility
    B301: Converting voicemail to text
    C301: Improving the user experience
11:00  Break
11:15 - 12:30
    A302: Applications of Voice Search II
    B302: International & multilingual services
    C302: Agents & automation
12:30 pm  Lunch
1:45 - 3:00
    A303: Agent support of automation in services
    B303: “Personal assistant” & avatar services
    C303: Managing communications
3:00  Break
3:15 - 4:30  K6: Closing debate: Lessons we should take from the conference (Panel) (Marina G)

Principal Sponsors

www.callgenie.com
Call Genie, Inc. is a leading provider of enhanced Voice-enabled Mobile Local (“VoMoLo”) search products and services to Wireless Carriers, Directory Assistance providers, and Yellow Pages publishers. Offered as a turnkey or ASP solution, Call Genie’s Enhanced Voice Directory (EVD™) platform enables companies to offer a comprehensive, voice-enabled business category search service to consumers and business customers. EVD™ is network, handset and location independent, and can be incorporated into any existing DA service or deployed as a stand-alone offering. Call Genie won the 2006 Yellow Page Association Industry Excellence Award for Marketing Innovation in North America, the 2006 Whitaker Innovation Award in Europe, and the 2006 118 Tracker Award for Technology Innovation in the UK.

www.ibm.com
As a pioneer in speech recognition technology, IBM is focused on taking this technology to the various touchpoints of an increasingly mobile world. You will find IBM speech software enhancing the driver’s experience, including features like the control of the car radio and the navigation system. IBM software, hardware, and services are transforming healthcare delivery, helping children learn to read, and providing improved customer service and new business insight through self service and deep analytics. IBM is combining speech and translation technologies [...] and collaboration for the globally-integrated enterprise. IBM shares its speech software with clients and partners, providing intellectual property, engineering expertise and design services to improve their products and solutions.

www.nuance.com
Nuance Communications, Inc. (NASDAQ: NUAN) is a leading provider of speech and imaging solutions for businesses and consumers around the world. Its technologies, applications, and services make the user experience more compelling by transforming the way people interact with information and how they create, share, and use documents. Every day, millions of users and thousands of businesses experience Nuance’s proven applications. For more information, please visit www.nuance.com.

www.vlingo.com
Vlingo Corporation delivers a voice-powered interface for mobile phones. Leveraging a new technology called adaptive Hierarchical Language Models (HLMs), vlingo’s approach allows carriers and mobile application providers to quickly and inexpensively voice-enable any application, without custom engineering or in-house speech expertise. Unlike conventional voice recognition technologies that [...]