Wireless Data: Speaking Up

February 1, 2001
Technology Research
Robertson Stephens, Inc.

Speaking of the Next Killer Application, Voice Takes Center Stage

Marianne R. Wolk 212.407.0427
Candace K. Bryan 212.610.6109

Table of Contents

SECTION I. INVESTMENT SUMMARY
  Overview
  Investment Thesis and Recommendations

SECTION II. THE VOICE MARKET
  Voice Solutions Market Likely to Reach at Least $5 Billion by 2005
  Diverse Markets for Voice Solutions
    Enterprise
    Telecom
    Internet
  Expanded Distribution Strategies Track the Evolution of Speech Applications and Markets
  Voice XML, an Emerging Standard for Application Development
  Onset of Standards Could Alter the Value Chain of the Voice Market

SECTION III. VOICE TECHNOLOGY
  Voice-Enabling Technologies, Speech Recognition at the Core

SECTION IV. VOICE VENDORS
  Nuance and SpeechWorks Enjoy Accelerating Growth and Significant Barriers to Entry
  Public Company Profiles
  Private Company Profiles

SECTION I. INVESTMENT SUMMARY

Overview

This report marks the fourth in our ongoing series exploring solutions and technologies that we feel are critical to the success of the burgeoning wireless Internet market. Our focus at present is the market for voice software and services (“voice market”), which includes speech recognition engines (speech-to-text), speech synthesis (text-to-speech), speech interfaces (voice browsers), speaker verification, related applications and the slew of new voice portals[1] and Application Service Providers (ASPs) hosting voice content and applications. Our thesis, explained in more detail in the following section, is that voice is the killer application for the wireless Internet. By providing the most user-friendly data interface possible to telephony users, voice technology expands the potential Internet market by 3–4 times and acts as a major catalyst for wireless Internet information access and mobile commerce. We have divided this report into three major sections: the voice market, voice technology and voice vendors.
Our discussion of the voice market addresses the following:
• our forecast for growth
• enterprise, telco and Internet market opportunities and business models
• the mutual benefits of voice and wireless data
• the transition in distribution strategies as the industry moves from equipment-centric to application-centric sales
• the effect of open standards, such as VoiceXML, on the value chain in the voice market and the relative positioning of leading vendors in an open marketplace

Our review of voice technology includes:
• detailed descriptions of the components of voice market technology
• investment in natural language understanding, interruption capabilities and interface technologies

Our final section identifies public and private vendors that we believe are positioned for success in the voice technology market.

Other reports in our series include:
Wireless Data: The Next Internet Frontier (dated January 25, 2000)
Wireless Data: The New Economics (dated June 5, 2000)
Wireless Data: In Sync (dated December 28, 2000)

[1] Please refer to Robertson Stephens retail research by Lauren Levitan for more information regarding branded voice portals and voice commerce.

The Human Side of Computing: Voice Is at the Cusp of a Major Investment Cycle, Widening the Promise of the Wireless Internet

In our opinion, voice is the next killer application, poised at the start of a major new technology investment cycle. Though speech technology is not new, we believe vast improvements in the accuracy and performance of speech recognition engines, coupled with the skyrocketing growth of Internet and wireless communications, are setting the stage for widespread adoption. In our view, voice is the most natural, user-friendly interface for data and is likely to act as a major catalyst for telephony-based Internet services.
In particular, we believe voice is the killer application for the wireless Internet, which is currently handicapped by small screens and inadequate data input devices. Anyone who has pressed 999-2-44-666-666 into a touch-tone dialing system to spell “Yahoo,” or clumsily used their thumbs to type a simple sentence on a two-inch keyboard, should appreciate the vast improvement offered by voice. With major improvements in voice technology, including broader support for complex vocabularies, languages, accents, noise and natural language (uh-huh), and performance above the 90% mark, voice solutions are no longer just a dream. Buoyed by a potential base of one billion wireless users seeking Internet access by 2002, approximately 4x today’s keyboard-centric PC Internet users, the voice market is at the cusp of a major investment cycle, which we already measure at approximately $1 billion.

Accuracy and usability are improving speech’s usefulness in mainstream settings. Speech-to-text technology has been available since the early 1980s, but its popularity has taken a major step forward over the past few years as its efficacy has risen. At present, companies like SpeechWorks, Philips Speech Processing and Nuance Communications offer speech recognition (speech-to-text) technology featuring dramatic improvements in accuracy. According to Forrester Research, the combination of increasingly sophisticated algorithms, richer vocabularies and more cost-effective processor power has produced 90%-plus accuracy across multiple tonalities, languages, accents and speaking rates (although it still falls short of the 99.9999% accuracy of live telephone access). Newer speech recognition systems also feature interruption (barge-in), natural language (open-ended conversation) and noise reduction and filtering mechanisms, some of the “missing links” whose absence drove accuracy below 50% in earlier versions (at least one miss per sentence uttered).
In addition, new compressor/decompressor (codec) software has reduced bandwidth requirements. As a result, these solutions are now far more effective for mainstream use in a host of settings: indoor, outdoor, wired and wireless. This vast improvement in accuracy is evidenced by customer purchasing criteria, which shortlist the aforementioned vendors based on accuracy and then quickly turn to factors ranging from computing requirements and vendor to applications, service and time to market.

From dial-tone to voice-tone, voice is revolutionizing the way telephone networks are used. With advanced speech recognition, voice-directed dialing is becoming a reality. Enterprises and carriers alike have already deployed simple applications for tasks such as voice-directed call routing. These applications often replace legacy touch-tone systems that required the calling party to “spell” the name of the person they sought (e.g., press 5,6,3,7,6,4,8,4 for Joe Smith). Newer applications are far more responsive: they recognize spoken phrases such as “Call Joe,” probe further to see whether the user wants to “Call Joe Smith or Joe Green,” and may even ask whether to call him “at home, mobile or office” before routing the call to the selected number. Looking into our crystal ball, we believe the real nirvana for the voice market is “voice tone,” where, much like in the days of old, a user attempting to make a call hears a voice saying “how can I help you” instead of a mechanical dial tone, only this time the voice will be that of a 24/7 advanced speech recognition system instead of a busy live operator.

Speaking to customer service takes on a whole new meaning. The emerging voice market should see a sizable near-term growth opportunity in voice-enabled customer service (a call center upgrade) and eCRM applications. Much of the current market for speech recognition systems has
been driven by enterprises and communication service providers as they seek to automate and replace more expensive, people-based call centers and tedious touch-tone systems. Call center costs can be reduced by as much as 90% relative to a live customer service representative (CSR), assuming a call runs $1.00–1.50 when handled by a CSR and only $0.09 when handled by a speech recognition port. In addition to reducing expenses, voice-based systems improve customer acquisition, retention and loyalty by shortening hold times and speeding destination access. SpeechTechnology indicates surveys find more than 80% of consumers using speech systems are satisfied and
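
As a rough check on the call center savings cited above, the sketch below compares the quoted per-call costs. It assumes that the $0.09 speech recognition port figure is a per-call cost directly comparable to the $1.00–1.50 CSR figure; the function name `savings_fraction` and the constants are illustrative and do not come from the report.

```python
# Back-of-envelope check of the call center savings quoted in the text.
# Assumption: the $0.09 speech recognition port figure is a per-call cost,
# directly comparable to the $1.00-1.50 per-call cost of a live CSR.


def savings_fraction(live_cost_per_call: float, automated_cost_per_call: float) -> float:
    """Return the fraction of the live-agent cost saved by automating a call."""
    if live_cost_per_call <= 0:
        raise ValueError("live_cost_per_call must be positive")
    return 1.0 - automated_cost_per_call / live_cost_per_call


SPEECH_PORT_COST = 0.09        # dollars per call, as quoted in the text
LIVE_CSR_COSTS = (1.00, 1.50)  # dollars per call, low and high ends of the quoted range

for live_cost in LIVE_CSR_COSTS:
    saved = savings_fraction(live_cost, SPEECH_PORT_COST)
    print(f"live ${live_cost:.2f} vs. speech ${SPEECH_PORT_COST:.2f}: {saved:.0%} saved")
```

Under these assumptions the saving works out to roughly 91% to 94% per automated call, broadly in line with the 90% figure above. Calls that still fall back to a live representative would presumably lower the blended saving.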
