<<

Mobile Speech: Unlocking Personal Apps, Features and Functions

The forces of human nature, technological progress and regulatory stricture are converging to boost interest in “truly hands-free” mobile applications. Dozens of firms have responded with a broad variety of , services and features that work remarkably well. Next step: to build a sustainable market model.

September 2009

Dan Miller Sr. Analyst

Opus Research, Inc.
300 Brannan St., Suite 305
San Francisco, CA 94107

For sales inquiries please e-mail [email protected] or call +1(415)904-7666

This report shall be used solely for internal information purposes. Reproduction of this report without prior written permission is forbidden. Access to this report is limited to the license terms agreed to originally and any changes must be agreed upon in writing. The information contained herein has been obtained from sources believed to be reliable. However, Opus Research, Inc. accepts no responsibility whatsoever for the content or legality of the report. Opus Research, Inc. disclaims all warranties as to the accuracy, completeness or adequacy of such information. Further, Opus Research, Inc. shall have no liability for errors, omissions or inadequacies in the information contained herein or interpretations thereof. The opinions expressed herein may not necessarily coincide with the opinions and viewpoints of Opus Research, Inc. and are subject to change without notice.

Published September 2009 © Opus Research, Inc. All rights reserved.

Key Findings:

• Voice control of mobile services has made great strides – A common theme in our research this year is that “the technology works!” and makes for a much-improved user experience.
• More companies have joined the ecosystem – The lure of mobile speech has attracted dozens of technology providers, application developers and solution providers (many of whom are profiled in this report).
• Ecosystem members learning to play together – Pursuit of the mobile masses has attracted providers of core technologies, builders of applications, retailers and “arms merchants.”
• Revenue models are all over the place – As with mobile applications in general, the top-line revenues come from a number of sources, including technology licenses, monthly subscriptions, advertising and percentage of transactions.
• So is application quality – Better building blocks (in the form of recognition software, SDKs, APIs into resources) have attracted more developers, but not all create truly useful services.
• Focus has shifted to results – Whether it’s updating Facebook status, finding a song or artist or originating a text message, users of mobile speech are task-oriented.
• Success is measured by task completion – Rather than engine accuracy or automation rates, users want voice to help them accomplish specific objectives.
• Adoption is a lagging indicator – With a broad spectrum of applications and services available, awareness is growing, but successful, repeated use is not yet a reality.
• The “Long [Coat] Tail” of the iPhone – Making “Voice Control” a standard feature of the new iPhone 3G S is a legitimization moment for application control, mobile search and dictation.
• “Truly hands-free” apps are the next frontier – Most speech-enabled apps require a “button” to initiate recognition, but the next frontier is achieving hands-free/eyes-elsewhere engaged usability.

© 2009 Opus Research, Inc.


Table of Contents

Key Findings
Speech: A Key To Unlocking Mobile’s Potential
Mobilizing the Internet
Revenue Models for Mobile Speech
The Skype Syndrome
The iPhone Effect
How Speech Has Evolved
“Hybrid” Applications Will Be the Norm
Compelling Applications Drive Adoption
Mobile Devices Vary in Size, Expand in Variety and Reach
Apple’s iPhone is the Game Changer
The Larger Opportunity is in “Feature Phones”
Safety is a Driving Force
Going “Truly Hands-Free”
Time to Leverage Mobile Speech
Appendix: Profiles of Vendors
The Three Voices of Google
Microsoft Has its Own Identity Issues
Nuance Mobile: A Multimodal Approach
The Contenders
Creaceed
Ditech Networks
Fonix
Handheld Speech
Jott
Loquendo
Melodis
Metaphor
Novauris
One Voice Technologies
PhoneTag (formerly SimulScribe)
Promptu
QTech
Sakhr Software (Dial Directions)
Sensory
Siri, Inc.
SpeechCloud
SpinVox
Vlingo
VoiceActivation
VoiceBox Technologies
Yap
Ydilo AVS


Table of Figures

Figure 1: Mobile Internet Leading Indicators
Figure 2: The Mobile Speech Opportunity (in thousands)
Figure 3: What is "Hybrid" Speech?
Figure 4: Voice Activated Apps for the iPhone
Figure 5: Assessed Risk of Driving While Using Cell Phone


Speech: A Key To Unlocking Mobile’s Potential

On the desktop, the Internet offers entertainment, search, messaging, Web browsing and social networking, among other activities. These time-wasting life enhancers make for a compelling online experience. Their addictive qualities drive efforts to make the Internet mobile. “Push email” precipitated the growth of RIM’s BlackBerry. Flat-screen, multi-touch interfaces fueled access to multiple apps on the iPhone.

Our premise in this report is that a well-devised and designed voice user interface (VUI) is the next step in accelerating the mobilization of Internet-based applications. Using one’s voice is the most natural mode for communicating over the phone. As avid texters know, though, it will never be the only modality. That is why today’s developers and technology providers are getting better at creating a user experience that includes voice (when appropriate) and tailors it to a broad spectrum of mobile devices, including phones, laptops, netbooks, personal navigation devices and media players.

There is strong evidence that, over the past five months, mobile speech – or the mobile VUI – has improved significantly on several fronts. Core recognition capabilities have improved thanks to the use of constrained grammars, and the use of “distributed speech processing” has allowed server-side automated speech recognition (ASR) and application design that is more task-oriented. What’s more, a generation of pundits that had been characteristically skeptical about the prospects of talking to computers is beginning to see value in this form of two-way communication. Paul Saffo, long-time technology forecaster at the Institute for the Future (IFTF), for instance, was recently quoted saying, “We're right on the edge of a new era of conversational computing, where in certain circumstances your primary mode of interaction with a machine will be talking to it and having it talk back.”

As long-time developers know, automated speech recognition is but a small part of a conversational application. It is a cocktail that includes equal parts ASR, AI (as in “artificial intelligence”) and transactional capabilities. In the mobile realm, it is best taken in the context of Web services, acoustic modeling, location awareness, multimodal interactions and, of course, a flat-rate data plan.

Mobilizing the Internet

In 2009, more than 3 billion new mobile phones will be sold worldwide. Roughly 15% are “smartphones” with browsers to support Web browsing, an operating system to support multiple applications and owners who, by and large, subscribe to a data plan. For the 85% of mobile subscribers who do not have smartphones, a higher-quality VUI will provide the easiest way to find, access and control the features and functions that their carriers offer.


The result should be more frequent purchases, a higher “take rate” on premium offers and the higher retention rates that are associated with happy subscribers.

Figure 1: Mobile Internet Leading Indicators

Factors                                      Metric
2009 Wireless Phone Sales (projected)        3 billion
% Smartphones                                15%
Number of Apps shipped/month                 200-250 million
New Apps downloaded by iPhone users/month    1-6

Source: Opus Research estimates (2009)

Owners of the broad spectrum of mobile devices, if they are paying attention, are deluged by offers from sources of new , content, applications, utilities and even videos offered through a confusing array of App Stores, “carrier decks,” websites and “over-the-air” updates.

Revenue Models for Mobile Speech

The community of mobile speech-based solutions providers is part of a business ecosystem that, as of this writing, is both under-developed and under development. Correspondingly, the revenue models for mobile speech resources and services are not well defined and are under constant change. The full revenue potential of mobile speech will be a mixture of:

• Software Licenses - For speech processing resources and related applications that ship with new handsets, plus licensing revenue for software that is downloaded from application stores.
• Fees – Charged on a per call, per use or “percentage of transaction” basis for automated speech-initiated activity on mobile devices (enhanced DA is a primary example of this pricing methodology).
• Subscriptions - Monthly fees charged for such services as “voicemail-to-text” or “personal virtual assistant” services.

None of these methods is mutually exclusive. Indeed a voicemail-to-text service provider, PhoneTag, offers an “unlimited” monthly subscription for its service ($39.95) as well as a per message charge of $3.99. In between, it offers a bundle of 40 messages each month for $9.99. Our model takes multiple pricing tiers into account. But the model can be refined over time as we learn more about customer preferences and have more empirical data on actual purchase practices.

Empirical observation leads us to believe that actual adoption, use and purchase activity will resemble what we call Skype’s inverted pyramid, whereby the marketing efforts of a service provider can attract hundreds of millions of registered accounts; in Skype’s case, that number is approaching half a billion.
Over time, that number turns out to be ten times the number of “active users” (on a monthly basis, the number of Skype users reaches about 40 million).
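The PhoneTag price points quoted above lend themselves to a quick break-even comparison. The following is a minimal sketch, not part of the Opus Research model: the tier names are ours, and the rule that messages beyond the 40-message bundle are billed at the per-message rate is an assumption, since the text does not state how overage is charged.

```python
# Price points quoted in the text for PhoneTag (2009).
PER_MESSAGE = 3.99    # pay-as-you-go charge per message
BUNDLE_PRICE = 9.99   # monthly bundle price
BUNDLE_SIZE = 40      # messages included in the bundle
UNLIMITED = 39.95     # "unlimited" monthly subscription


def monthly_cost(messages: int) -> dict:
    """Return the monthly cost of each tier for a given message count.

    ASSUMPTION: messages beyond the bundle are billed at the
    per-message rate. The source does not specify the overage rule.
    """
    overage = max(0, messages - BUNDLE_SIZE)
    return {
        "per_message": round(messages * PER_MESSAGE, 2),
        "bundle": round(BUNDLE_PRICE + overage * PER_MESSAGE, 2),
        "unlimited": UNLIMITED,
    }


def cheapest_tier(messages: int) -> str:
    """Name the tier with the lowest monthly cost for this volume."""
    costs = monthly_cost(messages)
    return min(costs, key=costs.get)
```

Under that overage assumption, the bundle becomes the cheapest option from the third message of the month, and the unlimited plan only pays off for subscribers who receive more than 47 messages a month.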

Figure 2: The Mobile Speech Opportunity (in thousands)

Source: Opus Research (2009)

Figure 2 above illustrates the “most-likely” output of a revenue forecasting model Opus Research built for North America only. The model takes into account the licenses paid to embedded voice recognition software providers who receive a modest license fee for each handset that is shipped with pre-installed software. It also includes the licensing fees charged for software downloads from carriers’ or device makers’ application stores. One of the “learnings” reflected in this category is the predominance of “free” downloads, usually of voice dialing utilities, which are positioned to demonstrate the capabilities of embedded speech in hopes of building sales of “premium” services.

The “Subscriptions” category reflects an estimate of the monthly or annual fees collected by mobile speech-based solution providers. This list includes the high-profile providers of voicemail-to-text conversion services, like SpinVox, PhoneTag and Nuance. But we expect the ranks of subscription-based solution providers to be the fastest growing category of mobile speech services, and a number of those that provide their services on a “free” basis will fairly quickly introduce “premium” flavors of their services that attach fees to the most popular features.

The “Activity-based” category is a true wildcard in terms of forecasting. We have seen some very impressive new mobile speech services, especially in the area of “mobile search,” that plan to generate revenues by taking commissions or a percentage of the value of transactions that are the culmination of a mobile speech-initiated session. Voice controlled search of media libraries is already emerging as an efficient way to buy software. There are similar expectations for speech-initiated search for general merchandise. To generate our forecast, we made assumptions about both the number and value of speech-initiated transactions. As purchases skew from music and media to goods and services, the frequency and average selling price both increase.

The Skype Syndrome

From a revenue generation perspective, Skype’s experience illustrates a pattern that is all too common in the era of “free” services. Nearly 90% of the minutes generated by Skype subscribers carry no charges. Only 1 in 7 to 8 minutes on the Skype network generates any revenue whatsoever. To its credit, the company is expected to generate revenues on the order of $600 million in 2009.

The chain of logic between Skype’s experience and the business planning process for mobile speech solutions providers is a bit tenuous. The point is that mobile speech service providers are well-counseled either to introduce services with demonstrated value that can justify a monthly subscription, or to recognize that they must recruit a large number of prospective users into their fold so that they can operate a profitable, sustainable business with one-tenth of the minutes generated by the one-tenth of the user base that can be considered “active.”
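The “one-tenth of one-tenth” planning rule can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the Opus Research forecasting model. The two one-tenth ratios come from the text; the function name, the minutes per active user and the price per paid minute are hypothetical inputs.

```python
def monthly_revenue(registered_users: float,
                    minutes_per_active_user: float,
                    price_per_paid_minute: float,
                    active_share: float = 0.10,
                    paid_minute_share: float = 0.10) -> float:
    """Estimate monthly revenue from a mostly free service.

    active_share:      fraction of registered users who are active
                       (one-tenth, per the text).
    paid_minute_share: fraction of active users' minutes that carry
                       a charge (one-tenth, per the text).
    All other inputs are hypothetical planning assumptions.
    """
    active_users = registered_users * active_share
    paid_minutes = active_users * minutes_per_active_user * paid_minute_share
    return paid_minutes * price_per_paid_minute
```

With one million registered users, 100 minutes per active user per month and 2 cents per paid minute (all hypothetical figures), only one million of the ten million minutes generated are billable, which yields roughly $20,000 a month. The same result scales linearly with the registered base, which is why the text stresses recruiting a very large pool of prospective users.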

The mobile speech community includes providers of automated speech processing software that can reside on mobile handsets and devices or, more likely, on larger application servers that receive digitized voice delivered over a mobile carrier’s data channel.

The iPhone Effect

The Apple iPhone, alone, has encouraged its approximately 30 million global owners (all of whom have data plans) to download over 6 billion mobile applications. The iPhone effect has spawned “App Stores” and raised expectations for all mobile phones to deliver more content and greater utility. As content/application providers, device makers and carriers scramble to keep up with the iPhone, the result has been a high level of confusion and a hodge-podge of ways for mobile subscribers to gain access to the features, functions and services offered through their handsets and mobile devices.

As confusion reigns, the so-called voice user interface (VUI) is emerging as a great equalizer by enabling mobile subscribers to use spoken commands and input to control their devices, enter text (into search boxes or messaging applications) or find information, content or applications running on the device or out in the network cloud.


How Speech Has Evolved

For nearly 20 years, the speech processing community has advanced a variety of recognition and workflow technologies. The asymptotic pursuit of “100% accuracy” continues, with some informed scientists saying that the accuracy of core ASR improves 10 to 15 percent every six months. Meanwhile speech scientists have joined product developers to design platforms and protocols that are more “context aware” and “dynamic.”

Personal navigation devices (PNDs) are tuned to recognize specific place names and instructions, while a “Voice Control” interface for the iPhone can have more constrained “grammars” (the term for an ASR engine’s vocabulary and thesaurus) built around such terms as “find,” “play,” “send” and the like.
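To illustrate why a constrained grammar simplifies recognition, here is a minimal sketch of a command grammar built around the three verbs named above. It operates on already-recognized text rather than audio, and the domain labels and function name are hypothetical; it does not represent how Apple’s Voice Control or any vendor’s engine is implemented.

```python
# A tiny "grammar": the only verbs the application will accept,
# each mapped to a hypothetical application domain.
GRAMMAR = {
    "find": "search",
    "play": "media",
    "send": "messaging",
}


def parse_command(utterance: str):
    """Match recognized text against the constrained grammar.

    Returns a dict with the verb, its domain and the remaining words,
    or None when the first word is outside the grammar.
    """
    words = utterance.lower().split()
    if not words or words[0] not in GRAMMAR:
        return None
    return {
        "verb": words[0],
        "domain": GRAMMAR[words[0]],
        "argument": " ".join(words[1:]),
    }
```

Because the recognizer only has to choose among a handful of leading verbs, anything outside the grammar can be rejected outright instead of being guessed at.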

Improved recognition has been accompanied by dramatic advances in text-to-speech (TTS) rendering. The output of text readers can be tuned, tailored or “sculpted” to sound lifelike – even to the point where it supports branding efforts by sounding like a company’s spokesperson. As a result, both application developers and device makers have become more comfortable making their services or products “conversational.” In PNDs this translates into spoken, turn-by-turn directions. It also converts smartphones and in-car systems into text readers that support hands-free access to email, text messages or social media.

“Hybrid” Applications Will Be the Norm

Offering accurate and effective speech recognition on mobile handsets will always pose power and size constraints, but these challenges are now overcome by the advent of “hybrid” implementations, which balance the realities of battery life and connectivity issues against the ability to recognize both what callers are saying and, to an increasing degree, what they want to accomplish. Once the user experience is given paramount importance, application designers attach less importance to automation rates and place more emphasis on success rates for task completion.

In pursuit of the best experience, more mobile applications employ robust recognition engines and dynamic grammars that are “in the cloud,” so that they can have deeper integration with back-office data and metadata (like product catalogues, price lists or traffic conditions). Depending on the application, they may also employ live agents to assist in “disambiguation” of spoken utterances. In some cases, they merely “tag” content (i.e., assign categories to it), which helps to determine which grammar to employ. However, they can also perform full-blown speech-to-text transcription of spoken messages.
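The hybrid approach described above can be pictured as a fallback chain from embedded recognition to a cloud engine to a live agent. The sketch below is one plausible arrangement, not a description of any vendor’s product: the `Hypothesis` type, the confidence threshold of 0.8 and the three callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # 0.0 to 1.0, as a recognizer might report it


def recognize_hybrid(audio, embedded_asr, cloud_asr=None,
                     human_review=None, threshold=0.8):
    """Try the cheapest resource first and escalate on low confidence.

    embedded_asr, cloud_asr: callables taking audio, returning Hypothesis.
    human_review: callable taking (audio, draft_text), returning text.
    Passing cloud_asr=None models a device that is offline.
    Returns a (source, text) tuple.
    """
    local = embedded_asr(audio)
    if local.confidence >= threshold or cloud_asr is None:
        return "embedded", local.text

    remote = cloud_asr(audio)
    if remote.confidence >= threshold or human_review is None:
        return "cloud", remote.text

    # Treat the automated result as a first draft for a live agent.
    return "human", human_review(audio, remote.text)
```

The ordering reflects the trade-off in the text: embedded resources spare the battery and the data link, the cloud offers larger grammars, and people are the last resort because they are the most expensive step.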


Key points to remember for hybrid speech include:

1) Human intervention is important/necessary and okay
2) The “path” can include embedded resources, hosted resources, back-end servers, multiple modes…and people.

Figure 3: What is "Hybrid" Speech?

Source: Opus Research (2009)

Still, business realities tell a different story. Low automation rates are accompanied by high levels of human intervention. During the past few years, service providers like SpinVox employed live agents in low-wage countries to provide a semblance of affordability. Yet it is broadly believed, based on financial modeling, that large-scale service offerings will require near-total automation rates, which justifies the “100% automated” approach pursued by Google as part of its GOOG-411 and Google Voice voicemail-to-text transcription services.

Compelling Applications Drive Adoption

As new mobile devices and “operating systems” are added to the mix, the case for speech-enabled services grows stronger and stronger. That’s nothing new. It is a cruel irony that has confronted the speech processing community since the first talking computers, or maybe the first TV episode of “Star Trek.” Like William Shatner, speech processing continues to improve with age, adding more conversational aspects as its underlying platforms grow smaller and less expensive.

During that time Opus Research joined other analysts to point out that the mass acceptance and deployment of speech technologies would be “solutions” driven. This, indeed, turned out to be true as speech-enabled interactive voice response systems (IVRs) established beachheads in large contact centers. Those centers have had success building business cases thanks to the ability to reduce headcount, shorten the time it takes to resolve customer care issues and, more recently, reduce the number of misdirected calls.

Part of the business case for more robust speech applications is the replacement of dedicated IVRs with media servers and “voice portal” platforms that conform to open standards and can be more easily linked with the servers and customer care logic that power customer care Web sites, as well as phone-based contact centers. While the stars are aligned within the enterprise data center, a parallel transformation has been taking place in the wireless services community. New applications mix-and-match voice/data modes and online/offline states to fulfill user requirements.

Mobile Devices Vary in Size, Expand in Variety and Reach

Hundreds of millions of terminals are connected to wireless service providers’ networks. Many are cell phones for which “voice” and “speech-enabled” services are natural extensions of the server-based resources. But the speech community has a long-standing investment in solutions that integrate and internetwork with both laptops and a growing number of “always on” netbooks that ship with the ability to support dictation and softphones.

Solutions providers have gotten better at building applications that are indifferent to whether the phones are “connected” or “offline.” As a matter of fact, the most elegant applications do a great job of transparently toggling between embedded resources, remote servers and even intervention by live people in order to fulfill user requirements. For example, in the most common deployments of speech-to-text transcription from a mobile device, the following steps are taken:

User – Dictates instruction and content (For example: user says “send text message to xxx-xxxx”; content: “Will be late for meeting. Please start without me.”)

Device-based resources – Captures utterances; parses commands from content, transmits content over data link to host or server for interpretation and transcription.


Remote server – Executes instructions. Performs speech-to-text transcription (perhaps first pass). Transmits transcribed text to SMS or email delivery fabric.
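The three steps above can be sketched as a pair of functions, one for the device and one for the server. This is a simplified illustration under stated assumptions: plain text stands in for captured audio, the command phrase is matched with a regular expression, and `transcribe` and `deliver` are hypothetical callables standing in for the server-side ASR engine and the SMS or email delivery fabric.

```python
import re

# The spoken command from the example: "send text message to <number>".
COMMAND = re.compile(r"^send text message to (?P<to>[\dx-]+)\s*",
                     re.IGNORECASE)


def device_step(utterance: str):
    """Device side: separate the command from the dictated content."""
    match = COMMAND.match(utterance)
    if match is None:
        return None  # not a command this sketch understands
    return {
        "action": "send_sms",
        "to": match.group("to"),
        "content": utterance[match.end():],  # passed on for transcription
    }


def server_step(request: dict, transcribe, deliver):
    """Server side: transcribe the content, then hand it to delivery."""
    text = transcribe(request["content"])
    return deliver(request["to"], text)
```

In a real deployment the device would transmit audio rather than text, and the server’s first-pass transcript might still be routed to a live agent before delivery.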

There are a number of variations to this theme. In some cases, the initial text file created by the speech-to-text conversion engine is treated as a “first draft” and presented to live agents on the receiving end for editing and correction. This is especially true for Voicemail-to-Text transcription services where service providers like Nuance, SpinVox or PhoneTag put transcription accuracy ahead of automation rates. In effect, they leverage the best of Web 2.0, cloud computing and peer-to-peer as they employ remote agents to interpret and transcribe utterances.

Apple’s iPhone is the Game Changer

There is no question that the iPhone has had an impact on the marketplace that far exceeds its market share. It is a “closed” and locked-down system with restrictive or at least time-consuming regimes surrounding the App Store’s development and distribution methods. Still, it has opened the eyes of developers to the prospects of circumnavigating wireless carriers to bring their wares to the marketplace. It has also presented applications and content to wireless subscribers as a multiplicity of icons that can be scrolled through and opened through the multi-touch protocol.

The introduction of the iPhone 3G S added another area of innovation – heightened prospects for “Voice Command” – a feature that was given the same prominence as video recording and editing, “spotlight search” and geo-positioning. At the time of publication, by our estimate, there were more than twenty “voice-related” applications in the iTunes App Store. It is a tough category to monitor given that Apple correctly decided not to have a “Voice Application” category and, instead, opted to organize each application according to its rightful place either in the “utilities” or “productivity” category. Our best assessment is displayed in Figure 4 below.

Figure 4 below illustrates the voice-activated or speech-enhanced applications and utilities for AT&T Mobility subscribers with iPhones. Most of the companies listed have products or services offered to the broader “,” with special attention to RIM’s BlackBerry devices and the growing population of phones that run the Android operating system endorsed by Google and its partners.


Figure 4: Voice Activated Apps for the iPhone

Application               Provider                                       Price

Voice Dialers and Control
iPhone VoiceControl       Apple (presumed to be licensed from Nuance)    Free
AdelaVoice Voice Dialer   SpeechCloud                                    $4.99
Voice Dial                Makayama                                       $1.99
Melodis Voice Dialer      Melodis                                        Free
VoiceBox Dialer™          VoiceBox Technologies                          Free
Voice Lookup              Handheld Speech                                $2.99
Fonix iSpeak™             Fonix                                          $2.99
iVoiceDialIt              VoiceIt Technologies                           $6.99
Voice Dial                GP Imp                                         $0.99
Say Who                   Dial Directions                                Free
Dillo                     Loquendo                                       $3.99
SayNCall VoiceDialer      Metaphor                                       $0.99
NameDial                  VoiceActivation                                $0.99
Vocalia                   Creaceed                                       $3.99

Voice-to-Text Productivity Enhancers
Vlingo                    Vlingo Corporation                             Free
Google                    Google                                         Free
VoiceBoxPlus              VoiceBox Technologies                          $2.99
ReQall                    QTech                                          “Free” but carries a $2.99/mo. service charge
ShoutOUT                  Promptu                                        TBD

Apple’s commitment to deploying speech technologies across its entire product line is made manifest by a broad licensing agreement with Nuance that covers desktop dictation systems (a la Dragon Systems) as well as mobile voice control. In addition, Apple has been both aggressive and imaginative in defining mobile user interfaces that incorporate both spoken and keyboard input. Many of the use cases that it foresees are codified in a patent application that was made public in August 2009 by the U.S. Patent and Trademark Office under the application #20090216531.

The iPhone applications providers in Figure 4 are a high-profile subset of the speech-enabled mobile services ecosystem. Because of the predominance of “hybrid” solutions, the lines of demarcation are often blurred, but the groups include:

• Core Speech Processing Technology Providers - Exemplified by Nuance, Loquendo and Fonix

• Voice Application Developers – This category puts industry giants like Microsoft (Tellme) and Google into the same boat as smaller, emerging innovators like Vlingo and Yap! (both using IBM’s recognizer), Novauris, Dial Directions (now a subsidiary of Sakhr Software), and a dozen or so others.

• Wireless Network Operators - Verizon, among the carriers, has distinguished itself by having a speech-enabled flavor of its VCAST store, powered by Novauris.

The Larger Opportunity is in “Feature Phones”

Nuance doesn’t appear in Figure 4 above with a stand-alone product. That’s because its technologies are integrated into a wide range of Apple products as part of a wide-ranging licensing agreement. “Voice Control” on the iPhone 3G S (though not publicly “branded”) is an integration of Nuance Mobile software and services. Meanwhile, Nuance and a number of its go-to-market partners see significant revenues to be generated by introducing speech recognition as part of the user interface on the hundreds of millions of so-called “feature phones” around the world.

Nuance’s Mobile division has been aggressively promoting a range of embedded, network-based and “distributed” solutions for improving the mobile user experience. Its products span embedded speech processing, which includes both ASR (automated speech recognition) and TTS (text-to-speech rendering), as well as a range of “predictive” text input mechanisms including T9, which provides predictive text entry primarily for SMS, and T9 NAV, which uses similar predictive text entry to shorten the time it takes to search, find and initiate features or services that are available to through the .

Nuance VSuite is the trade name Nuance has assigned to its “framework” of voice command and control features that start with the prompt “Say a command.” Followers of VoiceSignal (which was acquired by Nuance in 2007) will recognize it as the evolutionary descendant of its flagship Voice Command product. In the aggregate, VSuite, T9, T9 NAV and Nuance Voice Control ship on hundreds of millions of handsets.

In point of fact, phones do not need to be speech-enabled. As the original voice portal providers – like Tellme, BeVocal (now Nuance On Demand) and HeyAnita – proved nearly a decade ago, low-end phones can perform sophisticated voice searches when they are connected to speech processing resources. GOOG-411 and many of the automated, enhanced directory assistance services fulfill automated query-and-response tasks through low-end devices.

Safety is a Driving Force

A report issued by the Virginia Tech Transportation Institute in late July 2009 confirmed a set of facts that proponents of hands-free technologies have been citing for many years. Using wireless devices while driving significantly increases the chance of having an accident. As illustrated in Figure 5, texting is the big culprit, leading to an increased risk that’s as high as 23 times normal for truck drivers.

Figure 5: Assessed Risk of Driving While Using Cell Phone

CELL PHONE TASK                     Risk of Crash or Near Crash event

Light Vehicle/Cars
Dialing Cell Phone                  2.8 times as high as non-distracted driving
Talking/Listening to Cell Phone     1.3 times as high as non-distracted driving
Reaching for object                 1.4 times as high as non-distracted driving

Heavy Vehicles/Trucks
Dialing Cell Phone                  5.9 times as high as non-distracted driving
Talking/Listening to Cell Phone     1.0 times as high as non-distracted driving
Use/Reach for electronic device     6.7 times as high as non-distracted driving
Text messaging                      23.2 times as high as non-distracted driving

Going “Truly Hands-Free”

Talk of “The Holy Grail” ranges from the sublime (Christian mythology) to the ridiculous (Monty Python), but in the context of new technologies, it has come to mean a device, set of code or system that has miraculous powers. “Truly-hands-free” voice applications are the mobile speech community’s Holy Grail, delivering technologies and protocols that allow a device’s owner to use his or her voice to provide commands and spoken input without having to touch a button or area on a screen.

“Hot words” are an interesting approach. The approach is exemplified by the “toktok” service soon to be introduced by Ditech Networks, whose recognizer is always listening for the subscriber to say “toktok,” at which point it toggles into “enter command” mode. The days of the mythological moniker appear to be numbered.
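The hot-word idea amounts to a two-state machine: idle until the trigger word is heard, then listening for a command. The sketch below illustrates that toggle on a stream of already-recognized words. Only the trigger word “toktok” comes from the text; the class, its single-word commands and its return to the idle state after each command are assumptions made for illustration, not a description of Ditech’s service.

```python
class HotWordListener:
    """Two-state listener: idle until the hot word, then one command."""

    def __init__(self, hot_word: str = "toktok"):
        self.hot_word = hot_word
        self.awaiting_command = False  # False = idle, True = command mode

    def hear(self, word: str):
        """Feed one recognized word; return a command or None.

        ASSUMPTION: a command is a single word, and the listener
        drops back to idle as soon as it has captured one.
        """
        word = word.lower()
        if not self.awaiting_command:
            if word == self.hot_word:
                self.awaiting_command = True
            return None  # everything heard while idle is ignored
        self.awaiting_command = False
        return word
```

Feeding the words “hello,” “toktok,” “call,” “call” through `hear` yields a command only for the first “call,” since the second arrives after the listener has returned to idle. No button press is involved at any point, which is the property that makes the approach “truly hands-free.”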

Ergo, the greatest potential for a voice-user interface will be to provide hands-free and eyes-otherwise-engaged ways to use existing, popular messaging, social networking and communications applications. To date, the promise of hands-free services has rung hollow. For one thing, they were not truly hands-free. Almost all of them required the use of an “activate” button to tell the application to listen for spoken utterances.


Time to Leverage Mobile Speech

Today, in the latter part of 2009, the stage is set for serious uptake of speech-enabled mobile services in 2010. That’s because:

• The mobile subscriber base has hit critical mass - Virtually all phones are capable of supporting voice-activated features or services (embedded, server-side or both).
• Mobile devices are being used for a multiplicity of purposes – Including text messaging, music and video entertainment, map-based navigation, personal information management and “search.”
• Legal mandates coming - Safety concerns surrounding distracted driving (and even distracted walking) are at an all-time high.
• Carriers can be circumvented - With traditional “voice services revenues” flattening out, carriers are taking voice-activated services more seriously as a growth accelerant.
• Applications that work - There are more showcase applications than ever before and subscriber awareness is at an all-time high.

At this time, an ecosystem of solutions providers is showing a higher level of logic and coherence than in prior years. Participants fall into three categories:

• Core technology providers – Providers of specialized chips or code to support speech recognition or text-to-speech conversion in small environments or distributed across wireless networks. Examples are Sensory, Nuance, Novauris, Loquendo, IBM, Fonix and Microsoft.
• Solutions providers and integrators – Have invested in foundational “platforms” that support a multiplicity of services – especially speech-to-text conversion, search and form-filling (e.g. Yap, Vlingo, Google, Tellme (Microsoft), V-Enable and others).
• Service providers and retailers – Exemplified by wireless network operators but, in The Age of the App Store, it obviously includes the companies that aggregate other people’s solutions into “direct to the customer” sales sites.

Voice-activated applications present a problem of categorization to application retailers. As the Apple/AT&T arrangement exemplifies, there is bound to be a “native” flavor of speech-enablement. In this case Apple Voice Control ships with every new iPhone 3G S. It enables iPhone owners to “voice dial” the numbers in the phone’s contact list or search for music in the iTunes library. The introduction of Voice Control has not foreclosed on dozens of other speech-enabled applications in the iTunes App Store.

© 2009 Opus Research, Inc.

Mobile Speech: Unlocking Personal Apps, Functions & Features Page 13

Appendix: Profiles of Vendors

This section contains brief profiles of firms that have created speech-enabled services or applications to reach mobile subscribers. For planning purposes, profiles of supply-side vendors are organized into two categories:

• Google, Microsoft and Nuance, and
• Everyone else

Among providers of mobile speech technology, three companies give the market shape, size and heft. They are Google, Microsoft (Tellme) and Nuance. Google leads, not only alphabetically but by dint of being the leading provider of search services, which extend from the desktop onto mobile devices through a dial-up service (GOOG-411) and through speech-enabled, downloadable applications.

Microsoft has one of the longest-standing commitments to speech technologies, starting with dictation capabilities that have been baked into every operating system since XP. As for its commitment to mobile speech, its purchase of Tellme Networks (in 2007) placed it at the epicenter of the mobile speech movement. Tellme operated the first speech portal (800-555-TELL) dating back to 1999. It is also the resource for the newly branded 800-BING-411 enhanced directory assistance service (formerly 800-Call-411).

Yet to be determined is the fate of Microsoft’s downloadable application for devices and Blackberry’s (beta). As discussed in the profiles below, Microsoft is rebranding the spectrum of and downloadable applications that fit under the “ 6.5” umbrella into a package called “.” To many, it appears to be an effort to make a mobile device running OS more like an iPhone. However, it will be interesting to see where the Tellme Mobile applications reside in the new environment, especially now that Apple prominently features the Nuance-powered Voice Control capabilities as a differentiator for the iPhone 3G S.

This brings us to the third member of the mobile voice triumvirate, Nuance Communications. While the company does not presently match Microsoft or Google in market capitalization, R&D or marketing budget, actions taken this year by both IBM Corporation and Apple give Nuance tremendous influence in the world of mobile speech. It is both the heir apparent to Big Blue’s legacy speech opportunities and the most favored supplier of speech processing technologies across the broad spectrum of Apple computing and communications platforms, including the iPhone.

The Three Voices of Google

Some readers will catch the reference to the 1957 Oscar-winning movie in which Joanne Woodward played a woman with multiple personality disorder. The parallel to Google’s automated speech development efforts is not a perfect fit, but I use it to make a point. Google’s participation in the speech-enabled world is, at once, pre-emptive, innovative, and deleterious.

It is pre-emptive because, like so much of what Google has to offer, it is “free” to all users. On the one hand, that ensures broader use by a mobile public that expects as much (free content and services). On the other hand, Google is one of a handful of companies that can remain profitable even as it launches services, features and functions that exact no fees from their direct beneficiaries. In effect, they have built a barrier to entry against other firms that would depend on top line revenue for survival and sustainability.

The fact that it is “innovative” speaks for itself as Google launches “beta” versions of a multiplicity of cloud-based applications, utilities, features and functions. Of special importance in the mobile speech community is the creative way that the downloadable Google App facilitates voice input by iPhone users. It overcomes the need to “press and hold” or simply press a physical or soft button by using the device’s accelerometer to detect when the phone is in the “voice input receptive” position. Users hear a brief tone signaling that the phone is “in listen mode” and, just as importantly, it toggles into response mode when the phone moves into a position where the user can read what’s on the screen. It’s not “hands-free” but it is a highly reliable way to accommodate spoken input.
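The accelerometer-driven toggle described above can be illustrated with a minimal sketch. It is not Google's implementation, which is not documented here. It assumes a device that reports gravity as (0, 0, +1) when lying flat with the screen up, and it uses two hypothetical tilt thresholds so that small hand movements do not flip the mode back and forth. The names `tilt_from_flat` and `VoiceInputToggle` are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, in degrees of tilt away from "flat, screen up".
LISTEN_ENTER_DEG = 60.0  # raised toward the ear: start listening
LISTEN_EXIT_DEG = 40.0   # lowered toward reading position: show results


def tilt_from_flat(ax: float, ay: float, az: float) -> float:
    """Angle between the screen normal (z axis) and the measured gravity vector.

    Assumes a reading of (0, 0, +1) when the phone lies flat, screen up.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("accelerometer reading has zero magnitude")
    cos_tilt = max(-1.0, min(1.0, az / norm))
    return math.degrees(math.acos(cos_tilt))


@dataclass
class VoiceInputToggle:
    """Two-state toggle with hysteresis between listen and response modes."""
    listening: bool = False

    def update(self, ax: float, ay: float, az: float) -> Optional[str]:
        """Feed one accelerometer sample; return a mode change or None."""
        tilt = tilt_from_flat(ax, ay, az)
        if not self.listening and tilt >= LISTEN_ENTER_DEG:
            self.listening = True
            return "listen"   # the app would play its brief tone here
        if self.listening and tilt <= LISTEN_EXIT_DEG:
            self.listening = False
            return "respond"  # the app would display results here
        return None
```

The gap between the two thresholds is a design choice: a single cutoff would cause rapid toggling whenever the phone hovers near it.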

“Deleterious” may be too strong a term to capture Google’s impact on user acceptance. While dedicated to providing high levels of user satisfaction, Google also takes a 100% automated approach to speech recognition. Whether it’s directory assistance (GOOG411), Voicemail-to-Text transcription (integrated into Google Voice) or Voice Search (Google Mobile), human intervention or interpretation is not an option. It delivers the “best effort” results, which are often incorrect. This approach supports large-scale implementation, but it is accompanied by the potential for large-scale, public ridicule – which occurred when the technology reporter for the New York Times dedicated two lengthy columns to the faulty transcriptions of calls to his Google Voice mailbox.

Google’s approach is both understandable and defensible. The company is getting speech technologies into the field and available to the masses. It is helping to build realistic expectations among the folks who use the services most. Their experience will tell them when the technology works best and when results may be suspect. But in the near term, it is human nature for people to try to “break the system,” for example, by calling into the Times’ reporter’s mailbox and reading Lewis Carroll’s Jabberwocky just to see how it would be rendered by a machine.

It is human nature to try to break every new technology that comes along. Speech recognition over the phone is too easy a target at this point. That’s why Google’s all-or-nothing approach plays into the hands of doubters, naysayers and skeptics who would just as soon keep things as they are. Fortunately, in the modern-day adoption curve for new technologies, “break the system” is quickly followed by “game the system” as the next generation of prospective users begins to figure out how to make the new technology work on their behalf. This pattern is followed by possession (which used to be called “personalization”), whereby users accept the system’s flaws, invent their own solutions and make them their own (think of the early adopters of T-9 who plowed through other people’s skepticism and took their SMS input to blistering heights), followed finally by mass use.

So far, Google’s approach has us stuck on mass deployment and frequent failure. That’s why I call it deleterious.

Microsoft Has its Own Identity Issues

Acquiring Tellme created a transformative moment for Microsoft’s enterprise customer care and mobile services efforts. Tellme carried a reported $800 million price tag, which covered both the long-standing hosted voice portal and customer care operation and a formidable set of mobile speech services. For more than a year, Tellme was treated as a near-autonomous, voice services subsidiary of the Redmond-based mothership.

Tellme General Manager (and co-founder) McCue departed in May 2009. This changing of the guard coincides with a major repositioning of Tellme’s hosted customer care operation (making it more of a high-quality server farm for speech-based applications). Because McCue was its chief spokesperson, presenter and advocate, his departure has put Tellme Mobile’s development and marketing efforts under a cloud.

The overall promotional message for Tellme Mobile is “say what you want and get it.” Its technology is integrated into Ford Motor Company’s in-vehicle communications and entertainment system called SYNC™. So are technologies from other parts of Microsoft, as well as archrival Nuance. Collectively, the technology provides a way for drivers to play music, make hands-free phone calls or retrieve traffic information through the VUI. These services are packaged as “say-what-you-want-and-get-it” features that will ship in Fall 2009 on mobile devices that run the Windows 6.5 operating system. Pre-release promos call it the “first mobile voice service to combine content and communications,” but companies like Novauris, Nuance, Promptu and Vlingo would dispute this claim.


Nuance Mobile: A Multimodal Approach

Nuance Communications, Inc.
1 Wayside Road
Burlington, MA 01803
P | 781-565-5000
www.nuance.com

Over the past five to seven years Nuance has acquired and assimilated a number of technology providers that provide the foundation for fast, multimodal input of instructions, content and search terms. On the speech side, the acquisitions of VoiceSignal and Mobile Voice Command have strengthened a portfolio that already included intellectual property from the likes of Phonetic Systems (enhanced directory assistance), Rhetorical (text-to-speech rendering), Locus Dialogue (auto-attendant) and ART Advanced Recognition Technologies (mobile ASR). The acquisition and licensing binge culminated in a landmark agreement with IBM, which brings IBM’s formidable portfolio of speech processing technologies (embedded, distributed and server-based) into the mix of resources that Nuance will incorporate into its next generation of products and services.

As a company that traces its origins to document management, Nuance was naturally attracted to multimodal interactions. In 2007, it also acquired Tegic, the originator of T-9, the predictive texting protocol that runs on hundreds of millions of mobile handsets. Coupled with the acquisition of hosted speech provider BeVocal (which had a number of wireless carriers as clients), Nuance is well-positioned to roll out multimodal services that merge tapping, typing and talking into the user interface with the capabilities of responding via voice, text or video as appropriate to the users’ needs.

In mid-July 2009 Nuance acquired mobile application provider Jott to round out a set of mobile voice services that now includes Jott Assistant, which allows its users to use their voice from any device (mobile phone, phone or softphone) to create notes, set reminders and appointments, and send email and text messages. Those voice services are an important component of Nuance Mobile services, which also include the next generation of T9 predictive texting resources; T9 is reported to have shipped on more than 4 billion mobile phones.

The latest portfolio of T9-based products includes “T9 Nav,” which uses the predictive logic that accelerates text entry to create keypad-based “shortcuts” for a mobile device’s functions and features as well as services, software and content provided by a mobile carrier or any third party’s application store. This is where Nuance, which had built its reputation on its dominance of the speech processing business, shows that it recognizes that spoken input is but one alternative for mobile users to accomplish their goals most efficiently. In noisy rooms, in quiet environments, or when it is simply more convenient, Nuance knows that it is better to provide the most efficient way to enter instructions, text or queries through the keypad, no matter how accurate recognition becomes.
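To make the idea of keypad-based shortcuts concrete, here is a minimal sketch of prefix lookup over a standard 12-key letter layout. It is an illustration only and does not reproduce Nuance's T9 or T9 Nav logic. The shortcut labels and the helper names (`to_digits`, `build_index`, `suggest`) are hypothetical.

```python
from collections import defaultdict

# Standard letter assignment on a 12-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_digits(label: str) -> str:
    """Convert a label to the key presses that spell it (non-letters are skipped)."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in label.lower() if ch in LETTER_TO_DIGIT)


def build_index(labels):
    """Map every digit prefix to the labels it could stand for."""
    index = defaultdict(list)
    for label in labels:
        digits = to_digits(label)
        for end in range(1, len(digits) + 1):
            index[digits[:end]].append(label)
    return index


# Hypothetical device functions and services a user might want to reach.
SHORTCUTS = ["Camera", "Calendar", "Contacts", "Bluetooth", "App Store"]
INDEX = build_index(SHORTCUTS)


def suggest(keys: str):
    """Return the shortcuts that match the keys pressed so far."""
    return INDEX.get(keys, [])
```

With these labels, pressing 2-2 narrows the list to Camera and Calendar, and a third key press (6) leaves only Camera. A production system would also rank candidates by frequency of use, which this sketch omits.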

To round out its Mobile portfolio, Nuance also offers exemplary hybrid services. Once again, the term “hybrid” refers to the blending of embedded, device-based resources with applications or speech processing servers “in the cloud”, as well as intervention by live humans when required. The Nuance offer of Voicemail to Text (VM2Txt) transcription service is an example of the hybrid approach. The approach deploys the venerable Dragon NaturallySpeaking dictation software “in the cloud” as a first step to transcribe a spoken message.

But Dragon is more like a Tier 1 or triage resource. If the system has high confidence in the utterances, transcription can be accomplished with 100% automation and high levels of accuracy. However, in the frequent cases when further disambiguation or better transcription is needed the text files are transmitted to remote contact centers where live agents can listen to the phrases where transcription is suspect and perform the transcription on the fly. It is also the modern convention to deliver the transcribed message as email with the recorded, spoken message as an attached .WAV file.
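The triage flow described above can be sketched as a confidence-threshold router. This is an illustration under stated assumptions rather than Nuance's actual pipeline: the threshold value, the `Phrase` structure and the `human_review` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical cutoff; a real service would tune this against measured error rates.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Phrase:
    """One recognized segment of a voicemail, with the engine's confidence score."""
    text: str
    confidence: float


def transcribe_hybrid(
    phrases: List[Phrase],
    human_review: Callable[[int, Phrase], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Keep high-confidence machine output; route suspect phrases to a live agent."""
    final = []
    for position, phrase in enumerate(phrases):
        if phrase.confidence >= threshold:
            final.append(phrase.text)  # Tier 1: fully automated
        else:
            final.append(human_review(position, phrase))  # escalate to an agent
    return " ".join(final)
```

Only the suspect phrases reach an agent, so the human workload scales with the engine's uncertainty rather than with message volume.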

Nuance’s real coup may be the recently announced agreement with IBM which essentially makes Nuance the most favored go-to-market partner for IBM’s portfolio of speech processing patents. While the proof in this sort of technology pudding is in the integrations and deployments, IBM has made it clear that third parties like Nuance, Vlingo and Yap will be the product marketers and, with hope, Big Blue will be there to harvest some of the fruits of the spending on integration and deployment.

The Contenders

Nuance is not the only non-giant with high hopes for mobile speech applications. What follows is certain to be an incomplete list of companies offering speech-based applications and solutions for the problems of mobile subscribers.

Creaceed
Rue de l'Epargne, 56
7000 Mons
Belgium
P + 32 65 37 44 90
www.creaceed.com

Creaceed developed and licenses Vocalia 2.0, a mobile speech app for the iPhone. It enables its users to search for songs from the iTunes library by saying the artist or group name. If they have downloaded bookmarks from their desktop browsers, they can use spoken commands to open pages in the Safari browser. As of this publication (September 2009) the application is available in English, French and German with more (unspecified) languages coming. All apps are based on CeedVocal SDK, which is an ASR technology developed by Creaceed for the iPhone.

Ditech Networks
825 East Middlefield Road
Mountain View, CA 94043
P | 650-623-1300
www.ditechnetworks.com

Ditech Networks, a publicly traded company (stock symbol DITC), is undertaking a major change in business strategy. It built a profitable business by providing advanced voice processing, Session Initiation Protocol (SIP), and security technologies to the likes of Verizon Wireless, Sprint/Nextel, Orascom Telecom, and others that collectively serve more than 150 million subscribers. It entered the mobile speech domain by developing and offering a mobile, personal virtual assistance service called toktok.

As a single-number service with close links to resources in Google’s cloud, toktok is much like Google Voice. The service, now in controlled “beta” release, enables users to voice dial, schedule meetings or events, get “whispered” caller ID information and engage in voice-enabled social networking. It differentiates itself from other members of the mobile speech community by offering single-word access to its speech-enabled features and functions.

Fonix
387 South 520 West, Suite 110
Lindon, Utah 84042
P | 801-553-6600
www.fonix.com

Fonix Corporation (founded in 1994) is headquartered in Salt Lake City, Utah. Fonix Speech is the wholly owned subsidiary of Fonix Corporation. Its ASR and TTS products are designed for a broad variety of platforms, including mobile devices, electronic games, toys and appliances. It claims to have seven patents and “nearly a dozen” pending patents. It offers embedded speech interface solutions for mobile phones, smartphones, PDAs and wireless communication devices. The company’s most recent financial filings (10-Q) noted risk factors that “raise substantial doubt about the Company’s ability to continue as a going concern.”

Its longest-standing mobile product is Fonix VoiceDial™, which is akin to Fonix iSpeak (slated for release as an Apple iPhone 3G application) and is available in English, Spanish, French, German and Italian.


Handheld Speech
18 Hillside Ave
Amesbury MA 01913
www.handheldspeech.com

This is a small company, roughly seven years old, that provides small-footprint voice recognition and text-to-speech applications for handheld devices. The applications also operate on desktop PCs. Offered for sale are Voice LookUp for PocketPC and the iPhone. It offers a software development kit (SDK) for PocketPC, desktop Windows, Mac OSX, Linux, and uClinux.

Jott
See Nuance (Company purchased by Nuance, July 2009)

Loquendo
Loquendo S.P.A
Via Olivetti, 6
I-10149 Turin, Italy
P | +39 011 291 3413
www.loquendo.com

Loquendo is a global company that offers ASR and TTS resources, as well as a VoiceXML-based platform called VoxNautica. In spite of its full-scale offering, it is often regarded as a TTS-centric technology provider. As of September 2009, Loquendo’s software supported speech processing in 27 languages with 65 voices, and the company had developed its own voice biometric-based speaker identification and authentication engine.

Loquendo’s embedded TTS software has been available for over five years and, by the company’s estimate, is in use on over 7 million personal navigation devices (PNDs) around the world. Its embedded software had been available of the OS since 2002 and, as of mid-2009, the software had been operating on Symbian OS™ as well as Microsoft’s Windows Mobile 5 & 6 (all editions), CE 5 & 6 and Windows XP Embedded. Its reach is further extended over VxWorks, Linux, QNX and iPhone OS. Its first iPhone app, a voice dialer called “Dillo” (Italian for “Say It!”), was introduced in early 2009.

Derived from the Loquendo ASR, and sharing the same core algorithms, the Loquendo Embedded ASR engine is a solution for deploying speech applications in embedded and mobile environments. Because it shares the same core engine as the standard server version, the product offers the same range of languages, the same APIs and the same standards support. Loquendo Embedded ASR employs optimized neural networks, reducing recognition time, and includes new features that give improved recognition performance in a range of challenging environments, such as interactions with non-native speakers, high background noise (e.g. in-car, warehouse) and a diverse range of audio channels (e.g. VoIP, GSM/UMTS).

In the mobile speech industry, Loquendo is focused mainly on the following markets: Automotive & Navigation, Mobile Devices, Assistive, Transport and Industry (Voice-picking). Loquendo customers include: TomTom, Citroen, Magneti Marelli, AvMap, deCarta, Nitax, Archean, ESRI, NY, CodeFactory, RNIB, and Saltillo.

Melodis
1731 Technology Drive, Suite 700
San Jose, CA 95110
P | 408-441-3200
www.melodis.com

Melodis is backed with $4 million in venture funding, and its stated mission is “to make sound and speech the preferred means to search and navigate information on mobile or IP connected devices.”

The company publishes two separately branded product lines. The Melodis Voice Dialer is available on the iPhone and will soon be available on other smartphones. The other product line is the Midomi Ultra Music Identifier and Search, which is designed to enable a mobile phone to identify recorded music playing over the radio or in the background, or even music that is hummed or sung by the device’s owner.

The company charges $4.99 for its music finder, but offers the Voice Dialer for free.

Metaphor
106 Crest Rd.
Wellesley, MA 02482
www.metaphorivr.com

Metaphor Solutions provides speech-enabled IVR services on an on-demand basis primarily for business enterprises. It has aggregated a list of 55 pre-packaged voice applications which it positions as “plug and play” resources for both premises-based and hosted solutions. It has forged partnerships with the likes of XO Communications, Tellme (division of Microsoft), Genesys (subsidiary of Alcatel-Lucent), Voxeo and many others.

In 2008, Metaphor launched a series of speech-enabled mobile applications for the Apple iPhone. They include SayMedia (which allows you to say the name of a TV show or movie to receive a review); SayNFind, a mobile search application; and SayNCall, a voice dialer.


Novauris
Novauris Technologies Ltd
Millbank, Stoke Road
Bishops Cleeve
Cheltenham, Gloucestershire
England GL52 8RW
+44 (0)1242 678581

North American Office:
Novauris LLC
440 N. Wolfe Rd.
Sunnyvale, CA 94085
P | 408-524-3094
www.novauris.com

Novauris is primarily a “white label” provider of voice search software, including resources for speech recognition and database (media) search and access. Its core technology is called NovaSearch®. The company targets voice-enabled applications offered by wireless service providers (like Verizon), hosted service providers (like Angel.com) and in-vehicle solutions providers.

One Voice Technologies
4250 Executive Sq., Ste. 770
La , CA 92037
www.onevoicetech.com

One Voice Technologies, Inc., develops what it refers to as “4th Generation voice solutions” for the Telecom and Interactive Multimedia markets. It has trademarked the name “Intelligent Voice” for its core solutions. Its software enables dictated words to be sent as email, SMS, Instant Messaging and text pages. It is a public company with revenues in 2008 of about $700,000. Its major telephone company customers are Telefonos de Mexico (TELMEX) and its TELNOR subsidiary, the Government of India, Fry's Electronics, Inland Cellular, Nex-Tec Wireless, and several additional telecom service providers throughout the United States.

In 2006, One Voice signed a deployment contract with the residential group within TELMEX for deployment of One Voice's MobileVoice solutions to the over 19 million TELMEX subscribers. MobileVoice was launched in October, 2007, under the TELNOR brand as IRIS. Results are said to be positive but growth awaits a “relaunch” as part of a standard bundle of features from TELNOR.

Another carrier customer is Mahanagar Telephone Nigam Ltd. ("MTNL") with 6 million subscribers in India. One Voice is well on the way to introducing a service that carries a monthly fee of Rs. 50/- (rupees) per subscriber, of which the company has a 30% share. One Voice anticipates that the MTNL revenue stream will grow as it launches additional MobileVoice services including voice dialing, group call and voice-to-SMS services.
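The revenue-share arithmetic behind the MTNL arrangement is simple enough to work through. The sketch below uses only the three figures given in the text (Rs. 50 monthly fee, a 30% share, 6 million MTNL subscribers). The take rate is not reported anywhere in this report, so the 1% used in the example is a purely hypothetical input, not a forecast; the function names are likewise invented for illustration.

```python
def company_share_per_subscriber(monthly_fee_rs: float, share_pct: float) -> float:
    """Rupees per paying subscriber per month that flow to the technology provider."""
    return monthly_fee_rs * share_pct / 100


def monthly_company_revenue(
    subscriber_base: int,
    take_rate_pct: float,       # hypothetical: share of the base that subscribes
    monthly_fee_rs: float = 50,  # from the text
    share_pct: float = 30,       # from the text
) -> float:
    """Monthly revenue in rupees for a given, assumed take rate."""
    paying_subscribers = subscriber_base * take_rate_pct / 100
    return paying_subscribers * company_share_per_subscriber(monthly_fee_rs, share_pct)


# Rs. 15 per paying subscriber per month; at an assumed 1% take rate across
# 6 million subscribers, that is 60,000 payers and Rs. 900,000 per month.
per_subscriber = company_share_per_subscriber(50, 30)
example_revenue = monthly_company_revenue(6_000_000, take_rate_pct=1)
```

The point of the exercise is sensitivity: at Rs. 15 per payer, the revenue stream depends almost entirely on how many of the 6 million subscribers opt in.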

One Voice works with Intel to offer an embedded version of MobileVoice, targeting telecom launches in 2009 as they wait for a larger market for “mobile internet devices” (MIDs) to materialize. The two companies have evaluated launching applications on iPhones and BlackBerrys but they are discouraged by the fact that competitors offer similar software or capabilities for free.

PhoneTag (formerly SimulScribe)
110 East 59th Street
New York, NY, 10022
www.phonetag.com

PhoneTag provides a voicemail-to-text transcription service. It uses both speech recognition/transcription and human assistance to deliver text versions of voicemail messages to subscribers through email (with audio file attached) or as a text message. The company claims to “work with” voicemail systems operated by Alltel, AT&T (and AT&T Mobility), SaskTel, Sprint, Skype, T-Mobile, Verizon (and Verizon Wireless), and other carriers, which the company claims account for “95% of home, office and mobile voicemail systems.”

Pricing is as follows: a “basic” subscription costs $9.95/month for 40 messages, and the “unlimited” plan costs $30/month. It also charges $0.35 per message. As a private company, it does not report revenues, but it has disclosed revenues of “over $2 million” for 2007.

In September 2009, Ditech Networks purchased the sole rights to market the technology under the SimulScribe brand.

Promptu
333 Ravenswood Avenue, Building 202
Menlo Park, CA 94025
P | 650-859-5800
www.promptu.com

Founded in 2002 under the name ActiveTV as a research venture at SRI Labs, Promptu introduced a voice-activated remote control for all the screen-based services offered by broadband carriers (primarily Cable TV system operators). The devices allowed cable subscribers to tune their sets to a selected program in response to instructions like “Find the Knicks game” or “Tune into some comedy.” Both Comcast and Time Warner Cable conducted market trials of the speech-activated remotes in 2005, and the company raised $22 million in venture funding.


By 2006, the company changed its name to Promptu and broadened its service offering to include voice-activated mobile search, including location-based services. VCs chipped in another $11.6 million. In early 2007, another $5.6 million underwrote a further incursion into the mobile search world, with Italy becoming a major geographic focus.

Promptu leveraged new attention to speech-enabled mobile services to forge relationships with a number of content providers. With the Official Airline Guide (OAG) it rolled out voice-activated access to WAP-based flight information, which was branded Flights2Go.com and offered for $2.99/mo by Verizon Wireless. In September 2008 it formally announced a speech-to-text dictation service at the CTIA Wireless event in San Francisco. At Demo 2009, the company showed ShoutOUT, a voice-to-SMS application for the iPhone designed to support the three most popular spontaneous messaging applications: IM, SMS and Twitter. ShoutOUT is expected to be released in November 2009. To the best of our knowledge, each product in the U.S. has hit considerable speed bumps on its way to mass adoption.

Much of the company’s development has been taking place in Italy where it has succeeded in providing the technology deployed by Telecom Italia for an iPhone application called “dettaSMS.” As the name implies, it enables TIM (Telecom Italia Mobile) subscribers to dictate, review, edit and send SMS-based text messages.

Promptu’s other major accomplishment in Italy has been to offer voice-based mobile access to the services offered by the Italian rail system through its Trenitalia Web site. The mobile version of what is seen as the “leading mobile travel application in Italy” is called ProntoTreno. It allows passengers to use their voice to check schedules, make and change reservations, and purchase tickets. Both of the applications in Italy are accessible through a variety of handset

QTech
View Point
H.No.8-2-293/82/L/208/A
M.L.A. Colony, Banjara Hills
Hyderabad, 500 034
India
www.reqall.com

A venture-backed company, QTech raised $2.5 million in seed capital in 2006. QTech’s flagship product, ReQall, is positioned as a “memory jogger,” which is multimodal by nature because it can parse spoken phrases, like “remind me of the meeting with Sales on Tuesday,” and convert them into scheduled events in Google Calendar. It supports updates via voice, email or IM.

Its voice partners include Yap and Novauris.


Sakhr Software (Dial Directions)
U.S. Office
8065 Leesburg Pike Suite 305
Vienna, VA 22182
Tel: (202) 429-2772
HQ: Nasr City, Cairo 11771 Egypt
www.sakr.com

Dial Directions was acquired by Sakhr Software in June of 2009 to be incorporated in that company’s suite of software that supports natural language processing in Arabic. The resulting company has 200 employees. Dial Directions’ CEO Adeeb Shanaa has the same title at Sakhr, while the chief executive of Sakhr, Fahad Al Sharekh, is now chairman of the combined company, with the objective of building more business alliances and partnerships.

Dial Directions was one of the pioneers of voice-based entry of destination information for an application that provides turn-by-turn driving directions. To start its service, the company laid claim to 1-DIR-ECTIONS (1-347-328-4667) as a mechanism for delivering turn-by-turn directions as an SMS message in response to prompts that request current location and proposed destination.

Sensory
575 N. Pastoria Ave.
Sunnyvale, CA 94085-2916
P | 408-625-3300
www.sensoryinc.com

Sensory, founded in 1994, is a privately held company that offers custom “application specific integrated circuits” (ASICs) that speech-enable consumer electronic devices. Of special interest from the mobile speech point of view is the circuitry that supports Sensory’s BlueGenie Voice Interface. In January 2008, wireless handset maker BlueAnt integrated BlueGenie into the BlueAnt V1 Bluetooth headset. Sensory notes that the circuit brings speech recognition and synthesized voice output together to enable full “voice control” through headsets, without the need for visual displays. Once initiated (through the push of a button on the headset), it provides the basis for “truly hands-free” access to wireless services and features.

Sensory’s technology is also integrated into a broad variety of mobile devices that support personal services. VoiceActivation, also based in Sunnyvale, CA, is a “premier partner” of Sensory Inc. providing such applications as Sensory's FluentSoft SDK for iPhone, including NameDial.


Siri, Inc.
Almaden Boulevard
San Jose, CA 95113
www.siri.com

Siri has created a “Personal Virtual Assistant” mobile application. While the name summons the memory of Wildfire, Webley and a few other adventurous firms, Siri does more for its users than the message and calendar management that prevails in the PVA category. It leverages its founders’ significant experience in artificial intelligence (AI) – gleaned from one of the largest DARPA-funded projects in the history of SRI. Siri also puts emphasis on the support of natural language, or conversational, input of commands, queries or search terms.

The combination of NL and AI is formidable. It results in mobile speech-based interactions that are more likely to culminate in transactions. As of September 2009, Siri continues to operate largely under the radar, but has demonstrated its service at several industry gatherings, most notably Demo 2009 and the first-ever Verizon Developer Conference, feeding broad speculation that Verizon Wireless may have an inside track on rolling out the Siri service.

SpeechCloud
Adela Group LLC
15 Richards Rd
Falmouth, MA 02540
P | 508-495-0000
www.adelavoice.com

Writing speech applications for the iPhone is emerging as a popular cottage industry and Adela, with its Voice Dialer (sold under the SpeechCloud brand), is yet another example of this phenomenon.

SpinVox
Wethered House - Pound Lane
Marlow | Buckinghamshire SL7 2AF
UK
Tel: +44 207 965 2000
www.spinvox.com

Founded in 2005, SpinVox launched with a Voice Message Conversion System (which is now patented), making voicemail-to-text its core product. In this respect its direct competitors are Nuance’s Voicemail2Text service and PhoneTag. The company has been the most aggressive promoter of voice-to-text services, launching a global promotional campaign, opening multiple offices and contracting for high-visibility booths at cellular industry trade shows around the world. As a result it has reportedly burned through over $100 million in venture capital, but has only grown top-line revenues to an estimated $10 million annual clip.

In addition to voicemail-to-text rendering, SpinVox has steadily added new services like “Memo”, which is a note-taking or reminder service, as well as general text entry to originate text messages, update social networks or send email messages to groups of people. In April 2009, the company made an API (application programming interface) available, making it possible to launch a speech-enabled email program called Quick Voice Pro for the iPhone. As of September 2009, SpinVox is available in six languages: English, French, Spanish, German, Portuguese and Italian.

Vlingo 17 Dunster Street Cambridge, MA 02138 www.vlingo.com

Vlingo, which launched in August 2007, developed software that supports voice entry of instructions and content for the services and applications of wireless carriers and wireless application providers. Thus it accurately describes itself as the inventor of the mobile phone "voice user interface." Its flagship software, which runs on Blackberry smartphones, iPhones, Nokia smartphones (Series 60) and Windows Mobile devices, allows users to instantly access services and content on their device. Its services include the ability to send text and email messages, call contacts, search the Web and update status on Facebook or Twitter by speaking into a wireless phone.

It also is the mobile voice user interface provider for the Yahoo! Voice Search service.

VoiceActivation See Sensory

VoiceBox Technologies 11980 NE 24th Street Suite 100 Bellevue, WA 98005 Tel: 425.968.7900 www.voicebox.com

As of September 2009, VoiceBox had raised $21 million and was in the process of raising another $15 million, mostly from its previous investors, a list that included AutoNavi, a digital map database company in China; MiTAC, which acquired the assets of Magellan’s consumer GPS division in 2009; and Inventec, a cellphone and portable navigation manufacturer.

VoiceBox Technologies augments other companies’ core recognition technologies with software protocols and algorithms that support what it calls “Conversational Voice Search”. In that respect, its most direct competition is Novauris, but it will ultimately move into the more general search, dictation and command field, where it will compete with Vlingo, Nuance, Promptu and several others, including Apple.

VoiceBox’s major point of differentiation is its focus on the automotive or “Telematic” marketplace, specifically a long-standing development relationship with Toyota Motor Sales-USA. In 2006, the companies set out “to enable search, navigation and retrieval of information for potential, future use in Toyota vehicles.” The tangible fruits of the partnership are rolling out “in selected 2010 Lexus models” as part of the “Lexus Enform” service that is closely coupled with its “Safety Connect” program and blends VoiceBox’s conversational speech recognition with services from Zagat, XM Radio and ATX.

The company’s concept of “Conversational Search” fits well with the in-vehicle (and hopefully hands-free) strategy. It has invested in software that supports so-called “natural language” input, whereby use of the recognition system does not require training, nor does it ask people to learn a fixed set of commands. The company also claims that its recognizers are more tolerant of the sorts of background noise that are characteristic of moving cars. The same features are viewed as advantages for people using PCs and mobile devices as well as the in-vehicle crowd.

Yap 800 West Hill Street Suite 101 Charlotte, NC, 28208 www.yapme.com

Founded in 2006, Yap offers software and services that use automated speech recognition to perform voice-to-text conversion on mobile phones. In contrast to our concentration on “hybrid” services in this report, Yap stresses its 100% automated handling of speech-to-text conversion. It has concentrated on developing a technology that supports large-scale, highly automated speech-to-text conversion for spoken entry of text messages, search terms and the like.

Its flagship service is Yap9, which allows mobile subscribers to enter text for text messages, email or updates to social networks. Users can also use their spoken words to search Google, Wikipedia, Yahoo and YouTube, or interact with Facebook without using a phone’s keypad.


Yap also offers its services and capabilities on a “white label” basis to support such services as QTech’s ReQall.

Ydilo AVS Camino Cerro de los Gamos 1 Edificio 6 Pozuelo de Alarcón 28224 Madrid España Tel: +34 91 252 84 00 www.ydilo.com

Ydilo provides advanced voice solutions using natural language speech recognition to automate customer care and value-added services in large organizations. It is a privately held company, but it reported top-line revenues for calendar year 2008 of 11.57 million, which represented more than 13% growth over the prior year.

The company’s roots are in mobile customer care, but it has evolved into a diversified provider of hosted voice applications. In January 2008 it launched a subsidiary called Movidilo to concentrate on automated handling of the customer care, value-added and entertainment services offered by mobile carriers.

As of mid-2009, Ydilo operates more than 3,000 ports with natural language recognition, handling over 150 million calls annually.
