Digital Transformation Monitor
The rise of Virtual Personal Assistants

January 2018

Internal Market, Industry, Entrepreneurship and SMEs

The improvements in AI technologies have led to the development of new products, notably VPAs, which can now display relatively high accuracy. Internet giants are leading the pack, leveraging their technological expertise through direct R&D investments, start-up acquisitions, and access to their own processing infrastructure. These players have notably popularized VPAs among their customer base and on their devices (smartphones or dedicated speakers), with no clear competitor in sight.

1 VPA usage and technology

Virtual Personal Assistants (VPAs) are programs meant to interact with an end user in a natural way, to answer questions, follow a conversation and accomplish different tasks.

Two kinds of inputs are usually possible for a VPA: a voice interface (such as Apple Siri) or a text interface (Google Assistant). The key point is that end users are supposed to be able to talk to the VPA using their natural language, that is, as they would to another human being, rather than having to use specific sets of commands or a computer language.

Since the launch of Siri by Apple in 2011, the offers of virtual personal assistants have developed rapidly to provide a more generic user interface. This user interface can be accessed from the user device (smartphone or specific device) to perform actions, control objects, answer questions and even make recommendations on its own.

Usage of VPA services and devices

Among users, the popularity of VPAs is overwhelmingly linked to their hands-free capabilities. Indeed, more than 60% state [1] that “usefulness when hands/vision occupied” is one of the main reasons for using their voice. In addition, some users’ responses indicate that using their voice seems to provide a better experience overall:

• 30% indicate that it is a quicker method to search for something
• 24% have difficulties typing on certain devices
• 12% favour VPAs to avoid “confusing menus”.

These figures seem to indicate that VPA usage should not be marginalised as a “fad”, since users’ responses indicate that recognition systems:

• Are useful in cases where traditional input methods are not available
• Provide, for some users, an improved experience, even when these traditional input methods are available.

Artificial Intelligence as the basis of VPAs

The recent development of VPAs can be directly linked to recent progress in AI technologies.

Artificial Intelligence can be defined in several ways. Roughly speaking, it is the ability of a machine to mimic the cognitive behaviour of a human being. This can however mean several things and can thus be decomposed into a subset of characteristics:

Autonomy: The ability for a machine to act on its own, perceiving its environment and taking actions to maximize its chances of success in a defined task. The development of these capacities has given rise to the field of automation and to applications such as the internet of things.

Problem solving: The ability for a machine, given defined inputs, to produce outputs that maximize predefined criteria of success. The development of these capacities is usually considered to be the field of algorithmics.

Machine Learning: The ability to get better over time at a defined task. Or, said differently, the ability for a machine to progress autonomously at problem solving. This is the field that is usually considered as the main topic of Artificial Intelligence nowadays, and indeed most of the excitement around A.I. is focused on Machine Learning.

Figure 1: From Artificial Intelligence to [...]


The recent progress in Machine Learning that has made possible the rise of Virtual Personal Assistants comes from a specific approach to Machine Learning: Neural Networks, and more specifically Deep Learning Neural Networks. Indeed, this new approach benefits the many different tasks which are at the core of natural language processing.

Main tasks needed to develop a VPA

There are many tasks involved in processing natural language for a VPA, which can be grouped into four main categories, described below.

Speech-to-text and text-to-speech

For voice-based input, like that from VPAs such as Siri, the first need consists of converting speech to actionable data. This “speech-to-text” step (also called speech recognition) is of paramount importance: if the input is not correctly recognized, all following steps are useless. Indeed, even an error on one word is very likely to result in an inaccurate answer.

Speech-to-text technologies have been in development for several decades, but the development of AI has enabled important improvements in recent years.

Figure 2: Error rates for [...] systems

Notes: According to [...], the error rate for humans is 5.9%
Source: Microsoft

Syntax and semantic processing

Once a sequence of spoken words is successfully converted to a text form, numerous and very complex tasks remain [2]:

• Syntax analysis (or parsing) is used to analyse and identify the structure of the sentence, based on knowledge of grammar.
• Semantic analysis is used to reach a partial representation of the meaning of the sentence, based on knowledge of the meaning of words.
• Pragmatic analysis is used to reach a final representation of the meaning of the sentence, based on information about the context.

Question Answering

For the vast majority of applications related to natural language, an answer (oral and/or written) is given back after a query from the user.

Using the correctly identified input and its meaning/intention, systems thus have to succeed in finding the correct answer and formulating it. This step deals with information retrieval (using information on the Internet, or in an application) and generating a correct sentence, before the last step of speech synthesis.

Text-to-speech

Text-to-speech (TTS) is the last step in a VPA interaction which uses audio: the text/answer has already been determined, and the synthesis is the only remaining step.

Speech synthesis is not the most complex task, and is already mastered by all involved players.
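The chain of tasks described above (speech-to-text, language analysis, question answering, text-to-speech) can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the "audio" is plain encoded text standing in for a real recording, and keyword matching stands in for the neural models a real VPA would use. It is meant only to show how each stage feeds the next, not how any vendor implements it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interpretation:
    """Result of the syntax, semantic and pragmatic analysis steps."""
    tokens: list[str]        # syntax step (here reduced to tokenisation)
    intent: str              # semantic step: what the user is asking for
    slots: dict[str, str]    # pragmatic step: details resolved from context


def speech_to_text(audio: bytes) -> str:
    # Stand-in for a real speech recogniser: this sketch assumes the
    # "audio" is simply UTF-8 encoded text.
    return audio.decode("utf-8")


def analyse(text: str, context: dict[str, str]) -> Interpretation:
    # Syntax (toy): lower-case, drop final punctuation, split into words.
    tokens = text.lower().rstrip("?.!").split()
    # Semantics (toy): map a keyword to an intent.
    if "weather" in tokens:
        intent = "get_weather"
    elif "time" in tokens:
        intent = "get_time"
    else:
        intent = "unknown"
    # Pragmatics (toy): fill in what the sentence leaves implicit,
    # using information about the context.
    slots = {"city": context.get("city", "your location")}
    return Interpretation(tokens, intent, slots)


def answer_question(interp: Interpretation) -> str:
    # Information retrieval is replaced by canned sentences here.
    if interp.intent == "get_weather":
        return f"Here is the weather for {interp.slots['city']}."
    if interp.intent == "get_time":
        return f"Here is the local time in {interp.slots['city']}."
    return "Sorry, I did not understand the question."


def text_to_speech(text: str) -> bytes:
    # Stand-in for a speech synthesiser: returns the encoded text.
    return text.encode("utf-8")


def handle_request(audio: bytes, context: dict[str, str]) -> bytes:
    # Each stage consumes the output of the previous one.
    text = speech_to_text(audio)
    interp = analyse(text, context)
    reply = answer_question(interp)
    return text_to_speech(reply)
```

With this sketch, `handle_request(b"What is the weather?", {"city": "Brussels"})` returns `b"Here is the weather for Brussels."`. If a single word is misrecognized upstream (for example "whether" instead of "weather"), the later stages fall through to the "did not understand" reply, which illustrates why the speech-to-text step is described as paramount.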

Figure 3: Tech Giants takeover of A.I. start-ups

Source: CB Insights, March 2017


Figure 4: Free integration of a VPA, for a third-party developer (Alexa Voice Service)

Source: Amazon

However, one major possibility of improvement is to succeed in generating human-sounding speech. Indeed, current text-to-speech systems are largely based on concatenative TTS and have a robotic-like sound as a result.

2 GAFAM and tech giants are leading the pack

There are many companies, and start-ups, active in the field of AI technologies. However, as GAFAM and tech giants are investing heavily in all AI-related technologies and products, they seem to have an increasing lead regarding VPAs.

Currently, these companies are racing to improve the accuracy of their VPAs and to add new features, such as:

• voice recognition: the ability for a VPA to recognize the speaker (in particular for households with more than one person)
• follow-up questions: the ability for a VPA to take into account previous questions and answers

Google

One of Google’s most famous public forays into AI technologies is Google Assistant, its VPA available on many Android-based products. More generally, the US giant is investing heavily in many artificial intelligence related topics to improve its products:

• Google launched the Cloud Natural Language API in July 2016, a service enabling third parties to process unstructured data through machine learning.
• In September 2016, Google presented WaveNet, a “deep generative model” [3] aiming at significantly improving text-to-speech systems by sounding more natural. In October 2017, Google announced that this technology was now used for Google Assistant (for English and Japanese speakers).
• Google funded the creation of a research group to study “deep learning” at the Institut des algorithmes des apprentissages de Montréal (MILA) in November 2016.

Apple

Siri is arguably the most well-known VPA, thanks to an earlier launch than its main competitors, wider availability, and native support in many Apple devices, iPhones in particular. Apple has been buying several start-ups in the AI field in order to improve its products:

• Siri was originally acquired by Apple in 2010, for an estimated 100-250 million USD
• In 2016, Apple acquired VocalIQ, for an estimated 50-100 million USD, with the stated goal to improve Siri.

In June 2016, Apple (partially) opened Siri to app developers, following similar moves from other GAFA. As a result, Siri can leverage apps from third-party developers and not just Apple’s services.

Figure 5: Availability of VPA offered by Internet Giants, by device

Source: IDATE DigiWorld


Amazon

Amazon entered the VPA consumer market with the Echo, a smart speaker using “Alexa”, its in-house designed VPA. Since the launch of the Echo in 2014, competitors, notably Google and Apple, have launched similar products or are planning to.

Echo has allowed Amazon to establish a presence in the home. In addition to its multimedia and shopping features, Echo can be used as a 'hub' to control connected objects (lighting, thermostats, power sockets, etc.) from a multitude of manufacturers. For the e-commerce giant, securing a central position in the home could be a strategy to develop its Prime members' base and, as a result, to further increase its revenue.

The group also set up a 100 million USD fund to help start-ups that want to use its speech recognition technology (Alexa). Amazon is giving access to its AI capabilities by allowing third-party developers to make their products "Alexa-enabled":

• Basic compatibility with Alexa (a consumer can control the third-party product from their Echo)
• Full integration (the product is equipped with a speaker and a microphone and acts as an Echo).

Figure 6: Share of households owning multiple VPA-equipped speakers, 2020

Notes: Among households owning at least one VPA-enabled speaker
Source: IDATE DigiWorld, based on forecast data from Gartner

3 Key Stakes

For GAFA, AI is a competitive imperative

GAFA are waging multi-sided “wars”, with the goal of trying to impose their platform, ecosystem, products and services. For instance, Google and Apple are competing on mobile operating systems, a battle that Microsoft has already lost and abandoned.

Processing natural language using technologies based on AI is key for GAFA, as they rely heavily on user input to provide their services. As a result, GAFA have made strong investments in Artificial Intelligence, through acquisitions and partnerships.

The rise of VPAs and smart speakers is arguably the best example of this competition and of the tremendous importance of AI. After Amazon’s success in 2014 with its smart speaker Echo, equipped with a VPA, Google and Apple quickly decided to join the race with their own devices, despite the fact that they had entered the speech recognition field earlier (with Google Now and Siri respectively).

This illustrates the fact that these actors want to retain control of their users and do not want to risk any competitor succeeding alone in developing a new type of UI/product. For GAFA, providing a continuity of experience is key in keeping users “locked in” to their platform.

Vertical industries are starting to integrate VPA products

Speech recognition technologies are being integrated into electronic devices in spectacularly rapid fashion. Initially limited to simple queries, increasingly complex information can now be processed as more and more devices are equipped with an Internet connection and as the field of speech recognition surges forward.

Virtually all consumer electronic (CE) devices can now include speech recognition capabilities:

• “Computers”: smartphones, tablets, computers, smart [...]
• “Home entertainment”: TV sets, set-top boxes, streaming boxes, consoles
• “Smart home”: alarms, speakers, lamps, thermostats, fridges
• Cars

In the vast majority of cases, manufacturers of CE devices with speech recognition capabilities rely directly on the VPA of Amazon, Google or Microsoft. Indeed, the three giants allow third-party developers to fully integrate a VPA (Alexa, Google Assistant and Cortana respectively) into products.

For vertical industries, there is some risk in integrating a VPA, as they lose a significant portion of their control over the user interface. In addition, the company developing the VPA has control over which services are available and which are not: it can redirect users towards its own.

Still more U.I. than A.I.

Artificial intelligence technologies have enabled VPAs to understand and respond, but their promises go beyond being just a new way for end users to interact with their devices (UI). They are seen (and marketed) as intelligent assistants capable not just of understanding but of taking decisions, supporting and potentially replacing humans in several tasks. This vision has yet to fully materialize.

The long-term vision of the development of VPAs is that they will become capable of more and more tasks, being for example able to follow entire professional conversations and seek documents in a company information system related to specific requests.

References

[1] KPCB, Internet Trends 2016, Code Conference, July 2016
[2] Worcester Polytechnic Institute, Natural Language Processing, Prof. Carolina Ruiz
[3] Results (mean opinion scores) available on DeepMind’s website

About the Digital Transformation Monitor

The Digital Transformation Monitor aims to foster the knowledge base on the state of play and evolution of digital transformation in Europe. The site provides a monitoring mechanism to examine key trends in digital transformation. It offers a unique insight into statistics and initiatives to support digital transformation, as well as reports on key industrial and technological opportunities, challenges and policy initiatives related to digital transformation.

Web page:

This report was prepared for the European Commission, Directorate-General Internal Market, Industry, Entrepreneurship and SMEs; Directorate F: Innovation and Advanced Manufacturing; Unit F/3 KETs, Digital Manufacturing and Interoperability by the consortium composed of PwC, CARSA, IDATE and ESN, under the contract Digital Entrepreneurship Monitor (EASME/COSME/2014/004).

Authors: Vincent Bonneau, IDATE and Laurent Probst, Virginie Lefebvre, PwC

DISCLAIMER – The information and views set out in this publication are those of the author(s) and should not be considered as the official opinions or statements of the European Commission. The Commission does not guarantee the accuracy of the data included in this publication. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which might be made of the information contained in this publication. © 2017 – European Union. All rights reserved.