Voice Assistants and Smart Speakers in Everyday Life and in Education
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Listen Only When Spoken To: Interpersonal Communication Cues As Smart Speaker Privacy Controls
Proceedings on Privacy Enhancing Technologies ; 2020 (2):251–270 Abraham Mhaidli*, Manikandan Kandadai Venkatesh, Yixin Zou, and Florian Schaub Listen Only When Spoken To: Interpersonal Communication Cues as Smart Speaker Privacy Controls Abstract: Internet of Things and smart home technolo- 1 Introduction gies pose challenges for providing effective privacy con- trols to users, as smart devices lack both traditional Internet of Things (IoT) and smart home devices have screens and input interfaces. We investigate the poten- gained considerable traction in the consumer market [4]. tial for leveraging interpersonal communication cues as Technologies such as smart door locks, smart ther- privacy controls in the IoT context, in particular for mostats, and smart bulbs offer convenience and utility smart speakers. We propose privacy controls based on to users [4]. IoT devices often incorporate numerous sen- two kinds of interpersonal communication cues – gaze sors from microphones to cameras. Though these sensors direction and voice volume level – that only selectively are essential for the functionality of these devices, they activate a smart speaker’s microphone or voice recog- may cause privacy concerns over what data such devices nition when the device is being addressed, in order to and their sensors collect, how the data is processed, and avoid constant listening and speech recognition by the for what purposes the data is used [35, 41, 46, 47, 70]. smart speaker microphones and reduce false device acti- Additionally, these sensors are often in the background vation. We implement these privacy controls in a smart or hidden from sight, continuously collecting informa- speaker prototype and assess their feasibility, usability tion. -
Personification and Ontological Categorization of Smart Speaker-Based Voice Assistants by Older Adults
“Phantom Friend” or “Just a Box with Information”: Personification and Ontological Categorization of Smart Speaker-based Voice Assistants by Older Adults ALISHA PRADHAN, University of Maryland, College Park, USA LEAH FINDLATER, University of Washington, USA AMANDA LAZAR, University of Maryland, College Park, USA As voice-based conversational agents such as Amazon Alexa and Google Assistant move into our homes, researchers have studied the corresponding privacy implications, embeddedness in these complex social environments, and use by specific user groups. Yet it is unknown how users categorize these devices: are they thought of as just another object, like a toaster? As a social companion? Though past work hints to human- like attributes that are ported onto these devices, the anthropomorphization of voice assistants has not been studied in depth. Through a study deploying Amazon Echo Dot Devices in the homes of older adults, we provide a preliminary assessment of how individuals 1) perceive having social interactions with the voice agent, and 2) ontologically categorize the voice assistants. Our discussion contributes to an understanding of how well-developed theories of anthropomorphism apply to voice assistants, such as how the socioemotional context of the user (e.g., loneliness) drives increased anthropomorphism. We conclude with recommendations for designing voice assistants with the ontological category in mind, as well as implications for the design of technologies for social companionship for older adults. CCS Concepts: • Human-centered computing → Ubiquitous and mobile devices; • Human-centered computing → Personal digital assistants KEYWORDS Personification; anthropomorphism; ontology; voice assistants; smart speakers; older adults. ACM Reference format: Alisha Pradhan, Leah Findlater and Amanda Lazar. -
Samrt Speakers Growth at a Discount.Pdf
Technology, Media, and Telecommunications Predictions 2019 Deloitte’s Technology, Media, and Telecommunications (TMT) group brings together one of the world’s largest pools of industry experts—respected for helping companies of all shapes and sizes thrive in a digital world. Deloitte’s TMT specialists can help companies take advantage of the ever- changing industry through a broad array of services designed to meet companies wherever they are, across the value chain and around the globe. Contact the authors for more information or read more on Deloitte.com. Technology, Media, and Telecommunications Predictions 2019 Contents Foreword | 2 Smart speakers: Growth at a discount | 24 1 Technology, Media, and Telecommunications Predictions 2019 Foreword Dear reader, Welcome to Deloitte Global’s Technology, Media, and Telecommunications Predictions for 2019. The theme this year is one of continuity—as evolution rather than stasis. Predictions has been published since 2001. Back in 2009 and 2010, we wrote about the launch of exciting new fourth-generation wireless networks called 4G (aka LTE). A decade later, we’re now making predic- tions about 5G networks that will be launching this year. Not surprisingly, our forecast for the first year of 5G is that it will look a lot like the first year of 4G in terms of units, revenues, and rollout. But while the forecast may look familiar, the high data speeds and low latency 5G provides could spur the evolution of mobility, health care, manufacturing, and nearly every industry that relies on connectivity. In previous reports, we also wrote about 3D printing (aka additive manufacturing). Our tone was posi- tive but cautious, since 3D printing was growing but also a bit overhyped. -
Smart Speakers & Their Impact on Music Consumption
Everybody’s Talkin’ Smart Speakers & their impact on music consumption A special report by Music Ally for the BPI and the Entertainment Retailers Association Contents 02"Forewords 04"Executive Summary 07"Devices Guide 18"Market Data 22"The Impact on Music 34"What Comes Next? Forewords Geoff Taylor, chief executive of the BPI, and Kim Bayley, chief executive of ERA, on the potential of smart speakers for artists 1 and the music industry Forewords Kim Bayley, CEO! Geoff Taylor, CEO! Entertainment Retailers Association BPI and BRIT Awards Music began with the human voice. It is the instrument which virtually Smart speakers are poised to kickstart the next stage of the music all are born with. So how appropriate that the voice is fast emerging as streaming revolution. With fans consuming more than 100 billion the future of entertainment technology. streams of music in 2017 (audio and video), streaming has overtaken CD to become the dominant format in the music mix. The iTunes Store decoupled music buying from the disc; Spotify decoupled music access from ownership: now voice control frees music Smart speakers will undoubtedly give streaming a further boost, from the keyboard. In the process it promises music fans a more fluid attracting more casual listeners into subscription music services, as and personal relationship with the music they love. It also offers a real music is the killer app for these devices. solution to optimising streaming for the automobile. Playlists curated by streaming services are already an essential Naturally there are challenges too. The music industry has struggled to marketing channel for music, and their influence will only increase as deliver the metadata required in a digital music environment. -
Digital Workaholics:A Click Away
GSJ: Volume 9, Issue 5, May 2021 ISSN 2320-9186 1479 GSJ: Volume 9, Issue 5, May 2021, Online: ISSN 2320-9186 www.globalscientificjournal.com DIGITAL WORKAHOLICS:A CLICK AWAY Tripti Srivastava Muskan Chauhan Rohit Pandey Galgotias University Galgotias University Galgotias University [email protected] muskanchauhansmile@g [email protected] m mail.com om ABSTRACT:-In today’s world, Artificial Intelligence (AI) has become an 1. INTRODUCTION integral part of human life. There are many applications of AI such as Chatbot, AI defines as those device that understands network security, complex problem there surroundings and took actions which solving, assistants, and many such. increase there chance to accomplish its Artificial Intelligence is designed to have outcomes. Artifical Intelligence used as “a cognitive intelligence which learns from system’s ability to precisely interpret its experience to take future decisions. A external data, to learn previous such data, virtual assistant is also an example of and to use these learnings to accomplish cognitive intelligence. Virtual assistant distinct outcomes and tasks through supple implies the AI operated program which adaptation.” Artificial Intelligence is the can assist you to reply to your query or developing branch of computer science. virtually do something for you. Currently, Having much more power and ability to the virtual assistant is used for personal develop the various application. AI implies and professional use. Most of the virtual the use of a different algorithm to solve assistant is device-dependent and is bind to various problems. The major application a single user or device. It recognizes only being Optical character recognition, one user. -
Voice Interfaces
VIEW POINT VOICE INTERFACES Abstract A voice-user interface (VUI) makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process. This Point of View explores the reasons behind the rise of voice interface, key challenges enterprises face in voice interface adoption and the solution to these. Are We Ready for Voice Interfaces? Let’s get talking! IO showed the new promise of voice offering integrations with their Voice interfaces. Assistants. Since Apple integration with Siri, voice interfaces has significantly Almost all the big players (Google, Apple, As per industry forecasts, over the next progressed. Echo and Google Home Microsoft) have ‘office productivity’ decade, 8 out of every 10 people in the have demonstrated that we do not need applications that are being adopted by world will own a device (a smartphone or a user interface to talk to computers businesses (Microsoft and their Office some kind of assistant) which will support and have opened-up a new channel for Suite already have a big advantage here, voice based conversations in native communication. Recent demos of voice but things like Google Docs and Keynote language. Just imagine the impact! based personal assistance at Google are sneaking in), they have also started Voice Assistant Market USD~7.8 Billion CAGR ~39% Market Size (USD Billion) 2016 2017 2018 2019 2020 2021 2022 2023 The Sudden Interest in Voice Interfaces Although voice technology/assistants Voice Recognition Accuracy Convenience – speaking vs typing have been around in some shape or form Voice Recognition accuracy continues to Humans can speak 150 words per minute for many years, the relentless growth of improve as we now have the capability to vs the typing speed of 40 words per low-cost computational power—and train the models using neural networks minute. -
Deep Almond: a Deep Learning-Based Virtual Assistant [Language-To-Code Synthesis of Trigger-Action Programs Using Seq2seq Neural Networks]
Deep Almond: A Deep Learning-based Virtual Assistant [Language-to-code synthesis of Trigger-Action programs using Seq2Seq Neural Networks] Giovanni Campagna Rakesh Ramesh Computer Science Department Stanford University Stanford, CA 94305 {gcampagn, rakeshr1}@stanford.edu Abstract Virtual assistants are the cutting edge of end user interaction, thanks to endless set of capabilities across multiple services. The natural language techniques thus need to be evolved to match the level of power and sophistication that users ex- pect from virtual assistants. In this report we investigate an existing deep learning model for semantic parsing, and we apply it to the problem of converting nat- ural language to trigger-action programs for the Almond virtual assistant. We implement a one layer seq2seq model with attention layer, and experiment with grammar constraints and different RNN cells. We take advantage of its existing dataset and we experiment with different ways to extend the training set. Our parser shows mixed results on the different Almond test sets, performing better than the state of the art on synthetic benchmarks by about 10% but poorer on real- istic user data by about 15%. Furthermore, our parser is shown to be extensible to generalization, as well as or better than the current system employed by Almond. 1 Introduction Today, we can ask virtual assistants like Amazon Alexa, Apple’s Siri, Google Now to perform simple tasks like, “What’s the weather”, “Remind me to take pills in the morning”, etc. in natural language. The next evolution of natural language interaction with virtual assistants is in the form of task automation such as “turn on the air conditioner whenever the temperature rises above 30 degrees Celsius”, or “if there is motion on the security camera after 10pm, call Bob”. -
The Role of Higher-Level Linguistic Features in HMM-Based Speech Synthesis
INTERSPEECH 2010 The role of higher-level linguistic features in HMM-based speech synthesis Oliver Watts, Junichi Yamagishi, Simon King Centre for Speech Technology Research, University of Edinburgh, UK [email protected] [email protected] [email protected] Abstract the annotation of a synthesiser’s training data on listeners’ re- action to the speech produced by that synthesiser is still unclear We analyse the contribution of higher-level elements of the lin- to us. In particular, we suspect that features such as pitch ac- guistic specification of a data-driven speech synthesiser to the cent type which might be useful in voice-building if labelled naturalness of the synthetic speech which it generates. The reliably, are under-used in conventional systems because of la- system is trained using various subsets of the full feature-set, belling/prediction errors. For building the systems to be evalu- in which features relating to syntactic category, intonational ated we therefore used data for which hand labelled annotation phrase boundary, pitch accent and boundary tones are selec- of ToBI events is available. This allows us to compare ideal sys- tively removed. Utterances synthesised by the different config- tems built from a corpus where these higher-level features are urations of the system are then compared in a subjective evalu- accurately annotated with more conventional systems that rely ation of their naturalness. exclusively on prediction of these features from text for annota- The work presented forms background analysis for an on- tion of their training data. going set of experiments in performing text-to-speech (TTS) conversion based on shallow features: features that can be triv- 2. -