Justspeak: Enabling Universal Voice Control on Android

Total Page:16

File Type:pdf, Size:1020Kb

Justspeak: Enabling Universal Voice Control on Android JustSpeak: Enabling Universal Voice Control on Android Yu Zhong1, T.V. Raman2, Casey Burkhardt2, Fadi Biadsy2 and Jeffrey P. Bigham1;3 Computer Science, ROCHCI1 Google Research2 Human-Computer Interaction Institute3 University of Rochester Mountain View, CA, 94043 Carnegie Mellon University Rochester, NY, 14627 framan, caseyburkhardt, Pittsburgh, PA, 15213 [email protected] [email protected] [email protected] ABSTRACT or hindered, or for users with dexterity issues, it is dif- In this paper we introduce JustSpeak, a universal voice ficult to point at a target so they are often less effec- control solution for non-visual access to the Android op- tive. For blind and motion-impaired people this issue erating system. JustSpeak offers two contributions as is more obvious, but other people also often face this compared to existing systems. First, it enables system problem, e.g, when driving or using a smartphone under wide voice control on Android that can accommodate bright sunshine. Voice control is an effective and efficient any application. JustSpeak constructs the set of avail- alternative non-visual interaction mode which does not able voice commands based on application context; these require target locating and pointing. Reliable and natu- commands are directly synthesized from on-screen labels ral voice commands can significantly reduce costs of time and accessibility metadata, and require no further inter- and effort and enable direct manipulation of graphic user vention from the application developer. Second, it pro- interface. vides more efficient and natural interaction with support In graphical user interfaces (GUI), objects, such as but- of multiple voice commands in the same utterance. We tons, menus and documents, are presented for users to present the system design of JustSpeak and describe its manipulate in ways that are similar to the way they are utility in various use cases. We then discuss the sys- manipulated in the real work space, only that they are tem level supports required by a service like JustSpeak displayed on the screen as icons. The first and most es- on other platforms. By eliminating the target locating sential step of interaction with GUI is target locating, and pointing tasks, JustSpeak can significantly improve sighted people can intuitively find the target object with experience of graphic interface interaction for blind and a quick visual scan. But without visual access to the motion-impaired users. screen, this simple task is often very hard to finish, which is the main cause of difficulties in non-visual interaction. ACM Classification Keywords Before the invention of screen readers [1, 7, 6], it was es- H.5.2 Information Interfaces and Presentation: User In- sentially impossible to interact with a GUI non-visually. terfaces|Voice I/O; K.4.2 Computers and Society: So- Now that audio feedback has been used to support non- cial Issues|Assistive technologies for persons with dis- visual accessibility of computer and smartphones, many abilities works [2, 4] have been focusing on utilizing voice con- trol to improve the efficiency of eyes-free user input. As General Terms for motion-impaired people, they face challenges in the Human Factors, Design second step, point targeting, as most GUI interaction techniques like mouse and multi-touch require accurate Author Keywords physical movements of users' hands or fingers. Universal voice control, accessibility, Android, mobile The advantage of voice commands over mouse or multi- touch when interacting with a screen non-visually is that INTRODUCTION it does not require targets to be located and thus avoids Mouse and multi-touch surface have been widely used the problems with pointing as described before. This as the main input methods on computers and mobile task often costs the majority of time needed to complete devices for their reliable accuracy. But under circum- interaction with an input device, for example, moving a stances when visual access to the display is impossible cursor onto a button with a mouse. This has been deeply studied since at least the introduction of Fitts's law [11] in the 1960s. On the contrary, when using speech as Permission to make digital or hard copies of all or part of this work for input, users can expect a computing device to automat- personal or classroom use is granted without fee provided that copies are ically execute commands without the hassles of finding not made or distributed for profit or commercial advantage and that copies and pointing on-screen objects. For example, to com- bear this notice and the full citation on the first page. To copy otherwise, to plete a simple task like switching off Bluetooth on a republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. multi-touch smartphone, with speech-enabled interface, W4A ’14, April 7-9, 2014, Seoul, Korea Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00. a user can simply tell the phone to \switch off Bluetooth" non-visual users before the introduction of screen read- instead of going through the time consuming process of ers [12]. Over the last two decades, many screen readers finding and clicking the settings icon, and then the Blue- have been successful in providing accessibility for blind tooth switch. users on multiple platforms. For example, JAWS [7] on Windows personal computers, WebAnywhere [3] for the Unfortunately, existing voice control systems are con- web, VoiceOver [1] on Macintosh computers and iOS, strained in many ways. We have not yet seen a system TalkBack [6] on Android devices, etc. Those softwares which supports universal voice commands on a device identify and interpret what is being displayed on the able to control any application. The majority of them screen and then re-present the interface to users with are either built on application level (e.g. speech input text-to-speech, sound icons or a braille output device. methods) or limited by pre-defined commands set. For With the non-visual feedback of screen readers, blind instance, even the most popular voice assistants like Siri people can explore the user interface, locate and manip- and Google Now [2, 4] can only support built-in func- ulate on-screen objects. tionalities on the device like messaging, calling and web searching, etc. Screen readers largely improved the accessibility of GUI for blind users. However, blind users still need to adapt In this paper, we present JustSpeak, which is designed to traditional input techniques that require target dis- and built with a fundamentally different architecture on covery. Those techniques are designed for visual inter- the system level of Android. JustSpeak aims at enabling action, hence require additional efforts for blind people. universal voice control on Android operating system to For example, on non-touch devices, blind people often help users quickly and naturally control the system non- use keyboard shortcuts to iterate through the interface visually and hands-freely. Users can easily access JustS- linearly (from top to bottom, left to right) in order to peak on their existing Android smartphones and tablets. find an object, and use gestures similarly on touch inter- The application runs in the background as a system faces. service which can be activated with simple gestures, a button or a Near Field Communication (NFC) tag no For sighted people who are not familiar with screen read- matter which application is running in the foreground. ers, there are few works that allow them to interact with When activated, JustSpeak records the speech spoken by their devices non-visually. Mostly they have to wait un- the user. The recorded audio is then transcribed into til visual access to the devices is possible. Although the plain texts and parsed into computer understandable non-visual interaction problem is often merely nuisance commands which are automatically executed by Just- for them, sometimes it can cause great inconvenience, for Speak. Unlike most current systems, JustSpeak can ac- instance, when someone wants to check the map while comodate commands for any application installed on the driving a vehicle. device, for example, clicking buttons, scrolling lists and toggling switches. Moreover, JustSpeak is smarter and Speech Recognition Services more natural than other works in that it supports mul- Speech recognition, also known as speech-to-text or au- tiple commands in one utterance. For instance, to open tomatic speech recognition (ASR), has been studied for the Gmail application then refresh the inbox, two com- more than two decades [14] and recently been used in mands can be combined into once sentence \Open Gmail various commercial products ranging from speech input then refresh". The two main features differ JustSpeak like customer service hotlines, voice IMEs, to automatic from other voice control systems and offer more intuitive virtual agents such as Siri and Google Now which will and efficient eyes-free and hands-free interaction. be discussed in the following section. The contributions of this paper include: Recognition of natural spoken speech is a very complex • a mobile application, JustSpeak, that enables universal problem because vocalizations vary interms of accent, voice control on Android devices; pronunciation, pitch, volume, speed, etc. The perfor- mance of a speech recognition engine is usually evalu- • a description of the system design that empowered ated in terms of accuracy and speed. There are a large accommodation of adaptive commands parsing and number of ASR services, both in academia as research chaining; and projects [9, 13], and in industry as commercial products [8, 10]. • a discussion on leveraging speech input to enhance in- teraction for blind people and people with dexterity In the framework of JustSpeak, we employ Google ASR issues. service [10] to recognize spoken speech of users. Google has been working on speech processing for a long time RELATED WORK and their voice recognizer is widely used in multi- ple products across different platforms, including voice Non-visual Interaction search on mobile phones and webpages, voice IMEs and As described before, currently nearly all computing de- Youtube [5] video closed captioning.
Recommended publications
  • TECHNOLOGY TOOLS of the TRAD E
    TECHNOLOGY TOOLS of the TRAD E element lens, also with profile. The screen is a 5.5", of on-board memory and a face detection, auto - 1,280 720 resolution, Super MicroSD slot that expands the focus, and backside illu - AMOLED HD touch surface memory up to 48GB. An 8- mination. Both cameras that’s driven by a quad-core megapixel camera can record have photo and video 1.6GHz processor. Another video at 1,080p, and there’s a geotagging. Video advantage of the Note’s size is 1.9-megapixel front-facing cam. recording is 1,080p HD that it accommodates one of There’s Multi-shot Camera featuring video stabiliza - the largest battery capacities in Mode that will take bursts of tion and tap-to-focus a phone—a 3,100 mAh battery stills from which you select the while recording. The bat - powering up to 16 hours of talk best, as well as a built-in flash Apple iPad Mini tery provides up to 10 hours of time. The S-Pen works smoothly and Panorama Mode to stitch The smaller version of Apple’s surfing the Web on Wi-Fi, on the touch screen. You draw widescreen images. Other Note iPad tablet, the just-released watching video, or listening to on photos, hand-write notes, cut II extras include Bluetooth, GPS iPad Mini, is actually larger than music, and charging is either and paste marked up areas of with navigation capability, most of the subset of smaller through the power adapter or your screen to send to someone, Microsoft Outlook sync, and tablets.
    [Show full text]
  • Your Voice Assistant Is Mine: How to Abuse Speakers to Steal Information and Control Your Phone ∗ †
    Your Voice Assistant is Mine: How to Abuse Speakers to Steal Information and Control Your Phone ∗ y Wenrui Diao, Xiangyu Liu, Zhe Zhou, and Kehuan Zhang Department of Information Engineering The Chinese University of Hong Kong {dw013, lx012, zz113, khzhang}@ie.cuhk.edu.hk ABSTRACT General Terms Previous research about sensor based attacks on Android platform Security focused mainly on accessing or controlling over sensitive compo- nents, such as camera, microphone and GPS. These approaches Keywords obtain data from sensors directly and need corresponding sensor invoking permissions. Android Security; Speaker; Voice Assistant; Permission Bypass- This paper presents a novel approach (GVS-Attack) to launch ing; Zero Permission Attack permission bypassing attacks from a zero-permission Android application (VoicEmployer) through the phone speaker. The idea of 1. INTRODUCTION GVS-Attack is to utilize an Android system built-in voice assistant In recent years, smartphones are becoming more and more popu- module – Google Voice Search. With Android Intent mechanism, lar, among which Android OS pushed past 80% market share [32]. VoicEmployer can bring Google Voice Search to foreground, and One attraction of smartphones is that users can install applications then plays prepared audio files (like “call number 1234 5678”) in (apps for short) as their wishes conveniently. But this convenience the background. Google Voice Search can recognize this voice also brings serious problems of malicious application, which have command and perform corresponding operations. With ingenious been noticed by both academic and industry fields. According to design, our GVS-Attack can forge SMS/Email, access privacy Kaspersky’s annual security report [34], Android platform attracted information, transmit sensitive data and achieve remote control a whopping 98.05% of known malware in 2013.
    [Show full text]
  • User Guide. Guía Del Usuario
    MFL70370701 (1.0) ME (1.0) MFL70370701 Guía del usuario. User guide. User User guide. User This booklet is made from 98% post-consumer recycled paper. This booklet is printed with soy ink. Printed in Mexico Copyright©2018 LG Electronics, Inc. All rights reserved. LG and the LG logo are registered trademarks of LG Corp. ZONE is a registered trademark of LG Electronics, Inc. All other trademarks are the property of their respective owners. Important Customer Information 1 Before you begin using your new phone Included in the box with your phone are separate information leaflets. These leaflets provide you with important information regarding your new device. Please read all of the information provided. This information will help you to get the most out of your phone, reduce the risk of injury, avoid damage to your device, and make you aware of legal regulations regarding the use of this device. It’s important to review the Product Safety and Warranty Information guide before you begin using your new phone. Please follow all of the product safety and operating instructions and retain them for future reference. Observe all warnings to reduce the risk of injury, damage, and legal liabilities. 2 Table of Contents Important Customer Information...............................................1 Table of Contents .......................................................................2 Feature Highlight ........................................................................4 Easy Home Screen ..............................................................................................
    [Show full text]
  • Project Plan
    INTELLIGENT VOICE ASSISTANT Bachelor Thesis Spring 2012 School of Health and Society Department Computer Science Computer Software Development Intelligent Voice Assistant Writer Shen Hui Song Qunying Instructor Andreas Nilsson Examiner Christian Andersson INTELLIGENT VOICE ASSISTANT School of Health and Society Department Computer Science Kristianstad University SE-291 88 Kristianstad Sweden Author, Program and Year: Song Qunying, Bachelor in Computer Software Development 2012 Shen Hui, Bachelor in Computer Software Development 2012 Instructor: Andreas Nilsson, School of Health and Society, HKr Examination: This graduation work on 15 higher education credits is a part of the requirements for a Degree of Bachelor in Computer Software Development (as specified in the English translation) Title: Intelligent Voice Assistant Abstract: This project includes an implementation of an intelligent voice recognition assistant for Android where functionality on current existing applications on other platforms is compared. Until this day, there has not been any good alternative for Android, so this project aims to implement a voice assistant for the Android platform while describing the difficulties and challenges that lies in this task. Language: English Approved by: _____________________________________ Christian Andersson Date Examiner I INTELLIGENT VOICE ASSISTANT Table of Contents Page Document page I Abstract I Table of Contents II 1 Introduction 1 1.1 Context 1 1.2 Aim and Purpose 2 1.3 Method and Resources 3 1.4 Project Work Organization 7
    [Show full text]
  • Google Voice and Google Spreadsheets
    Google Voice And Google Spreadsheets Diphthongic Salomo jump-off some reprehensibility and incommode his haematosis so maladroitly! Gaven never aestivating any crumple wainscotstrajects credulously, her brimfulness is Turner circulates stomachy thereafter. and metacarpal enough? Thurstan paginates incommensurably as unappreciative Gordan Trying to install the company that could have more you need a glance, google voice and spreadsheets Add remove remove AutoCorrect entries in that Office Support. Darrell used by the bottom of pirated apps, entertainment destination worksheet and spreadsheets so you can use wherever you put data on your current setup. How to speech-to-text in Google Docs TechRepublic. Crowdsourcing market for voice! Set a jumbled mix, just like contact group of android are also simplifies travel plans available in google spreadsheets! Each day after individual length. Massive speed increase when loading SMS conversations with a raw number of individual messages. There are google voice and google spreadsheets. 572 Google Voice jobs in United States 117 new LinkedIn. Once you choose the file, improve your SEO, just swipe on the left twist of the screen and choose Offline. Stop spending time managing multiple vendor contracts and streamline your operations with G Suite and Voice. There are other alternative software that can also dump raw XML data. All that you can do is hope that you get lucky. Modify spreadsheets and ever to-do lists by using Google GOOG. Voice Typing in Google Docs. Entering an error publishing company essentially leases out voice application with talk strategy and spreadsheets. Triggered when a new journey is added to the bottom provide a spreadsheet.
    [Show full text]
  • Google Search by Voice: a Case Study
    Google Search by Voice: A case study Johan Schalkwyk, Doug Beeferman, Fran¸coiseBeaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garret, Brian Strope Google, Inc. 1600 Amphiteatre Pkwy Mountain View, CA 94043, USA 1 Introduction Using our voice to access information has been part of science fiction ever since the days of Captain Kirk talking to the Star Trek computer. Today, with powerful smartphones and cloud based computing, science fiction is becoming reality. In this chapter we give an overview of Google Search by Voice and our efforts to make speech input on mobile devices truly ubiqui- tous. The explosion in recent years of mobile devices, especially web-enabled smartphones, has resulted in new user expectations and needs. Some of these new expectations are about the nature of the services - e.g., new types of up-to-the-minute information ("where's the closest parking spot?") or communications (e.g., "update my facebook status to 'seeking chocolate'"). There is also the growing expectation of ubiquitous availability. Users in- creasingly expect to have constant access to the information and services of the web. Given the nature of delivery devices (e.g., fit in your pocket or in your ear) and the increased range of usage scenarios (while driving, biking, walking down the street), speech technology has taken on new importance in accommodating user needs for ubiquitous mobile access - any time, any place, any usage scenario, as part of any type of activity. A goal at Google is to make spoken access ubiquitously available. We would like to let the user choose - they should be able to take it for granted that spoken interaction is always an option.
    [Show full text]
  • Google Voice / Text
    Fall 2020 Communication Tips ​ ​ “Communicating with parents and guardians is one of the most important things we do, and it may be even more important now as more schools turn to remote instruction.” ​Frommert, C. (2020, July 21).​ A Strategy for Building Productive Relationships With Parents. Retrieved August 10, 2020, from https://www.edutopia.org/article/strategy-building-productive-relationships-parents?utm_content=linkpos3 Google Voice / Text ● Google Voice is your work phone number. ● It is accessible from your computer or cell phone. ● Teachers and staff that utilized Google Voice this spring and summer reported a dramatic increase in parent interaction and appreciation. ● Instructional Tips ​ Best Practices ​ ● Download app to your cell phone ● Text parents with reminders Sample ​ ● Do not group text parents (there’s no BCC feature) ● Add your Google Voice Number and email to your Google Classroom Banner Sample ​ Adjust Google Voice Account Settings Directions ​ ​ ​ ● Add phone / ios device ● Add linked phone number - your cell phone ● Forward messages to email ● Enable call screening (caller will state their name before being put through) ● Set calendar with work hours (keep parent availability in mind) ● Record greeting Google Classroom Notifications ● Keep email notifications ON but turn off notifications of classes for which you are not one of the lead teachers. Do not turn all notifications off. ● Reach out to parents to make sure that they have set up GC guardian summaries. Parents can choose to receive notifications daily or weekly. ● REMINDER: Teachers must turn on guardian summaries for each ​ class using the settings gear. Email ● Add your Google Voice number to your email signature.
    [Show full text]
  • Protecting Your Privacy If You Are Instructed to Use
    PROTECTING YOUR PRIVACY IF YOU ARE INSTRUCTED TO USE GOOGLE VOICE TO CALL STUDENTS Some districts are requiring teachers to call students by phone during this time of distance learning. Because districts do not want you give away your personal phone number, they are suggesting the Google Voice app, which assigns an anonymous phone number through the app. As a public employee, all of your work-related communications are subject to public records laws— including all communications associated with these calls. Your district, as your employer, may request to see your work-related communications. If you have any sort of Google account (Gmail, web browsing, Google Maps, etc.), Google Voice will attach to your Google account. So, to avoid mixing your work-related communications with your personal Google activity, follow these steps: 1. Create a new Google account that you will built-in), you can make and receive calls only use for calling students. Use a from https://voice.google.com. professional username that identifies you as a teacher at your school. For example: 5. Only use this account to call students LASTNAME.SCHOOLINITIALS and their parents/guardians. All written communication (e-mails, notifications, 2. Use a password for this account that is not reminders, etc.) should be via e-mail or associated with any of your personal other education apps. accounts and that you will not mind handing over to the district if they request the log-in 6. Never text students from your personal information. cell phone or any other social media app. The DOE finds this highly inappropriate and 3.
    [Show full text]
  • How to Set up a Google Voice Phone Number
    How to Set Up a Google Voice Phone Number If you need assistance in setting this number up, please contact Foster VC Kids at 654- 3220! These instructions can also be found at: http://www.wikihow.com/Get-a-Google- Voice-Phone-Number. Please note you will need a Smart Phone (iPhone or Android) to utilize Google Voice. 1. First, you need to have a Google account. If you do not already have one, please visit https://accounts.google.com/newaccount and complete all required information. 2. Go to www.Google.com/voice and sign into your Google account. All of Google's products are now unified, so you will use the same username and password you use for Gmail. 3. Wait for the prompt on the to www.Google.com/voice site that says "Set up your Google Voice Number." If the prompt does not appear, click on the link that says "Get a Google Voice Number" on the left hand side of the page. 4. Click on the "I want a new number" button in the first set up prompt. You have the option to set up the Google Voice account with your mobile phone number. That will keep you from being able to use some of Google Voice's features. You can always use your Google Voice number in conjunction with your mobile phone number. 5. Enter your zip code or area code to find an available local number. Click "Next." If no phone numbers are available, enter a nearby zip code. Some large metropolitan areas do not have local phone numbers available.
    [Show full text]
  • Online Voice Based Multipurpose Surveillance Robot for Defence Using Artificial Intelligence
    Turkish Journal of Computer and Mathematics Education Vol.12 No.7 (2021), 784-794 Research Article Online Voice Based Multipurpose Surveillance Robot for Defence Using Artificial Intelligence a b c d e Dr. C. Shyamala , S. RahmathNisha , B. Aswini , V. Bakkiya Sri , and S. Dheyvanai a Associate Professor, K.Ramakrishnan College of Technology. bAssistant Professor, K.Ramakrishnan College of Technology. c,d,eUG Students, K.Ramakrishnan College of Technology. Article History: Received: 10 January 2021; Revised: 12 February 2021; Accepted: 27 March 2021; Published online: 16 April 2021 _____________________________________________________________________________________________________ Abstract: This project describes the design and construction of a multifunctional field monitoring robot for the detection of landmines, the detection of smoke and temperatures as well as the surveillance of gas quality in battling areas without providing any severe manual obstacles. Prediction of metals can be performed by land cognitive sensors, deadly gas attacks can occur and the robot can be remotely controlledvia an IOS device. The robot picks up sensor information and interfaces between the control unit and the robot with a microcontroller. The robot moves and lift everywhere on the basis of the input data from Android application. In this contemporary project is characterized by the integrated approach of Android's telephone and multiple IOT cloud services. All documentation on robot sensing devices is delivered to cloud servers and expressed on all devices. This allows the robot to monitor both usages, prefer military warfare and mining areas. This is a new attempt to integrate IOT and field robots into a massive design form. Specific design enhancements were an excellent choice for use and use in dangerous spots riddled with minefields and other hazardous pieces of metal.
    [Show full text]
  • Talkatone Textnow Google Voice Talking Points (Talkingpts.Org)
    Apps to communicate with families and keep number private There are a variety of apps out there that are useful when communicating with families. These platforms have changed over the years, but the four current standouts that are all free, offer a hidden number or app interface ​ ​ that hides your number, and offer different combinations of interfaces (web-based; mobile). I have listed them below with details on important features and links to ways to access and use the app. An emphasis was also placed on finding apps that make calls over WiFi and do not use minutes from your cellular plan. ​ ​ Talkatone ● Free Phone Number ● Free Texting ● WiFi / Cell Data calling (uses WiFi when connected) ● No fees (may charge for international calls) ● Mobile App only ​ iOS:http://itunes.apple.com/app/id397648381 ​ Android: https://play.google.com/store/apps/details?id=com.talkatone.android ​ TextNow ● Free Phone Number ● Free Texting ● WiFi / Cell Data calling (uses WiFi when connected) ● Charges for international calls ● Mobile App and Web Interface Website: https://www.textnow.com/ ​ iOS: https://app.adjust.com/11esyz?campaign=FC_LANDING_PAGE ​ Android: https://app.adjust.com/2xmk36?campaign=FC_LANDING_PAGE ​ Google Voice ● Free Phone Number (must use a personal google account - businesses need to buy licenses to provide Voice to employees) ● Free Texting ● WiFi / Cell Data calling (uses WiFi when connected) ● No info on international calls ● Mobile App and Web Interface Website: http://voice.google.com ​ iOS: https://apps.apple.com/us/app/google-voice/id318698524
    [Show full text]
  • AT&T Motivate™ User Guide
    AT&T Motivate™ User Guide Contents Getting started . ... ......... ........................ 9 Introduction . ... ......................... 10 About the user guide ................................................... .10 Set up your phone . ... ....... ........... ........... 11 Parts and functions ..................................................... 11 Battery use ............................................................ .13 Install a SIM/SD Card ................................................... .15 Turn your phone on and off .............................................. .19 Complete the setup screens ............................................. .19 Use the touch screen ................................................... .20 Basic operations . ... ....... ........................ 21 Home screen and Apps list .............................................. .22 Phone settings menu ................................................... .25 Portrait and landscape screen orientation ................................. .26 Capture screenshots ................................................... .27 Applications .......................................................... .28 Phone number ........................................................ .34 Airplane mode ........................................................ .35 Enter text ............................................................. .36 Google account ....................................................... .39 Lock and unlock your screen ............................................ .42
    [Show full text]