Your Voice Assistant Is Mine: How to Abuse Speakers to Steal Information and Control Your Phone ∗ †

Your Voice Assistant is Mine: How to Abuse Speakers to Steal Information and Control Your Phone ∗ y Wenrui Diao, Xiangyu Liu, Zhe Zhou, and Kehuan Zhang Department of Information Engineering The Chinese University of Hong Kong {dw013, lx012, zz113, khzhang}@ie.cuhk.edu.hk ABSTRACT SMS/Email, access privacy information, transmit sensitive data and «««< .mine Previous research about sensor based attacks on An- achieve remote control without any permission. Moreover, we droid platform focused mainly on accessing or controlling over found a vulnerability of status checking in Google Search app, sensitive components, such as camera, microphone and GPS. These which can be utilized by GVS-Attack to dial arbitrary numbers approaches obtain data from sensors directly and need corre- even when the phone is securely locked with password. »»»> .r67 sponding permissions. ======= Previous research about sensor A prototype of VoicEmployer has been implemented to demon- based attacks on Android platform focused mainly on accessing or strate the feasibility of GVS-Attack. In theory, nearly all Android controlling over sensitive components, such as camera, microphone (4.1+) devices equipped with Google Services Framework are and GPS. These approaches obtain data from sensors directly and vulnerable to the proposed GVS-Attack. We hope that our study need corresponding sensor invoking permissions. »»»> .r67 can raise an alarm on the security risk of passive components (like «««< .mine This paper presents a novel attack that can bypass phone speakers). Although they will not generate any sensitive Android’s permission-based security mechanism. The basic idea information, they could still be exploited for privacy attacks. is that a zero-permission app (VoicEmployer) can play a malicious voice command at background, and that voice command will be Categories and Subject Descriptors picked up and executed by Google Voice Search, the built-in voice D.4.6 [Operating System]: Security and Protection—Invasive assistant module on Android phones, enabling the malicious app to software get lots of sensitive data that are originally not accessible by zero- permission apps. With ingenious design, our GVS-Attack can forge SMS/Email, access privacy information, transmit sensitive data, General Terms and achieve remote control without requesting any permission. Security Moreover, we found a vulnerability of status checking in the official Google Search app, which can be utilized by GVS-Attack to dial arbitrary phone numbers even when the phone is securely locked Keywords with a password. ======= This paper presents a novel approach Android Security; Speaker; Voice Assistant; Permission Bypass- (GVS-Attack) to launch permission bypassing attacks from a zero- ing; Zero Permission Attack permission Android application (VoicEmployer) through the phone speaker. The idea of GVS-Attack is to utilize an Android system 1. INTRODUCTION built-in voice assistant module – Google Voice Search. With In recent years, smartphones are becoming more and more popu- Android Intent mechanism, VoicEmployer can bring Google Voice lar, among which Android OS pushed past 80% market share [32]. Search to foreground, and then plays prepared audio files (like “call One important reason for this popularity is that users can install number 1234 5678”) in the background. Google Voice Search applications (apps for short) as their wishes conveniently. But such can recognize this voice command and perform corresponding conveniences has brought serious security problems, like malicious operations. With ingenious design, our GVS-Attack can forge applications. According to Kaspersky’s annual security report [34], ∗Responsible disclosure: We have reported the vulnerability of Android platform attracted a whopping 98.05% of known malware Google Search app and corresponding attack schemes to Google in 2013. security team on May 16th 2014. Current Android phones are equipped with several kinds of yDemo video can be found on the following website: https:// sensors, such as light sensor, accelerometer, microphone, GPS, sites:google:com/site/demogvs/ etc. They enhance the user experience and can be used to develop creative apps. However, these sensors could be utilized as powerful Permission to make digital or hard copies of all or part of this work for personal or weapons by mobile malware to steal user privacy as well. Taking classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation the microphone as an example, a malicious app can record phone on the first page. Copyrights for components of this work owned by others than the conversations which may contain sensitive business information. author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or With some special design, even credit card and PIN number can be republish, to post on servers or to redistribute to lists, requires prior specific permission extracted from recorded voice [46]. and/or a fee. Request permissions from [email protected]. Nearly all previous sensor based attacks [46, 40, 13, 10, 47, SPSM’14, November 7, 2014, Scottsdale, Arizona, USA. 49, 30] only considered using input type of components to per- Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2955-5/14/11 ...$15.00. form malicious actions, such as accelerometer for device position http://dx.doi.org/10.1145/2666620.2666623. analysis and microphone for conversation recording. These attacks are based on accessing or controlling over sensitive sensors di- The basic idea of GVS-Attack is to utilize the capability of rectly, which means corresponding permissions are required, such Google VoiceSearch. Through Android Intent mechanism, VoicEm- as CAMERA for camera, RECORD_AUDIO for microphone, and ployer can bring Google Voice Search to foreground, and then plays ACCESS_FINE_LOCATION for GPS. As a matter of fact, output an audio file in the background. The content of this audio file could type of permission-free components (e.g. speaker) can also be be a voice command, such as “call number 1234 5678”, which utilized to launch attacks, namely an indirect approach. will be recognized by Google Voice Search and the corresponding Consider the following question: operation would be performed. By exploiting a vulnerability of status checking in Google Search app, zero permission based Q: To a zero-permission Android app that only invokes VoicEmployer can dial arbitrary malicious numbers even when the permission-free sensors, what malicious behaviors can it do? phone is securely locked. With ingenious design, GVS-Attack can In general, zero permission means harmlessness and extremely forge SMS/Email, access privacy information, transmit sensitive limited functionality. However, our research results show that: data and achieve remote control without any permission. What is A: Via the phone speaker, a zero-permission app can make more, context-aware information can be collected and analyzed to phone calls, forge SMS/Email, steal personal schedules, obtain assist GVS-Attack, which makes this attack more practical. user location, transmit sensitive data, etc. It is difficult to identify the malicious behaviors of VoicEmployer through current mainstream Android malware analysis techniques. This paper presents a novel attack approach based on the speaker From the aspect of permission checking [9, 22], VoicEmployer and an Android built-in voice assistant module – Google Voice doesn’t need any permission. From the aspect of app dynamic Search (so we call it GVS-Attack). GVS-Attack can be launched behavior analysis [21, 53], VoicEmployer doesn’t perform ma- by a zero-permission Android malware (called VoicEmployer licious actions directly. The actual executor is Google Voice correspondingly). This attack can achieve varied malicious goals Search, a "trusted" system built-in app module. In addition, the by bypassing several sensitive permissions. voice commands are outputted from the speaker and captured by GVS-Attack utilizes system built-in voice assistant functions. the microphone as input. Such inter-application communication Voice assistant apps are hands-free apps which can accept and channel is beyond the control of Android OS. We tested several execute voice commands from users. In general, typical voice famous anti-virus apps (AVG, McAfee, etc.) to monitor the process commands include dialing (like “call Jack”), querying (like “what’s of GVS-Attack. None of them can detect GVS-Attack and report the weather tomorrow”) and so on. Benefited from the development VoicEmployer as malware. of speech recognition and other natural language processing techniques [16], voice assistant apps can facilitate users’ daily opera- Contributions. We summarize this paper’s contributions here: tions and have become an important selling point of smartphones. Every mainstream smartphone platform has its own built-in voice • New Attack Method and Surface. To the best of our knowl- assistant app, such as Siri [8] for iOS and Cortana [36] for Windows edge, GVS-Attack is the first attack method utilizing the Phone. On Android, this function is provided by Google Voice speaker and voice assistant apps on mobile platforms. Be- Search1 (see Figure1) which has been merged into Google Search sides, this attack can be launched by a totally zero-permission app as a sub-module starting from Android 4.1. Users can start Android malware that

Load more