Deaf-Aid: Mobile Iot Communication Exploiting Stealthy Speaker-To-Gyroscope Channel

Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel

Ming Gao Feng Lin∗ Weiye Xu Zhejiang University Zhejiang University Muertikepu Nuermaimaiti [email protected] [email protected] [email protected] [email protected] Zhejiang University Jinsong Han Wenyao Xu Kui Ren Zhejiang University SUNY Buffalo Zhejiang University [email protected] [email protected] [email protected]

ABSTRACT ACM Reference Format: Internet of Things (IoT) devices are hindered from communicating Ming Gao, Feng Lin, Weiye Xu, Muertikepu Nuermaimaiti, Jinsong Han, with their neighbors by incompatible protocols or electromagnetic Wenyao Xu, and Kui Ren. 2020. Deaf-Aid: Mobile IoT Communication Ex- ploiting Stealthy Speaker-to-Gyroscope Channel. In The 26th Annual Inter- interference. Existing solutions adopting physical covert channels national Conference on Mobile Computing and Networking (MobiCom ’20), have limitations in receiver distinction, additional hardware, condi- September 21–25, 2020, London, United Kingdom. ACM, New York, NY, USA, tional placement, or physical contact. Our system, Deaf-Aid, utilizes 13 pages. https://doi.org/10.1145/3372224.3419210 the stealthy speaker-to-gyroscope channel to build robust protocol- independent communication with automatic receiver identification. Deaf-Aid exploits ultrasonic signals at a frequency corresponding to the target receiver, forcing the gyroscope inside to resonate, so 1 INTRODUCTION as to convey information. We probe the relationship among axes in a gyroscope to surmount frequency offset ingeniously and support Internet of Things (IoT) has attracted increasing attention in recent multi-channel communication. Meanwhile, Deaf-Aid identifies the years. It connects various electronic appliances intelligently for receivers automatically via device fingerprints constituted by the people’s convenience. Analysts predict that the expenditure on the diversity of resonant frequency ranges. Furthermore, we entitle deployment of IoT will continue to maintain good momentum, ris- Deaf-Aid the capability of mobile communication which is an essen- ing to $726.5 billion worldwide annually [24]. Artificial intelligence tial demand for IoT devices. We address the challenge of accurate and 5G communication technology also help to combine various signals recovery from motion interference. Extensive evaluations devices, aiming at building a comprehensive IoT network. demonstrate that Deaf-Aid yields 47bps with BER lower than 1% However, creating such an everything-related IoT network in- under motion interference. To our best knowledge, Deaf-Aid is the volves abundant obstacles. Various incompatible communication first work to enable stealthy mobile IoT communication on thebasis standards have aggravated the problem during information ex- of inertial motion sensors. change via IoT devices. Requirements in different scenarios encour- age various protocols, while devices usually support only one or a CCS CONCEPTS couple of them. Wi-Fi [27] and Bluetooth [62] are widely used in mobile communication. ZigBee [52] and MQTT [22] are suitable for • Hardware → Sensor applications and deployments. the transmission of small-streamed data, especially under resource constraints. Furthermore, there are EnOcean [38], 6LowPan [59] KEYWORDS in the field of smart home and AMQP[49], COAP [7] in industrial gyroscopes, covert channel, mobile IoT communication IoT. To make matters worse, manufacturers develop their own protocols, building distinctive systems to attract consumers. These aforementioned methods rely only on the electromagnetic wave ∗Feng Lin is the corresponding author. and would fail upon the electromagnetic interference and shielding, as demonstrated in Fig. 1. To address these problems above, researchers take advantage of Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed physical characteristics to build a covert channel between nodes for profit or commercial advantage and that copies bear this notice and the full citation that are physically and logically separated [28, 56, 57], so that de- on the first page. Copyrights for components of this work owned by others than ACM vices can communicate regardless of the protocols. Nevertheless, must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a these systems are confronted with several hindrances, such as ad- fee. Request permissions from [email protected]. ditional hardware, confined placement, or physical contact. For MobiCom ’20, September 21–25, 2020, London, United Kingdom instance, Ripple [40, 41] demands specialized vibration motors and © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7085-1/20/09...$15.00 physical contact; BitWhisper [17] can only be applied between two https://doi.org/10.1145/3372224.3419210 desktop PCs in fixed position. Moreover, they are dependent on

705 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al.

identification. It provides an alternative and complementary communication channel to current IoT devices. We model the resonant output of gyroscopes, analyze the frequency offset, and exploit the inter-axial relation for correction. The compositions of Deaf-Aid are elaborated, including receiver identification, encoding, denoising, and threshold. It supports simultaneous communication on double channels, even from two transmitters. Movement influence is taken into account and described exhaustively. Multiple technologies are employed to adjust our system to a mobile IoT network. We evaluate our system on three kinds of speakers, 32 gyroscopes chips of four models, and three kinds of phones, and exert a variety of Figure 1: An application scenario for Deaf-Aid in the case movement on it for robustness test. To better evaluate our system, of espionage. Spy A divulges privacy secretly to Spy B, who we perform a comprehensive evaluation with 22 participants to receives it stealthily via various IoT devices under the elec- validate the effectiveness under real-world scenarios. tromagnetic shield. The contribution of Deaf-Aid can be summarized as follows: • We investigate the possibility of communicating through a gyroscope and elaborate a stealthy channel without the manual receiver identification, which is impractical in a compre- restriction of peripheral, contact, fixed placement, and espe- hensive and mobilizable IoT network. More feasible and robust cially the manual receiver identification. communication between IoT devices is needed urgently [48]. • We comprehensively analyze the relationship among axes We turn attention to the channel of speaker-to-gyroscope. It has in a gyroscope under resonance, which has not been studied been interpreted that micro-electro-mechanical systems (MEMS) before in existing literature. Accordingly, noise is removed inertial sensors are vulnerable to the ultrasonic injection [42, 47, and motion interference is suppressed even when drift brings 48, 51]. Choreographed ultrasound can couple to the stationary about frequency offset. MEMS gyroscopes and make them produce low-frequency angular • We develop a robust communicating system for a mobiliz- rate readings [10, 11]. However, little attention was drawn on the able IoT network. In particular, we take the initiative in potential benefits of its susceptibility. Inspired by it, we explore excavating the potential of inertial sensors reutilized for the gyroscopes resonance from a communication perspective. Despite robustness to movement. the limitation of protocols, a robust system is proposed for bridging a stable transmission in an IoT network, transmitting via speakers, 2 BACKGROUND and decoding them through gyroscopes. The channel frequency 2.1 Gyroscope Structure is selected according to the receivers as each gyroscope has its own unique resonant frequency range. Such a non-contact speaker- MEMS gyroscopes are implemented with Coriolis force [8]. It is to-gyroscope channel in IoT communication is feasible because usually equipped with movable parts in two orthogonal directions ultrasonic signals can be obtained through commodity speakers in to generate Coriolis force. As shown in Fig. 2, in the driving di- phones or voice assistants without any peripherals, and gyroscopes rection, driving springs add a sinusoidal voltage to force the mass have become an indispensable part in intelligent devices [53], in- to oscillate at its natural frequency. The Coriolis sensing fingers cluding smartphones, VR sets, vehicles, wearable devices, remote move owing to the transverse Coriolis motion. In sensing direction, control devices and the like. Coriolis acceleration leads to capacitance change. This acceleration Robust communication among mobilizable devices is significant. 푎푥 is similar to angular rate 휔, according to Movement introduces noise, masking characteristic signals, espe- 푎푥 = −2휔푦,¤ (1) cially on the gyroscope-based system. Robustness to motion is cer- where 푦¤ is the linear velocity in driving direction. It converts the tainly a key issue in inertial sensors reutilization [9, 29, 35, 50, 58]. angular rate into the displacement in sensing direction. Moreover, unpredictable frequency offset confuses the frequency Meanwhile, only one module cannot distinguish between transla- characteristics, hindering accurate signals recovery via spectrum tion and rotation. Consequently, two identical structures are placed analysis. In this case, the gyroscope-based systems are unable to work stably and precisely in a dynamic environment. For robust gyroscope-based communication in these circumstances, our system needs to specifically address several practical challenges: (1) Capability: How to leverage gyroscopes to build a protocol-independent channel of high quality with precise receiver identification. (2) Mobility: How to accurately recover the signals in a mobile communication scene. (3) Drift: How to deal with the frequency offset caused by drift to ensure communication stability. To this end, we present a convenient and robust system that exchanges data over the air, namely Deaf-Aid, free from restric- tions including peripherals, fixed positions, and artificial receiver Figure 2: Concept of MEMS gyroscope structure.

706 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

Figure 3: The characteristics of the forced vibration. Figure 4: Popular gyroscopes have a narrow resonant frequency range above human audibility. abreast, linked to a differential amplifier, and then the gyroscope from environmental noise. Conversely, the transmission will not reading is finally obtained after the processes of amplifier, filter, affect the normal operations of its surrounding devices. and analog-to-digital conversion. 4 MODEL 2.2 Resonance Principle In this section, we utilize an axis as an example to develop a physics- based mathematical model to quantitatively analyze the resonant The structure of MEMS gyroscopes is a kind of single-degree-of- outputs of a gyroscope. freedom system with a high damping ratio 휉 typically. Damping is ignored at low frequency, and gyroscopes keep linear outputs. As 4.1 Oscillation frequency increases, damping gradually becomes dominant, and oscillation occurs, with the characteristics of the forced vibration Acoustic inputs of specific frequency bring harmonic excitation on [46] illustrated in Fig. 3. They indicate that the gain coefficient a gyroscope. Sound waves produce forces of the same frequency on a single-degree-of-freedom system. Hence, the force imposed 푏 reaches the peak at the natural frequency 푓푁 , where phase Φ changes dramatically. As a result, gyroscopes respond to acoustic on one axis can be described by injection. For the sake of accurate measurement, this architecture 퐹 (푡) = 퐴 · 푠푖푛(2휋 푓0푡 + 휙0), (2) is designed to share the same natural frequency with resonating where 퐴 is the magnitude decided by intensity and position of mass. However, inevitable errors bring about natural frequency the sound source, 푓0 is the frequency of the sound source, and 휙0 alternation in batch production. This implies the inter-individual indicates the initial phase. In a single-degree-of-freedom system discrepancy in the natural frequency among gyroscopes. [46], the resulting oscillation is 푅0 (푡) = 푏퐴 · 푠푖푛(2휋 푓0푡 + 휙0 + 휙1), (3) 3 FEASIBILITY INVESTIGATION where the gain coefficient 푏 and phase 휙1, introduced by resonance, / On the basis of characteristics of the ultrasonic wave, we demon- are determined by the frequency ratio 푓0 푓푁 . strate and evaluate the feasibility and stealth of the speaker-to- gyroscope channel in respect of noiselessness, availability, and 4.2 Digitization inaudibility. We tested eight models of chips for four each, whose Typical MEMS architecture in a gyroscope comprises three parts: resonant frequency bands are displayed in Fig. 4. amplifier, filter, and analog-to-digital conversion. Inaudibility. It has been proved that the resonant frequency of Amplifier and Filter: Two identical structures are designed gyroscopes tends to exceed 18kHz in Fig. 4. It is scarcely perceptible in typical MEMS gyroscopes to obtain rotation via a differential to human hearing [10, 11, 42, 51] and always ignored by the speech amplifier, and then a low-pass filter (LPF) is aimed at removing noise. recognition system, whose sampling rate is below 16kHz. Although An ideal LPF can completely remove the high-frequency noise an accelerometer also resonate with acoustic injection [6], it is beyond the cut-off frequency. However, filters are less effective discarded for its audible resonant frequency, within 10kHz usually. at handling noise whose frequency is much higher than the cut- No peripheral. According to the Nyquist sampling theorem, off frequency in gyroscopes. Even so, it may introduce amplitude commercial speakers can induce sound within 24kHz with the alteration besides slight frequency changes and phase shifts. In 48kHz sampling rate. Hi-Fi speakers [54] and contemporary mo- general, the analog signal in gyroscopes follows this formula, ′ biles [14] perform better. For example, Samsung Galaxy S8 is manu- 푅(푡) = 푏퐿퐴 · 푠푖푛(2휋 푓0 푡 + Φ), (4) factured with a sound card up to 32-bit/384kHz [15]. A smartphone where 퐿 is the influence of filter and depends on the output-data- ′ or a commodity speaker can cover the frequency band of most rate (ODR) and insensitive to the variations of 푓0 experimentally; 푓0 ′ popular gyroscopes and apply Deaf-Aid without any peripherals. is the frequency under the influence of filter and Φ = 휙0 +휙1 +휙 is Little environment interference. Common application sce- the overall phase shift while 휙 ′ is introduced by amplifier and filter. narios of ultrasound prefer frequency bands above 40kHz, such as Since 퐿 and 휙 ′ rely on the structural parameters in gyroscopes, they cleaning, medical examination, and treatment. The resonant fre- can be regarded as constants in a given gyroscope. quencies of gyroscopes are almost in the band between 18kHz and Analog-to-digital Conversion: The frequency of acoustic in- 40kHz, where few devices work. Therefore, Deaf-Aid is shielded put is usually over 18kHz, much higher than the sampling rate in

707 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al.

Figure 5: Deaf-Aid, a speaker-to-gyroscope channel for mobile IoT communication, where the transmitter is realized by a smartphone or a commercial speaker and the receiver can be any IoT device equipped with a gyroscope.

푘 hundreds. This leads to aliasing, where the sampled signal fails 푅[푘] = 푏퐿퐴 · 푠푖푛(2휋 푓2 + Φ), (8) to maintain the original spectrum characteristics. Assuming the 퐹푠 + Δ퐹푠 sampling rate is 퐹푠, the sampled signal can be expressed as 퐹푠 퐹푠 where 푓2 = 푓1 − 푛 × Δ퐹푠 is the frequency of gyroscope readings 푓 ′ = 푛 × 퐹푠 + 푓 , (− < 푓 < ) (5) 0 1 2 1 2 under drift. Since 푓0 is usually hundreds of times more than 퐹푠, slight fluctuations in the sampling rate may initiate a remarkable 푘 푅[푘] = 푏퐿퐴 · 푠푖푛(2휋 푓 + Φ). (6) frequency offset, as indicated in Fig. 6. 1 퐹푠 In conclusion, we expound the formation of the angular rate readings of gyroscopes under acoustic injection when the gyro- 5.2 Inter-axial Characteristics scope is stationary. Vividly, the final readings are dependent on We demonstrate the offset-independent characteristics and correct the input frequency and sample rate, where aliasing brings about the frequency offset, then further suppress motion influence onthe low-frequency readings. basis of these interrelationships. Previous studies focused on the resonant data on only one axis and neglected the relation among 5 FREQUENCY OFFSET CORRECTION multiple axes. We thoroughly investigate these inherent inter-axial Sample rate drift occurs casually and generates unpredictable fre- characteristics. quency offset, making it difficult to separate signals from noise Frequency synchronization. The oscillation of each axis in a and motion inference via spectrum analysis. In this section, we gyroscope coincides. They originate from the same ultrasonic input, elaborately analyze this process and exploit the relationship among undergo the identical digitization process, and share the same re- axes for offset correction. sponse frequency correspondingly. The sampling rate shift, if any, is destined to happen at the same time, and accordingly, the frequency 5.1 Sample Rate Drift offset occurs simultaneously. Fig. 6 provides an illustration. Fixed phase difference. Because of the synchronous resonance A serious weakness of sampling rate drift is that it leads to obvious and the identical digitization process, the phase difference is only but unpredictable deviations of output frequency [48], making the introduced in sensing and resonance stages. We experimentally outputs unstable. This is an issue that remains to be resolved, es- discover that each axis oscillates at the same frequency with a fixed pecially in the mobile communication system. We assume Δ퐹푠 as phase difference throughout. The scatter plot in Fig. 7 exemplifies the sample rate drift and substitute it into Equ. 5 and Equ. 6. The the relationship between every two variables and supports this output frequency alters as ′ perspective. The curve fit an ellipse, reflecting these variables follow 푓0 = 푛 × (퐹푠 + Δ퐹푠) + 푓2, (7) fixed phase difference (b). The phase characteristics of the forced vibration in Fig. (b) account for this difference. Axes differ in the natural frequency owing to production, indicating that the peak point 푓푁 is different. When subjected to the same frequency of vibration, the ratios 푓 /푓푁 in multiple axes are unequal, bringing about a difference in amplitude coefficient 푏 and phase 휙1. Notably, the amplitude coefficient and phase difference keep invariable when input frequency does not change, even disturbed by motion.

5.3 Offset-independent Correction Considering the existence of synchronous frequency and fixed phase difference, we employ a multiplier and a mean filter for Figure 6: A sample of fre- Figure 7: A scatter plot and correction. We multiply data in any two axes and obtain a result quency offset. its fitting curves. composed of constant bias and a second harmonic component.

708 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

Taking data from X- and Y-axes for example, 푆푐표푟 [푘] = 푅푥 [푘] × 푅푦 [푘] ′ 1 푓 (9) = 퐴 퐴 [푐표푠(Φ − Φ ) − 푐표푠(4휋 0 푘 + Φ + Φ )], 2 푥 푦 푥 푦 퐹푠 푥 푦 where 푅푥 [푘] and 푅푦 [푘] are the readings on two axes and 퐴푥, 퐴푦, Φ푥 and Φ푦 are amplitudes and phases respectively. After filtered, the harmonic component is removed, with the constant retained. The whole process is not sensitive to frequency nor offset. By aid of these relationships, we avoid the undesirable outcomes triggered by drift dexterously. (a) (b) 6 SYSTEM DESIGN Figure 9: An illustration of (a) multiplier-based channel and We propose a novel communication system that utilizes the suscep- (b) its performance on noise reduction. tibility of gyroscopes to ultrasound. It involves combined efforts from several modules, as illustrated in Fig. 5. error, accounts for it. We select the frequency range where at least 6.1 Receiver Identification two axes have a response as the ID range (the green frame in Fig. 8). Receiver identification is fundamental for communication inan It supports faster authentication than the ergodic comparisons in enormous and dynamic IoT network. However, there are certain each axis and avoids the confusion where some gyroscopes share drawbacks associated with the use of traditional methods. Rec- similar ranges on one axis. The radar chart in Fig. 10(d) confirms ognizing devices manually is widely used in covert channels but the validness of this fingerprint. For stable communications, the impractical. Meanwhile, routing protocol and address resolving de- resonant frequency of a gyroscope is measured in advance and mand an excessive configuration, especially in a dynamic situation. all information is known by legal users. The whole measurement Hence, it is a conundrum to keep a balance between overheads process is very fast (within several minutes) and multiple devices and automation. Device fingerprint may acquit itself splendidly in the same model can be measured simultaneously. in this scenario. Fig. 4 reveals that different kinds of gyroscopes Practically, the transmitter sends an identifier composed of chirps have various resonant frequency ranges. However, these ranges modulated by the ID range of the target. It pushes the gyroscope to may coincide and it is difficult to further distinguish different gyro- oscillate with the homologous chirps, even if there is an offset or scopes of the same kind. Furthermore, we perceive the otherness of movement disturbance. The device that receives a full identifier will gyroscopes of the same model in resonant passband finely. Deaf-Aid be regarded as a communication target. Although utilizing the ID leverages this diversity as a device fingerprint to identify receivers range reduces the dimension of features, it can still distinguish hun- in a dynamic position. dreds of devices. To verify the stability of this feature, we prepared From this perspective, we conduct an exploratory experiment an experiment a month later after the ranges were first measured. to get the accurate resonant passband ranges at intervals of 1Hz. We tested 6 speakers and 12 chips of two kinds of models, including √ eight BMI160 chips (1-8th) and four L3GD20 chips (9-12th) with Conventionally, the frequencies corresponding to 2/2 of the peak the results displayed in a confusion matrix in Fig. 10(e). It achieves value of gain coefficient are deemed starting and ending points (푓 푠 an accuracy of 96.32% totally. There is a slight drop in the accuracy and 푓 in Fig. 3), as shown in Fig. 8. Distinctly, each gyroscope of the 푒 of the 4th and 5th chips. We note that these errors are concentrated same model varies in the passband. Additionally, we observe each during the test procedure via the same speaker. The poor perfor- axis in one gyroscope may differ slightly. We make a comparison mance of this speaker is to blame for the mistakes in identification. of each axis among gyroscopes, as diagrammed in Fig. 10. The Even in the worst circumstances with speakers of poor frequency difference of the natural frequency 푓 , spawned by the production 푁 resolution, it still has the capability to make a distinction among tens of devices, which copes with most scenarios.

6.2 Encoding The traditional method highlights the amplitude envelope which needs mass data to make up one bit at the sacrifice of speed. One bit should be composed of as few data as possible with a low error rate. Another issue is the signal energy fluctuation. It is not prede- termined and varies along with the location and energy of sound sources. As a compromise, we make some adjustments on pulse interval encoding (PIE) [39]. We encode the data by defining different time gap widths between the rising edges of the pulses where a short interval indicates ‘1’ and a long one implies ‘0’, as shown in Fig. 5(b). We stipulate that only one rising edge is detected in a bit. Figure 8: The bandwidth for 8 identified BMI160 chips. This is conducive to the subsequent motion influence suppression.

709 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al.

(a) X-axis (b) Y-axis (c) Z-axis (d) ID range (e) Confusion matrix Figure 10: Radar charts of the resonant frequency range for 8 identical BMI160 chips and confusion matrix of gyroscopes identification for 12 subjects of 2 kinds of models.

6.3 Noise Reduction 6.5 Multi-channel Support The sound wave intensity, as well as channel energy, declines as the Although resonance on each axis is dependent, it varies in the distance increases. Discussed thoroughly in Stebler et al. [43], the resonant frequency range. In some frequencies, only some of the inherent noise of a gyroscope is regarded as independent Gaussian axes oscillate. We choose where just two axes resonate as multi- white noise in each axis. Filtering fails due to an unstable frequency channels (the black frames in Fig. 8). The mutual interference on the offset. In this scenario, we remove noise sagaciously based onthe common axis can be reduced in the same way in Sec. 6.3. Thus, these multiplier in Sec. 5 for signal extraction. Two axes are combined as channels can deliver different massages over different frequency a channel by a multiplier, with an average filter for high-frequency ranges at the same time. It provides double capacity or allows components and noise removal, as elaborated in Fig. 9. The com- a receiver to listen to two users simultaneously. bined channel has a higher signal-to-noise ratio and extends com- Consequently, we have established a gyroscope-based commu- munication distance excellently. nication channel. A sample of signal transmission is illustrated in Fig. 11. We meet the demand for faster transmission speed with 6.4 Threshold and Decoding less noise, whereas it is still prone to error in movable conditions. It is insufficient to rely solely on empirical thresholds. The signal amplitude depends on several aspects, including communication distance, source, and resonance intensity. Inspired by the image threshold, we introduce the maximum entropy threshold method [26]. The basic idea is to find the maximum entropy and take the (a) Tx moving (b) LOS blocking (c) Rx moving. corresponding threshold as the final one. Concretely, for a channel Figure 12: Three basic kinds of motion interference. with resolution 푟 and maximum value 퐾 · 푟, we decide threshold 푞 = 푘 × 푟, (푘 = 1, 2, . . ., 퐾) when the following entropy reaches a 7 MOTION INFLUENCE SUPPRESSION maximum, Motion exerts a huge impact, especially on the gyroscope-based 푘 Õ 푝(푖 × 푟) 푝(푖 × 푟) system. It is possible to be either disturbed by obstacle occlusion 퐻 (푞) = − 푙표푔 Í푘 Í푘 or significantly affected by the mixture of motion and resonance 푖=1 푗=1 푝(푗 × 푟) 푗=1 푝(푗 × 푟) (10) during communication. Fig. 12 illustrates that motion influences 퐾−1 Õ 푝(푖 × 푟) 푝(푖 × 푟) can be resolved into three simple forms: transmitter (Tx) moving, − 푙표푔 , Í퐾−1 푝(푗 × 푟) Í퐾−1 푝(푗 × 푟) blocking in the line-of-sight, and receiver (Rx) moving, which can 푖=푘+1 푗=푘+1 푗=푘+1 combine to form complex motion in practice. We offer an exhaustive where 푝(·) is the probability density. Then, we decode the signals description of the motion effect and propose solutions accordingly where the points with a value larger than this threshold are regarded for robust communication in a mobile IoT network. It integrates as high level, or as low level otherwise. a string of techniques, including adaptive threshold, interleaver, wavelet transform, multiplier and blind source separation. 7.1 Transmitter Motion Transmitter motion contributes to a variation in the transmission distance. It changes the force imposed on the gyroscope, and results in signal fluctuation, including amplitude 퐴[푘] and initial phase 휙0 [푘]. The gyroscope output is rewritten as 푘 푅[푘] = 푏퐿 · 퐴[푘]· 푠푖푛(2휋 푓 + Φ[푘]), (11) 1 퐹푠 ′ where, Φ[푘] = 휙0 [푘] + 휙1 + 휙 . Because of the signal jitters, a fixed threshold struggles and promotes the probability of error, which Figure 11: An example of signal transmission. initiates communicating instability.

710 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

We assume the effect of distance change maintains stable ina motion. So it could be removed easily by wavelet transform at the pulse. A pulse is roughly measured in milliseconds, and the com- cost of an acceptable loss of accuracy in angular rate measure. munication distance will not drastically change in such a short Due to the random frequency offset generated by drift, this ap- time. The intensity and phase changes are negligible within a pulse. proach results in an accuracy loss in signal transmission. It is essen- Inspired by the idea of threshold window in image recognition tial to separate resonant oscillation from the motion for accurate [61], we calculate threshold in a short time (such as several bits) to communication. We discuss it in two situations where the receiver achieve adaptive threshold segmentation, handling the fluctuation is moving in a plane or space. of the pulse. Moreover, we can normalize amplitude on the basis of 7.3.2 Signal Separation from Plane Motion these thresholds. Plane motion is ubiquitous like cars, smart assistants, or cleaning robots. It affects some of axes in a gyroscope, which indicates that 7.2 Line-of-Sight Blocking 푀푖 [푘] in Equ. 14 do not necessarily exist synchronously. For in- Sound transmission is affected by the medium especially on LOS. stance, a motion is concentrated on the XoY plane, which indicates The transmitter can deliberately avoid protracted obstacles. In real that the 푀푧 [푘] is zero constantly here. Under such circumstances, scenarios, it is more likely to occur disorderly, thrown in like a these motions can count as noise. We multiply the data in X- or sudden error. The sudden error can be denoted as 푆퐸[푘], and the Y-, and Z-axes to generate a combined channel. Taking X-axis for gyroscope output is rewritten as example, it can be modeled as 푘 푆푐표푚푏 [푘] = 푅푥 [푘, 푀푥 ] × 푅푧 [푘, 0] 푅[푘] = 푏퐿 · 퐴 · 푠푖푛(2휋 푓1 + Φ) + 푆퐸[푘]. (12) (15) 퐹푠 = 푅 [푘, 0] × 푅 [푘, 0] + 푀 [푘] × 푅 [푘, 0]. We utilize interleaving technology to reduce these burst errors. 푥 푧 푥 푧 It introduces an item [ ] × [ 0], where the energy of the Interleaving allocates the transmission bits in the time or frequency 푀푥 푘 푅푧 푘, low-frequency components, if any, is low, and the high-frequency domain or both. It changes the information structure to the greatest components are removed by the mean filter, since [ ] is often extent without content alternation. In this way, the decoder can 푀푥 푘 low-frequency. In this way, the plane receiver movement induces treat these errors as random ones, which indicates that it maximizes no alteration in signal transmission. the dispersion of concentrated errors during channel transmission. One of the most common ways is block interleaver [5]. It writes the 7.3.3 Signal Separation from Spatial Motion × input sequence into a 푚 푛 matrix in the order of rows and then Spatial motion is more widespread and complicated. Its complex- reads by columns. The read and write objects are swapped during ity invalidates the multiplier-based signal extraction. We leverage reordering. The mapping function is expressed as the single-channel blind source separation (BSS) method with the 퐼 (푖) = [(푖 − 1) mod 푛] + ⌊(푖 − 1)/푛⌋ + 1, (13) ensemble empirical mode decomposition (EEMD) for error-free where 퐼 (푖) is the location of the 푖th (푖 = 1, 2. . ., 푁 ) data in the origi- channels under spatial motion interference. nal line, 푚 and 푛 are the number of rows and columns, respectively, The length of intervals of rising edges between bits must fall ⌊·⌋ is the floor function, and 푁 = 푚 × 푛 represents the interleaving into a definite range ruled in Sec. 6.2. Once the interval length length. It maximizes the dispersion of the burst errors in the pro- exceeds this range, it is judged to be disturbed by motion during cess of channel transmission and effectively cuts down the errors data transmission. aroused by sudden block. A BSS model can be represented by 푋 = 퐴푆, (16) 푇 7.3 Receiver Motion where 푋 = [푥1 [푘], 푥2 [푘], . . ., 푥푁 [푘]] is the observation matrix, 푇 Receiver motion triggers distance variation and devotes gyroscopes 푆 = [푠1 [푘], 푠2 [푘], . . ., 푠푀 [푘]] is the source matrix, and 퐴 is a 푁 × to produce additional readings concurrently. With distance varia- 푀 matrix. Typically, it requires that the number of independent tion solved and normalization on the basis of the adaptive threshold observers is not less than the number of sources, that is 푁 ≥ 푀. in Sec. 7.1, the gyroscope output is rewritten as Thanks to the encoding rules in Sec. 6.2, resonance must occur in 푘 the odd number pulse width with variance greater than the mean 푅 [푘, 푀 ] = 푏 퐿퐴′ · 푠푖푛(2휋 푓 + Φ) + 푀 [푘], (14) 푖 푖 푖 1 퐹푠 푖 one. We take the subsequent high-frequency parts after wavelet where 푀푖 (푖 = 푥,푦, 푧) represents the additional readings introduced transform and energy normalization, a mix of the resonant data by movement on the corresponding axis, and 퐴′ is the normalized amplitude. Particularly, 푏 and 휙1 are immune to motion. The inter- axial characteristics are valid with a moving receiver and employed for signal recovery. 7.3.1 Motion Recovery The initial objective of the gyroscope is to measure movements. We are supposed to eliminate the influence of motion on the signal concurrently with recovering motion. Since the resonant data is sinusoidal with a peak 퐿푏퐴 and the frequency 푓1, the accumulative Figure 13: The performance improvement on each compo- error on the angle is tiny, with the maximum error 퐿푏퐴/푓1휋. Besides, the frequency of the sinusoidal oscillations often exceeds that of nent of signal separation from spatial motion.

711 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al.

Algorithm 1: Components Reorganization Based on the in Algorithm 1. Fig. 13 reflects that our method is better than only Inter-axial Characteristics using wavelet transform, in which the BERs are reduced to below 0.7%. It demonstrates that the BSS method and the inter-axial Input: The 푛-dimension matrices 푆ˆ1 and 푆ˆ2; characteristics dominate signal recovery. Output: The resonant data 퐷1 [푘] and 퐷2 [푘] To summarize, we explicate the influence of motion and provide 1 Initialize 퐴푚푎푥 = 0; corresponding solutions, and as a result, prepare the system for the 2 퐵푖 = [퐵푖1, 퐵푖2, . . ., 퐵 푛(푛−1) ](푖 = 1, 2) are Power Set of 푆ˆ푖 , 푖 2 robustness against movement. To the best of our knowledge, it is 푛(푛−1) ˆ where 퐵푖 푗 (푗 = 1, 2,... 2 ) ⊂ 푆푖 ; the first work to develop the reutilization of inertial sensors witha Í 3 푇푖 푗 [푘] = 퐵푖 푗 [푘]; high precision in a movable scene. 4 퐿 ← the length of 푇푖 푗 [푘]; 푛(푛−1) 8 EVALUATION 5 for 푖 ∈ [1, 2 ] do 푛(푛−1) 8.1 Experimental Setup and Metrics 6 for 푗 ∈ [1, 2 ] do 7 푄 = ∅ ; We build the prototype of Deaf-Aid using off-the-shelf devices. 8 for 푚 = 1 : 퐿 do We conduct a comprehensive study to evaluate the accuracy and 9 푄 = 푄 ∪ (푇1푖 [푚],푇2푗 [푚]); robustness of our system. These devices are fixed into brackets and 10 end the distances are adjustable, as shown in Fig. 14. These speakers 11 푄 fits an ellipse 퐸 ; play modulated signals, where the gyroscope chips’ readings are collected by an Arduino and an application is developed to record 12 휉 ← the mean square error of fitting; gyroscope readings inside phones. The output-data-rate is set as 13 if 휉 < 0.01 then 200Hz and the pulse width is 50ms unless otherwise stated. 14 퐴 ← Area of ellipse 퐸; Shannon channel capacity [30], a theoretically achievable upper 15 if 퐴 > 퐴 then 푚푎푥 bound, is widely used to measure the effectiveness. It is based upon 16 퐴 = 퐴; 푖 = 푖; 푗 = 푗; 푚푎푥 푚푎푥 푚푎푥 the realized bit error rate, and in a binary symmetric channel, we 17 end have the channel capacity as follows, 18 end 1 퐶 = [1 + 퐵퐸푅 × 푙표푔2퐵퐸푅 + (1 − 퐵퐸푅) × 푙표푔2 (1 − 퐵퐸푅)], (17) 19 end 푃푊 20 end where 퐵퐸푅 is the realized bit error rate and 푃푊 is the pulse width. This bound could be approached practically with proper encoding 21 return 퐷1 [푘] = 푃1푖푚푎푥 [푘]; 퐷2 [푘] = 푃2푗푚푎푥 [푘]; like turbo-codes [45]. and remnant of motion, as the observation 푋, with 푁 = 1 here. However, because of frequency offset and motion, there are several independent source vectors, that is 푀 > 2. It is a necessity to decompose observation for 푁 < 푀 here. EEMD [55] is employed to decompose the single-channel mixed data to fulfill the requirement on the dimension of observation Figure 14: Experimental setup. matrix. Different from FFT, EEMD manages non-stationary signal analysis. It is based on the data itself and does not require any basic 8.2 Universality function, making it more suitable for arbitrary data. It decomposes We take the variety of devices and environments into account to fur- single-channel data into several intrinsic mode functions (IMFs). ther verify the universality of Deaf-Aid. Here we place the speaker They constitute a 푛-dimensional matrix as the observation 푋 instead. and gyroscope at a distance of 15cm in three different locations The detailed extraction has been elaborated in Huang et al. [21] including a large seminar room, a small office, and a crowded labo- and Mijovic et al. [34]. ratory. We test on six speakers of three kinds (including JBL 750T After satisfying the dimension requirement 푁 = 푛 ≥ 푀, we [19], Samsung Galaxy S8 [15], and HIVI-SS1II [20]), whose sup- utilize Fast ICA [23], a widely used solution for BSS, with a 푛- ply power is limited within 5W, and 32 gyroscopes of four models dimension matrix 푆ˆ = [푠ˆ1 [푘], 푠ˆ2 [푘],..., 푠ˆ푛 [푘]] as a result. Neverthe- less, the number of the source is unclear because of the complexity of the motion component, and there is no law on how to combine those vectors into the resonant data. Components are reorganized on the basis of the inter-axial characteristics. On account of the fixed phase difference, resonant data on any two axes can fit a circle, or in the vast majority of cases, an ellipse. We repeat the aforementioned processes on mixed data from another axis and list all possible combinations for fitting. The one with the largest area and accredited mean square error is regarded to contain only resonant data, with detailed flow clarified Figure 15: The impact of environment and devices.

712 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

(a) SNR (b) BER

Figure 16: Performance centered on the gyroscope. Figure 18: Communication distance.

Such a communication range is sufficient to cover most application scenarios. It can be concluded that Deaf-Aid is capable of support- ing error-free and remote transmission with scarce constraints on placement in real life scenarios.

8.4 Transmission Capacity (a) SNR (b) BER Transmission speed is conditional on output-data-rate (ODR) and pulse width (PW). In this evaluation, a pair of the speaker (JBL GTO Figure 17: Performance centered on the speaker. 750T, 5W) and gyroscope (BMI160) are placed 15cm apart in the seminar room to judge the trend of transmission capability with (including BMI160, L3GD20, LSM6DS3, and L3G4200D). The dis- different ODR and PW. tribution is presented by box-plots in Fig. 15. In these situations, Pulse width. We appraise our system with an adjustable pulse our system performs diversely but satisfactorily, with BER lower width in the range of 25ms and 100ms when the ODR in gyroscope than 0.25% comprehensively. It guarantees a stable communication is defaulted to 200Hz, a widely used rate in mobile devices. The quality among numerous devices with little deformation due to the product PW×ODR decides the amount of data contained in a bit. ambient environment. A shorter PW understandably means fewer data to form a bit and possibly cause more errors in this issue. In Fig. 19(a), the results 8.3 Orientation and Distance vividly demonstrate that BER maintains below 1% when PW is Placement limitation is a crucial issue in covert channels. Taking a longer than 40ms and 0.1% when PW is longer than 50ms. 5W JBL750T speaker and a BMI160 gyroscope chip as an example, Output-data-rate. Similarly, we repeat the experiment where we examine the resilience under multiple layouts to further explain ODR varies evenly between 100Hz and 500Hz at 100Hz intervals. × that there is no restriction on the layout of devices. Concretely, we Fairly, we maintain the product PW ODR at 0.01. As illustrated in rotate the speaker around the fixed gyroscope. The performance in Fig. 19(b), channel capacity ascends with the incline of ODR, up to the XoY plane is illustrated in Fig. 16 and that of other planes are 47.4bps. Although there is a slight increase in BER, it remains a low similar. It reflects the placement of gyroscopes makes no difference. level within 0.6%. Conversely, we rotate the gyroscope around the fixed speaker. Fig. Generally speaking, Deaf-Aid is competent for the different re- 17 illustrates the effectiveness in the range ofa 22.5◦ opening angle quirements of transmission speed and tolerance of error flexibly in of speakers, with a BER of 0.1% at 15cm and 1% at 20cm. It is practical various occasions. for users tend to turn towards the objective, and slight direction deviation is tolerable. Moreover, we carry with arbitrary layout and 8.5 Multi-channel draw similar results. By the aid of multi-channel, communication capacity can double, The above results about the communication range are obtained or one receiver can get information simultaneously from two trans- from a power-limited speaker, whose power setting is only 5W. We mitters in different scenarios. The third chip in Fig. 8 is exemplified adopt such a setting with the consideration that some IoT devices are equipped with such a power-limited speaker for the purpose of low power consumption. For those devices with less strict requirements on power consumption, the communication distance can dramatically increase. For example, we raise the power of the speaker to 30W, which is also very common in existing commodity speakers. The distance can be extended up to 14m, as indicated in Fig. 18(a). In general, different gyroscopes have different resonance peaks 푏푚푎푥 , and different communication distances correspondingly. Nevertheless, even the L3GD20 chip, which is with the worst performance among (a) Pulse width (b) Output-bit-rate our gyroscopes, can support a communication distance of 3.6m. Figure 19: Performance under different conditions.

713 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al.

Figure 20: Performance of Deaf-Aid against motion interference.

Table 1: Validity of simultaneous multi-channel We use a Pixel 4 to further validate the flexibility of Deaf-Aid in terms of layout. We rotate the speaker around the fixed phone, Mode Channels BER(%) Capacity(bps) which is vertical and parallel respectively, with graphical represen- XoY 0.23 tations of results in Fig. 21. The communication distance fluctuates Case1 38.79 YoZ 0.39 between 3.1m and 4.8m with BER less than 1%. This enables a smart- XoY 0.27 19.46 phone to retrieve massages sent within three meters accurately via Case2 Deaf-Aid, no matter in which orientation it is. This demonstrates YoZ 0.17 19.64 that Deaf-Aid is capable of establishing communication among real- istic devices in the wild, merely a tiny penalty of slightly shortening the communication range. to bear out the feasibility of multi-channel, where XoY channel works at 24.41kHz and YoZ channel works at 25.5kHz. We test in two cases. In case 1, a speaker delivers different massages on 8.7 Motion Influence these two channels simultaneously. In case 2, two speakers each Plenty of deliberate motion interference is involved for a better deliver on one of them respectively, with the performance attached understanding of the robustness of Deaf-Aid against the movement. to Tab. 1. In both cases, it succeeds at the expense of a slight ac- We bind a 5W JBL speaker, an obstacle, and gyroscopes to a manipu- curacy loss. Deaf-Aid supports simultaneous communication lator respectively. They move under the control of the program. The on multiple channels, even from two transmitters. experimental distance is set within 15cm by default. Moreover, we recruit 22 participants, who arm with 30W speakers and three kinds 8.6 Smartphone Prototype of phones with encapsulated gyroscopes for further confirmation To verify the feasibility of Deaf-Aid working on those devices with on real-world scenes. encapsulated gyroscopes, we build the smartphone prototype using Rx motion and LOS blocking. We manipulate a speaker into three kinds of phones, including Samsung Galaxy S8, Google Pixel 4, moving in three simple directions (transverse, longitude, and ver- and Mi 5s Plus. All those devices are with encapsulated gyroscopes, ticality), while a BMI160 chip is fixed. Then we manipulate an corresponding to LSM6DSL, BMI160, and ICG-20660/L, respectively. obstacle into moving randomly in LOS between the fixed speaker We measure the transmission distance using a 30W speaker. Fig. and gyroscope. As graphically shown in Fig. 20(a), BER is around 18(b) reflects that our system is able to communicate with those 0.1% when the speaker moves and is always below 0.2% even under phones up to 12.3m away, as their screen is set vertical to the obstacle disturbance. There is no doubt that it can settle the impact ground. The BMI160 encapsulated in Pixel 4 shows the shortest of transmitter motion and LOS blocking smoothly. communication distance, 4.4m. But it is still satisfactory in many Plane Tx motion. Then we fix the speaker and rotate a BMI160 scenarios, e.g., the indoor environment. chip around its axes with the comparison at different distances illustrated in Fig. 20(b). This system maintains a low BER. It is less than 0.2% at a distance of 15cm and rises to 1% as the distance increase to 20cm. Thereby, the plane motion of gyroscopes has little effect on the stability of our system. Spatial Tx motion. Here, gyroscopes move in space irregularly within 15cm from a fixed speaker. This evaluation involves four types of gyroscopes and each type contains 8 chips. Repetitive experiments are conducted on these chips where the manipulator repeats the same trajectory. It has a maximum error of 0.8% with all averages lower than 0.7% in Fig. 20(c). We have prepared it for the robustness against the fundamental movement. Concurrent influence. We ask 22 volunteers to send informa- Figure 21: Performance centered on a Pixel 4. tion with a speaker in hand where 32 gyroscope chips move in

714 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

Table 2: Comparison with previous work

Receiver Motion System Basic Speed Accuracy Placement Distance Identification Robustness Vibra-motor Fixed Ripple [41] 200 bps BER<1.7% Manually 6 inches Not to Accelerator on a plane Vibra-motor Physical Touch Just to Ripple II [40] 30 kbps SNR>15db Manually to Microphone contact based tiny vibration Heat emission 1-8 bits Fixed Not BitWhisper [17] Not evaluate Manually 40cm to thermal sensor per hour position allow move Speaker No Dhwani [36] 2.4 kbps Accuracy>95% Manually 10cm Yes to Microphone limitation Speaker No Deaf-Aid 47 bps BER<0.6% Automatic 14m Yes to Gyroscope limitation space irregularly under the same conditions as above. As shown in 9 DISCUSSION Fig. 20(d), our system performs well under the multiple concurrent 9.1 Implementation Consideration motion interference. It maintains the mean of BER below 1%. Al- though there exists off-group points data, the peak is lower than6%. Communication distance and capacity will soar along with One explanation for those outliers is that volunteers accidentally technology. A better speaker, with a wider spectrum of responses or deflect the orientation of the speaker away from receivers. larger power, extends the communication range. It is reported that Real-world motion. In order to evaluate the robustness of the ultrasound is capable to affect gyroscopes 37m away [42]. This indi- prototypes, we organize volunteers in pairs. In each pair, one vol- cates a great potential of Deaf-Aid in more scenarios. On the other unteer holds a 30W speaker and the other carries a smartphone. hand, increasing the sampling rate would result in a higher trans- Both of them freely move within a range of 2 meters and fiddle mission rate. Therefore, Deaf-Aid would contentiously improve in with the devices. We evaluate on three kinds of smartphones and the transmission rate upon the emergence of new hardware. For find that the mean of BER is below1% and the peak is lower than example, the gyroscopes MPU6050 [25] and BMI160 [8] that are 7% during the entire experiment, as the result shown in Fig. 20(e). used in our experiments support 1kHz and 3.2kHz sampling rates, This indicates that Deaf-Aid facilitates a robust channel among respectively. If we adopt a gyroscope with an over 10kHz sampling mobilizable IoT devices in a real-world implementation. rate, Deaf-Aid can raise the transmission rate to thousands of bps In conclusion, Deaf-Aid shows immense potential as a commu- by a conservative estimate. We will obtain a more efficient system nication bridge even under various motion disturbances in a com- as new hardware emerges. plicated IoT network. Signal clipping means that the sensing mass produces voltages exceeding the input range of its amplifier, and distort the signal. For instance, it occurs on communication via an L3GD20 chip within 8.8 Comparison with Previous Systems 5cm experimentally. Even so, rising edges are still recognizable. Clip- ping introduces little additional error statistically. We will further Covert channels take advantage of physical phenomenon for data analyze the sensitivity of gyroscopes under different conditions, for transmission among adjacent devices. We select some typical cases example, sound pressure levels, to find the optimal device setting. for comparison, listed in Tab. 2. Communication through vibration, Power consumption is another issue that deserved discussions. for example Ripple [40] and Ripple II [41], is good at speed but weak Currently, in the audio components of IoT devices, the power is at fixed position and poor motion robustness. BitWhisper [17] deliv- almost consumed by the speaker, relied on the volume rather than ers messages further but slowly on a covert channel using thermal frequency. Its rated power and maximum power are fixed after the manipulations. The speaker-to-microphone channel is exploited manufacture. In general, the consumption of a commodity speaker by Dhwani [36]. However, with the purpose of recording human is designed and constricted within an acceptable range for an IoT voices, the microphones on IoT devices are more likely to filter out device. For the transmitters, it is just the consumption of speakers 8kHz [1]. In this case, the Dhwani like approaches require periph- and we have limited the power of speakers in our experiments, with erals to utilize ultrasound for stealthy communication, such as a the consideration on the low power consumption of IoT devices in high-quality microphone and a sound card with a high sampling some cases (see Sec. 8.3). In addition, if the transmitter is a mobile rate. Otherwise, people nearby will be disturbed. Deaf-Aid has no phone, the power consumption would not be a big issue due to the such issues instead, not to mention that Deaf-Aid also has other aid of the high-capacity battery and power bank. Deaf-Aid prepares advantages, such as multi-channel communication and automatic itself for occasions with both limited and sufficient power supply. receiver identification. More IoT devices and platforms will be supported in the fore- In summary, Deaf-Aid enables IoT devices to identify and chat seeable future. On the basis of proven resonance phenomenon in 3D with their neighbors remotely, infallibly, and liberally. It provides an mouse, screwdriver, VR device, iPhone [48], drone [42] and remote alternative and complementary communication channel to current control model car [47], our system can be applied in a broader range IoT devices.

715 MobiCom ’20, September 21–25, 2020, London, United Kingdom Gao and Lin, et al. of devices including those above. This could be an essential step to probe the relationship among axes in a gyroscope under resonance expand application fields, thus leading a more comprehensive IoT and triumph over adversity, such as frequency offset and multi- network based on Deaf-Aid. channel communication support. As an innovation, the diversity of resonant frequency range among gyroscopes is employed as 9.2 Security fingerprint for automatic receiver identification. The motion influ- The current resonant frequency band is relatively narrow, deter- ence suppression and corresponding mobile communication have mined by the inherent structure of gyroscopes. In this case, some been delicately designed. Our system, Deaf-Aid, reaches up to 47bps sophisticated communication techniques, such as FDM and OFDM, with a low BER even under motion interference. It could act as a are not applicable. However, we exploit the potential of the narrow stepping-stone for an everything-related IoT network. bands from a security perspective. Jamming: It is difficult for jammers to find out the appropriate ACKNOWLEDGMENTS band of a gyroscope, narrower than 50Hz usually. Without suffi- This paper is partially supported by National Natural Science Foun- cient knowledge, an attacker has to jam in a broadband spectrum. dation of China under grant 61872285, 61772236, and 61972348, the This method demands professional acoustic devices and it can be Fundamental Research Funds for the Central Universities, Alibaba- detected easily. A practicable means of avoiding malicious jam- Zhejiang University Joint Institute of Frontier Technologies, Re- ming is timely jamming detection. It is easy for Deaf-Aid, as only a search Institute of Cyberspace Governance in Zhejiang University, microphone is required. Leading Innovative and Entrepreneur Team Introduction Program Eavesdropping: Deaf-Aid can prevent replay attacks even if the of Zhejiang (Grant No. 2018R01005), Zhejiang Key R&D Plan (Grant private key is leaked. Benefited from gyroscopes’ narrow band-pass No. 2019C03133). width, intended non-informative ultrasound signal could broad- cast at the nearby frequency to confuse attackers but receivers are REFERENCES impervious to those noises with the help of multiplier-based sig- [1] H. Abdullah, W. Garcia, C. Peeters, P. Traynor, K. R. B. Butler, and J. Wilson. 2019. nal extraction. Furthermore, users can utilize multi-channel with Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems. signals on one channel and deceptive data on the other. Prior in- In Proceedings of the Network and Distributed System Security Symposium. formation is an absolute necessity for eavesdroppers, such as the [2] S. Anand and N. Saxena. 2018. Speechless: Analyzing the Threat to Speech Privacy from Smartphone Motion Sensors. In Proceedings of the IEEE Symposium communication frequency band, which is difficult to pick out the on Security and Privacy. right one from camouflage. [3] Z. Ba, T. Zheng, X. Zhang, Z. Qin, B. Li, X. Liu, and K. Ren. 2020. Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer. In Proceedings of the Network and Distributed System Security Symposium. 10 RELATED WORK [4] D. B. Bartolini, P. Miedl, and L. Thiele. 2016. On the capacity of thermal covert channels in multicores. In Proceedings of the Eleventh European Conference on Privacy is recorded by inertial sensors. A malicious attacker Computer Systems. can easily obtain inertial sensors data inside mobile platforms with- [5] C. Berrou and A. Glavieux. 1996. Near optimum error correcting coding and decoding: turbo-codes. IEEE Transactions on Communications 44, 10 (1996), 1261– out access permission, for keystroke inference [9, 29, 35, 37, 50, 58], 1271. device identification12 [ , 60] and speech recognition [2, 3, 18, 33]. [6] K. Block, S. Narain, and G. Noubir. 2017. An Autonomic and Permissionless The Gyrophone [33] like approaches leverages a gyroscope as an Android Covert Channel. In Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks. eavesdropper to recognize speeches, mostly lower than 1kHz. Their [7] C. Bormann, A. P. Castellani, and Z. Shelby. 2012. CoAP: An Application Protocol intention is to eavesdrop on the context of human conversation for Billions of Tiny Internet Nodes. IEEE Internet Computing 16, 2 (2012), 62–67. via vibration. Different from Gyrophone, Deaf-Aid benefits from the [8] Bosch. 2018. BMI160 Datasheet. https://www.bosch-sensortec.com/products/ motion-sensors/imus/bmi160.html. resonance of gyroscopes and is aiming at transferring modulated [9] L. Cai and H. Chen. 2012. On the Practicality of Motion Based Keystroke Inference information from a speaker to a gyroscope. Attack. In Trust and Trustworthy Computing - 5th International Conference. [10] R. N. Dean, S. T. Castro, G. T. Flowers, G. Roth, A. Ahmed, A. S. Hodel, Gyroscope is vulnerable to acoustic injection attacks. It B. E. Grantham, D. A. Bittle, and J. P. Brunsch. 2011. A Characterization of has been demonstrated that resonance of gyroscopes could be trig- the Performance of a MEMS Gyroscope in Acoustically Harsh Environments. gered by acoustic signals [10, 11, 47]. An adversary can impose IEEE Trans. Industrial Electronics 58, 7 (2011), 2591–2596. [11] R. N. Dean, G. T. Flowers, A. S. Hodel, G. Roth, S. Castro, R. Zhou, A. Moreira, on outputs of gyroscopes, bringing about control systems error. A A. Ahmed, R. Rifki, B. E. Grantham, D. Bittle, and J. Brunsch. 2007. On the DoS attack was conducted to incapacitate drones [42]. Tu et al. [48] Degradation of MEMS Gyroscope Performance in the Presence of High Power realized a black box switching attack to push victim gyroscope to Acoustic Noise. In IEEE International Symposium on Industrial Electronics. [12] S. Dey, N. Roy, W. Xu, R. R. Choudhury, and S. Nelakuditi. 2014. AccelPrint: produce expected outputs. Imperfections of Accelerometers Make Smartphones Trackable. In Network and Covert channels have attracted great interest. They take ad- Distributed System Security Symposium. [13] B. Farshteindiker, N. Hasidim, A. Grosz, and Y. Oren. 2016. How to Phone Home vantage of physical phenomena, such as heat [4, 17, 56, 57], light with Someone Else’s Phone: Information Exfiltration Using Intentional Sound [32, 44], electromagnetic leakage [16], and ultrasound [31]. Iner- Noise on Gyroscopic Sensors. In 10th USENIX Workshop on Offensive Technologies. tial sensors have become candidates [6, 13, 40, 41]. Nevertheless, [14] GSMArena. 2017. Mobile Release Date, Price, Specs, Features, Review, News. https://www.gsmarena.com. these methods demand physical contact, specialized equipment or [15] GSMArena. 2017. Samsung Galaxy S8. https://www.gsmarena.com/samsung_ artificial assistance, none of which is needed in Deaf-Aid. galaxy_s8-8161.php. [16] M. Guri, G. Kedma, A. Kachlon, and Y. Elovici. 2014. AirHopper: Bridging the air-gap between isolated networks and mobile phones using radio frequencies. 11 CONCLUSION In 9th International Conference on Malicious and Unwanted Software. [17] M. Guri, M. Monitz, Y. Mirski, and Y. Elovici. 2015. BitWhisper: Covert Signaling In this paper, we leverage the stealthy speaker-to-gyroscope chan- Channel between Air-Gapped Computers Using Thermal Manipulations. In IEEE nel for protocol-independent mobile IoT communication. We first 28th Computer Security Foundations Symposium.

716 Deaf-Aid: Mobile IoT Communication Exploiting Stealthy Speaker-to-Gyroscope Channel MobiCom ’20, September 21–25, 2020, London, United Kingdom

[18] J. Han, A. J. Chung, and P. Tague. 2017. PitchIn: Eavesdropping via Intelligible [47] T. Trippel, O. Weisse, W. Xu, P. Honeyman, and K. Fu. 2017. WALNUT: Waging Speech Reconstruction Using Non-acoustic Sensor Fusion. In 16th ACM/IEEE Doubt on the Integrity of MEMS Accelerometers with Acoustic Injection Attacks. International Conference on Information Processing in Sensor Networks. In IEEE European Symposium on Security and Privacy. [19] Harman. 2019. JBL STADIUM GTO750T. https://www.onlinecarstereo.com/ [48] Y. Tu, Z. Lin, I. Lee, and X. Hei. 2018. Injected and Delivered: Fabricating Implicit CarAudio/p_51143_JBL_STADIUMGTO750T.aspx. Control over Actuation Systems by Spoofing Inertial Sensors. In Proceedings of [20] HiVi. 2013. SS1II. http://www.hivi.com/products/detail.aspx?pid= 27th USENIX Security Symposium. 100000443464649. [49] S. Vinoski. 2006. Advanced Message Queuing Protocol. IEEE Internet Computing [21] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, 10, 6 (2006), 87–89. C. C. Tung, and H. H. Liu. 1998. The empirical mode decomposition and the [50] C. Wang, X. Guo, Y. Wang, Y. Chen, and B. Liu. 2016. Friend or Foe?: Your Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceed- Wearable Devices Reveal Your Personal PIN. In Proceedings of the 11th ACM on ings of the Royal Society of London 454, 1971 (1998), 903–995. Asia Conference on Computer and Communications Security. [22] U. Hunkeler, H. L. Truong, and A. Stanford-Clark. 2008. MQTT-S — A pub- [51] Z. Wang, W. Zhu, J. Miao, H. Zhu, C. Chao, and O. K. Tan. 2005. Micromachined lish/subscribe protocol for Wireless Sensor Networks. In 3rd International Con- thick film piezoelectric ultrasonic transducer array. Sensors Actuators A Physical ference on Communication Systems Software and Middleware and Workshops. 130-131 (2005), 485–490. [23] A. Hyvarinen. 1999. Fast and robust fixed-point algorithms for independent [52] A. Wheeler. 2007. Commercial Applications of Wireless Sensor Networks Using component analysis. IEEE Transactions on Neural Networks 10, 3 (1999), 626–634. ZigBee. IEEE Communications Magazine 45, 4 (2007), 70–77. [24] IDC. 2019. Worldwide Internet of Things Forecast, 2019–2023. https://www.idc. [53] Wikipedia. 2020. Gyroscope. https://en.wikipedia.org/wiki/Gyroscope. com/getdoc.jsp?containerId=US45373120. [54] Wikipedia. 2020. Hi-Fi. https://en.wikipedia.org/wiki/High_fidelity. [25] Invensense. 2015. MPU-6050 Datasheet. https://www.invensense.com/wp- [55] Z. Wu and N. E. Huang. 2009. Ensemble empirical mode decomposition: a noise- content/uploads/2015/02/MPU-6000-Datasheet1.pdf. assisted data analysis method. Advances in adaptive data analysis 1, 1 (2009), [26] J. Kapur, P. Sahoo, and A. Wong. 1980. A new method for gray-level picture 1–41. thresholding using the entropy of the histogram. Computer Vision, Graphics, and [56] Z. Wu, Z. Xu, and H. Wang. 2012. Whispers in the Hyper-space: High-speed Image Processing 29 (1980), 273–285. Covert Channel Attacks in the Cloud. In Proceedings of the 21th USENIX Security [27] J. Lee, Y. Su, and C. Shen. 2007. A Comparative Study of Wireless Protocols: Blue- Symposium. tooth, UWB, ZigBee, and Wi-Fi. In 33rd Annual Conference of the IEEE Industrial [57] Z. Wu, Z. Xu, and H. Wang. 2015. Whispers in the Hyper-Space: High-Bandwidth Electronics Society. and Reliable Covert Channel Attacks Inside the Cloud. IEEE/ACM Transactions [28] K. Lee, H. Wang, and H. Weatherspoon. 2014. PHY Covert Channels: Can you see on Networking 23, 2 (2015), 603–615. the Idles?. In Proceedings of the 11th USENIX Symposium on Networked Systems [58] Z. Xu, K. Bai, and S. Zhu. 2012. TapLogger: inferring user inputs on smartphone Design and Implementation. touchscreens using on-board motion sensors. In Proceedings of the Fifth ACM [29] X. Liu, Z. Zhou, W. Diao, Z. Li, and K. Zhang. 2015. When Good Becomes Evil: Conference on Security and Privacy in Wireless and Mobile Networks. Keystroke Inference with Smartwatch. In Proceedings of the 22nd ACM SIGSAC [59] C. Yum, Y. Beun, S. Kang, Y. Lee, and J. Song. 2007. Methods to use 6LoWPAN in Conference on Computer and Communications Security. IPv4 network. In The 9th International Conference on Advanced Communication [30] C. T. M. 2017. Elements of Information Theory (Wiley Series in Telecommunications Technology. and Signal Processing). Wiley-Interscience. [60] J. Zhang, A. R. Beresford, and I. Sheret. 2019. SensorID: Sensor Calibration [31] A. Madhavapeddy, R. Sharp, D. J. Scott, and A. Tse. 2005. Audio networking: the Fingerprinting for Smartphones. In IEEE Symposium on Security and Privacy. forgotten wireless technology. IEEE Pervasive Computing 4, 3 (2005), 55–60. [61] D. Zhou and W. Cheng. 2008. Image denoising with an optimal threshold and [32] A. Maiti and M. Jadliwala. 2019. Light Ears: Information Leakage via Smart neighbouring window. Pattern Recognition Letter 29, 11 (2008), 1694–1697. Lights. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous [62] Y. Zou, J. Zhu, X. Wang, and L. Hanzo. 2016. A Survey on Wireless Security: Technologies 3, 3 (2019), 98:1–98:27. Technical Challenges, Recent Advances, and Future Trends. Proc. IEEE (2016). [33] Y. Michalevsky, D. Boneh, and G. Nakibly. 2014. Gyrophone: Recognizing Speech from Gyroscope Signals. In Proceedings of the 23rd USENIX Security Symposium. [34] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel. 2010. Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. IEEE transactions on biomedical engineering 57, 9 (2010), 2188–2196. [35] E. Miluzzo, A. Varshavsky, S. Balakrishnan, and R. R. Choudhury. 2012. Tapprints: your finger taps have fingerprints. In The 10th International Conference on Mobile Systems, Applications, and Services. [36] R. Nandakumar, K. K. Chintalapudi, V. Padmanabhan, and R. Venkatesan. 2013. Dhwani: secure peer-to-peer acoustic NFC. ACM SIGCOMM Computer Commu- nication Review 43, 4 (2013), 63–74. [37] E. Owusu, J. Han, S. Das, A. Perrig, and J. Zhang. 2012. Accessory: password inference using accelerometers on smartphones. In Proceedings of the Twelfth Workshop on Mobile Computing Systems & Applications. [38] J. Ploennigs, U. Ryssel, and K. Kabitzsch. 2010. Performance analysis of the EnOcean wireless sensor network protocol. In IEEE 15th Conference on Emerging Technologies Factory Automation. [39] P. R. Prucnal and P. A. Perrier. 1988. Optical self-routing in a self-clocked photonic switch using pulse-interval encoding. In Fourteenth European Conference on Optical Communication, Vol. 1. 259–263. [40] N. Roy and R. R. Choudhury. 2016. Ripple II: Faster Communication through Physical Vibration. In 13th USENIX Symposium on Networked Systems Design and Implementation. [41] N. Roy, M. Gowda, and R. R. Choudhury. 2015. Ripple: Communicating through Physical Vibration. In 12th USENIX Symposium on Networked Systems Design and Implementation. [42] Y. Son, H. Shin, D. Kim, Y. Park, J. Noh, K. Choi, J. Choi, and Y. Kim. 2015. Rocking Drones with Intentional Sound Noise on Gyroscopic Sensors. In Proceedings of 24th USENIX Security Symposium. [43] Y. Stebler, S. Guerrier, J. Skaloud, and M. Victoria-Feser. 2012. A framework for inertial sensor calibration using complex stochastic error models. In Proceedings of the IEEE/ION Position, Location and Navigation Symposium. [44] T. Sugawara, B. Cyr, S. Rampazzi, D. Genkin, and K. Fu. 2019. Light Com- mands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems. arXiv:2006.11946 [45] P. Sweeney. 2002. Error Control Coding: From Theory to Practice. John Wiley Sons, Inc. [46] W. T. Thomson. 1981. Theory of vibration with applications. Prentice Hall.

717