EchoLock: Towards Low Effort Mobile User Identification

Yilin Yang (Rutgers University, yy450@scarletmail.rutgers.edu)
Chen Wang* (Louisiana State University, [email protected])
Yingying Chen (Rutgers University, yingche@scarletmail.rutgers.edu)
Yan Wang (Binghamton University, [email protected])

ABSTRACT
User identification plays a pivotal role in how we interact with our mobile devices. Many existing authentication approaches require active input from the user or specialized sensing hardware, and studies on mobile device usage show significant interest in less inconvenient procedures. In this paper, we propose EchoLock, a low effort identification scheme that validates the user by sensing hand geometry via commodity microphones and speakers. These acoustic signals produce distinct structure-borne sound reflections when contacting the user's hand, which can be used to differentiate between different people based on how they hold their mobile devices. We process these reflections to derive unique acoustic features in both the time and frequency domain, which can effectively represent physiological and behavioral traits, such as hand contours, finger sizes, holding strength, and gesture. Furthermore, learning-based algorithms are developed to robustly identify the user under various environments and conditions. We conduct extensive experiments with 20 participants using different hardware setups in key use case scenarios and study various attack models to demonstrate the performance of our proposed system. Our results show that EchoLock is capable of verifying users with over 90% accuracy, without requiring any active input from the user.

*This work was done during Chen Wang's Ph.D. study at Rutgers University.

Figure 1: Capture of hand biometric information embedded in structure-borne sound using commodity microphones and speakers.

1. INTRODUCTION
User identification is a fundamental and pervasive aspect of modern mobile device usage, both as a means of maintaining security and personalized services. Verifying oneself is necessary to gain access to [...], bank accounts, and customized news feeds; information and resources which must be available on demand. As such, repeated acts of authentication can quickly grow tedious and consume unnecessarily long portions of daily routines involving mobile devices. Studies on cellphone addiction suggest that user identification procedures encompass up to 9.0% of daily usage time [17], with related inquiries showing strong interest in more convenient practices [37]. Techniques such as facial recognition or fingerprinting provide fast validation without requiring considerable effort from the user, but demand dedicated hardware components that may not be available on all devices. This is of particular importance in markets of developing countries, where devices such as the IDEOS must forgo multiple utilities in order to maintain affordable price points (e.g. under $80) [13, 15]. Secure and effective functionality necessitates a lightweight protocol to distinguish between different users, both to facilitate intelligently tailored services as well as maintain privacy.
To this end, we propose EchoLock, an original and inexpensive technique capable of secure, fast, and low-effort user identification utilizing only commodity speakers and microphones. EchoLock uses a carefully designed inaudible signal to measure acoustic reflections and structural vibrations produced by the user's hand when holding their mobile devices, such as smartphones, tablets, smart remotes, or wireless mouses. Sound waves produced by speakers propagate through physical mediums such as the aforementioned devices in the form of structure-borne sound. Pressure from the hand applied to the device has a unique and observable impact on structure-borne sound propagation. This information is representative of the user's hand geometry, a biometric indicator well studied in medical fields and known to be accurate [7], yet rarely employed in mobile ap-

1 plications due to obstacles in obtaining accurate mea- in airborne and structure-borne signals that are hard surements with limited hardware. We find that acous- to distinguish from each other. Moreover, the ambi- tic sensing of these characteristics is not only feasible, ent noises and acoustic signals reflected off surrounding but also low-effort as no conscious action is required obstacles create serious interference that needs to be by the user; holding the device itself is the validating accounted for. Last but not least, many factors could action, rather than typing a password or pressing a fin- impact the robustness of the proposed approach, such gerprint scanner, as shown in Figure 1. This can be as different device shapes or materials. achieved using only speakers and microphones found on To address these challenges, EchoLock utilizes an ul- most commercial-off-the-shelf (COTS) devices, making trasonic signal to sense a user’s mannerisms when hold- our system non-intrusive and low cost. The availability ing a device. A high-frequency, short duration trans- of these hardware components is only expected to in- mission is selected to reduce audible disturbances to the crease with the rising prevalence of integrated IoT de- user and provide prompt validation. The signals coming vices built with virtual assistants and voice controllers, through the structure-born and near-surface airborne projected to reach an install base of over 75 billion by propagation are separated based on different sound speeds 2025 [4]. in the air and solid materials [5]. The system applies a Existing solutions in the market are typically consid- band-pass filter to remove the ambient acoustic noises ered effective and fast, but do have some limitations that do not share the same spectrum as the designated regarding ease of use. Actions such as password en- ultrasonic signal. We derive fine-grained acoustic fea- try, voice utterances, or finger presses demand, however tures, including statistic features in the time and fre- briefly, the user’s attention and active participation in quency domain as well as MFCC features, to capture the process. In contrast, EchoLock is a passive pro- the unique hand holding biometrics. We find simple cedure. Our technique serves as a viable standalone learning-based algorithms (i.e., SVM and LDA) are suf- identification system, or as a complementary system for ficient to robustly identify the user considering various multi-factor authentication due to its naturally low in- impact factors. There is no dependence on specialized volvement. Password security, for example, can be en- cameras or other dedicated hardware to accomplish this, hanced by simultaneously sampling hand geometry dur- requiring only the use of readily accessible microphones ing typing or swiping actions, compensating for com- and speakers found on most modern mobile devices. mon vulnerabilities (e.g. PIN codes spied on through Most significantly, no active input is required from the shoulder-surfing attacks cannot be used if the attacker’s user, making our system low effort. hand is not recognized by the device). Our main contributions in this work are as follows: As a standalone technique, EchoLock is applicable • We develop a low-effort user identification system to a wide variety of services. Resources such as finan- for mobile and smart handheld devices (e.g. 
smart- cial accounts or health apps on smartphones can trig- phones, smart remotes, wireless mouses) that vali- ger a single-instance identification check to verify the dates hand biometric information based on acous- user’s hand before divulging sensitive information. Pri- tic sensing. The proposed system does not require vate message notifications can be displayed or hidden any input from the user and is non-invasive by uti- onscreen depending on which person is currently hold- lizing a inaudible frequencies. Moreover, our min- ing the device. The low costs of small-scale microphone imal hardware requirements make our system easy and speaker sensors can enable even unconventional de- to be deployed on most COTS devices. vices to be compatible with EchoLock. Door handles • We design an acoustic sensing-based approach to augmented with such components can passively sense study the unique signal interferences caused by structure-borne sound propagation as the user holds an individual’s hand on sound propagation result- them and open or lock accordingly. The speed of sound ing from their different curvatures, finger sizes and propagation is rapid, even when traveling through phys- holding behavior. We demonstrate that structure- ical mediums, enabling our identification checks to be born sound propagation of a carefully designed completed within milliseconds. Continuous authenti- acoustic signal is able to carry physiological and cation can also be implemented via periodic measure- behavioral hand information, which previously re- ments. lied on imaging techniques to measure. Building EchoLock for such applications does present • We identify unique acoustic features, including time- many challenges, however. The most prominent is the domain, frequency domain, and MFCC features, challenge in leveraging a single pair of low-fidelity speaker to capture the user’s biometrics and apply signal and microphones to capture the unique characteristics processing and learning-based methods to distin- of a user’s hand geometry, which usually has only minute guish the users based on their phone-holding be- differences between people. In addition, the acoustic haviors for user identification. signal propagating from the device’s speaker to its mi- • We implemented an early prototype of EchoLock crophone usually experiences multipath effect, resulting on various mobile devices and evaluated perfor-

2 mance under multiple conditions. Our tests in- volving over 160 trials of key use case scenarios show identification accuracy upwards of 90%. Top Mic (Mic A) 2. RELATED WORK Reconciling the trade-off of accuracy and usability is a constant challenge of user identification for mobile StructureStructt Path devices. The most popular identification methods are AirborneAirboo Path based on memorization of a text or numerical string SpeakerSpeaker (e.g., a password and PIN) [33]. For these methods, the user must either commit to memory complex knowl- Bottom Mic (Mic B) edge (e.g., random characters or long combinations), or settle for a simpler string at the expense of security. Figure 2: Example sensor configuration and Graph-based authentication, such as lock patterns [36] sound paths for a typical design. and image-based authentications [31, 12], are proposed thentication, which we consider to be the closest cat- to ease the mental burden. They identify a user by ask- egory of works to EchoLock. Ren et al. derive the ing the user to swipe line segments across multiple grid unique gait patterns from the user’s walking behav- points or recognize the user’s pre-selected image from ior to passively verify the user’s identity by using ac- a sequence of randomly displayed pictures. However, celerometer readings in mobile devices [25]. Zheng et these approaches require more time to input graphical al. propose to extract the user’s behavioral patterns of patterns compared to the entry of alphanumerical pass- tapping (e.g., rhythm, strength, and angle words. Moreover, all these knowledge-based credentials preferences of the applied force) from the device’s built- are subject to loss and have low security. in accelerometer, gyroscope, and touchscreen sensor to To relieve the burdens of user input (e.g., memoriza- provide non-intrusive user authentication [39] when the tion and input time) during identification, mobile de- device is in the unlocked status. vices tend to adopt biometric-based approaches, which We summarize these findings in Table 1. Different allow the users to simply use physical traits of a body from other approaches, EchoLock utilizes novel hand- part as credentials. Several popular examples include related physiological biometrics (e.g. hand contours, face ID [1], capacitive fingerprint scanning [10, 2], and size, and finger dimensions) and behavioral bio- iris scanning [3]. These physical traits are typically metrics (e.g. holding strength and behavior) to provide unique to a person and do not change abruptly over convenient and secure user authentication on mobile de- time, making them ideal for identifying purposes. How- vices. In particular, we perform fine-grained acoustic ever, these approaches require expensive or dedicated sensing (i.e. using the near ultrasound emanations of hardware components to obtain users’ physiological char- the mobile device) to capture the unique characteris- acteristics, which limits the deployment of these meth- tics of a user’s holding hand for identification. Acoustic ods on most devices. 
Moreover, researchers show that sensing is an emerging technique that has been widely voices or faces are faster to input than passwords and used in many mobile computing applications (e.g., in- gestures, but users spend over 5s on average to respond door localization [34], computer-human interaction [35, to the system with an active input in addition to the au- 30]) because it is easy to obtain and process the acous- thentication processing time, which is not convenient [32]. tic signals in mobile devices. To our best knowledge, we Human behaviors are considered another type of bio- are the first work to utilize acoustic sensing to capture metrics that can be used for identification. Because biometric information for low effort user identification. they usually encompass traits that are exhibited sub- Our proposed technique does not depend on personally consciously, they are difficult to be stolen or imitated. identifiable information, active user inputs, long input For example, existing works show that it is possible to time, or specialized hardware. identify a user based on finger gestures on the touch screen [28, 29, 26], hand gestures in the air [20], finger input on any solid surface [21], or voice commands [38, 3. FEASIBILITY STUDY 18]. While these approaches are low-cost, they also pose a challenge for identification due to a high level 3.1 Sound Propagation on Mobile Devices of variability when measuring these behaviors with low- Acoustic sensing techniques may vary depending on fidelity sensors available on mobile devices. Moreover, the transmission medium sound propagates through. behavioral-based identification still requires the user to Figure 2 illustrates this using an example diagram of actively input a behavior pattern, which is still not con- a common configuration format for speaker and micro- venient for frequent device unlocking activities. phone placement on a COTS smartphone. Many mobile There is active research on low-effort passive user au- devices are equipped with at least two microphones, po-

Table 1: Qualitative comparison of existing user identification methods.

Identification Technique | Evaluation Category | Personally Identifiable | Physiological Credentials | Behavioral Credentials | Specialized Hardware
Lockscreen Image [11]    | Knowledge           | No  | No  | Yes | No
Face [14]                | Visual              | Yes | Yes | No  | No
Fingerprint [23]         | Visual              | Yes | Yes | No  | Yes
Iris [8]                 | Visual              | Yes | Yes | No  | Yes
Gait [40]                | Visual              | No  | Yes | Yes | No
Voice [19]               | Acoustic            | Yes | Yes | Yes | No
Our Work (EchoLock)      | Acoustic            | No  | Yes | Yes | No

Figure 3: Testing conditions for preliminary environmental differentiation experiments. (a) Held in hand, (b) idle on table, (c) stored in pocket.

Figure 4: Euclidean distance of signal amplitudes recorded from different use case scenarios. Lower distances imply signal similarities.

sitioned opposite of each other at either end of the device. Sound sources, such as speakers of the mobile device, produce acoustic signals, which follow an airborne propagation path to be received by the microphones. For structure-borne propagation, however, the transmission medium is the device itself. This form of sound is most often recognized as vibration and can be perceived both aurally as well as tangibly, due to the nature of the medium. From Hooke's Law [27], the speed of sound through a medium can be represented as a function of its elasticity and density, formulated as c = sqrt(K/ρ), where K is the bulk modulus of elasticity, or Young's modulus, and ρ is the medium density. As seen in Figure 2, the structure path is more direct due to the greater density and compression resistance of the mobile device, allowing sound to travel and be received more quickly compared to in the air. This trait is of interest as propagation through a physical medium provides natural resilience to reflections from distant obstacles as there is minimal deviation from the sound path.
A caveat to this, however, is that structure-borne propagation is much more sensitive to physical disturbances. Interactions such as touching the medium can significantly alter the acoustic patterns as the contact and force exerted upon the medium changes how it reverberates. While this normally poses a challenge for acute acoustic sensing, EchoLock exploits this for the purposes of recognizing individual people. The force of a user's grip on the mobile device is integral to the system, essentially extending the medium to encompass both the device and the user's hand. The bulk modulus of elasticity can be expressed as:

    K = −(p_1 − p_0) / ((V_1 − V_0) / V_0) = ∆P / (∆V / V_0),    (1)

for a given differential change in pressure ∆P and volume ∆V relative to an initial volume V_0. The introduction of a stable additive density V_1 and fluctuating pressure increase p_1 by the user changes K to a dynamic set of elasticity constants. This produces a range of acoustic patterns representative of how the device is held by a specific individual at the time of measurement, uniquely shaped by hand contour, posture, grip pressure, and behavior. Note that this model is an incomplete explanation as it does not account for a distributed application of pressure from different focal points of a user's hand. However, based on our observations, this explanation is sufficient for the purposes of conveying the intuition behind EchoLock's functionality.
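As a rough illustration of why the structure-borne path arrives first, the sketch below plugs representative material constants into c = sqrt(K/ρ). The constants for an aluminum body and for air, and the 15 cm speaker-to-microphone separation, are illustrative assumptions rather than measurements of the test devices.

```python
import math

def sound_speed(bulk_modulus_pa: float, density_kg_m3: float) -> float:
    """Speed of sound c = sqrt(K / rho), following the Hooke's Law model in Section 3.1."""
    return math.sqrt(bulk_modulus_pa / density_kg_m3)

# Illustrative constants (assumed, not measured from the prototype devices):
# aluminum body: K ~ 7.0e10 Pa, rho ~ 2.7e3 kg/m^3; air: K ~ 1.42e5 Pa, rho ~ 1.2 kg/m^3
c_structure = sound_speed(7.0e10, 2.7e3)   # roughly 5,000 m/s through the device body
c_air       = sound_speed(1.42e5, 1.2)     # roughly 340 m/s through air

separation_m = 0.15  # assumed speaker-to-microphone distance on a smartphone
print(f"structure-borne arrival: {1e3 * separation_m / c_structure:.3f} ms")
print(f"airborne arrival:        {1e3 * separation_m / c_air:.3f} ms")
```

With these illustrative constants, the structure-borne copy of the signal leads the airborne copy by a few tenths of a millisecond over a phone-sized path; this ordering is what later allows the two propagation modes to be separated (Section 5.1 reports a gap of about 0.20 ms on a real device, which also depends on the actual materials and paths involved).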

3.2 Structure-borne Propagation Feasibility
To establish the capabilities of structure-borne sound propagation, a simple preliminary experiment was performed to gauge the ability of a mobile device to ascertain environmental conditions using acoustic sensing. The observations from this experiment shaped the design of EchoLock and provided insight into implementation feasibility and potential design challenges. Three different use cases were selected with the intent of demonstrating that conditions with distinct forces exerted upon the mobile device could be easily recognized. Figure 3 shows the particular experimental setup for each of the three scenarios: having the mobile device resting in the user's hand, on a table, or within the user's pocket.
A Galaxy Note 5 was used to conduct the experiment. Due to its relatively large size, the compressing forces acted upon it are more pronounced, as the sound propagation is prolonged by traveling through a larger medium. An inaudible chirp signal sweeping from 16 kHz to 22 kHz was emitted from the device speakers to induce vibration, which is then recorded by a single microphone near the top of the device. This captured recording is then examined for features that may identify environmental origins. The results of these experiments can be seen in Figure 4. Three samples from each use case in Figure 3 were obtained and the average amplitude for each recording was extracted. The absolute difference was then computed for each combination of samples to measure statistical distinction between each use case.
Our findings show a clear relation between samples originating from different use cases. Samples native to the same use case displayed naturally low amplitude difference, which increased when compared to samples from foreign cases. Note across the diagonal that the difference is zero as these are comparisons between a sample with itself. This suggests that, even with minimal sensors and signal processing, structure-borne sound propagation is capable of communicating critical information about the immediate surroundings.
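A minimal sketch of the analysis behind Figure 4 is given below, under the assumption that each chirp recording is available as an array of audio samples; the function and variable names are hypothetical and not taken from the released prototype.

```python
import numpy as np

def mean_amplitude(recording: np.ndarray) -> float:
    """Average absolute amplitude of one chirp recording (Section 3.2)."""
    return float(np.mean(np.abs(recording)))

def amplitude_distance_matrix(recordings: list[np.ndarray]) -> np.ndarray:
    """Pairwise absolute differences of mean amplitudes, as visualized in Figure 4.
    Diagonal entries are zero (a sample compared with itself); small off-diagonal
    values suggest recordings taken under the same use case."""
    amps = np.array([mean_amplitude(r) for r in recordings])
    return np.abs(amps[:, None] - amps[None, :])

# Example usage (hand_samples, table_samples, pocket_samples are lists of recordings):
# distances = amplitude_distance_matrix(hand_samples + table_samples + pocket_samples)
```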
Figure 5: System Overview of EchoLock.

4. SYSTEM OVERVIEW
EchoLock utilizes the properties of sound wave propagation to identify mobile device users, which can be applied to applications such as accessing banking accounts, managing notifications, and unlocking devices. In this section, we introduce the application of acoustic sensing to deduce the identity of the person holding the mobile device via structure-borne sound propagation, the attack model, and the system overview.

4.1 Attack Model
Impersonation Attack. The attacker attempts to mimic the holding posture of the legitimate user to gain access to the device. For impersonation attacks, the attacker may either be informed or uninformed. In the uninformed case, the attacker possesses no knowledge of how to circumvent the identification process and naively attempts to mimic the legitimate user's holding behavior. In the informed case, however, the attacker is explicitly aware of the legitimate user's authentication credentials in some form. This may be through passive observations of the user's hands or interactions such as handshaking. Physically faking the legitimate user's profile, however, requires applying forces to the device such that they create structural deformations similar to how the user's own hands would.
Eavesdropping and Replay Attack. The attacker attempts to steal acoustic credentials of the legitimate user by eavesdropping on instances of identification attempts. This may be done by positioning a microphone near the user as EchoLock is deployed. After obtaining an audio sample of a signal used to authenticate the user, the attacker gains possession of the targeted device and replays the audio sample via an external speaker. During ultrasonic sensing, the mobile device will transmit and record our acoustic signal. Successful replay of an eavesdropped sample necessitates the attacker first suppressing or preventing this signal to avoid overlapping with their own transmission, which is a non-trivial challenge.
Jamming Attack. The attacker in this scenario is focused on deliberate sabotage of genuine authentication attempts. This may be carried out by playing loud noise or ultrasonic frequencies near the user to disrupt the profile estimation procedure. The attacker does not necessarily need to know the user's credentials to jam the system. We assume in our assessment that the attacker will utilize ultrasonic frequencies to decrease the chances of detection by the ordinary user.

4.2 Challenges and Requirements
Although the results from our feasibility study are encouraging, they also highlight critical design challenges.

5 First, using a single built-in speaker and a microphone available on a mobile device to sense the user’s compli- cated physical hand traits (e.g., hand geometry, finger placements and holding strength) is an unexplored area. Second, because the acoustic signals travel too fast com- pared to the small dimensions (i.e., sensing area) of mo- bile devices (e.g., 15 cm between a built-in speaker and microphone for a smartphone), the existing built-in mi- crophone can only receive limited acoustic samples (< 20 samples [30, 35]) to describe a complete propaga- tion. Using such limited acoustic information to de- scribe the complicated interferences caused by people’s hand is difficult. Third, the acoustic signals arriving at Figure 6: Structure-borne, near surface airborne the microphone are the combination of structure-borne sounds and environmental reflections. propagation and airborne propagation; how to separate Derivation. A series of statistical and frequency based or leverage these propagation signals to extract peo- features, further elaborated in Section 5, are identified ple’s hand biometrics is challenging. Fourth, the en- based on their sensitivity to the forces, palm sizes and vironmental reflections of the acoustic sensing signals finger displacements of the user’s hand and selected by and the ambient noises corrupt the received sound and KNN-based Feature Selection. make the acoustic analysis of the user’s holding hand These features are used to develop our learning-based even harder. Besides addressing these challenges, we classifiers, which are responsible for processes we refer also need to consider both security and usability when to as profiling and verification. All extracted features designing the system. In particular, the passive user in- representative of a specific user are compiled into a cell- put to our system should be hard to observe and imitate based data structure and saved to a database of Acous- to meet security requirements. Moreover, our system tic Sensing-based User Hand Profile. This database is needs to have a short acoustic sensing period to avoid then referenced for a profile match when performing the causing additional delay and support the user’s both user authentication. However, even known users may left hand and right hand to maintain high usability. be difficult to verify due to behavioral inconsistencies in how they hold their devices. To combat this, we em- 4.3 System Architecture ploy the Learning-based Classifiers Support Vector Ma- We designed EchoLock as a fast and low effort user chine (SVM) and Linear Discriminant Analysis (LDA) authentication scheme using existing hardware compo- learning techniques to examine numerical distance dis- nents to detect and process structure-borne sound waves. crepancies and deduce the most probable user profile In order to authenticate a user’s identity from these match. Once complete, EchoLock finally concludes the sound waves, we perform an acoustic signal preprocess- process with an identification decision, performing a de- ing phase and a learning-based mobile device holder sired function, such as unlocking the device or denying identification process as illustrated in Figure 5. Echo- access, based on the computed similarities between the Lock uses the Pickup Detection to continuously monitor measured features obtained from ultrasonic sensing and an initiating action, such as the user picking up their database of learned profiles. 
Because of the rapid prop- device from a table or pocket, with the assumed intent agation of sound through a physical medium, the entire that the user wishes to unlock the device. Upon de- process is conducted within milliseconds without any tection of this action, our system performs Ultrasonic conscious exertions from the user. Sensing, where the device’s speaker briefly emits an in- audible acoustic signal (i.e., a chirp signal ranging from 5. LOW EFFORT IDENTIFICATION 18kHz to 22kHz) and immediately after records the user’s reflection via onboard microphones. This proce- 5.1 Sound Propagation Separation dure concludes within milliseconds, depending on signal When a user holds his or her mobile device, EchoLock design parameters we discuss in Section 6.1. utilizes the device’s speaker and microphone to send and Then, this response undergoes our Acoustic Signal receive acoustic sensing signals, which can capture the Preprocessing phase, where we apply a bandpass filter user’s fine-grained hand biometrics. This is because the from 18kHz to 22kHz and perform cross-correlation to holding hand interacts (i.e. block and reflect) with the remove noise and ensure that our obtained signal con- acoustic signals and creates unique patterns associated sists of structure-borne sound. Following the prepro- with the holding hand’s physical characteristics, includ- cessing, the system analyzes the signal to capture the ing the curvature and fat composition of the palm. user’s physiological and behavioral characteristics in the Intuitively, the structure-borne propagation of the Acoustic-based Physiological and Behavioral Biometrics acoustic signals is a good candidate for deriving the

user's hand biometrics, since the structure-borne signals are directly affected by the contacting area and force of the hand holding the mobile device. However, extracting the structure-borne signals only is non-trivial because the built-in microphone captures both structure-borne and airborne signals. Existing studies propose to leverage the difference between the propagation speed of the structure-borne signal and that of the airborne signal to separate them [30, 35]. Due to the small dimensions of mobile devices, the structure-borne signal only arrives a very short time before the airborne signals (e.g., 0.20ms).
In particular, the acoustic signals propagating along the paths close to the surface of the mobile device may be blocked or reflected by the user's holding hand. Figure 6 shows the structure-borne and near surface airborne signals when a smartphone is placed on a table and held by a hand, respectively, when playing and recording a 1ms short chirp signal using the built-in speaker and microphone.
It is apparent that both structure-borne and near surface airborne signals are affected significantly by a hand; however, airborne sound is less reliable for estimating surrounding information due to distortions from the multipath effect. The minute delay of airborne sound may also introduce noise to subsequent structure-borne sound transmissions due to asynchronous arrival times, necessitating separation of the two. We note that while airborne sound is not directly used by our system to identify the user, it does provide an indirect masking effect against eavesdropping attempts by obscuring genuine structure-borne sound to external listeners.

Figure 7: Illustration of the time-domain statistical features to differentiate two people's hands.

Figure 8: Spectral analysis of the received acoustic signal for two people's holding hands.

Frequency Cepstral Coefficient (MFCC) and Chromagram features.
Time-domain Statistic Features. In the time domain, we choose to analyze signals by their statistics, including mean, standard deviation, maximum, minimum, range, kurtosis and skewness. We also estimate the signal's distribution by calculating the second quantile, third quantile, fourth quantile and signal dispersion. Additionally, we examine peak change by deriving the index position of the data point that deviates most significantly from the statistical average. Figure 7 shows an example of the features with the standard deviation, signal dispersion, and range of the received acoustic signal, illustrating that the statistical features can effectively differentiate the user's holding hand.
Frequency-based Features. In the frequency domain, we apply the Fast Fourier Transform (FFT) to the received acoustic signal and derive 256 spectral points to capture the unique characteristics of the user's holding hand in the frequency domain. This is because the holding hand can be considered as a filter, which results in suppressing some frequencies while not affecting others. Figure 8 shows an example of the received sound in the frequency domain, where the frequencies between 19kHz and 21kHz are affected by the user significantly.
Acoustic Features. We also derive the acoustic features from the received sound using the MFCC [22] and Chromagram [24]. MFCC features are normally applied in speech processing studies to describe the short-term power spectrum of the speech sound and are good for reflecting both the linear and non-linear properties of the sound's dynamics.
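To make this feature set concrete, below is a minimal sketch of the time-domain statistics, FFT spectral points, and MFCCs just described, using numpy, scipy, and librosa as stand-in tools (the actual prototype runs on Android and is not shown here); the chroma features discussed next could be added analogously.

```python
import numpy as np
import librosa
from scipy.stats import kurtosis, skew

def time_domain_features(seg: np.ndarray) -> np.ndarray:
    """Statistical features of one received chirp segment (Section 5.2)."""
    q1, q2, q3 = np.percentile(seg, [25, 50, 75])       # one reasonable reading of the quantile features
    peak_index = int(np.argmax(np.abs(seg - seg.mean())))  # sample deviating most from the mean
    return np.array([seg.mean(), seg.std(), seg.max(), seg.min(),
                     seg.max() - seg.min(), kurtosis(seg), skew(seg),
                     q1, q2, q3,
                     q3 - q1,          # "signal dispersion" approximated here as the interquartile range
                     peak_index])

def frequency_features(seg: np.ndarray) -> np.ndarray:
    """256 spectral points from the FFT magnitude of the segment."""
    spectrum = np.abs(np.fft.rfft(seg, n=512))
    return spectrum[:256]

def mfcc_features(seg: np.ndarray, sr: int = 48000) -> np.ndarray:
    """13 MFCCs averaged over the short segment; n_fft kept small for a millisecond-scale chirp."""
    return librosa.feature.mfcc(y=seg.astype(float), sr=sr, n_mfcc=13,
                                n_fft=512, hop_length=256).mean(axis=1)
```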
Chromagram, often referred to as "pitch class profiles", is traditionally utilized to analyze the harmonic and melodic characteristics of music and categorize music pitches into twelve categories. We have observed that the MFCC and Chromagram are sensitive enough to respond to physical biometrics as well. In this work, we derive 13 MFCC features and 12 chroma-based features to describe the different hand holding-related interferences to the sound.

5.3 KNN-based Feature Selection
Some features are more sensitive to the minute dif-

struction phase that is detailed in Section 6.3. During

the user verification phase, EchoLock first classifies the testing data to one user based on the user profile. Then, our algorithm utilizes the prediction probabilities returned by the classifier as a confidence level and applies a threshold-based method to examine the classification

Euclidean Distance 2 Person 1 results. If the confidence level of the classification is Person 2 0 above the threshold, the user is identified. Otherwise, 0 5 10 15 20 Samples our system will determine it as “unknown” and respond Figure 9: Euclidean distance of the standard de- accordingly. This can be used for multiple applications, viation feature for two users relative to a sepa- such as immediately adjusting user settings when a reg- rate instance of the feature for the second user. istered user is detected, or locking devices when an un- known user attempts to gain access. ferences of different hands while the others may not be very effective at distinguishing between them. More- 6. IMPLEMENTATION over, mobile devices from different vendors may have their speaker and microphone embedded at different po- 6.1 Acoustic Sensing Signal Design sitions. These hardware distinctions introduce further The specific design of the acoustic signal emitted is uncertainty when measuring the user using our features. paramount to the performance of EchoLock. Conse- In this work, we develop a K-nearest neighbour (KNN) quentially, measuring biometric properties of a given based feature selection method find the more salient fea- person accurately using only sound propagation requires tures for EchoLock according to the mobile device when our signal to satisfy several design criteria: the user registers the system. In particular, we apply KNN to each type of feature • The signal must be designed in such a way to easily to obtain the clusters for different users. We then cal- distinguish between the structure-borne and airborne culate the Euclidean distance of each feature point to sound propagation. its cluster centroid and that to centroids of other clus- • The transmitted signal should be recognizable such ters. The purpose is to calculate the intra-cluster and that it can be easily identified and segmented amid inter-cluster distances to measure whether a feature is interference from ambient noise and other acoustic consistent for the same user and simultaneously distinct disruptions. This is doubly advantageous for user for different users. Next, we divide the average intra- identification as the procedure should be as fast as cluster distance over the average inter-cluster distance possible to avoid inconveniencing the user. and utilize an experimental threshold to select the fea- tures. The selected features based on KNN are not only • The signal should fall within a safe frequency range sensitive to the user’ hand holding activity but also re- inaudible to average human hearing. This is primar- silient to the other factors such as acoustic noises. ily for the purpose of usability as a noticeably au- dible signal may pose a nuisance to some users if 5.4 Learning-based Holder Verification they are subjected to it every time they wish to ver- ify their identity. Generally speaking, 16kHz is the We develop learning-based algorithms to learn the upper bound of easily detectable sound for ordinary unique characteristics of the user’s hand holding ac- adults [6]. tivity based on the derived acoustic sensing features and determine whether the current device holder is the • The signal must be able to be deployed on a COTS de- legitimate user or not. In particular, we utilize Sup- vice, limiting the viable transmission frequency range. 
port Vector Machines (SVM) and Linear Discriminant Android devices, for example, are reported to have a Analysis (LDA) as two alternative classifiers to classify maximum sampling rate of around 44kHz, limiting a the mobile device holder a registered or unknown user. practical signal to 22kHz at most [16]. Many devices, SVM relies on a hyperplane to divide the input acoustic however, exhibit considerable attenuation problems sensing feature space into the categories with each rep- when transmitting frequencies exceeding 20kHz due resenting a user. The hyperplane is determined during to hardware imperfections in onboard speakers [30, the training phase with the acoustic sensing data from 34, 35, 41]. However, EchoLock actually leverages the registered users. We use LIBSVM with a linear ker- this property as a means of security as these hard- nel to build the SVM classifier [9]. LDA finds a linear ware imperfections introduce variability that is con- combination of features that characterizes or discrimi- sistent with the device origin, making it difficult for nate the acoustic sensing data in different user classes attacking signals to trigger false-positive results due and utilizes Bayes’ theorem for classification. to the challenges of mimicking degradation specific to The classifiers are trained during the user profile con- a particular device.
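As a concrete illustration of the holder verification described in Section 5.4, the sketch below trains the two alternative classifiers and applies a confidence threshold to reject unknown holders. scikit-learn is used here as a stand-in for the LIBSVM-based implementation, and the threshold value is an assumed placeholder rather than the experimentally chosen one.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_classifiers(X_train: np.ndarray, y_train: np.ndarray):
    """Train the two alternative classifiers on registered users' acoustic feature vectors."""
    svm = SVC(kernel="linear", probability=True).fit(X_train, y_train)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    return svm, lda

def verify_holder(clf, features: np.ndarray, threshold: float = 0.8):
    """Classify one sensing instance; below-threshold confidence is reported as 'unknown'."""
    proba = clf.predict_proba(features.reshape(1, -1))[0]
    best = int(np.argmax(proba))
    if proba[best] < threshold:           # confidence-level check (Section 5.4)
        return "unknown", float(proba[best])
    return clf.classes_[best], float(proba[best])
```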

Noise Removal. Thanks to the design of our acoustic sensing signal, we can separate our transmission from interference by using a band-pass filter. In particular, we design a band-pass filter with the pass band through

Power (dB) 18 Power (dB) -100 18kHz to 22kHz, which is the expected frequency range Frequency (kHz) 5 Frequency (kHz) of the transmitted chirp signal. We apply a Butterworth 16 -120 -150 0 low-pass and high-pass filter at the specified frequencies 2 4 6 8 10 200 400 600 800 Time (s) Time (ms) to achieve this. (a) Full Recording (b) Extracted Signal Signal Segmentation. After noise filtering, we need Figure 10: Signal segmentation on a microphone to identify the precise segment containing the structure- recording containing the desired acoustic fea- borne sound. This can be accomplished by using cross- tures. correlation to locate similarities between the recorded acoustic signal with the original transmission. Figure With these considerations, we choose a chirp signal 10 shows the acoustic signal before and after the pro- sweeping from 18kHz to 22kHz. This chirp consists of posed noise removal and signal segmentation, using a 1200 samples, equating to a 25ms duration at a 48kHz volunteer holding a Note 5 as an ex- sampling frequency. During ultrasonic sensing, the recorded ample. The figure demonstrates that the acoustic signal structure-borne propagation of the chirp signal will be received is filtered and and truncated to contain only imbedded with information on the user’s hand geome- the desired frequency range of the signal transmitted, try. While a shorter signal minimizes exposure to en- suggesting that our methods can successfully remove vironmental reflections, it also limits the signal-to-noise noises and precisely extract the information containing ratio (SNR). To balance these two considerations, we the user’s hand characteristics. transmit a series of consecutive chirps, separated by 25ms of empty buffers to stagger the arrival of airborne 6.3 User Profile Construction reflections from structure-borne sound. We refer to this EchoLock requires the user to provide the training design as a n-chirp sequence where n is the number of data for user profile construction during registration. repeating chirps in the signal used during implementa- The acoustic sensing signals are emitted during this pro- tion. By utilizing multiple chirps, we can also gather cess, and the acoustic sensing features (in Section 5.2) multiple user samples in a single ultrasonic sensing in- are derived and labeled to train the learning-based clas- stance. This leads to a natural tradeoff dilemma be- sifier as described in Section 5.4. An existing data set of tween higher classification accuracy and shorter time anonymized user profiles is included to serve as negative delays. We show in further detail the performance ac- labels when classifying the user during identification at- curacy for increasingly large n values in Section 7.4. tempts. Registering multiple users to the same device also serves to expand this data set. To ensure robust- 6.2 Noise Removal and Signal Segmentation ness and low false negative rates, the user is advised As described previously in Section 3, structure-borne to vary holding behavior multiple times rather than re- sound propagation is capable of conveying information main still to train the data. This can be verified using representative of the surrounding environment. In order motion sensors to detect change in device holding pos- to extract this information, it is necessary to remove the ture. interference generated by various sources (i.e. irregular vibration patterns and ambient background noise) and 7. 
PERFORMANCE EVALUATION determine the precise segments of the acoustic signal containing the target structural sound pattern. A naive 7.1 Experimental Setup assertion could be made suggesting that the system can We study the performance of EchoLock in a variety be designed to transmit and recapture the sound in a of common use case scenarios as well as on several mo- synchronized time interval based on the speed of sound bile devices. Our experiments test the capability of our through a physical medium. Not only does this strategy system to lock or unlock access to mobile devices as an disregard the minute, unpredictable time delays intro- example application, approved by our institute IRB. We duced as a result of multiple layers of hardware and present our findings and detailed analysis below. operations, it also requires precise knowledge Experimental Setup. A prototype application for regarding the positioning of speakers and microphones EchoLock was developed for use on Android devices. relative to each other, as well as the particular mate- Three Android devices, the , Galaxy rial the device is constructed from. These conditions Note 5, and Galaxy Tab A, were selected for their var- are neither consistent nor easy to acquire across mul- ied designs (e.g., speaker and microphone positions) tiple devices, thus a more reliable procedure must be and size discrepancies, pictured in Figure 11. Both the developed. Nexus 5 and Galaxy Note 5 devices include two onboard

9 Accuracy: 95.25% 1 19 1 1 2 19 1 3 1 19 4 19 1 5 20 6 19 1 7 19 1 8 19 9 19 10 19 11 19 12 19 Output Class 13 18 14 18 15 1 19 16 18 17 1 1 19 18 1 1 1 1 1 1 1 1 20 19 20 20 1 20 (a) Nexus 5 (b) Galaxy Note 5 (c) Galaxy Tab A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Target Class Figure 11: Mobile device specifications and lo- Figure 12: Confusion matrix of user identifica- cations of key sensor components. tion when deploying EchoLock on the Nexus 5. microphones whereas the Galaxy Tab A is equipped ing classification, with the exception of the target user with one. Two types of environments explored were of- undergoing identification. Our data collection process fice and public settings. Office settings consist of quiet, spanned a 2 month period with volunteers providing enclosed spaces with minimal disturbances whereas pub- data across different days. While this data is used for lic settings are locations with large volumes of people user identification purposes, we do not consider it to be and traffic. We maintain an average noise level of ap- personally identifiable information (as mentioned in Ta- proximately 30dB and 60dB for the two environments, ble 1). Nonetheless, we are currently maintaining this respectively. Sources of noise for the public environment data privately and do not plan to release it publicly as include nearby conversations, walking, and dining. We a precaution. gauge the ability of our system to accurately identify 7.2 Evaluation Metrics the user in the face of these obstacles. Additionally, we also investigate the impact of accessories that may We describe the accuracy of our system by evaluating transform the properties of the user’s hand or device two criteria from our results, precision and recall. Pre- structure, such as gloves or smartphone cases (exam- cision is defined as the percentage of True Positive (TP) ples in Appendix). From these identified factors, we classifications out of all positive classifications recorded, TP devised several scenarios to study. notated as P = TP +FP where FP is the false positive TP Data Collection. 8 use case experiments were de- rate. Recall is defined as R = TP +FN or the percentage vised in total, divided into 3 general categories. The of true positive classifications out of all target class in- first and second categories study performance differ- stances. For our purposes, higher precision describes ences across device models and surrounding noise levels, lower probability for different people to be mistaken respectively, comprised of the Cartesian product of our for the legitimate user while higher recall describes the 3 devices and 2 environmental conditions. The third lower probability that the legitimate user is misidenti- category considers usage via indirect physical contact, fied as someone else. which includes when the user has equipped a protec- tive case to their device and when the user is wearing 7.3 User Identification Performance a glove while holding their device. To reduce the num- From our signal processing procedures, we obtain sev- ber of factors at play during our third category of tests, eral features used to identify the user and present them we confine our study to the Galaxy Note 5 device in to a machine learning classifier for authentication. To office settings. 
No specific instruction was provided to evaluate the performance of our implementation, we the participants on how to hold the device to encour- consider Decision Trees (DT), Linear Discriminant Anal- age more natural interactions. However, we find that ysis (LDA), K-Nearest Neighbor (KNN), and Support almost all participants favored holding postures similar Vector Machines (SVM) as our candidate classifiers. to those shown in Figure 11. This is both beneficial Figure 12 shows an example confusion matrix demon- and challenging as this adds consistency to our data set strating user identification accuracy for a group of 20 while also making it less simple to differentiate usage users in office settings. From our initial comparisons of behaviors. We recruit 20 volunteers, 14 males and 6 each classifier’s ability to distinguish between 5 different females ranging from ages 18-35, to participate in our users, we observe LDA and SVM to demonstrate rea- study. We collect 40 n-chirp sequences, where n=10, for sonable validation, showing accuracy of 95% and 91%, each test case based on the procedure in Section 6.3 for respectively, and smaller range compared to DT and a total of 64,000 hand geometry samples. The profiles of KNN. As a result, we present our findings for these two all volunteers collectively act as the negative label dur- classifiers for our extended evaluations. We choose 10-

10 1 1 1 1

0.8 0.8 0.8 0.8

0.6 0.6 0.6 0.6

0.4 0.4 0.4 0.4 Accuracy Accuracy Accuracy Accuracy

0.2 0.2 0.2 0.2 Precision Precision Precision Precision Recall Recall Recall Recall 0 0 0 0 Nexus 5 Galaxy Note 5 Galaxy Tab A Nexus 5 Galaxy Note 5 Galaxy Tab A Direct Contact Case Glove Direct Contact Case Glove (a) SVM (b) LDA (a) SVM (b) LDA Figure 13: Performance for different mobile de- Figure 15: Performance when using the mobile vice models when used in office and public set- device through direct vs. indirect physical con- tings. tact. Results shown are for the Galaxy Note 5.

1 1 We suspect this is attributed to the lack of secondary 0.8 0.8 microphones on the Tab A device, which limits noise- 0.6 0.6 cancellation capabilities compared to the smartphone

0.4 0.4 devices. While interference from sources such as people Accuracy Accuracy walking or adjusting chairs caused an expected decrease 0.2 0.2 Precision Precision Recall Recall in accuracy, we have found that loud vocalizations pro- 0 0 Office Public Office Public duced more significant performance degradation. We (a) SVM (b) LDA attribute this to our acoustic features, specifically our usage of MFCC. As MFCCs are most commonly used Figure 14: Performance when identifying users in speech processing, the presence of loud voices nearby in different environments. causes our biometric measurements to be overwritten fold cross-validation during the training process to best by the more dominant speech characteristics. utilize our data set and minimize selective bias, allocat- Indirect Physical Contact. We also consider the ing 50% for training and the remaining 50% for testing. influence of factors that may transform properties of ei- ther the device or user’s hand when using EchoLock. 7.4 Impact Factor Study Figure 15 shows our results for usage during situations Impact of Device Models. We consider the perfor- where the user’s hand does not directly make contact mance discrepancies when operating EchoLock on dif- with the mobile device. To simulate these conditions, ferent mobile devices. Our participants are provided we equip our Galaxy Note 5 device with a smartphone devices with our prototype installed and given a short case to change the physical properties of the medium. explanation on its functionality. We note that a single We also provide wool gloves roughly 2mm thick for the demonstration less than 10 seconds long is sufficient for user to wear during separate sets of experiments. We our participants to grasp how to operate EchoLock, in- conduct our tests under office settings and compare to dicating its ease of use. We graph P and R in Figure 13 previously observed performance. Our findings do not for the Nexus 5, Galaxy Note 5, and Tab A usage in show statistically significant deterioration for these con- office and public settings, and observe precision rates ditions tested. On the contrary, some users showed of 91%, 90%, and 81%, respectively, when classified us- slightly improved accuracy ranging from 1-3%, partic- ing our LDA classifier. When evaluated using our SVM ularly when wearing gloves. We suggest that the mate- classifier, we find precision rates of 94%, 80%, and 69%, rial of the gloves conformed well to the curvature of the again respectively. Similar trends follow for recall rates. hand while simultaneously suppressing certain variance We note that performance can be correlated to the size in grip behavior, such as minor fidgeting or twitching. of the device used, as users more easily acclimate to a Impact of Embedded Microphone Location. For favored posture for smaller devices compared to larger devices with more than a single onboard microphone, devices. We refer to these results, particularly for the quality of identification can vary noticeably. Figure 16 Galaxy Note 5, as a benchmark of other experiments. shows an example of two microphones’ ability to dis- Impact of Environmental Noises. Figure 14 shows tinguish separate people based on features extracted performance differences when considering the surround- from their recordings. Structure-borne sound originat- ing environments during user authentication. 
The intro- ing from the bottom speaker will be considerably damp- duction of significant ambient noise produces a notice- ened by the force of the user squeezing the device when able effect on accuracy, showing precision decline from received by the bottom microphone. For the top mi- 84% to 78% in the case of SVM and 92% to 81% for crophone, this sound is more strongly influenced by the LDA. Our smartphone devices showed greater resilience positioning of the thumb and fingers, which shapes the under these conditions compared to our tablet device. signal as it decays due to the longer distance traveled.

11 Accuracy: 96.00% Accuracy: 61.00% 1 1 100.0% 0.0% 0.0% 0.0% 0.0% 60.0% 20.0% 0.0% 35.0% 5.0% 1 1 20 0 0 0 0 12 4 0 7 1 0.8 0.8 0.0% 100.0% 0.0% 0.0% 0.0% 5.0% 50.0% 0.0% 10.0% 20.0% 2 2 0 20 0 0 0 1 10 0 2 4 0.6 0.6 0.0% 0.0% 95.0% 15.0% 0.0% 20.0% 5.0% 100.0% 15.0% 0.0% 3 3

0 0 19 3 0 4 1 20 3 0 TPR 0.4 0.4 0.0% 0.0% 5.0% 85.0% 0.0% 10.0% 25.0% 0.0% 25.0% 5.0% Accuracy

Output Class 4 Output Class 4 0 0 1 17 0 2 5 0 5 1 Nexus 5 0.2 0.2 0.0% 0.0% 0.0% 0.0% 100.0% 5.0% 0.0% 0.0% 15.0% 70.0% Galaxy Note 5 True Positive 5 5 0 0 0 0 20 1 0 0 3 14 Galaxy Tab A False Positive 0 0 1 2 3 4 5 1 2 3 4 5 0 0.2 0.4 0.6 0.8 1 1 2 3 4 5 6 7 8 9 10 Target Class Target Class FPR User Number (a) Top microphone (b) Bottom microphone (a) ROC curve for user iden- (b) TP and FP for eavesdrop- tification aross 3 devices ping and replay attacks. Figure 16: Classification accuracy for two avail- able microphones based on signal transmission Figure 18: Performance of user identification from a bottom-positioned speaker. during evaluation of impersonation and eaves- 1 dropping/replay attacks.

0.8

0.6 Impersonation Attacks. We evaluate the possibil- ity of potential attackers to impersonate hand profiles

Accuracy 0.4 of other users as a means of gaining unauthorized ac- 0.2 Nexus 5 Galaxy Note 5 cess to devices and information. For our assessment, Galaxy Tab A 0 we consider the worst case scenario; a limited (i.e. 5) 1 2 3 4 5 Size of N training sample size for the victim user and multiple Figure 17: Performance of EchoLock as length informed attackers. 20 participants first observe an in- of n-chirp sequence increases. stance of the designated victim using our hardware plat- forms and given 10 attempts to imitate their hold for When evaluated using 100 samples provided from 5 peo- each device. We measure the True Positive (TP) and ple, we observe identification accuracy of 96% and 61% False Positive (FP) rates and plot the ROC curve in for the top and bottom microphones, respectively. The Figure 18(a), showing FP rates as low as 6% for a 90% comparatively less accurate rate for the bottom micro- TP rate for devices such as the Nexus 5. These re- phone is attributed to the shorter path from the source sults suggest that observation alone is not sufficient to speaker to the receiving microphone, preventing suffi- impersonate another hand profile; success is dependent cient vibration and feature formation. This stipulates on physical similarity between the attacker and victim, that our speaker and microphone must be orientated which the attacker cannot control. further apart such that they encompass as much of Eavesdropping and Replay Attacks. We assessed the device as possible for optimal measurement. As the viability of eavesdropping information by during our such, we compute our results using data provided from studies on standard usage behavior. Figure 19 shows an speaker and microphone pairs with the greatest amount example configuration of hardware during these exper- of separation when an option to select exists. iments. One of our mobile devices acts as a malicious N-Chirp Sequence Length. We investigate the ef- sensor, listening for the validating signal used to iden- fect of chirp sequence length on identification accuracy tify the user. Filtered signals recovered from these sim- as a consideration to improve performance, shown in ulated attacks in an office setting showed recognizable Figure 17. While increasing the size of n used for sens- chirp patterns, as depicted in Figure 20, however clar- ing naturally enhances accuracy, we observe an onset of ity is lost due to the multipath effect. In trials with 10 diminishing returns rapidly after roughly 5 iterations. users, 9 acting as victims and 1 as an attacker, directly Stability around 90% can be maintained for n values replaying these recordings failed to grant the attacker as low as 3 in optimal test cases (i.e. smaller devices, access to the victim’s device in all instances, depicted quiet settings). As mentioned in Section 6.1, a single in Figure 18(b). Recorded signals must travel through chirp iteration requires only 25ms, or 50ms when ac- two airborne paths, once from the victim device to the counting for buffering between iterations, meaning that eavesdropper and vice versa, causing significant atten- our current performance can be feasibly achieved in exe- uation and loss of genuine structure-borne properties. cution times competitive with techniques such as finger- Sophisticated attackers may attempt to apply noise can- printing. 
7.5 Attacks on User Credentials

Impersonation Attacks. We evaluate the possibility of attackers impersonating the hand profiles of other users as a means of gaining unauthorized access to devices and information. For our assessment, we consider the worst-case scenario: a limited training sample size (i.e., 5 samples) for the victim user and multiple informed attackers. Twenty participants first observe an instance of the designated victim using our hardware platforms and are then given 10 attempts to imitate their hold for each device. We measure the True Positive (TP) and False Positive (FP) rates and plot the ROC curve in Figure 18(a), showing FP rates as low as 6% for a 90% TP rate on devices such as the Nexus 5. These results suggest that observation alone is not sufficient to impersonate another hand profile; success is dependent on physical similarity between the attacker and victim, which the attacker cannot control.

Eavesdropping and Replay Attacks. We assess the viability of eavesdropping on identification signals during our studies on standard usage behavior. Figure 19 shows an example configuration of hardware during these experiments. One of our mobile devices acts as a malicious sensor, listening for the validating signal used to identify the user. Filtered signals recovered from these simulated attacks in an office setting showed recognizable chirp patterns, as depicted in Figure 20; however, clarity is lost due to the multipath effect. In trials with 10 users, 9 acting as victims and 1 as an attacker, directly replaying these recordings failed to grant the attacker access to the victim's device in all instances, as depicted in Figure 18(b). Recorded signals must travel through two airborne paths, once from the victim device to the eavesdropper and again when replayed back to the victim, causing significant attenuation and loss of genuine structure-borne properties. Sophisticated attackers may attempt to apply noise cancellation and amplification to minimize attenuation, but careless processing of this signal risks destroying the embedded features used to validate the user.
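The TP/FP bookkeeping behind Figure 18 can be sketched as follows, assuming each identification attempt yields a classifier confidence score for the claimed identity; the scores and labels here are placeholders rather than measurements from our study.

```python
# Minimal sketch of computing TP/FP rates and an ROC curve from per-attempt
# confidence scores. Scores and labels below are placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_curve

# 1 = genuine attempt by the enrolled user, 0 = impersonation/replay attempt
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
scores = np.array([0.91, 0.84, 0.88, 0.79, 0.95, 0.32, 0.41, 0.86, 0.55, 0.37])

fpr, tpr, thresholds = roc_curve(labels, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  TPR={t:.2f}  FPR={f:.2f}")

# Operating point: best TPR achievable while keeping FPR under a target.
target_fpr = 0.06
ok = np.where(fpr <= target_fpr)[0]
print("best TPR at FPR <= 6%:", tpr[ok].max() if len(ok) else "n/a")
```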

Figure 19: Example attack model setup.

Figure 20: Frequency clarity throughout eavesdropping and replay attack 0.2 m from victim. (a) Signal on victim device. (b) Eavesdropped signal. (c) Replayed signal observed by victim.

8. DISCUSSION

Optimal Scenarios for Classifiers. Our findings show situational advantages for employing SVM and LDA classifiers for user identification. Figures 13 and 14 indicate that both the device model and the surrounding environment contribute significantly to the performance of EchoLock. These results indicate that our LDA classifier is more platform-agnostic, showing less than 1% difference in precision and recall across different smartphones and under 10% difference between smartphones and tablets. In contrast, we note that SVM is less susceptible to changes in performance caused by additional background noise, showing an average 6% difference between office and public spaces compared to 11% for LDA. These findings are intuitive given the nature of the respective classifiers: the linearity of LDA is suited to situations with consistency (i.e., a quiet space with devices and hand geometry that do not change), whereas the introduction of unpredictable factors requires the adaptability offered by techniques such as SVM.
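As a reference point for this comparison, the following is a minimal sketch of cross-validating SVM and LDA classifiers with scikit-learn; the feature matrix and labels are random stand-ins for the time- and frequency-domain features EchoLock extracts, not data from our experiments.

```python
# Minimal sketch of the SVM vs. LDA comparison on extracted acoustic features.
# Feature vectors and labels are random placeholders, not experimental data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))        # 200 samples x 40 acoustic features (placeholder)
y = rng.integers(0, 10, size=200)     # 10 users

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

for name, clf in [("SVM", svm), ("LDA", lda)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")
```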
Figure 21: Identification of jamming attempts within the system operational frequency range. (a) Frequency domain under ordinary usage. (b) Frequency domain under jamming attack.

Jamming Attacks. We also consider the possibility of acoustic disruptions to our performance via jamming strategies. To do so, we study the ability of 10 users to use our system while an attacking device plays a continuous signal within the operational frequency range of EchoLock. Figure 21 shows examples of the frequency domain during these experiments. We play an attacking signal continuously sweeping from 18 kHz to 22 kHz during ten identification attempts per user at a distance of 0.2 m. We find that detection of these jamming signals is feasible using a threshold scheme. We note that although detection is possible, negating the interference still poses a challenge. However, jamming attempts from distances greater than 1 m were observed to lose potency, functioning more similarly to the public environment conditions described in Section 7.4. Methods to avoid threshold detection may attempt to limit the extent to which they exceed normal signal intensity; however, this requires careful synchronization on the scale of milliseconds. Additionally, the jamming signal must match the length of the n-chirp sequence, which the attacker cannot predict. We are further improving resistance to these attacks as part of our future work.
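One plausible form of the threshold scheme mentioned above is to compare the energy within the 18-22 kHz operational band of each recorded frame against a baseline captured during normal operation; the frame length and margin factor below are illustrative assumptions, not parameters fixed by our prototype.

```python
# Sketch of band-energy jamming detection: flag a frame when its energy in the
# 18-22 kHz band greatly exceeds a quiet-time baseline. Margin is an assumption.
import numpy as np

FS = 48_000                 # sampling rate (Hz); assumed
BAND = (18_000, 22_000)     # operational band of the probe signal

def band_energy(frame: np.ndarray) -> float:
    """Mean spectral power of a frame inside the operational band."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    mask = (freqs >= BAND[0]) & (freqs <= BAND[1])
    return float(spectrum[mask].mean())

def is_jammed(frame: np.ndarray, baseline: float, margin: float = 5.0) -> bool:
    """Flag frames whose in-band energy exceeds the quiet baseline by `margin` times."""
    return band_energy(frame) > margin * baseline

# Toy example: a quiet frame vs. the same frame with a strong in-band tone added.
rng = np.random.default_rng(1)
t = np.arange(2048) / FS
quiet = 0.01 * rng.normal(size=t.size)
jammed = quiet + np.sin(2 * np.pi * 20_000 * t)
baseline = band_energy(quiet)
print(is_jammed(quiet, baseline), is_jammed(jammed, baseline))   # False True
```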

Potential Hardware Constraints. During our selection of candidate devices to experiment on, we became aware of certain hardware configurations unsuitable for implementing EchoLock (e.g., speakers on the front face, microphone on the back). We determine that our implementation requires the hardware components to be oriented as distant from each other as possible to allow for uninterrupted sound propagation, which cannot be accurately measured with a backward-facing microphone. Adherence to this may be somewhat relaxed by leveraging gyroscopic measurements to compensate for inconvenient speaker or microphone placement, though this also adds hardware requirements to an intentionally minimalist design.

9. CONCLUSION

We have proposed EchoLock, an inexpensive, non-intrusive, and lightweight identification protocol deployable on commodity mobile devices or smart IoT devices. Our system validates the user's identity using a novel technique that leverages acoustic sensing of structure-borne sound propagation to ascertain biometric characteristics based on how the user holds their devices. A prototype of EchoLock has been implemented on multiple Android platforms and evaluated in key use case scenarios. Our identification technique is quick to conduct, low effort to use, and demonstrates accuracy over 90%. Our current prototype evaluates EchoLock as a standalone identification protocol; for future work, we intend to integrate it with existing authentication techniques and assess the possibility of elevating current security rates. Furthermore, we also intend to enhance our ability to defend against more sophisticated attack models. While we have demonstrated EchoLock on select mobile devices, we are also exploring other hardware options and developing a more robust implementation to realize high-accuracy, low-effort identification.

APPENDIX

Figure 22: Considerations for device usage through indirect physical contact. (a) Glove equipped. (b) Protective case. Protective case is detached for visual clarity.
