Exploring Audio and Kinetic Sensing on Earable Devices
Total Page:16
File Type:pdf, Size:1020Kb
Exploring Audio and Kinetic Sensing on Earable Devices Chulhong Min*, Akhil Mathur*, Fahim Kawsar*y *Nokia Bell Labs, yTU Delft {chulhong.min, akhil.mathur, fahim.kawsar}@nokia-bell-labs.com ABSTRACT eating detection [2], sleep monitoring [14], and energy expenditure In this paper, we explore audio and kinetic sensing on earable de- monitoring [8]. However, they aimed at detecting specific activities vices with the commercial on-the-shelf form factor. For the study, and were mostly evaluated with bulky hardware prototypes. Recently, we prototyped earbud devices with a 6-axis inertial measurement sensor-equipped earbuds are commercially released, but the research unit and a microphone. We systematically investigate the differen- on such COTS-formed devices is limited due to their inaccessible tial characteristics of the audio and inertial signals to assess their application programming interfaces (APIs). feasibility in human activity recognition. Our results demonstrate In this paper, we explore audio and kinetic sensing on COTS- that earable devices have a superior signal-to-noise ratio under the formed earbud devices for a range of human-sensing tasks in the wild. influence of motion artefacts and are less susceptible to acoustic To this end, we prototyped an in-ear wearable instrumented with a environment noise. We then present a set of activity primitives and microphone, a 6-axis inertial measurement unit (IMU), and dual- corresponding signal processing pipelines to showcase the capabili- mode Bluetooth and Bluetooth Low Energy (BLE). We designed the ties of earbud devices in converting accelerometer, gyroscope, and hardware prototype in an aesthetically pleasing, and ergonomically audio signals into the targeted human activities with a mean accuracy comfortable form factor (See Figure 1). reaching up to 88% in varying environmental conditions. With this pre-production and a working prototype, we system- atically explore the differential characteristics of the inertial and CCS CONCEPTS audio signals in various experimental settings. We looked at how earables compare against a smartphone and a smartwatch concerning • Human-centered computing ! Ubiquitous and mobile com- some key factors that impact activity recognition pipelines, includ- puting systems and tools; ing signal to noise ratio, placement invariance, and sensitivity to KEYWORDS motion artefacts. Analysis of these experimental results suggests that earable sensing is robust in modelling these signals and in most Earable, Earbud, Audio sensing, Kinetic sensing conditions demonstrates superior performance concerning signal stability and noise sensitivities. Inspired by these characteristics, we 1 INTRODUCTION then design a set of human activity primitives. Activity classifiers The era of earables has arrived. Apple fundamentally changed the are then trained to model these activities with audio and motion data. dynamics of the markets for wireless headphones after it removed the Early experimental results show that earable sensing can reach up to 3.5mm audio jack from iPhone 7. With the explosive growth of the 88% detection accuracy of targeted human activities. markets, wireless earbuds are also becoming smarter by adopting context monitoring capabilities and conversational interface. Re- 2 RELATED WORK cently, sensor-equipped smart earbuds have been actively released Earbuds have been primarily used for a variety of health monitoring into the market, e.g., Apple AirPods, Google Pixel Buds, and Sony applications. LeBoeuf et al. [8] developed a miniaturised optome- Mobile Xperia Ear. Beyond high-quality audio, they are expected chanical earbud sensor to estimate oxygen consumption and measure to reshape our everyday experiences with new, useful, and exciting blood flow information during daily life activities. In a recent work, services. However, their monitoring capabilities are still limited to Bedri et al. [2] presented EarBit, a multisensory system for detecting the narrow set of exercise-related physical activities. eating episodes in unconstrained environments. The EarBit proto- One of the barriers for modern earables in modelling richer and type consisted of two IMUs, a proximity sensor, and a microphone, wider human activities is limited understanding of audio and kinetic and it detected chewing events in-the-wild. The authors also pre- sensing on COTS-formed earable devices. Over the last decade, there sented an insightful analysis on the choice and placement of sensors have been active research efforts to leverage earable sensing, e.g. for inside the prototype device. Amft et al. [1] placed a microphone inside the ear canal to detect eating activities and to classify them Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed into four food types. While these works have looked at using ear- for profit or commercial advantage and that copies bear this notice and the full citation worn devices for detecting specific activities with bulky hardware on the first page. Copyrights for components of this work owned by others than the prototypes, the goal of this work is to provide a broad understanding author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission of audio and kinetic sensing on COTS-formed in-ear earbuds for a and/or a fee. Request permissions from [email protected]. range of human activities. WearSys’18, June 10, 2018, Munich, Germany Audio and kinetic sensing has been used extensively for detecting © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-5842-2/18/06. $15.00 human activities and context. Dong et al. [3] used wrist-worn 6-axis https://doi.org/http://dx.doi.org/10.1145/3211960.3211970 IMU to detect eating episodes. Thomaz et al. [17] used a smartwatch WearSys’18, June 10, 2018, Munich, Germany Chulhong Min*, Akhil Mathur*, Fahim Kawsar*y Earbud Phone Watch 120 120 100 100 80 80 60 60 SNR (dB) SNR (dB) 40 40 20 20 0 0 Sitting Standing Walking Sitting Standing Walking (a) Accelerometer (b) Gyroscope Figure 1: Prototype design of earbuds. Figure 2: SNR of inertial data across multiple devices. accelerometer to detect the motion of bringing food to the mouth. Hammerla et al. [5] proposed several deep learning models to detect [13, 15]. Instead, we primarily aim at uncovering the differential physical activities using wearables. Acoustic sensing approaches characteristics in sensor signals induced by different devices and have been used to infer human activities and states such as eating [1] their positions and orientations on the human body. and coughing [7]. In this work, we aim at systematically exploring these activities using COTS-formed earbud devices. To this end, we 4.1 Characterising Inertial Signals conducted a comparative study of the signal characteristics and de- We start by looking at the behaviour of the inertial signals of the tection performance on earable devices and commodity smartphones earbud and in particular, we study the signal-to-noise ratio and signal and smartwatches. sensitivity as a function of placement in two controlled experiments. 3 HARDWARE PROTOTYPE 4.1.1 Understanding Signal-to-Noise Ratio. For the study, we prototyped earbud devices with multiple sensing Objective: Inertial sensor data obtained from earbuds, a smart- modalities. One of the key design goals is to leverage an established phone, and a smartwatch are directly influenced by the movement form while uncovering opportunities for multi-sensory experience of corresponding body parts. The objective of this experiment is to design. As such, we have chosen to augment an emerging True Wire- understand how IMU signals behave as it is placed at a different part less Stereo earpiece primarily used for hands-free audio experiences of our body, e.g., in an ear, on a wrist, etc. - music playback, seamless conversation, etc. This design choice Experimental setup: To quantify the impact of the placement demands that the critical functional attributes are kept as original as on activity monitoring, we collected the inertial sensor data from possible without compromising their operational behaviour. Our ear- earbuds, a smartphone, and a smartwatch, simultaneously. We re- bud prototype first and foremost is a comfortable earpiece capable of cruited ten users, and they performed each activity (sitting, standing, producing high definition wireless audio experience in a compelling and walking) for one minute. Then, we compared the signal-to-noise form. The earbud is instrumented with a microphone, a 6-axis iner- ratio (SNR) values across the devices. tial measurement unit, Bluetooth, and BLE and powered by a CSR Results and implications: As shown in Figure 2, earbuds and a processor. Figure 1 shows its final form. Each earbud equips 40mAh smartwatch have higher SNR compared to a smartphone, 20-30dB battery capacity, weights 20 gram, and has the physical dimension higher; higher SNR represents less noisy signals, i.e., higher varia- of 18mm × 18mm × 20mm including the enclosure. tion compared to the stationary situation. It is reasonable considering that the arms and head have a relatively higher level of freedom of 4 UNDERSTANDING SIGNAL BEHAVIOUR movement, compared to the thigh. For example, the arm will move We investigate the opportunities and challenges in human sensing