DSP.Ear: Leveraging Co-Processor Support for Continuous Audio Sensing on Smartphones
Petko Georgiev§, Nicholas D. Lane†, Kiran K. Rachuri§‡, Cecilia Mascolo§
§University of Cambridge, †Microsoft Research, ‡Samsung Research America

Abstract
The rapidly growing adoption of sensor-enabled smartphones has greatly fueled the proliferation of applications that use phone sensors to monitor user behavior. A central sensor among these is the microphone which enables, for instance, the detection of valence in speech, or the identification of speakers. Deploying multiple of these applications on a mobile device to continuously monitor the audio environment allows for the acquisition of a diverse range of sound-related contextual inferences. However, the cumulative processing burden critically impacts the phone battery. To address this problem, we propose DSP.Ear – an integrated sensing system that takes advantage of the latest low-power DSP co-processor technology in commodity mobile devices to enable the continuous and simultaneous operation of multiple established algorithms that perform complex audio inferences. The system extracts emotions from voice, estimates the number of people in a room, identifies the speakers, and detects commonly found ambient sounds, while critically incurring little overhead to the device battery. This is achieved through a series of pipeline optimizations that allow the computation to remain largely on the DSP. Through detailed evaluation of our prototype implementation we show that, by exploiting a smartphone's co-processor, DSP.Ear achieves a 3 to 7 times increase in the battery lifetime compared to a solution that uses only the phone's main processor. In addition, DSP.Ear is 2 to 3 times more power efficient than a naïve DSP solution without optimizations. We further analyze a large-scale dataset from 1320 Android users to show that in about 80-90% of the daily usage instances DSP.Ear is able to sustain a full day of operation (even in the presence of other smartphone workloads) with a single battery charge.

Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems; H.1.2 [User/Machine Systems]: Human information processing

General Terms
Design, Experimentation, Performance

Keywords
mobile sensing, co-processor, DSP, energy, audio

1 Introduction
In recent years, the popularity of mobile sensing applications that monitor user activities has soared, helped by the wide availability of smart and wearable devices. The applications enabled by the microphone have attracted increasing attention as audio data allows a wide variety of deep inferences regarding user behavior [23, 28, 36, 46, 19]. The tracking of one's activities requires a fairly continuous stream of information being recorded. Most smartphones already boast a quad-core or octa-core processor, therefore, computation is increasingly less of a constraint on these commonly used devices. However, the biggest limiting factor for smartphone applications that depend on sensor data is the inadequate battery capacity of phones. This problem is exacerbated in the case of the microphone sensor, due to its high data rates. Further, as more of these sensing applications become deployed on the device, the high computational requirements of simultaneously running complex classifiers add to the energy burden. As a result, duty cycling strategies are commonly adopted, but the side effect is a reduced coverage of captured sound events, since these can be missed while the sensor is not sampling.

Recently, a new architecture has been adopted by many smartphone manufacturers whereby an additional low-power processor, typically referred to as co-processor, is added alongside the more powerful primary processor. The aim of this additional low-power processor is to sample and process sensor data. For example, the Apple iPhone 5S is equipped with an M7 motion co-processor [6] that collects and processes data from the accelerometer and gyroscope even when the primary processor is in sleep mode. Similarly, the Motorola Moto X [7] is equipped with a "contextual computing processor" for always-on microphone sensing but is limited to recognizing only spoken voice commands. A key challenge in leveraging such a co-processor is effectively managing its fairly limited computational resources to support a diverse range of inference types.

In this work, we present a system that operates within the hardware constraints of low-power co-processors (Qualcomm Hexagon DSP [13]) to continuously monitor the phone's microphone and to infer various contextual cues from the user's environment at a low energy cost. Most existing work (e.g., [27, 34]) on using a low-power processor for sensing has employed custom hardware: we build our system on off-the-shelf phones and propose several novel optimizations to substantially extend battery life. Sensing with commodity co-processor support (e.g., [40, 42]) is still exploratory. The computational constraints of the low-power units have limited the research to either fairly simple tasks such as accelerometer step counting or pipelines typically seen in isolation or in pairs. We take these efforts one step further by studying how the co-processor in off-the-shelf smartphones can support multiple computationally intensive classification tasks on the captured audio data, suitable for extracting, for instance, human emotional states [36] or stress [28]. In our design we are able to support the interleaved execution of five existing audio pipelines: ambient noise classification [29], gender recognition [19], counting the number of speakers in the environment [46], speaker identification [36], and emotion recognition [36]. This is done primarily on the DSP itself with only limited assistance from the CPU, maximizing the potential for energy efficiency. This is achieved through the publicly released C/assembly-based Hexagon SDK [8] which, to date, has been used for multimedia applications but has yet to be used for sensing.

The contributions of our work are:
• The design, and implementation, of a novel framework for smartphone microphone sensing (DSP.Ear) which allows continuous and simultaneous sensing of a variety of audio-related user behaviors and contexts.
• A detailed study of the trade-offs of CPU and co-processor support for sensing workloads. We design a series of techniques for optimizing interleaved sensing pipelines that can generalize to other future designs.

We provide an extensive evaluation of the system and find that it is 3 to 7 times more power efficient than baselines running solely on the CPU. The optimizations we introduce prove critical for extending the battery lifetime as they allow DSP.Ear to run 2 to 3 times longer on the phone compared to a naïve DSP deployment without optimizations. Finally, by analyzing a large 1320-user dataset of Android smartphone usage we discover that in 80% to 90% of the daily usage instances DSP.Ear is able to operate for the whole day with a single battery charge – without impacting non-sensor smartphone usage and other phone operations.

The realization of our targeted scenario of multiple concurrently running audio inference pipelines with emerging co-processor technology such as the Hexagon application DSP becomes particularly challenging because of the following design space limitations:

Memory Restrictions. The DSP runtime memory constraints restrict the amount of in-memory data reserved to sensing applications. Examples of such data are classification model parameters, accumulated inferences or features extracted for further processing. The lack of direct file system support for the DSP imposes interaction with the CPU, which will either save the inferences or perform additional processing on DSP-computed features. This easily turns into a major design bottleneck if the DSP memory is low, the data generated by multiple sensing applications grows fast, and the power-hungry CPU needs to be woken up from its low-power standby mode specifically to handle the memory transfers.
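A useful mental model for this constraint is to keep only compact inference records on the DSP and wake the CPU only when a batch is ready to be drained. The sketch below is illustrative C, not DSP.Ear's code: the record layout, the buffer capacity and the notify_cpu_and_drain() hand-off are hypothetical stand-ins for whatever mechanism a given platform provides.

    /* Minimal sketch (assumptions, not DSP.Ear's implementation): accumulate
     * compact inference records in a fixed DSP-side buffer and hand them to
     * the CPU in bulk, so the power-hungry CPU is woken rarely instead of
     * once per inference. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        uint32_t timestamp;   /* seconds since boot */
        uint8_t  pipeline_id; /* e.g., 0=ambient, 1=gender, 2=speaker count */
        uint8_t  label;       /* classifier output index */
    } inference_t;

    #define BUF_CAPACITY 256  /* bounded by the DSP memory budget */

    static inference_t buf[BUF_CAPACITY];
    static size_t buf_len = 0;

    /* Hypothetical hand-off: on a real device this would wake the CPU and
     * copy the batch out; here it just reports and clears the buffer. */
    static void notify_cpu_and_drain(void) {
        printf("waking CPU to drain %zu inferences\n", buf_len);
        buf_len = 0;
    }

    void record_inference(uint32_t ts, uint8_t pipeline_id, uint8_t label) {
        buf[buf_len++] = (inference_t){ ts, pipeline_id, label };
        if (buf_len == BUF_CAPACITY)   /* drain only when the buffer fills */
            notify_cpu_and_drain();
    }

    int main(void) {
        for (uint32_t t = 0; t < 1000; t++)   /* simulated inference stream */
            record_inference(t, t % 5, 0);
        return 0;
    }

In practice such a scheme would presumably also drain on a timeout so that records do not sit in the buffer indefinitely; the trade-off between buffer size, inference freshness and CPU wake-ups is exactly the kind of bottleneck described above.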
Code Footprint. The DSP restricts the size of the shared object file deployed with application code. Deployment issues arise when machine learning models with a large number of parameters need to be initialized but these parameters cannot be read from the file system and instead are provided directly in code, which exhausts program space. Sensing inferences performed on the DSP become restricted to those whose models fit under the code size limit.

Performance. The DSP has different performance characteristics. While ultra low-power is a defining feature of the DSP, many legacy algorithms which have not been specifically optimized for the co-processor hardware architecture will run slower. Computations that are performed in real time on the CPU may not be able to preserve this property when deployed on the DSP. The DSP generally supports floating point operations via a 32-bit FP multiply-add (FMA) unit, but some of the basic operations such as division and square root are implemented in software, which potentially introduces increased latency in some of the algorithms.
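To make the last point concrete, one generic way such costs are avoided is to rearrange per-frame arithmetic so the hot path uses only multiply-adds. The fragment below is a minimal sketch under an assumed frame length, not code from DSP.Ear: a loudness check that would normally divide and take a square root instead compares the sum of squares against a pre-squared threshold.

    /* Illustrative sketch, not DSP.Ear code: instead of testing
     * sqrt(sum_sq / FRAME_LEN) > threshold, test
     * sum_sq > threshold^2 * FRAME_LEN, which is equivalent for
     * non-negative values but keeps the per-frame path to multiply-adds. */
    #include <stdint.h>
    #include <stddef.h>

    #define FRAME_LEN 256            /* assumed frame size, for illustration */

    static int64_t sq_threshold;     /* threshold^2 * FRAME_LEN, computed once */

    void init_loudness_threshold(float rms_threshold) {
        sq_threshold = (int64_t)(rms_threshold * rms_threshold * FRAME_LEN);
    }

    int frame_is_loud(const int16_t *frame) {
        int64_t sum_sq = 0;
        for (size_t i = 0; i < FRAME_LEN; i++)
            sum_sq += (int32_t)frame[i] * (int32_t)frame[i];  /* FMA-friendly */
        return sum_sq > sq_threshold;  /* same outcome as the RMS comparison */
    }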
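The code footprint limitation above can be illustrated in the same spirit. Because the DSP cannot load parameters from the file system, a classifier's tables end up compiled into the deployed shared object as constants, so every parameter directly consumes program space. The toy Gaussian-mixture tables below are placeholders chosen for illustration only, not DSP.Ear's models.

    /* Sketch of why model size eats into the shared-object budget: with no
     * file system access on the DSP, parameters are baked into the library
     * as constant tables. Values and dimensions here are placeholders. */
    #include <stddef.h>

    #define NUM_COMPONENTS 2
    #define FEATURE_DIM    4

    /* Every mean, variance and weight becomes bytes in the deployed object;
     * a speaker or emotion model with thousands of parameters can exhaust
     * the code-size limit on its own. */
    static const float gmm_weights[NUM_COMPONENTS] = { 0.6f, 0.4f };

    static const float gmm_means[NUM_COMPONENTS][FEATURE_DIM] = {
        { 0.1f, -0.3f,  1.2f, 0.0f },
        { 0.8f,  0.2f, -0.5f, 0.7f },
    };

    static const float gmm_vars[NUM_COMPONENTS][FEATURE_DIM] = {
        { 1.0f, 0.9f, 1.1f, 1.0f },
        { 0.8f, 1.2f, 1.0f, 0.9f },
    };

    size_t model_footprint_bytes(void) {
        return sizeof gmm_weights + sizeof gmm_means + sizeof gmm_vars;
    }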
2 Hardware Support for Continuous Sensing
Low-power co-processors for mobile devices are usually