NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision
Biyi Fang†, Xiao Zeng†, Mi Zhang
Michigan State University
†Co-primary authors

ABSTRACT
Mobile vision systems such as smartphones, drones, and augmented-reality headsets are revolutionizing our lives. These systems usually run multiple applications concurrently, and their available resources at runtime are dynamic due to events such as starting new applications, closing existing applications, and application priority changes. In this paper, we present NestDNN, a framework that takes the dynamics of runtime resources into account to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. NestDNN enables each deep learning model to offer flexible resource-accuracy trade-offs. At runtime, it dynamically selects the optimal resource-accuracy trade-off for each deep learning model to fit the model's resource demand to the system's available runtime resources. In doing so, NestDNN efficiently utilizes the limited resources in mobile vision systems to jointly maximize the performance of all the concurrently running applications. Our experiments show that compared to the resource-agnostic status quo approach, NestDNN achieves as much as a 4.2% increase in inference accuracy, a 2.0× increase in video frame processing rate, and a 1.7× reduction in energy consumption.

CCS CONCEPTS
• Human-centered computing → Ubiquitous and mobile computing; • Computing methodologies → Neural networks;

KEYWORDS
Mobile Deep Learning Systems; Deep Neural Network Model Compression; Scheduling; Continuous Mobile Vision

ACM Reference Format:
Biyi Fang†, Xiao Zeng†, Mi Zhang. 2018. NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision. In The 24th Annual International Conference on Mobile Computing and Networking (MobiCom ’18), October 29-November 2, 2018, New Delhi, India. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3241539.3241559

MobiCom ’18, October 29-November 2, 2018, New Delhi, India
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5903-0/18/10...$15.00
https://doi.org/10.1145/3241539.3241559

1 INTRODUCTION
Mobile systems with onboard video cameras such as smartphones, drones, wearable cameras, and augmented-reality headsets are revolutionizing the way we live, work, and interact with the world. By processing the streaming video inputs, these mobile systems are able to retrieve visual information from the world and promise to open up a wide range of new applications and services. For example, a drone that can detect vehicles, identify road signs, and track traffic flows will enable mobile traffic surveillance with aerial views that traditional traffic surveillance cameras positioned at fixed locations cannot provide [31]. A wearable camera that can recognize everyday objects, identify people, and understand the surrounding environments can be a life-changer for blind and visually impaired individuals [2].

The key to achieving the full promise of these mobile vision systems is effectively analyzing the streaming video frames. However, processing streaming video frames taken in mobile settings is challenging in two respects. First, the processing usually involves multiple computer vision tasks. This multi-tenant characteristic requires mobile vision systems to concurrently run multiple applications that target different vision tasks [24]. Second, the context in mobile settings can change frequently. This requires mobile vision systems to be able to switch applications to execute new vision tasks encountered in the new context [12].

In the past few years, deep learning (e.g., Deep Neural Networks (DNNs)) [21] has become the dominant approach in computer vision due to its capability of achieving impressively high accuracies on a variety of important vision tasks [19, 37, 41]. As deep learning chipsets emerge, there is significant interest in leveraging on-device computing resources to execute deep learning models on mobile systems without cloud support [3–5]. Compared to the cloud, mobile systems are constrained by limited resources. Unfortunately, deep learning models are known to be resource-demanding [35]. To enable on-device deep learning, one of the common techniques used by application developers is compressing the deep learning model to reduce its resource demand, at a modest loss in accuracy as the trade-off [12, 39]. Although this technique has gained considerable popularity and has been applied to developing state-of-the-art mobile deep learning systems [8, 14, 17, 20, 38], it has a key drawback: since application developers develop their applications independently, the resource-accuracy trade-off of the compressed model is predetermined based on a static resource budget at the application development stage and is fixed after the application is deployed. However, the available resources in mobile vision systems at runtime are always dynamic due to events such as starting new applications, closing existing applications, and application priority changes. As such, when the resources available at runtime do not meet the resource demands of the compressed models, resource contention among concurrently running applications occurs, forcing the streaming video to be processed at a much lower frame rate. On the other hand, when extra resources become available at runtime, the compressed models cannot utilize them to regain their sacrificed accuracies.

In this work, we present NestDNN, a framework that takes the dynamics of runtime resources into consideration to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. NestDNN replaces fixed resource-accuracy trade-offs with flexible ones, and dynamically selects the optimal resource-accuracy trade-off for each deep learning model at runtime to fit the model's resource demand to the system's available runtime resources. In doing so, NestDNN is able to efficiently utilize the limited resources in the mobile system to jointly maximize the performance of all the concurrently running applications.

Challenges and our Solutions. The design of NestDNN involves two key technical challenges. (i) The limitation of existing approaches is rooted in the constraint that the trade-off between resource demand and accuracy of a compressed deep learning model is fixed. Because traditional model variants with different trade-offs are independent of each other, keeping many of them is not scalable and becomes infeasible when the mobile system concurrently runs multiple deep learning models, each of which has multiple model variants. (ii) Selecting the resource-accuracy trade-off for each of the concurrently running deep learning models is not trivial, because different applications have different goals for inference accuracy and processing latency. Taking a traffic surveillance drone as an example: an application that counts vehicles to detect traffic jams does not need high accuracy but requires low latency; an application that reads license plates needs high plate-reading accuracy but does not require a real-time response [39].

To address the first challenge, NestDNN employs a novel model pruning and recovery scheme which transforms a deep learning model into a single compact multi-capacity model. The multi-capacity model is comprised of a set of descendent models, each of which offers a unique resource-accuracy trade-off. Unlike traditional model variants that are independent of each other, a descendent model with smaller capacity (i.e., resource demand) shares its model parameters with the descendent model with larger capacity, making itself nested inside the larger-capacity model without taking extra memory space. In doing so, the multi-capacity model is able to provide various resource-accuracy trade-offs with a compact memory footprint.

To address the second challenge, NestDNN encodes the inference accuracy and processing latency of each descendent model of each concurrently running application into a cost function. Given all the cost functions, NestDNN employs a resource-aware runtime scheduler which selects the optimal resource-accuracy trade-off for each deep learning model and determines the optimal amount of runtime resources to allocate to each model, so as to jointly maximize the overall inference accuracy and minimize the overall processing latency of all the concurrently running applications.

Summary of Experimental Results. We have conducted a rich set of experiments to evaluate the performance of NestDNN. To evaluate the performance of the multi-capacity model, we evaluated it on six mobile vision applications that target some of the most important tasks for mobile vision systems. These applications are developed based on two widely used deep learning models – VGG Net [33] and ResNet [13] – and six datasets that are commonly used in the computer vision community. To evaluate the performance of the resource-
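The parameter nesting behind the multi-capacity model can be illustrated with a short sketch. Assuming filters are ordered by importance (as pruning-based schemes typically do), a smaller-capacity descendent model is simply a slice, i.e. a view, of the full model's weight tensor, so it occupies no extra memory. The class name, layer shapes, and `capacity` parameter below are illustrative, not from the paper:

```python
import numpy as np

class MultiCapacityLayer:
    """One layer whose smaller-capacity variants are slices (views) of
    the full weight tensor, so descendent models add no extra memory.
    Illustrative sketch, not the paper's actual implementation."""

    def __init__(self, out_filters=64, in_channels=64, seed=0):
        rng = np.random.default_rng(seed)
        # Full-capacity weights; filters are assumed ordered by importance.
        self.weights = rng.standard_normal((out_filters, in_channels))

    def descendant(self, capacity):
        # A smaller descendent model keeps only the first `capacity`
        # filters/channels -- a view into the parent's storage.
        return self.weights[:capacity, :capacity]

layer = MultiCapacityLayer()
small = layer.descendant(16)   # low resource demand, lower accuracy
large = layer.descendant(48)   # higher resource demand, higher accuracy

# The small model is nested inside the large one: same underlying memory.
assert np.shares_memory(small, layer.weights)
assert np.shares_memory(small, large)
print(small.shape, large.shape)  # (16, 16) (48, 48)
```

A real implementation would slice every layer of the network consistently and retrain to recover accuracy; this sketch only shows the memory-sharing invariant that makes the descendent models nested.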
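The resource-aware scheduler's job can likewise be sketched as a multiple-choice knapsack: pick one descendent model per application, within a total resource budget, maximizing a cost that rewards accuracy and penalizes latency. The numbers, the linear cost `acc - alpha * lat`, and the brute-force search below are illustrative stand-ins for the paper's actual cost function and scheduling algorithms:

```python
from itertools import product

# Each application offers descendent-model variants as
# (resource_demand, accuracy, latency) tuples. All numbers are made up.
apps = {
    "vehicle_counter": [(1, 0.70, 0.02), (2, 0.78, 0.04), (4, 0.85, 0.09)],
    "plate_reader":    [(1, 0.60, 0.03), (3, 0.88, 0.08), (5, 0.93, 0.15)],
}

def cost(acc, lat, alpha=1.0):
    # Encode accuracy (to maximize) and latency (to minimize) in one score.
    return acc - alpha * lat

def schedule(apps, budget, alpha=1.0):
    """Pick one variant per app within the resource budget, maximizing
    total cost -- a brute-force multiple-choice knapsack for illustration."""
    names = list(apps)
    best, best_score = None, float("-inf")
    for choice in product(*(apps[n] for n in names)):
        if sum(v[0] for v in choice) > budget:
            continue  # combined resource demand exceeds what is available
        score = sum(cost(v[1], v[2], alpha) for v in choice)
        if score > best_score:
            best, best_score = dict(zip(names, choice)), score
    return best

plan = schedule(apps, budget=6)
print(plan)
```

With these made-up numbers, the scheduler picks mid-capacity variants for both applications rather than the most accurate ones, since the accurate variants together exceed the budget. Brute force is exponential in the number of applications; a real scheduler would use heuristics or dynamic programming, but the encoding is the same.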