
CiNet: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference

Xin ∗ Xiangnan ∗ Tian ∗ Yixian

∗ Worcester Polytechnic Institute

Abstract

Deep neural networks are increasingly used in end devices such as mobile phones to support novel features, e.g., image classification. Traditional paradigms to support mobile deep inference fall into either cloud-based or on-device—both require access to an entire pre-trained model. As such, the efficacy of mobile deep inference is limited by mobile network conditions and computational capacity. Collaborative inference, a means of splitting inference computation between mobile devices and cloud servers, was proposed to address the limitations of traditional inference through techniques such as image compression or model partition.

In this paper, we improve the performance of collaborative inference from a complementary direction, i.e., through redesigning deep neural networks to satisfy the collaboration requirement from the outset. Specifically, we describe the design of a collaboration-aware convolutional neural network, referred to as CiNet, for image classification. CiNet consists of a mobile-side extractor submodel that outputs a small yet relevant patch of the image and a cloud-based submodel that classifies on the image patch. We evaluated the efficiency of CiNet in terms of inference accuracy, computational cost and mobile data transmission on three datasets. Our results demonstrate that CiNet achieved comparable inference accuracy while incurring orders of magnitude less computational cost and 99% less transmitted data, when comparing to both traditional and collaborative inference approaches.

Figure 1: The problem of mobile-cloud collaborative inference. The goal is to perform image classification on a mobile device by collaborating with a cloud server. The mobile device can send some data to the server to aid in the inference process, which will reduce the computational cost on the mobile device but increase the data transmission cost.

1 Introduction

To leverage deep neural networks to provide novel features, mobile applications either use powerful cloud servers, i.e., cloud-based inference, or directly run them on-device, i.e., mobile-based inference, as shown in Figure 1. Cloud-based inference allows the use of complex models [7, 12, 17, 18] (thus higher inference accuracy), but requires mobile applications to send a non-trivial amount of data over mobile networks, leading to high data transmission. To use mobile-based inference, one needs to use mobile-specific models such as MobileNet, SqueezeNet, or ShuffleNet [8, 9, 22]; even so, mobile-based inference performance can be hindered by limited on-device resources, e.g., CPU and battery life.

To address the limitations of cloud-based and mobile-based inference, an inference paradigm called collaborative inference was proposed recently [11, 13]. Collaborative inference allows inference execution to be split between mobile devices and cloud servers, as demonstrated in Figure 1. Prior work on collaborative inference focuses on either reducing network data transmission (and the impact of such reduction on inference accuracy), or partitioning schemes that split the inference computation across mobile devices and cloud servers [11, 14, 21]. In this work, we approach the problem of collaborative inference from a complementary perspective, by considering the collaboration requirement from the outset and redesigning the deep neural networks.

Designing deep learning models that effectively support collaborative inference has the following two key challenges. First, the on-device submodel needs to balance mobile bandwidth consumption, on-device computational cost, and inference accuracy. For example, using a more complex on-device model structure can effectively reduce the required network data transmission, but can also increase on-device computation.

Second, both the on-device and cloud submodels should be trained in tandem without requiring additional time-consuming and manual annotations. Prior work on object detection [4, 5, 16] is a potential candidate for detecting image regions to send to the cloud, but often requires access to annotated locations during training [5].

Our design of models that are suitable for collaborative inference is centered around two key insights. First, in many real-world scenarios, the result of image classification often only depends on a small image portion. Second, the task of identifying the important image portion, i.e., extraction, is often easier than the classification itself. Note that our key insights are similar to prior work on dynamic capacity networks [1]. We make the following main contributions.

• We identify the need and the key principles to redesign deep neural networks for achieving efficient inference under the collaborative inference paradigm. For example, the performance of existing collaborative inference approaches is constrained by the deep learning models and often cannot simultaneously achieve equally important performance goals such as low on-device computation, low mobile data transmission, and high inference accuracy.

• We describe the design of a collaboration-aware model for image classification called CiNet that works within the limitations of mobile devices and achieves comparable inference accuracy to complex cloud-based models. On the mobile side, CiNet consists of an extractor submodel that generates a predefined data grid from the content locations of the original image. The values on the grid are resampled at the corresponding locations on the original image and are used as the input to the cloud-based classifier. In short, CiNet efficiently splits computation across mobile devices and cloud servers with low transmission cost, and can be trained in an end-to-end fashion using standard backpropagation with only the image labels.

• We evaluated CiNet on three datasets and compared against four inference mechanisms including cloud-based, mobile-based, and existing collaborative inference. Our results show that CiNet reduced mobile computational cost by up to three orders of magnitude, lowered mobile data transmission by 99%, and achieved similar inference accuracy with 0.34%-2.46% differences.

Although we only focus on images having a single ROI in this paper, the experiments show that the ROI can be detected with low computational cost compared with the classification itself. This implies that the idea proposed in this paper can be extended to more complex images having multiple ROIs.

2 Problem Formulation

In this section, we first define the problem of mobile-cloud collaborative inference and outline key research challenges, followed by our design principles.

Mobile-cloud Collaborative Inference. In this paper, we study the problem of improving the performance, including mobile computational need, mobile data transmission, and inference accuracy, of an emerging paradigm called mobile-cloud collaborative inference. At a high level, collaborative inference allows one to split the model computation across mobile devices and cloud servers. We approach the problem of efficient collaborative inference by redesigning deep neural networks so that they are collaboration-aware, unlike existing works in collaborative inference [11, 13, 14, 21]. We focus on the problem of image classification and propose a new neural network design in this work. To use our proposed collaborative inference solution, mobile applications use an on-device sub-model that outputs a smaller-size representation P of the original image I. Afterwards, mobile applications send P to the cloud model server, which generates and sends back the predicted label for I.

Key Challenges. One of the key challenges in designing collaborative inference is to achieve low mobile computational and transmission cost simultaneously without impacting classification accuracy. Partitioning existing successful deep neural networks, e.g., AlexNet [12], can satisfy the accuracy goal but often violates either the computational or the transmission goal. For example, the first two convolutional layers of AlexNet take up a large portion of the inference computation, making them less ideal to run on mobile devices. Further, the output feature maps of early layers are usually quite large, which undermines the mobile-cloud transmission cost.

Design Principles. In designing deep neural networks that are suitable for collaborative inference, we follow the key design principles below: (i) Reducing mobile computational cost. The required on-device computation directly impacts the mobile energy consumption, as well as the inference response time. Lower computational cost saves mobile battery life and helps with the mobile user experience. (ii) Reducing mobile transmission cost. Similar to computational cost, the required transmitted data also affects mobile energy consumption. Further, it is beneficial to send less data as a way to preserve the mobile data plan. (iii) Achieving comparable classification accuracy. Last but not least, we should achieve the computational and transmission goals without sacrificing accuracy, as it is important to application utility.
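
The mobile-cloud split described above can be summarized with a minimal sketch (ours, not from the paper; the function and variable names are hypothetical placeholders): the device computes only the compact representation P, only P crosses the network, and the server returns the predicted label.

```python
import numpy as np

def mobile_extract(image: np.ndarray) -> np.ndarray:
    """On-device submodel: map the full image I to a small representation P.
    Stub only; a real implementation runs the extractor CNN of Section 3.2."""
    return image[:10, :10, :].astype(np.float32)  # placeholder 10x10 crop

def cloud_classify(patch: np.ndarray) -> int:
    """In-cloud submodel: classify the received patch and return a label.
    Stub only; a real implementation runs the cloud classifier of Section 3.4."""
    return int(patch.sum() > 0)  # placeholder decision

image = np.random.rand(100, 100, 3).astype(np.float32)
P = mobile_extract(image)        # runs on the mobile device
label = cloud_classify(P)        # runs on the cloud server
print(P.nbytes, "bytes transmitted instead of", image.nbytes)
```
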

Figure 2: An example of CiNet structure for mobile-cloud collaborative inference.

3 CiNet: A Collaboration-aware Deep Neural Network

3.1 Overall Structure
We propose CiNet, an extractor-classifier model that enables efficient mobile-cloud collaborative inference. With low on-device computational complexity and low mobile network data consumption, CiNet can be trained in an end-to-end manner with existing image classification datasets, without additional annotations. As shown in Figure 2, the extractor submodel runs on the mobile device and is responsible for extracting a smaller-size representation P of the original image I. The size of P is predefined during training time, and determines the mobile transmission savings. The classifier submodel runs on the cloud server and is designed to be as complex as needed for the specific classification task to generate labels based on P. In other words, we assume the use of powerful cloud servers that can execute inference requests without imposing latency bottlenecks.

Key Insights. The design of CiNet centers on two key insights. First, for a specific image classification task, the image usually contains a lot of content that is irrelevant to its label. In other words, the label-related object may only occupy a small region of the image [2, 15]. For example, to determine whether the image contains a cat or a dog, we often do not have to look at the entire image. Instead, we can focus only on a smaller image region, e.g., faces or eyes. Second, the computational cost to identify the region of important objects can be significantly lower than that of classification [1]. For example, locating a face in the image often requires fewer filters in the convolutional layers than classifying one. This is because the former only needs to recognize a rough outline whereas the latter requires considering a lot more details.

3.2 On-device Extractor Submodel
The extractor is a convolutional neural network that runs on mobile devices. It consists of several convolutional layers followed by max pooling, hidden fully-connected layers, and a final regression layer outputting the transformation parameters θ = (t_x, t_y, s_x, s_y). Here t_x and t_y denote a 2D translation, while s_x and s_y denote a 2D scaling. The mapping from the coordinate (x_P, y_P) of the cropped image P to the coordinate (x_I, y_I) of the original image I is parameterized by θ:

(3.1)  \begin{pmatrix} x_I \\ y_I \end{pmatrix} = \begin{pmatrix} s_x & 0 & t_x \\ 0 & s_y & t_y \end{pmatrix} \begin{pmatrix} x_P \\ y_P \\ 1 \end{pmatrix}

Based on the above coordinate transformation, CiNet performs an image cropping operation on I to obtain the patch P. Then the cloud-based classifier takes P as input and predicts the label of I. Here we focus on describing the extractor network as it directly impacts the mobile computational and transmission cost. To lower the computational cost, we use as few filters as possible in the convolutional layers. In our experiments, the extractor network has only two convolutional layers. The first convolutional layer has two filters and the second one has four filters. As demonstrated later in Section 4, CiNet achieves good classification accuracy even with a limited number of convolutional layers. The reason we did not use fully connected layers, even though they are often more computationally efficient, is the limited mobile memory—fully connected layers have a large number of parameters.
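
For concreteness, below is a minimal Keras sketch of the extractor just described (our illustration, not the authors' released code). The layer sizes follow the configuration reported in Sections 3.2 and 4.1 (two convolutional layers with 2 and 4 filters, 5x5 kernels, 2x2 max pooling, a 50-unit hidden layer, a 4-way regression head for θ, and Gaussian initialization with std 0.02, or 0.2 for the head); the activation and padding choices are assumptions.

```python
import tensorflow as tf

def build_extractor(input_shape=(100, 100, 3)):
    """On-device extractor: a tiny CNN that regresses the crop parameters theta."""
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)
    init_head = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.2)
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(2, 5, activation="relu", kernel_initializer=init,
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(4, 5, activation="relu", kernel_initializer=init),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(50, activation="relu", kernel_initializer=init),
        # Regression head for (t_x, t_y, s_x, s_y); larger init variance helps early training.
        tf.keras.layers.Dense(4, kernel_initializer=init_head),
    ])
```
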
3.3 Image Cropping Operations
The on-device extractor yields the transformation parameters θ, indicating which part of the image should be cropped and sent to the cloud-based classifier. The design of the extraction can be viewed as an instance of a hard attention mechanism. There are multiple ways to implement hard attention for visual tasks. Here we discuss two approaches to crop the image given θ.

Direct cropping: The transformation parameters θ define a rectangular region on the original image. The simplest idea is to directly crop this region and then reshape it to our predefined crop shape.

Bilinear sampling function: Given the transformation parameters θ, we can concatenate a spatial transformer layer with a bilinear sampling kernel [10] at the end of our extractor submodel. We can then calculate the pixel value P(x_P, y_P) of the cropped patch P with the bilinear mapping from pixel values I(i, j) of the original image I:

(3.2)  P(x_P, y_P) = \sum_{j=1}^{H} \sum_{i=1}^{W} I(i, j)\, F(x_I, i)\, F(y_I, j),
(3.3)  F(a, b) = \max(0, 1 - |a - b|).

Here W and H are the width and height of the original image I, and (x_I, y_I) is the coordinate on I as defined in Equation (3.1).
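
The following NumPy sketch (ours; a plain, unoptimized reference rather than the paper's implementation) shows how Equations (3.1)-(3.3) produce a 10x10 patch from a larger image for a given θ. The normalized [-1, 1] coordinate convention is an assumption borrowed from the spatial transformer literature; in CiNet itself this step is a differentiable spatial-transformer layer [10] so gradients flow back to the extractor.

```python
import numpy as np

def bilinear_crop(image, theta, out_h=10, out_w=10):
    """Sample an out_h x out_w patch P from `image` (H x W x C) using
    theta = (t_x, t_y, s_x, s_y), following Eq. (3.1)-(3.3)."""
    t_x, t_y, s_x, s_y = theta
    H, W = image.shape[:2]
    P = np.zeros((out_h, out_w, image.shape[2]), dtype=np.float32)
    ys = np.linspace(-1.0, 1.0, out_h)   # normalized patch coordinates
    xs = np.linspace(-1.0, 1.0, out_w)
    for yp, y_norm in enumerate(ys):
        for xp, x_norm in enumerate(xs):
            # Eq. (3.1): affine map from patch coordinates to image coordinates.
            x_i = s_x * x_norm + t_x
            y_i = s_y * y_norm + t_y
            # Convert from [-1, 1] to pixel indices.
            x_pix = (x_i + 1.0) * (W - 1) / 2.0
            y_pix = (y_i + 1.0) * (H - 1) / 2.0
            # Eq. (3.2)-(3.3): bilinear kernel; only the 4 neighbours contribute.
            x0, y0 = int(np.floor(x_pix)), int(np.floor(y_pix))
            for j in (y0, y0 + 1):
                for i in (x0, x0 + 1):
                    if 0 <= i < W and 0 <= j < H:
                        w = max(0.0, 1 - abs(x_pix - i)) * max(0.0, 1 - abs(y_pix - j))
                        P[yp, xp] += w * image[j, i]
    return P
```
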

The cropped image P is then sent to the classifier in the cloud. In our current design of CiNet, we adopt this method instead of direct cropping, as explained further below in Section 3.5.

3.4 In-cloud Classifier Submodel
The in-cloud classifier takes the cropped image from the mobile device as input and returns the final class of the original image. When designing the in-cloud submodel, we focus on achieving the goal of inference accuracy, as it is often safe to assume that cloud servers have ample computational and memory resources. Therefore, one can use existing successful deep neural networks such as AlexNet [12], GoogleNet [18] or ResNet [7] for the in-cloud classifier. When leveraging these existing models, one might need to use an additional image cropping operation, such as those discussed in Section 3.3, to rescale the cropped image to match the predefined input layer size. For simplicity, in CiNet we designed a new CNN from scratch, as shown in Figure 2. The input layer of our in-cloud classifier is the same size as the cropped image and therefore only requires a simple identical mapping.

3.5 Training Considerations
CiNet consists of the on-device extractor and the in-cloud classifier, forming an end-to-end collaborative neural network. As these two submodels are connected by the image cropping operation, the training algorithm is determined by the chosen cropping operation.

Justifications of Our Image Cropping Choice. Although cropping images with the bilinear sampling function is more complicated than direct cropping, training with it is much easier. This is because the operation is differentiable. In this case, we can train CiNet using the standard back-propagation algorithm under the supervision of image labels [10]. Instead, if choosing to use direct cropping, one needs to consider and address the problem of propagating the gradient of the classification loss from the classifier submodel to the extractor submodel. One possible way is to use the policy gradient method in the form of the REINFORCE algorithm [20] to train the extractor, similar to what was proposed by Mnih et al. [15]. However, training with policy gradient methods can take a long time to converge. As such, we chose to use the bilinear sampling function as the cropping operation when designing CiNet.

Hyperparameters Considerations. In addition to the choice of cropping operations, parameters such as the size of P and the specificity of datasets can also complicate the training process. For example, ideally we want to set the size of P as small as possible to reduce the required transmitted data to the cloud. In our current design, we set the size to be 10×10, which is only 1% of the original image size. However, the small extraction size means that at the early stage of training, P often does not contain meaningful objects. Even worse, for datasets with a large black background, e.g., the MNIST dataset, it often means sending tensors of zeros to the classifier, which further leads to back-propagating gradients of zeros to the extractor. We use the two hyperparameter techniques below to mitigate such problems (a code sketch follows the list).

• Weight Initialization. We initialize all weights in the framework by a Gaussian distribution with mean=0 and standard deviation=0.02. However, if we use a small standard deviation for the final regression layer of the extractor network, the transformation parameters θ of different samples can be very close to each other. Given that we prefer a small cropped size, having a larger initial variance of θ could increase the chance of extracting the object. As such, it can be helpful to boost the early-stage training. In our experiments, we used a standard deviation of 0.2 to initialize the final regression layer of the extractor network.

• Penalty on Scaling Transformation. When training CiNet, we want to avoid the on-device extractor learning an easy way instead of the correct way to extract the objects. In the easy way, the extractor submodel can simply output a large enough scaling transformation that covers the entire original image I. In essence, the cropped image P is merely a downsampled version of I. Such strategies may overfit the training data because important content can be lost in the process of downsampling. In addition, since we chose to design the extractor submodel using very few filters (with the goal to reduce the on-device computational cost), it is more likely for the extractor to learn the easy way. To counter this problem, we add a penalty on the scaling transformation into the loss function.
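
As a rough illustration (ours, not the authors' code), the two techniques above amount to the custom initializers shown in the extractor sketch earlier plus a training loss that adds an L2 penalty on the predicted scaling parameters. The penalty weight of 0.1 follows Section 4.1, while the exact form of the penalty term is our assumption.

```python
import tensorflow as tf

PENALTY_WEIGHT = 0.1  # weight of the scaling penalty (Section 4.1)

def cinet_loss(y_true, y_pred, theta):
    """Cross-entropy on the classifier output plus an L2 penalty on the
    scaling parameters (s_x, s_y), discouraging the 'easy' solution in which
    the extractor covers the whole image with a downsampled crop.
    Intended for a custom training loop, not Model.compile()."""
    ce = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    s_x, s_y = theta[:, 2], theta[:, 3]
    scale_penalty = tf.reduce_mean(tf.square(s_x) + tf.square(s_y))
    return tf.reduce_mean(ce) + PENALTY_WEIGHT * scale_penalty
```
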
4 Experimental Evaluations

We evaluate the effectiveness of CiNet using three key performance metrics, i.e., classification accuracy, on-device computational cost, and mobile data transmission cost. We compare the performance of CiNet to four inference baselines using two transformed MNIST datasets and the CelebA dataset (all with JPEG variants). We summarize and highlight our key results below.

• Accuracy vs. On-device Computational Cost. Comparing to on-device inference, CiNet achieved comparable inference accuracy to the second best CNN while only using 20% of the computational cost of the fastest CNN. We discuss more details in Section 4.2.

• Accuracy vs. Mobile-Cloud Data Transmission Cost. CiNet significantly reduced the mobile data transmission, incurring only 1% of that of cloud-based inference, which sent original image data to the cloud server. When comparing to image compression based techniques, including traditional JPEG and DeepN-JPEG [14], CiNet achieved up to 50% higher inference accuracy with similar data transmission cost. More details can be found in Section 4.3.

• On-device Computational Cost vs. Mobile-Cloud Data Transmission Cost. Comparing to collaborative inference with model partitions, CiNet incurred lower on-device computation and mobile-cloud data transmission costs for both datasets. We discuss more details in Section 4.4.

4.1 Experiment Setup
We describe the performance metrics, datasets, models and their hyperparameters, as well as the inference baselines we compare to.

Performance Metrics. The computational cost is measured by the number of floating-point multiplications. The data transmission cost is measured by the number of non-zero values sent from the mobile device to the server. We should note that the three metrics cannot be applied to all baselines. For example, on-device inference does not send data to the server, so it has no data transmission cost. In-cloud inference and collaborative inference based on image compression have no or negligible computational cost on device.

Transformed Digit and Fashion MNIST Datasets. We constructed two new datasets from the original digit and fashion MNIST datasets, obtained through the TensorFlow and Keras APIs respectively. Both original MNIST datasets contain 60k images in the training set and 10k in the test set. For each digit/fashion image, we embedded it into a black background of size 100 × 100 pixels and then performed random scaling and translation of the embedded image. For transformed digit images, we further injected noise that consists of ten 2D sine waves with different phases and frequencies chosen from a Uniform distribution (0, 2π) and a Gaussian distribution (µ = 5, δ = 5), respectively.
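
A minimal sketch of this dataset construction is given below (ours; the scaling/translation ranges and the noise amplitude are not specified in the text and are therefore placeholders).

```python
import numpy as np

def transform_digit(img28, canvas=100, add_noise=True, rng=np.random):
    """Embed a 28x28 MNIST image into a 100x100 black canvas with random
    scale/translation, optionally adding low-frequency 2D sine noise."""
    scale = rng.uniform(0.8, 2.0)                      # assumed scale range
    size = max(1, int(28 * scale))
    idx = (np.arange(size) / scale).astype(int).clip(0, 27)
    digit = img28[idx][:, idx]                         # nearest-neighbour resize
    out = np.zeros((canvas, canvas), dtype=np.float32)
    x0 = rng.randint(0, canvas - size + 1)             # random placement
    y0 = rng.randint(0, canvas - size + 1)
    out[y0:y0 + size, x0:x0 + size] = digit
    if add_noise:  # ten 2D sine waves; phases ~ U(0, 2*pi), frequencies ~ N(5, 5)
        yy, xx = np.meshgrid(np.linspace(0, 1, canvas), np.linspace(0, 1, canvas),
                             indexing="ij")
        for _ in range(10):
            phase = rng.uniform(0, 2 * np.pi)
            freq = rng.normal(5, 5)
            out += 0.1 * np.sin(2 * np.pi * freq * (xx + yy) + phase)
    return out
```
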
CelebA Dataset. We performed experiments on the real-world dataset CelebA. CelebA has 162,770 training samples and 19,962 test samples. The original CelebA has 40 different labels. In this paper, we only use the "Smiling" label to evaluate our CiNet and the baselines.

Inference Baselines. We use four different inference approaches, with the accompanying convolutional models summarized in Table 1, including two traditional and two collaborative inference mechanisms.

Table 1: Architecture of CNNs on the transformed digit MNIST dataset (left) and Fashion MNIST (right). All CNNs have two fully connected layers with 100 and 50 neurons respectively, before the output layer.

  Transformed digit MNIST                       Transformed fashion MNIST
  Method  Filters in each conv layer  # conv    Method  Filters in each conv layer  # conv
  CNN1    (128, 256, 256)             3         CNN1    (64, 128, 128, 256, 256)    5
  CNN2    (64, 128, 256)              3         CNN2    (32, 64, 128, 256, 256)     5
  CNN3    (32, 64, 128)               3         CNN3    (16, 32, 64, 28, 128)       5
  CNN4    (8, 64, 128)                3         CNN4    (4, 16, 32, 64)             4
  CNN5    (8, 16, 64)                 3         CNN5    (2, 8, 8, 128)              4
  CNN6    (4, 8, 32)                  3
  CNN7    (4, 8, 16)                  3

Table 2: Architecture of ResNets on the CelebA dataset. The ResNets use 1 convolutional layer at the beginning, i.e. conv 1, followed by 4 stacks of residual blocks, i.e. conv 2x, ..., conv 5x. The number of filters is doubled at each stack.

  Method   Number of residual blocks   Filters in conv 1 and conv 2x
  ResNet1  (3, 4, 6, 3)                32
  ResNet2  (2, 2, 2, 2)                2
  ResNet3  (2, 2, 0, 0)                2

• On-device inference. For each transformed MNIST dataset, we trained a series of CNN models with a decreasing number of filters, as shown in Table 1. We describe the common hyperparameter settings below. For the CelebA dataset, we compared with three ResNets of different architecture settings, as listed in Table 2, and a simplified MobileNet V2, with one MobileNet block per bottleneck and an expansion rate of 1. These CNN models are assumed to run on mobile devices and are used as baselines for understanding the computational cost and inference accuracy trade-offs.

• In-cloud inference. Further, from Table 1 we selected the most accurate CNN model, i.e., CNN1, for each transformed MNIST dataset, and assume these CNNs to be hosted on the cloud servers. We trained the CNNs on the original datasets as well as on the JPEG-decompressed counterparts of the original datasets, with both high and low JPEG quality.

• Collaborative Inference with DeepN-JPEG [14]. DeepN-JPEG is a neural network-based image compression technique that aims to reduce data transmission while minimizing the impact on accuracy by preserving information useful to classification tasks. To implement DeepN-JPEG, we generated the quantization table used for image compression based on the statistical information of the datasets [14].

• Collaborative Inference with Model Partitions. Prior work on model partition techniques focused on identifying the best layer-wise partition point to split the inference computation across mobile devices and cloud servers [11, 13]. In our evaluations, we tested all possible partition points, i.e., layers, for the most accurate CNNs from Table 1. Each partition scheme is labeled as cut- followed by the layer name, such as cut-conv3. A partition scheme defines where the layers are executed. For example, with cut-conv3 all layers before the third conv layer will be executed on the mobile device and the rest on the cloud server. By evaluating all possible partitions, we can establish the performance of the optimal model partition policy, against which we compare (a code sketch of this enumeration follows the list).
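
As a sketch of how such layer-wise partition points can be enumerated (ours, using a stand-in Keras model rather than the exact CNN1 baseline), the loop below reports, for each cut layer, the size of the feature map that would have to be transmitted and the number of parameters that would reside on the device, the latter being only a crude proxy for on-device computational cost.

```python
import numpy as np
import tensorflow as tf

# Stand-in CNN: three conv blocks and two fc layers, loosely shaped like CNN1.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, 5, activation="relu", name="conv1",
                           input_shape=(100, 100, 3)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(256, 5, activation="relu", name="conv2"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(256, 5, activation="relu", name="conv3"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu", name="fc1"),
    tf.keras.layers.Dense(50, activation="relu", name="fc2"),
    tf.keras.layers.Dense(10, activation="softmax", name="output"),
])

on_device_params = 0
for layer in model.layers:
    on_device_params += layer.count_params()
    transmit = int(np.prod(list(layer.output.shape[1:])))  # values sent if we cut here
    if layer.name.startswith(("conv", "fc")):
        print(f"cut-{layer.name}: on-device params={on_device_params}, "
              f"transmitted values={transmit}")
```
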

Hyperparameter Settings. Here we introduce the hyperparameters used in our evaluations. (i) Common settings for CiNet and baselines: For experiments on both transformed digit and fashion MNIST datasets, we set the number of epochs to 12 and the batch size to 64, with a learning rate decay of 0.1. We adopt dropout for all models. For both CiNet and the CNNs, the filter window is 5 × 5 for convolution and 2 × 2 for pooling. (ii) CiNet settings: The initial learning rate is 0.01. On MNIST and fashion MNIST, the extractor has two convolutional layers, each followed by a max-pooling layer. The convolutional layers have very few filters: 2 in the first layer and 4 in the second. There is a fully connected hidden layer of 50 neurons before the final regression layer. The size of the cropped image sent to the classifier in the cloud is 10 × 10. On CelebA, the extractor is the same as the MobileNet baseline. We use bilinear sampling to crop the image. The weight of the penalty on the scaling transformation is 0.1. The classifier of CiNet also has two convolutional layers, with 128 and 256 filters. The convolutional layers are followed by two fully connected layers, with 100 and 50 neurons. All the weights in both the extractor and the classifier are initialized from a Gaussian distribution with zero mean and a standard deviation of 0.02; the final regression layer of the extractor uses a standard deviation of 0.2. (iii) Baseline settings: We set the initial learning rate to 0.1 for all CNNs listed in Table 1 except CNN4 and CNN5 in the right table; these two CNNs used an initial learning rate of 0.01.
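
For completeness, a Keras sketch of the in-cloud classifier matching the configuration above is shown below (our illustration; the activations, padding, dropout rate, and number of output classes are assumptions).

```python
import tensorflow as tf

def build_cloud_classifier(crop_shape=(10, 10, 3), num_classes=10):
    """In-cloud classifier: two conv layers (128, 256 filters) and two fc
    layers (100, 50 neurons), taking the 10x10 cropped patch as input."""
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(128, 5, padding="same", activation="relu",
                               kernel_initializer=init, input_shape=crop_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(256, 5, padding="same", activation="relu",
                               kernel_initializer=init),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu", kernel_initializer=init),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(50, activation="relu", kernel_initializer=init),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```
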

4.2 CiNet vs. On-Device Inference
We study the computation and accuracy trade-offs on all three datasets by comparing CiNet to different deep learning models (see Table 1 and Table 2). To evaluate their computational costs, we assume all CNNs run on mobile devices, i.e., executing these models does not incur any mobile-cloud data transmission cost.

Table 3 compares both the on-device computational cost and the average inference accuracy between all baseline CNNs and CiNet on the transformed digit MNIST dataset. As expected, the achieved inference accuracy decreases with the computational cost for the baseline CNNs. However, CiNet struck a balance between on-device computational cost and inference accuracy with the help of the in-cloud classifier submodel. For example, CiNet incurred the lowest computational cost, at about 20% of that of the fastest CNN7 and at merely 0.5% of that of CNN3, whose accuracy was 0.22% lower.

Table 3: Results on the transformed digit MNIST dataset. For in-cloud deployment, the model used is CNN1 in Table 1 (left). We use JPEG(H) and JPEG(L) to denote the high quality and low quality used to compress the images, respectively.

  Deployment     Method          Test accuracy (%)   Computational cost      Transmission cost
                                 (higher better)     on device, MFLOPS       # non-zero integers
                                                     (lower better)          (lower better)
  On-device      CNN1            96.20               3060                    -
                 CNN2            93.98               1040                    -
                 CNN3            93.52               264                     -
                 CNN4            92.22               162                     -
                 CNN5            91.73               26                      -
                 CNN6            86.39               7                       -
                 CNN7            83.52               5                       -
  In-cloud       Original        96.20               -                       10000
                 JPEG(H)         95.79               -                       1343
                 JPEG(L)         40.15               -                       116
                 DeepN-JPEG(H)   96.07               -                       1298
                 DeepN-JPEG(L)   42.20               -                       119
  Collaborative  CiNet           93.74               1                       100

We also observed similar trends when evaluating on the transformed fashion MNIST dataset. Table 4 shows that the computational cost of CiNet was only 0.001% and 0.002% of that of the CNNs (i.e., CNN1 and CNN2) with comparable inference accuracy (lower by 0.34% and 0.18%, respectively). Further, CiNet only incurred about 14.7% of the computational cost of the fastest CNN5 but achieved 16% better accuracy.

Table 4: Results on the transformed fashion MNIST dataset. For in-cloud deployment, the model used is CNN1 in Table 1 (right).

  Deployment     Method          Test accuracy (%)   Computational cost      Transmission cost
                                                     on device, MFLOPS       # non-zero integers
  On-device      CNN1            82.54               3060                    -
                 CNN2            82.38               1000                    -
                 CNN3            80.65               122                     -
                 CNN4            78.63               21                      -
                 CNN5            66.47               6                       -
  In-cloud       Original        82.54               -                       10000
                 JPEG(H)         82.13               -                       547
                 JPEG(L)         78.58               -                       131
                 DeepN-JPEG(H)   79.97               -                       509
                 DeepN-JPEG(L)   77.75               -                       152
  Collaborative  CiNet           82.29               1                       100

Lastly, we compared CiNet to the ResNets and MobileNet V2 on the CelebA dataset, which contains more complicated images of human faces. Table 5 shows that the computational cost of CiNet was only 0.04% and 0.16% of that of the ResNets (i.e., ResNet1 and ResNet2) with comparable inference accuracy (lower by 1.01% and 0.22%, respectively). Further, CiNet has a similar computational cost to the fastest on-device model, MobileNet V2, but achieved 4.06% better accuracy.

Table 5: Results on the CelebA dataset. For in-cloud deployment, the model used is ResNet1 in Table 2.

  Deployment     Method          Test accuracy (%)   Computational cost      Transmission cost
                                                     on device, MFLOPS       # non-zero integers
  On-device      ResNet1         92.04               21.32                   -
                 ResNet2         91.25               5.58                    -
                 ResNet3         87.18               3.14                    -
                 MobileNet V2    86.97               0.9                     -
  In-cloud       Original        92.04               -                       38804
                 JPEG(H)         91.15               -                       3136
                 JPEG(L)         87.62               -                       1295
                 DeepN-JPEG(H)   91.07               -                       2980
                 DeepN-JPEG(L)   88.44               -                       1163
  Collaborative  CiNet           91.03               0.9                     256

In summary, these results support our hypothesis that finding the essential content for classification can be much more computationally efficient than directly classifying the entire image. Deploying the extractor submodel instead of a complete CNN model on-device can reduce the computational complexity by two to three orders of magnitude, with negligible accuracy loss.

4.3 CiNet vs. Image Compression Based Inferences
We focus on evaluating the trade-offs between inference accuracy and mobile-cloud data transmission. The baseline CNNs run on the cloud server and therefore do not incur on-device computational cost.

Table 3 compares the performance of CNN1 from Table 1 and CiNet on the transformed digit MNIST dataset. CiNet, with a crop size of 10 × 10, only incurred 1% of the data transmission cost, but at the cost of 2.46% lower accuracy when comparing to sending original images to CNN1. Further, we compare CiNet with two image compression techniques, i.e., the traditional JPEG algorithm and a deep learning based compression called DeepN-JPEG [14]. For JPEG, we chose two quality levels of 50 and 0.5, as the former is a common configuration and the latter achieves the same data transmission cost as CiNet. We then trained the in-cloud CNN1 with images that were first compressed with the corresponding JPEG configuration and then decompressed. We followed the same strategy described above to choose two quality levels of 50 and 0.15 for DeepN-JPEG and trained CNN1 again.

Table 3 shows that the CNN1 trained with higher JPEG quality only incurred 10% of the data transmission cost with 0.41% lower inference accuracy, when compared to sending the original images. This result supports the common sense that JPEG can preserve the essential information at about a 10% compression rate. However, for CNN1 trained with images of lower JPEG quality, its accuracy was 50% lower than that of CiNet. Such accuracy loss can be attributed to the background 2D sine noise. We can further observe that DeepN-JPEG achieved 2.2% better accuracy with similar data transmission cost when comparing to JPEG of similar configurations. Similarly, the accuracy of CNN1 trained with DeepN-JPEG(L) was again 50% lower than CiNet while incurring the same data transmission cost.

We also observed similar results on the transformed fashion MNIST dataset. For example, Table 4 shows that the accuracy of CiNet was 0.34% lower than the CNN1 trained on original images, but with only 0.01% of the data transmission cost. When trained with images compressed and decompressed with higher JPEG quality, the accuracy of CNN1 was 0.27% lower than that of CNN1 trained on original images. For CNN1 trained with lower JPEG quality, it incurred the same data transmission cost but 3.63% lower accuracy than CiNet. Interestingly, the classification-aware DeepN-JPEG had slightly lower accuracy than JPEG for this dataset. As one of the key differences between these two datasets is the existence of low-frequency noise, such accuracy discrepancy might be caused by DeepN-JPEG's ability to remove such noise.

Lastly, we compared CiNet to the baselines on the CelebA dataset. Table 5 shows that the data transmission cost of CiNet was only 0.006%, 0.081% and 0.085% of that of the in-cloud ResNet fed with original images, and with images transmitted by high-quality JPEG and DeepN-JPEG, with negligible loss of accuracy, by 1.01%, 0.12% and 0.04%, respectively. Using JPEG(L) and DeepN-JPEG(L), the accuracies are 3.41% and 2.59% lower than CiNet, while still incurring about 5X the data transmission cost.

In summary, these results suggest that lower compression quality can lead to the loss of important content for classification. Further, the performance differences can also be attributed to the design goals of JPEG and CiNet. JPEG aims at recovering the whole image whereas CiNet aims at finding the important content for classification. As such, when there is a lot of label-irrelevant content, JPEG still has to try to recover it whereas CiNet can simply avoid it. CiNet demonstrated its ability to achieve lower data transmission cost and higher inference accuracy when compared to image compression based techniques.

4.4 CiNet vs. Collaborative Inference with Model Partition
We focus on comparing the on-device computation and the data transmission cost between CiNet and the optimal model partition.

Table 6 shows the performance of CNN1 from Table 1 and CiNet on the transformed digit MNIST dataset. As this CNN1 consists of three convolutional (conv) layers, two fully-connected (fc) hidden layers, and a final classifier layer, we evaluated the performance under all five layer-wise partition schemes. As we can see, the on-device computational cost increased as the partition point moves toward the output layer, with a significant jump from the first partition scheme, i.e., cut-conv1, to the next, i.e., cut-conv2. The later in the CNN the partition point is, the more computation it incurs.

This is expected, as more model layers need to be executed on the mobile device. However, we observe a different trend with the data transmission cost. Specifically, the mobile data transmission cost lowered as the partition point moves toward the output layer. Again, this is expected, as the feature maps of later conv layers have lower dimension and the last fully connected layer only generates as little data as is needed for the classification labels. In short, the optimal partition point for achieving the lowest computational cost, i.e., cut-conv1, and the one for the lowest data transmission cost, i.e., cut-fc2, cannot be achieved at the same time under the model partition approach. In contrast, CiNet was able to balance these two design goals, achieving low data transmission cost (as low as cut-fc1) and an order of magnitude lower computational cost than cut-conv1. We observe similar benefits of CiNet on the transformed fashion MNIST dataset.

Table 6: Comparisons with model partitions on the transformed digit MNIST dataset (left) and fashion MNIST (right). We partitioned the CNN, with the structure of CNN1 in the left and right of Table 1, at all possible layer-wise partition points. The computational (Comp) and data transmission (Trans) costs are shown.

  Transformed digit MNIST                    Transformed fashion MNIST
  Partition Point  Comp Cost  Trans Cost     Partition Point  Comp Cost  Trans Cost
  conv1            32         320000         conv1            16         160000
  conv2            2040       160000         conv2            528        80000
  conv3            3060       43200          conv3            784        20000
  fc1              3060       100            conv4            922        12500
  fc2              3060       50             conv5            1000       4096
  CiNet            1          100            fc1              1000       100
                                             fc2              1000       50
                                             CiNet            1          100

In summary, CiNet demonstrated its ability to balance on-device computation and mobile data transmission cost when compared to model partition techniques.

Figure 3: Impact of hyperparameters. (a) Extraction size. (b) L2 penalty weight.

4.5 Impact of Hyperparameters
Lastly, we evaluate the impact of two important hyperparameters on CiNet's inference accuracy.

• Extraction Size specifies the size of the on-device extractor submodel's output and directly impacts the required data to send to the cloud. Figure 3(a) shows that both the training and the test accuracy increase with the extraction size. This demonstrates the importance of setting a sufficiently large extraction size, as we observed underfitting with smaller extraction sizes. However, naively increasing the extraction size is not ideal, as a larger extraction size leads to higher data transmission cost. We also observed that both accuracies plateaued at an extraction size of 10×10, which can save up to 16X data transmission cost compared to larger extraction sizes. Our results suggest the need to carefully tune the extraction size to trade off between data transmission cost and inference accuracy under different application scenarios.

• L2 Penalty Weight controls how aggressively the on-device extractor submodel scales up the mapped region in the original image. Figure 3(b) shows that models can overfit, indicated by the large gap between training and test accuracy, without using the L2 penalty. This is because the on-device submodel is more likely to output a low-resolution version of the original image by using the scaling transformation to cover the whole image. We observe that with a penalty weight of 0.1, both the training and test accuracy were at their respective highs. This is because a larger penalty forces the extractor submodel to focus on cropping a small region. However, if the penalty is too large, the extractor might have trouble with larger regions of interest. In summary, our results suggest the importance of choosing a reasonable penalty to help the extractor submodel crop the important label-related image content.

5 Related Work

Collaborative Inference with Model Partition. Han et al. [6] proposed to generate a resource-efficient variant of a given network, and provided a run-time system called MCDNN which splits the generated network into two fragments and executes them on the mobile device and the cloud server, respectively. Kang et al. [11] developed Neurosurgeon, an automatic model partition scheme that can adapt to different hardware environments and model structures. Li et al. [13] proposed to quantize the on-device model partition to further reduce the computational and memory requirements on the mobile device. Our design of collaboration-aware models can be regarded as a way to partition a new model optimally.

Image Compression. JPEG, one of the most widely used image compression algorithms, allows adjusting the compression quality to trade off between image size and quality. It typically achieves 10:1 compression with little perceptible loss in image quality. A number of recent works explored the use of convolutional neural networks [3] or autoencoders [19] as an alternative to compress image data. However, these works often only care about recovering the original image content without considering the impact on classification. Recently, Xie and Kim proposed to rework the traditional image compression algorithm JPEG by modifying the quantization table based on the gradients of a neural network-based image classifier [21]. Similarly, Liu et al. proposed a dataset-aware compression algorithm that adjusts the quantization table with the statistical information of the training dataset [14]. Our work shares a similar design goal with these recent classification-aware compression algorithms: reducing network data transmission without impacting classification accuracy.

Hard Attention Mechanism. The hard attention mechanism, which aims at reducing computational and memory cost, shares similar goals with our work. Mnih et al. proposed a recurrent attention model [15] for visual learning tasks that leverages the REINFORCE algorithm for training. Similarly, CiNet also uses the REINFORCE algorithm if the direct cropping operation is used. Jaderberg et al. proposed the spatial transformer network (STN), which performs spatial transformations on any feature map [10]. CiNet follows a similar design when using bilinear image sampling to crop the image. However, CiNet differs from STN in that STN was designed to learn invariance to transformations, while CiNet focuses on reducing computational and transmission cost for collaborative inference.

6 Conclusion

In this work, we identified the need to design collaboration-aware deep neural networks for efficient mobile inference. We proposed CiNet, a deep neural network for image classification that leverages the key insight of avoiding sending non-essential data to cloud servers to simultaneously reduce on-device computational cost, lower mobile network data transmission cost, and maintain high inference accuracy. Our evaluations of CiNet on three datasets demonstrated that CiNet reduced mobile computation by up to three orders of magnitude, lowered mobile data transmission by 99%, and had small inference accuracy differences of 0.34%-2.46%, compared to four inference approaches including two collaborative inference approaches.

7 Acknowledgement

This work was supported in part by NSF Grant CNS-1815619.

References

[1] A. Almahairi, N. Ballas, T. Cooijmans, Y. Zheng, H. Larochelle, and A. Courville. Dynamic capacity networks. In ICML, 2016.
[2] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. In ICLR, 2016.
[3] J. Ballé, V. Laparra, and E. P. Simoncelli. End-to-end optimized image compression. In ICLR, 2017.
[4] R. Girshick. Fast R-CNN. In ICCV, 2015.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[6] S. Han, H. Shen, M. Philipose, S. Agarwal, A. Wolman, and A. Krishnamurthy. MCDNN: An execution framework for deep neural networks on resource-constrained devices. In MobiSys, 2016.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[8] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
[9] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv:1602.07360, 2016.
[10] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In NeurIPS, 2015.
[11] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In ASPLOS, 2017.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[13] G. Li, L. Liu, X. Wang, X. Dong, P. Zhao, and X. Feng. Auto-tuning neural network quantization framework for collaborative inference between the cloud and edge. In ICANN, 2018.
[14] Z. Liu, T. Liu, W. Wen, L. Jiang, J. Xu, Y. Wang, and G. Quan. DeepN-JPEG: A deep neural network favorable JPEG-based image compression framework. In DAC, 2018.
[15] V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NeurIPS, 2014.
[16] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[19] L. Theis, W. Shi, A. Cunningham, and F. Huszár. Lossy image compression with compressive autoencoders. In ICLR, 2017.
[20] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992.
[21] X. Xie and K.-H. Kim. Source compression with bounded DNN perception loss for IoT edge computer vision. In MobiCom, 2019.
[22] X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
