
COVER FEATURE EMBEDDED DEEP LEARNING

Deep Learning for the Internet of Things

Shuochao Yao and Yiran Zhao, University of Illinois Urbana-Champaign (UIUC); Aston Zhang, Amazon AI; Shaohan Hu, IBM Thomas J. Watson Research Center; Huajie Shao and Chao Zhang, UIUC; Lu Su, State University of New York, Buffalo; Tarek Abdelzaher, UIUC

How can the advantages of deep learning be brought to the emerging world of embedded IoT devices? The authors discuss several core challenges in embedded and mobile deep learning, as well as recent solutions demonstrating the feasibility of building IoT applications that are powered by effective, efficient, and reliable deep learning models.

The proliferation of internetworked mobile and embedded devices leads to visions of the Internet of Things (IoT), giving rise to a sensor-rich world where physical things in our everyday environment are increasingly enriched with computing, sensing, and communication capabilities. Such capabilities promise to revolutionize the interactions between humans and physical objects.

Indeed, significant research efforts have been spent toward building smarter and more user-friendly applications on mobile and embedded devices. At the same time, recent advances in deep learning have greatly changed the way that computing devices process human-centric content such as images, video, speech, and audio. Applying deep neural networks to IoT devices could thus bring about a generation of applications

32 PUBLISHED BY THE IEEE COMPUTER SOCIETY 0018-9162/18/$33.00 © 2018 IEEE

capable of performing complex sensing and recognition tasks to support a new realm of interactions between humans and their physical surroundings. This article discusses four key research questions toward the realization of such novel interactions between humans and (deep-) learning-enabled physical things, namely: What deep neural network structures can effectively process and fuse sensory input data for diverse IoT applications? How to reduce the resource consumption of deep learning models such that they can be efficiently deployed on resource-constrained IoT devices? How to compute confidence measurements in the correctness of deep learning predictions for IoT applications? Finally, how to minimize the need for labeled data in learning?

To elaborate on the above challenges, first observe that IoT applications often depend on collaboration among multiple sensors, which requires designing novel neural network structures for multisensor data fusion. These structures should be able to model complex interactions among multiple sensory inputs over time and effectively encode features of sensory inputs that are pertinent to desired recognition and other tasks. We review a general deep learning framework for this purpose, called DeepSense,1 that provides a unified yet customizable solution for the learning needs of various IoT applications. It demonstrates that certain combinations of deep neural network topologies are particularly well-suited for learning from sensor data.

Second, IoT devices are usually low-end systems with limited computational, energy, and memory resources. One key impediment in deploying deep neural networks on IoT devices therefore lies in the high resource demand of trained deep neural network models. While existing neural network compression algorithms can effectively reduce the number of model parameters, not all of these models lead to matrix representations that can be efficiently implemented on IoT devices. Recent work describes a particularly effective deep learning compression algorithm, called DeepIoT,2 that can directly compress the structures of commonly used deep neural networks. The compressed model can be deployed on commodity devices. A large proportion of execution time, energy, and memory can be reduced with little effect on the final prediction accuracy.

Third, reliability assurances are important in cyber-physical and IoT applications. The need for offering such assurances calls for well-calibrated estimation of uncertainty associated with learning results. We present a simple method, called RDeepSense,3 for generating well-calibrated uncertainty estimates for the predictions computed in deep neural networks. It achieves accurate and well-calibrated estimations by changing the objective function to faithfully reflect prediction correctness.

Finally, labeling data for learning purposes is time-consuming. One must teach sensing devices to recognize objects and concepts without the benefit of (many) examples where ground truth values for such objects and concepts are given. Unsupervised and semisupervised solutions are needed to solve the challenge of learning with limited labeled (and mostly unlabeled) samples, while approaching the performance of learning from fully labeled data.

We elaborate on these core problems and their emerging solutions to help lay a foundation for building IoT systems enriched with effective, efficient, and reliable deep learning models.

ON DEEP LEARNING MODELS FOR SENSOR DATA

A key research challenge toward the realization of learning-enabled IoT systems lies in the design of deep neural network structures that can effectively estimate outputs of interest from noisy time-series multisensor measurements.1

Despite the large variety of embedded and mobile sensing tasks in IoT contexts, one can generally categorize them into two common subtypes: estimation tasks and classification tasks, depending on whether prediction results are continuous or categorical, respectively. The question therefore becomes whether or not a general neural network architecture exists that can effectively learn the structure of models needed for estimation and classification tasks from sensor data. Such a general deep learning neural network architecture would, in principle, overcome disadvantages of today's approaches that are based on analytical model simplifications or the use of hand-crafted engineered features.

Traditionally, for estimation-oriented problems such as tracking and localization, sensor inputs are processed based on the physical models of the phenomena involved. Sensors generate measurements of physical quantities such as acceleration and angular velocity. From these measurements, other physical quantities are derived (such as displacement through double integration of acceleration over time). However, measurements of commodity sensors are noisy.
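To make the physical-model approach concrete, the following sketch derives displacement from acceleration samples by trapezoidal double integration. The trace and noise level are invented for illustration and are not data from the article:

```python
import random

def double_integrate(accel, dt):
    """Derive displacement from acceleration samples by double
    (trapezoidal) integration, the classical physical-model approach."""
    velocity, displacement = 0.0, 0.0
    prev_a = accel[0]
    for a in accel[1:]:
        new_v = velocity + 0.5 * (prev_a + a) * dt   # integrate a -> v
        displacement += 0.5 * (velocity + new_v) * dt  # integrate v -> d
        prev_a, velocity = a, new_v
    return displacement

# Ideal constant acceleration of 2 m/s^2 for 1 s, sampled at 100 Hz:
dt = 0.01
print(round(double_integrate([2.0] * 101, dt), 6))  # d = a*t^2/2 = 1.0 m

# The same trace with zero-mean sensor noise: the error in the derived
# displacement accumulates over time rather than averaging out.
random.seed(0)
noisy = [2.0 + random.gauss(0.0, 0.5) for _ in range(101)]
print(round(double_integrate(noisy, dt), 3))
```

The second print illustrates the point made above: even unbiased noise, once integrated twice, produces a drifting displacement estimate.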

MAY 2018 33

[Figure 1 depicts K sensor inputs over T time intervals of width τ, each passing through three individual convolutional layers, a flatten-and-concatenation step, three merge convolutional layers, two stacked GRU recurrent layers, and a type-specific output layer producing single or multiple outputs.]
FIGURE 1. Main architecture of the DeepSense framework.

The noise in measurements is nonlinear and might be correlated over time, which makes it hard to model. It is therefore challenging to separate signal from noise, leading to estimation errors and bias.

For classification-oriented problems, such as activity and context recognition, a typical approach is to compute appropriate features derived from raw sensor data. These hand-crafted features are then fed into a classifier for training. Designing good hand-crafted features can be time-consuming; it requires extensive experiments to generalize well to diverse settings such as different sensor noise patterns and heterogeneous user behaviors.

A general deep learning framework can effectively address both of the aforementioned challenges by automatically adapting the learned neural network to complex correlated noise patterns while, at the same time, converging on the extraction of maximally robust signal features that are most suited for the task at hand. A recent framework, called DeepSense, demonstrates a case for the feasibility of such a general solution.

As shown in Figure 1, DeepSense integrates convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Sensory inputs are aligned and divided into time intervals for processing time-series data. For each interval, DeepSense first applies an individual CNN to each sensor, encoding relevant local features within the sensor's data stream. Then, a (global) CNN is applied on the respective outputs to model interactions among multiple sensors for effective sensor fusion. Next, an RNN is applied to extract temporal patterns. At last, either an affine transformation or a softmax output is used, depending on whether we want to model an estimation or a classification task.

This architecture solves the general problem of learning multisensor fusion tasks for purposes of estimation or classification from time-series data. For estimation-oriented problems, DeepSense learns the physical system and noise models to yield outputs from noisy sensor data directly. The neural network acts as an approximate transfer function. For classification-oriented problems, the neural network acts as an automatic feature extractor encoding local, global, and temporal information.
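The data flow just described can be sketched at a purely structural level. The toy operations below (a moving average standing in for the learned convolutions, an exponential blend standing in for the GRUs) are stand-ins of our own and not the actual DeepSense layers; only the wiring follows the architecture:

```python
import math

def conv1d(xs, k=3):
    # moving average stands in for a learned convolution
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]

def recurrent(states):
    # exponential blend stands in for a GRU over time intervals
    h = 0.0
    for s in states:
        h = 0.5 * h + 0.5 * s
    return h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def deepsense_sketch(intervals, n_classes=3):
    per_interval = []
    for sensors in intervals:                    # T time intervals
        local = [conv1d(s) for s in sensors]     # individual CNN per sensor
        merged = conv1d([sum(col) for col in zip(*local)])  # merge CNN
        per_interval.append(sum(merged))         # flattened interval feature
    h = recurrent(per_interval)                  # RNN over intervals
    # softmax head for classification; an affine head would serve estimation
    return softmax([h * (c + 1) for c in range(n_classes)])

# Two time intervals, each with K = 2 sensors of 5 samples:
intervals = [[[0.1, 0.2, 0.3, 0.2, 0.1], [0.0, 0.1, 0.0, 0.1, 0.0]],
             [[0.2, 0.2, 0.2, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1, 0.1]]]
probs = deepsense_sketch(intervals)
print(len(probs), round(sum(probs), 6))  # class probabilities summing to 1
```

Swapping the final softmax for an affine transformation yields the estimation variant, mirroring the type-specific output layer in Figure 1.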

As a unified model, DeepSense can be easily customized for a specific IoT application. The application designer needs only to decide on the number of sensory inputs, the input/output dimensions, and the training objective function. The detailed mathematical formulation of DeepSense can be found in a related article.1

Encouraging results were reported on applying DeepSense in two representative sensing tasks: heterogeneous human activity recognition (HHAR) and user identification with biometric motion analysis (UserID). HHAR is a motion-sensor-based activity-recognition task. It is tested on new users who have not appeared in the training set. In contrast, UserID uses motion sensors for user identification from activities such as walking, biking, and climbing stairs.

To understand the contributions of different architectural components, variants of the DeepSense model were introduced by removing some design component(s) from the general architecture. DS-singleGRU simplifies the RNN by replacing its two-layer stacked GRU architecture with a single-layer GRU of a larger dimension, while keeping the number of parameters the same. DS-noIndvConv skips the convolutional subnets for individual sensors, keeping a single CNN that merges data from all sensors in each time window. Finally, DS-noMergeConv skips the global convolutional subnet that merges sensor data. Instead, it flattens the output of each individual convolutional subnet and concatenates them into a single vector as the input to the RNN.

These models (together with the overall DeepSense model) were compared to various custom-designed or hand-crafted baselines for each application, including HAR-RF,4 HAR-SVM,4 HAR-RBM, and HAR-MultiRBM5 for activity recognition, and GaitID6 and IDNet7 for user identification.

FIGURE 2. Performance metrics of the heterogeneous human activity recognition (HHAR) task with the DeepSense framework.

FIGURE 3. Performance metrics of the UserID task with the DeepSense framework.

Accuracy results in performing the HHAR and UserID tasks are illustrated in Figures 2 and 3, respectively. The DeepSense-based algorithms (including DeepSense and its three variants) outperform the other baseline algorithms by a large margin (that is, at least 10 percent for HHAR and at least 20 percent for UserID). The results offer anecdotal evidence that a general deep learning architecture can beat hand-crafted solutions designed for the individual application spaces. Although current work is by no means a consummate proof of generalizability, this property (if true) would be very important, because a main appeal of applying deep learning in IoT contexts lies in obviating the need for per-application customization of theoretical derivations and hand-crafted features. More research is needed to substantiate or refute the early evidence and to understand the limits of generalizability of learning models across IoT systems.
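Figures 2 and 3 report accuracy alongside macro and micro F1, which weight classes differently under the class imbalance common in sensing data. As a self-contained refresher, these metrics can be computed as follows (the labels are made up for illustration, not drawn from the HHAR data):

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    per_class = []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    macro = sum(per_class) / len(labels)      # unweighted mean over classes
    # micro F1 pools counts over classes; for single-label tasks it
    # coincides with accuracy
    t_tp, t_fp, t_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * t_tp / (2 * t_tp + t_fp + t_fn)
    return macro, micro

y_true = ["walk", "walk", "bike", "stairs", "bike", "walk"]
y_pred = ["walk", "bike", "bike", "stairs", "bike", "walk"]
macro, micro = f1_scores(y_true, y_pred)
print(round(macro, 3), round(micro, 3))  # 0.867 0.833
```

Macro F1 treats the rare "stairs" class as equal in weight to "walk", which is why the two averages diverge here.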


COMPRESSING NEURAL NETWORK STRUCTURES

Resource constraints of IoT devices remain an important impediment toward deploying deep learning models. A key question is therefore whether it is possible to compress deep neural networks, such as those described in the previous section, to a point where they fit comfortably on low-end embedded devices, enabling real-time "intelligent" interactions with their environment. Can a unified approach compress commonly used deep learning structures, including fully connected, convolutional, and recurrent neural networks, as well as their combinations? To what degree does the resulting compression reduce energy, execution time, and memory needs in practice?2

An illustration of such a compression framework, called DeepIoT,2 is shown in Figure 4. DeepIoT borrows the idea of dropping hidden elements from a widely used deep learning regularization method called dropout. The dropout operation gives each hidden element a dropout probability. During the dropout process, hidden elements can be pruned according to their dropout probabilities. A "thinned" network structure can thus be generated. The challenge is to set these dropout probabilities in an informed manner to generate the optimal slim network structure that preserves the accuracy of sensing applications while maximally reducing their resource consumption. An important purpose of DeepIoT is thus to find the optimal dropout probability for each hidden element in the neural network.

FIGURE 4. Overall DeepIoT system framework. Orange boxes represent dropout operations. Green boxes represent parameters of the original neural network.

To obtain the optimal dropout probabilities for nodes in the neural network, DeepIoT exploits the network parameters themselves. From the perspective of model compression, an element that is more redundant should have a higher probability of being dropped. A contribution of DeepIoT lies in exploiting a novel compressor neural network to solve this problem. It takes the model parameters of each layer as input, learns parameter redundancies, and generates the dropout probabilities accordingly. The compressor neural network is optimized jointly with the original neural network to be compressed, in an iterative manner that tries to minimize the loss function of the original IoT application.

FIGURE 5. The tradeoff between testing accuracy and memory consumption by models.
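The structural effect of dropout-probability-based pruning can be seen on a toy fully connected layer. The probabilities below are invented stand-ins for what a compressor network might emit; the point is that, unlike zeroing individual weights, removing whole hidden units shrinks the weight matrix itself (and the input dimension of the following layer):

```python
def prune_layer(weights, drop_prob, threshold=0.5):
    """Keep only the hidden units whose dropout probability is below
    threshold. weights: one row of input weights per hidden unit."""
    kept = [i for i, p in enumerate(drop_prob) if p < threshold]
    return [weights[i] for i in kept], kept

# A toy layer with 4 hidden units, each taking 3 inputs. Units the
# (hypothetical) compressor deems redundant get high dropout probability.
weights = [[0.2, -0.1, 0.4],
           [0.0,  0.0, 0.1],
           [0.5,  0.3, -0.2],
           [0.1,  0.0, 0.0]]
drop_prob = [0.1, 0.9, 0.2, 0.8]   # hypothetical compressor outputs
slim, kept = prune_layer(weights, drop_prob)
before = sum(len(row) for row in weights)
after = sum(len(row) for row in slim)
print(kept, f"{after}/{before} parameters kept")  # [0, 2] 6/12 parameters kept
```

Because the surviving units form a dense, smaller matrix, the savings translate directly into memory and multiply-accumulate reductions on commodity hardware, which is the property the article emphasizes.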

Evaluation shows that the DeepIoT compression algorithm is able to greatly reduce the network size, execution time, and energy consumption without hurting the prediction accuracy.2 We continue to use UserID as the running application example, and compare compression efficacy to that of several baselines, namely DyNS,8 SparseSep,9 and DyNS-Ext.

DyNS is a magnitude-based network pruning algorithm that prunes weights in convolutional kernels and fully connected layers based on their magnitude. SparseSep simplifies the fully connected layer by the sparse-coding technique, and compresses the convolutional layer with matrix factorization. DyNS-Ext extends the magnitude-based method used in DyNS to recurrent layers. Just like DeepIoT, DyNS-Ext can be applied to all commonly used deep network modules, including fully connected layers, convolutional layers, and recurrent layers. All models use 32-bit floats without quantization. Experiments are conducted on the Edison platform.

The detailed tradeoff between testing accuracy and memory consumption of the resulting models is illustrated in Figure 5. We compress the original DeepSense neural network with different compression ratios and observe the final testing accuracy. DeepIoT achieves the best tradeoff. The tradeoff between execution time and testing accuracy is shown in Figure 6. Similarly, the tradeoff between energy consumption and testing accuracy is shown in Figure 7. DeepIoT offers the best reduction in execution time (approximately 80.8 percent) as well as the best reduction in energy consumption (approximately 83.3 percent) without apparent loss in accuracy.

FIGURE 6. The tradeoff between testing accuracy and execution time.

FIGURE 7. The tradeoff between testing accuracy and energy consumption.

The ability of compression algorithms to significantly reduce network size without affecting accuracy suggests that the underlying models of IoT applications are inherently low-dimensional, thus allowing for significant simplifications of the learned neural network structures. This is good news in terms of feasibility of implementation on resource-limited hardware, such as the Edison board used in the above evaluation.

ESTIMATING UNCERTAINTY

The next problem concerns the reliability of deep learning models. In particular, how to offer principled uncertainty estimates that can faithfully reflect the correctness of model predictions? Principled uncertainty estimation is critical when deep learning is used to support IoT applications that require quantified reliability assurances. Recent work focused on two related challenges:


››how to develop methods that provide accurate uncertainty estimates in prediction results obtained from deep learning models, and
››how to develop resource-efficient solutions for the uncertainty estimation problem, such that they can be implemented on resource-limited IoT devices.

In this section, we introduce a simple, well-calibrated, and efficient uncertainty estimation algorithm for a multilayer perceptron (MLP), called RDeepSense.3 RDeepSense enables uncertainty estimation with theoretically proven error bounds for IoT applications.

There are only two steps in computing uncertainty for an arbitrary fully connected neural network. First, insert dropout operations into each fully connected layer. Second, adopt a proper scoring rule as the loss function and emit a distribution estimate instead of a point estimate at the output layer.

Intuitively speaking, the dropout operations convert a traditional (deterministic) neural network into a Bayesian neural network model with random variables, which equates a neural network to a statistical model. Proper scoring rules (based on the loss function) then measure the accuracy of probabilistic predictions.

The loss function has a large effect on the final results. Taking a regression problem as an example, using the mean square error as the loss function tends to underestimate the uncertainties. This is so because the training process is focused on predicting an accurate mean value without concerning itself with the variance. At the same time, using negative log-likelihood as the loss function tends to overestimate the uncertainties. The reason is that, during the early phase of training a neural network with log-likelihood loss, it is relatively hard to generate an accurate estimate of the mean. Increasing the value of the estimated variance can consistently decrease the negative log-likelihood loss with a high probability. Therefore, the predicted uncertainty tends to favor a larger variance that overestimates the true uncertainty.

RDeepSense instead applies a tunable function, based on a weighted sum of negative log-likelihood and mean square error, as the loss function. The underestimation effect of mean square error and the overestimation effect of negative log-likelihood are thus balanced by tuning the weighted sum. RDeepSense was shown to generate well-calibrated uncertainty estimates.

Regarding resource efficiency, since RDeepSense emits a distribution estimate instead of a point estimate at the output layer, it can do the uncertainty estimation in a single run. Compared with sampling-based and ensemble-based methods that require running a model k times for k samples, RDeepSense results in much reduced execution time and energy consumption.

We evaluate the accuracy of uncertainty estimation of RDeepSense and related baselines on the NYCommute task, which predicts commute times in New York City based on a data set of taxi-cab pick-up/drop-off times and locations.

TABLE 1. Mean absolute error (MAE) and negative log-likelihood (NLL) for the NYCommute task.

Deep learning algorithm | MAE   | NLL
RDeepSense              | 5.64  | 7.7
SSP-1                   | 8.15  | 4.86
SSP-3                   | 7.90  | 4.67
SSP-5                   | 7.51  | 4.84
SSP-10                  | 7.03  | 4.81
MCDrop-3                | 5.69  | 19,995.6
MCDrop-5                | 5.64  | 1,335.73
MCDrop-10               | 5.61  | 640.35
MCDrop-20               | 5.61  | 640.35
Gaussian Process        | 11.84 | 7.46

We compare RDeepSense to three baseline algorithms, called MCDrop,10 SSP,11 and Gaussian Process (GP). All deep-learning-based algorithms use a four-layer fully connected neural network with 500 hidden dimensions. MCDrop is based on Monte Carlo dropout. Compared with RDeepSense, the main difference is that MCDrop is not optimized by a proper scoring rule. MCDrop requires running the neural network multiple times to generate samples for uncertainty estimation. We use MCDrop-k to represent MCDrop with k samples. SSP trains the neural network with proper scoring methods. Compared with RDeepSense, the main difference is that SSP uses the ensemble method instead of the dropout operation in each layer. SSP requires training multiple neural networks for the ensemble. We use SSP-k to represent SSP with an ensemble of k individual neural networks. GP is a Gaussian-process-based algorithm, used to illustrate the quality of uncertainty estimation generated by a statistical model.
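The pull between the two objectives described above can be seen in a toy computation. Assume a Gaussian output layer; the weight alpha below is a hypothetical setting of ours, not a value from the RDeepSense paper:

```python
import math

def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under a Gaussian prediction."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def weighted_loss(y, mean, var, alpha=0.5):
    """Weighted sum of mean square error and NLL, in the spirit of the
    tunable RDeepSense objective (alpha is a hypothetical weight)."""
    return alpha * (y - mean) ** 2 + (1 - alpha) * gaussian_nll(y, mean, var)

# Early in training the predicted mean is poor (here off by 3.0):
y, mean = 5.0, 2.0
print(round(gaussian_nll(y, mean, 0.5), 3))   # small variance: 9.572
print(round(gaussian_nll(y, mean, 9.0), 3))   # inflated variance: 2.518
# NLL alone rewards inflating the variance while the mean is still wrong;
# the MSE term ignores the variance and anchors the combined loss to the
# quality of the mean.
print(round(weighted_loss(y, mean, 9.0), 3))  # 5.759
```

Tuning alpha trades the variance-inflating pressure of NLL against the variance-blind pressure of MSE, which is exactly the balancing act the text describes.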

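A simple way to check whether uncertainty estimates are well calibrated is to measure how often the ground truth falls inside each prediction's central confidence interval. A sketch assuming Gaussian predictive distributions (the residuals are toy values, not the NYCommute data):

```python
import math

def normal_quantile(p):
    """Inverse CDF of the standard normal via bisection on erf."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def coverage(y, means, sigmas, z):
    """Fraction of observations inside each prediction's central
    z-interval. Well-calibrated uncertainty gives coverage close to z."""
    k = normal_quantile(0.5 + z / 2)   # half-width in standard deviations
    inside = sum(1 for yi, m, s in zip(y, means, sigmas)
                 if abs(yi - m) <= k * s)
    return inside / len(y)

# Toy check: residuals of exactly 0, +/-1, and +/-2 standard deviations.
y      = [0.0, 1.0, -1.0, 2.0, -2.0]
means  = [0.0] * 5
sigmas = [1.0] * 5
print(coverage(y, means, sigmas, 0.95))  # |residual| <= 1.96*sigma: 3 of 5
```

Sweeping z from 0 to 1 and plotting coverage against z produces exactly the kind of calibration curve reported in Figures 8 and 9; the diagonal is the well-calibrated ideal.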
In testing, we compute the z% confidence interval based on the predicted mean and variance of each algorithm. We then measure the fraction of the testing data that falls into this confidence interval. For a well-calibrated uncertainty estimation, the fraction of testing data that falls into the confidence interval should be similar to z%.

The comparison result is shown in Table 1. MCDrop-k shows low MAE and high NLL, while SSP-k shows high MAE and low NLL. MCDrop-k tries to minimize the mean square error, while SSP-k tries to minimize the negative log-likelihood. Therefore, MCDrop-k focuses more on the mean of the predictive distribution, and SSP-k focuses more on the overall likelihood. RDeepSense combines the two objective functions, mean square error and negative log-likelihood, to find a balance between the two.

The calibration curves are illustrated in Figures 8 and 9. Both MCDrop-k and SSP-k fail to generate high-quality uncertainty estimates, either underestimating or overestimating the uncertainty. However, RDeepSense provides uncertainty estimates of good quality, outperforming GP by a significant margin. The results offer a path toward accurate estimation of uncertainty in the outputs of deep learning models.

FIGURE 8. The calibration curves of RDeepSense, GP, and MCDrop-k.

FIGURE 9. The calibration curves of RDeepSense, GP, and SSP-k.

MINIMIZING LABELED DATA

A general disadvantage of deep learning methods lies in the need for large amounts of labeled data. To learn well from empirical measurements, the neural network must be given a sufficient number of labeled examples from which network parameters are to be estimated. Since the number of parameters is large, so is the required number of labeled examples. This need for labeling poses a significant practical impediment to the use of deep learning in IoT contexts, where labeling cannot be easily done.

Recently, generative adversarial networks (GANs) have been proposed as a promising deep learning technique for unsupervised and semisupervised learning.12 The GAN training strategy is to define a game between two competing networks. The generator network maps a source of noise to the input space. The discriminator network receives either a generated sample or a true data sample and must distinguish between the two. The generator is trained to fool the discriminator. Here, we define the input probabilistic space as the joint probabilistic distribution of input sensory data and classification label.
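One common way to wire a semisupervised GAN discriminator (a generic sketch; the article does not spell out the exact SenseGAN formulation, so the recipe below is an assumption) is to give it K real classes plus one extra "fake" class. Labeled samples supervise their true class, unlabeled samples only need to land somewhere among the K real classes, and generated samples should land in the fake class:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def disc_loss(logits, kind, label=None, num_classes=3):
    """Cross-entropy targets for the three kinds of discriminator input
    in a (K+1)-class semisupervised GAN."""
    p = softmax(logits)
    if kind == "labeled":
        return -math.log(p[label])          # supervise the true class
    real_mass = sum(p[:num_classes])        # probability of "any real class"
    if kind == "unlabeled":
        return -math.log(real_mass)         # real, but class unknown
    return -math.log(1 - real_mass)         # generated sample: fake class

logits = [2.0, 0.5, 0.1, -1.0]              # K = 3 real classes + 1 fake
print(round(disc_loss(logits, "labeled", label=0), 3))
print(round(disc_loss(logits, "unlabeled"), 3))
print(round(disc_loss(logits, "generated"), 3))
```

The unlabeled term is what lets the 90 percent of unlabeled samples shape the classifier's decision boundaries, which is the mechanism behind the gains reported for SenseGAN below.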


The GAN training strategy leverages the unlabeled data to increase the capacity of the generator and discriminator networks, which explicitly improves the discriminating ability of the classifier in return.

Evaluation shows that the resulting semisupervised strategy, called SenseGAN, greatly reduces the requirements for labeled data. We continue to use HHAR with the DeepSense framework1 as the running application example, where we take p% of the overall dataset as labeled data.

TABLE 2. Semisupervised training of HHAR with the DeepSense framework.

p%        | 10%   | 5%    | 3%    | 2%    | 1%
SenseGAN  | 94.8% | 92.5% | 91.4% | 90.4% | 88.3%
DeepSense | 92.0% | 89.3% | 85.3% | 83.6% | 79.1%

As shown in Table 2, the semisupervised training can preserve the classification accuracy with only 10 percent of labeled data by leveraging the remaining 90 percent of unlabeled data. However, extensive studies are still needed to explore the possibility of training with fewer labeled as well as unlabeled data in IoT contexts.

We introduced challenges and emerging solutions that suggest the feasibility of building effective, efficient, and reliable IoT systems enriched with deep learning techniques. More studies are needed to further verify the applicability of the results. Can one build a unified deep learning framework for largely heterogeneous sensory inputs, such as audio signals, Wi-Fi signals, and motion inputs? What is the impact of neural network compression on system performance, such as execution time and energy consumption? Can one extend uncertainty measurements to other deep learning models besides MLPs? How does one learn in highly dynamic environments where it is impossible to collect a large number of data samples? More investigation is needed to address these questions.

ACKNOWLEDGMENTS

Research reported in this article was sponsored in part by NSF under grants CNS 16-18627 and CNS 13-20209 and in part by the Army Research Laboratory under Cooperative Agreements W911NF-09-2-0053 and W911NF-17-2-0196. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, NSF, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES

1. S. Yao et al., "DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing," Proc. 26th Int'l Conf. World Wide Web (WWW 17), 2017, pp. 351–360.
2. S. Yao et al., "DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework," Proc. 15th ACM Conf. Embedded Network Sensor Systems (SenSys 17), 2017; https://arxiv.org/abs/1706.01215.
3. S. Yao et al., "RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations," Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, 2018, p. 173.
4. A. Stisen et al., "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition," Proc. 13th ACM Conf. Embedded Network Sensor Systems (SenSys 15), 2015, pp. 127–140.
5. V. Radu et al., "Towards Multimodal Deep Learning for Activity Recognition on Mobile Devices," Proc. ACM Int'l Joint Conf. Pervasive and Ubiquitous Computing: Adjunct (UbiComp 16), 2016, pp. 185–188.
6. H.M. Thang et al., "Gait Identification Using Accelerometer on Mobile Phone," Proc. Int'l Conf. Control, Automation and Information Sciences (ICCAIS 12), 2012; https://doi.org/10.1109/ICCAIS.2012.6466615.
7. M. Gadaleta and M. Rossi, "IDNet: Smartphone-Based Gait Recognition with Convolutional Neural Networks," 2016; https://arxiv.org/abs/1606.03238.
8. Y. Guo, A. Yao, and Y. Chen, "Dynamic Network Surgery for Efficient DNNs," Proc. 30th Int'l Conf. Neural Information Processing Systems (NIPS 16), 2016, pp. 1387–1395.
9. S. Bhattacharya and N.D. Lane, "Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables," Proc. 14th ACM Conf. Embedded Network Sensor Systems (SenSys 16), 2016, pp. 176–189.
10. Y. Gal and Z. Ghahramani, "Dropout

ABOUT THE AUTHORS

SHUOCHAO YAO is a PhD student in the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC). His research interests include deep learning on the Internet of Things (IoT), cyber-physical systems, and crowd and social sensing. Yao received a BS in information engineering from Shanghai Jiao Tong University. Contact him at [email protected].

YIRAN ZHAO is a PhD student in the Department of Computer Science at UIUC. His research interests include cyber-physical systems and IoT applications. Zhao received a BS in information engineering from Shanghai Jiao Tong University. Contact him at [email protected].

ASTON ZHANG is an applied scientist at Amazon AI. His research focus is on deep learning. Zhang received a PhD in computer science from UIUC. He previously interned with Yahoo Labs and UBS, among others, and has served on program committees for WWW, KDD, SIGIR, and WSDM. He is a coauthor and coinstructor of the deep learning tutorial with Apache MXNet/Gluon. Contact him at [email protected].

SHAOHAN HU is a research staff member at the IBM Thomas J. Watson Research Center. His research interests include cyber-physical systems, crowd and social sensing, and quantum computing. Hu received a PhD in computer science from UIUC. Contact him at shaohan.hu@.com.

HUAJIE SHAO is a PhD student in the Department of Computer Science at UIUC. His research interests include data analysis in social networks, applied machine learning, sensor networks, and distributed data centers. Shao received an MS from Zhejiang University. Contact him at [email protected].

CHAO ZHANG is a PhD student in the Department of Computer Science at UIUC. His research interests include social media analysis, spatiotemporal data mining, text mining, graph mining, and urban computing. Zhang received an MS from Zhejiang University. Contact him at [email protected].

LU SU is an assistant professor in the Department of Computer Science and Engineering at the State University of New York, Buffalo. He has also worked at the IBM T. J. Watson Research Center and the National Center for Supercomputing Applications. Su received a PhD in computer science from UIUC. His research interests include the general areas of mobile and crowd sensing systems, the Internet of Things, and cyber-physical systems. He is the recipient of an NSF CAREER Award, the University at Buffalo Young Investigator Award, the ICCPS 17 Best Paper Award, and the ICDCS 17 Best Student Paper Award. He is a member of ACM and IEEE. Contact him at [email protected].

TAREK ABDELZAHER is a professor and Willett Faculty Scholar in the Department of Computer Science at UIUC. His research interests include understanding and influencing the performance and temporal properties of networked embedded, social, and software systems in the face of increasing complexity, distribution, and degree of interaction with an external physical environment. Abdelzaher received a PhD from the University of Michigan for work on adaptation in real-time systems. He has authored or coauthored more than 200 refereed publications in real-time computing, distributed systems, sensor networks, and control. He is a member of IEEE and ACM. Contact him at [email protected].

as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," Proc. 33rd Int'l Conf. Machine Learning (ICML 16), 2016, pp. 1050–1059.
11. B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles," 2016; https://arxiv.org/abs/1612.01474.
12. I. Goodfellow et al., "Generative Adversarial Nets," 2014; https://arxiv.org/abs/1406.2661.
