Arxiv:1901.04985V1 [Cs.DC] 12 Jan 2019 Computing Power of Embedded Or Edge Devices, Neural Net- Tion, Saving Computing Resources
Total Page:16
File Type:pdf, Size:1020Kb
NNStreamer: Stream Processing Paradigm for Neural Networks, Toward Efficient Development and Execution of On-Device AI Applications MyungJoo Ham1 Ji Joong Moon1 Geunsik Lim1 Wook Song1 Jaeyun Jung1 Hyoungjoo Ahn1 Sangjung Woo1 Youngchul Cho1 Jinhyuck Park2 Sewon Oh1 Hong-Seok Kim1,3 1Samsung Research, Samsung Electronics 2Biotech Academy, Samsung BioLogics 3Left the affiliation 1,2{myungjoo.ham, jijoong.moon, geunsik.lim, wook16.song, jy1210.jung, hello.ahn, sangjung.woo, rams.cho, jinhyuck83.park, sewon.oh, hongse.kim}@samsung.com Abstract 3. Reduce operating cost by off-loading computation to de- We propose nnstreamer, a software system that handles vices from servers. This cost is often neglected; it is, how- neural networks as filters of stream pipelines, applying the ever, significant if billions of devices are to be deployed. stream processing paradigm to neural network applications. We do not discuss the potential advantage, distributing work- A new trend with the wide-spread of deep neural network loads across edge devices, because it is not in the scope of applications is on-device AI; i.e., processing neural networks this paper, but of future work. directly on mobile devices or edge/IoT devices instead of On-device AI achieves such advantages by processing di- cloud servers. Emerging privacy issues, data transmission rectly in the nodes where data exist so that data transmis- costs, and operational costs signifies the need for on-device sion is reduced and sensitive data are kept inside. However, AI especially when a huge number of devices with real-time on-device AI induces significant challenges. Normally, edge data processing are deployed. Nnstreamer efficiently handles devices including mobile phones, TVs, home appliances, and neural networks with complex data stream pipelines on de- IoT devices have limited computing power compared to work- vices, improving the overall performance significantly with stations of conventional AI systems. In the meanwhile, on- minimal efforts. Besides, nnstreamer simplifies the neural net- device AI applications often require short response time or work pipeline implementations and allows reusing off-shelf high processing rates; e.g., AR Emoji [43], Animoji [3], and multimedia stream filters directly; thus it reduces the devel- offline speech recognition and translations. Besides, on-device opmental costs significantly. Nnstreamer is already being AI applications often use sensors incurring data stream with deployed with a product releasing soon and is open source high bandwidth: video cameras. software applicable to a wide range of hardware architectures On-device AI applications are becoming more complex, and software platforms. incurring even more challenges. A neural network may pro- cess multiple input streams and multiple neural networks may 1 Introduction process streams simultaneously. Besides, networks may share input data, and an input of a network may be outputs of other We have witnessed the wide spread of deep neural networks networks. For example, inputs of a facial recognition and an and their applications in the last decade. With ever growing emotion recognition may share outputs of an object recogni- arXiv:1901.04985v1 [cs.DC] 12 Jan 2019 computing power of embedded or edge devices, neural net- tion, saving computing resources. Composing systems with works are being adopted to such devices, further assisted by multiple networks allows to train smaller networks (more ef- AI accelerators [2,10, 24, 25, 27,38, 44, 48]. In mobile phone ficient) and to save training costs. However, stream pipelines industry, this trend has already become obvious enough to for such systems may become highly complicated with inter- be adopted by major manufacturers including Samsung and connections of networks and sensors along with fluctuating Apple [2, 44]. Running intelligence mechanisms such as deep latency and synchronization issues. Another challenge is that neural networks directly in edge devices including consumer the topology of pipelines may be dynamic. electronics is often called as on-device AI [37]. We have observed another issue in practice; developers may On-device AI is becoming more attractive to manufacturers use their own neural network frameworks such as TensorFlow because of the following three advantages: and Caffe or their own tweaked version of such frameworks. 1. Avoid data privacy and protection issues by keeping the With stable releases of applications, we may enforce develop- data in user devices without uploading to cloud servers. ers to share a framework or to provide independent binaries 2. Reduce data transmission latency of high-bandwidth real- without the need for any neural network frameworks. How- time stream data including live video feeds from cameras. ever, for researching and prototyping, developers want to ex- periment with various frameworks while testers still need to and LG use GStreamer as the multimedia engine of televi- execute the whole system integrated. Thus, we need to com- sions. Centricular [14,15,23,49,50] releases GStreamer for pose such complicated interactions between neural networks TVs, set-top boxes, medical devices, in-vehicle infotainment, and sensors along with multiple neural network frameworks. video-on-demand, and on-demand streaming solutions. We apply the stream processing paradigm [46] to on-device Another popular framework for Linux, FFmpeg [5] is not AI systems. Our approach is to handle a neural network as modular, but everything is built-in and has limited cross- a stream filter and then, to construct the system as a stream platform support. Note that GStreamer can embed FFmpeg pipeline. We propose data standards for tensors and containers as yet another plugin. of tensors, which allow varying neural network frameworks in StageFright [17] is the multimedia stream framework of a pipeline along with conventional stream filters, conventional Android. StageFright depends on Android services and, unlike media types (video, audio, and text), and tensors. In order GStreamer, is not flexible enough for other generic Linux to construct complicated pipelines, we also propose various distributions. Besides, it does not allow to construct arbitrary stream path manipulators with synchronization policies and stream pipelines. tensor operators. AVFoundation [4] is the multimedia stream framework of Handling stream pipelines efficiently along with various iOS and macOS. Along with Core ML, a machine learning synchronization methods, filters, and data types has been a framework, we may have AVFoundation as an input to Core main topic of multimedia frameworks for decades. Among ML; however, it is not supposed to write arbitrary pipelines multimedia frameworks, we choose GStreamer [19] as the ba- with Core ML entities as stream filters. sis because GStreamer offers highly modular architecture, is DirectShow [9] is the multimedia framework of Windows. fully open source software, and is compatible with various op- We cannot alter and use DirectShow or AVFoundation for the erating systems and hardware architectures. We can also reuse given purpose because it is proprietary software and supports what GStreamer offers; hundreds of media processing plugins proprietary platforms only. and developmental infrastructure. GStreamer has been widely deployed to desktop computing, video conferencing, broad- casting, and consumer electronics, even including televisions 2.2 Stream Processing in General of Samsung and LG, where the multimedia performance is extremely critical. StreamCloud [21] has addressed the bottleneck of a single An experimental product deployed to hospitals and house- input node in a stream pipeline. StreamCloud provides a holds, which will be commercially released soon, is based scalable and elastic stream framework to process large data on nnstreamer to process recognize events from multiple input and execute pipeline elements in parallel transparently sensors. Product developers have significantly improved the by splitting queries to allocate multiple nodes. performance and simplified the code, which have enabled There are studies to help manage large data streams effi- to add more sensors and neural networks with inexpensive ciently; however, they do not provide a general stream pro- hardware and minimal developmental costs. Nnstreamer is cessing framework for neural networks and complex topology. open source software licensed with LGPL [16]. We plan to Kumar [32] has proposed a network data stream analysis tool use nnstreamer for more products in the affiliation including with higher accuracy and lower overhead than prior work. In robots and conventional consumer electronics. order to reduce power consumption of edge devices receiving multimedia data streams, a client-side statistical prediction 2 Related Work scheme [51] estimates the data patterns of incoming data streams and tries to make processors sleep longer. A fragmen- 2.1 Multimedia Stream Processing in Practice tal proxy-caching scheme [47] tries to improve the quality of multimedia streams by adjusting caching units with finer Multimedia stream frameworks have recently evolved to pro- granularity. Apache Flink [7] proposes a platform with a uni- cess high quality audio and video with mobile phones and versal dataflow engine designed to perform both stream and TVs whose computing resources are limited. batch analytics for processing streaming and batch data. GStreamer [19] is the standard multimedia pipeline frame- Applying streaming framework for GPUs fits well with work for Tizen