
CALIFORNIA STATE UNIVERSITY SAN MARCOS

THESIS SIGNATURE PAGE

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE

MASTER OF SCIENCE

IN

COMPUTER SCIENCE

THESIS TITLE: Edge Computing: An Orchestration of Devices

AUTHOR: Emmanuel Ayuyao Castillo

DATE OF SUCCESSFUL DEFENSE:

THE THESIS HAS BEEN ACCEPTED BY THE THESIS COMMITTEE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE.

Ali Ahmadinia, THESIS COMMITTEE CHAIR, DATE

Xin Ye, THESIS COMMITTEE MEMBER, DATE

THESIS COMMITTEE MEMBER, SIGNATURE, DATE

Edge Computing: An Orchestration of Devices

Thesis by Emmanuel Ayuyao Castillo

In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

CALIFORNIA STATE UNIVERSITY, SAN MARCOS
San Marcos, California

2018

Defended December 2018

© 2018 Emmanuel Ayuyao Castillo
ORCID: 004268444
All rights reserved except where otherwise noted

ACKNOWLEDGEMENTS

I would like to thank California State University, San Marcos for accepting me into their Computer Science Master's program and providing me a rewarding academic experience I will never forget. The professors I worked with were very supportive, and I have learned a lot in the Computer Science program. I would also like to thank my parents and close friends who helped me get through this journey. It was a difficult time for me working and going to school full time with long commutes. There were many cases when I had to sacrifice time and events with my friends and family to take care of my academics. In the end they are still by my side. Lastly, I have the greatest appreciation for my graduate advisor Dr. Ali Ahmadinia. It was my goal to be an Engineer. He introduced me not only to embedded systems but also to Artificial Intelligence and Computer Vision. I gained skills and knowledge not only in embedded systems but also in a range of other interesting domains. Overall, without Dr. Ali Ahmadinia I would not have become the knowledgeable Software Engineer I am today.

ABSTRACT

The advancement of Artificial Intelligence (AI) introduces challenges in integrating its technology into resource-limited devices. Many AI technologies require substantial resources to run effectively. Object detection is an example that uses complex neural network models requiring both significant memory and heavy computation to run efficiently. With the emergence of the Internet of Things (IoT), this is a problem for IoT devices that are battery powered and resource limited. Fortunately, edge computing has shown promising results in improving the performance of resource-heavy applications. This paper introduces an edge computing architecture, the Edge Orchestration Architecture, that works with the Cloud. The Edge Orchestration Architecture orchestrates a set of devices in determining where computation should occur to meet ideal system performance. This architecture is implemented in an object detection system and reinforces the potential that edge computing has for the current AI generation.

TABLE OF CONTENTS

Acknowledgements ...... iii
Abstract ...... iv
Table of Contents ...... v
List of Illustrations ...... vi
Chapter I: Introduction ...... 1
Chapter II: Related Work ...... 3
Chapter III: Architecture ...... 5
  3.1 Node ...... 6
  3.2 Cloud ...... 6
  3.3 Edge ...... 7
Chapter IV: Object Detection ...... 12
  4.1 You Only Look Once ...... 13
  4.2 Multi-view Object Detection with YOLOV2 ...... 18
  4.3 Edge Computing Object Detection System ...... 19
Chapter V: Experiment ...... 23
Chapter VI: Results ...... 26
  6.1 Processing Time ...... 26
  6.2 Data Transfer Size ...... 29
  6.3 Battery Life ...... 29
Chapter VII: Future Work ...... 31
Chapter VIII: Conclusion ...... 32
Bibliography ...... 33

LIST OF ILLUSTRATIONS

Number    Page
3.1 Edge Orchestration Architecture ...... 5
3.2 Node Component ...... 6
3.3 Cloud Component ...... 7
3.4 Initial Edge/Node Set-up ...... 8
3.5 Edge offloading Node work to Cloud ...... 9
3.6 Node directly offloading work to Cloud ...... 9
3.7 Edge configures Node to offload work back to the Edge ...... 9
3.8 Edge re-configuring Node to do its own computation ...... 10
4.1 Object Detection Model Comparison [31] ...... 12
4.2 Feature Extraction in Convolutional Layer [6] ...... 13
4.3 Leaky Rectified Linear Unit [36] ...... 14
4.4 Batch Normalization [22] ...... 14
4.5 Max Pooling Layer [6] ...... 15
4.6 You Only Look Once [31] ...... 15
4.7 YOLOV2's Prediction Output [21] ...... 16
4.8 YOLOV2's Confidence Loss: Object Existence [33] ...... 16
4.9 YOLOV2's Confidence Loss: No Object Existence [33] ...... 16
4.10 YOLOV2's Classification Loss [33] ...... 17
4.11 YOLOV2's Localization Loss [33] ...... 17
4.12 Anchor Box [31] ...... 18
4.13 Point Correspondence using Homography [41] ...... 19
4.14 Edge Computing Object Detection System ...... 19
4.15 YOLOV2 Architecture [31] ...... 20
4.16 YOLOV2 Tiny Architecture [14] ...... 20
4.17 Object Detection at the Edge ...... 21
4.18 Multi-view Object Detection Design ...... 22
5.1 Edge Computing Object Detection Component Specifications ...... 23
5.2 Experiment with Raspberry Pis ...... 23
6.1 1 Camera Average Processing Time ...... 26
6.2 1 Camera Average Data Transfer Size ...... 26
6.3 1 Camera Average Battery Life ...... 27
6.4 2 Camera Average Processing Time ...... 27
6.5 2 Camera Average Data Transfer Size ...... 27
6.6 2 Camera Average Battery Life ...... 28
6.7 3 Camera Average Processing Time ...... 28
6.8 3 Camera Average Data Transfer Size ...... 28
6.9 3 Camera Average Battery Life ...... 29

CHAPTER 1

INTRODUCTION

Amazon Alexa, Google Home, Smart Refrigerators, and several other interesting products are being produced today [12, 17, 19, 30]. Internet of Things (IoT) is the generic term for these products, which are expected to increase in count to approximately 50 billion by 2020 [4, 24, 37, 38, 42]. This growth is thanks to advances in networking and artificial intelligence (AI). Although a milestone for technological advancement, it also introduces technological challenges. As more devices connect to a network, the network's throughput will be limited due to the congestion from incoming and outgoing data from these devices. On the other hand, current advances in AI require significant computing and storage resources, resulting in limitations in meeting real-time deadlines [37, 38, 42]. Thus, IoT devices and anything else connected to a network will incur overhead from waiting on the network or running complex AI algorithms [37, 38, 42]. On top of that, these overheads will require extra power consumption on any battery-powered device [37, 38, 42].

This is a major problem for applications that rely on advances in object detection. Object detection's current advancements use an AI-based technology, Deep Learning (DL), to provide highly accurate detection of objects in images [9, 16, 20, 26, 27, 31–33, 35]. State-of-the-art object detection uses complex DL models, mainly based on convolutional neural networks (CNN), that require either time or expensive hardware to run [9, 16, 20, 26, 27, 31–33, 35]. These complex models need to process images, which are naturally large data. If the processing is done elsewhere, it will result in a significant decrease in a network's throughput. The processing that occurs from either running the object detection model or waiting for an object detection result adds to power consumption. Hence, object detection requires significant resources, and integrating it into an IoT product presents challenges in response time, data transfer size and power consumption.

Edge computing is a recent research topic that is being considered as a possible solution for resource-heavy applications [4, 13, 24, 37, 38, 40, 42]. Edge computing's idea is that a device or devices with powerful hardware can exist in the local area of resource-limited devices. By having a powerful device near other less powerful devices, it can be used to reduce the data transfer overhead that affects both processing time and power consumption. Additionally, a device with powerful hardware can reduce the computational load on resource-limited devices, which boosts system performance. Therefore, edge computing can be a suitable solution to the resource requirements of object detection systems by reducing both overhead and load on resource-limited devices.

CHAPTER 2

RELATED WORK

Object detection is a topic within the image recognition domain. Generally, the challenges in image recognition applications are their heavy resource requirements, both in running their algorithms and in their dependence on large data [37, 38, 42]. Edge computing and work related to edge computing have shown promising capabilities in providing solutions to the challenges in image recognition applications. Primarily, these works focus on reducing processing time.

DIANNE is an architecture used to improve processing time by distributing a neural network among multiple devices. In this architecture, neural network computation occurs among multiple devices to produce a single result [10]. In comparison to a neural network on a single device, it resulted in improved processing speed when running an image recognition algorithm. Related to this work, there is the Big-Little Approach architecture. This architecture distributes a neural network's image classification categories across two locations [11]. One location is an embedded device; the other is a Cloud service. The categories located at the embedded device are those with a higher expected detection count. Similar to the previous work, improved processing time is achieved with this architecture. One work considers the hardware limitations on unmanned aerial vehicles (UAV). It considers both edge computing and cloud computing. By doing object tracking on the edge, a UAV is able to achieve ideal object tracking speed in comparison to tracking done on the UAV or a cloud service [13]. Another work uses edge computing for video analytics and defines an edge computing platform, Latency-aware Video Analytics on Edge Computing Platform (LAVEA) [40]. It splits edge computing into two parts. One part offloads tasks directly to an edge device. The second part uses several edge devices and splits tasks among them. This work focuses on having all tasks go to the edge first and having the edge device determine how the task should be distributed. Generally, this work shows that processing tasks at edge devices results in improved video analytics time. A similar idea goes into a work that splits a CNN network between devices [6]. This work reveals that running image recognition at the edge yields better system performance than running it at a cloud service.

A common theme in all these works is distributing image recognition tasks among different devices or services (cloud). All of these works focus on processing time and shy away from other system performance criteria: data transfer size and battery life. Therefore, a system that considers a combination of processing time, data transfer size and battery life is a topic to be investigated.

CHAPTER 3

ARCHITECTURE

Complexity arises in the management of a system's performance. This is due to the consideration that must go into both device capabilities and system performance requirements. Device capabilities are based on hardware specifications that inform the type of computation a device can perform. This carries over to system performance requirements that determine how the devices and other components of a system should operate. Generally, a system can consist of embedded devices, a locally powerful device and a cloud service. The locally powerful device is the basis of the proposed architecture and is considered the Edge component of a system. The Edge can orchestrate how the full system performs, driving efficiency in the full system's operation. This edge computing architecture, known as the Edge Orchestration Architecture, focuses on the efficiency of the following system performance criteria: processing time, data transfer size and battery life.

Figure 3.1: Edge Orchestration Architecture

As shown in Figure 3.1, the Edge Orchestration Architecture categorizes system components into three parts: Nodes, which are embedded devices; the Edge, which is the locally more powerful device; and lastly the Cloud, which is a cloud service with virtually unlimited resources. The Edge Orchestration Architecture shares ideas with existing work, such as the DIANNE architecture and the LAVEA edge computing platform [10, 40]. In essence, they revealed that it is ideal for computing to be done at the edge. Although their work displayed promising results, a system's behavior can change as more load enters an edge device or as network congestion slows down data transfer rates, impacting the overall computation time of a device. Hence, the Edge Orchestration Architecture monitors the system's behavior to coordinate where the Nodes should process their data: on the Node itself, at the Edge, or in the Cloud. The architecture's objective is to coordinate the Nodes to meet ideal system performance in processing time, data transfer size and battery life.

3.1 Node

Embedded devices are the basis of the Node. Thus, the Node has limited hardware resources and is usually battery powered. Its computational ability and operating life will be limited in comparison to the Edge and Cloud. Additionally, it is where data originates from. Therefore, its roles are limited.

Figure 3.2: Node Component

As shown in Figure 3.2, the Node decides whether it performs full or pre-processing and whether it sends its data to the Edge or Cloud based on its configuration. The configuration changes through messages sent by the Edge. The Node's configuration-based operation allows the Edge to optimize the full system's performance by adjusting the Node's operation to meet system performance requirements. The Node also provides the Edge access to system information, such as battery life, for the Edge's monitoring purposes.
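To make this configuration-based operation concrete, the sketch below shows how a Node might route a captured frame. It is a minimal Python illustration only; the endpoint URLs, configuration field names and local-detector hook are assumptions and are not taken from the thesis implementation.

```python
# Minimal sketch of the Node's configuration-driven routing (illustrative only;
# endpoint URLs and field names are assumed, not taken from the thesis code).
import requests

EDGE_URL = "http://edge.local:8080/detect"      # assumed Edge endpoint
CLOUD_URL = "http://cloud.example.com/detect"   # assumed Cloud endpoint

config = {"compute_location": "edge"}           # updated by messages from the Edge

def apply_config(message):
    """The Edge pushes a new configuration; the Node simply adopts it."""
    config.update(message)

def handle_frame(image_bytes, local_detector=None):
    """Route one captured frame according to the current configuration."""
    if config["compute_location"] == "node" and local_detector is not None:
        return local_detector(image_bytes)       # run YOLOV2 Tiny on the Node itself
    url = EDGE_URL if config["compute_location"] == "edge" else CLOUD_URL
    resp = requests.post(url, data=image_bytes, timeout=10)
    return resp.json()                           # only prediction results come back
```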

3.2 Cloud

Cloud computing has emerged as an ideal technology for resource-heavy applications [5]. Although seen as the virtually unlimited resource component of a system, its distance from the data source becomes a constraint on total processing time. Data transfer time is the main overhead in Cloud computing and produces limitations in real-time applications. Thus, the Cloud's role is largely based on its unlimited resource characteristic and is limited in applications with real-time deadlines.

Figure 3.3: Cloud Component

The Cloud will be the support for the Edge. As shown in Figure 3.3, the Edge can delegate a Node to offload its work to the Cloud, or the Edge itself can offload its work to the Cloud. The Edge does this when it determines that it will be optimal for system performance. Additionally, the Cloud will keep track of how long it takes to do its computation. This is information that the Edge can use to monitor the system's behavior. Lastly, the Cloud will store data that comes from the Node or Edge. Data will come from the Node when the Node is offloading work to the Cloud; the same goes for the Edge. Due to the Cloud's virtually unlimited abilities, it will be tasked with heavy computational work such as training a machine learning model, and it can also update software for the Edge and Node.

3.3 Edge

The Edge is the key to optimizing a system's performance. Its close distance to the Nodes substantially reduces the data transfer overhead when Nodes offload their work to other devices. Additionally, its ideal hardware specifications can reduce computation time on highly complex algorithms that may not be suitable for a Node to run. Hence, the Edge will leverage those characteristics to provide optimal processing time, limit data transfer size and improve battery life on the Nodes of the system. Generally, the Edge will do this by registering the Nodes connected to its system, running computation for the Nodes, monitoring system performance and reconfiguring the Nodes to improve operational efficiency. The Edge will keep track of where the Nodes are offloading their tasks through its Location Registers. Usually these registers would be an Edge Register, a Cloud Register and a Node Register.
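The three registers can be thought of as simple sets of Node identifiers. A minimal sketch is shown below; the register names follow the text, but the concrete data structure and method names are assumptions.

```python
# Illustrative sketch of the Edge's Location Registers; names follow the text,
# the concrete data structure is an assumption.
from dataclasses import dataclass, field

@dataclass
class LocationRegisters:
    edge: set = field(default_factory=set)    # Nodes computing at the Edge
    cloud: set = field(default_factory=set)   # Nodes offloading to the Cloud
    node: set = field(default_factory=set)    # Nodes computing locally

    def move(self, node_id, destination):
        """Re-register a Node when the Edge changes where it should compute."""
        for register in (self.edge, self.cloud, self.node):
            register.discard(node_id)
        getattr(self, destination).add(node_id)

registers = LocationRegisters()
registers.move("node-1", "edge")    # initial set-up: the Node computes at the Edge
```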

Figure 3.4: Initial Edge/Node Set-up

Initially, the Edge will register a Node for which it is running computation. As shown in Figure 3.4, this is done by defining the Node in the Edge Register. Afterwards, the Edge will run the Node's computation while monitoring the system's behavior in the background. It will monitor both the computation time spent within itself and network congestion.

The Edge will ensure that the overall processing time for the Nodes is below a maximum threshold. If it is not for a defined amount of time, the Edge will take one or more pieces of work that came from the Nodes and send them to the Cloud. This will reduce some of the computational load on the Edge. This is displayed in Figure 3.5. Although this may be optimal for some time, as more Nodes connect to the Edge, the Edge's performance will start to degenerate. In this situation, the Edge will request each Node's battery level. From there, a greedy algorithm approach takes place where the Node with the least battery level will be configured by the Edge to offload its processing to the Cloud.

Figure 3.5: Edge offloading Node work to Cloud

Figure 3.6: Node directly offloading work to Cloud

Figure 3.7: Edge configures Node to offload work back to the Edge

The idea behind configuring the Node with the least battery level is that waiting on the network requires less power consumption than offloading data at a higher rate. This Node will be registered in the Edge's Cloud Register. This is displayed in Figure 3.6.
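A minimal sketch of this greedy step is given below. It assumes the Edge can query each registered Node for its battery level and push a new configuration; the helper names are illustrative only.

```python
# Sketch of the greedy reconfiguration step: the Node with the least battery is
# moved from the Edge Register to the Cloud Register (helper names are assumed).
def offload_lowest_battery_node(edge_register, cloud_register,
                                get_battery_level, send_config):
    """Pick the Node with the least battery and configure it to use the Cloud."""
    if not edge_register:
        return None
    victim = min(edge_register, key=get_battery_level)   # least battery first
    send_config(victim, {"compute_location": "cloud"})   # Node now offloads to Cloud
    edge_register.discard(victim)
    cloud_register.add(victim)
    return victim
```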

Over time, the Edge's load may be reduced, for example if a Node stops communicating with the Edge. When this is the case, the Edge will look into its Cloud Register and tell a Node registered there to move its offloaded work from the Cloud back to the Edge. This will reduce that Node's data transfer time, improving overall computation time within the system. This is displayed in Figure 3.7.

Figure 3.8: Edge re-configuring Node to do its own computation

Ping is a tool that a system can use to determine how congested a network is. This is the mechanism the Edge will use to monitor network throughput, and it is done at a defined rate. As the network gets congested, data transfer time will increase, which affects the overall processing time of each individual Node. If a number of pings to the Cloud produces an average above the maximum threshold, the Edge will request each Node's processing time. If the overall processing time among all the Nodes is above a maximum threshold, the Edge will configure the Node with the highest processing time to run its computation locally. This will reduce data transfer size. Additionally, the Edge will define the Node in its Node Register. This is shown in Figure 3.8. As the Edge continues monitoring network congestion and notices that congestion has dropped to a minimum threshold, it will reconfigure the Node that is doing its own computation back to the Edge. This Node will be defined back in the Edge Register.

Another advantage of the Edge is that it has access to data from all Nodes connected to it. For this reason, the Edge can be used to correlate data between Nodes to improve system performance. The Edge can also configure a Node to do initial processing, reducing processing time or data size prior to the Node offloading its task to another location for final processing.
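The congestion-driven part of this monitoring loop could look roughly like the following sketch. The thresholds, the ping helper and the integration with the register sketch shown earlier are assumptions for illustration, not the thesis implementation.

```python
# Rough sketch of one monitoring pass on the Edge (thresholds and helpers assumed).
import statistics

MAX_RTT_MS = 100.0          # assumed maximum-congestion threshold
MIN_RTT_MS = 30.0           # assumed recovery threshold
MAX_PROCESSING_S = 1.0      # assumed per-Node processing-time threshold

def monitor_once(ping_cloud_ms, nodes, processing_time, send_config, registers):
    """Check congestion, then reconfigure at most one Node as described above."""
    avg_rtt = statistics.mean(ping_cloud_ms() for _ in range(5))
    if avg_rtt > MAX_RTT_MS:
        slow = [n for n in nodes if processing_time(n) > MAX_PROCESSING_S]
        if slow:
            worst = max(slow, key=processing_time)
            send_config(worst, {"compute_location": "node"})   # compute locally
            registers.move(worst, "node")
    elif avg_rtt < MIN_RTT_MS and registers.node:
        back = registers.node.pop()                            # congestion recovered
        send_config(back, {"compute_location": "edge"})        # back to the Edge
        registers.edge.add(back)
```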

As data comes into the Edge, the Edge will store the data from the Nodes. After a specified time, the Edge will transfer the data to the Cloud for storage.

Above all, the Edge is the component of the system that orchestrates the other system components to improve overall performance. Fundamentally, this is done through its monitoring mechanism. System behaviors within the system components are indicators for the Edge in orchestrating where each Node should do its computation. This determination is based on defined system performance requirements. This is the key feature of the Edge Orchestration Architecture.

CHAPTER 4

OBJECT DETECTION

Object detection's advancements reveal that the AI technology DL provides ideal results in detection accuracy. The accuracy level correlates with the size of the DL model. State-of-the-art DL models are large models that require substantial resources, while smaller models are not as accurate but require fewer resources. In effect, object detection applications face a trade-off between running a less accurate DL model and using a system with capable hardware. Currently, applications resort to either using a Graphics Processing Unit (GPU) or running the DL model on the Cloud. Those are suitable solutions, but limited consideration goes into a network's dynamic behavior, which affects overall system performance. The Edge Orchestration Architecture considers the overall system performance and is well suited for object detection applications.

Figure 4.1: Object Detection Model Comparison [31]

An object detection system can work with the Edge Orchestration Architecture to achieve ideal object detection performance. This system will use existing DL models distributed among the components of the system. The distribution is based on each component's ability to run the object detection algorithm. For an optimally performing object detection system, a real-time DL model is ideal. There are existing real-time DL models that can be leveraged, such as You Only Look Once (YOLO), Faster Region-based Convolutional Neural Network (Faster R-CNN), MobileNet, Single Shot Multibox Detector (SSD) and several others [20, 25, 26, 31–35]. These models were created to handle the issues with running DL models in resource-limited object detection applications. YOLO Version 2 (YOLOV2) will be used as an application of the Edge Orchestration Architecture and is considered an edge computing object detection system [31]. As shown in Figure 4.1, YOLOV2 is used because of its real-time object detection ability compared to other models. YOLOV2's main architecture is based on the CNN, a general neural network type that has shown promising results in many DL-based image recognition applications [2, 18, 20, 23, 25–27, 31–35].

4.1 You Only Look Once

YOLOV2's prime achievement for real-time object detection comes from forgoing the standard neural network model, the Multilayer Perceptron, in its architecture. Its architecture is fully convolutional, which dramatically reduces the computation time needed to run its algorithm [31]. The main components of the YOLOV2 model are the Convolutional layer, Batch Normalization and the Max Pooling layer.

Convolutional Layer

Figure 4.2: Feature Extraction in Convolutional Layer [6]

The Convolutional layer is used to extract features from its input. It does this through a filter of a defined size. As shown in Figure 4.2, the filter is slid across the input and matrix multiplication is performed. The number of positions a filter slides is based on a defined stride value. In addition, padding is used with a filter if the filter may slide beyond the size of an image. Padding adds 0 values beyond the size of an image so that a filter has values to compute against if it slides beyond the image. Essentially, a filter is used to extract features from an input and creates a feature map. The number of feature maps produced is based on the number of filters used in the convolutional layer.
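As a small worked example of the filter, stride and padding arithmetic described above, the standard output-size formula for one spatial dimension is sketched below; the 416x416 input size is a value commonly used for YOLOV2 and serves only as an illustration.

```python
# Toy example of the arithmetic behind a convolutional layer's spatial output size.
def conv_output_size(input_size, filter_size, stride, padding):
    """Size of one side of the feature map produced by the convolution."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# A 416x416 input with a 3x3 filter, stride 1 and padding 1 keeps its spatial size:
assert conv_output_size(416, 3, 1, 1) == 416
```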

Figure 4.3: Leaky Rectified Linear Unit [36]

After a feature map is created, an activation function is applied to the feature map to affect the final output of the layer. YOLOV2 uses the Leaky Rectified Linear Unit (Leaky ReLU). Its linear characteristics lessen the computation compared to other activation functions, which makes both prediction and training more efficient. As shown in Figure 4.3, the final output of a convolutional layer with the Leaky ReLU leaves the values of the feature map as is unless a value is less than 0. By giving values less than 0 a small slope, it diminishes a problem known as the "dying ReLU", where negative values start to have a negative effect on a model's learning ability.
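Written out directly, the Leaky ReLU is just a piecewise-linear function; the 0.1 slope used below is the value commonly reported for the YOLO family and is included only for illustration.

```python
# The Leaky ReLU activation as a plain function (0.1 slope assumed for illustration).
def leaky_relu(x, slope=0.1):
    """Pass positive values through unchanged; scale negative values by a small slope."""
    return x if x > 0 else slope * x
```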

Batch Normalization

Figure 4.4: Batch Normalization [22]

Batch normalization is used to help generalize a neural network [22]. It does this by modifying the outputs of the previous layer. In YOLOV2's case, batch normalization is used after the convolutional layer to normalize its output values. As a result, it reduces over-fitting in the model. The transformation of the output is based on the equations in Figure 4.4.
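For reference, the batch normalization transform from [22] that Figure 4.4 depicts can be written as follows, where $x_1, \dots, x_m$ are the values of a mini-batch and $\gamma$, $\beta$ are learned parameters:

```latex
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta
```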

Max Pooling Layer

Figure 4.5: Max Pooling Layer [6]

Finally, the Max Pooling Layer is used to reduce the dimensionality of the input. Similar to a convolutional layer, a filter is slid across the input to produce a final output. As shown in Figure 4.5, the Max Pooling Layer takes the maximum value of each sliding window, producing a reduced-size output. By reducing the dimensionality of the input, it reduces the amount of data flowing through the CNN model, resulting in less computation.

Loss Function

Figure 4.6: You Only Look Once [31]

As depicted in Figure 4.6, YOLOV2's approach to learning object detection is based on transforming an image into a 13x13 grid [31]. This is done through the layers that YOLOV2 consists of. As shown in Figure 4.7, YOLOV2 makes a prediction of a bounding box of an object for each cell in the grid. The prediction consists of the bounding box's center x and y location as well as its width and height.

Figure 4.7: YOLOV2's Prediction Output [21]

Alongside the bounding box prediction, YOLOV2 predicts the confidence that an object exists at the cell and the class probabilities of the object [31]. Lastly, non-maximum suppression is applied to the output to remove duplicate detections of an object with lower confidence scores [31]. The final output is used to create the bounding boxes seen in Figure 4.6.
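A compact sketch of the non-maximum suppression step is shown below. It assumes detections are given as (x, y, w, h, score) tuples with an externally supplied IoU function, and the 0.5 overlap threshold is illustrative.

```python
# Minimal non-maximum suppression sketch (box format, IoU helper and threshold assumed).
def non_max_suppression(boxes, iou, threshold=0.5):
    """Keep each highest-scoring box and drop overlapping, lower-confidence duplicates."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):  # b[4] is the score
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept
```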

YOLOV2 achieves this through its unique loss function. The loss function is comprised of three parts: Confidence Loss, Classification Loss, and Localization Loss.

Confidence Loss

Figure 4.8: YOLOV2's Confidence Loss: Object Existence [33]

Figure 4.9: YOLOV2's Confidence Loss: No Object Existence [33]

Confidence Loss is used so YOLOV2 can learn whether an object exists or does not exist in a cell of the grid. As shown in Figure 4.8 and Figure 4.9, YOLOV2's Confidence Loss consists of the confidence that an object exists and the confidence that an object does not exist.

Classification Loss

Figure 4.10: YOLOV2's Classification Loss [33]

Classification Loss is used so YOLOV2 can learn what type of object is detected. As shown in Figure 4.10, this is based on the probability that an object is of a particular class. Depending on YOLOV2's training set, the number of classes will vary.

Localization Loss

Figure 4.11: YOLOV2’s Localization Loss [33]

Lastly, Localization Loss is used so YOLOV2 can learn the bounding box information of an object. As shown in Figure 4.11, this consists of the x and y center location, width and height of the bounding box.

Finally, Confidence Loss, Classification Loss and Localization Loss are added together to form YOLOV2's loss function.
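Written out as a single expression, the loss from [33] that the figures above decompose is the following, where $S$ is the grid size, $B$ is the number of boxes per cell, and $\mathbb{1}_{ij}^{obj}$ indicates that box $j$ of cell $i$ is responsible for an object:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}
      \left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
  &+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}
      \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
  &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
   + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
  &+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\,\in\,classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```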

Figure 4.12: Anchor Box [31]

Anchor Boxes

YOLOV2 implements a feature different from its previous version to reduce localization errors [31]. It uses anchor boxes with predefined bounding box sizes. The anchor boxes are determined by the training set: k-means clustering is used on the training-set bounding boxes to provide a set of anchor boxes that is most reliable for the YOLOV2 model. As shown in Figure 4.12, the final bounding box is determined through offsets from the anchor boxes.
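The idea of deriving anchors from the training set can be sketched as below. Note that YOLOV2 actually clusters with an IoU-based distance [31]; this illustration uses plain k-means on made-up (width, height) pairs only to convey the idea.

```python
# Rough sketch of clustering training-set box sizes into anchor priors.
# YOLOV2 uses an IoU-based distance [31]; plain k-means on (width, height)
# pairs is used here only to illustrate the idea, with made-up box sizes.
import numpy as np
from sklearn.cluster import KMeans

box_sizes = np.array([[1.2, 2.0], [3.5, 4.1], [0.8, 1.1], [4.0, 6.3], [2.2, 2.4],
                      [1.0, 1.9], [3.3, 3.8], [0.9, 1.3], [4.4, 6.0], [2.0, 2.6]])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(box_sizes)
anchors = kmeans.cluster_centers_    # five (width, height) priors for prediction
print(anchors)
```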

4.2 Multi-view Object Detection with YOLOV2

In a multi-camera system, images can be correlated with each other to improve understanding of a scene as a whole. Existing object detection models have taken advantage of this idea, resulting in highly accurate detection in comparison to single-camera object detection. Deep Occlusion minimized the issue of occlusion among objects with its implementation of a multi-view object detection algorithm [3]. Other work does pre-processing for bounding box information prior to utilizing geometry among multiple images to improve detection accuracy [8, 39]. A system using YOLOV2 can leverage this idea to improve its accuracy. The approach here utilizes ideas from the work in [39] with some variation.

A multiple-view system will first do pre-processing for bounding box locations from each camera. This is done through a YOLOV2 model with a lower confidence threshold. Afterwards, all cameras pass their predictions to the Edge, where the Edge uses a homography matrix to correspond points among the cameras. As Figure 4.13 depicts, the homography matrix is produced by defining the relationship between the two image planes relative to a scene. This is done through the Direct Linear Transform

Figure 4.13: Point Correspondence using Homography [41]

(DLT), where matching points between two images are defined and used to form the homography matrix [29]. From the corresponding points, a graph is created where each connected vertex represents corresponding points between images. Lastly, the PageRank algorithm is used to propagate the confidence levels among the corresponding points' detection probabilities [28]. These are the final prediction results, which are sent back to the correct Node.
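Using OpenCV, the point-correspondence step can be sketched as below; the calibration points and the detection center are made-up values, and the homography estimation follows the DLT-style least-squares approach mentioned above.

```python
# Sketch of projecting a detection from one camera into another via a homography;
# calibration points and the detection center are illustrative values only.
import numpy as np
import cv2

# Matching calibration points picked in each camera's image plane (assumed values).
pts_cam1 = np.float32([[10, 10], [300, 15], [290, 220], [20, 230]])
pts_cam2 = np.float32([[40, 30], [330, 20], [325, 240], [55, 250]])

# Estimate the homography relating the two image planes.
H, _ = cv2.findHomography(pts_cam1, pts_cam2)

# Map a detection's bounding-box center from camera 1 into camera 2's view.
center_cam1 = np.float32([[[150.0, 120.0]]])
center_cam2 = cv2.perspectiveTransform(center_cam1, H)
print(center_cam2)
```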

4.3 Edge Computing Object Detection System

Figure 4.14: Edge Computing Object Detection System

The Edge Orchestration Architecture is implemented in an object detection edge computing system using a combination of YOLOV2 models.

As shown in Figure 4.14, YOLOV2 is used on both the Edge and the Cloud because of its resource requirements. YOLOV2 consists of 25 layers with a maximum number of

Figure 4.15: YOLOV2 Architecture [31]

filters of 1024 used in a convolutional layer [31]. Hence, it requires significant memory and heavy computation to run efficiently. The YOLOV2 architecture is displayed in Figure 4.15.

Figure 4.16: YOLOV2 Tiny Architecture [14]

As shown in Figure 4.14, YOLOV2 Tiny is used on the Nodes. The YOLOV2 Tiny version consists of only 15 layers with a maximum number of filters of 512 in a convolutional layer. This requires less memory and computation, which makes it suitable for an embedded device. The YOLOV2 Tiny architecture is displayed in Figure 4.16.

In this object detection system, the Node gathers images from the camera's scene. The images are then processed for object detection at the Node's designated

Figure 4.17: Object Detection at the Edge

computational location. Initially, the Node will start its computation at the Edge. The Node transfers the image over and the Edge returns only the prediction results. By returning only the prediction results, minimal network throughput is used. From there the Node can update its image with bounding boxes for any objects detected. An example is shown in Figure 4.17.

As more Nodes connect to the Edge, processing time is expected to increase due to the additional load on the Edge, resulting in a poor object detection rate. For this case, Edge work that originated from a Node will be transferred to the Cloud for computation. When this approach becomes less effective after the Edge reaches its maximum load from Nodes, the Edge will delegate one or more Nodes to send their computation requests directly to the Cloud. In effect, both cases allow the overall system to operate efficiently. Again, the Cloud will only return the prediction results to the Node. The same concept applies when the Edge determines that the network is congested, but in this case computation will occur at the Node. By moving computation to the Node, data transfer size is reduced.

An issue in running object detection computation on the Node is that its detection accuracy is not as good as at the Edge or Cloud. Due to the Node's resource limitations it uses YOLOV2 Tiny, which has significantly less detection capability than YOLOV2. Fortunately, the Edge has access to data from all Nodes connected to it. Data transfer size can be minimized while maintaining a relatively good detection rate by correlating detections among the different Nodes. The Node runs its own computation to produce predicted results. The predicted results are transferred to the Edge for a multi-view object detection algorithm. The Edge does this by maintaining a dictionary, where each key is a frame of the scene. Every time a Node sends a request, the dictionary gets updated with the Node's camera number and

Figure 4.18: Multi-view Object Detection Design

prediction results. The Node's request stays in a polling state until all Nodes send their data for the same frame. A timer is also used to prevent the Node's requests from staying in a polling state indefinitely. Once all required data is provided for a given frame, the multi-view algorithm is run. Afterwards, the final prediction results are sent back to each individual Node. This is displayed in Figure 4.18. Since only the predicted results are transferred to the Edge, only a fraction of the network throughput is used relative to a full image. This is the expense paid in data transfer overhead, but it allows for a better object detection system.
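A condensed sketch of this Edge-side aggregation is shown below. The frame dictionary, camera set, timeout value and function names are assumptions used only to mirror the description, not the thesis code.

```python
# Sketch of the Edge's per-frame aggregation for the multi-view algorithm
# (dictionary layout, timeout and helper names are assumed).
import time

PENDING = {}                 # frame_id -> {camera_id: prediction results}
EXPECTED_CAMERAS = {1, 2}    # Nodes taking part in the multi-view algorithm
TIMEOUT_S = 2.0

def submit(frame_id, camera_id, predictions, run_multi_view):
    """Handle one Node request; poll until every camera reports this frame or time out."""
    PENDING.setdefault(frame_id, {})[camera_id] = predictions
    deadline = time.time() + TIMEOUT_S
    while set(PENDING[frame_id]) != EXPECTED_CAMERAS and time.time() < deadline:
        time.sleep(0.01)                        # the request stays in a polling state
    fused = run_multi_view(PENDING[frame_id])   # homography + PageRank fusion
    return fused.get(camera_id, predictions)    # each Node gets back its own result
```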

Over time, data will be collected at both the Edge and Cloud from the Nodes' images. The Edge will transfer the images to the Cloud. The Cloud can then keep the images and train the YOLOV2 models for better prediction abilities. This defines the implementation of the Edge Orchestration Architecture in an edge computing object detection system.

CHAPTER 5

EXPERIMENT

Figure 5.1: Edge Computing Object Detection Component Specifications

Figure 5.2: Experiment with Raspberry Pis

The experiment consisted of a system of components that communicate through the HyperText Transfer Protocol (HTTP). The Nodes are several Raspberry Pi 3 Model B+ devices. The Edge is an Alienware 17 R4 laptop and the Cloud is the Google Cloud Platform. Figure 5.1 provides the specifications of each device. The Raspberry Pi receives images from a 5 MP 1080p OV5647 camera. It is battery powered by a 3800 mAh power supply. The Edge and Cloud run their object detection models on their GPUs through the Tensorflow framework [1]. As seen, both have very powerful specifications, where the Cloud is the most performant with the NVIDIA Tesla K80 GPU. Lastly, the object detection models are trained on the PASCAL Visual Object Classes (VOC) 2007 and 2012 data sets [15]. The VOC data set allows the object detection system to detect the following objects: airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motor bike, person, potted plant, sheep, sofa, train and television monitor.

Experiment    Scenario
1    1 Node Compute at Node
2    1 Node Compute at Edge
3    1 Node Compute at Cloud
4    1 Node Offload to Edge, Compute at Cloud
5    2 Nodes Compute at Edge
6    2 Nodes Compute at Node, then at Edge for Multi-view Algorithm
7    3 Nodes Compute at Edge
8    3 Nodes Compute at Cloud
9    3 Nodes Offload to Edge, Compute at Cloud
10   2 Nodes Compute at Edge, 1 Node Compute at Cloud

Table 5.1: Experiment Scenarios

During the experiment, data is collected while the Raspberry Pi is set up to run object detection on episodes of the series Friends. This is displayed in Figure 5.2. The experiments are based on the scenarios listed in Table 5.1. The scenario defines where the computation occurs. For example, "2 Nodes Compute at Edge, 1 Node Compute at Cloud" specifies that 2 Nodes ran their object detection computation at the Edge and 1 Node ran its object detection computation at the Cloud. For the multi-view scenario, two cameras are calibrated to obtain a homography matrix. The homography matrix is required in the multi-view object detection algorithm to correlate prediction results between the two cameras. Because of this, the multi-view experiment uses recorded videos of Friends to maintain the position of the cameras relative to where the series was being played. The data collected are processing time, data transfer size and battery life. All data except battery life is collected every time a Node processes a frame for object detection. This is about 1 set of data per second for about 3 hours per run. The data in the charts are based on the average of all the data from all the runs per scenario. Each scenario is run twice and the battery pack is switched per run. Battery life is based on the start time of the system until the battery dies on the Raspberry Pi.

CHAPTER 6

RESULTS

The experiment results are separated by the number of cameras. They are further separated by average processing time, average data transfer size and average battery life.

Figure 6.1: 1 Camera Average Processing Time

Figure 6.2: 1 Camera Average Data Transfer Size

6.1 Processing Time

As we increase the number of cameras, we see a common theme in average processing time. Computation at the Edge has the best average processing time in comparison

Figure 6.3: 1 Camera Average Battery Life

Figure 6.4: 2 Camera Average Processing Time

Figure 6.5: 2 Camera Average Data Transfer Size

Figure 6.6: 2 Camera Average Battery Life

Figure 6.7: 3 Camera Average Processing Time

Figure 6.8: 3 Camera Average Data Transfer Size

Figure 6.9: 3 Camera Average Battery Life

to having computation occur at the Node or at the Cloud.

Another interesting comparison is between two scenarios. One scenario is where a Node requests computation at the Edge and then the Edge forwards the computation to the Cloud. The second scenario is where the Node requests computation directly from the Cloud. For both 1 Node and 3 Nodes, requesting computation at the Edge first and then the Cloud requires less processing time than sending computation directly to the Cloud. The thought behind this is that the Edge has better hardware for making more efficient requests and processing responses than the Node.

In essence, the Edge significantly improves processing time in a system, whether it does computation directly for the Node or forwards the Node's computation requests to the Cloud.

6.2 Data Transfer Size

As expected, data transfer size is relatively the same whether data is sent to the Edge or the Cloud. The approach that shows improvement in data transfer size is the multi-view object detection approach. By having the Nodes do pre-processing prior to finalizing the object detection prediction at the Edge, a substantial reduction in data transfer size is achieved. This can make a big impact on a network's throughput when network congestion is a concern.

6.3 Battery Life

Battery life efficiency is achieved when the Node offloads its work directly to the Cloud. With 1 Node, we see an almost even average battery life between the first two scenarios. The first scenario first goes to the Edge and then goes to the Cloud. The second scenario goes directly to the Cloud. With 3 Nodes in the same scenarios, we see the Node going directly to the Cloud for computation achieve the best average battery life. Comparing these results with the same scenario's average processing time, there is a correlation where battery life is better when processing time is longer. Hence, the Edge Orchestration Architecture utilizes this idea: when it determines it needs to configure a Node to send work directly to the Cloud, it uses the Node with the least battery life.

Overall, the Edge has the ability to improve a system's processing time, data transfer size and battery life. Nodes dramatically improve their processing time by having a more powerful device, the Edge, handle the heavy computation for them. When data transfer size is a concern, a Node can do pre-processing on its data, reducing the data size, and then have the Edge take care of the final processing. Lastly, the Edge can configure a Node to do computation directly at the Cloud to conserve battery life. The results display this in an edge computing object detection system.

CHAPTER 7

FUTURE WORK

Edge computing research is in its infancy; hence, there are many discoveries to be made. A few future works come to mind for developing a robust edge computing system.

During the operation of a system, data is collected about the system's behavior. This information can be utilized to forecast the events that can constrain a system's performance. A machine learning algorithm can be used for this forecasting ability and can be an improvement on the monitoring feature in the Edge Orchestration Architecture. Thus, one future work goes into a more intelligent approach in monitoring the system's behavior to improve overall system performance.

Another future work considers different machine learning models to handle object detection tasks. One model, the Correlational Neural Network (CorNet), makes image classification predictions from multiple image views [7]. CorNet may be a model that can work well in an object detection system consisting of multiple cameras. CorNet is also known to maintain good prediction results even when one input is not provided. This would be beneficial in a multiple-camera object detection system for the case when one camera dies but other cameras still exist. Therefore, a model similar to CorNet has potential in a multiple-input system.

CHAPTER 8

CONCLUSION

Edge computing is emerging as a useful approach for improving processing time on resource-limited devices. This becomes more important in aiding the development of applications that utilize resource-heavy AI algorithms. Consideration should also be given to data transfer size and battery life. These considerations become important when decreased network throughput or increased load on a device negatively impacts the performance of a system. The Edge Orchestration Architecture takes an approach to maintaining an overall efficient system by monitoring the system's behavior and adjusting itself when system performance starts to degrade. With this architecture, an improvement can be achieved in an object detection system and other resource-heavy applications.

BIBLIOGRAPHY

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI'16, pages 265–283, Berkeley, CA, USA, 2016. USENIX Association. ISBN 978-1-931971-33-1. URL http://dl.acm.org/citation.cfm?id=3026877.3026899.

[2] Pedro Ballester and Ricardo Matsumura de Arajo. On the performance of googlenet and alexnet applied to sketches. In AAAI, pages 1124–1128, 2016.

[3] Pierre Baqué, François Fleuret, and Pascal Fua. Deep occlusion reasoning for multi-camera multi-target detection. CoRR, abs/1704.05775, 2017. URL http://arxiv.org/abs/1704.05775.

[4] Flavio Bonomi, Rodolfo Milito, Preethi Natarajan, and Jiang Zhu. Fog Computing: A Platform for Internet of Things and Analytics, pages 169–186. Springer International Publishing, Cham, 2014. ISBN 978-3-319-05029-4. doi: 10.1007/978-3-319-05029-4_7. URL https://doi.org/10.1007/978-3-319-05029-4_7.

[5] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.

[6] Emmanuel Ayuyao Castillo and Ali Ahmadinia. Distributed deep convolutional neural network for smart camera image recognition. In Proceedings of the 11th International Conference on Distributed Smart Cameras, ICDSC 2017, pages 169–173, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-5487-5. doi: 10.1145/3131885.3131935. URL http://doi.acm.org/10.1145/3131885.3131935.

[7] Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, and Balaraman Ravindran. Correlational neural networks. CoRR, abs/1504.07225, 2015. URL http://arxiv.org/abs/1504.07225.

[8] A. Coates and A. Y. Ng. Multi-camera object detection for robotics. In 2010 IEEE International Conference on Robotics and Automation, pages 412–419, May 2010. doi: 10.1109/ROBOT.2010.5509644.

[9] M. Daniels, K. Muldawer, J. Schlessman, B. Ozer, and W. Wolf. Real-time human motion detection with distributed smart cameras. In 2007 First ACM/IEEE International Conference on Distributed Smart Cameras, pages 187–194, Sept 2007. doi: 10.1109/ICDSC.2007.4357523.

[10] Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Steven Bohez, Sam Leroux, and Pieter Simoens. Dianne: Distributed artificial neural networks for the internet of things. In Proceedings of the 2nd Workshop on Middleware for Context-Aware Applications in the IoT, M4IoT 2015, pages 19–24, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3731-1. doi: 10.1145/2836127.2836130. URL http://doi.acm.org/10.1145/2836127.2836130.

[11] Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Steven Bohez, Pieter Simoens, Piet Demeester, and Bart Dhoedt. Distributed neural networks for internet of things: The big-little approach. In Benny Mandler, Johann Marquez-Barja, Miguel Elias Mitre Campista, Dagmar Cagáňová, Hakima Chaouchi, Sherali Zeadally, Mohamad Badra, Stefano Giordano, Maria Fazio, Andrey Somov, and Radu-Laurentiu Vieriu, editors, Internet of Things. IoT Infrastructures, pages 484–492, Cham, 2016. Springer International Publishing. ISBN 978-3-319-47075-7.

[12] P. Dempsey. The teardown: Google home personal assistant. Engineering Technology, 12(3):80–81, April 2017. ISSN 17509637. doi: 10.1049/et.2017.0330.

[13] Joel Dick, Caleb Phillips, Seyed Hosein Mortazavi, and Eyal de Lara. High speed object tracking using edge computing: Poster abstract. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing, SEC '17, pages 26:1–26:2, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-5087-7. doi: 10.1145/3132211.3132457. URL http://doi.acm.org/10.1145/3132211.3132457.

[14] Taha Emara. How to build a custom object detector using yolo. http://emaraic.com/blog/yolo-custom-object-detector, 2018.

[15] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, Jun 2010. ISSN 1573-1405. doi: 10.1007/s11263-009-0275-4. URL https://doi.org/10.1007/s11263-009-0275-4.

[16] Christoph Feichtenhofer, Axel Pinz, and Andrew Zisserman. Convolutional two-stream network fusion for video action recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[17] A. Floarea and V. Sgârciu. : A next generation refrigerator connected to the iot. In 2016 8th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pages 1–6, June 2016. doi: 10.1109/ECAI.2016.7861170.

[18] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask r-cnn. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, Oct 2017. doi: 10.1109/ICCV.2017.322.

[19] E. Hodo, X. Bellekens, A. Hamilton, P. Dubouilh, E. Iorkyase, C. Tachtatzis, and R. Atkinson. Threat analysis of iot networks using artificial neural network intrusion detection system. In 2016 International Symposium on Networks, Computers and Communications (ISNCC), pages 1–6, May 2016. doi: 10.1109/ISNCC.2016.7746067.

[20] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. URL http://arxiv.org/abs/1704.04861.

[21] Jonathan Hui. Real-time object detection with yolo, yolov2 and now yolov3. https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088, 2018.

[22] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.

[23] Alex Krizhevsky and Geoff Hinton. Convolutional deep belief networks on cifar-10. Unpublished manuscript, 40(7), 2010.

[24] J. Lee, M. Stanley, A. Spanias, and C. Tepedelenlioglu. Integrating machine learning in embedded sensor systems for internet-of-things applications. In 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pages 290–294, Dec 2016. doi: 10.1109/ISSPIT.2016.7886051.

[25] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015. URL http://arxiv.org/abs/1512.02325.

[26] Chengcheng Ning, Huajun Zhou, Yan Song, and Jinhui Tang. Inception single shot multibox detector for object detection. In 2017 IEEE International Conference on Multimedia Expo Workshops (ICMEW), pages 549–554, July 2017. doi: 10.1109/ICMEW.2017.8026312.

[27] Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, and Haohong Wang. Spatially supervised recurrent convolutional neural networks for visual object tracking. CoRR, abs/1607.05781, 2016. URL http://arxiv.org/abs/1607.05781.

[28] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999. URL http://ilpubs.stanford.edu:8090/422/. Previous number = SIDL-WP-1999-0120.

[29] Philippe Pourcelot, Fabrice Audigié, Christophe Degueurce, Didier Geiger, and Jean Marie Denoix. A method to synchronise cameras using the direct linear transformation technique. Journal of Biomechanics, 33(12):1751–1754, 2000. ISSN 0021-9290. doi: https://doi.org/10.1016/S0021-9290(00)00132-9. URL http://www.sciencedirect.com/science/article/pii/S0021929000001329.

[30] Ashwin Ram, Rohit Prasad, Chandra Khatri, Anu Venkatesh, Raefer Gabriel, Qing Liu, Jeff Nunn, Behnam Hedayatnia, Ming Cheng, Ashish Nagar, Eric King, Kate Bland, Amanda Wartick, Yi Pan, Han Song, Sk Jayadevan, Gene Hwang, and Art Pettigrue. Conversational AI: the science behind the alexa prize. CoRR, abs/1801.03604, 2018. URL http://arxiv.org/abs/1801.03604.

[31] Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242, 2016.

[32] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv, 2018.

[33] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015. URL http://arxiv.org/abs/1506.02640.

[34] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.

[35] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[36] Deval Shah. Activation functions. https://towardsdatascience.com/activation-functions-in-neural-networks-58115cda9c96, 2017.

[37] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637–646, Oct 2016. ISSN 2327-4662. doi: 10.1109/JIOT.2016.2579198.

[38] Blesson Varghese, Nan Wang, Sakil Barbhuiya, Peter Kilpatrick, and Dimitrios S. Nikolopoulos. Challenges and opportunities in edge computing. CoRR, abs/1609.01967, 2016. URL http://arxiv.org/abs/1609.01967.

[39] Zixuan Wang and Hamid Aghajan. Tracking by Detection Algorithms Using Multiple Cameras, pages 175–188. Springer New York, New York, NY, 2014. ISBN 978-1-4614-7705-1. doi: 10.1007/978-1-4614-7705-1_8. URL https://doi.org/10.1007/978-1-4614-7705-1_8.

[40] S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, and Q. Li. Lavea: Latency-aware video analytics on edge computing platform. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2573–2574, June 2017. doi: 10.1109/ICDCS.2017.182.

[41] Canben Yin, Shaowu Yang, Xiaodong Yi, Zhiyuan Wang, Yanzhen Wang, Bo Zhang, and Yuhua Tang. Removing dynamic 3d objects from point clouds of a moving rgb-d camera. In Information and Automation, 2015 IEEE International Conference on, pages 1600–1606. IEEE, 2015.

[42] Yuan Ai, Mugen Peng, and Kecheng Zhang. Edge cloud computing technologies for internet of things: A primer. Digital Communications and Networks, 2017. doi: 10.1016/j.dcan.2017.07.001.