Embedded CNN based vehicle classification and counting in non-laned road traffic

Mayank Singh Chauhan* (IIT Delhi, [email protected]), Arshdeep Singh* (IIT Delhi, [email protected]), Mansi Khemka* (Delhi Technological University, [email protected]), Arneish Prateek (IIT Delhi, [email protected]), Rijurekha Sen (IIT Delhi, [email protected])

ABSTRACT
Classifying and counting vehicles in road traffic has numerous applications in the transportation engineering domain. However, the wide variety of vehicles (two-wheelers, three-wheelers, cars, buses, trucks etc.) plying on the roads of developing regions without any lane discipline makes vehicle classification and counting a hard problem to automate. In this paper, we use state of the art Convolutional Neural Network (CNN) based object detection models and train them for multiple vehicle classes using data from Delhi roads. We get up to 75% MAP on an 80-20 train-test split using 5562 video frames from four different locations. As robust network connectivity is scarce in developing regions for continuous video transmissions from the road to cloud servers, we also evaluate the latency, energy and hardware cost of embedded implementations of our CNN model based inferences.

* These authors have equal contributions.

CCS CONCEPTS
• Computer systems organization → Embedded systems;

KEYWORDS
Urban computing, traffic monitoring, deep neural network, computer vision, embedded computing

ACM Reference Format:
Mayank Singh Chauhan*, Arshdeep Singh*, Mansi Khemka*, Arneish Prateek, and Rijurekha Sen. 2019. Embedded CNN based vehicle classification and counting in non-laned road traffic. In Proceedings of THE TENTH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES AND DEVELOPMENT (ICTD '19). ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3287098.3287118

1 INTRODUCTION
Traffic congestion and air pollution levels are becoming life threatening in developing region cities like Delhi and the National Capital Region (NCR). The local government is being forced to take concrete steps to make public transport better by gradually adding subway and bus infrastructure [16], albeit under budget constraints. Policy decisions like odd-even rules (vehicles with odd and even numbered plates are allowed on alternate days) are being tried to curb the number of private cars [8, 13, 23]. Such policy decisions are often met with angry protests from citizens in news and social media. In the absence of data driven empirical analysis of the potential and actual impact of such urban transport policies, debates surrounding the policies often become political rhetoric. Building systems to gather and analyze transport, air quality and similar datasets is therefore necessary for data driven policy debates.

This paper focuses on a particular kind of empirical measurement, namely counting and classification of vehicles and pedestrians from roadside cameras installed at intersections in Delhi-NCR. These numbers can be used in road infrastructure planning, e.g. in construction of signalized intersections, fly-overs, foot-bridges, underpasses, footpaths and bike lanes. Classified counts can also help in evaluating the effect of policies like odd-even, to see if private transport numbers go down during the policy enforcement period as expected. We discuss these and more motivational use cases of automated vehicle counting and classification in Section 2. We show how some of these use cases can benefit from empirical data, based on our dataset, in Section 7.

Non-laned driving in developing regions, with high heterogeneity of vehicles and pedestrians, makes automated counting and classification a hard problem. This paper explores state of the art computer vision methods of CNN based object detection to handle this task. CNN models need annotated datasets from the target domain for training. Annotated video frames from in-vehicle and roadside cameras are available for western traffic, and have recently been in high demand to train models for self-driving cars etc. We started our explorations with CNN models available with weights trained on the Imagenet dataset [22], which has many classes of objects including vehicles. We further fine-tuned the model with existing annotated datasets of developed country traffic from PASCAL VOC [24] and KITTI [11]. However, the accuracies obtained with models trained on developed country traffic datasets, on test videos and images collected in Delhi-NCR, were very low (Mean Average Precision or MAP value in object detection was 0.01% using fine-tuning with the KITTI dataset and 0.58% using fine-tuning with Pascal VOC).

We identified several differences between the annotated video and image datasets of western traffic and our traffic videos, that might cause the accuracy difference of training models on one dataset and testing on the other. Four-wheelers and motorbikes look similar across countries, but there are many vehicles in Delhi-NCR which look completely different from the western world (e.g. auto-rickshaws, e-rickshaws, cycle-rickshaws, trucks and buses). Secondly, our lack of lane discipline causes higher levels of occlusion, where a large vehicle like a bus is occluded by many smaller vehicles. Thirdly, our roads are not rectangular grid shaped as seen in developed country videos, but have different adhoc intersection designs, creating different views of the captured traffic flows. Finally, since self-driving cars are one of the main application focuses in developed countries, many images and videos are captured from the view point of the driver. This view significantly differs from the view a road-side camera gets. On obtaining low accuracies with annotated datasets from developed regions as a combined effect of all these differences, we tried to find annotated datasets of non-laned developing region traffic to fine-tune our CNN models. Unfortunately, we could not find such datasets.

We therefore create such annotated datasets ourselves, as part of this paper. We collect videos from three different intersections and a highway in Delhi-NCR, in collaboration with Vehant Technologies [19] and Delhi Integrated Multimodal Transit System (DIMTS) [9]. We split the video into frames, manually annotate the different objects with bounding boxes, and use this annotated dataset to train and test CNN based object detectors. Our annotated dataset comprises 5562 frames with 32088 total annotations, the average number of annotations per frame being 6. In an 80-20 split of train and test data, our trained model achieves MAP values of up to 75%. We describe our dataset in Section 4 and CNN based object detector model training and testing in Section 5.

The cameras from which we obtain data have either a fish-eye or a normal lens, and get a frontal, back or side view of the road traffic. These different kinds of camera installations and lens configurations help us in evaluating how models trained on one annotated dataset perform on a test set from the same camera vs. other installations. Our observations should empirically motivate the standardization of such hardware installations in future, to reduce the overhead of manual annotation of video frames and retraining of the CNN model for each non-standard camera installation. Fine-tuning of computer vision models trained with videos and images of developed world traffic is needed anyway: we cannot change the kind of vehicles that ply on our roads, nor can we change our non-laned driving that increases occlusion, nor the irregular intersection designs different from the regular grids in developed regions. But at least if differences due to camera positioning and angles can be minimized, some manual annotation and fine-tuning effort can be reduced.

Vehicle counting and classification can be useful in two kinds of applications. The first kind is delay tolerant, where processing can be done at any arbitrary latency after video capture. The computations in this case will affect long term policy like infrastructure planning, or help in evaluating policy impact like that of the odd-even rule. The second kind of applications require low latency real time processing. Here the computations can be used in catching speed violations based on vehicle class, or illegal use of roadways by heavy vehicles outside their allotted time slots. Low latency is needed to catch the violators and penalize them in real time. In this paper, we therefore also explore the prospect of real time inferences using our trained CNN models, especially using on-road embedded platforms.

Why is embedded processing interesting to explore in this context? Since broadband network connectivity across different road intersections and highways is not reliable in developing countries, transfer of video frames from the road to cloud servers for running computer vision models on them can become a bottleneck. We therefore evaluate embedded platforms on their ability to run inference tasks, i.e. given a pre-trained CNN based object detection model and a video frame, whether the embedded platform can process the frame to give classified counts. We measure the latency incurred and energy drawn per inference task on three off-the-shelf embedded platforms (NVIDIA Jetson TX2, Raspberry PI Model 3B and Movidius Neural Compute Stick). Our evaluations in Section 6 show the feasibility of embedded processing and also show the cost-latency-energy trade-offs of particular hardware-software combinations.

Our trained models are available at https://github.com/mansikhemka/Embedded-CNN-based-vehicle-classification-and-counting-in-non-laned-road-traffic/ for use by both computer vision and transportation researchers, and potentially also by government organizations working on road traffic measurement and management. Our annotated datasets will be available on request from academic and research institutions; this restriction is needed to control data privacy, as the videos have been captured on real roads and contain personally identifiable information (PII) like people's faces and vehicle number plates. The annotated datasets will potentially be of interest to computer vision researchers, for designing and testing better CNN models for developing region traffic. The trained models and technical know-how of training the CNN models and running inferences on embedded platforms will potentially aid government organizations in data driven policy design and evaluation on road traffic measurement and management.

2 MOTIVATION
Why is vehicle classification and counting useful? One use case for such classified counts is data driven infrastructure planning. Each vehicle class can carry a certain number of passengers, which is called Passenger Count Unit (PCU) [14]. PCU/hour is used to compute the capacity of roads, and if this capacity needs to be increased, flyovers, underpasses and road widening projects have to be undertaken after proper assessment. While this is the norm in developed countries, in developing countries infrastructure enhancement projects are often ridden with political rhetoric and controversy. A notable example has been civil society's vehement protests against the local government's decision to build the Bengaluru steel flyover [17, 20]. Such decisions and associated debates should be backed with empirical data of PCU measurements from the road, for which vehicle classification and counting as done in this paper is necessary. Other infrastructure like footbridges, footpaths and bike-lanes can also be planned based on counting actual numbers of pedestrians and cycles on the road.

A second use case is empirically evaluating the effect of urban transport policies. An example is the odd-even policy piloted twice by the local government in Delhi-NCR in 2017 [8, 13, 23] to reduce the number of vehicles, and subsequently fuel emissions, to improve the air quality index in the city. Again the policy was highly debated in news and social media. As shown by the researchers in [32], most of these debates were driven by political leanings of the social media users, instead of empirical data. There were also controversial news reports of citizens resorting to renting cars with suitable license plates, and even opting to buy two cars with one odd and one even licence plate, to bypass the restrictions [10, 12, 15].

Actual numbers from the roads are necessary to quantify whether the government provided more public transport to ease commute during the pilot period, or whether the number of privately owned two-wheelers and four-wheelers actually went down in the presence of people trying to fool the system. Automated vehicle counting and classification can directly provide these numbers of public and private vehicles for data driven policy audits. Also, in connection to air quality improvement, different vehicle types are known to have different fuel emission properties [37]. Hence classified counts of vehicles will also be useful to correlate with air quality measurements.

Two other use cases were provided to us by DIMTS, who also shared their camera dataset. One was to detect buses so that DIMTS can see the arrival rates of buses at the point of monitoring. This can quantify the unpredictability of public transport arrival (do buses come every t minutes or is there a large variance in arrival times?). DIMTS is also instrumenting buses with GPS to get this information, but since some traffic cameras are already in place, they are interested in leveraging that infrastructure for tracking buses till the GPS system comes up. The second use case is to detect heavy vehicles like trucks and penalize them if they drive outside their allowed hours. On the penalization side, different vehicles also have different speed limits (see https://en.wikipedia.org/wiki/Speed_limits_in_India), so vehicle classification is needed for speed violation detection as well (just detecting vehicle speed is not enough, as each type has a different limit).

Finally, traffic management might benefit from more fine-grained information on vehicle types. Traffic density or queue length might be enough to better schedule the signals at intersections. But since there is significant heterogeneity in speeds of different vehicle types, which take different times to clear the signals, whether signal scheduling should take into account more fine-grained information like classified vehicle counts needs to be investigated.

3 RELATED WORK
Using CNN based accurate computer vision methods, advanced forms of road and traffic related automation, e.g. self driving cars, are being investigated. Annotated image datasets to train the intelligent agent in self driving cars are being created as a result [7, 11, 18]. Computer vision researchers across the world are designing and testing their CNN models on these datasets. These datasets are available for laned traffic of developed countries, where some vehicles like auto-rickshaws, e-rickshaws and cycle-rickshaws are absent, and some vehicles like trucks, buses and commercial vans look very different from those in developing region traffic. Also, non-laned traffic leads to higher levels of occlusion. These differences led to very poor accuracy when we tried CNN models trained with annotated datasets of lane-based traffic on test images from Delhi-NCR. We used an available CNN model with weights trained on the Imagenet dataset [22], which has many classes of objects including vehicles. We further fine-tuned the model with existing annotated datasets of developed country traffic from PASCAL VOC [24] and KITTI [11]. However, the Mean Average Precision or MAP value in object detection was 0.01% using fine-tuning with the KITTI dataset and 0.58% using fine-tuning with the Pascal VOC dataset, on test video frames from Delhi-NCR. As this accuracy was not useful for any application, we build a parallel dataset in this paper, annotating videos from roadside cameras in Delhi-NCR, to evaluate state of the art CNN methods in the developing region context. This dataset and our trained models have been made available, so that vision researchers can test their methods on this new dataset for developing countries, in addition to existing ones [7, 11, 18] for developed countries.

Non-laned heterogeneous traffic in developing regions has excited the research community to design automated traffic monitoring systems using a wide variety of embedded sensors like cameras [27, 35, 36], microphones [28, 31] and RF [29, 30]. All these works are on congestion estimation, which outputs the level of traffic density or the length of the vehicle queue on a given road stretch. Our work gives a superset of these outputs. We give classified vehicle counts for a given road stretch, which in summation can give the traffic density on the roads. In dense traffic, the furthest vehicle object that we detect in a video frame will give the length of the traffic queue. Thus all the prior works' results can be derived from the results we present in this paper. We additionally report vehicle type, which as discussed in Section 2 has numerous applications of its own. The main technical gap compared to the related work has been in using the recent dramatic improvements in computer vision accuracy based on CNNs, not explored in prior literature.

Vehicle classification was done using in-vehicle sensors in [34]. This dependency on participation from on-road vehicles for vehicle classification has been removed in this paper, using data from roadside camera deployments.

There have been some very recent works on evaluating CNN inference performance on mobile and embedded systems [21, 25]. These works discuss the image classification task, where a given image has a single object that needs to be classified into one of the pre-defined ground truth classes. We evaluate the more challenging multi-class object detection task, necessary to handle different applications on crowded traffic scenes from roadside cameras. Another point of distinction with the prior CNN evaluations on embedded platforms is that their evaluations have been on high end platforms like Jetson TX2, which cost in the order of 1000 USD. As we show by empirical evaluations, embedded platforms within 50-100 USD price ranges are also suitable for the use cases described in this paper, a promising observation to reduce deployment cost.
4 DATASET
We collected video frames from four roadside camera installations in collaboration with Vehant Technologies [19] and DIMTS [9]. As automated vehicle counting and classification is a new technology being tried in a few areas of Delhi-NCR, the positioning and hardware specifications of the camera installations have not yet been standardized. Thus we have cameras showing frontal, back and side views of vehicles, and one camera even has a fish-eye lens. Sample images from the four installations are shown in Figure 1. The difference between the two back view installations is in the type of vehicles that ply the monitored roads: one is a bus route while the other does not see buses. The datasets will be referred to as front_fisheye, back_nobus, side_highway and back_busroute henceforth.

The first three rows in Table 1 show the total number of annotated frames, and the total and average number of annotations. We edit the open source BBox tool [6] to annotate images with rectangular bounding boxes and a label among one of the following 7 classes: 0-bus, 1-car, 2-autorickshaw, 3-twowheeler, 4-truck, 5-pedestrian, 6-cycles and e-rickshaws. Three of the authors annotated one-third of the images each. Then each annotator went through the annotations of the other two, and made small fixes. The annotations were thus by mutual agreement of three of the authors. The size of the annotated dataset is comparable to some of the existing datasets (the Trancos [18] dataset, for example, consists of 1244 images with a total of 46796 annotated vehicles), and hence will be sufficiently big to be a useful benchmark for the computer vision community.

Table 1: Image dataset and annotation description
Installation                              | front_fisheye | back_nobus | side_highway | back_busroute | total
Total number of annotated images          | 620           | 1189       | 2754         | 999           | 5562
Total number of annotations in all images | 9053          | 12588      | 4786         | 5661          | 32088
Average number of annotations per image   | 15            | 11         | 2            | 6             | 6
Number of train images (80%)              | 496           | 988        | 2248         | 800           | 4532
Number of test images (20%)               | 124           | 201        | 506          | 199           | 1030

This annotated dataset is split into 80-20% training and test sets respectively (last two rows in Table 1), in the next section where we train and evaluate our CNN models for the object detection task.
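To make the annotation pipeline concrete, the following is a minimal sketch of how corner-style bounding-box labels can be converted to the normalized (class, x_center, y_center, width, height) records that darknet-style YOLO training expects, and how a per-installation 80-20 split can be made. The annotation file layout, folder name and helper names are illustrative assumptions, not the exact format used in the paper.

```python
# Sketch: convert pixel-corner annotations to normalized YOLO records and
# split one installation's frames 80-20, as in Table 1. All names below are
# hypothetical; the paper's own tooling is a modified BBox-Label-Tool [6].
import glob
import random

CLASSES = ["bus", "car", "autorickshaw", "twowheeler", "truck", "pedestrian", "cycle_erickshaw"]

def to_yolo(box, img_w, img_h):
    """box = (class_id, xmin, ymin, xmax, ymax) in pixels -> normalized YOLO text line."""
    cls, xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

def split_installation(frames, train_frac=0.8, seed=0):
    """80-20 split of one installation's frame list."""
    frames = list(frames)
    random.Random(seed).shuffle(frames)
    cut = int(train_frac * len(frames))
    return frames[:cut], frames[cut:]

if __name__ == "__main__":
    frames = sorted(glob.glob("front_fisheye/*.jpg"))   # hypothetical folder layout
    train, test = split_installation(frames)
    print(len(train), "train /", len(test), "test frames")
```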
5 CNN BASED OBJECT DETECTION
For object detection and classification into our 7 annotation classes, we fine-tune the YOLO [26] CNN model, which is known to give high object detection accuracy at low inference latency. In YOLO, a single neural network is applied to the whole image. The network divides the image into rectangular grid regions and predicts bounding boxes and probabilities for each region. Detections are thresholded by some probability value to keep only high scoring detections (this process is known as non-maximum suppression in the computer vision community).

We use a model of YOLO pre-trained on the MS-COCO dataset. We fine-tune it on the PASCAL VOC 2007 and the KITTI datasets. This is followed by fine-tuning on our own custom annotated datasets (Table 1 row 4). We fine-tune 5 different YOLO models, one model using exclusively the training data from a single camera installation. For four installations, this gives us four YOLO models. We fine-tune a fifth model combining the training data from all installations. These models will be referred to as YOLO1, YOLO2, YOLO3, YOLO4 and YOLO5 in subsequent sections. Since each installation gives video frames of different resolutions, these frames are resized to 416 x 416 before being fed into the model, for both fine-tuning and inference.

For the fine-tuning process, we use a Dell Precision Tower 5810 work-station, custom-fitted with an NVIDIA Quadro P5000 graphics card. For each model (except YOLO5), the weights from the 10600th fine-tuning epoch are used to evaluate inference accuracy on the test data (Table 1 row 5). For YOLO5, after fine-tuning for more than 80200 iterations, the best accuracies are obtained using weights from the 72000th iteration. Each training epoch takes about a second on this work-station, and therefore the fine-tuning process takes about 3 hours for each of the first four models and about 20 hours for YOLO5.
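The confidence thresholding and overlap suppression described above can be illustrated with a short, self-contained sketch. The box format and the 0.25/0.45 thresholds below are illustrative defaults, not values taken from the paper.

```python
# Sketch of detection post-processing: keep detections above a confidence
# threshold and greedily suppress overlapping boxes of the same class (NMS).
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(dets, score_thresh=0.25, iou_thresh=0.45):
    """dets: list of (x1, y1, x2, y2, score, class_id); returns kept detections."""
    dets = [d for d in dets if d[4] >= score_thresh]
    dets.sort(key=lambda d: d[4], reverse=True)          # highest confidence first
    kept = []
    for d in dets:
        if all(d[5] != k[5] or iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```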
5.1 Object detection output visual examples
Figure 2 shows the outputs of our detection models on a small set of example images. Each class is represented with a different colored box in the output. Before examining the accuracy of the object detection models rigorously using precision-recall numbers on the entire test dataset, these images give an idea of the excellent performance of the trained models for all four camera installation locations.

5.2 Installation specific fine-tuning
As discussed above, we fine-tune five models (four fine-tuned using training data from each of the four camera installations, and a fifth model trained with the combined training dataset). Evaluation is done individually with five test sets (one from each of the four camera installations and a fifth combined test set). We then run each of the five models on each of the five test sets, performing 25 evaluations in all.

Mean Average Precision (MAP) values for these are plotted as bars along the y-axis in Figure 3. The first two locations, front_fisheye and back_nobus, perform best with the combined model YOLO5, closely followed by YOLO1 and YOLO2 respectively, which are the models fine-tuned with training data from these two specific locations. Thus installation specific fine-tuning performs well in these locations, as does the combined model trained with all locations' training data. A similar pattern is seen for the fourth location back_busroute, where the combined model YOLO5 again gives the best results. YOLO4 (trained with this specific location's data) and YOLO2 give comparable accuracies in this location, the possible reason being that both these models use back facing frames for training.

The third location side_highway shows a distinct pattern. Here YOLO3, the model fine-tuned with this specific location, performs well, but unlike in the other locations, the combined model YOLO5 does poorly. The possible reason is this location having very distinct frames (side view with the camera at the same level as the vehicles), as well as different kinds of vehicles (mostly trucks, as this is on a highway). Combining data from other locations with frontal or back views of different vehicle types reduces model accuracy in this location. YOLO5 however does well on the combined test set, which has test frames from all locations.

The main take-away from these evaluations is the necessity of standardizing camera hardware and its mounting for traffic applications. Manually annotating video frames for each distinct installation has a huge overhead. Installation specific models, or combined models trained with annotated data from all installations, will incur that overhead. However, if installations are standardized, the trained models show promise of performing well on these vision tasks (as seen from the above 65-75% MAP values using installation specific or combined models).
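As a companion to the MAP numbers reported above, the following is a minimal sketch of how per-class Average Precision (AP) can be computed for one model on one test set; MAP is the mean of AP over the 7 classes, and repeating this for the five models on the five test sets yields the 25 evaluations plotted in Figure 3. The greedy matching rule and the IoU >= 0.5 threshold are common defaults assumed here, not values stated in the paper.

```python
# Sketch of per-class AP from matched detections; assumes pixel boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def average_precision(detections, ground_truth, iou_thresh=0.5):
    """detections: list of (frame_id, box, score) for ONE class;
    ground_truth: dict frame_id -> list of boxes of that class."""
    detections = sorted(detections, key=lambda d: d[2], reverse=True)
    used = {f: [False] * len(boxes) for f, boxes in ground_truth.items()}
    n_gt = sum(len(b) for b in ground_truth.values())
    tp = fp = 0
    precisions, recalls = [], []
    for frame_id, box, _ in detections:
        candidates = ground_truth.get(frame_id, [])
        best, best_iou = -1, iou_thresh
        for i, g in enumerate(candidates):
            overlap = iou(box, g)
            if not used[frame_id][i] and overlap >= best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            used[frame_id][best] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / max(n_gt, 1))
    # integrate the interpolated precision-recall curve
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[i:])
        prev_r = r
    return ap
```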

Figure 1: Four camera installations across Delhi-NCR. (a) Frontal view, fish-eye lens. (b) Back view, non-bus route. (c) Side view, highway. (d) Back view, bus route.

5.3 Vehicle class specific accuracy
Figure 4(a) shows the vehicle class specific accuracies using YOLO5. On close examination, high intra-class variance and a small number of training samples are found to reduce accuracy for some classes. Bicycles and e-rickshaws have huge intra-class variation. Similarly, all kinds of lorries, trucks and smaller commercial vans have been labeled as trucks in our ground truth dataset, inducing large intra-class variance. In terms of small numbers of training samples, our videos being from busy intersections and highways, pedestrians have been few in the dataset. Thus the class specific accuracy values for these classes of cycles/e-rickshaws, trucks and pedestrians have been low. To increase accuracy for each object class, careful choice of labels (to reduce intra-class variations by having separate labels for very different looking objects in the same class) and enough training data per class are recommended.

5.4 Recall vs. object distance from camera
In addition to class specific inference accuracy, we also explore the dependency of accuracy on distance from the camera. Figure 4(b) shows the distance of a ground truth object from the camera along the x-axis, as y-pixel values in the video frame. With the top left corner of the frame as co-ordinate (0,0), smaller values indicate larger distance from the camera. Thus objects near the camera are very accurately detected, while accuracy drops further away from the camera. In each camera installation (front, back or side view), the same vehicles are in the near and far field of the camera at different times. Thus application results (percentage of public vs. private transport, or catching trucks if they ply outside their allotted hours) will not change if computations are restricted to the near field of the camera, where object detection accuracy is high. This will reduce annotation overhead (by not marking anything beyond a certain distance from the camera) during training and increase accuracy (by focusing only on the near field of the camera) during inference.

In summary, multi-class object detection for non-laned heterogeneous traffic on Delhi-NCR roads is fairly accurate (65-75% MAP according to Figure 3). This inference accuracy can be improved by better choice of labels to reduce intra-class variations, increasing training samples for each class, and restricting computations to the near field of the camera (this last step increases recall to above 95% as seen in Figure 4(b)). Focusing computations on the near field of the camera also reduces object annotation overhead of far-away objects.

Manual annotation overhead can be further reduced if camera hardware and installation directions and angles are standardized across all deployments, removing the need for installation specific CNN model fine-tuning.

Figure 2: Sample images with object detection bounding box and class label outputs
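The near-field restriction and the distance-bucketed recall of Section 5.4 can both be expressed with a few lines of code. The 50-pixel bucket width matches Figure 4(b); the particular near-field cut-off value is an illustrative assumption.

```python
# Sketch: restrict counting to the near field of the camera and compute recall
# per y-pixel distance bucket, as in Figure 4(b).
from collections import defaultdict

def in_near_field(box, y_cutoff=200):
    """box = (x1, y1, x2, y2) in pixels; larger y means nearer to this camera."""
    return box[3] >= y_cutoff

def recall_by_distance(matched_flags, gt_boxes, bucket=50):
    """matched_flags[i] is True if gt_boxes[i] was detected; returns recall per y range."""
    hit, total = defaultdict(int), defaultdict(int)
    for ok, (x1, y1, x2, y2) in zip(matched_flags, gt_boxes):
        b = int(y2 // bucket) * bucket            # bucket key: 0, 50, 100, ...
        total[b] += 1
        hit[b] += int(ok)
    return {f"{b}-{b + bucket}": hit[b] / total[b] for b in sorted(total)}
```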

Figure 3: Mean Average Precision (MAP) of the five object detection models (YOLO1-YOLO5) on the five test sets (one from each of the four camera installations and a fifth combined test set), grouped by camera installation along the x-axis

Figure 4: Object detection accuracy for different object classes and different object distances from the camera. (a) Precision for each vehicle class (bus, car, auto-rickshaw, two-wheeler, bicycle/e-rickshaw, truck, pedestrian). (b) Recall for different distances from the camera, bucketed into 50-pixel y ranges (0-450); the lower the y-pixel value, the greater the distance from the camera, as the top-left corner of the frame (furthest from the camera) has co-ordinate (0,0).

Table 2: Embedded Hardware Platforms
Platform | Cost (INR) | Processor | RAM
NVIDIA Jetson TX2 | 70K | ARM Cortex-A57 (quad-core CPU) @ 2GHz + NVIDIA Denver2 (dual-core CPU) @ 2GHz + 256-core Pascal GPU @ 1300MHz | 8 GB
Raspberry PI 3B | 2.7K | Quad Core 1.2GHz Broadcom BCM2837 64bit CPU | 1 GB
Raspberry PI 3B + Intel Movidius Neural Compute Stick | 7.8K | Quad Core 1.2GHz Broadcom BCM2837 64bit CPU + Intel Movidius Vision Processing Unit | 1 GB

6 EMBEDDED CNN INFERENCE
The training and evaluation of the CNN based object detection models in Section 5 were done on a GPU server. For an on-road deployment of such a system, relying on fiber for transferring video from the road to the remote GPU server will be difficult in developing regions, as broadband connectivity across different road intersections might vary. Also, good cellular connectivity will incur recurring costs.

This motivates us to explore the possibility of in-situ computer vision on embedded platforms. If pre-trained CNN models (trained on GPU servers) are used, state of the art CNN based object detectors can run their inference stage on embedded platforms co-located with the camera infrastructure on the roads; then only the counts per object class need to be sent to the remote server for further analysis. The raw video streams need not be transmitted. In this section, we evaluate the cost, support for CNN software frameworks, latency and energy of multiple off-the-shelf embedded platforms, for different CNN based object detection inference tasks.

6.1 Embedded Platforms
Table 2 lists the embedded platforms used in our evaluations. The first platform, NVIDIA Jetson TX2, is the most powerful embedded platform available in the market, with impressive CPU and GPU support and significant memory size. Its cost however is 10-24 times that of the other two platforms evaluated. The second platform, Raspberry PI, has powerful CPU cores and moderate memory size at a very affordable price. The third platform, Intel Movidius Neural Compute Stick, is a USB stick offered by Intel, which has specialized hardware called the Vision Processing Unit (VPU). Different computations necessary in vision tasks, like convolution operations in CNNs, are implemented in hardware in this VPU. This hardware accelerator stick can be plugged into the USB port of a Raspberry PI, to create an embedded platform with boosted computer vision performance. Both Jetson TX2 and Raspberry PI run a Linux based operating system like Ubuntu, on which different CNN software frameworks can be installed to run the object detection inferences given a trained model. We will refer to these three embedded platforms as jetson, raspi and movi respectively in the subsequent discussion.

We also evaluated two Android smartphones from Motorola and Samsung, but their cost-latency trade-offs in running the inference tasks were not comparable to these three platforms. Smartphones are more generic platforms targeted towards personal use, where cost increases due to the presence of different sensors, radios and display, and also to support rich application software. The additional hardware/software is not necessary for dedicated tasks like on road traffic monitoring, hence embedded platforms with fewer features as listed in Table 2 are more suitable. We therefore omit the evaluation numbers of smartphones from this discussion.

6.2 CNN Software
The explorations in Section 5 used the YOLO object detector. Here we evaluate YOLO inference latency and energy on the embedded platforms. Additionally, we also evaluate the Mobilenet-SSD object detector, which has been reported to have similar accuracy and latency as YOLO. Just as we fine-tuned the YOLO object detection model using annotated video frames from our cameras, Mobilenet-SSD models can also be fine-tuned. To reduce the complexity of implementing every CNN from scratch, software frameworks like Tensorflow, Caffe and Pytorch are available, where functions for basic computational units like convolution, RELU, pooling etc. are already implemented. CNNs can be created by calling these base functions as required. We use Mobilenet-SSD implementations on Caffe and Tensorflow. Thus we evaluate three CNN object detection software frameworks: YOLO, Tensorflow Mobilenet-SSD and Caffe Mobilenet-SSD. We will refer to these as yolo, tf and caffe respectively in subsequent discussions. We download open source pre-trained models for these three softwares and only run the inference task on the three embedded platforms to measure latency and energy.

6.3 Evaluation
Figure 5 shows the average current drawn, with standard deviation as error bars, on the left y-axis. Inference latency is shown on the right y-axis. The x-axis denotes different object detection software and embedded hardware combinations. The Movidius stick does not support running Tensorflow models, so the combination tf-raspi-movi is missing from the x-axis.

For a given CNN software (yolo, caffe or tf), there is a trend across the hardware platforms. Jetson uses high energy at low latency, raspi uses low energy with high latency, and raspi-movi strikes an optimal balance, using low energy comparable to raspi while incurring low latency comparable to jetson. For a given hardware platform (jetson, raspi or movi), there is again a trend among the CNN softwares. Yolo and caffe incur similar latency, while caffe incurs slightly less energy than yolo. Tf uses similar energy as caffe, but incurs higher latency than both yolo and caffe, especially on raspi.

Given these trends, raspi-movi strikes a good balance of energy and latency in terms of hardware platform choice, at a moderate cost of 7.8K INR. This is significantly cheaper than the 70K INR jetson, while the jetson incurs similar latency and higher energy. In terms of software, both yolo and caffe Mobilenet-SSD are comparable in terms of latency and energy, while tf has higher latency especially on raspi and is currently not supported on raspi-movi.

While we evaluate and compare the different hardware platforms and software frameworks on the basis of latency and energy, depending on the application scenario, one or both of these performance metrics might not be crucial. E.g. for an on road deployment, power is generally available from the lamp-posts or the traffic signals where the embedded computing units are deployed. For short term pilot deployments, not interfering with the road infrastructure and using battery supported units makes sense, and energy efficiency will be important only in such scenarios. Similarly, if the goal of counting and classifying vehicles is to plan infrastructure or evaluate a transport policy like odd-even, low latency is not a necessity. Processing can be done at a slower rate, as no real time decisions will be taken based on the computer vision outputs. On the other hand, if the outputs are needed for real-time traffic rule violation detections and giving challans, then low latency becomes important. Thus the cost-latency-energy trade-offs of hardware-software combinations should always be considered in conjunction with the envisioned application requirements. More important than the actual latency-energy values in our evaluation is the take-away that state of the art CNN based object detectors can run on embedded platforms, thereby making in-situ processing of video frames feasible without depending on broadband connectivity.
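The per-frame latency numbers above can be reproduced with a simple timing harness. The sketch below uses OpenCV's DNN module with a Caffe MobileNet-SSD model (one of the three software options evaluated); the model file names and sample frame are placeholders, and energy is not measured here since the paper's current-draw measurements require external instrumentation rather than code.

```python
# Sketch of a per-frame inference latency measurement on an embedded board.
import time
import cv2

# Placeholder model files for a Caffe MobileNet-SSD; substitute real paths.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

def time_inference(frame, runs=20):
    """Average single-frame object detection latency in milliseconds."""
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=0.007843, size=(300, 300), mean=127.5)
    latencies = []
    for _ in range(runs):
        net.setInput(blob)
        t0 = time.time()
        net.forward()                      # one object-detection inference pass
        latencies.append((time.time() - t0) * 1000.0)
    return sum(latencies) / len(latencies)

if __name__ == "__main__":
    frame = cv2.imread("sample_frame.jpg")   # placeholder test frame
    if frame is None:
        raise SystemExit("provide a sample video frame")
    print("average latency (ms):", time_inference(frame))
```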


Figure 5: Inference latency (in msecs) and energy (average current drawn, in mA) of object detection DNN software on embedded hardware platforms

7 POTENTIAL APPLICATIONS
We discussed many applications of classified vehicle counts in Section 2. Here we give a small example of how these applications can actually benefit from our system output. Table 3 shows the percentages of different object classes in our dataset (excluding the highway dataset from side_highway). The numbers show a high dependency on private vehicles like cars and two-wheelers. This is the primary reason for increased traffic congestion and one of the potential factors in air quality degradation in Delhi-NCR. These values should be monitored when policies like odd-even are enforced to reduce congestion and air pollution, to check if public to private vehicle ratios improve, and whether absolute numbers of public vehicles like buses increase and cars come down.

Table 3: Percentages of different vehicle classes in our dataset
Object class | Percentage on back_busroute | Percentage on front_fisheye and back_nobus
Bus | 3.03151 | 2.77197
Car | 52.3472 | 61.6214
Auto-rickshaw | 10.3994 | 8.42592
Two-wheeler | 26.5118 | 18.3368
Truck | 3.66009 | 4.47696
Pedestrian | 1.5993 | 2.23298
Cycle/E-rickshaw | 2.45067 | 2.13398

The absolute counts of different vehicles are also useful to estimate the Passenger Count Unit (PCU) [14], by multiplying the number of vehicles with the number of passengers each can carry. The PCU numbers will make infrastructure (signalized intersections, fly-overs, underpasses etc.) planning data-driven. The number of pedestrians can help in further planning of infrastructure like foot-bridges and side-walks. We will collaborate closely with the urban transport authorities to share our models for more extensive analyses across roads.
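As a worked illustration of the PCU estimate described above, the sketch below multiplies hourly classified counts by a per-class passenger factor. Both the factor values and the example counts are purely illustrative placeholders, not numbers from the paper or from [14].

```python
# Sketch of the PCU/hour estimate from Section 7: classified counts x passengers per class.
# All numeric values below are made-up placeholders for illustration only.
PASSENGERS_PER_VEHICLE = {"bus": 40, "car": 4, "autorickshaw": 3, "twowheeler": 2,
                          "truck": 2, "pedestrian": 1, "cycle_erickshaw": 2}

def pcu_per_hour(classified_counts):
    """classified_counts: dict class -> vehicles observed per hour at one camera."""
    return sum(PASSENGERS_PER_VEHICLE[c] * n for c, n in classified_counts.items())

print(pcu_per_hour({"car": 1200, "twowheeler": 800, "bus": 60, "autorickshaw": 250,
                    "truck": 40, "pedestrian": 300, "cycle_erickshaw": 90}))
```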
8 CONCLUSION AND FUTURE WORK
In this paper, we collect and annotate an extensive image dataset across four roads in Delhi-NCR. In future, we will enhance this labeled dataset with video frames from other Indian cities like Bengaluru and Mumbai, and share these datasets and models with computer vision researchers and urban transport authorities. We achieve significant accuracy in classified object counts using state of the art CNN models on non-laned heterogeneous traffic images. Performance on embedded platforms has also been shown to be practical in terms of latency and energy. Together, these form a very promising step towards building stronger collaborations with the traffic authorities, for scalable deployments of smart cameras and application design using our classified counts.

Such close collaboration is very important to understand the gap between Information and Communication Technology (ICT) and Development (D). Considering D, it is clear that traffic congestion and road accidents have a huge economic impact on developing economies. There is no paucity of analysis highlighting the economic costs [1-5]. On the ICT side, papers like this and others show the promise that traffic measurement, management and rule violation detection can be automated. How to bridge the gap between ICT and D, designing appropriate business models for such automation infrastructure, needs to be explored. In this context, the tension between better efficiency through automation and livelihood loss is a burning question in developing regions. Whether manual data annotation to train automation models as used in this paper, manufacturing and packaging of embedded sensors, and occasional human supervision of the automated systems can balance the job requirements of the current traffic police force needs to be investigated. The authors' anecdotal discussions with the Delhi Traffic Police Commissioner suggest that they acknowledge the impossibility of manual management of continuously growing urban systems. The administration is actively looking to augment manual monitoring systems with automation, but high accuracy at low deployment and maintenance cost is needed. As immediate future work using the models trained in this paper, the authors are deploying a pilot network of connected intersection control cameras in Noida. The findings on installation and maintenance costs, and the effectiveness of the system in traffic measurement, management, and rule violation detection and penalization, will be shared with the traffic authorities.

Another follow-up work that will use the vehicle counting and classification models built in this paper is a project of the authors with the Delhi Integrated Multi Modal Transit System Limited (DIMTS). In this project, up to 300 DIMTS buses in Delhi-NCR will be instrumented with embedded sensors. GPS will measure the location of the bus, Particulate Matter sensors will measure PM 2.5 and PM 10 values, an accelerometer will measure the motion state of the bus as PM values captured in motion are noisy, temperature and humidity sensor values will be used to correct the PM readings, and finally a camera will be used to take pictures of road traffic and count and classify vehicles. These values will be collected across the city as the buses travel along their daily routes, stored locally and communicated via cellular radio to a central server. The classified vehicle counts will be correlated with PM values, to empirically measure the potential and actual impact of policies like odd-even rules on air pollution. This project has been recommended under the DST-SERB IMPRINT II grant scheme, and will be undertaken as a joint project by IIT and IIIT Delhi and IIT Kanpur.

Finally, advocating for fewer cars to reduce congestion and air pollution is easy, but navigating the city might be difficult for citizens if public transport facilities are not enhanced to match demand. Using the Google Maps and Uber APIs, and also DIMTS bus mobile apps, the authors are trying to quantify the quality of public transport in Delhi-NCR using different metrics. Availability is the time to reach a public transport facility. Affordability is the cost of travel. Convenience captures different aspects of travel like the number of breaks and total travel time in a trip, the level of unpredictability in getting transport options, etc. An automated set of tools to evaluate these metrics of public transport at city scale is being developed. While on-road camera based vehicle counting and classification gives one picture of road usage, web API based analyses give complementary information that together can give a coherent picture of the transport situation in the city.

Thus this paper is an intermediate point between a solid body of prior work [27-31, 33, 34] and an active line of future work by the authors to understand, quantify and hopefully better manage transportation in developing regions, with Delhi-NCR as a use case.

REFERENCES
[1] 2016. Accidents on India's deadly roads cost the economy over $8 billion every year. (2016). https://qz.com/india/689860/accidents-on-indias-deadly-roads-cost-the-economy-over-8-billion-every-year/.
[2] 2018. 400 deaths a day are forcing India to take car safety seriously. (2018). https://economictimes.indiatimes.com/news/politics-and-nation/400-deaths-a-day-are-forcing-india-to-take-car-safety-seriously/articleshow/62439700.cms.
[3] 2018. Metro Cities Waste Rs 1.5 Lakh Crore Per Year Due To Traffic Congestion, Kolkata Tops The List. (2018). https://www.indiatimes.com/news/india/traffic-congestion-costs-four-major-indian-cities-rs-1-5-lakh-crore-a-year-344216.html.
[4] 2018. Traffic congestion costs four major Indian cities Rs 1.5 lakh crore a year. (2018). https://timesofindia.indiatimes.com/india/traffic-congestion-costs-four-major-indian-cities-rs-1-5-lakh-crore-a-year/articleshow/63918040.cms.
[5] 2018. Traffic jam in 4 metros costs more than entire Rail budget; this city has worst peak-hour congestion. (2018). https://www.financialexpress.com/economy/traffic-jam-in-4-metros-costs-more-than-entire-rail-budget-this-city-has-worst-peak-hour-congestion/1146356/.
[6] 2019. BBox labeling tool. (2019). https://github.com/puzzledqs/BBox-Label-Tool.
[7] 2019. CITYSCAPES dataset. (2019). https://www.cityscapes-dataset.com/.
[8] 2019. Congestion Charge. (2019). http://www.politics.co.uk/reference/congestion-charge.
[9] 2019. Delhi Integrated Multimodal Transport System (DIMTS). (2019). https://www.dimts.in/.
[10] 2019. Delhiites go the rent-a-car way to beat the odds. (2019). https://timesofindia.indiatimes.com/city/delhi/Delhiites-go-the-rent-a-car-way-to-beat-the-odds/articleshow/51897118.cms.
[11] 2019. KITTI dataset. (2019). http://www.cvlibs.net/datasets/kitti/eval_object.php.
[12] 2019. Odd-even II: More people buying second car to get around odd-even. (2019). https://indianexpress.com/article/cities/delhi/odd-even-ii-more-people-buying-second-car-to-get-around-odd-even-2761390/.
[13] 2019. Paris Implements Odd-Even Rule to Tackle Smog. (2019). https://thewire.in/world/paris-odd-even-smog.
[14] 2019. Passenger Count Unit (PCU). (2019). https://en.wikipedia.org/wiki/Passenger_car_equivalent.
[15] 2019. People buying second car to cope up with the odd-even plan. (2019). https://www.indiatoday.in/mail-today/story/people-buying-second-car-to-cope-up-with-the-odd-even-plan-301706-2016-01-02.
[16] 2019. Relief for Palam and Cantt as Delhi Metro's Magenta line opens this month. (2019). https://www.hindustantimes.com/delhi-news/relief-for-palam-and-cantt-as-delhi-metro-s-magenta-line-opens-this-month/story-v3VnrbECxlmKjf1L469dbL.html.
[17] 2019. Steel Flyover issue: Is Bengaluru's Civil Society scoring self goals? (2019). http://bengaluru.citizenmatters.in/bengaluru-steel-flyover-advantages-8720.
[18] 2019. TRANCOS dataset. (2019). http://agamenon.tsc.uah.es/Personales/rlopez/data/trancos/.
[19] 2019. Vehant Technologies Pvt. Ltd. (2019). https://www.kritikalsecurescan.com/.
[20] 2019. Why Bengaluru residents pushed govt to back off on VIP steel flyover. (2019). https://www.hindustantimes.com/bengaluru/why-bengaluru-residents-pushed-govt-to-back-off-on-vip-steel-flyover/story-d4YrcNNlFt83caWChGC3PL.html.
[21] Anton Lokhmotov, Nikolay Chunosov, Flavio Vella, and Grigori Fursin. 2018. Multi-objective Autotuning of MobileNets Across the Full Software/Hardware Stack. In Proceedings of the 1st Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning (ReQuEST '18).
[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
[23] TERI: The Energy and Resources Institute. 2018. Measures to Control Air Pollution in Urban Centres of India: Policy and Institutional framework. In TERI Policy Brief.
[24] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision 111, 1 (Jan. 2015), 98-136.
[25] Jussi Hanhirova, Teemu Kamarainen, Sipi Seppala, Vesa Hirvisalo, and Antti Yla-Jaaski. 2018. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision. In ACM Multimedia Systems Conference.
[26] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Rijurekha Sen, Andrew Cross, Aditya Vashishtha, Venkata N. Padmanabhan, Edward Cutrell, and William Thies. 2013. Accurate Speed and Density Measurement for Road Traffic in India. In Proceedings of the 3rd Annual Symposium on Computing for Development.
[28] Rijurekha Sen, Bhaskaran Raman, and Prashima Sharma. 2010. Horn-Ok-Please. In Proceedings of the 8th Annual International Conference on Mobile Systems, Applications and Services (Mobisys).
[29] Rijurekha Sen, Abhinav Maurya, Bhaskaran Raman, Amarjeet Singh, Rupesh Mehta, and Ramakrishnan Kalyanaraman. 2015. Road-RFSense: A Practical RF-Sensing Based Road Traffic Estimation System for Developing Regions. In Transactions on Sensor Networks.

[30] Rijurekha Sen, Abhinav Maurya, Bhaskaran Raman, Rupesh Mehta, Ramakrishnan Kalyanaraman, Nagamanoj Vankadhara, Swaroop Roy, and Prashima Sharma. 2012. KyunQueue: A Sensor Network System To Monitor Road Traffic Queues. In Proceedings of the 10th ACM Conference on Embedded Networked Sensor Systems (Sensys).
[31] Rijurekha Sen, Pankaj Siriah, and Bhaskaran Raman. 2011. RoadSoundSense: Acoustic Sensing based Road Congestion Monitoring in Developing Regions. In Proceedings of the 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON).
[32] Rounaq Basu, Aparup Khatua, Arnab Jana, and Saptarshi Ghosh. 2017. Harnessing Twitter Data for Analyzing Public Reactions to Transportation Policies: Evidences from the Odd-Even Policy in Delhi, India. In Proceedings of the Eastern Asia Society For Transportation Studies (EASTS).
[33] Rijurekha Sen and Daniele Quercia. 2018. World Wide Spatial Capital. In PLOS ONE.
[34] Shilpa Garg, Pushpendra Singh, Parameswaran Ramanathan, and Rijurekha Sen. 2014. VividhaVahana: smartphone based vehicle classification and its applications in developing region. In Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services.
[35] Vipin Jain, Aditya Dhananjay, Ashlesh Sharma, and Lakshminarayanan Subramanian. 2012. Traffic Density Estimation from Highly Noisy Image Sources. In Transportation Research Board Annual Summit.
[36] Vipin Jain, Ashlesh Sharma, and Lakshminarayanan Subramanian. 2012. Road Traffic Congestion in the Developing World. In Proceedings of the 2nd ACM Symposium on Computing for Development (DEV).
[37] Roger Westerholm and Karl-Erik Egeback. 1992. Exhaust Emissions from Light- and Heavy-duty Vehicles: Chemical Composition, Impact of Exhaust after Treatment, and Fuel Parameters. In Symposium on Risk Assessment of Urban Air: Emissions, Exposure, Risk Identification and Risk Quantification.