
Article

Real-Time Vehicle Make and Model Recognition with the Residual SqueezeNet Architecture

Hyo Jong Lee 1,* , Ihsan Ullah 2, Weiguo Wan 1, Yongbin Gao 3 and Zhijun Fang 3

1 Division of Computer Science and Engineering, CAIIT, Chonbuk National University, Jeonju 54896, Korea; [email protected]
2 Department of Robotics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Daegu 42988, Korea; [email protected]
3 School of Electrics and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; [email protected] (Y.G.); [email protected] (Z.F.)
* Correspondence: [email protected]; Tel.: +82-63-270-2407

 Received: 20 December 2018; Accepted: 21 February 2019; Published: 26 February 2019 

Abstract: Make and model recognition (MMR) of vehicles plays an important role in automatic vision-based systems. This paper proposes a novel approach for MMR using the SqueezeNet architecture. The frontal views of vehicle images are first extracted and fed into a deep network for training and testing. The SqueezeNet architecture with bypass connections between the Fire modules, a variant of the vanilla SqueezeNet, is employed for this study, which makes our MMR system more efficient. The experimental results on our collected large-scale vehicle datasets indicate that the proposed model achieves a 96.3% recognition rate at the rank-1 level with an economical time slice of 108.8 ms. For inference tasks, the deployed deep model requires less than 5 MB of space and thus has great viability in real-time applications.

Keywords: vehicle make recognition; deep learning; residual SqueezeNet

1. Introduction

In recent years, a plethora of innovative technologies and solutions are bringing intelligent transportation systems (ITSs) closer to reality. ITSs are advanced transportation systems that aim to advance and automate the operation and management of transport systems, thereby improving the efficiency and safety of transport. ITSs combine cutting-edge technology such as electronic control and communications with means and facilities of transportation [1]. The development of digital image processing techniques offers many advantages in enabling many important ITSs applications and components such as advanced driver-assistance systems (ADASs), automated vehicular surveillance (AVS), traffic and activity monitoring, traffic behavior analysis, and traffic management, among others. Vehicle make and model recognition (MMR) is of great interest in these applications, owing to heightened security concerns in ITSs. Over the years, numerous studies have been conducted to solve different challenges in vehicle detection, identification, and tracking. However, classification of vehicles into fine categories has gained attention only recently, and many challenges remain to be addressed [2–5]. Traditional vehicle MMR systems rely on manual human observation or on the automated license plate recognition (ALPR) technique; these indirect processes make it hard for vehicle MMR systems to meet real-time constraints. Through manual observation, it is practically difficult to remember and efficiently distinguish between the wide variety of vehicle makes and models; it becomes a laborious and time-consuming task for a human observer to monitor and observe the multitude of screens and record the incoming or outgoing makes and models or to even spot the make and model being looked


for. The MMR systems that rely on license plates may suffer from forging, damage, occlusion, etc., as shown in Figure 1. In addition, there are some license plates that can be ambiguous in terms of interpreting letters (e.g., between "0" and "O") or license types. Moreover, in some areas, it may not be required to bear the license plate at the front or the rear. If the ALPR system is not equipped to check for license plates at both (front and rear) ends of the vehicle, it could fail. Consequently, when ALPR systems fail to correctly read the detected license plates due to the above issues, the wrong make-model information could be retrieved from the license-plate registry or database.

(a) Ambiguous (b) Forged (c) Damaged (d) Duplicated

Figure 1. Some cases that cause failure of the license plate recognition-based MMR systems.

To overcome the aforementioned shortcomings in traditional vehicle identification systems, automated vehicle MMR techniques have become critical. The make and model of the vehicle recognized by the MMR system can be cross-checked with the license-plate registry to check for any fraud. In this dual secure way, vision-based automated MMR techniques can augment traditional ALPR-based vehicle classification systems to further enhance security. Abdel Maseeh et al. [6] proposed an approach to address this specific problem by combining global and local information and utilizing discriminative information labelled by a human expert. They validated their approach through experiments on recognizing the make and model of sedan cars from single view images. Jang and Turk [7] demonstrated a car recognition application based on the SURF feature descriptor algorithm, which fuses bag-of-words and structural verification techniques. Baran et al. [8] presented a smart camera in intelligent transportation systems for the surveillance of vehicles. The smart camera can be used for MMR, ALPR, and color recognition of vehicles. In [9], Santos et al. introduced two car recognition methods, both relying on the analysis of the external features of the car. The first method evaluates the shape of the car's rear, exploring its dimensions and edges, while the second considers features computed from the car's rear lights, notably, their orientation, eccentricity, position, angle to the car license plate, and shape contour. Both methods are combined in the proposed automatic car recognition system. Ren et al. [10] proposed a framework to detect a moving vehicle's make and model using convolutional neural networks (CNNs). Dehghan et al. [11] proposed a CNN-based vehicle make, model, and color recognition system, which is computationally inexpensive and provides state-of-the-art results. Huang et al. [12] investigated fine-grained vehicle recognition using deep CNNs. They localized the vehicle and the corresponding parts with the help of region-based CNNs (RCNNs) and aggregated their features from a set of pre-trained CNNs to train a support vector machine (SVM) classifier.
Gao and Lee [13] proposed a local tiled CNN (LTCNN) strategy to alter the weight-sharing scheme of CNN with a local tiled structure, which can provide translational, rotational, and scale invariance for MMR.

In this study, we focus on addressing the challenges in real-time and automated vehicle MMR by utilizing state-of-the-art deep learning-based techniques. Vanilla CNNs have demonstrated impressive performance in vehicle MMR-related tasks. However, long training periods and large memory requirements in deployment environments often constrain the use of such models in real-time applications as well as distributed environments. Thus, in this study, a smaller CNN architecture named SqueezeNet is adopted, which requires much fewer parameters, consequently reducing memory constraints and making it suitable for real-time applications. In addition, compared to vanilla SqueezeNet, the proposed method, using a bypass connection-inspired version of SqueezeNet, demonstrates a 2.1% improvement in recognition accuracy. Further, we demonstrate the effectiveness of a data-clustering approach used in our study that considerably improves the speed of the data preparation and labelling process. The experiment results show that the proposed approach obtains higher recognition accuracy in less time and with fewer memory constraints.

2. Background

2.1. General Architecture of Vehicle MMR System

The problem of automated vehicle classification into makes and models is an important task for AVS and other ITSs applications. The general architecture of the vehicle MMR system can be depicted as shown in Figure 2. Most studies first adopt a vehicle detection step that produces regions of interest (ROIs) containing the vehicles' faces (front), segmented from the background. The vehicle MMR systems then work on these ROIs [14].

Figure 2. Flowchart of the typical MMR system.

The traditional vehicle detection methods can be mainly divided into three categories [15]. One is the edge feature-based method, which first detects possible vehicle candidates from input images, then employs appropriate algorithms for verification [16]. Another method is the color-based method, which utilizes the large variations in vehicle colors [17]. The third commonly-used vehicle detection method involves training a robust vehicle detector through active learning tools such as AdaBoost [18]. Nowadays, CNN-based vehicle detection methods have been widely applied, such as Fast R-CNN [19], Faster R-CNN [20], and YOLO [21]. As for vehicle recognition, the traditional method is to extract vehicle features such as Haar and SIFT features. Then, classifiers such as SVM and K-nearest neighbors are learned to classify the vehicle into different models. Because of the variations in visual appearance and the confusion between certain vehicle types, developing a highly accurate vehicle MMR system is still very challenging using the traditional methods.

2.2. Convolutional Neural Networks
Recently, deep networks are increasingly being used to extract discriminative features for vehicle MMR. The deep network concept has been around since 1980, with similar ideas including neural networks and backpropagation. The resurgence of interest in deep networks has been brought about by the breakthrough in Restricted Boltzmann Machines (RBM) from Hinton [22]. In the deep networks, there are massive parameters which require large-scale datasets for training. The CNNs are usually adopted to reduce the number of parameters. In addition, many advanced techniques such as dropout, maxout, and max-pooling have been coupled into the CNN structure. Figure 3 shows one convolutional layer. By going deeper into the convolutional networks, CNNs dictate the performance of various applications such as AlexNet [23] and GoogLeNet [24].

Figure 3. Illustration of one convolutional layer.



Figure 3 illustrates the architecture of one convolutional layer, in which the bottom layer includes several N × N inputs and the top layer includes several M × M convolutional outputs. The convolutional layer conducts convolution operations across the input maps with a K × K filter for each map, resulting in an (N − K + 1) × (N − K + 1) feature map. Here, M = N − K + 1.
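For illustration, the following minimal sketch (not part of the original implementation; sizes are placeholders) shows the output-size relation M = N − K + 1 for a single "valid" convolution with stride 1 and no padding:

```python
# Minimal sketch (assumed): a single "valid" 2-D convolution over an N x N input
# with a K x K filter, illustrating the output size M = N - K + 1.
import numpy as np

def conv2d_valid(x, k):
    """Convolve an N x N input with a K x K filter, no padding, stride 1."""
    n, kk = x.shape[0], k.shape[0]
    m = n - kk + 1                        # output size M = N - K + 1
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = np.sum(x[i:i + kk, j:j + kk] * k)
    return out

x = np.random.rand(227, 227)              # N = 227, the input size used later in the paper
k = np.random.rand(3, 3)                  # K = 3
print(conv2d_valid(x, k).shape)            # (225, 225), i.e., M = 227 - 3 + 1
```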

2.3. SqueezeNet

The SqueezeNet [25] is a smaller CNN architecture that uses fewer parameters while maintaining competitive accuracy. Several strategies are employed on the CNN basis to design the SqueezeNet: (1) replace 3 × 3 filters with 1 × 1 filters, (2) decrease the number of input channels to 3 × 3 filters, (3) downsample late in the network so that the convolution layers have large activation maps. The SqueezeNet is comprised mainly of Fire modules, which are squeeze convolution layers with only 1 × 1 filters. These layers are then fed into an expand layer, which has a mix of 1 × 1 and 3 × 3 convolution filters, as shown in Figure 4.
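A minimal sketch of a Fire module is given below. It is an illustrative implementation only (the paper's experiments use Caffe/DIGITS; PyTorch and the channel sizes are assumptions here), showing the 1 × 1 squeeze layer feeding an expand layer that mixes 1 × 1 and 3 × 3 filters:

```python
# Minimal sketch (assumed; PyTorch used only for illustration) of a Fire module:
# a 1x1 "squeeze" convolution followed by an "expand" stage mixing 1x1 and 3x3
# filters whose outputs are concatenated along the channel axis, as in Figure 4.
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))                       # squeeze: only 1x1 filters
        return torch.cat([self.relu(self.expand1x1(x)),      # expand: 1x1 filters
                          self.relu(self.expand3x3(x))], 1)  # expand: 3x3 filters

x = torch.randn(1, 96, 55, 55)            # hypothetical activation map
print(Fire(96, 16, 64, 64)(x).shape)       # torch.Size([1, 128, 55, 55])
```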


Figure 4. Micro-architectural view: Convolution filters organization in the Fire modules.

3. Vehicle Data Clustering and Labelling

Deep learning typically benefits from large amounts of data. We create a dataset pool designed specifically for the task of recognition of the vehicle make and model. The data are collected from vehicles currently driven in Korea, including Korean-manufactured and commonly imported vehicles, for the proposed network. In the following sections, data clustering and labeling are discussed in detail.

3.1. Dataset Clustering

In order to train deep networks, a large-scale dataset is required, the labelling of which is difficult and tedious. Thus, a cluster method is required for speeding up the labelling process. The appearance of a vehicle will change under varying environmental conditions and also based on market requirements. This makes vehicle model recognition a challenging task.

A cluster algorithm aims to differentiate images into groups so that the distance between the different groups is highest. The K-means algorithm is most efficient and suitable for large-scale datasets. However, if we apply K-means directly on the raw images, the accuracy is compromised because the raw images are not discriminative. This problem will result in burdensome tasks to manually relabel the incorrect vehicle data. In addition, the computational time of this direct clustering strategy was high because the raw images were of large size. In this study, a cluster method that incorporates deep learning is adopted to assist faster labeling of a large-scale vehicle make and model dataset. The cluster method can automatically divide the images into groups and each group is divided into classes.
The framework for data clustering is shown in Figure 5 and the detailed steps are as follows:


(1) The vehicles are first detected based on frame difference and a symmetrical filter. The frame difference method is applied to images by shifting one image with moderate pixels to generate another image. The difference between these two images is used to detect the vehicle by a symmetrical filter, which makes use of the symmetrical structure of the vehicles.
(2) Then, the discriminative and simple features are extracted based on deep learning before using the K-means algorithm. To use deep learning for feature extraction, we introduce a third-party dataset, which may have a lower number of images and car models. The third-party dataset named "compcars" [26] is labelled and used for training the deep network. The trained model is used to extract features of the unlabeled vehicles. However, the features are still high-dimensional for adopting the K-means algorithm. Thus, we use principal component analysis (PCA) to reduce the dimensions.
(3) Finally, the K-means algorithm is used to cluster the data and assign a group to each data point. The manual correction of the wrongly clustered data is performed to get a new dataset. This dataset is added to the third-party dataset to train a more powerful deep network. In this sense, the labelling is performed iteratively. For every iteration, we introduce 100,000 images as incremental data. A minimal sketch of the feature-reduction and clustering steps is given after this list.
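The sketch below illustrates steps (2) and (3) above. It is an assumed illustration, not the authors' code: the feature array, the number of PCA components, and the number of clusters are placeholders (the paper does not state them), and the features stand in for CNN features extracted by the model trained on the labelled "compcars" images.

```python
# Minimal sketch (assumed): deep features of unlabeled vehicle crops are reduced
# with PCA and then grouped with K-means; the cluster assignments are corrected
# manually afterwards, as described in step (3).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

features = np.random.rand(5000, 4096)          # stand-in for CNN features of unlabeled crops

reduced = PCA(n_components=64).fit_transform(features)              # dimensionality reduction
groups = KMeans(n_clusters=100, n_init=10, random_state=0).fit_predict(reduced)
print(groups.shape)                             # one cluster index per image
```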

Figure 5. Framework for the proposed car detection and clustering method.

Our study focuses on the surveillance vehicle data while the "compcars" dataset contains only 44,481 images of 281 different models, which is not enough for practical use. Thus, we collect 291,602 images from a surveillance camera that is set up on an actual street. The data collection time spans one year, and we divide the images into two sets: set-1 (145,800 images) and set-2 (145,802 images).

We first use the "compcars" images to train a deep model and then extract features for set-1. After clustering this set, we use these images to train a new model. The new model is used to cluster set-2. Figure 6 shows some examples of clustered images. These images contain many kinds of variations such as rainy/sunny, daytime/nighttime, illumination variations, and missing parts due to car detection failure. As a result, a large-scale dataset containing 291,602 images has been labelled, which consists of 766 models. The accuracy of the clustering of set-2 reached 85%, which means that the second iteration of the clustering reduced the manual labelling work by 85%. Thus, our clustering sped up the labelling work significantly.

Figure 6. Examples of clustered images.

3.2. Dataset Labelling

As mentioned earlier, we have labelled 766 models for the 291,602 vehicle images collected. We labelled each vehicle with its respective model name, make, and type with its respective class identification. Some of the dataset labels are shown in Table 1.

Table 1. Examples of labels of make, model, and type with their respective class IDs.

ID  Make        Model                 Type
1   Hyundai     Grand-Starex          MiniBus
2   Hyundai     Starex-2006-Model     MiniBus
3   Chevrolet   25 Tons-Cargo-Truck   Truck
4   Kia         Sorento               SumoCar
5   Hyundai     The-Luxury-Grandeur   Sedan
6   Kia         Grand-Carnival        SumoCar
7   Volkswagen  New-CC                Sedan
8   Samsung     QM3                   Sedan
9   Hyundai     Porter-2              MiniTruck
10  Hyundai     Avante-Hybrid         Sedan

4. The Proposed MMR System

4.1. Residual SqueezeNet Architecture

After vehicle detection and clustering, we train a deep neural network capable of recognizing 766 models of different companies. In this study, a smaller CNN called SqueezeNet [25] is employed, which can achieve performance comparable with other CNN frameworks such as AlexNet while requiring fewer parameters, which is practical in a real-time scenario.

As shown in Figure 7a, the basic SqueezeNet starts with a convolution layer, followed by eight Fire modules, ending with another convolution layer. The number of filters per Fire module is increased gradually from the beginning to the end of the network. Max-pooling with a stride of two is performed after layers conv1, Fire4, Fire8, and conv10. ReLU is adopted as the activation function and Dropout with a ratio of 0.5 is used after the Fire9 module.

To improve the recognition accuracy, a modified SqueezeNet is designed by adding some simple bypass connections to the SqueezeNet between some Fire modules. In our simple bypass architecture, bypass connections are added around Fire modules 3, 5, 7, and 9, requiring these modules to learn a residual function between input and output, as shown in Figure 7b. As in ResNet, to implement a bypass connection around Fire3, we set the input to Fire4 equal to the output of Fire2 + the output of Fire3, where the + operator is an element-wise addition, as shown in Figure 8. This changes the regularization applied to the parameters of these Fire modules and, as per ResNet, can improve the final accuracy and/or trainability of the full model.
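The following minimal sketch (assumed; PyTorch and the channel sizes are used only for illustration, with plain 1 × 1 convolutions standing in for the Fire modules) shows how the simple bypass around Fire3 works: the input to Fire4 is the element-wise sum of the Fire2 and Fire3 outputs, so Fire3 learns a residual function.

```python
# Minimal sketch (assumed): the simple bypass of the Residual SqueezeNet. Plain conv
# layers stand in for Fire2/Fire3/Fire4; the key point is the element-wise addition
# of the Fire2 and Fire3 outputs before Fire4.
import torch
import torch.nn as nn

fire2 = nn.Conv2d(96, 128, kernel_size=1)    # stand-in for Fire2 (96 -> 128 channels)
fire3 = nn.Conv2d(128, 128, kernel_size=1)   # stand-in for Fire3 (keeps 128 channels)
fire4 = nn.Conv2d(128, 256, kernel_size=1)   # stand-in for Fire4 (128 -> 256 channels)

x = torch.randn(1, 96, 55, 55)               # hypothetical activation map
f2 = fire2(x)
f3 = fire3(f2)
out = fire4(f2 + f3)                         # element-wise addition implements the bypass
print(out.shape)                             # torch.Size([1, 256, 55, 55])
```

Note that the addition requires Fire2 and Fire3 to produce the same number of output channels, which is why the simple bypass is only placed around Fire modules whose input and output sizes match.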


Figure 7. Illustration of the SqueezeNet-based MMR system. (a) SqueezeNet architecture. (b) Our proposed Residual SqueezeNet architecture.

Figure 8. Proposed residual SqueezeNet architecture with simple bypass connections.

During the training process, the weight parameters are randomly initialized with a Gaussian distribution. The learning rate is initially set as 0.01 and decreases with a step size of 10. In addition, RMSprop is applied as the optimizer.

4.2. Hardware and Software

We used a GeForce GTX 1080 in our setup to benefit from faster training times in deep learning frameworks with the support of Cuda and cuDNN. The CPU used is an Intel(R) Core(TM) i7-4790 CPU with eight cores operating at 3.60 GHz.

Nvidia DIGITS is used as a second framework that works on top of Caffe, or more precisely, a separate Caffe fork by Nvidia of the official repository. For training, DIGITS provides functionality for on-the-fly data augmentation using random crops with a fixed size, e.g., randomly taking 227 × 227 regions from a 256 × 256 image in our case, and random horizontal flipping. Furthermore, DIGITS visualizes the learning rate decay as shown in Figure 9. Visualizations of training loss, validation loss, and validation accuracy in an interactive, constantly updating graph make it easy to see whether or not the training is going well. A disadvantage is that, because DIGITS works on a fork of Caffe, new developments of the Caffe repository are merged into it only when the developers of DIGITS decide to do so.
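The step-function learning rate decay visualized in Figure 9a (base rate 0.01, divided by 10 after one-third and again after two-thirds of training) can be written as follows. This is an assumed sketch of the schedule only, not the actual DIGITS/Caffe solver configuration:

```python
# Minimal sketch (assumed): the 3-step learning-rate schedule shown in Figure 9a.
def step_decay_lr(epoch, total_epochs=30, base_lr=0.01):
    """Return the learning rate for a given epoch under a step-function decay."""
    if epoch < total_epochs / 3:
        return base_lr             # first third: 0.01
    elif epoch < 2 * total_epochs / 3:
        return base_lr / 10        # second third: 0.001
    return base_lr / 100           # final third: 0.0001

print([step_decay_lr(e) for e in (0, 10, 20, 29)])   # [0.01, 0.001, 0.0001, 0.0001]
```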

Figure 9. (a) Learning rate decay is visualized over training epochs. Here, a step function decay is used, and the learning rate is divided by 10 after one-third and two-thirds of training. (b) Training loss, validation loss, and accuracy are plotted over training epochs.

5. Performance Evaluation

5.1. Dataset

Data from real streets using cameras are collected from seven urban areas in Korea and labelled to create our own dataset of vehicles, yielding over 291,602 labelled vehicle images for vehicle recognition. The labelled images are cropped automatically for classification by using the symmetrical filter and split into training, validation, and test sets to train and evaluate the neural networks. For the evaluation, we use the rule of thumb according to the Pareto Principle and select 20% as the test set and split the remainder again into an 80% training set and a 20% validation set. A subset of our cropped vehicle dataset which includes all the 766 classes and labels will be available at: https://sites.google.com/view/jbnu-selab.
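The split described above (20% test, then 80%/20% training/validation on the remainder) can be sketched as follows. This is an assumed illustration, not the authors' data pipeline; the file names and labels are placeholders:

```python
# Minimal sketch (assumed): hold out 20% as test, then split the rest 80/20 into
# training and validation, stratified by class.
from sklearn.model_selection import train_test_split

paths = [f"crop_{i:06d}.jpg" for i in range(1000)]    # placeholder image file names
labels = [i % 10 for i in range(1000)]                # placeholder class IDs

trainval_x, test_x, trainval_y, test_y = train_test_split(
    paths, labels, test_size=0.20, stratify=labels, random_state=0)
train_x, val_x, train_y, val_y = train_test_split(
    trainval_x, trainval_y, test_size=0.20, stratify=trainval_y, random_state=0)
print(len(train_x), len(val_x), len(test_x))          # 640 160 200
```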

5.2. Performance of SqueezeNet and Residual SqueezeNet

We evaluate the performance and properties of SqueezeNet and Residual SqueezeNet trained with vehicle datasets to prove the following hypotheses. First, we show that SqueezeNet recognition is more accurate as compared to AlexNet in terms of classification and more robust as compared to AlexNet and GoogLeNet in terms of speed. The performance of Residual SqueezeNet surpasses that of SqueezeNet in terms of recognition. Second, we analyze the generalization capabilities of the classifier network and changes to the amount of data provided. Lastly, we compressed SqueezeNet and Residual SqueezeNet to less than 5 MB, which is 53 times smaller than AlexNet and 11 times smaller than GoogLeNet in size.

As shown in Table 2, we have trained the SqueezeNet model with 233,280 training samples, which contain all 766 classes, for 30 epochs and tested the model with 58,322 samples, which contain all the 766 classes. The SqueezeNet model achieved rank-1 and rank-5 accuracies of 94.23% and 99.38%, respectively. The training loss is almost negligible, as shown in Table 2. After adding the bypass connections around Fire modules 3, 5, 7, and 9, these modules learn a residual function between input and output. Interestingly, the simple bypass enabled an improvement in accuracy as compared to the architecture without bypass connections. Adding the simple bypass connections yielded an increase of 2.1 percentage points in rank-1 accuracy and 0.14 percentage points in rank-5 accuracy without increasing the model size.

Table 2. Performance of SqueezeNet and residual SqueezeNet.

Architecture                  No. of Classes  No. of Training Samples  No. of Test Samples  Rank-1 Accuracy  Rank-5 Accuracy  Loss
SqueezeNet                    766             233,280                  58,322               94.23%           99.38%           0.0516
Proposed Residual SqueezeNet  766             233,280                  58,322               96.33%           99.52%           0.0397
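For reference, the rank-1 and rank-5 accuracies reported in Table 2 follow the usual top-k definition: a test sample counts as correct at rank k if its true class is among the k highest-scoring classes. The following is an assumed sketch of that computation, not the authors' evaluation code, with placeholder scores and labels:

```python
# Minimal sketch (assumed): rank-k (top-k) accuracy over class scores.
import numpy as np

def rank_k_accuracy(scores, labels, k):
    """scores: (num_samples, num_classes) class scores; labels: true class indices."""
    top_k = np.argsort(scores, axis=1)[:, -k:]         # indices of the k best-scoring classes
    return (top_k == labels[:, None]).any(axis=1).mean()

scores = np.random.rand(1000, 766)                     # placeholder scores over the 766 classes
labels = np.random.randint(0, 766, size=1000)          # placeholder ground-truth class IDs
print(rank_k_accuracy(scores, labels, 1), rank_k_accuracy(scores, labels, 5))
```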

5.3. Generalization of New Data and Processing

Evaluation of the generalization capabilities of the model and dataset is important for demonstrating its utility for new applications. We tested SqueezeNet and Residual SqueezeNet in real time with new video data, which contains 112 vehicles, out of which 111 are correctly recognized. We also recorded the recognition time per vehicle in the video, which was calculated to be 108.81 ms using SqueezeNet, as shown in Table 3. Residual SqueezeNet gives the same accuracy in terms of recognition, but in terms of recognition time, Residual SqueezeNet took 0.44 ms more than SqueezeNet. However, the recognition capabilities of both models prove that we can use both models in real time due to their fast processing speed.

Table 3. Generalization to new data and processing time.

Architecture                  Correctly Recognized Vehicles (out of 112)  Per Vehicle Recognition Time (ms)
SqueezeNet                    111                                         108.81
Proposed Residual SqueezeNet  111                                         109.25
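The per-vehicle recognition time in Table 3 is simply the average wall-clock inference time over the vehicle crops of the video. A minimal, assumed sketch of such a measurement (with a dummy predictor standing in for the deployed model) is:

```python
# Minimal sketch (assumed): average single-crop inference time in milliseconds.
import time
import numpy as np

def mean_inference_time_ms(predict, crops):
    """predict: callable running the model on one crop; crops: list of images."""
    start = time.perf_counter()
    for crop in crops:
        predict(crop)
    return 1000.0 * (time.perf_counter() - start) / len(crops)

dummy_predict = lambda crop: crop.mean()                  # placeholder for the real model
crops = [np.random.rand(227, 227, 3) for _ in range(112)]
print(f"{mean_inference_time_ms(dummy_predict, crops):.2f} ms per vehicle")
```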

5.4. Model Size and Number of Parameters

We proposed decreasing the number of parameters by using squeeze layers to decrease the number of input channels to the 3 × 3 filters. With the reduction in the number of parameters, the SqueezeNet and Residual SqueezeNet file size is compressed to less than 5 MB, which is 53 times smaller than AlexNet and 11 times smaller than GoogLeNet in size. The number of parameters and the size of the model are summarized in Table 4.

Table 4. Model size and number of parameters.

                   AlexNet [23]  GoogLeNet [23]  SqueezeNet  Proposed Residual SqueezeNet
No. of Parameters  59,983,292    6,752,430       1,118,974   1,118,974
Model Size (MB)    229.0         49.4            4.4         4.4
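Parameter counts such as those in Table 4 are obtained by summing the element counts of all weight tensors in the network. The following is an assumed illustration (PyTorch, with a toy two-layer model rather than the actual SqueezeNet):

```python
# Minimal sketch (assumed): counting trainable parameters of a network.
import torch.nn as nn

model = nn.Sequential(                        # toy stand-in, not the actual SqueezeNet
    nn.Conv2d(3, 96, kernel_size=7, stride=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 766, kernel_size=1),        # final 1x1 conv over the 766 classes
)
num_params = sum(p.numel() for p in model.parameters())
print(num_params)                             # total trainable parameters of the toy model
```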

5.5. Comparison of the Proposed Method with State-of-the-Art Methods

Because the traditional methods in the vehicle MMR literature have difficulty handling large-scale class data, they used fewer vehicle classes in their experiments [27–30]. Thus, it is unrealistic to make exact comparisons. Nevertheless, to provide different views to analyse our work, as done in [31], we list the results of other relevant works in Table 5.

Table 5. Reported results of a few state-of-the-art methods.

Methods               Classes  No. of Samples  Rank-1 Accuracy (%)  Recognition Time (ms)
He et al. [2]         30       1196            92.47                500.0
Llorca et al. [4]     52       1342            94.00                -
Pearce et al. [27]    74       262             96.00                432.5
Psyllos et al. [28]   11       110             85.00                363.8
Psyllos et al. [29]   10       400             92.00                913.0
Siddiqui et al. [30]  29       6601            94.84                137.9
Fang et al. [31]      281      44,481          98.63                -
Proposed Method       766      291,602         96.33                109.5

From it, we can observe that the proposed method achieves a state-of-the-art rank-1 accuracy on the 766 models, which indicates the promising performance of our method for real-world applications. Other performance parameters, such as execution time and memory usage, were not compared because most of these methods were based on traditional image processing technology instead of deep neural networks. For a fair comparison, in this study, we also compared our results with the popular deep networks AlexNet and GoogLeNet, as reported in [26], which contemplated large-scale vehicle models.

We have proposed steps towards a more disciplined approach for the design-space exploration of CNNs. Towards this goal, we have presented the SqueezeNet and Residual SqueezeNet architectures for extraction of vehicle information such as make, model, and type. The SqueezeNet CNN architecture, which required less than 2% of the parameters of AlexNet, surpassed the AlexNet-level architectures with 94.23% recognition accuracy on our vehicle database. Additionally, with model compression techniques, we are able to compress the SqueezeNet and Residual SqueezeNet file size to less than 5 MB. Using SqueezeNet, we were able to pull up the rank-1 accuracy to 94.23%, which surpasses the accuracy of AlexNet, which is 93.57%. By adding the bypass connections, we were able to increase the rank-1 accuracy of SqueezeNet by 2.1% and reduce the loss value with the same model size and comparable time cost. The proposed system, when implemented in real time, proves to be very efficient, with an economical time slice of 108.81 ms. Our system is highly reliable for real-time applications given the subsequent decrease in the number of parameters, memory size, and computational time.

SqueezeNet and Residual SqueezeNet involve considerably smaller architectures and perform very well in terms of processing time as compared to GoogLeNet and AlexNet. The proposed model is also memory efficient; the total size of SqueezeNet and Residual SqueezeNet is 4379 KB, which is much smaller than the size of GoogLeNet, which is 49,446 KB, and that of AlexNet, which is 229 MB. Furthermore, the performances of the small architectures have been found to be better than those of the larger models.

In fact, for a given accuracy, we can design different architectures with different numbers of parameters and search for the suitable network architectures that can achieve the given accuracy. By using the approach provided by SqueezeNet, we were able to achieve better accuracy than AlexNet with fewer parameters by selecting the most important parameters that give the best accuracy and neglecting the parameters that have a lower effect on the accuracy. Another advantage of SqueezeNet over AlexNet and GoogLeNet is the smaller number of parameters, which translates into a lower probability of overfitting. Hence, it performs better on the test set and generalizes better to new data as compared to AlexNet and GoogLeNet. Our models have been successful to a greater extent in decreasing the vulnerability of such systems in real time, especially in terms of recognition. The performance of the proposed Residual SqueezeNet model as compared to other state-of-the-art deep learning models is given in Table 6. With our network, the classification time for each vehicle image is 109.54 ms, which is much faster than AlexNet and GoogLeNet.

Table 6. Proposed residual SqueezeNet in comparison with the state-of-the-art deep learning models.

Parameter                 AlexNet [23]  GoogLeNet [23]  SqueezeNet  Proposed Residual SqueezeNet
Rank-1 Accuracy (%)       93.57         94.31           94.23       96.33
Rank-5 Accuracy (%)       99.02         99.46           99.38       99.52
Loss                      0.0835        0.1259          0.0516      0.0397
Parameters (K)            59,983        6752            1118        1118
Size (KB)                 229,000       49,446          4379        4379
Classification Time (ms)  294.71        531.40          108.81      109.54

5.6. Performance of SqueezeNet with Respect to Fire Modules

To evaluate which architecture was best in terms of accuracy, we performed a series of experiments during the course of our study. In one of our experiments, we modified the SqueezeNet in order to determine why we selected eight Fire modules. Our experimental results showed that when using SqueezeNet with four Fire modules, we achieved 93.60% rank-1 accuracy, the training loss was 0.1367, and the rank-5 accuracy achieved was 99.19%. Similarly, for six Fire modules, we achieved 93.58% rank-1 accuracy, the training loss was 0.1765, and the rank-5 accuracy was 99.14%. For nine Fire modules, we achieved 93.56% accuracy with a training loss of 0.1725 and 99.19% rank-5 accuracy. Finally, with ten Fire modules, we achieved 93.38% accuracy with a training loss of 0.2044 and 99.17% rank-5 accuracy. We found that the best accuracy of 94.23% is achieved with eight Fire modules, as shown in Table 7.

Table 7. SqueezeNet performance of models with respect to Fire modules.

No. of Fire Modules  Training Loss  Rank-1 Accuracy (%)  Rank-5 Accuracy (%)
10                   0.2044         93.38                99.17
9                    0.1725         93.56                99.18
8                    0.0516         94.23                99.38
6                    0.1765         93.58                99.14
4                    0.1367         93.61                99.19

The graphical trend of the number of parameters with respect to Fire modules is shown in Figure 10. We can see that with a growing number of Fire modules, the number of parameters also increases. As can be seen, the SqueezeNet with 4, 6, and 8 Fire modules has almost the same parameter count of about 1,000,000. When the number of Fire modules exceeds eight, the parameter count increases drastically. This increase results in a corresponding increase in the model size, which, in turn, makes its implementation difficult for use with embedded systems that have limited memory.


Figure 10. Number of parameters of SqueezeNet with respect to Fire modules.

6. Conclusions

In this study, an optimized SqueezeNet has been modeled for vehicle MMR. Our contributions are threefold. First, a large-scale dataset for vehicle recognition with over 291,602 images comprising 766 classes has been created and is made publicly available (http://cse.jbnu.ac.kr). Second, a cluster method was proposed to incorporate deep learning techniques to assist faster labeling of a large-scale vehicle make and model dataset. Experimental results demonstrate that our cluster method significantly speeds up the labeling task compared to the manual labeling of each image. Third, we proposed and investigated an approach for a highly robust and real-time automated vehicle MMR based on memory-efficient CNN architectures. Toward this goal, we demonstrated the usefulness of the SqueezeNet deep networks employed to address the challenges in VMMR. The SqueezeNet architectures employed enable real-time application use because of compressed model sizes without a considerable change in the recognition accuracy of 96.33% at the rank-1 level. Experimental results have proved the superiority of our proposed system in vehicle MMR. For future work, we plan to further optimize the Residual SqueezeNet architecture, such as with adaptive networks, to achieve better accuracy and focus on non-frontal vehicle recognition in real-time scenarios. It is also necessary to train our network with a larger dataset to handle more vehicle models.

Author Contributions: H.L., I.U., and W.W. developed the main framework and collaborated in writing the paper. All other authors contributed by revising the manuscript.

Funding: This research was supported by the MSIP
(Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-2015-0-00378) supervised by the IITP (Institute for Information & communications Technology Promotion). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (GR 2016R1D1A3B03931911). This study was also financially supported by the grants of the China Scholarship Council (CSC No. 2017 08260057).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lim, S. Intelligent transport systems in Korea. IJEI Int. J. Eng. Ind. 2012, 3, 58–64.
2. He, H.; Shao, Z.; Tan, J. Recognition of car makes and models from a single traffic-camera image. IEEE Trans. Intell. Transp. Syst. 2015, 16, 3182–3192. [CrossRef]
3. Chen, L.; Hsieh, J.; Yan, Y.; Chen, D. Vehicle make and model recognition using sparse representation and symmetrical SURFs. Pattern Recognit. 2015, 48, 1979–1998. [CrossRef]
4. Llorca, D.; Colás, D.; Daza, I.; Parra, I.; Sotelo, M. Vehicle model recognition using geometry and appearance of car emblems from rear view images. In Proceedings of the 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; pp. 3094–3099.
5. Fraz, M.; Edirisinghe, E.A.; Sarfraz, M.S. Mid-level-representation based lexicon for vehicle make and model recognition. In Proceedings of the 2014 22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, 24–28 August 2014; pp. 393–398.
6. AbdelMaseeh, M.; Badreldin, I.; Abdelkader, M.F.; El Saban, M. Car make and model recognition combining global and local cues. In Proceedings of the 2012 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, 11–15 November 2012; pp. 910–913.
7. Jang, D.; Turk, M. Car-Rec: A real time car recognition system. In Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV), Kona, HI, USA, 5–7 January 2011; pp. 599–605.
8. Baran, R.; Rusc, T.; Fornalski, P. A smart camera for the surveillance of vehicles in intelligent transportation systems. Multimed. Tools Appl. 2016, 75, 10471–10493. [CrossRef]
9. Santos, D.; Correia, P.L. Car recognition based on back lights and rear view features. In Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services, London, UK, 6–8 May 2009; pp. 137–140.
10. Ren, Y.; Lan, S. Vehicle make and model recognition based on convolutional neural networks. In Proceedings of the 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 26–28 August 2016; pp. 692–695.
11. Dehghan, A.; Masood, S.Z.; Shu, G.; Ortiz, E.G. View independent vehicle make, model and color recognition using convolutional neural network. arXiv 2017, arXiv:1702.01721.
12. Huang, K.; Zhang, B. Fine-grained vehicle recognition by deep convolutional neural network. In Proceedings of the International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Datong, China, 15–17 October 2016; pp. 465–470.
13. Gao, Y.; Lee, H.J. Local tiled deep networks for recognition of vehicle make and model. Sensors 2016, 16, 226. [CrossRef] [PubMed]
14. Al-Smadi, M.; Abdulrahim, K.; Salam, R.A. Traffic surveillance: A review of vision based vehicle detection, recognition and tracking. Int. J. Appl. Eng. Res. 2016, 11, 713–726.
15. Hsieh, J.W.; Chen, L.; Chen, D. Symmetrical SURF and its applications to vehicle detection and vehicle make and model recognition. IEEE Trans. Intell. Transp. Syst. 2014, 15, 6–20. [CrossRef]
16. Ma, X.; Grimson, W.E.L. Edge-based rich representation for vehicle classification. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05), Beijing, China, 17–21 October 2005; pp. 1185–1192.
17. Aarathi, K.S.; Abraham, A. Vehicle color recognition using deep learning for hazy images. In Proceedings of the 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 10–11 March 2017; pp. 335–339.
18. Biglari, M.; Soleimani, A.; Hassanpour, H. A cascaded part-based system for fine-grained vehicle classification. IEEE Trans. Intell. Transp. Syst. 2018, 19, 273–283. [CrossRef]
19. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV'15), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
20. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
21. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
22. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [CrossRef] [PubMed]
23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012.
24. Szegedy, C.; Liu, W.; Jia, Y. Going deeper with convolutions. Available online: http://arxiv.org/abs/1409.4842 (accessed on 17 September 2014).
25. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
26. Yang, L.; Luo, P.; Loy, C.C.; Tang, X. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 3973–3981.
27. Pearce, G.; Pears, N. Automatic make and model recognition from frontal images of cars. In Proceedings of the 2011 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), Klagenfurt, Austria, 30 August–2 September 2011; pp. 373–378.
28. Psyllos, A.; Anagnostopoulos, C.N.; Kayafas, E. Vehicle model recognition from frontal view image measurements. Comput. Stand. Interfaces 2011, 33, 142–151. [CrossRef]
29. Psyllos, A.; Anagnostopoulos, C.N.; Kayafas, E. SIFT-based measurements for vehicle model recognition. In Proceedings of the XIX IMEKO World Congress Fundamental and Applied Metrology, Lisbon, Portugal, 6–11 September 2009.
30. Siddiqui, A.J.; Mammeri, A.; Boukerche, A. Real-time vehicle make and model recognition based on a bag of SURF features. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3205–3219. [CrossRef]
31. Fang, J.; Zhou, Y.; Yu, Y.; Du, S. Fine-grained vehicle model recognition using a coarse-to-fine convolutional neural network architecture. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1782–1792. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).