
Article

A Dual Neural Architecture Combined SqueezeNet with OctConv for LiDAR Data Classification

Aili Wang 1,*, Minhui Wang 1, Kaiyuan Jiang 1, Mengqing Cao 1 and Yuji Iwahori 2

1 The Higher Educational Key Laboratory for Measuring & Control Technology and Instrumentations of Heilongjiang, Harbin University of Science and Technology, Harbin 150080, China; [email protected] (M.W.); [email protected] (K.J.); [email protected] (M.C.)
2 Department of Computer Science, Chubu University, Aichi 487-8501, Japan; [email protected]
* Correspondence: [email protected]; Tel.: +86-451-86-392-366

 Received: 7 October 2019; Accepted: 9 November 2019; Published: 12 November 2019 

Abstract: Light detection and ranging (LiDAR) is a frequently used technique of data acquisition and it is widely used in diverse practical applications. In recent years, deep convolutional neural networks (CNNs) have shown their effectiveness for LiDAR-derived rasterized digital surface models (LiDAR-DSM) data classification. However, many excellent CNNs have too many parameters due to their depth and complexity. Meanwhile, traditional CNNs have spatial redundancy because different convolution kernels scan and store information independently. SqueezeNet replaces a part of the 3 × 3 convolution kernels in CNNs with 1 × 1 convolution kernels, decomposes the original one convolution layer into two layers, and encapsulates them into a Fire module. This structure can reduce the number of network parameters. Octave Convolution (OctConv) first pools some feature maps and stores them separately from the feature maps of the original size. It can reduce spatial redundancy by sharing information between the two groups. In this article, in order to improve the accuracy and efficiency of the network simultaneously, the Fire modules of SqueezeNet are used to replace the traditional convolution layers in OctConv to form a new dual neural architecture: OctSqueezeNet. Our experiments, conducted using two well-known LiDAR datasets and several classical state-of-the-art classification methods, revealed that our proposed classification approach based on OctSqueezeNet is able to provide competitive advantages in terms of both classification accuracy and computational amount.

Keywords: data classification; light detection and ranging (LiDAR); convolutional neural networks (CNNs); SqueezeNet; octave convolution (OctConv)

1. Introduction

Light detection and ranging (LiDAR) technology is an active remote sensing measurement technology which can obtain ground object information by emitting laser pulses at the target [1]. Compared with optical imaging and infrared remote sensing, LiDAR can obtain high-resolution three-dimensional spatial point clouds, elevation models, and other raster-derived data independent of weather conditions through its laser beam [2,3]. The data adopted in this article are LiDAR-derived rasterized digital surface models (LiDAR-DSM), which are obtained by rasterizing the point cloud data acquired by the LiDAR system [4]. There is a lot of research on LiDAR-DSM data classification. Priestnall et al. examined methods for extracting surface features from DSM produced by LiDAR [5]. Song et al. modified the crown shape in spectrum by using DSM [6]. Zhou et al. used minimum description length (MDL) and morphology to recognize buildings [7]. Zhou combined LiDAR height and intensity data to accurately map urban land cover [8].


LiDAR-DSM mainly includes the terrain changes and feature heights of the target area, and it is suitable for classification tasks that distinguish targets with different heights. Thus, the accurate classification of DSM plays an important role in distinguishing different land cover categories. The classification task of LiDAR-DSM data is usually based on the classification of pixels, that is, the interpretation process of remote sensing images [9]. For example, Lodha et al. used the support vector machine (SVM) algorithm to classify LiDAR data and obtained higher accuracy and convincing visual results [10]. Sasaki et al. adopted a decision tree analysis to investigate the average height of each land class [11]. Naidoo et al. used an automated Random Forest modelling approach to classify eight common savanna tree species [12]. Khodadadzadeh et al. developed a new efficient strategy for fusion and classification of hyperspectral and LiDAR data to integrate multiple types of features [13]. Ghamisi et al. proposed a joint classification method based on extinction profile (EP) features and a convolutional neural network (CNN) to improve the classification accuracy [14]. Then Ghamisi et al. proposed to extract the spatial and background information of DSM data in an unsupervised manner to obtain higher classification precision [15]. Wang et al. combined morphological profiles (MPs) and CNN to provide more features for LiDAR-DSM classification [16]. He et al. used spatial transformer networks (STN) to identify the best input of CNN for LiDAR classification [17]. Xia et al. used an ensemble classifier to process morphological features which fuses hyperspectral (HS) images and LiDAR-DSM [18]. Ge et al. proposed a new framework for fusion of hyperspectral image and LiDAR data based on extinction profiles, local binary patterns (LBP), and kernel collaborative representation classification [19].

AlexNet [20] won the ImageNet [21] competition in 2012, making the convolutional neural network (CNN) the focus again. It was a breakthrough achievement in the history of CNN development. Since then, convolutional neural networks have been widely used in various image processing fields [22–32]. Compared with traditional machine learning algorithms, the three characteristics of CNNs (receptive fields, weight sharing, and subsampling) allow them to learn characteristics from the images directly. Researchers have proposed various CNNs with different structures and obtained good classification results [33–36]. These excellent CNNs are designed to achieve higher classification accuracy through deeper and more complex structures. However, they have relatively many structural parameters and may require large storage space and computing resources. SqueezeNet is a lightweight structure with a smaller number of structural parameters and less calculation, whose structure and classification accuracy satisfy the application requirements as well [37]. SqueezeNet has only 1 × 1 and 3 × 3 convolution kernels, and its purpose is not to obtain the best classification accuracy but to simplify the complexity of the network and achieve classification accuracy similar to that of a public network. Octave Convolution (OctConv) is mainly used to deal with feature mapping of multiple spatial frequencies and to reduce spatial redundancy. It is a single, generic, and plug-and-play convolution unit that can replace ordinary convolution directly and does not need adjustment of the network structure.
In addition, the features of adjacent pixels in one image have similarities; however, the convolution kernel in traditional CNNs sweeps each location and stores its own feature description independently. It ignores spatial consistency, so the feature maps have a large amount of redundancy in the spatial dimension. OctConv scales a portion of the feature maps and then communicates with the unscaled portion to reduce spatial redundancy and enhance feature utilization [38].

In this article, we propose a novel dual neural architecture, OctSqueezeNet, which combines SqueezeNet with OctConv and improves LiDAR-DSM data classification accuracy with less structural parameter memory. OctSqueezeNet was applied to LiDAR data classification. The main contributions are as follows. First, SqueezeNet combined with OctConv was applied to LiDAR data classification for the first time, which improved the classification accuracy and reduced the storage space occupied by the network. Second, we replaced the original convolutional layers in OctConv with the Fire modules of SqueezeNet to compensate for the shortcoming of SqueezeNet that it extracts less feature information because of its many 1 × 1 convolution kernels.

2. SqueezeNet Design Architecture

The main objective of SqueezeNet is to maintain competitive accuracy with few parameters. To achieve this goal, three main strategies were adopted. Firstly, the 3 × 3 filter was replaced by the 1 × 1 filter, which has fewer parameters. Secondly, we reduced the number of input channels to the 3 × 3 filters. Finally, we performed subsampling operations in the later stages of the network so that the convolution layers have large activation maps. SqueezeNet drew on the idea of the Inception module [23] to design a Fire module with a squeeze layer and an expand layer. The structure of the Fire module is shown in Figure 1. In order to reduce the number of channels of the input elements, the squeeze layer uses a 1 × 1 convolution kernel to compress the input elements. The expand layer uses 1 × 1 and 3 × 3 convolution kernels for multi-scale learning and concatenation.

Figure 1. Structure of Fire module.

Figure 2. Operation process of Fire module.

The operation process of the Fire module is shown in Figure 2. The size of the input feature maps is h × w × n. Firstly, the input feature maps pass through the squeeze layer and produce output feature maps with a size of h × w × s1. The spatial size of the feature maps is unchanged but the number of channels reduces from n to s1. The output feature maps of the squeeze layer are sent into the 1 × 1 and 3 × 3 convolution kernels in the expand layer, respectively, and the results of the two convolutions are then concatenated. Finally, only the number of channels changes, to e1 + e3. In order to enable the output activations of the 1 × 1 and 3 × 3 filters of the expand module to have the same height and width, a boundary zero-filling operation of 1 pixel is performed for the input of the 3 × 3 filters in the expand module. The Rectified Linear Unit (ReLU) is applied to the activations of the squeeze layer and the expand layer. Meanwhile, there is no fully-connected layer in SqueezeNet.
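For readers who want to experiment with the structure described above, the following is a minimal Keras sketch of a Fire module (a 1 × 1 squeeze convolution followed by parallel 1 × 1 and 3 × 3 expand convolutions whose outputs are concatenated). The function name and the example filter counts are ours for illustration and are not taken from the paper.

```python
# Minimal Keras sketch of a Fire module; names and filter counts are illustrative.
from tensorflow.keras import layers


def fire_module(x, s1, e1, e3):
    """Squeeze with 1x1 convolutions, then expand with parallel 1x1 and 3x3 branches."""
    squeeze = layers.Conv2D(s1, (1, 1), activation='relu', padding='same')(x)
    expand_1x1 = layers.Conv2D(e1, (1, 1), activation='relu', padding='same')(squeeze)
    # 'same' padding supplies the 1-pixel zero fill so both branches keep h x w.
    expand_3x3 = layers.Conv2D(e3, (3, 3), activation='relu', padding='same')(squeeze)
    # Only the channel count changes, to e1 + e3; the spatial size stays h x w.
    return layers.concatenate([expand_1x1, expand_3x3])


# Example: a 32 x 32 x 16 input becomes 32 x 32 x (64 + 64).
inputs = layers.Input(shape=(32, 32, 16))
outputs = fire_module(inputs, s1=16, e1=64, e3=64)
```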

3. Octave Convolution


Natural images can be decomposed into low spatial frequency components and high spatial frequency components. Similarly, the feature maps of the convolutional layers can also be decomposed into feature components of different spatial frequencies. The low frequency components are used to describe the structure with smooth changes, and the high frequency components are used to describe the fine details of fast changes.

As shown in Figure 3, the high and low frequency components of the image are written as X^H and X^L, and their corresponding outputs after the convolution operation are Y^H and Y^L, where α ∈ (0, 1) represents the ratio of the low frequency channels. The straight arrows above and below represent that the feature maps of the same frequency update their information by the convolution operation, and the crossed arrows exchange information between the two frequencies by pooling, upsampling, and addition operations. In the convolution operation, the responses W^H come both from X^H to Y^H and from X^L to Y^H, that is, W^H = [W^{H→H}, W^{L→H}]. W^{H→H} is a traditional convolution, and the sizes of the input and output images are the same. W^{L→H} upsamples the input image first and then performs a traditional convolution, while W^{H→L} pools the input image first. As shown in Equations (1) and (2), the high and low frequency feature maps are stored in different groups. Sharing information between adjacent locations makes it possible to reduce the spatial resolution of the low-frequency group and thus the spatial redundancy.

Y^H = Y^{H→H} + Y^{L→H}  (1)

Y^L = Y^{L→L} + Y^{H→L}  (2)

Figure 3. Structure of OctConv.

The specific calculation process is written as Equations (3) and (4):

Y^H = f(X^H; W^{H→H}) + upsample(f(X^L; W^{L→H}), 2)  (3)

Y^L = f(X^L; W^{L→L}) + f(pool(X^H, 2); W^{H→L})  (4)
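A minimal sketch of Equations (3) and (4) using standard Keras layers is given below; the helper name, the kernel size, and the use of average pooling and nearest-neighbour upsampling are our assumptions rather than details of the reference OctConv implementation.

```python
# Hedged sketch of one octave convolution step, Equations (3) and (4),
# written with standard Keras layers; names and choices are ours, not the
# original OctConv code.
from tensorflow.keras import layers


def octave_conv(x_h, x_l, ch_out, alpha=0.2, k=3):
    """x_h: high-frequency maps; x_l: low-frequency maps at half resolution."""
    ch_l = int(alpha * ch_out)      # low-frequency output channels
    ch_h = ch_out - ch_l            # high-frequency output channels

    # Equation (3): Y^H = f(X^H; W^{H->H}) + upsample(f(X^L; W^{L->H}), 2)
    y_h2h = layers.Conv2D(ch_h, k, padding='same')(x_h)
    y_l2h = layers.UpSampling2D(2)(layers.Conv2D(ch_h, k, padding='same')(x_l))
    y_h = layers.add([y_h2h, y_l2h])

    # Equation (4): Y^L = f(X^L; W^{L->L}) + f(pool(X^H, 2); W^{H->L})
    y_l2l = layers.Conv2D(ch_l, k, padding='same')(x_l)
    y_h2l = layers.Conv2D(ch_l, k, padding='same')(layers.AveragePooling2D(2)(x_h))
    y_l = layers.add([y_l2l, y_h2l])
    return y_h, y_l
```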

4. OctSqueezeNet for LiDAR Classification

As shown in Figure 4, we rescaled the feature maps with different resolutions to the same spatial resolution and concatenated their feature channels, forming a dual neural architecture similar to multi-layers, called OctSqueezeNet [39], for LiDAR data classification.

First, a randomly selected 20% of the input feature maps underwent a 2 × 2 max-pooling operation to halve their size to 16 × 16, while the remaining feature maps kept the original size of 32 × 32. The two parts were sent separately to different Fire modules to obtain their respective output feature maps. The size of the feature maps does not change through a Fire module. We repeated the above operation for the 32 × 32 output feature maps. At the same time, 80% of the 16 × 16 output feature maps were sent to a Fire module and a 2 × 2 upsampling operation was performed on the corresponding output to restore the size to 32 × 32; the remaining 20% were sent to another Fire module. Feature maps with the same size in different sections were combined to form the dual outputs, which were then sent to the average pooling layer to obtain output feature maps of sizes 16 × 16 and 8 × 8.

As shown in the lower part of Figure 4, the above operation was repeated for these 16 × 16 output feature maps. Meanwhile, 80% of the 8 × 8 output feature maps were only sent to a Fire module; the remaining 20% were sent to a Fire module and the outputs after this Fire module were sent to a 2 × 2 upsampling layer to restore the size to 16 × 16.

The feature maps with the same size in different parts were combined to obtain 8 × 8 and 16 × 16 outputs and were, respectively, sent to Fire modules, as shown in the lower part of Figure 4. Then we upsampled the 8 × 8 feature maps to 16 × 16. The 16 × 16 feature maps were merged to obtain a single output and downsampled to 8 × 8. Finally, softmax was used for data classification.

Figure 4. Architecture of OctSqueezeNet.
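The sketch below wires one dual-branch stage in Keras according to our reading of Figure 4: the high-frequency (32 × 32) and low-frequency (16 × 16) groups each pass through a Fire module and then exchange information by upsampling and pooling before being added. The filter counts and the exact exchange operations are illustrative assumptions, not the authors' released implementation.

```python
# Simplified sketch of one dual-branch OctSqueezeNet stage, based on our reading
# of Figure 4; the exact wiring and filter counts in the paper may differ.
from tensorflow.keras import layers


def fire(x, s1, e1, e3):
    # Compact Fire module: squeeze 1x1, expand 1x1 + 3x3, concatenate.
    s = layers.Conv2D(s1, 1, activation='relu', padding='same')(x)
    return layers.concatenate([
        layers.Conv2D(e1, 1, activation='relu', padding='same')(s),
        layers.Conv2D(e3, 3, activation='relu', padding='same')(s)])


def oct_squeeze_stage(x_high, x_low, s1=16, e1=32, e3=32):
    """x_high: 32x32 branch (80% of the maps); x_low: 16x16 branch (20%)."""
    h = fire(x_high, s1, e1, e3)            # full-resolution path
    l = fire(x_low, s1, e1, e3)             # half-resolution path
    # Exchange: upsample the low branch to add into the high branch,
    # pool the high branch to add into the low branch.
    new_high = layers.add([h, layers.UpSampling2D(2)(l)])
    new_low = layers.add([l, layers.MaxPooling2D(2)(h)])
    return new_high, new_low
```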

4.1. Adaptive Learning Optimization Algorithm

The Adam optimization algorithm is an extension of the stochastic gradient descent (SGD) algorithm. It is a step-optimization algorithm based on the gradient of a stochastic objective function and on adaptive estimation of low-order moments from the training data. It can replace SGD and update the weights of the neural network iteratively based on the training data. The method estimates the different parameters by the first-order moment, m_t, and the second-order moment, n_t, of the gradient to calculate the adaptive learning rate. The process is shown in Equations (5)–(8):

m_t = µ × m_{t−1} + (1 − µ) × g_t  (5)

n_t = V × n_{t−1} + (1 − V) × g_t^2  (6)

m'_t = m_t / (1 − µ^t)  (7)

n'_t = n_t / (1 − V^t)  (8)

The m'_t and n'_t are the corrections for m_t and n_t, which approximate the expected unbiased estimates, have no additional requirements for memory, and can be adjusted dynamically according to the gradient, where −m'_t / (√(n'_t) + ε) forms a dynamic constraint on the learning rate and it has a clear scope.
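As a worked illustration of Equations (5)–(8), the NumPy sketch below performs one Adam update step; the default values used for the learning rate, µ, V, and ε are typical choices and are not taken from the paper.

```python
# NumPy sketch of one Adam update following Equations (5)-(8);
# hyperparameter defaults are typical values, not the paper's settings.
import numpy as np


def adam_step(w, g, m, n, t, lr=0.001, mu=0.9, v=0.999, eps=1e-8):
    """Update weights w given gradient g and running moments m, n at step t >= 1."""
    m = mu * m + (1 - mu) * g                      # Equation (5): first-order moment
    n = v * n + (1 - v) * g ** 2                   # Equation (6): second-order moment
    m_hat = m / (1 - mu ** t)                      # Equation (7): bias correction
    n_hat = n / (1 - v ** t)                       # Equation (8): bias correction
    w = w - lr * m_hat / (np.sqrt(n_hat) + eps)    # dynamic learning-rate constraint
    return w, m, n


# Example: one step on a toy three-dimensional weight vector.
w, m, n = np.zeros(3), np.zeros(3), np.zeros(3)
w, m, n = adam_step(w, np.array([0.1, -0.2, 0.3]), m, n, t=1)
```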

4.2. Loss and Activation Function

The activation function of the structure in this article is ReLU. As shown in Equation (9), it sets a part of the neuron outputs to zero, which results in sparseness of the network and reduces the interdependence of parameters, alleviating the problem of overfitting. It also saves a lot of computation time over the whole process.

g(x) = max(0, x)  (9)

Because our dataset is a multi-category dataset, softmax is adopted as the final classifier of the network model. As shown in Equation (10), softmax applies an exponential operation, which enlarges the contrast between large and small values to improve learning efficiency.

a^L_j = e^{Z^L_j} / Σ_K e^{Z^L_K}  (10)

L L where Zj represents the input of the jth neuron of the Lth layer (usually the last layer), aj represents P ZL the output of the jth neuron in the Lth layer, and e represents the natural constant. e K represents the K sum of the inputs of all neurons in the Lth layer. Therefore, as Equation (11), the corresponding loss function is a combination of softmax and cross-entropy loss.

Loss_i = −log y_i = −log( e^{Z^L_j} / Σ_K e^{Z^L_K} )  (11)
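Equations (10) and (11) can be checked with a few lines of NumPy; the sketch below computes the softmax output and the cross-entropy loss of the true class (the max-subtraction step is a standard numerical-stability trick, not part of the equations).

```python
# NumPy sketch of Equations (10) and (11): softmax output and cross-entropy loss.
import numpy as np


def softmax(z):
    z = z - np.max(z)            # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / np.sum(e)         # Equation (10)


def cross_entropy_loss(z, true_class):
    a = softmax(z)
    return -np.log(a[true_class])   # Equation (11): Loss_i = -log y_i


# Example: logits of a 7-class output (Bayview Park has 7 land classes).
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3, 0.0, 1.2])
loss = cross_entropy_loss(logits, true_class=0)
```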

5. Experimental Results and Analysis

5.1. Datasets Description

This article conducted experiments on two different datasets, Bayview Park and Recology, to evaluate the performance of the proposed classification method. They are public datasets of the 2012 IEEE International Remote Sensing Image Convergence Competition and were collected in the city of San Francisco, CA, USA. The Bayview Park dataset has a size of 300 × 200 pixels, a spatial resolution of 1.8 meters, and marks 7 land classes. The Recology dataset consists of 200 × 250 pixels with a spatial resolution of 1.8 meters and contains 11 land classes. Figure 5 shows the DSM maps and the groundtruth maps, respectively, for the Bayview Park and Recology datasets.

Figure 5. Bayview Park (up) and Recology (down) datasets: (a) Digital surface models (DSM) of Bayview Park, (b) groundtruth of Bayview Park, (c) DSM of Recology, (d) groundtruth of Recology. The land classes are Building1, Building2, Building3, Road, Trees, Soil, and Seawater for Bayview Park, and Building1–Building7, Trees, Parking Lot, Soil, and Grass for Recology.

5.2. Experimental Set-Up

Experiments used TensorFlow under Windows as the backend, encoded with Keras and Python. We divided each dataset into two parts: training samples and test samples. The number of training samples (i.e., 400, 500, 600, and 700) was selected randomly and the rest were used as test sets. We adopted overall accuracy (OA), average accuracy (AA), and Kappa as objective evaluation criteria. In the experiments, SqueezeNet only used convolutions with sizes 1 × 1 and 3 × 3, and the input features of both datasets were 32 × 32 pixels. DSM data were linearly mapped to (−1, 1) and Adam was selected as the gradient optimization algorithm. The kernel function of SVM was set to the radial basis function (rbf) and the coefficient of the rbf was left at the default of auto. The penalty parameter of the error term was 100. The initial learning rate for both the Bayview Park and Recology datasets was 0.001 in CNN, and the rates were set to 0.0005 and 0.001, respectively, in OctConv; in SqueezeNet and OctSqueezeNet, the initial learning rates for the two datasets were both set to 0.0005. For the two datasets, the ratio of pooling for feature maps in the first input layer and the middle layers was set to 0.2. The last output layer did not perform pooling.
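To make the stated settings concrete, the sketch below collects them in code; only the values quoted in the text (rbf kernel with gamma set to auto, penalty parameter 100, Adam, a 0.0005 learning rate, and the linear mapping of DSM values to (−1, 1)) come from the paper, while the helper name and the TensorFlow 2-style Keras API are our assumptions.

```python
# Hedged sketch of the comparison settings described above; values stated in the
# text are used directly, everything else (names, API style) is illustrative.
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.optimizers import Adam


def scale_dsm(dsm):
    """Linearly map a DSM array to the range (-1, 1)."""
    lo, hi = dsm.min(), dsm.max()
    return 2.0 * (dsm - lo) / (hi - lo) - 1.0


svm_classifier = SVC(kernel='rbf', gamma='auto', C=100)   # SVM baseline settings
optimizer = Adam(learning_rate=0.0005)                    # SqueezeNet / OctSqueezeNet rate
```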
As shown in Table1, CNN, and they were set to 0.0005 and 0.001, respectively, in OctConv. In SqueezeNet and OctSqueezeNet achieved the best results on OA, AA, and Kappa when different numbers of training OctSqueezeNet, the initial learning rates for two datasets were both set to 0.0005. For the two datasets, samples were selected. When 700 samples were selected, the best OA was 95.42%, increasing 1.91%, the ratio of pooling for feature maps in the first input layer and middle layers were all set to 0.2. The 2.21%, 6.32%, and 19.17% compared to OctConv, SqueezeNet, CNN, and SVM, respectively. Figure6 last output layer did not perform pooling. In this article, the initial learning rate for Bayview Park uses the bar chart to show the accuracy of each comparison method when different numbers of samples and Recology datasets both were 0.001 in CNN, and they were set to 0.0005 and 0.001, respectively, were selected. It can be seen intuitively that the accuracy of OctSqueezeNet proposed by us was always in OctConv; in SqueezeNet and OctSqueezeNet, the initial learning rates for the two datasets were higher than that of other methods. both set to 0.0005.

Table 1. Classification results on Bayview Park dataset.

Training Samples | Index | SVM | CNN | SqueezeNet | OctConv | OctSqueezeNet
400 | OA%   | 72.11 ± 2.02 | 86.77 ± 2.11 | 88.60 ± 1.56 | 91.32 ± 0.41 | 91.99 ± 0.81
400 | AA%   | 77.08 ± 1.43 | 88.01 ± 1.32 | 89.04 ± 0.47 | 93.02 ± 0.32 | 93.21 ± 0.43
400 | K×100 | 63.11 ± 1.89 | 82.92 ± 1.77 | 85.42 ± 2.01 | 88.30 ± 0.66 | 89.48 ± 1.00
500 | OA%   | 73.75 ± 2.40 | 87.37 ± 1.07 | 91.31 ± 2.81 | 91.61 ± 0.63 | 92.79 ± 0.41
500 | AA%   | 78.45 ± 2.07 | 89.21 ± 2.99 | 90.59 ± 2.05 | 93.37 ± 1.21 | 95.02 ± 0.90
500 | K×100 | 65.41 ± 0.84 | 84.62 ± 1.52 | 88.44 ± 1.65 | 88.47 ± 1.43 | 90.48 ± 0.47
600 | OA%   | 74.72 ± 2.03 | 88.27 ± 0.71 | 91.32 ± 1.03 | 92.32 ± 0.49 | 94.09 ± 1.23
600 | AA%   | 78.87 ± 1.10 | 89.44 ± 2.64 | 92.11 ± 1.57 | 94.30 ± 0.51 | 95.75 ± 1.25
600 | K×100 | 66.74 ± 1.88 | 85.88 ± 2.01 | 89.63 ± 1.43 | 89.89 ± 0.72 | 92.23 ± 1.64
700 | OA%   | 76.25 ± 0.74 | 89.10 ± 2.09 | 93.21 ± 0.63 | 93.51 ± 0.77 | 95.42 ± 0.91
700 | AA%   | 81.21 ± 2.25 | 90.11 ± 0.88 | 93.33 ± 0.79 | 95.64 ± 1.56 | 96.43 ± 1.37
700 | K×100 | 68.70 ± 2.14 | 86.52 ± 2.70 | 91.11 ± 2.14 | 91.38 ± 1.04 | 93.99 ± 1.97


Figure 6. The classification results of different methods on Bayview Park.

Table 2 and Figure 7 show the classification accuracy of each class. OctSqueezeNet had a good effect on the classification of the different land classes. Figure 8 shows the classification results of the different networks for Bayview Park visually through false color maps. It can be seen from the classification results that, as the precision increased, the mis-division area of the features gradually decreased.

Table 2. Classification results of each class on Bayview Park dataset.

NO. | Classes | SVM | CNN | SqueezeNet | OctConv | OctSqueezeNet
1 | Building1 | 82.00 ± 4.21 | 93.78 ± 1.76 | 94.89 ± 1.33 | 99.40 ± 0.41 | 99.52 ± 0.09
2 | Building2 | 83.88 ± 3.32 | 92.71 ± 1.04 | 98.16 ± 1.84 | 98.08 ± 1.53 | 99.93 ± 0.07
3 | Building3 | 91.01 ± 5.20 | 92.53 ± 1.82 | 96.12 ± 1.14 | 98.84 ± 1.36 | 99.54 ± 0.46
4 | Road      | 81.70 ± 4.31 | 91.05 ± 1.24 | 92.37 ± 0.33 | 96.29 ± 0.88 | 96.29 ± 2.77
5 | Trees     | 83.85 ± 1.99 | 86.03 ± 1.81 | 97.45 ± 3.11 | 95.54 ± 0.67 | 98.67 ± 0.88
6 | Soil      | 60.53 ± 3.39 | 85.37 ± 1.79 | 86.29 ± 3.02 | 83.51 ± 0.53 | 88.75 ± 3.26
7 | Seawater  | 85.53 ± 2.74 | 90.98 ± 2.64 | 92.25 ± 1.98 | 97.01 ± 2.51 | 92.47 ± 2.30


Figure 7. Classification results of different methods for each class on Bayview Park.


Figure 8. Classification results on Bayview Park: (a) Groundtruth map, (b) support vector machine (SVM), (c) convolutional neural network (CNN), (d) SqueezeNet, (e) Octave Convolution (OctConv), and (f) OctSqueezeNet.

5.2.2. Recology Dataset

As shown in Table 3, OctSqueezeNet also achieved the best results for the Recology dataset on OA, AA, and Kappa when selecting different numbers of training samples. When selecting 700 samples, the best OA was 95.91%, an increase of 0.63%, 0.97%, 3.22%, and 17.98% compared to OctConv, SqueezeNet, CNN, and SVM, respectively. Figure 9 shows that the accuracy of the five methods grew steadily as the number of samples increased. The accuracy of OctSqueezeNet was always the highest.

Table 3. Classification results on Recology dataset.

Training Samples | Index | SVM | CNN | SqueezeNet | OctConv | OctSqueezeNet
400 | OA%   | 71.86 ± 1.05 | 85.57 ± 1.37 | 90.39 ± 0.65 | 92.88 ± 1.07 | 93.93 ± 0.61
400 | AA%   | 72.69 ± 2.43 | 88.97 ± 2.13 | 89.02 ± 0.18 | 91.17 ± 1.56 | 93.63 ± 0.17
400 | K×100 | 66.62 ± 0.88 | 82.35 ± 1.71 | 88.67 ± 3.21 | 91.28 ± 0.73 | 92.79 ± 0.74
500 | OA%   | 74.51 ± 0.77 | 87.63 ± 1.43 | 92.54 ± 2.01 | 93.24 ± 1.14 | 94.77 ± 0.83
500 | AA%   | 77.64 ± 3.26 | 91.06 ± 0.39 | 91.28 ± 1.21 | 91.41 ± 0.57 | 93.72 ± 0.60
500 | K×100 | 69.73 ± 1.75 | 85.98 ± 0.29 | 87.30 ± 1.43 | 92.06 ± 1.20 | 93.79 ± 0.99
600 | OA%   | 76.35 ± 2.33 | 90.12 ± 0.52 | 93.13 ± 0.71 | 94.53 ± 1.24 | 95.07 ± 0.48
600 | AA%   | 79.51 ± 1.31 | 88.77 ± 1.22 | 92.38 ± 1.72 | 94.02 ± 0.62 | 95.36 ± 1.15
600 | K×100 | 71.85 ± 2.01 | 86.89 ± 0.72 | 91.94 ± 2.13 | 93.51 ± 1.44 | 94.13 ± 0.63
700 | OA%   | 77.93 ± 0.93 | 92.69 ± 1.69 | 94.94 ± 2.63 | 95.28 ± 1.38 | 95.91 ± 0.73
700 | AA%   | 81.00 ± 1.32 | 92.22 ± 1.47 | 94.29 ± 0.79 | 94.82 ± 1.16 | 95.89 ± 0.17
700 | K×100 | 73.70 ± 1.01 | 89.53 ± 2.18 | 93.19 ± 2.14 | 94.40 ± 1.09 | 95.13 ± 0.11

Figure 9. The classification results of different methods on Recology.

Table 4 and Figure 10 also prove that OctSqueezeNet had better classification effects on the different classes. Figure 11 shows the classification results for the land classes; the area of wrong division also became smaller and smaller as the precision increased.

Table 4. Classification results of each class on Recology dataset.

NO. | Classes | SVM | CNN | SqueezeNet | OctConv | OctSqueezeNet
1  | Building1   | 71.66 ± 0.94 | 98.42 ± 1.09 | 96.87 ± 1.20 | 96.05 ± 1.01 | 99.06 ± 0.94
2  | Building2   | 65.04 ± 0.74 | 95.53 ± 1.21 | 98.89 ± 1.10 | 98.67 ± 1.69 | 99.56 ± 0.44
3  | Building3   | 93.04 ± 1.04 | 94.09 ± 1.12 | 98.38 ± 1.99 | 99.11 ± 0.54 | 98.12 ± 1.43
4  | Building4   | 89.54 ± 2.33 | 97.25 ± 1.13 | 95.48 ± 1.25 | 98.80 ± 0.58 | 99.55 ± 0.45
5  | Building5   | 86.08 ± 1.20 | 96.01 ± 1.95 | 96.72 ± 2.16 | 99.59 ± 0.41 | 100
6  | Building6   | 69.11 ± 1.12 | 94.87 ± 1.41 | 95.54 ± 1.42 | 96.21 ± 0.11 | 95.35 ± 1.81
7  | Building7   | 87.94 ± 3.11 | 96.83 ± 1.79 | 95.87 ± 2.01 | 97.74 ± 2.46 | 96.94 ± 1.64
8  | Trees       | 86.66 ± 1.11 | 95.36 ± 0.61 | 97.64 ± 0.61 | 97.36 ± 0.34 | 97.54 ± 2.01
9  | Parking Lot | 63.61 ± 2.12 | 77.14 ± 0.79 | 89.99 ± 1.64 | 86.99 ± 1.25 | 87.48 ± 1.27
10 | Soil        | 80.31 ± 4.34 | 72.62 ± 1.46 | 78.53 ± 2.33 | 82.28 ± 2.12 | 89.68 ± 0.53
11 | Grass       | 98.09 ± 1.87 | 96.31 ± 1.27 | 93.31 ± 0.48 | 94.09 ± 1.26 | 91.68 ± 1.61

Figure 10. Classification results of different methods for each class on Recology.
Figure 11. Classification results on Recology: (a) Groundtruth map, (b) SVM, (c) CNN, (d) SqueezeNet, (e) OctConv, and (f) OctSqueezeNet.

5.3. Selection of Experimental Parameters

The number of parameters and the model size of the proposed method in this article were compared with those of the classical methods. In order to adapt to the Bayview Park and Recology datasets, we made some adjustments to the structures of these classic algorithms. Next, the adjustments made in each architecture are described.

(1) SqueezeNet: The first change was the addition of batch normalization, which was not present in the original architecture. Then, dropout was not used in the last convolution layer. Finally, we changed the size of the kernel of the first convolution layer from the original 7 × 7 to 3 × 3.

[Bar chart: per-class accuracy (OA, %) on Recology for SVM, CNN, SqueezeNet, OctConv, and OctSqueezeNet.]

Figure 10. Classification results of different methods for each class on Recology.

Figure 11. Classification results on Recology: (a) Groundtruth map, (b) SVM, (c) CNN, (d) SqueezeNet, (e) OctConv, and (f) OctSqueezeNet.

5.3. Selection of Experimental Parameters

The number of parameters and the model size of the proposed method were compared with those of several classical methods. In order to adapt these classic algorithms to the Bayview Park and Recology datasets, we made some adjustments to their structures. The adjustments made in each architecture are described next.

(1) SqueezeNet: The first change was the addition of batch normalization, which was not present in the original architecture. Then, dropout was not used in the last convolution layer. Finally, we changed the kernel size of the first convolution layer from the original 7 × 7 to 3 × 3.

(2) AlexNet: Its original architecture contained eight weight layers; the first five were convolutional and the rest were fully connected. In the comparison experiments, the stride of the first convolution layer was changed from 4 to 1. The output of the last fully connected layer was fed into a 7-way softmax for the Bayview Park dataset and an 11-way softmax for the Recology dataset.

(3) ResNet-34: The original ResNet-34 had four sections composed of 3, 4, 6, and 3 identity blocks, respectively, with 64, 128, 256, and 512 filters in the identity blocks of the four sections. In our experiments, the kernel of the first convolution layer was changed from 7 × 7 to 3 × 3, and the numbers of filters were modified to 16, 28, 40, and 52, respectively. The output of the last fully connected layer was set in the same way as for AlexNet. These adjustments are illustrated by the sketch below.
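As a rough illustration only (not the authors' implementation), the following PyTorch sketch shows a SqueezeNet-style baseline with the three adjustments applied: batch normalization in every Fire module, no dropout before the final convolution, and a 3 × 3 first convolution. The channel widths, patch size, and number of Fire modules are illustrative assumptions; the AlexNet and ResNet-34 changes are indicated only in the closing comment.

import torch
import torch.nn as nn


class Fire(nn.Module):
    """Fire module: 1x1 squeeze followed by parallel 1x1 and 3x3 expand paths,
    with batch normalization added after each convolution (adjustment 1)."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_ch, squeeze_ch, kernel_size=1),
            nn.BatchNorm2d(squeeze_ch), nn.ReLU(inplace=True))
        self.expand1x1 = nn.Sequential(
            nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1),
            nn.BatchNorm2d(expand_ch), nn.ReLU(inplace=True))
        self.expand3x3 = nn.Sequential(
            nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(expand_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        s = self.squeeze(x)
        return torch.cat([self.expand1x1(s), self.expand3x3(s)], dim=1)


class SmallSqueezeNet(nn.Module):
    """SqueezeNet-style baseline with the Section 5.3 adjustments: a 3x3 first
    convolution instead of 7x7, batch normalization, and no dropout before the
    final convolution. num_classes = 7 (Bayview Park) or 11 (Recology)."""
    def __init__(self, in_ch=1, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, stride=1, padding=1),  # 7x7 -> 3x3
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            Fire(64, 16, 64),
            Fire(128, 16, 64),
            nn.MaxPool2d(kernel_size=2),
            Fire(128, 32, 128))
        # The final 1x1 convolution acts as the classifier; no dropout is used here.
        self.classifier = nn.Sequential(
            nn.Conv2d(256, num_classes, kernel_size=1),
            nn.AdaptiveAvgPool2d(1))

    def forward(self, x):
        return self.classifier(self.features(x)).flatten(1)


# AlexNet would instead use stride=1 (not 4) in its first convolution, and
# ResNet-34 a 3x3 first kernel with 16/28/40/52 filters per section; both end
# in a 7-way (Bayview Park) or 11-way (Recology) fully connected layer.
if __name__ == "__main__":
    net = SmallSqueezeNet(in_ch=1, num_classes=11)
    print(net(torch.randn(2, 1, 16, 16)).shape)  # torch.Size([2, 11])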

As shown in Table 5, compared with some classical CNN algorithms, the proposed OctSqueezeNet achieved higher precision and significantly reduced the number of parameters and the model size. When 700 training samples were selected, the OA of OctSqueezeNet on the Bayview Park dataset reached 95.42%, an increase of 2.33%, 1.67%, and 2.21% over AlexNet, ResNet-34, and SqueezeNet, respectively, while the number of parameters of OctSqueezeNet was only 0.32M, which is 29.67M, 0.1M, and 0.92M less than that of AlexNet, ResNet-34, and SqueezeNet, respectively. The model size was only 1.38M, which is 113M, 0.45M, and 3.45M less than the model size of AlexNet, ResNet-34, and SqueezeNet, respectively. The best OA of OctSqueezeNet on the Recology dataset was 95.91%, which was 1.64%, 0.86%, and 0.97% higher than AlexNet, ResNet-34, and SqueezeNet. Its number of parameters was also the smallest (0.33M) of the four methods, compared with 29.99M, 0.41M, and 1.25M for AlexNet, ResNet-34, and SqueezeNet, respectively. The model size of OctSqueezeNet was likewise the smallest (1.37M) of these four methods, which is 113.14M, 0.47M, and 3.48M less than the model size of AlexNet, ResNet-34, and SqueezeNet.
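For reference, the two size metrics in Table 5 can be obtained as sketched below (an illustration, not the authors' tooling): the parameter count in millions is the sum of trainable tensor sizes, and the model size in MB is the size of the serialized weights. The small stand-in network is a hypothetical placeholder for any of the compared models.

import io
import torch
import torch.nn as nn


def count_params_million(model: nn.Module) -> float:
    # Trainable parameters, in millions (the "Params (Million)" column).
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


def model_size_mb(model: nn.Module) -> float:
    # Size of the serialized state_dict in MB (the "Model Size (M)" column).
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / (1024 ** 2)


if __name__ == "__main__":
    # Stand-in model; in practice the compared networks would be passed here.
    net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 7, 1))
    print(f"{count_params_million(net):.2f} M parameters, "
          f"{model_size_mb(net):.2f} MB")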

Table 5. Performance comparison of different networks trained with 700 samples.

Data           Method          Params (Million)   OA (%)          Model Size (M)   Training Time (s)   Test Time (s)
Bayview Park   AlexNet         29.99              93.09 ± 0.66    114.38           212.13              223.12
               ResNet-34       0.42               93.75 ± 1.29    1.83             125.15              160.17
               SqueezeNet      1.24               93.21 ± 0.63    4.83             75.82               65.43
               OctSqueezeNet   0.32               95.42 ± 0.91    1.38             108.82              151.07
Recology       AlexNet         29.99              94.27 ± 0.74    114.51           222.94              230.52
               ResNet-34       0.41               95.05 ± 0.31    1.84             169.94              207.63
               SqueezeNet      1.25               94.94 ± 2.63    4.85             38.04               44.12
               OctSqueezeNet   0.33               95.91 ± 0.73    1.37             119.62              191.54

Additionally, the training and test times are shown in Table 5, where 700 samples were selected as training sets and the experiments ran for 150 epochs. For the Bayview Park dataset, the training times of AlexNet, ResNet-34, SqueezeNet, and OctSqueezeNet were 212.13 s, 116.35 s, 75.82 s, and 108.82 s, respectively, and the corresponding test times were 223.12 s, 160.17 s, 65.43 s, and 151.07 s. For the Recology dataset, the training times of the four methods were 222.94 s, 169.94 s, 38.04 s, and 119.62 s, and the test times were 230.52 s, 207.63 s, 44.12 s, and 191.54 s, respectively. OctSqueezeNet took longer than SqueezeNet: although its parameters were reduced, its structural branches are more complicated, which affects the transfer time. However, this extra time is acceptable given the greatly improved accuracy and the reduced number of parameters and model size, and OctSqueezeNet was still much faster than AlexNet and ResNet-34.

6. Conclusions

This article designed a dual neural architecture, OctSqueezeNet, to classify LiDAR-DSM data. Using SqueezeNet alone, without combining it with OctConv, the network had more parameters and a larger model size and, most importantly, a significantly lower classification accuracy, because SqueezeNet replaces a large number of the traditional 3 × 3 convolution kernels with 1 × 1 convolution kernels and therefore loses relatively more of the extracted features. OctConv does not change the size of the convolution kernel. By reducing the size of a part of the feature maps, it effectively expands the receptive field and obtains more global feature information, while the feature maps kept at the original size extract more detailed feature information because their receptive field is unchanged. This compensates for the loss of SqueezeNet in feature extraction. Because there are only 1 × 1 and 3 × 3 convolution kernels, the network also has fewer parameters and a smaller model size. Overall, combining SqueezeNet with OctConv gave better classification precision on the datasets and required less storage space for the model. The experimental results indicate that OctSqueezeNet achieved 95.42% and 95.91% OA on the Bayview Park and Recology datasets, respectively, when the number of training samples was 700, which was better than the other classification methods. At the same time, for the two datasets, the number of parameters of OctSqueezeNet was 0.32M and 0.33M, respectively, and the model size was 1.38M and 1.37M, respectively, both lower than those of the other methods. The combination of SqueezeNet and OctConv opens a new window for LiDAR data classification by fully extracting its spatial information.

Author Contributions: This article was completed by all authors. A.W. and M.W. designed and implemented the classification algorithm. K.J. and M.C. made an experimental analysis of the algorithm. Y.I. participated in the writing of the paper.

Funding: Supported by the National Science Foundation of China (NSFC-61671190), the University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2017086), and the Fundamental Research Foundation for Universities of Heilongjiang Province (LGYC2018JQ014).

Acknowledgments: The authors would like to thank the support of the laboratory and university.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. You, S.Y.; Hu, J.H.; Neumann, U.; Fox, P. Urban Site Modeling from LiDAR. In Proceedings of the Computational Science and Its Applications—ICCSA, Montreal, QC, Canada, 18–21 May 2003; Volume 2669, pp. 579–588.
2. Hackel, T.; Wegner, J.D.; Schindler, K. Fast Semantic Segmentation of 3D Point Clouds with Strongly Varying Density. In Proceedings of the ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Science, Prague, Czech Republic, 12–19 July 2016; Volume 3, pp. 177–184.
3. Weidner, U. Digital Surface Models for Building Extraction. In Automatic Extraction of Man-Made Objects from Aerial and Space Images, 2nd ed.; Gruen, A., Baltsavias, E.P., Henricsson, O., Eds.; Birkhäuser: Basel, Switzerland, 1997; pp. 193–202. [CrossRef]
4. Lo, C.S.; Lin, C. Growth-Competition-Based Stem Diameter and Volume Modeling for Tree-Level Forest Inventory Using Airborne LiDAR Data. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2216–2226. [CrossRef]
5. Priestnall, G.; Jaafar, J.; Duncan, A. Extracting Urban Features from LiDAR Digital Surface Models. Comput. Environ. Urban 2000, 24, 65–78. [CrossRef]
6. Song, Y.; Imanishi, J.; Hashimoto, H.; Morimura, A.; Morimoto, Y.; Kitada, K. Spectral Correction for the Effect of Crown Shape at the Single-Tree Level: An Approach Using a Lidar-Derived Digital Surface Model for Broad-Leaved Canopy. IEEE Trans. Geosci. Remote Sens. 2013, 21, 755–764. [CrossRef]
7. Zhou, S.; Mi, L.; Chen, H. Building Detection in Digital Surface Model. In Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST), Beijing, China, 22–23 October 2013; pp. 194–199. [CrossRef]
8. Zhou, W.Q. An Object-Based Approach for Urban Land Cover Classification: Integrating LiDAR Height and Intensity Data. IEEE Geosci. Remote Sens. Lett. 2013, 8, 928–931. [CrossRef]
9. Liu, Y.; Ren, Y.; Hu, L.; Liu, Z. Study on Highway Geological Disasters Knowledge base for Remote Sensing Images Interpretation. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012. [CrossRef]
10. Lodha, S.K.; Kreps, E.J.; Helmbold, D.P.; Fitzpatrick, D. Aerial LiDAR Data Classification Using Support Vector Machines (SVM). In Proceedings of the 3rd International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06), Chapel Hill, NC, USA, 14–16 June 2006; pp. 567–574. [CrossRef]

11. Sasaki, T.; Imanishi, J.; Ioki, K.; Morimoto, Y.; Kitada, K. Object-based Classification of Land Cover and Tree Species by Integrating Airborne LiDAR and High Spatial Resolution Imagery Data. Landsc. Ecol. Eng. 2012, 8, 157–171. [CrossRef]
12. Naidoo, L.; Cho, M.A.; Mathieu, R.; Asner, G. Classification of Savanna Tree Species, in the Greater Kruger National Park Region, by Integrating Hyperspectral and LiDAR Data in a Random Forest Data Mining Environment. ISPRS J. Photogramm. Remote Sens. 2012, 69, 167–179. [CrossRef]
13. Khodadadzadeh, M.; Li, J.; Prasad, S.; Plaza, A. Fusion of Hyperspectral and LiDAR Remote Sensing Data Using Multiple Feature Learning. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2015, 8, 2971–2983. [CrossRef]
14. Ghamisi, P.; Höfle, B.; Zhu, X.X. Hyperspectral and LiDAR Data Fusion Using Extinction Profiles and Deep Convolutional Neural Network. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2017, 10, 3011–3024. [CrossRef]
15. Ghamisi, P.; Höfle, B. LiDAR Data Classification Using Extinction Profiles and a Composite Kernel Support Vector Machine. IEEE Geosci. Remote Sens. Lett. 2017, 14, 659–663. [CrossRef]
16. Wang, A.L.; He, X.; Ghamisi, P.; Chen, Y.S. LiDAR Data Classification Using Morphological Profiles and Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 774–778. [CrossRef]
17. He, X.; Wang, A.L.; Ghamisi, P.; Li, G.; Chen, Y.S. LiDAR Data Classification Using Spatial Transformation and CNN. IEEE Geosci. Remote Sens. Lett. 2018, 16, 125–129. [CrossRef]
18. Xia, J.S.; Yokoya, N.; Iwasaki, A. Fusion of Hyperspectral and LiDAR Data with a Novel Ensemble Classifier. IEEE Geosci. Remote Sens. Lett. 2018, 15, 957–961. [CrossRef]
19. Ge, C.; Du, Q.; Li, W.; Li, Y.S.; Sun, W.W. Hyperspectral and LiDAR Data Classification Using Kernel Collaborative Representation Based Residual Fusion. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2019, 12, 1963–1973. [CrossRef]
20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25. [CrossRef]
21. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.F. ImageNet: A Large-scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [CrossRef]
22. Ito, S.; Hiratsuka, S.; Ohta, M.; Matsubara, H.; Ogawa, M. Small Imaging Depth LIDAR and DCNN-Based Localization for Automated Guided Vehicle. Sensors 2018, 18, 177. [CrossRef][PubMed]
23. Kwon, S.K.; Jung, H.S.; Baek, W.K.; Kim, D. Classification of Forest Vertical Structure in South Korea from Aerial Orthophoto and Lidar Data Using an Artificial Neural Network. Appl. Sci. 2017, 7, 1046. [CrossRef]
24. Shao, J.; Qu, C.; Li, J.; Peng, S. A Lightweight Convolutional Neural Network Based on Visual Attention for SAR Image Target Classification. Sensors 2018, 18, 3039. [CrossRef][PubMed]
25. Gao, F.; Huang, T.; Wang, J.; Sun, J.; Hussain, A.; Yang, E. Dual-Branch Deep Convolution Neural Network for Polarimetric SAR Image Classification. Appl. Sci. 2017, 7, 447. [CrossRef]
26. Gao, Q.; Lim, S.; Jia, X. Hyperspectral Image Classification Using Convolutional Neural Networks and Multiple Feature Learning. Remote Sens. 2018, 10, 299. [CrossRef]
27. Zhang, X.Y.; Shi, H.C.; Zhu, X.B.; Li, P. Active semi-supervised learning based on self-expressive correlation with generative adversarial networks. Neurocomputing 2019, 345, 103–113. [CrossRef]
28. Zhang, X.Y.; Shi, H.C.; Li, C.S.; Zheng, K.; Zhu, X.B.; Li, D. Learning transferable self-attentive representations for action recognition in untrimmed videos with weak supervision. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 1–8.
29. Zhu, X.B.; Li, Z.Z.; Zhang, X.Y.; Li, P.; Xue, Z.; Wang, L. Deep convolutional representations and kernel extreme learning machines for image classification. Multimed. Tools Appl. 2018, 78, 29271–29290. [CrossRef]
30. Jiang, Y.G.; Wu, Z.X.; Tang, J.H.; Li, Z.C.; Xue, X.Y.; Chang, S.H. Modeling Multimodal Clues in a Hybrid Framework for Video Classification. IEEE Trans. Multimedia 2018, 20, 3137–3147. [CrossRef]
31. Jiang, Y.G.; Wu, Z.X.; Wang, J.; Xue, X.Y.; Chang, S.H. Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 352–364. [CrossRef][PubMed]
32. Yang, P.; Zhao, P.; Gao, X.; Liu, Y. Robust cost-sensitive learning for recommendation with implicit feedback. In Proceedings of the SIAM International Conference on Data Mining, San Diego, CA, USA, 3–5 May 2018; pp. 621–629.

33. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
34. Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; Volume 1, pp. 770–778.
36. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
37. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. Available online: https://arxiv.org/abs/1602.07360 (accessed on 4 September 2019).
38. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. Available online: https://export.arxiv.org/abs/1904.05049 (accessed on 4 September 2019).
39. Ke, T.W.; Maire, M.; Yu, S.X. Multigrid Neural Architectures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).