
Article

Robust Classification Technique for Hyperspectral Images Based on 3D-Discrete Wavelet Transform

R. Anand 1,*, S. Veni 2 and J. Aravinth 2

1 Research Scholar, Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India
2 Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India; [email protected] (S.V.); [email protected] (J.A.)
* Correspondence: [email protected]

Abstract: Hyperspectral image classification is an emerging and interesting research area that has attracted several researchers to contribute to this field. Hyperspectral images have multiple narrow bands for a single image, which enables the development of algorithms to extract diverse features. The three-dimensional discrete wavelet transform (3D-DWT) has the advantage of extracting spatial and spectral information simultaneously; decomposing an image into a set of spatial–spectral components is an important characteristic of 3D-DWT and motivated the proposed research work. The novelty of this work is to bring out the features of 3D-DWT applicable to hyperspectral image classification using Haar, Fejér-Korovkin and Coiflet filters. The 3D-DWT is implemented with the help of three stages of 1D-DWT: the first two stages extract the spatial information, and the third stage extracts the spectral content. In this work, the 3D-DWT features are extracted and fed to the following classifiers: (i) random forest, (ii) K-nearest neighbor (KNN) and (iii) support vector machine (SVM). Exploiting both spectral and spatial features helps the classifiers to provide a better classification accuracy. A comparison of results was performed with the same classifiers without DWT features. The experiments were performed using the Salinas Scene and Indian Pines hyperspectral datasets. From the experiments, it has been observed that the SVM with 3D-DWT features performs better in terms of the performance metrics, such as overall accuracy, average accuracy and kappa coefficient, and shows a significant improvement compared to state-of-the-art techniques. The overall accuracy of 3D-DWT+SVM is 88.3%, which is 14.5% (in relative terms) higher than that of the traditional SVM (77.1%) for the Indian Pines dataset. The classification map of 3D-DWT+SVM is more closely related to the ground truth map.

Keywords: discrete wavelet transform; support vector machine; machine learning; K-nearest neighbor; random forest; classification; hyperspectral image

Citation: Anand, R.; Veni, S.; Aravinth, J. Robust Classification Technique for Hyperspectral Images Based on 3D-Discrete Wavelet Transform. Remote Sens. 2021, 13, 1255. https://doi.org/10.3390/rs13071255

Academic Editor: Sathishkumar Samiappan

Received: 1 February 2021; Accepted: 16 March 2021; Published: 25 March 2021

1. Introduction

HSI allows accurate discrimination of different materials through its fine spectral resolution and the analysis of small spatial structures through its spatial resolution [1,2]. Anomaly detection based on the autoencoder (AE) has attracted significant interest in the study of hyperspectral images (HSI). Both the spatial and spectral resolution are powerful tools for an accurate analysis of the Earth's surface. HSI data provide useful information related to size, structure and elevation. HSI deals with big data, because a huge amount of information is available in its several bands, and it has a good impact on classifying different urban areas. Classifiers selected for processing fused data must satisfy properties such as being (i) accurate and automatic and (ii) simple and fast [3,4]. Let x be the input attributes and f(x) be a projecting or mapping function; then, the predicted class y is given as

$$y = \sum_{i=1}^{N} f(x_i) \tag{1}$$


According to Equation (1), x = {x1, x2, x3, ..., xN} is the set of N attributes of the sample data [1]. Hyperspectral imaging is a highly advanced form of remote sensing imagery consisting of hundreds of continuous narrow spectral bands, compared to traditional panchromatic and multispectral images. Additionally, it allows a better accuracy for object classification. The discrete wavelet transform (DWT) provides better precision in the case of HSI image classification [3]. It is a time-scale methodology that can be used in several applications [3]. Ghazali et al. [4] developed a 2D-DWT feature extraction method and image classifier for hyperspectral data. Chang et al. [5] proposed an improved image denoising method using 2D-DWT for hyperspectral images.

Sparse representation of signals (images, movies or other multidimensional signals) is very important for a wide variety of applications, such as compression, denoising, attribute extraction, estimation, super-resolution, compressive sensing [1], blind separation of mixtures of dependent signals [2] and many others. In most cases, the discrete cosine transform (DCT) is applied for sparse representation, while the DWT is applicable both as a linear transformation and as a filter bank. Thus, several versions of these transformations are involved in contemporary approaches to signal encoding (MP3) and image encoding (JPEG, JPEG2000), as well as in several other implementations. Low computational complexity is an important benefit of the DWT when it is implemented using wavelet filter banks. Usually, finite-support filters are used for decomposition and reconstruction. For a large class of signals, the wavelet transform offers a sparse representation. DWT filter banks, however, have several drawbacks: ringing near discontinuities, shift variance and the lack of directionality of the decomposition functions. A lot of studies and publications have recently focused on solving these issues. The adaptive wavelet filter bank [3,4] is one of the methods for obtaining a sparse representation and lower dissipation in the wavelet domain. A lower ringing effect is achieved when the L2 or L1 norm is minimized by adapting the wavelet functions in some neighborhood of each data sample [5,6]. With the help of the dual-tree complex wavelet transform [7], the directionality and shift-variance problems may be reduced in high-dimensional data. In addition, the adaptive directional lifting-based wavelet adjusts directions at every pixel depending on the image orientation in its neighborhood [8]. Enhancement can be achieved in all implementations using multiple representations: more statistical estimators with known properties for the same estimated value provide a better quality of estimation.

The referred advantages of the wavelet transform motivated us to apply it to HSI classification. In this work, 3D-DWT features are extracted from HSI data, which contain both spatial and spectral information. HSI has various applications and the ability to collect remotely sensed data from the visible to near-infrared (NIR) wavelength ranges, delivering multi-spectral channels from the same location. An attempt is made to perform hyperspectral data classification using the 3D-DWT. The 3D-DWT features are extracted and applied to the following classifiers: (i) random forest, (ii) K-nearest neighbor (KNN) and (iii) support vector machine (SVM). A comparison of results was performed with the same classifiers without DWT features. A brief numerical illustration of the DWT's sparse representation and perfect reconstruction is given below.
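The sketch decomposes a piecewise-smooth signal with a Haar filter bank and reconstructs it exactly. It assumes the PyWavelets library (the paper's own experiments used MATLAB 2020b); the signal and the 5% threshold are illustrative choices, not values from the paper.

```python
# Minimal sketch of the DWT as an analysis/synthesis filter bank, assuming PyWavelets.
import numpy as np
import pywt

t = np.linspace(0, 1, 1024)
x = np.sin(2 * np.pi * 5 * t)
x[512:] += 0.5                                 # a discontinuity, where ringing can appear

coeffs = pywt.wavedec(x, "haar", level=5)      # analysis: 5-level decomposition
flat = np.concatenate([np.ravel(c) for c in coeffs])
small = np.mean(np.abs(flat) < 0.05 * np.abs(flat).max())
print(f"coefficients below 5% of the peak magnitude: {small:.0%}")  # most of them

x_rec = pywt.waverec(coeffs, "haar")           # synthesis: perfect reconstruction
print("max reconstruction error:", np.max(np.abs(x - x_rec)))       # ~1e-15
```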
The aim of this work is to exploit both spatial and spectral features using suitable classifiers to provide better classification accuracy. By using 3D-DWT, it is possible to achieve a better relative accuracy of about 14.5% compared with the classifiers without DWT, which is the contribution of the proposed work. The rest of the work is organized as follows. Section 2 describes related work. The methodology is elaborated in Section 3. Section 4 deals with the results and discussion on the various HSI image classifications. Conclusions are given in Section 5.

2. Related Work

HSI has a very fine resolution of the spectral information with the use of hundreds of bands. These bands give more information about the spectral details of a scene. Many researchers have attempted to perform HSI image classification using various techniques. First, a classification task that attempts to assign a hyperspectral image pixel to one of many categories can be converted to several hyperspectral applications. State-of-the-art classification techniques were tried for this role, and appreciable success was achieved for some applications [1–4]. However, these approaches also tend to face some difficulties in realistic situations. Second, the available labelled training samples in HSI classification are usually few because of the high cost of image marking [5], which leads to the problem of high-dimension, low-sample-size classification. Third, considering the high spectral resolution of HSI, the same material may have very different spectral signatures, while different materials may share identical spectral signatures [6]. Various grouping methods have been studied to solve these problems. A primary strategy is to find the critical discriminatory features that aid classification, while reducing the noise embedded in HSIs that impairs classification efficiency. It was pointed out in [7] that spatial information is more relevant in terms of leveraging discriminatory characteristics than the spectral signatures in HSI classification. Therefore, a pixel-wise classification approach becomes a simple but successful way of applying this technique [4,8] after a spatial-filtering preprocessing stage. The square patch is a representative tool for the spatial-filtering strategy [9–11], which first groups the adjacent pixels by square windows and then extracts local window-based features using other subspace learning strategies, such as low-rank matrix factorization [12–17], dictionary learning [4,18] and subspace clustering [19]. Compared to the initial spectral signatures, the filtered characteristics derived by the square patch process have less intra-class heterogeneity and greater spatial smoothness, with much decreased noise. Apart from the spatial-filtering method, the spectral–spatial method, which incorporates spectral and spatial information into a classifier, is another common technique to manipulate spatial information. Unlike the pixel-wise classification approach, which does not consider spatial structure details, this approach integrates the local consistency of the neighboring pixel labels. Such an approach has been suggested to unify the pixel-wise grouping with a segmentation map [20]. In addition, 3D-DWT is also a common technique for a spectral–spatial approach that can be implemented to incorporate the local association of adjacent pixels into the classification process. The classification task can then be formulated as a maximum a posteriori (MAP) problem [21,22]. Finally, the graph-based segmentation algorithm [23] can efficiently solve this problem. As for the above two approaches to improving classification efficiency, most of the existing methods typically use only one of them, i.e., either the spatial-filtering approach in feature extraction or the spectral–spatial approach in the prior integration of the DCT into the classification stage. These strategies also cannot help us to achieve the highest results because of the lack of full use of spatial knowledge. A natural idea is to combine the two methodologies into a specific structure to further promote the state-of-the-art potential for this HSI classification challenge and to leverage spatial knowledge more comprehensively. A novel methodology is recommended in the proposed work for supervised HSI classification. Firstly, to produce spectral–spatial features, a spatial-filtering method is used. These techniques were used to extract either spatial or spectral features.
Then, 3D-DWT features are introduced for hyperspectral image classification, which allows the extraction of both spectral and spatial features. A 3D-DWT [24] is used in our work to extract spectral–spatial characteristics that have been validated to be more discriminatory than the original spectral signature. Secondly, to exploit the spatial information, the local correlation of neighboring pixels should also be introduced. In this work, the model captures local correlation using 3D-DWT, which assumes that adjacent pixels are likely to belong to the same class. In addition, the spatial features of the 3D-DWT are included in the SVM. The extensive experimental results show that such amelioration is insightful to this problem and can always guarantee a significant improvement, beyond state-of-the-art techniques, in scenarios both with relatively few labeled samples and with noisy input training samples. The author suggested an enhanced hybrid-graph discriminant learning (EHGDL) approach to evaluate the dynamic inner relationship of the HSI and to derive useful low-dimensional discriminant characteristics [25]. First, two discriminant hypergraphs are created, namely an intraclass hypergraph and an interclass hypergraph, based on intraclass and interclass neighbors, to expose the high-level structures of HSI. Then, a supervised locality graph is constructed with a class mark and a locality-restricted coding to represent the HSI's binary relationship. Therefore, it is expected that the proposed approach will further prompt the frontier of this line of study. The author developed a modern unsupervised dimensionality reduction algorithm called the local geometric structure feature learning (LNSPE) algorithm [26] for the classification of HSI and achieved better accuracy than other dimensionality reduction techniques. Lei J. et al. suggested a new approach for the reconstruction of hyperspectral anomaly images with spectral learning (SLDR) [27]. The use of spatial–spectral data will revolutionize conventional classification strategies, posing a huge obstacle in distinguishing between different forms of land-use cover and classification [28]. The following section deals with the methodology of this paper.

3. Methodology

Figure 1 depicts the schematic representation of the proposed system. For this work, the Indian Pines [29] and Salinas [30] airborne HSI datasets were used to experiment with the proposed model. A hyperspectral image is a vector in a high-dimensional space. The principal component analysis (PCA) method is used to reduce the subspaces in high-dimensional data, producing a computably compact image representation [31]. This represents all the pixels in an image using a new basis function along the directions of maximum variation [26]. The DWT maps an image into other basis functions. These basis functions must have properties such as scale capture, invertibility, orthogonality, pixel reduction, multi-scale representation and linearity. Initially, the 1D-DWT is applied on the rows, and then, the output of the 1D-DWT is fed along the columns. This creates four sub-bands in the first stage of the transformed space: Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH) frequencies [32]. The LL band is equivalent to a downsampled (by a factor of two) version of the original image; a minimal sketch of this first stage is given below. Then, these features are applied to the random forest [33], KNN [33] and SVM [34] classifiers, and the results are compared with the existing classification algorithms, i.e., decision tree, KNN and SVM, without the use of the wavelet features.
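This sketch assumes the PyWavelets library; the array is a random stand-in for a single band of an HSI cube.

```python
# One stage of 2D-DWT: filtering along rows and then columns yields four
# sub-bands; the approximation (LL) is a half-resolution version of the input.
import numpy as np
import pywt

band = np.random.rand(144, 144)               # stand-in for one spectral band
LL, (LH, HL, HH) = pywt.dwt2(band, "haar")    # approximation + three detail bands
print(LL.shape, LH.shape, HL.shape, HH.shape) # each sub-band is (72, 72)
```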

[Figure 1 flowchart: hyperspectral images (Indian Pines, Salinas Scene) → 3D-DWT feature extraction (Haar, Fejér-Korovkin and Coiflet filters) → machine-learning algorithms (SVM, random forest, KNN) → predicted classes (Indian Pines: 17 classes; Salinas Scene: 17 classes) → performance metrics (average accuracy, overall accuracy, kappa coefficient).]

Figure 1. Schematic representation of the proposed hyperspectral data processing flow.

3.1. Overview of DWT

In this section, we initially define the terminologies used in this paper and the procedure to extract features from HSI images using the three-dimensional discrete wavelet transform. Here, the HSI data are X ∈ R^(w×h×λ), where w and h are the width and height of the HSI images, and λ is the wavelength or band of the entire HSI image spectrum. The input training samples are x = {x1, x2, x3, ..., xn} ∈ R^(b×n), where b is the translating function and n should be less than w and h, and y = {y1, y2, ..., yn} ∈ K^n, where K is the number of classes. The wavelet transform is the best mathematical tool for performing time–frequency analysis and has a wide spectrum of applications in image compression and noise removal. The equation given below denotes the general wavelet transform:

$$W_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \varphi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{2}$$

where a is the scaling parameter, and b is the shifting parameter. φ(t) indicates the mother wavelet or kernel function, and it is used to generate all the basis functions as prototypes. The translation or shifting parameter b gives the time information and location of the input signal. The scaling parameter a gives the frequency and detail information of the input signal. The DWT is given in Equation (3):

$$W_{p,q}(x) = \frac{1}{\sqrt{a_0}} \int_{-\infty}^{+\infty} x(t)\, a_0^{-p/2}\, \varphi\!\left(\frac{t - q\,b_0\,a_0^{p}}{a_0^{p}}\right) dt \tag{3}$$

where a0 and b0 are the dyadic scaling and shifting parameters, respectively. In multistage analysis, the signal x(t) is recovered by the combination of different levels of the scaling and wavelet functions φ(t) and ψ(t). Generally, Equation (3) can be rewritten for a discrete signal x(n) as

$$x(n) = \frac{1}{\sqrt{M}} \sum_{d} C_{\psi}[i_0, d]\, \psi_{i_0,d}[n] + \frac{1}{\sqrt{M}} \sum_{i=i_0}^{\infty} \sum_{d} D_{\varphi}[i, d]\, \varphi_{i,d}[n] \tag{4}$$

where $x(n)$, $\psi_{i_0,d}[n]$ and $\varphi_{i,d}[n]$ are discrete sets of functions defined on $[0, M-1]$, a total of $M$ points. Since $\psi_{i_0,d}[n]$ and $\varphi_{i,d}[n]$ are orthogonal to each other, the inner product can be used to obtain the following wavelet coefficient equations:

$$C_{\psi}[i_0, d] = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, \psi_{i_0,d}[n] \tag{5}$$

$$D_{\varphi}[i, d] = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, \varphi_{i,d}[n] \tag{6}$$

In this work, we have used 3D-DWT feature extraction for hyperspectral images, which can be obtained from the expression given in Equation (3) using different 1D-DWTs. The 3D-DWT generates several features with different scales, orientations and frequencies. The Haar wavelet is used as the mother wavelet in this work. The scaling and shifting functions are represented by the filter bank (L, H), given by the low-pass and high-pass filter coefficients l(d) and h(d). The Haar wavelet has $l(d) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ and $h(d) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$; the Fejér-Korovkin and Coiflet filters are also used as wavelet filters in this paper and are compared against it. Three-dimensional DWT is performed for hyperspectral imaging by extending the standard 1D wavelet to the image's spectral and spatial domains, i.e., by applying the 1D-DWT filter banks in turn to each of the three dimensions (two for the spatial domain and one for the spectral direction). A tensor product is used to construct the 3D-DWT, as shown in Equations (7) and (8).

$$X(x,y,z) = (L_x \oplus H_x) \otimes (L_y \oplus H_y) \otimes (L_z \oplus H_z) \tag{7}$$

$$X(x,y,z) = L_xL_yL_z \oplus L_xL_yH_z \oplus L_xH_yL_z \oplus L_xH_yH_z \oplus H_xL_yL_z \oplus H_xL_yH_z \oplus H_xH_yL_z \oplus H_xH_yH_z \tag{8}$$

where $\oplus$ and $\otimes$ denote the space direct sum and the tensor product, respectively, and X denotes the hyperspectral image. Figure 2 demonstrates the decomposition of the 3D data into eight sub-bands after a single-level decomposition. The sub-band that has gone through the low-pass filter horizontally and in the vertical and spectral directions is called LLL; this band holds the approximations of the data cube. The opposite vertex of the data cube is taken by the HHH band, which went through the high-pass filter horizontally and in the vertical and spectral directions. The low- and high-pass filters along the x, y and z axes are expressed by L and H. The x and y directions, in practice, denote an image's spatial coordinates, and the spectral axis is z. The volume data are broken down into eight sub-bands after single-level processing, i.e., LLL, LLH, LHL, LHH, HLL, HLH, HHL and HHH, which can be divided into three categories: the approximation sub-band (LLL), spectral variation (LLH, LHH, HLH) and spatial variation (LHL, HLL, HHL). At each scaling level, the convolution between the input and the corresponding low- or high-pass coefficients is performed. In the third stage of the filter bank, eight separately filtered hyperspectral cubes are obtained. Then, these cubes are again filtered with the low-pass filter in all three dimensions to get the next level of wavelet coefficients, as shown in Figure 2. Here, the hyperspectral cube is decomposed into two levels, giving 15 sub-cubes $(U_1, U_2, U_3, \ldots, U_{15})$ as in Equation (9).

$$x_{i,j} = (U_1(i,j), U_2(i,j), U_3(i,j), \ldots, U_{15}(i,j)) \tag{9}$$

[Figure 2 flowchart: input X → low-pass (L) and high-pass (H) filtering → LL, LH, HL, HH → LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH sub-bands, with the approximation re-decomposed at the next level.]

Figure 2. Three-dimensional DWT flowchart.
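The sketch below reproduces this two-level decomposition, assuming the PyWavelets library rather than the authors' MATLAB implementation; 'haar', 'fk4' and 'coif1' are that library's names for the Haar, Fejér-Korovkin and Coiflet filters.

```python
# Sketch of the two-level 3D-DWT of Figure 2, assuming PyWavelets.
import numpy as np
import pywt

def dwt3d_subcubes(cube, wavelet="haar"):
    """Return the 15 sub-cubes U1..U15 of Equation (9): the seven level-1
    detail bands plus all eight level-2 bands from re-decomposing LLL."""
    lvl1 = pywt.dwtn(cube, wavelet, axes=(0, 1, 2))         # 8 bands, keys 'aaa'..'ddd'
    lvl2 = pywt.dwtn(lvl1["aaa"], wavelet, axes=(0, 1, 2))  # decompose LLL ('aaa') again
    subcubes = [band for key, band in sorted(lvl1.items()) if key != "aaa"]
    subcubes += [band for _, band in sorted(lvl2.items())]
    return subcubes

cube = np.random.rand(145, 145, 224).astype(np.float32)    # Indian Pines dimensions
print(len(dwt3d_subcubes(cube)))                           # 15
```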

Consider the relationship between neighboring cubes to prevent edge-blurring effects and loss of spatial information. The 3D wavelet features are derived from each local cube to integrate spatial and spectral information. In this work, the 3D wavelet coefficients of the entire cube are extracted after the first and second 3D-DWT levels. The LLL sub-band extracts the spatial information in the second-level 3D-DWT, and the LLH band extracts the spectral information of the HSI data. The first- and second-stage knowledge vectors of all eight sub-bands are concatenated to make a vector with a size of 15 coefficients (removing the approximation bands). This is called 3D data cube correlation, which helps to boost the compression. The basic concept is that a signal is viewed as a wavelet superposition. In comparison to 2D-DWT, which decomposes only in the horizontal and vertical dimensions, 3D-DWT has the advantage of decomposing volumetric data in the horizontal, vertical and depth directions [35]. Three-dimensional DWT is performed for hyperspectral images by applying one-dimensional DWT filter banks to the three spatio-spectral dimensions.

3.2. Hyperspectral Image Classification Using 3D-DWT Feature Extraction

The main objective of this work is to classify hyperspectral images using 3D-DWT features. Random forest [36] is a classification algorithm that uses many decision-tree models built on different sets of bootstrapped features. The algorithm works as follows: the training set is bootstrapped multiple times, and a new bootstrap sample is used to build each single tree in the ensemble. Whenever a tree node is split, a random subset of features from the training set is evaluated to find the best split variable. The random forest takes more time to compute and to validate the results, but its performance is good. To address this issue, the algorithm is compared with KNN and SVM.

K-nearest neighbors (KNN) is the second classifier adopted in the proposed method [36]. Given N training vectors with "a" and "o" labels as input training features in this two-dimensional feature space, the algorithm classifies the vector "c", where "c" is another feature vector to be evaluated. In this case, it identifies the k nearest neighbors regardless of labels. Consider the distribution of classes "a" and "o" as shown in Figure 3 with k equal to 3. The aim of the algorithm is to find the class for "c". Since k is 3, the three nearest neighbors of "c" must be identified. Among the three nearest neighbors, one belongs to class "a", and the other two belong to class "o". Hence, we have two votes for "o" and one vote for "a", and the vector "c" will be classified as class "o". When k is equal to 1, the first nearest neighbor alone defines the class. This algorithm finds the observations most similar to the one we must predict, from which we derive a good intuition of the possible answer by averaging the neighboring values. The k-value, an integer, is the number of neighbors that the algorithm must consider when finding the solution. The smaller the k parameter, the more the algorithm will adapt to the data we are using, risking overfitting but nicely fitting complex separating boundaries between classes. The larger the k parameter, the more it abstracts from the ups and downs of the real data, which yields nicely smoother curves between classes in the data. In KNN, prediction time is very slow, but training time is better than random forest. Even though its training time is better than the other algorithms, it requires more memory to process the HSI data. So, these algorithms are compared with SVM. A small numerical reproduction of this vote is given after Figure 3.

Figure 3. Example of K-nearest neighbors (KNN).
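The toy example below reproduces the vote in Figure 3 with k = 3, assuming scikit-learn; the coordinates are invented for illustration.

```python
# KNN vote with k = 3: among the three nearest neighbors of "c",
# two belong to class "o" and one to class "a", so "c" is labeled "o".
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.5, 1.5],                           # class "a"
              [3.0, 3.0], [3.5, 3.0], [3.0, 3.6], [4.0, 4.0]])  # class "o"
y = np.array(["a", "a", "o", "o", "o", "o"])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
c = np.array([[2.3, 2.3]])           # the unlabeled feature vector "c"
print(knn.predict(c))                # -> ['o'] (2 votes "o", 1 vote "a")
```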

The SVM [32] model is a supervised learning method that looks at data and sorts it into one of two categories. For example, in binary classification, SVM starts with a perceptron formulation and enforces certain constraints that try to obtain the separating line according to certain rules. The algorithm divides samples with the help of the feature space using a hyperplane. Equation (10) frames the SVM classification:

$$y\left(x^{T}w + b\right) \geq M \tag{10}$$

where x, w and b indicate the input feature vector, weight vector and bias, respectively, and y is the predicted class. The expression provides a single value whose sign is indicative of the class, because y can be only −1 or +1, and the product is equal to or greater than zero when the expression between parentheses has guessed the same sign as y. In the preceding expression, M is the constraint representing the margin. It is positive and takes the largest possible value to assure the best separation of the predicted classes. The weakness of SVM is the optimization process. Since the classes are seldom linearly separable, it may not be possible for all the labels of the training images to force the distance to equal or exceed the constraint M. In Equation (11), the epsilon parameter is an optimization parameter for SVM; it allows us to handle noisy or non-linearly separable data:

$$y\left(x^{T}w + b\right) \geq M(1 - \epsilon_i) \tag{11}$$

The epsilon (ε) value will be zero for correctly classified samples. If the value of epsilon is between 0 and 1, the classification is correct, but the point falls within the margin. If the epsilon value is greater than 1, misclassification occurs, and the point may be on the wrong side of the separating hyperplane. The general flowchart of 3D-DWT-based feature extraction and machine-learning classification is shown in Figure 4.

Algorithm 1: 3D-DWT-based feature extraction for hyperspectral image classification
Input: Airborne hyperspectral image data X ∈ R^(w×h×λ); K is the number of classes.
Output: Predicted labels y.
• The input feature vector X is converted into 3D-DWT features, U ∈ R^(w×h×15).
• A certain amount of the input features is selected randomly for training, and the remaining samples are used as testing samples.
• The KNN, SVM and random forest machine-learning algorithms are trained using the training samples.
• The labels for the testing samples are computed using all the machine-learning algorithms.
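A compact, hypothetical sketch of Algorithm 1 using scikit-learn follows: the 60/40 split matches the experiments in Section 4, the per-pixel feature assembly (sub-band energies, crudely upsampled) is a simplification of Equation (9), and all inputs are random stand-ins.

```python
# Sketch of Algorithm 1: 3D-DWT features -> 60/40 split -> three classifiers.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def dwt3d_subcubes(cube, wavelet="haar"):              # as sketched in Section 3.1
    lvl1 = pywt.dwtn(cube, wavelet, axes=(0, 1, 2))
    lvl2 = pywt.dwtn(lvl1.pop("aaa"), wavelet, axes=(0, 1, 2))
    return [b for _, b in sorted(lvl1.items())] + [b for _, b in sorted(lvl2.items())]

h = w = 145
cube = np.random.rand(h, w, 224).astype(np.float32)    # stand-in HSI cube
labels = np.random.randint(0, 17, size=(h, w))         # stand-in ground truth

feats = []
for band in dwt3d_subcubes(cube):
    e = np.square(band).sum(axis=2)                    # per-pixel sub-band energy
    f = int(np.ceil(h / e.shape[0]))                   # crude upsampling back to h x w
    feats.append(np.kron(e, np.ones((f, f)))[:h, :w])
U = np.stack(feats, axis=-1).reshape(-1, 15)           # U in R^(w*h*15), flattened
y = labels.reshape(-1)

X_tr, X_te, y_tr, y_te = train_test_split(U, y, train_size=0.6, stratify=y)
for model in (RandomForestClassifier(), KNeighborsClassifier(),
              SVC(kernel="poly", degree=3)):           # 3rd-order polynomial kernel
    print(type(model).__name__, model.fit(X_tr, y_tr).score(X_te, y_te))
```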

[Figure 4 workflow: airborne hyperspectral image → data preprocessing and feature extraction using 3D-DWT → training (60%) / testing (40%) split → machine-learning algorithms (random forest, KNN, SVM) → predicted labels y.]

Figure 4. Machine-learning Algorithm 1 workflow.

4. Results and Discussions

In this work, our proposed 3D-DWT+random forest, 3D-DWT+KNN and 3D-DWT+SVM methods are validated on two standard datasets, the Salinas Scene and Indian Pines hyperspectral data [37]. The results are compared with the same classifiers without using information from the DWT. All coding required for this research was carried out using MATLAB 2020b on a personal computer with a 4 GHz CPU and 8 GB RAM. All the above-mentioned algorithms are compared analytically using overall accuracy (OA), average accuracy (AA) and the kappa coefficient [38]. Generally, OA gives information about the number of correctly classified samples divided by the total number of samples. AA provides information about the average accuracies of the individual classes, whereas the kappa coefficient (κ) indicates the two errors (omission and commission) and the overall accuracy performance of the classifier. A confusion matrix, as shown in Equation (12), is utilized to calculate the performance metrics.

$$\text{Confusion matrix}(H) = \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1K} \\ \vdots & & \ddots & \vdots \\ H_{K1} & H_{K2} & \cdots & H_{KK} \end{bmatrix} \tag{12}$$

where $H_{ij}$ indicates the number of pixels, with $i$ denoting the actual class, $j$ the predicted class, and $K$ the total number of classes. The OA equation is shown in Equation (13).

$$OA = \frac{1}{N_{test\_samples}} \sum_{i=1}^{K} H_{ii} \tag{13}$$

where $N_{test\_samples}$ is the total number of testing samples. The average accuracy is computed as shown in Equations (14) and (15):

$$CA_i = \frac{H_{ii}}{N_i} \tag{14}$$

$$AA = \frac{1}{K} \sum_{i=0}^{K} CA_i \tag{15}$$

where $N_i$ is the total number of test samples in class $i$, which varies from 0 to $K$. The kappa coefficient can be calculated by

$$\kappa = \frac{OA - P_e}{1 - P_e} \tag{16}$$

where $P_e = \sum_{i=1}^{K} (P_{.i} \times P_{i.})$ is the projected agreement, with $P_{i.} = R_i / N_{test\_samples}$ and $P_{.i} = C_i / N_{test\_samples}$, where $R_i$ and $C_i$ represent the sums of the $i$th row and $i$th column of the confusion matrix, respectively. Higher values of OA, AA and $\kappa$ are expected from a good classifying algorithm.
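The small sketch below computes OA, AA and κ from a confusion matrix H following Equations (13)–(16); the helper name and the 3 × 3 example matrix are hypothetical.

```python
# OA, AA and kappa from a confusion matrix H, where H[i, j] counts samples
# of actual class i predicted as class j (Equations (13)-(16)).
import numpy as np

def oa_aa_kappa(H):
    n = H.sum()                                # total number of test samples
    oa = np.trace(H) / n                       # Equation (13)
    ca = np.diag(H) / H.sum(axis=1)            # per-class accuracies, Equation (14)
    aa = ca.mean()                             # Equation (15)
    p_e = ((H.sum(axis=0) / n) * (H.sum(axis=1) / n)).sum()  # projected agreement
    kappa = (oa - p_e) / (1 - p_e)             # Equation (16)
    return oa, aa, kappa

H = np.array([[50, 2, 1],
              [3, 40, 4],
              [2, 1, 45]])
print(oa_aa_kappa(H))
```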

4.1. 3D-DWT-Based Hyperspectral Image Classification Using Indian Pines Data

The Indian Pines dataset was collected by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor. This dataset was acquired over the north-western part of Indiana, USA in the year 1992. Each sample has 224 spectral bands with a spatial dimension of 145*145, with ground truth data as shown in Figure 5. It includes 16 different vegetation classes, as shown in Table 1.

Figure 5. Indian Pines hyperspectral image: (a) Original Indian Pines data cube (145*145*224). (b) The Indian Pines ground truth image. (c) The training region of interest (ROI). (d) The classification legend for different classes.

Table 1. Indian Pines hyperspectral dataset.

Label     Indian Pines Class              Training Dataset   Testing Dataset
Label 0   Background                      6465               4310
Label 1   Alfalfa                         27                 19
Label 2   Corn-no till                    857                571
Label 3   Corn-min till                   486                333
Label 4   Corn                            142                94
Label 5   Grass-pasture                   290                194
Label 6   Grass-trees                     438                292
Label 7   Grass-pasture-mowed             17                 11
Label 8   Hay-windrowed                   287                191
Label 9   Oats                            12                 8
Label 10  Soybean-no till                 583                389
Label 11  Soybean-min till                1473               982
Label 12  Soybean-clean                   356                237
Label 13  Wheat                           123                82
Label 14  Woods                           759                506
Label 15  Buildings-Grass-Trees-Drives    232                154
Label 16  Stone-Steel-Towers              56                 37
          Total                           12,602             8411

Initially, the 3D-DWT+random forest method is evaluated with labeled samples, using 60% of the data for training and 40% for testing. Principal component analysis (PCA) [39] is used to reduce the number of feature parameters to 15, to extract the significant information. The parameters considered in this work are (i) the gamma value and (ii) the epsilon value, which are tuned to boost the performance of the proposed experiment. All the machine-learning algorithms were validated by k-fold cross-validation (k = 10). Splitting the training dataset into k folds is part of the k-fold cross-validation process: the first k − 1 folds are used to train a model, while the remaining kth fold serves as a test set. In each fold, we can use a variant of k-fold cross-validation that maintains the imbalanced class distribution. It is known as stratified k-fold cross-validation, and it ensures that the class distribution in each split of the data is consistent with the distribution of the entire training dataset. This emphasizes the importance of using stratified k-fold cross-validation on imbalanced datasets to maintain the class distribution in the train and test sets for each model evaluation (a brief sketch follows the next paragraph). In SVM, a kernel function with a 3rd-order polynomial was used.

The classification results of all the methods on the Indian Pines dataset are shown in Figure 6, and the overall accuracy, average accuracy and kappa coefficients are reported and compared in Table 2, with the best obtained results highlighted. Three-dimensional DWT+SVM performs best in terms of all the performance metrics, with the largest improvement in this scenario; 3D-DWT+KNN and 3D-DWT+random forest trail somewhat behind (79.4% and 76.2% OA, respectively). As shown in Table 1, 12,602 training samples from the Indian Pines data are selected. The overall accuracy of 3D-DWT+SVM is 88.3%, which is 14.5% (in relative terms) higher than that of the traditional SVM (77.1%). The classification maps of 3D-DWT+SVM are more closely related to the ground truth map. The observed misclassifications are due to the consideration of only spectral information, which can be rectified by including both spectral and spatial information. Wavelet coefficients from the eight sub-bands capture the variations in the respective dimensions by implementing 3D-DWT over the spatial–spectral dimensions. The local 3D-DWT structure is represented in three ways by the energy of the wavelet coefficients.
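As referenced above, the following is a brief sketch of the stratified 10-fold validation, assuming scikit-learn; U and y are random placeholders for the 3D-DWT feature matrix and the class labels.

```python
# Stratified k-fold (k = 10) keeps the imbalanced class proportions in every fold.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

U = np.random.rand(1700, 15)                 # placeholder 3D-DWT features
y = np.random.randint(0, 17, size=1700)      # placeholder class labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
svm = SVC(kernel="poly", degree=3)           # 3rd-order polynomial kernel
print(cross_val_score(svm, U, y, cv=skf).mean())
```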

(a) Random Forest (67.7%); (b) KNN (60.2%); (c) SVM (77.1%); (d) 3D-DWT+Random Forest (76.2%); (e) 3D-DWT+KNN (79.4%); (f) 3D-DWT+SVM (88.1%).

Figure 6. Classification output maps obtained by all computing methods on Indian Pines hyperspectral data (OA are given in parentheses).

The confusion matrix for 3D-DWT+SVM on the Indian Pines hyperspectral data is shown in Figure 7. The hyperspectral image classification outputs obtained by all the methods show salt-and-pepper noise. This is because only spectral information is considered in both the KNN and random forest methods. When both the spectral and spatial information of the Indian Pines hyperspectral data are used, it results in a better OA. For a better understanding of calculating AA, a sample calculation is as follows: consider true class Label 2; the total number of test samples of class 2 is 571, and the predicted output of SVM for class 2 is 512, so AA = 512/571 = 89.6%, which is shown in Table 2 (third row, fourth column). In a similar way, we evaluated the average accuracy of each class. Table 3 shows the OA, AA and kappa coefficient over the entire set of classes with the different algorithms, computed with the help of Equation (14).


Table 2. Classification accuracy of all the computing methods in Indian Pines dataset. For each 3D-DWT method, the three values are for the Haar, Fejér-Korovkin (FK) and Coiflet (Coif) filters, respectively.

Class   Random Forest   KNN    SVM    AWFF-GAN   3D-DWT+RF (Haar/FK/Coif)   3D-DWT+KNN (Haar/FK/Coif)   3D-DWT+SVM (Haar/FK/Coif)
0       79.7            79.6   93.6   100        85.4 / 86.0 / 86.1         91.6 / 92.1 / 92.2          91.1 / 91.6 / 91.2
1       93.8            66.7   89.5   98.00      100 / 100 / 100            94.4 / 84.2 / 100           100 / 68.4 / 93.3
2       70.9            56.6   89.6   90.17      90.6 / 89.6 / 92.0         87.9 / 86.2 / 90.2          91.8 / 91.4 / 93.1
3       78.4            63.4   90.7   94.1       91.2 / 91.1 / 91.0         90.3 / 88.6 / 90.9          93.8 / 89.0 / 94.5
4       62.1            55.9   86.2   92.6       92.5 / 92.5 / 90.5         88.9 / 88.2 / 84.7          93.1 / 95.2 / 96.6
5       91.3            86.7   91.2   92.4       92.7 / 96.8 / 97.0         94.6 / 94.5 / 97.8          95.2 / 95.1 / 97.3
6       85.0            70.3   93.8   93.7       96.9 / 95.1 / 97.6         93.4 / 91.3 / 93.9          94.5 / 93.1 / 98.6
7       87.5            71.4   71.4   98.6       100 / 85.7 / 90.0          90.9 / 81.8 / 90.9          100 / 63.6 / 100
8       89.5            77.6   96.2   96.9       91.4 / 95.2 / 93.4         94.0 / 94.5 / 92.7          96.8 / 97.3 / 96.9
9       100             92.2   87.5   88.1       100 / 100 / 100            100 / 100 / 100             100 / 100 / 100
10      73.8            58.9   91.6   94.5       88.1 / 91.5 / 89.9         86.6 / 86.8 / 84.5          92.2 / 93.8 / 91.3
11      76.8            69.3   94.2   92.6       92.1 / 90.3 / 91.9         91.0 / 90.3 / 90.2          93.1 / 93.1 / 93.0
12      74.9            66.8   82.2   94.5       90.8 / 88.9 / 90.7         88.0 / 86.2 / 92.2          94.7 / 91.9 / 96.3
13      91.5            78.8   91.5   97.9       96.1 / 97.2 / 96.2         97.4 / 97.4 / 94.0          98.7 / 98.8 / 96.4
14      62.3            58.0   81.6   91.4       87.6 / 97.8 / 85.4         82.6 / 82.8 / 82.5          82.4 / 83.3 / 92.0
15      50.0            51.1   68.0   88.1       97.8 / 97.4 / 91.4         91.5 / 91.0 / 85.4          89.7 / 83.8 / 91.2
16      78.2            74.3   96.8   92.6       91.4 / 85.4 / 79.1         91.2 / 84.2 / 78.6          92.3 / 87.8 / 84.6

Figure 7. Confusion matrix for SVM in Indian Pines hyperspectral data.

The proposed 3D-DWT feature-based classifiers perform better compared to the existing algorithms. The overall accuracies achieved on the test data by SVM (77.1%), KNN (60.2%), random forest (67.7%), 3D-DWT+KNN (79.4%) and 3D-DWT+random forest (76.2%) are less than that of the 3D-DWT+SVM method (88.3%), by relative margins of 14.5, 46.7, 30.4, 11.2 and 15.9%, respectively. Table 3 presents the comparison of adaptive weighting feature fusion (AWFF) with a generative adversarial network (GAN) [40] against our proposed 3D-DWT-based KNN and SVM methods on our test dataset. From Table 3, we can say that our proposed method achieves some improvement in terms of the accuracy of the individual classes, because both spatial and spectral information are extracted to improve the performance of the classification. Only a few classes of AWFF-GAN achieve higher accuracies compared to our proposed method.


Table 3. OA, AA and kappa coefficient over the entire set of classes with different algorithms for the Indian Pines hyperspectral data. For each 3D-DWT method, the three values are for the Haar, Fejér-Korovkin (FK) and Coiflet (Coif) filters, respectively.

| Metric | RF | KNN | SVM | AWFF-GAN | 3D-DWT+RF (Haar) | 3D-DWT+RF (Coif.) | 3D-DWT+RF (F-K) | 3D-DWT+KNN (Haar) | 3D-DWT+KNN (Coif.) | 3D-DWT+KNN (F-K) | 3D-DWT+SVM (Haar) | 3D-DWT+SVM (Coif.) | 3D-DWT+SVM (F-K) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall Accuracy (OA) | 67.7 | 60.2 | 77.1 | 87.5 | 76.2 | 74.1 | 78.4 | 79.4 | 78.2 | 78.6 | 88.3 | 86.2 | 90.4 |
| Average Accuracy (AA) | 72.1 | 66.4 | 89.6 | 90.8 | 88.3 | 85.2 | 82.7 | 90.7 | 86.6 | 90.4 | 91.7 | 90.4 | 92.6 |
| Kappa Coefficient | 6.92 | 5.83 | 6.24 | 8.11 | 7.76 | 7.54 | 7.24 | 7.94 | 7.83 | 7.81 | 8.14 | 8.01 | 8.36 |

Initially, a 60:40 ratio is used for training and testing. Later, the effectiveness of our proposed method is evaluated with different numbers of training samples, namely 50, 55, 60, 65, 70, 75 and 80% of the supervised samples of each class from the Indian Pines HSI dataset. The overall accuracy for the different training ratios is shown in Figure 8. The plot shows that, out of the several approaches proposed, 3D-DWT+SVM achieves the best performance in all aspects, while 3D-DWT+KNN and 3D-DWT+random forest obtain the second and third highest OA.
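A minimal sketch of this training-ratio sweep, assuming the labelled pixels are available as a feature matrix X and a label vector y (the random stand-in data below are ours for illustration; the paper's own splitting code is not published):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))     # stand-in features (e.g., 3D-DWT coefficients)
y = rng.integers(0, 16, size=1000)  # stand-in labels for 16 classes

# Stratified split: each class contributes the same fraction of training samples.
for ratio in (0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=ratio, stratify=y, random_state=0)
    # Fit each classifier on (X_tr, y_tr) and record OA on (X_te, y_te);
    # this yields one point per ratio on a curve such as Figure 8.
```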

Figure 8. OA of all computing methods with different percentages of Indian Pines training samples.

4.2. 3D-DWT-Based Hyperspectral Image Classification Using Salinas Scene Hyperspectral Data

The Salinas Scene dataset was collected by the AVIRIS sensor over Salinas Valley, California, in 1998. The data have 224 spectral bands with a spatial dimension of 512 × 217, together with ground truth covering 17 landcover labels (including background), as given in Table 4. All the machine-learning algorithms were validated by k-fold cross validation; for the SVM, a 3rd-order polynomial kernel function was used. The confusion matrix of 3D-DWT+SVM for the Salinas hyperspectral data is shown in Figure 9, and the classification results of all the competing methods against the ground truth of the Salinas Scene hyperspectral data are shown in Figure 10. The overall accuracy, average accuracy and kappa coefficients on the test datasets are reported and compared in Table 5. For a better understanding, a sample calculation of the average accuracy is as follows: for true class label 2, the total number of test samples of class 2 is 3725 and the number correctly predicted by 3D-DWT+SVM (Fejer-Korovkin filters) is 3699, so the AA = 3699/3725 = 99.3%, which is shown in Table 5 (third row, thirteenth column). In a similar way, the average accuracy of every class is evaluated. Table 6 shows the OA, AA and kappa coefficient across all classes for the different algorithms, computed with the help of Equation (14).
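The per-class accuracy, AA, OA and kappa can all be read off the confusion matrix. A minimal sketch of these standard definitions (assuming, as in the example above, that rows of the confusion matrix are true classes and columns are predictions; the function name is ours):

```python
import numpy as np

def summarize(cm):
    """OA, per-class accuracy, AA and Cohen's kappa from a confusion matrix
    whose rows are true classes and whose columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n                        # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)     # correct / total, per true class
    aa = per_class.mean()                        # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, per_class, aa, kappa

# The class-2 example above: 3699 of 3725 test pixels predicted correctly.
print(f"{3699 / 3725:.1%}")  # 99.3%
```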

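The validation setup mentioned above, k-fold cross validation of an SVM with a 3rd-order polynomial kernel, can be sketched with scikit-learn as follows (the 5-fold choice and the stand-in data are our assumptions; the paper states only "K-cross validation"):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))     # stand-in 3D-DWT feature vectors
y = rng.integers(0, 16, size=500)  # stand-in class labels

# 3rd-order polynomial kernel SVM, scored with k-fold cross validation.
svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
scores = cross_val_score(svm, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```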

Figure 9. Confusion matrix for 3D-DWT+SVM (Fejer-Korovkin filters) in Salinas hyperspectral data.

From Figure 11 and Table 6, the best and second-best accuracies are highlighted: 3D-DWT+SVM performs best in terms of all performance metrics and shows the largest improvement in this scenario, while 3D-DWT+KNN and 3D-DWT+random forest trail behind it with 96.1% and 86.2% OA, respectively [41]. Table 4 shows that 159,206 samples from the Salinas Scene hyperspectral dataset were selected for training. The overall accuracy of 3D-DWT+SVM is 96.9%, a relative accuracy about 7.54% larger than that of traditional SVM (90.1%). The classification map of 3D-DWT+SVM is the most closely related to the ground truth map.

Table 4. Salinas Scene hyperspectral data.

| Label | Salinas Scene Classes | Training Dataset | Testing Dataset |
|---|---|---|---|
| 0 | Background | 85,463 | 56,975 |
| 1 | Brocoli green weeds 1 | 3014 | 2009 |
| 2 | Brocoli green weeds 2 | 5588 | 3725 |
| 3 | Fallow | 2964 | 1976 |
| 4 | Fallow rough plow | 2091 | 1394 |
| 5 | Fallow smooth | 4017 | 2678 |
| 6 | Stubble | 5939 | 3959 |
| 7 | Celery | 5369 | 3579 |
| 8 | Grapes untrained | 16,907 | 11,271 |
| 9 | Soil Vinyard develop | 9305 | 6203 |
| 10 | Corn senesced green weeds | 4917 | 3278 |
| 11 | Lettuce romaine 4wk | 1602 | 1068 |
| 12 | Lettuce romaine 5wk | 2891 | 1927 |
| 13 | Lettuce romaine 6wk | 1374 | 916 |
| 14 | Lettuce romaine 7wk | 1605 | 1070 |
| 15 | Vinyard untrained | 10,902 | 7268 |
| 16 | Vinyard vertical trellis | 2711 | 1807 |
| Total | | 166,655 | 111,103 |
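The counts in Table 4 are consistent with a 60:40 train/test split of each class, which can be checked directly (a small illustrative check; the three classes picked here are arbitrary):

```python
# Each (training, testing) pair from Table 4 should give ~60% training.
counts = {"Brocoli green weeds 1": (3014, 2009),
          "Grapes untrained": (16907, 11271),
          "Vinyard untrained": (10902, 7268)}
for name, (tr, te) in counts.items():
    print(f"{name}: {tr / (tr + te):.1%} of samples used for training")  # ~60.0%
```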


Figure 10. Classification output maps obtained by all computing methods on Salinas Scene hyperspectral data (OAs in parentheses): (a) Salinas ground truth; (b) 3D-DWT+Random Forest (86.2%); (c) 3D-DWT+KNN (96.1%); (d) 3D-DWT+SVM (96.9%).

Table 5. Classification accuracy of all the computing methods in Salinas Scene hyperspectral data.

| Class | RF | KNN | SVM | 3D-DWT+RF (Haar) | 3D-DWT+RF (Coif.) | 3D-DWT+RF (F-K) | 3D-DWT+KNN (Haar) | 3D-DWT+KNN (Coif.) | 3D-DWT+KNN (F-K) | 3D-DWT+SVM (Haar) | 3D-DWT+SVM (Coif.) | 3D-DWT+SVM (F-K) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 83.1 | 88.2 | 96.2 | 95.0 | 84.1 | 88.3 | 98.2 | 92.1 | 95.2 | 97.3 | 92.2 | 95.9 |
| 1 | 46.2 | 90.2 | 86.4 | 96.7 | 100 | 92.2 | 95.3 | 92.1 | 100 | 92.8 | 92.1 | 98.4 |
| 2 | 88.4 | 83.7 | 82.4 | 90.2 | 87.6 | 94.2 | 95.4 | 88.2 | 94.4 | 94.4 | 89.2 | 99.3 |
| 3 | 87.9 | 68.4 | 91.3 | 67.0 | 68.1 | 92.4 | 92.5 | 89.7 | 93.5 | 93.1 | 90.3 | 98.7 |
| 4 | 70.3 | 82.4 | 89.4 | 72.6 | 84.5 | 92.7 | 91.6 | 86.5 | 92.6 | 83.4 | 85.6 | 95.4 |
| 5 | 81.2 | 80.2 | 86.2 | 93.1 | 94.8 | 84.8 | 92.3 | 96.2 | 92.3 | 89.2 | 97.0 | 96.0 |
| 6 | 91.2 | 72.2 | 93.1 | 83.9 | 85.4 | 92.8 | 94.1 | 92.6 | 94.1 | 93.0 | 93.3 | 98.6 |
| 7 | 88.4 | 86.5 | 91.4 | 89.5 | 85.7 | 92.2 | 94.8 | 86.4 | 94.8 | 94.0 | 88.6 | 98.7 |
| 8 | 84.5 | 90.7 | 90.2 | 76.2 | 95.2 | 95.6 | 92.9 | 93.6 | 95.9 | 96.3 | 93.2 | 97.0 |
| 9 | 84.5 | 92.6 | 92.7 | 89.2 | 100 | 100 | 97.9 | 100 | 94.9 | 96.6 | 100 | 97.4 |
| 10 | 87.5 | 90.2 | 92.6 | 52.8 | 62.3 | 72.1 | 96.8 | 85.65 | 93.8 | 96.3 | 85.1 | 97.7 |
| 11 | 83.3 | 87.5 | 88.5 | 95.6 | 92.1 | 90.7 | 93.7 | 90.3 | 92.7 | 90.7 | 90.2 | 96.5 |
| 12 | 75.6 | 88.6 | 82.7 | 85.7 | 88.9 | 92.9 | 92.7 | 89.2 | 92.7 | 86.4 | 90.7 | 98.7 |
| 13 | 82.2 | 86.2 | 88.6 | 97.8 | 96.9 | 98.4 | 92.0 | 95.7 | 92.4 | 92.8 | 94.9 | 94.8 |
| 14 | 84.5 | 88.4 | 92.4 | 92.8 | 93.8 | 87.6 | 91.5 | 82.7 | 93.5 | 90.7 | 82.6 | 94.5 |
| 15 | 86.5 | 85.2 | 90.2 | 89.7 | 90.2 | 93.6 | 91.4 | 88.2 | 92.4 | 89.8 | 86.8 | 95.7 |
| 16 | 81.5 | 89.1 | 92.3 | 96.6 | 93.4 | 78.2 | 96.7 | 81.4 | 89.7 | 96.2 | 80.0 | 98.9 |

Table 6. OA, AA and kappa coefficient across all classes with different algorithms for Salinas Scene hyperspectral data.

| Metric | RF | KNN | SVM | 3D-DWT+RF (Haar) | 3D-DWT+RF (Coif.) | 3D-DWT+RF (F-K) | 3D-DWT+KNN (Haar) | 3D-DWT+KNN (Coif.) | 3D-DWT+KNN (F-K) | 3D-DWT+SVM (Haar) | 3D-DWT+SVM (Coif.) | 3D-DWT+SVM (F-K) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall Accuracy (OA) | 84.2 | 88.2 | 90.1 | 86.2 | 88.7 | 86.2 | 96.1 | 78.4 | 96.1 | 91.9 | 88.5 | 96.7 |
| Average Accuracy (AA) | 86.6 | 90.2 | 92.4 | 89.6 | 90.7 | 92.9 | 91.2 | 88.5 | 93.2 | 92.4 | 89.5 | 97.1 |
| Kappa Coefficient | 6.92 | 5.83 | 6.24 | 8.13 | 7.88 | 7.67 | 8.42 | 7.82 | 7.86 | 8.42 | 8.51 | 8.62 |

Figure 11. OA of all computing methods with different percentages of Salinas Scene training samples.

The hyperspectral image classification output is obtained by all the computing methods. Three-dimensional DWT performs better than the other methods, which is evident from the overall accuracy values: the OA of SVM is 90.1%, KNN is 88.2%, random forest is 84.2%, 3D-DWT+KNN is 96.1% and 3D-DWT+random forest is 86.2%.

Later, the effectiveness of our proposed method was evaluated with different numbers of training samples, namely 50, 55, 60, 65, 70, 75 and 80% of the supervised samples of each class from the Salinas Scene hyperspectral dataset. The OA for the different numbers of training samples is shown in Figure 11; from this, it is observed that the proposed 3D-DWT+SVM method achieves the best performance in all aspects, while 3D-DWT+KNN and 3D-DWT+random forest obtain the second and third highest OA. The computation times for training, and the prediction speeds on the testing data, are shown in Table 7; the training and testing times of the 3D-DWT-based methods are 4 to 6 times greater than those of SVM, KNN and random forest, because both spatial and spectral features are processed. Figures 12 and 13 show the average accuracy for different ratios of training and testing images of Indian Pines (50:50, 55:45, 60:40, 65:35, 70:30, 75:25) and Salinas Scene (50:50, 55:45, 60:40, 65:35, 70:30, 75:25), respectively. Increasing the number of training samples increases the average accuracy of each class, because the two are directly proportional. From the analysis of Figure 11, varying the ratio of training and testing samples does not change the overall accuracy very much, because of the unbalanced dataset.

Table 7. Computation time of training phase and prediction speed of testing data.
| Filter | Algorithm | Indian Pines Training Time (Sec) | Indian Pines Object Prediction Speed | Salinas Scene Training Time (Sec) | Salinas Scene Object Prediction Speed |
|---|---|---|---|---|---|
| Haar wavelet | Random Forest | 91.6 | 8000 | 98.3 | 8000 |
| Haar wavelet | KNN | 85.58 | 4100 | 93.5 | 4100 |
| Haar wavelet | SVM | 107.26 | 5000 | 130.42 | 5000 |
| Coiflet filters | Random Forest | 88.03 | 5900 | 95.42 | 5900 |
| Coiflet filters | KNN | 86.3 | 4200 | 92.66 | 4200 |
| Coiflet filters | SVM | 104.85 | 4800 | 110.35 | 4800 |
| Fejer-Korovkin filters | Random Forest | 103.45 | 8900 | 113.54 | 8900 |
| Fejer-Korovkin filters | KNN | 82.82 | 4200 | 101.23 | 4200 |
| Fejer-Korovkin filters | SVM | 118.88 | 5000 | 142.47 | 5000 |
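A sketch of how figures such as those in Table 7 can be measured, interpreting "object prediction speed" as test objects predicted per second (the classifier choice and the stand-in data below are ours for illustration; the paper does not publish its timing code):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 30))      # stand-in training features
y_train = rng.integers(0, 16, size=5000)   # stand-in training labels
X_test = rng.normal(size=(2000, 30))       # stand-in testing features

clf = RandomForestClassifier(n_estimators=100, random_state=0)

t0 = time.perf_counter()
clf.fit(X_train, y_train)                  # wall-clock training time, seconds
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
clf.predict(X_test)                        # prediction speed, objects/second
pred_speed = len(X_test) / (time.perf_counter() - t0)
print(f"training: {train_time:.2f} s, prediction: {pred_speed:.0f} obj/s")
```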



Figure 12. Average accuracy of different ratios of Indian Pines training samples (plot of different types of training and testing samples).

Figure 13. Average accuracy of different ratios of Salinas Scene training samples (plot of different types of training and testing samples).


5. Conclusions

In this work, a method of classifying hyperspectral image datasets is proposed by integrating the spatial and spectral features of the input images. Based on performance metrics such as overall accuracy, average accuracy and kappa coefficient, 3D-DWT+SVM achieves better performance in all aspects and has shown significant improvement in the results compared to the state-of-the-art techniques. Meanwhile, 3D-DWT+KNN and 3D-DWT+random forest attain the second and third highest OA on both of the widely used hyperspectral image datasets, Indian Pines and Salinas. As a result, the intrinsic characteristics of 3D-DWT+SVM can be effectively represented to improve the discrimination of spectral features for HSI classification. The comparison of different ratios of training and testing images provides sufficient information about the average accuracy of each class. Our future research will concentrate on how to efficiently represent spatial–spectral data and increase computational performance with the help of deep-learning algorithms.

Author Contributions: R.A. is responsible for the research ideas, overall work, the experiments, and writing of this paper. J.A. provided guidance and modified the paper. S.V. is the research group leader who provided general guidance during the research and approved this paper. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Elmasry, G.; Sun, D.-W. Principles of hyperspectral imaging technology. In Hyperspectral Imaging for Food Quality Analysis and Control; Academic Press: Cambridge, MA, USA, 2010; pp. 3–43.
2. Gupta, N.; Voloshinov, V. Hyperspectral imager, from ultraviolet to visible, with a KDP acousto-optic tunable filter. Appl. Opt. 2004, 43, 2752–2759. [CrossRef]
3. Vainshtein, L.A. Electromagnetic Waves; Izdatel'stvo Radio i Sviaz': Moscow, Russia, 1988; 440 p. (In Russian)
4. Engel, J.M.; Chakravarthy, B.L.N.; Rothwell, D.; Chavan, A. SEEQ™ MCT wearable sensor performance correlated to skin irritation and temperature. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 2030–2033.
5. Shen, S.C. Comparison and competition between MCT and QW structure material for use in IR detectors. Microelectron. J. 1994, 25, 713–739. [CrossRef]
6. Bowker, D.E. Spectral Reflectances of Natural Targets for Use in Remote Sensing Studies; NASA: Washington, DC, USA, 1985; Volume 1139.
7. Chang, C.-I. Hyperspectral Imaging: Techniques for Spectral Detection and Classification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 1.
8. Martin, M.E.; Wabuyele, M.B.; Chen, K.; Kasili, P.; Panjehpour, M.; Phan, M.; Overholt, B.; Cunningham, G.; Wilson, D.; Denovo, R.C.; et al. Development of an advanced hyperspectral imaging (HSI) system with applications for cancer detection. Ann. Biomed. Eng. 2006, 34, 1061–1068. [CrossRef] [PubMed]
9. Huang, W.; Lamb, D.W.; Niu, Z.; Zhang, Y.; Liu, L.; Wang, J. Identification of yellow rust in wheat using in-situ spectral reflectance measurements and airborne hyperspectral imaging. Precis. Agric. 2007, 8, 187–197. [CrossRef]
10. Wang, Y.; Duan, H. Classification of hyperspectral images by SVM using a composite kernel by employing spectral, spatial and hierarchical structure information. Remote Sens. 2018, 10, 441. [CrossRef]
11. Marconcini, M.; Camps-Valls, G.; Bruzzone, L. A composite semisupervised SVM for classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2009, 6, 234–238. [CrossRef]
12. Jian, Y.; Chu, D.; Zhang, L.; Xu, Y.; Yang, J. Sparse representation classifier steered discriminative projection with applications to face recognition. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1023–1035.
13. Li, Z.; Zhou, W.-D.; Chang, P.-C.; Liu, J.; Yan, Z.; Wang, T.; Li, F. Kernel sparse representation-based classifier. IEEE Trans. Signal Process. 2011, 60, 1684–1695. [CrossRef]
14. Hongyan, Z.; Li, J.; Huang, Y.; Zhang, L. A nonlocal weighted joint sparse representation classification method for hyperspectral imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 7, 2056–2065. [CrossRef]
15. Xiangyong, C.; Xu, L.; Meng, D.; Zhao, Q.; Xu, Z. Integration of 3-dimensional discrete wavelet transform and Markov random field for hyperspectral image classification. Neurocomputing 2017, 226, 90–100.

16. Jun, L.; Bioucas-Dias, J.M.; Plaza, A. Spectral–spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields. IEEE Trans. Geosci. Remote Sens. 2011, 50, 809–823.
17. Youngmin, P.; Yang, H.S. Convolutional neural network based on an extreme learning machine for image classification. Neurocomputing 2019, 339, 66–76.
18. Zhuyun, C.; Gryllias, K.; Li, W. Mechanical fault diagnosis using convolutional neural networks and extreme learning machine. Mech. Syst. Signal Process. 2019, 133, 106272.
19. Zhenyu, Y.; Gao, F.; Xiong, Q.; Wang, J.; Huang, T.; Yang, E.; Zhou, H. A novel semi-supervised convolutional neural network method for synthetic aperture radar image recognition. Cogn. Comput. 2019, 1–12. [CrossRef]
20. Zhengxia, Z.; Shi, Z. Hierarchical suppression method for hyperspectral target detection. IEEE Trans. Geosci. Remote Sens. 2015, 54, 330–342.
21. Xudong, K.; Li, S.; Benediktsson, J.A. Spectral–spatial hyperspectral image classification with edge-preserving filtering. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2666–2677.
22. Akrem, S.; Abbes, A.B.; Barra, V.; Farah, I.R. Fused 3-D spectral-spatial deep neural networks and spectral clustering for hyperspectral image classification. Pattern Recognit. Lett. 2020, 138, 594–600.
23. Zhang, X. Wavelet-Domain Hyperspectral Soil Texture Classification. Ph.D. Thesis, Mississippi State University, Starkville, MS, USA, 2020.
24. Wang, H.; Ye, G.; Tang, Z.; Tan, S.H.; Huang, S.; Fang, D.; Feng, Y.; Bian, L.; Wang, Z. Combining graph-based learning with automated data collection for code vulnerability detection. IEEE Trans. Inf. Forensics Secur. 2021, 16, 1943–1958. [CrossRef]
25. Mubarakali, A.; Marina, N.; Hadzieva, E. Analysis of feature extraction algorithm using two dimensional discrete wavelet transforms in mammograms to detect microcalcifications. In Computational Vision and Bio-Inspired Computing: ICCVBIC 2019; Springer International Publishing: Cham, Switzerland, 2020; Volume 1108, p. 26.
26. Shi, G.; Huang, H.; Wang, L. Unsupervised dimensionality reduction for hyperspectral imagery via local geometric structure feature learning. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1425–1429. [CrossRef]
27. Lei, J.; Fang, S.; Xie, W.; Li, Y.; Chang, C.I. Discriminative reconstruction for hyperspectral anomaly detection with spectral learning. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7406–7417. [CrossRef]
28. Luo, F.; Du, B.; Zhang, L.; Zhang, L.; Tao, D. Feature learning using spatial-spectral hypergraph discriminant analysis for hyperspectral image. IEEE Trans. Cybern. 2018, 49, 2406–2419. [CrossRef] [PubMed]
29. Nagtode, S.A.; Bhakti, B.P.; Morey, P. Two dimensional discrete wavelet transform and probabilistic neural network used for brain tumor detection and classification. In Proceedings of the 2016 Fifth International Conference on Eco-friendly Computing and Communication Systems (ICECCS), Manipal, India, 8–9 December 2016; pp. 20–26.
30. Aman, G.; Demirel, H. 3D discrete wavelet transform-based feature extraction for hyperspectral face recognition. IET Biom. 2017, 7, 49–55.
31. Pedram, G.; Chen, Y.; Zhu, X.X. A self-improving convolution neural network for the classification of hyperspectral data. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1537–1541.
32. Gao, H.; Lin, S.; Yang, Y.; Li, C.; Yang, M. Convolution neural network based on two-dimensional spectrum for hyperspectral image classification. J. Sens. 2018, 2018, 1–13. [CrossRef]
33. Slimani, I.; Zaarane, A.; Hamdoun, A.; Issam, A. Traffic surveillance system for vehicle detection using discrete wavelet transform. J. Theor. Appl. Inf. Technol. 2018, 96, 5905–5917.
34. Alickovic, E.; Kevric, J.; Subasi, A. Performance evaluation of empirical mode decomposition, discrete wavelet transform, and wavelet packed decomposition for automated epileptic seizure detection and prediction. Biomed. Signal Process. Control 2018, 39, 94–102. [CrossRef]
35. Bhushan, D.B.; Sowmya, V.; Manikandan, M.S.; Soman, K.P. An effective pre-processing algorithm for detecting noisy spectral bands in hyperspectral imagery. In Proceedings of the 2011 International Symposium on Ocean Electronics, Kochi, India, 16–18 November 2011; pp. 34–39.
36. Vinayakumar, R.; Soman, K.P.; Poornachandran, P.; Akarsh, S.; Elhoseny, M. Improved DGA domain names detection and categorization using deep learning architectures with classical machine learning algorithms. In Cybersecurity and Secure Information Systems; Springer: Cham, Switzerland, 2019; pp. 161–192.
37. Anand, R.; Veni, S.; Aravinth, J. Big data challenges in airborne hyperspectral image for urban landuse classification. In Proceedings of the 2017 International Conference on Advances in Computing, Communications, and Informatics (ICACCI), Manipal, India, 13–16 September 2017; pp. 1808–1814.
38. Prabhakar, T.N.; Xavier, G.; Geetha, P.; Soman, K.P. Spatial preprocessing based multinomial logistic regression for hyperspectral image classification. Proc. Comput. Sci. 2015, 46, 1817–1826. [CrossRef]
39. Morais, C.L.M.; Martin-Hirsch, P.L.; Martin, F.L. A three-dimensional principal component analysis approach for exploratory analysis of hyperspectral data: Identification of ovarian cancer samples based on Raman microspectroscopy imaging of blood plasma. Analyst 2019, 144, 2312–2319. [CrossRef]
40. Liang, H.; Bao, W.; Shen, X. Adaptive weighting feature fusion approach based on generative adversarial network for hyperspectral image classification. Remote Sens. 2021, 13, 198. [CrossRef]
41. Luo, F.; Zhang, L.; Du, B.; Zhang, L. Dimensionality reduction with enhanced hybrid-graph discriminant learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5336–5353. [CrossRef]