Desertification Detection using an Improved Variational AutoEncoder-Based Approach through ETM-Landsat Satellite Data

Item Type Article

Authors Zerrouki, Yacine; Harrou, Fouzi; Zerrouki, Nabil; Dairi, Abdelkader; Sun, Ying

Citation Zerrouki, Y., Harrou, F., Zerrouki, N., Dairi, A., & Sun, Y. (2020). Desertification Detection using an Improved Variational AutoEncoder-Based Approach through ETM-Landsat Satellite Data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1–1. doi:10.1109/ jstars.2020.3042760

Eprint version Publisher's Version/PDF

DOI 10.1109/JSTARS.2020.3042760

Publisher Institute of Electrical and Electronics Engineers (IEEE)

Journal IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Rights This work is licensed under a Creative Commons Attribution 4.0 License.

Download date 02/10/2021 14:37:08

Item License https://creativecommons.org/licenses/by/4.0/

Link to Item http://hdl.handle.net/10754/666315 This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 1

Desertification Detection using an Improved Variational AutoEncoder-Based Approach through ETM-Landsat Satellite Data

Yacine Zerrouki, Fouzi Harrou, Nabil Zerrouki, Abdelkader Dairi, Ying Sun,

Abstract— The accurate land cover change detection is I. INTRODUCTION critical to improve landscape dynamics analysis and He need for environmental protection, monitoring, and mitigate desertification problems efficiently. Desertification Tsecurity is increasing over the last decades [1]. Accurate detection is a challenging problem because of the high land cover change detection (LCCD) is intertwined with several degree of similarity between some desertification cases and fields. It has a vital role in improving the valuation of burned like-desertification phenomena, such as deforestation. This areas, shifting cultivation, monitoring pollution, assessing paper provides an effective approach to detect deserted deforestation, urban growth, and desertification. All over the regions based on Landsat imagery and Variational years, with the imminent need and the availability of data AutoEncoder (VAE). The VAE model, as a deep learning- repositories, various methods for change detection have been based model, has gained special attention in features devised in the remote sensing field [2-4]. This work focuses on extraction and modeling due to its distribution-free desertification detection, which is one of the most challenging assumptions and superior nonlinear approximation. Here, applications in the LCCD. a VAE approach is applied to spectral signatures for Detecting desertification within time is undoubtedly a critical detecting pixels affected by the land cover change. The factor in mitigating desertification propagation. Two significant considered features are extracted from multi-temporal factors are increasing the propagation of desertification images and include multi-spectral information, and no phenomena, namely human and natural factors. Human factors prior image segmentation is required. The proposed include unordered urban growth, deforestation, and incorrect method was evaluated on the publicly available remote sensing data using multi-temporal Landsat optical images exploitation of semi-vegetal and vegetal regions. On the other taken from the freely available Landsat program. The arid hand, natural factors are mainly intertwined with climatic region around Biskra in is selected as a study area change, humidity, and wind. These factors accelerate the sand since it is well-known that desertification phenomena movement from the desert to surrounding places, like cities, strongly influence this region. The VAE model was roads, and low-density vegetation areas. Such propagation of evaluated and compared with restricted Boltzmann desertification negatively affects the environment and the daily machines, deep learning model, and binary clustering life of the local population. Thus, the desertification problem algorithms, including Agglomerative, Birch, expected has received essential consideration by authorities and maximization, KMean clustering algorithms, and one-class governmental agencies in many affected countries, such as support vector machine. The comparative results showed Algeria. It is worth noting that the desertification problem is not that the VAE consistently outperformed the other models new in Algeria. Several projects (e.g., several tree plantation for detecting changes to land cover, mainly deserted programs and vegetation barrier by massive deforestation) have regions. This study also showed that VAE outperformed the been launched by Algerian authorities through ministries of state of the art algorithms. environment and Agriculture and Rural Development to stop or reduce mobile dunes' progression in the Sahara and Index Terms—desertification detection, feature extraction, Landsat sensors, VAE classification. preservation of vegetation regions. As a result of these projects, more than 3.3 million hectares have been preserved by these Y. Zerrouki is with Conservatoire National des Formations à l'Environnement, programs, with a potential area of 30 million hectares [5]. , Algeria. e-mail: [email protected] F. Harrou and Y. Sun are with King Abdullah University of Science and Accordingly, the evaluation and real estimation of regions Technology (KAUST) Computer, Electrical and Mathematical Sciences and affected by desertification or preserved from desertification are Engineering (CEMSE) Division, Thuwal, 23955-6900, Saudi Arabia. e-mail: necessary. [email protected]. N. Zerrouki is with Center for Development of Advanced Technologies, DI2M laboratory, Algiers, Algeria. e-mail: Over the past decades, various remote sensing-based [email protected]. A. Dairi is with University of Science and Technology of classification techniques have been developed for Oran-Mohamed Boudiaf (USTOMB), Computer Science department, SIMPA desertification detection. For instance, in [1], a method based laboratory, Oran, Algeria. on spectral mixture analysis (SMA) and a decision-tree classification is introduced and applied to time-series images

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2

from Landsat sensors over a time interval of 18 years (1993– Specifically, this method dynamically analyzes remote sensing 2011). Yanchi County in Ningxia, China, was selected as a data at different time points and correlation linking study area to analyze the grassland desertification situation. In desertification and driving factors. It has been shown that this [8], an innovative approach is proposed for change detection method shows a satisfying performance when applied for when dealing with the 3D shape and size of dunes obtained by predicting spatial-temporal desertification expansion in Hebei Digital Terrain Models (DTM) derived from stereo province. In [30], using MODIS data, an approach has been observations made by on-board sensors of Unmanned Aerial proposed to monitor desertification n inner Mongolia. The Vehicles (UAV). Multi-features based on topographic NDVI and vegetation overlay obtained from MODIS images parameters are also explored from synthetic aperture radar are employed to monitor the grassland desertification. Over the past two decades, some studies have been (SAR) to analyze the spatial and temporal movement of conducted to map and identify Algeria's desertification using desertification at Kubuqi Desert, China. In [11], Zanchetta et Remote Sensing data. In [9], a pixel-based approach based on al. exploited Landsat images satellite time series over a time the Maximum Likelihood algorithm is focused on the mobile interval of 30 years (1984–2013) based on the Tasselled Cap dune phenomenon in the region of In-Salah Adrar situated in transform. A new set of parameters for the Tasselled Cap southern Algerian Sahara. In [13], a detailed analysis of land transform are introduced to better fit dryland conditions and degradation and desertification dynamics in North Africa using have been applied to the city of Azraq Oasis, Jordan. In [12], a remote sensing imagery is presented. In this study, a coupled deep learning framework has been applied to oasis desert approach is proposed by combining the decision tree (DT) as recognition using UAV low altitude and remote sensing supervised classification with the Isodata algorithm as imagery. This study intends to propose a system to estimate unsupervised classification. This coupled approach is then plant communities’ degradation in oasis-desert to prevent applied to the principal components extracted from Knepper desertification and protect ecological security. The used ratios. Two regions were selected as study regions, namely recognition algorithm is based on a convolution neural network Oum Zessar (Tunisia) and Biskra (Algeria), and the achieved (CNN) using VGG16 and VGG19 models, where correct correct detection rate is 79.97% [13]. In [5], a temporal detection rates of 93.8 % and 96.73% were respectively evaluation of the desertification process based on the support obtained according to their experimental results. The studied vector machine per object classification and the change indices area is located in Taklamakan Desert, northwestern China. In as features for assessing the land cover degradation is proposed. [27], the phenomenon of desertification is associated to the Such is applied to study the desertification phenomenon in effects of climate change, a combination of two water indices, Biskra (Algeria) using optical Landsat image series over a time namely Normalized Difference Infrared Index (NDII) and Ratio interval between 1986 and 2011, where an overall accuracy of 95.15% was reached [5]. Drought Index (RDI), has been utilized in the moisture content Accurately detecting a land cover change is essential to estimation. The used data was obtained from the MODIS enhance landscape dynamics analysis and alleviate satellite sensor covering Ukraine territory. In [28], to assess desertification problems efficiently. The adopted approach in desertification sensitive areas, a fuzzy analytical hierarchy this work is within the pixel-based framework [10]. process is proposed. They exploit the digital elevation model Specifically, a pixel can encompass a large region in the case of (DEM) with a spatial resolution of 12 m and the Drader sector's desertification analysis, where the main goal lies in determining climate data, . whether the pixel has changed or not. It is worth underscoring In [14], a method combining several sensitive parameters for that some existing change detection systems do not consider desertification, namely normalized difference vegetation index reversible changes due to seasonal weather effects. Indeed, land (NDVI), topsoil grain size index (TGSI), modified soil adjusted cover change detection remains a challenging task because of vegetation index (MSAVI), and land surface albedo, is the complexity of monitored areas, which are becoming more proposed. Thresholding classification has been conducted in and more complicated by the presence of a high degree of recognition of different levels of desertification. As a study similarity between some thematic classes such as bare soil, area, the semi-arid and semi-humid regions of Naiman Banner, sand, and low vegetation. This paper's main goal is to provide a China were selected. The non-linear Albedo-MSAVI feature practical approach to detect deserted regions based on Landsat space model presented the highest result with an overall imagery and Variational AutoEncoder (VAE). Indeed, the precision of 90.1%. In [6], a similarity measure is developed for VAE model, as a deep learning-based model, has gained special desertification assessment based on NDVI, Topsoil Grain Size attention in features extraction and modeling due to its Index (TGSI), and land surface albedo. The extracted attributes distribution-free assumptions and superior nonlinear are then conducted to a Decision Tree (DT) classifier to assess approximation. Here, a VAE deep learning approach is applied the Hogno Khaan protected area's desertification evolution in to spectral signatures for detecting pixels affected by the land 1990, 2002, and 2011. In [7], to analyze desertification cover change. The arid region around Biskra in Algeria is evolution, residual trend, and rain-use efficiency (RUE) selected as a study area since it is well known that it is strongly methods are applied to monitor vegetation degradation at affected by desertification phenomena. The VAE model was western Rajasthan, India. A correlation analysis considering evaluated and compared with restricted Boltzmann machines RUE and NDVI index was presented between 1983 and 2013. (RBM), deep learning model, and binary clustering algorithms, In [29], a desertification monitoring method based on including Agglomerative, Birch, expected maximization, geographic information system (GIS), remote sensing, and KMean clustering algorithms, and one-class support vector global positioning system technologies has been introduced. machine.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3

Section II briefly describes different phases of desertification (where the sample to be classified is not matrix but pixel’s detection. VAE framework is described in Section III. The feature vector). The deep learning approach is assessed using experiments, including data collection, obtained results, and optical satellite images taken from the freely available Landsat some performance measures, are the object of Section IV. The program in Biskra (Algeria) for 19 years (2000-2019). conclusion and future work are presented in Section V. The global flowchart of the proposed desertification detection system is illustrated in Figure 1. System II. DESERTIFICATION DETECTION APPROACH implementation comprises four main parts: image acquisition, As discussed above, several desertification detection systems pretreatment, feature extraction, and desertification detection. have been proposed in the literature based on threshold-based Next, we will present the different elements of the proposed approaches and conventional supervised classifications. approach. Despite the ease of implementing threshold-based procedures, such methods can present difficulty in determining the optimal III. IMAGE PRE-PROCESSING AND FEATURE EXTRACTION threshold for multi-temporal images. Inappropriate selection of In optical remote sensing imagery, radiometric information detection threshold leads to misclassification, which makes the depends on several parameters, namely the surface slope and its recognition capacity relatively low. On the other hand, in orientation, the atmospheric components, and especially the conventional supervised classifications, several methods nature of the covered surface, which is generally the purpose of strongly depend on the training samples. These techniques can the analysis [20]. Towards this end, the pre-processing phase is fail due to the high degree of similarity between features of necessary and involves typically radiometric calibration, some desertification cases and like-desertification pixels. There atmospheric normalization, and geometric correction. are also several studies based on (i) object classification [5] or deep learning like CNN [12]. In these studies, each sample to be classified is composed of a region of pixel or a sub-image. In both approaches, the segmentation phase is needed for sample construction. However, it is well established that the segmentation task conducted in many different areas is error- prone [10]. In this work, an alternative approach based on pixel’s spectral signature classification is proposed. The considered features are extracted from multi-temporal images and include multi-spectral information, and no prior image segmentation is required. It is worth noting that several studies have focused on the use of deep learning formalism in image segmentation and pixel-level classification; one can cite fully convolutional neural networks (FCN) using different architectures like UNet and DeepLabV3 [25, 26]. Until recently, few studies have explored the deep learning technologies in the detection of desertification. Since the methodology has successfully reshaped multiple research areas, it should be considered to detect desertification areas. The second main contribution of this paper is the introduction of the deep learning VAE model, which is primarily designed as a generative to deal with computer vision applications (e.g., images classification problems), for desertification detection based on Landsat imagery. The choice of the VAE model is essentially motivated by its capability in feature extraction and separation. Furthermore, the VAE based on deep learning formalism has the advantage of distribution-free assumptions and superior nonlinear approximation, making it suitable for resolving complex recognition problems. Essentially, the main idea behind this study is to present an efficient and semiautomatic tool for land degradation assessment in an arid region with a pixel-based method. The proposed approach to desertification detection amalgamates the desirable features of both (i) feature generation paradigm (i.e., the multispectral and multi-temporal Fig. 1. Schematic illustration of the proposed desertification detection data provided by the on-board sensors led to an improved procedure. understanding of land degradation since the extracted features

present a physical meaning) and (ii) deep learning paradigm

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 4

In fact, the atmospheric correction is applied by retrieving factor of change in thematic images. For example, if sudden spectral reflectance from multispectral bands using QUick land cover changes like a disaster, the radiometric responses Atmospheric Correction (QUAC) [31,32]. The atmospheric exhibit a high degree of change. In contrast, changes like correction is applied on each pixel independently since each growing urban reflect a relatively smooth variation in spectral, pixel contains an independent measurement of atmospheric radiometric responses. Besides, certain false changes caused (i) water vapor absorption bands. Here, the geometric correction by weather seasonal effects like dryness regions or (ii) by is performed via ground control points. These control points are human impacts such as deforestation are often confused with invariant in the multi-temporal images, where 255 control desertification detections. This misdetection is due to the degree points have been used for the co-registration using the whole of similarity between features of some desertification cases and bands of the four images. The choice of the control points was like-desertification pixels. based on a visual inspection and our degree of knowledge about the study area. In this work, we associate for each multi-temporal spectral signature (t1, t2, t3, and t4) an observation sequence that characterizes radiometric responses in different channels (3 bands of the visible spectrum namely: red, green and blue). Since the set of remote sensing images is assimilated to an observation sequence, features extracted from separate images are then concatenated to construct the entire vector of attributes. Each pixel is characterized by twelve features (f1... f12), as illustrated in Figure 2.

IV. CLASSIFICATION BASED ON VAE AND TRADITIONAL METHODS Variational Autoencoders (VAEs) are an essential class of generative-based models effective in learning relevant from complex data in an unsupervised way [1, 16]. They combine key characteristics and advantages of variational inference and autoencoder to efficiently extract low-dimensional and relevant features in raw data in an unsupervised manner [19]. It is worth highlighting that the desirable properties VAEs consist of their dimensionality reduction ability, making them useful for compressing high dimensional data into a compressed representation. They can effectively be used for complex probability distribution approximation based on stochastic gradient descent [22]. The superior performance of VAE has been demonstrated in different domain-specific applications, including and urban networks modeling [23] and forecasting time series data [33].

Fig 2.Sample of remote sensing extracted features used to detect pixels affected by desertification phenomena.

Feature extraction is considered a crucial step in the recognition process, where the selected features directly affect the classification performances. On the one hand, extracted features have to be representative enough to obtain a reasonable classification accuracy, and on the other hand, not too complicated to allow reasonable processing time. In conventional remote sensing classification approaches, features are usually generated using software such as ENVI or eCongnition. The degree of variance in spectral, radiometric signatures can be exploited as a parameter to recognize the Fig 3. Basic Illustration of Variational Autoencoders. thematic class and desertification evolution over time. The In this paper, VAE is introduced for desertification detection amount of variation of pixel’s radiometric values is an efficient based on Landsat imagery. Figure 3 illustrates a schematic

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5

presentation of the structure of a VAE. softmax layer is then added at the end of the neural network to perform so-called Softmax Regression, where Softmax The VAE, as a variant of autoencoders, contains an encoder classifiers provide probabilities for each class label (Figure 4). and a decoder (i.e., neural networks). The purpose of the encoder is to encode the input data, X, into a latent space as a The desertification problem consists of two classes (binary distribution, q(z|x). The latent (termed hidden) dimension is classification): undeserted and deserted regions. A given data much decreased compared to the size of the input data. Indeed, point x is classified according to its corresponding higher the encoder is built to efficiently compress the input data into probability. Here, the softmax layer is trained in a supervised this lower-dimensional space. A sample, z∼q(z|x), can be way, where it is fed from the output of the VAE. The whole generated from the code distribution. On the other hand, the architecture used for desertification is the amalgamated VAE decoder's critical purpose, p(x|z), consists of generating the data and Softmax. A similar procedure is adopted with RBM, where point x from the input z. It should be emphasized that the RBM is trained in an unsupervised manner, and then a softmax reconstruction of data using the decoder results in some error of layer is added and trained in a supervised way. reconstruction, which is calculated and backpropagated via the network. This error is minimized in the training phase of the VAE model by the minimization of the mismatch between the original input and the encoded-decoded data. Towards this end, the loss function, L(q), is minimized by the VAE based on training data. The loss function of the VAE consists of the reconstruction loss (on the output layer) and a regularizer (on the latent layer). The reconstruction loss aims to make an effective encoding-decoding process. Simultaneously, a regularizer enables regularizing the latent space structure to get the encoder's distributions as similar as possible to a predefined distribution (Gaussian distribution is the most commonly used in the literature).

핷(q) = E z ∼ q(z|x) ( log p(x|z) ) – KL( p(x|z) || p(z) ), (1)

Reconstruction term Regularization term

The goal is to design a VAE model such that the reconstruc- tion loss converges closer to zero. The reconstruction term in (1) allows strengthening the decoder’s ability to learn data Fig.4 VAE-based classification procedure. reconstruction. On the other hand, the regularization term is given by the Kulback-Leibler (KL) divergence that separate the Here, the greedy layer-wise unsupervised plus fine-tuning was encoder distribution function (q(z|x)) and the latent variable applied to RBM and VAE. prior (z, p(z)). One way to minimize the loss function with respect to the encoder’s parameters and decoder is to use the B. Traditional methods gradient descent procedure in the training stage. Of course, the For comparison, we present several machine learning loss function is minimized to assure obtaining a regular latent algorithms applied to desertification detection problematic, space, z, and satisfactory sampling of new data point based on namely: k-means, mean shift, Birch, Agglomerative, Em, RBM, z ∼ p φ (z) [24]. DNN, and OCSVM.

1) k-means A. VAE-based classification strategy The k-means formalism was initially proposed for data The dataset used for training and testing are normalized and clustering, and has been widely exploited in the fields of vector then used as input for the VAE. We normalize the data, x, by quantization, originally from signal processing [19]. The Min-Max Scaler as follow: principle of the k-means is to partition n observation samples 푥−푥 (x1, x2, ..., xn), into k clusters k (≤ n) (classes) 푥̃ = 푚푖푛 , (2) 푥푚푎푥−푥푚푖푛 C = {C1, c2, ..., Ck} in which each sample fits to the cluster where 푥푚푖푛 and 푥푚푎푥 denotes the minimum and maximum of with the nearest mean (cluster centers), serving as a model of the smoothed data, respectively. The dataset is then split into the cluster. Here, the k-means algorithm was selected as a training (80%) and testing dataset (20%), where VAE is trained comparative classifier since it adapts scales to large data, adapts through an unsupervised manner without data labeling. This to new examples, and guarantees convergence. phase is called pre-training, where the aim is to learn and Formally, the objective is to find: discover features from the training dataset. Specifically, VAE is adopted as a feature extractor for desertification detection. A ∑푘 ∑ ‖ ‖2 ∑푘 | | 푎푟𝑔푚𝑖푛퐶 푖=1 푥휖퐶푖 푥 − µ푖 = 푎푟𝑔푚𝑖푛퐶 푖=1 퐶푖 푉푎푟 퐶푖 (3)

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6

where μi is the mean of samples in Ci. This is equivalent to activation function usually sigmoid. Training a DNN to perform minimizing the pairwise squared deviations of points in the a classification was challenging till the development of new same cluster: training methods, such as lire-wise (in the case of unsupervised learning), and also the design of new optimizers used by the ∑푘 1 ∑ ‖ ‖2 back-propagation process, such as Stochastic gradient descent 푎푟𝑔푚𝑖푛퐶 푖=1 푥,푦휖퐶푖 푥 − µ푖 . (4) 2|퐶푖| (SGD), Rmsprop and Adam, make possible successfully train a DNN for a classification purpose or forecasting problems 2) Mean shift [16]. The training approach used, for example, in the case of The Mean shift formalism was initially proposed by forecasting is supervised, where the network learns the mapping Fukunaga and Hostetler and has been widely exploited in of multiple data points sequence to a single point, which is the classification and clustering [19]. The mean shift is considered next value, this situation resembles finding the following word as a non-parametric feature-space analysis technique for in the text processing field. This task can also be seen as a locating a density function's maxima. The main idea behind the dimensionality reduction that DNN learns to achieve. Mean shift is to place a kernel on each point in the data set. There are several types of kernels, but the most popular one is 5) RBM the Gaussian kernel. Depending on the kernel bandwidth A restricted Boltzmann machine (RBM) is a generative parameter, the resulting density function will change. In other stochastic artificial neural network. The standard type of RBM words, since the Mean shift is an iterative method, it operates has binary-valued (Boolean/Bernoulli) hidden and visible units. by making a copy of the original data and freezing the original Weights’ matrix associates the connection between hidden and visible units. Hidden and visible layers are characterized by points. The copied points are iteratively shifted until they join their energy, which can be defined by: with one of the formed kernels. In the case of Gaussian kernel 퐸(푣, ℎ) = − ∑ 푎 푣 − ∑ 푏 ℎ − ∑ 푣 ℎ 푤 K, the weighted mean of the density is determined as follows, 푖 푖 푖 푗 푗 푗 푖 푖 푗 푖푗 where 푣 and ℎ are respectively binary states of visible and ∑ 퐾(푥 −푥)푥 푥푖휖푁(푥) 푖 푖 hidden units. While a and b represent biases, and w represents 푚(푥) = , (5) ∑푥 퐾(푥푖−푥) weight factors between units. RBM structure assigns a 푖휖푁(푥) probability to each connection vector of visible and hidden where N(x) is the neighborhood of a given sample x, the units: difference m(x)-x is called the mean shift. The value of m(x) is 1 푃(푣, ℎ) = (6) then iteratively re-estimated until convergence. 푍 푒−퐸(푣,ℎ) The mean shift algorithm was selected as a comparative 푍 represents the summation over connections between two algorithm due to its simplicity to solve problems, where results layers. are controlled only by one parameter (the kernel bandwidth 6) OCSVM value). Unlike several existing algorithms, such as k-means, the Unlike the majority of supervised classifiers, where the training number of clusters is needed as an input parameter, not known phase is applied on all pixels belonging to all classes (in our case in certain clustering scenarios. The main shortcoming of mean- desertification and other land cover changes), One-Class shift clustering is its slowness in data separation, making the Support Machine (OCSVM) algorithm is only trained on parallelizable not suitable in some cases, as each sample could positive information, corresponding to the magnitude of pixels be shifted in parallel with all the other samples. from the desertification-free data and then detect rare or abnormal events, related to desertification cases, where they will be rejected and labeled as outliers [21]. In other words, the 3) BIRCH OCSVM classifier defines the boundary of Γ, the minimum Balanced Iterative Reducing and Clustering Using Hierarchies volume region enclosing (1−ν) l observations. Hyper-parameter (BIRCH) is an unsupervised method based on tree structure to ν, in [0; 1], controls the fraction of observations that are allowed classify samples [19]. The different nodes of each tree are to be out of Γ (outliers, desertification incidents in our case). composed of separate Clustering Features (CFs), hence the 7) Agglomerative clustering appellation Clustering Feature Tree (CF Tree). This algorithm's The agglomerative clustering is the most commonly used main idea is to search the leaf node nearest to the sample (to be algorithm in hierarchical clustering formalism [17]. The main classified) and the closest leaf CF node from the root node. If idea is to regroup samples in clusters based on their similarity. the threshold condition corresponding to the radius of this CF As the first phase, each sample is trained as a singleton cluster, node's hyper-sphere is verified, then the sample is accepted, and and pairs of samples are then grouped into clusters. These all CFs on the path are updated. The main principle advantage newly formed clusters are linked to each other via linkage of this algorithm is that (i) the number of classes is not deeded function to create bigger clusters. This process is iteratively as an input parameter; (ii) only one examination of the training repeated until all original data samples are linked together in a samples is needed to construct the CF tree, which makes it very hierarchical tree. The final result is a tree-based representation advantageous in terms of processing time. named dendrogram. 4) Deep Neural Networks (DNN) 8) Spectral clustering Deep Neural Networks (DNN) is similar to a multilayer In spectral clustering, samples to be classified are considered perceptron (MLP), a class of feed-forward artificial neural as nodes of a graph [19]. So, clustering is applied as a graph network (ANN) with many hidden layers to perform a no-linear partitioning problem. The spectral clustering technique is based transformation of a given input, which is achieved through on the spectrum (eigenvalues) for defining similarity between

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7

samples. In other words, spectral clustering consists of (i) chance classification, and an AUC > 0.9 presents a significant computing a similarity graph, (ii) projecting samples’ features classification. We also evaluated the classification efficiency into a low-dimensional space, and finally (iii) creating borders using the F-measure (called F1Score). This indicator reflects of clusters. All samples are connected (xi, xj) to construct a the influence of the random classification rate. F-measure is similarity graph, and edges are formed according to weights. calculated using both recall (r) and precision (p). The recall 2 ‖푥 −푥 ‖ corresponds to the rate of a true positive number (tp) to the total 푆(푥 , 푥 ) = exp (− 푖 푗 ) (7) 푖 푗 2휎2 numbers of samples in the positive class. In contrast, the recall corresponds to the rate of the true positive number (tp) to the where the parameter σ controls the width of the neighborhoods. sum true positive and false negative samples. The F-measure is The similarity graph presents the local neighborhood defined as follows: relationships between samples. The calculation of eigenvalues 푝 ×푟 is then used in the projection of samples into a low-dimensional 퐹 = (1 + β2) , (10) space. For clusters creation, eigenvalues are also used to assign (β2×푝)+푟 weights to each node. The main advantage of spectral clustering 푡푝 푡푝 where 푝 = and 푟 = . is its capacity to correctly cluster samples (even if they are far 푡푝+푓푝 푡푝+푓푛 from each other) due to the dimension reduction step. 9) Expectation-maximization Since recall and precision ratios vary between 0 and 1, the F- The expectation-maximization (EM) approach is primarily measure associates them into a single measure for efficiency. In developed to classify samples (unobserved variables) with the present paper, the value of β is fixed to 1. latent variables using maximizing likelihood estimation [18]. Namely, the EM algorithm approximates the samples’ V. EXPERIMENTS distributions with Gaussian Mixture distributions. For A. Data collection observed sample X, the aim is determining the value of Φ that maximizes the log-likelihood, L(Φ)=logP(x|Φ). By using latent To investigate the introduced strategy's performance, we use variables Z the log-likelihood can be expressed multi-temporal satellite images from a publicly available L(Φ|X,Z)=logP(X|Φ, Z). It is assumed that the observed archive of the United States Geological Survey (USGS). Such images cover the same area of Biskra, but they are collected at samples X have Gaussian distributions, described by the different dates: 2000, 2006, 2011, and 2019 respectively. attributes Φ={m,σ}, and the latent variables are Gaussian- Indeed, the time interval of several years between the collection selector random variables. Of course, the EM algorithm finds of images enables us to observe desertification propagation and elements belonging to a cluster by calculating probabilities recognize different relevant changes, such as new built-up using Gaussian Mixture distributions. The probability that regions, building operations, and planting large groups of trees. maximizing the likelihood of the observed sample is utilized to It is worth to point out that the satellite scenes were chosen with determine the cluster. In other words, every observation is low cloud content to facilitate sand movement monitoring. The belonging to all clusters with a certain probability. presence of clouds modifies the observed pixels' radiometric C. Classification performance measures values and mask relevant information, which increases the false detection rate. The four images used in this study have been Several statistical tools can be used to assess the performance acquired with a resolution of 30 meters per pixel. The main of a classification method; one can cite the overall accuracy, characteristics of these images are listed in Table I. precision, F-measure, or graphical tools like the area under curve (AUC) and receiver operating characteristic (ROC) [10]. TABLE I Overall accuracy is the commonly used measure that is FEATURES OF THE CONSIDERED LANDSAT IMAGES. calculated from diagonal elements of the confusion matrix. It Processing Acquisition Landsat Path/ Solar Solar represents the ratio between pixels correctly classified and the Software Data type Date ID N Row azimuth elevation total number of pixels. The ROC curve is the most used Version graphical performance measure. It represents variation between LPGS OLI_TIRS 03/09/2019 LC 08 194/036 139.55 56.76 sensitivity, related to the true positive rate (TPR), and 13.1.0 L1TP specificity, related to the false positive rate (FPR). LPGS 05/03/2011 LT 05 194/036 TM_L1TP 143.02 42.13 12.8.2 LPGS 19/02/2006 LT 05 194/036 TM_L1TP 145.86 37.47 푁푢푚푏푒푟 표푓 푐표푟푟푒푐푡푙푦 푐푙푎푠푠푖푓푖푒푑 푑푒푟푡푖푓푖푒푑 푝푖푥푒푙푠 13.0.1 TPR = (8) 푇표푡푎푙 푛푢푚푏푒푟 표푓 푑푒푠푖푟푡푖푓푖푒푑 푝푖푥푒푙푠 LPGS 15/04/2000 LE 07 194/036 TM_L1TP 133.81 57.49 12.8.3 푁푢푚푏푒푟 표푓 푚푖푠푠푐푙푎푠푠푖푓푖푒푑 푛표푛−푑푒푠푒푟푡푖푓푖푒푑 푝푖푥푒푙푠 FPR = (9) 푇표푡푎푙 푛푢푚푏푒푟 표푓 푛표푛−푑푒푠푒푟푡푖푓푖푒푑 푝푖푥푒푙푠 Figure. 5 depicts the three bands' composition (red, green, For performance comparisons, the AUC is a good index for and blue) of the four satellite images captured at different dates. assessing classification accuracy. Generally, an AUC of 1 The area selected in this study is an arid region around Biskra presents a correct classification, an AUC of 0.5 presents a in the Algerian northeastern and is often called the Door of the

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8

Desert. This area is placed between dunes (Occidental Grand B. Classification Results and Discussion Erg) in the south direction and the Mountains of Aurés in the In this study, to assess the proposed methods' classification north direction. performance, we selected only a subarea of the global scene. This subarea is used for further analysis, which represents 400x400 pixels. Since the classification is based on pixels, where each pixel represents the sample to be classified, which gives us a total of 160.000 samples to classify. Unlike object- based classification, where the number of samples will be drastically reduced (since each object contains numerous pixels grouped during the segmentation phase according to their homogeneity). The training dataset used in this study comprises 10000 pixels, and 2000 for the testing which means that the amount of data is sufficient to build a robust model.

We evaluated the VAE classification and compared it with several machine learning algorithms. In the training phase, each approach's optimal parameters have been carefully selected (parameter tuning phase) based on the training dataset. Three categories of models are considered in this study, namely: deep learning (i.e., VAE, RBM, DNN), clustering (i.e., Birch, EM, KMean), and machine learning (OCSVM) to deal with binary

classification problem. The proposed approach, VAE-based and RBM, are trained in an unsupervised approach first, and later, a fine-tuning is performed to optimize the model

parameters, while the DNN is trained in a supervised approach. Furthermore, the clustering methods are trained to classify data Fig. 5 Composition of three bands (red, green, and blue) of the four satellite points in two clusters (normal and abnormal). The OCSVM is images captured at different dates. a variant of classic SVM that can deal with a one-class The prevailing wind direction travels from southeast to classification problem; it is trained with only normal northwest along with the Sahara Atlas. The total value of observations compared to the traditional SVM. The classifiers’ precipitation is about 135 mm by year, while wind speed is parameters used in this study are summarized in Table II. about 15 m/s. This region suffers remarkably from the desertification phenomena. Specifically, the wind direction TABLE II The parameters selected for each classification method influences sand's transport and its accumulation and the local topographic factor present in the sloping terrain. The Classifier The optimal parameters geographic location of the study area (around Biskra) is illustrated in Figure 6. Agglomerative Clustering linkages = 'ward' Birch threshold = 0.01

EM Covariance type = 'full' Kmeans Max iterations = 300

Meanshift Max iterations = 300 Spectral clustering gamma=1., affinity = 'rbf'

Hidden = 24 , epochs=1000 , loss

RBM = binary_crossentropy , lr=0.0005 03 layers , layers :[12,8,1] DNN Epochs = 1000 , Loss = binary cross entropy , lr=0.001 Intermediate dimension = 24 VAE Epochs=1000 , Loss = binary cross entropy lr=0.0005

Fig. 6 The study area is located in Biskra city, Algeria.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9

The hyper-parameters for DNN, RBM, and VAE were achieve reasonable recognition rates by achieving respectively, selected using the grid search approach. The VAE model is an AUC of 0.961, 0.939, and 0.902. Overall, the VAE remains composed of five layers (Table III). Table IV illustrates the significantly better compared to all of these classifiers (i.e., impact of the hyper-parameters' selected values in the proposed AUC=0.98). The VAE algorithm outperformed the other architecture on the VAE detection accuracy. classifiers because VAE has a different strategy in detecting desertification incidents. The VAE algorithm, based on deep TABLE III learning formalism, has the advantage of distribution-free THE HYPER-PARAMETERS OF THE PROPOSED METHOD assumptions and superior nonlinear approximation, which makes it suitable for resolving complex recognition problems like separating desertification cases from like-desertification Layer Dimension pixels. Furthermore, the RBM algorithm is based on Contrastive Divergence, which necessitates sampling from Monte Carlo

Input 12 Markov Chain. The partition function in the model’s energy makes logarithmic computing likelihood under the model Intermediate 24 relatively complex. In contrast, in the VAE model, one can track cross-entropy minimized by the model’s learning algorithm Variance 12 (back-propagation of classification errors).

TABLE V Mean 12 EVALUATION METRICS OF EACH METHOD.

Latent space (Z) 12 Model TPR FPR Accuracy Precision F1Score AUC

Softmax 02 VAE 0.971 0.011 0.98 0.989 0.980 0.98

RBM 0.968 0.046 0.961 0.955 0.961 0.961

TABLE IV DNN 0.96 0,082 0.939 0.921 0.940 0.939 IMPACT OF LATENT AND INTERMEDIATE SPACES ON THE CLASSIFICATION ACCURACY OF THE VAE MODEL Agglomerative 0.244 0,09 0.577 0.731 0.366 0.577

Birch 0.246 0,168 0.539 0.594 0.348 0.539

EM 0.42 0.236 0.592 0.640 0.507 0.592

KMean 0.747 0.667 0.54 0.528 0.619 0.54

When applied to the testing data, the accuracy metrics of the OCSVM 0.804 0 0.902 1 0.891 0.902 eight considered methods are summarized in Table V. Table V can reach the highest overall accuracy of 0.98 and AUC of 0.98. That means the VAE model enables a robust identification of Figure 7 illustrates the visual evolution of sand movement pixel affected by desertification with a reduced number of false over a different period. Pixels affected by desertification negatives (i.e., FPR=0.011) and high detection rate (i.e., phenomena are highlighted with yellow dots. We observe from TPR=0.971) even in challenging situations corresponding to Figure 7 that several regions were invaded by sand during the like-desertification cases (e.g., deforestation pixels). On the studied period (2000-2019). These regions mainly belonged to other hand, the binary clustering algorithms, including Birch, the thematic classes: bare soil, less dense vegetation, or even Agglomerative, EM, and KMean classifications, achieve low rocky. accuracies compared to the rest of the methods. Agglomerative, Birch, EM, KMean achieved a poor classification performance by reaching an AUC of 0.577, 0.539, 0.592, 0.54, respectively. One reason for this is that applying the binary clustering methods for clustering a relatively large dataset is challenging. Specifically, in Birch classification, the algorithm is based on the clustering feature tree generation, where the obtained clusters are slightly different from real classes’ distributions. Furthermore, several clustering methods (Agglomerative, Birch, and KMean) present some misclassifications when the data distribution is not similar to a hyper-sphere (convex), making them unsuitable for this problem. Table V also indicates that RBM, DNN, and OCSVM can

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10

Fig. 7 Evolution of pixel affected by desertification over different period of time (2000-2019).

The second observation is that the main phase of sand propagation occurred between the years 2000 and 2006. This fast movement of sand during this period can be related to several natural and human aspects. For human factors, one can cite (i) unordered urban growing and (ii) incorrect exploitation of semi-vegetal and vegetal regions. On the other hand, natural aspects are principally related to (i) wind direction, which affects the transport of sand and its accumulation, and (ii) the local topographic present in the sloping terrain.

Figure 8 shows a visual evaluation of desertification detection using the eight considered methods when applied to the image acquired in 2019. The yellow area describes the spatial distribution of regions affected by desertification. This visual evaluation confirms the performance and the robustness of each detection method and illustrates its impact on the detected surfaces. Fig. 8 Visual evolution of desertification detection using the eight considered methods.

Besides, we also evaluated the performance of the VAE-based detection approach with some existing works based on decision- tree [1], SVM [5], CNN [12], decision tree, and Isodata combination [13] (Table VI). Results in Table V and Table VI indicate the promising performance of the VAE-based desertification approach. In the case of decision-tree classification (used alone in [1] or combined with Isodata in [13]), many misclassification cases can be observed, which makes the recognition capacity relatively lower (accuracy around 80%). In [12], a deep learning formalism based on CNN has been used, and the achieved accuracy is relatively high (96.73% with VGG 19).

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11

TABLE VI features even in challenging cases. The VAE algorithm, based CLASSIFICATION PERFORMANCE COMPARISON WITH SEVERAL POWERFUL on deep learning formalism, has the advantage of distribution- ALGORITHMS free assumptions and superior nonlinear approximation, which Overall Classifier The used features Approach Database Accuracy makes it suitable for resolving complex recognition problems (%) like separating desertification cases from like-desertification Landsat spectral mixture pixels. The proposed method offers a meaningful and reliable Li et. al [1] decision-tree TM/ETM+ 80 analysis images assessment tool that detects pixels affected by desertification principal and automatically distinguishes the type of change. decision tree Afrasinei components Landsat 7 and & 79.97 [13], analysis (PCA) & 8 imagery Isodata Knepper ratios To more improve desertification detection, in the future, we Optical UAV plan to investigate incorporating more input variables, typical plant CNN VGG 16 93.8 Ainiwaer et remote- including several sensitive parameters for desertification, species features and al. [12] sensing image backgrounds CNN VGG 19 96.73 namely normalized difference vegetation index (NDVI), topsoil images grain size index (TGSI), and modified soil adjusted vegetation Optical satellite index (MSAVI). Azzouzi et. Radiometric values SVM images 95.15 al [5] (Landsat program) Optical ACKNOWLEDGMENT The Multi-spectral and satellite proposed multi-temporal VAE images 98 method radiometric values (Landsat program)

REFERENCES However, this classification is based on sub-image [1] J. Li, L. Zhao, B. Xu, X. Yang, Y. Jin, T. Gao, H. Yu, F. Zhao, H. classification instead of pixels as samples. These sub-images are Ma, and Z. Qin, “Spatiotemporal variations in grassland desertification based on landsat images and spectral mixture constructed during a segmentation phase, and it is well analysis in Yanchi county of Ningxia, China,” IEEE Journal of established that the segmentation task conducted in many Selected Topics in Applied Earth Observations and Remote different topics is error-prone. In [5], an object-based Sensing, vol. 7, no. 11, pp. 4393-4402, 2014. classification using SVMs is applied. Even if SVMs utilizes a [2] F. Harrou, N. Zerrouki, Y. Sun, and L. Hocini, "Monitoring land- cover changes by combining a detection step with a classification geometric aspect (in pixels separation) and provide a sparse step," in 2018 IEEE Symposium Series on Computational solution via structural risk minimization, the VAE classification Intelligence (SSCI). pp. 1651-1655, 2018. has outperformed the SVM detection methods [5]. In summary, [3] Zerrouki, N., Harrou, F., & Sun, Y. Statistical monitoring of this study confirms that the VAE formalism adapted to changes to land cover. IEEE Geoscience and Remote Sensing Letters, 15(6), 927-931, 2018. desertification detection applications show promising detection [4] Zerrouki, N., Harrou, F., Sun, Y., & Hocini, L. A Machine performance. Learning-Based Approach for Land Cover Change Detection Using Remote Sensing and Radiometric Measurements. IEEE Sensors Journal, 19(14), 5843-5850, 2019. VI. CONCLUSION [5] Azzouzi, S. A., Vidal-Pantaleoni, A., & Bentounes, H. A. This paper introduces a robust approach based on deep Desertification monitoring in Biskra, Algeria, with Landsat learning formalism to desertification detection and imagery by means of supervised classification and change detection methods. IEEE Access, 5, 9065-9072, 2017. discrimination using multi-temporal and multi-spectral remote [6] Lamchin, M., Lee, J. Y., Lee, W. K., Lee, E. J., Kim, M., Lim, C. sensing data. The proposed VAE-based method is applied to H., & Kim, S. R. Assessment of land cover change and desertification detection and discrimination of the sand desertification using remote sensing technology in a local region of Mongolia. Advances in Space Research, 57(1), 64-77, 2016. propagation process in Biskra (Algeria) for 19 years (from 200 [7] Kundu, A., Patel, N. R., Saha, S. K., & Dutta, D. “Desertification to 2019). Seven other promising deep models and clustering in western Rajasthan (India): an assessment using remote sensing algorithms were also tested with the same data sets. Their derived rain-use efficiency and residual trend methods”. Natural Hazards, 86(1), 297-313, 2017. detection performance was compared with those reached by the [8] Kim, J., Lin, C. W., vanGasselt, S., Lin, S., & Lan, C. W. “An VAE-based desertification detection approach. Results indicate Integrated Multi-Sensor Approach to Monitor Desert Environments the superior capability of the VAE compared to the other by UAV and Satellite Sensors: Case Study Kubuqi Desert, China”. In AGU Fall Meeting Abstracts, December, 2017. models in separating real desertification cases from all other [9] N. Boulghobra, T. Hadri, and M. Bouhana, “Using Landsat imagery land cover changes, such as deforestation or areas undergoing for monitoring the spatiotemporal evolution of sanding in dryland, seasonal phenomena of dryness of wild grasses. Also, the the case of in-Salah in the Tidikelt (Southern Algerian Sahara)”,' comparison with some existing studies demonstrated that the Geograph. Techn., vol. 9, no. 1, pp. 1-9, 2014. [10] Zerrouki, N., & Bouchaffra, D. “Pixel-based or Object-based: VAE approach outperformed the state of the art methods based Which approach is more appropriate for remote sensing image on decision-tree [1], SVM [5], CNN [12], decision tree, and classification?”. In 2014 IEEE International Conference on Isodata combination [13]. This result highlights the promising Systems, Man, and Cybernetics (SMC), pp. 864-869, October 2014. [11] Zanchetta, A., Bitelli, G., & Karnieli, A. “Monitoring performance of the VAE algorithm in separating non-linear desertification by remote sensing using the Tasselled Cap transform

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12

for long-term change detection”. Natural Hazards, 83(1), 223-237, Workshop on Hyperspectral Image and Signal Processing: 2016. Evolution in Remote Sensing (WHISPERS), pp. 1-4, 2012, June.. [12] Ainiwaer, M., Ding, J., & Kasim, N. “Deep learning-based rapid [33] Zeroual, A., Harrou, F., Dairi, A. and Sun, Y. “Deep learning recognition of oasis-desert ecotone plant communities using UAV methods for forecasting COVID-19 time-Series data: A low-altitude remote-sensing data”. Environmental Earth Comparative study. Chaos, Solitons & Fractals, 140, p.110121, Sciences, 79, 1-11, 2020. 2020. [13] Afrasinei, G. M., Buttau, C., Ghiglieri, G., Melis, M.T. and Virdis, S., “Study of land degradation and desertification dynamics in North Africa areas using remote sensing techniques”, University of Cagliari. doi:10.13140/RG.2.1.2412.6327, 2016. Yacine Zerrouki received the B.S. degree [14] Guo, B., & Wen, Y. “An optimal monitoring model of in biology, and the M.S. degree in desertification in naiman banner based on feature space utilizing biochemistry from the University of landsat8 oli image”, IEEE Access, 8, 4761-4768, 2019. [15] D. P. Kingma and M.Welling, “Stochastic gradient VB and the Sciences and Technology Houari variational auto-encoder”,_ in Second International Conference on Boumedienne(USTHB), Algiers, Algeria, Learning Representations, ICLR, vol. 19, 2014. in 2011 and 2013, respectively. He is [16] Dairi, A., Harrou, F., Sun, Y. and Khadraoui, S. “Short-Term currently working at the National Forecasting of Photovoltaic Solar Power Production Using Variational Auto-Encoder Driven Deep Learning Approach”. Conservatory of Environmental Training, Applied Sciences, 10(23), p.8400, 2020. Algiers, Algeria. He is affiliated with the Algerian ministry of [17] Dairi, A., Harrou, F., Sun, Y. and Senouci, M. “Obstacle detection environment. His research interests include climate change and for intelligent transportation systems using deep stacked ecological environment monitoring and evaluating autoencoder and k-nearest neighbor scheme”. IEEE Sensors Journal, 18(12), pp.5122-5132, 2018. [18] A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, J. Roy. Fouzi Harrou received the M.Sc. degree in Statist. Soc. B (Methodol.), vol. 39, no. 1, pp. 1-38, 1977. telecommunications and networking from [19] Harrou, F., Sun, Y., Hering, A.S. and Madakyaru, M. “Statistical Process Monitoring Using Advanced Data-Driven and Deep the University of Paris VI, , and the Learning Approaches: Theory and Practical Applications”. Ph.D. degree in systems optimization and Elsevier, 2020. security from the University of Technology [20] Nag, S., Hewagama, T., Georgiev, G. T., Pasquale, B., Aslam, S., of Troyes (UTT), France. He was an & Gatebe, C. K. “Multispectral snapshot imagers onboard small satellite formations for multi-angular remote sensing”. IEEE Assistant Professor with UTT for one year Sensors Journal, 17(16), 5252-5268, 2017. and with the Institute of Automotive and [21] Li, W., Guo, Q., & Elkan, C. “A positive and unlabeled learning Transport Engineering, Nevers, France, for one year. He algorithm for one-class classification of remote-sensing was also a Postdoctoral Research Associate with the Systems data”. IEEE Transactions on Geoscience and Remote Sensing, 49(2), 717-725, 2010. Modeling and Dependability Laboratory, UTT, for one year. [22] D. P. Kingma and M. Welling, “Auto-encoding variational bayes, He was a Research Scientist with the Chemical Engineering arXiv” preprint arXiv:1312.6114, 2013. Department, Texas A&M University at Qatar, Doha, Qatar, for [23] Dixit, S. and Verma, N.K. “Intelligent Condition-Based Monitoring three years. He is actually a Research Scientist with the of Rotary Machines With Few Samples”. IEEE Sensors Journal, 20(23), pp.14337-14346, 2020. Division of Computer, Electrical, and Mathematical Sciences [24] C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint and Engineering, King Abdullah University of Science and arXiv:1606.05908, 2016. Technology. He is the author of more than 130 refereed journals [25] Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. “Rethinking and conference publications, and book chapters. He is co-author atrous convolution for semantic image segmentation”. arXiv preprint arXiv:1706.05587, 2017. of the book "Statistical Process Monitoring Using Advanced [26] Ronneberger, O., Fischer, P. and Brox, T. “U-net: Convolutional Data-Driven and Deep Learning Approaches: Theory and networks for biomedical image segmentation”. In International Practical Applications" (Elsevier, 2020). His current research Conference on Medical image computing and computer-assisted interests include statistical decision theory and its applications, intervention, pp. 234-241, Springer, Cham, October 2015. [27] Lyalko, V. I., Romanciuc, I. F., Yelistratova, L. A., Apostolov, A. fault detection and diagnosis, and deep learning. A., & Chekhniy, V. M. “Detection of Changes in Terrestrial Ecosystems of Ukraine Using Remote Sensing Data”. Journal of Nabil Zerrouki received the Ph.D. degree in signal and image Geology, Geography and Geoecology, 29(1), 102-110, 2020. processing from the University of Sciences and Technology [28] Kacem, H. A., Fal, S., Karim, M., Alaoui, H. M., Rhinane, H., & Maanan, M. “Application of fuzzy analytical hierarchy process for Houari Boumedienne (USTHB), Algiers, Algeria, in 2018. He assessment of desertification sensitive areas in North West of is currently a Scientist with the Center for Development of Morocco”. Geocarto International, 1-18, 2019. Advanced Technologies, Algiers, Algeria. His main research [29] Ding, H.P., Chen, J.P. and Wang, G.W. “A model for interest is image processing, machine learning, pattern desertification evolution employing GIS with cellular automata”. In 2009 International Conference on Computer Modeling and recognition, and remote sensing. Simulation, pp. 324-328, February 2009. [30] Zhang, C., Niu, S. and Xu, D. “Monitoring of desertification in Abdelkader Dairi is an Assistant Inner Mongolia based on MODIS data”. In 2010 Third professor of the Department of Computer International Conference on Information and Computing, Vol. 3, pp. 322-324, June 2010. Science, University of Science and [31] Bernstein, L.S., Jin, X., Gregor, B. and Adler-Golden, S.M., “Quick Technology of Oran-Mohamed Boudiaf atmospheric correction code: algorithm description and recent (USTO-MB), Algeria. He received an upgrades”. Optical engineering, 51(11), p.111719, 2012. Engineer degree in computer science from [32] Bernstein, L.S., Adler-Golden, S.M., Jin, X., Gregor, B. and Sundberg, R.L. “Quick atmospheric correction (QUAC) code for the University of Oran 1 , VNIR-SWIR spectral imagery: Algorithm details”. In 2012 4th Algeria, in 2003. He also received the

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3042760, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13

Magister degree in computer science from the National Polytechnic School of Oran, Algeria, in 2006. From 2007 to 2013, he was a senior Oracle database administrator (DBA) and enterprise resource planning (ERP) manager. He has over 20 years of programming experience in different languages and environments. In 2018 he received a Ph.D. degree in computer sciences from Ben Bella Oran1 University. His research interests include a deep learning approach for autonomous robot navigation, computer vision, image processing, and mobile robotics.

Ying Sun received the Ph.D. degree in statistics from Texas A&M, in 2011. She held a two-year postdoctoral research position at the Statistical and Applied Mathematical Sciences Institute and the University of Chicago. She was an Assistant Professor with Ohio State University for a year before joining KAUST, in 2014. At KAUST, she established and leads the Environmental Statistics research group, which works on developing statistical models and methods for complex data to address important environmental problems. She has made original contributions to environmental statistics, in particular in the areas of spatiotemporal statistics, functional data analysis, visualization, computational statistics, with an exceptionally broad array of applications. She received two prestigious awards: The Early Investigator Award in Environmental Statistics presented by the American Statistical Association and the Abdel El-Shaarawi Young Research Award from The International Environmetrics Society.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/