This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1 Systematic Assessment of MODTRAN Emulators for Atmospheric Correction

Jorge Vicent Servera , Juan Pablo Rivera-Caicedo , Jochem Verrelst , Jordi Muñoz-Marí , Neus Sabater, Béatrice Berthelot, Gustau Camps-Valls , Fellow, IEEE, and José Moreno , Senior Member, IEEE

Abstract— Atmospheric radiative transfer models (RTMs) sim- Index Terms— Atmospheric correction, emulation, hyperspec- ulate the light propagation in the Earth’s atmosphere. With tral, MODTRAN, radiative transfer. the evolution of RTMs, their increase in complexity makes them impractical in routine processing such as atmospheric correction. To overcome their computational burden, standard I. INTRODUCTION practice is to interpolate a multidimensional lookup table (LUT) of prestored simulations. However, accurate interpolation relies TMOSPHERIC correction is one of the main algorithms on large LUTs, which still implies large computation times for Awhen processing optical satellite [1]. Its main task is their generation and interpolation. In recent years, emulation has the conversion of the top-of-atmosphere (TOA) radiance signal been proposed as an alternative to LUT interpolation. Emulation approximates the RTM outputs by a statistical regression model measured by a satellite instrument into surface reflectance. trained with a low number of RTM runs. However, a concern is After compensation of the atmospheric effects, the derived whether the emulator reaches sufficient accuracy for atmospheric surface reflectance data can be used to retrieve geophysical correction. Therefore, we have performed a systematic assess- properties for applications, such as vegetation monitoring ment of key aspects that impact the precision of emulating [2] or water quality [3]. In general, atmospheric correction MODTRAN: 1) regression algorithm; 2) training database size; 3) dimensionality reduction (DR) method and a number of algorithms can be categorized into two main families: 1) semi- components; and 4) spectral resolution. The Gaussian processes empirical and 2) physically based methods. The first category regression (GPR) was found the most accurate emulator. The refers to image-based methods that aim at normalizing the principal component analysis remains a robust DR method and TOA radiance data through semi-empirical approaches [4]. nearly 20 components reach sufficient precision. Based on a The second category refers to methods that make use of database of 1000 samples covering a broad of atmospheric conditions, GPR emulators can reconstruct the simulated spectral radiative transfer models (RTM) to derive the atmospheric data with relative errors below 1% for the 95th . These composition and to decouple the surface reflectance from emulators reduce the processing time from days to minutes, atmospheric transfer functions (i.e., path radiance, transmit- preserving sufficient accuracy for atmospheric correction and tances, and spherical albedo) [5]–[7]. Because of such a providing model uncertainties and derivatives. We provide a set rigorous approach, the latter methods are more accurate, of guidelines and tools to design and generate accurate emulators for satellite data processing applications. though slower, than the first category. Atmospheric RTMs are computer models that describe the physical processes of scattering, absorption, and emission of the electromagnetic Manuscript received November 9, 2020; revised March 30, 2021; accepted radiation within the Earth’s atmosphere [8]. They are widely April 2, 2021. This work was supported by the FLEX L2 End-to-End Sim- used in the atmospheric correction and other Earth observa- ulator Development and Mission Performance Assessment Project, European Space Agency (ESA), under Contract 4000119707/17/NL/MP. The work of tion applications, such as scene generation [9], atmospheric Jochem Verrelst was supported by the European Research Council (ERC) chemistry [10], and numerical weather prediction [11]. Over through the ERC-2017-STG SENTIFLEX Project under Grant 755617. The time, these models have increased their realism from sim- work of Neus Sabater was supported by the Academy of Finland FluoSYN Project under Grant 324091. The work of Gustau Camps-Valls was supported ple semi-empirical models [12] toward advanced physically by the ERC through the ERC-CoG-2014 SEDAL Project under Grant 647423. based models, such as MODTRAN [13], MOMO [14], and (Corresponding author: Jorge Vicent Servera.) libRadtran [15]. This evolution has led to an increase in com- Jorge Vicent Servera and Béatrice Berthelot are with Magellium, 31520 Toulouse, France (e-mail: [email protected]; putational time and memory requirements to run the model, [email protected]). making RTMs impractical in routine data processing chains, Juan Pablo Rivera-Caicedo is with the Departamento of Secretaría de inves- such as atmospheric correction. To overcome these limitations, tigación y postgrado, Consejo Nacional de Ciencia y Tecnología, Autonomous University of Nayarit, Tepic 63173, Mexico (e-mail: [email protected]). the common approach is to interpolate lookup tables (LUTs) Jochem Verrelst, Jordi Muñoz-Marí, Gustau Camps-Valls, and José of precalculated RTM simulations [16]. However, large LUTs Moreno are with the Image Processing Laboratory, University of Valencia, are still needed to achieve sufficient accuracies, imposing long 46100 Valencia, Spain (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). computation times in case of slow RTMs [17]. In addition, Neus Sabater is with the Finnish Meteorological Institute, 00560 Helsinki, advanced interpolation techniques are memory intensive, and Finland (e-mail: neus.sabater@fmi.fi). thus, limited to low-dimensional spaces [18], [19]. Color versions of one or more figures in this article are available at https://doi.org/10.1109/TGRS.2021.3071376. In this context, RTM emulators were proposed as an Digital Object Identifier 10.1109/TGRS.2021.3071376 accurate and fast alternative to LUT interpolation [20], [21]. 0196-2892 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

2 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

An emulator is a surrogate statistical learning model that take into account when replacing an RTM with an emulator approximates the original deterministic model at a fraction is that one should ensure the same properties of the model; of the original running time [22], [23]. Therefore, emulation the emulator should be mathematically tractable and permit and LUT interpolation serve essentially the same purpose: to the calculation of the Jacobians, uncertainty quantification and approximate the original atmospheric RTM outputs. The idea propagation, and sensitivity analysis. As LUT-based meth- of emulating atmospheric RTMs has already been success- ods, emulators also keep these properties by computing the fully tested to global sensitivity analysis [24], [25]. In these Jacobians numerically and, for kernel methods, also analyti- works, the emulators were designed so that their accuracy was cally [35]. This allows error quantification and propagation as sufficient to study the dependencies of the RTM outputs on well sensitivity analysis from an emulator, features that are the main input variables through a statistical global sensitivity not commonly implemented within an RTM code [11]. analysis. However, when it comes to applying an emulator The emulation technique is, however, challenged when it in a data processing scheme, a critical aspect is a precise comes to predict the multiple outputs produced by an RTM approximation of the original RTM. In fact, various factors code and reconstruct a full-spectral simulation. Indeed, a full determine the accuracy and performance of the emulator [26]: 400–2500-nm spectral data consists of over 10 000 bands when 1) used machine learning method; 2) training database size; simulated at 1 cm−1 resolution. This is an additional difficulty and 3) applied dimensionality reduction (DR) strategy. These compared with classical emulators since only a few regression three aspects must be systematically assessed in order to algorithms are capable of generating multiple outputs [36]. develop an emulator that is both fast and accurate to be Moreover, it can take a considerable time to train such a used for atmospheric correction as an alternative over LUT complex statistical multi-output model, especially when using interpolation. NN [37]. A workaround solution is to take advantage of the Therefore, the main goal of this work is to perform Hughes phenomenon [38] and the large degree of co-linearity a systematic assessment of various emulator configurations in spectroscopic data. Accordingly, a DR method can trans- in terms of computation time and accuracy to reconstruct form the spectral data into a lower dimensional space, in which MODTRAN-based atmospheric transfer functions for a wide the number of components is a fraction of the original amount range of atmospheric conditions. This work focuses on the of spectral bands. With the application of a DR method, following two aspects. The first aspect refers to the identi- the multi-output problem is greatly reduced to a number of fication of the most suitable statistical learning method by components that retain the spectral information from the origi- evaluating the role of: 1) regression algorithm and DR strategy. nal data. By first applying a DR method, the spectroscopic data The second aspect aims at further optimizing the emulator by is reduced to a given number of components. Then, by looping studying the role of: 2) a number of components, as a function over each component, multiple models can be trained by of the spectral resolution and 3) the training database size. As single-output regression algorithms. Afterward, the spectral a proof-of-concept, the best performing MODTRAN emulator signal can again be reconstructed. Although this iterative will be applied for the atmospheric correction of real Sentinel- process takes some computational time, it goes much faster 3 and synthetic TOA radiance from the future Fluorescence than training an emulator for every single-spectral channel Explorer (FLEX) mission, European Space Agency (ESA), and makes the problem better conditioned [20], [31], [37]. Paris, France. We will close this article with a discussion on Moreover, the application of a DR method in the emula- potential applications of atmospheric RTM emulators. tion processing allows converting any single-output regression algorithm into a multi-output algorithm. However, in practice, only statistical regression algorithms are sufficiently adaptive II. EMULATION THEORY to enable approximating the behavior of an RTM. Emulation is a regression model that approximates model simulations from the statistical learning of a training data set [23]. Similar to LUT interpolation, an emulator uses a A. Statistical Regression Algorithms database of prestored model simulations (i.e., training samples) In order to approximate an RTM (e.g., MODTRAN) through to infer the output values of a computationally expensive code emulation, a statistical regression algorithm must be trained for an unseen input configuration. Therefore, an emulator is using a database of precalculated RTM simulations [23]. The an advanced regression model as typically done for retrieval training data set should ideally cover the multidimensional applications but then in reversed order [27], [28]. Emulation input space so that the statistical regression uncovers the offers a practical solution for routine applications to efficiently underlying correlations between model inputs and outputs. approximate computer codes of high computational burden, One strategy is using a space-filling pseudorandom which is the case of most atmospheric RTMs. This technique algorithm, such as (LHS) [39]. has been successfully applied for the last few decades in An alternative is to use active learning methods to set the the climate and environmental modeling communities [29]– samples in optimum locations of the input space [40]. From [31] and to approximate atmospheric RTMs [24], [32]. The our previous literature review [24], [37], we evaluated the emulation technique is possible thanks to the development of following algorithms to function as emulators, belonging to adaptive and flexible regression algorithms [33], with success- the three most standard families of regression algorithms: ful examples, such as Gaussian processes regression (GPR) 1) the standard and efficient random forest (RF); 2) kernel and neural networks (NNs) [20], [34]. An important aspect to ridge regression (KRR) and GPR as paradigmatic examples of

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 3 kernel methods, and 3) the standard feed-forward multilayer X-scores, simply by W = UX.AsU is an orthogonal perceptron from the family of artificial NNs. matrix the reconstruction of X given the scores is obtained by Decision trees are predictive models based on a set of X = UW. Hence, the spectra are reconstructed. The DR by hierarchical connected nodes, each of them representing a PCA is achieved by selecting a number of components p that linear decision based on a specific input feature. To over- is much lower than the number of original spectral channels come the difficulty of classical decision trees to cope with B. Typically, one takes enough components to ensure that at strong nonlinear input–output relationships, a combination of least 99.99% of is retained. While this variance level decision trees (RFs) can improve results [41]. We tuned an is quite standard, it might also retain noise in the real data. RF trying with different configurations involving the number An alternative DR method that allows reconstruction of the of trees in the forest (from 100 to 1000), minimum leaf size spectral output is the partial (PLS) [53]. The PLS (from one to five samples), and a number of variables to select method looks for projections that maximize the and randomly in each split of the trees (from 1/3 of the number the correlation between the RTM output spectra and its input of variables, as originally proposed in [41], to all variables). variables. The possibility to combine multiple regression algo- Artificial NNs are layered circuits of artificial neurons con- rithms with either PCA or PLS allows construction of different nected via weights (links) representing excitation or inhibition emulators. The next step is to assess their performance in order states [42]. Designing and training a NN implies selecting the to consolidate an optimized emulator design for atmospheric number of hidden layers and nodes per layer, the shape of the RTMs and atmospheric correction applications. nonlinearity connection, the learning rate, the regularization parameters to avoid overfitting, and initializing the weights. III. MATERIALS AND METHODS Both the training algorithm and the have an The systematic evaluation of emulator configuration was impact on the performance of an NN. Here, we have used carried out with the in-house developed automated RTMs oper- the standard multilayer perceptron, which is a fully connected ator (ARTMO) framework [54]. ARTMO is a scientific soft- network. In order to simplify its design, the NNs in this work ware package developed in MATLAB that provides tools and are consisting of only one hidden layer of neurons. The NN toolboxes for running a suite of leaf, canopy, and atmosphere structure is optimized using the Levenberg–Marquardt learning RTMs and for postprocessing applications, such as emulation. algorithm with a squared loss function. ARTMO has been made compatible with the atmospheric LUT In machine learning, kernel methods use kernel function to generator (ALG) toolbox [25], a standalone software tool that quantify similarities (or distances) between input samples of a allows generating LUTs based on a suite of atmospheric RTM, data set [33], [43]. The similarity is computed by a linear dot such as MODTRAN, 6SV, or libRadtran. ARTMO and ALG product in a higher dimensional feature space, yet without ever are freely downloadable at www.artmotoolbox.com. computing the data location in the feature space. The following two methods are tested here: 1) KRR [44] and 2) GPR, as a A. Atmospheric Lookup Table Generator probabilistic version of KRR [45]. For all kernel methods, ALG is a MATLAB-compiled software tool that facilitates we used a standard radial basis kernel function. the generation of LUTs for a wide range of atmospheric The above-mentioned regression methods are popular in var- RTMs [25]. This tool provides a graphical user interface ious application domains thanks to processing times, accuracy, with which users can configure and execute atmospheric and robustness to handle overfitting. A detailed description of RTM codes for simulations in the optical domain. ALG these methods is given in [28] and the simpleR package [46]. automatically processes and standardizes all the RTM input and output data into the final LUT file. These LUTs store B. Dimensionality Reduction Algorithms the so-called atmospheric transfer functions. These spectral outputs permit uncoupling the atmospheric absorption and As previously introduced, an emulator should combine a scattering effects from the surface, and thus, are particularly regression algorithm with a DR method in order to reconstruct useful in atmospheric correction and scene generation [55]. the spectral output generated by an RTM. The regression For a Lambertian and homogeneous surface with reflectance algorithm first predicts the components of the RTM data ρ, a TOA radiance spectrum (L) can be calculated through (1) in a lower dimensional space, and the inverse of the DR (E μ + E )(T + T )ρ transformation is then applied to reconstruct output spectra at L = L + dir il dif dir dif 0 π( − ρ) (1) expenses of some loss in accuracy. The principal component 1 S analysis (PCA) [47] is a DR method that is widely used to where L0 (atmospheric reflected radiance), Edir/dif (at-surface reduce the dimensionality of large data sets. PCA has shown direct/diffuse solar irradiance), S (spherical albedo) and Tdir/dif its suitability to reconstruct satellite data and to speed-up (upwelling direct/diffuse target-to-sensor transmittance) are μ atmospheric RTMs [48]–[52]. This method looks for corre- the atmospheric transfer functions, and il is the cosine of lations between the spectral data in order to identify a set of the solar zenith angle (SZA). orthogonal linearly uncorrelated directions that maximize the variance of the projections. The eigenvectors and eigenvalues B. Modtran of this transformation are estimated from the MODTRAN (http://modtran.spectral.com/) is a widely used of the spectral data set X. The eigenvectors matrix, U,isthen atmospheric RTM with multiple applications in Earth obser- used as a projection matrix that allows obtaining the so-called vation [13]. It solves the radiative transfer equation with

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

4 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING accurate modeling of the coupled scattering and absorption TABLE I RANGE OF MODTRAN6INPUT VARIABLES.BOTH VIEWING ZENITH AND effects. At its core, MODTRAN implements the discrete ◦ ordinates algorithm [56] and the correlated-k method for RELATIVE AZIMUTH ANGLES ARE FIXED TO 0 simulating the gaseous absorptions [57]. The spherically sym- metric atmosphere is constituted of stratified vertical pro- files of temperature, gas molecules standard, and user-defined aerosol optical properties are distributed in the boundary-layer (<2 km) and stratosphere. Accordingly, MODTRAN prop- agates the incoming solar irradiance through the Earth’s atmosphere, modeling the effects from spherical refraction, molecular and aerosol absorption/emission and scattering, and surface reflections and emission. The calculated spectral outputs include spectral transmittance, TOA radiance, solar irradiance, and so on. These outputs cover the full optical range (0.2–200 μm) with a resolution up to 0.1 cm−1 for the ultrafine correlated-k simulations, or even higher using the latest line-by-line capabilities [58]. MODTRAN has been extensively validated, and it continues to be maintained and updated.

C. ARTMO’s Emulator Toolbox As part of the ARTMO software package, the Emulator toolbox enables the evaluation of regression algorithms on their capability to approximate RTM outputs as a function of Fig. 1. Sample of the direct upward transmittance in the O2-A absorption input variables [24], [37]. Essentially, the emulator toolbox band at coarse (yellow), medium (orange), and fine (blue) spectral resolutions. encompasses a suite of methods from the simpleR library [46] that can be combined with DR methods (e.g., PCA) in order to train statistical models that produce spectral outputs that using just 400 samples would suffice to function as an based on RTM inputs. While in earlier versions, the focus emulator [59], though larger databases may also improve the of the Emulator toolbox was to train emulators based on accuracy of an emulator [37]. Nevertheless, an increasing the vegetation leaf-canopy RTMs available within ARTMO, number of samples also results in longer computation times in the here latest presented version (v. 1.13), one of the and can also put a problem due to RAM memory limitations in introduced novelties involved the ability to handle the multiple training the emulators. In the analysis presented here, we built RTM spectral outputs stored in the ALG files (i.e., the six training databases with 100, 500, 1000, and 2000 samples to atmospheric transfer functions). Accordingly, users can easily study the effect of database size in the accuracy while still develop emulators of atmospheric RTMs and export them for having an acceptable compromise regarding the computation their later use in data processing applications. time and memory constraints. Regarding the spectral configuration, all these training IV. EXPERIMENTAL SETUP data sets were generated in the 400–2500-nm spectral range A. Database Description avoiding the saturated H2O absorption bands at ∼1380 and The emulation and assessment are based on a ∼1880 nm. The correlated-k option was used for the fol- set of atmospheric transfer functions databases that were first lowing three spectral resolution: coarse (15 cm−1), medium generated with ALG using MODTRAN6 and the interrogation (5 cm−1), and fine (1 cm−1). With these resolutions, an entire technique in [16]. To do so, the input variable space was spectrum consists of nearly 1400, 4000, and 20000 spectral sampled using an LHS strategy with minimum and maximum channels, respectively. Simulations at ultrafine spectral reso- boundaries, as given in Table I. These input variables are lution (0.1 cm−1) were discarded due to their high memory typically used in the atmospheric correction of optical satellite requirements (200 000 spectral channels, resulting in more data due to their influence in shaping the TOA radiance [24]. than 300-Gb matrices to perform GPR emulators). An increas- An LHS of training data is preferred over a systematic grid- ing spectral resolution increases the complexity of the data ded sampling, as LHS ensures that the set of pseudorandom in the absorption features (see Fig. 1), which can potentially values cover the full variability of the input parameter space impact the accuracy of the DR method and, thus, of the while minimizing the number of samples. Thus, in principle, emulator. the developed emulator will be able to reconstruct accurately The processing time of the 1000-samples databases took the spectral atmospheric transfer functions for any possible 3.5 h, 12 h, and 44 h, respectively, for the three increasing combination of input variables. The training database size spectral resolutions on a personal computer with the follow- is known to influence the emulator accuracies as it serves ing characteristics: Windows 10 64-bits OS, i7-4710 CPU to sample the input variable space. Earlier studies suggest 2.50 GHz, 16-GB RAM, and using five parallel executions

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 5

TABLE II Here, a single-random split was applied using 70% samples LIST OF REGRESSION METHODS USED FOR EMULATION.ADETAILED for training and the remaining 30% for verification. However, DESCRIPTION IS GIVEN IN [28] it is more suitable to perform alternative cross validation methods (e.g., leave-one-out) to perform the validation when the number of samples in the database is small. In the second step, a performance assessment was carried out in order to compare the accuracy of the various emulators against each other. Here, a reference database of 2000 samples was generated with a random LHS distribution and the same input variables and ranges, as in Table I. This reference database out of eight cores. The same computer was used to carry out of RTM outputs is used as ground truth to evaluate and to the application scenarios described in Section IV-D. compare the performance of all emulators. Here, we used the spectral normalized root--square-error (NRMSE) [%] B. Methodology for Emulator Analysis (see (2)) to evaluate the performance  The methodology for emulator analysis follows our previous  1 n [ fˆ(x ) − f (x )]2 work in [26]. The selected regression algorithms (see Table II n i=1 i i NRMSE = 100 (2) and Section II-A) were first analyzed for the default training fmax − fmin database (i.e., 1000-samples and medium resolution) for emu- n = reference lating the six spectral atmospheric transfer functions. where 2000 is the number of samples in the database, fˆ and f , respectively, the emulated and RTM The PCA method was first applied for reducing the dimen- x f sionality of training data. Although the first five compo- spectral values evaluated at the input point i ,and max and f nents explain 99.95% of the variance in the medium spectral min are, respectively, the maximum and minimum values of the n spectra in the reference data set. In order to inspect the resolution database, developing the regression models with less than ten components led to inaccurate reconstructions emulators’ accuracy and compare their performances along with the spectral range, the spectral NRMSE is plotted as of the atmospheric transfer functions. Higher accuracy can be achieved, particularly in absorption bands, adding more a function of wavelength. The spectrally averaged NRMSE λ n components in the regression algorithms but at the expenses (NRMSE ) and the emulation time of the samples are also tracked. of slowing down processing time. Therefore, the first analysis was carried out with ten components (i.e., 99.99% explained variance) in order to balance accuracy and processing time. D. Application Example: Atmospheric Correction After identifying the best algorithm, the role of training database size was second analyzed. Here, the 100-, 500-, As a proof of concept, the best evaluated emulator is 1000-, and 2000-samples databases at medium spectral resolu- used in the context of atmospheric correction for three sce- tion were subsequently passed to ARTMO’s Emulator toolbox narios: 1) synthetic hyperspectral TOA radiance spectra as for the best performing emulator and 10 PCA components. would be observed by an imaging spectrometer overflying a The role of the DR method and the number of compo- vegetated surface; 2) a real-OLCI/Sentinel-3 data; and 3) a nents was third analyzed for the best performing regression synthetic FLORIS/FLEX data. Before testing the emulators algorithm. The PCA and PLS methods were first compared for atmospheric correction, we first evaluated the accuracy of in the default training database in order to identify the best atmospheric RTM emulators in reconstructing TOA radiance performing method. Second, an increasing number of compo- spectra. A simulated TOA radiance spectra database (Lsim) nents (10, 20, 30, and 40) were evaluated for all the six spectral was constructed with (1) using the MODTRAN6 simula- atmospheric transfer functions. The analysis is carried out for tions in the reference database at medium spectral resolu- the 500-samples databases and for the coarse, medium, and tion. Only the conifer spectrum from MODTRAN6’s albedo fine MODTRAN6 spectral resolutions. database was used as Lambertian surface reflectance spec- trum in the TOA radiance simulations in order to vary only the input atmospheric parameters. In parallel, an equivalent C. Emulation Validation TOA radiance database (Lemu) was constructed using the Emulators are approximations of the original model, and as atmospheric transfer function emulators. The spectral relative such, this approximation introduces a source of error in the error, in absolute values, between Lsim and Lemu was calculated emulated spectral outputs [23]. Therefore, validation of the for all the 2000 samples and its plotted against generated emulator is an important step in order to evaluate wavelengths. In order to better visualize the errors and avoid its accuracy. Two steps were used to validate the accuracy of the deep H2O absorption regions, the results will be shown the emulator. In the first step, a verification was carried out in in the 400–1400-nm spectral range. For consistency with the order to provide a first quick analysis of emulator performance. emulation validation, the spectral NRMSE will also be pro- Through this verification, we determined which regression vided. We can compare these results against typical absolute algorithm methods and training configuration were worth radiometric calibration performance for satellite instruments exploring in more detail. For this verification step, the original (e.g., 1%–2% in OLCI/Sentinel-3 [60]) and signal-to-noise data were split into two parts following the approach in [24]. requirements for imaging spectrometers (i.e., ∼0.1%).

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

6 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Fig. 3. RGB composite (640/550/500 nm) of FLORIS Level-1B TOA radiance simulated with FLEX-E (left) and sample spectra for the twelve distinctive land cover classes (right).

Fig. 2. Color composite of OLCI Level-1B data acquired over France on and Angstrom exponent values are, respectively, 0.39 and the June 17, 2019, at 10:03. The processed data subset is marked by the red 1.085. For the remaining aerosol parameters (asymmetry square. factor, g, and SSA), we assigned the spectrally averaged values from OPAC’s pure water-soluble aerosol type at 50% relative humidity [61] as identified by the synergy algorithm (in the Second, we applied the emulator to invert the surface variable called AMID), i.e., g = 0.67 and SSA = 0.95. The reflectance from the forward TOA radiance simulation. In this atmospheric correction was performed by inverting the surface case, we started with the aforementioned simulated TOA reflectance from (1). The atmospheric transfer functions were radiance database, Lsim and derived the surface reflectance first derived pixelwise by running a previously trained by inversion of (1) using the emulated atmospheric transfer emulator with the input geometric and atmospheric conditions functions. This produced a database of 2000 inverted surface of the image. These high spectral resolution spectra were reflectance spectra, for which we calculated the relative error then convolved by the nominal OLCI/Sentinel-3B spectral against the ground-truth conifer reflectance spectrum. The response function extracted from ESA’s Sentinel online histogram of relative errors is plotted for each wavelengths. https://sentinel.esa.int/web/sentinel/technical-guides/sentinel- With the aim of testing the utility of emulators in practical 3-olci/olci-instrument/spectral-response-function-datawebsite. applications, as a proof of concept, third, we applied the best Lastly, we analyzed the distribution of surface reflectance emulators to the atmospheric correction of OLCI/Sentinel- values from the emulation inversion and from the OLCI’s 3 Level-1B. OLCI is an optical instrument on the board of synergy surface reflectance product, excluding the O2 the Sentinel-3 satellite, belonging to the space segment of absorption (bands #13–#15) and the deep H2O absorption the Copernicus program. OLCI acquires the Earth-reflected (bands #19 and #20). radiance with 21 spectral channels distributed in the As a final application example, optimized emulators 400–1000-nm spectral range and with a resolution of were generated for the atmospheric correction of synthetic 2.5–40 nm. Five cameras scan the Earth surface at 300-m FLORIS/FLEX data. FLEX is the future’s ESA’s Earth spatial resolution with a total swath of 1270 km. More Explorer mission, aiming at obtaining global maps of vegeta- details of the instrument configuration can be found in ESA’s tion Sun-induced fluorescence emission and biophysical para- Sentinel online https://sentinel.esa.int/web/sentinel/user- meters from measurements in the 500–780-nm spectral range guides/sentinel-3-olciwebsite. We performed the atmospheric [62]. FLEX is equipped with imaging spectrometers (FLORIS) correction on a subset OLCI/Sentinel-3B Level-1B TOA that will measure radiance at a spectral sampling (resolution) radiance data (see Fig. 2). up to 0.1 nm (0.3 nm). Since the expected launch date of A subset of nearly 1.4 million pixels was taken within FLEX is 2023, here, we tested the emulators for atmospheric columns 3138–4062 and lines 1500–3000 correspondings correction over a synthetic noise-free scene generated with to the mountainous area of the Swiss Alps in the center, the FLEX end-to-end mission performance simulator [9], the coastal area around Nice, France, in the bottom-left, and [63]. The scene of 100 × 100 pixels (see Fig. 3) consists North–West Italy at the left-hand side of the image. The of 12 homogeneous land cover classes characterized by surface elevation ranges from sea level up to 4 km (1 km constant values of leaf area index, chlorophyll content, and on average). The average geometric conditions are 5◦ of background bare-soil surface reflectance (dark/bright), for viewing zenith angle and 28◦ of SZA. As for the atmospheric which the surface reflectance and fluorescence emission are conditions over the image, the ozone (0.308–0.327 atm-cm) simulated with SCOPE RTM. The homogeneous atmosphere and water vapor (0.47–2.87 g·cm−2) are provided by the was simulated with MODTRAN6 with a mid-latitude summer ECMWF near-real-time data sets and stored in the OLCI profile, a CWV = 2g·cm−2 and aerosol conditions described Level-1B meteorological data file. The aerosol parameters are by an AOT = 0.25, g = 0.81, α = 1.54, and SSA = 0.98. extracted from the OLCI Synergy product. For the selected As for the geometric conditions, the average SZA is 40◦,and image subset, the average Aerosol Optical Thickness (AOT) surface elevation at 0 km.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 7

TABLE III TABLE IV SPECTRAL CONFIGURATION OF FLORIS INSTRUMENT AND MOD- GPR EMULATORS PERFORMANCE ASSESSMENT RESULTS (NRMSEλ IN TRAN6 TRAINING DATABASE %) FOR THE 100, 500, 1000, AND 2000 SAMPLES TRAINING DATA- BASES (COLUMNS)USING 10 PCA CONVERSION.RESULTS ARE SHOWN FOR THE SIX ATMOSPHERIC TRANSFER FUNCTIONS (ROWS). THE AVERAGE CPU TIME (IN S) TO EMULATE 2000 SPECTRA IS SHOWN IN THE LAST ROW

The emulators for the atmospheric transfer functions were trained with the best performing algorithm, database size, DR method, and a number of components as derived from the analysis described in Section IV-A. The training database consists of the same parameter configuration as in Table I and a spectral configuration optimized for the FLORIS instru- ment (see Table III). In order to perform the atmospheric correction of FLORIS data, the high spectral resolution emu- lated atmospheric transfer functions were convolved by a and the at-surface solar irradiance (Edir) are more accurately super-Gaussian approximation of FLORIS spectral response emulated than their diffuse counterparts (Tdif and, specially, [64] following the approach proposed in [65]. Edif) for the GPR and KRR emulators. As for the computation time, all regression methods emulate the 12 000 spectra (i.e., V. R ESULTS the six transfer functions and 2000 conditions of the reference database) in less than 1 s without significant differences. Results are organized as follows. First, the key aspects Altogether, GPR seems the optimal method for emulating that play a role in the emulation of spectral atmospheric MODTRAN data followed by KRR. The RF and NN regres- transfer functions are systematically analyzed. Then, the best sion methods obtain higher NRMSE values and can, therefore, performing emulators are used for the atmospheric correction be discarded to function as an emulator. of synthetic and real satellite data sets.

A. On the Role of Used Regression Algorithms B. On the Role of Database Size The performance of the selected regression algorithms for emulating atmospheric transfer function spectra is first ana- The role of database training size was subsequently ana- lyzed for the medium spectral resolution and using a training lyzed. The spectral NRMSE performance evaluation results database of 1000 samples according to a 70%/30% train- are shown in Fig. 5 only for the path radiance (L0)andthe ing/validation distribution and a 10 PCA components reduc- at-surface diffuse solar irradiance (Edif) outputs for illustra- tion. Performance assessment results are compared by plotting tion purposes. Similar results are obtained for the remaining the spectral relative errors (NRMSE) (Fig. 4), and these atmospheric transfer functions (figures not shown). These error statistics were averaged for the six spectral variables. results indicate how increasing the database size from 100 to In agreement with our previous findings in [24], the results 2000 samples (70% used for training) reduced the relative in Fig. 4 indicate that KRR and GPR methods systemati- errors to nearly half along with the entire spectral range. How- cally emulate more accurately the six atmospheric transfer ever, the error reduction is not systematic with the addition of functions with average NRMSE values of 0.4% and 0.2%. more samples in the training database. Indeed, the errors are The other regression algorithms get higher average NRMSE largely reduced when passing from 100 to 500 samples but values of 1.9% for NN and 3.5% for RF. Except for the they get more stabilized for larger training databases (i.e., 500, diffuse transmittance (Tdif), KRR and GPR show a nearly 1000, and 2000 samples). In fact, the performance of the L0 overlapping spectral NRMSE with an overall value below 1% emulator seems to give slightly better results with 500 training in most spectral channels. It is also observed that the errors samples than with 2000 training samples. increase inside the deep H2OandO2 absorption bands. This Table IV summarizes the spectrally averaged NRMSE val- is a result of two combined effects. On the one hand, the DR ues as a function of the database size for all the atmospheric through PCA is challenged by the reconstruction of all the transfer functions. It is observed again how the error is reduced spectral features in these absorption regions and for the broad from 0.62% to 0.18% when passing from 100 training samples simulated spectral range. On the other hand, the nature of the to 2000 samples. The “saturation” effect in the performance relative error metric implies divisions by nearly zero values for 500–1000 training samples is also observed for all the in these deep absorption regions, and thus, increases the error atmospheric transfer functions. Regarding the emulation com- values at these wavelengths. When observing the performance putation time, the average CPU time spent to emulate the for each atmospheric transfer function, the results indicate n = 2000 spectra from the reference database shows a nearly that the direct components of the upward transmittance (Tdir) proportional dependence of the training database size.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Fig. 4. Spectral NRMSE (in %) results for the regression algorithms performance assessment as function of the regression algorithms (see legend) using 10 PCA conversion. Results are shown for the six atmospheric transfer functions outputs. Note that, except for Tdif, GPR and KRR results are overlapping.

C. On the Role of Dimensionality Reduction Methods TABLE V GPR EMULATORS PERFORMANCE ASSESSMENT RESULTS (NRMSEλ IN The influence of DR methods on the emulator accuracy %) USING TEN COMPONENTS CONVERSION FOR PCA AND PLS. is assessed on the best performing regression algorithm, RESULTS ARE SHOWN FOR THE SIX TRANSFER FUNCTIONS i.e., GPR. For a GPR emulator and ten components of DR, (COLUMNS) the performance of PCA and PLS transformation is shown in Fig. 6. Here, only the emulator for the at-surface diffuse solar irradiance is shown since the performance results with the PCA and PLS methods show larger differences. For this specific atmospheric transfer function, the PCA method is the best performing. However, for the remaining functions, the dif- ference in performance with PCA and PLS methods is less (coarse, medium, and fine). The assessment was carried out evident as observed in the spectrally averaged NRMSE values by using 10, 20, 30, and 40 components. In Fig. 7, only the in Table V. Among the six atmospheric transfer functions, it is results for the upward direct transmittance (Tdir)areshown worth noticing that the emulator for Edif systematically gets for illustration purposes. Similar results were obtained for the the higher errors with both DR methods. remaining atmospheric transfer functions. The spectral range Following these results, the PCA method was chosen to was selected to illustrate the effect of adding more PCA evaluate the role of a number of components used for training components in a region affected by several gas absorption the GPR models as a function of the spectral resolution features like the H2O (720, 820, and 940 nm) and the O2-A

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 9

Fig. 7. Spectral NRMSE (in %) results for the GPR emulator performance assessment using 10, 20, 30, and 40 PCA conversions (see legend) at (a) coarse, (b) medium, and (c) fine spectral resolutions. Results are shown only for the direct upward transmittance output.

Fig. 5. Spectral NRMSE (in %) results for the GPR emulator performance assessment as function of the database size (see legend) using 10 PCA conversion. Results are shown only for the path radiance (top) and diffuse at-surface solar irradiance (bottom) outputs.

Fig. 8. Relative error histogram (in %) (see color bar) and mean error (black dashed line) between simulated and emulated TOA radiance for the reference database conditions at 5-cm−1 resolution.

components, where the NRMSE error does not decreases neither in its absolute nor random values. The drawback of Fig. 6. Spectral NRMSE (in %) results for the GPR emulator performance assessment using ten components conversion for PCA and PLS. Results are adding more components is increasing the model complexity, shown only for the diffuse at-surface solar irradiance output. and thus, a longer computation time. For instance, emulating 2000 spectra in the fine spectral resolution increases from 1.8 s using ten components to 3.6 s using 40 components. (760 nm) and used in land vegetation applications (e.g., Sentinel-2, FLEX). A clear trend is observed when comparing the results on the D. Application of Emulators in Atmospheric Correction various spectral resolutions; as the resolution increases from As a first demonstration case, we compared the TOA coarse to fine, the bias in the NRMSE error (i.e., outside of radiance spectra constructed from MODTRAN simulations absorption regions) decreases from 0.2% to 0.05%. However, against those constructed from the best performing emulations. the error difference between this bias and the values in the To do so, the medium resolution GPR emulators trained with absorption regions increases from ∼0.1% in the coarse reso- 500 samples and 20 PCA components were run at the input lution up to ∼0.2% in the fine resolution. For Fig. 7(a)–(c), conditions in the reference database. The histogram of relative it is also observed that adding more components into the errors between simulated and emulated TOA radiances is given model reduces the relative errors in the absorption regions. in Fig. 8 in the log-scale together with the average relative Nevertheless, there is a saturation effect after 20–30 PCA error.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

10 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Fig. 9. Spectral NRMSE (in %) results for the performance assessment of TOA radiance reconstruction.

Fig. 11. Average surface reflectance from OLCI Synergy product and retrieved with the emulators (blue and red dashed lines, respectively) and dynamic range (shaded areas) for the (Px ) 20% and 80%. Over- lapping areas are seen in purple color.

Fig. 10. Relative error histogram (in %) (see color bar) and mean error (black dashed line) between reference and atmospherically corrected surface reflectance for the reference database conditions at 5-cm−1 resolution.

Fig. 8 shows that, outside of main absorption bands, Fig. 12. Sample surface reflectance spectra from OLCI synergy prod- the TOA radiance is reconstructed with an error <0.6% for uct (blue) and retrieved with the emulators (red). 95% of the conditions in the reference database. These errors are even lower (∼0.15%) in average atmospheric/geometric conditions and even down to ∼0.01% for the best 10% of in the OLCI synergy product (blue) and as obtained with our the 2000 simulations. When analyzing the errors inside of emulators (red). Fig. 11 indicates a generally good agreement gas absorption regions, they increase due to the divisions on the derived image values. However, some discrepancies by nearly zero radiances. The upper 95% percentile of the are visible in the two first spectral channels, where the choice errors increases above 1%, with values of 2% in the O2-A at of the extraterrestrial solar irradiance spectrum and aerosol 761 nm and up to 10% in the deep H2O at 1125 nm. However, optical properties have a higher impact. Above 800 nm, the average values are typically below 1%. For consistency the dynamic range of the values obtained by the emulator and comparison with the performance metrics shown along shows a tendency to obtain lower reflectances than those in the previous sections, the spectral NRMSE is shown in Fig. 9. the OLCI synergy product, with differences around 5%. The same analysis was carried out in terms of the inverted In Fig. 12, we show a sample of five surface reflectance surface reflectance. The histogram of the relative errors (see spectra covering the dynamic range observed in the image, Fig. 10) shows an increase toward the 400–500-nm spectral i.e., from low to high values. Fig. 12 shows a good agreement range due to the low surface reflectance values and the higher between the two atmospherically corrected reflectance spectra. impact of the aerosol scattering. Nevertheless, the errors are In our last test case scenario, synthetic FLORIS/FLEX rather spectrally flat after 600 nm (excluding the absorption data were atmospherically corrected using MODTRAN-based regions). The relative error is below 1%, when excluding deep emulators. Querying the emulator for the input conditions of absorption regions, for 95% of the atmospheric conditions and the 10 000 image pixels took less than 30 s. Sample surface up to 3% in the O2-A and the H2Obandat∼820 nm. reflectance spectra for the 12 distinctive land cover classes In our second test case scenario, the 500-sample GPR in Fig. 3 are shown at the bottom of Fig. 13. An overall emulators of MODTRAN atmospheric transfer functions at agreement is found for all distinctive surfaces, with errors medium spectral resolution were applied to atmospherically ∼1% inside the O2-Aand3.6%inO2-B absorptions. These correct a subset of OLCI Level-1B data. The processing time errors are slightly above the FLEX mission requirements (1%). was 25 m for 1.3 million pixels in batches of 100 000 pixels to However, it must be remarked that the error propagation avoid saturating the RAM memory. Fig. 11 shows the statistics combines the emulator and the residual spectral calibration of spectral reflectance values in the OLCI image as provided errors.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 11

the employed methodology to recover the multi-output spectral data. Consequently, the training and application of these mod- els will take longer times when more components are added to the emulator. However, GPR has an additional advantage over other methods. While all emulators- and LUT-based methods allow computing derivatives numerically, only kernel methods (KRR and GPR) yield the explicit formulas for the calculation of Jacobians analytically. This allows error quantification and propagation directly from their model structure, avoiding the implementation of complex error propagation strategies nor requiring the calculation of any numerical approximation or empirical derivatives [35], [67]. These associated uncertainty estimates can function to propagate errors from atmospheric correction algorithms and the Jacobians in efficient numerical inversion algorithms. Finally, although the best performing emulators reached accuracies with a relative error (NRMSE) of 0.2% for the tested data sets, there is still a margin for improvement foreseen. The latest generation of deep GPRs is indeed a promising alternative to the classical GPRs used Fig. 13. Sample surface reflectance spectra from the ground truth refer- in this work [68], [69]. Moreover, we can think about con- ence (blue) and retrieved with the emulators (red). Bottom: zoom window in straining the emulators-based physical correlations of the six the O2-A (left-hand side) and O2-B (right-hand side) absorption bands. atmospheric transfer functions, which were here emulated independently. For example, it is known that an increase in VI. DISCUSSION AOT will increase the absorption and the multiple scattering, thus reducing the direct transmittance (T ) while increasing In this section, we discuss the most important messages dir the diffuse transmittance (T ). By including these correla- derived from analyzing the emulator results (Section VI-A), dif tions, we would expect better conditioned emulators with an and the implications on further studies for atmospheric correc- improvement in accuracy and performance. tion (Section VI-B). We finally discuss possible data process- A second aspect studied in this work is the effect of the ing opportunities with atmospheric emulators (Section VI-C). training database. Based on the usage of traditional LUT interpolation methods, it was initially expected that larger A. Interpreting Emulator Results training databases would turn into more-accurate emulators. Emulation is a statistical technique used to approximate This aspect was investigated in order to optimize the emulator deterministic models of large computational burden [23]. Pre- accuracy. Yet, the results indicate a residual increase in the vious experiments have shown that regression algorithms can GPR emulator accuracy between 500 and 1000 samples and a accurately function as emulators to approximate RTMs [20], kind of saturation effect after 1000 samples. One explanation [24], [37] and obtain competitive results with respect to is that adding more samples in the same input variable space traditional LUT interpolation [21]. In this work, we explored does not add new information to the . With further the concept of emulating spectral transfer functions a few samples, the GPR emulator can already recover the for atmospheric correction. MODTRAN6 was used to gener- smooth dependencies of the atmospheric transfer functions ate the databases of atmospheric transfer functions, but the with the input variables. Another explanation is that the emulation technique can be applied to any other atmospheric reference data set was generated with the same data distri- RTM as demonstrated in [25]. Specifically, key aspects that bution (LHS) than the training data sets. Thus, the emulator impact the performance of an emulator were systematically is somehow optimized for variations of the spectral outputs in analyzed. These aspects include: 1) used statistical regres- this input variable space and distribution. Moreover, it must be sion algorithm; 2) training database size; and 3) used DR remarked that larger training databases go along with longer method and number of components, linked with spectral processing times, both in database generation and emulator resolution. Each of these aspects is briefly discussed in the execution, nearly doubling the computation time when the following. training database doubled its size. It worth noting that in this The first aspect studied in this work is a comparison of the work, we did not study the impact of sampling methods in performance of various regression algorithms. From the five the training database. And only the LHS method was used algorithms tested, GPR and KRR reached systematically the for its simplicity and accuracy [24]. However, the methods best accuracies for all the six atmospheric transfer functions in known as experimental design (also known as active learning) agreement with our previous results in [24]. The NNs and RFs can potentially be used to further improve the accuracy of tested here were not able to reach the accuracy achieved by an emulator [70]–[72]. Indeed, these methods aim at finding GPR. Though a multi-output version of the GPR method exists an optimal (and minimum) set of training samples with a [66], the version used in this work is a single-output method, consequent reduction in the error of the statistical regression and thus, looping over each individual PCA component was and its computation time.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

12 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

A third aspect was the evaluation of DR in terms of (Tx ), we would have an effective gas absorption (Tgas,x ) whose algorithms and number of components, and how these DR spectral features are well known and common for all of these methods perform with an increasing spectral resolution. The spectral functions based on the formulation of the spectral implementation of the DR step is key for the emulation of optical depth (τi ) in [76]. An effective optical depth (per multi-outputs models, as it allows reducing the RTM spec- gas molecule) could be used as a DR component to account tral output to a manageable number of components before for variations in the gas concentration, pressure conditions, applying a single-output regression algorithm. A suitable DR and optical path due to coupled absorption-scattering effects. method should also allow reconstructing the full spectrum in In addition, the molecular (Rayleigh) and aerosol scattering order to deliver spectroscopy outputs of several thousands of consist of smooth spectra that can be empirically represented bands, and thus, limit the choice of other more advanced DR by a double exponential or by a simplified physical model, methods [73], [74]. In this work, we analyzed two classical such as [12]. These empirical parameters correspond to new statistical approaches being PCA and PLS. On the one hand, DR components. This concept is represented in (3)   the PCA method solves the multicollinearity problem in the ng model outputs without considering the correlations between Tx ≡ Tscat(px ) · Tgas,x = Tscat(px ) · exp wi,x τi (3) dependent (outputs) and independent (inputs) variables. On the i=1 other hand, PLS is a more advanced version of PCA that takes where p are the parameters of the scattering curve and w , into account these correlations between variables. However, x i x are the parameters modifying the effective optical depth of the in the absence of clusters in the input variables, both methods n gases, both specific for each transfer function (x subindex). are mathematically alike which explains why PLS did not g The gaseous spectral optical depth is provided by running an lead to performance improvement as compared with PCA atmospheric RTM with fixed atmospheric conditions. for the analyzed number of components. In fact, the PCA method derived slightly lower errors in the reconstruction of the six atmospheric transfer functions and for all spectral B. Opportunities for Atmospheric Correction channels. Once selected the PCA method to train our GPR The applicability of emulating atmospheric RTM data has emulators, we analyzed the effect of increasing the number been demonstrated in three study cases of atmospheric cor- of components. Indeed, we could expect that by adding more rection of real and synthetic satellite data. The first test case components in the DR we would achieve better spectral recon- served to validate the correct functionality of the trained struction, particularly at higher spectral resolution. Previous GPR emulators over a set of MODTRAN TOA research in [48] and [49] indicate that 100–400 PCA compo- radiance simulations. The reconstruction of simulated TOA nents are needed to accurately reconstruct the spectral data. radiance spectra was achieved with an average (maximum) Yet, our results indicate that adding more than components error of ∼0.1% (0.5%), excluding deep H2O absorption bands, 20–30 did not seem to further reduce the reconstruction errors. in a wide range of atmospheric and geometric conditions. The same conclusion is derived for the coarse, medium, and These errors are an order of magnitude lower than current fine spectral resolutions, though with small error differences state-of-the-art absolute radiometric calibration performances when comparing inside and outside absorption regions. The of satellite spectrometers [60], and on the same order of signal- apparent discrepancy can be explained by the application to-noise requirements for imaging spectrometers. The emulator domain and error requirements. The cited works focus on performance was further analyzed in terms of atmospherically reproducing high spectral resolution (0.5 cm−1) spectra from corrected surface reflectance data. Here, the results showed the IASI/MetOp instrument inside of deep gas absorptions for errors typically below 1%. These errors are in agreement applications in trace gas monitoring, where a selection of the with our previous analysis in [21], demonstrating the potential number of components was based on having a reconstruction accuracy of emulators against classical LUT interpolation for error below the random instrument noises. We instead focus atmospheric correction. on a broader spectral range and coarser spectral resolution In the second test case, the atmospheric correction of nearly for applications in atmospheric correction, choosing a more 1.3 million pixels of OLCI L1B data subset took about relaxed error requirement based on absolute radiometric cali- 25 m, showing that GPR emulators can be used efficiently bration. This comparison between works underlines that the in practical applications. In terms of computation time and shape of the spectral data, application domain, and error memory usage of the GPR emulators, further improvements requirements determine the optimized number of components. are expected when optimizing the number of spectral channels In practice, some repetitions are required to deduce this and resolution of the simulated training database. Moreover, optimal number. Nevertheless, although PCA can be concluded if the emulators are trained with the simulated data after as the preferred DR method among the two explored options, spectral convolution by the ISRF, the emulation time will the development of alternative DR methods to improve the drastically drop down when reducing several thousand spectral reconstruction of the signal in the entire spectral range is channels from the high spectral resolution data down to strongly encouraged (e.g., [50]). An envisaged option is to 21 spectral channels of OLCI data. In addition, this pre- consider the known physics of the radiative transfer in a DR processing step of spectral convolution will avoid the need of method. Here, we could decouple the gas absorption from the applying a DR technique, consequently saving computation molecular and aerosol scattering following a similar approach time and improving the emulator accuracy. However, this to [75]. For each of the six atmospheric transfer functions strategy might still increase the errors in case of a strong

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 13 smile effect in an imaging spectrometer, particularly within there is a tremendous gain in processing speed when com- absorption bands. In terms of accuracy, the discrepancies in paring emulators to the original RTM simulations. Second, the retrieved surface reflectance statistics can be explained due emulators have competitive or superior accuracies to classical to several reasons. First, the OLCI SYN surface reflectance LUT interpolation techniques. Third, as with any regression product is obtained using the MOMO RTM [14] to perform the method, GPR emulators provide associated uncertainties that atmospheric correction [6], while we used MODTRAN to train are not straightforward to calculate with LUT interpolation the GPR emulators. The differences in the radiative transfer methods. Fourth, Jacobians are directly calculated based on solver (e.g., vectorial in MOMO and scalar in MODTRAN) the analysis of the numerical derivatives in the LUT nodes. and in the aerosol parameters (OPAC aerosols in [6] against Finally, the memory consumption is much lower than storing effective optical properties in our MODTRAN GPR emulators) and reading a large LUT for its interpolation. Consequently, might explain the statistical differences, particularly at shorter RTM emulators can potentially be applied in a diversity of wavelengths where the aerosol scattering and atmospheric remote sensing (RS) applications. Some of these applications polarization of the signal have a stronger effect. Second, were already described in [26], such as fast computation of the atmospheric correction in the OLCI SYN processing is global sensitivity analysis [24], [25]. Here, we outline two carried out at TOA reflectance data, i.e., after normalization more promising emulation applications related to atmospheric by the extraterrestrial solar irradiance measured by the solar RTMs. diffuser on-board of Sentinel-3. We instead processed the 1) Numerical Inversion: Iterative optimization is a tech- OLCI data in radiance units, using the default extraterrestrial nique to invert geophysical parameters from spectral data solar irradiance spectrum in the MODTRAN simulations. by comparing it against RTM simulations. Commonly used The choice of the solar irradiance spectrum could explain in aerosol characterization [6], the optimization consists in differences along with OLCI’s spectral range. Third, the first minimizing a cost function, i.e., the difference between OLCI band (centered at 400 nm with a spectral resolution measured and estimated TOA radiance by successive itera- of 15 nm) covers observations below the spectral range in tions in the input variables. This iterative procedure can be our training database (400–2500 nm). At these short wave- time-consuming when large spectral data sets are inverted lengths, the deep O3 absorption and spectral shape of the solar and impractical for computationally intensive RTMs. There- irradiance spectrum are not well covered in our simulations, fore, interpolating precomputed LUTs is common practice in which explains the negative reflectances obtained for the first atmospheric characterization algorithms [5], [7]. Following the OLCI spectral channel. Finally, our processing with emulators same approach, emulators can be used to replace the original assumes ideal Lambertian surface reflectance and neglects the RTMs and be applied in numerical inversion as an attractive adjacency and topographic effects, which are likely important alternative to classical LUT-based methods. It would bypass in the selected mountainous area. the need to generate large LUTs and develop complex LUT In the final test case, we trained GPR emulators from interpolation routines while leading to faster and more accurate an MODTRAN database of atmospheric transfer functions results [21]. tailored for the hyperspectral FLORIS/FLEX data. The FLEX 2) Data Assimilation and Numerical Weather Prediction: mission simulator was used to generate realistic instrument Numerical weather prediction relies on the assimilation of FLORIS data over various vegetated and bare-soil surfaces satellite TOA radiance measurements [77]. This assimilation and standard atmospheric conditions. In terms of computation is carried out by simulating the processes of emission and efficiency, the atmospheric correction with GPR emulators absorption by surface, clouds, and gases, along the line- took less than 30 s, a fraction of the several minutes that of-sight of satellite instrument measurements. In order not might have taken with a classical LUT interpolation technique to delay the production of weather forecasting, the RTM [21]. In terms of accuracy, the errors in surface reflectance simulations must be made in a few milliseconds. Hence, fast were ∼1% (4%) in the O2-A (O2-B) absorption band. These atmospheric models are a major requirement to enable the errors are slightly above the mission requirements for accurate assimilation of satellite measurements within weather predic- retrieval of Sun-induced fluorescence emission [62]. We have tion models. An example of such RTMs is the RTTOV model to remark here the stringent reflectance error requirements in [11], a fast model for simulating radiances acquired by passive a challenging spectral region affected by narrow, spectrally optical and microwave satellite instruments (e.g., IASI). With variable, and deep O2 absorption features. Indeed, along an increasing spectral resolution of satellite instruments, effi- with this work, we have observed that the trained emulators ciency and accuracy are current challenges of RTTOV [49]. typically achieve lower accuracies inside of gaseous absorption Atmospheric emulators offer an alternative, or complement, regions than in spectrally smooth regions. This indicates that to fast RTMs for data assimilation applications. Moreover, developing more accurate DR methods (statistical-based or the ingestion of uncertainties on data assimilation schemes physically based) should be an important research line to is essential; here, the use of emulators could represent an improve the accuracy of emulators for atmospheric correction. additional advantage.

C. Data Processing Applications With Atmospheric VII. CONCLUSION Emulators Emulation is a technique to approximate deterministic Emulators offer similar and additional advantages when models based on statistical regression algorithms. With a com- compared with traditional LUT interpolation techniques. First, petitive or even higher accuracy than traditional LUT interpo-

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

14 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING lation, emulators also provide important savings in memory [5] T. Cooley et al., “FLAASH, a MODTRAN4-based atmospheric correc- and in computation time. Hence, they offer practical solutions tion algorithm, its application and validation,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., vol. 3, Jun. 2002, pp. 1414–1418. for data processing applications and new research opportuni- [6] P. North and A. Heckel, “Sentinel-3 optical products and algorithms ties by converting computationally expensive RTMs into fast definitions: SYN algorithm theoretical basis document,” Geography surrogate models. Given a subset of typical input variables Dept., Swansea Univ., U.K., Tech. Rep. S3-L2-SD-03-S02-ATBD, 2010. [7] D. R. Thompson, V. Natraj, R. O. Green, M. C. Helmlinger, B.-C. Gao, used in atmospheric correction, we have first analyzed key and M. L. Eastwood, “Optimal estimation for imaging spectrometer aspects of emulator design that play a role in the accuracy of atmospheric correction,” Remote Sens. Environ., vol. 216, pp. 355–373, reproducing MODTRAN simulations of spectral atmospheric Oct. 2018. [8] W. Q. Fu, “Radiative transfer,” in Atmospheric Science, J. M. Wallace transfer functions, being: 1) the used statistical regression and P. V. Hobbs, Eds., 2nd ed. San Diego, CA, USA: Academic, 2006, algorithm; 2) the training database size; 3) the used DR method ch. 4, pp. 113–152. to simplify the spectral dimension into components; and 4) the [9] C. Tenjo et al., “Design of a generic 3-D scene generator for passive number of components to where the regression is applied as a optical missions and its implementation for the ESA’s FLEX/Sentinel- 3 tandem mission,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 3, function of the spectral resolution of the simulated data. The pp. 1290–1307, Mar. 2018. GPR was found best suited to function as an emulator for the [10] O. Dubovik et al., “Statistically optimized inversion algorithm for six atmospheric transfer functions. With a training database of enhanced retrieval of aerosol properties from spectral multi-angle polari- metric satellite observations,” Atmos. Meas. Techn., vol. 4, no. 5, approximately 1000 samples, the GPR emulators can recon- pp. 975–1018, May 2011. struct MODTRAN spectral outputs for any combination of the [11] R. Saunders et al., “An update on the RTTOV fast radiative transfer input variables with an average NRMSE of 0.2%. From the model (currently at version 12),” Geoscientific Model Develop., vol. 11, no. 7, pp. 2717–2737, Jul. 2018. two tested standard DR methods, PCA achieves slightly better [12] F. C. Seidel, A. A. Kokhanovsky, and M. E. Schaepman, “Fast and accuracies when decomposing and reconstructing the spectra. simple model for atmospheric radiative transfer,” Atmos. Meas. Techn., Typically, 20 components achieve the lowest errors for various vol. 3, no. 4, pp. 1129–1141, Aug. 2010. [13] A. Berk, P. Conforti, R. Kennett, T. Perkins, F. Hawes, and spectral resolutions. A larger number of components might J. Van Den Bosch, “MODTRAN6: A major upgrade of the MOD- better preserve the spectral variability within gas absorption TRAN radiative transfer code,” Proc. SPIE, vol. 9088, Jun. 2014, regions but at the expenses of increasing the computation Art. no. 90880H. [14] F. Fell and J. Fischer, “Numerical simulation of the light field in the time. The best performing emulators were applied second in atmosphere–ocean system using the matrix-operator method,” J. Quant. three atmospheric correction scenarios for real-OLCI Level-1B Spectrosc. Radiat. Transf., vol. 69, no. 3, pp. 351–388, May 2001. data and synthetic hyperspectral and FLORIS/FLEX Level-1B [15] C. Emde et al., “The libRadtran software package for radiative transfer calculations (version 2.0.1),” Geoscientific Model Develop., vol. 9, no. 5, data. Nearly 1.3 million OLCI pixels were atmospherically pp. 1647–1672, May 2016. corrected in less than 25 m and in less than 30 s for [16] L. Guanter, R. Richter, and H. Kaufmann, “On the application of 10 000 FLORIS pixels. In terms of accuracy, the atmospheric the MODTRAN4 atmospheric radiative transfer code to optical remote correction using emulators was achieved with errors below sensing,” Int. J. Remote Sens., vol. 30, no. 6, pp. 1407–1424, Mar. 2009. [17] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions 1% outside of absorption regions and below 3.6% in the (Applied Mathematics Series), vol. 55. Washington, DC, USA: National O2 bands for FLEX measurements. These results show that Bureau Standard, 1964, ch. 25.2. emulators can replace computationally expensive atmospheric [18] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, “The Quickhull algorithm for convex hulls,” ACM Trans. Math. Softw., vol. 22, no. 4, RTMs and potentially be applied in atmospheric correction pp. 469–483, Dec. 1996. of satellite instruments at various spectral resolutions with a [19] I. Sibson, Handbook of Mathematical Functions (Interpolating Mul- competitive accuracy and computational efficiency. Moreover, tivariate Data), vol. 55. New York, NY, USA: Wiley, 1989, ch. 2, pp. 21–36. emulators can be implemented in any other data processing [20] J. Gómez-Dans, P. Lewis, and M. Disney, “Efficient emulation of application, where atmospheric RTMs are involved. Finally, radiative transfer codes using Gaussian processes and application to we have identified potential research lines to improve the land surface parameter inferences,” Remote Sens., vol. 8, no. 2, p. 119, Feb. 2016. accuracy of atmospheric RTMs involving the latest progress in [21] J. Vicent et al., “Emulation as an accurate alternative to interpolation deep GPRs, implementation of active learning methods, and in sampling radiative transfer codes,” IEEE J. Sel. Topics Appl. Earth optimization of DR techniques-based physical approximations. Observ. Remote Sens., vol. 11, no. 12, pp. 4918–4931, Dec. 2018. [22] M. Kennedy and A. O’Hagan, “Predicting the output from a complex It is expected that with a combination of these improvements, computer code when fast approximations are available,” Biometrika, emulators will further increase their accuracy in reproducing vol. 87, no. 1, pp. 1–13, Mar. 2000. the spectral outputs of an atmospheric RTM. [23] A. O’Hagan, “Bayesian analysis of computer code outputs: A tutorial,” Rel. Eng. Syst. Saf., vol. 91, nos. 10–11, pp. 1290–1300, Oct. 2006. [24] J. Verrelst et al., “Emulation of leaf, canopy and atmosphere radiative EFERENCES R transfer models for fast global sensitivity analysis,” Remote Sens.,vol.8, [1] A. A. Kokhanovsky et al., “Aerosol remote sensing over land: A com- no. 8, p. 673, Aug. 2016. parison of satellite retrievals using different algorithms and instruments,” [25] J. Vicent et al., “Comparative analysis of atmospheric radiative transfer Atmos. Res., vol. 85, nos. 3–4, pp. 372–394, Sep. 2007. models using the atmospheric look-up table generator (ALG) tool- [2] M. E. Schaepman, S. L. Ustin, A. J. Plaza, T. H. Painter, J. Verrelst, box (version 2.0),” Geoscientific Model Develop., vol. 13, no. 4, and S. Liang, “Earth system science related imaging spectroscopy— pp. 1945–1957, 2020. An assessment,” Remote Sens. Environ., vol. 113, pp. S123–S137, [26] J. Verrelst, J. R. Caicedo, J. Muñoz-Marí, G. Camps-Valls, and Sep. 2009. J. Moreno, “SCOPE-based emulators for fast generation of synthetic [3] M. Gholizadeh, A. Melesse, and L. Reddi, “A comprehensive review on canopy reflectance and sun-induced fluorescence spectra,” Remote Sens., water quality parameters estimation using remote sensing techniques,” vol. 9, no. 9, p. 927, Sep. 2017. Sensors, vol. 16, no. 8, p. 1298, Aug. 2016. [27] F. Baret, J. G. P. W. Clevers, and M. D. Steven, “The robustness of [4] L. S. Bernstein, X. Jin, B. Gregor, and S. M. Adler-Golden, canopy gap fraction estimates from red and near-infrared reflectances: “Quick atmospheric correction code: Algorithm description and recent A comparison of approaches,” Remote Sens. Environ., vol. 54, no. 2, upgrades,” Opt. Eng., vol. 51, no. 11, pp. 1–12, 2012. pp. 141–151, Nov. 1995.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 15

[28] J. Verrelst et al., “Optical remote sensing and the retrieval of terrestrial [53] H. Wold, “Non-linear estimation by iterative least procedures squares,” vegetation bio-geophysical properties–a review,” ISPRS J. Photogramm. in Research Papers in Statistics,F.DavidandJ.Neyman,Eds. Remote Sens., vol. 108, pp. 273–290, Oct. 2015. New York, NY, USA: Wiley, 1966. [29] C. Carnevale, G. Finzi, G. Guariso, E. Pisoni, and M. Volta, “Surrogate [54] J. Verrelst, E. Romijn, and L. Kooistra, “Mapping vegetation den- models to compute optimal air quality planning policies at a regional sity in a heterogeneous river floodplain ecosystem using pointable scale,” Environ. Model. Softw., vol. 34, pp. 44–50, Jun. 2012. CHRIS/PROBA data,” Remote Sens., vol. 4, no. 9, pp. 2866–2889, [30] A. Castelletti, S. Galelli, M. Ratto, R. Soncini-Sessa, and P. C. Young, Sep. 2012. “A general framework for dynamic emulation modelling in environmen- [55] W. Verhoef and H. Bach, “Simulation of Sentinel-3 images by four- tal problems,” Environ. Model. Softw., vol. 34, pp. 5–18, Jun. 2012. stream surface–atmosphere radiative transfer modeling in the optical [31] N. Bounceur, M. Crucifix, and R. Wilkinson, “Global sensitivity analysis and thermal domains,” Remote Sens. Environ., vol. 120, pp. 197–207, of the climate–vegetation system to astronomical forcing: An emulator- May 2012. based approach,” Earth Syst. Dyn. Discuss., vol. 5, no. 2, pp. 901–943, [56] K. Stamnes, S.-C. Tsay, W. Wiscombe, and K. Jayaweera, “Numerically 2014. stable algorithm for discrete-ordinate-method radiative transfer in mul- [32] J. Verrelst, J. Vicent, J. P. Rivera-Caicedo, M. Lumbierres, tiple scattering and emitting layered media,” Appl. Opt., vol. 27, no. 12, P. Morcillo-Pallarés, and J. Moreno, “Global sensitivity analysis of pp. 2502–2509, 1988. leaf-canopy-atmosphere RTMs: Implications for biophysical variables [57] R. Goody, R. West, L. Chen, and D. Crisp, “The correlated-k method retrieval from top-of-atmosphere radiance data,” Remote Sens., vol. 11, for radiation calculations in nonhomogeneous atmospheres,” J. Quant. no. 16, p. 1923, Aug. 2019. Spectrosc. Radiat. Transf., vol. 42, no. 6, pp. 539–550, Dec. 1989. [33] G. Camps-Valls and L. Bruzzone, Eds., Kernel Methods for Remote [58] A. Berk and F. Hawes, “Validation of MODTRAN6 and its line-by-line Sensing Data Analysis. U.K.: Wiley, Dec. 2009. algorithm,” J. Quant. Spectrosc. Radiat. Transf., vol. 203, pp. 542–556, [34] S. Razavi, B. A. Tolson, and D. H. Burn, “Numerical assessment Dec. 2017. of metamodelling strategies in computationally intensive optimization,” [59] S. Conti and A. O’Hagan, “Bayesian emulation of complex multi-output Environ. Model. Softw., vol. 34, pp. 67–86, Jun. 2012. and dynamic computer models,” J. Stat. Planning Inference, vol. 140, [35] J. E. Johnson, V. Laparra, and G. Camps-Valls, “Accounting for input no. 3, pp. 640–651, Mar. 2010. noise in Gaussian process parameter retrieval,” IEEE Geosci. Remote [60] N. Lamquin, S. Clerc, L. Bourg, and C. Donlon, “OLCI A/B tandem Sens. Lett., vol. 17, no. 3, pp. 391–395, Mar. 2020. phase analysis, part 1: Level 1 homogenisation and harmonisation,” [36] R. K. S. Hankin, “Introducing BACCO, an R package for Bayesian Remote Sens., vol. 12, no. 11, p. 1804, Jun. 2020. analysis of computer code output,” J. Stat. Softw., vol. 14, no. 16, [61] M. Hess, P. Koepke, and I. Schult, “Optical properties of aerosols and pp. 1–21, 2005. clouds: The software package OPAC,” Bull. Amer. Meteorolog. Soc., [37] J. P. Rivera, J. Verrelst, J. Gómez-Dans, J. M. Marí, J. Moreno, and vol. 79, no. 5, pp. 831–844, May 1998. G. Camps-Valls, “An emulator toolbox to approximate radiative transfer [62] Report for Mission Selection: FLEX, ESA SP-1330/2 (2 Volume Series), models with statistical learning,” Remote Sens., vol. 7, no. 7, p. 9347, Eur. Space Agency, Noordwijk, The Netherlands, 2015. 2015. [63] J. Vicent et al., “FLEX end-to-end mission performance simulator,” [38] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 4215–4223, IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968. Jul. 2016. [39] M. D. McKay, R. J. Beckman, and W. J. Conover, “Comparison of three [64] P. Coppo, A. Taiti, L. Pettinato, M. Francois, M. Taccola, and M. Drusch, methods for selecting values of input variables in the analysis of output “Fluorescence imaging spectrometer (FLORIS) for ESA FLEX mission,” from a computer code,” Technometrics, vol. 21, no. 2, pp. 239–245, Remote Sens., vol. 9, no. 7, p. 649, Jun. 2017. May 1979. [65] N. Sabater, J. Vicent, L. Alonso, S. Cogliati, J. Verrelst, and J. Moreno, [40] G. D. S. Ferreira and D. Gamerman, “ in geosta- “Impact of atmospheric inversion effects on solar-induced chlorophyll tistics under preferential sampling,” Bayesian Anal., vol. 10, no. 3, fluorescence: Exploitation of the apparent reflectance as a quality pp. 711–735, Sep. 2015. indicator,” Remote Sens., vol. 9, no. 6, p. 622, Jun. 2017. [41] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, [66] M. A. Álvarez, L. Rosasco, and N. D. Lawrence, “Kernels for vector- 2001. valued functions: A review,” Found. Trends Mach. Learn., vol. 4, no. 3, [42] S. Haykin, Neural Networks—A Comprehensive Foundation, 2nd ed. pp. 195–266, 2012. Upper Saddle River, NJ, USA: Prentice-Hall, Oct. 1999. [67] J. Verrelst, J. P. Rivera, J. Moreno, and G. Camps-Valls, “Gaussian [43] J. Rojo-Álvarez, M. Martínez-Ramón, J. M. Marí, and G. Camps-Valls, processes uncertainty estimates in experimental Sentinel-2 LAI and leaf Digital Signal Processing With Kernel Methods. U.K.: Wiley, Apr. 2018. chlorophyll content retrieval,” ISPRS J. Photogramm. Remote Sens., [44] J. A. K. Suykens and J. Vandewalle, “Least squares support vector vol. 86, pp. 157–167, Dec. 2013. machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, [68] A. C. Damianou and N. D. Lawrence, “Deep Gaussian processes,” in Jun. 1999. Proc. Int. Conf. Artif. Intell. Statist., Scottsdale, AZ, USA, vol. 31, 2013, [45] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine pp. 207–215. Learning. New York, NY, USA: MIT Press, 2006. [69] D. H. Svendsen, P. Morales-Álvarez, A. B. Ruescas, R. Molina, and [46] G. Camps-Valls, L. Gómez-Chova, J. Munoz-Marí, M. Lázaro-Gredilla, G. Camps-Valls, “Deep Gaussian processes for biogeophysical parameter and J. Verrelst, “Simpler: A simple educational MATLAB toolbox for retrieval and model inversion,” ISPRS J. Photogramm. Remote Sens., statistical regression,” Univ. Valencia, Valenica, Spain, Tech. Rep., 2013. vol. 166, pp. 68–81, Aug. 2020. [47] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” [70] L. Martino, J. Vicent, and G. Camps-Valls, “Automatic emulator and Intell. Lab. Syst., vol. 2, nos. 1–3, pp. 37–52, 1987. optimized look-up table generation for radiative transfer models,” in [48] X. Liu, W. L. Smith, D. K. Zhou, and A. Larar, “Principal component- Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2017, based radiative transfer model for hyperspectral sensors: Theoretical pp. 1457–1460. concept,” Appl. Opt., vol. 45, no. 1, pp. 201–209, 2006. [71] J. Vicent et al., “Gradient-based automatic lookup table generator for [49] M. Matricardi, “A principal component based version of the RTTOV radiative transfer models,” IEEE Trans. Geosci. Remote Sens., vol. 57, fast radiative transfer model,” Quart. J. Roy. Meteorolog. Soc., vol. 136, no. 2, pp. 1040–1048, Feb. 2019. no. 652, pp. 1823–1835, Oct. 2010. [72] D. H. Svendsen, L. Martino, and G. Camps-Valls, “Active emulation [50] D. Efremenko, A. Doicu, D. Loyola, and T. Trautmann, “Optical prop- of computer codes with Gaussian processes—Application to remote erty dimensionality reduction techniques for accelerated radiative trans- sensing,” Pattern Recognit., vol. 100, Apr. 2020, Art. no. 107103. fer performance: Application to remote sensing total ozone retrievals,” [73] G. Camps-Valls, J. M. Marí, L. Gómez-Chova, L. Guanter, and J. Quant. Spectrosc. Radiat. Transf., vol. 133, pp. 128–135, Jan. 2014. X. Calbet, “Nonlinear statistical retrieval of atmospheric profiles [51] A. D. Águila, D. Efremenko, V. M. García, and J. Xu, “Analysis of two from MetOp-IASI and MTG-IRS infrared sounding data,” IEEE dimensionality reduction techniques for fast simulation of the spectral Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1759–1769, radiances in the Hartley-Huggins band,” Atmosphere, vol. 10, no. 3, May 2012. p. 142, Mar. 2019. [74] J. Arenas-Garcia, K. B. Petersen, G. Camps-Valls, and L. K. Hansen, [52] C. Liu et al., “A spectral data compression (SDCOMP) radiative transfer “Kernel multivariate analysis framework for supervised subspace learn- model for high-spectral-resolution radiation simulations,” J. Atmos. Sci., ing: A tutorial on linear and kernel multivariate methods,” IEEE Signal vol. 77, no. 6, pp. 2055–2066, Jun. 2020. Process. Mag., vol. 30, no. 4, pp. 16–29, Jul. 2013.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

16 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

[75] E. F. Vermote, D. Tanre, J. L. Deuze, M. Herman, and J.-J. Morcette, Jordi Muñoz-Marí was born in Valencia, Spain, “Second simulation of the satellite signal in the solar spectrum, 6S: in 1970. He received the B.Sc. degree in physics, An overview,” IEEE Trans. Geosci. Remote Sens., vol. 35, no. 3, the B.Sc. degree in electronics engineering, and the pp. 675–686, May 1997. Ph.D. degree in electronics engineering from the [76] B. A. Bodhaine, N. B. Wood, E. G. Dutton, and J. R. Slusser, “On Universitat de Valencia (UV), Valencia, in 1993, Rayleigh optical depth calculations,” J. Atmos. Ocean. Technol., vol. 16, 1996, and 2003, respectively. no. 11, pp. 1854–1861, Nov. 1999. He is an Associate Professor with the Department [77] J. Coiffier, Fundamentals of Numerical Weather Prediction. Cambridge, of Electronics Engineering, UV, where he teaches U.K.: Cambridge Univ. Press, 2011. machine learning, big data, digital signal processing, and electronics, and also a Research Member of the Image Processing Laboratory. His research interests are tied to machine learning, statistical methods, and digital signal processing applied to data analysis in general and remote sensing (RS) in particular. He is a skilled programmer in different computer languages, such as C/C++,Java, Python, and others. He has authored and coauthored many journal articles, book chapters, and conference papers. Please visit https://www.uv.es/jordi/ for Jorge Vicent Servera received the B.Sc. degree in more information. physics from the University of Valencia, Valencia, Spain, in 2008, the M.Sc. degree in physics from the École polytechnique fédérale de Lausanne, Switzer- land, in 2010, and the Ph.D. degree in remote sens- ing (RS) from the University of Valencia in 2016. Since 2017, he has been a Research and Develop- ment with the Earth Observation Depart- ment, Magellium, Toulouse, France. He is involved in developing the Level-2 processing chain for ESA’s FLEX mission. His research interests include the modeling of Earth observation satellites, radiative transfer modeling, sim- ulation of synthetic scenes, atmospheric correction, and hyperspectral data analysis.

Neus Sabater received the B.Sc. degree in physics, the M.Sc. degree in remote sensing (RS), and the Ph.D. degree in RS from the Universitat de Valencia, Valencia, Spain, in 2010, 2012, and 2018, respec- tively. Juan Pablo Rivera-Caicedo received the B.Sc. Since 2012, she has been involved in the activities degree in agricultural engineering from the Univer- of the Laboratory for Earth Observation, Image sity National of Colombia, Bogotá, Colombia, and Processing Laboratory, University of Valencia, as a the University of Valle, Cali, Colombia, in 2001, the Research Technician. Her main activities during M.Sc. degree in irrigation engineering from the this period were related to the development of the CEDEX-Centro de Estudios y Experimentación de preparatory activities of the FLEX mission. In 2013, Obra Públicas, Madrid, Spain, in 2003, and the she was supported by the Ph.D. scholarship from the Spanish Ministry of M.Sc. and Ph.D. degrees in remote sensing (RS) Economy and Competitiveness and associated with the Ingenio/Seosat Spanish from the University of Valencia, Valencia, Spain, space mission. Her main research and personal interests include atmospheric in 2011 and 2014, respectively. correction, atmospheric radiative transfer, meteorology, and hyperspectral RS. Since 2011, he has been a member of the Lab- Dr. Sabater received the award of the University of Valencia for the best oratory for Earth Observation, Image Processing Laboratory, University of student records in the M.Sc. degree of RS from 2012 to 2013. Valencia. Since 2016, he has been in the program Cátedras, Conacyt Program, with the Consejo Nacional de Ciencia y Tecnología, Tepic, Mexico. His research interests include retrieval of vegetation properties using airborne and satellite data, leaf and canopy radiative transfer modeling, and hyperspectral data analysis.

Jochem Verrelst received the M.Sc. degree in trop- ical land use and in geo-information science and the Ph.D. in remote sensing (RS) from Wageningen Uni- Béatrice Berthelot received the M.Sc. degree in versity, Wageningen, The Netherlands, in 2005 and physics and chemistry of environment and the 2010, respectively. His Ph.D. dissertation was on the Ph.D. degree in ocean color from the University space-borne spectrodirectional estimation of forest Paul Sabatier, Toulouse, France, in 1988 and 1992, properties. respectively. Since 2010, he has been involved in prepara- She was with the Cesbio, Toulouse, France, tory activities of FLEX. He is the Founder of the in developing atmospheric corrections methods, data ARTMO software package. He is also the Co-Chair processing chains for Medium/HR sensor spatial res- of the SENSECO Cost Action (CA17134) that olution, and image simulation for seven years. She focuses on optical synergies for spatiotemporal sensing of scalable eco- moved to the industry to work on the development of physiological traits. In 2017, he received an H2020 ERC Starting Grant algorithms for different optical sensors and L1 and (755617) to work on the development of vegetation products based on the L2 data processing. She leads the Pole “Physics and Applications” with synergy of FLEX and Sentinel-3 data. His research interests include retrieval Magellium, Toulouse. Her research interests include atmospheric correction, of vegetation properties using airborne and satellite data, canopy radiative cloud screening, and cloud shadows detection, radiometry monitoring, BRDF transfer modeling and emulation, and hyperspectral data analysis. correction, and radiative transfer modeling.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

VICENT SERVERA et al.: SYSTEMATIC ASSESSMENT OF MODTRAN EMULATORS FOR ATMOSPHERIC CORRECTION 17

Gustau Camps-Valls (Fellow, IEEE) received the José Moreno (Senior Member, IEEE) is a Profes- Ph.D. degree in physics from the Universitat de sor of earth physics with the Department of Earth Valencia, Valencia, Spain, in 2002. Physics and Thermodynamics, Faculty of Physics, He is a Full Professor of electrical engineering and University of Valencia, Valencia, Spain, teaching a Coordinator of the Image and Signal Processing and working on different projects related to remote Group, Image Processing Laboratory, Universitat sensing (RS) and space research as responsible for de Valencia. He is also involved in the develop- the Laboratory for Earth Observation. He is also ment of machine learning algorithms for geoscience the Director of the Laboratory for Earth Observa- and remote sensing (RS) data analysis. He has tion, Image Processing Laboratory/Scientific Park, authored 200 journal articles, more than 200 con- Valencia. He has been involved in many interna- ference papers, and 20 international book chapters. tional projects and research networks, including the He has edited the books Kernel Methods Engineering, Signal and Image preparatory activities exploitation programs of several satellite missions, Processing (IGI, 2007), Kernel Methods for Remote Sensing Data Analysis such as ENVISAT, CHRIS/PROBA, GMES/Sentinels, and SEOSAT, and (Wiley & Sons, 2009), Remote Sensing Image Processing (MC, 2011), and the Fluorescence Explorer, European Space Agency (ESA)’s Eighth Earth Digital Signal Processing With Kernel Methods (Wiley & Sons, 2018). Explorer mission. His main work is related to the modeling and monitoring Dr. Camps-Valls was a recipient of the prestigious European Research of land surface processes by using RS techniques. Council Consolidator Grant on Statistical Learning for Earth Observation Data Dr. Moreno was a member of the ESA Earth Sciences Advisory Committee Analysis in 2015. He has been an Associate Editor of the IEEE TRANSAC- from 1998 to 2002. He has been a member of the Space Station Users Panel TIONS ON SIGNAL PROCESSING, the IEEE GEOSCIENCE AND RS LETTERS, and other international advisory committees. He has served as Associate Editor and the IEEE SIGNAL PROCESSING LETTERS. He was the Invited Guest for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING from Editor of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 1994 to 2000. in 2012 and the IEEE Geoscience and Remote Sensing Magazine in 2015. He holds Hirsch’s index, h = 60 (source: Google Scholar), and entered the ISI list of Highly Cited Researchers in 2011. One of his papers was identified on kernel-based analysis of hyperspectral images as a Fast Moving Front research by Thomson Reuters Science Watch.

Authorized licensed use limited to: Universidad de Valencia. Downloaded on April 22,2021 at 07:13:02 UTC from IEEE Xplore. Restrictions apply.