

Article

Retrieval of Cloud Optical Thickness from Sky-View Camera Images using a Deep Convolutional Neural Network based on Three-Dimensional Radiative Transfer

Ryosuke Masuda 1, Hironobu Iwabuchi 1,* , Konrad Sebastian Schmidt 2,3 , Alessandro Damiani 4 and Rei Kudo 5

1 Graduate School of Science, Tohoku University, Sendai, Miyagi 980-8578, Japan
2 Department of Atmospheric and Oceanic Sciences, University of Colorado, Boulder, CO 80309-0311, USA
3 Laboratory for Atmospheric and Space Physics, University of Colorado, Boulder, CO 80303-7814, USA
4 Center for Environmental Remote Sensing, Chiba University, Chiba 263-8522, Japan
5 Meteorological Research Institute, Japan Meteorological Agency, Tsukuba 305-0052, Japan
* Correspondence: [email protected]; Tel.: +81-22-795-6742

Received: 2 July 2019; Accepted: 17 August 2019; Published: 21 August 2019

Abstract: Observation of the spatial distribution of cloud optical thickness (COT) is useful for the prediction and diagnosis of photovoltaic power generation. However, there is not a one-to-one relationship between transmitted radiance and COT (so-called COT ambiguity), and it is difficult to estimate COT because of three-dimensional (3D) radiative transfer effects. We propose a method to train a convolutional neural network (CNN) based on a 3D radiative transfer model, which enables the quick estimation of the slant-column COT (SCOT) distribution from the image of a ground-mounted radiometrically calibrated digital camera. The CNN retrieves the SCOT spatial distribution using spectral features and spatial contexts. An evaluation of the method using synthetic data shows a high accuracy with a mean absolute percentage error of 18% in the SCOT range of 1–100, greatly reducing the influence of the 3D radiative effect. As an initial analysis result, COT is estimated from a sky image taken by a digital camera, and a high correlation is shown with the effective COT estimated using a pyranometer. The discrepancy between the two is reasonable, considering the difference in the size of the field of view, the space–time averaging method, and the 3D radiative effect.

Keywords: deep learning; cloud; 3D radiative transfer; sky-view camera; convolutional neural network

1. Introduction

Clouds are a primary modulator of the Earth's energy balance and climate. The observation of the spatial distribution of cloud physical properties, in particular, cloud optical thickness (COT), is useful for the evaluation of clouds' radiative effects and the diagnosis of photovoltaics, which is a promising form of renewable energy [1]. Many methods for the estimation of COT by optical measurement from satellites and the ground have been developed. Satellite remote sensing often relies on the reflection of solar radiation, while ground-based remote sensing usually relies on transmission. A one-to-one relationship between transmitted radiance and COT does not hold (COT ambiguity), and it is difficult to estimate COT from transmitted radiance measured at a single wavelength. Therefore, many ground-based remote sensing methods combining multiple wavelengths have been proposed. For example, Marshak et al. [2] and Chiu et al. [3] estimated COT using differences in the transmitted radiance between two wavelengths over a vegetated surface. Kikuchi et al. [4] estimated COT using the difference between the transmission at water-absorbing and non-absorbing bands. McBride et al. [5] estimated COT using the spectral gradient of transmitted radiance in the near-infrared and shortwave infrared. Niple et al. [6] estimated COT using the equivalent width of the oxygen absorption band. In these methods, the COT in the zenith direction is estimated using the transmitted radiance in the zenith direction. However, for the evaluation of the amount of photovoltaic power generation, it is desirable to estimate the COT distribution in the whole sky. Therefore, the use of digital cameras is attractive. Mejia et al. [7] developed the radiance red-to-blue ratio (RRBR) method, which combines the ratio of the transmitted radiance of the red and blue channels (red-to-blue ratio; RBR) and the transmitted radiance of the red channel. They estimated the COT of each pixel from a camera image.

Conventional cloud remote sensing methods, including all of the above-mentioned methods, perform the estimation based on a one-dimensional (1D) radiative transfer model that assumes plane-parallel clouds. In the case of satellite observation, information on the spatial arrangement of cloud water is usually recorded as a two-dimensional (2D) image. The estimation method based on a 1D radiative transfer model implicitly assumes the independent pixel approximation (IPA) [8], which requires each pixel to be independent of its spatial context. However, in the presence of broken clouds, the effects of the horizontal transport of radiation, in other words, the three-dimensional (3D) radiative effect, are very large. Many studies have revealed that 3D radiative effects cause large errors in the estimation of cloud physical quantities if the IPA is assumed [9–13]. The observed radiance is influenced not only by the characteristics of the cloud elements along the line of sight, but also by the 3D arrangement of the cloud water around the line of sight. Thus, to accurately estimate the characteristics of the clouds, a cloud estimation method should take the 3D arrangement of the clouds into account. The multi-pixel method is expected to be effective in estimating cloud properties using multiple pixels, including the neighboring pixels of the target pixel [14]. Therefore, the use of a neural network (NN) that is trained based on 3D radiative transfer calculation is promising in practical applications. Such NN cloud retrieval methods started with the feasibility studies of the multi-spectral multi-pixel method by Faure et al. [15,16], and have been extended to practical applications to analyze Moderate Resolution Imaging Spectroradiometer (MODIS) observational data by Cornet et al. [17,18]. In the method of Cornet et al. [17,18], the input data were radiances from a small image area of about 1 × 1 km². They estimated bulk properties such as the average and degree of heterogeneity of COT and cloud droplet effective radius (CDER), and the cloud fraction. Evans et al. [19] tested an NN to estimate COT, using spatial features of multi-directional observation images at visible wavelengths. In recent years, deep learning techniques to train deep NNs have emerged, which enable applications involving large, complicated problems of estimation, detection, classification, and prediction. As a result of its very high modeling ability, deep learning has rapidly grown in popularity for various remote sensing applications. Okamura et al. [20] trained a deep convolutional neural network (CNN) based on 3D radiative transfer, and estimated COT and CDER simultaneously at multiple pixels of a multi-spectral image used as input. They showed that the CNN achieved a high accuracy and efficient estimation. On the other hand, the 3D radiative effects on the transmitted radiance observed on the ground are not well understood, and their influence on the retrieval of the cloud physical properties from ground-based observation is not clear. For transmitted radiance, radiation enhancement frequently occurs [21]. Several algorithms have been proposed to estimate the 3D cloud field by iterative 3D radiative transfer calculations, using the variational and ensemble Kalman filter methods [22–27]. However, their computational cost is high, as they perform multiple computationally heavy calculations of 3D radiative transfer, and gradients are needed in the variational methods.

The purpose of this study is to develop a practical, efficient (fast and accurate) method that can be applied for the estimation of the spatial distribution of COT from images actually taken by a ground-based digital camera. Following the previous work by Okamura et al. [20], we have trained deep CNNs based on a 3D radiative transfer model. Although the prior work [20] showed the feasibility of deep learning-based cloud retrieval, the conditions of cloud, atmosphere, and surface were idealized and limited. In this study, synthetic data are generated with a greater variety of cloud, atmosphere, and surface conditions. In addition, we have explored several kinds of new deep NN architectures and techniques recently available in the literature, in order to achieve accuracy and computational efficiency.

This paper is organized as follows. Section 2 describes the cloud data and radiative transfer model calculations. Section 3 presents an investigation of the 3D radiative effects on transmitted zenith radiance, and tests how NN and CNN help to overcome the COT ambiguity and 3D radiative effects. Section 4 presents a deep CNN model that estimates COT from sky-view camera images, with accuracy evaluation by synthetic data. In Section 5, the CNN is applied to actual observations from a digital camera, and the CNN retrieval is compared with COT estimated using a pyranometer. A summary and conclusions are given in Section 6.

2. Data and Methods

A large amount of synthetic data were generated by simulations using a high-resolution cloud model and a 3D radiative transfer model, and the synthetic data were used for training and testing the NN and CNN models.

2.1. SCALE-LES Cloud-Field Data

In this study, we used high-resolution cloud data reproduced by the Scalable Computing for Advanced Library and Environment-Large Eddy Simulation (SCALE-LES) model [28–30]. Three kinds of simulation datasets were used, namely, an open-cell case with a low concentration of cloud condensation nuclei (clean), a closed-cell case with a high concentration (polluted), and a Barbados Oceanographic and Meteorological Experiment (BOMEX) case reproducing convective clouds in the tropics. Each case has 60 time-steps with 10-minute intervals. Spatially-coarsened grid data were used in this study. The open- and closed-cell cases cover a horizontal area of 28 × 28 km² with a horizontal resolution (∆x and ∆y) of 0.14 km. The vertical resolution is higher for the lower atmosphere (∆z = 0.04 km at the lower end) and decreases with increasing height, and ∆z = 0.2 km at the top of the model domain. The BOMEX case covers a horizontal area of 7.2 × 6.4 km², with a horizontal resolution (∆x and ∆y) of 0.05 km and a vertical resolution of ∆z = 0.04 km. In all cases, the lateral boundary conditions are periodic. SCALE-LES uses a double-moment bulk microphysics scheme to obtain the liquid water content (LWC) and cloud particle number density (N). The two quantities are converted to optical properties for radiative transfer calculations. COT at 550 nm is defined as follows:

\tau_c = \int_0^{\infty} \beta_{ext}(z)\, dz \qquad (1)

where z is the height and β_ext is the volume extinction coefficient of the cloud particles, as follows:

\beta_{ext} = \frac{3w}{4\pi \rho_w r_e} Q_{ext} \qquad (2)

where Q_ext is the extinction efficiency factor, w is the LWC, ρ_w is the droplet internal density, and r_e is the CDER. In this study, we assume only water clouds. r_e is determined by the following:

r_e = \left( \frac{3w}{4\pi \rho_w N \chi} \right)^{1/3} \qquad (3)

where χ is a parameter determined by the particle size distribution. As the particle size distribution is assumed to be a lognormal distribution in this study, χ can be obtained from the geometrical standard deviation, s, by the following equation:

\chi = \exp\left( -3 \ln^2 s \right) \qquad (4)
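For concreteness, the following short NumPy sketch applies Equations (1)–(4) to a discretized cloud layer; the layer profile, the extinction efficiency Q_ext, and the geometric standard deviation s are placeholder assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical inputs (not from the paper): a 10-layer cloud profile.
dz = 40.0                        # layer thickness [m]
w = np.full(10, 0.3e-3)          # liquid water content [kg m^-3]
N = np.full(10, 100e6)           # droplet number density [m^-3]

rho_w = 1000.0                   # droplet internal density [kg m^-3]
Q_ext = 2.0                      # extinction efficiency factor (assumed, ~2 in the visible)
s = 1.35                         # geometric standard deviation of the lognormal size distribution (assumed)

chi = np.exp(-3.0 * np.log(s) ** 2)                                  # Eq. (4)
r_e = (3.0 * w / (4.0 * np.pi * rho_w * N * chi)) ** (1.0 / 3.0)     # Eq. (3)
beta_ext = 3.0 * w * Q_ext / (4.0 * np.pi * rho_w * r_e)             # Eq. (2), as printed
tau_c = np.sum(beta_ext * dz)                                        # Eq. (1), discretized

print(f"r_e ~ {r_e.mean() * 1e6:.1f} um, COT = {tau_c:.2f}")
```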

Figure 1 shows examples of snapshots of cloud fields. In the closed-cell case, optically-thick clouds cover almost the whole sky, and are relatively homogeneous in the horizontal directions. In the open-cell case, stratocumulus clouds with small gaps are distributed. In the BOMEX case, cumulus clouds are sparsely distributed. Temporal variations of cloud physical properties, including a horizontal heterogeneity index defined in Equation (A1), in the three cases are presented in Figure A1.


Figure 1. Examples of Scalable Computing for Advanced Library and Environment-Large Eddy Simulation (SCALE-LES) cumulus cloud fields for the following three cases: (a,b) closed-cell, (c,d) open-cell, and (e,f) Barbados Oceanographic and Meteorological Experiment (BOMEX). (a,c,e) Horizontal distribution of the cloud optical thickness (COT). (b,d,f) Vertical cross sections of the volume extinction coefficient along the black dashed lines superimposed on (a,c,e), respectively. White lines in (b) indicate lines of sight from a ground-based camera.



2.2. Simulations of Spectral Radiance

The Monte Carlo Atmospheric Radiative Transfer Simulator (MCARaTS) [31,32], a 3D atmospheric radiative transfer model, is used to generate synthetic images of a ground-based camera. This model uses the Monte Carlo method and ray-tracing to track the trajectories of model photons, and calculates the spectral radiances for arbitrary locations and directions. The model inputs are the 3D distributions of the atmospheric optical characteristics at every wavelength and the ground surface albedo. The model atmosphere up to 50 km is divided into approximately 50 layers, and the 3D distribution of the clouds obtained from SCALE-LES is incorporated into the troposphere. The cloud fields are modified from the original SCALE-LES simulations in various ways, so as to increase the variety of cases. For example, cloud optical properties can be modified by scaling and adding bias. Rayleigh scattering by air molecules, absorption by ozone, and scattering by cloud particles are considered. The single scattering albedo and the scattering phase function, depending on the wavelength and CDER, are calculated using the Lorenz–Mie theory. The visible-wavelength range of 385–715 nm is divided into 11 wavelength bands with a width of 30 nm, and the average spectral radiance of each wavelength band is calculated. Furthermore, in this study, the spectral response functions of a digital camera (described later in Section 5) are used to convert the 11-band spectral radiances to those of red/green/blue (RGB) channels. More details of the synthetic image generation are presented in Sections 3 and 4.
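As a sketch of the band-to-RGB conversion described above, the following NumPy snippet weights the 11 band-mean radiances (385–715 nm, 30-nm bands) by camera spectral response functions; the band radiances and response values here are hypothetical placeholders, not the calibrated responses of the camera used in Section 5.

```python
import numpy as np

# Hypothetical band-mean radiances [W m^-2 sr^-1 um^-1] for one pixel,
# one value per 30-nm band between 385 and 715 nm (11 bands).
L_band = np.array([310, 330, 350, 360, 355, 345, 330, 315, 300, 285, 270], float)

# Hypothetical relative spectral responses of the R, G, and B channels,
# sampled at the 11 band centers (rows: red, green, blue).
response = np.array([
    [0.00, 0.00, 0.01, 0.02, 0.05, 0.15, 0.55, 0.90, 0.80, 0.40, 0.10],  # red
    [0.02, 0.05, 0.20, 0.55, 0.90, 0.85, 0.50, 0.20, 0.05, 0.01, 0.00],  # green
    [0.40, 0.85, 0.90, 0.55, 0.20, 0.05, 0.01, 0.00, 0.00, 0.00, 0.00],  # blue
])

# Response-weighted average of the band radiances gives each channel radiance.
L_rgb = (response * L_band).sum(axis=1) / response.sum(axis=1)
print(dict(zip("RGB", np.round(L_rgb, 1))))
```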

3. Test of COT Retrieval from Zenith Radiance

In this section, in order to understand the elements needed for high-accuracy COT estimation from transmitted visible radiation, we tested various algorithms for limited cases. Previous studies suggest that retrieval algorithms should be based on a 3D radiative transfer model, and use multiple pixels including surrounding pixels [14,15,20]. Contributions to accuracy improvement are evaluated by comparing several estimation algorithms, including the RRBR method [7].

3.1. 3D Radiative Effects in Zenith Transmittance

First, we investigated the relationship between COT and transmitted radiance in order to understand how the 3D radiative effect works. The horizontal distributions of the visible transmitted radiance in the zenith direction are calculated by 3D radiative transfer and the IPA, respectively, for the BOMEX case, as shown in Figure 2. Compared to the IPA radiance, the sun-side of a cloud is bright and the opposite side is dark in the 3D radiance, which are the effects of radiation enhancement and shading, respectively. These cannot be expressed in the IPA calculation, because the radiative transfer calculation is performed for each air column independently. Interestingly, comparing the RBR of the 3D calculation with the IPA, the 3D scene is redder in parts where radiation enhancement appears, and bluer in the shadowed parts. Therefore, 3D effects change the spectral slope in the visible region, and make the scene with broken clouds more colorful.


Figure 2. Examples of the spatial distributions of the zenith-viewing transmitted radiances at the ground, as follows: (a) true-color image composed of mean spectral radiances in the red, green, and blue (RGB) channels in the standard RGB (sRGB) color space defined by the International Electrotechnical Commission, computed by three-dimensional (3D) radiative transfer; (b) the same as (a), but for radiances computed using the independent pixel approximation (IPA); (c) differences in the mean spectral radiance in the red channel between the 3D and IPA calculations; (d,e) red-to-blue ratio (RBR) of 3D and IPA, respectively; (f) COT. The sun is located in the left of the images, with a solar zenith angle of 60°. The regions framed by red squares correspond to the examples of image patches analyzed in detail in this study, and the regions framed by green squares correspond to pixels of the input and output for the convolutional neural network (CNN).

Figure 3 shows the relationship between the red-channel radiance and the RBR, and the column COT for each pixel for the example shown in Figure 2. The IPA radiance is determined from the column COT, and has a peak at a COT of about 5. The radiance increases with the increasing COT when COT <5, and decreases with the increasing COT when COT >5. Therefore, COT has two solutions for the same IPA radiance value (COT ambiguity), and COT cannot be estimated using a single wavelength. As shown by Mejia et al. [7], it is possible to distinguish whether a cloud is optically thin or thick by using RBR. However, this solution is effective only for homogeneous clouds assuming the IPA, and is not effective for broken clouds, because the 3D radiative effect perturbs the spectral signature of thin vs. thick clouds beyond its usefulness. Even though the spectral information cannot be used to resolve the COT ambiguity in the case of strong 3D effects, the correct COT can still be retrieved based on structural context. As 3D radiance depends on the spatial arrangement of the nearby cloud elements, this should be taken into consideration in the retrieval of COT.
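To make the COT ambiguity concrete, the following NumPy sketch returns the two COT candidates consistent with a single measured zenith radiance under the IPA; the lookup-table curve below is a toy stand-in peaking near COT = 5 (as in Figure 3), not output from an actual radiative transfer calculation.

```python
import numpy as np

# Toy IPA lookup table (not from the paper): zenith transmitted red-channel
# radiance as a function of COT, with a maximum of 400 near COT ~ 5.
cot_grid = np.logspace(-1, 2, 200)                                    # COT from 0.1 to 100
lut_radiance = 400.0 * (cot_grid / 5.0) * np.exp(1.0 - cot_grid / 5.0)

def ipa_cot_candidates(measured_radiance):
    """Return the two COT values whose IPA radiance matches the measurement:
    one on the thin branch (COT < 5), one on the thick branch (COT > 5)."""
    peak = np.argmax(lut_radiance)
    thin = np.interp(measured_radiance, lut_radiance[:peak + 1], cot_grid[:peak + 1])
    # The thick branch decreases with COT, so flip it before interpolating.
    thick = np.interp(measured_radiance, lut_radiance[peak:][::-1], cot_grid[peak:][::-1])
    return thin, thick

print(ipa_cot_candidates(300.0))   # two candidate COTs -> the COT ambiguity
```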

(a) (b)

Figure 3. (a) Spectral radiances of red and blue channels and (b) RBR, computed using the IPA and Figure3D radiative 3. (a) Spectral transfer radiances as functions of ofred COT, and forblue the channels case in Figureand (b2) .RBR, For the computed 3D results, using solid the lines IPA andand 3Dshaded radiative areas transfer denote theas functions mean and of standard COT, for deviation, the case in respectively, Figure 2. For of the the 3D values results, computed solid lines for eachand shadedCOT bin, areas where denote 30 bins the aremean spaced and standard equally on deviation a logarithmic, respectively, scale. of the values computed for each COT bin, where 30 bins are spaced equally on a logarithmic scale. 3.2. Data Generation 3.2. DataFor G theeneration tests of various NN/CNNs for COT estimation, the synthetic data of the transmitted zenithFor radiance the tests are of generated various NN/CNNs by radiative for transfer COT estimation simulations., the Only synthetic the BOMEX data of case the wastransmitted used for zeniththis purpose. radiance For are simplicity, generated the by solarradiative zenith transfer and azimuth simulations. angles Only (SZA the and BOMEX SAA, respectively)case was used were for thisfixed purpose at 60◦ and. For 0simplicity,◦, respectively; the solar the aerosolzenith and was azimuth assumed angles to be (SZA a rural and model SAA, [ 33respectively)] with an optical were fixedthickness at 60° of and 0.2; 0°, and respectively the cloud base; the heightaerosol was was set assumed at 500 m. to Inbe ordera rural to model increase [33 the] with variety an optical of the thicknesscloud field, of the0.2 original; and the values cloud ofbase COT height were was modified set at by 500 a randomlym. In order chosen to increase factor betweenthe variety 0.2 of and the 5, cloudand the field, cloud thegeometrical original values thickness of COT was were scaled modified by a factorby a randomly of 0.5, 1.0, chosen or 2.0. factor IPA andbetween 3D radiative 0.2 and 5,transfer and the calculations cloud geometrical were performed, thickness andwas thescaled horizontal by a factor distribution of 0.5, 1.0, of or the 2.0. band-mean IPA and 3D radiances radiative at transferRGB channels calculations were calculatedwere performed, for the and transmitted the horizontal zenith distribution radiance at of ground the band level.-mean The radiances horizontal at resolution is ∆x = ∆y = 0.05 km, and a radiance distribution can be obtained for 128 128 pixels per RGB channels were calculated for the transmitted zenith radiance at ground level. ×The horizontal scene. Nine image patches with a size of 52 52 pixels were extracted from each scene. The green resolution is ∆x = ∆y = 0.05 km, and a radiance× distribution can be obtained for 128  128 pixels per scene.square Nine in Figure image2 corresponds patches with to a one size of of these 52  nine 52 pixels image were patches, extracted as an example.from each A scene. total ofThe 972 green and square108 image in Fig patchesure 2 correspond were generateds to one for theof these training nine and image performance patches, as tests, an example respectively.. A total of 972 and 108 image patches were generated for the training and performance tests, respectively. 3.3. Design and Configuration of NN and CNN Models 3.3. DesignThe NNs and andConfiguration CNN listed of inNN Table and1 CNN were M tested.odels All of the models were trained with the radiance of 3D and IPA calculations, respectively, as input data. 
Only 3D radiance data were used for error The NNs and CNN listed in Table 1 were tested. All of the models were trained with the radiance evaluation. RGB radiances are normalized by dividing by 800 W m 2 sr 1 µm 1 in the preprocessing, of 3D and IPA calculations, respectively, as input data. Only 3D radiance− − data− were used for error which makes it easy to train the neural network model. The model output is the non-linear transform evaluation. RGB radiances are normalized by dividing by 800 W m–2 sr–1 µm–1 in the preprocessing, of COT, as defined by the following: which makes it easy to train the neural network model. The model output is the non-linear transform of COT, as defined by the following:     τ0 = H log τc + 1 log τc + 1 /3 (5) c 10 × 10 휏′푐 = 퐻(log10 휏푐 + 1) × (log10 휏푐 + 1)⁄3 (5) where H is the Heaviside step function. The datasets used for training and evaluation were the same wherefor all ofH theis the models, Heaviside although step thefunction. numbers The of datasets pixels in used the input for training and output and evaluation data were di werefferent the among same forthe all models. of the NN-0 models, is an although NN that usesthe numbers the single-pixel of pixels radiance in the to input estimate and single-pixel output data COT, were while different NN-1 amongto NN-10 the are models. the NNs NN that-0 is usean NN multi-pixel that uses radiances the single to-pixel estimate radiance the COT to estimate at a central single pixel.-pixel NN- COT,M while(M = 1–10)NN-1 takesto NN (2-M10 +are1) 2thepixels NNs as that an input,use multi assuming-pixel radiances that the sizeto estimate of one sidethe ofCOT the at peripheral a central 2 pixel.area is NNM pixels.-M (M = Each 1–10) NN takes is a (2 multi-layerM + 1) pixels perceptron as an input, model assuming with a single that middlethe size layer.of one The side number of the peripheralof nodes in area each is NNM pixels. model Each was NN determined is a multi so-layer that theperceptron degree ofmodel freedom with of a thesingle model middle increased layer. The number of nodes in each NN model was determined so that the degree of freedom of the model increased according to the complexity of the problem. In the training stage, the output pixels of the

Remote Sens. 2019, 11, 1962 8 of 28 according to the complexity of the problem. In the training stage, the output pixels of the NN model were sampled one by one from 32 32 pixels centered on the image patch with a size of 52 52 pixels, × × and the corresponding input pixels included the radiance data of the surrounding pixels when M >0.
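A minimal NumPy implementation of the output transform in Equation (5) and its inverse (a sketch; only the equation itself is from the paper):

```python
import numpy as np

def cot_transform(tau):
    """Eq. (5): tau' = H(log10(tau) + 1) * (log10(tau) + 1) / 3.
    Maps COT in [0.1, 100] roughly onto [0, 1]; COT <= 0.1 maps to 0."""
    x = np.log10(np.maximum(tau, 1e-12)) + 1.0
    return np.where(x > 0.0, x, 0.0) / 3.0

def cot_inverse(tau_prime):
    """Invert Eq. (5); outputs of 0 correspond to COT <= 0.1 (clear or near-clear)."""
    return 10.0 ** (3.0 * np.asarray(tau_prime) - 1.0)

tau = np.array([0.05, 0.1, 1.0, 10.0, 100.0])
print(cot_transform(tau))                      # [0, 0, 1/3, 2/3, 1]
print(cot_inverse(cot_transform(tau)[2:]))     # round-trips for COT >= 0.1
```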

Table 1. Tested neural network (NN) architectures for the estimation of cloud optical thickness (COT) from zenith-viewing transmitted radiance. Single-pixel (NN-0) and multi-pixel (NN-M for M = 1–10) architectures derive a single-pixel COT of the central pixel from single-pixel and multiple pixels’ radiance data. CNN denotes convolutional NN, which derives the multi-pixel COT from multiple pixels’ radiance data. The input image corresponds to the surrounding pixels used to derive COT at either the single central pixel in NN-M, or the whole image at once in CNN.

Model    Architecture (Nodes)    Input Image Size (Pixels)    Output Image Size (Pixels)    Number of Model Parameters (Millions)    Number of Training Epochs    Training Time (h)
NN-0     NN (64)                 1                            1                             0.000321                                 500                          3.0
NN-1     NN (288)                3 × 3                        1                             0.00835                                  500                          3.0
NN-2     NN (800)                5 × 5                        1                             0.0616                                   500                          3.0
NN-3     NN (1568)               7 × 7                        1                             0.234                                    500                          3.0
NN-4     NN (2592)               9 × 9                        1                             0.635                                    500                          3.0
NN-5     NN (3872)               11 × 11                      1                             1.41                                     500                          3.1
NN-6     NN (5408)               13 × 13                      1                             2.75                                     500                          3.1
NN-7     NN (7200)               15 × 15                      1                             4.87                                     500                          3.1
NN-8     NN (9248)               17 × 17                      1                             8.04                                     500                          3.1
NN-9     NN (11552)              19 × 19                      1                             12.5                                     500                          3.7
NN-10    NN (14112)              21 × 21                      1                             18.7                                     500                          5.1
CNN      PSPNet                  52 × 52                      52 × 52                       10.8                                     2000                         0.35
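The parameter counts in Table 1 are consistent with a single hidden layer of 32 nodes per input pixel, i.e., 32 × (2M + 1)² hidden nodes fed by 3 × (2M + 1)² input radiances; this factor of 32 is inferred from the table rather than stated in the text. A quick check:

```python
def nn_m_params(M):
    """Parameter count of the single-hidden-layer NN-M models in Table 1,
    assuming 3 input channels per pixel and 32 hidden nodes per input pixel."""
    n_pix = (2 * M + 1) ** 2
    n_in, n_hidden, n_out = 3 * n_pix, 32 * n_pix, 1
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out  # weights + biases

for M in (0, 1, 5, 10):
    print(f"NN-{M}: {nn_m_params(M):,} parameters")
# NN-0: 321, NN-1: 8,353, NN-5: 1,413,281 (~1.41 M), NN-10: 18,698,401 (~18.7 M)
```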

Among the models listed in Table 1, only the CNN model estimates COT at multiple pixels in a single operation. The input and output of the CNN are the radiances and COT, respectively, for the 52 × 52-pixel area, and the error evaluation is performed for the central 32 × 32 pixels, consistently with models NN-1 to NN-10. Therefore, in the CNN, the COT at the evaluated pixel is estimated using the radiances of 10 or more surrounding pixels. The structure of the CNN is shown in Figure 4. The basic structure was designed based on a pyramid scene parsing network (PSPNet) [34]. In the forward calculation of the CNN, the data are continuously processed through several layers, in which multi-channel convolution and non-linear transformation are performed. In the convolution layer, the output (u_ijm) of the mth output channel and pixel indices i and j is represented as follows:

u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{L-1} \sum_{q=0}^{L-1} z_{i+p,\, j+q,\, k}\, h_{pqkm} + b_m \qquad (6)

where z_{ijk} is the input signal of the kth input channel, h is the weight of an L × L convolution filter, and b is the bias. The convolution operation produces an output image (feature map) that responds to patterns similar to the filter patterns, combining spectral and spatial information contained in the input multi-channel image (feature map). The filter weight and bias coefficients are optimized in the training. The dashed lines in Figure 4 represent the so-called shortcuts; the detailed structure of the units with the shortcut is shown in the upper right of Figure 4. This unit with the shortcut is called a residual unit [35], which adds the input of the unit to the data obtained by the process involving multiple convolution operations. In general, the use of residual units facilitates the training of recent deeper networks. The processing order in a residual unit is based on the work of He et al. [36], and Zagoruyko and Komodakis [37]. In order to improve the regularization and generalization performance, the CNN uses batch normalization and dropout (with a fraction of 10%) [38,39]. The rectified linear function (ReLU) is also used in the residual unit. Batch normalization and ReLU are applied not only in residual units, but also after all convolution operations. PSPNet has multiple pooling layers with different scales, in which the feature maps are subsampled to coarse grids by averaging. In this study, four kinds of average pooling, with feature map sizes of 1 × 1, 2 × 2, 4 × 4, and 8 × 8, were used. Thus, it is expected that cloud spatial structures and 3D radiative effects at various horizontal scales can be learned. The number of output channels from each pooling layer is reduced to a quarter, and the feature map of each channel is enlarged (upsampled) by bilinear interpolation. Subsequently, the data from all of the pooling layers are concatenated along the channel dimension. Finally, multi-scale features are fused to obtain an image with the same size as the input image by the transposed convolution. The details of these deep NN techniques are summarized by Goodfellow et al. [39].
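A minimal PyTorch sketch of the two building blocks described above (the paper's implementation used the Chainer framework [41]): a pre-activation residual unit with batch normalization, ReLU, 3 × 3 convolutions, and 10% dropout, and a pyramid pooling module with 1 × 1, 2 × 2, 4 × 4, and 8 × 8 average pooling. Layer widths are illustrative, not the exact configuration of Figure 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUnit(nn.Module):
    """Pre-activation residual unit: (BN -> ReLU -> dropout -> 3x3 conv) x2 + shortcut."""
    def __init__(self, channels, dropout=0.1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.drop = nn.Dropout2d(dropout)

    def forward(self, x):
        h = self.conv1(self.drop(F.relu(self.bn1(x))))
        h = self.conv2(self.drop(F.relu(self.bn2(h))))
        return x + h          # shortcut: add the unit input to the processed data

class PyramidPooling(nn.Module):
    """Average-pool the feature map to 1x1, 2x2, 4x4, 8x8 grids, reduce channels to
    a quarter, upsample bilinearly, and concatenate with the input features."""
    def __init__(self, channels, grid_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.grid_sizes = grid_sizes
        self.reduce = nn.ModuleList(
            nn.Conv2d(channels, channels // 4, 1) for _ in grid_sizes)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for g, conv in zip(self.grid_sizes, self.reduce):
            p = F.adaptive_avg_pool2d(x, g)
            p = F.interpolate(conv(p), size=(h, w), mode="bilinear", align_corners=False)
            outs.append(p)
        return torch.cat(outs, dim=1)   # fuse multi-scale features along channels

x = torch.randn(1, 64, 52, 52)          # one 52x52 feature map with 64 channels
y = PyramidPooling(64)(ResidualUnit(64)(x))
print(y.shape)                          # torch.Size([1, 128, 52, 52])
```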

Figure 4. Structure of a simplified version of a convolutional neural network (CNN) used for a test of COT estimation based on the zenith-viewing transmitted radiances. The input and output layers (yellow) are presented with the image size (in pixels) and the number of channels/variables. The input takes a multispectral image with three spectral channels, and the output is a two-dimensional distribution of COT. The neural network (NN) layers denoted in light blue and purple represent the convolutional and deconvolutional layers, respectively, shown with a filter size and the number of output channels. The “×1/2” represents the convolution to halve the size of the feature map. The NN layers denoted in pink represent the average pooling to obtain a feature map with a shown number of pixels. In the “upsample” layer, bilinear interpolation is performed to expand the feature map, in order to obtain the shown number of pixels. The “concat” layer concatenates all of the channels in the layer input. The layer blocks denoted by black dashed lines with marks (# and *) are the residual blocks with detailed structures, shown separately in the upper right. These consist of batch normalization (“batch norm”); a rectified linear (ReLU) activation function; and convolution with a 3 × 3 filter, dropout, and summation (“sum”) operators. More details are explained in the main text.

Adaptive moment estimation (Adam) [40] was used for training, that is, the optimization of the model parameters. The loss function to be minimized during training is the mean squared error of the COT transform in Equation (5). During the training, data samples are randomly rearranged and divided into mini-batches. For each mini-batch, a gradient calculation is performed and the model parameters are updated, and this process is iterated for all of the mini-batches. One epoch corresponds to the process for all of the data samples, and the optimization is continued for multiple epochs while shuffling data samples for each epoch [39]. The training of the NN models took between 3 and 5 hours for 500 epochs, while the training of the CNN took only 0.35 hours for 2000 epochs. As CNNs can obtain a result for multiple pixels simultaneously with a single operation, the calculation time required to process the same amount of data is significantly shorter than in the NN models. The NN model was implemented using the Chainer deep learning framework [41].
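A schematic PyTorch-style training loop matching the procedure just described (Adam optimizer, mini-batches reshuffled every epoch, mean squared error on the transformed COT); the stand-in model, data shapes, and hyperparameters are placeholders, not the paper's actual Chainer implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder synthetic data: 972 patches of 3-channel 52x52 radiance and the
# transformed COT target for the central 32x32 pixels (shapes follow Section 3).
radiance = torch.rand(972, 3, 52, 52)
target = torch.rand(972, 1, 32, 32)
loader = DataLoader(TensorDataset(radiance, target), batch_size=32, shuffle=True)

model = torch.nn.Sequential(             # stand-in for the CNN of Figure 4
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 21))          # 52 -> 32 pixels via a valid 21x21 convolution
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()             # MSE of the transformed COT, Eq. (5)

for epoch in range(5):                   # the paper trains for far more epochs
    for x, y in loader:                  # mini-batches, reshuffled every epoch
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                  # gradient for this mini-batch
        optimizer.step()                 # parameter update
    print(epoch, float(loss))
```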

3.4. Comparison of Various NNs

Figure 5 shows an example of the estimation results using the RRBR method, NN-0-IPA, NN-0-3D, NN-10-3D, and CNN-3D. The RRBR method, NN-0-IPA, and NN-0-3D use the data of a single pixel as input. NN-0-IPA is considered to be a traditional IPA-based single-pixel estimation method implemented by NN, and NN-0-3D is a single-pixel estimation method trained by 3D radiances. As shown in Figure 3, NN-0-IPA and NN-0-3D are trained using very different radiances. The image patches shown in Figure 5 correspond to the red squares in Figure 2, and the illumination and shadowing effects work strongly on the sun side and the opposite side, respectively, of a large cumulus cloud. The estimation error is very large in the single-pixel retrieval methods (RRBR method, NN-0-IPA, and NN-0-3D). As the estimation error of NN-0-3D is as large as that of NN-0-IPA, COT cannot be accurately estimated from only the information at a single pixel, even if the NN is trained using the radiance data including 3D radiative effects. On the other hand, the estimation accuracy of NN-10-3D and CNN-3D, in which the horizontal distribution of the radiance of multiple pixels is used as the input, is dramatically improved. This can be ascribed to the fact that 3D radiative effects are well captured by considering the arrangement of the surrounding cloud elements.

Figure 5. Examples of COT retrieval by various methods, namely: (a) ground-truth COT, (b–f) estimated COT, and (g–k) relative errors in the estimated COT. (b,g) The radiance red-to-blue ratio (RRBR) method of Mejia et al. (2016), (c,h) NN-0-IPA, (d,i) NN-0-3D, (e,j) NN-10-3D, and (f,k) CNN-3D. The details of each NN/CNN-based estimation model are summarized in Table 1. This sample has a size of 32 × 32 pixels, and corresponds to the region marked by a red square in Figure 2. The direct solar beam is incident from the left, with a solar zenith angle of 60°.

The relative error of the COT estimated by each model is shown in Figure 6. The root-mean-square percentage error (RMSPE) is calculated by the following:

\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \epsilon_j^2} \qquad (7)

where \epsilon_j is the relative error (in %) of the jth pixel sample, as follows:

\epsilon_j = 100 \times \frac{\tau_{sc,est} - \tau_{sc,tru}}{\tau_{sc,tru}} \qquad (8)

where τ_{sc,est} and τ_{sc,tru} represent the estimated and ground-truth COTs, respectively. The single-pixel methods (RRBR and NN-0-IPA) based on the IPA tend to overestimate the COT for a COT of around 3, and underestimate COT for a COT of 10 or more. As clearly illustrated in Figure 3, the 3D radiance at a COT of ~5 or more is significantly larger than the IPA radiance by the illumination effect, and is close to the IPA radiance for a COT of 5. Thus, the IPA-estimated COT tends to be around 5, when the true COT is 5 or more. The IPA estimation is not improved by training, even when the surrounding pixels are included in the input. However, if an NN is trained based on the 3D radiance at multiple pixels, the estimation accuracy is improved. This is evident even in NN-1-3D (Figure 6c), which uses only eight adjacent pixels. Furthermore, it can be seen that the accuracy improves dramatically as the number of surrounding pixels increases (with increasing M).
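A direct NumPy implementation of Equations (7) and (8); the retrieval/truth pairs are made up for illustration only.

```python
import numpy as np

def rmspe(tau_est, tau_true):
    """Root-mean-square percentage error, Eqs. (7)-(8)."""
    eps = 100.0 * (tau_est - tau_true) / tau_true   # relative error of each pixel [%]
    return np.sqrt(np.mean(eps ** 2))

# Made-up retrieval/truth pairs, for illustration only.
tau_true = np.array([0.5, 2.0, 8.0, 30.0])
tau_est = np.array([0.6, 1.7, 9.0, 24.0])
print(f"RMSPE = {rmspe(tau_est, tau_true):.1f}%")
```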

Figure 6. Comparison of relative errors in the estimated COT obtained by various methods summarized in Table 1, namely: (a) RRBR method; (b) NN-0 method; (c) NN-1 method; (d) NN-3 method; (e) NN-10 method; (f) CNN. The solid lines denote the median values and the shaded areas denote ranges between the 25th and 75th percentiles. The NN models were trained using IPA and 3D radiances, respectively, and the estimation errors for the IPA- and 3D-based NN models were evaluated for the 3D radiances as the input.

Figure 7 shows the COT estimation accuracy of models NN-1 to NN-10, which verifies that the information of distant pixels is useful for COT estimation. This confirms that the estimation error decreases as the spatial scale of the input data increases. Figure 7 shows the results for only up to 10 pixels away from the center, that is, 500 m away, but it is expected that the estimation accuracy can be further improved by further expanding the spatial scale. It is known that the 3D radiative effect works mainly for a horizontal scale of up to about 5 km for clouds with a geometrical thickness of 500 m [14,42]. Figure 7 also shows the estimation error of the CNN for comparison. The CNN model has the same estimation error as NN-6 or NN-7. However, as mentioned above, the operation of the CNN model is overwhelmingly fast, because it estimates the cloud properties of many pixels simultaneously. It can be seen from Table 1 that the CNN-3D model takes significantly less time to be trained (about 7% of the training time of NN-10-3D). This numerical advantage of the CNN-3D model becomes more important as the number of input and output pixels increases.


Figure 7. Root-mean-square percentage errors (RMSPEs) as functions of the number of input-image pixels of the multi-pixel NN (NN-M) estimation model for (a) COT = 0.2–1, (b) COT = 1–10, and (c) COT = 10–100. For the number M in the horizontal axis, the number of input-image pixels is (2M + 1)², including the target pixel and surrounding pixels. For comparison, the RMSPE for the CNN-3D model is plotted with identical values, independent of the number of surrounding pixels.

4. COT Retrieval from Synthetic Sky-View Camera Images

Based on the results in the previous section, a new CNN model was developed for COT retrieval from a sky-view camera image. The training was based on a large number of synthetic camera images that were simulated by 3D radiative transfer calculations.

4.1. Data Generation

In order to develop a practical estimation model, it is necessary to include various cloud fields, and various aerosol and underlying surface conditions in the training data. Therefore, the clouds, aerosols, and surface albedo were varied regularly or randomly in the simulations. By changing the LWC and cloud particle number density, the cloud extinction coefficient and CDER were modified from the original states. The cloud extinction coefficient was randomly modified by a factor within the range of 0.2–5. The cloud base height was set as 500, 1500, 2500, 3500, or 4500 m. The geometrical characteristics of the 3D clouds were modified by stretching or shrinking in the horizontal and vertical directions. The cloud geometrical thickness was changed by a factor of 0.5, 1, or 2, and the entire domain was stretched horizontally by a random factor within the range of 0.5–2 in the x and y directions of the Cartesian coordinate system. The rural type of aerosol was assumed as in the previous section, and the aerosol optical thickness was chosen randomly within the range of 0.04–1. The surface albedo was randomly chosen within the range of 0.02–0.5. The solar position was also random, with a solar zenith angle of 0–70° and a solar azimuth angle of 0–360°.

The synthetic camera image is generated by an image projection of the angular distribution of the spectral radiance within the field of view of the camera. The technical details of the camera used in this study are presented in Section 5.1. MCARaTS allows for the position, attitude, and viewing angle of the camera to be set arbitrarily. The horizontal location of the camera was random within the computational domain, and the camera was assumed to be oriented to the zenith. Each camera image was projected in an equidistant projection with a size of 128 × 128 pixels, with a diagonal angle of view of about 127° and a field of view within 45° of the image center in the vertical and horizontal axes of the image. The spectral radiance was calculated for the pixels with a viewing zenith angle (VZA) of less than 45°. An example of the multi-directional lines of sight seen from a camera is schematically shown in Figure 1b. Because of the presence of Monte Carlo noise in the simulations, each of the RGB channels contains an average calculation noise of about 2%–4% at the pixel level.

An example of the input and output data of the CNN is shown in Figure 8. The input includes RGB radiances normalized by dividing by 800 W m−2 sr−1 µm−1 in the preprocessing, as in Section 3. As the observed radiance is strongly affected by the solar position, an additional channel in the CNN input represents the solar position. The solar position channel was represented by an isotropic 2D Gaussian distribution with a peak in the solar direction and a standard deviation of 50 pixels. A cloud property that is to be estimated from the camera images is the slant-column COT (SCOT) along the line of sight, corresponding to each pixel as follows:

\tau_{sc} = \int_0^{\infty} \beta_{ext}(s)\, ds \qquad (9)

where s is the distance from the camera. SCOT is calculated by ray-tracing using MCARaTS. The output of the CNN is actually the non-linear transform of SCOT, as follows:

\tau_{sc}' = H\left(\log_{10}\tau_{sc} + 1\right) \times \left(\log_{10}\tau_{sc} + 1\right)/3 \qquad (10)

SCOT can be inverted from Equation (10), in which pixels with a SCOT <0.1 are reset as clear sky (i.e., SCOT = 0). The SCOT corresponding to a direction with the zenith angle θ can be converted into an equivalent COT (i.e., vertical integral of extinction) as in Equation (1), using the following equation:

\tau_{c,eq}(\theta) = \tau_{sc}(\theta) \cos\theta \qquad (11)

A total of 115,200 samples were generated. These samples were divided into 103,680 samples for training and 11,520 samples for accuracy evaluation.
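The following NumPy sketch assembles the solar-position input channel and applies the SCOT post-processing of Equations (10) and (11) (inverse transform, clear-sky reset below SCOT = 0.1, and conversion to equivalent COT); the image size, sun position, and output values are hypothetical.

```python
import numpy as np

def solar_position_channel(size, sun_row, sun_col, sigma=50.0):
    """Extra CNN input channel: isotropic 2D Gaussian peaking at the solar
    direction (in pixel coordinates), with a standard deviation of 50 pixels."""
    rows, cols = np.mgrid[0:size, 0:size]
    r2 = (rows - sun_row) ** 2 + (cols - sun_col) ** 2
    return np.exp(-0.5 * r2 / sigma ** 2)

def scot_from_output(tau_prime, vza_deg):
    """Invert the Eq. (10) transform, reset SCOT < 0.1 to clear sky, and also
    return the equivalent (vertical) COT of Eq. (11)."""
    scot = 10.0 ** (3.0 * np.asarray(tau_prime) - 1.0)
    scot = np.where(scot < 0.1, 0.0, scot)             # clear-sky reset
    cot_eq = scot * np.cos(np.deg2rad(vza_deg))        # Eq. (11)
    return scot, cot_eq

# Hypothetical example: a 128x128 output with the sun at pixel (40, 90).
sun_channel = solar_position_channel(128, 40, 90)
scot, cot_eq = scot_from_output(np.full((128, 128), 0.5), vza_deg=30.0)
print(sun_channel.max(), scot[0, 0], round(cot_eq[0, 0], 2))
```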



Figure 8. Example of the input and output of the CNN for the estimation of slant-column COT (SCOT) from the sky-camera images. Shown are the spectral radiances from the (a) red, (b) green, and (c) blue channels, normalized by dividing by 800 W m⁻² sr⁻¹ µm⁻¹, (d) a true-color image, (e) a solar position image, and (f) the SCOT along the line of sight corresponding to each pixel. The spectral responses of a commercial digital camera with a fish-eye lens are predetermined and assumed in the simulations. The input to the CNN is a four-channel image composed of (a–c,e), and the output is a single-channel image of (f). The solar zenith and azimuth angles are 26° and 184.2°, respectively. The centers of the images are oriented to the zenith, the field of view is within 45° from the center, and an equidistant projection is assumed.

4.2. Design and Configuration of CNN Model

Figure 9 shows the structure of the CNN. The basic structure is the same as in Figure 4, but the input and output of the CNN in Figure 9 are images with a size of 128 × 128 pixels. The use of many layers generally increases the representation capability of the CNN model. The CNN in Figure 9 contains additional dilated convolution layers [43,44]. The dilated convolution can capture variation with a larger spatial scale, by widening the sampling interval of the convolution filter. The implementation of the dilated convolution into the residual unit is based on the work of Yu et al. [45]. The multiscale pooling layers facilitate capturing the local and global features, and they are fused by concatenation.
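To make the role of the dilated convolutions concrete, the sketch below shows a pre-activation residual unit whose 3 × 3 convolutions use a configurable dilation rate, so that the receptive field grows without reducing the 128 × 128 resolution. It is written in PyTorch purely for illustration (the study itself used the Chainer framework [41]), and the channel counts, kernel sizes, and layer ordering are assumptions rather than the exact configuration of Figure 9.

```python
import torch
import torch.nn as nn

class DilatedResidualUnit(nn.Module):
    """Pre-activation residual unit whose 3x3 convolutions use a dilation rate `dilation`,
    widening the sampling interval of the filter without changing the spatial size."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        pad = dilation  # keeps the spatial size for a 3x3 kernel
        self.block = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=pad, dilation=dilation),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=pad, dilation=dilation),
        )

    def forward(self, x):
        return x + self.block(x)  # identity shortcut

# A 4-channel input (RGB radiances plus the solar-position image) of 128 x 128 pixels
x = torch.randn(1, 4, 128, 128)
stem = nn.Conv2d(4, 16, kernel_size=7, padding=3)  # an assumed first convolution
y = DilatedResidualUnit(16, dilation=2)(stem(x))
print(y.shape)  # torch.Size([1, 16, 128, 128])
```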


Figure 9. Structure of the CNN proposed in the present study for the estimation of slant-column COT (SCOT) from sky camera images. The input is three spectral channels with an additional solar position image. Layers denoted in light green represent the dilated convolutional layers with the shown filter size, number of output channels, and dilation parameter. The filters and an example of the feature maps of the layers denoted by (A2), (A3), and (A4) are presented in Figures A2–A5. The other notations are the same as in Figure 4. More details are explained in the text.

In order to improve the generalization performance of the CNN, according to the work by Zhao et al. [34], so-called deeply supervised learning was used in the training process. The CNN has two output layers, outputs (1) and (2), as in Figure 9, and the loss function is defined as follows:

$$L = L_1 + 0.4 \times L_2 \qquad (12)$$

where L1 and L2 are the mean squared errors of the SCOT transform obtained from outputs (1) and (2), respectively. Adam with Weight decay (AdamW) [46] was used for the optimization of the model parameters. The training took about 50 hours on a supercomputer system of the Information Technology Center of the University of Tokyo, using four NVIDIA Tesla P100 GPUs in parallel.
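A minimal sketch of the composite loss of Equation (12) and the AdamW optimizer is given below, assuming a model that returns a main output (1) and an auxiliary output (2). It is written in PyTorch for illustration, whereas the study used Chainer [41]; the learning rate and weight-decay values are placeholders, not the settings used in the paper.

```python
import torch
import torch.nn.functional as F

def composite_loss(out1, out2, target):
    """Eq. (12): L = L1 + 0.4 * L2, where L1 and L2 are the MSEs of the transformed SCOT
    from the main output (1) and the auxiliary output (2)."""
    return F.mse_loss(out1, target) + 0.4 * F.mse_loss(out2, target)

# Assumed training step (model returns two outputs; hyperparameters are placeholders):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
# out1, out2 = model(batch_images)
# loss = composite_loss(out1, out2, batch_scot_transform)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```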

4.3. Examples of CNN Retrieval

The estimation accuracy of the trained CNN model was evaluated using the test data. Figure 10 shows examples of the ground truth and estimated SCOT. Figure 10a,b show small broken clouds, Figure 10c shows a case of relatively homogeneous thin cloud, Figure 10d shows a case of large thin broken cloud, Figure 10e–h correspond to stratocumulus clouds with small gaps, and Figure 10i shows a very optically thick cloud. In all cases, SCOT seems to be reasonably estimated, and even small clouds are accurately detected. The CNN adequately reproduces the fact that the clouds are optically thick at the central part and become optically thin near the edges. The CNN can overcome the COT ambiguity by using the spatial features learned from the training data.


Figure 10. Examples of SCOT estimated by the CNN for cases (a–i) in the test dataset. The leftmost column shows the true-color images, the two middle columns show the (left) ground-truth and (right) estimated SCOT, and the rightmost column shows the relative error (%) of the estimated SCOT.


The complex non-linear relationship between COT and radiance caused by the 3D radiative effect is also learned by the CNN. A closer analysis of the results shows that the spatial distribution of the estimated COT is smoothed on small scales. Large estimation errors exceeding 100% occur locally, suggesting that fine-scale fluctuations are difficult to reproduce using the present technique.

4.4. Evaluation of Retrieval Error

The statistical results of the SCOT retrieval error using all of the test data are shown in Figure 11. As can be seen in the joint histogram of the estimated and ground-truth SCOTs (Figure 11a), most of the data are concentrated near the diagonal line. There is a high correlation between the estimated and ground-truth SCOTs, with a correlation coefficient of 0.84. Figure 11b shows the relative error percentiles of the estimated SCOT as functions of the ground-truth SCOT. The 50th percentile is between –7% and 1% for all of the SCOT ranges, and the 25th and 75th percentiles are within ±15% for SCOT ≥ 1. For 0.2 ≤ SCOT ≤ 1, the variation in the relative error increases, but the 25th and 75th percentiles are still within ±26%. When SCOT is close to 0.2, the 5th percentile is –100%, and the 95th percentile increases rapidly. The reason for the decrease in estimation accuracy for the very small SCOT is that the radiance sensitivity to COT is low (Figure 3a) and that the AOT and surface albedo are uncertain [47]. In this study, training data were generated assuming randomly chosen AOT and surface albedo values, and the CNN model assumes that the AOT and surface albedo are completely uncertain when estimating SCOT. It is remarkable that the CNN model can stably estimate SCOT with reasonable accuracy, even with an uncertain AOT and surface albedo. On the other hand, in the RRBR method [7], the increase of the observed (3D) radiance due to the illuminating effect leads to an underestimation of COT, and a large negative bias of over –50% has been reported when COT ≥ 30, which is consistent with the results in Figure 6. However, in our results, the bias is significantly reduced, which is because of the training based on the 3D radiative transfer model. The histogram of the estimated SCOT shows a very good agreement with the ground-truth histogram (Figure 11c), which suggests that the statistical distribution of SCOT can be adequately reproduced.


Figure 11. Evaluation of SCOT estimation by the CNN for the test dataset: (a) joint histogram of the estimated and ground-truth SCOT, with an occurrence denoted by color shades; (b) relative error of the estimated SCOT as a function of the ground-truth SCOT; (c) comparison of histograms of the estimated and ground-truth SCOT.

Table 2 summarizes the RMSPE, mean absolute percentage error (MAPE), and mean bias percentage error (MBPE) for different ranges of SCOT and cloud fraction. MAPE and MBPE are defined by the following equations:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{k=1}^{n}\left|\epsilon_k\right| \qquad (13)$$

$$\mathrm{MBPE} = \frac{1}{n}\sum_{k=1}^{n}\epsilon_k \qquad (14)$$

Error evaluation was performed for pixels with VZA ≤ 43°. RMSPE is 87% when all the data are used for evaluation. RMSPE is reduced to 23% when the cloud fraction is ≥0.7 and SCOT is ≥1. Under broken clouds with a small cloud fraction, the 3D radiative transfer effects are large, which results in large estimation error. Additionally, RMSPE tends to be large when SCOT < 1.
This is mainly caused by the uncertainty in AOT, as described above, which may cause extremely large relative errors in a small number of pixels when SCOT ≈ 0.2 (Figure 11a,b). MAPE is 1.5–5 times smaller than RMSPE. The MBPE ranges from –1.7% to 4.2% when the cloud amount is 0.7 or more, and ranges from –11.7% to 21.1% when the cloud amount is smaller than 0.7.
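The summary statistics of Table 2 can be computed as in the following NumPy sketch, which assumes that the relative error ε_k is defined as (estimated − true)/true expressed in percent and that clear-sky pixels and pixels outside the chosen cloud-fraction and SCOT strata have already been masked out.

```python
import numpy as np

def percentage_errors(est, truth):
    """RMSPE, MAPE (Eq. 13), and MBPE (Eq. 14) of the per-pixel relative error in percent."""
    eps = 100.0 * (est - truth) / truth
    rmspe = np.sqrt(np.mean(eps ** 2))
    mape = np.mean(np.abs(eps))
    mbpe = np.mean(eps)
    return rmspe, mape, mbpe

# Example usage for one stratum of Table 2 (hypothetical arrays):
# mask = (vza <= 43.0) & (truth >= 1.0) & (truth <= 100.0)
# print(percentage_errors(est[mask], truth[mask]))
```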

Table 2. Summary of errors in estimated slant-column COT (SCOT) along the line of sight from the simulated sky-view camera image, evaluated for the test data set. The results are summarized for different cloud fraction and SCOT ranges.

Cloud Fraction   SCOT Range   RMSPE¹ (%)   MAPE¹ (%)   MBPE¹ (%)
≥0.7             1–100        23           13          –0.2
                 0.2–100      30           14          0.3
                 0.2–1        69           22          4.2
                 1–10         26           14          1.1
                 10–100       17           11          –1.7
<0.7             1–100        118          49          7.0
                 0.2–100      189          52          12.6
                 0.2–1        262          56          21.1
                 1–10         128          48          12.9
                 10–100       77           50          –11.7
All              1–100        48           18          0.8
                 0.2–100      87           21          2.6
                 0.2–1        191          39          12.5
                 1–10         59           20          3.2
                 10–100       27           14          –2.5

¹ RMSPE, MAPE, and MBPE denote root-mean-square, mean absolute, and mean bias percentage errors, respectively.

So far, it has been assumed that the radiometric calibration of the digital camera is accurate, that is, the RGB radiances are accurate. However, if the calibration is biased, the SCOT can be overestimated or underestimated. We examined how a bias in the radiometric calibration affects the estimation error of SCOT. The amplification factors of the RMSPE of the estimated SCOT with respect to a case of perfect calibration are shown in Figure 12. In Figure 12a, all of the RGB channels are equally biased, with a bias factor from 0.6 to 2.0. The RMSPE is indeed the smallest when there is no radiance bias, and the RMSPE increases as the positive or negative bias increases. When the bias factor is in the range of 0.75–1.7, the RMSPE does not increase significantly, with an amplification factor of 1.5 or less.


Figure 12. Sensitivities of the SCOT estimation error to (a) radiance bias and (b) RBR bias. The vertical axis is the RMSPE relative to the RMSPE without radiance and RBR bias. For (a), the radiance bias with the same factor was assumed for all of the RGB channels. For (b), the biases of the red and blue channels are assumed to have opposite signs. A total of 100 samples from the test dataset were used to evaluate each case of biased input.

In other words, the estimation of SCOT does not deteriorate if the error in absolute calibration of the camera is not excessive. On the other hand, Figure 12b shows the estimated error amplification of SCOT when the RBR is biased. When SCOT is 10 or more, and RBR has a negative bias (more bias in the blue channel than the red channel), the RMSPE of the SCOT rapidly increases. Thus, the SCOT estimation is relatively sensitive to the RBR bias. The RBR bias factor should be within 0.95–1.17 in order to keep the RMSPE amplification factor less than 1.5. Although the relative channel responses need to be calibrated with moderate accuracy, the SCOT retrieval is not very sensitive to the absolute calibration of radiance, which could be a reflection of the fact that the CNN model mainly uses the structural context for the SCOT retrieval.
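The sensitivity test of Figure 12 can be outlined as in the sketch below. The retrieval function `cnn_retrieve` is a hypothetical stand-in for the trained CNN applied to a normalized four-channel image, and the bias is imposed simply by scaling the simulated RGB radiances before retrieval; this is not the authors' evaluation code.

```python
import numpy as np

def rmspe(est, truth):
    """Root-mean-square percentage error (clear-sky pixels should be masked out beforehand)."""
    eps = 100.0 * (est - truth) / truth
    return np.sqrt(np.mean(eps ** 2))

def bias_amplification(images, truth_scot, cnn_retrieve, gain_rgb):
    """Scale the RGB radiance channels by `gain_rgb`, e.g. (0.8, 0.8, 0.8) for a common bias
    or (1.05, 1.0, 0.95) for an RBR bias, and return the RMSPE amplification factor
    relative to the unbiased case, as in Figure 12. Images are assumed channel-last,
    with channels 0-2 the RGB radiances and channel 3 the solar-position image."""
    biased = images.copy()
    biased[..., :3] *= np.asarray(gain_rgb)
    rmspe_ref = rmspe(cnn_retrieve(images), truth_scot)
    return rmspe(cnn_retrieve(biased), truth_scot) / rmspe_ref
```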

5. Applications to Sky-View Camera Observations

5.1. Overview of Observation

Observations were made using a digital camera for 28 days between 22 June and 12 November 2018. From June to October 2018, the observation site was on the rooftop of a building of the Graduate School of Science, Tohoku University (140.84°E, 38.25°N, 158 m above sea level), and in November 2018, the observation site was on the rooftop of a building at Chiba University (140.10°E, 35.62°N, 21 m ASL). The digital camera was a Nikon D7000 with an AF DX Fish-eye-Nikkor lens (10.5 mm f/2.8G ED) with a diagonal angle of 180°. The camera was oriented in the zenith direction, the aperture size was set at the maximum f-value of 2.8, and the exposure time was adjusted according to the overall brightness at the time of observation. A radiometric calibration was performed in advance using equipment at the Japan Aerospace Exploration Agency (JAXA) Tsukuba Space Center. The coefficients for conversion from the digital counts of RAW-format images to the channel-mean spectral radiances and the spectral response functions of the RGB channels were determined, and a correction of the vignetting effect was applied [48]. The RAW images were converted to the equidistant projection, the radiance values were averaged to 128 × 128 pixels, and the pixel radiances for VZA > 45° were set to 0, as for the training data.
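A sketch of the preprocessing chain described above is given below, from calibrated radiances in an equidistant projection to the 128 × 128 CNN input. The 800 W m⁻² sr⁻¹ µm⁻¹ normalization (Section 3) and the 45° cutoff follow the text, while the block-averaging details and array layout are assumptions; the RAW-to-radiance calibration itself [48] is taken as already applied.

```python
import numpy as np

def preprocess_camera_image(radiance_eqd, vza_eqd, norm=800.0, max_vza=45.0):
    """radiance_eqd: (H, W, 3) calibrated spectral radiances (W m-2 sr-1 um-1), already
    resampled to an equidistant projection; vza_eqd: (H, W) viewing zenith angle (deg)."""
    h, w, _ = radiance_eqd.shape
    fy, fx = h // 128, w // 128
    # Block-average the radiances down to 128 x 128 pixels
    small = radiance_eqd[:128 * fy, :128 * fx].reshape(128, fy, 128, fx, 3).mean(axis=(1, 3))
    # Subsample the VZA grid to the same resolution (an approximation for this sketch)
    vza = vza_eqd[:128 * fy:fy, :128 * fx:fx]
    small[vza > max_vza] = 0.0   # pixels outside the 45-degree field of view are set to 0
    return small / norm          # normalize as for the training data
```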

5.2. Example Results

Examples of the estimation results obtained by the CNN are shown in Figure 13. In this section, pixels with SCOT ≥ 0.2 were identified as cloudy. The cases in Figure 13a–d appear to be water clouds in the boundary layer. The SCOT distribution and the cloud pixel detection seem reasonable for both the overcast clouds (Figure 13a) and small broken clouds (Figure 13c). The cloud spatial structure at various horizontal scales was trained by the CNN, thus capturing that the COT is large at the center of the cloud and small at the edges. In the example in Figure 13b, the sun is located at a low position in the image, and in the camera image and RBR image, there is a tendency for the sun side of the cloud to be bright and the RBR to be high. This azimuthal dependence does not appear in the estimated SCOT distribution, suggesting that the 3D radiative effects are corrected. In Figure 13e, the sky is very dark, being covered completely by optically thick clouds, with a very large estimated SCOT of around 100. In Figure 13f, there is a cloud with moderate SCOT on the upper left side of the image. On the lower right side, a 22° halo is visible around the sun, which is likely caused by cirrus cloud composed of ice crystals. The halo feature results in an estimated SCOT of ~0.3, and the SCOT pattern clearly shows a halo shape. As the CNN in this study was trained assuming water clouds exclusively, the SCOT estimation may not be correct when the scattering phase function is largely different from that of water droplets. Figure 13g shows a misidentification of cloud around the sun. This could be caused by a lens flare. When the light from a strong light source, such as direct sunlight, is incident on the camera, stray light from the reflections at the lens surfaces and the lens barrel leaks to darker pixels around the bright pixels. It is known that fisheye lenses are easily affected by lens flare, particularly when the aperture is large. In this study, the lens flare effect was not included when training the CNN, so an inconsistency is present between the training data and the actual observational data. The example in Figure 13g suggests that lens flare may cause clear-sky pixels to be erroneously identified as optically thin cloud. There are two methods to reduce such adverse effects, namely: (1) to shield the sun at the time of observation, and (2) to train the CNN model by including the lens flare effect.


Figure 13. Examples of SCOT estimated from a ground-based zenith-oriented digital camera. Cases (a–g) correspond to single snapshots made at respective times shown in the title of the leftmost panel. The observation time is shown in the YYYY.MM.DD.HH:XX format, where YYYY denotes the year, MM the month, DD the day, HH the hour, and XX the minute. The columns from left to right show the camera images in JPEG format, the RBR, the estimated SCOT, and the cloud flag, respectively. The cloud flag is displayed with a white color for cloudy pixels, a navy-blue color for clear-sky pixels, and a red color for the pixels that are undetermined because of the saturation of the camera-image signals. The saturated pixels are indicated by black in the RBR and estimated SCOT images.


5.3. Comparison with Pyranometer Measurements

In November 2018, an intensive observation experiment called the “Chiba Campaign” was conducted at Chiba University in order to observe the atmosphere with various instruments [49]. Chiba University has a SKYNET site, and the data and site information are available on the web (http://atmos3.cr.chiba-u.jp/skynet/chiba/chiba.html). Building on previous efforts (e.g., [50] and the references therein), the effective hemispherical-mean COT was estimated from the total downward irradiance observed by a pyranometer on 11 November. A look-up table of irradiance was generated from radiative transfer model calculations, assuming a plane-parallel homogeneous cloud with a constant CDER, and the COT was estimated from a one-minute average of irradiance obtained at 10-second intervals. The COT was not estimated when the transmittance (defined as the ratio of the recorded irradiance to the corresponding value that is theoretically expected for the clear-sky case [51]) was equal to or more than 0.8, as it is difficult to determine the COT in such a high-transmittance case, possibly because of the enhanced radiation by illuminated cloud sides. The directional SCOT of each image pixel was estimated by the CNN from the digital camera, and was converted to an equivalent COT using Equation (11). The camera measurements were made every minute. The data with the closest observation times were used to compare the two methods (with a time lag of less than 30 seconds). The pyranometer cannot distinguish between clear sky and cloudy regions within the hemispheric field of view. Therefore, it is reasonable to compare it with the COT averaged over the field of view of the digital camera with VZA ≤ 43°, including the clear-sky pixels with zero COT. For this day, the COT estimated from the downward irradiance was in relatively good agreement with the other estimates retrieved from both co-located instruments and satellites [49].

Figure 14 shows a comparison of the COT estimates from the pyranometer and camera. When the transmittance was high, the COT estimation was not actually performed using the pyranometer, but for convenience, the COT estimated from the pyranometer was set to 0 in this comparison. The temporal variations of COT obtained by the two methods are similar, with a correlation coefficient as high as 0.721. The COT obtained from the camera is overestimated, with a mean bias of 0.702 compared to that from the pyranometer. The variability of the COT obtained from the camera is also larger than that of the COT obtained from the pyranometer. A possible reason for this is that the pyranometer-based estimate is more spatiotemporally averaged, so it shows smaller fluctuations than those obtained from the camera. Whereas the pyranometer has a field of view of a solid angle of 2π sr, the camera COT is averaged over a field of view of a solid angle of 0.586 × π sr (VZA ≤ 43°) centered on the zenith. Additionally, while the camera captures instantaneous values, a one-minute average irradiance was used for obtaining COT from the pyranometer. For the camera, the COT of each pixel is first determined, and the spatial variations of the COT are then averaged. On the other hand, for the pyranometer, the variation of radiance within the field of view is averaged at the time of measurement. In general, COT has large spatial and temporal variations, and also has a strong nonlinearity with radiation.
In heterogeneous cloud fields, averaging in the radiance space and averaging in the COT space may lead to different results. At around 12:50 and 14:00, the COT obtained from the pyranometer is evidently larger than the COT obtained from the camera. As shown in the photograph in Figure 13c, at these times, the clouds cover a small fraction of the sky, and visually appear optically thin. If the instruments are under a cloud shadow, the pyranometer measures low global irradiance, as the direct sunlight is shadowed and the diffuse radiation from above is weak because of the absence of significant scattering by clouds. This may cause a significant disagreement between the pyranometer and camera. The radiation enhancement can cause an opposite effect on the pyranometer; downward irradiance is enhanced and possibly exceeds the theoretical value for the clear-sky case, resulting in an underestimation of the COT obtained from the pyranometer. If cases are excluded when the COT obtained from either the pyranometer or camera is smaller than 1.5, the correlation coefficient becomes as high as 0.748. This kind of comparison is useful to characterize different algorithms and to identify possible 3D radiative effects, although a systematic study is needed in order to draw a conclusion. More detailed analyses, including comparisons with different ground- and satellite-based instruments, are presented in a separate paper [49].
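The comparison procedure can be outlined as follows. In this sketch, `scot_map` and `vza_deg` are the per-pixel CNN retrieval and viewing zenith angles for one snapshot, the per-pixel SCOT is converted with Equation (11) before averaging over VZA ≤ 43° (clear-sky pixels included), and camera and pyranometer samples are paired by nearest time within 30 s; the array names and the exact matching rule are assumptions for illustration.

```python
import numpy as np

def camera_mean_cot(scot_map, vza_deg, max_vza=43.0):
    """Equivalent COT (Eq. 11) averaged over the camera field of view, clear-sky pixels included."""
    mask = vza_deg <= max_vza
    cot = scot_map[mask] * np.cos(np.deg2rad(vza_deg[mask]))
    return cot.mean()

def match_nearest(t_cam, t_pyr, max_lag_s=30.0):
    """For each camera time (s), find the nearest pyranometer sample within the allowed lag."""
    idx = np.clip(np.searchsorted(t_pyr, t_cam), 1, len(t_pyr) - 1)
    left = idx - 1
    choose_left = np.abs(t_pyr[left] - t_cam) < np.abs(t_pyr[idx] - t_cam)
    idx = np.where(choose_left, left, idx)
    valid = np.abs(t_pyr[idx] - t_cam) <= max_lag_s
    return idx, valid
```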


Figure 14. Comparison of the average COT estimated from a zenith-oriented digital camera using the CNN, and the COT estimated using a pyranometer, for 11 November 2018, as follows: (a) time series with a one-minute interval; (b) a scatter plot of the COT estimated by the two methods. For the camera-based estimation, the COT values were converted from the SCOT obtained for every pixel, and averaged over a field within 43° from the center of the image. In (a), the three red dashed lines correspond to the times of Figure 13a,b,c, respectively. In (b), the correlation coefficient (r) is shown.

6. Conclusions

A new method has been proposed for the estimation of the whole-sky spatial distribution of SCOT from images taken by a digital camera installed on the ground, by training a CNN based on a 3D radiative transfer model. Two CNNs were tested in this study. First, the effects of 3D radiative transfer on transmitted radiance, which were not well understood, have been investigated by model calculations. Under broken clouds, the 3D radiative effects, such as the radiation enhancement on the sun side of the cloud and cloud shadow on the opposite side, are significant. It was confirmed that the radiance is not determined by the local COT, but is strongly dependent on the spatial arrangement of the adjacent cloud elements. In order to estimate the local COT, it is essential to consider the spatial arrangement of the adjacent cloud elements. A comparison of different NN and CNN architectures experimentally confirmed that the multi-pixel method using multi-pixel data including surrounding pixels is effective. Subsequently, based on these findings, we have developed a deep CNN for SCOT retrieval from a sky-view camera. Evaluations using synthetic data showed that the CNN can accurately and rapidly estimate the spatial distribution of the SCOT from multi-channel camera images. For SCOTs in the range of 1–100, the bias and dispersion of the error were stable and small, with a significant reduction in the number of artifacts caused by 3D radiative effects, which had been a problem in previous research. The results show that the CNN reproduces the spatial feature of the cloud that the COT decreases from the center to the edges of the cloud, suggesting that the CNN is adequately trained to represent this spatial feature. Information about the spectral dependency of the radiance in the visible region is also used as supplementary information, and the spectral properties of the camera need to be accurately calibrated. On the other hand, the CNN estimation does not strongly depend on the absolute value of radiance, and is relatively robust against an uncertainty of about 20% in the radiance calibration. However, when the SCOT is small, the uncertainty in the aerosol amount makes the estimation accuracy of SCOT low.

The CNN-based COT estimation has been applied to the images obtained by a digital camera that was radiometrically calibrated beforehand. The spatial distribution of the estimated COT did not show significant dependence on the solar direction, suggesting that the 3D radiative effects were effectively corrected. As an initial test, the camera retrieval was compared with the effective hemispherical-mean COT estimated from a pyranometer for one day, and a high correlation was shown. The COT estimated from the camera has a larger temporal variability, which may be caused by the differences in (1) the size of the field of view and (2) the time–space averaging method between the digital camera and the pyranometer.

The next steps are to analyze more diverse clouds, including ice clouds, so as to improve the accuracy and resolution of the training data, and to conduct more validation work by comparisons with independent observations. By increasing the resolution of the camera images used for estimation, it is possible to better capture small-scale spatial features of clouds [52]. This study is unique in that the spatial distribution of COT can be quickly estimated from a ground-based camera. This estimation can be useful as an initial guess in advanced cloud tomography [23–25,27], for the prediction and diagnosis of solar energy production, and for the validation of satellite cloud products.

Author Contributions: Conceptualization, H.I.; methodology, R.M. and H.I.; software, R.M. and H.I.; validation, R.M.; formal analysis, R.M. and A.D.; investigation, R.M.; resources, H.I.; data curation, R.M.; writing (original draft preparation), R.M.; writing (review and editing), H.I., K.S.S., A.D., and R.K.; visualization, R.M. and H.I.; supervision, H.I. and K.S.S.; project administration, H.I.; funding acquisition, H.I. and R.K.

Funding: This research was funded by a Grant-in-Aid for Scientific Research, grant number Kiban B 17H02963, from the Japan Society for the Promotion of Science (JSPS), and the Second Research Announcement on the Earth Observations of the Japan Aerospace Exploration Agency (JAXA), PI number ER2GCF204. A.D.'s work was funded by the JST CREST, grant number JPMJCR15K4.

Acknowledgments: SCALE-LES was developed by the SCALE interdisciplinary team at the RIKEN Advanced Institute for Computational Science (AICS), Japan. Parts of the results in this paper were obtained using the K supercomputer at RIKEN AICS. The authors are grateful to Dr. Yousuke Sato of Hokkaido University and Dr. Tomita of AICS, RIKEN, for providing the SCALE-LES simulation data, and to Dr. Hitoshi Irie for organizing the Chiba campaign. The results of the radiative transfer simulations in this paper were obtained using the Reedbush supercomputer system of the University of Tokyo.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Clouds Simulated by SCALE-LES

Figure A1 shows the temporal variation of cloud in the three SCALE-LES simulation cases (open-cell, closed-cell, and BOMEX cases). The coefficient of variation of COT is defined as the horizontal heterogeneity index Vτ, as follows:

$$V_\tau = \frac{\sigma_\tau}{\overline{\tau}} \qquad (\mathrm{A1})$$

where τ̄ and σ_τ represent the mean and standard deviation, respectively, of COT at 550 nm for the cloudy pixels. A large Vτ indicates that clouds are horizontally heterogeneous. In the closed-cell case, the clouds maintain a nearly constant average COT over all of the time steps. In the open-cell case, the cloud fraction decreases, and the clear sky spreads with advancing time steps. The horizontal heterogeneity is moderate. In the BOMEX case, the horizontal heterogeneity is larger than in the other two cases.
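The cloud fraction and the heterogeneity index of Equation (A1) can be evaluated from a 2-D COT field as in the short sketch below, using the 0.1 COT threshold for cloudy columns stated in the Figure A1 caption; the function name and array layout are illustrative.

```python
import numpy as np

def cloud_diagnostics(cot_field, threshold=0.1):
    """Cloud fraction and heterogeneity index V_tau = std/mean of COT over cloudy columns (Eq. A1)."""
    cloudy = cot_field >= threshold
    cloud_fraction = cloudy.mean()
    tau = cot_field[cloudy]
    v_tau = tau.std() / tau.mean() if tau.size else np.nan
    return cloud_fraction, v_tau
```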


Figure A1. Temporal variations of cloud physical properties in the three cases simulated by the SCALE-LES model, namely: (a) cloud fraction; (b) the horizontal inhomogeneity index defined in Equation (5). (c,d,e) Domain average (solid lines) and 20th and 80th percentiles (dashed lines) of cloud optical thickness, column-mean droplet effective radius, and cloud geometrical thickness, respectively, calculated for the cloudy columns. All of the calculations are made by assuming that cloudy columns have a COT greater than or equal to 0.1.

Appendix B. Anatomy of the CNN

In order to understand how the CNN extracts features from the input images, we show the convolution filters and the output feature maps from the middle layer of the CNN with the structure shown in Figure 9. We take the case of Figure A2 as an example, in which the sun is located outside of the image in the upper left direction, with a solar zenith angle of 69.7°. Figure A3 shows the filters and output of the convolution layer (shown as A2 in Figure 9). These were trained so as to capture the features of the spatial distribution of radiance, depending on the spatial structure of the clouds and the positional relationship between the sun and the clouds. In the filters corresponding to the RGB channels, there are many patterns in which positive and negative signs appear at the center and periphery of the filter. Spatial patterns are isotropic or anisotropic. Additionally, there are many filters in which the red channel and blue channel have opposite signs, and these filters seem to capture the spectral features of radiance in the same way that RBR captures the spectral feature. Some portion of the output images clearly shows the edges of cloud, and other images show large values in the directions near the sun.
Figure A4 shows the output feature maps of the convolutional layer of Figure 9 (A3). Various patterns have variations on different spatial scales. In some feature maps, the spatial pattern strongly depends on the position of the sun.


Figure A5 shows half of the output feature maps of the convolutional layer of Figure 9 (A4), while the total number of output channels of (A4) is actually 512. There is a great variety of patterns, but it is currently difficult to interpret the meaning of the features in the deep layers. It is generally known to be more difficult to understand the features of the deeper layers in a deep NN.


Figure A2. Examples of (a) an input true-color image and (b) estimated SCOT. The sun is located outside of the image in the upper left direction, with solar zenith and azimuth angles of 69.7° and 195.5°, respectively. In this image, the azimuth is defined to be positive in the clockwise direction from the right.


Figure A3. Upper panels show the multi-channel filters in the first convolution layer (A2) shown in Figure 9. Each column corresponds to one of 16 filters, and each row corresponds to one of four input channels. The filter size is 7 × 7 pixels, and the color shades represent the filter weight. The lower panels show output feature maps with a size of 128 × 128 pixels for the case shown in Figure A2, where each image corresponds to the above multi-channel convolution filters. Each feature map shows normalized signals with a mean of 0 and standard deviation of 1, except for the third, ninth, and eleventh ones from the left, for which the standard deviation is less than 10⁻³⁵.

Remote Sens. 2019, 11, x FOR PEER REVIEW 26 of 30 Remote Sens. 2019, 11, 1962 25 of 28

Figure A4. Feature maps with a size of 32 × 32 pixels that are output from the last convolution layer of the residual block (A3) shown in Figure 9, for the case shown in Figure A2. All 64 feature maps are shown. Each image shows the normalized signal of the feature map with a mean of 0 and standard deviation of 1.

Figure A5. Selected feature maps with a size of 16 × 16 pixels that are output from the convolution layer (A4) shown in Figure 9, for the case shown in Figure A2. Half (256) of the 512 feature maps are arbitrarily shown. Each image shows the normalized signal of the feature map with a mean of 0 and standard deviation of 1.


References

1. Chow, C.W.; Urquhart, B.; Lave, M.; Dominguez, A.; Kleissl, J.; Shields, J.; Washom, B. Intra-hour forecasting with a total sky imager at the UC San Diego solar energy testbed. Sol. Energy 2011, 85, 2881–2893. [CrossRef]
2. Marshak, A.; Knyazikhin, Y.; Evans, K.D.; Wiscombe, W.J. The “RED versus NIR” plane to retrieve broken-cloud optical depth from ground-based measurements. J. Atmos. Sci. 2004, 61, 1911–1925. [CrossRef]
3. Chiu, J.C.; Huang, C.-H.; Marshak, A.; Slutsker, I.; Giles, D.M.; Holben, B.N.; Knyazikhin, Y.; Wiscombe, W.J. Cloud optical depth retrievals from the Aerosol Robotic Network (AERONET) cloud mode observations. J. Geophys. Res. 2010, 115, 14202. [CrossRef]
4. Kikuchi, N.; Nakajima, T.; Kumagai, H.; Kuroiwa, H.; Kamei, A.; Nakamura, R.; Nakajima, T.Y. Cloud optical thickness and effective particle radius derived from transmitted solar radiation measurements: Comparison with cloud radar observations. J. Geophys. Res.-Atmos. 2006, 111, 07205. [CrossRef]
5. McBride, P.J.; Schmidt, K.S.; Pilewskie, P.; Kittelman, A.S.; Wolfe, D.E. A spectral method for retrieving cloud optical thickness and effective radius from surface-based transmittance measurements. Atmos. Chem. Phys. 2011, 11, 7235–7252. [CrossRef]
6. Niple, E.R.; Scott, H.E.; Conant, J.A.; Jones, S.H.; Iannarilli, F.J.; Pereira, W.E. Application of oxygen A-band equivalent width to disambiguate downwelling radiances for cloud optical depth measurement. Atmos. Meas. Tech. 2016, 9, 4167–4179. [CrossRef]
7. Mejia, F.A.; Kurtz, B.; Murray, K.; Hinkelman, L.M.; Sengupta, M.; Xie, Y.; Kleissl, J. Coupling sky images with radiative transfer models: A new method to estimate cloud optical depth. Atmos. Meas. Tech. 2016, 9, 4151–4165. [CrossRef]
8. Cahalan, R.F.; Ridgway, W.; Wiscombe, W.J.; Bell, T.L.; Snider, J.B. The albedo of fractal stratocumulus clouds. J. Atmos. Sci. 1994, 51, 2434–2455. [CrossRef]
9. Loeb, N.G.; Várnai, T.; Davies, R. Effect of cloud inhomogeneities on the solar zenith angle dependence of nadir reflectance. J. Geophys. Res. Atmos. 1997, 102, 9387–9395. [CrossRef]
10. Várnai, T.; Davies, R. Effects of cloud heterogeneities on shortwave radiation: Comparison of cloud-top variability and internal heterogeneity. J. Atmos. Sci. 1999, 56, 4206–4224. [CrossRef]
11. Várnai, T.; Marshak, A. Observations of three-dimensional radiative effects that influence MODIS cloud optical thickness retrievals. J. Atmos. Sci. 2002, 59, 1607–1618. [CrossRef]
12. Kato, S.; Hinkelman, L.M.; Cheng, A. Estimate of satellite-derived cloud optical thickness and effective radius errors and their effect on computed domain-averaged irradiances. J. Geophys. Res. Atmos. 2006, 111, 17201. [CrossRef]
13. Marshak, A.; Platnick, S.; Várnai, T.; Wen, G.; Cahalan, R.F. Impact of three-dimensional radiative effects on satellite retrievals of cloud droplet sizes. J. Geophys. Res. Atmos. 2006, 111, 09207. [CrossRef]
14. Iwabuchi, H.; Hayasaka, T. A multi-spectral non-local method for retrieval of boundary layer cloud properties from optical remote sensing data. Remote Sens. Environ. 2003, 88, 294–308. [CrossRef]
15. Faure, T.; Isaka, H.; Guillemet, B. Neural network retrieval of cloud parameters of inhomogeneous and fractional clouds: Feasibility study. Remote Sens. Environ. 2001, 77, 123–138. [CrossRef]
16. Faure, T.; Isaka, H.; Guillemet, B. Neural network retrieval of cloud parameters from high-resolution multispectral radiometric data: A feasibility study. Remote Sens. Environ. 2002, 80, 285–296. [CrossRef]
17. Cornet, C.; Isaka, H.; Guillemet, B.; Szczap, F. Neural network retrieval of cloud parameters of inhomogeneous clouds from multispectral and multiscale radiance data: Feasibility study. J. Geophys. Res. Atmos. 2004, 109, 12203. [CrossRef]
18. Cornet, C.; Buriez, J.C.; Riédi, J.; Isaka, H.; Guillemet, B. Case study of inhomogeneous cloud parameter retrieval from MODIS data. Geophys. Res. Lett. 2005, 32, 13807. [CrossRef]
19. Evans, K.F.; Marshak, A.; Várnai, T. The potential for improved boundary layer cloud optical depth retrievals from the multiple directions of MISR. J. Atmos. Sci. 2008, 65, 3179–3196. [CrossRef]
20. Okamura, R.; Iwabuchi, H.; Schmidt, S. Feasibility study of multi-pixel retrieval of optical thickness and droplet effective radius of inhomogeneous clouds using deep learning. Atmos. Meas. Tech. 2017, 10, 4747–4759. [CrossRef]
21. Marín, M.J.; Serrano, D.; Utrillas, M.P.; Núñez, M.; Martínez-Lozano, J.A. Effective cloud optical depth and enhancement effects for broken liquid water clouds in Valencia (Spain). Atmos. Res. 2017, 195, 1–8. [CrossRef]

22. Fielding, M.D.; Chiu, J.C.; Hogan, R.J.; Feingold, G. A novel ensemble method for retrieving properties of warm cloud in 3-D using ground-based scanning radar and zenith radiances. J. Geophys. Res. Atmos. 2014, 119, 10912–10930. [CrossRef]
23. Levis, A.; Schechner, Y.; Aides, A. Airborne Three-Dimensional Cloud Tomography. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3379–3387. [CrossRef]
24. Levis, A.; Schechner, Y.Y.; Davis, A.B. Multiple-scattering Microphysics Tomography. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6740–6749.
25. Holodovsky, V.; Schechner, Y.Y.; Levin, A.; Levis, A.; Aides, A. In-situ Multi-view Multi-scattering Stochastic Tomography. In Proceedings of the 2016 IEEE International Conference on Computational Photography (ICCP), Evanston, IL, USA, 13–15 May 2016; pp. 1–12. [CrossRef]
26. Martin, W.G.K.; Hasekamp, O.P. A demonstration of adjoint methods for multi-dimensional remote sensing of the atmosphere and surface. J. Quant. Spectrosc. Radiat. Transf. 2018, 204, 215–231. [CrossRef]
27. Mejia, F.A.; Kurtz, B.; Levis, A.; de la Parra, I.; Kleissl, J. Cloud tomography applied to sky images: A virtual testbed. Sol. Energy 2018, 176, 287–300. [CrossRef]
28. Sato, Y.; Nishizawa, S.; Yashiro, H.; Miyamoto, Y.; Tomita, H. Potential of retrieving shallow-cloud life cycle from future generation satellite observations through cloud evolution diagrams: A suggestion from a large eddy simulation. Sci. Online Lett. Atmos. 2014, 10, 10–14. [CrossRef]
29. Sato, Y.; Nishizawa, S.; Yashiro, H.; Miyamoto, Y.; Kajikawa, Y.; Tomita, H. Impacts of cloud microphysics on trade wind cumulus: Which cloud microphysics processes contribute to the diversity in a large eddy simulation? Prog. Earth Planet. Sci. 2015, 2, 23. [CrossRef]
30. Nishizawa, S.; Yashiro, H.; Sato, Y.; Miyamoto, Y.; Tomita, H. Influence of grid aspect ratio on planetary boundary layer turbulence in large-eddy simulations. Geosci. Model Dev. 2015, 8, 3393–3419. [CrossRef]
31. Iwabuchi, H. Efficient Monte Carlo methods for radiative transfer modeling. J. Atmos. Sci. 2006, 63, 2324–2339. [CrossRef]
32. Iwabuchi, H.; Okamura, R. Multispectral Monte Carlo radiative transfer simulation by the maximum cross-section method. J. Quant. Spectrosc. Radiat. Transf. 2017, 193, 40–46. [CrossRef]
33. Hänel, G. The properties of atmospheric aerosol particles as functions of the relative humidity at thermodynamic equilibrium with the surrounding moist air. Adv. Geophys. 1976, 19, 73–188.
34. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 630–645.
37. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016; BMVA Press: Durham, UK, 2016; pp. 87.1–87.12.
38. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; p. 37.
39. Goodfellow, I.J.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
40. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
41. Tokui, S.; Oono, K.; Hido, S.; Clayton, J. Chainer: A Next-generation Open Source Framework for Deep Learning. In Proceedings of the Workshop on Machine Learning Systems (LearningSys) in the Twenty-Ninth Annual Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015; Volume 5, pp. 1–6.
42. Marshak, A.; Davis, A. 3D Radiative Transfer in Cloudy Atmospheres; Springer Science+Business Media: Berlin/Heidelberg, Germany; New York, NY, USA, 2005.
43. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
44. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the International Conference on Learning Representations, San Juan, PR, USA, 2–4 May 2016.
45. Yu, F.; Koltun, V.; Funkhouser, T. Dilated Residual Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480.
46. Loshchilov, I.; Hutter, F. Fixing Weight Decay Regularization in Adam. CoRR abs/1711.05101. In Proceedings of the ICLR 2018 Conference Blind Submission, Vancouver, BC, Canada, 30 April–3 May 2018.
47. Zinner, T.; Hausmann, P.; Ewald, F.; Bugliaro, L.; Emde, C.; Mayer, B. Ground-based imaging remote sensing of ice clouds: Uncertainties caused by sensor, method and atmosphere. Atmos. Meas. Tech. 2016, 9, 4615–4632. [CrossRef]
48. Saito, M.; Iwabuchi, H.; Murata, I. Estimation of spectral distribution of sky radiance using a commercial digital camera. Appl. Opt. 2016, 55, 415–424. [CrossRef]
49. Damiani, A.; Irie, H.; Takamura, T.; Kudo, R.; Khatri, P.; Iwabuchi, H.; Masuda, R.; Nagao, T. An intensive campaign-based intercomparison of cloud optical depth from ground and satellite instruments under overcast conditions. Sci. Online Lett. Atmos. 2019, 15, 190–193. [CrossRef]
50. Serrano, D.; Núñez, M.; Utrillas, M.P.; Marín, M.J.; Marcos, C.; Martínez-Lozano, J.A. Effective cloud optical depth for overcast conditions determined with a UV radiometers network. Int. J. Climatol. 2014, 34, 3939–3952. [CrossRef]
51. Damiani, A.; Irie, H.; Horio, T.; Takamura, T.; Khatri, P.; Takenaka, H.; Nagao, T.; Nakajima, T.Y.; Cordero, R. Evaluation of Himawari-8 surface downwelling solar radiation by ground-based measurements. Atmos. Meas. Tech. 2018, 11, 2501–2521. [CrossRef]
52. Schwartz, S.E.; Huang, D.; Vladutescu, D.V. High-resolution photography of clouds from the surface: Retrieval of optical depth of thin clouds down to centimeter scales. J. Geophys. Res. Atmos. 2017, 122, 2898–2928. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).