Article

A Data-Driven Approach for Winter Precipitation Classification Using Weather Radar and NWP Data

Bong-Chul Seo

Iowa Flood Center and IIHR—Hydroscience & Engineering, The University of Iowa, Iowa City, IA 52242, USA; [email protected]

Received: 30 May 2020; Accepted: 24 June 2020; Published: 1 July 2020

Abstract: This study describes a framework that provides qualitative weather information on winter precipitation types using a data-driven approach. The framework incorporates the data retrieved from weather radars and the numerical weather prediction (NWP) model to account for relevant precipitation microphysics. To enable multimodel-based ensemble classification, we selected six supervised machine learning models: k-nearest neighbors, logistic regression, support vector machine, decision tree, random forest, and multi-layer perceptron. Our model training and cross-validation results based on Monte Carlo Simulation (MCS) showed that all the models performed better than our baseline method, which applies two thresholds (surface temperature and atmospheric layer thickness) for binary classification (i.e., rain/snow). Among all six models, random forest presented the best classification results for the basic classes (rain, freezing rain, and snow) and the further refinement of the snow classes (light, moderate, and heavy). Our model evaluation, which uses an independent dataset not associated with model development and learning, led to classification performance consistent with that from the MCS analysis. Based on the visual inspection of the classification maps generated for an individual radar domain, we confirmed the improved classification capability of the developed models (e.g., random forest) compared to the baseline one in representing both spatial variability and continuity.

Keywords: precipitation; classification; radar; NWP; machine learning; MCS

1. Introduction

Precipitation is one of the most important variables in atmospheric and environmental sciences, including weather- and hydrology-related research. The estimation of rainfall (i.e., liquid-phase precipitation) is becoming easier and more accurate thanks to advanced remote-sensing technology and the presence of reliable ground reference networks (e.g., [1,2]). On the other hand, quantifying amounts of solid and mixed precipitation remains challenging, as does identifying the many different types of precipitation (e.g., [3,4]), because of the difficulty of their reliable measurement. Information on these cold types of precipitation is important for infrastructure and facility management (e.g., air/ground traffic control and road closure) during the winter season in many regions (e.g., [5]). The conventional way of monitoring winter weather types (e.g., snow and freezing rain) has often relied on the dual-polarization capability of weather radars, which allows us to define hydrometeor types (e.g., [6]). However, the precipitation types obtained at the radar sampling locations aloft occasionally do not coincide with what is observed at the surface because of possible phase transitions along the falling path.

The Iowa Flood Center (IFC) has provided a real-time quantitative rainfall map for statewide streamflow prediction over the State of Iowa in the United States [7]. This radar-derived rainfall map is a composite of multiple radar datasets with a 5-min update cycle and an approximately 0.5 km resolution [8]. During the winter season, this map includes snow coverage information estimated using numerical weather prediction (NWP) model data. This snow classification procedure does not incorporate radar observations, to avoid the abovementioned disagreement between the ground and the radar sampling altitude; the rain/snow delineation applies two thresholds, for surface temperature and critical thickness (e.g., [9,10]), retrieved from the High-Resolution Rapid Refresh (HRRR) model analysis (e.g., [11]). The real-time visualization of this map often reveals discontinuous snow regions split by a tile-shaped border because the model's resolution is coarser than that of the rainfall map (i.e., 3 vs. 0.5 km). This discontinuous pattern persists until the model analysis is updated each hour.

In this study, we explore a data-driven approach to improve on the discussed weakness of our current classification. We combine the data retrieved from radars and the NWP model to overcome the limitations of radar-only and NWP-only methods and develop multiple classification models based on the supervised machine learning approach. Given the difficulty or uncertainty in classifying winter precipitation (e.g., [12,13]), our objective is to develop a framework that can deliver a multimodel-based classification ensemble or probabilistic classification product.

2. Data

In this study, we developed a data-driven model to classify winter precipitation for the entire State of Iowa in the United States. To ensure the model's robustness and reliability, the modeling framework must include a variety of meteorological and environmental data resources retrieved from weather radars and an NWP model for multiple years. This dataset accounts for atmospheric conditions associated with precipitation microphysics (e.g., [14,15]). We used observed weather classes (e.g., rain, freezing rain, and snow) as reference information, as recorded in the automated surface observing system (ASOS [16]) network. These records enable the supervised learning of our classification model and provide an opportunity to capture the dynamic features of winter precipitation processes because of the ASOS's high observational frequency (e.g., 1 min).

2.1. ASOS Data

The ASOS is a joint effort initiated by the U.S. National Oceanic and Atmospheric Administration (NOAA), the Federal Aviation Administration (FAA), and the Department of Defense (DOD) to support weather forecasting research/activities and aviation operations. The ASOS network comprises over 900 stations across the United States, which provide a wide range of automated surface weather observations, such as temperature, precipitation, wind, pressure, visibility, and sky conditions [17]. In the study domain (i.e., Iowa) shown in Figure 1, there are 15 ASOS stations; we collected precipitation-type observations with a 1 min resolution from all 15 stations for selected events during a four-year period (2012–2015). As one of the ASOS stations is positioned within a range where no radar observation is available (i.e., "the cone of silence"), we excluded the data from that station (the one beside the KDVN radar in Figure 1) in our model development and validation. ASOS stations are equipped with two sensors for detecting weather types (one for rain/snow identification and the other for freezing rain detection); they check the observed data and execute the precipitation identification (PI) algorithm every minute [17]. The PI result is categorized into three precipitation classes: (1) rain (−RA, RA, +RA), (2) freezing rain (−FZRA, FZRA), and (3) snow (−SN, SN, +SN). The negative (−) and positive (+) signs denote whether the strengths of the reported classes are "light" or "heavy", respectively; no sign indicates a "moderate" state of precipitation.

Figure 1. The locations of radars and automated surface observing system (ASOS) stations in the study domain. The circular areas centered on each radar indicate the 230 km coverage of individual radars.

2.2. Radar Data

We acquired the radar volume data (e.g., [18]) for four radars (see Figure 1) in the U.S. Weather Surveillance Radar-1988 Doppler (WSR-88D) network from Amazon's big data archive (e.g., [19,20]). As illustrated in Figure 1, the four radars (KFSD in Sioux Falls, South Dakota; KDMX in Des Moines, Iowa; KDVN in Davenport, Iowa; and KOAX in Omaha, Nebraska) illuminate the Iowa domain with some overlaps and fully cover the ASOS locations. All four radars have provided polarimetric observations since their upgrade to dual-polarization (DP) in 2012 (KFSD, KDMX, and KDVN) and 2013 (KOAX). Their volume data contain three DP observables (differential reflectivity ZDR, copolar correlation coefficient ρHV, and differential phase φDP) in addition to the conventional ones (horizontal reflectivity ZH, radial velocity, and spectrum width). To prepare the radar data for model development and evaluation, we assigned each ASOS station to its closest radar. The radar–ASOS pairs and the distances between them are listed in Table 1. We then retrieved the ZH, ZDR, and ρHV data observed at the base elevation angle (approximately 0.5°) for the corresponding spherical coordinates (e.g., 0.5° by 250 m in azimuth and range) of the ASOS stations from each assigned radar. The scanning intervals of WSR-88D are 4–5 min in precipitation mode and 9–10 min in clear air mode. The time of the radar observations for data retrieval was limited to a 1 min window from ASOS records. We did not include φDP observations in this procedure because φDP is a cumulative quantity along a radial direction and quite noisy except in heavy rain cases. The relevance (or signature) of ZH, ZDR, and ρHV with different types of precipitation is well described in [1,3]. We note that no significant radar data quality control (except for a simple procedure using a ρHV threshold) was included in the data retrieval to remove non-meteorological radar echoes, because we selected obvious weather cases from the ASOS reports.

Table 1. Radar–ASOS pairs for matching precipitation classes and model input features retrieved from the radar data. The closest ASOS stations were assigned to each individual radar, and the numeric values in parentheses indicate the distance (km) between them.

Radar   Number of ASOS Stations   ASOS Stations
KFSD    2                         EST (161.1), SPW (131.9)
KDMX    7                         ALO (142.2), AMW (30.2), DSM (22.5), LWD (122.9), MCW (161.8), MIW (78.9), OTM (126.9)
KDVN    5                         BRL (102.7), CID (98.3), DBQ (87.9), DVN (0.6), IOW (78.0)
KOAX    1                         SUX (119.0)


2.3. NWP Data

To define the weather conditions, we retrieved several atmospheric variables, which likely describe winter precipitation formation and development, from the Rapid Refresh (RAP) model analysis. The RAP is a continental-scale assimilation and model forecast system updated hourly and based on 13 and 20 km resolution horizontal grids [21]. For this study, we selected the 20 km horizontal grid data, assuming that the weather conditions do not change sharply within the 20 km grids collocated with the ASOS stations shown in Figure 1. The retrieved RAP variables for the corresponding ASOS record hours are the geopotential height at 1000, 850, and 700 mb; the air temperature at the surface and 850 mb; and the relative humidity at the surface. Because the critical thickness (e.g., [9,10]) is a typical indicator of snow-likely conditions, we derived two layer thicknesses, between 1000–850 and 850–700 mb, from the geopotential height at these pressure levels. The relative humidity combined with air temperature (e.g., wet bulb temperature) is another indicator that can account for the phase transition of precipitation and snow formation.
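As an illustration, the two thickness predictors follow directly from the retrieved geopotential heights. The minimal sketch below assumes heights in meters; the function and variable names are illustrative rather than taken from the paper's code.

# A minimal sketch of the thickness derivation described above, assuming
# geopotential heights (in meters) at 1000, 850, and 700 mb retrieved from
# the RAP analysis; the function and variable names are illustrative.
def layer_thicknesses(gh_1000_m, gh_850_m, gh_700_m):
    """Return (Thick1000-850, Thick850-700) in meters."""
    thick_1000_850 = gh_850_m - gh_1000_m  # 1000-850 mb layer thickness
    thick_850_700 = gh_700_m - gh_850_m    # 850-700 mb layer thickness
    return thick_1000_850, thick_850_700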

3. Methodology

In this section, we briefly introduce multiple models based on the supervised machine learning method for winter precipitation classification. Because of its efficiency and simplicity in model implementation, we used the Python "Scikit-Learn" library [22] to build multiple classification models. The six models selected for this study are k-nearest neighbors (kNN), logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and multi-layer perceptron (MLP). This section also outlines the model training and testing strategies for the objective evaluation of the models' classification performance, depending on the refinement of precipitation classes and the different combinations of input features retrieved from the radar and NWP data.

3.1. Classification Models

One of the simplest classification algorithms is kNN, which does not require prior knowledge of the data (e.g., its distribution). For a new data point, kNN attempts to define the points in the training dataset closest to the new point. The parameter k in the algorithm denotes how many of the closest neighbors to consider, and the algorithm makes a decision/prediction based on the majority class among the k nearest neighbors. While the kNN algorithm is often used as a baseline classification method, its performance (e.g., training time and prediction accuracy) tends to decrease as the training dataset becomes larger with an increasing number of features. In this study, we use a simple kNN model (i.e., k = 1) for training efficiency to test several different combinations of training features.

LR is named after the algorithm's primary function, the "logistic function". This function is also known as the "sigmoid function", which allows predictive analysis based on the concept of probability (e.g., [23]). LR makes binary predictions by estimating the linear function parameters (i.e., slope and intercept) and transferring the linear estimates to the sigmoid function. For multiple class (e.g., n classes) identification, the algorithm analyzes n binary models, and for each of these, the classes are split into two groups: a specific class and all other classes. The algorithm then runs the n binary classifiers for a new data point and makes its prediction by selecting the single class that yields the highest score.

SVM seeks a multi-dimensional hyperplane (e.g., decision boundary) that maximizes the margin/separation between two different classes. The hyperplane dimension is determined by the number of training data features; for instance, if there are two data features, the hyperplane is a line. As SVM is inherently a binary classifier [24], a procedure similar to the one in LR is necessary for multi-class identification. Support vectors, defined as training points located near the border between classes, determine the orientation and position of the hyperplane. To make a prediction, the algorithm measures the distances between a new data point and each support vector using a selected kernel function. The algorithm then makes its decision based on the measured distances and the weights of the support vectors. Because the kernel function is a key parameter in SVM, we tested several kernels (e.g., linear, sigmoid, polynomial, and radial basis functions) and selected the linear one for this study's classification.

DT is a white-box type method, as opposed to a black-box model (e.g., a neural network), which does not expose the model's decision logic and rules. The DT algorithm begins by partitioning the tree in a recursive manner from the topmost decision node, known as the "root node". The algorithm selects the best feature/attribute for splitting the data records at decision nodes and divides the records into smaller datasets ("branches"). It continues to the decision ("leaf") at the end of a branch, where there are no more records to split. The algorithm applies a heuristic rule to decide the data-splitting criteria and assigns ranks to attributes (the best one is selected) based on attribute selection measures (ASM), such as the information gain, gain ratio, or Gini index (see, for example, [25]).

RF consists of a large collection of individual DTs, which result in an ensemble for a class decision. In other words, the individual DTs reach their own decisions, and the class with the highest occurrence becomes the model's prediction. Given this basic concept, the most crucial factor in determining a model's prediction performance is whether the model has a large number of (relatively) uncorrelated DTs. To avoid highly correlated trees in the forest, RF uses a random subset of all the provided features when building individual DTs, which makes the model stronger and more robust than a single DT by limiting the model's overfitting and the error caused by bias (e.g., [26,27]). In the RF model implementation, we set 100 trees for winter precipitation classification.

A perceptron is a simple binary classifier that makes a prediction based on a linear transformation of an input feature vector using weights and a bias. MLP, by contrast, is a feedforward artificial neural network model that comprises at least three fully connected layers (i.e., input, hidden, and output layers) (e.g., [28]). MLP has two basic motions associated with model learning: (1) forward signals propagate from the input layer through the hidden layers and reach a decision when the signals arrive at the output layer, and (2) the weights and biases are updated through backpropagation, repeated until the decision error is minimized using a gradient-based optimization algorithm. While MLP provides a basis for deep learning, which facilitates complex model structures that capture characteristics contained in large amounts of data, its training often takes longer than that of the other models presented in this section. For the MLP model in this study, we set the hidden layer size to 100 for training efficiency.
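For concreteness, the six models can be instantiated with Scikit-Learn [22] as sketched below. Only the settings stated above (k = 1, the linear SVM kernel, 100 trees, and a hidden layer size of 100) are taken from the text; all other hyperparameters are library defaults and should be read as assumptions.

# A minimal sketch of the six classifiers as configured in this section,
# using the Python "Scikit-Learn" library [22]. Hyperparameters not stated
# in the text are library defaults and are assumptions.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "kNN": KNeighborsClassifier(n_neighbors=1),      # k = 1 for training efficiency
    "LR": LogisticRegression(max_iter=1000),         # logistic (sigmoid) model
    "SVM": SVC(kernel="linear"),                     # linear kernel selected in this study
    "DT": DecisionTreeClassifier(),                  # single decision tree
    "RF": RandomForestClassifier(n_estimators=100),  # 100 trees
    "MLP": MLPClassifier(hidden_layer_sizes=(100,)), # hidden layer size of 100
}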

3.2. Model Training and Evaluation

Supervised model learning requires model input features (e.g., predictors) and target classes. The target classes used in this modeling procedure are rain, freezing rain, and snow (light, moderate, and heavy) as observed at the ASOS stations. While the ASOS records offer three rain classes (light, moderate, and heavy), we consolidate these into one simple class (i.e., rain) in this study; quantitative rainfall information is widely available from radar-derived products (e.g., [8,29]), and there are fewer concerns about the qualitative discretization of rain classes. Light and moderate freezing rain are also combined into one class because there are fewer cases of freezing rain than of the other types during the data period. The input features used for model development are ZH, ZDR, ρHV, the two critical thicknesses between 1000–850 and 850–700 mb (Thick1000–850 and Thick850–700), the two air temperatures at the surface (Ts) and 850 mb (T850), and the relative humidity (RHs) at the ASOS locations. To explore the classification capability of the selected models, we applied various model configurations using the different sets of classes and input features listed in Table 2; a sketch of the class consolidation follows the table. Our focus in selecting input features was to explore snow- and freezing rain-likely conditions using different combinations of the NWP variables (e.g., F1–F4). Since radar observations are required for generating a classification map with high space and time resolutions, we included them in the final feature set (F5).

Table 2. Model target classes and input features for various model configurations.

C1 RA (rain), FR (freezing rain), and SN (snow) Class (C) RA, FR, LS (light snow), MS (moderate snow), and HS C2 (heavy snow)

F1 Thick1000–850 and Ts F2 Thick1000–850, Ts, and RHs Feature (F) F3 Thick1000–850, Thick850–700, Ts, and T850 F4 Thick1000–850, Thick850–700, Ts, T850, and RHs F5 Thick1000–850, Thick850–700, Ts, T850, RHs, ZH, ZDR, and ρHV Atmosphere 2020, 11, x FOR PEER REVIEW 6 of 13
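The mapping from ASOS PI codes (Section 2.1) to the C2 target classes can be made concrete with a small lookup, sketched below; the dictionary structure is illustrative rather than taken from the paper's code.

# A minimal sketch of the class consolidation described above: all rain
# intensities collapse to RA, light/moderate freezing rain to FR, and the
# snow intensities map to LS/MS/HS. The dictionary is illustrative.
ASOS_TO_C2 = {
    "-RA": "RA", "RA": "RA", "+RA": "RA",  # rain consolidated into one class
    "-FZRA": "FR", "FZRA": "FR",           # freezing rain combined
    "-SN": "LS", "SN": "MS", "+SN": "HS",  # snow refined by intensity
}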

We split the entire data period into a separate training period (2012–2013) and validation period (2014–2015), as shown in Figure 2. The sample sizes of each class for the training and validation periods are presented in Table 3. To assess the consistency and robustness of the models' performance, we conducted simultaneous model training and cross-validation using Monte Carlo simulations (MCS) as follows: (1) randomly select 75% of the entire training dataset and use the samples for model training, (2) evaluate model performance using the rest (25%) of the dataset, and (3) repeat the prior two steps 1000 times and calculate a success rate (represented as "score" in the resulting figures) for each repetition. This MCS approach allowed us to examine the distribution/variability of model predictions and compare the classification performance of all six models. We based our decision regarding the model input features on the MCS results achieved by testing the different sets of features (see Table 2) retrieved from the radar and NWP data. We then trained all the models again using the selected feature dataset for the entire training period and captured the models' structure and parameters. We note that the input feature data listed in Table 2 required a scale transformation (e.g., standardization) before they could be used for model learning. This is because some learning algorithms (e.g., SVM and MLP) are often sensitive to inconsistency in the data scales among different input features, which causes a divergence of the algorithm cost function and leads to inaccurate predictions. We applied the final (trained) models and the data transformation scheme saved from the model training step to the independent model evaluation using a new dataset for the validation period presented in Figure 2. For the purposes of visual comparison and assessment, we generated classification maps for two example weather cases using the developed models and the IFC's threshold approach (hereafter referred to as the baseline method).
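The resampling procedure above can be summarized in a short loop. The minimal sketch below assumes a feature matrix X and a label vector y are already available as NumPy arrays; all variable names are illustrative, and RF stands in for any of the six models.

# A minimal sketch of the MCS training/cross-validation loop described
# above, assuming features `X` and ASOS classes `y` are NumPy arrays.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

scores = []
for seed in range(1000):  # 1000 MCS realizations
    X_tr, X_cv, y_tr, y_cv = train_test_split(
        X, y, train_size=0.75, random_state=seed)  # random 75%/25% split
    scaler = StandardScaler().fit(X_tr)  # standardize feature scales
    model = RandomForestClassifier(n_estimators=100)
    model.fit(scaler.transform(X_tr), y_tr)
    scores.append(model.score(scaler.transform(X_cv), y_cv))  # success rate
scores = np.array(scores)  # distribution summarized by the boxplots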

Figure 2. Data periods and precipitation events used for (a) model training and Monte Carlo Simulation (MCS) analysis and (b) independent model evaluation.

Table 3. The sample size of precipitation classes for the training (2012–2013) and validation (2014–2015) periods.

Classes       RA    FR    LS    MS    HS    Total
Training      408   68    956   107   14    1553
Validation    134   51    803   63    15    1066

4. Results

4.1. Relationships between Classes and Features

The analysis begins with an inspection of the relationships between precipitation classes and the model input features retrieved from the radar and NWP data. This analysis examines how well the features (e.g., as explanatory variables) describe snow- or rain-likely conditions in the atmosphere and on the surface. The results from this analysis can provide useful guidance when selecting model input features and help us understand the classification performance derived from a specific set of input features.


In Figure 3, we present the relationships between two explanatory variables arbitrarily selected from the NWP data and the precipitation classes observed in the training period shown in Figure 2. Regarding radar variables (e.g., ZH–ZDR), [1,3] provide representations similar to Figure 3. Although there are many more possible combinations using all the input features, Figure 3 demonstrates a few representative ones. We also note that this two-dimensional illustration has limited usefulness in scrutinizing multi-dimensional characteristics (dependence) between the explanatory and response variables. We hope that the machine learning models will address such characteristics. The critical thicknesses (Thick1000–850 and Thick850–700) and surface temperature (Ts) shown in Figure 3 are typical indicators commonly used to identify snow/rain (e.g., [9,10]). For example, the IFC applies two threshold values (i.e., 3 °C for Ts and 1310 m for Thick1000–850) to distinguish snow-likely regions and provides a real-time radar-derived rainfall map with snow area identification [8]. These two thresholds are presented in Figure 3 as red solid lines, and we use this baseline method as a benchmark when evaluating the machine learning models proposed in this study. As demonstrated in Figure 3, there is much overlap among the winter precipitation classes (cold rain, freezing rain, and snow), while warm rain is easier to discriminate. This implies that winter precipitation classification is quite challenging with such a threshold approach. However, the thresholds for Ts and Thick1000–850 presented in Figure 3 seem to be reasonable discriminators with an acceptable occurrence of classification errors (e.g., about 7% for the presented samples), given the simplicity of the method. Figure 3 also reveals that relative humidity (RHs) is another important factor in deciding rain/snow under certain conditions because cold and mixed (i.e., freezing) rain is not likely to form below 60% RHs at lower surface temperatures.
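The baseline delineation reduces to two inequalities. The sketch below assumes that both conditions must hold jointly for a snow decision, which the text does not state explicitly; the function name is illustrative.

# A minimal sketch of the baseline two-threshold rain/snow delineation
# (Ts = 3 degC, Thick1000-850 = 1310 m); requiring both conditions jointly
# is an assumption, and the function name is illustrative.
def baseline_classify(ts_degc, thick_1000_850_m):
    """Binary rain/snow decision from Ts and the 1000-850 mb thickness."""
    if ts_degc < 3.0 and thick_1000_850_m < 1310.0:  # snow-likely conditions
        return "SN"
    return "RA"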

Figure 3. Precipitation classes characterized by the environmental variables retrieved from the numerical weather prediction (NWP) data. The precipitation classes (rain, freezing rain, and snow) were observed at the ground using the ASOS stations in Iowa. The red solid lines indicate the thresholds for surface temperature and critical thickness (1000–850 mb) in the baseline method.

4.2. Model Training and MCS

To decide the model input features, we tested several different sets of feature combinations (F1–F5) for the simpler precipitation class set (C1). The input feature and class sets are listed in Table 2. Figure 4 shows the MCS results for all the different feature sets and classification models described in Section 3.1. Each boxplot represents the distribution of classification scores (i.e., success rate) obtained from 1000 rounds of random resampling, and the scores were calculated with cross-validation using 25% of the entire training dataset. As shown in Figure 4, all the models perform better than the baseline method (indicated by a red solid line) except for some realizations with the F1 and F2 feature sets. We discovered that the classification score tends to increase as the models include more explanatory variables. The sharp changes observed between the F2 and F3 distributions for all six models imply that the upper atmospheric layer information (i.e., Thick850–700 and T850) is a critical factor that leads to improved classification. The inclusion of radar observations yielded results that differ depending on the model learning schemes; most models result in similar or worse performance (e.g., F4 vs. F5), while RF shows a slight improvement in terms of the median and interquartile range. Because RF seems to be the best model given the median (or the mean) and the variance of the distribution (e.g., the width of the interquartile range) in Figure 4, we decided to select F5 for the model input features. We also expect that the inclusion of radar variables supports frequent map updates (e.g., 4 to 5 min), which is not feasible with the NWP data only.


Figure 4. MCS results for the different combinations (F1–F5) of model input features with respect to the classification models. The MCS analysis was performed for the basic precipitation classes (C1). The boxplot describes the distribution of the classification success rate achieved from 1000 realizations. The baseline method indicated by the red solid line was estimated using the two thresholds for critical thickness and surface temperature shown in Figure 3.

Using the final feature set (F5), we performed the MCS analysis once more for the class set C2 (see Table 2), which further refines the snow classes from C1. As shown in Figure 5, the performance of all models for C2 dropped significantly when compared to that for C1. However, RF remains the best model, with a classification success rate of about 90%. Therefore, we examined the reason for the performance drop observed in Figure 5 and analyzed the misclassification rates of all five classes during the resampling procedure. For example, Figure 6a shows the distribution of RF's misclassification rates with respect to each individual class. The sum of misclassification rates from all five classes reaches unity at each realization, and the distribution presented as a boxplot in Figure 6a indicates a collection of the 1000 simulation results. As presented in Figure 6a, MS was the most challenging class as defined by RF. We also confirmed that LS and MS contained most of the errors in MLP. We further tracked RF's cases of misclassification for MS and present the results in Figure 6b, regarding the content of the errors derived from RF. Figure 6b shows that most MS cases were classified as LS; the discrimination between LS and MS is the most challenging task in the refinement of snow classes.

Figure 5. The comparison of MCS results for basic precipitation classes (C1) and further refinement of snow classes (C2). The feature set F5 was used for this analysis.

While RF outperforms the other models, it is valuable to maintain all six models (because all models with the final feature set F5 performed better than the baseline method), which enables a multimodel-based ensemble or probabilistic approach. We finally trained all six models with the feature set F5 for the target precipitation classes C1 and C2 using the entire training dataset and saved the structure and parameters of the models. We added one more element to the model training stream to learn whether sequential modeling, which performs the first classification for C1 and then refines the snow classes using C1's result, can reduce errors compared to the direct classification for C2. Therefore, separate snow models were also trained for the classification of the three snow classes, as sketched below.
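The minimal sketch below illustrates the two-stage scheme just described; model_c1 and model_snow are assumed to be already fitted classifiers, and the names are illustrative rather than taken from the paper's code.

# A minimal sketch of the sequential ("C1-snow") modeling described above:
# classify C1 first, then refine SN into LS/MS/HS with the separate snow
# model. `model_c1` and `model_snow` are assumed fitted classifiers.
import numpy as np

def classify_sequential(model_c1, model_snow, X):
    """Return refined labels: RA, FR, LS, MS, or HS."""
    labels = model_c1.predict(X).astype(object)  # first stage: RA / FR / SN
    is_snow = labels == "SN"
    if np.any(is_snow):
        labels[is_snow] = model_snow.predict(X[is_snow])  # second stage
    return labels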


Figure 6. The distribution of (a) random forest (RF)'s misclassification rate with respect to each precipitation class. The classes presented in (b) indicate RF's misclassification for MS, which shows the highest misclassification rate in (a).

4.3. Model Evaluation

In this section, we evaluate the performance of all six models trained in the previous section using an independent dataset (see Figure 2) that was not used for model development and training. The evaluation results are illustrated in Figure 7 with respect to the target precipitation classes listed in Table 2. The results for C1 and C2 presented in Figure 7 show tendencies similar to those in Figure 5: (1) the performance drops from C1 to C2, and (2) RF performs the best for both C1 and C2. The label "C1-snow" in Figure 7 denotes the results of sequential modeling, which identifies the C1 classes first and then details the snow classes (light, moderate, and heavy) from SN in C1. As expected, the use of successive models (C1-snow) seems to improve the result slightly compared to the results from direct refinement (C2). The C1-snow approach tends to reduce the number of rain and freezing rain cases (e.g., RA and FR) that are erroneously classified into the refined snow classes. However, the performance difference between C2 and C1-snow does not seem significant, and most of the misclassification for both approaches appears between light and moderate snow, as shown in Figure 6. We also present the result of the baseline threshold method (a red solid line in Figure 7) for the validation dataset, which can be compared with that of C1 because the baseline method provides binary classification (rain/snow) only.

Figure 7. The results of independent model evaluation regarding different class sets. The red solid line indicates the classification success rate of the baseline method.


In Figures 8 and 9, we applied the developed models to two well-known weather cases and generated classification maps for the KDVN radar domain. Figure 8 shows maps of radar reflectivity and precipitation classification (baseline and RF) for an event that occurred in May 2013 during NASA's field campaign known as Iowa Flood Studies (IFloodS). The observed precipitation pattern, the presence of a melting layer, and the precipitation types (e.g., phase) within the KDVN domain are described well in [30]. The baseline classification using two NWP parameters (Thick1000–850 and Ts) indicates that the coarse grid resolution of the NWP model generates an undesirable border between the precipitation classes. This discontinuous border and classification pattern persists until the model is updated, depending on the modeling cycle (e.g., hourly). On the other hand, the RF map using both the radar and NWP data better represents the spatial variability and provides an additional class (i.e., FR). In Figure 9, we present another case of widespread snow across the entire radar domain that occurred in March 2015. We confirmed the occurrence of both snow and freezing rain during this period based on the ASOS records and our personal experience (see Figure 2). While the RF map in Figure 8 shows three classes to be compared with the baseline method, Figure 9 illustrates, as an example, the similarities and differences in the results produced by the different machine learning models (e.g., kNN, RF, and MLP) with five classes. As illustrated in Figure 9, RF seems to underestimate the coverage of freezing rain, whereas MLP defines a fairly extended area for the same class. Unfortunately, there is no reference information on the spatial coverage of freezing rain, and we speculate that the actual coverage pattern might be close to the one shown in the kNN map. We hope to expand our training dataset and improve the models' classification capabilities for freezing rain, as we lacked sufficient cases of freezing rain in the development of our model.


FigureFigure 8.8. ExampleExample maps maps of of ( (aa)) base base scan scan reflectivity, reflectivity,reflectivity, ((bb)) classificationclassificationclassification byby thethe baselinebaseline method, method, and and ( (cc)) classificationclassificationclassification byby RF.RF. TheTheThe circularcircularcircular lineslineslines centeredcenteredcentered ononon thethethe radarradarradar sitesitesite demarcatedemarcatedemarcate everyeveryevery 5050 kmkm withinwithin aa 230230

kmkm radarradar domain.domain. The The radial radial lineslines areare positionedpositioned withwith aa 3030°30°◦ interval.interval.

Figure 9. Example classification maps generated from (a) kNN, (b) RF, and (c) MLP.

5. Conclusions

This study proposes a framework for providing qualitative weather information on winter precipitation based on a data-driven approach using weather radar observations and NWP analysis. The framework enables the easy implementation of multiple classification models, which could lead to a multimodel-based ensemble or a probability of occurrence for a precipitation class of interest. Using the widespread Python "Scikit-Learn" module [22], which supports a variety of machine learning studies, we constructed six supervised classification models: kNN, LR, SVM, DT, RF, and MLP. Although reference information on winter precipitation classes is uncommon, weather class observations [17] from two ASOS sensors allowed us to apply these supervised learning methods.

To determine which model input features properly describe winter precipitation processes, we tested several different feature combinations (Table 2) of the WSR-88D observations (ZH, ZDR, and ρHV) and the RAP analysis (Thick1000–850, Thick850–700, Ts, T850, and RHs) over Iowa in the United States. For each feature set, we performed random resampling, known as MCS, by assigning 75% of the training samples to model learning and the rest to model cross-validation, and repeating these steps 1000 times. The MCS results for basic classification (rain, freezing rain, and snow) demonstrated that model performance tends to improve with more feature information, particularly for the RF model, which showed the best classification performance of all six models. Based on the classification results for the selected feature set (F5), we also confirmed that all the models outperform our current baseline method, which uses two thresholds for Thick1000–850 and Ts. Although class refinement tends to decrease the accuracy of model classification (Figure 5), we found that the most frequent misclassification occurs between light and moderate snow, implying that the models' basic classification is still valuable, with high success rates. The model evaluation using an independent dataset, which was not used for model learning, verified that the performance of the models remained within a range similar to that of the MCS results.
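As a rough illustration of this resampling procedure, the sketch below repeats a 75/25 random split and accumulates cross-validation accuracy for a few of the scikit-learn classifiers named above. The feature matrix, labels, and repeat count are hypothetical placeholders (the study used the Table 2 features and 1000 repetitions), so this is a sketch of the procedure rather than the study's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: rows are radar/NWP feature vectors
# (e.g., ZH, ZDR, rho_HV, Thick_1000-850, Ts); labels are class codes
# (e.g., 0 = rain, 1 = freezing rain, 2 = snow). Shapes are illustrative.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 3, size=500)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100),
}

# MCS-style repeated random resampling: 75% for model learning and 25% for
# cross-validation. The study repeats this 1000 times; 50 keeps the sketch fast.
n_repeats = 50
scores = {name: [] for name in models}
for i in range(n_repeats):
    X_tr, X_cv, y_tr, y_cv = train_test_split(X, y, train_size=0.75, random_state=i)
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name].append(model.score(X_cv, y_cv))

for name, s in scores.items():
    print(f"{name}: mean accuracy {np.mean(s):.3f} +/- {np.std(s):.3f}")
```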
We also made an effort to mitigate the performance decrease observed in class refinement by splitting the modeling procedure into two steps: basic classification followed by snow class refinement (see the sketch at the end of this section). This sequential modeling resulted in better classification than the direct refinement method, but the improvement did not seem significant. We also visually inspected the model-generated classification maps for well-known weather cases and found that they looked much more realistic than those generated by the baseline method in representing spatial variability and eliminating the discontinuous patterns arising from the coarse spatial resolution of the NWP data.

Based on this development, our goal is to implement the proposed models in our real-time operation [7] to improve the current binary classification (i.e., the baseline method). The inclusion of radar observations in the model input stream facilitates frequent map updates (e.g., every 5 min) with more precipitation classes and a better representation of both spatial variability and continuity. The generation of a composite field map for a large area requires a data processing scheme to synchronize the radar data (e.g., ZH, ZDR, and ρHV) or the final classification maps generated at different observation times from multiple radars. This can be readily resolved by calculating storm velocity vectors for each individual radar domain and linearly projecting the individual fields, using those vectors, onto a synchronized space and time domain [31]. We note that the weather class information generated by this study is far from ready for use in quantitative applications (e.g., hydrologic prediction) because the estimation of snow density and snow-water equivalent remains challenging.

The reliability of data-driven models, particularly supervised ones, depends on the accuracy of the reference information. The sensors deployed at ASOS stations, which provide the class observations for supervised learning in this study, assign a certain portion of ice pellets to one of the rain categories (if they are accompanied by freezing rain, the sensor regards them as freezing rain) [17]. Ice crystals are regarded as no precipitation. Unfortunately, the degree and frequency of these uncertainties are unknown, and we hope to further investigate the effect of these factors on our classification using future cases. Taking advantage of our data-driven framework, we will expand the model training dataset as the volume of radar and NWP data grows and routinely retrain our models to make them more robust. While RF seems to perform better than the other models with the current dataset, we hope that future data expansion and extensive model training will somewhat equalize the classification performance of all the models, leading to reliable ensemble production.
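As a minimal sketch of the two-step sequential procedure mentioned above, the code below trains a basic rain/freezing-rain/snow classifier, trains a separate snow-refinement classifier on the snow samples only, and chains them at prediction time. The data, class codes, and names are hypothetical, not the study's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labels: basic classes 0 = rain, 1 = freezing rain, 2 = snow;
# snow subclasses 0 = light, 1 = moderate, 2 = heavy. Data are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))           # radar + NWP feature vectors
y_basic = rng.integers(0, 3, size=600)  # basic class labels
y_snow = rng.integers(0, 3, size=600)   # snow subclass labels (used where y_basic == 2)

# Step 1: basic classification trained on all samples.
basic_model = RandomForestClassifier(n_estimators=100, random_state=0)
basic_model.fit(X, y_basic)

# Step 2: refinement trained only on the snow samples.
snow_mask = y_basic == 2
refine_model = RandomForestClassifier(n_estimators=100, random_state=0)
refine_model.fit(X[snow_mask], y_snow[snow_mask])

def classify_sequential(X_new: np.ndarray) -> list:
    """Apply basic classification, then refine samples classified as snow."""
    basic_names = {0: "rain", 1: "freezing rain"}
    snow_names = {0: "light snow", 1: "moderate snow", 2: "heavy snow"}
    labels = []
    for i, c in enumerate(basic_model.predict(X_new)):
        if c == 2:  # snow: hand off to the refinement model
            labels.append(snow_names[refine_model.predict(X_new[i:i + 1])[0]])
        else:
            labels.append(basic_names[c])
    return labels

print(classify_sequential(rng.normal(size=(3, 5))))
```

The design choice is that the refinement model never sees non-snow samples, so its decision boundaries are tuned only to the light/moderate/heavy distinction; this mirrors the rationale for splitting the procedure rather than training a single five-class model directly.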

Funding: This research received no external funding and was supported by the Iowa Flood Center at the University of Iowa.

Acknowledgments: The author is grateful to Witold Krajewski for his insightful comments on the modeling and evaluation approach presented in the study. The author also thanks Kang-Pyo Lee from the Information Technology Services at the University of Iowa for providing his expertise on the machine learning models.

Conflicts of Interest: The author declares no conflict of interest.

References

1. Ryzhkov, A.V.; Schuur, T.J.; Burgess, D.W.; Heinselman, P.L.; Giangrande, S.E.; Zrnić, D.S. The Joint Polarization Experiment: Polarimetric rainfall measurements and hydrometeor classification. Bull. Am. Meteorol. Soc. 2005, 86, 809–824.
2. Kim, D.; Nelson, B.; Seo, D.-J. Characteristics of reprocessed Hydrometeorological Automated Data System (HADS) hourly precipitation data. Weather Forecast. 2009, 24, 1287–1296.
3. Straka, J.M.; Zrnić, D.S.; Ryzhkov, A.V. Bulk hydrometeor classification and quantification using polarimetric radar data: Synthesis of relations. J. Appl. Meteorol. 2000, 39, 1341–1372.
4. Rasmussen, R.; Baker, B.; Kochendorfer, J.; Meyers, T.; Landolt, S.; Fischer, A.P.; Black, J.; Thériault, J.M.; Kucera, P.; Gochis, D.; et al. How well are we measuring snow: The NOAA/FAA/NCAR winter precipitation test bed. Bull. Am. Meteorol. Soc. 2012, 93, 811–829.
5. Black, A.W.; Mote, T.L. Characteristics of winter-precipitation-related transportation fatalities in the United States. Weather Clim. Soc. 2015, 7, 133–145.
6. Park, H.; Ryzhkov, A.V.; Zrnić, D.S.; Kim, K.E. The hydrometeor classification algorithm for the polarimetric WSR-88D: Description and application to an MCS. Weather Forecast. 2009, 24, 730–748.
7. Krajewski, W.F.; Ceynar, D.; Demir, I.; Goska, R.; Kruger, A.; Langel, C.; Mantilla, R.; Niemeier, J.; Quintero, F.; Seo, B.-C.; et al. Real-time flood forecasting and information system for the State of Iowa. Bull. Am. Meteorol. Soc. 2017, 98, 539–554.
8. Seo, B.-C.; Krajewski, W.F. Statewide real-time quantitative precipitation estimation using weather radar and NWP model analysis: Algorithm description and product evaluation. Environ. Modell. Softw. 2020, in press.
9. Keeter, K.K.; Cline, J.W. The objective use of observed and forecast thickness values to predict precipitation type in North Carolina. Weather Forecast. 1991, 6, 456–469.
10. Heppner, P.O.G. Snow versus rain: Looking beyond the "magic" numbers. Weather Forecast. 1992, 7, 683–691.
11. Pinto, J.O.; Grim, J.A.; Steiner, M. Assessment of the High-Resolution Rapid Refresh model's ability to predict mesoscale convective systems using object-based evaluation. Weather Forecast. 2015, 30, 892–913.
12. Schuur, T.J.; Park, H.S.; Ryzhkov, A.V.; Reeves, H.D. Classification of precipitation types during transitional winter weather using the RUC model and polarimetric radar retrievals. J. Appl. Meteorol. Climatol. 2012, 51, 763–779.
13. Thompson, E.; Rutledge, S.; Dolan, B.; Chandrasekar, V. A dual-polarization radar hydrometeor classification algorithm for winter precipitation. J. Atmos. Ocean. Technol. 2014, 31, 1457–1481.
14. Thompson, G.; Rasmussen, R.M.; Manning, K. Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part I: Description and sensitivity analysis. Mon. Weather Rev. 2004, 132, 519–542.
15. Zhang, G.; Luchs, S.; Ryzhkov, A.; Xue, M.; Ryzhkova, L.; Cao, Q. Winter precipitation microphysics characterized by polarimetric radar and video disdrometer observations in central Oklahoma. J. Appl. Meteorol. Climatol. 2011, 50, 1558–1570.
16. Clark, P. Automated surface observations, new challenges-new tools. In Proceedings of the 6th Conference on Aviation Weather Systems, Dallas, TX, USA, 15–20 January 1995; pp. 445–450.
17. National Oceanic and Atmospheric Administration; Department of Defense; Federal Aviation Administration; United States Navy. Automated Surface Observing System (ASOS) User's Guide; ASOS Program, 1998. Available online: https://www.weather.gov/media/asos/aum-toc.pdf (accessed on 30 May 2020).
18. Kelleher, K.E.; Droegemeier, K.K.; Levit, J.J.; Sinclair, C.; Jahn, D.E.; Hill, S.D.; Mueller, L.; Qualley, G.; Crum, T.D.; Smith, S.D.; et al. A real-time delivery system for NEXRAD Level II data via the Internet. Bull. Am. Meteorol. Soc. 2007, 88, 1045–1057.
19. Ansari, S.; Greco, S.D.; Kearns, E.; Brown, O.; Wilkins, S.; Ramamurthy, M.; Weber, J.; May, R.; Sundwall, J.; Layton, J.; et al. Unlocking the potential of NEXRAD data through NOAA's Big Data Partnership. Bull. Am. Meteorol. Soc. 2017, 99, 189–204.
20. Seo, B.-C.; Keem, M.; Hammond, R.; Demir, I.; Krajewski, W.F. A pilot infrastructure for searching rainfall metadata and generating rainfall product using the big data of NEXRAD. Environ. Modell. Softw. 2019, 117, 69–75.
21. Benjamin, S.G.; Weygandt, S.S.; Brown, J.M.; Hu, M.; Alexander, C.R.; Smirnova, T.G.; Olson, J.B.; James, E.P.; Dowell, D.C.; Grell, G.A.; et al. A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Weather Rev. 2016, 144, 1669–1694.
22. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
23. Peng, C.Y.J.; Lee, K.L.; Ingersoll, G.M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 2002, 96, 3–14.
24. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
25. Raileanu, L.E.; Stoffel, K. Theoretical comparison between the Gini index and information gain criteria. Ann. Math. Artif. Intell. 2004, 41, 77–93.
26. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
27. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
28. Gardner, M.W.; Dorling, S.R. Artificial neural networks (the multilayer perceptron): A review of applications in the atmospheric sciences. Atmos. Environ. 1998, 32, 2627–2636.
29. Zhang, J.; Howard, K.; Langston, C.; Kaney, B.; Qi, Y.; Tang, L.; Grams, H.; Wang, Y.; Cocks, S.; Martinaitis, S.; et al. Multi-Radar Multi-Sensor (MRMS) quantitative precipitation estimation: Initial operating capabilities. Bull. Am. Meteorol. Soc. 2016, 97, 621–638.
30. Seo, B.-C.; Dolan, B.; Krajewski, W.F.; Rutledge, S.; Petersen, W. Comparison of single- and dual-polarization-based rainfall estimates using NEXRAD data for the NASA Iowa Flood Studies project. J. Hydrometeorol. 2015, 16, 1658–1675.
31. Seo, B.-C.; Krajewski, W.F. Correcting temporal sampling error in radar-rainfall: Effect of advection parameters and rain storm characteristics on the correction accuracy. J. Hydrol. 2015, 531, 272–283.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).