A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity
Atmosphere (https://www.mdpi.com/journal/atmosphere)
Article

Xia Sun 1, Lian Xie 1,*, Shahil Umeshkumar Shah 2 and Xipeng Shen 2

1 Department of Marine, Earth and Atmospheric Sciences, North Carolina State University, Box 8208, Raleigh, NC 27695-8208, USA; [email protected]
2 Department of Computer Sciences, North Carolina State University, Raleigh, NC 27695-8206, USA; [email protected] (S.U.S.); [email protected] (X.S.)
* Correspondence: [email protected]

Abstract: In this study, nine different statistical models are constructed using different combinations of predictors, including models with and without projected predictors. Multiple machine learning (ML) techniques are employed to optimize the ensemble predictions by selecting the top-performing ensemble members and determining the weights for each ensemble member. The ML-Optimized Ensemble (ML-OE) forecasts are evaluated against the Simple-Averaging Ensemble (SAE) forecasts. The results show that for the response variables that are predicted with significant skill by individual ensemble members and SAE, such as Atlantic tropical cyclone counts, the performance of SAE is comparable to the best ML-OE results. However, for response variables that are poorly modeled by individual ensemble members, such as Atlantic and Gulf of Mexico major hurricane counts, ML-OE predictions often show higher skill scores than individual model forecasts and the SAE predictions. Neither SAE nor ML-OE, however, was able to improve the forecasts of the response variables when all models show consistent bias. The results also show that increasing the number of ensemble members does not necessarily lead to better ensemble forecasts. The best ensemble forecasts are from the optimally combined subset of models.

Keywords: hurricane prediction; machine learning; ensemble model

Citation: Sun, X.; Xie, L.; Shah, S.U.; Shen, X. A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity. Atmosphere 2021, 12, 522. https://doi.org/10.3390/atmos12040522

Academic Editors: Valentine Anantharaj, Forrest M. Hoffman, Udaysankar S. Nair, Samantha Vanessa Adams and Jimy Dudhia

Received: 28 February 2021
Accepted: 17 April 2021
Published: 20 April 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Tropical cyclones (TCs), known as hurricanes in the Atlantic Ocean and Eastern Pacific, are extreme weather systems that have far-reaching adverse impacts on human society [1,2] and are the costliest natural disasters in the United States [3]. Governmental agencies and nongovernmental organizations dealing with TC disaster preparedness planning and post-disaster humanitarian relief efforts, as well as industries dealing with the potential impacts from TCs, rely on skillful seasonal predictions of TC activity for their preseason decisions. Hurricane experts have issued preseason TC predictions since 1984 [4], and the methodologies currently used to produce preseason TC forecasts include multivariate regression [5–8], dynamic models [9], and statistical-dynamical approaches [10–13]. The reliability and utility of such long-range forecasts have met some skepticism from the public [14]. Findings from several studies also showed that the skill of preseason forecasts issued by various groups was marginal [15–17]. Thus, there is a clear gap between the current skill of preseason TC forecasts and the public demand for such information. Only when this technological gap is bridged can the potential economic value of seasonal hurricane prediction be fully realized [18].

Ensemble techniques have been widely used in weather and climate prediction to reduce forecast uncertainty [19,20]. Applications of artificial intelligence in weather and climate prediction have emerged in recent years [21,22]. The combination of ensemble forecasting approaches with machine learning (ML) techniques has also been explored. Rasp and Lerch [23] and Krasnopolsky and Lin [24] applied neural networks (NN) to the postprocessing of ensemble weather forecasts and found that the NN technique can improve ensemble forecasts over traditional ensemble approaches. With regard to seasonal hurricane prediction, Jagger and Elsner [25] demonstrated the benefit of using multimodel consensus. Richmana et al. [26] showed that ML techniques can improve seasonal hurricane prediction over traditional regression models. However, combining ML and ensemble forecasting has yet to gain wide adoption in the preseason prediction of hurricanes. In this study, we present a novel approach to seasonal TC prediction based on the optimization of multimodel ensemble forecasts using machine learning techniques. The goal is to improve preseason prediction of Atlantic hurricane activity by identifying response variables and scenarios that are likely to benefit from ML-based optimization of ensemble forecasting. The ensemble members used in the optimization include nine statistical models and a suite of models based on machine learning.

The rest of the article is organized as follows. In Section 2, data and methods used in this study are described, followed by a detailed presentation of results in Section 3. Section 4 discusses the results. Conclusions are summarized in Section 5.

2. Data and Methods

2.1. Data

The number of tropical cyclones (TCs) in past seasons was obtained by combining the historical database known as HURDAT (HURricane DATabase) [27] and the archived best track seasonal maps compiled by the National Hurricane Center (NHC). TC counts were manually determined by region and then further categorized by their peak strength within each region based on the Saffir–Simpson hurricane wind scale. Forecasts are made for three categories, TC, HU and MH: TC includes tropical storms, hurricanes (categories 1–2) and major hurricanes (category 3 and higher); HU includes hurricanes and major hurricanes; MH includes major hurricanes only. The three regions are the Gulf of Mexico, the Caribbean Sea, and the whole North Atlantic Basin (Figure 1). The resulting nine response variables (three regions by three categories) are listed in Table 1.

Figure 1. Three regions for forecast: Gulf of Mexico (bounded by the Gulf coast of the United States, from the southern tip of Florida to Texas; on the southwest and south by Mexico; and on the southeast by Cuba), Caribbean Sea (bordered by the Yucatan Peninsula and Central America on the west and southwest; on the south by Venezuela; and the West Indies); the whole Atlantic Basin is composed of the Atlantic Ocean, the Gulf of Mexico, and the Caribbean Sea.

Figure 4. Percentage of times a variable is selected across all 39 windows (unit: %) in the model F with Lasso and SWCV per response: (a) ATTC, (b) ATHU, (c) ATMH, (d) CATC, (e) CAHU, (f) CAMH, (g) GUTC, (h) GUHU, and (i) GUMH.
NINO variables are highlighted in bold lines.

Table 1. Definition of response variables.

Response Variable   Region                 Definition
ATTC                North Atlantic Basin   Atlantic Tropical Cyclones: counts of tropical storms and hurricanes in North Atlantic
ATHU                North Atlantic Basin   Atlantic Hurricanes: counts of hurricanes in North Atlantic
ATMH                North Atlantic Basin   Atlantic Major Hurricanes: counts of major hurricanes in North Atlantic
CATC                Caribbean Sea          Caribbean Sea Tropical Cyclones: counts of tropical storms and hurricanes in Caribbean Sea
CAHU                Caribbean Sea          Caribbean Sea Hurricanes: counts of hurricanes in Caribbean Sea
CAMH                Caribbean Sea          Caribbean Sea Major Hurricanes: counts of major hurricanes in Caribbean Sea
GUTC                Gulf of Mexico         Gulf of Mexico Tropical Cyclones: counts of tropical storms and hurricanes in Gulf of Mexico
GUHU                Gulf of Mexico         Gulf of Mexico Hurricanes: counts of hurricanes in Gulf of Mexico
GUMH                Gulf of Mexico         Gulf of Mexico Major Hurricanes: counts of major hurricanes in Gulf of Mexico

A variety of climate-related global and regional monthly predictors are taken into account for the forecast of the forthcoming hurricane season. Most of these candidate predictors come from the NOAA Earth System Research Laboratory Division (https://psl.noaa.gov/data/climateindices/list/, accessed on 18 April 2021), including Atlantic and Pacific sea surface temperature (SST) related climate indices, El Niño Southern Oscillation (ENSO) related indices, and atmospheric and teleconnection indices. In addition, measures taken over the main development region (MDR, 10°–20° N, 80°–20° W) are incorporated. All of the MDR indices are derived from the NCEP–NOAA Reanalysis dataset at https://psl.noaa.