
INVESTIGATING PROBABILISTIC FORECASTING OF TROPICAL CYCLOGENESIS OVER THE NORTH ATLANTIC USING LINEAR AND NON-LINEAR CLASSIFIERS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Christopher C. Hennon, M.S.

* * * * *

The Ohio State University 2003

Dissertation Committee:

Dr. Jay Hobgood, Adviser

Dr. John Rayner

Dr. Hugh Willoughby

Approved by

________________________
Adviser
Atmospheric Sciences Graduate Program

ABSTRACT

Current numerical weather prediction models experience great difficulty in forecasting tropical cyclogenesis, primarily because of limitations in cloud parameterizations and observations. Forecasters have also struggled with the problem, since they rely on the numerical models as an objective source of information. This research was performed with the aim of filling the void of objective guidance for tropical cyclogenesis. A new dataset of cloud clusters was created through the examination of infrared (IR) satellite imagery over the tropical Atlantic during the 1998-2001 hurricane seasons. Eight large-scale predictors of tropical cyclogenesis were then calculated from the NCEP-NCAR Reanalysis dataset for each 6-hour interval of the cloud cluster life cycle, extending back to 48 hours prior to genesis. Independent classifications were then performed on the entire dataset using both discriminant analysis (DA) and an artificial neural network (NN). The classifiers are fundamentally different from each other in that DA performs classifications based solely on linear relationships in the predictors; the NN is potentially a more powerful classifier because it can find non-linear relationships in the data. The performance of each classifier was investigated through statistical scores and a series of case studies from the 1998-2001 seasons.

Tropical cyclogenesis is a rare event: climatologically, only about 15% of all cloud clusters develop into tropical depressions over the Atlantic Basin. The new cloud cluster database reflects that. 432 cloud clusters, of which 62 developed into tropical depressions, were tracked during the four seasons. Independent DA classifications show forecast skill over climatology. For the “prime” development season of August – October, the DA correctly forecast a higher percentage of clusters than climatology for all forecast periods. The most important predictors are latitude and the vertical shear structure.

A comparison of DA forecasts with NN forecasts on the same dataset produced mixed results. The NN generally performed better with non-developing cloud clusters; however, there are indications that the NN suffers from overfitting to a greater degree than DA. An investigation of six case studies shows that both classifiers performed well in the majority of the cases. The DA appears to generalize much better than the NN in most cases. Danielle (1998) and a non-developing cluster (ND-6, 2000) brought to light several possible deficiencies in the statistical model. The large-scale predictors over-forecast genesis in a favorable shear environment, even if the thermodynamic environment is marginal.


Also, the lack of any information on the convective structure of the cloud cluster will decrease forecast accuracy in some cases. Danielle (1998) developed explosively despite an unfavorable large-scale shear environment, perhaps due to mesoscale interactions that are not resolved in this model. Results suggest that this model has sufficient potential to be implemented as an objective forecast tool. Each predictor can easily be calculated from an analysis field that is routinely available to forecasters. The inclusion of mesoscale predictors, especially satellite-derived temperature, moisture, and wind data, is thought to be an important next step for the improvement of forecasts, especially since the current literature suggests that the important physical interactions for tropical cyclogenesis occur at the smaller scales.


Dedicated to Paula and our baby


ACKNOWLEDGMENTS

This research is the product of contributions from many people. I would like to thank my advisor, Dr. Jay Hobgood, for the intellectual freedom to pursue this project and the sound advice that kept it going in the right direction. Dr. Hugh Willoughby graciously donated his time and energy to serve on my examining and defense committees. He also provided valuable guidance in the formulation of the project three years ago and kept me scientifically thorough when I got lazy. I am also grateful for Dr. John Rayner’s insightful advice regarding this work and his early guidance in my student career at Ohio State.

Comments from National Hurricane Center (NHC) forecasters James Franklin, Stacy Stewart, Miles Lawrence, Richard Pasch, and Jack Beven were an invaluable source of information regarding how tropical cyclogenesis is handled operationally in the Atlantic Basin. Commander Marge Nordman at the Joint Typhoon Warning Center (JTWC) provided me with similar information for the Central and Western Pacific Basins. I would also like to thank Dr. Jeff Halverson at NASA Goddard Space Flight Center for making my first professional manuscript submission a better one.

Technical support was generously provided from a number of sources. Dr. Caren Marzban at the University of Oklahoma was a willing and exhaustive source of information, constructive criticism, and expert advice on the implementation and use of my neural network. Without his support this document would not have been possible in its present form. Dr. Michael Tiefelsdorf, Ohio State Department of Geography, helped me to sort out many statistical issues, especially in the early stages of the project. Department of Geography technicians Jim DeGrand and Jens Blegvad generously allowed me to perform this research within their quiet, secured office and were quick to address any hardware/software issues that arose. Irene Casas provided valuable guidance early on in developing the implementation of the neural network and the automated resampling of the data.

All data used in this project were obtained at no cost. The Space Science and Engineering Center (SSEC) at the University of Wisconsin provided all of the GOES-8 infrared imagery for the western North Atlantic. Coverage in the far eastern North Atlantic was provided from Meteosat-7 imagery made available by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). Dr. Greg Holland provided the computer software that allowed the calculation of the Maximum Potential Intensity predictor.

Finally, I would like to acknowledge those persons who provided personal support at all stages of this work. My wife Paula has been an unending source of love, support, friendship, and intellectual stimulation. I am grateful that we shared this journey together, and I hope I can provide even half of what she has given me as she completes her dissertation. I am grateful to have parents who have always supported me unconditionally. Thank you, Mom and Dad. I would also like to acknowledge Karen LeBel, a special person I have known since my graduate work at Purdue University. She is a continuing source of friendship, laughter, and motivation. No, we are not going to name the baby “Z-O”. Finally, I’d like to thank Jim DeGrand, Jens Blegvad, and Joe Szymczak for endless hours of distraction and entertainment playing Age of Empires and Age of Mythology.

Ready to play?


VITA

October 3, 1972 ……………………………….Born – Cleveland, Ohio, USA

1994 …………………………………………….B.A. with Honors, Aeronautics-Mathematics, Miami University (Ohio)

1996 …………………………………………….M.S., Atmospheric Science, Purdue University

1996 – 1998 ……………………………….……Meteorologist/Software Engineer, TASC, Inc., Reading, Massachusetts

1998 – 2000, 2003 …………………………….Graduate Teaching Assistant, The Ohio State University

2001 – 2002 …………………………………….Graduate Research Assistant, Byrd Polar Research Center, The Ohio State University

FIELDS OF STUDY

Major Field: Atmospheric Sciences


TABLE OF CONTENTS

Abstract………………………………………………………………………………………………………ii

Dedication…………………………………………………………………………………………………..iv

Acknowledgements…………………………………………………………………………………………v

Vita………………………………………………………………………………………………………….vii

List of Tables……………………………………………………………………………………………….xi

List of Figures……………………………………………………………………………………………...xii

List of Acronyms and Abbreviations……………………..……………………………………………..xvi

Chapters:

1. Introduction………………………………………………………………………………………………1

1.1 Motivation………………………………………………………………………………….…1
1.2 Definition and Philosophy of Tropical Cyclogenesis……………………………………..4
1.3 Justification for the Use of Large-Scale Data…………………………………………….5
1.4 Summary……………………………………………………………………………………..6

2. Atlantic Tropical Cyclogenesis – Literature Review………………………………………………...8

2.1 Tropical Cyclogenesis Theory……………………………………………………………...8
2.1.1 CISK Theory……………………………………………………………………...8
2.1.2 WISHE………………………………………………………………………….…9
2.1.3 Mesoscale Convective Vortices………………………………………………10
2.1.4 Potential Vorticity……………………………………………………………….12
2.2 North Atlantic Basin Tropical Cyclogenesis…………………………………………..…12
2.2.1 Easterly Waves Over the Atlantic Basin…………………………………..…13
2.2.2 Favored Genesis Regions and Variability…………………………………...14
2.3 Forecasting Tropical Cyclogenesis Over the Atlantic Basin…………………………..15
2.4 Summary…………………………………………………………………………………....16

3. Datasets………………………………………………………………………………………………..17

3.1 National Hurricane Center “Best Track” Data File……………………………………...17
3.2 Reynolds Sea Surface Temperature……………………………………………………..18
3.3 Satellite Imagery……………………………………………………………………………19


3.3.1 GOES-8………………………………………………………………………….19
3.3.2 Meteosat-7………………………………………………………………………20
3.4 Satellite Image Compositing Technique………………………………………………....20
3.5 Identification of Cloud Cluster Candidates……………………………………………....21
3.6 Cloud Cluster Data Set Characteristics……………………………………………….....23
3.7 NCEP-NCAR Reanalysis…………………………………………………………………..25
3.7.1 Data Sources and Analysis Procedure……………………………………....25
3.7.2 Output Formats and Variable Descriptions…………………………………..26
3.7.3 Known Error Sources………………………………………………………..…28
3.7.4 Detection of Incipient Tropical Cyclones in the Reanalysis…………….….34
3.8 Summary……………………………………………………………………………………37

4. Predictors of Tropical Cyclogenesis……………………………………………………………….38

4.1 Averaging Technique……………………………………………………………………...39
4.2 Latitude……………………………………………………………………………………...39
4.3 Daily Genesis Potential……………………………………………………………………39
4.4 Maximum Potential Intensity………………………………………………………………43
4.5 850 mb Moisture Divergence……………………………………………………………..46
4.6 Columnar Precipitable Water……………………………………………………………..47
4.7 24-Hour Sea Level Pressure Tendency…………………………………………………48
4.8 6-Hour Surface Relative Vorticity Tendency………………………………………….…48
4.9 6-Hour 700 mb Relative Vorticity Tendency………………………………………….…49
4.10 Distributions of Predictors…………………………………………………………….....49
4.11 Summary…………………………………………………………………………………..55

5. Discriminant Analysis Classification...... 56

5.1 Statistical Assumptions……………………………………………………………………57
5.1.1 Normality………………………………………………………………………...57
5.1.2 Homogeneity of the Variance-Covariance Matrices………………………..57
5.1.3 Linearity………………………………………………………………………….58
5.2 Procedure…………………………………………………………………………………...59
5.3 Results of Preliminary Classifications…………………………………………………....59
5.3.1 Significance of Predictors……………………………………………………...59
5.3.2 Individual Case Classification…………………………………………………61
5.3.3 Statistical Measures of Skill……………………………………………………61
5.3.4 Classification by Month of Season………………………………………...…64
5.4 Discussion………………………………………………………………………………..…66
5.5 Summary…………………………………………………………………………………....67

6. Neural Network Background, Architecture, and Training……………………………………...... 68

6.1 Components of a Neural Network………………………………………………………..68
6.2 Applications of Neural Networks in the Atmospheric Sciences…………………….....70
6.3 Training of the Neural Network…………………………………………………………...70
6.4 Overfitting in Neural Networks…………………………………………………………....72
6.5 Neural Network Description and Architecture…………………………………………..74
6.5.1 Network Inputs………………………………………………………………….75
6.5.2 Normalization of the Predictor Dataset……………………………………....77
6.5.3 Initialization of Network Weights……………………………………………...77
6.5.4 Network Training – The BFGS Quasi-Newton Optimization Method……..78
6.5.5 Calculation of Training Error – Activation Functions and Cross Entropy...79


6.5.6 Adaptive Regularization………………………………………………………..80
6.5.7 Convergence Check……………………………………………………………81
6.5.8 Network Outputs………………………………………………………………..81
6.6 Finding the Optimal Number of Hidden Nodes (Hoptimal)…………………………….....81
6.7 Summary…………………………………………………………………………………....83

7. Evaluation of Neural Network Forecast Performance Against Discriminant Analysis………....85

7.1 The 2 x 2 Contingency Table……………………………………………………………..85
7.2 Definitions of Skill Scores………………………………………………………………....86
7.3 Finding Optimal Decision Thresholds……………………………………………………87
7.4 Neural Network and Discriminant Analysis Heidke Skill Score Comparison………...97
7.5 POD and FAR Comparison………….……………………………………………………98
7.6 Receiver Operating Characteristic Plots………………………………………….....…100
7.7 Summary and Discussion……………………………………………………………..…106

8. Case Studies...... 107

8.1 Methodology……………………………………………………………………………….107
8.2 Keith (2000)………………………………………………………………………………..108
8.3 Non-Developing Case 43 (ND-43 1999)………………………………………………..114
8.4 Non-Developing Case 6 (ND-6 2000)…………………………………………………..117
8.5 Danielle (1998)…………………………………………………………………………….120
8.6 Mitch (1998)………………………………………………………………………………..122
8.7 Non-Developing Case 34 (ND-34 2001)………………………………………………..125
8.8 Summary…………………………………………………………………………………...131

9. Summary, Conclusions, and Future Work…………………………………………………...... 132

9.1 Summary…………………………………………………………………………………...132
9.2 Future Work………………………………………………………………………………..134
9.2.1 Improvement of the Model……………………………………………………134
9.2.2 Applications of the Model……………………………………………………..136
9.3 Conclusion………………………………………………………………………………….137

List of References…………………………………………………………………………………...... 139


LIST OF TABLES

Table Page

1 Summary of documented cloud clusters for the 1998-2001 Atlantic hurricane seasons……………………………………………………………………………….23

2 Observational data incorporated into the NCEP-NCAR Reanalysis……………………….26

3 Summary statistics of filtered vs. unfiltered MPI values (averaged at a radius of two degrees from the cloud cluster center) from the NCEP-NCAR Reanalysis for the 1998 Atlantic hurricane season. ‘ND’ signifies non-developing cloud clusters…………………………………………………………………30

4 Mean differences for all forecast periods between the corrected and uncorrected reanalysis data. ‘ND’ signifies a non-developing case. Predictors are two-degree averages from the cluster center, except MPI (averaged at six degrees)………..33

5 The eight large-scale predictors of TCG used for this study………………………………..38

6 Average terms of the SGP for all Atlantic cases in the McBride survey…………………..41

7 Atlantic DGP based on the mean relative vorticity differences between 900 mb and 200 mb in units of 10⁻⁵ s⁻¹. From McBride and Zehr (1981)……………………………43

8 Results of the K-S test for normality applied to the predictor distributions………….…...50

9 Values of the tests for variance-covariance homogeneity for each forecast hour………..58

10 Discriminant function loadings for each forecast hour. Rank is given in parentheses for each time period……………………………………………………………...60

11 Development rate by probabilistic prediction and forecast hour (1998-2000)…………….62

12 Forecast decision boundaries by classifier and forecast hour. The NN boundaries were derived from a series of twenty randomly sampled training and validation sets. The DA boundaries were derived from a cross validation technique with many independent trials…………………………………………..97


LIST OF FIGURES

Figure Page

1 The Tropical Cyclone Forecast Alert (TCFA) checklist employed by the Joint Typhoon Warning Center (JTWC) for Northern Hemisphere tropical cyclogenesis……………………………………………………………………………..3

2 Spatial coverage of GOES-8 and Meteosat-7 for this survey………………………………21

3 Spatial distribution of the cloud cluster dataset (1998-2001). Non- developing clusters (top panel) are shown at every 6-hour fix. The location of developing clusters at genesis time is shown in the lower panel……………………….24

4 Schematic drawing of the NCEP-NCAR Reanalysis system……………………………….27

5 100 mb global monthly averaged temperature difference (ºC) of reanalysis data without TOVS land filters (CDAS) and data with the filters (R2)……………………..28

6 Anomaly (unfiltered – filtered) of maximum potential intensity. Shaded regions represent negative values…………………………………………………………….29

7 Precipitable water (mm) and Daily Genesis Potential (× 10⁻⁵ s⁻¹) differences between the uncorrected reanalysis and the corrected data for 0600 UTC 8 September 1998. Negative values are shaded……………………………………..31

8 Percent of cases correctly classified using discriminant analysis for uncorrected (dark bars) and corrected (white bars) data by forecast hour (hours before genesis). The black line is the difference between the datasets and can be considered the degradation magnitude for the corrected data. The decision boundary is assumed to be P = 0.5……………………………………………………………32

9 Range/azimuth plot of NNR 850 mb relative vorticity maxima (black dots) and NHC fix (center of plot) for developing cloud clusters during the 1998 – 2000 Atlantic seasons (47 cases). Range values are in units of km and azimuth values are compass degrees. All cases are at genesis time…………………….35

10 Scatter plot of all tropical cyclogenesis cases (1998 – 2000) on an 850 mb relative vorticity vs. distance difference (NHC vs. NNR) scale. The linear best-fit line is shown. R = -0.5…………………………………………………………………36

11 Schematic of the averaging technique used to arrive at the final predictor values. Any grid point that fell within a 2º swath around the cloud cluster center was averaged together. The black dots represent grid points from the NNR…………………………………………………………………………………….40


12 Flow chart describing the solution method for Holland MPI (Holland 1997)………………44

13 Frequency distributions of a) COR and b) DGP predictors from all cases in the 1998-2001 database……………………………………………………………………..51

14 As in Figure 13, except for a) MDIV and b) MPI……………………………………………..52

15 As in Figure 13, except for a) PWAT and b) PTEND………………………………………..53

16 As in Figure 13, except for a) VTENDSFC and b) VTEND700…………………………….54

17 Threat (solid) and Brier (dotted) scores by forecast hour for the 1998-2001 data. A decision boundary of P = 0.7 was used to calculate these scores………………63

18 Frequency of cloud cluster and tropical depression development by month……………..65

19 Percentages of cloud clusters correctly classified by month. The dark solid line without points is the climatology boundary shown in Figure 18…….…………………66

20 Schematic diagram of a common neural network architecture. Each layer is connected by a series of “weights” (solid lines), which are optimized by the training process……………………………………………………………...69

21 Flow chart showing the steps taken during the training process of a NN…………………71

22 Hypothetical data illustrating generalization skill. The model plotted in (b) has overfit the data (a), while the output in (c) shows much better generalization…………...73

23 Schematic diagram of the components and flow of the binary neural network classifier. The numbers correspond to the sections of this chapter that address that component…………………………………………………………………..76

24 Hypothetical error surface with possible initial weights (W) specified by filled circles. The arrows represent the direction an un-optimized network will take the initial weights – towards the closest minimum error surface…………………78

25 A schematic tv-diagram. Though a trial in H = 2 has a lower training and validation error than H = 0, it does not have the lowest validation error for all of the H = 2 cases. Hence, the NN has found a local minimum…….………………..82

26 Frequency histogram of Hoptimal for the 160 bootstrap trials. Hoptimal = 6 was chosen for all forecast hours except 12 and 18 (Hoptimal = 7 and 5 respectively)……………………………………………………………………………………...84

27 The 2 x 2 contingency table……………………………………………………………………86

28 HSS for the 6-hour forecast period over a range of decision thresholds from 0 to 1. The NN a) training sets and b) validation sets were averaged across ten trials. The DA curve was derived from the cross-validation procedure. The threshold increment was 0.001……………………………………………….…………..89

29 As in Figure 28, except for the 12-hour forecast classifiers…………………….…………..90


30 As in Figure 28, except for the 18-hour forecast classifiers………………………….……..91

31 As in Figure 28, except for the 24-hour forecast classifiers………………………….……..92

32 As in Figure 28, except for the 30-hour forecast classifiers………………………….……..93

33 As in Figure 28, except for the 36-hour forecast classifiers………………………….……..94

34 As in Figure 28, except for the 42-hour forecast classifiers………………………………...95

35 As in Figure 28, except for the 48-hour forecast classifiers………………………………...96

36 a) POD and b) FAR by forecast hour for the NN (light) and DA (dark) classifiers. The 90% confidence interval error bars are shown on the NN skill scores………………………………………………………………………………………..99

37 2 x 2 contingency tables for each forecast hour. The neural network runs (left) are a 10-trial average. The discriminant analysis runs (right) were derived from a cross-validation (leave one out) procedure. The actual events (0,1) are in the first column. The predicted events are in the first row………………………...101

38 ROC diagrams for the a) 6-hour and b) 12-hour forecast periods…..…………………...102

39 As in Figure 38, except for the a) 18-hour and b) 24-hour forecast periods…...………..103

40 As in Figure 38, except for the a) 30-hour and b) 36-hour forecast periods………...…..104

41 As in Figure 38, except for the a) 42-hour and b) 48-hour forecast periods……...……..105

42 Track of the cloud cluster that would eventually develop into Tropical Depression and then Tropical Storm Keith. Note the northward displacement, and then disappearance, of the cluster in the mid-Atlantic followed by a re-emergence in the Caribbean Sea………………………………………………………109

43 Infrared Enhanced (left) GOES satellite image of the future Keith thirty hours before genesis and visible (right) GOES image of Keith around the time of genesis. Images courtesy of the Naval Research Laboratory………………110

44 Series of 6-48 hour forecasts for each point along the track of cloud cluster Keith. Genesis time is denoted by the vertical dark line at 1800 UTC 28 September……………………………………………………………………..111

45 Time series of a) 6-hour forecasts and b) 12-hour forecasts for cloud cluster Keith. Genesis is denoted by the vertical dark line at right……………………………….112

46 As in Figure 45, except for a) 42-hour and b) 48-hour forecasts…………………………113

47 Irregular track of cloud cluster ND-43 (1999). The date and UTC time of each location is given next to the position fix……………………………………………….114

48 Meteosat-7 IR image of cloud cluster ND-43 at 1200 UTC 1 September. The cloud cluster had been tracked for approximately 24 hours prior to this image and would dissipate 36 hours later………………………………….…………..115


49 Series of 6-48 hour forecasts for cloud cluster ND-43 for a) DA and b) NN classifiers……………………………………………………………………………………….116

50 Track of cloud cluster ND-6 (2000)………………………………………………….……….117

51 Meteosat-7 IR image of cloud cluster ND-6 (2000). The white arrow points towards the convective maximum of the system. Convection would gradually decrease over the next few days as an unfavorable shear situation established itself over the mid-Atlantic……………………………………………….….…..118

52 As in Figure 49, except for cloud cluster ND-6 (2000)……………………………..………119

53 Pre-depression track of the cloud cluster that eventually developed into Hurricane Danielle……………………………………………………………………………..120

54 Pre-Danielle cloud cluster approximately 24 hours prior to genesis (left) and at genesis (right)……………………………………………………………………………….121

55 Tropical Storm Danielle approximately fourteen hours after genesis…………………….122

56 As in Figure 49, except for the pre-Danielle cloud cluster…………………………………123

57 Pre-tropical depression track of the cloud cluster that was eventually named Mitch……………………………………………………………………………………124

58 GOES-East IR image of the pre-Mitch cloud cluster at approximately genesis time. Image courtesy of the Naval Research Laboratory……………………….125

59 As in Figure 49, except for the pre-Mitch cloud cluster……………………………………126

60 Forecast time series for a) 6-hours and b) 12-hours for the pre-Mitch cloud cluster. Genesis occurred at 0000 UTC 22 October, which is at the right edge of the plot…………………………………………………………………………………127

61 As in Figure 60 except for the a) 42-hour and b) 48-hour forecast periods……………..128

62 Track of cloud cluster ND-34 (2001)…………………………………………………………129

63 As in Figure 49 except for the ND-34 (2001) cluster…………………………………….…130

64 Concept GUI for an operational tropical cyclogenesis forecast tool……………………...138


LIST OF ACRONYMS AND ABBREVIATIONS

AMSU Advanced Microwave Sounding Unit

AVHRR Advanced Very High Resolution Radiometer

BFGS Broyden-Fletcher-Goldfarb-Shanno

BP Backpropagation

CDAS Climate Data Assimilation System

CDC Climate Diagnostics Center

CISK Conditional Instability of the Second Kind

COADS Comprehensive Ocean-Atmosphere Data Set

DA Discriminant Analysis

DGP Daily Genesis Potential

DV Developing

ECMWF European Centre for Medium-Range Weather Forecasts

EN El Niño

FAR False Alarm Ratio

GARP Global Atmospheric Research Program

GATE GARP Atlantic Tropical Experiment

GOES Geostationary Operational Environmental Satellite

GRIB Grid in Binary

GUI Graphical User Interface

HSS Heidke Skill Score

IR Infrared


JTWC Joint Typhoon Warning Center

MCS Mesoscale Convective System

MCV Mesoscale Convective Vortex

MJO Madden-Julian Oscillation

MPI Maximum Potential Intensity

MRF Medium Range Forecast

MSE Mean Square Error

MSU Microwave Sounding Unit

NCAR National Center for Atmospheric Research

NCDC National Climatic Data Center

NCEP National Centers for Environmental Prediction

ND Non-Developing

NHC National Hurricane Center

NN Neural Network

NNR NCEP-NCAR Reanalysis

OI Optimal Interpolation

POD Probability of Detection

POP Probability of Precipitation

PV Potential Vorticity

QBO Quasi-Biennial Oscillation

ROC Receiver Operating Characteristic

SAL Saharan Air Layer

SGP Seasonal Genesis Parameter

SHIPS Statistical Hurricane Intensity Prediction Scheme

SLP Sea Level Pressure

SLPA Sea Level Pressure Anomaly

SSEC Space Science and Engineering Center


SSM/I Special Sensor Microwave Imager

SSI Spectral Statistical Interpolation

SST Sea Surface Temperature

TCFA Tropical Cyclone Forecast Alert

TCG Tropical Cyclogenesis

TCGI Tropical Cyclogenesis Index

TIROS Television Infrared Observation Satellite

TMI TRMM Microwave Imager

TOVS TIROS Operational Vertical Sounder

TRMM Tropical Rainfall Measuring Mission

TD Tropical Depression

UKMET United Kingdom Meteorological Office

UTC Coordinated Universal Time

WISHE Wind Induced Surface Heat Exchange


CHAPTER 1 INTRODUCTION

How does a cluster of tropical late-summer thunderstorms transform itself into a self-sufficient heat engine capable of producing devastating winds, rains, and low pressures comparable with the mightiest of mid-latitude cyclones? How does this occur in a nearly barotropic environment characterized by gentle trade winds? This transformation process, called ‘tropical cyclogenesis’ (TCG), is a rare but reliable event during the hurricane season of May – December. Over the Atlantic Basin, it occurs in about 15% of all candidate thunderstorm clusters (hereafter called ‘cloud clusters’).

The goal of this research is not to hypothesize and test the physics of TCG. Rather, the purpose is to accurately predict TCG using large-scale predictors either documented in the literature or theorized to be important to the process. Furthermore, it is intended that this work be designed so that it could ultimately be implemented in a real-time forecast environment. Thus, predictors were chosen that could be easily calculated and regularly available to a forecaster. The predictors were statistically connected to the outcome of the event (development or non-development) through a ‘classifier’. Two classifiers were chosen. Discriminant analysis (DA) is a traditional statistical classifier that linearly maps the predictors to the outcome by calculating a single function that incorporates all given predictors. Neural networks (NN) are relatively new tools that were designed to model the behavior of the human brain. NNs are inherently non-linear, easy to implement, and have vast potential to model the most complex processes in the atmosphere. This research will address the following questions:

1) Can useful predictions of TCG be made from the large-scale data?
2) Is a linear or non-linear classifier more effective at forecasting?
3) Where can the most improvement in forecasts be made with future research?
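To make the distinction between the two classifiers concrete, the sketch below contrasts a linear discriminant with a small neural network on synthetic data shaped like the problem studied here (eight large-scale predictors, a roughly 15% event rate). It is an illustrative sketch using scikit-learn, not the implementation used in this research (Chapters 5 and 6 describe the actual DA and NN configurations); the data, class balance, and network size are assumptions for demonstration only.

```python
# Illustrative sketch (not the dissertation's code): a linear discriminant
# versus a small neural network as binary genesis classifiers.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_clusters = 432                                 # cloud clusters, as in Chapter 3
X = rng.normal(size=(n_clusters, 8))             # stand-in for 8 large-scale predictors
y = (rng.random(n_clusters) < 0.15).astype(int)  # ~15% develop (climatology)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

da = LinearDiscriminantAnalysis().fit(X_tr, y_tr)            # linear classifier
nn = MLPClassifier(hidden_layer_sizes=(6,), solver="lbfgs",  # quasi-Newton training
                   max_iter=2000, random_state=0).fit(X_tr, y_tr)

# Both classifiers emit P(genesis), so their probabilistic forecasts
# can be compared case by case.
print("DA P(genesis):", da.predict_proba(X_te)[:3, 1].round(3))
print("NN P(genesis):", nn.predict_proba(X_te)[:3, 1].round(3))
```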

1.1 Motivation

The driving force behind this work is the current lack of any objective technique for forecasting TCG in the Atlantic. James Franklin, a forecaster at the National Hurricane Center (NHC), says: “…there is no method, certainly no objective one, that is employed. Consequently, there is very little skill (if any) in forecasting tropical cyclogenesis” (James Franklin Personal Communication 2000). Numerical models such as the Medium Range Forecast Model (MRF, Heming 1997) and the United Kingdom Meteorological Office Model (UKMET, Surgi et al. 1998) are frequently consulted and have shown “some” skill in forecasting TCG (Richard Pasch Personal Communication 2000). But these models still suffer from analysis and parameterization deficiencies. As an example, the MRF model frequently spins up “boguscanes”, or spurious vortices (Beven 1999), due to problems with inaccurate analysis fields and the model convective parameterization. However, there are indications that model performance is improving as initializations and parameterizations improve (Pasch 2002).

The vast majority of research efforts over recent decades have emphasized improved guidance for track and intensity forecasts rather than the prediction of TCG. This does not imply that little is known about the atmospheric and oceanic conditions that engender TCG. There are several well-established factors (Gray 1968) that are evaluated subjectively: surface pressure tendency, convective distribution, and atmospheric stability, among others (James Franklin Personal Communication 2000). Some of these factors will be discussed in more detail later in this paper. Lacking objective guidance, forecasters have no choice other than to take a primarily reactive rather than prognostic stance when monitoring the likelihood of TCG. Suspicious areas are identified and then several different factors are subjectively considered. For example, cloud pattern recognition through the Dvorak technique (Dvorak 1984) is one tool that forecasters at the NHC use extensively to determine when TCG occurs (Richard Pasch Personal Communication 2000). The only forecasting technique currently employed at the NHC is persistence. That is, if a cloud cluster appears to be becoming better organized, a statement will be issued that says in effect “if this trend continues a tropical depression could form within the next 24 hours” (Richard Pasch Personal Communication 2000). Clearly there is a need for a more objective tool.

Other forecast offices follow similar procedures. The Joint Typhoon Warning Center (JTWC) at Pearl Harbor, HI is currently the only tropical cyclone forecast office in the world that issues tropical cyclone genesis alerts before genesis occurs (Stacy Stewart Personal Communication 2000). The issuance of alerts is feasible because recent research and experience have shown that TCG is intimately tied to seasonal oscillations such as the monsoon circulation and the Madden-Julian Oscillation (Hall et al. 2001, Ferreira et al. 1996, Maloney and Hartman 2001, Dickinson and Molinari 2002). This technique does not diverge too far from the one used at the NHC. JTWC employs a checklist, shown in Figure 1, that guides the forecaster in assessing favorable TCG conditions.


Figure 1. The Tropical Cyclone Forecast Alert (TCFA) checklist employed by the Joint Typhoon Warning Center (JTWC) for Northern Hemisphere tropical cyclogenesis.


An examination of the checklist shows that many of the variables considered are subjective in nature, and the analysis of the situation will most likely differ with each individual forecaster.

1.2 Definition and Philosophy of Tropical Cyclogenesis

So far, TCG has been defined qualitatively as a transformation from a disorganized group of thunderstorms into a self-sufficient heat engine. This immediately raises the question of how to determine when that transformation occurs, especially since TCG is rarely observed with in situ measurements. To add further complication, other researchers have traditionally defined TCG in different ways. Ritchie and Holland (1999) exclude any ‘self-sustaining’ requirement and define genesis as “the series of physical processes by which a warm-core, tropical cyclone-scale vortex with maximum amplitude near the surface forms.” Bracken and Bosart (2000) also recognize TCG as a sequential pattern of events, but one that culminates in a “self-sustaining” vortex. In other words, the incipient disturbance reaches a point where it is less influenced by the outside environment and is able to achieve a self-maintaining circulation. Briegel and Frank (1997) also view cyclogenesis as a series of amplifying events rather than an event at a single point in time.

This research is based on this premise. That is, TCG is not a single event in time, but a series of events that results in the creation of a self-sustaining vortex with a warm core. Each subsequent event cannot occur without a favorable result from the one immediately preceding it. As the sequence progresses, the probability of TCG gradually increases until either genesis occurs or external forcings (such as movement into a high shear zone) prevent it. Thus, TCG forecasting should be approached from a probabilistic view. Furthermore, it is envisioned that genesis is a process that can cascade down and up in scale. For example, there must first be a favorable large-scale environment in place to ensure that the immature convection that makes up a cloud cluster can prosper and moisten the lower to middle troposphere. Then, mesoscale processes such as vortex mergers are hypothesized to create a larger and stronger parent vortex from two or more smaller vortices. Theories on how this happens are discussed in Chapter 2. This research focuses on the large scale – using mesoscale predictors is a topic for further research and is discussed in the conclusions of this document.

But the question still remains – how does one know when TCG has occurred? Fortunately, there is a traditional, quantitative way of defining it. The NHC labels a cloud cluster as a tropical depression when sustained wind speeds around a closed circulation are less than 34 knots (17.5 m/s). If direct reconnaissance from aircraft is not possible, forecasters rely on satellite imagery to estimate the amount of organization and the surface wind speeds. TCG is deemed to occur when the Dvorak T number reaches 2.0 (Dvorak 1984); at such time a depression number is issued. The NHC records genesis date, time, and location in its ‘best-track’ database (Jarvinen et al. 1984). Although there are inherent errors and concerns in using TCG identification by the NHC (see Chapter 3 for more details), the NHC ‘best-track’ database provides the most objective and consistent quantitative method for identifying TCG.
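The quantitative definition above maps naturally onto how developing/non-developing labels could be assigned in software. The sketch below is a minimal, hypothetical illustration: the record type and field names are invented for demonstration and do not reflect the actual best-track file format described in Chapter 3.

```python
# Hypothetical sketch of labeling a cloud cluster from best-track-style fixes.
# BestTrackFix and its fields are illustrative assumptions, not the real format.
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class BestTrackFix:
    time: str    # e.g. "1998-08-24 18:00 UTC"
    status: str  # system classification: "LO", "TD", "TS", "HU", ...
    lat: float   # degrees North
    lon: float   # degrees West

def genesis_fix(fixes: Iterable[BestTrackFix]) -> Optional[BestTrackFix]:
    """Return the first fix at which the cluster is labeled a tropical
    depression; None marks the cluster as a non-developing case."""
    for fix in fixes:
        if fix.status == "TD":
            return fix
    return None

fixes = [BestTrackFix("1998-08-24 12:00 UTC", "LO", 14.3, 43.0),
         BestTrackFix("1998-08-24 18:00 UTC", "TD", 14.4, 44.2)]
print(genesis_fix(fixes))  # genesis date, time, and location
```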

1.3 Justification for the Use of Large-Scale Data

The preceding discussion suggested that most theories dealing with the process of TCG involve features at the mesoscale or smaller. There is increasing evidence in the recent literature that the crucial physical processes regarding TCG involve convective and dynamical interactions that occur at those scales (e.g. Emanuel 1989; Chen and Frank 1993; Bister and Emanuel 1997; Simpson et al. 1997; Ritchie and Holland 1997; Montgomery and Enagonio 1998). At first, it might seem that the creation of a probabilistic forecast model for TCG would be very difficult, if not impossible, since the availability of high-resolution observational data is very limited. Many phenomena thought to be crucial to genesis – potential vorticity (PV) asymmetries, mesoscale convective systems (MCS), and mesoscale convective vortices (MCV) – occur at scales smaller than can be observed without the aid of aircraft, high-resolution radar, satellite imagery, or other high-density in situ measurements. But several previous studies suggest that a large amount of predictability can be harvested from the large-scale fields, and thus lend legitimacy to this work’s premise described in the previous section.

The first category of research that found a relationship between large-scale data and TCG was derived from large field experiments. During the Global Atmospheric Research Program’s (GARP) Atlantic Tropical Experiment (GATE), several rawinsonde and surface stations provided higher than normal data resolution in the Caribbean. Vincent and Waterman (1979) took advantage of these observations and the fortunate track of Hurricane Carmen (1974) through the data network to analyze the lifecycle of Carmen from early depression stage through mature hurricane status. They found that the large-scale features of Carmen’s environment, including warm SSTs, deep easterly flow, low-level conditional instability, and very weak vertical shear, led to intensification. A rawinsonde network in the Pacific and Atlantic basins was utilized by McBride (1981) and McBride and Zehr (1981) to develop composite datasets of developing and non-developing cloud clusters. They found that a high degree of predictive skill could be obtained by consideration of the large-scale wind fields. Their findings form the basis for the Daily Genesis Potential (DGP) predictor used in this research. Lee (1989a, 1989b) also used rawinsonde soundings in the Pacific and identified the importance of the shear and vorticity fields in TCG. His findings were consistent with both McBride and Vincent. It is quite remarkable that such strong relationships can be obtained from a relatively sparse observational network.

Recently, gridded observational (reanalysis) datasets have provided an alternative basis for large-scale TCG studies. They provide enhanced data quality by assimilating observations from such sources as ships, satellites, rawinsonde and surface stations, and aircraft. Watterson et al. (1995) applied Gray’s (1979) seasonal genesis parameter (SGP) to ECMWF reanalysis data and found that it provided a useful diagnostic for tropical cyclone activity. Briegel and Frank (1997), also using ECMWF gridded data, analyzed TCG in the Pacific Basin. They found that 85% of all clusters that became tropical storms exhibited a common upper-level (trough in close proximity) and/or low-level (vorticity increase) large-scale feature. Another interesting result is the composite vertical shear field, which showed the zero shear line almost exactly over the center of the system. This result corroborated what had been found previously in other studies using large-scale data (e.g. McBride 1981). A TCG study performed in the Australian region by Ritchie and Holland (1999) also showed promising results with large-scale gridded analyses.

1.4 Summary

This research will explore how effectively a probabilistic prediction system based on statistics can forecast a rare event whose physical mechanisms are not yet understood. It is hypothesized that large-scale data will yield sufficient predictive information to produce a useful forecast, despite the increasing evidence that the crucial physical processes of TCG occur at the mesoscale and microscale. This project is based on the premise that TCG is a series of events that initially cascade down to smaller scales, and that the key to forecasting it is to approach it from a probabilistic point of view.

Chapter 2 will present a literature review of TCG, especially as it pertains to the Atlantic Basin. Chapter 2 will also present a high-level overview of several theories on TCG, including Conditional Instability of the Second Kind (CISK) and Wind Induced Surface Heat Exchange (WISHE). Chapter 3 discusses the data used for this project, including their strengths, weaknesses, and other possible sources of error; also included in that chapter is a thorough description of the new cloud cluster database created for this project. This will be followed by a description of the eight large-scale predictors employed in this research in Chapter 4. Chapter 5 will explore the classification technique of DA and its application to a preliminary set of data. A mostly informational chapter about NNs (Chapter 6) will follow.


Chapters 7 and 8 house the main body of results from this research. A comparison between DA and NN classifiers is the topic of Chapter 7; Chapter 8 will examine six case studies of developing and non-developing cloud clusters that occurred over the Atlantic from 1998-2001 and how well the probabilistic forecast system handled them. Finally, Chapter 9 contains the summary and conclusions, including suggestions for future work.


CHAPTER 2 ATLANTIC TROPICAL CYCLOGENESIS - LITERATURE REVIEW

This chapter will provide an abbreviated review of previous research that is especially relevant to this study. First, several different arguments on the physics of tropical cyclogenesis will be presented. This will be followed by an examination of the origins and features of the easterly wave – the system responsible for approximately 60% of all genesis events in the Atlantic Basin (Avila et al. 2000). Finally, a brief history and overview will be presented of the approaches taken to forecast tropical cyclogenesis, especially in the Atlantic. As will be shown, most forecasting efforts thus far have focused on the prediction of seasonal development rather than individual systems.

2.1 Tropical Cyclogenesis Theory

Although much is known about the favorable ocean and atmospheric conditions that encourage TCG, the arguments regarding the physical transformation of a disorganized cloud cluster into a self-sustaining, intense vortex are still being debated. It is accepted that there must be an accumulation of latent heat in the vortex core to intensify the developing primary circulation. Furthermore, the superposition of potential vorticity (PV) with the developing circulation has been shown to encourage intensification as well (Molinari et al. 1997). Most theories on TCG seek to answer first how high θE air arrives in the core, and then what processes occur to retain it. This section will present a brief overview of four TCG theories: conditional instability of the second kind (CISK), wind induced surface heat exchange (WISHE), the mesoscale convective vortex (MCV) hypothesis, and potential vorticity theory.

2.1.1 CISK Theory

Charney and Eliassen (1964) asked the question “Why do cyclones form in a conditionally unstable tropical atmosphere whose vertical thermal structure is apparently more favorable to small-scale cumulus convection than to convective circulations of tropical cyclone scale?” Their solution lies in viewing the interaction of small cumulus clouds with the large-scale circulation as a cooperative one. This conceptual view of tropical cyclone formation became known as conditional instability of the second kind, or CISK. As summarized in McBride (1995), CISK theory makes three basic assumptions:

1) the initial perturbation is a synoptic-scale wave with balanced dynamics,
2) frictionally induced upward motion will result in latent heat release in the free atmosphere above the low-level cyclonic vorticity, and
3) the magnitude of the latent heat release is proportional to Ekman pumping.

In addition, the tropical atmosphere is assumed to be conditionally unstable. The essence of the theory is a positive feedback loop, in which latent heat release caused by the large-scale circulation in turn reinforces it. Ekman pumping initiated by the large-scale vorticity field results in upward motion and latent heat release in the column. This forces the development of a secondary circulation and increased inward flow into the column. In turn, the vertical vortex is “stretched”, resulting in increased cyclonic vorticity at the surface and hence a greater amount of Ekman pumping.

Charney and Eliassen (1964) showed that a development period of approximately 2.5 days over a 100 km region is obtainable with reasonable choices of input parameters in their two-level model. These values are similar to the scales of tropical depression formation. This type of development was not obtained when regular conditional instability was assumed. Another numerical modeling experiment, by Kurihara and Kawase (1985), showed that the addition of a wave-CISK type heating effect to a simulated tropical disturbance resulted in the growth of the wave. The addition of non-linear effects alone, such as nonlinear zonal advection and vortex stretching, had little impact on the growth of the disturbance. They concluded that a sustained increase of vorticity at the surface seemed to require the concurrent warming of the air above the surface vorticity maximum.
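The feedback can be written schematically. The standard Ekman-layer result (a textbook formula; the eddy diffusivity K below is not a quantity specified in this dissertation) gives a pumping velocity proportional to the geostrophic vorticity, and CISK’s third assumption ties the heating to that pumping, closing the loop:

\[
w_{E} \simeq \zeta_{g}\sqrt{\frac{K}{2f}}, \qquad \dot{Q} \propto w_{E} \quad\Longrightarrow\quad \frac{\partial \zeta_{g}}{\partial t} \propto \dot{Q} \propto \zeta_{g},
\]

so the vorticity, and hence the heating, grows exponentially until some other process limits it. This heuristic chain compresses the stretching step described above into a single proportionality between heating and vorticity tendency.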

2.1.2 WISHE

The development of CISK apparently reconciled the problem of scale in moist convection. However, arguments refuting CISK and the convective parameterizations based on it have since been advanced. Xu and Emanuel (1989) disputed the crucial assumption of CISK that the tropical atmosphere is in general conditionally unstable, presenting evidence that the atmosphere is in fact nearly neutral to moist convection. Without a reservoir of convective available potential energy (CAPE) to tap into, CISK could not exist. Another argument against CISK theory is the seemingly incorrect connection between latent heating and temperature made by Charney and Eliassen, who assumed that latent heating directly leads to kinetic energy production. Emanuel et al. (1994) argue that adiabatic cooling and radiative heat loss nearly offset the positive contribution of latent heat release, and that the correlation between heating and temperature is very difficult to determine. A comment on the modeling work of Kurihara and Kawase (1985) by McBride and Willoughby (1986) argued that the growth of the model disturbance (which in itself was observationally unrealistic) was due to problems of runaway short-wave amplification in wave-CISK. Furthermore, they suggested that the representation of the basic flow in Kurihara and Kawase does not accurately portray the mean flow in a typical trade wind condition, further bringing into question their conclusions.

Energy contribution by the ocean has long been recognized as a crucial component of TCG and maintenance (e.g. Riehl 1954). Seizing upon this and the weaknesses of CISK, Emanuel and others developed a new theory, ultimately named wind induced surface heat exchange, or WISHE (Emanuel 1989; Emanuel et al. 1994). A model based on WISHE can produce an amplifying tropical storm without the assumption of conditional instability (Rotunno and Emanuel 1987). Emanuel et al. (1994) present the scenario of TCG within a WISHE framework. An incipient vortex induces Ekman pumping, resulting in upward motion throughout the depth of the troposphere (assuming a length scale of 500 km). Eventually, downdrafts result from this forcing, bringing low θE air into the sub-cloud layer. Heat flux from the ocean surface initially is not enough to counteract this effect, and the vortex threatens to cool and spin down. The key factor that allows for amplification of the vortex is a moistening of the sub-cloud layer (through stratiform precipitation) to near saturation. Hence, when the moistened air is cycled into the secondary circulation, the low θE air eventually disappears. This allows latent heat flux from the wind-induced surface evaporation to begin to dominate, warming the core and allowing growth of the vortex. Thus, WISHE creates its own conditional instability through energy extraction from the ocean surface. An observational study conducted in Hurricane Guillermo (1991) during the Tropical Experiment in Mexico (TEXMEX) provides evidence for the processes hypothesized in WISHE (Bister and Emanuel 1997).
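The “wind-induced” part of WISHE is usually expressed with the bulk aerodynamic formula for the sea-to-air enthalpy flux. The form below is the standard textbook version; the symbols are introduced here only for illustration and are not defined in this dissertation:

\[
F_{k} = \rho\, C_{k}\, |\mathbf{V}|\,\bigl(k_{s}^{*} - k_{b}\bigr),
\]

where ρ is the near-surface air density, C_k the bulk exchange coefficient for enthalpy, |V| the surface wind speed, k_s* the saturation enthalpy at the sea surface temperature, and k_b the enthalpy of the sub-cloud layer. Because F_k grows with |V|, a strengthening surface circulation extracts energy from the ocean at an increasing rate, which is the feedback WISHE relies on in place of pre-existing CAPE.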

2.1.3 Mesoscale Convective Vortices

Hurricane Guillermo (1991) also exhibited a feature that several researchers have theorized to be an important component of TCG. The development of a mesoscale convective system (MCS) and an accompanying mesoscale convective vortex (MCV) was observed during the genesis of Guillermo. An MCS is a large group (>100,000 km² in area) of organized convective clouds that persists for at least several hours. Some MCSs have been observed to form a localized region of enhanced cyclonic vorticity in their rear stratiform precipitation regions, called an MCV. Evidence connecting MCSs with TCG has been detected through satellite imagery (e.g. Velasco and Waterman 1979), but TCG within an MCS event is thought to be rare. The reasons for the rarity of TCG within an MCS are unclear. It has been proposed that downdrafts from the MCS convection spread radially outward from the system, effectively cutting off the source of warm, moist air that an MCS (and incipient tropical cyclone) needs to sustain itself (Jay Hobgood Personal Communication 2002). Perhaps TCG from MCSs is more common, but is simply not observed.

Chen and Frank (1993) performed a numerical simulation of the lifecycle of an MCS using the PSU/NCAR mesoscale model. A warm-core MCV approximately 400 km in diameter spun up under the stratiform precipitation shield at the 500 hPa level. As the simulation marched forward in time, the mid-level vortex propagated down towards the lower troposphere. It is interesting that this vortex descent is an event that forecasters look for as a precursor to TCG (Richard Pasch Personal Communication 2000). Based on this simulation, Chen and Frank attribute the initial development of the MCV to vortex tube stretching. Intensification of the vortex is attributed to its descent into high θE air near the surface. Bister and Emanuel (1997) emphasize the importance of this descent for TCG, insisting that genesis is highly unlikely without it. They attribute the descent of the cold-core vortex to downdrafts initiated by evaporation of the stratiform precipitation. Model simulations using the Rotunno and Emanuel (1987) model provide evidence for this hypothesis.

There is observational evidence for the role of MCSs in TCG. Simpson et al. (1997) examined the genesis of Tropical Cyclone Oliver in the South Pacific and found that several MCVs developed within two dominant MCSs and interacted with each other in a seemingly stochastic manner. They concluded that this interaction played a vital role in the birth of Oliver. Ritchie and Holland (1997) analyzed a number of MCSs that formed during the development of Typhoon Irving in the North Pacific. They found that the merger of two MCVs within an MCS possibly played a significant role in cyclogenesis, as the resulting increased cyclonic circulation extended down near the surface. They explained the descent of the vortex through the modified Rossby-Burger-Prandtl relationship:

\[
D = \frac{L\left(f_{loc}\,\zeta_{a}\right)^{1/2}}{N} \qquad (1)
\]

where D = the vertical influence (penetration depth) of a potential vorticity perturbation (potential vorticity is defined in the following section), L = the horizontal scale of the perturbation, f_loc = the local Coriolis parameter, ζa = the absolute vorticity, and N = the Brunt-Vaisala frequency. Hence, if L is increased, as is the case when two vortices merge (Ritchie and Holland 1993), D will increase, assuming the other factors remain constant. Once the vortex extends to the surface, it is thought that new convection is spawned through either enhanced convergence and/or lower-tropospheric moistening (i.e. a WISHE-type process), which may lead to a tropical depression.
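As a rough worked example of the scaling in Eq. (1), the sketch below uses round, assumed values (not numbers taken from this dissertation) to show how doubling the horizontal scale of a merged vortex doubles its vertical penetration depth:

```python
# Worked example of Eq. (1): D = L * (f_loc * zeta_a)**0.5 / N.
# All inputs are assumed, round-number values for illustration only.
import math

f_loc = 5e-5   # local Coriolis parameter at ~20 deg latitude (s^-1)
zeta_a = 1e-3  # absolute vorticity of a strong mesoscale vortex (s^-1)
N = 1e-2       # Brunt-Vaisala frequency (s^-1)

for L in (100e3, 200e3):  # horizontal scale before/after a merger (m)
    D = L * math.sqrt(f_loc * zeta_a) / N
    print(f"L = {L/1e3:.0f} km  ->  D = {D/1e3:.1f} km")
# L = 100 km -> D ~ 2.2 km; L = 200 km -> D ~ 4.5 km
```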

2.1.4 Potential Vorticity

An alternate method for the examination of TCG is through the use of potential vorticity (PV). Several studies use Ertel’s potential vorticity in isentropic form:

\[
q = -g\left(\zeta_{\theta} + f\right)\frac{\partial \theta}{\partial p} \qquad (2)
\]

where ζθ is the relative vorticity on an isentropic surface, f is the Coriolis parameter, g is gravity, and ∂θ/∂p is the thickness term (the vertical potential temperature gradient). PV is a measure of the strength of the vorticity field in relation to the vertical thickness of the vortex. It is conserved for frictionless, adiabatic flow.

The evolution of the large-scale PV field has been connected with TCG frequency. In regions of PV sign reversals, TCG is favored due to wave amplification from the instability of the mean flow. Molinari et al. (1997) examined the frequency of TCG in the Eastern Pacific for the 1991 season and found a correlation with the PV field over the Caribbean. They contend that easterly waves grow in regions of PV sign reversals and tend to decay otherwise. Results showed that when the PV sign reversal weakened (strengthened) in the Caribbean during late August-early September, TCG ceased (intensified) downstream.

An alternate way of examining cyclogenesis is through the mesoscale PV field within the cloud cluster itself. Montgomery and Enagonio (1998) hypothesize that TCG may be triggered by vorticity asymmetries spawned by intense convection. To test their theory, they used a three-dimensional quasi-geostrophic model and initialized it with a pre-existing mesoscale vortex with accompanying convection. To simulate the effects of a convective outbreak near a quasi-circular vortex, non-symmetric PV anomalies were introduced on two sides. The resulting model run showed an axisymmetrization of the vorticity field, an increase in tangential winds, a significant drop in surface pressure around the incipient vortex, and the development of a warm core. Montgomery and Enagonio acknowledge that this process is in essence a merger of the PV anomaly with the parent vortex. They suggest that the main difference between their theory and Simpson et al. (1997) is the emphasis on convective heating.
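For orientation, the sketch below plugs typical, assumed tropical-tropospheric values into Eq. (2); the resulting magnitude of roughly 0.6 PVU is a reasonable lower-tropospheric value, though the specific inputs are illustrative and not taken from this dissertation:

```python
# Numeric check of Eq. (2): q = -g * (zeta_theta + f) * dtheta/dp,
# using assumed, typical lower-tropospheric values.
g = 9.81           # gravity (m s^-2)
f = 5e-5           # Coriolis parameter (s^-1)
zeta_theta = 1e-4  # isentropic relative vorticity (s^-1)
dtheta_dp = -4e-4  # d(theta)/dp (K Pa^-1); negative since theta rises with height

q = -g * (zeta_theta + f) * dtheta_dp
print(f"q = {q:.2e} K m^2 kg^-1 s^-1  (= {q / 1e-6:.2f} PVU)")
```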

2.2 North Atlantic Basin Tropical Cyclogenesis

Approximately 80% of the world’s tropical depressions develop in a monsoonal equatorial trough (McBride and Keenan 1982). Such an environment is characterized by a high meridional shear gradient between the tropical easterlies (westerlies) and mid-latitude westerlies (easterlies) at low (high) levels. However, tropical cyclogenesis in the Atlantic Basin is frequently achieved within the circulation of easterly waves. In fact, easterly waves account for about 60% of all development in the basin (Avila et al. 2000). This section will present a brief overview of easterly waves and the genesis characteristics of the Atlantic Basin. It should be emphasized that genesis requirements in other ocean basins are most likely quite different from those of the Atlantic Basin, both in terms of large-scale features and enhancement mechanisms. For example, genesis in the northeastern Pacific is frequently enhanced by interactions with topography (e.g. Zehnder 1991; Mozer and Zehnder 1996a; Farfan and Zehnder 1997); in the Australian region, the Madden-Julian Oscillation (MJO) was found to increase the frequency of genesis events during its active phase (Hall et al. 2001).

2.2.1 Easterly Waves Over the Atlantic Basin

Burpee (1972) was the first to realize that easterly waves derive their energy from the horizontal and vertical shear of the mean zonal wind. Charney and Stern (1962) showed that when the meridional gradient of potential temperature (θ) disappears at the surface, the vanishing of the mean potential vorticity gradient along an isentropic surface is a necessary condition for zonal flow instability (in this case, the zonal flow refers to the African easterly jet, which derives its energy from the strong baroclinic zone created by the Saharan Desert and the African Sahel). Burpee showed that this condition is satisfied between the longitudes of 5° and 35°E during the months of June through October. This agrees with the observed temporal and spatial origins of easterly waves. Another possible mechanism for the enhancement of easterly waves is interaction with topography over the African continent. In numerical experiments, Mozer and Zehnder (1996b) showed that lee vorticity production could be achieved downstream of topography within easterly flow. The modeled waves had wavelengths and phase speeds similar to those of observed easterly waves. Burpee initially discounted this effect, as he saw no observational evidence that the Ethiopian mountains spawn easterly waves. But Mozer and Zehnder suggest that mountains located further to the east, in Algeria and Niger, may provide the initial perturbations that trigger instability in the African easterly jet.
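Stated compactly, the Charney-Stern necessary condition invoked above is that the meridional gradient of the mean potential vorticity must vanish (i.e. change sign) somewhere on an isentropic surface. The notation below is the standard textbook form, introduced here for illustration rather than taken from this dissertation:

\[
\left.\frac{\partial \bar{q}}{\partial y}\right|_{\theta} = 0 \quad \text{somewhere in the domain},
\]

where \(\bar{q}\) is the zonal-mean potential vorticity (cf. Eq. (2)) and \(y\) the meridional coordinate.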


down towards the surface. Hypotheses on how this process occurs were presented earlier in section 2.1. Kwon and Mak (1990) argue that few tropical waves become tropical storms because condensational heating under 'normal' moisture conditions is not enough to overcome this handicap. The low development rate of easterly waves, even those with strong associated convection, suggests that their hypothesis is accurate.

Easterly waves typically have a period of 3-5 days. As mentioned previously, they typically form over continental Africa from June through October. They move westward and leave the coast of Africa with phase speeds in the range of 5-10 m s-1. Easterly waves have a large range of sizes - wavelengths can be as small as 2000 km or as large as 4000 km (Mozer and Zehnder 1996b). Thorncroft and Hodges (2001) showed that there is a positive correlation between seasonal tropical cyclogenesis in the Atlantic and the number of African easterly waves, particularly waves with significant low-level amplitudes. This is not surprising, especially since the importance of African forcings was established by Gray (1984b) nearly two decades ago.

2.2.2 Favored Genesis Regions and Variability

Tropical cyclogenesis in the North Atlantic has a strong intraseasonal variability that is most intimately tied to the evolution of the SST and vertical wind shear fields. The Atlantic hurricane season begins on 1 June and ends on 1 December. Genesis in June and July is relatively rare, limited by marginal SSTs. The vast majority of depressions form within the months of August through October, engendered by warm SSTs and favorable vertical wind shear. Although SSTs remain favorable through November, strong westerlies intrude into the tropical development region during that month, and development is limited by the strong vertical wind shear they create.

There are also detectable variations in genesis by region. For example, Inoue et al. (2002) note that there is a bimodal distribution of genesis in the Caribbean Sea region: development peaks in June, declines through midsummer, and peaks again in October. They attribute this bimodal behavior to the onset of the strong easterly trades in July, which enhance upwelling in the southwest Caribbean. This results in cooler SSTs, higher ambient surface pressures, and a decrease in precipitable water. The onset of enhanced convection in the Pacific basin later in the season weakens the easterly trades and thus allows for a more favorable development environment in the Caribbean at that time. In the eastern North Atlantic, African easterly waves typically do not develop except during a brief window from late August to early October. SSTs off the coast of Africa are marginal before that time, and by October the vertical wind shear is climatologically too high.


2.3 Forecasting Tropical Cyclogenesis Over the Atlantic Basin

The majority of research in the forecasting of tropical cyclogenesis in the Atlantic Basin has focused on seasonal time scales. Gray (1984a) first noticed a high correlation between the phase of the equatorial Quasi-Biennial Oscillation (QBO) and seasonal genesis frequency. He also showed that genesis frequency was negatively correlated with the existence of an El Niño (EN) event in the Pacific, owing to the higher vertical wind shear it creates over the Atlantic. Shapiro (1982a; 1982b) analyzed National Centers for Environmental Prediction (NCEP) grid point analyses and found that sea level pressure (SLP) poleward of 20ºN during August-September-October is negatively correlated with seasonal hurricane activity. Similarly, Gray (1984b) analyzed SLP data at various Caribbean stations south of 20ºN and found that springtime sea level pressure anomalies (SLPA) are significantly correlated with the following summer's anomalies. Using EN, QBO, and SLPA as predictors, Gray has demonstrated success in predicting the seasonal number of tropical storms, hurricanes, and hurricane days. Recently, DeMaria et al. (2001) developed a tropical cyclone genesis parameter for the Atlantic Basin based on 5-day running means of zonal vertical wind shear, vertical instability, and midlevel moisture parameters. The wind and instability variables are derived from the NCEP operational analysis; the moisture fields are from GOES-8 sounder data. Results showed that their genesis parameter explained approximately 41% of the intraseasonal variability. The authors state that the parameter represents a necessary but not sufficient condition for development, and suggest that it may be possible to develop a disturbance-centered parameter that could be used to evaluate individual systems.

There has been comparatively little research in the area of individual genesis forecasting. Shapiro (1977), noting that wave-like dynamics appeared more linear than storm-like dynamics, developed a criterion for the development of a tropical depression from a tropical wave based on that facet of the dynamics. Using an observationally based linearity threshold for the development criterion, he presented several cases of systems that developed when they moved into favorable areas. Several cases were not forecast properly, however; Shapiro attributes this to the exclusion of thermodynamic and dynamic factors (such as SST and vertical wind shear) that may affect a system. McBride and Zehr (1981) analyzed individual cloud clusters in the Pacific Basin. They found that a single parameter computed from observational upper air data, called the 'Daily Genesis Potential' (DGP), provided valuable predictive information for development. This variable, which plays a prominent role in this research, will be discussed in more detail in Chapter 4. Perrone and Lowe (1986) followed a more statistical approach to the forecasting of individual clusters that formed in the Pacific Basin. After compositing developing and non-developing cloud clusters together and considering many possible predictors, they found


that low-level relative vorticity, 500 mb equivalent potential temperature (θE), and low-level divergence were effective at forecasting genesis out to 24 hours in advance. For the 1980 season, their probabilistic model correctly forecast genesis 84% of the time, and only four of twenty-five systems were 'false alarms' (forecast to develop but did not). The predictors were related to the outcomes by discriminant analysis, the same linear statistical technique used in this research. It will be shown in Chapter 5 that the results obtained here were moderately inferior to those of Perrone and Lowe; a speculative discussion of this result is presented in that chapter.
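For readers unfamiliar with the technique, the flavor of such a probabilistic discriminant classification can be sketched with a modern library. The snippet below is purely illustrative - scikit-learn postdates the work of Perrone and Lowe, and the predictor values are synthetic stand-ins, not their data:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Synthetic stand-ins for three predictors (e.g. low-level vorticity,
    # 500 mb theta-E, low-level divergence); 1 = developing, 0 = non-developing.
    rng = np.random.default_rng(0)
    X_dev = rng.normal([2.0, 345.0, -1.0], [0.8, 3.0, 0.5], size=(40, 3))
    X_nd = rng.normal([0.5, 342.0, 0.2], [0.8, 3.0, 0.5], size=(160, 3))
    X = np.vstack([X_dev, X_nd])
    y = np.array([1] * 40 + [0] * 160)

    lda = LinearDiscriminantAnalysis().fit(X, y)
    p_genesis = lda.predict_proba(X)[:, 1]      # probabilistic genesis forecast
    print((p_genesis[y == 1] > 0.5).mean())     # fraction of developers hit at P = 0.5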

2.4 Summary

This chapter presented a review of the fundamental theories of tropical cyclogenesis, particularly as they pertain to the unique situation over the Atlantic Basin. Surprisingly little research has looked specifically at the forecast problems associated with genesis, especially in light of the difficulties that numerical models (and forecasters) have experienced with the event in recent years. Documentation, discussion, and detailed analysis of the other relevant topics in this research will be developed in the following chapters. These include data sources, discriminant analysis, neural networks, and the strengths, weaknesses, and possible error sources of each.


CHAPTER 3

DATASETS

The continuing expansion of information available via the Internet has allowed virtually all data required for this project to be acquired quickly and inexpensively. Except for the satellite data files, all data were obtained in this fashion. This chapter serves as a reference for all data used or derived for this project; the advantages, disadvantages, assumptions, and error sources of each are discussed. This includes the criteria and methodology employed in creating the cloud cluster database. A description of a pilot study undertaken to determine the robustness of the NCEP-NCAR Reanalysis (NNR) in detecting disorganized tropical cloud clusters is also presented near the end of this chapter.

3.1 National Hurricane Center “Best Track” Data File

A complete reference of all known occurrences of tropical cyclone formation in the North Atlantic basin is contained in a single text file, commonly called the NHC “Best Track” dataset (Jarvinen et al. 1984). Storms are arranged sequentially, and each record contains the following information at 6-hour intervals: latitude, longitude, minimum central pressure (mb), and wind speed (kt). In addition, information is available pertaining to United States landfalls, the highest categorization the storm reached (tropical storm, hurricane), and other characteristics of the system (subtropical, extratropical, etc.). Since all storms during the study period that have been categorized as a tropical depression or greater appear in the best-track database, it provides an objective method for determining when TCG has occurred.

However, the classification of disturbances and the determination of their exact locations can be subjective. Bracken and Bosart (2000) note several possible sources of uncertainty and error in the best track database. For example, it is a product of subjective interpretations by forecasters and satellite analysts. Usually, there are little or no in-situ observational data at hand to aid forecasters in determining whether genesis has occurred and, if so, the exact location of the circulation center. Geostationary satellites provide little help in locating the disturbance center and diagnosing the intensity since the images they generate are relatively coarse in resolution and are unable to “see” through the cirrus canopy that


typically exists over a strong tropical disturbance. Microwave sensors that can penetrate cloud cover, such as the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), the Special Sensor Microwave Imager (SSM/I), and the Microwave Sounding Unit (MSU), are carried on polar-orbiting satellites and are available to forecasters only when the satellite passes over the system(s) of interest.

As was discussed in the introduction, there are several ideas on defining when a tropical cloud cluster is thought to make the transition to a tropical depression. For simplicity, and because there are no better methods to date, this research assumes that TCG occurs when the cloud cluster first appears in the best-track database. It is not expected that the uncertainties noted above will have a significant detrimental impact on the results of this study. Positional errors for the disturbances are generally small (an analysis of NHC fixes versus NNR relative vorticity maxima is presented later in this chapter). Since the study encompasses a short temporal period (1998-2001), the analysis is not affected by any major shift in tropical cyclone identification techniques. Despite its limitations, the best-track dataset provides the most objective and quality-controlled archive of tropical cyclone activity in the Atlantic Basin. It is available online and is updated annually.

3.2 Reynolds Sea Surface Temperature

Real-time observations of SST are available with a temporal resolution of one week, distributed on a 1° x 1° grid. These data serve as the lower temperature boundary for the maximum potential intensity predictor variable, which is discussed in the following chapter. The SST data fields are obtained from a somewhat complex optimum interpolation (OI) technique (Reynolds and Smith 1994) that improves upon the blended technique of Reynolds (1988). Among the data sources for the SST analysis are ship and buoy observations that are transmitted to the National Centers for Environmental Prediction (NCEP). Ship observations are most dense in the Northern Hemisphere, but the tropical Atlantic has almost no buoy observations available (Reynolds and Smith 1994). The Advanced Very High Resolution Radiometer (AVHRR) provides better coverage than the in situ data, despite its inability to retrieve SST values through cloud cover. Satellite retrievals are tuned by regression against quality-controlled buoy observations. Before the data are sent through the OI, a quality control procedure eliminates the worst data. The previous weekly analysis is used as a first-guess field for the new analysis. The OI technique preserves the bias correction that had been implemented in the blended technique while improving upon the spatial and temporal resolution of the resulting SST fields. Examples of the robustness of this method and a detailed treatment of OI can be found in the originating paper (Reynolds and Smith 1994).


A number of pre-processing steps were undertaken with the raw SST data. The maximum potential intensity routine used in this research (see section 4.4) requires a surface pressure, a surface temperature, and an atmospheric sounding in order to run. The atmospheric parameters were obtained from the NNR, which is available on a 2.5° x 2.5° grid. Hence, a disparity in spatial resolution existed between the SST and atmospheric data. To resolve this issue, the SST data were resampled onto the 2.5° grid using the “resample” routine contained within the ArcInfo version 7.0.3 software package, with a bilinear interpolation technique. A linear interpolation in time was then performed on the weekly files to create 6-hourly SST data. Each 6-hour SST field was then read into the maximum potential intensity routine along with its corresponding NNR atmospheric profile.

In almost all conditions, the week-to-week SST variability in the tropical Atlantic is small. Hence, the linear interpolation between weekly fields and the reduction in spatial resolution should not have a significant impact on the results presented in this study. However, one should be aware that certain higher-frequency events, such as the passage of a strong tropical cyclone, would escape detection in the weekly SST analysis. This would result in an artificially warm sea surface environment and could have a significant impact on the results presented here. It is suspected, however, that situations in which a tropical disturbance follows immediately in the cold wake of a strong tropical cyclone are relatively rare in the longitudes where TCG is most common. The Reynolds SST data are available online (IRI Data Library 2003) and are updated in near real-time.
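The two interpolation steps are simple enough to sketch in a few lines. The following is a minimal illustration (not the actual ArcInfo workflow), assuming the weekly Reynolds fields have already been read into NumPy arrays:

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def resample_to_nnr(sst_1deg, lats_1deg, lons_1deg, lats_nnr, lons_nnr):
        """Bilinearly resample a 1-degree SST field onto the 2.5-degree NNR grid."""
        interp = RegularGridInterpolator((lats_1deg, lons_1deg), sst_1deg)
        lat2d, lon2d = np.meshgrid(lats_nnr, lons_nnr, indexing="ij")
        pts = np.column_stack([lat2d.ravel(), lon2d.ravel()])
        return interp(pts).reshape(lat2d.shape)

    def sst_6hourly(sst_week0, sst_week1, hours_after_week0):
        """Linear interpolation in time between two consecutive weekly analyses."""
        w = hours_after_week0 / 168.0        # 168 hours in one week
        return (1.0 - w) * sst_week0 + w * sst_week1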

3.3 Satellite Imagery

Infrared (IR) satellite imagery provided the means to track tropical cloud clusters across the entire Atlantic Basin for the duration of the hurricane season. In order to obtain complete spatial coverage, it was necessary to utilize imagery from two different observing platforms: the eighth satellite in the Geostationary Operational Environmental Satellite series (GOES-8), providing coverage of the Western Atlantic, and the seventh satellite in the Meteosat series of the European Organisation for the Exploitation of Meteorological Satellites (Meteosat-7), providing coverage of the Eastern Atlantic. Each platform is briefly summarized below, followed by a summary of the methodology used to merge the two images into one composite mosaic.

3.3.1 GOES-8

The GOES-8 geostationary satellite provides crucial monitoring capability of tropical activity in the Atlantic. Archived imagery is available through the Space Science and Engineering


Center (SSEC) at the University of Wisconsin back to 1 October 1997. For this study, IR images (band 2, wavelength 3.9 µm) in the form of McIDAS AREA files were obtained for the following times: 0015, 0615, 1215, and 1815 UTC. Due to solar activity, downtime, and other unforeseen difficulties with the satellite itself, several times were unavailable. The number of missing images was generally small, however, and did not significantly impact the study. Other archived satellite imagery and information regarding the GOES-8 products are available on the SSEC data center website (SSEC 2003).

3.3.2 Meteosat-7

Archived AREA files were obtained from EUMETSAT. The Meteosat-7 observing platform was launched on 2 September 1997 and has since provided nearly continuous coverage of the Eastern Atlantic and Western Indian Oceans. Rectified imagery centered on the Prime Meridian was obtained. The only IR band available was centered in the 10.5-12.5 µm range (band 8), which created some visual discontinuity with the GOES-8 imagery at 3.9 µm. There is also a slight temporal discontinuity between the GOES-8 and Meteosat-7 images, as Meteosat-7 images are obtained at the top of each hour rather than 15 minutes past. These differences are minor, however, and did not significantly impact the data collection process.

3.4 Satellite Image Compositing Technique

Once the GOES and Meteosat images were obtained, they were stitched together to form a coherent picture of the entire Atlantic basin. Figure 2 illustrates the spatial coverage of GOES-8 and Meteosat-7 and their merged areas as used in this study. Processing of the AREA files was done with the McIDAS software package (freely available to educational institutions from Unidata) using the following algorithm. For each 6-hour period, a check was performed to ascertain whether both the GOES and Meteosat images were available. Images were deemed to exist for that hour if their timestamps showed they were taken at any time during that particular hour. In most cases, images were no more than 15 minutes apart, since the regular temporal difference between the two platforms is 15 minutes; however, as long as both images were taken during the same hour, they were deemed a match. If one or both of the images were missing, an error message was given and the composite was not created. This occurred in about 9% of all the times processed; those times were excluded from the rest of the study. Once both files were determined to exist, they were remapped to a standard rectangular projection and stitched together in a single frame. The boundary between the two satellite fields of view was chosen to be 51.5°W, although both platforms are able to “see” beyond that longitude. Differences in brightness and some spatial discontinuities are discernible along that boundary (especially if the images were more than 15 minutes apart), due to the differences in wavelength and image times noted above. In the vast majority of merged images, this effect was small and did not hinder the analysis. The final composited image was saved in the GIF format to allow easy access during the data collection period.

Figure 2. Spatial coverage of GOES-8 and Meteosat-7 for this study.
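The pairing logic of the compositing algorithm amounts to a match-by-hour check. The sketch below captures that logic in Python; the file-naming scheme is hypothetical, as the actual processing was scripted within McIDAS:

    from pathlib import Path

    def find_image_pair(goes_dir, met_dir, date, hour):
        """Return a (GOES-8, Meteosat-7) AREA-file pair whose timestamps fall
        within the same hour, or None if either is missing (composite skipped).
        The 'goes8_YYYYMMDD_HH*' naming is hypothetical."""
        stamp = f"{date}_{hour:02d}"                      # e.g. '19980819_06'
        goes = sorted(Path(goes_dir).glob(f"goes8_{stamp}*.area"))
        met = sorted(Path(met_dir).glob(f"met7_{stamp}*.area"))
        if not goes or not met:
            return None            # ~9% of times; excluded from the study
        return goes[0], met[0]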

3.5 Identification of Cloud Cluster Candidates

The 1998-2001 Atlantic hurricane seasons (1 June - 1 December) were examined to find all of the tropical cloud clusters that formed within the study domain. In order to identify cloud clusters in the most objective manner possible, a formal cloud cluster definition was established. The identification of cloud clusters is not a trivial procedure - the matter is subjective and there


are few precedents in the literature on this topic. The overriding idea is that a cloud cluster should be a system that has the potential to develop into a tropical depression. Hence, several requirements immediately surface; namely, a cluster must have sufficient size, must persist for an extended period of time (i.e. not be diurnal in nature), and must exist in a region where genesis is a genuine possibility (not in the high latitudes). Specifically, the following criteria, initially adopted by Lee (1989a; 1989b) for his work in the Pacific basin, were used:

(i) Each cluster must be an independent entity, disassociated from any cyclone or pre-cyclone system;
(ii) Cluster must be at least 4° in diameter and not elongated in shape;
(iii) Cluster must be located south of 40°N and north of the equator;
(iv) Cluster must persist for at least 24 hours.

These criteria provided a somewhat objective method for identifying viable candidates. Essentially, tropical cyclogenesis is highly unlikely if any one of these criteria is not met. It should be noted that tropical cyclogenesis from small cloud clusters (~1°-2° in diameter) does sometimes occur in the Pacific basin (Lander 1994), but this phenomenon has rarely been observed in the Atlantic. Table 1 presents a summary of the cloud cluster characteristics for the four Atlantic seasons in this study. The 2000 season was the most active in terms of both developing and non-developing candidates, though 1998 appeared to be a more active year for fertile African waves (those easterly waves that exhibited significant and persistent convection). The 2001 season should be viewed with caution for several reasons; these will be discussed in section 3.6. Full season summaries for the 1998-2000 seasons can be found in the recent literature (Franklin et al. 2001; Lawrence et al. 2001; Pasch et al. 2001). The NHC season summary for the 2001 season was not available as of the writing of this document.

Once all candidate cloud clusters were identified, they were grouped into developing (DV) or non-developing (ND) bins. A cluster is categorized as DV if it developed into a tropical depression within 48 hours of the image time. The DV cases were further stratified by the number of hours before genesis; a minimal sketch of this binning logic is given below. For example, Tropical Depression Bonnie (1998) achieved genesis at 1200 UTC 19 August. The pre-Bonnie cloud cluster at 0600 UTC 19 August was categorized as a '6-hour developing cluster'. At two days prior to genesis, the 1200 UTC 17 August cluster was labeled a '48-hour developing cluster'. If the Bonnie cloud cluster was evident before that time, it was categorized as an ND cluster and included with other clusters that never achieved tropical depression status.
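The binning logic referenced above reduces to a lead-time calculation; a minimal sketch, assuming genesis times have already been extracted from the best-track file (function and variable names are hypothetical):

    from datetime import datetime, timedelta

    def label_fix(fix_time, genesis_time):
        """Label one 6-hourly cloud cluster fix.  Fixes within 48 h of genesis
        are 'k-hour developing'; all others (including every fix of a cluster
        that never develops) are non-developing ('ND')."""
        if genesis_time is None:
            return "ND"
        lead = (genesis_time - fix_time) / timedelta(hours=1)
        if 0 < lead <= 48:
            return f"{int(lead)}-hour developing"
        return "ND"

    # Pre-Bonnie cluster, 0600 UTC 19 August 1998 (genesis 1200 UTC 19 August):
    print(label_fix(datetime(1998, 8, 19, 6), datetime(1998, 8, 19, 12)))
    # -> '6-hour developing'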


                                     1998    1999    2000    2001
Total Number of Clusters               90      91     110      79
Longest in duration (ND - hours)*     198     258     294     228
Mean duration (ND - hours)           58.9    55.1    54.8    65.7
Median duration (ND - hours)           42      36      42      48
Number of African waves**              42      32      41      35
African waves / Total clusters (%)   46.7    35.2    37.3    44.3
Number of TDs                          14      16      18      14

* ‘ND’ signifies non-developing clusters only.
** Only those easterly waves that exhibited significant and persistent convection are included.

Table 1. Summary of documented cloud clusters for the 1998-2001 Atlantic hurricane seasons.

3.6 Cloud Cluster Data Set Characteristics

It is desirable that the cloud cluster dataset encompass as many as possible of the unique situations of both genesis and non-development cases, in both the spatial and temporal distributions of the data. If these conditions are adequately satisfied, then good generalization to any validation case (i.e. a new, independent case) presented to the system will most likely be attained. Figure 3 illustrates the spatial distribution of all of the ND (top panel) and DV (bottom panel) cloud clusters; the DV clusters are shown at genesis time. There were 2892 ND cases from 345 clusters, meaning that each cluster persisted for approximately 50 hours on average. These numbers do not include a significant number of other cases that were lost to missing satellite data (200+), untraceable convection (100+), and progression over land (12). For the discriminant analysis, each individual case is assumed to be independent of the other cases in the same cloud cluster. The two areas of greatest cloud cluster concentration easily stand out: the easterly wave track along 8°N from the African coast into the central North Atlantic, and clusters spawned by the monsoon trough in the southern Caribbean. The void in cloud cluster activity between 50ºW and 80ºW is most likely due to the dissipation or genesis of easterly waves before they move into that area; if either of those events occurred, the wave would be excluded from this study. For the DV cases at genesis (bottom panel), generally 3-5 cases for each forecast hour were eliminated because of missing satellite data, 1-5 because of insufficient convection to fix a location, and 1-3 due to existence over land. Forecasts at longer lead times (30-48 hours) generally have a higher percentage of missing/unusable data. The concentration of DV clusters is spatially coherent with the ND cases, though there is a higher development ratio for the region immediately off the eastern U.S. shore. The spatial distribution


of cloud clusters is consistent with climatological activity. On the temporal scale, the four-year sampling captures approximately 45-50 clusters per month, with development occurring most frequently in the prime development season of August through October. Figure 18 (Chapter 5) shows a histogram of cloud clusters by month.

A couple of important issues arose with the 2001 data. There may be a small bias in the data towards non-July cases, since many July cases were missing from the 2001 season: satellite data were missing from 1 July - 5 July and from 23 July - 1 August. Using the 1998-2000 seasons as a guide, this amounts to about 6-9 missing cloud clusters. Even with the addition of those systems, the 2001 season overall still appears slightly less active than the previous three. Other concerns with the 2001 season center on the classification of the

Figure 3. Spatial distribution of the cloud cluster dataset (1998-2001). Non-developing clusters (top panel) are shown at every 6-hour fix. The location of developing clusters at genesis time is shown in the lower panel.


developing clusters by the NHC. Two systems (Allison and Dean) were initially classified as tropical storms; for this study, the genesis times of these storms were taken to be the times of their first classification, as if that classification had been as a tropical depression. Also, the tropical depression that would eventually be named Felix was found to have two genesis events, one at 1800 UTC 7 September and the other at 0600 UTC 10 September. Felix was thus treated as two separate developing events, although it was counted only once in the 'Number of TDs' category of Table 1. Finally, the NHC labeled two storms (Karen and Olga) as subtropical and one storm (Olga) as sub/extratropical. These storms were not included in this study and were not counted in Table 1. It is believed that the thermodynamics and dynamics of genesis for these 'hybrid' storms are quite different from traditional tropical cyclogenesis (Gray 1968). Since the NHC has only recently begun to include subtropical and extratropical systems in the best track database, it is possible that several late-season storms during the 1998-2000 seasons would have been so categorized and excluded from the study. This may be one contributing factor to the below-average classification of late-season systems that will be discussed in Chapter 4.

3.7 NCEP-NCAR Reanalysis

All of the atmospheric data used for this study were derived from or directly read from the NCEP-NCAR Reanalysis (Kalnay et al. 1996). The NNR is a data analysis/forecast system that assimilates a wide range of observational data, employs a stringent quality control procedure, and runs the National Centers for Environmental Prediction (NCEP) global spectral model to produce global gridded analyses with a resolution of 2.5° in the horizontal and 28 levels in the vertical. Unique to the NNR are the length of the record (data are available from 1948 to present) and the assembly of a comprehensive observational database. The NNR is commonly used in longer-term climatic studies since the system is "frozen"; hence, differences in the output fields cannot be attributed to changes in the analysis system, a common problem in climate studies that use operational analyses.

3.7.1 Data Sources and Analysis Procedure

As mentioned above, a distinguishing feature of the NNR is the incorporation of an exhaustive set of observational data from around the globe. Table 2 lists the different observation types that are incorporated into the analysis. There are very few in situ observational data located within the domain of this study. Thus, many of the values are highly dependent on remotely sensed data, including satellite sounder data, SSM/I surface winds,


Global rawinsonde soundings
COADS
Aircraft data
Surface land synoptic data
Satellite sounder data
SSM/I
Satellite cloud drift winds

Table 2. Observational data incorporated into the NCEP-NCAR Reanalysis.

and satellite cloud drift winds. There may also be limited input from aircraft observations as well as the Comprehensive Ocean-Atmosphere Data Set (COADS), which includes data from ship reports, fixed and drifting buoys, and near-surface data from ocean station reports. Incidentally, the SSM/I winds are corrected to more closely match surface buoy observations through the use of a neural network developed at NCEP (Krasnopolsky et al. 1995).

Once the data are gathered, an automated analysis system is invoked; a schematic of the system is shown in Figure 4. The data decoder and quality control (QC) preprocessor module decodes all data and invokes a series of checks that eliminates false or questionable observations arising from instrument or human error. The output is then assimilated with a model first-guess field, produced as a 6-hour forecast from the previous analysis; this step is performed in the data assimilation module, of which the NCEP global model is a part. The model includes parameterizations of convection, large-scale precipitation, shallow convection, gravity wave drag, radiation (including clouds), boundary layer physics, surface hydrology, and vertical/horizontal diffusion processes. Several statistical procedures are also employed within the data assimilation module, including spectral statistical interpolation (SSI) and the optimum interpolation (OI) SST analysis (Reynolds and Smith 1994). The archive module is responsible for writing the final fields to disk or tape in one of several formats and for the dissemination of data to various centers across the country, including NCEP and the National Climatic Data Center (NCDC).

3.7.2 Output Formats and Variable Descriptions

At present, the NNR is freely available from several Internet sites in either gridded binary (GRIB) or netCDF format. All data used in this study were downloaded from the Climate Diagnostics Center (CDC) website (Climate Diagnostics Center 2003). This site also contains


links to other distribution pages, a history and description of error sources in the reanalysis, and links to software used to read and process the data. Data are also available on limited-field CD-ROMs through the National Center for Atmospheric Research (NCAR).

Since nearly all surface and upper-air stations are located on land, there is a great data void over the world's ocean basins. This is an important consideration for any study of tropical cyclones, since they form and spend most of their lifetimes over the oceans. Hence, there is some question as to the validity of the analyzed values within the study domain. The NNR addresses this issue of 'variable confidence' by assigning a letter to each variable in the analysis that ranks how closely that variable is tied to observations or derived from the spectral model. Each parameter is categorized as follows:

‘A’ = variable is strongly influenced by observed data; most reliable ‘B’ = observations still directly influence variable; model has strong influence ‘C’ = observations do not directly influence variable; model derived; least reliable

This study uses the following variables from the NNR, with their categorizations as given in Kalnay et al. (1996) in parentheses: pressure level temperature (A), zonal and meridional pressure level winds (A), mean sea level pressure (A), precipitable water (B), and specific humidity (B).

Figure 4. Schematic drawing of the NCEP-NCAR Reanalysis system (data preparation, data assimilation, and data distribution modules).


Thus, all data used are directly influenced by observations. Furthermore, the temperature and wind data are most strongly influenced by observations and are therefore thought to be very reliable. Remotely sensed observations of temperature and winds from satellite platforms, including TOVS temperature soundings and cloud track winds, contribute to the high confidence in those values.

3.7.3 Known Error Sources

Errors in the reanalysis beyond standard instrument error and model inaccuracies are unavoidable due to the complexity of the data assimilation process. Most are caught relatively early and corrected promptly. An online history of the known problems in the reanalysis can be found on the CDC web site (Climate Diagnostics Center 2003). For the data collected during the period of this study (1998-2001), there was one significant problem that was not caught by NCEP until after the completion of the data analysis portion of this work. In March 1997, filters that had been in place to eliminate TOVS satellite temperature retrievals over land areas were lost during a port of the code to a CRAY machine. These filters are important because they eliminate many inaccurate reports over those areas. Figure 5 illustrates this problem: a distinct

Figure 5. 100 mb global monthly averaged temperature difference (°C) of reanalysis data without TOVS land filters (CDAS) and data with the filters (R2).


jump in 100 mb temperature is seen in the unfiltered reanalysis (CDAS) relative to the filtered data (R2) beginning in March 1997, after the code port. The most significant errors in the reanalysis were in the lower-stratospheric temperatures and the stratospheric heights. The TOVS problem was discovered early in the year 2000 and was corrected in subsequent reanalysis runs beginning in October 2001. The issue was severe enough to warrant a rerun of the affected periods. The corrected data were not available to users until April 2002 - well after the data were collected for this study.

To assess the impact of the inclusion of the unfiltered TOVS temperature soundings on the results of this work, corrected reanalysis data were downloaded for the 1998 hurricane season. Several predictors were recalculated and analyzed against the unfiltered predictor values. Figure 6 shows an anomaly map of maximum potential intensity (unfiltered - filtered TOVS) for 1200 UTC 5 June 1998. Other times were also examined (not shown) and exhibited similar patterns and magnitudes. Shaded contours indicate areas where the corrected reanalysis data yield a higher MPI value. Anomaly values range from

Figure 6. Anomaly (unfiltered – filtered) of maximum potential intensity. Shaded regions represent negative values.


near zero in the mid-latitudes and subtropics to 20 or more millibars in the main development region of tropical depressions and storms (5°-20°N, 15°-60°W). The southern Caribbean, another primary development region early in the season, also shows significant differences between the datasets. These findings warranted a more detailed investigation into whether differences of these magnitudes would significantly alter the predictor's overall contribution to the statistical model. The unfiltered MPI data were replaced by the corrected values over the entire season. The results for all developing and non-developing cases during the 1998 season are summarized in Table 3. For all forecast periods, the corrected MPI mean is 2-3 mb lower, with a slightly higher standard deviation (by 0.3-1.5 mb), in comparison to the original. The mean and standard deviation for the non-developing cases have similar magnitudes. If a cloud cluster traversed one or more of the "bull's-eyes" seen in Figure 6, MPI anomaly values could differ by 10 or more millibars. Clearly, these numbers were too large to discount.

Further work was performed to determine the impact of the TOVS problem on the other predictors in the dataset. Figure 7 shows similar anomaly fields for precipitable water and the daily genesis potential for 0600 UTC 8 September 1998. Precipitable water (a) shows differences of up to 8 mm between the datasets, or approximately 15-20% of its typical tropical values of 40-50 mm. The effect appears most pronounced in the deep tropics and over water. In terms of percentage

Forecast          Corrected Data        Uncorrected Data      (Corrected - Uncorrected)
Hour       N      Mean    Std. Dev.     Mean    Std. Dev.     Mean    Std. Dev.
6          12     910.6   24.2          913.0   23.9          -2.4    0.3
12         13     907.1   29.3          910.0   28.4          -2.9    0.9
18         13     902.6   35.0          905.1   33.5          -2.5    1.5
24         14     904.1   35.5          906.9   34.6          -2.8    0.9
30         14     901.1   34.7          903.4   33.4          -2.3    1.3
36         13     900.7   34.6          903.9   33.5          -3.2    1.1
42         11     892.8   31.2          896.2   30.3          -3.4    0.9
48         9      902.2   31.4          905.6   30.9          -3.4    0.5
ND         767    896.7   22.6          899.9   21.8          -3.2    0.8

Table 3. Summary statistics of filtered vs. unfiltered MPI values (averaged at a radius of two degrees from the cloud cluster center) from the NCEP-NCAR Reanalysis for the 1998 Atlantic hurricane season. ‘ND’ signifies non-developing cloud clusters.


Figure 7. Precipitable water (mm) and Daily Genesis Potential (x 10-5 s-1) differences between the uncorrected reanalysis and the corrected data for 0600 UTC 8 September 1998. Negative values are shaded.


difference, the DGP (b) appears to be even more affected by the inclusion of all TOVS soundings. Typical DGP values for a grid point during September in the tropical Atlantic range from -1.5 to 3.0 x 10-5 s-1. At this particular time, some regions show nearly a 100% change from their previous values, with DGP changes ranging from near zero to nearly three units. Other times (not shown) reveal similar patterns in both space and magnitude. Unlike precipitable water, there appears to be no favored area for higher-magnitude changes in the inaccurate dataset.

Given this evidence, the entire dataset for this research was re-processed using the corrected reanalysis data. The predictive capability of the dataset was examined by comparing the correct classification rates of the discriminant analysis procedure for the 'bad' and 'good' data. The results can be seen in Figure 8, which illustrates the percentage of cloud clusters in the dataset correctly classified by forecast hour (hours before genesis). The predictive skill of the 'new' data is significantly degraded, as each forecast hour shows a lower percentage of clusters correctly classified. Here is a situation where 'better' data has led to worse

Figure 8. Percent of cases correctly classified using discriminant analysis for uncorrected (dark bars) and corrected (white bars) data by forecast hour (hours before genesis). The black line is the difference between the datasets and can be considered the degradation magnitude for the corrected data. The decision boundary is assumed to be P = 0.5.


results. A brief examination of the predictor group means yields a satisfactory explanation of the classification results. As discussed in the following chapter, discriminant analysis performs classification by 'maximally discriminating between groups'. In other words, the method performs better when the independent predictors for the respective groups ('developing' and 'non-developing') are more differentiated. For example, the uncorrected DGP mean for the 6-hour forecast group was 1.556, while the corresponding non-developing clusters had a DGP mean of 0.400, a difference of 1.156. For the corrected data, the DGP mean for the 6-hour forecast group was 1.377 and the non-developing mean was 0.374, a difference of 1.003, approximately 13% smaller than for the uncorrected data. Table 4 shows that four of the predictors (Daily Genesis Potential (DGP), precipitable water (PWAT), 6-hour surface relative vorticity tendency (VTENDSFC), and 6-hour 700 mb relative vorticity tendency (VTEND700)) would contribute to a degraded classification, since the developing and non-developing data become less separated. Only the MPI would help the classification, as there is slightly more separation between the groups.

The corrected data were used in all of the results presented in Chapters 6 through 8. However, the old data were used in the pilot study, summarized in the following section, which examined the ability of the reanalysis to detect tropical waves. The errors introduced by the TOVS issue were not a critical consideration in that study, since the goal was to qualitatively

Forecast Hr    DGP      MPI      PWAT     VTENDSFC   VTEND700
6              -0.179   -3.678   -1.126   -0.095     -1.032
12             -0.299   -3.637   -0.733   -0.181     -0.200
18             -0.244   -3.364   -0.771   -0.260     -0.143
24             -0.255   -3.607   -0.562   -0.463     -0.366
30             -0.141   -3.619   -0.659   -0.033     -0.105
36             -0.088   -4.091   -0.856   -0.108     -0.118
42             -0.194   -3.925   -1.179   -0.056      0.051
48             -0.209   -3.925   -1.192    0.060     -0.011
ND             -0.026   -3.243    0.010    0.014      0.035

Table 4. Mean differences for all forecast periods between the corrected and uncorrected reanalysis data. ‘ND’ signifies a non-developing case. Predictors are two-degree area averages from the cluster center, except MPI (averaged at six degrees).


ascertain the robustness of the analysis in resolving the clusters, rather than to quantitatively determine the importance of each predictor with regard to tropical cyclogenesis.

3.7.4 Detection of Incipient Tropical Depressions in the Reanalysis

A pilot study was undertaken to determine whether the weak circulations that accompany tropical waves and cloud clusters are detectable in the NNR. This is an important consideration, for if the cloud clusters are not resolved in the reanalysis it would be difficult, if not impossible, to develop a predictive model based on the data. The 850 mb relative vorticity fields were examined for all of the developing clusters that formed during the first three Atlantic hurricane seasons in the study (June - November, 1998-2000). This variable has been used to track tropical systems in previous studies (e.g. Reed et al. 1988) and was determined to be a useful proxy for finding the central location of the system. The maximum in the relative vorticity field was found through an interpolation scheme in GEMPAK; hence, the center of the system was allowed to lie between grid points. The cluster was considered to be "detected" in the NNR if a local 850 mb relative vorticity maximum was located near the center of the cloud mass as determined by IR satellite imagery. In addition, the local maximum had to be clearly associated with the system in question. The coordinates and magnitude of the relative vorticity maximum were recorded for each developing system during the three-year study period. These locations were then compared to the data in the NHC Best Track database at genesis time (the first time the storm appears in the database).

For the 1998-2000 seasons, the NNR was able to detect every developing tropical cloud cluster except TD-7 (1999). According to the NHC (Avila 1999), TD-7 formed within the same tropical wave that spawned Tropical Storm Greg in the Eastern Pacific around the same time. The NNR picked up Greg's dominant vorticity maximum and did not detect the weaker one associated with TD-7. Detection in other reanalysis datasets has not been as robust. Fiorino (2000) examined the 15-year ECMWF reanalysis (ERA-15) and found a detection rate of approximately 90%. His detection scheme, modified from an operational tracker run at the Naval Research Laboratory, is based on surface wind direction shifts in addition to the 850 mb relative vorticity field; detection occurs if the reanalysis produces a feature from which a forecast track could be made (Mike Fiorino, personal communication, 2001).

Comparing the position of the tropical cloud cluster in the NNR to the analyzed NHC best-track coordinates further tested the robustness of the NNR. Figure 9 is a range-azimuth plot of the differences between the NHC fix and the NNR analyzed center (the location of the 850 mb


relative vorticity maximum) at genesis time. The dots represent the NNR fixes; the NHC position is the center of the circle for each case. Of all the systems resolved in the NNR, 36 of 47 (76.6%) were within one NNR grid point (approximately 277 km) of the NHC fix at genesis time. Among the clusters outside this radius, there appears to be a westward bias in the NNR fix: nine of the eleven (~82%) clusters with a distance difference of greater than one grid point were analyzed to the west of the NHC fix. Reasons for this bias are not known and are beyond the scope of this study, but a few speculative explanations are presented here.
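The grid-point comparison itself reduces to a great-circle distance calculation. A minimal sketch (the example coordinates are hypothetical, not an actual case):

    import numpy as np

    def great_circle_km(lat1, lon1, lat2, lon2):
        """Haversine distance in km between two points given in degrees."""
        R = 6371.0
        p1, p2 = np.radians(lat1), np.radians(lat2)
        a = (np.sin((p2 - p1) / 2) ** 2
             + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
        return 2.0 * R * np.arcsin(np.sqrt(a))

    # Is the NNR vorticity maximum within one grid point (~277 km) of the NHC fix?
    d = great_circle_km(13.5, -45.0, 13.4, -47.5)   # hypothetical fix pair
    print(f"{d:.0f} km,", "within" if d <= 277.0 else "outside", "one grid point")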

Figure 9. Range/azimuth plot of NNR 850 mb relative vorticity maxima (black dots) relative to the NHC fix (center of plot) for developing cloud clusters during the 1998-2000 Atlantic seasons (47 cases). Range values are in km and azimuth values are compass degrees. All cases are at genesis time.


In these cases there may be a secondary (or even primary) vorticity maximum located separately from the intense convection. This circulation may not have been picked up by the operational analysis; hence, NHC analysts would tend to fix the center of the storm closer to the convective mass. This argument may be even stronger for storms that form far enough out in the Atlantic that reconnaissance data were not collected. Another possible explanation may lie in the NCEP global spectral model, which produces a six-hour forecast that is used as the first-guess field for the following reanalysis. The model may be moving the vorticity center too fast to the west, where it remains if there are no observed data to be assimilated into the analysis (which may be frequent for remote ocean systems).

There is a correlation between the strength of the system at genesis time and the difference in fixes between the NHC and NNR. Figure 10 is a scatter plot of the 850 mb relative vorticity maximum at genesis versus the distance difference between the NHC and NNR centers for all genesis events from 1998-2000. Though the correlation is not overwhelming (R = -0.5), the larger (smaller) differences tend to occur for the weaker (stronger) systems.
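The quoted correlation is an ordinary Pearson coefficient between the vorticity maximum and the fix separation. For illustration (the numbers below are invented, not the 47-case sample):

    import numpy as np

    zeta = np.array([1.2, 2.0, 2.8, 3.5, 4.1, 5.0])     # 850 mb maxima (x 10-5 s-1)
    err = np.array([520., 390., 300., 210., 150., 90.]) # NHC-NNR separation (km)
    r = np.corrcoef(zeta, err)[0, 1]
    print(f"R = {r:.2f}")   # strongly negative for this illustrative sample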


Figure 10. Scatter plot of all tropical cyclogenesis cases (1998-2000): 850 mb relative vorticity maximum magnitude (x 10-5 s-1) versus distance difference between the NHC and NNR fixes (km). The linear best-fit line is shown. R = -0.5.


3.8 Summary

The data sources utilized for this project were presented. Within each section the source, availability, advantages, and known error sources were discussed. During the analysis period it became known that a severe error source had propagated into the NNR dataset, which necessitated a rerun of the predictor dataset. Results presented here show that the correction had a significant impact on the values of the predictors and, in effect, degraded the predictive capabilities of the dataset. Results from the pilot study show that the NNR is able to resolve all but one of the pre-depression disturbances in the 850 mb relative vorticity field. This allowed the project to continue to its completion, including the discriminant analysis and neural network classification results that will be discussed later in this work.


CHAPTER 4

PREDICTORS OF TROPICAL CYCLOGENESIS

Eight large-scale predictors of TCG were chosen for use in this study. The choice of predictors was limited to those that were easily obtainable, large-scale, and, most importantly, hypothesized to be informative indicators of the developmental outcome of the cloud cluster. As will be shown, the analysis quickly highlighted the important variables and those that have little effect on the development outcome of the cloud cluster. The predictors are summarized in Table 5 and discussed in more detail in the sections that follow. Ideas for the implementation of new predictors and the improvement of the classifications in general are discussed in Chapter 9.

Predictor                            Abbrev.    Physical Significance           Units           Min.     Max.
Latitude                             COR        Planetary Vorticity             10-5 s-1        0.61     8.55
Daily Genesis Potential              DGP        Vertical Wind Shear Structure   10-5 s-1        -1.64    3.74
Maximum Potential Intensity          MPI        SST and Vertical Instability    mb              838.29   993.48
850 mb Moisture Divergence           MDIV       Moisture Influx at Low Levels   10-7 g/(kg s)   -6.94    5.05
Precipitable Water                   PWAT       Vertical Moisture Availability  kg/m2           32.28    51.62
Pressure Tendency                    PTEND      System Amplification            mb/24 hr        -2.46    2.69
Surface Relative Vorticity Tendency  VTENDSFC   Vortex Spin-up                  10-5 s-1/6 hr   -0.58    0.85
700 mb Relative Vorticity Tendency   VTEND700   Vortex Spin-up                  10-5 s-1/6 hr   -1.01    0.84

Table 5. The eight large-scale predictors of TCG used for this study.


4.1 Averaging Technique

The values of the predictors described in this chapter were computed for every grid point in the domain (shown in Figure 3) every six hours (00, 06, 12, and 18 UTC) for the entire Atlantic hurricane season (1 June - 1 December). Once the time and location of a cloud cluster were identified, all NNR grid points within a 2º swath (6º for the MPI predictor) were identified (see Figure 11). The predictor value for each of those grid points was calculated, and the values were then averaged together to yield a single value. This procedure was performed with swaths of 2º, 4º, 6º, 8º, and 10º; preliminary DA runs showed that the 2º averaging radius, which focuses essentially on the cloud cluster itself, was by far the most skillful for TCG prediction.
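In code, the averaging step might look like the sketch below, which treats the swath as a radius in degrees and ignores map-projection effects for simplicity (the actual processing may have differed in such details):

    import numpy as np

    def swath_average(field, grid_lats, grid_lons, clat, clon, radius_deg=2.0):
        """Average a gridded predictor over all NNR grid points within
        radius_deg of the cloud cluster center (2 deg for most predictors,
        6 deg for MPI).  Distance here is a crude planar degree distance."""
        lat2d, lon2d = np.meshgrid(grid_lats, grid_lons, indexing="ij")
        dist = np.hypot(lat2d - clat, lon2d - clon)
        return field[dist <= radius_deg].mean()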

4.2 Latitude

Tropical storms rarely form at latitudes within 5° of the equator or poleward of 30° (Gray 1968). The limiting factor for the formation of storms in the mid-latitudes is usually the dominance of unfavorable vertical wind shear. Near the equator, the thermodynamic and dynamic environment is usually ideal for tropical cyclogenesis, but there is not sufficient planetary vorticity to initiate organized cyclonic rotation near the surface. In a large observational survey of tropical cloud clusters, McBride (1981) noted that TCG tends to occur at higher latitudes on average (in the Northern Hemisphere). Because of these factors, latitude is included as a predictor of tropical cyclogenesis. Latitude is represented in this study by a scaled Coriolis parameter, defined as:

f = 2ω sin φ                                   (3)

where ω is the angular rotation rate of the earth (7.29 x 10-5 s-1) and φ is the latitude in degrees. The units are 10-5 s-1.
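Equation (3) translates directly into code; a one-function sketch:

    import numpy as np

    OMEGA = 7.29e-5   # angular rotation rate of the earth (s-1)

    def cor_predictor(lat_deg):
        """Scaled Coriolis parameter f = 2*omega*sin(lat), in units of 10-5 s-1."""
        return 2.0 * OMEGA * np.sin(np.radians(lat_deg)) * 1e5

    print(round(cor_predictor(10.0), 2))   # approx. 2.53 at 10 deg N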

4.3 Daily Genesis Potential

A large observational study of tropical cloud clusters, encompassing 912 tropical systems in two ocean basins, was conducted over 20 years ago by a research team at Colorado State University. Details on the compositing technique and the labeling of developing and non-developing systems can be found in McBride (1981). Essentially, twice-daily rawinsonde data over a 10-year period (1961-1970) in the northwest Pacific and a 14-year period (1961-1974) in the northwest Atlantic were composited and analyzed for differentiating patterns between developing and non-developing systems. Hereafter the Atlantic basin is focused upon, since this research is based within that domain.


Figure 11. Schematic of the averaging technique used to arrive at the final predictor values. Any grid point that fell within a 2º swath around the cloud cluster center was averaged together. The black dots represent grid points from the NNR.

Components of the seasonal genesis parameter, or SGP (Gray 1979), were calculated and compared for both developing and non-developing cases. Table 6 reproduces the Atlantic cases from Table 9 of McBride (1981). The most obvious difference between the developing (D1, D2, D3, D4) clusters (hereafter "D clusters") and non-developing (N1, N2, N3) clusters (hereafter "N clusters") is the much higher vorticity parameter (900 mb relative vorticity) for the developers. If the non-developing tropical depressions are included with the developing cases, as they are for this research, then the results become even more convincing. The Coriolis parameter is generally higher for the D clusters, indicating the importance of planetary vorticity as discussed in the previous section. The vertical shear parameter, calculated as a scaled difference of the wind vectors at 900 and 200 mb, shows little difference between the D and N clusters. As will be discussed later in this section, the differentiating variable is not the shear itself, but its location relative to the center of the storm. Finally, note that the last three parameters in the table, the "thermal" parameters, show no differentiation between D and N


                             Vorticity   Coriolis    Vertical Shear   Ocean Energy   Moist Stability   Humidity
                             Parameter   Parameter   Parameter        Parameter      Parameter         Parameter   SGP
N1 (Cloud cluster)           2.0         4.99        0.082            10             13.3              0.94        1
N2 (Wave-trough cluster)     7.0         3.95        0.063            12             15.3              0.68        2
N3 (Non-dev depression)      24.2        5.11        0.121            10             13.5              0.89        18
D1 (Pre-hurr cloud cluster)  23.5        4.44        0.090            10             13.6              0.93        12
D2 (Pre-hurr depression)     39.4        5.15        0.162            10             12.5              0.92        38
D3 (Intensifying cyclone)    48.2        5.44        0.108            10             12.4              0.99        35
D4 (Hurricane)               72.6        5.68        0.117            10             10.8              1.00        52

Table 6. Average terms of the SGP for all Atlantic cases in the McBride survey.

clusters in the Atlantic for this study period. The implication is that the thermal parameters play no role in determining whether an individual system will develop into a tropical depression. Rather, they provide the climatological basis for tropical cyclogenesis: they must be consistently high (and usually are during the Atlantic hurricane months) to allow the faster-varying dynamical considerations to provide the sufficient conditions for tropical depression formation.

The SGP was formulated with the goal of producing a climatological indication of tropical cyclogenesis. In the second of their two-part paper, McBride and Zehr (1981) use the information learned from that analysis to develop the "daily genesis potential", or DGP. The DGP was designed for application to a single tropical cloud cluster to assess the likelihood of development into a hurricane or typhoon. It synthesizes their primary findings regarding the characteristics of non-developing versus developing systems:

1) Both are warm core in the upper levels, 2) No obvious difference in vertical stability, 3) No obvious difference in moisture content,


4) Developing clusters are located in areas of high low-level relative vorticity,
5) Genesis occurs under conditions of zero vertical wind shear near the system center,
6) Developing clusters exhibit large positive (negative) zonal shear to the north (south) of the system and southerly (northerly) shear to the west (east).

These findings emphasize the importance of the large-scale wind field (4-6) and reiterate the finding of McBride (1981) that moisture and thermodynamic considerations play little role in daily genesis probability. Note also that the application of the DGP in this study is to development into a tropical depression rather than into a mature typhoon or hurricane. However, as Table 6 shows, the N3-type clusters (those that attained tropical depression status) are more similar to the stronger systems than to the non-developing cases, at least in terms of the dynamic SGP parameters. A mathematical statement that encompasses the necessity of zero vertical shear near the center and the requirement of a strong zonal and meridional shear gradient across the system is:

-∂/∂y(∂U/∂p) + ∂/∂x(∂V/∂p)                     (4)

where U = zonal wind and V = meridional wind. If the order of differentiation is reversed, this can be rewritten as:

∂/∂p(-∂U/∂y + ∂V/∂x).                          (5)

This is the vertical gradient of relative vorticity. The DGP is a representation of this term:

DGP = ζ900 - ζ200                              (6)

where ζ900 and ζ200 are the 900 mb and 200 mb relative vorticities. This value is typically scaled so that the units are 10-5 s-1. A high (low) value of DGP indicates favorable (unfavorable) development conditions.

McBride and Zehr calculated DGP values for all of the developing and non-developing cases in their dataset by assuming axisymmetry and averaging all data within 2°, 4°, and 6° of


                                    0-2°    0-4°    0-6°
Atlantic non-developing
N1  Cloud cluster                   -0.5     0.4     0.7
N2  Wave trough cluster              2.1     0.6     0.4
N3  Non-developing depression        5.5     2.0     1.0

Atlantic developing
D1  Pre-hurricane cloud cluster      4.7     2.4     1.6
D2  Pre-hurricane depression         5.2     2.8     1.8
D3  Intensifying cyclone             9.8     4.2     2.8
D4  Hurricane                       10.8     5.2     3.7

Table 7. Atlantic DGP based on mean relative vorticity differences between 900 mb and 200 mb in units of 10-5s-1. From McBride and Zehr (1981). the center of the system. Table 7, reproduced from their paper, illustrates DGP calculations for all categories of Atlantic basin cloud clusters in their dataset. At both 4° and 6° the DGP is approximately three times higher for the non-developing depression than other non-developing clusters and four times higher than a pre-hurricane cloud cluster. The NNR does not contain data at the 900 mb level. Calculations of DGP were performed using both 925 mb and 850 mb as the lower level and the results were examined. There was very little difference in the DGP results using either level. Hence, the 850 mb level was chosen as the lower level in this study since it is more commonly used in operational meteorology. Thus the DGP for this study is defined as:

\mathrm{DGP} = \zeta_{850} - \zeta_{200} . \qquad (7)
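To make the calculation concrete, the following is a minimal Python sketch of equation (7) on gridded NNR winds. This is not the software used in this study; the function and argument names are hypothetical, and constant grid spacing is assumed for simplicity.

```python
import numpy as np

def relative_vorticity(u, v, dx, dy):
    """Relative vorticity (dv/dx - du/dy, in s^-1) on a regular grid.
    u, v: 2-D wind components (m/s) indexed [lat, lon];
    dx, dy: grid spacing in meters (assumed constant here)."""
    dvdx = np.gradient(v, axis=1) / dx
    dudy = np.gradient(u, axis=0) / dy
    return dvdx - dudy

def dgp(u850, v850, u200, v200, dx, dy):
    """Daily genesis potential, equation (7): zeta_850 - zeta_200,
    scaled to units of 1e-5 s^-1."""
    zeta850 = relative_vorticity(u850, v850, dx, dy)
    zeta200 = relative_vorticity(u200, v200, dx, dy)
    return (zeta850 - zeta200) * 1.0e5
```

As with the other predictors (see section 4.11), the gridded DGP would then be averaged over all points within 2° of the cloud cluster center.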

4.4 Maximum Potential Intensity
The maximum potential intensity, or MPI, is defined as the highest intensity (usually measured as the lowest central pressure or highest surface wind speed) a tropical cyclone could theoretically achieve given the environmental conditions present. Several formulations of MPI have been developed over the past 50 years (Miller 1958; Malkus and Riehl 1960; Emanuel 1988; Holland 1997). Each focuses on thermodynamic constraints on intensity, dynamic constraints, or some combination of the two. A thorough review of MPI and these methods is presented in Camp and Montgomery (2001).


Though it has been established that the large-scale thermodynamic environment plays little role in the daily genesis potential of tropical depressions, it has long been known that certain thresholds in both the ocean and atmospheric conditions must be met before tropical cyclogenesis can occur.¹ These requirements were discussed in Chapter 1. The MPI as formulated by Holland (1997) is used as a proxy for two of the thermodynamic requirements, namely the sea surface temperature (SST) and the conditional instability of the environment. Holland's method makes the following assumptions:

1) The environment can be represented by a single sounding of temperature, surface pressure, and surface temperature,
2) The cyclone is axisymmetric,
3) The surface pressure drop from the environment is defined hydrostatically by the temperature anomaly in the column above,
4) The eyewall temperature anomaly results from moist adiabatic lifting of surface air aloft, in addition to subsidence warming (for stronger systems only),
5) Neither ice-phase processes nor entrainment of midlevel air into the eyewall are considered.

Figure 12. Flow chart describing the solution method for Holland MPI (Holland 1997).

______
1. Recent observations of the Saharan Air Layer's effect on TCG strongly suggest that the moisture fields may play a more prominent role in daily TCG than previously believed.

The solution, which always converges for realistic atmospheric conditions, is arrived at through an iterative process, illustrated in Figure 12. Relative humidity under the cyclone is specified at 90% in this study. The lower boundary temperature (i.e., surface temperature) is specified as 1°C cooler than the Reynolds SST. Given the initial surface conditions, the initial surface equivalent potential temperature (θE) is computed. The temperature change in the eyewall resulting from the redistribution of the moist entropy aloft is then found. Once the temperature anomaly is obtained, the resulting surface pressure fall due to the warm anomaly in the column (Hirschberg and Fritsch 1993) is computed as:

\Delta P_s = \frac{P_s}{T_v(P_s)} \int_{P_s}^{P_T} \Delta T_v \, d\ln p \qquad (8)

where the virtual temperature is defined as:

T_v = T\,(1 + 0.61q) \qquad (9)

where q = specific humidity. The upper level pressure (P_T) is chosen to be the level at which a saturated moist adiabat from the lifting condensation level in the eyewall region crosses the environmental profile. A new value of θE is then found, since it is a function of pressure, and the process is repeated until the change in surface pressure is less than 1 mb. If the total pressure drop is less than 20 mb, it is assumed that the system is weaker (less intense than a hurricane) and thus has a cloudy center; in these cases, the MPI is due entirely to the redistribution of moist entropy aloft. If the pressure drop is at least 20 mb, the system is considered to have an eye, in which case the contribution of the eye to the total pressure fall is also considered (see the Holland paper for more details). The final MPI is defined as:

\mathrm{MPI} = P_{S_{env}} - \Delta P_{S_{max}} \qquad (10)


where P_S_env is the environmental surface pressure and ΔP_S_max is the maximum achievable pressure fall. An observational study (Emanuel 2000) has shown that few tropical cyclones reach their MPI because of limiting factors, such as wind shear, feedback from ocean cooling (Khain and Ginis 1991), and eyewall replacement cycles (Willoughby et al. 1982), that damp intensification. As mentioned previously, MPI represents the combined effect of the SST and the instability of the environment. The Holland MPI is a good choice as an SST proxy because it is extremely sensitive to changes in SST: for SST values of 27°-31°C, the MPI varies from 963 mb down to 832 mb, a difference of 131 mb, or about 33 mb per °C (Holland 1997). The Holland method assumes an instantaneous response to surface changes; in reality, a rapid change in the SST will produce a somewhat slower response in the characteristics of the system. The Holland MPI also explicitly calculates the moist adiabatic lapse rate and will fail to produce any significant columnar warming if the atmosphere is not sufficiently conditionally unstable to redistribute the moist entropy aloft. Thus, this formulation of the MPI provides one convenient variable that incorporates two thermodynamic thresholds that must be met for TCG to occur. Greg Holland kindly provided the software needed to compute the MPI.
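The core of each iteration is the hydrostatic pressure fall of equation (8). The following is a minimal discretized sketch, not the Holland software itself; the argument names are hypothetical, and trapezoidal integration in ln p is assumed.

```python
import numpy as np

def hydrostatic_pressure_fall(p_levels, dTv, ps, Tv_sfc):
    """Magnitude of the surface pressure fall (mb) from a columnar warm
    anomaly, equation (8).
    p_levels: pressure levels (mb) ordered from the surface ps up to the
    outflow level PT; dTv: virtual temperature anomaly (K) at each level;
    ps: environmental surface pressure (mb); Tv_sfc: surface virtual
    temperature (K)."""
    lnp = np.log(p_levels)
    # Integral of dTv d(ln p) from ps to PT is negative for a warm
    # anomaly because ln p decreases upward; negate to get the drop.
    integral = np.trapz(dTv, lnp)
    return -(ps / Tv_sfc) * integral
```

For example, a 5 K mean warm anomaly between 1000 and 100 mb with ps = 1010 mb and Tv_sfc = 300 K gives a pressure fall of roughly 39 mb, so MPI = ps - 39 mb by equation (10).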

4.5 850 mb Moisture Divergence
A developing tropical cyclone requires the influx of warm, moist air at lower levels in order to sustain the primary circulation. If the incoming air is dry, ascent in the inner core will result in adiabatic cooling without opposition from latent heating, resulting in a stabilization of the profile and evaporation of the inner core moisture. Furthermore, modeling (Emanuel 1995) and observational (Cione et al. 2000) studies have demonstrated that when the core area of a tropical cyclone maintains near-saturation, the cool, dry downdrafts that normally accompany strong convection are absent. This allows surface fluxes to increase subcloud entropy and promotes further intensification. It is hypothesized that strong low-level transport of moisture by the primary circulation into the core region will serve as a useful proxy for maintenance of the near-saturated condition in the subcloud layer. To evaluate this factor, the 850 mb moisture divergence was computed for each cloud cluster. In theory, a highly negative value (strong convergence) indicates that the tropical cloud cluster exists in a favorable large-scale environment, with moisture-laden air being readily advected into the lower levels of the disturbance. Conversely, a strong positive value (divergence) indicates a weak primary circulation, a drier environment, or both.


The predictor was computed in Gempak (a common meteorological analysis software package) from the dew point and the u and v components of the wind field in the NNR. The moisture divergence at 850 mb is defined as:

\mathrm{MDIV} = r \, \nabla \cdot \vec{V} + \vec{V} \cdot \nabla r \qquad (11)

where \vec{V} is the total wind at 850 mb and r is the 850 mb mixing ratio, computed as:

r = \varepsilon \left( \frac{e}{850 - e} \right) \qquad (12)

where ε is the ratio of the molecular weights of water and dry air (Mv/Md = 0.62197). The vapor pressure (e) is always much less than the total pressure at a given temperature and is hence usually neglected in the denominator.² Gempak explicitly calculates the mixing ratio according to equation 12. The vapor pressure is calculated using the dewpoint (Td) of the air at 850 mb:

e = 6.112 \exp\!\left( \frac{17.67 \, T_d}{T_d + 243.5} \right) . \qquad (13)

The moisture divergence is scaled to be in units of 10⁻⁷ g (kg s)⁻¹.

______
2. In tropical environments, the vapor pressure (e) can be more than 20 mb. This is too large to be discounted.
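A minimal sketch of the predictor chain of equations (11)-(13) follows. Gempak performs the equivalent calculations internally; the field names and grid spacing here are hypothetical.

```python
import numpy as np

def vapor_pressure(td):
    """Vapor pressure (mb) from the 850 mb dewpoint td (deg C), equation (13)."""
    return 6.112 * np.exp(17.67 * td / (td + 243.5))

def mixing_ratio_850(td):
    """850 mb mixing ratio (kg/kg), equation (12)."""
    e = vapor_pressure(td)
    return 0.62197 * e / (850.0 - e)

def moisture_divergence(u, v, r, dx, dy):
    """MDIV = r*div(V) + V.grad(r), equation (11), for 2-D fields indexed
    [lat, lon]; r in kg/kg is converted to g/kg, and the result is scaled
    to units of 1e-7 g/(kg s)."""
    r_gkg = r * 1000.0
    div = np.gradient(u, axis=1) / dx + np.gradient(v, axis=0) / dy
    adv = (u * np.gradient(r_gkg, axis=1) / dx
           + v * np.gradient(r_gkg, axis=0) / dy)
    return (r_gkg * div + adv) * 1.0e7
```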

4.6 Columnar Precipitable Water
The intensification of a cloud cluster depends on the maintenance of thermal buoyancy within the convective area. This requirement is hard to sustain if dry air is entrained into the updraft, since such an occurrence would lead to evaporative cooling and a stabilization of the sounding. However, if the surrounding environment is sufficiently moist, entrainment into the nearly saturated convective updraft is not as detrimental to the convection. Gray (1979) used seasonally averaged values of mid-tropospheric relative humidity as a part of his SGP, and noted that tropical cyclogenesis does not occur if the 500-700 mb relative humidity is less than 40%. Columnar precipitable water provides an indication of the moisture present within the troposphere. High (low) values presumably indicate favorable (unfavorable) regions for genesis. Values were extracted directly from the NNR dataset and are in units of kg/m².

4.7 24-Hour Sea Level Pressure Tendency
As mentioned in the introduction, one of the variables that forecasters routinely consider is the pressure tendency. A fall in pressure is indicative of progress towards the establishment of a warm core and spin-up of the primary and secondary circulations. Since there is a marked diurnal oscillation in surface pressure values (e.g., Dai and Wang 1999), a 24-hour pressure tendency is used to rid the data of that signal. Furthermore, in preliminary analysis it was discovered that a pressure tendency predictor added little if any predictive skill for forecast times greater than 24 hours; therefore, this predictor is only included in the 6-24 hour forecast times. Since this is a 24-hour tendency, the predictor requires pressure data back to 30-48 hours prior to genesis. Many times the data were missing due to cloud cluster locations over land areas, especially for the easterly waves in the eastern North Atlantic. Due to the rarity of developing cases in the first place, it was desirable to include as many as possible in the final analysis, including those with missing pressure tendencies. Thus, the mean pressure tendency of all of the non-missing developing cases for each respective forecast hour was substituted for the missing tendency values. This was only done for the developing cases; missing values in a non-developing case resulted in that case being left out of the final analysis. The mean 24-hour pressure tendency values for the developing cases are: -0.32 mb (6-hour forecast), -0.41 mb (12), -0.36 mb (18), and -0.19 mb (24). This procedure may create a slight bias in the DA and NN predictions, but it is assumed that a similar technique would be employed in a forecast environment. The 24-hour surface pressure tendency is simply calculated as:

\mathrm{PTEND} = P_t - P_{t-24} \qquad (14)

where Pt is the mean sea level pressure at the analysis time and Pt-24 is the mean sea level pressure 24 hours prior to that time. The calculations were made in a Lagrangian framework since it is assumed that the cluster would have moved a non-trivial distance during those times.
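The substitution of the developing-case mean for missing tendencies can be expressed compactly. A sketch follows, under the assumption that missing values are coded as NaN; the array names are hypothetical.

```python
import numpy as np

def fill_missing_ptend(ptend, developing):
    """Post-processing for equation (14): substitute the mean 24-h
    tendency of the non-missing developing cases for missing
    developing-case values (e.g., -0.41 mb at the 12-hour forecast).
    Non-developing cases with missing tendencies are dropped elsewhere.
    ptend: tendencies (mb), NaN where missing; developing: 0/1 flags."""
    out = ptend.copy()
    dev = developing.astype(bool)
    dev_mean = np.nanmean(out[dev])       # mean of non-missing DV cases
    out[np.isnan(out) & dev] = dev_mean   # fill only DV cases
    return out
```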

4.8 6-Hour Surface Relative Vorticity Tendency
Another diagnostic predictor is the 6-hour surface relative vorticity tendency. Unlike the pressure tendency, this predictor is calculated in a Eulerian framework. This assumes that the cloud cluster does not propagate far enough over a 6-hour period to significantly affect the validity of the calculation, especially over a coarse grid like the NNR. Most cloud clusters that form in the Gulf of Mexico region move very slowly or remain nearly stationary; in those cases the assumption is easily satisfied. The climatological forward speed of easterly waves is approximately 5-10 m s⁻¹ (Mozer and Zehnder 1996b), or 108-216 km per 6 hours; this distance is less than one grid-point spacing (about 278 km) in the NNR. Thus it is possible that grid points outside the immediate cloud cluster radius of influence will factor into the calculation, but any error would not be of first order. The 6-hour surface relative vorticity tendency is calculated as:

\mathrm{VTENDSFC} = \zeta_{sfc(t)} - \zeta_{sfc(t-6)} \qquad (15)

where ζ_sfc(t) is the surface relative vorticity at the analysis time and ζ_sfc(t−6) is the same quantity 6 hours earlier. This predictor was chosen as a result of interactions with NHC forecasters: "We have found that genesis seems to proceed from the low- to mid-troposphere downward, i.e. oftentimes an incipient tropical cyclone may be a cloud system that exhibits a circulation aloft, but not (initially) at the surface" (Richard Pasch, personal communication, 2000). It was chosen with the hope that the spin-up of the low-level circulation would be resolved in the NNR and thus provide an indication of future development of the cloud cluster.

4.9 6-Hour 700 mb Relative Vorticity Tendency
The 700 mb relative vorticity tendency was chosen to diagnose the increase in the primary circulation usually seen in the mid-troposphere before genesis, possibly associated with the descent of an MCV spawned by an MCS (Chen and Frank 1993). Like the surface vorticity tendency, it was calculated in a Eulerian framework:

\mathrm{VTEND700} = \zeta_{700(t)} - \zeta_{700(t-6)} \qquad (16)

where ζ_700(t) is the 700 mb relative vorticity at the analysis time and ζ_700(t−6) is the same quantity 6 hours earlier.
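Both tendencies reduce to a fixed-grid field difference followed by the 2° cluster-centered averaging used for all predictors (section 4.11). A sketch follows; the names are hypothetical, and a simple degree-space distance is assumed rather than true great-circle distance.

```python
import numpy as np

def cluster_average(field, lats, lons, clat, clon, radius_deg=2.0):
    """Average a gridded field over all NNR grid points within
    radius_deg of the cloud cluster center (clat, clon).
    lats, lons: 1-D coordinate arrays in degrees."""
    lon2d, lat2d = np.meshgrid(lons, lats)
    mask = np.hypot(lat2d - clat, lon2d - clon) <= radius_deg
    return field[mask].mean()

def vorticity_tendency(zeta_t, zeta_tm6, lats, lons, clat, clon):
    """Eulerian 6-h vorticity tendency, equations (15)-(16): the field
    difference at fixed grid points, averaged at the cluster center."""
    return cluster_average(zeta_t - zeta_tm6, lats, lons, clat, clon)
```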

4.10 Distributions of Predictors
One of the requirements of DA is that the predictors are normally distributed and independent. This requirement can be relaxed somewhat with little impact on the interpretation of the results; if normality is not met, results may not be optimal but are still valid (Tabachnick and Fidell 2001). To assess the degree to which the predictors are normally distributed, the distributions were first visually inspected. Figures 13-16 show the distribution of each predictor for the 1998-2001 cases for the 6-hour forecast period. Note that all predictors exhibit a 'bell-like' distribution, similar to a normal distribution. The COR predictor is somewhat skewed towards the right side of the distribution; as will be discussed in the following chapter, DA is able to cope with a certain degree of skewness as long as the sample size is large. There is a hint of a bimodal distribution in the PWAT predictor, but it is difficult to reach that conclusion with certainty. Distributions for other forecast hours (not shown) are very similar to those in Figures 13-16, with the one notable exception of an increase in variance ('fatter' bell shape) in the vorticity tendency predictors at longer lead times. As will be shown in the next chapter, the resulting increase in variance similarity between the vorticity tendencies and the other predictors makes the DA results at longer lead times more statistically sound. To quantify the degree to which the predictor distributions resemble a normal one, a Kolmogorov-Smirnov One-Sample Test (K-S1) was performed. The K-S1 test evaluates the null hypothesis that a sample comes from a normal distribution; it finds the largest difference between two cumulative distribution functions, one from the data being tested and one from the normal distribution. A significance value of 0.05 or greater indicates that the distribution resembles a normal distribution. Table 8 shows the results from the K-S1 test. Note that PTEND is the only predictor that resembles a normal distribution; the MPI and PWAT predictors are relatively close. All of the other predictors have distributions that are not normal-like, especially COR. It is not certain how 'relaxed' the normality requirement can be made before the results are no longer statistically valid. What is certain is that the results from the DA must be interpreted with caution.

Predictor    Abs. Difference Between Normal    Significance Value
             and Observed Dist.
COR          0.127                              0
DGP          0.058                              0
MDIV         0.060                              0
MPI          0.040                              0.011
PWAT         0.049                              0.001
PTEND        0.018                              0.696
VTENDSFC     0.089                              0
VTEND700     0.090                              0

Table 8. Results of the K-S1 test for normality applied to the predictor distributions.
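A rough equivalent of the K-S1 test can be run with SciPy. This is a sketch, not the SPSS implementation used here; it assumes the normal distribution's parameters are estimated from the sample itself, and the variable name is hypothetical.

```python
import numpy as np
from scipy import stats

def ks1_normality(x):
    """One-sample Kolmogorov-Smirnov test against a normal distribution
    with the sample's own mean and standard deviation. Returns the
    largest CDF difference (D) and the significance value; p >= 0.05 is
    read here as 'resembles a normal distribution'."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# e.g., D, p = ks1_normality(ptend)   # expect p near 0.7 per Table 8
```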


Figure 13. Frequency distributions of a) COR and b) DGP predictors from all cases in the 1998-2001 database.

Figure 14. As in Figure 13, except for a) MDIV and b) MPI.

Figure 15. As in Figure 13, except for a) PWAT and b) PTEND.

Figure 16. As in Figure 13, except for a) VTENDSFC and b) VTEND700.

Another requirement for DA is that the predictors are independent of one another. To test this, a series of inter-variable correlations was performed. All pairs of predictors had extremely low correlations with each other (correlation coefficient r < 0.20), which satisfies the requirement. There are other statistical requirements that the predictors must meet in order for DA to yield valid results; these will be discussed and evaluated in the following chapter as well.
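The independence check amounts to inspecting the pairwise correlation matrix. A minimal sketch follows, where X is a hypothetical cases-by-predictors array.

```python
import numpy as np

def max_pairwise_correlation(X):
    """Largest absolute pairwise correlation among predictor columns.
    Values below about 0.20, as found here, support the DA independence
    assumption; values above about 0.9 would instead flag collinearity
    (a concern revisited for the NN in Chapter 6)."""
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(R, k=1)     # upper triangle, no diagonal
    k = np.argmax(np.abs(R[iu]))
    return R[iu][k], (iu[0][k], iu[1][k])  # value and predictor pair
```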

4.11 Summary
This chapter presented the eight predictors that were used to develop the probabilistic statistical forecast model for TCG. Each predictor was selected not only for its potential contribution to forecasting TCG, but also for its accessibility and ease of calculation. Predictor values were calculated by averaging all NNR grid points within 2° of the center of the cloud cluster. It was shown that even though most predictors have a 'bell-like' distribution, a K-S1 test indicated that only PTEND satisfies the condition of normality required by DA. The results are still valid, though some degradation may have occurred. Another DA requirement is sufficiently satisfied, as the predictors are independent of one another. The following chapter will present a preliminary analysis of the dataset with the DA classifier.


CHAPTER 5 DISCRIMINANT ANALYSIS CLASSIFICATION

Classification of phenomena based on a set of predictors has traditionally been performed using discriminant analysis. DA fits the boundary, created from a combination of the independent predictors, that best separates the two dependent outcomes. The technique classifies a set of normally distributed, equal-variance predictors into groups by creating a "discriminant function". This function can take one of two forms, linear or quadratic. The linear form produces a decision based on linear combinations of the predictors. The quadratic form potentially has more flexibility and power, as it can produce higher order terms that fit a non-linear boundary between groups. A thorough examination of discriminant analysis is given in Tabachnick and Fidell (2001). There are two primary purposes of this chapter. First, it will introduce the DA technique and consider the statistical requirements and possible limitations of the method. Second, it will present results from a DA run on the entire 1998-2001 dataset. These results include information on the significance of the predictors and independent classification results for all cloud clusters in the dataset, including classification by month of the season. The results will show that the selected predictors and DA provide an abundance of useful information that can be applied to the neural network with the aim of improving it; namely, the importance of each predictor, the predictors' relationships to each other, and the areas of high and low classification skill. Linear discriminant analysis was chosen as the first applied technique, although it is limited by the linearity of its decision boundary. The advantages of using this method, rather than quadratic discriminant analysis or a NN, are its simplicity, its speed (one year of classification takes only a few seconds on a 500 MHz PC), and its theoretically better generalization of predictions. In addition, the procedure was readily accessible via SPSS versions 10 and 11 and easy to implement through a graphical user interface. SPSS did not support quadratic discriminant analysis, although it is a feature of the SAS statistical software package and is available through C and Fortran libraries at a significant cost. Quadratic DA is more similar to a NN in that its robust fitting power must be balanced by mitigation of over fitting.


Neural networks may at first seem like exotic methods, but they are closely related to discriminant analysis. A neural network with zero hidden nodes that uses a logistic activation function (defined in Chapter 6) is the same classifier as linear discriminant analysis (Caren Marzban Personal Communication 2002). The power of neural networks is their ability to go beyond the limitations of linearity by adding “hidden nodes”, which allow the network to fit any non-linear boundary. Hence the results presented in this chapter can be thought of as results from a linear neural network with no hidden nodes. A more thorough discussion of neural networks will follow in the next chapter.

5.1 Statistical Assumptions
Another potential limitation of discriminant analysis, one that neural networks do not share, is a set of requirements on the predictors that must be met in order for the technique to be statistically valid. Discriminant analysis is not hindered by unequal sample sizes in different groups (Tabachnick and Fidell 2001). This is fortunate, since the dataset being considered has many more non-developing cases than developing ones. As long as the sample size of the smallest group (developing clusters, of which there are at least 30 within each forecast period) exceeds the number of predictors (8), the technique retains its ability to discriminate. As will become evident in Chapter 8, the NN appears to suffer from the imbalance of the dataset. In DA, however, there are several other requirements of the dataset that must be satisfied in order for the technique to be statistically valid. These requirements can be relaxed in many cases without severely degrading the classification (Tabachnick and Fidell 2001).

5.1.1 Normality
First, the predictors and linear combinations of them are assumed to be normally distributed and independent. Discriminant analysis handles skewness of the distribution better than outliers. As shown in the previous chapter, seven of the eight predictors did not pass the Kolmogorov-Smirnov One-Sample Test for normality, which suggests that the DA results are not optimal. There is no convenient statistical test to assess the normality of the linear combination of predictors.

5.1.2 Homogeneity of the Variance-Covariance Matrices
The variance-covariance matrices of the predictors should be homogeneous across the groups, though violations of this assumption do not necessarily imply a degradation in the power of the discriminant analysis, especially for large sample sizes (Tabachnick and Fidell 2001). Box's M Test is frequently used to test this assumption. Table 9 shows the log determinants and Box's M test for each forecast hour. Log determinants indicate the magnitude of the variance of the covariance matrix; a larger value can be interpreted as a higher-variance matrix. It is desired that the magnitudes of the log determinants for the developing (DV), non-developing (ND), and pooled within-groups categories be similar. This is indeed the case for forecast hours 18-48; at the two earliest forecast periods, however, the log determinants of the developing cases are grossly different from those of the ND group. The 'Sig.' row tests the null hypothesis of equal population covariance matrices; a value of .000 indicates that the hypothesis is rejected, i.e., the developing covariance matrices differ significantly from the non-developing ones. For the 18-48 hour forecast periods there are some similarities between the variance-covariance matrices, indicating that the assumption is not as grossly violated as in the earlier forecast periods. This test corroborates the conclusion from the normality test that the DA results should be viewed with caution.

Log Determinants          6        12        18        24        30        36        42        48
ND                     -3.453    -3.453    -3.453    -3.453    -2.242    -2.242    -2.242    -2.242
DV                      0.637    -0.920    -2.972    -3.938    -2.876    -2.903    -3.387    -3.655
Pooled within groups   -3.100    -3.314    -3.385    -3.425    -2.235    -2.235    -2.240    -2.247

Box's M               294.022    87.242    68.832    47.604    40.824    42.075    45.140    33.336
F Approx.               7.469     2.210     1.726     1.184     1.361     1.400     1.492     1.093
df1                        36        36        36        36        28        28        28        28
df2                 11452.893 10787.233  8910.690  7759.844 14668.184 13905.549 11740.129  9758.304
Sig.                     .000      .000      .004      .208      .097      .078      .046      .336

Table 9. Values of the tests for variance-covariance homogeneity for each forecast hour.

5.1.3 Linearity

Linear discriminant analysis implies that the dependent variable (DV or ND) can be classified based on a linear combination of the predictors. Since it is possible that there are non-linear relationships among the independent variables, this assumption could potentially limit the classification skill of the analysis.


5.2 Procedure
The discriminant analysis procedure in SPSS was run for each forecast period (eight times) on the entire dataset described in Chapter 3. The dependent variable is the development status, which has a value of '0' or '1' for a ND or DV cluster. The independent variables, or predictors, were the eight variables described in Chapter 4 (COR, DGP, MDIV, MPI, PWAT, PTEND (6-24 hours only), VTENDSFC, and VTEND700). SPSS allows one to choose whether each group has a prior probability of occurrence associated with it or whether each case has an equal chance of belonging to either group. All of the results presented here were created from the premise that each case had an equal chance of belonging to one of the groups. In rare-event situations, probabilistic forecasts for the event are usually quite low if one applies a priori probability criteria (Caren Marzban Personal Communication 2002). An example of this can be seen in Chapter 8, where the NN is shown to have the best classification skill when a decision boundary is chosen to be around P = 0.25, meaning that most probabilistic forecasts were less than this value. In similar plots, the DA algorithm exhibited decision boundaries in the area of P = 0.80. This is due almost entirely to the equal prior probability assumption. Ultimately this does not affect the value of the forecasts, since they are subsequently scaled with relation to the decision boundary. If there were missing values in any of the predictors for a trial, that trial was excluded from the analysis. Finally, a 'leave-one-out' classification option was invoked for the analysis of the 1998-2000 seasons (the 2001 season was not processed at that time), whereby a discriminant function was derived from all of the cases except one and the left-out case was then classified as an independent trial against the derived function. This procedure is repeated for all of the cases in the dataset, thereby yielding a complete set of independently calculated results (known commonly as 'cross-validation' or the 'jackknife').
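The SPSS procedure is not reproduced here, but an equivalent leave-one-out linear DA with equal group priors can be sketched with scikit-learn; the array names are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def loo_da_probabilities(X, y):
    """Cross-validated ('jackknife') linear DA: for each case, fit the
    discriminant function on all other cases using equal group priors,
    then return the held-out probability of development.
    X: numpy array, cases x predictors (rows with missing values
    already dropped); y: 0 (ND) or 1 (DV)."""
    probs = np.empty(len(y), dtype=float)
    for train, test in LeaveOneOut().split(X):
        lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5])
        lda.fit(X[train], y[train])
        probs[test] = lda.predict_proba(X[test])[:, 1]
    return probs
```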

5.3 Results of Preliminary Classifications
This section presents the results of the preliminary DA classifications on the 1998-2001 seasons. The purpose of the preliminary runs is to assess the feasibility of the prediction scheme, to identify where the model performs well and poorly, and to obtain other relevant information (such as the significance of the predictors) that is harder to obtain with a NN classifier. This dataset is the source for the DA and NN case studies presented in Chapter 8.

5.3.1 Significance of Predictors
By examining the correlation between each predictor and the predicted outcome (the 'discriminant function loadings'), one can gauge the significance of each predictor. Table 10 shows the loadings for all forecast hours. Higher values indicate that a predictor is more important in determining group membership and hence provides a greater degree of separation between the developing and non-developing groups. For all forecast hours, the DGP was by far the most significant predictor, followed on average by COR (latitude). This was expected (though perhaps to a lesser degree) and re-emphasizes the importance of the large-scale dynamics to TCG. The weakest predictors overall were the 700 mb vorticity tendency (VTEND700), MPI, and PWAT. The vorticity tendency signal was relatively weak in most instances and showed little differentiation (as did PWAT) between DV and ND clusters. This result emphasizes the need to consider the vertical change of vorticity, as in the DGP, rather than simply the lower-level vorticity when assessing the potential for TCG. In fact, the correlation was negative at a couple of the forecast times, indicating that little confidence should be given to them. The significance of the MPI appears to be reduced by early and late season TCG events that occurred over marginal SSTs; this will be discussed further below. In summary, the results in Table 10 seem to confirm, as shown in McBride and Zehr (1981), that for daily tropical cyclogenesis prediction the large-scale wind field is the most skillful predictor.

Hour          6         12        18        24        30        36        42        48     Avg. Rank
DGP        .864 (1)  .815 (1)  .813 (1)  .781 (1)  .830 (1)  .859 (1)  .860 (1)  .880 (1)     1.0
COR        .279 (3)  .343 (3)  .323 (2)  .260 (3)  .495 (2)  .424 (2)  .432 (2)  .360 (2)     2.4
PTEND     -.211 (5) -.356 (2) -.286 (3) -.197 (5)    N/A       N/A       N/A       N/A        3.7
MDIV      -.232 (4) -.130 (5) -.281 (4) -.415 (2) -.201 (3) -.094 (5) -.100 (4) -.128 (5)     4.0
VTENDSFC   .307 (2)  .129 (6)  .194 (5)  .231 (4)  .030 (6) -.056 (6)  .100 (4) -.079 (7)     5.0
VTEND700  -.012 (8)  .242 (4)  .017 (8)  .189 (6)  .136 (5)  .259 (3)  .057 (7) -.194 (3)     5.5
MPI       -.260 (7) -.075 (8) -.160 (6) -.097 (7) -.145 (4) -.110 (4) -.263 (3) -.116 (6)     5.6
PWAT       .209 (6)  .111 (7)  .130 (7)  .068 (8)  .002 (7)  .016 (7) -.076 (6) -.131 (4)     6.5

Table 10. Discriminant function loadings for each forecast hour. Rank is given in parentheses for each time period.


5.3.2 Individual Case Classification
In addition to predicting group membership, DA also gives the probability of each case belonging to that group. To evaluate the performance and usefulness of the predictions, all cases were stratified into five bins by this probability value. Table 11 lists each bin, the number of cases that were grouped in that bin by forecast hour, and how many of those actually developed into a tropical depression. For example, bin 1 lists all of the cases where the discriminant analysis procedure predicted DV group membership with a probability of 0.90 or higher. For the 24-hour forecast period, 24 cases met these criteria; of those, 10 developed into tropical depressions 24 hours later, a development percentage of 41.7%. For the same time period but lower confidence classifications, the development percentage drops from 21.4% (bin 2) to 1.1% (bin 5). Table 11 reveals an interesting aspect of the classification results. The formation rate of tropical depressions is still less than 50% even when the large-scale environment is especially favorable (corresponding to probabilistic forecasts P > 0.90). If one were to rely on these results alone, the false alarm ratio (FAR) would be far too great for a useful forecast. But if the probability of development drops below 0.7 (as it does for over 90% of all cases), TCG almost never occurs, and the forecaster could make a rather confident forecast in this situation. Given the development percentage values in Table 11, it is possible to categorize the development likelihood given the probabilistic prediction of development by the DA procedure. This is shown in the far right column of Table 11. If the probability of development P >= 0.90, the cluster is labeled as having a 'good' chance of developing. If 0.80 <= P < 0.90, the cluster has a 'fair' chance of developing. If P < 0.70, development is unlikely to extremely unlikely. Admittedly, this is a rather ad hoc way of deciding where the decision boundary between developing and non-developing systems should be. A far better method, and the one used in Chapter 7, is to cycle through all possible decision boundaries (between P = 0 and P = 1), calculate some statistical score(s) at each one, and pick the boundary with the highest score (a sketch of this sweep follows equation (18) below). As will be shown in Chapter 7, the optimal boundary varies not only by classifier but by forecast hour as well. But for the purposes of this pilot study, it was sufficient to use P = 0.7 as the decision boundary.

Bin 1 (P >= 0.9)
Forecast Hour   Number of Cases   Developed   DV Percentage   DV Likelihood
      6               36              16          44.4%       Good
     12               27              12          44.4%
     18               30              14          46.7%
     24               24              10          41.7%
     30               29              10          34.5%
     36               37               9          24.3%
     42               26               8          30.8%
     48               18               1           5.6%

Bin 2 (0.8 <= P < 0.9)
      6               16               6          37.5%       Fair
     12               36               7          19.4%
     18               26               6          23.1%
     24               28               6          21.4%
     30               76               7           9.2%
     36               57              10          17.5%
     42               44               3           6.8%
     48               49               6          12.2%

Bin 3 (0.7 <= P < 0.8)
      6               18               4          22.2%       Unlikely
     12               43               2           4.7%
     18               30               1           3.3%
     24               42               0           0%
     30               92               6           6.5%
     36               85               3           3.5%
     42              124               7           5.6%
     48              115               5           4.3%

Bin 4 (0.5 <= P < 0.7)
      6               50               1           2.0%       Very unlikely
     12              106               3           2.8%
     18               93               3           3.2%
     24              144               2           1.4%
     30              326               7           2.1%
     36              284               7           2.5%
     42              385               7           1.8%
     48              471               9           1.9%

Bin 5 (P < 0.5)
      6             1108               8           0.7%       Extremely unlikely
     12             1015              10           1.0%
     18             1045               7           0.7%
     24              984              11           1.1%
     30             1781               9           0.5%
     36             1840               9           0.5%
     42             1721              10           0.6%
     48             1643              11           0.7%

Table 11. Development rate by probabilistic prediction and forecast hour (1998-2000).

5.3.3 Statistical Measures of Skill
Threat and Brier scores were computed for the composite dataset to assess the statistical power of the DA classification. The threat score, typically used to evaluate quantitative precipitation forecasts, is calculated as:

\mathrm{THREAT} = \frac{C}{A + B - C} \qquad (17)

where A = number of DV forecasts made, B = number of DV cases observed, and C = number of correct DV forecasts. A threat score of 1 is a perfect score for a suite of forecasts; a score of 0.5 or higher is considered a highly skilled forecast. To compute threat scores, we defined a DV forecast as one in which the probability of DV was at least 0.7 (similar to Perrone and Lowe, who used P >= 0.65). Figure 17 shows the threat scores for the 6-48 hour forecast periods. They range from 0.329 at 6 hours to 0.059 at 48 hours. These values are significantly lower than the scores computed by Perrone and Lowe (1986). This may be attributed to differences in the methodologies of the two studies and will be discussed further in section 5.4.

Figure 17. Threat (solid) and Brier (dotted) scores by forecast hour for 1998-2001 data. A decision boundary of P = 0.7 was used to calculate these scores.

The Brier Score (Brier 1950) is a measure of skill for a probabilistic forecasting system. It is essentially the mean of the squared probability errors:


BS = \frac{1}{N} \sum_{i=1}^{N} \left( f_i - O_i \right)^2 \qquad (18)

where N = the number of forecasts, f_i = the forecast probability of the occurrence of the i-th event, and O_i = the observed value of the event (1 = DV, 0 = ND). The lower (higher) the Brier Score, the more (less) skillful the probabilistic forecast. A Brier Score of 0 indicates a perfect suite of forecasts. The Brier Scores for each forecast period are shown in Figure 17. They are of similar magnitude to those of Perrone and Lowe and indicate some degree of value in the probabilistic forecast method derived here.
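For reference, equations (17) and (18) and the decision-boundary sweep mentioned in section 5.3.2 can be sketched as follows; the probability and outcome arrays are hypothetical.

```python
import numpy as np

def threat_score(forecast_dv, observed_dv):
    """Threat score, equation (17): C / (A + B - C), with A = DV
    forecasts made, B = DV cases observed, C = correct DV forecasts.
    Inputs are boolean arrays (forecasts use, e.g., P >= 0.7)."""
    A = forecast_dv.sum()
    B = observed_dv.sum()
    C = (forecast_dv & observed_dv).sum()
    return C / float(A + B - C)

def brier_score(probs, observed):
    """Brier score, equation (18): mean squared probability error
    (0 is a perfect suite of forecasts)."""
    return np.mean((probs - observed) ** 2)

def best_boundary(probs, observed, n=101):
    """Sweep decision boundaries between P = 0 and P = 1 and return the
    one that maximizes the threat score, as done in Chapter 7."""
    obs = observed.astype(bool)
    bounds = np.linspace(0.0, 1.0, n)
    scores = [threat_score(probs >= b, obs) for b in bounds]
    k = int(np.argmax(scores))
    return bounds[k], scores[k]
```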

5.3.4 Classification by Month of Season
A good deal of information about the behavior of the classification algorithm can be extracted by examining the performance of the system at different periods of the hurricane season. To accomplish this task, it was necessary to develop a quantitative climatology of development by month of the year. The composite dataset from 1998-2001 was examined and all cloud clusters were separated by the month in which they first formed; they were then further stratified by their eventual development status, either DV or ND. The results of this exercise are summarized in Figure 18. As expected, the largest percentage of cloud clusters developed into tropical depressions in September, though June actually spawned more cloud clusters overall. Late season cyclogenesis was more common than early season cyclogenesis, even though there were fewer candidate cloud clusters during that period. The plotted line in Figure 18 represents the development percentage by month, or the total number of depressions divided by the total number of clusters for that month. These percentages serve as the climatological values for comparison with the performance of the DA classification.

Figure 18. Frequency of cloud cluster and tropical depression development by month.

Next, classification results from the DA were examined in the following manner. For each cluster, each 6-hour period is either classified correctly or misclassified. A correct classification is one in which the DA forecast matches the observation, using a decision threshold of P = 0.7; that is, the DA forecast is '1' if the probability forecast is equal to or greater than 0.7 and '0' otherwise. If all times for the cluster are correctly classified, the cluster is considered a 'success'; if one or more times are misclassified, the cluster is labeled a 'miss'. The number of successes and misses was then tallied by month. The results are shown in Figure 19. For the prime development season (Aug-Oct) the DA is superior to climatology at all forecast times, and for the month of July the DA is comparable with climatology. However, the DA falls well short of the climatological skill early (June) and late (November) in the season. For the June cases, the false alarm ratio (FAR, defined in section 7.2) was very high (30-35% of all June clusters); the clusters that were falsely forecast to develop exhibited a favorable shear (DGP) and moisture (PWAT and MDIV) environment. Most November cases that were false alarms were located at higher latitudes (favorable COR) and exhibited strong vorticity spin-up (VTENDSFC and VTEND700) and pressure falls (PTEND). Clearly, the DA is failing to find a significant predictive signal for the early and late season systems. The rarity of developing systems during those months compared to the prime development season is one factor. The June results also suggest that the DA assigns too much weight to the DGP predictor, and the November results imply that the DA does not handle the higher latitude systems as well. These issues will be explored further in the DA and NN analyses in Chapters 7 and 8.


Figure 19. Percentages of cloud clusters correctly classified by month. The dark solid line without points is the climatology boundary shown in Figure 18.

5.4 Discussion
Perrone and Lowe's tropical cyclogenesis work with Pacific cloud clusters (Perrone and Lowe 1986) has many parallels to this research, but there are several crucial differences that give rise to significantly different results, especially in the statistical scores. First, they predicted tropical storm formation. It is believed that this improved their classification rate, since the differences between developing and non-developing clusters are likely to be greater during the time a cluster is a tropical depression. In this study, TCG is assumed to occur when a cluster makes the transition to a tropical depression; the transition from a tropical depression to a tropical storm is considered the intensification of a pre-existing tropical cyclone. The smaller differences between developing and non-developing tropical disturbances make statistical differentiation more challenging, but they form the core of the forecast problem. Second, Perrone and Lowe used a more relaxed procedure for the selection of their ND cases. They had a size requirement of only 1° diameter, four times smaller than the requirement in this study. In addition, it is not clear whether their clusters met a persistence requirement, nor was there any mention of cloud structures that may have been associated with other systems. If a "cloud cluster" is defined as a convective entity that satisfies the condition of "pre-existing convection" for tropical cyclogenesis, it is quite possible that Perrone and Lowe included clusters in their dataset that in fact were not cloud clusters at all. Although midget tropical cyclones may form from small clusters in the western Pacific (Lander 1994), tropical cyclogenesis normally occurs from larger clusters over the Atlantic. The selection criteria of a 4° diameter and persistence for 24 hours eliminate smaller, more transient features that have little likelihood of developing into a tropical depression. Since DA works best when there is a high degree of separation between predictors, we think that the inclusion of smaller, transient cloud features would artificially inflate the skill scores. Third, they chose some predictors that are important from a climatological point of view rather than those with a day-to-day focus. This may have hurt their model performance, as several of their predictors are not necessarily the best ones for daily genesis studies. For example, vertical wind shear is highly correlated with tropical cyclone formation on a seasonal scale, but as has been shown in several studies (e.g. McBride and Zehr 1981), tropical cyclones frequently form in high shear environments; the key is that there is near-zero shear over the cluster itself. The importance of the DGP in this study clearly validates this statement.

5.5 Summary
This chapter introduced DA and described a pilot study that was undertaken to assess the validity of the research model. Statistical scores and a comparison to climatology showed that the statistical model has predictive skill out to at least 48 hours, especially during the prime development season of August-October. Discriminant function loadings showed that the DGP predictor dominated the others in terms of importance; this factor may hinder predictions of TCG during June, when a favorable DGP coincides with a marginal thermodynamic environment. A stricter definition of a 'cloud cluster' and a different choice of predictors resulted in a significant departure in statistical performance from the only other previous study of TCG. It is thought that better performance would be realized if: 1) more developing cases were available, especially during June and November; and 2) the classifier were able to resolve non-linear interactions in the predictors. The following chapters will explore the second possibility; only more cloud cluster tracking in the future will address the first.


CHAPTER 6 NEURAL NETWORK BACKGROUND, ARCHITECTURE, AND TRAINING

Neural networks (NNs) are a statistical means of relating a set of independent variables ("predictors") to a set of dependent variables ("targets") in a non-linear fashion. Their design and function are meant to mimic the activities of neurons in the human brain: each neuron accepts input from a preceding layer of neurons and emits its own signal based on the nature of the input. The most common network designs employ at least three "layers" of neurons: an input layer (the predictors), an output layer (the targets), and a "hidden" layer found between the two. It is this hidden layer that gives the neural network the ability to model non-linear processes; training minimizes an error function (usually the mean square error) between the network output and the targets by finding the optimal "weights". Weights are analogous to synapses in a biological brain. There are many different species of NNs, the most common being the feed-forward backpropagation NN. "Feed forward" describes the direction that the information flows through the network during training (from the input layer, through the hidden layer, and exiting through the output layer). "Backpropagation" refers to the method used to find the minimum error of the network. It is an iterative process in which the initial weights are adjusted through a simplified steepest descent algorithm:

\Delta W = -\eta \frac{\partial E}{\partial W} \qquad (19)

where W is any weight, E the error function, η the learning rate (specified by the user), and ΔW the weight increment. There are many other methods for finding the optimal weights, such as gradient descent, conjugate gradient, simulated annealing, and genetic algorithms. Descriptions of many of these methods and others can be found in DeMuth and Beale (2001).
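A minimal illustration of equation (19) for a three-layer network with a tanh hidden layer and a logistic output follows. This is a generic sketch, not the network used in this study; biases are omitted, a squared-error cost is assumed, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation function
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, z, W1, W2, eta=0.1):
    """One feed-forward/backpropagation update of equation (19) for a
    three-layer network with error E = 0.5*(y - z)^2.
    x: input vector (n_in,); z: 0/1 target;
    W1: hidden weights (n_hidden, n_in); W2: output weights (1, n_hidden)."""
    # Feed forward
    h = np.tanh(W1 @ x)                              # hidden activations
    y = sigmoid(W2 @ h)[0]                           # scalar output
    # Backpropagate the error
    delta_out = (y - z) * y * (1.0 - y)              # dE/d(output net input)
    grad_W2 = delta_out * h[np.newaxis, :]           # dE/dW2
    delta_h = delta_out * W2.ravel() * (1.0 - h**2)  # dE/d(hidden net input)
    grad_W1 = np.outer(delta_h, x)                   # dE/dW1
    # Apply Delta W = -eta * dE/dW, equation (19)
    return W1 - eta * grad_W1, W2 - eta * grad_W2, y
```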

6.1 Components of a Neural Network
The anatomy of the neural network (NN) employed in this research is illustrated in Figure 20. The network consists of three layers: the input layer, a "hidden" layer, and the output layer. The input layer contains the predictor information for each of the eight independent variables described in Chapter 4. Each predictor was pre-processed to check for collinearity. In addition, the predictors were normalized such that all values were transformed to take on a value between 0 and 1. More information on the pre-processing of the independent variables is presented later in this chapter. The second layer contains the neurons that make up the hidden layer. The job of the hidden layer is to apply an "activation function" to the inputs presented to it from the input layer and then present the output to the output layer. An activation function is non-linear, bounded, and monotonically increasing. The output layer absorbs input from the hidden layer, calculates the error using a user-specified function, and initiates another iteration of training if the error goal (desired error minimum) is not met.

Figure 20. Schematic diagram of a common neural network architecture (input, hidden, and output layers). Each layer is connected by a series of "weights" (solid lines), which are optimized by the training process.


6.2 Applications of Neural Networks in the Atmospheric Sciences
Though the field of artificial intelligence and neural networks is still developing, there have been several successful applications of NNs in the atmospheric sciences, despite some inherent limitations in applying them to several types of meteorological data (Hsieh and Tang 1998). The classification of cloud types from satellite imagery is a difficult problem and has traditionally been addressed using heuristic decision trees. Bankert (1994) showed that a probabilistic neural network, trained on cloud scenes manually classified by 'experts', achieved a success rate (i.e., correct classification) of 78%. Marzban and Stumpf (1996) applied a NN to the problem of tornado prediction. Twenty-three variables culled from Doppler radar data were used as inputs, and a feed-forward network was trained. A comparison of the Heidke skill score (Doswell et al. 1990) and the critical success index (Donaldson et al. 1975) showed that their NN outperformed both discriminant analysis and the expert system in place at the National Severe Storms Laboratory by a significant margin. Hall et al. (1999) attempted to improve localized precipitation forecasts through the use of a NN. Applying numerical model output (ETA) and upper air soundings that encompassed 19 meteorological variables as predictors, two networks were trained to predict the probability of precipitation (POP) and the 24-hour quantitative precipitation for Dallas, TX. Their POP network exhibited remarkable skill: rain occurred on only 1 day out of 436 in which the forecasted POP was < 5%, and on days during the study period when the network forecasted a greater than 95% chance of rain, rain always occurred. The amount of precipitation forecasted was also highly correlated with the actual measured rainfall (R = 0.95 (0.86) for the cool (warm) season). Another area in which NNs could potentially have a large impact is the diagnosis of rain rate based on cloud characteristics sensed by satellites. Bellerby et al. (2000) trained a network with two hidden layers (300 neurons in total) and 45 predictors derived from TRMM and GOES data. Their two-layer network outperformed both a one-layer network and linear regression for both of the time periods tested.

6.3 Training of the Neural Network
A network is considered "trained" when the error meets or drops below a goal specified by the user. Figure 21 illustrates a flow chart describing the training process. The independent variables (X) and their associated targets (Z) are presented to the network, either all at once (called "batch training") or in sequential fashion; it will be assumed here that the example NN is trained sequentially. Using the initial weights, the hidden layer of the NN calculates the output parameter, which is a function of the input, weights, and biases. The network then computes the error (E) by comparing the NN output to the target. If the error meets the goal (ε) established prior to training, the training process ends. Otherwise, the network tries to minimize the error by adjusting the weights and spawning another iteration (called an "epoch"). In this particular example, the network uses backpropagation (BP) to minimize the error, though one may choose other algorithms such as those mentioned earlier. After the adjustments are made to the weights, another epoch is performed with the new weights. The number of epochs the NN performs to reach the desired error goal depends on many factors, including the strength of the signal in the data, the choice of network architecture (including the number of hidden layers and neurons), the existence of a weight penalty term to reduce over fitting (see next section), the choice of initial weights, the choice of transformation and error functions, and the training algorithm. Training can be the most time consuming aspect of applying a NN to data (Vladimir Krasnopolsky Personal Communication 2002), but this hindrance is offset by the rapidity of applying new data to a trained network: once a network is trained, results on an independent set of data are obtained nearly instantaneously.

Figure 21. Flow chart showing the steps taken during the training process of a NN.


6.4 Overfitting in Neural Networks
As with any statistical procedure, there are both advantages and disadvantages in applying NNs to a dataset. One of the most powerful aspects of NNs is their ability to extract predictive signals from noisy predictor data. Through the specification of the number of hidden layers and neurons, the user has the ability to model any continuous function. The NN is fault tolerant, handling the existence of outliers and contradictory signals well. But this attribute of NNs also leads to one of their more common faults: the over fitting of the training data. In essence, the NN learns the noise and the signal is lost. Hence, one may obtain highly predictive results on the training data set but find no skill when a new, independent set of predictors is applied to the network. As the non-linearity of an analysis scheme increases, the chance that it will over fit the input dataset also increases. The middle panel of Figure 22 illustrates an example of a non-linear regression technique that learned the noise; the application of such a model to an independent set of data would yield very poor results. The problem is alleviated in panel c, as the exponential curve there generalizes the signal well. In NNs, the degree of linearity of the model is controlled by the weight terms (ω) and the number of hidden layers. It follows that these two factors are responsible for the minimization of the error between the training data and the output data. If there are too many hidden layers or the weights grow too large, the NN can learn any continuous function, which may result in over fitting. As the weights decrease and the number of hidden layers goes to zero, the NN becomes more linear and tends to under fit the data. Hence, a crucial consideration in formulating a neural network for a particular dataset is determining the optimal number of weights and hidden layers that will yield the best predictions on an independent data set. There are several strategies employed to accomplish this. The first method is to add a "weight decay term" to the error function that forces unimportant weights to zero and prevents other weights from becoming too large. Consider a network that employs the mean square error (MSE) function to determine the optimal weights. In a NN that does not utilize a weight decay term, the cost function is simply equal to the MSE:

E = \mathrm{MSE} . \qquad (20)

In a NN that employs a weight penalty:

E = \mathrm{MSE} + \sum \omega^2 \qquad (21)

72

Figure 22. Hypothetical data illustrating generalization skill. The model output plotted in (b) has over fit the data (a), while the output in (c) shows much better generalization.

73

where the second term artificially adds error to the result, keeping the weights relatively small compared with what they would have been without the term. Finoff et al. (1993) provide a good summary of techniques that are used to minimize the weights of the NN. Another strategy employed to prevent over fitting is called "early stopping": during the training process, the user at some point terminates the network learning before a convergent solution is found. This prevents the NN from finding a global error minimum, and hence possibly over fitting the data. Imagine that panel c of Figure 22 is a network that was stopped early, while panel b is a network that was allowed to converge completely on a solution and thus over fit the data. The network used in this research employs an adaptive regularization technique to prevent over fitting; this technique is discussed in more detail in the following section.
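Both strategies are easy to express in code. The following sketch is generic, not the 'nc_binclass' implementation; equation (21) as written carries no explicit coefficient, so the 'decay' factor scaling the penalty here is an assumption, and the callback names are hypothetical.

```python
import numpy as np

def penalized_cost(y_pred, z, weight_arrays, decay=0.01):
    """Equation (21): MSE plus a sum-of-squared-weights penalty that
    discourages large weights (the 'decay' scale is an assumption)."""
    mse = np.mean((np.asarray(y_pred) - np.asarray(z)) ** 2)
    penalty = sum(np.sum(W ** 2) for W in weight_arrays)
    return mse + decay * penalty

def train_with_early_stopping(train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    """Early stopping: halt when the validation error has not improved
    for 'patience' consecutive epochs, i.e., before full convergence on
    the training set (and hence before the noise is learned)."""
    best, stale = np.inf, 0
    for _ in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```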

6.5 Neural Network Description and Architecture
This study uses a binary classification network developed at the Technical University of Denmark (Kolenda et al. 2002). The network, called 'nc_binclass' (version 1.0), was downloaded from their website (Denmark 2002) and set up to run in Matlab. The network was specifically designed to output the probability of a candidate belonging to the developing class (class 1). It is a three-layer, feed-forward backpropagation neural network with one hidden layer and one output node contained in the output layer. The hyperbolic tangent (logistic) activation function is used in the hidden (output) layer. The use of cross-entropy as the cost function for weight optimization, in conjunction with the logistic activation function, ensures that the network output can be interpreted as a posteriori probabilities, as shown in Richard and Lippmann (1991). To mitigate over fitting, the 'nc_binclass' network employs regularization. As stated earlier, regularization adds a term to the network error that in effect discourages network mappings (weight configurations) that are not smooth. A good discussion of regularization theory can be found in Bishop (1995). This network uses an adaptive Bayesian regularization technique developed by MacKay (MacKay 1992a; 1992b). Though a thorough discussion of Bayesian methods and regularization theory is beyond the scope of this paper, MacKay's scheme essentially seeks to find the most probable weights and regularization parameters (α) that maximize the fit of the training data set. This is similar to conventional training techniques, except that MacKay then approximates a probability distribution of weights surrounding this maximum to yield information about the certainty of the predictions. The regularization parameter α effectively keeps the weights from growing too large (which leads to increasing non-linearity and over fitting) or too small (which limits the predictive capabilities of the network). It can be set manually for testing purposes, effectively disabling the MacKay regularization, or it can be computed by the regularization procedure.

74

There is a limitation of the 'nc_binclass' network that potentially restricts its power and flexibility: there is only one regularization parameter for all of the weights in the network, rather than one parameter for each weight. There have been documented problems with this type of implementation (see sections 9.2.2 and 10.1.6 of Bishop 1995); namely, there are "inconsistencies with known scaling properties of network mappings". The tangible result is that the network will favor some solutions over others that may be better (Caren Marzban Personal Communication 2002). This could limit the predictive skill of the network, since it is not considering all possible solutions. Evidence for this symptom can be seen in the bootstrapping trials presented in section 6.6: the network converges on just a few local minima for many different trials with differing initial weights and dataset partitionings (resulting in fewer scatter plot points). If there were one regularization parameter for each weight, each trial would theoretically result in a different error value, allowing for a more exhaustive evaluation of network performance. However, it is believed that this problem is relatively minor and that the NN will still perform with skill despite the limitations imposed upon it. The network consists of a series of Matlab subroutines that allow for simple training. Figure 23 is a schematic diagram of the components of the neural network and how they are related to each other. The following subsections give a brief description of each component, in chronological order, with references.

6.5.1 Network Inputs

The user specifies six parameters as inputs to the neural network: training data, training targets, validation data, validation targets, the number of hidden nodes, and a random seed. The training data are the set of predictors used to 'train' the network (converge on a set of optimal weights that minimize the error function). The training targets are the corresponding binary values (0 or 1) for each case in the training data. After the network completes training, the validation data and targets are independently classified and evaluated. The number of hidden nodes refers to the number of nodes in the hidden layer (the layer between the input nodes and the output node). As a general rule, more hidden nodes mean longer training times, lower training error, and higher validation error. Theoretically it is possible to train a neural network to produce zero error on the training data, but such a network would be useless, as it would be inept at classifying new data. This issue is discussed further in section 6.6.


[Figure 23 (flowchart): Model Inputs (6.5.1) → Normalization of Input Data (6.5.2) → Initialize Network Weights (6.5.3) → Train Network (6.5.4) → Network Error Calculations (6.5.5) → Employ Weight Regularization/Penalty (6.5.6) → Network Convergence? (6.5.7) → if NO, return to training; if YES → Evaluate Output (6.5.8)]

Figure 23. Schematic diagram of the components and flow of the binary neural network classifier. The numbers correspond to the sections of this chapter that address each component.


Finally, the 'random seed' refers to a value that is used to randomly initialize the weights. This is an important parameter that is exploited to build a neural network that generalizes well. It is discussed in more detail in section 6.5.3.

6.5.2 Normalization of the Predictor Dataset

Before the predictors for the training and test datasets are shown to the NN, they are normalized. To perform normalization, all input data are converted into z-scores of the form

    z = \frac{x - \mu}{\sigma}                (22)

where µ is the mean and σ the standard deviation of the entire predictor dataset. There are several good reasons for computing z-scores before the network is trained. First, it avoids the confusion that may be created within the network by feeding it variables with different units and magnitudes; normalization allows the network to treat each variable with equal consideration. Second, there is some empirical evidence that normalization can help stabilize training, i.e. help the backpropagation training algorithm converge to a good solution more often (Caren Marzban Personal Communication 2002). Finally, there is a good chance that the hidden nodes will "saturate" if normalization is not performed. This means that the activation function will quickly proceed to its asymptotic values (0 or 1 for the logistic function) if the inputs are too large or too small. As training proceeds, the inputs into the hidden layer would quickly be driven to 0 or 1 as the weight adjustments are made; clearly an undesirable outcome. Normalization reduces the chance of this occurring.

It should be noted here that another concern that may affect the speed of the training process is collinearity within the predictor dataset. Collinearity exists when two or more of the predictors are highly correlated with each other (R > 0.9). In general, NNs designed for prediction are not adversely affected by collinear inputs (Marzban 2000). However, each predictor that is shown to the network creates a whole set of weights that must be estimated, which increases the chance that the NN will overfit the data. Before the data were normalized, each of the eight predictors used in this study was examined for collinearity with the others. None of the predictors was found to have a significantly high correlation with any other predictor; hence, none were eliminated from the training session.
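A minimal sketch of this preprocessing follows: z-score normalization per equation (22) and a collinearity screen flagging pairs with |R| > 0.9. The array shapes and variable names are illustrative, not taken from the study's code.

```python
import numpy as np

def zscore(X):
    """Convert each predictor (column) to z-scores, as in equation (22)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def collinear_pairs(X, r_max=0.9):
    """Return predictor index pairs whose correlation magnitude exceeds r_max."""
    R = np.corrcoef(X, rowvar=False)          # predictor-by-predictor correlations
    i, j = np.triu_indices_from(R, k=1)       # upper triangle, no diagonal
    return [(a, b, R[a, b]) for a, b in zip(i, j) if abs(R[a, b]) > r_max]

X = np.random.default_rng(0).standard_normal((432, 8))  # e.g. 432 clusters, 8 predictors
Xz = zscore(X)
print(collinear_pairs(X))   # no highly correlated pairs expected here, as in the study
```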

6.5.3 Initialization of Network Weights

The network requires that the weights be initialized to some value. The user is given the option of specifying a number that is used as a random seed to generate the initial weights; if no value is given, the network produces its own.


The magnitudes of the initial weights may have an impact on the error minimization process, resulting in different training outcomes. For example, suppose the error surface (the training error as a function of all possible weights and biases) for a given set of training data is given by Figure 24. The neural network seeks areas of the error surface that yield minimum values. Note that by starting on different parts of the error surface, the network may find different error minima. The cross-validation technique described in section 6.6 minimizes the possibility of the network settling in a 'local minimum' and maximizes the chances of it converging on the 'global minimum' by varying the random seeds, among other variables. As discussed earlier, the MacKay regularization may restrict the network to finding only one or two local minima out of hundreds of possibilities.
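The role of the seed can be illustrated with a short sketch. This is not the nc_binclass initializer; the layer sizes and the 0.1 scale factor are illustrative assumptions.

```python
import numpy as np

def init_weights(n_inputs, n_hidden, seed=None):
    """Seeded random initialization of hidden and output weights (plus biases)."""
    rng = np.random.default_rng(seed)                                # reproducible given a seed
    w_hidden = 0.1 * rng.standard_normal((n_inputs + 1, n_hidden))   # +1 row for bias
    w_out = 0.1 * rng.standard_normal(n_hidden + 1)                  # +1 for bias
    return w_hidden, w_out

# Different seeds start the descent at different points on the error surface,
# so repeated trials may converge to different local minima.
w1a, w2a = init_weights(8, 6, seed=1)
w1b, w2b = init_weights(8, 6, seed=2)
```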

6.5.4 Network Training – The BFGS Quasi-Newton Optimization Method

The process of network training is defined as the steps taken to minimize the error of the training data against the training targets. The most common training algorithm (and the one utilized in this network) is backpropagation, of which there are many variants. The general idea of backpropagation was discussed in the introduction to this chapter. The main disadvantage of basic backpropagation is speed, especially for higher numbers of hidden nodes and larger datasets.

[Figure 24: sketch of a one-dimensional error surface (error vs. weight W), with a global minimum and two annotated local minima.]

Figure 24. Hypothetical error surface with possible initial weights (W) specified by filled circles. The arrows represent the direction an un-optimized network will take the initial weights: towards the closest minimum of the error surface.


A great deal of development and testing of many variations of backpropagation has produced several viable candidates for network training that are more efficient at error minimization. The training algorithm used in this network is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton backpropagation method. A more detailed discussion of training methods, including quasi-Newton, is given in Bishop (1995). Newtonian training methods utilize the Hessian matrix (H), which represents second-order information about the error surface at the current weight and bias values. Evaluation of H is computationally intensive, however, requiring on the order of NW² operations (where N = number of patterns in the dataset and W = number of weights). In addition, traditional Newtonian methods must invert the H matrix, costing a further W³ operations. The quasi-Newton method avoids the inversion by building an approximation to H⁻¹ over a number of steps, effectively yielding an overall cost of only NW². The speed advantage of the BFGS quasi-Newton and full Newton algorithms over more traditional techniques such as conjugate gradient methods lies in the robustness of the search direction: by incorporating curvature information, it points toward the minimum of the error function rather than in a less direct, merely downhill direction. Figure 7.13 in Bishop (1995) and the accompanying discussion provide a more detailed explanation of this advantage. Barnard (1992) shows results of a neural network trained with the BFGS algorithm.
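A hedged sketch of quasi-Newton training is given below; scipy's BFGS minimizer stands in for the network's built-in (Matlab) BFGS routine. The toy data, the loss function, and the flattened weight vector are all illustrative assumptions, not the study's configuration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))            # toy predictor matrix
t = (rng.random(100) < 0.15).astype(float)   # ~15% "developing" targets
H = 6                                        # hidden nodes

def loss(w):
    """Cross-entropy of a tanh-hidden / logistic-output network (weights flattened)."""
    w1, w2 = w[:8 * H].reshape(8, H), w[8 * H:]
    y = 1.0 / (1.0 + np.exp(-(np.tanh(X @ w1) @ w2)))
    eps = 1e-12                              # guard against log(0)
    return -np.mean(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

w0 = 0.1 * rng.standard_normal(8 * H + H)
result = minimize(loss, w0, method='BFGS')   # builds an approximation to H^-1 iteratively
```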

6.5.5 Calculation of Training Error – Activation Functions and Cross Entropy

After the network computes the new weights by the BFGS algorithm, the node outputs (which may vary from -∞ to ∞) are mapped to a range of -1 to 1 (or some other fixed interval). The functions that perform this task are called 'activation functions' and are usually written g(a), where a is the weighted input arriving at the node. In this particular network, the activation function for the hidden layer is the 'tanh' function:

    g(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} .                (23)

Networks that utilize a 'tanh' activation function in the hidden layer in conjunction with a logistic sigmoid activation function on the output layer (shown next) typically exhibit faster convergence of the training algorithm (Bishop 1995).


Once the hidden-layer values are mapped by this function, they are sent to the output node. It is here that the performance of the current iteration of the network is assessed. The neural network is designed to produce a true probabilistic forecast of a case belonging to a certain class (developing or non-developing). The logistic (sigmoid) activation function accepts the input from the hidden layer and maps the values to the range 0 to 1 in the following manner:

    g(a) = \frac{1}{1 + e^{-a}}                (24)

where a represents the input from the hidden layer. If the network has converged upon the minimum error, then the output of the logistic function is the posterior probability of a case belonging to the developing (class 1) group. However, convergence has yet to be checked, and more iterations may be necessary. At this point, the network first calculates a parameter called 'classification error'. Setting the decision boundary to P = 0.5, where P is the probability of a case belonging to the developing (class 1) group, the network checks each case in both the training and validation datasets and assesses the misclassification rate. This error parameter is ignored in this study, since the goal of the training is to produce a probabilistic rather than a discrete forecast. A second error function, the 'cross entropy', is then computed. The cross-entropy error function is defined as:

    E = -\frac{1}{N} \sum_{i=1}^{N} \left[ t_i \log\frac{y_i}{t_i} + (1 - t_i) \log\frac{1 - y_i}{1 - t_i} \right]                (25)

where N = the number of cases (cloud clusters), t_i is the target value for the i-th case, and y_i the network output for the i-th case. As mentioned previously, the minimization of the cross-entropy error in conjunction with the application of the logistic activation function at the output node ensures that the network outputs can be interpreted as posterior probabilities.
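Equation (25) can be transcribed directly; for binary targets (0 or 1) the t·log(t) reference terms vanish and it reduces to the usual cross-entropy. This sketch adopts the convention 0·log(0/0) = 0 and uses a small epsilon to guard the logarithms.

```python
import numpy as np

def cross_entropy(t, y, eps=1e-12):
    """Cross-entropy error of equation (25) for targets t and outputs y."""
    t = np.asarray(t, float)
    y = np.clip(np.asarray(y, float), eps, 1 - eps)
    term1 = np.where(t > 0, t * np.log(y / np.maximum(t, eps)), 0.0)
    term0 = np.where(t < 1, (1 - t) * np.log((1 - y) / np.maximum(1 - t, eps)), 0.0)
    return -np.mean(term1 + term0)

print(cross_entropy([0, 1, 0], [0.1, 0.8, 0.3]))   # small error for good forecasts
```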

6.5.6 Adaptive Regularization

At this point, MacKay's Bayesian adaptive regularization (MacKay 1992a; 1992b) is employed to update the regularization parameter (α). See the introduction to section 6.5 for more details.


6.5.7 Convergence Check

A check for error convergence is then performed. The network calculates the change in the cross-entropy error (∆E) from the current iteration. If the change in training error is less than a specified threshold (1 × 10⁻⁵), the network stops training. Training is also stopped if the number of iterations exceeds 100. If ∆E is greater than the threshold, the new weights are returned to the BFGS algorithm and the training process begins again.
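The stopping rule amounts to a short loop, sketched below. Here 'train_step' and 'error_fn' are hypothetical stand-ins for one BFGS weight update and the cross-entropy evaluation, respectively.

```python
def train(w, train_step, error_fn, tol=1e-5, max_iter=100):
    """Iterate weight updates until the error change falls below tol
    or max_iter iterations are reached."""
    e_prev = error_fn(w)
    for _ in range(max_iter):
        w = train_step(w)
        e = error_fn(w)
        if abs(e_prev - e) < tol:   # converged: change in cross entropy < 1e-5
            break
        e_prev = e
    return w
```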

6.5.8 Network Outputs

Once network convergence is reached, the posterior probabilities from the logistic activation function are saved to output variables for analysis.

6.6 Finding the Optimal Number of Hidden Nodes (Hoptimal)

It is important that a network be optimally designed to produce the best results for a particular training set. One of the most crucial aspects of network design, perhaps second in importance only to the selection of the activation and error minimization functions, is the number of nodes in the hidden layer. If too many hidden nodes are chosen, the network will probably overfit the data; if too few are selected, the network will not have sufficient fitting power. Two popular methods for determining the optimal number are cross-validation and bootstrapping. Each method involves partitioning the input dataset into a training set (typically 2/3 of the entire dataset) and a validation set (1/3 of the dataset).

For cross-validation, a value for H is specified and the network is trained repeatedly for N trials; the value of H is then changed and the procedure repeated. The final product of this test is a scatter plot of training errors vs. validation errors, sometimes called a "tv-diagram" (Caren Marzban Personal Communication 2002). An example of a tv-diagram for H = 0, 2, 4, and 8 is shown in Figure 25. As one would expect, the training errors gradually decrease as the number of nodes is increased, indicating that the network is beginning to fit all of the data points. However, the pronounced increase in the validation error for the H = 8 trials suggests that the network is fitting the noise. The optimal choice of H is not necessarily the one with the lowest validation error. In this example, one of the trials with H = 2 yielded the minimum validation error, lower than the trial with no hidden nodes. That trial, however, does not have the lowest training error in its group, indicating that a local minimum has been found. The global minimum for the H = 2 group (the trial with the lowest training error) has a validation error higher than the H = 0 trial, indicating that the data have been overfit. Hence, in this example, the best choice would be H = 0 (a linear NN).


[Figure 25: scatter plot of training error (x-axis, 0–60) vs. validation error (y-axis, 0–400) for trials with H = 8, 4, 2, and 0.]

Figure 25. A schematic tv-diagram. Though a trial in H = 2 has a lower training and validation error than H = 0, it does not have the lowest training error of all the H = 2 cases. Hence, the NN has found a local minimum.

The bootstrap technique is essentially the process of creating many tv-diagrams in an effort to determine the optimal number of hidden nodes. The input dataset is again divided into a training set (2/3) and a validation set (1/3) and the cross-validation trials are performed, with H allowed to vary. Once the tv-diagram is completed and an optimal H value is obtained for that trial, the entire dataset is recombined and randomly partitioned into another training and validation set, with the data divided in a different fashion than before. The cross-validation procedure is repeated and a new optimal H is obtained. If a significant number of trials (usually 15-20) yield the same optimal choice of H, one can be confident that the optimal value has indeed been found. If different bootstrap trials produce different choices for Hoptimal, one may choose either the mode or the median of the trials as the final choice of H (Caren Marzban Personal Communication 2002).


In this study, the mode was chosen for every forecast hour except the 18-hour time, where it was determined that a different value of H yielded significantly lower cross-entropy errors than the mode value. The bootstrap procedure was performed on the cloud cluster dataset twenty times. For each trial, the 1998-2001 hurricane seasons were randomly divided into a training set (with 0.67 of the cases) and a validation set (0.33). For each random partition, the network was trained ten times, each with random initial weights. This procedure was then run eight times, once for each H value from 0 to 7. Thus, the network was trained 1600 times for each forecast hour.
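The experiment reduces to a triple loop: 20 random 2/3-1/3 partitions × 10 random initializations × 8 values of H, i.e. 1600 trainings per forecast hour. A schematic sketch follows; 'train_network' (returning training and validation errors) is a hypothetical stand-in for the nc_binclass training call.

```python
import numpy as np

def bootstrap_tv(X, t, train_network, n_parts=20, n_inits=10, h_values=range(8)):
    """Collect (partition, H, training error, validation error) tuples."""
    rng = np.random.default_rng(0)
    n = len(t)
    results = []
    for part in range(n_parts):
        idx = rng.permutation(n)
        tr, va = idx[:2 * n // 3], idx[2 * n // 3:]        # 2/3 train, 1/3 validate
        for H in h_values:
            for _ in range(n_inits):                        # fresh random weights
                e_tr, e_va = train_network(X[tr], t[tr], X[va], t[va],
                                           H, seed=rng.integers(1 << 30))
                results.append((part, H, e_tr, e_va))
    return results
```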

For each bootstrap trial, Hoptimal was determined in the following manner. A tv-diagram was created for the trial. For each cluster of points in the tv-diagram, the group of points with the lowest training errors was found; from these candidates, the point with the lowest validation error was identified and labeled the "Hoptimal for trial 1". The procedure was then repeated twenty times, each with a different random partition of the data, yielding twenty suggested Hoptimal values at the conclusion of the exercise. Due in part to the similarity of the datasets across forecast hours (each used the same set of non-developing cases) and the degree of restriction imposed on the network by the regularization procedure, the bootstrap trials produced similar results across the different forecast hours. Figure 26 shows a histogram of Hoptimal for all forecast hours. Note that the most frequent choice of Hoptimal is six nodes.

Occasionally, subjective modification of the Hoptimal selection procedure described above was necessary. This arose primarily when several trials with the same node number produced wildly different validation errors, suggesting that the network in that configuration is sensitive to subtle changes in the initial data that may lead it to unfavorable local error minima during training. Hence, even though one partition of the data may lead to the result with the lowest training and validation errors, that result may not be obtainable with most configurations of the data; it would be undesirable to select a network configuration with that number of hidden nodes if there were a high chance of a result with high validation error. The Hoptimal for each forecast hour was found to be: 6 (6-hour), 7 (12), 5 (18), 6 (24), 6 (30), 6 (36), 6 (42), 6 (48).

6.7 Summary

This chapter provided an overview of neural networks in general, how they work, and their potential limitations. Neural networks are becoming more popular in many different disciplines, including atmospheric science, as they provide a powerful, cost-effective means of obtaining quality results that in many instances improve upon the performance of traditional statistical techniques.


[Figure 26: frequency histogram (frequency 0–12 vs. optimal number of hidden nodes 0–7), with one series per forecast hour (6, 12, 18, 24, 30, 36, 42, 48).]

Figure 26. Frequency histogram of Hoptimal for the 160 bootstrap trials. Hoptimal = 6 was chosen for all forecast hours except 12 and 18 (Hoptimal = 7 and 5 respectively).

A general overview of how a network "learns", or "trains", was presented within the context of the neural network used in this study. A discussion of overfitting highlighted some of the precautions that must be taken so that a network is able to generalize well. The 'nc_binclass' network used for this study takes advantage of Bayesian arguments to yield true posterior probability forecasts. The components of this network were documented, including the activation and error minimization functions. The network employs a special adaptive regularization procedure, developed by MacKay, which allows the network to converge on a solution quickly. However, it was shown that this feature probably limits the network by not allowing it to explore a large number of local error minima. This means that the network most likely does not converge on the best solution, potentially degrading the forecasts to some degree. It is not believed that this restriction is serious enough to abandon the network in favor of a less restrictive one that may take many times longer to arrive at a solution (Caren Marzban Personal Communication 2002).


CHAPTER 7

EVALUATION OF NEURAL NETWORK FORECAST PERFORMANCE AGAINST DISCRIMINANT ANALYSIS

The vast majority of the results obtained in this research are presented in the final two chapters. This chapter presents work performed to ascertain any systematic advantages or disadvantages that the neural network classifier possesses in comparison to the linear discriminant analysis classifier. It should be stated immediately that the results in this chapter were obtained from a dataset containing "rare event" cases, i.e. N0 >> N1, where N0 is the number of non-developing clusters and N1 the number of developing cases. This presents some difficulty, both in the statistical interpretation of the results and in producing a skillful classifier: there is a high risk that the signal of the developing systems will be lost in the noise of the non-developing systems. The only viable solution would be to decrease the ratio N0/N1, either by adding new developing cases or by artificially "weighting" the dataset through inserting bogus copies of developing systems or removing a number of non-developing cases. The former approach is unrealistic given the timeframe for completion of this work; the latter is a risky endeavor in which the researcher may invalidate the statistical interpretation of the forecasts as posterior probabilities (Caren Marzban Personal Communication 2002). Nevertheless, this chapter will show that reasonable conclusions can be drawn regarding model performance even within these statistical limitations.

7.1 The 2x2 Contingency Table

One framework for measuring forecast accuracy is the I × J contingency table, where I and J index the possible discrete outcomes of the forecast and the actual event. In this case, I and J represent two possible outcomes: non-development (0) or development (1) of a cloud cluster.


Figure 27. The 2 x 2 contingency table.

The contingency table therefore becomes a 2 × 2 matrix, shown in Figure 27. From this table, 'a' and 'd' can be defined as 'hits': outcomes where the prediction matched the actual event. The 'b' square represents the number of false alarms, which arise when an event is predicted but does not occur. Finally, the 'c' square shows the number of 'misses'; a miss occurs when an event is realized that was not forecast. A suite of perfect forecasts occurs when b = c = 0. From the contingency table, a number of measures of forecast accuracy can be constructed. A thorough list of statistical measures of forecast skill, especially as related to rare-event situations, is given in Marzban (1995). This study will consider three measures of skill: the probability of detection (POD), the false alarm ratio (FAR), and the Heidke Skill Score (HSS).

7.2 Definitions of Skill Scores

Using Figure 27 as a reference for a, b, c, and d, the POD is defined as:

    \mathrm{POD} = \frac{d}{c + d}                (26)


and the FAR as:

    \mathrm{FAR} = \frac{b}{a + b} .                (27)

The Heidke Skill Score is the most common contingency-table-based score for assessing forecast performance (Wilks 1995). The HSS ranges from 1 (perfect forecasts) down to 0 (no better than random) in most cases, though negative HSS values are possible if the forecasts are worse than random. Any positive HSS value indicates forecasts that are more skillful than climatology. The HSS is calculated as:

    \mathrm{HSS} = \frac{2(1 - c_{01} - c_{10})}{2 - (1 - N_{01})c_{01} - (1 - N_{10})c_{10}}                (28)

where N01 = the ratio of non-developing events to developing events (N0/N1), N10 = N1/N0, and:

    c_{01} = \frac{b}{N_0}                (29)

    c_{10} = \frac{c}{N_1}                (30)

where c01 is the rate at which non-developing events are misclassified as developing ones and c10 is the rate at which developing cases are misclassified as non-developing ones.
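Equations (26)-(30) translate directly into code. The example below plugs in the 6-hour NN contingency table from Figure 37 (a = 502, b = 5, c = 9, d = 8) and yields an HSS near 0.52, consistent with the 6-hour HSSmax quoted in section 7.4.

```python
def skill_scores(a, b, c, d):
    """POD, FAR, and HSS from 2x2 contingency counts (a, d = hits;
    b = false alarms; c = misses)."""
    n0, n1 = a + b, c + d                  # non-developing / developing totals
    pod = d / (c + d)                      # eq. 26
    far = b / (a + b)                      # eq. 27
    c01, c10 = b / n0, c / n1              # eqs. 29-30
    n01, n10 = n0 / n1, n1 / n0
    hss = (2 * (1 - c01 - c10) /
           (2 - (1 - n01) * c01 - (1 - n10) * c10))   # eq. 28
    return pod, far, hss

print(skill_scores(502, 5, 9, 8))          # (0.47, 0.0099, 0.52) approximately
```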

7.3 Finding Optimal Decision Thresholds

Forecasts issued by both the NN and the DA are probabilities. Since the contingency table and all scores based on it require discrete forecasts (1 or 0), it is necessary to establish a decision boundary, whereby probability forecasts above (below) the threshold are categorized as a '1' ('0'). An obvious choice of decision boundary would be p = 0.5, since probability forecasts greater than 0.5 are interpreted as 'more favorable for development than not'. In this instance, however, that choice is a poor one. In a rare-event situation, probability forecasts of an event occurring typically exhibit a peak in the distribution at very small probabilities, with little or no probability forecasts issued above p = 0.5 (Caren Marzban Personal Communication 2002).


More importantly, categorical forecasts perform very poorly (in terms of HSS) if the decision boundary is set to 0.5: almost all developing cases are missed. It is therefore necessary to find the optimal decision boundaries for each forecast hour that maximize the skill of the categorical forecasts.

To find the best decision boundary, the entire 1998-2001 dataset was considered. First, the neural network was configured for each forecast hour with the optimal number of hidden nodes given in section 6.6. Ten separate trials were performed. For each trial, the dataset was randomly partitioned into a training set (containing approximately 2/3 of all cases) and a validation set (~1/3). Then, for each forecast hour, the neural network was trained. For the DA portion, the same ten randomly partitioned datasets were not used because of logistical constraints; instead, the DA was run using a 'leave one out' procedure. This resulted in essentially hundreds of independent trials instead of ten. As discussed later, this may give the DA a small advantage, since the discriminant function is derived from practically the entire dataset rather than just 67% of it. The decision boundary was allowed to vary from 0 to 1 in increments of 0.001, and the probabilistic forecasts were categorized at each increment. Based on those categorizations (a, b, c, d in the contingency table), the HSS was calculated for the NN training set, the NN validation set, and the independent discriminant analysis runs. Finally, the independent NN trials were averaged together to yield a mean HSS for each forecast hour, decision boundary, and dataset (training or validation).

The HSS curves for both the NN and the discriminant analysis are presented in Figures 28-35. First, note that the HSS curves for the NN training (dependent) trials are much higher than for the validation (independent) trials, indicating more skill. This is expected since the optimal weights in the NN are derived from the training set. Also note that the DA curves are identical for each forecast hour; this is because there was not a 'training' set or 'validation' set for the DA per se. Both curves represent the same validation set, since the forecasts were derived independently of the construction of the discriminant function. Finally, the HSS scores for the DA maximize at high decision boundaries, while the HSS scores for the NN are at a maximum at lower probabilities. The latter situation is 'normal' for a rare-event situation (Caren Marzban Personal Communication 2002) and indicates that most of the probabilistic forecasts are small. The discriminant curve is bowed to the right because it issued a large number of high-probability (p > 0.8) forecasts. This is symptomatic of the discriminant classifier assuming a prior probability of an event of p = 0.5. When the prior probability is set to the actual occurrence rate of the event in the dataset (approximately p = 0.036 for the 6-hour dataset), the shape of the discriminant analysis HSS curve more closely matches the NN curve (not shown).
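The threshold search itself is a brute-force sweep, sketched below under the assumption that the probabilistic forecasts and binary targets are available as arrays; the HSS helper mirrors equations (28)-(30).

```python
import numpy as np

def hss(a, b, c, d):
    n0, n1 = a + b, c + d
    c01, c10 = b / n0, c / n1                                   # eqs. 29-30
    return (2 * (1 - c01 - c10) /
            (2 - (1 - n0 / n1) * c01 - (1 - n1 / n0) * c10))    # eq. 28

def best_threshold(probs, targets, step=0.001):
    """Sweep decision boundaries from 0 to 1 and return the HSS-maximizing one."""
    best_thr, best_hss = None, -np.inf
    for thr in np.arange(0.0, 1.0 + step, step):
        pred = (probs >= thr).astype(int)
        a = int(np.sum((targets == 0) & (pred == 0)))
        b = int(np.sum((targets == 0) & (pred == 1)))
        c = int(np.sum((targets == 1) & (pred == 0)))
        d = int(np.sum((targets == 1) & (pred == 1)))
        if a + b == 0 or c + d == 0:
            continue                       # HSS undefined at the extremes
        score = hss(a, b, c, d)
        if score > best_hss:
            best_thr, best_hss = thr, score
    return best_thr, best_hss
```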


Figure 28. HSS for the 6-hour forecast period over a range of decision thresholds from 0 to 1. The NN a) training sets and b) validation sets were averaged across ten trials. The DA curve was derived from the cross-validation procedure. The threshold increment was 0.001.


Figure 29. As in Figure 28, except for the 12-hour forecast classifiers.


Figure 30. As in Figure 28, except for the 18-hour forecast classifiers.


Figure 31. As in Figure 28, except for the 24-hour forecast classifiers.


Figure 32. As in Figure 28, except for the 30-hour classifiers.


Figure 33. As in Figure 28, except for the 36-hour forecast classifiers.


Figure 34. As in Figure 28, except for the 42-hour forecast classifiers.


Figure 35. As in Figure 28, except for the 48-hour forecast classifiers.


Forecast hour     6      12     18     24     30     36     42     48
NN              .319   .276   .267   .261   .214   .204   .193   .193
DA              .868   .850   .840   .821   .852   .848   .887   .885

Table 12. Forecast decision boundaries by classifier and forecast hour. The NN boundaries were derived from a series of ten randomly sampled training and validation sets. The DA boundaries were derived from a cross-validation technique with many independent trials.

This issue was not discovered until after the analysis was complete. However, it does not affect the validity of the comparison between the two classifiers; the decision boundary for the DA simply becomes much higher than that of the NN. The maximum of the HSS curve suggests the choice of decision boundary for each forecast hour. For example, for the 6-hour forecasts (Figure 28), the peak of the training-set HSS curve occurs at a decision boundary of approximately 0.3; the corresponding validation set peak is between 0.3 and 0.4. It is desirable to give more weight to the boundary suggested by the training set, since it is made up of a larger data sample. Therefore, the boundaries suggested by the training and validation sets were weighted by 0.67 and 0.33 respectively to yield a final optimal decision boundary value. These values are shown in Table 12 and are applied to the case studies presented in the next chapter.

7.4 Neural Network and Discriminant Analysis Heidke Skill Score Comparison

The HSS values in Figures 28-35 can also be used to evaluate the relative forecast skill of each classifier by comparing the magnitude of the HSS maximum (across the spectrum of thresholds) in the validation sets (panels b). The HSSmax values for the NN and the DA curves are very close for every forecast hour except 6 hours, where the NN classifier appears to do somewhat better. There is some indication that the DA performs better at longer forecast times, though it is likely that any differences are within statistical error.

HSSmax can also be compared across forecast hours to assess forecast skill at short and long lead times. As expected, HSSmax at the 6-hour forecast time (~0.52) exceeds HSSmax at the 48-hour forecast time (~0.26) by a significant margin. The intermediate forecasts show a general decreasing trend in HSSmax.


7.5 POD and FAR Comparison

The HSS provided little indication of a superior classifier. Additional analysis was performed by calculating the POD and FAR (equations 26 and 27), using the decision thresholds in Table 12. As with the HSS, the NN results were derived from ten random partitions of the 1998-2001 data and the DA results were calculated from the 'leave one out' procedure. The NN values were averaged over the ten trials, and 90% confidence intervals were computed from the standard error:

    E = 1.645 \frac{\sigma}{\sqrt{n}}                (35)

where E = the 90% confidence interval half-width, σ = the standard deviation of the parameter, and n = the number of trials (10 in this case). The standard error can be interpreted as a confidence interval if one assumes that the errors are normally distributed. The POD parameter shows how well the classifier recognizes developing events; the FAR indicates skill in discriminating the non-developing events. It is desirable that a classifier exhibit high (low) values of POD (FAR).

Figure 36 illustrates the POD and FAR for the NN and DA classifiers; the data from which Figure 36 was created are shown in the series of tables in Figure 37. The error bars represent 90% confidence intervals. It is readily apparent that the DA forecasts are more skillful at detecting developing events. At every forecast hour except 6 and 48 hours, the POD for the DA classifier is larger than that of the NN and lies above the 90% confidence interval. Note also the decreasing trend in detection skill as the forecast hour increases. For the non-developing clusters, the NN classifier performs better than the DA at all forecast hours; the NN exceeds DA forecast skill for false alarms at the 90% confidence level for all forecast periods as well.

These results raise an interesting question. If a forecaster in charge of forecasting tropical cyclogenesis had access to only one classifier, which one should he or she use? The vast majority of cloud clusters will not develop into tropical depressions; the ratio of non-developing to developing events in the dataset is very large. In this sense, one should choose the neural network, since it systematically does a better job of forecasting non-events. Theoretically, if both classifiers were scored over an entire season, the NN would achieve a higher score; there is a weak indication of this in the HSSmax plots. However, when development does occur, the NN will miss it more frequently than the DA classifier. That is not an optimal situation. Does the importance of avoiding false alarms outweigh that of issuing proactive, skillful tropical cyclogenesis forecasts?
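Equation (35) is a one-line computation. The sketch below uses the sample standard deviation (ddof=1), an assumption the original formula does not specify, and the trial values are made up for illustration.

```python
import numpy as np

def ci90(scores):
    """90% confidence half-width from the standard error, per equation (35)."""
    scores = np.asarray(scores, float)
    return 1.645 * scores.std(ddof=1) / np.sqrt(len(scores))   # 1.645 = 90% z-value

pod_trials = [0.45, 0.50, 0.38, 0.52, 0.47, 0.41, 0.55, 0.44, 0.49, 0.40]
print(np.mean(pod_trials), "+/-", ci90(pod_trials))
```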


Figure 36. POD (a) and FAR (b) by forecast hour for the NN (light) and DA (dark) classifiers. The 90% confidence interval error bars are shown on the NN skill scores.


An interesting analogy to this question is the problem of forecasting tropical cyclone track and intensity change, especially as a storm approaches a coastline where the potential for loss of life and property exists. Over-warning, both in terms of storm intensity and length of coastline, can lead to inflated evacuation costs and, if done frequently, a "boy who cried wolf" mentality in the population. One can imagine the impact if the 50-year hurricane did indeed live up to expectations and the population did not take the warnings seriously. Perhaps the consequences of false alarms in tropical cyclogenesis forecasting are not as dire, and one should choose the DA as the classifier of choice. Whatever the decision, it should be noted that the performance of the NN is highly sensitive to the data partitioning process (illustrated by the error bars) and meets or exceeds the DA in POD skill in a few trials. With the high variance seen in the NN trials, it is difficult to make any firm determination of a better classifier.

7.6 Receiver Operating Characteristic Plots

Thus far, with the exception of the HSS, each classifier has been evaluated at its respective optimum decision threshold. There is a convenient way to evaluate performance across all decision thresholds: the Receiver Operating Characteristic (ROC; Masters 1993; Marzban and Witt 2001) plot, a parametric plot of POD against FAR as calculated in equations 26 and 27. It has the advantage of displaying classifier performance across all thresholds, independent of any user-determined one. ROC plots for each forecast hour are shown in Figures 38-41. The diagonal line y = x in each ROC plot represents the performance of a random classifier. Any curve that bows above that diagonal is considered to have classification skill; the farther above the random line, the more skill the classifier has. Furthermore, the area under each curve can be computed as a scalar measure of classification performance if desired. ROC plots have been used, for example, to evaluate NN performance in hail size classification (Marzban and Witt 2001).

Figure 38 supports other results presented thus far that show high forecast skill for the 6-hour forecast, accompanied by a relatively steep drop-off in skill by 12 hours. The NN validation forecasts across the range of thresholds appear to be mostly higher than the DA forecasts, which is expected considering the POD and FAR results presented above. At the 12-hour forecast time, there is some indication that the NN performs at a higher level across most thresholds, though any difference could probably be lost in the statistical error. The 18-42 hour plots (Figures 39-41) show almost coincident NN and DA lines.
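Constructing a ROC curve and its area reduces to sweeping the threshold and integrating, as sketched below; 'probs' and 'targets' are illustrative arrays, and FAR here follows the document's definition b/(a + b) (the false-alarm rate for actual non-events).

```python
import numpy as np

def roc_curve(probs, targets, step=0.001):
    """Parametric (FAR, POD) pairs over decision thresholds from 0 to 1."""
    far, pod = [], []
    for thr in np.arange(0.0, 1.0 + step, step):
        pred = probs >= thr
        pod.append(np.mean(pred[targets == 1]))   # fraction of events detected
        far.append(np.mean(pred[targets == 0]))   # fraction of non-events flagged
    return np.array(far), np.array(pod)

def auc(far, pod):
    """Scalar area under the ROC curve (trapezoid rule)."""
    order = np.argsort(far)
    return np.trapz(pod[order], far[order])
```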


              Neural Network Validation          Discriminant Analysis
              (10-trial average)                 Cross-Validation

Forecast      actual 0          actual 1        actual 0          actual 1
hour        pred 0  pred 1    pred 0  pred 1  pred 0  pred 1    pred 0  pred 1
6             502      5         9       8      502      5         9       8
12            497      7        12       7      502      5         9       8
18            498      7        11       6      502      5         9       8
24            498      8        11       7      502      5         9       8
30            945      9        13       5      502      5         9       8
36            947      8        12       4      502      5         9       8
42            947      5        12       4      502      5         9       8
48            952      3        11       3      502      5         9       8

Figure 37. 2 x 2 contingency tables for each forecast hour. The neural network runs (left) are a 10-trial average; the discriminant analysis runs (right) were derived from a cross-validation (leave one out) procedure. Actual events (0, 1) index the rows and predicted events the columns of each 2 x 2 table.


Figure 38. ROC diagrams for the a) 6-hour and b) 12-hour forecast periods.


Figure 39. As in Figure 38, except for the a) 18-hour and b) 24-hour forecast periods.


Figure 40. As in Figure 38, except for the a) 30-hour and b) 36-hour forecast periods.


Figure 41. As in Figure 38, except for the a) 42-hour and b) 48-hour forecast periods.


Finally, at the 48-hour forecast period, the DA classifier appears to have clear superiority, somewhat contradicting earlier results that showed an advantage for the NN at this time. It is possible that the NN decision boundary at 48 hours is highly sensitive to adjustments in either direction, resulting in a more abrupt loss of forecast skill.

7.7 Summary and Discussion

As mentioned previously, the DA results presented here may be an optimistic assessment of the classifier's performance because of the validation method used. The 'leave one out' procedure uses the entire dataset of cloud clusters (except for the 'one' case) in the derivation of the discriminant function. Since the data contain only about 50 developing cases out of more than 2000, presumably more developing cases in the training set will result in a better discriminant function. The NN only has the luxury of using about 2/3 of the developing cases in the training set (with potentially large variations between partitions). This may limit the NN's ability to find the optimal weights during training; the large variation in NN performance over just 10 random trials is evidence for this suggestion.

Given that, and the balance of evidence presented in this chapter, it appears that the NN would be the better choice of classifier, at least for the dataset used in this research. But this choice is by no means certain, as the quantitative analysis shows. This is not a surprising result, as DA is a powerful statistical classifier in its own right (Caren Marzban Personal Communication 2002). At most forecast periods, there is no clear indication that either the DA or the NN outperforms the other, and the HSSs over the range of decision thresholds (Figures 28-35) show no clear winner either. However, the highest-confidence results presented here (the POD and FAR scores) show that the NN is better at limiting false alarm forecasts and competitive with the DA in genesis detection. Since a large majority of cloud clusters do not develop into tropical depressions, better performance would be achieved over a long time period by using the NN classifier.

The quantitative assessment aside, perhaps the most enlightening exercise that can be performed to evaluate classifier performance is to simulate a real forecast situation. There may be subtleties and/or peculiarities in the probabilistic forecasts that would provide more evidence for preferring one method over another. The following chapter presents six separate case studies (three developing systems and three non-developing systems) that show how well each classifier performs in a simulated forecast environment.


CHAPTER 8

CASE STUDIES

In the previous chapter, it was suggested that the neural network performs better in non-developing situations, while the discriminant analysis seemed more adept at forecasting the developing events. The purpose of this chapter is to answer the following questions:

1) Do the predictors described in Chapter 4 produce satisfactory forecasts?
2) What are the strengths and weaknesses of the probabilistic forecast system?
3) Are there any advantages to using one classifier over the other?

These questions are addressed through the examination of six case studies of systems contained within the 1998-2001 dataset. These cases were selected because they were not missing a significant amount of data, they represent systems that formed across a wide spectrum of spatial and temporal scales, and the associated convection was very strong in each system. In principle, forecasters would have a difficult time predicting development for these systems based on cloud recognition techniques and a casual examination of the large-scale fields.

8.1 Methodology

The data associated with each case study were removed from the 1998-2001 dataset to yield independent classifications. For the DA, this meant that the 'target' variable (0 or 1) was cleared from the input data; the subsequent DA then treated the cases with missing targets as independent cases subject to classification by the discriminant function derived from the remaining data. For the NN, this process involved manually removing the case from the training data and creating a small validation set. The DA and the NN were each set up to run eight times, once for each 6-hourly forecast. Thus, there were effectively sixteen different datasets for each case study: 6-48 hour training sets containing all of the 1998-2001 data except for the case being classified, and 6-48 hour validation sets containing the times for the case in question. After a particular forecast hour was run, the probabilistic output for all of the times in the independent case was recorded. Each 6-hour period of the cloud cluster's life thus received a 6-48 hour forecast.


As discussed in the previous chapter, different forecast hours produced different optimal decision thresholds. This makes it difficult to compare probabilistic forecasts across forecast hours. For example, a probabilistic forecast of P = 0.86 would be interpreted as a 'developing' forecast for the 12-hour DA run, a 'non-developing' forecast for a 6-hour DA run (0.86 lies between those two thresholds in Table 12), and a 'developing' forecast for any NN run (since it exceeds all of the NN decision thresholds). Therefore, a scale was developed that produces a numerical index from -1 (least favorable) to +1 (most favorable), with 0 as neutral. This transformation was applied to all forecast hours and both classifiers, allowing direct comparison of forecasts. For convenience, this index is called the 'TCG Index', or TCGI. It is mathematically defined as follows:

    \mathrm{TCGI} = \frac{P - D}{1 - D} \quad \text{if } P > D                (36)

    \mathrm{TCGI} = \frac{P - D}{D} \quad \text{if } P < D                (37)

where P = the probability forecast from the classifier and D = the decision threshold for that classifier and forecast hour, as shown in Table 12. All of the following case studies are shown on this scale. The synoptic summaries for the developing case studies below were taken from the National Hurricane Center website and are referenced individually by author.
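The transformation is a two-branch rescaling, implemented directly below; the example uses the 12-hour thresholds from Table 12 to show how the same probability maps to different index values for the two classifiers.

```python
def tcgi(p, d):
    """TCG Index of equations (36)-(37): rescale probability p against
    decision threshold d onto the -1 (least favorable) to +1 scale."""
    return (p - d) / (1 - d) if p >= d else (p - d) / d

print(tcgi(0.86, 0.850))   # 12-hour DA threshold: marginally favorable (~ +0.07)
print(tcgi(0.86, 0.276))   # 12-hour NN threshold: strongly favorable (~ +0.81)
```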

8.2 Keith (2000)

The tropical wave that eventually spawned the tropical depression, and later Tropical Storm Keith, originated as an African easterly wave that came off the coast on 16 September (Beven 2000). The track of Keith is shown in Figure 42. The convection associated with the wave seemed to become somewhat disconnected and propagate towards the north, where it dissipated shortly thereafter. The NHC attributed the lack of development at this stage to strong vertical wind shear (Beven 2000). Although the convection subsided, the wave continued to move towards the west and into the Caribbean Sea. Strong convection reappeared about two days before genesis (Figure 43, left), and the NHC declared the system a tropical depression at 1800 UTC 28 September (Figure 43, right). The center of the new tropical depression was fixed approximately 60 nautical miles north-northeast of Cape Gracias a Dios.

Keith is an interesting case study because the lifecycle of the tropical wave can be divided into non-developing and developing components. During the original analysis, the convection associated with the Keith wave in the mid-Atlantic was attributed to an entirely different, non-developing system.


Figure 42. Track of the cloud cluster that would eventually develop into Tropical Depression and then Tropical Storm Keith. Note the northward displacement, and then disappearance, of convection in the mid-Atlantic followed by a re-emergence in the Caribbean Sea.

It was not until the NHC storm summary was consulted that it was determined that the struggling convective wave was the same one that eventually spawned a tropical depression. This case highlights one of the difficulties of tracking tropical cloud clusters: continuity is often hard to maintain because of the sporadic nature of the convection.

A series of 6-48 hour forecasts was issued for the cloud cluster at each point shown in Figure 42. Both the DA and the NN were used; their forecasts are shown in Figure 44. Both classifiers do a good job of capturing the large-scale forcings during both phases of the system. Every forecast period at each 6-hour interval yielded TCGI values less than zero during the mid-Atlantic non-development phase, indicating an environment unfavorable for development. Then, after the blowup of convection beginning late in the day on 26 September, the forecasts become steadily more favorable. TCGI values for the DA range from near 0.2 up to more than 0.9 from 30 hours before genesis on 27 September through genesis on 28 September.

A comparison of the DA and the NN reveals some confusion in the NN forecasts after 27 September. The index values for the NN are generally positive, but there are wild oscillations between forecast hours. For example, the 1800 UTC 26 September forecast gave relatively low TCGI values at 6 (~ -0.75) and 12 hours (~ -0.25), then a favorable forecast (~0.45) at 18 hours, followed by a steep drop at 30 hours (~ -0.40). Subsequent NN forecasts, though generally favorable for development, showed similar behavior. This behavior could be attributed to one or more of three factors. First, the NN may be overfitting the data.


Figure 43. Enhanced infrared (left) GOES satellite image of the future Keith thirty hours before genesis, and visible (right) GOES image of Keith around the time of genesis. Images courtesy of the Naval Research Laboratory.

The large negative forecasts occur when the predictor set is not similar enough to the developing cases in the training data; the DA appears to generalize better in those cases, as it shows no rapid reversals of the sign of the TCGI between subsequent time intervals. Second, the sample size of developing cases is probably too small, a factor not unrelated to the first. Recall that there are at most approximately 50 developing cases in any one training set, accompanied by 1500-2200 non-developing cases. The training set therefore may not be large enough for the NN to 'learn' all of the possible patterns of developing cases well enough; or, more accurately, it learns one forecast-hour set (high TCGI values) better than another (low TCGI values). Though the small developing sample size does seem to affect the DA forecasts, it does not appear to do so as significantly. Finally, the NN may have learned a periodicity in the training data. Although none of the predictors by itself exhibits an obvious, regular oscillation, it is possible that the non-linear nature of the NN has allowed it to perceive one, perhaps arising from a combination of predictors.

Figures 45-46 are time series of Keith forecasts at short (6-12 hour) and long (42-48 hour) ranges. During the non-development phase, both classifiers produced consistently negative forecasts. Note that the NN index values are systematically lower than the DA forecasts during this period.


Figure 44. Series of 6-48 hour forecasts for each point along the track of cloud cluster Keith for a) DA and b) NN. Genesis time is denoted by the vertical dark line at 1800 UTC 28 September.


Figure 45. Time series of a) 6-hour forecasts and b) 12-hour forecasts for cloud cluster Keith. Genesis is denoted by the vertical dark line at right.


Figure 46. As in Figure 45, except for a) 42-hour and b) 48-hour forecasts.


This is consistent with the FAR analysis presented in the last chapter, which showed that the NN generally performs better at limiting false alarm forecasts. During the development phase of Keith, both classifiers resolve the signal at about the same time and with similar TCGI magnitudes. For the 6-hour forecasts, the highest TCGI values occur for the forecast issued 6 hours before genesis. At 1800 UTC 26 September (48 hours before genesis), both classifiers issued 42 and 48-hour forecasts that were marginally negative; the forecasts at these lead times did not become positive until approximately 24 hours prior to genesis. Hence, even though the model did predict development, its timing appears to have been late.

In summary, the Keith case study was overall a promising one. Both classifiers performed well during both phases of development. There are indications that the NN suffers more than the DA from the small sample size of developing cases, as evidenced by the oscillations in the forecasts during the development phase of Keith.

8.3 Non-Developing Case 43 (ND-43 1999)

This convective maximum formed in the mid-Atlantic near the climatological peak of the Atlantic hurricane season. The cloud cluster did not appear to be associated with an easterly wave and exhibited little if any consistent motion (Figure 47).

Figure 47. Irregular track of cloud cluster ND-43 (1999). The date and UTC time of each location is given next to the position fix.


As shown in Figure 48, convection in the small cloud cluster was initially moderate but tended to 'jump' and then weaken towards the end of the cloud cluster's life. An examination of the tropical cyclogenesis predictors for this case reveals that ND-43 existed in an unfavorable large-scale wind field environment for its entire lifetime (low DGP values). In Chapter 5 it was shown that the DA classifier assigned the most weight to the DGP predictor in making its forecasts; this would suggest that the DA classifier would do very well in this case. Figure 49 illustrates the 6-48 hour forecasts for ND-43 for the DA and NN classifiers. Both systems consistently predicted no development for this case.

Figure 48. Meteosat-7 IR image of cloud cluster ND-43 at 1200 UTC 1 September. The cloud cluster had been tracked for approximately 24 hours prior to this image and would dissipate 36 hours later.


Figure 49. Series of 6-48 hour forecasts for cloud cluster ND-43 for a) DA and b) NN classifiers.


TCGI values for the DA (NN) ranged from -0.6 to near -1.0 (-0.9 to -1.0) for the entire life of the cloud cluster. The advantage that the NN enjoys over the DA in an unfavorable development environment is evident in this case. In any event, it is safe to say that a forecaster on duty would have predicted no chance of development given the output of either classifier.

8.4 Non-Developing Case 6 (ND-6 2000)

ND-6 (2000) was an early season easterly wave that exhibited strong, persistent convection over a period of 5 days despite the climatologically unfavorable SSTs in the region. Traditionally, African easterly waves do not develop off the western coast of Africa until the late August through September timeframe. In this case, however, the large-scale wind field was very favorable for development, and the system maintained its integrity across much of the Atlantic. The track of cloud cluster ND-6 is shown in Figure 50, and a Meteosat-7 IR image of the cluster near its peak convective intensity is shown in Figure 51. As the image suggests, this cluster would probably have been considered a high risk for development had it formed a couple of months later in the season: the convective shield appears to extend very high into the troposphere and is large (~5° in diameter).

The probabilistic forecasts issued for ND-6 reflect the uncertainty of this case. Figure 52 shows the series of forecasts for the DA and NN classifiers. There is a peak into positive TCGI values around 1800 UTC 5 June, 12 hours after the IR image shown in Figure 51. The DGP during this period was very high, indicating that ND-6 was experiencing very little if any shear over its center. In addition, the system was receiving a strong influx of moisture into its core region, with low-level moisture divergence values in the -1.0 to -1.5 × 10⁻⁷ g kg⁻¹ s⁻¹ range.

Figure 50. Track of cloud cluster ND-6 (2000).


Figure 51. Meteosat-7 IR image of cloud cluster ND-6 (2000). The white arrow points towards the convective maximum of the system. Convection would gradually decrease over the next few days as an unfavorable shear situation established itself over the mid-Atlantic.

The thermodynamic environment was marginally unfavorable, as would be expected in early June in this region. The greatest negative that ND-6 encountered was its low latitude; genesis events generally occur at least 3°-4° poleward of the track of ND-6. Beginning at 0000 UTC 6 June, both classifiers show a gradual decrease in TCGI values, primarily a reflection of the change in the large-scale wind patterns encountered by ND-6: DGP values began to steadily decline around this time. Forecasts issued during the day on 7 June were strongly negative for the DA classifier (-0.9 to -0.5) and near -1 for the NN classifier. The convective identity of ND-6 was lost very soon after that time.

Although the forecasts did fairly well in this case, the life of ND-6 highlights one of the deficiencies of the model: the DGP is the single most skillful predictor of tropical cyclogenesis. This does not mean that the other predictors add no predictive skill; each of them does. But it appears that each classifier leans most heavily on this one parameter. In the case of ND-6, the cloud cluster managed to maintain its identity despite the unfavorable thermodynamic environment because the large-scale dynamics were almost perfect. It has already been shown that the DGP predictor in the DA has the highest correlation to the predicted outcome, and it is assumed that the NN has learned this feature of the data as well.


Figure 52. As in Figure 49, except for cloud cluster ND-6 (2000).


Thus, the classifiers were perhaps fooled for a period of time into believing that ND-6 would become a tropical depression despite the marginal SSTs and low latitude of the system. In fact, the latitude was probably the most important factor that kept the models in check during this time. In the end, both models successfully forecast the dissipation of ND-6 as the large-scale winds turned unfavorable; this was no surprise, as the DGP parameter dropped from near 2.0 to 0.75 in a 24-hour period. This case stresses the need to find better predictors of tropical cyclogenesis, perhaps at smaller scales.

8.5 Danielle (1998)

Danielle originated as an African easterly wave that left the coast on 21 August. The track of the pre-Danielle cloud cluster is shown in Figure 53. Convection was initially widespread and rather disorganized¹. Consolidation occurred during the day on 22 August (Pasch 1998). After 1200 UTC 22 August, the cloud cluster gradually became better organized as convection began to occur near a poorly defined circulation center (Figure 54). A Dvorak T-number of 2.0 was analyzed at 0600 UTC 24 August, and the cloud cluster was therefore declared a tropical depression by the NHC. The depression as it appeared on Meteosat-7 IR imagery around the time genesis was declared is shown in Figure 54; at this time it was located about 600 nautical miles west-southwest of the Cape Verde Islands. Danielle had an interesting post-depression phase that is worth mentioning here.

Figure 53. Pre-tropical depression track of the cloud cluster that eventually developed into Hurricane Danielle.

¹ It is possible that Danielle at this stage was embedded in a dry Saharan Air Layer (Hugh Willoughby Personal Communication 2003).


Figure 54. Pre-Danielle cloud cluster approximately 24 hours prior to genesis (left) and at genesis (right).

Despite seemingly unfavorable large-scale wind patterns (DGP values ranged from 0.11 to 0.85 during the pre-depression stage, well below the average for other developing systems), relatively poor thermodynamic support (MPI values were well above the development average), and little indication of low-level vorticity increases, the tropical depression rapidly intensified to tropical storm strength only 12 hours after it was declared a depression (Figure 55). After only another 18 hours, Danielle was diagnosed as a hurricane with 65-kt sustained winds. It is possible that the vertical shear field rapidly became more favorable after genesis; those values were not calculated for this study. After the rapid intensification phase, Danielle surprisingly lost strength as it moved out into the Atlantic. The NHC believed that southeasterly vertical shear may have contributed to the weakening (Pasch 1998); another likely factor was the cold wake left by Hurricane Bonnie, the existence of which in the SST field was diagnosed with TRMM imagery (Jeffrey Halverson Personal Communication 2002).

The statistical models handled the formation of Tropical Depression Danielle poorly; the post-depression lifecycle provides insight into some of the critical elements that need to be addressed in the future. First, the model has no way of sensing high temporal and spatial resolution changes in the thermodynamic environment (e.g. SSTs), such as the effects of a previous passage of a strong hurricane. Second, judging by the large-scale fields, there was no obvious reason that the pre-Danielle cloud cluster would rapidly intensify into a tropical depression (and subsequently a hurricane).


Figure 55. Tropical Storm Danielle approximately fourteen hours after genesis.

mechanism that overcame the marginal large-scale environment. Perhaps one or more convective bursts occurred (Paula Hennon, personal communication, 2003). Figure 56 shows the failure of the statistical model to pick up on the strengthening phase. Both models issued generally strongly negative forecasts, although the NN showed some positive hints in the data. However, this is more than likely an artifact of the sampling/overfitting problem seen in the Keith case study rather than a real indication of positive development. The DA forecasts are mostly centered on a TCGI value of –0.6 and show a slight decreasing trend in TCGI even as the cloud cluster was rapidly intensifying during 24 August. The model performance on Danielle was the worst seen in the case studies analyzed. The large-scale fields were marginal to unfavorable, and no obvious reason was found for the rapid intensification of the system immediately before and after genesis. This case will provide a good test for future improvements of the model, which may include mesoscale predictors.

8.6 Mitch (1998)

The incipient Hurricane Mitch, like Keith, experienced a long pre-development phase before organizing and making history as the strongest October Atlantic storm on record, responsible for over 9000 deaths in Central America (Guiney and Lawrence 1998). Mitch


Figure 56. As in Figure 49, except for the pre-Danielle cloud cluster.


Figure 57. Pre-tropical depression track of the cloud cluster that was eventually named Mitch.

originated as an easterly wave that moved across the west coast of Africa on 10 October (Figure 57). The system maintained an observable convective maximum across the entire Atlantic basin despite west-southwesterly shearing winds. As the cloud cluster entered the eastern Caribbean Sea on 18 and 19 October, satellite imagery showed increasingly organized convection that continued through 21 October. Soon thereafter, a reconnaissance aircraft found 39 kt. winds at flight level (1500 ft.) with a central pressure of 1001 mb (Guiney and Lawrence 1998). Based on that information and the continuing convective development (Figure 58), a tropical depression was declared by the NHC at 0000 UTC 22 October. Both classifiers handled Mitch very well. Figure 59 shows the series of 6-48 hour forecasts for the system with the genesis time/date labeled as a dark vertical line. For the long non-development phase, the NN classifier gives virtually no indication of any impending development, as TCGI values flatline near –1.0 from 13 October through 17 October. The DA issued negative TCGI forecasts as well, but with a gradually increasing trend. In hindsight, it appears that the DA performed better during this phase of non-development. The cloud cluster maintained its integrity during this time, and one could reasonably assume that a slow intensification process was occurring, especially as the system approached the Caribbean Sea. If the NN forecasts were to be believed, they would indicate a hostile environment that should probably result in the dissipation of convection. This was already seen in the ND-43 (1999) and ND-6 (2000) cases, for example. Both models resolved the intensification period, although the forecasts once again suggest that the DA handled the situation better. Note the increasing, almost linear, trend in TCGI values beginning 48 hours prior to genesis (0000 UTC 20 October) and culminating in the strongly favorable forecasts (TCGI = 0.8 – 1.0) in the 24 hours


Figure 58. GOES-East IR image of the pre-Mitch cloud cluster at approximately genesis time. Image courtesy of the Naval Research Laboratory.

preceding genesis. The NN forecasts, though on average positive throughout the intensification period, show the abnormal, oscillating behavior first seen in the Keith case study. There are subtle indications of the same behavior in the DA forecasts, though it is clearly not as severe. The DA resolved the intensification process about 18 hours sooner than the NN (Figure 60). Once the NN began issuing positive forecasts, however, the TCGI magnitudes quickly increased to levels comparable to the DA forecasts (although the oscillating effect was still evident). As seen in Figure 61, the DA produced better long-term forecasts than the NN as well, although both performed respectably. At 42 hours prior to genesis, both 42-hour forecasts were near neutral (0), with the DA TCGI increasing at a faster rate. As with the Keith case study, both models were slightly tardy in their long-term genesis forecasts, as the TCGI did not go positive until approximately 30 hours before genesis for the 42- and 48-hour forecasts.

8.7 Non-Developing Case 34 (ND-34 2001)

The final case study is of a mid-season cloud cluster that formed over the extremely warm waters of the western Gulf of Mexico. It was selected to test the abilities of the statistical models in such a situation. ND-34 was first distinguishable via IR imagery at 0600 UTC 27


Figure 59. As in Figure 49, except for the pre-Mitch cloud cluster.


Figure 60. Forecast time series for a) 6 hours and b) 12 hours for the pre-Mitch cloud cluster. Genesis occurred at 0000 UTC 22 October, which is at the right edge of the plot.


Figure 61. As in Figure 60, except for the a) 42-hour and b) 48-hour forecast periods.


August. The track of ND-34, shown in Figure 62, illustrates the slow north-northeast movement of the cloud cluster over the next two days. An examination of the values of the large-scale predictors showed that ND-34 experienced an unfavorable shear field for its entire lifetime – DGP values ranged from –0.1 to 0.45. Other predictors were also somewhat unfavorable. MDIV values were positive, indicating a net loss of moisture at low levels. PWAT values were also low (39 – 41 mm), indicating a dry column. Despite the very low MPI values (~870 mb), ND-34 did not develop and dissipated by 1800 UTC 29 August. Forecasts issued for ND-34 (Figure 63) were uniformly negative. The NN forecasts formed their familiar flatline pattern near TCGI = 0, as seen previously, especially in the ND-43

Figure 62. Track of cloud cluster ND-34 (2001).


Figure 63. As in Figure 49, except for the ND-34 (2001) cluster.


case study. DA forecasts were centered at TCGI = –0.6 and showed no obvious trend in either direction during the cluster's lifetime. Recall that Chapter 5 showed the MPI predictor to be poorly correlated with the DA classifications. The results of this case study suggest that both classifiers apply little weight to the thermodynamic environment.

8.8 Summary

Six case studies from the 1998-2001 dataset were examined to ascertain the performance of the statistical models in a simulated forecast environment. In general, both models performed relatively well. It appears that the DA produces reasonable forecasts more often than the NN classifier, possibly due to overfitting by the NN on the training set. In the previous chapter it was shown that the NN might have had a slight edge in terms of statistical scores; results in this chapter suggest that the linear classifier is the better choice at this point in time. Whether the problems stem from tendencies specific to this particular NN or from probabilistic NNs in general is something to be researched in the future. It is theorized, however, that one of the biggest problems affecting both classifiers is the small development sample size, which can only be resolved with the analysis of new cases. It has also become apparent, especially in the Danielle case, that there is a glaring need for more information, probably at the mesoscale, to improve forecasts. Recent technological improvements in remote sensing data provide the opportunity for such work.


CHAPTER 9

SUMMARY, FUTURE WORK, AND CONCLUSIONS

9.1 Summary

This research is based on the premise that tropical cyclogenesis (TCG) is a series of accumulating events that begins with a pre-existing disturbance and a favorable large-scale environment. Since current numerical models have difficulty forecasting TCG, an alternative statistical approach is presented that treats the problem from a probabilistic point of view. The North Atlantic Basin is unique in that approximately 60% of all TCG events occur in association with African Easterly Waves (Avila et al. 2001), whereas TCG in other basins is more frequently associated with monsoon troughs. Genesis events over the North Atlantic, especially in association with easterly waves, are most frequent during the months of August through October. A new dataset of tropical cloud clusters that formed or propagated over the North Atlantic during the 1998-2001 hurricane seasons was built by visually examining GOES and Meteosat infrared imagery. Using identification criteria initially documented by Lee (1989a, 1989b), 432 total cloud clusters were identified, of which 62 achieved genesis. A pilot study was undertaken using the 850 mb relative vorticity in the NCEP-NCAR Reanalysis (NNR) to determine whether the pre-tropical depression cloud clusters were resolved in the NNR. Signatures of all but one cloud cluster were found in the reanalysis. Furthermore, in 75% of the cases the relative vorticity maximum was located within one grid point (278 km) of the National Hurricane Center (NHC) fix at genesis time. Eight large-scale predictors were selected for the study: latitude (COR), daily genesis potential (DGP, McBride and Zehr 1981), low-level moisture divergence (MDIV), maximum potential intensity (MPI, Holland 1997), precipitable water (PWAT), 24-hour pressure tendency (PTEND), 6-hour surface relative vorticity tendency (VTENDSFC), and 6-hour 700 mb relative vorticity tendency (VTEND700). The predictors were derived from the NNR and Reynolds sea surface temperature (SST) data. All grid points within two degrees of the center of the cloud cluster were averaged together to arrive at the predictor value for each case (a sketch of this averaging appears below). Discriminant function loadings showed that DGP and COR were most highly correlated with the outcome of the event.
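As an illustration of this grid-point averaging, the following is a minimal sketch in Python, assuming a regular latitude-longitude grid and a simple two-degree box (rather than a radial distance) around the cluster center. The function name, the synthetic field, and the grid spacing are illustrative assumptions, not the code actually used in this study.

    import numpy as np

    def cluster_predictor(field, lats, lons, clat, clon, radius_deg=2.0):
        """Average all grid points of a 2-D predictor field that lie
        within radius_deg degrees of the cluster center (clat, clon)."""
        lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
        mask = (np.abs(lat2d - clat) <= radius_deg) & \
               (np.abs(lon2d - clon) <= radius_deg)
        return field[mask].mean()

    # Example on a 2.5-degree NNR-like grid with a synthetic field:
    lats = np.arange(0.0, 40.1, 2.5)              # 0N to 40N
    lons = np.arange(-100.0, -9.9, 2.5)           # 100W to 10W
    field = np.random.rand(lats.size, lons.size)  # stand-in for, e.g., PWAT
    print(cluster_predictor(field, lats, lons, clat=12.5, clon=-45.0))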


Two classifiers were selected to relate the predictors to the event outcome (developing or non-developing): discriminant analysis (DA) and a neural network (NN). DA creates a predictive function based on a linear combination of all predictors. In order for DA to be completely valid, the predictors must meet several statistical requirements. A Kolmogorov-Smirnov one-sample test showed that only one predictor (PTEND) satisfied the DA requirement of normality. Several other predictors were shown to have "somewhat" normal distributions; the COR predictor was farthest from a normal distribution. Other requirements were satisfied to a stronger degree: all predictors were uncorrelated with each other, and the variance-covariance matrices varied together for most forecast hours. The lack of normality does not invalidate the DA results, but it degrades the statistical power to some degree. The NN used is a binary classifier with one hidden layer. Using the logistic activation function in the output layer, the NN minimizes cross-entropy error and produces a probabilistic forecast through the Broyden-Fletcher-Goldfarb-Shanno (BFGS) backpropagation training algorithm. A bootstrap technique was employed to optimize the structure of the NN by finding the optimal number of hidden nodes; six hidden nodes provided the best results for all forecast hours except the 12-hour forecast, which used seven. The hidden neurons allow the NN to model non-linear interactions in the data that the DA cannot resolve. Each developing cloud cluster was stratified into eight forecast bins based on the number of hours before genesis. Thus, forecasts were issued back to 48 hours before genesis at six-hour intervals. This necessitated the training of eight NNs and DA functions, one for each forecast period. Each set of developing clusters was trained against the same set of non-developing cases. A comparison of DA versus NN performance for the 1998-2001 seasons showed that there was no clearly superior classifier, although the NN (DA) performed better in non-development (development) cases. The calculation of Heidke Skill Scores (HSS) across a range of decision thresholds showed that the optimal threshold for the NN (DA) classifier was approximately 0.25 (0.86), although these values varied slightly by forecast hour (a sketch of such a threshold sweep appears below). Using those threshold values, the probability of detection (POD) and false alarm ratio (FAR) were tabulated for each classifier. The DA was more skillful at detecting developing systems, but the NN was slightly better at limiting false alarms. False alarm forecasts for both classifiers were very low; for example, the DA (NN) issued only 5 (3) false alarms out of 507 (955) non-development cases at the 48-hour forecast period. It was difficult to determine whether one classifier was in general superior to the other. Finally, six case studies were evaluated for forecast performance from each classifier. All probability forecasts were scaled to an index value of '–1' (unfavorable for genesis) to '+1' (favorable), with '0' being neutral, so that forecasts could be compared across forecast hours.
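The threshold sweep can be made concrete with a short, hedged sketch (not the code used in this study). The Heidke Skill Score for a 2x2 contingency table with hits a, false alarms b, misses c, and correct negatives d is HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)] (Wilks 1995); the probability and outcome arrays and the candidate thresholds below are illustrative assumptions.

    import numpy as np

    def heidke(a, b, c, d):
        # a = hits, b = false alarms, c = misses, d = correct negatives
        den = (a + c) * (c + d) + (a + b) * (b + d)
        return 2.0 * (a * d - b * c) / den if den else 0.0

    def best_threshold(probs, outcomes, thresholds=np.arange(0.05, 1.0, 0.01)):
        """Return the decision threshold maximizing HSS for one forecast
        hour. probs: genesis probabilities; outcomes: 1 = developing,
        0 = non-developing (both NumPy arrays)."""
        scores = []
        for t in thresholds:
            pred = probs >= t
            a = int(np.sum(pred & (outcomes == 1)))
            b = int(np.sum(pred & (outcomes == 0)))
            c = int(np.sum(~pred & (outcomes == 1)))
            d = int(np.sum(~pred & (outcomes == 0)))
            scores.append(heidke(a, b, c, d))
        i = int(np.argmax(scores))
        return thresholds[i], scores[i]

With the same contingency table in hand, POD = a / (a + c) and FAR = b / (a + b) follow directly, as tabulated above.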


This scaling was necessary because each forecast hour had a different decision threshold. As the probabilistic forecast deviated farther from the decision boundary, the tropical cyclogenesis index (TCGI) moved farther from neutral (one possible mapping is sketched below). Four of the six case studies (Keith (2000), Mitch (1998), ND-34 (2001), and ND-43 (1999)) were forecast well by both classifiers, although there are indications that the NN was overfitting the data. Both Keith and Mitch had a long non-developing phase that both classifiers handled skillfully. As each system moved into the Caribbean Sea, both models predicted development to occur, although the timing was slightly tardy. ND-34 and ND-43 were embedded in an unfavorable wind shear environment and were forecast well. Danielle (1998) was a rapidly intensifying system that was handled poorly by both versions of the statistical model; smaller-scale interactions are thought to have contributed to the development, since the large-scale fields were generally unfavorable. ND-6 (2000) was an early-season system that moved off the coast of Africa in a favorable (unfavorable) wind shear (thermodynamic) environment. The poor forecasts issued by both the DA and NN suggest that in this case the thermodynamics may have played a more important role than the classifiers learned in training.
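The exact probability-to-TCGI mapping is not reproduced here, but a piecewise-linear rescale about the forecast-hour-specific decision threshold is one minimal sketch that satisfies the stated properties: the threshold maps to neutral, the endpoints of the probability range map to -1 and +1, and the index grows with the deviation from the decision boundary.

    def tcgi(prob, threshold):
        """Map a probability forecast to the [-1, +1] genesis index,
        with the decision threshold mapped to neutral (0)."""
        if prob >= threshold:
            return (prob - threshold) / (1.0 - threshold)  # favorable side
        return (prob - threshold) / threshold              # unfavorable side

    # With the NN threshold of 0.25: tcgi(0.25, 0.25) == 0.0,
    # tcgi(1.0, 0.25) == 1.0, and tcgi(0.0, 0.25) == -1.0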

9.2 Future Work

Areas of future work fall under one of two broad headings: improvement of model forecasts through the incorporation of new data, tools, and methods; and possible new applications of the model.

9.2.1 Improvement of Model

Potentially, one of the biggest areas for improvement in the model is the inclusion of more skillful predictors, at both the large scale and the mesoscale. It has been shown that even though all eight predictors elevate the forecast skill of the model, the DGP and COR predictors provide by far the greatest contribution to the forecasts. When one considers a case like Danielle, it is immediately clear that a great deal of predictability is not being captured by the DGP and COR predictors (although it has been shown that they provide a large amount of predictability in most cases). Without performing the exploratory work, it is not obvious which kinds of predictors would increase the predictive skill of the model, but there are some worthy candidates. The MPI predictor was used in the hope that it would effectively wrap two important large-scale factors (SST and vertical instability) into one parameter. Judging by the discriminant function loadings and the classification results, MPI was a rather ineffective predictor. Explicitly including barotropic-baroclinic instability and higher resolution (both spatially and temporally) SST fields would be a


realistic modification1. This may allow the model to detect storm-induced cooling and other smaller scale features that are currently smoothed out by the coarse spatial and temporal resolution. Another area that is lacking in the current model is an effective way to represent the moisture field structure, especially in the vertical dimension. The literature has repeatedly emphasized the importance of sufficient low- and mid-level moisture content, primarily to counteract the dry, cool downdrafts that occur near strong convection; it has been shown that convective downdrafts will bring cool, dry air into the lower levels if the mid-levels are too dry. The only moisture predictors presently included are precipitable water (which gives a very rough picture of the overall moisture contained in a column) and low-level moisture divergence. There is no information regarding moisture content at specific levels other than 850 mb. Furthermore, the moisture variables from the NNR are of lower confidence to begin with (see section 3.7.2), especially in comparison to the temperature and wind data. New moisture data, at both the large scale and the mesoscale, could come from satellite sounders (such as TRMM and AMSU) mounted on polar orbiting or even geostationary satellites. Such data would not be regularly available for all systems, however. Recently it has been demonstrated that easterly waves are severely weakened if they propagate into Saharan Air Layer (SAL) regions (Jason Dunion, personal communication, 2002). SALs (Karyampudi and Carlson 1988) can be diagnosed by channel compositing techniques with geostationary satellite imagery. If this effect can be quantified, perhaps as a SAL index value, it would provide an additional skillful measure of moisture, especially for waves in the eastern North Atlantic. As was shown in Chapter 8, the NN predictions appear to suffer from an irregular oscillation in consecutive forecast hours. Those results suggest that the NN may be overfitting the data (despite the restrictive regularization employed), which in turn may be related to the small sample size of developing events. Even if that problem did not exist, there is no clear evidence that the NN outperformed the DA overall. One explanation is that non-linear techniques are not needed in this case to produce good forecasts. Another is that the NN itself may not be optimally configured for this particular situation. Future work should therefore involve the integration and testing of other neural networks to determine whether this version of the NN is inferior to similar incarnations. As discussed in section 3.7.3, the NCEP-NCAR Reanalysis (now commonly abbreviated 'R-1') contains known errors, due to human error and/or gaps in knowledge and understanding,

______1. Strong tropical disturbances act as a natural spatial and temporal smoother of SSTs. A higher resolution SST dataset would not necessarily increase its effectiveness as a predictor.


which probably have a substantial effect on users of the data. Since this research was completed, many of these problems have been addressed and incorporated into an updated version of the NNR called the NCEP-DOE AMIP-II Reanalysis, or R-2 (Kanamitsu et al. 2002). In addition to correcting the known errors, the R-2 reanalysis added new system components and improved the model physics and fixed fields. Many of the fixes and improvements were related to deficiencies in the hydrological budget and moisture parameters, including improvements in the precipitation, albedo, snow cover, and SST fields. As an example of a change that may significantly impact predictor values, precipitable water values over the tropical oceans are 1-3 mm larger in the R-2 reanalysis than in R-1. It has already been shown that an error in upper tropospheric temperatures, with at best a seemingly indirect relationship to any of the predictors, can change predictor values by as much as 15-20% and potentially impact forecast skill. It would be advantageous to replace the NNR dataset with improved data, as R-2 is believed to be, to assess any changes in forecast skill. Of course, as new and even better reanalysis datasets are produced, perhaps at higher spatial and temporal resolution, it is anticipated that the classification forecasts produced by this model would improve.

9.2.2 Applications of the Model

As shown earlier, especially in the previous chapter, it is not clear whether any worthwhile information was gained by issuing forecasts at a 6-hour interval. It is entirely possible that the data do not support a reasonable forecast at that time resolution. Perhaps a better model structure would be to issue a 'short-term' and a 'long-term' genesis forecast based on the current analysis. In this way, the 6-24 hour developing cases could be combined into a single short-term group of developing clusters – this would provide approximately 200 developing cases instead of 50 for the individual forecasts. Although there may be issues with case independence to consider, such a procedure may help to alleviate the data-sampling problem without actually adding new cases. Another possible improvement would be the attachment of a confidence interval to the forecast in a Bayesian type of framework; DA and NN forecasts would be combined to produce a probability distribution around the forecast values.

The model in its current form bases forecasts on the present state of the cloud cluster (with some auxiliary information from the past 6 and 24 hours via the vorticity and pressure tendency predictors). This approach, while shown to be generally useful for prediction, by definition ignores any change in the large-scale fields at future times. A future version of the model could incorporate numerical forecast information and the projected track of the cloud cluster to improve the predictions. For example, if the cloud cluster is currently tracking to the west at 10 kt., that track can be extrapolated 6-48 hours into the future. Then, numerical model forecast fields of temperature, winds, and moisture could be calculated for the region into which the cluster is projected to move. The future track of the cloud cluster itself could also be determined more objectively by consulting the track of the associated tropical wave in the numerical model forecast (similar to the method employed by the updated Statistical Hurricane Intensity Prediction Scheme (SHIPS, DeMaria and Kaplan 1999)). Of course, errors in the track prediction may negatively impact the genesis predictions.

This model was developed specifically for the Atlantic basin. It is quite possible, and even probable, that tropical cyclogenesis in other basins occurs under somewhat different large-scale environments. For example, genesis in the Pacific Basin is frequently associated with monsoon troughs rather than easterly waves (Ritchie and Holland 1999). To apply this model to other basins, tests should first be performed to determine how well the current model predicts genesis there. Then, if it is determined that adjustment is necessary, a basin-specific database should be developed. This would involve adjustments to the current predictors, and perhaps the addition or subtraction of predictors, depending on the situation.

Finally, if this model were to be transferred to an operational environment, several tasks should be performed. First, testing must be performed to ascertain how well the model does with real, operational analyses (rather than the reanalysis). Second, it would be desirable to develop a simple graphical user interface (GUI) that would allow the forecaster quick, convenient 'point-and-click' access to the model. A schematic of such a GUI is shown in Figure 64. It is envisioned that a forecaster would spot a suspicious cloud cluster, enter the coordinates of the cluster in the GUI, and press a button. The GUI would interface with the DA or NN, access the current analysis fields, compute the predictors, and make the probabilistic forecast (a hedged sketch of such a pipeline appears below). The forecaster would have access to the raw probability forecast, the decision boundary, and the TCGI. This information would then be used to help make a genesis forecast.
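The envisioned point-and-click workflow can be summarized in a minimal sketch. Every name here (the stub analysis reader, predictor calculator, and classifier) is a hypothetical stand-in for the real analysis access, predictor computations, and trained DA/NN described above, and the numeric values are placeholders only.

    def read_analysis():
        # Stub: would fetch the current operational analysis fields.
        return {"DGP": 1.2, "COR": 0.4, "MPI": 900.0}  # placeholder values

    def compute_predictors(analysis, clat, clon):
        # Stub: would average grid points near (clat, clon), as sketched earlier.
        return [analysis["DGP"], analysis["COR"], analysis["MPI"]]

    def predict_prob(predictors, forecast_hour):
        # Stub: would evaluate the trained DA function or NN for this hour.
        return 0.31

    def tcgi(prob, threshold):
        # Same piecewise rescale as in the earlier TCGI sketch.
        side = (1.0 - threshold) if prob >= threshold else threshold
        return (prob - threshold) / side

    def genesis_forecast(clat, clon, forecast_hour, threshold):
        """Analysis -> predictors -> probability -> TCGI, as envisioned
        for the GUI in Figure 64."""
        predictors = compute_predictors(read_analysis(), clat, clon)
        prob = predict_prob(predictors, forecast_hour)
        return {"probability": prob, "threshold": threshold,
                "TCGI": tcgi(prob, threshold)}

    print(genesis_forecast(12.5, -45.0, forecast_hour=24, threshold=0.25))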

9.3 Conclusion

It has been shown that useful predictions of tropical cyclogenesis can be obtained through the consideration of a small number of large-scale predictors. Though there is room for improvement in both the model structure and the quality of the predictors, it is believed that significant benefit can be achieved by implementing this system in an operational setting. This is especially true given the "watch and see" techniques currently in use. The need for a good


Figure 64. Concept GUI for an operational tropical cyclogenesis forecast tool.

TCG forecast is becoming increasingly important since the advent of the 5-day time horizon for track and intensity prediction implemented by the National Oceanic and Atmospheric Administration. The need for exact TCG predictions may not be as crucial to life and property as a good intensity or track forecast, but it is the first step towards a basic understanding of one of the intriguing mysteries of tropical meteorology.


LIST OF REFERENCES

Avila, L. A., 1999: Preliminary report: Tropical Depression Seven 5-7 September 1999. Internet. http://www.nhc.noaa.gov/1999seven.html

Avila, L.A., R.J. Pasch, and J.-G. Jiing, 2000: Atlantic tropical systems of 1996 and 1997: Years of contrasts. Mon. Wea. Rev., 128, 3695-3706.

Bankert, R. L., 1994: Cloud classification of AVHRR imagery in maritime regions using a probabilistic neural network. J. Appl. Meteor., 33, 909-918.

Barnard, E., 1992: Optimization for training neural nets. IEEE Transactions on Neural Networks, 3, 232-240.

Bellerby, T., et al., 2000: Rainfall estimation from a combination of TRMM precipitation radar and GOES multispectral imagery through the use of an artificial neural network. J. Appl. Meteor., 39, 2115-2128.

Beven, J. L., 1999: The boguscane - a serious problem with the NCEP medium range forecast model in the tropics. Preprints, 23rd Conf. on Hurr. and Trop. Meteor., Dallas, TX, Amer. Meteor. Soc., 845-848.

Beven, J. L., 2000: Hurricane Keith. Internet. http://www.nhc.noaa.gov/

Bishop, C. M., 1995: Neural networks for pattern recognition. Oxford, Oxford University Press, 482 pp.

Bister, M. and K. A. Emanuel, 1997: The genesis of Hurricane Guillermo: TEXMEX analyses and a modeling study. Mon. Wea. Rev., 125, 2662-2682.

Bracken, W. E. and L. F. Bosart, 2000: The role of synoptic-scale flow during tropical cyclogenesis over the North Atlantic Ocean. Mon. Wea. Rev., 128, 353-376.

Briegel, L. M. and W. M. Frank, 1997: Large-scale influences on tropical cyclogenesis in the Western North Pacific. Mon. Wea. Rev., 125, 1397-1413.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1-3.

Burpee, R. W., 1972: The origin and structure of easterly waves in the lower troposphere of North Africa. J. Atmos. Sci., 29, 77-90.

Camp, J. P. and M. T. Montgomery, 2001: Hurricane maximum intensity: Past and present. Mon. Wea. Rev., 129, 1704-1717.


Charney, J. and A. Eliassen, 1964: On the growth of the hurricane depression. J. Atmos. Sci., 21, 68-75.

Charney, J. and M. E. Stern, 1962: On the stability of internal baroclinic jets in a rotating atmosphere. J. Atmos. Sci., 19, 159-172.

Chen, S. S. and W. M. Frank, 1993: A numerical study of the genesis of extratropical convective mesovortices. Part I: Evolution and dynamics. J. Atmos. Sci., 50, 2401-2426.

Cione, J.J., P.G. Black, and S.H. Houston, 2000: Surface observations in the hurricane environment. Mon. Wea. Rev., 128, 1550-1561.

Climate Diagnostics Center, 2003: The NCEP/NCAR Reanalysis Project. Internet. http://www.cdc.noaa.gov/cdc/reanalysis/reanalysis.shtml

Dai, A. and J. Wang, 1999: Diurnal and semidiurnal tides in global pressure fields. J. Atmos. Sci., 56, 3874-3891.

DeMaria, M. and J. Kaplan, 1999: An updated statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic and Eastern North Pacific basins. Wea. Forecasting, 14, 326-337.

DeMaria, M., J.A. Knaff, and B.H. Connell, 2001: A tropical cyclone genesis parameter for the tropical Atlantic. Wea. Forecasting, 16, 219-233.

Demuth, H. and M. Beale, 2001: Neural network toolbox. Natick, The Mathworks, Inc., 357 pp.

Technical University of Denmark, 2002: ANN:DTU Toolbox. Internet. http://mole.imm.dtu.dk/toolbox/ann/

Dickinson, M. and J. Molinari, 2002: Mixed Rossby-Gravity waves and Western Pacific tropical cyclogenesis. Part I: Synoptic evolution. J. Atmos. Sci., 59, 2183-2196.

Donaldson, R.J., R.M. Dyer, and M.J. Krauss, 1975: An objective evaluator of techniques for predicting severe weather events. Preprints, 9th Conference on Severe Local Storms, Norman, OK, Amer. Meteor. Soc., 321-326.

Doswell, C. A. I., R. Davies-Jones, and D. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576-585.

Dvorak, V. F., 1984: Tropical cyclone intensity analysis using satellite data. NOAA Tech. Rep. NESDIS 11, National Oceanic and Atmospheric Administration, Washington, D.C., 47 pp.

Emanuel, K. A., 1988: The maximum intensity of hurricanes. J. Atmos. Sci., 45, 1143-1155.

Emanuel, K. A., 1989: The finite-amplitude nature of tropical cyclogenesis. J. Atmos. Sci., 46, 2599-2620.

Emanuel, K. A., J.D. Neelin, and C.S. Bretherton, 1994: On large-scale circulations in convecting atmospheres. Quart. J. Roy. Meteor. Soc., 120, 1111-1143.

Emanuel, K.A., 1995: The behavior of a simple hurricane model using a convective scheme based on subcloud-layer entropy equilibrium. J. Atmos. Sci., 52, 3960-3968.


Emanuel, K.A., 2000: A statistical analysis of tropical cyclone intensity. Mon. Wea. Rev., 128, 1139-1151.

Farfan, L. M. and J. A. Zehnder, 1997: Orographic influence on the synoptic-scale circulations associated with the genesis of Hurricane Guillermo (1991). Mon. Wea. Rev., 125, 2683-2698.

Ferreira, R.N., W.H. Schubert, and J.J. Hack, 1996: Dynamical aspects of twin tropical cyclones associated with the Madden-Julian Oscillation. J. Atmos. Sci., 53, 929-945.

Finnoff, W., F. Hergert, and H.G. Zimmermann, 1993: Improving model selection by nonconvergent methods. Neural Networks, 6, 771-783.

Fiorino, M., 2000: Prospects for an improved understanding of tropical cyclones from reanalysis. Preprints, 2nd WCRP International Conference on Reanalysis, Wokefield Park, Reading, UK, World Climate Research Programme, 423-426.

Franklin, J. L., L.A. Avila, J.L. Beven, M.B. Lawrence, R.J. Pasch, and S.R. Stewart, 2001: Atlantic hurricane season of 2000. Mon. Wea. Rev., 129, 3037-3056.

Gray, W. M., 1968: Global view of the origin of tropical disturbances and storms. Mon. Wea. Rev., 96, 669-700.

Gray, W. M., 1979: Hurricanes: Their formation, structure, and likely role in the tropical circulation. Meteorology over the tropical oceans, D. B. Shaw, Ed., Royal Meteorological Society, 155-218.

Gray, W. M., 1984a: Atlantic seasonal hurricane frequency. Part I: El Niño and 30 mb quasi-biennial oscillation influences. Mon. Wea. Rev., 112, 1649-1668.

Gray, W. M., 1984b: Atlantic seasonal hurricane frequency. Part II: Forecasting its variability. Mon. Wea. Rev., 112, 1669-1683.

Guiney, J. L. and M. B. Lawrence, 1998: Hurricane Mitch. Internet. http://www.nhc.noaa.gov/

Hall, J. D., A.J. Mathews, and D.J. Karoly, 2001: The modulation of tropical cyclone activity in the Australian region by the Madden-Julian Oscillation. Mon. Wea. Rev., 129, 2970-2982.

Hall, T., H.E. Brooks, and C.A.I. Doswell, 1999: Precipitation forecasting using a neural network. Wea. Forecasting, 14, 338-345.

Heming, J. T., 1997: UK Meteorological Office forecast performance during the unusual Atlantic hurricane season of 1995. Preprints, 22nd Conf. on Hurricanes and Tropical Meteorology, Ft. Collins, CO, Amer. Meteor. Soc., 511-512.

Hirschberg, P. A. and J. M. Fritsch, 1993: On understanding height tendency. Mon. Wea. Rev., 121, 2646-2661.

Holland, G. J., 1997: The maximum potential intensity of tropical cyclones. J. Atmos. Sci., 54, 2519-2540.


Hsieh, W. W. and B. Tang, 1998: Applying neural network models to prediction and data analysis in meteorology and oceanography. Bull. Amer. Meteor. Soc., 79, 1855-1870.

Inoue, M., et al., 2002: Bimodal distribution of tropical cyclogenesis in the Caribbean: Characteristics and environmental factors. J. Climate, 15, 2897-2905.

IRI Data Library, 2003: NOAA NCEP EMC CMB Global Reyn_SmithOIv1 climatology. Internet. http://ingrid.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.EMC/.CMB/.GLOBAL/ .Reyn_SmithOIv1/.climatology/

Jarvinen, B. R., C.J. Neumann, and A.S. Davis, 1984: A tropical cyclone data tape for the North Atlantic Basin, 1886-1983: Contents, limitations and uses. NOAA Tech. Memo NWS NHC-22, National Hurricane Center, Miami, FL, 21 pp.

Jenkins, M. A., 1995: The cold-core temperature structure in a tropical easterly wave. J. Atmos. Sci., 52, 1168-1177.

Kalnay, E., and co-authors, 1996: The NCEP/NCAR 40-year reanalysis project. Bull. Amer. Meteor. Soc., 77, 437-471.

Kanamitsu, M., W. Ebisuzaki, J. Woolen, S.-K. Yang, J.J. Hnilo, M. Fiorino, and G.L. Potter, 2002: NCEP-DOE AMIP-II reanalysis (R-2). Bull. Amer. Meteor. Soc., 83, 1631-1643.

Karyampudi, V. M. and T. N. Carlson, 1988: Analysis and numerical simulation of the Saharan Air Layer and its effect on easterly wave disturbances. J. Atmos. Sci., 45, 3102-3136.

Khain, A. and I. Ginis, 1991: The mutual response of a moving tropical cyclone and the ocean. Beitr. Phys. Atmos., 64, 125-141.

Kolenda, T., S. Sigurdsson, O. Winther, L.K. Hansen, and J. Larsen, 2002: DTU:Toolbox. Internet. http://mole.imm.dtu.dk/toolbox/ann

Krasnopolsky, V. M., L.C. Breaker, and W.H. Gemmill, 1995: A neural network as a nonlinear transfer function model for retrieving surface wind speeds from the SSM/I. J. Geophys. Res., 100, 11033-11045.

Kurihara, Y. and M. Kawase, 1985: On the transformation of a tropical easterly wave into a tropical depression: A simple numerical study. J. Atmos. Sci., 42, 68-77.

Kwon, H. J. and M. Mak, 1990: A study of the structural transformation of the African easterly waves. J. Atmos. Sci., 47, 277-292.

Lander, M. A., 1994: Description of a monsoon gyre and its effects on the tropical cyclones in the Western North Pacific during August 1991. Wea. Forecasting, 9, 640-654.

Lawrence, M. B., L.A. Avila, J.L. Beven, J.L. Franklin, J.L. Guiney, and R.J. Pasch, 2001: Atlantic hurricane season of 1999. Mon. Wea. Rev., 129, 3057-3084.

Lee, C. S., 1989a: Observational analysis of tropical cyclogenesis in the Western North Pacific. Part I: Structural evolution of cloud clusters. J. Atmos. Sci., 46, 2580-2598.

Lee, C. S., 1989b: Observational analysis of tropical cyclogenesis in the Western North Pacific. Part II: Budget analysis. J. Atmos. Sci., 46, 2599-2620.


MacKay, D. J. C., 1992a: The evidence framework applied to classification networks. Neural Computation, 4, 720-736.

MacKay, D. J. C., 1992b: A practical Bayesian framework for backpropagation. Neural Computation, 4, 448-472.

Malkus, J. S. and H. Riehl, 1960: On the dynamics and energy transformations in steady-state hurricanes. Tellus, 12, 1-20.

Maloney, E.D. and D.L. Hartmann, 2001: The Madden-Julian Oscillation, barotropic dynamics, and North Pacific tropical cyclone formation. Part I: Observations. J. Atmos. Sci., 58, 2545-2558.

Marzban, C., 1995: Scalar measures of performance in rare-event situations. Wea. Forecasting, 13, 753-763.

Marzban, C., 2000: A neural network for tornado diagnoses. Neural Computing and Applications, 9, 133-141.

Marzban, C. and G. J. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar-derived attributes. J. Appl. Meteor., 35, 617-626.

Marzban, C. and A. Witt, 2001: A Bayesian neural network for hail size prediction. Wea. Forecasting, 16, 600-610.

Masters, T., 1993: Practical neural network recipes in C++. Academic Press, 493 pp.

McAdie, C. and M. B. Lawrence, 2000: Improvements in tropical cyclone track forecasting in the Atlantic basin, 1970-98. Bull. Amer. Meteor. Soc., 81, 989-997.

McBride, J. L., 1981: Observational analysis of tropical cyclone formation. Part I: Basic description of data sets. J. Atmos. Sci., 38, 1117-1131.

McBride, J. L., 1995: Tropical cyclone formation. Global perspectives on tropical cyclones, World Meteorological Organization Tech. Document WMO/TD 693, 63-105.

McBride, J. L. and T. D. Keenan, 1982: Climatology of tropical cyclone genesis in the Australian region. Journal of Climatology, 2, 13-33.

McBride, J. L. and H. E. Willoughby, 1986: Comment - an interpretation of Kurihara and Kawase's two-dimensional tropical-cyclone development model. J. Atmos. Sci., 43, 3279-3283.

McBride, J. L. and R. Zehr, 1981: Observational analysis of tropical cyclone formation. Part II: Comparison of non-developing versus developing systems. J. Atmos. Sci., 38, 1132-1151.

Miller, B. I., 1958: On the maximum intensity of hurricanes. J. Meteor., 15, 184-195.

Molinari, J., D. Knight, M. Dickinson, D. Vollaro, and S. Skubis, 1997: Potential vorticity, easterly waves, and Eastern Pacific tropical cyclogenesis. Mon. Wea. Rev., 125, 2699-2708.


Molinari, J., S. Skubis, D. Vollaro, F. Alsheimer, and H.E. Willoughby, 1998: Potential vorticity analysis of tropical cyclone intensification. J. Atmos. Sci., 55, 2632-2644.

Montgomery, M. T. and J. Enaganio, 1998: Tropical cyclogenesis via convectively forced vortex Rossby waves in a three-dimensional quasigeostrophic model. J. Atmos. Sci., 55, 3176-3207.

Mozer, J. B. and J. A. Zehnder, 1996a: Lee vorticity production by large-scale tropical mountain ranges. Part I: Eastern North Pacific tropical cyclogenesis. J. Atmos. Sci., 53, 521-538.

Mozer, J. B. and J. A. Zehnder, 1996b: Lee vorticity production by large-scale tropical mountain ranges. Part II: A mechanism for the production of African waves. J. Atmos. Sci., 53, 539-549.

Pasch, R. J., 1998: Hurricane Danielle. Internet. http://www.nhc.noaa.gov/

Pasch, R. J., 2002: Forecasting tropical cyclogenesis in the NCEP global model. Preprints, 25th Conference on Hurricanes and Tropical Meteorology, San Diego, CA, Amer. Meteor. Soc., 178-179.

Pasch, R. J., L.A. Avila, and J.L. Guiney, 2001: Atlantic hurricane season of 1998. Mon. Wea. Rev., 129, 3085-3123.

Perrone, T. J. and P. R. Lowe, 1986: A statistically derived prediction procedure for tropical storm formation. Mon. Wea. Rev., 114, 165-178.

Reed, R. J., A. Hollingsworth, W.A. Heckley, and F. Delsol, 1988: An evaluation of the performance of the ECMWF operational system in analyzing and forecasting easterly wave disturbances over Africa and the tropical Atlantic. Mon. Wea. Rev., 116, 824-865.

Reynolds, R. W., 1988: A real-time global sea surface temperature analysis. J. Climate, 1, 75-86.

Reynolds, R. W. and T. M. Smith, 1994: Improved global sea surface temperature analysis using optimum interpolation. J. Climate, 7, 929-948.

Richard, M. D. and R. P. Lippmann, 1991: Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3, 461-483.

Riehl, H., 1954: Tropical meteorology. McGraw-Hill, 392 pp.

Ritchie, E. A. and G. J. Holland, 1993: On the interaction of tropical-cyclone-scale vortices. II: Discrete vortex patches. Quart. J. Roy. Meteor. Soc., 119, 1363-1379.

Ritchie, E. A. and G. J. Holland, 1997: Scale interactions during the formation of Typhoon Irving. Mon. Wea. Rev., 125, 1377-1396.

Ritchie, E. A. and G. J. Holland, 1999: Large-scale patterns associated with tropical cyclogenesis in the Western Pacific. Mon. Wea. Rev., 127, 2027-2043.

Rotunno, R. and K. A. Emanuel, 1987: An air-sea interaction theory for tropical cyclones. Part II: Evolutionary study using a non-hydrostatic axisymmetric numerical model. J. Atmos. Sci., 44, 542-561.


Shapiro, L. J., 1977: Tropical storm formation from easterly waves: A criterion for development. J. Atmos. Sci., 34, 1007-1022.

Shapiro, L. J., 1982a: Hurricane climatic fluctuations. Part I: Patterns and cycles. Mon. Wea. Rev., 110, 1007-1013.

Shapiro, L. J., 1982b: Hurricane climatic fluctuations. Part II: Relation to large-scale circulation. Mon. Wea. Rev., 110, 1014-1023.

Simpson, J., E.A. Ritchie, G.J. Holland, J. Halverson, and S.R. Stewart, 1997: Mesoscale interactions in tropical cyclone genesis. Mon. Wea. Rev., 125, 2643-2661.

SSEC, 2003: Images and data. Internet. http://www.ssec.wisc.edu/data/

Surgi, N., H.-L. Pan, and S.J. Lord, 1998: Improvement of the NCEP global model over the tropics: An evaluation of model performance during the 1995 hurricane season. Mon. Wea. Rev., 126, 1287-1305.

Tabachnick, B. G. and L. S. Fidell, 2001: Discriminant function analysis. Using Multivariate Statistics. R. Pascal, ed., Boston, Allyn and Bacon, 456-516.

Thorncroft, C. and K. Hodges, 2001: African easterly wave variability and its relationship to Atlantic tropical cyclone activity. J. Climate, 14, 1166-1179.

Velasco, I. and J. M. Fritsch, 1987: Mesoscale convective complexes in the Americas. J. Geophys. Res., 92, 9591-9613.

Vincent, D. G. and R. G. Waterman, 1979: Large-scale atmospheric conditions during the intensification of Hurricane Carmen (1974) I. Temperature, moisture, and kinematics. Mon. Wea. Rev., 107, 283-294.

Watterson, I. G., J.L. Evans, and B.F. Ryan, 1995: Seasonal and interannual variability of tropical cyclogenesis: Diagnostics from large-scale fields. J. Climate, 8, 3052-3066.

Wilks, D. S., 1995: Statistical methods in the atmospheric sciences. San Diego, Academic Press, 467 pp.

Willoughby, H.E., J.A. Clos, and M.G. Shoreibah, 1982: Concentric eye walls, secondary wind maxima, and the evolution of the hurricane vortex. J. Atmos. Sci., 39, 395-411.

Xu, K.-M. and K. A. Emanuel, 1989: Is the tropical atmosphere conditionally unstable? Mon. Wea. Rev., 117, 1471-1479.

Zehnder, J. A., 1991: The interaction of planetary-scale tropical easterly waves and topography: A mechanism for the initiation of tropical cyclones. J. Atmos. Sci., 48, 1217-1230.
