RAIN FIELD PREDICTION IN LAND-FALLING TROPICAL CYCLONES: FROM A SPATIO-TEMPORAL PERSPECTIVE USING GROUND-BASED DOPPLER WEATHER OBSERVATIONS AND “BIG-SPATIAL-DATA” TECHNOLOGIES

By JINGYIN TANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Jingyin Tang

To my Mother, Father and beloved Ruixia

ACKNOWLEDGMENTS

This work is supported by National Science Foundation Research Grant BCS-1053864, the University of Florida Research Opportunity Seed Fund, an Intel Code Optimization Research Grant and a Microsoft Azure Research Grant. I acknowledge my advisor, Dr. Corene Matyas, whose magnificent mentoring and continuous support made my PhD study and this dissertation possible. I am truly grateful for her exceptional advising and mentoring as my committee chair. I acknowledge the advice of all my committee members: Dr. Markus Schneider, Dr. Michael Binford and Dr. Stefan Gerber. I want to thank all group members, Dr. Stephanie Zick, Dr. Jose Hernandez, Yao Zhou, Yu Wang, Guoqian Yan, Sanghoo Kim, and all students in the group. I want to thank my family for all of their support during my PhD study. Finally, I truly thank my beloved Ruixia for her great support of every decision in my life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 Tropical Cyclone and Doppler Weather Radar
   1.2 Literature Review
      1.2.1 Geographic Information System and Its Applications to Radar Meteorology
      1.2.2 Doppler Radar Products and Their Applications in Geography
         1.2.2.1 Hydrological research
         1.2.2.2 Climatology research
         1.2.2.3 Geospatial informatics
         1.2.2.4 Summary and limitations
   1.3 Overview of Study
      1.3.1 Intellectual merits
      1.3.2 Broader Impacts

2 FAST PLAYBACK FRAMEWORK FOR ANALYSIS OF GROUND-BASED DOPPLER RADAR OBSERVATIONS USING MAPREDUCE TECHNOLOGY
   2.1 Overview of 3D Weather Radar Mosaic
   2.2 Advantages of Diagnostic Playback over Real-time Processing
   2.3 Playback System Architecture
      2.3.1 Parallelization Principle and Software
      2.3.2 Radar Data Mosaic Methods
      2.3.3 Introduction to Resilient Distributed Dataset and MapReduce
      2.3.4 Processing Radar Data using RDDs
         2.3.4.1 Preprocess
         2.3.4.2 Map function chain
         2.3.4.3 Shuffle and reducing function chain
   2.4 Mosaicking Radar Data
      2.4.1 Combining Scalar and Vector Radar Variables
      2.4.2 Communicating with Geospatial Analytics Functions
   2.5 Case Demonstration
   2.6 Summary

3 ARC4NIX: A CROSS-PLATFORM GEOSPATIAL ANALYTICAL LIBRARY FOR CLUSTER AND CLOUD COMPUTING
   3.1 Overview
      3.1.1 Background
      3.1.2 Arc4nix
   3.2 Design, Architecture and Implementation
      3.2.1 Architecture Overview
      3.2.2 Protocol Design and Client Implementation
      3.2.3 Server Implementation
   3.3 Case Study
   3.4 Conclusion and Future Plan

4 A SPATIO-TEMPORAL NOWCASTING MODEL FOR TROPICAL CYCLONES USING SEMI-LAGRANGIAN SCHEME
   4.1 Overview
      4.1.1 Tropical Cyclones and Precipitation Nowcasting
      4.1.2 State-of-the-art Radar-based Nowcasting Methods
      4.1.3 Motivation and Goals
   4.2 Methodology
      4.2.1 Overview
      4.2.2 Basic Method
      4.2.3 Calculating Motion Field
      4.2.4 Motion Field Correction
      4.2.5 Advection Scheme
      4.2.6 Determine the Source/Sink Term and Extrapolation
   4.3 Use of Geographic Information System as an Integrated Platform
   4.4 Performance Evaluation
      4.4.1 Data and Methods
      4.4.2 Results
   4.5 Summary and Future Study

5 CONCLUSION
   5.1 Innovations on Generating Radar Products in Geospatial Formats
   5.2 Capability Enhancements of Geospatial Analytics on Radar Products
   5.3 Predictability Improvements on Tropical Cyclone Rainfall Using Weather Radar
   5.4 Future Directions

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 GIS support on major Cloud Computing platforms

LIST OF FIGURES

2-1 A sample procedure for processing data in the MapReduce programming model. Arrows indicate data flow.
2-2 The architecture and procedure of mosaicking radar variables.
2-3 Reflectivity contours of Hurricane Charley and the frontal zone to the north at 2035 UTC 13 Aug. The left figure shows a slice at a height of 4.0 km. The “Contour” function in ArcGIS Runtime is employed to create polygons of contours at intervals of 2 dBZ. The right figure is the zoomed-in image of both reflectivity contours and the optimized horizontal wind vector near the eye.
2-4 Demonstration of constructing an isosurface at 30 dBZ in Hurricane Charley. The vertical expansion of the eyewall and the outer rainbands’ structure are clearly shown.
2-5 Reflectivity contours of TS Bill at 0000 UTC 17 Jun. The left figure shows a contour slice at a height of 4.0 km. The right figure shows the corresponding mosaic of Kdp.
2-6 Total running time in hours of playing back 24 hours of data during Hurricane Charley with 17 radars on 2, 4, 8, 16, 32 and 64 CPU cores, with 2 GB of memory assigned to each CPU core.
3-1 The overview of arc4nix. Arc4nix uses a C/S architecture. When a geospatial function is called at the client side from a Python script, the client generates a geospatial task and sends it to the server. The server executes the task and returns messages to the client, writing results to shared storage accessible from the client.
3-2 Workflow in synchronous mode (left) and asynchronous mode (right). In the synchronous mode, the server responds after the task is executed, while in asynchronous mode, the server immediately acknowledges the submitted task and runs the task in another session. The client can check the progress of submitted tasks in the asynchronous mode.
3-3 Workflow of spatial analysis in the Isabel (2003) case. The first four steps are repeated for each input layer, and the last step aggregates from all layers.
3-4 Observed (left) and simulated (right) radar reflectivity polygons when Hurricane Isabel (2003) makes landfall. A time series of similar reflectivity layers is created every 10 minutes from 0930 UTC Sep 18 to 2100 UTC Sep 19 2003.
3-5 System throughputs (files/min) of the Isabel case using different numbers of CPUs for observed reflectivity and simulated reflectivity.

4-1 The procedure of the tracking stage. It takes four consecutive reflectivity images over 30 minutes at a 10-minute interval. The TREC motion vector is based on a nested TREC calculation scheme.
4-2 Workflow of the prediction stage, which this study sets to 8 hours with a 10-minute time step.
4-3 Illustration of a pixel’s displacement vector and its actual track.
4-4 The variational analysis corrects the interpolated field (green in left pane) to a realistic and symmetric field (red in right pane).
4-5 Simulated track of an air parcel as it moves in a spiral trajectory into the eyewall of an idealized stationary hurricane using the semi-Lagrangian scheme.
4-6 Radar reflectivity at Isabel’s landfall at 1927 UTC 18 Sep. 2003. It is a composite reflectivity below 4 km altitude (below the freezing layer). It is noticeable that no clear boundary of the individual radar circles can be seen.
4-7 The source/sink term Q for the prediction period.
4-8 A zoomed view of motion vectors in the inner core area with a background of reflectivity Z. For visibility, vectors are shown at 16×16 block-size (48 km × 48 km) scale.
4-9 Correlation as defined in Eq. 4–2 between forecast and observation during the prediction period.
4-10 Skill scores during the prediction period.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

RAIN FIELD PREDICTION IN LAND-FALLING TROPICAL CYCLONES: FROM A SPATIO-TEMPORAL PERSPECTIVE USING GROUND-BASED DOPPLER WEATHER OBSERVATIONS AND “BIG-SPATIAL-DATA” TECHNOLOGIES

By Jingyin Tang

August 2017
Chair: Corene J. Matyas
Major: Geography

Doppler weather radars are ideal instruments for observing precipitation in tropical cyclones when they move over land. Methodological and technical challenges arise in the use of radar data in research due to the high spatio-temporal resolution, large volume and high transmission rate of the data. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand precipitation processes in tropical cyclones. Accurate observational data and reliable prediction models are both essential to improve forecast quality, while technical innovations are critical to provide those data in time. Thus, this dissertation focuses on improving the short-term predictability of tropical cyclone precipitation with Doppler weather radar observations, from both methodological and technical perspectives. It also aims to develop algorithms that prioritize high efficiency, convenience and feasibility for both the meteorological and geographic communities. The analysis of historical weather events should utilize radar data from both sides of a moving temporal window and process them in a flexible data architecture, which is not available in most standalone software tools or real-time systems. The first study presents a MapReduce-based playback framework using Apache Spark’s computational engine to interpolate large volumes of radar Level-II data onto 3D grids. Although designed to be friendly to use on a high-performance computing cluster, these methods may also be executed on a machine with a low-end configuration. A protocol is designed

to enable interoperability with Geographic Information Systems and spatial analysis functions in this framework. Open-source software is utilized to enhance radar usability in the non-specialist community. Case studies during tropical cyclone landfall show this framework’s capability of efficiently creating a large-scale, high-resolution 3D radar mosaic with integrated GIS functions for spatial analysis. In the second study, the ability to use a GIS for geospatial analysis with radar data is extended to provide a comprehensive, scalable geospatial analytical library with strong compatibility with the market-dominating ArcGIS software stack on a personal workstation, a high-performance computing cluster, or modern Cloud Computing platforms. This cross-platform geospatial library “arc4nix” permits the application of a wide range of geospatial methods that exist in ArcGIS and enables the geographic community to perform sophisticated analysis of weather radar data with minimal technical difficulty. It also supports parallel computational tasks using multiple CPU cores and computers for large-scale analyses. In the third study, an improved forecasting model based on the semi-Lagrangian advection scheme, the variational analysis technique and Geographic Information Systems is presented. The combination of these methods extends the reliable rainfall prediction period in tropical cyclones to about 7 hours. The models presented in this dissertation are shown to improve tropical cyclone rainfall predictability and rainfall data processing capability with better spatio-temporal resolution and accuracy.

CHAPTER 1
INTRODUCTION

1.1 Tropical Cyclone and Doppler Weather Radar

Tropical cyclones (TCs) are rotating low pressure systems that develop over the warm waters of tropical oceans. TCs have a spiral shape with low pressure in the center of the system, in an area known as the eye. The most intense winds and rainfall tend to be concentrated in the eyewall of the TC, an area dominated by intense convective activity associated with towering cumulonimbus clouds. TCs also have outer rainbands that can produce significant rainfall at a farther distance from the storm center. As TCs approach land, they pose a danger to life and property with their associated fast winds, storm surges, and rainfall. Once they move inland, many of the forecasting challenges and deaths stem from heavy rainfall (Rappaport, 2000, 2014). During 1963–2012, nearly 50% of TCs with fatalities had at least one death from freshwater flooding (Rappaport, 2014). Additionally, Czajkowski et al. (2011) found that a one-inch increase in rainfall increased total fatalities by 28%, while more than a third of deaths occurred where storm-total rainfall was only 3–6 inches. As TCs make landfall, rapid changes occur due to the heterogeneity of friction, moisture, and temperature present in the land surface, which differs markedly from the ocean (Powell, 1982; Jones, 1987; Tuleya, 1994; Li et al., 1997; Mackey and Krishnamurti, 2001; Kimball, 2008; Huang and Liang, 2010). Weather radar, formally called Doppler weather surveillance radar (WSR), is a type of radio remote sensing device that measures ice, liquids and clouds in the atmosphere. Weather radars are designed to locate precipitation, calculate its motion and estimate its hydrometeor types (rain, hail, snow, etc.). In the United States, more than 160 stationary, ground-based Weather Surveillance Radar – 1988 Doppler (WSR-88D) units form the Next Generation Radar (NEXRAD) network, operated by the National Weather Service (NWS). The NEXRAD radar network provides non-stop, nationwide weather observations at high spatio-temporal resolution (Bluestein and Hazen, 1989; Bluestein et al., 2014):

its resolution can reach about 250 meters, with updates every 5 minutes. Due to the complex radial and tangential motions of convective storm cells within TC rainbands, only the weather radar network can provide continuous monitoring of storm evolution at large distances and high resolution. Thus, it is the ideal instrument for observing the evolution of TCs as they approach the coastline and move inland.

1.2 Literature Review

1.2.1 Geographic Information System and Its Applications to Radar Meteorology

Geographic Information System (GIS) is a computer system designed to store, manipulate, analyze and present geographic and spatial data. Nowadays, the acronym GIS also refers to “Geographic Information Science”. Conceptually, GIS not only represents a tool to perform spatial analyses but also extends to a comprehensive research discipline called “geoinformatics” (Sinha, 2006). An extended literature review demonstrates that GIS has made significant contributions to radar meteorology. Previous research can be generally categorized into three types:

1. Applying spatial analytical methods to radar data and products to study weather and climate. Research has shown that the use of a GIS can enhance the understanding of radar mosaic images by measuring shape metrics and applying geospatial analytics (Matyas, 2007; Hu, 2014; Tiranti et al., 2014). For example, tracing shape changes of rain fields can help reveal a TC’s interaction with topographical features or middle-latitude weather systems (Lin et al., 2002). However, few geographers actually use radar data in their research (Matyas, 2010c).

2. Using GIS as a tool to improve the operational quality of weather radar. For example, Krajewski et al. (2006) use GIS to calculate beam blockage maps from a Digital Elevation Model (DEM), the results of which are useful in radar quality control processes. GIS is also applied in multiple countries for radar network planning (Minciardi et al., 2003; Gabella and Perona, 1998; Smith and Pielke, 2001).

3. Using GIS as an integrated platform to store, edit, manipulate and visualize radar data. Research has shown that GIS can serve as a powerful platform to manage and visualize weather data, including radar data and products, as high-quality maps (Smith and Lakshmanan, 2006; Hu, 2014; Stoffel et al., 2001).

Although the usefulness of GIS in radar meteorology is evident, technical bottlenecks prevent extending GIS into operational applications of weather radar. GIS applications have historically been programmed mostly for the Windows operating system, whereas radar meteorological applications are mostly programmed for the Linux operating system. The differences in software architecture between GIS and radar meteorology reflect two completely different concepts and purposes in the two disciplines: GIS software is principally user-centric (Lanter and Essinger, 1991); it prioritizes user experiences on personal computers (PCs), so its computational capabilities are consequently limited by the capability of a single PC. In contrast, radar meteorological software is data-centric, and thus prioritizes computational capabilities and performance. Its primary operational environments are dedicated high-performance workstations and computer clusters with very limited user interaction. The size and complexity of weather, climate and earth observation datasets often exceed the capabilities of a single PC, as well as those of common GIS software. However, some critical scenarios like warning decisions require that GIS software provide high-performance capabilities for certain operations, including storage, retrieval, and processing of observed and modeled weather and climate data. Conventional GIS has increasingly developed bottlenecks in computational efficiency (Goodchild et al., 1996; Healey et al., 1997; Huang et al., 2007), so it cannot fulfill those requirements. Weather-observational data from NEXRAD are extremely intensive given their data collection rate and total volume. For example, to trace the rain fields associated with Hurricane Charley as it moved across Florida and South Carolina in 2004, along with the frontal system with which it interacted, a domain of roughly 1000 × 1000 × 10 km is required. To observe this domain using WSR-88D radars, dozens of stations must be analyzed together. Experiments show that software that runs a mosaicking algorithm on this domain at 500 × 500 × 100 m resolution, using double-precision floating-point numbers, requires a total of 37 GB of memory, and real-time algorithms require those calculations be done in 5 minutes or less. These requirements exceed the hardware capability of most PCs, as well as the capabilities of many kinds of conventional GIS software, because they are technically limited to using no more than 2 GB of memory due to their software architecture, regardless of the total memory installed on the machine. Recently, the “Big Data” (Marr, 2015) and “Big Spatial Data” (Vatsavai et al., 2012) concepts have been proposed and widely accepted. Although there is no formal scientific definition of “Big Data”, the term typically refers to datasets with the “four Vs”: volume, velocity, variety and veracity. In the IT industry, “Big Data” not only indicates large datasets that impose technical challenges, but also a complete software stack based on Apache Hadoop and its derivatives (Zikopoulos et al., 2011). Similarly, the concept of “Big Spatial Data” also lacks a formal scientific definition, but the GIS community uses the term to refer to “Big Data” with location information. Both the GIS and atmospheric science communities agree that weather and climate data are “Big Spatial Data” (Yang et al., 2017a,b), because weather data can always be associated with a particular geographic location on the earth. Since 2012, many new tools have been designed and developed by the information technology community to handle Big Data in an efficient manner. The GIS community quickly adopted those tools and demonstrated many successful applications using weather and climate data (Li and Wang, 2017; Bosler et al., 2016).

1.2.2 Doppler Radar Products and Their Applications in Geography

Employing radar data in research outside of radar meteorology mostly means using radar products. In the U.S., two levels of products are provided in the public domain: Level II data and Level III products. In the Federal Meteorological Handbook (Office of the Federal Coordinator for Meteorological Services and Supporting Research, 2005), Level II data are defined as “digital radial base data (Reflectivity, Radial Velocity, Spectrum Width) and Dual-Polarized Variables (Differential Reflectivity, Correlation Coefficient, and Differential Phase) output from the signal processor in the Radar Data Acquisition unit”. According to the Radar Operation Center (ROC) in the National Oceanic and Atmospheric Administration (NOAA), the purposes of Level II data are to “support of operations, maintenance and developmental activities at the ROC” and “activities directed toward algorithm and product improvement”. Also, Level III products are defined as “the output product data of the Radar Product Generator (RPG)”. The Level III products “assist forecasters and others in weather analysis, forecasts, warnings and weather tracking”. Through an extensive literature review, we found that Level III products are more frequently used by non-meteorologists, including geographers. These studies generally cover three major topics: hydrology, climatology and geospatial informatics.

1.2.2.1 Hydrological research

Since Level III products provide a direct data source of precipitation at hourly and 3-hour resolution with continuous spatial coverage, they are widely used in many aspects of hydrological research related to rainfall estimation (Krajewski and Smith, 2002; Andrieu et al., 1997), flood hazards (Creutin and Borga, 2003; Kouwen, 1988), spatio-temporal modeling (Cole and Moore, 2009; Bruen, 2000) and many other topics (Delrieu et al., 2009; Cluckie and Collier, 1991). However, in contrast to rain gauge measurements, rainfall rate is indirectly derived from echoed radar reflectivity, which may lead to biased results. Comparison studies between rain gauges and radar products (Johnson et al., 1999; Steiner et al., 1999) show that bias-corrected radar rainfall products can provide more accurate information than interpolating rain gauge data over the study domain (Haberlandt, 2007).

1.2.2.2 Climatology research

Although the operational weather radar network has a shorter historical period compared with other observational instruments like gauges and radiosondes, its continuous spatial coverage provides a unique data source for regional and recent climatological research. Climatological research utilizing Level III products is closely related to rainfall (Knight and Davis, 2007) and cloud (Kuo and Orville, 1973) patterns. Although the WSR-88D radar network operationally started in the mid-1990s, climatological research using experimental stations at a regional scale can be traced back to earlier years. For example, Croft and Shulman (1989) report a five-year regional rainfall climatology study. After the operational WSR-88D network came online, longer-term studies were also reported by Overeem et al. (2009b) and Holleman (2007) using NEXRAD Level-III products. Extreme weather events are also widely studied using radar products. For example, Overeem et al. (2009a, 2010) used radar products to model local convective rainfall patterns, and Cintineo et al. (2012) presented a nationwide hail climatology for the contiguous United States.

1.2.2.3 Geospatial informatics

Weather radar products generally do not directly contribute to geospatial informatics studies, as the two fields share little common knowledge base. However, GIS applications and services widely adopt weather radar. For example, many mobile applications and web maps show real-time radar images to provide public weather information. In particular, weather alert and warning systems rely directly on high-resolution weather radar products to provide up-to-date information about rainfall to GIS modules (Sznaider et al., 2004; Carpenter et al., 1999; Yu et al., 2005).

1.2.2.4 Summary and limitations

In general, because of the natural characteristics of radar data (i.e., high resolution, fast updates and continuous coverage), wide applications in physical geography, GIS and other related areas are evident. However, utilization of Level II data is rare; most studies use only Level III data. This is because Level III data are preprocessed, saved in simpler formats, and contain less information, so they can be understood without professional radar meteorological training. However, Level III products such as base reflectivity (the lowest-altitude radial scan) and composite reflectivity (the vertical aggregation of the maximum reflectivity from all altitudes) do not retain information regarding elevation, and thus do not allow the vertical profiles of raining areas to be discerned. A better solution is to create a product using algorithms similar to those for Level III but that yield output for multiple vertical levels and that are less complicated to perform. Providing a straightforward interpretation of radar data at different altitudes and storing it in a common geospatial format will enhance weather and climate research by permitting a larger group of scientists to more easily access the data.

1.3 Overview of Study

1.3.1 Intellectual merits

This dissertation is a cross-disciplinary study that spans three primary topics: physical geography, atmospheric science and information technology. It improves our understanding and predictability of where rainfall occurs as TCs move over land from both methodological and technological perspectives. From a methodological perspective, this study applies geospatial informatics to high-resolution spatio-temporal data to build a short-term TC forecasting model. From a technological perspective, this dissertation improves the technical framework in a way that empowers researchers to efficiently process complex weather and climate data in large volumes, and permits experiments and studies to be carried out in a feasible amount of time while maintaining data accuracy and resolution. The rest of the dissertation is organized into four chapters. In Chapter 2, a new processing framework is presented. This framework introduces a methodology that adopts methods and solutions from studies of “Big Spatial Data” into research on weather radar data, aiming to create accurate three-dimensional grids on which to present weather radar observations. This chapter introduces a highly efficient MapReduce paradigm to digest radial-based Level-II products into a common raster-based spatial data model. It permits the application of geospatial analytics to radar products at a conceptual level. In Chapter 3, a distributed GIS analytical framework is presented. The aim of this chapter is to apply geospatial analytics to large volumes of radar products, permitting the execution of geospatial data processing workflows on a radar data stream at or faster than real-time speed. Methods presented in this chapter permit the application of geospatial analytics to radar products using common GIS software. In Chapter 4, a new temporal forecasting model is presented. This model is designed to produce reliable short-term forecasts (nowcasts) of a TC’s rainfall areas and precipitation evolution up to 6–8 hours ahead by utilizing spatial trends from a time series of radar reflectivity raster layers. Besides accuracy, a nowcasting model also requires fast running speeds to issue a forecast as soon as possible, usually within a few minutes. The methods developed and tested in Chapters 2 and 3 ensure the data availability and performance required in Chapter 4. Finally, Chapter 5 concludes the entire study by summarizing the results and major contributions of the dissertation and providing recommendations for future research.

1.3.2 Broader Impacts

Although this dissertation focuses on a special extreme weather scenario – TCs and their related precipitation patterns – the developed methodologies, frameworks and software implementations can be extended to a wide range of GIS and physical geography studies. The MapReduce paradigm using spatio-temporal keys in Chapter 2 is applicable to all grid-based climate and weather data (for example, reanalysis datasets). The geospatial analytical library developed in Chapter 3 is a general solution with no limitation on the types of spatial analytical functions; thus any study using GIS, especially ArcGIS with Python programming, can benefit from it to accelerate analyses with large datasets. This tool can be applied to general geospatial analysis tasks, and its design is generalized for tasks that come with a large volume of inputs. In this study, we utilize radar products and numerical weather model outputs as cases representing such large volumes of data. The two frameworks presented in these chapters are expected to contribute to filling the gap between the GIS and meteorological communities: they allow non-meteorologists to use both Level II data and Level III products more easily, and enable geographers, meteorologists and scientists in related areas to more easily apply powerful geospatial analytical tools to large volumes of weather data. The improvement of precipitation predictability makes a valuable contribution to the forecasting of rainfall from tropical cyclones that will enable flood-prone locations to be evacuated before conditions become too dangerous to leave, and hopefully reduce the property loss and casualties caused by TC rainfall.

CHAPTER 2
FAST PLAYBACK FRAMEWORK FOR ANALYSIS OF GROUND-BASED DOPPLER RADAR OBSERVATIONS USING MAPREDUCE TECHNOLOGY

2.1 Overview of 3D Weather Radar Mosaic

Ground-based weather radars sample the atmosphere at a high spatial and temporal resolution, making them the preferred instrument for remotely sensing precipitation over land (Germann and Zawadzki, 2002). In the United States, the Weather Surveillance Radar-1988 Doppler (WSR-88D), or Next Generation Weather Radar (NEXRAD), a network of S-band radars, has been operational since 1995. Approximately 160 radars provide more than 20 years of data for atmospheric research (Crum and Alberty, 1993). However, the mosaicking of data from dozens of radars to analyze synoptic-scale systems, such as tropical cyclones (TCs), presents challenges in computational efficiency and accuracy, especially for researchers who are not formally trained in radar meteorology. For example, Matyas (2007, 2010c) did not consider altitude when utilizing a geographical information system (GIS) to mosaic base reflectivity data, calculate shape metrics, and measure the sizes of rainfall regions in landfalling TCs. Because of the scanning limitations of a single radar (Carbone et al., 1985), multiradar composites are essential to many meteorological applications, including real-time weather forecasting, observational research of synoptic-scale weather systems, and numerical weather prediction models. This range of applications requires different data processing strategies. Real-time analysis requires time-sensitive handling of input data from the recent past. On the other hand, research-grade applications can utilize data on both sides of a moving temporal window to construct the best possible representation of precipitation structures in the atmosphere. For this purpose, multiple pieces of software and libraries have been developed (Ansari et al., 2010). Based on their software design mode, they can be categorized into three types: 1) single-node tools, 2) service clients, and 3) batch processing systems. Single-node tools such as Radx (Dixon and Wiener, 1993), SOLO (Nettleton et al., 1993) and Reorder (Oye and Case, 1995) enable researchers to run software locally on personal computers or workstations to decode, extract, and transform archived radar data. Visualization of radar data on screen is also supported by software such as the Weather and Climate Toolkit (Ansari et al., 2010). Service clients like the Virtual University of Chicago–Illinois State Water Survey (Chandrasekar et al., 2005) preprocess and prepare radar data using data services provided by a remote data server. These clients require few computing resources on researchers’ local computers. Batch processing systems are usually designed for processing large volumes of data in a parallelized architecture, optionally distributed over multiple nodes. Real-time processing systems and warning decision systems (Michalakes et al., 2004; Gao et al., 2013; Lakshmanan and Humphrey, 2014) fall into this category. They must run on a specialized platform to ensure data are processed at real-time speeds. For example, platform guides for the Warning Decision Support System–Integrated Information propose dedicated high-performance disks and 16 GB of memory per CPU core (Lakshmanan et al., 2007b). Although these three categories handle radar data at different scales, these requirements are not optimal for research-grade applications or feasible for use by the community outside of meteorology, thus limiting the knowledge that could be gained through analysis of these high-resolution data. Because of the conical manner in which radars sense objects in the atmosphere, radar data are usually sampled, recorded, and stored in uncommon binary formats (e.g., the NEXRAD archive format for the WSR-88D, the DX format for the German Radar Service, and the Common Doppler Radar Exchange Format (Universal Format) and Climate and Forecast (CF)-compliant netCDF for general radial radar data). Thus, before radar data can be analyzed for research purposes, multiple preprocessing steps are required. Since radar antennas rotate, weather radar data are recorded in polar coordinates (azimuth, elevation, and range). The data must be mapped to Cartesian coordinates to fit the needs of applications that require radius-insensitive values and constant-sized gridded data (Mohr et al., 1986). Because of the curvature of the radar beam and the earth’s surface, the

range of a single radar is limited to 460 km at the lowest tilt in the case of U.S. WSR-88D radars and 230 km at higher tilts. To analyze synoptic-scale systems such as TCs, it is necessary to mosaic data from multiple individual radars onto a Cartesian grid. Because weather systems move over time, it is preferable to process and display weather radar data seamlessly over large domains, regardless of the radar from which the data originated (Vasiloff et al., 2007; Lakshmanan et al., 2007b). With the aid of high-performance computing and parallel processing, 3D radar mosaics can be constructed faster than real time to permit the analysis of multiple past weather events. WSR-88D radar data from the National Climatic Data Center are stored in two formats: level II archives and level III products (Crum et al., 1993). Level II data include basic Doppler radar moments and system status information. They contain three important variables – reflectivity, radial velocity, and spectrum width – and are encoded in a unique binary format. Additionally, differential reflectivity, correlation coefficient, and differential phase are available in dual-polarized level II data. As level II data do not always accurately and directly portray information for hydrometeors, they are mostly used by trained meteorologists. Level III products are derived from level II archives and are pre-processed to facilitate their use by nonspecialist researchers (Matyas, 2010c). But level III products, such as base reflectivity and composite reflectivity, have limitations regarding elevation that do not allow the vertical profile of raining areas to be discerned. A better solution is to create a product using algorithms similar to those for level III but that yield output for multiple vertical levels and that are less complicated to perform. The need for this type of product was confirmed through a survey conducted by the National Center for Atmospheric Research during an international workshop. The survey showed a high demand for a methodology for integrating radar data with nonradar data (such as satellite images, land-use data, and topography) using open-source software that could support both personal research and community usage (Bluestein et al., 2014). Providing a

straightforward 3D interpretation of radar data stored in an uncomplicated file format will enhance research by permitting a larger group of scientists to more easily access the data. To fulfill these demands, the goal of this research is to develop a method to create a mosaic with data from multiple radars and to perform spatial analysis in the most computationally efficient manner possible. In this chapter, we present a fast playback system for creating a 3D mosaic of reflectivity and wind velocity using level II data from multiple WSR-88D radars. This system provides a scalable, computationally efficient, geospatial, and GIS-friendly environment. We present a flexible and extensible method that incorporates radar mosaic technology into a popular open-source parallel computational framework (Apache Spark) and industrial-level GIS software (ArcGIS) to enhance efficiency and simplicity.

2.2 Advantages of Diagnostic Playback over Real-time Processing

We will utilize the term diagnostic playback to refer to the use of radar data to examine past weather events. Both real-time processing and diagnostic playback systems collect observations from one or multiple radar stations, place them in the correct position on the research domain, construct a 3D mosaic, and compose radar products. Although both kinds of system share the same methodology, there are different considerations for three elements: timeliness pressure, data availability, and system runtime environment. These differences are highlighted below. Time is critical in a real-time system. The real-time system must process radar data as soon as they are streamed to the system, and the minimum processing speed must be as fast as real time, because processed data are immediately required as inputs to follow-up procedures, especially in a warning decision scenario. Thus, a real-time system usually places a temporal margin at each stage to safely handle unexpected latencies; for example, an input/output bottleneck in hard disks may occur, or the system might be blocked by the operating system until old data are processed and flushed out. Avoiding such latency during an operational run necessitates that the design of a real-time system focus more on stability

than flexibility. With the hard constraints on running time, algorithms for real-time systems must also balance accuracy, complexity, and available computing resources. Thus, a real-time system often needs a dedicated environment to ensure that algorithms run on time and to reduce latency in all steps, especially while processing data from multiple radars when the volume of streaming data is large. In contrast, because there is no time pressure when running a diagnostic case, a playback system can run in a different style that prioritizes flexibility and scalability. Without hard time constraints, a playback system has lower requirements on both hardware and software environments, and it can be run on an arbitrary system capable of completing a finite case in a user-acceptable duration. Also, complicated algorithms and experiments can be executed without consideration of time pressure. Latency in a playback system can be handled either by slowing down the running speed or by suspending the system until all queued data are processed. Two additional details also distinguish historical analysis from real-time analysis: the finite experiment duration and the need to make changes. Before commencing playback of a historical case, all data are available and the exact end time of the case is known. Thus, algorithms applied in a playback system can produce a two-sided temporal interpolation at any single moment by using past and future data. Also, the researcher can adjust the range of possible times to be included in the output product in both directions. However, this is not always possible in a real-time system when processing data at the leading edge of time, because it is impossible to acquire observations from future times. So, algorithms in a real-time system are usually limited to a one-sided temporal interpolation. Furthermore, the playback process is usually carried out by researchers who require multiple experiments with parameters that may change frequently. A playback system is a more suitable platform for these tests than a real-time system. The system runtime environment also requires the flexibility to easily extend the functionality of the system to accommodate such changes. Given this, the playback system needs a different type of

architecture rather than simply porting a real-time system to emulate a diagnostic case in the same manner as it would process a real-time case.

2.3 Playback System Architecture

As the previous section highlighted the demands of research-grade operations, our playback system is designed for a distributed computation environment. We construct our system using the concept of an actor model, and implement it with Apache Spark and ArcGIS Runtime using the Scala programming language. This combination considers feasibility of implementation in addition to scalability, flexibility, and computational efficiency.

2.3.1 Parallelization Principle and Software

Admittedly, batch processing large volumes of radar data could be done via naive high-level parallelization using existing open-source software [e.g., Radx, the Weather and Climate Toolkit, the Python ARM Radar Toolkit (Py-ART)] by utilizing their high-level application programming interfaces (APIs) or command line interfaces (CLIs). For example, one could use GNU Parallel to parallelize several instances of Radx CLI commands, each processing a list of input files. This scheme is suboptimal in three main aspects. First, most existing software is naturally designed to operate on a single workstation. Second, although they provide fully functional interfaces in an easy-to-use style, these functions are mostly designed to run inside a stand-alone application in a single-threaded loop. Such a long sequential loop performs inefficiently in a distributed environment. Finally, it is nearly impossible to exchange data efficiently between multiple instances of one application, even on one local node. These aspects make it practically impossible to coordinate the computation of large volumes of data among multiple computers (e.g., a high-performance cluster), which is often accessible to many researchers. Thus, coordination of computation is essential, and radar processing procedures must be parallelized at a low level. The principle in our proposed scheme is to divide a procedure into small units. Each unit contains a simple, small calculation with minimal data dependencies. Those units are

stored in a pool. A unit is picked from the pool and executed by any idle computer. The procedure continues until the pool is empty, at which time the procedure has been completed. We demonstrate this principle in later sections. Parallelization in this framework is based on Apache Spark, a fast and general engine for large-scale data processing capable of executing our commands in an efficient manner (Dean and Ghemawat, 2008), into which the functions for mosaicking radar data and spatial analytics are plugged and executed. Apache Spark provides high-level programming interfaces in the Java, Scala, and Python programming languages. It also supports running on a single node as a simple concurrent execution engine and as a cluster across multiple nodes, allowing our system to be run by users with or without access to high-performance computing clusters. In this research, we utilize an actor model, the fundamental communication and parallelization mechanism in Apache Spark, to simulate a radar network composed of the radars involved in the playback case. The simulation is straightforward: each actor acts as an individual radar station in the network, and actors change their respective status when the corresponding radar station changes its operating mode by external control or by self-controlled rules. A radar actor continuously sends out messages that contain observations to the processor actor until the case ends or it receives a stop command from its supervisor. During the simulation, these individual actors preserve the time asynchrony among multiple radar stations. The simulation of the radar network is not a mandatory requirement of the presented parallelization scheme; rather, it simplifies adopting optional radar-station-specific algorithms on the corresponding radar actors.

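To make the actor-based simulation concrete, the following minimal sketch models one radar station as an actor that streams observation messages to a processor actor until the case ends. It assumes the Akka actor library (the messaging layer underlying Apache Spark at the time); the RadarActor, ProcessorActor and Observation names are illustrative, not the system's actual classes.

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}

    // Illustrative message types for this sketch.
    case class Observation(station: String, time: Long)
    case object Stop

    // Consumes observation messages from every simulated station.
    class ProcessorActor extends Actor {
      def receive = {
        case Observation(st, t) => println(s"scan from $st at t=$t")
      }
    }

    // One actor per radar station; each actor has its own mailbox,
    // which preserves the time asynchrony between stations.
    class RadarActor(station: String, processor: ActorRef) extends Actor {
      def receive = {
        case t: Long => processor ! Observation(station, t) // emit one scan
        case Stop    => context.stop(self)                  // case has ended
      }
    }

    object RadarNetworkSim extends App {
      val system    = ActorSystem("playback")
      val processor = system.actorOf(Props(new ProcessorActor), "processor")
      val khgx      = system.actorOf(Props(new RadarActor("KHGX", processor)), "KHGX")
      khgx ! 0L   // drive the simulated station with scan times
      khgx ! Stop
      system.terminate()
    }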
2.3.2 Radar Data Mosaic Methods

The Level-II data are stored in a spherical coordinate system. Thus, the first challenge in creating a 3D mosaic domain is to transform the coordinates onto a Cartesian 3D grid. In a single-radar scenario, for a gate with slant range Sr to its corresponding radar station, at azimuth α and elevation ϕ, the relationship of its position in the radar-centric conic coordinate system to its position (x, y, z) on a Cartesian grid is:

    x = Sr × cos ϕ × cos(π/2 − (α + β)) × m + xr
    y = Sr × cos ϕ × sin(π/2 − (α + β)) × m + yr
    z = H + zr
    H = Sr × sin ϕ + Sr² / (2 × Ir × Re)
    Sri = rg × (i − 1) + r0                                         (2–1)

where (xr, yr, zr) are the coordinates of the radar antenna in the 3D Cartesian coordinate system; H is the height of the beam centerline above radar level (km) given by the WSR-88D operational beam propagation model; Ir is the standard refractive index, equal to 1.21; the radius of the earth (Re) is set as 6371 km; and Sri is the slant range of the i-th gate in a beam, where rg is the gate width (1 km in legacy radar products; 0.25 km in super-resolution radar products) and r0 is the distance to the first gate. Super-resolution radar products became available during 2008 at most WSR-88D sites (NOAA Radar Operation Center, 2008). m and β are the geodesic scale factor and geodesic convergence at the location of the radar station to which the gates belong. The above equations assume that the earth’s surface is flat. To combine data from multiple radars onto a single grid, the earth’s curved surface needs to be projected onto a flat surface. The choice of projection system is based on the extent of the research domain and the spatial distribution of radar stations. To maintain the circular shape of the radar scan and to preserve the true direction of all azimuths, conformal projection methods are preferred. With properly defined parameters, the projection will not yield large spatial errors. Distance errors are less than 1.3% when using the Lambert conformal conic projection that covers the continental United States with central latitude at 39°N and standard parallels at 33°N and 45°N. Errors will be less for smaller research domains.
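As a concrete reading of Eq. 2–1, the following sketch converts a single gate to grid coordinates. It is a minimal transcription of the equations above, assuming angles are already in radians and that the scale factor m and convergence β have been computed for the station; the object, function and parameter names are illustrative.

    import scala.math.{cos, sin, Pi}

    object GateGeometry {
      case class GridPoint(x: Double, y: Double, z: Double)

      // Eq. 2-1: slant range sr (km), azimuth alpha and elevation phi
      // (radians), geodesic scale m and convergence beta, and the antenna
      // coordinates (xr, yr, zr) of the station the gate belongs to.
      def gateToCartesian(sr: Double, alpha: Double, phi: Double,
                          m: Double, beta: Double,
                          xr: Double, yr: Double, zr: Double): GridPoint = {
        val Ir = 1.21   // standard refractive index
        val Re = 6371.0 // radius of the earth (km)
        // beam-centerline height above radar level, including the
        // curvature term of the operational beam propagation model
        val h = sr * sin(phi) + (sr * sr) / (2.0 * Ir * Re)
        val x = sr * cos(phi) * cos(Pi / 2.0 - (alpha + beta)) * m + xr
        val y = sr * cos(phi) * sin(Pi / 2.0 - (alpha + beta)) * m + yr
        GridPoint(x, y, h + zr)
      }

      // Slant range of the i-th gate from gate width rg and first-gate
      // distance r0 (e.g., rg = 0.25 km for super-resolution products).
      def slantRange(i: Int, rg: Double, r0: Double): Double = rg * (i - 1) + r0
    }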

2.3.3 Introduction to Resilient Distributed Dataset and MapReduce

MapReduce is a programming model for parallelizing data processing across massive numbers of computers in a distributed environment (Dean and Ghemawat, 2008). MapReduce achieves large-scale parallel data processing by using map and reduce functions while prohibiting data exchange between map functions within the map stage and between reduce functions within the reduce phase. As shown in Fig. 2-1, each MapReduce job takes a key-value dataset as input and produces a new key-value dataset with the following steps:

• The map phase executes map functions, each of which accepts one key-value pair and transforms it into one or multiple intermediate key-value pairs: <k1, v1> → [<k2, v2>]

• The shuffle phase collects all intermediate key-value pairs from the map functions, sorts them by keys, and groups values with the same keys: [<k2, v2>] → <k2, [v2]>

• The reduce phase aggregates grouped values and produces one key-value pair for each group: <k2, [v2]> → <k3, v3>

All inputs, intermediate results, and outputs are always represented as key-value pairs. The key-value pairs are stored in resilient distributed datasets (RDDs) – the fundamental elements that carry and transform data in Apache Spark. RDDs should be kept small in order to be stored and exchanged completely in memory for high performance, as suggested in the Apache Spark programming guides (Apache Incubator, 2013). For our routine, each RDD contains a small portion of scans at one elevation for one radar. Initially, each RDD is tagged by the time of the echoes and the coordinates in the research domain, which are then converted into a spatial–temporal index serving as keys. Once the spatial–temporal keys are defined, they fit the requirements of the MapReduce paradigm. Apache Spark is advantageous as it is possible to chain multiple map and reduce functions in a pipeline, which enables us to operate on complex tasks in MapReduce jobs. Because exchanging data between nodes consumes resources and time, the restriction of data exchange inside each stage guarantees high performance when scaling up the system.

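The following minimal, self-contained Spark sketch runs the three phases on an RDD of key-value pairs, here simply counting gate records per (station, elevation) key; the data and keys are placeholders rather than the system's actual spatio-temporal keys.

    import org.apache.spark.{SparkConf, SparkContext}

    // A minimal run of the three phases on key-value RDDs: count how
    // many gate records arrive per (station, elevation) key.
    object PhasesSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("phases").setMaster("local[*]"))

        // initial <k1, v1> pairs: (station, elevation) per gate record
        val gates = sc.parallelize(
          Seq(("KHGX", 0.5), ("KHGX", 0.5), ("KLCH", 1.45)))

        // map phase: <k1, v1> -> <k2, v2>, re-keyed for aggregation
        val rekeyed = gates.map { case (st, el) => ((st, el), 1) }

        // the shuffle groups equal keys across nodes; the reduce phase
        // then aggregates each group: <k2, [v2]> -> <k3, v3>
        val counts = rekeyed.reduceByKey(_ + _)

        counts.collect().foreach(println) // ((KHGX,0.5),2), ((KLCH,1.45),1)
        sc.stop()
      }
    }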
2.3.4 Processing Radar Data using RDDs

A system run begins by reading environmental configurations. The configurations initialize the following variables: the definition of the geographic projection; the positions of known radar stations; the start and end times of a diagnostic case; the spatial and temporal resolution of the mosaicked datasets, which determines the spatial–temporal keys; the names of the spatial analysis functions to be executed; and the total available CPUs and memory. These environmental configurations must be deployed to all nodes in a cluster before the MapReduce functions are executed on separate nodes, because no data are exchanged between map functions or reduce functions on different nodes. After the environmental initialization, the system is ready to commence the MapReduce functions. The complete procedure of playing back a diagnostic case includes four steps: preprocess, map function chain, reducing function chain, and postprocess (Fig. 2-2). These steps are detailed below for our radar research task.

2.3.4.1 Preprocess

In this step, raw level II archives are decompressed and dumped into the file system. They are then decoded and split into small parts according to the names of variables [i.e., reflectivity (dBZ), radial velocity, and spectral width], scan elevation and time, and written to individual netCDF (Network Common Data Form) files (Rew and Davis, 1990). An optional quality control algorithm for radar data can be employed here. For example, utilizing the w2qcnndp application (Lakshmanan et al., 2007a) from WDSS-II, a real-time radar processing system (Lakshmanan et al., 2014), can be beneficial for level II radar data (e.g., removing clutter and sun spikes). Velocity-azimuth display (VAD) technologies are employed here to add horizontal wind information. Although multiple improvements upon VAD have been proposed in later research (Gao et al., 2004a; Matejka and Srivastava, 1991), the VAD employed at this stage only provides an initial guess that will be corrected in a later stage. The netCDF files are further dumped and converted to JSON (JavaScript Object Notation), as it is a compact text format compatible with Apache Spark (Crockford, 2006). The converted JSON files store attributes and metadata as JSON dictionaries and variables as JSON lists. These JSON files are the initial inputs to the map function chain. For dual-pol radar data, the specific

differential phase (Kdp) is also calculated, based on the differential phase shift (ϕdp), using the convolution filter presented in Vulpiani et al. (2012). It is also beneficial to adopt a MapReduce step at the preprocessing stage to parallelize the VAD and Kdp calculations. For the VAD algorithm, the minimal input set is a sweep of gates at one elevation and one gate range, and for the Kdp convolution filter, the inputs are individual beams. So the calculations are easily mapped into multiple mapping functions, while the reducing functions are null because these calculations only add information to the original gates.

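Expressed against Spark's RDD API, such a map-only stage could look like the sketch below; the Sweep and Beam types and the vadWind/kdpFilter helpers are hypothetical stand-ins for the cited algorithms, not the system's actual code.

    import org.apache.spark.rdd.RDD

    object PreprocessSketch {
      // Illustrative records: one elevation sweep holding its beams.
      case class Beam(azimuth: Double, gates: Array[Double])
      case class Sweep(station: String, elevation: Double, beams: Seq[Beam])

      // Stand-ins for the VAD retrieval and the Kdp convolution filter.
      def vadWind(s: Sweep): Double = 0.0
      def kdpFilter(b: Beam): Array[Double] = b.gates

      // A map-only stage: every sweep is enriched independently, so the
      // reducing side stays empty, exactly as described above.
      def preprocess(sweeps: RDD[Sweep]): RDD[(Sweep, Double, Seq[Array[Double]])] =
        sweeps.map { s =>
          val wind = vadWind(s)             // one sweep is the minimal VAD input
          val kdp  = s.beams.map(kdpFilter) // individual beams feed the filter
          (s, wind, kdp)
        }
    }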
2.3.4.2 Map function chain

Our utilization of Apache Spark permits the flexibility to connect multiple functions into a whole map chain, so long as the outputs of the last function are formatted as key-value pairs. In the map stage, we chain three mapping steps. The goal of the map stage is to tag all gates sampled by the radars with their correct spatio-temporal keys. The steps are as follows. Initial inputs from the preprocessor are individual JSON files, each containing a complete 360° scan at one antenna elevation. Each file is read into an RDD, and the key of the RDD is the start time of the scan at that elevation. Then individual beams are extracted from that scan, saved into individual beam-RDDs, and tagged by the exact sampling time. The time is calculated from the rotation rate of the corresponding volume coverage pattern (VCP) and the scan start time. The next step is to identify which data are needed for analysis at a predetermined time. The temporal resolution of the final output may not perfectly match the radar scan period, and at least one completed volume scan must be collected from each radar to generate a complete 3D mosaic. Thus, RDDs may need duplication if they are needed for multiple outputs over short time intervals, or removal if the output interval is long. When the start time and temporal resolution are given, the time is mapped to an integer index starting from 0. RDDs with the same time index are gathered in the reduce stage to calculate the 3D mosaic. Then each beam-RDD is decomposed gate by gate into multiple gate-RDDs. Next, each gate’s polar coordinate (r, α, ϕ) is transformed to its true coordinate (x, y, z), and then further mapped to a grid index (i, j, k) according to the domain’s origin point and the spatial resolution defined in the configuration. Only the gate center point is mapped during the transformation. Empty grid cells exist when the domain resolution is high; filling blank cells is done in later reducing functions. Combined with the temporal index from the previous step, the final key (T, i, j, k) contains T as the temporal index and (i, j, k) as the spatial index. RDDs that contain the same key fall onto the same grid point in one 3D mosaic dataset at one time. In the map stage, one RDD is always “mapped” into multiple RDDs in each step: an input-RDD reads an input file that contains a complete sweep of the elevation scan and is then mapped into multiple beam-RDDs; each beam-RDD is then mapped into multiple gate-RDDs. No messages are exchanged between RDDs inside each step, thus allowing input-RDDs to be randomly dispersed among multiple nodes to create beam-RDDs, and so forth. Tagging all RDDs with keys guarantees correctness in the map stage.

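The decomposition and keying described above can be sketched as a chain of flatMap and map transformations. The record types, grid parameters, and the assumptions that gates already carry Cartesian coordinates (from Eq. 2–1) and that the domain origin sits at zero are illustrative choices, not the system's actual implementation:

    import org.apache.spark.rdd.RDD

    object MapChainSketch {
      // Illustrative records; gates already carry Cartesian coordinates.
      case class GateObs(x: Double, y: Double, z: Double, dbz: Double)
      case class RadialBeam(time: Long, azimuth: Double, gates: Seq[GateObs])
      case class ElevationScan(start: Long, beams: Seq[RadialBeam])

      // Spatio-temporal key (T, i, j, k) from the output time step dt (s)
      // and grid spacings dx, dy, dz (km), with the domain origin at zero.
      def stKey(g: GateObs, t: Long, dt: Long,
                dx: Double, dy: Double, dz: Double): (Long, Int, Int, Int) =
        (t / dt, (g.x / dx).toInt, (g.y / dy).toInt, (g.z / dz).toInt)

      // The map chain: scan-RDDs -> beam-RDDs -> keyed gate-RDDs. No data
      // are exchanged between records inside any single step.
      def mapChain(scans: RDD[ElevationScan])
          : RDD[((Long, Int, Int, Int), Double)] =
        scans
          .flatMap(_.beams)                            // scans to beams
          .flatMap(b => b.gates.map(g => (g, b.time))) // beams to gates
          .map { case (g, t) =>                        // tag with (T,i,j,k)
            (stKey(g, t, dt = 600L, dx = 0.5, dy = 0.5, dz = 0.1), g.dbz)
          }
    }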
2.3.4.3 Shuffle and reducing function chain

After the map step, Apache Spark collects all RDDs that were calculated across multiple nodes and sorts them by keys. The shuffle stage groups RDDs first by key T and then by key (i, j, k). Each group serves as input to the reduce step. Similar to the map function chain described previously, it is possible to have multiple functions in the reduce chain. Although the reflectivity and radial velocity variables are mapped at the same time, they are aggregated separately in the reduce stage because different methods are applied to construct their mosaics. Furthermore, mosaicked reflectivity datasets are required by the method employed in this chapter to derive wind velocity vectors from scalar radial velocity. So, we calculate the reflectivity product first, followed by the wind velocity product.

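One way to express the per-variable aggregation is Spark's aggregateByKey over gates keyed by (T, i, j, k). In this sketch the per-gate weight stands in for the time–distance weighting function of Lakshmanan et al. (2006), whose exact form is not reproduced here; the names are illustrative:

    import org.apache.spark.rdd.RDD

    object ReduceSketch {
      // Weighted mean per grid cell. Each value is (reflectivity, weight),
      // the weight standing in for the time-distance weighting function.
      def mosaicReflectivity(
          keyed: RDD[((Long, Int, Int, Int), (Double, Double))])
          : RDD[((Long, Int, Int, Int), Double)] =
        keyed
          .aggregateByKey((0.0, 0.0))(
            // within one partition: fold a gate into the running sums
            { case ((s, w), (dbz, wt)) => (s + dbz * wt, w + wt) },
            // across partitions: merge two partial accumulators
            { case ((s1, w1), (s2, w2)) => (s1 + s2, w1 + w2) })
          .mapValues { case (s, w) => if (w > 0.0) s / w else Double.NaN }
    }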
2.4 Mosaicking Radar Data

2.4.1 Combining Scalar and Vector Radar Variables

The processing of reflectivity data is straightforward. For one temporal key, all RDDs tagged with this key are combined into one 3D mosaic dataset. It is a common occurrence for a grid cell to lack an observation when it is located in the gap between two radar beams at high elevation and close to the radar. So, we first compute grid values that contain nonzero observations and then use those values to interpolate values for grid points with no observations. When more than one observation exists for a grid point, the grid value is computed as a weighted average, because an echo is more accurate when it is detected where the beam energy density is higher and its observation time is closer to the output time of the mosaic dataset. A time–distance-weighted function proposed by Lakshmanan et al. (2006) is employed here. After this step, all points containing at least one observation are filled. When no observations are available with which to populate the grid, the point is marked as “missing”, but no further action is performed at this step. We follow the same procedure to combine other scalar variables and dual-polarization variables, for example, differential reflectivity (Zdr) and specific differential phase (Kdp). After the preliminary mosaic dataset is created, the results are sent to a geospatial function where interpolation is used to fill in the missing values. A distance-weighted function is employed so that missing values are averaged from the surrounding values. A k-dimensional (k-d) tree (Bentley, 1975) is built to accelerate nearest-neighbor searching. The horizontal distance is weighted 2.5 times the vertical distance. The purpose of adopting different factors is to aid in overcoming the problem of the sampling gaps at high altitudes near the radar, which produce a bull’s-eye pattern of concentric rings, especially when bright bands are present.

The procedure for combining velocity data is more complex. True wind velocity over a domain is a third-order array of 3D vectors. Since the VAD technique employed above gives only horizontal winds, the mosaicked 3D wind grid also contains only a horizontal wind field at each vertical level. To fully utilize both the 3D reflectivity and horizontal wind fields, we further implemented a three-dimensional variational (3DVAR) data assimilation method proposed by Gao et al. (2004b) to analyze vertical winds and to correct the horizontal winds. The implementation minimizes an overall cost function equal to the squared sum of the differences between the radial velocities calculated from the wind mosaic toward each radar station and the radar-observed radial velocities, plus a penalty term enforcing weak anelastic mass continuity. The 3D reflectivity grid provides the reflectivity required to calculate hydrometeor terminal velocity based on the formula in Montmerle et al. (2001). The initial guess is the wind mosaic with all vertical wind speeds set to 0. We apply a 3×3×3 moving cube to parallelize the calculation of the Jacobian and Hessian terms so that the optimized 3D wind field can be efficiently obtained using the quasi-Newton method.
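In schematic form, and deferring the exact observation weights and discretization to Gao et al. (2004b), the cost function being minimized can be written as

    J(u, v, w) = (1/2) Σ [v_r(u, v, w − w_t) − v_r^obs]² + λ_c [∂(ρu)/∂x + ∂(ρv)/∂y + ∂(ρw)/∂z]²

where the sum runs over all radars and gates, v_r projects the gridded wind (less the reflectivity-derived hydrometeor terminal velocity w_t) onto each beam direction, ρ is the base-state air density, and λ_c controls the strength of the weak anelastic mass-continuity penalty. This is a sketch of the general form, not the exact formulation.

2.4.2 Communicating with Geospatial Analytics Functions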

Previous research has shown that the use of GIS can enhance the understanding of radar mosaic images by measuring shape metrics and applying geospatial analytics (Matyas, 2007; Hu, 2014; Tiranti et al., 2014). For example, tracing shape changes of rain fields can help reveal the interaction between TCs and topographic features or mid-latitude weather systems (Lin et al., 2002). Although many geospatial analytic functions are based on computational geometry and statistics and could be ported into the system using Scala and Java, it is better to utilize existing GIS software than to rewrite the same GIS functions. Thus, we employ ArcGIS Runtime to integrate with the system for three main reasons: (1) the geospatial analytics functionality, namely “geoprocessing,” in ArcGIS software is considered robust (Steiniger and Bocher, 2009); (2) ArcGIS Runtime contains a lightweight core that can be deployed easily on multiple nodes without complicated configuration or heavy dependencies on graphical user interface libraries; (3) ArcGIS Runtime provides a Java-based SDK capable of collaborating with Scala and Apache Spark. Our method also allows users to replace the ArcGIS part with similar implementations, as long as those functions are able to accept and return key-pair RDDs. To enable interoperation between Spark and ArcGIS Runtime, a scriptable interface is designed to pass arguments and values between RDDs in Spark and ArcGIS Runtime. The scriptable interface starts a local geoprocessing server if none is running on the current node. It then submits geoprocessing tasks to the local server, waits until execution finishes, retrieves the results, and transforms them into RDDs again. The scriptable interface permits the ArcGIS Runtime to run as a stand-alone program, as a workaround for potential license conflicts with the otherwise open-source playback system. The officially documented interface in ArcGIS Runtime is not suitable for executing arbitrary geoprocessing tasks because a local geoprocessing server can execute only a predefined “geoprocessing package.” However, ArcGIS Runtime is able to execute any Python script, and all geoprocessing functions can be called inside a Python script. This gives the scriptable interface a proper method to overcome the limitation. On the playback system’s side, the scriptable interface dynamically generates Python code containing all geospatial analytics functions, and then submits the code and necessary data to the local server. On the local server’s side, a specially designed geoprocessing package is deployed that simply executes the Python code it receives. For our system, ESRI JSON is chosen to carry all vector datasets and ESRI ASCII Grid is chosen to carry all raster datasets. Both are text formats recognized by GIS and are easy to manipulate on Apache Spark’s side. All geoprocessing functions executed on ArcGIS Runtime’s local server use in-memory workspaces to reduce hard-drive workloads. After executing a geoprocessing function, the local server can be reused for the next geoprocessing function until the case is finished. At this point, the system stops the scriptable interface and then the local server. Finally, the job ends and the system becomes idle, ready to be terminated by any external job scheduler.
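The sketch below illustrates the idea behind the scriptable interface: the playback side generates a small Python code segment for a geoprocessing call and submits it to the local geoprocessing server, which evaluates the code and returns text-encoded results. The endpoint URL and the “ExecutePython” task name are placeholders for illustration, not the actual ArcGIS Runtime interface, and the real system moves data as key-pair RDDs from Scala rather than through this toy client.

    import requests

    def run_geoprocessing(fn_name, *args, server="http://localhost:50000"):
        # Dynamically generate the Python code that the specially designed
        # geoprocessing package will evaluate on the local server.
        code = "import arcpy\nresult = arcpy.{}({})".format(
            fn_name, ", ".join(repr(a) for a in args))
        resp = requests.post(server + "/ExecutePython/execute",  # placeholder
                             data={"code": code, "f": "json"})
        return resp.json()     # dummy results come back as text (e.g. JSON)

    # Example: contour a reflectivity raster at 2-dBZ intervals (cf. Fig. 2-3)
    # run_geoprocessing("Contour_3d", "in_memory/refl",
    #                   "in_memory/contours", 2)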

Despite our best efforts, two main limitations still exist: 1) the scope of geospatial analysis is limited to the functions provided by ArcGIS Runtime, which are only a subset of those available in the desktop version of ArcGIS; 2) most geoprocessing functions accept only 2D coordinates. Our 3D reflectivity and velocity mosaics are derived directly from grouped RDDs rather than stacked from 2D slices. Thus, the 3D mosaic must be created first before a slice is taken through it to create a 2D cross section, even when only 2D slices are desired, as in many studies (Villarini et al., 2011).

2.5 Case Demonstration

Our playback system has been applied to reconstruct 30 TC landfall cases from 1995 to 2015, spanning both legacy and super-resolution (dual-polarization included) Level II datasets. The number of radars involved in these cases ranges from 8 to 28. Each case started when the TC’s center entered the 230-km range of a radar and lasted 24 h to capture the TC’s evolution before and after landfall. We present system demonstrations of two cases at different scales: a large-scale case of Hurricane Charley (2004) and a small-scale case of Tropical Storm Bill (2015). In the first case, Charley rapidly intensified prior to landfall in southwestern Florida and produced hurricane-force winds across the peninsula while also spawning nine tornadoes (Franklin et al., 2006). Charley contained deep convection and stratiform cloud regions (Matyas, 2009) that required analysis at high spatial resolution in 3D to be rendered properly. As it also interacted with a frontal boundary across Georgia and the Carolinas, the analysis of Charley required the merger of data from 17 radars. This case begins at 0600 UTC 13 Aug 2004. We generated 3D mosaics every minute with a grid cell size of 0.1 km × 0.1 km × 0.1 km. Using 16 CPU cores on the University of Florida Research Computing (UFRC) cluster, the analysis of this test case was completed in about 15 h (Fig. 2-6).

Fig. 2-3 and Fig. 2-4 show 2D and 3D snapshots of merged reflectivity data, with Fig. 2-3 showing a horizontal slice at 4-km altitude. The dashed line indicates a 230-km buffer zone from the centers of the WSR-88D radar stations included in this case. The inner-core structure is interpolated from echoes detected by radars located in Key West, Tampa, and Miami, Florida. The snapshot in the left inset clearly shows Charley’s broken eyewall before landfall, as reported by Lee and Bell (2007). In Fig. 2-4, a 3D isosurface of reflectivity at 30 dBZ at the same time shows the double eyewall present during the analysis period discussed by Lee and Bell (2007). The right sub-figure shows a zoomed-in view of the reflectivity contours and horizontal wind field near the eye. The trend of the wind vectors shows that Charley’s inner circulation is still closed. However, despite our best efforts at optimizing wind direction and speed, this result is still coarse; this is due to a drawback of VAD, which is not optimal when applied to tropical cyclone cases. The second case, TS Bill (2015), shows our system’s capability to handle both reflectivity contours and the corresponding dual-pol data. In this case, all three radars are dual-polarimetric. The ϕdp is converted into Kdp and then gridded at the same spatial resolution. Fig. 2-5 shows spiral rainbands near the storm center after landfall at 0000 UTC 17 June 2015. The Kdp field is expected to improve the observation and identification of convective processes and heavy-rainfall areas compared to analysis of reflectivity alone (Li and Mecikalski, 2012). The Charley case is also used to test our system under different hardware configurations to compare performance when using 2 to 64 CPU cores (Fig. 2-6). As a general conclusion, one CPU core at 2.4 GHz with 4–6 GB of memory digests 24 h of Level II data from one radar station in 6–8 h. As the number of CPUs and total memory increase, an approximately linear decrease in total running time is observed. However, this decrease is expected to slow or stop when the system reaches bottlenecks while caching data to the hard disk, exchanging large volumes of data between nodes and actors, and waiting for the execution of geoprocessing functions in ArcGIS Runtime.

To simulate these bottlenecks, a test was carried out on a low-end machine whose CPU runs at about one-sixth the speed of one in the UFRC cluster, equipped with limited memory and a hard disk slower than 80 MB s−1. The playback system took 70 h to run with identical outputs. Thus, we conclude that researchers with or without access to high-performance computing clusters could implement our playback system without concerns about stability and scalability. Additionally, performance is compared with Radx by running Radx to grid one radar station (KMLB) over the same time period at the same spatial resolution. On average, with 16 CPU cores of the UFRC cluster, Radx takes 1.6 h to create a single-radar mosaic for one output time. Thus, for the task of gridding all 17 radars, the estimated running time is about 27 h, which is slower. However, because Radx cannot interpolate temporally to arbitrary output times and is not designed for the multi-radar scenario, the comparison is not exact; it nevertheless demonstrates the acceleration gained from the distributed parallelization provided by the playback system.

2.6 Summary

In this paper, a scalable, fast playback framework is presented, based on the MapReduce paradigm, for taking data (reflectivity and radial velocity) from multiple radars and combining them into a 3D merged domain. Level II radar data are mapped by time, elevation scan, and gate, then transformed into Cartesian coordinates by parallelized map functions in the form of RDDs. RDDs carrying key-value pairs are grouped, mosaicked into a 3D domain, and further interpolated with geospatial functions. We evaluated the scalability of the system and found it capable of running on both low-configuration single workstations and high-configuration clusters. It demonstrates good compatibility with traditional high-performance computing (HPC) architecture and the job schedulers commonly available for institutional and personal research. With a highly efficient in-memory data exchange protocol between Apache Spark and ArcGIS Runtime, the system successfully brings GIS and geospatial functions into radar data processing, which should enhance collaboration between researchers inside and outside the radar meteorology community. This project presents our first effort to create an open-source, cross-platform, user-friendly project for large-scale radar processing. Our future work will address solutions to the limitations that still exist. We will incorporate existing open-source projects, including the Python ARM Radar Toolkit (Helmus et al., 2013) and wradlib (Heistermann et al., 2013), as the preprocessing component of the playback system. This component will support multiple data formats besides NEXRAD Level II and provide built-in quality control of inputs. Addressing the observed performance loss of Java-based code is also critical. In addition, we plan to design a convenient graphical user interface (GUI) that lowers the requirement for professional training in radar processing and Linux to operate the software. For example, users could work with the GUI on their Windows workstations, test and deploy geospatial functions with ArcGIS, then execute and monitor jobs on clusters. A GUI-enabled environment will play a large role in filling the gap between the professional radar community and the use of radar data by researchers in other disciplines.

Figure 2-1. A sample procedure for processing data in the MapReduce programming model. Arrows indicate data flow.

Figure 2-2. The architecture and procedure of mosaicking radar variables.


Figure 2-3. Reflectivity contours of Hurricane Charley and the frontal zone to the north at 2035 UTC 13 Aug. The left figure shows a slice at a height of 4.0 km. The “Contour” function in ArcGIS Runtime is employed to create polygons of contours at intervals of 2 dBZ. The right figure is a zoomed-in image of both the reflectivity contours and the optimized horizontal wind vectors near the eye.

Figure 2-4. Demonstration of constructing a rainband isosurface at 30 dBZ in Hurricane Charley. The vertical extent of the eyewall and the structure of the outer rainbands are clearly shown.


Figure 2-5. Reflectivity contours of TS Bill at 0000 UTC 17 June 2015. The left figure shows a contour slice at a height of 4.0 km. The right figure shows the corresponding mosaic of Kdp.

Figure 2-6. Total running time in hours for playing back 24 hours of data during Hurricane Charley with 17 radars on 2, 4, 8, 16, 32 and 64 CPU cores, with 2 GB of memory assigned to each CPU core.

CHAPTER 3
ARC4NIX: A CROSS-PLATFORM GEOSPATIAL ANALYTICAL LIBRARY FOR CLUSTER AND CLOUD COMPUTING

Software Availability

Free under the GNU General Public License (GPLv3) with an Environmental Systems Research Institute (ESRI) software exception. Source code can be obtained from http://github.com/striges/arc4nix.git

3.1 Overview

3.1.1 Background

The volume and complexity of spatial data have increased significantly in the past two decades (Marr, 2015; Hsu et al., 2015). For example, operational earth observation systems (e.g., Atmospheric Radiation Measurement) and numeric model simulations (e.g., the Coupled Model Intercomparison Project) produce terabytes to petabytes of data daily (Yang et al., 2010), exceeding the computational capability of a traditional Geographic Information System on a personal desktop computer. Furthermore, non-traditional geospatial data acquisition methods such as unmanned aerial vehicles (Einav and Levin, 2014) and web scraping of social media (Romero et al., 2011) require complicated online algorithms to mine spatial information from large volumes of plain text, images, and videos, and the obtained spatial data may not be associated with accurate coordinates on the Earth. Thus these data are difficult to model and store in the traditional spatial databases used by common GIS software. The technical challenges that occur while processing geospatial data at high volume, high veracity, and high velocity are called Big Data (Zikopoulos et al., 2011) or Big Spatial Data (Vatsavai et al., 2012). Utilizing Cloud Computing in geospatial analytics and developing GIS software on Cloud Computing infrastructures provide feasible solutions for Big Spatial Data, with two major advantages: (1) Cloud Computing provides elastic computing power that allows users to allocate the resources required for small-scale (e.g., local, regional) to large-scale (e.g., nationwide, global) analytical tasks; (2) Cloud Computing offers users on-demand computing resources without a significant upfront investment in computing hardware and software (Yang et al., 2013). A quick review shows that GIS services have been provided by all major Cloud Computing companies, as well as by some research institutes (Table 3-1). However, current GIS services provided by Cloud Computing vendors are still very limited: they either provide only individual virtual machines with GIS software installed, which is the same as installing a GIS on a traditional PC, or provide only basic map editing and delivery services without true spatial analytical capabilities. To solve Big (Spatial) Data challenges, however, one must perform complex spatial analytical tasks that exceed the service level currently available. Furthermore, a large portion of research and applications in both industry (e.g. Goodchild (2000); O’Looney (1997); Goodchild and Haining (2004)) and academia (e.g. Matyas (2009); Barnolas et al. (2008); Jones et al. (2015)) rely on the ArcGIS software developed by Environmental Systems Research Institute, which declares a “43 percent share of the market in the year of 2015.” As most ArcGIS software products, including ArcMap, are explicitly available only on the Windows platform, they are naturally incompatible with Cloud Computing, where the Linux operating system dominates the market. Admittedly, users and researchers can port existing programs to Linux on a Cloud Computing platform using Linux-compatible GIS software such as QGIS and GRASS GIS. However, they must rewrite their original geospatial analytical programs to work with a new operating system, runtime libraries, and virtual machine environment. These steps are not trivial and require additional time and technical knowledge beyond that of many researchers who wish to apply GIS techniques but are not trained to develop code. Rather than reworking the code, another solution is to use remote processing techniques: computations are performed on a remote server and clients only retrieve aggregated results from the server. But according to a review by Evangelidis et al. (2014), nearly every remote GIS server requires that both input and output data formats follow Open Geospatial Consortium (OGC) Web Processing Service (WPS) standards. In other words, a static and strict definition of geometry data type and attribute table structure must be given for both input and output data for each spatial analytical function, which is extremely inflexible in applications. For example, a user must define three identical buffer functions for point, line, and polygon inputs. This approach is redundant and may even be infeasible for collective operations such as intersect, merge, and union, whose number and types of inputs can vary with each use. To perform spatial analytical tasks on Cloud Computing in an easier way, we propose a solution that bridges the gap between what state-of-the-art Cloud Computing services offer and what is required to run spatial analysis programs on a Cloud Computing platform. The purpose of this solution is to allow users to compose and test their spatial algorithms on a familiar platform, typically ArcGIS Desktop software on Windows, and then run them in a dedicated environment such as GIS servers on a Cloud Computing platform, or even on a private High Performance Computing (HPC) cluster. Also, to exploit the elasticity of Cloud Computing, scalability across different configurations, from a single powerful workstation to a clustered multi-node environment, is essential. In this paper, we present our solution, called “arc4nix.” It is a cross-platform geospatial analytical library that is highly compatible with “arcpy,” the Python interface to the geospatial functions in ArcGIS software. It runs on both Linux and Windows operating systems, and can be run on personal laptops, standalone workstations, and distributed computing environments.

As its name implies, we initially developed arc4nix to create a set of compatible application programming interfaces (APIs) for the arcpy package on Linux. It was then ported to the Windows operating system to make it cross-platform. Arcpy, developed by ESRI, is a Python package that provides a programmatic way to call the geospatial analytical functions in ArcGIS Desktop or ArcGIS Server software from the Python programming language. Because both ArcGIS Desktop and ArcGIS Server software products explicitly depend on a Windows operating system, their Linux compatibility relies on an emulated Windows layer provided by the Wine project. However, Wine is neither fast nor stable, and its support for ArcGIS Desktop software is ranked as “not runnable at all” in Wine’s official application database. Since arbitrary third-party libraries may be used for advanced statistical and numeric computation, and those libraries may not run on Wine, a separate environment is needed. Only Python built-in functions and the functions provided by arcpy, which ship with the ArcGIS Server installation on the Linux operating system, should run on Wine. All other functions written in a user’s script must be executed in native Python in the native operating system environment. Although the Wine compatibility issues do not exist on Windows, the separation of user Python and ArcGIS Python should be maintained to avoid crashes due to conflicting library dependencies between some Python packages and arcpy. We utilize a client-server (C/S) architecture to achieve this separation: the client generates geospatial-function tasks, and those functions are executed on the server. A protocol is designed to handle server-client communication with minimal payload. Details are discussed in Section 3.2. Arc4nix is developed using the Python and Scala programming languages and supports multi-core workstations, HPC clusters, and Cloud Computing platforms such as Microsoft Azure and Amazon Web Services. In Section 3.3, we demonstrate its capability and scalability by calculating geometric statistics on output from a numeric weather model. We present this case because outputs from numeric models are typical large-volume cases (in the case presented in Section 3.3, the model output files total about 993 GB and the weather radar observations total 225 GB). Also, the analyses in this case involve conversions and manipulations of both raster and vector formats, as well as spatial statistics functionality not provided by any other GIS solution on Cloud Computing platforms.

3.2 Design, Architecture and Implementation

3.2.1 Architecture Overview

The ultimate purpose of arc4nix is to create a drop-in replacement for the arcpy API that allows users to run their original arcpy-based scripts with minimal modification of the scripts’ source code when utilizing a Cloud Computing environment. In arcpy, a user performs a “geoprocessing” task by calling an arcpy function written in the Python programming language. ESRI defines the term “geoprocessing” as “a framework and set of tools for processing geographic and related data”. Fig. 3-1 shows the C/S architecture design of the arc4nix project. The client is a pure Python library that provides an arcpy-compatible interface. The server is a program that executes the actual geospatial functions. When a geospatial function is called on the client side, in other words, in a user’s Python script, the arc4nix library creates a task, wraps all related parameters in objects, and sends them to the server. We call these client-side functions and objects “dummy” because they are not “real” functions or objects that perform geospatial calculations. In fact, a “dummy” function only creates a segment of Python code corresponding to the real geospatial function to be executed on the server, namely a task, while a “dummy” object contains the information necessary to construct a “real” object on the server. Once the server receives the task, it executes the real geospatial function written in Python, then collects the results and messages produced during execution. Because the client only recognizes “dummy” objects in pure Python, the results and messages must be converted to “dummy” results and “dummy” messages on the server. Those “dummy” results and messages are then sent back to the client, while the “real” results are written to the storage space. The storage is shared by the servers and the client. It can be a centralized network storage system mounted as a shared file system, a distributed file system spanning both the servers and clients, or an external database server accessible by the client and server. Having shared storage also avoids exchanging large volumes of complex computed results between clients and servers over the network. Once data are written to the shared storage, the client can immediately access them.

3.2.2 Protocol Design and Client Implementation

The protocol design contains three parts: a semantic rule, a data model for dummy functions and geographic datasets, and a transmission protocol between server and client. The semantic rule defines the subset of Python language elements allowed in dummy functions and objects, and it ensures that all dummy objects can be completely described by Python built-in objects. The data model of dummy objects is the critical part for ensuring programming compatibility between arcpy and arc4nix. Lastly, the transmission protocol defines the method of serializing dummy objects into a text-based format for sending, and of deserializing them after they are received. Details about each part are explained in the following paragraphs. Semantic Rule. From the software perspective, an arcpy function is merely a wrapped, user-friendly interface to a complicated ArcGIS internal function. But re-implementing the entire arcpy API in arc4nix is not a trivial task due to the large number of arcpy functions. The source code of arcpy shows that ArcGIS internal functions accept only text-based representations for all input parameters, while arcpy accepts parameters in Python-native types (e.g. decimals, strings, lists, dictionaries, tuples) and converts them into pure text before passing them to internal ArcGIS functions. Thus, in arc4nix we partially re-implement this conversion, with several additional semantic rules defined to create a subset of Python-native types that keeps full cross-platform compatibility. These rules are: (1) a simple variable in arc4nix can be one of the following simple types: numerics, text strings, and sequences, where a sequence may only contain simple variables; (2) dummy geospatial functions can take only simple variables as input parameters, and a dummy geospatial function must be translatable into a single-line expression that can be evaluated by Python’s “eval” function; (3) a dummy object can only contain variables defined by Rule 1 and functions defined by Rule 2.
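As a small illustration of these rules (the function and file names are arbitrary), a client-side buffer call with only simple parameters flattens into a single-line expression that the server can hand directly to Python’s eval():

    # Rule 1: only simple, text-serializable parameters; Rule 2: the whole
    # call becomes one eval()-able expression string.
    task = 'arcpy.Buffer_analysis("roads.shp", "roads_buf.shp", "100 Meters")'

    # On the server side (schematically):
    #   result = eval(task)   # invokes the real geoprocessing function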

Dummy classes. The dummy classes are a set of classes that have the same names and methods as the real arcpy classes and functions. They serve two purposes: (1) carrying the necessary information on the client side so that the server side can construct a real arcpy class based on the information in the dummy one; (2) keeping API-level compatibility with arcpy scripts on all platforms. The first purpose requires that a dummy class be serializable into a format compatible with the hypertext transfer protocol (HTTP), and the second purpose requires that a dummy class contain no system-specific dependencies. Both purposes are feasible due to the semantic rules above. Additionally, we explicitly require a dummy class to override Python’s default “__repr__” function, an internal protocol that tells the Python interpreter what an object “represents,” with a text-based representation showing the actual path of the data on the storage system shared by clients and server. Dummy geometry data model. Geometry data are vector data without attributes. In arcpy, some functions can accept pure geometries as parameters when attribute information is not necessary. For example, clipping a raster layer using a polygon mask does not depend on the attributes of the mask polygon. To support this functionality, the dummy geometry class uses the OGC Well-Known Text (WKT) format to carry the geometry object. For the raster data model, we want to preserve one convenient feature of using arcpy to process raster data: using native Python expressions to perform arithmetic calculations on raster layers. Thus, a dummy “Raster” class is designed. Internal Python functions related to arithmetic operators are overridden and mapped to the corresponding arcpy geoprocessing APIs (e.g. the “plus” operator is mapped to the “arcpy.sa.Plus” function). The real arcpy raster class can store a raster layer temporarily in memory, but since the dummy class contains no actual data (only the path to where the data file is saved), the raster data are physically saved in storage. This is also beneficial in a cluster environment, as it is impossible for one node to directly access data in another node’s memory.
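A minimal sketch of such a dummy class follows; it is a simplification for illustration, and a real implementation would wrap the generated expression in another dummy object rather than return a bare string:

    class Raster(object):
        # Client-side stand-in: holds only a path, never any pixel data.
        def __init__(self, path):
            self.path = path

        def __repr__(self):
            # The text representation is the dataset's actual path on the
            # shared storage, so the server can locate the real data.
            return '"{}"'.format(self.path)

        def __add__(self, other):
            # "+" maps to the arcpy.sa.Plus geoprocessing function; the
            # client only composes the task expression.
            return "arcpy.sa.Plus({!r}, {!r})".format(self, other)

    # Raster("/shared/a.tif") + Raster("/shared/b.tif") yields the task
    # string: arcpy.sa.Plus("/shared/a.tif", "/shared/b.tif")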

Communication protocol. Messages from the client to the server contain four parts. The first part is a variable-definition dictionary whose keys are variable names and whose values are the variables themselves. To avoid potential string-escaping errors between client and server over HTTP, all objects are serialized into Base64-encoded text. The second part is a preparation block that contains a Python code segment to restore the encoded variables from this dictionary to memory. The third part is a flag indicating whether the geospatial task will be executed in synchronous or asynchronous mode. The last part is the task itself, written as a single-line expression. After the server receives a message from the client, it restores the variables from the dictionary to memory, translates any incompatible paths between Windows and Linux, evaluates the single-line expression, and returns dummy results to the client. In synchronous mode, the client waits until the task result is returned from the server side, while in asynchronous mode, the client expects an immediate acknowledgement from the server so it can continue sending the next task. This procedure is visualized in Fig. 3-2 as a workflow example.
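The four parts can be pictured as a simple Python structure; the field names below are illustrative rather than the exact wire format, and decode() stands in for a server-side helper:

    import base64

    mask = '"/shared/florida.shp"'           # a dummy object's text form
    message = {
        # Part 1: Base64-encoded variable definitions
        "variables": {"mask": base64.b64encode(mask.encode()).decode()},
        # Part 2: code segment restoring the variables to memory
        "prepare": "mask = decode('mask')",
        # Part 3: execution-mode flag
        "asynchronous": True,
        # Part 4: the task as a single-line expression
        "task": 'arcpy.Clip_analysis("/shared/rain.shp", mask, '
                '"/shared/rain_fl.shp")',
    }

3.2.3 Server Implementation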

As discussed in the previous section, the server takes charge of executing the actual geospatial functions; in arc4nix, the server executes arcpy geoprocessing functions. The server contains four components (Fig. 3-1): the server backend, the server dispatcher, the code container, and an optional local container. The server backend provides the environment and capabilities to run geospatial functions; the server dispatcher communicates with the client and dispatches geospatial functions to the proper backend; the code container is a service running on the server backend that receives and processes functions from the dispatcher. Each client has one corresponding server dispatcher, but a server dispatcher may send tasks to multiple code containers on multiple server backends. Although the server backend and the client can run on different computers, any data to be analyzed must be stored in the shared storage. Shared storage eliminates the cost of exchanging data over the network, making results immediately available to all computers in the cluster. Furthermore, shared storage is always available in HPC cluster environments and on cloud computing platforms. Thus this requirement guarantees high data-exchange efficiency and a simple architecture for the entire system. Server backend. The server backend is the actual software used to calculate geospatial functions. Two kinds of server backend are supported: managed and unmanaged. A managed backend runs inside a “local container” and its lifecycle is controlled by the “server dispatcher,” while an unmanaged backend works like an external server and communicates with the “server dispatcher.” Currently, the only supported managed backend in arc4nix is the ArcGIS Runtime Local Server in the ArcGIS Runtime SDK, while the only supported unmanaged backend is ArcGIS Server. The ArcGIS Runtime provides a free-of-charge subset of ArcGIS Server; it contains only common geospatial functions but requires no installation. In contrast, ArcGIS Server provides full geospatial functionality but requires additional licenses and installation. To invoke geoprocessing calculations, a geoprocessing package must be deployed on the server backend to handle geospatial analytical requests. Since users may call any “geoprocessing function” in their script from either a Linux or a Windows environment, the package must be carefully composed. Due to the complexity of the package, we split it into a standalone component called the “code container.” Server dispatcher. The server dispatcher acts as the bridge between the client and the server backend. It receives the dummy geospatial function from the client, finds an idle server backend, and sends commands to the backend to execute the real geospatial function. After execution, the dispatcher collects results and messages (e.g. warnings, errors) from the backend and sends them to the client. The server dispatcher is written in the Scala language with an Actor Model (Esposito and Loia, 2000) framework. We chose this combination for three reasons: (1) the core functionality of arc4nix is inherited from our previous work (Tang and Matyas, 2016) as an integrated spatial analytical component based on the Akka Actor Model (Neykova and Yoshida, 2014) and Apache Spark (Apache Spark, 2016); (2) the Akka Actor Model provides a highly concurrent asynchronous capability to parallelize multiple geospatial functions in a distributed environment; (3) they are compatible with popular Cloud Computing and “big data” frameworks such as Apache Hadoop (Vavilapalli et al., 2013) and Apache Spark (Apache Spark, 2016). The server dispatcher is the key component that implements the protocol. When a task is received by the dispatcher, it first checks the flag to determine whether the mode is synchronous or asynchronous. In synchronous mode, it does not respond to the client until the task is completed on the backend. In asynchronous mode, the server responds immediately with a callback link that allows the client to check the status of the task later. Local container. The local container is an optional component that controls the lifecycle of a local managed server backend. Each local container starts an instance of ArcGIS Runtime Local Server, collects geospatial analytical tasks from the dispatcher, executes the tasks, returns results and status upon request from the dispatcher, and finally shuts down the instance after execution. It is possible to run multiple containers on one powerful workstation (one per CPU core) or on multiple nodes across HPC clusters. Since the dispatcher runs jobs in asynchronous mode, it can send a task to any idle container and immediately dispatch the next task without blocking. This allows users to easily run large-scale geospatial analyses on clusters. Code container. The code container is a specially crafted “geoprocessing package” running on ArcGIS Runtime Local Server or ArcGIS Server. It is special because the package does not provide any spatial analytical capability of its own, as a normal “geoprocessing package” does. It receives encoded Python code segments from the server dispatcher as input arguments and evaluates them in the host’s Python interpreter (i.e. the Python interpreter provided in ArcGIS Server). The package also detects the operating system environment and performs file-path conversions when necessary. This happens when the client (e.g. on Linux) and the server backend (e.g. on a remote Windows server) are running on different operating systems or on different machines. In this example, the client uses Linux-style file paths of input files as parameters in geospatial functions, but the server backend on Windows will not understand the Linux-style paths. Thus, the package converts a Linux-style path to the corresponding Universal Naming Convention (UNC) path for a backend on Windows, or to a Wine-mapped path for a backend on Linux, to ensure correct access to the input files. After the path conversion, the package evaluates the Python code segments containing the actual geospatial function. To avoid potential input/output latency on the hard disk or shared storage, the code container uses an in-memory workspace to store intermediate results whenever possible.
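A hedged sketch of this path translation is given below; the share name, mount point, and Wine drive mapping are examples only:

    def translate_path(path, backend_os, unc_root=r"\\fileserver\shared"):
        # Rewrite a shared-storage path seen by a Linux client into a form
        # the backend understands; "/shared" and the UNC root are examples.
        if not path.startswith("/shared/"):
            return path                       # nothing to translate
        rel = path[len("/shared/"):].replace("/", "\\")
        if backend_os == "windows":
            return unc_root + "\\" + rel      # UNC path for a Windows backend
        return "Z:\\shared\\" + rel           # Wine commonly maps "/" to Z:

    # translate_path("/shared/isabel/refl.tif", "windows")
    #   -> \\fileserver\shared\isabel\refl.tif

3.3 Case Study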

We present a case study to demonstrate the capability, performance, and scalability of arc4nix by performing spatial analysis on observations produced by ground-based radars and on conditions simulated by a numerical weather prediction model for a landfalling hurricane. This analytical example represents a typical practice in atmospheric science research: modeling a weather system and verifying it with observations. Computationally, this case represents a typical application scenario: running large-scale spatial analytical functions in place on a distributed computing environment using arcpy-enabled scripts. The Next Generation Weather Radar (NEXRAD) network in the United States represents a typical Big Data source of earth observations, as each of the approximately 160 ground-based radar stations produces about 200 GB of compressed geospatial data daily while scanning the atmosphere (Doviak and Zrnic, 2014). The most frequently used radar echo is reflectivity, which can be converted to precipitation rate (Austin, 1987). Thus NEXRAD provides high-resolution (0.25–1 km, every 5–10 min) continuous rainfall observations over the continental U.S. By combining individual radar observations into a large mosaic, which is necessary to examine synoptic-scale weather features such as hurricanes and cold fronts, the network can be used to verify simulated results obtained from numeric weather prediction models. For this example, the NEXRAD mosaic is computed using a distributed MapReduce algorithm on Apache Spark, which also runs on the same HPC (Tang and Matyas, 2016). To simulate Hurricane Isabel’s landfall, we utilize the Advanced Research Weather Research and Forecasting (WRF-ARW) model version 3.6.1 (Wang et al., 2007) on the HPC. This case study includes post-processing steps that plot contour lines of radar-observed and model-simulated reflectivity, and performs geometric and statistical calculations on the polygonized contour lines. Previously, without arc4nix, radar and model outputs had to be copied back to a local workstation with ArcGIS installed, where the spatial analytics could be performed. This procedure took a long time owing to the large volume of transferred data and the limited computing power of the local workstation. With arc4nix, all spatial analytics are done directly on the HPC in parallel using multiple nodes, which significantly accelerates the calculation. Furthermore, only the final results are copied back to the local workstation for display, and their sizes are dramatically smaller than the original data. In this case, the input dataset to arc4nix includes: (1) 225 GB comprising 224 observed raster layers of reflectivity at 3.5-km altitude, composited from 17 radar stations and capturing Isabel’s rainband structures every 30 minutes between 0930 UTC 18 Sep and 2000 UTC 19 Sep 2003; (2) 997 GB of simulated reflectivity obtained from the numeric weather prediction model during the same period, stored in Network Common Data Form (netCDF) format. We employ the spatial analytical techniques discussed in Matyas (2007) and Zick and Matyas (2016) to investigate Isabel’s structural change from a spatial statistical perspective. The case workflow is illustrated in Fig. 3-3. By calculating geometric attributes (e.g. area, perimeter, minimum bounding box) of each polygonized reflectivity contour and their spatial patterns (e.g. roundness, dispersiveness), we can quantitatively measure the similarity between modeled and observed data. Fig. 3-4, plotted using ArcMap, shows both observed and simulated contour polygons created by arc4nix in ESRI shapefile format from raster layers of reflectivity at the time when Hurricane Isabel made landfall. The results of our test case indicate that, using arc4nix with a single CPU core on an Intel Xeon processor at 2.30 GHz, it takes about 2.2 hours to process the 224 inputs of observed radar reflectivity and calculate their spatial statistics. Performance profiling shows that parameter encoding and decoding takes about 3.2% of the total CPU time. Although 68.2% of the total CPU time is consumed by the code container on ArcGIS Server, that work occupies 75.8% of the total running time observed on the client side in synchronous mode. This means that a 7.6% overhead occurs during communication between the client and the servers, including possible storage and network latencies. To verify the overhead costs, we re-ran the same single-CPU case with the original arcpy in ArcMap on a Windows PC with an identical hardware configuration. That test runs 3.4% faster than arc4nix on Linux. This result is expected: the network and storage latency on a local computer can be ignored, and the remaining slowdown is caused by the additional computation for variable encoding and decoding. To evaluate the scalability of arc4nix, we ran the same case under different hardware configurations using 1 to 32 CPU cores (Fig. 3-5). System throughput is higher when processing model outputs than radar-observed data because, in this case, the model outputs yield fewer polygons after contouring, which take less time to process in the follow-up algorithms. Although overheads exist as discussed above, running speeds are proportional to the total number of CPUs. This test shows that the parallel architecture brings significant acceleration with little programming effort to adopt a distributed Linux environment, and that arc4nix can reach nearly perfect linear scaling in a multi-CPU environment.

3.4 Conclusion and Future Plan

In this paper, we presented a scalable, arcpy-compatible, cross-platform library called “arc4nix.” Arc4nix utilizes a C/S architecture to bridge geospatial functions between native operating systems and GIS-ready servers. On the client side, arcpy geoprocessing functions are converted to arc4nix dummy geospatial functions for server-side execution; on the server side, the real geospatial functions are restored from the dummy functions and executed, with the results saved on storage accessible from both the server and client sides and dummy results sent back to the client for follow-up execution. Arc4nix shows high compatibility with the original arcpy interface in three respects: arcpy geoprocessing functions can be used directly in arc4nix with no modifications; map algebra for raster datasets written in native Python expressions is also supported; and in-memory objects such as database cursors can be evaluated directly as code segments, which is the only instance in which code modifications may be necessary. We conclude that arc4nix’s ability to let users run their original scripts in both single-node and multi-node environments with minimal modification achieves our goal of simplifying the technical procedures for applying GIS in Cloud Computing or HPC environments in cross-disciplinary research. We will continue the development of arc4nix to improve its compatibility with the original arcpy. In particular, we plan to implement database-cursor functionality directly in arc4nix using the Geospatial Data Abstraction Library (GDAL) to eliminate the remaining code modifications. We also plan to provide an automated parallelization mechanism to improve the user experience when utilizing large amounts of computing resources, and to enhance the security of arc4nix. Although arc4nix enforces an authentication step to access an unmanaged backend such as ArcGIS Server, we will impose further restrictions on executing Python scripts on the server so that only functions on a configurable whitelist are allowed.

Table 3-1. GIS support on major Cloud Computing platforms

    Company      Platform Name           GIS Software                      Example Research
    Amazon       Amazon Web Service      ArcGIS Server virtual machine     Wang et al. (2009);
                                         image                             Shao et al. (2011)
    Microsoft    Microsoft Azure         ArcGIS Server virtual machine     Gong et al. (2010);
                                         image                             Agarwal and Prasad (2012)
    ESRI         ArcGIS Online           Portal for ArcGIS Server          Pimm (2008)
    Google       Google Cloud Platform   Google Earth Engine               Patel et al. (2015)
                                         (Gorelick, 2012)

Figure 3-1. The overview of arc4nix. Arc4nix uses a C/S architecture. When a geospatial function is called on the client side from a Python script, the client generates a geospatial task and sends it to the server. The server executes the task and returns messages back to the client, writing results to the shared storage accessible from the client.

Figure 3-2. Workflow in synchronous mode (left) and asynchronous mode (right). In synchronous mode, the server responds after the task is executed, while in asynchronous mode, the server immediately acknowledges the submitted task and runs it in another session. The client can check the progress of submitted tasks in asynchronous mode.

Figure 3-3. Workflow of spatial analysis in the Isabel (2003) case. The first four steps are repeated for each input layer, and the last step aggregates across all layers.

Figure 3-4. Observed (left) and simulated (right) radar reflectivity polygons when Hurricane Isabel (2003) made landfall. A time series of similar reflectivity layers is created every 10 minutes from 0930 UTC 18 Sep to 2100 UTC 19 Sep 2003.

Figure 3-5. System throughput (files/min) of the Isabel case using different numbers of CPUs for observed reflectivity and simulated reflectivity.

CHAPTER 4
A SPATIO-TEMPORAL NOWCASTING MODEL FOR TROPICAL CYCLONES USING SEMI-LAGRANGIAN SCHEME

4.1 Overview

4.1.1 Tropical Cyclone and Precipitation Nowcasting

As tropical cyclones (TCs) approach land, they pose a danger to life and property with their associated fast winds, storm surges, and rainfall. Once they move inland, many of the forecasting challenges and deaths stem from heavy rainfall (Rappaport, 2000, 2014). An accurate short-term forecast of 0 to 6 h of precipitation from TCs is required by forecasters and decision makers for the issuance of flash flood warnings and urban drainage management (Wilson et al., 1998). This kind of short-term forecasting is called “nowcasting.” The World Meteorological Organization (WMO) defines nowcasting as “the detailed description of the current weather along with forecasts obtained by extrapolation for a period of 0 to 6 hours ahead” (World Meteorological Organization). Nowcasting incorporates the most recent observations, including those from radars and satellites, to make an accurate forecast for small regions such as cities. A successful nowcast that gives an accurate rainfall prediction will significantly reduce the hazardous risk to people and property (Cao and Wei, 2004). In the United States, the Weather Surveillance Radar-1988 Doppler (WSR-88D), or Next Generation Weather Radar (NEXRAD), a network of S-band radars, has been operational since 1995. Approximately 160 radars provide continuous, high-resolution weather observations at 0.25–1 km every 5–10 minutes (Crum and Alberty, 1993; Istok et al., 2009). Thus, the weather radar network is an important data source in weather forecasting, including nowcasting. In general, radar data can be used in nowcasting in two different ways: predicting based only on weather radar data (radar-only nowcasting), or predicting with a numeric weather prediction (NWP) model in which radar observations are used to calibrate the model (radar-assimilated model nowcasting). Radar-only nowcasting uses radar moments (e.g. reflectivity) and their directly derived products (e.g. composite reflectivity), while in radar-assimilated model nowcasting, an NWP model digests radar observations through a data-assimilation procedure. Radar-only nowcasting is usually based on object-tracking and extrapolation methods from image-processing technology. It analyzes a time series of radar products, usually reflectivity data, plotted as digital images, and identifies and extracts the tracks of coherent cloud structures from one image to the next. Extrapolation is then done by moving pixels on the last image along the track extracted from the image series. Because image-processing techniques ignore the physical rules of the atmosphere, radar-only nowcasting cannot produce the reliable multi-day predictions that are possible with NWP models. In contrast, radar-assimilated model nowcasting does not suffer from this problem because it uses physical (deterministic) equations to solve the clouds’ thermodynamic processes and predict their future state, and that prediction is expected to be more reliable (Liang et al., 2010). These models take weather radar as an external observation to adjust the model state. For example, the recently developed Rapid Refresh model (RAP) digests weather radar observations through a data-assimilation system, and it can produce a fast prediction up to 8 hours into the future nationwide with updates every 15 minutes (Benjamin et al., 2016). Strictly speaking, these NWP models do not use weather radar observations as initial and boundary conditions when solving the physical equations (Chen and Dudhia, 2001). The purpose of feeding weather radar observations into these models is to improve the quality of the short-term forecast (Tong and Xue, 2005; Snyder and Zhang, 2003) through a data assimilation process, because a short-term forecast is very sensitive to the quality of the observed inputs (Kleczek et al., 2014). Although, theoretically, forecasting with an NWP model that assimilates radar observations surpasses using radar observations alone, it cannot yet replace radar-based nowcasting. First, using numeric weather models in nowcasting is heavily limited by the quality of the observational data. Since NWP models are sensitive to initial conditions, setting up accurate initial conditions is critical in short-term forecasting. Thus quality control plays an important role in data collection, and high-quality data may not be available in an emergency (Wilson et al., 1998). Second, radar-only nowcasting requires significantly fewer computational resources than NWP models. Modern NWP models require thousands of CPUs to reach a spatial resolution of 1 km on a nationwide domain, and 1 km is also a resolution limit for many NWP models. In contrast, data from the current operational WSR-88D radars can be obtained at spatial resolutions up to 250 m, and extrapolating radar observations at this finer resolution requires fewer than one hundred CPUs (Lakshmanan and Humphrey, 2014). Finally, radar reflectivity is not an explicit measurement of liquid water in the atmosphere; rather, it is an aggregate reading contributed by both total liquid volume and raindrop size. Because raindrop size contributes exponentially to the reflectivity echo strength, a few large raindrops can produce the same reflectivity reading as many small raindrops, even though their total liquid volume is smaller than that of the latter. When an NWP model tries to adjust its state with radar observations, it must assume a raindrop size distribution to calculate “simulated reflectivity” (Koch et al., 2005) before comparison with observed reflectivity, yet such raindrop size distributions are sampled from field experiments under non-extreme conditions (Ulbrich, 1983). This means that the “simulated reflectivity” may not produce results equivalent to radar-observed reflectivity, especially in convective and extreme weather scenarios, including TCs. Thus, although the concepts and theories in the NWP models are more advanced than extrapolation-based methods, radar-only nowcasting is still applied operationally in many countries and cannot be replaced by NWP models.

4.1.2 State-of-the-Art Radar-based Nowcasting Methods

Traditional radar-only (hereafter radar-based) nowcasting is mostly based on observations from a single radar. Tracking radar echoes by correlation (TREC), proposed by Rinehart and Garvey (1978), was the first radar-based nowcasting method. It calculates correlation coefficients between successive images of radar echoes and uses the maximum values to obtain the motion vectors of different regions. TREC is an image-processing algorithm that is based purely on image sequences and completely ignores the scale and dynamical equations of motion of a weather system.
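The core block-matching step of TREC can be sketched in a few lines; the implementation below is a deliberately brute-force illustration, and the array names and box/search sizes are arbitrary:

    import numpy as np

    def trec_vector(img0, img1, y, x, box=16, search=8):
        # Motion vector for one box: maximize the correlation coefficient
        # between the box in img0 and shifted candidate boxes in img1.
        ref = img0[y:y + box, x:x + box].ravel()
        best_r, best_uv = -2.0, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if (yy < 0 or xx < 0 or yy + box > img1.shape[0]
                        or xx + box > img1.shape[1]):
                    continue              # shifted box falls off the image
                cand = img1[yy:yy + box, xx:xx + box].ravel()
                r = np.corrcoef(ref, cand)[0, 1]
                if r > best_r:
                    best_r, best_uv = r, (dx, dy)
        return best_uv                    # pixels moved per image interval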

To overcome these drawbacks and improve accuracy, multiple refined methods have been proposed since TREC. Tuttle and Foote (1990) added a spatial filter to obtain the internal motion at a smaller scale. Li et al. (1995) proposed the continuity of TREC vectors (COTREC) scheme to comply with the continuity constraint in the atmosphere. This constraint helped avoid the strong divergence of echoes that occurs in TREC results, but during the calculations it unavoidably weakened the retrieved wind field. Tuttle and Gall (1999) showed that the average echo motion speed obtained using COTREC was underestimated by 10% compared with the speed detected by an aircraft. Mecklenburg et al. (2000) was the first study to introduce a parameter into both the TREC and COTREC schemes to take into consideration the growth and decay of individual cloud regions. Its demonstration through analyses of local thunderstorm cases showed that the inclusion of a growth parameter led to a better forecast. Gamba et al. (2002) combined COTREC with a shape-analysis approach to track precipitation events and obtained a more refined motion vector field that reached a 70% match with ground observations, a better performance than using COTREC alone (40–50% matching). Besides the limitation on continuity, TREC occasionally produces a vector pointing in a direction that contradicts its surrounding vectors. This limitation was addressed by Zhang et al. (2006), who proposed the difference-image-based TREC (DITREC) algorithm, which calculates the cross-correlation maximum between differences of precipitating regions from three consecutive images instead of two. Liang et al. (2010) introduced a blending algorithm that combines TREC vectors with model-predicted winds to prolong the prediction time up to 3 h. Wang et al. (2013) proposed the Multi-scale TREC (MTREC) algorithm, which uses TREC in a nested style: a first pass of TREC at low resolution obtains the synoptic-scale motion, and an additional pass at high resolution inside each large low-resolution region is used to predict meso- to local-scale internal motion. They reported that MTREC could produce a reliable 1-h forecast in cases with input mosaics of composite reflectivity. Operationally, TREC and all TREC-derived nowcasting methods are still based on single-radar scenarios. In the U.S., a single WSR-88D radar station can cover only a circular region with a 230-km radius. Thus, using observations from a single radar station places an upper limit on the spatial scale of the observations and, consequently, on the timescale over which the forecast is useful. The useful forecast lengths reported in the literature are usually less than 2 hours and often less than 1 hour (Mecklenburg et al., 2000). Extending these methods to a large mosaic of radar images obtained from multiple stations in a network permits forecasters to reveal cloud patterns that are not observable in single-radar scenarios. For example, within a single radar’s range, non-linear motions such as rotation in a mesoscale convective system may not be significant owing to the limited spatio-temporal scale, but such rotation can be easily captured in a large domain (Tuttle and Foote, 1990). Also, previous research (Turner et al., 2004) suggests that patterns at roughly the 1000-km scale tend to be more consistent and predictable for up to 1 day. Thus it is preferable to perform nowcasting at the synoptic scale when data are available from multiple radar stations and can be mosaicked into a single image.

4.1.3 Motivation and Goals

We find that research applying TREC and TREC-derived methods only reports their predictability on local heavy and extreme rainfall cases like convective storms, localized thunderstorms, and squall lines that are captured well by a single radar. Aside from the MTREC case-study that featured a partial view of a small typhoon before landfall using single radar station, there is a lack of literature discussing their performances and limitations when a TC moves into the analytical domain. Tropical cyclone cases should be treated separately from other weather systems because TCs are predominately large and have a strong rotational component to their motion. The linear extrapolation

68 methods used by all TREC-derived algorithms ignore the tangential component of motion. Therefore, the predicted motion vectors would break the storm apart. Although convective clouds with relatively strong vertical motion and small horizontal extent occur in the eyewall and spiral rainbands of TCs, most of the clouds those comprise a TC are stratiform (Jorgensen, 1984) where vertical motion is relatively weak and the clouds occupy a large horizontal region. The previous studies employing TREC or its derivatives mostly focused on either convective or stratiform cases, not the mixed scenario that occurs with a TC. Furthermore, although the average extent of a TC’s rainfall is 220–240 km on either side of its circulation center (Matyas, 2010b; Zhou and Matyas, 2017), it is generally considered that rainfall within 500km radius from a TC’s eye is produced by the TC rather than another type of weather system (Jiang et al., 2011). This 500km distance obviously exceeds the detection range of one radar station. As such rainfall can potentially trigger floodingVillarini ( et al., 2011), it is critical to accurately predict its motion, but the scale of a TC’s rainbands necessitates that multiple radars be employed to capture the entire system. Given the radar network’s ability to observe the atmosphere at a horizontal resolution of 0.5–1km and temporal resolution of about 5 minutes, it is a powerful instrument to study and forecast TCs before, during and after their landfall. Thus, in this study we aim to extend the capability of the original single-radar nowcasting method to a large domain and tune the algorithm to consider tangential motion and mixed cloud types for a TC scenario. We choose TREC as our starting point, then employ multiple ideas from related methods to adopt it for a TC scenario. The rest of the paper is arranged as follows. Section 2 presents our revised nowcasting scheme for a multi-radar scenario with special considerations when a TC is observed. Section 3 discusses the motivation and efforts to use GIS as the platform to conduct the nowcasting. To provide a performance evaluation, Section 4 presents a 10h nowcast of the precipitation associated with Hurricane Isabel as it

moves over the mid-Atlantic and northeastern U.S. in September of 2003. The last section concludes this study and presents directions for future research.

4.2 Methodology

4.2.1 Overview

As a prediction model, a nowcasting model shares the same features as any general prediction model: it takes observed data as inputs and generates outputs beyond the observed time period. As radar-only nowcasting lacks many of the fundamental physical variables needed to establish a numerical weather prediction, a general assumption is to treat all clouds in the atmosphere as ideal air parcels and use a trajectory or dispersion model. In this study, we divide our model into two stages: a tracking stage and a forecast stage. In the tracking stage, trajectories are determined for each cloud. The tracking method is based on two “weakened” methods: weakened MTREC and weakened variational analysis. We say that they are weakened because we do not completely follow the original MTREC and variational analysis; rather, we only employ their conceptual ideas. From MTREC we use its nested region-tracking scheme; for variational analysis, we use a similar mathematical method but omit error terms from “true observations” because there are no “true observations” available. The forecasting stage starts immediately after the end of the tracking stage. In the forecast stage, the identified regions are extrapolated using a semi-Lagrangian trajectory model. For convenience, reflectivity values taken from weather radars are re-projected into a stack of raster layers at different altitudes with equal-sized square cells using the method described in Chapter 2. These raster layers can be treated as digital images, and each cell containing a reflectivity value is represented as a pixel on an image; thus we call them reflectivity images. Clouds and rainfall regions are both represented using pixels on these reflectivity images. In this study, a rainfall region is defined as clustered pixels on the reflectivity images where cell values are larger than 10 dBZ. Although it would be ideal to create a model that tracks movements of individual storms, as previously mentioned, TCs are mainly composed of stratiform clouds, and individual

storm tracking techniques underperform in this scenario because cloud boundaries of individual storms are difficult to distinguish (Pierce et al., 2004). For example, if we define storm boundaries using a certain reflectivity threshold, two large regions of that reflectivity value may be connected by a single pixel to produce a single larger region, whereas the desired outcome would be to split the regions at the location of the single pixel. Thus, we fall back to the pixel-based nowcasting approach utilized in all TREC-derived methods. The method produces a trajectory for each pixel; a sketch of how the rainfall regions defined above can be identified follows.
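As an illustration, the rainfall-region definition above can be realized with a connected-component labeling step. This is a minimal sketch, assuming a 2D reflectivity image as a NumPy array; the stand-in random field is only for illustration, not data from the dissertation:

```python
import numpy as np
from scipy import ndimage

# a stand-in reflectivity image (dBZ); real images come from the Chapter 2 mosaics
rng = np.random.default_rng(0)
refl = rng.uniform(-10.0, 50.0, size=(256, 256))

mask = refl > 10.0                        # pixels belonging to rainfall regions
labels, n_regions = ndimage.label(mask)   # cluster touching pixels (4-connected by default)
```

As the text argues, such region objects are only descriptive; the tracking itself remains pixel-based.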

4.2.2 Basic Method

The basic form of our nowcasting can be written as the following mathematical expression:

Ẑ(t0 + τ, x) = Z(t0, x − α) − τQ(t0, x − α)   (4–1)

It uses displacement vectors α and observations Z to predict Ẑ with a lead time of τ. Changes in rainfall rate are accounted for using a source/sink term Q. Fig. 4-1 and Fig. 4-2 show the setup and general workflow of the tracking and prediction stages in this study. In the tracking stage, three motion fields are calculated from 4 consecutive images: t and t + 10 min, t + 10 min and t + 20 min, and t + 20 min and t + 30 min. Then one motion field is set to the mean value of the 3 fields (a sketch follows below). This step adopts the DITREC idea of using the last several consecutive images to avoid disordered vectors. In the next stage, the averaged motion is corrected using a variational analysis technique. During the prediction stage, a pixel's displacement vector α is determined using a semi-Lagrangian extrapolation scheme (Staniforth and Côté, 1991) over the motion field domain. Note that a pixel's displacement vector α always points from its start point x − α to its end point x, but its actual track may be a curve because it follows the cyclonic rotation in a TC (Fig. 4-3). Details of these steps are explained in the following sections.
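The field-averaging step mentioned above is straightforward; the following is a minimal sketch, with stand-in random arrays in place of the three pairwise TREC fields (names and shapes are illustrative):

```python
import numpy as np

# three pairwise TREC motion fields, e.g. from the image pairs
# (t, t+10min), (t+10min, t+20min), (t+20min, t+30min)
rng = np.random.default_rng(1)
fields = rng.normal(size=(3, 2, 64, 64))   # (pair, u/v component, ny, nx)

uv_mean = fields.mean(axis=0)              # single averaged field for the forecast stage
```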

4.2.3 Calculating Motion Field

The basic method to obtain the displacement vector of a block is TREC. The first step is to calculate the correlation between two radar reflectivity images as:

R = [ ∑ Z1(x1)Z2(x2) − (1/n) ∑ Z1(x1) ∑ Z2(x2) ] / √[ (∑ Z1²(x1) − n·Z̄1²)(∑ Z2²(x2) − n·Z̄2²) ]   (4–2)

TREC requires a predefined area (e.g. a polygon containing the cloud); for the defined area of Z1 in the first reflectivity image, it searches the second image for another area with the same shape that gives the highest R value, then computes the vector from Z1 to Z2 as the motion vector. Since the predefined shape may not be rectangular, the TREC method flattens all reflectivity pixels into a 1D array in left-to-right, top-to-bottom order. In this study, we simply divide the entire domain into a fishnet and track each block on the fishnet. Since all predefined shapes are square blocks, we can skip the flattening stage and use the 2D normalized cross-correlation between two general digital images to represent the same correlation R as in TREC. Determining the proper size of blocks in a TC scenario can be difficult. If the block size is too large, it cannot capture the rotational motion of convective clouds in the eyewall of the TC. If the block is too small, it may lead to chaotic motion vectors because TREC ignores the fact that the low pressure system is rotating (cyclonically in Northern Hemisphere cases). To overcome this limitation, we employ the concept from MTREC where large blocks are used to obtain the synoptic-scale motion. The first step is to divide the entire reflectivity image into a tessellation of large square blocks, each containing 64 × 64 pixels. We choose 64 × 64 as the largest block size because we use 3 km resolution in the reflectivity images (i.e. each pixel is 3 × 3 km), so a large block is about 200 km across (64 × 3 km = 192 km), the size suggested in the original MTREC research (Wang et al., 2013). In the next step, each large block is recursively divided into four small blocks in a quad-tree-styled pattern (Finkel and Bentley, 1974) to obtain finer detail, down to the 8×8 block size. At the 64×64 level, any block that yields R < 0.25 is discarded and filled by averaging its four connected neighbors.
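To make the block-matching step concrete, the following is a minimal sketch of TREC-style matching by exhaustive search over a square block, assuming two co-registered 2D reflectivity arrays; the function names, search radius, and default block size are illustrative rather than taken from the dissertation's implementation:

```python
import numpy as np

def ncc(a, b):
    """2D normalized cross-correlation between two equal-sized blocks (cf. Eq. 4-2)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def trec_vector(img1, img2, y, x, size=64, max_disp=10):
    """Displacement of the size x size block at (y, x) in img1 that best
    matches img2 within +/- max_disp pixels; returns (R, (dy, dx))."""
    block = img1[y:y + size, x:x + size]
    best_r, best_v = -1.0, (0, 0)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > img2.shape[0] or xx + size > img2.shape[1]:
                continue  # candidate block falls outside the second image
            r = ncc(block, img2[yy:yy + size, xx:xx + size])
            if r > best_r:
                best_r, best_v = r, (dy, dx)
    return best_r, best_v
```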

At all lower levels, if R < 0.25 occurs, or if the obtained motion vector deviates more than 30 degrees from the vector one level above it, it is replaced by the upper-level vector (a sketch of this rule follows below). When using TREC with reflectivity images, R can only be obtained with a sufficient number of pixels in a region. This is usually not an issue in a single-radar scenario when a TC is near that radar station, as a TC is usually much larger than a single radar's scanning domain, which guarantees a “filled” image. However, in a multi-radar scenario, a large analytical domain may be selected to enclose the entire TC and related mesoscale interactions for several hours, leading to large blank areas without sufficient reflectivity pixels to calculate R. Also, some TCs quickly dissipate after landfall, or their cloud “pieces” may become more dispersed (Zick and Matyas, 2016). To fill those blank areas, in the last step we interpolate the calculated motion vectors over the analytical domain. To complete this task, we assume the wind speed along the analytical domain boundary is 0 unless vectors were calculated there. The motion vector field is interpolated at 4×4 block-size resolution, which means all 16 pixels inside a 4×4 block share the same motion vector value. Since all interpolation methods make assumptions about certain spatial patterns, interpolation will always create some “artificial” patterns. To further improve the quality of the interpolated result, in the next step we adopt a variational analysis technique to correct the wind field.
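The parent-fallback rule described above can be sketched as follows; this is a minimal illustration using the stated thresholds (R < 0.25 and 30 degrees), with hypothetical function and variable names:

```python
import numpy as np

def accept_child(vec_child, r_child, vec_parent, r_min=0.25, max_angle=30.0):
    """Keep a child-level TREC vector only if its correlation is high enough
    and its direction stays within max_angle degrees of the parent vector;
    otherwise fall back to the parent-level vector."""
    if r_child < r_min:
        return vec_parent
    norm = np.linalg.norm(vec_child) * np.linalg.norm(vec_parent)
    if norm == 0.0:
        return vec_parent  # degenerate vector; keep the parent estimate
    cos = np.clip(np.dot(vec_child, vec_parent) / norm, -1.0, 1.0)
    if np.degrees(np.arccos(cos)) > max_angle:
        return vec_parent
    return vec_child
```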

4.2.4 Motion Field Correction

The calculated and interpolated motion vector field from the previous step has a flaw in that it may not follow a basic characteristic of the atmosphere: the atmosphere is continuous and smooth. To get a realistic motion vector field from the obtained field, we use a data-assimilation technique called “variational analysis”. The technique was initially designed for adjusting NWP models with external observations. It uses model states (called the “model guess”) to match external observational data (often called the “ground truth”), without breaking thermodynamic rules. This process usually employs a numeric optimization method called 3D variational analysis (3DVAR) or a

stochastic analysis method called the ensemble Kalman filter (EnKF) to reach a best balance (called the “best estimation”) between the model guess and the ground truth. In this study, we treat the motion field obtained from the last step as the “model guess”, but we do not have “ground truth” with which to perform a 3DVAR routine. Thus we simplify this method into a “2DVAR” that only counts the total cost of producing a continuous and smooth motion field plus the differences between the result and the initially-calculated field, then uses a mathematical optimization technique to minimize that cost function. A very similar method is reported to be successful on the South Korean radar network (Bellon et al., 2010). In this study, we create a cost function with the two penalties mentioned above, a continuity penalty JC (Li et al., 1995) and a smoothness penalty JS. The cost function to be minimized is

J(u, v) = J0 + JC + αJS   (4–3)

J0 = ∫Ω β(x)[(u − u0)² + (v − v0)²] dxdy   (4–4)

where u is the x-component and v is the y-component of the wind over the analytical domain Ω, and u0 and v0 are the calculated motion vector components from the previous step. β(x) is the background error covariance that reflects the radar data quality at location (x, y). For example, in areas where clutter and partial blockage of the radar beam often occur, the weight is lower. J0 reflects the total difference between the final field and the first-guess field obtained in the previous step. The continuity term

JC is used to maintain mass conservation. In this scenario, we cannot enforce mass conservation everywhere, as COTREC does, for two major reasons: (1) as COTREC is a single-radar algorithm, motion vectors on its domain boundary are often not zero, which allows a precipitation region to enter and leave the domain; in our scenario, vectors on the boundaries are mostly 0, meaning that we do not expect precipitation to enter or leave the domain because we have mosaicked a large enough region to completely

encompass the TC; (2) a TC often contains convective clouds with strong updraft and downdraft air flows, so across a fixed 2D altitude, mass may not be conserved. Thus we impose a weak constraint on mass conservation so that the total mass in the analytical domain is conserved. This penalty term is:

JC = λC D²

D = ∫Ω (∂ρu/∂x + ∂ρv/∂y) dxdy

λC = [ ∫Ω (∂ρu/∂x + ∂ρv/∂y)² dxdy ]⁻¹   (4–5)

Neglecting compressibility over the analytical domain Ω, ρ is a constant and can be dropped from JC. The second term JS is the smoothness penalty. It is defined as (Wahba and Wendelberger, 1980):

JS = ∫∫Ω [ (∂²u/∂x²)² + 2(∂²u/∂x∂y)² + (∂²u/∂y²)² + (∂²v/∂x²)² + 2(∂²v/∂x∂y)² + (∂²v/∂y²)² ] dxdy   (4–6)

Finally, JS is scaled by α, a constant factor. A previous study (Gao et al., 2005) shows that the smallest eigenvalue of ∇²J(u, v) is larger than 1 (i.e. J(u, v) is positive definite), so a global minimum exists and can be found with the Conjugate Gradient method. During experiments, we found that, since our first guess is close to the global minimum, a quasi-Newton method like Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) usually converges very quickly. Fig. 4-4 shows how the minimized cost function J(u, v) restores an idealized, symmetric circulation from a large missing patch in the third quadrant. The left panel shows the interpolated components (green) derived from the existing motion vectors (blue), and the right panel shows the restored components obtained using the variational analysis. The proposed method clearly restores the original symmetric pattern in the circular flow.
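As an illustration of this 2DVAR step, the following is a minimal sketch that assembles the cost J = J0 + JC + αJS on a small grid and minimizes it with SciPy's L-BFGS-B optimizer. It is a toy version under simplifying assumptions, not the dissertation's implementation: β is taken constant, ρ is dropped, derivatives are approximated with finite differences, and the gradient of J is left to SciPy's numerical differencing (simple but slow):

```python
import numpy as np
from scipy.optimize import minimize

def make_cost(u0, v0, alpha=1e-3, beta=1.0, dx=1.0):
    """Build the 2DVAR cost function for a first-guess field (u0, v0)."""
    ny, nx = u0.shape

    def cost(z):
        u, v = z.reshape(2, ny, nx)
        # J0: weighted misfit against the first-guess field (Eq. 4-4)
        j0 = beta * np.sum((u - u0) ** 2 + (v - v0) ** 2)
        # JC: weak, domain-integrated continuity penalty (Eq. 4-5)
        div = np.gradient(u, dx, axis=1) + np.gradient(v, dx, axis=0)
        jc = np.sum(div) ** 2 / max(np.sum(div ** 2), 1e-12)
        # JS: second-derivative smoothness penalty (Eq. 4-6)
        js = 0.0
        for f in (u, v):
            fx = np.gradient(f, dx, axis=1)
            fy = np.gradient(f, dx, axis=0)
            fxx = np.gradient(fx, dx, axis=1)
            fyy = np.gradient(fy, dx, axis=0)
            fxy = np.gradient(fx, dx, axis=0)
            js += np.sum(fxx ** 2 + 2 * fxy ** 2 + fyy ** 2)
        return j0 + jc + alpha * js

    return cost

# first guess: a hypothetical interpolated TREC field on a small 16 x 16 grid
rng = np.random.default_rng(0)
u0 = rng.normal(size=(16, 16))
v0 = rng.normal(size=(16, 16))
res = minimize(make_cost(u0, v0), np.stack([u0, v0]).ravel(), method="L-BFGS-B")
u_opt, v_opt = res.x.reshape(2, 16, 16)
```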

4.2.5 Advection Scheme

Once the final motion field is obtained, it is held unchanged for the entire forecast stage. TREC-based nowcasting methods usually use linear extrapolation over the entire period (Wilson et al., 1998). In fluid dynamics, this is called Eulerian advection, written as

α = τu(t0, x)   (4–7)

Linear extrapolation assumes all cloud structures keep their most recent state and move along straight lines during the entire forecasting period. Since the assumption of linear movement contradicts the basic physical fact that the atmosphere is a non-linear system, it cannot realistically account for nonlinear changes in the atmosphere. As a result, linear extrapolation only produces reliable predictions for a short time over which non-linearity can be neglected, often around 30–60 minutes. After that, errors become significant (Wilson and Mueller, 1993). Through experiments, we confirm that linear extrapolation is not suitable for use in a TC scenario where winds have a tangential component to their motion. As a simple example, if rotation is neglected in a mature hurricane whose motion vectors are almost tangential, advecting the vectors linearly will tear the hurricane into pieces, leading to unreliable results in about 30 minutes. Later researchers reported using Lagrangian advection (Berenguer et al., 2011; Mandapaka et al., 2012; Zahraei et al., 2012), which effectively moves each pixel along its track according to the motion field. This scheme can accommodate rotation and curved tracks, but it has a problem in that the final position of a pixel usually does not fall exactly on the center of a grid cell, instead overlapping grid lines. Further, multiple pixels may partially overlap each other at the destination. Questions arise as to how to properly handle those unaligned pixels and partially overlapped areas given that rain drop size distribution information is not available inside each pixel. Hence, we choose a semi-Lagrangian scheme that converts the common extrapolation into an implicit extrapolation.

The semi-Lagrangian scheme was initially presented by Sawyer (1963), further developed by Robert (1981), and refined by Turner et al. (2004). It is widely applied in trajectory models; for example, it is used in the HYSPLIT model from the National Oceanic and Atmospheric Administration (NOAA) (Stein et al., 2015). In a semi-Lagrangian scheme, we choose a pixel at time t0 + τ and traverse back to t0 with a time step of ∆t in order to see where it came from. As in the Lagrangian scheme, its source may not fall on an exact cell center, so its nearest 8 neighbors within a 6 km buffer zone (a two-cell buffer) are selected to interpolate that value. Since we start from the endpoint in the semi-Lagrangian scheme, the momentum of the tracked air parcel at the very beginning is unknown, so we need iterative steps to determine the final displacement vector from its source to the current endpoint:

α = ∑_{i=1}^{N} ∆t · u(t_i, x − α_i/2)   (4–8)

where ∆t is the time step of the iteration, within which air parcels are advected linearly, and u(t, x) is the motion vector at position x and time t. Since we assume the motion field is static, we have u(t, x) = u(t0, x). We found that the semi-Lagrangian scheme converges very quickly, in fewer than 3 iterations, on the motion field obtained from Eq. 4–3. Fig. 4-5 shows the track of an air parcel from the position marked with a blue star as it moves inward toward the eyewall, simulated by our semi-Lagrangian scheme. We test it on an idealized stationary cyclone because we would like to avoid the external factors of the TC's own motion and its interaction with surrounding weather systems in a real TC case. The idealized cyclone is set up based on a climatological parameterization scheme presented by Chavas et al. (2015). Each pixel in Fig. 4-5 is 20 km. The stationary cyclone has a circular eye with a radius of 20 km. We can clearly see that the air parcel comes from the outer bands and spirals into the eyewall.
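A minimal sketch of this backward (semi-Lagrangian) trajectory step follows, assuming a static (u, v) field expressed in pixels per time step and bilinear sampling in place of the 8-neighbor interpolation described above; the names and the simple fixed-point iteration are illustrative:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def sample(field, y, x):
    """Bilinearly sample a 2D field at a fractional (y, x) position."""
    return float(map_coordinates(field, [[y], [x]], order=1, mode="nearest")[0])

def backtrack(u, v, y0, x0, n_steps, n_iter=3):
    """Trace a parcel backward from (y0, x0) at t0+tau toward its source at t0.
    Each sub-step solves alpha = dt*u(x - alpha/2) by fixed-point iteration
    (cf. Eq. 4-8); dt is absorbed into u and v here."""
    y, x = y0, x0
    for _ in range(n_steps):
        ay = ax = 0.0
        for _ in range(n_iter):  # converges in fewer than 3 iterations per the text
            ay = sample(v, y - ay / 2.0, x - ax / 2.0)
            ax = sample(u, y - ay / 2.0, x - ax / 2.0)
        y, x = y - ay, x - ax
    return y, x  # approximate source position x - alpha
```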

4.2.6 Determine the Source/Sink Term and Extrapolation

The source/sink term Q in Eq. 4–1 represents the growth and decay of rainfall regions, which is a major source of poor nowcasting if ignored (Browning, 1982). In a Lagrangian or Eulerian extrapolation scheme, such growth and decay needs to be calculated over matched blocks between the last two consecutive images. But in a semi-Lagrangian scheme, there is no need to trace and move blocks; it is simpler to calculate the rate of change Q in each block over time. In this study, we calculate the average rate of change of the mean reflectivity in each 4×4 block during the entire tracking period. This rate is also applied to any interpolated source pixels located in an adjacent 4×4 block. With α and Q in hand, the forecast itself reduces to evaluating Eq. 4–1, as sketched below.
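The final extrapolation (Eq. 4–1) can then be sketched as a per-pixel resampling; this is a minimal illustration assuming 2D arrays for the reflectivity Z, the source/sink term Q, and the displacement components, all in pixel units (the function and argument names are hypothetical):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def nowcast(z0, q, ax, ay, tau):
    """Evaluate Eq. 4-1: Zhat(t0+tau, x) = Z(t0, x-alpha) - tau*Q(t0, x-alpha)."""
    ny, nx = z0.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    src = [yy - ay, xx - ax]                   # source points x - alpha
    z_src = map_coordinates(z0, src, order=1)  # Z sampled at the source points
    q_src = map_coordinates(q, src, order=1)   # Q sampled at the source points
    return z_src - tau * q_src
```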

4.3 Use Geographic Information System as an Integrated Platform

Traditionally, weather radar-based warning tools are implemented as independent tools, like the Python ARM Radar Toolkit from Argonne National Laboratory, Department of Energy (Helmus et al., 2013), Radx from the University Corporation for Atmospheric Research, Research Applications Laboratory (Dixon and Wiener, 1993), and the Warning Decision Support System – Integrated Information from the University of Oklahoma, Cooperative Institute for Mesoscale Meteorological Studies (Lakshmanan et al., 2006). Most systems come with only command-line interfaces and programming libraries, or with a very unstable and out-of-date user interface on a UNIX operating system (e.g. SOLO). Thus, in this study, we demonstrate the usefulness and convenience of using GIS as an environment for calculation, analysis and visualization. Using GIS as the integrated platform has the following advantages: (1) Although raw radar data represent a 3D sampling of the atmosphere, 2D radar products and algorithms are still widely used. Thus, a spatial database (e.g. PostGIS on PostgreSQL) allows better data management of 2D radar products in both vector (radial radar products) and raster (gridded radar products) formats, whereas radar data are normally organized in a plain filesystem as files and directories. (2) Recent improvements in temporal support in GIS (Christakos

et al., 2012) permit the sorting and storage of forecast layers along time, leading to better search performance and advanced visualization abilities (e.g. animation). (3) In 2008, after comparing individual weather radar tools, the GIS community developed comprehensive standards to display and manipulate spatial data over the internet and web browsers. This enhances the dissemination of scientific research results because it allows the public to acquire results without using professional software. (4) Last but not least, the Graphical User Interface (GUI) in GIS is easier to use and more stable than those of the weather radar tools developed to date, so researchers can instantly visualize and verify outputs from experiments. We find that adopting GIS in this research is feasible, especially thanks to recent technical improvements in the new ArcGIS Pro software. First, ArcGIS Pro uses the official Python binding of the newest netCDF C library, rather than the out-of-date implementation in ArcMap, so it is possible to read large climate and weather data in netCDF4 formats stored under the Climate and Forecast (CF) convention. Second, ArcGIS Pro is 64-bit software and thus removes the 2 GB memory limit when handling large datasets. We find that it is very difficult to manipulate dual-polarized Doppler radar data within 2 GB of memory due to their high spatio-temporal resolution. Furthermore, improved rendering ability permits fast display of radar images and animations for a better user experience. Although there are many other kinds of GIS software besides the ArcGIS series, we predominantly utilize ArcGIS as the platform because it is widely used in the geographic and geospatial technical communities, including teaching, research, and decision-making. According to the Environmental Systems Research Institute, the manufacturer of the ArcGIS series, ArcGIS held more than 60% of the GIS market and user community in 2015 (ESRI, 2017). But during our study, we find that GIS (including ArcGIS Pro and other open source platforms like QGIS) still has several drawbacks in processing radar data. For example, GIS is not natively compatible with radar products, especially those in a non-Cartesian coordinate system. Also, processing radar data in real time may

require distributed computation and complex design, whose implementation inside a GIS software architecture is a non-trivial effort. This was a limiting factor in several studies where a GIS was used to measure the spatial attributes of radar reflectivity in numerous landfalling TCs (Matyas, 2007, 2010a,b, 2013). However, bringing GIS into this scope lowers technical barriers to the utilization of high-resolution precipitation data in the geographic community and enables easier interactive visualization for presenting results to the public through a graphical user interface and an internet browser. Some efforts to overcome these drawbacks were presented in the previous two chapters.

4.4 Performance Evaluation

4.4.1 Data and Methods

The case we present is Hurricane Isabel, which made landfall over North Carolina in 2003 (Fig. 4-6). We choose the Isabel case for the following reasons. First, the TC has a large and clearly-defined circulation center that allows convective clouds in the eyewall to be analyzed at 3 km resolution. Second, adequate radar reflectivity data are available for 36 hours (from 0900 UTC 18 Sep 2003 to 2100 UTC 19 Sep 2003) while the storm was over land, leaving enough range to pick one nowcasting event. Third, Isabel restructured into a cold-cored low pressure system as it moved within radar range. Nearly half of Atlantic basin TCs experience this rapid change in structure (Hart and Evans, 2001), which causes rainfall regions to fragment, disperse from, and dissipate behind the storm center (Atallah et al., 2007; Zick and Matyas, 2016). Testing our model during Isabel's restructuring process allows us to evaluate its performance when a TC experiences changes in organization in both the tangential and radial directions. As detailed in Chapter 2, we create a time series of reflectivity mosaics from radar stations located within 600 km of the storm center. After quality control and preprocessing, data are gridded at a 3 × 3 × 0.5 km spatial resolution and a 10 minute temporal resolution. In grid cells where multiple reflectivity values are available, we performed several experiments and determined that using the highest value from those available

is the best solution, as we found that employing a weighted average algorithm leads to a low bias. This may be due to the fact that some stations have a slightly weaker signal (Fulton, 1998). Cells with missing values are filled using Cressman interpolation (Cressman, 1959). Traditionally, quantitative precipitation estimation (QPE) is based on the composite reflectivity using a Z-R relationship (Jorgensen and Willis, 1982). But recently, Ping-Wah et al. (2014) pointed out that using composite isothermal reflectivity below 0℃ instead of composite reflectivity in QPE improves the correlation in the Z-R relationship by avoiding the overly high reflectivity values generated by melting hydrometeors around the freezing level (Austin and Bemis, 1950). Verification using the North American Regional Reanalysis (NARR) dataset, which has a reasonable representation of TC position and size over the U.S. (Zick and Matyas, 2015), shows that the 0℃ isotherm appears between the altitudes of 4.0–4.5 km over the entire analytical domain during the study period. Thus a composite reflectivity is calculated using data below 4 km (see the sketch below). Further, the composite reflectivity values are filtered with a low-pass filter using a 5×5 moving window. The filtered images are used for tracking, but we use the original images to predict actual reflectivity values. The tracking stage is 0.5 hour (e.g., 1730–1800 UTC) and the forecasting period is 8 hours, at a 10 minute resolution through the period. After that we use the variational analysis technique to obtain the final field. We set β(x) to 1 over the whole domain, which means that data quality is assumed to be the same everywhere. Because the Level-II data are quality controlled before mosaicking is performed, errors in data due to problems such as instrument errors are not a concern. The reflectivity images are processed using ArcGIS Pro and its Spatial Analyst toolbox. Motion vectors and the numeric minimization in the variational analysis are programmed and calculated using Python with the Scientific Python (SciPy) package. All reflectivity images and motion vector fields are saved in a geodatabase for better management.
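The compositing and smoothing steps described above can be sketched as follows, assuming the Chapter 2 mosaic is available as a 3D array of reflectivity layers spaced 0.5 km apart in altitude; the stand-in random cube and array layout are illustrative only:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# stand-in 3D mosaic: reflectivity layers every 0.5 km in altitude (dBZ)
rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 55.0, size=(20, 128, 128))   # (n_levels, ny, nx)

below_4km = cube[:8]                      # 8 levels x 0.5 km = lowest 4 km (below 0C)
composite = below_4km.max(axis=0)         # composite reflectivity below 4 km
tracking_img = uniform_filter(composite, size=5)   # 5x5 low-pass image for tracking
# `tracking_img` drives the motion tracking; the unfiltered `composite`
# is used to predict actual reflectivity values
```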

4.4.2 Results

Fig. 4-7 shows the reflectivity change rate (source/sink term) Q determined during the tracking stage. Fig. 4-8 shows a zoomed view near the inner core area of the final motion field used for forecasting. It is clear that the field captures the rotation inside Isabel. Fig. 4-9 shows the correlation between the forecast results and the corresponding observations. We find that the semi-Lagrangian advection scheme can produce reliable forecasts out to about 7 hours before the correlation falls below a decorrelation threshold defined as

R = 1/e (Zawadzki, 1973). The reflectivity correlation drops linearly on a semi-logarithmic scale. This drop matches that in a previous study by Germann and Zawadzki (2002), who stated that a good advection scheme should be able to maintain a consistent accuracy rate over the forecasting period. In other words, with a consistent accuracy rate, there should be an exponential drop over time, which appears as a general linear relationship on a logarithmic scale. To evaluate the prediction success, we employ three standard scores used in operational radar-based nowcasting that are derived from contingency tables (Doswell III et al., 1990; Schaefer, 1990). These scores are calculated by point-by-point comparisons at the prediction time between the value observed by the radar and the predicted value. If both the measured value and the predicted value are larger than a threshold, the nowcast is considered successful. If the measured value is larger than the threshold while the predicted value is smaller, this is considered a failure. If the measured value is smaller than the threshold while the predicted value is larger, this constitutes a false alarm. We choose 24 dBZ as the threshold as it roughly equals a 1 mm hr⁻¹ rain rate (about 1 inch daily) according to Rosenfeld's tropical Z-R relationship (Rosenfeld et al., 1993), while 1 inch over a day is generally considered a threshold with which to identify TC-related rainfall (Groisman et al., 2012). The contingency table yields three indices calculated from these three criteria: probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI) (Polger et al., 1994). Their equations are as follows:

POD = a / (a + b)   (4–9)

FAR = c / (a + c)   (4–10)

CSI = a / (a + b + c)   (4–11)

where a is the total number of pixels with a successful nowcast, b is the number of failed nowcasts, and c is the number of false alarms. Fig. 4-10 depicts the three skill scores over the prediction period. A roughly linear trend over time is observed for the CSI and POD scores, which also indicates a consistent hit rate at each step during the forecasting period. We also see the FAR increase rapidly, which matches patterns in previous research (Germann and Zawadzki, 2002; Turner et al., 2004). Still, the semi-Lagrangian scheme outperforms linear extrapolation, as linear extrapolation can only produce a reliable prediction for about 0.5–1 hours. A sketch of the score computation follows.
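A minimal sketch of these scores, computed directly from a pair of observed and predicted reflectivity images (the array names and thresholding convention are illustrative):

```python
import numpy as np

def skill_scores(observed, predicted, threshold=24.0):
    """POD, FAR and CSI from a contingency table at a dBZ threshold
    (Eqs. 4-9 to 4-11): a = hits, b = misses, c = false alarms."""
    obs = observed > threshold
    pred = predicted > threshold
    a = np.sum(obs & pred)    # both exceed the threshold: hit
    b = np.sum(obs & ~pred)   # observed but not predicted: miss
    c = np.sum(~obs & pred)   # predicted but not observed: false alarm
    pod = a / (a + b) if (a + b) > 0 else np.nan
    far = c / (a + c) if (a + c) > 0 else np.nan
    csi = a / (a + b + c) if (a + b + c) > 0 else np.nan
    return pod, far, csi
```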

4.5 Summary and Future Study

In this paper, a methodology is presented to forecast a TC's rainfall distribution up to 6 hours into the future using a high-resolution Doppler radar reflectivity mosaic over a large analytical domain. The method contains three steps to produce a reliable forecast time series. First, a nested reflectivity motion vector retrieval method is designed. It uses the normalized 2D correlation between two reflectivity images to calculate motion vectors in a quad-tree pattern. Second, a variational analysis method creates a realistic motion field. This method minimizes a cost function with three constraints: the residual of the reflectivity conservation equation, mass conservation over the entire domain, and a smoothness penalty function. Finally, a semi-Lagrangian scheme is designed to account for three critical factors in nowcasting: mesoscale-sized circulations, the momentum of air parcels, and the growth and dissipation of precipitation areas. The results of the case study examining a landfalling hurricane with rapidly evolving rainfall regions show that an acceptable prediction can be extended from the 1–2 hours currently available in a single-radar application to about 6 hours in a multi-radar scenario.

Future research should extend this model from a deterministic model to a statistical model, which gives both the predicted value and its uncertainty at the forecasting stage. This idea was reported in previous research (Xu et al., 2005) from a purely statistical perspective, which showed that a quantitative measurement of uncertainty improves accuracy for a small-scale storm; a similar study on a large mesoscale system like a TC does not yet exist. Besides understanding uncertainty quantitatively for each forecast time, measuring uncertainty and error spatially is also a potential topic extending from this study.

Figure 4-1. The procedure of the tracking stage. It takes four consecutive reflectivity images over 30 minutes at a 10 minute interval. The TREC motion vector is based on a nested TREC calculation scheme.

Figure 4-2. Workflow of the prediction stage, which this study sets to 8 hours with a 10 minute time step.

Figure 4-3. Illustration of a pixel's displacement vector and its actual track.

Figure 4-4. The variational analysis corrects the interpolated field (green in left panel) to a realistic and symmetric field (red in right panel).

Figure 4-5. Simulated track of an air parcel as it moves in a spiral trajectory into the eyewall of an idealized stationary hurricane using the semi-Lagrangian scheme.

Figure 4-6. Radar reflectivity at Isabel's landfall at 1927 UTC 18 Sep 2003. It is a composite reflectivity below 4 km altitude (below the freezing level). Note that no clear boundaries between individual radar coverage circles are visible.

Figure 4-7. The source/sink term Q for the prediction period.

Figure 4-8. A zoomed view of motion vectors in the inner core area over a background of reflectivity Z. For visibility, vectors are shown at the 16×16 block size (48 km × 48 km).

Figure 4-9. Correlation, as defined in Eq. 4–2, between forecast and observation during the prediction period.

Figure 4-10. Skill scores during the prediction period.

CHAPTER 5
CONCLUSION

The work presented in this dissertation directly contributes to efforts by geographers to quantify the spatial patterns of tropical cyclone rainband structures. Previous research was limited to detailed case studies (Matyas, 2008, 2009), a small sample of storms (Matyas, 2006, 2007), a small subset of analysis times (Matyas, 2010a, 2013), or the examination of a single reflectivity threshold (Matyas, 2007, 2010a,b,c). None of these studies were able to consider the vertical dimension of TC rainbands. Matyas (2010c) discovered that few geographers employ radar data in their research; a likely reason is the need for high-powered computational resources and the incompatibility of the data and processing requirements with the traditional spatial analytical tools employed by geographers, mainly Geographic Information Systems (GIS). The new methods reported in this dissertation should allow geographers to obtain high-resolution 3D weather radar products over a large research domain and at a finer temporal scale. This dissertation also presented a cross-disciplinary study using weather radar data to better predict (nowcast) tropical cyclone (TC) precipitation distribution, with distributed computing techniques and geospatial analytical methods to enhance the computational capability and speed of the prediction. This dissertation improved our understanding of the evolution and predictability of TC rainfall from both methodological and technological perspectives. From a methodological perspective, this study built a better forecasting model that extends radar-based short-term forecasting in four ways: (1) it extended forecasting capability from local-area forecasting to synoptic-scale regional forecasting; (2) it improved the reliable forecasting period from less than 2 hours to about 6–8 hours; (3) it added new functionality to forecast rainfall caused by landfalling TCs; and (4) it elevated computational capability to utilize larger volumes of observational inputs from multiple radar stations. From a technological perspective, this dissertation improved data processing capabilities that empower researchers to process complex

weather and climate data in large volumes efficiently, with lower requirements for specialized software environments, professional meteorological training, and expensive hardware investments. Thus, this dissertation makes a substantial contribution with broad impacts, promoting research not only in the meteorological and geographic communities but also in all other communities that utilize weather data and geospatial analysis technology.

5.1 Innovations on Generating Radar Products in Geospatial Formats

Chapter 2 addressed the basic question of increasing the efficiency and accuracy of radar data processing, as required by the TC rainfall prediction model. In Chapter 2, this question was solved by creating a fast and scalable playback framework based on the MapReduce paradigm, taking Level-II data from multiple radars and combining them into a 3D mosaic domain. Generated radar products were stored in a 3D grid, which is like a multi-layer raster format, permitting scientists in geography and other related areas to easily apply domain-specific spatial analytical algorithms. Level-II radar data were mapped by time, elevation scan and gate, then transformed into Cartesian coordinates in parallel map functions in the form of Resilient Distributed Datasets (RDDs). RDDs carrying key-value pairs were grouped, mosaicked into the 3D domain and further interpolated with geospatial functions. This framework was evaluated and found to be capable of running on both low-configuration single workstations and high-configuration clusters. It demonstrated good compatibility with traditional HPC architectures and the job schedulers commonly available to the advanced research community. With a highly efficient, Open Geospatial Consortium-standard compatible, in-memory data exchange protocol between Apache Spark and ArcGIS Runtime, the system successfully brought GIS and geospatial functions into radar data processing, which should enhance collaboration between researchers inside and outside of the radar meteorology community.

5.2 Capabilities Enhancement of Geospatial Analytic on Radar Products

The basic question in Chapter 3 followed from Chapter 2: to develop a feasible method to perform spatial analysis on the large volume of radar products generated by the

playback system. This question was addressed by designing a scalable, arcpy-compatible, cross-platform library called “arc4nix”. Arc4nix utilized a client/server architecture to bridge geospatial functions between native operating systems and GIS-ready servers. On the client side, arcpy geoprocessing functions were converted to arc4nix dummy geospatial functions for server-side execution; on the server side, the real geospatial functions were restored from the dummy functions and executed, with results saved on storage accessible from both server and client sides and dummy results sent back to the client for follow-up executions. Arc4nix showed high compatibility with the original arcpy interfaces in three aspects: arcpy geoprocessing functions could be used directly in arc4nix with no modifications; map algebra for raster datasets written in native Python expressions was also supported; and in-memory objects like database cursors could be directly evaluated as code segments, which was the only instance where certain code modifications might be necessary. In conclusion, arc4nix enhanced the capability of GIS software to manipulate the large volumes of radar products generated by the playback framework. Its unique advantage of allowing a user to run their original scripts in both single-node and multi-node environments with minimal modification achieves the goal of analyzing radar products in-place in the distributed Linux environment where they were generated. The demonstrated case, which involved the calculation of spatial statistics on the radar-detected rainband structures of Hurricane Isabel, proved its applicability to radar products in large volumes. Arc4nix also showed a strong broader impact in that it can be adopted to run all ArcGIS-supported geospatial functions by communities outside geography and meteorology where a demand exists for high-performance geospatial computation. This enhancement can promote cross-disciplinary research using radar, weather and climate datasets.

5.3 Predictability Improvements on Tropical Cyclone Rainfall Using Weather Radar

In Chapter 4, the methods developed in the previous two chapters were employed along with a new algorithm to make short-term predictions of TC rainfall utilizing ground-based radar reflectivity data. One question explored in this chapter was whether utilizing large-scale observations could increase the predictability of a radar-based nowcasting model. The question was addressed by adopting multiple state-of-the-art nowcasting models that utilize a single radar and extending their capability to the regional scale by incorporating data from dozens of radars. Another important question was how to incorporate tangential motion so that TCs can be more accurately predicted. This question was solved by designing a method containing three steps to produce an improved forecast time series. First, a nested reflectivity motion vector retrieval method was developed. It uses the normalized correlation between two reflectivity images to calculate motion vectors in a top-down pattern. Second, a variational analysis method creates a realistic motion field. This method minimizes a cost function with three constraints: the residual of the reflectivity conservation equation, mass conservation over the entire domain, and a smoothing penalty function. Finally, a semi-Lagrangian scheme was designed to account for three factors in nowcasting: mesoscale-sized circulations, the momentum of air parcels, and the growth and dissipation of precipitation areas. It was found that this model achieved our goal of forecasting a TC's rainfall distribution reliably out to 6 hours with high-resolution Doppler radar reflectivity mosaic images over a large analytical domain, which extended the state of the art from the previous benchmark of 1–2 hours when using single-radar observations.

The work presented in Chapter 2 was our first effort to create an open-source, cross-platform, user-friendly project for large-scale radar processing. Following its publication (Tang and Matyas, 2016), new mosaicking methods based on this study are being

designed using a more abstract MapReduce paradigm and a spatio-temporal key, eliminating redundant data transfers in the distributed computing framework as well as additional interpolation steps. The new framework can utilize graphics cards, which greatly accelerate gridding computations. Preliminary tests show that product generation at the same spatio-temporal resolution as presented in Chapter 2 can be done on a single workstation rather than an HPC cluster. Meanwhile, future improvements to arc4nix include plans to improve its compatibility on Linux systems by utilizing the newest API provided in ArcGIS Server 10.5 to remove the requirement of code modifications when using database cursors. The TC rainfall prediction model presented in Chapter 4 is effectively a deterministic model. Future modifications include adopting statistical models to measure uncertainty quantitatively and spatially. We also plan to explore the incorporation of machine learning algorithms that use historical observations to calibrate forecast data.

REFERENCES

Agarwal, D. and Prasad, S. K. (2012). Lessons learnt from the development of GIS application on Azure cloud platform. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, pages 352–359. IEEE. Andrieu, H., Creutin, J., Delrieu, G., and Faure, D. (1997). Use of a weather radar for the hydrology of a mountainous area. Part I: Radar measurement interpretation. J. Hydrol., 193(1):1–25. Ansari, S., Del Greco, S., and Hankins, B. (2010). The weather and climate toolkit. In AGU Fall Meeting Abstracts, volume 1, page 06. Apache Spark (2016). Apache Spark is a fast and general engine for large-scale data processing. Apache Incubator (2013). Spark: Lightning-fast cluster computing. Atallah, E., Bosart, L. F., and Aiyyer, A. R. (2007). Precipitation distribution associated with landfalling tropical cyclones over the eastern United States. Mon. Wea. Rev., 135(6):2185–2206. Austin, P. M. (1987). Relation between measured radar reflectivity and surface rainfall. Mon. Weather Rev., 115(5):1053–1070. Austin, P. M. and Bemis, A. C. (1950). A quantitative study of the “bright band” in radar precipitation echoes. J. Meteor., 7(2):145–151. Barnolas, M., Atencia, A., Llasat, M., and Rigo, T. (2008). Characterization of a Mediterranean flash flood event using rain gauges, radar, GIS and lightning data. Adv. Geosci., 17:35–41. Bellon, A., Zawadzki, I., Kilambi, A., Lee, H. C., Lee, Y. H., and Lee, G. (2010). McGill algorithm for precipitation nowcasting by lagrangian extrapolation (MAPLE) applied to the South Korean radar network. Part I: Sensitivity studies of the variational echo tracking (VET) technique. Asia-Pac. J. Atmos. Sci., 46(3):369–381. Benjamin, S. G., Weygandt, S. S., Brown, J. M., Hu, M., Alexander, C. R., Smirnova, T. G., Olson, J. B., James, E. P., Dowell, D. C., Grell, G. A., et al. (2016). A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Wea. Rev., 144(4):1669–1694. Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517. Berenguer, M., Sempere-Torres, D., and Pegram, G. G. (2011). SBMcast–an ensemble nowcasting technique to assess the uncertainty in rainfall forecasts by Lagrangian extrapolation. J. Hydrol., 404(3):226–240.

Bluestein, H. B. and Hazen, D. S. (1989). Doppler-radar analysis of a tropical cyclone over land: Hurricane Alicia (1983) in Oklahoma. Mon. Wea. Rev., 117(11):2594–2611. Bluestein, H. B., Rauber, R. M., Burgess, D. W., Albrecht, B., Ellis, S. M., Richardson, Y. P., Jorgensen, D. P., Frasier, S. J., Chilson, P., Palmer, R. D., Yuter, S. E., Lee, W.-C., Dowell, D. C., Smith, P. L., Markowski, P. M., Friedrich, K., and Weckwerth, T. M. (2014). Radar in atmospheric sciences and related research: Current systems, emerging technology, and future needs. Bull. Amer. Meteor. Soc., 95(12):1850–1861. Bosler, P. A., Roesler, E. L., Taylor, M. A., and Mundt, M. R. (2016). Stride Search: a general algorithm for storm detection in high-resolution climate data. Geosci. Model Dev., 9(4):1383. Browning, K. A. (1982). Nowcasting. Academic Press, London. Bruen, M. (2000). Using radar information in hydrological modeling: COST 717 WG-1 activities. Phys. Chem. Earth Part B, 25(10):1305–1310. Cao, M. and Wei, J. (2004). Weather derivatives valuation and market price of weather risk. Journal of Futures Markets, 24(11):1065–1089. Carbone, R., Carpenter, M., and Burghart, C. (1985). Doppler radar sampling limitations in convective storms. J. Atmos. Oceanic Technol., 2(3):357–361. Carpenter, T., Sperfslage, J., Georgakakos, K., Sweeney, T., and Fread, D. (1999). National threshold runoff estimation utilizing GIS in support of operational flash flood warning systems. J. Hydrol., 224(1):21–44. Chandrasekar, V., Cho, Y.-G., Brunkow, D., and Jayasumana, A. (2005). Virtual CSU-CHILL radar: The VCHILL. J. Atmos. Oceanic Technol., 22(7):979–987. Chavas, D. R., Lin, N., and Emanuel, K. (2015). A model for the complete radial structure of the tropical cyclone wind field. Part I: Comparison with observed structure. J. Atmos. Sci., 72(9):3647–3662. Chen, F. and Dudhia, J. (2001). Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon. Weather Rev., 129(4):569–585. Christakos, G., Bogaert, P., and Serre, M. (2012). Temporal GIS: advanced functions for field-based applications. Springer Science & Business Media. Cintineo, J. L., Smith, T. M., Lakshmanan, V., Brooks, H. E., and Ortega, K. L. (2012). An objective high-resolution hail climatology of the contiguous United States. Wea. Forecasting, 27(5):1235–1248. Cluckie, I. D. and Collier, C. G. (1991). Hydrological applications of weather radar.

Cole, S. J. and Moore, R. J. (2009). Distributed hydrological modelling using weather radar in gauged and ungauged basins. Adv. Water Resour., 32(7):1107–1120. Cressman, G. P. (1959). An operational objective analysis system. Mon. Wea. Rev, 87(10):367–374. Creutin, J.-D. and Borga, M. (2003). Radar hydrology modifies the monitoring of flash-flood hazard. Hydrol. Processes, 17(7):1453–1456. Crockford, D. (2006). The application/json media type for JavaScript Object Notation (JSON). RFC 4627, RFC Editor. http://www.rfc-editor.org/rfc/rfc4627.txt. Croft, P. J. and Shulman, M. D. (1989). A five-year radar climatology of convective precipitation for New Jersey. Int. J. Climatol., 9(6):581–600. Crum, T. D. and Alberty, R. L. (1993). The WSR-88D and the WSR-88D operational support facility. Bull. Amer. Meteor. Soc., 74(9):1669–1687. Crum, T. D., Alberty, R. L., and Burgess, D. W. (1993). Recording, archiving, and using WSR-88D data. Bull. Amer. Meteor. Soc., 74(4):645–653. Czajkowski, J., Simmons, K., and Sutter, D. (2011). An analysis of coastal and inland fatalities in landfalling US hurricanes. Nat. Hazards, 59(3):1513–1531. Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113. Delrieu, G., Braud, I., Berne, A., Borga, M., Boudevillain, B., Fabry, F., Freer, J., Gaume, E., Nakakita, E., Seed, A., et al. (2009). Weather radar and hydrology. Dixon, M. and Wiener, G. (1993). TITAN: Thunderstorm identification, tracking, analysis, and nowcasting-A radar-based methodology. J. Atmos. Oceanic Technol., 10(6):785–797. Doswell III, C. A., Davies-Jones, R., and Keller, D. L. (1990). On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5(4):576–585. Doviak, R. J. and Zrnic, D. S. (2014). Doppler Radar & Weather Observations. Academic Press. Einav, L. and Levin, J. (2014). The data revolution and economic analysis. Innovation Policy and the Economy, 14(1):1–24. Esposito, A. and Loia, V. (2000). Integrating concurrency control and distributed data into workflow frameworks: an actor model perspective. In Systems, Man, and Cybernetics, 2000 IEEE International Conference on, volume 3, pages 2110–2114. IEEE. ESRI (2017). Independent report highlights Esri as leader in global GIS market. http://www.esri.com/esri-news/releases/15-1qtr/independent-report-highlights-esri-as-leader-in-global-gis-market. Accessed May 27, 2017.

Evangelidis, K., Ntouros, K., Makridis, S., and Papatheodorou, C. (2014). Geospatial services in the Cloud. Comput. Geosci., 63:116–122. Finkel, R. A. and Bentley, J. L. (1974). Quad trees: a data structure for retrieval on composite keys. Acta Informatica, 4(1):1–9. Franklin, J. L., Pasch, R. J., Avila, L. A., Beven, J. L., Lawrence, M. B., Stewart, S. R., and Blake, E. S. (2006). Atlantic hurricane season of 2004. Mon. Wea. Rev., 134(3):981–1025. Fulton, R. A. (1998). WSR-88D polar-to-HRAP mapping. Gabella, M. and Perona, G. (1998). Simulation of the orographic influence on weather radar using a geometric–optics approach. J. Atmos. Oceanic Technol., 15(6):1485–1494. Gamba, P., Dell Acqua, F., and Houshmand, B. (2002). SRTM data characterization in urban areas. International Archives of Photogrammetry Remote Sensing and Spatial Information Sciences, 34(3/B):55–58. Gao, J., Droegemeier, K. K., Gong, J., and Xu, Q. (2004a). A method for retrieving mean horizontal wind profiles from single-Doppler radar observations contaminated by aliasing. Mon. Wea. Rev., 132(6):1399–1409. Gao, J., Nuttall, C., Gilreath, C., et al. (2005). Multiple Doppler wind analysis and assimilation via 3DVAR using simulated observations of the planned CASA network and WSR-88D radars. In 32nd Conference on Radar Meteorology, Albuquerque, New Mexico. Amer. Meteor. Soc. Gao, J., Smith, T. M., Stensrud, D. J., Fu, C., Calhoun, K., Manross, K. L., Brogden, J., Lakshmanan, V., Wang, Y., Thomas, K. W., et al. (2013). A real-time weather-adaptive 3DVAR analysis system for severe weather detections and warnings. Wea. Forecasting, 28(3):727–745. Gao, J., Xue, M., Brewster, K., and Droegemeier, K. K. (2004b). A three-dimensional variational data analysis method with recursive filter for Doppler radars. J. Atmos. Oceanic Technol., 21(3):457–469. Germann, U. and Zawadzki, I. (2002). Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130(12):2859–2873. Gong, J., Yue, P., and Zhou, H. (2010). Geoprocessing in the Microsoft Cloud Computing platform-Azure. In Proceedings of the Joint Symposium of ISPRS Technical Commission IV & AutoCarto, page 6. Citeseer. Goodchild, M. F. (2000). GIS and transportation: status and challenges. GeoInformatica, 4(2):127–139.

Goodchild, M. F. and Haining, R. P. (2004). GIS and spatial data analysis: Converging perspectives. Papers in Regional Science, 83(1):363–385. Goodchild, M. F., Steyaert, L. T., and Parks, B. O. (1996). GIS and environmental modeling: progress and research issues. John Wiley & Sons. Gorelick, N. (2012). Google Earth Engine. In AGU Fall Meeting Abstracts, volume 1, page 04. Groisman, P. Y., Knight, R. W., and Karl, T. R. (2012). Changes in intense precipitation over the central United States. J. Hydrometeorol., 13(1):47–66. Haberlandt, U. (2007). Geostatistical interpolation of hourly precipitation from rain gauges and radar for a large-scale extreme rainfall event. J. Hydrol., 332(1):144–157. Hart, R. E. and Evans, J. L. (2001). A climatology of the extratropical transition of Atlantic tropical cyclones. Journal of Climate, 14(4):546–564. Healey, R., Dowers, S., Gittings, B., and Mineter, M. J. (1997). Parallel processing algorithms for GIS. CRC Press. Heistermann, M., Jacobi, S., and Pfaff, T. (2013). Technical note: An open source library for processing weather radar data (wradlib). Hydrol. Earth Syst. Sci., 17(2):863–871. Helmus, J., Collis, S., Johnson, K. L., North, K., Giangrande, S. E., and Jensen, M. (2013). The Python-ARM radar toolkit (Py-ART), an open source package for weather radar. In 93rd AMS Annual Meeting. Holleman, I. (2007). Bias adjustment and long-term verification of radar-based precipitation estimates. Meteorol. Appl., 14(2):195–203. Hsu, C.-H., Slagter, K. D., and Chung, Y.-C. (2015). Locality and loading aware virtual machine mapping techniques for optimizing communications in mapreduce applications. Future Gener Comput Syst., 53:43–54. Hu, H. (2014). An algorithm for converting weather radar data into GIS polygons and its application in severe weather warning systems. Int. J. Geogr. Inf. Sci., 28(9):1765–1780. Huang, F., Liu, D., Liu, P., Wang, S., Zeng, Y., Li, G., Yu, W., Wang, J., Zhao, L., and Pang, L. (2007). Research on cluster-based parallel GIS with the example of parallelization on GRASS GIS. In Grid and Cooperative Computing, 2007. GCC 2007. Sixth International Conference on, pages 642–649. IEEE. Huang, W. and Liang, X. (2010). Convective asymmetries associated with tropical cyclone landfall: β-plane simulations. Adv. Atmos. Sci., 27(4):795–806. Istok, M. J., Fresch, M., Smith, S., Jing, Z., Murnan, R., Ryzhkov, A., Krause, J., Jain, M., Ferree, J., Schlatter, P., et al. (2009). WSR-88D dual polarization initial operational capabilities. In Preprints, 25th Conf. on Interactive Information and Processing Systems

for Meteorology, Oceanography, and Hydrology, Phoenix, AZ, Amer. Meteor. Soc, volume 15. Jiang, H., Liu, C., and Zipser, E. J. (2011). A TRMM-based tropical cyclone cloud and precipitation feature database. J. Appl. Meteor. Climatol., 50(6):1255–1274. Johnson, D., Smith, M., Koren, V., and Finnerty, B. (1999). Comparing mean areal precipitation estimates from NEXRAD and rain gauge networks. Journal of Hydrologic Engineering, 4(2):117–124. Jones, D., Jones, N., Greer, J., and Nelson, J. (2015). A cloud-based MODFLOW service for aquifer management decision support. Comput. Geosci., 78:81–87. Jones, R. W. (1987). A simulation of hurricane landfall with a numerical model featuring latent heating by the resolvable scales. Monthly Weather Review, 115(10):2279–2297. Jorgensen, D. P. (1984). Mesoscale and convective-scale characteristics of mature hurricanes. Part II: Inner core structure of Hurricane Allen (1980). J. Atmos. Sci., 41(8):1287–1311. Jorgensen, D. P. and Willis, P. T. (1982). A Z-R relationship for hurricanes. J. Appl. Meteorol., 21(3):356–366. Kimball, S. K. (2008). Structure and evolution of rainfall in numerically simulated landfalling hurricanes. Mon. Wea. Rev., 136(10):3822–3847. Kleczek, M. A., Steeneveld, G.-J., and Holtslag, A. A. (2014). Evaluation of the weather research and forecasting mesoscale model for GABLS3: impact of boundary-layer schemes, boundary conditions and spin-up. Boundary Layer Meteorol., 152(2):213–243. Knight, D. B. and Davis, R. E. (2007). Climatology of tropical cyclone rainfall in the southeastern United States. Phys. Geogr., 28(2):126–147. Koch, S. E., Ferrier, B., Stoelinga, M. T., Szoke, E., Weiss, S. J., and Kain, J. S. (2005). The use of simulated radar reflectivity fields in the diagnosis of mesoscale phenomena from high-resolution WRF model forecasts. In Preprints, 11th Conf. on Mesoscale Processes, Albuquerque, NM, Amer. Meteor. Soc., J4J, volume 7. Kouwen, N. (1988). WATFLOOD: a micro-computer based flood forecasting system based on real-time weather radar. Can. Water Resour. J., 13(1):62–77. Krajewski, W. and Smith, J. (2002). Radar hydrology: rainfall estimation. Adv. Water Resour., 25(8):1387–1394. Krajewski, W. F., Ntelekos, A. A., and Goska, R. (2006). A GIS-based methodology for the assessment of weather radar beam blockage in mountainous regions: two examples from the US NEXRAD network. Comput. Geosci., 32(3):283–302.

Kuo, J.-T. and Orville, H. D. (1973). A radar climatology of summertime convective clouds in the Black Hills. J. Appl. Meteorol., 12(2):359–368. Lakshmanan, V., Fritz, A., Smith, T., Hondl, K., and Stumpf, G. J. (2007a). An automated technique to quality control radar reflectivity data. J. Appl. Meteor. Climatol., 46(3):288–305. Lakshmanan, V. and Humphrey, T. W. (2014). A MapReduce technique to mosaic continental-scale weather radar data in real-time. IEEE J. Sel. Topics Appl. Earth Observ. in Remote Sens, 7(2):721–732. Lakshmanan, V., Karstens, C., Krause, J., and Tang, L. (2014). Quality control of weather radar data using polarimetric variables. J. Atmos. Oceanic Technol., 31(6):1234–1249. Lakshmanan, V., Smith, T., Hondl, K., Stumpf, G. J., and Witt, A. (2006). A real-time, three-dimensional, rapidly updating, heterogeneous radar merger technique for reflectivity, velocity, and derived products. Wea. Forecasting, 21(5):802–823. Lakshmanan, V., Smith, T., Stumpf, G., and Hondl, K. (2007b). The warning decision support system-integrated information. Wea. Forecasting, 22(3):596–612. Lanter, D. P. and Essinger, R. (1991). User-centered graphical user interface design for GIS. National Center for Geographic Information & Analysis. Lee, W.-C. and Bell, M. M. (2007). Rapid intensification, eyewall contraction, and breakdown of Hurricane Charley (2004) near landfall. Geophys. Res. Lett., 34(2). Li, J., Davidson, N. E., Hess, G. D., and Mills, G. (1997). A high-resolution prediction study of two typhoons at landfall. Mon. Wea. Rev., 125(11):2856–2878. Li, L., Schmid, W., and Joss, J. (1995). Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteorol., 34(6):1286–1300. Li, W. and Wang, S. (2017). PolarGlobe: A web-wide virtual globe system for visualizing multidimensional, time-varying, big climate data. Int. J. Geogr. Inf. Sci., pages 1–21. Li, X. and Mecikalski, J. R. (2012). Impact of the dual-polarization Doppler radar data on two convective storms with a warm-rain radar forward operator. Mon. Wea. Rev., 140(7):2147–2167. Liang, Q., Feng, Y., Deng, W., Hu, S., Huang, Y., Zeng, Q., and Chen, Z. (2010). A composite approach of radar echo extrapolation based on TREC vectors in combination with model-predicted winds. Adv. Atmos. Sci., 27(5):1119–1130. Lin, Y.-L., Ensley, D. B., Chiao, S., and Huang, C.-Y. (2002). Orographic influences on rainfall and track deflection associated with the passage of a tropical cyclone. Mon. Wea. Rev., 130(12):2929–2950.

106 Mackey, B. P. and Krishnamurti, T. (2001). Ensemble forecast of a typhoon flood event. Wea. Forecasting, 16(4):399–415. Mandapaka, P. V., Germann, U., Panziera, L., and Hering, A. (2012). Can lagrangian extrapolation of radar fields be used for precipitation nowcasting over complex alpine orography? Wea. Forecasting, 27(1):28–49. Marr, B. (2015). Big Data: Using SMART big data, analytics and metrics to make better decisions and improve performance. John Wiley & Sons. Matejka, T. and Srivastava, R. C. (1991). An improved version of the extended velocity-azimuth display analysis of single-doppler radar data. J. Atmos. Oceanic Technol., 8(4):453–466. Matyas, C. (2006). Using GIS to assess the symmetry of tropical cyclone rain shields. Papers of the Applied Geography Conferences, 29:31–39. Matyas, C. (2007). Quantifying the shapes of U.S. landfalling tropical cyclone rain shields. Prof. Geogr., 59(2):158–172. Matyas, C. (2008). Shape measures of rain shields as indicators of changing environmental conditions in a landfalling tropical storm. Meteorol. Appl., 15(2):259–271. Matyas, C. J. (2009). A spatial analysis of radar reflectivity regions within Hurricane Charley (2004). J. Appl. Meteor., 48(1):130–142. Matyas, C. J. (2010a). Associations between the size of hurricane rain fields at landfall and their surrounding environments. Meteorol Atmos Phys, 106(3-4):135–148. Matyas, C. J. (2010b). A geospatial analysis of convective rainfall regions within tropical cyclones after landfall. International Journal of Applied Geospatial Research, 1(2):71– 91. Matyas, C. J. (2010c). Use of ground-based radar for climate-scale studies of weather and rainfall. Geogr. Comp., 4(9):1218–1237. Matyas, C. J. (2013). Processes influencing rain-field growth and decay after tropical cyclone landfall in the united states. J. Appl. Meteor. Climatol., 52(5):1085–1096. Mecklenburg, S., Joss, J., and Schmid, W. (2000). Improving the nowcasting of precipitation in an Alpine region with an enhanced radar echo tracking algorithm. J. Hydrol., 239(1):46–68. Michalakes, J., Dudhia, J., Gill, D., Henderson, T., Klemp, J., Skamarock, W., and Wang, W. (2004). The weather research and forecast model: software architecture and performance. In Proceedings of the 11th ECMWF Workshop on the Use of High Performance Computing In Meteorology, volume 25, page 29. Reading, UK.

107 Minciardi, R., Sacile, R., and Siccardi, F. (2003). Optimal planning of a weather radar network. J. Atmos. Oceanic Technol., 20(9):1251–1263. Mohr, C. G., Jay Miller, L., Vaughan, R. L., and Frank, H. W. (1986). The merger of mesoscale datasets into a common cartesian format for efficient and systematic analyses. J. Atmos. Oceanic Technol., 3(1):143–161. Montmerle, T., Caya, A., and Zawadzki, I. (2001). Simulation of a midlatitude convective storm initialized with bistatic Doppler radar data. Mon. Wea. Rev., 129(8):1949–1967. Nettleton, L., Daud, S., Neitzel, R., Burghart, C., Lee, W., and Hildebrand, P. (1993). SOLO: A program to peruse and edit radar data. In Preprints, 26th Conf. on Radar Meteorology, Norman, OK, Amer. Meteor. Soc, pages 338–339. Neykova, R. and Yoshida, N. (2014). Multiparty session actors. In International Conference on Coordination Languages and Models, pages 131–146. Springer. NOAA Radar Operation Center (2008). RPG SW BUILD 10.0 - includes reporting for sw 41 RDA software note 41/43. Office of the Fedefal Coordinator for Meteorological Servicea nd Supporting Research (2005). Federal Meteorological Handbook No. 1 Surface Weather Observations and Reports Surface Weather Observations and Reports, volume 1. U.S. Department of Commerce / National Oceanic and Atmospheric Administration. O’Looney, J. (1997). Beyond maps: GIS and decision making in local government. ESRI, Inc. Overeem, A., Buishand, T., and Holleman, I. (2009a). Extreme rainfall analysis and estimation of depth-duration-frequency curves using weather radar. Water Resour. Res., 45(10). Overeem, A., Buishand, T., Holleman, I., and Uijlenhoet, R. (2010). Extreme value modeling of areal rainfall from weather radar. Water Resour. Res., 46(9). Overeem, A., Holleman, I., and Buishand, A. (2009b). Derivation of a 10-year radar-based climatology of rainfall. J. Appl. Meteor., 48(7):1448–1463. Oye, D. and Case, M. (1995). REORDER: A program for gridding radar data. Installation and use manual for the UNIX version. Patel, N. N., Angiuli, E., Gamba, P., Gaughan, A., Lisini, G., Stevens, F. R., Tatem, A. J., and Trianni, G. (2015). Multitemporal settlement and population mapping from landsat using google earth engine. Int. J. Appl. Earth Obs. Geoinf., 35:199–208. Pierce, C., Ebert, E., Seed, A., Sleigh, M., Collier, C., Fox, N., Donaldson, N., Wilson, J., Roberts, R., and Mueller, C. (2004). The nowcasting of precipitation during Sydney 2000: an appraisal of the QPF algorithms. Wea. Forecasting, 19(1):7–21.

108 Pimm, S. L. (2008). Biodiversity: climate change or habitat loss—which will kill more species? Curr. Biol., 18(3):R117–R119. Ping-Wah, L., Wai-Kin, W., Ping, C., and Hon-Yin, Y. (2014). An overview of nowcasting development, applications, and services in the Hong Kong Observatory. J. Meteor. Res., 28(5):859–876. Polger, P. D., Goldsmith, B. S., Przywarty, R. C., and Bocchieri, J. R. (1994). warning performance based on the WSR-88D. Bull. Amer. Meteor. Soc., 75(2):203–214. Powell, M. D. (1982). The transition of the hurricane frederic boundary-layer wind field from the open gulf of mexico to landfall. Mon. Wea. Rev., 110(12):1912–1932. Rappaport, E. N. (2000). Loss of life in the United States associated with recent atlantic tropical cyclones. Bull. Amer. Meteor. Soc., 81(9):2065–2073. Rappaport, E. N. (2014). Fatalities in the United States from atlantic tropical cyclones: New data and interpretation. Bull. Amer. Meteor. Soc., 95(3):341–346. Rew, R. and Davis, G. (1990). NetCDF: an interface for scientific data access. IEEE Comput. Graph. Appl., 10(4):76–82. Rinehart, R. and Garvey, E. (1978). Three-dimensional storm motion detection by conventional weather radar. Nature, 273(5660):287–289. Robert, A. (1981). A stable numerical integration scheme for the primitive meteorological equations. Atmos. Ocean, 19(1):35–46. Romero, D. M., Galuba, W., Asur, S., and Huberman, B. A. (2011). Influence and passivity in social media. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 18–33. Springer. Rosenfeld, D., Wolff, D. B., and Atlas, D. (1993). General probability-matched relations between radar reflectivity and rain rate. J Appl Meteorol, 32(1):50–72. Sawyer, J. (1963). A semi-lagrangian method of solving the vorticity advection equation. Tellus, 15(4):336–342. Schaefer, J. T. (1990). The critical success index as an indicator of warning skill. Wea. Forecasting, 5(4):570–575. Shao, Y., Di, L., Gong, J., Zhao, P., et al. (2011). GIS in the cloud: implementing a web coverage service on amazon cloud computing platform. In Electrical Engineering and Control, pages 289–295. Springer. Sinha, A. K. (2006). Geoinformatics: data to knowledge, volume 397. Geological society of America.

109 Smith, M. and Pielke, R. (2001). Method and system for integrating weather information with enterprise planning systems. US Patent App. 09/883,340. Smith, T. M. and Lakshmanan, V. (2006). Utilizing Google Earth as a GIS platform for weather applications. In 22nd International Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology. Snyder, C. and Zhang, F. (2003). Assimilation of simulated Doppler radar observations with an ensemble kalman filter. Mon. Wea. Rev., 131(8):1663–1677. Staniforth, A. and Côté, J. (1991). Semi-lagrangian integration schemes for atmospheric models—a review. Mon. Wea. Rev., 119(9):2206–2223. Stein, A., Draxler, R. R., Rolph, G. D., Stunder, B. J., Cohen, M., and Ngan, F. (2015). NOAA’s HYSPLIT atmospheric transport and dispersion modeling system. Bull. Amer. Meteor. Soc., 96(12):2059–2077. Steiner, M., Smith, J. A., Burges, S. J., Alonso, C. V., and Darden, R. W. (1999). Effect of bias adjustment and rain gauge data quality control on radar rainfall estimation. Water Resources Research, 35(8):2487–2503. Steiniger, S. and Bocher, E. (2009). An overview on current free and open source desktop GIS developments. Int. J. Geogr. Inf. Sci., 23(10):1345–1370. Stoffel, A., Brabec, B., and Stoeckli, U. (2001). GIS applications at the Swiss Federal Institute for snow and avalanche research. In Proceedings of the 2001 ESRI International User Conference, San Diego. Sznaider, R. J., Chenevert, D. P., Hugg, R. L., Reece, C. F., and Block, J. H. (2004). GIS-based automated weather alert notification system. US Patent 6,753,784. Tang, J. and Matyas, C. J. (2016). Fast playback framework for analysis of ground-based Doppler radar observations using MapReduce technology. J. Atmos. Oceanic Technol., 33(4):621–634. Tiranti, D., Cremonini, R., Marco, F., Gaeta, A. R., and Barbero, S. (2014). The DEFENSE (debris Flows triggEred by storms–nowcasting system): An early warning system for torrential processes by radar storm tracking using a Geographic Information System (gis). Comput. Geosci., 70:96–109. Tong, M. and Xue, M. (2005). Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. Mon. Wea. Rev., 133(7):1789–1807. Tuleya, R. E. (1994). Tropical storm development and decay: Sensitivity to surface boundary conditions. Mon. Wea. Rev., 122(2):291–304.

110 Turner, B., Zawadzki, I., and Germann, U. (2004). Predictability of precipitation from continental radar images. Part III: Operational nowcasting implementation (MAPLE). J. Appl. Meteorol., 43(2):231–248. Tuttle, J. and Gall, R. (1999). A single-radar technique for estimating the winds in tropical cyclones. Bull. Amer. Meteor. Soc., 80(4):653–668. Tuttle, J. D. and Foote, G. B. (1990). Determination of the boundary layer airflow from a single doppler radar. J. Atmos. Oceanic Technol., 7(2):218–232. Ulbrich, C. W. (1983). Natural variations in the analytical form of the raindrop size distribution. J. Climate Appl. Meteor., 22(10):1764–1775. Vasiloff, S. V., Howard, K. W., Rabin, R. M., Brooks, H. E., Seo, D.-J., Zhang,J., Kitzmiller, D. H., Mullusky, M. G., Krajewski, W. F., Brandes, E. A., et al. (2007). Improving QPE and very short term QPF: An initiative for a community-wide integrated approach. Bull. Amer. Meteor. Soc., 88(12):1899–1911. Vatsavai, R. R., Ganguly, A., Chandola, V., Stefanidis, A., Klasky, S., and Shekhar, S. (2012). Spatiotemporal data mining in the era of big spatial data: algorithms and applications. In Proceedings of the 1st ACM SIGSPATIAL international workshop on analytics for big geospatial data, pages 1–10. ACM. Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., et al. (2013). Apache hadoop yarn: Yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing, page 5. ACM. Villarini, G., Smith, J. A., Baeck, M. L., Marchok, T., and Vecchi, G. A. (2011). Characterization of rainfall distribution and flooding associated with U.S. landfalling tropical cyclones: Analyses of Hurricanes Frances, Ivan, and Jeanne (2004). J. Geophys. Res., 116(D23). Vulpiani, G., Montopoli, M., Passeri, L. D., Gioia, A. G., Giordano, P., and Marzano, F. S. (2012). On the use of dual-polarized C-band radar for operational rainfall retrieval in mountainous areas. J. Appl. Meteor. Climatol., 51(2):405–425. Wahba, G. and Wendelberger, J. (1980). Some new mathematical methods for variational objective analysis using splines and cross validation. Mon. Wea. Rev., 108(8):1122–1143. Wang, G., Wong, W., Liu, L., and Wang, H. (2013). Application of multi-scale tracking radar echoes scheme in quantitative precipitation nowcasting. Adv. Atmos. Sci., 30(2):448–460. Wang, W., Barker, D., Bray, J., Bruyere, C., Duda, M., Dudhia, J., Gill, D., and Michalakes, J. (2007). User’s Guide for Advanced Research WRF (ARW) Modeling System Version 3.

111 Wang, Y., Wang, S., and Zhou, D. (2009). Retrieving and indexing spatial data in the cloud computing environment. In IEEE International Conference on Cloud Computing, pages 322–331. Springer. Wilson, J. W., Crook, N. A., Mueller, C. K., Sun, J., and Dixon, M. (1998). Nowcasting thunderstorms: A status report. Bull. Amer. Meteor. Soc., 79(10):2079–2099. Wilson, J. W. and Mueller, C. K. (1993). Nowcasts of thunderstorm initiation and evolution. Wea. Forecasting, 8(1):113–131. World Meteorological Organization. Nowcast (http://www.wmo.int/pages/prog/amp/pwsp/nowcasting.htm). Xu, K., Wikle, C. K., and Fox, N. I. (2005). A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. J. Am. Stat. Assoc., 100(472):1133– 1144. Yang, C., Huang, Q., Li, Z., Liu, K., and Hu, F. (2017a). Big Data and cloud computing: innovation opportunities and challenges. Int. J. Digital Earth, 10(1):13–53. Yang, C., Raskin, R., Goodchild, M., and Gahegan, M. (2010). Geospatial cyberinfrastructure: past, present and future. Comput. Environ. Urban Syst., 34(4):264– 277. Yang, C., Xu, Y., and Nebert, D. (2013). Redefining the possibility of digital earth and geosciences with spatial cloud computing. Int. J. Digital Earth, 6(4):297–312. Yang, C., Yu, M., Hu, F., Jiang, Y., and Li, Y. (2017b). Utilizing Cloud Computing to address big geospatial data challenges. Comput. Environ. Urban Syst., 61:120–128. Yu, B., Seed, A., Pu, L., and Malone, T. (2005). Integration of weather radar data into a raster GIS framework for improved flood estimation. Atmos. Sci. Lett., 6(1):66–70. Zahraei, A., Hsu, K.-l., Sorooshian, S., Gourley, J., Lakshmanan, V., Hong, Y., and Bellerby, T. (2012). Quantitative precipitation nowcasting: a Lagrangian pixel-based approach. Atmos. Res., 118:418–434. Zawadzki, I. (1973). Statistical properties of precipitation patterns. J. Appl. Meteorol., 12(3):459–472. Zhang, Y., Chen, M., Xia, W., Cui, Z., and Yang, H. (2006). Estimation of weather radar echo motion field and its application to precipitation nowcasting. Acta Meteorologica Sinica, 64(5):631–646. Zhou, Y. and Matyas, C. J. (2017). Spatial characteristics of storm-total rainfall swaths associated with tropical cyclones over the Eastern United States. Int. J. Climatol.

112 Zick, S. E. and Matyas, C. J. (2015). Tropical cyclones in the North American Regional Reanalysis: An assessment of spatial biases in location, intensity, and structure. J. Geophys. Res. Atoms., 120(5):1651–1669. Zick, S. E. and Matyas, C. J. (2016). A shape metric methodology for studying the evolving geometries of synoptic-scale precipitation patterns in tropical cyclones. Ann. Assoc. Am. Geogr., 106(6):1217–1235. Zikopoulos, P., Eaton, C., et al. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media.

BIOGRAPHICAL SKETCH
Jingyin Tang is a radar meteorologist and scientific programmer who studies radar meteorology, scientific computing, tropical meteorology, and physical geography in the Department of Geography at the University of Florida. Jingyin holds a Bachelor of Science and a Master of Science degree in environmental science from Fudan University in China. During his master's study, he investigated particle dispersion and developed high-performance numerical air quality models. At the University of Florida, Jingyin worked with Dr. Corene Matyas on radar meteorology and its application to tropical cyclone rainfall prediction. During this period, Jingyin also earned a concurrent Master of Science degree in computer science. Jingyin led collaborative research between the Department of Geography and the Department of Computer & Information Science & Engineering to develop new weather radar product generation algorithms. His research was presented at multiple national and international conferences, including the American Meteorological Society Annual Meeting, the AMS Conference on Hurricanes and Tropical Meteorology, the American Association of Geographers Annual Meeting, and the International Summit on Hurricanes and Climate Change. This research was recognized with the Microsoft Azure Research Award – Climate Data Initiative, the Intel Code Modernization Fellowship, and the Grinter Award. After graduation, Jingyin will work as a senior meteorological software engineer at The Weather Company, IBM, in Atlanta, Georgia.
