Computational Methods for Flood Forecasting

------

A Thesis Presented to

the Faculty of the Department of Computer Science

University of Houston

------

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

------

By

Christariny Hutapea

August 2016

Computational Methods for Flood Forecasting

Christariny Hutapea

APPROVED:

Dr. Christoph Eick, Chairman Department of Computer Science

Dr. Weidong Shi Department of Computer Science

Dr. Klaus Kaiser Department of Mathematics

Dean, College of Natural Sciences and Mathematics

Acknowledgements

I would like to extend my gratitude to Dr. Christoph Eick, my advisor, for his continuous support and guidance during this endeavor. His valuable expertise, timely feedback and great patience have made it possible for me to complete this thesis.

The unconditional love and prayers from my parents have also given me strength in this journey. Finally, this thesis is dedicated to my wonderful husband and daughter whose tremendous support and encouragement I am thankful for.

Computational Methods for Flood Forecasting

------

An Abstract of a Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston

------

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

------

By

Christariny Hutapea

August 2016

Abstract

According to the World Meteorological Organization (WMO), flooding is one of the most hazardous natural disasters, affecting millions of people globally every year. Over the years, many studies have been conducted with the objective of reducing the impact of flooding on people's lives, the environment, and the economy. This thesis surveys computational methods for flood management, covering flood forecasting, flood warning and monitoring, and flood-response management. Most existing flood-forecasting models employ simulation techniques that operate on complex physics and mathematical equations representing the dynamics of the atmosphere and of water flow. Moreover, there are web-based and mobile applications that collect flood-related data from sensors and serve as flood monitoring and warning systems.

Furthermore, the thesis investigates water-level forecasting techniques relying on a regression approach. The investigated forecasting techniques are applied to and evaluated on Harris County Flood Warning System (HCFWS) datasets. The purpose of the case study is to generate alternative water-level forecasting models using existing statistical forecasting techniques, in contrast to existing simulation approaches. We investigated several forecasting approaches, including Linear Regression, Vector Autoregressive (VAR), and Autoregressive Integrated Moving Average (ARIMA) models. We applied these approaches to different forecasting scenarios, including predicting water levels at a particular location in Harris County and in the Addicks Reservoir watershed. We compared each model's performance using two statistics: Root Mean Square Error (RMSE) and the Coefficient of Determination, also known as R². The experiments showed mixed results for different scenarios, but, in general, linear regression produced better results than the other approaches. However, the RMSE for some forecasting scenarios was quite high, with values greater than 0.5 feet; consequently, there is a need to look for better approaches.

Contents

1 Introduction
2 Background and Survey on Recent Research on Flood-Related Problems
  2.1 Background
  2.2 Survey on Recent Research on Flood-Related Problems
    2.2.1 Coastal Flood Risk Reduction Program
    2.2.2 Houston-Galveston Area Protection System (H-GAPS)
    2.2.3 Research by Department of Homeland Security
3 Alleviating Flood Problems with Computational Method and Information Technology
  3.1 Flood Forecasting
    3.1.1 Storm Surge and Wave Computation Models
    3.1.2 Another Storm Surge Modelling and Wave Computation Models
  3.2 Flood Warning/Monitoring Systems
    3.2.1 Harris County Flood Warning System
    3.2.2 The Rice University Flood Alert System
    3.2.3 Central Texas HUB
    3.2.4 Flood Warning/Monitoring Systems on Mobile Devices
  3.3 Flood Response Management
4 Case Study: Water-Level Forecasting in Harris County
  4.1 Objective
  4.2 Datasets
    4.2.1 Data Entities
    4.2.2 Extraction Tool: Selenium
    4.2.3 Data Extraction
    4.2.4 Data Pre-Processing
  4.3 Exploratory Data Analysis
    4.3.1 Exploratory Data Analysis for Particular Locations
    4.3.2 Exploratory Data Analysis for the Addicks Reservoir Watershed
    4.3.3 Global Exploratory Data Analysis
    4.3.4 Data Cleaning Procedure
  4.4 Forecasting Scenarios and Evaluation Measures
    4.4.1 Forecasting Scenarios
    4.4.2 Forecasting Methods
    4.4.3 Forecasting Models
    4.4.4 Evaluation Measures
  4.5 Experimental Method
    4.5.1 Location-Specific Forecasting Scenario
    4.5.2 The Addicks Reservoir Watershed Water-Level Forecasting
    4.5.3 Global Water-Level Forecasting
  4.6 Experimental Results and Discussions
    4.6.1 Location-Specific Water-Level Forecasting Scenario
    4.6.2 Addicks Reservoir Watershed Water-Level Forecasting Scenario
    4.6.3 Global Water-Level Forecasting Scenario
5 Conclusion

List of Figures

Figure 1. Operational SLOSH Model Coverage
Figure 2. Sample Output of the SLOSH Model for Hurricane Ike Displayed in SDP
Figure 3. Harris County Flood Warning Website Main Page
Figure 4. View of Elevation Data
Figure 5. View of Rainfall Amount Data
Figure 6. Harris County Flood Warning System High Level Architecture
Figure 7. The Rice University Flood Alert System Website Main Page
Figure 8. Central Texas HUB Main Page
Figure 9. Rivercast Application
Figure 10. FloodWatch Application
Figure 11. King County Flood Warning Application
Figure 12. Sensor Information Screenshot
Figure 13. Excerpt of Daily Rainfall Amount Data
Figure 14. Excerpt of Water-Level Data
Figure 15. Rainfall Amount Data Extraction Script in Selenium IDE
Figure 16. Data Extraction Process Flow
Figure 17. Data Pre-Processing
Figure 18. Box Plot
Figure 19. Water Levels of Location-Specific Gage Stations
Figure 20. Water Levels and Rainfall Amounts of Location-Specific Gage Stations
Figure 21. Primary in the Addicks Reservoir Watershed
Figure 22. Locations of Observed Gage Stations
Figure 23. Box Plots of Rainfall Amounts for Individual Gage Stations in the Addicks Reservoir Watershed
Figure 24. Histogram of Rainfall Amounts in the Addicks Reservoir Watershed
Figure 25. Rainfall Amount Fluctuations in the Addicks Reservoir Watershed
Figure 26. Box Plots of Absolute Water-Level in the Addicks Reservoir Watershed
Figure 27. Water-Level Fluctuations from Individual Gage Stations in the Addicks Reservoir Watershed
Figure 28. Box Plots of Relative Water Levels in the Addicks Reservoir Watershed
Figure 29. Histogram of Harris County Rainfall Amounts in year 2014 and 2015
Figure 30. Box Plots of Water Levels in Harris County
Figure 31. Water-Level Fluctuations for Individual Gage Stations in the Addicks Reservoir Watershed After Approximation
Figure 32. Pseudocode for Parameter-Selection Experiment
Figure 33. RMSE of Location-Specific Forecasting Scenario
Figure 34. R² of Location-Specific Forecasting Scenario
Figure 35. Comparison of Addicks Watershed Forecasting Scenario at Each Location for Training Sets
Figure 36. Comparison of Addicks Watershed Forecasting Scenario at Each Location for Testing Sets
Figure 37. Comparison of Addicks Watershed Forecasting Scenario in Training and Testing Phase
Figure 38. Global Forecasting Scenario
Figure 39. Linear Models Performance in the Global and the Addicks Watershed Forecasting Scenario
Figure 40. Performance of All Forecasting Models in Different Scenarios

List of Tables

Table 1. Computer Models Used by the NOAA
Table 2. Number of Gage Stations by Agencies
Table 3. Attributes of Sensor Information Data
Table 4. Attributes of Rainfall Amount Data
Table 5. Attributes of Water-Level Data
Table 6. Selenium Compatibility
Table 7. Different Components in Selenium
Table 8. Observed Gage in the Addicks Reservoir Watershed
Table 9. Data Summary
Table 10. Quartiles of Rainfall Amount Data
Table 11. Distribution of Inconsistent Water-Level Data
Table 12. Inconsistent Water-Level Data Resolution
Table 13. Summary of Scenario
Table 14. Notation Description
Table 15. ARIMA Parameter-Selection Experiments for Model 4
Table 16. Selected ARIMA (p,d,q) parameters
Table 17. VAR Parameters Selection Experiment for Model 7
Table 18. Coefficients of the Linear Models in the Global and the Addicks Reservoir Watershed Scenario

List of Abbreviations

ADCIRC ADvanced CIRCulation

AHPS Advanced Hydrologic Prediction Service

ARIMA Autoregressive Integrated Moving Average

CHC Coastal Hazards Center of Excellence

CoE Centers of Excellence

CRC Coastal Resilience Center of Excellence

CRWR Center for Research in Water Resources

CTH Central Texas HUB

DHS Department of Homeland Security

ESRI Environmental Systems Research Institute

ESTOFS Extratropical Surge and Tide Operational Forecast System

ETSS Extratropical Storm Surge Model

FAS3 Rice University Flood Alert System

FEMA Federal Emergency Management Agency

GIS Geographic Information System

GUI Graphical User Interface

HCFCD Harris County Flood Control District

HCFWS Harris County Flood Warning System

H-GAPS Houston-Galveston Area Protection System

HSC Houston Ship Channel

IDE Integrated Development Environment

LCRA Lower Colorado River Authority

LSCNRA Lone Star Coastal National Recreation Area

MPI Message Passing Interface

NASA National Aeronautics and Space Administration

NCEP National Centers for Environmental Prediction

NHC National Hurricane Center

NOAA National Oceanic and Atmospheric Administration

NWS National Weather Service

P-Surge Probabilistic Hurricane Storm Surge

RFC River Forecast Center

RMSE Root Mean Square Error

SDP SLOSH Display Program

SLOSH Sea, Lake, and Overland Surges from Hurricanes

SSPEED Severe Storm Prediction, Education and Evacuation from Disasters

SWAN Simulating Waves Nearshore

TMC Texas Medical Center

USACE U.S. Army Corps of Engineers

USGS United States Geological Survey

VAR Vector Autoregressive

WMO World Meteorological Organization

WWIII WAVEWATCH III

1 Introduction

According to the World Meteorological Organization (WMO), flooding is one of the most hazardous natural disasters in the world. Over the years, flooding has caused numerous losses of life, disrupted people's lives, and created major economic damage. Flooding is caused not only by heavy rainfall but also by other factors, such as extreme weather conditions (storms or tsunamis producing storm surge in coastal areas), earthquakes that cause the failure of water-control structures, and rivers overflowing their banks because the flow exceeds the channel volume. The permeability of the soil and rock in river drainage basins also plays a big role in the occurrence of a flood. Impermeability, which may be caused by previous rainfall or prolonged heating, does not allow precipitation to infiltrate the soil, so water runs off to the nearest river. This causes an increase in the river's discharge and triggers flooding.

Scientists have conducted many studies on flood forecasting for both coastal and inland areas. A coastal flood is often triggered by storm surge and large waves. It can also be caused by sea-level rise due to climate change or by tsunamis. Storm surge and large waves, usually associated with major hurricanes and storms, are well known for their ability to cause fatalities and devastating property loss. Storm surges are considered extremely dangerous, and their massive strength can destroy life and property along the coast. For example, Hurricane Katrina, one of the five deadliest hurricanes in U.S. history, caused more than 1,200 fatalities and a total property loss of US$ 108 billion in 2005.

Because of the great loss of lives and economic damage, the National Oceanic and Atmospheric Administration (NOAA), a U.S. federal agency, continually monitors coastal flood conditions 24 hours a day, 7 days a week. The National Weather Service (NWS), an agency within NOAA, issues coastal flood watches and warnings to alert residents about flood conditions in coastal areas. If there is a chance of a flood along the coast, a coastal flood watch will be issued. If a flood is imminent or occurring in these areas, a coastal flood warning will be issued. Flood watches and warnings rely on storm-surge forecasting products, which have been studied for a long time and continuously improved over numerous iterations by various scientists. These products run simulations based on complex physics and mathematical equations representing the dynamics of the atmosphere and of water flow. Some of these flood-forecasting products will be discussed in this thesis.

In addition to storm-surge forecasting products, NOAA also utilizes flood-forecasting tools in inland areas. The NWS River Forecast Centers (RFCs) collect, process, and provide forecasts and information about water resources for 13 major river basins throughout the U.S. These are the Alaska-Pacific RFC, Arkansas-Red Basin RFC, California-Nevada RFC, Colorado Basin RFC, Lower Mississippi RFC, Middle Atlantic RFC, Missouri Basin RFC, North Central RFC, Northeast RFC, Northwest RFC, Ohio RFC, Southeast RFC, and West Gulf RFC. The river forecast is produced by a computer model called the Advanced Hydrologic Prediction Service (AHPS). It uses forecasted and observed precipitation levels generated by a combination of gages, radars, and satellites, and employs hydrologic models to estimate the downstream water movement and predict the flow and the water level at a given location.

However, the RFC only provides forecast information at a limited number of locations. For example, in Harris County, Texas, the RFC only provides forecast information for ten locations. This creates a need to provide river forecasts for more locations in Harris County. Moreover, the existing forecasting model, AHPS, requires many hydrological and hydrodynamic parameters. In this thesis, we investigate alternative approaches to predict future water levels by using data-mining and machine-learning techniques; the obtained models are less complex and require fewer parameters. Because they use fewer parameters, such water-level forecast models are easier to learn, particularly if the amount of training data is limited.

In this thesis, we also conducted a case study to forecast water levels in rivers and streams in Harris County, Texas, using past water levels and rainfall amounts as input. We employed existing statistical forecasting methods and generated forecast models from training data collected in 2014 and 2015. We used the models to predict water levels for the training data and for “unseen” testing data collected in the first five months of 2016. Moreover, we applied these forecasting models to different forecasting scenarios, including water-level forecasts at particular locations within Harris County, in the Addicks Reservoir watershed, and for the entire Harris County.
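As a rough illustration of the regression idea behind the case study, the following minimal sketch fits a linear model that predicts the next water level from the previous water level and the previous rainfall amount, trains it on the first part of a series, and scores it on the held-out remainder with RMSE and R². The series is made up for illustration and is not HCFWS data; the one-step lag, the split point, and all function names are assumptions of this sketch and are not the models evaluated in Chapter 4.

```python
import math

def fit_least_squares(rows, targets):
    """Fit linear coefficients by ordinary least squares (normal equations)."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def predict(coef, rows):
    """Apply the fitted coefficients to each feature row."""
    return [sum(c * x for c, x in zip(coef, row)) for row in rows]

def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def r_squared(actual, predicted):
    """Coefficient of Determination."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative series (NOT real gage data): the water level follows
# an exact linear rule so that the fit can be checked by eye.
rain = [0.0, 1.2, 0.0, 0.4, 2.0, 0.0, 0.0, 0.7, 0.1, 0.0, 1.5, 0.3]
levels = [2.0]
for t in range(1, len(rain)):
    levels.append(0.5 + 0.8 * levels[t - 1] + 0.3 * rain[t - 1])

# Features for step t: intercept, previous water level, previous rainfall.
rows = [[1.0, levels[t - 1], rain[t - 1]] for t in range(1, len(levels))]
targets = levels[1:]

split = 8  # assumed split: earlier steps for training, later steps for testing
coef = fit_least_squares(rows[:split], targets[:split])
test_pred = predict(coef, rows[split:])
test_rmse = rmse(targets[split:], test_pred)
test_r2 = r_squared(targets[split:], test_pred)
print(coef, test_rmse, test_r2)
```

Because the synthetic series is exactly linear, the fit recovers the generating coefficients and the test error is essentially zero; real gage data would of course not behave this way.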

This thesis is organized as follows: Chapter 2 presents survey results on recent research on flood-related problems. This includes research conducted nationwide and in Texas. Chapter 3 surveys various computational methods and systems used to alleviate flood problems, centering on flood forecasting, flood warning, and flood-response management. Chapter 4 presents a case study, which explains different forecasting scenarios and models and discusses the experimental results. Finally, Chapter 5 draws conclusions and discusses future work.

2 Background and Survey on Recent Research on Flood-Related Problems

2.1 Background

Flooding is a natural disaster that can happen anywhere in the world. We may have heard about it on the news or witnessed it ourselves. Flooding brings misery to its victims and causes loss of lives, property damage, and major disruption to daily life. Flood water enters people's houses, disrupts water and electricity supplies, blocks roads, and prevents people from carrying out their daily tasks or jobs.

According to the World Meteorological Organization (WMO), flooding is one of the most hazardous disasters, affecting millions of people globally every year. Between 1970 and 2012, 44% of natural disasters worldwide were floods, which makes flooding the most frequent type of disaster, ahead of mass movement (subsidences, rockfalls, avalanches, and landslides), storms, droughts, extreme temperatures, and wildfires. Flooding is responsible for 14% of the fatalities and 33% of the economic damage resulting from natural disasters. This is equal to 272,251 fatalities and US$ 788.9 billion of economic damage [1]. In the United States alone, flooding is associated with approximately 75% of all presidential disaster declarations. According to the National Weather Service, “Flooding is a coast-to-coast threat to the United States and its territories in all months of the year” [2].
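To put these shares in perspective, the overall totals they imply can be back-calculated from the flood figures quoted above. The short sketch below does this arithmetic; since the percentages are rounded, the results are only approximate, and the function name is chosen here for illustration.

```python
def implied_total(part, share):
    """Return the total implied by a part and its fractional share of the total."""
    return part / share

# Flood figures quoted in the text for 1970-2012 [1].
flood_fatalities = 272_251
flood_damage_billion_usd = 788.9

# Flooding accounts for 14% of fatalities and 33% of economic damage.
total_fatalities = implied_total(flood_fatalities, 0.14)
total_damage_billion_usd = implied_total(flood_damage_billion_usd, 0.33)

print(f"Implied fatalities from all natural disasters: {total_fatalities:,.0f}")
print(f"Implied economic damage (US$ billion): {total_damage_billion_usd:,.1f}")
```

Under these rounded shares, natural disasters as a whole would account for roughly 1.9 million fatalities and about US$ 2.4 trillion in economic damage over the same period.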

A flood happens when water covers land that is normally dry. It typically occurs (1) when rain continues to fall over an extended period; (2) when heavy rainfall occurs over a short period of time; (3) when an ice or debris jam causes a river to overflow onto the surrounding area; or (4) as a result of a water-control structure failure, such as a dam break [2]. The FLOODsite project¹ categorizes flooding into five types: flash flooding, coastal flooding, river flooding, urban flooding, and ponding flooding [3].

¹ FLOODsite is a flood risk-management project initiated by the European Commission.

1. Flash Flooding

Flash flooding occurs when high-intensity rain falls in a short time, generally less than six hours, on saturated soil or on dry soil with poor absorption ability. In areas with steep slopes, rain water travels downhill at high speed, inundating low-lying areas and their surroundings in a very short time. Failure of a water-control structure, such as a levee or a dam, can also cause flash flooding [2].

Because of its nature and the high speed of water movement, a flash flood is very dangerous. Although the water is able to transport trees, cars, and rocks [3], people tend to underestimate the danger of a flash flood. According to a report published by the NWS, flash flooding caused 31 fatalities and 16 injuries throughout the country in 2014 and ranked first in property and crop damage, at a cost of US$ 2.5 billion [4]. In 2015, flash flooding caused 129 fatalities and 42 injuries and again ranked first in damage, at a cost of US$ 2.1 billion [5]. According to both reports, the damage caused by flash flooding surpassed that of other natural hazards, such as hail, tornadoes, hurricanes, and drought.

2. Coastal Flooding

Coastal floods happen when sea water covers normally dry land. This type of flood can be triggered by various causes, such as storm surges, large waves, a rise in sea level due to climate change, or tsunamis [6]. The NOAA defines storm surge as an “abnormal rise in water level, over and above the regular astronomical tide, caused by a severe storm such as a tropical cyclone or a nor'easter” [7]. Large waves can be caused by a local wind or by a swell² resulting from distant storms. Storm surge and large waves are usually associated with major hurricanes and storms and are well known for their ability to cause fatalities and devastating property loss. Storm surge is considered extremely dangerous because it can flood a large coastal area and bring severe damage. Its massive strength is able to destroy life and property along the coast, and homes and business establishments in coastal areas may be completely demolished. As an example, Hurricane Katrina in 2005 was the costliest natural disaster and one of the five deadliest hurricanes in the history of the United States [8]. At least 1,200 people were reported to have died because of the hurricane, making it the deadliest U.S. hurricane since the 1900 Great Galveston Hurricane and the 1928 Okeechobee Hurricane. Hurricane Katrina caused severe devastation along the Gulf coast from central Florida to Texas, mainly due to storm surge and levee failures.

² A swell is a series of mechanical waves that propagates along the interface between water and air.

A coastal flood can also be caused by sea-level rise due to climate change. The increase in global temperatures causes the oceans to warm, expand, and take up more space in their basins. This leads to a rise in water levels. It also causes more ice to melt, which adds more water to the oceans. Another cause of coastal floods is tsunamis. A tsunami is a series of waves propagating through an ocean, caused by a sudden displacement of a large volume of water due to earthquakes, volcanic eruptions, or other underwater explosions.

3. River Flooding

Rainfall over an extended period often causes major rivers to overflow. Besides prolonged rainfall, dam failures, rapid snowmelt, and ice jams also cause river flooding. When a dike or a dam breaks, the speed of the water flow is comparable to that of a flash flood.

However, river flooding commonly occurs relatively slowly, especially in a large river. When rain falls for a long period of time, the water level in a river rises gradually as it receives water from smaller rivers. Because of the slow rise, local agencies and officials have time to plan evacuations of people living around the river and to minimize flood damage [3].

4. Urban Flooding

A flood that occurs in an urban area is characterized by a lack of drainage. In urban areas, people develop drainage systems to collect rain water because little surface area is available to take it up. In high-intensity rainfall, these systems may not be able to collect, store, and drain the rain water, which causes urban flooding. Casualties are usually far fewer than for other flood types; however, an urban flood can disturb daily life in the city [3].

5. Ponding Flooding

A ponding flood occurs when rain water is diverted into a relatively flat area that has no outlet or is not suitable for drainage. There is more rain water than the ground can absorb, or drainage systems are lacking. Since the rain water is not absorbed by the ground, puddles or ponds develop. Ponding flooding is similar to urban flooding, but it happens in rural areas [9].

2.2 Survey on Recent Research on Flood-Related Problems

Over the years, research has been conducted to address flood-related problems. This research is mainly funded by the U.S. federal government or by several philanthropists.

2.2.1 Coastal Flood Risk Reduction Program

The Coastal Flood-Risk Reduction Program [10] is an ongoing research and education program conducted by a partnership between the U.S. and the Netherlands. The partnership comprises six educational institutions, among them Texas A&M University at Galveston, Texas A&M University at College Station (Department of Landscape Architecture and Urban Planning), Jackson State University (Department of Civil and Environmental Engineering), the University of Houston (C.T. Bauer College of Business), and the Delft University of Technology (Department of Hydraulic Engineering). It is a five-year, US$ 3.6 million program (2015-2020) funded by the Office of International Science and Engineering of the National Science Foundation through Partnerships for International Research and Education [11].

The objective of the program is to address the following issues [10]:

- Underlying characteristics of physical flood-risk;

- Causes of human-communities and built-environment vulnerability to flooding;

- Most effective mitigation techniques (structural and non-structural) to reduce the flooding impact.

The research will focus on the interaction of three domains (Physical Environment, Built Environment, and Socioeconomics) to address the three issues above.

2.2.2 Houston-Galveston Area Protection System (H-GAPS)

The Houston-Galveston Area Protection System (H-GAPS) [12] is a three-year (2014-2017) program for hurricane-surge reduction in the Houston-Galveston area. It is conducted by the Severe Storm Prediction, Education, and Evacuation from Disasters (SSPEED) Center, a university-based research group led by Rice University. According to [13], the US$ 3.1 million research is funded by the Houston Endowment. The participants are faculty members and researchers from Rice University, Texas A&M University, Texas A&M University-Galveston, the University of Houston, the University of Texas-Arlington, and the University of Texas at Austin. It also involves a faculty member from Louisiana State University and a partnership with several environmental consulting companies.

The main objective of the research is to investigate and develop potential surge-reduction scenarios for the Houston-Galveston region. The Galveston Bay region is vulnerable to storm surges because of the mild slope of the coastal shelf and its flat, low-lying areas. This vulnerability adds to the potential damage caused by the winds and rainfall associated with hurricanes.

The research was motivated by the large damage caused by Hurricane Ike in 2008. The hurricane generated nearly US$ 29.5 billion in damages [14]. Hurricane Ike's center was located just east of Galveston Island. Although Hurricane Ike was only categorized as a Category 2 storm, its large wind field and relatively slow forward motion generated a significant storm surge that flooded inland areas from Galveston Bay to Grand Isle, Louisiana. Ike's highest storm surge, more than 17 feet above sea level, occurred in Chambers County. The storm surge nearly reached Interstate Highway 10 (IH-10), located about 20 miles inland. Moreover, the storm surge level was about 13 feet on the west side of Galveston Bay and in the Houston Ship Channel (HSC).

Today, the population around the Galveston Bay area is over 1.6 million people, and over 2.6 million people are predicted to live within the Hurricane Evacuation Zone over the next two decades [15]. Furthermore, there are also many ports and shipping channels in the area. Two of them are the Port of Houston, the second largest port in the U.S., and the Houston Ship Channel (HSC), the largest petrochemical complex in the U.S. Hundreds of facilities along the HSC need protection from flooding caused by storm surges. With such a large population and such important facilities, future hurricane storm surges have the potential to cause extensive damage to the Galveston Bay area. The SSPEED Center has developed several hypothetical scenarios simulating different landfall locations for Hurricane Ike in order to understand differences in the storm-surge levels generated along the coastline of Galveston.

Currently, this research is in its third phase, which is intended to develop more comprehensive surge-reduction scenarios for the industrial complex along the HSC as well as for the communities located to the south of the HSC Gate, such as the west side of Galveston Bay and the City of Galveston. The research also considers the participation of the low-lying areas of Chambers, Galveston, Brazoria, and Matagorda Counties in a novel ecosystem-services exchange program, with the purpose of solidifying and preserving the storm-surge risk-reduction services and other ecosystem services provided by the natural areas along the Texas coast.

In order to achieve the objectives, this research employed the ADvanced CIRCulation (ADCIRC) and Simulating WAves Nearshore (SWAN) models to conduct the storm-surge modelling. ADCIRC is a model for solving the equations of motion for a moving fluid on a rotating earth. It was built by faculty members at the University of North Carolina at Chapel Hill and the University of Notre Dame. SWAN is a model for simulating wind-generated waves during a hurricane. It was built by the Delft University of Technology (TU-Delft). Section 3.1.2 discusses both models in further detail.

During the first year of phase 3, the research analyzed variations of hurricanes that could potentially hit the Houston-Galveston area. The ADCIRC model is used to better understand how hurricanes would affect the production of storm surge along the Galveston coastline, as well as within Galveston Bay and up to the City of Houston along the HSC. The maximum water level in a storm surge is affected by a combination of water levels generated by various factors: (1) a drop in atmospheric pressure, (2) the storm's forward movement, (3) the wind set-up, (4) the wave set-up, (5) waves, and (6) the daily fluctuation of tides. In addition, a storm surge is affected by the characteristics of the hurricane, such as the angle of approach, the wind speed, the maximum wind radius, and the storm's forward speed. Moreover, the magnitude of a storm surge is also affected by underwater topography. These complex processes are modelled by the SWAN and ADCIRC models. Both models are also used for evaluation purposes to determine the effectiveness of the proposed storm-surge reduction scenarios.

As a result, the research proposed the following initial regional storm-surge reduction strategies in the first year of phase 3:

1. Navigation Gates “Upper-Bay”, “Mid-Bay”, “Lower-Bay”

These three navigation gates will be located across the upper, middle, and lower portions of the HSC.

2. Coastal Barriers

This includes the raising of existing roadways along Galveston Island's FM-3005 and the Bolivar Peninsula's SH-87 to a height of 15 feet.

3. In-Bay Barriers

This includes the extension and raising of existing dredged containment berms to 25 feet and the raising of the existing Texas City Levee to a uniform height of 25 feet.

4. The creation of a Galveston Levee, which will be located around the eastern part of Galveston Island. The levee will be connected to the existing Galveston Seawall.

The second and third years of this research will focus on evaluating the above strategies and on their refinement and optimization.

2.2.3 Research by Department of Homeland Security

The Homeland Security Act of 2002 granted the Department of Homeland Security (DHS) the authority to create university-based Centers of Excellence (CoEs). The Centers are authorized by the U.S. Congress and chosen by the Department's Science and Technology (S&T) Directorate through a competitive selection process. The S&T Directorate's Office of University Programs sponsors the CoEs in conducting ground-breaking research to address homeland security challenges. Each CoE focuses on a unique homeland security need and performs research and development activities to provide critical tools, technologies, training, and expertise to the homeland security community. The DHS has funded 17 Centers of Excellence across the country [16]; to date, two of them have been dedicated to the study of coastal flooding: the Coastal Hazards Center of Excellence (CHC) and the Coastal Resilience Center of Excellence (CRC).

2.2.3.1 Coastal Hazards Center of Excellence (CHC)

The Coastal Hazards Center of Excellence (CHC) [17] was established in 2008 in response to Hurricane Katrina and operated until 2015. It was co-led by the University of North Carolina at Chapel Hill and Jackson State University in Mississippi. At that time, the CHC was the only CoE solely dedicated to the study of natural disasters. The key accomplishments of the CHC are:

1. The development of 39 new courses and seven new concentrations in hazards-related studies, varying from coastal engineering to social sciences, at 14 colleges and universities.

2. The advancement of state-of-the-art capabilities in storm surge/flood forecasting, disaster management decision support, hazard mitigation planning, disaster recovery planning, and coastal engineering.

3. The 2010 and 2012 DHS Science & Technology Impact Awards for the Advanced Circulation (ADCIRC) storm surge/flood forecasts that improved preparedness and response during Hurricanes Gustav, Ike, and Irene.

4. The 2012 Special Achievement in GIS Award from the Environmental Systems Research Institute for the Disaster Response Intelligent System (DRIS).

There are two key research products of CHC which are beneficial in the field of study:

1. Advanced Circulation (ADCIRC)

The Advanced Circulation (ADCIRC) model is an advanced (rain, wind, hydrological, storm surge, wave) computer model that forecasts where, when, and to what extent flooding will inundate a coastal community. The model results were used by the U.S. Coast Guard to aid storm-related decision making, such as choosing deployment locations and maintaining continuity of operations, during Hurricanes Irene, Isaac, and Sandy. The model is also used by the Federal Emergency Management Agency (FEMA) to update the National Flood Insurance Program coastal inundation maps from New England to Texas and to re-evaluate coastal evacuation routes. We will discuss this computational model in section 3.1.2.1.

2. Disaster Response Intelligent System (DRIS)

The Disaster Response Intelligent System (DRIS) is a GIS-based decision support system for disaster response and recovery. It combines multiple tools and analyses, such as hurricane models, plume models, and live traffic, weather, and incident feeds. The tool was deployed by state and local emergency managers for many disaster response operations, including those of the two Mississippi tornado events in 2010 and the Mississippi River flood in May 2011. It is currently used by several counties, the Mississippi Emergency Management Agency, the Mississippi National Guard, a local fire department, and a private sector firm.

2.2.3.2 Coastal Resilience Center of Excellence (CRC)

In 2015, the DHS S&T Directorate selected the University of North Carolina at Chapel Hill as the lead institution of the Coastal Resilience Center of Excellence (CRC) [18]. The CRC is an expansion of the former CHC, which was discussed earlier. The S&T Directorate will provide the CRC with a US$ 20 million grant over five years, including US$ 3.4 million for its first operating year. The CRC will collaborate with multiple U.S. academic institutions, local, state, and federal agencies, and the private sector to conduct research and education programs that directly address key challenges associated with growing coastal vulnerability.

According to [18], some examples of CRC research and education programs include:

- Development of more accurate storm-surge models;

- Timely delivery of storm-surge predictions prior to storm land-fall;

- Assisting FEMA and state and local governments in the development of better coastal-hazard predictions and pre-disaster plans;

- Improving understanding of what motivates individual households to implement risk-reduction measures and of which measures they employ;

- Improving the capability of risk-communication to multiple audiences;

- Educating students that will become hazard researchers and practitioners;

- Development of certification and degree programs in minority-serving educational institutions.

3 Alleviating Flood Problems with Computational Methods and Information Technology

Over time, computational methods and information technology have been used to support flood management and prevention in various ways. They serve as tools for flood forecasting, flood warning, and flood preparedness and response management.

According to the American Meteorological Society, flood forecasting is “the use of real-time precipitation and data in rainfall-runoff and streamflow routing models to forecast flow rates and water levels for periods ranging from a few hours to days ahead, depending on the size of the watershed or river basin” [19]. Several computational models for flood forecasting have been proposed. They consist of complex physics and mathematical equations representing the dynamics of the atmosphere and of water flow. The weather conditions are simulated with past and/or current data, typically from various sources, in enormous quantity and often within a short timeframe.

Flood warning is the activity of using flood forecast products to decide whether a warning or alert should be issued to the general public or whether previous warnings should be withdrawn.

When a flood warning is issued, people, including emergency personnel, become involved in various response or recovery operations. These activities include, but are not limited to, evacuation and search and rescue operations. Street flooding sometimes makes traffic navigation a challenge for emergency service providers. Road blocks or delays can be a matter of life or death, because saving seconds or minutes sometimes makes the difference in life-saving activities. A computational tool that serves as a traffic navigation decision support system has been proposed to help address this challenge.

This chapter gives a general overview of the available computational models and tools used in each aspect of flood management. It focuses more on the technological aspects and the use of the models and tools and less on the theoretical challenges in creating the models.

3.1 Flood Forecasting

In the U.S., the National Oceanic and Atmospheric Administration (NOAA) is a federal agency within the United States Department of Commerce that focuses on the study of atmospheric and ocean conditions. Its services range from daily weather forecasts, severe storm warnings, and climate monitoring to fisheries management, coastal restoration, and support for marine commerce [20]. We will discuss the computational models used by the NOAA and its sub-agencies to produce storm surge and wave forecasts. In addition, we will discuss other widely used forecast models developed within academic institutions.

3.1.1 Storm Surge and Wave Computation Models

The NOAA defines storm surge as “abnormal rise in water level, over and above the regular astronomical tide, caused by a severe storm such as a tropical cyclone or nor'easter” [7].

A tropical³ or extratropical⁴ storm can cause a storm surge and poses a danger to life and property along the coastline. In order to reduce and better manage such damage, scientists have been conducting research to predict storm surges and ocean waves. As part of their work, they developed a number of computer models which are used to simulate weather conditions during a storm and to predict storm surges. These models are computational tools which depend on state-of-the-art high performance computer platforms so that computation results are obtained in an acceptable timeframe.

3 A tropical storm forms over warm waters and is driven by heat transfer from the higher temperatures of the ocean to the lower temperatures at higher altitudes. Tropical storms can become extratropical if they lose tropical characteristics [7].

4 An extratropical storm results from the temperature contrast between warm and cold air masses as they move against each other horizontally [7].

The NOAA combines several computational models for storm surge modelling. For tropical storms, they utilize the SLOSH and the P-Surge; and for extratropical storms, they utilize the ETSS and the ESTOFS. For wave modelling, they utilize the WAVEWATCH III.

Table 1. Computer Models Used by the NOAA

                        Tropical Storms     Extratropical Storms
Storm Surge Modelling   SLOSH, P-Surge      ETSS, ESTOFS
Wave Modelling          WAVEWATCH III (both storm types)

3.1.1.1 Sea, Lake, and Overland Surges from Hurricanes (SLOSH)

The SLOSH [21] is a computational model developed by the NWS, an agency of the NOAA, for coastal inundation risk assessment and storm surge prediction. The model is used by the National Hurricane Center (NHC), a sub-organization of the NOAA, for the exclusive benefit of the NWS, the U.S. Army Corps of Engineers (USACE), and emergency management personnel [22]. It predicts the storm surge heights produced by past, hypothetical, or predicted hurricanes. It considers the storm's atmospheric pressure, size, forward speed, track, and winds. The model consists of a set of physics equations applied to a specific locale’s shoreline, taking into account the unique bay and river configurations, water depths, bridges, roads, and other physical features.

At the moment, SLOSH is applied to the entire U.S. Atlantic and Gulf of Mexico coastlines, extending to Hawaii, Puerto Rico, the Virgin Islands, and the Bahamas. As seen in Figure 1, the operational SLOSH model coverage is subdivided into 32 regions or basins. Each SLOSH basin is a geographical region with known values of topography and bathymetry (underwater topography).

When a hurricane is threatening, the NHC runs SLOSH using the hurricane track details provided by the hurricane forecasters. The input parameters are the hurricane’s attributes, such as pressure, radius of maximum winds, location, direction, and speed. The hurricane track determines to which basin the SLOSH model will be applied. The outputs of the model are made available to forecasters and emergency managers by means of a File Transfer Protocol (FTP) site. They include the maximum storm surge heights in Geographic Information System (GIS) format and an animation of the storm surge [23]. This output is a useful estimate of the potential storm surge and flooding for a given hurricane category, forward speed, and direction. It is displayed by an external application called the SLOSH Display Program (SDP), which we will discuss next.

Figure 1. Operational SLOSH Model Coverage [24].

3.1.1.2 SLOSH Display Program (SDP)

The SLOSH Display Program (SDP) [21] is software which displays the SLOSH model simulation results in a graphical format. The target users of the SDP are trained emergency managers, FEMA personnel, and NWS forecasters. It was developed to help emergency managers visualize storm surge vulnerability, to assist users in managing evacuations, and to educate decision makers. Figure 2 shows an example of the graphical output displayed in the SDP. It is the forecast of the storm surge in the Galveston Bay area caused by Hurricane Ike in 2008. It shows color-coded storm-surge heights in the Galveston Bay area in feet above ground level.

Figure 2. Sample Output of the SLOSH Model for Hurricane Ike Displayed in SDP [25].

3.1.1.3 Probabilistic Hurricane Storm Surge (P-Surge)

The storm-surge forecast produced by the SLOSH model is highly dependent on the accuracy of the hurricane forecast. However, accurate hurricane forecasting remains a very difficult task. Even 12 to 24 hours before the hurricane reaches the coast, the predicted landfall location can still be in error by tens of miles. Such an error can make a deterministic storm-surge forecast incorrect. To make the forecasting results less sensitive to these errors, forecasters use the Probabilistic Hurricane Storm-Surge (P-Surge) forecast [26].

P-Surge was developed to calculate the likelihood of a storm-surge occurrence in a specific location and in an indicated forecast period. The storm-surge values used in the calculation are based on the result produced by the SLOSH model.
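
The idea of a probabilistic surge forecast can be illustrated with a small sketch. It assumes, hypothetically, that a deterministic surge model has already been run for several perturbed versions of the forecast track and that each run carries a weight reflecting how likely that track is. The function name, the weights, and the surge heights are illustrative assumptions and are not taken from P-Surge itself.

```python
from typing import Sequence


def exceedance_probability(
    surge_heights_ft: Sequence[float],
    weights: Sequence[float],
    threshold_ft: float,
) -> float:
    """Weighted fraction of ensemble members whose surge exceeds the threshold."""
    if len(surge_heights_ft) != len(weights):
        raise ValueError("each ensemble member needs exactly one weight")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    exceeding = sum(
        w for h, w in zip(surge_heights_ft, weights) if h > threshold_ft
    )
    return exceeding / total


# Hypothetical surge heights (feet) at one location for five perturbed tracks.
heights = [2.0, 4.5, 6.0, 3.0, 8.0]
# Hypothetical track weights; the central track is assumed most likely.
track_weights = [0.1, 0.2, 0.4, 0.2, 0.1]

# Probability that the surge at this location exceeds 5 feet.
p_over_5ft = exceedance_probability(heights, track_weights, threshold_ft=5.0)
```

In this example, two of the five hypothetical runs exceed 5 feet and together carry half of the total weight, so the estimated likelihood is about 0.5.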

3.1.1.4 Extratropical Storm Surge Model (ETSS)

The ETSS is a model used by the NWS to predict storm surges accompanying extratropical storms, commonly known as Northeasters or Nor’easters. The model was developed by the Meteorological Development Laboratory as a modification of the SLOSH model, which is used to predict storm surges caused by tropical storms.

There are two major characteristics which differentiate an extratropical storm from a tropical storm: (1) an extratropical storm has greater time and length scales, which means it lasts for a longer time than a tropical storm; and (2) an extratropical storm tends to move along the seashore, while tropical storms tend to have a cross-shore movement [27]. Due to their more complicated characteristics, the SLOSH model cannot be directly applied to extratropical storms.

Instead of the simplified parametric wind fields used in the tropical case, the ETSS uses higher-resolution Global Forecast System⁵ wind and pressure fields as input parameters [28].

3.1.1.5 Extratropical Surge and Tide Operational Forecast System (ESTOFS)

The ESTOFS [29] is an extratropical storm surge model operated by the NWS in addition to the ETSS. It was established through a collaboration between the Coast Survey Development Laboratory (CSDL) of the National Ocean Service (NOS) and the Environmental Modeling Center (EMC) of the National Centers for Environmental Prediction (NCEP). Compared to the ETSS, the ESTOFS provides additional capabilities: it forecasts surges together with astronomical tides, and it models at a finer resolution (2.5 km versus 5 km for the ETSS). These additional capabilities enable the ESTOFS to deliver more accurate results to the local weather offices, which results in more effective preparation for and response to an extratropical storm surge.

The ESTOFS employs the ADvanced CIRCulation model (ADCIRC) as its hydrodynamic model. The ADCIRC will be discussed in section 3.1.2.1. The ESTOFS is also designed to provide surge-with-tides input to the WAVEWATCH III model.

3.1.1.6 WAVEWATCH III (WWIII)

The WAVEWATCH III (WWIII) [30] is a forecast system that predicts wind-generated ocean waves. WWIII is a third-generation wave model developed at the NOAA/NCEP. Its predecessors, WAVEWATCH and WAVEWATCH II, were developed at the Delft University of Technology and the NASA Goddard Space Flight Center, respectively. The key points that differentiate the WWIII from its predecessors are the governing equations, the model structure, the numerical methods, and the physical parameterizations. Since version 3.14, the WWIII has been evolving from a wave model into a wave modelling framework, which makes it easier to develop additional physical and numerical approaches to wave modelling. The latest version of the WWIII is version 4.18, which was released on 19 March 2014. The WWIII is written in ANSI standard FORTRAN 90 and can optionally be compiled for parallel computers using OpenMP or the Message Passing Interface (MPI).

5 Global Forecast System is a global numerical weather prediction computer model run by NWS.

The WWIII has been used operationally at the NOAA/NCEP since 1999. At the moment, the operational wave model covers the Atlantic, Pacific, and Indian Oceans, the regional northeast and northwest Atlantic Ocean, the U.S. East Coast, the northeast Pacific Ocean, the Alaskan Waters, the Australia-Indonesia Ocean, the Gulf of Mexico, Key West, the U.S. West Coast, and Hawaii. The WWIII requires several inputs, such as wind fields, hurricane wind fields (for the regional hurricane wave model), ice concentration, sea surface temperatures, boundary data for regional models, and bathymetric and obstruction grids. The bathymetric grids represent the underwater topography, and the obstruction grids represent the topographic features which block wave energy propagation. The output of the operational model is available in graphical and binary formats.

3.1.2 Other Storm Surge and Wave Computation Models

This section gives a brief overview of two computer models which were developed within academic institutions.

3.1.2.1 Advanced Circulation (ADCIRC)

ADCIRC [31] is a system of computer programs developed by Dr. Rick Luettich at the University of North Carolina at Chapel Hill and Dr. Joannes Westerink at the University of Notre Dame. ADCIRC stands for A (Parallel) Advanced Circulation Model for Oceanic, Coastal and Estuarine Waters. The software solves the equations of motion for a moving fluid on a rotating earth. The equations have been formulated using the traditional hydrostatic pressure and Boussinesq approximations and have been discretized in space using the finite element method and in time using the finite difference method.

Typical ADCIRC applications include: (1) modelling tidally driven and wind-driven circulation in coastal waters, (2) forecasting hurricane storm surge and flooding, (3) dredging feasibility and material disposal studies, (4) larval transport studies, (5) riverine modelling of currents and water levels, and (6) baroclinic coastal modelling from lab scale to field scale. In addition, ADCIRC offers the following capabilities:

- Running as two dimensional depth integrated (2DDI) or three dimensional (3D) model

- Running using Cartesian or spherical coordinate system

- Running on single processor computers and parallel computers using MPI

- Producing ASCII and NetCDF output formats

ADCIRC runs on many operating systems with a working FORTRAN compiler, such as large commercial UNIX systems (IBM, Sun, etc.), Linux and FreeBSD clusters, and personal workstations running Windows or Mac OS X. At the time of writing, the latest version of ADCIRC is version 51.52, which was released on 31 March 2015.

In order to run the program, the user needs to prepare input files which describe bathymetry, topography, boundary information, tidal characteristics, nodal attributes (often based on land use data), river inflow, meteorological forcing input, wave radiation stress forcing, and others depending on the geographical area of interest. ADCIRC provides templates for these input files. However, they can also be prepared using external software, such as the Surface-water Modeling System (SMS) from Aquaveo, LLC⁶ or the Blue Kenue package from the National Research Council Canada⁷. ADCIRC produces output files containing time-varying and spatially varying water surface elevation, water velocity, wind velocity, and atmospheric pressure. These output files are directly readable by the SMS software. With the use of additional utilities or scripts, they are also readable by many GIS or visualization packages, such as ArcGIS, Tecplot, ParaView, Google Earth, and gnuplot. These utilities and scripts are developed by various users and are available on the ADCIRC website.

6 http://www.aquaveo.com

The ADCIRC source code is made freely available for academic, government, and other not-for-profit research purposes. Commercial use is also possible under terms of use negotiated with its major software authors.

3.1.2.2 Simulating Waves Nearshore (SWAN)

Simulating Waves Nearshore (SWAN) [32] is a computer program which computes random, short-crested wind-generated waves in coastal regions and inland waters. It was developed by Delft University of Technology and was first released in 1998. Since then, it has been widely used as a tool for off-shore and near-shore wave prediction. Delft University of Technology continues to improve the software’s capabilities, features, and physics with the support of the U.S. Office of Naval Research and the Dutch Ministry of Public Works [33]. The current version of SWAN is 41.01.

7 http://www.nrc-cnrc.gc.ca/eng/solutions/advisory/blue_kenue_index.html

The SWAN model is a third-generation spectral wave prediction model based on an Eulerian formulation of the discrete spectral balance of action density that accounts for refractive propagation over arbitrary bathymetry and current fields. Unlike their predecessors, third-generation wave models contain state-of-the-art formulations of the processes of wave generation, wave dissipation, and wave-wave interactions in phase-averaged models [34]. Some of SWAN’s capabilities are:

- Computation on a regular grid, a curvilinear grid, or a triangular mesh in a Cartesian or spherical coordinate system

- Nested runs with input files from SWAN, WWIII or WAM⁸

- Running on single processor computer or parallel computers using MPI or OpenMP

The SWAN software can be used freely under the terms of the GNU General Public License. It runs on computers with a FORTRAN compiler and Perl.

3.2 Flood Warning/Monitoring Systems

Flood warning is the activity of using flood forecast products to decide whether a warning or alert should be issued to the general public or whether previous warnings should be withdrawn. A flood warning system relies on a flood monitoring system, which observes real-time rainfall amount, precipitation, streamflow, and water-level data collected by sensors placed in strategic areas. The sensors transmit valuable data during heavy rainfall, storms, and hurricanes. Some of them also measure wind speed and direction, barometric pressure, air temperature, road temperature, and humidity. We will discuss three widely used flood warning systems which serve inland Texas areas: the Harris County Flood Warning System, the Rice University and Texas Medical Center Flood Alert System, and the Central Texas Hub.

3.2.1 Harris County Flood Warning System

Harris County Flood Warning System (HCFWS) [35] provides information to government officials and the public about dangerous weather conditions in Harris County and its surrounding areas on a real-time basis⁹. The system is intended to provide the collected information to the public in a user-friendly format. In addition, the Harris County Flood Control District (HCFCD) and Harris County’s Office of Homeland Security and Emergency Management use the information to inform residents of imminent and current flood conditions. The NWS also uses the information to assist in issuing flood watches and warnings. The website address is www.harriscountyfws.org.

8 Wave Modelling (WAM) is a third generation wave model, developed by the Wave Model Development and Implementation Group.

HCFWS is owned and maintained by the Harris County Flood Control District (HCFCD). The district was created by the Texas Legislature in 1937 after devastating floods hit the region in 1929 and 1935. The district’s jurisdictional boundaries are the same as those of Harris County, a community that includes the City of Houston. This special purpose district is responsible for forming and implementing flood-damage reduction plans and maintaining the drainage system infrastructure. It is operated daily by a staff of approximately 380 people and depends heavily on the private sector. According to [36], the district “obtains virtually all engineering design work for capital projects or maintenance repairs through consulting contracts”, and “assigns all construction work through the public bidding process”. Nearly 100% of its routine maintenance work is performed by private companies.

HCFWS provides interactive maps of Harris County and its surrounding areas, which are available in both desktop and mobile versions. The maps are overlaid with information such as channel flood status and rainfall amounts. Figure 3 displays the HCFWS website’s main page, which consists of two frames. The left frame is for navigation and the right frame displays the Harris County map overlaid with the information requested by the user. In this figure, the map is overlaid with channel status information represented by green rectangles indicating the water levels at various locations. Using the website, users are able to perform the following tasks:

- View and navigate (zoom in/out, pan, etc.) a map of Harris County and its surrounding areas, watershed areas and boundaries

- Overlay the map with rainfall amount or channel status information

- View past data for a specified time window, for example, 1 hour, 24 hours, 7 days, etc.

- Search for and view a specific address location

- View partner agencies’ gage station sites on the base map

- View current weather information (air/road temperature, wind direction/speed, etc.)

- Access current and past rainfall and water-level data at a specified gage location, export them to Excel, and print the information displayed on screen

- View a gage station’s changes in stream elevation/water level and rainfall accumulation over time in chart or table form

9 The system claims that the data presented in the mapping tool and website may be delayed by approximately 5 minutes.

Figure 3. Harris County Flood Warning Website Main Page.

One of the HCFWS features is viewing past water-level and rainfall data at a specific gage location. Figure 4 shows the view of past water-level/stream elevation data. It shows one year of water-level data up to February 19, 2016 for gage station number 475, located at the intersection of Brays Bayou and Bellaire Boulevard. The upper part of the screen shows sensor information, e.g. identification number, location, type, installation date, and location topography. The lower part of the screen shows the past data in both graph and table formats. There is also an option to download the data as a Microsoft Excel file by clicking the ‘Export to Excel’ button. Moreover, Figure 5 shows the past rainfall data for the same location and period of time. The figure shows rainfall amounts on a monthly basis. Users also have the option to view rainfall amounts every 12 hours or 24 hours.
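
The different viewing intervals amount to summing the same rainfall readings over different time windows. The sketch below illustrates this for windows that divide a day evenly (such as 12 or 24 hours). The function name and the sample readings are hypothetical and are not taken from the HCFWS.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def total_rainfall_by_window(
    readings: Iterable[Tuple[datetime, float]], window_hours: int
) -> Dict[datetime, float]:
    """Sum rainfall (inches) into fixed windows that start at midnight.

    Assumes window_hours divides 24 evenly, e.g. 12 or 24.
    """
    if window_hours <= 0 or 24 % window_hours != 0:
        raise ValueError("window_hours must divide 24 evenly")
    totals: Dict[datetime, float] = defaultdict(float)
    for timestamp, inches in readings:
        # Round the timestamp down to the start of its window.
        start_hour = (timestamp.hour // window_hours) * window_hours
        window_start = timestamp.replace(
            hour=start_hour, minute=0, second=0, microsecond=0
        )
        totals[window_start] += inches
    return dict(sorted(totals.items()))


# Hypothetical readings: (time of reading, inches of rain since the last one).
sample = [
    (datetime(2016, 2, 18, 3, 15), 0.25),
    (datetime(2016, 2, 18, 14, 40), 0.5),
    (datetime(2016, 2, 18, 22, 5), 0.25),
    (datetime(2016, 2, 19, 1, 30), 1.0),
]

twelve_hourly = total_rainfall_by_window(sample, window_hours=12)
daily = total_rainfall_by_window(sample, window_hours=24)
```

With these made-up readings, the 12-hour view splits February 18 into two windows, while the 24-hour view reports a single total per day.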

Figure 4. View of Stream Elevation Data.

Figure 5. View of Rainfall Amount Data.

The HCFWS relies on gage stations which measure rainfall amounts and monitor water levels in Harris County’s bayous. As of February 2016, there are 139 gage stations owned by HCFCD, placed strategically in the area. Their sensors transmit data by radio: rainfall is reported each time a sensor measures 0.04 inches of rain, and water level is reported for every 0.10 foot rise. Besides rainfall amounts and water levels, some gage stations also measure wind speed and direction, barometric pressure, air temperature, road temperature, and humidity.
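
This kind of event-driven reporting can be sketched as follows. The step sizes come from the description above, while the function, the tolerance, and the sample readings are hypothetical. The actual sensor logic is not described here, so this is only one plausible reading of it, and it ignores falling values.

```python
from typing import Sequence

RAIN_STEP_IN = 0.04   # inches of rain per report (value from the text)
STAGE_STEP_FT = 0.10  # feet of water-level rise per report (value from the text)


def count_reports(readings: Sequence[float], step: float) -> int:
    """Count reports triggered as a reading rises in increments of `step`.

    Hypothetical logic: one report is sent each time the reading has risen
    by a full step since the last reported value. Falls are ignored.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not readings:
        return 0
    tolerance = 1e-9  # guards against floating-point round-off
    last_reported = readings[0]
    reports = 0
    for value in readings[1:]:
        while value - last_reported >= step - tolerance:
            last_reported += step
            reports += 1
    return reports


# Hypothetical cumulative rainfall (inches) and water-level (feet) samples.
rain_reports = count_reports([0.00, 0.03, 0.05, 0.13], RAIN_STEP_IN)
stage_reports = count_reports([30.00, 30.05, 30.25], STAGE_STEP_FT)
```

Reporting on change rather than on a fixed schedule means that a station is quiet in dry weather and transmits more often exactly when conditions are changing quickly.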

Not all of the gage stations shown in HCFWS are maintained by HCFCD. There are other agencies in Harris County which operate and maintain their own flood warning systems. HCFCD works with these agencies to provide more comprehensive information in HCFWS. These partner agencies use radio frequencies to transmit data to their base stations or to Houston TranStar¹⁰. Table 2 displays the number of gage stations owned by HCFCD and its partner agencies as of February 2016.

Table 2. Number of Gage Stations by Agencies

Agency                                             Number of gage stations
Harris County Flood Control District               139
Fort Bend County                                   9
Brazoria County                                    4
Harris County Toll Road Authority                  0
City of Houston                                    3
City of Pearland                                   10
City of Sugarland                                  17
Metropolitan Transit Authority of Harris County    6
Texas Department of Transportation                 44
Trinity River Authority                            12
San Jacinto River Authority                        15
Woodlands                                          3
Total                                              262

10 The Houston TranStar consortium is a partnership of four government agencies (The Texas Department of Transportation, Harris County, The Metropolitan Transit Authority of Harris County and The City of Houston) that are responsible for providing Transportation Management and Emergency Management services to the Greater Houston Region.

Figure 6 shows the high-level architecture of HCFWS. The sensors in each gage station transmit data to two primary repeaters located in the Huffman and Clodine areas. The repeaters then relay the data to the primary and back-up base stations located at Houston TranStar and at the Harris County Appraisal District. To ensure that the gage stations are functioning properly and transmitting accurate data, software and HCFCD staff monitor the information daily.

Figure 6. Harris County Flood Warning System High Level Architecture.

3.2.2 The Rice University Flood Alert System

The Rice University Flood Alert System (FAS3) [37] is a flood warning and forecast system which initially covered the area of Rice University and the Texas Medical Center (TMC) complex. It is an integrated system utilizing radar, rain gage information, bayou stage data, and hydrologic modelling. It was designed by Dr. Philip Bedient at Rice University from 1997 to 2015, with the assistance of Dr. Nick Fang and Dr. Baxter Vieux [38]. The system is accessible at http://fas3.flood-alert.org/.

Figure 7 shows the main page of the website. It displays a map of the City of Houston, specifically outlining the Brays Bayou watershed and its sub-basins. When rainfall occurs in the watershed, the map is overlaid with colors representing rainfall intensity. There are two graphs located below the map, both showing important observations at Brays Bayou. The graph on the left shows the rainfall intensity over time at Brays Bayou upstream at Main Street, while the graph on the right shows the Brays Bayou stream flow near TMC, including the stream flow prediction for a few hours ahead. A traffic light is displayed at the upper right corner of the website. It is intended as a means of communicating flood alerts to emergency personnel and indicates the water level in Brays Bayou at Main Street. Moreover, on the right side of the page there are a few thumbnails linking to other web pages of interest, including in-situ bayou cameras which provide visual confirmation to users.
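
The traffic light can be thought of as a simple threshold mapping from the observed water level to an alert color. The sketch below illustrates the idea. The threshold values and the function name are hypothetical, since the actual FAS3 alert levels are not described here.

```python
def alert_color(water_level_ft: float, caution_ft: float, danger_ft: float) -> str:
    """Map a water level to a traffic-light color using two thresholds."""
    if caution_ft >= danger_ft:
        raise ValueError("caution threshold must be below danger threshold")
    if water_level_ft >= danger_ft:
        return "red"
    if water_level_ft >= caution_ft:
        return "yellow"
    return "green"


# Hypothetical thresholds (feet); these are not the levels used by FAS3.
CAUTION_FT = 30.0
DANGER_FT = 38.0

colors = [alert_color(level, CAUTION_FT, DANGER_FT) for level in (25.0, 33.5, 40.0)]
```

A single color is quicker to interpret than a hydrograph, which fits the intended use by emergency personnel.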

The current version of FAS3 is a third-generation flood warning system which incorporates next-generation radar (NEXRAD) rainfall data. The data are calibrated with 20 local gages. In addition, a hydraulic prediction tool called the Floodplain Map Library (FPML) is implemented in FAS3. The tool predicts and visualizes flood inundation in real time and helps emergency personnel in their duties during severe weather conditions. To predict streamflow at Brays Bayou, FAS3 employs the hydrologic models HEC-1, designed by the Hydrologic Engineering Center, and Vflo, designed by Vieux & Associates, Inc. HEC-1 is a model that simulates the surface runoff response of a river basin to precipitation. Vflo is a computer model that simulates storm water runoff based on geospatial data and models interior locations in the drainage network [39].
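
A rainfall-runoff model of this kind turns a precipitation time series into a streamflow time series. One classical way to do this is unit-hydrograph convolution, sketched below. This is only an illustration of the general technique. It is not the implementation used by HEC-1, Vflo, or FAS3, and the rainfall values and unit-hydrograph ordinates are hypothetical.

```python
from typing import List, Sequence


def convolve_unit_hydrograph(
    excess_rain_in: Sequence[float],
    unit_hydrograph_cfs_per_in: Sequence[float],
) -> List[float]:
    """Direct runoff (cfs) as the discrete convolution of rain and unit hydrograph.

    Each rainfall pulse produces a scaled copy of the unit hydrograph,
    shifted to the interval in which the rain fell; the copies are summed.
    """
    if not excess_rain_in or not unit_hydrograph_cfs_per_in:
        return []
    n = len(excess_rain_in) + len(unit_hydrograph_cfs_per_in) - 1
    flow = [0.0] * n
    for i, rain in enumerate(excess_rain_in):
        for j, ordinate in enumerate(unit_hydrograph_cfs_per_in):
            flow[i + j] += rain * ordinate
    return flow


# Hypothetical excess rainfall (inches per interval).
rain = [0.5, 1.0, 0.0]
# Hypothetical unit hydrograph (cfs per inch of excess rain, per interval).
unit_hydrograph = [100.0, 300.0, 200.0]

runoff = convolve_unit_hydrograph(rain, unit_hydrograph)
```

In this example the peak flow arrives in the third interval, after the heaviest rainfall pulse has passed through the peak of the unit hydrograph.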

Figure 7. The Rice University Flood Alert System Website Main Page.

3.2.3 Central Texas HUB

Another flood warning system available in Texas is the Central Texas HUB (CTH) [40]. CTH is a warning system which gives a unified view of water levels and flood conditions for the Central Texas region. This information is made available through a web browser at http://www.centraltexashub.org/wiskiweb.htm or through a water web services interface at http://www.centraltexashub.org/kiwis.htm. By integrating this information, CTH helps local authorities understand storm and flood situations more quickly and easily and, eventually, make faster decisions.

Figure 8 displays the main page of CTH. The page consists of two frames. The left frame is a navigation page where the user can filter gage stations by external system, range of observation values, and gage location. The right frame is a display page which shows a map overlaid with the requested information. As displayed in Figure 8, a box containing detailed observation information appears when the user navigates to a certain observation point. The user can also display the observation locations in a table format instead of a map format.

Figure 8. Central Texas HUB Main Page.

The CTH integrates the data collected by the US Geological Survey's (USGS) National Water Information System, the City of Austin's Flood Early Warning System, and the HydroMet system of the Lower Colorado River Authority (LCRA). Various data, such as precipitation, streamflow, and flood water elevation, are collected by those individual systems during floods in Central Texas. CTH collects the observation data from the various systems and stores them in a centralized database which is maintained by the University of Texas at Austin’s Center for Research in Water Resources (CRWR).

The development of CTH was triggered by tropical storm Hermine in 2010. At that time, each system maintained its own data collection and had its own website for displaying data. In some cases, data collected by some gage stations were displayed by more than one website. Emergency managers then requested the development of a centralized system that would make it easier and quicker to understand storms and floods. CTH was developed by KISTERS, the Environmental Systems Research Institute (ESRI), and the CRWR of the University of Texas at Austin.

There are several technologies used in this system:

1. KISTERS WISKI [41]

WISKI (Water Information Systems KISTERS) serves as a central database that stores water observation data collected from various systems. It is a software solution developed by KISTERS, an environmental software company, to manage water data. The software supports automated data collection, data quality improvement, visualization, reporting, and data sharing.

2. WaterML [42]

The data from USGS are transferred using water web services in WaterML, a standard information model for representing water observation data. The use of a standard language enables data exchange between different information systems.

3. CTH also uses geospatial information services provided by ESRI to outline the drainage area for each observation point and to construct interpolated precipitation maps from the point precipitation data. These features allow the user to display the watershed of a certain observation point and the precipitation map for the past few hours.
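To illustrate how point precipitation readings can be turned into an interpolated surface, the following minimal sketch uses inverse distance weighting. The source does not state which interpolation method the ESRI services in CTH apply, so the method itself, the function name `idw_estimate`, and the sample coordinates and readings are all assumptions made for illustration.

```python
import math


def idw_estimate(gages, x, y, power=2.0):
    """Estimate rainfall at (x, y) from a list of (gx, gy, rain) readings.

    Inverse distance weighting: nearer gages receive larger weights.
    This is an assumed method, not necessarily the one used by CTH.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for gx, gy, rain in gages:
        dist = math.hypot(x - gx, y - gy)
        if dist == 0.0:
            return rain  # the query point coincides with a gage
        weight = 1.0 / dist ** power
        weighted_sum += weight * rain
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical gage readings: (x, y, rainfall in inches).
sample_gages = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_rain = idw_estimate(sample_gages, 1.0, 0.0)
```

Evaluating such a function on a regular grid of points would yield the values for a precipitation map; a point halfway between two gages receives the average of their readings.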

3.2.4 Flood Warning/Monitoring Systems on Mobile Devices

The increasing reliance on smartphones has triggered the development of flood warning applications for smartphones. These applications typically leverage raw data from government agencies, such as the National Oceanic and Atmospheric Administration (NOAA), the National Weather Service (NWS), and the U.S. Geological Survey (USGS). Each application then formats the data using custom mapping or graphical solutions for better visualization and readability. Flood warning smartphone applications currently available include RiverCast, FloodWatch, and King County Flood Warning.

3.2.4.1 RiverCast

RiverCast is a free mobile application for the iOS platform, developed by Juggernaut Technologies, Inc. The application uses raw data obtained from NOAA and the AHPS to display river level data in the U.S. in the form of interactive maps and graphs.

The main screen displays a map overlaid with gage-location indicators (Figure 9a). Once the user selects a certain location, the application shows past, current, and forecast river level data in graphical format (Figure 9b).

Figure 9. RiverCast Application (from left to right): a) Location Selection Screen, b) River Height Data.

3.2.4.2 FloodWatch

FloodWatch is a weather application developed by D5G Technology. The application is intended for monitoring rivers and streams throughout the U.S. It utilizes real-time data from the USGS and NWS gages.

Figure 10 displays some FloodWatch screens. Users are able to select and bookmark their location of interest in the application (Figure 10a). The selection page consists of a map overlaid with several gage location indicators. When a user selects a location, the application shows river level, precipitation totals, and discharge data (Figure 10b). The data can also be displayed in graphical format for selectable ranges from 7 to 120 days (Figure 10c).


Figure 10. FloodWatch Application (from left to right): a) Location Selection, b) River Information, c) River Level Data in Graphical Format, x-axis : Time, y-axis : River Level (in feet).

3.2.4.3 King County Flood Warning

The King County Flood Warning is a mobile application developed by King County in the State of Washington and funded by the King County Flood Control District. It provides users with the current flood conditions of eight King County rivers: Skykomish River, Snoqualmie River, Tolt River, Raging River, Issaquah Creek, Cedar River, Green River, and White River. The application is available for both Apple and Android users. It uses data from USGS, NWS, and the Northwest River Forecast Center.

The main screen of the application shows the flood phase of each river (Figure 11a). The flood phase is represented by a number ranging from 0 to 4, where 0 means no flood is expected and 4 means a major flood is expected. The user can see more detailed information about the river condition at a specific gage location by tapping the river's name (Figure 11b). The information includes current river flows, flood stage data, and real-time flood phases. River flow and stage information are also displayed in graphical format, including their forecast values for two days ahead. In addition, the application is able to display past river flow and stage (Figure 11c).

Figure 11. King County Flood Warning Application (from left to right): a) Main Screen, b) Cedar River Information – Gage Location : Landsburg, c) Past River Flow and Stage.

3.3 Flood Response Management

One of the activities in flood preparedness and response is evacuation route planning. The most frequent causes of evacuation in the United States are fire and flood. Almost every year, people in coastal areas leave their homes as hurricanes approach. Evacuation route planning is a complex problem faced by local and federal agencies when there is a need to relocate residents due to a flood or other natural disaster. Massive traffic congestion caused by the simultaneous evacuation of millions of residents and road closures are some of the challenges in determining the most efficient evacuation routes. To address this problem, Lim et al. proposed a capacitated network flow model for short-notice evacuation planning [43]. This model determines the evacuation schedule (start time), the evacuation paths, and their flows for priority-based evacuation networks. An enhanced model also takes the capacity uncertainty of road links into account [44].
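To make the idea of a capacitated network flow concrete, the sketch below computes a maximum flow on a tiny road network with the Edmonds-Karp algorithm. It is a deliberately simplified, static illustration and not the model of Lim et al., which additionally handles schedules, priorities, and time; the network, node names, and capacities (vehicles per hour) are invented for this example.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical road network; capacities in vehicles per hour.
roads = {
    "zone": {"a": 1000, "b": 800},
    "a": {"shelter": 600, "b": 300},
    "b": {"shelter": 900},
    "shelter": {},
}
evacuation_rate = max_flow(roads, "zone", "shelter")
```

In this made-up network the two roads into the shelter limit the evacuation rate to 1500 vehicles per hour, even though 1800 vehicles per hour could leave the evacuation zone.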


4 Case Study: Water-Level Forecasting in Harris County

4.1 Objective

In general, flood occurrence is a complex process because it is affected by various variables, such as rainfall amounts, ground cover, river/stream topography, capacity of the watercourse to drain water, and condition prior to rainfall events. Moreover, forecasting some of those variables is also equally challenging; for example, rainfall amounts are affected by various variables that are not easy to predict, such as wind speed and wind direction.

Along the same lines, we have learned that Harris County currently does not have a satisfactory flood forecasting system. First, the Harris County Flood Warning System (HCFWS), which provides up-to-date flood information to residents, does not have any water level or rainfall forecasting capabilities. Second, although Rice University’s flood warning system (FAS3) has forecasting capabilities, its coverage is limited to Rice University and the Texas Medical Center (TMC) complex. Lastly, the NWS River Forecast Center (RFC) provides forecast information at only ten locations in Harris County.

Due to the flood forecasting complexity and the lack of flood forecasting systems in Harris County, we investigate different forecasting approaches and scenarios to predict water levels in Harris County. The approaches we introduce in this chapter employ existing statistical forecasting techniques relying on regression, in contrast to existing simulation-based forecasting approaches. The objective of this case study is to learn forecasting models from HCFWS datasets supporting three different forecasting scenarios: water-level forecasting at a particular location in Harris County, in a watershed in Harris County, and in the whole area of Harris County, relying on global forecasting models. We also perform exploratory data analysis on the HCFWS datasets to obtain valuable knowledge about their characteristics. As part of this activity, we also assess the data quality of the HCFWS datasets. Finally, by conducting this case study, we not only generate useful models to forecast the water levels in rivers and watersheds in Harris County, but also create useful background knowledge for understanding flood problems in Harris County.

4.2 Datasets

4.2.1 Data Entities

There are three data entities accessible to the public through the HCFWS website: sensor information, rainfall amount, and water-level data, which will be explained next.

4.2.1.1 Sensor information

The sensor information data describe the type of sensor used in a gage and some reference information related to the physical location of the sensors. Figure 12 shows the reference information that belongs to a particular sensor. This information is displayed on the HCFWS website.

Figure 12. Sensor Information Screenshot.

Table 3 lists the sensor information data attributes used in our case study. Besides the sensor information obtained from HCFWS, we also obtain additional sensor location latitude and longitude coordinates from Harris County’s Office of Homeland Security and Emergency Management (footnote 11).

Table 3. Attributes of Sensor Information Data

| Name of Attribute | Description |
| --- | --- |
| Sensor ID | Sensor identification information |
| Sensor Type | Sensor type used in the gage. There are five possible sensor types: Bubbler, Radar, Pressure Transducer, USGS Bubbler, and USGS Radar. |
| Installation Date | Sensor installation date |
| Top of Banks | A measurement of elevation that refers to the height of the banks of a bayou or creek. The elevation is measured in feet. |
| Bottom of Channel | A measurement of elevation above sea level at a bayou’s lowest point, generally measured from the downstream side of a bridge near a gage station site. The elevation is measured in feet. |
| Tip of Orifice | A measurement that refers to the distance between the lowest tip of a water-level measuring device and the bottom of a waterway. The elevation is measured in feet. |
| Measuring Plate | A measurement of the distance from the measuring plate to the top of the water. The distance is measured in feet. A measuring plate is an iron plate attached to the downstream side of a bridge, usually near the center of a bayou, creek, or other waterway, that has been surveyed to determine its elevation. This measurement is compared to the reading from the water-level measuring device in that waterway to test the accuracy of the device’s sensor. |

11 Data is available at: http://www.hcoem.org/stationlist.aspx


Table 3. Attributes of Sensor Information Data (continued)

| Name of Attribute | Description |
| --- | --- |
| Benchmark | A benchmark is a known elevation at a point on the ground, typically marked by a brass disk embedded in a bridge or sidewalk. The elevation is measured in feet. A benchmark is used to determine the elevation of the water levels in bayous and creeks as measured by the Flood Warning System equipment. |
| Latitude | Latitude coordinates of the sensor. These are additional data obtained from Harris County’s Office of Homeland Security and Emergency Management. |
| Longitude | Longitude coordinates of the sensor. These are additional data obtained from Harris County’s Office of Homeland Security and Emergency Management. |

4.2.1.2 Rainfall Amount

The rainfall amount data contain measurements of the amount of rainfall during a certain time period and are presented as a time series of rainfall amounts measured in inches. The data can be obtained at different granularities, such as 1 hour, 12 hours, 1 day, 1 month, and 1 year. Figure 13 shows an excerpt of daily rainfall data for gage station number 475 in 2015.

| Reading Date From | Reading Date To | Rain (inches) |
| --- | --- | --- |
| 12/31/2015 0:00 | 1/1/2016 0:00 | 0.00" |
| 12/30/2015 0:00 | 12/31/2015 0:00 | 0.04" |
| 12/29/2015 0:00 | 12/30/2015 0:00 | 0.00" |
| 12/28/2015 0:00 | 12/29/2015 0:00 | 0.04" |
| 12/27/2015 0:00 | 12/28/2015 0:00 | 0.44" |
| 12/26/2015 0:00 | 12/27/2015 0:00 | 0.00" |
| 12/25/2015 0:00 | 12/26/2015 0:00 | 0.00" |
| 12/24/2015 0:00 | 12/25/2015 0:00 | 0.00" |
| 12/23/2015 0:00 | 12/24/2015 0:00 | 0.00" |

Figure 13. Excerpt of Daily Rainfall Amount Data.


Table 4 lists the rainfall data attributes we used in the case study.

Table 4. Attributes of Rainfall Amount Data

| Name of Attribute | Description |
| --- | --- |
| Sensor | Rainfall sensor ID measuring the amounts of rainfall |
| Reading Date From | Start timestamp when sensor reading was taken |
| Reading Date To | End timestamp when sensor reading was taken |
| Rainfall Amount | Rainfall amount measured in inches |
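As a small illustration of how one row with these attributes could be turned into typed values, the sketch below parses the two timestamps and strips the inch mark from the rainfall amount. It assumes that exported values look like the excerpt in Figure 13; the function name `parse_rainfall_row` and the dictionary keys are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # matches values such as "12/31/2015 0:00"


def parse_rainfall_row(sensor_id, date_from, date_to, rain_text):
    """Convert one rainfall row into typed values."""
    return {
        "sensor": sensor_id,
        "from": datetime.strptime(date_from, DATE_FORMAT),
        "to": datetime.strptime(date_to, DATE_FORMAT),
        # Drop the trailing inch mark before converting to a number.
        "rain_in": float(rain_text.strip().rstrip('"')),
    }


row = parse_rainfall_row(475, "12/27/2015 0:00", "12/28/2015 0:00", '0.44"')
```

For the sample row, the interval between the two timestamps is one day, which corresponds to the daily granularity of the excerpt.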

4.2.1.3 Water Level

The water-level data contain measurements of the water elevation in a river or stream at a particular location at a particular time. Figure 14 shows an excerpt of past water levels for sensor number 474 in 2015, collected on different days.

| Sensor | Reading Date | Stream Elevation (feet) |
| --- | --- | --- |
| 474 | 12/31/2015 22:24 | 48.73' |
| 474 | 12/30/2015 22:19 | 48.84' |
| 474 | 12/29/2015 22:14 | 49.02' |
| 474 | 12/28/2015 22:09 | 49.83' |
| 474 | 12/27/2015 23:24 | 50.68' |
| 474 | 12/26/2015 21:59 | 48.66' |
| 474 | 12/25/2015 23:28 | 48.72' |
| 474 | 12/24/2015 21:49 | 48.71' |
| 474 | 12/23/2015 21:44 | 48.65' |

Figure 14. Excerpt of Water-Level Data.

Table 5 lists the water-level data attributes that are used in this case study.

Table 5. Attributes of Water-Level Data

| Name of Attribute | Description |
| --- | --- |
| Sensor | Water-level sensor ID measuring the water elevation |
| Reading Date | Timestamp when sensor reading was taken |
| Stream Elevation | Water level measured in feet |


4.2.2 Extraction Tool: Selenium

The architecture of the HCFWS website requires users to extract rainfall amounts or water levels by gage station; that is, users need to access one gage station at a time for data extraction. To speed up this process, we utilized a tool called Selenium. Selenium minimizes manual interaction with an internet browser because it lets us create an automated script containing a series of commands to open a browser, display the web page for a certain gage station, fill in the required extraction options, and extract the data.

Selenium [45] is a suite of tools built specifically for test automation of web-based applications. Selenium is known as a highly compatible toolkit: it supports many browsers and operating systems and can be controlled from many programming languages and testing frameworks. Table 6 lists the browsers, operating systems, programming languages, and testing frameworks that Selenium supports. Selenium consists of different components with specific roles:

1. Selenium IDE

Selenium IDE (Integrated Development Environment) is a prototyping tool for test automation, through which a user can create preliminary automated test scripts. It is a Firefox plugin and offers an easy-to-use interface for developing automated tests. Its recording capability can record and play back user interactions with a browser. The recorded actions can later be exported as a reusable script in one of many programming languages and be executed as a test script.

2. Selenium 1 (also known as Selenium RC or Remote Control)

Selenium 1 is a client-server system through which a user can control web browsers locally or remotely. It contains the following components:

- Selenium Server, which launches and kills browsers; receives, interprets, and runs commands passed from the test program; and acts as an HTTP proxy between the browser and the web application being tested

- Client libraries, which provide the programming interface (API) through which the test program runs Selenium commands.

Selenium 1 had been the main Selenium software tool until Selenium 2 was introduced. Although Selenium 1 is now deprecated, it is still supported in maintenance mode and provides some features which are not yet available in Selenium 2.

3. Selenium 2 (also known as Selenium WebDriver)

Selenium 2 is the newest addition to the Selenium toolkit. It provides a more cohesive and object-oriented API and addresses limitations of the old implementation, for example by offering better support for dynamic websites where elements of a page can change without refreshing the page itself. Selenium 2 still runs Selenium 1’s interface for backward compatibility.

4. Selenium Grid

Selenium Grid offers the capability of running the Selenium 1 solution in multiple environments or scaling a large test suite. It allows us to run different tests on different machines in parallel. The use of Selenium Grid can substantially improve the performance of a large or slow-running test suite by making use of parallel processing.

Table 7 highlights the differences between the above components.


Table 6. Selenium Compatibility

| Category | Supported |
| --- | --- |
| Browsers | Firefox, Internet Explorer, Safari, Opera, Chrome |
| Operating Systems | Microsoft Windows, Apple OS X, Linux |
| Programming Languages (Corresponding Testing Frameworks) | C# (NUnit); Haskell; Java (JUnit, TestNG); JavaScript (WebdriverJS, WebdriverIO, NightwatchJS); Objective-C; Perl; PHP (Behat + Mink); Python (unittest, pyunit, py.test, robot framework); R; Ruby (RSpec, Test::Unit) |

Table 7. Different Components in Selenium

| Selenium IDE | Selenium 1 (Selenium RC) or Selenium 2 (Selenium WebDriver) | Selenium Grid |
| --- | --- | --- |
| Create simple test scripts assisted by a Firefox add-on which can record and play back browser interaction | Create robust, browser-based regression automation suites and tests; scale and distribute scripts across many environments; Selenium 2 is the successor of Selenium 1 | Create test scripts in multiple environments or parallel machines |

4.2.3 Data Extraction

The Selenium component we used for data extraction is the Selenium IDE. Figure 15 displays the Selenium IDE graphical user interface (GUI) containing the rainfall amount data extraction script for the year 2015. The GUI is overlaid with numbers from 1 to 12, where each number corresponds to a specific command explained below. First, the user needs to specify the website’s URL for data extraction in the ‘Base URL’ text field. The red toggle button in the GUI’s upper right corner indicates whether the user is in recording mode. The ‘Table’ tab lists the sequence of consecutive user interactions with the browser that have been recorded. In this script, those actions are:


- Command 1-3: Open a website address for a specific gage station, i.e., gage station 100

- Command 4: Click the ‘Rainfall’ tab in the website page

- Command 5-6: Determine an extraction period, i.e., 2015

- Command 7-8: Refresh the website page

- Command 9-10: Input a data granularity, i.e., 1 day

- Command 11-12: Click the ‘Download’ button to save the data into a file
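For readers who prefer a programmatic view, the same sequence of actions can be sketched with the Python bindings of Selenium WebDriver. This is a swapped-in technique: the script in this case study was recorded with Selenium IDE, not written against WebDriver. The base address, the function names, and every locator (link text and element ids) are hypothetical placeholders, and the sketch assumes that the period and granularity inputs are drop-down lists.

```python
BASE_URL = "https://example.org/gage/{station_id}"  # placeholder address


def station_url(station_id):
    """Build the page address for one gage station (hypothetical pattern)."""
    return BASE_URL.format(station_id=station_id)


def download_rainfall(station_id, year="2015", granularity="1 day"):
    """Replay the recorded steps for one station with Selenium WebDriver.

    Every locator below (link text, element ids) is a made-up placeholder.
    """
    # Imported lazily so station_url() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    driver = webdriver.Firefox()
    try:
        # Commands 1-3: open the page of the gage station.
        driver.get(station_url(station_id))
        # Command 4: switch to the 'Rainfall' tab.
        driver.find_element(By.LINK_TEXT, "Rainfall").click()
        # Commands 5-6: choose the extraction period.
        period = Select(driver.find_element(By.ID, "period"))
        period.select_by_visible_text(year)
        # Commands 7-8: refresh the displayed data.
        driver.find_element(By.ID, "refresh").click()
        # Commands 9-10: choose the data granularity.
        interval = Select(driver.find_element(By.ID, "interval"))
        interval.select_by_visible_text(granularity)
        # Commands 11-12: download the data file.
        driver.find_element(By.ID, "download").click()
    finally:
        driver.quit()
```

Calling `download_rainfall` in a loop over a list of gage station IDs would repeat the extraction for each station.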


Figure 15. Rainfall Amount Data Extraction Script in Selenium IDE.

We wrote separate scripts for sensor information, rainfall amount, and water-level data extraction. Because each gage station has its own website address, the script must be executed against each address; therefore, we built a Windows batch script containing the execution commands for each website address. Figure 16 illustrates the data extraction process workflow.

Figure 16. Data Extraction Process Flow.

4.2.4 Data Pre-Processing

Figure 17 depicts the data pre-processing process flow. The inputs of this process are raw water levels and rainfall amounts stored in multiple Microsoft Excel files. We transformed those raw data files into one final dataset which is readable by the data analysis and modelling tools, using a custom Excel VBA program. The program performs the following steps:

a. Consolidate multiple extraction files into one file by joining the raw rainfall amounts and water levels using two identifiers: gage station ID and sensor-reading date.

b. Add four columns containing the following data: level of the sensor’s bottom of channel, water level recorded on the previous day, relative water level on the same day, and relative water level on the previous day. Relative water level is obtained by subtracting the level of the sensor’s bottom of channel from the water level.

c. Handle data problems, such as marking records with missing water-level values as NA.
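The three steps can be sketched in a few lines of Python. This is only an illustration of the logic: the program used in this case study was written in Excel VBA, and the function names, dictionary layout, column names, and the bottom-of-channel value of 40.0 feet are assumptions made for the example.

```python
from datetime import date, timedelta

NA = "NA"  # marker for missing values (step c)


def relative(level, bottom):
    """Relative water level: water level minus bottom of channel."""
    return NA if level is None else round(level - bottom, 2)


def build_dataset(rainfall, water_level, bottom_of_channel):
    """Join rainfall and water levels and add the four derived columns.

    rainfall, water_level: dicts keyed by (station_id, date)
    bottom_of_channel: dict keyed by station_id, elevation in feet
    """
    rows = []
    for station, day in sorted(rainfall):
        bottom = bottom_of_channel[station]
        # Step a: join on gage station ID and sensor-reading date.
        level = water_level.get((station, day))
        prev_level = water_level.get((station, day - timedelta(days=1)))
        rows.append({
            "station": station,
            "date": day,
            "rain_in": rainfall[(station, day)],
            # Step b: the four added columns.
            "bottom_ft": bottom,
            "prev_level_ft": NA if prev_level is None else prev_level,
            "rel_level_ft": relative(level, bottom),
            "prev_rel_level_ft": relative(prev_level, bottom),
            # Step c: missing water levels are marked as NA.
            "level_ft": NA if level is None else level,
        })
    return rows


# Hypothetical inputs for a single station.
rain = {(474, date(2015, 12, 27)): 0.44,
        (474, date(2015, 12, 28)): 0.04,
        (474, date(2015, 12, 29)): 0.0}
levels = {(474, date(2015, 12, 27)): 50.68,
          (474, date(2015, 12, 28)): 49.83}
rows = build_dataset(rain, levels, {474: 40.0})
```

In this example the row for 12/29/2015 has no water-level reading and is therefore marked NA, while its previous-day columns are still filled from the 12/28/2015 reading.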

Figure 17. Data Pre-Processing.

4.3 Exploratory Data Analysis

In order to create water-level forecasting models, we make use of two years of daily rainfall amount and water-level data in 2014 and 2015 as a training set. Before we proceed with the experiments, we conduct exploratory data analysis to understand the characteristics of data captured by these gage stations. The rainfall amount and water-level data were collected by sensors residing in 139 gage stations owned by HCFCD. The data analysis is performed on the cleaned data; how data were cleaned is explained in section 4.3.4. There are two measurements for water-level data: absolute and relative water-level. Absolute water-level is the measurement of water surface above sea level. Relative water-level data is the measurement of the water surface above its bottom of channel. We derive the relative water-level data by subtracting the level of bottom of channel from the absolute water-level measurement.

We use box plots and histograms to graphically represent the data distribution. A box plot represents the data by their quartiles. The bottom and top of the box represent the first and third quartiles, and the band inside the box represents the second quartile, or median. The whiskers extend to the most extreme data points that are not outliers, that is, the smallest and largest values lying no more than 1.5*IQR (interquartile range) from the box. Any data point outside the whiskers is considered an outlier. Figure 18 illustrates the box plot definition.

Figure 18. Box Plot.
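The box plot definition can be made concrete with a short computation. The sketch below is illustrative only: the function name `box_plot_stats` and the sample values are hypothetical, and the quartile convention (linear interpolation that includes both end points) is an assumption, since plotting tools differ in how they compute quartiles.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5*IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are not outliers.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

For the sample values, the box spans 3 to 7 with the median at 5, the whiskers end at 1 and 8, and the value 30 lies more than 1.5*IQR above the box and is therefore reported as an outlier.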

4.3.1 Exploratory Data Analysis for Particular Locations

Exploratory data analysis was performed on three datasets describing water-level and rainfall amount observations for the following three gage stations:

- Gage Station 2130 located at the intersection of Horsepen Creek and Trailside Drive

- Gage Station 1185 located at the intersection of Cypress Creek and Sharp Road

- Gage Station 370 located at the intersection of Sims Bayou and State Highway 288

Figure 19 shows the absolute water-level data of each gage station over time. We can see that gage station 1185 is located at the highest elevation compared to gage stations 2130 and 370. Conversely, gage station 370 is located at the lowest elevation compared to gage stations 2130 and 1185. We also observed that the water-level fluctuations of gage stations 2130 and 370 were similar. The water levels at both gage stations fluctuate frequently in summer 2014, from end of 2014 to early 2015, 2015 and fall 2015. Meanwhile, the water levels at gage station 1185 fluctuate frequently from early 2014 to end of 2014 and around early 2015 and summer 2015.

Figure 19. Water Levels of Location-Specific Gage Stations.

Figure 20 compares the water levels and rainfall amounts at each gage station and shows that the water-level fluctuations at gage stations 2130 and 370 are consistent with their rainfall amount fluctuations. When rainfall intensity is high, the water level in the stream increases; when the rainfall stops, the water level decreases. However, that is not the case for gage station 1185: its water-level graph has longer wavelengths than its rainfall graph, which suggests that the location of gage station 1185 has a slower discharge rate.


Figure 20. Water Levels and Rainfall Amounts of Location-Specific Gage Stations. (Three panels: Gage 2130, Gage 1185, and Gage 370; each plots water level in feet and rainfall in inches against time, 2014 to 2016.)
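One possible way to quantify the visual comparison between rainfall and water level is to correlate rainfall with the water level observed a few days later and to look for the lag with the strongest correlation. The sketch below is not part of the analysis in this case study; the function names and the sample series are made up for illustration, and both series are assumed to have the same length.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant series has no defined correlation; report 0.0 instead.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_lag(rain, level, max_lag=5):
    """Return the lag in days with the highest correlation, and all scores."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair rainfall on day t with the water level on day t + lag.
        scores[lag] = pearson(rain[:len(rain) - lag], level[lag:])
    return max(scores, key=scores.get), scores


# Made-up series: the second water level reacts two days after the rain.
rain = [0, 2, 0, 0, 5, 0, 0, 1, 0, 0]
level_fast = [10 + r for r in rain]
level_slow = [10, 10, 10, 12, 10, 10, 15, 10, 10, 11]
fast_lag, _ = best_lag(rain, level_fast)
slow_lag, _ = best_lag(rain, level_slow)
```

With these invented series, the first water level follows the rainfall on the same day, while the second follows it with a delay of two days.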


4.3.2 Exploratory Data Analysis for the Addicks Reservoir Watershed

The Addicks Reservoir watershed is located in the western part of Harris County, with a small portion crossing into eastern Waller County. The primary streams of this watershed are Langham Creek and its tributaries: Horsepen Creek, Bear Creek, and South Mayde Creek. These streams drain into the Addicks Reservoir, located near the intersection of I-10 and State Highway 6, and finally into the Buffalo Bayou watershed. The Addicks Reservoir and the Barker Reservoir were built in the 1940s by the U.S. Army Corps of Engineers as part of a federal project. Both reservoirs were built to reduce flood risks in Buffalo Bayou, which runs west to east through downtown Houston. Figure 21 shows the Addicks Reservoir watershed and its primary streams.

During heavy rainfall, the Addicks Reservoir watershed occasionally receives a significant amount of storm water overflow from the Cypress Creek watershed, which is north of it [46].

Figure 21. Primary Streams in the Addicks Reservoir Watershed [46].

There are nine gage stations located in the Addicks Reservoir watershed. Table 8 and Figure 22 show the locations of those gage stations.


Table 8. Observed Gages in the Addicks Reservoir Watershed

| Gage ID | Gage Location |
| --- | --- |
| 2110 | Addicks Reservoir |
| 2120 | Langham Creek @ West Little York Road |
| 2130 | Horsepen Creek @ Trailside Drive |
| 2140 | Langham Creek @ Longenbaugh Road |
| 2150 | South Mayde Creek @ Greenhouse Road |
| 2160 | Bear Creek @ Clay Road |
| 2170 | South Mayde Creek @ Morton Road |
| 2180 | Bear Creek @ FM 529 |
| 2190 | South Mayde Creek @ Peek Road |


Figure 22. Locations of Observed Gage Stations.

4.3.2.1 Rainfall Amount

Figure 23 displays the box plots of the daily rainfall amount data at each gage station in the Addicks Reservoir watershed, and Figure 24 displays the histogram of rainfall amounts in the watershed. According to both figures, most of the rainfall amounts are 0, which means that it rarely rained in the Addicks Reservoir watershed during 2014 and 2015; however, there were also days with up to 7 inches of rain.

Figure 25 gives a plot of the daily rainfall amount fluctuations. We can see that the graphs for gage stations 2110, 2120, and 2130 are very similar, and so are the graphs for gage stations 2150 and 2160. On the other hand, the graphs for the remaining gage stations are quite different. In those 5 locations, rainfall amounts increased in April, May, September, and late December of 2014. In 2015, the higher rainfall intensities occurred around April, May, September, and October.

Figure 23. Box Plots of Rainfall Amounts for Individual Gage Stations in the Addicks Reservoir Watershed.


Figure 24. Histogram of Rainfall Amounts in the Addicks Reservoir Watershed.

Figure 25. Rainfall Amount Fluctuations in the Addicks Reservoir Watershed.

4.3.2.2 Absolute Water Level

Figure 26 shows box plots of water levels at each gage station located in the Addicks Reservoir watershed and the box plot for all data combined. A red triangle indicates the level of each gage station’s top of banks and a blue diamond indicates the level of each gage station’s bottom of channel.

The water levels at gage station 2110, located in the Addicks Reservoir, are lower than at the other gage stations. This is consistent with the fact that it is the final drainage point of all streams located in this watershed. We also observe that gage station 2110 has a larger data variance than the other gage stations. This means that the water level fluctuates more at this location, probably because it collects water from the various streams in the watershed.

Furthermore, the water levels at gage stations 2140, 2180, and 2190 are higher than at the other gage stations. This is related to the elevation of the streams where the gage stations are located: the stream-bed levels at those locations (indicated by the level of bottom of channel) are higher than at the remaining gage stations; hence, we can expect the water levels at those locations to be higher as well. There are only a few locations where water levels reach the top of banks, namely gage stations 2150, 2180, and 2190.

Figure 26. Box Plots of Absolute Water-Level in the Addicks Reservoir Watershed.


Figure 27 shows the water-level fluctuations at each gage station in the Addicks Reservoir watershed. We observe consistent fluctuation patterns among the various locations. We also observe some seasonality in this display, where water levels rise in the months of April, May, September, and in late December.

Figure 27. Water-Level Fluctuations from Individual Gage Stations in the Addicks Reservoir Watershed. (Plot titled "Addicks Watershed Water Level": water level above sea level against time, 2014 to 2016, one line per gage station 2110 through 2190.)

4.3.2.3 Relative Water Level

Figure 28 displays the box plots of relative water-level data at each gage station in the Addicks Reservoir watershed. The figure is different from the box plots of absolute water levels in Figure 26. As depicted in Figure 28, the relative water levels at all gage stations, except gage stations 2110 and 2120, are in the range of 0 to 5 feet. According to Figure 26, gage station 2120’s stream bed is lower than that of the other gage stations, except gage station 2110. And yet, the relative water levels at this location are still higher than those of the other locations.


Figure 28. Box Plots of Relative Water Levels in the Addicks Reservoir Watershed.

4.3.3 Global Exploratory Data Analysis

Exploratory data analysis was also performed on a dataset which contains two years of water-level and rainfall data for all gage stations in Harris County. Table 9 summarizes data that were extracted from the HCFWS website.

Table 9. Data Summary

| | Rainfall Amount | Water Level |
| --- | --- | --- |
| Total Number of Gage Stations | 136 | 136 |
| Number of Observations, 2014 | 48,592 | 39,655 |
| Number of Observations, 2015 | 49,423 | 45,404 |

4.3.3.1 Rainfall Amount

Figure 29 gives two histograms that show the distribution of rainfall in 2014 and 2015. Almost all rainfall amounts are in the range between 0 and 0.5 inch; moreover, both histograms are very similar. Table 10 shows the quartiles of the rainfall distribution in more detail. We observe that both years have the same 1st, 2nd, and 3rd quartiles. However, the mean and maximum values are higher in 2015 (0.1961 and 11.92 inches) than in 2014 (0.1268 and 6.48 inches), and we conclude that the rainfall amounts in 2015 were higher than in 2014.

Figure 29. Histogram of Harris County Rainfall Amounts in year 2014 and 2015.

Table 10. Quartiles of Rainfall Amount Data

| Statistic | 2014 | 2015 | Combined 2014 and 2015 data |
|---|---|---|---|
| Minimum | 0.0000 | 0.0000 | 0.0000 |
| 1st Quartile | 0.0000 | 0.0000 | 0.0000 |
| Median | 0.0000 | 0.0000 | 0.0000 |
| Mean | 0.1268 | 0.1961 | 0.1618 |
| 3rd Quartile | 0.0400 | 0.0400 | 0.0400 |
| Maximum | 6.4800 | 11.9200 | 11.9200 |

4.3.3.2 Water Level

Figure 30 shows the distribution of both absolute and relative water levels in Harris County. The absolute water levels vary more than the relative water levels. As explained in the earlier chapter, the HCFWS gage stations are placed at various locations along Harris County bayous and their tributaries. Measured from sea level, each river bed has a different elevation; hence, the absolute water-level values also vary. Furthermore, we see that the water levels in 2015 were slightly higher than in 2014. This seems to be consistent with the higher amounts of rainfall in 2015.

Figure 30. Box Plots of Water Levels in Harris County.

4.3.4 Data Cleaning Procedure

All exploratory data analyses described in the previous sections were performed on a dataset that contains two years of past water-level and rainfall data observed by all gage stations in Harris County. During the data preparation process, we encountered several challenges that affect data quality and might also affect the goodness of fit of the chosen model:

1. Lack of water-level measurements

Three gage stations do not collect water-level data:

• Gage Station 510, located at the Harris County Flood Control District Office at Brookhollow

• Gage Station 1000, located at the Houston Transtar Office

• Gage Station 1020, located in NRG Park

Because water-level information is absent, we omit these three locations from the case study, leaving us with 136 gage stations.

2. Missing water-level data.

The water-level data for some gage stations are missing for single days or for longer periods of time. There are two causes: sensors that had not yet been installed and missed observations. Figure 27 shows examples of both cases. First, no water-level data were found for gage stations 2140, 2170, 2180, and 2190 in early 2014 because the sensors had not been installed.

Second, no observations were found for gage station 2180 for several periods in the second half of 2014. We were not able to determine the reason for these gaps; however, we approximated the missing values. We used the na.approx function in R's zoo package, which replaces missing observations by linear interpolation. Figure 31 shows the water-level fluctuations after approximation.

[Figure 31: line plot titled "Addicks Watershed Water Level (approx.)"; x-axis: Time (2014 to 2016); y-axis: water level above sea level (labeled in inches, ticks 80 to 140); one line per gage station 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, and 2190.]

Figure 31. Water-Level Fluctuations for Individual Gage Stations in the Addicks Reservoir Watershed After Approximation.

3. Inconsistent water-level data

There are 127 inconsistent observations in which the water level is lower than the corresponding bottom of the channel. In other words, the relative water level (the height of the water surface above the bottom of the channel) is less than 0 feet.

As Table 11 shows, the majority of these relative water-level values lie between -1 and -0.010 foot. We consider this a very small difference that does not affect the data analysis results, so we left these observations unchanged.

Table 11. Distribution of Inconsistent Water-Level Data

| Range of Relative Water-Level x (in feet) | Number of observations |
|---|---|
| -119.910 ≤ x < -15.64 | 1 |
| -15.64 ≤ x < -8.14 | 2 |
| -8.14 ≤ x < -1.00 | 7 |
| -1 ≤ x < -0.095 | 22 |
| -0.095 ≤ x < -0.040 | 17 |
| -0.040 ≤ x < -0.020 | 38 |
| -0.020 ≤ x ≤ -0.010 | 40 |
| TOTAL | 127 |

We further investigated the remaining 10 observations, whose relative water levels are below -1.00 foot. Table 12 summarizes how those inconsistencies were resolved.

Table 12. Inconsistent Water-Level Data Resolution

| Gage ID | Relative Water Level (in feet) | Number of observations | Probable Cause [12] | Resolution |
|---|---|---|---|---|
| 2140 | -119.91 | 1 | Inconsistent observation due to first days of sensor installation | Exclude the water-level observation from experiments |
| 240 | -15.64 | 2 | Inconsistent observations due to first days of sensor installation | Same as above |
| 750 | -8.14 | 1 | Inconsistent observation due to first days of sensor installation | Same as above |
| 620 | -1.01 | 1 | Unable to determine cause | Use data as it is |
| 310 | -1.57 | 1 | Unable to determine cause | Use data as it is |
| 380 | -3.45, -3.09, -1.76, -1.67 | 4 | Defect in sensors. Sensor was reporting inconsistent data during the period of 04/21/2014 to 05/06/2014. There were no data reported after 05/06/2014 until 08/15/2014 (almost 3 months later). | Exclude the water-level observations during the period of 04/21/2014 to 05/06/2014 from experiments |

[12] These are our proposed explanations, inferred from limited evidence; they have yet to be cross-checked with HCFCD.
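The linear interpolation applied to missing observations (item 2 of the data cleaning procedure) can be illustrated with a minimal sketch. It uses base R's approx() as a dependency-free stand-in for zoo's na.approx, and the five water-level values are made up for illustration; it is not the code used in the case study.

```r
# Hypothetical daily water levels (feet) with a two-day gap; values are made up.
level <- c(100.0, NA, NA, 103.0, 102.5)
day   <- seq_along(level)

# Linearly interpolate the missing days from the nearest observed neighbours.
observed <- !is.na(level)
filled   <- level
filled[!observed] <- approx(x = day[observed], y = level[observed],
                            xout = day[!observed])$y

print(filled)
```

The two missing days are placed on the straight line between the last observation before the gap and the first observation after it, which is why long gaps (such as the ones at gage station 2180) appear as straight segments after approximation.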

4.4 Forecasting Scenarios and Evaluation Measures

4.4.1 Forecasting Scenarios

One goal of this case study is to develop models that forecast future water levels for three different scenarios:

1. Water-level forecasting for a particular location in Harris County.

We use this location-specific scenario to predict water levels at three gage stations: gage station 2130, located at the intersection of Horsepen Creek and Trailside Drive; gage station 1185, located at the intersection of Cypress Creek and Sharp Road; and gage station 370, located at the intersection of Sims Bayou and State Highway 288.

2. Water-level forecasting for a watershed in Harris County.

We use this scenario to learn forecasting models for the Addicks Reservoir watershed, based on a dataset of all HCFWS gage stations located in the watershed.

3. Global water-level forecasting for any location in Harris County.

This is a non-location-specific scenario in which we learn a global forecasting model for all rivers in Harris County.

For each scenario, we developed several forecasting models, which are explained in section 4.4.3. These models vary by their dependent and independent variables and by their forecasting methods. They predict absolute water levels, relative water levels, or water-level differences (henceforth: water-level delta) at the locations defined by the scenario. We used three forecasting methods to obtain water-level forecasting models: linear regression, ARIMA, and VAR. These methods are explained in section 4.4.2. Table 13 summarizes the forecasting scenarios of our case study.

Table 13. Summary of Scenario

| No | Forecasting method | Dependent Variable | Location-Specific | Watershed | Global |
|---|---|---|---|---|---|
| 1 | Linear Regression | Absolute Water-Level | ✓ | ✓ | ✓ |
| 2 | Linear Regression | Relative Water-Level | ✓ | ✓ | ✓ |
| 3 | Linear Regression | Water-Level Delta | ✓ | ✓ | ✓ |
| 4 | ARIMA | Absolute Water-Level | ✓ | | |
| 5 | ARIMA | Relative Water-Level | ✓ | | |
| 6 | ARIMA | Water-Level Delta | ✓ | | |
| 7 | VAR | Absolute Water-Level | | ✓ | |
| 8 | VAR | Relative Water-Level | | ✓ | |
| 9 | VAR | Water-Level Delta | | ✓ | |

4.4.2 Forecasting Methods

In this case study, we use the linear regression, ARIMA, and VAR methods, explained next, to fit the forecasting models to the training data. We used R to conduct the experiments. R is free software that is widely used for statistical computing and graphics. It compiles and runs on UNIX platforms, Windows, and MacOS. We currently use version 3.2.3, which was released on 10 December 2015.


4.4.2.1 Linear Regression

Linear regression is an approach to modeling the relationship between one dependent variable $y$ and $n$ independent variables $x_1, x_2, \dots, x_n$. The linear regression model takes the form:

$$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$

where $\beta_i$, $i = 1, \dots, n$, are the regression coefficients and $\varepsilon$ is the error term or noise.

We use R's lm function to fit linear models.
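As a minimal illustration of fitting such a model with lm, the sketch below simulates data from known coefficients and checks that the fit recovers them. The variable names, sample size, and coefficient values are assumptions chosen for the example, not values from the case study.

```r
set.seed(1)

# Simulated data with known coefficients (2 and -0.5); all values are illustrative.
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# lm() estimates the coefficients by least squares.
# Note: lm() adds an intercept term by default, which the formula in the text omits.
fit <- lm(y ~ x1 + x2)

print(coef(fit))
```

The estimated coefficients for x1 and x2 should be close to the simulated values, and the estimated intercept should be close to zero.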

4.4.2.2 ARIMA (Autoregressive Integrated Moving Average)

ARIMA [47], which stands for Autoregressive Integrated Moving Average, is a forecasting approach for stationary time series that combines autoregressive and moving average models. In an autoregressive model, the forecast value of a variable is predicted from the past values of that variable, while in a moving average model, the forecast value is a regression-like function of past forecast errors. An ARIMA model is not directly applicable to non-stationary time series. If a time series is non-stationary, it must first be transformed into a stationary time series by differencing.

The model was first introduced by statisticians George Box and Gwilym Jenkins in 1970; therefore, it is also commonly known as the Box-Jenkins model. The original approach is an iterative three-stage process: model selection, parameter estimation, and model checking. Recent descriptions also include the additional steps of data preparation and forecasting, making it a five-stage process:

1. Data Preparation

The activities in this stage are transformations and differencing. Transformations, such as taking square roots or logarithms of the measurements, are intended to stabilize the variance in the time series. Differencing, that is, taking the difference between consecutive observations, is intended to ensure that no obvious patterns, such as trend or seasonality, exist in the data.

2. Model Selection

In this stage, we decide whether to include autoregressive terms, moving average terms, or both in the ARIMA process. Graphs such as autocorrelation and partial autocorrelation plots help to make the proper choice in this stage.

3. Parameter Estimation

After a candidate model is chosen, this stage determines the coefficients that best fit the data. Popular estimation methods rely on maximum likelihood estimation or non-linear least squares estimation.

4. Model Checking

Model checking involves testing whether the model conforms to the criteria of a stationary univariate process. This means that the residuals should be independent of each other and constant in mean and variance over time. This can be checked in several ways: (1) plotting the mean and variance of the residuals over time, (2) plotting the residuals' autocorrelation and partial autocorrelation, and (3) performing a Ljung-Box test. If the model does not meet the criteria, another model must be built.

5. Forecasting

If the model satisfies the criteria, it can be used for forecasting.
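The five stages above can be sketched with base R functions only. This is an illustration under stated assumptions: the series is simulated from an AR(1) process rather than taken from the gage data, the order (1, 0, 0) is chosen by hand, and base R's arima() is used in place of the forecast package.

```r
set.seed(42)

# Simulated stationary AR(1) series standing in for a water-level series (assumption).
y <- arima.sim(model = list(ar = 0.6), n = 300)

# 1. Data preparation: the simulated series is already stationary, so d = 0.
#    For a trending series one would difference first, e.g. diff(y).

# 2. Model selection: inspect the autocorrelation and partial autocorrelation.
acf_vals  <- acf(y, plot = FALSE)
pacf_vals <- pacf(y, plot = FALSE)

# 3. Parameter estimation by maximum likelihood for the chosen order.
fit <- arima(y, order = c(1, 0, 0), method = "ML")

# 4. Model checking: Ljung-Box test on the residuals
#    (fitdf = 1 accounts for the single estimated AR coefficient).
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

# 5. Forecasting: point forecasts and standard errors for the next 7 steps.
fc <- predict(fit, n.ahead = 7)

print(coef(fit))
print(lb$p.value)
print(fc$pred)
```

A large Ljung-Box p-value gives no evidence of remaining autocorrelation in the residuals; a small one would send the analyst back to the model selection stage.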

An ARIMA model is written as ARIMA($p$, $d$, $q$), where $p$ is the number of autoregressive terms, $d$ is the degree of differencing, and $q$ is the number of lagged forecast errors in the equation. Let $y'_t$ denote the $d$-th difference of $y_t$, $e_t$ the error term, and $c$ a constant; the equation takes the form:

$$y'_t = c + \alpha_1 y'_{t-1} + \dots + \alpha_p y'_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t$$

The ARIMA model is fitted with the Arima function from the R package forecast, created by Rob J. Hyndman.


4.4.2.3 VAR (Vector Autoregressive)

Vector autoregressive (VAR) models [48] [49] capture linear interdependencies between several variables. In a univariate autoregressive model, a forecast value is a linear function of past observations of the same variable. In a VAR model, by contrast, all variables are treated equally: each variable depends on its own past values and on the past values of the other variables. A basic VAR model consists of a set of $K$ endogenous variables $y_t = (y_{1,t}, y_{2,t}, \dots, y_{K,t})$. The VAR($p$) process, also called a VAR with $p$ lags, is then defined as:

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + u_t$$

where $y_t$ is a $(K \times 1)$ vector, $c$ is a $(K \times 1)$ vector of intercepts, $A_i$ are $(K \times K)$ coefficient matrices for $i = 1, \dots, p$, and $u_t = (u_{1,t}, u_{2,t}, \dots, u_{K,t})$ is a $(K \times 1)$ vector of error terms, identically distributed with mean $E(u_t) = 0$ and positive definite covariance matrix $\Sigma_u = E(u_t u_t^T)$.

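As an example, a VAR(1) model of three variables can be written in matrix form as:

$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} + \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \\ u_{3,t} \end{bmatrix}$$

Alternatively, it can also be written as a system of three equations:

$$y_{1,t} = c_1 + A_{11} y_{1,t-1} + A_{12} y_{2,t-1} + A_{13} y_{3,t-1} + u_{1,t}$$

$$y_{2,t} = c_2 + A_{21} y_{1,t-1} + A_{22} y_{2,t-1} + A_{23} y_{3,t-1} + u_{2,t}$$

$$y_{3,t} = c_3 + A_{31} y_{1,t-1} + A_{32} y_{2,t-1} + A_{33} y_{3,t-1} + u_{3,t}$$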

Each variable has one equation, and each equation shows the interdependencies among all three variables. In this case study, we use R's vars package, version 1.5-2, published in 2013, which provides functions for fitting VAR models.
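To make the system of equations concrete, the sketch below simulates a two-variable VAR(1) process and estimates its coefficient matrix by regressing each variable on the lagged values of all variables. It uses base R's lm() rather than the vars package, and the coefficient matrix A, the noise level, and the sample size are assumptions chosen for the example.

```r
set.seed(7)

K <- 2
n <- 500

# Assumed true coefficient matrix of the VAR(1) process (illustrative values).
A <- matrix(c(0.5, 0.1,
              0.2, 0.3), nrow = K, byrow = TRUE)

# Simulate y_t = A y_{t-1} + u_t with zero intercept.
y <- matrix(0, nrow = n, ncol = K)
for (t in 2:n) {
  y[t, ] <- A %*% y[t - 1, ] + rnorm(K, sd = 0.5)
}

# Regress y_t on y_{t-1}: lm() with a matrix response fits one equation per variable.
y_now <- y[2:n, ]
y_lag <- y[1:(n - 1), ]
fit   <- lm(y_now ~ y_lag)

# coef(fit) has one column per equation; drop the intercept row and transpose
# so that rows index equations and columns index lagged variables, as in A.
A_hat <- t(coef(fit)[-1, ])

print(round(A_hat, 2))
```

Each row of A_hat corresponds to one equation of the system, so the off-diagonal entries measure how strongly one variable's previous value influences the other variable.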


4.4.3 Forecasting Models

Before discussing the forecasting models, we find it useful to summarize the terminology and notations in this section in Table 14.

Table 14. Notation Description

| Notation | Description |
|---|---|
| $w_t$ | Absolute water-level at day $t$. Absolute water-level is the measurement of water surface above sea level. |
| $w'_t$ | Relative water-level at day $t$. Relative water-level is the measurement of water surface above its bottom of channel. It is derived by subtracting the level of bottom of channel from the absolute water-level. |
| $\Delta_t$ | Water-level delta data. It represents the water-level differencing such that $\Delta_t = w_t - w_{t-1}$. |
| $r_t$ | Rainfall amount at day $t$ |
| $w_{i,t}$ | Absolute water-level at location $i$ and day $t$ |
| $w'_{i,t}$ | Relative water-level at location $i$ and day $t$ |
| $\Delta_{i,t}$ | Water-level increase/decrease data at location $i$ and day $t$ |
| $r_{i,t}$ | Rainfall amount at location $i$ and day $t$ |

1. Absolute Water-Level Prediction with Linear Regression Method

This is the basic model, in which we will predict the absolute water level from the rainfall amount observed on the same day and the absolute water level observed on the previous day. Let $r_t$ be the rainfall amount on day $t$, $w_t$ be the absolute water level on day $t$, and $f$ be a linear regression function; the model takes the form:

$$w_t = f(r_t, w_{t-1})$$

2. Relative Water-Level Prediction with Linear Regression Method

In this model, we use the relative water level instead of the absolute water level as the dependent variable. We will predict the relative water level from the rainfall amount observed on the same day and the relative water level observed on the previous day. Let $r_t$ be the rainfall amount on day $t$, $w'_t$ be the relative water level on day $t$, and $f$ be a linear regression function; the model takes the form:

$$w'_t = f(r_t, w'_{t-1})$$

3. Water-Level Delta Prediction with Linear Regression Method

In this model, we will predict the water-level delta from the rainfall amount. Let $r_t$ be the rainfall amount observed on day $t$, $\Delta_t$ be the water-level delta, and $f$ be a linear regression function; the model takes the form:

$$\Delta_t = f(r_t)$$

4. Absolute Water-Level Prediction with the ARIMA method

We will fit an ARIMA model to a training set to predict the absolute water level from the rainfall amount and the past observations of the absolute water level. Let $r_t$ be the rainfall amount observed on day $t$, $\{w_t\}$ be a sequence of absolute water-level observations, where $t \ge 0$, and $f$ be the fitted ARIMA function; the forecast model takes the form:

$$w_t = f(r_t, w_{t-1}, w_{t-2}, w_{t-3}, \dots, w_0)$$

5. Relative Water-Level Prediction with the ARIMA method

We will fit an ARIMA model to a training set to predict the relative water level from the rainfall amount and the past observations of the relative water level. Let $r_t$ be the rainfall amount observed on day $t$, $\{w'_t\}$ be a sequence of relative water-level observations, where $t \ge 0$, and $f$ be the fitted ARIMA function; the forecast model takes the form:

$$w'_t = f(r_t, w'_{t-1}, w'_{t-2}, w'_{t-3}, \dots, w'_0)$$

6. Water-Level Delta Prediction with the ARIMA method

We will fit an ARIMA model to a training set to predict the water-level delta from the rainfall amount and the past observations of the water-level delta. Let $r_t$ be the rainfall amount observed on day $t$, $\{\Delta_t\}$ be a sequence of water-level delta observations, where $t \ge 0$, and $f$ be the fitted ARIMA function; the forecast model takes the form:

$$\Delta_t = f(r_t, \Delta_{t-1}, \Delta_{t-2}, \Delta_{t-3}, \dots, \Delta_0)$$

7. Absolute Water-Level Prediction with the VAR method

We will fit a VAR model to a training set to predict the absolute water level at one location from several independent variables: the rainfall amount at that location and at other locations, and the past observations of the absolute water level at that location and at other locations. Let $n$ be the number of sensors, $r_{i,t}$ be the rainfall amount observed at location $i$ on day $t$, $\{w_{i,t}\}$ be a sequence of absolute water-level observations at sensor location $i$ on day $t$, where $i = 1, 2, \dots, n$ and $t \ge 0$, and $f$ be the fitted VAR function; the forecast model consists of a set of equations of the form:

$$w_{1,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w_{1,t-1}, w_{2,t-1}, \dots, w_{n,t-1}\}, \{w_{1,t-2}, \dots, w_{n,t-2}\}, \dots, \{w_{1,0}, \dots, w_{n,0}\})$$

$$w_{2,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w_{1,t-1}, w_{2,t-1}, \dots, w_{n,t-1}\}, \{w_{1,t-2}, \dots, w_{n,t-2}\}, \dots, \{w_{1,0}, \dots, w_{n,0}\})$$

$$\vdots$$

$$w_{n,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w_{1,t-1}, w_{2,t-1}, \dots, w_{n,t-1}\}, \{w_{1,t-2}, \dots, w_{n,t-2}\}, \dots, \{w_{1,0}, \dots, w_{n,0}\})$$

8. Relative Water-Level Prediction with the VAR method

We will fit a VAR model to a training set to predict the relative water level at one location from several independent variables: the rainfall amount at that location and at other locations, and the past observations of the relative water level at that location and at other locations. Let $n$ be the number of sensors, $r_{i,t}$ be the rainfall amount observed at location $i$ on day $t$, $\{w'_{i,t}\}$ be a sequence of relative water-level observations at location $i$ on day $t$, where $i = 1, 2, \dots, n$ and $t \ge 0$, and $f$ be the fitted VAR function; the forecast model consists of a set of equations of the form:

$$w'_{1,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w'_{1,t-1}, w'_{2,t-1}, \dots, w'_{n,t-1}\}, \{w'_{1,t-2}, \dots, w'_{n,t-2}\}, \dots, \{w'_{1,0}, \dots, w'_{n,0}\})$$

$$w'_{2,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w'_{1,t-1}, w'_{2,t-1}, \dots, w'_{n,t-1}\}, \{w'_{1,t-2}, \dots, w'_{n,t-2}\}, \dots, \{w'_{1,0}, \dots, w'_{n,0}\})$$

$$\vdots$$

$$w'_{n,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{w'_{1,t-1}, w'_{2,t-1}, \dots, w'_{n,t-1}\}, \{w'_{1,t-2}, \dots, w'_{n,t-2}\}, \dots, \{w'_{1,0}, \dots, w'_{n,0}\})$$

9. Water-Level Delta Prediction with the VAR method

We will fit a VAR model to a training set to predict the water-level delta at one location from several independent variables: the rainfall amount at that location and at other locations, and the past observations of the water-level delta at that location and at other locations. Let $n$ be the number of observed sensor locations, $r_{i,t}$ be the rainfall amount observed at location $i$ on day $t$, $\{\Delta_{i,t}\}$ be a sequence of water-level delta observations at location $i$ on day $t$, where $i = 1, 2, \dots, n$ and $t \ge 0$, and $f$ be the fitted VAR function; the forecast model consists of a set of equations of the form:

$$\Delta_{1,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{\Delta_{1,t-1}, \Delta_{2,t-1}, \dots, \Delta_{n,t-1}\}, \{\Delta_{1,t-2}, \dots, \Delta_{n,t-2}\}, \dots, \{\Delta_{1,0}, \dots, \Delta_{n,0}\})$$

$$\Delta_{2,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{\Delta_{1,t-1}, \Delta_{2,t-1}, \dots, \Delta_{n,t-1}\}, \{\Delta_{1,t-2}, \dots, \Delta_{n,t-2}\}, \dots, \{\Delta_{1,0}, \dots, \Delta_{n,0}\})$$

$$\vdots$$

$$\Delta_{n,t} = f(\{r_{1,t}, r_{2,t}, \dots, r_{n,t}\}, \{\Delta_{1,t-1}, \Delta_{2,t-1}, \dots, \Delta_{n,t-1}\}, \{\Delta_{1,t-2}, \dots, \Delta_{n,t-2}\}, \dots, \{\Delta_{1,0}, \dots, \Delta_{n,0}\})$$
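The three dependent variables and the linear regression models 1, 2, and 3 can be sketched for a single gage station as follows. The data frame, its column names, the rainfall and water-level values, and the channel-bottom elevation are all hypothetical and serve only to show how the variables relate to each other.

```r
# Hypothetical daily records for one gage station; all values are made up.
gage <- data.frame(
  rain  = c(0.0, 1.2, 0.4, 0.0, 0.0, 2.0, 0.1, 0.0),          # r_t, inches
  level = c(95.0, 96.1, 96.3, 95.8, 95.4, 97.2, 97.0, 96.2)   # w_t, feet above sea level
)
bottom <- 93.5  # assumed elevation of the channel bottom, in feet

gage$rel       <- gage$level - bottom             # relative water level w'_t
gage$level_lag <- c(NA, head(gage$level, -1))     # w_{t-1}
gage$rel_lag   <- c(NA, head(gage$rel, -1))       # w'_{t-1}
gage$delta     <- gage$level - gage$level_lag     # Delta_t = w_t - w_{t-1}

# lm() drops the first day, which has no previous-day observation.
m1 <- lm(level ~ rain + level_lag, data = gage)   # model 1: w_t  = f(r_t, w_{t-1})
m2 <- lm(rel   ~ rain + rel_lag,   data = gage)   # model 2: w'_t = f(r_t, w'_{t-1})
m3 <- lm(delta ~ rain,             data = gage)   # model 3: Delta_t = f(r_t)

print(coef(m1))
print(coef(m2))
print(coef(m3))
```

The ARIMA and VAR models use the same three dependent variables; they differ in how many past observations, and how many locations, enter the right-hand side.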

4.4.4 Evaluation Measures

We will determine the best fit model for each location-specific, watershed and global forecasting scenario. We evaluate each model based on the fitted and observed values for the training sets and the forecast and observed values for the testing sets. In this case study, the training sets consist of rainfall amounts and water-level measurements captured in the years 2014 and 2015. The test sets use the same type of data captured from January 1, 2016 to June 8, 2016.


The following measurements will be used to evaluate the quality of forecasting models for both training and testing sets:

4.4.4.1 Root Mean Square Error (RMSE)

Root Mean Square Error (RMSE) measures the standard deviation of the residuals (the differences between predicted and actual values). The RMSE indicates the absolute fit of the model to the data. A lower RMSE indicates a better fit. It is computed with the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$$

where $\hat{y}_i$ is the predicted value at the $i$-th observation, $y_i$ is the actual value at the $i$-th observation, and $n$ is the number of observations.

4.4.4.2 Coefficient of Determination

The Coefficient of Determination, also known as $R^2$, measures the proportion of the variance in the dependent variable that can be predicted from the independent variables. It is computed with the following formula:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (\bar{y} - y_i)^2}$$

where $\hat{y}_i$ is the predicted value at the $i$-th observation, $y_i$ is the actual value at the $i$-th observation, $\bar{y}$ is the mean of the actual values, and $n$ is the number of observations. An $R^2$ of 1 indicates that the model perfectly fits the data.
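Both measures translate directly into short R functions. The sketch below is an illustration of the two formulas; the function names rmse and r_squared and the four example values are assumptions, not code from the case study.

```r
# RMSE: square root of the mean squared difference between predicted and actual values.
rmse <- function(actual, predicted) {
  sqrt(mean((predicted - actual)^2))
}

# R^2: one minus the ratio of the residual sum of squares to the total sum of squares.
r_squared <- function(actual, predicted) {
  1 - sum((predicted - actual)^2) / sum((mean(actual) - actual)^2)
}

# Made-up example values.
actual    <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 2.5, 4)

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

For these values the residual sum of squares is 0.5 and the total sum of squares is 5, so the RMSE is the square root of 0.125 and $R^2$ is 0.9.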

4.5 Experimental Method

This section explains how we conducted the experiments for the three forecasting scenarios. We describe how we used R functions to fit the forecasting models explained in section 4.4.3 to the training sets, and we evaluate and compare the resulting forecasting models.

4.5.1 Location-Specific Forecasting Scenario

4.5.1.1 Model Fitting

To fit the linear regression models 1, 2, and 3 to the training set, we use the function lm() from the base R package stats with the following specification:

lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)

Using lm() is straightforward: given a model formula and the training set, it returns the linear model that best fits the training set in the least-squares sense.

Furthermore, to fit the ARIMA models 4, 5, and 6 to the training set, we use the function Arima() from the R package forecast with the following specification:

Arima(x, order=c(0,0,0), seasonal=c(0,0,0), xreg=NULL, include.mean=TRUE, include.drift=FALSE, include.constant, lambda=model$lambda, transform.pars=TRUE, fixed=NULL, init=NULL, method=c("CSS-ML","ML","CSS"), n.cond, optim.control=list(), kappa=1e6, model=NULL)

When using this approach, we need to determine the best parameters for the function Arima() to produce a parsimonious model, that is, the simplest plausible model with the fewest parameters. The function Arima() requires three main parameters: the autoregressive order (p), the degree of differencing (d), and the moving average order (q). In the function specification, these parameters are passed together in the parameter order. We set d to 0, which means that the series of the dependent variable does not need to be differenced because it is stationary, and conducted a parameter-selection experiment that fitted a model for every combination of p and q in the range [0, 5]. Finally, we selected as the best model the one with the minimum value of $(1 - R^2) \times \mathrm{RMSE}$. The experiment pseudocode is given in Figure 32.


    for each dataset in gage stations 2130, 1185, and 370
        for p in (0, 1, 2, 3, 4, 5)
            for q in (0, 1, 2, 3, 4, 5)
                model_result <- Arima(x = dataset$dependent_variable,
                                      xreg = dataset$rainfall_amount,
                                      order = c(p, 0, q), include.mean = TRUE)
                RMSE <- sqrt(model_result$sigma2)
                R2 <- cor(fitted(model_result), dataset$dependent_variable)^2
            end for
        end for
    end for

Figure 32. Pseudocode for Parameter-Selection Experiment.

Table 15 shows the result of the parameter-selection experiment for the forecasting model 4. The selected parameters are highlighted in bold. Based on this criterion, the selected (p,d,q) parameters for the training sets of gage stations 2130, 1185, and 370 were (5,0,3), (5,0,3), and (3,0,4), respectively. The same selection method was applied to the models 5 and 6. Table 16 summarizes the (p,d,q) parameters selected for the models 4, 5, and 6.


Table 15. ARIMA Parameter-Selection Experiments for Model 4

| (p,d,q) | Gage 2130 RMSE | Gage 2130 R2 | Gage 1185 RMSE | Gage 1185 R2 | Gage 370 RMSE | Gage 370 R2 |
|---|---|---|---|---|---|---|
| 0,0,0 | 0.9794 | 0.6018 | 2.7894 | 0.0639 | 1.1843 | 0.5655 |
| 0,0,1 | 0.9330 | 0.6451 | 1.6838 | 0.8170 | 1.1519 | 0.5978 |
| 0,0,2 | 0.9124 | 0.6625 | 1.2573 | 0.8577 | 1.1495 | 0.6011 |
| 0,0,3 | 0.9100 | 0.6637 | 1.1103 | 0.8718 | 1.1495 | 0.6011 |
| 0,0,4 | 0.9098 | 0.6638 | 1.0326 | 0.8796 | 1.1477 | 0.6019 |
| 0,0,5 | 0.9098 | 0.6638 | 1.0109 | 0.8808 | 1.1399 | 0.6069 |
| 1,0,0 | 0.9142 | 0.6612 | 1.1337 | 0.8454 | 1.1492 | 0.6013 |
| 1,0,1 | 0.9110 | 0.6626 | 1.0050 | 0.8786 | 1.1492 | 0.6013 |
| 1,0,2 | 0.9096 | 0.6638 | 0.9857 | 0.8832 | 1.1418 | 0.6048 |
| 1,0,3 | 0.9084 | 0.6648 | 0.9835 | 0.8837 | 1.1415 | 0.6054 |
| 1,0,4 | 0.9079 | 0.6651 | 0.9827 | 0.8839 | 1.1410 | 0.6059 |
| 1,0,5 | 0.9079 | 0.6651 | 0.9814 | 0.8842 | 1.1373 | 0.6087 |
| 2,0,0 | 0.9103 | 0.6630 | 0.9874 | 0.8828 | 1.1492 | 0.6013 |
| 2,0,1 | 0.9096 | 0.6638 | 0.9841 | 0.8835 | 1.1492 | 0.6013 |
| 2,0,2 | 0.9096 | 0.6638 | 0.9960 | 0.8807 | 1.1416 | 0.6053 |
| 2,0,3 | 0.9087 | 0.6643 | 0.9832 | 0.8838 | 1.1413 | 0.6057 |
| 2,0,4 | 0.9084 | 0.6649 | 0.9832 | 0.8838 | 1.1410 | 0.6059 |
| 2,0,5 | 0.9074 | 0.6654 | 0.9831 | 0.8838 | 1.1343 | 0.6112 |
| 3,0,0 | 0.9096 | 0.6637 | 0.9835 | 0.8837 | 1.1484 | 0.6017 |
| 3,0,1 | 0.9096 | 0.6638 | 0.9785 | 0.8849 | 1.1415 | 0.6054 |
| 3,0,2 | 0.9095 | 0.6639 | 0.9822 | 0.8840 | 1.1369 | 0.6086 |
| 3,0,3 | 0.9093 | 0.6640 | 0.9821 | 0.8840 | 1.1398 | 0.6068 |
| 3,0,4 | 0.9078 | 0.6653 | 0.9828 | 0.8839 | **1.1273** | **0.6169** |
| 3,0,5 | 0.9062 | 0.6666 | 0.9776 | 0.8851 | 1.1341 | 0.6111 |
| 4,0,0 | 0.9096 | 0.6638 | 0.9826 | 0.8839 | 1.1431 | 0.6053 |
| 4,0,1 | 0.9081 | 0.6650 | 0.9818 | 0.8841 | 1.1416 | 0.6059 |
| 4,0,2 | 0.9066 | 0.6657 | 0.9784 | 0.8849 | 1.1413 | 0.6061 |
| 4,0,3 | 0.9074 | 0.6661 | 0.9822 | 0.8840 | 1.1331 | 0.6110 |
| 4,0,4 | 0.9061 | 0.6661 | 0.9806 | 0.8844 | 1.1317 | 0.6115 |
| 4,0,5 | 0.9068 | 0.6658 | 0.9776 | 0.8851 | 1.1322 | 0.6123 |
| 5,0,0 | 0.9093 | 0.6640 | 0.9818 | 0.8841 | 1.1407 | 0.6063 |
| 5,0,1 | 0.9078 | 0.6652 | 0.9816 | 0.8841 | 1.1336 | 0.6118 |
| 5,0,2 | 0.9059 | 0.6658 | 0.9746 | 0.8857 | 1.1334 | 0.6117 |
| 5,0,3 | **0.9036** | **0.6679** | **0.9673** | **0.8875** | 1.1301 | 0.6147 |
| 5,0,4 | 0.9060 | 0.6668 | 0.9774 | 0.8852 | 1.1319 | 0.6120 |
| 5,0,5 | 0.9070 | 0.6658 | 0.9746 | 0.8858 | 1.1310 | 0.6126 |

Table 16. Selected ARIMA (p,d,q) parameters

| Model | Gage 2130 | Gage 1185 | Gage 370 |
|---|---|---|---|
| 4 | (5,0,3) | (5,0,3) | (3,0,4) |
| 5 | (5,0,3) | (5,0,3) | (3,0,4) |
| 6 | (5,0,5) | (4,0,3) | (5,0,4) |
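The parameter-selection loop of Figure 32 can be sketched as runnable R under some stated assumptions. The sketch uses base R's arima() as a stand-in for forecast's Arima(), simulated rainfall and water-level series instead of the gage data, and a reduced grid of p and q in 0 to 2 to keep the run short. Because base arima() has no fitted() method, the fitted values are computed as the observations minus the residuals.

```r
set.seed(3)

# Simulated stand-ins for one gage station's training data (assumption).
n     <- 250
rain  <- rexp(n, rate = 5)                                            # rainfall, inches
level <- 3 + 2 * rain + as.numeric(arima.sim(list(ar = 0.5), n = n))  # water level

# One row per (p, q) combination; d is fixed at 0 as in the text.
results <- expand.grid(p = 0:2, q = 0:2)
results$rmse <- NA_real_
results$r2   <- NA_real_

for (k in seq_len(nrow(results))) {
  fit <- tryCatch(
    arima(level, order = c(results$p[k], 0, results$q[k]),
          xreg = rain, include.mean = TRUE),
    error = function(e) NULL   # skip combinations that fail to fit
  )
  if (is.null(fit)) next
  fitted_vals     <- level - as.numeric(fit$residuals)
  results$rmse[k] <- sqrt(fit$sigma2)
  results$r2[k]   <- cor(fitted_vals, level)^2
}

# Selection criterion from the text: minimise (1 - R^2) * RMSE.
results$score <- (1 - results$r2) * results$rmse
best <- results[which.min(results$score), ]

print(best)
```

Because the criterion multiplies two in-sample fit measures, it tends to favour larger orders; the selected orders in Table 16 are consistent with that tendency.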


4.5.1.2 Forecasting

R's function predict.lm() from the package stats was used to predict the values of the dependent variables for the models 1, 2, and 3 with the following specification:

predict.lm(object, newdata, se.fit = FALSE, scale = NULL, df = Inf, interval = c("none", "confidence", "prediction"), level = 0.95, type = c("response", "terms"), terms = NULL, na.action = na.pass, pred.var = res.var/weights, weights = 1, ...)

We used the fitted model and the testing set as the input parameters object and newdata, respectively.

The function forecast() from the R package forecast was used to predict the values of the dependent variables in the models 4, 5, and 6 with the following specification:

forecast(object, h = ifelse(frequency(object) > 1, 2 * frequency(object), 10), level=c(80,95), fan=FALSE, robust=FALSE, lambda=NULL, find.frequency=FALSE, allow.multiplicative.trend=FALSE, ...)

We used the ARIMA model generated in the fitting process as the input parameter object. The forecast() function returns an object with several elements: the point forecasts and the lower and upper limits of the 80% and 95% prediction intervals. The point forecasts were used as the forecast values to analyze the model performance.
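The forecasting step can be illustrated with base R alone. The sketch below is a simplified stand-in, not the case-study code: it uses predict() on a base arima() fit instead of forecast(), a one-predictor linear model, and simulated training data with three made-up test-day rainfall values.

```r
set.seed(11)

# Simulated training data and a small hypothetical test set (assumption).
train <- data.frame(rain = rexp(100, rate = 5))
train$level <- 3 + 2 * train$rain + rnorm(100, sd = 0.2)
test <- data.frame(rain = c(0.0, 0.5, 1.0))

# Linear model: predictions for the test set via the newdata argument.
lin_fit <- lm(level ~ rain, data = train)
lin_fc  <- predict(lin_fit, newdata = test)

# ARIMA with rainfall as external regressor: future rainfall is passed as newxreg.
ar_fit <- arima(train$level, order = c(1, 0, 0), xreg = train$rain)
ar_fc  <- predict(ar_fit, n.ahead = nrow(test), newxreg = test$rain)

print(lin_fc)
print(ar_fc$pred)   # point forecasts
print(ar_fc$se)     # standard errors of the forecasts
```

In both cases the rainfall values for the forecast days must be supplied, since rainfall enters the models as an explanatory variable.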

4.5.2 The Addicks Reservoir Watershed Water-Level Forecasting

4.5.2.1 Model Fitting

As in location-specific water-level forecasting, the function lm() from the base R package stats was used to fit the linear models 1, 2, and 3. For the models 7, 8, and 9, the function VAR() from the R package vars was used to fit the VAR models to the training set with the following specification:

VAR(y, p = 1, type = c("const", "trend", "both", "none"), season = NULL, exogen = NULL, lag.max = NULL, ic = c("AIC", "HQ", "SC", "FPE"))

We also performed a parameter-selection experiment to determine the best combination of the VAR() parameters type (the type of deterministic regressors) and p (the lag order). The possible type values are "const", "trend", "both", and "none"; the p value was chosen in the range [1, 10].

In the forecasting model 7, type "trend" and type "none" produced the highest $R^2$ values, larger than 0.99 for all values of p. We then compared the RMSE values: the maximum RMSE difference between the VAR models with type "trend" and type "none" was 0.0125 feet, which we considered insignificant. Therefore, type = "none" was considered a better option than type = "trend" because it does not add any deterministic regressor to the VAR model, so the number of parameters in the model does not grow.

Although type "none" led to higher $R^2$ values, type "const" and type "both" gave lower RMSE values. For the same gage, the difference between the lowest RMSE with type = "none" and the lowest RMSE with type = "const" or type = "both" ranged from 0.0003 feet to 0.0271 feet. We considered these differences insignificant; hence, type "none" was still considered the better parameter setting. Finally, we chose type = "none" and p = 10, which gave the lowest RMSE for almost all gage stations, as the best-fit parameter settings for the VAR() function. Table 17 shows the RMSE and $R^2$ results of the parameter-selection experiment with type "none" and type "both".

Similar parameter-selection experiments were performed for the models 8 and 9. For the model 8, the best parameter setting was type = "none" and p = 10, while for the model 9, the best parameter setting was type = "both" and p = 10.
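The idea behind this experiment, comparing in-sample RMSE across lag orders and deterministic terms, can be sketched without the vars package. The sketch below fits each VAR(p) by per-equation least squares with base R's lm(), compares only "no intercept" against "intercept" (loosely corresponding to type "none" and type "const"), and uses a simulated two-variable series; the helper name var_rmse and all numeric settings are assumptions.

```r
set.seed(5)

# Simulated two-variable series standing in for water levels at two gages (assumption).
n <- 300
K <- 2
y <- matrix(0, nrow = n, ncol = K)
for (t in 2:n) {
  y[t, ] <- c(0.6, 0.3) * y[t - 1, ] + rnorm(K, sd = 0.4)
}

# In-sample RMSE per variable for a VAR(p) fitted by per-equation least squares.
var_rmse <- function(y, p, intercept) {
  k      <- ncol(y)
  lagged <- embed(y, p + 1)                  # columns: y_t, y_{t-1}, ..., y_{t-p}
  now    <- lagged[, 1:k, drop = FALSE]      # current values
  past   <- lagged[, -(1:k), drop = FALSE]   # all lagged values
  fit    <- if (intercept) lm(now ~ past) else lm(now ~ past - 1)
  sqrt(colMeans(residuals(fit)^2))
}

# Grid over lag order and presence of an intercept.
grid <- expand.grid(p = 1:3, intercept = c(FALSE, TRUE))
grid$mean_rmse <- mapply(
  function(p, intercept) mean(var_rmse(y, p, intercept)),
  grid$p, grid$intercept
)

print(grid)
```

For a fixed lag order, adding an intercept can only lower or keep the in-sample RMSE, which is in line with the observation that "const" and "both" gave slightly lower RMSE values than "none".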


Table 17. VAR Parameters Selection Experiment for Model 7 type = none (above), type = both (below) P

Gage 1 2 3 4 5 6 7 8 9 10

2110 0.99932 0.99941 0.99943 0.99944 0.99945 0.99946 0.99949 0.99950 0.99953 0.99955 2120 0.99997 0.99997 0.99997 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 2130 0.99996 0.99996 0.99997 0.99997 0.99997 0.99997 0.99997 0.99997 0.99997 0.99997

2140 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999

2

R 2150 0.99986 0.99988 0.99988 0.99989 0.99989 0.99989 0.99989 0.99989 0.99990 0.99990 2160 0.99994 0.99995 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996 0.99996 2170 0.99998 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 2180 0.99997 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 0.99998 2190 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 0.99999 2110 2.07572 1.95522 1.94357 1.94050 1.94085 1.95097 1.92289 1.92020 1.88126 1.87187 2120 0.57315 0.54598 0.50733 0.50053 0.49674 0.48937 0.48332 0.48275 0.48540 0.48184 2130 0.66203 0.63002 0.60207 0.59711 0.59568 0.57746 0.57373 0.57033 0.57296 0.56827

2140 0.45794 0.44243 0.42262 0.41971 0.41834 0.40462 0.40521 0.39993 0.39789 0.39720 2150 1.17695 1.11964 1.11745 1.10934 1.10808 1.10864 1.11826 1.11889 1.12755 1.13480

RMSE 2160 0.75771 0.74365 0.71242 0.70828 0.69936 0.68978 0.69180 0.69450 0.70269 0.69887 2170 0.44630 0.42958 0.41070 0.40629 0.40348 0.39630 0.39225 0.38940 0.39178 0.38757 2180 0.78406 0.71787 0.67600 0.65420 0.64017 0.63378 0.63053 0.62929 0.62536 0.61903 2190 0.39166 0.36895 0.35119 0.34040 0.33741 0.33329 0.33393 0.33365 0.33382 0.33603

                                      p
Metric  Gage  1       2       3       4       5       6       7       8       9       10
R2      2110  0.9473  0.9542  0.9560  0.9569  0.9578  0.9584  0.9606  0.9617  0.9642  0.9655
R2      2120  0.8958  0.9048  0.9160  0.9193  0.9223  0.9248  0.9285  0.9299  0.9309  0.9331
R2      2130  0.8619  0.8708  0.8792  0.8806  0.8827  0.8897  0.8947  0.8979  0.8989  0.9018
R2      2140  0.8951  0.9021  0.9087  0.9108  0.9133  0.9193  0.9203  0.9232  0.9250  0.9270
R2      2150  0.8496  0.8676  0.8713  0.8757  0.8791  0.8819  0.8828  0.8856  0.8867  0.8884
R2      2160  0.8592  0.8657  0.8787  0.8820  0.8872  0.8926  0.8949  0.8970  0.8974  0.9010
R2      2170  0.9232  0.9296  0.9359  0.9378  0.9392  0.9424  0.9455  0.9475  0.9482  0.9503
R2      2180  0.7697  0.8178  0.8347  0.8485  0.8576  0.8639  0.8680  0.8723  0.8765  0.8826
R2      2190  0.9349  0.9433  0.9485  0.9524  0.9538  0.9568  0.9579  0.9589  0.9598  0.9601
RMSE    2110  2.0708  1.9507  1.9342  1.9354  1.9391  1.9489  1.9202  1.9193  1.8800  1.8707
RMSE    2120  0.5203  0.5028  0.4779  0.4741  0.4712  0.4691  0.4636  0.4651  0.4683  0.4672
RMSE    2130  0.5786  0.5661  0.5541  0.5576  0.5597  0.5498  0.5440  0.5428  0.5480  0.5477
RMSE    2140  0.4102  0.4007  0.3916  0.3917  0.3911  0.3823  0.3845  0.3825  0.3833  0.3836
RMSE    2150  1.1782  1.1183  1.1153  1.1091  1.1074  1.1082  1.1184  1.1199  1.1291  1.1362
RMSE    2160  0.7429  0.7340  0.7058  0.7046  0.6974  0.6892  0.6905  0.6930  0.7014  0.6988
RMSE    2170  0.4287  0.4150  0.4006  0.3993  0.3998  0.3941  0.3882  0.3866  0.3893  0.3866
RMSE    2180  0.7720  0.6946  0.6693  0.6486  0.6366  0.6302  0.6287  0.6265  0.6246  0.6176
RMSE    2190  0.3722  0.3514  0.3387  0.3296  0.3290  0.3221  0.3219  0.3226  0.3236  0.3270


4.5.2.2 Forecasting

The function predict.lm() from the base R package stats was used to predict the dependent variables in forecasting models 1, 2, and 3, and the function predict() from the R package vars was used to predict the dependent variables in models 7, 8, and 9, with the following specification:

predict(object, ..., n.ahead = 10, ci = 0.95, dumvar = NULL)

The function predict() generates three outputs: the forecast values and the upper and lower bounds of the specified forecast confidence interval. In this case study, the forecast values were used to analyze the model performance.
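As a rough illustration of what a 10-step-ahead forecast (n.ahead = 10) involves, the sketch below iterates a first-order VAR recursion in plain Python. It is not the vars implementation: the function name var1_forecast, the VAR(1) form, and all coefficient values are assumptions chosen for illustration, and the confidence bounds that predict() also returns are omitted.

```python
def var1_forecast(intercept, coefficients, last_observation, n_ahead=10):
    """Iterate y[t+1] = intercept + coefficients @ y[t] for n_ahead steps.

    Only point forecasts are produced; interval bounds are not computed.
    """
    size = len(last_observation)
    current = list(last_observation)
    forecasts = []
    for _ in range(n_ahead):
        # Each new forecast is computed from the previous forecast vector.
        current = [
            intercept[i] + sum(coefficients[i][j] * current[j] for j in range(size))
            for i in range(size)
        ]
        forecasts.append(current)
    return forecasts


# Hypothetical two-gage system; none of these numbers come from the thesis.
intercept = [1.0, 2.0]
coefficients = [[0.5, 0.0],
                [0.0, 0.5]]
last_observation = [0.0, 0.0]

forecasts = var1_forecast(intercept, coefficients, last_observation, n_ahead=10)
for step, values in enumerate(forecasts, start=1):
    print(step, [round(v, 4) for v in values])
```

Because each step feeds on the previous forecast rather than on an observation, errors can accumulate as the horizon grows, which is one reason the interval bounds matter in practice.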

4.5.3 Global Water-Level Forecasting

The functions used for model fitting and forecasting in this analysis were the same functions used to fit the linear models in the location-specific analysis: lm() and predict.lm() from the base R package stats.
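The fit-then-predict workflow of lm() and predict.lm() can be sketched outside R as ordinary least squares followed by evaluation of the fitted equation. The plain-Python sketch below assumes a model with an intercept and two predictors; the function names fit_linear_model and predict_linear_model, the predictors, and the synthetic data are hypothetical and are not taken from the case study.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Augmented system [X'X | X'y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * t for d, t in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # Returns [intercept, slope_1, slope_2, ...].
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict_linear_model(coefficients, rows):
    """Evaluate intercept + sum(slope * predictor) for each row."""
    intercept, *slopes = coefficients
    return [intercept + sum(s * x for s, x in zip(slopes, row)) for row in rows]


# Synthetic, noise-free data generated from y = 0.5 + 1.8*x1 + 1.0*x2.
rows = [(0.1, 10.0), (0.3, 10.2), (0.0, 10.5), (0.6, 10.4), (0.2, 11.0)]
targets = [0.5 + 1.8 * x1 + 1.0 * x2 for x1, x2 in rows]

coefficients = fit_linear_model(rows, targets)
predictions = predict_linear_model(coefficients, [(0.4, 10.8)])
print([round(c, 4) for c in coefficients], [round(p, 4) for p in predictions])
```

Since the synthetic data contain no noise, the fit recovers the generating equation; with real gage data the residuals would be nonzero and would feed into the RMSE and R2 values.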

4.6 Experimental Results and Discussions

4.6.1 Location-Specific Water-Level Forecasting Scenario

Figure 33 and Figure 34 show the RMSE and R2 values of each model after the linear and ARIMA models were fitted to the training sets and evaluated on the testing sets of gage stations 2130, 1185, and 370; the green triangles represent the results for the testing sets and the blue circles represent the results for the training sets.

Figure 33 and Figure 34 show that forecasting model 3 had the highest RMSE and the lowest R2 at all gage locations when compared with models 1 and 2. This was also the case for model 6, which had the highest RMSE and the lowest R2 when compared with models 4 and 5. However, an exception occurred in the results of the ARIMA forecasting models 4, 5, and 6. Based on these observations, we conclude that the water-level delta forecasting models produced lower-quality results than the absolute and relative water-level forecasting models.

The comparison of the linear regression models 1 and 2 with the ARIMA models 4 and 5 shows that, for absolute and relative water-level forecasting, the linear models generated better results (lower RMSE and higher R2) than the ARIMA models. We also observed that the RMSE and R2 values were the same within each pair of models, that is, for models 1 and 2 and for models 4 and 5.

In addition, the comparison of the testing and training results in Figure 33 shows an expected result: the testing sets have larger error variances than the training sets. However, Figure 34 shows a surprising result: the testing sets have higher R2 than the training sets, except for the ARIMA models 4 and 5 at gage location 1185.
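Both metrics can be computed directly from observed and predicted water levels. The sketch below assumes the usual definitions, RMSE = sqrt(SSE / n) and R2 = 1 - SSE / SST; whether R2 was computed this way in the case study is an assumption, and the water levels are made up for illustration. It also shows one way a testing set can have both a larger RMSE and a higher R2 than a training set: when the testing observations vary much more (larger SST), the same size of error leaves a larger share of the variance explained. This is offered only as a possible explanation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (ft here)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, defined as 1 - SSE / SST."""
    mean_observed = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical water levels in ft; not taken from the HCFWS datasets.
# The training period is calm, the testing period contains a flood peak.
train_observed = [10.0, 10.2, 10.1, 10.3, 10.2]
train_predicted = [10.1, 10.1, 10.2, 10.2, 10.3]
test_observed = [10.0, 12.0, 15.0, 13.0, 11.0]
test_predicted = [10.3, 11.7, 14.6, 13.4, 10.8]

print("train", round(rmse(train_observed, train_predicted), 4),
      round(r_squared(train_observed, train_predicted), 4))
print("test ", round(rmse(test_observed, test_predicted), 4),
      round(r_squared(test_observed, test_predicted), 4))
```

In this toy case the testing errors are about three times larger than the training errors, yet the testing R2 is far higher because the flood peak inflates SST.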

Figure 33. RMSE of Location-Specific Forecasting Scenario. [Plot not reproduced: one panel each for gage stations 2130, 1185, and 370; x-axis: forecasting model (1 to 6); y-axis: root mean square error (in ft); series: testing and training.]

Figure 34. R2 of Location-Specific Forecasting Scenario. [Plot not reproduced: one panel each for gage stations 2130, 1185, and 370; x-axis: forecasting model (1 to 6); y-axis: R-squared; series: testing and training.]

4.6.2 Addicks Reservoir Watershed Water-Level Forecasting Scenario

We analyzed the performance of the models in this scenario in several ways. The first analysis compares all models belonging to this scenario at all gage locations in the watershed using the training sets. Figure 35 shows the performance of each forecasting model at each gage location. The circles and triangles represent the linear models 1, 2, and 3 and the VAR models 7, 8, and 9, respectively. The blue, green, and red lines represent the absolute water-level forecasting models 1 and 7, the relative water-level forecasting models 2 and 8, and the water-level delta forecasting models 3 and 9, respectively.

Similar to the result in the location-specific forecasting scenario, the water-level delta forecasting models, which are indicated by red lines, performed the worst. Compared with the linear models 1 and 2, the linear model 3 has lower R2 and higher RMSE. Likewise, the VAR model 9 has lower R2 and higher RMSE than the VAR models 7 and 8. The figure also shows that the VAR models, indicated by the triangles, perform better than the linear models, which are indicated by circles: the R2 values of the VAR models are higher than those of the linear models and, correspondingly, the RMSE values of the VAR models are lower.

Figure 35. Comparison of Addicks Watershed Forecasting Scenario at Each Location for Training Sets. [Plot not reproduced: panels: R-squared and RMSE (in ft), each by gage station (2110 to 2190); series: Linear Models 1, 2, and 3 and VAR Models 7, 8, and 9.]

The second analysis is the comparison of all models belonging to this scenario in all gage locations in the watershed using testing sets. Figure 36 shows the performance of each forecasting model at each gage location. Using the testing sets, the VAR models for absolute and relative water-level forecasting, represented by the blue and green triangles, are worse than the linear models for absolute and relative water-level forecasting, represented by the blue and green circles. This is indicated by the VAR models’ higher RMSE and lower R2 values.

Figure 36. Comparison of Addicks Watershed Forecasting Scenario at Each Location for Testing Sets. [Plot not reproduced: panels: R-squared and RMSE (in ft), each by gage station (2110 to 2190); series: Linear Models 1, 2, and 3 and VAR Models 7, 8, and 9.]

We also compared the performance of each method across the training and testing sets. Figure 37 shows the performance of each model: the top graph shows the absolute water-level forecasting models 1 and 7, the middle graph shows the relative water-level forecasting models 2 and 8, and the bottom graph shows the water-level delta forecasting models 3 and 9.

Based on the graphs, the VAR models, indicated by the blue lines, consistently performed better in the training phase than in the testing phase. The linear models, indicated by the green lines, show less consistent behavior: as expected, in most locations they have larger RMSE for the testing sets, but their R2 values are better for the testing sets.

Figure 37. Comparison of Addicks Watershed Forecasting Scenario in Training and Testing Phase: Absolute Water-Level Forecasting Models (above); Relative Water-Level Forecasting Models (middle); Water-Level Delta Forecasting Models (below). [Plot not reproduced: each row shows R-squared and RMSE by gage station (2110 to 2190) for the testing and training results of one linear model and one VAR model.]

4.6.3 Global Water-Level Forecasting Scenario

Figure 38 shows the comparison of the three linear models generated in this scenario; the green triangles represent the results for the testing sets and the blue circles represent the results for the training sets. The linear model 1 has the highest R2 value for both the training and testing sets. As in the location-specific and Addicks Reservoir watershed forecasting scenarios, model 3 has the lowest R2. However, unlike in the previous scenarios, model 3 has a lower RMSE for the training set than models 1 and 2.

Figure 38. Global Forecasting Scenario. [Plot not reproduced: panels: R-squared and root mean square error (in ft); x-axis: forecasting model (1 to 3); series: testing and training.]

We compared the coefficients of the linear models obtained for this scenario. Surprisingly, the linear models generated for this scenario are almost identical to the linear models created in the forecasting scenario for the Addicks Reservoir watershed. Table 18 shows the error terms and coefficients of the linear models in both scenarios.


Table 18. Coefficients of the Linear Models in the Global and the Addicks Reservoir Watershed Scenario

                      Global Forecasting Scenario               Addicks Watershed Forecasting Scenario
Forecasting Model     Error Term  Coefficient 1  Coefficient 2  Error Term  Coefficient 1  Coefficient 2
Absolute Water-Level  -0.2804     1.7997         0.9987         0.2045      1.7987         0.9950
Relative Water-Level  0.3184      1.8666         0.8379         -0.03989    1.83733        0.91460
Water-Level Delta     -0.341      1.794          -              -0.3218     1.7955         -
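To make the comparison in Table 18 concrete, the sketch below computes the absolute difference between the Global and Addicks values in each column. The dictionary layout and the function name column_gaps are illustrative choices, and the placement of the two Water-Level Delta values per scenario under Error Term and Coefficient 1 is an assumption about the flattened table.

```python
# Values copied from Table 18 as (error term, coefficient 1, coefficient 2).
# The Water-Level Delta rows list only two values per scenario; assigning them
# to the error term and coefficient 1 is an assumption.
TABLE_18 = {
    "Absolute Water-Level": {
        "global": (-0.2804, 1.7997, 0.9987),
        "addicks": (0.2045, 1.7987, 0.9950),
    },
    "Relative Water-Level": {
        "global": (0.3184, 1.8666, 0.8379),
        "addicks": (-0.03989, 1.83733, 0.91460),
    },
    "Water-Level Delta": {
        "global": (-0.341, 1.794),
        "addicks": (-0.3218, 1.7955),
    },
}

COLUMNS = ("Error Term", "Coefficient 1", "Coefficient 2")


def column_gaps(model_name):
    """Return {column: |global - addicks|} for one forecasting model."""
    row = TABLE_18[model_name]
    return {
        column: abs(g - a)
        for column, g, a in zip(COLUMNS, row["global"], row["addicks"])
    }


for name in TABLE_18:
    gaps = column_gaps(name)
    print(name, {column: round(gap, 4) for column, gap in gaps.items()})
```

Under this reading, the Coefficient 1 values differ by less than 0.03 in every row, while the error terms of the absolute and relative water-level models differ more noticeably.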

Figure 39 shows the performance of all linear models for both the Global and the Addicks Reservoir watershed forecasting scenarios in the testing phase. If we compare the models by dependent variable, the relative water-level forecasting models, indicated by green lines, show less error variance in most locations. In addition, the relative water-level forecasting model for the global forecasting scenario, indicated by green crosses, has lower error variances than the one for the Addicks Reservoir watershed forecasting scenario.

Figure 39. Linear Models Performance in the Global and the Addicks Watershed Forecasting Scenario. [Plot not reproduced: panels: R-squared and RMSE, each by gage station (2110 to 2190); series: Addicks Linear Models 1, 2, and 3 and Global Linear Models 1, 2, and 3.]

Finally, we compared the performance of all models across the different scenarios, namely, the Location-Specific, the Addicks Reservoir Watershed, and the Global forecasting scenarios. Figure 40 shows the performance of all models for gage location 2130. The left graph shows the performance of all models in the training phase and the right graph shows their performance in the testing phase. In the training phase, the VAR models in the Addicks Watershed forecasting scenario show the best performance. Different results were obtained in the testing phase, where the absolute and relative water-level forecasting models in the Location-Specific scenario have the best performance.

Figure 40. Performance of All Forecasting Models in Different Scenarios. [Plot not reproduced: Training and Testing panels showing R-squared and RMSE (in ft) by dependent variable (Abs WL, Rel WL, Delta); series: Particular Loc-Linear, Addicks-Linear, Addicks-VAR, and Global-Linear.]

5 Conclusion

This thesis surveyed recent research on flood-related problems and computational methods for flood forecasting, flood warning and monitoring, and flood-response management. Secondly, the thesis investigated water-level forecasting techniques relying on a regression approach; namely, linear regression, VAR, and ARIMA models were investigated to forecast water levels of streams and rivers. Finally, the investigated techniques were applied to and evaluated on Harris County Flood Warning System datasets.

The existing computational tools used in flood management, specifically for flood forecasting, rely heavily on complex physics and mathematical equations representing the dynamics of the atmosphere and of water flow. In the U.S., these forecasting tools are mainly managed and operated by the federal government, namely the NOAA and its sub-agencies. However, the public is able to access the tools or their results from the NOAA’s website. Some forecasting models are also made freely available for academic, government, and other not-for-profit research purposes.

The flood warning or monitoring systems are mainly managed and operated by local governments, such as county or city administrations. These systems rely heavily on weather sensors placed in strategic locations in the local area. The sensors measure rainfall amount, precipitation, streamflow, water level, and other quantities, and transmit the data to a centralized storage or system during rainfall or storms. Through a website or a mobile application, the public or the designated users of the warning/monitoring system can access flood information easily.

Water-level forecasting techniques were explored for three forecasting scenarios in this thesis. We developed forecasting models that predict the water level at a particular location relying solely on local information, forecasting models that predict water levels in a watershed, namely the Addicks Reservoir watershed, and global, non-location-specific forecasting models for Harris County. To obtain the forecasting models, three different regression methods were investigated: linear regression, VAR, and ARIMA forecasting models.

We compared the forecasting models obtained for the different forecasting scenarios and methods in the case study. The comparison across dependent variables shows that the forecasting models which predict the water-level delta performed the worst among the forecasting models. We also compared the models’ performance across methods. The VAR and ARIMA models are considerably more complex approaches than the linear model. In the location-specific scenario, the linear models did better than the ARIMA models. Meanwhile, in the Addicks Reservoir scenario, the VAR models performed better than the linear models only in the training phase. Lastly, a comparison using the gage 2130 dataset shows that the VAR model performed better than any other model in the training phase, but on the testing set it was outperformed by the linear model for the location-specific scenario.

The case study conducted in this thesis focuses on the comparison of different forecasting techniques for different forecasting scenarios involving a particular location, a watershed, and the whole area of Harris County. Recommendations for future work include further investigation of area-based water-level forecasting models. An area-based forecasting model is a model built from the input parameters of neighboring rivers, which is similar to our forecasting models for the Addicks Reservoir watershed. A comparison of the performance of the Addicks Reservoir watershed forecasting models with linear models built from each sensor’s past water-level data is needed to measure the effectiveness of this “local watershed” approach. Moreover, it would be interesting to investigate why the VAR model performed better only on the training sets but not on the testing sets.

97

References

[1] World Meteorological Organization, Atlas of mortality and economic losses from weather, climate and water extremes (1970–2012), vol. 1123, World Meteorological Organization, 2014. [2] National Weather Service, "Flood Related Hazards," [Online]. Available: http://www.floodsafety.noaa.gov/hazards.shtml. [3] "Flood Types," FLOODSite, 2008. [Online]. Available: http://www.floodsite.net/juniorfloodsite/html/en/student/thingstoknow//floodtype s.html. [4] National Weather Service, "Summary of Natural Hazard Statistics for 2014 in the United States," [Online]. Available: http://www.nws.noaa.gov/om/hazstats/sum14.pdf.

[5] National Weather Service, "Summary of Natural Hazard Statistics for 2015 in the United States," [Online]. Available: http://www.nws.noaa.gov/om/hazstats/sum15.pdf. [6] "Coastal Flood," [Online]. Available: https://en.wikipedia.org/wiki/Coastal_flood. [7] National Oceanic and Atmospheric Administration, "Storm Surge and Coastal Inundation," [Online]. Available: http://www.stormsurge.noaa.gov/. [8] R. D. Knabb, J. R. Rhome and D. P. Brown, "Tropical Cyclone Report : Hurricane Katrina 23-30 August 2005," 2005. [Online]. Available: http://www.nhc.noaa.gov/data/tcr/AL122005_Katrina.pdf. [9] Floodsite Project, "Ponding (or Pluvial Floods)," Floodsite Project, 2008. [Online]. Available: http://www.floodsite.net/juniorfloodsite/html/en/student/thingstoknow/hydrology/ponding. html. [10] "PIRE – Coastal Flood Risk Reduction Program," Texas A&M University at Galveston, [Online]. Available: http://www.tamug.edu/ctbs/PIRE/. [11] "PIRE5 (2015) Award Information," The National Science Foundation, [Online]. Available: http://www.nsf.gov/od/oise/pire-2015-list.jsp.

98

[12] "2015 Annual Report Houston-Galveston Area Protection System," 2015. [Online]. Available: http://sspeed.rice.edu/sspeed/downloads/HGAPS_Report_08_31_15.pdf. [13] J. Boyd, "Rice wins $3.1M to develop storm strategy for Houston-Galveston," Rice University News & Media, Houston, 2014. [14] E. S. Blake and E. J. Gibney, "The Deadliest, Costliest, and Most Intense United States Tropical Cyclones from 1851 to 2010 (And Other Frequently Requested Hurricane Facts)," 2011. [Online]. Available: http://www.nhc.noaa.gov/pdf/nws-nhc-6.pdf. [15] A. Sebastian, J. Proft, J. C. Dietrich, W. Du, P. Bedient and C. N. Dawson, "Characterizing hurricane storm surge behavior in Galveston Bay using the SWAN + ADCIRC model," Coastal Engineering, vol. 88, p. 171–181, 2014. [16] "Centers of Excellence Network - University Programs," 2015. [Online]. Available: https://www.hsuniversityprograms.org/. [17] Coastal Hazards Center, "Coastal Hazards Center. A U.S. Department of Homeland Security Center of Excellence.," 2012. [Online]. Available: http://coastalhazardscenter.org/. [18] Coastal Resilience Center, "Coastal Resilience Center of Excellence," 2016. [Online]. Available: http://coastalresiliencecenter.unc.edu. [19] American Meteorological Society, "Glossary of Meteorology," cited 2015: Flood Forecasting. [Online]. Available: http://glossary.ametsoc.org/wiki/Flood_forecasting. [20] National Oceanic and Atmospheric Administration, "National Oceanic and Atmospheric Administration - About our Agency," [Online]. Available: http://www.noaa.gov/about-our- agency. [21] National Oceanic and Atmospheric Administration, "SLOSH Model," National Hurricane Center, [Online]. Available: http://www.nhc.noaa.gov/surge/slosh.php. [22] "SLOSH Display Training," 2003. [Online]. Available: http://slosh.nws.noaa.gov/slosh/SLOSH-Display-Training.ppt. [23] B. Glahn, A. Taylor, N. Kurkowski and W. A. Shaffer, "The Role of the SLOSH Model in NWS Storm Surge Forecasting," National Weather Digest, vol. 33, no. 
1, pp. 3-14, 2009. [24] National Oceanic and Atmospheric Administration, "SLOSH Model Coverage," [Online]. Available: http://www.nhc.noaa.gov/surge/images/SLOSHmodelbasincoverage.png.

99

[25] National Oceanic and Atmospheric Administration, "SDP Output Sample," [Online]. Available: http://www.nhc.noaa.gov/surge/images/gl2_ike2008.png. [26] National Oceanic and Atmospheric Administration, "Storm Surge and Coastal Inundation - Models and Observations," [Online]. Available: http://www.stormsurge.noaa.gov/models_observations.html. [27] S.-C. Kim, J. Chen and W. A. Shaffer, "An Operational Forecast Model for Extratropical Storm Surges along the U.S. East Coast," in Conference on Oceanic and Atmospheric Prediction, 1996. [28] A. Taylor, "Extra-Tropical Storm Surge," National Oceanic and Atmospheric Administration, [Online]. Available: http://slosh.nws.noaa.gov/etss2.1/#. [29] Y. Funakoshi, J. Feyen, F. Aikman, H. Tolman, A. van der Westhuysen, A. Chawla, I. Rivin and A. Taylor, "Development of Extratropical Surge and Tide Operational Forecast System (ESTOFS)," in Estuarine and Coastal Modeling, Reston, VA, 2011. [30] "WAVEWATCH III Model," National Weather Service (NWS), National Oceanic and Atmospheric Administration, 2009. [Online]. Available: http://polar.ncep.noaa.gov/waves/wavewatch/. [31] The University of North Carolina at Chapel Hill, "The Official ADCIRC Web Site - ADCIRC," [Online]. Available: http://adcirc.org/.

[32] "SWAN Simulating Waves Nearshore," [Online]. Available: http://swanmodel.sourceforge.net/. [33] M. Zijlema, G. van Vledder, L. Holthuijsen, J. Salmon and P. Smit, "SWAN and its recent developments," in 13th International Workshop on Wave Hindcasting and Forecasting & 4th Coastal Hazard Symposium, Banff, 2013. [34] N. Booij, R. C. Ris and L. H. Holthuijsen, "A Third-generation Wave Model for Coastal Regions 1. Model Description and Validation," Journal of Geophysical Research, vol. 104, no. C4, p. 7649–7666, 1999. [35] "Harris County Flood Warning System," Harris County Flood Control District, [Online]. Available: http://www.harriscountyfws.org. [36] "Harris County Flood Control District," Harris County Flood Control District, 2016. [Online]. Available: https://www.hcfcd.org/.

100

[37] "The Rice University and Texas Medical Center Flood Alert System," Rice University, 1997-2013. [Online]. Available: http://fas3.flood-alert.org/. [38] P. Bedient, N. Fang and B. Vieux, "The Rice University Flood Alert System," [Online]. Available: http://sspeed.rice.edu/sspeed/downloads/Flood_Alert_System_Brochure_Aug2015_Single Pgs.pdf. [39] Z. Fang, P. B. Bedient and B. Buzcu-Guven, "Long-Term Performance of a Flood Alert System and Upgrade to FAS3: A Houston, Texas, Case Study," Journal of Hydrologic Engineering, vol. 16, no. 10, pp. 818-828, 2011.

[40] "Central Texas Hub," [Online]. Available: http://www.centraltexashub.org/. [41] KISTERS North America, Inc., "WISKI," 2013. [Online]. Available: http://www.kisters.net/wiski.html. [42] "OGC® WaterML," Open Geospatial Consortium (2015), 2015. [Online]. Available: http://www.opengeospatial.org/standards/waterml. [43] G. J. Lim, S. Zangeneh, M. R. Baharnemati and T. Assavapokee , "A capacitated network flow optimization approach for short notice evacuation planning," European Journal of Operational Research, vol. 223, no. 1, p. 234–245, 2012. [44] G. J. Lim, M. Rungta and M. R. Baharnemati , " Reliability analysis of evacuation routes under capacity uncertainty of road links," IIE Transactions, vol. 47, no. 1, 2015. [45] "Selenium HQ Browser Automation," [Online]. Available: www.seleniumhq.org. [46] Harris County Flood Control District, "Addicks Reservoir Watershed," November 2013. [Online]. Available: https://www.hcfcd.org/media/1286/addicks-reservoir- watershed110513.pdf. [47] R. J. Hyndman, "Box-Jenkins modelling," 2001. [Online]. Available: http://robjhyndman.com/papers/BoxJenkins.pdf. [48] B. Pfaff, VAR, SVAR and SVEC Models: Implementation Within R Package vars. [49] H. Luetkepohl, "Vector Autoregressive Models," European University Institute, EUI Working Paper ECO 2011/30, 2011. [Online]. Available: http://cadmus.eui.eu/bitstream/handle/1814/19354/ECO_2011_30.pdf?sequence=1.

101