A Markov Decision Process Model for Socio-Economic Systems Impacted by Climate Change

Salman Sadiq Shuvo 1 Yasin Yilmaz 1 Alan Bush 1 Mark Hafen 1

Abstract ing recurring hurricanes, storm surge, and heavy rainfall as Coastal communities are at high risk of natural a result of sea level rise (Wahl et al., 2015). This intensifies hazards due to unremitting global warming and social vulnerability, especially in underprivileged neighbor- sea level rise. Both the catastrophic impacts, e.g., hoods (Kashem et al., 2016; Schrock et al., 2015); induces tidal flooding and storm surges, and the long-term coastal hazards (Nicholls, 2011; Wahl et al., 2014); and impacts, e.g., beach erosion, inundation of low ly- affects regional economies by influencing property values, ing areas, and saltwater intrusion into aquifers, taxation, and the insurance cost (Fu et al., 2016). Due to sea cause economic, social, and ecological losses. level rise, coastal inhabitants are exposed to many of these Creating policies through appropriate modeling consequences and need to develop the adaptive capability of the responses of stakeholders, such as govern- and resilience frameworks to counter these stressors through ment, businesses, and residents, to climate change adequate planning and decision support (Beatley, 2012). and sea level rise scenarios can help to reduce The governments, planners, coastal administrators, and these losses. In this work, we propose a Markov personnel in a variety of agencies require substantial in- decision process (MDP) formulation for an agent formation for effective interaction, decision making, and (government) which interacts with the environ- adaptation planning (Beatley, 2012; Fu et al., 2017; Re- ment (nature and residents) to deal with the im- search Council, 2009). This demands the support of key pacts of climate change, in particular sea level actors to relate the science, the variability, and the hazard rise. Through theoretical analysis we show that a of various outlines to stakeholders (Tribbia & Moser, 2008). reasonable government’s policy on infrastructure Explicitly modeling these agents’ reaction to sea level rise development ought to be proactive and based on scenarios can serve in the creation of strategies tailored to detected sea levels in order to minimize the ex- local impacts and resilience management and requires a pected total cost, as opposed to a straightforward mixture of social engagement and planning tools, including government that reacts to observed costs from scenario planning (Berke & Stevens, 2016; Drogoul, 2015; nature. We also provide a deep reinforcement Potts et al., 2017). learning-based scenario planning tool considering different government and resident types in terms In this paper, for planning sea level rise scenarios, we study of cooperation, and different sea level rise projec- the interactions between a government, residents, and the tions by the National Oceanic and Atmospheric nature around the sea level rise problem. Specifically, we Administration (NOAA). use probabilistic models to simulate their behaviors and in- teractions between them. Focusing on the economic cost of the sea level rise-related natural events (e.g., flooding, hurri- 1. Introduction cane) and government investments (e.g., infrastructure im- provement) we propose a Markov decision process (MDP) The consequences of global warming and sea level rise have framework to analyze the decision policies for government been widely documented, examined, and forecasted (Sal- together with the reactions of the nature and residents. lenger Jr et al., 2012; Hapke et al., 2013; Pachauri et al., We specialize the proposed MDP framework to the sea level 2014; Plant et al., 2016; Hine et al.; Burke et al., 2019). The rise problem and illustrate it using available economic data, coastal communities will undergo multiple threats, includ- sea level projections, and community partner feedbacks 1 1 University of South Florida, Tampa, FL 33620, USA. Corre- for the Tampa Bay region in Florida. However, the proposed spondence to: Yasin Yilmaz . MDP framework can be easily adapted to simulating other

Proceedings of the 37 th International Conference on Machine 1A list of community partners can be seen at Acknowledge- Learning, Online, PMLR 119, 2020. Copyright 2020 by the au- ments. thor(s). An MDP Model for Socio-Economic Systems Impacted by Climate Change natural and socioeconomic systems under the impacts of 1.2. Contributions climate change. In our model, MDP agent is the government in an urban MDP provides a suitable theoretical framework for creating setup, which interacts with the nature and residents by mak- agent-based scenarios (Howard, 1960; Bellman, 1957; Fi- ing investment decisions for the sea level rise problem to lar & Vrieze, 2012). The MDP agent (government in our minimize the incurred economic cost. Our contributions can case) interacts with the environment (nature and residents be summarized as follows. in our case) by taking action at each time and receiving a reward/cost from the environment in return. The objective • We theoretically show that the government should base of the agent is to optimize the return over time by choos- its investment decision on the observed sea level in- ing actions from a list of available actions. Each time, the stead of the incurred cost from the nature. Making system moves to a new state based on the agent’s action investment decisions based on the natural cost corre- according to a distribution. The optimal policy sponds to the straightforward policy that any sensitive determines which action to take in which state by map- but shortsighted government would adopt. Through ping system states to actions. It can be either found using mathematical analysis of the proposed MDP model, or learned from experience using re- we show that proactive actions triggered by the rising inforcement learning (RL) (Sutton & Barto, 2018), which is sea levels are more effective in reducing the cumulative also known as approximate dynamic programming. In deep cost than reactive actions triggered by the cost from , a neural network structure is used the nature. to approximate the expected value for each action at each state. Deep RL is suitable for problems where the system • Using the proposed MDP model we provide a deep states are many or continuous in nature (Mnih et al., 2015). RL-based scenario planning framework that takes into account arbitrary number of (discrete) government ac- 1.1. Related Works tions, continuous sea level rise values, different sea level projections by NOAA, and different government As an impact of climate change, sea level change is neither and resident prototypes in terms of being responsive to globally nor regionally uniform (Hine et al.; Sallenger Jr the sea level rise problem. et al., 2012), and can vary spatially and temporally (Ezer, 2013; Hine et al.; Wahl et al., 2015). This leads to a degree • Using the available economic data and the regionally of uncertainty to the projection of sea level rise impacts adjusted NOAA sea level projections we present sce- at the local level, as well as planning for adaptation and nario simulations for Tampa Bay, FL. For each sce- resilience. The implications of this uncertainty for coastal nario, we consider different government and resident community planning and adaptation are many, and have prototypes using cooperation indices that represent led to a variety of efforts to constrain, refine, and bench- how responsive they are to the sea level rise problem. mark sea level rise projections for purposes of assessing risk Finally, we show that the optimal (proactive) policy (Buchanan et al., 2016; Hinkel et al., 2015; Le Cozannet learned by the proposed deep RL achieves et al., 2017; Sweet & Park, 2014; Tyler & Moench, 2012). 40% less economic cost than the best reactive policy Similarly, in this work, in addition to proposing a general in 100 years in three different sea level projections by scenario planning model, we also tailor our model to a spe- NOAA. cific region, Tampa Bay, FL, using NOAA projections and economic data for the region. The rest of the paper is organized as follows. Section2 Agent-based modeling has been a popular choice for sim- explains the proposed MDP model. In Section 3.1, we char- ulating complex systems including transportation systems acterize the optimal policy through function analysis. Then, (Faboya et al., 2018), disaster recovery (Eid & El-Adaway, we present a deep RL algorithm for finding the optimal 2018), and climate change (Patt & Siebenhuner¨ , 2005). Dif- policy in Section 3.2. The scenario simulations for Tampa ferent than existing works, in this paper we propose a novel Bay region are provided in Section4. Discussions about scenario planning model based on MDP and deep RL for a the proposed model is given in Section5, and the paper is realistic setup with arbitrary number of actions and continu- concluded in Section6. For interested readers, we provide ous sea level rise values. Furthermore, as a major contribu- the proofs in the accompanying supplementary file. tion, we theoretically analyze the optimal decision policy, and illustrate the proposed model for the Tampa Bay region. 2. MDP Problem Formulation We propose an MDP framework to model the behaviors of stakeholders (government, residents, and nature), and the in- teractions between them. As shown in Figure1, government, An MDP Model for Socio-Economic Systems Impacted by Climate Change

Markov State

Sn = (sn−1 + xn, ℓn−1 + rn) Table 1. Model parameters.

Sn Initial sea level `0 ≥ 0 xn rn ≥ 0 Sea level rise at time n rn ≥ 0 Pn Agent Environment Sea level at time n `n = `0 + m=1 rm Initial infrastructure state s0 ∈ {1, 2, ...} Government Nature Residents xn n−1 zn sn 1, ℓn 1 ≥ yn xm, zm ∈ , Infrastructure decision at time n xn ∈ {0, 1, . . . , q} xn ∈ {0, 1, . . . , q} ( − − ) 0 { }m=1 {0 1} Pn Infrastructure state at time n sn = s0 + m=1 xm xn zn yn Residents’ decision at time n yn ∈ {0, 1} cn Cost from nature at time n zn ≥ 0 Cost Total cost at time n cn = αxn − βyn + zn cn(xn, yn, zn) ≥ 0

2.1. Government Model Figure 1. Proposed MDP framework. At each time step, e.g., a year, the government decides the degree of its investment xn ∈ {0, 1, 2, ...., q} for infras- (sn−1, `n−1) xn = 0, rn = r1 (s , ` + r ) n−1 n−1 1 tructure development, where q is a finite positive integer.

xn = 1, rn = r2 xn = 0 means no investment at step n. Hence, there are q + 1 possible actions for the government at each time step. (s + 1, ` + r ) n−1 n−1 2 The numerical value of xn = m can be interpreted as spend- ing m unit money towards the infrastructure development m q xn = q, rn = r3 or the th action among different actions with increasing cost and effectiveness. Possible government actions include but are not limited to building seawalls, raising roads, widen- ing beach, building traditional or horizontal levees, placing (sn−1 + q, `n−1 + r3) stormwater pumps, improving sewage systems, relocating seaside properties, etc. (Sergent et al., 2014; mar, 2016; Xia et al., 2019). The range of x is designed to cover the Figure 2. MDP state transition. The state space consists of a con- n tinuous horizontal dimension for sea level and discrete vertical real world costs from the cheapest investment like clean- dimension for infrastructure state. Three possible state transitions ing the pipes to the most expensive investment like buying are shown in the figure. Note that rn can take any nonnegative lands and property to relocate the seaside inhabitants and value. businesses.

The total cost cn to the agent at each time n consists of the the central decision maker in dealing with the sea level rise investment cost, cost from nature, and the residents’ contri- problem, at each time step n takes an action (i.e., investment bution to the investment. Since the government’s investment decision) xn and receives responses from the residents yn decision and the residents’ contribution decision have inte- c = αx −βy +z and the nature zn through the cost cn. Then, this natural and ger values, we model the total cost as n n n n α β socioeconomic system moves to a new state Sn based on the using parameters and to map the decisions to monetary previous state Sn−1, government’s action xn, and the sea values. The three different entities in the cost definition are α, β level rise rn. The system state consists of the pair of city’s in- combined by adjusting the parameters and the param- frastructure state sn and the sea level `n, i.e., Sn = (sn, `n). eters of the probabilistic model introduced for the nature’s The government’s investment decisions also determine the cost zn in Section 2.2. In Section4, we discuss how to set state of the city infrastructure sn, which is modeled as these parameters to obtain realistic costs for the Tampa Bay Pn region. The discounted cumulative cost for the government sn = s0 + m=1 xm = sn−1 + xn. Likewise, the sea level at step n is given by the cumulative sea level rise val- in N time steps is given by ` = ` + Pn r = ` + r s l ues: n 0 m=1 m n−1 n. Here 0 and 0 are N respectively the initial infrastructure state and the initial sea X n CN = a [αxn − βyn + zn], (1) level of the city. In terms of simulations, these are two user- g n=0 defined numbers representing the existing states at the be- ginning of the simulations. The system state clearly satisfies where the discount factor ag ∈ (0, 1) defines the weight the Markov property: P(Sn|Sn−1,..., S0) = P(Sn|Sn−1). of future costs in the agent’s decisions. Apart from being The state transition is illustrated in Figure2, and the param- a standard parameter in MDP cost function, ag has an im- eters of the proposed MDP framework are summarized in portant contextual significance in this research. It indicates Table1. We next explain our proposed models for the gov- how much the government values the future costs due to ernment, nature, and residents under the MDP framework. sea level rise in its decision making process. Hence, in this An MDP Model for Socio-Economic Systems Impacted by Climate Change

3000 with the sea level rise (nce, 2020). Thus, we model the Int. Low scale parameter of generalized Pareto distributed zn directly Intermediate 2500 High proportional to the most recent sea level `n−1 and inversely Int. Low Fitted proportional to the most recent infrastructure state sn−1. Intermediate Fitted The proposed model for zn is given by 2000

) High Fitted

m m

( zn ∼ GeneralizedPareto(k, σn, θ) l

e 1500

v a

e η(` ) l n−1

a θ ≥ 0, σn = , k < 0, (2)

e b

s (sn−1)

e 1000

v

i

t a

l where θ, σ, k are the standard parameters of generalized e R 500 Pareto: location, scale, and shape parameters, respectively; and η > 0, a ∈ (0, 1), b > 0 are our additional model pa-

0 rameters. The parameters k, θ, η, a, b helps to regulate the 2000 2010 2020 2030 2040 2050 2060 2070 2080 2090 2100 impact of most recent sea level ` over nature’s cost z Year n−1 n relative to the most recent infrastructure state sn−1. Choos- Figure 3. Adjusted NOAA projections (solid lines) and our fittings ing an appropriate set of parameters depends on the region (dashed lines) for relative sea level change for Tampa Bay, FL. considered for simulations. We explain how to set the pa- rameters for the Tampa Bay region in Section4. Our prefer- ence for modeling the scale parameter and not the location work, a is termed as the government’s cooperation index. g parameter is due to the fact that the scale parameter can con- The government’s objective is to minimize the expected trol both the mean and the variance, whereas the location cumulative cost E[C ] by taking investment actions {x } N n parameter appears only in the mean. over time. 2.3. Residents Model 2.2. Nature Model In our framework, government requests a contribution from Hurricanes, floods, and stagnant water are some of the many residents to the sea level rise investments, e.g., through sea level rise-related natural events that cause cost in differ- an additional tax referendum; and residents make a binary ent ways such as loss of properties, jobs, taxes, and tourism decision y ∈ {0, 1} whether to support the government. incomes due to submerged areas. To model this cost z , we n n The model is flexible in terms of the frequency of voting. begin with modeling the sea level rise r . Recently, three n For instance, residents may vote every 5 years and the voting different regionally adjusted NOAA projections for sea level decision remains constant until the next voting. rise have been proposed for the Tampa region (Burke et al., 2019), as shown in Fig.3. An appropriate model for yn requires social modeling for the residents’ voting behavior. While the residents of a re- We take these projections as mean sea level rise values, gion in general form a heterogeneous community in terms and model the uncertainty around them using the Gamma of social and psychological factors, it is sufficient to model distribution, i.e., r ∼ Gamma(µ, φ), since it is a flexible n the aggregate response in this work since we consider a two-parameter probability distribution used for modeling referendum in which the majority vote wins. Similar to the positive variables including environmental applications, e.g., government’s cooperation index, we define the residents’ co- daily rainfall (Aksoy, 2000). We set the scale parameter operation index a ∈ (0, 1) to quantify how responsive the φ = 0.5 and vary the shape parameter µ in a range to match r residents are to the sea level rise problem. These cooperation the mean relative sea level, given by Pn E[r ], with m=1 m indices are helpful in simulating different government and the NOAA projection curves. The successful curve fitting, resident profiles, as illustrated in Section4. As explained shown in Fig.3, is achieved by increasing µ from 11.2 by the social exchange theory (Homans, 1974; Rasinski & to 12.388 with 0.012 increments for the intermediate-low Rosenbaum, 1987), some individuals’ votes are motivated projection; from 13 to 34.78 with 0.22 increments for the by what benefit they receive in exchange. A high percentage intermediate projection; and from 14.6 to 87.86 with 0.74 of such individuals in a community is represented by a low increments for the high projection. ar value in our model. According to another perspective, Then, we model nature’s cost zn using the generalized voter behavior is determined by non-self-interested factors Pareto distribution, which is commonly used to model catas- such as a sense of civic duty (Katosh & Traugott, 1982), trophic losses, e.g., (Cebrian´ et al., 2003; Uchiyama & moral obligation (Rasinski et al., 1985), and psychological Watanabe, 2006; Daspit & Das, 2012). It is known that sense of community (Davidson & Cotter, 1993). In our the storm- and flooding-related costs have been increasing model, a high ar value corresponds to a community which An MDP Model for Socio-Economic Systems Impacted by Climate Change is dominantly composed of this type of individuals who optimal policy, the value non-self-interested factors. V (sn−1, `n−1) = min E[cn + agV (sn, `n)|xn] We use ar to obtain a score function that gives the proba- xn bility of support from the residents. In addition to internal provides a recursive approach by focusing on finding the factors, represented by ar, voters’ decision is also affected optimal action x at each time step using the successor state by external factors, government’s actions {xn} and nature’s n value, instead of trying to find the entire policy {x } at once. cost {zn} in our case. We hypothesize that the probability n Using the cost definition given by (1) and considering possi- of a positive voting outcome, P(yn = 1), is high when both nature’s cost and government’s efforts are high (i.e., high ble q actions for xn this iterative equation can be rewritten as xn and zn) in the recent history. In other words, when either nature’s cost has been low or government is not responsive, n residents lack the main reason for accepting an additional V (sn−1, `n−1) = min tax. Hence, we propose the following score function and E − βy + z + a V (s , ` + r ), the Bernoulli model for y : n n g n−1 n−1 n n | {z } F0(sn,`n) n−1   X n−m E α − βyn + zn + agV (sn−1 + 1, `n−1 + rn) , hn = ar xmzm, (3) m=1 | {z } F1(sn,`n) 1 y ∼ (p ), p = σ(h ) = , E2α − βy + z + a V (s + 2, ` + r ), n Bernoulli n n n −(h −h ) n n g n−1 n−1 n 1 + e n 0 | {z } F2(sn,`n) where σ(·) is the logistic sigmoid function used for mapping ... , the score hn to probability pn. High hn means a high likeli-   o hood of support from the residents. The sigmoid’s midpoint E qα − βyn + zn + agV (sn−1 + q, `n−1 + rn) , (4) h0 is empirically obtained as the average value of hn. In | {z } addition to being a cooperation index, another interpreta- Fq (sn,`n) tion for ar is the concept of discount (or forgetting) factor where Fm(sn, `n) is defined as the expected total cost of for past events, similar to the government’s ag coefficient. taking action xn = m at state (sn, `n). For high ar values, residents have a longer memory, and they consider also the past disasters and government efforts At each time step n, the action xn shapes the instant with an exponentially decaying weight in their decision. cost cn and moves the system to the next state, which Whereas, for a community with low ar, residents have a determines the discounted future cost agV (sn, ln). The short memory and only care about the most recent cost and optimum policy chooses among the investment actions government effort. In an extreme case where ar is close to xn ∈ {0, 1, 2, ...., q} that has the minimum expected to- zero, they may not be even motivated to contribute by a very tal cost, minm{Fm(sn, `n)}, as shown in (4). Since the recent disaster and government’s high efforts. functions {F0(sn, `n),...,Fq(sn, `n)} determine the opti- mal policy, we next analyze them to characterize the optimal 3. Optimal Policy government policy. Theorem 1. For m = 0, 1, . . . , q, F (s , ` ) is nonde- In this section, we first analyze the optimal policy for gov- m n n creasing and concave in `n for each sn; and the deriva- ernment investments, and then provide a practical algorithm tive of F (s , ` ) with respect to ` is lower than that of to find the optimal policy. m n n n Fm−1(sn, `n) for m = 1, . . . , q.

3.1. Optimal Policy Analysis Proof is provided in the Supplementary file. For In the proposed MDP framework, the objective of a rational a specific infrastructure state sn, expected costs F0(sn, `n),F1(sn, `n),...,Fq(sn, `n) for all the q + 1 government is to minimize the expected total cost E[CN ] in N time steps by following an optimal investment policy. actions are illustrated in Figure4 according to Theorem1. Central to MDP is the optimal value function The optimum policy picks the minimum of the q + 1 curves at each time, which is shown with the solid curve in Figure

V (sn, `n) = min E[CN |{xn}], 4. As a result of Theorem1, we next give the outline of {xn} optimum policy in Corollary1. which gives the minimum expected total cost possible at Corollary 1. The optimum policy, at each infrastructure each state (sn, `n), denoted as the optimal value of that state sn, compares the sea level `n with at most q thresholds state, by choosing the best action policy {xn}. To find the where each threshold signifies a change of optimal action. An MDP Model for Socio-Economic Systems Impacted by Climate Change

F (s , ` ) 0 n n The thresholds also depend on the cooperation indices ag and ar. Higher cooperation indices set the thresholds lower

F1(sn, `n) and vice versa. Notably, the government’s cooperation in- dex ag is dominant in shaping the thresholds. Intuitively, F (s , ` ) 2 n n as ag grows, the government becomes more cautious about (i.e., sees more objectively without severely discounting) the F (s , ` ) 3 n n expected future natural costs and sets a lower threshold for investment actions. On the contrary, small ag implies under- estimated future costs and thus overemphasized investment costs, which results in a high threshold for investment.

Expected Total Cost 3.2. Finding the Optimal Policy We showed that the optimal policy is given by a number of thresholds on the current sea level; however completely specifying the optimal policy with analytical expressions for thresholds does not seem tractable due to a large number of 0 parameters involved in the problem. Hence, we next propose ` (s ) ` (s ) ` (s ) thr1 n−1 thr2 n−1 thr3 n−1 `n a reinforcement learning (RL) algorithm to find the optimal

e.g.: `n policy for simulated scenarios. Specifically, to learn the

⇒ optimal action is xn = 1 optimal policy here, a deep RL algorithm is needed due to the continuous sea level values, which cause infinite number of possible states. The deep Q network (DQN) algorithm Figure 4. Expected total costs as a function of sea level for an (Mnih et al., 2015), which is a popular choice for deep RL, example case with q = 3. The expected total costs intersect each addresses well the infinite dimensional state space problem. other only once for any given infrastructure state according to It leverages a deep neural network to estimate the optimal Theorem1. action-value function To prove Corollary1, note that Q(sn, `n, xn) = E[cn + ag min Q(sn + xn, `n+1, x) | xn]. x Fm−1(sn, `n = 0) < Fm(sn, `n = 0) where V (s , ` ) = min Q(s , ` , x ). In Algorithm1, for m ∈ {1, 2, ...., q} because ` = 0 corresponds to n n xn n n n n we present a DQN algorithm for learning the optimal policy the fictional case of zero sea level where there is no on government’s infrastructure investment actions. risk. That is, Fm−1(sn, `n) starts at a lower point than Fm(sn, `n), but increases faster than Fm(sn, `n) since its Algorithm1 operates many episodes to iteratively compute derivative is higher (Theorem1). Also from Theorem1, it the weights of the DQN neural network that relates input is known that both of them are concave and bounded, hence states and actions with the output Q(S, x) values. Each Fm−1(sn, `n) and Fm(sn, `n) intersect exactly at one point episode consists of Monte-Carlo simulations in which sev- for m ∈ {1, 2, . . . , q}. While for `n less than the inter- eral states are visited according to the current policy defined section point the action xn = m is less effective than the by the current value function. Selecting the actions always action xn = m − 1 in terms of immediate cost and expected using equation (4) may keep some of the states not explored future cost, it becomes more effective when `n exceeds the enough. To encourage the agent to explore the state space intersection point. adequately, we use a technique that explores frequently at the beginning, but reduces exploration with gaining more Figure4 gives an example case where there are q = 3 experience over time (Singh et al., 2000). After the weight thresholds ` (s ), ` (s ), ` (s ) which de- thr1 n−1 thr2 n−1 thr3 n−1 matrix values converge, the final weight matrix is used to pend on s and indicate change points of optimal action. n−1 generate scenario simulations, as described in the following However, depending on the slopes of {F (s , ` )} curves m n n section. The convergence of the weight matrix is depicted at each infrastructure state s , there may be less than q n−1 in Fig.5. change points.

To summarize, for a given state (sn−1, `n−1), the optimum 4. Scenario Simulations policy chooses xn based on the relative value of the current sea level `n = `n−1 + rn, where rn is the rise on top In this section, we present scenario simulations for the of `n−1, with respect to the existing infrastructure state Tampa Bay region in Florida using our MDP model and Pn−1 sn−1 = m=1 xm. deep RL algorithm. Tampa Bay, situated along the Gulf of An MDP Model for Socio-Economic Systems Impacted by Climate Change

Algorithm 1 DQN algorithm for finding optimum policy 6 1: Input: ag, ar, α, β, θ, k, η, a, b 2: Initialize replay memory D to capacity N 3: Initialize action-value function Q with random weights 5

w and target action-value function Q0 with random s

0 e

weights w = w g

n 4

4: for episode = 1, 2, ... do a

h c

5: Initialize state S0 = (s0, `0)

d e

6: for n = 1, 2, ..., N do r

a 3 u

7: With probability  select a random action xn, oth- q s

erwise select x = arg min Q(S , x; w) f

n x n o 2

8: Execute action xn and observe cost cn = αxn − m u

βyn + zn (see (2) and (3)) S 9: Store transition (Sn, xn, cn, Sn+1) in D 1 10: Sample random minibatch of transitions (Sj, xj, cj, Sj+1) from D 0 0 11: Set target tj = cj + ag minx Q (Sj+1, x; w ) 0 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 12: Perform a gradient descent step on [t − j Number of weight updates Q(Sj, xj; w)] with respect to the weights w 13: d w0 = w Every steps reset Figure 5. Convergence of weights in Algorithm 1. 14: if Q(S, x) converges for all S, x then 15: break −3 16: end if and 10 gave the best result. Decaying exploration rate 17: end for is used with initial value 1, multiplying factor 0.99, and 18: end for minimum value 0.01. In Fig.6, we analyze the effect of cooperation indices. We obtain different scenarios by varying the cooperation indices Mexico in Florida with more than 3 million residents, is ag and ar for the government and residents, respectively, listed number seven among the most at-risk areas in terms and by considering different NOAA projections for sea level of risk of damaging floods on the globe by the World Bank rise. As expected, the average total cost decreases with (wor, 2013). We have three different sea level rise projec- growing cooperation index for both the government (ag) and tions by NOAA until the year 2100 for the area (Burke et al., residents (ar). The cost is significantly higher if both the 2019), as shown in Fig.3. The City of Tampa expects to government and residents are not cooperative (ag = ar = collect $14 M/year for 30 years from residents to improve 0.1) compared to the cooperative case (ag = ar = 0.9). its stormwater system (sto, 2019). Thus, we set β = 14 with We next compare the proposed MDP-based government the base unit of million dollars. According to the ongoing policy, which proactively improves infrastructure based on improvement projects and regular stormwater services of observed sea levels, with the shortsighted policy which re- City of Tampa (sto, 2019), annual investment may range acts to significant costs from the nature by improving the between $ 25-75 M. Hence, we choose α = 25 and the infrastructure. We model this shortsighted policy with a number of nonzero actions q = 3, i.e., x ∈ {0, 1, 2, 3}.A n threshold parameter for the cost from nature z . When z recent report by the Tampa Bay Regional Planning Coun- n n exceeds the threshold, the shortsighted policy improves the cil (tbr, 2017) gives “the cost of doing nothing” for years infrastructure by three units (x = 3). Under the three 2020-2060 due to the sea level rise impacts under the n sea level projections, we find the optimum thresholds that high sea level rise projections as $162 billion. We obtain minimize the average total cost of shortsighted policy in P2060 z = 162, 000 (in million dollars) by keeping the in- 2020 n 100 years as 34, 28, and 21 for the intermediate low, in- frastructure state at the initial level s = 50, increasing the 0 termediate, and high projections, respectively (Figure7). sea level from the initial ` = 100 according to the gamma 0 The optimal thresholds in Figure7 show that the infras- distribution following the high sea level projection (Section tructure improvement is more urgent under higher sea level 2.2), and setting the parameters of generalized Pareto model projections, in accordance with the intuition. as k = −0.001, θ = 1, η = 25, a = 0.9, b = 1.1. All the costs in the following figures are normalized by taking the Finally, in Fig.8, we compare the best MDP-based policy residents’ contribution 14 M as one unit. Vanilla DQN is (ag = ar = 0.9) with the best shortsighted policy, which used with Huber loss and normalization on state and reward uses the optimum threshold, in terms of average total cost in values. The learning rate is tuned between 10−5 and 10−2, 100 years for all sea level projections. The same resident and An MDP Model for Socio-Economic Systems Impacted by Climate Change

3400 4500 6500 s s a = 0:1 a = 0:1 ar = 0:1

r s r 6400

r r

3300 r 4400

a

a a

a = 0:5 e a = 0:5

e ar = 0:5 r r

e y

y 6300 a = 0:9 y ar = 0:9 ar = 0:9 3200 r 0

0 4300

0

0 0

0 6200

1

1

1 n n 3100

n 4200

i i

i 6100

t

t

t

s

s

s

o o

3000 o 4100 6000

c

c

c

l

l

l

a a

a 5900

t t

2900 t 4000

o

o

o

t

t t

5800

e

e

e

g g

2800 g 3900

a

a a

r 5700

r

r

e

e

e

v v 2700 3800 v

5600

A

A A

2600 3700 5500 0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9 ag ag ag

Figure 6. Average total cost as a function of government cooperation index ag and resident cooperation index ar for different NOAA projections: intermediate low (left), intermediate (middle), high (right).

9500 8000 Shortsighted policy 9000 MDP-based policy 7000

8500

s

s

r

r a

a intermediate low

e

e y

y 8000 intermediate 0

0 6000

0 0

high 1

1 7500

n

n

i

i

t

t

s

s o o 7000

c 5000

c

l

l

a

a

t

t

o o

6500 t

t

e

e g

g 4000

a

a r

r 6000 e

e 21

v

v

A A 5500 28 3000

5000 34 4500 2000 0 10 20 30 40 50 60 70 80 intermediate low intermediate high Threshold Sea level rise projections Figure 7. Average total cost for the shortsighted government policy Figure 8. Comparison of the cumulative cost for threshold based as a function of investment threshold under the three sea level and agent based optimal policy. projections. The optimum thresholds for each projection are shown in colored boxes. tinuous actions. Moreover, multi-agent RL methods can be nature models are used while simulating both policies. For used to actively model the decisions of other stakeholders, all the three sea level projections, the MDP-based proactive such as residents and businesses, and interactions between policy, whose decisions are driven by the current sea level, them. Modeling of each stakeholder would ideally need achieves around 40% less cost than the shortsighted reactive a tailored approach that incorporates the characteristics of policy, which decides based on the cost from nature. stakeholder into designing cost function, actions, etc. Last but not least, modeling the cooperation indices (ag, ar) is 5. Discussions and Future Work an interesting research direction that would require an inter- disciplinary effort. By decomposing them into a number of To the best of our knowledge, we presented the first com- fundamental traits, such as political, sociological, religious, prehensive MDP framework with theoretical and numerical and ethical traits of governments and residents, one can analysis for modeling socio-economic scenarios of the cli- study how to obtain ag, ar, or another cooperation index for mate change-related sea level rise problem. Although we a new stakeholder, for a given community. strived for a realistic framework, we know that the presented work can be improved in several aspects to obtain a more 6. Conclusion realistic framework. For instance, a continuous action space for the government can be integrated into the current model A novel Markov Decision Process (MDP) model has been by considering the monetary value of investments instead of proposed for simulating socio-economic scenarios of the a discrete set of infrastructure improvement actions. Policy sea level rise problem. The optimal government policy gradient methods such as the actor-critic method can be has been shown to rely on the observed sea levels through used instead of DQN to find the optimal policy with con- analyzing the expected cost functions. We also provided a An MDP Model for Socio-Economic Systems Impacted by Climate Change deep reinforcement learning (RL) algorithm for finding the optimal policy, and presented scenario simulations for the Tampa Bay region using available economic data from the city government, regional sea level projections from NOAA, and feedback from community partners who confirmed the real-world utility of the proposed model. The simulations of several scenarios corroborated the importance of supportive residents and a proactive government that improves the infrastructure according to the observed sea level and does not undervalue the future cost of sea level rise. Finally, we discussed several ways of improving the proposed model for generating more realistic scenarios. An MDP Model for Socio-Economic Systems Impacted by Climate Change

Acknowledgements under uncertain local sea-level rise. Climatic Change, 137(3-4):347–362, 2016. This work was supported by NSF award no. 1737598. The authors would like to thank the reviewers for their helpful Burke, M., Carnahan, L., Hammer-Levy, K., and Mitchum, comments. The authors are grateful to Douglas Hutchens, G. Recommended projections of sea level rise in the Deputy City Manager, the City of Dunedin; Melissa Zor- tampa bay region (update). 2019. nitta, Executive Director, Hillsborough County Planning Commission; Mary Keith, President, Tampa Audubon Soci- Cebrian,´ A. C., Denuit, M., and Lambert, P. General- ety; and Alison Barlow, Executive Director, St. Petersburg ized pareto fit to the society of actuaries? large claims Innovation District for their valuable feedback about the database. North American Actuarial Journal, 7(3):18–36, real-world utility of the proposed model. 2003. Daspit, A. and Das, K. The generalized pareto distribu- References tion and threshold analysis of normalized hurricane dam- age in the united states gulf coast. In Joint Statistical Which Coastal Cities Are at Highest Risk of Dam- Meetings (JSM) Proceedings, Statistical Computing Sec- aging Floods? New Study Crunches the Numbers, tion, Alexandria, VA: American Statistical Association, The World Bank. https://www.worldbank. pp. 2395–2403, 2012. org/en/news/feature/2013/08/19/ coastal-cities-at-highest-risk-floods, Davidson, W. B. and Cotter, P. R. Psychological sense of 2013. community and support for public school taxes. American Journal of Community Psychology, 21(1):59–66, 1993. Marin County Community Development Agency, Game of Floods. https://www.marincounty. Drogoul, A. Agent-based modeling for multidisciplinary org/depts/cd/divisions/planning/ and participatory approaches to climate change adaptation csmart-sea-level-rise/game-of-floods, planning. In Proceedings of RFCC-2015 Workshop, AIT, 2016. 2015.

The Cost Of Doing Nothing, Economic Impacts of Sea Eid, M. S. and El-Adaway, I. H. Decision-making frame- Level Rise in the Tampa Bay Area, Tampa Bay Regional work for holistic sustainable disaster recovery: Agent- Planning Council. http://www.tbrpc.org/ based approach for decreasing vulnerabilities of the as- wp-content/uploads/2018/11/2017-The_ sociated communities. Journal of infrastructure systems, Cost_of_Doing_Nothing_Final.pdf, 2017. 24(3):04018009, 2018.

City of Tampa Stormwater Utility Funding. https: Ezer, T. Sea level rise, spatially uneven and temporally //www.tampagov.net/tss-stormwater/ unsteady: Why the US East Coast, the global tide gauge programs/assessment, 2019. record, and the global altimeter data show different trends. Geophysical Research Letters, 40(20):5439–5444, 2013. NOAA National Centers for Environmental Informa- tion (NCEI), U.S. Billion-Dollar Weather and Cli- Faboya, O. T., Figueredo, G. P., Ryan, B., and Siebers, P.-O. mate Disasters. https://www.ncdc.noaa.gov/ Position paper: The usefulness of data-driven, intelli- billions/time-series, 2020. gent agent-based modelling for transport infrastructure 2018 21st International Conference on Aksoy, H. Use of gamma distribution in hydrological analy- management. In Intelligent Transportation Systems (ITSC) sis. Turkish Journal of Engineering and Environmental , pp. 144–149. Sciences, 24(6):419–428, 2000. IEEE, 2018. Beatley, T. Planning for coastal resilience: best practices Filar, J. and Vrieze, K. Competitive Markov decision pro- for calamitous times. Island Press, 2012. cesses. Springer Science & Business Media, 2012. Bellman, R. A Markovian decision process. Journal of Fu, X., Song, J., Sun, B., and Peng, Z.-R. “Living on the mathematics and mechanics, pp. 679–684, 1957. edge”: Estimating the economic cost of sea level rise on coastal real estate in the Tampa Bay region, Florida. Berke, P. R. and Stevens, M. R. Land use planning for Ocean & Coastal Management, 133:11–17, 2016. climate adaptation: Theory and practice. Journal of Plan- ning Education and Research, 36(3):283–289, 2016. Fu, X., Gomaa, M., Deng, Y., and Peng, Z.-R. Adaptation planning for sea level rise: A study of US coastal cities. Buchanan, M. K., Kopp, R. E., Oppenheimer, M., and Journal of Environmental Planning and Management, 60 Tebaldi, C. Allowances for evolving coastal flood risk (2):249–265, 2017. An MDP Model for Socio-Economic Systems Impacted by Climate Change

Hapke, C. J., Brenner, O., Hehre, R., and Reynolds, B. J. Potts, R., Jacka, L., and Yee, L. H. Can we ‘Catch ‘em Assessing hazards along our Nation’s coasts. Technical All’? An exploration of the nexus between augmented report, US Geological Survey, 2013. reality games, urban planning and urban design. Journal of Urban Design, 22(6):866–880, 2017. Hine, A. C., Chambers, D. P., Clayton, T. D., Hafen, M. R., and Mitchum, G. T. Sea Level Rise in Florida: Science, Rasinski, K., Tyler, T. R., and Fridkin, K. Exploring the Impacts, and Options. University Press of Florida. ISBN function of legitimacy: Mediating effects of personal 9780813062891. and institutional legitimacy on leadership endorsement and system support. Journal of Personality and Social Hinkel, J., Jaeger, C., Nicholls, R. J., Lowe, J., Renn, O., Psychology, 49(2):386, 1985. and Peijun, S. Sea-level rise scenarios and coastal risk management. Nature Climate Change, 5(3):188–190, Rasinski, K. A. and Rosenbaum, S. M. Predicting citizen 2015. support of tax increases for education: A comparison of two social psychological perspectives. Journal of Applied Homans, G. C. Social behavior: Its elementary forms. 1974. Social Psychology, 17(11):990–1006, 1987.

Howard, R. A. Dynamic programming and markov pro- Research Council, N. Informing decisions in a changing cesses. 1960. climate. Panel on Strategies and Methods for Climate- Kashem, S. B., Wilson, B., and Van Zandt, S. Planning Related Decision Support of the Committee on the Hu- for climate adaptation: Evaluating the changing patterns man Dimensions of Global Change, National Research of social vulnerability and adaptation challenges in three Council of the National Academies, 2009. coastal cities. Journal of Planning Education and Re- Sallenger Jr, A. H., Doran, K. S., and Howd, P. A. Hotspot search, 36(3):304–318, 2016. of accelerated sea-level rise on the Atlantic coast of North Katosh, J. P. and Traugott, M. W. Costs and values in the America. Nature Climate Change, 2(12):884–888, 2012. calculus of voting. American Journal of Political Science, Schrock, G., Bassett, E. M., and Green, J. Pursuing equity pp. 361–376, 1982. and justice in a changing climate: Assessing equity in Le Cozannet, G., Manceau, J.-C., and Rohmer, J. Bounding local climate and sustainability plans in US cities. Jour- probabilistic sea-level projections within the framework nal of Planning Education and Research, 35(3):282–295, of the possibility theory. Environmental Research Letters, 2015. 12(1):14012, 2017. Sergent, P., Prevot, G., Mattarolo, G., Brossard, J., Morel, Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, G., Mar, F., Benoit, M., Ropert, F., Kergadallan, X., J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidje- Trichet, J.-J., et al. Adaptation of coastal structures to land, A. K., Ostrovski, G., et al. Human-level control mean sea level rise. La Houille Blanche, (6):54–61, 2014. through deep reinforcement learning. Nature, 518(7540): Singh, S., Jaakkola, T., Littman, M. L., and Szepesvari,´ 529, 2015. C. Convergence results for single-step on-policy Nicholls, R. J. Planning for the impacts of sea level rise. reinforcement-learning . , Oceanography, 24(2):144–157, 2011. 38(3):287–308, 2000.

Pachauri, R. K., Allen, M. R., Barros, V. R., Broome, J., Sutton, R. S. and Barto, A. G. Reinforcement learning: An Cramer, W., Christ, R., Church, J. A., Clarke, L., Dahe, introduction. MIT press, 2018. Q., and Dasgupta, P. Climate change 2014: synthesis Sweet, W. V. and Park, J. From the extreme to the mean: Ac- report. Contribution of Working Groups I, II and III to celeration and tipping points of coastal inundation from the fifth assessment report of the Intergovernmental Panel sea level rise. Earth’s Future, 2(12):579–600, 2014. on Climate Change. IPCC, 2014. Tribbia, J. and Moser, S. C. More than information: what Patt, A. and Siebenhuner,¨ B. Agent based modeling coastal managers need to plan for climate change. Envi- and adaption to climate change. Vierteljahrshefte zur ronmental science & policy, 11(4):315–328, 2008. Wirtschaftsforschung, 74(2):310–320, 2005. Tyler, S. and Moench, M. A framework for urban climate re- Plant, N. G., Robert Thieler, E., and Passeri, D. L. Coupling silience. Climate and development, 4(4):311–326, 2012. centennial-scale shoreline change to sea-level rise and coastal morphology in the gulf of mexico using a bayesian Uchiyama, Y. and Watanabe, S. Modeling Natural Catas- network. Earth’s Future, 4(5):143–158, 2016. trophic Risk and its Application, 2006. An MDP Model for Socio-Economic Systems Impacted by Climate Change

Wahl, T., Calafat, F. M., and Luther, M. E. Rapid changes in the seasonal sea level cycle along the US Gulf coast from the late 20th century. Geophysical Research Letters, 41(2):491–498, 2014. Wahl, T., Jain, S., Bender, J., Meyers, S. D., and Luther, M. E. Increasing risk of compound flooding from storm surge and rainfall for major US cities. Nature Climate Change, 5(12):1093, 2015. Xia, R., Kannan, S., and Castleman, T. The Ocean Game: The sea is rising. Can you save your town? https://www.latimes.com/projects/ la-me-climate-change-ocean-game/, 2019.