A Markov Decision Process Model for Socio-Economic Systems Impacted by Climate Change

Salman Sadiq Shuvo 1 Yasin Yilmaz 1 Alan Bush 2 Mark Hafen 3

Abstract lenger Jr et al., 2012; Hapke et al., 2013; Pachauri et al., Coastal communities are at high risk of natural 2014; Plant et al., 2016; Hine et al.; Burke et al., 2019). The hazards due to unremitting global warming and coastal communities will undergo multiple threats, includ- sea level rise. Both the catastrophic impacts, e.g., ing recurring hurricanes, storm surge, and heavy rainfall as tidal flooding and storm surges, and the long-term a result of sea level rise (Wahl et al., 2015). This intensifies impacts, e.g., beach erosion, inundation of low ly- social vulnerability, especially in underprivileged neighbor- ing areas, and saltwater intrusion into aquifers, hoods (Kashem et al., 2016; Schrock et al., 2015); induces cause economic, social, and ecological losses. coastal hazards (Nicholls, 2011; Wahl et al., 2014); and Creating policies through appropriate modeling affects regional economies by influencing property values, of the responses of stakeholders, such as govern- taxation, and the insurance cost (Fu et al., 2016). Due to sea ment, businesses, and residents, to climate change level rise, coastal inhabitants are exposed to many of these and sea level rise scenarios can help to reduce consequences and need to develop the adaptive capability these losses. In this work, we propose a Markov and resilience frameworks to counter these stressors through decision process (MDP) formulation for an agent adequate planning and decision support (Beatley, 2012). (government) which interacts with the environ- The governments, planners, coastal administrators, and ment (nature and residents) to deal with the im- personnel in a variety of agencies require substantial in- pacts of climate change, in particular sea level formation for effective interaction, decision making, and rise. Through theoretical analysis we show that a adaptation planning (Beatley, 2012; Fu et al., 2017; Re- reasonable government’s policy on infrastructure search Council, 2009). This demands the support of key development ought to be proactive and based on actors to relate the science, the variability, and the hazard detected sea levels in order to minimize the ex- of various outlines to stakeholders (Tribbia & Moser, 2008). pected total cost, as opposed to a straightforward Explicitly modeling these agents’ reaction to sea level rise government that reacts to observed costs from scenarios can serve in the creation of strategies tailored to nature. We also provide a deep reinforcement local impacts and resilience management and requires a learning-based scenario planning tool considering mixture of social engagement and planning tools, including different government and resident types in terms scenario planning (Berke & Stevens, 2016; Drogoul, 2015; of cooperation, and different sea level rise projec- Potts et al., 2017). tions by the National Oceanic and Atmospheric Administration (NOAA). In this paper, for planning sea level rise scenarios, we study the interactions between a government, residents, and the nature around the sea level rise problem. Specifically, we 1. Introduction use probabilistic models to simulate their behaviors and interactions between them. Focusing on the economic cost of The consequences of global warming and sea level rise have the sea level rise-related natural events (e.g., flooding, hurri- been widely documented, examined, and forecasted (Sal- cane) and government investments (e.g., infrastructure im- *Equal contribution 1Department of Electrical Engineering, provement) we propose a Markov decision process (MDP) University of South Florida, Tampa, USA 2Honors College, Uni- framework to analyze the decision policies for government versity of South Florida, Tampa, USA 3School of Public Affairs, together with the reactions of the nature and residents. University of South Florida, Tampa, USA. Correspondence to: Salman Sadiq Shuvo , Yasin Yilmaz We specialize the proposed MDP framework to the sea . level rise problem and illustrate it using available economic

th data and sea level projections for the Tampa Bay region in Proceedings of the 37 International Conference on Machine Florida. However, the proposed MDP framework can be eas- Learning, Vienna, Austria, PMLR 108, 2020. Copyright 2020 by the author(s). ily adapted to simulating other natural and socioeconomic An MDP Model for Socio-Economic Systems Impacted by Climate Change systems under the impacts of climate change. ing investment decisions for the sea level rise problem to minimize the incurred economic cost. Our contributions can MDP provides a suitable theoretical framework for creating be summarized as follows. agent-based scenarios (Howard, 1960; Bellman, 1957; Fi- lar & Vrieze, 2012). The MDP agent (government in our • We theoretically show that the government should base case) interacts with the environment (nature and residents its investment decision on the observed sea level in- in our case) by taking action at each time and receiving a stead of the incurred cost from the nature. Making reward/cost from the environment in return. The objective investment decisions based on the natural cost corre- of the agent is to optimize the return over time by choos- sponds to the straightforward policy that any sensitive ing actions from a list of available actions. Each time, the but shortsighted government would adopt. Through system moves to a new state based on the agent’s action mathematical analysis of the proposed MDP model, according to a probability distribution. The optimal policy we show that proactive actions triggered by the rising determines which action to take in which state by map- sea levels are more effective in reducing the cumulative ping system states to actions. It can be either found using cost than reactive actions triggered by the cost from dynamic programming or learned from experience using re- the nature. inforcement learning (RL) (Sutton & Barto, 2018), which is also known as approximate dynamic programming. In deep • Using the proposed MDP model we provide a deep reinforcement learning, a neural network structure is used RL-based scenario planning framework that takes into to approximate the expected value for each action at each account arbitrary number of (discrete) government ac- state. Deep RL is suitable for problems where the system tions, continuous sea level rise values, different sea states are many or continuous in nature (Mnih et al., 2015). level projections by NOAA, and different government and resident prototypes in terms of being responsive to 1.1. Related Works the sea level rise problem. As an impact of climate change, sea level change is neither • Using the available economic data and the regionally globally nor regionally uniform (Hine et al.; Sallenger Jr adjusted NOAA sea level projections we present sce- et al., 2012), and can vary spatially and temporally (Ezer, nario simulations for Tampa Bay, FL. For each sce- 2013; Hine et al.; Wahl et al., 2015). This leads to a degree nario, we consider different government and resident of uncertainty to the projection of sea level rise impacts prototypes using cooperation indices that represent at the local level, as well as planning for adaptation and how responsive they are to the sea level rise problem. resilience. The implications of this uncertainty for coastal Finally, we show that the optimal (proactive) policy community planning and adaptation are many, and have learned by the proposed deep RL algorithm achieves led to a variety of efforts to constrain, refine, and bench- 40% less economic cost than the best reactive policy mark sea level rise projections for purposes of assessing risk in 100 years in three different sea level projections by (Buchanan et al., 2016; Hinkel et al., 2015; Le Cozannet NOAA. et al., 2017; Sweet & Park, 2014; Tyler & Moench, 2012). Similarly, in this work, in addition to proposing a general The rest of the paper is organized as follows. Section2 scenario planning model, we also tailor our model to a spe- explains the proposed MDP model. In Section 3.1, we char- cific region, Tampa Bay, FL, using NOAA projections and acterize the optimal policy through function analysis. Then, economic data for the region. we present a deep RL algorithm for finding the optimal Agent-based modeling has been a popular choice for sim- policy in Section 3.2. The scenario simulations for Tampa ulating complex systems including transportation systems Bay region are provided in Section4. Discussions about (Faboya et al., 2018), disaster recovery (Eid & El-Adaway, the proposed model is given in Section5, and the paper is 2018), and climate change (Patt & Siebenhuner¨ , 2005). Dif- concluded in Section6. For interested readers, we provide ferent than existing works, in this paper we propose a novel the proofs in the accompanying supplementary file. scenario planning model based on MDP and deep RL for a realistic setup with arbitrary number of actions and continu- 2. MDP Problem Formulation ous sea level rise values. Furthermore, as a major contribution, we theoretically analyze the optimal decision policy, We propose an MDP framework to model the behaviors of and illustrate the proposed model for the Tampa Bay region. stakeholders (government, residents, and nature), and the interactions between them. As shown in Figure1, government, 1.2. Contributions the central decision maker in dealing with the sea level rise problem, at each time step n takes an action (i.e., investment In our model, MDP agent is the government in an urban decision) xn and receives responses from the residents yn setup, which interacts with the nature and residents by mak- and the nature zn through the cost cn. Then, this natural and An MDP Model for Socio-Economic Systems Impacted by Climate Change

Markov State

Sn = (sn−1 + xn, ℓn−1 + rn) Table 1. Model parameters.

Sn Initial sea level `0 ≥ 0 xn rn ≥ 0 Sea level rise at time n rn ≥ 0 Pn Agent Environment Sea level at time n `n = `0 + m=1 rm Initial infrastructure state s0 ∈ {1, 2, ...} Government Nature Residents xn n−1 zn sn 1, ℓn 1 ≥ yn xm, zm ∈ , Infrastructure decision at time n xn ∈ {0, 1, . . . , q} xn ∈ {0, 1, . . . , q} ( − − ) 0 { }m=1 {0 1} Pn Infrastructure state at time n sn = s0 + m=1 xm xn zn yn Residents’ decision at time n yn ∈ {0, 1} cn Cost from nature at time n zn ≥ 0 Cost Total cost at time n cn = αxn − βyn + zn cn(xn, yn, zn) ≥ 0

2.1. Government Model Figure 1. Proposed MDP framework. At each time step, e.g., a year, the government decides the degree of its investment xn ∈ {0, 1, 2, ...., q} for infrastructure development, where q is a ﬁnite positive integer. xn = 0 means no investment at step n. Hence, there are (sn−1, `n−1) xn = 0, rn = r1 (s , ` + r ) n−1 n−1 1 q + 1 possible actions for the government at each time step.

xn = 1, rn = r2 The numerical value of xn = m can be interpreted as spend- ing m unit money towards the infrastructure development (sn−1 + 1, `n−1 + r2) or the mth action among q different actions with increasing cost and effectiveness. Possible government actions include

xn = q, rn = r3 but are not limited to building seawalls, raising roads, widen- ing beach, building traditional or horizontal levees, placing stormwater pumps, improving sewage systems, relocating seaside properties, etc. (Sergent et al., 2014; mar, 2016; (s + q, ` + r ) n−1 n−1 3 Xia et al., 2019). The range of xn is designed to cover the real world costs from the cheapest investment like clean- ing the pipes to the most expensive investment like buying Figure 2. MDP state transition. The state space consists of a continuous horizontal dimension for sea level and discrete vertical lands and property to relocate the seaside inhabitants and dimension for infrastructure state. Three possible state transitions businesses. are shown in the figure. Note that rn can take any nonnegative The total cost cn to the agent at each time n consists of the value. investment cost, cost from nature, and the residents’ contribution to the investment. Since the government’s investment decision and the residents’ contribution decision have integer values, we model the total cost as cn = αxn −βyn +zn α β socioeconomic system moves to a new state Sn based on the using parameters and to map the decisions to monetary previous state Sn−1, government’s action xn, and the sea values. The three different entities in the cost definition are α, β level rise rn. The system state consists of the pair of city’s in- combined by adjusting the parameters and the param- frastructure state sn and the sea level `n, i.e., Sn = (sn, `n). eters of the probabilistic model introduced for the nature’s The government’s investment decisions also determine the cost zn in Section 2.2. In Section4, we discuss how to set state of the city infrastructure sn, which is modeled as these parameters to obtain realistic costs for the Tampa Bay Pn region. The discounted cumulative cost for the government sn = s0 + m=1 xm = sn−1 + xn. Likewise, the sea level at step n is given by the cumulative sea level rise val- in N time steps is given by ` = ` + Pn r = ` + r s l ues: n 0 m=1 m n−1 n. Here 0 and 0 are N respectively the initial infrastructure state and the initial sea X n CN = a [αxn − βyn + zn], (1) level of the city. In terms of simulations, these are two user- g n=0 defined numbers representing the existing states at the beginning of the simulations. The system state clearly satisfies where the discount factor ag ∈ (0, 1) defines the weight the Markov property: P(Sn|Sn−1,..., S0) = P(Sn|Sn−1). of future costs in the agent’s decisions. Apart from being The state transition is illustrated in Figure2, and the param- a standard parameter in MDP cost function, ag has an im- eters of the proposed MDP framework are summarized in portant contextual significance in this research. It indicates Table1. We next explain our proposed models for the gov- how much the government values the future costs due to ernment, nature, and residents under the MDP framework. sea level rise in its decision making process. Hence, in this An MDP Model for Socio-Economic Systems Impacted by Climate Change

3000 with the sea level rise (nce, 2020). Thus, we model the Int. Low scale parameter of generalized Pareto distributed zn directly Intermediate 2500 High proportional to the most recent sea level `n−1 and inversely Int. Low Fitted proportional to the most recent infrastructure state sn−1. Intermediate Fitted The proposed model for zn is given by 2000

) High Fitted

m m

( zn ∼ GeneralizedPareto(k, σn, θ) l

e 1500

v a

e η(` ) l n−1

a θ ≥ 0, σn = , k < 0, (2)

e b

s (sn−1)

e 1000

t a

l where θ, σ, k are the standard parameters of generalized e R 500 Pareto: location, scale, and shape parameters, respectively; and η > 0, a ∈ (0, 1), b > 0 are our additional model pa-

0 rameters. The parameters k, θ, η, a, b helps to regulate the 2000 2010 2020 2030 2040 2050 2060 2070 2080 2090 2100 impact of most recent sea level ` over nature’s cost z Year n−1 n relative to the most recent infrastructure state sn−1. Choos- Figure 3. Adjusted NOAA projections (solid lines) and our fittings ing an appropriate set of parameters depends on the region (dashed lines) for relative sea level change for Tampa Bay, FL. considered for simulations. We explain how to set the parameters for the Tampa Bay region in Section4. Our prefer- ence for modeling the scale parameter and not the location work, a is termed as the government’s cooperation index. g parameter is due to the fact that the scale parameter can con- The government’s objective is to minimize the expected trol both the mean and the variance, whereas the location cumulative cost E[C ] by taking investment actions {x } N n parameter appears only in the mean. over time. 2.3. Residents Model 2.2. Nature Model In our framework, government requests a contribution from Hurricanes, floods, and stagnant water are some of the many residents to the sea level rise investments, e.g., through sea level rise-related natural events that cause cost in differ- an additional tax referendum; and residents make a binary ent ways such as loss of properties, jobs, taxes, and tourism decision y ∈ {0, 1} whether to support the government. incomes due to submerged areas. To model this cost z , we n n The model is flexible in terms of the frequency of voting. begin with modeling the sea level rise r . Recently, three n For instance, residents may vote every 5 years and the voting different regionally adjusted NOAA projections for sea level decision remains constant until the next voting. rise have been proposed for the Tampa region (Burke et al., 2019), as shown in Fig.3. An appropriate model for yn requires social modeling for the residents’ voting behavior. While the residents of a re- We take these projections as mean sea level rise values, gion in general form a heterogeneous community in terms and model the uncertainty around them using the Gamma of social and psychological factors, it is sufficient to model distribution, i.e., r ∼ Gamma(µ, φ), since it is a flexible n the aggregate response in this work since we consider a two-parameter probability distribution used for modeling referendum in which the majority vote wins. Similar to the positive variables including environmental applications, e.g., government’s cooperation index, we define the residents’ co- daily rainfall (Aksoy, 2000). We set the scale parameter operation index a ∈ (0, 1) to quantify how responsive the φ = 0.5 and vary the shape parameter µ in a range to match r residents are to the sea level rise problem. These cooperation the mean relative sea level, given by Pn E[r ], with m=1 m indices are helpful in simulating different government and the NOAA projection curves. The successful curve fitting, resident profiles, as illustrated in Section4. As explained shown in Fig.3, is achieved by increasing µ from 11.2 by the social exchange theory (Homans, 1974; Rasinski & to 12.388 with 0.012 increments for the intermediate-low Rosenbaum, 1987), some individuals’ votes are motivated projection; from 13 to 34.78 with 0.22 increments for the by what benefit they receive in exchange. A high percentage intermediate projection; and from 14.6 to 87.86 with 0.74 of such individuals in a community is represented by a low increments for the high projection. ar value in our model. According to another perspective, Then, we model nature’s cost zn using the generalized voter behavior is determined by non-self-interested factors Pareto distribution, which is commonly used to model catas- such as a sense of civic duty (Katosh & Traugott, 1982), trophic losses, e.g., (Cebrian´ et al., 2003; Uchiyama & moral obligation (Rasinski et al., 1985), and psychological Watanabe, 2006; Daspit & Das, 2012). It is known that sense of community (Davidson & Cotter, 1993). In our the storm- and flooding-related costs have been increasing model, a high ar value corresponds to a community which An MDP Model for Socio-Economic Systems Impacted by Climate Change is dominantly composed of this type of individuals who optimal policy, the Bellman equation value non-self-interested factors. V (sn−1, `n−1) = min E[cn + agV (sn, `n)|xn] We use ar to obtain a score function that gives the proba- xn bility of support from the residents. In addition to internal provides a recursive approach by focusing on finding the factors, represented by ar, voters’ decision is also affected optimal action x at each time step using the successor state by external factors, government’s actions {xn} and nature’s n value, instead of trying to find the entire policy {x } at once. cost {zn} in our case. We hypothesize that the probability n Using the cost definition given by (1) and considering possi- of a positive voting outcome, P(yn = 1), is high when both nature’s cost and government’s efforts are high (i.e., high ble q actions for xn this iterative equation can be rewritten as xn and zn) in the recent history. In other words, when either nature’s cost has been low or government is not responsive, n residents lack the main reason for accepting an additional V (sn−1, `n−1) = min tax. Hence, we propose the following score function and E − βy + z + a V (s , ` + r ), the Bernoulli model for y : n n g n−1 n−1 n n | {z } F0(sn,`n) n−1 X n−m E α − βyn + zn + agV (sn−1 + 1, `n−1 + rn) , hn = ar xmzm, (3) m=1 | {z } F1(sn,`n) 1 y ∼ (p ), p = σ(h ) = , E2α − βy + z + a V (s + 2, ` + r ), n Bernoulli n n n −(h −h ) n n g n−1 n−1 n 1 + e n 0 | {z } F2(sn,`n) where σ(·) is the logistic sigmoid function used for mapping ... , the score hn to probability pn. High hn means a high likeli- o hood of support from the residents. The sigmoid’s midpoint E qα − βyn + zn + agV (sn−1 + q, `n−1 + rn) , (4) h0 is empirically obtained as the average value of hn. In | {z } addition to being a cooperation index, another interpreta- Fq (sn,`n) tion for ar is the concept of discount (or forgetting) factor where Fm(sn, `n) is defined as the expected total cost of for past events, similar to the government’s ag coefficient. taking action xn = m at state (sn, `n). For high ar values, residents have a longer memory, and they consider also the past disasters and government efforts At each time step n, the action xn shapes the instant with an exponentially decaying weight in their decision. cost cn and moves the system to the next state, which Whereas, for a community with low ar, residents have a determines the discounted future cost agV (sn, ln). The short memory and only care about the most recent cost and optimum policy chooses among the investment actions government effort. In an extreme case where ar is close to xn ∈ {0, 1, 2, ...., q} that has the minimum expected to- zero, they may not be even motivated to contribute by a very tal cost, minm{Fm(sn, `n)}, as shown in (4). Since the recent disaster and government’s high efforts. functions {F0(sn, `n),...,Fq(sn, `n)} determine the optimal policy, we next analyze them to characterize the optimal 3. Optimal Policy government policy. Theorem 1. For m = 0, 1, . . . , q, F (s , ` ) is nonde- In this section, we first analyze the optimal policy for gov- m n n creasing and concave in `n for each sn; and the deriva- ernment investments, and then provide a practical algorithm tive of F (s , ` ) with respect to ` is lower than that of to find the optimal policy. m n n n Fm−1(sn, `n) for m = 1, . . . , q.

3.1. Optimal Policy Analysis Proof is provided in the Supplementary ﬁle. For In the proposed MDP framework, the objective of a rational a speciﬁc infrastructure state sn, expected costs F0(sn, `n),F1(sn, `n),...,Fq(sn, `n) for all the q + 1 government is to minimize the expected total cost E[CN ] in N time steps by following an optimal investment policy. actions are illustrated in Figure4 according to Theorem1. Central to MDP is the optimal value function The optimum policy picks the minimum of the q + 1 curves at each time, which is shown with the solid curve in Figure

V (sn, `n) = min E[CN |{xn}], 4. As a result of Theorem1, we next give the outline of {xn} optimum policy in Corollary1. which gives the minimum expected total cost possible at Corollary 1. The optimum policy, at each infrastructure each state (sn, `n), denoted as the optimal value of that state sn, compares the sea level `n with at most q thresholds state, by choosing the best action policy {xn}. To ﬁnd the where each threshold signiﬁes a change of optimal action. An MDP Model for Socio-Economic Systems Impacted by Climate Change

F (s , ` ) 0 n n The thresholds also depend on the cooperation indices ag and ar. Higher cooperation indices set the thresholds lower

F1(sn, `n) and vice versa. Notably, the government’s cooperation index ag is dominant in shaping the thresholds. Intuitively, F (s , ` ) 2 n n as ag grows, the government becomes more cautious about (i.e., sees more objectively without severely discounting) the F (s , ` ) 3 n n expected future natural costs and sets a lower threshold for investment actions. On the contrary, small ag implies under- estimated future costs and thus overemphasized investment costs, which results in a high threshold for investment.

Expected Total Cost 3.2. Finding the Optimal Policy We showed that the optimal policy is given by a number of thresholds on the current sea level; however completely specifying the optimal policy with analytical expressions for thresholds does not seem tractable due to a large number of 0 parameters involved in the problem. Hence, we next propose ` (s ) ` (s ) ` (s ) thr1 n−1 thr2 n−1 thr3 n−1 `n a reinforcement learning (RL) algorithm to ﬁnd the optimal

e.g.: `n policy for simulated scenarios. Speciﬁcally, to learn the

⇒ optimal action is xn = 1 optimal policy here, a deep RL algorithm is needed due to the continuous sea level values, which cause infinite number of possible states. The deep Q network (DQN) algorithm Figure 4. Expected total costs as a function of sea level for an (Mnih et al., 2015), which is a popular choice for deep RL, example case with q = 3. The expected total costs intersect each addresses well the infinite dimensional state space problem. other only once for any given infrastructure state according to It leverages a deep neural network to estimate the optimal Theorem1. action-value function To prove Corollary1, note that Q(sn, `n, xn) = E[cn + ag min Q(sn + xn, `n+1, x) | xn]. x Fm−1(sn, `n = 0) < Fm(sn, `n = 0) where V (s , ` ) = min Q(s , ` , x ). In Algorithm1, for m ∈ {1, 2, ...., q} because ` = 0 corresponds to n n xn n n n n we present a DQN algorithm for learning the optimal policy the fictional case of zero sea level where there is no on government’s infrastructure investment actions. risk. That is, Fm−1(sn, `n) starts at a lower point than Fm(sn, `n), but increases faster than Fm(sn, `n) since its Algorithm1 operates many episodes to iteratively compute derivative is higher (Theorem1). Also from Theorem1, it the weights of the DQN neural network that relates input is known that both of them are concave and bounded, hence states and actions with the output Q(S, x) values. Each Fm−1(sn, `n) and Fm(sn, `n) intersect exactly at one point episode consists of Monte-Carlo simulations in which sev- for m ∈ {1, 2, . . . , q}. While for `n less than the inter- eral states are visited according to the current policy defined section point the action xn = m is less effective than the by the current value function. Selecting the actions always action xn = m − 1 in terms of immediate cost and expected using equation (4) may keep some of the states not explored future cost, it becomes more effective when `n exceeds the enough. To encourage the agent to explore the state space intersection point. adequately, we use a technique that explores frequently at the beginning, but reduces exploration with gaining more Figure4 gives an example case where there are q = 3 experience over time (Singh et al., 2000). After the weight thresholds ` (s ), ` (s ), ` (s ) which de- thr1 n−1 thr2 n−1 thr3 n−1 matrix values converge, the final weight matrix is used to pend on s and indicate change points of optimal action. n−1 generate scenario simulations, as described in the following However, depending on the slopes of {F (s , ` )} curves m n n section. The convergence of the weight matrix is depicted at each infrastructure state s , there may be less than q n−1 in Fig. 3.2. change points.

To summarize, for a given state (sn−1, `n−1), the optimum 4. Scenario Simulations policy chooses xn based on the relative value of the current sea level `n = `n−1 + rn, where rn is the rise on top In this section, we present scenario simulations for the of `n−1, with respect to the existing infrastructure state Tampa Bay region in Florida using our MDP model and Pn−1 sn−1 = m=1 xm. deep RL algorithm. Tampa Bay, situated along the Gulf of An MDP Model for Socio-Economic Systems Impacted by Climate Change

Algorithm 1 DQN algorithm for ﬁnding optimum policy 6 1: Input: ag, ar, α, β, θ, k, η, a, b 2: Initialize replay memory D to capacity N 3: Initialize action-value function Q with random weights 5

w and target action-value function Q0 with random s

0 e

weights w = w g

n 4

4: for episode = 1, 2, ... do a

h c

5: Initialize state S0 = (s0, `0)

d e

6: for n = 1, 2, ..., N do r

a 3 u

7: With probability select a random action xn, oth- q s

erwise select x = arg min Q(S , x; w) f

n x n o 2

8: Execute action xn and observe cost cn = αxn − m u

βyn + zn (see (2) and (3)) S 9: Store transition (Sn, xn, cn, Sn+1) in D 1 10: Sample random minibatch of transitions (Sj, xj, cj, Sj+1) from D 0 0 11: Set target tj = cj + ag minx Q (Sj+1, x; w ) 0 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 12: Perform a gradient descent step on [t − j Number of weight updates Q(Sj, xj; w)] with respect to the weights w 13: d w0 = w Every steps reset Figure 5. Convergence of weights in Algorithm 1. 14: if Q(S, x) converges for all S, x then 15: break obtain different scenarios by varying the cooperation indices 16: end if ag and ar for the government and residents, respectively, 17: end for and by considering different NOAA projections for sea level 18: end for rise. As expected, the average total cost decreases with growing cooperation index for both the government (ag) and residents (ar). The cost is significantly higher if both the government and residents are not cooperative (a = a = Mexico in Florida with more than 3 million residents, is g r 0.1) compared to the cooperative case (a = a = 0.9). listed number seven among the most at-risk areas in terms g r of risk of damaging floods on the globe by the World Bank We next compare the proposed MDP-based government (wor, 2013). We have three different sea level rise projec- policy, which proactively improves infrastructure based on tions by NOAA until the year 2100 for the area (Burke et al., observed sea levels, with the shortsighted policy which re- 2019), as shown in Fig.3. The City of Tampa expects to acts to significant costs from the nature by improving the collect $14 M/year for 30 years from residents to improve infrastructure. We model this shortsighted policy with a its stormwater system (sto, 2019). Thus, we set β = 14 with threshold parameter for the cost from nature zn. When zn the base unit of million dollars. According to the ongoing exceeds the threshold, the shortsighted policy improves the improvement projects and regular stormwater services of infrastructure by three units (xn = 3). Under the three City of Tampa (sto, 2019), annual investment may range sea level projections, we find the optimum thresholds that between $ 25-75 M. Hence, we choose α = 25 and the minimize the average total cost of shortsighted policy in number of nonzero actions q = 3, i.e., xn ∈ {0, 1, 2, 3}.A 100 years as 34, 28, and 21 for the intermediate low, in- recent report by the Tampa Bay Regional Planning Coun- termediate, and high projections, respectively (Figure7). cil (tbr, 2017) gives “the cost of doing nothing” for years The optimal thresholds in Figure7 show that the infras- 2020-2060 due to the sea level rise impacts under the tructure improvement is more urgent under higher sea level high sea level rise projections as $162 billion. We obtain projections, in accordance with the intuition. P2060 z = 162, 000 (in million dollars) by keeping the in- 2020 n Finally, in Fig.8, we compare the best MDP-based policy frastructure state at the initial level s = 50, increasing the 0 (a = a = 0.9) with the best shortsighted policy, which sea level from the initial ` = 100 according to the gamma g r 0 uses the optimum threshold, in terms of average total cost in distribution following the high sea level projection (Section 100 years for all sea level projections. The same resident and 2.2), and setting the parameters of generalized Pareto model nature models are used while simulating both policies. For as k = −0.001, θ = 1, η = 25, a = 0.9, b = 1.1. All the all the three sea level projections, the MDP-based proactive costs in the following figures are normalized by taking the policy, whose decisions are driven by the current sea level, residents’ contribution 14 M as one unit. achieves around 40% less cost than the shortsighted reactive In Fig.6, we analyze the effect of cooperation indices. We policy, which decides based on the cost from nature. An MDP Model for Socio-Economic Systems Impacted by Climate Change

3400 4500 6500 s s a = 0:1 a = 0:1 ar = 0:1

r s r 6400

r r

3300 r 4400

a a

a = 0:5 e a = 0:5

e ar = 0:5 r r

e y

y 6300 a = 0:9 y ar = 0:9 ar = 0:9 3200 r 0

0 4300

0 0

0 6200

1 n n 3100

n 4200

i i

i 6100

o o

3000 o 4100 6000

a a

a 5900

t t

2900 t 4000

t t

5800

g g

2800 g 3900

a a

r 5700

v v 2700 3800 v

5600

A A

2600 3700 5500 0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9 ag ag ag

Figure 6. Average total cost as a function of government cooperation index ag and resident cooperation index ar for different NOAA projections: intermediate low (left), intermediate (middle), high (right).

9500 8000 Shortsighted policy 9000 MDP-based policy 7000

8500

r a

a intermediate low

e y

y 8000 intermediate 0

0 6000

0 0

high 1

1 7500

s o o 7000

c 5000

o o

6500 t

e g

g 4000

a r

r 6000 e

e 21

A A 5500 28 3000

5000 34 4500 2000 0 10 20 30 40 50 60 70 80 intermediate low intermediate high Threshold Sea level rise projections Figure 7. Average total cost for the shortsighted government policy Figure 8. Comparison of the cumulative cost for threshold based as a function of investment threshold under the three sea level and agent based optimal policy. projections. The optimum thresholds for each projection are shown in colored boxes. but not least, modeling the cooperation indices (ag, ar) is 5. Discussions and Future Work an interesting research direction that would require an inter- disciplinary effort. By decomposing them into a number of To the best of our knowledge, we presented the first com- fundamental traits, such as political, sociological, religious, prehensive MDP framework with theoretical and numerical and ethical traits of governments and residents, one can analysis for modeling socio-economic scenarios of the cli- study how to obtain ag, ar, or another cooperation index for mate change-related sea level rise problem. Although we a new stakeholder, for a given community. strived for a realistic framework, we know that the presented work can be improved in several aspects to obtain a more 6. Conclusion realistic framework. For instance, a continuous action space for the government can be integrated into the current model A novel Markov Decision Process (MDP) model has been by considering the monetary value of investments instead of proposed for simulating socio-economic scenarios of the sea a discrete set of infrastructure improvement actions. Policy level rise problem. The optimal government policy has been gradient methods such as the actor-critic method can be shown to rely on the observed sea levels through analyzing used instead of DQN to find the optimal policy with con- the expected cost functions. We also provided a deep rein- tinuous actions. Moreover, multi-agent RL methods can be forcement learning (RL) algorithm for finding the optimal used to actively model the decisions of other stakeholders, policy, and presented scenario simulations for the Tampa such as residents and businesses, and interactions between Bay region using available economic data from the city them. Modeling of each stakeholder would ideally need government and regional sea level projections from NOAA. a tailored approach that incorporates the characteristics of The simulations of several scenarios corroborated the im- stakeholder into designing cost function, actions, etc. Last portance of supportive residents and a proactive government An MDP Model for Socio-Economic Systems Impacted by Climate Change that improves the infrastructure according to the observed Cebrian,´ A. C., Denuit, M., and Lambert, P. General- sea level and does not undervalue the future cost of sea level ized pareto fit to the society of actuaries? large claims rise. Finally, we discussed several ways of improving the database. North American Actuarial Journal, 7(3):18–36, proposed model for generating more realistic scenarios. 2003.

Daspit, A. and Das, K. The generalized pareto distribu- References tion and threshold analysis of normalized hurricane dam- Which Coastal Cities Are at Highest Risk of Dam- age in the united states gulf coast. In Joint Statistical aging Floods? New Study Crunches the Numbers, Meetings (JSM) Proceedings, Statistical Computing Sec- The World Bank. https://www.worldbank. tion, Alexandria, VA: American Statistical Association, org/en/news/feature/2013/08/19/ pp. 2395–2403, 2012. coastal-cities-at-highest-risk-floods, Davidson, W. B. and Cotter, P. R. Psychological sense of 2013. community and support for public school taxes. American Marin County Community Development Agency, Journal of Community Psychology, 21(1):59–66, 1993. https://www.marincounty. Game of Floods. Drogoul, A. Agent-based modeling for multidisciplinary org/depts/cd/divisions/planning/ and participatory approaches to climate change adaptation csmart-sea-level-rise/game-of-floods , planning. In Proceedings of RFCC-2015 Workshop, AIT, 2016. 2015.

The Cost Of Doing Nothing, Economic Impacts of Sea Eid, M. S. and El-Adaway, I. H. Decision-making frame- Level Rise in the Tampa Bay Area, Tampa Bay Regional work for holistic sustainable disaster recovery: Agent- Planning Council. http://www.tbrpc.org/ based approach for decreasing vulnerabilities of the as- wp-content/uploads/2018/11/2017-The_ sociated communities. Journal of infrastructure systems, Cost_of_Doing_Nothing_Final.pdf, 2017. 24(3):04018009, 2018.

City of Tampa Stormwater Utility Funding. https: Ezer, T. Sea level rise, spatially uneven and temporally //www.tampagov.net/tss-stormwater/ unsteady: Why the US East Coast, the global tide gauge programs/assessment, 2019. record, and the global altimeter data show different trends. Geophysical Research Letters, 40(20):5439–5444, 2013. NOAA National Centers for Environmental Informa- tion (NCEI), U.S. Billion-Dollar Weather and Cli- Faboya, O. T., Figueredo, G. P., Ryan, B., and Siebers, P.-O. mate Disasters. https://www.ncdc.noaa.gov/ Position paper: The usefulness of data-driven, intelli- billions/time-series, 2020. gent agent-based modelling for transport infrastructure management. In 2018 21st International Conference on Aksoy, H. Use of gamma distribution in hydrological analy- Intelligent Transportation Systems (ITSC), pp. 144–149. sis. Turkish Journal of Engineering and Environmental IEEE, 2018. Sciences, 24(6):419–428, 2000. Filar, J. and Vrieze, K. Competitive Markov decision pro- Beatley, T. Planning for coastal resilience: best practices cesses. Springer Science & Business Media, 2012. for calamitous times. Island Press, 2012. Fu, X., Song, J., Sun, B., and Peng, Z.-R. Living on the edge: Bellman, R. A Markovian decision process. Journal of Estimating the economic cost of sea level rise on coastal mathematics and mechanics, pp. 679–684, 1957. real estate in the Tampa Bay region, Florida. Ocean & Coastal Management, 133:11–17, 2016. Berke, P. R. and Stevens, M. R. Land use planning for climate adaptation: Theory and practice. Journal of Plan- Fu, X., Gomaa, M., Deng, Y., and Peng, Z.-R. Adaptation ning Education and Research, 36(3):283–289, 2016. planning for sea level rise: A study of US coastal cities. Journal of Environmental Planning and Management, 60 Buchanan, M. K., Kopp, R. E., Oppenheimer, M., and (2):249–265, 2017. Tebaldi, C. Allowances for evolving coastal ﬂood risk under uncertain local sea-level rise. Climatic Change, Hapke, C. J., Brenner, O., Hehre, R., and Reynolds, B. J. 137(3-4):347–362, 2016. Assessing hazards along our Nation’s coasts. Technical report, US Geological Survey, 2013. Burke, M., Carnahan, L., Hammer-Levy, K., and Mitchum, G. Recommended projections of sea level rise in the Hine, A. C., Chambers, D. P., Clayton, T. D., Hafen, M. R., tampa bay region (update). 2019. and Mitchum, G. T. Sea Level Rise in Florida: Science, An MDP Model for Socio-Economic Systems Impacted by Climate Change

Impacts, and Options. University Press of Florida. ISBN Rasinski, K., Tyler, T. R., and Fridkin, K. Exploring the 9780813062891. function of legitimacy: Mediating effects of personal and institutional legitimacy on leadership endorsement Hinkel, J., Jaeger, C., Nicholls, R. J., Lowe, J., Renn, O., and system support. Journal of Personality and Social and Peijun, S. Sea-level rise scenarios and coastal risk Psychology, 49(2):386, 1985. management. Nature Climate Change, 5(3):188–190, 2015. Rasinski, K. A. and Rosenbaum, S. M. Predicting citizen support of tax increases for education: A comparison of Homans, G. C. Social behavior: Its elementary forms. 1974. two social psychological perspectives. Journal of Applied Social Psychology, 17(11):990–1006, 1987. Howard, R. A. Dynamic programming and markov pro- cesses. 1960. Research Council, N. Informing decisions in a changing climate. Panel on Strategies and Methods for Climate- Kashem, S. B., Wilson, B., and Van Zandt, S. Planning Related Decision Support of the Committee on the Hu- for climate adaptation: Evaluating the changing patterns man Dimensions of Global Change, National Research of social vulnerability and adaptation challenges in three Council of the National Academies, 2009. coastal cities. Journal of Planning Education and Re- search, 36(3):304–318, 2016. Sallenger Jr, A. H., Doran, K. S., and Howd, P. A. Hotspot of accelerated sea-level rise on the Atlantic coast of North Katosh, J. P. and Traugott, M. W. Costs and values in the America. Nature Climate Change, 2(12):884–888, 2012. calculus of voting. American Journal of Political Science, pp. 361–376, 1982. Schrock, G., Bassett, E. M., and Green, J. Pursuing equity and justice in a changing climate: Assessing equity in Le Cozannet, G., Manceau, J.-C., and Rohmer, J. Bounding local climate and sustainability plans in US cities. Jour- probabilistic sea-level projections within the framework nal of Planning Education and Research, 35(3):282–295, of the possibility theory. Environmental Research Letters, 2015. 12(1):14012, 2017. Sergent, P., Prevot, G., Mattarolo, G., Brossard, J., Morel, Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, G., Mar, F., Benoit, M., Ropert, F., Kergadallan, X., J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidje- Trichet, J.-J., et al. Adaptation of coastal structures to land, A. K., Ostrovski, G., et al. Human-level control mean sea level rise. La Houille Blanche, (6):54–61, 2014. through deep reinforcement learning. Nature, 518(7540): 529, 2015. Singh, S., Jaakkola, T., Littman, M. L., and Szepesvari,´ C. Convergence results for single-step on-policy Nicholls, R. J. Planning for the impacts of sea level rise. reinforcement-learning algorithms. Machine learning, Oceanography, 24(2):144–157, 2011. 38(3):287–308, 2000.

Pachauri, R. K., Allen, M. R., Barros, V. R., Broome, J., Sutton, R. S. and Barto, A. G. Reinforcement learning: An Cramer, W., Christ, R., Church, J. A., Clarke, L., Dahe, introduction. MIT press, 2018. Q., and Dasgupta, P. Climate change 2014: synthesis Sweet, W. V. and Park, J. From the extreme to the mean: Ac- report. Contribution of Working Groups I, II and III to celeration and tipping points of coastal inundation from the ﬁfth assessment report of the Intergovernmental Panel sea level rise. Earth’s Future, 2(12):579–600, 2014. on Climate Change. IPCC, 2014. Tribbia, J. and Moser, S. C. More than information: what Patt, A. and Siebenhuner,¨ B. Agent based modeling coastal managers need to plan for climate change. Envi- and adaption to climate change. Vierteljahrshefte zur ronmental science & policy, 11(4):315–328, 2008. Wirtschaftsforschung, 74(2):310–320, 2005. Tyler, S. and Moench, M. A framework for urban climate re- Plant, N. G., Robert Thieler, E., and Passeri, D. L. Coupling silience. Climate and development, 4(4):311–326, 2012. centennial-scale shoreline change to sea-level rise and coastal morphology in the gulf of mexico using a bayesian Uchiyama, Y. and Watanabe, S. Modeling Natural Catas- network. Earth’s Future, 4(5):143–158, 2016. trophic Risk and its Application, 2006.

Potts, R., Jacka, L., and Yee, L. H. Can we Catch em All’? Wahl, T., Calafat, F. M., and Luther, M. E. Rapid changes An exploration of the nexus between augmented reality in the seasonal sea level cycle along the US Gulf coast games, urban planning and urban design. Journal of from the late 20th century. Geophysical Research Letters, Urban Design, 22(6):866–880, 2017. 41(2):491–498, 2014. An MDP Model for Socio-Economic Systems Impacted by Climate Change

Wahl, T., Jain, S., Bender, J., Meyers, S. D., and Luther, M. E. Increasing risk of compound ﬂooding from storm surge and rainfall for major US cities. Nature Climate Change, 5(12):1093, 2015. Xia, R., Kannan, S., and Castleman, T. The Ocean Game: The sea is rising. Can you save your town? https://www.latimes.com/projects/ la-me-climate-change-ocean-game/, 2019.