Mixture Model of Generalized Chain-Dependent Processes and Its
Total Page:16
File Type:pdf, Size:1020Kb
Mixture model of generalized chain-dependent processes and its application to simulation of interannual variability of daily rainfall Xiaogu Zheng National Institute of Water and Atmospheric Research, Wellington, New Zealand Richard W. Katz Institute for Study of Society and Environment, National Center for Atmospheric Research, Boulder, Colorado *Corresponding author. Tel: +64-4-386-0499, fax: +64-4-386-2153 E-mail address: [email protected] Abstract This study is devoted to simulation of the interannual variability of daily rainfall, such as the mean and variance of seasonal rainfall totals and the correlation between seasonal rainfall totals and a seasonal climate variable (e.g., an index of atmospheric or oceanic circulation). Prior work established that a mixture of chain- dependent processes, involving conditioning on an observed seasonal circulation index, results in improved representation of the interannual variance of seasonal rainfall totals, as well as of the correlation between rainfall and this index. The present study extends this work through consideration of a mixture model of generalized chain-dependent processes. Here “generalized” refers to the inclusion of a generalized linear model in accounting for the relationship between daily rainfall statistics and the circulation index. It is shown that this extension results in further improvements in the simulation of the interannual variability of seasonal rainfall and, especially, in its correlation with the circulation index. Keywords: Simulation of precipitation; Interdecadal Pacific Oscillation; Mixture model; EM algorithm; New Zealand data. 1. Introduction Daily rainfall is a very important environmental variable for hydrological research. Particularly, it is the most important input for hydrological models, such as TOPNET (Bandaragoda et al. 2004). Such hydrological models can simulate inflows and outflows of reservoirs, which are vital in the analysis of hydroelectricity generation capacity in New Zealand (Zheng and Thompson 2007). A chain-dependent process (i.e. Katz and Parlange 1993) is a popular stochastic rainfall generator to simulate daily rainfall. It can simulate the intraseasonal variability of daily rainfall, such as the distributions of daily rainfall amount and wet- dry durations, reasonably well. However, in order to study the response of hydrological states to climate variability and change, it is also important to correctly simulate the interannual variability of rainfall, such as the variance of seasonal rainfall totals and the correlation between seasonal rainfall totals and a climate variable (e.g., a circulation index) (Zheng and Thompson 2007). Numerous studies indicate that a chain-dependent process tends to underestimate the variance of seasonal rainfall total, a phenomenon called “overdispersion” (Katz and Parlange 1998; Katz and Zheng 1999; Zheng and Thompson 2007). Overdispersed models tend to underestimate the risks of drought and flood to hydropower generation capacity. Katz and Zheng (1999) proposed a mixture of chain- dependent processes to reduce this overdispersion. Despite only two possible states, they found that the hidden index involved in the mixture model is closely related to observed information about atmospheric circulation patterns. Observed climate variables, such as Southern Oscillation indexes, tropical sea surface temperature (SST indexes), and other indexes of global and regional mean circulations, are closely related to rainfall. Many of these climate variables possess considerable predictability at seasonal to decadal time scales. Correctly simulating the relation between rainfall and a climate variable will help to understand the variability of rainfall in response to climate variability and change and, therefore, to project future scenarios of hydropower generation capacity in New Zealand. As an extension of Katz and Zheng (1999), Zheng and Thompson (2007) proposed a mixture of chain- dependent process, conditional on an observed circulation index, to reflect the correlation between rainfall and this index and to reduce the overdispersion. Although the model simulates the correlation fairly well, it is still not entirely satisfactory. In particular, for some climates considerable overdispersion remains. Generalized linear modeling is another approach to rainfall generation to simulate the relation between rainfall and climate variables (e.g., Chandler and Wheater 2002, Furrer and Katz 2007). In this study, we will investigate the possibility of combining a mixture of chain-dependent process conditional on a seasonal climate variable (i.e., the model of Zheng and Thompson 2007) with a generalized linear model (GLM); that is, logistic regression with the climate variable as a predictor. In this way, it is hoped to create a rainfall generator whose ability to simulate the relationship between rainfall and the climate variable is improved. In the course of this study, we will also investigate how to correctly simulate the mean of seasonal rainfall total when a power transformation is applied to daily rainfall. The paper is arranged as follows. In section 2 we define a generalized mixture model for precipitation conditioned on a seasonal climate variable, Section 3 describes a simulation study of long term daily rainfall in the Pukaki catchment, New Zealand, using the proposed generalized mixture model as well as previously proposed simpler versions. Finally, section 4 consists of a discussion and conclusions. 2. Methodology In this section, we introduce a rainfall generator called a mixture of “generalized” chain-dependent processes conditional on a climate variable. All the rainfall generators previously studied by Katz and Zheng (1999) and Zheng and Thompson (2007), involving various types of mixtures of chain-dependent processes, can be viewed as special cases of this generalized model (i.e., can be obtained from the present model by imposing constraints, see Table 1). 2.1. Generalized chain-dependent process A chain-dependent process is a relatively simple stochastic model for daily precipitation (Katz and Parlange 1993). Let {J y,t ,t = 1,...,T} denote the sequence of daily precipitation occurrences in year y (i.e. J y,t = 1 indicates a “wet day” and J y,t = 0 a “dry day”). It is assumed that this process is a first-order Markov chain: a model completely characterized by the transition probabilities Pjk = Pr(J y,t+1 = k | J y,t = j), j, k = 0, 1. (1) Note that Pj0 = 1− Pj1 , j = 0, 1. Let R = {Ry,t ,t = 1,...,T} denote the time series of daily precipitation amounts in year y. The “intensity” Ry,t > 0 (i.e., days for which J y,t = 1 ) is taken to be conditionally independent and identically distributed. It is further assumed that daily precipitation intensity has a power transform distribution. That is, the transformed p variable Ry,t has a Gaussian distribution, say, with mean μ and standard deviation σ . For example, the values of p = 1/2, 1/3, or 1/4 are commonly employed to account for the high degree of positive skewness in the distribution of daily precipitation amounts. Alternatively, a positively skewed distribution, such as the gamma or mixed-exponential, may be fitted directly to the untransformed data. Recently, generalized linear models (GLM, McCullagh and Nelder 1989) were applied to model daily rainfall, including its relationship with a circulation index (see, for example, Chandler and Wheater 2002; Furrer and Katz 2007). In this study, we will investigate a simple GLM rainfall generator, in which the rainfall occurrence is related to a seasonal climate variable, whose state in year y is denoted by X y (i.e. an SST index), by logistic regression Pr(J y,t =1| J y,t−1 = j, X y )=1−1 [1+ exp(α0 +α1 X y + j(β0 + β1 X y ))]. (2) If α1 = 0 and β1 = 0 , the model degenerates to a chain-dependent process. 2.2. Mixture of generalized chain-dependent processes conditional on a climate variable Let I y ( I y = i , i = 0, 1) denote a two-state index, usually hidden or unobserved or only partially observed, that remains constant over the time period of T days (i.e., a season). The annual time series of index states is assumed to be identically distributed, with a common distribution w = Pr(I y = 1) = 1− Pr(I y = 0). (3) For a given year, the time series of daily precipitation amounts {Ry,t ,t = 1,...,T} is a mixture of generalized chain-dependent processes; i.e., given I y = i , I is conditionally a generalized chain-dependent process with parameters iα0 , iα1 , i β0 , i β1 , i μ , i σ and p (i.e., for simplicity, the power transform parameter p remains fixed). Moreover, if the rainfall depends not only on the hidden index I y , but also on a seasonal climate variable X y , then {Ry,t ,t = 1,...,T} is a mixture of generalized chain-dependent processes conditioned on a climate variable. The seasonal climate variable X y is assumed to be a mixture of two Gaussian random variables with mean i m that depends on I y = i and a common standard deviation s. Specifically, Pr(X y | I y = i) = φ(X y ;i m,s) , (4) where “Pr” is being used as a generic “probability function” (i.e., both for discrete probabilities and for density functions) and φ(x;m,s) denotes the Gaussian density function with mean m and standard deviation s. By Bayes theorem, wy ≡ Pr(I y = 1| X y ) Pr(X | I =1) Pr(I =1) = y y y Pr(X y | I y = 0) Pr(I y = 0) + Pr(X y | I y =1) Pr(I y =1) = [φ(X y ;1m,s)w] [φ(X y ;0 m,s)(1− w) + φ(X y ;1m,s)w] (5) If 0 m=1m , then Pr(I y = 1| X y ) = w = Pr(I y = 1) by equation (5). Therefore, if Xy is independent of I y , then the mixture of generalized chain-dependent processes conditioned on X y degenerates to a mixture of generalized chain-dependent processes.