
2016 IEEE International Symposium on Information Theory

Latent Tree Ensemble of Pairwise Copulas for Spatial Extremes Analysis

Hang Yu, Junwei Huang, and Justin Dauwels
School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore

Abstract: We consider the problem of jointly describing extreme events at a multitude of locations, which is of paramount importance in catastrophe forecasting and risk management. Specifically, a novel Ensemble-of-Latent-Trees of Pairwise Copulas (ELTPC) model is proposed. In this model, the spatial dependence is captured by latent trees expressed by pairwise copulas. To compensate for the limited expressiveness of every single latent tree, a mixture of latent trees is employed. By harnessing variational inference and stochastic gradient techniques, we further develop a triply stochastic variational inference (TSVI) algorithm for learning and inference. The corresponding computational complexity is only linear in the number of variables. Numerical results from both synthetic and real data show that the ELTPC model provides a reliable description of the spatial extremes in a flexible but parsimonious manner.

I. INTRODUCTION

Extreme events such as hurricanes, droughts, and floods often give rise to fatal consequences and huge economic loss. For example, Hurricane Sandy in 2012 resulted in a death toll of at least 233 and a total loss of 75 billion USD. Worse still, both observational data and climate models imply that both the occurrences and sizes of such catastrophes will increase in the future. It is therefore imperative to analyze such events, assess their likelihood, and take precautionary measures.

Models of spatial extremes often resort to a two-stage approach [1]. The first stage involves learning the marginal distributions at each measuring site. The marginal parameters are typically coupled in space to address the issue of inaccurate estimation due to the lack of extreme-value samples [2]. We then proceed to the second stage and couple the marginals together via copulas or max-stable processes. For instance, the marginals are glued together by a Gaussian copula in [3] and a copula Gaussian graphical model in [4]. However, Gaussian copulas are asymptotically tail independent [5], indicating that they are not able to capture the dependence between high-quantile extremes (e.g., extreme wind gusts at different locations). To address this issue, models with tail dependence are proposed, including max-stable processes and extreme-value (max-stable) copulas [5]. Unfortunately, such models are defined by complicated cumulative distribution functions (CDF). The corresponding likelihood involves differentiation with respect to (w.r.t.) all variables, and hence is intractable in high-dimensional settings [5]. As a remedy, composite likelihood [6] replaces the original likelihood by a pseudo-likelihood composed of pairwise likelihoods. However, different configurations of composite likelihood have to be tested before the best one can be determined. Vine copula models further overcome this deficiency by expressing different orders of dependencies using pairwise-copula trees and selecting all trees to be maximum spanning trees [7]. Moreover, vine copulas can capture tail dependence by properly selecting the base pairwise copulas [8]. However, the biggest impediment to the application of vine copulas is that the number of parameters increases quadratically w.r.t. the dimension.

As a consequence, the existing extreme-value models are often limited to tens of variables. Yet many practical statistical problems, for instance in Earth Sciences, involve thousands or millions of sites (variables). Graphical models, on the other hand, can handle a large number of variables [9]; nevertheless, they have rarely been applied to extreme-event analysis. When describing spatial extremes, it is reasonable to use a lattice graph as shown in Fig. 1a. Unfortunately, constructing a cyclic graphical model from pairwise copulas is intractable; the corresponding log partition function cannot be computed in closed form. However, the density function of a tree graphical model can be expressed by pairwise copulas. As such, our objective is to approximate the lattice graphical model using tree graphical models. A fruitful step in this direction is taken in [10], [11], where a mixture of all the spanning trees in the lattice is utilized. Moreover, there exists tail dependence in trees under mild conditions [11].

In this paper, to better approximate the cyclic graph, we propose a novel extreme-value graphical model, i.e., ensemble-of-latent-trees of pairwise copulas (ELTPC). Specifically, instead of using the spanning trees of the observed variables, we introduce hidden variables and exploit latent trees (see Fig. 1b). As shown in Fig. 1c and 1d, by integrating out the hidden variables, the graphical model capturing the dependence between observed variables is cyclic. However, one issue that frequently arises for latent trees is the presence of blocky artifacts [9], [12], since pairs of variables with the same physical distance have different statistical distances in the graph. To overcome this difficulty, we use an ensemble of latent trees as shown in Fig. 2. More concretely, we first construct a pyramidal graph (see Fig. 2a) by associating the nodes in the bottom scale with the measuring stations. Next, we consider all the spanning trees of the pyramidal graph, which can be expressed by pairwise copulas. The resulting ELTPC model possesses superior modeling power over the ensemble of spanning trees of observed variables [10], [11], while at the same time guaranteeing tail dependence and avoiding the need to compute the log partition function as in cyclic graphs. We then develop an efficient triply stochastic

978-1-5090-1806-2/16/$31.00 ©2016 IEEE

variational inference (TSVI) algorithm to learn the parameters of the proposed model. The results from synthetic and real data show that the ELTPC model can describe extreme events well with few parameters.

This paper is structured as follows. First we learn the extreme-value marginal distributions in Section II. Next, we present the ELTPC model in Section III, and develop the TSVI algorithm in Section IV. Numerical results are provided in Section V. Finally, we offer concluding remarks in Section VI.

Fig. 1 (panels (a)-(d)): A latent tree (c) can approximate the cyclic graph (d) after integrating out the latent variables.

II. EXTREME-VALUE MARGINAL DISTRIBUTIONS

In this study, we consider maxima over a particular time period, e.g., monthly or annual maxima. Suppose that we have $N$ samples $x_i^{(n)}$ at each of $P_O$ observed sites, where $i = 1, \cdots, P_O$ and $n = 1, \cdots, N$. Extreme value theory states that the block maxima $x_i$ of i.i.d. univariate samples at each location converge to the three-parameter Generalized Extreme Value (GEV) distribution with CDF [1]:

$$F(x_i) = \begin{cases} \exp\left\{-\left[1 + \dfrac{\xi_i}{\sigma_i}(x_i - \mu_i)\right]^{-1/\xi_i}\right\}, & \xi_i \neq 0, \\ \exp\left\{-\exp\left[-\dfrac{1}{\sigma_i}(x_i - \mu_i)\right]\right\}, & \xi_i = 0, \end{cases} \tag{1}$$

where $\mu_i \in \mathbb{R}$, $\sigma_i > 0$, and $\xi_i \in \mathbb{R}$ denote the location, scale, and shape parameters, respectively. To improve the estimation accuracy, we smooth the GEV parameters across space using a thin-membrane model as in [2].

III. ENSEMBLE OF LATENT TREES

In the following, we first briefly introduce copulas, and then present the ELTPC model at length.

A. From Copulas to Trees

An arbitrary copula function $C$ taking marginal distributions $F_i(x_i)$ as its arguments defines a valid joint distribution with marginals $F_i(x_i)$ [13]:

$$F(x_1, \ldots, x_P) = C(F_1(x_1), \ldots, F_P(x_P)) = C(u_1, \ldots, u_P),$$

where $u_i = F_i(x_i)$ follows a unit uniform distribution. The corresponding joint PDF can be written as:

$$f(x_1, \ldots, x_P) = c(F_1(x_1), \ldots, F_P(x_P)) \prod_{i=1}^{P} f_i(x_i), \tag{2}$$

where $c$ is the copula density function. We can observe from (2) that a copula separates the marginal specification from dependence modeling.

As mentioned in the introduction, the density of a tree graphical model $T = (V, E)$ can be written as an expression of pairwise copulas [10]:

$$f(\mathbf{x}|T) = \prod_{j \in V} f_j(x_j) \prod_{(j,k) \in E} \frac{f_{jk}(x_j, x_k)}{f_j(x_j) f_k(x_k)} \tag{3}$$

$$= \prod_{j \in V} f_j(x_j) \prod_{(j,k) \in E} c_{jk}(F_j(x_j), F_k(x_k)), \tag{4}$$

where it follows from (2) that $f_{jk}/(f_j f_k) = c_{jk}$. As proven in [11], two variables in a tree are tail dependent if all the pairwise copulas on the path connecting them have tail dependence.

B. Ensemble of Latent Trees

In this section, we model the spatial dependence between different locations using an ensemble of latent trees. As shown in Fig. 2, we first introduce hidden variables by imposing a pyramidal graph (Fig. 2a) on the observed variables. To remove the blocky artifacts of one single latent tree [9], we further consider a mixture of all the spanning trees of the pyramidal graph. The proposed model has three pleasing properties. First, latent trees can be expressed by pairwise copulas (4) [14], resulting in a tractable and compact representation of large-scale data. Second, latent trees can tackle tail dependence among the extremes by properly selecting the constituent pairwise copulas. Third and most importantly, after integrating out hidden variables, this model can better approximate the intractable cyclic graph than the mixture of spanning trees of the observed variables [10], [11].

Specifically, let $\mathbf{x}_O$ and $\mathbf{x}_H$ represent the observed and hidden variables respectively, and let $\mathbf{x} = [\mathbf{x}_O^T, \mathbf{x}_H^T]^T$. The dimensions of $\mathbf{x}_O$, $\mathbf{x}_H$, and $\mathbf{x}$ are denoted by $P_O$, $P_H$, and $P$ respectively. The density of a latent tree $LT_i$ can be written as:

$$p(\mathbf{x}|LT_i) = \prod_{j \in V_O} f_j(x_j) \prod_{(j,k) \in E_i} c_{jk}(F_j(x_j), F_k(x_k)). \tag{5}$$

Note that we omit the marginals of the hidden variables since they are not of interest here; we introduce hidden variables only to help explain the dependence between observed variables. Next, we build the ELTPC model by computing the weighted sum over all possible spanning trees of the pyramidal graph:

$$p(\mathbf{x}) = \sum_{LT_i} p(LT_i)\, p(\mathbf{x}|LT_i), \tag{6}$$

where $p(LT_i)$ is a decomposable prior that assigns the weight of each latent tree $LT_i$ as the product of the weights $\beta_{jk}$ of the edges $(j, k) \in E_i$ in the tree, i.e.,

$$p(LT_i) = \frac{1}{Z} \prod_{(j,k) \in E_i} \beta_{jk}, \tag{7}$$

and $Z$ is the normalization constant. According to the matrix-tree theorem [10],

$$Z = \sum_{LT_i} \prod_{(j,k) \in E_i} \beta_{jk} = \det[Q(\boldsymbol{\beta})],$$

where $\boldsymbol{\beta}$ is a symmetric edge weight matrix with zeros on the diagonal and with the $(j, k)$th entry being $\beta_{jk}$, and $Q(\boldsymbol{\beta})$ consists of the first $P - 1$ rows and columns of the original $P \times P$ Laplacian matrix of the graph whose edge weight matrix is $\boldsymbol{\beta}$. By substituting (5) and (7) into (6), the ELTPC model can be succinctly formulated as:

$$p(\mathbf{x}|\boldsymbol{\beta}, \boldsymbol{\theta}) = \frac{\det[Q(\boldsymbol{\beta} \odot \mathbf{c})]}{\det[Q(\boldsymbol{\beta})]} \prod_{j \in V_O} f_j(x_j), \tag{8}$$

where $\odot$ denotes componentwise multiplication, $\mathbf{c}$ is the copula density matrix whose $(j, k)$th entry equals $c_{jk}(u_j, u_k; \theta_{jk})$, and $\theta_{jk}$ is the dependence parameter of $c_{jk}$.
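The determinant ratio in (8) rests on the matrix-tree theorem. As a sanity check, the minimal NumPy sketch below enumerates all spanning trees of a small complete graph by brute force and compares the weighted sums with the determinants of the reduced Laplacians. It is an illustration only, not the authors' implementation: the helper names (`reduced_laplacian`, `is_spanning_tree`, `tree_sum`), the 5-node graph, and the random positive numbers standing in for the edge weights $\beta_{jk}$ and the copula densities $c_{jk}$ are all assumptions made for this example.

```python
import itertools
import numpy as np

def reduced_laplacian(w):
    """Q(w): graph Laplacian of weight matrix w without its last row and column."""
    lap = np.diag(w.sum(axis=1)) - w
    return lap[:-1, :-1]

def is_spanning_tree(edges, p):
    """True if the given p-1 edges contain no cycle (and hence span all p nodes)."""
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for j, k in edges:
        rj, rk = find(j), find(k)
        if rj == rk:  # adding this edge would close a cycle
            return False
        parent[rj] = rk
    return True

def tree_sum(w):
    """Brute-force sum over all spanning trees of the product of edge weights."""
    p = w.shape[0]
    all_edges = [(j, k) for j in range(p) for k in range(j + 1, p) if w[j, k] != 0]
    total = 0.0
    for edges in itertools.combinations(all_edges, p - 1):
        if is_spanning_tree(edges, p):
            total += np.prod([w[j, k] for j, k in edges])
    return total

# Hypothetical toy inputs: symmetric positive matrices with zero diagonal.
rng = np.random.default_rng(0)
p = 5
beta = np.triu(rng.uniform(0.5, 2.0, (p, p)), 1)
beta = beta + beta.T
c = np.triu(rng.uniform(0.2, 3.0, (p, p)), 1)  # stand-ins for copula densities c_jk
c = c + c.T

# Normalization constant Z of (7): brute force versus determinant.
z_brute = tree_sum(beta)
z_det = np.linalg.det(reduced_laplacian(beta))

# Dependence factor of (8): weighted mixture over trees versus determinant ratio.
mix_brute = tree_sum(beta * c) / z_brute
mix_det = np.linalg.det(reduced_laplacian(beta * c)) / z_det

print(z_brute, z_det)
print(mix_brute, mix_det)
```

The brute-force enumeration grows exponentially with the number of edges and is only usable on toy graphs, whereas the determinant costs $O(P^3)$ for a dense matrix, which is why the determinant form is the practical one.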


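Fig. 2: ELT model: the pyramidal graph (a) $G_{pyr} = (V, E)$ and several decomposed spanning trees (b) $LT_1 = (V, E_1)$, (c) $LT_2 = (V, E_2)$, (d) $LT_3 = (V, E_3)$, (e) $LT_4 = (V, E_4)$.

The number of parameters in the ELTPC model thus increases linearly with $P_O$.

IV. TRIPLY STOCHASTIC VARIATIONAL INFERENCE

In this section, we focus on learning both the edge weight matrix $\boldsymbol{\beta}$ and the parameters of the pairwise copulas $\boldsymbol{\theta}$. Given $N$ samples, our objective is to find $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ that maximize the following log-likelihood:

$$\sum_{n=1}^{N} \log p(\mathbf{u}_O^{(n)}|\boldsymbol{\beta}, \boldsymbol{\theta}) = \sum_{n=1}^{N} \log \int_{\mathbf{u}_H} p(\mathbf{u}_O^{(n)}, \mathbf{u}_H|\boldsymbol{\beta}, \boldsymbol{\theta})\, d\mathbf{u}_H.$$

Unfortunately, the above integration is intractable due to the complex functional form of $p(\mathbf{u}_O, \mathbf{u}_H|\boldsymbol{\beta}, \boldsymbol{\theta})$ as in Eq. (8). Instead, we maximize the lower bound of $\sum_{n=1}^{N} \log p(\mathbf{u}_O^{(n)}|\boldsymbol{\beta}, \boldsymbol{\theta})$:

$$\mathcal{L} = \sum_{n=1}^{N} \int_{\mathbf{u}_H} q^{(n)}(\mathbf{u}_H) \log \frac{p(\mathbf{u}_O^{(n)}, \mathbf{u}_H|\boldsymbol{\beta}, \boldsymbol{\theta})}{q^{(n)}(\mathbf{u}_H)}\, d\mathbf{u}_H. \tag{9}$$

Here, we need to choose a tractable variational distribution $q^{(n)}(\mathbf{u}_H)$; such methods are referred to as variational Bayes [15]. Equivalently, we seek to minimize the KL divergence between $q^{(n)}(\mathbf{u}_H)$ and $p(\mathbf{u}_H|\mathbf{u}_O^{(n)}; \boldsymbol{\beta}, \boldsymbol{\theta})$. For simplicity, we specify $q^{(n)}(\mathbf{u}_H)$ to be a Gaussian copula with a diagonal covariance matrix, that is,

$$q^{(n)}(\mathbf{u}_H) = \frac{\phi_{P_H}\left(\Phi^{-1}(u_{H1}), \cdots, \Phi^{-1}(u_{HP_H}); \mathbf{m}_H^{(n)}, C_H^{(n)}\right)}{\prod_i \phi\left(\Phi^{-1}(u_{Hi})\right)}, \tag{10}$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ denote the CDF and PDF of a univariate standard Gaussian distribution, $\phi_{P_H}(\cdot; \mathbf{m}_H^{(n)}, C_H^{(n)})$ denotes a $P_H$-variate Gaussian distribution with mean $\mathbf{m}_H^{(n)}$ and covariance $C_H^{(n)} C_H^{(n)T}$, and $C_H^{(n)}$ is a diagonal matrix with the vector $\mathbf{v}_H^{(n)}$ on the diagonal. Note that, different from a regular Gaussian copula with $\mathbf{m}_H^{(n)} = \mathbf{0}$ and $\mathbf{v}_H^{(n)} = \mathbf{1}$, we allow the mean and the covariance to change flexibly such that $q^{(n)}(\mathbf{u}_H)$ can better approximate $p(\mathbf{u}_H|\mathbf{u}_O^{(n)}; \boldsymbol{\beta}, \boldsymbol{\theta})$.

By substituting (10) into (9) and reparameterizing $u_{Hi}$ by $\tilde{x}_{Hi} = \Phi^{-1}(u_{Hi})$, the lower bound can be simplified as:

$$\mathcal{L} = \sum_{n=1}^{N} \left[ \int \phi_{P_H}\left(\tilde{\mathbf{x}}_H; \mathbf{m}_H^{(n)}, C_H^{(n)}\right) \cdot \log \frac{p\left(\mathbf{u}_O^{(n)}, \Phi(\tilde{x}_{H1}), \cdots, \Phi(\tilde{x}_{HP_H})|\boldsymbol{\beta}, \boldsymbol{\theta}\right)}{\phi_{P_H}\left(\tilde{\mathbf{x}}_H; \mathbf{m}_H^{(n)}, C_H^{(n)}\right)}\, d\tilde{\mathbf{x}}_H \right].$$

To further simplify the problem, we change variables according to $\mathbf{z}_H = C_H^{(n)-1}(\tilde{\mathbf{x}}_H - \mathbf{m}_H^{(n)})$ [15], and express $\mathcal{L}$ as:

$$\mathcal{L} = \sum_{n=1}^{N} \left\{ E_{\phi(\mathbf{z}_H)}\left[ \log p\left(\mathbf{u}_O^{(n)}, \Phi\left(v_{H1}^{(n)} z_{H1} + m_{H1}^{(n)}\right), \cdots, \Phi\left(v_{HP_H}^{(n)} z_{HP_H} + m_{HP_H}^{(n)}\right) \Big| \boldsymbol{\beta}, \boldsymbol{\theta}\right) \right] - \frac{1}{2}\left[ \operatorname{tr}\left(C_H^{(n)T} C_H^{(n)}\right) + \mathbf{m}_H^{(n)T} \mathbf{m}_H^{(n)} \right] + \log \det C_H^{(n)} \right\},$$

where $\phi(\mathbf{z}_H)$ is a $P_H$-variate standard normal distribution.

In order to maximize the lower bound, we consider the gradients w.r.t. the model parameters $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ as well as the parameters of the variational distributions $\mathbf{m}_H^{(n)}$ and $C_H^{(n)}$. For the sake of brevity, we focus on the gradients w.r.t. $\boldsymbol{\beta}$ and its stochastic variants in the sequel; other gradients can be derived in a similar spirit.

$$\frac{\partial \mathcal{L}}{\partial \beta_{jk}} = \sum_{n=1}^{N} \left\{ E_{\phi(\mathbf{z}_H)}\left[ \frac{\partial \log \det Q(\boldsymbol{\beta} \odot \mathbf{c}^{(n)})}{\partial \beta_{jk}} \right] - \frac{\partial \log \det Q(\boldsymbol{\beta})}{\partial \beta_{jk}} \right\}, \tag{11}$$

where

$$\frac{\partial \log \det Q(\boldsymbol{\beta} \odot \mathbf{c})}{\partial \beta_{jk}} = c_{jk}\, \mathbf{e}_{jk}^T Q(\boldsymbol{\beta} \odot \mathbf{c})^{-1} \mathbf{e}_{jk}, \tag{12}$$

$$\frac{\partial \log \det Q(\boldsymbol{\beta})}{\partial \beta_{jk}} = \mathbf{e}_{jk}^T Q(\boldsymbol{\beta})^{-1} \mathbf{e}_{jk}, \tag{13}$$

and $\mathbf{e}_{jk}$ is a $(P-1) \times 1$ vector extracting the elements in $Q$ that are related to $\beta_{jk}$. We emphasize that the computational complexity of evaluating the gradients is $O(NP^3)$. Furthermore, the expectations in Eq. (11) are intractable. To address these issues, we exploit stochastic gradients. Stochastic gradients are unbiased estimates of the true gradients and they can be computed efficiently [15]. In what follows, we elaborate on how to compute stochastic gradients to achieve dramatic computational gain.

First, we make the computational cost independent of $N$ by drawing an index $i_\kappa$ uniformly from the set $\{1, \cdots, N\}$, and replacing the true gradient w.r.t. $\boldsymbol{\beta}$ with:

$$\frac{\partial \tilde{\mathcal{L}}}{\partial \beta_{jk}} = N \left\{ E_{\phi(\mathbf{z}_H)}\left[ \frac{\partial \log \det Q(\boldsymbol{\beta} \odot \mathbf{c}^{(i_\kappa)})}{\partial \beta_{jk}} \right] - \frac{\partial \log \det Q(\boldsymbol{\beta})}{\partial \beta_{jk}} \right\}.$$

Consequently, the computational complexity is reduced to $O(P^3)$. Next, we approximate the intractable expectations by their unbiased estimates, that is, we draw a sample $\mathbf{z}_H^{\{\kappa\}}$ from the distribution $\phi(\mathbf{z}_H)$, and then compute the stochastic gradients as:

$$N \left[ \frac{\partial \log \det Q(\boldsymbol{\beta} \odot \mathbf{c}^{(i_\kappa)})}{\partial \beta_{jk}} - \frac{\partial \log \det Q(\boldsymbol{\beta})}{\partial \beta_{jk}} \right]_{\mathbf{z}_H = \mathbf{z}_H^{\{\kappa\}}}.$$

Finally, we intend to reduce the computational complexity w.r.t. $P$ such that the proposed model is applicable to high-dimensional problems. The $O(P^3)$ computational cost stems from the inverse of matrices in Eq. (12)-(13).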
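The closed-form gradients (12) and (13) can be checked numerically. The NumPy sketch below is a hedged illustration rather than the authors' code: it assumes that $\mathbf{e}_{jk}$ has $+1$ at node $j$ and $-1$ at node $k$ with the entry of the removed (last) node dropped, which is one reading of "extracting the elements in $Q$ related to $\beta_{jk}$". The function names, the 6-node complete graph, and the random stand-ins for $\boldsymbol{\beta}$ and $\mathbf{c}$ are likewise assumptions. Each closed-form value is compared with a central finite difference.

```python
import numpy as np

def reduced_laplacian(w):
    """Q(w): graph Laplacian of weight matrix w without its last row and column."""
    lap = np.diag(w.sum(axis=1)) - w
    return lap[:-1, :-1]

def logdet_q(w):
    """log det Q(w), computed stably with slogdet."""
    _, value = np.linalg.slogdet(reduced_laplacian(w))
    return value

def edge_vector(j, k, p):
    """Assumed form of e_jk: +1 at node j, -1 at node k, last node dropped."""
    e = np.zeros(p)
    e[j], e[k] = 1.0, -1.0
    return e[:-1]

def grad_closed_form(beta, c, j, k):
    """Right-hand side of (12); with c equal to 1 everywhere it reduces to (13)."""
    e = edge_vector(j, k, beta.shape[0])
    q = reduced_laplacian(beta * c)
    # Solve Q y = e instead of forming the inverse explicitly.
    return c[j, k] * (e @ np.linalg.solve(q, e))

def grad_finite_difference(beta, c, j, k, h=1e-6):
    """Central difference of log det Q(beta * c) w.r.t. the symmetric entry beta_jk."""
    step = np.zeros_like(beta)
    step[j, k] = step[k, j] = h
    return (logdet_q((beta + step) * c) - logdet_q((beta - step) * c)) / (2 * h)

# Hypothetical toy inputs: symmetric positive matrices with zero diagonal.
rng = np.random.default_rng(1)
p = 6
beta = np.triu(rng.uniform(0.5, 2.0, (p, p)), 1)
beta = beta + beta.T
c = np.triu(rng.uniform(0.2, 3.0, (p, p)), 1)  # stand-ins for copula densities c_jk
c = c + c.T
ones = np.ones((p, p))

results = []
for j, k in [(0, 2), (1, 5)]:  # the second edge touches the removed node
    for dens in (c, ones):     # (12) with c, then (13) with all-ones
        exact = grad_closed_form(beta, dens, j, k)
        approx = grad_finite_difference(beta, dens, j, k)
        results.append((exact, approx))
        print(j, k, exact, approx)
```

Each call to `np.linalg.solve` on a dense $(P-1) \times (P-1)$ matrix costs $O(P^3)$, which is the bottleneck the text refers to.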


Instead of directly computing the inverse $Q^{-1}$, we seek to find a random low-rank $(P-1) \times M$ matrix $L$ such that $E[LL^T] = I$. As a result, an unbiased estimate of the diagonal of $E Q^{-1} E^T$ can be computed with $O(MP)$ complexity by first solving $M$ sparse linear systems $QR = L$ and then computing the sum over rows in the matrix $(ER) \odot (EL)$, where $E$ is comprised of $\mathbf{e}_{jk}$. To this end, we borrow the idea in [16], [17], which was originally proposed to efficiently compute the approximate diagonal of $Q^{-1}$ given $Q$, and design $L$ in each iteration as follows: we partition all the nodes in the graph corresponding to $Q$ into $M$ color classes using the greedy multicoloring algorithm [17] such that the nodes with the same color have a certain minimum distance $D$ between them. Next, we put a column $L_{:,j}$ in the low-rank matrix for each color $j$. For each node $i$ of color $j$, $L_{ij}$ is assigned a random sign $+1$ or $-1$ with equal probability. Other entries are set to zero. We set $D = 4$ in our experiments, and $M$ is typically 64 regardless of the dimension of $Q$. Hence, the computational complexity is eventually reduced to $O(P)$.

Based on the theory of stochastic approximation [15], using a schedule of learning rates $\{\rho_t\}$ such that $\sum \rho_t = \infty$ and $\sum \rho_t^2 < \infty$, the triply stochastic variational inference algorithm will converge to a local maximum of the bound $\mathcal{L}$. Concretely, we set the learning rate using ADAM [18], in which the upper bound of the learning rate is initialized as 0.01 and is multiplied by a factor of 0.9 every $50N$ iterations.

After learning the parameters, it is straightforward to further extend the algorithm to infer the variational distribution $q(\mathbf{x}_M)$ at unobserved locations given observed ones; we simply fix $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ to their estimates and update $\mathbf{m}_M$ and $\boldsymbol{\nu}_M$ in each iteration.

V. EXPERIMENTAL RESULTS

In this section, we test the proposed model against three other models, including copula Gaussian graphical models with a lattice structure (CGGM) [4], ensemble of trees of pairwise copulas (ETPC) [11], and regular vine copula models (R-vine) [7]. Specifically, the CGGM constructs a cyclic graph based on Gaussian copulas, the ETPC model considers an ensemble of all spanning trees of the observed variables in the lattice, and the R-vine employs a hierarchy of trees to capture all orders of dependence between observed variables. We choose all pairwise copulas to be Gumbel copulas (i.e., an extreme-value copula with tail dependence) for the ETPC, the R-vine and the ELTPC. To assess the performance of all four models, we consider both synthetic data sampled from max-stable random fields and real extreme precipitation in Japan. For each model, we evaluate the averaged log-likelihood (AvgLogLLH), the number of parameters (Prm No.), the score of the Bayesian information criterion (BIC), and the computational time.

A. Synthetic Data

We generate synthetic data by sampling from max-stable random fields. Theoretically, max-stable processes can well characterize the joint behavior of pointwise maxima in a spatial domain. In particular, we consider a Schlather model with a powered exponential covariance function [19], and simulate 402 samples from each of the lattices of size 8×8, 16×16, 32×32 and 64×64. We use two samples corresponding to the 99th and 60th quantiles of the overall extremes in the spatial domain as the testing data, and the rest as training data.

TABLE I: Comparison of different models when fitting the data simulated from max-stable processes.

Grid Size | Model | AvgLogLLH | Prm No. | BIC | Running Time (s)
8×8 | ELTPC | 9.86×10^1 | 448 | -7.62×10^4 | 3.69×10^3
8×8 | ETPC | 9.52×10^1 | 224 | -7.48×10^4 | 1.29×10^2
8×8 | CGGM | 8.67×10^1 | 176 | -6.83×10^4 | 7.28×10^-1
8×8 | R-vine | 9.97×10^1 | 2016 | -6.77×10^4 | 4.96×10^1
16×16 | ELTPC | 3.89×10^2 | 1920 | -3.00×10^5 | 9.54×10^3
16×16 | ETPC | 3.68×10^2 | 960 | -2.88×10^5 | 2.53×10^2
16×16 | CGGM | 3.19×10^2 | 736 | -2.50×10^5 | 7.92
16×16 | R-vine | 3.99×10^2 | 33895 | -1.23×10^5 | 1.12×10^3
32×32 | ELTPC | 1.39×10^3 | 7936 | -1.07×10^6 | 8.72×10^4
32×32 | ETPC | 1.33×10^3 | 3968 | -1.04×10^6 | 1.01×10^3
32×32 | CGGM | 1.15×10^3 | 3008 | -8.99×10^5 | 5.50×10^3
32×32 | R-vine | 1.51×10^3 | 523776 | 1.93×10^6 | 7.56×10^4
64×64 | ELTPC | 5.42×10^3 | 32256 | -4.15×10^6 | 2.93×10^5
64×64 | ETPC | 5.26×10^3 | 16128 | -4.11×10^6 | 1.24×10^4
64×64 | CGGM | 4.60×10^3 | 12160 | -3.60×10^6 | 2.54×10^6
64×64 | R-vine | 6.14×10^3 | 8386560 | 4.53×10^7 | 4.87×10^6

The model fitting results are summarized in Table I. As shown in the table, regardless of the grid size, the R-vine achieves the largest averaged log-likelihood at the expense of using the largest number of parameters. As a result, the BIC score of the R-vine pales in comparison with other models, suggesting that the R-vine overfits the data. In contrast, the CGGM uses the fewest parameters, but it fails to describe the data accurately due to the tail independence property of Gaussian copulas. The corresponding BIC score is the second largest. Different from the R-vine and the CGGM, the ELTPC and the ETPC models successfully strike a balance between data fidelity and model complexity. When comparing the proposed ELTPC with the ETPC, we can see that the ELTPC can better fit the data without introducing many redundant parameters. Indeed, the BIC score of the proposed model is the smallest under all scenarios. As for the computational time, the R-vine and the CGGM become intractable as the dimension increases. This can be explained by the computational complexity of their learning algorithms, which is $O(P^2)$ and $O(P^3)$ respectively for the R-vine and the CGGM. On the other hand, the learning algorithms of both the ELTPC and the ETPC model scale linearly with $P$. Since there are hidden variables in the proposed ELTPC, the corresponding learning process takes a longer time than that of the ETPC.

Next, for the two testing samples in the 16×16 grid, we consider the case where the data is missing in the 14×14 area in the center of the grid. We then infer the marginals at the missing sites given observed ones using different models. We also conduct conditional simulation in the underlying max-stable random field and regard the simulated distributions as the ground truth. Fig. 3 shows the estimated marginals along the diagonal of the missing region resulting from different models. We omit the results of the R-vine since imputation in this model is prohibitively slow. We can tell from the figure that the distributions estimated by the ELTPC model follow the ground truth most closely.


[Figure 3: two panels, (a) 60th quantile and (b) 99th quantile; horizontal axis: location index along the diagonal; vertical axis: extreme value; curves: Max-stable process, CGGM, ETPC, ELTPC.]

Fig. 3: Marginals at sites along the diagonal in the missing region conditioned on observed sites. The solid lines denote the median, and the dashed lines denote the 0.025 and 0.975 quantiles.

The mean absolute error between the estimated and true median for the ELTPC, the ETPC, and the CGGM is respectively 0.37, 0.57 and 0.58 in the 60th-quantile case, and 0.99, 1.06, and 2.00 in the 99th-quantile case. Moreover, the uncertainty level of the CGGM and the ETPC can be much larger than that of the ground truth, especially in the high-quantile case.

B. Real Data

The daily rainfall data from 1900-2011 is compiled and interpolated onto a grid with resolution 0.05° [20]. We select a 16×16 lattice with resolution 0.1° in South Japan, where heavy rainfall is often the cause of floods. We extract the extreme precipitation values from the sites as follows. We first compute the overall daily rainfall amount in that region. We next choose a high enough threshold and retain those precipitation events (possibly lasting for several days) with rainfall amount above the threshold, resulting in fewer than 10 events per year. For each rainfall event in the region, we isolate the maximum daily rainfall amount. The resulting sample size is 633.

TABLE II: Modeling extreme precipitation data.

Model | AvgLogLLH | Prm No. | BIC | Running Time (s)
ELTPC | 5.44×10^2 | 1920 | -6.77×10^5 | 2.31×10^4
ETPC | 5.33×10^2 | 960 | -6.70×10^5 | 9.46×10^2
CGGM | 5.35×10^2 | 736 | -6.73×10^5 | 2.64×10^1
R-vine | 6.07×10^2 | 32640 | -5.58×10^5 | 1.42×10^3

We list the results in Table II. Again, the ELTPC model outperforms other models in terms of BIC score: it achieves a small negative log-likelihood without introducing too many parameters. Interestingly, the CGGM yields a larger likelihood with fewer parameters when compared with the ETPC model, suggesting that using a cyclic graph is necessary in this case. However, the performance of the proposed ELTPC model is even better than that of the CGGM, since it is a good approximation to the cyclic graph after integrating out the hidden variables and it can deal with tail dependence in the real data.

VI. CONCLUSIONS

In this paper, we proposed a novel extreme-value graphical model, the ELTPC model, to describe spatial extreme events. This model exploits a mixture of latent trees that can be built upon pairwise copulas. After integrating out the hidden variables in the latent trees, the resulting model can better approximate a cyclic graphical model, and therefore can capture the spatial dependence more flexibly and effectively. We further derive a triply stochastic variational inference algorithm to learn the model with low computational complexity. Numerical results demonstrate that the proposed ELTPC model achieves better performance than the benchmarks when describing spatial extremes.

ACKNOWLEDGMENT

This research was supported by MOE (Singapore) ACRF Tier 2 grant M4020187.

REFERENCES

[1] D. Cooley, J. Cisewski, R. J. Erhardt, S. Jeon, E. Mannshardt, B. O. Omolo, and Y. Sun, "A survey of spatial extremes: measuring spatial dependence and modeling spatial effects," REVSTAT, vol. 10, pp. 135-165, 2012.
[2] H. Yu, J. Dauwels, and P. Jonathan, "Extreme-Value Graphical Models with Multiple Covariates," IEEE Trans. Signal Process., vol. 62, no. 21, pp. 5734-5747, 2014.
[3] H. Sang and A. E. Gelfand, "Continuous Spatial Process Models for Spatial Extreme Values," J. Agr. Biol. Envir. St., vol. 15, pp. 49-65, 2010.
[4] H. Yu, Z. Choo, W. I. T. Uy, J. Dauwels, and P. Jonathan, "Modeling Extreme Events in Spatial Domain by Copula Graphical Models," Proc. FUSION, pp. 1761-1768, 2012.
[5] A. C. Davison, S. Padoan, and M. Ribatet, "Statistical modeling of spatial extremes," Statist. Sci., vol. 27, no. 2, pp. 161-186, 2012.
[6] S. A. Padoan, M. Ribatet, and S. A. Sisson, "Likelihood-Based Inference for Max-stable Processes," J. Am. Stat. Assoc., vol. 105, no. 489, pp. 263-277, 2010.
[7] J. Dissmann, E. C. Brechmann, C. Czado, and D. Kurowicka, "Selecting and Estimating Regular Vine Copulae and Application to Financial Returns," Comput. Stat. Data An., vol. 59, no. 1, pp. 52-69, 2013.
[8] H. Joe, H. Li, and A. K. Nikoloulopoulos, "Tail dependence functions and vine copulas," J. Multivariate Anal., vol. 101, no. 1, pp. 252-270, 2010.
[9] A. T. Ihler, S. Kirshner, M. Ghil, A. W. Robertson, and P. Smyth, "Graphical Models for Statistical Inference and Data Assimilation," Physica D, vol. 230, pp. 72-87, 2007.
[10] S. Kirshner, "Learning with tree-averaged densities and distributions," Proc. NIPS, 2008.
[11] H. Yu, W. I. T. Uy, and J. Dauwels, "Modeling Spatial Extremes via Ensemble-of-Trees of Pairwise Copulas," Proc. ICASSP, pp. 2415-2419, 2014.
[12] M. J. Choi, V. Chandrasekaran, D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Multiscale stochastic modeling for tractable inference and data assimilation," Comput. Methods Appl. Mech. Engrg., vol. 197, pp. 3492-3515, 2008.
[13] A. Sklar, "Fonctions de répartition à n dimensions et leurs marges," Publications de l'Institut de Statistique de l'Université de Paris 8, pp. 229-231, 1959.
[14] S. Kirshner, "Latent tree copulas," 6th European Workshop on Probabilistic Graphical Models, 2012.
[15] M. Titsias and M. Lázaro-Gredilla, "Doubly Stochastic Variational Bayes for non-Conjugate Inference," J. Mach. Learn. Res., W&CP, vol. 32, no. 1, pp. 1971-1979, 2014.
[16] D. M. Malioutov, J. K. Johnson, M. J. Choi, and A. S. Willsky, "Low-rank variance approximation in GMRF models: Single and multiscale approaches," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4621-4634, 2008.
[17] J. M. Tang and Y. Saad, "A Probing Method for Computing the Diagonal of a Matrix Inverse," Numerical Lin. Alg. Appl., vol. 19, no. 3, pp. 485-501, 2012.
[18] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," Proc. ICLR, 2015.
[19] M. Schlather, "Models for Stationary Max-Stable Random Fields," Extremes, vol. 5, no. 1, pp. 33-44, 2002.
[20] K. Kamiguchi, O. Arakawa, A. Kitoh, A. Yatagai, A. Hamada, and N. Yasutomi, "Development of APHRO_JP, the first Japanese high-resolution daily precipitation product for more than 100 years," Hydrol. Res. Lett., vol. 4, pp. 60-64, 2010.
