Purdue University Purdue e-Pubs

Open Access Dissertations Theses and Dissertations

Summer 2014 Modeling spatial functions InKyung Choi Purdue University

Follow this and additional works at: https://docs.lib.purdue.edu/open_access_dissertations Part of the Applied Statistics Commons

Recommended Citation Choi, InKyung, "Modeling spatial covariance functions" (2014). Open Access Dissertations. 246. https://docs.lib.purdue.edu/open_access_dissertations/246

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. *UDGXDWH6FKRRO(7')RUP 5HYLVHG 0114 

PURDUE UNIVERSITY GRADUATE SCHOOL Thesis/Dissertation Acceptance

7KLVLVWRFHUWLI\WKDWWKHWKHVLVGLVVHUWDWLRQSUHSDUHG

%\ InKyung Choi

(QWLWOHG  MODELING SPATIAL COVARIANCE FUNCTIONS

Doctor of Philosophy )RUWKHGHJUHHRI

,VDSSURYHGE\WKHILQDOH[DPLQLQJFRPPLWWHH

Hao Zhang

Bo Li

Xiao Wang

Tonglin Zhang

To the best of my knowledge and as understood by the student in the Thesis/Dissertation Agreement, Publication Delay, and Certification/Disclaimer (Graduate School Form 32), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy on Integrity in Research” and the use of  copyrighted material.

Hao Zhang $SSURYHGE\0DMRU3URIHVVRU V BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB $SSURYHGE\ Rebecca W. Doerge 08/29/2014

+HDGRIWKHDepartment *UDGXDWH3URJUDP 'DWH

MODELING SPATIAL COVARIANCE FUNCTIONS

A Dissertation

Submitted to the Faculty

of

Purdue University

by

InKyung Choi

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

December 2014

Purdue University

West Lafayette, Indiana ii

ACKNOWLEDGMENTS

First of all, I want to express my sincere gratitude to my advisor Professor Hao Zhang. He encouraged independent thinking and supported my ideas, which allowed me to freely pursue and explore topics that interested me, and whenever I went astray, his insightful comments put me back on the right track. I am also greatly indebted to Professor Bo Li who had been my advisor for the first three years of the Ph.D. program. I cannot overstate her influence at the beginning of my research in the field of Spatial Statistics. Her extensive knowledge about the field and passion for research had been an inspiration to me. Even after moving to the University of Illinois at Urbana-Champaign, she has always taken good care of me and put herself out of the way to support me. I appreciate my committee members; Professor Xiao Wang and Professor Tonglin Zhang for their helpful input and warm encouragement. The support of the Depart- ment of Statistics and the staff members should not go unmentioned for it was only through their consistent help and assistance that I could complete my study. I was fortunate to study at university that hosts a great consulting center like Statistical Consulting Service center. While working at the center, I learnt a great deal about how to apply statistics in practice, knowledge that can not be easily obtained in a classroom. The invaluable experience that I had obtained through working with clients from various fields will be of a great asset for the rest of my professional career and I am thankful for all the professors, colleagues and staff members of the center, and especially Professor Bruce Craig for his guidance and mentoring. In a seemingly endless fight with oneself and facing the battle all alone in a foreign country, I often found facets of myself that I had never truly realized before. Hence the last five years has not only been about the pursuit of academic knowledge but also a journey to understand myself, and perhaps the world around me as well. My friends iii both inside and outside the department have been wonderful companions along the journey and I’m grateful to them for being good friends and teachers during the important time of my life. Lastly, I want to thank my family in Korea. Although separated by continents, knowing that I have people who always love and support me without condition gave me strength and courage to go on even in the darkest night. I’m forever indebted for their unwavering devotion which is the true force that has shaped who I am now. iv

TABLE OF CONTENTS

Page LIST OF TABLES ...... vi LIST OF FIGURES ...... vii ABSTRACT ...... viii 1 INTRODUCTION ...... 1 1.1 Motivation ...... 1 1.2 Preliminaries ...... 2 1.2.1 Covariance function of random fields ...... 2 1.2.2 Spectral representation of stationary covariance functions .. 4 1.2.3 Some nonstationary covariance functions ...... 5 1.2.4 Spatial random effect model ...... 6 1.3 Overview ...... 8 2 NONPARAMETRIC COVARIANCE ESTIMATION ...... 9 2.1 Introduction ...... 9 2.2 Nonparametric Covariance Estimation ...... 12 2.2.1 Construction of completely monotone function ...... 12 2.2.2 Estimation ...... 14 2.3 Extension to Space-Time Covariance Function ...... 17 2.4 Simulation Study ...... 19 2.4.1 Simulation set up ...... 20 2.4.2 Number of knots and order of B-splines ...... 21 2.4.3 Comparison with parametric models ...... 22 2.4.4 Simulation with space-time covariance function ...... 25 2.5 Two Data Examples ...... 27 2.5.1 Indiana July mean temperature data ...... 27 2.5.2 Irish wind data ...... 29 2.6 Conclusion ...... 33 3 ANALYSIS OF TELECONNECTION ...... 35 3.1 Introduction ...... 35 3.2 ENSO Teleconnection to North America ...... 38 3.3 Model for Time-varying Teleconnection ...... 42 3.3.1 Model ...... 43 3.3.2 Estimation ...... 45 v

Page 3.3.3 Empirical confidence interval for covariance of spatial random effect ...... 48 3.4 Data Analysis ...... 50 3.4.1 Teleconnection analysis ...... 50 3.4.2 Spatial prediction ...... 57 3.5 Conclusion ...... 63 REFERENCES ...... 65 VITA ...... 70 vi

LIST OF TABLES

Table Page 2.1 Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of nonparametric model at different knot number m and order of B-splines p (unit: 10−2). .... 23 2.2 Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of three parametric models (spherical, Mat´ernand Gaussian) and the nonparametric model (with or- der of B-splines p = 3 and knot number m = 5), together with the MSPE of true covariance model (unit: 10−2)...... 24 2.3 Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of the nonparametric space- time covariance model at different knot number (mφ, mψ) and order of B-splines (pφ, pψ) and of the parametric models (unit: 10−2)...... 26 2.4 Mean squared prediction error (MSPE) of one-day ahead prediction of Irish wind data...... 32 3.1 Mean of mean squared prediction error (MSPE) and logarithm score (LogS) of prediction based 1) Two regions : uses both North America data and Ni˜no3region data; and 2) One region : uses data from the same region as the region of the dropped-site ...... 63 vii

LIST OF FIGURES

Figure Page [q] 2.1 Example of functions {fj : j = 1, . . . , m + p} for m = 2 and p = 3, [q] and two covariance functions constructed based on those fj : C1(h) = [2] 2 [2] 2 [2] 2 [2] 2 5f1 (h ) + 0.5f5 (h ) and C2(h) = 0.7f3 (h ) + 0.1f4 (h )...... 15 2.2 86 weather stations in Indiana with July 2011 mean temperatures, together with temperature surface interpolated using the fitted nonparametric co- variance function ...... 28 2.3 Emprical covariance estimates (dots) and fitted covariance functions for Indiana July 2011 temperature data (distance unit: 10 km) ...... 29 2.4 Empirical covariance estimates (dots) of Irish wind data and fitted non- parametric model (solid line) at four different sets of space-time lags (dis- tance unit: 100 km) ...... 31 3.1 Nino3 region and North America region ...... 41 3.2 Temporal variation of residual variance (top: Ni˜no3region; bottom: North America) before standardization (blue lines) and after the standardization (red lines) ...... 51 3.3 Location of centers of spatial basis functions ...... 52

3.4 Estimated covariancesσ ˆij(t) from 1871 to 1990 ...... 53

3.5 Estimated covariancesσ ˆij(t) from 1871 to 1990 ...... 54

3.6 Estimated covariance σbij(t) for spatial random effects α1, . . . , α6 .... 55 3.7 Estimated covariance σbij(t) for spatial random effects α7, . . . , α12 ... 56 3.8 Mean and empirical confidence interval of associated with α1, . . . , α4 ...... 58 3.9 Mean and empirical confidence interval of covariances associated with α6, α7, α8 and α10 ...... 59 3.10 Squared Frobenius norm of estimated cross-region covariance matrix .. 60 3.11 Location of centers of spatial basis functions for spatial prediction; 20 (14) new basis functions are added to 12 (6) basis functions of North America (Ni˜no3region) used in teleconnection analysis (rA = 32 and rB = 20) . 61 viii

ABSTRACT

Choi, InKyung Ph.D., Purdue University, December 2014. Modeling spatial covari- ance functions. Major Professor: Hao Zhang.

Covariance modeling plays a key role in the spatial data analysis as it provides important information about the dependence structure of underlying processes and determines performance of spatial prediction. Various parametric models have been developed to accommodate the idiosyncratic features of a given dataset. However, the parametric models may impose unjustified restrictions to the covariance structure and the procedure of choosing a specific model is often ad-hoc. In the first part of the dissertation, a new nonparametric covariance model that can avoid the choice of parametric forms is proposed. The estimator is obtained via a nonparametric approx- imation of completely monotone functions. It is easy to implement and simulation study shows it outperforms the parametric models when there is no clear information on model specification. Two real datasets are analyzed to illustrate the proposed approach and provide further comparison between the nonparametric and parametric models. Most spatial covariance models assume that the dependence becomes stronger when two locations are closer to each other and thus assume that the dependence is negligible when two locations are far apart from one another. However long-distance connection can occur in climate variables through, for example, high altitude winds or large-scale atmospheric waves propagation. This phenomenon is called teleconnection and often considered to be responsible for extreme weather events occurring simul- taneously around the world. In the second part of the dissertation, a nonstationary spatial covariance model for long-distance dependence is proposed. The model allows the spatial dependence to vary with time so that temporal dynamics of the telecon- ix nection phenomenon can be captured. The model is applied to analyze teleconnection between sea surface temperature of tropical Pacific Ocean and hydrological droughts of North America incurred by the El Ni˜no-SouthernOscillation. 1

1. INTRODUCTION

1.1 Motivation

As data are often assumed to be a realization of a defined on an ordered set, a random field has been a primary mathematical framework of choice for analysis of spatial data. While time series data are observed at a sequence on one dimension of time, spatial data are observed over a two-dimensional plane or even in a higher dimensional space. This complicates the problem of dependence modeling because we now have another aspect to consider: direction of the spatial lag. One may assume that the dependence strength depends only on the spatial lag length as in time series data analysis and this works reasonably well for many cases where the process can be assumed to be relatively homogeneous across the space. On the other hand, if the spatial domain of interest is over a vast region or has complex geographical features, the assumption may not hold and the dependence may vary by the direction of spatial lag as well as location at which the lag is measured. Indeed, the direction or the location with strong spatial dependence could provide important features of the underlying physical process such as in teleconnection phenomenon (see Chapter 2 for more details). Once data are observed, they are considered as a realization of a multivariate distribution whose distributional characteristics are inherited from the corresponding random field. Although there are several multivariate dependence concepts (e.g. copula, tail dependence, orthant dependence and etc.), covariance is the most popular measure to quantify spatial dependence in the data for several reasons: the concept is straightforward; it is easy to calculate sample covariances and its properties are well understood. Above all, the best linear spatial predictor () is expressed 2 as a function of covariances. Therefore, it is not surprising to find that covariance function modeling plays an important role in spatial data analysis and a lot of effort has been made to develop covariance functions that can accommodate idiosyncratic features of a given dataset. In the remaining part of this chapter, we give an introduction to basic concepts and background, and provide a brief review of some existing spatial covariance models.

1.2 Preliminaries

1.2.1 Covariance function of random fields

Consider a Gaussian random field {Y (s): s ∈ D ⊂ Rd} where D is a domain of interest in d-dimensional Euclidean space Rd. A common framework of spatial data analysis decomposes the random field into three components (Cressie, 1993):

Y (s) = µ(s) + Z(s) + (s), (1.1) where µ(s) is a large-scale variation, Z(s) is a small-scale spatial variation and (s) is a measurement error. The deterministic large-scale variation µ(s) is used to remove the non-stationary component in Y (s) that is caused by the varying mean and often

0 0 takes a form of µ(s) = x(s) βµ, where x(s) = (x1(s), . . . , xp(s)) is a vector of known covariates and βµ is an unknown parameter vector. The measurement error (s) is 2 assumed to be a zero-mean white noise process with variance Var((s)) = σ . The spatial dependence of Y (s) arises from the dependence within the small-scale spatial variation Z(s) and we focus on modeling the dependence of this process.

Let C : D × D → R denote the covariance function of random field Z(s):

C(s, u) = Cov(Z(s),Z(u)) s, u ∈ D. (1.2)

If the mean of the random field is constant (i.e. E(Z(s)) ≡ µ) and the covariance depends on the spatial lag s − u between two locations only, i.e.,

C(s, u) = CSt(s − u), 3 for some function CSt : Rd → R, the random field is called weakly stationary (also called transional invariant or second-order stationary) and its covariance function is called stationary covariance function. Furthermore, if the covariance function does not depend on the direction of the lag s−u, but depends on the length ||s−u|| only, i.e.,

C(s, u) = CIso(||s − u||), for some function CIso : R+ → R, the process is called isotropic random field and its covariance function is called isotropic covariance function. Although the isotropy is a rather strong assumption, isotropic covariance function can serve as a building block in constructing non-isotropic (anisotropic) covariance functions or non-stationary covariance functions. For example, a popular class of anisotropic covariance functions is obtained by a linear transformation of the spatial lag vector h using some matrix A (Journel and Huijbregts, 1978):

CSt(s − u) = CIso(||A(s − u)||), (1.3) or, by extension, m St X Iso C (s − u) = Ci (||Ai(s − u)||). i=1 A similar approach can be taken to construct non-stationary covariance functions from isotropic covariance functions and this will be described in Section 1.2.3. In the rest of this dissertation, C (without superscript) will be used to denote a covariance function regardless of the type. It will be obvious from the arguments of the function which type of covariance function is referred to in each context. Now, for C to be a valid covariance function, it needs to satisfy the positive definite condition, i.e., n n X X aiajC(si, sj) > 0, (1.4) i=1 j=1 for any finite set of spatial locations s1,..., sn ∈ D and real numbers a1, . . . , an. There are several known results that guarantee a function to be positive definite such 4 as complete monotonousness (see Section 2.2.1), but the condition can be more easily checked in the spectral domain using spectral representation as described in following Section 1.2.2.

1.2.2 Spectral representation of stationary covariance functions

By spectral representation theorem, a weakly stationary random field Z(s) can be represented in the form of Fourier-Stieltjes integral as: Z Z(s) = exp(is0ω)dX(ω), (1.5) Rd where X(ω) is a zero-mean process with independent increments. Since X(ω) in (1.5) is an independent process, we can write the covariance function of the random field Z(s) as: Z Z  C(h) = E exp(iω00)dX(ω) exp(iω0h)dX(ω) Rd Rd Z = exp(ih0ω)F (dω), (1.6) Rd where F (ω) = E[|dX(ω)|2] is called the spectral measure whose density with respect to Lebesgue measure, if exists, is called the spectral density function. It is straight- forward to show that C(h) is positive definite if it has representation (1.6) because n n Z n X X X 0 2 aiajC(si − sj) = | aj exp(isjω)| F (dω) ≥ 0. d i=1 j=1 R j=1 In fact, every positive definite function has this representation (Bochner’s theo- rem) and (1.6) is the only way to obtain a valid covariance function. Now, spectral representation (1.6) can be simplified if we further assume that the covariance function is isotropic. Then, for h ∈ R+, the isotropic covariance function C(h) can be represented as: Z C(h) = C(h||v||)U(dv) ∂bd Z Z = exp(ihv0ω)F (dω)U(dv), d ∂bd R 5

for an uniform probability measure U on d-dimensional unit sphere ∂bd (Stein, 1999). Then, using spherical coordinate system and integrating out the angular variable, we can obtain that Z ∞  2 (d−2)/2 C(h) = Γ(d/2) J(d−2)/2(hω)dH(ω), (1.7) 0 hω R where Jν(·) is a Bessel function of the first kind of order ν and H(ω) = ||ω||<ω F (dω). Many of commonly used isotropic spatial covariance functions have the representa- tion (1.7). For example, spectral densities of exponential covariance function and gaussian covariance function are proportional to bivariate Cauchy probability density and normal probability density respectively. Mat´erncovariance function has a spec- tral density proportional to student-t density, which explains, in the spectral domain, how exponential and Gaussian covariance functions are the special cases of Mat´ern family. Indeed it is this representation (1.7) of an isotropic covariance function that has contributed to the development of many nonparametric covariance models (see Section 2.1 for more details).

1.2.3 Some nonstationary covariance functions

Stationary covariance functions may work well if one considers a relatively small region or if it is reasonably believed that the process in the region of interest would exhibit homogenous characteristics. But when the spatial domain is vast, it is more realistic to assume that the dependence structure changes over the space. For ex- ample, the same spatial lag h may imply different degree of dependence strength depending on where the lag is measured, i.e.,

C(s, s + h) 6= C(u, u + h), s 6= u.

As in the construction of anistropic covariance function (1.3), nonstationary co- variance functions can be obtained from isotropic covariance functions. “Image wrap- ping” is one of such attempts. Sampson and Guttorp (1992) pioneered the the work on the use of space deformation techniques in developing nonstationary covariance 6

functions. They extended linear transformation (1.3) to a smooth non-linear trans-

0 formation f : Rd → Rd :

C(s, u) = CIso(|f(s) − f(u)|),

so that the covariance function is isotropic in a transformed space f(s). Convolution-based methods have also been employed to create nonstationary co- variance functions. In these approaches, a nonstationary covariance function is ob- tained by defining a convolution process instead of specifying a particular form of covariance function. For example, in Higdon, Swall and Kern (1999), a nonstation- ary process is defined as the convolution of some kernel function K and white noise process X(v): Z Z(s) = K(s − v)X(v)dv, (1.8) Rd while in Fuentes and Smith (2001), the nonstationary process is defined as: Z Z(s) = K(s − v)Xθ(v)(s)dv, (1.9) Rd d where {Xθ(v)(s): s, v ∈ R } are independent white noise processes indexed by pa- rameter θ(v). The covariance function can be obtained directly from the definition. For example, from (1.9), we have Z C(s, u) = K(s − v)K(u − v)Cθ(v)(s − u)dv, (1.10) Rd which is obviously nonstationary. These types of nonstationary covariance functions are considered as a ‘local’ approach because locally stationary processes are used to create a globally nonstationary process, hence generating a globally nonstationary covariance function.

1.2.4 Spatial random effect model

In recent years, large-scale data are becoming more and more common due to the advancement of high-resolution satellite data and climate product data. Also, existing 7 historic datasets, such as sea surface temperature measurements from ships or buoys, which are very sparse individually, are combined and interpolated together to pro- duce a high resolution global dataset. These massive datasets incur a computational burden both in estimating the covariance function and making spatial prediction at an unknown location because such procedures often involve inverting a n×n variance matrix, where n is the size of the dataset. For this reason, the massive dataset prob- lem has gained much attention in the spatial statistical community, and the spatial random effect model is proposed to tackle this problem. In the spatial random effect model, the small-scale variation Z(s) in the random

field Y (s) is described using a set of latent variables (random effects) {αr}r=1,...,R with spatial basis functions {fr(s)}r=1,...,R:

R X Y (s) = µ(s) + αrfr(s) + (s). r=1 Similar to the random effect model in the experimental design, the dependence among data arises from sharing the latent variables and the degree of communality is ad- justed by weights assigned through spatial basis functions. The covariance function of random field Y (s) is:

0 2 C(s, u) = f sKf u + σ δ{s=u}, (1.11)

where f s is a R × 1 vector of spatial basis functions evaluated at spatial location 0 s,(f1(s), . . . , fR(s)) and K is a R × R variance matrix of the random effects,

Var((a1, . . . , aR)). Note that the covariance function represented as (1.11) is gen- erally nonstationary. Also, it is easy to see that (1.11) satisfies the positive definite condition (1.4) since for any n spatial locations s1,..., sn ∈ D the covariance matrix is

0 2 {C(si, sj)}i,j=1,...,n = F KF + σ In, (1.12) 8

where F = {fr(si)}r=1,...,R;i=1,...,n is a n × R basis matrix and this implies, for a = 0 (a1, . . . , an) ,

n n X X 0 0 2 aiajC(si, sj) = a F KF a + σ In > 0, i=1 j=1 provided that K is non-negative definite and σ2 is non-zero. Due to (1.12), the variance matrix of the data can be represented using a fixed dimensional variance matrix K and this reduces the computational complexity of variance matrix inversion task from O(n3) to O(R3) (Sherman-Morrison-Woodbury formula). The idea of the spatial random effect model (also called low rank model) has been employed in studies like Predictive process model (Banerjee et al., 2008) and fixed rank kriging (Cressie and Johanesson, 2008). The representation can also be related to convolution process model when the white noise process X(v) in (1.8) is replaced with a random vector Z.

1.3 Overview

This dissertation examines two problems in modeling spatial covariance functions. In Chapter 2, a new nonparametric isotropic covariance function based on a com- pletely monotone function is proposed and extended to a space-time covariance func- tion. Simulation experiment is conducted to assess the performance of the proposed nonparametric model. In Chapter 3, climatic long-distance dependence phenomenon called teleconnection is investigated. The description and scientic background of the teleconnection phenomenon are first provided in Section 3.1. The spatial model ef- fect model is adapted to incorporate the long-distance time-varying dependence and applied to analyze the North America-Pacific Ocean teleconnection phenomenon in Section 3.4.1. Finally Chapter 4 concludes with a brief discussion. 9

2. NONPARAMETRIC COVARIANCE ESTIMATION

2.1 Introduction

Covariance function plays a key role in the spatial data analysis since the de- pendence structure of random field is determined through the choice of covariance function. Various parametric models have been developed to accommodate the id- iosyncratic features of a given dataset. However, the procedure of choosing a specific parametric model is often ad-hoc and relies on subjective judgments. Furthermore, many spatial datasets usually have complex structures and thus a parametric model may impose unjustified restrictions to the covariance function. For these reasons, we seek a nonparametric modeling approach to estimate the covariance function that will effectively respect the dependence structure in the data and will reduce, if not eliminate, the risk of model misspecification. Recall the spectral representation of isotropic covariance function (1.7) that isotropic covariance function can be expressed as :

Z ∞ C(h) = Ωd(ωh)dH(ω), (2.1) 0

(d−2)/2 where Ωd(ωh) = Γ(d/2)(2/hω) J(d−2)/2(hω) and H(ω) is a non-decreasing func- tion on [0, ∞). Compared to the task of finding a positive definite function that satis- fies the condition (1.4), it is easier to find a function that satisfies the non-decreasing condition. For this reason, most nonparametric approaches that have been proposed to date are fundamentally based on Bochner’s theorem and focus on modeling the non-decreasing function H(ω).

The arguments and results presented in this chapter have been published in Choi et al. (2013) 10

Shapiro and Botha (1991) estimated the function,

2γ(h) = var{Z(s) − Z(s + h)}

= 2(C(0) − C(h)), by substituting H(ω) in (2.1) by a step function with a finite number of positive jumps at nodes ω1, . . . , ωl. Genton and Gorsich (2000) also used a finite discrete measure, but selected the nodes as zeroes of Bessel functions to obtain orthogonal discretization of the spectral representation of positive definite functions, which reduced the approx- imation error and resulted in a smoother covariance function over continuum. Hall et al. (1994) developed a nonparametric covariance estimator for one-dimensional stochastic process based on the Fourier transform of a kernel estimator of covariance function. Im et al. (2007) modeled the spectral density as a linear combination of

B-splines up to a certain frequency threshold ω0 and then an algebraic power function with a smoothness parameter beyond the threshold point ω0. This semiparametric method is flexible in that it uses B-splines for low-middle frequencies while using a power function similar to Mat´ernmodel for high frequencies. More recently, under the intrinsic stationary assumption, Huang et al. (2011) proposed a nonparametric method for variogram estimation using a general spline methodology. Zheng et al. (2010) developed a nonparametric Bayesian approach to estimate the spectral density of Gaussian random fields over a regular lattice, where a multidimensional Bernstein polynomial distribution was used as prior and Whittle’s approximation was incorpo- rated to facilitate the computation. Reich and Fuentes (2012) generalized this idea to non-Gaussian and non-lattice data using Dirichlet process prior. In addition to above methods that are based on the spectral representation of covariance functions, Barry and Ver Hoef (1996) proposed a nonparametric piecewise linear model for the variogram using the convolution representation of random fields. Typically, these methods require complex computation because they often involve either an optimization over a large number of nodes or integration of special functions. Here we propose an easy-to-implement approach that constructs a nonparametric 11

estimator of isotropic covariance functions valid for all dimensions, d = 1, 2,.... Our method is derived from the relationship between a positive definite function and a completely monotone function. A completely monotone function is a function φ(y) such that (−1)nφ(n)(y) ≥ 0 for n = 0, 1,.... Schoenberg (1938, p. 821) established that C(h) is positive definite for all d = 1, 2,... if and only if φ(h) = C(h1/2) is completely monotone (Cressie, 1993 p. 86). Therefore, given a completely monotone function φ(h), setting C(h) = φ(h2) produces a valid covariance function. Based on this relationship, we first represent the completely monotone function φ(h) as a non- negative linear combination of simple completely monotone functions derived from B-splines. Then the nonparametric covariance function C(h) is estimated using this completely monotone function. The explicit form of the simple completely monotone functions enables computational efficiency of our proposed nonparametric estimation. Our approach can also be extended to model the space-time covariance function via the covariance class developed in Gneiting (2002), since the form of that class can essentially be represented by two completely monotone functions as discussed in Section 2.3. It is worth noting that covariance functions valid for all dimensions have both advantages and disadvantages, because properties of covariance functions such as the lower bound or the degree of oscillating depend on the dimension. Covariance functions valid for all dimensions are strictly decreasing and positive (Stein, 1999), hence they cannot model the negative correlation. However, this limit might be only a minor issue as the negative covariance or oscillating covariance function is not usually considered in the realm of spatial statistics, e.g., most common parametric models such as Mat´ernor exponential are also valid for all dimensions and cannot model negative covariance. On the other hand, since covariance functions valid for d-dimensional space are not guaranteed to be valid for higher dimensions d0 (> d), the covariance functions valid for all dimensions are subject to less restriction than those defined for a certain dimension. Both Apanasovich and Genton (2010) and Bornn et al. (2012) required covariance functions valid in high dimension in their 12

methodology development, and our proposed nonparametric covariance function can be readily applied to these cases. This chapter proceeds as follows. Section 2.2 derives the nonparametric approx- imation of a completely monotone function and describes the nonparametric esti- mation of spatial covariance functions. Section 2.3 extends the idea to estimate space-time covariance functions. Section 2.4 conducts simulation studies to compare the nonparametric and parametric models and study the properties of nonparametric estimators. In Section 2.5, two real datasets are analyzed as illustrative examples.

2.2 Nonparametric Covariance Estimation

2.2.1 Construction of completely monotone function

From Bernstein’s theorem (Feller, 1996), a function φ(x) is completely monotone if and only if it can be represented as:

Z ∞ φ(x) = e−xydF (y), (2.2) 0 where F (y) is a probability distribution function. Examples of completely monotone functions include e−ax, 1/(a + bx)ν and log(a + b/x) for a ≥ 0, b ≥ 0 and ν ≥ 0. In (2.2), by letting F (y) = 1 − G(e−y), we have that φ(x) is completely monotone if

Z 1 φ(x) = yxdG(y), (2.3) 0 for a non-decreasing function G(y). The non-decreasing function G(y) in (2.3) can be chosen as a linear combination

of B-splines with a certain constraint on the coefficients. Let 0 < κ1 < . . . < κm < 1

denote m knots for the B-spline bases of order p with boundary knots κ0 = 0 and

κm+1 = 1. We consider equally spaced knots, that is, κ1 = 1/(m + 1), κ2 = 2/(m +

1), . . . , κm = m/(m + 1). In order to define m + p + 1 B-spline bases of order p, [p] {Bk , k = 1, . . . , m + p + 1} on the domain [0, 1], we augment the knot sequence

{κj : j = 0, . . . , m + 1} to {τj : j = −p, . . . , m + p + 1} as follows: for j = −p, . . . , −1 13

and j = m + 2, . . . , m + p + 1, τj = j/(m + 1), while for j = 0, . . . , m + 1, τj = κj. Now we express the non-decreasing function G(y) by

m+p+1 X [p] G(y) = bkBk (y). k=1 It can be easily shown that G(y) is non-decreasing as long as the sequence of B-spline coefficients {bk : k = 1, . . . , m + p + 1} is non-decreasing. Note that (2.3) is well-defined for p = 0 as well, although in practice we usually use p ≥ 1. Following the derivative formula for B-splines in de Boor (2001), we have

m+p+1 X [p−1] dG(y)/dy = (bk − bk−1)(m + 1)Bk (y). (2.4) k=2

[r] Define a function fk as following: Z 1 [r] x [r] fk (x) = (m + 1)y Bk+1(y)dy. 0 Then (2.3) and (2.4) together yield a completely monotone function

m+p X [q] φ(x) = βkfk (x), (2.5) k=1 where βk = bk+1 − bk and q = p − 1. The non-decreasing constraint on {bk : k =

1, . . . , m+p+1} now becomes a non-negative constraint on βk, i.e., we require βk ≥ 0 [r] for all k = 1, . . . , m + p. Note that fk itself is also a completely monotone function because it is an integration of a product between yx and non-negative B-spline over [0, 1]. Thus, the completely monotone function φ(x) is virtually represented by a linear combination of completely monotone functions with non-negative coefficients, [r] and the set of functions {fk : k = 1, . . . , m + p} essentially plays a role as basis functions in constructing φ(x). Given φ(x) in (2.5), setting C(h) = φ(h2) yields a valid covariance function as below:

m+p X [q] 2 C(h) = βkfk (h ). (2.6) k=1 14

[q] Although fk (x) is a function that involves an integration operator, due to Cox-de [r] Boor recursion formula of B-splines, fk (x) with order r ≥ 1 can be easily obtained using below formula:

m + 1 n o f [r](x) = f [r−1](x + 1) − τ f [r−1](x) + τ f [r−1](x) − f [r−1](x + 1) , k r k k−p k k−p+l+1 k+1 k+1

[0] with fk (x) given by:  m+1 x+1 x+1 [0]  x+1 (τk−p+1 − τk−p ) if 0 ≤ τk−p, τk−p+1 ≤ 1, fk (x) =  0 otherwise.

[q] As an example, Figure 2.1 shows the set of functions {fk (x): k = 1, . . . , m + p} for p = 3 and m = 2, superimposed by two completely monotone functions, C1(h) = [2] 2 [2] 2 [2] 2 [2] 2 [2] 5f1 (h ) + 0.5f5 (h ) and C2(h) = 0.7f3 (h ) + 0.1f4 (h ), generated from {fk : k = [r] 1,..., 5}. The explicit form of function fk (x) ensures easy and fast computation for the proposed nonparametric estimator, with no need to resort to complicated numerical integration or approximation procedures, which are often required by other nonparametric estimators.

2.2.2 Estimation

[q] Given {fk (x): k = 1, . . . , m + p}, the only unknown parameters in (2.6) are the set of nonnegative coefficients {βk : k = 1, . . . , m + p}. We propose to estimate 0 β = (β1, . . . , βm+p) by weighted least squares method:

L m+p X X [q] 2 2 βbW LS = arg min wl{CbE(hl) − βkfk (hl )} , (2.7) β ≥0 k l=1 k=1 where CbE(hl) is an empirical estimator of C(hl), L is the number of distance lags to be considered, and {wl : l = 1,...,L} is a set of weights. We choose CbE(hl) to be the moment estimator:

1 X CbE(hl) = {Z(si) − Z}{Z(sj) − Z}, (2.8) |N(hl)| (si,sj )∈N(hl) 15

Basis Functions 1.0 Covariance Function 1 2 Covariance Function 2 f3(h )

2 0.8 f4(h ) 0.6 2 f2(h ) C(h) 0.4

0.2 2 f5(h )

2 f1(h ) 0.0

0.0 0.5 1.0 1.5 2.0

h

[q] Figure 2.1. Example of functions {fj : j = 1, . . . , m + p} for m = 2 [q] and p = 3, and two covariance functions constructed based on those fj : [2] 2 [2] 2 [2] 2 [2] 2 C1(h) = 5f1 (h ) + 0.5f5 (h ) and C2(h) = 0.7f3 (h ) + 0.1f4 (h ). 16

where Z is the sample mean, N(hl) is a set of data pairs that are apart by distance lag hl, and |N(hl)| is the cardinality of the set N(hl). Since the empirical covariance

CbE at different lags have different variability, it is common to outweigh CbE with low variability and downweigh those with high variability (Cressie, 1993; Diggle and Ribeiro, 2007). We use the weighting scheme derived from the weights for empirical

2 variogram in Cressie (1985), wl = |N(hl)|/{1 − CbE(hl)} . Alternatively, we can employ the maximum likelihood (ML) method to estimate

0 β. Assuming that the observed data Z = (Z(s1),...,Z(sn)) is a realization from a Gaussian random field, βb can be obtained by minimizing -2 loglikelihood function, l(β),

βbML = arg min l(β) βk≥0 0 −1 = arg min(log |Σβ| + Z Σβ Z), βk≥0 where Σβ is the covariance matrix of Z computed from (2.6) :

Σβ = {C(hij)}i,j=1,...,n m+p X [q] = { βkfk (hij)}i,j=1,...,n k=1 where hij is a distance between si and sj. 0 To facilitate the computation, the gradient function lk = ∂l(β)/∂βk can be used in the optimization process :

0 ∂ 0 −1 lk = (log |Σβ| + Z Σβ Z) ∂βk 1 ∂Σ ∂Σ−1 = β + Z0 β Z |Σβ| ∂βk ∂βk

1 −1 ∂Σβ 0 −1 ∂Σβ −1 = (|Σβ|tr(Σβ )) − Z Σβ Σβ Z |Σβ| ∂βk ∂βk

−1 ∂Σβ 0 −1 ∂Σβ −1 = tr(Σβ ) − Z Σβ Σβ Z. ∂βk ∂βk The ML method is used widely in the parameter estimation for spatial data, but it is model dependent and also may suffer from numerical issues. Under the current 17

context, our experience indicates that βbML is often unstable and heavily depends on initial values of parameters, due to the nonlinear optimization over a m + p dimensional space. Hence, we adopt the weighted least squares approach for our parameter estimation. It is seen from the simulation results that the performance of this estimating approach is satisfactory. In general, the spline fitting is sensitive to the number and the placement of knots. In such a case, we can resort to traditional approaches such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) to determine the location and the knot number. However, Meyer (2008) indicated that if an assumption such as monotonicity or convexity is imposed on the regression function, the shape-restricted regression splines are robust to knot choices. Our simulation results in Section 2.4 also corroborate this statement. Based on our experience and simulation studies, it is generally adequate to choose less than 10 knots to achieve a reasonable approximation.

2.3 Extension to Space-Time Covariance Function

Suppose now we observe a stochastic process Y (s, t) over a space-time domain,

where s ∈ Rd and t ∈ R denote a spatial location and a time point respectively. Analogous to (1.1), a common framework for modeling this stochastic process is

Y (s, t) = µ(s, t) + Z(s, t) + (s, t),

where the assumptions for µ(s, t) and (s, t) follow those for µ(s) and (s) in model (1.1). Again we assume that Z(s, t) is a weakly stationary stochastic process with E{Z(s, t)} = 0 for all s and t, and with covariance function

C(h, v) = Cov{Z(s, t),Z(s + h, t + v)},

and we focus on modeling the covariance structure in Z(s, t). The positive definiteness is still the requirement for a space-time covariance func- tion to be valid. Gneiting (2002) developed a class of non-separable covariance mod- els for a weakly stationary spatio-temporal process, based on a completely monotone 18 function φ(·) and a positive function ψ(·) with completely monotone derivatives. The author showed that

σ2  ||h||2  C(h, v) = φ , (h, v) ∈ d × (2.9) ψ(|v|2)d/2 ψ(|v|2) R R is a valid space-time covariance function. Although later Zastavnyi and Porcu (2011) argued that the conditions for ψ(·) can be relaxed to negative definiteness, we still follow the conditions in Gneiting (2002) for our illustration. Let h = ||h||, φ∗(·) = σ2φ(·) and ψ∗(·) = 1/ψ(·), then (2.9) can be written into

C(h, v) = φ∗ h2ψ∗(v2) ψ∗(v2)d/2. (2.10)

Obviously φ∗(·) is completely monotone and ψ∗(·) can also be shown to be completely monotone, since it is the reciprocal of a function with completely monotone derivatives and the reciprocal itself is a completely monotone function (Feller, 1996 p. 436). Now the estimation of C(h, v) is equivalent to the estimation of two completely monotone functions φ∗(·) and ψ∗(·), and thus our proposed nonparametric estimation method can be extended to estimate C(h, v). To ensure the identifiability of those two functions, we impose ψ∗(0) = 1 without loss of generality. We observe that setting h = 0 leads to C(h, 0) = φ∗(h2), and setting v = 0 leads to C(0, v) = φ∗(0)ψ∗(v2)d/2. Hence we can first estimate φ∗(·) using the

∗ empirical covariance estimates CbE(h, 0), and then plug the estimated φ (0) back into ∗ (2.10) to estimate ψ (·) using the empirical covariance estimates CbE(0, v). Following ∗ 2 Pmφ+pφ [qφ] 2 ∗ 2 Pmψ+pψ [qψ] 2 (2.6) we approximate φ (h ) by k=1 αkfk (h ) and ψ (v ) by k=1 γkfk (v ), where mφ and mψ are the number of knots, and pφ and pψ are degree of B-splines used for function φ∗(·) and ψ∗(·), respectively. Then the estimation of these two functions boils down to the estimation of αk for k = 1, . . . , mφ+pφ and γk for k = 1, . . . , mψ +pψ.

We again choose CbE(h, 0) and CbE(0, v) to be moment estimators. Following the notation in (2.8), we have

T 1 1 X X CbE(hl, 0) = {Z(si, t) − Z}{Z(sj, t) − Z}, T |N(hl)| t=1 (si,sj )∈N(hl) 19

and n T −v 1 X X CbE(0, v) = {Z(si, t) − Z}{Z(si, t + v) − Z}. nT i=1 t=1 0 0 Let α = (α1, . . . αmφ+pφ ) and γ = (γ1, . . . , γmψ+pψ ) . We first estimate α by

L mφ+pφ X φ X [qφ] 2 2 αb = arg min wl {CbE(hl, 0) − αjfk (hl )} . α ≥0 k l=0 k=1

∗ 2 Pmφ+pφ [qφ] 2 Then based on φb (h ) = k=1 αbkck (h ), we estimate γ by

T −1 mψ+pψ X ψ CbE(0, v) X [qψ] 2 2 γb = arg min wv { − γkfk (v )} , γ ≥0 ∗ k v=0 φb (0) k=1

Pmψ+pψ [qψ] ∗ subject to k=1 γkfk (0) = 1 which ensures that ψ (0) = 1. Any constraint optimization procedure can be used to obtain the non-negative estimates of α and γ.

∗ ∗ Given αb and γb, φb (·) and ψb (·) are determined by

mφ+pφ ∗ X [qφ] φb (·) = αbkfk (·) k=1

mψ+pψ ∗ X [qψ] ψb (·) = γbkfk (·). k=1 Then C(h, v) is estimated by

Cb(h, v) = φb∗(h2ψb∗(v2))ψb∗(v2)d/2. (2.11)

Although the nonparametric model proposed here is space-time nonseparable, the idea of our nonparametric estimation is also naturally suitable for space-time separable models. For example, if the hypothesis testing indicates that the space- time covariance can be assumed to be separable (e.g., Li et al., 2007), we can then

2 2 estimate C(h, v) by Cb(h, v) = φb1(h )φb2(v ), where φ1(·) and φ2(·) are two completely monotone functions.

2.4 Simulation Study

We conduct simulations to investigate the sensitivity of the estimators to the num- ber of knots and order of B-splines, and compare the performance of our nonparamet- 20 ric covariance estimator with parametric covariance estimators. Sections 2.4.1, 2.4.2 and 2.4.3 focus on the spatial covariance function and Section 2.4.4 briefly reports the experiment results for space-time covariance functions.

2.4.1 Simulation set up

We generate data from a zero mean Gaussian random field over a 20 × 20 grid with interval 0.5 based on the following five covariance models:

• Spherical model:  n o σ2 1 − 3 h + 1 h 3 h ≤ θ 2  2 θ 2 θ CS(h; σ , θ) =  0 h > θ

• Mat´ernmodel: √ √ 1 1 2 νhν 2 νh C (h; σ2, θ, ν) = σ2 K M Γ(ν) 2ν−1 θ ν θ

• Gaussian model:

2 2 2 2 CG(h; σ , θ) = σ exp(−h /θ )

• Linear combination of two Mat´ernmodels: 1 1 C (h; σ2, θ, ν, θ0, ν0) = C (h; σ2, θ, ν) + C (h; σ2, θ0, ν0) L 2 M 2 M

• Nonparametric model:

m+p X [q] 2 CN (h; β1, . . . , βm+p) = βjfj (h ) j=1

The spherical model has a compact support [0, θ] and is widely used in hydrology. The Mat´ernfunction is perhaps the most popular parametric model due to its flexibil- ity in modeling smoothness of the random field, and the Gaussian model that corre- sponds to infinitely mean-square differentiable processes can be considered as a limit- ing version of Mat´ernmodel. We include a linear combination of two Mat´ernmodels 21

and the proposed nonparametric model as examples for complex covariance structures that cannot be captured by a single parametric model. In the simulation, we set σ2 = 1, but choose θ ∼ unif[1.0, 1.5], θ0 ∼ unif[2.0, 2.5], ν ∼ discrete unif{1, 2, 3, 4, 5} and ν0 ∼ discrete unif{6, 7, 8, 9, 10} to eliminate the effect that specific choices of pa- rameter values can possibly have on the estimation. For the nonparametric model,

we use m = p = 3, and choose {βk : k = 1, . . . , m + p} ∼ unif[0, 1] subject to P [q] k βkck (0) = 1. We generate 500 datasets for each covariance model. The parameters are estimated based on weighted least square approach (2.7). We use the integrated squared error (ISE) to assess how well the estimated nonparametric

covariance function, Cb(h), approximate the true covariance function, C0(h), and use the mean squared prediction error (MSPE) to evaluate the predictive performance of the covariance function estimator, which are calculated as below :

• Integrated Squared Error (ISE) : Z ∞ 2 ISE = {C0(h) − Cb(h)} dh. 0

Since C0(h) → 0 as h → ∞, in practice, we only numerically calculate the

integration over a finite interval [0, hm] instead of [0, ∞), where hm is half of maximum distance observed.

• Mean Squared Prediction Error (MSPE)

10 1 X 2 MSPE = {Z(j) − Zb(j)} . 10 j=1

where Z(j)(j = 1, ··· , 10) is a randomly selected from each simulated dataset

and Zb(j)(j = 1, ··· , 10) is a predicted value calculated using the simple kriging (Cressie, 1993) based on the fitted covariance function Cb(h)

2.4.2 Number of knots and order of B-splines

To assess the effect of the number of knots and order of B-splines, for each dataset we fit our nonparametric model (2.6) with three different number of knots, m = 3, 5 22

and 7, and three different orders, p = 2, 3, and 5. Table 2.1 shows the mean and standard error of ISE and MSPE of the nonparametric estimator at different m and p. In general, MSPE decreases as the knot number increases, except when the underlying model is the nonparametric model in which case increasing the knot number leads to a further departure from the true model. Whereas no consistent pattern of ISE is observed when the knot number increases. However, it is seen that both ISE and MSPE quickly stabilize at the knot number as few as m = 5. Similarly, as the order of B-splines p increases, the ISE and MSPE tend to decrease slightly, but no significant difference is observed between p = 3 and p = 5. This suggests that the number of knots or the number of basis functions, and the order of B-splines only have a little effect on the proposed nonparametric estimator; especially when there is a sufficient number of knots, different orders of B-splines only make very little difference.

2.4.3 Comparison with parametric models

We compare our nonparametric covariance model with spherical, Mat´ernand Gaussian models by comparing their ISE and MSPE. To accomplish this task, for each simulated data generated from the five models aforementioned in this section, we fit both the nonparametric model and the spherical, Mat´ernand Gaussian models using weighted least squares method, respectively. Then we compute the ISE and MSPE corresponding to each fitted model, and also the MSPE with true model as reference. This provides ISE and MSPE with 1) correctly specified parametric model, 2) misspecified parametric model and 3) proposed nonparametric model. Table 2.2 summarizes the results. Although the ISE and MSPE corresponding to the cor- rectly specified model are usually the smallest ones among all the fitted models, our nonparametric model tends to produce very similar results as the correctly specified model, except when the data is generated from the spherical model. In the latter case, the nonparametric model gives a little larger ISE and MSPE compared to using the 23

Table 2.1. Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of nonparametric model at different knot number m and order of B-splines p (unit: 10−2).

Simulation ISE MSPE model m = 3 m = 5 m = 7 m = 3 m = 5 m = 7 p = 2 1.34 (.07) 1.08 (.06) 1.00 (.06) 64.51 (.66) 61.95 (.63) 60.70 (.61) Spherical p = 3 1.11 (.06) .99 (.06) .96 (.06) 61.88 (.63) 60.24 (.61) 59.63 (.60) p = 5 .97 (.06) .94 (.06) .93 (.06) 59.81 (.61) 59.08 (.60) 58.80 (.60) p = 2 4.63 (.28) 4.68 (.28) 4.70 (.29) 11.00 (.53) 10.74 (.50) 10.64 (.48) Mat´ern p = 3 4.59 (.28) 4.66 (.28) 4.70 (.28) 10.69 (.51) 10.44 (.48) 10.44 (.47) p = 5 4.60 (.28) 4.61 (.28) 4.67 (.28) 10.39 (.50) 10.19 (.47) 10.21 (.46) p = 2 5.60 (.44) 5.52 (.41) 5.57 (.42) .19 (.02) .16 (.01) .15 (.01) Gaussian p = 3 5.59 (.42) 5.53 (.43) 5.57 (.42) .20 (.02) .16 (.01) .15 (.01) p = 5 5.78 (.43) 5.53 (.43) 5.55 (.43) .25 (.02) .18 (.01) .16 (.01) p = 2 11.54 (.71) 11.67 (.74) 11.71 (.74) 5.92 (.30) 5.38 (.24) 5.29 (.24) Linear p = 3 11.59 (.74) 11.57 (.72) 11.66 (.72) 5.72 (.29) 5.34 (.25) 5.26 (.23) p = 5 11.54 (.75) 11.57 (.72) 11.63 (.73) 5.54 (.28) 5.34 (.25) 5.19 (.22) p = 2 9.22 (.42) 9.32 (.42) 9.35 (.42) 6.82 (.13) 6.89 (.13) 7.00 (.13) NP p = 3 9.15 (.41) 9.28 (.42) 9.34 (.42) 6.88 (.13) 6.89 (.13) 6.97 (.13) p = 5 9.05 (.41) 9.25 (.42) 9.31 (.42) 6.74 (.13) 6.90 (.13) 6.99 (.13) 24

Table 2.2. Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of three parametric models (spherical, Mat´ernand Gaussian) and the nonparametric model (with order of B-splines p = 3 and knot number m = 5), together with the MSPE of true covariance model (unit: 10−2).

Simulation ISE MSPE Model NP Spherical Mat´ern Gaussian True NP Spherical Mat´ern Gaussian .99 .58 .94 18.60 52.18 60.24 52.81 60.63 94.93 Spherical (.06) (.05) (.06) (.58) (2.92) (.61) (.62) (.69) (1.05) 4.66 5.28 4.75 5.04 9.30 10.44 12.82 11.13 25.92 Mat´ern (.28) (.32) (.29) (.29) (.44) (.48) (.41) (.47) (1.68) 5.53 7.38 5.57 5.44 0.10 0.16 3.31 0.17 0.11 Gaussian (.43) (.56) (.43) (.43) (.01) (.01) (.09) (.02) (.01) 11.57 12.14 11.87 12.61 4.45 5.34 6.32 6.33 18.84 Linear (.72) (.82) (.74) (.75) (.21) (.25) (.19) (.33) (1.36) 9.28 9.81 9.48 10.83 6.46 6.89 11.01 7.77 13.55 NP (.42) (.44) (.42) (.43) (.13) (.13) (.20) (.16) (.32)

spherical model itself. The reason might be because the sperical model is compactly supported while our nonparametric model is not designed in that fashion. Table 2.2 also indicates the risk of model misspecification. For example, when the true model is spherical yet a Gaussian model is fitted or vice versa, the ISE and MSPE with the misspecified model are seen inflated. Generally speaking, our nonparametric covariance estimator outperforms parametric estimators when the parametric model is misspecified, and performs comparably well as parametric estimators under correct model specification. It is interesting to note that our nonparametric model even outperforms the Mat´ernmodel in terms of MSPE, when the true model is in Mat´ern class such as the Mat´ernmodel and the linear combination of two Mat´ernmodels. We conjecture that this is due to the difficulty of parameter estimation with the Mat´ern model. 25

2.4.4 Simulation with space-time covariance function

In addition to the investigation of spatial covariance functions, we perform a small simulation study for space-time covariance models. We generate 500 independent spatio-temporal datasets from a zero mean Gaussian random field over a 7×7 spatial grid with a total of T = 50 time points, based on the following two space-time covariance models:

• Covariance Model 1: σ2  ch2γ  C (h, v; σ2, a, α, c, γ) = exp − , 1 (a|v|2α + 1) (a|v|2α + 1)γ with σ2 = 1, a = 1, α = 0.5, c = 1, γ = 0.5.

• Covariance Model 2: σ2(a|v| + 1) C (h, v; σ2, a, b) = , 2 {(a|v| + 1)2 + b2h2}3/2 with σ2 = 1, a = 0.5, b = 0.25.

Covariance Models 1 and 2 are developed by Gneiting (2002) and Cressie and Huang (1999), respectively. The nonparametric covariance function is estimated using the procedure described in Section 2.3. To assess the sensitivity of the skills of our nonparametric estimator to the choice of knot number and degree of B-splines in the context of spatio-temporal data, we fit the nonparametric model with different combinations of knot number and degree of B-splines ( pφ, pψ = 2, 3, 4, mφ = 5, 7, 9

ψ and m = 3, 5). We also fit parametric models C1 and C2 to the simulated data, and then compare the ISE and MSPE from the nonparametric model to those from two parametric models. Simulation results are summarized in Table 2.3. It can be seen that both ISE and MSPE of the nonparametric estimator are robust to the choice of the knot number and degree of B-splines. Similar to the conclusions for spatial covariance functions, the ISE and MSPE of the nonparametric space-time covariance estimator are comparable to those of correctly specified parametric models but they are significantly reduced compared to the misspecified parametric models. 26

Table 2.3. Mean and standard error (in parenthesis) of integrated squared error (ISE) and mean squared prediction error (MSPE) of the nonparametric space- time covariance model at different knot number (mφ, mψ) and order of B-splines (pφ, pψ) and of the parametric models (unit: 10−2).

Simulation model 1 Simulation model 2 pφ pψ mφ mψ ISE MSPE ISE MSPE 2 2 5 3 12.02 (.74) 76.73 (.69) 18.51 (.95) 64.55 (.65) 5 11.85 (.73) 76.62 (.69) 17.91 (.94) 64.23 (.64) 7 3 12.13 (.74) 76.80 (.69) 18.39 (.95) 64.65 (.65) 5 11.96 (.73) 76.70 (.69) 17.78 (.94) 64.31 (.64) 9 3 12.11 (.74) 76.84 (.69) 18.39 (.95) 64.70 (.65) 5 11.93 (.73) 76.73 (.69) 17.78 (.95) 64.37 (.64) 3 3 5 3 12.36 (.72) 76.81 (.69) 18.37 (.95) 65.02 (.71) 5 12.25 (.71) 76.74 (.69) 18.15 (.94) 64.14 (.64) Nonparametric 7 3 12.16 (.73) 76.74 (.69) 18.44 (.95) 65.53 (.76) Model 5 12.05 (.72) 76.66 (.69) 18.21 (.94) 64.38 (.64) 9 3 12.14 (.72) 76.77 (.69) 18.45 (.95) 66.10 (.99) 5 12.04 (.71) 76.69 (.69) 18.22 (.94) 64.39 (.65) 4 4 5 3 12.30 (.72) 76.71 (.69) 18.66 (.94) 64.44 (.65) 5 12.21 (.71) 76.71 (.69) 18.34 (.93) 64.24 (.65) 7 3 12.33 (.72) 76.74 (.69) 18.39 (.94) 64.67 (.65) 5 12.24 (.71) 76.74 (.69) 18.07 (.93) 64.44 (.65) 9 3 12.19 (.71) 76.72 (.69) 18.43 (.94) 64.84 (.66) 5 12.11 (.70) 76.72 (.69) 18.11 (.93) 64.59 (.65) True - 76.06 (.69) - 62.93 (.64) Parametric Model 1 12.85 (.66) 77.08 (.69) 20.53 (.90) 67.00 (.67) Model Model 2 13.86 (1.05) 78.98 (.85) 20.95 (1.14) 63.33 (.62) 27

2.5 Two Data Examples

We apply the proposed nonparametric method to two datasets. One is the spatial data of Indiana July mean temperature in 2011, and the other is the spatio-temporal Irish wind data.

2.5.1 Indiana July mean temperature data

This dataset consists of mean temperatures over 86 weather stations in Indiana state in July 2011, and can be obtained from National Climatic Data Center 1. Figure ?? shows 86 weather stations with mean temperature values. As expected, a smooth decreasing trend of temperatures along the South-North direction is observed. Since the topography of Indiana state is flat with no big mountains or big water body, the isotropic stationarity assumption for temperature data seems reasonable if the trend along the latitude can be removed. To this end, we first fit a linear model using latitude as an independent variable, and then we treat residuals from the fitted model as a realization from an isotropic stationary random field and apply the proposed nonparametric model (2.6) to these residuals. We choose p = 3 and m = 7 for the B-spline bases in the nonparametric model fitting. For this dataset, we find that at h = 0 there is a large discrepancy between the fitted nonparametric model Cb(0) and the moment estimator CbE(0), which indi- cates the presence of nugget effect. For this reason, we fit the nonparametric model without using CbE(0), and then estimate the nugget asη ˆ = CbE(0) − Cb(0). The fitted nonparametric covariance function is

2 [2] 2 [2] 2 Cb(h) =ηI ˆ (h=0) + φb(h ) = 79.15I(h=0) + 71.19f8 (h ) + 56.17f9 (h ), and is shown in Figure 2.3. Given Cb(h), we can make statistical inferences for this dataset as with parametric models. For example, we can use kriging predictor (Cressie, 1993) based on the fitted covariance function to interpolate the temperature

1http://www.ncdc.noaa.gov/cdo-web/ 28

77.5 ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● 76.6●●●●●●76.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●76●●●●●●●●●●75.1●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●75.2●●●●●●●●●●●●●●●●●●●●75●●●●●●●●76.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●74.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●76.9●●●●●●●●●●●●●●●●77.3●●●●●●●●●●●●●●78.377.3●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●75.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 77●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.3●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●76.8●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●77●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●76.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.2●●●●●●●●●●●●●●79.8●●●●●●●●●●●●● ●●●●●●78.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.5●●●●●●●●●●●●●●79.5●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●78.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●80.1●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●76.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.6●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.2●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●77.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●78.5●●●●●●●●●●●●●●●●●●●●77.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.2●●●●●●●●78.6●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●78.2●●●●●●●●●●78.6●●●●●●●●●●●●●●●●78.3●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 82 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.6●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.8●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.4●●●●●●●●●●●●●●●●●●●●●●●●●● 78.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.9●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●76.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●78.2●●●●●●●79.3●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●76.8●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.2●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●80●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.9●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 80 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.7●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●81.1●●●●●●●●●●●●●●●●●●●●●●●82.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●79.2●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●76.6●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 81.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●77.2●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●77.5●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●80.7●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●81.4●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●80.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 78 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●78.6●●●●●●●●●●●●●●●●●●●●80●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●80●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●78.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●78.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●81●●●●81.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●80.7●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 76 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 81.3●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● 82.1●●●●80.5●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●76.2●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●78.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●79.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●79.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●81.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●80.4●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 84.1●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●82.6●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●85.4●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● 83.3●●●●●●●●●●●●●●●●●●●●●●●●●81.3● ●●●●●●● ●●●●●●●● ●● ●●●●●●●●●●●● 81.7●●●●●● ●● ●● ●●●●●●● ●

Figure 2.2. 86 weather stations (square) in Indiana state with July 2011 mean temperatures, together with temperature surface interpolated using the fitted nonparametric covariance function (unit: Fahrenheit). 29

200 NP Matern Gauss Spherical 150

● 100

● C(h)

50 ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● −50

0 5 10 15

h

Figure 2.3. Emprical covariance estimates (dots) and fitted covariance functions for Indiana July 2011 temperature data (distance unit: 10 km)

surface, which is shown in Figure ??. Using this dataset, we also fit three parametric models that are used in Section 2.4 (see Figure 2.3), and compare our nonparamet- ric model with those parametric models by examining their corresponding MSPE of drop-one predictions. The MSPE of the nonparametric model, the Mat´ern, Gaussian and spherical models are 174.50, 187.26, 185.50, and 187.61, respectively. Apparently, our nonparametric model yields the smallest MSPE. Since the smoothness parameter ν in the Mat´ernmodel is estimated to be 4.415, it is not surprising that the Gausian model yields similar result as the Mat´ernmodel.

2.5.2 Irish wind data

The Irish wind data consists of daily average of wind speed at 11 synoptic mete- orological stations in Ireland during the period 1961-1978. To study the covariance structure in this dataset, we follow the previous analyses such as Haslett and Raftery 30

(1989) and Gneiting (2002) to compute the velocity measures by taking a square root of the wind speeds and then removing the seasonal effects and spatially varying mean from the square roots. We then fit the nonparametric space-time model (2.10) to those velocity measures. The empirical correlation plot in Figure 5 of Gneiting (2002) implies that the purely temporal correlation is close to zero after time lag 3, whereas the purely spatial correlation is non-negligible over the entire spatial domain

∗ of interest. With this observation, we set mφ = 10 for the estimation of φ (·) and ∗ mψ = 4 for the estimation of ψ (·), but use the same degree of B-splines pφ = pψ = 3 for both functions. As in Gneiting et al. (2007), we estimate the correlation function C(h, u) using data from 1961 to 1970 (3650 days). Following the model fitting procedures described in Section 2.3, we obtain the estimates of two completely monotone functions as below:

∗ [2] [2] [2] φb (h) = 0.033f8 (h) + 0.025f10 (h) + 4.990f13 (h), and

∗ [2] [2] [2] ψb (v) = 0.548f4 (v) + 0.031f5 (h) + 1.356f7 (v).

Our nonparametric estimate of the space-time correlation function of the Irish wind data thus follows by Cb(h, v) = φb∗{h2ψb∗(v2)}ψb∗(v2). Figure 2.4 shows the fitted correlation function Cb(h, v) together with the empirical estimates CbE(h, v) at v = 0, 1, 2 and 3. We compare our nonparametric model with four parametric models based on the

MSPE of one-day ahead prediction. These four parametric models include the C1 and

C2 presented in Section 2.3, and the following two additional models that appeared in Gneiting et al. (2007):

• Covariance Model 3

1 − ν   c||h||  ν  C (h, v; ν, a, α, c, β) = exp − + I , 3 1 + a|v|2α (1 + a|v|2α)β/2 1 − ν h=0 31

● 1.0 1.0 ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● 0.8 ● ● ● ● ● 0.8 ●● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● 0.6 0.6 ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ●● ● ● C(h,0) C(h,1) ● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● 0.4 0.4 ● ● ● ● ● ● ● ● 0.2 0.2 0.0 0.0

0 1 2 3 4 0 1 2 3 4

h h 1.0 1.0 0.8 0.8 0.6 0.6 C(h,2) C(h,3) 0.4 0.4

● ● ● ●● ● ● ● ●● ● ●● ● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ● 0.2 ● ●●● ●● ●● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ●● ●● ●●●● ● ●● ●● ● ● ● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● 0.0 0.0

0 1 2 3 4 0 1 2 3 4

h h

Figure 2.4. Empirical covariance estimates (dots) of Irish wind data and fitted nonparametric model (solid line) at four different sets of space-time lags (distance unit: 100 km)

• Covariance Model 4

(1 − ν)(1 − λ)   c||h||  ν  C (h, v; λ, ν, a, α, c, β, ζ) = exp − + I 4 1 + a|v|2α (1 + a|v|2α)β/2 1 − ν h=0  1  + λ 1 − |h1 − ζv| , 2ζ +

0 where h1 is a longitudinal component of h = (h1, h2) .

Note that Model 4 allows both space-time asymmetry and space-time interaction, and thus takes the most flexible form among all the four models considered here. 32

Table 2.4. Mean squared prediction error (MSPE) of one-day ahead prediction of Irish wind data.

Stations NP Model 1 Model 2 Model 3 Model 4 Valentia .251 .250 .649 .251 .249 Belmulle .245 .245 .363 .245 .245 Claremorris .241 .240 .394 .242 .240 Shannon .219 .218 .412 .219 .217 Roche’s Point .227 .227 .439 .229 .225 Birr .225 .226 .351 .227 .223 Mullinger .177 .179 .462 .180 .176 Malin Head .239 .241 .379 .242 .238 Kilkenny .188 .188 .388 .190 .184 Clones .230 .233 .451 .234 .229 Dublin .194 .198 .531 .198 .194

MSPE of Models 3 and 4 have been reported in Gneiting et al. (2007). For Models 1 and 2 and our nonparametric model, we compute their MSPE of one- day ahead prediction in the same way as for Models 3 and 4 so that the results from all different models are comparable. Table 2.4 summarizes the MSPE of all models at each of the 11 stations. The MSPE of Models 1, 3, and 4 are very similar, though Model 4 perhaps has slightly lower MSPE than the other two. Model 2 corresponds to the largest MSPE among all the four parametric models. MSPE of our nonparametric model appear to be as good as those of Model 4. These results imply that if the parametric models are adequate, our nonparametric model performs as well as those parametric models if not better, even compared to the very flexible parametric models. However, the proposed nonparametric method avoids the risk of selecting an inadequate parametric model. 33

2.6 Conclusion

It is well known that the tail behavior of the spectral density of a covariance function determines the smoothness of the process, and influences the performance of interpolation procedures (Stein, 1999; Im et al., 2007). Since our proposed estimator does not directly estimate the spectral density, it is not clear how the high frequency component can be modeled separately from the other components. This will be an interesting topic for further investigation. Some limitations of the proposed nonparametric space-time covariance function are observed. The two completely monotone functions alone do not allow flexible space-time interactions, unless an additional space-time interaction parameter is in- troduced as in Gneiting (2002). The two functions are essentially determined solely by the purely spatial covariance C(h, 0) and purely temporal covariance C(0, v). Al- though the C(h, v) for h 6= 0 and v 6= 0 can be incorporated to estimate the two completely monotone functions as well, they only play a role in increasing the sample size in the estimation of those functions, but will not essentially alter the functions estimated from the marginal covariances. Nevertheless, the small difference between the nonparametric model and Model 4 in Irish wind data analysis indicates that miss- ing the interaction parameter may not cause major loss in the covariance estimation. This indeed is corroborated by the conclusion in Genton (2007). Our nonparametric estimation approach can be adapted to other data struc- tures such as the multivariate spatial or multivariate spatio-temporal random field. Apanasovich and Genton (2010) proposed a general class of cross-covariance func- tions based on a set of completely monotone functions and positive functions with completely monotone derivatives (see equation (4) therein). Analogous to the argu- ments in this paper, the form of that cross-covariance can be expressed as a product of several completely monotone functions, and thus the idea of our nonparametric estimation can be applied. 34

Another interesting direction of future research is to extend the proposed estimator to nonstationary covariance models. One possible way to achieve this goal is to allow the non-negative coefficients β of the covariance estimator to be spatially varying as in Kleiber and Nychka (2012), so we have

m+p X [q] 2 C(s, u) = βbk(s, u)fk (||s − u|| ), k=1 for two locations s and u in a spatial domain. 35

3. ANALYSIS OF TELECONNECTION

3.1 Introduction

Teleconnection is a linkage between seemingly uncorrelated climate anomalies of two regions that are separated from each other by a very large distance1. The dis- tances between two telconnected regions are typically thousands kilometers and can span the entire continent in the case of some large-scale teleconnections. Since tele- connection reflects large-scale phenomenons in the atmospheric circulation, it is often considered to be responsible for extreme weather conditions occurring simultaneously across the globe. The study on teleconnection was pioneered by Sir Gilbert Walker through his work on the “seesaw” pattern of the atmospheric pressure over the tropical Indio-Pacific Ocean (Walker, 1923). In the series of work, he discovered that the atmospheric pres- sure over the Northern Australia and Indonesia was negatively correlated with that of the eastern South Pacific Ocean. This long-distance pattern, called the Southern Oscillation (SO) in oceanography and climatology, was also shown to be related to the Asian monsoon. It was observed that during the negative phase of the SO (above normal pressure in the Northern Australia and below normal pressure in the eastern South Pacific Ocean), the amount of rainfall was diminished, causing droughts and famine across many Asian countries (see Barnett et al. (1991) and reference therein). Other examples of teleconnection include the North Atlantic Oscillation (NAO), the Pacific North America (PNA) oscillation, precipitations in the East China and the southwest Western Australia (Li et al., 2012), and the Pakistan flood-Russia heat waves in 2010 (Lau and Kim, 2012).

1When the two anomalies are negatively related, it is also called dipole. 36

In this chapter, we investigate teleconnection between the Sea Surface Tempera- ture (SST) of the tropical Pacific Ocean and hydrological droughts in North America occurred with the El Ni˜no-SouthernOscillation (ENSO) (see Section 3.2 for more details). El Ni˜norefers to an abnormal warming of sea surface temperature in the eastern tropical Pacific Ocean. As its name suggests, the ENSO is a coupled phe- nomenon consisting of an oceanic component (El Ni˜no)and an atmospheric com- ponent (Southern Oscillation). The ENSO is arguably the most extensively studied ocean-atmospheric phenomenon due to its huge global-scale impact. For example, the strong episode of the ENSO during 1982/1983 is believed to have caused droughts and fires in Australia, Southern Africa, Central America, Indonesia, the Philippines, South America and India; floods in the United States, Gulf of Mexico, Peru, Ecuador, Bolivia and Cuba and more hurricanes than usual in Hawaii and Tahiti. Hsiang et al. (2011) argued that the probability of new civil conflicts arising throughout the tropics doubled during El Ni˜no(warm ENSO event)2 years relative to La Ni˜na(cold ENSO event) years. The physical mechanism of how anomalies over such a large distance can be as- sociated (i.e. teleconnected) has not been fully understood and depends largely on the locations where teleconnection occurs. Tribbia (1991) explained one of the most widely accepted theories that described the ENSO teleconnection in two stages. First, anomalies in the thermal state of SST affect the equatorial upper troposphere and then the large-scale wave oscillation such as Rossby wave propagates the ENSO signal horizontally from tropics to extratropics (mid-latitude regions). El Ni˜noalso affects the frequency of tropical cyclones in the Atlantic-Caribbean region by changing the vertical wind shear (i.e. wind gradient), which is a crucial factor in a hurricane formation (Gray and Sheaffer, 1991). In the absence of physical models, empirical analyses of climatic/oceanic data based on statistical techniques have been used to detect teleconnections and investi- gate their pattern (Brown and Katz, 1991). In literature, two approaches have mainly

2El Ni˜nois also called simply ENSO event in which case La Ni˜nais called anti ENSO event 37 been used to analyze teleconnection quantitatively, one of which is based on the anal- ysis of bivariate time series. In this approach, the regional mean is computed from observations within each region (e.g. weather stations or grid points) at each time point. This is repeated for the whole observation period, which produces bivariate time series data. The dependence between two regions is then measured using a cor- relation coefficient either directly from the bivariate time series data (e.g. Cole and Cook, 1998) or from multi-resolution signals obtained after, for example, maximal overlap discrete wavelet transform (e.g. Li et al., 2012). This approach, however, carries a risk of losing spatial information by aggregating data and does not respect the fact that even if two regions are teleconnected, the magnitude of the dependence, or even the sign of the relationship (i.e. positive/negative) may vary from subregion to subregion. Another popular approach involves multivariate analysis techniques such as Sin- gular Value Decomposition (SVD), Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), or variations of those (Bretherton et al., 1992). For example, in an approach using SVD, a set of singular vectors and singular values are computed from a sample cross-region covariance matrix and then singular values are used as a metric to measure the strength of teleconnection whose “spatial pattern” is determined through the corresponding pair of singular vectors. As previous approach, this approach also implicitly assumes that the pattern and the strength of telecon- nection do not change over the observation period, which may not be reasonable if the period is long enough. In this chapter, we propose a statistical model that can be used to investigate the spatial pattern of teleconnection. The pattern (i.e. spatial variation of dependence between teleconnected regions) is also allowed to vary over time. To the best of our knowledge, this is the first attempt to model explicitly the time-varying element in studying spatial pattern of teleconnection. This chapter is organized as follows. Section 3.2 describes the ENSO teleconnec- tion to North America, reviews works on this teleconnection and explains the dataset 38 used in this study. The saptial model for time-varying teleconnection is suggested in Section 3.3 and the ENSO-North America teleconnection is analzyed using the proposed model in Section 3.4.

3.2 ENSO Teleconnection to North America

In the normal state of the tropical Pacific Ocean, sea surface is the warmest in the western part and the coldest in the eastern part due to easterly trade wind driven ocean dynamics of Pacific Ocean (Cane, 1991). This generates an east-west SST gradient (i.e. contrast between the warm west and the cold east) and initiates an atmospheric circulation where the warm and moist air ascends in the west and descends in the east, forming the Walker circulation cell over the tropical Pacific Ocean. El Ni˜nois a phenomenon where the warm water of the western tropical Pacific Ocean extends to the eastern part of the tropical Pacific Ocean including the Pacific coast of South America, which consequently increases SST of this region to an un- usually high level. Conversely, when SST of the region becomes irregularly low, the period is called La Ni˜na.El Ni˜no/LaNi˜nais often coupled with the Southern Oscil- lation (SO) described in Section 3.1. The strength of the SO is often measured as a difference in the atmospheric pressures between Tahiti and Darwin, Australia called the Southern Oscillation Index (SOI). For instance, during El Ni˜noevents, the SOI tends to have negative values while during La Ni˜naevents, it tends to have positive values. For this reason, El Ni˜no(and La Ni˜na) and the Southern Oscillation are often associated together and comprise a single oceanic-atmospheric phenomenon called El Ni˜no-SouthernOscillation (ENSO). Bjerknes (1969) pioneered the study on mecha- nism of this coupled process. He observed that during El Ni˜no,the SST contrast between the east and the west is diminished, which then weakens both of the sea level pressure contrast and the strenght of the westward wind flow, causing SST of this region even warmer. Two decades later, it was revealed that “sluggish” change 39 of thermocline3 not being in phase with the rapid change of SST/wind causes the oscillation between the warm period and the cold period of the ENSO (see Section 1.8 in Sarachick and Cane, 2010). Due to its significance to the global climate variability, many indices have been developed to measure the strength of the ENSO and the Pacific Ocean sea surface temperature is often used for this purpose. For instance, Ni˜no1+2,Ni˜no3,Ni˜no3.4 and Ni˜no4indices all refer to SST anomalies averaged over a specific region of the Pacific Ocean (e.g. Ni˜no1+2is a rectangular region of Pacific bounded by 0°S-10°S, 90°W-80°W). The SOI measures the strength of the SO, hence also can be used to measure the ENSO strength. Although these ENSO indices do not always coincide with each other, they generally agree on strong events. Based on these indices, 1973- 74, 1975-76 and 1988-89 are considered to be strong La Ni˜nayears and 1957-58, 1965-66, 1972-73, 1982-83 and 1987-88 are considered to be strong El Ni˜noyears. The ENSO phenomenon strongly affects the tropical region and South America due to their geographical proximity. During the El-Nino periods, due to warm tem- peratures over the Pacific coast, the western coast of South America receives more rainfall than usual and the combination of warm temperatures and high precipita- tions alters the ecology system of this region as well as the composition of fishery/sea bird population in the Pacific Ocean (Jord´an,1991). Also, since the ENSO desta- bilizes the energy flow between the equator and the polar regions, its effect reaches far beyond the tropical region and causes teleconnections around the world (Battisti and Sarachick 1995). It moves tracks of tropical cyclones of the Atlantic Ocean and affects various regions including Southern Africa, North America, Australia, Europe and even Antarctica. In this study, we focus on the effect of the ENSO on hydrologic droughts across North America. The association has been identified in various literature. Ropelewski and Halpert (1989) and Halpert and Ropelewski (1992), among others, studied the connection between the ENSO and the temperature/precipitation of the United States

3a layer in ocean where temperature changes rapidly 40

and found that above-normal temperature/precipitation were associated with the ENSO events. In Cole and Cook (1998), significant ENSO-drought correlation was observed over the Southwestern United States. Rajagopalan et al. (2000) used Ni˜no3index/global SST and Palmer Drought Severity Index (PDSI) to study teleconnection between the winter ENSO and the drought condition across the North America region over a period from 1895 to 1995. In the study, the authors divided the entire period into three epochs (1895∼1928 epoch, 1929∼1962 epoch and 1963∼1995 epoch) and within each epoch, investigated the spatial structure of correlation between Ni˜no3index/global SST and PDSI through analyzing the correlation map and the heterogenous correlation map obtained from SVD of the sample cross-region covariance matrix. Their findings showed that the spatial pattern of teleconnection changed over time with Texas region exhibiting the strongest relationship with Ni˜no3index during the first epoch while in the third epoch, the Western United States and Southern California showing stronger teleconnection than Texas region. Given this result and based on the general understanding that the strength of teleconnection varies over time, we investigate the temporally-varying spatial depen- dence pattern between SST of the tropical Pacific Ocean and PDSI of North America. Following is a short description of the datasets used for the analysis in Section 3.4.

• Sea Surface Temperatures (SST) are obtained from the Hadley Center Global Sea Ice and Sea Surface Temperature (HadISST) dataset4. HadISST tempera- tures were reconstructed using a two stage reduced-space optimal interpolation procedure, followed by superposition of quality-improved gridded observations onto the reconstructions to restore local detail (Rayner et al., 2003). The tem- perature dataset is on 1°× 1° grid over the global ocean, but we only use SST over Ni˜no3region (5°S-5°N and 150°-90°W) over a period from 1870 to 1990 for 4Data : Hadley Centre for Climate Prediction and Research/Met Office/Ministry of Defence/United Kingdom, 2000: Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST). Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, Boulder, CO. [Available online at http://rda.ucar.edu/datasets/ds277.3/ .] Accessed 5 May 2014 41

this study. Following Rajagoplan et al. (2000), we use the winter SST, defined as an average SST over October-March, which results in 600 points per each year.

• Palmer Drought Severity Index (PDSI) is obtained from the North American Summer PDSI Reconstructions dataset5. The values of PDSI were reconstructed on 2°× 3° grid points over the North America continent from tree-ring chronolo- gies and we use 176 grid points inside a box covering the United States (30°N- 50°N and 122.5°W-70°W) from 1870 to 1990. PDSI mainly reflects long-term droughts and PDSI value 0 means the normal condition while negative values mean the drought condition (PDSI < −4 : extremely dry) and positive values mean the moist condition (PDSI > 4 : extremely moist).

60 ° N

30 ° N

0 ° ° W 150 ° W 60 ° 120° W 90 W

Figure 3.1. Nino3 region and North America region

5Data : Cook, E. R., et al. 2004. North American Summer PDSI Reconstructions. IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series # 2004-045. NOAA/NGDC Paleoclimatology Program, Boulder CO, USA. 42

3.3 Model for Time-varying Teleconnection

Teleconnection can be considered as a type of the spatial dependence between two regions. Although there are many spatial covariance functions developed to date in the statistical literature, not many of them are applicable to modeling the long- distance dependence, which is a crucial element in the teleconnection phenomenon because most of covariance functions assume that the spatial dependence between two locations decreases as the distance between the two locations increases. Also, teleconnections usually involve vast geographical regions. In ENSO-North America teleconnection, for example, the size of Ni˜no3region alone is greater than hundred thousand square kilometers. Therefore it is more reasonable to expect the dependence structure varies over the space and hence non-stationary covariance model (see Section 1.2.3) is required. In Cheng (2013), the space deformation method (Samp- son and Guttorp, 1992) was employed to model global temperature data aggregated onto a country level. However, in our problem setting, we are interested in two regions only but not other regions between or surrounding the two, the space deformation technique is not applicable here. Nonstationary covariance function based on convo- lution process (1.10) is also hard to be applied for modeling teleconnection. Negative dependence commonly happens in the teleconnection phenomenon but since the kernel function K(·) is non-negative, for C(s, u) in (1.10) to be negative, stationary covari- ance function Cθ(v)(s − u) has to be able to model negative covariance. Therefore it reduces to the problem of finding a stationary covariance function for long-distance dependence which is not readily available. Also, even if such covariance function ex- ists, K(s − v)K(u − v) quickly becomes near-zero if v is far away from either one of s or u. But since two teleconnected regions are far away from each other, it is not clear how to obtain large value of C(s, u) by integrating K(s−v)K(u−v)Cθ(v)(s, u) with respect to v. Another important aspect to consider in developing a statistical model for tele- connection is that the strength and the pattern of teleconnection change over time 43 when the period of observation is long enough. The two approaches that are based on either bivariate time series or empirical orthogonal functions reviewed in Section 3.1 lack the ability to model teleconnection that varies depending on the observed time t. In this chapter, we propose a temporally-varying teleconnection model. Spatial random effects (SREs) are used to capture the spatial variability of teleconnection. The covariances of SREs are then represented as a function of time so that the spatial pattern of teleconnection can change over time as well. The parameters are estimated by minimizing the distance between the empirical cross-region covariance and the corss-region covariance under the model. The significance of teleconnection can be expressed using an empirical confidence interval of the model estimates.

3.3.1 Model

Let DA and DB denote two teleconnected regions and suppose we have nA observa- tions {ZA,t(si); si ∈ DA}i=1,...,nA from region DA and nB observations {ZB,t(uj); uj ∈

DB}j=1,...,nB from region DB at times t = 1,...,T . Let ZA,t and ZB,t be nA × 1 and nB × 1 vectors of data observed at time t. Analyses of teleconnection have mainly focused on annual means of specific con- secutive months since teleconnection is a low-frequency phenomenon. This produces a large temporal gap between two time points hence we assume that dependence between two data points observed at different time is negligible. If the spatial pattern of teleconnection is constant over time, the series of pairs of observed data, {ZA,t, ZB,t}t=1,...,T , can be considered as “replicates” of size T and the teleconnection analysis can be done through the correlation analysis or empirical orthogonal function driven methods such as SVD, CCA or PCA based on a sample covariance matrix, assuming that each pair {ZA,t, ZB,t} contains the identical infor- mation regarding teleconnection. However, as discussed earlier, evidence shows that 44

ENSO-North America teleconnection varies with time and hence a model that can incorporate the time-varying spatial dependence is needed.

To do so, we represent ZA,t(s) and ZB,t(u) using a SRE model with rA spatial random effects {αi(t)}i=1,...,rA and rB spatial random effects {βj(t)}j=1,...,rB with spa- tial basis functions {fi(s)}i=1,...,rA in region DA and {gj(u)}j=1,...,rB in region DB respectively: r XA ZA,t(s) = αi(t)fi(s)(s ∈ DA), i=1 r (3.1) XB ZB,t(u) = βj(t)gj(u)(u ∈ DB), j=1 where the covariance between i-th spatial random effect in region DA and j-th spatial random effect in region DB is denoted by

σij(t) = Cov(αi(t), βj(t)).

Now, instead of assuming that the covariance is constant over time (i.e. σij(t) = σij) as implicitly assumed in many existing methods, we let the dependence change with time t, and express σij(t) using an expansion of local basis functions:

K X σij(t) = cij,kbk(t), (3.2) k=1 where {bk(t)}k=1,...,K is a set of B-splines with coefficients {cij,k}k=1,...,K . Therefore we have that the covariance between a random process ZA,t observed at s ∈ DA and a random process ZB,t observed at u ∈ DB be represented as following:

r r XA XB Cov(ZA,t(s),ZB,t(u)) = σij(t)fi(s)gj(u) i=1 j=1 r r K XA XB X = cij,kbk(t)fi(s)gj(u). (3.3) i=1 j=1 k=1

Note that when modeling the covariance in (3.2), we only specify that σij(t) is tem- porally smoothed because the smoothing over the spatial domain is taken care of in

(3.1) by spatial basis functions, {fi(·)} and {gj(·)}. This covariance function allows 45 covariances from different regions to change differently with time, giving flexibility to model the interaction between time and space on the covariance .

Now for the observed random vectors, ZA,t and ZB,t, the nA × nB cross-region covariance matrix is expressed as:

ΣAB,t = Cov(ZA,t, ZB,t)       f1(s1) . . . frA (s1) σ11(t) . . . σ1rB (t) g1(u1) . . . g1(unB )             =  ......   ......   ......        f1(snA ) . . . frA (snA ) σrA1(t) . . . σrArB (t) grB (u1) . . . grB (unB )

0 = FM tG , (3.4)

where F and G are basis matrices, and M t is a cross-covariance matrix of two sets of random effects. Since our interest lies in the between-region dependence (i.e. teleconnection), we will focus on estimating the cross-region covariance matrix ΣAB,t instead of the whole variance matrix

Σt = Var([ZA,t, ZB,t])   ΣAA,t ΣAB,t =   , (3.5) 0 ΣAB,t ΣBB,t and hence the cross-covaraince matrix of random effects, M t, does not need any constraint while, on the other hand, when the whole variance matrix (3.5) is to be estimated, the variance matrix of random effects needs to satisfy the positive definite condition (see Section 3.4.2).

3.3.2 Estimation

Given the choice of basis functions {fi(s)}, {gj(u)} and {bk(t)}, the unknown coefficients {cij,k} in (3.3) remain to be estimated and we propose to estimate the parameters by minimizing the sum of squared Frobenius norms between the empirical 46

cross-region covariance matrix, SAB,t, and the cross-region covariance matrix under the proposed model, FM tG, over the observation period t = 1,...,T :

T X 0 2 ||SAB,t − FM tG ||F , (3.6) t=1 where the Frobenius norm of a n × m matrix M is defined as:

0 1/2 ||M||F = (tr(MM )) .

It is useful to note that the Frobenius norm, which is a norm of matrix M, can be expressed as a l2-vector norm || · ||2 of a vectorized version of M:

0 vec(M) = (M 1,1, M 1,2,..., M 1,m, M 2,1,..., M 2,m,...,..., M n,1,..., M n,m) ,

where M i,j is (i, j) element of matrix M, because

n m X X 2 1/2 ||M||F = ( M i,j) i=1 j=1 = (vec(M)0vec(M))1/2

= ||vec(M)||2.

Hence we have

T T X 0 2 X 0 2 ||SAB,t − FM tG ||F = ||vec(SAB,t) − vec(FM tG )|| t=1 t=1

T rA rB X X X 0 2 = ||vec(SAB,t) − vec(f igj)σij(t)|| t=1 i=1 j=1

T rA rB K X X X 0 X 2 = ||vec(SAB,t) − vec(f igj) cij,kbk(t)|| t=1 i=1 j=1 k=1 T X 2 = ||vec(SAB,t) − HCb(t)|| , (3.7) t=1 47

where f i is a nA ×1 vector of spatial basis function fi(·) evaluated at s1,..., snA ∈ DA

and gj is a nB ×1 vector of spatial basis function gj(·) evaluated at u1,..., unB ∈ DB, b(t) is a K × 1 vector of B-spline functions evaluated at time t:

0 f i = (fi(s1), . . . , fi(snA )) (i = 1, . . . , rA),

0 gj = (gj(u1), . . . , gj(unB )) (j = 1, . . . , rB),

0 b(t) = (b1(t), . . . , bK (t)) ,

H is a nAnB × rArB matrix and C is a rArB × K matrix with

H = [vec(f g0 ),..., vec(f g0 ),..., vec(f g0 ),..., vec(f g0 )], 1 1 1 rB rA 1 rA rB   c11,1 . . . c11,K      c12,1 . . . c12,K  C =   .    ......   

crArB ,1 . . . crArB ,K

Let Q(C) be the objective function in (3.7):

T X 2 Q(C) = ||vec(SAB,t) − HCb(t)|| . t=1 Then, taking derivative with respect to C, we have

T T ∂Q(C) X X = 2 H0HCb(t)b(t)0 − 2 H0vec(S )b(t)0 ∂C AB,t t=1 t=1 which leads to an equation

T T 0 X 0 0 X 0 (H H)C( b(t)b(t) ) = H vec(SAB,t)b(t) . t=1 t=1

Hence, the optimal value of the matrix, Cb , is uniquely determined as

T T 0 −1 0 X 0 X 0 −1 Cb = (H H) H vec(SAB,t)b (t)( b(t)b (t)) , (3.8) t=1 t=1

0 PT 0 provided that the inverse of matrices H H and t=1 b(t)b (t) exist. It is easy to check that H0H is full rank if basis matrices F and G are full rank and since the spatial 48

basis matrix is often assumed to be full rank in statistical models using spatial basis expansion techniques, the existence of the inverse matrix (H0H)−1 is not a strict PT 0 0 assumption. Also, note that t=1 b(t)b (t) = B B, where B is a T × K matrix

B = {bk(t)}k=1,...,K;t=1,...,T . Since B-spline functions are compactly supported with non-zero values for a short span of time, B is full rank in general given that T is long enough. Once the estimate of parameter matrix, Cb , is obtained, the time-varying covari- ance of random effects can be obtained by plugging the estimated coefficients bcij,k in (3.2):

K X σbij(t) = bcij,kBk(t). k=1

3.3.3 Empirical confidence interval for covariance of spatial random effect

Teleconnection is considered to be responsible for many extreme weather events and whether teleconnection is well-reproduced or not is used to measure the model fidelity of an ocean-atmospheric coupled model simulation (Coats et al., 2013). There- fore, in the study of teleconnection, it is often of interest whether the observed de- pendence is significantly strong or not and the identification of significantly strong teleconnection signal can help scientists in prioritizing areas/periods important in teleconnection research.

In the proposed framework, the strength of dependence is characterized by σij(t), hence the confidence interval of σij(t) can be used as a guidance in determining whether the observed teleconnection signal between certain areas at certain time is significant or not.

First let bcij be a K ×1 vector of estimated coefficients associated with i-th random effect of region DA (i.e. αi) and j-th random effect of region DB (i.e. βj) in (3.3):

0 bcij = (bcij,1,..., bcij,K ) . 49

Then

0 bcij = eijCb T T X 0 0 −1 0 0 X 0 −1 = eij(H H) H vec(SAB,t)(bt( btbt) ), (3.9) t=1 t=1 where eij is a rArB ×1 vector of zeros except single one at a location corresponding to 0 0 −1 0 index (ij). Since (eij(H H) H )vec(SAB,t) is an univariate random variable, under temporal independent assumption, a K × K variance matrix of bcij is: T " T # X 0 0 −1 0 0 X 0 −1 Var(bcij) = Var (eij(H H) H )(vec(SAB,t))(bt( btbt) ) t=1 t=1 T X 0 0 −1 0 0 −1 = (eij(H H) H )Var(vec(SAB,t))(H(H H) eij) t=1 T T 0 X 0 −1 0 0 X 0 −1 (bt( btbt) ) (bt( btbt) ). (3.10) t=1 t=1

And the variance of σbij(t) is now simply:

0 Var(σbij(t)) = b (t)Var(bcij)b(t). (3.11)

Hence, if we use the empirical rule, the approximate confidence interval can be ob- tained as: q σbij(t) ± 2 Var(σbij(t)) (3.12)

Note that each σij(t) represents the dependence between subregion of DA with non- zero value of basis function fi(·) and subregion of DB with non-zero value of basis

function gj(·). Hence, if teleconnection between a particular subregion of DA, say

subregion associated with the spatial random effect αi, and the whole region DB is of interest, one might want to use a mean of σ ,..., σ : bi1 birB r 1 XB σ (t) = σ (t), (3.13) bi· r bij B j=1 instead of σbij alone. The variance of σbi· can be obtained in a similar way as in (3.12) using different indicator vector ei·, a rArB × 1 vector of zeros except rB ones at locations corresponding to indices (i1),..., (irB) instead of eij in (3.9). 50

3.4 Data Analysis

3.4.1 Teleconnection analysis

As described in Section 3.2, the sea surface temperature of the tropical Pacific Ocean exhibits the warm west-cold east trend. Also, regions that are close to the equator (i.e. low latitude regions) generally receive more solar insolation energy than high latitude regions. Hence we first fit a mean trend model to SST and PDSI separately using longitude and latitude as explanatory variables. Residuals from the mean trend model are further standardized using the kernel estimate of residual variance6 to minimize the effect of temporally-varying variance on the analysis of the between-region covariance. Figure 3.2 shows the means of kernel estimate of the residual variance in each region before the standardization (blue lines) and after (red lines) the standardization. It can be seen that the temporal variation is stabilized around 1 after the standardization. Now we denote the standardized residual as

{ZA,t(ui)}i=1,...,179 for PDSI or North America and {ZB,t(vi)}i=1,...,600 for SST of the Ni˜no3region (t = 1871,..., 1990).

In this data analysis, we use rA = 12 spatial basis functions for North America and

rB = 6 spatial basis functions for Ni˜no3region. The number of spatial basis functions is much larger for North America than for Ni˜no3region because the terrain of North America is much more complex than that of the Pacific Ocean. The centers of spatial basis functions are placed on a regular grid as in Figure 3.3. Following Cressie and Johannesson (2008) and Katzfuss and Cressie (2010), we use the bisquare function defined as:  ||u − c||2 if ||u − c|| < d f(u; c) = (3.14)  0 if ||u − c|| > d

6The kernel estimate of residual variance at each spatial location is computed as: PT 2 PT l=1 ZA,t(si)K((t − l)/h)/ l=1 K((t − l)/h) for Epanechnikov kernel function K(·) with band- width h = 3 51

4

3

2

1

0 0 20 40 60 80 100 120

2

1.5

1

0.5

0 0 20 40 60 80 100 120

Figure 3.2. Temporal variation of residual variance (top: Ni˜no3region; bottom: North America) before standardization (blue lines) and after the standardization (red lines)

as spatial basis function for both North America and Ni˜no3region. In (3.14), c is the center of the bisquare function and d is chosen to be 1.5 times the largest distance between centers of basis functions within each region. To capture the time-varying teleconnection signal between two regions, it is im- portant to have enough number of B-splines in (3.2). On the other hand, as it is with other B-spline applications, too many B-splines can make the functional estimate to be too sensitive to the noise in the data and prone to capture spurious signals. In this analysis, we use 30 B-splines (K = 30) so that the length of positive part of B-spline functions is short enough to capture the effect of alternating phases of the ENSO, which is generally believed to be 2 ∼ 7 years (Battisti and Sarachick, 1995; Sarachick and Cane, 2010). For the empirical cross-region covariance matrix in (3.6), we use a 0 cross product of observed vectors, ZA,tZB,t.

Figure 3.4 shows estimated time-varying covariancesσ ˆij(t) from 1871 to 1990.

Since there are rA = 12 spatial basis functions in North America and rB = 6 spatial 52

50 ° N

40 ° N

30 ° N 120 ° ° W W ° 110° ° W 70 W 100° W 90 W 80

5° N

0 150° ° W 145° W 140° W 135° W 130° W 125° W 120° W 115° W 110° W 105° W 100° W 95° W 90° W

5° S

Figure 3.3. Location of centers of spatial basis functions

basis functions in Ni˜no3region, a total of 72 curves are produced. Overall, the meandering pattern indicates that the strength of teleconnection changes over the observation period. Also, it can be seen that the curves do not exhibit any unified pattern but rather deviate from each other. For example, around early 70s, one group of curves shows a negative deep while other curves have near-zero or only moderately positive values. This implies that the dependence structure changes not only with time but also over the space as well. As mentioned in Section 3.3.3, given the choice of spatial basis functions (3.14), each spatial random effect αi can be associated with subregion of North America (see Figure 3.5). For example, the center of the first spatial basis function of North America is at the southwest part of the United States, near central California and the second spatial basis function is centered at the Pacific Northwest, somewhere close to Seattle. To see spatial variation of teleconnection more clearly, time-varying covariances are plotted separately for each subregion of North America in Figure 3.6 and Figure 3.7. Examing 12 plots shows that the temporal teleconnection pattern changes a lot depending on which subregion of North America is considered, but stays 53

1.5

1

0.5

0

−0.5

−1

−1.5 1890 1920 1950 1980

Figure 3.4. Estimated covariancesσ ˆij(t) from 1871 to 1990

relatively the same over Ni˜no3region. Specifically, the Pacific Northwest (associated with the second random effect, α2) shows mostly negative dependence with Ni˜no3 region in the 20th century, while Southwest subregion (α3) shows mostly positive dependence with the Ni˜no3region for the entire observation period. It can also be seen that the east part of the Southern United States (α9) exhibits mostly positive relationship with Ni˜no3region except 1930-40 and that the eastern part of the Western

United States (α4 and α5) is relatively weakly related with the tropical Pacific Ocean. Important aspect of the proposed method is that it allows the strength of tele- connection to change not only with spatial location but also with time, which can reflect the dynamical feature of the teleconnection phenomenon. For example, Cal- ifornia/Nevada (α1) shows a strong positive teleconnection around 1982/1983 (blue 54

50 ° N

WA 2 12 45 ° 5 11 N MT ND 8 MN OR ID VTNH MI WI MA SD NY RI 40 ° WY CT N IA PA NJ 1 NE NV OH DE UT MDDC 4 IL IN 10 WV CA CO 7 35 ° KS VA N MO KY NC TN AZ OK NM AR SC 30 ° N GA MS AL ° W 120 ° 3 9 70 W 6 ° 115 ° W W TX LA ° 75 110° ° 80 W W 105° ° ° W W 100 W 95° W 90 W 85

Figure 3.5. Estimated covariancesσ ˆij(t) from 1871 to 1990

vertical line represents year 1983) which corresponds to the strongest El Ni˜noepisode in the 20th century. The second strongest El Ni˜noof 1972/1973 is captured (blue

vertical line represents year 1973) in covariance plots of Pacific Northwest (α2). As the Pacific Northwest has a negative relationship and the Southwest has positive rela- tionship with Ni˜no3region, the signal is represented as a deep valley in the Southwest and a high peak in the Pacific Northwest. It should be noted, however, that El Ni˜no(or La Ni˜na)events are not always

reflected in the covariance curves. California (α1), for instance, shows a strong re- sponse to 1982/1983 El Ni˜nobut only weakly responded to 1972/1973 El Ni˜no.On the other hand, Pacific Northwest (α2) is associated with Ni˜no3region more stronlgy in 1972/1973 than in 1982/1983. This indicates that the ENSO teleconnection is a non-static phenomenon rather than a simple relationship with fixed pattern. To measure the significance of the observed teleconnection signal, we calculate the empirical confidence interval of time-varying covariances as discussed in Section 3.3.3.

We calculate the empirical confidence interval of the mean covariance σbi·(t) in (3.13) 55

NA subregion = 1 NA subregion = 2 NA subregion = 3

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

NA subregion = 4 NA subregion = 5 NA subregion = 6

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

Figure 3.6. Estimated covariance σbij(t) for spatial random effects α1, . . . , α6

instead of individual covariance σbij(t) since covariance curves behave very similarly given a subregion of North America. In (3.12), we need variances of vectorized em- 56

NA subregion = 7 NA subregion = 8 NA subregion = 9

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

NA subregion = 10 NA subregion = 11 NA subregion = 12

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

Figure 3.7. Estimated covariance σbij(t) for spatial random effects α7, . . . , α12

pirical variance matrix SAB,t. Since we are using the cross product of ZA,t and ZB,t in place of SAB,t,

0 Var(vec(SAB,t)) = Var(vec(ZA,tZB,t))   ZA,tZB,t(u1)     = Var  ...    ZA,tZB,t(unB )   ΣA,tCov(ZB,t(u1),ZB,t(u1)) ... ΣA,tCov(ZB,t(u1),ZB,t(un))     =  ......    ΣA,tCov(ZB,t(un),ZB,t(u1)) ... ΣA,tCov(ZB,t(un),ZB,t(un))

= Var(ZB,t) ⊗ Var(ZA,t), 57

where ⊗ is a kronecker product. As both Var(ZA,t) and Var(ZB,t) are not available, we use locally weighted variance matrices

1990 1990 X 0 X SA,t = K((t − l)/h)ZA,tZA,t/ K((t − l)/h), l=1871 l=1871 (3.15) 1990 1990 X 0 X SB,t = K((t − l)/h)ZB,tZB,t/ K((t − l)/h), l=1871 l=1871 for Epanechnikov kernel function K(·) with bandwidth h = 3. The empirical confidence intervals of mean covariances are shown in Figure 3.8 and Figure 3.9. From the figures, it can be seen that the Pacific Northwest (α2) has two relatively significant negative relationships to ENSO in early 10’s and early

70’s while the west part of the Southern United States including Arizona state (α3) shows several significant positive relationship to ENSO during the 20th century. The significant ENSO teleconnection shown in Texas subregion (α6) around 1910 in Figure 3.9 confirms the observation in Rajagopalan et al. (2000) that this region exhibited the strongest teleconnection signal during the first epoch (1895∼1928). Note that for any given time t, if there is no teleconnection between two regions, the cross-region covariance matrix ΣAB,t would be close to a zero matrix. Hence, as a global measure of teleconnection strength, we use the squared Frobenius norm of

2 estimated cross-region covariance matrix ||ΣbAB,t||F . Figure 3.10 shows several strong teleconnections including 1982/1983 and others in the late 19th century.

3.4.2 Spatial prediction

In a task of predicting an unknown value at a new location, observed data from sites nearby that location play an important role because spatial dependence among observations is believed to be stronger when they are close to each other. However, if there exists a significant relationship between two regions as we have seen in Sec- tion 3.4.1, incorporating information from a teleconnected region may provide useful information and improve prediction performance. 58

NA subregion = 1 NA subregion = 2 NA subregion = 3

1.5 1.5 1.5

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

−1.5 −1.5 −1.5

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

NA subregion = 4 NA subregion = 5 NA subregion = 6

1.5 1.5 1.5

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

−1.5 −1.5 −1.5

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

Figure 3.8. Mean and empirical confidence interval of covariances asso- ciated with α1, . . . , α4

To use data from the other region, we first estimate the whole variance matrix (3.5) using SRE model where a new set of spatial basis functions is added to the spatial basis functions used for the teleconnection analysis (see Figure 3.11). Then the variance matrix of random effects Kt = Var([α1(t), . . . , αrA (t), β1(t), . . . , βrB (t)]) 2 2 2 and measurement error σt = [σA,t, σB,t] in (1.11) can be estimated by minimizing the Frobenius norm as in Section 3.3.2:

2 0 2 Q(K , σ ) = ||S − FK F − V 2 || , (3.16) t t t t σt F

where S is an empirical variance matrix, V 2 is a block diagonal matrix with the t σt 2 2 first block and the second block being σA,tInA and σB,tInA respectively, and the basis 59

NA subregion = 7 NA subregion = 8 NA subregion = 9

1.5 1.5 1.5

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

−1.5 −1.5 −1.5

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

NA subregion = 10 NA subregion = 11 NA subregion = 12

1.5 1.5 1.5

1 1 1

0.5 0.5 0.5

0 0 0

−0.5 −0.5 −0.5

−1 −1 −1

−1.5 −1.5 −1.5

1890 1920 1950 1980 1890 1920 1950 1980 1890 1920 1950 1980

Figure 3.9. Mean and empirical confidence interval of covariances asso- ciated with α6, α7, α8 and α10

matrix F is a block diagonal matrix of basis matrices from the two region constructed through basis functions in Figure 3.11.

However, unlike in the case of estimating cross covariance matrix M t in (3.4), vari-

ance matrix Kt has to satisfy the positive definite condition. Cressie and Johanesson (2008) estimated the parameters using QR decomposition based on a positive definite

empirical variance matrix St obtained from binned data. In this study, the observa-

tions are not dense enough, so we take a different approach and estimate Kt directly from data without the preliminary binning process. 60

x 104 5

4.5

4

3.5

3

2.5

2

1.5

1

0.5

0 1880 1900 1920 1940 1960 1980

Figure 3.10. Squared Frobenius norm of estimated cross-region covari- ance matrix

2 2 Let Kc(σt ) be the minimizer of (3.16) given measurement error σt . Then using derivative formulas of matrix trace, it can be shown that the non-negative definite matrix Kt is obtained as (Tzeng and Huang, 2014):

2 0 −1/2 + 0 0 −1/2 Kct(σt ) = (F F ) U tDt U t(F F ) , (3.17)

+ where U t and Dt are the eigenvector matrix and the positive part of the eigenvalue 2 matrix of Kft(σt ) defined as:

2 0 −1 0 0 −1 K (σ ) = (F F ) F (S − V 2 )F (F F ) . (3.18) ft t t σt 61

50 ° N

40 ° N

30 ° N

120 ° ° W W 70 ° ° 110 ° W W 100° W 90 W 80

5° N

0 °150 ° W 145° W 140° W 135° W 130° W 125° W 120° W 115° W 110° W 105° W 100° W 95° W 90° W

5° S

Figure 3.11. Location of centers of spatial basis functions for spatial prediction; 20 (14) new basis functions are added to 12 (6) basis functions of North America (Ni˜no3region) used in teleconnection analysis (rA = 32 and rB = 20)

Now, since

2 2 (Kct, σt ) = arg minQ(Kt, σt ) b 2 Kt,σt 2 2 = arg minQ(Kct(σt ), σt ), (3.19) 2 σt

2 2 we plug (3.17) in (3.19) and optimize over σt only, instead of (Kt, σt ) simultaneously. To evaluate the predictive performance, at each time t, each site is dropped in turn, predicted using data both from the same region and teleconnected region, and mean square prediction error (MSPE) is calculated (see MSPE row in Table 3.1). As a reference, the value of omitted site is predicted using data from the same region only and its MSPE is shown next column of the MSPE calculated from a prediction model that uses two regions’ data. Table 3.1 shows that using teleconnection information does not improve MSPE. Logarithm score (LogS) calculated as below : 62

· Logarithmic Score (LogS)

n  !2 1 X 1 1 Z(i) − Zb(i) LogS = log(2πσ2 ) + , n 2 bp,(i) 2 σ2  i=1 bp,(i)

2 where σbp,(i) is a prediction variance of Zb(i) also shows that there is little difference between the prediction models (see LogS row in Table 3.1). Theoretically, prediction error of model that uses the data from teleconnected region should be smaller than model that does not because

Var(ZA(u0)|ZA, ZB)  −1   2 0 0 ΣAA ΣAB σA0 = σ0 − [σA0, σB0]     0 ΣAB ΣBB σB0

2 0 −1 = σ0 − σA0ΣAAσA0

0 −1 0 −1 −1 −1 − (σA0ΣAAΣAB − σB0)(ΣBB − ΣBAΣAAΣAB) (ΣABΣAAσA0 − σB0) (3.20)

−1 ≤ Var(ZA(u0)|ZA)(∵ ΣBB − ΣBAΣAAΣAB 0)

However, this does not necessarily mean that prediction performance improves in practice because as indicated in (3.20), the reduction depends on the conditional

−1 0 −1 0 variance (ΣBB − ΣBAΣAAΣAB) and conditional covariance (σA0ΣAAΣAB − σB0) and hence, if the conditional covariance is not large enough or the conditional variance is small, the improvement can be very small. This can cause so-called “masking effect” (Diggle and Ribeiro, 2007 p.141; also called “screening effect”, see Schabenberger and Gotway, 2005 p.232), where given observations of neighboring sites, observations from sites whose distance to the target location is further than that of neighboring sites do not have much influence in prediction. Also it should be noted that as the spatial domain expands, the modeling complexity and uncertainty increases as well, making the prediction task even more difficult and complicated. 63

Table 3.1. Mean of mean squared prediction error (MSPE) and logarithm score (LogS) of prediction based 1) Two regions : uses both North America data and Ni˜no3region data; and 2) One region : uses data from the same region as the region of the dropped-site

North America Ni˜no3region Two regions One region Two regions One region MSPE 0.3028 0.3027 0.0337 0.0337 LogS 0.2682 0.2622 -0.1431 -0.1432

3.5 Conclusion

There is a growing consensus on climate change in recent years (IPCC Assessment Report: Climate Change; 1990, 1995, 2007 and 2013 7). And the rise of sea surface temperature in the last century is one of most salient features that has been shown repeatedly in many studies. Considering the importance of maritime tropics in the global atmospheric circulation, it is crucial to understand how the Pacific Ocean affects different regions in the world. In this chapter, a statistical model for the analysis of the teleconnection phe- nomenon is proposed. This model is flexible in that it can model two distinct features of teleconnection: spatial variability and temporal variability. The spatial variability of teleconnection is modeled through the spatial random effects and by representing the variance of spatial random effects to be a smooth function of time, the telecon- nection pattern is allowed to change by time. The North America hydrological drought index and Ni˜no3sea surface temperature data from 1870 to 1990 are analyzed using the proposed model. The results discussed in Section 3.4.1 conform to the findings in climate literature on teleconnection be- tween North America and the Pacific Ocean. Spatial prediction was conducted using data from teleconnected region but the reduction in the prediction error was only

7http://www.ipcc.ch/publications_and_data/publications_and_data_reports.shtml 64 negligible which begs a question on how much a global prediction model can improve the predictive performance over local prediction model. REFERENCES 65

REFERENCES

[1] Apanasovich, T. V. and Genton, M. G. (2010), Cross-covariance functions for multivariate random fields based on latent dimensions, Biometrika, v. 97, p. 15-30. [2] Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2009), Gaussian pre- dictive process models for large spatial data sets, Journal of the Royal Statistical Society : Series B, v. 70, p. 825-848. [3] Barnett, T. P., Dumenil, L., Schlese, U., Roeckner, E. and Latif, M. (1991) The Asian snow cover-monsoon-ENSO connection in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Teleconnections : Linking worldwide climate anomalies, Cambridge University Press, p. 191-226. [4] Barry, R. P. and Ver Hoef, J. M. (1996), Blackbox kriging: spatial prediction without specifying variogram models, Jornal of Agricultural, Biological, and En- vironmental Statistics, v. 1, p. 297-322. [5] Battisti, D. S. and Sarachick, E. S. (1995), Understanding and predicting ENSO, Review of geophysics, v. 33, p. 1367-1376. [6] Bjerknes, J. (1969), Atmospheric teleconnections from the equatorial Pacific, Monthly Weather Review, v. 97, p. 163-172. [7] Bornn, L., Shaddick, G. and Zidek J. V. (2012), Modeling nonstationary pro- cesses through dimension expansion, Journal of American Statistical Association, v. 107, p. 281-289. [8] de Boor, C. (2001), A practical guide to splines, New York: Springer. [9] Bretherton, C. S., Smith, C and Wallace, J. M. (1992), An intercomparison of methods for finding coupled patterns in climate data, Journal of Climate, v. 5, p. 541-560. [10] Brown, B. G. and Katz, R. W. (1991), Use of statistical methods in the search for teleconnections: past, present, and future in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Teleconnections : Linking worldwide climate anomalies, Cambridge University Press, p. 371-400. [11] Cane, M. A. (1991), Forecasting El Ni˜nowith a geophysical model in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Teleconnections : Linking worldwide climate anomalies, Cambridge University Press, p. 345-370. [12] Cheng, L. (2013), Non-parametric spatial models, Doctoral Dissertation, Purdue University, Department of Statistics. 66

[13] Choi, I., Li, B. and Wang, X. (2013), Nonparametric estimation of spatial and space-time covariance function, Journal of Agricultural, Biological, and Environ- mental Statistics, v. 18, p. 611-630. [14] Coats, S., Smerdon, J. E., Cook, B. I. and Seager, R. (2013), Stationarity of the tropical pacific teleconnection to North America in CMIP5/PMIP3 model simulations, Geophysical Research Letters, v. 40, p. 4927-4932. [15] Cole, J. E. and Cook, E. R. (1998), The changing relationship between ENSO variability and moisture balance in the continental United States, Geophysical Research Letters, v. 25, p. 4529-4532. [16] Cressie, N. (1985), Fitting variogram models by weighted least squares, Mathe- matical Geology, v. 17, p. 563-586. [17] Cressie, N. (1993), Statistics for spatial data (2nd), New York: Wiley. [18] Cressie, N. and Huang, H. C. (1999), Classes of nonseparable, spatio-temporal stationary covariance functions, Journal of American Statistical Association, v. 94, p. 1330-1340. [19] Cressie, N. and Johannesson, G. (2008), Fixed rank kriging for very large spatial data sets, Journal of the Royal Statistical Society : Series B, v. 70, p. 209-226. [20] Diggle, P. and Ribeiro, P. J. (2007), Model-based , New York: Springer. [21] Feller, W. (1996), An introduction to probability theory and its applications, New York: Wiley. [22] Fuentes, M., and Smith, R. (2001), A new class of nonstationary models, Tech- nical report, Department of Statistics, North Carolina State University. [23] Genton, M. G. and Gorsich, D. J. (2002), Nonparametric variogram and covari- ogram estimation with fourier-bessel matrices, Computational Statistics & Data Analysis, v. 41, p. 47-57. [24] Genton, M. G. (2007), Separable approximations of space-time covariance ma- trices, Environmetrics, v. 18, p. 681-695. [25] Gneiting, T. (2002), Nonseparable stationary covariance functions for space-time data, Journal of American Statistical Association, v. 97, p. 590-600. [26] Gneiting, T., Genton, M. G. and Guttorp, P. (2007), Geostatistical space-time models, stationarity, separability, and full symmetry in Finkenstadt, B., Held, L. and Isham, V. (eds.), Statistical methods for spatio-temporal systems, Chapman and Hall/CRC, Boca Raton, p. 151-175. [27] Gray, W. M. and Sheaffer, J. D. (1991), El Ni˜noand QBO influences on tropical cyclone activity in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Telecon- nections : Linking worldwide climate anomalies, Cambridge University Press, p. 257-284. [28] Guan, Y., Sherman, M., and Calvin, J. A. (2004), A nonparametric test for spa- tial isotropy using subsampling, Journal of the American Statistical Association, v. 99, p. 810-821. 67

[29] Hall, P., Fisher, N. I. and Hoffmann, B. (1994), On the nonparametric estimation of covariance function, Annals of Statistics, v. 22, p. 2115-2134. [30] Halpert, Michael S., Chester F. Ropelewski (1992), Surface temperature patterns associated with the Southern Oscillation, Journal of Climate, v. 5, p. 577-593. [31] Haslett, J. and Raftery, A. E. (1989), Space-time modelling with long-memory dependence: Assessing Ireland’s wind power resource, Applied Statistics, v. 38, p. 1-50. [32] Hering, A. S. and Genton, M. G. (2011), Comparing spatial predictions, Tech- nometrics, v. 53, p. 414-425. [33] Higdon, D., Swall, J. and Kern, J. (1999), Non-stationary spatial modeling in Bernardo J. M. et al. (eds.) Bayesian Statistics 6, Oxford University Press, p. 761-768. [34] Hsiang, S. M., Meng, K. C. and Cane, M. A. (2011), Civil conflicts are associated with the global climate, Nature, v. 476, p. 438-441. [35] Huang, C., Hsing, T. and Cressie, N. (2011), Nonparametric estimation of the variogram and its spectrum, Biometrika, v. 98, p. 775-789. [36] Im, H., Stein. M. L. and Zhu, Z. (2007), Semiparametric estimation of spectral density with irregular observations, Journal of American Statistical Association, v. 102, p. 726-735. [37] Jord´an,R. (1991), Impact of ENSO events on the southeastern Pacific region with special reference to the interaction of fishing and climate variability in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Teleconnections : Linking worldwide climate anomalies, Cambridge University Press, p. 401-430. [38] Journel, A. G. and Huijbregts, C. J. (1978), Mining geostatistics, Academic Press London. [39] Katzfuss, M. and Cressie, N. (2011), Tutorial on Fixed Rank Kriging (FRK) of CO2 data, Technical Report, Department of Statistics, Ohio State University. [40] Kleiber, W. and Nychka, D. (2012), Nonstationary modeling for multivariate spatial processes Journal of Multivariate Analysis, v. 112, p. 76-91. [41] Lau, W. K. M. and Kim, K. (2012), The 2010 Pakistan flood and Russian heat wave: Teleconnection of hydrometeorological extremes, Journal of Hydrometeor, v. 13, p. 392-403. [42] Li, B., Genton, M. G. and Sherman, M. (2007), A nonparametric assessment of properties of space-time covariance functions, Journal of the American Statistical Association, v. 102, p. 736-744. [43] Li, B., Genton, M. G. and Sherman, M. (2008), Testing the covariance structure of multivariate random fields, Biometrika, v. 95, p. 813-829. [44] Li, B. and Smerdon, J. E. (2012), Defining spatial comparison metrics for evalu- ation of paleoclimatic field reconstructions of the Common Era, Environmetrics, v. 23, p. 394-406. 68

[45] Li, Y., Li, J. and Feng, J. (2012), A teleconnection between the reduction of rainfall in Southwest western Australia and North China, Journal of Climate, v. 25, p. 8444-8461. [46] Meyer, M. C. (2008), Inference using shape-restricted regression splines, Annals of Applied Statistics, v. 2, p. 1013-1033. [47] Rajagopalan, B., Cook, E., Lall, U. and Ray, B. K. (2000), Spatiotemporal variability of ENSO and SST teleconnections to summer drought over the United States during the twentieth century, Journal of Climate, v. 13, p. 4244-4255. [48] Rayner, N. A., Parker, D. E., Horton, E. B., Folland, C. K., Alexander, L. V., Rowell, D. P., Kent, E. C. and Kaplan, A. (2003), Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century, Journal of Geophysical Research: Atmospheres, v. 108, no. D14. [49] Reich, B. and Fuentes, M. (2012), Nonparametric Bayesian models for a spatial covariance, Statistical Methodology, v. 9, p. 265-274. [50] Ropelewski, C. F. and Halpert, M. S. (1989), Precipitation patterns associated with the high index phase of the Southern Oscillation, Journal of Climate, v. 2, p. 268-284. [51] Sampson, P. D. and Guttorp, P. (1992), Nonparametric estimation of nonstation- ary spatial covariance structure, Journal of American Statistical Association, v. 87, p. 108-119. [52] Sarachick E. and Cane, M. A. (2010), The El Ni˜no-SouthernOscillation phe- nomenon, Cambridge University Press, London. [53] Schabenberger, O. and Gotway, C. A. (2005), Statistical methods for spatial data analysis, Chapman and Hall. [54] Shapiro, A. and Botha, J. D. (1991), Variogram fitting with a general class of conditionally nonnegative definite functions, Computational Statistics & Data Analysis, v. 11, p. 87-96. [55] Stein, M. L. (1999), Interpolation of spatial data: some theory for kriging, New York: Springer. [56] Schoenberg, I. J. (1938), Metric spaces and completely monotone functions, An- nals of Mathematics, v. 39, p. 811-841. [57] Tribbia, J. J. (1991), The rudimentary theory of atmospheric teleconnections associated with ENSO in Glantz, M. H., Katz, R. W. and Nicholls, N. (eds.), Teleconnections : Linking worldwide climate anomalies, Cambridge University Press, p. 285-308. [58] Tzeng, S. and Huang, H. (2014), Non-stationary multivariate spatial covariance estimation via low-rank regularization, Statistica Sinica [59] Walker, G.T. (1923), Correlation in seasonal variations of weather, VIII. A pre- liminary study of world weather. Memoirs of the India Meteorological Depart- ment, v. 24, p. 75-131. 69

[60] Zastavnyi, V. P. and Porcu, E. (2011), Characterization theorems for the Gneit- ing class of space-time covariances, Bernoulli, v. 17, p. 456-465. [61] Zheng, Y., Zhu, J. and Roy, A. (2010), Nonparametric Bayesian inference for the spectral density function of a random field, Biometrika, v. 97, p. 238-245. VITA 70

VITA

InKyung Choi was born in Seoul, South Korea. She completed her B.A. at Korea University with a major in Statistics and a minor in Economics, and obtained a M.A. in Applied Statistics at the same institution under the supervision of Professor Myoungshic Jhun. In 2009, she moved to the United States and started the Ph.D. program in the Department of Statistics at Purdue University.