Copula-Based Analysis of Dependent Data with Censoring and Zero Inflation

by Fuyuan Li

B.S. in Telecommunication Engineering, May 2012, Beijing University of Technology
M.S., May 2014, The George Washington University

A Dissertation submitted to

The Faculty of The Columbian College of Arts and Sciences of The George Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

January 10, 2019

Dissertation directed by

Huixia J. Wang, Professor of Statistics

The Columbian College of Arts and Sciences of The George Washington University certifies that Fuyuan Li has passed the Final Examination for the degree of Doctor of Philosophy as of December 7, 2018. This is the final and approved form of the dissertation.

Copula-Based Analysis of Dependent Data with Censoring and Zero Inflation

Fuyuan Li

Dissertation Research Committee: Huixia J. Wang, Professor of Statistics, Dissertation Director

Tapan K. Nayak, Professor of Statistics, Committee Member

Reza Modarres, Professor of Statistics, Committee Member

© Copyright 2019 by Fuyuan Li. All rights reserved.

Acknowledgments

This work would not have been possible without the financial support of the National Science Foundation grant DMS-1525692, and the King Abdullah University of Science and Technology Office of Sponsored Research award OSR-2015-CRG4-2582. I am grateful to all of those with whom I have had the pleasure to work during this and other related projects. Each of the members of my Dissertation Committee has provided me extensive personal and professional guidance and taught me a great deal about both scientific research and life in general. I would especially like to thank Dr. Huixia Judy Wang, the chairman of my committee. As my teacher and mentor, she has taught me more than I could ever give her credit for here. She has shown me, by her example, what a good scientist (and person) should be. I am also indebted to Dr. Yanlin Tang, who has been supportive of my career goals and who worked actively to provide me with the protected academic time to pursue those goals. The most important support in the pursuit of this project came from my family. I would like to thank my parents, whose love and guidance are with me in whatever I pursue. They are my ultimate role models.

Abstract

Copula-Based Analysis of Dependent Data with Censoring and Zero Inflation

The analysis of data with detection limits is challenging due to the high-dimensional integral involved in the likelihood. To address this computational challenge, various methods have been developed, but most of them rely on restrictive parametric distributional assumptions. In the first project, we propose a semiparametric method for analyzing censored time series, where the temporal dependence is captured by a parametric copula while the marginal distributions are estimated nonparametrically. Utilizing the properties of copula modeling, we develop a new copula-based sequential sampling algorithm, which provides a convenient way to calculate the censored likelihood. Even without full parametric distributional assumptions, the proposed method still allows us to easily compute the conditional quantiles of the censored response at a future time point, and thus construct both point and interval predictions. We establish the asymptotic properties of the proposed pseudo maximum likelihood estimator, and demonstrate through simulation and the analysis of a water quality dataset that the proposed method is more flexible and leads to more accurate predictions than Gaussian-based methods for non-normal data. In the second project, we focus on the analysis of multi-site precipitation data that are zero-inflated. We consider an alternative three-part copula-based model to analyze precipitation at multiple locations, where copula functions are used to capture the dependence among locations, and the marginal distribution is characterized through the first two parts of the model.

Table of Contents

Acknowledgments
Abstract
List of Figures
List of Tables
1 Introduction
  1.1 Copula
  1.2 Verification of the Markov Property
  1.3 Organization
2 Copula-Based Semiparametric Estimation for Markov Models with Censoring
  2.1 Introduction
  2.2 Proposed Method
    2.2.1 Copula-Based Markov Model
    2.2.2 Copula-Based Semiparametric Estimator
    2.2.3 Computation
    2.2.4 Estimation of Conditional Quantiles
    2.2.5 Selection of Copula Functions
    2.2.6 Selection of Copula Based on the Goodness-of-Fit Test
  2.3 Simulation Study
    2.3.1 Estimation of the Copula Parameter
    2.3.2 Investigation of Clayton Copula
    2.3.3 Estimation of Conditional Quantiles
    2.3.4 Selection of Copulas
    2.3.5 Selection of Copula Based on the Goodness-of-Fit Test
  2.4 Large Sample Properties of the Estimator
  2.5 Technical Proofs
    2.5.1 Lemma 1
    2.5.2 Proof of Theorem 1 (Consistency of $\hat\theta$)
    2.5.3 Proof of Theorem 1 (Asymptotic Normality)
  2.6 Analysis of a Water Quality Data
  2.7 Conclusion
3 Copula-Based Analysis of Multisite Daily Precipitation
  3.1 Introduction
  3.2 Proposed Method
    3.2.1 Notation
    3.2.2 Three-Part Model
    3.2.3 Parameter Estimation
    3.2.4 Prediction at New Time for Existing Locations
    3.2.5 Interpolation at New Locations
  3.3 Simulation
    3.3.1 Simulation Design
    3.3.2 Estimation of Matérn Parameters
    3.3.3 Prediction at New Time for Existing Locations
    3.3.4 Interpolation at New Locations
  3.4 Analysis of Chicago Precipitation Reanalysis Data
    3.4.1 Preliminary Analysis of Chicago Precipitation Data
    3.4.2 Prediction at New Time for Existing Locations
    3.4.3 Interpolation at New Locations
  3.5 Conclusion
4 Conclusion and Discussion
  4.1 Concluding Remarks
  4.2 Limitations and Future Works
Bibliography

A Copula-Based Semiparametric Estimation for Markov Models with Censoring

List of Figures

2.1 Boxplots of the omniscient estimator and the CopC* estimator based on the true marginal distribution $G^*(\cdot)$, across different censoring proportions in Case 3 with Clayton($\theta = 2$) copula and $t_3$ marginals. Omni: the omniscient estimator based on the nonparametric estimator $\hat G_n$.
2.2 Violin plots of the true and estimated quantile $Q_q(Y_{n+1} \mid \mathcal{I}_n)$ for $n = 2000$ at $q = 0.5$ and $0.9$ across 500 simulations in Cases 1–3 with 40% censoring and $\theta_0$ corresponding to $\tau = 0.3$. Omni, CopC and Naive: the copula-based estimations assuming the correct copula function and using the omniscient, proposed and naive estimators of $\theta_0$, respectively; CopC2: the counterpart of CopC with selected copula; GIM: the Gaussian-based imputation method from Park et al. (2007).
2.3 (a) Observed time series of dissolved ammonia $Y_t$ in Middle Susquehanna from 1988 to 2014, where $d = 0.02$ is the detection limit; (b) the Q-Q plot of log-transformed ammonia above $\log(d)$; (c) the estimated conditional quantiles of $Y_{n+1}^*$ (curve with solid circles) and the 95% pointwise confidence band from the proposed method, and the estimated conditional quantiles of $Y_{n+1}^*$ from the Gaussian-based imputation method of Park et al. (2007) (curve with open circles).
2.4 Scaled estimated conditional probabilities from the proposed copula method (CopC) and the Gaussian-based imputation method (GIM) from Park et al. (2007) for the cross-validation study.

3.1 Boxplots of $\hat\lambda$ for the JF and JC methods
3.2 Computing time for the JF and JC methods as the number of locations increases.
3.3 Multivariate rank histograms for prediction at new times obtained from the five methods.
3.4 ROC curves and AUCs using misspecified link functions in Model (3.1) and Model (3.3).
3.5 Stations for Chicago precipitation data
3.6 ROC curves of the regression models for rain occurrence at station Aurora, with link functions $h_o^L$ (left) and $h_o^P$ (right).
3.7 ROC curves of the regression models for rain occurrence, with link functions $h_o^L$ (left) and $h_o^P$ (right), using data from all locations under the common parameter assumption.
3.8 (i) Observed vs. fitted plots for rain amount at station Chicago Midway Airport, with link functions $h_a^i$ (left) and $h_a^l$ (right).
3.9 (ii) Q-Q plots for rain amount at station Chicago Midway Airport, with link functions $h_a^i$ (left) and $h_a^l$ (right).
3.10 (iii) Q-Q plots for observed against simulated rain amount at station Chicago Midway Airport, with link functions $h_a^i$ (left) and $h_a^l$ (right).

3.11 Graphical goodness-of-fit tools for the regression models for rain amount, with link functions $h_a^i$ (upper panels) and $h_a^l$ (lower panels), using data from all locations under the common parameter assumption.
3.12 Multivariate rank histograms of different methods for predicting at new times for the Chicago precipitation dataset from 1998 to 2002.
3.13 Multivariate rank histograms of different methods for predicting at new times for the Chicago precipitation dataset from 1998 to 2002 (continued).
3.14 ROC curves and AUCs for rain occurrence predictions of the four candidate methods

List of Tables

1.1 Archimedean copulas and their generators

2.1 Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 1 with Gaussian copula and normal marginals.
2.2 Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 2 with Gumbel copula and $t_3$ marginals.
2.3 Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 3 with Clayton copula and $t_3$ marginals.
2.4 Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 4 with Joe copula and $t_3$ marginals.
2.5 Variance of the CopC and CopC* estimators for Case 3 with Clayton($\theta = 2$) copula and $t_3$ marginal under left censoring.
2.6 Average Bias and Mean Squared Error (MSE) of the omniscient and CopC estimators of $\theta$ for Case 3 with Clayton copula and $t_3$ marginals under right censoring.
2.7 Variance of the CopC and CopC* estimators for Clayton($\theta = 2$) copula and $t_3$ marginal under right censoring.
2.8 Root mean squared error of $\hat Q_q(Y_{n+1} \mid \mathcal{I}_n)$ at $q = 0.5$ and $q = 0.9$ for Case 1 with Gaussian copula and normal marginals based on different estimators of $\theta$. Values are multiplied by 100.
2.9 Root mean squared error of $\hat Q_q(Y_{n+1} \mid \mathcal{I}_n)$ at $q = 0.5$ and $q = 0.9$ for Case 2 with Gumbel copula and $t_3$ marginals based on different estimators of $\theta$. Values are multiplied by 100.
2.10 Root mean squared error of $\hat Q_q(Y_{n+1} \mid \mathcal{I}_n)$ at $q = 0.5$ and $q = 0.9$ for Case 3 with Clayton copula and $t_3$ marginals based on different estimators of $\theta$. Values are multiplied by 100.
2.11 Frequencies of copulas chosen by the minimal distance method across 500 simulations in Cases 1–3 with 40% censoring and $\theta_0$ corresponding to $\tau = 0.3$.
2.12 10×RMSE of $\hat Q_q(Y_{n+1} \mid \mathcal{I}_n)$ from different methods at $q = 0.5$ and $q = 0.9$ in Cases 1–3 with 40% censoring and $\theta_0$ corresponding to $\tau = 0.3$.
2.13 Frequencies of copulas chosen by the minimal distance (MD) and GOF methods across 64 simulations in Case 1 with 40% censoring, $\theta_0 = 0.5$ and $n = 500$. The truth is the Gaussian copula.

3.1 Bias and Root Mean Squared Error of the proposed methods for estimating the Matérn parameter $\hat\lambda$
3.2 CRPS for one-day-ahead precipitation predictions.
3.3 100×Averaged CRPS for interpolated ensembles $\{y_{T^*,S^*}^{*(m)}\}_{m=1}^M$ and truly generated $w_{T^*,S^*}$ across simulations.
3.4 100×Averaged Mean Absolute Deviation of the interpolated ensembles $\{y_{T^*,S^*}^{*(m)}\}_{m=1}^M$ and truly generated $w_{T^*,S^*}$ across simulations.

3.5 100×AUC of predicted raining probability against observed rain occurrence of two methods under misspecification of link functions
3.6 Estimated parameters in the regression model for rain occurrence under the common parameter assumption.
3.7 Estimated parameters in the regression model for rain amount assuming common parameters across locations.
3.8 BIC for regression models for rain amount using two choices of link functions with common or location-specific parameters.
3.9 Performance of predictions at new times based on different models for the Chicago precipitation dataset from 1998 to 2002.
3.10 Performance of cross-validation of interpolations based on different models for the Chicago precipitation dataset from 1998 to 2002.

Chapter 1

Introduction

The main goal of this work is the analysis of dependent data with censoring or zero inflation. Censoring is often caused by lower or upper detection limits. With lower detection limits, data are available only when the measured concentration is greater than the detection limit, and values below the detection limit are thus censored. For example, in Wang and Fygenson (2009), the viral load measurements, which are a marker of disease progression, are often subject to left censoring due to a lower limit of quantification. In Singh and Nocerino (2002), the concentration of a pollutant in the environment is not observable if the value is smaller than the fixed lower detection limit of the measuring device or process. Moulton and Halsey (1995) studied the antibody concentration in blood serum, which is left-censored due to the lower detection limit of the quantitative assays. In our first project, we focus on the analysis of Markov time series data subject to detection limits. We propose to use copula functions to model the temporal dependence while leaving the marginal distribution unspecified. In the second project, we focus on the analysis of multi-site precipitation data that are zero-inflated, that is, the data contain many zero values for dry days. In the literature, some researchers used censored models to handle zero inflation; see, for instance, Gleiss et al. (2015) and de Oliveira Jr et al. (2017). We consider an alternative three-part copula model to analyze multi-site precipitation, where copula functions are used to capture the dependence among locations, and the marginal distribution (at each location) is characterized through the first two parts of the model. In this chapter, we provide a brief overview of copulas, and a chi-squared test for

verifying the Markov property for time series data.

1.1 Copula

A d-dimensional copula function C is defined as a multivariate distribution function with uniform marginal distributions,

$$C(u_1, u_2, \cdots, u_d) = P(U_1 \le u_1, U_2 \le u_2, \cdots, U_d \le u_d),$$

where the $U_i$'s follow the uniform distribution on $I = [0,1]$. A strict mathematical definition of a $d$-dimensional copula is a function $C$ from $I^d$ to $I$ with the following properties.

(i) For every $u = (u_1,\ldots,u_d)^T$ in $I^d$, $C(u) = 0$ if at least one coordinate of $u$ is 0.

(ii) If all coordinates of $u$ are 1 except $u_k$, then $C(u) = u_k$, for all $k = 1,\ldots,d$.

(iii) For every $a = (a_1,\ldots,a_d)$ and $b = (b_1,\ldots,b_d)$ in $I^d$ such that $a_k < b_k$ for all $k = 1,\ldots,d$, the $d$th-order difference of $C$ on $[a,b]$ satisfies $\Delta_a^b C(t) \ge 0$, where
$$\Delta_a^b C(t) = \Delta_{a_d}^{b_d} \Delta_{a_{d-1}}^{b_{d-1}} \cdots \Delta_{a_1}^{b_1} C(t)$$
and $\Delta_{a_k}^{b_k} C(t) = C(t_1,\cdots,t_{k-1},b_k,t_{k+1},\cdots,t_d) - C(t_1,\cdots,t_{k-1},a_k,t_{k+1},\cdots,t_d)$.

By standardizing the marginal distributions to be uniform, copula functions capture a scale-free measure of dependence. Sklar's theorem (Sklar, 1959) elucidates the copula's role as a linkage between multivariate distribution functions and their univariate margins.

Sklar's theorem. Let $H$ be a $d$-dimensional joint distribution function with margins $F_1,\ldots,F_d$. Then there exists a copula $C$ such that, for all $x = (x_1,\ldots,x_d)^T \in \mathbb{R}^d$,
$$H(x_1, x_2, \cdots, x_d) = C\big(F_1(x_1), F_2(x_2), \cdots, F_d(x_d)\big). \tag{1.1}$$
If $F_1,\ldots,F_d$ are continuous, then $C$ is unique; otherwise, $C$ is uniquely determined on $\mathrm{Ran}\,F_1 \times \cdots \times \mathrm{Ran}\,F_d$, where $\mathrm{Ran}\,F_1,\ldots,\mathrm{Ran}\,F_d$ denote the ranges of the marginal cdfs. Conversely, if $C$ is a copula function and $F_1,\ldots,F_d$ are distribution functions, then the function defined by formula (1.1) is a joint distribution function with margins $F_1,\ldots,F_d$.
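As a concrete illustration of Sklar's theorem, the following sketch (in Python, with illustrative names; the Gaussian-copula choice and the $t_3$ margins are our assumptions, not part of the theorem) assembles a bivariate joint distribution function $H$ from a copula and two margins via (1.1).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal, t as t_dist

def joint_cdf_via_sklar(x1, x2, F1, F2, rho):
    """H(x1, x2) = C(F1(x1), F2(x2)) with a Gaussian copula of correlation rho."""
    u, v = F1(x1), F2(x2)
    # Gaussian copula: C(u, v) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); rho)
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return mvn.cdf([norm.ppf(u), norm.ppf(v)])

# e.g., two t_3 margins coupled by a Gaussian copula with rho = 0.5:
# joint_cdf_via_sklar(0.0, 1.0, t_dist(df=3).cdf, t_dist(df=3).cdf, 0.5)
```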

Based on Sklar's theorem, one can transform an elliptical multivariate distribution into a copula function. Such copulas are known as elliptical copulas. Commonly used elliptical copulas include the Gaussian copula and the t-copula, which are derived from the multivariate normal distribution and the multivariate t distribution, respectively. Another commonly used class of copulas is the Archimedean copulas, which are constructed by
$$C(u_1, \cdots, u_d) = \varphi^{-1}\big(\varphi(u_1) + \cdots + \varphi(u_d)\big),$$
where $\varphi$, the "generator" function, is continuous and strictly decreasing from $[0,1]$ to $[0,\infty]$. In bivariate cases, some widely used Archimedean copulas and their generators are shown in Table 1.1.

Table 1.1: Archimedean copulas and their generators

Clayton: $C_\theta(u,v) = [\max\{u^{-\theta} + v^{-\theta} - 1,\, 0\}]^{-1/\theta}$, with $\varphi_\theta(k) = \frac{1}{\theta}(k^{-\theta} - 1)$ and $\theta \in [-1,0) \cup (0,\infty)$
Gumbel: $C_\theta(u,v) = \exp\{-[(-\ln u)^\theta + (-\ln v)^\theta]^{1/\theta}\}$, with $\varphi_\theta(k) = (-\ln k)^\theta$ and $\theta \in [1,\infty)$
Joe: $C_\theta(u,v) = 1 - [(1-u)^\theta + (1-v)^\theta - (1-u)^\theta (1-v)^\theta]^{1/\theta}$, with $\varphi_\theta(k) = -\ln[1 - (1-k)^\theta]$ and $\theta \in [1,\infty)$

Both elliptical copulas and Archimedean copulas can capture various properties of multivariate distributions. For example, the Gaussian copula exhibits symmetry in distribution, while the Clayton copula describes lower tail dependence and the Gumbel and Joe copulas describe upper tail dependence.
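To make the Archimedean construction concrete, here is a minimal sketch that evaluates $C(u_1,\ldots,u_d) = \varphi^{-1}\{\varphi(u_1) + \cdots + \varphi(u_d)\}$ from the generators in Table 1.1; the helper names are ours.

```python
import numpy as np

# Generators and their inverses for two families from Table 1.1
def clayton_gen(k, th):  return (k ** (-th) - 1) / th
def clayton_inv(s, th):  return (1 + th * s) ** (-1 / th)
def gumbel_gen(k, th):   return (-np.log(k)) ** th
def gumbel_inv(s, th):   return np.exp(-s ** (1 / th))

def archimedean_copula(u, gen, gen_inv, theta):
    """C(u_1, ..., u_d) = phi^{-1}(phi(u_1) + ... + phi(u_d))."""
    return gen_inv(sum(gen(ui, theta) for ui in u), theta)

# e.g., bivariate Gumbel(theta = 2) at (0.3, 0.6):
# archimedean_copula([0.3, 0.6], gumbel_gen, gumbel_inv, 2.0)
```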

1.2 Verification of the Markov Property

In the first project (Chapter 2), we assume that the latent time series is a Markov process of order 1. The Markov property refers to the memoryless property of a stochastic process, named after Andrew A. Markov (Markov, 1954). Specifically, a Markov process of order $p$ is a sequence of dependent random variables $\{x_t, t = 1,2,\ldots\}$, indexed by increasing values of a parameter $t$, with the property that any prediction of the next value $x_t$ (with $t > p$) given the preceding states $\{x_i\}_{i=1}^{t-1}$ is based only on the last $p$ states $x_{t-p},\ldots,x_{t-1}$. In order to validate the Markov property on real data, we can consider the following

chi-squared test. Note that in our empirical study, the observations $Y_t$, $t = 1,\ldots$, are subject to a detection limit $d$, so that the chi-squared test for the latent fully observed data $Y_t^*$ is not accessible. However, a chi-squared test for the censoring status $\delta_t = I(Y_t^* > d)$ can still be conducted.

Suppose that a time series sequence $\{X_t\}_{t=1}^n$ consists only of 1's and 0's, i.e., $X_t = 1$ or $0$, for $t = 1,\ldots,n$. We want to check whether the Markov property (of order 1) holds for this sequence. Under the Markov property, we have $P(X_t \mid X_{t-1},\ldots,X_1) = P(X_t \mid X_{t-1})$. By Chen et al. (2012), to check the Markov property, we essentially need to check conditional independence, that is, test the following null hypothesis:
$$P(X_t \mid X_{t-1}, X_{t-2}) = P(X_t \mid X_{t-1}),$$

against the alternative that the above equality does not hold for at least some t. Now consider the following two contingency tables:

Conditioning on $X_{t-1} = 0$:

$(X_{t-2}, X_{t-1})$    $X_t = 0$        $X_t = 1$        Total
$(0,0)$                 $n_{000}$        $n_{001}$        $n_{00\cdot}$
$(1,0)$                 $n_{100}$        $n_{101}$        $n_{10\cdot}$
Total                   $n_{\cdot 00}$   $n_{\cdot 01}$   $n_{\cdot 0\cdot}$

Conditioning on $X_{t-1} = 1$:

$(X_{t-2}, X_{t-1})$    $X_t = 0$        $X_t = 1$        Total
$(0,1)$                 $n_{010}$        $n_{011}$        $n_{01\cdot}$
$(1,1)$                 $n_{110}$        $n_{111}$        $n_{11\cdot}$
Total                   $n_{\cdot 10}$   $n_{\cdot 11}$   $n_{\cdot 1\cdot}$

The notation is as follows:

• $n_{ijk}$: the count of the pattern $(X_{t-2}, X_{t-1}, X_t) = (i,j,k)$ occurring in the time series, where $i,j,k \in \{0,1\}$;

• $n_{ij\cdot} := \sum_{k\in\{0,1\}} n_{ijk}$;

• $n_{\cdot jk} := \sum_{i\in\{0,1\}} n_{ijk}$;

• $n_{\cdot j\cdot} := \sum_{i\in\{0,1\}} n_{ij\cdot} = \sum_{k\in\{0,1\}} n_{\cdot jk}$.

Under the null hypothesis of conditional independence, the chi-squared test statistic
$$Q = \sum_{j\in\{0,1\}}\ \sum_{i,k\in\{0,1\}} \frac{\big(n_{ijk} - n_{\cdot j\cdot}\,(n_{\cdot jk}/n_{\cdot j\cdot})\,(n_{ij\cdot}/n_{\cdot j\cdot})\big)^2}{n_{\cdot j\cdot}\,(n_{\cdot jk}/n_{\cdot j\cdot})\,(n_{ij\cdot}/n_{\cdot j\cdot})}$$
asymptotically follows the $\chi^2(2)$ distribution.
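A minimal implementation of this test for a binary sequence might look as follows (a sketch; `markov_chi2_test` is an illustrative name). It tabulates the counts $n_{ijk}$ and evaluates $Q$ against the $\chi^2(2)$ distribution.

```python
import numpy as np
from scipy.stats import chi2

def markov_chi2_test(x):
    """Chi-squared test of the first-order Markov property for a 0/1 sequence."""
    x = np.asarray(x, dtype=int)
    counts = np.zeros((2, 2, 2))           # counts[i, j, k] = n_ijk
    for i, j, k in zip(x[:-2], x[1:-1], x[2:]):
        counts[i, j, k] += 1
    Q = 0.0
    for j in range(2):                     # condition on X_{t-1} = j
        n_djd = counts[:, j, :].sum()      # n_{.j.}
        for i in range(2):
            for k in range(2):
                # expected count under conditional independence
                e = counts[:, j, k].sum() * counts[i, j, :].sum() / n_djd
                Q += (counts[i, j, k] - e) ** 2 / e
    return Q, chi2.sf(Q, df=2)             # test statistic and p-value
```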

1.3 Organization

In this dissertation, we develop copula-based methods for analyzing dependent data with detection limits or zero inflation. In Chapter 2, we propose a semiparametric copula-based method for analyzing time series data subject to detection limits. Utilizing the properties of copula modeling, we develop a new copula-based sequential sampling algorithm, which provides a convenient way to calculate the censored likelihood. Even without full parametric distributional assumptions, the proposed method still allows us to easily compute the conditional quantiles of the censored response at a future time point, and thus construct both point and interval predictions. We establish the asymptotic properties of the proposed pseudo maximum likelihood estimator, and demonstrate through simulation and the analysis of a water quality dataset that the proposed method is flexible and leads to accurate predictions for non-normal data. In Chapter 3, we focus on the analysis of precipitation data from multiple locations. We explore a copula-based method to handle the mixture of zero and positive-valued responses. We use copula functions to capture the spatial

dependence and characterize the marginal distribution of precipitation at each location given covariates through a two-part model. In Chapter 4, we draw the main conclusions and discuss some potential ways of generalizing the proposed methods.

Chapter 2

Copula-Based Semiparametric Estimation for Markov Models with Censoring

2.1 Introduction

Data from environmental, biomedical and social studies are often subject to detection limits (DLs) and thus censored; see examples in Wang and Fygenson (2009), Fu and Wang (2011), Nadkarni et al. (2011), Fu and Wang (2012), Bernhardt et al. (2014) and Zhao et al. (2014). In empirical studies, common practices to handle data with DLs include deleting the values below the DLs, treating the censored values as observed, or replacing them with a constant such as half of the DL (Gleit, 1985; Huang et al., 2001). These approaches, though simple, often lead to biased estimation and inference (Stansfield, 2001; Austin and Brunner, 2003; Gray et al., 2004). On the other hand, finding the appropriate single imputation to correct for bias is not always feasible except in some special cases (Richardson and Ciampi, 2003; Schisterman et al., 2006), and it often relies on parametric distributional assumptions for the censored variable (Garland et al., 1993). In this chapter, we focus on time series data subject to lower DLs, which is also referred to as fixed censoring since the censoring points are known everywhere. Fixed censoring complicates the analysis of time series data in two major ways. Firstly, due to temporal dependence, the resulting likelihood involves high-dimensional integrals whose dimension

can be as large as the number of censored observations. This leads to great computational difficulty, as the integrals often have no closed forms and standard numerical integration is impractical for integrals of such high dimensionality. Secondly, under fixed censoring, the model parameters are often nonidentifiable without parametric distributional assumptions. To overcome these two issues, researchers have proposed various models and computing methods. For instance, Zeger and Brookmeyer (1986) discussed maximum likelihood estimation for censored autoregressive models, Lee (1999) proposed a simulated likelihood method for dynamic Tobit models, Park et al. (2007) developed an imputation algorithm for ARMA (autoregressive moving average) models, Monokroussos (2013) proposed a Markov Chain Monte Carlo procedure with data augmentation for limited time series data including censored data as a special case, and Mohammad (2014) considered a quasi-EM algorithm for censored ARMA models. More recently, Wang and Chan (2017) proposed a quasi-likelihood method and Schumacher et al. (2017) proposed a variant of the EM algorithm for censored regression models with autoregressive errors. All these methods require some parametric distributional assumptions on the disturbances, such as normality, which eases some computational challenge but could be too restrictive and violated in applications. Choi and Portnoy (2016) proposed an estimation method for censored quantile autoregression models using only complete data, that is, the time points where the covariates are uncensored, and the method requires fitting the entire quantile process that is assumed to take the same functional form. In this chapter, we propose a new semiparametric approach based on copulas for analyzing time series with fixed censoring. The proposed method does not require any parametric assumptions on the disturbance distribution, and the copula used to capture temporal dependence can be chosen from a class of copula families. To the best of our knowledge, this is a first attempt to analyze censored time series without making full parametric distributional assumptions, so the method offers more flexibility in modeling and allows wider applications.

The proposed method has the following additional advantages. Under our proposed copula-based model setup, the dimensionality of the integrals involved in the likelihood function is reduced from the total number of censored observations to the maximum length of consecutively censored observations, which greatly reduces the computational burden. Utilizing the properties of copula modeling, we develop a new copula-based sequential sampling algorithm, which provides a convenient tool to calculate the multi-dimensional integration in the censored likelihood. Finally, even without parametric assumptions on the marginal distribution, we can still estimate the conditional quantiles of the response at a future time point and use them for prediction, and the estimates are automatically monotonic across quantiles. The structure of this chapter is as follows. Section 2.2 explains the model assumptions and our proposed method. In Section 2.3, we conduct simulation studies for four main copula families — Gaussian, Gumbel, Clayton and Joe — investigating the performance of the proposed estimators and the estimation of the conditional quantiles based on the proposed estimators. Selection of copulas and sensitivity analysis to misspecification of copulas are also considered in that section. In Section 2.4, we establish the theoretical properties of the proposed estimator. The practical value of the proposed method is illustrated through a real data analysis in Section 2.6.

2.2 Proposed Method

2.2.1 Copula-Based Markov Model

For many time series, it is plausible to assume that the current outcomes depend only on their recent past, that is, to assume the Markov property. In this chapter we assume that the latent sequence $\{Y_t^*, t = 1,2,\ldots\}$ is a stationary Markov process of order $p$ with continuous state space. For ease of presentation, we focus on $p = 1$; the proposed method can be extended to $p > 1$ with some modification of the computing algorithm. Under this assumption, the probabilistic properties of the latent sequence are fully determined by the bivariate joint distribution of the neighboring time points $Y_{t-1}^*$ and $Y_t^*$, denoted as $H^*(y_1^*, y_2^*)$. By Sklar's theorem (Sklar, 1959; Nelsen, 2006), we can uniquely represent $H^*(y_1^*, y_2^*)$ by the marginal distribution function of $Y_t^*$ and the copula function of $Y_{t-1}^*$ and $Y_t^*$. However, due to a lower detection limit, we only observe $Y_t = \max(Y_t^*, d)$ and $\delta_t = I(Y_t^* > d)$, with $d$ being the known detection limit (or censoring point). In the following we formally state the copula-based censored Markov model assumption.

Assumption A1. The latent sequence $\{Y_t^*, t = 1,\ldots,n\}$ is a sample of a stationary first-order Markov process, which is generated from the copula-based Markov model $\{G^*, C(\cdot,\cdot;\theta_0)\}$, where $G^*$ is the true common marginal distribution, which is absolutely continuous with respect to Lebesgue measure on the real line, and $C(\cdot,\cdot;\theta_0)$ is the copula for $(Y_{t-1}^*, Y_t^*)$ with the true unknown parameter $\theta_0$ such that
$$H^*(y_1^*, y_2^*) = C\big\{G^*(y_1^*), G^*(y_2^*); \theta_0\big\}. \tag{2.1}$$
We assume that $C(\cdot,\cdot;\theta)$ is absolutely continuous with respect to Lebesgue measure on $[0,1]^2$, and that it is neither the Fréchet–Hoeffding upper bound, $C(u,v;\theta) = \min(u,v)$, nor the lower bound, $C(u,v;\theta) = \max(u+v-1, 0)$. Due to left fixed censoring, we only observe the sequence $\{(Y_t, \delta_t), t = 1,\ldots,n\}$, where $Y_t = \max(Y_t^*, d)$ and $\delta_t = I(Y_t^* > d)$. Under the model assumption A1, we propose a semiparametric estimation method based on nonparametric estimation of the marginal distribution $G^*$ and parametric modeling of the copula function. The proposed method can be used to carry out one-step-ahead forecasts based on censored time series, which is of particular interest in applications. Copulas have been used in other works to analyze time series, e.g. Chen and Fan (2006), Beatriz Vaz de and Aíube (2011), Rémillard et al. (2012), Patton (2012), Emura et al. (2017), but all of these focused on fully observed data.

10 2.2.2 Copula-Based Semiparametric Estimator

In this section, we present a pseudo maximum likelihood estimator of the copula parameter $\theta_0$, based on the observed censored data. For general copula families, the copula parameter $\theta_0$ may not necessarily be the same as the temporal dependence measure, but it fully determines the temporal dependence in the latent Markov process. We first derive the likelihood function for the latent sequence $\{Y_t^*\}$ under the Markov property. From (2.1), we can obtain the joint density of $(Y_{t-1}^*, Y_t^*)$ for any $t > 1$ as
$$h^*(y_{t-1}^*, y_t^*) = c\big\{G^*(y_{t-1}^*), G^*(y_t^*); \theta\big\}\, g^*(y_{t-1}^*)\, g^*(y_t^*),$$
where $g^*$ is the density corresponding to the marginal distribution function $G^*$, and $c(u,v;\theta) = \partial^2 C(u,v;\theta)/(\partial u\, \partial v)$ is the copula density of $C(\cdot,\cdot;\theta)$. Consequently, the conditional density of $Y_t^*$ given $Y_{t-1}^*$ is given by
$$f^*(y_t^* \mid y_{t-1}^*) = g^*(y_t^*)\, c\big\{G^*(y_{t-1}^*), G^*(y_t^*); \theta\big\}.$$
Provided that the marginal distribution $G^*$ is known, the full likelihood function of the latent sequence $\{y_1^*, \ldots, y_n^*\}$ is given by
$$L^*(\theta \mid y_1^*, y_2^*, \ldots, y_n^*) = \prod_{t=1}^n g^*(y_t^*) \times \prod_{t=2}^n c\big\{G^*(y_{t-1}^*), G^*(y_t^*); \theta\big\}. \tag{2.2}$$

However, due to censoring, we can show that the Markov property no longer holds for the observed sequence $\{y_1,\ldots,y_n\}$, and this complicates the likelihood function and estimation. For instance, suppose that the outcome at time $t-1$ is censored and we only observe $y_{t-1} = d$, while $\{y_{t-2} = y_{t-2}^*, \ldots, y_1 = y_1^*\}$ are fully observed. Then the conditional density of $Y_t^*$ given the past history is
$$\begin{aligned}
f^*(Y_t^* = x \mid y_{t-1} = d, y_{t-2}, \ldots, y_1)
&= \frac{\int_{-\infty}^d f^*(Y_t^* = x, y_{t-1}^*, y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*}{\int_{-\infty}^d f^*(y_{t-1}^*, y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*} \\
&= \frac{\int_{-\infty}^d f^*(Y_t^* = x \mid y_{t-1}^*, y_{t-2}^*, \ldots, y_1^*)\, f^*(y_{t-1}^*, y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*}{\int_{-\infty}^d f^*(y_{t-1}^*, y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*} \\
&= \frac{\int_{-\infty}^d f^*(Y_t^* = x \mid y_{t-1}^*)\, f^*(y_{t-1}^* \mid y_{t-2}^*, \ldots, y_1^*)\, f^*(y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*}{\int_{-\infty}^d f^*(y_{t-1}^* \mid y_{t-2}^*, \ldots, y_1^*)\, f^*(y_{t-2}^*, \ldots, y_1^*)\, dy_{t-1}^*} \\
&= \frac{\int_{-\infty}^d f^*(Y_t^* = x \mid y_{t-1}^*)\, f^*(y_{t-1}^* \mid y_{t-2}^*)\, dy_{t-1}^*}{\int_{-\infty}^d f^*(y_{t-1}^* \mid y_{t-2}^*)\, dy_{t-1}^*} \\
&= \frac{\int_{-\infty}^d f^*(Y_t^* = x, y_{t-1}^* \mid y_{t-2}^*)\, dy_{t-1}^*}{\int_{-\infty}^d f^*(y_{t-1}^* \mid y_{t-2}^*)\, dy_{t-1}^*} \\
&= f^*(Y_t^* = x \mid y_{t-1} = d, y_{t-2}),
\end{aligned}$$
which does not equal $f^*(Y_t^* = x \mid y_{t-1} = d)$ in general. Similarly, if $y_{t-1}$ and $y_{t-2}$ are both censored, we can obtain $f^*(Y_t^* = x \mid y_{t-1} = y_{t-2} = d, y_{t-3}, \ldots, y_1) = f^*(Y_t^* = x \mid y_{t-1} = y_{t-2} = d, y_{t-3})$. In general, the following Proposition 1 states the probabilistic properties of the latent $Y_t^*$ given the previous censored time series.

Proposition 1. Under Assumption A1, the conditional density of the latent $Y_t^*$ given the past information set $\mathcal{I}_{t-1} = \{(y_{t-1}, \delta_{t-1}), \ldots, (y_1, \delta_1)\}$ is
$$f^*(y_t^* \mid \mathcal{I}_{t-1}) = \begin{cases} f^*(y_t^* \mid y_{t-1}), & \text{if } y_{t-1} > d, \\ f^*(y_t^* \mid y_{t-1} = \cdots = y_{t-l} = d,\, y_{t-l-1}), & \text{if } y_{t-1} = \cdots = y_{t-l} = d \text{ and } y_{t-l-1} > d,\ l \ge 1. \end{cases}$$

Proposition 1 shows that under censoring, the conditional distribution of $Y_t^*$ depends only on the previous state $y_{t-1}$ if $y_{t-1}$ is not censored; otherwise it depends on the past consecutively censored time points and the nearest uncensored observation prior to time $t$.

Now we proceed to derive the full likelihood of the observed sequence $\{y_1,\ldots,y_n\}$. We assume that the data consist of $K_n$ consecutively censored sequences in total, and denote by $D_i = \{y_{s_i+1} = d, \ldots, y_{s_i+l_i} = d\}$ the $i$-th sequence, where $l_i$ is the length of this sequence and $s_i$ is the time of the previous uncensored observation, $i = 1,\ldots,K_n$. Let $n^* = \#\{t : \delta_t = \delta_{t+1} = 1\}$ be the number of uncensored adjacent neighbors, and let $t_1 < t_2 < \cdots < t_{n^*}$ be the time points such that $\delta_{t_j} = \delta_{t_j+1} = 1$, $j = 1,\ldots,n^*$. For simplicity, assume that $y_1$ and $y_n$ are not censored; otherwise one can simply discard them, and this will not affect the asymptotic properties of the resulting estimator. For the one-step-ahead forecast at time $n+1$, whether $y_n$ is censored or not makes a difference; see Section 2.2.4 for related discussion. The censored likelihood (of the observed sequence) is then

$$L_n(\theta) = \prod_{\{t:\delta_t = 1\}} g^*(y_t) \prod_{j=1}^{n^*} c(u_{t_j}, u_{t_j+1}; \theta) \times \prod_{i=1}^{K_n} \int_0^\pi \cdots \int_0^\pi \prod_{t=s_i+1}^{s_i+l_i+1} c(u_{t-1}, u_t; \theta)\, du_{s_i+1} \cdots du_{s_i+l_i}, \tag{2.3}$$
where $u_t = G^*(y_t)$, $t = 1,\ldots,n$, and $\pi = P(Y_t^* \le d) = G^*(d)$ is the censoring probability. The derivation is as follows.

By equation (2.2), we can obtain the likelihood for the observed data $\{Y_1,\ldots,Y_{s_1}\}$ prior to $D_1$:
$$L_n(\theta \mid y_1,\ldots,y_{s_1}) = \prod_{t=1}^{s_1} g^*(y_t) \prod_{t=2}^{s_1} c\big\{G^*(y_{t-1}), G^*(y_t); \theta\big\}.$$

Thus the likelihood associated with the data $\{y_1,\ldots,y_{s_1}, y_{s_1+1},\ldots,y_{s_1+l_1}, y_{s_1+l_1+1}\}$ is
$$\begin{aligned}
& L_n(\theta \mid y_1,\ldots,y_{s_1},\, y_{s_1+1} = \cdots = y_{s_1+l_1} = d,\, y_{s_1+l_1+1}) \\
&= \int_{-\infty}^d \cdots \int_{-\infty}^d L^*(\theta; y_1,\ldots,y_{s_1+l_1+1})\, dy_{s_1+1} \cdots dy_{s_1+l_1} \\
&= \prod_{t=1}^{s_1} g^*(y_t) \prod_{t=2}^{s_1} c\big\{G^*(y_{t-1}), G^*(y_t); \theta\big\} \\
&\quad \times \int_{-\infty}^d \cdots \int_{-\infty}^d \prod_{t=s_1+1}^{s_1+l_1+1} g^*(y_t) \prod_{t=s_1+1}^{s_1+l_1+1} c\big\{G^*(y_{t-1}), G^*(y_t); \theta\big\}\, dy_{s_1+1} \cdots dy_{s_1+l_1} \\
&= \prod_{t=1}^{s_1} g^*(y_t) \prod_{t=2}^{s_1} c(u_{t-1}, u_t; \theta)\; g^*(y_{s_1+l_1+1}) \int_0^\pi \cdots \int_0^\pi \prod_{t=s_1+1}^{s_1+l_1+1} c(u_{t-1}, u_t; \theta)\, du_{s_1+1} \cdots du_{s_1+l_1},
\end{aligned}$$
where $\pi = G^*(d)$ and $u_t = G^*(y_t)$. Consequently, we can derive the full likelihood function as (2.3).

Provided that $G^*$ is known, we can maximize $L_n(\theta)$ in (2.3) with respect to $\theta$ to obtain the maximum likelihood estimator. However, in practice, $G^*$ is unknown and has to be estimated. We leave $G^*$ completely unspecified and estimate it nonparametrically. Under left censoring, the lower tail of $G^*$ is not identifiable without any parametric distributional assumptions. Fortunately, for $y > d$, we can still consistently estimate $G^*(y)$ by the rescaled empirical distribution function
$$\hat G_n(y) = \frac{1}{n+1} \sum_{t=1}^n I(y_t \le y), \quad y \ge d,$$
where the rescaling by $n+1$ in the denominator, instead of $n$, avoids estimating the probability as exactly one; the censoring probability $\pi$ can be estimated by $\hat\pi = n^{-1} \sum_{t=1}^n I(\delta_t = 0)$.
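In code, the rescaled empirical distribution function and the estimated censoring probability are straightforward; the sketch below assumes censored observations are recorded at the detection limit $d$ and uses illustrative names.

```python
import numpy as np

def rescaled_ecdf(y, d):
    """Return the rescaled empirical cdf G_hat and the censoring estimate pi_hat."""
    y = np.asarray(y)
    n = len(y)
    y_sorted = np.sort(y)
    def G_hat(x):
        # dividing by n + 1 keeps G_hat strictly below one
        return np.searchsorted(y_sorted, x, side="right") / (n + 1)
    pi_hat = np.mean(y <= d)   # proportion of censored observations
    return G_hat, pi_hat
```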

Note that the first term in (2.3) does not depend on $\theta$. Therefore, ignoring this term and plugging in $\hat\pi$ and $\{\hat u_t = \hat G_n(y_t) : \delta_t = 1\}$, we can define the pseudo maximum likelihood estimator of $\theta_0$ as
$$\hat\theta = \arg\max_\theta LL_n(\theta), \quad LL_n(\theta) = \sum_{j=1}^{n^*} \log\{c(\hat u_{t_j}, \hat u_{t_j+1}; \theta)\} + \sum_{i=1}^{K_n} \log\big\{\mathrm{Int}(\hat u_{s_i}, \hat u_{s_i+l_i+1}, \hat\pi; \theta)\big\}, \tag{2.4}$$
where
$$\mathrm{Int}(\hat u_{s_i}, \hat u_{s_i+l_i+1}, \hat\pi; \theta) = \int_0^{\hat\pi} \cdots \int_0^{\hat\pi} c(\hat u_{s_i}, u_{s_i+1}; \theta) \Big\{\prod_{t=s_i+2}^{s_i+l_i} c(u_{t-1}, u_t; \theta)\Big\}\, c(u_{s_i+l_i}, \hat u_{s_i+l_i+1}; \theta)\, du_{s_i+1} \cdots du_{s_i+l_i}. \tag{2.5}$$
The numerical algorithm for calculating (2.5) is discussed in Section 2.2.3.

2.2.3 Computation

When analyzing censored time series data, one computational challenge is that the likelihood involves multi-dimensional integration. Specifically, in our setup, to calculate the proposed estimator $\hat\theta$, we need to evaluate an $l_i$-dimensional integral for each consecutively censored sequence $D_i$ of length $l_i$, for $i = 1,\ldots,K_n$. To overcome this challenge, we develop a convenient sequential importance sampling method for the integration.

For notational ease, we simplify c(u,v;θ) to c(u,v) whenever clear from the context.

For any $u_0 > \pi$ and $u_{m+1} > \pi$, consider the following $m$-dimensional integral
$$I(u_0, u_{m+1}) = \int_0^\pi \cdots \int_0^\pi c(u_0, u_1)\, c(u_1, u_2) \cdots c(u_m, u_{m+1})\, du_1 \cdots du_m. \tag{2.6}$$

Let $\bar h(\cdot)$ be an $m$-dimensional density function for $u = (u_1,\ldots,u_m)$ such that $\bar h(u) > 0$ for $u \in [0,\pi]^m$. We can then write
$$I(u_0, u_{m+1}) = E_{\bar h}\left[\frac{\prod_{i=1}^{m+1} c(u_{i-1}, u_i)}{\bar h(u_1,\ldots,u_m)}\right]$$
and approximate the integral through Monte Carlo simulation by drawing samples from $\bar h$.

For efficient computation, we would like to find an $\bar h$ that is easy to simulate values from and is nearly proportional to the integrand of (2.6). Note that by (2.2), we can regard the product of the first $m$ terms in the integrand of (2.6), $\prod_{i=1}^m c(u_{i-1}, u_i)$, as the joint density of a random process $(U_0,\ldots,U_m)$ generated from a stationary first-order Markov process with $\mathrm{Unif}(0,1)$ marginal distribution and bivariate copula function $C(\cdot,\cdot)$. Since the integration in (2.6) is restricted to $[0,\pi]$ for each $u_i$, $i = 1,\ldots,m$, we consider a first-order Markov process with truncated bivariate copula functions.

Define $C^t(\cdot \mid U_i = u_i)$ as the conditional truncated distribution of $U_{i+1}$ given $U_i = u_i$ and $U_{i+1} \le \pi$. Then we have
$$C^t(u_{i+1} \mid U_i = u_i) \equiv P(U_{i+1} \le u_{i+1} \mid U_i = u_i,\, U_{i+1} \le \pi) = \frac{C(u_{i+1} \mid U_i = u_i)}{C(\pi \mid U_i = u_i)},$$
where $C(u \mid v) = \partial C(u,v)/\partial v$ is the conditional copula function of $U$ given $V = v$, and the corresponding truncated conditional copula density is
$$c^t(u_{i+1} \mid U_i = u_i) \equiv c(u_{i+1} \mid U_i = u_i,\, U_{i+1} \le \pi) = \frac{c(u_{i+1} \mid U_i = u_i)}{C(\pi \mid U_i = u_i)} = \frac{c(u_i, u_{i+1})}{C(\pi \mid U_i = u_i)}.$$

Therefore, we can take the importance sampling density as
$$\bar h(u_1,\ldots,u_m) = \prod_{i=1}^m c^t(u_i \mid u_{i-1}) = \prod_{i=1}^m \frac{c(u_{i-1}, u_i)}{C(\pi \mid u_{i-1})}, \quad \text{for } u_i \in [0,\pi],\ i = 1,\ldots,m.$$
Using the Markov property, it is very convenient to simulate the truncated sample $(u_1,\ldots,u_m)$ sequentially from $\bar h$. The detailed algorithm is as follows.

Step 1. Generate $w_1 \sim \mathrm{Unif}(0,1)$, and generate $U_1 = u_1$ from $C^t(\cdot \mid U_0 = u_0)$ by setting $C^t(u_1 \mid U_0 = u_0) = w_1$, i.e., by solving $C(u_1 \mid U_0 = u_0) = w_1\, C(\pi \mid U_0 = u_0)$ for $u_1$.

Step 2. For $i = 2,\ldots,m$, first generate $w_2,\ldots,w_m$ independently from $\mathrm{Unif}(0,1)$, and then generate $U_i = u_i$ sequentially from the conditional truncated distribution $C^t(\cdot \mid U_{i-1} = u_{i-1})$ by solving $C(u_i \mid U_{i-1} = u_{i-1}) = w_i\, C(\pi \mid U_{i-1} = u_{i-1})$.

Step 3. Repeating Steps 1–2 $B$ times, we obtain $B$ independent samples $(u_{b,1},\ldots,u_{b,m})$, $b = 1,\ldots,B$. We can then estimate $I(u_0, u_{m+1})$ by
$$B^{-1} \sum_{b=1}^B \frac{c(u_0, u_{b,1})\, c(u_{b,m}, u_{m+1}) \prod_{i=2}^m c(u_{b,i-1}, u_{b,i})}{\bar h(u_{b,1},\ldots,u_{b,m})} = B^{-1} \sum_{b=1}^B c(u_{b,m}, u_{m+1}) \prod_{i=1}^m C(\pi \mid u_{b,i-1}),$$
where $u_{b,0} = u_0$ for all $b = 1,\ldots,B$.

For further illustration, we next describe how to generate truncated samples from $\bar h(\cdot)$ for the Clayton and Gaussian copula functions.

Example 1 (Clayton copula)

The Clayton copula is given by
$$C(u,v;\theta) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}.$$
The associated copula density function is
$$c(u,v;\theta) = (\theta+1)(uv)^{-\theta-1}(u^{-\theta} + v^{-\theta} - 1)^{-(2\theta+1)/\theta},$$
and the conditional copula function is
$$C(v \mid u; \theta) = P(V \le v \mid U = u) = \partial C(u,v;\theta)/\partial u = u^{-\theta-1}(u^{-\theta} + v^{-\theta} - 1)^{-1-1/\theta}.$$
Consequently, we can generate samples $(U_1,\ldots,U_m)$ sequentially from $\bar h(\cdot)$ by letting
$$u_i = \Big\{\big(\tilde w_i^{-\theta/(\theta+1)} - 1\big)\, u_{i-1}^{-\theta} + 1\Big\}^{-1/\theta}, \quad \text{with } \tilde w_i = w_i\, u_{i-1}^{-\theta-1}\big(u_{i-1}^{-\theta} + \pi^{-\theta} - 1\big)^{-1-1/\theta},$$
for $i = 1,\ldots,m$, where the $w_i$ are independently generated from $\mathrm{Unif}(0,1)$. The Clayton copula belongs to the Archimedean copula family. Since Archimedean copula functions have explicit expressions, we can often simulate the truncated samples in a direct way.
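The formulas above translate directly into a sketch of the sequential importance sampling estimator of (2.6) for the Clayton copula; the function names are ours, and the default B = 5000 matches the number of importance samples used in Section 2.3.

```python
import numpy as np

def clayton_cond_cdf(v, u, theta):
    # C(v | U = u) for the Clayton copula
    return u ** (-theta - 1) * (u ** (-theta) + v ** (-theta) - 1) ** (-1 - 1 / theta)

def clayton_cond_inv(w, u, theta):
    # closed-form solution of C(v | U = u) = w
    return ((w ** (-theta / (theta + 1)) - 1) * u ** (-theta) + 1) ** (-1 / theta)

def clayton_density(u, v, theta):
    return (theta + 1) * (u * v) ** (-theta - 1) * \
           (u ** (-theta) + v ** (-theta) - 1) ** (-(2 * theta + 1) / theta)

def integral_estimate(u0, u_m1, m, pi, theta, B=5000, seed=None):
    """Estimate I(u0, u_{m+1}) in (2.6) by sequential importance sampling."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(B):
        u_prev, weight = u0, 1.0
        for _ in range(m):
            trunc = clayton_cond_cdf(pi, u_prev, theta)   # C(pi | u_{i-1})
            u_prev = clayton_cond_inv(rng.uniform() * trunc, u_prev, theta)
            weight *= trunc                               # product of C(pi | u_{b,i-1})
        total += weight * clayton_density(u_prev, u_m1, theta)
    return total / B
```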

Example 2 (Gaussian copula)

Consider the bivariate Gaussian copula function
$$C(u,v;\theta) = \Phi_2\big(\Phi^{-1}(u), \Phi^{-1}(v); \theta\big),$$
where $\Phi$ is the distribution function of the standard normal distribution, and $\Phi_2(\cdot,\cdot;\theta)$ is the distribution function of the bivariate normal distribution with mean zero, variance 1 and correlation coefficient $\theta$. Suppose $(U_{i-1}, U_i)$ follows the bivariate Gaussian copula distribution with marginal $\mathrm{Unif}(0,1)$ distribution. Let $X_i = \Phi^{-1}(U_i)$. Then $(X_{i-1}, X_i)$ follows the bivariate normal distribution $\Phi_2(\cdot,\cdot;\theta)$, and $X_i \mid X_{i-1} = x_{i-1} \sim N(\theta x_{i-1},\, 1 - \theta^2)$. Note that
$$C(u_i \mid U_{i-1} = u_{i-1}) = P(U_i \le u_i \mid U_{i-1} = u_{i-1}) = P\big(X_i \le \Phi^{-1}(u_i) \mid X_{i-1} = x_{i-1}\big) = \Phi\left(\frac{\Phi^{-1}(u_i) - \theta x_{i-1}}{\sqrt{1-\theta^2}}\right),$$
where $x_{i-1} = \Phi^{-1}(u_{i-1})$. We can then sequentially generate the truncated samples $(u_1,\ldots,u_m) \in [0,\pi]^m$ from $\bar h(\cdot)$ by letting
$$u_i = \Phi\Big[\sqrt{1-\theta^2}\; \Phi^{-1}(\tilde w_i) + \theta\, \Phi^{-1}(u_{i-1})\Big], \quad \text{with } \tilde w_i = w_i\, \Phi\left(\frac{\Phi^{-1}(\pi) - \theta\, \Phi^{-1}(u_{i-1})}{\sqrt{1-\theta^2}}\right),$$
for $i = 1,\ldots,m$, where the $w_i$ are independent $\mathrm{Unif}(0,1)$ random variables.
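A corresponding one-step update for the Gaussian copula, written with SciPy's normal cdf and quantile functions, could look as follows (a sketch with illustrative names).

```python
import numpy as np
from scipy.stats import norm

def gaussian_trunc_step(u_prev, pi, theta, w):
    """One step of the truncated sequential sampler for the Gaussian copula."""
    x_prev = norm.ppf(u_prev)
    s = np.sqrt(1.0 - theta ** 2)
    trunc = norm.cdf((norm.ppf(pi) - theta * x_prev) / s)   # C(pi | u_prev)
    w_tilde = w * trunc
    # invert the conditional copula: C(u_i | u_prev) = w_tilde
    return norm.cdf(s * norm.ppf(w_tilde) + theta * x_prev)
```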

2.2.4 Estimation of Conditional Quantiles

Based on the estimated copula parameter $\hat\theta$, we are able to make predictions about $Y_{n+1} = \max(d, Y_{n+1}^*)$, conditional on the past information set $\mathcal{I}_n = \{(y_1,\delta_1),\ldots,(y_n,\delta_n)\}$. Without parametric assumptions on the marginal distribution $G^*(\cdot)$, we cannot estimate the conditional moments of $Y_{n+1} \mid \mathcal{I}_n$. However, we can still estimate the conditional quantiles of $Y_{n+1} \mid \mathcal{I}_n$. Given $\mathcal{I}_n$, let $Q_q(Y_{n+1}^* \mid \mathcal{I}_n)$ and $Q_q(Y_{n+1} \mid \mathcal{I}_n)$ denote the $q$-th conditional quantiles of the latent response $Y_{n+1}^*$ and the observed response $Y_{n+1}$, respectively, where $0 < q < 1$. Note that the quantity $Q_q(Y_{n+1}^* \mid \mathcal{I}_n)$ satisfies $P\{Y_{n+1}^* \le Q_q(Y_{n+1}^* \mid \mathcal{I}_n) \mid \mathcal{I}_n\} = q$, and by the equivariance property of quantiles under monotonic transformations, we have $Q_q(Y_{n+1} \mid \mathcal{I}_n) = \max\{d, Q_q(Y_{n+1}^* \mid \mathcal{I}_n)\}$. We consider two scenarios.

Scenario 1: $y_n > d$, i.e., $y_n = y_n^*$ is fully observed. In this case, by Proposition 1, we know that the conditional density of $Y_{n+1}^*$ given $\mathcal{I}_n$ is $f^*(y_{n+1}^* \mid \mathcal{I}_n) = f^*(y_{n+1}^* \mid y_n)$. Therefore, the conditional $q$-th quantile $Q^* \equiv Q_q(Y_{n+1}^* \mid \mathcal{I}_n)$ can be obtained by solving

$$\begin{aligned}
q &= \int_{-\infty}^{Q^*} f^*(y_{n+1}^* \mid y_n)\, dy_{n+1}^* = \int_{-\infty}^{Q^*} g^*(y_{n+1}^*)\, c\big(G^*(y_n), G^*(y_{n+1}^*); \theta\big)\, dy_{n+1}^* \\
&= \int_{-\infty}^{Q^*} c\big(G^*(y_n), G^*(y_{n+1}^*); \theta\big)\, dG^*(y_{n+1}^*) = \int_0^{G^*(Q^*)} c\big(G^*(y_n), v; \theta\big)\, dv \\
&= C\big(G^*(Q^*) \mid U = G^*(y_n); \theta\big)
\end{aligned}$$
with respect to $Q^*$. If $\theta$ and $G^*(\cdot)$ are known, we can solve $C\big(v \mid U = G^*(y_n); \theta\big) = q$ to get $v = C^{-1}\big(q \mid U = G^*(y_n); \theta\big)$, and then obtain $Q_q(Y_{n+1}^* \mid \mathcal{I}_n) = G^{*-1}(v)$ and $Q_q(Y_{n+1} \mid \mathcal{I}_n) = \max\{G^{*-1}(v), d\}$. In practice, $\theta$ and $G^*(\cdot)$ are unknown, and we replace $\theta$ by $\hat\theta$ and $G^*(y)$ by $\hat G_n(y)$. We then solve $C\big(v \mid U = \hat G_n(y_n); \hat\theta\big) = q$ for $v$ and denote the solution by $\hat v = C^{-1}\big(q \mid U = \hat G_n(y_n); \hat\theta\big)$. Note that if $\hat v > \hat G_n(d) = \hat\pi$, then $\hat G_n^{-1}(\hat v) \ge d$ can be observed; otherwise, if $\hat v \le \hat\pi$, we can only observe $d$, the detection limit. Therefore, we define
$$\hat Q_q(Y_{n+1} \mid \mathcal{I}_n) = \begin{cases} \hat G_n^{-1}(\hat v), & \text{if } \hat v > \hat\pi, \\ d, & \text{if } \hat v \le \hat\pi. \end{cases} \tag{2.7}$$

Scenario 2: $y_n = d$ is censored, and there is a consecutively censored sequence $y_n, \ldots, y_{n-l+1}$ with $\delta_n = \cdots = \delta_{n-l+1} = 0$ and $y_{n-l} = y_{n-l}^* > d$. In this case, the conditional density of $Y_{n+1}^*$ given $\mathcal{I}_n$ is $f^*(y_{n+1}^* \mid \mathcal{I}_n) = f^*(y_{n+1}^* \mid y_n = \cdots = y_{n-l+1} = d,\, y_{n-l})$. Therefore, we need to solve
$$\begin{aligned}
q &= \int_{-\infty}^{Q^*} f^*(y_{n+1}^* \mid y_n = \cdots = y_{n-l+1} = d,\, y_{n-l})\, dy_{n+1}^* \\
&= \int_{-\infty}^{Q^*} \frac{\Pr(Y_{n+1}^* = y_{n+1}^*,\, y_n^* < d, \cdots, y_{n-l+1}^* < d,\, Y_{n-l}^* = y_{n-l}^*)}{\Pr(y_n^* < d, \cdots, y_{n-l+1}^* < d,\, Y_{n-l}^* = y_{n-l}^*)}\, dy_{n+1}^* \\
&= \frac{\int_{-\infty}^{Q^*} \int_{-\infty}^d \cdots \int_{-\infty}^d f^*(y_{n+1}^*, y_n^*, \cdots, y_{n-l+1}^*, y_{n-l}^*)\, dy_{n-l+1}^* \cdots dy_n^*\, dy_{n+1}^*}{\int_{-\infty}^d \cdots \int_{-\infty}^d f^*(y_n^*, \cdots, y_{n-l+1}^*, y_{n-l}^*)\, dy_{n-l+1}^* \cdots dy_n^*} \\
&= \frac{\int_{-\infty}^{Q^*} \int_{-\infty}^d \cdots \int_{-\infty}^d \prod_{t=n-l}^{n+1} g^*(y_t^*) \prod_{t=n-l+1}^{n+1} c\big(G^*(y_{t-1}^*), G^*(y_t^*); \theta\big)\, dy_{n-l+1}^* \cdots dy_n^*\, dy_{n+1}^*}{\int_{-\infty}^d \cdots \int_{-\infty}^d \prod_{t=n-l}^{n} g^*(y_t^*) \prod_{t=n-l+1}^{n} c\big(G^*(y_{t-1}^*), G^*(y_t^*); \theta\big)\, dy_{n-l+1}^* \cdots dy_n^*} \\
&= \frac{\int_0^{G^*(Q^*)} \int_0^\pi \cdots \int_0^\pi \prod_{t=n-l+1}^{n+1} c(u_{t-1}^*, u_t^*; \theta)\, du_{n-l+1}^* \cdots du_n^*\, du_{n+1}^*}{\int_0^\pi \cdots \int_0^\pi \prod_{t=n-l+1}^{n} c(u_{t-1}^*, u_t^*; \theta)\, du_{n-l+1}^* \cdots du_n^*} \\
&= \frac{I_n\big(u_{n-l}^*, G^*(Q^*); \theta\big)}{I_d\big(u_{n-l}^*; \theta\big)}
\end{aligned}$$
with respect to $Q^*$, where $u_{n-l}^* = G^*(y_{n-l})$ and
$$\begin{aligned}
I_n\big(u_{n-l}^*, G^*(Q^*); \theta\big) &= \int_0^{G^*(Q^*)} \int_0^\pi \cdots \int_0^\pi c(u_{n-l}^*, u_{n-l+1}^*) \cdots c(u_n^*, u_{n+1}^*)\, du_{n-l+1}^* \cdots du_{n+1}^*, \\
I_d\big(u_{n-l}^*; \theta\big) &= \int_0^\pi \cdots \int_0^\pi c(u_{n-l}^*, u_{n-l+1}^*) \cdots c(u_{n-1}^*, u_n^*)\, du_{n-l+1}^* \cdots du_n^*.
\end{aligned}$$

Following the same idea as in Scenario 1, we plug in $\hat\theta$ and $\hat G_n(\cdot)$ and estimate $Q_q(Y_{n+1} \mid \mathcal{I}_n)$ by (2.7), with $\hat v$ being the solution to
$$\frac{\hat I_n\big(\hat G_n(y_{n-l}), v; \hat\theta\big)}{\hat I_d\big(\hat G_n(y_{n-l}); \hat\theta\big)} = q. \tag{2.8}$$
In (2.8), $I_n$ and $I_d$ can be approximated using the sequential importance sampling algorithm described in Section 2.2.3, i.e.,
$$\begin{aligned}
\hat I_n\big(\hat G_n(y_{n-l}), \hat v; \hat\theta\big) &= B^{-1} \sum_{b=1}^B \prod_{t=n-l+1}^{n} C(\hat\pi \mid u_{b,t-1}; \hat\theta)\; C(\hat v \mid u_{b,n}; \hat\theta), \\
\hat I_d\big(\hat G_n(y_{n-l}); \hat\theta\big) &= B^{-1} \sum_{b=1}^B \prod_{t=n-l+1}^{n} C(\hat\pi \mid u_{b,t-1}; \hat\theta),
\end{aligned}$$
where $u_{b,n-l} = \hat G_n(y_{n-l})$ for all $b = 1,\ldots,B$, and $u_{b,t}$, for $t = n-l+1,\ldots,n$, are drawn from the importance sampling density.

2.2.5 Selection of Copula Functions

The proposed semiparametric estimation requires the copula function to be correctly specified. In practice, the true copula is unknown and needs to be chosen from some copula families. In this chapter, we adapt the idea in Joe (2015, Chapter 5.10) and choose the copula among parametric copulas by comparing their estimates with the empirical copula function.

Let $\{X_i = (X_{i,1}, X_{i,2})\}_{i=1}^n$ be a random sample of a continuous two-dimensional random vector $X$, and for simplicity, assume that there are no ties. Let $R_{i,j} = \#\{i' = 1,\ldots,n : X_{i',j} \le X_{i,j}\}$ be the increasing rank of $X_{i,j}$ in $\{X_{1,j},\ldots,X_{n,j}\}$, $i = 1,\ldots,n$, $j = 1,2$. For any $u = (u_1, u_2)^T \in [0,1]^2$, the empirical copula is defined as
$$C_n(u) = \frac{1}{n}\, \#\big\{i : (R_{i,j} - 0.5)/n \le u_j,\ j = 1,2\big\}.$$
We can then choose the copula by minimizing $d(C, C_n) = \int |C(u) - C_n(u)|^2\, du$ among candidate copulas. To adapt this to our setup with censored data, we consider the modified distance
$$d(C, C_n) = \frac{1}{n^*} \sum_{j=1}^{n^*} \Big(C\big\{\hat G_n(y_{t_j}), \hat G_n(y_{t_j+1}); \hat\theta\big\} - C_n\big\{\hat G_n(y_{t_j}), \hat G_n(y_{t_j+1})\big\}\Big)^2,$$

which is the average of squared distances between the candidate and empirical copulas over all uncensored adjacent pairs; here $\hat\theta$ is the estimator of the parameter in the candidate copula $C(\cdot)$.
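A sketch of the modified distance computed over uncensored adjacent pairs; here the empirical copula is approximated directly from the pseudo-observations of those pairs (omitting the 0.5 rank shift), and `copula_cdf` stands for the candidate parametric copula.

```python
import numpy as np

def minimal_distance(y, delta, theta_hat, copula_cdf):
    """Average squared distance between candidate and empirical copulas."""
    y = np.asarray(y)
    n = len(y)
    y_sorted = np.sort(y)
    G_hat = lambda x: np.searchsorted(y_sorted, x, side="right") / (n + 1)
    # pseudo-observations for uncensored adjacent pairs (t, t + 1)
    pairs = np.array([(G_hat(y[t]), G_hat(y[t + 1]))
                      for t in range(n - 1) if delta[t] == 1 and delta[t + 1] == 1])
    C_emp = np.array([np.mean((pairs[:, 0] <= a) & (pairs[:, 1] <= b))
                      for a, b in pairs])
    C_par = np.array([copula_cdf(a, b, theta_hat) for a, b in pairs])
    return np.mean((C_par - C_emp) ** 2)
```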

2.2.6 Selection of Copula Based on the Goodness-of-Fit Test

Besides the above minimum distance method, we can also adapt the bootstrap goodness-of-fit test of Genest et al. (2009) to censored data by calculating the p-value using all uncensored adjacent pairs, and selecting the copula giving the largest p-value. The motivation and necessity of modifying the original test in Genest et al. (2009) arise from the existence of censoring and a detection limit in our data.

Let $t_1 < t_2 < \cdots < t_{n^*}$ be the time points such that $\delta_{t_j} = \delta_{t_j+1} = 1$, $j = 1,\ldots,n^*$, and denote $\mathcal{O} = \{t_1,\ldots,t_{n^*}\}$. Our modified test statistic is based on $\{u_{t_j}\}_{t_j \in \mathcal{O}}$ and the estimated copula parameter $\hat\theta$. The p-value is calculated through a parametric bootstrap using all uncensored adjacent pairs. The detailed procedure is as follows.

• Calculate $C_{n^*}(u) = \frac{1}{n^*} \sum_{t_j \in \mathcal{O}} 1(u_{t_j} \le u)$, for $u \in \{u_{t_1},\ldots,u_{t_{n^*}}\}$.

• Calculate $C_{\hat\theta}(u) = C(u; \hat\theta)$, for $u \in \{u_{t_1},\ldots,u_{t_{n^*}}\}$.

• Calculate $S_{n^*} = \sum_{j=1}^{n^*} \{C_{n^*}(u_{t_j}) - C_{\hat\theta}(u_{t_j})\}^2$.

• Repeat the following steps for every $k = 1,2,\ldots,K$:

  – Generate a random sample $\{Y_t^{*(k)}\}_{t=1}^n$ based on $C(\cdot,\cdot;\hat\theta)$ and $\hat G(\cdot)$.

  – Define $Y_t^{(k)} = \max(Y_t^{*(k)}, d)$, estimate $\hat\theta^{(k)}$, and obtain the subset $\mathcal{O}^{(k)} = \{t_1^{(k)},\ldots,t_{n^{*(k)}}^{(k)}\}$ based on $\{Y_t^{(k)}\}_{t=1}^n$.

  – Calculate $C_{n^{*(k)}}(u)$ and $C_{\hat\theta}(u)$ for $u \in \{u_{t_1^{(k)}},\ldots,u_{t_{n^{*(k)}}^{(k)}}\}$.

  – Calculate $S_{n^{*(k)}} = \sum_{j=1}^{n^{*(k)}} \{C_{n^{*(k)}}(u_{t_j^{(k)}}) - C_{\hat\theta}(u_{t_j^{(k)}})\}^2$.

• Estimate the p-value by $\sum_{k=1}^K 1(S_{n^{*(k)}} > S_{n^*})/K$.

Using the above GOF test procedure, we can then choose the copula that yields the largest p-value among all candidates. However, our numerical study shows that this method is not only time-consuming but also does not lead to better performance than the minimum distance method; see Section 2.3.5.

2.3 Simulation Study

In this section, we assess the finite-sample performance of the proposed method for censored time series data, in terms of the estimation of the copula parameter $\theta$ and the conditional quantiles.

The latent time series data $\{Y_t^*\}_{t=1}^n$ are generated from the Markov process with bivariate copula $C(u,v;\theta^*)$ and marginal distribution $G^*(\cdot)$. We generate the latent data $\{Y_t^*\}_{t=1}^n$ in the following steps.

Step 1. Generate an i.i.d. sequence from $\mathrm{Unif}(0,1)$, denoted $\{T_t\}_{t=1}^n$.
Step 2. Set $U_1 = T_1$ and $U_t = C^{-1}(T_t \mid U = U_{t-1})$ for $t = 2,\ldots,n$.
Step 3. Set $Y_t^* = G^{*-1}(U_t)$ for $t = 1,\ldots,n$.

We then obtain the censored response $Y_t = \max(Y_t^*, d)$ and the censoring indicator $\delta_t = I(Y_t^* > d)$, $t = 1,\ldots,n$. In the data generation, we consider four choices of copula functions — Gaussian, Gumbel, Clayton and Joe — and two choices of marginal distribution: normal and Student's $t_3$. For each case, to achieve stationarity in finite samples, we generate a time series of length 5000 and keep the last $n = 2000$ observations as our data, i.e., we use a burn-in sequence of length 3000. The burn-in removes the impact of the initial generated response. The detection limit $d$ is chosen to yield a 20% or 40% censoring proportion.

2.3.1 Estimation of the Copula Parameter

For each copula, we choose three $\theta^*$ values, resulting in three levels of Kendall's $\tau$: 0.3, 0.5 and 0.6. Recall that Kendall's $\tau$ is defined as the probability of concordance minus the probability of discordance (Nelsen, 2006). More specifically, let $(X_1, Y_1)$ and $(X_2, Y_2)$ be independent and identically distributed random vectors; then
$$\tau = P[(X_1 - X_2)(Y_1 - Y_2) > 0] - P[(X_1 - X_2)(Y_1 - Y_2) < 0].$$

We consider the following four cases.

Case 1: Gaussian copula with correlation coefficient θ = 0.5,0.7,0.8. The marginal distribution is N(0,1), and the censoring proportion (CP) is around 20% when the detection limit is d = −0.7, and around 40% when d = −0.2.

Case 2: Gumbel copula with copula parameter θ = 1.5,2,2.5. The marginal distribution

is the Student's $t_3$ distribution, and the censoring proportion (CP) is around 20% when the detection limit is $d = -1$, and around 40% when $d = -0.25$.

Case 3: Clayton copula with copula parameter θ = 1,2,3. The marginal distribution is

the Student's $t_3$ distribution, and the censoring proportion (CP) is around 20% when the detection limit is $d = -0.1$, and around 40% when $d = -0.3$.

Case 4: Joe copula with copula parameter θ = 1.9,2.9,3.8. The marginal distribution is

the Student's $t_3$ distribution, and the censoring proportion (CP) is around 20% when the detection limit is $d = -0.98$, and around 40% when $d = -0.28$.

For each scenario, the simulation is repeated 500 times. We compare the finite-sample performance of three estimators: (1) our proposed copula-based estimator for censored data (CopC); (2) the omniscient estimator, i.e., the copula-based estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$ (Omni); and (3) the naive estimator of Chen and Fan (2006) based on the observed data, which ignores the censoring (Naive). For the CopC estimator, we used B = 5000 samples for the importance sampling. Tables 2.1–2.4 summarize the estimation results for Cases 1–4, respectively.

For Case 1 with Gaussian copula, the proposed CopC estimator performs comparably with the omniscient estimator. The naive estimator clearly fails, with large biases. As

expected, the CopC estimator has larger MSE when the censoring proportion gets higher.

Table 2.1: Average Bias and Root Mean Squared Error (RMSE) of different estimators of θ for Case 1 with Gaussian copula and normal marginals.

                        CP = 0.2          CP = 0.4
                Omni    CopC    Naive    CopC    Naive
θ = 0.5 (τ = 0.3)
  Bias × 100   -0.01   -0.10   12.91    -0.07   24.37
               (0.09)  (0.10)  (0.10)   (0.10)  (0.08)
  RMSE × 10     0.21    0.22    1.31     0.23    2.44
               (0.01)  (0.01)  (0.01)   (0.01)  (0.01)
θ = 0.7 (τ = 0.5)
  Bias × 100   -0.15   -0.22    9.19    -0.19   14.57
               (0.08)  (0.08)  (0.08)   (0.09)  (0.06)
  RMSE × 10     0.17    0.19    0.93     0.20    1.46
               (0.01)  (0.01)  (0.01)   (0.01)  (0.01)
θ = 0.8 (τ = 0.6)
  Bias × 100   -0.24   -0.31    4.36    -0.30    8.87
               (0.06)  (0.07)  (0.07)   (0.08)  (0.06)
  RMSE × 10     0.15    0.16    0.46     0.17    0.90
               (0.00)  (0.01)  (0.01)   (0.01)  (0.01)

$\theta$: the copula parameter; $\tau$: Kendall's $\tau$ correlation coefficient; CP: censoring proportion. Omni: the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: the copula-based estimator for censored data; Naive: the naive estimator that ignores censoring.

For Case 2 with Gumbel copula, the naive estimator appears to be closely centered around a point away from the truth, yielding a systematic bias that does not vanish with larger samples. For instance, under 40% censoring, when we increase the sample size from 500 to 2000 for the Gumbel copula with true parameter 2, the bias of the CopC estimator decreases from -0.0303 to -0.0027, while the bias of the naive estimator increases from 0.5338 to 0.7748.

Table 2.2: Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 2 with Gumbel copula and $t_3$ marginals.

                        CP = 0.2          CP = 0.4
                Omni    CopC    Naive    CopC    Naive
θ = 1.5 (τ = 0.3)
  Bias × 100    0.08    0.17   14.37     0.13   53.13
               (0.19)  (0.20)  (0.28)   (0.22)  (0.43)
  RMSE × 10     0.43    0.44    1.56     0.48    5.40
               (0.01)  (0.01)  (0.03)   (0.02)  (0.04)
θ = 2 (τ = 0.5)
  Bias × 100   -0.40   -0.46   22.61    -0.27   77.48
               (0.38)  (0.41)  (0.53)   (0.45)  (0.77)
  RMSE × 10     0.86    0.91    2.55     1.01    7.94
               (0.03)  (0.03)  (0.06)   (0.03)  (0.08)
θ = 2.5 (τ = 0.6)
  Bias × 100   -2.46   -2.18   30.64    -1.93  100.32
               (0.60)  (0.66)  (0.82)   (0.74)  (1.18)
  RMSE × 10     1.36    1.48    3.57     1.67   10.37
               (0.05)  (0.05)  (0.09)   (0.06)  (0.13)

$\theta$: the copula parameter; $\tau$: Kendall's $\tau$ correlation coefficient; CP: censoring proportion. Omni: the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: the copula-based estimator for censored data; Naive: the naive estimator that ignores censoring.

Table 2.3: Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 3 with Clayton copula and $t_3$ marginals.

                        CP = 0.2          CP = 0.4
                Omni    CopC    Naive    CopC    Naive
θ = 1 (τ = 0.3)
  Bias × 100   -0.83    0.29   91.39     0.93  262.14
               (0.48)  (0.41)  (0.90)   (0.45)  (1.60)
  RMSE × 10     1.07    0.92    9.36     1.01   26.46
               (0.04)  (0.03)  (0.10)   (0.03)  (0.17)
θ = 2 (τ = 0.5)
  Bias × 100   -4.74   -0.34   95.79     0.52  292.61
               (1.19)  (1.03)  (1.93)   (1.08)  (3.17)
  RMSE × 10     2.70    2.31   10.50     2.41   30.11
               (0.10)  (0.09)  (0.22)   (0.09)  (0.36)
θ = 3 (τ = 0.6)
  Bias × 100  -11.18    1.39  112.64     2.17  336.86
               (2.18)  (2.03)  (3.66)   (1.93)  (5.54)
  RMSE × 10     5.00    4.53   13.92     4.31   35.89
               (0.18)  (0.19)  (0.44)   (0.18)  (0.62)

$\theta$: the copula parameter; $\tau$: Kendall's $\tau$ correlation coefficient; CP: censoring proportion. Omni: the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: the copula-based estimator for censored data; Naive: the naive estimator that ignores censoring.

For Case 3 with Clayton copula, the CopC estimator clearly outperforms the naive estimator in all scenarios considered. In this case, we observe an interesting phenomenon: the CopC estimator is not sensitive to low censoring proportions, and it gives even smaller MSE than the omniscient estimator under 20% and 40% censoring when n = 2000. To understand this better, we carry out a more detailed investigation in Section 2.3.2. Our investigation suggests the following two possible reasons. First, the variance of the copula-based estimators (for both the CopC and the omniscient estimator) is dominated by the nonparametric estimation of the marginal distribution $G^*(\cdot)$. If the marginal distribution is known, then the variance of CopC increases with the censoring proportion. Second, in the presence of censoring, the CopC estimator maximizes a likelihood that involves integration, and such smoothing of the likelihood leads to more stable estimation in finite samples.

Table 2.4: Average Bias and Root Mean Squared Error (RMSE) of different estimators of $\theta$ for Case 4 with Joe copula and $t_3$ marginals.

                        CP = 0.2          CP = 0.4
                Omni    CopC    Naive    CopC    Naive
θ = 1.9 (τ = 0.3)
  Bias × 100   -0.23   -0.25    5.78    -0.14   44.85
               (0.48)  (0.48)  (0.53)   (0.49)  (0.80)
  RMSE × 10     1.08    1.08    1.33     1.10    4.83
               (0.04)  (0.04)  (0.05)   (0.04)  (0.08)
θ = 2.9 (τ = 0.5)
  Bias × 100   -2.91   -2.88    9.62    -2.77   91.30
               (1.14)  (1.15)  (1.26)   (1.19)  (1.79)
  RMSE × 10     2.56    2.59    2.97     2.67    9.96
               (0.10)  (0.10)  (0.13)   (0.11)  (0.19)
θ = 3.8 (τ = 0.6)
  Bias × 100   -9.30   -9.31   13.52    -9.44  136.23
               (2.00)  (2.02)  (2.24)   (2.11)  (3.00)
  RMSE × 10     4.56    4.60    5.18     4.82   15.18
               (0.18)  (0.19)  (0.25)   (0.19)  (0.33)

$\theta$: the copula parameter; $\tau$: Kendall's $\tau$ correlation coefficient; CP: censoring proportion. Omni: the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: the copula-based estimator for censored data; Naive: the naive estimator that ignores censoring.

For Case 4 with Joe copula, when the censoring proportion increases, the MSE of the CopC estimator increases slowly. As in the Clayton copula case, the insensitivity of the CopC estimator to the censoring proportion is partly caused by the fact that the variance of the estimator is dominated by the nonparametric estimation of the marginal distribution; see more discussion in Section 2.3.2.

2.3.2 Investigation of Clayton Copula

In Section 2.3.1, we noticed that the CopC estimator has even smaller MSE than the omniscient estimator, and that its MSE does not increase when the censoring proportion goes up from 0.2 to 0.4. We carry out additional studies to investigate this seemingly strange phenomenon. The variance of the CopC estimator comes from three main sources: (1) the variation due to the nonparametric estimation of the marginal distribution $G(\cdot)$; (2) the variation of the copula parameter estimation; (3) the censoring effect. To study the effect of the nonparametric estimation of $G(\cdot)$ and the censoring, we consider an alternative copula-based estimator that is obtained by using the true marginal distribution $G^*(\cdot)$, referred to as CopC*.

For demonstration, we focus on the Clayton copula with θ = 2.

(i) Variation due to the nonparametric estimator $\hat{G}_n$

Table 2.5 summarizes the variances of the CopC and CopC* estimators across censoring proportions. We also present boxplots of the omniscient estimator (based on $\hat{G}_n(\cdot)$) and the CopC* estimator across different censoring proportions in Figure 2.1. The results suggest that the variance of the CopC estimator is dominated by the variation of $\hat{G}_n(\cdot)$, especially when the censoring proportion is low. When $G^*(\cdot)$ is known, the variance of the CopC* estimator increases with the censoring proportion as expected, but the increase is slow relative to the variance contributed by the nonparametric estimation of the marginals.

Table 2.5: Variance of the CopC and CopC∗ estimators for Case 3 with Clayton(θ = 2) copula and t3 marginal under left censoring.

CP                   0     0.2    0.3    0.4    0.5    0.6    0.7    0.8
Var(CopC) × 100    6.40   5.17   5.14   5.36   5.61   6.01   6.93   8.13
Var(CopC*) × 100   0.37   0.79   1.04   1.73   2.16   3.03   4.59   6.00

CP: censoring proportion; CopC: the proposed estimator with estimated marginal $\hat{G}_n$; CopC*: the proposed estimator with the true t3 marginal distribution.

Figure 2.1: Boxplots of the omniscient estimator and the CopC* estimator based on the true marginal distribution G∗(·), across different censoring proportions in Case 3 with Clayton(θ = 2) copula and t3 marginals. Omni: the omniscient estimator based on the nonparametric estimator $\hat{G}_n$.

(ii) Smoothing effect

Table 2.3 shows that under 20% and 40% censoring the CopC estimator has even smaller MSE than the omniscient estimator for the Clayton copula. Note that in the presence of censoring, the likelihood involves multivariate integration, which tends to smooth the likelihood surface and may consequently lead to more stable CopC estimation in finite samples. Under light censoring, the reduction in variation due to this smoothing offsets the loss of information due to censoring. This, together with the fact that the variance of the CopC estimator is dominated by the variance of $\hat{G}_n(\cdot)$, is the likely cause of the counter-intuitive behavior of the CopC estimator under light left censoring. However, as shown in Table 2.5, when the censoring proportion exceeds 40%, the variance of the CopC estimator clearly increases with the censoring proportion.

(iii) CopC and CopC* estimators under right censoring

To further understand the behavior of the proposed estimator in Case 3 with the Clayton copula, we carry out additional simulations under right censoring. Recall that the Clayton copula exhibits lower tail dependence but no upper tail dependence. Let X and Y be two continuous random variables with distribution functions F and G, respectively. Then the left (lower) tail dependence of X and Y is defined as

$$\lambda_L = \lim_{t \to 0^+} P\{Y \le G^{-1}(t) \mid X \le F^{-1}(t)\},$$

provided that the limit exists and the right (upper) tail dependence of X and Y is defined as

$$\lambda_U = \lim_{t \to 1^-} P\{Y > G^{-1}(t) \mid X > F^{-1}(t)\},$$

provided that the limit exists. More details about tail dependence can be found in Nelsen (2006).

For a given bivariate copula function C(·, ·), the left tail dependence can be obtained by

$$\lambda_L = \lim_{u \to 0^+} \frac{C(u,u)}{u},$$
provided that the limit exists, and the right tail dependence can be obtained by

$$\lambda_U = 2 - \lim_{u \to 1^-} \frac{1 - C(u,u)}{1 - u},$$
provided that the limit exists. For the Clayton copula with parameter θ, the left tail dependence is $\lambda_L = 2^{-1/\theta}$ and the right tail dependence is $\lambda_U = 0$. When θ = 1, 2, 3, the corresponding left tail dependence is approximately 0.50, 0.71 and 0.79, respectively.

Table 2.6 summarizes the simulation results of the omniscient and CopC estimators for censoring proportions of 0.2, 0.3, 0.4 and 0.5, and Table 2.7 presents the variances of the CopC and CopC* estimators under right censoring for the Clayton copula with θ = 2 across different censoring proportions. The results suggest that the MSEs of the CopC estimators are larger than those of the omniscient estimator, and they tend to increase with heavier censoring. In addition, compared to Table 2.5 for left censoring, the variance of the CopC estimator increases faster with the censoring proportion under right censoring. One possible explanation is as follows: under right censoring, our proposed method involves estimating G∗(·) at the left tail, whose variation is likely to be inflated by the positive left-tail dependence of the Clayton copula.
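The tail-dependence values quoted above are easy to check numerically. Below is a small R sketch, assuming only the closed-form Clayton CDF $C(u,v;\theta) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$; it is an illustration of the limit formulas, not part of the estimation procedure.

```r
# Numerical check of the Clayton tail-dependence coefficients
# using the closed-form CDF C(u, v; theta) = (u^(-theta) + v^(-theta) - 1)^(-1/theta)
clayton_cdf <- function(u, v, theta) (u^(-theta) + v^(-theta) - 1)^(-1/theta)

u <- 1e-6  # a point near the corner to approximate the limits
for (theta in c(1, 2, 3)) {
  lamL <- clayton_cdf(u, u, theta) / u                    # ~ lim C(u,u)/u
  lamU <- 2 - (1 - clayton_cdf(1 - u, 1 - u, theta)) / u  # ~ 2 - lim (1-C(u,u))/(1-u)
  cat(sprintf("theta = %d: lambda_L = %.3f (theory %.3f), lambda_U = %.3f\n",
              theta, lamL, 2^(-1/theta), lamU))
}
```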

Table 2.6: Average Bias and Mean Squared Error (MSE) of the omniscient and CopC estimators of θ for Case 3 with Clayton copula and t3 marginals under right censoring.

                     Omni   CP = 0.2   CP = 0.3   CP = 0.4   CP = 0.5
θ = 1 (τ = 0.3)
  Bias × 100        -1.08     -0.37      -0.28      -0.10      -0.02
  MSE × 100          1.09      1.09       1.10       1.15       1.21
θ = 2 (τ = 0.5)
  Bias × 100        -5.50     -5.72      -5.67      -5.61      -5.69
  MSE × 100          6.73      6.74       6.94       7.32       7.83
θ = 3 (τ = 0.6)
  Bias × 100       -12.87    -13.97     -14.07     -14.30     -14.99
  MSE × 100         22.73     23.18      23.93      25.56      27.20

CP: censoring proportion.

Table 2.7: Variance of the CopC and CopC* estimators for Clayton(θ = 2) copula and t3 marginal under right censoring.

CP                   0     0.2    0.3    0.4    0.5    0.6    0.7    0.8
Var(CopC) × 100    6.40   6.42   6.62   7.01   7.51   8.51   9.90  12.53
Var(CopC*) × 100   0.37   0.38   0.40   0.44   0.51   0.56   0.71   0.96

CP: censoring proportion; CopC: the proposed estimator with estimated marginal $\hat{G}_n$; CopC*: the proposed estimator with the true t3 marginal distribution.

2.3.3 Estimation of Conditional Quantiles

In this section, we assess the finite sample performance of $\hat{Q}_q(Y_{n+1} \mid I_n)$, the estimated conditional quantile of $Y_{n+1}$ given the historical information $I_n$. We consider the following three cases:

Case 1: Gaussian copula with correlation coefficient θ = 0.5, 0.7, 0.8. The marginal distribution is N(0, 1), and the censoring proportion is around 20% when the detection limit is d = −0.7, and around 40% when d = −0.2.

Case 2: Gumbel copula with copula parameter θ = 1.5, 2, 2.5. The marginal distribution is the Student t3 distribution, and the censoring proportion is around 20% when the detection limit is d = −1, and around 40% when d = −0.25.

Case 3: Clayton copula with copula parameter θ = 1, 2, 3. The marginal distribution is the Student t3 distribution, and the censoring proportion is around 20% when the detection limit is d = −1, and around 40% when d = −0.3.

For each scenario, the simulation is repeated 500 times. We compare the finite sample performance of three estimators at two quantile levels q = 0.5 and q = 0.9 with sample size n = 2000: (1) $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on our proposed copula-based estimator for censored data (CopC); (2) $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the omniscient estimator, i.e., the copula-based estimator of Chen and Fan (2006) using the latent data $\{Y_t^*\}_{t=1}^n$ (Omni); and (3) $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the naive estimator of Chen and Fan (2006) applied to the observed data, ignoring the censoring (Naive). For the CopC estimator, we used B = 5000 samples for the importance sampling.

Tables 2.8–2.10 summarize the root mean squared error (RMSE) of the omniscient, naive and CopC estimators of $Q_{0.5}(Y_{n+1} \mid I_n)$ for Cases 1–3, respectively. Note that the omniscient estimator of $Q_q(Y_{n+1} \mid I_n)$ is obtained by plugging the omniscient estimator of θ into the conditional quantile function of $Y_{n+1}$ given the observed data $I_n$; therefore, the omniscient estimator in Tables 2.8–2.10 depends on the observed censored data and thus varies across censoring proportions. As expected, in Case 1 the RMSE of the CopC estimator is larger than that of the omniscient estimator, but for both 20% and 40% censoring the CopC estimator is more efficient than the naive estimator. Table 2.8 also shows that the CopC estimator of

$Q_{0.5}(Y_{n+1} \mid I_n)$ has slightly smaller RMSE under 40% censoring than under 20% censoring. Note that under heavier censoring, the conditional median of $Y^*$ is more likely to fall below d, in which case the true conditional median of Y is the constant d, and this leads to smaller variation in the estimation of $Q_{0.5}(Y_{n+1} \mid I_n)$.

In Case 3 with Clayton copula, the CopC estimators are always more efficient than the naive estimators and comparable to the omniscient estimators. Table 2.10 shows that when θ = 3, $\hat{Q}_{0.5}(Y_{n+1} \mid I_n)$ based on CopC seems to outperform the omniscient estimator under both light and heavy censoring. Such outperformance is due to the smaller variation of the CopC estimator of θ; see the detailed discussion in Section 2.3.2.
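To make the construction of $\hat{Q}_q(Y_{n+1} \mid I_n)$ concrete, the following R sketch computes the conditional quantile in the Gaussian-copula case under simplifying assumptions: the most recent observation $Y_n$ is uncensored, and the rescaled empirical CDF stands in for $\hat{G}_n$. It uses the closed-form Gaussian conditional copula and is an illustration only, not the importance-sampling estimator used in the simulations.

```r
# Conditional quantile of Y_{n+1} given an uncensored last observation Y_n,
# under a Gaussian copula with correlation theta and detection limit d
cond_quantile <- function(y, d, theta, q) {
  n  <- length(y)
  un <- n / (n + 1) * ecdf(y)(y[n])   # rescaled probability transform of Y_n
  # invert C(v | u_n; theta) = q in closed form for the Gaussian copula
  v  <- pnorm(theta * qnorm(un) + sqrt(1 - theta^2) * qnorm(q))
  # map back through the censored marginal; values below G(d) return d
  max(d, as.numeric(quantile(y, probs = v, type = 1)))
}

# e.g. y <- pmax(rnorm(2000), -0.7); cond_quantile(y, -0.7, 0.5, 0.5)
```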

Table 2.8: Root mean squared error of $\hat{Q}_q(Y_{n+1} \mid I_n)$ at q = 0.5 and q = 0.9 for Case 1 with Gaussian copula and normal marginals based on different estimators of θ. Values are multiplied by 100.

                             CP = 0.2                    CP = 0.4
                   Omni     CopC    Naive       Omni     CopC    Naive
θ = 0.5 (τ = 0.3)
  q = 0.5          3.11     3.13     8.33       2.39     2.43    15.34
                  (0.12)   (0.12)  (0.37)      (0.13)   (0.12)  (0.74)
  q = 0.9          4.39     4.40    13.80       4.36     4.61    26.72
                  (0.16)   (0.17)  (0.41)      (0.17)   (0.18)  (0.54)
θ = 0.7 (τ = 0.5)
  q = 0.5          2.42     2.42     6.79       2.27     2.17    10.09
                  (0.16)   (0.14)  (0.35)      (0.17)   (0.14)  (0.53)
  q = 0.9          4.47     4.68    14.58       4.66     4.76    20.73
                  (0.21)   (0.24)  (0.30)      (0.20)   (0.21)  (0.32)
θ = 0.8 (τ = 0.6)
  q = 0.5          2.24     2.22     3.58       2.18     2.13     6.16
                  (0.15)   (0.15)  (0.21)      (0.15)   (0.16)  (0.32)
  q = 0.9          4.94     4.84     8.88       5.02     4.95    15.00
                  (0.38)   (0.31)  (0.23)      (0.33)   (0.30)  (0.21)

θ: the copula parameter; τ: Kendall's τ correlation coefficient; CP: censoring proportion. Omni: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the copula-based estimator for censored data; Naive: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the naive estimator that ignores censoring. q: the conditional quantile of interest.

Table 2.9: Root mean squared error of $\hat{Q}_q(Y_{n+1} \mid I_n)$ at q = 0.5 and q = 0.9 for Case 2 with Gumbel copula and t3 marginals based on different estimators of θ. Values are multiplied by 100.

                             CP = 0.2                    CP = 0.4
                   Omni     CopC    Naive       Omni     CopC    Naive
θ = 1.5 (τ = 0.3)
  q = 0.5          6.95     6.94    12.14       6.50     6.39    22.25
                  (1.53)   (1.53)  (1.93)      (1.66)   (1.55)  (1.77)
  q = 0.9         12.91    12.92    15.85      12.93    13.01    29.79
                  (3.09)   (3.09)  (2.49)      (3.09)   (3.07)  (1.39)
θ = 2 (τ = 0.5)
  q = 0.5          4.73     4.82     6.30       4.39     4.49    12.31
                  (0.92)   (0.90)  (0.42)      (0.99)   (1.03)  (0.65)
  q = 0.9         10.31    10.29    12.70      10.38    10.37    22.82
                  (1.41)   (1.41)  (1.12)      (1.40)   (1.39)  (0.34)
θ = 2.5 (τ = 0.6)
  q = 0.5          3.56     3.53     4.71       3.38     3.55     8.17
                  (0.65)   (0.65)  (0.68)      (0.68)   (0.68)  (0.66)
  q = 0.9          9.54     9.56    11.98       9.42     9.50    61.48
                  (1.12)   (1.13)  (0.86)      (1.13)   (1.12)  (22.5)

θ: the copula parameter; τ: Kendall's τ correlation coefficient; CP: censoring proportion. Omni: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the copula-based estimator for censored data; Naive: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the naive estimator that ignores censoring. q: the conditional quantile of interest.

Table 2.10: Root mean squared error of $\hat{Q}_q(Y_{n+1} \mid I_n)$ at q = 0.5 and q = 0.9 for Case 3 with Clayton copula and t3 marginals based on different estimators of θ. Values are multiplied by 100.

                             CP = 0.2                    CP = 0.4
                   Omni     CopC    Naive       Omni     CopC    Naive
θ = 1 (τ = 0.3)
  q = 0.5          3.42     3.25    14.70       2.90     2.73    26.22
                  (0.14)   (0.14)  (0.43)      (0.15)   (0.15)  (1.04)
  q = 0.9          8.04     7.41    35.32       8.52     7.61    60.71
                  (0.28)   (0.24)  (0.87)      (0.34)   (0.25)  (1.03)
θ = 2 (τ = 0.5)
  q = 0.5          3.51     2.81     9.39       3.28     2.53    19.44
                  (0.19)   (0.14)  (0.38)      (0.20)   (0.14)  (0.87)
  q = 0.9          9.92     8.56    25.52      10.24    10.09    47.55
                  (0.35)   (0.34)  (0.60)      (0.42)   (0.58)  (0.94)
θ = 3 (τ = 0.6)
  q = 0.5          4.17     3.11     7.67       4.02     2.64    16.22
                  (0.30)   (0.21)  (0.45)      (0.31)   (0.19)  (0.87)
  q = 0.9         10.69     8.98    21.06      11.05     9.82    40.64
                  (0.36)   (0.38)  (0.58)      (0.56)   (0.85)  (0.99)

θ: the copula parameter; τ: Kendall's τ correlation coefficient; CP: censoring proportion. Omni: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the omniscient estimator proposed by Chen and Fan (2006), based on the latent data $\{Y_t^*\}_{t=1}^n$; CopC: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the copula-based estimator for censored data; Naive: $\hat{Q}_q(Y_{n+1} \mid I_n)$ based on the naive estimator that ignores censoring. q: the conditional quantile of interest.

2.3.4 Selection of Copulas

In this section, we assess the performance of the proposed estimator of $Q_q(Y_{n+1} \mid I_n)$, the q-th conditional quantile of $Y_{n+1}$ at time n + 1 given the observed data, when the true copula function is unknown and is chosen from a class of candidate families. We consider Cases 1–3 with 40% censoring and θ0 corresponding to Kendall's τ = 0.3, and two quantile levels q = 0.5 and 0.9. For comparison, we include the following five estimators: Omni, CopC, CopC2, Naive and GIM. Omni, CopC and Naive are the copula-based approaches assuming the correct copula function and using the omniscient, proposed and naive estimators of θ0, respectively. CopC2 is the counterpart of CopC with the copula function selected from five candidate families (Gaussian, Clayton, Gumbel, Frank and Joe) using the minimal distance method described in Section 2.2.5; when an incorrect copula is selected, the quantile estimation from CopC2 corresponds to that from a misspecified model. GIM is the Gaussian-based imputation method of Park et al. (2007), which fits a censored AR(1) process by assuming that the latent time series is multivariate normal with an AR(1) structure.

Table 2.11 summarizes the copula selection results based on the minimal distance method. The results show that the minimal distance method works well for selecting the copula, and the selection accuracy increases with the sample size. Table 2.12 summarizes the RMSE of $\hat{Q}_q(Y_{n+1} \mid I_n)$ from the five methods. As expected, Omni is the most efficient estimator, as it is based on the correct copula function and the latent data without censoring. In all cases considered, both CopC and CopC2 are competitive with Omni, and they have smaller RMSEs than Naive and GIM, even when the copula was occasionally misspecified in CopC2.

Table 2.11: Frequencies of copulas chosen by the minimal distance method across 500 simulations in Cases 1–3 with 40% censoring and θ0 corresponding to τ = 0.3.

                            Selected copula
Case            Gaussian   Gumbel   Clayton   Joe   Frank
1. Gaussian        462        4        0       0     34
2. Gumbel           29      454        0      17      0
3. Clayton           1        0      487       0     12

Table 2.12: 10×RMSE of Qˆq(Yn+1 | In) from different methods at q = 0.5 and q = 0.9 in Cases 1-3 with 40% censoring and θ0 corresponding to τ = 0.3.

Case          q     Omni          CopC          CopC2         Naive          GIM
1. Gaussian  0.5   0.24 (0.01)   0.24 (0.01)   0.26 (0.01)   1.58 (0.08)    0.57 (0.02)
             0.9   0.43 (0.02)   0.45 (0.02)   0.47 (0.02)   2.76 (0.05)    0.51 (0.02)
2. Gumbel    0.5   0.65 (0.16)   0.64 (0.15)   0.76 (0.14)   2.22 (0.17)    1.58 (0.07)
             0.9   1.29 (0.28)   1.30 (0.28)   1.39 (0.27)   2.99 (0.13)    4.20 (0.27)
3. Clayton   0.5   0.29 (0.01)   0.27 (0.01)   0.32 (0.02)   2.62 (0.10)    2.04 (0.26)
             0.9   0.87 (0.04)   0.82 (0.04)   0.90 (0.05)  16.17 (7.07)    5.25 (0.30)

Omni, CopC and Naive: the copula-based estimators assuming the correct copula function and using the omniscient, proposed and naive estimators of θ0, respectively; CopC2: the counterpart of CopC with selected copula; GIM: the Gaussian-based imputation method of Park et al. (2007).

For further investigation, we include violin plots of the true and estimated quantiles across 500 simulations for n = 2000 in Figure 2.2. Each violin shows a box-plot in the middle and the kernel density estimate on both sides. The results show that the distributions of the quantile estimates from Omni, CopC and CopC2 have shapes similar to those of the true quantiles, and the similarity increases with the sample size. In contrast, the distribution of the naive estimates in all three cases, and that of GIM in Cases 2–3, have clearly different shapes from the truth. Even in Case 1, where the normality assumption in Park et al. (2007) is satisfied, the distribution of the GIM estimates shows a slight deviation from the truth at q = 0.5, which, together with the larger RMSE, is caused by the unstable estimation of the parameters in the AR model.

In terms of computing time, to obtain the estimates of both θ and the conditional quantiles for one simulation with 40% censoring, the Omni and CopC methods took on average 6 and 100 seconds for n = 2000 in Case 1 with Gaussian copula; 36 and 2000 seconds for n = 2000 in Case 2 with Gumbel copula; and 0.3 and 9 seconds for n = 2000 in Case 3 with Clayton copula. On average, CopC2 took 2509 seconds to complete the entire analysis for one simulation, including the selection of the copula among five candidates, and GIM took 2521 seconds. Even though the proposed method requires more computing time than GIM for smaller sample sizes, it provides more flexibility for analyzing non-Gaussian data. In addition, the proposed importance sampling algorithm greatly reduces the computational cost over standard integration algorithms, especially for larger sample sizes or data with heavy censoring. For instance, using the adaptive multi-dimensional integration algorithm in the R function "hcubature" in the package cubature would cost more than two hours to obtain the estimator $\hat\theta$ for one simulation in Case 3 with 20% censoring with a reduced sample size of n = 500, and even longer for 40% censoring.

2.3.5 Selection of Copula Based on the Goodness-of-Fit Test

In Section 2.2.6, we discuss an alternative approach to copula selection based on a bootstrapped goodness-of-fit (GOF) test (Genest et al., 2009). In this section, we conduct a simulation to compare the performance of the GOF and minimal distance methods for copula selection. The candidate copulas include the Gaussian, Clayton, Gumbel, Frank and Joe copulas. Due to the high computational cost of the GOF method caused by the bootstrap, we only include 64 simulations for Case 1 (Gaussian copula) with 40% censoring, θ0 = 0.5, and n = 500. For the GOF method, K = 100 was used. Table 2.13 presents the frequencies of the copulas chosen by the two methods among the 64 simulations. The results show that the minimal distance method of Section 2.3.4 chooses the correct Gaussian copula more often than the GOF method, and it is computationally more efficient.
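For intuition, a stripped-down version of the minimal distance selection can be written in a few lines of R. The sketch below compares the empirical copula of adjacent pairs with candidate families via closed-form CDFs; it ignores censoring, unlike the actual procedure of Section 2.2.5, and the fitted parameters passed in are assumed to come from a prior estimation step.

```r
# Candidate copula CDFs in closed form
C_clayton <- function(u, v, th) (u^(-th) + v^(-th) - 1)^(-1/th)
C_gumbel  <- function(u, v, th) exp(-((-log(u))^th + (-log(v))^th)^(1/th))

# Pick the family minimizing the squared distance to the empirical copula;
# u: probability transforms of the series; fits: named list of C(u, v) functions
min_dist_select <- function(u, fits) {
  n  <- length(u) - 1
  uu <- u[1:n]; vv <- u[2:(n + 1)]   # adjacent pairs (u_t, u_{t+1})
  C_emp <- sapply(seq_len(n), function(i) mean(uu <= uu[i] & vv <= vv[i]))
  dists <- sapply(fits, function(Cf) sum((C_emp - Cf(uu, vv))^2))
  names(which.min(dists))
}

# e.g. min_dist_select(u, list(clayton = function(a, b) C_clayton(a, b, 1),
#                              gumbel  = function(a, b) C_gumbel(a, b, 1.5)))
```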


Figure 2.2: Violin plots of the true and estimated quantile $Q_q(Y_{n+1} \mid I_n)$ for n = 2000 at q = 0.5 and 0.9 across 500 simulations in Cases 1–3 with 40% censoring and θ0 corresponding to τ = 0.3. Omni, CopC and Naive: the copula-based estimators assuming the correct copula function and using the omniscient, proposed and naive estimators of θ0, respectively; CopC2: the counterpart of CopC with selected copula; GIM: the Gaussian-based imputation method of Park et al. (2007).

Table 2.13: Frequencies of copulas chosen by the minimal distance (MD) and GOF methods across 64 simulations in Case 1 with 40% censoring, θ0 = 0.5 and n = 500. The truth is Gaussian copula.

                      Selected copula
Method    Gaussian   Gumbel   Clayton   Joe   Frank
MD           48         5        0       0     11
GOF          10         1       44       1      8

2.4 Large Sample Properties of the Estimator

Under Assumption A1 in Section 2.2.1, the latent time series $\{Y_t^*\}_{t=1}^n$ is strictly stationary, ergodic and β-mixing when it is generated via copulas satisfying the conditions in Proposition 2.1 of Chen and Fan (2006). Chen et al. (2009) further showed that these conditions are satisfied by commonly used copulas, including the Gaussian copula, the Clayton copula with 0 ≤ θ < ∞, the Gumbel copula with 1 ≤ θ < ∞, and the t-copula with |θ| < 1 and degrees of freedom satisfying 2 ≤ ν < ∞. Suppose that the latent series $\{Y_t^*\}$ satisfies A1 and is β-mixing with decay rate $\beta_t = O(t^{-b})$ for some b > 0. When there is no censoring, Lemma 4.1 of Chen and Fan (2006) shows that $\hat{G}_n(y)$ is uniformly consistent over $y \in \mathbb{R}$. Under fixed censoring, it is easy to show that the uniform consistency of $\hat{G}_n(y)$ still holds over y > d, which is what is needed to establish the asymptotic properties of $\hat\theta$.

For the latent time series $\{Y_t^*\}$, we define the set of segments as
$$\mathcal{S}^* = \big\{(y_1^*, y_2^*), \ldots, (y_{n-1}^*, y_n^*)\big\},$$
since the likelihood depends on the copula of adjacent paired variables. The latent log-likelihood can therefore be written as
$$\log\{L^*(\theta \mid y_1^*, y_2^*, \ldots, y_n^*)\} = \sum_{t=1}^{n} \log\{g^*(y_t^*)\} + \sum_{t=2}^{n} l_t^*(u, v; \theta), \qquad (2.9)$$
where $l_t^*(u,v;\theta) = \log[c\{G^*(y_{t-1}^*), G^*(y_t^*); \theta\}]$ for $t = 2, \ldots, n$. Furthermore, we define
$$S_t^*(u,v;\theta) = \frac{\partial}{\partial\theta}\, l_t^*(u,v;\theta) = \frac{\partial}{\partial\theta}\log c(u,v;\theta),$$
$$S_{t,\theta}^*(u,v;\theta) = \frac{\partial}{\partial\theta^T}\, S_t^*(u,v;\theta) = \frac{\partial^2}{\partial\theta\,\partial\theta^T}\log c(u,v;\theta),$$
$$S_{t,u}^*(u,v;\theta) = \frac{\partial}{\partial u}\, S_t^*(u,v;\theta) = \frac{\partial^2}{\partial\theta\,\partial u}\log c(u,v;\theta),$$
$$S_{t,v}^*(u,v;\theta) = \frac{\partial}{\partial v}\, S_t^*(u,v;\theta) = \frac{\partial^2}{\partial\theta\,\partial v}\log c(u,v;\theta).$$

For the observed censored time series, we define the set of segments
$$\mathcal{S} = \big\{(y_{t_1}, y_{t_1+1}), \ldots, (y_{t_{n^*}}, y_{t_{n^*}+1}),\ (y_{s_1}, D_1, y_{s_1+l_1+1}), \ldots, (y_{s_{K_n}}, D_{K_n}, y_{s_{K_n}+l_{K_n}+1})\big\},$$
where the first $n^*$ segments correspond to the pairs of adjacent uncensored observations, and the last $K_n$ segments correspond to the consecutively censored sequences together with their preceding and subsequent uncensored observations. The set $\mathcal{S}$ contains $n^* + K_n$ segments in total. The censored log-likelihood can be written as
$$\log\{L_n(\theta)\} = \sum_{j:\,\delta_j=1} \log\{g^*(y_j)\} + \sum_{t\in\{t_1,\ldots,t_{n^*},\, s_1,\ldots,s_{K_n}\}} l_t(u_t, v_t; \theta), \qquad (2.10)$$
where
$$l_t(u_t, v_t; \theta) = \begin{cases} \log\{c(u_{t_j}, u_{t_j+1}; \theta)\}, & t = t_j,\ j \in \{1,\ldots,n^*\},\\[0.5ex] \log\big\{\mathrm{Int}(u_{s_i}, u_{s_i+l_i+1}, \pi; \theta)\big\}, & t = s_i,\ i \in \{1,\ldots,K_n\}, \end{cases}$$
and
$$\mathrm{Int}(u_{s_i}, u_{s_i+l_i+1}, \pi; \theta) = \int_0^\pi \cdots \int_0^\pi c(u_{s_i}, u_{s_i+1}; \theta) \Big\{\prod_{t=s_i+2}^{s_i+l_i} c(u_{t-1}, u_t; \theta)\Big\} c(u_{s_i+l_i}, u_{s_i+l_i+1}; \theta)\, du_{s_i+1} \cdots du_{s_i+l_i}.$$
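To fix ideas, Int(·) can be approximated by plain Monte Carlo: draw the censored coordinates uniformly on $[0,\pi]^{l_i}$, average the product of pair-copula densities, and scale by the cube's volume. The R sketch below does this with the Clayton pair-copula density; it is a simple illustration rather than the sequential importance-sampling algorithm developed in Section 2.2.

```r
# Clayton pair-copula density
clayton_dens <- function(u, v, th)
  (1 + th) * (u * v)^(-th - 1) * (u^(-th) + v^(-th) - 1)^(-1/th - 2)

# Plain Monte Carlo approximation of Int(u_s, u_e, pi0; theta) with l
# consecutively censored observations between the endpoints u_s and u_e
Int_mc <- function(u_s, u_e, pi0, l, theta, B = 5000) {
  vals <- replicate(B, {
    path <- c(u_s, runif(l, 0, pi0), u_e)   # endpoints plus censored draws
    prod(clayton_dens(path[-length(path)], path[-1], theta))
  })
  pi0^l * mean(vals)                        # volume of [0, pi0]^l times mean
}

# e.g. Int_mc(0.7, 0.6, pi0 = 0.4, l = 2, theta = 2)
```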

We define the derivative of the log-likelihood with respect to θ for each segment as

$$S_t(u_t, v_t; \theta) = \begin{cases} \dfrac{\partial}{\partial\theta}\log c(u_t, v_t; \theta), & t = t_j,\ j \in \{1,\ldots,n^*\},\ u_t = G^*(y_t),\ v_t = G^*(y_{t+1}),\\[1ex] \dfrac{\partial}{\partial\theta}\log\big\{\mathrm{Int}(u_t, v_t, \pi; \theta)\big\}, & t = s_i,\ i \in \{1,\ldots,K_n\},\ u_t = G^*(y_t),\ v_t = G^*(y_{t+l_i+1}). \end{cases}$$

In addition,

define
$$S_{t,\theta}(u,v;\theta) = \frac{\partial}{\partial\theta} S_t(u,v;\theta), \qquad S_{t,u}(u,v;\theta) = \frac{\partial}{\partial u} S_t(u,v;\theta), \qquad S_{t,v}(u,v;\theta) = \frac{\partial}{\partial v} S_t(u,v;\theta).$$

Let $\mathcal{G}$ be the space of continuous probability distributions over the support of $Y_t^*$. For any $G \in \mathcal{G}$, define $\|G - G^*\|_{\mathcal{G}} = \sup_{y>d} |G(y) - G^*(y)|/w\{G^*(y)\}$, with $w(\cdot)$ satisfying the conditions in A6. Denote $\mathcal{G}_\varepsilon = \{G \in \mathcal{G} : \|G - G^*\|_{\mathcal{G}} \le \varepsilon\}$ for a small $\varepsilon > 0$, and denote $\mathcal{F}_\varepsilon = \{(\theta, G) \in \Theta \times \mathcal{G}_\varepsilon : \|\theta - \theta_0\| \le \varepsilon\}$. Besides Assumption A1 stated in Section 2.2.1, we make the following additional assumptions.

Assumption A2. (i) The true parameter $\theta_0 \in \Theta$, where $\Theta$ is compact. (ii) $E\{S_t^*(U_{t-1}, U_t; \theta)\} = 0$ if and only if $\theta = \theta_0$, where $U_t = G^*(Y_t^*)$.

Assumption A3. (i) $S_t^*(u,v;\theta)$ is well-defined for $(\theta,u,v) \in \Theta \times (0,1)^2$, and for all $\theta \in \Theta$, $S_t^*(U_{t-1}, U_t; \theta)$ is Lipschitz continuous at $\theta$ with probability one. (ii) $S_{t,u}^*(u,v;\theta)$ and $S_{t,v}^*(u,v;\theta)$ are well-defined and continuous in $(\theta,u,v) \in \Theta \times (0,1)^2$.

Assumption A4. $\{Y_t^* : t = 1,2,\ldots\}$ is β-mixing with mixing decay rate $\beta_t = O(t^{-b})$ for some $b > 1$.

Assumption A5. $E[\sup_{\theta\in\Theta} \|S_t^*(U_t, U_{t+1}; \theta)\| \log\{1 + \|S_t^*(U_t, U_{t+1}; \theta)\|\}] < \infty$.

Assumption A6. $\sup_{\theta\in\Theta,\, G\in\mathcal{G}_\varepsilon} E\{\|S_{t,u}^*(U_t, U_{t+1}; \theta)\, w(U_t)\|\} < \infty$ and $\sup_{\theta\in\Theta,\, G\in\mathcal{G}_\varepsilon} E\{\|S_{t,v}^*(U_t, U_{t+1}; \theta)\, w(U_{t+1})\|\} < \infty$, where the weight function $w(\cdot)$ is a continuous function on [0,1] that is strictly positive on (0,1), symmetric at $v_0 = 1/2$, and increasing on (0, 1/2].

Assumption A7. The interchange of differentiation and integration of $S_t^*(u,v;\theta)$ is valid.

Assumption A8. $E\{-S_{t,\theta}^*(U_t, U_{t+1}; \theta_0)\} = E\{-\partial^2 \log c(U_t, U_{t+1}; \theta_0)/\partial\theta\,\partial\theta^T\}$ exists and is positive definite.

Assumption A9. $\sup_{(\theta,G)\in\mathcal{F}_\varepsilon} E[\|S_{t,\theta}^*\{G(Y_t), G(Y_{t+1}); \theta\}\|]^2 < \infty$, and $S_{t,\theta}^*\{G(Y_t), G(Y_{t+1}); \theta\}$ is continuously differentiable in $\theta$ with derivative bounded above.

Assumption A10. The solution to $E\{\bar{S}_n(G^*, \theta)\} = 0$ is unique, where

$$\bar{S}_n(G, \theta) = n^{-1}\Big[\sum_{j=1}^{n^*} S_{t_j}\{G(Y_{t_j}), G(Y_{t_j+1}); \theta\} + \sum_{i=1}^{K_n} S_{s_i}\{G(Y_{s_i}), G(Y_{s_i+l_i+1}); \theta\}\Big]$$
$$= n^{-1}\frac{\partial}{\partial\theta}\Big\{\sum_{j=1}^{n^*} \log[c\{G(Y_{t_j}), G(Y_{t_j+1}); \theta\}] + \sum_{i=1}^{K_n} \log\big[\mathrm{Int}\{G(Y_{s_i}), G(Y_{s_i+l_i+1}), G(d); \theta\}\big]\Big\}.$$

Assumptions A2–A9 are standard in the analysis of Markov time series data; see Chen and Fan (2006). Assumption A10 ensures the identifiability of the true parameter under fixed censoring.

In addition, to study the asymptotic normality of $\hat\theta$, we introduce its infeasible counterpart based on the true marginal distribution $G^*(\cdot)$:
$$\tilde\theta = \arg\max_\theta \widetilde{LL}_n(\theta), \qquad \widetilde{LL}_n(\theta) = \sum_{j=1}^{n^*} \log\{c(u_{t_j}, u_{t_j+1}; \theta)\} + \sum_{i=1}^{K_n} \log\big\{\mathrm{Int}(u_{s_i}, u_{s_i+l_i+1}, \pi; \theta)\big\},$$
where $u_t = G^*(y_t)$ and $\pi = G^*(d)$. The following theorem provides the asymptotic properties of $\tilde\theta$ and $\hat\theta$. It is worth noting that, compared to the infeasible estimator $\tilde\theta$, the limiting representation of $\hat\theta$ contains an additional term $\Upsilon$ due to the nonparametric estimation of $G^*$ by $\hat{G}_n$.

Theorem 1. Suppose that Assumption A1 in Section 2.2.1 and Assumptions A2–A6 and A10 hold. Then $\|\hat\theta - \theta_0\| = o_p(1)$; if moreover A7–A9 hold, we have $\tilde\Gamma_n = \sqrt{n}(\tilde\theta - \theta_0) \xrightarrow{D} \tilde\Gamma \sim N(0, J^{-1})$, and $\hat\Gamma_n = \sqrt{n}(\hat\theta - \theta_0) \xrightarrow{D} \tilde\Gamma + \Upsilon$, with $\tilde\Gamma$ and $\Upsilon$ being jointly Gaussian, where $J$, $\tilde\Gamma$ and $\Upsilon$ are defined in (2.13), (2.14) and (2.15) in Section 2.5.3.

2.5 Technical Proofs

2.5.1 Lemma 1

Lemma 1. Under Assumptions A2–A7 and A10, we have

(i) $E\{\bar{S}_n(G^*, \theta)\} = 0$ if and only if $\theta = \theta_0$;

(ii) $\{Y_t : t = 1,2,\ldots\}$ is β-mixing with mixing decay rate $\beta_t' = O(t^{-b'})$ for some $b' \ge b$.

Proof of part (i). By Assumption A10, we only need to prove that $E\{\bar{S}_n(G^*, \theta_0)\} = 0$. We use $U_t$ to denote $G^*(Y_t)$ and $\pi$ to denote $G^*(d)$, where $G^*(\cdot)$ is the true marginal distribution. By Assumption A2(ii), $E\{S_t^*(U_{t-1}, U_t; \theta_0)\} = 0$, i.e., $E\{\partial \log c(U_{t-1}, U_t; \theta_0)/\partial\theta\} = 0$. We prove part (i) of Lemma 1 by induction.

We start from n = 3, and suppose that the observed data are $\{\{y_1,\delta_1\}, \{y_2,\delta_2\}, \{y_3,\delta_3\}\}$. There are $2^3 = 8$ possible censoring statuses of $\{u_1, u_2, u_3\}$, i.e., $u_t > \pi$ or not, or equivalently, $\delta_t = 1$ or 0, t = 1, 2, 3. Therefore, the score function has the form

$$\frac{\partial}{\partial\theta}\log f(\{u_1,\delta_1\},\{u_2,\delta_2\},\{u_3,\delta_3\};\theta)$$
$$= I(\delta_1{=}0,\delta_2{=}0,\delta_3{=}0)\,\frac{\partial}{\partial\theta}\log f_{000}(\theta) + I(\delta_1{=}1,\delta_2{=}0,\delta_3{=}0)\,\frac{\partial}{\partial\theta}\log f_{100}(u_1;\theta)$$
$$+ I(\delta_1{=}0,\delta_2{=}1,\delta_3{=}0)\,\frac{\partial}{\partial\theta}\log f_{010}(u_2;\theta) + I(\delta_1{=}0,\delta_2{=}0,\delta_3{=}1)\,\frac{\partial}{\partial\theta}\log f_{001}(u_3;\theta)$$
$$+ I(\delta_1{=}1,\delta_2{=}1,\delta_3{=}0)\,\frac{\partial}{\partial\theta}\log f_{110}(u_1,u_2;\theta) + I(\delta_1{=}1,\delta_2{=}0,\delta_3{=}1)\,\frac{\partial}{\partial\theta}\log f_{101}(u_1,u_3;\theta)$$
$$+ I(\delta_1{=}0,\delta_2{=}1,\delta_3{=}1)\,\frac{\partial}{\partial\theta}\log f_{011}(u_2,u_3;\theta) + I(\delta_1{=}1,\delta_2{=}1,\delta_3{=}1)\,\frac{\partial}{\partial\theta}\log f_{111}(u_1,u_2,u_3;\theta),$$
where
$$f_{000}(\theta) = \int_0^\pi\!\!\int_0^\pi\!\!\int_0^\pi c(u_1,u_2;\theta)\, c(u_2,u_3;\theta)\, du_1\, du_2\, du_3,$$
$$f_{100}(u_1;\theta) = \int_0^\pi\!\!\int_0^\pi c(u_1,u_2;\theta)\, c(u_2,u_3;\theta)\, du_2\, du_3,$$
and similarly for $f_{010}(u_2;\theta)$ and $f_{001}(u_3;\theta)$;
$$f_{110}(u_1,u_2;\theta) = \int_0^\pi c(u_1,u_2;\theta)\, c(u_2,u_3;\theta)\, du_3,$$
and similarly for $f_{101}(u_1,u_3;\theta)$ and $f_{011}(u_2,u_3;\theta)$; and
$$f_{111}(u_1,u_2,u_3;\theta) = c(u_1,u_2;\theta)\, c(u_2,u_3;\theta).$$

We take $\delta_1 = 1, \delta_2 = 0, \delta_3 = 1$ as an example to derive the expectation; the other cases are similar. Note that the conditional density of $\{U_1, U_3\} \mid \{U_1 > \pi, U_2 < \pi, U_3 > \pi\}$ is $f_{101}(u_1, u_3; \theta_0)/P(U_1 > \pi, U_2 < \pi, U_3 > \pi; \theta_0)$. Therefore,
$$E_{\{U_1,\delta_1\},\{U_2,\delta_2\},\{U_3,\delta_3\}}\Big\{I(\delta_1{=}1,\delta_2{=}0,\delta_3{=}1)\,\frac{\partial}{\partial\theta}\log f_{101}(U_1,U_3;\theta_0)\Big\}$$
$$= E\Big\{\frac{\partial}{\partial\theta}\log f_{101}(U_1,U_3;\theta_0)\ \Big|\ U_1>\pi, U_2<\pi, U_3>\pi\Big\}\, P(U_1>\pi, U_2<\pi, U_3>\pi;\theta_0)$$
$$= \int_\pi^1\!\!\int_\pi^1 \frac{\partial f_{101}(u_1,u_3;\theta_0)/\partial\theta}{f_{101}(u_1,u_3;\theta_0)}\, f_{101}(u_1,u_3;\theta_0)\, du_1\, du_3$$
$$= \frac{\partial}{\partial\theta}\int_\pi^1\!\!\int_\pi^1\!\!\int_0^\pi c(u_1,u_2;\theta_0)\, c(u_2,u_3;\theta_0)\, du_2\, du_1\, du_3 = \frac{\partial}{\partial\theta}\, P(U_1>\pi, U_2<\pi, U_3>\pi;\theta_0).$$

Hence,

$$E_{\{U_1,\delta_1\},\{U_2,\delta_2\},\{U_3,\delta_3\}}\Big[\frac{\partial}{\partial\theta}\log f(\{U_1,\delta_1\},\{U_2,\delta_2\},\{U_3,\delta_3\};\theta_0)\Big] = \frac{\partial}{\partial\theta}\, 1 = 0,$$

and this proves the result for n = 3.

Now assuming that $E\{\bar{S}_n(G^*, \theta_0)\} = 0$ holds for $n = n_0$, it is then easy to show that
$$E_{\{U_1,\delta_1\},\ldots,\{U_{n_0+1},\delta_{n_0+1}\}}\Big[\frac{\partial}{\partial\theta}\log f(\{U_1,\delta_1\},\ldots,\{U_{n_0+1},\delta_{n_0+1}\};\theta_0)\Big] = \frac{\partial}{\partial\theta}\big\{P(U_{n_0+1} > \pi) + P(U_{n_0+1} \le \pi)\big\} = 0.$$

The proof is complete.

Proof of part (ii). The latent sequence $\{Y_t^*, t = 1, \ldots\}$ is a sample of a stationary first-order Markov process and is β-mixing. Hence we have
$$\beta^*(a) = \sup_k \big\| P_1^{*k} \otimes P_{k+a}^{*\infty} - P_{k,k+a}^* \big\|_{TV} \xrightarrow{a\to\infty} 0, \qquad (2.11)$$
where $\|\cdot\|_{TV}$ is the total variation norm, $P_1^{*k}$ and $P_{k+a}^{*\infty}$ are the distributions of $\{Y_t^*\}_{t=1}^k$ and $\{Y_t^*\}_{t=k+a}^\infty$, and $P_{k,k+a}^*$ is their joint distribution. We want to show that (2.11) still holds for $\{Y_t, t = 1, \ldots\}$, where $Y_t = \max(Y_t^*, d)$.

Note that $Y_t = Y_t^*\, I(Y_t^* \ge d) + d\, I(Y_t^* < d)$; thus
$$\sup_k \|P_1^k \otimes P_{k+a}^\infty - P_{k,k+a}\|_{TV} \le \sup_k \|P_1^{*k} \otimes P_{k+a}^{*\infty} - P_{k,k+a}^*\|_{TV}\, I(Y_t^* \ge d \text{ for all } t = 1,\ldots,k, k+a, \ldots)$$
$$+ \sup_k \|P_1^{\prime k} \otimes P_{k+a}^{\prime\infty} - P_{k,k+a}'\|_{TV}\, I(Y_t^* < d \text{ for some } t = 1,\ldots,k, k+a, \ldots),$$
where $P'$ denotes the truncated joint distribution of $\{Y_t^*\}$ with at least one of the $Y_t^*$'s censored at d. Clearly, $P'$ has smaller total variation than $P^*$ due to the censoring. By the facts that
$$\sup_k \|P_1^{*k} \otimes P_{k+a}^{*\infty} - P_{k,k+a}^*\|_{TV}\, I(Y_t^* \ge d \text{ for all } t = 1,\ldots,k, k+a, \ldots) \le \sup_k \|P_1^{*k} \otimes P_{k+a}^{*\infty} - P_{k,k+a}^*\|_{TV},$$
$$\sup_k \|P_1^{\prime k} \otimes P_{k+a}^{\prime\infty} - P_{k,k+a}'\|_{TV}\, I(Y_t^* < d \text{ for some } t = 1,\ldots,k, k+a, \ldots) \le \sup_k \|P_1^{*k} \otimes P_{k+a}^{*\infty} - P_{k,k+a}^*\|_{TV},$$
we have $\beta(a) = \sup_k \|P_1^k \otimes P_{k+a}^\infty - P_{k,k+a}\|_{TV} \xrightarrow{a\to\infty} 0$ for $Y_t = \max(Y_t^*, d)$. That is, under Assumption A4, the censored sequence $\{Y_t = \max(Y_t^*, d), t = 1, \ldots\}$ is β-mixing, and the mixing decay rate of $\{Y_t\}$ is at least as fast as that of $\{Y_t^*\}$.

2.5.2 Proof of Theorem 1 (Consistency of $\hat\theta$)

Recall that

$$\bar{S}_n(G, \theta) = n^{-1}\Big[\sum_{j=1}^{n^*} S_{t_j}\{G(Y_{t_j}), G(Y_{t_j+1}); \theta\} + \sum_{i=1}^{K_n} S_{s_i}\{G(Y_{s_i}), G(Y_{s_i+l_i+1}); \theta\}\Big].$$
By part (i) of Lemma 1, it is easy to verify that
$$\hat\theta_n = \arg\min_{\theta\in\Theta}\big\{Q_n(\theta) = \bar{S}_n(\hat{G}_n,\theta)^T\, \bar{S}_n(\hat{G}_n,\theta)\big\},$$
$$\theta_0 = \arg\min_{\theta\in\Theta}\big\{Q(\theta) = E\{\bar{S}_n(G^*,\theta)\}^T\, E\{\bar{S}_n(G^*,\theta)\}\big\}.$$

Hence it suffices to show that $\sup_{\theta\in\Theta}\|\bar{S}_n(\hat{G}_n,\theta) - E\{\bar{S}_n(G^*,\theta)\}\| = o_p(1)$. For ease of presentation, we introduce the index $t'$ to represent the end point of each segment in $\mathcal{S}$: $t' = t_j + 1$ if the start point is $t = t_j$, $j = 1,\ldots,n^*$, and $t' = s_i + l_i + 1$ if the start point is $t = s_i$, $i = 1,\ldots,K_n$. Therefore, $\bar{S}_n(G,\theta) = n^{-1}\sum_{t = t_j \text{ or } s_i} S_t\{G(Y_t), G(Y_{t'});\theta\}$. Note that
$$\sup_{\theta\in\Theta}\|\bar{S}_n(\hat{G}_n,\theta) - E\{\bar{S}_n(G^*,\theta)\}\|$$
$$= \sup_{\theta\in\Theta}\Big\| n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\theta\} - n^{-1}\sum_{t=t_j \text{ or } s_i} E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}]\Big\|$$
$$\le n^{-1}\sum_{t=t_j \text{ or } s_i}\sup_{\theta\in\Theta}\big\|S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}\big\|$$
$$+ \sup_{\theta\in\Theta}\Big\| n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - n^{-1}\sum_{t=t_j \text{ or } s_i} E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}]\Big\|.$$
Define

$$\nabla S_t\{G(Y_t),G(Y_{t'});\theta\} = \big[S_{t,u}\{G(Y_t),G(Y_{t'});\theta\},\ S_{t,v}\{G(Y_t),G(Y_{t'});\theta\}\big],$$
$$\|\nabla S_t\{G(Y_t),G(Y_{t'});\theta\}\| = \|S_{t,u}\{G(Y_t),G(Y_{t'});\theta\}\|\, w\{G(Y_t)\} + \|S_{t,v}\{G(Y_t),G(Y_{t'});\theta\}\|\, w\{G(Y_{t'})\}.$$
By Lemma 1(ii) and Assumptions A2 and A6, we have
$$\sup_{\theta\in\Theta}\|S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}\| = \sup_{\theta\in\Theta}\big\|\nabla S_t\{G(Y_t),G(Y_{t'});\theta\}\cdot\{\hat{G}_n(Y_t)-G^*(Y_t),\ \hat{G}_n(Y_{t'})-G^*(Y_{t'})\}^T\big\| \le \sup_{\theta\in\Theta}\|\nabla S_t\{G(Y_t),G(Y_{t'});\theta\}\|\cdot\|\hat{G}_n - G^*\|_{\mathcal{G}},$$
and hence
$$\sup_{\theta\in\Theta} n^{-1}\sum_{t=t_j \text{ or } s_i}\|S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}\| \le \Big[n^{-1}\sum_{t=t_j \text{ or } s_i}\|\nabla S_t\{G(Y_t),G(Y_{t'});\theta\}\|\Big]\times\|\hat{G}_n - G^*\|_{\mathcal{G}} = o_p(1).$$

It remains to show that

$$\sup_{\theta\in\Theta}\Big\| n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - n^{-1}\sum_{t=t_j \text{ or } s_i} E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}]\Big\| = o_p(1). \qquad (2.12)$$

For fixed t, under Assumptions A2(i) and A3(i), we know that for any $\varepsilon_t > 0$ there exist $\gamma > 0$ and a finite integer m such that $\{\theta_1,\cdots,\theta_m\}$ forms a γ-covering of Θ, and
$$\sup_{\theta\in\Theta,\,\|\theta-\theta_i\|\le\gamma}\|S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\}\| \le \varepsilon_t,$$
$$\sup_{\theta\in\Theta,\,\|\theta-\theta_i\|\le\gamma}\|E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\}]\| \le \varepsilon_t.$$
Therefore,
$$\sup_{\theta\in\Theta,\,\|\theta-\theta_i\|\le\gamma}\Big\|n^{-1}\sum_{t=t_j \text{ or } s_i}[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\}]\Big\| \le \max_t \varepsilon_t \doteq \check\varepsilon,$$
$$\sup_{\theta\in\Theta,\,\|\theta-\theta_i\|\le\gamma}\Big\|n^{-1}\sum_{t=t_j \text{ or } s_i} E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\}]\Big\| \le \check\varepsilon,$$
and then
$$\sup_{\theta\in\Theta,\,\|\theta-\theta_i\|\le\gamma}\Big\|n^{-1}\sum_{t=t_j \text{ or } s_i}\big(S_t\{G^*(Y_t),G^*(Y_{t'});\theta\} - E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta\}]\big) - n^{-1}\sum_{t=t_j \text{ or } s_i}\big(S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\} - E[S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\}]\big)\Big\| \le 2\check\varepsilon.$$
By A5, Lemma 1(ii), and Theorem 1 and Application 5 of Rio (1995), we have
$$\max_{1\le i\le m}\Big\|n^{-1}\sum_{t=t_j \text{ or } s_i}\big[S_t\{G^*(Y_t),G^*(Y_{t'});\theta_i\} - E\{S_t(G^*(Y_t),G^*(Y_{t'});\theta_i)\}\big]\Big\| = o_p(1).$$

This proves (2.12) and consequently the consistency of $\hat\theta$.

2.5.3 Proof of Theorem 1 (Asymptotic Normality)

Recall that $\tilde\theta = \arg\max_\theta \widetilde{LL}_n(\theta)$, where
$$\widetilde{LL}_n(\theta) = \sum_{j=1}^{n^*}\log\{c(u_{t_j},u_{t_j+1};\theta)\} + \sum_{i=1}^{K_n}\log\big\{\mathrm{Int}(u_{s_i},u_{s_i+l_i+1},\pi;\theta)\big\}.$$
Therefore, $\tilde\theta = \arg\min_{\theta\in\Theta}\big\{\tilde{Q}(\theta) = \bar{S}_n(G^*,\theta)^T\,\bar{S}_n(G^*,\theta)\big\}$. Following similar steps as in Section 2.5.2, we have $\|\tilde\theta - \theta_0\| = o_p(1)$. By A9, we have

$$n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\tilde\theta\} - n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\theta_0\} = \Big[n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,\theta}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\Big](\tilde\theta - \theta_0) + r_{\bar\theta},$$
where the remainder term $r_{\bar\theta} = n^{-1}\sum_{t=t_j \text{ or } s_i} \frac{\partial}{\partial\theta} S_{t,\theta}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\,(\bar\theta - \theta_0)^2 = o_p(\|\bar\theta - \theta_0\|)$, with $\bar\theta$ between $\tilde\theta$ and $\theta_0$. Therefore,
$$n^{1/2}(\tilde\theta - \theta_0) = -\Big[n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,\theta}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\Big]^{-1} n^{-1/2}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\theta_0\} + o_p(1).$$
By the Central Limit Theorem, we have
$$\tilde\Gamma_n = n^{1/2}(\tilde\theta - \theta_0) \xrightarrow{D} \tilde\Gamma \sim N(0, J^{-1}), \qquad (2.13)$$

where $J = -\lim_{n\to\infty} \partial \bar{S}_n(G^*,\theta_0)/\partial\theta^T$. For $\hat\Gamma_n = n^{1/2}(\hat\theta_n - \theta_0)$, we decompose $\hat\Gamma_n = \tilde\Gamma_n + \Delta_n$, where $\Delta_n = n^{1/2}(\hat\theta_n - \tilde\theta)$. Note that
$$n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\hat\theta\} - n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\tilde\theta\}$$
$$= n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,\theta}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\,(\hat\theta - \tilde\theta) + n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,u}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\,\{\hat{G}_n(Y_t) - G^*(Y_t)\}$$
$$+ n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,v}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\,\{\hat{G}_n(Y_{t'}) - G^*(Y_{t'})\} + R_{\bar{G},\bar\theta},$$
where the remainder term
$$R_{\bar{G},\bar\theta} = n^{-1}\sum_{t=t_j \text{ or } s_i}\big[\{\hat{G}_n(Y_t) - G^*(Y_t)\},\{\hat{G}_n(Y_{t'}) - G^*(Y_{t'})\},(\hat\theta - \tilde\theta)\big]^T H_{\bar{G},\bar\theta}\,\big[\{\hat{G}_n(Y_t) - G^*(Y_t)\},\{\hat{G}_n(Y_{t'}) - G^*(Y_{t'})\},(\hat\theta - \tilde\theta)\big],$$
with $H_{\bar{G},\bar\theta}$ the second-order Hessian matrix of $S_t\{G(Y_t),G(Y_{t'});\theta\}$ evaluated at some $\bar{G}$ between $\hat{G}_n$ and $G^*$ and some $\bar\theta$ between $\hat\theta$ and $\tilde\theta$; therefore $R_{\bar{G},\bar\theta} = o_p(\|\hat\theta - \tilde\theta\|)$. Also, by the definitions of $\hat\theta$ and $\tilde\theta$,
$$n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{\hat{G}_n(Y_t),\hat{G}_n(Y_{t'});\hat\theta\} = o_p(n^{-1/2}) \quad\text{and}\quad n^{-1}\sum_{t=t_j \text{ or } s_i} S_t\{G^*(Y_t),G^*(Y_{t'});\tilde\theta\} = o_p(n^{-1/2}).$$
Then we have
$$\Delta_n = n^{1/2}(\hat\theta - \tilde\theta) = -\Big[n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,\theta}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\Big]^{-1}\Big[n^{-1/2}\sum_{t=t_j \text{ or } s_i} S_{t,u}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\{\hat{G}_n(Y_t) - G^*(Y_t)\} + n^{-1/2}\sum_{t=t_j \text{ or } s_i} S_{t,v}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\{\hat{G}_n(Y_{t'}) - G^*(Y_{t'})\}\Big] + o_p(1).$$

By part (2) of Lemma 4.1 in Chen and Fan (2006), we know that
$$n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,u}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\times w\{G^*(Y_t)\}\times n^{1/2}\sup_{y>d}\frac{\hat{G}_n(y) - G^*(y)}{w\{G^*(y)\}} = O_p(1).$$
So we have $\Delta_n \xrightarrow{D} \Upsilon$, where
$$\Upsilon = J^{-1}\lim_{n\to\infty} n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,u}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\times n^{1/2}\{\hat{G}_n(Y_t) - G^*(Y_t)\} + J^{-1}\lim_{n\to\infty} n^{-1}\sum_{t=t_j \text{ or } s_i} S_{t,v}\{G^*(Y_t),G^*(Y_{t'});\theta_0\}\times n^{1/2}\{\hat{G}_n(Y_{t'}) - G^*(Y_{t'})\}, \qquad (2.14)$$

and ϒ follows some multivariate normal distribution according to the Central Limit Theorem.

By Theorem 1 of Rémillard et al. (2012), $\tilde\Gamma_n$ and $\Delta_n$ are jointly Gaussian, and
$$\hat\Gamma_n = \tilde\Gamma_n + \Delta_n \xrightarrow{D} \hat\Gamma = \tilde\Gamma + \Upsilon \sim N(0, \Psi), \qquad (2.15)$$
for some covariance matrix $\Psi$. ∎

2.6 Analysis of a Water Quality Data Set

We apply the proposed method to analyze dissolved ammonia in water quality data from the Susquehanna River Basin in the United States. Ammonia is the end-product of protein metabolism and is considered a sanitary pollutant. Continuous monitoring of ammonia in water is important for plant operations and process control. The dissolved ammonia data were observed biweekly in the Susquehanna River at Towanda, PA, from 1988 to 2014, consisting of 524 observations, which are available from the Susquehanna River Basin Commission (SRBC) database (http://www.srbc.net/programs/cbp/sites/Susquehanna_River_At_Towanda.html). Due to the detection limit, 194 (37.02%) of the observations are left-censored at 0.02 mg/L. Figure 2.3(a) plots the observed time series of dissolved ammonia on the original scale, and Figure 2.3(b) presents the normal Q-Q plot of log-transformed ammonia above the detection limit. The plot shows that, even after the logarithm transformation, the marginal distribution of ammonia still departs clearly from normality in the right tail. Our proposed method therefore has potential advantages for analyzing these data, since it relaxes the Gaussian assumption and can be applied to the untransformed data.

Before applying the proposed method, we first checked the Markov assumption using a Chi-squared test based on the discretization method (Bickenbach and Bode, 2001; Wallis, 2001). Specifically, we discretized the observed data into six bins by putting the censored data into one bin, and then applied the Chi-squared test to check the Markov property. The test resulted in a p-value of 0.28, suggesting no significant violation of the Markov order-one assumption.

To assess the one-step prediction, we analyzed the data using the first n = 523 observations, and estimated the conditional quantiles of $Y_{n+1}$ given the historical data. For the proposed method, B = 5000 was used for approximating the multi-dimensional integral. Following the procedure in Section 2.2.5, we chose the copula function among five copula

Figure 2.3: (a) observed time series of dissolved ammonia $Y_t$ in the Middle Susquehanna from 1988 to 2014, where d = 0.02 is the detection limit; (b) Q-Q plot of log-transformed ammonia above log(d); (c) estimated conditional quantiles of $Y_{n+1}$ (curve with solid circles) and the 95% pointwise confidence band from the proposed method, and estimated conditional quantiles of $Y_{n+1}^*$ from the Gaussian-based imputation method of Park et al. (2007) (curve with open circles).

families: Gaussian, Clayton, Gumbel, Frank and Joe. The Joe copula led to the smallest distance from the empirical copula and was thus chosen. The parameter in the Joe copula was estimated as $\hat\theta = 1.45$, corresponding to a Kendall's τ coefficient of 0.20. For comparison, we also included the imputation method proposed by Park et al. (2007), which assumes that the latent log-transformed ammonia $\log(Y_t^*)$ follows an AR(1) model with Gaussian white noise; the resulting estimates of the autoregressive coefficient, mean and variance parameters are 0.38, −3.44 and 0.58, respectively. For this data set, $Y_n = 0.04$ is above the detection limit, so the conditional quantile of $Y_{n+1}$ depends only on $Y_n$ under the model assumptions of both methods. Figure 2.3(c) presents the estimated conditional quantiles of $Y_{n+1}$ (subject to the detection limit), together with the 95% pointwise confidence band from the proposed method obtained through parametric bootstrap, and the estimated conditional quantiles of $Y_{n+1}^*$ (the latent response) from the imputation method. The conditional median serves as a natural and robust choice for a point forecast at a future time point. Our proposed method estimates the conditional median of $Y_{n+1}$ as 0.04, which is comparable to the observed value of 0.06. In contrast, the Gaussian-based imputation method gives an under-prediction of 0.01, and this underestimation is consistent across quantile levels. Our proposed method suggests that $P(Y_{n+1} \ge 0.06 \mid I_n) \approx 20\%$, while the imputation method suggests that the chance of exceedance is only 0.007, making 0.06 a rare event; such a conclusion is doubtful considering that 27% of the ammonia observations in this data set exceed 0.06.

For a more thorough comparison, we conduct a moving-window cross-validation study.

We first form training data of length l, $\{Y_t\}_{t=s}^{s+l-1}$, where the first observation $Y_s$ is uncensored. We then fit the candidate model using the training data to estimate the model parameters, and consequently obtain the estimated conditional probability $\hat{p}_s = \hat{\Pr}(Y_{s+l} \le y_{s+l} \mid \{Y_t\}_{t=s}^{s+l-1})$. Repeating this procedure until s + l = n gives a set of $\hat{p}_s$'s. When the model fits the data well, we would expect $\hat{p}_s$ to be uniformly distributed on $(\hat{w}_s, 1]$, where $\hat{w}_s = \hat{\Pr}(Y_{s+l} \le d \mid \{Y_t\}_{t=s}^{s+l-1})$; that is, $\hat{v}_s = (\hat{p}_s - \hat{w}_s)/(1 - \hat{w}_s)$ should be uniformly distributed on [0, 1]. In our implementation, we let l = 400, which yielded 51 $\hat{v}_s$'s for each method.
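A minimal R sketch of this check, assuming vectors p_hat and w_hat have already been produced by the moving-window model fits (the fitting loop itself is omitted):

```r
# Rescale the estimated conditional probabilities and test for uniformity:
# v_hat should look Uniform[0, 1] when the model fits the data well
check_uniformity <- function(p_hat, w_hat) {
  v_hat <- (p_hat - w_hat) / (1 - w_hat)
  hist(v_hat, breaks = 10, main = "Scaled conditional probabilities")
  ks.test(v_hat, "punif")   # Kolmogorov-Smirnov test against U(0, 1)
}
```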

Figure 2.4 plots the histograms of the $\hat{v}_s$'s obtained from the two methods. The histogram corresponding to the copula-based method is much closer to uniform. We further conduct the Kolmogorov–Smirnov test to assess the uniformity of the $\hat{v}_s$'s; the test gives p-values of 0.59 and 1.08 × 10⁻¹⁴ for the copula-based and GIM methods, respectively, confirming that the proposed copula-based model provides a better fit to the ammonia data.

Figure 2.4: Histograms of the scaled estimated conditional probabilities from the proposed copula method (CopC) and the Gaussian-based imputation method (GIM) of Park et al. (2007) for the cross-validation study.

2.7 Conclusion

In this chapter, we propose a semiparametric approach for analyzing censored time series, using a copula to capture the temporal dependence. The proposed method is more flexible than existing ones that rely on parametric distributional assumptions on the disturbances. The developed sequential importance sampling provides a convenient tool to handle the challenging multiple integration associated with the censored data.

Similar to Joe (1994), Chen and Fan (2006), and Rémillard et al. (2012), our proposed estimation method is a two-step approach: the marginal distribution $G^*$ is estimated nonparametrically by $\hat{G}_n$ in the first step, and this estimate is then used to construct a pseudo likelihood for estimating the copula parameter in the second step. To improve efficiency, one could combine the idea of the proposed algorithm with the sieve maximum likelihood estimation approach of Chen et al. (2009), approximating the marginal density with sieves and estimating the copula parameter and sieve parameters jointly.

Chapter 3

Copula-Based Analysis of Multisite Daily Precipitation

3.1 Introduction

Precipitation poses numerous challenges to stochastic weather models and generators, since it is temporally intermittent, spatially dependent, and exhibits a highly skewed distribution. Modeling precipitation at multiple sites is important for hydrology and agricultural studies, because precipitation predictions are often used as inputs to other weather models and generators. One option is statistical downscaling, which aims to derive a statistical relationship between global climate model output and local observations; see Uppala et al. (2005) and Wang et al. (2012). In this chapter, we use coarser-resolution predictor variables generated from a global climate model (Uppala et al., 2005). Our aim is to utilize such global climate variables to predict or interpolate local precipitation in the greater Chicago area.

Two main challenges exist in precipitation analysis. First, observed precipitation data are usually highly sparse and contain many zero responses. Moreover, the distribution of the non-zero response, i.e., the positive rain amount, is frequently skewed. To account for the mixed distribution and the point mass at zero, several non-Gaussian processes have been proposed. For instance, Bardossy and Plate (1992) proposed a power-transformed truncated normal method, in which the precipitation is assumed to come from an underlying latent truncated Gaussian process after a power transformation. One drawback of this method is that it cannot be generalized to make predictions at new locations where no historical data are available. Berrocal et al. (2008) presented a two-stage Gaussian process method to model precipitation, with one stage for rain occurrence and the other for rain amount: marginally, rain occurrence is modeled by a probit regression and rain amount by a gamma regression, the spatial dependence is estimated by two separate processes, and the latent Gaussian process is transformed into precipitation data through a skewed, non-Gaussian distribution. Sloughter et al. (2007) and Kleiber et al. (2012) specified the raining probability via a binary regression and modeled the wet amount via a gamma regression at each location; the spatial dependence is then captured by two separate Gaussian processes. Sun and Stein (2015) considered a t random field to model latent spatial rainfall data, and employed logistic regression for precipitation occurrence. Ben Alaya et al. (2015) considered a mixture of Bernoulli-generalized Pareto regressions to account for the rain amount process: at each location, the distribution of precipitation is built from Bernoulli and generalized Pareto distributions, the parameters of this mixed distribution are estimated through regressions at each location, and a single Gaussian process accounts for the spatial dependence.

In this chapter, we propose a copula-based model for analyzing multisite daily precipitation. By using a copula function, we can separate the dependence across multiple locations from the marginal distribution at each location. In our proposed model, a multivariate Gaussian copula captures the covariance of daily precipitation across locations, while the marginal distribution at each location is estimated by regression models. Our proposed approach assumes the same latent spatial process for both rain occurrence and rain amount. This model possesses superior interpretability compared to existing methods based on two latent processes. In addition, it allows us to use both the binary responses of rain occurrence and the positive responses of rain amount to estimate the dependence across multiple locations, and thus leads to more accurate estimation and prediction.

The structure of this chapter is as follows. Section 3.2 explains the model assumptions and our proposed method. In Section 3.3, we conduct simulation studies to assess the performance of the proposed methods in terms of parameter estimation and prediction at a new time or a new location (interpolation). The practical value of the proposed method is illustrated through a real data analysis in Section 3.4.

3.2 Proposed Method

3.2.1 Notation

Suppose that we observe precipitation data at time points t = 1, ..., T and locations s = 1, ..., S. Let $\{W_{t,s}\}_{t=1,s=1}^{T,S}$ be the non-negative responses indicating the rain amount at time t at location s, and let $\{Y_{t,s}\}_{t=1,s=1}^{T,S}$ be the binary responses indicating rain occurrence at time t at location s, where $Y_{t,s} = 1$ indicates rain. Hence, at each time point t and location s, a pair of responses $(Y_{t,s}, W_{t,s})$ is recorded, and
$$Y_{t,s} = \begin{cases} 1, & \text{if } W_{t,s} > 0,\\ 0, & \text{if } W_{t,s} = 0. \end{cases}$$

Meanwhile, denote $\{X_{t,s}\}_{t=1,s=1}^{T,S}$ as the observed predictors. In this work, using the Chicago precipitation data, the predictors include reanalysis temperature, humidity and precipitation, simulated from the ERA-40 reanalysis model in Uppala et al. (2005). In our empirical study, $X_{t,1} = \cdots = X_{t,S} = x_t$ for all seven observed locations s = 1, ..., S = 7, since $x_t$ comes from the global weather model for the purpose of downscaling to the local area. Our proposed method, however, is not restricted to this setting and allows for different values of the predictors across locations. The variables are called "reanalysis data" because their generation involves reprocessing observational data spanning an extended historical period using physical, statistical and mathematical global weather models. Since these global weather models generate reanalysis data for a long historical and future period, predictors at future time points are also available for our model, in which the reanalysis global weather data for the greater Chicago area are downscaled through modeling to local precipitation at the seven Chicago stations.

3.2.2 Three-Part Model

In this section, we formulate the proposed copula-based three-part precipitation model. In a preliminary analysis, we found that, after fitting regression models for precipitation at each location, the temporal dependence of the residuals is quite weak, as was also observed in Wang et al. (2012) and Yang and He (2012). Therefore, we assume that the temporal trends of $Y_{t,s}$ and $W_{t,s}$ for t = 1, ..., T are fully captured by the predictor $X_{t,s}$.

Part 1: Marginal model for the rain occurrence.

At each location s, we assume the following model:

$$h_o\{\Pr(Y_{t,s} = 1 \mid X_{t,s})\} = X_{t,s}^T \beta_s, \qquad (3.1)$$
where the link function is
$$h_o^L(z) = \log\Big(\frac{z}{1-z}\Big)$$
for logistic regression, and
$$h_o^P(z) = \Phi^{-1}(z)$$
for probit regression, where Φ(·) is the CDF of the standard normal distribution.

Part 2: Marginal model for the rain amount.

At locations with positive precipitation (wet days), we model the rain amount using a gamma regression. Specifically, at a given location s, we assume that

$$W_{t,s}^{1/3} \mid W_{t,s} > 0,\, X_{t,s} \sim \mathrm{Gamma}(\alpha_s, \mu_{t,s}),$$
where Gamma(α, µ) denotes the gamma distribution with shape parameter α and mean µ, whose density is
$$g(z) = \frac{1}{\Gamma(\alpha)}\, \frac{z^{\alpha-1}}{(\mu/\alpha)^{\alpha}}\, \exp\Big(-\frac{z}{\mu/\alpha}\Big). \qquad (3.2)$$

We assume that the shape parameter α is constant across time and locations, while the mean parameter $\mu_{t,s}$ varies and can be expressed as a transformation of a linear combination of the predictors, that is,
$$h_a(\mu_{t,s} \mid X_{t,s}) = X_{t,s}^T \gamma_s, \qquad (3.3)$$
where $h_a(\cdot)$ is the link function. In our empirical studies, we consider both the identity link function $h_a^i(z) = z$ and the log link function $h_a^l(z) = \log(z)$.

Part 3: Joint model through copula.

For the rain occurrence $Y_{t,s}$, we denote by $\pi_{t,s}(x_{t,s}) = \Pr(Y_{t,s} = 1 \mid x_{t,s})$ the conditional probability of rain given $X_{t,s} = x_{t,s}$. For the rain amount $W_{t,s}$, we denote the conditional distribution of $W_{t,s}^{1/3} \mid W_{t,s} > 0, X_{t,s} = x_{t,s}$ by $G_{t,s}(\cdot \mid x_{t,s})$. Therefore, by models (3.1) and (3.3), the marginal distribution of precipitation at location s and time t can be derived as
$$F_{t,s}(z \mid x_{t,s}) = P(W_{t,s}^{1/3} \le z \mid x_{t,s}) = \begin{cases} 1 - \pi_{t,s}(x_{t,s}), & z = 0,\\ 1 - \pi_{t,s}(x_{t,s}) + \pi_{t,s}(x_{t,s})\, G_{t,s}(z), & z > 0, \end{cases} \qquad (3.4)$$
where $x_{t,s}$ are the predictors at location s and time t. Denote $u_{t,s} = F_{t,s}(w_{t,s}^{1/3} \mid x_{t,s})$. For any fixed time t, we denote by $\mathbf{u}_t = (u_{t,1}, \ldots, u_{t,S})$ the S-dimensional vector of probability transforms of the precipitation at all locations at time t. To account for the dependence across multiple locations, we assume that the spatial

dependence is fully captured by the S-dimensional Gaussian copula function $C(v_1, \ldots, v_S; \Sigma)$, where $\Sigma \in \mathbb{R}^{S\times S}$ is the spatial correlation matrix for locations s = 1, ..., S, for instance the Matérn correlation matrix with parameter η = (λ, ν). Let $d = \|s - s'\|$ be the Euclidean distance between two given locations s and s′. The Matérn correlation function M(·; λ, ν) is defined as
$$M(d; \lambda, \nu) = \frac{2^{1-\nu}}{\Gamma(\nu)} \Big(\sqrt{2\nu}\, \frac{d}{\lambda}\Big)^{\nu} K_\nu\Big(\sqrt{2\nu}\, \frac{d}{\lambda}\Big), \qquad (3.5)$$
where Γ(·) is the gamma function
$$\Gamma(z) = \int_0^\infty x^{z-1} \exp(-x)\, dx,$$
and $K_\nu(\cdot)$ is the modified Bessel function of the second kind,
$$K_\nu(z) = \frac{\pi}{2}\, \frac{I_{-\nu}(z) - I_\nu(z)}{\sin(\nu\pi)}, \quad \text{with} \quad I_\nu(z) = \sum_{m=0}^{\infty} \frac{1}{m!\, \Gamma(m+\nu+1)} \Big(\frac{z}{2}\Big)^{2m+\nu}.$$
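The Matérn correlation (3.5) can be evaluated directly in base R via besselK; a small sketch:

```r
# Matern correlation M(d; lambda, nu) of (3.5), using base R's besselK
matern_cor <- function(d, lambda, nu) {
  out <- rep(1, length(d))          # correlation equals 1 at distance 0
  pos <- d > 0
  a <- sqrt(2 * nu) * d[pos] / lambda
  out[pos] <- 2^(1 - nu) / gamma(nu) * a^nu * besselK(a, nu)
  out
}

# For nu = 0.5 the Matern reduces to the exponential model exp(-d / lambda):
# matern_cor(c(0, 10, 50), lambda = 50, nu = 0.5)
```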

It is worth noting that, in the Matérn correlation function, the range parameter (λ) and the smoothing parameter (ν) jointly determine the correlation coefficient. The range parameter determines the decay rate of the correlation with distance, while the smoothing parameter specifies the degree of smoothness of the Matérn function. Additional details about the roles of the two parameters in the Matérn covariance matrix can be found in Minasny and McBratney (2005).

3.2.3 Parameter Estimation

In the preliminary analysis of the Chicago daily precipitation data, we found that the marginal parameters ($\beta_s$) in the Part 1 model (3.1) do not vary drastically across locations, and that the marginal parameters ($\gamma_s, \alpha_s$) in the Part 2 model (3.3) are not significantly different across locations. Consequently, we assume that the marginal parameters are common across locations, and we utilize data from all locations to estimate them. In this way, marginal information is shared across neighborhoods, yielding more efficient marginal estimates. In addition, this assumption makes it possible to interpolate at new stations where no historical data are available.

Part 1: Estimation for β.

Assuming that $\beta_s$ is common across locations, i.e., $\beta_1 = \cdots = \beta_S = \beta$, we estimate β by $\hat\beta$, the maximum likelihood estimator of the binary regression in model (3.1) using data from all locations under a working independence assumption across sites, that is, ignoring the inter-site dependence.

Part 2: Estimation for γ and α.

We assume that $\gamma_s$ and $\alpha_s$ are common across locations, i.e., $\gamma_1 = \cdots = \gamma_S = \gamma$ and $\alpha_1 = \cdots = \alpha_S = \alpha$. The common estimator $(\hat\gamma, \hat\alpha)$ is then obtained by fitting the gamma regression to data from all locations using the R function glm, again ignoring the inter-site dependence.
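A minimal R sketch of the two pooled marginal fits, assuming the observations have been stacked into a long data frame dat with one row per (t, s), a rain-amount column w, and placeholder predictor columns x1 and x2:

```r
# Part 1: pooled binary regression for rain occurrence (working independence)
dat$y   <- as.integer(dat$w > 0)
fit_occ <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

# Part 2: pooled gamma regression for the cube-root rain amount on wet days
wet     <- subset(dat, w > 0)
fit_amt <- glm(I(w^(1/3)) ~ x1 + x2, family = Gamma(link = "log"), data = wet)

# The common shape parameter alpha is the reciprocal of the dispersion
alpha_hat <- 1 / summary(fit_amt)$dispersion
```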

Part 3: Estimation for λ.

After plugging the estimated marginal parameters $\hat\theta = (\hat\beta, \hat\gamma, \hat\alpha)$ into (3.4), we obtain $\hat{u}_{t,s} = \hat{F}_{t,s}(w_{t,s}^{1/3} \mid x_{t,s})$, and hence $\hat{\mathbf{u}}_t = (\hat{u}_{t,1}, \ldots, \hat{u}_{t,S})$. Let the location index set $\mathcal{D}_t = \{S_1^{[t]}, \ldots, S_{k_t}^{[t]}\}$ contain all the locations without rain at time t, i.e., $W_{t,i} = 0$ for $i \in \mathcal{D}_t$. Then, using the Matérn correlation matrix in (3.5), the MLE $\hat\eta = (\hat\lambda, \hat\nu)$ is obtained as
$$\hat\eta = \arg\max_{\lambda,\nu} \sum_{t=1}^{T} \log \mathrm{Int}_t(\lambda, \nu \mid \hat{\mathbf{u}}_t, \hat\theta, \mathcal{D}_t), \qquad (3.6)$$
where
$$\mathrm{Int}_t(\lambda, \nu \mid \hat{\mathbf{u}}_t, \hat\theta, \mathcal{D}_t) = \int_{\Xi_t} c(v_1, \ldots, v_S; \lambda, \nu)\, dv_{S_1^{[t]}} \cdots dv_{S_{k_t}^{[t]}},$$
$c(\cdots; \lambda, \nu)$ is the S-dimensional Gaussian copula density, and the integration region is $\Xi_t = [0, 1 - \pi_{t,S_1^{[t]}}] \times \cdots \times [0, 1 - \pi_{t,S_{k_t}^{[t]}}]$.

In our estimation, the likelihood function involves integration, and its calculation is challenging when precipitation is rare or when the number of locations (S) is large. Therefore, for the sake of computational simplicity (Varin et al., 2011), we propose the following composite likelihood estimator $\tilde\eta = (\tilde\lambda, \tilde\nu)$. The composite likelihood is a pairwise likelihood used to approximate the full likelihood.

For each pair of locations $s_i$ and $s_k$, the pairwise likelihood is
$$\mathrm{Comp}_{t,s_i,s_k}(u_{t,s_i}, u_{t,s_k}, \pi_{t,s_i}, \pi_{t,s_k}, y_{t,s_i}, y_{t,s_k}; \lambda, \nu) = \begin{cases} c(u_{t,s_i}, u_{t,s_k}; \lambda, \nu), & y_{t,s_i} = 1 \text{ and } y_{t,s_k} = 1,\\ C(1 - \pi_{t,s_i} \mid u_{t,s_k}; \lambda, \nu), & y_{t,s_i} = 0 \text{ and } y_{t,s_k} = 1,\\ C(1 - \pi_{t,s_k} \mid u_{t,s_i}; \lambda, \nu), & y_{t,s_i} = 1 \text{ and } y_{t,s_k} = 0,\\ C(1 - \pi_{t,s_i}, 1 - \pi_{t,s_k}; \lambda, \nu), & y_{t,s_i} = 0 \text{ and } y_{t,s_k} = 0, \end{cases}$$
where C(·, ·) is the two-dimensional Gaussian copula function, c(·, ·) is the corresponding copula density, and C(· | ·) is the corresponding conditional copula function. Specifically, for the bivariate Gaussian copula with correlation coefficient ρ, we have
$$C(u, v; \rho) = \Phi_2(\Phi^{-1}(u), \Phi^{-1}(v); \rho),$$
$$c(u, v; \rho) = \frac{\phi_2(\Phi^{-1}(u), \Phi^{-1}(v); \rho)}{\phi(\Phi^{-1}(u))\, \phi(\Phi^{-1}(v))} = \frac{1}{\sqrt{1-\rho^2}} \exp\Big\{-\frac{\rho^2 \Phi^{-1}(u)^2 - 2\rho\, \Phi^{-1}(u)\Phi^{-1}(v) + \rho^2 \Phi^{-1}(v)^2}{2(1-\rho^2)}\Big\},$$
and
$$C(v \mid u; \rho) = P(V \le v \mid U = u; \rho) = P\big(Z_2 \le \Phi^{-1}(v) \mid Z_1 = \Phi^{-1}(u)\big) = \Phi\Big(\frac{\Phi^{-1}(v) - \rho\, \Phi^{-1}(u)}{\sqrt{1-\rho^2}}\Big),$$
where $(Z_1, Z_2)^T \sim N_2\big((0,0)^T, \big(\begin{smallmatrix} 1 & \rho \\ \rho & 1 \end{smallmatrix}\big)\big)$, $\Phi_2(\cdot,\cdot;\rho)$ and $\phi_2(\cdot,\cdot;\rho)$ are the standard bivariate Gaussian distribution function with correlation coefficient ρ and its density, and Φ(·), $\Phi^{-1}(\cdot)$ and φ(·) are the standard univariate Gaussian distribution function, its inverse, and its density. Thus, the composite likelihood estimator is defined as

$$\tilde\eta = \arg\max_{\lambda,\nu} \sum_{t=1}^{T} \sum_{i<k} \log \mathrm{Comp}_{t,s_i,s_k}(u_{t,s_i}, u_{t,s_k}, \pi_{t,s_i}, \pi_{t,s_k}, y_{t,s_i}, y_{t,s_k}; \lambda, \nu). \qquad (3.7)$$

In practice, it is often challenging to estimate the smoothing parameter ν. A commonly used approach is to fix the smoothing parameter at some reasonable value while estimating the remaining parameters. We also fix ν in this work to ease the computational burden, and estimate λ alone, with the resulting estimators denoted by $\hat\lambda_\nu$ and $\tilde\lambda_\nu$. The full likelihood estimator in (3.6) becomes
$$\hat\lambda_\nu = \arg\max_{\lambda} \sum_{t=1}^{T} \log \mathrm{Int}_t(\lambda \mid \nu, \hat{\mathbf{u}}_t, \hat\theta, \mathcal{D}_t), \qquad (3.8)$$
and the composite likelihood estimator in (3.7) becomes
$$\tilde\lambda_\nu = \arg\max_{\lambda} \sum_{t=1}^{T} \sum_{i<k} \log \mathrm{Comp}_{t,s_i,s_k}(u_{t,s_i}, u_{t,s_k}, \pi_{t,s_i}, \pi_{t,s_k}, y_{t,s_i}, y_{t,s_k}; \lambda, \nu). \qquad (3.9)$$
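Under these definitions, one pairwise term of (3.9) translates into R directly, using pnorm for the conditional copula and pmvnorm from the mvtnorm package for the joint no-rain probability; rho is the Gaussian-copula correlation implied by the Matérn model at the pair's distance. A sketch:

```r
library(mvtnorm)

# log pairwise likelihood for one time point and one pair of sites
log_pair <- function(u1, u2, pi1, pi2, y1, y2, rho) {
  q1 <- qnorm(1 - pi1); q2 <- qnorm(1 - pi2)  # no-rain thresholds, normal scale
  if (y1 == 1 && y2 == 1) {                   # both wet: copula density
    z1 <- qnorm(u1); z2 <- qnorm(u2)
    -0.5 * log(1 - rho^2) -
      (rho^2 * z1^2 - 2 * rho * z1 * z2 + rho^2 * z2^2) / (2 * (1 - rho^2))
  } else if (y1 == 0 && y2 == 0) {            # both dry: corner probability
    log(pmvnorm(upper = c(q1, q2), corr = matrix(c(1, rho, rho, 1), 2)))
  } else {                                    # mixed: conditional copula
    zw <- qnorm(if (y1 == 1) u1 else u2)      # wet site's normal score
    qd <- if (y1 == 1) q2 else q1             # dry site's threshold
    log(pnorm((qd - rho * zw) / sqrt(1 - rho^2)))
  }
}
```

The composite log-likelihood then sums log_pair over all time points and site pairs, with rho computed from the Matérn correlation at each pair's distance, and λ can be maximized with optimize().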

3.2.4 Prediction at New Time for Existing Locations

In this section, we propose methods for conducting probabilistic predictions of $W_{T^*,s}$, s = 1, ..., S, the precipitation at the existing locations at a new time point $T^*$, given the estimated parameters $(\hat\theta, \hat\eta)$ and the covariate information, for instance the reanalysis temperature, humidity and precipitation variables in the motivating Chicago precipitation data set. Here $(\hat\theta, \hat\eta)$ are estimated from the precipitation data at locations s = 1, ..., S and time points t = 1, ..., T. Recall that in our preliminary analysis of the Chicago precipitation data set we noticed that, at each location, the temporal dependence of the residuals from the gamma regression of precipitation on the reanalysis covariates is quite weak (Wang et al., 2012; Yang and He, 2012); in the proposed three-part model we therefore assume that the temporal trends are fully captured by models (3.1) and (3.3). As a result, the covariate information (the reanalysis variables in the Chicago data) at future time points can be used for prediction. We consider two approaches: (1) the marginal approach, which uses only the covariate information at the given location; and (2) the joint approach, which uses information from all available locations to conduct prediction for a given location.

3.2.4.1 Marginal Approach

With covariates $\mathbf{x}_{T^*,s}$ at the new time, we first calculate
$$\hat{\pi}_{T^*,s} = h_o^{-1}(\mathbf{x}_{T^*,s}^{\mathrm{T}} \hat{\beta}_s)$$
from model (3.1), and $\hat{G}_{T^*,s}(\cdot \mid \mathbf{x}_{T^*,s})$, the CDF of the gamma distribution with parameters $\alpha = \hat{\alpha}$ and
$$\mu = \hat{\mu}_{T^*,s} = h_a^{-1}(\mathbf{x}_{T^*,s}^{\mathrm{T}} \hat{\beta}_s),$$
from model (3.3). Then the following procedure is carried out to obtain probabilistic predictions at the new time $T^*$.

1. For the new time point $T^*$, at each location $s$, generate an independent uniform random variable $u_{T^*,s}$.

2. Compare $u_{T^*,s}$ with $1 - \hat{\pi}_{T^*,s}$; if $u_{T^*,s} \le 1 - \hat{\pi}_{T^*,s}$, predict $\hat{w}_{T^*,s} = 0$.

3. If $u_{T^*,s} > 1 - \hat{\pi}_{T^*,s}$, compute $v_{T^*,s} = 1 - \frac{1 - u_{T^*,s}}{\hat{\pi}_{T^*,s}}$ and predict $\hat{w}_{T^*,s} = \{\hat{G}_{T^*,s}^{-1}(v_{T^*,s})\}^3$.
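The three steps translate directly into code. A minimal sketch for a single location follows (our own illustration; the gamma quantile uses shape $\hat{\alpha}$ and SciPy scale $\hat{\mu}/\hat{\alpha}$, since the mean of a gamma distribution is shape times scale):

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def marginal_predict(pi_hat, mu_hat, alpha_hat, rng):
    """One probabilistic prediction at one location, following steps 1-3."""
    u = rng.uniform()                          # step 1: independent uniform draw
    if u <= 1.0 - pi_hat:                      # step 2: dry day predicted
        return 0.0
    v = 1.0 - (1.0 - u) / pi_hat               # step 3: rescale to the wet part
    g = gamma_dist.ppf(v, a=alpha_hat, scale=mu_hat / alpha_hat)
    return g ** 3                              # undo the cube-root transformation
```

Repeating the draw M times yields the prediction ensemble used later for the CRPS.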

3.2.4.2 Joint Approach

To account for the dependence across multiple locations, we propose the following joint approach to predict $\{W_{T^*,s}\}_{s=1}^{S}$ jointly, based on the estimated parameters $(\hat{\theta}, \hat{\eta})$. First, with covariates $\mathbf{x}_{T^*,s}$ at the new time, by plugging in $\hat{\theta}$ we calculate $\hat{\pi}_{T^*,s}$, $\hat{\mu}_{T^*,s}$ and $\hat{G}_{T^*,s}(\cdot \mid \mathbf{x}_{T^*,s})$, the CDF of the gamma distribution with parameters $\alpha = \hat{\alpha}$ and $\mu = \hat{\mu}_{T^*,s}$, as in the marginal approach. The estimated correlation matrix $\hat{\Sigma}_S$ is also obtained through the Matérn function (3.5) and $\hat{\eta}$. The following procedure is then performed to obtain probabilistic predictions at the new time $T^*$.

1. At the new time point $T^*$, generate the $S$-dimensional vector $\mathbf{u}_{T^*} = (u_{T^*,1}, \ldots, u_{T^*,S})$, where $u_{T^*,s}$, $s = 1,\ldots,S$, are dependent uniform random variables generated through the $S$-dimensional copula function $C(v_1, \ldots, v_S; \hat{\Sigma}_S)$.

2. For each $s = 1,\ldots,S$, compare $u_{T^*,s}$ with $1 - \hat{\pi}_{T^*,s}$; if $u_{T^*,s} \le 1 - \hat{\pi}_{T^*,s}$, predict $\hat{w}_{T^*,s} = 0$.

3. If $u_{T^*,s} > 1 - \hat{\pi}_{T^*,s}$, compute $v_{T^*,s} = 1 - \frac{1 - u_{T^*,s}}{\hat{\pi}_{T^*,s}}$ and obtain the predicted value $\hat{w}_{T^*,s} = \{\hat{G}_{T^*,s}^{-1}(v_{T^*,s})\}^3$.
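The only change from the marginal approach is step 1, where the uniforms are drawn jointly through the Gaussian copula with correlation $\hat{\Sigma}_S$. A sketch follows (the Cholesky construction is our own implementation choice):

```python
import numpy as np
from scipy.stats import norm, gamma as gamma_dist

def joint_predict(pi_hat, mu_hat, alpha_hat, Sigma_S, rng):
    """One joint probabilistic prediction across all S locations (steps 1-3)."""
    L = np.linalg.cholesky(Sigma_S)
    u = norm.cdf(L @ rng.standard_normal(len(pi_hat)))   # step 1: dependent uniforms
    w_hat = np.zeros(len(pi_hat))
    for s in range(len(pi_hat)):                         # steps 2-3, per location
        if u[s] > 1.0 - pi_hat[s]:
            v = 1.0 - (1.0 - u[s]) / pi_hat[s]
            w_hat[s] = gamma_dist.ppf(v, a=alpha_hat,
                                      scale=mu_hat[s] / alpha_hat) ** 3
    return w_hat
```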

3.2.4.3 Alternative Existing Approach

Berrocal et al. (2008) proposed a two-stage Gaussian process method, denoted 2GP, to model precipitation. Their modeling strategy comprises two separate Gaussian spatial processes, $\varepsilon(s)$ and $Z(s)$, accounting for the rain occurrence and rain amount, respectively. Suppose that $\{(\mathbf{X}_{t,s}, Y_{t,s}, W_{t,s})\}_{t=1,s=1}^{t=T,s=S}$ is observed, where $\mathbf{X}_{t,s}$ is the covariate vector at location $s$ at time $t$, and $Y_{t,s}$ and $W_{t,s}$ are the rain occurrence and rain amount, respectively, at location $s$ at time $t$. The method in Berrocal et al. (2008) makes the following assumptions:

1. At each time $t = 1,\ldots,T$, assume that the cube root of precipitation, $W_{t,s}^{1/3}$, comes from a latent Gaussian process $W_{t,s}^{*} = \mathbf{X}_{t,s}^{\mathrm{T}} \beta_s + \varepsilon_{t,s}$; that is, $W_{t,s}^{1/3} = \max(0, \mathbf{X}_{t,s}^{\mathrm{T}} \beta_s + \varepsilon_{t,s})$. It is worth noting that for a fixed time $t$, $\varepsilon_{t,s} = \varepsilon(s)$ is a zero-mean Gaussian process across locations $s = 1,\ldots,S$ with covariance $\mathrm{Cov}(\varepsilon(s), \varepsilon(s')) = \exp(-\|s - s'\|/\rho)$;

2. Assume that
$$W_{t,s} \mid W_{t,s} > 0, \mathbf{X}_{t,s} \sim \mathrm{Gamma}(\alpha_s, \mu_s), \tag{3.10}$$
with density as in (3.2), and further assume that
$$E(W_{t,s} \mid W_{t,s} > 0, \mathbf{X}_{t,s}) = \mathbf{X}_{t,s}^{\mathrm{T}} c_s \tag{3.11}$$
and
$$\mathrm{Var}(W_{t,s} \mid W_{t,s} > 0, \mathbf{X}_{t,s}) = \mathbf{X}_{t,s}^{\mathrm{T}} d_s, \tag{3.12}$$
where $E(W_{t,s} \mid W_{t,s} > 0, \mathbf{X}_{t,s}) = \mu_s$ and $\mathrm{Var}(W_{t,s} \mid W_{t,s} > 0, \mathbf{X}_{t,s}) = \mu_s^2/\alpha_s$;

3. Denote by $G_s(\cdot)$ the gamma distribution in (3.10). Then at each fixed $t$, let $Z_{t,s} = Z(s)$ be a zero-mean Gaussian process with covariance $\mathrm{Cov}(Z(s), Z(s')) = \exp(-\|s - s'\|/r)$, such that, at each location $s$ where $W_{t,s}$ is strictly positive, $W_{t,s}^{1/3} = G_s^{-1}(\Phi(Z_{t,s}))$.

Below is the estimation procedure of the 2GP method.

1. $\beta_s$ is estimated at each individual site by fitting a probit model;

2. setting $Y_{t,s} = I(\mathbf{X}_{t,s}^{\mathrm{T}} \hat{\beta}_s + \varepsilon_{t,s} > 0)$, $\rho$ is estimated by maximizing the likelihood function of $Y_{t,s}$ for $t = 1,\ldots,T$ and $s = 1,\ldots,S$;

3. $c_s$ in (3.11) is estimated through a simple linear regression of $W_{t,s} \mid \{(t,s) : W_{t,s} > 0\}$ on $\mathbf{X}_{t,s} \mid \{(t,s) : W_{t,s} > 0\}$;

4. denoting the residuals of the above regression by $\xi_{t,s}$, $d_s$ in (3.12) is estimated through a simple linear regression of $\xi_{t,s}^2$ on $\mathbf{X}_{t,s} \mid \{(t,s) : W_{t,s} > 0\}$;

5. $r$ is estimated by maximizing the likelihood function over all locations where $W_{t,s} > 0$, with all other parameters held fixed.

It is worth noting that the probit model assumption in the 2GP model is a special case of our proposed three-part model, obtained by using $h_o^P(z) = \Phi^{-1}(z)$ as the link function in the part 1 model (3.1). In addition, their covariance matrices with exponential decay are a special case of the Matérn function, obtained by setting the smoothing parameter $\nu = 0.5$ in our model (3.5).

3.2.4.4 Prediction Assessment

In this section, we introduce two assessment tools to evaluate the probabilistic predictions at a new time for the existing locations. First, we adopt the continuous ranked probability score (CRPS) considered in Matheson and Winkler (1976) and Sloughter et al. (2007). At location $s$ and time $t$, for the realized observation $w$ and the predictive CDF $F^*$ of precipitation at location $s$ and time $t$ given predictor $\mathbf{x}_{t,s}$, the CRPS is defined as
$$\mathrm{CRPS}(F^*, w) = \int_{-\infty}^{\infty} \left(F^*(\xi) - I\{w \le \xi\}\right)^2 d\xi. \tag{3.13}$$

Gneiting and Raftery (2007) showed that (3.13) is equivalent to
$$\mathrm{CRPS}(F^*, w) = E_{F^*}|W - w| - \frac{1}{2} E_{F^*}|W - W'|, \tag{3.14}$$
where $W$ and $W'$ are independent random variables with common distribution $F^*$. In practice, $F^*$ is approximated by $\hat{F}$, the empirical distribution of an ensemble of size $M$, denoted $y^{*(1)}, \ldots, y^{*(M)}$. Then
$$\mathrm{CRPS}(\hat{F}, w) = \frac{1}{M} \sum_{i=1}^{M} |y^{*(i)} - w| - \frac{1}{2M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} |y^{*(i)} - y^{*(j)}|. \tag{3.15}$$
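A direct implementation of the ensemble approximation (3.15) is straightforward (our own sketch):

```python
import numpy as np

def crps_ensemble(ens, w):
    """Approximate CRPS(F-hat, w) from an ensemble of size M via (3.15)."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - w))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2
```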

It is worth noting that the lower the CRPS, the better the performance of the prediction. In our proposed approach, the CRPS is computed at each individual location $s$ and each time point $t$; the resulting scores are then averaged across all locations and over all validation time points. The second assessment tool is the multivariate rank histogram (Tang et al. (2017); Thorarinsdottir et al. (2016); Gneiting et al. (2008); Hamill (2001); Talagrand et al. (1997); Hamill and Colucci (1997)). Assume that we have $m = 1,\ldots,M$ ensemble members for the precipitation at all locations at time $t$, denoted $\mathbf{y}_t^{(m)}$. Note that $\mathbf{y}_t^{(m)}$ is an $S$-dimensional vector, where $S$ is the total number of locations. Then, at a fixed time $t$, for $m = 0, 1, \ldots, M$, the pre-rank $\rho_t^{(m)}$ is defined as
$$\rho_t^{(m)} = \sum_{j=0}^{M} I\left(\mathbf{y}_t^{(j)} \le \mathbf{y}_t^{(m)}\right),$$

where $\mathbf{y}_t^{(0)}$ denotes the observed vector at time $t$, and for two $n$-vectors $\mathbf{x} = (x_1,\ldots,x_n)$ and $\mathbf{z} = (z_1,\ldots,z_n)$, $\mathbf{x} \le \mathbf{z}$ if and only if $x_i \le z_i$ for all $i = 1,\ldots,n$. The multivariate rank of $\mathbf{y}_t^{(0)}$, denoted $mr_t$, is then the rank of $\rho_t^{(0)}$ among $\rho_t^{(0)}, \ldots, \rho_t^{(M)}$. If the probabilistic forecast ensembles are random draws from the same distribution as the true observations, the multivariate ranks $mr_t$ across time $t = T+1, \ldots$ should follow a uniform distribution; hence we would expect the multivariate rank histogram to be flat. Conversely, an inappropriate treatment of the spatial dependence will result in a deviation from uniformity.
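The pre-ranks and the multivariate rank at a single time point can be computed as below (our own sketch; ties among pre-ranks are broken at random, a common convention that the text does not spell out):

```python
import numpy as np

def multivariate_rank(obs, ens, rng):
    """Rank of the observation's pre-rank among all pre-ranks at one time point.
    obs: S-vector y_t^{(0)}; ens: M x S array of ensemble members y_t^{(1..M)}."""
    Y = np.vstack([obs, ens])                  # row 0 is the observed vector
    # pre-rank of each member: how many members it weakly dominates componentwise
    pre = np.array([np.sum(np.all(Y <= y, axis=1)) for y in Y])
    below = np.sum(pre < pre[0])               # pre-ranks strictly below the observed
    ties = np.sum(pre == pre[0])               # includes the observation itself
    return 1 + below + rng.integers(ties)      # random tie-breaking
```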

3.2.5 Interpolation at New Locations

In practice, it is often of interest to predict the outcome at a location without observations, for example for reconstruction purposes (Steinman et al. (2012); Zhou et al. (2013)). In this section, we describe two approaches for interpolation at a new location $S^*$ at a time point $t_0$, given that the covariates and precipitation at the existing locations are known. Specifically, we aim to predict
$$W_{t_0,S^*} \mid \left\{ \mathbf{X}_{t,s}, Y_{t,s}, W_{t,s} : t \in \{1,\ldots,T\} \cup \{t_0\},\; s = 1,\ldots,S \right\},\; \mathbf{X}_{t_0,S^*}. \tag{3.16}$$

3.2.5.1 Marginal Strategy

We propose a marginal strategy to serve as a baseline. At the new location $S^*$, this strategy ignores the dependence across sites: the prediction is based solely on the available covariate information $\mathbf{X}_{t_0,S^*}$ at location $S^*$ and the estimated marginal parameter $\hat{\theta}$ obtained from the analysis of data at the existing locations.

Specifically, with covariates $\mathbf{x}_{t_0,S^*}$ at the new location, we first calculate
$$\hat{\pi}_{t_0,S^*} = h_o^{-1}(\mathbf{x}_{t_0,S^*}^{\mathrm{T}} \hat{\beta}_{S^*}) \tag{3.17}$$
from model (3.1), and $\hat{G}_{t_0,S^*}(\cdot)$, whose density has the form in (3.2) with $\alpha = \hat{\alpha}$ and
$$\mu = \hat{\mu}_{t_0,S^*} = h_a^{-1}(\mathbf{x}_{t_0,S^*}^{\mathrm{T}} \hat{\beta}_{S^*}), \tag{3.18}$$
from model (3.3). The following procedure is then carried out to obtain a probabilistic interpolation for the new location $S^*$ at time $t_0$.

1. At the time point $t = t_0$, generate an independent uniform random variable $u_{t_0,S^*}$.

2. Compare $u_{t_0,S^*}$ with $1 - \hat{\pi}_{t_0,S^*}$; if $u_{t_0,S^*} \le 1 - \hat{\pi}_{t_0,S^*}$, predict $\hat{w}_{t_0,S^*} = 0$.

3. If $u_{t_0,S^*} > 1 - \hat{\pi}_{t_0,S^*}$, compute $v_{t_0,S^*} = 1 - \frac{1 - u_{t_0,S^*}}{\hat{\pi}_{t_0,S^*}}$.

4. Obtain the $v_{t_0,S^*}$-th quantile of the gamma distribution with mean $\hat{\mu}_{t_0,S^*}$ and shape parameter $\hat{\alpha}$ (with density (3.2)) as the prediction, i.e., $\hat{w}_{t_0,S^*} = \{\hat{G}_{t_0,S^*}^{-1}(v_{t_0,S^*})\}^3$.

3.2.5.2 Joint Strategy

We propose the following joint strategy, which involves $(\hat{\theta}, \hat{\eta})$. First, based on the covariates $\mathbf{x}_{t_0,S^*}$ at the new location, $\hat{\pi}_{t_0,S^*}$, $\hat{\mu}_{t_0,S^*}$ and thus $\hat{G}_{t_0,S^*}$ are estimated by plugging in $\hat{\theta}$, as in the marginal approach. The estimated correlation matrix for the locations $s = 1,\ldots,S$, denoted $\hat{\Sigma}$, is obtained through the Matérn function (3.5) and $\hat{\eta}$. Similarly, with the additional new location $S^*$, the estimated correlation matrix for the locations $s = 1,\ldots,S,S^*$, denoted $\hat{\Sigma}^*$, is calculated by plugging $\hat{\eta}$ into (3.5). Then the following procedure is performed to obtain interpolations for the new location $S^*$ at time $t_0$.

1. At the time point $t = t_0$, generate an independent uniform random variable $w_t$.

2. Given $\{Y_{t_0,s}, W_{t_0,s}\}_{s=1}^{S}$, calculate $\hat{u}_{t_0,s}$ through (3.4), $s = 1,\ldots,S$.

3. Based on $\hat{\Sigma}$ and $\hat{\Sigma}^*$, generate $u_{t_0,S^*}$ such that $P(U_{t_0,S^*} \mid \hat{u}_{t_0,1}, \ldots, \hat{u}_{t_0,S}) = w_t$.

4. Compare $u_{t_0,S^*}$ with $1 - \hat{\pi}_{t_0,S^*}$; if $u_{t_0,S^*} \le 1 - \hat{\pi}_{t_0,S^*}$, predict $\hat{w}_{t_0,S^*} = 0$.

5. If $u_{t_0,S^*} > 1 - \hat{\pi}_{t_0,S^*}$, compute $v_{t_0,S^*} = 1 - \frac{1 - u_{t_0,S^*}}{\hat{\pi}_{t_0,S^*}}$, which corresponds to the $v_{t_0,S^*}$-th quantile of the gamma distribution with mean $\hat{\mu}_{t_0,S^*}$ and shape parameter $\hat{\alpha}$, whose density is (3.2).

6. Based on the estimated gamma distribution $\hat{G}_{t_0,S^*}(\cdot)$, obtain the $v_{t_0,S^*}$-th quantile as the prediction, i.e., $\hat{w}_{t_0,S^*} = \{\hat{G}_{t_0,S^*}^{-1}(v_{t_0,S^*})\}^3$.

Denote
$$\hat{\Sigma}^* = \begin{pmatrix} \hat{\Sigma} & \sigma \\ \sigma^{\mathrm{T}} & 1 \end{pmatrix}.$$
There are three scenarios for generating $u_{t_0,S^*}$ through $P(U_{t_0,S^*} \mid \hat{u}_{t_0,1}, \ldots, \hat{u}_{t_0,S}) = w_t$ in Step 3, corresponding to three cases of $w_{t_0,s}$ for $s = 1,\ldots,S$.

Case 1. $w_{t_0,s} > 0$ for all locations $s = 1,\ldots,S$.

In this case, all the $w_{t_0,s}$ are truly observed and hence $\hat{u}_{t_0,s} > 1 - \hat{\pi}_{t_0,s}$ for $s = 1,\ldots,S$. We need to solve the following equation for $u_{t_0,S^*}$:
$$C(u_{t_0,S^*} \mid \hat{u}_{t_0,1}, \ldots, \hat{u}_{t_0,S}; \hat{\Sigma}^*) = w_t, \tag{3.19}$$
where $C(\cdot \mid \cdots)$ is the conditional copula function and $w_t$ is generated from a uniform distribution. For Gaussian copula functions, the solution of equation (3.19) simplifies to
$$u_{t_0,S^*} = \Phi\left\{ \sigma^{\mathrm{T}} \hat{\Sigma}^{-1} \mathbf{Z} + \Phi^{-1}(w_t) \sqrt{1 - \sigma^{\mathrm{T}} \hat{\Sigma}^{-1} \sigma} \right\},$$
where $\mathbf{Z} = (\Phi^{-1}(\hat{u}_{t_0,1}), \ldots, \Phi^{-1}(\hat{u}_{t_0,S}))^{\mathrm{T}}$.
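Computationally, Case 1 is a single conditional-normal draw. A sketch follows (the array names are ours; Sigma is $\hat{\Sigma}$ and sigma is the cross-correlation vector $\sigma$ from the partition above):

```python
import numpy as np
from scipy.stats import norm

def draw_case1(u_hat, sigma, Sigma, w):
    """Solve (3.19) for u_{t0,S*} when all existing sites have positive rain."""
    Z = norm.ppf(u_hat)                        # Z_s = Phi^{-1}(u-hat_{t0,s})
    b = np.linalg.solve(Sigma, sigma)          # Sigma^{-1} sigma
    cond_mean = b @ Z
    cond_sd = np.sqrt(1.0 - sigma @ b)
    return norm.cdf(cond_mean + norm.ppf(w) * cond_sd)
```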

Case 2. $w_{t_0,s} = 0$ for all locations $s = 1,\ldots,S$.

In this case, all the $w_{t_0,s}$ are unobservable and hence the $\hat{u}_{t_0,s}$ are censored at $1 - \hat{\pi}_{t_0,s}$ for $s = 1,\ldots,S$. We can obtain $u_{t_0,S^*}$ by solving
$$C(1 - \hat{\pi}_{t_0,1}, \ldots, 1 - \hat{\pi}_{t_0,S}, u_{t_0,S^*}; \hat{\Sigma}^*) = w_t \, C(1 - \hat{\pi}_{t_0,1}, \ldots, 1 - \hat{\pi}_{t_0,S}; \hat{\Sigma}), \tag{3.20}$$
where $w_t$ is generated from a uniform distribution and $C(\cdot\,; \cdot)$ is the multivariate Gaussian copula function. For Gaussian copula functions, the solution to equation (3.20) is $u_{t_0,S^*} = \Phi(Z^*)$, where $Z^*$ solves
$$\Phi_{S+1}\{(\mathbf{Z}, Z^*); \hat{\Sigma}^*\} = w_t \, \Phi_S(\mathbf{Z}; \hat{\Sigma}),$$
where $\Phi_S(\cdot\,; \Sigma)$ is the CDF of the $S$-dimensional normal distribution with mean $\vec{0}$ and covariance matrix $\Sigma$, $\mathbf{Z} = (\Phi^{-1}(1 - \hat{\pi}_{t_0,1}), \ldots, \Phi^{-1}(1 - \hat{\pi}_{t_0,S}))$, and $\Phi(\cdot)$ is the CDF of the standard normal distribution.
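Because the left-hand side of the last display is increasing in $Z^*$, Case 2 reduces to a one-dimensional root search. A sketch follows (the bracketing interval and the use of SciPy's numerically approximated multivariate normal CDF are our own choices):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def draw_case2(pi_hat, Sigma, Sigma_star, w):
    """Solve (3.20) for u_{t0,S*} when all existing sites are dry."""
    Z = norm.ppf(1.0 - pi_hat)                 # censoring points Phi^{-1}(1 - pi-hat)
    rhs = w * multivariate_normal.cdf(Z, mean=np.zeros(len(Z)), cov=Sigma)
    def f(z_star):
        z = np.append(Z, z_star)
        return multivariate_normal.cdf(z, mean=np.zeros(len(z)),
                                       cov=Sigma_star) - rhs
    return norm.cdf(brentq(f, -8.0, 8.0))      # assumed bracketing interval
```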

Case 3. $w_{t_0,s} > 0$ for some locations among $s = 1,\ldots,S$ and $w_{t_0,s} = 0$ for the others.

In this mixed case, let $r_1, \ldots, r_m$ denote the locations with strictly positive precipitation, i.e., $w_{t_0,r_i} > 0$ for $i = 1,\ldots,m$, and let $q_1, \ldots, q_n$ denote the locations without rain, i.e., $w_{t_0,q_j} = 0$ for $j = 1,\ldots,n$; note that $m + n = S$. Then $u_{t_0,S^*}$ can be obtained by solving
$$C_{n+1}(u_{t_0,S^*}, 1 - \hat{\pi}_{t_0,q_1}, \ldots, 1 - \hat{\pi}_{t_0,q_n} \mid \hat{u}_{t_0,r_1}, \ldots, \hat{u}_{t_0,r_m}; \hat{\Sigma}^*) = w_t \, C_n(1 - \hat{\pi}_{t_0,q_1}, \ldots, 1 - \hat{\pi}_{t_0,q_n} \mid \hat{u}_{t_0,r_1}, \ldots, \hat{u}_{t_0,r_m}; \hat{\Sigma}), \tag{3.21}$$
where $w_t$ is generated from a uniform distribution and $C_n(\cdot \mid \cdots; \Sigma)$ is the $n$-dimensional conditional copula function with correlation matrix $\Sigma$. It is worth noting that the solution to (3.21) is similar to that of (3.20), differing only in that multivariate conditional Gaussian distributions are involved. To derive these conditional distributions, we define
$$\mathbf{R} = (\Phi^{-1}(\hat{u}_{t_0,r_1}), \ldots, \Phi^{-1}(\hat{u}_{t_0,r_m}))$$
for the locations with rain,
$$\mathbf{Q} = (\Phi^{-1}(1 - \hat{\pi}_{t_0,q_1}), \ldots, \Phi^{-1}(1 - \hat{\pi}_{t_0,q_n}))$$
for the locations without rain, where $\Phi(\cdot)$ is the CDF of the standard normal distribution, and $Z^* = \Phi^{-1}(u_{t_0,S^*})$. We can then rearrange $\hat{\Sigma}$ and $\hat{\Sigma}^*$ according to the permutation of $\mathbf{R}$, $\mathbf{Q}$ and $Z^*$, so that
$$\begin{pmatrix} \mathbf{R} \\ \mathbf{Q} \\ Z^* \end{pmatrix} \sim N\left( \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_R & \Sigma_{RQ} & \sigma_R \\ \Sigma_{QR} & \Sigma_Q & \sigma_Q \\ \sigma_R^{\mathrm{T}} & \sigma_Q^{\mathrm{T}} & 1 \end{pmatrix} \right).$$
Then $Z^*$ can be obtained by solving the equation
$$\Phi_{n+1}\{(\mathbf{Q}, Z^*); \nu_c, \Sigma_c\} = w_t \, \Phi_n\big(\mathbf{Q}; \Sigma_{QR}\Sigma_R^{-1}\mathbf{R}, \, \Sigma_Q - \Sigma_{QR}\Sigma_R^{-1}\Sigma_{RQ}\big),$$
where $\Phi_n(\cdot\,; \vec{\mu}, \Sigma)$ is the $n$-dimensional Gaussian distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$,
$$\nu_c = \begin{pmatrix} \Sigma_{QR} \\ \sigma_R^{\mathrm{T}} \end{pmatrix} \Sigma_R^{-1} \mathbf{R}, \qquad \Sigma_c = \begin{pmatrix} \Sigma_Q & \sigma_Q \\ \sigma_Q^{\mathrm{T}} & 1 \end{pmatrix} - \begin{pmatrix} \Sigma_{QR} \\ \sigma_R^{\mathrm{T}} \end{pmatrix} \Sigma_R^{-1} (\Sigma_{RQ}, \sigma_R).$$
Therefore, the solution to equation (3.21) is $u_{t_0,S^*} = \Phi(Z^*)$.
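The conditional quantities $\nu_c$ and $\Sigma_c$ are the standard Gaussian conditioning formulas applied to the permuted covariance. A sketch of their computation with NumPy block operations follows (all names assume the $(\mathbf{R}, \mathbf{Q}, Z^*)$ ordering above):

```python
import numpy as np

def case3_conditional_moments(R, Sigma_R, Sigma_QR, Sigma_Q, sigma_R, sigma_Q):
    """nu_c and Sigma_c of (Q, Z*) given R, as in the Case 3 display."""
    B = np.vstack([Sigma_QR, sigma_R[None, :]])          # [Sigma_QR; sigma_R^T]
    nu_c = B @ np.linalg.solve(Sigma_R, R)               # B Sigma_R^{-1} R
    top = np.hstack([Sigma_Q, sigma_Q[:, None]])
    bottom = np.append(sigma_Q, 1.0)[None, :]
    joint = np.vstack([top, bottom])                     # cov of (Q, Z*) marginally
    Sigma_c = joint - B @ np.linalg.solve(Sigma_R, B.T)  # Schur complement
    return nu_c, Sigma_c
```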

3.3 Simulation

3.3.1 Simulation Design

In this section, we assess the finite-sample performance of the proposed method for multisite precipitation data. The following procedure is used to generate the simulated precipitation data. We utilize a dataset of reanalysis daily global weather factors in the greater Chicago area from 1976 to 2002, from which covariates are sampled in the simulations. It is worth noting that, to mimic the real data, the covariates at each time $t$ are set to be the same across all locations, denoted $\mathbf{x}_t$, i.e., $\mathbf{x}_{t,1} = \cdots = \mathbf{x}_{t,S} = \mathbf{x}_t$.

1. For each time $t = 1,\ldots,T$, randomly draw predictors $\mathbf{x}_t \in \mathbb{R}^3$ from the dataset of reanalysis daily global weather factors (reanalysis temperature, humidity and precipitation) in the greater Chicago area, and obtain $\pi_t = \exp(\mathbf{x}_t^{\mathrm{T}}\beta)/(1 + \exp(\mathbf{x}_t^{\mathrm{T}}\beta))$ and $\mu_t = \mathbf{x}_t^{\mathrm{T}}\gamma$, with $\alpha$ prespecified. For each location $s = 1,\ldots,S$ at each time $t$, set $\pi_{t,1} = \cdots = \pi_{t,S} = \pi_t$ and $\mu_{t,1} = \cdots = \mu_{t,S} = \mu_t$;

2. fix $S$ locations on a $[0,1] \times [0,1]$ grid, and define
$$\Sigma_S = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1S} \\ \rho_{21} & 1 & \cdots & \rho_{2S} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{S1} & \rho_{S2} & \cdots & 1 \end{pmatrix}$$
as the correlation matrix, where $\rho_{ij} = M(d_{ij}; \lambda, \nu)$ by (3.5) and $d_{ij}$ is the Euclidean distance between the $i$th and $j$th locations among $s \in \{1,\ldots,S\}$;

3. for each $t = 1,\ldots,T$, generate $(u_{t,1}, u_{t,2}, \ldots, u_{t,S})$ from the copula function $C_S(\cdots; \Sigma_S)$;

4. at location $s$ and time $t$, compare $u_{t,s}$ with $1 - \pi_{t,s}$: if $u_{t,s} \le 1 - \pi_{t,s}$, set $y_{t,s} = 0$;

5. if $u_{t,s} > 1 - \pi_{t,s}$, let $v_{t,s} = 1 - \frac{1 - u_{t,s}}{\pi_{t,s}}$ and set $w_{t,s} = [G^{-1}\{v_{t,s}; \alpha, \mu_{t,s}\}]^3$, where $G(\cdot\,; \alpha, \mu)$ is the CDF of the gamma distribution, with density
$$f(y)\,dy = \frac{1}{\Gamma(\alpha)} \left(\frac{\alpha}{\mu} y\right)^{\alpha} \exp\left(-\frac{\alpha}{\mu} y\right) d\log(y).$$
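A compact sketch of the five generating steps follows (our own illustration; the covariate draws from the reanalysis data are replaced by a generic T-by-p design matrix X, and the parameter names follow the text):

```python
import numpy as np
from scipy.stats import norm, gamma as gamma_dist

def simulate_precip(X, beta, gam, alpha, Sigma_S, rng):
    """Generate a T x S matrix of precipitation following steps 1-5."""
    T, S = X.shape[0], Sigma_S.shape[0]
    pi_t = 1.0 / (1.0 + np.exp(-X @ beta))     # step 1: logistic pi_t, shared by sites
    mu_t = X @ gam                             # step 1: identity-link gamma mean
    L = np.linalg.cholesky(Sigma_S)            # step 2: Matern correlation given
    W = np.zeros((T, S))
    for t in range(T):
        u = norm.cdf(L @ rng.standard_normal(S))   # step 3: Gaussian copula draw
        wet = u > 1.0 - pi_t[t]                    # step 4: dry where u <= 1 - pi
        v = 1.0 - (1.0 - u[wet]) / pi_t[t]         # step 5: rescale and invert
        W[t, wet] = gamma_dist.ppf(v, a=alpha, scale=mu_t[t] / alpha) ** 3
    return W
```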

3.3.2 Estimation of Matérn Parameters

In this section, we estimate the Matérn parameter $\lambda$ from the simulated data. Although the marginal parameters $\theta = (\beta, \gamma, \alpha)$ are also estimated, they are not of particular interest in this chapter; we therefore omit the results for the marginal parameters and focus on the estimation of $\lambda$. We first generate simulated precipitation data according to the procedure in Section 3.3.1, choosing the parameters $\beta = (-1.5, 1.5)$, $\gamma = (0.30, 0.30)$, $\alpha = 2$, $\lambda = 20$, and $\nu = 1$. The simulation is replicated 240 times. Within each replicate, we consider $\{T = 60, S = 3\}$. We compare the following methods:

1. JF: based on the proposed three-part model and the full likelihood estimator $\hat{\lambda}_\nu$ of the Matérn parameter;

2. JC: based on the proposed three-part model and the composite likelihood estimator $\tilde{\lambda}_\nu$ of the Matérn parameter.

Table 3.1 summarizes the bias and root mean squared error (RMSE) of $\hat{\lambda}_\nu$ and $\tilde{\lambda}_\nu$ from the two methods, with $\nu$ fixed at its true value. The results show that the composite likelihood estimator performs comparably to the full likelihood estimator.

Table 3.1: Bias and root mean squared error (RMSE) of the proposed methods for estimating the Matérn parameter $\lambda$.

          JF        JC
  Bias   -0.89     -0.43
         (0.28)    (0.32)
  RMSE    4.48      4.90
         (0.30)    (0.22)

JF is based on the proposed three-part model and the full likelihood estimator $\hat{\lambda}_\nu$ of the Matérn parameter; JC is based on the proposed three-part model and the composite likelihood estimator $\tilde{\lambda}_\nu$.

Figure 3.1: Boxplots of $\hat{\lambda}$ for the JF and JC methods. [Two boxplots, one per method; the y-axis is the estimated value of $\lambda$ and the horizontal dashed line marks the truth $\lambda = 20$.]

JF is based on the proposed three-part model and the full likelihood estimator $\hat{\lambda}_\nu$ of the Matérn parameter; JC is based on the proposed three-part model and the composite likelihood estimator $\tilde{\lambda}_\nu$.

Figure 3.1 presents the boxplots of the estimated Matérn parameter from the JF and JC methods. The results show that the composite likelihood estimator JC exhibits a slightly larger variance than its full likelihood counterpart JF. However, the JF method is computationally more intensive due to the complexity of the full likelihood function (3.6), especially for data with a larger number of locations. Figure 3.2 shows the computing time of the two methods as the number of locations increases; the computing time of JF grows exponentially with the number of locations. Considering both numerical and computational efficiency, we focus on the JC method in the following sections.

Figure 3.2: Computing time for the JF and JC methods as the number of locations increases. [Line plot, one line per method; the x-axis is the number of locations (4 to 10) and the y-axis is the computing time in seconds.]

JF is based on the proposed three-part model and the full likelihood estimator $\hat{\lambda}_\nu$ of the Matérn parameter; JC is based on the proposed three-part model and the composite likelihood estimator $\tilde{\lambda}_\nu$.

3.3.3 Prediction at New Time for Existing Locations

In this section, we assess the finite-sample performance of prediction at new times for the existing locations, based on the estimated marginal and correlation parameters $(\hat{\theta}, \hat{\eta})$. We consider the same simulation design as in Section 3.3.1, except that simulated precipitation for $T + 1$ days is generated to validate the one-day-ahead predictions. It is worth noting that at each time $t$, $\mathbf{x}_{t,1} = \cdots = \mathbf{x}_{t,S} = \mathbf{x}_t$, i.e., the sampled covariates are the same across the locations $s = 1,\ldots,S$. This mimics the real reanalysis data application: the reanalysis covariates are global weather factors used in the statistical downscaling model, and are therefore the same across all locations. We also conducted simulations for a scenario with site-specific covariates, which led to similar conclusions and is thus omitted. We choose the same parameters as in Section 3.3.2. The simulation is repeated 240 times. Within each replicate, we consider $\{T = 60, S = 20\}$. In the estimation step, the simulated 60-day data are used to estimate the parameters $(\theta, \eta)$. In the prediction step, the estimated parameters are used to conduct one-day-ahead prediction. Finally, in the evaluation step, we consider the CRPS and the multivariate rank histogram of Section 3.2.4.4. We consider the following prediction methods:

1. OMNI: prediction based on the joint approach in Section 3.2.4.2, with the true values of the parameters $(\theta, \eta)$;

2. JC($\nu$): prediction based on the joint approach in Section 3.2.4.2, with parameters estimated from the proposed three-part model using the composite likelihood function (3.7), with the smoothing parameter fixed at a prespecified $\nu$;

3. MARGIN: prediction based on the marginal approach in Section 3.2.4.1, with the estimated parameters $\hat{\theta}$.

It is worth noting that we do not consider the JF method here due to its computational inefficiency: as noted above, the computing time for JF increases exponentially with the number of locations, so it is infeasible when the number of locations is large, say $S = 20$ as in this simulation setup. Therefore, in order to

estimate the Matérn parameter $\lambda$, we only consider the proposed three-part model with the composite likelihood function (3.7), denoted JC($\nu$), where $\nu$ is the prespecified smoothing parameter in the Matérn function (3.5). We consider $\nu = 1$, $0.5$ and $5$, so that JC(1) corresponds to the correctly specified $\nu$, while JC(0.5) and JC(5) are based on misspecified $\nu$ values. Table 3.2 summarizes the CRPS of the probabilistic one-day-ahead predictions from the different methods. The CRPS is calculated from $M$ simulated ensembles from the predictive distribution, with the estimated parameters $\hat{\theta}$ and $\tilde{\lambda}_\nu$ plugged in; we set the ensemble size to $M = 200$. The results show that the joint approach improves performance by decreasing the CRPS and, in terms of CRPS, is not very sensitive to the choice of the smoothing parameter $\nu$.

Table 3.2: CRPS for one-day-ahead precipitation predictions.

             JC(1)    JC(0.5)   JC(5)    MARGIN   OMNI
  10×CRPS    1.52     1.52      1.52     1.54     1.43
            (0.18)   (0.18)    (0.18)   (0.18)   (0.16)

JC($\nu$): prediction based on the joint approach with parameters estimated by the composite likelihood method with fixed $\nu$; MARGIN: prediction based on the marginal approach with the estimated parameters $\hat{\theta}$; OMNI: prediction based on the joint approach in Section 3.2.4.2, with the true values of $(\theta, \eta)$; CRPS: the continuous ranked probability score, approximated by (3.15).

Figure 3.3 shows the multivariate rank histograms obtained from the five methods. A deviation from the uniform distribution of the ranks indicates a poor fit of the predictive distribution to the underlying precipitation distribution. The results demonstrate that the joint approach using estimated parameters provides performance comparable to the OMNI method, in which the true parameters are plugged in.

Figure 3.3: Multivariate rank histograms for prediction at new times obtained from the five methods. [Five histogram panels: OMNI, JC(1), JC(0.5), JC(5), and MARGIN; the x-axis is the multivariate rank (0 to 200) and the y-axis is the frequency.]

JC($\nu$): prediction based on the joint approach with parameters estimated by the composite likelihood method with fixed $\nu$; MARGIN: prediction based on the marginal approach with the estimated parameters $\hat{\theta}$; OMNI: prediction based on the joint approach in Section 3.2.4.2, with the true values of $(\theta, \eta)$.

In conclusion, both Table 3.2 and Figure 3.3 demonstrate that the joint approach improves prediction relative to the marginal approach, in terms of both the probabilistic distribution (CRPS) and the uniformity of the multivariate ranks.

3.3.4 Interpolation at New Locations

In this section, we assess the finite-sample performance of the interpolation of precipitation at a new location $S^*$ at a new time point $T^*$, given that the precipitation in the neighborhood is known; that is, we predict
$$W_{T^*,S^*} \mid \{\mathbf{X}_{T^*,s}, Y_{T^*,s}, W_{T^*,s}\}_{s=1}^{S}, (\hat{\theta}, \hat{\eta}),$$
where $(\hat{\theta}, \hat{\eta})$ is estimated from $\{\mathbf{X}_{t,s}, Y_{t,s}, W_{t,s}\}_{t=1,s=1}^{t=T,s=S}$. In addition, we investigate the sensitivity of the interpolation to misspecification of the marginal distributions. Recall that, in our three-part model, the logit link $h_o^L$ or the probit link $h_o^P$ can be used in the part 1 model (3.1), and the identity link $h_a^i$ or the log link $h_a^l$ can be used in the part 2 model (3.3). In our simulation design of Section 3.3.1, the true data are generated with $h_o^L$ in part 1 and $h_a^i$ in part 2. In practice, however, the model in part 1 or part 2 may be misspecified. We therefore examine the sensitivity of the interpolations to misspecification of the marginal distributions in parts 1 and 2, by using $h_o^P$ in part 1 and/or $h_a^l$ in part 2 to estimate $(\theta, \eta)$; the interpolations are then based on the misspecified estimators. For brevity, we denote by (1) Li the model with correctly specified marginals (logit link for part 1 and identity link for part 2); (2) Ll the model with logit link for part 1 and log link for part 2; (3) Pi the model with probit link for part 1 and identity link for part 2; and (4) Pl the model with probit link for part 1 and log link for part 2. We consider the same simulation design as in Section 3.3.1, except that simulated precipitation at $S + 1$ locations and $T + 1$ time points is generated to validate the interpolations. We choose the same parameters as in Section 3.3.2. The simulation is replicated 240 times. Within each replicate, we consider $\{T = 60, S = 3\}$, and hence the generated precipitation on the 61st day at location 4 is used for validation. In the estimation step, the simulated 60-day data are used to estimate the parameters $(\theta, \eta)$. As discussed in Section 3.2.5, we compare the following strategies for performing the interpolations, given the estimated $(\hat{\theta}, \hat{\eta})$ and $\{\mathbf{X}_{t,s}, Y_{t,s}, W_{t,s}\}_{t=1,s=1}^{t=T,s=S}$:

1. MARGIN: the interpolated $W_{T^*,S^*}$ is obtained from the marginal strategy in Section 3.2.5.1, based on the estimated parameters $\hat{\theta}$;

2. JC: the interpolated $W_{T^*,S^*}$ is obtained from the joint strategy in Section 3.2.5.2, based on the estimated $\hat{\theta}$ and the composite likelihood estimator $\tilde{\eta}$ in (3.7).

Note that in the JC method, $\eta$ is estimated with $\nu$ fixed at 1, the true value of the smoothing parameter used for data generation. We first compare the CRPS discussed in Section 3.2.4.4. Recall that for the observed (simulated) precipitation at $t = 61$ and $s = 4$, $w_{61,4}$, we generate a large number $M = 200$ of ensembles given the covariates and the observations from the other locations $\{w_{T^*,s}, y_{T^*,s}, \mathbf{x}_{T^*,s}\}_{s=1}^{S}$, with the estimated parameters $(\hat{\theta}, \hat{\eta})$ plugged in, denoted $y_{61,4}^{*(1)}, \ldots, y_{61,4}^{*(M)}$. Then the CRPS is approximated by (3.15):
$$\frac{1}{M} \sum_{i=1}^{M} |y_{61,4}^{*(i)} - w_{61,4}| - \frac{1}{2M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} |y_{61,4}^{*(i)} - y_{61,4}^{*(j)}|.$$

Table 3.3 summarizes the averaged CRPS for the above interpolations under the four model specifications across simulations.

Table 3.3: 100×averaged CRPS between the interpolated ensembles $\{y_{T^*,S^*}^{*(m)}\}_{m=1}^{M}$ and the truly generated $w_{T^*,S^*}$, across simulations.

            Li       Ll       Pi       Pl
  MARGIN   13.17    13.74    13.16    13.74
           (3.11)   (3.17)   (3.11)   (3.17)
  JC        3.45     3.43     3.45     3.43
           (0.94)   (0.93)   (0.94)   (0.93)

Table 3.3 shows that accounting for the dependence across locations yields more accurate predictions than the marginal-based ones: a significant decrease in CRPS is observed in all scenarios with the joint strategy, even when the logit link is misspecified as the probit link, or the identity link as the log link. Consequently, the proposed three-part model is not sensitive to misspecification of the link functions in parts 1 and 2. We also compare the prediction error between the observed (simulated) precipitation $w_{61,4}$ and the deterministic prediction $\hat{w}_{61,4}$, defined as the median of the ensembles $y_{61,4}^{*(1)}, \ldots, y_{61,4}^{*(M)}$. We use the mean absolute deviation (MAD) to assess this prediction error across simulations.

Table 3.4: 100×averaged mean absolute deviation between the median of the interpolated ensembles $\{y_{T^*,S^*}^{*(m)}\}_{m=1}^{M}$ and the truly generated $w_{T^*,S^*}$, across simulations.

            Li       Ll       Pi       Pl
  MARGIN   14.38    14.65    14.38    14.65
           (3.46)   (3.47)   (3.46)   (3.47)
  JC        4.88     4.86     4.88     4.86
           (1.29)   (1.28)   (1.29)   (1.29)

Table 3.4 summarizes the MAD for the marginal and joint interpolation methods under the four model specifications across simulations. The results suggest that the joint method uniformly outperforms the marginal method, giving significantly smaller MAD. Finally, to compare the rain occurrence, we also show the ROC curves for the predicted raining probability against the observed rain occurrence at $t = 61$ and $s = 4$. The predicted raining probability is calculated from the ensembles $y_{61,4}^{*(1)}, \ldots, y_{61,4}^{*(M)}$ as
$$\frac{1}{M} \sum_{m=1}^{M} I\left(y_{61,4}^{*(m)} > 0\right),$$
where $I(\cdot)$ is the indicator function.
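Given the ensemble rain probabilities and the binary observations across validation cases, the AUC comparison can be reproduced with standard tooling; the following sketch uses scikit-learn, an assumed choice not mentioned in the text:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def rain_probability(ens):
    """Predicted raining probability: the fraction of positive ensemble draws."""
    return np.mean(np.asarray(ens) > 0)

# probs = np.array([rain_probability(e) for e in ensembles])  # one ensemble per case
# auc = roc_auc_score(y_observed, probs)                      # y_observed in {0, 1}
```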

Table 3.5: 100×AUC of the predicted raining probability against the observed rain occurrence for the two methods under misspecification of the link functions.

            Li       Ll       Pi       Pl
  MARGIN   75.93    75.93    76.06    76.06
  JC       98.56    98.52    98.57    98.52

Table 3.5 summarizes the AUC (area under the curve) for the above methods under misspecification of the link functions. The JC method lifts the AUC significantly compared to MARGIN, even when the link functions are misspecified. Figure 3.4 illustrates the ROC (receiver operating characteristic) curves for the scenario in which the probit link $h_o^P$ is (mis)used in part 1 and the log link $h_a^l$ is (mis)used in part 2; as shown in Figure 3.4, the JC method significantly outperforms the MARGIN method.

Figure 3.4: ROC curves and AUCs using misspecified link functions in models (3.1) and (3.3). [Two panels: MARGIN (AUC = 0.76) and JC (AUC = 0.99); the x-axis and y-axis are the false-positive rate and true-positive rate of the predicted rain occurrence, respectively.]

In conclusion, in this simulation study of interpolation at a new location at a new time point, given the available neighboring precipitation, our proposed three-part model is insensitive to misspecification of the link functions in parts 1 and 2. In addition, the proposed joint strategy (Section 3.2.5.2) outperforms the baseline marginal strategy (Section 3.2.5.1): the predictive distribution of the interpolated precipitation uniformly exhibits smaller CRPS and MAD.

3.4 Analysis of Chicago Precipitation Reanalysis Data

In this section, we apply the proposed prediction methods to the statistical downscaling of daily precipitation in the greater Chicago area. Statistical downscaling aims to predict small-scale regional climate change using large-scale global climate model output; such output is available for both historical and future time periods. Our specific interests are: (1) the prediction of precipitation at existing stations, given the coarser-resolution predictor variables generated from a global climate model; and (2) interpolation at a new station without historical observations, given the observed precipitation at neighboring stations and the global climate predictors. In our dataset, the response variables $\{W_{t,s}, Y_{t,s}\}$ are the observed daily precipitation (inches) at $S = 7$ locations and $T = 8392$ time points from 1957 to 2002; we mainly focus on the most recent five years (1998 to 2002) to implement our proposed method. The predictors $\{\mathbf{X}_{t,s}\}$ are the reanalysis daily temperature, humidity and precipitation from the ERA-40 reanalysis model introduced in Uppala et al. (2005). Figure 3.5 shows the relative locations of the seven stations in the Chicago precipitation dataset.

Figure 3.5: Stations for the Chicago precipitation data. [Map of the seven stations: 1. Aurora, 2. Midway, 3. O'Hare, 4. Elgin, 5. Brandon Road Dam, 6. Park Forest, 7. Wheaton.]

3.4.1 Preliminary Analysis of Chicago Precipitation Data

In this preliminary analysis, we investigate different choices of link functions in (3.1) and (3.3) in the proposed three-part model and diagnose their goodness of fit.

89 3.4.1.1 Regression on Rain occurrence

Recall that, in Section 3.2, for rain occurrence we propose the regression model (3.1) with two choices of link function:
$$h_o^L(z) = \frac{1}{1 + \exp(-z)}$$
for logistic regression and
$$h_o^P(z) = \Phi^{-1}(z)$$
for probit regression. In this section, we first consider regressions at each location separately, using both link functions. Denote by $\hat{\beta}_s^L$ the estimated parameter at location $s$ via logistic regression and by $\hat{\beta}_s^P$ the estimated parameter at location $s$ via probit regression, $s = 1,\ldots,7$. To assess the goodness of fit, we use the ROC (receiver operating characteristic) curve as a graphical diagnostic and evaluate the model fit by the AUC (area under the curve). The ROC curve plots the true-positive rate against the false-positive rate (Type I error) as the threshold varies; at any fixed false-positive rate, a higher true-positive rate indicates that the model predictions fit the observed data better. Therefore, a higher AUC indicates a better fit. In our regressions of rain occurrence on the reanalysis temperature, humidity and precipitation at each location separately, all predictors are statistically significant. In addition, there is no significant difference between the two link functions in terms of the ROC curves and AUCs.

Figure 3.6: ROC curves of the regression models for rain occurrence at station Aurora, with link function $h_o^L$ (left) and $h_o^P$ (right). [Both panels plot sensitivity against specificity; AUC = 0.83 in each.]

Figure 3.6 shows the ROC curves of the regression models for rain occurrence at one of the locations, station Aurora, using logistic (left) and probit (right) regression, respectively. Both link functions yield an AUC of 0.83, indicating similar performance of the logistic and probit regressions for rain occurrence. We notice that the estimates $\hat{\beta}_s^L$ do not vary drastically across the locations $s = 1,\ldots,7$, and neither do $\hat{\beta}_s^P$. Therefore, we fit a regression model for rain occurrence using data from all locations together, assuming $\beta_s^L = \beta^L$ for $s = 1,\ldots,7$ with the link function $h_o^L$, or $\beta_s^P = \beta^P$ for $s = 1,\ldots,7$ with the link function $h_o^P$. The reason for assuming common parameters across locations is that we intend to use information from neighboring stations in the regression for rain occurrence; moreover, this assumption makes interpolation possible at new stations where no historical data are available. Table 3.6 summarizes the estimated parameters $\hat{\beta}^L$ and $\hat{\beta}^P$ for the link functions $h_o^L$ and $h_o^P$, respectively. Under the common parameter assumption, the estimated parameters remain significant and do not deviate greatly from the estimates obtained from the separate regressions at each location.

Figure 3.7 shows the ROC curves for the two choices of link function. From the ROC curves and the corresponding AUCs, we conclude that there is no significant loss of goodness of fit under the common parameter assumption: the AUC is 0.82 for both link functions, comparable to the AUCs obtained from fitting the regression models at each location separately.

Table 3.6: Estimated parameters in the regression model for rain occurrence under the common parameter assumption.

  Link function   Intercept    tem      hum      rprcp
  $h_o^L$          -1.53      -0.61     0.78     1.48
                   (0.01)     (0.02)   (0.02)   (0.02)
  $h_o^P$          -0.89      -0.38     0.48     0.77
                   (0.01)     (0.01)   (0.01)   (0.01)

tem: the reanalysis temperature; hum: the reanalysis humidity; rprcp: the reanalysis precipitation.

Figure 3.7: ROC curves of the regression models for rain occurrence, with link function $h_o^L$ (left) and $h_o^P$ (right), using data from all locations under the common parameter assumption. [Both panels plot sensitivity against specificity; AUC = 0.82 in each.]

92 3.4.1.2 Regression on Rain amount

Recall that, in Section 3.2, for the rain amount we propose the gamma regression model (3.3) with the identity link function
$$h_a^i(z) = z$$
or the log link function
$$h_a^l(z) = \log(z).$$

In this section, we first consider gamma regressions at each location separately, using both link functions. Denote by $\hat{\gamma}_s^i$ the estimated parameter at location $s$ using the identity link $h_a^i$, and by $\hat{\gamma}_s^l$ the estimated parameter at location $s$ using the log link $h_a^l$, $s = 1,\ldots,7$. Since the shape parameter $\alpha$ is assumed constant across locations, we do not include the results for the estimation of $\alpha$ here; the main conclusions are unchanged. To assess the goodness of fit, we consider three graphical tools: (i) the observed-versus-fitted plot, with $y_{s,t^*}^{1/3}$ on the y-axis and $\hat{\mu}_{s,t^*}$ on the x-axis, where points lying around the diagonal line indicate a good fit; (ii) the Q-Q plot of the deviance residuals, defined as $-2\{\log(y_{s,t^*}^*/\hat{\mu}_{s,t^*}) - (y_{s,t^*}^* - \hat{\mu}_{s,t^*})/\hat{\mu}_{s,t^*}\}$; according to McCullagh and Nelder (1989, page 38) and Cepeda-Cuervo et al. (2016), a good fit has approximately normally distributed deviance residuals; and (iii) the Q-Q plot of the simulated $y_{s,t^*}^*$ against $\tilde{y}_{s,t^*}$, as in Wang et al. (2009), where a good fit produces a diagonally aligned plot. In our regressions of rain amount on the reanalysis dataset at each location separately, only the cube root of the reanalysis precipitation is statistically significant, and the estimated parameters show no significant differences across locations. Figure 3.8 shows the observed-versus-fitted plots for the gamma regression model with the identity and log link functions at one of the locations, station Chicago Midway Airport. Both models fit the data well, except for some outliers in the upper quantile of the observed cube root of the rain amount.

Figure 3.8: (i) Observed vs fitted plots for rain amount at station Chicago Midway Airport, with link function $h_a^i$ (left) and $h_a^l$ (right). [The y-axis is the cube root of the observed positive precipitation; the x-axis is the fitted mean value from the gamma regression with the specified link function.]

Figure 3.9 shows the residual Q-Q plots for the gamma regression model with the identity and log link functions at station Chicago Midway Airport. The deviance residuals from both models exhibit a slight deviation from the theoretical normal distribution in the lower quantiles. Figure 3.10 shows the Q-Q plots of the observed against simulated values for the gamma regression model with the identity and log link functions at the same station. The Q-Q plots suggest that the identity link model fits the data slightly better than the log link model, especially in the upper tail.

We also notice that the estimated parameters $\hat{\gamma}_s^i$, $s = 1,\ldots,7$, show no significant differences across locations, and neither do $\hat{\gamma}_s^l$. Therefore, we fit a regression model for rain amount using data from all locations together, assuming common parameters across locations; that is, we assume $\gamma_s^i = \gamma^i$ for $s = 1,\ldots,7$ with the identity link $h_a^i$, or $\gamma_s^l = \gamma^l$ for $s = 1,\ldots,7$ with the log link $h_a^l$.

Figure 3.9: (ii) Q-Q plots for rain amount at station Chicago Midway Airport, with link function $h_a^i$ (left) and $h_a^l$ (right). [The y-axis is the sample quantiles of the deviance residuals of the cube root of the observed positive precipitation after the gamma regression; the x-axis is the theoretical quantiles of the standard normal distribution.]

Table 3.7 summarizes the estimated parameters $\hat{\gamma}^i$ and $\hat{\gamma}^l$ for the link functions $h_a^i$ and $h_a^l$, respectively, under the common parameter assumption for rain amount. Under this assumption, the estimated parameters remain significant and do not deviate greatly from the estimates obtained from the separate regressions at each location.

Table 3.7: Estimated parameters in the regression model for rain amount assuming common parameters across locations.

  Link function   Intercept   rprcp.cr
  $h_a^i$          0.33        0.26
                  (0.00)      (0.00)
  $h_a^l$         -1.04        0.51
                  (0.01)      (0.01)

rprcp.cr: the cube root of the reanalysis precipitation.

Figure 3.10: (iii) Q-Q plots of observed against simulated rain amount at station Chicago Midway Airport, with link function $h_a^i$ (left) and $h_a^l$ (right). [The y-axis is the sample quantiles of the cube root of the observed positive precipitation; the x-axis is the quantiles of samples simulated from the gamma distributions with the estimated regression parameters.]

Figure 3.11 shows the three graphical diagnostics for the gamma regression model for rain amount, assuming common parameters across locations, with the identity link and the log link, respectively.

Both models predict poorly in the upper tail of the observed-versus-fitted plots. The Q-Q plots of observed against simulated data again suggest that the identity link provides a superior fit. We also summarize the Bayesian information criterion (BIC) for the two choices of link function, with and without the common parameter assumption, in Table 3.8. The BIC, introduced by Schwarz (1978), is a criterion for model selection among a finite set of models, defined as

$$\mathrm{BIC} = \ln(N)\,p - 2\ln(\hat{ll}),$$
where $N$ is the number of data points, $p$ is the number of parameters, and $\hat{ll}$ is the log likelihood evaluated at the maximum likelihood estimator. The BIC adds to the log likelihood a penalty on the number of parameters in the model; the model with the smaller BIC is preferred.

Figure 3.11: Graphical goodness-of-fit tools for the regression models for rain amount, with link function $h_a^i$ (upper panels) and $h_a^l$ (lower panels), using data from all locations under the common parameter assumption. [(i): observed against fitted plots, with the cube root of the observed positive precipitation on the y-axis and the fitted mean from the gamma regression on the x-axis. (ii): residual Q-Q plots, with the sample quantiles of the deviance residuals on the y-axis and the theoretical standard normal quantiles on the x-axis. (iii): observed against simulated Q-Q plots, with the sample quantiles of the cube root of the observed positive precipitation on the y-axis and the quantiles of samples simulated from the fitted gamma distributions on the x-axis.]

From Table 3.8, we conclude that for the regression model for rain amount, the model using the identity link function under the common parameter assumption is preferable, since it gives the lowest BIC.

Table 3.8: BIC for the regression models for rain amount using the two choices of link function, with common or location-specific parameters.

  Link function   Common parameter   Location-specific
  $h_a^i$          -2847.78           -2795.89
  $h_a^l$          -3024.53           -2972.14

97 3.4.2 Prediction at New Time for Existing Locations

In this section, we investigate the model performance for prediction at new times (at the existing locations) with the Chicago precipitation data. We consider the data within five years (1997 to 2001). There are four seasons in each year and three months (90 days) in each season. Within each season, we set the window size to $T = 60$ days as the training dataset. The model parameters $(\theta, \eta)$ are estimated from the training dataset and used to predict the following two days of precipitation at each location. The two-day-ahead prediction is repeated until the end of each season by sliding the training window accordingly. Therefore, we have 30 validation days for each season and 600 across the five years. Recall that there are seven locations in the Chicago precipitation dataset, so that we finally have 4200 validation observations for evaluating the prediction performance. Mathematically, each training dataset contains $\{W_{t,s}, Y_{t,s}, X_{t,s}\}_{t=1,s=1}^{t=T,s=S}$, and the testing dataset contains $\{X_{t,s}\}_{t=T+1,s=1}^{t=T+2,s=S}$, where $T = 60$ and $S = 7$. We are then predicting
$$\{W_{t,s}\}_{t=T+1,s=1}^{t=T+2,s=S} \,\Big|\, \{X_{t,s}\}_{t=T+1,s=1}^{t=T+2,s=S},\; (\hat{\theta}, \hat{\eta}),$$
where $\hat{\theta} = (\hat{\beta}, \hat{\gamma}, \hat{\alpha})$ collects the parameters of the rain occurrence regression in the part 1 model (3.1) and of the rain amount regression in the part 2 model (3.3), assuming common parameters across locations, and $\hat{\eta} = (\tilde{\lambda}_\nu, \nu)$ is the Matérn parameter in (3.5) in the part 3 copula-based model, with $\nu$ prespecified and $\tilde{\lambda}_\nu$ estimated through the composite likelihood function (3.9). Note that $(\hat{\theta}, \hat{\eta})$ is estimated from $\{W_{t,s}, Y_{t,s}, X_{t,s}\}_{t=1,s=1}^{t=60,s=7}$. There are, in total, 30 such training-testing pairs in each season. The marginal approach is used as the baseline, in which both estimation and prediction are carried out at each location separately and the spatial dependence is ignored. In contrast, the proposed joint approach accounts for the dependence across multiple locations. We also implemented two other competing methods for comparison.

98 revised and considered as the first competing method. Their original model is reviewed in Section 3.2.4.3. Here, the main revisions were done on the selection of regression covariates and dependence structure. In their original model, the reanalysis precipitation and its indicator are the covariates considered in both rain occurrence and rain amount regression. However, in our dataset, we also include temperature and humidity which are shown to be statistically significant in the regression models of rain occurrence and rain amount. Consequently, to make the 2GP model fit better to the data, the marginal regression models are revised, including more predictors. Meanwhile, the covariance matrix in the 2GP model is formulated to exponentially decay with respect to distance. However, in our proposed model, Matérn structure is considered with the smoothness parameter prespecified at 1. Note that exponential decay is a special case of Matérn structure by specifying the smoothness parameter to be 0.5. Thus, to make a fair comparison, we change the exponential structured covariance matrix in the original 2GP model to the Matérn structure, with the smoothing parameter fixed at 1. The second competing method is the power-transformed truncated normal model (PTN) proposed by Bardossy and Plate (1992). The original PTN model assumed that the observed multi-location precipitation data came from a latent multivariate normal distribution. In their assumption, the latent multivariate normal distribution is truncated at zero and then powered at some fixed exponent, transferring to the precipitation. During the estimation, the mean vector, the variance at each location, and the power exponent are first estimated marginally by maximizing the marginal likelihood based on a univariate truncated normal distribution. The correlation matrix is then estimated given that all of the other parameters are fixed. It is worth noting that no covariates or regressions are involved in the PTN model. In our implementation, we revise the unstructured correlation matrix to the Matérn structure with the smoothing parameter fixed at 1. For each method, a large number M = 200 of ensembles are generated as the probabilistic prediction. To evaluate the prediction performance, we compare the following criteria based

99 ∗(m) M on the true observation yt,s and its ensembles {yt,s }m=1 for each fixed t ∈ T , the validation time set and s = 1...7:

1. CRPS: The continuous ranked probability score. Additional details can be found in Section 3.2.4.4;

2. Multivariate rank histogram: The histogram of multivariate ranks across the validation

time t ∈ T ;

3. Prediction errors: Based on the probabilistic forecasting ensembles, we can obtain use mean or median of the ensemble distribution as the deterministic prediction. We consider two kinds of prediction errors based on the deterministic prediction, including mean squared errors (MSE) and mean absolute deviance (MAD).

Table 3.9 summarizes the CRPS and prediction errors comparing the marginal and joint approaches, together with the two other competing models. The PTN model does not perform as well as the other three, since it ignores the information in the covariates. Both the joint approach and the 2GP model are comparable and superior to the marginal approach. In particular, the joint approach performs uniformly better than all other models, regardless of (1) the choice of the smoothing parameter or (2) how the deterministic prediction is obtained from the ensembles; in every scenario, the deterministic prediction obtained from the ensembles yields a relatively low prediction error. From the multivariate rank histograms in Figures 3.12 and 3.13, the marginal approach shows a clear U-shape in the distribution of the multivariate ranks, and the PTN method also exhibits a deviation from uniformity. In contrast, both the joint approach and the 2GP method have multivariate rank histograms close to uniform, especially when the smoothing parameter is prespecified at 0.5 or 1.

Table 3.9: Performance of predictions at new times based on different models for the Chicago precipitation dataset from 1998 to 2002.

              MARGIN  JC(0.5)  JC(1)   JC(5)   2GP(0.5)  2GP(1)  2GP(5)  PTN
  100×CRPS     7.46    7.12     7.12    7.13    7.15      7.15    7.14    9.28
              (0.32)  (0.32)   (0.32)  (0.32)  (0.32)    (0.32)  (0.32)  (0.39)

  Prediction error for the mean
  100×MSE      7.38    7.18     7.19    7.22    7.29      7.27    7.22    9.22
              (0.80)  (0.79)   (0.80)  (0.80)  (0.81)    (0.81)  (0.80)  (1.02)

  Prediction error for the median
  100×MAD      9.43    9.11     9.11    9.12    9.15      9.13    9.15   10.18
              (0.41)  (0.41)   (0.41)  (0.41)  (0.41)    (0.41)  (0.41)  (0.46)

MARGIN: predictions based on the marginal approach; JC($\nu$): predictions based on the joint approach, using the composite likelihood, with the smoothing parameter in the Matérn function fixed at $\nu$; 2GP($\nu$): the two-stage Gaussian process model of Berrocal et al. (2008), with the smoothing parameter in the Matérn function fixed at $\nu$; PTN: the power-transformed truncated normal model of Bardossy and Plate (1992).

Figure 3.12: Multivariate rank histograms of different methods for predicting at new times for the Chicago precipitation dataset from 1998 to 2002. [Four histogram panels: MARGIN, PTN, JC(0.5), 2GP(0.5); the x-axis is the multivariate rank (0 to 200) and the y-axis is the frequency.]

MARGIN: predictions based on the marginal approach; JC($\nu$): predictions based on the joint approach, using the composite likelihood, with the smoothing parameter in the Matérn function fixed at $\nu$; 2GP($\nu$): the two-stage Gaussian process model of Berrocal et al. (2008), with the smoothing parameter in the Matérn function fixed at $\nu$; PTN: the power-transformed truncated normal model of Bardossy and Plate (1992).

Figure 3.13: Multivariate rank histograms of different methods for predicting at new times for the Chicago precipitation dataset from 1998 to 2002 (continued). [Four histogram panels: JC(1), 2GP(1), JC(5), 2GP(5).]

JC($\nu$): predictions based on the joint approach, using the composite likelihood, with the smoothing parameter in the Matérn function fixed at $\nu$; 2GP($\nu$): the two-stage Gaussian process model of Berrocal et al. (2008), with the smoothing parameter in the Matérn function fixed at $\nu$.

3.4.3 Interpolation at New Locations

Assume that the precipitation data $\{W_{t,s}, Y_{t,s}\}_{t=1,s=1}^{t=T,s=S}$ are observed in the training dataset. Our main goal is to interpolate the precipitation at a new location $s = S^*$ at a new time $t = T^*$, given $\{W_{T^*,s}\}_{s=1}^{S}$ and $\{X_{T^*,s}\}_{s=1}^{S}$. Mathematically, we are predicting
$$W_{T^*,S^*} \mid W_{T^*,1}, \ldots, W_{T^*,S}, X_{T^*,S^*}, (\hat{\theta}, \hat{\eta}),$$
where $(\hat{\theta}, \hat{\eta})$ is estimated from $\{W_{t,s}, Y_{t,s}, X_{t,s}\}_{t=1,s=1}^{t=T,s=S}$. Since we are making probabilistic predictions, a large number $M$ of ensembles $\{y_{T^*,S^*}^{*(m)}\}_{m=1}^{M}$ is generated using the joint strategy JC (Section 3.2.5.2) based on the estimator $(\hat{\theta}, \hat{\eta})$; we choose $M = 200$ in this study. In the original Chicago precipitation dataset there are seven locations. We cross-validate over the seven locations, each time leaving out the data from one location as the validation set. Within each validation, the sliding-window strategy of the previous section is applied again to estimate and update the parameters; accordingly, we set $\{T = 60, S = 6\}$.

As a baseline, the marginal strategy (MARGIN) based on $\hat{\theta}_s$ is also considered (see Section 3.2.5.1). Note that the estimator $\hat{\theta}_s$ in the MARGIN method differs from $\hat{\theta}$ in the JC method: $\hat{\theta}_s$ is estimated at each location separately, to account for variation across locations. Consequently, for a new location $S^*$ without historical data, $\hat{\theta}_{S^*}$ is approximated by the average of $\hat{\theta}_s$, $s = 1,\ldots,6$. We also compare our proposed JC method with the 2GP and PTN methods. Neither of the original works considered interpolation at new locations, so we generalize the original 2GP and PTN methods to use the estimated model parameters for interpolation within their model frameworks. In the 2GP model, two separate latent Gaussian processes are used to model rain occurrence and rain amount, respectively. For the rain amount at the new location, the distribution is the univariate conditional normal distribution given the rain amounts in the neighborhood; the rain occurrence can be drawn from the conditional multivariate truncated normal distribution given the rain occurrences in the neighborhood. In the PTN model, the marginal mean and variance at the new location are not accessible, so a weighted average of the means and variances at the current locations is used as a proxy; since only one new location is considered in our analysis, we set equal weights. To evaluate the performance, the ROC is used to assess the accuracy of the predicted rain occurrence, the CRPS is used to assess the distributional deviation from the true observations, and prediction errors are used to assess the performance of the deterministic interpolations.

Table 3.10: Performance of the cross-validated interpolations based on different models for the Chicago precipitation dataset from 1998 to 2002.

              MARGIN   JC      2GP     PTN
  100×CRPS     7.10    6.60    6.89    7.39
              (0.32)  (0.30)  (0.32)  (0.32)

  Prediction error for the mean
  100×MSE      7.11    6.61    6.82    6.59
              (0.79)  (0.77)  (0.79)  (0.78)

  Prediction error for the median
  100×MAD      9.05    8.46    8.64    8.84
              (0.41)  (0.38)  (0.39)  (0.38)

MARGIN: predictions based on the marginal approach; JC: predictions based on the joint approach, using the composite likelihood; 2GP: the two-stage Gaussian process model of Berrocal et al. (2008); PTN: the power-transformed truncated normal model of Bardossy and Plate (1992). The smoothing parameter in the Matérn function is fixed at $\nu = 1$.

Figure 3.14: ROC curves and AUCs for the rain occurrence predictions of the four candidate methods. [Four panels: MARGIN (AUC: 0.83), PTN (AUC: 0.84), JC (AUC: 0.87), 2GP (AUC: 0.86); each plots sensitivity against specificity.]

MARGIN: the marginal approach; JC: predictions based on the joint approach, with the composite likelihood; 2GP: the two-stage Gaussian process method; PTN: the power-transformed truncated normal method. Smoothing parameters are specified at $\nu = 1$.

Table 3.10 summarizes the CRPS and prediction errors for the four methods. The proposed joint method has the lowest cross-validated CRPS across all seven locations, indicating consistently better performance at every location compared to the others. In terms of prediction error, JC performs uniformly better than 2GP and PTN, exhibiting relatively lower errors. In conclusion, for interpolation at a new location, the proposed joint method outperforms the alternatives in both probabilistic and point prediction. Figure 3.14 shows the ROC curves for the rain occurrence predictions of the four candidate methods. While PTN and 2GP lift the AUC by 1% and 3%, respectively, over the baseline, the proposed JC method increases the AUC further; the relatively higher AUC indicates that the JC method is expected to predict the raining probability on any given day more accurately.

3.5 Conclusion

In this chapter, we propose a multi-step approach for analyzing precipitation, using copulas to capture the spatial dependence. The proposed method is flexible in the choice of the marginal distributions, and is shown to be insensitive to misspecification of the link functions in the marginal distributions and of the smoothing parameter in the spatial correlation matrix. Compared to existing methods, the proposed method consistently performs better in both forecasting at new times and interpolation at new locations. Utilizing composite likelihood estimation and a parameterized spatial correlation structure helps handle higher-dimensional spatial correlation matrices.

Chapter 4

Conclusion and Discussion

4.1 Concluding Remarks

The study of copulas and their applications in statistics and probability has attracted increasing attention from researchers. In this work, we explore copula-based estimation for temporally or spatially dependent data. For a stationary time series, copula functions are utilized to capture the temporal dependence, while the marginal distribution is estimated nonparametrically. The censoring caused by detection limits, however, complicates the likelihood and can lead to biased and/or inefficient estimates of the temporal correlation. We derive the likelihood function, which involves high-dimensional integrals, to account for the censoring. To handle the high-dimensional integration, we explore copula-based importance sampling, in which the truncated conditional copula serves as the proposal distribution. This sampling technique is shown to be computationally efficient in our numerical studies. Based on the estimated copula parameter, we further investigate the conditional quantile estimation given the last observation in the time series; a simple illustration is given after this paragraph. We find that the conditional quantile for the censored time series depends on the copula parameter and on the last consecutive censored sequence at the tail of the series. In addition, our proposed copula-based estimator is theoretically proven to be consistent and more efficient than existing methods.
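To make the conditional-quantile idea concrete in the simplest uncensored one-step case, suppose the temporal dependence follows a Gaussian copula with parameter ρ and the marginal CDF G* and its inverse are available. This is a minimal sketch only: in the dissertation G* is estimated nonparametrically and censored tails require the sequential adjustment described above, and the function names below are illustrative.

```python
import numpy as np
from scipy.stats import norm

def cond_quantile_gaussian(y_last, tau, rho, marg_cdf, marg_quantile):
    """tau-th conditional quantile of Y_{t+1} given Y_t = y_last
    under a Gaussian copula with parameter rho.

    marg_cdf / marg_quantile are the marginal CDF G* and its inverse;
    in practice these could be empirical (nonparametric) estimates.
    """
    z = norm.ppf(marg_cdf(y_last))
    u = norm.cdf(rho * z + np.sqrt(1.0 - rho**2) * norm.ppf(tau))
    return marg_quantile(u)

# Sanity check with a standard normal marginal:
q = cond_quantile_gaussian(1.0, tau=0.9, rho=0.7,
                           marg_cdf=norm.cdf, marg_quantile=norm.ppf)
print(q)  # 0.7*1.0 + sqrt(1 - 0.49)*1.2816 ≈ 1.615, the AR(1) conditional quantile
```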

In practice, to choose a suitable copula family among a variety of candidates, we propose a copula-selection criterion based on the distance between the candidate parametric copula and the empirical copula estimated nonparametrically from the data. In our simulations, the proposed copula-selection method performs comparably to the correctly specified copula in terms of conditional quantile estimation. We also demonstrate the improvement achieved with the proposed method, compared to existing imputation methods, in an empirical study of water quality data.

In the second project, we propose a copula-based model to analyze multi-site precipitation data. We use the copula function to capture the spatial dependence among multiple locations. Because copula functions separate the dependence structure from the marginal distributions, we have the freedom to choose marginal candidate distributions that fit the data better. Given that sparse and skewed data constitute the main challenges in precipitation analysis, the use of copulas ensures that a flexible marginal estimation can be incorporated to overcome the sparsity and skewness. In our empirical study, we find that logit or probit regression and gamma regression provide good fits for the rain occurrence and rain amount, respectively. Our proposed copula-based method is also shown to be insensitive to misspecification of the link functions in the binary regression for rain occurrence and in the gamma regression for rain amount.

Another difficulty caused by the sparsity of precipitation is that high-dimensional integrals are involved in the likelihood function. In the full likelihood, the dimension of integration can become extremely high as the number of locations increases. We address this difficulty by using a copula-based composite likelihood function (sketched below); simulation and empirical results show that it provides good estimation and prediction. Given the estimated copula parameters, we can then generate ensembles as the probabilistic predictive distribution, either for a new-time forecast or a new-location interpolation. For forecasting, the predicted precipitation at multiple locations is generated jointly. For interpolation, we derive the conditional distribution of the precipitation at the new location given the rain occurrence and rain amount at the existing locations; the predicted precipitation can then be randomly drawn from this conditional distribution.
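As a schematic of the composite likelihood construction, the sketch below sums Gaussian pair-copula log-densities over all site pairs. It is a simplified illustration under stated assumptions: every observation is treated as continuous, so it omits the rectangle-probability terms the zero-inflated model uses for dry days, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pair_copula_logpdf(u, v, rho):
    """Log-density of the bivariate Gaussian copula c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    r2 = 1.0 - rho**2
    return (-0.5 * np.log(r2)
            - (rho**2 * (x**2 + y**2) - 2.0 * rho * x * y) / (2.0 * r2))

def pairwise_composite_loglik(U, R):
    """Pairwise composite log-likelihood over all site pairs.

    U: (n_days, n_sites) array of PIT values u_ts = F_s(y_ts) in (0, 1);
    R: (n_sites, n_sites) pairwise copula correlations, e.g. from a
       Matérn spatial correlation function evaluated at site distances.
    """
    n, S = U.shape
    ll = 0.0
    for i in range(S):
        for j in range(i + 1, S):
            ll += np.sum(gaussian_pair_copula_logpdf(U[:, i], U[:, j], R[i, j]))
    return ll

# Toy usage with hypothetical PIT values and an exponential correlation:
rng = np.random.default_rng(2)
U = rng.uniform(0.01, 0.99, size=(30, 4))
d = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))  # fake distances
R = np.exp(-d / 2.0)
print(pairwise_composite_loglik(U, R))
```

The pairwise form replaces one S-dimensional integral per day with S(S-1)/2 bivariate terms, which is what keeps the computation tractable as the number of locations grows.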

4.2 Limitations and Future Work

This thesis focuses on copula-based analysis of censored time series and of zero-inflated multisite data. For the semiparametric copula-based estimation of Markov models in the first project, even though we make no distributional assumption on the marginal distribution, we incorporate parametric copula functions to capture the dependence structure. The asymptotic properties are also established under the scenario in which the true copula is correctly specified. We considered an ad hoc minimum-distance approach for choosing the copula function among a set of candidates, but the study of a formal copula-selection method needs further investigation. In addition, we considered a two-step estimator, in which the marginal distribution and the copula parameter are estimated separately. For uncensored time series, Chen et al. (2009) proposed a sieve method that estimates the marginal distribution and the copula parameter jointly, and showed that it is more efficient than the two-step estimator; joint estimation for censored time series would be another interesting direction to explore. Additionally, in Chapter 2 the latent process is assumed to be stationary. To test stationarity on a fully observed time series, one can use unit-root tests such as the Dickey–Fuller test (Dickey and Fuller, 1979) and the Phillips–Perron test (Phillips and Perron, 1988). However, some adjustments may be needed to test for unit roots in a censored time series; a small illustration on a fully observed series is given at the end of this section. In the second project, on zero-inflated multisite precipitation analysis, we developed a copula-based statistical downscaling method for multi-site Chicago precipitation. The current empirical study involves only seven locations in the motivating Chicago dataset, but the proposed method can be generalized to larger spatial precipitation analyses through the composite likelihood estimator. In our analysis, we also assumed that the regression parameters in the marginal distributions are common across locations. When the common-parameter assumption is violated, varying-coefficient models (Gelfand et al., 2003) or varying smoothing parameter techniques (Wang et al., 2013) can be adopted.

For spatial data, Moran's I test or geographically weighted regression can be used to check spatial stationarity; see Moran (1950) and Brunsdon et al. (1996). How to adapt these methods for checking the stationarity of spatial data with zero inflation remains an interesting open question.
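For a fully observed series, a unit-root check of the kind mentioned above can be run with standard tools. The sketch below applies the augmented Dickey–Fuller test from statsmodels to a simulated stationary AR(1) series; the series itself is a placeholder, and, as noted, adapting such tests to censored series would require further work.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey–Fuller test on a simulated stationary AR(1) series;
# a small p-value rejects the unit-root (non-stationarity) null.
rng = np.random.default_rng(3)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

stat, pvalue, *_ = adfuller(y)
print(f"ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```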

Bibliography

Austin, P. C. and Brunner, L. J. (2003), “Type I Error Inflation in the Presence of a Ceiling Effect,” The American Statistician, 57, 97–104.

Bardossy, A. and Plate, E. J. (1992), “Space-Time Model for Daily Rainfall Using Atmospheric Circulation Patterns,” Water Resour. Res., 28, 1247–1259.

Vaz de Melo Mendes, B. and Aíube, C. (2011), “Copula based models for serial dependence,” International Journal of Managerial Finance, 7, 68–82.

Ben Alaya, M. A., Chebana, F., and Ouarda, T. B. M. J. (2015), “Probabilistic Multisite Statistical Downscaling for Daily Precipitation Using a Bernoulli–Generalized Pareto Multivariate Autoregressive Model,” Journal of Climate, 28, 2349–2364.

Bernhardt, P. W., Wang, H. J., and Zhang, D. (2014), “Flexible modeling of survival data with covariates subject to detection limits via multiple imputation,” Computational Statistics and Data Analysis, 69, 81–91.

Berrocal, V. J., Raftery, A. E., and Gneiting, T. (2008), “Probabilistic Quantitative Precipitation Field Forecasting Using A Two-Stage Spatial Model,” The Annals of Applied Statistics, 2, 1170–1193.

Bickenbach, F. and Bode, E. (2001), “Markov or Not Markov – This Should Be a Question,” in Kiel Working Paper, Kiel Institute of World Economics, vol. 1086.

Brunsdon, C., Fotheringham, A. S., and Charlton, M. E. (1996), “Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity,” Geographical Analysis, 28, 281–298.

Cepeda-Cuervo, E., Corrales, M., Cifuentes, M. V., and Zarate, H. (2016), “On Gamma Regression Residuals,” Journal of The Iranian Statistical Society, 15, 29–44.

Chen, B., Hong, Y., and Wang, Y. (2012), “Testing for the Markov property in time series,” Econometric Theory, 28, 130–178.

Chen, X. and Fan, Y. (2006), “Estimation of copula-based semiparametric time series models,” Journal of Econometrics, 130, 307–335.

Chen, X., Wu, W., and Yi, Y. (2009), “Efficient estimation of copula-based semiparametric Markov models,” The Annals of Statistics, 37, 4214–4253.

Choi, S. J. and Portnoy, S. (2016), “Quantile autoregression for censored data,” Journal of Time Series Analysis, 37, 603–623.

de Oliveira Jr, M. R., Moreira, F., and Louzada, F. (2017), “The zero-inflated promotion cure rate model applied to financial data on time-to-default,” Cogent Economics & Finance, 5, 1395950.

Dickey, D. A. and Fuller, W. A. (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74, 427–431.

Emura, T., Long, T.-H., and Sun, L.-H. (2017), “R routines for performing estimation and statistical process control under copula-based time series models,” Communications in Statistics – Simulation and Computation, 47, 3067–3087.

Fu, L. and Wang, Y. (2011), “Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits,” Environmental Science and Technology, 45, 1481–1489.

— (2012), “Statistical tools for analyzing water quality data,” in Water Quality Monitoring and Assessment, eds. Voudouris, K. and Voutsa, D., InTech, chap. 6, pp. 143–168.

Garland, M., Morris, J. S., Rosner, B. A., Stampfer, M. J., Spate, V. L., and Baskett, C. J. (1993), “Toenail trace-element levels as biomarkers: reproducibility over a 6-year period,” Cancer Epidemiology, Biomarkers & Prevention, 2, 493–497.

Gelfand, A. E., Kim, H.-J., Sirmans, C. F., and Banerjee, S. (2003), “Spatial Modeling with Spatially Varying Coefficient Processes,” Journal of the American Statistical Association, 98, 387–396.

Genest, C., Rémillard, B., and Beaudoin, D. (2009), “Goodness-of-fit tests for copulas: a review and a power study,” Insurance: Mathematics and Economics, 44, 199–213.

Gleiss, A., Dakna, M., Mischak, H., and Heinze, G. (2015), “Two-group comparisons of zero-inflated intensity values: the choice of test statistic matters,” Bioinformatics, 31, 2310–2317.

Gleit, A. (1985), “Estimation for Small Normal Data Sets with Detection Limits,” Environmental Science & Technology, 19, 1201–1206.

Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., and Johnson, N. A. (2008), “Assess- ing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds,” TEST, 17, 211.

Gray, L., Cortina-Borja, M., and Newell, M. L. (2004), “Modelling HIV-RNA viral load in vertically infected children,” Statistics in Medicine, 23, 769–781.

Hamill, T. M. (2001), “Interpretation of Rank Histograms for Verifying Ensemble Forecasts,” Monthly Weather Review, 129, 550–560.

Hamill, T. M. and Colucci, S. J. (1997), “Verification of Eta–RSM Short-Range Ensemble Forecasts,” Monthly Weather Review, 125, 1312–1327.

Huang, W., De Gruttola, V., Fischl, M., Hammer, S., Richman, D., Havlir, D., Gulick, R., Squires, K., and Mellors, J. (2001), “Patterns of plasma human immunodeficiency virus type 1 RNA response to antiretroviral therapy,” The Journal of Infectious Diseases, 1455–1465.

Joe, H. (1994), “Asymptotic efficiency of the two-stage estimation method for copula-based models,” Journal of Multivariate Analysis.

— (2015), Dependence Modeling with Copulas, CRC Press, Taylor and Francis Group, Boca Raton.

Kleiber, W., Katz, R. W., and Rajagopalan, B. (2012), “Daily spatiotemporal precipitation simulation using latent and transformed Gaussian processes,” Water Resour. Res., 48.

Lee, L. F. (1999), “Estimation of dynamic and ARCH Tobit models,” Journal of Econometrics, 92, 355–390.

Markov, A. (1954), Theory of Algorithms, TT 60-51085, Academy of Sciences of the USSR.

Matheson, J. E. and Winkler, R. L. (1976), “Scoring Rules for Continuous Probability Distributions,” Management Science, 22, 1087–1096.

McCullagh, P. and Nelder, J. (1989), Generalized Linear Models, Chapman and Hall.

Minasny, B. and McBratney, A. B. (2005), “The Matérn function as a general model for soil variograms,” Geoderma, 128, 192–207.

Mohammad, N. M. (2014), “Censored Time Series Analysis,” Ph.D. thesis, The University of Western Ontario.

Monokroussos, G. (2013), “A classical MCMC approach to the estimation of limited dependent variable models of time series,” Computational Economics, 42, 71–105.

Moran, P. A. P. (1950), “Notes on Continuous Stochastic Phenomena,” Biometrika, 37, 17–23.

Moulton, L. H. and Halsey, N. A. (1995), “A Mixture Model with Detection Limits for Regression Analyses of Antibody Response to Vaccine,” Biometrics, 51, 1570–1578.

Nadkarni, N. V., Zhao, Y., and Kosorok, M. R. (2011), “Inverse regression estimation for censored data,” Journal of the American Statistical Association, 106, 178–190.

Nelsen, R. B. (2006), An Introduction to Copulas, second ed., Springer, New York.

Park, J., Genton, M., and Ghosh, S. (2007), “Censored time series analysis with autoregres- sive moving average models,” The Canadian Journal of Statistics, 35, 151–168.

Patton, A. (2012), “A review of copula models for economic time series,” Journal of Econometrics, 110, 4–18.

Phillips, P. C. B. and Perron, P. (1988), “Testing for a unit root in time series regression,” Biometrika, 75, 335–346.

Rémillard, B., Papageorgiou, N., and Soustra, F. (2012), “Copula-based semiparametric models for multivariate time series,” Journal of Multivariate Analysis, 110, 30–42.

Richardson, D. B. and Ciampi, A. (2003), “Effects of Exposure Measurement Error When an Exposure Variable Is Constrained by a Lower Limit,” American Journal of Epidemiology, 157, 355–363.

Rio, E. (1995), “A maximal inequality and dependent Marcinkiewicz-Zygmund strong laws,” The Annals of Probability, 23, 918–937.

Schisterman, E. F., Vexler, A., Whitcomb, B. W., and Liu, A. (2006), “The Limitations due to Exposure Detection Limits for Regression Models,” American Journal of Epidemiology, 163, 374–383.

Schumacher, F. L., Lachos, V. H., and Dey, D. K. (2017), “Censored regression models with autoregressive errors: A likelihood-based perspective,” The Canadian Journal of Statistics, 45, 375–392.

Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, 6, 461–464.

Singh, A. and Nocerino, J. (2002), “Robust estimation of mean and variance using environmental data sets with below detection limit observations,” Chemometrics and Intelligent Laboratory Systems, 60, 69–86.

Sklar, M. (1959), “Fonctions de repartition a n dimensions et leurs marges,” Publ. Inst. Statist. Univ. Paris, 8, 229–231.

Sloughter, J. M. L., Raftery, A. E., Gneiting, T., and Fraley, C. (2007), “Probabilistic Quantitative Precipitation Forecasting Using Bayesian Model Averaging,” Monthly Weather Review, 135, 3209–3220.

Stansfield, B. (2001), “Effects of sampling and laboratory detection limits on the determination of time series water quality trends,” New Zealand Journal of Marine and Freshwater Research, 35, 1071–1075.

Steinman, B. A., Abbott, M. B., Mann, M. E., Stansell, N. D., and Finney, B. P. (2012), “1,500 year quantitative reconstruction of winter precipitation in the Pacific Northwest,” Proceedings of the National Academy of Sciences of the United States of America, 109, 11619–11623.

Sun, Y. and Stein, M. L. (2015), “A Stochastic Space-Time Model for Intermittent Precipitation Occurrences,” The Annals of Applied Statistics, 9, 2110–2132.

Talagrand, O., Vautard, R., and Strauss, B. (1997), “Evaluation of probabilistic prediction systems,” in Workshop on Predictability, 20-22 October 1997, ECMWF, Shinfield Park, Reading: ECMWF, pp. 1–26.

Tang, Y., Wang, H. J., and Liang, H. (2017), “Composite Estimation for Single-Index Models with Responses Subject to Detection Limits,” Scandinavian Journal of Statistics, 45, 444–464.

Thorarinsdottir, T. L., Scheuerer, M., and Heinz, C. (2016), “Assessing the Calibration of High-Dimensional Ensemble Forecasts Using Rank Histograms,” Journal of Computational and Graphical Statistics, 25, 105–122.

Uppala, S. M., Kållberg, P. W., Simmons, A. J., Andrae, U., Bechtold, V. D. C., Fiorino, M., Gibson, J. K., Haseler, J., Hernandez, A., Kelly, G. A., Li, X., Onogi, K., Saarinen, S., Sokka, N., Allan, R. P., Andersson, E., Arpe, K., Balmaseda, M. A., Beljaars, A. C. M., Berg, L. V. D., Bidlot, J., Bormann, N., Caires, S., Chevallier, F., Dethof, A., Dragosavac, M., Fisher, M., Fuentes, M., Hagemann, S., Hólm, E., Hoskins, B. J., Isaksen, L., Janssen, P. A. E. M., Jenne, R., Mcnally, A. P., Mahfouf, J.-F., Morcrette, J.-J., Rayner, N. A., Saunders, R. W., Simon, P., Sterl, A., Trenberth, K. E., Untch, A., Vasiljevic, D., Viterbo, P., and Woollen, J. (2005), “The ERA-40 re-analysis,” Quarterly Journal of the Royal Meteorological Society, 131, 2961–3012.

Varin, C., Reid, N., and Firth, D. (2011), “An overview of composite likelihood methods,” Statistica Sinica, 21, 5–42.

Wallis, K. F. (2001), “Chi-squared tests of interval and density forecasts, and the Bank of England’s fan charts,” in European Central Bank Working Paper Series, vol. 83.

Wang, C. and Chan, K. (2017), “Quasi-likelihood Estimation of a Censored Autoregressive Model With Exogenous Variables,” Journal of the American Statistical Association, to appear.

Wang, H. J. and Fygenson, M. (2009), “Inference for censored quantile regression models in longitudinal studies,” The Annals of Statistics, 37, 756–781.

Wang, H. J., Li, D., and He, X. (2012), “Estimation of High Conditional Quantiles for Heavy-Tailed Distributions,” Journal of the American Statistical Association, 107, 1453–1464.

Wang, H. J., Zhu, Z., and Zhou, J. (2009), “Quantile regression in partially linear varying coefficient models,” The Annals of Statistics, 37, 3841–3866.

Wang, X., Du, P., and Shen, J. (2013), “Smoothing splines with varying smoothing parameter,” Biometrika, 100, 955–970.

Yang, Y. and He, X. (2012), “Bayesian empirical likelihood for quantile regression,” The Annals of Statistics, 40, 1102–1131.

Zeger, S. L. and Brookmeyer, R. (1986), “Regression Analysis with Censored Autocorrelated Data,” Journal of the American Statistical Association, 81, 722–729.

Zhao, Y., Brown, B., and Wang, Y. (2014), “Smoothed rank-based procedure for censored data,” Electronic Journal of Statistics, 8, 2953–2974.

Zhou, W., Xian, F., Du, Y., Kong, X., and Wu, Z. (2013), “The last 130 ka precipitation reconstruction from Chinese loess 10Be,” Journal of Geophysical Research: Solid Earth, 119, 191–197.

Appendix A

Copula-Based Semiparametric Estimation for Markov Models with Censoring

We provide the detailed derivation of the likelihood $L_n(\theta)$ in Section 2.2.2. By equation (2.2), we can obtain the likelihood for the observed data $\{Y_1,\ldots,Y_{s_1}\}$ prior to $D_1$,

$$L_n(\theta \mid y_1,\ldots,y_{s_1}) = \prod_{t=1}^{s_1} g^*(y_t) \prod_{t=2}^{s_1} c\{G^*(y_{t-1}),\,G^*(y_t);\theta\}.$$

Thus the likelihood associated with the data $\{y_1,\ldots,y_{s_1},\,y_{s_1+1},\ldots,y_{s_1+l_1},\,y_{s_1+l_1+1}\}$ is

$$
\begin{aligned}
&L_n(\theta \mid y_1,\ldots,y_{s_1},\, y_{s_1+1}=\cdots=y_{s_1+l_1}=d,\, y_{s_1+l_1+1}) \\
&\quad = \int_{-\infty}^{d}\cdots\int_{-\infty}^{d} L^*(\theta;\, y_1,\ldots,y_{s_1+l_1+1})\, dy_{s_1+1}\cdots dy_{s_1+l_1} \\
&\quad = \prod_{t=1}^{s_1} g^*(y_t)\prod_{t=2}^{s_1} c\{G^*(y_{t-1}),\,G^*(y_t);\theta\} \\
&\qquad \times \int_{-\infty}^{d}\cdots\int_{-\infty}^{d}\, \prod_{t=s_1+1}^{s_1+l_1+1} g^*(y_t) \prod_{t=s_1+1}^{s_1+l_1+1} c\{G^*(y_{t-1}),\,G^*(y_t);\theta\}\, dy_{s_1+1}\cdots dy_{s_1+l_1} \\
&\quad = \prod_{t=1}^{s_1} g^*(y_t)\prod_{t=2}^{s_1} c(u_{t-1},u_t;\theta)\; g^*(y_{s_1+l_1+1}) \int_{0}^{\pi}\cdots\int_{0}^{\pi} \prod_{t=s_1+1}^{s_1+l_1+1} c(u_{t-1},u_t;\theta)\, du_{s_1+1}\cdots du_{s_1+l_1},
\end{aligned}
$$

where $\pi = G^*(d)$ and $u_t = G^*(y_t)$. Consequently, we can derive the full likelihood function as

$$
\begin{aligned}
L_n\big(\theta \mid \{(y_t,\delta_t),\, t=1,\ldots,n\}\big)
&= \prod_{\{t:\,\delta_t=1\}} g^*(y_t)\, \prod_{j=1}^{n^*} c(u_{t_j},u_{t_j+1};\theta) \\
&\quad \times \prod_{i=1}^{K_n} \int_{0}^{\pi}\cdots\int_{0}^{\pi}\, \prod_{t=s_i+1}^{s_i+l_i+1} c(u_{t-1},u_t;\theta)\, du_{s_i+1}\cdots du_{s_i+l_i}.
\end{aligned}
$$
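The inner integral over each censored run can be evaluated by the copula-based sequential sampling scheme described in Chapter 2: draw each censored $u_t$ from its conditional copula truncated to $[0, \pi]$, and accumulate the truncation probabilities as importance weights. The following is a minimal Monte Carlo sketch for a Gaussian pair copula with parameter ρ; the h-function formulas are the standard ones for that family, and all function names are illustrative rather than taken from the dissertation's implementation.

```python
import numpy as np
from scipy.stats import norm

def h(u, u_prev, rho):
    """Conditional copula CDF P(U_t <= u | U_{t-1} = u_prev), Gaussian copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(u_prev)) / np.sqrt(1 - rho**2))

def h_inv(v, u_prev, rho):
    """Inverse of h in its first argument."""
    return norm.cdf(rho * norm.ppf(u_prev) + np.sqrt(1 - rho**2) * norm.ppf(v))

def copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    r2 = 1 - rho**2
    return np.exp(-0.5 * np.log(r2)
                  - (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * r2))

def censored_run_integral(u_left, u_right, pi, length, rho, n_mc=5000, seed=0):
    """Monte Carlo estimate of the run integral
    int_0^pi ... int_0^pi prod_t c(u_{t-1}, u_t; rho) du_{s+1} ... du_{s+l}
    over a run of `length` censored values between observed u_left and u_right."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_mc):
        w, u_prev = 1.0, u_left
        for _ in range(length):
            p = h(pi, u_prev, rho)        # mass of the truncated conditional
            w *= p                        # importance weight accumulates
            u_prev = h_inv(rng.uniform() * p, u_prev, rho)  # truncated draw
        est += w * copula_density(u_prev, u_right, rho)     # last pair factor
    return est / n_mc

print(censored_run_integral(u_left=0.4, u_right=0.5, pi=0.3, length=3, rho=0.6))
```

Sampling from the truncated conditional copula keeps every draw inside $[0, \pi]$, so the weights remain bounded and the estimator avoids the waste of naive rejection sampling over the full unit cube.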
