This article was downloaded by: [Rosangela Assumpção] On: 30 April 2014, At: 11:47 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Applied Statistics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20
Analysis of local influence in geostatistics using Student's t-distribution
R.A.B. Assumpção^a, M.A. Uribe-Opazo^b & M. Galea^c
^a Colegiado de Matemática, Universidade Tecnológica Federal do Paraná, Rua Cristo Rei, 19, Vila Becker, 85902-490 Toledo, PR, Brazil
^b Centro de Ciências Exatas e Tecnológicas, Universidade Estadual do Oeste do Paraná, Rua Universitária 119, Jardim Universitário, 85814-110 Cascavel, PR, Brazil
^c Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile
Published online: 28 Apr 2014.
To cite this article: R.A.B. Assumpção, M.A. Uribe-Opazo & M. Galea (2014): Analysis of local influence in geostatistics using Student's t-distribution, Journal of Applied Statistics, DOI: 10.1080/02664763.2014.909793 To link to this article: http://dx.doi.org/10.1080/02664763.2014.909793
Journal of Applied Statistics, 2014 http://dx.doi.org/10.1080/02664763.2014.909793
Analysis of local influence in geostatistics using Student’s t-distribution
R.A.B. Assumpção^{a,∗}, M.A. Uribe-Opazo^{b} and M. Galea^{c}
^a Colegiado de Matemática, Universidade Tecnológica Federal do Paraná, Rua Cristo Rei, 19, Vila Becker, 85902-490 Toledo, PR, Brazil; ^b Centro de Ciências Exatas e Tecnológicas, Universidade Estadual do Oeste do Paraná, Rua Universitária 119, Jardim Universitário, 85814-110 Cascavel, PR, Brazil; ^c Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile
(Received 23 September 2013; accepted 26 March 2014)
This article estimates the parameters of spatial variability under Student’s t-distribution using the EM algorithm and presents a study of local influence based on two methods, likelihood displacement and Q-displacement of likelihood, both using Student’s t-distribution with fixed degrees of freedom (ν). The results showed that both methods are effective in identifying influential points.
Keywords: spatial variability; Q-displacement of likelihood; EM algorithm; diagnostics
1. Introduction

Geostatistics is a method of analysis that models the spatial variability of georeferenced variables by estimating the parameters that define the structure of spatial dependence. These parameters are used when values are interpolated by kriging at unsampled locations. For this interpolation to be reliable, that is, for it to represent the real local variability, the modeling process has to be performed with caution, especially in the presence of outliers or influential points, because observations identified as influential in a data set can produce disproportionate changes in the parameter estimates, in the covariance matrix, and in the resulting thematic maps.

In the presence of influential points, Cysneiros et al. [7] suggested, as an alternative, models in the class of symmetric distributions, such as Student’s t-distribution, which accommodate such observations better and therefore reduce their influence by incorporating additional parameters that adjust the kurtosis of the data. Fang et al. [11] presented a theoretical development of Student’s t-distribution, motivated by the fact that it is a robust alternative to the normal distribution. Lange and Sinsheimer [13]
∗Corresponding author. Email: [email protected]
© 2014 Taylor & Francis
presented the family of normal/independent distributions, including the multivariate Student’s t-distribution, and used maximum likelihood and the expectation-maximization (EM) algorithm to estimate the parameters in robust regression. Liu and Rubin [14,15] described the EM, Expectation/Conditional Maximization and Expectation/Conditional Maximization Either algorithms, showing their computational efficiency in the maximum likelihood estimation of the parameters of the multivariate Student’s t-distribution with known and unknown degrees of freedom, with or without missing data, and with or without covariates.

Although Student’s t-distribution has heavier tails than the normal distribution and accommodates aberrant observations better, it can still be affected by influential observations. Therefore, it is important to study its sensitivity through diagnostic analysis. Diagnostic analysis is a technique used to evaluate the quality of the fitted model by assessing the assumptions made about it and the robustness of its estimates when perturbations are introduced in the model itself or in the data [12].

Diagnostic analysis includes the analysis of local influence, which studies the effect of introducing small perturbations in the model (or data) using a suitable measure of influence. This methodology was originally developed by Cook [9] and has become a popular diagnostic tool. Zhu and Lee [28] proposed a method to assess local influence with incomplete data using the EM algorithm. Wei et al. [25] presented a technique for influence analysis based on mixtures of distributions and the EM algorithm, considering the multivariate Student’s t-distribution as a particular case of the Gaussian mixture.
The goal of this paper is to describe two methods for the diagnostic analysis of local influence in geostatistics: the first, introduced by Cook [9], uses the log-likelihood function, and the second, described by Zhu and Lee [28], uses the expectation of the log-likelihood function. We consider spatial linear models with an n-variate Student’s t-distribution with fixed degrees of freedom.
2. Student’s t spatial linear model

Consider a stationary stochastic process $\{Y(s), s \in S\}$, where $S \subset \mathbb{R}^2$ and $\mathbb{R}^2$ is the two-dimensional Euclidean space. Let $Y = (Y(s_1), \ldots, Y(s_n))^T$ have an $n$-variate Student’s t-distribution with fixed degrees of freedom $\nu$, $n \times 1$ location vector $\mu$ and $n \times n$ scale matrix $\Sigma$; then $Y \sim t_n(\mu, \Sigma, \nu)$. For every $Y = (Y(s_1), \ldots, Y(s_n))^T$, the corresponding spatial locations $s_i$ and $s_j$, $i, j = 1, \ldots, n$, are known, so that

$$Y(s_i) = \mu(s_i) + \epsilon(s_i) \quad \text{for } i = 1, \ldots, n, \tag{1}$$
where $\mu(s_i)$ is a deterministic term and $\epsilon(s_i)$ is a stochastic term, both depending on the parameter space in which $Y(s_i)$ operates. We assume that the stochastic error $\epsilon(s_i)$ has $E[\epsilon(s_i)] = 0$ for $i = 1, \ldots, n$, and that the variation between points in space is determined by some covariance function $C(s_i, s_j) = \mathrm{Cov}(\epsilon(s_i), \epsilon(s_j))$.

The covariance function is specified by a vector of parameters $\phi = (\phi_1, \phi_2, \phi_3)^T$, and these parameters define the structure of spatial dependence. Supposing that $x_1(s_i), \ldots, x_p(s_i)$ are known functions of $s_i$, the mean of the stochastic process is given by the following equation:

$$\mu(s_i) = \sum_{j=1}^{p} \beta_j x_j(s_i) \quad \text{for } i = 1, \ldots, n, \tag{2}$$
where $\beta_1, \ldots, \beta_p$ are unknown parameters to be estimated.
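As an illustration of Equations (1) and (2), the following Python sketch builds the mean vector as the product of a matrix of known functions and the coefficient vector, and evaluates the log-density of an n-variate Student's t-distribution for given location, scale matrix and degrees of freedom. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mean_vector` and `student_t_loglik`, the coordinates, the choice of an intercept plus the two coordinates as the known functions, the coefficient values and the identity scale matrix are all hypothetical, and the density uses the standard parameterisation of the multivariate t, which may differ from the one adopted in the paper.

```python
import math
import numpy as np


def mean_vector(X, beta):
    """Equation (2) in matrix form: mu = X beta, where X[i, j] = x_j(s_i)."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)


def student_t_loglik(y, mu, Sigma, nu):
    """Log-density of t_n(mu, Sigma, nu) at y (standard parameterisation, assumed).

    Sigma is the n x n scale matrix (symmetric positive definite) and nu > 0
    is the fixed number of degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = y.size
    resid = y - mu
    # Cholesky factor gives both log|Sigma| and the quadratic form.
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, resid)
    delta = float(z @ z)  # (y - mu)^T Sigma^{-1} (y - mu)
    logdet = 2.0 * float(np.sum(np.log(np.diag(L))))
    return (
        math.lgamma((nu + n) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * n * math.log(nu * math.pi)
        - 0.5 * logdet
        - 0.5 * (nu + n) * math.log1p(delta / nu)
    )


# Hypothetical sampling locations s_1, s_2, s_3 in the plane.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# Assumed known functions: x_1 = 1 (intercept), x_2 and x_3 = the coordinates.
X = np.column_stack([np.ones(len(coords)), coords])
beta = np.array([10.0, 0.5, -1.0])  # illustrative values only
mu = mean_vector(X, beta)

# Placeholder scale matrix; the spatial structure of Sigma is not used here.
Sigma = np.eye(3)
y = mu + np.array([0.3, -0.2, 0.1])  # illustrative observations
loglik = student_t_loglik(y, mu, Sigma, nu=5.0)
```

Using a Cholesky factor avoids forming the inverse of the scale matrix explicitly, which is a common numerical choice when the same matrix is needed for both the determinant and the quadratic form.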
The expectation of the $n \times 1$ vector of random errors $\epsilon$ is $E(\epsilon) = 0$, a vector of zeros, i.e. $0 = (0, \ldots, 0)^T$, and its scale matrix is $\Sigma = [(\sigma_{ij})]$, where $\sigma_{ij} = C(s_i, s_j)$ and $C(s_i, s_j) = \mathrm{Cov}(\epsilon(s_i), \epsilon(s_j))$. Assuming that $\Sigma$ is non-singular and that the $n \times p$ matrix $X$ has full rank ($\mathrm{rank}(X) = p$), the scale matrix takes the spatial structure given by the following equation: