Contributions to Modeling Spatially Indexed Functional Data Using a Reproducing Kernel Hilbert Space Framework Daniel Clayton Fortin Iowa State University
Total Page:16
File Type:pdf, Size:1020Kb
Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2015 Contributions to modeling spatially indexed functional data using a reproducing kernel Hilbert space framework Daniel Clayton Fortin Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Statistics and Probability Commons Recommended Citation Fortin, Daniel Clayton, "Contributions to modeling spatially indexed functional data using a reproducing kernel Hilbert space framework" (2015). Graduate Theses and Dissertations. 14836. https://lib.dr.iastate.edu/etd/14836 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Contributions to modeling spatially indexed functional data using a reproducing kernel Hilbert space framework by Daniel Clayton Fortin A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Statistics Program of Study Committee: Petrut¸a Caragea, Co-major Professor Zhengyuan Zhu, Co-major Professor Heike Hofmann Dan Nordman Stephen Vardeman Iowa State University Ames, Iowa 2015 Copyright c Daniel Clayton Fortin, 2015. All rights reserved. ii DEDICATION I would like to dedicate this dissertation to Donald Lacky|the most interesting man in the world|whose friendship and confidence in my abilities inspired me to start this journey. iii TABLE OF CONTENTS LIST OF TABLES . vi LIST OF FIGURES . vii ACKNOWLEDGEMENTS . xii CHAPTER 1. INTRODUCTION . 1 1.1 General introduction and organization of the dissertation . .1 1.2 Theoretical background on smoothing splines and reproducing kernel Hilbert space (RKHS) . .4 1.2.1 Historical note on the use of reproducing kernel Hilbert spaces in statistics . .4 1.2.2 Going beyond ordinary least squares: Penalized least squares and the cubic smoothing spline . .6 1.2.3 Reproducing kernel Hilbert spaces . .8 CHAPTER 2. NONPARAMETRIC COVARIANCE FUNCTION AND PRINCIPAL COMPONENT FUNCTION ESTIMATION FOR FUNC- TIONAL DATA . 13 2.1 Abstract . 13 2.2 Introduction . 14 2.3 Methodology . 17 2.4 Covariance function estimation . 19 2.4.1 Practical considerations for knot selection . 22 iv 2.5 Estimation of functional principal components . 22 2.6 Simulations . 25 2.7 Conclusions . 29 2.8 Proofs . 37 CHAPTER 3. ESTIMATION AND KRIGING FOR SPATIALLY IN- DEXED FUNCTIONAL DATA . 38 3.1 Introduction . 38 3.2 Ordinary kriging of function-valued data . 39 3.3 Cokriging functional principal components scores . 41 3.3.1 Estimation . 43 3.3.2 Prediction . 47 3.4 Simulation studies . 48 3.4.1 Simulation study of the spatially weighted covariance function es- timator . 48 3.4.2 Comparing prediction performance . 49 3.5 Discussion . 52 CHAPTER 4. FUNCTIONAL DATA ANALYSIS APPROACH TO LAND COVER CLASSIFICATION IN INDIA USING MERIS TER- RESTRIAL CHLOROPHYLL INDEX . 57 4.1 Abstract . 57 4.2 Introduction . 58 4.3 Data . 60 4.3.1 Data cleaning and processing . 61 4.4 Methodology . 63 4.4.1 Functional model for homogeneous grid cells . 67 4.4.2 Smoothing . 68 4.4.3 Land cover classification . 70 v 4.5 Results . 72 4.6 Conclusions . 77 BIBLIOGRAPHY . 86 vi LIST OF TABLES Table 2.1 Tensor product space decomposition with corresponding repro- ducing kernels. 20 2 Table 2.2 Integrated squared error, ^(t) − (t) , for the first two func- L2 tional principal components averaged over 100 runs. The value m indicates the number of observations per curve. The Monte Carlo standard error is shown in parentheses. 29 Table 3.1 Prediction error for the kriging predictors using exponential co- variance function with r = 0:2. The number in parentheses is the standard error. 54 Table 3.2 Prediction error for the kriging predictors using exponential co- variance function with r = 0.3. The number in parentheses is the standard error. 54 vii LIST OF FIGURES Figure 2.2 Examples of basis functions from knot locations located on a reg- ular grid on [0; 1] × [0; 1]. 23 Figure 2.4 Estimated covariance functions using the SSCOV method on a data set consisting of 100 curves simulated from the process X(t) in (2.14) using observation standard deviation equal to σ0 = 0:329. Here m equals the number of observations for each curve. 28 Figure 2.5 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets 100 curves were simulated with m observations per curve using Scenario I. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 30 Figure 2.6 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005). For each of the 100 simulated data sets 100 curves were simulated with m obser- vations per curve using Scenario I. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 31 viii Figure 2.7 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets 100 curves were simulated with m observations per curve using Scenario II. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 32 Figure 2.8 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005). For each of the 100 simulated data sets 100 curves were simulated with m obser- vations per curve using Scenario II. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 33 Figure 2.9 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets 100 curves were simulated with m observations per curve using Scenario III. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 34 Figure 2.10 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005). For each of the 100 simulated data sets 100 curves were simulated with m observa- tions per curve using Scenario III. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 quantiles. 35 Figure 3.1 Locations of curves used for the simulation. 49 Figure 3.2 Exponential covariance functions used in the simulation. 50 ix Figure 3.3 The x-axis shows the value of the scale parameter p in the weight function. Large values of p correspond to smaller weights for curves in high-point-intensity areas. The y-axis shows the average integrated square error for the covariance estimator. The error bars show +/- two standard errors. 51 Figure 3.4 Locations of curves generated for the simulation study. Black points show locations used for fitting each model. The 12 red numbers labeled on the plot show prediction locations. 52 Figure 3.5 Boxplots showing distribution of the average prediction error, 2 X^(t) − X(t) , across the 12 unobserved locations shown in L2 3.4 for 1000 simulated data sets. The curves were simulated us- ing the functional process in (3.20) with the covariance function in (3.21). The plot label \dependence" refers to the value of the range parameter r in the covariance function. The plot label \sigma" refers to the observation noise standard deviation σ0 in (3.8). ................................ 53 Figure 4.1 Diagram illustrating typical phenology patterns for agricultural land and natural vegetation from Dash et al. (2010). 61 Figure 4.2 Google EarthTM [data ISO, NOAA, U.S. Navy, NGA, GEBCO, Image Landsat] screenshot of study region in southern India from May 2000 (top). Land cover classification at 4.6 km spatial reso- lution derived from GLC2000 land cover database (bottom). 62 Figure 4.3 Smoothed MTCI values by year and by land cover. 63 x Figure 4.4 Land cover map of the study region aggregated into \Agricul- ture", \Vegetation", and \other". Grid cells were labeled \Vege- tation" if their original land cover was Tropical Evergreen, Sub- tropical Evergreen, Tropical Semievergreen, Tropical Moist De- ciduous, or Coastal vegetation. Grid cells were labeled \Agricul- ture" is their original land cover was Rainfed Agriculture, Slope Agriculture, Irrigated Agriculture, or Irrigated Intensive Agricul- ture. Black dots show the location of homogeneous agriculture grid cells. 64 Figure 4.5 Map of the study region showing the proportion of agricultural land within each of the 4.6 km×4.6 km grid cells based on the GLC2000 land cover database. 65 Figure 4.6 Observed MTCI values for homogeneous agriculture cells in 2003. The color identified curves as in either the eastern or western part of the country. The homogeneous agriculture cells on the west side of the country consist of intensely irrigated agriculture. 66 Figure 4.7 Spatial locations of the homogeneous agricultural grid cells. The size of the point indicates the weight used for data at that location for estimating the covariance function. 71 Figure 4.8 First three principal component functions computed for each year from 2003{2007. 73 Figure 4.9 Map of the study region showing locations where classification re- sults differ from the original GLC2000 derived land classification.