Contributions to Modeling Spatially Indexed Functional Data Using a Reproducing Kernel Hilbert Space Framework Daniel Clayton Fortin Iowa State University

Total Pages: 16

File Type: pdf, Size: 1020 KB

Contributions to modeling spatially indexed functional data using a reproducing kernel Hilbert space framework
Daniel Clayton Fortin, Iowa State University
Graduate Theses and Dissertations, Iowa State University Capstones, Theses and Dissertations, 2015

Follow this and additional works at: https://lib.dr.iastate.edu/etd (part of the Statistics and Probability Commons)

Recommended Citation: Fortin, Daniel Clayton, "Contributions to modeling spatially indexed functional data using a reproducing kernel Hilbert space framework" (2015). Graduate Theses and Dissertations. 14836. https://lib.dr.iastate.edu/etd/14836

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. Major: Statistics. Program of Study Committee: Petruța Caragea, Co-major Professor; Zhengyuan Zhu, Co-major Professor; Heike Hofmann; Dan Nordman; Stephen Vardeman. Iowa State University, Ames, Iowa, 2015. Copyright © Daniel Clayton Fortin, 2015. All rights reserved.

DEDICATION

I would like to dedicate this dissertation to Donald Lacky, the most interesting man in the world, whose friendship and confidence in my abilities inspired me to start this journey.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

CHAPTER 1. INTRODUCTION
1.1 General introduction and organization of the dissertation
1.2 Theoretical background on smoothing splines and reproducing kernel Hilbert spaces (RKHS)
  1.2.1 Historical note on the use of reproducing kernel Hilbert spaces in statistics
  1.2.2 Going beyond ordinary least squares: Penalized least squares and the cubic smoothing spline
  1.2.3 Reproducing kernel Hilbert spaces

CHAPTER 2. NONPARAMETRIC COVARIANCE FUNCTION AND PRINCIPAL COMPONENT FUNCTION ESTIMATION FOR FUNCTIONAL DATA
2.1 Abstract
2.2 Introduction
2.3 Methodology
2.4 Covariance function estimation
  2.4.1 Practical considerations for knot selection
2.5 Estimation of functional principal components
2.6 Simulations
2.7 Conclusions
2.8 Proofs

CHAPTER 3. ESTIMATION AND KRIGING FOR SPATIALLY INDEXED FUNCTIONAL DATA
3.1 Introduction
3.2 Ordinary kriging of function-valued data
3.3 Cokriging functional principal components scores
  3.3.1 Estimation
  3.3.2 Prediction
3.4 Simulation studies
  3.4.1 Simulation study of the spatially weighted covariance function estimator
  3.4.2 Comparing prediction performance
3.5 Discussion

CHAPTER 4. FUNCTIONAL DATA ANALYSIS APPROACH TO LAND COVER CLASSIFICATION IN INDIA USING MERIS TERRESTRIAL CHLOROPHYLL INDEX
4.1 Abstract
4.2 Introduction
4.3 Data
  4.3.1 Data cleaning and processing
4.4 Methodology
  4.4.1 Functional model for homogeneous grid cells
  4.4.2 Smoothing
  4.4.3 Land cover classification
4.5 Results
4.6 Conclusions

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 Tensor product space decomposition with corresponding reproducing kernels.
Table 2.2 Integrated squared error, ||φ̂(t) − φ(t)||^2_{L2}, for the first two functional principal components averaged over 100 runs. The value m indicates the number of observations per curve. The Monte Carlo standard error is shown in parentheses.
Table 3.1 Prediction error for the kriging predictors using an exponential covariance function with r = 0.2. The number in parentheses is the standard error.
Table 3.2 Prediction error for the kriging predictors using an exponential covariance function with r = 0.3. The number in parentheses is the standard error.

LIST OF FIGURES

Figure 2.2 Examples of basis functions from knot locations located on a regular grid on [0, 1] × [0, 1].
Figure 2.4 Estimated covariance functions using the SSCOV method on a data set consisting of 100 curves simulated from the process X(t) in (2.14) using observation standard deviation equal to σ0 = 0.329. Here m equals the number of observations for each curve.
Figure 2.5 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario I. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 2.6 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005). For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario I. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 2.7 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario II. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 2.8 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005).
For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario II. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 2.9 Estimates of the first two functional principal components using the SSCOV method. For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario III. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 2.10 Functional principal component estimation using the SIC method as described in Ramsay and Silverman (2005). For each of the 100 simulated data sets, 100 curves were simulated with m observations per curve using Scenario III. The solid line is the pointwise mean, the dashed line is the true FPC, and the gray bands show the pointwise 2.5 and 97.5 percent quantiles.
Figure 3.1 Locations of curves used for the simulation.
Figure 3.2 Exponential covariance functions used in the simulation.
Figure 3.3 The x-axis shows the value of the scale parameter p in the weight function. Large values of p correspond to smaller weights for curves in high-point-intensity areas. The y-axis shows the average integrated squared error for the covariance estimator. The error bars show +/- two standard errors.
Figure 3.4 Locations of curves generated for the simulation study. Black points show locations used for fitting each model. The 12 red numbers labeled on the plot show prediction locations.
Figure 3.5 Boxplots showing the distribution of the average prediction error, ||X̂(t) − X(t)||^2_{L2}, across the 12 unobserved locations shown in Figure 3.4 for 1000 simulated data sets. The curves were simulated using the functional process in (3.20) with the covariance function in (3.21). The plot label "dependence" refers to the value of the range parameter r in the covariance function.
The plot label "sigma" refers to the observation noise standard deviation σ0 in (3.8).
Figure 4.1 Diagram illustrating typical phenology patterns for agricultural land and natural vegetation, from Dash et al. (2010).
Figure 4.2 Google Earth™ [data ISO, NOAA, U.S. Navy, NGA, GEBCO; image Landsat] screenshot of the study region in southern India from May 2000 (top). Land cover classification at 4.6 km spatial resolution derived from the GLC2000 land cover database (bottom).
Figure 4.3 Smoothed MTCI values by year and by land cover.
Figure 4.4 Land cover map of the study region aggregated into "Agriculture", "Vegetation", and "Other". Grid cells were labeled "Vegetation" if their original land cover was Tropical Evergreen, Subtropical Evergreen, Tropical Semievergreen, Tropical Moist Deciduous, or Coastal Vegetation. Grid cells were labeled "Agriculture" if their original land cover was Rainfed Agriculture, Slope Agriculture, Irrigated Agriculture, or Irrigated Intensive Agriculture. Black dots show the locations of homogeneous agriculture grid cells.
Figure 4.5 Map of the study region showing the proportion of agricultural land within each of the 4.6 km × 4.6 km grid cells, based on the GLC2000 land cover database.
Figure 4.6 Observed MTCI values for homogeneous agriculture cells in 2003. The color identifies curves as lying in either the eastern or western part of the country. The homogeneous agriculture cells on the west side of the country consist of intensely irrigated agriculture.
Figure 4.7 Spatial locations of the homogeneous agricultural grid cells. The size of each point indicates the weight used for data at that location when estimating the covariance function.
Figure 4.8 First three principal component functions computed for each year from 2003–2007.
Figure 4.9 Map of the study region showing locations where classification results differ from the original GLC2000-derived land classification.
Recommended publications
  • Intrinsic Riemannian Functional Data Analysis
    Submitted to the Annals of Statistics
    INTRINSIC RIEMANNIAN FUNCTIONAL DATA ANALYSIS
    By Zhenhua Lin and Fang Yao, University of California, Davis and University of Toronto

    In this work we develop a novel and foundational framework for analyzing general Riemannian functional data, in particular a new development of tensor Hilbert spaces along curves on a manifold. Such spaces enable us to derive the Karhunen-Loève expansion for Riemannian random processes. This framework also features an approach to compare objects from different tensor Hilbert spaces, which paves the way for asymptotic analysis in Riemannian functional data analysis. Built upon intrinsic geometric concepts such as vector fields, the Levi-Civita connection, and parallel transport on Riemannian manifolds, the developed framework applies not only to Euclidean submanifolds but also to manifolds without a natural ambient space. As applications of this framework, we develop intrinsic Riemannian functional principal component analysis (iRFPCA) and intrinsic Riemannian functional linear regression (iRFLR), which are distinct from their traditional and ambient counterparts. We also provide estimation procedures for iRFPCA and iRFLR, and investigate their asymptotic properties within the intrinsic geometry. Numerical performance is illustrated by simulated and real examples.

    1. Introduction. Functional data analysis (FDA) has advanced substantially over the past two decades, as the rapid development of modern technology enables collecting more and more data continuously over time.
    There is a rich literature spanning more than seventy years on this topic, including developments on functional principal component analysis such as Rao (1958); Kleffe (1973); Dauxois, Pousse and Romain (1982); Silverman (1996); Yao, Müller and Wang (2005a); Hall and Hosseini-Nasab (2006); Zhang and Wang (2016), and advances on functional linear regression such as Yao, Müller and Wang (2005b); Hall and Horowitz (2007); Yuan and Cai (2010); Kong et al.
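The Karhunen-Loève expansion mentioned in this abstract underlies ordinary (Euclidean) functional PCA. A minimal numpy sketch of the empirical version is below: eigendecompose the sample covariance of curves discretized on a time grid. The simulated process, the grid, and the component variances (4 and 1) are illustrative assumptions, not the paper's Riemannian setting.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)                       # common time grid
n = 200
# Simulated process: mean + two orthonormal modes with decaying variances
# (phi1, phi2 are the true Karhunen-Loeve basis here, by construction).
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
scores = rng.normal(size=(n, 2)) * np.sqrt([4.0, 1.0])
X = np.sin(np.pi * t) + scores @ np.vstack([phi1, phi2])

# Empirical Karhunen-Loeve: eigendecompose the sample covariance operator,
# discretized on the grid (quadrature weight = grid spacing dt).
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n                                # pointwise sample covariance
dt = t[1] - t[0]
evals, evecs = np.linalg.eigh(C * dt)            # discretized operator eigenproblem
order = np.argsort(evals)[::-1]
lam = evals[order][:2]                           # estimated component variances (~4 and ~1)
psi = evecs[:, order[:2]].T / np.sqrt(dt)        # L2-normalized eigenfunctions
```

With well-separated eigenvalues, `psi[0]` aligns closely with `phi1` up to sign, and each curve's scores are recovered by integrating `Xc` against `psi`.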
  • A Reproducing Kernel Hilbert Space Approach to Functional Linear Regression
    The Annals of Statistics 2010, Vol. 38, No. 6, 3412–3444. DOI: 10.1214/09-AOS772. © Institute of Mathematical Statistics, 2010
    A REPRODUCING KERNEL HILBERT SPACE APPROACH TO FUNCTIONAL LINEAR REGRESSION
    By Ming Yuan and T. Tony Cai, Georgia Institute of Technology and University of Pennsylvania

    We study in this paper a smoothness regularization method for functional linear regression and provide a unified treatment for both the prediction and estimation problems. By developing a tool for simultaneous diagonalization of two positive definite kernels, we obtain sharper results on the minimax rates of convergence and show that smoothness-regularized estimators achieve the optimal rates of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods developed in the literature. Despite the generality of the method of regularization, we show that the procedure is easily implementable. Numerical results are obtained to illustrate the merits of the method and to demonstrate the theoretical developments.

    1. Introduction. Consider the following functional linear regression model, where the response Y is related to a square integrable random function X(·) through

    (1) Y = α0 + ∫_T X(t) β0(t) dt + ε.

    Here α0 is the intercept, T is the domain of X(·), β0(·) is an unknown slope function, and ε is a centered noise random variable. The domain T is assumed to be a compact subset of a Euclidean space. Our goal is to estimate α0
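Model (1) can be illustrated with a small simulation. In the sketch below a plain ridge (L2) penalty stands in for the paper's RKHS smoothness penalty; that substitution, the Brownian-motion covariates, the slope sin(πt), and all tuning values are assumptions for illustration, not the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
dt = t[1] - t[0]
n = 300
# Brownian-motion-like functional covariates (illustrative choice)
X = np.cumsum(rng.normal(size=(n, t.size)), axis=1) * np.sqrt(dt)
beta0 = np.sin(np.pi * t)                        # true slope function (assumed)
alpha0 = 0.5
y = alpha0 + X @ beta0 * dt + rng.normal(scale=0.1, size=n)

# Discretized penalized least squares:
#   minimize (1/n) sum_i (y_i - a - integral X_i beta)^2 + lam * integral beta^2,
# whose normal equations are (K_n + lam I) beta = empirical cov(X(t), Y).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
lam = 1e-3
A = (Xc.T @ Xc) * dt / n + lam * np.eye(t.size)  # empirical covariance operator + ridge
beta_hat = np.linalg.solve(A, Xc.T @ yc / n)
alpha_hat = y.mean() - X.mean(axis=0) @ beta_hat * dt
```

For small `lam` the fitted signal tracks the true functional effect closely; choosing `lam` by cross-validation would be the natural next step.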
  • Mixture Inner Product Spaces and Their Application to Functional Data Analysis
    MIXTURE INNER PRODUCT SPACES AND THEIR APPLICATION TO FUNCTIONAL DATA ANALYSIS
    Zhenhua Lin, Hans-Georg Müller and Fang Yao

    Abstract: We introduce the concept of mixture inner product spaces associated with a given separable Hilbert space, which feature an infinite-dimensional mixture of finite-dimensional vector spaces and are dense in the underlying Hilbert space. Any Hilbert-valued random element can be arbitrarily closely approximated by mixture inner product space valued random elements. While this concept can be applied to data in any infinite-dimensional Hilbert space, the case of functional data that are random elements in the L2 space of square integrable functions is of special interest. For functional data, mixture inner product spaces provide a new perspective, where each realization of the underlying stochastic process falls into one of the component spaces and is represented by a finite number of basis functions, the number of which corresponds to the dimension of the component space. In the mixture representation of functional data, the number of included mixture components used to represent a given random element in L2 is specifically adapted to each random trajectory and may be arbitrarily large. Key benefits of this novel approach are, first, that it provides a new perspective on the construction of a probability density in function space under mild regularity conditions, and second, that individual trajectories possess a trajectory-specific dimension that corresponds to a latent random variable, making it possible to use a larger number of components for less smooth and a smaller number for smoother trajectories.
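The trajectory-specific dimension described above can be imitated with a toy selection rule: fit each observed curve with an increasing number of Fourier basis functions and keep the count that minimizes an AIC-type criterion, so rougher curves get more components. The basis, the criterion, the assumed noise variance `sigma2`, and both test curves are illustrative stand-ins, not the authors' mixture model.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)

def fourier_basis(t, K):
    # first K functions of a Fourier system on [0, 1]: 1, sin, cos, sin, cos, ...
    cols = [np.ones_like(t)]
    for j in range(1, K):
        f = np.sin if j % 2 else np.cos
        k = (j + 1) // 2
        cols.append(np.sqrt(2) * f(2 * np.pi * k * t))
    return np.column_stack(cols)

def pick_dimension(y, t, kmax=15, sigma2=0.01):
    # curve-specific number of components via an AIC-type criterion
    best = None
    for K in range(1, kmax + 1):
        B = fourier_basis(t, K)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ coef) ** 2)
        aic = rss / sigma2 + 2 * K
        if best is None or aic < best[0]:
            best = (aic, K)
    return best[1]

smooth = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.size)
rough = sum(np.sin(2 * np.pi * k * t) / k for k in range(1, 6)) \
        + rng.normal(scale=0.1, size=t.size)
K_smooth, K_rough = pick_dimension(smooth, t), pick_dimension(rough, t)
```

The smooth curve needs only a couple of components, while the rough one requires the sine terms up to frequency 5, so its selected dimension is markedly larger.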
  • Sampled Forms of Functional PCA in Reproducing Kernel Hilbert Spaces
    The Annals of Statistics 2012, Vol. 40, No. 5, 2483–2510. DOI: 10.1214/12-AOS1033. © Institute of Mathematical Statistics, 2012
    SAMPLED FORMS OF FUNCTIONAL PCA IN REPRODUCING KERNEL HILBERT SPACES
    By Arash A. Amini and Martin J. Wainwright, University of California, Berkeley

    We consider the sampling problem for functional PCA (fPCA), where the simplest example is the case of taking time samples of the underlying functional components. More generally, we model the sampling operation as a continuous linear map from H to R^m, where the functional components are assumed to lie in some Hilbert subspace H of L2, such as a reproducing kernel Hilbert space of smooth functions. This model includes time and frequency sampling as special cases. In contrast to the classical approach to fPCA, in which access to entire functions is assumed, having a limited number m of functional samples places limitations on the performance of statistical procedures. We study these effects by analyzing the rate of convergence of an M-estimator for the subspace spanned by the leading components in a multi-spiked covariance model. The estimator takes the form of regularized PCA, and hence is computationally attractive. We analyze the behavior of this estimator within a nonasymptotic framework, and provide bounds that hold with high probability as a function of the number of statistical samples n and the number of functional samples m. We also derive lower bounds showing that the rates obtained are minimax optimal.

    1. Introduction. The statistical analysis of functional data, commonly referred to as functional data analysis (FDA), is an established area of statistics with a great number of practical applications; see the books [26, 27] and references therein for various examples.
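The sampled, spiked-covariance setting can be seen in a toy simulation: each curve is a random multiple of one smooth component, observed only at m time points with noise, and PCA on the sampled vectors recovers the component's sampled form. Plain PCA stands in for the paper's regularized estimator, and the dimensions and noise levels are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 15, 400                                   # m functional samples, n statistical samples
ts = np.linspace(0, 1, m)                        # time-sampling map: f -> (f(t_1), ..., f(t_m))
phi = np.sqrt(2) * np.sin(np.pi * ts)            # leading component, seen through sampling
scores = rng.normal(scale=2.0, size=n)
noise = rng.normal(scale=0.3, size=(n, m))
Y = scores[:, None] * phi[None, :] + noise       # single-spike model in R^m

# Estimate the leading subspace as the top eigenvector of the sample covariance
# (the regularized-PCA estimator reduces to plain PCA in this noiseless-smoothness toy).
S = Y.T @ Y / n
w, V = np.linalg.eigh(S)                         # eigenvalues in ascending order
v = V[:, -1]                                     # estimated leading direction in R^m
v *= np.sign(v @ phi)                            # fix the sign for comparison
```

Because the spike variance dominates the noise, the top sample eigenvector aligns closely with the sampled component; shrinking m or the score variance degrades this alignment, which is the effect the paper quantifies.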
  • Sparse Estimation for Functional Semiparametric Additive Models
    Journal of Multivariate Analysis 168 (2018) 105–118
    Sparse estimation for functional semiparametric additive models
    Peijun Sang, Richard A. Lockhart, Jiguo Cao
    Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
    Keywords: functional data analysis; functional linear model; functional principal component analysis. Crown Copyright © 2018 Published by Elsevier Inc. All rights reserved.

    Abstract: We propose a functional semiparametric additive model for the effects of a functional covariate and several scalar covariates on a scalar response. The effect of the functional covariate is modeled nonparametrically, while a linear form is adopted to model the effects of the scalar covariates. This strategy can enhance flexibility in modeling the effect of the functional covariate and maintain interpretability for the effects of scalar covariates simultaneously. We develop the method for estimating the functional semiparametric additive model by smoothing and selecting non-vanishing components for the functional covariate. Asymptotic properties of our method are also established. Two simulation studies are implemented to compare our method with various conventional methods. We demonstrate our method with two real applications.

    1. Introduction. High-dimensional data sets of large volume and complex structure are rapidly emerging in various fields. Functional data analysis, due to its great flexibility and wide applications in dealing with high-dimensional data, has received considerable attention. One important problem in functional data analysis is functional linear regression (FLR).
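A rough numpy sketch of this model class: the functional covariate enters through a nonlinear function of its leading FPC score, plus a linear part for the scalar covariates. A cubic polynomial in the score stands in for the paper's smoothing-and-selection machinery, and the single functional mode, coefficients, and noise level are all made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 50)
n = 400
phi = np.sqrt(2) * np.sin(np.pi * t)             # single functional mode (assumed)
s = rng.normal(size=n)                           # true FPC scores
X = s[:, None] * phi[None, :]                    # functional covariate on the grid
Z = rng.normal(size=(n, 2))                      # scalar covariates
y = np.sin(s) + Z @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=n)

# Step 1: estimate each curve's leading FPC score.
dt = t[1] - t[0]
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n
_, V = np.linalg.eigh(C * dt)
psi = V[:, -1] / np.sqrt(dt)
psi *= np.sign(psi @ phi)
s_hat = Xc @ psi * dt

# Step 2: fit nonparametric-in-score (cubic stand-in) + linear-in-scalars by least squares.
D = np.column_stack([np.ones(n), s_hat, s_hat**2, s_hat**3, Z])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
```

The scalar-covariate coefficients `coef[4:]` stay interpretable (near 1 and -0.5 here), while the polynomial part absorbs the nonlinear functional effect; the paper additionally penalizes and selects among many FPC components.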
  • Topics in Functional Data Analysis and Machine Learning Predictive Inference
    Topics in functional data analysis and machine learning predictive inference
    Haozhe Zhang, Iowa State University
    Graduate Theses and Dissertations, Iowa State University Capstones, Theses and Dissertations, 2019

    Follow this and additional works at: https://lib.dr.iastate.edu/etd (part of the Statistics and Probability Commons)

    Recommended Citation: Zhang, Haozhe, "Topics in functional data analysis and machine learning predictive inference" (2019). Graduate Theses and Dissertations. 17626. https://lib.dr.iastate.edu/etd/17626

    This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

    A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. Major: Statistics. Program of Study Committee: Yehua Li, Co-major Professor; Dan Nettleton, Co-major Professor; Ulrike Genschel; Lily Wang; Zhengyuan Zhu. The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University, Ames, Iowa, 2019. Copyright © Haozhe Zhang, 2019. All rights reserved.

    DEDICATION: I would like to dedicate this dissertation to my parents, Baiming Zhang and Jianying Fan, and my significant other, Shuyi Zhang, for their endless love and unwavering support.
  • Functional Data Analysis, by Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller
    Functional Data Analysis
    Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller
    Department of Statistics, University of California, Davis, California 95616; email: [email protected]; Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China

    Annu. Rev. Stat. Appl. 2016. 3:257–95. This article's doi: 10.1146/annurev-statistics-041715-033624. Copyright © 2016 by Annual Reviews. All rights reserved. The Annual Review of Statistics and Its Application is online at statistics.annualreviews.org.

    Keywords: functional principal component analysis, functional correlation, functional linear regression, functional additive model, clustering and classification, time warping

    Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. These are both examples of functional data, which has become a commonly encountered type of data. Functional data analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA).
  • Review of Functional Data Analysis
    Review of Functional Data Analysis
    Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller
    Department of Statistics, University of California, Davis, USA, 95616; Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, R.O.C.

    ABSTRACT: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. They are both examples of "functional data", which have become a prevailing type of data. Functional Data Analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA). FPCA is an important dimension reduction tool and in sparse data situations can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single or multiple index methods we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models, such as time warping, manifold learning, and dynamic modeling with empirical differential equations. The paper concludes with a brief discussion of future directions.
    KEY WORDS: functional principal component analysis, functional correlation, functional linear regression, functional additive model, clustering and classification, time warping
    arXiv:1507.05135v1 [stat.ME] 18 Jul 2015

    1 Introduction. Functional data analysis (FDA) deals with the analysis and theory of data that are in the form of functions, images and shapes, or more general objects.
  • Representing Functional Data in Reproducing Kernel Hilbert Spaces with Applications to Clustering and Classification
    Working Paper 10-27, Statistics and Econometrics Series 013, May 2010
    Departamento de Estadística, Universidad Carlos III de Madrid, Calle Madrid, 126, 28903 Getafe (Spain). Fax (34) 91 624-98-49

    Representing Functional Data in Reproducing Kernel Hilbert Spaces with applications to clustering and classification
    Javier González and Alberto Muñoz

    Abstract: Functional data are difficult to manage for many traditional statistical techniques given their very high (or intrinsically infinite) dimensionality. The reason is that functional data are essentially functions, and most algorithms are designed to work with (low) finite-dimensional vectors. Within this context we propose techniques to obtain finite-dimensional representations of functional data. The key idea is to consider each functional curve as a point in a general function space and then project these points onto a Reproducing Kernel Hilbert Space with the aid of regularization theory. In this work we describe the projection method, analyze its theoretical properties, and propose a model selection procedure to select appropriate Reproducing Kernel Hilbert Spaces onto which to project the functional data.

    Keywords: Functional Data, Reproducing Kernel Hilbert Spaces, Regularization Theory.

    Universidad Carlos III de Madrid, c/ Madrid 126, 28903 Getafe (Madrid); e-mail addresses: (Alberto Muñoz) [email protected], (Javier González) [email protected]. Acknowledgements: The research of Alberto Muñoz and Javier González was supported by Spanish Government grants 2006-03563-001, 2004-02934-001/002 and Madrid Government grant 2007-04084-001.

    1. Introduction. The field of Functional Data Analysis (FDA) [Ramsay and Silverman, 2006] [Ferraty and Vieu, 2006] deals naturally with data of very high (or intrinsically infinite) dimensionality.
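The projection idea, treating each curve as a point in function space and projecting it into an RKHS via regularization theory, can be sketched with kernel ridge regression on one noisy curve: the representer coefficients give a finite-dimensional representation. The Gaussian kernel, bandwidth, and penalty below are illustrative choices, not the authors' selected spaces.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)  # one noisy curve

# RKHS projection via regularization (kernel ridge):
#   minimize sum_i (y_i - f(t_i))^2 + lam * ||f||_K^2
# By the representer theorem f(.) = sum_i alpha_i k(., t_i) with
#   alpha = (K + lam I)^{-1} y.
def gauss_kernel(s, u, h=0.1):
    return np.exp(-((s[:, None] - u[None, :]) ** 2) / (2 * h ** 2))

K = gauss_kernel(t, t)
lam = 1e-1
alpha = np.linalg.solve(K + lam * np.eye(t.size), y)  # representer coefficients
f_hat = K @ alpha                                     # smoothed curve in the RKHS
```

The vector `alpha` is the finite-dimensional stand-in for the curve; distances between such coefficient vectors (in the kernel metric) can then feed standard clustering or classification algorithms, which is the use case the paper targets.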
  • Functional Data Analysis
    Functional data analysis
    Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller
    Department of Statistics, University of California, Davis, USA, 95616; Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, R.O.C.

    Annu. Rev. Statist. 2015. ?:1–41. This article's doi: 10.1146/((please add article doi)). Copyright © 2015 by Annual Reviews. All rights reserved.

    Keywords: functional principal component analysis, functional correlation, functional linear regression, functional additive model, clustering and classification, time warping

    Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. They are both examples of "functional data", which have become a commonly encountered type of data. Functional Data Analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA). FPCA is an important dimension reduction tool and in sparse data situations can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single or multiple index methods we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models, and models that feature time warping, manifold learning, and empirical differential equations.
  • Functional Principal Component Analysis of Aircraft Trajectories Florence Nicol
    Functional Principal Component Analysis of Aircraft Trajectories
    Florence Nicol. [Research Report] RR/ENAC/2013/02, ENAC, May 2013. HAL Id: hal-01349113, https://hal-enac.archives-ouvertes.fr/hal-01349113. Submitted on 26 Jul 2016.

    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    Abstract: In Functional Data Analysis (FDA), the underlying structure of a raw observation is functional and data are assumed to be sample paths from a single stochastic process. Functional Principal Component Analysis (FPCA) generalizes the standard multivariate Principal Component Analysis (PCA) to the infinite-dimensional case by analyzing the covariance structure of functional data. By approximating infinite-dimensional random functions by a finite number of random score vectors, FPCA appears as a dimension reduction technique just as in the multivariate case and cuts down the complexity of data. This technique offers a visual tool to assess the main directions in which trajectories vary, patterns of interest, clusters in the data, and outlier detection.