Local Regression Models: Advancements
Total Page:16
File Type:pdf, Size:1020Kb
LOCAL REGRESSION MODELS: ADVANCEMENTS, APPLICATIONS, AND NEW METHODS A Dissertation Submitted to the Faculty of Purdue University by Ryan P. Hafen In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy May 2010 Purdue University West Lafayette, Indiana ii [Put dedication here.] iii ACKNOWLEDGMENTS [Put statement of appreciation or recognition of special assistance here.] iv TABLE OF CONTENTS Page LIST OF TABLES :::::::::::::::::::::::::::::::: vii LIST OF FIGURES ::::::::::::::::::::::::::::::: viii SYMBOLS :::::::::::::::::::::::::::::::::::: xv ABBREVIATIONS :::::::::::::::::::::::::::::::: xix ABSTRACT ::::::::::::::::::::::::::::::::::: xx 1 Background :::::::::::::::::::::::::::::::::: 1 1.1 Regression Models ::::::::::::::::::::::::::: 1 1.2 Local Regression Models :::::::::::::::::::::::: 1 1.2.1 Fitting a Local Regression Model ::::::::::::::: 2 1.2.2 Loess as a Linear Operator ::::::::::::::::::: 4 1.2.3 Statistical Properties :::::::::::::::::::::: 5 1.2.4 Diagnostics and Model selection :::::::::::::::: 9 1.2.5 Computation :::::::::::::::::::::::::: 13 1.3 Time Series Models ::::::::::::::::::::::::::: 14 1.3.1 ARMA Models ::::::::::::::::::::::::: 14 1.3.2 Nonstationarity and Trends :::::::::::::::::: 15 1.3.3 Linear Filters :::::::::::::::::::::::::: 16 1.3.4 Linear Filters in the Frequency Domain :::::::::::: 17 1.4 Seasonal Trend Decomposition using Loess (STL) :::::::::: 20 1.4.1 The STL Method :::::::::::::::::::::::: 20 1.4.2 STL Smoothing Parameters :::::::::::::::::: 24 1.4.3 STL Parameter Guidelines ::::::::::::::::::: 25 1.4.4 Seasonal Parameter Selection ::::::::::::::::: 27 1.4.5 Post Smoothing ::::::::::::::::::::::::: 29 1.4.6 Computational Considerations ::::::::::::::::: 30 2 Advancements to the STL Method ::::::::::::::::::::: 31 2.1 The Operator Matrix of an STL Decomposition ::::::::::: 31 2.1.1 Why Calculate the Operator Matrix? ::::::::::::: 31 2.1.2 How to Calculate the Operator Matrix :::::::::::: 32 2.1.3 Variance Coefficients along the Design Space ::::::::: 36 2.1.4 Spectral Properties of Smoothing Near Endpoints :::::: 37 2.1.5 ARMA Modeling on Remainder Component ::::::::: 37 2.1.6 Predicting Ahead :::::::::::::::::::::::: 39 v Page 2.1.7 Confidence and Prediction Intervals :::::::::::::: 42 2.1.8 Analysis of Variance :::::::::::::::::::::: 43 2.1.9 Model Selection ::::::::::::::::::::::::: 44 2.2 Local Quadratic Support and New Parameter Selection Guidelines : 45 2.2.1 Approximating the Critical Frequency for a Loess Filter :: 45 2.2.2 Calculating q Given !, λ, and fcrit :::::::::::::: 48 2.2.3 Trend Filter Parameter Guidelines :::::::::::::: 48 2.2.4 High-Pass Filter Parameter Guidelines :::::::::::: 49 2.3 The Revision Problem ::::::::::::::::::::::::: 49 2.3.1 Reproducing Kernel Hilbert Spaces with Loess :::::::: 51 2.3.2 Blending to Lower Degree Polynomials at the Endpoints :: 52 2.3.3 Comparison of Blending and RKHS :::::::::::::: 60 2.4 Software Implementation: stl2 and operator Packages ::::::: 62 2.4.1 The stl2 Package :::::::::::::::::::::::: 63 2.4.2 The operator Package ::::::::::::::::::::: 64 2.4.3 Computational Considerations ::::::::::::::::: 65 2.4.4 Approximations ::::::::::::::::::::::::: 67 2.5 General Guidelines ::::::::::::::::::::::::::: 68 3 An Application: Syndromic Surveillance :::::::::::::::::: 71 3.1 Background ::::::::::::::::::::::::::::::: 71 3.2 Modeling the Data ::::::::::::::::::::::::::: 72 3.2.1 Model Overview ::::::::::::::::::::::::: 73 3.2.2 Model Components and Parameter Selection ::::::::: 74 3.2.3 Model Diagnostics ::::::::::::::::::::::: 76 3.2.4 Blending ::::::::::::::::::::::::::::: 76 3.2.5 Modeling on the Counts Scale ::::::::::::::::: 77 3.3 Outbreak Detection ::::::::::::::::::::::::::: 78 3.3.1 Detection Method :::::::::::::::::::::::: 78 3.3.2 Outbreak Model :::::::::::::::::::::::: 79 3.3.3 Comparison of Methods :::::::::::::::::::: 80 3.3.4 Outbreak Injection Scenarios :::::::::::::::::: 82 3.3.5 Choosing Cutoff Limits ::::::::::::::::::::: 83 3.4 Results :::::::::::::::::::::::::::::::::: 84 3.4.1 Results Summary :::::::::::::::::::::::: 84 3.4.2 Examining the Difference in Detection Performance ::::: 86 3.4.3 False Positives :::::::::::::::::::::::::: 87 3.5 Discussion :::::::::::::::::::::::::::::::: 87 3.6 Conclusions ::::::::::::::::::::::::::::::: 88 3.7 Other Work ::::::::::::::::::::::::::::::: 89 4 A New Density Estimation Method: Ed ::::::::::::::::::: 91 4.1 Introduction ::::::::::::::::::::::::::::::: 91 vi Page 4.1.1 Background ::::::::::::::::::::::::::: 91 4.1.2 Summary of Ed ::::::::::::::::::::::::: 92 4.1.3 Overview of the Chapter :::::::::::::::::::: 93 4.2 Examples :::::::::::::::::::::::::::::::: 94 4.3 Estimation Step 1: Raw Estimates :::::::::::::::::: 95 4.3.1 Choosing κ ::::::::::::::::::::::::::: 97 4.3.2 Approximate Distribution of the Raw Estimates ::::::: 98 4.4 Estimation Step 2: Loess Smoothing ::::::::::::::::: 100 4.5 Diagnostic Checking, Model Selection, and Inference ::::::::: 101 4.5.1 Evaluating Goodness-of-Fit :::::::::::::::::: 101 4.5.2 Mallows Cp Model Selection :::::::::::::::::: 102 4.5.3 Validating Model Assumptions ::::::::::::::::: 102 4.5.4 Inference ::::::::::::::::::::::::::::: 102 4.6 Comparison with Kernel Density Estimates :::::::::::::: 103 4.6.1 The Family-Income data :::::::::::::::::::: 104 4.6.2 The Packet-Delay Data ::::::::::::::::::::: 105 4.6.3 The Normal-Mixture Data ::::::::::::::::::: 105 4.7 Grid Augmentation ::::::::::::::::::::::::::: 106 4.8 More Examples ::::::::::::::::::::::::::::: 108 4.9 Computational Methods :::::::::::::::::::::::: 109 4.10 Discussion :::::::::::::::::::::::::::::::: 110 5 Figures ::::::::::::::::::::::::::::::::::::: 113 5.1 Figures for Local Regression Modeling :::::::::::::::: 113 5.2 Figures for Time Series Modeling ::::::::::::::::::: 127 5.3 Figures for STL Advancements :::::::::::::::::::: 139 5.4 Figures for Syndromic Surveillance :::::::::::::::::: 179 5.5 Figures for Ed :::::::::::::::::::::::::::::: 205 Appendix: R Package Documentation ::::::::::::::::::::: 243 LIST OF REFERENCES :::::::::::::::::::::::::::: 273 VITA ::::::::::::::::::::::::::::::::::::::: 279 vii LIST OF TABLES Table Page (λ) 2.1 Coefficient ηi;j values for λ, i; j = 0; 1; 2 :::::::::::::::::: 47 3.1 Overall percentage of outbreaks detected by method ::::::::::: 85 viii LIST OF FIGURES Figure Page 5.1 Data for loess example :::::::::::::::::::::::::: 113 5.2 Loess example - fit at x = 0:5 :::::::::::::::::::::: 114 5.3 Loess fit for λ = 2, α = 0:4 :::::::::::::::::::::::: 115 5.4 Variance coefficient along the design space for the density loess fit :: 116 5.5 Loess fitted values for several parameters :::::::::::::::: 117 5.6 Loess residuals for several parameters :::::::::::::::::: 118 5.7 Mallows' Cp plot for the density data :::::::::::::::::: 119 5.8 Residual variance vs. ν for density data fits :::::::::::::: 120 5.9 Smoothing parameter α vs. equivalent number of parameters ν for den- sity data :::::::::::::::::::::::::::::::::: 121 5.10 Normal quantile plot for the density data loess fit with λ = 2, α = 0:4 122 5.11 Residual serial correlation plot for density data loess fit :::::::: 123 5.12 Residuals vs. design points for density data loess fit :::::::::: 124 5.13 Residuals vs. fitted values for density data loess fit :::::::::: 125 5.14 Loess fit for density data with exact fits at k-d tree vertices and inter- polation elsewhere ::::::::::::::::::::::::::::: 126 5.15 Two realizations of a unit-variance random walk process ::::::: 127 5.16 Time series loess fit to simulated data :::::::::::::::::: 128 5.17 Power transfer functions for symmetric loess operators :::::::: 129 5.18 CO2 concentration measurements at Mauna Loa, Hawaii from 1959 to 1997 :::::::::::::::::::::::::::::::::::: 130 5.19 STL decomposition for CO2 measurement time series ::::::::: 131 5.20 Power transfer function for each smoothing component of the CO2 STL decomposition ::::::::::::::::::::::::::::::: 132 5.21 Seasonal diagnostic plot for CO2 decomposition :::::::::::: 133 ix Figure Page 5.22 Remainder component by cycle-subseries for CO2 decomposition ::: 134 5.23 Fitted seasonal component by cycle-subseries for CO2 data :::::: 135 5.24 Trend diagnostic plot for CO2 decomposition :::::::::::::: 136 5.25 CO2 decomposition with low-frequency post-trend smoothing ::::: 137 5.26 CO2 decomposition with two post-trend smoothing components ::: 138 5.27 Variance coefficients for CO2 decomposition components ::::::: 139 5.28 Power transfer functions for STL operator matrix L∗ ::::::::: 140 5.29 Autocorrelation function for CO2 remainder term and residual term after ARMA modeling ::::::::::::::::::::::::::::: 141 5.30 Normal quantile plot of remainder term of CO2 model after ARMA mod- eling on remainder term ::::::::::::::::::::::::: 142 5.31 Predicting the CO2 series ahead 3 years with 95% prediction intervals 143 5.32 Simulated data to mimic daily visit counts to an emergency department (ED) :::::::::::::::::::::::::::::::::::: 144 5.33 Cp plot for CO2 decomposition with varying seasonal span n(s). ::: 145 5.34 Critical frequencies of a symmetric loess smooth vs. q−1 :::::::: 146 5.35 Residuals from fitting critical frequencies vs. q−1 for a linear and quadratic model ::::::::::::::::::::::::::::::::::: 147 5.36 Coefficients from quadratic polynomial fits of critical frequency vs. q−1 plotted against ! by λ :::::::::::::::::::::::::: 148 (lower) (upper) 5.37 Plot of n(p)fh and n(p)fh vs. log2 n(p) for λ(l) = 0; 1; 2 and ! = 0:05 :::::::::::::::::::::::::::::::::: 149 5.38 Variance coefficients