State space modelling of extreme values with particle filters

David Peter Wyncoll, MMath

Submitted for the degree of Doctor of Philosophy at Lancaster University, June 2009
Abstract
State space models are a flexible class of Bayesian model that can be used to smoothly capture non-stationarity. Observations are assumed independent given a latent state process so that their distribution can change gradually over time. Sequential Monte Carlo methods known as particle filters provide an approach to inference for such models whereby observations are added to the fit sequentially. Though originally developed for on-line inference, particle filters, along with related particle smoothers, often provide the best approach for off-line inference.
This thesis develops new results for particle filtering and in particular develops a new particle smoother that has a computational complexity that is linear in the number of Monte Carlo samples. This compares favourably with the quadratic complexity of most of its competitors resulting in greater accuracy within a given time frame.
The statistical analysis of extremes is important in many fields where the largest or smallest values have the biggest effect. Accurate assessments of the likelihood of extreme events are crucial to judging how severe they could be. While the extreme values of a stationary time series are well understood, datasets of extremes often contain varying degrees of non-stationarity. How best to extend standard extreme value models to account for non-stationary series is a topic of ongoing research.
The thesis develops inference methods for extreme values of univariate and multivariate non-stationary processes using state space models fitted using particle methods. Though this approach has been considered previously in the univariate case, we identify problems with the existing method and provide solutions and extensions to it. The application of the methodology is illustrated through the analysis of a series of world class athletics running times, extreme temperatures at a site in the Antarctic, and sea-level extremes on the east coast of England.

Acknowledgements
I would first like to thank my supervisors Jonathan Tawn and Paul Fearnhead for their help and guidance throughout my PhD. I also acknowledge the assistance of staff and students at Lancaster University, in particular Granville Tunnicliffe- Wilson, Emma Eastoe and Chris Jewell.
I am grateful to the Natural Environment Research Council (NERC) for funding my studies at Lancaster.
I thank Dougal Goodman from the Foundation for Science and Technology for the Antarctic temperature dataset used in Section 4.2. I also acknowledge the British Oceanographic Data Centre, who are funded by the Environment Agency and NERC, for the sea-level data used in Section 5.2.
Finally, I would like to thank my wife Robyn for her support and encouragement over the last three and a half years.
Declaration
The work in this thesis is my own, as carried out under the supervision of Jonathan Tawn and Paul Fearnhead. It has not been submitted for the award of a higher degree elsewhere.
Section 3.2 has been submitted for publication as Fearnhead et al. (2009) along with the simplest EM algorithm in Subsection 3.3.1, the athletics analysis of Section 4.1 as well as corresponding material from Chapter 2 and Appendix A.
David Wyncoll
Contents
List of Figures
List of Tables
List of Algorithms

1 Introduction

2 Literature review
  2.1 Particle Filtering
    2.1.1 State space model
    2.1.2 Kalman Filter
    2.1.3 Basic particle filters
    2.1.4 Auxiliary particle filter
    2.1.5 Enhancements
    2.1.6 Parameter estimation
  2.2 Particle Smoothing
    2.2.1 Kalman Smoothing
    2.2.2 Smoothing while filtering
    2.2.3 Forwards-backwards smoothers
    2.2.4 Two-Filter Smoother
  2.3 Univariate Extreme Value Theory
    2.3.1 Maxima of IID random variables
    2.3.2 Point process characterisation
    2.3.3 Dependent processes
    2.3.4 Non-stationary processes
  2.4 Multivariate Extreme Value Theory
    2.4.1 Multivariate extreme value distributions
    2.4.2 Alternative approaches

3 New results in particle filtering
  3.1 Choosing when to re-sample
    3.1.1 Motivation
    3.1.2 Theory
    3.1.3 Simulation studies
    3.1.4 Conclusion
  3.2 A sequential smoothing algorithm with linear computational cost
    3.2.1 Weaknesses of current particle smoothers
    3.2.2 New smoothing algorithm
    3.2.3 Simulation studies
  3.3 EM algorithm for static parameter estimates
    3.3.1 EM algorithm
    3.3.2 Estimating observed information

4 State space modelling of univariate extremes
  4.1 Analysis of women's 3000m running event
    4.1.1 Introduction
    4.1.2 Dynamic r-smallest order statistics model
    4.1.3 Parameter estimation
    4.1.4 Results
  4.2 Analysis of Antarctic temperature data
    4.2.1 Introduction
    4.2.2 Standardising the dataset
    4.2.3 Dynamic point process model
    4.2.4 Results

5 State space modelling of bivariate extremes
  5.1 Pooling athletics data from two events
    5.1.1 Introduction
    5.1.2 Dynamic logistic model with correlated random walks
    5.1.3 Parameter estimation
    5.1.4 Results
  5.2 Joint analysis of sea-level data
    5.2.1 Introduction
    5.2.2 Dynamic logistic model with variable dependence parameter
    5.2.3 Parameter estimation
    5.2.4 Results

A Implementation of particle methods
  A.1 Linear-Gaussian model
  A.2 Stochastic volatility
  A.3 Bearings-only tracking
  A.4 Analysis of women's 3000m running event
  A.5 Analysis of Antarctic temperature data
  A.6 Pooling athletics data from two events
  A.7 Joint analysis of sea-level data

B Derivation of state models from SDEs
  B.1 Integrated random walk
  B.2 Correlated integrated random walks

C Derivation of bivariate logistic model
  C.1 Joint distribution functions in Fréchet margins
  C.2 Joint density functions in Fréchet margins
  C.3 Transforming densities to GEV margins

Bibliography

Index

List of Figures
2.1 General form of the state space model showing the conditional independences that exist between the observations Yt and the hidden states Xt.
2.2 Diagram showing how the simple Filter-Smoother re-weights the filter particles.
2.3 Probability density functions of the Negative-Weibull, Gumbel and Fréchet distributions.
2.4 Example of a series declustered with the runs method.
2.5 Example of the selection of peaks over a constant threshold when the series is non-stationary.

3.1 Log variances of filter estimates of the state for different ESS thresholds.
3.2 Log variances of filter estimates of E(Xt|y1:t) in the AR(1) model for different ESS thresholds.
3.3 Log variances of filter estimates of the volatility in the stochastic volatility model for different ESS thresholds.
3.4 Simulated path of an object in the u-v plane over 24 time steps with the observed noisy bearings made from the origin.
3.5 Log variances of filter estimates of the range in the bearings-only tracking model for different ESS thresholds.
3.6 Average effective sample size for each of the 200 time steps using the filter, Forward-Backward and Two-Filter smoothers as well as the O(N²) and O(N) versions of our new algorithm.
3.7 Average effective sample sizes as in Figure 3.6 with different ratios of the state noise ν² to the observation noise τ².

4.1 The µt and ξt components of 1000 particles from the filter at years 1973, 1979, 1983 and 1992 following the method of Gaetan and Grigoletto (2004).
4.2 Five fastest times for the women's 3000m race between 1972 and 2007 with Wang Junxia's time in 1993.
4.3 Raw Faraday temperature series showing annual maxima, mean and minima with linear least-squares fits.
4.4 Logged sum of squares of the discrepancy between the data removed for cross-validation and its smoothed estimate for a variety of kernel bandwidths.
4.5 Smoothed Faraday temperature series with confidence interval constructed from the smoothed standard deviation.
4.6 Standardised Faraday temperature series for 1958 with declustered threshold exceedances for the upper and lower tail.
4.7 Standardised Faraday temperature series with declustered threshold exceedances and fitted trend µt for the upper and lower extremes.

5.1 Five fastest times for the women's 1500m and 3000m races between 1972 and 2007.
5.2 Contour plot of the model log likelihood for a range of dependence parameters ρ and α.
5.3 Map showing the locations of our sea-level data sources along the eastern coastline of England.
5.4 Monthly maximum sea-level surges for each of the three locations.
5.5 Sea-level surges at Immingham during the winter containing January 1989 with the five largest monthly cluster maxima obtained using blocks of radius κ = 7 days.
5.6 Contour plot of the model log likelihood for a range of dependence parameters ρµ and ρα with data from Immingham and Lowestoft.
5.7 Smooth joint fit for Immingham and Lowestoft of the two largest cluster maxima during each of the winter months.
5.8 Smooth joint fit for Immingham and Sheerness as in Figure 5.7.

C.1 Decomposition of the bivariate point process event that generates G(x2, x1, y2, y1) in terms of the possible relative positions of the componentwise maxima M^(1)_{n,X}/n and M^(1)_{n,Y}/n to the constants x2 ≤ x1 and y2 ≤ y1.

List of Tables
2.1 Properties of the Generalised Extreme Value distribution.

3.1 Average variances of filter estimates of the current state, upper 2.5% quantile and probability of exceeding the true state for 100 time steps using various ESS thresholds.
3.2 Average variances of filter estimates of the current state over 100 time steps with ν² = 100 and ν² = 1/100.
3.3 Average variances of filter estimates of the state and volatility over 100 time steps.
3.4 Average variances of filter estimates of the range over 24 time steps using various ESS thresholds.
3.5 Number of particles used and average run time of each algorithm.
3.6 Comparison of the Filter-Smoother with two variations of our new O(N) algorithm.
3.7 Comparison of the Filter-Smoother with our new O(N) algorithm when different block sizes are used.

4.1 Model likelihood with σ and ξ estimates for different values of the smoothing parameter ν² and r = 2.

5.1 Marginal parameter estimates obtained for each site from an EM algorithm.

List of Algorithms
2.1 Kalman Filter.
2.2 Sampling Importance Re-sampling filter.
2.3 Auxiliary particle filter.

3.1 New O(N²) smoothing algorithm.
3.2 New O(N) smoothing algorithm.
3.3 New O(N) block smoothing algorithm.
3.4 EM algorithm for fixed parameters in the observation density.
3.5 EM algorithm for fixed parameters in the state and observation densities.

Chapter 1
Introduction
While much of statistics is concerned with typical behaviour, it is often the most extreme values that have the biggest impact. Examples range from flooding to financial crashes, hurricanes and world records. Quantifying how unlikely these events are is key to predicting when they might happen again as well as how severe or remarkable they could be.
To answer these questions and more, extreme value theory provides a collection of models and modelling approaches for analysing the largest or smallest values of a dataset. The models are typically justified by asymptotic theory that considers the distribution of the most extreme values of an infinite sample. By assuming these asymptotic results hold for a finite sample, approximate models for the extreme values of a series can be obtained while the distribution of the underlying sample remains unknown.
Since extreme values by definition occur infrequently, datasets of extreme values are often collected over a period of time and so arise as a time series. While this gives an opportunity for the most extreme events to be observed, it often adds difficulties to modelling as the distribution of extreme values may change over time. The extremes of independent or stationary sequences are fairly well understood, particularly of univariate series. However, the extreme values of non-stationary sequences are less understood, with many recent papers proposing alternative models to capture the non-stationarity.
State space models have received much recent attention as a flexible class of non-linear models for general time series. Observations are assumed to be conditionally independent given a hidden state process which captures the non-linearity and non-stationarity present in the series. A Bayesian model is typically constructed where inference about the observations is obtained by integrating out the hidden process.
As with most complex Bayesian models, the integrals required to apply the model are intractable in general. Monte Carlo methods are typically used to overcome these intractabilities by sampling possible state values and parameters from the model to approximate the integrals with finite sums. For this, Markov Chain Monte Carlo (MCMC) is often used to sample from the joint distribution of the entire hidden state process given all the observations. However, MCMC often struggles with this as the states are frequently highly correlated and the dimension of the sample space grows with the number of observations.
Particle filters and related sequential Monte Carlo methods provide an alternative class of algorithms for fitting state space models. Originally developed to estimate the current value of the state on-line as observations arrive, particle methods are being increasingly applied off-line as an alternative to MCMC. By incorporating observations into the fit sequentially, particle methods sample one state at a time rather than sampling the whole process at once thus reducing the dimension of each sample.
In this thesis we use state space models to capture non-stationarity in extreme value time series. This allows smooth non-linear trends to be incorporated into an extreme value analysis through a Bayesian model that accounts naturally for the uncertainties involved. As well as models for the extremes of a univariate series, we consider bivariate models which allow the dependencies between the extremes of two variables to be studied.
We fit the models with sequential particle algorithms that we tailor to modelling extreme values. The use of sequential algorithms makes it feasible to fit long datasets in particular, although the approach is not restricted to data that arrive sequentially. Through the course of improving particle methods for the modelling of extremes, we provide many new results that can be used with more general state space models.
The thesis is structured as follows. In Chapter 2 we review relevant literature within the fields of particle filtering and extreme value theory. We begin by defining the state space model before introducing basic particle filters that can be used for their inference. After listing their flaws we describe many enhancements to the basic filter that enable it to be applied more efficiently. We also review methods of estimating parameters in the model as well as extensions of the particle filter to perform smoothing, both of which will be useful when it comes to modelling extremes.
From Section 2.3 we present results from univariate extreme value theory. We start by considering the extremes of IID random variables before extending the results to dependent and non-stationary sequences. We then provide a similar review of multivariate extreme value theory which considers the extremes of components of multivariate variables as well as the dependencies between them.
In Chapter 3 we provide new results in particle filtering, focusing on aspects of particle methods that will aid our subsequent application to the analysis of extreme values. Section 3.1 begins with a study on re-sampling, an important step in most particle filters, with a new method for selecting when one should or should not re-sample. In Section 3.2 we present a new particle smoothing algorithm with linear computational complexity that compares favourably with the quadratic complexity of most particle smoothers. Finally, in Section 3.3 we construct an Expectation-Maximisation (EM) algorithm that uses our new smoothing algorithm to estimate fixed parameters in the model.
We begin our analysis of extreme values using state space models in Chapter 4. In this chapter we consider the extreme values of univariate time series, presenting our models through a couple of example analyses. Our first example looks at the women’s 3000m running event with the aim of estimating the probability of an extreme world record. A state space model is used to smoothly account for the clear non-linear trend present in the series of historical annual records. For our second example we study a series of daily temperature measurements from the Antarctic peninsula made over a period of 44 years. Examining the upper and lower extremes of the series requires us to account for the dependence between neighbouring measurements as well as the separation of the trend in the bulk of the data to that in the extremes.
In our final chapter we jointly model the extreme values of a pair of related series. This allows us to study the dependence between the extreme values of one variable and the other, as well as the connection between the non-stationary trends in each component. We begin by extending the women's 3000m analysis by drawing connections with the women's 1500m race. By exploiting these connections we can obtain a better estimate of our target tail probability. We conclude the thesis with a bivariate analysis of sea-levels at pairs of sites along the eastern coast of England. By making use of the state space approach, we allow the extremal dependence between two sites to vary smoothly over time and ask whether such a change is significant.

Chapter 2
Literature review
2.1 Particle Filtering
In this section we introduce the state space model used throughout this thesis and describe the filtering problem. We review the limitations of the Kalman Filter before describing particle filtering in detail. See Doucet et al. (2001) for a good introduction to particle filters and their applications.
2.1.1 State space model
State space models provide a flexible framework to handle non-linear time series.
These models assume a time-series with observations Yt that are conditionally independent given a hidden process Xt. We assume throughout this thesis that
Xt is a real-valued Markov chain. Formally the model is given by a state equation and an observation equation, which can be represented in terms of conditional
distributions
Xt+1 | {X1:t = x1:t, Y1:t = y1:t} ∼ ft(· | xt),   (2.1)

Yt | {X1:t = x1:t, Y1:t−1 = y1:t−1} ∼ gt(· | xt),   (2.2)

where we use the notation that x1:t = (x1, . . . , xt), and similarly for y1:t. The state Xt and observations Yt may both be multidimensional, and the state and observation densities ft and gt can be essentially arbitrary.
We assume for simplicity that the observations are observed at times 1, 2, . . ., although this is really just shorthand for observation times t1, t2, . . .. We will usually assume the state and observation densities are constant over time and remove t from the notation. We will be working with a Bayesian state space model which is completed through specifying a prior distribution π(x0) for X0. Alternatively, some authors specify the prior for X1. A graph representing the model is shown in Figure 2.1.
The Markov form of the state (2.1) means that if Xt is known at time t, neither the history of the state nor the sequence of currently available observations provide any additional information about the future of the series. Similarly, no additional information about Yt is obtainable if Xt is known. Some sequential Monte Carlo algorithms introduced in later sections work with more general models than this but these are not considered here.

[Figure 2.1: General form of the state space model showing the conditional independences that exist between the observations Yt and the hidden states Xt.]
2.1.2 Kalman Filter
When the observations are arriving sequentially we are often interested in the current value of the state Xt given all the available data. For this filtering prob- lem, interest lies in estimating the posterior distribution p(xt|y1:t). This can, in principle, be calculated recursively using
p(xt | y1:t) ∝ g(yt | xt) p(xt | y1:t−1)
            = g(yt | xt) ∫ f(xt | xt−1) p(xt−1 | y1:t−1) dxt−1.   (2.3)
Thus the filtering density is obtained, up to proportionality, by multiplying the likelihood g(yt|xt) by the one-step prediction density p(xt|y1:t−1). However, for most models, solving (2.3) analytically is impossible.
An important exception to this is the Kalman Filter of Kalman (1960). Closed-form expressions for the filtering density exist if both the state density f(xt|xt−1) and the observation density g(yt|xt) are linear-Gaussian and the prior is also Gaussian. Put simply, the state space model must have the following form:
Xt+1 | {X1:t = x1:t, Y1:t = y1:t} ∼ N(F xt, Q),

Yt | {X1:t = x1:t, Y1:t−1 = y1:t−1} ∼ N(G xt, R),   (2.4)

X0 ∼ N(µ0, Σ0),

where both the dimension df of the state vector Xt and the dimension dg of the observation vector Yt are arbitrary. The matrices F (df × df), Q (df × df), G (dg × df), R (dg × dg), Σ0 (df × df) and vector µ0 (length df) are all arbitrary except that, as covariance matrices, Q, R and Σ0 must be positive semi-definite. When these conditions are met, the filter densities are all Gaussian with means and covariance matrices that can be calculated using only matrix operations.
The Kalman Filter equations for recursively calculating p(xt | y1:t) = N(µt, Σt) are given in Algorithm 2.1.

Algorithm 2.1: Kalman Filter.
For t = 1, 2, . . ., assume the filter at time t − 1 is N(µt−1, Σt−1). Then

1. Predict: Set µt|t−1 = F µt−1 and Σt|t−1 = F Σt−1 F′ + Q.

2. Update: Set zt = yt − G µt|t−1, St = G Σt|t−1 G′ + R and Kt = Σt|t−1 G′ St⁻¹, and then the filter at time t is N(µt, Σt) with

   µt = µt|t−1 + Kt zt   and   Σt = (I − Kt G) Σt|t−1.
given in Algorithm 2.1. The first step of each iteration produces a vector µt|t−1 and matrix Σt|t−1 which are the mean and variance of the prediction density p(xt|y1:t−1).
If the observation yt is missing, this density is also the filter density since it uses all the information available at time t. It is therefore very easy to account for missing data with the Kalman Filter. It is also easy to make predictions of future states by repeating the prediction step k more times to give p(xt+k|y1:t).
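To make the recursion concrete, the prediction and update steps of Algorithm 2.1 can be sketched in a few lines. This is an illustrative implementation only, not the code used in the thesis; the handling of missing observations follows the discussion above.

```python
import numpy as np

def kalman_filter(ys, F, Q, G, R, mu0, Sigma0):
    """One pass of the Kalman Filter (Algorithm 2.1).

    ys is a sequence of observation vectors; None marks a missing
    observation, in which case the prediction density is used as the
    filter density for that time step."""
    mu, Sigma = mu0, Sigma0
    I = np.eye(len(mu0))
    means, covs = [], []
    for y in ys:
        # Predict: p(x_t | y_{1:t-1}) = N(mu_pred, Sigma_pred)
        mu_pred = F @ mu
        Sigma_pred = F @ Sigma @ F.T + Q
        if y is None:
            mu, Sigma = mu_pred, Sigma_pred
        else:
            # Update with the new observation y_t
            z = y - G @ mu_pred                      # innovation z_t
            S = G @ Sigma_pred @ G.T + R             # innovation covariance S_t
            K = Sigma_pred @ G.T @ np.linalg.inv(S)  # Kalman gain K_t
            mu = mu_pred + K @ z
            Sigma = (I - K @ G) @ Sigma_pred
        means.append(mu)
        covs.append(Sigma)
    return means, covs
```

As a hand-checkable case, with scalar F = G = Q = R = 1 and a N(0, 1) prior, a single observation y1 = 3 gives prediction variance 2, gain 2/3, and hence filter mean 2 and variance 2/3.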
Extensions
The biggest disadvantage of the Kalman Filter is its reliance on the linear-Gaussian forms of the state and observation densities. Many methods have been proposed to extend the Kalman Filter to non-linear and non-Gaussian cases.
The Extended Kalman Filter (EKF) is a commonly used alternative to the Kalman Filter which allows the state and observation equations to be non-linear while still assuming Gaussian distributions throughout. It works by linearising the model with first-order Taylor approximations so that the standard Kalman Filter equations can be used. If the model equations are highly non-linear the approximations made will lead to error which can cause the filter to diverge. See Jazwinski (1973) and Anderson and Moore (1979) for details.
Other extensions include the Gaussian sum filter of Alspach and Sorenson (1972) which allows non-Gaussian distributions by approximating them by Gaussian mixtures. The Unscented Kalman Filter (UKF) of Julier et al. (1995) aims to improve on the EKF when the state and observation equations are highly non-linear. Finally, the Ensemble Kalman Filter (EnKF) of Evensen (1994) propagates an ensemble of state vectors to avoid calculating the covariance matrix. This is particularly useful when the dimension of Xt is large, in which case the Kalman Filter equations can be slow.
While these extensions all aim to modify the Kalman Filter to allow non-linearity or non-Gaussianity, they all do so by making approximations which only hold true when the model is close to linear-Gaussian. They can therefore fail for models that are far removed from linear-Gaussian, giving poor estimates of the state as well as its covariance.
2.1.3 Basic particle filters
We would like to extend the Kalman Filter to arbitrary f and g without making linear-Gaussian approximations. Since arbitrary models no longer give Gaussian filtering densities, a more thorough approach is achieved by targeting the whole pdf rather than just the mean and covariances. Since the calculation of these densities is intractable in general, we resort to Monte Carlo methods. These have the potential to approximate the target densities with errors that can be made negligible by increasing the Monte Carlo sample size.
For general Bayesian models, Markov Chain Monte Carlo (MCMC) is widely used to draw approximate samples from complex distributions. Since the target density is typically required in closed form (up to proportionality), standard MCMC methods cannot be used to sample from p(xt|y1:t), whose form is unknown. However, MCMC methods may be applied to sample draws from the joint smoothing distribution

p(x0:t | y1:t) ∝ π(x0) ∏_{s=1}^{t} f(xs | xs−1) g(ys | xs)   (2.5)

which then marginally gives the filter distribution.
While in principle this provides a solution to extending the Kalman Filter to arbitrary densities, it is often inappropriate, especially when the observations are arriving sequentially. In this case we will typically wish to estimate p(xt|y1:t) as the data arrive but (2.5) provides no simple way to recursively update draws using MCMC alone. Berzuini et al. (1997) and others show how MCMC draws may be updated using alternative Monte Carlo methods such as importance sampling to approximate the filter densities for a few time steps. However, these methods often deteriorate with time and it becomes necessary to rejuvenate the sample by rerunning MCMC from scratch. To do this, the whole path X0:t must be sampled and so the complexity of this step increases with t making MCMC impractical for sequential inference.
Sampling Importance Re-sampling filter
Sequential Monte Carlo algorithms, known generically as particle filters, have been proposed to overcome the restrictions imposed by the Kalman Filter. While Handschin and Mayne (1969) and Handschin (1970) were the first to use Monte Carlo methods for non-linear filtering, their methods only estimate the mean and covariance of the filtering density p(xt|y1:t). Gordon et al. (1993) were the first to propose the following method which allows the state density f(xt|xt−1) and the likelihood g(yt|xt) to be non-linear and non-Gaussian. Rather than approximating the filter distributions as Gaussian they use a swarm of possible draws to represent the intractable densities. These particles are then updated sequentially as the new observations arrive. The resulting algorithm, which was independently proposed by Kitagawa (1996), is known as the Bayesian bootstrap or the Sampling Importance Re-sampling (SIR) filter.
The basic idea is to represent all densities involving the state Xt by the discrete distribution made of Monte Carlo samples {xt^(i)}, i = 1, . . . , N:

p(xt) ≈ (1/N) ∑_{i=1}^{N} δ(xt − xt^(i)),

where δ(·) is the Dirac delta function. Integrals made with respect to this density are then approximated by sample means of the form

Ep(h(Xt)) = ∫ h(xt) p(xt) dxt ≈ (1/N) ∑_{i=1}^{N} h(xt^(i)).   (2.6)
This means that properties of the distribution such as the expected value are approximated by that of the sample. It can be shown using the weak law of large numbers that these approximations become exact as N → ∞.
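As a quick illustration of (2.6) and this convergence, the snippet below (illustrative only, not thesis code) approximates E(h(X)) for h(x) = x² and X ∼ N(0, 1), whose true value is 1, with samples of increasing size N.

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate E(X^2) = 1 for X ~ N(0, 1) by the sample mean in (2.6);
# the Monte Carlo error shrinks as the sample size N grows.
for N in (100, 10_000, 1_000_000):
    x = rng.standard_normal(N)
    print(N, np.mean(x**2))
```

The standard error of the estimate decays at the usual Monte Carlo rate of O(N^{-1/2}).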
The SIR filter has three stages for each time step t. If we assume we enter step t with a sample {xt−1^(i)} approximating p(xt−1|y1:t−1), we first create a new sample {x̃t^(i)} which approximates the one-step predictive distribution p(xt|y1:t−1). This is achieved by sampling a new particle x̃t^(i) from the state density f(xt|xt−1^(i)) once for each i. If the observation yt is missing, these particles represent the filter distribution for this time step and we can move to step t + 1.

If, however, the new observation yt is available, we next use it to weight each of our new particles with the likelihood g(yt|x̃t^(i)). This can be shown to be the appropriate importance weight for approximating p(xt|y1:t) by a sample from p(xt|y1:t−1). We finally use these weights as probabilities with which to re-sample the predictive particles. The re-sampled draws {xt^(i)} then represent a sample approximating our target distribution p(xt|y1:t).
The algorithm is summarised in Algorithm 2.2. Its computational complexity is O(N) where N is the number of particles. The easiest way to initialise the method is to begin with a sample from the prior one step before the first observation as this can be thought of as a filter distribution too. Alternatively, we can begin with a sample from p(x1) if this is possible and go straight to the weight stage of the algorithm.

Algorithm 2.2: Sampling Importance Re-sampling filter.

1. Initialisation: Sample {x0^(i)} from the prior π(x0).

2. For t = 1, 2, . . .

   (a) Predict: Sample x̃t^(i) ∼ f(· | xt−1^(i)) once for each i.

   (b) Weight: Assign each particle x̃t^(i) the weight

       wt^(i) ∝ g(yt | x̃t^(i))

       and normalise the weights to sum to 1.

   (c) Re-sample: Sample {xt^(i)} by drawing N times from the discrete distribution with support {x̃t^(i)} and probability masses {wt^(i)}.
Unlike the Kalman Filter, the SIR filter has very few requirements to satisfy for it to be applied. It requires only that
1. the prior π(x0) (or p(x1)) can be sampled from,
2. the state distribution f(xt|xt−1) can be sampled from,
3. the form of the likelihood g(yt|xt) is known up to a normalising constant.
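Because only these three operations are needed, Algorithm 2.2 can be written generically, parameterised by user-supplied samplers and a log-likelihood known only up to an additive constant. The following is an illustrative sketch, not the implementation used in the thesis.

```python
import numpy as np

def sir_filter(ys, sample_prior, sample_state, log_lik, N, rng):
    """Sampling Importance Re-sampling filter (Algorithm 2.2).

    sample_prior(N) draws N particles from pi(x_0); sample_state(x)
    propagates each particle through f(. | x); log_lik(y, x) returns
    log g(y | x), needed only up to an additive constant."""
    x = sample_prior(N)
    filtered = []
    for y in ys:
        x_pred = sample_state(x)                  # (a) predict
        logw = log_lik(y, x_pred)                 # (b) weight ...
        w = np.exp(logw - logw.max())             # ... stably in log space
        w /= w.sum()
        x = x_pred[rng.choice(N, size=N, p=w)]    # (c) re-sample
        filtered.append(x)
    return filtered
```

For example, a Gaussian random-walk state with Gaussian observations (a toy model, not one from the thesis) can be filtered with `sample_state=lambda x: x + rng.normal(0, 0.5, x.shape)` and `log_lik=lambda y, x: -0.5 * ((y - x) / 0.5) ** 2`.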
While the algorithm is simple in design, it can perform poorly over time as Gordon et al. (1993) acknowledge. This is especially true when there is a big difference between the significant regions of the state space in p(xt|y1:t−1) and g(yt|xt). When this is the case the weights wt^(i) will be very uneven and few of the predictive particles x̃t^(i) will be re-sampled, effectively wasting the others and reducing the effective sample size.
Weighted particles
While the SIR filter uses re-sampling to create a set of unweighted particles, many sequential Monte Carlo algorithms which output weighted samples have been proposed. Each Monte Carlo sample xt^(i) is given a weight wt^(i) that becomes the probability mass placed on the sample in the discrete approximation of the density. Integrals made with respect to this density are then approximated analogously to (2.6) by weighted means of the form

Ep(h(Xt)) = ∫ h(xt) p(xt) dxt ≈ ∑_{i=1}^{N} h(xt^(i)) wt^(i).
One early example of this is the sequential imputation method of Kong et al. (1994). When applied to our state space model¹, their method amounts to sequentially sampling x_t^{(i)} ∼ p(x_t|x_{t-1}^{(i)}, y_t) and weighting with w_t^{(i)} ∝ p(y_t|x_{t-1}^{(i)}) w_{t-1}^{(i)}, where the weights are normalised to sum to 1. Thus, rather than re-sampling each time step, the weights are incrementally updated using the previous step's weighted particles.

For sequential imputation to be applied, however, the densities p(x_t|x_{t-1}, y_t) and p(y_t|x_{t-1}) must be known, which is often not the case.
More generically, the Sequential Importance Sampling (SIS) method of Doucet (1998) and Liu and Chen (1998) allows an arbitrary sampling distribution to be chosen and then accounted for by the importance weight. If particles are propagated through the state equation, SIS differs from the SIR filter only in the lack of re-sampling.
Re-sampling has its advantages and disadvantages. By incrementally updating the weights rather than re-sampling, the weights become increasingly uneven; this ultimately leads to all the mass being placed upon a single particle, wasting the others. Re-sampling the particles using the current weights as probabilities produces multiple copies of useful particles while losing ones with little or no weight. This allows future particles to be sampled from the useful ones, giving a greater effective sample overall. However, since re-sampling is a random process, it introduces noise, so that estimates from re-sampled particles are initially worse than before re-sampling.

¹Sequential imputation is one example of a sequential Monte Carlo algorithm that can be applied to a wider class of models than we consider here.
Liu and Chen (1995) looked at the problem of deciding when to re-sample by using the effective sample size defined in Kong et al. (1994) as

ESS(w_t) := ( Σ_{i=1}^N (w_t^{(i)})² )^{-1}.  (2.7)
As the name suggests, this gives a measure of the operational strength of the sample. It takes its maximum value of N when the weights are all even, and its minimum value of 1 when all the mass is placed on a single point. We can therefore think of a weighted sample as roughly comparable with an unweighted one of sample size ESS(w_t). We note, however, that the ESS is only meaningful for independent samples and should therefore only be applied to importance weights before re-sampling; after re-sampling we have ESS(w_t) = N, whereas the post re-sampling estimates are initially worse due to the extra Monte Carlo error induced.
Liu and Chen (1995) therefore suggest calculating ESS(wt) every time step from the incremental weights and re-sampling if it falls below a predetermined threshold. However, while the threshold should clearly lie between 1 and N, it is unclear what exact value it should take for a given model.
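The ESS and the associated re-sampling rule follow directly from (2.7); in the sketch below the helper name `should_resample` and the threshold value passed to it are illustrative choices:

```python
# Effective sample size (2.7): a weighted sample behaves roughly like an
# unweighted sample of size ESS(w).
def ess(weights):
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)

def should_resample(weights, threshold):
    # Re-sample when the ESS of the incremental weights drops below a threshold
    # lying somewhere between 1 and N; the exact value is a modelling choice.
    return ess(weights) < threshold

even = [0.25, 0.25, 0.25, 0.25]       # all even: ESS = N = 4
degenerate = [1.0, 0.0, 0.0, 0.0]     # all mass on one point: ESS = 1
uneven = [0.7, 0.1, 0.1, 0.1]         # ESS strictly between 1 and N
```

For example, `ess(uneven)` is about 1.92, so with a threshold of N/2 = 2 the rule would trigger a re-sample.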
2.1.4 Auxiliary particle filter
Throughout this thesis we will work with the Auxiliary SIR (ASIR) filter of Pitt and Shephard (1999a). If we assume that at time t − 1 we have weighted particles {(x_{t-1}^{(i)}, w_{t-1}^{(i)})}_{i=1}^N approximating p(x_{t-1}|y_{1:t-1}), we can use the filter recursion (2.3) to write our target at time t as

p(x_t|y_{1:t}) ≈ c Σ_{i=1}^N g(y_t|x_t) f(x_t|x_{t-1}^{(i)}) w_{t-1}^{(i)},  (2.8)

where c is a normalising constant. In this approach we aim to approximate each summand

c g(y_t|x_t) f(x_t|x_{t-1}^{(i)}) w_{t-1}^{(i)}  (2.9)

by

q(x_t|x_{t-1}^{(i)}, y_t) β_t^{(i)},

where q(·|x_{t-1}^{(i)}, y_t) is a distribution we can sample from and {β_t^{(i)}}_{i=1}^N are normalised weights which sum to 1. We then use a combination of re-sampling and importance sampling to generate a weighted sample approximating (2.8).
The algorithm is shown in Algorithm 2.3. A key feature of the method is the augmentation of the state space by the auxiliary variable j_i. This allows the final weights w_t^{(i)} to be even if q and the β_t^{(i)} are chosen so that (2.9) is well approximated. Since the auxiliary variable is nothing but a label selecting a particular particle x_{t-1}^{(i)}, sampling the j_i with the initial weights {β_t^{(i)}} amounts to re-sampling the particles before they are propagated. By re-sampling first we propagate useful particles multiple times while leaving behind particles which would lead to small weights. This gives an evenly weighted sample of unique particles, rather than the duplicates obtained by re-sampling at the end.

Algorithm 2.3: Auxiliary particle filter.

1. Initialisation: Sample {x_0^{(i)}} from the prior π(x_0) and set w_0^{(i)} = 1/N for all i.

2. For t = 1, 2, ...

   (a) Optionally re-sample: Calculate the re-sampling weights {β_t^{(i)}} and compare ESS(β_t) with a predetermined threshold uN:

       i. If ESS(β_t) < uN, re-sample by using the {β_t^{(i)}} as probabilities to sample N indices j_1, ..., j_N from {1, ..., N}.

       ii. If ESS(β_t) ≥ uN, do not re-sample: set β_t^{(i)} = 1 and j_i = i for all i.

   (b) Propagate: Sample each new particle x_t^{(i)} independently from q(·|x_{t-1}^{(j_i)}, y_t).

   (c) Re-weight: Assign each particle x_t^{(i)} the importance weight

       w_t^{(i)} ∝ g(y_t|x_t^{(i)}) f(x_t^{(i)}|x_{t-1}^{(j_i)}) w_{t-1}^{(j_i)} / ( q(x_t^{(i)}|x_{t-1}^{(j_i)}, y_t) β_t^{(j_i)} )

       and normalise them to sum to 1.
As before, the re-sampling step is optional and can be omitted by setting j_i = i and removing β_t^{(i)} from the weight (by setting β_t^{(i)} = 1), thus propagating each particle x_{t-1}^{(i)} once. This eliminates the extra noise from re-sampling but gives uneven weights. In this form the algorithm is essentially the SIS algorithm applied to our state space model. As with SIS, the effective sample size may be used to decide when to re-sample but, since the initial weights {β_t^{(i)}} are used to re-sample, the decision should be based upon ESS(β_t) rather than ESS(w_t).
To obtain evenly weighted particles after re-sampling, (2.9) must be well approximated. Optimally this is achieved by setting q(x_t|x_{t-1}, y_t) = p(x_t|x_{t-1}, y_t) and β_t^{(i)} ∝ p(y_t|x_{t-1}^{(i)}) w_{t-1}^{(i)}, as in sequential imputation, in which case we say the filter is adapted. However, these densities are often intractable, although good approximations of them can give almost even weights.
Pitt and Shephard (1999a) offer some advice on the selection of q and β_t^{(i)}, including Taylor expanding log g(y_t|x_t) to second order, so long as it is concave in x_t, to produce a Gaussian proposal density when the state is Gaussian. If the log-likelihood is not concave, a first-order Taylor expansion may be of use. The re-sampling weights β_t^{(i)} can then be approximated by p(y_t|x̂_t^{(i)}) w_{t-1}^{(i)}, where x̂_t^{(i)} is some likely value of f(x_t|x_{t-1}^{(i)}). The simplest choice of proposal is q(x_t|x_{t-1}^{(i)}, y_t) = f(x_t|x_{t-1}^{(i)}) and β_t^{(i)} = w_{t-1}^{(i)}, which gives the SIR filter.
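For a scalar linear-Gaussian model the adapted choices are available in closed form, so a single ASIR step can be sketched exactly; the model constants, particle count and observation below are illustrative, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# One adapted ASIR step (Algorithm 2.3) for an illustrative model:
#   x_t = a x_{t-1} + N(0, sx^2),   y_t = x_t + N(0, sy^2).
# Here q(x_t|x_{t-1}, y_t) = p(x_t|x_{t-1}, y_t) and beta ∝ p(y_t|x_{t-1}) w_{t-1}
# are both Gaussian and tractable.
a, sx, sy = 0.9, 1.0, 0.5
N = 500
x_prev = rng.normal(size=N)              # particles at time t-1
w_prev = np.full(N, 1.0 / N)
y_t = 1.3                                # an arbitrary illustrative observation

# Re-sampling weights beta ∝ p(y_t|x_{t-1}) w_{t-1} (the adapted choice)
mean_pred = a * x_prev
var_pred = sx**2 + sy**2
beta = w_prev * np.exp(-0.5 * (y_t - mean_pred) ** 2 / var_pred)
beta /= beta.sum()

# Re-sample the auxiliary indices, then propagate from p(x_t|x_{t-1}, y_t)
j = rng.choice(N, size=N, p=beta)
post_var = 1.0 / (1.0 / sx**2 + 1.0 / sy**2)
post_mean = post_var * (a * x_prev[j] / sx**2 + y_t / sy**2)
x_new = post_mean + np.sqrt(post_var) * rng.normal(size=N)

# With the adapted q and beta, the importance weights are exactly even
w_new = np.full(N, 1.0 / N)
```

The re-sampled ancestors are pulled towards values consistent with y_t, and the propagated particles land between the predictive mean and the observation, as the posterior algebra dictates.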
2.1.5 Enhancements
Numerous modifications and enhancements have been proposed for the particle filter. We focus here on those that will be of most use to us in this thesis.
Improved initialisation
The standard practice in Bayesian statistics of representing prior uncertainty by prior densities with large variances can cause problems with particle filters. If the algorithm is initialised by sampling from such a diffuse prior, it is likely that most of the particles are so far away from the target distribution that they are given negligible weights or are lost by re-sampling. With very uncertain priors this often leaves just one particle with any mass after the first time step.
This problem may be overcome by sampling the first time step p(x1|y1) ∝ p(x1) g(y1|x1) separately. This can be done with importance sampling but other authors have suggested MCMC. If the first observation does not provide enough information to restrict the distribution of all components of the state, it may be necessary to sample the first few time steps simultaneously to initialise the algorithm.
Stratified re-sampling
Re-sampling at every time step is not the best strategy because the re-sampling process introduces extra variation. Liu and Chen (1995) showed that particles need not be sampled independently (termed multinomial sampling, as the numbers of re-sampled particles follow a multinomial distribution) and that producing a more stratified sample reduces the additional variance. In particular, they propose residual sampling of the particles {x^{(i)}} with weights {β^{(i)}}, which involves deterministically picking ⌊Nβ^{(i)}⌋ copies of x^{(i)} (where ⌊y⌋ is the integer part of y) and sampling the remaining r particles independently from {x^{(i)}} using probabilities γ^{(i)} := (Nβ^{(i)} − ⌊Nβ^{(i)}⌋)/r.
Carpenter et al. (1999) show that the re-sampling noise is minimised by producing a stratified sample of the indices and give an O(N) algorithm to achieve this. Bolic et al. (2004) review the complexity of re-sampling algorithms and propose an improved algorithm for stratified sampling.
Using stratified sampling in a particle filter makes re-sampling more attractive by reducing the additional variation accrued whilst increasing the number of useful particles to propagate. This means that we can afford to re-sample more often by using a larger ESS threshold.
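One common O(N) stratified scheme, often called systematic re-sampling, can be sketched as follows. It is in the spirit of the algorithms discussed above rather than a transcription of any one paper: a single uniform draw offsets N evenly spaced points, which are then passed through the cumulative weights in one sweep.

```python
import random

def systematic_resample(weights, u=None):
    """O(N) systematic re-sampling: N evenly spaced points, offset by one
    uniform draw u, are swept through the cumulative weights.  A particle
    with weight close to m/N receives roughly m copies, deterministically."""
    n = len(weights)
    if u is None:
        u = random.random()
    cumulative, total = [], 0.0
    for w in weights:
        total += w
        cumulative.append(total)
    indices, i = [], 0
    for k in range(n):
        p = (u + k) / n                # stratified position in (0, 1)
        while cumulative[i] < p:
            i += 1
        indices.append(i)
    return indices

# Two particles sharing all the weight each get exactly two copies
idx = systematic_resample([0.5, 0.5, 0.0, 0.0], u=0.3)
```

Compare this with multinomial re-sampling, where the copy counts are random and the zero-weight particles could in principle distort the counts of the surviving ones.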
Rao-Blackwellisation
A further enhancement which can often be used to improve the filter is Rao-Blackwellisation (also known simply as marginalisation). The idea is that for some models it is possible to integrate out part of the state analytically. This enables the integrated part of the state to be represented by a distribution rather than a specific value which, as the Rao-Blackwell theorem shows, gives less variable estimates. See Casella and Robert (2001) for an introduction to Rao-Blackwellisation in general, and Liu and Chen (1998) and Doucet et al. (2000) for applications to particle filtering.
2.1.6 Parameter estimation
We have so far focused on estimating the dynamic state vector X_t given a series of observations y_{1:t}, but in practical applications the model often contains additional static parameters θ. These may appear in the state transition density f, the observation density g, or even the prior π. When observations are arriving sequentially, we may want incremental estimates of θ, or even of the joint distribution p(x_t, θ|y_{1:t}). Alternatively, we may simply require a final estimate of the parameters to use for off-line inference.
Augmenting the state vector
The simplest way to estimate static parameters is to augment the state vector with θ and proceed with any particle filter. This naturally provides sequential parameter estimates through the filter distribution p(x_t, θ|y_{1:t}), but has one major flaw. Since θ remains constant from one time step to the next, the parameter space is only explored in the initialisation of the algorithm. Re-sampling therefore permanently reduces the number of unique θ values, ultimately leaving a single particle with which to represent p(θ|y_{1:t}).
Many authors have suggested methods to rejuvenate re-sampled particles, most of which can also be applied when there are no static parameters. Gordon et al. (1993) and Liu and West (2001) add noise to the particles which effectively replaces the static parameters with slowly varying ones. This can lead to errors when too much noise is added while with too little noise the particles still struggle to explore the sample space, and so degrade on re-sampling.
Other methods include those of Fearnhead (1998) and Gilks and Berzuini (2001), who introduce MCMC moves to diversify the sample after re-sampling. Since the MCMC moves require the closed-form joint density p(x_{0:t}|y_{1:t}), the complexity of the moves increases with t. This is overcome by Fearnhead (2002) with the use of sufficient statistics when they exist.
Other methods
An alternative method for estimating θ, due to Storvik (2002), is based on writing the joint target at time t as

p(x_{1:t}, θ|y_{1:t}) ∝ p(x_{1:t-1}|y_{1:t-1}) p(θ|x_{1:t-1}, y_{1:t-1}) f(x_t|x_{t-1}, θ) g(y_t|x_t, θ).

This justifies sampling each θ^{(i)} afresh from p(θ|x_{1:t-1}^{(i)}, y_{1:t-1}) and updating the particles {x_{t-1}^{(i)}} with a particle filter conditional on {θ^{(i)}}. For the complexity of the method to remain constant over time, there must be a sufficient statistic T of (x_{1:t-1}, y_{1:t-1}) for θ that can be easily updated recursively.
When we only require an off-line estimate of the parameters given a block of data y_{1:T}, we may consider maximising the overall model likelihood L(θ) := p(y_{1:T}|θ). Kitagawa (1996) gave the following estimate of the model likelihood:

L(θ) = p(y_{1:T}|θ) ≈ Π_{t=1}^T Σ_{i=1}^N g(y_t|x_{t|t-1}^{(i)}, θ) w_{t-1}^{(i)},  (2.10)

where x_{t|t-1}^{(i)} is a predictive particle sampled from the state transition f(x_t|x_{t-1}^{(i)}, θ) and {(x_t^{(i)}, w_t^{(i)})} come from a particle filter run with parameter θ. In principle, L(θ) may be maximised over a grid of θ values, although this is only feasible when θ has few dimensions, as each evaluation requires a full run of the particle filter. See Hürzeler and Künsch (2001) and Pitt (2002) for alternative likelihood-based methods.
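The likelihood estimate (2.10) can be sketched by wrapping a basic SIR filter; the AR(1) model below, with unknown coefficient θ = a, and the grid of candidate values are illustrative choices. Re-sampling at every step makes the previous weights even, which simplifies the inner sum:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_likelihood(a, y, N=2000, seed=3):
    """Log of the estimate (2.10) of p(y_{1:T}|a) for the illustrative model
    x_t = a x_{t-1} + N(0, 1), y_t = x_t + N(0, 0.5^2), via one SIR filter run."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(size=N)
    w = np.full(N, 1.0 / N)
    logL = 0.0
    for obs in y:
        particles = a * particles + rng.normal(size=N)       # predictive particles
        dens = np.exp(-0.5 * ((obs - particles) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
        logL += np.log(np.sum(dens * w))                     # sum_i g(y_t|x^(i)) w^(i)
        w = dens * w
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)                     # re-sample each step so the
        particles, w = particles[idx], np.full(N, 1.0 / N)   # previous weights stay even
    return float(logL)

# Simulate data under a = 0.9 and compare likelihoods on a small grid
a_true, x_val, y = 0.9, 0.0, []
for _ in range(100):
    x_val = a_true * x_val + rng.normal()
    y.append(x_val + 0.5 * rng.normal())

grid = [0.5, 0.7, 0.9]
lls = [log_likelihood(a, y) for a in grid]
```

A badly mis-specified coefficient gives a clearly lower estimated log-likelihood than the true value, illustrating why a coarse grid search can work when θ is low-dimensional.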
Poyiadjis et al. (2005) provide a particle method for approximating the derivative of the filter density with respect to the unknown parameters θ. This can then be used to maximise the likelihood via gradient-based methods. Briers et al. (2004) and Wills et al. (2008) alternatively propose to maximise the likelihood using an Expectation-Maximisation (EM) algorithm based upon the repeated use of a particle smoother (see Section 2.2). To avoid this step, Andrieu et al. (2005) propose an EM-type algorithm that makes use of the invariant state distribution, if it exists, to maximise a pseudo-likelihood to give a parameter estimate.

2.2 Particle Smoothing
This section introduces the smoothing problem and reviews current extensions of particle filters to smoothing.
2.2.1 Kalman Smoothing
While the filtering problem focuses on estimating the current state given a series of observations, the corresponding smoothing problem is interested in updating past values of the state given a block of data. Specifically, we wish to estimate the smoothing distribution p(xt|y1:T ) for t = 1,...,T . In some applications the joint smoothing distribution p(x1:T |y1:T ) is of interest but we assume for now that we wish only to review past states individually.
We motivated the filtering problem with the presumption that data were arriving sequentially and inference was required on-line. To contrast with this, we now assume that a block of data y1:T has been observed and we need only the final state estimates given this data. We can in principle use MCMC to draw samples from the joint smoothing distribution using (2.5) but these often perform poorly when the state is highly correlated. Also, since the whole path X0:T must be sampled jointly, the dimension of the MCMC state will be very large if we have a long time series.
As an alternative to MCMC, we focus on efficient algorithms that are sequential over time as these will allow us to consider inference for long series with an overall complexity of O(T ) where T is the number of observations. We also focus on methods that make use of the sequential algorithms that exist for filtering.
A recursive formula for calculating the marginal smoothing density is given by

p(x_t|y_{1:T}) = p(x_t|y_{1:t}) ∫ [ f(x_{t+1}|x_t) / p(x_{t+1}|y_{1:t}) ] p(x_{t+1}|y_{1:T}) dx_{t+1},  for t < T,  (2.11)

where

p(x_{t+1}|y_{1:t}) = ∫ f(x_{t+1}|x_t) p(x_t|y_{1:t}) dx_t.
This allows the target density p(xt|y1:T ) to be calculated backwards in time with prior knowledge of the filter densities p(xt|y1:t). The recursion begins with p(xT |y1:T ) which is of course both a filter and smoother density. Thus the smoothing densities may, in principle, be calculated by filtering forwards in time and then smoothing backwards with (2.11).
As was the case for the filter recursion (2.3), the formula above will be intractable for general state space models. The linear-Gaussian model of (2.4) is again an important exception, with the Kalman Smoother providing algebraic formulae for the means and covariance matrices of the Gaussian smoothing densities (see Anderson and Moore (1979) for details).
2.2.2 Smoothing while filtering
In its simplest form, particle smoothing can be achieved from a simple extension to the particle filter as shown by Kitagawa (1996), and we call the resulting algorithm the Filter-Smoother. As with the filter distribution p(xt|y1:t) in (2.3), we have a recursive solution for the joint smoothing distribution:
p(x1:t|y1:t) ∝ g(yt|xt) f(xt|xt−1) p(x1:t−1|y1:t−1). (2.12)
By comparing (2.3) and (2.12) it is easy to show that the particle filter steps can be used to update weighted paths {(x_{1:t}^{(i)}, w_t^{(i)})}_{i=1}^N approximating p(x_{1:t}|y_{1:t}). Doing so simply requires keeping track of the inheritance of each newly sampled particle x_t^{(i)}. This means that any filtering algorithm can be used, and the method inherits the O(N) computational complexity of the filter, making large numbers of particles feasible.
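The path-tracking idea can be sketched by carrying an ancestry-indexed path array through a basic SIR filter; the AR(1) model and the placeholder observations below are illustrative choices, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(6)

# Filter-Smoother sketch: run a SIR filter while re-indexing the stored paths
# by each particle's ancestor, giving weighted paths for p(x_{1:t}|y_{1:t}).
T, N = 20, 500
y = rng.normal(size=T)                      # placeholder observations
particles = rng.normal(size=N)
paths = particles[:, None].copy()           # row i holds the path of particle i
for t in range(T):
    particles = 0.9 * particles + rng.normal(size=N)
    w = np.exp(-0.5 * ((y[t] - particles) / 0.5) ** 2)
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)        # re-sampling whole paths, not just x_t
    particles = particles[idx]
    paths = np.concatenate([paths[idx], particles[:, None]], axis=1)

# Path degeneracy: far fewer distinct ancestors survive at early times
distinct_start = len(np.unique(paths[:, 0]))
distinct_end = len(np.unique(paths[:, -1]))
```

Counting the distinct values in the first and last columns makes the degeneracy discussed next directly visible: repeated re-sampling collapses the early part of the paths onto a handful of ancestors.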
While this Filter-Smoother approach can produce an accurate approximation of the filtering distribution p(x_t|y_{1:t}), it gives a poor representation of previous states. To see this, note that whenever we re-sample the paths {x_{1:t-1}^{(i)}} (by re-sampling {x̃_t^{(i)}} in the SIR filter, or by sampling the auxiliary variables {j_i} in the auxiliary filter) we end up with multiple copies of some paths but lose others altogether. The number of distinct particles at any given time therefore decreases monotonically the more times we re-sample. Also, with multiple copies of some particles, their weights are effectively added together on a single point, so that marginally the weights become more uneven as we look back in time.

This can be seen in Figure 2.2, which represents ten smoothed paths x_{1:6}^{(i)} and shows how they re-weight the filter particles. Particles which are lost due to re-sampling receive no weight, and particles with many offspring have large weights. While the filter approximation at time 6 is good, the weights become more uneven as the number of distinct weighted particles decreases going back in time. This is not surprising, since the particles at times t < 6 are drawn to approximate p(x_t|y_{1:t}) and so must be unevenly weighted if they are to represent a different distribution.
Figure 2.2: Diagram showing how the simple Filter-Smoother re-weights the filter particles. The arrows represent the dependencies between the particles at times t and t − 1 due to re-sampling. The size of each particle represents its total weight as a draw from the smoothed distribution.
As a final point we note that re-sampling more infrequently can improve this method of smoothing, although there is a limit to how much this can help. Even with no re-sampling, the approximation to p(x_t|y_{1:T}) will deteriorate as T − t gets large: the particle approximation tends to give non-negligible weight to only a small subset of particles, and eventually only one particle has non-negligible weight.
2.2.3 Forwards-backwards smoothers
The Forward-Backward Smoother of Doucet et al. (2000), as well as the related algorithms of Hürzeler and Künsch (1998) and Tanizaki (2001), is based around the backwards recursion (2.11). The unknown densities can be approximated using filter particles from the current time t and smoother particles from t + 1 to obtain

p(x_t|y_{1:T}) ≈ Σ_{i=1}^N δ(x_t − x_t^{(i)}) w_{t|T}^{(i)},

where

w_{t|T}^{(i)} := Σ_{j=1}^N [ f(x_{t+1|T}^{(j)}|x_t^{(i)}) w_t^{(i)} / Σ_{k=1}^N f(x_{t+1|T}^{(j)}|x_t^{(k)}) w_t^{(k)} ] w_{t+1|T}^{(j)}  (2.13)

and δ(·) is the Dirac delta function.
This approximation can be used to sequentially re-weight the filter particles backwards in time so that they represent the marginal smoothing densities. Since the calculation of each weight is an O(N²) operation, a crude application of the algorithm would be O(N³). However, since the denominator of each summand in (2.13) does not depend on i, each may be calculated once and stored, reducing the overall complexity to O(N²).
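A single backwards re-weighting step of (2.13) can be sketched as follows; the scalar Gaussian transition with coefficient 0.9 and the even input weights are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def transition_density(x_next, x):
    # Illustrative scalar Gaussian transition f(x'|x) = N(x'; 0.9 x, 1)
    return np.exp(-0.5 * (x_next - 0.9 * x) ** 2) / np.sqrt(2 * np.pi)

def backward_reweight(x_t, w_t, x_next, w_next):
    """Given filter particles (x_t, w_t) at time t and smoother particles
    (x_next, w_next) at time t+1, return the smoothing weights of (2.13)."""
    # dens[j, i] = f(x_{t+1|T}^{(j)} | x_t^{(i)})
    dens = transition_density(x_next[:, None], x_t[None, :])
    denom = dens @ w_t               # each denominator computed once and stored
    return w_t * (dens / denom[:, None] * w_next[:, None]).sum(axis=0)

x_t = rng.normal(size=100)
w_t = np.full(100, 0.01)
x_next = rng.normal(size=100)
w_next = np.full(100, 0.01)
w_smooth = backward_reweight(x_t, w_t, x_next, w_next)
```

Because the bracketed ratios in (2.13) sum to one over i for each j, the output weights automatically sum to one, which makes a convenient sanity check.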
While the Forward-Backward Smoother only approximates the marginal smoothing densities, the related algorithms of Hürzeler and Künsch (1998) and Godsill et al. (2004) re-sample paths backwards in time to produce samples from the joint smoothing density.
2.2.4 Two-Filter Smoother
The Two-Filter Smoother of Briers et al. (2004) combines samples from a particle filter with those from a backwards information filter to produce estimates of p(x_t|y_{1:T}).
Backwards information filter
The backwards information filter produces sequential approximations of the likelihood p(y_{t:T}|x_t) backwards through time and is based on the following recursion:

p(y_{t:T}|x_t) = g(y_t|x_t) ∫ f(x_{t+1}|x_t) p(y_{t+1:T}|x_{t+1}) dx_{t+1},  for t < T.  (2.14)
Since p(y_{t:T}|x_t) is not a probability density function in x_t, it may not have a finite integral over x_t, in which case a particle representation will not work. The smoothing algorithm in Kitagawa (1996) implicitly assumes that this is not the case, but Briers et al. (2004) propose the following construction, which always gives a finite measure.
They introduce artificial prior distributions γ_t(x_t) to yield the backwards filter densities

p̃(x_t|y_{t:T}) :∝ γ_t(x_t) p(y_{t:T}|x_t).  (2.15)

The artificial priors are chosen so that γ_t(x_t) is available in closed form. Briers et al. (2004) assume the recursive relationship γ_t(x_t) = ∫ f(x_t|x_{t-1}) γ_{t-1}(x_{t-1}) dx_{t-1} after initial specification of γ_0(x_0) which, if selected to be the prior π(x_0), yields γ_t(x_t) = p(x_t) and p̃(x_t|y_{t:T}) = p(x_t|y_{t:T}) for all time steps t. This, however, restricts the applicability of the method to tractable state models such as the linear-Gaussian and is not necessary; any choice of γ_t(x_t) will yield a valid backwards filter density to propagate. Also, if the likelihood g(y_t|x_t) is integrable, we can instead propagate a particle representation of p(y_{t:T}|x_t) by assuming γ_t(x_t) ≡ 1 throughout the following derivation.
Following on from (2.14), the backwards filter is derived via

p̃(x_t|y_{t:T}) ∝ γ_t(x_t) g(y_t|x_t) ∫ f(x_{t+1}|x_t) [ p̃(x_{t+1}|y_{t+1:T}) / γ_{t+1}(x_{t+1}) ] dx_{t+1}

            ≈ γ_t(x_t) g(y_t|x_t) Σ_{k=1}^N [ f(x̃_{t+1}^{(k)}|x_t) / γ_{t+1}(x̃_{t+1}^{(k)}) ] w̃_{t+1}^{(k)},

where the weighted particles {(x̃_{t+1}^{(k)}, w̃_{t+1}^{(k)})} approximate p̃(x_{t+1}|y_{t+1:T}). This is very similar to the derivation of the forwards filter, and as such many filtering algorithms and enhancements can be modified for this purpose.
For example, an auxiliary backwards filter in the style of Pitt and Shephard (1999a) can be made by finding a distribution q̃(·|y_t, x̃_{t+1}^{(k)}) we can sample from such that

q̃(x_t|y_t, x̃_{t+1}^{(k)}) β̃_t^{(k)} ≈ γ_t(x_t) g(y_t|x_t) f(x̃_{t+1}^{(k)}|x_t) w̃_{t+1}^{(k)} / γ_{t+1}(x̃_{t+1}^{(k)}).

We then proceed analogously to Algorithm 2.3 for t = T, ..., 1 after initialising the algorithm with particles drawn from γ_{T+1}(x_{T+1}). An adapted backwards filter giving even weights w̃_t^{(k)} = 1/N is achieved with q̃(x_t|y_t, x̃_{t+1}^{(k)}) = p̃(x_t|y_t, x̃_{t+1}^{(k)}) and β̃_t^{(k)} ∝ p̃(y_t|x̃_{t+1}^{(k)}) w̃_{t+1}^{(k)}, where we again use p̃ to denote a distribution which uses γ_t(x_t) throughout instead of p(x_t).
Two-Filter Smoother
Having run a forwards particle filter and a backwards information filter, it is possible to combine the two to estimate p(x_t|y_{1:T}). The Two-Filter Smoother is based upon writing the target density as

p(x_t|y_{1:T}) ∝ p(x_t|y_{1:t-1}) · p(y_{t:T}|x_t)

            ∝ ∫ f(x_t|x_{t-1}) p(x_{t-1}|y_{1:t-1}) dx_{t-1} · p̃(x_t|y_{t:T}) / γ_t(x_t).

Therefore filter particles {(x_{t-1}^{(j)}, w_{t-1}^{(j)})} approximating p(x_{t-1}|y_{1:t-1}) and backwards filter particles {(x̃_t^{(k)}, w̃_t^{(k)})} approximating p̃(x_t|y_{t:T}) are used to obtain

p(x_t|y_{1:T}) ≈ Σ_{k=1}^N δ(x_t − x̃_t^{(k)}) w̃_{t|T}^{(k)},

where

w̃_{t|T}^{(k)} :∝ [ w̃_t^{(k)} / γ_t(x̃_t^{(k)}) ] Σ_{j=1}^N f(x̃_t^{(k)}|x_{t-1}^{(j)}) w_{t-1}^{(j)}.  (2.16)

Thus particles from a forwards filter are used to re-weight those from a backwards filter so that they represent the target distribution.

2.3 Univariate Extreme Value Theory
This section gives an overview of univariate extreme value theory focusing on aspects that will be of use in this thesis. For a more general introduction to the theory and analysis of extreme values see Embrechts et al. (1997), Coles (2001) or de Haan and Ferreira (2006) amongst others.
2.3.1 Maxima of IID random variables
Extreme value distributions for maxima
We begin by studying the extremal properties of a sequence of independent and identically distributed (IID) univariate random variables Y1,...,Yn. We present theory for the maxima Mn := max{Y1,...,Yn} but can easily obtain corresponding results for the minima through the identity
min{Y1,...,Yn} = − max{−Y1,..., −Yn}.
If the Y_i have common distribution function F(y) = P{Y_i ≤ y}, we have

P{M_n ≤ y} = P{Y_1 ≤ y, ..., Y_n ≤ y} = F(y)^n,
so that the distribution function of Mn is known if F is known. However, since in practice we rarely know the exact distribution of a sample, we look for asymptotic arguments to motivate a distribution for Mn.
Taking the limit as n → ∞, M_n → y^F, where y^F := sup{y : F(y) < 1} is the upper end point of F. The limit distribution of M_n is therefore degenerate and so of no use for modelling with finite n. However, the same is true for the sample mean Ȳ_n := (Y_1 + ... + Y_n)/n but, as the Central Limit Theorem shows, linear normalisation within the limit can lead to a non-degenerate distribution (in the case of the Gaussian limit law, under the additional constraint Var(Y_i) < ∞). With this in mind, Theorem 2.1, the Extremal Types Theorem (ETT) of Fisher and Tippett (1928), shows which distributions are obtainable by linear normalisation of M_n (see Leadbetter et al. (1983) for details).
Theorem 2.1 (Extremal Types Theorem). Given a sequence {Y_n} of IID random variables, define M_n := max{Y_{1:n}}. If there exist sequences a_n ∈ ℝ and b_n > 0 such that

P{ (M_n − a_n)/b_n ≤ y } → G(y) as n → ∞,

for a non-degenerate distribution G, then G is one of:

Negative-Weibull(µ, σ, α):  G(y) = exp( −[ −(y − µ)/σ ]_+^α ),

Gumbel(µ, σ):  G(y) = exp( −exp( −(y − µ)/σ ) ),

Fréchet(µ, σ, α):  G(y) = exp( −[ (y − µ)/σ ]_+^{−α} ),

for some µ ∈ ℝ, σ > 0 and α > 0, where [y]_+ := max{y, 0}.
Figure 2.3 shows the shape of the density functions for the three extremal distributions. Note that the Negative-Weibull distribution has an upper end point while the Fréchet has a lower end point.
Generalised Extreme Value distribution
Figure 2.3: Probability density functions of the Negative-Weibull(1,1,1) = GEV(0,1,−1) (—), Gumbel(0,1) = GEV(0,1,0) (- - -) and Fréchet(−1,1,1) = GEV(0,1,1) (·−) distributions.

Von Mises (1954) and Jenkinson (1955) identified the Generalised Extreme Value
(GEV) distribution, which unifies the three limit distributions for linearly normalised maxima. Its cumulative distribution function is given by

G(y) = exp( −[ 1 + ξ (y − µ)/σ ]_+^{−1/ξ} ),  (2.17)

where µ ∈ ℝ, σ > 0 and ξ ∈ ℝ. We interpret the ξ = 0 case as the limit as ξ → 0.
The shape parameter ξ is key to relating the original three extremal distributions to the GEV form:

• If ξ < 0, GEV(µ, σ, ξ) = Negative-Weibull(µ − σ/ξ, −σ/ξ, −1/ξ).

• If ξ = 0, GEV(µ, σ, ξ) = Gumbel(µ, σ).

• If ξ > 0, GEV(µ, σ, ξ) = Fréchet(µ − σ/ξ, σ/ξ, 1/ξ).
Basic properties of the GEV distribution are summarised in Table 2.1. We note in particular that the Fréchet case (ξ > 0) has a heavy upper tail, with an infinite mean when ξ ≥ 1.
Another important property is max-stability: a distribution F is said to be max-stable if, for IID random variables Y_1, ..., Y_n ∼ F(·), there exist constants A_n ∈ ℝ and B_n > 0 such that A_n + B_n max{Y_{1:n}} ∼ F(·), or equivalently F^n((y − A_n)/B_n) = F(y) for all y. It can be shown that the GEV satisfies this property for all ξ, and moreover that it is the only class of distributions to do so. This mirrors the convolution property of the Gaussian distribution, which says that the sum of n IID Gaussian random variables is also Gaussian.

parameters:  µ ∈ ℝ, σ > 0, ξ ∈ ℝ
cdf:  G(y) = exp( −[1 + ξ(y − µ)/σ]_+^{−1/ξ} )
pdf:  g(y) = (1/σ) [1 + ξ(y − µ)/σ]_+^{−(1+1/ξ)} G(y)
support:  (−∞, µ − σ/ξ) if ξ < 0;  ℝ if ξ = 0;  (µ − σ/ξ, ∞) if ξ > 0
mean:  µ + (σ/ξ)(Γ(1 − ξ) − 1) if ξ < 1;  ∞ if ξ ≥ 1
variance:  (σ²/ξ²)(Γ(1 − 2ξ) − Γ(1 − ξ)²) if ξ < 1/2;  ∞ if ξ ≥ 1/2

Table 2.1: Properties of the Generalised Extreme Value distribution. Both the mean and the variance depend on the Gamma function Γ(·).
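Max-stability can be checked numerically from the GEV cdf (2.17). The constants A_n and B_n below follow from the algebra of the cdf (for ξ = 0, A_n = σ log n and B_n = 1; for ξ = 1, A_n = n − 1 and B_n = n); the particular values of n, y and the parameters are illustrative:

```python
import math

# GEV cdf from (2.17)
def gev_cdf(y, mu, sigma, xi):
    if xi == 0.0:                        # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-(y - mu) / sigma))
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:                         # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

n, y = 10, 2.0

# Gumbel case (xi = 0): G(y; mu, sigma)^n = G(y; mu + sigma*log(n), sigma)
lhs = gev_cdf(y, 0.0, 1.0, 0.0) ** n
rhs = gev_cdf(y, math.log(n), 1.0, 0.0)

# Frechet-type case (xi = 1): with A_n = n - 1, B_n = n, G(y)^n = G((y - A_n)/B_n)
lhs2 = gev_cdf(y, 0.0, 1.0, 1.0) ** n
rhs2 = gev_cdf((y - (n - 1)) / n, 0.0, 1.0, 1.0)
```

The two sides agree to floating-point precision, confirming that raising the GEV cdf to the n-th power only relocates and rescales it.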
Using the GEV distribution in Theorem 2.1 gives the Unified Extremal Types Theorem (UETT), Theorem 2.2 below. If, for IID random variables Y_1, ..., Y_n ∼ F(·), the normalising sequences that give the GEV limit G exist, we say that F is in the domain of attraction of G and write F ∈ D_ξ.
Theorem 2.2 (Unified Extremal Types Theorem). Given a sequence {Y_n} of IID random variables, define M_n := max{Y_{1:n}}. If there exist sequences a_n ∈ ℝ and b_n > 0 such that

P{ (M_n − a_n)/b_n ≤ y } → G(y) as n → ∞,  (2.18)

for a non-degenerate distribution G, then G is the cumulative distribution function of the Generalised Extreme Value distribution for some parameters µ ∈ ℝ, σ > 0 and ξ ∈ ℝ.
Modelling with the GEV distribution
Theorem 2.2 is used to motivate the GEV model for maxima by assuming the limit (2.18) holds approximately for finite n, so that

P{ (M_n − a_n)/b_n ≤ y } ≈ G(y),

for some constants a_n and b_n. Setting z = a_n + b_n y, this gives

P{M_n ≤ z} ≈ G( (z − a_n)/b_n ),

which is a GEV distribution, since a_n and b_n can be absorbed into the location and scale parameters µ and σ. This justifies modelling the maxima of IID variables as GEV.
In order to produce multiple observations for a statistical analysis, a series of IID observations should be split into blocks and the block maxima modelled as GEV. The choice of block size is important, as a larger block gives fewer observations but better justifies the GEV limit.
Maximum likelihood is commonly used to estimate the GEV parameters from a sample as proposed by Prescott and Walden (1980, 1983). Smith (1985) showed that the regular asymptotic properties of the maximum likelihood estimator such as asymptotic normality only hold when ξ > −1/2. While this potentially causes difficulties with estimation, many authors have noted that ξ ≤ −1/2 rarely occurs in practice. See Coles (2001) for a review of maximum likelihood estimation in extreme value models.
Coles and Powell (1996) review Bayesian estimation of the GEV parameters. No conjugate prior for the GEV distribution exists, so numerical methods such as MCMC are commonly used for inference. One advantage of Bayesian estimation is that it overcomes the regularity constraints on ξ of the maximum likelihood estimates. Other methods of parameter estimation include the moment-based estimators of Hosking et al. (1985) and Dekkers et al. (1989).
Since the GEV model is based upon the assumption that a non-degenerate limit distribution exists, the GEV fit should be checked with methods such as probability-probability (PP) plots and quantile-quantile (QQ) plots. For cases where a non-degenerate limit does not exist, a GEV limit is often achieved after the variables are transformed by a non-linear function such as the logarithm. Applying such a transformation to a dataset may therefore give a better fit.
Extension to r-largest variables
Inference for IID variables with maxima alone is inefficient as most of the data are discarded. Keeping instead the r-largest values in a block gives more data which could give better estimates of the parameters. Using arguments similar to
Theorem 2.2, it can be shown that if the r-largest order statistics Yr ≤ ... ≤ Y1 are linearly normalised towards a non-degenerate joint distribution, that distribution will have a joint density function of
r Y g(yi) g(y , . . . , y ) = G(y ) for y ≤ ... ≤ y , (2.19) 1 r r G(y ) r 1 i=1 i where g(y) and G(y) are the pdf and cdf respectively of a GEV(µ, σ, ξ) random variable. For details see Weissman (1978), Smith (1986) and Tawn (1988b).
This result provides a model for the r-largest values that is consistent with the GEV model for maxima and requires no extra parameters. It therefore allows additional data values to be used for more precise estimates of µ, σ and ξ. A choice must be made on the appropriate value of r before the data are fitted, to balance the efficiency gains of more data against the asymptotic justification of the model. This choice is typically made by fitting a selection of r values and assessing the model fit.
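As an illustration, the joint density (2.19) can be evaluated as a log-likelihood. This is a minimal sketch, not code from the thesis: the function names are ours, and we assume ξ ≠ 0 and that the r retained values are supplied in decreasing order.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """GEV(mu, sigma, xi) cdf for xi != 0; returns 0 or 1 outside the support."""
    t = 1 + xi * (y - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))

def gev_logpdf(y, mu, sigma, xi):
    """Log density of GEV(mu, sigma, xi) for xi != 0."""
    t = 1 + xi * (y - mu) / sigma
    if t <= 0:
        return -math.inf
    return -math.log(sigma) - (1 + 1 / xi) * math.log(t) - t ** (-1 / xi)

def r_largest_loglik(ys, mu, sigma, xi):
    """Log of density (2.19) for the r-largest values ys[0] >= ... >= ys[-1]:

        log g(y_1,...,y_r) = log G(y_r) + sum_i [log g(y_i) - log G(y_i)].
    """
    yr = ys[-1]  # smallest of the r retained values
    ll = math.log(gev_cdf(yr, mu, sigma, xi))
    for y in ys:
        ll += gev_logpdf(y, mu, sigma, xi) - math.log(gev_cdf(y, mu, sigma, xi))
    return ll
```

With r = 1 the expression collapses to the ordinary GEV log density of the maximum, which gives a quick consistency check.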
2.3.2 Point process characterisation
Point process
We begin again by assuming Y_1, …, Y_n are IID random variables with common distribution function F and that F ∈ D_ξ, that is, F is in the domain of attraction of GEV(0, 1, ξ). Define the sequence of point processes on [0, 1] × ℝ by
P_n := { (i/(n+1), (Y_i − a_n)/b_n) : i = 1, …, n },    (2.20)
where an and bn are the same normalising sequences used in Theorem 2.2 to give a GEV(0, 1, ξ) limit. We consider the limiting process as n → ∞.
Clearly, the limiting process is non-degenerate since P{(M_n − a_n)/b_n ≤ y} → G(y) and the r-largest values also have a non-degenerate limit. However, the smallest values are all normalised towards the value
z_l := lim_{n→∞} (y_F − a_n)/b_n,
where yF := inf{y|F (y) > 0} is the lower end point of F . Theorem 2.3, due to Pickands (1971), shows that the limiting process for all large values is a particular non-homogeneous Poisson process.
Theorem 2.3 (Point process limit). Given a sequence {Yn} of IID random vari- ables with common distribution function F ∈ Dξ, define a sequence of point pro- cesses {Pn} by (2.20). Then Pn → P as n → ∞ on the set [0, 1] × (zl, ∞) where P is a non-homogeneous Poisson process with intensity
λ(s, z) = [1 + ξz]_+^{−(1+1/ξ)}.
We again interpret the ξ = 0 case as the limit as ξ → 0, which gives the intensity λ(s, z) = e^{−z}.
The Poisson process limit says a lot about the asymptotic distribution of extreme values. Many probabilities of interest may be written in terms of the counting process N(A) that counts the number of events in a set A ⊂ [0, 1] × (zl, ∞) and has a Poisson distribution with mean given by the integrated intensity
Λ(A) := ∬_A λ(s, z) ds dz.
For example, the GEV limit of the normalised maxima may be derived from
P{(M_n − a_n)/b_n ≤ y} → P{N([0, 1] × (y, ∞)) = 0}
= exp(−Λ([0, 1] × (y, ∞)))
= exp(−[1 + ξy]_+^{−1/ξ}).
The distribution of the r-largest order statistics may be derived in a similar way.
Modelling with the point process limit
As before with the GEV limit, we can use this asymptotic result to motivate a model by assuming it to be true for a finite n, that is Pn = P . Since n is fixed, an and bn again become location and scale parameters µ and σ respectively. We can then transform the current process with points (i/(n + 1), (Yi − µ)/σ) into the
Poisson process {(i/(n + 1),Yi)} on [0, 1] × [u, ∞) with intensity
λ(s, y) = (1/σ) [1 + ξ(y − µ)/σ]_+^{−(1+1/ξ)},
where u > µ + σzl is a high threshold.
This suggests the following model: select a threshold u and assume that all Yi > u are samples from the above Poisson process. A dataset of nu threshold exceedances
y1, . . . , ynu can then be modelled with the likelihood
p(y, n_u | µ, σ, ξ) = p(n_u | µ, σ, ξ) p(y | n_u, µ, σ, ξ)
∝ exp(−Λ([0, 1] × [u, ∞))) ∏_{i=1}^{n_u} λ(i/(n+1), y_i),    (2.21)

where
Λ([0, 1] × [u, ∞)) = ∫_u^∞ ∫_0^1 λ(s, y) ds dy = [1 + ξ(u − µ)/σ]_+^{−1/ξ}.
Note that nu appears as data in the likelihood since the number of exceedances of u is random.
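The likelihood (2.21) is straightforward to evaluate numerically. The sketch below is ours, not from the thesis; it assumes ξ ≠ 0, and since the intensity does not depend on s the time component i/(n+1) drops out of the product.

```python
import math

def pp_negloglik(params, exceedances, u):
    """Negative log-likelihood from (2.21): Poisson process on [0,1] x [u, inf).

    exceedances: the n_u observed values y_i > u.
    Returns +inf outside the parameter support.
    """
    mu, sigma, xi = params
    if sigma <= 0:
        return math.inf
    zu = 1 + xi * (u - mu) / sigma
    if zu <= 0:
        return math.inf
    nll = zu ** (-1 / xi)  # integrated intensity Lambda([0,1] x [u, inf))
    for y in exceedances:
        z = 1 + xi * (y - mu) / sigma
        if z <= 0:
            return math.inf
        nll += math.log(sigma) + (1 + 1 / xi) * math.log(z)  # -log lambda(s, y)
    return nll
```

Minimising this over (µ, σ, ξ) with a generic optimiser gives the maximum likelihood fit of the point process model.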
It should be noted that the previous strategy of modelling block maxima of a series as GEV(µ, σ, ξ) contradicts the current model, in which these parameters describe the maximum of the whole dataset. This discrepancy may be removed by making better use of the time component in the point process. If {Yi} is a time series, the index i corresponds to time and so the current process transforms time from 1 : n into [0, 1] by taking s = i/(n + 1). Assuming for simplicity we want annual maxima to be GEV(µ, σ, ξ), we merely need to transform time so that each year falls into a unit interval.
If we assume our series is observed between times tb and te (measured in years), this gives a Poisson process {(ti,Yi)} on [tb, te] × [u, ∞) with intensity
λ(t, y) = (1/σ) [1 + ξ(y − µ)/σ]_+^{−(1+1/ξ)},
where ti is the time corresponding to variable Yi. A dataset y1, . . . , ynu may then be modelled with the likelihood
p(y, n_u | µ, σ, ξ) ∝ exp(−(t_e − t_b) [1 + ξ(u − µ)/σ]_+^{−1/ξ}) ∏_{i=1}^{n_u} λ(t_i, y_i),    (2.22)
which only differs from (2.21) in the additional (te − tb) = ‘number of years’ term. By exploiting the time component it is now easier to derive time related properties of the process. For example, the asymptotic distribution of the maximum over k years of observations, M[0,k], has cdf
P{M_{[0,k]} ≤ y} = P{N([0, k] × (y, ∞)) = 0}
= exp(−k [1 + ξ(y − µ)/σ]_+^{−1/ξ})
= G(y)^k,

where G is the GEV(µ, σ, ξ) distribution function, so the distribution of the annual maximum is G.
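The identity P{M_{[0,k]} ≤ y} = G(y)^k also gives return levels directly: the 1/p-year return level solves G(y) = 1 − p. The short numeric check below is ours, with illustrative parameter values and ξ ≠ 0 assumed.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """GEV(mu, sigma, xi) cdf for xi != 0."""
    t = 1 + xi * (y - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))

def k_year_max_cdf(y, k, mu, sigma, xi):
    """P{M_[0,k] <= y} = exp(-k [1 + xi (y - mu)/sigma]_+^{-1/xi}) = G(y)^k."""
    return gev_cdf(y, mu, sigma, xi) ** k

def return_level(p, mu, sigma, xi):
    """The level exceeded by the annual maximum with probability p:
    solves G(y) = 1 - p, giving y = mu + (sigma/xi)((-log(1-p))^{-xi} - 1)."""
    return mu + sigma / xi * ((-math.log(1 - p)) ** (-xi) - 1)
```

For example, the 100-year return level satisfies G(y) = 0.99, so the annual maximum exceeds it with probability 0.01.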
By using all the data above a threshold rather than just block maxima to estimate the same three parameters, we should expect gains in efficiency with tighter standard errors. However, the selection of u leads to a trade-off between variance and bias; by reducing u we include more data but the asymptotic justification for the model becomes weaker. It is common to fit a dataset with a selection of thresholds and pick the one with the best model fit.
For examples of modelling with the point process limit see Smith (1989), Embrechts et al. (1997) or Coles (2001) amongst others. See also Coles and Tawn (1996) for an example of a Bayesian analysis which incorporates expert information into the prior to supplement the data.
Generalised Pareto distribution
Davison and Smith (1990) propose a related approach which models the threshold exceedances and the number of them separately. As with the block maxima model, it may be derived from the point process.
We begin by showing the survivor function of the excess above the threshold u is given by
P{Y − u > y | Y > u} = P{Y > u + y} / P{Y > u}
= Λ([t_b, t_e] × [u + y, ∞)) / Λ([t_b, t_e] × [u, ∞))
= [1 + ξy/σ_u]_+^{−1/ξ},

which is the survivor function of the Generalised Pareto Distribution (GPD) with scale parameter σ_u := σ + ξ(u − µ) and shape parameter ξ. It is important to notice that the shape parameter is the same as in the GEV for block maxima and that the GPD limit holds for any threshold u, a property known as threshold stability.
The GPD shares many properties with the GEV distribution, most notably through the common shape parameter ξ. Pickands (1975) shows that the GPD limit is obtained for threshold exceedances if and only if the maxima are in the domain of attraction of the GEV, with the same ξ. Like the GEV, the GPD is heavy tailed when ξ > 0 but has an upper bound when ξ < 0. If ξ = 0 the GPD becomes the exponential distribution with mean σ_u.
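Threshold stability can be verified numerically: if the excesses of u follow GPD(σ_u, ξ), then conditioning on a higher threshold v gives GPD(σ_u + ξ(v − u), ξ). The sketch below is ours, with illustrative parameter values and ξ ≠ 0 assumed.

```python
def gpd_survivor(y, sigma_u, xi):
    """P{Y - u > y | Y > u} for GPD(sigma_u, xi), y >= 0, xi != 0."""
    t = 1 + xi * y / sigma_u
    return t ** (-1 / xi) if t > 0 else 0.0

def conditioned_survivor(y, sigma_u, xi, v_minus_u):
    """Survivor of the excess over the higher threshold v = u + v_minus_u,
    computed by conditioning the GPD(sigma_u, xi) excesses over u on Y > v.
    Algebraically this equals gpd_survivor(y, sigma_u + xi * v_minus_u, xi)."""
    return gpd_survivor(v_minus_u + y, sigma_u, xi) / gpd_survivor(v_minus_u, sigma_u, xi)
```

The agreement of the two expressions is exactly the threshold-stability property: only the scale changes with the threshold while ξ is unchanged.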
We have already shown that the number of exceedances of u has a Poisson distribution. A sample of threshold exceedances may therefore be fitted by modelling the number of exceedances, n_u, as Poisson(Λ) and the peaks over the threshold,
Yi − u, as IID GPD(σu, ξ). This is often referred to as the Peaks Over Threshold (POT) approach.
The equivalence of this approach to the point process model above is seen by writing down the joint likelihood
p(y, nu|Λ, σu, ξ) = p(nu|Λ) p(y|nu, σu, ξ)
∝ Λ^{n_u} exp(−Λ) ∏_{i=1}^{n_u} (1/σ_u) [1 + ξ(y_i − u)/σ_u]_+^{−(1+1/ξ)},

which is simply a reparametrisation of the likelihood (2.22). The advantage of the POT method is that inference about the expected number of observations Λ is separated from inference about the GPD parameters σ_u and ξ. One disadvantage, however, is that σ_u depends on the threshold u, unlike any of the GEV parameters.
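The factorisation of the POT likelihood into a Poisson count term and a GPD term can be sketched as follows. The function name and the assumption ξ ≠ 0 are ours; the n_u! constant of the Poisson mass function is dropped as it does not involve the parameters.

```python
import math

def pot_loglik(Lambda, sigma_u, xi, excesses):
    """log p(y, n_u | Lambda, sigma_u, xi), up to an additive constant:
    a Poisson(Lambda) count of exceedances times IID GPD(sigma_u, xi)
    densities for the excesses y_i - u."""
    if Lambda <= 0 or sigma_u <= 0:
        return -math.inf
    n_u = len(excesses)
    ll = n_u * math.log(Lambda) - Lambda  # Poisson part (without the n_u! term)
    for e in excesses:
        t = 1 + xi * e / sigma_u
        if t <= 0:
            return -math.inf
        ll += -math.log(sigma_u) - (1 + 1 / xi) * math.log(t)  # GPD log density
    return ll
```

Because the log-likelihood is additive in (Λ) and (σ_u, ξ), the rate and the GPD parameters can be maximised separately, which is the practical appeal of the POT parametrisation.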
For further details and examples of the peaks over threshold method see Davison and Smith (1990) and Embrechts et al. (1997). See also Pickands (1994) for a Bayesian analysis with the GPD.
2.3.3 Dependent processes
So far we have considered the extremal properties of an IID sequence Y1,...,Yn. We now relax this condition by assuming the sequence is stationary, that is
P{Yt1 ≤ y1,...,Ytk ≤ yk} = P{Yt1+τ ≤ y1,...,Ytk+τ ≤ yk},
so that Yt1 ,...,Ytk and Yt1+τ ,...,Ytk+τ have the same joint distribution for any k, t1, . . . , tk and τ.
Maxima of stationary sequences
We again begin by considering the maximum value of the sequence, M_n. We know that if the sequence is independent, the only possible distribution for the linearly normalised maxima is GEV. However, we should not expect any stationary sequence to give a GEV limit. For example, if Y_1 ∼ F(·) for an arbitrary distribution F and Y_i = Y_1 for all i > 1, the sequence is stationary but the maximum equals Y_1, which has distribution F. We therefore need to impose some extra conditions to obtain a smaller class of limit distributions.
One such condition is the Asymptotic Independence of Maxima (AIM) condition of O'Brien (1987), which is one of many conditions that limit the amount of long-range dependence in the series. A stationary sequence {Yn} is said to have Asymptotic
Independence of Maxima relative to a sequence {cn} (written {Yn} has AIM(cn)) if there exists an o(n) sequence qn > 0, such that
max |P{M_i ≤ c_n, M_{i+q_n : i+q_n+j} ≤ c_n} − P{M_i ≤ c_n} P{M_j ≤ c_n}| → 0   as n → ∞,
where the maximum is taken over all i, j ≥ qn such that i+qn +j ≤ n and Mn:m := max{Yn,...,Ym} with Mn := M1:n. This causes maxima over separated groups of points to become increasingly close to being independent as their separation increases at an appropriate rate.
The AIM condition provides enough long-range independence to ensure the limit distribution of linearly normalised maxima is GEV as is shown in Theorem 2.4. For details see Hsing (1987), Hsing et al. (1988) and Leadbetter et al. (1983).
Theorem 2.4 (Unified Extremal Types Theorem for stationary sequences). Given a stationary sequence {Yn}, define M_n := max{Y_1, …, Y_n}. If there exist sequences a_n ∈ ℝ and b_n > 0 such that
P{(M_n − a_n)/b_n ≤ y} → H(y)   as n → ∞,
for a non-degenerate distribution H, and if {Yn} has AIM(a_n + b_n y), then H is the cumulative distribution function of the Generalised Extreme Value distribution for some parameters µ ∈ ℝ, σ > 0 and ξ ∈ ℝ.
Theorem 2.4 therefore justifies modelling maxima of separated blocks as GEV as long as the separation between blocks is sufficient for the AIM condition to hold.
It is useful to relate the stationary sequence limit H with the corresponding IID limit G to show how the dependence changes the limit distribution. Recall that
Theorem 2.2 relates the IID limit to the marginal distribution F through F^n(a_n + b_n y) → G(y) as n → ∞. Leadbetter et al. (1983) show that
H(y) = G^θ(y), where θ ∈ [0, 1] is the extremal index defined by
θ := lim_{n→∞} P{M_{2:p_n} ≤ a_n + b_n y | Y_1 > a_n + b_n y},    (2.23)
for an o(n) sequence p_n > 0 and any y. It can be shown that if G is GEV(µ, σ, ξ), then H is GEV(µ + σ(θ^ξ − 1)/ξ, σθ^ξ, ξ), so that the presence of dependence in the series does not change the shape of the distribution.
Several estimators for the extremal index have been proposed. Perhaps the most intuitive is the runs estimator of Smith and Weissman (1994), which is motivated by (2.23). This involves first identifying clusters of exceedances of the threshold u = a_n + b_n y by assuming clusters are consecutive runs of points separated by at least κ = p_n observations below u (see Figure 2.4 for an example). The extremal index θ is then estimated as the reciprocal of the average cluster size. For other estimators see Leadbetter (1983) and Ferro and Segers (2003).
Figure 2.4: Example of a series declustered with the runs method. The clusters of threshold exceedances are each labelled with a cross (×) on the cluster maxima. Clusters were identified as consecutive points separated by at least κ = 4 observations below the threshold u (−−).
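The runs method described above can be sketched in a few lines. This is our own illustrative implementation, not code from the thesis: a cluster ends once κ or more consecutive observations fall below u, and θ is estimated by (number of clusters)/(number of exceedances).

```python
def runs_decluster(series, u, kappa):
    """Group the indices of exceedances of u into clusters: a cluster ends
    once kappa or more consecutive observations fall below u."""
    clusters, current, gap = [], [], 0
    for i, y in enumerate(series):
        if y > u:
            if current and gap >= kappa:
                clusters.append(current)  # the run below u was long enough
                current = []
            current.append(i)
            gap = 0
        else:
            gap += 1
    if current:
        clusters.append(current)
    return clusters

def runs_extremal_index(series, u, kappa):
    """Runs estimate of theta: reciprocal of the average cluster size."""
    clusters = runs_decluster(series, u, kappa)
    n_exc = sum(len(c) for c in clusters)
    return len(clusters) / n_exc if n_exc else float('nan')
```

On the short series [0, 5, 6, 0, 0, 0, 0, 7, 0] with u = 4 and κ = 2 this gives two clusters, {1, 2} and {7}, and hence an estimate θ̂ = 2/3.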
Point process
Having shown that stationary sequences with a long-range independence condition still give a GEV limit for maxima, we now look at how the point process characterisation is affected.
Letting Y1,...,Yn be a stationary sequence with marginal distribution function
F ∈ D_ξ, we again define the sequence of point processes P_n on [0, 1] × ℝ by
P_n := { (i/(n+1), (Y_i − a_n)/b_n) : i = 1, …, n },
where an and bn are the normalising sequences used for IID variables from F in Theorem 2.2. We also assume that a long-range asymptotic independence con- dition similar to AIM(an + bny) holds and consider the limit process P . It can be shown that, under the appropriate conditions, P is a particular clustered non- homogeneous Poisson process.
Unlike the limit process for IID variables, multiple points in this process may occur at the same time, forming a cluster. The cluster maxima themselves form a non-homogeneous Poisson process with intensity
λ(s, y) = θ [1 + ξy]_+^{−(1+1/ξ)},

where θ is the extremal index. The distribution of values within a cluster is more complex, but the expected number of exceedances of a threshold u per cluster is θ^{−1}, whatever the value of u.
This result shows that the cluster maxima may be modelled as Yn was in the IID case in Subsection 2.3.2, with θ being absorbed into the location and scale parameters. This means that a dataset must first be declustered before the point process or peaks over threshold likelihood is applied. The presence of the extremal index in the intensity above shows that there are fewer cluster maxima exceedances, by a factor of θ, than in the IID case.
See Smith (1989) and Davison and Smith (1990) for examples of modelling the peaks over the threshold after declustering.
2.3.4 Non-stationary processes
In many real world datasets, non-stationarity arises from sources such as seasonal effects and long term trends. Attempts have been made to provide a general extreme value theory for non-stationary sequences (see for example Leadbetter et al. (1983)) but they tend to have limited statistical use. As such, a variety of statistical techniques may be used to account for non-stationarity by allowing variation of the parameters of the models that were introduced for extremes of stationary sequences.
Parametric
The standard approach as outlined in Coles (2001) is to model the parameters as functions of covariates. This can of course be done with any of the previously described models. A simple example would be to add a linear trend to the location parameter of the GEV by modelling annual maxima Yt in year t as GEV(µt, σ, ξ) where µt := β0 + β1t. Maximum likelihood could then be used to estimate the parameters β0, β1, σ and ξ. For examples see Smith (1989) and Davison and Smith (1990).
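The linear-trend example above translates directly into a negative log-likelihood to be minimised over (β0, β1, σ, ξ). This sketch is ours, assuming ξ ≠ 0; any generic numerical optimiser can then be applied.

```python
import math

def gev_trend_negloglik(params, data):
    """Negative log-likelihood for annual maxima y_t ~ GEV(beta0 + beta1 * t, sigma, xi).

    data: list of (t, y) pairs.  Returns +inf outside the parameter support.
    """
    beta0, beta1, sigma, xi = params
    if sigma <= 0:
        return math.inf
    nll = 0.0
    for t, y in data:
        mu_t = beta0 + beta1 * t  # location varies linearly with the covariate t
        z = 1 + xi * (y - mu_t) / sigma
        if z <= 0:
            return math.inf
        nll += math.log(sigma) + (1 + 1 / xi) * math.log(z) + z ** (-1 / xi)
    return nll
```

Testing β1 = 0 against β1 ≠ 0 (for example by a likelihood ratio test) then assesses the evidence for a trend in the location parameter.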
An alternative approach of Eastoe and Tawn (2009) is to pre-process the whole dataset to remove the bulk of the non-stationarity and then model the residuals as before. This is especially useful with the peaks over threshold model where we are required to select a constant threshold to decide which data values to model; the non-stationarity of the dataset may cause locally extreme values to fall below a threshold while locally typical values lie above it, as shown in Figure 2.5.

Figure 2.5: Example of the selection of peaks over a constant threshold when the series is non-stationary. The runs method was used with κ = 4 as in Figure 2.4.

Pre-processing the data has the potential to remove enough of the non-stationarity to allow a constant threshold through the residuals to select only the data we would deem to be extreme.
Non-parametric
Hall and Tajvidi (2000) propose a non-parametric method for fitting extreme value data with a temporal trend. They use a kernel Ki(t) := K((t − ti)/h) to weight the log likelihood of each observation (ti, yi) and maximise the overall likelihood to obtain parameter estimates for each t. The standard Gaussian density is typically chosen for the kernel K with the bandwidth h chosen with cross-validation.
This method produces parameter estimates which vary smoothly over time to fully account for many types of non-stationarity. It has the disadvantage, however, that estimates can only be made over the range of the data and so the method cannot be used for prediction. For examples of this approach see Davison and Ramesh (2000) and Pauli and Coles (2001).
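The kernel-weighted likelihood of Hall and Tajvidi (2000) can be sketched as below. This is our own illustration with a GEV log density, a Gaussian kernel and ξ ≠ 0 assumed; in practice the weighted log-likelihood is maximised over (µ, σ, ξ) separately at each target time t.

```python
import math

def gev_logpdf(y, mu, sigma, xi):
    """Log density of GEV(mu, sigma, xi) for xi != 0."""
    if sigma <= 0:
        return -math.inf
    z = 1 + xi * (y - mu) / sigma
    if z <= 0:
        return -math.inf
    return -math.log(sigma) - (1 + 1 / xi) * math.log(z) - z ** (-1 / xi)

def local_weighted_loglik(t, h, data, mu, sigma, xi):
    """Kernel-weighted log-likelihood at time t with bandwidth h:

        sum_i K((t - t_i) / h) * log g(y_i; mu, sigma, xi),

    with K an (unnormalised) Gaussian kernel.  data: list of (t_i, y_i)."""
    ll = 0.0
    for ti, yi in data:
        w = math.exp(-0.5 * ((t - ti) / h) ** 2)
        ll += w * gev_logpdf(yi, mu, sigma, xi)
    return ll
```

As h → ∞ all weights tend to one and the fit reduces to the ordinary stationary GEV fit, while small h localises the estimates around t.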
Stochastic
Another approach to handling non-stationarity is to allow the parameters to move stochastically over time. It is this approach which we focus on extending in chapters 4 and 5 of this thesis.
Smith and Miller (1986) present a state space model where observations are trans- formed to exponential with a rate that varies stochastically over time. More specif- ically, they assume a model of the form
X_{t+1} | {X_{1:t} = x_{1:t}, Z_{1:t} = z_{1:t}} = x_t ρ_{t+1} φ_{t+1},
Z_t | {X_{1:t} = x_{1:t}, Z_{1:t−1} = z_{1:t−1}} ∼ Exponential(x_t),
X_0 ∼ Gamma(α_0, β_0),
where ρt+1 is a constant and φt+1 is sampled from a specific Beta distribution. Yt is observed at each time step t before being transformed with Zt = T (Yt|θ).
The model form is chosen so that the filter and one-step prediction distributions p(xt|z1:t) and p(xt+1|z1:t) are all Gamma with parameters that can be updated analytically. The predictive distribution p(zt+1|z1:t) has a Pareto form which is also tractable. Smith and Miller (1986) show how specific forms of transformation can be chosen so that extremal data Yt follow a Gumbel or a Weibull distribution given Xt.
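The model equations above can be simulated directly. This is a minimal sketch, not the Smith and Miller implementation: we assume a constant ρ and a fixed Beta(a, b) distribution for φ, whereas in their paper the Beta parameters are tied to the Gamma filter recursions.

```python
import random

random.seed(1)  # reproducible illustration

def simulate_smith_miller(T, rho=1.0, a=9.0, b=1.0, x0=1.0):
    """Simulate X_{t+1} = X_t * rho * phi_{t+1} with phi ~ Beta(a, b),
    and Z_t | X_t ~ Exponential(rate x_t).  The Beta parameters here are
    purely illustrative assumptions."""
    xs, zs = [], []
    x = x0
    for _ in range(T):
        x = x * rho * random.betavariate(a, b)  # state transition
        xs.append(x)
        zs.append(random.expovariate(x))        # observation with rate x_t
    return xs, zs
```

Since φ ∈ (0, 1), choosing ρ > 1 with E[φ] = a/(a+b) close to 1/ρ keeps the state roughly mean-stationary while still evolving stochastically.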
Their particular approach is, by their own admission, confined to a rather narrow class of models for reasons of tractability, although it is far removed from the linear Gaussian state space form of the Kalman filter. Their method also requires maximum likelihood to estimate the model's hyper-parameters as well as the transformation parameter θ.
Gaetan and Grigoletto (2004) overcome the issue of tractability by using particle filters and smoothers to model a dynamic trend. Their model assumes block
maxima Yt follow GEV(µt, σt, ξt) where µt, log σt and ξt each follow independent random walks in the state space. While their model is attractive, their method is in our opinion let down by their choice of particle methods; see Section 4.1 for a discussion of this as well as our proposal for an improvement on their model.

2.4 Multivariate Extreme Value Theory
In this section we describe multivariate extreme value theory and methods. For a wider review see Resnick (1987), Kotz and Nadarajah (2000) or Beirlant (2004).
2.4.1 Multivariate extreme value distributions
We now study the extremal properties of a sequence {Y_n = (Y_{n,1}, …, Y_{n,d})′} of IID multivariate random variables with dimension d and common distribution function
P{Y_1 ≤ y_1, …, Y_d ≤ y_d} = F(y_1, …, y_d). To simplify the presentation, we assume our variables have marginal distributions that are Fréchet(0, 1, 1) (commonly referred to simply as Fréchet). This gives F(y_j) = exp(−1/y_j) for y_j > 0 and
P{M_{n,j}/n ≤ y_j} = F(y_j), where M_{n,j} := max{Y_{1,j}, …, Y_{n,j}}. This is not restrictive since results for arbitrary margins may be obtained by applying the probability
integral transform to each component: if Y_j ∼ F then G^{−1}(F(Y_j)) has distribution function G.
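The transformation to unit Fréchet margins follows from the same probability integral transform: if U = F(Y) is Uniform(0, 1), then −1/log U has cdf exp(−1/y). The sketch below is ours, with standard exponential margins chosen purely as an example.

```python
import math

def to_unit_frechet(y, marginal_cdf):
    """Transform Y ~ F to unit Frechet: if U = F(Y) ~ Uniform(0,1) then
    -1/log(U) has cdf exp(-1/z) for z > 0."""
    u = marginal_cdf(y)
    return -1.0 / math.log(u)

def exp_cdf(y):
    """Example marginal: standard exponential, F(y) = 1 - exp(-y)."""
    return 1 - math.exp(-y)
```

The transformed value z satisfies exp(−1/z) = F(y), so the marginal probability of the original observation is preserved, which is exactly what the componentwise transform requires.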
Componentwise maxima
Unlike the univariate case, there is no natural ordering of a multivariate sample. We are therefore less certain of what constitutes an extreme value. In specific applications there may exist a scalar function of Y whose large or small values are of interest, in which case the univariate techniques of Section 2.3 may be applied to the transformed variable (see Coles and Tawn (1994)). However, in more general situations we may be interested in the relationship between large values of individual components.
We therefore begin by considering the joint distribution of scaled componentwise maxima. This prompts the following definition: if there exists a d-dimensional distribution function G with non-degenerate margins such that
P{M_{n,1}/n ≤ y_1, …, M_{n,d}/n ≤ y_d} → G(y_1, …, y_d)   as n → ∞,
then G is known as a multivariate extreme value distribution. Since P{M_{n,j}/n ≤ y_j} = F(y_j), the margins of G are all Fréchet. Also G is max-stable since
G^n(ny_1, …, ny_d) = G(y_1, …, y_d).
Pickands (1981) shows that G takes the form