Tsalis Christos PhD.pdf

Total Pages: 16

File Type: PDF, Size: 1020 KB

NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS
SCHOOL OF SCIENCE – FACULTY OF PHYSICS
DIVISION OF ENVIRONMENTAL PHYSICS

DOCTORAL THESIS
Statistical Modeling of Extreme Environmental Values
Christos Tsalis, M.Sc.

Advisory Committee: Emeritus Professor George Kallos; Professor Kostas A. Belibassakis; Research Director Takvor H. Soukissian

Athens, July 2021

Examination Committee:
George Kallos, Emeritus Professor, National & Kapodistrian University of Athens, School of Physics, Division of Environmental Physics – Meteorology
Takvor H. Soukissian, Research Director, Hellenic Centre for Marine Research – Institute of Oceanography, Anavissos
Kostas A. Belibassakis, Professor, School of Naval Architecture & Marine Eng., NTUA
George N. Galanis, Professor, Hellenic Naval Academy
Helena Flocas, Professor, National & Kapodistrian University of Athens, School of Physics, Division of Environmental Physics – Meteorology
Sofianos Sarantis, Associate Professor, National & Kapodistrian University of Athens, School of Physics, Division of Environmental Physics – Meteorology
Nikolaos Themelis, Assistant Professor, School of Naval Architecture & Marine Eng., NTUA

Acknowledgments

This doctoral thesis would not have been possible without the guidance and help of several people who, in one way or another, contributed to the preparation and completion of this study. First and foremost, I would like to express my full thanks and gratitude to my supervisor Prof. George Kallos, and to Dr. Takvor Soukissian and Prof. Konstantinos Belibassakis, for their guidance. Their insight, intuition, patient guidance and encouragement have been a valuable lesson for me. It has been a pleasure and an honour to work under their supervision, and all of them contributed many ideas, useful suggestions and constructive comments. I am greatly indebted to them for proofreading the final draft of this thesis.

In addition, I would like to thank all my fellow PhD researchers from the Department of Physics at the University of Athens and the School of Naval Architecture & Marine Engineering of NTUA, as well as my fellow researchers at the Institute of Oceanography of HCMR, for their help and moral support. I am especially grateful to Dr Flora Karathanasi for her help and moral support, to Mrs Fani Anagnostou and Mr Miltos Stamos for their assistance with computational issues, and to Mr Panagiotis Triantafyllopoulos for his assistance with graphical issues. I would also like to thank Dr Ioannis Papastathopoulos from the Department of Mathematics at the University of Edinburgh for his scientific advice and for encouraging me to extend the present research to other useful scientific areas.

Finally, I would like to thank my family: my father Vasilis, my mother Paraskevi and my sisters Maria and Effie, for their constant love and unconditional support over the years. Their encouragement has played a vital role throughout my studies.

Dedicated to all my students

Declaration

I declare that this thesis, entitled Statistical Modeling of Extreme Environmental Values, is the result of my own research under the guidance of Prof. George Kallos, Prof. Kostas A. Belibassakis and Dr. Takvor Soukissian, except as cited in the references. This thesis has not been submitted in any form for the award of a higher degree elsewhere.
This doctoral thesis was partially supported by the Operational Program "Human Resources Development, Education and Lifelong Learning", co-financed by the European Union (European Social Fund) and Greek national funds (MIS 5007050); see http://www.edulll.gr.

Copyright © Christos Tsalis, 2021. All rights reserved.

Abstract (in English)

The accurate estimation of extreme values of metocean parameters (e.g., wind speed) plays a crucial role in the marine renewable energy industry and in coastal and offshore engineering applications. Typical challenges that arise in these fields include the limited information contained in the available samples, commonly caused by the scarcity of long datasets, and the accurate estimation of the underlying dependence structure of the stochastic models used for inference in applied problems with extremes. The present analysis aims to assess the effect of the asymptotic distributional behavior of two types of extreme wind speed sampling data that form the basis of all subsequent predictions on the long-term time scale. The first type of sampling data consists of observations extracted as maxima from blocks of annual length, and the second type consists of observations exceeding a sufficiently high threshold. The challenges closely related to the special attributes of these types of sampling data motivate the present thesis, which focuses on constructing and improving extreme value models to assess the risk associated with extreme wind speed episodes. In particular, this thesis focuses on:

• identifying the combined effects of the wind speed samples that influence the stability of the parameter estimates, as well as the efficiency of the estimators, in the modelling of extremes;
• providing alternative methods for modelling extremes of wind speed that are less well known in the relevant fields of interest and are shown to perform better than the standard modelling approach;
• extending the formulation of the stationary model of extremes to the parameterization of a non-stationary model, in order to incorporate subject-specific knowledge in the presence of trends under the assumption of a changing wind climate;
• extending the classical methods that identify the dependence structure in a sample of observations, in order to effectively model extremes that are irregularly spaced in time. Specifically, the reconstruction of a dependent sample of extremes that are irregularly spaced in time is focused on relatively small samples of wind speed, where the scarcity of long and complete time series is a common restriction in climatological studies.

In this setting, the statistical analysis of the most widely used and of less well-known estimators that model the extremes of wind speed follows a twofold approach. A simulation study is performed first to assess the effect of the sample size on the estimators of the asymptotic distribution that models extremes; the evaluation of the simulation results is based on several statistical measures. Afterwards, the optimum methods from the simulation analysis are applied to wind speed datasets of different sample sizes and different directional sampling steps. The evaluation is based on datasets originating from databases of relatively moderate horizontal resolution for regional locations in the North Sea, at the Pacific coast of Central America and in the eastern Atlantic Ocean, where these locations are exposed to a strong wind climate with evidence of extreme wind speeds.
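As a rough illustration of the block-maxima workflow described above, the sketch below extracts annual maxima from a synthetic hourly wind-speed record, fits a GEV distribution with SciPy, and reads off 50- and 100-year design values as upper quantiles of the annual-maximum distribution. The data, sample sizes and the use of SciPy are illustrative assumptions; this is not the estimator comparison carried out in the thesis.

```python
# Minimal sketch: annual block maxima + GEV fit + return levels (illustrative only).
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
n_years, per_year = 40, 365 * 24                        # 40 synthetic years of hourly values
speeds = rng.weibull(2.0, (n_years, per_year)) * 8.0    # placeholder wind speeds, m/s

annual_max = speeds.max(axis=1)                         # block maxima with annual blocks

# SciPy's shape parameter c has the opposite sign of the usual GEV shape xi (c = -xi).
c, loc, scale = genextreme.fit(annual_max)

for T in (50, 100):
    # T-year return level: the (1 - 1/T) quantile of the annual-maximum distribution.
    level = genextreme.isf(1.0 / T, c, loc=loc, scale=scale)
    print(f"{T}-year design value: {level:.2f} m/s")
```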
Inference about the effect of the sample size and of the directional sampling step on the performance of the model estimators is made through the obtained 50- and 100-year wind speed design values. From this assessment, the combined method of moments is recommended as the suitable method when the sample size is limited.

Another challenge that motivated this study is the modelling of extremes when the extremal characteristics are not expected to remain constant over time. Seasonality and long-term trends are probably the main reasons why the stationarity hypothesis fails for wind speed processes. In this part of the study, an attempt is made to model possible trends of extremes in the long-term behavior of the process. Since in practice the trend is unknown, various formulations of the trend as a function of time are assessed to represent the extremes of wind speed when the stationarity assumption is not valid, in order to alleviate the bias introduced by de-trending the process before the time series is used. Statistical tests are used to decide whether the modeled trend should be retained or rejected in favor of stationarity. For the extremes of non-stationary sequences and the application to wind speed design values, the analysis is based on coarse historical data from long datasets at regional locations in the North Sea, where trends notably influence the wind speed variability. From this assessment, the simplest form of parameterization of the parameters of the extreme value distribution is recommended for modelling extremes when stationarity is violated.

Another common design problem in assessing the risk associated with extremes of wind speed in met-ocean fields of interest is the scarcity of long datasets. To address this limitation, many applications utilize as many extremes as possible from the available dataset by re-sampling to a subset of extremes. However, the re-samples are often affected by dependency, and the diagnostics related to the independence assumption are usually violated when the observations of these samples are irregularly spaced in time. To alleviate this effect, a resampling strategy is proposed that effectively models extremes irregularly spaced in time when re-sampling of relatively small datasets of wind speed is required. The proposed DeCA Uncorrelated (DeCAUn) model improves on the existing physical De-Clustering Algorithm (DeCA) by modelling the DeCA samples that are irregularly spaced in time. Specifically, the proposed resampling strategy analyzes the correlation effect in samples by extending the standard correlation operator with weight functions applied to observations irregularly spaced in time. To infer in terms of precision …
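One common way to relax the stationarity assumption, mentioned only generically above, is to let the GEV location parameter vary linearly with time and fit the model by maximum likelihood. The sketch below is a minimal illustration of that idea with synthetic data; the specific trend parameterizations and tests compared in the thesis are not reproduced here. Setting mu1 = 0 recovers the stationary model, so a likelihood-ratio test between the two fits is the usual way to decide whether the trend term is kept.

```python
# Minimal sketch: non-stationary GEV with mu(t) = mu0 + mu1 * t, fitted by ML (illustrative).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def neg_log_lik(params, z, t):
    """Negative log-likelihood for a GEV whose location drifts linearly with time."""
    mu0, mu1, log_sigma, c = params
    sigma = np.exp(log_sigma)                       # keep the scale parameter positive
    ll = genextreme.logpdf(z, c, loc=mu0 + mu1 * t, scale=sigma)
    return -np.sum(ll) if np.all(np.isfinite(ll)) else np.inf

# z: annual maxima, t: centred year index (both hypothetical placeholders here)
t = np.arange(40) - 20.0
z = genextreme.rvs(-0.1, loc=25 + 0.05 * t, scale=3.0, random_state=3)

start = np.array([z.mean(), 0.0, np.log(z.std()), 0.0])
fit = minimize(neg_log_lik, start, args=(z, t), method="Nelder-Mead")
mu0, mu1, log_sigma, c = fit.x
print("mu0, mu1, sigma, shape:", mu0, mu1, np.exp(log_sigma), c)
```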
Recommended publications
  • The Likelihood Principle
    01/28/99 © Marc Nerlove 1999. Chapter 1: The Likelihood Principle. "What has now appeared is that the mathematical concept of probability is ... inadequate to express our mental confidence or diffidence in making ... inferences, and that the mathematical quantity which usually appears to be appropriate for measuring our order of preference among different possible populations does not in fact obey the laws of probability. To distinguish it from probability, I have used the term 'Likelihood' to designate this quantity; since both the words 'likelihood' and 'probability' are loosely used in common speech to cover both kinds of relationship." R. A. Fisher, Statistical Methods for Research Workers, 1925. "What we can find from a sample is the likelihood of any particular value of r [a parameter], if we define the likelihood as a quantity proportional to the probability that, from a particular population having that particular value of r, a sample having the observed value r [a statistic] should be obtained. So defined, probability and likelihood are quantities of an entirely different nature." R. A. Fisher, "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, 1:3-32, 1921. Introduction The likelihood principle as stated by Edwards (1972, p. 30) is that Within the framework of a statistical model, all the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses on the data. ...For a continuum of hypotheses, this principle
  • RobPer: An R Package to Calculate Periodograms for Light Curves Based on Robust Regression
    RobPer: An R Package to Calculate Periodograms for Light Curves Based on Robust Regression Anita M. Thieler Roland Fried Jonathan Rathjens TU Dortmund University TU Dortmund University TU Dortmund University Abstract An important task in astroparticle physics is the detection of periodicities in irregularly sampled time series, called light curves. The classic Fourier periodogram cannot deal with irregular sampling and with the measurement accuracies that are typically given for each observation of a light curve. Hence, methods to fit periodic functions using weighted regression were developed in the past to calculate periodograms. We present the R package RobPer which allows to combine different periodic functions and regression techniques to calculate periodograms. Possible regression techniques are least squares, least absolute deviations, least trimmed squares, M-, S- and τ-regression. Measurement accuracies can be taken into account including weights. Our periodogram function covers most of the approaches that have been tried earlier and provides new model-regression-combinations that have not been used before. To detect valid periods, RobPer applies an outlier search on the periodogram instead of using fixed critical values that are theoretically only justified in case of least squares regression, independent periodogram bars and a null hypothesis allowing only normal white noise. Finally, the package also includes a generator to generate artificial light curves. Keywords: periodogram, light curves, period detection, irregular sampling, robust regression, outlier detection, Cramér-von-Mises distance minimization, time series analysis, beta distribution, measurement accuracies, astroparticle physics, weighted regression, regression model. 1. Introduction We introduce the R (R Core Team 2015) package RobPer (Thieler, Rathjens, and Fried 2015), which can be used to calculate periodograms and detect periodicities in irregularly sampled time series.
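RobPer itself is an R package; purely to illustrate the general idea it implements (fitting a periodic function at each trial period by weighted regression and recording the goodness of fit as the periodogram bar), here is a minimal Python sketch using ordinary weighted least squares. It does not reproduce RobPer's API or its robust regression options, and the data are synthetic placeholders.

```python
# Weighted least-squares periodogram for an irregularly sampled light curve (sketch only).
import numpy as np

def wls_periodogram(t, y, sigma, periods):
    w = 1.0 / sigma**2                              # weights from measurement accuracies
    bars = []
    for p in periods:
        omega = 2.0 * np.pi / p
        X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
        Xw, yw = X * np.sqrt(w)[:, None], y * np.sqrt(w)
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        resid = yw - Xw @ beta
        ybar = np.sum(w * y) / np.sum(w)
        ss_tot = np.sum(w * (y - ybar) ** 2)
        bars.append(1.0 - np.sum(resid**2) / ss_tot)  # weighted R^2 as the periodogram bar
    return np.array(bars)

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, 300))               # irregular sampling times
sigma = rng.uniform(0.5, 1.5, t.size)               # per-point measurement accuracies
y = 2.0 * np.sin(2 * np.pi * t / 7.3) + rng.normal(0, sigma)
periods = np.linspace(2, 20, 500)
print("best period ~", periods[np.argmax(wls_periodogram(t, y, sigma, periods))])
```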
  • Statistical Theory
    Statistical Theory Prof. Gesine Reinert November 23, 2009 Aim: To review and extend the main ideas in Statistical Inference, both from a frequentist viewpoint and from a Bayesian viewpoint. This course serves not only as background to other courses, but will also provide a basis for developing novel inference methods when faced with a new situation which includes uncertainty. Inference here includes estimating parameters and testing hypotheses. Overview • Part 1: Frequentist Statistics – Chapter 1: Likelihood, sufficiency and ancillarity. The Factorization Theorem. Exponential family models. – Chapter 2: Point estimation. When is an estimator a good estimator? Covering bias and variance, information, efficiency. Methods of estimation: maximum likelihood estimation, nuisance parameters and profile likelihood; method of moments estimation. Bias and variance approximations via the delta method. – Chapter 3: Hypothesis testing. Pure significance tests, significance level. Simple hypotheses, Neyman-Pearson Lemma. Tests for composite hypotheses. Sample size calculation. Uniformly most powerful tests, Wald tests, score tests, generalised likelihood ratio tests. Multiple tests, combining independent tests. – Chapter 4: Interval estimation. Confidence sets and their connection with hypothesis tests. Approximate confidence intervals. Prediction sets. – Chapter 5: Asymptotic theory. Consistency. Asymptotic normality of maximum likelihood estimates, score tests. Chi-square approximation for generalised likelihood ratio tests. Likelihood confidence regions. Pseudo-likelihood tests. • Part 2: Bayesian Statistics – Chapter 6: Background. Interpretations of probability; the Bayesian paradigm: prior distribution, posterior distribution, predictive distribution, credible intervals. Nuisance parameters are easy. – Chapter 7: Bayesian models. Sufficiency, exchangeability. De Finetti's Theorem and its interpretation in Bayesian statistics. – Chapter 8: Prior distributions. Conjugate priors.
  • On the Relation Between Frequency Inference and Likelihood
    On the Relation between Frequency Inference and Likelihood Donald A. Pierce Radiation Effects Research Foundation 5-2 Hijiyama Park Hiroshima 732, Japan [email protected] 1. Introduction Modern higher-order asymptotic theory for frequency inference, as largely described in Barndorff-Nielsen & Cox (1994), is fundamentally likelihood-oriented. This has led to various indications of more unity in frequency and Bayesian inferences than previously recognised, e.g. Sweeting (1987, 1992). A main distinction, the stronger dependence of frequency inference on the sample space—the outcomes not observed—is isolated in a single term of the asymptotic formula for P-values. We capitalise on that to consider from an asymptotic perspective the relation of frequency inferences to the likelihood principle, with results towards a quantification of that as follows. Whereas generally the conformance of frequency inference to the likelihood principle is only to first order, i.e. O(n–1/2), there is at least one major class of applications, likely more, where the conformance is to second order, i.e. O(n–1). There the issue is the effect of censoring mechanisms. On a more subtle point, it is often overlooked that there are few, if any, practical settings where exact frequency inferences exist broadly enough to allow precise comparison to the likelihood principle. To the extent that ideal frequency inference may only be defined to second order, there is sometimes—perhaps usually—no conflict at all. The strong likelihood principle is that in even totally unrelated experiments, inferences from data yielding the same likelihood function should be the same (the weak one being essentially the sufficiency principle).
  • SigSpec User's Manual
    Comm. in Asteroseismology - Complementary Topics, Volume 1, 2010, © Austrian Academy of Sciences. Preface, Jeffrey D. Scargle, Space Science and Astrobiology Division, NASA Ames Research Center. SigSpec is a method for detecting and characterizing periodic signals in noisy data. This is an extremely common problem, not only in astronomy but in almost every branch of science and engineering. This work will be of great interest to anyone carrying out harmonic analysis employing Fourier techniques. The method is based on the definition of a quantity called spectral significance – a function of Fourier phase and amplitude. Most data analysts are used to exploring only the Fourier amplitude, through the power spectrum, ignoring phase information. The Fourier phase spectrum can be estimated from data, but its interpretation is usually problematic. The spectral significance quantity conveys more information than does the conventional amplitude spectrum alone, and appears to simplify statistical issues as well as the interpretation of phase information. (arXiv:1006.5081v1 [astro-ph.IM], 25 Jun 2010) Comm. in Asteroseismology - Complementary Topics, Volume 1, 2010, © Austrian Academy of Sciences. SigSpec User's Manual, P. Reegen, Institut für Astronomie, Türkenschanzstraße 17, 1180 Vienna, Austria, [email protected]. Abstract: SigSpec computes the spectral significance levels for the DFT amplitude spectrum of a time series at arbitrarily given sampling. It is based on the analytical solution for the Probability Density Function (PDF) of an amplitude level, including dependencies on frequency and phase and referring to white noise. Using a time series dataset as input, an iterative procedure including step-by-step prewhitening of the most significant signal components and MultiSine least-squares fitting is provided to determine a whole set of signal components, which makes the program a powerful tool for multi-frequency analysis.
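The iterative prewhitening procedure described in the excerpt can be sketched generically: compute an amplitude spectrum of the unevenly sampled series, locate the dominant peak, fit and subtract a sinusoid at that frequency, and repeat. The Python sketch below uses a plain least-squares fit, a crude amplitude spectrum and a fixed number of iterations; SigSpec's spectral-significance criterion is not reproduced here, and all data and frequency grids are illustrative assumptions.

```python
# Generic prewhitening loop for an unevenly sampled time series (illustrative only).
import numpy as np

def amplitude_spectrum(t, y, freqs):
    # DFT-style amplitude at arbitrary trial frequencies, valid for uneven sampling.
    arg = 2j * np.pi * np.outer(freqs, t)
    return np.abs(np.exp(-arg) @ y) * 2.0 / t.size

def fit_sinusoid(t, y, f):
    # Least-squares fit of a constant plus a sinusoid at frequency f; returns fitted values.
    X = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0.0, 30.0, 400))
y = 1.0 * np.sin(2 * np.pi * 1.7 * t) + 0.5 * np.sin(2 * np.pi * 3.1 * t) + rng.normal(0, 0.3, t.size)
freqs = np.linspace(0.05, 5.0, 2000)

residual = y.copy()
for step in range(3):                               # fixed step count instead of a significance test
    amp = amplitude_spectrum(t, residual, freqs)
    f_peak = freqs[np.argmax(amp)]
    print(f"step {step}: peak at {f_peak:.3f} cycles per unit time")
    residual = residual - fit_sinusoid(t, residual, f_peak)
```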
  • 1 The Likelihood Principle
    ISyE8843A, Brani Vidakovic, Handout 2. 1 The Likelihood Principle. Likelihood principle concerns foundations of statistical inference and it is often invoked in arguments about correct statistical reasoning. Let f(x | θ) be a conditional distribution for X given the unknown parameter θ. For the observed data, X = x, the function ℓ(θ) = f(x | θ), considered as a function of θ, is called the likelihood function. The name likelihood implies that, given x, the value of θ is more likely to be the true parameter than θ′ if f(x | θ) > f(x | θ′). Likelihood Principle. In the inference about θ, after x is observed, all relevant experimental information is contained in the likelihood function for the observed x. Furthermore, two likelihood functions contain the same information about θ if they are proportional to each other. Remark. The maximum-likelihood estimation does satisfy the likelihood principle. Figure 1: Leonard Jimmie Savage; Born: November 20, 1917, Detroit, Michigan; Died: November 1, 1971, New Haven, Connecticut. The following example quoted by Lindley and Phillips (1976) is an argument of Leonard Savage discussed at the Purdue Symposium 1962. It shows that the inference can critically depend on the likelihood principle. Example 1: Testing fairness. Suppose we are interested in testing θ, the unknown probability of heads for a possibly biased coin. Suppose H0: θ = 1/2 vs. H1: θ > 1/2. An experiment is conducted and 9 heads and 3 tails are observed. This information is not sufficient to fully specify the model f(x | θ). A rashomonian analysis follows: • Scenario 1: Number of flips, n = 12, is predetermined.
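The excerpt breaks off just after introducing Scenario 1. In the usual telling of this example (Lindley and Phillips 1976), Scenario 2 has the experimenter flipping until the third tail appears, so the same record of 9 heads and 3 tails arises from a negative binomial rather than a binomial model. Both likelihoods are proportional to θ^9 (1 − θ)^3, so the likelihood principle demands identical inference, yet the one-sided P-values against H0: θ = 1/2 differ. The sketch below reproduces that arithmetic; the quoted numbers are the commonly cited approximate values.

```python
# Binomial vs. negative binomial stopping rules for 9 heads and 3 tails.
# Both likelihoods are proportional to theta^9 * (1 - theta)^3, but the
# one-sided P-values for H0: theta = 1/2 differ.
from scipy.stats import binom, nbinom

heads, tails, theta0 = 9, 3, 0.5

# Scenario 1: n = 12 flips fixed in advance; reject for many heads.
p_binom = binom.sf(heads - 1, heads + tails, theta0)     # P(X >= 9), about 0.073

# Scenario 2: flip until the 3rd tail; count heads before it (negative binomial).
p_nbinom = nbinom.sf(heads - 1, tails, 1 - theta0)       # P(X >= 9), about 0.033

print(f"binomial P-value      = {p_binom:.4f}")
print(f"neg. binomial P-value = {p_nbinom:.4f}")
```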
  • A Heuristic Derivation of the Uncertainty of the Frequency Determination in Time Series Data
    Astronomy & Astrophysics manuscript no. Kallinger_ferror, © ESO 2018, November 1, 2018. A heuristic derivation of the uncertainty of the frequency determination in time series data. Thomas Kallinger, Piet Reegen and Werner W. Weiss, Institute for Astronomy (IfA), University of Vienna, Türkenschanzstrasse 17, A-1180 Vienna. Accepted 22 December 2007. ABSTRACT Context. Several approaches to estimate frequency, phase and amplitude errors in time series analyses were reported in the literature, but they are either time consuming to compute, grossly overestimating the error, or are based on empirically determined criteria. Aims. A simple, but realistic estimate of the frequency uncertainty in time series analyses. Methods. Synthetic data sets with mono- and multi-periodic harmonic signals and with randomly distributed amplitude, frequency and phase were generated and white noise added. We tried to recover the input parameters with classical Fourier techniques and investigated the error as a function of the relative level of noise, signal and frequency difference. Results. We present simple formulas for the upper limit of the amplitude, frequency and phase uncertainties in time-series analyses. We also demonstrate the possibility to detect frequencies which are separated by less than the classical frequency resolution and that the realistic frequency error is at least 4 times smaller than the classical frequency resolution. Key words. methods: data analysis – methods: statistical. 1. Motivation. In the frequency analysis of time series, a realistic estimate of the amplitude, phase and frequency uncertainties can be of special interest. […] based on an analytical solution for the one-sigma error of a least-squares sinusoidal fit with an rms of σ(m). The total number of data points, the total time base of the observations, and the signal amplitude, phase and frequency are N, T, a, φ and f, respectively.
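For context, the one-sigma least-squares formulas that this kind of analysis builds on (Montgomery and O'Donoghue 1999) are usually quoted as below for N data points spanning a time base T, signal amplitude a and residual rms σ(m). They are reproduced here from memory as a rough guide, so treat the constants as indicative rather than authoritative; the excerpt's point is that the realistic frequency error scales with 1/(πT) and the noise-to-signal ratio, and is therefore typically several times smaller than the classical resolution 1/T.

```latex
\sigma(a) \simeq \sqrt{\tfrac{2}{N}}\,\sigma(m), \qquad
\sigma(f) \simeq \sqrt{\tfrac{6}{N}}\,\frac{1}{\pi T}\,\frac{\sigma(m)}{a}, \qquad
\sigma(\phi) \simeq \sqrt{\tfrac{2}{N}}\,\frac{\sigma(m)}{a}.
```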
  • Model Selection by Normalized Maximum Likelihood
    Myung, J. I., Navarro, D. J. and Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50, 167-179. http://dx.doi.org/10.1016/j.jmp.2005.06.008 Model selection by normalized maximum likelihood. Jay I. Myung, Danielle J. Navarro and Mark A. Pitt. Department of Psychology, Ohio State University; Department of Psychology, University of Adelaide, Australia. Corresponding author: Jay I. Myung. Abstract The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude 'two-part code' MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined 'universal code' MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided. Keywords: Minimum Description Length, Model Complexity, Inductive Inference, Cognitive Modeling. To select among competing models of a psychological process, one must decide which criterion to use to evaluate the models, and then make the best inference as to which model is preferable.
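As a concrete, deliberately tiny illustration of the NML idea summarized in the excerpt, the sketch below computes the normalized maximum likelihood of a binary sequence under a Bernoulli model, where the sum over all possible datasets (the parametric complexity) is tractable. This is a textbook-style example, not the cognitive-modeling application discussed in the paper.

```python
# NML for a Bernoulli model: NML(x) = max_p f(x|p) / sum_y max_p f(y|p).
# For n binary outcomes with k ones, max_p f(x|p) = (k/n)^k * (1 - k/n)^(n-k).
from math import comb, log2

def max_likelihood(k, n):
    if k in (0, n):
        return 1.0                                  # boundary cases: p-hat is 0 or 1
    p = k / n
    return p**k * (1 - p) ** (n - k)

def nml(k, n):
    # Parametric complexity: maximized likelihood summed over all datasets of length n.
    complexity = sum(comb(n, j) * max_likelihood(j, n) for j in range(n + 1))
    return max_likelihood(k, n) / complexity

n, k = 12, 9                                        # e.g. 9 ones observed in 12 trials
print("NML probability      :", nml(k, n))
print("NML code length (bits):", -log2(nml(k, n)))
```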
  • P Values, Hypothesis Testing, and Model Selection: It's Déjà Vu All Over Again
    FORUM P values, hypothesis testing, and model selection: it's déjà vu all over again It was six men of Indostan To learning much inclined, Who went to see the Elephant (Though all of them were blind), That each by observation Might satisfy his mind. ... And so these men of Indostan Disputed loud and long, Each in his own opinion Exceeding stiff and strong, Though each was partly in the right, And all were in the wrong! So, oft in theologic wars The disputants, I ween, Rail on in utter ignorance Of what each other mean, And prate about an Elephant Not one of them has seen! —From The Blind Men and the Elephant: A Hindoo Fable, by John Godfrey Saxe (1872) Even if you didn't immediately skip over this page (or the entire Forum in this issue of Ecology), you may still be asking yourself, "Haven't I seen this before? Do we really need another Forum on P values, hypothesis testing, and model selection?" So please bear with us; this elephant is still in the room. We thank Paul Murtaugh for the reminder and the invited commentators for their varying perspectives on the current shape of statistical testing and inference in ecology. Those of us who went through graduate school in the 1970s, 1980s, and 1990s remember attempting to coax another 0.001 out of SAS's P = 0.051 output (maybe if I just rounded to two decimal places ...), raising a toast to P = 0.0499 (and the invention of floating point processors), or desperately searching the back pages of Sokal and Rohlf for a different test that would cross the finish line and satisfy our dissertation committee.
  • 1 Likelihood 2 Maximum Likelihood Estimators 3 Properties of the Maximum Likelihood Estimators
    University of Colorado-Boulder, Department of Applied Mathematics, Prof. Vanja Dukic. Review: Likelihood. 1 Likelihood. Consider a set of iid random variables X = {X_i}, i = 1, ..., N, that come from some probability model M = {f(x; θ), θ ∈ Θ}, where f(x; θ) is the probability density of each X_i and θ is an unknown but fixed parameter from some space Θ ⊆ R^d. (Note: this would be a typical frequentist setting.) The likelihood function (Fisher, 1925, 1934) is defined as L(θ | X) = ∏_{i=1}^{N} f(X_i; θ), (1) or any other function of θ proportional to (1) (it does not need to be normalized). More generally, the likelihood function is any function proportional to the joint density of the sample, X = {X_i}_{i=1}^{N}, denoted as f(X; θ). Note you don't need to assume independence, nor identically distributed variables in the sample. The likelihood function is sacred to most statisticians. Bayesians use it jointly with the prior distribution π(θ) to derive the posterior distribution π(θ | X) and draw all their inference based on that posterior. Frequentists may derive their estimates directly from the likelihood function using maximum likelihood estimation (MLE). We say that a parameter is identifiable if for any two θ ≠ θ* we have L(θ | X) ≠ L(θ* | X). 2 Maximum likelihood estimators. One way of identifying (estimating) parameters is via the maximum likelihood estimators. Maximizing the likelihood function (or equivalently, the log likelihood function thanks to the monotonicity of the log transform), with respect to the parameter vector θ, yields the maximum likelihood estimator, denoted as θ̂.
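To make the definitions in the excerpt concrete, the short sketch below maximizes the log-likelihood of an i.i.d. exponential sample numerically and checks it against the closed-form MLE, the reciprocal of the sample mean. The data and true rate are illustrative placeholders.

```python
# Numerical maximum likelihood for an i.i.d. Exponential(rate) sample,
# compared with the closed-form MLE  rate_hat = 1 / mean(x).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=200)        # true rate = 2.5

def neg_log_lik(rate):
    # log L(rate | x) = n*log(rate) - rate*sum(x)
    return -(x.size * np.log(rate) - rate * x.sum())

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print("numerical MLE:", fit.x)
print("closed form  :", 1.0 / x.mean())
```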
  • SigSpec User's Manual
    Comm. in Asteroseismology, Volume 163, 2011, © Austrian Academy of Sciences. SigSpec User's Manual, P. Reegen†, Institut für Astronomie, Türkenschanzstraße 17, 1180 Vienna, Austria. Abstract SigSpec computes the spectral significance levels for the DFT amplitude spectrum of a time series at arbitrarily given sampling. It is based on the analytical solution for the Probability Density Function (PDF) of an amplitude level, including dependencies on frequency and phase and referring to white noise. Using a time series dataset as input, an iterative procedure including step-by-step prewhitening of the most significant signal components and MultiSine least-squares fitting is provided to determine a whole set of signal components, which makes the program a powerful tool for multi-frequency analysis. Instead of the step-by-step prewhitening of the most significant peaks, the program is also able to take into account several steps of the prewhitening sequence simultaneously and check for the combination associated to a minimum residual scatter. This option is designed to overcome the aliasing problem caused by periodic time gaps in the dataset. SigSpec can detect non-sinusoidal periodicities in a dataset by simultaneously taking into account a fundamental frequency plus a set of harmonics. Time-resolved spectral significance analysis using a set of intervals of the time series is supported to investigate the development of eigenfrequencies over the observation time. Furthermore, an extension is available to perform the SigSpec analysis for multiple time series input files at once. In this MultiFile mode, time series may be tagged as target and comparison data.
  • SigSpec I. Frequency- and Phase-Resolved Significance in Fourier Space (arXiv:physics/0703160v1 [physics.data-an], 15 Mar 2007)
    Astronomy & Astrophysics manuscript no. AA_2006_6597, © ESO 2018, October 15, 2018. SigSpec I. Frequency- and Phase-Resolved Significance in Fourier Space. P. Reegen, Institut für Astronomie, Universität Wien, Türkenschanzstraße 17, 1180 Vienna, Austria, e-mail: [email protected]. Received October 19, 2006; accepted March 6, 2007. ABSTRACT Context. Identifying frequencies with low signal-to-noise ratios in time series of stellar photometry and spectroscopy, and measuring their amplitude ratios and peak widths accurately, are critical goals for asteroseismology. These are also challenges for time series with gaps or whose data are not sampled at a constant rate, even with modern Discrete Fourier Transform (DFT) software. Also the False-Alarm Probability introduced by Lomb and Scargle is an approximation which becomes less reliable in time series with longer data gaps. Aims. A rigorous statistical treatment of how to determine the significance of a peak in a DFT, called SigSpec, is presented here. SigSpec is based on an analytical solution of the probability that a DFT peak of a given amplitude does not arise from white noise in a non-equally spaced data set. Methods. The underlying Probability Density Function (PDF) of the amplitude spectrum generated by white noise can be derived explicitly if both frequency and phase are incorporated into the solution. In this paper, I define and evaluate an unbiased statistical estimator, the "spectral significance", which depends on frequency, amplitude, and phase in the DFT, and which takes into account the time-domain sampling. Results. I also compare this estimator to results from other well established techniques and assess the advantages of SigSpec, through comparison of its analytical solutions to the results of extensive numerical calculations.