On Variance Estimation Under Shifts in the Mean

Total Page:16

File Type:pdf, Size:1020Kb

On Variance Estimation Under Shifts in the Mean AStA Advances in Statistical Analysis https://doi.org/10.1007/s10182-020-00366-5 ORIGINAL PAPER On variance estimation under shifts in the mean Ieva Axt1 · Roland Fried1 Received: 18 April 2019 / Accepted: 18 March 2020 © The Author(s) 2020 Abstract In many situations, it is crucial to estimate the variance properly. Ordinary variance estimators perform poorly in the presence of shifts in the mean. We investigate an approach based on non-overlapping blocks, which yields good results in change- point scenarios. We show the strong consistency and the asymptotic normality of such blocks-estimators of the variance under independence. Weak consistency is shown for short-range dependent strictly stationary data. We provide recommenda- tions on the appropriate choice of the block size and compare this blocks-approach with diference-based estimators. If level shifts occur frequently and are rather large, the best results can be obtained by adaptive trimming of the blocks. Keywords Blockwise estimation · Change-point · Trimmed mean 1 Introduction We consider a sequence of random variables Y1, … , YN generated by the model K Y = X + h I . t t k t≥tk (1) k=1 Most of the time we assume that X1, … , XN are i.i.d. random variables with 2 E Xt = and Var Xt = , but this will be relaxed occasionally to allow for a short-range dependent strictly stationary sequence. The observed data y1, … , yN are afected by an unknown number K of level shifts of possibly diferent heights h1, … , hK at diferent time points t1, … , tK . Our goal is the estimation of the vari- 2 ance . Without loss of generality, we will set = 0 in the following. * Ieva Axt [email protected] Roland Fried [email protected] 1 TU Dortmund University, Dortmund, Germany Vol.:(0123456789)1 3 I. Axt, R. Fried 2 In Sect. 2 we analyse estimators of from the sequence of observations (Yt)t≥1 by combining estimates obtained from splitting the data into several blocks. Without the need of explicit distributional assumptions the mean of the blockwise estimates turns out to be consistent if the size and the number of blocks increases, and the number of jumps increases slower than the number of blocks. If many jumps in the mean are expected to occur, an adaptively trimmed mean of the blockwise estimates can be used, see Sect. 3. In Sect. 4 a simulation study is conducted to assess the performance of the proposed approaches. In Sect. 5 the estimation procedures are applied to real data sets, while Sect. 6 summarizes the results of this paper. 2 Estimation of the variance by averaging When dealing with independent identically distributed data the sample variance is the 2 common choice for estimation of . However, if we are aware of a possible presence of level shifts at unknown locations, it is reasonable to divide the sample Y1, … , YN into m non-overlapping blocks of size n = N∕m and to calculate the average of the m sample variances derived from the diferent blocks. A similar approach has been used in Dai et al. (2015) in the context of repeated⌊ measurements⌋ data and in Rooch et al. (2019) for estimation of the Hurst parameter. 2 The blocks-estimator Mean of the variance investigated here is defned as m 1 2 = S2, Mean j (2) m j=1 S2 = 1 n (Y − Y )2 Y = 1 n Y Y , … , Y where j n−1 t=1 j,t j , j n t=1 j,t and j,1 j,n are the observa- j n tions in the th block.∑ We are interested in∑ fnding the block size which yields a low mean squared error (MSE) under certain assumptions. In what follows, we will concentrate on the situation where all jump heights are pos- itive. This is a worse scenario than having both, positive and negative jumps, since the data are more spread in the former case resulting in a larger positive bias of most scale estimators. 2.1 Asymptotic properties We will use some algebraic rules for derivation of the expectation and the variance of 2 quadratic forms in order to calculate the MSE of Mean , see Seber and Lee (2012). Let B be the number of blocks with jumps in the mean and K ≥ B the total number of 2 jumps. The expected value and the variance of Mean are given as follows: 1 3 On variance estimation under shifts in the mean B 2 2 1 T E Mean = + j Aj, m(n − 1) j=1 B 1 4(n − 3) 42 Var 2 = 4 − + TA , Mean m n n(n − 1) m2(n − 1)2 j j j=1 = E X4 , A = − 1 1 1T 1 =(1, … ,1)T where 4 1 n n n n , n is the unit matrix, n and j con- tains the expected values of the random variables in the perturbed block j = 1, … , B , T T T i.e., j =(j,1, … , j,n) =(E(Yj,1), … , E(Yj,n)) . The term j Aj∕(n − 1) is the empirical variance of the expected values E(Yj,1), … , E(Yj,n) in block j. In a jump- T free block, we have j Aj = 0 , since all expected values and therefore the elements of j are equal. The blocks-estimator (2) estimates the variance consistently if the number of blocks grows sufciently fast as is shown in Theorem 1. Theorem 1 Y , … , Y Y = X + K h I Let 1 N with t t k=1 k t≥tk from Model (1) be segregated into m blocks of size n, where t1, … , tK are the time points of the jumps of size ∑ h1, … , hK , respectively. Let B out of m blocks be contaminated by K1, … , KB jumps, B 4 respectively with Kj = K = K(N) Moreover let E( X1 ) < ∞ , 2 j=1 . , , K K h = o(m) m → ∞ k=1 k and∑ , whereas the block size n can be fxed or increas- m ing� as N →�∞ Then 2 = 1 S2 → 2 almost surely ∑ . Mean m j=1 j . ∑ Proof Without loss of generality assume that the frst B out of m blocks are contami- 2 nated by K1, … , KB jumps, respectively. Let the term Sj,0 denote the empirical vari- 2 ance of the uncontaminated data in block j, while Sj,h is the empirical variance when Kj level shifts are present. Moreover, Yj,1, … , Yj,n are the observations in the jth = E Y = 1 n E Y block, j,t j,t and j n t=1 j,t . Then we have m ∑ m � � B 2 1 2 1 2 1 2 Mean = Sj = Sj,0 + Sj,h m j=1 m j=B+1 m j=1 m B n 2 1 2 1 1 = Sj,0 + Xj,t + j,t − Xj − j m j=B+1 m j=1 n − 1 t=1 m B n (3) 1 2 1 2 = Sj,0 + (Xj,t − Xj)(j,t − j) m j=1 m j=1 n − 1 t=1 B n 1 1 2 + j,t − j . m j=1 n − 1 t=1 For the second term in the last Eq. (3), we have almost surely 1 3 I. Axt, R. Fried B n 2 (X − X )( − ) m(n − 1) j,t j j,t j j=1 t=1 B n 2 ≤ (X − X ) ( − ) j,t j j,t j m(n − 1) j=1 t=1 B n K (4) 2 ≤ (X − X ) h m(n − 1) j,t j k j=1 t=1 k=1 K B n 2 n 1 1 = B h (X − X ) ⟶ 0. k m n − 1 B n j,t j k=1 j=1 t=1 1 B 1 n The term j=1 t=1 (Xj,t− Xj) in (4) is a random variable with fnite moments B n if n and B are fxed. This random variable converges to the term ∑ ∑ � � E 1 n (X − X ) � � B → ∞ B → ∞ n → ∞ n t=1 j,t j almost� surely� if . In the case of and � �E X this term∑ converges� to� 1 almost surely due to Theorem 2 of Hu et al. (1989) � E�( X 4) < ∞ S2 − E S2 and the condition� � 1 , since j j are uniformly bounded with 2 2 → 2 → P( Sj − E Sj > t) 0 ∀t due to Chebyshev’s inequality and Var(Sj ) 0 . More- K ≤ K over, we used the fact that B k=1 hk K k=1 hk = o(m). The following is valid for the third term in (3): �∑ � �∑ � � � � � B n � � � B � n K 2 1 1 2 1 1 − ≤ h m n 1 j,t j m n 1 k j=1 − t=1 j=1 − t=1 k=1 K 2 B n = h ⟶ 0. m n 1 k − k=1 2 The frst term of the last equation in (3) converges almost surely to due to the results on triangular arrays in Theorem 2 of Hu et al. (1989), assuming that the con- 4 2 2 dition E( X1 ) < ∞ holds, since Sj − E Sj are uniformly bounded with 2 2 → 2 → P( Sj − E Sj > t) 0 ∀t due to Chebyshev’s inequality and Var(Sj ) 0 . Appli- cation of Slutsky’s Theorem proves the result. ◻ Remark 2 1. If the jump heights are bounded by a constant h ≥ hk, k = 1, … , K, the strongest restriction arises2 if all heights equal this upper bound resulting in the constraint K 3 2 K k=1 hk = K h = o(m) .
Recommended publications
  • Efficient Inflation Estimation
    1 Efficient Inflation Estimclticn by Michael F. Bryan, Stephen G. Cecchetti, and Rodney L. Wiggins II Working Paper 9707 EFFICIENT JIVFLATION ESTIMATION by Michael F. Bryan, Stephen G. Cecchetti, and Rodney L. Wiggins I1 Michael F. Bryan is assistant vice president and economist at the Federal Reserve Bank of Cleveland. Stephen G. Cecchetti is executive vice president and director of research at the Federal Reserve Bank of New York and a research associate at the National Bureau of Econonlic Research. Rodney L. Wiggins I1 is a member of the Research Department of the Federal Reserve Bank of Cleveland. The authors gratefully acknowledge the comments and assistance of Todd Clark, Ben Craig, and Scott Roger. Working papers of the Federal Reserve Bank of Cleveland are preliminary materials circulated to stimulate discussion and critical comment. The views stated herein are those of the authors and are not necessarily those of the Federal Reserve Bank of Cleveland or of the Board of Governors of the Federal Reserve System. Federal Reserve Bank of Cleveland working papers are distributed for the purpose of promoting discussion of research in progress. These papers may not have been subject to the formal editorial review accorded official Federal Reserve Bank of Cleveland publications. Working papers are now available electronically through the Cleveland Fed's home page on the World Wide Web: http:Ilwww.clev.frb.org. August 1997 Abstract This paper investigates the use of trimmed means as high-frequency estimators of inflation. The known characteristics of price change distributions, specifically the . observation that they generally exhibit high levels of kurtosis, imply that simple averages of price data are unlikely to produce efficient estimates of inflation.
    [Show full text]
  • A Globally Optimal Solution to 3D ICP Point-Set Registration
    1 Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration Jiaolong Yang, Hongdong Li, Dylan Campbell, and Yunde Jia Abstract—The Iterative Closest Point (ICP) algorithm is one of the most widely used methods for point-set registration. However, being based on local iterative optimization, ICP is known to be susceptible to local minima. Its performance critically relies on the quality of the initialization and only local optimality is guaranteed. This paper presents the first globally optimal algorithm, named Go-ICP, for Euclidean (rigid) registration of two 3D point-sets under the L2 error metric defined in ICP. The Go-ICP method is based on a branch-and-bound (BnB) scheme that searches the entire 3D motion space SE(3). By exploiting the special structure of SE(3) geometry, we derive novel upper and lower bounds for the registration error function. Local ICP is integrated into the BnB scheme, which speeds up the new method while guaranteeing global optimality. We also discuss extensions, addressing the issue of outlier robustness. The evaluation demonstrates that the proposed method is able to produce reliable registration results regardless of the initialization. Go-ICP can be applied in scenarios where an optimal solution is desirable or where a good initialization is not always available. Index Terms—3D point-set registration, global optimization, branch-and-bound, SE(3) space search, iterative closest point F 1 INTRODUCTION given an initial transformation (rotation and transla- Point-set registration is a fundamental problem in tion), it alternates between building closest-point cor- computer and robot vision.
    [Show full text]
  • Trimmed Mean Group Estimation
    Trimmed Mean Group Estimation Yoonseok Lee∗ Donggyu Sul† Syracuse University University of Texas at Dallas May 2020 Abstract This paper develops robust panel estimation in the form of trimmed mean group estimation. It trims outlying individuals of which the sample variances of regressors are either extremely small or large. As long as the regressors are statistically independent of the regression coefficients, the limiting distribution of the trimmed estimator can be obtained in a similar way to the standard mean group estimator. We consider two trimming methods. The first one is based on the order statistic of the sample variance of each regressor. The second one is based on the Mahalanobis depth of the sample variances of regressors. We apply them to the mean group estimation of the two-way fixed effect model with potentially heterogeneous slope parameters and to the commonly correlated regression, and we derive limiting distribution of each estimator. As an empirical illustration, we consider the effect of police on property crime rates using the U.S. state-level panel data. Keywords: Trimmed mean group estimator, Robust estimator, Heterogeneous panel, Two- way fixed effect, Commonly correlated estimator JEL Classifications: C23, C33 ∗Address: Department of Economics and Center for Policy Research, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244. E-mail: [email protected] †Address: Department of Economics, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080. E-mail: [email protected] 1Introduction Though it is popular in practice to impose homogeneity in panel data regression, the homogeneity restriction is often rejected (e.g., Baltagi, Bresson, and Pirotte, 2008).
    [Show full text]
  • Programs Tramo (Time Series Regression with Arima Noise
    PROGRAMS TRAMO (TIME SERIES REGRESSION WITH ARIMA NOISE, MISSING OBSERVATIONS AND OUTLIERS) AND SEATS (SIGNALEXTRACTIONS AREMA TIME SERIES) INSTRUCTIONS FOR THE USER (BETA VERSION: JUNIO 1997) Víctor Gómez * Agustín Maravall ** SGAPE-97001 Julio 1997 * Ministerio de Economía y Hacienda ** Banco de España This software is available at the following Internet address: http://www.bde.es The Working Papers of the Dirección General de Análisis y Programación Presupuestaria are not official statements of the Ministerio de Economía y Hacienda Abstract Brief summaries and user Instructions are presented for the programs-* TRAMO ("Time 'Series Regression with ARTMA Noise, Missing Observations and Outliers") and SEATS ("Signal Extraction in ARIMA Time Series"). TRAMO is a program for estimation and forecasting of regression models with possibly nonstationary (ARIMA) errors and any sequence of missing val- ues. The program interpolates these values, identifies and corrects for several types of outliers, and estimates special effects such as Trading Day and Easter and, in general, intervention variable type of effects. Fully automatic model identification and outlier correction procedures are available. SEATS is a program for estimation of unobserved components in time se- ries following the so-called ARlMA-model-based method. The trend, seasonal, irregular, and cyclical components are estimated and forecasted with signal extraction techniques applied to ARIMA models. The standard errors of the es- timates and forecasts are obtained and the model-based structure is exploited to answer questions of interest in short-term analysis of the data. The two programs are structured so as to be used together both for in- depth analysis of a few series (as presently done at the Bank of Spain) or for automatic routine applications to a large number of series (as presently done at Eurostat).
    [Show full text]
  • Learning Entangled Single-Sample Distributions Via Iterative Trimming
    Learning Entangled Single-Sample Distributions via Iterative Trimming Hui Yuan Yingyu Liang Department of Statistics and Finance Department of Computer Sciences University of Science and Technology of China University of Wisconsin-Madison Abstract have n data points from n different distributions with a common mean but different variances (the mean and all the variances are unknown), and our goal is to es- In the setting of entangled single-sample dis- timate the mean. tributions, the goal is to estimate some com- mon parameter shared by a family of distri- This setting is motivated for both theoretical and prac- butions, given one single sample from each tical reasons. From the theoretical perspective, it goes distribution. We study mean estimation and beyond the typical i.i.d. setting and raises many inter- linear regression under general conditions, esting open questions, even on basic topics like mean and analyze a simple and computationally ef- estimation for Gaussians. It can also be viewed as ficient method based on iteratively trimming a generalization of the traditional mixture modeling, samples and re-estimating the parameter on since the number of distinct mixture components can the trimmed sample set. We show that the grow with the number of samples. From the practical method in logarithmic iterations outputs an perspective, many modern applications have various estimation whose error only depends on the forms of heterogeneity, for which the i.i.d. assumption noise level of the dαne-th noisiest data point can lead to bad modeling of their data. The entan- where α is a constant and n is the sample gled single-sample setting provides potentially better size.
    [Show full text]
  • Power-Shrinkage and Trimming: Two Ways to Mitigate Excessive Weights
    ASA Section on Survey Research Methods Power-Shrinkage and Trimming: Two Ways to Mitigate Excessive Weights Chihnan Chen1, Nanhua Duan2, Xiao-Li Meng3, Margarita Alegria4 Boston University1 UCLA2 Harvard University3 Cambridge Health Alliance and Harvard Medical School4 Abstract 2 Power-Shrinkage and Trimming Large-scale surveys often produce raw weights with very The survey weights are constructed as the reciprocals of large variations. A standard approach is to perform the selection probabilities. Let w be the survey weight, n some form of trimming, as a way to reduce potentially be the sample size, and y be the variable of interest. Let large variances in various survey estimators. The amount p 2 [0; 1] be the power-shrinkage parameter and T 2 [0; 1] of trimming is usually determined by considerations of be the trimming threshold de¯ned in terms of percentile. bias-variance trade-o®. While bias-variance trade-o® is The original weighted estimator, power-shrinkage estima- a sound principle, the trimming method itself is popular tor and the trimmed estimator for the expectation, E(y), only because of its simplicity, not because it has statisti- are given by cally desirable properties. In this paper we investigate a Xn more principled method by shrinking the variability of the wiyi log of the weights. We use the same mean squared error ² Original weighted estimator :y ¹ = i=1 (MSE) criterion to determine the amount of shrinking. w Xn The shrinking on the log scale implies shrinking weights wi by a power parameter p between [0,1]. Our investiga- i=1 tion suggests an empirical way to predict the optimal Xn choice.
    [Show full text]
  • 26 Sep 2019 the Trimmed Mean in Non-Parametric Regression
    The Trimmed Mean in Non-parametric Regression Function Estimation Subhra Sankar Dhar1, Prashant Jha2, and Prabrisha Rakhshit3 1Department of Mathematics and Statistics, IIT Kanpur, India 2Department of Mathematics and Statistics, IIT Kanpur, India 3Department of Statistics, Rutgers University, USA September 27, 2019 Abstract This article studies a trimmed version of the Nadaraya-Watson estimator to estimate the unknown non-parametric regression function. The characterization of the estimator through minimization problem is established, and its pointwise asymptotic distribution is derived. The robustness property of the proposed estimator is also studied through breakdown point. Moreover, as the trimmed mean in the location model, here also for a wide range of trim- ming proportion, the proposed estimator poses good efficiency and high breakdown point for various cases, which is out of the ordinary property for any estimator. Furthermore, the usefulness of the proposed estimator is shown for three benchmark real data and various simulated data. Keywords: Heavy-tailed distribution; Kernel density estimator; L-estimator; Nadaraya-Watson estimator; Robust estimator. arXiv:1909.10734v2 [math.ST] 26 Sep 2019 1 Introduction We have a random sample (X1,Y1),..., (Xn,Yn), which are i.i.d. copies of (X,Y ), and the regres- sion of Y on X is defined as Yi = g(Xi)+ ei, i =1, . , n, (1) where g( ) is unknown, and e are independent copies of error random variable e with E(e X)=0 · i | and var(e X = x)= σ2(x) < for all x. Note that the condition E(e X) = 0 is essentially the | ∞ | 2000 AMS Subject Classification: 62G08 1 identifiable condition for mean regression, and it varies over the different procedures of regression.
    [Show full text]
  • Measuring Core Inflation in India: an Application of Asymmetric-Trimmed Mean Approach
    Measuring Core Inflation in India: An Application of Asymmetric-Trimmed Mean Approach Motilal Bicchal* Naresh Kumar Sharma ** Abstract The paper seek to obtain an optimal asymmetric trimmed means-core inflation measure in the class of trimmed means measures when the distribution of price changes is leptokurtic and skewed to the right for any given period. Several estimators based on asymmetric trimmed mean approach are constructed, and evaluated by the conditions set out in Marques et al. (2000). The data used in the study is the monthly 69 individual price indices which are constituent components of Wholesale Price Index (WPI) and covers the period, April 1994 to April 2009, with 1993-94 as the base year. Results of the study indicate that an optimally trimmed estimator is found when we trim 29.5 per cent from the left-hand tail and 20.5 per cent from the right-hand tail of the distribution of price changes. Key words: Core inflation, underlying inflation, asymmetric trimmed mean JEL Classification: C43, E31, E52 * Ph.D Research scholar and ** Professor at Department of Economics, University of Hyderabad, P.O Central University, Hyderabad-500046, India. Corresponding author: [email protected] We are very grateful to Carlos Robalo Marques of Banco de Portugal for providing the program for computing the asymmetric trimmed-mean as well as for his valuable comments and suggestions. We also wish to thank Bandi Kamaiah for going through previous draft and for offering comments, which improved the presentation of this paper. 1. Introduction Various approaches to measuring core inflation have been discussed in the literature.
    [Show full text]
  • Robust Statistics
    Robust statistics Stéphane Paltani Why robust statistics? Removal of data Robust statistics L-estimators and some non-parametric statistics R-estimators M-estimators Stéphane Paltani ISDC Data Center for Astrophysics Astronomical Observatory of the University of Geneva Statistics Course for Astrophysicists, 2010–2011 Outline Robust statistics Stéphane Paltani Why robust statistics? Removal of data Why robust statistics? L-estimators R-estimators Removal of data M-estimators L-estimators R-estimators M-estimators Caveat Robust statistics Stéphane Paltani Why robust statistics? Removal of data The presentation aims at being user-focused and at L-estimators presenting usable recipes R-estimators Do not expect a fully mathematically rigorous description! M-estimators This has been prepared in the hope to be useful even in the stand-alone version Please provide me feedback with any misconception or error you may find, and with any topic not addressed here but which should be included Outline Robust statistics Stéphane Paltani Why robust statistics? Why robust statistics? First example General concepts First example Data representation General concepts Removal of data Data representation L-estimators R-estimators M-estimators Removal of data L-estimators R-estimators M-estimators Example: The average weight of opossums Robust statistics Stéphane Paltani Why robust statistics? First example General concepts Data representation Removal of data L-estimators R-estimators M-estimators I Find a group of N opossums I Weight them: wi ; i = 1;:::; N
    [Show full text]
  • Configurable FPGA-Based Outlier Detection for Time Series Data
    Configurable FPGA-Based Outlier Detection for Time Series Data by Ghazaleh Vazhbakht, B.Sc. A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering Ottawa-Carleton Institute for Electrical and Computer Engineering Carleton University Ottawa, Ontario c 2017 Ghazaleh Vazhbakht TABLE OF CONTENTS Page LIST OF FIGURES iv LIST OF TABLES v LIST OF ACRONYMS vi ACKNOWLEDGMENTS vii ABSTRACT viii 1 Introduction 1 1.1 Outlier Detection in Time Series . 2 1.2 Thesis Objective and Contributions . 3 2 Background 5 2.1 Definition of Outlier . 5 2.2 Outlier Detection . 6 2.3 Scope of Work . 8 2.3.1 Types of Outliers . 10 2.3.2 Techniques Based on Outlier Aware Prediction Models . 13 2.3.3 Managing Detected Outliers . 14 2.4 FPGA-based Outlier Detection Techniques . 15 3 Theory and Reduction to Practice 20 3.1 ARMA Modeling . 21 3.1.1 Model Selection . 23 3.1.2 Parameter Estimation . 26 3.1.3 Model Checking . 35 3.2 Test Statistics . 35 3.2.1 Critical Value . 36 3.3 Outlier Detection Algorithm . 37 3.3.1 Estimating Outliers Effects . 39 3.3.2 Locating and Identifying Outliers . 41 3.3.3 Residual Standard Deviation . 42 ii 3.3.4 The Procedure of Joint Estimation of Model Parameters and Outlier Effects . 43 4 Design and Implementation 48 4.1 Profiling the Outlier Detection Algorithm . 49 4.2 MATLAB Implementation of the Simplified Algorithm . 50 4.3 Hardware Implementation of the Outlier Detection Algorithm .
    [Show full text]
  • Outlier Detection and Trimmed Estimation for General Functional Data
    Outlier detection and trimmed estimation for general functional data Daniel Gervini Department of Mathematical Sciences University of Wisconsin–Milwaukee P.O. Box 413, Milwaukee, WI 53201 [email protected] December 3, 2012 arXiv:1001.1014v2 [stat.ME] 30 Nov 2012 Abstract This article introduces trimmed estimators for the mean and covariance function of general functional data. The estimators are based on a new measure of “outlying- ness” or data depth that is well defined on any metric space, although this paper focuses on Euclidean spaces. We compute the breakdown point of the estimators and show that the optimal breakdown point is attainable for the appropriate choice of tuning parameters. The small-sample behavior of the estimators is studied by simulation, and we show that they have better outlier-resistance properties than alternative estimators. This is confirmed by two real-data applications, that also show that the outlyingness measure can be used as a graphical outlier-detection tool in functional spaces where visual screening of the data is difficult. Key Words: Breakdown Point; Data Depth; Robust Statistics; Stochastic Pro- cesses. 1 Introduction Many statistical applications today involve data that does not fit into the classi- cal univariate or multivariate frameworks; for example, growth curves, spectral curves, and time-dependent gene expression profiles. These are samples of func- tions, rather than numbers or vectors. We can think of them as realizations of a stochastic process with sample paths in L 2(R), the space of square-integrable functions. The statistical analysis of function-valued data has received a lot of at- tention in recent years (see e.g.
    [Show full text]
  • Robust Mean Estimation for Real-Time Blanking in Radioastronomy
    ROBUST MEAN ESTIMATION FOR REAL-TIME BLANKING IN RADIOASTRONOMY Philippe Ravier1, Cedric Dumez-Viou1−2 1Laboratory of Electronics, Signals, Images, Polytech'Orleans´ - University of Orleans 12, rue de Blois, BP 6744 Cedex 2, F-45067 Orleans,´ France 2Observatoire de Paris/(CNRS No.704), Station de radioastronomie, F-18330 Nanc¸ay, France phone: + (33).2.38.49.48.63, fax: + (33).2.38.41.72.45, email: [email protected] web: www.obs-nancay.fr/rfi ABSTRACT assumption does not hold anymore and the exact PDF must be taken into account for precise threshold derivation. Radio astronomy observations suffer from strong interfer- ences that need to be blanked both in the time and frequency domains. In order to achieve real-time computations, in- 2. PROCEDURE USED FOR REAL-TIME terference detection is made by simple thresholding. The In order to achieve real-time computation, a simple thresh- threshold value is linked to the mean estimation of the power olding drives the time and/or frequency blanking decision. 2 of clean observed data that follows a χ distribution. Spu- The problem is formulated as the following binary hy- rious values insensitivity is obtained by replacing mean by pothesis test: robust mean. A theoretical study of the variance of three es- timators of the mean is presented. The study leads to a prac- tical trimmed percentage that is chosen for the robust mean. H0 : the SOI only is present, H1 : the SOI is polluted with additive interference. 1. INTRODUCTION The SOI is supposed to be Gaussian. On the other hand, Radioastronomy benefits from protected bands for the study no a priori information about the interference PDF is avail- of radio-sources.
    [Show full text]