Lecture 10 Robust and Quantile Regression
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  A Generalized Linear Model for Principal Component Analysis of Binary DataA Generalized Linear Model for Principal Component Analysis of Binary Data Andrew I. Schein Lawrence K. Saul Lyle H. Ungar Department of Computer and Information Science University of Pennsylvania Moore School Building 200 South 33rd Street Philadelphia, PA 19104-6389 {ais,lsaul,ungar}@cis.upenn.edu Abstract they are not generally appropriate for other data types. Recently, Collins et al.[5] derived generalized criteria We investigate a generalized linear model for for dimensionality reduction by appealing to proper- dimensionality reduction of binary data. The ties of distributions in the exponential family. In their model is related to principal component anal- framework, the conventional PCA of real-valued data ysis (PCA) in the same way that logistic re- emerges naturally from assuming a Gaussian distribu- gression is related to linear regression. Thus tion over a set of observations, while generalized ver- we refer to the model as logistic PCA. In this sions of PCA for binary and nonnegative data emerge paper, we derive an alternating least squares respectively by substituting the Bernoulli and Pois- method to estimate the basis vectors and gen- son distributions for the Gaussian. For binary data, eralized linear coefficients of the logistic PCA the generalized model's relationship to PCA is anal- model. The resulting updates have a simple ogous to the relationship between logistic and linear closed form and are guaranteed at each iter- regression[12]. In particular, the model exploits the ation to improve the model's likelihood. We log-odds as the natural parameter of the Bernoulli dis- evaluate the performance of logistic PCA|as tribution and the logistic function as its canonical link.
- 
												  Robust Statistics Part 3: Regression AnalysisRobust Statistics Part 3: Regression analysis Peter Rousseeuw LARS-IASC School, May 2019 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 1 Linear regression Linear regression: Outline 1 Classical regression estimators 2 Classical outlier diagnostics 3 Regression M-estimators 4 The LTS estimator 5 Outlier detection 6 Regression S-estimators and MM-estimators 7 Regression with categorical predictors 8 Software Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 2 Linear regression Classical estimators The linear regression model The linear regression model says: yi = β0 + β1xi1 + ... + βpxip + εi ′ = xiβ + εi 2 ′ ′ with i.i.d. errors εi ∼ N(0,σ ), xi = (1,xi1,...,xip) and β =(β0,β1,...,βp) . ′ Denote the n × (p + 1) matrix containing the predictors xi as X =(x1,..., xn) , ′ ′ the vector of responses y =(y1,...,yn) and the error vector ε =(ε1,...,εn) . Then: y = Xβ + ε Any regression estimate βˆ yields fitted values yˆ = Xβˆ and residuals ri = ri(βˆ)= yi − yˆi . Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 3 Linear regression Classical estimators The least squares estimator Least squares estimator n ˆ 2 βLS = argmin ri (β) β i=1 X If X has full rank, then the solution is unique and given by ˆ ′ −1 ′ βLS =(X X) X y The usual unbiased estimator of the error variance is n 1 σˆ2 = r2(βˆ ) LS n − p − 1 i LS i=1 X Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 4 Linear regression Classical estimators Outliers in regression Different types of outliers: vertical outlier good leverage point • • y • • • regular data • ••• • •• ••• • • • • • • • • • bad leverage point • • •• • x Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p.
- 
												  4.2 Variance and Covariance4.2 Variance and Covariance The most important measure of variability of a random variable X is obtained by letting g(X) = (X − µ)2 then E[g(X)] gives a measure of the variability of the distribution of X. 1 Definition 4.3 Let X be a random variable with probability distribution f(x) and mean µ. The variance of X is σ 2 = E[(X − µ)2 ] = ∑(x − µ)2 f (x) x if X is discrete, and ∞ σ 2 = E[(X − µ)2 ] = ∫ (x − µ)2 f (x)dx if X is continuous. −∞ ¾ The positive square root of the variance, σ, is called the standard deviation of X. ¾ The quantity x - µ is called the deviation of an observation, x, from its mean. 2 Note σ2≥0 When the standard deviation of a random variable is small, we expect most of the values of X to be grouped around mean. We often use standard deviations to compare two or more distributions that have the same unit measurements. 3 Example 4.8 Page97 Let the random variable X represent the number of automobiles that are used for official business purposes on any given workday. The probability distributions of X for two companies are given below: x 12 3 Company A: f(x) 0.3 0.4 0.3 x 01234 Company B: f(x) 0.2 0.1 0.3 0.3 0.1 Find the variances of X for the two companies. Solution: µ = 2 µ A = (1)(0.3) + (2)(0.4) + (3)(0.3) = 2 B 2 2 2 2 σ A = (1− 2) (0.3) + (2 − 2) (0.4) + (3− 2) (0.3) = 0.6 2 2 4 σ B > σ A 2 2 4 σ B = ∑(x − 2) f (x) =1.6 x=0 Theorem 4.2 The variance of a random variable X is σ 2 = E(X 2 ) − µ 2.
- 
												  5. the Student T DistributionVirtual Laboratories > 4. Special Distributions > 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 5. The Student t Distribution In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of a standardized version of the sample mean when the underlying distribution is normal. The Probability Density Function Suppose that Z has the standard normal distribution, V has the chi-squared distribution with n degrees of freedom, and that Z and V are independent. Let Z T= √V/n In the following exercise, you will show that T has probability density function given by −(n +1) /2 Γ((n + 1) / 2) t2 f(t)= 1 + , t∈ℝ ( n ) √n π Γ(n / 2) 1. Show that T has the given probability density function by using the following steps. n a. Show first that the conditional distribution of T given V=v is normal with mean 0 a nd variance v . b. Use (a) to find the joint probability density function of (T,V). c. Integrate the joint probability density function in (b) with respect to v to find the probability density function of T. The distribution of T is known as the Student t distribution with n degree of freedom. The distribution is well defined for any n > 0, but in practice, only positive integer values of n are of interest. This distribution was first studied by William Gosset, who published under the pseudonym Student. In addition to supplying the proof, Exercise 1 provides a good way of thinking of the t distribution: the t distribution arises when the variance of a mean 0 normal distribution is randomized in a certain way.
- 
												  Theoretical Statistics. Lecture 20Theoretical Statistics. Lecture 20. Peter Bartlett 1. Recall: Functional delta method, differentiability in normed spaces, Hadamard derivatives. [vdV20] 2. Quantile estimates. [vdV21] 3. Contiguity. [vdV6] 1 Recall: Differentiability of functions in normed spaces Definition: φ : D E is Hadamard differentiable at θ D tangentially → ∈ to D D if 0 ⊆ φ′ : D E (linear, continuous), h D , ∃ θ 0 → ∀ ∈ 0 if t 0, ht h 0, then → k − k → φ(θ + tht) φ(θ) − φ′ (h) 0. t − θ → 2 Recall: Functional delta method Theorem: Suppose φ : D E, where D and E are normed linear spaces. → Suppose the statistic Tn : Ωn D satisfies √n(Tn θ) T for a random → − element T in D D. 0 ⊂ If φ is Hadamard differentiable at θ tangentially to D0 then ′ √n(φ(Tn) φ(θ)) φ (T ). − θ If we can extend φ′ : D E to a continuous map φ′ : D E, then 0 → → ′ √n(φ(Tn) φ(θ)) = φ (√n(Tn θ)) + oP (1). − θ − 3 Recall: Quantiles Definition: The quantile function of F is F −1 : (0, 1) R, → F −1(p) = inf x : F (x) p . { ≥ } Quantile transformation: for U uniform on (0, 1), • F −1(U) F. ∼ Probability integral transformation: for X F , F (X) is uniform on • ∼ [0,1] iff F is continuous on R. F −1 is an inverse (i.e., F −1(F (x)) = x and F (F −1(p)) = p for all x • and p) iff F is continuous and strictly increasing. 4 Empirical quantile function For a sample with distribution function F , define the empirical quantile −1 function as the quantile function Fn of the empirical distribution function Fn.
- 
												  Best Representative Value” for Daily Total Column Ozone ReportingAtmos. Meas. Tech., 10, 4697–4704, 2017 https://doi.org/10.5194/amt-10-4697-2017 © Author(s) 2017. This work is distributed under the Creative Commons Attribution 4.0 License. A more representative “best representative value” for daily total column ozone reporting Andrew R. D. Smedley, John S. Rimmer, and Ann R. Webb Centre for Atmospheric Science, University of Manchester, Manchester, M13 9PL, UK Correspondence to: Andrew R. D. Smedley ([email protected]) Received: 8 June 2017 – Discussion started: 13 July 2017 Revised: 20 October 2017 – Accepted: 20 October 2017 – Published: 5 December 2017 Abstract. Long-term trends of total column ozone (TCO), 1 Introduction assessments of stratospheric ozone recovery, and satellite validation are underpinned by a reliance on daily “best repre- Global ground-based monitoring of total column ozone sentative values” from Brewer spectrophotometers and other (TCO) relies on the international network of Brewer spec- ground-based ozone instruments. In turn reporting of these trophotometers since they were first developed in the 1980s daily total column ozone values to the World Ozone and Ul- (Kerr et al., 1981), which has expanded the number of traviolet Radiation Data Centre (WOUDC) has traditionally sites and measurement possibilities from their still-operating been predicated upon a simple choice between direct sun predecessor instrument, the Dobson spectrophotometer. To- (DS) and zenith sky (ZS) observations. For mid- and high- gether these networks provide validation of satellite-retrieved latitude monitoring sites impacted by cloud cover we dis- total column ozone as well as instantaneous point measure- cuss the potential deficiencies of this approach in terms of ments that have value for near-real-time low-ozone alerts, its rejection of otherwise valid observations and capability particularly when sited near population centres, as inputs to to evenly sample throughout the day.
- 
												  Linear, Ridge Regression, and Principal Component AnalysisLinear, Ridge Regression, and Principal Component Analysis Linear, Ridge Regression, and Principal Component Analysis Jia Li Department of Statistics The Pennsylvania State University Email: [email protected] http://www.stat.psu.edu/∼jiali Jia Li http://www.stat.psu.edu/∼jiali Linear, Ridge Regression, and Principal Component Analysis Introduction to Regression I Input vector: X = (X1, X2, ..., Xp). I Output Y is real-valued. I Predict Y from X by f (X ) so that the expected loss function E(L(Y , f (X ))) is minimized. I Square loss: L(Y , f (X )) = (Y − f (X ))2 . I The optimal predictor ∗ 2 f (X ) = argminf (X )E(Y − f (X )) = E(Y | X ) . I The function E(Y | X ) is the regression function. Jia Li http://www.stat.psu.edu/∼jiali Linear, Ridge Regression, and Principal Component Analysis Example The number of active physicians in a Standard Metropolitan Statistical Area (SMSA), denoted by Y , is expected to be related to total population (X1, measured in thousands), land area (X2, measured in square miles), and total personal income (X3, measured in millions of dollars). Data are collected for 141 SMSAs, as shown in the following table. i : 1 2 3 ... 139 140 141 X1 9387 7031 7017 ... 233 232 231 X2 1348 4069 3719 ... 1011 813 654 X3 72100 52737 54542 ... 1337 1589 1148 Y 25627 15389 13326 ... 264 371 140 Goal: Predict Y from X1, X2, and X3. Jia Li http://www.stat.psu.edu/∼jiali Linear, Ridge Regression, and Principal Component Analysis Linear Methods I The linear regression model p X f (X ) = β0 + Xj βj .
- 
												  A Tail Quantile Approximation Formula for the Student T and the Symmetric Generalized Hyperbolic DistributionA Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Schlüter, Stephan; Fischer, Matthias J. Working Paper A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution IWQW Discussion Papers, No. 05/2009 Provided in Cooperation with: Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics Suggested Citation: Schlüter, Stephan; Fischer, Matthias J. (2009) : A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution, IWQW Discussion Papers, No. 05/2009, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung (IWQW), Nürnberg This Version is available at: http://hdl.handle.net/10419/29554 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu IWQW Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung Diskussionspapier Discussion Papers No.
- 
												  Quantile Regression for Overdispersed Count Data: a Hierarchical Method Peter CongdonCongdon Journal of Statistical Distributions and Applications (2017) 4:18 DOI 10.1186/s40488-017-0073-4 RESEARCH Open Access Quantile regression for overdispersed count data: a hierarchical method Peter Congdon Correspondence: [email protected] Abstract Queen Mary University of London, London, UK Generalized Poisson regression is commonly applied to overdispersed count data, and focused on modelling the conditional mean of the response. However, conditional mean regression models may be sensitive to response outliers and provide no information on other conditional distribution features of the response. We consider instead a hierarchical approach to quantile regression of overdispersed count data. This approach has the benefits of effective outlier detection and robust estimation in the presence of outliers, and in health applications, that quantile estimates can reflect risk factors. The technique is first illustrated with simulated overdispersed counts subject to contamination, such that estimates from conditional mean regression are adversely affected. A real application involves ambulatory care sensitive emergency admissions across 7518 English patient general practitioner (GP) practices. Predictors are GP practice deprivation, patient satisfaction with care and opening hours, and region. Impacts of deprivation are particularly important in policy terms as indicating effectiveness of efforts to reduce inequalities in care sensitive admissions. Hierarchical quantile count regression is used to develop profiles of central and extreme quantiles according to specified predictor combinations. Keywords: Quantile regression, Overdispersion, Poisson, Count data, Ambulatory sensitive, Median regression, Deprivation, Outliers 1. Background Extensions of Poisson regression are commonly applied to overdispersed count data, focused on modelling the conditional mean of the response. However, conditional mean regression models may be sensitive to response outliers.
- 
												  On the Efficiency and Consistency of Likelihood Estimation in MultivariateOn the efficiency and consistency of likelihood estimation in multivariate conditionally heteroskedastic dynamic regression models∗ Gabriele Fiorentini Università di Firenze and RCEA, Viale Morgagni 59, I-50134 Firenze, Italy <fi[email protected]fi.it> Enrique Sentana CEMFI, Casado del Alisal 5, E-28014 Madrid, Spain <sentana@cemfi.es> Revised: October 2010 Abstract We rank the efficiency of several likelihood-based parametric and semiparametric estima- tors of conditional mean and variance parameters in multivariate dynamic models with po- tentially asymmetric and leptokurtic strong white noise innovations. We detailedly study the elliptical case, and show that Gaussian pseudo maximum likelihood estimators are inefficient except under normality. We provide consistency conditions for distributionally misspecified maximum likelihood estimators, and show that they coincide with the partial adaptivity con- ditions of semiparametric procedures. We propose Hausman tests that compare Gaussian pseudo maximum likelihood estimators with more efficient but less robust competitors. Fi- nally, we provide finite sample results through Monte Carlo simulations. Keywords: Adaptivity, Elliptical Distributions, Hausman tests, Semiparametric Esti- mators. JEL: C13, C14, C12, C51, C52 ∗We would like to thank Dante Amengual, Manuel Arellano, Nour Meddahi, Javier Mencía, Olivier Scaillet, Paolo Zaffaroni, participants at the European Meeting of the Econometric Society (Stockholm, 2003), the Sympo- sium on Economic Analysis (Seville, 2003), the CAF Conference on Multivariate Modelling in Finance and Risk Management (Sandbjerg, 2006), the Second Italian Congress on Econometrics and Empirical Economics (Rimini, 2007), as well as audiences at AUEB, Bocconi, Cass Business School, CEMFI, CREST, EUI, Florence, NYU, RCEA, Roma La Sapienza and Queen Mary for useful comments and suggestions. Of course, the usual caveat applies.
- 
												  Simple Linear Regression with Least Square Estimation: an OverviewAditya N More et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (6) , 2016, 2394-2396 Simple Linear Regression with Least Square Estimation: An Overview Aditya N More#1, Puneet S Kohli*2, Kshitija H Kulkarni#3 #1-2Information Technology Department,#3 Electronics and Communication Department College of Engineering Pune Shivajinagar, Pune – 411005, Maharashtra, India Abstract— Linear Regression involves modelling a relationship amongst dependent and independent variables in the form of a (2.1) linear equation. Least Square Estimation is a method to determine the constants in a Linear model in the most accurate way without much complexity of solving. Metrics where such as Coefficient of Determination and Mean Square Error is the ith value of the sample data point determine how good the estimation is. Statistical Packages is the ith value of y on the predicted regression such as R and Microsoft Excel have built in tools to perform Least Square Estimation over a given data set. line The above equation can be geometrically depicted by Keywords— Linear Regression, Machine Learning, Least Squares Estimation, R programming figure 2.1. If we draw a square at each point whose length is equal to the absolute difference between the sample data point and the predicted value as shown, each of the square would then represent the residual error in placing the I. INTRODUCTION regression line. The aim of the least square method would Linear Regression involves establishing linear be to place the regression line so as to minimize the sum of relationships between dependent and independent variables.
- 
												  Principal Component Analysis (PCA) As a Statistical Tool for Identifying Key Indicators of Nuclear Power Plant Cable InsulationIowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2017 Principal component analysis (PCA) as a statistical tool for identifying key indicators of nuclear power plant cable insulation degradation Chamila Chandima De Silva Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Materials Science and Engineering Commons, Mechanics of Materials Commons, and the Statistics and Probability Commons Recommended Citation De Silva, Chamila Chandima, "Principal component analysis (PCA) as a statistical tool for identifying key indicators of nuclear power plant cable insulation degradation" (2017). Graduate Theses and Dissertations. 16120. https://lib.dr.iastate.edu/etd/16120 This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Principal component analysis (PCA) as a statistical tool for identifying key indicators of nuclear power plant cable insulation degradation by Chamila C. De Silva A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Materials Science and Engineering Program of Study Committee: Nicola Bowler, Major Professor Richard LeSar Steve Martin The student author and the program of study committee are solely responsible for the content of this thesis. The Graduate College will ensure this thesis is globally accessible and will not permit alterations after a degree is conferred.