Applied Linear Regression


Applied Linear Regression, Third Edition
Sanford Weisberg, University of Minnesota School of Statistics, Minneapolis, Minnesota
A John Wiley & Sons, Inc., publication

Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:
Weisberg, Sanford, 1947–
Applied linear regression / Sanford Weisberg.—3rd ed.
p. cm.—(Wiley series in probability and statistics)
Includes bibliographical references and index.
ISBN 0-471-66379-4 (acid-free paper)
1. Regression analysis. I. Title. II. Series.
QA278.2.W44 2005 519.5 36—dc22 2004050920

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To Carol, Stephanie, and to the memory of my parents

Contents

Preface, xiii
1 Scatterplots and Regression, 1
   1.1 Scatterplots, 1
   1.2 Mean Functions, 9
   1.3 Variance Functions, 11
   1.4 Summary Graph, 11
   1.5 Tools for Looking at Scatterplots, 12
      1.5.1 Size, 13
      1.5.2 Transformations, 14
      1.5.3 Smoothers for the Mean Function, 14
   1.6 Scatterplot Matrices, 15
   Problems, 17
2 Simple Linear Regression, 19
   2.1 Ordinary Least Squares Estimation, 21
   2.2 Least Squares Criterion, 23
   2.3 Estimating σ², 25
   2.4 Properties of Least Squares Estimates, 26
   2.5 Estimated Variances, 27
   2.6 Comparing Models: The Analysis of Variance, 28
      2.6.1 The F-Test for Regression, 30
      2.6.2 Interpreting p-values, 31
      2.6.3 Power of Tests, 31
   2.7 The Coefficient of Determination, R², 31
   2.8 Confidence Intervals and Tests, 32
      2.8.1 The Intercept, 32
      2.8.2 Slope, 33
      2.8.3 Prediction, 34
      2.8.4 Fitted Values, 35
   2.9 The Residuals, 36
   Problems, 38
3 Multiple Regression, 47
   3.1 Adding a Term to a Simple Linear Regression Model, 47
      3.1.1 Explaining Variability, 49
      3.1.2 Added-Variable Plots, 49
   3.2 The Multiple Linear Regression Model, 50
   3.3 Terms and Predictors, 51
   3.4 Ordinary Least Squares, 54
      3.4.1 Data and Matrix Notation, 54
      3.4.2 Variance-Covariance Matrix of e, 56
      3.4.3 Ordinary Least Squares Estimators, 56
      3.4.4 Properties of the Estimates, 57
      3.4.5 Simple Regression in Matrix Terms, 58
   3.5 The Analysis of Variance, 61
      3.5.1 The Coefficient of Determination, 62
      3.5.2 Hypotheses Concerning One of the Terms, 62
      3.5.3 Relationship to the t-Statistic, 63
      3.5.4 t-Tests and Added-Variable Plots, 63
      3.5.5 Other Tests of Hypotheses, 64
      3.5.6 Sequential Analysis of Variance Tables, 64
   3.6 Predictions and Fitted Values, 65
   Problems, 65
4 Drawing Conclusions, 69
   4.1 Understanding Parameter Estimates, 69
      4.1.1 Rate of Change, 69
      4.1.2 Signs of Estimates, 70
      4.1.3 Interpretation Depends on Other Terms in the Mean Function, 70
      4.1.4 Rank Deficient and Over-Parameterized Mean Functions, 73
      4.1.5 Tests, 74
      4.1.6 Dropping Terms, 74
      4.1.7 Logarithms, 76
   4.2 Experimentation Versus Observation, 77
   4.3 Sampling from a Normal Population, 80
   4.4 More on R², 81
      4.4.1 Simple Linear Regression and R², 83
      4.4.2 Multiple Linear Regression, 84
      4.4.3 Regression through the Origin, 84
   4.5 Missing Data, 84
      4.5.1 Missing at Random, 84
      4.5.2 Alternatives, 85
   4.6 Computationally Intensive Methods, 87
      4.6.1 Regression Inference without Normality, 87
      4.6.2 Nonlinear Functions of Parameters, 89
      4.6.3 Predictors Measured with Error, 90
   Problems, 92
5 Weights, Lack of Fit, and More, 96
   5.1 Weighted Least Squares, 96
      5.1.1 Applications of Weighted Least Squares, 98
      5.1.2 Additional Comments, 99
   5.2 Testing for Lack of Fit, Variance Known, 100
   5.3 Testing for Lack of Fit, Variance Unknown, 102
   5.4 General F Testing, 105
      5.4.1 Non-null Distributions, 107
      5.4.2 Additional Comments, 108
   5.5 Joint Confidence Regions, 108
   Problems, 110
6 Polynomials and Factors, 115
   6.1 Polynomial Regression, 115
      6.1.1 Polynomials with Several Predictors, 117
      6.1.2 Using the Delta Method to Estimate a Minimum or a Maximum, 120
      6.1.3 Fractional Polynomials, 122
   6.2 Factors, 122
      6.2.1 No Other Predictors, 123
      6.2.2 Adding a Predictor: Comparing Regression Lines, 126
      6.2.3 Additional Comments, 129
   6.3 Many Factors, 130
   6.4 Partial One-Dimensional Mean Functions, 131
   6.5 Random Coefficient Models, 134
   Problems, 137
7 Transformations, 147
   7.1 Transformations and Scatterplots, 147
      7.1.1 Power Transformations, 148
      7.1.2 Transforming Only the Predictor Variable, 150
      7.1.3 Transforming the Response Only, 152
      7.1.4 The Box and Cox Method, 153
   7.2 Transformations and Scatterplot Matrices, 153
      7.2.1 The 1D Estimation Result and Linearly Related Predictors, 156
      7.2.2 Automatic Choice of Transformation of Predictors, 157
   7.3 Transforming the Response, 159
   7.4 Transformations of Nonpositive Variables, 160
   Problems, 161
8 Regression Diagnostics: Residuals, 167
   8.1 The Residuals, 167
      8.1.1 Difference Between ê and e, 168
      8.1.2 The Hat Matrix, 169
      8.1.3 Residuals and the Hat Matrix with Weights, 170
      8.1.4 The Residuals When the Model Is Correct, 171
      8.1.5 The Residuals When the Model Is Not Correct, 171
      8.1.6 Fuel Consumption Data, 173
   8.2 Testing for Curvature, 176
   8.3 Nonconstant Variance, 177
      8.3.1 Variance Stabilizing Transformations, 179
      8.3.2 A Diagnostic for Nonconstant Variance, 180
      8.3.3 Additional Comments, 185
   8.4 Graphs for Model Assessment, 185
      8.4.1 Checking Mean Functions, 186
      8.4.2 Checking Variance Functions, 189
   Problems, 191
9 Outliers and Influence, 194
   9.1 Outliers, 194
      9.1.1 An Outlier Test, 194
      9.1.2 Weighted Least Squares, 196
      9.1.3 Significance Levels for the Outlier Test, 196
      9.1.4 Additional Comments, 197
   9.2 Influence of Cases, 198
      9.2.1 Cook's Distance, 198
      9.2.2 Magnitude of Di, 199
      9.2.3 Computing Di, 200
      9.2.4 Other Measures of Influence, 203
   9.3 Normality Assumption, 204
   Problems, 206
10 Variable Selection, 211
   10.1 The Active Terms, 211
      10.1.1 Collinearity, 214
      10.1.2 Collinearity and Variances, 216
   10.2 Variable Selection, 217
      10.2.1 Information Criteria, 217
      10.2.2 Computationally Intensive Criteria, 220
      10.2.3 Using Subject-Matter Knowledge, 220
   10.3 Computational Methods, 221
      10.3.1 Subset Selection Overstates Significance, 225
   10.4 Windmills, 226
      10.4.1 Six Mean Functions, 226
      10.4.2 A Computationally Intensive Approach, 228
   Problems, 230
11 Nonlinear Regression, 233
   11.1 Estimation for Nonlinear Mean Functions, 234
   11.2 Inference Assuming Large Samples, 237
   11.3 Bootstrap Inference, 244
   11.4 References, 248
   Problems, 248
12 Logistic Regression, 251
   12.1 Binomial Regression, 253
      12.1.1 Mean Functions for Binomial Regression, 254
   12.2 Fitting Logistic Regression, 255
      12.2.1 One-Predictor Example, 255
      12.2.2 Many Terms, 256
      12.2.3 Deviance, 260
      12.2.4 Goodness-of-Fit Tests, 261
   12.3 Binomial Random Variables, 263
      12.3.1 Maximum Likelihood Estimation, 263
      12.3.2 The Log-Likelihood for Logistic Regression, 264
   12.4 Generalized Linear Models, 265
   Problems, 266
Appendix, 270
   A.1 Web Site, 270
   A.2 Means and Variances of Random Variables, 270
      A.2.1 E Notation, 270
      A.2.2 Var Notation, 271
      A.2.3 Cov Notation, 271
      A.2.4 Conditional Moments, 272
   A.3 Least Squares for Simple Regression, 273
   A.4 Means and Variances of Least Squares Estimates, 273
   A.5 Estimating E(Y|X) Using a Smoother, 275
   A.6 A Brief Introduction to Matrices and Vectors, 278
      A.6.1 Addition and Subtraction, 279
      A.6.2 Multiplication by a Scalar, 280
      A.6.3 Matrix Multiplication, 280
      A.6.4 Transpose of a Matrix, 281
      A.6.5 Inverse of a Matrix, 281
      A.6.6 Orthogonality, 282
      A.6.7 Linear Dependence and Rank of a Matrix, 283
   A.7 Random Vectors, 283
   A.8 Least Squares Using Matrices, 284
      A.8.1 Properties of Estimates, 285
      A.8.2 The Residual Sum of Squares, 285
      A.8.3 Estimate of Variance, 286
   A.9 The QR Factorization, 286
   A.10 Maximum Likelihood Estimates, 287
   A.11 The Box-Cox Method for Transformations, 289
      A.11.1 Univariate Case, 289
      A.11.2 Multivariate Case, 290
   A.12 Case Deletion in Linear Regression, 291
References, 293
Author Index, 301
Subject Index, 305

Preface

Regression analysis answers questions about the dependence of a response variable on one or more predictors, including prediction of future values of a response, discovering which predictors are important, and estimating the impact of changing a predictor or a treatment on the value of the response.
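The simplest setting for these three tasks is ordinary least squares with a single predictor. As a rough, self-contained sketch (the data here are made up for illustration and are not from the book):

```python
# Minimal sketch of simple linear regression by ordinary least squares:
# estimate a mean function E(Y|X=x) = b0 + b1*x and use it to predict a
# future response. Illustrative only; the data below are invented.

def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def predict(intercept, slope, x):
    """Fitted value / prediction at a new x."""
    return intercept + slope * x
```

For instance, `ols_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` recovers intercept 0 and slope 2, and `predict` then extrapolates to unseen x values.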
Recommended publications
  • Computing Primer for Applied Linear Regression, Third Edition Using R and S-Plus
    Computing Primer for Applied Linear Regression, Third Edition, Using R and S-Plus
    Sanford Weisberg, University of Minnesota School of Statistics
    February 8, 2010. © 2005, Sanford Weisberg. Home Website: www.stat.umn.edu/alr

    Contents

    Introduction, 1
       0.1 Organization of this primer, 4
       0.2 Data files, 5
          0.2.1 Documentation, 5
          0.2.2 R data files and a package, 6
          0.2.3 Two files missing from the R library, 6
          0.2.4 S-Plus data files and library, 7
          0.2.5 Getting the data in text files, 7
          0.2.6 An exceptional file, 7
       0.3 Scripts, 7
       0.4 The very basics, 8
          0.4.1 Reading a data file, 8
          0.4.2 Reading Excel Files, 9
          0.4.3 Saving text output and graphs, 10
          0.4.4 Normal, F, t and χ² tables, 11
       0.5 Abbreviations to remember, 12
       0.6 Packages/Libraries for R and S-Plus, 12
       0.7 Copyright and Printing this Primer, 13
    1 Scatterplots and Regression, 13
       1.1 Scatterplots, 13
       1.2 Mean functions, 16
       1.3 Variance functions, 16
       1.4 Summary graph, 16
       1.5 Tools for looking at scatterplots, 16
       1.6 Scatterplot matrices, 16
    2 Simple Linear Regression, 19
       2.1 Ordinary least squares estimation, 19
       2.2 Least squares criterion, 19
       2.3 Estimating σ², 20
       2.4 Properties of least squares estimates, 20
       2.5 Estimated variances, 20
       2.6 Comparing models: The analysis of variance, 21
       2.7 The coefficient of determination, R², 22
       2.8 Confidence intervals and tests, 23
       2.9 The Residuals, 26
    3 Multiple Regression, 27
       3.1 Adding a term to a simple linear regression model, 27
       3.2 The Multiple Linear Regression Model, 27
       3.3 Terms and Predictors, 27
       3.4 Ordinary least squares, 28
       3.5 The analysis of variance, 30
       3.6 Predictions and fitted values, 31
    4 Drawing Conclusions
  • Testing the Significance of the Improvement of Cubic Regression
    Testing the statistical significance of the improvement of cubic regression compared to quadratic regression, using analysis of variance (ANOVA)
    R.J. Oosterbaan, on www.waterlog.info, public domain.

    Abstract: When performing a regression analysis, it is advisable to test the statistical significance of the result by means of an analysis of variance (ANOVA). In a previous article, the significance of the improvement of segmented regression (as done by the SegReg calculator program) over simple linear regression was tested using ANOVA. The significance of the improvement of a quadratic (2nd degree) regression over a simple linear regression is tested in the SegRegA program (the A stands for "amplified") by means of ANOVA, and the improvement of a cubic (3rd degree) regression over a simple linear regression is tested there in the same way. This article deals with the significance test of the improvement of a cubic regression over a quadratic regression.

    Contents: 1. Introduction; 2. ANOVA symbols used; 3. Explanatory examples (A. Quadratic regression, B. Cubic regression, C. Comparing quadratic and cubic regression); 4. Summary; 5. References

    1. Introduction
    When performing a regression analysis, it is advisable to test the statistical significance of the result by means of an analysis of variance (ANOVA). In a previous article [Ref. 1], the significance of the improvement of segmented regression, as done by the SegReg program [Ref. 2], compared to simple linear regression was dealt with using analysis of variance. The significance of the improvement of a quadratic (2nd degree) regression over a simple linear regression is tested in the SegRegA program (the A stands for "amplified"), also by means of ANOVA [Ref.
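The nested-model comparison this abstract describes can be sketched numerically: fit both polynomials, and form an F-ratio from the drop in residual sum of squares. The sketch below is a generic illustration of that ANOVA idea, not the SegRegA program's implementation; the degree-of-freedom bookkeeping (one extra parameter for the cubic term) follows standard nested-model F-testing.

```python
# Generic sketch of an F-test for whether a cubic term improves on a
# quadratic fit. Not the SegRegA implementation; a standard nested-model
# comparison: F = (RSS_quad - RSS_cubic)/1 divided by RSS_cubic/(n - 4).

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):              # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef

def rss(xs, ys, coef):
    """Residual sum of squares of a polynomial fit."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))

def f_cubic_vs_quadratic(xs, ys):
    """F-ratio for the cubic term: one extra parameter, n - 4 residual df."""
    rss2 = rss(xs, ys, polyfit(xs, ys, 2))
    rss3 = rss(xs, ys, polyfit(xs, ys, 3))
    return (rss2 - rss3) / (rss3 / (len(xs) - 4))
```

A large F (compared against the F distribution with 1 and n − 4 degrees of freedom) indicates the cubic term gives a significant improvement.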
  • Correctional Data Analysis Systems. INSTITUTION: Sam Houston State Univ., Huntsville, Tex.
    DOCUMENT RESUME — ED 209 425, CE 029 723
    AUTHOR: Friel, Charles R.; And Others
    TITLE: Correctional Data Analysis Systems.
    INSTITUTION: Sam Houston State Univ., Huntsville, Tex. Criminal Justice Center.
    SPONS AGENCY: Department of Justice, Washington, D.C. Bureau of Justice Statistics.
    PUB DATE: 80
    GRANT: DOJ-78-SSAX-0046
    NOTE: 101p.
    EDRS PRICE: MF01/PC05 Plus Postage.
    DESCRIPTORS: Computer Programs; Computers; *Computer Science; Computer Storage Devices; *Correctional Institutions; *Data Analysis; Data Bases; Data Collection; Data Processing; *Information Dissemination; *Information Needs; *Information Retrieval; Information Storage; Information Systems; Models; State of the Art Reviews
    ABSTRACT: Designed to help the correctional administrator meet external demands for information, this detailed analysis of the demand information problem identifies the sources of requests for, and the nature of, the information required from correctional institutions, and discusses the kinds of analytic capabilities required to satisfy most demand information requests. The goals and objectives of correctional data analysis systems are outlined. Examined next are the content and sources of demand information inquiries. A correctional case law demand information model is provided. Analyzed next are such aspects of the state of the art of demand information as policy considerations, procedural techniques, administrative organizations, technology, personnel, and quantitative analysis of processing. Available software, report generators, and statistical packages
  • Water Quality Trend and Change-Point Analyses Using Integration of Locally Weighted Polynomial Regression and Segmented Regression
    Environ Sci Pollut Res, DOI 10.1007/s11356-017-9188-x. RESEARCH ARTICLE
    Water quality trend and change-point analyses using integration of locally weighted polynomial regression and segmented regression
    Hong Huang, Zhenfeng Wang, Fang Xia, Xu Shang, YuanYuan Liu, Minghua Zhang, Randy A. Dahlgren, Kun Mei
    Received: 11 January 2017 / Accepted: 2 May 2017. © Springer-Verlag Berlin Heidelberg 2017

    Abstract: Trend and change-point analyses of water quality time series data have important implications for pollution control and environmental decision-making. This paper developed a new approach to assess trends and change-points of water quality parameters by integrating locally weighted polynomial regression (LWPR) and segmented regression (SegReg). Firstly, LWPR was used to pretreat the original water quality data into a smoothed time series to represent the long-term trend of water quality. Then, SegReg was used to identify the long-term trends and change-points of the smoothed time series. Finally, statistical tests were applied to determine the significance of the long-term trends and change-points. The efficacy of this approach was validated using a 10-year record of total nitrogen (TN) and chemical oxygen demand (CODMn) from Shanxi Reservoir watershed in eastern China. The reliability was verified by statistical tests and practical considerations for Shanxi Reservoir watershed. The newly developed integrated LWPR-SegReg approach is not only limited to the assessment of trends and change-points of water quality parameters but also has a broad application to other fields with long-term time series records.

    Keywords: Water quality; Long-term trend assessment; Change-point analysis; Locally weighted polynomial regression; Segmented regression

    Introduction: Many countries and regions in the world suffer from chronic
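The two-stage idea in this abstract (smooth first, then locate a change-point in the smoothed series) can be sketched with simple stand-ins: a Gaussian kernel smoother in place of LWPR, and an exhaustive two-segment least-squares search in place of SegReg. This is an illustration of the approach, not the authors' implementation.

```python
import math

# Toy sketch of the LWPR-SegReg two-stage idea: (1) smooth the series with a
# kernel-weighted local mean, (2) find the split index minimizing the total
# RSS of two straight-line fits. Stand-ins only, not the paper's method.

def kernel_smooth(ts, ys, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel."""
    out = []
    for t0 in ts:
        w = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in ts]
        out.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return out

def line_rss(ts, ys):
    """RSS of the least-squares line through (ts, ys)."""
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    stt = sum((t - tbar) ** 2 for t in ts)
    sty = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    slope = sty / stt if stt else 0.0
    return sum((y - (ybar + slope * (t - tbar))) ** 2 for t, y in zip(ts, ys))

def best_change_point(ts, ys, min_seg=3):
    """Split index giving the minimum two-segment total RSS."""
    best = None
    for k in range(min_seg, len(ts) - min_seg + 1):
        total = line_rss(ts[:k], ys[:k]) + line_rss(ts[k:], ys[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best[0]
```

In the paper's workflow the smoother's output, `kernel_smooth(ts, ys, h)`, would be what gets fed to the change-point search.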
  • T2S General Functional Specifications Table of Contents
    Target2-Securities General Functional Specifications, V8.0, 27 January 2020

    Table of Contents
    1 Approach to T2S Functional Design, 5
       1.1 Presentation of the document, 5
       1.2 Methodological elements, 6
       1.3 Reference documents, 7
       1.4 Specifications references – cross-referencing URD, 7
       1.5 Conventions used, 7
    2 General Functional Overview, 8
       2.1 Introduction, 8
          2.1.1 Objective and scope of T2S, 8
          2.1.2 T2S Actors, 23
          2.1.3 Multi-currency in T2S, 29
          2.1.4 Other systems interacting with T2S, 29
       2.2 Overall high level
  • Regression Analysis of Hydrologic Data
    6 FREQUENCY AND REGRESSION ANALYSIS OF HYDROLOGIC DATA
    R.J. Oosterbaan
    PART II: REGRESSION ANALYSIS
    On website www.waterlog.info. For free software on frequency analysis see: www.waterlog.info/cumfreq.htm. For free software on segmented regression analysis see: www.waterlog.info/segreg.htm
    Chapter 6 in: H.P. Ritzema (Ed.), Drainage Principles and Applications, Publication 16, second revised edition, 1994, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. ISBN 90 70754 3 39

    Table of contents
    6 Frequency and Regression Analysis, 3
       6.1 Introduction, 3
       6.5 Regression Analysis, 4
          6.5.1 Introduction, 4
          6.5.2 The Ratio Method, 5
          6.5.3 Regression of y upon x, 9
             Confidence statements, regression of y upon x, 11
             Example 6.2 Regression y upon x, 14
          6.5.4 Linear Two-way Regression, 16
  • Nonlinear Regression
    Nonlinear Regression
    James H. Steiger, Department of Psychology and Human Development, Vanderbilt University

    Outline: 1 Introduction; 2 Estimation for Nonlinear Mean Functions (Iterative Estimation Technique); 3 Large Sample Inference; 4 An Artificial Example (Turkey Growth Example, Three Sources of Methionine); 5 Bootstrap Inference (The Temperature Example)

    Introduction
    A mean function may not necessarily be a linear combination of terms. Some examples:

    E(Y | X = x) = θ1 + θ2(1 − exp(−θ3 x))    (1)
    E(Y | X = x) = β0 + β1 S(x, λ)            (2)

    where S(x, λ) is the scaled power transformation, defined as

    S(X, λ) = (X^λ − 1)/λ  if λ ≠ 0;  S(X, λ) = log(X)  if λ = 0    (3)

    Once λ has been chosen, the function becomes, in the sense we have been describing, linear in its terms, just as a quadratic in X can be viewed as linear in X and X².

    Nonlinear mean functions often arise in practice when we have special information about the processes we are modeling. For example, consider again the function E(Y | X = x) = θ1 + θ2(1 − exp(−θ3 x)). As X increases, assuming θ3 > 0, the function approaches θ1 + θ2. When X = 0, the function value is θ1, representing the average growth with no supplementation. θ3 is a rate parameter.

    Estimation for Nonlinear Mean Functions
    Our general notational setup is straightforward.
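The two mean functions quoted in this excerpt are easy to evaluate directly, which makes their limiting behavior concrete: the scaled power transformation approaches log(x) as λ → 0, and the asymptotic-growth function approaches θ1 + θ2 as x grows. A small sketch of both definitions:

```python
import math

# The scaled power transformation S(x, lam) from the excerpt above:
# (x**lam - 1)/lam for lam != 0, log(x) at lam = 0; the scaling makes the
# family continuous in lam. Also the asymptotic mean function (1).

def scaled_power(x, lam):
    """S(x, lam) = (x**lam - 1)/lam, with the log limit at lam = 0."""
    if x <= 0:
        raise ValueError("scaled power transformation needs x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

def asymptotic_mean(x, th1, th2, th3):
    """E(Y|X=x) = th1 + th2*(1 - exp(-th3*x)); equals th1 at x = 0 and
    tends to th1 + th2 as x grows (for th3 > 0)."""
    return th1 + th2 * (1 - math.exp(-th3 * x))
```

Evaluating `scaled_power(2.0, lam)` for small `lam` shows it converging to log 2, which is why λ = 0 can be included seamlessly when choosing a transformation.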
  • Appendix A: Datasets and Models
    Appendix A: Datasets and Models

    Table A.1. Models considered in this book (model, page):
    Antoine, 56; Beverton-Holt, 2, 106; Biexponential, 87; Brain-Cousens, 58, 71, 108; Clapeyron, 56; Demodulator, 48; Deriso, 106; Exponential decay, 24, 31; Hockey stick, 41; Hyperbolic, 4, 35; Logistic, 70, 101; Log-logistic (4 parameters), 5, 60, 118; Log-logistic (3 parameters), 58; Michaelis-Menten, 9, 30; Monoexponential, 108; Ricker, 78, 106; Shepherd, 106; Verhulst, 70

    Table A.2. Datasets used in this book (name, page, package), by discipline:
    Biology: exp1, 87, nlrwr; exp2, 87, nlrwr; L.minor, 9, nlrwr; Leaves, 35, NRAIA; methionine, 131, drc; RGRcurve, 33, nlrwr; RScompetition, 4, 45, drc; ScotsPine, 108, nlrwr
    Chemistry: btb, 35, nlrwr; Isom, 53, NRAIA; sts, 35, nlrwr; vapCO, 56, nlrwr
    Fisheries: M.merluccius, 2, 105, nlrwr; O.mykiss, 91, drc, nlrwr; sockeye, 78, nlrwr
    Engineering: Chwirut2, 23, NISTnls; IQsig, 48, nlrwr
    Medicine: heartrate, 71, drc; Puromycin, 109, datasets; wtloss, 20, MASS
    Toxicology: G.aparine, 118, drc; lettuce, 57, drc; ryegrass, 60, drc; S.alba, 4, drc; secalonic, 101, drc; spinach, 131, drc; vinclozolin, 123, drc
    Other: Indometh (pharmacokinetics), 71, datasets; segreg (energy consumption), 44, alr3; US.pop (population growth), 71, car

    Appendix B: Self-starter Functions

    Table B.1. Available self-starter functions for nls() (model, number of parameters, self-starter function: mean function):
    Biexponential, 4, SSbiexp(x, A1, lrc1, A2, lrc2): A1·exp(−exp(lrc1)·x) + A2·exp(−exp(lrc2)·x)
    Asymptotic regression, 3, SSasymp(x, Asym, R0, lrc): Asym + (R0 − Asym)·exp(−exp(lrc)·x)
    Asymptotic regression, 3, SSasympOff(x, Asym, lrc, c0): Asym
  • Community Engagement @ Illinois
    COMMUNITY ENGAGEMENT: The staff of the eBlack CU summer project include faculty and students from area high schools, Parkland Community College, and the University of Illinois. From left to right: Noah Lenstra, Rachel Harmon, Jaime Carpenter, Dominique Johnson, Reginald Carr, Deidre Murphy, Abdul Alkalimat, and Jelani Saadiq. (Not pictured: Sherodyn Bolden)
    RESEARCH AND SERVICE @ UNIVERSITY OF ILLINOIS
    COMMUNITY ENGAGEMENT: RESEARCH AND SERVICE @ UNIVERSITY OF ILLINOIS. Noah Lenstra, Editor
    Cover Art: The Park Street Mural was created in 1978 at Fifth and Park in Champaign. Standing in front of the mural you can see Salem Baptist Church and Bethel A.M.E., the two oldest historically African-American churches in Champaign County, both founded before the Civil War. The principal artist was Angela Rivers, who grew up in Champaign-Urbana. She was assisted by young men like L. Nolan, M. Mitchell, S. Brown and P. Caston, and a group of over 30 folks from the community, including high school students and young adults. For more information go to: http://eblackcu.net/mural/
    In the summer of 2010, six African-American high school students and young adults were hired by the University of Illinois to work on digitizing the history of African-Americans in Champaign-Urbana. One of their projects was creating a digital representation of the Park Street Mural and its history. This website is the result. We hope you enjoy it! Please send us any and all feedback and commentary by visiting: http://eblackcu.net/portal/contact.
    Community Engagement @ the University of Illinois is a free research product of the eBlack Champaign-Urbana Project, with funding and support from the Office of the Vice Chancellor for Public Engagement and Community Informatics Research Laboratory, Graduate School of Library and Information, University of Illinois at Urbana-Champaign.
  • Statistical Significance of Segmented Linear Regression with Break-Point
    Statistical significance of segmented linear regression with break-point, using variance analysis (ANOVA) and F-tests
    Principles used in the SegReg program at https://www.waterlog.info/segreg.htm
    R.J. Oosterbaan, on website https://www.waterlog.info, public domain, latest upload 20-11-2017

    Subjects:
    Analysis of variance for segmented linear regression with break point, page 1
       Type 1, page 2
       Type 5, page 3
       Types 3 and 4, page 4
       Types 2 and 6, page 5
       Examples of F-testing, page 6
       Reference, page 7
    Analysis of variance for data having several y-values for each x-value, page 8
    Analysis of variance when two independent variables are present, page 9

    Introduction
    In SegReg, the significance of the break-point (BP) is indicated by the 90% confidence area around the BP, as shown in the graphs. When the area remains within the data range, the break-point is significant, i.e. the BP gives a significant additional explanation compared to straightforward linear regression without a BP. Or, one can say that the BP analysis gives an improvement over simple linear regression. Although the confidence-area test makes other types of significance tests unnecessary, one may still like to perform an analysis of variance (ANOVA) and apply the F-test (named in honour of R.A. Fisher). The following ANOVA procedure assumes that the regression is done of y (dependent variable) on x (independent variable). An F-test calculator is available at http://www.waterlog.info/f-test.htm

    Symbols used
    y — value of the dependent variable
    η — average value of y (mean)
    SSD — sum of squares
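The ANOVA/F-test idea this document describes can be sketched generically: fit one straight line, fit two lines split at the best break-point, and compare residual sums of squares. The sketch below is an illustration only, not SegReg itself; in particular it fits two unconstrained lines (in the spirit of the document's "type 1") and assumes the segmented fit spends two extra parameters, which simplifies the document's type-by-type degrees-of-freedom bookkeeping.

```python
# Rough sketch of an F-test for a segmented linear regression with one
# break-point versus a single straight line. Generic illustration, not the
# SegReg program; df accounting (2 extra parameters) is a simplification.

def line_fit_rss(xs, ys):
    """RSS of the least-squares line through (xs, ys)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return sum((y - (ybar + slope * (x - xbar))) ** 2 for x, y in zip(xs, ys))

def segmented_f_test(xs, ys, min_seg=3):
    """Return (break index, F) for the best two-segment fit versus one line."""
    rss_lin = line_fit_rss(xs, ys)
    best_k, best_rss = None, None
    for k in range(min_seg, len(xs) - min_seg + 1):
        r = line_fit_rss(xs[:k], ys[:k]) + line_fit_rss(xs[k:], ys[k:])
        if best_rss is None or r < best_rss:
            best_k, best_rss = k, r
    df_extra, df_resid = 2, len(xs) - 4
    f = ((rss_lin - best_rss) / df_extra) / (best_rss / df_resid)
    return best_k, f
```

A large F relative to the F distribution with 2 and n − 4 degrees of freedom suggests the break-point gives a significant improvement over the single line, which is the conclusion the confidence-area test draws graphically.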
  • Mathematical Programming for Piecewise Linear Regression Analysis
    Mathematical Programming for Piecewise Linear Regression Analysis
    Lingjian Yang, Songsong Liu, Sophia Tsoka, Lazaros G. Papageorgiou
    Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK; Department of Informatics, School of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UK

    Abstract: In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables, by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks, piecewise regression, etc. In terms of piecewise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piecewise regression method is introduced, based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments, and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, the mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and final number of break-points. 7 real world problems covering several application domains have been used to demonstrate

    Corresponding author: Tel.: +442076792563; Fax.: +442073882348. Email addresses: [email protected] (Lingjian Yang), [email protected] (Songsong Liu), [email protected] (Sophia Tsoka), [email protected] (Lazaros G.
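The kind of mixed integer linear model the abstract describes can be sketched in a generic form. The formulation below is an illustrative reconstruction under stated assumptions (K known segments, binary assignment variables z_{ik}, deviation variables d_i linearizing the absolute error, big-M constant M), not the paper's exact model:

```latex
% Generic big-M sketch of a piecewise linear regression MILP; z_{ik} = 1
% assigns sample i to segment k, d_i bounds |y_i - fitted value|.
\begin{align*}
\min\quad & \sum_{i=1}^{N} d_i \\
\text{s.t.}\quad
 & d_i \ge \Bigl(y_i - \sum_{j} a_{kj} x_{ij} - b_k\Bigr) - M\,(1 - z_{ik})
   && \forall i,\, k \\
 & d_i \ge -\Bigl(y_i - \sum_{j} a_{kj} x_{ij} - b_k\Bigr) - M\,(1 - z_{ik})
   && \forall i,\, k \\
 & \sum_{k=1}^{K} z_{ik} = 1, \qquad z_{ik} \in \{0, 1\}
   && \forall i
\end{align*}
```

Ordering constraints tying each sample to the segment whose break-points bracket its partition-feature value (sample i may take z_{ik} = 1 only if β_{k−1} ≤ x_{ip} ≤ β_k) would be linearized with the same big-M device; the objective then minimizes the total absolute error, as the abstract states.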
  • Computing Primer for Applied Linear Regression, 4Th Edition, Using R
    Computing Primer for Applied Linear Regression, 4th Edition, Using R
    Version of October 27, 2014
    Sanford Weisberg, School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455
    Copyright © 2014, Sanford Weisberg
    This Primer is best viewed using a pdf viewer such as Adobe Reader with bookmarks showing at the left, and in single page view, selected by View → Page Display → Single Page View. To cite this primer, use: Weisberg, S. (2014). Computing Primer for Applied Linear Regression, 4th Edition, Using R. Online, http://z.umn.edu/alrprimer.

    Contents
    0 Introduction, vi
       0.1 Getting R and Getting Started, vii
       0.2 Packages You Will Need, vii
       0.3 Using This Primer, vii
       0.4 If You Are New to R, viii
       0.5 Getting Help, viii
       0.6 The Very Basics, ix
          0.6.1 Reading a Data File, x
          0.6.2 Reading Your Own Data into R, xi
          0.6.3 Reading Excel Files, xiii
          0.6.4 Saving Text and Graphs, xiii
          0.6.5 Normal, F, t and χ² tables, xiv
    1 Scatterplots and Regression, 1
       1.1 Scatterplots, 1
       1.4 Summary Graph, 6
       1.5 Tools for Looking at Scatterplots, 6
       1.6 Scatterplot Matrices, 7