Logistic Regression and Generalized Linear Models


Sridhar Mahadevan ([email protected]), University of Massachusetts. CMPSCI 689 lecture slides.

Topics

Generative vs. discriminative models: in many problems it is difficult to model the data with a parametric class-conditional density P(x|ω, θ), yet a linear decision boundary is often adequate to separate the classes (Gaussian class-conditional densities with a shared covariance matrix also produce a linear decision boundary).

Logistic regression: a discriminative model for classification that produces linear decision boundaries. The model-fitting problem is solved by maximum likelihood, using an iterative gradient-based algorithm for the nonlinear maximum likelihood equations: recursive weighted least squares regression.

Logistic regression is an instance of a generalized linear model (GLM), a family that covers a large variety of exponential-family models. GLMs can also be extended to generalized additive models (GAMs).

Discriminative vs. Generative Models

Both generative and discriminative approaches address the problem of modeling the discriminant function P(y|x) of output labels (or values) y conditioned on the input x.

In generative models, we estimate both P(y) and P(x|y), and use Bayes rule to compute the discriminant: P(y|x) ∝ P(y) P(x|y).

Discriminative approaches model the conditional distribution P(y|x) directly and ignore the marginal P(x). We now turn to several instances of discriminative models, starting with logistic regression and later covering others such as support vector machines.

Generalized Linear Models

In linear regression, we model the output y as a linear function of the input variables plus a zero-mean, constant-variance Gaussian noise term:

  y = g(x) + ε, where the conditional mean is E(y|x) = g(x) and ε is the noise term, with
  g(x) = β^T x (β0 is an offset term).

We saw earlier that the maximum likelihood framework justifies the squared-error loss function, provided the errors are IID Gaussian (the variance does not matter). We want to generalize this idea of specifying a model family by specifying the type of error distribution:

  When the output variable y is discrete (e.g., binary or multinomial), the noise term is not Gaussian but binomial or multinomial.
  A change in the mean is coupled with a change in the variance, and we want the model to be able to couple mean and variance.

Generalized linear models provide a rich family of models based on specifying the error distribution.

Logit Function

Since the output variable y takes only the values 0 and 1 (binary classification), we need a representation of E(y|x) whose range lies in (0, 1). One convenient form is the sigmoid, or logistic, function. Let the input be vector-valued, x = (x1, ..., xp). The logistic function is S-shaped and approaches 0 (as β^T x → −∞) or 1 (as β^T x → ∞):

  P(y = 1 | x, β) = µ(x|β) = exp(β^T x) / (1 + exp(β^T x)) = 1 / (1 + exp(−β^T x))
  P(y = 0 | x, β) = 1 − µ(x|β) = 1 / (1 + exp(β^T x))

We assume an extra input x0 = 1, so that β0 is an offset. Inverting this transformation gives the logit function:

  g(x|β) = log [ µ(x|β) / (1 − µ(x|β)) ] = β^T x

Logistic Regression

[Figure: logistic regression drawn as a network, with the output y connected by weights β0, β1, β2 to the inputs X0, X1, X2.]
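To make the logistic/logit pair concrete, here is a minimal sketch in Python (the function names mu and logit and the example numbers are ours, not from the slides); it evaluates µ(x|β) and checks that the logit transformation recovers β^T x.

```python
import numpy as np

def mu(x, beta):
    """Logistic mean function: P(y = 1 | x, beta) = 1 / (1 + exp(-beta^T x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

def logit(p):
    """Logit, the inverse of the logistic function: log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

# x0 = 1 is the extra constant input, so beta[0] acts as the offset beta_0.
beta = np.array([-1.0, 0.5, 2.0])
x = np.array([1.0, 3.0, -0.5])

p = mu(x, beta)                       # predicted P(y = 1 | x, beta), a value in (0, 1)
print(p)
print(logit(p), np.dot(beta, x))      # the logit of p equals beta^T x
```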
Example Dataset for Logistic Regression

The data set is a study of coronary heart disease in South Africa. The chd response (output) variable is binary (yes/no), and there are 9 predictor variables: systolic blood pressure (sbp), tobacco, ldl, adiposity, famhist, typea, obesity, alcohol, and age. There are 462 instances, of which 160 are cases (positive instances) and 302 are controls (negative instances).

Let's focus on a subset of the predictors: sbp, tobacco, ldl, famhist, obesity, alcohol, and age. We want to fit a model of the form

  P(chd = 1 | x, β) = 1 / (1 + exp(−β^T x))

where β^T x = β0 + β1 x_sbp + β2 x_tobacco + β3 x_ldl + β4 x_famhist + β5 x_age + β6 x_alcohol + β7 x_obesity.

Noise Model for Logistic Regression

Let us write the logistic regression model as y = µ(x|β) + ε and ask ourselves what sort of noise model ε represents. Since y takes the value 1 with probability µ(x|β), ε can take only two values:

  If y = 1, then ε = 1 − µ(x|β), with probability µ(x|β).
  If y = 0, then ε = −µ(x|β), with probability 1 − µ(x|β).

This analysis shows that the error term in logistic regression is a binomially (Bernoulli) distributed random variable. Its moments are readily computed:

  E(ε) = µ(x|β)(1 − µ(x|β)) − (1 − µ(x|β))µ(x|β) = 0 (the error term has mean 0).
  Var(ε) = E(ε²) − (E(ε))² = E(ε²) = µ(x|β)(1 − µ(x|β)) (show this!).

Maximum Likelihood for LR

Suppose we want to fit a logistic regression model to a dataset of n observations (x^1, y^1), ..., (x^n, y^n). The conditional likelihood of a single observation is

  P(y^i | x^i, β) = µ(x^i|β)^{y^i} (1 − µ(x^i|β))^{1 − y^i}

Hence, the conditional likelihood of the entire dataset can be written as

  P(Y | X, β) = ∏_{i=1}^{n} µ(x^i|β)^{y^i} (1 − µ(x^i|β))^{1 − y^i}

and the conditional log-likelihood is then simply

  l(β | X, Y) = Σ_{i=1}^{n} [ y^i log µ(x^i|β) + (1 − y^i) log(1 − µ(x^i|β)) ]

Maximum Likelihood for LR (continued)

We solve the conditional log-likelihood equation by taking gradients:

  ∂l(β|X,Y)/∂β_k = Σ_{i=1}^{n} [ y^i / µ(x^i|β) − (1 − y^i) / (1 − µ(x^i|β)) ] ∂µ(x^i|β)/∂β_k

Using the fact that ∂µ(x^i|β)/∂β_k = ∂/∂β_k [ 1 / (1 + exp(−β^T x^i)) ] = µ(x^i|β)(1 − µ(x^i|β)) x_k^i, we get

  ∂l(β|X,Y)/∂β_k = Σ_{i=1}^{n} x_k^i (y^i − µ(x^i|β))

Setting this to 0: since x0 = 1, the first component of these equations reduces to

  Σ_{i=1}^{n} y^i = Σ_{i=1}^{n} µ(x^i|β)

That is, the expected number of instances of each class must match the observed number.

Newton-Raphson Method

Newton's method is a general procedure for finding the roots of an equation f(θ) = 0. It is based on the recursion

  θ_{t+1} = θ_t − f(θ_t) / f'(θ_t)

We want to find the maximum of the log-likelihood, and the maximum of a function f(θ) occurs exactly where its derivative f'(θ) = 0. So, plugging f'(θ) in for f(θ) above, we get

  θ_{t+1} = θ_t − f'(θ_t) / f''(θ_t)

Fisher Scoring

In logistic regression, the parameter β is a vector, so we use the multivariate Newton-Raphson update

  β_{t+1} = β_t − H^{-1} ∇_β l(β_t | X, Y)

Here, ∇_β l(β_t | X, Y) is the vector of partial derivatives of the log-likelihood, and H is the Hessian matrix of second-order derivatives, with entries H_ij = ∂²l(β|X,Y)/∂β_i ∂β_j. The use of Newton's method to find the solution to the conditional log-likelihood equation is called Fisher scoring.
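To make the score equations concrete, here is a small sketch (the toy data, function names, and the finite-difference check are ours, not from the slides) that evaluates the conditional log-likelihood l(β|X,Y) and its gradient Σ_i x_k^i (y^i − µ(x^i|β)), written in matrix form as X^T(Y − P), and verifies one gradient component numerically.

```python
import numpy as np

def fitted_probs(X, beta):
    """Fitted probabilities mu(x^i | beta) for each row of the design matrix X."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def log_likelihood(beta, X, y):
    """Conditional log-likelihood l(beta | X, Y)."""
    p = fitted_probs(X, beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def score(beta, X, y):
    """Gradient of the log-likelihood: sum_i x^i (y^i - mu(x^i | beta)) = X^T (Y - P)."""
    return X.T @ (y - fitted_probs(X, beta))

# Toy data: the first column of X is the constant input x0 = 1.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20).astype(float)
beta = np.zeros(3)

# Finite-difference check of the first gradient component.
eps = 1e-6
e0 = np.array([eps, 0.0, 0.0])
approx = (log_likelihood(beta + e0, X, y) - log_likelihood(beta - e0, X, y)) / (2 * eps)
print(score(beta, X, y)[0], approx)   # the two numbers should agree closely
```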
Fisher Scoring for Maximum Likelihood

Taking the second derivative of the likelihood score equations gives

  ∂²l(β|X,Y)/∂β_k ∂β_m = − Σ_{i=1}^{n} x_k^i x_m^i µ(x^i|β)(1 − µ(x^i|β))

We can use matrix notation to write the Newton-Raphson algorithm for logistic regression. Define the n × n diagonal matrix

  W = diag( µ(x^1|β)(1 − µ(x^1|β)), µ(x^2|β)(1 − µ(x^2|β)), ..., µ(x^n|β)(1 − µ(x^n|β)) )

Let Y be the n × 1 column vector of output values, X the n × (p + 1) design matrix of input values, and P the column vector of fitted probability values µ(x^i|β).

Iterative Weighted Least Squares

The gradient of the log-likelihood can be written in matrix form as

  ∂l(β|X,Y)/∂β = Σ_{i=1}^{n} x^i (y^i − µ(x^i|β)) = X^T (Y − P)

and the Hessian as

  ∂²l(β|X,Y)/∂β ∂β^T = −X^T W X

The Newton-Raphson algorithm then becomes

  β_new = β_old + (X^T W X)^{-1} X^T (Y − P)
        = (X^T W X)^{-1} X^T W [ X β_old + W^{-1} (Y − P) ]
        = (X^T W X)^{-1} X^T W Z,   where Z ≡ X β_old + W^{-1} (Y − P)

Weighted Least Squares Regression

Weighted least squares regression finds the best least-squares solution to the equation W A x ≈ W b:

  (W A)^T W A x̂ = (W A)^T W b
  x̂ = (A^T C A)^{-1} A^T C b,   where C = W^T W

Returning to logistic regression, we now see that β_new = (X^T W X)^{-1} X^T W Z is a weighted least squares regression, where X plays the role of the matrix A above, W is the diagonal weight matrix with entries µ(x^i|β)(1 − µ(x^i|β)), and Z corresponds to the vector b above.
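The update β_new = (X^T W X)^{-1} X^T W Z is exactly one weighted least-squares solve per Newton step, which suggests the following compact sketch of the resulting iteratively reweighted least squares loop (the function name, toy data, iteration cap, and convergence tolerance are ours, not from the slides):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by Newton-Raphson / iteratively reweighted least squares.

    X is the n x (p+1) design matrix (first column all ones); y is the 0/1 response vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities, the vector P
        w = p * (1.0 - p)                     # diagonal entries of W
        z = X @ beta + (y - p) / w            # adjusted response Z = X beta + W^{-1}(Y - P)
        XtW = X.T * w                         # X^T W, without forming the n x n matrix W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # solve (X^T W X) beta = X^T W Z
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy usage: simulate data from a known beta and check that IRLS recovers it.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(irls_logistic(X, y))   # should be close to true_beta
```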