Simple Linear Regression Deterministic Model

Slope-intercept equation for a line

$$y = mx + b$$

Regression equation: an equation that describes the average relationship between a response (dependent) variable and an explanatory (independent) variable.

[Figure: a line through the points (2, 3) and (4, 6); slope $= \Delta y / \Delta x = (6 - 3)/(4 - 2) = 3/2 = 1.5$.]

Deterministic Model

A model that defines an exact relationship between variables.

Example: $y = 1.5x$

There is no allowance for error in the prediction of $y$ for a given $x$.

[Figure: Fahrenheit versus Celsius, $F = \frac{9}{5}C + 32$.]

Probabilistic Model

A model that accounts for random error. It includes both a deterministic component and a random error component.

$$y = 1.5x + \text{random error}$$

This model hypothesizes a probabilistic relationship between $y$ and $x$.

[Figure: plot of $y$ versus $x$ with the line $y = 1.5x$.]

Probabilistic Model: General Form

$$y = \text{deterministic component} + \text{random error component}$$

where $y$ is the "variable of interest".

Assume that the mean value of the random error is zero. Then the mean value of $y$, $E(y)$, equals the deterministic component of the model.

First-Order (Straight Line) Probabilistic Model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where
  $y$ = dependent variable
  $x$ = independent variable
  $\beta_0$ = population y-intercept of the line: the point at which the line intersects or cuts through the y-axis
  $\beta_1$ = population slope of the line: the amount of increase (or decrease) in the deterministic component of $y$ for every 1-unit increase (or decrease) in $x$
  $\varepsilon$ = random error component

$\beta_0$ and $\beta_1$ will generally be unknown. They are population parameters, and they will only be known if the population of all $(x, y)$ measurements is available.

$\beta_0$ and $\beta_1$, along with a specific value of the independent variable $x$, determine the mean value of the dependent variable $y$:

$$E(y) = \beta_0 + \beta_1 x$$

Model Development

The process of developing a model, estimating model parameters, and using the model can be summarized in these five steps:

1. Hypothesize the deterministic component of the model that relates the mean, $E(y)$, to the independent variable $x$.
2. Use sample data to estimate the unknown model parameters; that is, find the estimates $\hat{\beta}_0$ (or $b_0$) and $\hat{\beta}_1$ (or $b_1$).
3. Specify the probability distribution of the random error term and estimate the SD of this distribution: $\varepsilon \sim N(0, \sigma)$ (will revisit this later).
4. Statistically evaluate the usefulness of the model.
5. Use the model for prediction, estimation, or other purposes.

Example: Reaction time versus drug percentage

  Subject   Amount of drug, x (%)   Reaction time, y (seconds)
  1         1                       1
  2         2                       1
  3         3                       2
  4         4                       2
  5         5                       4

[Figure: scatterplots of reaction time (sec) versus drug (%).]

Errors of prediction: vertical differences between the observed and the predicted values of $y$.

  x    y    y-hat = -1 + x    (y - y-hat)     (y - y-hat)^2
  1    1    0                 (1 - 0) = 1     1
  2    1    1                 (1 - 1) = 0     0
  3    2    2                 (2 - 2) = 0     0
  4    2    3                 (2 - 3) = -1    1
  5    4    4                 (4 - 4) = 0     0

  Sum of errors = 0; sum of squared errors (SSE) = 2

Least Squares Line

Also called the regression line, or the least squares prediction equation. The method used to find this line is called the method of least squares.

For our example, we have a sample of $n = 5$ pairs of $(x, y)$ values. The fitted line that we will calculate is written as

$$\hat{y} = b_0 + b_1 x$$

$\hat{y}$ is an estimator of the mean value of $y$, $E(y)$; $b_0$ and $b_1$ are estimators of $\beta_0$ and $\beta_1$.

Least Squares Line (continued)

Define the sum of squares of the deviations of the $y$ values about their predicted values for all $n$ data points as:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - (b_0 + b_1 x_i) \right]^2$$

We want to find $b_0$ and $b_1$ to make the SSE a minimum; these are termed the least squares estimates.

$\hat{y} = b_0 + b_1 x$ is called the least squares line.

Formulas for the Least Squares Estimates

Slope:

$$b_1 = \frac{SS_{xy}}{SS_{xx}} \quad \text{or} \quad b_1 = r \, \frac{SD_y}{SD_x}$$

$$s_{xy} = SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$s_{xx} = SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

y-intercept:

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n}$$

$n$ = sample size

LS Calculations for Drug/Reaction Example

  x_i    y_i    x_i^2    x_i * y_i
  1      1      1        1
  2      1      4        2
  3      2      9        6
  4      2      16       8
  5      4      25       20

  Sums:  $\sum x_i = 15$, $\sum y_i = 10$, $\sum x_i^2 = 55$, $\sum x_i y_i = 37$

$$SS_{xy} = 37 - \frac{(15)(10)}{5} = 37 - 30 = 7$$

$$SS_{xx} = 55 - \frac{(15)^2}{5} = 55 - 45 = 10$$

$$b_1 = \frac{7}{10} = 0.7$$

$$b_0 = \frac{10}{5} - 0.7 \cdot \frac{15}{5} = 2 - (0.7)(3) = 2 - 2.1 = -0.1$$

LS Line for Drug/Reaction Example

$$\hat{y} = -0.1 + 0.7x$$

[Figure: scatterplot of reaction time (sec) versus drug (%) with the least squares line.]

LS Calculations for Drug/Reaction Example

  x    y    y-hat = -0.1 + 0.7x    (y - y-hat)          (y - y-hat)^2
  1    1    0.6                    (1 - 0.6) = 0.4      0.16
  2    1    1.3                    (1 - 1.3) = -0.3     0.09
  3    2    2.0                    (2 - 2.0) = 0        0.00
  4    2    2.7                    (2 - 2.7) = -0.7     0.49
  5    4    3.4                    (4 - 3.4) = 0.6      0.36

  Sum of errors = 0; sum of squared errors (SSE) = 1.10

Least Squares Line: Interpretation of $\hat{y} = -0.1 + 0.7x$

The estimated intercept is negative, which implies that the estimated mean reaction time is equal to -0.1 seconds when the amount of drug is 0%.

What does this mean, since negative reaction times are not possible?

Model parameters should be interpreted only within the sampled range of the independent variable.
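As a check on the hand calculations for the drug/reaction example, the least squares formulas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `least_squares` and `sse` are hypothetical helpers introduced here, and the "visual" line $\hat{y} = -1 + x$ is the one used in the errors-of-prediction table.

```python
# Drug percentage (x) and reaction time in seconds (y) from the example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]


def least_squares(x, y):
    """Return (b0, b1) from the computational SS_xy / SS_xx formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                # slope
    b0 = sum_y / n - b1 * sum_x / n   # intercept: y-bar minus b1 times x-bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of the line y-hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(x, y)
sse_ls = sse(x, y, b0, b1)         # SSE of the least squares line
sse_visual = sse(x, y, -1.0, 1.0)  # SSE of the visual line y-hat = -1 + x

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE (least squares) = {sse_ls:.2f}, SSE (visual) = {sse_visual:.2f}")
```

Results are rounded when printed because floating-point arithmetic does not represent values such as 0.7 exactly.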
The LS line has a sum of errors = 0, but its SSE = 1.1 is smaller than the SSE = 2.0 for the visual model.

Least Squares Line: Interpretation of $\hat{y} = -0.1 + 0.7x$

The slope of 0.7 implies that for every unit increase of $x$, the mean value of $y$ is estimated to increase by 0.7 units.

In the context of the problem: for every 1% increase in the amount of drug in the bloodstream, the mean reaction time is estimated to increase by 0.7 seconds over the sampled range of drug amounts from 1% to 5%.

Coefficient of Determination

A measure of the contribution of $x$ in predicting $y$.

Assuming that $x$ provides no information for the prediction of $y$, the best prediction for the value of $y$ is $\bar{y}$; that is, $\hat{y} = \bar{y}$.

$$SS_{yy} = \sum (y_i - \bar{y})^2$$

[Figure: scatterplot of reaction time (sec) versus drug (%) with the line $\hat{y} = \bar{y}$.]

Coefficient of Determination (continued)

$SS_{yy} = \sum (y_i - \bar{y})^2$: total sample variation around the mean

$SSE = \sum (y_i - \hat{y}_i)^2$: unexplained sample variability after fitting the least squares line

$SS_{yy} - SSE$: explained sample variability attributable to the linear relationship

$$\frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{explained}}{\text{total}} = \text{proportion of total sample variability explained by the linear relationship}$$

[Figure: scatterplot of reaction time (sec) versus drug (%).]

Coefficient of Determination (continued)

$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$$

where the ratio $SSE / SS_{yy}$ represents the unexplained variability.

In simple linear regression, $r^2$ is computed as the square of the correlation coefficient, $r$.

$$0 \le r^2 \le 1$$

Interpretation: $r^2 = 0.75$ means that the sum of squared deviations of the $y$ values about their predicted values has been reduced by 75% by the use of the least squares equation $\hat{y}$, instead of $\bar{y}$, to predict $y$.
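The slides do not state $r^2$ for the drug/reaction example, but it follows directly from the data. The sketch below is an illustration added here, not part of the original material. It computes $r^2 = 1 - SSE / SS_{yy}$ using the least squares estimates $b_0 = -0.1$ and $b_1 = 0.7$ from the worked example, and cross-checks the result against the squared correlation coefficient, with $r = SS_{xy} / \sqrt{SS_{xx} SS_{yy}}$.

```python
import math

# Drug percentage (x) and reaction time in seconds (y) from the example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7  # least squares estimates from the worked example

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Total variation of y around its mean, and variation left after fitting.
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / ss_yy

# Cross-check: in simple linear regression, r^2 is the squared correlation.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SS_yy = {ss_yy:.2f}, SSE = {sse:.2f}")
print(f"r^2 from sums of squares = {r_squared:.4f}")
print(f"r^2 from correlation     = {r ** 2:.4f}")
```

Under these inputs, $SS_{yy} = 6$ and $SSE = 1.1$, so about 82% of the sample variation in reaction time is explained by the linear relationship with drug percentage.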
