<<

Simple

4 -intercept equation for a line 1 Regression equation—an equation that describes the y  mx b average relationship between a response (dependent) 2 1 and an explanatory (independent) variable. 0 1 y 8 y 3 (4,6) slope   1.5 x 2 6 (2,3) y 6  3  3 4

2 x 4  2  2

2 4 6 8 10 x 1 2

Deterministic Model 0 2 1 9 A model that defines an exact relationship F  C  32 5 between variables. 6 9 t i

Example: y  1.5 x 2 e 7 h n e r h a

There is no allowance for error in the prediction of y 8 F for a given x. 4 4 2 0 -10 0 10 20 30 40 50 Celsius

3 4

1 Probabilistic Model 5 A model that accounts for random error. 1 y 1.5x  

Includes both a deterministic component and a random 0 error component. 1

y  1. 5 x  random error y

This model hypothesizes a probabilistic relationship 5 between y and x. 0 0 2 4 6 8 10 x 5 6

Probabilistic Model—General Form First-Order (Straight Line) Probabilistic Model y = Deterministic component + Random y  0  1x   where y = Dependent variable component x = Independent variable

 where y is the “variable of interest”. 0 = population y-intercept of the line—the point at which the line intersects or cuts through the y-axis

Assume that the value of the random error is 1 = population slope of the line—the amount of increase (or zero  the mean value of y, E(y), equals the decrease) in the deterministic deterministic component of the model component of y for every 1-unit increase (or decrease) in x.  = random error component

7 8

2 First-Order (Straight Line) Probabilistic Model Model Development  and  will generally be unknown. 0 and 1 are population parameters. They will only 0 1 be known if the population of all (x, y) measurements The process of developing a model, estimating model are available. parameters, and using the model can be summarized in these 5-steps: 1. Hypothesize the deterministic component of 0 and 1, along with a specific value of the independent variable x determine the mean value of the model that relates the mean, E(y) to the the dependent variable y. independent variable x

E y  0  1x 2. Use sample to estimate unknown model parameters ˆ ˆ find estimates: 0 or b0 ,1 or b1 9 10

Model Development (continued) Example: Reaction time versus drug percentage 3. Specify the of the random error term and estimate the SD of this distribution Amount of Drug Reaction Time Subject (%) (seconds)  ~ N 0,   will revisit this later x y 1 1 1 2 2 1 4. Statistically evaluate the usefulness of the model 3 3 2 5. Use model for prediction, estimation or other 4 4 2 purposes 5 5 4

11 12

3 Example: Reaction time versus drug percentage Example: Reaction time versus drug percentage 0 0 . . 4 4 2 2 . . 3 3 ) ) . . c c 3 3 e e . . s s 2 2 ( (

e e m m i i 5 5 . . T T

1 1 n n o o i i t t c c 7 7 . . a a 0 0 e e R R 2 2 . . 0 0 - - 0 0 . . 1 1 - - 0 1 2 3 4 5 0 1 2 3 4 5 Drug (%) Drug (%) 13 14

Example: Reaction time versus drug percentage Line Errors of prediction---vertical differences between Also called regression line, or the least squares the observed and the predicted values of y prediction equation 2 x y y  1  x  y  y   y  y  Method to find this line is called the method of least 1 1 0 (1-0) = 1 1 squares 2 1 1 (1-1) = 0 0 For our example, we have a sample of n = 5 pairs of 3 2 2 (2-2) = 0 0 (x, y) values. The fitted line that we will calculate is 4 2 3 (2-3) = -1 1 written as ˆy  b0  b1 x 5 4 4 (4-4) = 0 0 Sum of Sum of squared ˆy is an estimator of the mean value of y, E  y  ; errors = 0 errors (SSE) = 2

b0 and b 1 are estimators of  0 and 1

15 16

4 Least Squares Line (continued) Formulas for the Least Squares Estimates SS SD Define the sum of squares of the deviations of the y xy y Slope: b1  or b1  r values about their predicted values for all n data points SSxx SDx as: 2 n n 2 2 sxy = SSxy   xi  x  yi  y  sxx = SSxx   xi  x  ˆ   SSE   yi  y    yi  b0  b1 xi  2  xi  yi  x i1 i1   2  i    xi yi   x  n  i n We want to find b and b to make the SSE a 0 1 y x minimum---termed least squares estimates y-intercept: b  y b x   i  b  i 0 1 n 1 n

ˆy  b0  b1 x is called the least squares line n = sample size

17 18

LS Calculations for Drug/Reaction Example LS Line for Drug/Reaction Example 0

2 . xi yi xi xi yi 4 7 ˆy  .1  .7 x 1 1 1 1 b1   0. 7 2 .

10 3 2 1 4 2 ) . c 3 e .

3 2 9 6 10 15 s 2 ( b0   .7 5 5 e

4 2 16 8 m i 5 . T

2  .7 3 1 5 4 25 20    n o i t

2 c 2  2. 1  .1 7 . x  15 y  10 x  55 x y  37 a i i i i i 0     e R

2 2 .

15 10 15 0    - SS  37   37 30 7 SS  55   55 45 10 xy 5 xx 5 0 . 1 - 0 1 2 3 4 5 Drug (%) 19 20

5 LS Calculations for Drug/Reaction Example Least Squares Line—Interpretation of ˆy  .1  .7 x 2 x y ˆy  .1  .7 x  y  ˆy  y  ˆy Estimated intercept is negative 1 1 .6 (1-.6) = .4 .16 that the estimated mean reaction time is equal to 2 1 1.3 (1-1.3) = -.3 .09 -0.1 seconds when the amount of drug is 0%. 3 2 2.0 (2-2.0) = 0 .00 4 2 2.7 (2-2.7) = -.7 .49 What does this mean since negative reaction times are not possible? 5 4 3.4 (4-3.4) = .6 .36 Sum of Sum of squared errors = 0 errors (SSE) = 1.10 Model parameters should be interpreted only within the sampled of the independent variable. The LS line has a sum of errors = 0, but SSE = 1.1 < 2.0 for visual model

21 22

Least Squares Line—Interpretation of ˆy  .1  .7 x Coefficient of Determination The slope of 0.7 implies that for every unit increase of A measure of the contribution of x in predicting y x, the mean value of y is estimated to increase by 0.7 Assuming that x provides no information for the prediction of y, units. the best prediction for the value of y is y 0 . 4

2 y y . ˆ 

In the context of the problem: 3 ) . c 3 e . s 2 For every 1% increase in the amount of drug in the (

e m i 5 . bloodstream, the mean reaction time is estimated to T

1 n o i increase by 0.7 seconds over the sampled range of drug t c 7 . a 0 e amounts from 1% to 5%. R 2 2 . SS   y  y  0 yy i

-  0 . 1 - 0 1 2 3 4 5 Drug (%) 23 24

6 Coefficient of Determination (continued) Coefficient of Determination (continued) 0

. 2 4 2 SS  y  y ˆ yy  i  --total sample variation around mean SSE   yi  yi  2 . i 3 )

. 2 c

3 --unexplained sample variability after e ˆ . SSE   yi  yi  s

2  (

fitting e m i 5 . T

1

n SS  SSE --explained sample variability attributable to

o yy i t

c linear relationship 7 . a 0 e R SSyy  SSE explained 2 .   proportion of total sample 0 - SS total yy variability explained by the 0 . 1 - linear relationship 0 1 2 3 4 5 Drug (%) 25 26

Coefficient of Determination (continued)

SS  SSE SSE r 2  yy 1  Unexplained variability SSyy SSyy } In simple linear regression r 2 is computed as the square of the , r

0  r 2  1

Interpretation— r 2  .75 that the sum of squared deviations of the y values about their predicted values has been reduced by 75% by the use ˆy, instead of y, to predict y of the least squares equation. 27

7