Simple Linear Regression Deterministic Model

Simple Linear Regression

Regression equation: an equation that describes the average relationship between a response (dependent) variable and an explanatory (independent) variable.

Slope-intercept equation for a line

$y = mx + b$

[Figure: line through the points (2, 3) and (4, 6); axes x and y.]

$\text{slope} = \frac{\Delta y}{\Delta x} = \frac{6 - 3}{4 - 2} = \frac{3}{2} = 1.5$

Deterministic Model

A model that defines an exact relationship between variables.

$F = \frac{9}{5}C + 32$

[Figure: Fahrenheit versus Celsius.]

Example: $y = 1.5x$

There is no allowance for error in the prediction of y for a given x.

Probabilistic Model

A model that accounts for random error.

Includes both a deterministic component and a random error component.

$y = 1.5x + \text{random error}$

This model hypothesizes a probabilistic relationship between y and x.

[Figure: scatter of points about the line $y = 1.5x$; axes x and y.]

Probabilistic Model: General Form

y = Deterministic component + Random error component

where y is the "variable of interest".

Assume that the mean value of the random error is zero. Then the mean value of y, E(y), equals the deterministic component of the model.

First-Order (Straight Line) Probabilistic Model

$y = \beta_0 + \beta_1 x + \varepsilon$

where
y = dependent variable
x = independent variable
$\beta_0$ = population y-intercept of the line: the point at which the line intersects or cuts through the y-axis
$\beta_1$ = population slope of the line: the amount of increase (or decrease) in the deterministic component of y for every 1-unit increase (or decrease) in x
$\varepsilon$ = random error component

$\beta_0$ and $\beta_1$ will generally be unknown.

$\beta_0$ and $\beta_1$ are population parameters. They will only be known if the population of all (x, y) measurements is available.

$\beta_0$ and $\beta_1$, along with a specific value of the independent variable x, determine the mean value of the dependent variable y:

$E(y) = \beta_0 + \beta_1 x$

Model Development

The process of developing a model, estimating model parameters, and using the model can be summarized in these 5 steps:

1. Hypothesize the deterministic component of the model that relates the mean, E(y), to the independent variable x: $E(y) = \beta_0 + \beta_1 x$
2. Use sample data to estimate the unknown model parameters; find estimates $\hat{\beta}_0$ or $b_0$, $\hat{\beta}_1$ or $b_1$
3. Specify the probability distribution of the random error term and estimate the SD of this distribution: $\varepsilon \sim N(0, \sigma)$ (will revisit this later)
4. Statistically evaluate the usefulness of the model
5. Use the model for prediction, estimation or other purposes

Example: Reaction time versus drug percentage

Subject   Amount of Drug (%), x   Reaction Time (seconds), y
1         1                       1
2         2                       1
3         3                       2
4         4                       2
5         5                       4

[Figure: scatterplot of Reaction Time (sec) versus Drug (%), shown with and without a visually fitted line.]

Errors of prediction: vertical differences between the observed and the predicted values of y. For the visually fitted line $\tilde{y} = -1 + x$:

x   y   $\tilde{y} = -1 + x$   $(y - \tilde{y})$   $(y - \tilde{y})^2$
1   1   0                      (1-0) = 1           1
2   1   1                      (1-1) = 0           0
3   2   2                      (2-2) = 0           0
4   2   3                      (2-3) = -1          1
5   4   4                      (4-4) = 0           0

Sum of errors = 0
Sum of squared errors (SSE) = 2

Least Squares Line

Also called the regression line, or the least squares prediction equation.

The method used to find this line is called the method of least squares.

For our example, we have a sample of n = 5 pairs of (x, y) values. The fitted line that we will calculate is written as

$\hat{y} = b_0 + b_1 x$

$\hat{y}$ is an estimator of the mean value of y, E(y); $b_0$ and $b_1$ are estimators of $\beta_0$ and $\beta_1$.

Least Squares Line (continued)

Define the sum of squares of the deviations of the y values about their predicted values for all n data points as:

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - (b_0 + b_1 x_i) \right]^2$

We want to find $b_0$ and $b_1$ that make the SSE a minimum; these are termed the least squares estimates.

$\hat{y} = b_0 + b_1 x$ is called the least squares line.

Formulas for the Least Squares Estimates

Slope: $b_1 = \frac{SS_{xy}}{SS_{xx}}$ or $b_1 = r \frac{SD_y}{SD_x}$

$s_{xy} = SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

$s_{xx} = SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$

y-intercept: $b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n}$

n = sample size

LS Calculations for Drug/Reaction Example

$x_i$   $y_i$   $x_i^2$   $x_i y_i$
1       1       1         1
2       1       4         2
3       2       9         6
4       2       16        8
5       4       25        20

$\sum x_i = 15$, $\sum y_i = 10$, $\sum x_i^2 = 55$, $\sum x_i y_i = 37$

$SS_{xy} = 37 - \frac{(15)(10)}{5} = 37 - 30 = 7$

$SS_{xx} = 55 - \frac{(15)^2}{5} = 55 - 45 = 10$

LS Line for Drug/Reaction Example

$b_1 = \frac{7}{10} = 0.7$

$b_0 = \frac{10}{5} - (.7)\frac{15}{5} = 2 - (.7)(3) = 2 - 2.1 = -.1$

$\hat{y} = -.1 + .7x$

[Figure: scatterplot of Reaction Time (sec) versus Drug (%) with the least squares line.]

LS Calculations for Drug/Reaction Example

x   y   $\hat{y} = -.1 + .7x$   $(y - \hat{y})$   $(y - \hat{y})^2$
1   1   .6                      (1-.6) = .4       .16
2   1   1.3                     (1-1.3) = -.3     .09
3   2   2.0                     (2-2.0) = 0       .00
4   2   2.7                     (2-2.7) = -.7     .49
5   4   3.4                     (4-3.4) = .6      .36

Sum of errors = 0
Sum of squared errors (SSE) = 1.10

The LS line has a sum of errors = 0, but SSE = 1.1 < 2.0 for the visual model.

Least Squares Line: Interpretation of $\hat{y} = -.1 + .7x$

The estimated intercept is negative. This implies that the estimated mean reaction time is equal to -0.1 seconds when the amount of drug is 0%.

What does this mean, since negative reaction times are not possible?

Model parameters should be interpreted only within the sampled range of the independent variable.

The slope of 0.7 implies that for every unit increase of x, the mean value of y is estimated to increase by 0.7 units.

In the context of the problem: for every 1% increase in the amount of drug in the bloodstream, the mean reaction time is estimated to increase by 0.7 seconds over the sampled range of drug amounts from 1% to 5%.

Coefficient of Determination

A measure of the contribution of x in predicting y.

Assuming that x provides no information for the prediction of y, the best prediction for the value of y is $\bar{y}$.

[Figure: Reaction Time (sec) versus Drug (%) with the horizontal line $\hat{y} = \bar{y}$; deviations give $SS_{yy} = \sum (y_i - \bar{y})^2$.]

Coefficient of Determination (continued)

[Figure: Reaction Time (sec) versus Drug (%) with the least squares line; deviations give $SSE = \sum (y_i - \hat{y}_i)^2$.]

$SS_{yy} = \sum (y_i - \bar{y})^2$: total sample variation around the mean

$SSE = \sum (y_i - \hat{y}_i)^2$: unexplained sample variability after fitting $\hat{y}$

$SS_{yy} - SSE$: explained sample variability attributable to the linear relationship

$\frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{explained}}{\text{total}}$ = proportion of total sample variability explained by the linear relationship

Coefficient of Determination (continued)

$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$, where the term $\frac{SSE}{SS_{yy}}$ is the unexplained variability.

In simple linear regression, $r^2$ is computed as the square of the correlation coefficient, r.

$0 \le r^2 \le 1$

Interpretation: $r^2 = .75$ means that the sum of squared deviations of the y values about their predicted values has been reduced by 75% by the use of the least squares equation $\hat{y}$, instead of $\bar{y}$, to predict y.
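The least squares calculations for the drug/reaction example can be checked with a short script. The following is a minimal sketch in plain Python, not part of the original slides; the function names `least_squares` and `sse` are illustrative choices, and the "visual" line with intercept -1 and slope 1 is the line implied by the predicted values 0, 1, 2, 3, 4 in the errors-of-prediction table.

```python
# Drug/reaction data from the example (x = drug %, y = reaction time in seconds).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]


def least_squares(x, y):
    """Return (b0, b1) using b1 = SSxy / SSxx and b0 = ybar - b1 * xbar."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Shortcut formulas for the sums of squares.
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of prediction for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(x, y)
sse_ls = sse(x, y, b0, b1)          # least squares line
sse_visual = sse(x, y, -1.0, 1.0)   # visually fitted line, intercept -1, slope 1

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")                            # b0 = -0.10, b1 = 0.70
print(f"SSE (LS) = {sse_ls:.2f}, SSE (visual) = {sse_visual:.2f}")  # SSE (LS) = 1.10, SSE (visual) = 2.00
```

Both lines have errors that sum to zero, so the comparison that matters is the SSE: the least squares line gives 1.10 against 2.00 for the visual line.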

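The slides interpret a hypothetical r² of .75 but do not compute r² for the drug/reaction data. As a hedged illustration, the sketch below does so two ways: from 1 - SSE/SSyy, and by squaring the correlation coefficient. The formula r = SSxy / sqrt(SSxx · SSyy) is the standard Pearson correlation and is an addition here, not something stated on the slides; variable names are illustrative.

```python
import math

# Drug/reaction data from the example (x = drug %, y = reaction time in seconds).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares about the means (definition form).
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates and the unexplained variability (SSE).
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Coefficient of determination, and the correlation coefficient for comparison.
r_squared = 1 - sse / ss_yy
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSyy = {ss_yy:.2f}, SSE = {sse:.2f}")   # SSyy = 6.00, SSE = 1.10
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")    # r^2 = 0.817, r = 0.904
```

For these five observations, using the least squares line instead of the sample mean to predict reaction time reduces the sum of squared deviations by roughly 82%.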