
Simple Linear Regression Models

Overview

1. Definition of a Good Model
2. Estimation of Model Parameters
3. Allocation of Variation
4. Standard Deviation of Errors
5. Confidence Intervals for Regression Parameters
6. Confidence Intervals for Predictions
7. Visual Tests for Verifying Regression Assumptions

Simple Linear Regression Models

- Regression Model: Predicts a response for a given set of predictor variables.
- Response Variable: The estimated variable.
- Predictor Variables: Variables used to predict the response; also called predictors or factors.
- Linear Regression Models: The response is a linear function of the predictors.
- Simple Linear Regression Models: Only one predictor.

Definition of a Good Model

[Figure: three scatter plots of y versus x; the first two lines fit the points well (Good), the third does not (Bad).]

Good Model (Cont)

- Regression models attempt to minimize the vertical distance between each observation point and the model line (or curve).
- The length of this line segment is called the residual, modeling error, or simply error.
- The negative and positive errors should cancel out ⇒ zero overall error. However, many lines satisfy this criterion.

Good Model (Cont)

- Choose the line that minimizes the sum of squares of the errors:

  $\hat{y} = b_0 + b_1 x$

  where $\hat{y}$ is the predicted response when the predictor variable is $x$. The $b_0$ and $b_1$ are fixed regression parameters to be determined from the data.

- Given $n$ observation pairs $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the estimated response for the $i$th observation is:

  $\hat{y}_i = b_0 + b_1 x_i$

- The error is:

  $e_i = y_i - \hat{y}_i$

Good Model (Cont)

- The best linear model minimizes the sum of squared errors (SSE):

  $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

  subject to the constraint that the mean error is zero:

  $\frac{1}{n}\sum_{i=1}^{n} e_i = 0$

- This is equivalent to minimizing the variance of errors (see Exercise).

Estimation of Model Parameters

- Regression parameters that give minimum error variance are:

  $b_1 = \frac{\sum_i x_i y_i - n\bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2}$  and  $b_0 = \bar{y} - b_1 \bar{x}$

- where

  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$  and  $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Example 14.1

- The number of disk I/Os and processor times of seven programs were measured as: (14, 2), (16, 5), (27, 7), (42, 9), (39, 10), (50, 13), (83, 20).
- For this data: $n = 7$, $\sum xy = 3375$, $\sum x = 271$, $\sum x^2 = 13{,}855$, $\sum y = 66$, $\sum y^2 = 828$, $\bar{x} = 38.71$, $\bar{y} = 9.43$. Therefore,

  $b_1 = \frac{3375 - 7 \times 38.71 \times 9.43}{13{,}855 - 7 \times (38.71)^2} = 0.2438$,  $b_0 = 9.43 - 0.2438 \times 38.71 = -0.0083$

- The desired linear model is:

  CPU time $= -0.0083 + 0.2438\,(\text{number of disk I/Os})$
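As a quick numerical check, the following minimal sketch (plain Python; the variable names are my own, not from the original slides) computes $b_0$ and $b_1$ from the formulas above:

    # Estimate b0 and b1 for the Example 14.1 data (a sketch, not book code).
    xs = [14, 16, 27, 42, 39, 50, 83]   # disk I/Os per program
    ys = [2, 5, 7, 9, 10, 13, 20]       # CPU times

    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")   # b0 ~ -0.0082, b1 ~ 0.2438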

Example 14.1 (Cont)

Example 14.1 (Cont)

- Error Computation (table of observed $y_i$, predicted $\hat{y}_i$, and errors $e_i$ not reproduced)

Derivation of Regression Parameters

- The error in the $i$th observation is:

  $e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$

- For a sample of $n$ observations, the mean error is:

  $\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = \bar{y} - b_0 - b_1\bar{x}$

- Setting the mean error to zero, we obtain:

  $b_0 = \bar{y} - b_1\bar{x}$

- Substituting $b_0$ in the error expression, we get:

  $e_i = y_i - \bar{y} - b_1(x_i - \bar{x})$

Derivation of Regression Parameters (Cont)

- The sum of squared errors SSE is:

  $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left[(y_i - \bar{y}) - b_1(x_i - \bar{x})\right]^2$

Derivation (Cont)

- Differentiating this equation with respect to $b_1$ and equating the result to zero:

  $\frac{\partial\,SSE}{\partial b_1} = -2\sum_{i=1}^{n}(x_i - \bar{x})\left[(y_i - \bar{y}) - b_1(x_i - \bar{x})\right] = 0$

- That is,

  $b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i x_i y_i - n\bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2}$
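One way to double-check this derivation is to let a computer algebra system do the calculus. The sketch below is my own (it assumes the sympy package, which the original slides do not use); it substitutes the zero-mean-error constraint, solves $\partial SSE/\partial b_1 = 0$ symbolically for a three-point dataset, and confirms the closed form above:

    # Verify the b1 formula symbolically for n = 3 (kept small for speed).
    import sympy as sp

    n = 3
    xs = sp.symbols('x1:%d' % (n + 1))
    ys = sp.symbols('y1:%d' % (n + 1))
    b1 = sp.symbols('b1')

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b0 = y_bar - b1 * x_bar          # from the zero-mean-error constraint
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

    sol = sp.solve(sp.diff(sse, b1), b1)[0]
    expected = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
               (sum(x ** 2 for x in xs) - n * x_bar ** 2)
    print(sp.simplify(sol - expected))   # prints 0: the forms agree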

Allocation of Variation

- Error variance without regression = variance of the response:

  $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$  where  $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Allocation of Variation (Cont)

- The sum of squared errors without regression would be:

  $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$

- This is called the total sum of squares (SST). It is a measure of y's variability and is called the variation of y. SST can be computed as follows:

  $SST = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = SSY - SS0$

- where SSY is the sum of squares of y (or $\sum y^2$), and SS0 is the sum of squares of $\bar{y}$ and is equal to $n\bar{y}^2$.

Allocation of Variation (Cont)

- The difference between SST and SSE is the sum of squares explained by the regression. It is called SSR:

  $SSR = SST - SSE$  or  $SST = SSR + SSE$

- The fraction of the variation that is explained determines the goodness of the regression and is called the coefficient of determination, $R^2$:

  $R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$

Allocation of Variation (Cont)

- The higher the value of $R^2$, the better the regression: $R^2 = 1$ ⇒ perfect fit; $R^2 = 0$ ⇒ no fit.
- Coefficient of determination = {correlation coefficient of $(x, y)$}$^2$
- Shortcut formula for SSE:

  $SSE = \sum y_i^2 - b_0\sum y_i - b_1\sum x_i y_i$

Example 14.2

- For the disk I/O-CPU time data of Example 14.1:

  $SSY = 828$,  $SS0 = n\bar{y}^2 = 66^2/7 = 622.29$,  $SST = SSY - SS0 = 205.71$,
  $SSE = 828 - (-0.0083)(66) - (0.2438)(3375) = 5.87$,
  $R^2 = \frac{SST - SSE}{SST} = \frac{205.71 - 5.87}{205.71} = 0.9715$

- The regression explains 97% of CPU time's variation.

Standard Deviation of Errors

- Since errors are obtained after calculating two regression parameters from the data, errors have $n-2$ degrees of freedom.
- $SSE/(n-2)$ is called the mean squared error (MSE).
- Standard deviation of errors = square root of MSE:  $s_e = \sqrt{MSE}$
- SSY has $n$ degrees of freedom since it is obtained from $n$ independent observations without estimating any parameters.
- SS0 has just one degree of freedom since it can be computed simply from $\bar{y}$.
- SST has $n-1$ degrees of freedom, since one parameter ($\bar{y}$) must be calculated from the data before SST can be computed.

Standard Deviation of Errors (Cont)

- SSR, which is the difference between SST and SSE, has the remaining one degree of freedom.
- Overall,

  $SST = SSY - SS0 = SSR + SSE$
  $(n-1) = (n) - (1) = (1) + (n-2)$

- Notice that the degrees of freedom add just the way the sums of squares do.

Example 14.3

- For the disk I/O-CPU data of Example 14.1, the degrees of freedom of the sums are:

  SSY: $n = 7$;  SS0: 1;  SST: $n-1 = 6$;  SSR: 1;  SSE: $n-2 = 5$

- The mean squared error is:

  $MSE = \frac{SSE}{n-2} = \frac{5.87}{5} = 1.17$

- The standard deviation of errors is:

  $s_e = \sqrt{MSE} = \sqrt{1.17} = 1.08$

Confidence Intervals for Regression Parameters

- Regression coefficients $b_0$ and $b_1$ are estimates from a single sample of size $n$ ⇒ they are random variables.
  ⇒ Using another sample, the estimates may be different.
- Let $\beta_0$ and $\beta_1$ be the true parameters of the population. That is,

  $y = \beta_0 + \beta_1 x + e$

- Computed coefficients $b_0$ and $b_1$ are estimates of $\beta_0$ and $\beta_1$, respectively.

Confidence Intervals (Cont)

- The 100(1−α)% confidence intervals for $b_0$ and $b_1$ can be computed using $t_{[1-\alpha/2;\,n-2]}$, the $(1-\alpha/2)$-quantile of a t-variate with $n-2$ degrees of freedom. The confidence intervals are:

  $b_0 \mp t\,s_{b_0}$  and  $b_1 \mp t\,s_{b_1}$

  where the standard deviations of $b_0$ and $b_1$ are:

  $s_{b_0} = s_e\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}\right]^{1/2}$,  $s_{b_1} = \frac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$

- If a confidence interval includes zero, then the regression parameter cannot be considered different from zero at the 100(1−α)% confidence level.
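A sketch of this computation in Python (my own; it assumes scipy is available for the t quantile, and reproduces the numbers of Example 14.4 that follows):

    # 90% confidence intervals for b0 and b1 (Example 14.4 numbers).
    from scipy import stats

    n, x_bar, sum_x2 = 7, 38.71, 13855
    b0, b1, se = -0.0083, 0.2438, 1.0834    # se = std deviation of errors

    # Standard deviations of b0 and b1, from the formulas above.
    sb0 = se * (1 / n + x_bar ** 2 / (sum_x2 - n * x_bar ** 2)) ** 0.5
    sb1 = se / (sum_x2 - n * x_bar ** 2) ** 0.5

    t = stats.t.ppf(0.95, df=n - 2)         # 0.95-quantile, 5 deg. of freedom
    print(f"90% CI for b0: ({b0 - t * sb0:.3f}, {b0 + t * sb0:.3f})")
    print(f"90% CI for b1: ({b1 - t * sb1:.4f}, {b1 + t * sb1:.4f})")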

Example 14.4

- For the disk I/O and CPU data of Example 14.1, we have $n = 7$, $\bar{x} = 38.71$, $\sum x^2 = 13{,}855$, and $s_e = 1.0834$.
- Standard deviations of $b_0$ and $b_1$ are:

  $s_{b_0} = 1.0834\left[\frac{1}{7} + \frac{(38.71)^2}{13{,}855 - 7(38.71)^2}\right]^{1/2} = 0.8311$

  $s_{b_1} = \frac{1.0834}{\sqrt{13{,}855 - 7(38.71)^2}} = 0.0187$

Example 14.4 (Cont)

- From Appendix Table A.4, the 0.95-quantile of a t-variate with 5 degrees of freedom is 2.015.
  ⇒ The 90% confidence interval for $b_0$ is:

  $-0.0083 \mp (2.015)(0.8311) = (-1.683,\ 1.666)$

- Since the confidence interval includes zero, the hypothesis that this parameter is zero cannot be rejected at the 0.10 significance level. ⇒ $b_0$ is essentially zero.
- The 90% confidence interval for $b_1$ is:

  $0.2438 \mp (2.015)(0.0187) = (0.2061,\ 0.2815)$

- Since the confidence interval does not include zero, $b_1$ is significantly different from zero at this confidence level.

Case Study 14.1: Remote Procedure Call

Case Study 14.1 (Cont)

- UNIX:

Case Study 14.1 (Cont)

- ARGUS:

Case Study 14.1 (Cont)

- Best linear models are:

  [models not reproduced]

- The regressions explain 81% and 75% of the variation, respectively. Does ARGUS take a larger time per byte, as well as a larger set-up time per call, than UNIX?

Case Study 14.1 (Cont)

- The intervals for the intercepts overlap, while those of the slopes do not. ⇒ Set-up times are not significantly different in the two systems, while the per-byte times (slopes) are different.

Confidence Intervals for Predictions

- The predicted mean response $\hat{y}_p = b_0 + b_1 x_p$ is only the mean value of the predicted response. The standard deviation of the mean of a future sample of $m$ observations at $x_p$ is:

  $s_{\hat{y}_{mp}} = s_e\left[\frac{1}{m} + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}\right]^{1/2}$

- $m = 1$ ⇒ standard deviation of a single future observation:

  $s_{\hat{y}_{1p}} = s_e\left[1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}\right]^{1/2}$

CI for Predictions (Cont)

- $m = \infty$ ⇒ standard deviation of the mean of a large number of future observations at $x_p$:

  $s_{\hat{y}_{\infty p}} = s_e\left[\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}\right]^{1/2}$

- A 100(1−α)% confidence interval for the mean can be constructed using a t quantile read at $n-2$ degrees of freedom:

  $\hat{y}_p \mp t_{[1-\alpha/2;\,n-2]}\, s_{\hat{y}_{mp}}$
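The following sketch (my own; scipy is assumed only for the t quantile) evaluates the prediction interval for a hypothetical $x_p = 100$, covering both the single-observation and the large-sample cases. Note that passing m = infinity works because 1/inf evaluates to 0 in Python:

    # Prediction intervals at xp for varying m (Example 14.1 numbers).
    from scipy import stats

    n, x_bar, sum_x2, se = 7, 38.71, 13855, 1.0834
    b0, b1 = -0.0083, 0.2438
    xp = 100                                 # program with 100 disk I/Os

    y_hat = b0 + b1 * xp                     # predicted mean CPU time

    def s_pred(m):
        """Std dev of the mean of m future observations at xp."""
        return se * (1 / m + 1 / n
                     + (xp - x_bar) ** 2 / (sum_x2 - n * x_bar ** 2)) ** 0.5

    t = stats.t.ppf(0.95, df=n - 2)          # for a 90% interval
    for m in (1, float('inf')):
        print(f"m={m}: {y_hat:.2f} +/- {t * s_pred(m):.2f}")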

CI for Predictions (Cont)

- Goodness of the prediction decreases as we move away from the center.

Example 14.5

- Using the disk I/O and CPU time data of Example 14.1, let us estimate the CPU time for a program with 100 disk I/Os.
- For a program with 100 disk I/Os, the mean CPU time is:

  $\hat{y}_p = b_0 + b_1 x_p = -0.0083 + 0.2438 \times 100 = 24.37$

Example 14.5 (Cont)

- The standard deviation of the predicted mean of a large number of observations is:

  $s_{\hat{y}_{\infty p}} = 1.0834\left[\frac{1}{7} + \frac{(100 - 38.71)^2}{13{,}855 - 7(38.71)^2}\right]^{1/2} = 1.22$

- From Table A.4, the 0.95-quantile of the t-variate with 5 degrees of freedom is 2.015. ⇒ 90% CI for the predicted mean:

  $24.37 \mp (2.015)(1.22) = (21.9,\ 26.8)$

Example 14.5 (Cont)

- Standard deviation of the CPU time of a single future program with 100 disk I/Os:

  $s_{\hat{y}_{1p}} = 1.0834\left[1 + \frac{1}{7} + \frac{(100 - 38.71)^2}{13{,}855 - 7(38.71)^2}\right]^{1/2} = 1.63$

- 90% CI for a single prediction:

  $24.37 \mp (2.015)(1.63) = (21.1,\ 27.7)$

Visual Tests for Regression Assumptions

Regression assumptions:

1. The true relationship between the response variable y and the predictor variable x is linear.
2. The predictor variable x is non-stochastic and is measured without any error.
3. The model errors are statistically independent.
4. The errors are normally distributed with zero mean and a constant standard deviation.

1. Linear Relationship: Visual Test

- Scatter plot of y versus x ⇒ linear or nonlinear relationship

2. Independent Errors: Visual Test

1. Scatter plot of the errors $e_i$ versus the predicted responses $\hat{y}_i$.

- All tests for independence simply try to find dependence.

Independent Errors (Cont)

2. Plot the residuals as a function of the experiment number.

3. Normally Distributed Errors: Test

- Prepare a normal quantile-quantile plot of the errors. A linear plot ⇒ the normality assumption is satisfied.

4. Constant Standard Deviation of Errors

- Also known as homoscedasticity.
- A trend in the spread of the residuals ⇒ try curvilinear regression or a transformation.
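The three visual tests can be produced with a few lines of plotting code. The sketch below is my own (it assumes matplotlib and scipy, which the original slides do not use) and generates the linearity, independence/spread, and normality plots for the Example 14.1 data:

    # Visual regression tests for the Example 14.1 data (a sketch).
    import matplotlib.pyplot as plt
    from scipy import stats

    xs = [14, 16, 27, 42, 39, 50, 83]
    ys = [2, 5, 7, 9, 10, 13, 20]
    b0, b1 = -0.0083, 0.2438

    y_hat = [b0 + b1 * x for x in xs]
    residuals = [y - yh for y, yh in zip(ys, y_hat)]

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].scatter(xs, ys)                   # linearity: y versus x
    axes[0].plot(xs, y_hat)
    axes[0].set(xlabel='Number of disk I/Os', ylabel='CPU time in ms')
    axes[1].scatter(y_hat, residuals)         # independence / constant spread
    axes[1].axhline(0, linestyle='--')
    axes[1].set(xlabel='Predicted response', ylabel='Residual')
    stats.probplot(residuals, plot=axes[2])   # normal quantile-quantile plot
    plt.tight_layout()
    plt.show()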

Example 14.6

For the disk I/O and CPU time data of Example 14.1:

[Figure: three diagnostic plots; CPU time in ms versus number of disk I/Os, residual versus predicted response, and residual quantile versus normal quantile.]

1. The relationship is linear.
2. No trend in the residuals ⇒ they seem independent.
3. Linear normal quantile-quantile plot ⇒ larger deviations at lower values, but all values are small.

Example 14.7: RPC Performance

[Figure: residual versus predicted response, and residual quantile versus normal quantile, for the RPC data.]

1. Larger errors at larger responses.
2. Normality of errors is questionable.

Summary

- Terminology: simple linear regression model, sums of squares, mean squares, degrees of freedom, percent of variation explained, coefficient of determination, correlation coefficient
- Regression parameters as well as the predicted responses have confidence intervals
- It is important to verify the assumptions of linearity, error independence, and error normality ⇒ visual tests

Exercise 14.7

- The time to encrypt a k-byte record using an encryption technique is shown in the following table. Fit a linear regression model to this data. Use visual tests to verify the regression assumptions.

  [Table not reproduced.]

Exercise 2.1

- From published literature, select an article or a report that presents results of a performance evaluation study. Make a list of good and bad points of the study. What would you do differently if you were asked to repeat the study?

Homework

- Read Chapter 14
- Submit answers to Exercise 14.7
- Submit answer to Exercise 2.1
