The following topics will be covered in Exam 2:

Multiple Regression

 Dummy variables
 Residuals
 Correlation
 Autocorrelation
 Least squares estimation
 Cross correlation
 Multicollinearity
 Outliers
 Forecasting

Analysis of Variance

 ANOVA
 ANOCOVA

You should know:

 Why and how dummy variables are used in multiple regression analysis.
 What residuals represent and their relationship to common cause variation in quality control.
 Why we use correlation coefficients instead of covariance to investigate relationships between variables.
 What autocorrelation is and how it may be used in multiple regression.
 What least squares estimation is and how it relates to the computation of regression coefficients in multiple regression.
 What cross correlation is and how it may be used to identify important relationships in multiple regression analysis.
 What multicollinearity means (central difference vectors are partially aligned between two explanatory variables, i.e. they point in nearly the same or opposite directions), and its implications for multiple regression coefficients.
 How to identify outliers and deal with them in multiple regression analysis.
 How to produce a forecast from a multiple regression model in Statgraphics.
 The uses of analysis of variance (ANOVA) in data analysis.
 The uses of analysis of covariance (ANOCOVA) in data analysis.
 How to use Statgraphics to develop regression models using these concepts (see the sample project for an example).
 Basic concepts covered in Exam 1 and their application in model building, i.e. Central Limit Theorem, Normal Distribution, Common Cause and Specific Cause Variations, confidence intervals and confidence limits, hypothesis testing and the P value, common statistics and their applications (Z, t, Chi-Squared, F).

MIS 131 Sections 7, 13 Sample Exam 2

This exam has three main parts and a bonus section. Parts 1 and 2 deal with multiple regression and are worth 60 points each. Part 3 deals with ANOVA and ANOCOVA and is worth 30 points. Part 4 is a bonus section worth another 30 points. Statgraphics will be useful in answering all questions in Parts 1, 2 and 3. The bonus questions test your knowledge of basic concepts in regression analysis. Data for Parts 1, 2 and 3 are contained in the data file IceCream.sfa.

Part 1

The following 10 questions are worth 6 points each.

It is proposed to model the dependence of ice cream sales on store traffic and outdoor temperature using daily data for the prior three months for Tom’s Texas Cookies and Ice Cream. A local Faire during the month of August has a significant impact on store traffic. The model is expected to be useful in planning sales staffing and order quantities for ice cream. The variables IceCream, Traffic and Temperature are daily time series data. The dummy variable Faire is included to represent days during which the local Faire is in progress.

A model is proposed of the form:

IceCream = 0 + 1*Traffic + 2*Temperature + 3*Faire

1. Are the proposed dependencies statistically significant (90%)? Yes No

2. Can the model be simplified? Yes No

3. Now how much of the variation in IceCream is explained by the model? ______

4. Which two points are obvious outliers? ______

5. Remove these points from the model. Now how much of the variation in IceCream is explained by the model (with the outliers excluded)? ______

6. Describe the model in the table:

Variable    Coefficient Value    p-Value

7. Is there significant autocorrelation in IceCream? Yes No

8. What is the most significant lag in IceCream? ______

9. Is this lag significant in modeling IceCream? Yes No

10. Is the seasonality in IceCream adequately represented by Traffic? Yes No

Part 2

Variables A, B, C, and D represent time series data. A model is proposed of the form:

A = 0 + 1*B + 2*C + 3*D

You are to investigate whether this model is adequate to describe the variation in A. The following 10 questions are worth six points each.

1. How much of the variation in A is explained by the proposed model? ______

2. Is autocorrelation in A with a 1 period lag statistically important? ______

Your client suggests that there may be some delays in seeing the effects of B, C and D on A because of the process by which A is influenced by B, C and D. He estimates the delays at up to 3 periods. Adjust your model to account for these possibilities and test it.
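One way to screen for delayed effects like these is to compute the cross correlation between A and each explanatory variable at lags 0 through 3. The Python sketch below illustrates the idea for a single variable. It is only an illustration under stated assumptions: the series are invented (A is constructed to respond to B exactly two periods later), and the helper names `pearson` and `lagged_correlation` are hypothetical. It is not the Statgraphics procedure used in the course.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def lagged_correlation(y, x, lag):
    """Correlation between y[t] and x[t - lag] over the periods where both exist."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Invented series (NOT the exam data): A follows B with a two-period delay.
b = [5, 9, 2, 7, 11, 3, 8, 1, 10, 4, 6, 12]
# The first two values of A are arbitrary fill-ins, since B has no earlier history.
a = [20.0, 20.0] + [3.0 * b[t - 2] + 1.0 for t in range(2, len(b))]

correlations = {lag: lagged_correlation(a, b, lag) for lag in range(4)}
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))

for lag, r in correlations.items():
    print(f"lag {lag}: r = {r:.3f}")
print("strongest lag:", best_lag)
```

A lag that stands out in this screen is a candidate for inclusion as a lagged explanatory variable; its significance still has to be judged from the p-value in the fitted regression.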

What are the lags found to be significant for B, C and D?

Variable    Lag
3. B
4. C
5. D

6. Describe your revised model for A.

7. How much of the variation in A is explained by the revised model? ______

8. Your client objects to the inclusion of a constant in the model. Is it statistically significant at the 95% level of confidence? ______

Remove the constant from the model.

9. How much of the variation in A is explained by the re-revised model? ______

10. List the coefficients of the re-revised model to 1 decimal place in the following table.

Variable    Coefficient

Part 3

An experiment was designed to test the importance of shelf placement and soup type on chicken soup sales. The variables Type, Level and Sales refer to the type of chicken soup (Stars, Noodles or Rice), the shelf level (Top, Middle, Lowest) and the aggregate sales of the type of soup. Sales data for these soups were collected over a period of several days, sufficient to mask seasonal effects.

1. Is shelf level significant? ______

2. Is soup type significant? ______

Normally, all three soups sell for the same price. However, later information has revealed that the store practices promotional pricing on soups located on the lowest shelf. The sales data were disaggregated and placed into the variables SoupySales, PSoup, TSoup and LSoup, showing respectively sales, price, type and shelf level.

3. Is shelf level significant? ______

4. Is soup type significant? ______

5. Show your ranking of importance for the significant variables in the following table, ordered from most to least conducive to sales.

Indicator Rank

Bonus: Each question is worth 6 points.

Trending variables may display multicollinearity when used as explanatory variables in a model.
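The statement above can be illustrated with a short Python sketch. It rests on assumptions made only for illustration: the two series are invented, and the helper name `pearson` is hypothetical. For a model with exactly two explanatory variables, the variance inflation factor of either one is 1 / (1 - r^2), where r is the correlation between them.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Two invented series that both trend upward over ten periods.
x1 = [10, 12, 13, 15, 18, 19, 21, 24, 25, 27]
x2 = [5, 5.5, 7, 7.5, 9, 10.5, 11, 12, 13.5, 14]

r = pearson(x1, x2)
# Variance inflation factor for the two-regressor case.
vif = 1.0 / (1.0 - r * r)

print(f"correlation between x1 and x2: {r:.3f}")
print(f"variance inflation factor: {vif:.1f}")
```

Because both series mostly follow the same trend, their centered values point in nearly the same direction, so the regression has little information for separating their individual effects.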

1. If the sign of a coefficient in a multiple regression analysis were unexpected, e.g. negative when you expected positive, how would you test for multicollinearity?

2. If Multicollinearity was identified, what implication would this have for forecasting beyond the range of the data?

3. Non-linear effects of independent variables can be modeled in linear regression analysis. True False

4. Explain the term “minimum variance estimation” as it relates to regression analysis.

5. What is the null hypothesis corresponding to the p-values in a multiple regression analysis table of coefficients?