Simple Linear Regression – Assignment #6 (40 points)
1 – Points Per Game (PTS/G) and Minutes Played Per Game (Min/G)
Goal: Develop a regression model to predict/explain points scored per game using playing time measured as minutes per game. Data Set: NBA Top 50 2002-2003
This assignment is similar to your simple linear regression handout; however, I want you to investigate Points/Game (i.e. the average number of points scored per game) versus Minutes/Game (i.e. the average number of minutes played). The primary interest is in Points/Game (PTS/G) and we want to understand this variable using Minutes/Game (Min/G), so let Y = PTS/G and X = Min/G.
Main Items to address:
1. Obtain a linear correlation measurement to initially investigate the linear relationship between these two variables. Are these variables linearly related to each other? Explain. (2 pts.)
2. Perform the overall regression usefulness test (i.e. HO: Regression is not useful vs HA: Regression is useful) to formalize your initial investigation of these variables. What is your decision for this test? Write a conclusion in everyday language for this test. (2 pts.)
3. Perform the test to ensure that the slope of our regression line is not zero (i.e. HO: β_Min/G = 0 vs HA: β_Min/G ≠ 0, where β_Min/G is the slope for Min/G). What is your decision for this test? Write a conclusion using everyday language for this test. (2 pts.)
4. What is the R-Square value for this analysis? In the context of this problem, carefully explain what this number is measuring. (3 pts.)
5. Using JMP, create a scatter plot of the data with the estimated regression line. In the context of this problem, carefully interpret the y-intercept and slope of your estimated regression line. Again, carefully explain what these numbers are measuring. (You need to do more than say they are the y-intercept and slope of the line.) (4 pts.)
6. Discuss whether or not the assumptions for this procedure are being met. Also, identify any outliers in the data set. (4 pts.)
Checking the assumptions:
> Model Appropriate: Make sure no existing trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: Don’t really need to check this, as these data are not collected over time.
> Normality: Make a histogram of the residuals and make sure they follow a normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers. Who are the outlying players?
7. What is the predicted PTS/G for Kevin Garnett of the Minnesota Timberwolves? What can we say about Kevin Garnett's scoring in terms of the number of minutes he is allowed to play? Discuss. (3 pts.)
8. Using this analysis, identify at least one player that is doing much better than expected in terms of his scoring considering how many minutes he is allowed to play. How did you make this determination? Explain. (3 pts.)
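The quantities asked for in items 1 through 8 come from JMP in this assignment, but it can help to see how they are computed. The following is only a minimal sketch in Python, not part of the JMP workflow: the function name `fit_line` and the ten data values are hypothetical (they are not the NBA Top 50 data). It computes the correlation, the estimated slope and y-intercept, R-Square, RMSE, the t statistic for the slope, and the residuals used for the ±2*RMSE outlier check.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary numbers."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # estimated change in Y per unit of X
    intercept = y_bar - slope * x_bar      # estimated Y when X = 0
    r = sxy / math.sqrt(sxx * syy)         # linear correlation

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    rmse = math.sqrt(sse / (n - 2))        # root mean square error, n - 2 df
    t_slope = slope / (rmse / math.sqrt(sxx))
    return {
        "slope": slope, "intercept": intercept, "r": r,
        "r_squared": 1 - sse / syy,        # share of variation in Y explained by X
        "rmse": rmse, "t_slope": t_slope,
        "f_stat": t_slope ** 2,            # with one predictor, F = t^2
        "residuals": residuals,
    }

# HYPOTHETICAL values for illustration only, not the NBA Top 50 data.
min_per_game = [30.1, 32.5, 34.0, 35.2, 36.8, 37.5, 38.3, 39.0, 40.4, 41.1]
pts_per_game = [14.2, 15.9, 17.1, 16.8, 19.5, 25.6, 20.3, 21.7, 22.4, 24.0]

fit = fit_line(min_per_game, pts_per_game)
print("r =", round(fit["r"], 3), " R-Square =", round(fit["r_squared"], 3))
print("PTS/G =", round(fit["intercept"], 3), "+", round(fit["slope"], 3), "* Min/G")

# Possible outliers: residuals outside +/- 2*RMSE.
outliers = [i for i, e in enumerate(fit["residuals"]) if abs(e) > 2 * fit["rmse"]]
print("rows flagged as possible outliers:", outliers)

# Predicted PTS/G for a player averaging 40 minutes (hypothetical input).
print("predicted PTS/G at 40 Min/G:", round(fit["intercept"] + fit["slope"] * 40, 2))
```

A positive residual means a player scores more than the line predicts for his minutes, which is the comparison items 7 and 8 ask about. The p-values for the t and F statistics still need to come from JMP or a t/F table, since this sketch stops at the test statistics.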
2 – Car Purchase Price and Income of Buyer
Goal: Develop a regression model to predict/explain the amount a person spends on a car using their income in thousands of dollars. Data Set: Car-purchase
Here I want you to investigate the amount spent on a car purchase versus the income of the car buyer. The primary interest is in Price.Paid ($) and we want to understand this variable using Income ($1000’s), so let Y = Price.Paid and X = Income ($1000’s).
Main Items to address:
1. Obtain a linear correlation measurement to initially investigate the linear relationship between these two variables. Are these variables linearly related to each other? Explain. (2 pts.)
2. What is the R-Square value for this analysis? In the context of this problem, carefully explain what this number is measuring. (3 pts.)
3. Create a scatter plot of the data with the estimated regression line. In the context of this problem, carefully interpret the y-intercept and slope of your estimated regression line. Again, carefully explain what these numbers are measuring. (You need to do more than say they are the y-intercept and slope of the line.) (4 pts.)
4. Discuss whether or not the assumptions for this procedure are being met. Also, identify any outliers in the data set. (4 pts.)
Checking the assumptions:
> Model Appropriate: Make sure no existing trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: Don’t really need to check this, as these data are not collected over time.
> Normality: Make a histogram of the residuals and make sure they follow a normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers.
5. Use the Analyze > Fit Model approach to fit the model and obtain 95% confidence and 95% prediction intervals for a car buyer with an income of $40,000 (i.e. Income = 40, since Income is recorded in $1000’s). Interpret each of these intervals. (4 pts.)
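To see why the prediction interval in item 5 is always wider than the confidence interval, here is a minimal sketch in Python of the standard formulas. It is an illustration only and not the JMP Fit Model procedure: the function name `mean_and_prediction_intervals`, the ten income/price pairs, and the constant `T_CRIT_DF8` are all hypothetical choices for this example (the data are not the Car-purchase data set). The t critical value is passed in by hand; 2.306 is the 97.5th percentile of a t distribution with 8 degrees of freedom, which matches n - 2 for ten observations.

```python
import math

def mean_and_prediction_intervals(x, y, x0, t_crit):
    """Return (y_hat, confidence interval, prediction interval) at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))

    y_hat = intercept + slope * x0
    # Uncertainty in the estimated line at x0; grows as x0 moves away from x_bar.
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = t_crit * rmse * math.sqrt(spread)      # for the MEAN of Y at x0
    half_pi = t_crit * rmse * math.sqrt(1 + spread)  # for ONE new Y at x0
    return (y_hat,
            (y_hat - half_ci, y_hat + half_ci),
            (y_hat - half_pi, y_hat + half_pi))

# HYPOTHETICAL values for illustration only, not the Car-purchase data.
income_k = [22, 28, 31, 35, 40, 44, 52, 60, 68, 75]            # Income ($1000's)
price_paid = [9500, 12400, 11800, 15200, 16900,
              17500, 21800, 24100, 27900, 29500]               # Price.Paid ($)

T_CRIT_DF8 = 2.306  # t critical value for 95% intervals with n - 2 = 8 df

y_hat, ci, pi = mean_and_prediction_intervals(income_k, price_paid, 40, T_CRIT_DF8)
print("predicted price at Income = 40:", round(y_hat))
print("95% confidence interval (mean price):", tuple(round(v) for v in ci))
print("95% prediction interval (one buyer): ", tuple(round(v) for v in pi))
```

The confidence interval describes the average price paid by all buyers with that income, while the prediction interval describes the price paid by one individual buyer, so it carries the extra "1 +" term for person-to-person variation.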