Backward Elimination Procedure
The scenario:
An investor wishes to track the NASDAQ 100 (QQQQ) index by purchasing up to 11 stocks that he has already selected. He is willing to tolerate volatility of at most 2.5%. What is the minimum number of the 11 stocks that he must purchase in order to meet his volatility requirement?
Solution:
Using backward elimination, the error and adjusted R2 of different sets of stocks can be compared. Collinearity matters here, not in the sense that the independent variables themselves are being explored, but in the sense that more variables will be kept in the model.
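To make the procedure concrete, here is a minimal Python sketch of backward elimination. It is not the SAS run used in this analysis: the data are synthetic, the helper names (solve, sse, backward_eliminate) are made up for the example, and the stopping rule uses a hypothetical partial-F cutoff (f_to_stay) where PROC REG uses a significance level. The partial F computed for each variable is the increase in error sum of squares when that variable is dropped, divided by the error mean square of the current model.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def sse(columns, y):
    """Error sum of squares from an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def backward_eliminate(data, y, f_to_stay=5.0):
    """Drop the variable with the smallest partial F until every partial F >= f_to_stay.

    f_to_stay is a hypothetical cutoff chosen for this sketch.
    """
    kept = list(data)
    while kept:
        full_sse = sse([data[k] for k in kept], y)
        mse = full_sse / (len(y) - len(kept) - 1)
        partial_f = {
            name: (sse([data[k] for k in kept if k != name], y) - full_sse) / mse
            for name in kept
        }
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_stay:
            break
        kept.remove(weakest)
    return kept


# Synthetic example: y depends on x1 and x2 only; x3 is pure noise.
random.seed(0)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [0.5 * a + 0.3 * b + random.gauss(0, 0.1) for a, b in zip(data["x1"], data["x2"])]
kept = backward_eliminate(data, y)
print(kept)
```

In this toy setup the two strong predictors x1 and x2 should survive; whether the noise column x3 is dropped depends on the random draw and on the cutoff.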
Results:
The procedure removed three variables: GM, HAL and WMI (surprisingly, GM was one of them). The full output is attached at the end of this analysis. A correlation matrix was then run on the remaining variables to check for high collinearity.
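As a rough illustration of this collinearity check, the following Python sketch scans a correlation matrix for pairs at or above a cutoff. The coefficients are copied from the correlation matrix reported in this analysis; the function name flag_pairs and the 0.5 cutoff are assumptions made for the example and were not part of the original SAS run.

```python
names = ["MMM", "ASD", "EBAY", "GS", "IBM", "MER", "MEL", "YHOO"]

# Correlation coefficients as reported in this analysis (rows and columns follow `names`).
corr = [
    [1.00000, 0.24255, 0.10982, 0.32939, 0.30720, 0.38582, 0.34506, 0.19755],
    [0.24255, 1.00000, 0.15376, 0.32312, 0.20317, 0.32785, 0.34731, 0.23693],
    [0.10982, 0.15376, 1.00000, 0.27219, 0.17819, 0.24400, 0.19815, 0.42391],
    [0.32939, 0.32312, 0.27219, 1.00000, 0.32283, 0.70268, 0.45924, 0.37330],
    [0.30720, 0.20317, 0.17819, 0.32283, 1.00000, 0.37625, 0.36774, 0.19470],
    [0.38582, 0.32785, 0.24400, 0.70268, 0.37625, 1.00000, 0.56383, 0.31523],
    [0.34506, 0.34731, 0.19815, 0.45924, 0.36774, 0.56383, 1.00000, 0.27715],
    [0.19755, 0.23693, 0.42391, 0.37330, 0.19470, 0.31523, 0.27715, 1.00000],
]


def flag_pairs(names, corr, threshold=0.5):
    """Return (name_a, name_b, r) for each distinct pair with |r| >= threshold.

    The default threshold of 0.5 is an assumed cutoff for this sketch.
    """
    size = len(names)
    return [
        (names[i], names[j], corr[i][j])
        for i in range(size)
        for j in range(i + 1, size)
        if abs(corr[i][j]) >= threshold
    ]


flagged = flag_pairs(names, corr)
for a, b, r in flagged:
    print(f"{a}-{b}: {r:.5f}")
# With a 0.5 cutoff, only the GS-MER and MER-MEL pairs are flagged.
```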
Correlation coefficients, with the p-value beneath each coefficient:

             MMM       ASD      EBAY        GS       IBM       MER       MEL      YHOO
MMM      1.00000   0.24255   0.10982   0.32939   0.30720   0.38582   0.34506   0.19755
                    <.0001    0.0205    <.0001    <.0001    <.0001    <.0001    <.0001
ASD      0.24255   1.00000   0.15376   0.32312   0.20317   0.32785   0.34731   0.23693
          <.0001              0.0011    <.0001    <.0001    <.0001    <.0001    <.0001
EBAY     0.10982   0.15376   1.00000   0.27219   0.17819   0.24400   0.19815   0.42391
          0.0205    0.0011              <.0001    0.0002    <.0001    <.0001    <.0001
GS       0.32939   0.32312   0.27219   1.00000   0.32283   0.70268   0.45924   0.37330
          <.0001    <.0001    <.0001              <.0001    <.0001    <.0001    <.0001
IBM      0.30720   0.20317   0.17819   0.32283   1.00000   0.37625   0.36774   0.19470
          <.0001    <.0001    0.0002    <.0001              <.0001    <.0001    <.0001
MER      0.38582   0.32785   0.24400   0.70268   0.37625   1.00000   0.56383   0.31523
          <.0001    <.0001    <.0001    <.0001    <.0001              <.0001    <.0001
MEL      0.34506   0.34731   0.19815   0.45924   0.36774   0.56383   1.00000   0.27715
          <.0001    <.0001    <.0001    <.0001    <.0001    <.0001              <.0001
YHOO     0.19755   0.23693   0.42391   0.37330   0.19470   0.31523   0.27715   1.00000
          <.0001    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001

MER is highly correlated with both GS (0.70268) and MEL (0.56383). All other correlations are far lower. To address this, a second regression was run with MER removed. In both models the C-statistic equals k+1; however, R2 fell from .6809 to .6645. Following the parameters of the question strictly, i.e. choosing the minimum number of stocks that meets the 2.5% volatility requirement, it seems that keeping MER is prudent.

The SAS System 15:25 Friday, May 26, 2006
The REG Procedure
Model: MODEL1
Dependent Variable: QQQQ QQQQ

Number of Observations Read    445
Number of Observations Used    445
Backward Elimination: Step 0
All Variables Entered: R-Square = 0.6809 and C(p) = 12.0000
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               11           0.02327        0.00212      84.01    <.0001
Error              433           0.01090     0.00002518
Corrected Total    444           0.03418

Variable     Parameter Estimate    Standard Error     Type II SS    F Value    Pr > F
Intercept           -0.00002310        0.00024104     2.31243E-7       0.01    0.9237
MMM                     0.08373           0.02508     0.00028074      11.15    0.0009
ASD                     0.04881           0.01608     0.00023202       9.21    0.0025
EBAY                    0.07495           0.01093        0.00118      47.02    <.0001
GM                      0.00535           0.00961     0.00000781       0.31    0.5780
GS                      0.08354           0.02857     0.00021541       8.55    0.0036
IBM                     0.20059           0.02677        0.00141      56.15    <.0001
HAL                     0.02016           0.01201     0.00007095       2.82    0.0940
MER                     0.11100           0.03332     0.00027947      11.10    0.0009
MEL                     0.09640           0.02677     0.00032671      12.97    0.0004
WMI                     0.05392           0.02518     0.00011545       4.58    0.0328
YHOO                    0.09883           0.01450        0.00117      46.48    <.0001

Bounds on condition number: 2.5188, 178.19
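The headline figures in the Step 0 output follow directly from the sums of squares. The Python sketch below (Python rather than SAS, purely for illustration) recomputes R-Square, the overall F value, and the partial F for GM using only numbers printed in the Step 0 tables. Small differences from the printed values are expected because the printed sums of squares are rounded.

```python
import math

# Values copied from the Step 0 Analysis of Variance table (all 11 stocks in the model).
df_model, ss_model = 11, 0.02327
df_error, ss_error = 433, 0.01090
ss_total = 0.03418
mse_printed = 0.00002518  # error mean square as printed

r_square = ss_model / ss_total          # share of total variation explained
ms_model = ss_model / df_model          # model mean square
ms_error = ss_error / df_error          # error mean square
f_value = ms_model / ms_error           # overall F statistic
root_mse = math.sqrt(ms_error)          # residual standard deviation

# Partial F for one variable: its Type II SS divided by the error mean square.
gm_type2_ss = 0.00000781
gm_partial_f = gm_type2_ss / mse_printed

# The same partial F is the square of estimate / standard error.
gm_estimate, gm_std_error = 0.00535, 0.00961
gm_t_squared = (gm_estimate / gm_std_error) ** 2

print(f"R-Square      {r_square:.4f}")
print(f"F Value       {f_value:.2f}")
print(f"Root MSE      {root_mse:.5f}")
print(f"GM partial F  {gm_partial_f:.2f} (t squared: {gm_t_squared:.2f})")
```

The low partial F for GM is what makes it the first candidate for removal in Step 1.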
Backward Elimination: Step 1
Variable GM Removed: R-Square = 0.6807 and C(p) = 10.3100
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               10           0.02326        0.00233      92.52    <.0001
Error              434           0.01091     0.00002514
Corrected Total    444           0.03418

Variable     Parameter Estimate    Standard Error     Type II SS    F Value    Pr > F
Intercept           -0.00002725        0.00024073    3.220941E-7       0.01    0.9099
MMM                     0.08380           0.02506     0.00028122      11.18    0.0009
ASD                     0.04951           0.01602     0.00024010       9.55    0.0021
EBAY                    0.07507           0.01092        0.00119      47.26    <.0001
GS                      0.08246           0.02848     0.00021085       8.39    0.0040
IBM                     0.20161           0.02668        0.00144      57.09    <.0001
HAL                     0.02006           0.01200     0.00007023       2.79    0.0954
MER                     0.11438           0.03274     0.00030687      12.20    0.0005
MEL                     0.09657           0.02674     0.00032789      13.04    0.0003
WMI                     0.05475           0.02512     0.00011943       4.75    0.0298
YHOO                    0.09892           0.01448        0.00117      46.65    <.0001

Bounds on condition number: 2.4354, 149.65
Backward Elimination: Step 2
Variable HAL Removed: R-Square = 0.6786 and C(p) = 11.0986
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                9           0.02319        0.00258     102.07    <.0001
Error              435           0.01098     0.00002525
Corrected Total    444           0.03418

Variable     Parameter Estimate    Standard Error     Type II SS    F Value    Pr > F
Intercept            0.00000999        0.00024019    4.371028E-8       0.00    0.9668
MMM                     0.08454           0.02511     0.00028632      11.34    0.0008
ASD                     0.05250           0.01595     0.00027337      10.83    0.0011
EBAY                    0.07567           0.01094        0.00121      47.87    <.0001
GS                      0.08588           0.02846     0.00022989       9.11    0.0027
IBM                     0.20004           0.02672        0.00141      56.04    <.0001
MER                     0.11849           0.03272     0.00033119      13.12    0.0003
MEL                     0.09767           0.02679     0.00033558      13.29    0.0003
WMI                     0.05537           0.02517     0.00012219       4.84    0.0283
YHOO                    0.09995           0.01450        0.00120      47.52    <.0001

Bounds on condition number: 2.4217, 124.27
Backward Elimination: Step 3
Variable WMI Removed: R-Square = 0.6751 and C(p) = 13.9506
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                8           0.02307        0.00288     113.23    <.0001
Error              436           0.01111     0.00002547
Corrected Total    444           0.03418

Variable     Parameter Estimate    Standard Error     Type II SS    F Value    Pr > F
Intercept            0.00003384        0.00024100    5.020536E-7       0.02    0.8884
MMM                     0.08798           0.02517     0.00031124      12.22    0.0005
ASD                     0.05546           0.01597     0.00030727      12.06    0.0006
EBAY                    0.07768           0.01095        0.00128      50.36    <.0001
GS                      0.08991           0.02853     0.00025300       9.93    0.0017
IBM                     0.20236           0.02682        0.00145      56.94    <.0001
MER                     0.12354           0.03278     0.00036178      14.20    0.0002
MEL                     0.10614           0.02663     0.00040466      15.89    <.0001
YHOO                    0.09958           0.01456        0.00119      46.76    <.0001

Bounds on condition number: 2.4098, 100.13
All variables left in the model are significant at the 0.0250 level.
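The stopping rule behind this message can be replayed from the printed p-values. The Python sketch below does not refit any regression; it only walks through the Pr > F columns of Steps 0 to 3 and, at each step, removes the variable with the largest p-value if that p-value exceeds the 0.025 stay level quoted above. The function name replay_backward is an assumption for the example, and entries printed as <.0001 are entered as 0.0001, which is an upper bound rather than the true value.

```python
# Pr > F for each stock at each step, copied from the parameter tables.
# "<.0001" is entered as 0.0001 (an upper bound, not the exact value).
steps = [
    {"MMM": 0.0009, "ASD": 0.0025, "EBAY": 0.0001, "GM": 0.5780, "GS": 0.0036,
     "IBM": 0.0001, "HAL": 0.0940, "MER": 0.0009, "MEL": 0.0004, "WMI": 0.0328,
     "YHOO": 0.0001},
    {"MMM": 0.0009, "ASD": 0.0021, "EBAY": 0.0001, "GS": 0.0040, "IBM": 0.0001,
     "HAL": 0.0954, "MER": 0.0005, "MEL": 0.0003, "WMI": 0.0298, "YHOO": 0.0001},
    {"MMM": 0.0008, "ASD": 0.0011, "EBAY": 0.0001, "GS": 0.0027, "IBM": 0.0001,
     "MER": 0.0003, "MEL": 0.0003, "WMI": 0.0283, "YHOO": 0.0001},
    {"MMM": 0.0005, "ASD": 0.0006, "EBAY": 0.0001, "GS": 0.0017, "IBM": 0.0001,
     "MER": 0.0002, "MEL": 0.0001, "YHOO": 0.0001},
]


def replay_backward(steps, stay_level=0.025):
    """Return (removed, survivors) by applying the stay rule to each step's p-values."""
    removed = []
    survivors = list(steps[0])
    for p_values in steps:
        survivors = list(p_values)
        worst = max(p_values, key=p_values.get)  # least significant variable
        if p_values[worst] <= stay_level:
            break  # everything left is significant at the stay level
        removed.append(worst)
    return removed, survivors


removed, survivors = replay_backward(steps)
print(removed)    # ['GM', 'HAL', 'WMI']
print(survivors)
```

Note that WMI, with a p-value of 0.0283, would have stayed in the model under a looser stay level such as 0.05.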
Summary of Backward Elimination
Step    Variable Removed    Label    Number Vars In    Partial R-Square    Model R-Square       C(p)    F Value    Pr > F
1       GM                  GM                   10              0.0002            0.6807    10.3100       0.31    0.5780
2       HAL                 HAL                   9              0.0021            0.6786    11.0986       2.79    0.0954
3       WMI                 WMI                   8              0.0036            0.6751    13.9506       4.84    0.0283
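The C(p) column can be cross-checked against the Analysis of Variance tables, and adjusted R2 can be computed for each step in the same pass. The Python sketch below assumes the usual definition of Mallows' statistic, C(p) = SSE_p / s^2 - (n - 2p), where s^2 is the error mean square of the full 11-variable model and p counts the intercept. All inputs are rounded values copied from the output, so the recomputed C(p) agrees with the printed one only to within a few tenths.

```python
n_obs = 445
mse_full = 0.00002518  # error mean square of the full model (Step 0)

# (step, variables in model, error SS, model R-Square, printed C(p))
steps = [
    (0, 11, 0.01090, 0.6809, 12.0000),
    (1, 10, 0.01091, 0.6807, 10.3100),
    (2, 9, 0.01098, 0.6786, 11.0986),
    (3, 8, 0.01111, 0.6751, 13.9506),
]


def mallows_cp(sse, n_vars):
    """Mallows' C(p), with p = n_vars + 1 parameters (intercept included)."""
    p = n_vars + 1
    return sse / mse_full - (n_obs - 2 * p)


def adjusted_r2(r2, n_vars):
    """Adjusted R-Square for a model with n_vars predictors plus an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_vars - 1)


results = [
    (step, n_vars, mallows_cp(sse, n_vars), printed_cp, adjusted_r2(r2, n_vars))
    for step, n_vars, sse, r2, printed_cp in steps
]
for step, n_vars, cp, printed_cp, adj in results:
    print(f"step {step}: {n_vars} vars, C(p) {cp:.2f} (printed {printed_cp:.4f}), "
          f"adjusted R2 {adj:.4f}")
```

Comparing adjusted R2 across the steps is one way to carry out the comparison of stock sets described in the Solution.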