Simple Mean Weighted Mean Or Harmonic Mean
Multiply or Divide? A Best Practice for Factor Analysis

7-10 June 2011
Dr. Shu-Ping Hu
Alfred Smith, CCEA

Los Angeles; Washington, D.C.; Boston; Chantilly; Huntsville; Dayton; Santa Barbara; Albuquerque; Colorado Springs; Ft. Meade; Ft. Monmouth; Goddard Space Flight Center; Ogden; Patuxent River; Silver Spring; Washington Navy Yard; Cleveland; Dahlgren; Denver; Johnson Space Center; Montgomery; New Orleans; Oklahoma City; Tampa; Tacoma; Vandenberg AFB; Warner Robins ALC

Presented at the 2011 ISPA/SCEA Joint Annual Conference and Training Workshop - www.iceaaonline.com
PRT-70, 01 Apr 2011

Objectives

It is common to estimate hours as a simple factor of a technical parameter such as weight, aperture, power, or source lines of code (SLOC), i.e., hours = a * TechParameter
  - “Software development hours = a * SLOC” is used as an example
  - The concept is applicable to any factor cost estimating relationship (CER)

Our objective is to address how best to estimate “a”
  - Multiply SLOC by Hour/SLOC or divide SLOC by SLOC/Hour?
  - Simple, weighted, or harmonic mean?
  - Role of regression analysis
  - Base uncertainty on the prediction interval rather than just the range

Our goal is to give analysts a better understanding of the choices available and how to select the right approach

PR-70, 01 Apr 2011. Approved for Public Release.

Outline

Definitions and Examples:
  - Means (Simple, Weighted, and Harmonic Means)
  - Regression Methods (MUPE, ZMPE, and Log-Error)
  - Harmonic Mean Examples
Estimating SW Development Hours by a Factor of
  - SLOC per Hour
  - Hour per SLOC
Analyzing Factor CERs Using Different Methods
Analyzing Uncertainties
Realistic Example
Conclusions

Definitions of Means

Given a set of n observations (Z_1, Z_2, …, Z_n), the simple mean (SM), weighted mean (WM), geometric mean (GM), and harmonic mean (HM) are given by

$$\mathrm{SM} = \frac{1}{n}\sum_{i=1}^{n} Z_i, \qquad \mathrm{WM} = \frac{\sum_{i=1}^{n} w_i Z_i}{\sum_{i=1}^{n} w_i}$$

$$\mathrm{GM} = \left(\prod_{i=1}^{n} Z_i\right)^{1/n}, \qquad \mathrm{HM} = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{Z_i}\right)^{-1} = \frac{n}{\sum_{i=1}^{n} 1/Z_i}$$

where w_i is the weighting factor of the ith observation (i = 1, …, n)

Note: $\mathrm{HM}(Z) = (\mathrm{SM}(1/Z))^{-1}$

SM ≥ GM ≥ HM if all observations (Z_i's) are positive
  - These means are the same if all observations are the same

Whether SM is greater or less than WM depends upon the relative magnitudes of the weighting factors

Weighted Harmonic Mean

The weighted harmonic mean (WHM) is defined by

$$\mathrm{WHM} = \left(\frac{1}{\sum_{i=1}^{n} w_i}\sum_{i=1}^{n}\frac{w_i}{Z_i}\right)^{-1} = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} w_i / Z_i}$$

where w_i is the weighting factor of the ith observation (i = 1, …, n)

WHM reduces to HM when all w_i's are the same:

$$\mathrm{WHM} \Rightarrow \mathrm{HM} = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{Z_i}\right)^{-1}$$

HM = GM^2/SM for two observations
  - HM(a, b) = 2/(1/a + 1/b) = 2ab/(a + b) = ab/[(a + b)/2]
  - If a = 1, b = 2, then SM = 1.5, GM = 1.414, HM = 1.333

Harmonic Mean Examples

Productivity:
  - John can finish a data entry job in 60 hours
  - Mary can do it in 30 hours
  - The harmonic mean is 40 hours per person
  - It will take them 40/2 = 20 hours to finish the job if they work together

Speed:
  - A car travels 120 miles at a speed of 60 miles/hour
  - It travels the next 120 miles at a speed of 30 miles/hour
  - Its average speed is the harmonic mean of 60 and 30, i.e., 40 miles/hour
      - This means that to cover the 240 miles in the same time at a single speed, you would travel at 40 miles/hour

SM is not applicable here, and it is often mistakenly used in places where HM should be used

Contrasting Simple and Harmonic Means

Every statistic that measures central tendency has its own pros and cons

Simple Mean Properties:
  - It can be highly influenced by extreme data points
  - It does not give greater weights to large data points
  - It treats all data points equally, as the weights are the same in the computation formula

HM strongly favors the smaller numbers in the data set; it does not give equal weight to each data point
  - If x_1 = 1, x_2 = 2 ⇒ SM = 1.5, GM = 1.4, HM = 1.3
  - If x_1 = 0.1 and x_2 = … = x_10 = 10 ⇒ SM = 9.0, GM = 6.3, HM = 0.9

Developing the Factors by Regression Analysis

Simple and weighted means are also the solutions derived by weighted least squares (WLS) regression
analysis for factor equations

Multiplicative regression methods will be discussed next

Multiplicative Error Models - Log-Error, MUPE, & ZMPE

Definition of the error term for $Y = f(X)\,\varepsilon$: cost variation is proportional to the level of the function

Log-Error: $\varepsilon \sim LN(0, \sigma^2)$ ⇒ Least squares in log space
  - Error = Log(Y) - Log f(X)
  - Minimize the sum of squared errors; the process is done in log space

MUPE: $E(\varepsilon) = 1$, $V(\varepsilon) = \sigma^2$ ⇒ Least squares in weighted space
  - Error = (Y - f(X))/f(X)
  - Note: $E\big((Y - f(X))/f(X)\big) = 0$ and $V\big((Y - f(X))/f(X)\big) = \sigma^2$
  - Minimize the sum of squared % errors iteratively, i.e., minimize

$$\sum_i \left\{\frac{y_i - f(x_i)}{f_{k-1}(x_i)}\right\}^2$$

    where k is the iteration number

ZMPE: $E(\varepsilon) = 1$, $V(\varepsilon) = \sigma^2$ ⇒ Least squares in weighted space
  - Error = (Y - f(X))/f(X)
  - Minimize the sum of squared (percentage) errors with a constraint:

$$\sum_i \frac{y_i - f(x_i)}{f(x_i)} = 0$$

We will focus on MUPE/ZMPE factor equations in this paper

SW Dev Hours As a Factor of SLOC/Hour & Hour/SLOC (1/4)

SW Dev hours for a program are commonly estimated using the source lines of code (SLOC) and productivity rate (SLOC per hour)

Given n paired data points (SLOC_1, hour_1), (SLOC_2, hour_2), …, (SLOC_n, hour_n), the SM, WM, and HM of the productivity rate are given by

$$\mathrm{SM(SLOC/hour)} = \mathrm{SM\_Rate} = \frac{1}{n}\sum_{i=1}^{n}\frac{\mathrm{SLOC}_i}{\mathrm{hour}_i} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{hour}_i/\mathrm{SLOC}_i}$$

$$\mathrm{WM(SLOC/hour)} = \mathrm{WM\_Rate} = \frac{\sum_{i=1}^{n} w_i\,(\mathrm{SLOC}_i/\mathrm{hour}_i)}{\sum_{i=1}^{n} w_i} = \frac{\sum_{i=1}^{n} \mathrm{hour}_i\,(\mathrm{SLOC}_i/\mathrm{hour}_i)}{\sum_{i=1}^{n} \mathrm{hour}_i} = \frac{\sum_{i=1}^{n} \mathrm{SLOC}_i}{\sum_{i=1}^{n} \mathrm{hour}_i}$$

$$\mathrm{HM(SLOC/hour)} = \mathrm{HM\_Rate} = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{SLOC}_i/\mathrm{hour}_i}\right)^{-1} = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\mathrm{hour}_i}{\mathrm{SLOC}_i}\right)^{-1}$$

$$(\mathrm{HM(SLOC/hour)})^{-1} = \mathrm{SM(hour/SLOC)}; \qquad (\mathrm{SM\_Rate})^{-1} = \mathrm{HM(hour/SLOC)}$$

“Flip” twice to relate HM to SM, and vice versa: $\mathrm{HM}(X) = (\mathrm{SM}(1/X))^{-1}$ and $\mathrm{SM}(X) = (\mathrm{HM}(1/X))^{-1}$

SW Dev Hours As a Factor of SLOC/Hour & Hour/SLOC (2/4)

SW Dev hours for a program are commonly estimated based on an estimate of SLOC divided by an average factor of SLOC per Hour (an estimate of the average productivity rate)
  - Divide:

$$\begin{aligned}
\text{Dev-Hours} &= \mathrm{Est(SLOC)} \,/\, \mathrm{SM(SLOC\ per\ hour)} \\
&= \mathrm{Est(SLOC)} \,/\, \Big\{\Big(\sum_i \mathrm{SLOC}_i/\mathrm{Hour}_i\Big)/n\Big\} \\
&= \mathrm{Est(SLOC)} \cdot \Big\{\Big(\sum_i \frac{1}{\mathrm{Hour}_i/\mathrm{SLOC}_i}\Big)/n\Big\}^{-1} \\
&= \mathrm{Est(SLOC)} \cdot \mathrm{HM(Hour/SLOC)}
\end{aligned}$$

SW Dev hours can also be predicted by an estimate of SLOC times an average factor of Hour per SLOC
  - Multiply: Dev-Hours = Est(SLOC) * SM(Hours per SLOC) = Est(SLOC) * SM(Hour/SLOC)

The “Divide” method generates a point estimate (PE) smaller than the “Multiply” method unless all ratios are the same

SW Dev Hours As a Factor of SLOC/Hour & Hour/SLOC (3/4)

Another method to estimate SW development hours is to divide an estimate of SLOC by a weighted productivity rate (weighted by hours)

$$\begin{aligned}
\text{Dev-Hours} &= \mathrm{Est(SLOC)} \,/\, (\text{Wtd SLOC per hour}) \\
&= \mathrm{Est(SLOC)} \,/\, \Big\{\sum_i w_i\,(\mathrm{SLOC}_i/\mathrm{Hour}_i) \,\Big/ \sum_i w_i\Big\} \\
&= \mathrm{Est(SLOC)} \,/\, \Big\{\sum_i \mathrm{Hour}_i\,(\mathrm{SLOC}_i/\mathrm{Hour}_i) \,\Big/ \sum_i \mathrm{Hour}_i\Big\} \\
&= \mathrm{Est(SLOC)} \,/\, \Big\{\sum_i \mathrm{SLOC}_i \,\Big/ \sum_i \mathrm{Hour}_i\Big\} \\
&= \mathrm{Est(SLOC)} \cdot \Big(\sum_i \mathrm{Hour}_i\Big) \Big/ \Big(\sum_i \mathrm{SLOC}_i\Big) \\
&= \mathrm{Est(SLOC)} \,/\, \mathrm{WM}(\mathrm{SLOC}_i/\mathrm{Hour}_i) \\
&= \mathrm{Est(SLOC)} \cdot \mathrm{WM}(\mathrm{Hour}_i/\mathrm{SLOC}_i)
\end{aligned}$$

Using WM appears to be a better approach because it provides the same result whether the “Divide” or the “Multiply” method is used

Be wary of the “common” uncertainty approach for WM
  - Three distributions are often applied to model the uncertainty for Dev-hours

SW Dev Hours As a Factor of SLOC/Hour & Hour/SLOC (4/4)

Question #1: Which method is more appropriate to estimate SW development hours, “Divide” or “Multiply”?

Divide: DevHours = Est(SLOC) / SM(SLOC/Hour) = Est(SLOC) * HM(Hour/SLOC)    (1)
Multiply: DevHours = Est(SLOC) * SM(Hour/SLOC)
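
As a rough numerical illustration of the Divide and Multiply formulas above, the following Python sketch compares the two point estimates and the hours-weighted alternative. The three data points and the 30,000-SLOC estimate are hypothetical values chosen only for illustration; they do not come from the presentation, and the variable names are likewise assumptions.

```python
import math
from statistics import fmean, harmonic_mean

# Hypothetical historical data points (illustrative only; not from the briefing).
sloc = [10_000.0, 25_000.0, 40_000.0]   # source lines of code per program
hours = [2_000.0, 4_000.0, 10_000.0]    # development hours per program
est_sloc = 30_000.0                     # hypothetical size estimate for a new program

rate = [s / h for s, h in zip(sloc, hours)]        # SLOC per hour
unit_cost = [h / s for s, h in zip(sloc, hours)]   # hours per SLOC

# "Divide": estimate divided by the simple mean of SLOC per hour.
divide = est_sloc / fmean(rate)
# The same quantity written as estimate times the harmonic mean of hours per SLOC.
divide_as_hm = est_sloc * harmonic_mean(unit_cost)
# "Multiply": estimate times the simple mean of hours per SLOC.
multiply = est_sloc * fmean(unit_cost)

# The hours-weighted productivity rate collapses to sum(SLOC) / sum(hours),
# so dividing by it and multiplying by its reciprocal give one answer.
weighted_divide = est_sloc / (sum(sloc) / sum(hours))
weighted_multiply = est_sloc * (sum(hours) / sum(sloc))

print(f"Divide (SM of SLOC/hour):    {divide:,.1f} hours")
print(f"Divide as HM of hours/SLOC:  {divide_as_hm:,.1f} hours")
print(f"Multiply (SM of hours/SLOC): {multiply:,.1f} hours")
print(f"Weighted mean, either way:   {weighted_multiply:,.1f} hours")
print("Divide equals HM form:", math.isclose(divide, divide_as_hm))
print("Weighted forms agree: ", math.isclose(weighted_divide, weighted_multiply))
```

With unequal ratios such as these, the Divide estimate comes out below the Multiply estimate, as HM ≤ SM implies, while the weighted mean yields a single value regardless of which way the factor is applied.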