<<

VOLUME: 3 | ISSUE: 1 | 2019 | March

Comparison among Akaike Information Criterion, Bayesian Information Criterion and Vuong's test in : A Case Study of Violated Speed Regulation in Taiwan

Kim-Hung PHO1,∗, Sel LY1, Sal LY1, T. Martin LUKUSA2

1Faculty of Mathematics and , Ton Duc Thang University, Ho Chi Minh City, Vietnam 2Institute of Statistical Science, Academia Sinica, Taiwan, R.O.C., Taiwan

*Corresponding Author: Kim-Hung PHO (Email: [email protected]) (Received: 4-Dec-2018; accepted: 22-Feb-2019; published: 31-Mar-2019) DOI: http://dx.doi.org/10.25073/jaec.201931.220

Abstract. When doing research scientic is- Keywords sues, it is very signicant if our research issues are closely connected to real applications. In re- Akaike Information Criteria (AIC), ality, when analyzing in practice, there are Bayesian Information Criterion (BIC), frequently several models that can appropriate to Vuong's test, , Zero- the survey data. Hence, it is necessary to have inated Poisson regression, Negative a standard criteria to choose the most ecient . model. In this article, our primary interest is to compare and discuss about the criteria for select- ing model and its applications. The authors pro- 1. Introduction vide approaches and procedures of these methods and apply to the trac violation data where we look for the most appropriate model among Pois- The model selection criteria is a very crucial son regression, Zero-inated Poisson regression eld in statistics, economics and several other ar- and Negative binomial regression to capture be- eas and it has numerous practical applications. tween number of violated speed regulations and This issue is currently researched theoretically some factors including distance covered, motor- and practically by several and has cycle engine and age of respondents by using gained many attentions in the last two decades, AIC, BIC and Vuong's test. Based on results on especially in regression and econometric mod- the training, validation and test data set, we nd els. There are three most commonly used model that the criteria AIC and BIC are more consis- selection criteria including Akaike information tent and robust performance in model selection criterion (AIC), Bayesian information criterion than the Vuong's test. In the present paper, the (BIC) and Vuong's test, which are compared authors also discuss about advantages and disad- and discussed in this paper. AIC is rst pro- vantages of these methods and provide some of posed by Akaike [1] as a method to compare dif- suggestions with potential directions in the future ferent models on a given outcome. Meanwhile, research. BIC is proposed by Schwarz [20], is a criterion for model selection among a nite set of models. Vuong's test has been proposed by Vuong [24] in the literature aiming at selecting a single model

c 2019 Journal of Advanced Engineering and Computation (JAEC) 293 VOLUME: 3 | ISSUE: 1 | 2019 | March regardless of its intended use. All three crite- criteria for choosing model including Akaike in- ria are the most widespread criteria for choosing formation criterion (AIC), Bayesian information model. criterion (BIC) and Vuong's test. In Section 3, these methods are applied to a real data which Until today, these problems have been stud- could help readers to easily assess them. Some ied and utilized in numerous areas. AIC has of suggestions and some potential directions for been researched and applied extensively in lit- the further research are devoted in Section 4. erature such as: Snipes et al. [19] employ AIC Finally, some conclusions and remarks are given and present about an example from wine rat- in Section 5. ings and prices, Taylor et al. [21] introduce in- dicators of hotel protability: Model selection using AIC, Charkhi et al. [4] research about asymptotic post-selection inference for the AIC, 2. Some of Criteria for Chang et al. [3] present about Akaike Informa- Model Selection tion Criterion-based conjunctive belief rule base learning for complex system modeling, etc. In this section, we present approaches and proce- In addition, BIC is also utilized extensively in dures of ubiquitous methods to choose the most literature for example: Neath et al. [16] intro- ecient model consisting of Akaike Information duce about regression and model se- Criteria (AIC), Bayesian Information Criterion lection using variants of the Schwarz information (BIC) and Vuong's test. criterion. Cavanaugh et al. [2] present about generalizing the derivation of the BIC. Weakliem [27] introduce about a critique of the Bayesian 2.1. Akaike Information information criterion for model selection. Neath Criteria (AIC) et al. [15] present about a Bayesian approach to the multiple comparisons problem. Neath et al. AIC is rst proposed by Akaike [1] as a method [17] present about the BIC: background, deriva- to compare dierent models on a given outcome. tion, and applications. Nguefack-Tsague et al. The AIC for candidate model is dened as fol- [23] focus on introduce about Bayesian informa- lows: tion criterion, etc. AIC := −2`(θˆ|y) + 2K, (1) Similarly to AIC and BIC, Vuong's test [24] is also used largely in literature for instance: where K is the number of estimated parameters ˆ Clarke [5] employ Vuong's test to introduce in the model including the intercept and `(θ|y) a simple distribution-free test for non-nested is a log-likelihood at its maximum point of the model selection, Theobald [22] utilize Vuong's estimated model. The rule of choice: the smaller test to present a formal test of the theory of the value of AIC is, the better the model is. universal common ancestry, Lukusa et al. [13] use Vuong's test to evaluate whether the zero- 2.2. Bayesian Information inated Poisson (ZIP) regression model is con- Criterion (BIC) sistent with the real data, Dale et al. [6] perform model comparison using Vuong's test to estimate BIC is rst introduced by Schwarz [20], one of nested and zero-inated ordered probit mod- sometimes calls the Bayesian information cri- els, Schneider et al. [18] present about model terion (BIC) or Schwarz criterion (also SBC, selection of nested and non-nested item response SBIC) which is a criterion for model selection models using Vuong's test, etc. among a nite set of models. The BIC for can- Our main objective in this paper is to provide didate model is dened as follows: researchers an overview of the criteria in model selection for the trac violation data. The rest BIC := −2`(θˆ|y) + K ln(n), (2) of the paper is organized as follows. In Section where n is a sample size; K is the number of 2, we present approaches and procedures of the estimated parameters in the model including the

294 c 2019 Journal of Advanced Engineering and Computation (JAEC) VOLUME: 3 | ISSUE: 1 | 2019 | March intercept and ˆ is the log-likelihood at its f (Y |X ,Z , α ) `(θ|y) • m (α , α ) = ln 1 i i i b1 , maximum point of the estimated model. The i b1 b2 f2 (Yi|Xi,Zi, αb2) where is the predicted rule of selection: the smaller the value of BIC fj (Yi|Xi,Zi, αbj) , is, the better the model is. The procedure for probability of an observed count for case i applying AIC and BIC are given as follows: from the model j, j = 1, 2, respectively.

Step 1: Selecting candidate models which • Moreover for the complete case, V can be can be tted to the data set. easily obtained from the package pscl in R language, (Zeileis at el. [28]). Step 2: Estimating unknown parameters of models. At the signicant level α, the decision rule is given as follows: Step 3: Finding values of AIC and BIC by using the formulas (1) and (2), respectively. • If V > Qα/2, choose model 1. Step 4: Basing on the rule of choice, one

can decide the most suitable model. • If V < −Qα/2, choose model 2.

• If |V | < Qα/2, both models are equivalent. 2.3. Vuong's Test where Q is an upper quantile of standard nor- Vuong's test [24] is one of the ubiquitous cri- α/2 mal distribution at the level α/2. Similar to al- teria for choosing model and it is often used gorithms for AIC and BIC, to perform Vuong's to the data set with no missing values. Let test, we need to do through following steps: f1(Y |X,Z,W ; α1) and f2(Y |X,Z,W ; α2) be two non-nested probability models. Let and αb1 Step 1: Choosing candidate models which be a consistent estimator of and un- αb2 α1 α2 can be tted to the data set. der the model f1 and f2, respectively. Letting hypotheses Step 2: Estimating unknown coecients of models. • H0: The two models are equally closed to the true data. Step 3: Calculating V by using (3) : Model 1 is closer than model 2. • H1 Step 4: Basing on the rule of choice, one can select the most compatible model. The Vuong's test statistics is provided as follows; (see Mouatassim and Ezzahid [14]): Note that: Step 1 is a very important step in  n  practice, basing on characteristics of the data √ 1 P n mi (αb1, αb2) set, one can choose some reasonable models to n i=1 V = V (αb1, αb2) = , t. For example, if the data set is a binary, then h (αb1, αb2) candidate models are considered such as logis- (3) tic regression model, probit model and so on. If where the data set is class of count data, one can uti- lize some of models such as: Poisson regression 2 model, binomial regression model, negative bi- h (α1, α2) b b nomial regression model and so on. If the data n " n #2 1 X 1 X set is a zero-inated or imbalance data, zero- = m2 (α , α ) − m (α , α ) n i b1 b2 n i b1 b2 inated Poisson (ZIP) regression model, zero- i=1 i=1 inated binomial (ZIB) regression model, and The detailed calculation of V is provided in Ap- zero-inated negative binomial (ZINB) regres- pendix. Note that: sion model could be more plausible candidates.

c 2019 Journal of Advanced Engineering and Computation (JAEC) 295 VOLUME: 3 | ISSUE: 1 | 2019 | March

3. Models for Violated Descriptions Variables Re Speed Regulation Distance-covered X 6262 (km a year) 1. Under 1,000 X = 1 1752 The data set utilized in this analysis is from a 2. 1,000-2,999 X = 2 1711 motorcycle survey study regarding road trac 3. 3,000-9,999 X = 3 1856 regulations conducted in Taiwan by the Ministry 4. Over 1,000 X = 4 943 of Transportation and Communication in 2007. Number-Violation Y 6262 This data set has been used in the paper "Semi- (in a year) parametric estimation of a zero-inated Poisson 1. Never violation Y = 0 5637 (ZIP) regression model with missing covariates" 2. One violation Y = 1 380 by Lukusa et al. [13]. This study consists of 3. Two violations Y = 2 169 7,386 respondents involving 1122 missing val- 4. Three violations Y = 3 59 ues. Before applying the criteria to select opti- 5. Four violations Y = 4 11 mal models, one may require the data having no 6. Five violations Y = 5 2 missing values. Hence, we need to remove all of 7. Six violations Y = 6 3 missing values and displayed in the Tab. 1. The 8. Seven violations Y = 7 1 bar graph of the outcome variable Y is exhibited Motorcycle-engine Z 6262 in Fig. 1 (Appendix). As can be observed from (cubic centimeters (cc)) the Tab. 1 and the Fig. 1 that the number of 1. Under 50 Z = 1 1303 people violating of speed regulations in Taiwan 2. 50-249 Z = 2 4153 2007 is very small. The data set contains most 3. 250-549 Z = 3 272 of zeros in Y which is usually called zero-inated 4. Over 550 Z = 4 534 count data. With this type of data set, some of Respondent's age W 6262 zero-inated models may be more appropriate (years old) than other models. In this section, we investi- 1. Under 18 W = 1 3 gate three following models: Zero-inated Pois- 2. 18-19 W = 2 142 3. 20-29 1395 son (ZIP) regression model denoted by M1, Pois- W = 3 4. 30-39 1607 son regression model called M2 and M3 stands W = 4 for Negative binomial (NB) regression model. 5. 40-49 W = 5 1508 The forms of these models are briey given in 6. Over 50 W = 6 1607 the Appendix. Our aim is to evaluate which model is more appropriate for modeling between Tab. 1: of respondents (Re) in data set after deleting missing values. the number of violated speed regulation (Y ) with some factors such as Distance-covered (X), Motorcycle-engine (Z) and the Age of respon- dents (W ). Firstly the data is randomly split The ZIP model (M1) is composed of two parts into three data sets, namely, training, valida- separately, where the former is called count tion and test with respect to the percentage model with coecients denoted by β and the of 60% − 20% − 20%. This 60% of the latter is the so-called ination model with co- whole data is used to train the three models ecients denoted by γ, see Equation ( 5. ). Mi, i = 1, 2, 3, with results as shown in the Tabs. As can be seen from the Tab. 2, all esti- 2, 3 and 4, respectively. Next, the validation mated coecients of zero-inated part are sta- data which is also randomly extracted by 20% tistically signicant at the level 5% thanks to of the full data is then used for selecting the most all P-values are less than 0.05. In contrast, in appropriate model while the remaining test data the count model, the Distance-covered (X) and is to check accuracy when we do a performance Motorcycle-engine (Z) are not signicant, ex- of forecast with those models. The criteria AIC, cept the Age (W ). The factor Age aects the BIC, Vuong's test, square error (MSE) and number of trac violations for both parts in accuracy are respectively computed to each data the sense that if W is increasing and other fac- set and each model for comparisons. tors are assumed to be unchanged, then the ex-

296 c 2019 Journal of Advanced Engineering and Computation (JAEC) VOLUME: 3 | ISSUE: 1 | 2019 | March pected number of violation is denitely reduced (BIC). These formulas have the same character- and the probability of not violating is clearly in- istics that can be derived from model's likeli- creasing since we have βb3 = −0.23536 < 0 and hood functions and results of maximum likeli- , respectively. hood estimates (MLE). Nevertheless, if AIC or γb3 = 0.19547 > 0 BIC is used to consider the appropriateness of For the Poisson regression model ( ) and M2 models, one needs to calculate separately each the Negative binomial regression model ( ), M3 formula and compare values together with the we also see the statistical signicance of esti- decision rule: the smaller the value of AIC or mated coecients based on P-values are very BIC is, the better the model is, but the short- small ( ). The two factors and with ≈ 0 X Z coming is sometimes one may not know how positive coecients imply that they increase the to determine whether dierences between two incidence rate (see in (11) and (12)) of num- µ values AIC (resp. BIC) is statistically signi- ber of trac violations while makes it to be W cant or not. In case of using Vuong's test, we decreasing as in the case of ZIP model, see Tab. only need to compute the given in (3) 3 and 4. and follow the rule of choice or nd the P-value We now turn to discuss which model is better. which can help us dierentiate two models sig- Based on results represented in the Tab. 5 and nicantly. However, the Vuong's test is not more 6, the smallest value AIC and BIC on validation robust than AIC and BIC in model selection as data are respectively 1013.404 and 1033.937 and shown in the Section 3. . both are produced by the model . One can M1 For AIC and BIC, AIC is very ubiquitous in also see this conrmation on the training and econometrics, while BIC is more commonly uti- test data sets. Hence, the model (ZIP) is M1 lized in sociology, see Weakliem [27]. It can be the most plausible model in comparison to the seen that, BIC becomes to AIC if . models and . However, by Vuong's test K = ln(n) M3 M2 To see the relationship between formula (1), (2), results on the validation set, see Tab. 8, it sug- and Vuong's test, the problem is given as fol- gests that the model is more preferable than M1 lows: Let is an observed data (a real data). A the model , but it is equivalent to the model D M2 number of possible models for D are consid- (P-value ). This equivalence is Mk M3 = 0.1 > 0.05 ered, with each model having a likelihood func- also conrmed by the same mean square error tion and are unknown param- and the same accuracy L(D|θk; Mk) θk MSE = 0.3488 90.42% eters need to be estimated with parameters. on the validation data, see Tabs. 10 and 11. pk For simplicity's sake, let `(θ ) = ln[L(D|θ ; M )] When checking on the test set, the model M has k k k 1 and be an estimator of by using the maxi- a slightly better performance with the smallest θbk θk mum likelihood estimate (MLE). Assessment of MSE 0.2811, the greatest accuracy 90.60% and similarly result if using Vuong's test. Our result the candidate models can be carried out as a is consistent to Lukusa et al. It also shows that sequence of comparisons between pairs of mod- els. It is more convenient to consider model the information criteria AIC and BIC are more M1 and . The dierence of two values AIC (resp. robust than the Vuong's test in model selection. M2 [13]. BIC) obtained from two certain models can be expressed as follows: ˆ ˆ ∆AIC := −2[`(θ2) − `(θ1)] + 2(p2 − p1) (4) ˆ ˆ 4. Discussion and some ∆BIC := −2[`(θ2) − `(θ1)] + (p2 − p1) ln(n), potential directions for (5) further research and the Vuong's test can be rewritten as: `(θˆ ) − `(θˆ ) V := 1 2 , (6) It can be seen that, to consider the compati- √ ˆ ˆ nh(θ1, θ2) bility of two models, we can use some criteria 2 ˆ ˆ such as: Vuong's test, Akaike Information Cri- where h ((θ1, θ2)) denotes sample of the ˆ ˆ teria (AIC) and Bayesian Information Criterion dierence of log-likelihood `(θ1) − `(θ2).

c 2019 Journal of Advanced Engineering and Computation (JAEC) 297 VOLUME: 3 | ISSUE: 1 | 2019 | March

From this point of view, one may prefer the above formulas with the possibility of dealing

rst model M1 than the second model M2 if with . To the best of our knowl- ∆AIC, ∆BIC and V are positive values. edge, no scholar has studied this problem yet. These are potential research directions in the AIC is a very widespread formula, thus there next time. Some of methods to solve this is- are several scholars have researched and im- sue are very ubiquitous and prevalent. Little proved it by some adjustments. List of modied [12] reviewed six methods to solve the missing AIC statistics are given as follows: data problem that are complete-case (CC) anal- ysis, available-case (AC) methods, First denoted by AICc is the corrected AIC • (LS) on imputed data, maximum likelihood for sample size (ML), Bayesian methods and multiple imputa- 2K(K + 1) tion (MI). Zhao and Lipsitz [29] proposed the AICc := AIC + . (7) inverse probability weighting (IPW) method. n − K − 1 Wang et al. [26] developed a regression calibra- tion (RC) method. Wang et al. [25] introduced • Next is the AIC weight of the model Mk dened by the joint conditional likelihood (JCL) method. In addition, we can combine methods to provide  1  a robust tool to solve this problem. For instance: exp − AICc(k) 2 Han [8] presented multiply robust estimation in AICw(k) := , (8) R  1  with missing data where the P exp − AICc(i) IPW and MI method are combined together. k=1 2 About the expansion of above issues, it is sim- where is number of possible candidate R ilar to the study of regression models, the tradi- models. The is the weight of AICw(k) tional regression models such as logistic regres- the evidence of the model with respect Mk sion model, zero-inated binomial (ZIB) regres- to other candidate models, i.e. the model sion model, zero-inated Poisson (ZIP) regres- has the highest is considered as the AICw sion model, etc, coecients cannot be directly strongest model. estimated if some covariates having missing val- ues. Hence, one needs to have some new ap- • Evidence ratio of the model Mk is deter- mined by proaches to estimate parameters in this situa- tion. For instance, Wang et al. [25] employed AICwbest the joint conditional likelihood (JCL) estima- ER(k) := , (9) AICw(k) tor in with missing covari- ates data. Hsieh et al. [9] extended method of where is the AIC weight of the AICwbest Wang et al. (2002) to introduce a semiparamet- best (true) model. This ratio measures how ric analysis of randomized response data with decisive the evidence in the sense that the missing covariates in logistic regression. Lee et model with the smallest is the most ap- ER al. [11] also extended method in Wang et al. propriate model with respect to other can- (2002) to present a semiparametric estimation didate models. of logistic regression model with missing covari- ates and outcome. Pho et al. [30] discussed Regarding applicability, Vuong's test, Akaike about three ubiquitous approaches to handle the Information Criteria (AIC) and Bayesian Infor- issues having missing data. Diallo et al. [7] in- mation Criterion (BIC) are only applicable for troduced an IPW estimator of the parameters of complete data i.e. no missing values. In sev- a ZIB regression model with missing-at-random eral practical applications, some elements in the covariates. Lukuasa et al. [13] presented a semi- given data set are usually missing. Hence, these parametric estimation of a zero-inated Poisson traditional criteria may be no longer suitable for (ZIP) regression model with missing covariates, selecting models and if we remove all missing etc. elements, it could lead to the biasness in infer- ences. Therefore, it is necessary to improve the

298 c 2019 Journal of Advanced Engineering and Computation (JAEC) VOLUME: 3 | ISSUE: 1 | 2019 | March

5. Conclusion [7] Diop, A., & Dupuy, J. F. (2017). Esti- mation in zero-inated binomial regression We reviewed widespread methods for selecting with missing covariates. the most ecient model: Vuong's test, Akaike [8] Han, P. (2014). Multiply robust estimation Information Criteria (AIC) and Bayesian In- in regression analysis with missing data. formation Criterion (BIC). The approach and Journal of the American Statistical Asso- procedure of these methods and application to ciation, 109(507), 1159-1173. trac violation data are provided step by step. Based on results on the training, validation and [9] Hsieh, S. H., Lee, S. M., & Shen, P. (2009). test data set, we nd that the criteria AIC and Semiparametric analysis of randomized re- BIC have a more consistent and robust per- sponse data with missing covariates in lo- formance in model selection than the Vuong's gistic regression. Computational Statistics test in this case. Besides, some advantages and & Data Analysis, 53(7), 2673-2692. disadvantages of these methods have been dis- cussed and compared in the paper. Further- [10] Lambert, D. (1992). Zero-inated Poisson more, the authors also suggest some potential regression, with an application to defects in research directions in the next time. manufacturing. Technometrics, 34(1), 1-14. [11] Lee, S. M., Li, C. S., Hsieh, S. H., & Huang, L. H. (2012). Semiparametric estimation of References logistic regression model with missing co- variates and outcome. Metrika, 75(5), 621- [1] Akaike, H. (1974). A new look at the 653. identication. In Selected Papers of Hirotugu Akaike (pp. 215-222). [12] Little, R. J. (1992). Regression with miss- Springer, New York, NY. ing X's: a review. Journal of the American Statistical Association, 87(420), 1227-1237. [2] Cavanaugh, J. E., & Neath, A. A. (1999). Generalizing the derivation of the Schwarz [13] Lukusa, T. M., Lee, S. M., & Li, C. information criterion. Communications in S. (2016). Semiparametric estimation of a Statistics-Theory and Methods, 28(1), 49- zero-inated Poisson regression model with 66. missing covariates. Metrika, 79(4), 457-483. [3] Chang, L., Zhou, Z., Chen, Y., Xu, X., Sun, [14] Mouatassim, Y., & Ezzahid, E. H. (2012). J., Liao, T., & Tan, X. (2018). Akaike In- Poisson regression and zero-inated Poisson formation Criterion-based conjunctive be- regression: application to private health in- lief rule base learning for complex system surance data. European actuarial journal, modeling. Knowledge-Based Systems, 161, 2(2), 187-204. 47-64. [15] Neath, A. A., & Cavanaugh, J. E. (2006). [4] Charkhi, A., & Claeskens, G. (2018). A Bayesian approach to the multiple com- Asymptotic post-selection inference for the parisons problem. Journal of Data Science, Akaike information criterion. Biometrika, 4(2), 131-146. 105(3), 645-664. [16] Neath, A. A., & Cavanaugh, J. E. (1997). [5] Clarke, K. A. (2007). A simple distribution- Regression and time series model selection free test for nonnested model selection. Po- using variants of the Schwarz information litical Analysis, 15(3), 347-363. criterion. Communications in Statistics- Theory and Methods, 26(3), 559-580. [6] Dale, D., & Sirchenko, A. (2018). Estima- tion of Nested and Zero-Inated Ordered [17] Neath, A. A., & Cavanaugh, J. E. (2012). Probit Models. Higher School of Economics The Bayesian information criterion: back- Research Paper No. WP BRP, 193. ground, derivation, and applications. Wiley

c 2019 Journal of Advanced Engineering and Computation (JAEC) 299 VOLUME: 3 | ISSUE: 1 | 2019 | March

Interdisciplinary Reviews: Computational [28] Zeileis, A., Kleiber, C., & Jackman, S. Statistics, 4(2), 199-203. (2008). Regression models for count data in R. Journal of statistical software, 27(8), 1- [18] Schneider, L., Chalmers, R. P., Debelak, R., 25. & Merkle, E. C. (2018). Model Selection of Nested and Non-Nested Item Response [29] Zhao, L. P., & Lipsitz, S. (1992). Designs Models using Vuong Tests. arXiv preprint and analysis of two-stage studies. Statistics arXiv:1810.04734. in medicine, 11(6), 769-782.

[19] Snipes, M., & Taylor, D. C. (2014). Model [30] Pho, K. H., & Nguyen, V. T. (2018). Com- selection and Akaike Information Criteria: parison of Newton-Raphson Algorithm and An example from wine ratings and prices. Maxlik Function. Journal of Advanced En- Wine Economics and Policy, 3(1), 3-9. gineering and Computation, 2(4), 281-292. [20] Schwarz, G. (1978). Estimating the dimen- About Authors sion of a model. The annals of statistics, 6(2), 461-464. Kim-Hung PHO is a Ph.D. Student in Applied Statistics at Feng Chia University, [21] Taylor, D. C., Snipes, M., & Barber, N. Taiwan. In 2014, he became a lecturer of A. (2018). Indicators of hotel protability: Faculty of Mathematics and Statistics in Ton Model selection using Akaike information Duc Thang University, Ho Chi Minh City, criteria. Tourism and Hospitality Research, Vietnam. His currently research interests 18(1), 61-71. include Regression models with missing data, Randomized Response Technique, Copula, [22] Theobald, D. L. (2010). A formal test of Mathematics education models and Financial the theory of universal common ancestry. Mathematics. Nature, 465(7295), 219.

[23] Nguefack-Tsague, G., Bulla, I. (2014). A fo- Sel LY has worked as a lecturer at Fac- cused Bayesian information criterion. Ad- ulty of Mathematics and Statistics, Ton Duc vances in Statistics, 2014. Thang University since 2014. He earned a Bachelor degree in Maths-Informatics Teacher [24] Vuong, Q. H. (1989). Likelihood ratio tests Education in 2011 and a Master degrees in for model selection and non-nested hy- Probability Theory and potheses. Econometrica: Journal of the in 2013. Currently, he is a Ph.D. Student in Econometric Society, 307-333. Mathematical Sciences at Nanyang Technologi- cal University, Singapore. His research interests [25] Wang, C. Y., Chen, J. C., Lee, S. M., & Ou, are Data mining, Copula, Stochastic Process S. T. (2002). Joint conditional likelihood es- and Financial Mathematics. timator in logistic regression with missing covariate data. Statistica Sinica, 555-574. Sal LY hold Bachelor and Master degrees in Probability Theory and Mathematical [26] Wang, C. Y., Wang, S., Zhao, L. P., & Ou, Statistics in 2014 and 2016, respectively. His S. T. (1997). Weighted semiparametric es- currently research interests include Copula timation in regression analysis with miss- Theory and Financial Mathematics. ing covariate data. Journal of the American Statistical Association, 92(438), 512-525. T. Martin LUKUSA is working in In- stitute of Statistical Science, Academia Sinica, [27] Weakliem, D. L. (1999). A critique of the Taiwan, R.O.C., Taiwan. His currently research Bayesian information criterion for model se- interests include Regression models with miss- lection. Sociological Methods & Research, ing data, Randomized Response Technique, and 27(3), 359-397. Financial Mathematics.

300 c 2019 Journal of Advanced Engineering and Computation (JAEC) VOLUME: 3 | ISSUE: 1 | 2019 | March

Appendix (X,Z,W )T and so the ZIP model can be ex- pressed as follows: The detailed calculation of V P (Y = y|X,Z,W ) = H(γT X )I(y = 0)+ √  n  T T y 1 P T exp[− exp(β X )][exp(β X )] n mi (αb1, αb2) + [1 − H(γ X )] n i=1 y! V = 1  n  2 (10) 1 P 2 [mi (αb1, αb2) − m] n i=1 T for y = 0, 1, 2,... , where γ = (γ0, . . . , γ3) is  n  √ 1 P called coecients of zero-ination model while n mi (α1, α2) n b b T i=1 β = (β0, . . . , β3) is called coecients of count = n n 1 2m 1 P 2 P 2 2 model, see more details in Lambert [10] and { mi (αb1, αb2) − mi (αb1, αb2) + m } n i=1 n i=1 Lukusa et al. [13]. √ 1 n n[ P m (α , α )] n i b1 b2 = i=1 1 Poisson regression model  1 n  2 P m2 (α , α ) − 2m2 + m2 n i b1 b2 i=1 The Poisson incidence rate is determined by  n  µ √ 1 P a set of regressor variables (the X's). The n mi (αb1, αb2) p n i=1 expression relating these quantities is = 1  n  2 1 P 2 2 mi (αb1, αb2) − m µ = exp (β0 + β1X1 + ··· + βnXp) . (11) n i=1  n  An ubiquitous Poisson regression model for an √ 1 P n mi (αb1, αb2) observation is written as follows n i=1 i = 1 y ( ) −µi i 1 n  1 n 2 2 e (µi) P 2 P P (Yi = yi|µi) = , mi (αb1, αb2) − mi (αb1, αb2) y ! n i=1 n i=1 i  n  where µ = exp (β + β X + ··· + β X ) , and √ 1 P i 0 1 1i p pi n mi (α1, α2) are regression coecients need to n b b β0, β1, . . . , βn = i=1 be estimated. h (αb1, αb2) n 1 P where m = mi (α1, α2) , and Negative binomial regression n b b i=1 model n 2 1 X 2 h (α1, α2) = m (α1, α2) The mean of is determined by the exposure b b n i b b y i=1 time t and a set of p regressor variables (the " n #2 X's). The expression relating these quantities is 1 X − m (α , α ) . n i b1 b2 i=1 µi = exp (ln(ti) + β0 + β1X1i + ··· + βpXpi) . (12) Zero-inated Poisson (ZIP) The widespread negative binomial regression model for an observation is given by regression model i −1 Γ yi + α P (Yi = yi|µi, α) = −1 Lambert [10] propose the parametric ZIP regres- Γ(α )Γ(yi + 1) −1 sion model in which the non-susceptible proba-  α  yi 1 αµi bility (mixing weight) p is linked to X via a logit- × 1 + αµ 1 + αµ linear predictor, p = H(γT X ) for H(u) = [1 + i i exp(−u)]−1, and the Poisson mean λ is linked where β0, β1, . . . , βp are unknown coecients to X via a log-linear predictor, λ = exp(βT X ). need to be estimated. In this paper, p = 3 and where γ and β are unknown parameters need the parameter α is taken to 1 which is automat- to be estimated. In the present paper, X = ically estimated by the package "pscl" in R .

301 VOLUME: 3 | ISSUE: 1 | 2019 | March

Fig. 1. Frequency of violations of speed regulations in Taiwan 2007.

Count Coe. Estimate Std. Error z value Pr(> |z|) Intercept 0.31028 0.37714 0.823 0.41067 X 0.07804 0.07061 1.105 0.26904 Z 0.11969 0.08031 1.490 0.13614 W -0.23536 0.07102 -3.314 0.00092 Zero-inated Estimate Std. Error z value Pr(> |z|) Intercept 4.16321 0.50823 8.192 2.58e-16 X -0.31510 0.09374 -3.362 0.000775 Z -1.25280 0.14906 -8.405 < 2e-16 W 0.19547 0.08847 2.209 0.027152

Tab. 2: Estimates of the model M1 (ZIP model).

Estimate Std. Error z value Pr(> |z|) Intercept -3.07651 0.22958 -13.401 <2e-16 X 0.29910 0.04154 7.200 6e-13 Z 0.88207 0.04166 21.174 <2e -16 W -0.36958 0.03918 -9.433 <2e-16

Tab. 3: Estimates of the model M2 (Poisson model).

Estimate Std. Error z value Pr(> |z|) Intercept -3.23072 0.29809 -10.838 < 2e-16 X 0.34635 0.05465 6.337 2.34e-10 Z 0.98898 0.06208 15.931 < 2e-16 W -0.42228 0.05024 -8.405 < 2e-16

Tab. 4: Estimates of the model M3 (NB model).

302 c 2019 Journal of Advanced Engineering and Computation (JAEC) VOLUME: 3 | ISSUE: 1 | 2019 | March

Data Set M1 M2 M3 Training 2869.835 3181.406 2943.556 Validation 1013.404 1155.584 1028.383 Test 918.2253 1025.232 945.857 Tab. 5: Results of AIC.

Data Set M1 M2 M3 Training 2894.741 3206.312 2968.462 Validation 1033.937 1176.118 1048.916 Test 938.8124 1045.819 966.4441 Tab. 6: Results of BIC.

Model V P-value Preference

M1 vs M2 6.03 8.44e-10 M1 M1 vs M3 4.11 1.99e-05 M1 M2 vs M3 -5.14 1.00 M3 Tab. 7: Results of Vuong's test on training data.

Model V P-value Preference

M1 vs M2 5.37 5.42e-06 M1 M1 vs M3 1.28 0.10 M1 ≈ M3 M2 vs M3 -5.01 1.00 M3 Tab. 8: Results of Vuong's test on validation data.

Model V P-value Preference

M1 vs M2 4.40 4.03e-08 M1 M1 vs M3 2.83 0.023 M1 M2 vs M3 -3.79 1.00 M3 Tab. 9: Results of Vuong's test on test data.

Data Set M1 M2 M3 Training 0.3145 0.3100 0.3145 Validation 0.3488 0.3416 0.3488 Test 0.2811 0.2827 0.2827 Tab. 10: Results of MSE.

Data Set M1 M2 M3 Training 0.8968 0.8933 0.8968 Validation 0.9042 0.8986 0.9042 Test 0.9063 0.9047 0.9047 Tab. 11: Results of accuracy.

"This is an Open Access article distributed under the terms of the Creative Commons Attribution 303 License, which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited (CC BY 4.0)."