World Academy of Science, Engineering and Technology International Journal of Mathematical and Computational Sciences Vol:6, No:12, 2012

The Performance of Predictive Classification using Empirical Bayes

N. Deetae, S. Sukparungsee, Y. Areepong, and K. Jampachaisri

N. Deetae is a PhD student in the Department of Applied Statistics, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand (phone: +66870713312; fax: +6625856105; e-mail: natthineed@hotmail.com).
S. Sukparungsee is an Assistant Professor with the Department of Applied Statistics, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand (e-mail: [email protected]).
Y. Areepong is an Assistant Professor with the Department of Applied Statistics, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand (e-mail: yupaporn@kmutnb.ac.th).
K. Jampachaisri is an Assistant Professor with the Department of Mathematics, Naresuan University, Phitsanulok, Thailand (e-mail: [email protected]).

Abstract—This research aims to compare the percentage of correct classification of the Empirical Bayes (EB) method with that of the Classical method when data are constructed as near normal, short-tailed and long-tailed symmetric, and short-tailed and long-tailed asymmetric. The study is performed using the normal distribution with known mean and unknown variance. The estimated hyper-parameters obtained from the EB method are substituted into the posterior predictive probability, which is used to predict new observations. Data are generated, consisting of a training set and a test set, with sample sizes 100, 200 and 500 for binary classification. The results show that the EB method exhibits an improved performance over the Classical method in all situations under study.

Keywords—Classification, Empirical Bayes, Posterior predictive probability.

I. INTRODUCTION

Discrimination and classification are techniques often used together. The goal of discrimination is to search for distinct characteristics of the observations in a training sample with known classes and to use them to construct a rule, called a discriminant, that separates the observations as much as possible [1], [2]. Classification focuses on allocating new observations in a test sample to labeled classes based on a well-defined rule obtained from the training set [3]. Generally, classification can be performed with various methods, such as Bayesian methods, Nearest Neighbor, Classification Trees, Support Vector Machines and Neural Networks.

The Bayesian method is one of the most popular; it is simple and effective for classification [4]. The Bayesian classification method, which assigns observations to related classes using a decision rule, is known for its flexibility and accuracy [5]. The decision rule is defined such that class membership is indicated by the highest posterior probability [6], [7]. Duarte-Mermoud and Beltran [8] proposed the Bayesian network for the classification of Chilean wines and compared a radial basis function neural network with a support vector machine; the results disclosed that the Bayesian network gave the best performance, with 91% correct classification on the test set. Williams and Barber [5] studied the problem of assigning an input vector to one of several classes using a Gaussian prior in Bayesian classification, and generalized it to the multiclass problem. Porwal and Carranza [9] proposed classifiers based on Bayesian networks and applied them to classify mineral deposit occurrence.

Bayesian classification involves hyper-parameters that should be known, or assessable from knowledge available prior to data collection [10]. However, little information is sometimes available in practice, making the assessment of the hyper-parameters impossible. As a result, Empirical Bayes (EB) can be utilized to estimate the unknown hyper-parameters using information in the observed data. Li [11] exhibited the use of EB with stochastic approximation to estimate unknown parameters and classify a set of unidentified input patterns into k separate classes; in addition, results from a Monte Carlo simulation study demonstrated favorable EB estimation of the unknown parameters of a normal distribution. Chang and Li [12] illustrated the use of EB with stochastic approximation under a Weibull distribution to classify a new item or product into two classes (good or defective), or to identify an item as produced by one of two production lines. Wei and Chen [13] studied EB estimation in the two-way classification model; the results showed that EB yielded a smaller mean square error matrix than the least sum of squares method. Ji, Tsui, and Kim [14] adopted EB to classify gene expression profiles, reducing the number of nuisance parameters in the Bayesian model.

In parametric classification, data are assumed to be normally distributed, which rarely occurs in practice except for large samples. Sometimes data may be distributed symmetrically or asymmetrically with long or short tails. EB, consequently, can be utilized to produce the posterior predictive probability of assigning new observations to known classes, and it is believed to improve performance over the Classical method. The objective of this research is to develop a classification technique, using the EB method, for various types of data: near normal, short-tailed and long-tailed symmetric, and short-tailed and long-tailed asymmetric. The estimated hyper-parameters obtained from EB are substituted into the posterior predictive probability and used to classify new observations, and the results are then compared with the Classical method. Data employed in this research are simulated and divided into two sets: a set of samples used to create the rule, called training data, and a set of samples used to evaluate the rule derived from the training data, called test data. In each situation, the percentage of correct classification is considered.


II. EMPIRICAL BAYES METHOD

EB can be performed using either parametric or nonparametric methods. With parametric EB, the prior distribution is assumed to be known, in contrast with nonparametric EB [10]. The estimates of the hyper-parameters in the EB method are obtained from the posterior marginal distribution function

m(x \mid \delta) = \int f(x \mid \theta)\,\pi(\theta \mid \delta)\,d\theta \qquad (1)

where \theta is the parameter (a continuous random variable in this case), \delta is the hyper-parameter, and m(x \mid \delta) is the posterior marginal distribution function.

In this research, the form of the prior distribution, a conjugate prior, is assumed to be known, with known mean (\theta_0) and unknown variance (\sigma^2).

Informative prior: \sigma^2 \sim IG(\alpha, \beta), that is,

\pi(\sigma^2) = \frac{\beta^{\alpha}\,(\sigma^2)^{-(\alpha+1)}\,e^{-\beta/\sigma^2}}{\Gamma(\alpha)}; \quad \sigma^2 > 0,\ \alpha > 0,\ \beta > 0 \qquad (2)

The steps of the EB method are demonstrated below.

Step i: Find the posterior marginal distribution function.

Consider the probability distribution function of X,

f(x \mid \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\theta_0)^2}{2\sigma^2}}; \quad -\infty < x < \infty,\ \sigma^2 > 0 \qquad (3)

The posterior marginal distribution function is

m(x \mid \alpha, \beta) = \int_0^{\infty} f(x \mid \sigma^2)\,\pi(\sigma^2)\,d\sigma^2. \qquad (4)

Therefore, the posterior marginal distribution function of X is

m(x \mid \alpha, \beta) = \frac{\Gamma\!\left(\frac{2\alpha+1}{2}\right)}{\sqrt{2\pi}\,\Gamma(\alpha)} \cdot \frac{\beta^{\alpha}}{\left(\frac{(x-\theta_0)^2}{2} + \beta\right)^{\frac{2\alpha+1}{2}}}. \qquad (5)
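For concreteness, the log of the marginal density (5), summed over the training observations, is the objective maximized in Step ii below. The following Python sketch implements it with SciPy; the function names are ours, not the paper's.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(x, alpha, beta, theta0):
    """Log posterior marginal density ln m(x | alpha, beta) of eq. (5),
    evaluated elementwise for observations x with known mean theta0."""
    x = np.asarray(x, dtype=float)
    return (gammaln(alpha + 0.5) - gammaln(alpha)   # ln G((2a+1)/2) - ln G(a)
            - 0.5 * np.log(2.0 * np.pi)
            + alpha * np.log(beta)
            - (alpha + 0.5) * np.log((x - theta0) ** 2 / 2.0 + beta))

def log_likelihood(x, alpha, beta, theta0):
    """L(X; alpha, beta): the log of eq. (5) summed over the sample."""
    return np.sum(log_marginal(x, alpha, beta, theta0))
```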

Step ii: Find the estimators of the hyper-parameters using the maximum likelihood method, applying the Newton–Raphson method to solve the resulting nonlinear equations.

The estimators \hat{\alpha} and \hat{\beta} are updated at each iteration by

\begin{bmatrix} \hat{\alpha}^{(r+1)} \\ \hat{\beta}^{(r+1)} \end{bmatrix} = \begin{bmatrix} \hat{\alpha}^{(r)} \\ \hat{\beta}^{(r)} \end{bmatrix} - \begin{bmatrix} \frac{\partial^2 L^{(r)}}{\partial \alpha^2} & \frac{\partial^2 L^{(r)}}{\partial \alpha\,\partial \beta} \\ \frac{\partial^2 L^{(r)}}{\partial \beta\,\partial \alpha} & \frac{\partial^2 L^{(r)}}{\partial \beta^2} \end{bmatrix}^{-1} \times \begin{bmatrix} \frac{\partial L^{(r)}}{\partial \alpha} \\ \frac{\partial L^{(r)}}{\partial \beta} \end{bmatrix} \qquad (6)

where

L(X; \alpha, \beta) = \sum_{i=1}^{n} \left[ \alpha \ln \beta + \ln \Gamma\!\left(\alpha + \tfrac{1}{2}\right) - \tfrac{1}{2}\ln 2\pi - \ln \Gamma(\alpha) - \left(\alpha + \tfrac{1}{2}\right) \ln\!\left( \tfrac{(x_i - \theta_0)^2}{2} + \beta \right) \right]

and, approximating the digamma function by \psi(z) = \frac{d \ln \Gamma(z)}{dz} \approx \ln z - \frac{1}{2z}, the derivatives evaluated at iteration r are

\frac{\partial L^{(r)}}{\partial \alpha} = n \ln \beta^{(r)} + n \ln\!\left(\alpha^{(r)} + \tfrac{1}{2}\right) - \frac{n}{2\alpha^{(r)} + 1} - n \ln \alpha^{(r)} + \frac{n}{2\alpha^{(r)}} - \sum_{i=1}^{n} \ln\!\left( \tfrac{(x_i - \theta_0)^2}{2} + \beta^{(r)} \right)

\frac{\partial L^{(r)}}{\partial \beta} = \frac{n \alpha^{(r)}}{\beta^{(r)}} - \sum_{i=1}^{n} \frac{\alpha^{(r)} + \tfrac{1}{2}}{\tfrac{(x_i - \theta_0)^2}{2} + \beta^{(r)}}

\frac{\partial^2 L^{(r)}}{\partial \alpha^2} = \frac{n}{\alpha^{(r)} + \tfrac{1}{2}} + \frac{2n}{\left(2\alpha^{(r)} + 1\right)^2} - \frac{n}{\alpha^{(r)}} - \frac{n}{2\left(\alpha^{(r)}\right)^2}

\frac{\partial^2 L^{(r)}}{\partial \alpha\,\partial \beta} = \frac{n}{\beta^{(r)}} - \sum_{i=1}^{n} \frac{1}{\tfrac{(x_i - \theta_0)^2}{2} + \beta^{(r)}}

\frac{\partial^2 L^{(r)}}{\partial \beta^2} = -\frac{n \alpha^{(r)}}{\left(\beta^{(r)}\right)^2} + \sum_{i=1}^{n} \frac{\alpha^{(r)} + \tfrac{1}{2}}{\left( \tfrac{(x_i - \theta_0)^2}{2} + \beta^{(r)} \right)^2}

where r represents the iteration number.
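A minimal Newton–Raphson sketch of Step ii follows. The starting values, positivity clamp, and stopping rule are our assumptions for illustration; the gradient and Hessian are the closed-form expressions above.

```python
import numpy as np

def estimate_hyperparams(x, theta0, alpha0=2.0, beta0=1.0,
                         max_iter=100, tol=1e-8):
    """Newton-Raphson iteration (6) for the IG hyper-parameters (alpha, beta).
    alpha0 and beta0 are illustrative starting values, not from the paper."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = (x - theta0) ** 2 / 2.0          # s_i = (x_i - theta0)^2 / 2
    a, b = alpha0, beta0
    for _ in range(max_iter):
        d = s + b                        # s_i + beta
        # Gradient of L, using the approximation psi(z) ~ ln z - 1/(2z)
        ga = (n * np.log(b) + n * np.log(a + 0.5) - n / (2 * a + 1)
              - n * np.log(a) + n / (2 * a) - np.sum(np.log(d)))
        gb = n * a / b - (a + 0.5) * np.sum(1.0 / d)
        # Hessian of L
        haa = n / (a + 0.5) + 2 * n / (2 * a + 1) ** 2 - n / a - n / (2 * a ** 2)
        hab = n / b - np.sum(1.0 / d)
        hbb = -n * a / b ** 2 + (a + 0.5) * np.sum(1.0 / d ** 2)
        H = np.array([[haa, hab], [hab, hbb]])
        step = np.linalg.solve(H, np.array([ga, gb]))
        a_new, b_new = a - step[0], b - step[1]
        a_new = max(a_new, 1e-6)         # keep the parameters positive
        b_new = max(b_new, 1e-6)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```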

Step iii: Find the posterior distribution function:

\pi(\sigma^2 \mid x) = \frac{f(x \mid \sigma^2)\,\pi(\sigma^2)}{\int_0^{\infty} f(x \mid \sigma^2)\,\pi(\sigma^2)\,d\sigma^2} \qquad (7)

Thus, the posterior distribution is

\sigma^2 \mid X \sim IG\!\left( \frac{2\alpha + n}{2},\ \frac{\sum_{i=1}^{n}(x_i - \theta_0)^2 + 2\beta}{2} \right). \qquad (8)

Step iv: Substitute the hyper-parameter estimators from Step ii into the posterior distribution function:

\sigma^2 \mid X \sim IG\!\left( \frac{2\hat{\alpha} + n}{2},\ \frac{\sum_{i=1}^{n}(x_i - \theta_0)^2 + 2\hat{\beta}}{2} \right) \qquad (9)

Step v: Compute the posterior predictive probability.

The posterior predictive probability is frequently used for the prediction of a new observation, y, in the test data [15].
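As a practical aside on Steps iii–iv, the plug-in posterior (9) maps directly onto SciPy's inverse-gamma parameterization (shape and scale); the helper below is our sketch, not part of the paper.

```python
import numpy as np
from scipy.stats import invgamma

def plugin_posterior(x, theta0, alpha_hat, beta_hat):
    """Plug-in posterior (9): sigma^2 | X ~ IG((2*alpha_hat + n)/2,
    (sum (x_i - theta0)^2 + 2*beta_hat)/2), as a frozen SciPy distribution."""
    x = np.asarray(x, dtype=float)
    shape = (2.0 * alpha_hat + len(x)) / 2.0
    scale = (np.sum((x - theta0) ** 2) + 2.0 * beta_hat) / 2.0
    # SciPy's invgamma pdf is proportional to v^{-(shape+1)} e^{-scale/v}
    return invgamma(shape, scale=scale)
```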


The posterior predictive probability of y, conditional on x, is

p(y \mid x) = \int_0^{\infty} f(y \mid \sigma^2)\,\pi(\sigma^2 \mid x)\,d\sigma^2 \qquad (10)

where

f(y \mid \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(y-\theta_0)^2}{2\sigma^2}}

and

\pi(\sigma^2 \mid x) = \frac{\left( \frac{\sum_{i=1}^{n}(x_i - \theta_0)^2 + 2\beta}{2} \right)^{\frac{2\alpha+n}{2}}}{\Gamma\!\left(\frac{2\alpha+n}{2}\right)}\,(\sigma^2)^{-\left(\alpha + \frac{n}{2} + 1\right)}\,e^{-\frac{\sum_{i=1}^{n}(x_i - \theta_0)^2 + 2\beta}{2\sigma^2}}

Sometimes the posterior predictive probability is not tractable, so the Markov chain Monte Carlo (MCMC) technique [16] can be applied to estimate p(y \mid x) as

\hat{p}(y \mid x) \approx \frac{1}{M} \sum_{t=1}^{M} p\!\left(y \mid x, \sigma^{2(t)}\right) \qquad (11)

where M is the number of generated MCMC samples and \sigma^{2(t)}, t = 1, 2, \ldots, M, are the generated MCMC samples.

Step vi: Classify the test data into classes based on the highest posterior predictive probability (a sketch of Steps v and vi is given after Step vii).

Step vii: Compute the percentage of correct classification.
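The sketch below illustrates Steps v–vi for the binary case: each class keeps its own known mean and plug-in posterior, the predictive density (11) is approximated by averaging the normal likelihood of y over posterior draws of sigma^2, and a test point is assigned to the class with the higher estimate. The function names and the draw count M = 1000 are our choices, not the paper's; `plugin_posterior` is the helper sketched after Step iv.

```python
import numpy as np
from scipy.stats import norm

def predictive_density(y, theta0, posterior, M=1000, rng=None):
    """Monte Carlo estimate (11) of p(y | x): average the N(theta0, sigma^2)
    density of y over M draws of sigma^2 from the plug-in posterior (9)."""
    rng = np.random.default_rng(rng)
    sigma2 = posterior.rvs(size=M, random_state=rng)
    return np.mean(norm.pdf(y, loc=theta0, scale=np.sqrt(sigma2)))

def classify(y, classes):
    """classes: list of (theta0, frozen posterior) pairs, one per class.
    Assign y to the class with the highest estimated predictive density."""
    dens = [predictive_density(y, th0, post) for th0, post in classes]
    return int(np.argmax(dens))
```

In use, one would build `classes` from the two training samples, e.g. `classes = [(t0, plugin_posterior(x0, t0, a0, b0)), (t1, plugin_posterior(x1, t1, a1, b1))]`, and score every observation in the test set.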

III. SIMULATION RESULTS

Five characteristics of data are constructed by varying the values of skewness and kurtosis based on the criteria of Shapiro, Wilk and Chen [17] and Ramberg and Tadikamalla [18], as (skewness, kurtosis) = (0.25, 2.80), (0.00, 2.40), (0.00, 6.00), (0.75, 2.80) and (0.75, 6.00) for near normal, short-tailed symmetric, long-tailed symmetric, short-tailed asymmetric and long-tailed asymmetric data, respectively. The percentages of correctly classified data in the case of known mean and unknown variance using the EB and Classical methods are shown in Table I. Table II gives the percentage differences in correctly classified data between the EB and Classical methods.

With all three sample sizes, both the Classical and EB methods exhibited good classification when data were constructed as near normal, as shown in Table I. In addition, the EB method indicated an improved performance over the Classical method in all situations under study, as displayed in Table II.

TABLE I
PERCENTAGES OF CORRECTLY CLASSIFIED DATA USING CLASSICAL AND EB METHODS

Sample size (n) | Data distribution        | Classical | EB
100             | Near normal              | 98.0700   | 98.0790
                | Symmetric short-tailed   | 97.5030   | 97.5760
                | Symmetric long-tailed    | 97.7020   | 97.7030
                | Asymmetric short-tailed  | 97.5200   | 97.5880
                | Asymmetric long-tailed   | 97.6130   | 97.6220
200             | Near normal              | 98.0965   | 98.1180
                | Symmetric short-tailed   | 98.0415   | 98.0425
                | Symmetric long-tailed    | 97.5325   | 97.5340
                | Asymmetric short-tailed  | 97.5450   | 97.6000
                | Asymmetric long-tailed   | 97.6495   | 97.6500
500             | Near normal              | 98.0712   | 98.0856
                | Symmetric short-tailed   | 97.5268   | 97.5986
                | Symmetric long-tailed    | 97.5910   | 97.5964
                | Asymmetric short-tailed  | 97.5402   | 97.6120
                | Asymmetric long-tailed   | 97.5542   | 97.5556

TABLE II
PERCENTAGE DIFFERENCES OF CORRECTLY CLASSIFIED DATA BETWEEN EB AND CLASSICAL METHODS

Data distribution        | n = 100 | n = 200 | n = 500
Near normal              | 0.0092  | 0.0219  | 0.0147
Symmetric short-tailed   | 0.0749  | 0.0010  | 0.0736
Symmetric long-tailed    | 0.0010  | 0.0015  | 0.0055
Asymmetric short-tailed  | 0.0697  | 0.0564  | 0.0736
Asymmetric long-tailed   | 0.0092  | 0.0005  | 0.0014
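The paper does not state which generator produced these skewness/kurtosis combinations, but data with prescribed third and fourth moments are commonly simulated from the Ramberg–Tadikamalla generalized lambda distribution [18], whose percentile function is F^{-1}(u) = \lambda_1 + (u^{\lambda_3} - (1-u)^{\lambda_4})/\lambda_2. A sketch under that assumption follows; the lambda values shown are the textbook approximation to the standard normal, and the values for each (skewness, kurtosis) pair above would be read from the tables in [18].

```python
import numpy as np

def gld_sample(n, lam1, lam2, lam3, lam4, rng=None):
    """Draw n values from the generalized lambda distribution (RS form)
    by inverse-transform sampling: x = F^{-1}(u), u ~ Uniform(0, 1)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2

# Lambdas approximating N(0, 1); values for other (skewness, kurtosis)
# pairs would come from the tables in Ramberg et al. [18].
train = gld_sample(100, lam1=0.0, lam2=0.1975, lam3=0.1349, lam4=0.1349)
```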

IV. CONCLUSION

The simulation results of this research suggest that classification using EB outperforms the Classical method in all cases.

ACKNOWLEDGMENT

The authors gratefully acknowledge the Science Achievement Scholarship of Thailand for financial support during this research.

REFERENCES

[1] L. Cucala, J.-M. Marin, C.P. Robert, and D.M. Titterington, "A Bayesian reassessment of nearest-neighbour classification", University Paris-Sud, Project Select, unpublished.
[2] T. Damoulas, and M.A. Girolami, "Combining feature spaces for classification", Pattern Recognition, 42, pp. 2671-2683, 2009.
[3] R. Johnson, and D. Wichern, "Applied Multivariate Statistical Analysis", Prentice-Hall, 2002.
[4] M. Aci, C. Inan, and M. Avci, "A hybrid classification method of k nearest neighbor, Bayesian methods and genetic algorithm", Expert Systems with Applications, 37, pp. 5061-5067, 2010.
[5] C.K.I. Williams, and D. Barber, "Bayesian classification with Gaussian processes", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, 1998.
[6] R.O. Duda, P.E. Hart, and D.G. Stork, "Pattern Classification", John Wiley & Sons, 2001.
[7] N.A. Samsudin, and A.P. Bradley, "Nearest neighbour group-based classification", Pattern Recognition, 43, pp. 3458-3467, 2010.


[8] M.A. Duarte-Mermoud, and N.H. Beltran, "Classification of Chilean wines", in Bayesian Networks: A Practical Guide to Applications, pp. 281-299, 2008.
[9] A. Porwal, and E.J.M. Carranza, "Classifiers for modeling of mineral potential", in Bayesian Networks: A Practical Guide to Applications, pp. 149-171, 2008.
[10] B.P. Carlin, and T.A. Louis, "Bayesian Methods for Data Analysis", Chapman & Hall, 2009.
[11] T.F. Li, "Bayes empirical Bayes approach to unsupervised learning of parameters in pattern recognition", Pattern Recognition, 33, pp. 333-340, 2000.
[12] S. Chang, and T.F. Li, "Empirical Bayes decision rule for classification on defective items in Weibull distribution", Applied Mathematics and Computation, 182, pp. 425-433, 2006.
[13] L. Wei, and J. Chen, "Empirical Bayes estimation and its superiority for two-way classification model", Statistics & Probability Letters, 63, pp. 165-175, 2003.
[14] Y. Ji, K.-W. Tsui, and K. Kim, "A novel means of using gene clusters in a two-step empirical Bayes method for predicting classes of samples", Bioinformatics, Vol. 21, No. 7, pp. 1055-1061, 2005.
[15] T. Koski, "Bayesian Predictive Classification", School of the Swedish Statistical Association, Alternative Perspectives On, unpublished.
[16] R. Guo, and S. Chakraborty, "Bayesian adaptive nearest neighbor", Statistical Analysis and Data Mining, DOI:10.1002/sam, pp. 92-105, 2009.
[17] S.S. Shapiro, M.B. Wilk, and H.J. Chen, "A comparative study of various tests for normality", Journal of the American Statistical Association, Vol. 63, No. 324, pp. 1343-1372, 1968.
[18] J.S. Ramberg, P.R. Tadikamalla, E.J. Dudewicz, and E.F. Mykytka, "A probability distribution and its uses in fitting data", Technometrics, Vol. 21, No. 2, pp. 201-214, 1979.
