
Bayesian Classifiers
Probabilistic Graphical Models
L. Enrique Sucar, INAOE

Outline
1 Introduction
2 Bayesian Classifiers
3 Naive Bayes Classifier
4 TAN and BAN
5 Semi-Naive Bayesian Classifiers
6 Multidimensional Bayesian Classifiers
  Bayesian Chain Classifiers
7 Hierarchical Classification
8 Applications
9 References

Introduction: Classification
Classification consists in assigning classes or labels to objects. There are two basic types of classification problems:
Unsupervised: in this case the classes are unknown, so the problem consists in dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.
Supervised: the possible classes or labels are known a priori, and the problem consists in finding a function or rule that assigns each object to one of the classes.

Introduction: Probabilistic Classification
• Supervised classification consists in assigning to a particular object described by its attributes, $A_1, A_2, \ldots, A_n$, one of $m$ classes, $C = \{c_1, c_2, \ldots, c_m\}$, such that the probability of the class given the attributes
is maximized:

$$\arg\max_{C} P(C \mid A_1, A_2, \ldots, A_n) \tag{1}$$

• If we denote the set of attributes as $A = \{A_1, A_2, \ldots, A_n\}$:

$$\arg\max_{C} P(C \mid A)$$

Introduction: Classifier Evaluation
Accuracy: how well a classifier predicts the correct class for unseen examples (that is, those not considered for learning the classifier).
Classification time: how long it takes the classification process to predict the class, once the classifier has been trained.
Training time: how much time is required to learn the classifier from data.
Memory requirements: how much memory is required to store the classifier parameters.
Clarity: whether the classifier is easily understood by a person.

Introduction: Class Imbalance
• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes.
• When there is imbalance in the costs of misclassification, we must then minimize the expected cost (EC). For two classes, this is given by:

$$EC = FN \times P(-)\,C(- \mid +) + FP \times P(+)\,C(+ \mid -) \tag{2}$$

Where: FN is the false negative rate, FP is the false positive rate, $P(+)$ is the probability of positive, $P(-)$ is the probability of negative, $C(- \mid +)$ is the cost of classifying a positive as negative, and $C(+ \mid -)$ is the cost of classifying a negative as positive.

Bayesian Classifiers: Bayes Classifier (I)
• Applying Bayes rule:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1, A_2, \ldots, A_n \mid C)}{P(A_1, A_2, \ldots, A_n)} \tag{3}$$

• Which can be written more compactly as:

$$P(C \mid A) = P(C)\,P(A \mid C) / P(A) \tag{4}$$

Bayesian Classifiers: Bayes Classifier (II)
• The classification problem can be formulated as:

$$\arg\max_{C} \left[ P(C \mid A) = P(C)\,P(A \mid C) / P(A) \right] \tag{5}$$

• Equivalently:
  • $\arg\max_{C} \left[ P(C)\,P(A \mid C) \right]$
  • $\arg\max_{C} \left[ \log\left( P(C)\,P(A \mid C) \right) \right]$
  • $\arg\max_{C} \left[ \log P(C) + \log P(A \mid C) \right]$
Note that the probability of the attributes, $P(A)$, does not vary with respect to the class, so it can be considered as a constant for the maximization.

Bayesian Classifiers: Complexity
• The direct application of the Bayes rule results in a computationally expensive problem
• The number of parameters in the likelihood term, $P(A_1, A_2, \ldots, A_n \mid C)$, increases exponentially with the
number of attributes
• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier

Naive Bayes Classifier: Naive Bayes
• The naive or simple Bayesian classifier (NBC) is based on the assumption that all the attributes are independent given the class variable:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1 \mid C)\,P(A_2 \mid C) \cdots P(A_n \mid C)}{P(A)} \tag{6}$$

where $P(A)$ can be considered, as mentioned before, a normalization constant.

Naive Bayes Classifier: Complexity
• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability of the class (a one-dimensional vector) and the n conditional probabilities of each attribute given the class (two-dimensional matrices)
• The space requirement is reduced from exponential to linear in the number of attributes
• The calculation of the posterior is greatly simplified, as to estimate it (unnormalized) only n multiplications are required

Naive Bayes Classifier: Graphical Model
[Figure: graphical model of the naive Bayes classifier]
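The reduction from exponential to linear space described in the complexity slides can be illustrated with a small count of table entries. This is a minimal sketch, not part of the original slides: the function names `full_bayes_entries` and `naive_bayes_entries` are hypothetical, and the count assumes discrete attributes and tallies every table entry (not only the free parameters).

```python
from math import prod


def full_bayes_entries(attr_sizes, n_classes):
    """Entries in P(C) plus the joint likelihood table P(A_1, ..., A_n | C).

    attr_sizes: number of possible values of each attribute (assumed discrete).
    """
    return n_classes + n_classes * prod(attr_sizes)


def naive_bayes_entries(attr_sizes, n_classes):
    """Entries in P(C) plus one table P(A_j | C) per attribute."""
    return n_classes + n_classes * sum(attr_sizes)


if __name__ == "__main__":
    # Two classes and n ternary attributes: the first count grows as 3**n,
    # the second grows linearly in n.
    for n in (4, 10, 20):
        sizes = [3] * n
        print(n, full_bayes_entries(sizes, 2), naive_bayes_entries(sizes, 2))
```

With the attribute sizes as arguments, the same functions can be applied to any discrete problem to compare both formulations.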

Naive Bayes Classifier: Parameter Learning
• The probabilities can be estimated from data using, for instance, maximum likelihood estimation
• The prior probabilities of the class variable, $C$, are given by:

$$P(c_i) \sim N_i / N \tag{7}$$

where $N_i$ is the number of times $c_i$ appears in the $N$ samples.
• The conditional probabilities of each attribute, $A_j$, can be estimated as:

$$P(A_{jk} \mid c_i) \sim N_{jki} / N_i \tag{8}$$

where $N_{jki}$ is the number of samples of class $c_i$ in which attribute $A_j$ takes its $k$-th value.

Naive Bayes Classifier: Inference
• The posterior probability can be obtained just by multiplying the prior by the likelihood for each attribute
• Given the values for the $n$ attributes, $a_1, \ldots, a_n$, for each class $c_i$, the posterior is proportional to:

$$P(c_i \mid a_1, \ldots, a_n) \sim P(c_i)\,P(a_1 \mid c_i) \cdots P(a_n \mid c_i) \tag{9}$$

• The class $c_k$ that maximizes the previous equation will be selected

Naive Bayes Classifier: Example - Classifier for Golf
Outlook    Temperature  Humidity  Windy  Play
sunny      high         high      false  no
sunny      high         high      true   no
overcast   high         high      false  yes
rain       medium       high      false  yes
rain       low          normal    false  yes
rain       low          normal    true   no
overcast   low          normal    true   yes
sunny      medium       high      false  no
sunny      low          normal    false  yes
rain       medium       normal    false  yes
sunny      medium       normal    true   yes
overcast   medium       high      true   yes
overcast   high         normal    false  yes
rain       medium       high      true   no

Naive Bayes Classifier: Example - NBC for Golf
[Figure: naive Bayes classifier for the golf example]

Naive Bayes Classifier: Example - inference for Golf
Given: Outlook=rain, Temperature=high, Humidity=normal, Windy=false

$$P(Play = yes \mid rain, high, normal, false) = k \times 0.64 \times 0.33 \times 0.22 \times 0.67 \times 0.67 = k \times 0.021$$

$$P(Play = no \mid rain, high, normal, false) = k \times 0.36 \times 0.40 \times 0.40 \times 0.20 \times 0.40 = k \times 0.0046$$

where $k = 1/P(A)$ is the normalization constant.
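The golf example can be reproduced with a short script. The following is a minimal sketch under stated assumptions, not the author's implementation: the names `DATA`, `learn` and `posterior` are hypothetical, the parameters are plain maximum likelihood counts as in equations (7) and (8), and no smoothing is applied, so an attribute value never seen with a class gets probability zero.

```python
from collections import Counter, defaultdict

# Golf data set: (Outlook, Temperature, Humidity, Windy, Play).
DATA = [
    ("sunny", "high", "high", "false", "no"),
    ("sunny", "high", "high", "true", "no"),
    ("overcast", "high", "high", "false", "yes"),
    ("rain", "medium", "high", "false", "yes"),
    ("rain", "low", "normal", "false", "yes"),
    ("rain", "low", "normal", "true", "no"),
    ("overcast", "low", "normal", "true", "yes"),
    ("sunny", "medium", "high", "false", "no"),
    ("sunny", "low", "normal", "false", "yes"),
    ("rain", "medium", "normal", "false", "yes"),
    ("sunny", "medium", "normal", "true", "yes"),
    ("overcast", "medium", "high", "true", "yes"),
    ("overcast", "high", "normal", "false", "yes"),
    ("rain", "medium", "high", "true", "no"),
]


def learn(data):
    """Estimate P(c_i) = N_i / N and P(A_jk | c_i) = N_jki / N_i by counting."""
    class_counts = Counter(row[-1] for row in data)
    priors = {c: count / len(data) for c, count in class_counts.items()}
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in data:
        for j, value in enumerate(row[:-1]):
            value_counts[(j, row[-1])][value] += 1

    def likelihood(j, value, c):
        return value_counts[(j, c)][value] / class_counts[c]

    return priors, likelihood


def posterior(priors, likelihood, query):
    """Return (unnormalized scores, normalized posterior) for each class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(query):
            score *= likelihood(j, value, c)  # one multiplication per attribute
        scores[c] = score
    total = sum(scores.values())
    return scores, {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    priors, likelihood = learn(DATA)
    scores, post = posterior(priors, likelihood, ("rain", "high", "normal", "false"))
    print(scores)  # unnormalized values, i.e. the factors multiplying k
    print(post)    # normalized posterior; the larger value gives the chosen class
```

For the query above, the unnormalized scores agree with the rounded values on the slide (about 0.021 for yes and 0.0046 for no), so the selected class is Play=yes.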