Bayesian Classifiers

Probabilistic Graphical Models

L. Enrique Sucar, INAOE

Outline

1 Introduction
2 Bayesian Classifiers
3 Naive Bayes Classifier
4 TAN and BAN
5 Semi-Naive Bayesian Classifiers
6 Multidimensional Bayesian Classifiers
    Bayesian Chain Classifiers
7 Hierarchical Classification
8 Applications
9 References

Introduction: Classification

Classification consists in assigning classes or labels to objects. There are two basic types of classification problems:

Unsupervised: in this case the classes are unknown, so the problem consists in dividing a set of objects into n groups or clusters, so that a class is assigned to each different group. It is also known as clustering.

Supervised: the possible classes or labels are known a priori, and the problem consists in finding a function or rule that assigns each object to one of the classes.

Introduction: Probabilistic Classification

• Supervised classification consists in assigning to a particular object described by its attributes, A1, A2, ..., An, one of m classes, C = {c1, c2, ..., cm}, such that the probability of the class given the attributes is maximized:

$$\arg\max_{C} \left[ P(C \mid A_1, A_2, \ldots, A_n) \right] \tag{1}$$

• If we denote the set of attributes as A = {A1, A2, ..., An}:

$$\arg\max_{C} \left[ P(C \mid A) \right]$$

Introduction: Classifier Evaluation

Accuracy: it refers to how well a classifier predicts the correct class for unseen examples (that is, those not considered for learning the classifier).

Classification time: how long it takes the classification process to predict the class, once the classifier has been trained.

Training time: how much time is required to learn the classifier from data.

Memory requirements: how much space in terms of memory is required to store the classifier parameters.

Clarity: if the classifier is easily understood by a person.

Introduction: Class Imbalance

• In general we want to maximize the classification accuracy; however, this is only optimal if the cost of a wrong classification is the same for all the classes

• When there is imbalance in the costs of misclassification, we must then minimize the expected cost (EC). For two classes, this is given by:

$$EC = FN \times P(-)\,C(- \mid +) + FP \times P(+)\,C(+ \mid -) \tag{2}$$

Where: FN is the false negative rate, FP is the false positive rate, P(+) is the probability of positive, P(−) is the probability of negative, C(− | +) is the cost of classifying a positive as negative, and C(+ | −) is the cost of classifying a negative as positive

Bayesian Classifiers: Bayes Classifier (I)

• Applying Bayes rule:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1, A_2, \ldots, A_n \mid C)}{P(A_1, A_2, \ldots, A_n)} \tag{3}$$

• Which can be written more compactly as:

$$P(C \mid A) = P(C)\,P(A \mid C) / P(A) \tag{4}$$

Bayesian Classifiers: Bayes Classifier (II)

• The classification problem can be formulated as:

$$\arg\max_{C} \left[ P(C \mid A) = P(C)\,P(A \mid C) / P(A) \right] \tag{5}$$

• Equivalently:
    • $\arg\max_{C} \left[ P(C)\,P(A \mid C) \right]$
    • $\arg\max_{C} \left[ \log\left( P(C)\,P(A \mid C) \right) \right]$
    • $\arg\max_{C} \left[ \log P(C) + \log P(A \mid C) \right]$

Note that the probability of the attributes, P(A), does not vary with respect to the class, so it can be considered as a constant for the maximization.
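The equivalences above can be checked numerically. The following is a minimal sketch with made-up numbers: the class names `c1`, `c2` and all probabilities are illustrative assumptions, not values from the slides. It shows that dividing by P(A), or taking logarithms, does not change which class wins.

```python
import math

# Hypothetical prior P(C) and likelihood P(A | C) for one fixed observation A.
prior = {"c1": 0.7, "c2": 0.3}
likelihood = {"c1": 0.02, "c2": 0.10}

# P(A) = sum over classes of P(C) P(A | C); it is the same for every class.
evidence = sum(prior[c] * likelihood[c] for c in prior)

posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
unnormalized = {c: prior[c] * likelihood[c] for c in prior}
log_score = {c: math.log(prior[c]) + math.log(likelihood[c]) for c in prior}

# The three formulations select the same class.
best_posterior = max(posterior, key=posterior.get)
best_unnormalized = max(unnormalized, key=unnormalized.get)
best_log = max(log_score, key=log_score.get)
print(best_posterior, best_unnormalized, best_log)
```

The log form is usually preferred in practice because sums of logarithms avoid numerical underflow when many small probabilities are multiplied.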

Bayesian Classifiers: Complexity

• The direct application of the Bayes rule results in a computationally expensive problem

• The number of parameters in the likelihood term, P(A1, A2, ..., An | C), increases exponentially with the number of attributes

• An alternative is to consider some independence properties as in graphical models, in particular that all attributes are independent given the class, resulting in the Naive Bayesian Classifier

Naive Bayes Classifier: Naive Bayes

• The naive or simple Bayesian classifier (NBC) is based on the assumption that all the attributes are independent given the class variable:

$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(C)\,P(A_1 \mid C)\,P(A_2 \mid C) \cdots P(A_n \mid C)}{P(A)} \tag{6}$$

where P(A) can be considered, as mentioned before, a normalization constant.

Naive Bayes Classifier: Complexity

• The naive Bayes formulation drastically reduces the complexity of the Bayesian classifier, as in this case we only require the prior probability (one dimensional vector) of the class, and the n conditional probabilities of each attribute given the class (two dimensional matrices)

• The space requirement is reduced from exponential to linear in the number of attributes

• The calculation of the posterior is greatly simplified, as to estimate it (unnormalized) only n multiplications are required
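To make the exponential-versus-linear contrast concrete, here is a rough sketch that counts table entries for both formulations. It assumes, for simplicity, that every attribute has the same number of discrete values and that we count raw table entries rather than free parameters; the function names are hypothetical.

```python
def full_bayes_table_size(n_attributes, values_per_attribute, n_classes):
    """Entries in P(A1, ..., An | C): one per joint attribute assignment and class."""
    return n_classes * values_per_attribute ** n_attributes


def naive_bayes_table_size(n_attributes, values_per_attribute, n_classes):
    """Entries in the prior vector P(C) plus the n matrices P(Ai | C)."""
    return n_classes + n_attributes * values_per_attribute * n_classes


# Two classes, three values per attribute, growing number of attributes.
for n in (2, 5, 10, 20):
    print(n, full_bayes_table_size(n, 3, 2), naive_bayes_table_size(n, 3, 2))
```

With 10 ternary attributes and two classes, the full likelihood table already needs 118098 entries, while the naive Bayes tables need 62.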


Naive Bayes Classifier: Parameter Learning

• The probabilities can be estimated from data using, for instance, maximum likelihood estimation

• The prior probabilities of the class variable, C, are given by:

$$P(c_i) \sim N_i / N \tag{7}$$

where Ni is the number of examples of class ci among the N training examples

• The conditional probabilities of each attribute, Aj, can be estimated as:

$$P(A_{jk} \mid c_i) \sim N_{jki} / N_i \tag{8}$$

where Njki is the number of examples of class ci in which attribute Aj takes its k-th value

Naive Bayes Classifier: Inference

• The posterior probability can be obtained just by multiplying the prior by the likelihood for each attribute

• Given the values for m attributes, a1, ..., am, for each class ci, the posterior is proportional to:

$$P(c_i \mid a_1, \ldots, a_m) \sim P(c_i)\,P(a_1 \mid c_i) \cdots P(a_m \mid c_i) \tag{9}$$

• The class ck that maximizes the previous equation will be selected

Naive Bayes Classifier: Example - Classifier for Golf

Outlook   Temperature  Humidity  Windy  Play
sunny     high         high      false  no
sunny     high         high      true   no
overcast  high         high      false  yes
rain      medium       high      false  yes
rain      low          normal    false  yes
rain      low          normal    true   no
overcast  low          normal    true   yes
sunny     medium       high      false  no
sunny     low          normal    false  yes
rain      medium       normal    false  yes
sunny     medium       normal    true   yes
overcast  medium       high      true   yes
overcast  high         normal    false  yes
rain      medium       high      true   no

Naive Bayes Classifier: Example - NBC for Golf [figure]

Naive Bayes Classifier: Example - inference for Golf

Given: Outlook=rain, Temperature=high, Humidity=normal, Windy=false

P(Play = yes | rain, high, normal, false) = k × 0.64 × 0.33 × 0.22 × 0.67 × 0.67 = k × 0.021

P(Play = no | rain, high, normal, false) = k × 0.36 × 0.40 × 0.40 × 0.20 × 0.40 = k × 0.0046

k = 1/(0.021 + 0.0046) ≈ 39.06

P(Play = yes | rain, high, normal, false) = 0.82

P(Play = no | rain, high, normal, false) = 0.18
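The golf example can be reproduced end to end with a few lines of code. The sketch below learns the parameters by counting (maximum likelihood, Equations 7 and 8) from the 14 examples in the golf table and then applies Equation 9 to the evidence Outlook=rain, Temperature=high, Humidity=normal, Windy=false. The function names `learn_nbc` and `posterior` are hypothetical, and no smoothing is applied, so an attribute value never seen with a class yields probability zero.

```python
from collections import Counter, defaultdict

# Columns: Outlook, Temperature, Humidity, Windy, Play (the class).
DATA = [
    ("sunny", "high", "high", "false", "no"),
    ("sunny", "high", "high", "true", "no"),
    ("overcast", "high", "high", "false", "yes"),
    ("rain", "medium", "high", "false", "yes"),
    ("rain", "low", "normal", "false", "yes"),
    ("rain", "low", "normal", "true", "no"),
    ("overcast", "low", "normal", "true", "yes"),
    ("sunny", "medium", "high", "false", "no"),
    ("sunny", "low", "normal", "false", "yes"),
    ("rain", "medium", "normal", "false", "yes"),
    ("sunny", "medium", "normal", "true", "yes"),
    ("overcast", "medium", "high", "true", "yes"),
    ("overcast", "high", "normal", "false", "yes"),
    ("rain", "medium", "high", "true", "no"),
]


def learn_nbc(rows):
    """Maximum likelihood estimates of P(c) and P(attribute value | c) by counting."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in rows:
        for j, value in enumerate(row[:-1]):
            value_counts[(j, row[-1])][value] += 1
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    conditionals = {
        key: {v: n / class_counts[key[1]] for v, n in counts.items()}
        for key, counts in value_counts.items()
    }
    return priors, conditionals


def posterior(priors, conditionals, evidence):
    """Normalized P(c | evidence) under the naive Bayes assumption."""
    scores = {}
    for c, p in priors.items():
        for j, value in enumerate(evidence):
            p *= conditionals[(j, c)].get(value, 0.0)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


priors, conditionals = learn_nbc(DATA)
result = posterior(priors, conditionals, ("rain", "high", "normal", "false"))
print({c: round(p, 2) for c, p in result.items()})
```

Rounded to two decimals, the posterior for Play=yes is 0.82 and for Play=no is 0.18, matching the values on the slide.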

Naive Bayes Classifier: Analysis

• Advantages:
    • The low number of required parameters, which reduces the memory requirements and facilitates learning them from data.
    • The low computational cost for inference (estimating the posteriors) and learning.
    • The relatively good performance (classification precision) in many domains.
    • A simple and intuitive model.

• Limitations:
    • In some domains, the performance is reduced given that the conditional independence assumption is not valid.
    • If there are continuous attributes, these need to be discretized (or consider alternative models such as the linear discriminator).

TAN and BAN: TAN

• The Tree augmented Bayesian Classifier, or TAN, incorporates some dependencies between the attributes by building a directed tree among the attribute variables

• The n attributes form a graph which is restricted to a directed tree that represents the dependency relations between the attributes

TAN and BAN: BAN

• The Bayesian Network augmented Bayesian Classifier, or BAN, considers that the dependency structure among the attributes constitutes a directed acyclic graph (DAG)

TAN and BAN: Inference and Learning

• The TAN and BAN classifiers can be considered as particular cases of a more general model, that is, Bayesian networks

• Techniques for inference and for learning Bayesian networks can therefore be applied to obtain the posterior probabilities (inference) and the model (learning) for the TAN and BAN classifiers

Semi-Naive Bayesian Classifiers: Semi-Naive Bayesian Classifiers (SNBC)

• Another alternative to deal with dependent attributes is to transform the basic structure of a naive Bayesian classifier, while maintaining a star or tree-structured network

• The basic idea of the SNBC is to eliminate or join attributes which are not independent given the class

• This is analogous to feature selection in machine learning

Semi-Naive Bayesian Classifiers: Structural Improvement

• Two alternative operations to modify the structure of a NBC: (i) node elimination, and (ii) node combination, considering that we start from a full structure

Semi-Naive Bayesian Classifiers: Node elimination and combination

• Node elimination consists in simply eliminating an attribute, Ai, from the model; this could be because it is not relevant for the class (Ai and C are independent), or because the attribute Ai and another attribute, Aj, are not independent given the class

• Node combination consists in merging two attributes, Ai and Aj, into a new attribute Ak, such that Ak has as possible values the cross product of the values of Ai and Aj (assuming discrete attributes). For example, if Ai = {a, b, c} and Aj = {1, 2}, then Ak = {a1, a2, b1, b2, c1, c2}. This is an alternative when two attributes are not independent given the class
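Node combination is straightforward to express in code. The sketch below builds the value set of the merged attribute as a cross product and also merges two data columns example by example; the function names are hypothetical and values are simply concatenated as strings.

```python
from itertools import product


def combine_values(values_i, values_j):
    """Value set of the merged attribute Ak: cross product of the values of Ai and Aj."""
    return [f"{vi}{vj}" for vi, vj in product(values_i, values_j)]


def combine_columns(column_i, column_j):
    """Replace two data columns by one column holding the joint value per example."""
    return [f"{vi}{vj}" for vi, vj in zip(column_i, column_j)]


print(combine_values(["a", "b", "c"], [1, 2]))
# prints ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
```

Note that the merged attribute has |Ai| × |Aj| values, so repeated combination increases the number of parameters to estimate.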

Semi-Naive Bayesian Classifiers: Node Insertion

• Consists in adding a new attribute that makes two dependent attributes independent

• This new attribute is a kind of virtual or hidden node in the model, for which we do not have any data

• An alternative for estimating the parameters of hidden variables in Bayesian networks, such as in this case, is based on the Expectation-Maximization (EM) procedure

Multidimensional Bayesian Classifiers: Multidimensional classification

• Several important problems need to predict several classes simultaneously

• For example: text classification, where a document can be assigned to several topics; gene classification, as a gene may have different functions; image annotation, as an image may include several objects

Multidimensional Bayesian Classifiers: Definition

• The multi-dimensional classification problem corresponds to searching for a function h that assigns to each instance represented by a vector of m features X = (X1, ..., Xm) a vector of d class values C = (C1, ..., Cd):

$$\arg\max_{c_1, \ldots, c_d} P(C_1 = c_1, \ldots, C_d = c_d \mid X) \tag{10}$$

Multidimensional Bayesian Classifiers: Multi-label classification

• Multi-label classification is a particular case of multi-dimensional classification, where all class variables are binary

• Two basic approaches:
    • Binary relevance approaches transform the multi-label classification problem into d independent binary classification problems, one for each class variable, C1, ..., Cd
    • The label power-set approach transforms the multi-label classification problem into a single-class scenario by defining a new compound class variable whose possible values are all the possible combinations of values of the original classes
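The two transformations differ only in how the label vectors are turned into training targets. A minimal sketch, using a small made-up label matrix (the data and function names are illustrative assumptions):

```python
def binary_relevance_targets(label_rows):
    """One binary target column per class variable C1, ..., Cd."""
    d = len(label_rows[0])
    return [[row[i] for row in label_rows] for i in range(d)]


def label_powerset_targets(label_rows):
    """A single compound target: each label combination becomes one class value."""
    return ["".join(str(bit) for bit in row) for row in label_rows]


# Four hypothetical examples, each labeled with d = 3 binary classes.
labels = [(1, 0, 1), (0, 0, 1), (1, 0, 1), (1, 1, 0)]

print(binary_relevance_targets(labels))  # three independent binary problems
print(label_powerset_targets(labels))    # one multi-class problem
```

Binary relevance ignores dependencies between the classes, while the compound variable of the power-set approach can take up to 2^d values, which quickly becomes impractical as d grows.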

Multidimensional Bayesian Classifiers: Multidimensional Bayesian Network Classifiers

• A multidimensional Bayesian network classifier (MBC) is a Bayesian network with a particular structure

• The set V of variables is partitioned into two sets: VC = {C1, ..., Cd}, d ≥ 1, of class variables and VX = {X1, ..., Xm}, m ≥ 1, of feature variables (d + m = n)

• The set A of arcs is also partitioned into three sets, AC, AX, ACX, such that AC ⊆ VC × VC is composed of the arcs between the class variables, AX ⊆ VX × VX is composed of the arcs between the feature variables, and finally, ACX ⊆ VC × VX is composed of the arcs from the class variables to the feature variables

Multidimensional Bayesian Classifiers: MDBNC - Graphical Model [figure]

Multidimensional Bayesian Classifiers: Inference

• The problem of obtaining the classification of an instance with an MBC, that is, the most likely combination of classes, corresponds to the MPE (Most Probable Explanation) or abduction problem

• This is a complex problem with a high computational cost
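The cost comes from the fact that the most likely combination of classes has to be searched over all joint assignments, and it cannot in general be found by maximizing each class variable separately. The sketch below illustrates this with a made-up joint posterior over two binary class variables; the numbers and function names are illustrative assumptions, and the exhaustive search shown is only feasible for very small d.

```python
from itertools import product

# Hypothetical joint posterior P(C1, C2 | x) over two binary class variables.
joint = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}


def most_probable_explanation(joint, d):
    """Exhaustive search over all 2^d class combinations."""
    return max(product((0, 1), repeat=d), key=lambda combo: joint.get(combo, 0.0))


def marginal_argmax(joint, d):
    """Pick each class variable's most probable value on its own.

    Assumes the joint table sums to 1, so P(Ci = 1) > 0.5 decides each variable.
    """
    result = []
    for i in range(d):
        p_one = sum(p for combo, p in joint.items() if combo[i] == 1)
        result.append(1 if p_one > 0.5 else 0)
    return tuple(result)


mpe = most_probable_explanation(joint, 2)
separate = marginal_argmax(joint, 2)
print(mpe, separate)
```

Here the most probable joint assignment is (0, 0), while maximizing each variable separately gives (1, 0), which has lower joint probability.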

Bayesian Chain Classifiers: Chain Classifiers

• Chain classifiers are an alternative method for multi-label classification that incorporate class dependencies, while keeping the computational efficiency of the binary relevance approach

• A chain classifier consists of d base binary classifiers which are linked in a chain, such that each classifier incorporates the classes predicted by the previous classifiers as additional attributes

• The feature vector for each binary classifier, Li, is extended with the labels (0/1) for all previous classifiers in the chain

Bayesian Chain Classifiers: Chain Classifiers - example [figure]

Bayesian Chain Classifiers: Learning and Inference

• Each classifier in the chain is trained to learn the association of label li given the features augmented with all previous class labels in the chain

• For classification, it starts at L1, and propagates the predicted classes along the chain, such that for Li ∈ L (where L = {L1, L2, ..., Ld}) it predicts P(Li | X, L1, L2, ..., Li−1)

• The class vector is determined by combining the outputs of all the binary classifiers
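The training and classification steps above can be sketched as follows. The base classifier here is a small naive Bayes over discrete features with add-one smoothing; that choice, the class `TinyNaiveBayes`, the helper names and the toy data are all assumptions made for illustration, and any binary classifier could take its place.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Naive Bayes over discrete features with add-one smoothing (an assumption)."""

    def fit(self, rows, targets):
        self.class_counts = Counter(targets)
        self.value_counts = defaultdict(Counter)  # (feature index, class) -> counts
        self.values = defaultdict(set)            # feature index -> observed values
        for row, c in zip(rows, targets):
            for j, v in enumerate(row):
                self.value_counts[(j, c)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def log_score(c):
            # The class count stands in for the prior; N is common to all classes.
            total = math.log(self.class_counts[c])
            for j, v in enumerate(row):
                num = self.value_counts[(j, c)][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                total += math.log(num / den)
            return total
        return max(sorted(self.class_counts), key=log_score)


def fit_chain(rows, label_rows):
    """Train classifier i on the features plus the true labels L1..L(i-1)."""
    d = len(label_rows[0])
    chain = []
    for i in range(d):
        augmented = [tuple(x) + tuple(y[:i]) for x, y in zip(rows, label_rows)]
        chain.append(TinyNaiveBayes().fit(augmented, [y[i] for y in label_rows]))
    return chain


def predict_chain(chain, row):
    """Predict L1 first, then pass each prediction on to the next classifier."""
    predicted = []
    for classifier in chain:
        predicted.append(classifier.predict(tuple(row) + tuple(predicted)))
    return tuple(predicted)


# Toy data: one feature; label L1 follows the feature and L2 copies L1.
rows = [("a",), ("a",), ("b",), ("b",)]
label_rows = [(1, 1), (1, 1), (0, 0), (0, 0)]
chain = fit_chain(rows, label_rows)
print(predict_chain(chain, ("a",)), predict_chain(chain, ("b",)))
```

Training uses the true previous labels, while classification uses the predicted ones, so an early mistake can propagate along the chain; the order of the chain therefore matters.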

Bayesian Chain Classifiers: Bayesian chain classifiers

• Bayesian chain classifiers are a type of chain classifier under a probabilistic framework:

$$\arg\max_{C_1, \ldots, C_d} P(C_1 \mid C_2, \ldots, C_d, X) \cdots P(C_d \mid X) \tag{11}$$

• If we consider the dependency relations between the class variables, and represent these relations as a directed acyclic graph (DAG), then we can simplify the previous equation:

$$\arg\max_{C_1, \ldots, C_d} \prod_{i=1}^{d} P(C_i \mid Pa(C_i), X) \tag{12}$$

Bayesian Chain Classifiers

• Further simplification by assuming that the most probable joint combination of classes can be approximated by concatenating the individual most probable classes:

$$\begin{aligned}
&\arg\max_{C_1} P(C_1 \mid Pa(C_1), X) \\
&\arg\max_{C_2} P(C_2 \mid Pa(C_2), X) \\
&\qquad \vdots \\
&\arg\max_{C_d} P(C_d \mid Pa(C_d), X)
\end{aligned}$$

• This last approximation corresponds to a Bayesian chain classifier.

• For the base classifier we can use any of the Bayesian classifiers presented in the previous sections, for instance a NBC

Bayesian Chain Classifiers: BCC [figure]

Hierarchical Classification

• Hierarchical classification is a type of multidimensional classification in which the classes are ordered in a predefined structure, typically a tree, or in general a directed acyclic graph (DAG)

• In hierarchical classification, an example that belongs to a certain class automatically belongs to all its superclasses

• Hierarchical classification has application in several areas, such as text categorization, protein function prediction, and object recognition
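The superclass property can be illustrated with a small sketch. The hierarchy below is a made-up tree stored as a child-to-parent map (the class names and function names are hypothetical); a DAG would need a set of parents per node instead.

```python
# Hypothetical class hierarchy (a tree): child -> parent; the root has no entry.
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal", "bird": "animal"}


def superclasses(label, parent=PARENT):
    """All ancestors of a class, from its parent up to the root."""
    ancestors = []
    while label in parent:
        label = parent[label]
        ancestors.append(label)
    return ancestors


def expand_labels(label, parent=PARENT):
    """The full label set implied by assigning one class in the hierarchy."""
    return [label] + superclasses(label, parent)


print(expand_labels("dog"))
# prints ['dog', 'mammal', 'animal']
```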

Hierarchical Classification: Hierarchical Classification - example [figure]

Hierarchical Classification: Basic approaches

• The global approach builds a classifier to predict all the classes at once; this becomes too complex computationally for large hierarchies.

• The local approaches train several classifiers and combine their outputs:
    • Local Classifier per hierarchy Level, which trains one multi-class classifier for each level of the class hierarchy
    • Local binary Classifier per Node, in which a binary classifier is built for each node (class) in the hierarchy, except the root node
    • Local Classifier per Parent Node (LCPN), where a multi-class classifier is trained to predict its child nodes

• Local methods commonly use a top-down approach for classification

Hierarchical Classification: Types of Methods [figure]

Hierarchical Classification: Hierarchical Structure [figure]

Hierarchical Classification: Chained path evaluation

• Chained Path Evaluation (CPE) analyzes each possible path from the root to a leaf node in the hierarchy, taking into account the level of the predicted labels to give a score to each path and finally return the one with the best score

• Additionally, it considers the relations of each node with its ancestors in the hierarchy, based on chain classifiers

Hierarchical Classification: Training

• A local classifier is trained for each node, Ci, in the hierarchy, except the leaf nodes, to classify its child nodes

• The classifier for each node, Ci, for instance a naive Bayes classifier, is trained considering examples from all its child nodes, as well as some examples of its sibling nodes in the hierarchy

• To consider the relation with other nodes in the hierarchy, the class predicted by the parent (tree structure) or parents (DAG) is included as an additional attribute in each local classifier

Hierarchical Classification: Classification

• In the classification phase, the probabilities for each class for all local classifiers are obtained based on the input data

• The score for each path in the hierarchy is calculated by a weighted sum of the log of the probabilities of all local classifiers in the path:

$$score = \sum_{i=0}^{n} w_{C_i} \times \log\left( P(C_i \mid X_i, pa(C_i)) \right) \tag{13}$$

• The purpose of these weights is to give more importance to the upper levels of the hierarchy

• Once the scores for all the paths are obtained, the path with the highest score will be selected as the set of classes corresponding to a certain instance
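The path scoring of Equation 13 can be sketched in a few lines. The slides do not specify how the weights are computed, so the weights below are simply assumed values that are larger for the upper level; the hierarchy, the local classifier outputs and the function names are likewise hypothetical.

```python
import math


def path_score(path, probabilities, weights):
    """Weighted sum of log-probabilities of the nodes along one root-to-leaf path."""
    return sum(weights[node] * math.log(probabilities[node]) for node in path)


def best_path(paths, probabilities, weights):
    """Return the path with the highest score."""
    return max(paths, key=lambda path: path_score(path, probabilities, weights))


# Made-up local classifier outputs P(Ci | Xi, pa(Ci)) for a single instance,
# in a hierarchy with two levels below the root.
probabilities = {"A": 0.6, "B": 0.4, "A1": 0.3, "A2": 0.7, "B1": 0.9, "B2": 0.1}
# Assumed weights: the first level counts more than the second.
weights = {"A": 0.6, "B": 0.6, "A1": 0.4, "A2": 0.4, "B1": 0.4, "B2": 0.4}
paths = [("A", "A1"), ("A", "A2"), ("B", "B1"), ("B", "B2")]

selected = best_path(paths, probabilities, weights)
print(selected)
```

In this example the path (A, A2) is selected even though B1 has the highest local probability, because the path through B is penalized by the low probability of B at the upper level.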

Hierarchical Classification: CPE - Classification Example [figure]

Applications: Visual skin detection

• Skin detection is a useful pre-processing stage for many applications in computer vision, such as person detection and gesture recognition, among others

• A simple and very fast way to obtain an approximate classification of pixels in an image as skin or not-skin is based on the color attributes of each pixel

• Usually, pixels in a digital image are represented as the combination of three basic (primary) colors: Red (R), Green (G) and Blue (B), in what is known as the RGB model. There are alternative color models, such as HSV, YIQ, etc.

Applications: SNBC

• An alternative is to consider a semi-naive Bayesian classifier and select the best attributes from the different color models for skin classification by eliminating or joining attributes

• Then an initial NBC was learned based on data - examples of skin and not-skin pixels taken from several images. This initial classifier obtained a 94% accuracy when applied to other (test) images.

• The classifier was then optimized. Starting from the full NBC with 9 attributes, the method applies the variable elimination and combination stages until the simplest classifier with maximum accuracy is obtained. With this final model accuracy improved to 98%.

Applications: Optimization [figure]

Applications: Example [figure]

Applications: HIV drug selection

• The Human Immunodeficiency Virus (HIV) is the causing agent of AIDS, a condition in which progressive failure of the immune system allows opportunistic life-threatening infections to occur

• To combat HIV infection several antiretroviral (ARV) drugs belonging to different drug classes that affect specific steps in the viral replication cycle have been developed. It is important to select the best drug combination according to the virus' mutations in a patient.

Applications: Multilabel Classification

• Selecting the best group of antiretroviral drugs for a patient can be seen as an instance of a multi-label classification problem

• This particular problem can be accurately modeled with a multi-dimensional Bayesian network classifier

• This model can be learned from data and then used for selecting the set of drugs according to the features (virus mutations)

Applications: MBC for HIV drug selection [figure]

References: Book

Sucar, L.E.: Probabilistic Graphical Models. Springer (2015), Chapter 4

References: Additional Reading (1)

Bielza, C., Li, G., Larrañaga, P.: Multi-Dimensional Classification with Bayesian Networks. International Journal of Approximate Reasoning (2011)

Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artificial Intelligence in Medicine 57, 219–229 (2013)

Cheng, J., Greiner, R.: Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 101–108 (1999)

Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)

References: Additional Reading (2)

Kwoh, C.K., Gillies, D.F.: Using Hidden Nodes in Bayesian Networks. Artificial Intelligence 88, Elsevier, Essex, UK, 1–38 (1996)

Martinez, M., Sucar, L.E.: Learning an optimal naive Bayes classifier. International Conference on Pattern Recognition (ICPR), Vol. 3, 1236–1239 (2006)

Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Proceedings ECML/PKDD, 254–269 (2009)

Ramírez, M., Sucar, L.E., Morales, E.: Path Evaluation for Hierarchical Multi-label Classification. Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (FLAIRS), 502–507 (2014)

References: Additional Reading (3)

Silla-Jr., C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. and Knowledge Discovery 22(1-2), 31–72 (2011)

Sucar, L.E., Bielza, C., Morales, E., Hernandez, P., Zaragoza, J., Larrañaga, P.: Multi-label Classification with Bayesian Network-based Chain Classifiers. Pattern Recognition Letters 41, 14–22 (2014)

Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)

van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian Network Classifiers. Third European Conference on Probabilistic Graphic Models, 107–114, Prague, Czech Republic (2006)
