
Order parameters and model selection in Machine Learning: model characterization and feature selection
Romaric Gaudel
Advisor: Michèle Sebag; co-advisor: Antoine Cornuéjols
PhD defense, December 14, 2010

Supervised Machine Learning: background
- Unknown distribution $\mathbb{P}(x, y)$ on $\mathcal{X} \times \mathcal{Y}$.
- Objective: find $h^*$ minimizing the generalization error
  $\mathrm{Err}(h) = \mathbb{E}_{\mathbb{P}(x,y)}[\ell(h(x), y)]$,
  where $\ell(h(x), y)$ is the cost of the error on example $x$.
- Given: training examples $\mathcal{L} = \{(x_1, y_1), \dots, (x_n, y_n)\}$,
  with $(x_i, y_i) \sim \mathbb{P}(x, y)$, $i \in \{1, \dots, n\}$.
[Figure: decision boundary $h^*(x) = 0$ separating the regions $h^*(x) > 0$ and $h^*(x) < 0$.]

Supervised Machine Learning, error decomposition (Vapnik-Chervonenkis; Bottou & Bousquet, 08)
- Approximation error (a.k.a. bias): the learned hypothesis belongs to $\mathcal{H}$;
  $h^*_{\mathcal{H}} = \operatorname{argmin}_{h \in \mathcal{H}} \mathrm{Err}(h)$.
- Estimation error (a.k.a. variance): $\mathrm{Err}$ is estimated by the empirical error
  $\mathrm{Err}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)$;
  $h_n = \operatorname{argmin}_{h \in \mathcal{H}} \mathrm{Err}_n(h)$ (see the sketch after the outline below).
- Optimization error: the learned hypothesis is returned by an optimization algorithm $\mathcal{A}$;
  $\hat{h}_n = \mathcal{A}(\mathcal{L})$.
[Figure: chain $h^* \rightarrow h^*_{\mathcal{H}} \rightarrow h_n \rightarrow \hat{h}_n$ along the approximation, estimation, and optimization steps.]

Focus of the thesis: combinatorial optimization problems hidden in Machine Learning
- Relational representation ⟹ combinatorial optimization problem (example: the Mutagenesis database).
- Feature Selection ⟹ combinatorial optimization problem (example: microarray data).

Outline
1 Relational Kernels
2 Feature Selection
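To make the estimation step concrete, here is a minimal, self-contained Python sketch of minimizing the empirical error $\mathrm{Err}_n(h)$ over a hypothesis space; the one-dimensional data and the class of threshold classifiers $h_t(x) = \mathrm{sign}(x - t)$ are toy assumptions made for illustration, not part of the thesis.

```python
import numpy as np

# Toy illustration of the estimation step: Err_n(h) = (1/n) sum_i l(h(x_i), y_i)
# is minimized in place of the (unknown) generalization error Err(h).

rng = np.random.default_rng(0)

def zero_one_loss(score, label):
    """Cost of the error on one example: 1 if misclassified, 0 otherwise."""
    return float(np.sign(score) != label)

def empirical_error(h, examples):
    """Err_n(h): average loss of hypothesis h over the training set L."""
    return np.mean([zero_one_loss(h(x), y) for x, y in examples])

# Training set L: labels given by the (here known) target sign(x - 0.3).
xs = rng.uniform(-1, 1, size=100)
ys = np.where(xs > 0.3, 1, -1)
L = list(zip(xs, ys))

# H = threshold classifiers h_t(x) = sign(x - t); h_n minimizes Err_n over H.
thresholds = np.linspace(-1, 1, 201)
errs = [empirical_error(lambda x, t=t: x - t, L) for t in thresholds]
t_n = thresholds[int(np.argmin(errs))]
print(f"h_n threshold: {t_n:.2f}, Err_n(h_n) = {min(errs):.3f}")
```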
Relational Learning / Inductive Logic Programming: position
- Relational database: $\mathcal{X}$ = keys in the database.
- Background knowledge: $\mathcal{H}$ = set of logical formulas.
- Expressive language.
- Actual covering test: a Constraint Satisfaction Problem (CSP).

CSP consequences within Inductive Logic Programming
Consequences of the Phase Transition:
- Complexity: NP-hard in the worst case; "easy" on average, except in the Phase Transition (Cheeseman et al., 91).
- Phase Transition in Inductive Logic Programming: existence (Giordana & Saitta, 00); impact: learning fails in the Phase Transition region (Botta et al., 03).

Multiple Instance Problems: the missing link between Relational and Propositional Learning
Multiple Instance Problems (MIP) (Dietterich et al., 89):
- An example: a set of instances.
- An instance: a vector of features.
- Target concept: there exists an instance satisfying a predicate $P$:
  $pos(x) \iff \exists I \in x,\ P(I)$.
Example of an MIP: given a locked door, a positive key ring contains a key which can unlock the door.
[Figure: a positive key ring and a negative key ring facing a locked door.]

Support Vector Machine
A convex optimization problem:
  $\operatorname{argmax}_{\alpha \in \mathbb{R}^n} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
  subject to $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$, $i = 1, \dots, n$.
Kernel trick: $\langle x_i, x_j \rangle \rightarrow K(x_i, x_j)$.
[Figure: SVM margin with slack variables $\xi_i = 0$, $0 < \xi_i < 1$, $\xi_i > 1$ and the level sets $\hat{h}_n(x) = -1, 0, 1$.]
Kernel-based propositionalization (differs from the RKHS framework): given $\mathcal{L} = \{(x_1, y_1), \dots, (x_n, y_n)\}$,
  $\Phi_K : x \mapsto (K(x_1, x), \dots, K(x_n, x))$.

SVM and MIP
Averaging kernel for MIP (Gärtner et al., 02): given a kernel $k$ on instances,
  $K(x, x') = \frac{\sum_{x_i \in x} \sum_{x_j \in x'} k(x_i, x_j)}{\mathrm{norm}(x)\, \mathrm{norm}(x')}$
  (see the sketch after the Methodology slide below).
Question: the MIP target concept is existential, whereas averaging kernels capture average properties. Do averaging kernels sidestep the limitations of Relational Learning?

Methodology
Inspired by Phase Transition studies. The usual Phase Transition framework:
- Generate data according to the control parameters.
- Observe the results.
- Draw a phase diagram: results w.r.t. the order parameters.
This study: a generalized Multiple Instance Problem, and experimental results of averaging-kernel-based propositionalization.
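Here is a minimal Python sketch of the averaging kernel and of the propositionalization map $\Phi_K$ defined above. The Gaussian instance kernel $k$ and the normalization $\mathrm{norm}(x) = |x|$ (the bag size) are assumptions made for illustration, since the slides leave both unspecified.

```python
import numpy as np

# Averaging kernel (Gaertner et al., 02): average the instance kernel k over
# all pairs of instances drawn from the two bags.

def instance_kernel(a, b, gamma=1.0):
    """Assumed instance kernel k: Gaussian kernel on feature vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def averaging_kernel(x, xp):
    """K(x, x') = sum_{xi in x} sum_{xj in x'} k(xi, xj) / (norm(x) norm(x'))."""
    total = sum(instance_kernel(a, b) for a in x for b in xp)
    return total / (len(x) * len(xp))  # assumes norm(x) = bag size |x|

def propositionalize(bag, training_bags):
    """Phi_K: map a bag to the vector (K(x_1, x), ..., K(x_n, x))."""
    return np.array([averaging_kernel(xi, bag) for xi in training_bags])

# Toy data: each example is a bag of 3-dimensional instances.
rng = np.random.default_rng(0)
bags = [rng.uniform(0, 1, size=(m, 3)) for m in (4, 6, 5)]
print(propositionalize(bags[0], bags))  # one 3-dimensional feature vector
```

The resulting propositional vectors can then be fed to a standard SVM, which is what makes averaging kernels attractive despite the existential nature of the MIP target concept.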
Outline
1 Relational Kernels
  - Theoretical failure region
  - Lower bound on the generalization error
  - Empirical failure region
2 Feature Selection

Generalized Multiple Instance Problems
Generalized MIP (Weidmann et al., 03):
- An example: a set of instances.
- An instance: a vector of features.
- Target concept: a conjunction of predicates $P_1, \dots, P_m$:
  $pos(x) \iff \exists I_1, \dots, I_m \in x,\ \bigwedge_{i=1}^{m} P_i(I_i)$
  (see the sketch at the end of this part).
Example of a generalized MIP: a molecule is a set of sub-graphs; bioactivity implies the presence of several sub-graphs.
[Figure: two molecular structures sharing the sub-graphs responsible for bioactivity.]

Control Parameters

Category    Param.   Definition
Instances   |Σ|      size of the alphabet Σ, with a ∈ Σ
            d        number of numerical features; an instance is I = (a, z) with z ∈ [0, 1]^d
Examples    M+       number of instances per positive example
            M−       number of instances per negative example
            m+       number of instances in a predicate, for a positive example
            m−       number of instances in a predicate, for a negative example
Concept     P        number of predicates
            P−       number of predicates "missed" by each negative example
            ε        radius of each predicate (ε-ball)

Limitation of averaging kernels: theoretical analysis
- Failure for $\frac{m^+}{M^+} = \frac{m^-}{M^-}$: in that case
  $\mathbb{E}_{x \sim D^+}[K(x_i, x)] = \mathbb{E}_{x \sim D^-}[K(x_i, x)]$,
  so the propositionalized features no longer discriminate, in expectation, between positive and negative examples.
[Figure: scatter plot of $K(x^+, x)$ versus $K(x^-, x)$ for positive and negative examples.]
Empirical approach:
- Generate, test, and average empirical results.
- Establish a lower bound on the generalization error.

Efficiency of kernel-based …
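As announced on the Generalized MIP slide, here is a minimal Python sketch of its covering test. Modeling each predicate $P_i$ as an $\varepsilon$-ball follows the control parameters above, but the centers, the radius, and the toy bags are hypothetical, and instances may be reused across predicates in this simplified reading of the existential formula.

```python
import numpy as np

# Generalized MIP covering test: pos(x) iff instances of the bag x jointly
# satisfy the predicates P_1, ..., P_m, each modeled as an epsilon-ball.

def satisfies(instance, center, eps):
    """P_i(I): the instance falls inside the epsilon-ball around center_i."""
    return np.linalg.norm(instance - center) <= eps

def positive(bag, centers, eps):
    """pos(x): every predicate is satisfied by some instance of the bag."""
    return all(any(satisfies(i, c, eps) for i in bag) for c in centers)

rng = np.random.default_rng(1)
centers = rng.uniform(0, 1, size=(3, 2))     # m = 3 predicates over [0, 1]^2
background = rng.uniform(0, 1, size=(5, 2))  # instances unrelated to the concept

pos_bag = np.vstack([centers + 0.01, background])  # one instance in every ball
neg_bag = background                               # misses the predicates
print(positive(pos_bag, centers, eps=0.05))        # True
print(positive(neg_bag, centers, eps=0.05))        # False (almost surely)
```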