Journal of Machine Learning Research 13 (2012) 1393-1434    Submitted 5/07; Revised 6/11; Published 5/12

Feature Selection via Dependence Maximization

Le Song
[email protected]
Computational Science and Engineering
Georgia Institute of Technology
266 Ferst Drive
Atlanta, GA 30332, USA

Alex Smola
[email protected]
Yahoo! Research
4301 Great America Pky
Santa Clara, CA 95053, USA

Arthur Gretton∗
[email protected]
Gatsby Computational Neuroscience Unit
17 Queen Square
London WC1N 3AR, UK

Justin Bedo†
[email protected]
Statistical Machine Learning Program
National ICT Australia
Canberra, ACT 0200, Australia

Karsten Borgwardt
[email protected]
Machine Learning and Computational Biology Research Group
Max Planck Institutes
Spemannstr. 38
72076 Tübingen, Germany

∗ Also at Intelligent Systems Group, Max Planck Institutes, Spemannstr.

Editor: Aapo Hyvärinen

Abstract

We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.

Keywords: kernel methods, feature selection, independence measure, Hilbert-Schmidt independence criterion, Hilbert space embedding of distributions

1. Introduction

In data analysis we are typically given a set of observations X = {x1, ..., xm} ⊆ 𝒳 which can be used for a number of tasks, such as novelty detection, low-dimensional representation, or a range of
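The HSIC-based selection procedure itself is developed in the later sections of the paper. Purely as an illustrative sketch of the dependence-maximization idea summarized in the abstract, the listing below scores a candidate feature subset by the biased HSIC estimate tr(KHLH)/(m-1)^2, where K and L are kernel matrices on the features and labels and H is the centering matrix, and adds features greedily. The forward-selection strategy, the Gaussian kernel on features, the linear kernel on labels, and all function names are assumptions made for this illustration, not the algorithm defined in the paper.

    import numpy as np

    def hsic(K, L):
        # Biased HSIC estimate: tr(K H L H) / (m - 1)^2, with H the centering matrix.
        m = K.shape[0]
        H = np.eye(m) - np.ones((m, m)) / m
        return np.trace(K @ H @ L @ H) / (m - 1) ** 2

    def gaussian_kernel(X, sigma=1.0):
        # Gaussian RBF kernel matrix between the rows of X.
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def greedy_hsic_selection(X, y, num_features):
        # Forward selection (illustrative): repeatedly add the feature whose
        # inclusion maximizes HSIC between the selected subset and the labels.
        L = np.outer(y, y).astype(float)   # linear kernel on labels (assumed choice)
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(num_features):
            scores = [hsic(gaussian_kernel(X[:, selected + [j]]), L) for j in remaining]
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy usage: two informative features plus noise dimensions; labels in {-1, +1}.
    rng = np.random.default_rng(0)
    y = np.sign(rng.standard_normal(100))
    X = np.column_stack([y + 0.1 * rng.standard_normal(100),
                         -y + 0.1 * rng.standard_normal(100),
                         rng.standard_normal((100, 8))])
    print(greedy_hsic_selection(X, y, 2))  # the two label-dependent features score highest

In this toy run the two features constructed from the labels yield much larger HSIC values than the pure-noise dimensions, which is the sense in which "good features should be highly dependent on the labels."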