JournalofMachineLearningResearch11(2010)1517-1561 Submitted 8/09; Revised 3/10; Published 4/10 Hilbert Space Embeddings and Metrics on Probability Measures Bharath K. Sriperumbudur
[email protected] Department of Electrical and Computer Engineering University of California, San Diego La Jolla, CA 92093-0407, USA Arthur Gretton∗
[email protected] MPI for Biological Cybernetics Spemannstraße 38 72076, Tubingen,¨ Germany Kenji Fukumizu
[email protected] The Institute of Statistical Mathematics 10-3 Midori-cho, Tachikawa Tokyo 190-8562, Japan Bernhard Scholkopf¨
[email protected] MPI for Biological Cybernetics Spemannstraße 38 72076, Tubingen,¨ Germany Gert R. G. Lanckriet
[email protected] Department of Electrical and Computer Engineering University of California, San Diego La Jolla, CA 92093-0407, USA Editor: Ingo Steinwart Abstract A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embed- ding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance be- tween distribution embeddings: we denote this as γk, indexed by the kernel function k that defines the inner product in the RKHS. We present three theoretical properties of γk. First, we consider the question of determining the conditions on the kernel k for which γk is a metric: such k are denoted characteristic kernels. Un- like pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously pub- lished conditions may apply only in restricted circumstances (e.g., on compact domains), and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive defi- nite kernels are characteristic.