The Energy of Data

Gábor J. Székely¹ and Maria L. Rizzo²

¹National Science Foundation, Arlington, VA 22230, and Rényi Institute of Mathematics, Hungarian Academy of Sciences, Hungary; email: [email protected]
²Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403; email:
[email protected]

Copyright © YYYY by Annual Reviews. All rights reserved.

Keywords

energy distance, statistical energy, goodness-of-fit, multivariate independence, distance covariance, distance correlation, U-statistic, V-statistic, data distance, graph distance, statistical tests, clustering

Abstract

The energy of data is the value of a real function of distances between data in metric spaces. The name energy derives from Newton's gravitational potential energy, which is also a function of distances between physical objects. One of the advantages of working with energy functions (energy statistics) is that even if the observations/data are complex objects, like functions or graphs, we can use their real-valued distances for inference. Other advantages will be illustrated and discussed in this paper. Concrete examples include energy testing for normality, energy clustering, and distance correlation. Applications include genome studies, brain studies, and astrophysics. The direct connection between energy and mind/observations/data in this paper is a counterpart of the equivalence of energy and matter/mass in Einstein's E = mc².

Contents

1. INTRODUCTION
1.1. Distances of observations – Rigid motion invariance
1.2. Functions of distances – Jackknife invariance