Official Journal of the Bernoulli Society for Mathematical Statistics and Probability

Volume Twenty Number Four November 2014 ISSN: 1350-7265

CONTENTS

Papers

HEINRICH, L., LÜCK, S. and SCHMIDT, V. 1673 Asymptotic goodness-of-fit tests for the Palm mark distribution of stationary point processes with correlated marks
WANG, N.-Y. and WU, L. 1698 Convergence rate and concentration inequalities for Gibbs sampling in high dimension
FERREIRA, A. and DE HAAN, L. 1717 The generalized Pareto process; with a view towards application and simulation
NG, C.T. and JOE, H. 1738 Model comparison with composite likelihood information criteria
FISCHER, M. 1765 On the form of the large deviation rate function for the empirical measures of weakly interacting systems
KIM, A.K.H. 1802 Minimax bounds for estimation of normal mixtures
FOUCART, C. and URIBE BRAVO, G. 1819 Local extinction in continuous-state branching processes with immigration
TRASHORRAS, J. and WINTENBERGER, O. 1845 Large deviations for bootstrapped empirical measures
CRISAN, D. and MÍGUEZ, J. 1879 Particle-kernel estimation of the filter density in state-space models
JOURDAIN, B., LELIÈVRE, T. and MIASOJEDOW, B. 1930 Optimal scaling for the transient phase of Metropolis Hastings algorithms: The longtime behavior
RUKHIN, A.L. 1979 Restricted likelihood representation and decision-theoretic aspects of meta-analysis
PAPASPILIOPOULOS, O. and RUGGIERO, M. 1999 Optimal filtering and the dual process
LEDERER, J. and VAN DE GEER, S. 2020 New concentration inequalities for suprema of empirical processes

(continued)

The papers published in Bernoulli are indexed or abstracted in Current Index to Statistics, Mathematical Reviews, Statistical Theory and Method Abstracts-Zentralblatt (STMA-Z), and Zentralblatt für Mathematik (also available on the MATH via STN database and Compact MATH CD-ROM). A list of forthcoming papers can be found online at http://www.bernoulli-society.org/index.php/publications/bernoulli-journal/bernoulli-journal-papers

Papers (continued)

GASSIAT, E. and ROUSSEAU, J. 2039 About the posterior distribution in hidden Markov models with unknown number of states
BALL, F., GONZÁLEZ, M., MARTÍNEZ, R. and SLAVTCHOVA-BOJKOVA, M. 2076 Stochastic monotonicity and continuity properties of functions defined on Crump–Mode–Jagers branching processes, with application to vaccination in epidemic modelling
ZHOLUD, D. 2102 Tail approximations for the Student t-, F-, and Welch statistics for non-normal and not necessarily i.i.d. random variables
LACOUR, C. and PHAM NGOC, T.M. 2131 Goodness-of-fit test for noisy directional data
DELGADO-VENCES, F.J. and SANZ-SOLÉ, M. 2169 Approximation of a stochastic wave equation in dimension three, with application to a support theorem in Hölder norm
CASTRO, R.M. 2217 Adaptive sensing performance lower bounds for sparse signal detection and support estimation
ISPÁNY, M., KÖRMENDI, K. and PAP, G. 2247 Asymptotic behavior of CLS estimators for 2-type doubly symmetric critical Galton–Watson processes with immigration
KANAMORI, T. and FUJISAWA, H. 2278 Affine invariant divergences associated with proper composite scoring rules and their applications
DUECK, J., EDELMANN, D., GNEITING, T. and RICHARDS, D. 2305 The affinely invariant distance correlation
Author Index 2331

Bernoulli 20(4), 2014, 1673–1697 DOI: 10.3150/13-BEJ523

Asymptotic goodness-of-fit tests for the Palm mark distribution of stationary point processes with correlated marks

LOTHAR HEINRICH¹, SEBASTIAN LÜCK²,* and VOLKER SCHMIDT²,**
¹Institute of Mathematics, University of Augsburg, D-86135 Augsburg, Germany. E-mail: [email protected]
²Institute of Stochastics, Ulm University, D-89069 Ulm, Germany. E-mail: *[email protected]; **[email protected]

We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic χ²-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable β-mixing condition on the underlying stationary marked point process, which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. In order to study test performance, our test approach is applied to detect anisotropy of specific Boolean models.

Keywords: β-mixing point process; empirical Palm mark distribution; reduced factorial moment measures; smoothed covariance estimation; χ²-goodness-of-fit test

1350-7265 © 2014 ISI/BS

Bernoulli 20(4), 2014, 1698–1716 DOI: 10.3150/13-BEJ537

Convergence rate and concentration inequalities for Gibbs sampling in high dimension

NENG-YI WANG¹ and LIMING WU²
¹Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China. E-mail: [email protected]
²Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China and Laboratoire de Math. CNRS-UMR 6620, Université Blaise Pascal, 63177 Aubière, France. E-mail: [email protected]

The objective of this paper is to study Gibbs sampling, a powerful Monte Carlo method, for computing the mean of an observable in very high dimension. Under Dobrushin’s uniqueness condition, we establish explicit and sharp estimates of the exponential convergence rate and prove Gaussian concentration inequalities for the empirical mean.

Keywords: concentration inequality; coupling method; Dobrushin’s uniqueness condition; ; Markov chain Monte Carlo
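
As a rough illustration of the kind of procedure the abstract refers to, the following minimal sketch runs a systematic-scan Gibbs sampler on a toy two-dimensional target and averages an observable along the chain. The target (a standard bivariate normal with correlation `rho`), the observable f(x, y) = x, and all names and parameters are assumptions chosen for illustration; none of them come from the paper.

```python
import random


def gibbs_bivariate_normal(rho, n_sweeps, burn_in=1000, seed=0):
    """Estimate E[X] under a standard bivariate normal with correlation rho.

    Hypothetical toy example: each sweep resamples x given y and then y
    given x from the exact conditionals N(rho * other, 1 - rho**2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # conditional standard deviation
    x, y = 0.0, 0.0
    total = 0.0
    for sweep in range(burn_in + n_sweeps):
        x = rng.gauss(rho * y, cond_sd)  # draw x | y
        y = rng.gauss(rho * x, cond_sd)  # draw y | x
        if sweep >= burn_in:
            total += x  # accumulate the observable f(x, y) = x
    return total / n_sweeps  # empirical mean along the chain


if __name__ == "__main__":
    # The true mean of X is 0, so the estimate should be close to 0.
    print(gibbs_bivariate_normal(rho=0.5, n_sweeps=20000))
```

The empirical mean returned here is the quantity whose convergence and concentration the abstract discusses; the toy target is only two-dimensional, whereas the paper concerns very high dimension.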

Bernoulli 20(4), 2014, 1717–1737 DOI: 10.3150/13-BEJ538

The generalized Pareto process; with a view towards application and simulation

ANA FERREIRA¹,³ and LAURENS DE HAAN²,³
¹ISA, Univ Tecn Lisboa, Tapada da Ajuda 1349-017 Lisboa, Portugal. E-mail: [email protected]
²Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. E-mail: [email protected]
³CEAUL, FCUL, Bloco C6 – Piso 4 Campo Grande, 749-016 Lisboa, Portugal

In extreme value statistics, the peaks-over-threshold method is widely used. The method is based on the generalized Pareto distribution characterizing probabilities of exceedances over high thresholds in R^d. We present a generalization of this concept in the space of continuous functions. We call this the generalized Pareto process. Unlike earlier papers, our definition is not based on a distribution function but on functional properties, and does not need a reference to a related max-stable process. As an application, we use the theory to simulate wind fields connected to disastrous storms on the basis of observed extreme but not disastrous storms. We also establish the peaks-over-threshold approach in function space.

Keywords: domain of attraction; extreme value theory; functional regular variation; generalized Pareto process; max-stable processes; peaks-over-threshold
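
For orientation, the classical one-dimensional generalized Pareto distribution behind the peaks-over-threshold method can be written as below. The symbols γ (shape), σ (scale) and u (threshold) are notation assumed here for illustration and are not taken from the abstract.

```latex
\[
  H_{\gamma,\sigma}(x) \;=\; 1 - \Bigl(1 + \frac{\gamma x}{\sigma}\Bigr)^{-1/\gamma},
  \qquad x \ge 0,\quad 1 + \frac{\gamma x}{\sigma} > 0,\quad \sigma > 0,
\]
with the case $\gamma = 0$ read as the limit $1 - e^{-x/\sigma}$.
The peaks-over-threshold approximation for a high threshold $u$ is
\[
  P\bigl(X - u \le x \,\big|\, X > u\bigr) \;\approx\; H_{\gamma,\sigma(u)}(x).
\]
```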

Bernoulli 20(4), 2014, 1738–1764 DOI: 10.3150/13-BEJ539

Model comparison with composite likelihood information criteria

CHI TIM NG¹,² and HARRY JOE³
¹Department of Statistics, Seoul National University, Room 430, Building 25, Seoul, South Korea. E-mail: *[email protected]
²Department of Statistics, Chonnam National University, Gwangju, 500-757, South Korea
³Department of Statistics, University of British Columbia, Room ESB 3138, Earth Sciences Building, Vancouver, Canada. E-mail: [email protected]

Comparisons are made for the amount of agreement of the composite likelihood information criteria and their full likelihood counterparts when making decisions among the fits of different models, and some properties of the penalty term for composite likelihood information criteria are obtained. Asymptotic theory is given for the case when a simpler model is nested within a bigger model, and the bigger model approaches the simpler model under a sequence of local alternatives. Composite likelihood can more or less frequently choose the bigger model, depending on the direction of local alternatives; in the former case, composite likelihood has more “power” to choose the bigger model. The behaviors of the information criteria are illustrated via theory and simulation examples of the Gaussian linear mixed-effects model.

Keywords: Akaike information criterion; Bayesian information criterion; local alternatives; mixed-effects model; model comparison

Bernoulli 20(4), 2014, 1765–1801 DOI: 10.3150/13-BEJ540

On the form of the large deviation rate function for the empirical measures of weakly interacting systems

MARKUS FISCHER
Department of Mathematics, University of Padua, via Trieste 63, 35121 Padova, Italy. E-mail: fi[email protected]

A basic result of large deviations theory is Sanov’s theorem, which states that the sequence of empirical measures of independent and identically distributed samples satisfies the large deviation principle with rate function given by relative entropy with respect to the common distribution. Large deviation principles for the empirical measures are also known to hold for broad classes of weakly interacting systems. When the interaction through the empirical measure corresponds to an absolutely continuous change of measure, the rate function can be expressed as relative entropy of a distribution with respect to the law of the McKean–Vlasov limit with measure-variable frozen at that distribution. We discuss situations, beyond that of tilted distributions, in which a large deviation principle holds with rate function in relative entropy form.

Keywords: empirical measure; Laplace principle; large deviations; mean field interaction; particle system; relative entropy; Wiener measure
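
The statement of Sanov’s theorem mentioned in the abstract can be made concrete as follows. This is the standard textbook formulation; the notation (L_n for the empirical measure, μ for the common distribution, R for relative entropy) is assumed here and is not taken from the abstract.

```latex
Let $X_1, X_2, \dots$ be i.i.d.\ with common distribution $\mu$ and let
$L_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$ be the empirical measure.
The relative entropy of $\nu$ with respect to $\mu$ is
\[
  R(\nu \,\|\, \mu) \;=\;
  \begin{cases}
    \displaystyle\int \log\frac{d\nu}{d\mu}\, d\nu, & \nu \ll \mu,\\[1ex]
    +\infty, & \text{otherwise.}
  \end{cases}
\]
Sanov's theorem states that, for every measurable set $A$ of probability measures,
\[
  -\inf_{\nu \in A^{\circ}} R(\nu \,\|\, \mu)
  \;\le\; \liminf_{n\to\infty} \frac{1}{n}\log P(L_n \in A)
  \;\le\; \limsup_{n\to\infty} \frac{1}{n}\log P(L_n \in A)
  \;\le\; -\inf_{\nu \in \overline{A}} R(\nu \,\|\, \mu),
\]
where $A^{\circ}$ and $\overline{A}$ denote the interior and closure of $A$
in the topology of weak convergence.
```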

Bernoulli 20(4), 2014, 1802–1818 DOI: 10.3150/13-BEJ542

Minimax bounds for estimation of normal mixtures

ARLENE K.H. KIM
Statistical Laboratory, Center for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK. E-mail: [email protected]

This paper deals with minimax rates of convergence for estimation of density functions on the real line. The densities are assumed to be location mixtures of normals, a global regularity requirement that creates subtle difficulties for the application of standard minimax lower bound methods. Using novel Fourier and Hermite polynomial techniques, we determine the minimax optimal rate – slightly larger than the parametric rate – under squared error loss. For Hellinger loss, we provide a minimax lower bound using ideas modified from the squared error loss case.

Keywords: Assouad’s lemma; Hermite polynomials; minimax lower bound; normal location mixture

Bernoulli 20(4), 2014, 1819–1844 DOI: 10.3150/13-BEJ543

Local extinction in continuous-state branching processes with immigration

CLÉMENT FOUCART¹ and GERÓNIMO URIBE BRAVO²
¹Institut für Mathematik, Technische Universität Berlin, RTG 1845, D-10623, Berlin, Germany. E-mail: [email protected]
²Instituto de Matemáticas, Universidad Nacional Autónoma de México, Área de la Investigación Científica, Circuito Exterior, Ciudad Universitaria, Coyoacán, 04510, México, D.F. E-mail: [email protected]

The purpose of this article is to observe that the zero sets of continuous-state branching processes with immigration (CBI) are infinitely divisible regenerative sets. Indeed, they can be constructed by the procedure of random cutouts introduced by Mandelbrot in 1972. We then show how very precise information about the zero sets of CBI can be obtained in terms of the branching and immigrating mechanism.

Keywords: continuous-state branching process; polarity; random cutout; zero set

References

[1] Abraham, R. and Delmas, J.F. (2009). Changing the branching mechanism of a continuous state branching process using immigration. Ann. Inst. Henri Poincaré Probab. Stat. 45 226–238. MR2500236
[2] Bertoin, J. (1996). Lévy Processes. Cambridge Tracts in Mathematics 121. Cambridge: Cambridge Univ. Press. MR1406564
[3] Bertoin, J. (1999). Subordinators: Examples and applications. In Lectures on Probability Theory and Statistics (Saint-Flour, 1997). Lecture Notes in Math. 1717 1–91. Berlin: Springer. MR1746300
[4] Bi, H. (2013). Time to MRCA for stationary CBI-processes. Available at arXiv:1304.2001.
[5] Bingham, N.H., Goldie, C.M. and Teugels, J.L. (1989). Regular Variation. Encyclopedia of Mathematics and Its Applications 27. Cambridge: Cambridge Univ. Press. MR1015093
[6] Blumenthal, R.M. and Getoor, R.K. (1961). Sample functions of stochastic processes with stationary independent increments. J. Math. Mech. 10 493–516. MR0123362
[7] Blumenthal, R.M. and Getoor, R.K. (1962). The dimension of the set of zeros and the graph of a symmetric stable process. Illinois J. Math. 6 308–316. MR0138134
[8] Caballero, M.E., Pardo, J.C. and Pérez, J.L. (2010). On Lamperti stable processes. Probab. Math. Statist. 30 1–28. MR2792485
[9] Chu, W. and Ren, Y.X. (2011). N-measure for continuous state branching processes and its application. Front. Math. China 6 1045–1058. MR2862645
[10] Donati-Martin, C. and Yor, M. (2007). Further examples of explicit Krein representations of certain subordinators. Publ. Res. Inst. Math. Sci. 43 315–328. MR2341013
[11] Duquesne, T. and Labbé, C. (2013). On the Eve property for CSBP. Available at arXiv:1305.6502.
[12] Duquesne, T. and Le Gall, J.F. (2005). Probabilistic and fractal aspects of Lévy trees. Probab. Theory Related Fields 131 553–603. MR2147221
[13] Dynkin, E.B. and Kuznetsov, S.E. (2004). N-measures for branching exit Markov systems and their applications to differential equations. Probab. Theory Related Fields 130 135–150. MR2092876
[14] Etheridge, A.M. and Williams, D.R.E. (2003). A decomposition of the (1 + β)- conditioned on survival. Proc. Roy. Soc. Edinburgh Sect. A 133 829–847. MR2006204
[15] Evans, S.N. (1993). Two representations of a conditioned superprocess. Proc. Roy. Soc. Edinburgh Sect. A 123 959–971. MR1249698
[16] Evans, S.N. and Ralph, P.L. (2010). Dynamics of the time to the most recent common ancestor in a large branching population. Ann. Appl. Probab. 20 1–25. MR2582640
[17] Fitzsimmons, P.J., Fristedt, B. and Shepp, L.A. (1985). The set of real numbers left uncovered by random covering intervals. Z. Wahrsch. Verw. Gebiete 70 175–189. MR0799145
[18] Fu, Z. and Li, Z. (2004). Measure-valued diffusions and stochastic equations with Poisson process. Osaka J. Math. 41 727–744. MR2108152
[19] Getoor, R.K. (1963). The asymptotic distribution of the number of zero-free intervals of a stable process. Trans. Amer. Math. Soc. 106 127–138. MR0145596
[20] Grey, D.R. (1974). Asymptotic behaviour of continuous time, continuous state-space branching processes. J. Appl. Probab. 11 669–677. MR0408016
[21] Hawkes, J. and Truman, A. (1991). Statistics of and excursions for the Ornstein–Uhlenbeck process. In Stochastic Analysis (Durham, 1990). London Mathematical Society Lecture Note Series 167 91–101. Cambridge: Cambridge Univ. Press. MR1166408
[22] Kallenberg, O. (1992). Some time change representations of stable integrals, via predictable transformations of local martingales. Stochastic Process. Appl. 40 199–223. MR1158024
[23] Kawazu, K. and Watanabe, S. (1971). Branching processes with immigration and related limit theorems. Teor. Verojatnost. i Primenen. 16 34–51. MR0290475
[24] Keller-Ressel, M. and Mijatović, A. (2012). On the limit distributions of continuous-state branching processes with immigration. Stochastic Process. Appl. 122 2329–2345. MR2922631
[25] Kyprianou, A.E. (2006). Introductory Lectures on Fluctuations of Lévy Processes with Applications. Universitext. Berlin: Springer. MR2250061
[26] Li, Z. (2012). Continuous-state branching processes. Available at arXiv:1202.3223.
[27] Li, Z. (2011). Measure-valued Branching Markov Processes. Probability and Its Applications (New York). Heidelberg: Springer. MR2760602
[28] Mandelbrot, B.B. (1972). Renewal sets and random cutouts. Z. Wahrsch. Verw. Gebiete 22 145–157. MR0309162
[29] Molchanov, I. (2005). Theory of Random Sets. Probability and Its Applications (New York). London: Springer. MR2132405
[30] Patie, P. (2009). Exponential functional of a new family of Lévy processes and self-similar continuous state branching processes with immigration. Bull. Sci. Math. 133 355–382. MR2532690
[31] Pinsky, M.A. (1972). Limit theorems for continuous state branching processes with immigration. Bull. Amer. Math. Soc. (N.S.) 78 242–244. MR0295450
[32] Pitman, J. and Yor, M. (1982). A decomposition of Bessel bridges. Z. Wahrsch. Verw. Gebiete 59 425–457. MR0656509
[33] Pitman, J. and Yor, M. (1997). On the lengths of excursions of some Markov processes. In Séminaire de Probabilités, XXXI. Lecture Notes in Math. 1655 272–286. Berlin: Springer. MR1478737
[34] Xiao, Y. (2004). Random fractals and Markov processes. In Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, Part 2. Proc. Sympos. Pure Math. 72 261–338. Providence, RI: Amer. Math. Soc. MR2112126

1350-7265 © 2014 ISI/BS

Bernoulli 20(4), 2014, 1845–1878 DOI: 10.3150/13-BEJ544

Large deviations for bootstrapped empirical measures

JOSÉ TRASHORRAS* and OLIVIER WINTENBERGER** Université Paris–Dauphine, Ceremade, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France. E-mail: *[email protected]; **[email protected]

We investigate the Large Deviations (LD) properties of bootstrapped empirical measures with exchangeable weights. Our main results show in great generality how the resulting rate functions combine the LD properties of both the sample weights and the observations. As an application, we obtain new LD results and discuss both conditional and unconditional LD-efficiency for many classical choices of entries such as Efron’s, leave-p-out, i.i.d. weighted, k-blocks bootstraps, etc.
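
As a rough illustration of the objects named in this abstract, the sketch below builds one bootstrapped empirical measure with Efron's weights, that is, multinomial counts obtained by drawing n indices uniformly with replacement. The function names, the Gaussian sample and the seed are assumptions made for this example only; none of the paper's large deviation results are reproduced.

```python
import random
from collections import Counter


def efron_weights(n, rng):
    """Multinomial(n; 1/n, ..., 1/n) counts: how often each index is resampled."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts.get(i, 0) for i in range(n)]


def bootstrapped_empirical_measure(xs, weights):
    """Weighted empirical measure as {atom: mass}, with mass w_i / n on x_i."""
    n = len(xs)
    measure = {}
    for x, w in zip(xs, weights):
        if w:
            measure[x] = measure.get(x, 0.0) + w / n
    return measure


rng = random.Random(0)                       # fixed seed, illustrative only
xs = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical observations
w = efron_weights(len(xs), rng)              # exchangeable weights summing to n
mu = bootstrapped_empirical_measure(xs, w)   # one draw of the bootstrapped measure
```

Other exchangeable schemes mentioned in the abstract would only change how `w` is generated; the weighted measure is formed the same way.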

Keywords: exchangeable bootstrap; large deviations

Bernoulli 20(4), 2014, 1879–1929 DOI: 10.3150/13-BEJ545

Particle-kernel estimation of the filter density in state-space models

DAN CRISAN1 and JOAQUÍN MÍGUEZ2 1Department of Mathematics, Imperial College London, Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK. E-mail: [email protected] 2Department of Signal Theory & Communications, Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés (Madrid), Spain. E-mail: [email protected]

Sequential Monte Carlo (SMC) methods, also known as particle filters, are simulation-based recursive algorithms for the approximation of the a posteriori probability measures generated by state-space dynamical models. At any given time t, an SMC method produces a set of samples over the state space of the system of interest (often termed “particles”) that is used to build a discrete and random approximation of the posterior probability distribution of the state variables, conditional on a sequence of available observations. One potential application of the methodology is the estimation of the densities associated with the sequence of a posteriori distributions. While practitioners have rather freely applied such density approximations in the past, the issue has received less attention from a theoretical perspective. In this paper, we address the problem of constructing kernel-based estimates of the posterior probability density function and its derivatives, and obtain asymptotic convergence results for the estimation errors. In particular, we find convergence rates for the approximation errors that hold uniformly on the state space and guarantee that the error vanishes almost surely as the number of particles in the filter grows. Based on this uniform convergence result, we first show how to build continuous measures that converge almost surely (with known rate) toward the posterior measure and then address a few applications. The latter include maximum a posteriori estimation of the system state using the approximate derivatives of the posterior density and the approximation of functionals of it, for example, Shannon’s entropy.
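
To make the idea of a particle-kernel density estimate concrete, here is a minimal sketch. It runs a bootstrap particle filter on a scalar linear Gaussian model and then smooths the resulting particles with a Gaussian kernel. The model, its parameters (`a`, `sig_x`, `sig_y`), the multinomial resampling step and the bandwidth N^(-1/5) are all assumptions chosen for illustration; they are not taken from the paper, whose kernels, bandwidths and rates may differ.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bootstrap_filter(ys, n_particles, rng, a=0.9, sig_x=1.0, sig_y=0.5):
    """Bootstrap particle filter for x_t = a x_{t-1} + noise, y_t = x_t + noise."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for y in ys:
        # Propagate each particle through the state equation.
        particles = [a * p + rng.gauss(0.0, sig_x) for p in particles]
        # Weight by the likelihood of the new observation.
        weights = [gaussian_pdf(y, p, sig_y) for p in particles]
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


def kernel_density(particles, bandwidth):
    """Gaussian kernel estimate of the filter density built from the particles."""
    n = len(particles)

    def density(x):
        return sum(gaussian_pdf(x, p, bandwidth) for p in particles) / n

    return density


rng = random.Random(1)                      # fixed seed, illustrative only
ys = [0.3, -0.1, 0.4]                       # hypothetical observations
particles = bootstrap_filter(ys, 200, rng)
density = kernel_density(particles, bandwidth=len(particles) ** (-0.2))
```

Because `density` is an ordinary smooth function, it can be evaluated, differentiated or maximized numerically, which is the kind of use (for example, maximum a posteriori estimation) the abstract refers to.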

Keywords: density estimation; Markov systems; particle filtering; sequential Monte Carlo; state-space models; stochastic filtering

Bernoulli 20(4), 2014, 1930–1978 DOI: 10.3150/13-BEJ546

Optimal scaling for the transient phase of Metropolis Hastings algorithms: The longtime behavior

BENJAMIN JOURDAIN*, TONY LELIÈVRE** and BŁAŻEJ MIASOJEDOW† Université Paris-Est, CERMICS, 6 & 8, avenue Blaise Pascal, 77455 Marne-La-Vallée, France. E-mail: *[email protected]; **[email protected]; †[email protected]

We consider the Metropolis algorithm on R^n with Gaussian proposals, when the target probability measure is the n-fold product of a one-dimensional law. It is well known (see Roberts et al. (Ann. Appl. Probab. 7 (1997) 110–120)) that, in the limit n → ∞, starting at equilibrium and for an appropriate scaling of the variance and of the timescale as a function of the dimension n, a diffusive limit is obtained for each component of the Markov chain. In Jourdain et al. (Optimal scaling for the transient phase of the random walk Metropolis algorithm: The mean-field limit (2012) Preprint), we generalized this result to the case where the initial distribution is not the target probability measure. The obtained diffusive limit is the solution to a stochastic differential equation nonlinear in the sense of McKean. In the present paper, we prove convergence to equilibrium for this equation. We discuss practical counterparts in order to optimize the variance of the proposal distribution to accelerate convergence to equilibrium. Our analysis confirms the value of the constant acceptance rate strategy (with acceptance rate between 1/4 and 1/3) first suggested in Roberts et al. (Ann. Appl. Probab. 7 (1997) 110–120). We also address scaling of the Metropolis-Adjusted Langevin Algorithm (MALA). When starting at equilibrium, a diffusive limit for an optimal scaling of the variance is obtained in Roberts and Rosenthal (J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 (1998) 255–268). In the transient case, we obtain formally that the optimal variance scales very differently in n depending on the sign of a moment of the distribution, which vanishes at equilibrium. This suggests that it is difficult to derive practical recommendations for MALA from such asymptotic results.
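
The setting of this abstract can be sketched in a few lines: a random walk Metropolis chain on R^n with Gaussian proposals of variance ell^2 / n, a product target, and an empirical acceptance rate that one may monitor while tuning ell. The standard normal target, the value of `ell`, the start at the origin (a start away from equilibrium) and all names below are assumptions for illustration; the code does not implement the paper's mean-field analysis.

```python
import math
import random


def log_target(x):
    """Log-density, up to a constant, of an n-fold product of standard normals."""
    return -0.5 * sum(v * v for v in x)


def rwm(n, ell, n_steps, rng, x0=None):
    """Random walk Metropolis with proposal variance ell^2 / n per coordinate."""
    sigma = ell / math.sqrt(n)
    x = list(x0) if x0 is not None else [rng.gauss(0.0, 1.0) for _ in range(n)]
    lp = log_target(x)
    accepted = 0
    for _ in range(n_steps):
        prop = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
            accepted += 1
    return x, accepted / n_steps


# Start at the origin rather than from the target, i.e. out of equilibrium.
state, rate = rwm(n=20, ell=2.4, n_steps=2000, rng=random.Random(2), x0=[0.0] * 20)
```

A constant acceptance rate strategy, as discussed in the abstract, would adjust `ell` until `rate` falls in the desired range; the adjustment rule itself is left out of this sketch.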

Keywords: diffusion limits; MALA; optimal scaling; propagation of chaos; random walk Metropolis

References

[1] Andrieu, C. and Robert, C. (2001). Controlled MCMC for optimal sampling. Working Papers 2001-33, Centre de Recherche en Economie et Statistique. Available at http://ideas.repec.org/p/crs/wpaper/2001-33.html.
[2] Ané, C., Blachère, S., Chafaï, D., Fougères, P., Gentil, I., Malrieu, F., Roberto, C. and Scheffer, G. (2000). Sur les Inégalités de Sobolev Logarithmiques. Panoramas et Synthèses [Panoramas and Syntheses] 10. Paris: Société Mathématique de France. With a preface by Dominique Bakry and Michel Ledoux. MR1845806
[3] Atchadé, Y.F. and Rosenthal, J.S. (2005). On adaptive Markov chain Monte Carlo algorithms. Bernoulli 11 815–828. MR2172842
[4] Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab. 17 1222–1244. MR2344305
[5] Bédard, M. (2008). Optimal acceptance rates for Metropolis algorithms: Moving beyond 0.234. Stochastic Process. Appl. 118 2198–2222. MR2474348
[6] Bédard, M., Douc, R. and Moulines, E. (2014). Scaling analysis of delayed rejection MCMC methods. Methodol. Comput. Appl. Probab. To appear. Published online: 6 March 2013.
[7] Bédard, M., Douc, R. and Moulines, E. (2012). Scaling analysis of multiple-try MCMC methods. Stochastic Process. Appl. 122 758–786. MR2891436
[8] Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.M. and Stuart, A. (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19 1501–1534. MR3129023
[9] Beskos, A., Roberts, G. and Stuart, A. (2009). Optimal scalings for local Metropolis–Hastings chains on nonproduct targets in high dimensions. Ann. Appl. Probab. 19 863–898. MR2537193
[10] Breyer, L.A., Piccioni, M. and Scarlatti, S. (2004). Optimal scaling of MaLa for nonlinear regression. Ann. Appl. Probab. 14 1479–1505. MR2071431
[11] Breyer, L.A. and Roberts, G.O. (2000). From Metropolis to diffusions: Gibbs states and optimal scaling. Stochastic Process. Appl. 90 181–206. MR1794535
[12] Christensen, O.F., Roberts, G.O. and Rosenthal, J.S. (2005). Scaling limits for the transient phase of local Metropolis–Hastings algorithms. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 253–268. MR2137324
[13] Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
[14] Jourdain, B., Lelièvre, T. and Miasojedow, B. (2012). Optimal scaling for the transient phase of the random walk Metropolis algorithm: The mean-field limit. Preprint. Available at http://fr.arxiv.org/abs/1210.7639.
[15] Mattingly, J.C., Pillai, N.S. and Stuart, A.M. (2012). Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 22 881–930. MR2977981
[16] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1092.
[17] Neal, P. and Roberts, G. (2011). Optimal scaling of random walk Metropolis algorithms with non-Gaussian proposals. Methodol. Comput. Appl. Probab. 13 583–601. MR2822397
[18] Neal, P., Roberts, G. and Yuen, W.K. (2012). Optimal scaling of random walk Metropolis algorithms with discontinuous target densities. Ann. Appl. Probab. 22 1880–1927. MR3025684
[19] Otto, F. and Villani, C. (2000). Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173 361–400. MR1760620
[20] Pillai, N., Stuart, A. and Thiéry, A. (2011). Optimal proposal design for random walk type Metropolis algorithms with Gaussian random field priors. Preprint. Available at http://arxiv.org/abs/1108.1494.
[21] Pillai, N.S., Stuart, A.M. and Thiéry, A.H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 2320–2356. MR3024970
[22] Roberts, G.O., Gelman, A. and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120. MR1428751
[23] Roberts, G.O. and Rosenthal, J.S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268. MR1625691
[24] Roberts, G.O. and Rosenthal, J.S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367. MR1888450
[25] Sznitman, A.S. (1991). Topics in propagation of chaos. In École D’Été de Probabilités de Saint-Flour XIX—1989. Lecture Notes in Math. 1464 165–251. Berlin: Springer. MR1108185

Bernoulli 20(4), 2014, 1979–1998 DOI: 10.3150/13-BEJ547

Restricted likelihood representation and decision-theoretic aspects of meta-analysis

ANDREW L. RUKHIN National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD 20899, USA. E-mail: [email protected]

In the random-effects model of meta-analysis, a canonical representation of the restricted likelihood function is obtained. This representation relates the mean effect and the heterogeneity variance estimation problems. An explicit form of the variance of weighted means statistics determined by means of a quadratic form is found. The behavior of the mean squared error for large heterogeneity variance is elucidated. It is noted that the sample mean is neither admissible nor minimax under a natural risk function when the number of studies exceeds three.
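
For readers unfamiliar with the estimators listed in the keywords, the following sketch computes the classical DerSimonian–Laird moment estimate of the heterogeneity variance and the corresponding weighted mean. It uses the textbook formulas rather than the restricted likelihood representation of the paper; the function name and the toy inputs are assumptions for this example, and at least two studies are required.

```python
def dersimonian_laird(effects, variances):
    """Return (weighted mean, heterogeneity variance) for k >= 2 studies."""
    k = len(effects)
    w = [1.0 / v for v in variances]             # inverse within-study variances
    sw = sum(w)
    fixed_mean = sum(wi * x for wi, x in zip(w, effects)) / sw
    # Cochran's Q: a quadratic form in the study effects.
    q = sum(wi * (x - fixed_mean) ** 2 for wi, x in zip(w, effects))
    denom = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / denom)       # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * x for wi, x in zip(w_star, effects)) / sum(w_star)
    return mu, tau2


# Hypothetical study effects with equal within-study variances.
mu_hat, tau2_hat = dersimonian_laird([0.0, 10.0], [1.0, 1.0])
```

When `tau2` is large relative to the within-study variances, the weights `w_star` become nearly equal and the weighted mean approaches the plain sample mean, which is the estimator whose admissibility the abstract discusses.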

Keywords: DerSimonian–Laird estimator; Hedges estimator; Mandel–Paule procedure; minimaxity; quadratic forms; random-effects model; Stein phenomenon

1350-7265 In the Public Domain

Bernoulli 20(4), 2014, 1999–2019 DOI: 10.3150/13-BEJ548

Optimal filtering and the dual process

OMIROS PAPASPILIOPOULOS1 and MATTEO RUGGIERO2 1ICREA & Department of Economics and Business, Universitat Pompeu Fabra, Ramón Trias Fargas 25-27, 08005, Barcelona, Spain. E-mail: [email protected] 2Collegio Carlo Alberto & Department of Economics and Statistics, University of Torino, C.so Unione Sovietica 218/bis, 10134, Torino, Italy. E-mail: [email protected]

We link optimal filtering for hidden Markov models to the notion of duality for Markov processes. We show that when the signal is dual to a process that has two components, one deterministic and one a pure death process, and with respect to functions that define changes of measure conjugate to the emission density, the filtering distributions evolve in the family of finite mixtures of such measures and the filter can be computed at a cost that is polynomial in the number of observations. Special cases of our framework include the Kalman filter, and computable filters for the Cox–Ingersoll–Ross process and the one-dimensional Wright–Fisher process, which have been investigated before. The dual we obtain for the Cox–Ingersoll–Ross process appears to be new in the literature.
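
The Kalman filter is named above as a special case. As a point of reference, a scalar version can be sketched as follows: the filtering distribution stays Gaussian, so each step only updates a mean and a variance. The model x_t = a x_{t-1} + noise (variance `q`), y_t = x_t + noise (variance `r`), the parameter names and the toy observations are assumptions for illustration; the duality construction of the paper is not shown.

```python
def kalman_step(mean, var, y, a=1.0, q=0.0, r=1.0):
    """One predict/update step of a scalar Kalman filter."""
    # Predict: push the Gaussian through the linear state equation.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: conjugate Gaussian update with the new observation y.
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


mean, var = 0.0, 1.0                 # hypothetical Gaussian prior
for y in [2.0, 1.5, 1.8]:            # hypothetical observations
    mean, var = kalman_step(mean, var, y, a=0.9, q=0.1, r=0.5)
```

The cost per observation is constant here because a single Gaussian is propagated; in the mixture filters described in the abstract, the number of mixture components is what drives the polynomial cost.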

Keywords: Bayesian conjugacy; Cox–Ingersoll–Ross process; finite mixture models; hidden Markov model; Kalman filter

Bernoulli 20(4), 2014, 2020–2038 DOI: 10.3150/13-BEJ549

New concentration inequalities for suprema of empirical processes

JOHANNES LEDERER* and SARA VAN DE GEER** Seminar für Statistik, ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland. E-mail: *[email protected]; **[email protected]

While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.

Keywords: chaining; concentration inequalities; deviation inequalities; empirical processes; rate of convergence

Bernoulli 20(4), 2014, 2039–2075 DOI: 10.3150/13-BEJ550

About the posterior distribution in hidden Markov models with unknown number of states

ELISABETH GASSIAT1 and JUDITH ROUSSEAU2 1Laboratoire de Mathématiques d’Orsay UMR 8628, Université Paris-Sud, Bâtiment 425, 91405 Orsay-Cédex, France. E-mail: [email protected] 2CREST-ENSAE, 3 avenue Pierre Larousse, 92245 Malakoff Cedex, France. E-mail: [email protected]

We consider finite state space stationary hidden Markov models (HMMs) in the situation where the number of hidden states is unknown. We provide a frequentist asymptotic evaluation of Bayesian analysis methods. Our main result gives posterior concentration rates for the marginal densities, that is, for the density of a fixed number of consecutive observations. Using conditions on the prior, we are then able to define a consistent Bayesian estimator of the number of hidden states. It is known that the likelihood ratio test statistic for overfitted HMMs has a nonstandard behaviour and is unbounded. Our conditions on the prior may be seen as a way to penalize parameters to avoid this phenomenon. Inference of parameters is a much more difficult task than inference of marginal densities; nevertheless, we provide a precise description of the situation when the observations are i.i.d. and we allow for two possible hidden states.

Keywords: Bayesian statistics; hidden Markov models; number of components; order selection; posterior distribution

Bernoulli 20(4), 2014, 2076–2101 DOI: 10.3150/13-BEJ551

Stochastic monotonicity and continuity properties of functions defined on Crump–Mode–Jagers branching processes, with application to vaccination in epidemic modelling

FRANK BALL1, MIGUEL GONZÁLEZ2,*, RODRIGO MARTÍNEZ2,** and MAROUSSIA SLAVTCHOVA-BOJKOVA3 1School of Mathematical Sciences, The University of Nottingham, Nottingham NG7 2RD, United Kingdom. E-mail: [email protected] 2Department of Mathematics, University of Extremadura, Avda, Elvas s/n, 06071-Badajoz, Spain. E-mail: *[email protected]; **[email protected] 3Faculty of Mathematics and Informatics, Sofia University and Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Bulgaria. E-mail: [email protected]

This paper is concerned with Crump–Mode–Jagers branching processes, describing the spread of an epidemic depending on the proportion of the population that is vaccinated. Births in the branching process are aborted independently with a time-dependent probability given by the fraction of the population vaccinated. Stochastic monotonicity and continuity results for a wide class of functions (e.g., extinction time and total number of births over all time) defined on such a branching process are proved using coupling arguments, leading to optimal vaccination schemes to control corresponding functions (e.g., duration and final size) of epidemic outbreaks. The theory is illustrated by applications to the control of the duration of mumps outbreaks in Bulgaria.

Keywords: coupling; general branching process; Monte-Carlo method; mumps in Bulgaria; SIR epidemic model; time to extinction; vaccination policies

Bernoulli 20(4), 2014, 2102–2130 DOI: 10.3150/13-BEJ552

Tail approximations for the Student t-, F-, and Welch statistics for non-normal and not necessarily i.i.d. random variables

DMITRII ZHOLUD Department of Mathematical Statistics, Chalmers University of Technology and University of Göteborg, SE-412 96 Gothenburg, Sweden. E-mail: [email protected]

Let T be the Student one- or two-sample t-, F-, or Welch statistic. Now relax the underlying assumptions of normality, independence and identical distribution and consider a more general case where one only assumes that the vector of data has a continuous joint density. We determine asymptotic expressions for P(T > u) as u → ∞ for this case. The approximations are particularly accurate for small sample sizes and may be used, for example, in the analysis of High-Throughput Screening experiments, where the number of replicates can be as low as two to five and often extreme significance levels are used. We give numerous examples and complement our results by an investigation of the convergence speed – both theoretically, by deriving exact bounds for absolute and relative errors, and by means of a simulation study.

Keywords: dependent random variables; F-test; high-throughput screening; non-homogeneous data; non-normal population distribution; outliers; small sample size; Student’s one- and two-sample t-statistics; systematic effects; test power; Welch statistic

Bernoulli 20(4), 2014, 2131–2168 DOI: 10.3150/13-BEJ553

Goodness-of-fit test for noisy directional data

CLAIRE LACOUR* and THANH MAI PHAM NGOC** Laboratoire de Mathématique, UMR 8628, Université Paris Sud, 91405 Orsay Cedex, France. E-mail: *[email protected]; **[email protected]

We consider spherical data $X_i$ noised by a random rotation $\varepsilon_i \in SO(3)$ so that only the sample $Z_i = \varepsilon_i X_i$, $i = 1, \ldots, N$ is observed. We define a nonparametric test procedure to distinguish $H_0$: “the density $f$ of $X_i$ is the uniform density $f_0$ on the sphere” and $H_1$: “$\|f - f_0\|_2^2 \ge C\psi_N$ and $f$ is in a Sobolev space with smoothness $s$”. For a noise density $f_\varepsilon$ with smoothness index $\nu$, we show that an adaptive procedure (i.e., $s$ is not assumed to be known) cannot have a faster rate of separation than $\psi_N^{ad}(s) = (N/\log\log(N))^{-2s/(2s+2\nu+1)}$ and we provide a procedure which reaches this rate. We also deal with the case of super smooth noise. We illustrate the theory by implementing our test procedure for various kinds of noise on SO(3) and by comparing it to other procedures. Applications to real data in astrophysics and paleomagnetism are provided.
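
As a rough numerical illustration of the adaptive separation rate quoted in this abstract, the following sketch evaluates $(N/\log\log N)^{-2s/(2s+2\nu+1)}$ for given sample size, smoothness $s$ and noise smoothness index $\nu$. The function name and the sample values are hypothetical, the formula is taken as it reads in the abstract, and all constants are ignored; this is not the authors' test procedure.

```python
import math


def adaptive_separation_rate(n_samples: int, s: float, nu: float) -> float:
    """Evaluate (N / log log N) ** (-2s / (2s + 2*nu + 1)).

    Hypothetical helper: it only evaluates the rate expression read from
    the abstract and ignores every constant factor.
    """
    if n_samples < 3:
        # log(log(N)) must be strictly positive, which requires N > e.
        raise ValueError("n_samples must be at least 3")
    if s <= 0 or nu < 0:
        raise ValueError("s must be positive and nu non-negative")
    loglog = math.log(math.log(n_samples))
    exponent = -2.0 * s / (2.0 * s + 2.0 * nu + 1.0)
    return (n_samples / loglog) ** exponent


if __name__ == "__main__":
    # Illustrative values only: smoothness s = 1, noise index nu = 1.
    for n in (10**3, 10**4, 10**6):
        print(n, adaptive_separation_rate(n, s=1.0, nu=1.0))
```

Under this expression, a larger noise index $\nu$ pushes the exponent toward zero, so the rate of separation degrades as the noise density becomes smoother.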

Keywords: adaptive testing; minimax hypothesis testing; nonparametric alternatives; spherical deconvolution; spherical harmonics

Bernoulli 20(4), 2014, 2169–2216 DOI: 10.3150/13-BEJ554

Approximation of a stochastic wave equation in dimension three, with application to a support theorem in Hölder norm

FRANCISCO J. DELGADO-VENCES* and MARTA SANZ-SOLÉ** Facultat de Matemàtiques, Universitat de Barcelona, Gran Via, 585 E-08007 Barcelona, Spain. E-mail: *[email protected]; **[email protected]

A characterization of the support in Hölder norm of the law of the solution to a stochastic wave equation with three-dimensional space variable is proved. The result is a consequence of an approximation theorem, in the sense of convergence in probability, for a sequence of evolution equations driven by a family of regularizations of the driving noise.

Keywords: approximating schemes; stochastic wave equation; support theorem

References

[1] Aida, S., Kusuoka, S. and Stroock, D. (1993). On the support of Wiener functionals. In Asymptotic Problems in Probability Theory: Wiener Functionals and Asymptotics (Sanda/Kyoto, 1990). Pitman Res. Notes Math. Ser. 284 3–34. Harlow: Longman Sci. Tech. MR1354161
[2] Baĭnov, D. and Simeonov, P. (1992). Integral Inequalities and Applications. Mathematics and Its Applications (East European Series) 57. Dordrecht: Kluwer Academic. MR1171448
[3] Bally, V., Millet, A. and Sanz-Solé, M. (1995). Approximation and support theorem in Hölder norm for parabolic stochastic partial differential equations. Ann. Probab. 23 178–222. MR1330767
[4] Ben Arous, G., Grădinaru, M. and Ledoux, M. (1994). Hölder norms and the support theorem for diffusions. Ann. Inst. Henri Poincaré Probab. Stat. 30 415–436. MR1288358
[5] Cont, R. and Fournié, D.-A. (2013). Functional Itô calculus and stochastic integral representation of martingales. Ann. Probab. 41 109–133. MR3059194
[6] Dalang, R.C. (1999). Extending the martingale measure stochastic integral with applications to spatially homogeneous s.p.d.e.’s. Electron. J. Probab. 4 no. 6, 29 pp. (electronic). MR1684157
[7] Dalang, R.C. and Frangos, N.E. (1998). The stochastic wave equation in two spatial dimensions. Ann. Probab. 26 187–212. MR1617046
[8] Dalang, R.C. and Mueller, C. (2003). Some non-linear S.P.D.E.’s that are second order in time. Electron. J. Probab. 8 no. 1, 21 pp. (electronic). MR1961163
[9] Dalang, R.C. and Quer-Sardanyons, L. (2011). Stochastic integrals for spde’s: A comparison. Expo. Math. 29 67–109. MR2785545
[10] Dalang, R.C. and Sanz-Solé, M. (2009). Hölder–Sobolev regularity of the solution to the stochastic wave equation in dimension three. Mem. Amer. Math. Soc. 199 vi+70. MR2512755
[11] Folland, G.B. (1976). Introduction to Partial Differential Equations. Princeton, NJ: Princeton Univ. Press. MR0599578
[12] Gyöngy, I., Nualart, D. and Sanz-Solé, M. (1995). Approximation and support theorems in modulus spaces. Probab. Theory Related Fields 101 495–509. MR1327223

1350-7265 © 2014 ISI/BS
[13] Mackevicius, V. (1986). The support of the solution of a stochastic differential equation. Litovsk. Mat. Sb. 26 91–98. MR0847207
[14] Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics 44. Cambridge: Cambridge Univ. Press. MR1333890
[15] Millet, A. and Sanz-Solé, M. (1994). The support of the solution to a hyperbolic SPDE. Probab. Theory Related Fields 98 361–387. MR1262971
[16] Millet, A. and Sanz-Solé, M. (1994). A simple proof of the support theorem for diffusion processes. In Séminaire de Probabilités XXVIII (J. Azéma, P.A. Meyer and M. Yor, eds.). Lecture Notes in Math. 1583 36–48. Berlin: Springer. MR1329099
[17] Millet, A. and Sanz-Solé, M. (2000). Approximation and support theorem for a wave equation in two space dimensions. Bernoulli 6 887–915. MR1791907
[18] Ortiz López, V. (2012). Large deviation principle for a stochastic wave equation in spatial dimension three. Ph.D. dissertation (in Catalan), Barcelona.
[19] Stroock, D.W. and Varadhan, S.R.S. (1972). On the support of diffusion processes with applications to the strong maximum principle. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971). Probability Theory III 333–359. Berkeley, CA: Univ. California Press. MR0400425

Bernoulli 20(4), 2014, 2217–2246 DOI: 10.3150/13-BEJ555

Adaptive sensing performance lower bounds for sparse signal detection and support estimation

RUI M. CASTRO Eindhoven University of Technology, The Netherlands. E-mail: [email protected]

In memory of Yuri Ingster

This paper gives a precise characterization of the fundamental limits of adaptive sensing for diverse estimation and testing problems concerning sparse signals. We consider in particular the setting introduced in (IEEE Trans. Inform. Theory 57 (2011) 6222–6235) and show necessary conditions on the minimum signal magnitude for both detection and estimation: if $x \in \mathbb{R}^n$ is a sparse vector with $s$ non-zero components then it can be reliably detected in noise provided the magnitude of the non-zero components exceeds $\sqrt{2/s}$. Furthermore, the signal support can be exactly identified provided the minimum magnitude exceeds $\sqrt{2\log s}$. Notably there is no dependence on $n$, the extrinsic signal dimension. These results show that the adaptive sensing methodologies proposed previously in the literature are essentially optimal, and cannot be substantially improved. In addition, these results provide further insights on the limits of adaptive compressive sensing.

Keywords: adaptive sensing; minimax lower bounds; sequential experimental design; sparsity-based models
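
As a small numerical illustration of the two magnitude levels quoted in the abstract, the following sketch evaluates $\sqrt{2/s}$ and $\sqrt{2\log s}$ for a few sparsity levels. The function names are hypothetical, and the normalisation (noise level, sensing budget) under which these levels hold is defined in the paper and is not reproduced here; the point is only that neither expression involves the ambient dimension n.

```python
import math


def detection_level(s: int) -> float:
    """Magnitude sqrt(2/s) quoted in the abstract for reliable detection.

    `s` is the number of non-zero components; the name is illustrative.
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    return math.sqrt(2.0 / s)


def support_level(s: int) -> float:
    """Magnitude sqrt(2 log s) quoted in the abstract for exact support recovery."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    return math.sqrt(2.0 * math.log(s))


if __name__ == "__main__":
    # Neither level takes the ambient dimension n as an input.
    for s in (4, 64, 1024):
        print(f"s={s:5d}  detection={detection_level(s):.4f}  support={support_level(s):.4f}")
```

The detection level decreases as s grows while the support level increases, so the two tasks respond to sparsity in opposite directions.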

References

[1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092. MR2722464
[2] Arias-Castro, E., Candès, E.J. and Davenport, M.A. (2013). On the fundamental limits of adaptive sensing. IEEE Trans. Inform. Theory 59 472–481. MR3008159
[3] Arias-Castro, E., Candès, E.J., Helgason, H. and Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757. MR2435454
[4] Balcan, N., Beygelzimer, A. and Langford, J. (2006). Agnostic active learning. In 23rd International Conference on Machine Learning 65–72.
[5] Bessler, S.A. (1960). Theory and applications of the sequential design of experiments, k-actions and infinitely many experiments: Part I – Theory. Technical Report 55, Stanford Univ., Applied Mathematics and Statistics Laboratories.
[6] Blanchard, G. and Geman, D. (2005). Hierarchical testing designs for pattern recognition. Ann. Statist. 33 1155–1202. MR2195632
[7] Butucea, C. and Ingster, Y. (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 2652–2688.
[8] Cai, T.T., Jin, J. and Low, M.G. (2007). Estimation and confidence sets for sparse normal mixtures. Ann. Statist. 35 2421–2449. MR2382653

[9] Castro, R., Willett, R. and Nowak, R. (2005). Faster rates in regression via active learning. In Advances in Neural Information Processing Systems 18 179–186.
[10] Castro, R.M. and Nowak, R.D. (2008). Minimax bounds for active learning. IEEE Trans. Inform. Theory 54 2339–2353. MR2450865
[11] Chernoff, H. (1959). Sequential design of experiments. Ann. Math. Statist. 30 755–770. MR0108874
[12] Cohn, D., Ghahramani, Z. and Jordan, M. (1996). Active learning with statistical models. J. Artificial Intelligence Res. 4 129–145.
[13] Dasgupta, S. (2004). Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17 337–344.
[14] Dasgupta, S. (2005). Coarse sample complexity bounds for active learning. In Advances in Neural Information Processing Systems 18 235–242.
[15] Dasgupta, S., Kalai, A. and Monteleoni, C. (2005). Analysis of perceptron-based active learning. In Eighteenth Annual Conference on Learning Theory (COLT) 249–263.
[16] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994. MR2065195
[17] Donoho, D.L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306. MR2241189
[18] El-Gamal, M.A. (1991). The role of priors in active Bayesian learning in the sequential statistical decision framework. In Maximum Entropy and Bayesian Methods (Laramie, WY, 1990). Fund. Theories Phys. 43 (W.T. Grandy and L.H. Schich, eds.) 33–38. Dordrecht: Kluwer Academic. MR1173460
[19] Fedorov, V.V. (1972). Theory of Optimal Experiments. New York: Academic Press. MR0403103
[20] Freund, Y., Seung, H.S., Shamir, E. and Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning 28 133–168.
[21] Hall, P. and Molchanov, I. (2003). Sequential methods for design-adaptive estimation of discontinuities in regression curves and surfaces. Ann. Statist. 31 921–941. MR1994735
[22] Hanneke, S. (2011). Rates of convergence in active learning. Ann. Statist. 39 333–361. MR2797849
[23] Haupt, J., Baraniuk, R., Castro, R. and Nowak, R. (2012). Sequentially designed compressed sensing. In IEEE Statistical Signal Processing Workshop (IEEE SSP) Proceedings 401–404. Available at http://www.win.tue.nl/~rmcastro/publications/SCS.pdf.
[24] Haupt, J., Castro, R.M. and Nowak, R. (2011). Distilled sensing: Adaptive sampling for sparse detection and estimation. IEEE Trans. Inform. Theory 57 6222–6235. MR2857969
[25] Ingster, Y.I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69. MR1456646
[26] Ingster, Y.I. and Suslina, I.A. (2003). Nonparametric Goodness-of-fit Testing Under Gaussian Models. Lecture Notes in Statistics 169. New York: Springer. MR1991446
[27] Kim, J.-C. and Korostelev, A. (2000). Rates of convergence for the sup-norm risk in image models under sequential designs. Statist. Probab. Lett. 46 391–399. MR1743998
[28] Koltchinskii, V. (2010). Rademacher complexities and bounding the excess risk in active learning. J. Mach. Learn. Res. 11 2457–2485. MR2727771
[29] Lai, T.L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22. MR0776826
[30] Malloy, M. and Nowak, R. (2011). On the limits of sequential testing in high dimensions. In Asilomar Conference on Signals, Systems and Computers 1245–1249. Available at http://arxiv.org/abs/1105.4540.
[31] Malloy, M. and Nowak, R. (2011). Sequential analysis in high-dimensional multiple testing and sparse recovery. In The IEEE International Symposium on Information Theory 2661–2665. Available at http://arxiv.org/abs/1103.5991v1.
[32] Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist. 34 373–393. MR2275246
[33] Novak, E. (1996). On the power of adaption. J. Complexity 12 199–237. MR1408328
[34] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. New York: Springer. MR2724359
[35] Wald, A. (1947). Sequential Analysis. New York: Wiley. MR0020764
[36] Wasserman, L. (2006). All of Nonparametric Statistics. Springer Texts in Statistics. New York: Springer. MR2172729

Bernoulli 20(4), 2014, 2247–2277 DOI: 10.3150/13-BEJ556

Asymptotic behavior of CLS estimators for 2-type doubly symmetric critical Galton–Watson processes with immigration

MÁRTON ISPÁNY1, KRISTÓF KÖRMENDI2,* and GYULA PAP2,**
1University of Debrecen, Faculty of Informatics, Department of Information Technology, Pf. 12, H-4010 Debrecen, Hungary. E-mail: [email protected]
2University of Szeged, Faculty of Science, Bolyai Institute, Department of Stochastics, Aradi vértanúk tere 1, H-6720 Szeged, Hungary. E-mail: *[email protected]; **[email protected]

In this paper, the asymptotic behavior of the conditional least squares (CLS) estimators of the offspring means (α, β) and of the criticality parameter ϱ := α + β for a 2-type critical doubly symmetric positively regular Galton–Watson branching process with immigration is described.

Keywords: conditional least squares estimator; Galton–Watson branching process with immigration
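
To see why α + β plays the role of a criticality parameter, the following worked equation may help. It assumes, as the term "doubly symmetric" suggests, that the offspring mean matrix has α on the diagonal and β off the diagonal with α, β ≥ 0; the symbols $m_\xi$ and $\varrho$ are notation chosen here for illustration and should be checked against the paper.

```latex
% Assumed offspring mean matrix of the doubly symmetric 2-type model
m_\xi = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix},
\qquad
m_\xi \begin{pmatrix} 1 \\ 1 \end{pmatrix}
  = (\alpha + \beta) \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
m_\xi \begin{pmatrix} 1 \\ -1 \end{pmatrix}
  = (\alpha - \beta) \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
% For \alpha, \beta \ge 0 the spectral radius is \alpha + \beta, so criticality reads
\varrho := \alpha + \beta = 1 .
```

Under this assumption the eigenvalue α + β governs the growth of the total population, which is why the process is critical exactly when it equals 1.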

References

[1] Athreya, K.B. and Ney, P.E. (1972). Branching Processes. New York: Springer. MR0373040
[2] Barczy, M., Ispány, M. and Pap, G. (2012). Asymptotic behavior of CLS estimators for unstable INAR(2) models. Available at http://arxiv.org/abs/1202.1617.
[3] Barczy, M., Ispány, M. and Pap, G. (2011). Asymptotic behavior of unstable INAR(p) processes. Stochastic Process. Appl. 121 583–608. MR2763097
[4] Guttorp, P. (1991). Statistical Inference for Branching Processes. Wiley Series in Probability and Mathematical Statistics. New York: Wiley. MR1254434
[5] Hall, P. and Yao, Q. (2003). Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71 285–317. MR1956860
[6] Hamilton, J.D. (1994). Time Series Analysis. Princeton, NJ: Princeton Univ. Press. MR1278033
[7] Horn, R.A. and Johnson, C.R. (1985). Matrix Analysis. Cambridge: Cambridge Univ. Press. MR0832183
[8] Ispány, M., Körmendi, K. and Pap, G. (2012). Asymptotic behavior of CLS estimators for 2-type critical Galton–Watson processes with immigration. Available at http://arxiv.org/abs/1210.8315.
[9] Ispány, M. and Pap, G. (2012). Asymptotic behavior of critical primitive multi-type branching processes with immigration. Available at http://arxiv.org/abs/1205.0388.
[10] Ispány, M. and Pap, G. (2010). A note on weak convergence of random step processes. Acta Math. Hungar. 126 381–395. MR2629664
[11] Jacod, J. and Shiryaev, A.N. (2003). Limit Theorems for Stochastic Processes, 2nd ed. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 288. Berlin: Springer. MR1943877
[12] Kallenberg, O. (1997). Foundations of Modern Probability. Probability and Its Applications (New York). New York: Springer. MR1464694

[13] Kesten, H. and Stigum, B.P. (1966). A limit theorem for multidimensional Galton–Watson processes. Ann. Math. Statist. 37 1211–1223. MR0198552
[14] Mikosch, T. and Straumann, D. (2002). Whittle estimation in a heavy-tailed GARCH(1, 1) model. Stochastic Process. Appl. 100 187–222. MR1919613
[15] Musiela, M. and Rutkowski, M. (1997). Martingale Methods in Financial Modelling. Applications of Mathematics (New York) 36. Berlin: Springer. MR1474500
[16] Quine, M.P. (1970). The multi-type Galton–Watson process with immigration. J. Appl. Probability 7 411–422. MR0263168
[17] Revuz, D. and Yor, M. (2001). Continuous Martingales and Brownian Motion, 3rd ed., corrected 2nd printing. Berlin: Springer. MR1725357
[18] Shete, S. and Sriram, T.N. (2003). A note on estimation in multitype supercritical branching processes with immigration. Sankhyā 65 107–121. MR2016780
[19] Tanaka, K. (1996). Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. Wiley Series in Probability and Statistics. New York: Wiley. MR1397269
[20] Wei, C.Z. and Winnicki, J. (1989). Some asymptotic results for the branching process with immigration. Stochastic Process. Appl. 31 261–282. MR0998117
[21] Wei, C.Z. and Winnicki, J. (1990). Estimation of the means in the branching process with immigration. Ann. Statist. 18 1757–1773. MR1074433
[22] Winnicki, J. (1991). Estimation of the variances in the branching process with immigration. Probab. Theory Related Fields 88 77–106. MR1094078

Bernoulli 20(4), 2014, 2278–2304 DOI: 10.3150/13-BEJ557

Affine invariant divergences associated with proper composite scoring rules and their applications

TAKAFUMI KANAMORI1 and HIRONORI FUJISAWA2
1Department of Computer Science and Mathematical Informatics, Nagoya University, Furocho Chikusaku, Nagoya 464-8601, Japan. E-mail: [email protected]
2The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan. E-mail: [email protected]

In statistical analysis, measuring a score of predictive performance is an important task. In many scientific fields, appropriate scoring rules were tailored to tackle the problems at hand. A proper scoring rule is a popular tool to obtain statistically consistent forecasts. Furthermore, a mathematical characterization of the proper scoring rule was studied. As a result, it was revealed that the proper scoring rule corresponds to a Bregman divergence, which is an extension of the squared distance over the set of probability distributions. In the present paper, we introduce composite scoring rules as an extension of the typical scoring rules in order to obtain a wider class of probabilistic forecasting. Then, we propose a class of composite scoring rules, named Hölder scores, that induce equivariant estimators. The equivariant estimators have a favorable property, implying that the estimator is transformed in a consistent way, when the data is transformed. In particular, we deal with the affine transformation of the data. By using the equivariant estimators under the affine transformation, one can obtain estimators that do not essentially depend on the choice of the system of units in the measurement. Conversely, we prove that the Hölder score is characterized by the invariance property under the affine transformations. Furthermore, we investigate statistical properties of the estimators using Hölder scores for the statistical problems including estimation of regression functions and robust parameter estimation, and illustrate the usefulness of the newly introduced scoring rules for statistical forecasting.

Keywords: affine invariance; Bregman score; composite scoring rule; divergence; Hölder score
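
The remark that a Bregman divergence extends the squared distance can be checked in the finite-dimensional case with the standard definition below. The generating function $\phi$ and the vectors $p$, $q$ are notation introduced here for illustration; the paper's setting over probability distributions, and the Hölder scores themselves, are not reproduced.

```latex
% Bregman divergence generated by a differentiable convex function \phi
D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q \rangle .
% Choosing \phi(p) = \|p\|^2 gives \nabla\phi(q) = 2q, hence
D_\phi(p, q) = \|p\|^2 - \|q\|^2 - 2\langle q,\, p - q \rangle
             = \|p\|^2 - 2\langle q, p \rangle + \|q\|^2
             = \|p - q\|^2 .
```

Other convex choices of $\phi$ give divergences that are no longer symmetric in $p$ and $q$, which is the sense in which the family generalises the squared distance.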

References

[1] Abernethy, J.D. and Frongillo, R.M. (2012). A characterization of scoring rules for linear properties. J. Mach. Learn. Res.: Workshop and Conference Proceedings 23 27.1–27.13.
[2] Banerjee, A., Merugu, S., Dhillon, I.S. and Ghosh, J. (2005). Clustering with Bregman divergences. J. Mach. Learn. Res. 6 1705–1749. MR2249870
[3] Basu, A., Harris, I.R., Hjort, N.L. and Jones, M.C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika 85 549–559. MR1665873
[4] Basu, A., Shioya, H. and Park, C. (2011). Statistical Inference: The Minimum Distance Approach. Monographs on Statistics and Applied Probability 120. Boca Raton, FL: CRC Press. MR2830561
[5] Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer Series in Statistics. New York: Springer. MR0804611

[6] Borwein, J.M. and Zhu, Q.J. (2005). Techniques of Variational Analysis. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC 20. New York: Springer. MR2144010
[7] Brègman, L.M. (1967). A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Ž. Vyčisl. Mat. i Mat. Fiz. 7 620–631. MR0215617
[8] Bremnes, B.J. (2004). Probabilistic forecasts of precipitation in terms of quantiles using nwp model output. Monthly Weather Review 132 338–347.
[9] Brier, G.W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78 1–3.
[10] Cichocki, A. and Amari, S.-i. (2010). Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 12 1532–1568. MR2659408
[11] Collins, M., Schapire, R.E. and Singer, Y. (2000). Logistic regression, adaboost and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory 158–169.
[12] Cover, T.M. and Thomas, J.A. (2006). Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley. MR2239987
[13] Dawid, A.P. (1998). Coherent measures of discrepancy, uncertainty and dependence, with applications to bayesian predictive experimental design. Technical report, University College London, Dept. of Statistical Science.
[14] Dawid, A.P. (2007). The geometry of proper scoring rules. Ann. Inst. Statist. Math. 59 77–93. MR2396033
[15] Dawid, A.P., Lauritzen, S. and Parry, M. (2012). Proper local scoring rules on discrete sample spaces. Ann. Statist. 40 593–608. MR3014318
[16] Duffie, D. and Pan, J. (1997). An overview of value at risk. J. Derivatives 4 7–49.
[17] Eguchi, S., Komori, O. and Kato, S. (2011). Projective power entropy and maximum Tsallis entropy distributions. Entropy 13 1746–1764. MR2851127
[18] Ehm, W. and Gneiting, T. (2012). Local proper scoring rules of order two. Ann. Statist. 40 609–637. MR3014319
[19] Fabian, M., Habala, P., Hájek, P., Montesinos Santalucía, V., Pelant, J. and Zizler, V. (2001). Functional Analysis and Infinite-Dimensional Geometry. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC 8. New York: Springer. MR1831176
[20] Fujisawa, H. and Eguchi, S. (2008). Robust parameter estimation with a small bias against heavy contamination. J. Multivariate Anal. 99 2053–2081. MR2466551
[21] Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102 359–378. MR2345548
[22] Good, I.J. (1971). Comment on “Measuring information and uncertainty,” by R.J. Buehler. In Foundations of Statistical Inference (V.P. Godambe and D.A. Sprott, eds.) 337–339. Toronto: Holt, Rinehart and Winston.
[23] Grünwald, P.D. and Dawid, A.P. (2004). Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. Statist. 32 1367–1433. MR2089128
[24] Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: Wiley. MR0829458
[25] Hendrickson, A.D. and Buehler, R.J. (1971). Proper scores for probability forecasters. Ann. Math. Statist. 42 1916–1921. MR0314430
[26] Huber, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73–101. MR0161415
[27] Jones, M.C., Hjort, N.L., Harris, I.R. and Basu, A. (2001). A comparison of related density-based minimum divergence estimators. Biometrika 88 865–873. MR1859416
[28] Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Chichester: Wiley. MR2238141
[29] Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. (2004). Information geometry of U-boost and Bregman divergence. Neural Comput. 16 1437–1481.
[30] Parry, M., Dawid, A.P. and Lauritzen, S. (2012). Proper local scoring rules. Ann. Statist. 40 561–592. MR3014317
[31] Tsallis, C. (1988). Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52 479–487. MR0968597
[32] Tsuda, K., Rätsch, G. and Warmuth, M.K. (2005). Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Mach. Learn. Res. 6 995–1018. MR2249846
[33] van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge: Cambridge Univ. Press. MR1652247

Bernoulli 20(4), 2014, 2305–2330 DOI: 10.3150/13-BEJ558

The affinely invariant distance correlation

JOHANNES DUECK1, DOMINIC EDELMANN1, TILMANN GNEITING2 and DONALD RICHARDS3
1Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany
2Heidelberg Institute for Theoretical Studies and Karlsruhe Institute of Technology, HITS gGmbH, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
3Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA. E-mail: [email protected]

Székely, Rizzo and Bakirov (Ann. Statist. 35 (2007) 2769–2794) and Székely and Rizzo (Ann. Appl. Statist. 3 (2009) 1236–1265), in two seminal papers, introduced the powerful concept of distance correlation as a measure of dependence between sets of random variables. We study in this paper an affinely invariant version of the distance correlation and an empirical version of that distance correlation, and we establish the consistency of the empirical quantity. In the case of subvectors of a multivariate normally distributed random vector, we provide exact expressions for the affinely invariant distance correlation in both finite-dimensional and asymptotic settings, and in the finite-dimensional case we find that the affinely invariant distance correlation is a function of the canonical correlation coefficients. To illustrate our results, we consider time series of wind vectors at the Stateline wind energy center in Oregon and Washington, and we derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations.

Keywords: affine invariance; distance correlation; distance covariance; hypergeometric function of matrix argument; multivariate independence; multivariate normal distribution; vector time series; wind forecasting; zonal polynomial
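
For readers unfamiliar with the quantity the abstract builds on, here is a minimal sketch of the standard empirical distance correlation of Székely, Rizzo and Bakirov, computed from double-centred pairwise distance matrices. It is not the affinely invariant version studied in the paper: the affine standardisation step is deliberately omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _centered_distances(sample: Sequence[Vector]) -> List[List[float]]:
    """Pairwise Euclidean distances with row, column and grand means removed."""
    n = len(sample)
    d = [[math.dist(sample[i], sample[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(u: List[List[float]], v: List[List[float]]) -> float:
    """Average of the entrywise product of two n-by-n matrices."""
    n = len(u)
    return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """Empirical distance correlation of two paired samples of vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of size at least 2")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)
    dvar2 = _mean_product(a, a) * _mean_product(b, b)
    if dvar2 <= 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    ys = [(v[0] ** 2,) for v in xs]  # nonlinear but fully dependent
    print(distance_correlation(xs, ys))
```

This plain version is unchanged by shifts and by a common rescaling of all coordinates of a sample, but not by general invertible linear maps in dimension greater than one, which is the gap the affinely invariant version addresses.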

References

[1] Andrews, G.E., Askey, R. and Roy, R. (1999). Special Functions. Encyclopedia of Mathematics and Its Applications 71. Cambridge: Cambridge Univ. Press. MR1688958
[2] Eaton, M.L. (1989). Group Invariance Applications in Statistics. NSF-CBMS Regional Conference Series in Probability and Statistics 1. Hayward, CA: IMS. MR1089423
[3] Gneiting, T., Larson, K., Westrick, K., Genton, M.G. and Aldrich, E. (2006). Calibrated probabilistic forecasting at the Stateline wind energy center: The regime-switching space-time method. J. Amer. Statist. Assoc. 101 968–979. MR2324108
[4] Gorfine, M., Heller, R. and Heller, Y. (2012). Comment on “Detecting novel associations in large data sets.” Unpublished manuscript. Available at http://iew3.technion.ac.il/~gorfinm/files/science6.pdf.
[5] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B. and Smola, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13 723–773. MR2913716
[6] Gross, K.I. and Richards, D.S.P. (1987). Special functions of matrix argument. I. Algebraic induction, zonal polynomials, and hypergeometric functions. Trans. Amer. Math. Soc. 301 781–811. MR0882715
[7] Heller, R., Heller, Y. and Gorfine, M. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika 100 503–510. MR3068450

[8] Hering, A.S. and Genton, M.G. (2010). Powering up with space-time wind forecasting. J. Amer. Statist. Assoc. 105 92–104. MR2757195
[9] James, A.T. (1964). Distributions of matrix variates and latent roots derived from normal samples. Ann. Math. Statist. 35 475–501. MR0181057
[10] Koev, P. and Edelman, A. (2006). The efficient evaluation of the hypergeometric function of a matrix argument. Math. Comp. 75 833–846. MR2196994
[11] Kosorok, M.R. (2009). Discussion of: Brownian distance covariance. Ann. Appl. Stat. 3 1270–1278. MR2752129
[12] Kosorok, M.R. (2013). Correction: Discussion of Brownian distance covariance. Ann. Appl. Stat. 7 1247. MR3113509
[13] Muirhead, R.J. (1982). Aspects of Multivariate Statistical Theory. New York: Wiley. MR0652932
[14] Newton, M.A. (2009). Introducing the discussion paper by Székely and Rizzo. Ann. Appl. Stat. 3 1233–1235. MR2752126
[15] Rémillard, B. (2009). Discussion of: Brownian distance covariance. Ann. Appl. Stat. 3 1295–1298. MR2752133
[16] Reshef, D.N., Reshef, J.A., Finucane, H.K., Grossman, S.R., McVean, G., Turnbaugh, P.J., Lander, E.S., Mitzenmacher, M. and Sabeti, P.C. (2011). Detecting novel associations in large data sets. Science 334 1518–1524.
[17] Rizzo, M.L. and Székely, G.J. (2011). Energy: E-statistics (energy statistics). R package, Version 1.4-0. Available at http://cran.us.r-project.org/web/packages/energy/index.html.
[18] Simon, N. and Tibshirani, R. (2012). Comment on “Detecting novel associations in large data sets,” by Reshef et al. Science 334 (2011) 1518–1524. Unpublished manuscript. Available at http://www-stat.stanford.edu/~tibs/reshef/comment.pdf.
[19] Speed, T. (2011). Mathematics. A correlation for the 21st century. Science 334 1502–1503.
[20] Székely, G.J. and Rizzo, M.L. (2009). Brownian distance covariance. Ann. Appl. Stat. 3 1236–1265. MR2752127
[21] Székely, G.J. and Rizzo, M.L. (2012). On the uniqueness of distance covariance. Statist. Probab. Lett. 82 2278–2282. MR2979766
[22] Székely, G.J. and Rizzo, M.L. (2013). The distance correlation t-test of independence in high dimension. J. Multivariate Anal. 117 193–213. MR3053543
[23] Székely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769–2794. MR2382665
[24] Zhou, Z. (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Series Anal. 33 438–457. MR2915095