Biometrika (2020), 107, 2, pp. 257–276
doi: 10.1093/biomet/asaa006
Printed in Great Britain
Advance Access publication 4 April 2020

Network cross-validation by edge sampling

By TIANXI LI
Department of Statistics, University of Virginia, B005 Halsey Hall, 148 Amphitheater Way, Charlottesville, Virginia 22904, U.S.A.

ELIZAVETA LEVINA AND JI ZHU
Department of Statistics, University of Michigan, 459 West Hall, 1085 South University Avenue, Ann Arbor, Michigan 48105, U.S.A.

Summary
While many statistical models and methods are now available for network analysis, resampling of network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning. However, it is not directly applicable to networks: splitting the nodes into groups requires deleting the edges between groups, which destroys some of the network structure. In this paper we propose a new network resampling strategy, based on splitting node pairs rather than nodes, that is applicable to cross-validation for a wide range of network model selection tasks. We provide theoretical justification for our method in a general setting, together with examples of how the method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a statisticians’ citation network show that the proposed cross-validation approach works well for model selection.

Some key words: Cross-validation; Model selection; Parameter tuning; Random network.

1. Introduction
Statistical methods for analysing networks have received a great deal of attention because of their wide-ranging applications in areas such as sociology, physics, biology and the medical sciences.