Learning Exponential Families in High-Dimensions
Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity Sham M. Kakade Ohad Shamir Department of Statistics School of Computer Science and Engineering The Wharton School The Hebrew University of Jerusalem, Israel University of Pennsylvania, USA Karthik Sridharan Ambuj Tewari Toyota Technological Institute Toyota Technological Institute Chicago, USA Chicago, USA Abstract The versatility of exponential families, along with their attendant convexity prop- erties, make them a popular and effective statistical model. A central issue is learning these models in high-dimensions, such as when there is some sparsity pattern of the optimal parameter. This work characterizes a certain strong con- vexity property of general exponential families, which allow their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under L1 regularization. 1 Introduction Exponential models are perhaps the most versatile and pragmatic statistical model for a variety of reasons — modelling flexibility (encompassing discrete variables, continuous variables, covariance matrices, time series, graphical models, etc); convexity properties allowing ease of optimization; and robust generalization ability. A principal issue for applicability to large scale problems is estimating these models when the ambient dimension of the parameters, p, is much larger than the sample size arXiv:0911.0054v2 [cs.LG] 16 May 2015 n —the “p n” regime. ≫ Much recent work has focused on this problem in the special case of linear regression in high di- mensions, where it is assumed that the optimal parameter vector is sparse (e.g. Zhao and Yu [2006], Candes and Tao [2007], Meinshausen and Yu [2009], Bickel et al.
[Show full text]