Arxiv:1804.09554V2 [Math.OC] 12 Nov 2018

Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari [email protected] Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge, MA 02139 Hamed Hassani [email protected] Department of Electrical and Systems Engineering University of Pennsylvania Philadelphia, PA 19104 Amin Karbasi [email protected] Department of Electrical Engineering and Computer Science Yale University New Haven, CT 06520 Editor: Abstract This paper considers stochastic optimization problems for a large class of objective functions, including convex and continuous submodular. Stochastic proximal gradient methods have been widely used to solve such problems; however, their applicability remains limited when the problem dimension is large and the projection onto a convex set is computationally costly. Instead, stochastic conditional gradient algorithms are proposed as an alternative solution which rely on (i) Approximating gradients via a simple averaging technique requiring a single stochastic gradient evaluation per iteration; (ii) Solving a linear program to compute the descent/ascent direction. The gradient averaging technique reduces the noise of gradient approximations as time progresses, and replacing projection step in proximal methods by a linear program lowers the computational complexity of each iteration. We show that under convexity and smoothness assumptions, our proposed stochastic conditional gradient method converges to the optimal objective function value at a sublinear rate of O(1=t1=3). Further, for a monotone and continuous DR-submodular function and subject to a general convex body constraint, we prove that our proposed method achieves a ((1 − 1=e)OPT − ) guarantee (in expectation) with O(1/3) stochastic gradient computations. This guarantee matches the known hardness results and closes the gap between deterministic and stochastic continuous submodular maximization. Additionally, we achieve ((1=e)OPT − ) guar- arXiv:1804.09554v2 [math.OC] 12 Nov 2018 antee after operating on O(1/3) stochastic gradients for the case that the objective function is continuous DR-submodular but non-monotone and the constraint set is a down-closed convex body. By using stochastic continuous optimization as an interface, we also provide the first (1 − 1=e) tight approximation guarantee for maximizing a monotone but stochastic submodular set function subject to a general matroid constraint and (1=e) approximation guarantee for the non-monotone case. Numerical experiments for both convex and submodular settings are provided, and they illustrate fast convergence time for our proposed stochastic conditional gradient method relative to alternatives. Keywords: Stochastic optimization, conditional gradient methods, convex minimization, submodular maximization, gradient averaging, Frank-Wolfe algorithm, greedy algorithm Mokhtari, Hassani, and Karbasi 1. Introduction Stochastic optimization arises in many procedures including wireless communications (Ribeiro, 2010), learning theory (Vapnik, 2013), machine learning (Bottou, 2010), adaptive filters (Haykin, 2008), portfolio selection (Shapiro et al., 2009) to name a few. In this class of problems the goal is to optimize an objective function defined as an expectation over a set of random functions subject to a general convex constraint. In particular, consider an optimization variable x Rn and a 2 X ⊂ random variable Z that together determine the choice of a stochastic function F~ : R. The goal is to solve2 the Z program X × Z ! h ~ i max F (x) := max Ez∼P F (x; z) ; (1) x2C x2C where is a convex compact set, z is the realization of the random variable Z drawn from a distributionC P , and F (x) is the expected value of the random functions F~(x; z) with respect to the random variable Z. In this paper, our focus is on the cases where the function F is either concave or continuous submodular. The first case which considers the problem of maximizing a concave function is equivalent to stochastic convex minimization, and the second case in which the goal is to maximize a continuous submodular function is called stochastic continuous submodular maximization. Note that the main challenge here is to solve Problem (1) without computing the objective function F (x) or its gradient F (x) explicitly, since we assume that either the probability distribution P is unknown or the cost ofr computing the expectation is prohibitive. In this regime, stochastic algorithms are the method of choice since they operate on computationally cheap stochastic estimates of the objective function gradients. Stochastic variants of the proximal gradient method are perhaps the most popular algorithms to tackle this category of problems both in convex minimization and submodular maximization. However, implementation of proximal methods requires projection onto a convex set at each iteration to enforce feasibility, which could be computationally expensive or intractable in many settings. To avoid the cost of projection, recourse to conditional gradient methods arises as a natural alternative. Unlike proximal algorithms, conditional gradient methods do not suffer from computationally costly projection steps and only require solving linear programs which can be implemented at a significantly lower complexity in many practical applications. In deterministic convex minimization, in which the exact gradient information is given, conditional gradient method, a.k.a., Frank-Wolfe algorithm (Frank and Wolfe, 1956; Jaggi, 2013), succeeds in lowering the computational complexity of proximal algorithms due to its projection free property. However, in the stochastic regime, where only an estimate of the gradient is available, stochastic conditional gradient methods either diverge or underperform the proximal gradient methods. In particular, in stochastic convex minimization it is known and proven that stochastic conditional gradient methods may not converge to the optimal solution without an increasing batch size, whereas stochastic proximal gradient methods converge to the optimal solution at the sublinear rate of (1=pt). Hence, the possibility of designing a convergent stochastic conditional gradient method withO a small batch size remains unanswered. In deterministic continuous submodular maximization where objective function is submodular, (neither convex nor concave), a variant of the condition gradient method called, continuous (Cali- nescu et al., 2011; Bian et al., 2017b) obtains a (1 1=e) approximation guarantee for monotone functions, in contrast the best-known result for proximal− gradient methods is a (1=2) approximation guarantee (Hassani et al., 2017). However, in stochastic submodular maximization setting, stochastic variants of the continuous greedy algorithm with a small batch size fail to achieve a constant factor approximation (Hassani et al., 2017), whereas stochastic proximal gradient method recovers the (1=2) approximation obtained by proximal gradient method in deterministic settings (Hassani et al., 2017). It is unknown if one can design a stochastic conditional gradient method that obtains a constant approximation guarantee, ideally (1 1=e), for Problem (1). − 2 Stochastic Conditional Gradient Methods In this paper, we introduce a stochastic conditional gradient method for solving the generic stochastic optimization Problem (1). The proposed method lowers the noise of gradient approximations through a simple gradient averaging technique which only requires a single stochastic gradient computation per iteration, i.e., the batch size can be as small as 1. The proposed stochastic conditional gradient method improves the best-known convergence guarantees for stochastic conditional gradient methods in both convex minimization and submodular maximization settings. A summary of our theoretical contributions follows. (i) In stochastic convex minimization, we propose a stochastic variant of the Frank-Wolfe algorithm which converges to the optimal objective function value at the sublinear rate of (1=t1=3). In other words, the proposed SFW algorithm achieves an -suboptimal objective functionO value after (1/3) stochastic gradient evaluations. O (ii) In stochastic continuous submodular maximization, we propose a stochastic conditional gradient method, which can be interpreted as a stochastic variant of the continuous greedy algorithm, that obtains the first tight (1 1=e) approximation guarantee, when the function is monotone. For the non-monotone case,− the proposed method obtains a (1=e) approximation guarantee. Moreover, for the more general case of γ-weakly DR-submodular monotone maximization the proposed stochastic conditional gradient method achieves a (1 e−γ )- approximation guarantee. − (iii) In stochastic discrete submodular maximization, the proposed stochastic conditional gradient method achieves (1 1=e) and (1=e) approximation guarantees for monotone and non-monotone settings, respectively,− by maximizing the multilinear extension of the stochastic discrete objective function. Further, if the objective function is monotone and has curvature c the proposed stochastic conditional gradient method achieves a (1 e−c)=c approximation guarantee. − We begin the paper by studying the related work on stochastic methods in convex minimization and submodular maximization (Section 2). Then, we proceed by stating stochastic convex minimization problem (Section 3). We introduce a stochastic conditional gradient method which can

Load more