Research Article Stochastic Block-Coordinate Gradient Projection Algorithms for Submodular Maximization
Hindawi Complexity, Volume 2018, Article ID 2609471, 11 pages. https://doi.org/10.1155/2018/2609471

Zhigang Li,1 Mingchuan Zhang,2 Junlong Zhu,2 Ruijuan Zheng,2 Qikun Zhang,1 and Qingtao Wu2

1 School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
2 Information Engineering College, Henan University of Science and Technology, Luoyang 471023, China

Correspondence should be addressed to Mingchuan Zhang; zhang [email protected]

Received 25 May 2018; Accepted 26 November 2018; Published 5 December 2018

Academic Editor: Mahardhika Pratama

Copyright © 2018 Zhigang Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We consider a stochastic continuous submodular huge-scale optimization problem, which arises naturally in many applications such as machine learning. Due to high-dimensional data, the computation of the whole gradient vector can become prohibitively expensive. To reduce the complexity and memory requirements, we propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions, which chooses a random subset of the gradient vector and updates the estimates along the positive gradient direction. We prove that the estimates of all nodes generated by the algorithm converge to some stationary points with probability 1. Moreover, we show that the proposed algorithm achieves the tight $((p_{\min}/2)F^* - \epsilon)$ approximation guarantee after $O(1/\epsilon^2)$ iterations for DR-submodular functions by choosing appropriate step sizes. Furthermore, we also show that the algorithm achieves the tight $((\gamma^2/(1+\gamma^2))\,p_{\min}F^* - \epsilon)$ approximation guarantee after $O(1/\epsilon^2)$ iterations for weakly DR-submodular functions with parameter $\gamma$ by choosing diminishing step sizes.

1. Introduction

In this paper, we focus on submodular function maximization, which has recently attracted significant attention in academia since submodularity is a crucial concept in combinatorial optimization. Submodular functions arise in a variety of areas, such as the social sciences, algorithmic game theory, signal processing, machine learning, and computer vision. Furthermore, submodular functions have found many applications in applied mathematics and computer science, such as probabilistic models [1, 2], crowd teaching [3, 4], representation learning [5], data summarization [6], document summarization [7], recommender systems [8], product recommendation [9, 10], sensor placement [11], network monitoring [12, 13], the design of structured norms [14], clustering [15], dictionary learning [16], active learning [17], and utility maximization in sensor networks [18].

In submodular optimization, there exist many polynomial time algorithms for exactly minimizing submodular functions, such as combinatorial algorithms [19–21]. In addition, there also exist many polynomial time algorithms for approximately maximizing submodular functions with approximation guarantees, such as local search and greedy algorithms [22–25]. Despite this progress, these methods rely on combinatorial techniques, which have some limitations [26]. For this reason, a new approach was proposed based on the multilinear relaxation [27], which lifts submodular optimization problems into the continuous domain. Continuous optimization techniques can then be used to minimize exactly, or maximize approximately, submodular functions in polynomial time. Recently, much of the literature has been devoted to continuous submodular optimization [28–31]. The algorithms cited above, however, need to compute all the (sub)gradients.

However, the computation of all (sub)gradients can become prohibitively expensive when dealing with huge-scale optimization problems, where the decision vectors are high-dimensional. For this reason, the coordinate descent method and its variants have been proposed for efficiently solving convex optimization problems [32]. At each iteration, coordinate descent methods choose only one block of variables to update the decision vector. Thus, they reduce the memory and complexity requirements of each iteration when dealing with high-dimensional data. Coordinate descent methods have been applied to support vector machines [33], large-scale optimization problems [34–37], protein loop closure [38], regression [39], compressed sensing [40], etc. The choice of search strategy in coordinate descent methods mainly includes cyclic coordinate search [41–43] and random coordinate search [44–46]. In addition, asynchronous coordinate descent methods have also been proposed in recent years [47, 48].
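To make the block-coordinate idea concrete, the following is a minimal, self-contained Python sketch of randomized block-coordinate gradient descent on a smooth convex quadratic. It only illustrates the mechanism of updating one randomly chosen block of variables per iteration; the objective, block partition, step size, and iteration count are illustrative assumptions, not any specific method from [32–48].

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_blocks = 12, 4
blocks = np.array_split(np.arange(n), num_blocks)  # fixed partition of coordinates into blocks

# Illustrative strongly convex quadratic: f(x) = 0.5 * x^T A x - b^T x.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

def grad_f(x):
    # Full gradient, formed here only for brevity; the point of the method is
    # that each iteration updates (and, in practice, computes) only one block.
    return A @ x - b

x = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2)  # conservative 1/L step size
for _ in range(5000):
    blk = blocks[rng.integers(num_blocks)]  # pick one block uniformly at random
    x[blk] -= step * grad_f(x)[blk]         # update only that block of variables

print("distance to the minimizer:", np.linalg.norm(x - np.linalg.solve(A, b)))
```

The same mechanism, applied in the ascent direction together with a projection step, underlies the algorithm proposed in this paper.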
Despite this progress, however, stochastic block-coordinate gradient projection methods for maximizing submodular functions have barely been investigated. To fill this gap, we propose a stochastic block-coordinate gradient projection algorithm to solve stochastic continuous submodular optimization problems, which were introduced in [30]. In order to reduce the complexity and memory requirements at each iteration, we incorporate block-coordinate decomposition into the stochastic gradient projection step of the proposed algorithm. The main contributions of this paper are as follows:

(i) We propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions. In the proposed algorithm, each node chooses a random subset of the whole approximate gradient vector and updates its decision vector along the gradient ascent direction.

(ii) We show that each node asymptotically converges to some stationary points under the stochastic block-coordinate gradient projection algorithm; i.e., the estimates of all nodes converge to some stationary points with probability 1.

(iii) We investigate the convergence rate of the stochastic block-coordinate gradient projection algorithm with an approximation guarantee. When the submodular functions are DR-submodular, we prove that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $p_{\min}/2$ approximation guarantee. More generally, we show that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $(\gamma^2/(1+\gamma^2))\,p_{\min}$ approximation guarantee for weakly DR-submodular functions with parameter $\gamma$.

The remainder of this paper is organized as follows. We describe the mathematical background in Section 2. We formulate the problem of interest and propose the stochastic block-coordinate gradient projection algorithm in Section 3. In Section 4, the main results of this paper are stated. The detailed proofs of the main results are provided in Section 5. The conclusion of the paper is presented in Section 6.

2. Mathematical Background

Consider a ground set $V$ consisting of $n$ elements. A set function $F: 2^V \to \mathbb{R}_+$ is called submodular if it satisfies

$F(A) + F(B) \ge F(A \cap B) + F(A \cup B)$  (1)

for all subsets $A, B \subseteq V$. The notion of submodularity is mostly used in the discrete domain, but it can be extended to the continuous domain [49]. Consider a subset $\mathcal{X} = \prod_{i=1}^{n} \mathcal{X}_i$ of $\mathbb{R}_+^n$, where each set $\mathcal{X}_i$ is a compact subset of $\mathbb{R}_+$. A continuous function $F: \mathcal{X} \to \mathbb{R}_+$ is called a submodular continuous function if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, the following inequality holds:

$F(\mathbf{x}) + F(\mathbf{y}) \ge F(\mathbf{x} \vee \mathbf{y}) + F(\mathbf{x} \wedge \mathbf{y})$,  (2)

where $\mathbf{x} \vee \mathbf{y} := \max\{\mathbf{x}, \mathbf{y}\}$ (coordinate-wise) and $\mathbf{x} \wedge \mathbf{y} := \min\{\mathbf{x}, \mathbf{y}\}$ (coordinate-wise). Moreover, if $F(\mathbf{x}) \le F(\mathbf{y})$ for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$ with $\mathbf{x} \preceq \mathbf{y}$, then the submodular continuous function $F$ is called monotone on $\mathcal{X}$. Furthermore, a differentiable submodular continuous function $F$ is called DR-submodular if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$ such that $\mathbf{x} \preceq \mathbf{y}$, we have $\nabla F(\mathbf{x}) \succeq \nabla F(\mathbf{y})$; i.e., $\nabla F(\cdot)$ is an antitone mapping [29]. When the continuous function $F$ is twice differentiable, $F$ is submodular if and only if all off-diagonal components of its Hessian matrix are nonpositive [28]; i.e., for all $\mathbf{x} \in \mathcal{X}$,

$\frac{\partial^2 F(\mathbf{x})}{\partial x_i \partial x_j} \le 0, \quad \forall i \ne j$.  (3)

Furthermore, if the submodular function $F$ is DR-submodular, then all second derivatives are nonpositive [29]; i.e., for all $\mathbf{x} \in \mathcal{X}$ and all $i, j$,

$\frac{\partial^2 F(\mathbf{x})}{\partial x_i \partial x_j} \le 0$.  (4)

In addition, twice differentiability implies that the submodular function $F$ is smooth [50]. Moreover, we say that a submodular function $F$ is $L$-smooth if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, we have

$F(\mathbf{y}) \le F(\mathbf{x}) + \langle \nabla F(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{L}{2}\|\mathbf{y} - \mathbf{x}\|^2$.  (5)

Note that the above definition is equivalent to

$\|\nabla F(\mathbf{x}) - \nabla F(\mathbf{y})\| \le L\|\mathbf{x} - \mathbf{y}\|$.  (6)

Furthermore, a function $F$ is called a weakly DR-submodular function with parameter $\gamma$ if

$\gamma := \inf_{\mathbf{x}, \mathbf{y} \in \mathcal{X}:\, \mathbf{x} \preceq \mathbf{y}} \; \inf_{i \in \{1, \ldots, n\}} \frac{[\nabla F(\mathbf{x})]_i}{[\nabla F(\mathbf{y})]_i}$.  (7)

More details about weakly DR-submodular functions are available in [29].
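The definitions above are easy to check numerically on a small example. The following self-contained Python sketch builds a quadratic $F(\mathbf{x}) = \mathbf{b}^\top\mathbf{x} + \tfrac12\mathbf{x}^\top H\mathbf{x}$ with entrywise nonpositive $H$ (so all second derivatives are nonpositive, matching (4)), verifies the antitone-gradient property of DR-submodularity on random ordered pairs $\mathbf{x} \preceq \mathbf{y}$, and samples the gradient ratios that define the weak DR parameter $\gamma$ in (7). The particular quadratic, the box domain $[0,1]^n$, and the sampling scheme are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Quadratic F(x) = b^T x + 0.5 * x^T H x with every entry of H nonpositive:
# all second derivatives are nonpositive, so F is DR-submodular (cf. (4)),
# and b is chosen large enough that the gradient stays positive on [0, 1]^n,
# which makes F monotone there.
H = -rng.uniform(0.0, 1.0, size=(n, n))
H = (H + H.T) / 2.0
b = np.full(n, np.abs(H).sum(axis=1).max() + 1.0)

def grad_F(x):
    return b + H @ x

gamma_est = np.inf
for _ in range(10_000):
    y = rng.uniform(0.0, 1.0, size=n)
    x = y * rng.uniform(0.0, 1.0, size=n)   # x <= y coordinate-wise
    gx, gy = grad_F(x), grad_F(y)
    assert np.all(gx >= gy - 1e-12)         # antitone gradient: the DR property
    gamma_est = min(gamma_est, float(np.min(gx / gy)))  # ratios from definition (7)

# The smallest sampled ratio upper-bounds the true infimum gamma in (7);
# it is >= 1 here because F is DR-submodular.
print("smallest sampled gradient ratio:", round(gamma_est, 3))
```

For a monotone DR-submodular function every sampled ratio is at least 1; the true infimum in (7), which also allows $\mathbf{x} = \mathbf{y}$, equals 1 in that case, whereas a submodular but non-DR choice of $H$ can drive the ratios below 1.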
3. Problem Formulation and Algorithm Design

In this section, we first describe the problem of interest, and then we design an algorithm to solve it efficiently. In this paper, we focus on the following constrained optimization problem:

$\max_{\mathbf{x} \in \mathcal{K}} F(\mathbf{x}) := \mathbb{E}_{\theta \sim \mathcal{D}}[F_\theta(\mathbf{x})]$,  (8)

where $\mathcal{K}$ denotes the constraint set and $\mathcal{D}$ denotes an

Let $\mathcal{F}_t$ denote the history information of all random variables generated by the proposed algorithm (11) up to time $t$. In this paper, we adopt the following assumption on the random variables $z_i(t)$, which is stated as follows.

Assumption 1. For all $i$, $j$, $s$, $t$, the random variables $z_i(t)$ and $z_j(s)$ are independent of each other. Furthermore, the random variables $\{z_i(t)\}$ are independent of $\mathcal{F}_{t-1}$ and of $\mathbf{g}(t)$ for any decision variable $\mathbf{x} \in \mathcal{F}_{t-1}$.
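The paper's algorithm and its update rules (equations (9)–(11)) are specified later in this section and are not reproduced here. Purely as an illustration of the mechanism described in the abstract and in contribution (i), the following Python sketch runs a stochastic block-coordinate gradient projection loop: it draws a stochastic gradient of an illustrative monotone DR-submodular objective, updates a random subset of coordinates along the positive gradient direction, and projects the iterate back onto a box constraint set $\mathcal{K}$. The objective $F_\theta$, step size, block-selection probability, and box constraints are assumptions made for this example, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
lower, upper = np.zeros(n), np.ones(n)  # illustrative box constraint set K = [0, 1]^n

def stochastic_grad(x):
    # Unbiased stochastic gradient of F(x) = E_theta[F_theta(x)] for the
    # illustrative choice F_theta(x) = theta^T x - 0.5 * (theta^T x)^2 with
    # theta uniform on [0, 1/n]^n; each F_theta is monotone DR-submodular on K.
    theta = rng.uniform(0.0, 1.0, size=n) / n
    return theta - (theta @ x) * theta

def project_onto_K(x):
    # Euclidean projection onto the box K.
    return np.clip(x, lower, upper)

def sbcgp(T=2000, step=0.05, p_select=0.3):
    # Stochastic block-coordinate gradient projection sketch: at each iteration
    # only a random subset of coordinates is moved along the positive (ascent)
    # direction of a stochastic gradient, and the iterate is projected onto K.
    x = project_onto_K(np.zeros(n))
    for _ in range(T):
        g = stochastic_grad(x)
        selected = rng.random(n) < p_select  # random coordinate block
        x = project_onto_K(x + step * selected * g)
    return x

print("final iterate:", np.round(sbcgp(), 3))
```

Here `p_select` is the probability that a given coordinate is updated at an iteration; the minimum such block-selection probability appears to be what enters the stated guarantees through $p_{\min}$.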