Hindawi Complexity Volume 2018, Article ID 2609471, 11 pages https://doi.org/10.1155/2018/2609471

Research Article
Stochastic Block-Coordinate Projection Algorithms for Submodular Maximization

Zhigang Li,1 Mingchuan Zhang,2 Junlong Zhu,2 Ruijuan Zheng,2 Qikun Zhang,1 and Qingtao Wu2

1 School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
2 Information Engineering College, Henan University of Science and Technology, Luoyang 471023, China

Correspondence should be addressed to Mingchuan Zhang; zhang [email protected]

Received 25 May 2018; Accepted 26 November 2018; Published 5 December 2018

Academic Editor: Mahardhika Pratama

Copyright © 2018 Zhigang Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We consider a stochastic continuous submodular huge-scale optimization problem, which arises naturally in many applications such as machine learning and the social sciences. Due to the high-dimensional data, the computation of the whole gradient vector can become prohibitively expensive. To reduce the complexity and memory requirements, we propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions, which chooses a random subset of the gradient vector and updates the estimates along the positive gradient direction. We prove that the estimates of all nodes generated by the algorithm converge to some stationary points with probability 1. Moreover, we show that the proposed algorithm achieves the tight $((p_{\min}/2)F^* - \epsilon)$ approximation guarantee after $O(1/\epsilon^2)$ iterations for DR-submodular functions by choosing appropriate step sizes. Furthermore, we also show that the algorithm achieves the tight $((\gamma^2/(1+\gamma^2))\,p_{\min}F^* - \epsilon)$ approximation guarantee after $O(1/\epsilon^2)$ iterations for weakly DR-submodular functions with parameter $\gamma$ by choosing diminishing step sizes.

1. Introduction

In this paper, we focus on submodular function maximization, which has recently attracted significant attention in academia since submodularity is a crucial concept in combinatorial optimization. Furthermore, submodular functions have arisen in a variety of areas, such as the social sciences, algorithmic game theory, signal processing, machine learning, and computer vision. They have found many applications in applied mathematics and computer science, such as probabilistic models [1, 2], crowd teaching [3, 4], representation learning [5], data summarization [6], document summarization [7], recommender systems [8], product recommendation [9, 10], sensor placement [11], network monitoring [12, 13], the design of structured norms [14], clustering [15], dictionary learning [16], active learning [17], and utility maximization in sensor networks [18].

In submodular optimization problems, there exist many polynomial time algorithms for exactly minimizing submodular functions, such as combinatorial algorithms [19–21]. In addition, there also exist many polynomial time algorithms for approximately maximizing submodular functions with approximation guarantees, such as local search and greedy algorithms [22–25]. Despite this progress, these methods use combinatorial techniques, which have some limitations [26]. For this reason, a new approach was proposed using multilinear relaxation [27], which can lift submodular optimization problems into the continuous domain. Thus, continuous optimization techniques can be used to minimize exactly or maximize approximately submodular functions in polynomial time. Recently, much of the literature has been devoted to continuous submodular optimization [28–31]. The algorithms cited above need to compute all the (sub)gradients.

However, the computation of all (sub)gradients can become prohibitively expensive when dealing with huge-scale optimization problems, where the decision vectors are high-dimensional. For this reason, the coordinate descent method and its variants have been proposed to solve such problems efficiently [32]. At each iteration, coordinate descent methods choose only one block of variables to update their decision vectors. Thus, they can reduce the memory and complexity requirements at each iteration when dealing with high-dimensional data. Furthermore, coordinate descent methods have been applied to support vector machines [33], large-scale optimization problems [34–37], protein loop closure [38], regression [39], compressed sensing [40], etc. In coordinate descent methods, the search strategies mainly include cyclic coordinate search [41–43] and random coordinate search [44–46]. In addition, asynchronous coordinate descent methods have also been proposed in recent years [47, 48].

Despite this progress, however, stochastic block-coordinate gradient projection methods for maximizing submodular functions have barely been investigated. To fill this gap, we propose a stochastic block-coordinate gradient projection algorithm to solve stochastic continuous submodular optimization problems, which were introduced in [30]. In order to reduce the complexity and memory requirements at each iteration, we incorporate block-coordinate decomposition into the stochastic gradient projection in the proposed algorithm. The main contributions of this paper are as follows:

(i) We propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions. In the proposed algorithm, each node chooses a random subset of the whole approximate gradient vector and updates its decision vector along the gradient ascent direction.

(ii) We show that each node asymptotically converges to some stationary points by the stochastic block-coordinate gradient projection algorithm; i.e., the estimates of all nodes converge to some stationary points with probability 1.

(iii) We investigate the convergence rate of the stochastic block-coordinate gradient projection algorithm with an approximation guarantee. When the submodular functions are DR-submodular, we prove that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $p_{\min}/2$ approximation guarantee. More generally, we show that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $p_{\min}\,\gamma^2/(1+\gamma^2)$ approximation guarantee for weakly DR-submodular functions with parameter $\gamma$.

The remainder of this paper is organized as follows. We describe the mathematical background in Section 2. We formulate the problem of our interest and propose a stochastic block-coordinate gradient projection algorithm in Section 3. In Section 4, the main results of this paper are stated. The detailed proofs of the main results are provided in Section 5. The conclusion of the paper is presented in Section 6.

2. Mathematical Background

Given a ground set $V$ consisting of $n$ elements, a set function $F: 2^V \to \mathbb{R}_+$ is called submodular if
\[
F(A) + F(B) \ge F(A \cap B) + F(A \cup B) \tag{1}
\]
for all subsets $A, B \subseteq V$. The notion of submodularity is mostly used in the discrete domain, but it can be extended to the continuous domain [49]. Given a subset of $\mathbb{R}_+^n$, $\mathcal{X} = \prod_{i=1}^{n} \mathcal{X}_i$, where each set $\mathcal{X}_i$ is a compact subset of $\mathbb{R}_+$, a continuous function $F: \mathcal{X} \to \mathbb{R}_+$ is called a submodular continuous function if, for all $x, y \in \mathcal{X}$, the inequality
\[
F(x) + F(y) \ge F(x \vee y) + F(x \wedge y) \tag{2}
\]
holds, where $x \vee y := \max\{x, y\}$ (coordinate-wise) and $x \wedge y := \min\{x, y\}$ (coordinate-wise). Moreover, if $x \preceq y$ implies $F(x) \le F(y)$ for all $x, y \in \mathcal{X}$, then the submodular continuous function $F$ is called monotone on $\mathcal{X}$. Furthermore, a differentiable submodular continuous function $F$ is called DR-submodular if, for all $x, y \in \mathcal{X}$ such that $x \preceq y$, we have $\nabla F(x) \succeq \nabla F(y)$; i.e., $\nabla F(\cdot)$ is an antitone mapping [29]. When the submodular continuous function $F$ is twice differentiable, $F$ is submodular if and only if all off-diagonal components of its Hessian are nonpositive [28]; i.e., for all $x \in \mathcal{X}$,
\[
\frac{\partial^2 F(x)}{\partial x_i \partial x_j} \le 0, \quad \forall i \ne j. \tag{3}
\]
Furthermore, if the submodular function $F$ is DR-submodular, then all second derivatives are nonpositive [29]; i.e., for all $x \in \mathcal{X}$,
\[
\frac{\partial^2 F(x)}{\partial x_i \partial x_j} \le 0. \tag{4}
\]
In addition, twice differentiability implies that the submodular function $F$ is smooth [50]. Moreover, we say that a submodular function $F$ is $L$-smooth if, for all $x, y \in \mathcal{X}$, we have
\[
F(y) \le F(x) + \langle \nabla F(x),\, y - x \rangle + \frac{L}{2}\|y - x\|^2. \tag{5}
\]
Note that the above definition is equivalent to
\[
\|\nabla F(x) - \nabla F(y)\| \le L\|x - y\|. \tag{6}
\]
Furthermore, a function $F$ is called a weakly DR-submodular function with parameter $\gamma$ if
\[
\gamma := \inf_{x, y \in \mathcal{X},\, x \preceq y}\; \inf_{i \in \{1, \dots, n\}} \frac{[\nabla F(x)]_i}{[\nabla F(y)]_i}. \tag{7}
\]
More details about weakly DR-submodular functions are available in [29].
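These definitions can be checked numerically on a concrete function. The sketch below is our own illustration, not code from the paper: it takes the quadratic $F(x) = b^\top x - \tfrac{1}{2}x^\top A x$ with entrywise nonnegative $A$, whose Hessian $-A$ has only nonpositive entries, so conditions (3) and (4) hold and $F$ is DR-submodular on a box; it then verifies the antitone-gradient property and estimates the parameter $\gamma$ of (7) by sampling. All names and numerical choices are ours.

```python
import numpy as np

# Illustrative example (ours, not from the paper): a monotone quadratic
#   F(x) = b^T x - 0.5 * x^T A x
# on the box X = [0, 1]^n.  Its Hessian is -A, so if A has nonnegative
# entries every second derivative is nonpositive, i.e. F satisfies (3)-(4)
# and is DR-submodular; choosing b >= A @ 1 keeps the gradient positive,
# so F is also monotone on X.
rng = np.random.default_rng(0)
n = 5
A = rng.uniform(0.0, 1.0, size=(n, n))
A = 0.5 * (A + A.T)                    # symmetric matrix with nonnegative entries
b = A @ np.ones(n) + 1.0               # guarantees grad F >= 1 on [0,1]^n

grad_F = lambda x: b - A @ x           # gradient of F

# Check the antitone-gradient property: x <= y implies grad F(x) >= grad F(y),
# and estimate the weak DR parameter gamma of definition (7) by sampling.
# For a DR-submodular function the sampled ratios are all >= 1 and approach 1
# as y approaches x.
gamma_est = np.inf
for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.minimum(x + rng.uniform(0.0, 1.0, size=n), 1.0)  # y >= x inside X
    gx, gy = grad_F(x), grad_F(y)
    assert np.all(gx >= gy - 1e-12), "antitone gradient violated"
    gamma_est = min(gamma_est, np.min(gx / gy))

print("sampled estimate of gamma (>= 1 for a DR-submodular function):", gamma_est)
```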

3. Problem Formulation and Algorithm Design

In this section, we first describe the problem of our interest, and then we design an algorithm to efficiently solve the problem. In this paper, we focus on the following constrained optimization problem:
\[
\max_{x \in \mathcal{K}} F(x) := \mathbb{E}_{\theta \sim \mathcal{D}}\big[F_\theta(x)\big], \tag{8}
\]
where $\mathcal{K}$ denotes the constraint set, $\mathcal{D}$ denotes an unknown distribution, and $F_\theta$ is a submodular continuous function for all $\theta \in \mathcal{D}$. Moreover, we assume that the constraint set $\mathcal{K} = \prod_{i=1}^{n} \mathcal{K}_i \subseteq \mathcal{X}$ is convex, where each $\mathcal{K}_i \subseteq \mathbb{R}_+$ is a convex and closed set for all $i = 1, \dots, n$. This problem has recently been introduced in [30]. In addition, we use the notation $F^*$ to denote the optimal value of $F(x)$ over $x \in \mathcal{K}$, i.e., $F^* := \max_{x \in \mathcal{K}} F(x)$. Furthermore, we can see that the function $F(x)$ is a submodular function because each function $F_\theta$ is a submodular continuous function for all $\theta \in \mathcal{D}$ [28].

To solve problem (8), projected stochastic gradient methods are a class of efficient algorithms [31]. However, in this work we focus on the case where the decision vectors $x$ are high-dimensional; i.e., the dimensionality $n$ of the vectors is large. The full gradient computations are prohibitively expensive and become the computational bottleneck. Therefore, we propose a stochastic block-coordinate gradient projection algorithm by combining the attractive features of block-coordinate descent and stochastic gradient methods. We assume that the components of the decision variables are arbitrarily chosen but fixed for each processor. Furthermore, at each iteration, each processor randomly chooses a subset of the (stochastic) gradients, rather than all the (stochastic) gradients. The detailed description of the proposed algorithm is as follows. Starting from an initial value $x(0) \in \mathcal{K}$, for $t = 0, 1, 2, \dots$, each coordinate $i = 1, \dots, n$ updates its decision variable as
\[
x_i(t+1) = \Pi_{\mathcal{K}_i}\big( x_i(t) + \alpha(t)\, z_i(t)\, g_i(t) \big), \tag{9}
\]
where $\alpha(t)$ is the step-size, $\Pi_{\mathcal{K}_i}$ denotes the Euclidean projection onto the set $\mathcal{K}_i$, $z_i(t)$ are independent and identically distributed Bernoulli random variables with $\mathbb{P}(z_i(t) = 1) = p_i$ for all $t = 0, 1, 2, \dots$ and $i = 1, \dots, n$, and $g_i(t)$ denotes an unbiased estimate of the gradient $\nabla_i F(x(t))$, the $i$-th coordinate of $\nabla F(x(t))$.

We introduce the following matrix:
\[
Z(t) := \mathrm{diag}\{z_1(t), \dots, z_n(t)\}. \tag{10}
\]
Therefore, we can write relation (9) more compactly as
\[
x(t+1) = \Pi_{\mathcal{K}}\big( x(t) + \alpha(t)\, Z(t)\, g(t) \big), \tag{11}
\]
where $x(t) := (x_1(t), \dots, x_n(t))^{\mathrm{T}} \in \mathbb{R}_+^n$ and $g(t) := (g_1(t), \dots, g_n(t))^{\mathrm{T}} \in \mathbb{R}^n$. Note that the $i$-th coordinate of $g(t)$ is missing when $z_i(t) = 0$ at iteration $t$, and then the $i$-th coordinate of $x(t)$ is not updated. Therefore, a random subset of the coordinates of $x \in \mathbb{R}_+^n$ is updated at each iteration $t$. In addition, we use the notation $P$ to denote a diagonal matrix of size $n \times n$, i.e., $P := \mathrm{diag}\{p_{11}, \dots, p_{ii}, \dots, p_{nn}\}$, where $p_{ii} = p_i$.

Let $\mathcal{F}_t$ denote the history information of all random variables generated by the proposed algorithm (11) up to time $t$. In this paper, we adopt the following assumption on the random variables $z_i(t)$.

Assumption 1. For all $i, j \in \{1, \dots, n\}$ and $t, s \ge 0$, the random variables $z_i(t)$ and $z_j(s)$ are independent of each other. Furthermore, the random variables $\{z_i(t)\}$ are independent of $\mathcal{F}_{t-1}$ and of $g(t)$ for any decision variable $x$ measurable with respect to $\mathcal{F}_{t-1}$.

In addition, we assume that the function $F$ and the sets $\mathcal{K}_i$ satisfy the following conditions.

Assumption 2. Assume that the following properties hold:
(a) The constraint set $\mathcal{K} \subseteq \mathcal{X}$ is convex, and each set $\mathcal{K}_i \subseteq \mathbb{R}_+$ is convex and closed for all $i \in \{1, \dots, n\}$.
(b) The function $F: \mathcal{X} \to \mathbb{R}_+$ is monotone and weakly DR-submodular with parameter $\gamma$ over $\mathcal{X}$.
(c) The function $F$ is differentiable and $L$-smooth with respect to the norm $\|\cdot\|$.

Next, we make the following assumption about the stochastic oracle $g(t)$.

Assumption 3. Assume that the stochastic oracle $g(t)$ satisfies the following conditions:
\[
\mathbb{E}\big[g(t)\big] = \nabla F(x(t)) \tag{12}
\]
and
\[
\mathbb{E}\big[\| g(t) - \nabla F(x(t)) \|^2\big] \le \sigma^2. \tag{13}
\]
The above assumption implies that the stochastic oracle $g(t)$ is an unbiased estimate of $\nabla F(x(t))$.

In this section, we first formulated an optimization problem and then designed an optimization method to solve it. Moreover, we also gave some standard assumptions used to analyze the performance of the proposed method.
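As a concrete illustration of the update (9)–(11), the sketch below (ours, not the authors' code) runs the stochastic block-coordinate gradient projection step on a box constraint set $\mathcal{K} = \prod_i [0, u_i]$, for which the Euclidean projection $\Pi_{\mathcal{K}_i}$ reduces to clipping. The quadratic objective, the Gaussian noise model for the oracle of Assumption 3, and all function and parameter names are our assumptions.

```python
import numpy as np

def sbcgp_maximize(grad_oracle, x0, upper, p, alpha, T, rng):
    """Stochastic block-coordinate gradient projection, relation (11):
        x(t+1) = Pi_K( x(t) + alpha(t) * Z(t) * g(t) ),
    where Z(t) = diag(z_1(t), ..., z_n(t)) with z_i(t) ~ Bernoulli(p_i) and
    K = prod_i [0, upper_i], so the projection is coordinate-wise clipping.
    `grad_oracle(x, rng)` must return an unbiased estimate of grad F(x)."""
    x = x0.copy()
    iterates = [x.copy()]
    for t in range(T):
        g = grad_oracle(x, rng)                     # stochastic gradient g(t)
        z = (rng.random(x.size) < p).astype(float)  # Bernoulli coordinate mask z_i(t)
        x = np.clip(x + alpha(t) * z * g, 0.0, upper)  # projected partial update
        iterates.append(x.copy())
    return iterates

# --- usage on the illustrative quadratic F(x) = b^T x - 0.5 x^T A x ---
rng = np.random.default_rng(1)
n = 20
A = rng.uniform(0.0, 0.2, size=(n, n)); A = 0.5 * (A + A.T)
b = A @ np.ones(n) + 1.0
sigma = 0.1                                         # noise level of the oracle
grad_oracle = lambda x, rng: b - A @ x + sigma * rng.standard_normal(n)

L = np.linalg.norm(A, 2)                            # smoothness constant of F
alpha = lambda t: 1.0 / (L + np.sqrt(t + 1))        # alpha(t) = 1/(L + sqrt(t)), loop counts from 0
p = np.full(n, 0.25)                                # update each coordinate w.p. p_i = 0.25

iterates = sbcgp_maximize(grad_oracle, np.zeros(n), np.ones(n), p, alpha, T=2000, rng=rng)
F = lambda x: b @ x - 0.5 * x @ A @ x
print("F(x(0)) =", F(iterates[0]), " F(x(T)) =", F(iterates[-1]))
```

Because each coordinate is updated only with probability $p_i$, a single iteration touches on average $\sum_i p_i$ coordinates of the stochastic gradient, which is exactly the memory and complexity saving the algorithm is designed for.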

4. Main Results

In this section, we first provide the convergence performance of the proposed algorithm. To this end, we first introduce the definition of a stationary point, which is defined as in [31].

Definition 4. For a vector $x \in \mathcal{K}$ and a function $F: \mathcal{X} \to \mathbb{R}_+$, if $\max_{y \in \mathcal{K}} \langle y - x,\, \nabla F(x) \rangle \le 0$, then $x$ is a stationary point of $F$ over $\mathcal{K} \subset \mathcal{X}$.

From Definition 4, the convergence of our proposed algorithm is given in the following theorem.

Theorem 5. Let Assumptions 1–3 hold. Assume that the set of stationary points is nonempty and $0 < \alpha(t) < 2/L$. Moreover, the sequence $\{x(t)\}$ is generated by the stochastic block-coordinate gradient projection algorithm (11). Then, the iterative sequence $\{x(t)\}$ converges to some stationary point $x \in \mathcal{K}$ with probability 1.

The proof can be found in the next section. The above result shows that the iterates converge to some local maximum with probability 1.
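The stationarity condition of Definition 4 is easy to evaluate when $\mathcal{K}$ is a box, since the inner maximization $\max_{y \in \mathcal{K}} \langle y - x, \nabla F(x)\rangle$ decouples across coordinates: each $y_i$ simply moves to the end of its interval indicated by the sign of $[\nabla F(x)]_i$. The helper below is our own illustration of this check (names and the example values are ours).

```python
import numpy as np

def stationarity_gap(grad, x, lower, upper):
    """max_{y in K} <y - x, grad F(x)> for the box K = prod_i [lower_i, upper_i].
    The maximum is separable: pick y_i = upper_i where the gradient component is
    positive and y_i = lower_i where it is negative.  A point x is stationary in
    the sense of Definition 4 exactly when the returned value is <= 0."""
    y = np.where(grad > 0.0, upper, lower)
    return float(grad @ (y - x))

# Example: for a monotone F with strictly positive gradient, the only stationary
# point over [0, 1]^n is the upper corner x = 1 (here the third component is 0,
# so that coordinate is irrelevant to the gap).
grad = np.array([0.5, 0.2, 0.0])
print(stationarity_gap(grad, np.zeros(3), np.zeros(3), np.ones(3)))  # 0.7 > 0: not stationary
print(stationarity_gap(grad, np.ones(3),  np.zeros(3), np.ones(3)))  # 0.0: stationary
```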

Furthermore, when the function $F: \mathcal{X} \to \mathbb{R}_+$ is differentiable and DR-submodular, we have the following result.

Theorem 6. Let Assumptions 1–3 hold. Moreover, assume $\gamma = 1$ in (7) and $\alpha(t) = 1/(L + \sqrt{t})$. The sequence $\{x(t)\}$ is generated by the stochastic block-coordinate gradient projection algorithm (11). Furthermore, the random decision variable $x(\tau)$ is picked by choosing $x(1)$ and $x(T+1)$ with probability $1/(2T)$ each and the other iterates with probability $1/T$. Then, for the random variable $x(\tau)$ with $\tau \in \{1, \dots, T+1\}$, we have
\[
\mathbb{E}\big[F(x(\tau))\big] \ge \frac{p_{\min}}{2} F^* - \frac{1}{2}\Big( \frac{D^2 L}{2T} + \frac{D^2}{2\sqrt{T}} + \frac{\sigma^2}{\sqrt{T}} \Big), \tag{14}
\]
where $D^2 := \sup_{x, y \in \mathcal{K}} \|x - y\|^2$ and $p_{\min} := \min_{i \in \{1, \dots, n\}} p_i$.

The proof can be found in the next section. From the above result, we can see that an objective value in expectation can be obtained after $O(D^2 L/\epsilon + (D^2 + \sigma^2)/\epsilon^2)$ iterations of the stochastic block-coordinate gradient projection algorithm (11) for any initial value. Moreover, the objective value is at least $(p_{\min}/2)F^* - \epsilon$ for any DR-submodular function.

In addition, when the function $F: \mathcal{X} \to \mathbb{R}_+$ is a weakly DR-submodular function with parameter $\gamma$, we also obtain the following result.

Theorem 7. Let Assumptions 1–3 hold. The sequence $\{x(t)\}$ is generated by the stochastic block-coordinate gradient projection algorithm (11) with $\alpha(t) = 1/(L + \sqrt{t})$. Furthermore, the random decision variable $x(\tau)$ is picked by choosing $x(\tau)$ in $\{x(1), \dots, x(T)\}$ with probability $1/T$. Then, for any $\tau \in \{1, \dots, T\}$, we have
\[
\mathbb{E}\big[F(x(\tau))\big] \ge \frac{\gamma^2}{1+\gamma^2}\Big( p_{\min} - \frac{1}{\gamma T} \Big) F^* - \frac{\gamma^2}{1+\gamma^2}\Big( \frac{D^2 (L + \sqrt{T})}{2T} + \frac{\sigma^2}{\sqrt{T}} \Big), \tag{15}
\]
where $p_{\min} := \min_{i \in \{1, \dots, n\}} p_i$ and $D^2 := \sup_{x, y \in \mathcal{K}} \|x - y\|^2$.

The proof can be found in the next section. Note that the stochastic block-coordinate gradient projection algorithm yields an objective value after $O(D^2 L/\epsilon + (D^2 + \sigma^2)/\epsilon^2)$ iterations from any initial value. Furthermore, the expectation of the objective value is at least $(\gamma^2/(1+\gamma^2))\,p_{\min} F^* - \epsilon$ for any weakly DR-submodular function.
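Both theorems report the guarantee for a randomly drawn iterate $x(\tau)$ rather than the final one, together with the diminishing step size $\alpha(t) = 1/(L + \sqrt{t})$. The short sketch below (ours, for illustration only) encodes these two ingredients as stated above; it is not part of the paper.

```python
import numpy as np

def alpha(t, L):
    """Diminishing step size alpha(t) = 1/(L + sqrt(t)) used in Theorems 6 and 7."""
    return 1.0 / (L + np.sqrt(t))

def sample_output_index(T, rng, dr_submodular=True):
    """Return the index tau of the reported iterate x(tau).

    Theorem 6 (DR-submodular case): tau ranges over {1, ..., T+1}; the endpoints
    x(1) and x(T+1) are chosen with probability 1/(2T) each and every other
    iterate with probability 1/T.
    Theorem 7 (weakly DR-submodular case): tau is uniform over {1, ..., T}."""
    if dr_submodular:
        idx = np.arange(1, T + 2)
        prob = np.full(T + 1, 1.0 / T)
        prob[0] = prob[-1] = 1.0 / (2.0 * T)   # the probabilities sum to 1
        return int(rng.choice(idx, p=prob))
    return int(rng.integers(1, T + 1))

rng = np.random.default_rng(2)
print(alpha(1, L=2.0), sample_output_index(100, rng))
```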

5. Performance Analysis

In this section, the detailed proofs of the main results are provided. We first analyze the convergence performance of the stochastic block-coordinate gradient projection algorithm.

Proof of Theorem 5. By the Projection Theorem [32], we have
\[
\langle x - \Pi_{\mathcal{K}}(x),\, z - \Pi_{\mathcal{K}}(x) \rangle \le 0 \tag{16}
\]
for all $z \in \mathcal{K}$. Therefore, letting $z = x(t)$ and $x = x(t) + \alpha(t) Z(t) g(t)$ in inequality (16), we obtain
\[
\langle x(t+1) - x(t) - \alpha(t) Z(t) g(t),\, x(t+1) - x(t) \rangle \le 0, \tag{17}
\]
where we have used relation (11). By simple algebraic manipulations, we yield
\[
\|x(t+1) - x(t)\|^2 \le \langle x(t+1) - x(t),\, \alpha(t) Z(t) g(t) \rangle. \tag{18}
\]
Furthermore, when $z_i(t) = 0$ for any $i \in \{1, \dots, n\}$, we have $x_i(t+1) = x_i(t)$ at iteration $t \ge 0$. Therefore,
\[
\langle x(t+1) - x(t),\, \alpha(t)\,(Z(t) - I)\, g(t) \rangle = 0. \tag{19}
\]
From the above relation, we also obtain
\[
\langle x(t+1) - x(t),\, \alpha(t) Z(t) g(t) \rangle = \langle x(t+1) - x(t),\, \alpha(t) g(t) \rangle. \tag{20}
\]
Plugging relation (20) into inequality (18), we have
\[
\|x(t+1) - x(t)\|^2 \le \langle x(t+1) - x(t),\, \alpha(t) g(t) \rangle. \tag{21}
\]
Taking conditional expectation in (21), we have
\[
\mathbb{E}\big[\|x(t+1) - x(t)\|^2 \mid \mathcal{F}_t\big] \le \alpha(t)\,(\nabla F(x(t)))^{\mathrm{T}}\, \mathbb{E}\big[x(t+1) - x(t) \mid \mathcal{F}_t\big], \tag{22}
\]
where we have used $\mathbb{E}[g(t)] = \nabla F(x(t))$ in the last inequality. In addition, since the function $F$ is $L$-smooth, we have
\[
F(x(t+1)) \ge F(x(t)) + \langle x(t+1) - x(t),\, \nabla F(x(t)) \rangle - \frac{L}{2}\|x(t+1) - x(t)\|^2. \tag{23}
\]
Taking conditional expectation on $\mathcal{F}_t$ in (23) and using relation (22), we obtain
\[
\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] \ge F(x(t)) + \Big( \frac{1}{\alpha(t)} - \frac{L}{2} \Big)\mathbb{E}\big[\|x(t+1) - x(t)\|^2 \mid \mathcal{F}_t\big] \tag{24}
\]
for step-size $\alpha(t) \in (0, 2/L)$. For brevity, let $c := 1/\alpha(t) - L/2$. Inequality (24) implies that
\[
F^* - \mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] \le F^* - F(x(t)) - c \cdot \mathbb{E}\big[\|x(t+1) - x(t)\|^2 \mid \mathcal{F}_t\big]. \tag{25}
\]
From the definition of $F^*$, we have $F^* \ge F(x(t))$ for $t \ge 0$; i.e., the sequence of random variables $\{F^* - F(x(t))\}$ is nonnegative for all $t \ge 0$. Therefore, according to the Supermartingale Convergence Theorem [51], the sequence $\{F(x(t))\}$ is convergent with probability 1. Furthermore, we also have
\[
\sum_{t=0}^{\infty} \mathbb{E}\big[\|x(t+1) - x(t)\|^2 \mid \mathcal{F}_t\big] < \infty \tag{26}
\]
with probability 1. From relation (11), inequality (26) implies that
\[
\sum_{t=0}^{\infty} \big\| \Pi_{\mathcal{K}}\big(x(t) + \alpha(t)\nabla F(x(t))\big) - x(t) \big\|_P^2 < \infty \tag{27}
\]
with probability 1, where $P := \mathrm{diag}\{p_1, p_2, \dots, p_n\}$. Therefore, we obtain
\[
\lim_{t \to \infty} \Big( \Pi_{\mathcal{K}}\big(x(t) + \alpha(t)\nabla F(x(t))\big) - x(t) \Big) = 0 \tag{28}
\]
with probability 1. Thus, there exists a subsequence $\{x(t_\ell)\}$ which converges to $x$. Then, we have
\[
\lim_{\ell \to \infty} \Pi_{\mathcal{K}}\big(x(t_\ell) + \alpha(t)\nabla F(x(t_\ell))\big) = x. \tag{29}
\]
Since the gradient projection operation is continuous, we have
\[
\Pi_{\mathcal{K}}\big(x + \alpha(t)\nabla F(x)\big) = x \tag{30}
\]
with probability 1. The above relation implies that
\[
\Pi_{\mathcal{K} - \{x\}}\big(\alpha(t)\nabla F(x)\big) = 0 \tag{31}
\]
with probability 1. Then, relation (31) implies that $\max_{z \in \mathcal{K}} \langle z - x,\, \nabla F(x) \rangle \le 0$. Therefore, $x$ is a stationary point of $F(\cdot)$ over $\mathcal{K}$ with probability 1. The statement of the theorem is completely proved.

To prove Theorem 6, we first present the following lemmas. The first lemma follows from [52], which is stated as follows.

Lemma 8. For all $z \in \mathcal{K}$, we have
\[
\langle x - \Pi_{\mathcal{K}}(x),\, z - \Pi_{\mathcal{K}}(x) \rangle_W \le 0 \tag{32}
\]
for any diagonal matrix $W$ with positive diagonal entries, where $\langle a, b \rangle_W := a^{\mathrm{T}} W b$ denotes the weighted inner product.

The next lemma is due to [53], which is stated as follows.

Lemma 9. Assume that a function $F$ is submodular and monotone. Then, we have
\[
\langle x - y,\, \nabla F(x) \rangle \le 2F(x) - F(x \vee y) - F(x \wedge y) \tag{33}
\]
for any points $x, y \in \mathcal{K}$.

In addition, we also have the following lemma.

Lemma 10. Let Assumptions 1–3 hold. The iterative sequence $\{x(t)\}$ is generated by the stochastic block-coordinate gradient projection algorithm (11) with step-size $\alpha(t) \le \beta(t)/(1 + L\beta(t))$, where $\beta(t) > 0$ for $t \ge 0$. Then, we have
\[
\begin{aligned}
F(x(t+1)) \ge{}& F(x(t)) - \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2 + p_{\min}\big\langle x(t+1) - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} \\
&- \frac{p_{\min}}{2}\Big( L + \frac{1}{\beta(t)} \Big)\|x(t+1) - x(t)\|_{P^{-1}}^2,
\end{aligned} \tag{34}
\]
where $p_{\min} := \min_{i \in \{1, \dots, n\}} p_i$ and $\|a\|_{P^{-1}}^2 := a^{\mathrm{T}} P^{-1} a$.

Proof. In inequality (21), we let $Z(t) = E_i$, where $E_i$ denotes the diagonal matrix with the $i$-th entry equal to 1 and the other entries equal to 0. Then, we have
\[
\frac{1}{\alpha(t)}\big\|\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big\|^2 - \big(\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big)\, g_i(t) \le 0. \tag{35}
\]
Furthermore, the above relation implies that
\[
\big(\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big)\, g_i(t) - \frac{1}{2\alpha(t)}\big\|\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big\|^2 \ge 0. \tag{36}
\]
Therefore, for any $i$, we obtain
\[
\begin{aligned}
&\big(\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big)\, g_i(t) - \frac{1}{2\alpha(t)}\big\|\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big\|^2 \\
&\ge \frac{p_{\min}}{p_i}\big(\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big)\, g_i(t) - \frac{p_{\min}}{2\alpha(t) p_i}\big\|\Pi_{\mathcal{K}_i}\big(x_i(t) + \alpha(t) g_i(t)\big) - x_i(t)\big\|^2,
\end{aligned} \tag{37}
\]
where in the last inequality we have used $p_{\min}/p_i \le 1$. Since $0 \le p_i \le 1$ for all $i \in \{1, \dots, n\}$, combining (37) with relation (23), we have
\[
\begin{aligned}
F(x(t+1)) - F(x(t)) \ge{}& \langle x(t+1) - x(t),\, \nabla F(x(t)) \rangle - \frac{L}{2}\|x(t+1) - x(t)\|^2 \\
={}& \langle x(t+1) - x(t),\, \nabla F(x(t)) - g(t) \rangle + \langle x(t+1) - x(t),\, g(t) \rangle - \frac{L}{2}\|x(t+1) - x(t)\|^2 \\
\ge{}& p_{\min}\big\langle x(t+1) - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} - \frac{p_{\min}}{2}\Big( L + \frac{1}{\beta(t)} \Big)\|x(t+1) - x(t)\|_{P^{-1}}^2 \\
&- \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2,
\end{aligned} \tag{38}
\]
where the last inequality is obtained by using inequality (37), Young's inequality, and the fact that $x_i(t+1) = x_i(t)$ when $z_i(t) = 0$ for all $i \in \{1, \dots, n\}$; moreover, $0 \le p_i \le 1$ for all $i \in \{1, \dots, n\}$. Rearranging the terms in (38), the lemma is obtained completely.

With Lemmas 8 and 10 in place, we have the following result.

Lemma 11. Let Assumptions 1–3 hold. The iterative sequence $\{x(t)\}$ is generated by the stochastic block-coordinate gradient projection algorithm (11) with step-size $\alpha(t) \le \beta(t)/(1 + L\beta(t))$. Then, for all $t \ge 0$ and any $v \in \mathcal{K}$, we have
\[
\begin{aligned}
\mathbb{E}\big[\langle x(t) - v,\, g(t) \rangle \mid \mathcal{F}_t\big] \ge{}& \frac{1}{p_{\min}} F(x(t)) - \frac{1}{p_{\min}}\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] - \frac{\beta(t)}{2p_{\min}}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] \\
&+ \frac{1}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - v\|_{P^{-1}}^2 - \|x(t) - v\|_{P^{-1}}^2 \mid \mathcal{F}_t\big],
\end{aligned} \tag{39}
\]
where $p_{\min} = \min_{i \in \{1, \dots, n\}} p_i$.

Proof. From the result in Lemma 10, we have
\[
\begin{aligned}
F(x(t+1)) \ge{}& F(x(t)) - \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2 + p_{\min}\big\langle v - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} \\
&+ p_{\min}\big\langle x(t+1) - v,\, Z(t) g(t) \big\rangle_{P^{-1}} - \frac{p_{\min}}{2}\Big( L + \frac{1}{\beta(t)} \Big)\|x(t+1) - x(t)\|_{P^{-1}}^2. 
\end{aligned} \tag{40}
\]
In addition, following from Lemma 8, we also obtain
\[
\big\langle x(t) + \alpha(t) Z(t) g(t) - x(t+1),\, v - x(t+1) \big\rangle_{P^{-1}} \le 0, \tag{41}
\]
which implies that
\[
\big\langle x(t+1) - v,\, Z(t) g(t) \big\rangle_{P^{-1}} \ge \frac{1}{\alpha(t)}\big\langle x(t+1) - v,\, x(t+1) - x(t) \big\rangle_{P^{-1}}. \tag{42}
\]
Combining inequalities (40) and (42), we yield
\[
\begin{aligned}
F(x(t+1)) \ge{}& F(x(t)) - \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2 + p_{\min}\big\langle v - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} \\
&+ \frac{p_{\min}}{\alpha(t)}\big\langle x(t+1) - v,\, x(t+1) - x(t) \big\rangle_{P^{-1}} - \frac{p_{\min}}{2}\Big( L + \frac{1}{\beta(t)} \Big)\|x(t+1) - x(t)\|_{P^{-1}}^2 \\
={}& F(x(t)) - \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2 + p_{\min}\big\langle v - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} \\
&+ \frac{p_{\min}}{2\alpha(t)}\big( \|x(t+1) - v\|_{P^{-1}}^2 - \|x(t) - v\|_{P^{-1}}^2 \big) + \frac{p_{\min}}{2}\Big( \frac{1}{\alpha(t)} - L - \frac{1}{\beta(t)} \Big)\|x(t+1) - x(t)\|_{P^{-1}}^2 \\
\ge{}& F(x(t)) - \frac{\beta(t)}{2}\big\|\nabla F(x(t)) - g(t)\big\|^2 + p_{\min}\big\langle v - x(t),\, Z(t) g(t) \big\rangle_{P^{-1}} \\
&+ \frac{p_{\min}}{2\alpha(t)}\big( \|x(t+1) - v\|_{P^{-1}}^2 - \|x(t) - v\|_{P^{-1}}^2 \big),
\end{aligned} \tag{43}
\]
where the last inequality is due to $\alpha(t) \le \beta(t)/(1 + L\beta(t))$. Taking conditional expectation of the above inequality on $\mathcal{F}_t$, we yield
\[
\begin{aligned}
\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] \ge{}& \mathbb{E}\big[F(x(t)) \mid \mathcal{F}_t\big] - \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] + p_{\min}\mathbb{E}\big[\langle v - x(t),\, g(t) \rangle \mid \mathcal{F}_t\big] \\
&+ \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - v\|_{P^{-1}}^2 - \|x(t) - v\|_{P^{-1}}^2 \mid \mathcal{F}_t\big].
\end{aligned} \tag{44}
\]
Thus, by some algebraic manipulations, inequality (39) is obtained.

Next, we start to prove Theorem 6.

Proof of Theorem 6. Setting $v = x^*$ in Lemma 11, where $x^*$ is the globally optimal solution of problem (8), i.e., $x^* := \arg\max_{x \in \mathcal{K}} F(x)$, we have
\[
\begin{aligned}
\mathbb{E}\big[\langle x(t) - x^*,\, g(t) \rangle \mid \mathcal{F}_t\big] \ge{}& \frac{1}{p_{\min}} F(x(t)) - \frac{1}{p_{\min}}\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] - \frac{\beta(t)}{2p_{\min}}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] \\
&+ \frac{1}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2 \mid \mathcal{F}_t\big].
\end{aligned} \tag{45}
\]
Since
\[
\langle x(t) - x^*,\, g(t) \rangle = \langle x(t) - x^*,\, \nabla F(x(t)) \rangle + \langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle, \tag{46}
\]
taking conditional expectation of (46) with respect to $\mathcal{F}_t$ and using (45), we have
\[
\begin{aligned}
\mathbb{E}\big[\langle x(t) - x^*,\, \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] ={}& \mathbb{E}\big[\langle x(t) - x^*,\, g(t) \rangle \mid \mathcal{F}_t\big] - \mathbb{E}\big[\langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] \\
\ge{}& -\mathbb{E}\big[\langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] + \frac{1}{p_{\min}} F(x(t)) - \frac{1}{p_{\min}}\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] \\
&- \frac{\beta(t)}{2p_{\min}}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] + \frac{1}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2 \mid \mathcal{F}_t\big],
\end{aligned} \tag{47}
\]
which implies that
\[
\begin{aligned}
\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] \ge{}& F(x(t)) - p_{\min}\mathbb{E}\big[\langle x(t) - x^*,\, \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] - p_{\min}\mathbb{E}\big[\langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] \\
&- \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] + \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2 \mid \mathcal{F}_t\big].
\end{aligned} \tag{48}
\]
Setting $x = x(t)$ and $y = x^*$ in Lemma 9 and taking conditional expectation on $\mathcal{F}_t$, we obtain
\[
\mathbb{E}\big[\langle x(t) - x^*,\, \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] \le 2\mathbb{E}\big[F(x(t)) \mid \mathcal{F}_t\big] - \mathbb{E}\big[F(x(t) \vee x^*) \mid \mathcal{F}_t\big] - \mathbb{E}\big[F(x(t) \wedge x^*) \mid \mathcal{F}_t\big]. \tag{49}
\]
Thus, plugging inequality (49) into relation (48) and using the monotonicity and nonnegativity of $F$, we get
\[
\begin{aligned}
&\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] + \mathbb{E}\big[F(x(t)) \mid \mathcal{F}_t\big] - p_{\min} F(x^*) \\
&\ge -p_{\min}\mathbb{E}\big[\langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] - \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] \\
&\quad + \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2 \mid \mathcal{F}_t\big].
\end{aligned} \tag{50}
\]
Taking expectation in (50) and using some algebraic manipulations, we have
\[
\begin{aligned}
\mathbb{E}\big[F(x(t+1))\big] + \mathbb{E}\big[F(x(t))\big] - p_{\min} F(x^*) &\ge \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2\big] - \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2\big] \\
&\ge \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2\big] - \frac{\sigma^2}{2}\beta(t),
\end{aligned} \tag{51}
\]
where we have used the relation $\mathbb{E}[g(t)] = \nabla F(x(t))$ to obtain the first inequality and Assumption 3 to obtain the second. Summing both sides of (51) for $t = 1, \dots, T$, we obtain
\[
\begin{aligned}
&\sum_{t=1}^{T}\Big( \mathbb{E}\big[F(x(t+1))\big] + \mathbb{E}\big[F(x(t))\big] - p_{\min} F(x^*) \Big) \\
&\ge \frac{p_{\min}}{2\alpha(T)}\mathbb{E}\big[\|x(T+1) - x^*\|_{P^{-1}}^2\big] - \frac{p_{\min}}{2\alpha(1)}\mathbb{E}\big[\|x(1) - x^*\|_{P^{-1}}^2\big] - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t) \\
&\quad + \frac{p_{\min}}{2}\sum_{t=1}^{T-1}\Big( \frac{1}{\alpha(t)} - \frac{1}{\alpha(t+1)} \Big)\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2\big] \\
&\ge -\frac{D^2}{2\alpha(1)} + \frac{D^2}{2}\sum_{t=1}^{T-1}\Big( \frac{1}{\alpha(t)} - \frac{1}{\alpha(t+1)} \Big) - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t) \ge -\frac{D^2}{2\alpha(T)} - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t),
\end{aligned} \tag{52}
\]
where in the last two inequalities we have used the fact that $\mathbb{E}[\|x(t) - x^*\|_{P^{-1}}^2] \le D^2/p_{\min}$ and $1/\alpha(t) - 1/\alpha(t+1) \le 0$ for all $t = 1, \dots, T$. On the other hand, we also have
\[
\begin{aligned}
\sum_{t=1}^{T}\Big( 2\mathbb{E}\big[F(x(t))\big] - p_{\min} F^* \Big) &= \mathbb{E}\big[F(x(1))\big] - \mathbb{E}\big[F(x(T+1))\big] + \sum_{t=1}^{T}\Big( \mathbb{E}\big[F(x(t+1))\big] + \mathbb{E}\big[F(x(t))\big] - p_{\min} F(x^*) \Big) \\
&\ge -\frac{D^2}{2\alpha(T)} - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t) - \mathbb{E}\big[F(x(T+1))\big],
\end{aligned} \tag{53}
\]
where $F^* = \max_{x \in \mathcal{K}} F(x)$ and the last inequality is due to (52) and $\mathbb{E}[F(x(1))] \ge 0$. Since $\alpha(t) = 1/(L + \sqrt{t})$, we can take $\beta(t) = 1/\sqrt{t}$, which satisfies $\alpha(t) = \beta(t)/(1 + L\beta(t))$, and we have
\[
\sum_{t=1}^{T}\beta(t) = \sum_{t=1}^{T}\frac{1}{\sqrt{t}} \le \int_{0}^{T}\frac{1}{\sqrt{t}}\,\mathrm{d}t = 2\sqrt{T}. \tag{54}
\]
Plugging the above inequality into (53) and dividing both sides by $2T$, we have
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\big[F(x(t))\big] + \frac{1}{2T}\mathbb{E}\big[F(x(T+1))\big] \ge \frac{p_{\min}}{2}F^* - \frac{D^2}{4T}\big(L + \sqrt{T}\big) - \frac{\sigma^2}{2\sqrt{T}}, \tag{55}
\]
where we have used the fact that $\alpha(T) = 1/(L + \sqrt{T})$ in the last inequality. Furthermore, keeping the term $\mathbb{E}[F(x(1))]$ in (53) instead of dropping it, the same argument yields
\[
\frac{1}{T}\sum_{t=2}^{T}\mathbb{E}\big[F(x(t))\big] + \frac{1}{2T}\Big( \mathbb{E}\big[F(x(T+1))\big] + \mathbb{E}\big[F(x(1))\big] \Big) \ge \frac{p_{\min}}{2}F^* - \frac{D^2}{4T}\big(L + \sqrt{T}\big) - \frac{\sigma^2}{2\sqrt{T}}. \tag{56}
\]
In addition, the sample $x(\tau)$ is obtained for $\tau \in \{1, \dots, T+1\}$ by choosing $x(1)$ and $x(T+1)$ with probability $1/(2T)$ each and the other decision vectors with probability $1/T$; we have
\[
\mathbb{E}\big[F(x(\tau))\big] \ge \frac{p_{\min}}{2}F^* - \frac{D^2}{4T}\big(L + \sqrt{T}\big) - \frac{\sigma^2}{2\sqrt{T}}. \tag{57}
\]
Therefore, the theorem is completely proved.

We now start to prove Theorem 7.

Proof of Theorem 7. From the definition of a weakly DR-submodular function, for any $x, y \in \mathcal{K}$, we obtain
\[
\nabla F(x) \succeq \gamma\,\nabla F(y) \tag{58}
\]
for all $x \preceq y$. Recall the following relation from [31]; i.e., for any $x, y \in \mathcal{K}$,
\[
\langle x - y,\, \nabla F(x) \rangle \le \Big( \gamma + \frac{1}{\gamma} \Big) F(x) - \gamma F(y). \tag{59}
\]
Setting $x = x(t)$ and $y = x^*$ in the above inequality, then, following from (48) and taking conditional expectation, we obtain
\[
\begin{aligned}
&\mathbb{E}\big[F(x(t+1)) \mid \mathcal{F}_t\big] + \Big( \gamma + \frac{1}{\gamma} - 1 \Big)\mathbb{E}\big[F(x(t)) \mid \mathcal{F}_t\big] - p_{\min}\gamma F(x^*) \\
&\ge -p_{\min}\mathbb{E}\big[\langle x(t) - x^*,\, g(t) - \nabla F(x(t)) \rangle \mid \mathcal{F}_t\big] - \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2 \mid \mathcal{F}_t\big] \\
&\quad + \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2 \mid \mathcal{F}_t\big].
\end{aligned} \tag{60}
\]
Taking expectation in (60) with respect to $\mathcal{F}_t$, we get
\[
\begin{aligned}
&\mathbb{E}\big[F(x(t+1))\big] + \Big( \gamma + \frac{1}{\gamma} - 1 \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F(x^*) \\
&\ge \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2\big] - \frac{\beta(t)}{2}\mathbb{E}\big[\|\nabla F(x(t)) - g(t)\|^2\big] \\
&\ge \frac{p_{\min}}{2\alpha(t)}\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2 - \|x(t) - x^*\|_{P^{-1}}^2\big] - \frac{\sigma^2}{2}\beta(t),
\end{aligned} \tag{61}
\]
where the last inequality is due to Assumption 3. Adding the above inequalities for $t = 1, \dots, T$, we have
\[
\begin{aligned}
&\sum_{t=1}^{T}\Big( \mathbb{E}\big[F(x(t+1))\big] + \Big( \gamma + \frac{1}{\gamma} - 1 \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F(x^*) \Big) \\
&\ge \frac{p_{\min}}{2\alpha(T)}\mathbb{E}\big[\|x(T+1) - x^*\|_{P^{-1}}^2\big] - \frac{p_{\min}}{2\alpha(1)}\mathbb{E}\big[\|x(1) - x^*\|_{P^{-1}}^2\big] - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t) \\
&\quad + \frac{p_{\min}}{2}\sum_{t=1}^{T-1}\Big( \frac{1}{\alpha(t)} - \frac{1}{\alpha(t+1)} \Big)\mathbb{E}\big[\|x(t+1) - x^*\|_{P^{-1}}^2\big] \\
&\ge -\frac{D^2}{2\alpha(1)} + \frac{D^2}{2}\sum_{t=1}^{T-1}\Big( \frac{1}{\alpha(t)} - \frac{1}{\alpha(t+1)} \Big) - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t) \ge -\frac{D^2}{2\alpha(T)} - \frac{\sigma^2}{2}\sum_{t=1}^{T}\beta(t),
\end{aligned} \tag{62}
\]
where the last inequalities follow from $\mathbb{E}[\|x(t) - x^*\|_{P^{-1}}^2] \le D^2/p_{\min}$ and $1/\alpha(t) - 1/\alpha(t+1) \le 0$ for all $t = 1, \dots, T$. Moreover, since $\beta(t) = 1/\sqrt{t}$ and $\alpha(t) = 1/(L + 1/\beta(t))$, we obtain
\[
\sum_{t=1}^{T}\Big( \mathbb{E}\big[F(x(t+1))\big] + \Big( \gamma + \frac{1}{\gamma} - 1 \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F(x^*) \Big) \ge -\frac{D^2}{2}\big(L + \sqrt{T}\big) - \sigma^2\sqrt{T}, \tag{63}
\]
where we have used inequality (54) to obtain the last inequality. On the other hand, we have
\[
\begin{aligned}
\sum_{t=1}^{T}\Big[ \Big( \gamma + \frac{1}{\gamma} \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F^* \Big] ={}& \mathbb{E}\big[F(x(1))\big] + \sum_{t=1}^{T}\mathbb{E}\big[F(x(t+1))\big] - \mathbb{E}\big[F(x(T+1))\big] \\
&+ \sum_{t=1}^{T}\Big[ \Big( \gamma + \frac{1}{\gamma} - 1 \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F(x^*) \Big].
\end{aligned} \tag{64}
\]
Plugging inequality (63) into equality (64) and using $\mathbb{E}[F(x(1))] \ge 0$ and $\mathbb{E}[F(x(T+1))] \le F^*$, we get
\[
\sum_{t=1}^{T}\Big[ \Big( \gamma + \frac{1}{\gamma} \Big)\mathbb{E}\big[F(x(t))\big] - p_{\min}\gamma F^* \Big] \ge -\frac{D^2}{2}\big(L + \sqrt{T}\big) - \sigma^2\sqrt{T} - F^*. \tag{65}
\]
Dividing both sides of (65) by $(\gamma + 1/\gamma)T$, we have
\[
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\big[F(x(t))\big] \ge \frac{\gamma^2 p_{\min}}{1+\gamma^2}F^* - \frac{\gamma^2}{1+\gamma^2}\Big( \frac{D^2(L + \sqrt{T})}{2T} + \frac{\sigma^2}{\sqrt{T}} + \frac{F^*}{T} \Big). \tag{66}
\]
In addition, we obtain the sample $x(\tau)$ by choosing $x(t)$ with probability $1/T$. Then, for any $\tau \in \{1, \dots, T\}$, we have
\[
\mathbb{E}\big[F(x(\tau))\big] \ge \frac{\gamma^2 p_{\min}}{1+\gamma^2}F^* - \frac{\gamma^2}{1+\gamma^2}\Big( \frac{D^2(L + \sqrt{T})}{2T} + \frac{\sigma^2}{\sqrt{T}} + \frac{F^*}{T} \Big). \tag{67}
\]
Therefore, the theorem is obtained completely.

In this section, we proved the main results of the paper in detail. The conclusion of this paper is provided in the next section.

6. Conclusion

In this paper, we have considered a stochastic optimization problem of continuous submodular functions, which is an important problem in many areas such as machine learning and social science. Since the data is high-dimensional, usual algorithms based on the computation of the whole approximate gradient vector, such as stochastic gradient methods, are prohibitive. For this reason, we proposed the stochastic block-coordinate gradient projection algorithm for maximizing submodular functions, which randomly chooses a subset of the approximate gradient vector. Moreover, we studied the convergence performance of the proposed algorithm. We proved that the iterates converge to some stationary points with probability 1 by using suitable step sizes. Furthermore, we showed that the algorithm achieves the tight $((p_{\min}/2)F^* - \epsilon)$ approximation guarantee after $O(1/\epsilon^2)$ iterations when the submodular functions are DR-submodular and suitable step sizes are used. More generally, we also showed that the algorithm achieves the tight $((\gamma^2/(1+\gamma^2))\,p_{\min}F^* - \epsilon)$ guarantee after $O(1/\epsilon^2)$ iterations when the submodular functions are weakly DR-submodular with parameter $\gamma$ and appropriate step sizes are used.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants no. U1604155, no. 61602155, no. 61871430, no. 61772477, and no. 61572445; in part by the Henan Science and Technology Innovation Project under Grant no. 174100510010; in part by the basic research projects at the University of Henan Province under Grant no. 19zx010; in part by the Ph.D. Research Fund of the Zhengzhou University of Light Industry; and in part by the Natural Science Foundation of Henan Province under Grant no. 162300410322.

References

[1] J. Djolonga and A. Krause, "From MAP to marginals: Variational inference in Bayesian submodular models," in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 244–252, Canada, December 2014.
[2] R. Iyer and J. Bilmes, "Submodular point processes with applications to machine learning," in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pp. 388–397, 2015.
[3] A. Singla, I. Bogunovic, A. Karbasi, and A. Krause, "Near-optimally teaching the crowd to classify," in Proceedings of the 31st International Conference on Machine Learning, pp. 154–162, 2014.
[4] B. Kim, R. Khanna, and O. Koyejo, "Examples are not enough, learn to criticize! Criticism for interpretability," in Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), pp. 2288–2296, Spain, December 2016.
[5] V. Cevher and A. Krause, "Greedy dictionary selection for sparse representation," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 979–988, 2011.
[6] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, "Distributed submodular maximization: Identifying representative elements in massive data," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS 2013), USA, December 2013.
[7] H. Lin and J. Bilmes, "A class of submodular functions for document summarization," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), pp. 510–520, USA, June 2011.

[8] Y. Yue and C. Guestrin, "Linear submodular bandits and their application to diversified retrieval," in Advances in Neural Information Processing Systems, pp. 2483–2491, 2011.
[9] K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin, "Turning down the noise in the blogosphere," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 289–297, France, July 2009.
[10] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi, "Fast constrained submodular maximization: Personalized data summarization," in Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), pp. 2042–2054, USA, June 2016.
[11] C. Guestrin, A. Krause, and A. P. Singh, "Near-optimal sensor placements in Gaussian processes," in Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pp. 265–272, August 2005.
[12] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, "Cost-effective outbreak detection in networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 420–429, New York, NY, USA, August 2007.
[13] R. Gomez, J. Leskovec, and A. Krause, "Inferring networks of diffusion and influence," ACM Transactions on Knowledge Discovery from Data, vol. 5, no. 4, pp. 1–37, 2012.
[14] F. Bach, "Structured sparsity-inducing norms through submodular functions," in Advances in Neural Information Processing Systems, pp. 118–126, 2010.
[15] M. Narasimhan, N. Jojic, and J. Bilmes, "Q-clustering," in Proceedings of the 2005 Annual Conference on Neural Information Processing Systems (NIPS 2005), pp. 979–986, Canada, December 2005.
[16] A. Das and D. Kempe, "Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection," in Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pp. 1057–1064, USA, July 2011.
[17] D. Golovin and A. Krause, "Adaptive submodularity: theory and applications in active learning and stochastic optimization," Journal of Artificial Intelligence Research, vol. 42, pp. 427–486, 2011.
[18] Z. Zheng and N. B. Shroff, "Submodular utility maximization for deadline constrained data collection in sensor networks," IEEE Transactions on Automatic Control, vol. 59, no. 9, pp. 2400–2412, 2014.
[19] S. Iwata, L. Fleischer, and S. Fujishige, "A combinatorial strongly polynomial algorithm for minimizing submodular functions," Journal of the ACM, vol. 48, no. 4, pp. 761–777, 2001.
[20] J. B. Orlin, "A faster strongly polynomial time algorithm for submodular function minimization," Mathematical Programming, vol. 118, no. 2, Ser. A, pp. 237–251, 2009.
[21] A. Schrijver, "A combinatorial algorithm minimizing submodular functions in strongly polynomial time," Journal of Combinatorial Theory, Series B, vol. 80, no. 2, pp. 346–355, 2000.
[22] G. Cornuejols, M. L. Fisher, and G. L. Nemhauser, "Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms," Management Science, vol. 23, no. 8, pp. 789–810, 1977.
[23] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, "An analysis of approximations for maximizing submodular set functions—I," Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.
[24] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, "An analysis of approximations for maximizing submodular set functions—II," Mathematical Programming, no. 8, pp. 73–87, 1978.
[25] U. Feige, V. S. Mirrokni, and J. Vondrák, "Maximizing non-monotone submodular functions," SIAM Journal on Computing, vol. 40, no. 4, pp. 1133–1153, 2011.
[26] C. Chekuri, J. Vondrák, and R. Zenklusen, "Submodular function maximization via the multilinear relaxation and contention resolution schemes," SIAM Journal on Computing, vol. 43, no. 6, pp. 1831–1879, 2014.
[27] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák, "Maximizing a monotone submodular function subject to a matroid constraint," SIAM Journal on Computing, vol. 40, no. 6, pp. 1740–1766, 2011.
[28] F. Bach, "Submodular functions: from discrete to continuous domains," https://arxiv.org/abs/1511.00394v2.
[29] A. Bian, B. Mirzasoleiman, J. Buhmann, and A. Krause, "Guaranteed non-convex optimization: Submodular maximization over continuous domains," https://arxiv.org/abs/1606.05615.
[30] M. Karimi, M. Lucic, H. Hassani, and A. Krause, "Stochastic submodular maximization: The case of coverage functions," in Advances in Neural Information Processing Systems, pp. 6853–6863, 2017.
[31] H. Hassani, M. Soltanolkotabi, and A. Karbasi, "Gradient methods for submodular maximization," https://arxiv.org/abs/1708.03949.
[32] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Mass, USA, 2nd edition, 1999.
[33] K.-W. Chang, C.-J. Hsieh, and C.-J. Lin, "Coordinate descent method for large-scale L2-loss linear support vector machines," Journal of Machine Learning Research (JMLR), vol. 9, pp. 1369–1398, 2008.
[34] S. J. Wright, "Accelerated block-coordinate relaxation for regularized optimization," SIAM Journal on Optimization, vol. 22, no. 1, pp. 159–186, 2012.
[35] P. Tseng and S. Yun, "A coordinate gradient descent method for nonsmooth separable minimization," Mathematical Programming—Series B, vol. 117, no. 1, pp. 387–423, 2009.
[36] P. Tseng and S. Yun, "A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization," Journal of Optimization Theory and Applications, vol. 140, no. 3, pp. 513–535, 2009.
[37] S. Yun and K.-C. Toh, "A coordinate gradient descent method for l1-regularized convex minimization," Computational Optimization and Applications, vol. 48, no. 2, pp. 273–307, 2011.
[38] A. A. Canutescu and R. L. Dunbrack Jr., "Cyclic coordinate descent: A robotics algorithm for protein loop closure," Protein Science, vol. 12, no. 5, pp. 963–972, 2003.
[39] T. T. Wu and K. Lange, "Coordinate descent algorithms for lasso penalized regression," The Annals of Applied Statistics, vol. 2, no. 1, pp. 224–244, 2008.
[40] Y. Li and S. Osher, "Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm," Inverse Problems and Imaging, vol. 3, no. 3, pp. 487–503, 2009.
[41] Z. Q. Luo and P. Tseng, "On the convergence of the coordinate descent method for convex differentiable minimization," Journal of Optimization Theory and Applications, vol. 72, no. 1, pp. 7–35, 1992.
[42] P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization," Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.

[43] P.-W. Wang and C.-J. Lin, "Iteration complexity of feasible descent methods for convex optimization," Journal of Machine Learning Research (JMLR), vol. 15, pp. 1523–1548, 2014.
[44] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012.
[45] P. Richtárik and M. Takáč, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function," Mathematical Programming, vol. 144, no. 1-2, Ser. A, pp. 1–38, 2014.
[46] Z. Lu and L. Xiao, "On the complexity analysis of randomized block-coordinate descent methods," Mathematical Programming, vol. 152, no. 1-2, Ser. A, pp. 615–642, 2015.
[47] J. Liu, S. J. Wright, C. Ré, V. Bittorf, and S. Sridhar, "An asynchronous parallel stochastic coordinate descent algorithm," Journal of Machine Learning Research (JMLR), vol. 16, pp. 285–322, 2015.
[48] J. Liu and S. J. Wright, "Asynchronous stochastic coordinate descent: parallelism and convergence properties," SIAM Journal on Optimization, vol. 25, no. 1, pp. 351–376, 2015.
[49] S. Fujishige, Submodular Functions and Optimization, Annals of Discrete Mathematics, North-Holland, Amsterdam, Netherlands, 2nd edition, 2005.
[50] C. Chekuri, T. S. Jayram, and J. Vondrák, "On multiplicative weight updates for concave and submodular function maximization," in Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pp. 201–210, ACM, 2015.
[51] B. T. Polyak, Introduction to Optimization, Optimization Software, New York, NY, USA, 1987.
[52] C. Singh, A. Nedic, and R. Srikant, "Random block-coordinate gradient projection algorithms," in Proceedings of the 53rd IEEE Annual Conference on Decision and Control (CDC 2014), pp. 185–190, USA, December 2014.
[53] C. Chekuri, J. Vondrák, and R. Zenklusen, "Submodular function maximization via the multilinear relaxation and contention resolution schemes," in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pp. 783–792, ACM, 2011.
