
Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii USA, December 2003. ThM07-2

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

I-Jeng Wang* and James C. Spall**, The Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723-6099, USA

Abstract— We present a stochastic approximation algorithm based on the penalty function method and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We present a general convergence result that applies to a class of penalty functions including the quadratic penalty function, the augmented Lagrangian, and the absolute value penalty function. We also establish an asymptotic normality result for the algorithm with smooth penalty functions under minor assumptions. Numerical results are given to compare the performance of the proposed algorithm with different penalty functions.

I. INTRODUCTION

In this paper, we consider a constrained stochastic optimization problem for which only noisy measurements of the cost function are available. More specifically, we aim to solve the following optimization problem:

    min_{θ ∈ G} L(θ),                                                  (1)

where L: ℝ^d → ℝ is a real-valued cost function, θ ∈ ℝ^d is the parameter vector, and G ⊂ ℝ^d is the constraint set. We also assume that the gradient of L(·) exists and is denoted by g(·), and that there exists a unique solution θ* for the constrained optimization problem defined by (1). We consider the situation where no explicit closed-form expression of the function L is available (or it is very complicated even if available), and the only information consists of noisy measurements of L at specified values of the parameter vector θ. This scenario arises naturally in simulation-based optimization, where the cost function L is defined as the expected value of a random cost associated with the stochastic simulation of a complex system. We also assume that significant costs (in terms of time and/or computational resources) are involved in obtaining each measurement (or sample) of L(θ). These constraints prevent us from estimating the gradient (or Hessian) of L(·) accurately, and hence prohibit the application of effective nonlinear programming techniques for inequality constraints, for example, sequential quadratic programming methods (see, for example, Section 4.3 of [1]). Throughout the paper we use θ_n to denote the nth estimate of the solution θ*.

Several results have been presented for constrained optimization in the stochastic domain. In the area of stochastic approximation (SA), most of the available results are based on the simple idea of projecting the estimate θ_n back to its nearest point in G whenever θ_n lies outside the constraint set G. These projection-based SA algorithms are typically of the following form:

    θ_{n+1} = π_G[θ_n − a_n ĝ_n(θ_n)],                                 (2)

where π_G: ℝ^d → G is the set projection operator and ĝ_n(θ_n) is an estimate of the gradient g(θ_n); see, for example, [2], [3], [5], [6]. The main difficulty with this projection approach lies in the implementation (calculation) of the projection operator π_G. Except for simple constraints such as interval or linear constraints, calculation of π_G(θ) for an arbitrary vector θ is a formidable task.

Other techniques for dealing with constraints have also been considered: Hiriart-Urruty [7] and Pflug [8] present and analyze SA algorithms based on the penalty function method for stochastic optimization of a convex function with convex inequality constraints; Kushner and Clark [3] present several SA algorithms based on the Lagrange multiplier method, the penalty function method, and a combination of both. Most of these techniques rely on the Kiefer-Wolfowitz (KW) [4] type of gradient estimate when the gradient of the cost function is not readily available. Furthermore, the convergence of these SA algorithms based on "non-projection" techniques generally requires complicated assumptions on the cost function L and the constraint set G. In this paper, we present and study the convergence of a class of algorithms based on the penalty function method and the simultaneous perturbation (SP) gradient estimate [9]. The advantage of the SP gradient estimate over the KW-type estimate for unconstrained optimization has been demonstrated with the simultaneous perturbation stochastic approximation (SPSA) algorithms. Whenever possible, we present sufficient conditions (as remarks) that can be more easily verified than the much weaker conditions used in our convergence proofs.

We focus on general explicit inequality constraints, where G is defined by

    G ≜ {θ ∈ ℝ^d : q_j(θ) ≤ 0, j = 1, …, s},                           (3)

This work was supported by the JHU/APL Independent Research and Development Program.
*Phone: 240-228-6204; E-mail: i-jeng.wang@jhuapl.edu.
**Phone: 240-228-4960; E-mail: james.spall@jhuapl.edu.
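For the simple constraint sets mentioned above, the projection π_G in (2) is trivial; for interval (box) constraints it reduces to componentwise clipping. A minimal sketch of the projection-based recursion (the function names, the quadratic toy cost, and the gain sequence are our own illustration, not from the paper):

```python
import numpy as np

def projected_sa(grad_est, project, theta0, a=lambda n: 0.1 / (n + 1), n_iter=500):
    """Projection-based SA: theta_{n+1} = pi_G[theta_n - a_n * ghat_n(theta_n)]."""
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = project(theta - a(n) * grad_est(theta))
    return theta

# Box constraint G = [0, 1]^2: the projection is just componentwise clipping.
project_box = lambda th: np.clip(th, 0.0, 1.0)

# Noisy gradient of L(theta) = ||theta - (2, -1)||^2; the unconstrained
# minimum (2, -1) lies outside G, so the iterates pile up on the boundary.
rng = np.random.default_rng(0)
grad = lambda th: 2.0 * (th - np.array([2.0, -1.0])) + rng.normal(0.0, 0.1, size=2)

theta_hat = projected_sa(grad, project_box, theta0=[0.5, 0.5])
# Constrained minimum is the corner (1, 0) of the box.
```

For a general nonlinear G as in (3), no such closed-form projection exists, which is exactly the difficulty the penalty approach below avoids.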

where q_j: ℝ^d → ℝ are continuously differentiable real-valued functions. We assume that the analytical expression of the functions q_j is available. We extend the result presented in [10] to incorporate a larger class of penalty functions based on the augmented Lagrangian method. We also establish the asymptotic normality for the proposed algorithm. Simulation results are presented to illustrate the performance of the technique for stochastic optimization.

II. CONSTRAINED SPSA ALGORITHMS

A. Penalty Functions

The basic idea of the penalty-function approach is to convert the originally constrained optimization problem (1) into an unconstrained one defined by

    min_θ L_r(θ) ≜ L(θ) + r P(θ),                                      (4)

where P: ℝ^d → ℝ is the penalty function and r is a positive real number normally referred to as the penalty parameter. The penalty functions are defined such that P is an increasing function of the constraint functions q_j; P > 0 if and only if q_j > 0 for some j; P → ∞ as q_j → ∞; and P → l (l ≥ 0) as q_j → −∞. In this paper, we consider a penalty function method based on the augmented Lagrangian function defined by

    L_r(θ, λ) = L(θ) + (1/2r) Σ_{j=1}^s { [max{0, λ_j + r q_j(θ)}]² − λ_j² },   (5)

where λ ∈ ℝ^s can be viewed as an estimate of the Lagrange multiplier vector. The associated penalty function is

    P(θ, λ) = (1/2r²) Σ_{j=1}^s { [max{0, λ_j + r q_j(θ)}]² − λ_j² }.          (6)

Let {r_n} be a positive and strictly increasing sequence with r_n → ∞, and let {λ_n} be a bounded nonnegative sequence in ℝ^s. It can be shown (see, for example, Section 4.2 of [1]) that the minimum of the sequence of functions {L_n}, defined by

    L_n(θ) ≜ L(θ) + r_n P(θ, λ_n),

converges to the solution of the original constrained problem (1). Since the penalized cost function (or the augmented Lagrangian) (5) is a differentiable function of θ, we can apply the standard stochastic approximation technique with the SP gradient estimate for L to minimize {L_n(·)}. In other words, the original problem can be solved with an algorithm of the following form:

    θ_{n+1} = θ_n − a_n ∇̂L_n(θ_n) = θ_n − a_n ĝ_n(θ_n) − a_n r_n ∇P(θ_n),

where ĝ_n is the SP estimate of the gradient g(·) at θ_n that we shall specify later. Note that since we assume the constraints are explicitly given, the gradient of the penalty function P(·) is used directly in the algorithm.

Note that when λ_n = 0, the penalty function defined by (6) reduces to the standard quadratic penalty function discussed in [10]:

    L_r(θ, 0) = L(θ) + (r/2) Σ_{j=1}^s [max{0, q_j(θ)}]².

Even though the convergence of the proposed algorithm only requires that {λ_n} be bounded (hence we can set λ_n = 0), we can significantly improve the performance of the algorithm with an appropriate choice of the sequence {λ_n} based on concepts from Lagrange multiplier theory. Moreover, it has been shown [1] that, with the standard quadratic penalty function, the penalized cost function L_n = L + r_n P can become ill-conditioned as r_n increases (that is, the condition number of the Hessian of L_n at θ_n* diverges to ∞ with r_n). The use of the general penalty function defined in (6) can prevent this difficulty if {λ_n} is chosen so that it is close to the true Lagrange multipliers. In Section IV, we will present an update rule based on the method of multipliers (see, for example, [11]) to update λ_n and compare its performance with the standard quadratic penalty function.

B. An SPSA Algorithm for Inequality Constraints

In this section, we present the specific form of the algorithm for solving the constrained stochastic optimization problem. The algorithm we consider is defined by

    θ_{n+1} = θ_n − a_n ĝ_n(θ_n) − a_n r_n ∇P(θ_n),                    (7)

where ĝ_n(θ_n) is an estimate of the gradient of L, g(·), at θ_n, {r_n} is an increasing sequence of positive scalars with lim_{n→∞} r_n = ∞, ∇P(θ) is the gradient of P(θ) at θ, and {a_n} is a positive scalar sequence satisfying a_n → 0 and Σ_{n=1}^∞ a_n = ∞. The gradient estimate is obtained from two noisy measurements of the cost function L by

    ĝ_n(θ_n) = [(L(θ_n + c_n Δ_n) + ε_n⁺) − (L(θ_n − c_n Δ_n) + ε_n⁻)] / (2 c_n) · Δ_n⁻¹,   (8)

where Δ_n ∈ ℝ^d is a random perturbation vector, c_n → 0 is a positive sequence, ε_n⁺ and ε_n⁻ are the noise in the measurements, and Δ_n⁻¹ denotes the vector [1/Δ_{n1}, …, 1/Δ_{nd}]^T. For analysis, we rewrite the algorithm (7) as

    θ_{n+1} = θ_n − a_n g(θ_n) − a_n r_n ∇P(θ_n) + a_n d_n − a_n (ε_n / 2c_n) Δ_n⁻¹,   (9)

where d_n and ε_n are defined by

    d_n ≜ g(θ_n) − [L(θ_n + c_n Δ_n) − L(θ_n − c_n Δ_n)] / (2 c_n) · Δ_n⁻¹,
    ε_n ≜ ε_n⁺ − ε_n⁻,

respectively.

We establish the convergence of the algorithm (7) and the associated asymptotic normality under appropriate assumptions in the next section.
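One iteration of (7) with the SP gradient estimate (8) can be sketched as follows. The symmetric Bernoulli ±1 choice for Δ_n and the small toy problem in the usage example are our own assumptions for illustration, not prescriptions from the paper:

```python
import numpy as np

def spsa_penalty_step(theta, n, y, grad_P, a, c, r, rng):
    """One iteration of (7): theta - a_n * ghat_n(theta) - a_n * r_n * grad P(theta).

    y(theta) returns a noisy measurement L(theta) + eps; grad_P is available
    analytically because the constraints q_j are explicitly given.
    """
    a_n, c_n, r_n = a(n), c(n), r(n)
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # symmetric Bernoulli perturbation
    ghat = (y(theta + c_n * delta) - y(theta - c_n * delta)) / (2.0 * c_n) * (1.0 / delta)
    return theta - a_n * ghat - a_n * r_n * grad_P(theta)

# Toy usage: minimize ||theta||^2 subject to q_1(theta) = 1 - theta_1 <= 0,
# with the quadratic penalty P = max{0, q_1}^2 / 2, grad P = -max{0, 1 - theta_1} e_1.
rng = np.random.default_rng(1)
y = lambda th: float(th @ th) + rng.normal(0.0, 0.01)
grad_P = lambda th: np.array([-max(0.0, 1.0 - th[0]), 0.0])
theta = np.array([2.0, 2.0])
for n in range(1, 3001):
    theta = spsa_penalty_step(theta, n, y, grad_P,
                              a=lambda k: 0.1 * (k + 100) ** -0.602,
                              c=lambda k: k ** -0.101,
                              r=lambda k: 10.0 * k ** 0.1,
                              rng=rng)
# The constrained minimum is (1, 0); with a finite r_n the iterate sits
# slightly outside the feasible set, as is typical of penalty methods.
```

Note that because Δ_n has ±1 entries, elementwise division by Δ_n is the same as elementwise multiplication; the explicit `1.0 / delta` is kept to mirror (8).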

III. CONVERGENCE AND ASYMPTOTIC NORMALITY

A. Convergence Theorem

To establish convergence of the algorithm (7), we need to study the asymptotic behavior of an SA algorithm with a "time-varying" regression function. In other words, we need to consider the convergence of an SA algorithm of the following form:

    θ_{n+1} = θ_n − a_n f_n(θ_n) + a_n d_n + a_n e_n,                  (10)

where {f_n(·)} is a sequence of functions. We state here without proof a version of the convergence theorem given by Spall and Cristion in [13] for an algorithm in the generic form (10).

Theorem 1: Assume the following conditions hold:

(A.1) For each n large enough (n ≥ N for some N ∈ ℕ), there exists a unique θ_n* such that f_n(θ_n*) = 0. Furthermore, lim_{n→∞} θ_n* = θ*.

(A.2) d_n → 0, and Σ_{k=1}^n a_k e_k converges.

(A.3) For some N < ∞, any ρ > 0, and for each n ≥ N, if ‖θ − θ*‖ ≥ ρ, then there exists a δ_n(ρ) > 0 such that (θ − θ*)^T f_n(θ) ≥ δ_n(ρ) ‖θ − θ*‖, where δ_n(ρ) satisfies Σ_{n=1}^∞ a_n δ_n(ρ) = ∞ and d_n δ_n(ρ)⁻¹ → 0.

(A.4) For each i = 1, 2, …, d, and any ρ > 0, if |θ_{ni} − (θ*)_i| > ρ eventually, then either f_{ni}(θ_n) ≥ 0 eventually or f_{ni}(θ_n) < 0 eventually.

(A.5) For any τ > 0 and nonempty S ⊂ {1, 2, …, d}, there exists a ρ′(τ, S) > τ such that for all θ ∈ {θ ∈ ℝ^d : |(θ − θ*)_i| < τ when i ∉ S, |(θ − θ*)_i| ≥ ρ′(τ, S) when i ∈ S},

    lim sup_{n→∞} | Σ_{i∉S} (θ − θ*)_i f_{ni}(θ) | / Σ_{i∈S} (θ − θ*)_i f_{ni}(θ) < 1.

Then the sequence {θ_n} defined by the algorithm (10) converges to θ*.

Based on Theorem 1, we give a convergence result for algorithm (7) by substituting ∇L_n(θ_n) = g(θ_n) + r_n ∇P(θ_n), d_n, and −(ε_n / 2c_n) Δ_n⁻¹ into f_n(θ_n), d_n, and e_n in (10), respectively. We need the following assumptions:

(C.1) There exists K₁ ∈ ℕ such that for all n ≥ K₁, we have a unique θ_n* ∈ ℝ^d with ∇L_n(θ_n*) = 0.

(C.2) {Δ_{ni}} are i.i.d. and symmetrically distributed about 0, with |Δ_{ni}| ≤ α₀ a.s. and E|Δ_{ni}⁻¹| ≤ α₁.

(C.3) Σ_{n=1}^∞ (a_n ε_n / 2c_n) Δ_n⁻¹ converges almost surely.

(C.4) If ‖θ − θ*‖ ≥ ρ, then there exists a δ(ρ) > 0 such that
    (i) if θ ∈ G, then (θ − θ*)^T g(θ) ≥ δ(ρ) ‖θ − θ*‖ > 0;
    (ii) if θ ∉ G, then at least one of the following two conditions holds:
        • (θ − θ*)^T g(θ) ≥ δ(ρ) ‖θ − θ*‖ and (θ − θ*)^T ∇P(θ) ≥ 0;
        • (θ − θ*)^T g(θ) ≥ −M for some constant M ≥ 0, and (θ − θ*)^T ∇P(θ) ≥ δ(ρ) ‖θ − θ*‖ > 0.

(C.5) a_n r_n → 0; g(·) and ∇P(·) are Lipschitz. (See comments below.)

(C.6) ∇L_n(·) satisfies condition (A.5).

Theorem 2: Suppose that assumptions (C.1)-(C.6) hold. Then the sequence {θ_n} defined by (7) converges to θ* almost surely.

Proof: We only need to verify the conditions (A.1)-(A.5) in Theorem 1 to show the desired result:

• Condition (A.1) basically requires that the stationary points of the sequence {∇L_n(·)} converge to θ*. Assumption (C.1) together with existing results on penalty function methods establishes this desired convergence.

• From the results in [9], [14] and assumptions (C.2)-(C.3), we can show that condition (A.2) holds.

• Since r_n → ∞, condition (A.3) holds from assumption (C.4).

• From (9) and assumptions (C.1) and (C.5), we have |(θ_{n+1} − θ_n)_i| < |(θ_n − θ*)_i| for large n if |θ_{ni} − (θ*)_i| > ρ. Hence for large n, the sequence {θ_{ni}} does not "jump" over the interval between (θ*)_i and θ_{ni}. Therefore, if |θ_{ni} − (θ*)_i| > ρ eventually, then the sequence {f_{ni}(θ_n)} does not change sign eventually; that is, condition (A.4) holds.

• Assumption (A.5) holds directly from (C.6). ∎

Theorem 2 given above is general in the sense that it does not specify the exact type of penalty function P(·) to adopt. At first glance, assumption (C.4) may seem difficult to satisfy. In fact, assumption (C.4) is fairly weak, and it does address an inherent limitation of penalty-function-based algorithms. For example, suppose that a constraint function q_k(·) has a local minimum at θ′ with q_k(θ′) > 0. Then for every θ with q_j(θ) ≤ 0, j ≠ k, we have (θ − θ′)^T ∇P(θ) > 0 whenever θ is close enough to θ′. As r_n gets larger, the term r_n ∇P(θ) would dominate the behavior of the algorithm and could result in convergence to a wrong solution. We would also like to point out that assumption (C.4) is satisfied if the cost function L and the constraint functions q_j, j = 1, …, s, are convex and satisfy the Slater condition, that is, the minimum cost L(θ*) is finite and there exists a θ ∈ ℝ^d such that q_j(θ) < 0 for all j (this is the case studied in [8]).

Assumption (C.6) ensures that for n sufficiently large each element of g(θ) + r_n ∇P(θ) makes a non-negligible contribution to products of the form (θ − θ*)^T (g(θ) + r_n ∇P(θ)) when (θ − θ*)_i ≠ 0. A sufficient condition for (C.6) is that, for each i, g_i(θ) + r_n ∇_i P(θ) be uniformly bounded both away from 0 and away from ∞ when |(θ − θ*)_i| ≥ ρ > 0 for all i.

Theorem 2 in the stated form does require that the penalty function P be differentiable. However, it is possible to extend the stated results to the case where P is Lipschitz but not differentiable on a set of points with zero measure, for example, the absolute value penalty function

    P(θ) = max_{j=1,…,s} { max{0, q_j(θ)} }.
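Where the maximal constraint value is strictly positive, this absolute value penalty is differentiable with gradient ∇q_{J(θ)} for the maximizing index J(θ), and it is identically zero on the interior of G. A small sketch of the penalty and this almost-everywhere gradient (the helper names are ours):

```python
import numpy as np

def abs_penalty(q_vals):
    """P(theta) = max_j max{0, q_j(theta)}: zero exactly on the feasible set G."""
    return max(0.0, float(np.max(q_vals)))

def abs_penalty_grad(q_vals, q_grads):
    """Almost-everywhere gradient: grad q_J at the maximizing index J when P > 0."""
    J = int(np.argmax(q_vals))
    if q_vals[J] <= 0.0:                     # theta in G: the penalty is flat at 0
        return np.zeros_like(np.asarray(q_grads[J], dtype=float))
    return np.asarray(q_grads[J], dtype=float)

# Two constraints q_1(theta) = theta_1 - 1 and q_2(theta) = -theta_2 at theta = (3, 5):
q_vals = np.array([2.0, -5.0])               # only q_1 is violated, so J(theta) picks q_1
q_grads = [np.array([1.0, 0.0]), np.array([0.0, -1.0])]
g = abs_penalty_grad(q_vals, q_grads)
```

The ties and kinks where two constraints attain the maximum simultaneously form the zero-measure set on which P is not differentiable, which is exactly the technical point discussed next.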

In the case where the density function of the measurement noise (ε_n⁺ and ε_n⁻ in (8)) exists and has infinite support, we can take advantage of the fact that the iterations of the algorithm visit any zero-measure set with zero probability. Assuming that the set D ≜ {θ ∈ ℝ^d : ∇P(θ) does not exist} has Lebesgue measure 0 and the random perturbation Δ_n follows a Bernoulli distribution (P(Δ_{ni} = 1) = P(Δ_{ni} = −1) = 1/2), we can construct a simple proof to show that

    P{θ_n ∈ D i.o.} = 0

if P{θ_0 ∈ D} = 0. Therefore, the convergence result in Theorem 2 applies to penalty functions with non-smoothness on a set with measure zero. Hence in any practical application, we can simply ignore this technical difficulty and use

    ∇P(θ) = 1{q_{J(θ)}(θ) > 0} ∇q_{J(θ)}(θ),

where J(θ) = argmax_{j=1,…,s} q_j(θ) (note that J(θ) is uniquely defined for θ ∉ D). An alternative approach to handle this technical difficulty is to apply the SP gradient estimate directly to the penalized cost L(θ) + r P(θ) and adopt the convergence analysis presented in [15] for nondifferentiable optimization with additional convexity assumptions.

Use of non-differentiable penalty functions might allow us to avoid the difficulty of ill-conditioning as r_n → ∞ without using more complicated penalty function methods such as the augmented Lagrangian method used here. The rationale is that there exists a constant r̄ = Σ_{j=1}^s λ_j* (where λ_j* is the Lagrange multiplier associated with the jth constraint) such that the minimum of L + rP is identical to the solution of the original constrained problem (1) for all r > r̄, based on the theory of exact penalties (see, for example, Section 4.3 of [1]). This property of the absolute value penalty function allows us to use a constant penalty parameter r > r̄ (instead of r_n → ∞) to avoid the issue of ill-conditioning. However, it is difficult to obtain a good estimate for r̄ in our situation, where the analytical expression of g(·) (the gradient of the cost function L(·)) is not available. And it is not clear that the application of exact penalty functions with a constant r would lead to better performance than the augmented-Lagrangian-based technique. In Section IV we will also illustrate (via numerical results) the potentially poor performance of the algorithm with an arbitrarily chosen large r.

B. Asymptotic Normality

When differentiable penalty functions are used, we can establish the asymptotic normality for the proposed algorithm. In the case where q_j(θ*) < 0 for all j = 1, …, s (that is, there is no active constraint at θ*), the asymptotic behavior of the algorithm is exactly the same as that of the unconstrained SPSA algorithm, and has been established in [9]. Here we consider the case where at least one of the constraints is active at θ*, that is, the set A ≜ {j = 1, …, s : q_j(θ*) = 0} is not empty. We establish the asymptotic normality for the algorithm with smooth penalty functions of the form

    P(θ) = Σ_{j=1}^s p_j(q_j(θ)),

which includes both the quadratic penalty and augmented Lagrangian functions.

Assume further that E[ε_n | F_n, Δ_n] = 0 a.s., E[ε_n² | F_n] → σ² a.s., E[(Δ_{ni})⁻²] → ρ², and E[(Δ_{ni})²] → ξ², where F_n is the σ-algebra generated by θ_1, …, θ_n. Let H(θ) denote the Hessian matrix of L(θ), and let

    H_P(θ) = Σ_{j∈A} ∇²( p_j(q_j(θ)) ).

The next proposition establishes the asymptotic normality for the proposed algorithm with the following choice of parameters: a_n = a n^{−α}, c_n = c n^{−γ}, and r_n = r n^{η} with a, c, r > 0, β = α − η − 2γ > 0, and 3γ − α/2 + 3η/2 ≥ 0.

Proposition 1: Assume that conditions (C.1)-(C.6) hold. Let P be orthogonal with P H_P(θ*) P^T = a⁻¹ r⁻¹ diag(λ₁, …, λ_d). Then

    n^{β/2} (θ_n − θ*) → N(μ, P M P^T) in distribution as n → ∞,

where M = (1/4) a² c⁻² σ² ρ² diag[(2λ₁ − β⁺)⁻¹, …, (2λ_d − β⁺)⁻¹] with β⁺ = β < 2 min_i λ_i if α = 1 and β⁺ = 0 if α < 1, and

    μ = 0                                  if 3γ − α/2 + 3η/2 > 0,
    μ = (a r H_P(θ*) − (β⁺/2) I)⁻¹ T       if 3γ − α/2 + 3η/2 = 0,

where the lth element of T is

    −(1/6) a c² ξ² [ L_{lll}^{(3)}(θ*) + 3 Σ_{i=1, i≠l}^d L_{iil}^{(3)}(θ*) ].

Proof: For n large enough, the recursion (7) can be written in the generic form treated in [16]. Following the techniques used in [9] and the general normality results from [16], we can establish the desired result. ∎

Note that, based on the result in Proposition 1, the convergence rate of order n^{1/3} is achieved with α = 1 and γ = 1/6 − η/2 > 0.
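The rate claim above is plain arithmetic on the gain exponents; a quick check (the specific value η = 0.1, matching the penalty growth used later in Section IV, is our choice for illustration):

```python
# Exponents for a_n = a * n**-alpha, c_n = c * n**-gamma, r_n = r * n**eta.
alpha, eta = 1.0, 0.1
gamma = 1.0 / 6.0 - eta / 2.0      # boundary of the constraint 3*gamma - alpha/2 + 3*eta/2 >= 0

beta = alpha - eta - 2.0 * gamma   # n**(beta/2) is the convergence-rate order
constraint = 3.0 * gamma - alpha / 2.0 + 3.0 * eta / 2.0

# beta = 2/3, so n**(beta/2) = n**(1/3): the same rate order as unconstrained SPSA,
# with the penalty growth eta paid for by a smaller perturbation-decay gamma.
```

The same computation with η = 0 recovers the classical unconstrained SPSA choice γ = 1/6.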

IV. NUMERICAL EXPERIMENTS

We test our algorithm on a constrained optimization problem described in [17, p. 352]:

    min_θ L(θ) = θ₁² + θ₂² + 2θ₃² + θ₄² − 5θ₁ − 5θ₂ − 21θ₃ + 7θ₄

subject to

    q₁(θ) = 2θ₁² + θ₂² + θ₃² + 2θ₁ − θ₂ − θ₄ − 5 ≤ 0,
    q₂(θ) = θ₁² + θ₂² + θ₃² + θ₄² + θ₁ − θ₂ + θ₃ − θ₄ − 8 ≤ 0,
    q₃(θ) = θ₁² + 2θ₂² + θ₃² + 2θ₄² − θ₁ − θ₄ − 10 ≤ 0.

The minimum cost L(θ*) = −44 under the constraints occurs at θ* = [0, 1, 2, −1]^T, where the constraints q₁(·) ≤ 0 and q₂(·) ≤ 0 are active. The Lagrange multiplier is [λ₁*, λ₂*, λ₃*]^T = [2, 1, 0]^T. The problem had not been solved to satisfactory accuracy with deterministic search methods that operate directly with the constraints (as claimed in [17]). Further, we increase the difficulty of the problem by adding i.i.d. zero-mean Gaussian noise to L(θ) and assume that only noisy measurements of the cost function L are available (without gradient). The initial point is chosen at [0, 0, 0, 0]^T, and the standard deviation of the added Gaussian noise is 4.0 (roughly equal to the initial error).

We consider three different penalty functions:

Quadratic penalty function:

    P(θ) = (1/2) Σ_{j=1}^s [max{0, q_j(θ)}]².                          (11)

In this case the gradient of P(·) required in the algorithm is

    ∇P(θ) = Σ_{j=1}^s max{0, q_j(θ)} ∇q_j(θ).                          (12)

Augmented Lagrangian:

    P(θ) = (1/2r_n²) Σ_{j=1}^s { [max{0, λ_{nj} + r_n q_j(θ)}]² − λ_{nj}² }.   (13)

In this case, the actual penalty function used varies over the iterations, depending on the specific values selected for r_n and λ_n. The gradient of the penalty function required in the algorithm for the nth iteration is

    ∇P(θ) = (1/r_n) Σ_{j=1}^s max{0, λ_{nj} + r_n q_j(θ)} ∇q_j(θ).     (14)

To properly update λ_n, we adopt a variation of the multiplier method [11]:

    λ_{n+1, j} = min{ max{0, λ_{nj} + r_n q_j(θ_n)}, M },              (15)

where λ_{nj} denotes the jth element of the vector λ_n, and M ∈ ℝ⁺ is a large constant scalar. Since (15) ensures that {λ_n} is bounded, convergence of the minimum of {L_n(·)} remains valid. Furthermore, {λ_n} will be close to the true Lagrange multiplier as n → ∞.

Absolute value penalty function:

    P(θ) = max_{j=1,…,s} { max{0, q_j(θ)} }.                           (16)

As discussed earlier, we will ignore the technical difficulty that P(·) is not differentiable everywhere. The gradient of P(·), where it exists, is

    ∇P(θ) = 1{q_{J(θ)}(θ) > 0} ∇q_{J(θ)}(θ),                           (17)

where J(θ) = argmax_{j=1,…,s} q_j(θ).

For all the simulations we use the following parameter values: a_n = 0.1(n + 100)^{−0.602} and c_n = n^{−0.101}. These parameters for a_n and c_n are chosen following the practical implementation guidelines recommended in [18]. For the augmented Lagrangian method, λ_n is initialized as the zero vector. For the quadratic penalty function and the augmented Lagrangian, we use r_n = 10 n^{0.1} for the penalty parameter. For the absolute value penalty function, we consider two possible values for the constant penalty: r_n = r = 3.01 and r_n = r = 10. Note that in our experiment, r̄ = λ₁* + λ₂* = 3. Hence the first choice of r at 3.01 is close to the theoretical optimum but is not practical, since there is no reliable way to estimate r̄. The second choice of r represents a more typical scenario, where an upper bound on r̄ is estimated.

Fig. 1. Error to the optimum (‖θ_n − θ*‖) averaged over 100 independent simulations.

Figure 1 plots the errors to the optimum (averaged over 100 independent simulations) over 4000 iterations of the algorithms. The simulation results in Figure 1 suggest that the proposed algorithm with the quadratic penalty function and with the augmented Lagrangian led to comparable performance (the augmented Lagrangian method performed slightly better than the standard quadratic technique). This suggests that a more effective update scheme for λ_n than (15) is needed for the augmented Lagrangian technique. The absolute value penalty function with r = 3.01 (≈ r̄ = 3) has the best performance.

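The experiment above with the quadratic penalty (11)-(12) can be reproduced in outline as follows. The gain sequences match the values stated in the text; the random seed and the verification of the known optimum are our own additions (an illustrative sketch, not the authors' original code):

```python
import numpy as np

theta_star = np.array([0.0, 1.0, 2.0, -1.0])   # known solution, L(theta*) = -44

def L(t):
    return t[0]**2 + t[1]**2 + 2*t[2]**2 + t[3]**2 - 5*t[0] - 5*t[1] - 21*t[2] + 7*t[3]

def q(t):
    """Constraint values q_1, q_2, q_3; feasibility means q <= 0 componentwise."""
    return np.array([
        2*t[0]**2 + t[1]**2 + t[2]**2 + 2*t[0] - t[1] - t[3] - 5,
        t[0]**2 + t[1]**2 + t[2]**2 + t[3]**2 + t[0] - t[1] + t[2] - t[3] - 8,
        t[0]**2 + 2*t[1]**2 + t[2]**2 + 2*t[3]**2 - t[0] - t[3] - 10,
    ])

def grad_q(t):
    """Rows are the gradients of q_1, q_2, q_3."""
    return np.array([
        [4*t[0] + 2, 2*t[1] - 1, 2*t[2],     -1.0],
        [2*t[0] + 1, 2*t[1] - 1, 2*t[2] + 1, 2*t[3] - 1],
        [2*t[0] - 1, 4*t[1],     2*t[2],     4*t[3] - 1],
    ])

def grad_P_quadratic(t):
    """Gradient (12) of the quadratic penalty (11)."""
    return np.maximum(0.0, q(t)) @ grad_q(t)

rng = np.random.default_rng(0)
theta = np.zeros(4)                              # initial point [0, 0, 0, 0]
for n in range(1, 4001):
    a_n = 0.1 * (n + 100) ** -0.602
    c_n = n ** -0.101
    r_n = 10.0 * n ** 0.1
    delta = rng.choice([-1.0, 1.0], size=4)
    y_plus = L(theta + c_n * delta) + rng.normal(0.0, 4.0)   # noisy measurements only
    y_minus = L(theta - c_n * delta) + rng.normal(0.0, 4.0)
    g_hat = (y_plus - y_minus) / (2.0 * c_n) / delta
    theta = theta - a_n * g_hat - a_n * r_n * grad_P_quadratic(theta)

error = np.linalg.norm(theta - theta_star)
```

A single run of this sketch only illustrates the qualitative behavior; the curves in Figure 1 average 100 such independent runs.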
However, when an arbitrary upper bound on r̄ is used (r = 10), the performance is much worse than with both the quadratic penalty function and the augmented Lagrangian. This illustrates a key difficulty in the effective application of the exact penalty theorem with the absolute value penalty function.

V. CONCLUSIONS AND REMARKS

We present a stochastic approximation algorithm based on the penalty function method and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We also present a general convergence result and the associated asymptotic normality for the proposed algorithm. Numerical results are included to demonstrate the performance of the proposed algorithm with the standard quadratic penalty function and a more complicated penalty function based on the augmented Lagrangian method.

In this paper, we consider explicit constraints for which the analytical expressions of the constraint functions are available. It is also possible to apply the same algorithm with an appropriate gradient estimate for P(θ) to problems with implicit constraints, where the constraints can only be measured or estimated with possible errors. The success of this approach would depend on efficient techniques to obtain an unbiased gradient estimate of the penalty function. For example, if we can measure or estimate a value of the penalty function P(θ_n) at an arbitrary location with zero-mean error, then the SP gradient estimate can be applied. Of course, in this situation further assumptions on r_n need to be satisfied (in general, we would at least need Σ_{n=1}^∞ (a_n r_n / c_n)² < ∞). However, in a typical application, we most likely can only measure the value of the constraint q_j(θ_n) with zero-mean error. Additional bias would be present if the standard finite-difference or SP techniques were applied to estimate ∇P(θ_n) directly in this situation. A novel technique to obtain an unbiased estimate of ∇P(θ_n) based on a reasonable number of measurements is required to make the algorithm proposed in this paper feasible for dealing with implicit constraints.

VI. REFERENCES

[1] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
[2] P. Dupuis and H. J. Kushner, "Asymptotic behavior of constrained stochastic approximations via the theory of large deviations," Probability Theory and Related Fields, vol. 75, pp. 223-274, 1987.
[3] H. Kushner and D. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, 1978.
[4] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, pp. 462-466, 1952.
[5] H. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications. Springer-Verlag, 1997.
[6] P. Sadegh, "Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation," Automatica, vol. 33, no. 5, pp. 889-892, 1997.
[7] J. Hiriart-Urruty, "Algorithms of penalization type and of dual type for the solution of stochastic optimization problems with stochastic constraints," in Recent Developments in Statistics (J. Barra, ed.), pp. 183-219, North Holland Publishing Company, 1977.
[8] G. C. Pflug, "On the convergence of a penalty-type stochastic optimization procedure," Journal of Information and Optimization Sciences, vol. 2, no. 3, pp. 249-258, 1981.
[9] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Transactions on Automatic Control, vol. 37, pp. 332-341, March 1992.
[10] I-J. Wang and J. C. Spall, "A constrained simultaneous perturbation stochastic approximation algorithm based on penalty function method," in Proceedings of the 1999 American Control Conference, vol. 1, pp. 393-399, San Diego, CA, June 1999.
[11] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Academic Press, NY, 1982.
[12] E. K. P. Chong and S. H. Żak, An Introduction to Optimization. New York, NY: John Wiley and Sons, 1996.
[13] J. C. Spall and J. A. Cristion, "Model-free control of nonlinear stochastic systems with discrete-time measurements," IEEE Transactions on Automatic Control, vol. 43, no. 9, pp. 1178-1200, 1998.
[14] I-J. Wang, Analysis of Stochastic Approximation and Related Algorithms. PhD thesis, School of Electrical and Computer Engineering, Purdue University, August 1996.
[15] Y. He, M. C. Fu, and S. I. Marcus, "Convergence of simultaneous perturbation stochastic approximation for nondifferentiable optimization," IEEE Transactions on Automatic Control, vol. 48, no. 8, pp. 1459-1463, 2003.
[16] V. Fabian, "On asymptotic normality in stochastic approximation," The Annals of Mathematical Statistics, vol. 39, no. 4, pp. 1327-1332, 1968.
[17] H.-P. Schwefel, Evolution and Optimum Seeking. John Wiley and Sons, Inc., 1995.
[18] J. C. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817-823, 1998.
