
International Journal of Control Vol. 81, No. 8, August 2008, 1232–1238

Stochastic optimisation with inequality constraints using simultaneous perturbations and penalty functions

I.-Jeng Wang* and James C. Spall

The Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723-6099, USA

(Received 20 July 2006; final version received 2 August 2007)

We present a stochastic approximation algorithm based on the penalty function method and a simultaneous perturbation gradient estimate for solving stochastic optimisation problems with general inequality constraints. We present a general convergence result that applies to a class of penalty functions including the quadratic penalty function, the augmented Lagrangian, and the absolute value penalty function. We also establish an asymptotic normality result for the algorithm with smooth penalty functions under minor assumptions. Numerical results are given to compare the performance of the proposed algorithm with different penalty functions.

Keywords: stochastic optimisation; inequality constraint; simultaneous perturbation

*Corresponding author. Email: [email protected]
ISSN 0020–7179 print/ISSN 1366–5820 online. © 2008 Taylor & Francis. DOI: 10.1080/00207170701611123. http://www.informaworld.com

1. Introduction

In this paper, we consider a constrained stochastic optimisation problem for which only noisy measurements of the cost function are available. More specifically, we aim to solve the following optimisation problem:

  min_{θ ∈ G} L(θ),   (1)

where L : R^d → R is a real-valued cost function, θ ∈ R^d is the parameter vector, and G ⊆ R^d is the constraint set. We also assume that the gradient of L(·) exists and is denoted by g(·). We assume that there exists a unique solution θ* for the constrained optimisation problem defined by (1). We consider the situation where no explicit closed-form expression of the function L is available (or it is very complicated even if available), and the only information consists of noisy measurements of L at specified values of the parameter vector θ. This scenario arises naturally in simulation-based optimisation, where the cost function L is defined as the expected value of a random cost associated with the stochastic simulation of a complex system. We also assume that significant costs (in terms of time and/or computational resources) are involved in obtaining each measurement (or sample) of L(θ). These constraints prevent us from estimating the gradient (or Hessian) of L(·) accurately, and hence prohibit the application of effective nonlinear programming techniques for inequality constraints, for example, the sequential quadratic programming methods; see, for example, Bertsekas (1995, §4.3). Throughout the paper, we use θ_n to denote the nth estimate of the solution θ*.

Several results have been presented for constrained optimisation in the stochastic domain. In the area of stochastic approximation (SA), most of the available results are based on the simple idea of projecting the estimate θ_n back to its nearest point in G whenever θ_n lies outside the constraint set G. These projection-based SA algorithms are typically of the following form:

  θ_{n+1} = π_G[θ_n − a_n ĝ_n(θ_n)],   (2)

where π_G : R^d → G is the set projection operator, and ĝ_n(θ_n) is an estimate of the gradient g(θ_n); see, for example, Dupuis and Kushner (1987); Kushner and Clark (1978); Kushner and Yin (1997); Sadegh (1997). The main difficulty for this projection approach lies in the implementation (calculation) of the projection operator π_G. Except for simple constraints like interval or linear constraints, calculation of π_G(θ) for an arbitrary vector θ is a formidable task.

Other techniques for dealing with constraints have also been considered: Hiriart-Urruty (1977) and Pflug (1981) present and analyse an SA algorithm based on the penalty function method for stochastic optimisation of a convex function with convex inequality constraints; Kushner and Clark (1978) present several SA algorithms based on the Lagrange multiplier method, the penalty function method, and a combination of both. Most of these techniques rely on the Kiefer and Wolfowitz (1952) (KW) type of gradient estimate when the gradient of the cost function is not readily available. Furthermore, the convergence of these SA algorithms based on 'non-projection' techniques generally requires complicated assumptions on the cost function L and the constraint set G. In this paper, we present and study the convergence of a class of algorithms based on the penalty function methods and the simultaneous perturbation (SP) gradient estimate (Spall 1992). The advantage of the SP gradient estimate over the KW-type estimate for unconstrained optimisation has been demonstrated with the simultaneous perturbation stochastic approximation (SPSA) algorithms. Whenever possible, we present sufficient conditions (as remarks) that can be more easily verified than the much weaker conditions used in our convergence proofs.

We focus on general explicit inequality constraints, where G is defined by

  G ≜ {θ ∈ R^d : q_j(θ) ≤ 0, j = 1, …, s},   (3)

where q_j : R^d → R are continuously differentiable real-valued functions. We assume that the analytical expression of each function q_j is available. We extend the result presented in Wang and Spall (1999) to incorporate a larger class of penalty functions based on the augmented Lagrangian method. We also establish the asymptotic normality for the proposed algorithm. Simulation results are presented to illustrate the performance of the technique for stochastic optimisation.

2. Constrained SPSA algorithms

2.1 Penalty functions

The basic idea of the penalty-function approach is to convert the originally constrained optimisation problem (1) into an unconstrained one defined by

  min L_r(θ) ≜ L(θ) + r P(θ),   (4)

where P : R^d → R is the penalty function and r is a positive real number normally referred to as the penalty parameter. The penalty functions are defined such that P is an increasing function of the constraint functions q_j; P > 0 if and only if q_j > 0 for some j; P → ∞ as q_j → ∞; and P → l (l ≥ 0) as q_j → −∞. In this paper, we consider a penalty function method based on the augmented Lagrangian function defined by

  L_r(θ, λ) = L(θ) + (1/2r) Σ_{j=1}^{s} ( max{0, λ_j + r q_j(θ)}² − λ_j² ),   (5)

where λ ∈ R^s can be viewed as an estimate of the Lagrange multiplier vector. The associated penalty function is

  P(θ) = (1/2r²) Σ_{j=1}^{s} ( max{0, λ_j + r q_j(θ)}² − λ_j² ).   (6)

Let {r_n} be a positive and strictly increasing sequence with r_n → ∞ and let {λ_n} be a bounded non-negative sequence in R^s. It can be shown (see, for example, Bertsekas (1995, §4.2)) that the minimum of the sequence of functions {L_n}, defined by

  L_n(θ) ≜ L_{r_n}(θ, λ_n),

converges to the solution of the original constrained problem (1). Since the penalised cost function (or the augmented Lagrangian) (5) is a differentiable function of θ, we can apply the standard stochastic approximation technique with the SP gradient estimate for L to minimise {L_n(θ)}. In other words, the original problem can be solved with an algorithm of the following form:

  θ_{n+1} = θ_n − a_n ∇̂L_n(θ_n)
          = θ_n − a_n ĝ_n(θ_n) − a_n r_n ∇P(θ_n),

where ĝ_n is the SP estimate of the gradient g(·) at θ_n that we shall specify later. Note that since we assume the constraints are explicitly given, the gradient of the penalty function P(·) is used directly in the algorithm.

Note that when λ_n = 0, the penalty function defined by (6) reduces to the standard quadratic penalty function discussed in Wang and Spall (1999):

  L_r(θ, 0) = L(θ) + r Σ_{j=1}^{s} max{0, q_j(θ)}².

Even though the convergence of the proposed algorithm only requires that {λ_n} be bounded (hence we can set λ_n = 0), we can significantly improve the performance of the algorithm with an appropriate choice of the sequence based on concepts from Lagrange multiplier theory. Moreover, it has been shown (Bertsekas 1995) that, with the standard quadratic penalty function, the penalised cost function L_n = L + r_n P can become ill-conditioned as r_n increases (that is, the condition number of the Hessian of L_n at θ_n diverges to ∞ with r_n). The use of the general penalty function defined in (6) can prevent this difficulty if {λ_n} is chosen so that it is close to the true Lagrange multipliers. In §4, we will present an update scheme based on the method of multipliers (see, for example, Bertsekas (1982)) to update λ_n and compare its performance with the standard quadratic penalty function.

2.2 An SPSA algorithm for inequality constraints

In this section, we present the specific form of the algorithm for solving the constrained stochastic optimisation problem. The algorithm we consider is defined by

  θ_{n+1} = θ_n − a_n ĝ_n(θ_n) − a_n r_n ∇P(θ_n),   (7)

where ĝ_n(θ_n) is an estimate of the gradient of L, g(·), at θ_n, {r_n} is an increasing sequence of positive scalars with lim_{n→∞} r_n = ∞, ∇P(θ) is the gradient of P(·) at θ, and {a_n} is a positive scalar sequence satisfying a_n → 0 and Σ_{n=1}^{∞} a_n = ∞. The gradient estimate ĝ_n is obtained from two noisy measurements of the cost function L by

  ĝ_n(θ_n) = [ (L(θ_n + c_n Δ_n) + ε_n⁺) − (L(θ_n − c_n Δ_n) + ε_n⁻) ] / (2 c_n) · (1/Δ_n),   (8)

where Δ_n ∈ R^d is a random perturbation vector, c_n → 0 is a positive sequence, ε_n⁺ and ε_n⁻ are noise in the measurements, and 1/Δ_n denotes the vector [1/Δ_n¹, …, 1/Δ_n^d]ᵀ. For analysis, we rewrite the algorithm (7) as

  θ_{n+1} = θ_n − a_n g(θ_n) − a_n r_n ∇P(θ_n) + a_n d_n − a_n (ε_n / 2c_n)(1/Δ_n),   (9)

where d_n and ε_n are defined by

  d_n ≜ g(θ_n) − [L(θ_n + c_n Δ_n) − L(θ_n − c_n Δ_n)] / (2 c_n) · (1/Δ_n),
  ε_n ≜ ε_n⁺ − ε_n⁻,

respectively. We establish the convergence of the algorithm (7) and the associated asymptotic normality under appropriate assumptions in the next section.

3. Convergence and asymptotic normality

3.1 Convergence theorem

To establish convergence of the algorithm (7), we need to study the asymptotic behaviour of an SA algorithm with a 'time-varying' regression function. In other words, we need to consider the convergence of an SA algorithm of the following form:

  θ_{n+1} = θ_n − a_n f_n(θ_n) + a_n d_n + a_n e_n,   (10)

where {f_n(·)} is a sequence of functions. We state here without proof a version of the convergence theorem given by Spall and Cristion (1998) for an algorithm in the generic form (10).

Theorem 1: Assume the following conditions hold:

(A.1) For each n large enough (n ≥ N for some N ∈ ℕ), there exists a unique θ_n* such that f_n(θ_n*) = 0. Furthermore, lim_{n→∞} θ_n* = θ*.
(A.2) d_n → 0, and Σ_{k=1}^{n} a_k e_k converges.
(A.3) For some N < ∞, any ρ > 0 and for each n ≥ N, if ‖θ − θ*‖ ≥ ρ, then there exists a δ_n(ρ) > 0 such that (θ − θ*)ᵀ f_n(θ) ≥ δ_n(ρ)‖θ − θ*‖, where δ_n(ρ) satisfies Σ_{n=1}^{∞} a_n δ_n(ρ) = ∞ and d_n δ_n(ρ)⁻¹ → 0.
(A.4) For each i = 1, 2, …, d, and any ρ > 0, if |θ_{ni} − (θ*)_i| > ρ eventually, then either f_{ni}(θ_n) ≥ 0 eventually or f_{ni}(θ_n) < 0 eventually, where we use (θ*)_i to denote the ith element of θ*.
(A.5) For any ρ > 0 and non-empty S ⊆ {1, 2, …, d}, there exists a ρ′(ρ, S) > ρ such that for all θ ∈ {θ ∈ R^d : |(θ − θ*)_i| < ρ when i ∉ S, |(θ − θ*)_i| ≥ ρ′(ρ, S) when i ∈ S},

  lim sup_{n→∞} | Σ_{i ∉ S} (θ − θ*)_i f_{ni}(θ) / Σ_{i ∈ S} (θ − θ*)_i f_{ni}(θ) | < 1.

Then the sequence {θ_n} defined by the algorithm (10) converges to θ*.

Based on Theorem 1, we give a convergence result for algorithm (7) by substituting ∇L_n(θ_n) = g(θ_n) + r_n ∇P(θ_n), d_n, and (ε_n / 2c_n)(1/Δ_n) into f_n(θ_n), d_n, and e_n in (10), respectively. We need the following assumptions:

(C.1) There exists K₁ ∈ ℕ such that for all n ≥ K₁, we have a unique θ_n* ∈ R^d with ∇L_n(θ_n*) = 0.
(C.2) {Δ_{ni}} are i.i.d. and symmetrically distributed about 0, with |Δ_{ni}| ≤ α₀ a.s. and E|Δ_{ni}⁻¹| ≤ α₁.
(C.3) Σ_{k=1}^{∞} (a_k ε_k / 2c_k)(1/Δ_k) converges almost surely.
(C.4) If ‖θ − θ*‖ ≥ ρ, then there exists a δ(ρ) > 0 such that
  (i) if θ ∈ G, then (θ − θ*)ᵀ g(θ) ≥ δ(ρ)‖θ − θ*‖ > 0;
  (ii) if θ ∉ G, then at least one of the following two conditions holds:
    • (θ − θ*)ᵀ g(θ) ≥ δ(ρ)‖θ − θ*‖ and (θ − θ*)ᵀ ∇P(θ) ≥ 0;
    • (θ − θ*)ᵀ g(θ) ≥ −M for some constant M > 0, and (θ − θ*)ᵀ ∇P(θ) ≥ δ(ρ)‖θ − θ*‖ > 0.
(C.5) a_n r_n → 0; g(·) and ∇P(·) are Lipschitz (see comments below).
(C.6) ∇L_n(·) satisfies condition (A.5).

Theorem 2: Suppose that assumptions (C.1)–(C.6) hold. Then the sequence {θ_n} defined by (7) converges to θ* almost surely.

Proof: We only need to verify the conditions (A.1)–(A.5) in Theorem 1 to show the desired result.

• Condition (A.1) basically requires that the stationary points of the sequence {∇L_n(·)} converge to θ*. Assumption (C.1) together with existing results on penalty function methods establishes this desired convergence.

• From the results in Spall (1992), Wang (1996), and assumptions (C.2)–(C.3), we can show that condition (A.2) holds.
• Since r_n → ∞, condition (A.3) holds from assumption (C.4).
• From (9) and assumptions (C.1) and (C.5), we have |(θ_{n+1} − θ_n)_i| < |(θ_n − θ*)_i| for large n if |θ_{ni} − (θ*)_i| > ρ. Hence, for large n, the sequence {θ_{ni}} does not 'jump' over the interval between (θ*)_i and θ_{ni}. Therefore, if |θ_{ni} − (θ*)_i| > ρ eventually, then the sequence {f_{ni}(θ_n)} does not change sign eventually. That is, condition (A.4) holds.
• Assumption (A.5) holds directly from (C.6). □

Theorem 2 given above is general in the sense that it does not specify the exact type of penalty function P(·) to adopt. In particular, assumption (C.4) seems difficult to satisfy. In fact, assumption (C.4) is fairly weak and does address the limitation of the penalty function based algorithm. For example, suppose that a constraint function q_k(·) has a local minimum at θ′ with q_k(θ′) > 0. Then for every θ with q_j(θ) ≤ 0, j ≠ k, we have (θ − θ′)ᵀ ∇P(θ) > 0 whenever θ is close enough to θ′. As r_n gets larger, the term r_n ∇P(θ) would dominate the behaviour of the algorithm and result in a possible convergence to θ′, a wrong solution. We also like to point out that assumption (C.4) is satisfied if the cost function L and the constraint functions q_j, j = 1, …, s, are convex and satisfy the Slater condition, that is, the minimum cost function value L(θ*) is finite and there exists a θ ∈ R^d such that q_j(θ) < 0 for all j (this is the case studied in Pflug (1981)). Assumption (C.6) ensures that, for n sufficiently large, each element of g(θ) + r_n ∇P(θ) makes a non-negligible contribution to products of the form (θ − θ*)ᵀ(g(θ) + r_n ∇P(θ)) when (θ − θ*)_i ≠ 0. A sufficient condition for (C.6) is that, for each i, g_i(θ) + r_n ∇_i P(θ) be uniformly bounded both away from 0 and ∞ when |(θ − θ*)_i| > 0 for all i.

Theorem 2 in the stated form does require that the penalty function P be differentiable. However, it is possible to extend the stated results to the case where P is Lipschitz but not differentiable at a set of points with zero Lebesgue measure, for example, the absolute value penalty function

  P(θ) = max_{j=1,…,s} max{0, q_j(θ)}.

In the case where the density function of the measurement noise (ε_n⁺ and ε_n⁻ in (8)) exists and has infinite support, we can take advantage of the fact that the iterations of the algorithm visit any zero-measure set with zero probability. Assuming that the set D ≜ {θ ∈ R^d : ∇P(θ) does not exist} has Lebesgue measure 0 and the random perturbation Δ_n follows a Bernoulli distribution (P(Δ_n^i = 1) = P(Δ_n^i = −1) = 1/2), we can construct a simple proof to show that

  P{θ_n ∈ D infinitely often} = 0 if P{θ₁ ∈ D} = 0.

Therefore, the convergence result in Theorem 2 applies to penalty functions with non-smoothness on a set with measure zero. Hence, in any practical application, we can simply ignore this technical difficulty and use

  ∇P(θ) = max{0, q_{J(θ)}(θ)} ∇q_{J(θ)}(θ),

where J(θ) = arg max_{j=1,…,s} q_j(θ) (note that J(θ) is uniquely defined for θ ∉ D). An alternative approach to handle this technical difficulty is to apply the SP gradient estimate directly to the penalised cost L(θ) + r P(θ) and adopt the convergence analysis presented in He, Fu and Marcus (2003) for non-differentiable optimisation with additional convexity assumptions. Use of non-differentiable penalty functions might allow us to avoid the difficulty of ill-conditioning as r_n → ∞ without using the more complicated penalty function methods such as the augmented Lagrangian used here. The rationale here is that there exists a constant r̄ = Σ_{j=1}^{s} λ_j* (λ_j* is the Lagrange multiplier associated with the jth constraint) such that the minimum of L + rP is identical to the solution of the original constrained problem for all r > r̄, based on the theory of exact penalties; see, for example, Bertsekas (1995, §4.3). This property of the absolute value penalty function allows us to use a constant penalty parameter r > r̄ (instead of r_n → ∞) to avoid the issue of ill-conditioning. However, it is difficult to obtain a good estimate for r̄ in our situation, where the analytical expression of g(·) (the gradient of the cost function L(·)) is not available. And it is not clear that the application of exact penalty functions with r_n → ∞ would lead to better performance than the augmented Lagrangian based technique. In §4 we will also illustrate (via numerical results) the potential poor performance of the algorithm with an arbitrarily chosen large r.

3.2 Asymptotic normality

When differentiable penalty functions are used, we can establish the asymptotic normality for the proposed algorithms. In the case where q_j(θ*) < 0 for all j = 1, …, s (that is, there is no active constraint at θ*), the asymptotic behaviour of the algorithm is exactly the same as the unconstrained SPSA algorithm and has been established in Spall (1992). Here we consider the case where at least one of the constraints is active at θ*, that is, the set A ≜ {j = 1, …, s : q_j(θ*) = 0} is not empty. We establish the asymptotic normality for the algorithm with smooth penalty functions of the form

  P(θ) = Σ_{j=1}^{s} p_j(q_j(θ)),

which includes both the quadratic penalty and augmented Lagrangian functions.

Assume further that E[ε_n | F_n, Δ_n] = 0 a.s., E[ε_n² | F_n] → σ² a.s., E[(Δ_n^i)²] → ρ², and E[(Δ_n^i)⁻²] → ρ′², where F_n is the σ-algebra generated by θ₁, …, θ_n. Let H(θ) denote the Hessian matrix of L(θ) and

  H_p(θ) = Σ_{j ∈ A} ∇²( p_j(q_j(θ)) ).

The next proposition establishes the asymptotic normality for the proposed algorithm with the following choice of parameters: a_n = a n^{−α}, c_n = c n^{−γ} and r_n = r n^{η} with a, c, r > 0, β = α − η − 2γ > 0, and 3γ − (α/2) + (3η/2) ≥ 0.

Proposition 1: Assume that conditions (C.1)–(C.6) hold. Let P be orthogonal with P H_p(θ*) Pᵀ = a⁻¹ r⁻¹ diag(λ₁, …, λ_d). Then

  n^{β/2} (θ_n − θ*) →dist N(μ, PMPᵀ) as n → ∞,

where M = (1/4) a² r² c⁻² σ² ρ′² diag[(2λ₁ − β₊)⁻¹, …, (2λ_d − β₊)⁻¹] with β₊ = β < 2 min_i λ_i if α = 1 and β₊ = 0 if α < 1, and

  μ = 0                                  if 3γ − (α/2) + (3η/2) > 0,
  μ = (a r H_p(θ*) + (β₊/2) I)⁻¹ T       if 3γ − (α/2) + (3η/2) = 0,

where the lth element of T is

  −(1/6) a c² ρ² [ L_{lll}^{(3)}(θ*) + 3 Σ_{i=1, i≠l}^{d} L_{iil}^{(3)}(θ*) ].

Proof: For large enough n, we have

  E[ĝ_n(θ_n) | θ_n] = H(θ̄_n)(θ_n − θ*) + b_n(θ_n),
  ∇P(θ_n) = H_p(θ̃_n)(θ_n − θ*),

where b_n(θ_n) = E[ĝ_n(θ_n) − g(θ_n) | θ_n] and θ̄_n, θ̃_n lie on the line segment between θ_n and θ*. Rewrite the algorithm as

  θ_{n+1} − θ* = (I − n^{−(α−η)} Γ_n)(θ_n − θ*) + n^{−(α+β)/2} Φ_n V_n + n^{−α−β/2} T_n,

where

  Γ_n = a n^{−η} H(θ̄_n) + a r H_p(θ̃_n),
  Φ_n = −aI,
  V_n = (n^{(α+β)/2} a_n / a) [ ĝ_n(θ_n) − E(ĝ_n(θ_n) | θ_n) ],
  T_n = −a n^{β/2} b_n(θ_n).

Following the techniques used in Spall (1992) and the general normality results from Fabian (1968), we can establish the desired result. □

Note that, based on the result in Proposition 1, the n^{−1/3} convergence rate is achieved with α = 1 and γ = (1/6) − (η/2) > 0.

4. Numerical experiments

We test our algorithm on a constrained optimisation problem described in Schwefel (1995, p. 352):

  min L(θ) = θ₁² + θ₂² + 2θ₃² + θ₄² − 5θ₁ − 5θ₂ − 21θ₃ + 7θ₄

subject to

  q₁(θ) = 2θ₁² + θ₂² + θ₃² + 2θ₁ − θ₂ − θ₄ − 5 ≤ 0,
  q₂(θ) = θ₁² + θ₂² + θ₃² + θ₄² + θ₁ − θ₂ + θ₃ − θ₄ − 8 ≤ 0,
  q₃(θ) = θ₁² + 2θ₂² + θ₃² + 2θ₄² − θ₁ − θ₄ − 10 ≤ 0.

The minimum cost L(θ*) = −44 under the constraints occurs at θ* = [0, 1, 2, −1]ᵀ, where the constraints q₁(θ) ≤ 0 and q₂(θ) ≤ 0 are active. The associated Lagrange multiplier is [λ₁*, λ₂*, λ₃*]ᵀ = [2, 1, 0]ᵀ. As claimed by Schwefel (1995), the problem had not been solved to satisfactory accuracy with deterministic search methods that operate directly with constraints. Further, we increase the difficulty of the problem by adding i.i.d. zero-mean Gaussian noise to L(θ) and assume that only noisy measurements of the cost function L are available (without the gradient). The initial point is chosen at [0, 0, 0, 0]ᵀ, and the standard deviation of the added Gaussian noise is 4.0.

We consider three different penalty functions:

• Quadratic penalty function:

  P(θ) = (1/2) Σ_{j=1}^{s} max{0, q_j(θ)}².   (11)

In this case the gradient of P(·) required in the algorithm is

  ∇P(θ) = Σ_{j=1}^{s} max{0, q_j(θ)} ∇q_j(θ).   (12)

• Augmented Lagrangian:

  P(θ) = (1/2r²) Σ_{j=1}^{s} ( max{0, λ_j + r q_j(θ)}² − λ_j² ).   (13)

In this case, the actual penalty function used will vary over the iterations, depending on the specific values selected for r_n and λ_n. The gradient of the penalty function required in the algorithm at the nth iteration is

  ∇P(θ) = (1/r_n) Σ_{j=1}^{s} max{0, λ_{nj} + r_n q_j(θ)} ∇q_j(θ).   (14)

To properly update λ_n, we adopt a variation of the multiplier method (Bertsekas 1995):

  λ_{nj} ← max{ 0, min{ λ_{nj} + r_n q_j(θ_n), M } },   (15)

where λ_{nj} denotes the jth element of the vector λ_n, and M ∈ R⁺ is a large constant scalar. Since (15) ensures that {λ_n} is bounded, convergence of the minimum of {L_n(·)} remains valid. Furthermore, {λ_n} will be close to the true Lagrange multipliers as n → ∞.

• Absolute value penalty function:

  P(θ) = max_{j=1,…,s} max{0, q_j(θ)}.   (16)

The gradient of P(·), when it exists, is

  ∇P(θ) = max{0, q_{J(θ)}(θ)} ∇q_{J(θ)}(θ),   (17)

where J(θ) = arg max_{j=1,…,s} q_j(θ).

For all the simulations we use the following parameter values: a_n = 0.1(n + 100)^{−0.602} and c_n = n^{−0.101}. These parameters for a_n and c_n are chosen following a practical implementation guideline recommended in Spall (1998). For the augmented Lagrangian method, λ_n is initialised as a zero vector. For the quadratic penalty function and the augmented Lagrangian, we use r_n = 10 n^{0.1} for the penalty parameter. For the absolute value penalty function, we consider two possible values for the constant penalty: r_n = r = 3.01 and r_n = r = 10. Note that in our experiment, r̄ = Σ_{j=1}^{s} λ_j* = 3. Hence, the first choice of r at 3.01 is theoretically optimal but not practical, since there is no reliable way to estimate r̄. The second choice of r represents a more typical scenario, where an upper bound on r̄ is estimated.

Figure 1 plots the averaged errors (over 100 independent simulations) to the optimum over 4000 iterations of the algorithms. The simulation results in Figure 1 seem to suggest that the proposed algorithm with the quadratic penalty function and the augmented Lagrangian led to comparable performance (the augmented Lagrangian method performed slightly better than the standard quadratic technique). This suggests that a more effective update scheme for λ_n than (15) is needed for the augmented Lagrangian technique. The absolute value penalty function with r = 3.01 (r̄ = 3) has the best performance. However, when an arbitrary upper bound on r̄ is used (r = 10), the performance is much worse than both the quadratic penalty function and the augmented Lagrangian. This illustrates a key difficulty in the effective application of the exact penalty theorem with the absolute value penalty function.

Figure 1. Error to the optimum (‖θ_n − θ*‖) averaged over 100 independent simulations.

5. Conclusions and remarks

We presented a stochastic approximation algorithm based on the penalty function method and a simultaneous perturbation gradient estimate for solving stochastic optimisation problems with general inequality constraints. We also presented a general convergence result and the associated asymptotic normality for the proposed algorithm. Numerical results are included to demonstrate the performance of the proposed algorithm with the standard quadratic and absolute value penalty functions and a more complicated penalty function based on the augmented Lagrangian method.

In this paper, we considered explicit constraints, where the analytical expressions of the constraints are available. It is also possible to apply the same algorithm with an appropriate gradient estimate for P(·) to problems with implicit constraints, where the constraints can only be measured or estimated with possible errors. The success of this approach would depend on efficient techniques to obtain an unbiased gradient estimate of the penalty function. For example, if we can measure or estimate a value of the penalty function P(θ_n) at an arbitrary location with zero-mean error, then the SP gradient estimate can be applied. Of course, in this situation further assumptions on r_n need to be satisfied (in general, we would at least need Σ_{n=1}^{∞} (a_n r_n / c_n)² < ∞). However, in a typical application, we most likely can only measure the value of the constraint q_j(θ_n) with zero-mean error. Additional bias would be present if the standard finite-difference or the SP techniques were applied to estimate ∇P(θ_n) directly in this situation. A novel technique to obtain an unbiased estimate of P(θ_n) based on a reasonable number of measurements is required to make the algorithm proposed in this paper feasible in dealing with implicit constraints.

References

Bertsekas, D.P. (1982), Constrained Optimization and Lagrange Multiplier Methods, NY: Academic Press.
Bertsekas, D.P. (1995), Nonlinear Programming, Belmont, MA: Athena Scientific.
Chong, E.K.P., and Żak, S.H. (1996), An Introduction to Optimization, New York: John Wiley and Sons.
Dupuis, P., and Kushner, H.J. (1987), "Asymptotic Behavior of Constrained Stochastic Approximations via the Theory of Large Deviations," Probability Theory and Related Fields, 75, 223–274.
Fabian, V. (1968), "On Asymptotic Normality in Stochastic Approximation," The Annals of Mathematical Statistics, 39, 1327–1332.
He, Y., Fu, M.C., and Marcus, S.I. (2003), "Convergence of Simultaneous Perturbation Stochastic Approximation for Nondifferentiable Optimization," IEEE Transactions on Automatic Control, 48, 1459–1463.
Hiriart-Urruty, J. (1977), "Algorithms of Penalization Type and of Dual Type for the Solution of Stochastic Optimization Problems with Stochastic Constraints," in Recent Developments in Statistics, ed. J. Barra, Amsterdam: North Holland Publishing, pp. 183–219.
Kiefer, J., and Wolfowitz, J. (1952), "Stochastic Estimation of the Maximum of a Regression Function," The Annals of Mathematical Statistics, 23, 462–466.
Kushner, H., and Clark, D. (1978), Stochastic Approximation Methods for Constrained and Unconstrained Systems, New York: Springer-Verlag.
Kushner, H., and Yin, G. (1997), Stochastic Approximation Algorithms and Applications, New York: Springer-Verlag.
Pflug, G.C. (1981), "On the Convergence of a Penalty-type Stochastic Optimization Procedure," Journal of Information and Optimization Sciences, 2, 249–258.
Sadegh, P. (1997), "Constrained Optimization via Stochastic Approximation with a Simultaneous Perturbation Gradient Approximation," Automatica, 33, 889–892.
Schwefel, H.-P. (1995), Evolution and Optimum Seeking, New York: John Wiley and Sons, Inc.
Spall, J.C. (1992), "Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation," IEEE Transactions on Automatic Control, 37, 332–341.
Spall, J.C. (1998), "Implementation of the Simultaneous Perturbation Algorithm for Stochastic Optimization," IEEE Transactions on Aerospace and Electronic Systems, 34, 817–823.
Spall, J.C., and Cristion, J.A. (1998), "Model-free Control of Nonlinear Stochastic Systems with Discrete-time Measurements," IEEE Transactions on Automatic Control, 43, 1178–1200.
Wang, I.-J. (1996), "Analysis of Stochastic Approximation and Related Algorithms," PhD thesis, School of Electrical and Computer Engineering, Purdue University.
Wang, I.-J., and Spall, J.C. (1999), "A Constrained Simultaneous Perturbation Stochastic Approximation Algorithm Based on Penalty Function Method," in Proceedings of the 1999 American Control Conference, Vol. 1, San Diego, CA, pp. 393–399.