
A Discrete Parameter Stochastic Approximation Algorithm for Simulation Optimization

Shalabh Bhatnagar Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012, India [email protected]

Hemant J. Kowshik Department of Electrical Engineering Indian Institute of Technology Madras Chennai 600 036, India

The authors develop a two-timescale simultaneous perturbation stochastic approximation (SPSA) algorithm for simulation-based parameter optimization over discrete sets. This algorithm is applicable in cases where the cost to be optimized is itself the long-run average of certain cost functions whose noisy estimates are obtained via simulation. The authors present the convergence analysis of their algorithm. Next, they study applications of their algorithm to the problem of admission control in communication networks. They study this problem under two different experimental settings and consider appropriate continuous time queuing models in both settings. Their algorithm finds optimal threshold-type policies within suitable parameterized classes of these. They show results of several experiments for different network parameters and rejection costs. The authors also study the sensitivity of their algorithm with respect to its parameters and step sizes. The results obtained are along expected lines.

Keywords: Discrete parameter optimization, stochastic approximation algorithms, two-timescale SPSA, admission control in communication networks

SIMULATION, Vol. 81, Issue 11, November 2005 757-772
©2005 The Society for Modeling and Simulation International
DOI: 10.1177/0037549705062294

1. Introduction

Stochastic discrete optimization plays an important role in the design and analysis of discrete event systems. Examples include the problem of resource (buffer/bandwidth) allocation in manufacturing systems and communication networks [1, 2], as well as admission control and routing in communication/wireless networks [3]. Discrete optimization problems in general are hard combinatorial problems.

There have been several approaches for solving discrete optimization problems. Among these, simulated annealing [4] and its variants have been well studied. Here the algorithm does not necessarily proceed along the path of decreasing cost, as increases in the cost function are allowed with a certain probability. The original annealing algorithm, however, requires cost function measurements to be precisely known. Various variants of the above algorithm have subsequently been developed that work with noisy or imprecise measurements [5] or use simulation [6]. Alrefaei and Andradóttir [6] also propose the use of a "constant temperature" annealing schedule instead of a (slowly) decreasing schedule. Some other variants of simulated annealing that work with noisy objective function estimates include the stochastic ruler algorithm of Yan and Mukai [7] and the stochastic comparison algorithm of Gong, Ho, and Zhai [8]. Algorithms based on simulated annealing, however, are known to be slow in general. In some other work, a stochastic branch and bound algorithm for problems of discrete stochastic optimization, subject to constraints that are possibly given by a set of inequalities, has been developed in Norkin,

Ermoliev, and Ruszczyński [9]. In Barnhart, Wieselthier, and Ephremides [3], the problem of admission control in multihop wireless networks is considered, and a gradient search-based procedure is used for finding an optimal policy within the class of certain coordinate convex policies; see also Jordan and Varaiya [10], who consider optimization over the above type of policies for an admission control problem in multiple-service, multiple-resource networks. In Gokbayrak and Cassandras [2], a stochastic discrete resource allocation problem is considered. The discrete optimization problem is transformed into an analogous (surrogate) continuous parameter optimization problem by constructing the convex hull of the discrete search space. An estimate of the cost derivative is obtained using concurrent estimation [11] or perturbation analysis (PA) [12] techniques. For the surrogate problem, they use the simplex method to identify the (N + 1) points in the discrete constraint set whose convex combination the R^N-valued surrogate state is. In Gokbayrak and Cassandras [13], more general constraint sets than the one above are considered, although the basic approach is similar (to Gokbayrak and Cassandras [2]). A computationally simpler algorithm (than the simplex method) is provided for identifying the above-mentioned points. (The basic approach in Gokbayrak and Cassandras [2, 13] is somewhat loosely similar to our approach below. We provide detailed comparisons in the next section.)

In the works cited above, noisy estimates of the cost function are assumed available at discrete parameter settings, with the goal being to find the optimum parameter for the noise-averaged cost. There are, however, scenarios in which the cost for a given parameter value is in itself the long-run average of a certain cost function whose noisy estimates at the (above) given parameter values can be obtained via simulation. For solving such problems, there have been approaches in the continuous optimization framework. For instance, those based on PA [12] or the likelihood ratio [14] require one simulation for finding the optimum parameter. These, however, require knowledge of sample path derivatives of performance with respect to (w.r.t.) the given parameter. In addition, they require certain constraining regularity conditions on sample performance and underlying process parameters. Among finite difference gradient approximation-based approaches, the Kiefer-Wolfowitz (K-W) algorithm with two-sided differences requires 2N cost measurements for an N-dimensional parameter vector. In a recent related work [15], the gradient estimates in a K-W-type algorithm are formed by simultaneously perturbing all parameter components along random directions, most commonly by using independent, symmetric, ±1-valued, Bernoulli-distributed, random variables. This algorithm, known as the simultaneous perturbation stochastic approximation (SPSA) algorithm, requires only two loss function measurements for any N-vector parameter and is, in general, found to be very efficient.

In Bhatnagar and Borkar [16], a two-timescale stochastic approximation algorithm that uses one-sided difference K-W-type gradient estimates was developed as an alternative to PA-type schemes. The idea here is that aggregation/averaging of data is performed using the faster timescale recursions while parameter updates are performed on the slower one, and the entire algorithm is updated at each epoch. The disadvantage with this algorithm, however, is that it requires a significant amount of computation since it generates (N + 1) parallel simulations at each instant and hence is slow when N is large. In Bhatnagar et al. [17], the SPSA-based analog of the algorithm in Bhatnagar and Borkar [16] was developed, except for the difference that an averaging of the faster timescale recursions over a certain fixed number (L ≥ 1) of epochs was proposed before each parameter update, in addition to the two-timescale averaging. This number L is set arbitrarily, and the additional averaging is seen to improve performance. Other simulation optimization algorithms based on the SPSA technique have more recently been developed in Bhatnagar and Borkar [18], Bhatnagar et al. [19], and Bhatnagar [20].

The algorithms described in the previous two paragraphs are for optimization over continuously valued sets. In this article, we develop an algorithm for discrete parameter simulation-based optimization where the cost is the long-run average of certain cost functions that depend on the state of an underlying parameterized Markov process. This algorithm is a variant of the algorithm in Bhatnagar et al. [17] that is adapted to optimization over discrete sets and uses two-timescale averaging. The motivation for using two-timescale stochastic approximation is given in the next section. In a related work [1], a variant of the SPSA algorithm [15] is used for function optimization over discrete sets. This, however, is of the one-timescale variety and is not developed for the setting of simulation optimization as in this study. We briefly explain the convergence analysis of our algorithm. Next, we present an application of our algorithm for finding optimal policies for a problem of admission control [21, 22] in communication networks. Our methods are applicable both to admission control of calls (e.g., of customers who require services with certain quality of service [QoS] requirements for an arbitrary or random duration of time) and to packet admission control (that for individual packets at link routers within the network). We consider two different settings for our experiments. These are explained in detail in section 4. We assume the class of feedback policies in these settings to be of the threshold type, depending on the state of the underlying Markov process at any instant. Our algorithm gives an optimal policy within the prescribed class of policies (i.e., computes the optimal

such threshold-type policy). Obtaining optimal policies using analytical techniques is not feasible in such settings in general. Moreover, as we explain in the next section, PA-type approaches, as with Gokbayrak and Cassandras [2, 13], are not directly applicable in such settings. Our simulation-based discrete parameter algorithm proves effective here as it gives an optimal policy within a given parameterized class of policies (in this case, the threshold-type policies). This is similar in spirit to neurodynamic programming techniques with policy parameterization in the context of Markov decision processes [23, 24].

The rest of the article is organized as follows: section 2 describes the setting and the algorithm and provides a motivation for the two-timescale idea. We also discuss comparisons of our approach with the ones in Gokbayrak and Cassandras [2, 13]. The convergence analysis is briefly presented in section 3. The experiments on admission control in communication networks are presented in section 4. Finally, section 5 provides the concluding remarks.

2. Framework and Algorithm

Consider a Markov process {X_n^θ} parameterized by θ ∈ D ⊂ Z^N, where Z is the set of all integers and N ≥ 1. Suppose X_n^θ, n ≥ 1, take values in S ⊂ R^d for some d ≥ 1. We assume D has the form D = ∏_{i=1}^{N} {D_i,min, ..., D_i,max}. Here, D_i,min, D_i,max ∈ Z with D_i,min ≤ D_i,max, i = 1, ..., N, and {D_i,min, ..., D_i,max} is the set of successive integers from D_i,min to D_i,max, with both end points included. Assume that for any fixed θ ∈ D, {X_n^θ} is ergodic with transition kernel p_θ(x, dy), x, y ∈ S. Let h : R^d → R be the associated cost function that is assumed to be bounded and continuous. The aim is to find θ* ∈ D such that

    J(θ*) = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} h(X_j^{θ*}) = min_{θ∈D} J(θ).    (1)

Here, J(·) denotes the long-run average cost. The aim therefore is to find θ* that minimizes J(θ).

Before we proceed further, we first provide the motivation for using the two-timescale stochastic approximation approach. Suppose for the moment that we are interested in finding a parameter θ, taking values in a continuously valued set, that minimizes J(θ). One then needs to compute the gradient ∇J(θ) ≡ (∇_1 J(θ), ..., ∇_N J(θ))^T, assuming that it exists. Consider a Markov chain {X_j^θ} governed by parameter θ. Also, consider N other Markov chains {X_j^i} governed by parameters θ + δe_i, i = 1, ..., N, respectively. Here, e_i is an N-dimensional vector with 1 in the ith place and 0 elsewhere. Then,

    ∇_i J(θ) = lim_{δ→0} (J(θ + δe_i) − J(θ))/δ    (2)
             = lim_{δ→0} lim_{l→∞} (1/(δl)) Σ_{j=0}^{l−1} (h(X_j^i) − h(X_j^θ)).    (3)

The gradient ∇J(θ) may thus be estimated by simulating the outcomes of the (N + 1) Markov chains {X_j^θ} and {X_j^i}, i = 1, ..., N, respectively. These Markov chains, in turn, correspond to the underlying processes in independently running parallel systems that are each identical to one another except that they run with different parameter values (above). Note that the transition dynamics of these Markov chains need not be precisely known as long as state transitions of these chains can be simulated. Estimates (2) correspond to one-sided Kiefer-Wolfowitz estimates of the gradient ∇J(θ). One can similarly obtain two-sided Kiefer-Wolfowitz gradient estimates as well. More recently, Spall [15] proposed the following (SPSA) randomized difference gradient estimates. Suppose Δ_i, i = 1, ..., N, are independent identically distributed (i.i.d.) Bernoulli-distributed random variables with Δ_i = ±1 w.p. 1/2, and let Δ = (Δ_1, ..., Δ_N)^T. Quantities Δ_i are called random perturbations. (More general conditions on the sequences of independently generated Δs are given in Spall [15]; see also condition (A) below, which is similar to Spall [15].) Consider now Markov chains {X_n^+} and {X_n^−} that are respectively governed by parameters (θ + δΔ) and (θ − δΔ). Then, the SPSA gradient estimate is

    ∇_i J(θ) = lim_{δ→0} E[(J(θ + δΔ) − J(θ − δΔ))/(2δΔ_i)]    (4)
             = lim_{δ→0} E[lim_{l→∞} (1/(2δl)) Σ_{j=0}^{l−1} (h(X_j^+) − h(X_j^−))/Δ_i],    (5)

where the expectation is taken w.r.t. the common distribution of Δ_i.

Note from the form (3) of the estimates above that the outer limit is taken only after the inner limit. For classes of systems for which the two limits (in (3)) can be interchanged, gradient estimates from a single sample path can be obtained using PA [12] or likelihood ratio [14] techniques. For a general class of systems for which the above limit interchange is not permissible, any recursive algorithm that computes the optimum θ should have two loops, with the outer loop (corresponding to parameter updates) being updated once after the inner loop (corresponding to data averages), for a given parameter
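The two-measurement property of the SPSA estimate (4) is easy to check numerically. The sketch below is a toy illustration, not the paper's simulation setting: the quadratic cost J and all constants are our own assumptions, and J is evaluated exactly rather than estimated from a Markov chain.

```python
import numpy as np

rng = np.random.default_rng(0)

def J(theta):
    # Deterministic stand-in cost; in the paper J is itself a long-run
    # average that must be estimated by simulation.
    return float(np.sum(theta ** 2))

def spsa_gradient(theta, delta=0.1):
    """One randomized-difference estimate of grad J(theta), cf. (4)."""
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)  # i.i.d. +/-1 w.p. 1/2
    # Two cost measurements, independent of the dimension N.
    return (J(theta + delta * Delta) - J(theta - delta * Delta)) / (2.0 * delta * Delta)

theta = np.array([1.0, -2.0, 3.0])
est = np.mean([spsa_gradient(theta) for _ in range(2000)], axis=0)
print(est)  # averages to approximately the true gradient 2*theta
```

Each estimate costs two evaluations of J regardless of the dimension N, whereas a two-sided Kiefer-Wolfowitz scheme would need 2N.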

value, has converged. Thus, one can physically identify separate timescales: the faster one on which data based on a fixed parameter value are aggregated and averaged, and the slower one on which the parameter is updated once the averaging is complete for one iteration of the latter. The same effect as on different physical timescales can also be achieved by using different step-size schedules (also called timescales) in the stochastic approximation algorithm. Note also from the form of (5) that the expectation is taken after the inner loop limit and is followed by the outer limit. Bhatnagar and Borkar [16] develop a two-timescale algorithm using (3) as the gradient estimate, while Bhatnagar et al. [17] do so with (5). The use of the latter (SPSA) estimate was found in Bhatnagar et al. [17] to significantly improve performance in two-timescale algorithms. As explained before, the algorithms of Bhatnagar and Borkar [16] and Bhatnagar et al. [17] were for the setting of continuous parameter optimization. In what follows, we develop a variant of the algorithm in Bhatnagar et al. [17] for discrete parameter optimization.

Consider two step-size schedules or timescale sequences {a(n)} and {b(n)} that satisfy a(n), b(n) > 0 ∀n, and

    Σ_n a(n) = Σ_n b(n) = ∞,  Σ_n (a(n)² + b(n)²) < ∞,  a(n) = o(b(n)).    (6)

Let c > 0 be a given constant. Also, let Γ_i(x) denote the projection from x ∈ R to {D_i,min, ..., D_i,max}. In other words, if the point x is an integer and lies within the set {D_i,min, ..., D_i,max}, then Γ_i(x) = x; else, Γ_i(x) is the closest integer within the above set to x. Furthermore, if x is equidistant from two points in {D_i,min, ..., D_i,max}, then Γ_i(x) is arbitrarily set to one of them. For y = (y_1, ..., y_N)^T ∈ R^N, suppose Γ(y) = (Γ_1(y_1), ..., Γ_N(y_N))^T. Then, Γ(y) denotes the projection of y ∈ R^N on the set D (defined earlier).

Now suppose for any n ≥ 0, Δ(n) ∈ R^N is a vector of mutually independent and mean-zero random variables {Δ_1(n), ..., Δ_N(n)} (namely, Δ(n) = (Δ_1(n), ..., Δ_N(n))^T), taking values in a compact set E ⊂ R^N and having a common distribution. We assume that the random variables Δ_i(n) satisfy condition (A) below.

Condition (A). There exists a constant K̄ < ∞ such that for any n ≥ 0 and i ∈ {1, ..., N}, E[(Δ_i(n))^{−2}] ≤ K̄.

We also assume that Δ(n), n ≥ 0, are mutually independent vectors and that Δ(n) is independent of the σ-field σ(θ(l), l ≤ n). Condition (A) is a somewhat standard requirement in SPSA algorithms; see, for instance, Spall [15] for similar conditions. Let F denote the common distribution function of the random variables Δ_i(n).

2.1 Discrete Parameter Stochastic Approximation Algorithm

• Step 0 (Initialize): Set Z1(0) = Z2(0) = 0. Fix θ_1(0), θ_2(0), ..., θ_N(0) within the set D and form the parameter vector θ(0) = (θ_1(0), ..., θ_N(0))^T. Fix L and (large) M arbitrarily. Set n := 0, m := 0 and fix a constant c > 0. Generate i.i.d. random variables Δ_1(0), Δ_2(0), ..., Δ_N(0), each with distribution F and such that these are independent of θ(0). Set θ_j^1(0) = Γ_j(θ_j(0) − cΔ_j(0)) and θ_j^2(0) = Γ_j(θ_j(0) + cΔ_j(0)), j = 1, ..., N, respectively.

• Step 1: Generate simulations X_{nL+m}^{θ^1(n)} and X_{nL+m}^{θ^2(n)}, respectively governed by θ^1(n) = (θ_1^1(n), ..., θ_N^1(n))^T and θ^2(n) = (θ_1^2(n), ..., θ_N^2(n))^T. Next, update

    Z1(nL + m + 1) = Z1(nL + m) + b(n)(h(X_{nL+m}^{θ^1(n)}) − Z1(nL + m)),
    Z2(nL + m + 1) = Z2(nL + m) + b(n)(h(X_{nL+m}^{θ^2(n)}) − Z2(nL + m)).

If m = L − 1, set m := 0 and go to step 2; else, set m := m + 1 and repeat step 1.

• Step 2: For i = 1, ..., N,

    θ_i(n + 1) = Γ_i(θ_i(n) + a(n)(Z1(nL) − Z2(nL))/(2cΔ_i(n))).

Set n := n + 1. If n = M, go to step 3; else, generate i.i.d. random variables Δ_1(n), Δ_2(n), ..., Δ_N(n) with distribution F (independent of their previous values and also of previous and current θ values). Set θ_j^1(n) := Γ_j(θ_j(n) − cΔ_j(n)), θ_j^2(n) := Γ_j(θ_j(n) + cΔ_j(n)), j = 1, ..., N, and go to step 1.

• Step 3 (termination): Terminate the algorithm and output θ(M) = (θ_1(M), ..., θ_N(M))^T as the final parameter vector.

In the above, Z1(nL + m) and Z2(nL + m), m = 0, 1, ..., L − 1, n ≥ 0, are defined according to their corresponding recursions in step 1 and are used for averaging the cost function in the algorithm. As stated earlier,
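The steps above can be sketched in code. Everything in the sketch below is illustrative: the 2-dimensional grid, the synthetic noisy cost h, the target point, and the particular step-size schedules are our own assumptions, standing in for the simulation-based cost of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: D = {0,...,20}^2 and a synthetic noisy one-step cost h
# whose long-run average J(theta) is minimized at TARGET (all assumed).
D_MIN, D_MAX, N = 0, 20, 2
TARGET = np.array([13.0, 6.0])

def h(theta):
    # Noisy cost sample, playing the role of h(X_n^theta).
    return float(np.sum((theta - TARGET) ** 2) + rng.normal())

def Gamma(y):
    # Projection onto the integer grid {D_MIN,...,D_MAX}^N (nearest point).
    return np.clip(np.rint(y), D_MIN, D_MAX)

def two_timescale_spsa(M=2000, L=10, c=1.0):
    # Step 0: initialize. Note c = 1.0 > 1/(2R) = 0.5 for +/-1 perturbations.
    theta = Gamma(rng.integers(D_MIN, D_MAX + 1, size=N).astype(float))
    Z1 = Z2 = 0.0
    for n in range(M):
        a_n = 1.0 / (n + 1)                  # slower timescale
        b_n = 1.0 / (n + 1) ** (2.0 / 3.0)   # faster timescale, a(n) = o(b(n))
        Delta = rng.choice([-1.0, 1.0], size=N)
        th1 = Gamma(theta - c * Delta)
        th2 = Gamma(theta + c * Delta)
        # Step 1: average the two simulated costs over L epochs.
        for _ in range(L):
            Z1 += b_n * (h(th1) - Z1)
            Z2 += b_n * (h(th2) - Z2)
        # Step 2: SPSA update on the slower timescale, projected to the grid.
        theta = Gamma(theta + a_n * (Z1 - Z2) / (2.0 * c * Delta))
    return theta  # Step 3: output the final parameter vector

theta_star = two_timescale_spsa()
print(theta_star)
```

With a decaying a(n) and integer projection, the iterates settle on a grid point; how close that point lands to the minimizer depends on the step-size choices, which mirrors the step-size sensitivity studied in section 4.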

we use only two simulations here for any N-vector parameter. Note that in the above, we also allow for an additional averaging over L (possibly greater than 1) epochs in the two simulations. This is seen to improve performance of the system, particularly when the parameter dimension N is large. We study system performance with varying values of L in our experiments. Note also that the value of c should be carefully chosen here. In particular, c > 1/(2R) should be used, with R being the maximum absolute value that the random variables Δ_j(n) can take. This is because for lower values of c, both perturbed parameters θ_j^1(n) and θ_j^2(n) (after being projected onto the grid) would always lead to the same point. The algorithm would not show good convergence behavior in such a case. For instance, when Δ_j(n) are i.i.d. symmetric, Bernoulli-distributed, random variables with Δ_j(n) = ±1 w.p. 1/2, the value of c chosen should be greater than 0.5. This fact is verified through our experiments as well, where we generate perturbations according to the above distribution. Moreover, one should choose appropriate step-size sequences that do not decrease too fast so as to allow for better exploration of the search space.

We now provide some comparisons with the approach used in Gokbayrak and Cassandras [2, 13], as the basic approach used in the above references is somewhat loosely similar to ours. The method in Gokbayrak and Cassandras [2, 13] uses a (surrogate) system with parameters in a continuously valued set that is obtained as the convex hull of the underlying discrete set. The algorithm used there updates parameters in the above set, using a continuous optimization algorithm, by considering the sample cost for the surrogate system to be a continuously valued linear interpolation of costs for the original system parameters. The surrogate parameters after each update are projected onto the original discrete set to obtain the corresponding parameter value for the original system. The algorithm itself uses PA/concurrent estimation-based sample gradient estimates, with the operating parameter being the corresponding projected "discrete" update. Note that there are crucial differences between our approach and the one above. As shown in the next section, we also use a convex hull of the discrete parameter set, but this is only done for proving the convergence of our algorithm, which updates parameters directly in the discrete set. We do not require the continuous surrogate system at all during the optimization process. Furthermore, since we use an SPSA-based finite difference gradient estimate, we do not need PA-type gradient estimates. We directly work (for the gradient estimates) with the projected perturbed parameter values in the discrete set. As stated earlier, the use of PA requires constraining conditions on the cost and system parameters so as to allow for an interchange between the expectation and gradient (of cost) operators. We do not require such an interchange because of the use of two timescales, unlike Gokbayrak and Cassandras [2, 13], who use only one. In the type of cost functions and underlying process parameters used in our experiments in section 4, PA is not directly applicable. Finally, Gokbayrak and Cassandras [2, 13] require that the "neighborhood" (N + 1) points, whose convex combination any R^N-valued state of the surrogate system is, be explicitly obtained, which is computationally cumbersome. Our algorithm does not require such an identification of points as it directly updates the parameter on the discrete set. Next, we present the convergence analysis for our algorithm.

3. Convergence Analysis

Suppose D^c denotes the convex hull of the set D. Then D^c has the form ∏_{i=1}^{N} [D_i,min, D_i,max], namely, it is the Cartesian product of the intervals [D_i,min, D_i,max] ⊂ R, i = 1, ..., N. For showing convergence, we first extend the dynamics of the Markov process {X_n^θ} to the continuously valued parameter set θ ∈ D^c and show the analysis for the extended parameter process. Convergence for the original process is then straightforward. Note that the set D contains only a finite number (say, m) of points. For simplicity, we enumerate these as θ^1, ..., θ^m. Thus, D = {θ^1, ..., θ^m}. The set D^c then corresponds to

    D^c = {Σ_{i=1}^{m} α_i θ^i | α_i ≥ 0, i = 1, ..., m, Σ_{i=1}^{m} α_i = 1}.

In general, any point θ ∈ D^c can be written as θ = Σ_{i=1}^{m} α_i(θ)θ^i. Here we consider an explicit functional dependence of the weights α_i, i = 1, ..., m, on the parameter θ and assume that α_i(θ), i = 1, ..., m, are continuously differentiable with respect to θ. For θ ∈ D^c, define the transition kernel p_θ(x, dy), x, y ∈ S, as

    p_θ(x, dy) = Σ_{k=1}^{m} α_k(θ) p_{θ^k}(x, dy).

It is easy to verify that p_θ(x, dy), θ ∈ D^c, satisfy the properties of a transition kernel and that p_θ(x, dy) are continuously differentiable with respect to θ. Moreover, it is easy to verify that the Markov process {X_n^θ} continues to be ergodic for every θ ∈ D^c as well. Let J^c(θ) denote the long-run average cost corresponding to θ ∈ D^c. Then, J^c(θ) = J(θ) for θ ∈ D. For a set A, suppose |A| denotes its cardinality. The following lemma is valid for


finite state Markov chains, that is, where |S| < ∞. We have the following:

LEMMA 1. For |S| < ∞, J^c(θ) is continuously differentiable in θ.

Proof. Note that J^c(θ) here can be written as

    J^c(θ) = Σ_{i∈S} h(i) μ_i(θ),

where μ_i(θ) is the steady-state probability of the chain {X_n^θ} being in state i ∈ S for given θ ∈ D^c. Then it is sufficient to show that μ_i(θ), i ∈ S, are continuously differentiable functions. For fixed θ, let P(θ) := [[p_θ(i, j)]]_{i,j∈S} denote the transition probability matrix of the Markov chain {X_n^θ}, and let μ(θ) := [μ_i(θ)]_{i∈S} denote the vector of stationary probabilities. Here we denote by p_θ(i, j) the transition probabilities of this chain. Also let Z(θ) := [I − P(θ) + P^∞(θ)]^{−1}, where I is the identity matrix and P^∞(θ) = lim_{m→∞} (P(θ) + ... + P^m(θ))/m. Here, P^m(θ) corresponds to the m-step transition probability matrix with elements p_θ^m(i, j) = Pr(X_m^θ = j | X_0^θ = i), i, j ∈ S. For simplicity, let us first consider θ to be a scalar. Then, from theorem 2 of Schweitzer [25, pp. 402-3], one can write

    μ(θ + h) = μ(θ)(I + (P(θ + h) − P(θ))Z(θ) + o(h)).    (7)

Thus,

    μ′(θ) = μ(θ)P′(θ)Z(θ).

Hence, μ′(θ) (the derivative of μ(θ)) exists if P′(θ) (the derivative of P(θ)) does. Note that by construction, for θ ∈ D^c, P′(θ) exists and is continuous. Next, note that

    |μ′(θ + h) − μ′(θ)| ≤ |μ(θ + h)P′(θ + h)Z(θ + h) − μ(θ)P′(θ + h)Z(θ + h)|
        + |μ(θ)P′(θ + h)Z(θ + h) − μ(θ)P′(θ)Z(θ + h)|
        + |μ(θ)P′(θ)Z(θ + h) − μ(θ)P′(θ)Z(θ)|.

Then, (again) from theorem 2 [25, pp. 402-3], one can write Z(θ + h) as

    Z(θ + h) = Z(θ)H(θ, θ + h) − P^∞(θ)H(θ, θ + h)U(θ, θ + h)Z(θ)H(θ, θ + h),

where

    H(θ, θ + h) = [I − U(θ, θ + h)]^{−1} → I as |h| → 0

and

    U(θ, θ + h) = (P(θ + h) − P(θ))Z(θ) → 0̄ as |h| → 0.

In the above, 0̄ is the matrix (of appropriate dimension) with all zero elements. It thus follows that Z(θ + h) → Z(θ) as |h| → 0. Moreover, μ(θ) is continuous since it is differentiable. Thus, from the above, μ′(θ) is continuous in θ as well, and the claim follows. For vector θ, a similar proof as above verifies the claim.

REMARK. Lemma 1 is an important result that is required to push through a Taylor series argument in the following analysis. Moreover, J^c(θ) turns out to be the associated Liapunov function for the corresponding ordinary differential equation (ODE) (8) that is asymptotically tracked by the algorithm. However, as stated before, lemma 1 is valid only for the case of finite state Markov chains. This is because the result of Schweitzer [25] used in the proof is valid only for finite state chains. For general state Markov processes, certain sufficient conditions for the differentiability of the stationary distribution are given in Vazquez-Abad and Kushner [26]. These are, however, difficult to verify in practice. In the remainder of the analysis (for the case of general state Markov processes taking values in R^d), we simply assume that J^c(θ) is continuously differentiable in θ. For our numerical experiments in section 4, we require only the setting of a finite state Markov chain, for which lemma 1 holds.

Let us now define another projection function Γ^c : R^N → D^c in the same manner as the projection function Γ, except that the new function projects points in R^N on the set D^c in place of D. Now consider the algorithm described in section 2, with the difference that we use the projection Γ_i^c(·) in place of Γ_i(·) in step 2 of the algorithm, where Γ_i^c(·), i = 1, ..., N, are defined suitably (as with Γ_i(·)). The above thus corresponds to the continuous parameter version of the algorithm described in section 2. We now describe the analysis for the above (continuous parameter) algorithm with Γ^c(·) in place of Γ(·). For any bounded and continuous function v(·) ∈ R, let Γ̂_i^c(·), i = 1, ..., N, be defined by

    Γ̂_i^c(v(x)) = lim_{η↓0} (Γ_i^c(x + ηv(x)) − x)/η.

Next, for y = (y_1, ..., y_N)^T ∈ R^N, let Γ̂^c(y) = (Γ̂_1^c(y_1), ..., Γ̂_N^c(y_N))^T.
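The construction underlying lemma 1 can be checked numerically on a small chain: under the mixture kernel p_θ(x, dy) = Σ_k α_k(θ) p_{θ^k}(x, dy), the stationary distribution, and hence J^c(θ), varies smoothly with the mixing weights. A minimal sketch with two hypothetical 3-state transition matrices (all numbers below are illustrative, not from the paper):

```python
import numpy as np

# Two hypothetical transition matrices on a 3-state chain, playing the role
# of P(theta^1) and P(theta^2), and a per-state cost h (all numbers made up).
P1 = np.array([[0.9, 0.1, 0.0],
               [0.2, 0.7, 0.1],
               [0.1, 0.2, 0.7]])
P2 = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.5, 0.4],
               [0.3, 0.3, 0.4]])
h = np.array([1.0, 2.0, 5.0])

def stationary(P):
    # Left eigenvector of P for eigenvalue 1, normalized to a distribution.
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return mu / mu.sum()

def J_c(alpha):
    # Long-run average cost under P(theta) = alpha*P1 + (1 - alpha)*P2.
    return float(stationary(alpha * P1 + (1.0 - alpha) * P2) @ h)

for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(round(a, 2), round(J_c(a), 4))
```

Printing J_c over a grid of weights shows a smooth interpolation between the average costs of the two extreme chains, which is the property the lemma establishes.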


ˆ c T ..., ΓN (yN )) . Consider the following ODE: Thus Tn −Tn−1 ≈ T , ∀n ≥ 1.Also, for any n, there exists .   some integer mn such that Tn = t(mn). Now define pro- c c ,n ,n ,n θ(t) = Γˆ −∇J (θ) . (8) cesses {Y 1 (t)} and {Y 2 (t)} according to Y 1 (Tn) = ,n Y 1(t(mn)) = Z1(nL), Y 2 (Tn) = Y 2(t(mn)) = Z2(nL), c c c t ∈[T ,T ) Y 1,n(t) Y 2,n(t) Let K ={θ ∈ D | Γˆ (∇J (θ)) = 0} denote the set of and for n n+1 , and evolve accord- all fixed points of the ODE (8). Suppose for given  > 0, ing to the ODEs (10)-(11). By an application of Gron-   wall’s inequality, it is easy to see that K denotes the set K ={θ |θ − θ ≤ ∀ θ ∈ K}. ,n Then we have the following: sup  Y 1 (t) − Y 1(t) , t∈[Tn,Tn+1) THEOREM 1. Given  > 0, there exists c0 > 0 such that c ∈ ( ,c ] for any 0 0 , the continuous parameter version of  Y 2,n(t) − Y 2(t) → n →∞. θ∗ ∈ K sup 0as the algorithm converges to some , as the number t∈[Tn,Tn+1) of iterations M →∞. Proof. The proof proceeds through a series of approxi- Now note that the iteration in step 2 of the continuous version of the algorithm can be written as follows: for mation steps as in Bhatnagar et al. [17]. We briefly sketch i = ,... ,N it below. Let us define a new sequence {b(n)} of step 1 ,  n   n  sizes according to b(n) = b , where denotes θ (n + ) = Γc(θ (n) + b(n)η (n)), L L i 1 i i i n the integer part of . It is then easy to see that {b(n)} L where ηi(n) = o(1), ∀i = 1,... ,N, since a(n) = satisfies o(b(n)). Let us now define two continuous time pro- ˆ 2 cesses {θ(t)} and {θ(t)} as follows: θ(t(n)) = θ(n) b(n) =∞, b(n) < ∞, a(n) = o(b(n)). T = (θ (n),...,θN (n)) , n ≥ 1. Also, for t ∈[t(n), t(n+ n n 1 1)], θ(t) is continuously interpolated from the values it In fact, b(n) goes to zero faster than b(n) does, and thus takes at the boundaries of these intervals. Furthermore, θˆ(T ) = θˆ(t(m )) = θ(n) t ∈[T ,T ) θˆ(t) b(n) corresponds to an “even faster timescale” than b(n). 
n n , and for n n+1 , T,η > Now define {t(n)} and {s(n)} as follows: t(0) = 0 and evolves according to the ODE (9). Thus, given 0, n ∃P such that ∀n ≥ P , sup  Y i,n(t) −Y i(t)  < η,  t∈[T ,T ) t(n) = b(m), n ≥ 1. Furthermore, s(0) = 0 and n n+1 ˆ m=1 i = 1, 2. Also, sup  θ(t) −θ(t)  < η.Itisnow n t∈[Tn,Tn+1) s(n) = a(m), n ≥ 1. Also, let (t) = (n) for easy to see (cf. [27]) that  Z1(nL) −J c(θ(n) −c (n))  2 c m=1 and  Z (nL) −J (θ(n) +c (n)) →0asn →∞. t ∈[s(n), s(n + 1)]. It is then easy to see that the con- Now note that because of the above, the iteration in tinuous parameter version of the algorithm above, on this step 2 of the new algorithm can be written as follows: for timescale, tracks the trajectories of the system of ODEs: i = 1,... ,N, . θ(t) = , c 0 (9) θi(n + 1) = Γi (θi(n) + a(n)   J c(θ(n) − c (n)) − J c(θ(n) + c (n)) . 1 c 1 Z (t) = J (θ(t) − c (t)) − Z (t), (10) 2c i(n) +a(n)ξ(n)) , . 2 c Z (t) = J (θ(t) + c (t)) − Z2(t), (11) where ξ(n) = o(1). Using Taylor series expansions of in the following manner: suppose we define continuous J c(θ(n) −c (n)) and J c(θ(n) +c (n)) around the point time processes {Y 1(t)} and {Y 2(t)} according to Y 1(t(n)) θ(n), one obtains = Z1(nL), Y 2(t(n)) = Z2(nL), and for t ∈[t(n), t(n + 1)], Y 1(t) and Y 2(t) are continuously interpolated from J c(θ(n) − c (n)) = J c(θ(n)) the values they take at the boundaries of these intervals. {T } N Furthermore, let us define a real-valued sequence n as c − c j (n)∇j J (θ(n)) + o(c) follows: suppose T>0 is a given constant. Then, T0 = 0, and for n ≥ 1, j=1

Tn = min{t(m) | t(m) ≥ Tn−1 + T }. and

Volume 81, Number 11 SIMULATION 763 Bhatnagar and Kowshik

J^c(\theta(n) + c\Delta(n)) = J^c(\theta(n)) + c \sum_{j=1}^{N} \Delta_j(n) \nabla_j J^c(\theta(n)) + o(c),

respectively. Thus,

\frac{J^c(\theta(n) - c\Delta(n)) - J^c(\theta(n) + c\Delta(n))}{2c\Delta_i(n)} = -\nabla_i J^c(\theta(n)) - \sum_{j=1, j \neq i}^{N} \frac{\Delta_j(n)}{\Delta_i(n)} \nabla_j J^c(\theta(n)) + o(c).  (12)

Now let {F_n, n ≥ 0} denote the sequence of σ-fields defined by F_n = σ(θ(m), m ≤ n; Δ(m), m < n), n ≥ 1. The limiting gradient ODE for θ then has J^c(·) itself as an associated strict Liapunov function. The claim follows.

Finally, note that we have J(θ) = J^c(θ) for θ ∈ D. Moreover, while updating θ in the original algorithm, we impose the restriction (via the projection Γ(·)) that θ can move only on the grid D of points, where D = D̄ ∩ Z^N. By the foregoing analysis, the continuous parameter version of the algorithm shall converge to a "small" neighborhood of a local minimum of J^c(·). Thus, in the original algorithm, θ shall converge to a point on the grid that is (again) in a neighborhood of the local minimum (above) of J^c(·) and hence in a corresponding neighborhood of a local minimum of J(·); see Gerencsér, Hill, and Vágó [1, p. 1793] for a similar argument based on certain L-mixing processes. We now present our numerical experiments.

4. Numerical Experiments
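Before describing the experimental settings, the two-timescale recursion of step 2 can be sketched in code. This is a minimal illustration, not the paper's exact construction: the simulator interface `cost_sim`, the box-shaped grid given by `lo`/`hi`, and the rounding-based projection are our assumptions standing in for the general constraint grid D and projection Γ(·).

```python
import random

def discrete_spsa(cost_sim, lo, hi, n_iter=2000, c=1.0, L=100):
    """Sketch of the two-timescale discrete SPSA recursion.

    cost_sim(theta) returns one noisy sample of the single-stage cost
    at parameter theta; lo/hi describe a box-shaped integer grid
    standing in for the constraint set D (illustrative assumption).
    """
    N = len(lo)
    theta = [(l + h) // 2 for l, h in zip(lo, hi)]
    z1 = z2 = 0.0  # fast-timescale averages tracking J^c(theta -+ c*Delta)
    for n in range(1, n_iter + 1):
        a = 1.0 / max(1, n // 10) ** 0.75       # slower step size, as in (14)
        b = 1.0 / max(1, n // 10) ** (2.0 / 3)  # faster step size
        delta = [random.choice((-1, 1)) for _ in range(N)]
        # Two perturbed parameters, projected onto the box.
        tm = [min(max(t - c * d, l), h) for t, d, l, h in zip(theta, delta, lo, hi)]
        tp = [min(max(t + c * d, l), h) for t, d, l, h in zip(theta, delta, lo, hi)]
        for _ in range(L):  # L-step cost averaging on the faster timescale
            z1 += b * (cost_sim(tm) - z1)
            z2 += b * (cost_sim(tp) - z2)
        # Slower-timescale gradient step, then projection back to the grid.
        theta = [round(min(max(t + a * (z1 - z2) / (2 * c * d), l), h))
                 for t, d, l, h in zip(theta, delta, lo, hi)]
    return theta
```

Because b(n) decays more slowly than a(n), the averages z1 and z2 equilibrate to the two perturbed costs between parameter moves, which is what the ODE argument above formalizes.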

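Equation (12) states that the SPSA difference quotient equals the negative gradient plus cross terms that are zero-mean over the perturbations. For a quadratic cost this can be checked exactly by averaging the quotient over all 2^N perturbations, under which the cross terms cancel. This is a small illustrative check, not part of the algorithm itself, which samples a single random Δ per iteration.

```python
import itertools

def averaged_spsa_estimate(J, theta, c=0.1):
    """Average the SPSA difference quotient of (12) over all
    perturbations Delta in {-1, +1}^N; the cross terms cancel."""
    N = len(theta)
    combos = list(itertools.product((-1, 1), repeat=N))
    acc = [0.0] * N
    for d in combos:
        tm = [t - c * x for t, x in zip(theta, d)]
        tp = [t + c * x for t, x in zip(theta, d)]
        diff = J(tm) - J(tp)
        for i, x in enumerate(d):
            acc[i] += diff / (2 * c * x)
    return [a / len(combos) for a in acc]
```

For J(x, y) = x^2 + 2y^2 at θ = (3, 1), the averaged estimate is (−6, −4), i.e., exactly −∇J(θ), since the o(c) terms vanish for quadratics.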


Figure 1. The admission control model of setting 1

rates of the Poisson process in each of the states 0 through 4 are taken to be 10, 15, 18, 22, and 30, respectively. The service times are assumed to be exponentially distributed with rate µ = 17. In general, these rates could be set in any manner, since we only require the Markov chain {(q_t, z_t)} to be ergodic for any given set of threshold values (see below). Since we consider a finite buffer system, the above Markov chain will be ergodic. We consider the following class of feedback policies: for i = 0, 1, ..., 4, if the MMPP is in state i, an arriving packet is accepted if the queue length is below the threshold N_i and is rejected otherwise. The queue length at any instant is thus upper bounded by 490 because of the form of the feedback policies and the constraint set. We assume that the queue length information is fed back to the controller once in every 100 arriving packets. The values of c and L in the algorithm are chosen to be 1 and 100, respectively. The step sizes {a(n)} and {b(n)} in the algorithm are taken as

b(n) = \frac{1}{\lfloor n/10 \rfloor^{2/3}} \quad \text{and} \quad a(n) = \frac{1}{\lfloor n/10 \rfloor^{3/4}},  (14)

respectively, for n ≥ 1, with a(0) = b(0) = 1. In the above, ⌊n/10⌋ denotes the integer part of n/10. Note that we study the sensitivity of the algorithm w.r.t. these parameters and step sizes below.
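The schedule in (14) can be written out directly. Since ⌊n/10⌋ = 0 for n < 10, the sketch below follows the convention a(n) = b(n) = 1 there, an assumption on our part consistent with a(0) = b(0) = 1:

```python
def step_sizes(n):
    """Step sizes per (14): the slower a(n) decays as floor(n/10)^(-3/4),
    the faster b(n) as floor(n/10)^(-2/3). For floor(n/10) = 0 we return
    1, matching a(0) = b(0) = 1 (a convention assumed here)."""
    k = max(1, n // 10)
    return 1.0 / k ** 0.75, 1.0 / k ** (2.0 / 3)  # (a(n), b(n))
```

For n ≥ 20, a(n) < b(n), so the θ-recursion moves on the slower timescale while the cost averages track on the faster one.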


[Plot: threshold N0 (y-axis, 0 to 500) versus number of updates (x-axis, 0 to 10 ×10^4).]

Figure 2. Convergence of threshold N0 when rejection cost = 250

Table 1. Effect of increase in RC on performance

RC     N*_0   N*_1   N*_2   N*_3   N*_4   J(θ*)
100     117     89    147      7    343   18.06
125     205    362    252      3    101   21.72
150     157    140    177      4    297   23.06
200     136    402    294     10    330   24.55
250      59     72     32     32    481   24.94
300      41    321     22     61    369   27.92
350      30    336      9     55    421   36.09
400      34    409     16     59    415   34.06
450      30    420      7     34    406   46.18
500      10    451     24     86    367   44.73

iments with different rejection costs in different states of the CTMC but with the cost of accepting arrivals the same as before. Specifically, we chose the rejection costs for this case to be 100 + 50i for states i = 0, 1, ..., 4 of the underlying CTMC. The average rejection cost in this case is thus 185.5. The long-run average cost for this case was found to be 23.80. Upon comparison with the values in Table 1, it is clear that this value lies in between the long-run average costs corresponding to the cases when the rejection costs are set uniformly (over all states) at 150 and 200, respectively, which conforms with intuition.

4.2 Setting 2

The model we consider here is in many ways more general than the earlier one. The basic model is shown in Figure 3. A similar model has recently been considered in Bhatnagar and Reddy [28], where the admission control problem is analyzed in detail and a variant of our algorithm that updates parameters in the (continuously valued) closed convex hull of the discrete search space is proposed. We consider two streams of packets, one controlled and the other uncontrolled. The uncontrolled


[Schematic: an uncontrolled Poisson source (rate λ_u) and the regularized SMMPP stream feed the control node CN; accepted packets join the main queue, and queue-length feedback reaches CN at instants nT with delay D_b.]
Figure 3. The admission control model of setting 2

stream corresponds to higher priority traffic, while the controlled stream characterizes lower priority traffic that is bursty in nature and receives best-effort service. We assume the uncontrolled stream to be a Poisson process with rate λ_u, while the controlled stream is assumed to be a regularized semi-Markov-modulated Poisson process (SMMPP) that we explain below. Suppose {y_t} is a regularized semi-Markov process, that is, one for which transitions are Markov but take place every T instants of time for some fixed T. Suppose {y_t} takes values in a finite set V. Now if y_n ≡ y_{nT} = i ∈ V, we assume the SMMPP sends packets according to a Poisson process with rate λ(i) during the time interval [nT, (n + 1)T). As with an MMPP, an SMMPP may also be used to model bursty traffic.

Packets from both streams are first collected at a multiplexor or control node CN during time intervals [(n − 1)T, nT), n ≥ 1, and are stored in different buffers. (These correspond to the input buffers at node CN.) We assume both buffers have infinite capacity. At instants nT, the queue length q_n, n ≥ 1, in the main queue is observed, and this information is fed back to node CN. We assume, however, that this information is received at the CN with a delay D_b. On the basis of this information, a decision on the number of packets to accept from both streams (that are stored in the control buffers) is instantly made. Packets that are not admitted to the main queue from either stream are immediately dropped, and they leave the system. Thus, the interim buffers at the control node CN are emptied every T units of time, with packets from these buffers either joining the main queue or getting dropped. We assume that the uncontrolled packets stored at node CN are admitted to the main queue first (as they have higher priority) whenever there are vacant slots available in the main queue buffer, which we assume has size B. Admission control (in the regular sense) is performed only on packets from the SMMPP stream that are stored at node CN. Packets from this stream are admitted to the main queue at instants nT based on the queue length q_n of the main queue, the number of uncontrolled stream packets admitted at time nT, and also the state y_{n−1} (during interval [(n − 1)T, nT)) of the underlying Markov process in the SMMPP. We assume that feedback policies (for the controlled stream) are of the threshold type. For D_b of the form D_b = MT for some M > 0, the joint process {(q_n, y_{n−1}, q_{n−1}, y_{n−2}, ..., q_{n−M}, y_{n−M−1})} is Markov. For ease of exposition, we give the form of the policies below for the case of D_b = 0. Suppose L̄_1, ..., L̄_N are given integers satisfying 0 ≤ L̄_j ≤ B, ∀ j ∈ V. Here, each L̄_j serves as a threshold for the controlled SMMPP stream when the state of the underlying Markov chain is j. Let a^u and a^c denote the number of arrivals from the uncontrolled and controlled streams, respectively, at node CN during the time interval [(n − 1)T, nT). In the following, i denotes the main queue length at the decision instant.

Feedback Policies

• If a^u ≥ B − i:
Accept the first (B − i) uncontrolled packets and no controlled packets.
• If i < L̄_j ≤ i + a^u and B − i > a^u:


Accept all uncontrolled packets and no controlled packets.
• If i + a^u < L̄_j and B − i > a^u:
Accept all uncontrolled packets, and
– if a^c < L̄_j − i − a^u, accept all controlled packets;
– if a^c ≥ L̄_j − i − a^u, accept the first (L̄_j − i − a^u) controlled packets.
• If i ≥ L̄_j and B − i > a^u:
Accept all uncontrolled packets and no controlled packets.

For our experiments, we consider an SMMPP stream with four states, with the transition probability matrix of the underlying Markov chain chosen to be

P = \begin{pmatrix} 0 & 0.6 & 0.4 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0.7 & 0 & 0.3 \end{pmatrix}.

The corresponding rates of the associated Poisson processes in each of these states are chosen as λ(1) = 0.5, λ(2) = 1.0, λ(3) = 1.5, and λ(4) = 2.5, respectively. The service times in the main queue are considered to be exponentially distributed with rate µ = 1.3. We show results of experiments with different values of T, λ_u, D_b, and the rejection cost (RC). These quantities are assumed constant for a given experiment. We also show experiments with different values for the step sizes {a(n)} and {b(n)}, as well as the parameters c and L, respectively.

In Figure 4, we show the convergence behavior for one of the threshold values, for the case T = 3, λ_u = 0.4, D_b = 0, and RC = 50. The same for other cases and thresholds is similar and is hence not shown. We denote by N*_1, ..., N*_4 the optimal values of the threshold parameters N_1, ..., N_4, respectively. We assume that the cost of accepting a packet at the main queue is the queue length as seen by it. The buffer size at the main queue is assumed to be 80.

In Tables 2 and 3, we study the change in average cost behavior for different values of T and λ_u when RC = 50 and D_b = 0. Specifically, in Table 2 (respectively, Table 3), the results of experiments for which λ_u = 0.1 (respectively, 1.0) are tabulated. Note that for a given value of λ_u, the average cost increases as T is increased (and thus is the lowest for the lowest value of T considered). This happens because as T is increased, the system exercises control less often, while the average number of packets that arrive into the control node buffers in any given interval increases. Also from Table 3, note that as the uncontrolled rate λ_u is increased to 1.0, the average cost values increase significantly over the corresponding values for λ_u = 0.1. This is again expected since the higher priority uncontrolled packets now have a significant presence in the main queue buffer, leading to an overall increase in the values of J(θ*) as packets from the controlled stream get rejected more frequently.

In Table 4, we vary RC over a range of values, keeping the cost of accepting a packet the same as before. Also, we keep T and λ_u fixed at 3 and 0.4, respectively. Furthermore, D_b = 0. As expected, the average cost increases as RC is increased. In Table 5, we show the impact of varying D_b for fixed T = 3, λ_u = 0.4, and RC = 50, respectively. Note that as D_b is increased, performance deteriorates, as expected.

In the above tables, we set the step sizes as in (14). Also, c = 1 and L = 100 as before. In Tables 6 through 8, we study the effect of changing these parameters when T = 3 and RC = 50 are held fixed. In Table 6, we let b(n) be as in (14), while a(n) has the form

a(n) = \frac{1}{\lfloor n/10 \rfloor^{\alpha}}, \quad n \geq 1.

We study the variation in performance for different values of α ∈ [0.66, 1]. Note from the table that when α = 0.66 is used, a(n) = b(n), ∀n. In such a scenario, the algorithm does not show good performance, as observed from the average cost value, which is high. When α is increased, performance improves very quickly until about α = 0.75, beyond which it stabilizes. These results also suggest that a difference in timescales in the recursions improves performance.

In Tables 7 and 8, we study the effects of changing c and L on performance for the above values of T and RC, with step sizes as in (14). Note from Table 7 that performance is not good for c < 0.5. As described previously (section 2), this happens because for c < 0.5, the parameters for both simulations after projection would lead to the same point. Performance is seen to improve for c > 0.5. From Table 8, it can be seen that performance is not good for low values of L (see the case corresponding to L = 10). It has also been observed in the case of continuous optimization algorithms [17, 19, 20] that low values of L lead to poor system performance. Also, observe that when L is increased beyond a point (see the entries corresponding to L = 700 and L = 1000, respectively), performance degrades again. This implies that excessive additional averaging is also not good for system performance. From the table, it can be seen that L should ideally lie between 100 and 500.
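To make the setting concrete, the sketch below simulates the regularized SMMPP arrivals and applies the D_b = 0 threshold rules as we read them above: uncontrolled packets take priority up to the free space B − i, and controlled packets are then admitted only up to the state-dependent threshold. The case split collapses in code because L̄_j − i − a^u ≤ 0 covers both i ≥ L̄_j and i < L̄_j ≤ i + a^u. All names are illustrative, and the main queue's service process is omitted, so this is a fragment rather than the full queuing model.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw via Knuth's method (fine for small lam)."""
    k, p, thresh = 0, 1.0, math.exp(-lam)
    while True:
        p *= rng.random()
        if p <= thresh:
            return k
        k += 1

def admit(i, j, a_u, a_c, B, L_bar):
    """Packets admitted at a slot boundary under the Db = 0 threshold
    policy: i is the main-queue length, j the SMMPP state over the last
    slot, a_u/a_c the buffered uncontrolled/controlled arrivals, B the
    main-queue buffer size, L_bar[j] the per-state threshold.
    Returns (uncontrolled admitted, controlled admitted)."""
    if a_u >= B - i:
        # Uncontrolled packets fill all vacant slots; none controlled.
        return B - i, 0
    room = L_bar[j] - i - a_u  # space left below the state-j threshold
    if room <= 0:
        # At or above the threshold after the uncontrolled admissions.
        return a_u, 0
    return a_u, min(a_c, room)

def simulate_arrivals(P, rates, lam_u, T, B, L_bar, n_slots, rng):
    """Drive the SMMPP and uncontrolled streams through the admission
    rule (service at the main queue is ignored in this fragment)."""
    y, q, admitted = 0, 0, 0
    for _ in range(n_slots):
        a_u = poisson(lam_u * T, rng)     # uncontrolled arrivals
        a_c = poisson(rates[y] * T, rng)  # controlled (SMMPP) arrivals
        g_u, g_c = admit(q, y, a_u, a_c, B, L_bar)
        q = min(B, q + g_u + g_c)
        admitted += g_u + g_c
        # Markov transition of the modulating SMMPP state.
        u, acc = rng.random(), 0.0
        for nxt, pj in enumerate(P[y]):
            acc += pj
            if u < acc:
                y = nxt
                break
    return admitted
```

With the paper's four-state transition matrix and rates plugged in, `admit` reproduces the bullet cases directly; adding exponential service at rate µ would complete the model.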


[Plot: value of threshold N1 (y-axis, 0 to 60) versus number of parameter updates (x-axis, 0 to 2 ×10^4).]

Figure 4. Convergence of the values of threshold N1 for T = 3, λu = 0.4, Db = 0, and RC = 50

Table 2. λu = 0.1, RC = 50, Db = 0

T      N*_1   N*_2   N*_3   N*_4   J(θ*)
3        15     23     36      4   14.23
5        11      9     38     21   15.92
10       23     20     11     59   18.08

Table 3. λu = 1.0, RC = 50, Db = 0

T      N*_1   N*_2   N*_3   N*_4   J(θ*)
3         8     41     28     19   25.69
5        14     11     19      6   27.91
10       21     19     37      9   31.32


Table 4. T = 3, λu = 0.4, Db = 0

RC     N*_1   N*_2   N*_3   N*_4   J(θ*)
40       23     16     36     17   15.12
50       44     32     23     19   16.81
60       26     36     14     30   19.40
70       31     25     19     12   22.08
80       22     29     49     53   25.59
90       19     43     42     36   29.61
100       8     50     38     26   34.18

Table 5. λu = 0.4, T = 3, RC = 50

Db     N*_1   N*_2   N*_3   N*_4   J(θ*)
0        44     32     23     19   16.81
3        53     35     41      9   18.33
6        11     48     50     10   19.52
9        12     56     17     12   21.64

Table 6. T = 3, λu = 0.1, RC = 50, Db = 0

α      N*_1   N*_2   N*_3   N*_4   J(θ*)
0.66     59     26      6     19   25.12
0.70     37     32     19     21   18.81
0.72     21     31     26     28   16.42
0.75     15     23     36      4   14.23
0.80     24     32     37     12   13.67
0.85     16     29     49     43   13.79
0.90     14     37     32     29   13.61
1.00     18     45     38     16   13.66

Table 7. T = 3, λu = 0.1, RC = 50, Db = 0

c      N*_1   N*_2   N*_3   N*_4   J(θ*)
0.40     58     50     32     40   29.12
0.50     52     38     13     19   18.29
0.52     19     20     38     10   14.04
0.55     11     19     29     21   13.71
0.70     21     28     35     17   14.42
1.00     15     23     36      4   14.23

Table 8. T = 3, λu = 0.1, RC = 50, Db = 0

L      N*_1   N*_2   N*_3   N*_4   J(θ*)
10       51     39     21     45   29.20
50       32     28     22     29   15.29
100      15     23     36      4   14.23
200       9     15     29      8   13.92
500      17     18     25     47   14.01
700      27      9     34     31   16.22
1000     21     38      4     51   18.26


5. Conclusions

We developed in this article a two-timescale simultaneous perturbation stochastic approximation algorithm that finds the optimum parameter in a discrete search space. This algorithm is applicable in cases where the cost to be optimized is in itself the long-run average of certain cost functions whose noisy estimates can be obtained using simulation. The advantage of a simulation-based approach is that the transition dynamics of the underlying Markov processes need not be precisely known as long as state transitions of these can be simulated. We briefly presented the convergence analysis of our algorithm. Finally, we applied this algorithm to find optimal feedback policies within threshold-type policies for an admission control problem in communication networks. We showed experiments under two different settings. The results obtained are along expected lines. The proposed algorithm, however, must be tried on high-dimensional parameter settings and tested for online adaptability in large simulation scenarios. In general, SPSA algorithms are known to scale well under such settings; see, for instance, Bhatnagar et al. [17] and Bhatnagar [20], who consider high-dimensional parameters. The same, however, should be tried for the discrete parameter optimization algorithm considered here.

Recently, in Bhatnagar et al. [19], SPSA-type algorithms that use certain deterministic perturbations instead of randomized ones have been developed. These are found to improve performance over randomized perturbation SPSA. Along similar lines, in Bhatnagar and Borkar [18], the use of a chaotic iterative sequence for random number generation for averaging in the generated perturbations in SPSA algorithms has been proposed, along with a smoothed functional algorithm that uses certain perturbations involving Gaussian random variables. In Bhatnagar [20], adaptive algorithms for simulation optimization based on the Newton algorithm have also been developed for continuous parameter optimization. It would be interesting to develop discrete parameter analogs of these algorithms as well and to study their performance in other applications.

6. Acknowledgments

The authors thank all the reviewers for their many comments that helped in significantly improving the quality of this manuscript. The authors thank Mr. I. B. B. Reddy for help with the simulations. The first author was supported in part by grant no. SR/S3/EE/43/2002-SERC-Engg from the Department of Science and Technology, Government of India.

7. References

[1] Gerencsér, L., S. D. Hill, and Z. Vágó. 1999. Optimization over discrete sets via SPSA. In Proceedings of the IEEE Conference on Decision and Control, pp. 1791-5.
[2] Gokbayrak, K., and C. G. Cassandras. 2001. Online surrogate problem methodology for stochastic discrete resource allocation problems. Journal of Optimization Theory and Applications 108 (2): 349-75.
[3] Barnhart, C. M., J. E. Wieselthier, and A. Ephremides. 1995. Admission-control policies for multihop wireless networks. Wireless Networks 1:373-87.
[4] Kirkpatrick, S., C. D. Gelatt, and M. Vecchi. 1983. Optimization by simulated annealing. Science 220:671-80.
[5] Gelfand, S. B., and S. K. Mitter. 1989. Simulated annealing with noisy or imprecise energy measurements. Journal of Optimization Theory and Applications 62 (1): 49-62.
[6] Alrefaei, M. H., and S. Andradóttir. 1999. A simulated annealing algorithm with constant temperature for discrete stochastic optimization. Management Science 45 (5): 748-64.
[7] Yan, D., and H. Mukai. 1992. Stochastic discrete optimization. SIAM Journal of Control and Optimization 30 (3): 594-612.
[8] Gong, W.-B., Y.-C. Ho, and W. Zhai. 1999. Stochastic comparison algorithm for discrete optimization with estimation. SIAM Journal of Optimization 10 (2): 384-404.
[9] Norkin, V. I., Y. M. Ermoliev, and A. Ruszczyński. 1998. On optimal allocation of indivisibles under uncertainty. Operations Research 46 (3): 381-95.
[10] Jordan, S., and P. Varaiya. 1994. Control of multiple service multiple resource communication networks. IEEE Transactions on Communications 42 (11): 2979-88.
[11] Cassandras, C. G., and C. G. Panayiotou. 1999. Concurrent sample path analysis of discrete event systems. Journal of Discrete Event Dynamic Systems: Theory and Applications 9:171-95.
[12] Ho, Y.-C., and X.-R. Cao. 1991. Perturbation analysis of discrete event dynamical systems. Boston: Kluwer.
[13] Gokbayrak, K., and C. G. Cassandras. 2002. Generalized surrogate problem methodology for online stochastic discrete optimization. Journal of Optimization Theory and Applications 114 (1): 97-132.
[14] L'Ecuyer, P., and P. W. Glynn. 1994. Stochastic optimization by simulation: Convergence proofs for the GI/G/1 queue in steady-state. Management Science 40 (11): 1562-78.
[15] Spall, J. C. 1992. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37 (3): 332-41.
[16] Bhatnagar, S., and V. S. Borkar. 1998. A two time scale stochastic approximation scheme for simulation based parametric optimization. Probability in the Engineering and Informational Sciences 12:519-31.
[17] Bhatnagar, S., M. C. Fu, S. I. Marcus, and S. Bhatnagar. 2001. Two timescale algorithms for simulation optimization of hidden Markov models. IIE Transactions 33 (3): 245-58.
[18] Bhatnagar, S., and V. S. Borkar. 2003. Multiscale chaotic SPSA and smoothed functional algorithms for simulation optimization. SIMULATION 79 (10): 568-80.
[19] Bhatnagar, S., M. C. Fu, S. I. Marcus, and I.-J. Wang. 2003. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation 13 (2): 180-209.
[20] Bhatnagar, S. 2005. Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization. ACM Transactions on Modeling and Computer Simulation 15 (1): 74-107.


[21] Kelly, F. P., P. B. Key, and S. Zachary. 2000. Distributed admission control. IEEE Journal on Selected Areas in Communications 18:2617-28.
[22] Liebeherr, J., D. E. Wrege, and D. Ferrari. 1996. Exact admission control for networks with a bounded delay service. IEEE/ACM Transactions on Networking 4 (6): 885-901.
[23] Bertsekas, D. P., and J. N. Tsitsiklis. 1996. Neuro-dynamic programming. Belmont, MA: Athena Scientific.
[24] Marbach, P., and J. N. Tsitsiklis. 2001. Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic Control 46 (2): 191-209.
[25] Schweitzer, P. J. 1968. Perturbation theory and finite Markov chains. Journal of Applied Probability 5:401-13.
[26] Vazquez-Abad, F. J., and H. J. Kushner. 1992. Estimation of the derivative of a stationary measure with respect to a control parameter. Journal of Applied Probability 29:343-52.
[27] Hirsch, M. W. 1989. Convergent activation dynamics in continuous time networks. Neural Networks 2:331-49.
[28] Bhatnagar, S., and I. B. B. Reddy. 2005. Optimal threshold policies for admission control in communication networks via discrete parameter stochastic approximation. Telecommunication Systems 29 (1): 9-31.

Shalabh Bhatnagar is an assistant professor in the Department of Computer Science and Automation at the Indian Institute of Science, Bangalore, India.

Hemant J. Kowshik is an undergraduate student in the Department of Electrical Engineering at the Indian Institute of Technology Madras, Chennai, India.
