
<p><a href="/goto?url=http://www.imstat.org/aos/" target="_blank">The Annals of Statistics </a></p><p>2010, Vol. 38, No. 5, 2823–2856 </p><p>DOI: <a href="/goto?url=http://dx.doi.org/10.1214/10-AOS807" target="_blank">10.1214/10-AOS807 </a></p><p>c</p><p>ꢀ <a href="/goto?url=http://www.imstat.org" target="_blank">Institute of Mathematical Statistics</a>, 2010 </p><p>TRAJECTORY AVERAGING FOR STOCHASTIC <br>APPROXIMATION MCMC ALGORITHMS<sup style="top: -0.33em;">1 </sup></p><p>By Faming Liang </p><p>Texas A&M University </p><p>The subject of stochastic approximation was founded by Robbins and Monro [Ann. Math. Statist. 22 (1951) 400–407]. After five decades of continual development, it has developed into an important area in systems control and optimization, and it has also served as a prototype for the development of adaptive algorithms for on-line estimation and control of stochastic systems. Recently, it has been used in statistics with Markov chain Monte Carlo for solving maximum likelihood estimation problems and for general simulation and optimizations. In this paper, we first show that the trajectory averaging estimator is asymptotically efficient for the stochastic approximation MCMC (SAMCMC) algorithm under mild conditions, and then apply this result to the stochastic approximation Monte Carlo algorithm [Liang, Liu and Carroll J. Amer. Statist. Assoc. 102 (2007) 305–320]. The application of the trajectory averaging estimator to other stochastic approximation MCMC algorithms, for example, a stochastic approximation MLE algorithm for missing data problems, is also considered in the paper. </p><p>1. Introduction. Robbins and Monro (1951) introduced the stochastic approximation algorithm to solve the integration equation </p><p>Z</p><p></p><ul style="display: flex;"><li style="flex:1">(1) </li><li style="flex:1">h(θ) = <sub style="top: 0.83em;">X </sub>H(θ, x)f<sub style="top: 0.15em;">θ</sub>(x)dx = 0, </li></ul><p>where θ ∈ Θ ⊂ R<sup style="top: -0.33em;">d </sup>is a parameter vector and f<sub style="top: 0.15em;">θ</sub>(x), x ∈ X ⊂ R , is a density function depending on θ. The d<sub style="top: 0.15em;">θ </sub>and d<sub style="top: 0.14em;">x </sub>denote the dimensions of θ and x, respectively. The stochastic approximation algorithm is an iterative recursive algorithm, whose each iteration consists of two steps: </p><p>d<sub style="top: 0.08em;">x </sub></p><p>θ</p><p>Received May 2009; revised November 2009. <sup style="top: -0.31em;">1</sup>Supported in part by NSF Grants DMS-06-07755, CMMI-0926803 and the Award <br>KUS-C1-016-04 made by King Abdullah University of Science and Technology (KAUST). </p><p><a href="/goto?url=http://www.ams.org/msc/" target="_blank">AMS 2000 subject classifications. </a>60J22, 65C05. </p><p>Key words and phrases. Asymptotic efficiency, convergence, Markov chain Monte <br>Carlo, stochastic approximation Monte Carlo, trajectory averaging. </p><p>This is an electronic reprint of the original article published by the </p><p><a href="/goto?url=http://www.imstat.org" target="_blank">Institute of Mathematical Statistics </a>in <a href="/goto?url=http://www.imstat.org/aos/" target="_blank">The Annals of Statistics</a>, </p><p><a href="/goto?url=http://dx.doi.org/10.1214/10-AOS807" target="_blank">2010, </a><a href="/goto?url=http://dx.doi.org/10.1214/10-AOS807" target="_blank">V</a><a href="/goto?url=http://dx.doi.org/10.1214/10-AOS807" target="_blank">o</a><a href="/goto?url=http://dx.doi.org/10.1214/10-AOS807" target="_blank">l. 38, No. 5, 2823–2856</a>. 
Stochastic approximation algorithm.

• Generate X_{k+1} ∼ f_{θ_k}(x), where k indexes the iteration.
• Set θ_{k+1} = θ_k + a_k H(θ_k, X_{k+1}), where a_k > 0 is called the gain factor.

The stochastic approximation algorithm is often studied by rewriting it as

(2)  θ_{k+1} = θ_k + a_k [h(θ_k) + ε_{k+1}],

where h(θ_k) = ∫_X H(θ_k, x) f_{θ_k}(x) dx corresponds to the mean effect of H(θ_k, X_{k+1}), and ε_{k+1} = H(θ_k, X_{k+1}) − h(θ_k) is called the observation noise. In the stochastic approximation literature, h(θ) is also called the mean field function. It is well known that the optimal convergence rate of (2) is achieved with a_k = −F^{−1}/k, where F = ∂h(θ*)/∂θ and θ* denotes the zero point of h(θ); in this case, (2) reduces to Newton's algorithm. Unfortunately, this choice is often impossible to use, as the matrix F is generally unknown.

Although the optimal convergence rate of θ_k cannot be attained in general, Ruppert (1988), Polyak (1990) and Polyak and Juditsky (1992) showed in a sequence of fundamental papers that the trajectory averaging estimator θ̄_n = Σ_{k=1}^n θ_k / n is asymptotically efficient; that is, θ̄_n can converge in distribution to a normal random variable with mean θ* and covariance matrix Σ, where Σ is the smallest possible covariance matrix in an appropriate sense. The trajectory averaging estimator requires the gain factors {a_k} to be relatively large, decreasing more slowly than O(1/k). As discussed by Polyak and Juditsky (1992), trajectory averaging is based on a paradoxical principle: a slow algorithm having less than optimal convergence rate must be averaged.

Recently, the trajectory averaging technique has been further explored in a variety of papers [see, e.g., Chen (1993), Kushner and Yang (1993, 1995), Dippon and Renz (1997), Wang, Chong and Kulkarni (1997), Tang, L'Ecuyer and Chen (1999), Pelletier (2000) and Kushner and Yin (2003)] under different assumptions on the observation noise.
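To make the recursion (2) and the averaging principle concrete, here is a minimal Python sketch on a hypothetical toy problem; the choices H(θ, x) = c − x, f_θ = N(θ, 1), c = 2.5 and the gain exponent 0.7 are illustrative assumptions, not from the paper. Here h(θ) = c − θ has the zero point θ* = c.

```python
import numpy as np

rng = np.random.default_rng(0)

c = 2.5                  # hypothetical target: h(theta) = c - theta, so theta* = c
theta = 0.0              # theta_0
n_iter = 100_000
path = np.empty(n_iter)

for k in range(1, n_iter + 1):
    x = rng.normal(loc=theta, scale=1.0)  # X_{k+1} ~ f_theta, here N(theta, 1)
    a_k = k ** -0.7                       # gain factor decreasing more slowly than O(1/k)
    theta += a_k * (c - x)                # theta_{k+1} = theta_k + a_k H(theta_k, X_{k+1})
    path[k - 1] = theta

theta_bar = path.mean()                   # trajectory averaging estimator
print(f"last iterate: {theta:.4f}  average: {theta_bar:.4f}")
```

With this slowly decreasing gain, the last iterate still fluctuates visibly around θ*, while the average concentrates much more tightly, which is the paradoxical principle at work.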
However, to the best of our knowledge, trajectory averaging has not yet been explored for stochastic approximation MCMC (SAMCMC) algorithms [Benveniste, Métivier and Priouret (1990), Chen (2002), Kushner and Yin (2003), Andrieu, Moulines and Priouret (2005), Andrieu and Moulines (2006)]. The stochastic approximation MCMC algorithms refer to a class of stochastic approximation algorithms for which the sample is generated at each iteration via a Markov transition kernel; that is, {x_{k+1}} is generated via a family of Markov transition kernels {P_{θ_k}(x_k, ·)} controlled by {θ_k}. Recently, the stochastic approximation MCMC algorithms have been used in statistics for solving maximum likelihood estimation problems [Younes (1989, 1999), Moyeed and Baddeley (1991), Gu and Kong (1998), Gu and Zhu (2001)], and for general simulation and optimization [Liang, Liu and Carroll (2007), Atchadé and Liu (2010)]. It is worth pointing out that, in comparison with conventional MCMC algorithms, for example, the Metropolis–Hastings algorithm [Metropolis et al. (1953), Hastings (1970)], parallel tempering [Geyer (1991)] and simulated tempering [Marinari and Parisi (1992), Geyer and Thompson (1995)], the stochastic approximation Monte Carlo (SAMC) algorithm [Liang, Liu and Carroll (2007)] has significant advantages in simulations of complex systems for which the energy landscape is rugged. As explained later (in Section 3), SAMC is essentially immune to the local-trap problem due to its self-adaptive nature inherited from the stochastic approximation algorithm. SAMC has been successfully applied to many statistical problems, such as p-value evaluation for resampling-based tests [Yu and Liang (2009)], Bayesian model selection [Liang (2009), Atchadé and Liu (2010)] and spatial model estimation [Liang (2007a)], among others.

In this paper, we explore the theory of trajectory averaging for stochastic approximation MCMC algorithms, motivated by their wide applications. Although Chen (1993, 2002) considered the case where the observation noise can be state dependent, that is, the observation noise ε_{k+1} depends on θ_0, ..., θ_k, those results are not directly applicable to the stochastic approximation MCMC algorithms, for reasons explained in Section 5. The theory established by Kushner and Yin (2003) can potentially be extended to the stochastic approximation MCMC algorithms but, as mentioned in Kushner and Yin [(2003), page 375], the extension is not straightforward, and more work needs to be done to deal with the complicated structure of the Markov transition kernel. In this paper, we propose a novel decomposition of the observation noise for the stochastic approximation MCMC algorithms. Based on the proposed decomposition, we show that the trajectory averaging estimator is asymptotically efficient for the stochastic approximation MCMC algorithms, and then apply this result to the SAMC algorithm. These results are presented in Lemma A.5, Theorems 2.3 and 3.2, respectively. The application of the trajectory averaging technique to other stochastic approximation MCMC algorithms, for example, a stochastic approximation MLE algorithm for missing data problems, is also considered in the paper.
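To fix ideas before the formal development, the defining feature of an SAMCMC algorithm, namely that x_{k+1} is produced by one step of a Markov kernel P_{θ_k}(x_k, ·) admitting f_{θ_k} as its invariant distribution rather than by exact sampling, can be sketched as follows; the random-walk Metropolis kernel and the toy H below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f(theta, x):
    """Illustrative invariant density f_theta: N(theta, 1), up to a constant."""
    return -0.5 * (x - theta) ** 2

def P_step(theta, x):
    """One Metropolis-Hastings move of P_theta(x, .), leaving f_theta invariant."""
    y = x + rng.normal(scale=0.5)
    return y if np.log(rng.uniform()) < log_f(theta, y) - log_f(theta, x) else x

theta, x = 0.0, 0.0
for k in range(1, 50_001):
    x = P_step(theta, x)             # x_{k+1} ~ P_{theta_k}(x_k, .): Markov-dependent
    theta += k ** -0.7 * (2.5 - x)   # SA update driven by the kernel-generated sample
```

The dependence of each sample on both the previous sample and the current parameter is exactly what makes the observation noise state dependent and the analysis nonstandard.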
The remainder of this paper is organized as follows. In Section 2, we present our main theoretical result: the trajectory averaging estimator is asymptotically efficient for the stochastic approximation MCMC algorithms. In Section 3, we apply the trajectory averaging technique to the SAMC algorithm. In Section 4, we apply the trajectory averaging technique to a stochastic approximation MLE algorithm for missing data problems. In Section 5, we conclude the paper with a brief discussion.

2. Trajectory averaging for a general stochastic approximation MCMC algorithm.

2.1. A varying truncation stochastic approximation MCMC algorithm. To show the convergence of the stochastic approximation algorithm, restrictive conditions on the observation noise and the mean field function are required. For example, one often assumes the noise to be mutually independent or to form a martingale difference sequence, and imposes a severe restriction on the growth rate of the mean field function. These conditions are usually not satisfied in practice; see Chen [(2002), Chapter 1] for more discussion of this issue. To remove the growth rate restriction on the mean field function and to weaken the conditions imposed on the noise, Chen and Zhu (1986) proposed a varying truncation version of the stochastic approximation algorithm. The convergence of the modified algorithm can be shown for a wide class of mean field functions under truly weak conditions on the noise; see, for example, Chen, Guo and Gao (1988) and Andrieu, Moulines and Priouret (2005). The latter gives a proof of the convergence of the modified algorithm with Markov state-dependent noise under conditions that are easy to verify.

Following Andrieu, Moulines and Priouret (2005), we consider the following varying truncation stochastic approximation MCMC algorithm. Let {K_s, s ≥ 0} be a sequence of compact subsets of Θ such that

(3)  ⋃_{s≥0} K_s = Θ  and  K_s ⊂ int(K_{s+1}),  s ≥ 0,

where int(A) denotes the interior of the set A. Let {a_k} and {b_k} be two monotone, nonincreasing, positive sequences. Let X_0 be a subset of X, and let T : X × Θ → X_0 × K_0 be a measurable function which maps a point (x, θ) in X × Θ to a random point in X_0 × K_0; that is, both x and θ will be reinitialized in X_0 × K_0. As shown in Lemma A.5, for the stochastic approximation MCMC algorithm, when the number of iterations becomes large, the observation noise ε_k can be decomposed as

(4)  ε_k = e_k + ν_k + ς_k,

where {e_k} forms a martingale difference sequence, and the expectations of the other two terms go to zero in certain forms.
In Theorems 2.2 and 2.3, we show that {e_k} leads to the asymptotic normality of the trajectory averaging estimator θ̄_k, and that {ν_k} and {ς_k} vanish or can be ignored when the asymptotic distribution of θ̄_k is considered.

Let σ_k denote the number of truncations performed until iteration k, with σ_0 = 0. The varying truncation stochastic approximation MCMC algorithm starts with a random choice of (θ_0, x_0) in the space K_0 × X_0, and then iterates between the following steps:

Varying truncation stochastic approximation MCMC algorithm.

• Draw a sample x_{k+1} with a Markov transition kernel, P_{θ_k}, which admits f_{θ_k}(x) as the invariant distribution.
• Set θ_{k+1/2} = θ_k + a_k H(θ_k, x_{k+1}).
• If ‖θ_{k+1/2} − θ_k‖ ≤ b_k and θ_{k+1/2} ∈ K_{σ_k}, where ‖z‖ denotes the Euclidean norm of the vector z, then set (θ_{k+1}, x_{k+1}) = (θ_{k+1/2}, x_{k+1}) and σ_{k+1} = σ_k; otherwise, set (θ_{k+1}, x_{k+1}) = T(θ_k, x_k) and σ_{k+1} = σ_k + 1.

As depicted by the algorithm, the varying truncation mechanism works in an adaptive manner: when the current estimate of the parameter wanders outside the active truncation set, or when the difference between two successive estimates exceeds a time-dependent threshold, the algorithm is reinitialized with a smaller initial value of the gain factor and a larger truncation set. This mechanism enables the algorithm to select an appropriate gain factor sequence and an appropriate starting point, and thus to confine the recursion to a compact set; that is, the number of reinitializations is almost surely finite for every (θ_0, x_0) ∈ K_0 × X_0. This result is formally stated in Theorem 2.1, and it plays a crucial role in establishing the asymptotic efficiency of the trajectory averaging estimator.
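The steps above can be illustrated with the following minimal Python sketch. The Metropolis–Hastings kernel, the function H, the truncation sets K_s = [−(s+1), s+1], the threshold sequence b_k and the reinitialization map T returning to the origin are all hypothetical stand-ins for the abstract objects of the algorithm, and the adjustment of the gain factor sequence upon reinitialization is omitted for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_f(theta, x):
    return -0.5 * (x - theta) ** 2     # illustrative f_theta: N(theta, 1)

def mh_kernel(theta, x):
    """One Metropolis-Hastings step of P_theta(x, .), with f_theta invariant."""
    y = x + rng.normal(scale=1.0)
    return y if np.log(rng.uniform()) < log_f(theta, y) - log_f(theta, x) else x

def H(theta, x):
    return 2.5 - x                     # toy H; mean field h(theta) = 2.5 - theta

def in_K(theta, s):
    return abs(theta) <= s + 1.0       # nested compact sets K_s = [-(s+1), s+1]

def T(theta, x):
    return 0.0, 0.0                    # reinitialize (theta, x) in K_0 x X_0

theta, x, sigma = 0.0, 0.0, 0          # (theta_0, x_0) in K_0 x X_0, sigma_0 = 0
n_iter = 50_000
path = np.empty(n_iter)

for k in range(n_iter):
    a_k = (k + 1) ** -0.7              # gain factors, slower than O(1/k)
    b_k = 10.0 * (k + 1) ** -0.1       # nonincreasing truncation thresholds
    x_new = mh_kernel(theta, x)        # draw x_{k+1} ~ P_{theta_k}(x_k, .)
    theta_half = theta + a_k * H(theta, x_new)        # theta_{k+1/2}
    if abs(theta_half - theta) <= b_k and in_K(theta_half, sigma):
        theta, x = theta_half, x_new   # accept; sigma_{k+1} = sigma_k
    else:
        theta, x = T(theta, x)         # reinitialize and enlarge the truncation set
        sigma += 1                     # sigma_{k+1} = sigma_k + 1
    path[k] = theta

theta_bar = path.mean()                # trajectory averaging estimator
```

In this toy setup θ* = 2.5 lies outside K_0 = [−1, 1], so the sketch necessarily performs a few truncations before σ_k is large enough for the recursion to settle in a fixed compact set, after which it behaves like the plain algorithm.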
Regarding the varying truncation scheme, one can naturally propose many variations. For example, one may leave the truncation set unchanged when only the condition ‖θ_{k+1/2} − θ_k‖ ≤ b_k is violated; and, instead of jumping forward in a single gain factor sequence, one may start with a different gain factor sequence (smaller than the previous one) when a reinitialization occurs. In either case, the proofs of the theorems presented in Section 2.2 follow similarly.

2.2. Theoretical results on the trajectory averaging estimator. The asymptotic efficiency of θ̄_k can be analyzed under the following conditions.

Lyapunov condition on h(θ). Let ⟨x, y⟩ denote the Euclidean inner product.

(A_1) Θ is an open set, the function h : Θ → R^{d_θ} is continuous, and there exists a continuously differentiable function v : Θ → [0, ∞) such that:

(i) There exists M_0 > 0 such that

(5)  L = {θ ∈ Θ : ⟨∇v(θ), h(θ)⟩ = 0} ⊂ {θ ∈ Θ : v(θ) < M_0}.

(ii) There exists M_1 ∈ (M_0, ∞) such that V_{M_1} is a compact set, where V_M = {θ ∈ Θ : v(θ) ≤ M}.
(iii) For any θ ∈ Θ \ L, ⟨∇v(θ), h(θ)⟩ < 0.
(iv) The closure of v(L) has an empty interior.

This condition assumes the existence of a global Lyapunov function v for the mean field h. If h is a gradient field, that is, h = −∇J for some lower-bounded, real-valued and differentiable function J(θ), then v can be set to J, provided that J is continuously differentiable. This is typical for stochastic optimization problems, for example, machine learning [Tadić (1997)], where a continuously differentiable objective function J(θ) is minimized.

Stability condition on h(θ).

(A_2) The mean field function h(θ) is measurable and locally bounded. There exist a stable matrix F (i.e., all eigenvalues of F have negative real parts), γ > 0, ρ ∈ (0, 1], and a constant c such that, for any θ* ∈ L,

‖h(θ) − F(θ − θ*)‖ ≤ c‖θ − θ*‖^{1+ρ}  ∀θ ∈ {θ : ‖θ − θ*‖ ≤ γ},

where L is defined in (5).

This condition constrains the behavior of the mean field function around the solution points. It makes the trajectory averaging estimator sensible both theoretically and practically. If h(θ) is differentiable, the matrix F can be chosen to be the partial derivative of h(θ), that is, ∂h(θ)/∂θ. Otherwise, a certain approximation may be needed.

Drift condition on the transition kernel P_θ. Before giving the details of this condition, we first define some terms and notation. Assume that a transition kernel P_θ is irreducible, aperiodic and has a stationary distribution on a sample space denoted by X.
A set C ⊂ X is said to be small if there exist a probability measure ν on X, a positive integer l and δ > 0 such that

P_θ^l(x, A) ≥ δν(A)  ∀x ∈ C, ∀A ∈ B_X,

where B_X denotes the Borel σ-field on X. A function V : X → [1, ∞) is said to be a drift function outside C if there exist positive constants λ < 1 and b such that

P_θ V(x) ≤ λV(x) + bI(x ∈ C)  ∀x ∈ X,

where P_θ V(x) = ∫_X P_θ(x, y)V(y) dy. For a function g : X → R^d, define the norm

‖g‖_V = sup_{x∈X} ‖g(x)‖ / V(x),

and define the set L_V = {g : X → R^d : ‖g‖_V < ∞}. Given the terms and notation introduced above, the drift condition can be specified as follows.

(A_3) For any given θ ∈ Θ, the transition kernel P_θ is irreducible and aperiodic. In addition, there exist a function V : X → [1, ∞) and a constant α ≥ 2 such that, for any compact subset K ⊂ Θ:

(i) There exist a set C ⊂ X, an integer l, constants 0 < λ < 1, b, ς, δ > 0 and a probability measure ν such that

(6)  sup_{θ∈K} P_θ^l V^α(x) ≤ λV^α(x) + bI(x ∈ C)  ∀x ∈ X,
(7)  sup_{θ∈K} P_θ V^α(x) ≤ ςV^α(x)  ∀x ∈ X,
(8)  inf_{θ∈K} P_θ^l(x, A) ≥ δν(A)  ∀x ∈ C, ∀A ∈ B_X.

(ii) There exists a constant c > 0 such that, for all x ∈ X,

(9)  sup_{θ∈K} ‖H(θ, x)‖_V ≤ c,
(10)  sup_{(θ,θ′)∈K×K} ‖H(θ, x) − H(θ′, x)‖_V ≤ c‖θ − θ′‖.

(iii) There exists a constant c > 0 such that, for all (θ, θ′) ∈ K × K,

(11)  ‖P_θ g − P_{θ′} g‖_V ≤ c‖g‖_V ‖θ − θ′‖  ∀g ∈ L_V,
(12)  ‖P_θ g − P_{θ′} g‖_{V^α} ≤ c‖g‖_{V^α} ‖θ − θ′‖  ∀g ∈ L_{V^α}.

Assumption (A_3)(i) is classical in the Markov chain literature. It implies the existence of a stationary distribution f_θ(x) for all θ ∈ Θ and V^α-uniform ergodicity [Andrieu, Moulines and Priouret (2005)]. Assumption (A_3)(ii) gives conditions on the bound of H(θ, x). This is a critical condition for the observation noise: as seen later in Lemmas A.1 and A.5, it directly leads to the boundedness of some terms decomposed from the observation noise. For some algorithms, for example, SAMC, for which H(θ, x) is a bounded function, the drift function can simply be set as V(x) = 1.
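On a finite state space, the drift inequality P_θV ≤ λV + bI(x ∈ C) and the minorization bound (8) can be checked by direct computation. The following minimal sketch uses a toy three-state kernel, a hand-picked drift function and l = 1, all hypothetical choices for illustration only.

```python
import numpy as np

# Toy 3-state kernel with drift toward state 0; all numbers are illustrative.
P = np.array([[0.9, 0.1, 0.0],
              [0.6, 0.3, 0.1],
              [0.0, 0.6, 0.4]])
V = np.array([1.0, 4.0, 16.0])       # candidate drift function, V : X -> [1, inf)
C = np.array([True, False, False])   # candidate small set C = {0}

PV = P @ V                           # (P V)(x) = sum_y P(x, y) V(y)
lam = (PV[~C] / V[~C]).max()         # contraction factor away from C
b = (PV[C] - lam * V[C]).max()       # constant giving PV <= lam*V + b*I(x in C)
print(f"lambda = {lam:.3f}, b = {b:.3f}")   # drift holds here since lambda < 1

# Minorization with l = 1: P(x, .) >= delta * nu(.) for all x in C
m = P[C].min(axis=0)                 # pointwise infimum of the rows indexed by C
delta = m.sum()                      # C is small if delta > 0
nu = m / delta                       # the minorizing probability measure
print(f"delta = {delta:.3f}, nu = {nu}")
```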