Review of Basic Asymptotic Theory

Probability Space: A triple $(\Omega, \mathcal{F}, P)$ such that the following three conditions are satisfied:

1. $A \in \mathcal{F} \implies A^c \in \mathcal{F}$.
2. $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
3. $\emptyset \in \mathcal{F}$.

Question: Show that the above conditions imply $\Omega \in \mathcal{F}$ and $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$.

The events $\limsup$ and $\liminf$ are defined as:

1. $\limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{m \geq n} A_m = \{A_n, \text{ i.o.}\}$, where i.o. denotes "infinitely often".
2. $\liminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{m \geq n} A_m = \{A_n, \text{ e.v.}\}$, where e.v. denotes "eventually".

Stochastic Convergence: All materials can be found in Amemiya Ch 3 and Davidson Ch 18.

Almost Sure Convergence:

\[
X_n \xrightarrow{a.s.} 0 \iff P\left(\lim_{n \to \infty} X_n(\omega) = 0\right) = 1
\iff \forall \epsilon > 0, \; P\left(|X_n(\omega)| > \epsilon, \text{ i.o.}\right) = 0
\]
\[
\iff \forall \epsilon > 0, \; P\left(\bigcap_{n=1}^{\infty} \bigcup_{m \geq n} \{|X_m(\omega)| > \epsilon\}\right) = 0
\iff \forall \epsilon > 0, \; \lim_{n \to \infty} P\left(\bigcup_{m \geq n} \{|X_m(\omega)| > \epsilon\}\right) = 0
\]
\[
\iff \forall \epsilon > 0, \; P\left(\bigcup_{n=1}^{\infty} \bigcap_{m \geq n} \{|X_m(\omega)| \leq \epsilon\}\right) = 1
\iff \forall \epsilon > 0, \; \lim_{n \to \infty} P\left(\bigcap_{m \geq n} \{|X_m(\omega)| \leq \epsilon\}\right) = 1.
\]

Question: Show the above definitions are equivalent.

$L^p$ convergence, $X_n \xrightarrow{L^p} 0$: $\lim_{n \to \infty} E|X_n|^p = 0$.

Convergence in probability: $\forall \epsilon > 0$, $\lim_{n \to \infty} P\left(|X_n(\omega)| \leq \epsilon\right) = 1$.

Convergence in distribution: $\lim_{n \to \infty} P(X_n \leq x) = P(X \leq x)$ for every continuity point $x$ of the distribution of $X$.

Relations:

1. $\left(X_n \xrightarrow{a.s.} 0\right) \implies \left(X_n \xrightarrow{p} 0\right)$: Note that $P\left(\bigcap_{m \geq n} \{|X_m(\omega)| \leq \epsilon\}\right) \leq P\left(|X_n(\omega)| \leq \epsilon\right)$.

2. $\left(X_n \xrightarrow{L^p} 0, \; p > 0\right) \implies \left(X_n \xrightarrow{p} 0\right)$: Note that $P(|X_n| > \epsilon) \leq \epsilon^{-p} E|X_n|^p$ by the Markov inequality.

3. $\left(X_n \xrightarrow{p} 0\right) \iff \left(X_n \xrightarrow{d} 0\right)$: Almost by definition. Note however that $X_n \xrightarrow{d} X$ does not imply $X_n - X \xrightarrow{p} 0$ unless $X$ is degenerate.

Borel-Cantelli Lemma (BC): $\sum_{n=1}^{\infty} P(E_n) < \infty \implies P(E_n, \text{ i.o.}) = 0$.

Proof: $P(E_n, \text{ i.o.}) = \lim_{n \to \infty} P\left(\bigcup_{m \geq n} E_m\right) \leq \lim_{n \to \infty} \sum_{m \geq n} P(E_m) = 0$, where the first equality holds because the events $\bigcup_{m \geq n} E_m$ decrease in $n$, and the final limit is 0 by the summability assumption.

Example: $X_i \sim$ IID Uniform$(0,1)$. Show that $\min_{1 \leq i \leq n} X_i \xrightarrow{a.s.} 0$.

Proof: Note that $P\left(\min_{1 \leq i \leq n} X_i > \epsilon\right) = \prod_{i=1}^{n} P(X_i > \epsilon) = (1 - \epsilon)^n$, and $\sum_{n=1}^{\infty} (1 - \epsilon)^n < \infty$. Use Borel-Cantelli to conclude.

Look at the example in Amemiya p. 88: Note that one needs $\sum_{n=1}^{\infty} P(|X_n| > \epsilon) = \infty$ for the arguments there to hold, hence $P(|X_n| > \epsilon) = \frac{1}{n}$ will do.

But Borel-Cantelli may not be necessary for a.s. convergence: Let $\omega$ be uniformly distributed on $(0,1)$. Define $X_n(\omega) = n$ if $\omega \leq \frac{1}{n}$ and $X_n(\omega) = 0$ if $\omega > \frac{1}{n}$. Obviously $X_n(\omega) \xrightarrow{a.s.} 0$, but the BC condition does not hold, since $P(|X_n| > \epsilon) = \frac{1}{n}$ is not summable.

a.s. convergence does not imply $L^p$ convergence: In the same example, $EX_n = 1$ for all $n$, although $X_n \xrightarrow{a.s.} 0$.

So when does a.s. convergence imply convergence of expectations? One needs to control the cases where things go really wrong with small probability.

Monotone Convergence Theorem (MON): If $X_n \xrightarrow{a.s.} X$ and $X_n$ is increasing almost surely, then $\lim_{n \to \infty} EX_n = EX$.

Dominated Convergence Theorem (DOM): If $X_n \xrightarrow{a.s.} X$ and $E\left(\sup_n |X_n(\omega)|\right) < \infty$, then $\lim_{n \to \infty} EX_n = EX$. Note that DOM also applies to the Lebesgue measure, which is not a probability measure, in which case we have: if $f_n(x) \to f(x)$ almost everywhere with respect to the Lebesgue measure and $\int \sup_n |f_n(x)| \, dx < \infty$, then $\lim_{n \to \infty} \int f_n(x) \, dx = \int f(x) \, dx$.
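The Borel-Cantelli example can be made concrete with a quick simulation. This is an illustrative sketch only; the function name, seeds, and sample sizes are my own choices, not from the references:

```python
import random

def min_of_uniforms(n, seed=0):
    """Return min_{1<=i<=n} X_i for X_i ~ IID Uniform(0,1)."""
    rng = random.Random(seed)
    return min(rng.random() for _ in range(n))

# The minimum is driven toward 0 as n grows, illustrating a.s. convergence.
for n in (10, 1_000, 100_000):
    print(n, min_of_uniforms(n))
```

Because the generator is seeded, the first 10 draws are a prefix of the first 100,000, so the printed minima are necessarily nonincreasing in $n$.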

Uniform Integrability (UI): Definition: $X_n$ is U.I. if $\lim_{M \to \infty} \sup_n E\left(|X_n| 1(|X_n| > M)\right) = 0$.

Theorem: If $X_n$ is U.I. and $X_n \xrightarrow{a.s.} 0$, then $\lim E|X_n| = 0$, and hence $\lim EX_n = 0$.

Proof: Write $E|X_n| = E|X_n| 1(|X_n| > M) + E|X_n| 1(|X_n| \leq M)$. The first term $\to 0$ as $M \to \infty$ by U.I. Use DOM to show that, for any given $M$, the second term $\to 0$ as $n \to \infty$, since it is dominated by $M$.

Stochastic Order: $X_n = o_p(1)$ if $X_n \xrightarrow{p} 0$; $X_n = O_p(1)$ if $\lim_{M \to \infty} \limsup_n P(|X_n| > M) = 0$.

Facts: $X_n = O_p(a_n)$ means $a_n^{-1} X_n = O_p(1)$. $O_p(1) \, o_p(1) = o_p(1)$. $O_p(a_n) O_p(b_n) = O_p(a_n b_n)$. $O_p(a_n) + O_p(b_n) = O_p(a_n + b_n) = O_p(\max(a_n, b_n))$.

Slutsky: See Amemiya Thm 3.2.7, p. 89. Continuous Mapping: See Amemiya Thm 3.2.5, p. 88.

Weak Law of Large Numbers (WLLN): Given $X_t$, $t = 1, \ldots$, let $\bar{X}_n = \frac{1}{n} \sum_{t=1}^{n} X_t$. Under what conditions does $\bar{X}_n - E\bar{X}_n \xrightarrow{p} 0$? It is sufficient to show $E|\bar{X}_n - E\bar{X}_n|^p \to 0$, say for $p = 2$, but other $p > 0$ also work.
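The role of moments in these laws can be previewed by simulation. The sketch below (distribution choices and sample sizes are mine) compares the spread of sample means for Uniform(0,1), which has all moments, against the standard Cauchy, whose sample mean is again standard Cauchy, so no LLN can hold:

```python
import math
import random

def mean_iqr(draw, n, reps=1000, seed=0):
    """Interquartile range of the sample mean of n IID draws from `draw`."""
    rng = random.Random(seed)
    means = sorted(sum(draw(rng) for _ in range(n)) / n for _ in range(reps))
    return means[3 * reps // 4] - means[reps // 4]

def uniform(rng):
    return rng.random()

def cauchy(rng):
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (rng.random() - 0.5))

for n in (1, 10, 100):
    # Uniform spread shrinks like 1/sqrt(n); Cauchy spread stays near 2.
    print(n, mean_iqr(uniform, n), mean_iqr(cauchy, n))
```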

WLLN for independent non-identically distributed $X_t$ (Davidson p. 293): Let the $X_t$ be independent. If $\sum_{t=1}^{\infty} \sigma_t^2 / t^2 < \infty$, then $E|\bar{X}_n - E\bar{X}_n|^2 \to 0$.

Proof: $E|\bar{X}_n - E\bar{X}_n|^2 = \mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{t=1}^{n} \sigma_t^2$. Use Kronecker's Lemma (see Davidson p. 34), which says that for a positive sequence of numbers $x_t$ and a sequence $a_n$ that monotonically increases to infinity, $\sum_{t=1}^{\infty} x_t / a_t < \infty$ implies $\frac{1}{a_n} \sum_{t=1}^{n} x_t \to 0$ as $n \to \infty$. Now take $x_t = \sigma_t^2$ and $a_t = t^2$.

An easy WLLN: $X_t$ uncorrelated with mean 0 and $\mathrm{Var}(X_t) = \sigma^2$, so that $\mathrm{Var}(\bar{X}_n) = \sigma^2 / n \to 0$.

Strong Law of Large Numbers (SLLN): Under what conditions does $\bar{X}_n - E\bar{X}_n \xrightarrow{a.s.} 0$?

SLLN 1 (Thm 3.3.1, Amemiya p. 90): If the $X_t$ are independent and $\sum_{t=1}^{\infty} \frac{\sigma_t^2}{t^2} < \infty$, then $\bar{X}_n - E\bar{X}_n \xrightarrow{a.s.} 0$.
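The variance condition $\sum_t \sigma_t^2/t^2 < \infty$ can be checked numerically. A sketch, assuming Gaussian $X_t$ with $\sigma_t = 1 + 1/t$ (my choice; the condition then holds, so both the WLLN for heterogeneous independent variables and SLLN 1 apply):

```python
import random

def sample_mean(n, seed=0):
    """Mean of n independent N(0, sigma_t^2) draws with sigma_t = 1 + 1/t.
    Here sum_t sigma_t^2 / t^2 < infinity, so the sample mean
    (centered at 0) should shrink as n grows."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0 + 1.0 / t) for t in range(1, n + 1)) / n

for n in (100, 10_000, 1_000_000):
    print(n, sample_mean(n))
```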

SLLN 2 (Thm 3.3.2, Amemiya p. 90): If the $X_t$ are iid with finite mean $\mu$, then $\bar{X}_n - \mu \xrightarrow{a.s.} 0$.

Triangular Array: Useful for convergence in probability and in distribution: $\{\{X_{nt}, t = 1, \ldots, n\}, n = 1, \ldots, \infty\}$. For example, given $X_t$, $t = 1, \ldots, \infty$, independent with mean 0, let $\sigma_t^2 = \mathrm{Var}(X_t)$ and let $C_n^2 = \sum_{t=1}^{n} \sigma_t^2$. Then $X_{nt} = \frac{X_t}{C_n}$, for $t = 1, \ldots, n$ and $n = 1, \ldots, \infty$, is a triangular array of random variables.

Question: Show that $\mathrm{Var}\left(\sum_{t=1}^{n} X_{nt}\right) = \sum_{t=1}^{n} \mathrm{Var}(X_{nt}) = 1$.

Consistency of the least squares coefficient (Amemiya p. 95): Model 1: $y_t = x_t' \beta + u_t$, with the $u_t$ uncorrelated, mean 0, and $Eu_t^2 = \sigma^2$ for all $t$; $\hat{\beta} = (X'X)^{-1} X'y$.

Consistency Theorem: If $\lambda_s(X'X) \to \infty$, then $\hat{\beta} \xrightarrow{p} \beta$. Use $\lambda_s(A)$ and $\lambda_l(A)$ to denote the smallest and largest eigenvalues of $A$.

Proof: $\lambda_s(X'X) \to \infty \implies \lambda_l\left[(X'X)^{-1}\right] \to 0 \implies \mathrm{trace}\left[(X'X)^{-1}\right] \to 0$. But $\sigma^2 \, \mathrm{trace}\left[(X'X)^{-1}\right]$ is the sum of the variances of the elements of $\hat{\beta} - \beta$.

Consistency of $\hat{\sigma}^2$ (Thm 3.5.2, Amemiya p. 96): Assume the $u_t$ are iid in Model 1. Then $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$, where $\hat{\sigma}^2 = T^{-1} \hat{u}'\hat{u} = T^{-1} u'u - T^{-1} u'Pu$ for $P = X(X'X)^{-1}X'$.

Proof:

1. $T^{-1} u'u \xrightarrow{a.s.} \sigma^2$ by SLLN 2.

2. $P\left(T^{-1} u'Pu > \epsilon\right) \leq \epsilon^{-1} E\left(T^{-1} u'Pu\right) = (T\epsilon)^{-1} \sigma^2 \, \mathrm{tr}(P) = \frac{\sigma^2 k}{T\epsilon} \to 0$.

3. Reminder: $\mathrm{tr}(P) = \mathrm{tr}\left[X(X'X)^{-1}X'\right] = \mathrm{tr}\left[(X'X)^{-1}(X'X)\right] = \mathrm{tr}[I_k] = k$.
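The consistency theorem for $\hat{\beta}$ can also be illustrated in simulation. A minimal sketch with a scalar regressor (the data-generating choices are my own assumptions): keeping $x_t$ bounded away from 0 makes $X'X = \sum_t x_t^2$ grow like the sample size, so $\lambda_s(X'X) \to \infty$ holds.

```python
import random

def ols_slope(n, beta=2.0, seed=0):
    """OLS in the scalar Model 1: y_t = beta * x_t + u_t, u_t ~ N(0, 1).
    With x_t in [0.5, 1.5], X'X = sum of x_t^2 grows like n, so the
    condition lambda_s(X'X) -> infinity of the consistency theorem holds."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x = 0.5 + rng.random()                 # regressor bounded away from 0
        y = beta * x + rng.gauss(0.0, 1.0)
        sxy += x * y
        sxx += x * x
    return sxy / sxx                           # (X'X)^{-1} X'y for scalar x

for n in (50, 5_000, 500_000):
    print(n, ols_slope(n))
```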

Convergence in distribution (Thm 22.8, Davidson p. 353): $X_n \xrightarrow{d} X \implies \lim_{n \to \infty} E[f(X_n)] = E[f(X)]$ for every bounded continuous $f$. However, $EX_n$ need not converge to $EX$ when the $X_n$ are unbounded.

Characteristic Function: Useful in showing convergence in distribution because $e^{i\lambda x}$ is bounded and continuous.

Definition: $\phi_X(\lambda) = Ee^{i\lambda X} = E[\cos(\lambda X) + i \sin(\lambda X)]$.

Examples:

1. $X$ standard normal $N(0,1)$: $\phi_X(\lambda) = e^{-\lambda^2/2}$.

2. $X$ Cauchy with density $f(x) = \frac{1}{\pi(1+x^2)}$: $\phi_X(\lambda) = e^{-|\lambda|}$. (Davidson p. 167)
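Both examples are easy to verify by Monte Carlo. Since both distributions are symmetric about 0, the imaginary part of $\phi_X$ vanishes and $\phi_X(\lambda) = E\cos(\lambda X)$. Below is a sketch (sample size, seed, and the inverse-CDF Cauchy generator are my choices):

```python
import math
import random

def cf_real(draws, lam):
    """Monte Carlo estimate of E[cos(lam * X)], i.e. the characteristic
    function of a distribution that is symmetric about zero."""
    return sum(math.cos(lam * x) for x in draws) / len(draws)

rng = random.Random(42)
n = 200_000
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Standard Cauchy via the inverse CDF: x = tan(pi * (U - 1/2)).
cauchy_draws = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

lam = 1.3
print(cf_real(normal_draws, lam), math.exp(-lam ** 2 / 2))  # nearly equal
print(cf_real(cauchy_draws, lam), math.exp(-abs(lam)))      # nearly equal
```

Note that the Monte Carlo average converges here even for the Cauchy, because $\cos(\lambda X)$ is bounded; this boundedness is exactly why characteristic functions are convenient.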

Properties:

1. $\phi_{aX+b}(\lambda) = Ee^{i\lambda(aX+b)} = e^{ib\lambda} \phi_X(a\lambda)$.
2. Let the $X_{nt}$ be iid and let $S_n = \sum_{t=1}^{n} X_{nt}$; then $\phi_{S_n}(\lambda) = \prod_{t=1}^{n} \phi_{X_{nt}}(\lambda) = \left[\phi_{X_{nt}}(\lambda)\right]^n$.

Examples:

1. If the $X_t$ are iid $N(0,1)$, let $S_n = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} X_t$; then

\[
\phi_{S_n}(\lambda) = \left[\phi_{X_t}\left(\lambda/\sqrt{n}\right)\right]^n = \left[\exp\left(-\tfrac{1}{2}\left(\lambda/\sqrt{n}\right)^2\right)\right]^n = \exp\left(-\frac{\lambda^2}{2}\right).
\]
Still $N(0,1)$.

2. If the $X_t$ are iid Cauchy, let $S_n = \frac{1}{n} \sum_{t=1}^{n} X_t$; then
\[
\phi_{S_n}(\lambda) = \left[\phi_{X_t}(\lambda/n)\right]^n = \left[\exp(-|\lambda/n|)\right]^n = \exp(-|\lambda|).
\]
Still Cauchy. So there is no LLN for Cauchy random variables, because the Cauchy distribution has no mean.

Central Limit Theorem (CLT): Sequence Version (see Amemiya p. 92): Let

\[
S_n = \frac{\bar{X}_n - E\bar{X}_n}{\sqrt{\mathrm{Var}\left(\bar{X}_n\right)}}.
\]

Under what conditions does $S_n \xrightarrow{d} N(0,1)$?

Triangular Array Version (see Davidson p. 368): Let $X_{nt} = \frac{X_t}{C_n}$ be the triangular array defined above, for $t = 1, \ldots, n$ and $n = 1, \ldots, \infty$. The CLT looks for conditions under which $S_n = \sum_{t=1}^{n} X_{nt} \xrightarrow{d} N(0,1)$. Remember that by construction $\mathrm{Var}(S_n) = 1$.

Question: Show that the two formulations are equivalent.

Lindeberg Condition (sufficient condition for the CLT):

Sequence Version (Thm 3.3.6, Amemiya p. 92):

\[
\lim_{n \to \infty} \frac{1}{C_n^2} \sum_{t=1}^{n} E\left[X_t^2 \, 1\left(|X_t| > \epsilon C_n\right)\right] = 0, \quad \forall \epsilon > 0.
\]

Array Version (Thm 23.6, Davidson p. 369):

\[
\lim_{n \to \infty} \sum_{t=1}^{n} E\left[X_{nt}^2 \, 1\left(|X_{nt}| > \epsilon\right)\right] = 0, \quad \forall \epsilon > 0.
\]

Question: Convince yourself that the two versions are the same thing.

Liapounov Condition (implies the Lindeberg condition):

Sequence Version (Amemiya p. 92):

\[
\lim_{n \to \infty} \left(C_n^2\right)^{-1/2} \left(\sum_{t=1}^{n} E|X_t|^3\right)^{1/3} = 0.
\]

Array Version (Davidson p. 373):

\[
\exists \delta > 0 \text{ s.t. } \lim_{n \to \infty} \sum_{t=1}^{n} E|X_{nt}|^{2+\delta} = 0.
\]

Question: Take $\delta = 1$ in the Array Version to recover the Sequence Version; convince yourself that they say the same thing.

Liapounov Condition implies Lindeberg Condition. Proof (using the array version):

\[
E|X_{nt}|^{2+\delta} \geq E\left[1\left(|X_{nt}| > \epsilon\right) |X_{nt}|^{2+\delta}\right] \geq \epsilon^{\delta} E\left[1\left(|X_{nt}| > \epsilon\right) X_{nt}^2\right].
\]
Therefore
\[
\lim_{n \to \infty} \sum_{t=1}^{n} E\left[X_{nt}^2 \, 1\left(|X_{nt}| > \epsilon\right)\right] \leq \frac{1}{\epsilon^{\delta}} \lim_{n \to \infty} \sum_{t=1}^{n} E|X_{nt}|^{2+\delta} = 0.
\]

Question: Convince yourself that the same proof works for the sequence version in the special case $\delta = 1$.

Lindeberg-Levy CLT: If the $X_t$ are iid $(0, \sigma^2)$, then the CLT holds. Convince yourself that the iid assumption with finite variance implies the Lindeberg condition.
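The Lindeberg-Levy CLT is easy to see in a simulation sketch. Uniform(0,1) summands are my arbitrary choice; any iid sequence with finite variance works:

```python
import math
import random

def standardized_mean(n, rng):
    """(Xbar_n - mu) / sqrt(sigma^2 / n) for X_t ~ Uniform(0,1),
    where mu = 1/2 and sigma^2 = 1/12."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - 0.5) / math.sqrt((1.0 / 12.0) / n)

rng = random.Random(7)
reps = 5_000
# Fraction of standardized means falling in the central 95% normal region.
coverage = sum(abs(standardized_mean(200, rng)) <= 1.96 for _ in range(reps)) / reps
print(coverage)  # close to 0.95 if the normal approximation is good
```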

A sufficient condition for the Lindeberg condition: Let $\sigma_t^2 \geq 1$ for all $t$ and nondecreasing. If $\frac{X_t^2}{\sigma_t^2}$ is uniformly integrable (U.I.) and if
\[
\sup_n \frac{n \max_{1 \leq t \leq n} \sigma_t^2}{\sum_{t=1}^{n} \sigma_t^2} < \infty,
\]
then the Lindeberg condition holds.

Proof:

\[
\frac{1}{C_n^2} \sum_{t=1}^{n} E\left[X_t^2 \, 1\left(|X_t| > \epsilon C_n\right)\right]
\leq \frac{n \max_{1 \leq t \leq n} \sigma_t^2}{C_n^2} \; \sup_t E\left[\frac{X_t^2}{\sigma_t^2} \, 1\left(\frac{X_t^2}{\sigma_t^2} > \frac{\epsilon^2 C_n^2}{\max_{1 \leq t \leq n} \sigma_t^2}\right)\right] \longrightarrow 0,
\]
where the convergence follows because the first factor is bounded by assumption and the second factor $\to 0$ by U.I., since $\frac{\epsilon^2 C_n^2}{\max_{1 \leq t \leq n} \sigma_t^2} \to \infty$ (note $C_n^2 \geq n$ because $\sigma_t^2 \geq 1$).

Asymptotic Normality of the least squares coefficient (Thm 3.5.3, Amemiya p. 96): Model 1, $u_t$ iid $(0, \sigma^2)$, $x_t$ a scalar. If $\lim_{T \to \infty} \frac{\max_{1 \leq t \leq T} x_t^2}{\sum_{t=1}^{T} x_t^2} = 0$, then $\sigma^{-1} (x'x)^{1/2} \left(\hat{\beta} - \beta\right) \xrightarrow{d} N(0, 1)$.

Proof: Note first that
\[
\sigma^{-1} (x'x)^{1/2} \left(\hat{\beta} - \beta\right) = \frac{\sum_{t=1}^{T} x_t u_t}{\sigma \sqrt{\sum_{t=1}^{T} x_t^2}} = \frac{\frac{1}{T} \sum_{t=1}^{T} x_t u_t}{\sqrt{\mathrm{Var}\left(\frac{1}{T} \sum_{t=1}^{T} x_t u_t\right)}},
\]
so it is sufficient to check the

Lindeberg condition for $x_t u_t$, which is:
\[
\frac{1}{\sigma^2 \sum_{t=1}^{T} x_t^2} \sum_{t=1}^{T} E\left[(x_t u_t)^2 \, 1\left(|x_t u_t| > \epsilon \sigma \sqrt{\textstyle\sum_{t=1}^{T} x_t^2}\right)\right]
\]
\[
= \frac{1}{\sigma^2 \sum_{t=1}^{T} x_t^2} \sum_{t=1}^{T} E\left[(x_t u_t)^2 \, 1\left((x_t u_t)^2 > \epsilon^2 \sigma^2 \textstyle\sum_{t=1}^{T} x_t^2\right)\right]
\]
\[
= \frac{1}{\sigma^2 \sum_{t=1}^{T} x_t^2} \sum_{t=1}^{T} x_t^2 \, E\left[u_t^2 \, 1\left(u_t^2 > \epsilon^2 \frac{\sigma^2 \sum_{t=1}^{T} x_t^2}{x_t^2}\right)\right]
\]
\[
\leq \frac{1}{\sigma^2} \max_{1 \leq t \leq T} E\left[u_1^2 \, 1\left(u_1^2 > \epsilon^2 \frac{\sigma^2 \sum_{t=1}^{T} x_t^2}{x_t^2}\right)\right]
= \frac{1}{\sigma^2} E\left[u_1^2 \, 1\left(u_1^2 > \epsilon^2 \frac{\sigma^2 \sum_{t=1}^{T} x_t^2}{\max_{1 \leq t \leq T} x_t^2}\right)\right] \longrightarrow 0,
\]

where the convergence follows from $Eu_1^2 < \infty$, DOM, and the stated assumption, which implies $\frac{\sigma^2 \sum_{t=1}^{T} x_t^2}{\max_{1 \leq t \leq T} x_t^2} \to \infty$.

What if $x_t$ is a $k$-dimensional vector? (Thm 3.5.4, Amemiya p. 97): Model 1, $u_t$ iid $(0, \sigma^2)$, and for each $i = 1, \ldots, k$, $\lim_{T \to \infty} (x_i'x_i)^{-1} \max_{1 \leq t \leq T} x_{ti}^2 = 0$. Let $S = \mathrm{diag}\left[(x_i'x_i)^{1/2}\right]$ and $Z = XS^{-1}$, and assume $\lim_{T \to \infty} Z'Z = R$ nonsingular. Then $S\left(\hat{\beta} - \beta\right) \xrightarrow{d} N\left(0, \sigma^2 R^{-1}\right)$.

Proof: $S\left(\hat{\beta} - \beta\right) = S(X'X)^{-1}X'u = S(X'X)^{-1}SS^{-1}X'u = (Z'Z)^{-1}Z'u$. By the Cramer-Wold Device (Amemiya Thm 3.3.8, p. 93) it is sufficient to show that for every $c \neq 0$, $c'(Z'Z)^{-1}Z'u \xrightarrow{d} N\left(0, \sigma^2 c'R^{-1}c\right)$. But $c'(Z'Z)^{-1}Z'u$ has the same limit distribution as $\gamma'Z'u$ for $\gamma = R^{-1}c$, by Slutsky. So take $x_t = \gamma'z_t$ in Thm 3.5.3 and check its condition:

\[
\lim_{T \to \infty} \frac{\max_{1 \leq t \leq T} (\gamma'z_t)^2}{\gamma'Z'Z\gamma}
\leq \lim_{T \to \infty} \frac{(\gamma'\gamma) \max_{1 \leq t \leq T} z_t'z_t}{\gamma'Z'Z\gamma}
\leq \lim_{T \to \infty} \frac{\gamma'\gamma}{\lambda_s(Z'Z) \, \gamma'\gamma} \sum_{i=1}^{k} \frac{\max_{1 \leq t \leq T} x_{ti}^2}{\sum_{t=1}^{T} x_{ti}^2} = 0.
\]
The first inequality is by Cauchy-Schwarz, the second by the definition of the smallest eigenvalue and the definition of $z_{ti}$; the limit is 0 because $\lambda_s(Z'Z)$ is bounded away from 0 in large samples and the last sum goes to 0 term by term for each $i = 1, \ldots, k$ by assumption. Note that $Z'Z$ is the sample var-cov matrix of the rescaled regressors.

Uniform Convergence (in probability): Definition: $\hat{Q}(\theta)$ converges in probability to $Q(\theta)$ uniformly over the compact set $\Theta$ if $\forall \epsilon > 0$, $\lim_{T \to \infty} P\left(\sup_{\theta \in \Theta} |\hat{Q}(\theta) - Q(\theta)| > \epsilon\right) = 0$.

The concept of uniform convergence in probability is useful for two things:

1. Showing consistency of M-estimators.

2. Showing consistency of estimated variance-covariance matrices, etc.

Consistency of M-Estimators: Read Thm 4.1.1, Amemiya p. 106, and Thm 2.1, Newey and McFadden p. 2121.

Statement of theorem: If

1. $\lim_{T \to \infty} P\left(\sup_{\theta \in \Theta} |Q_T(\theta) - Q(\theta)| > \epsilon\right) = 0$ for all $\epsilon > 0$;

2. $Q(\theta)$ is continuous and uniquely maximized at $\theta_0$;
3. $\hat{\theta} = \mathrm{argmax}_{\theta \in \Theta} Q_T(\theta)$ over the compact parameter set $\Theta$;

plus some continuity and measurability requirements for $Q_T(\theta)$, then $\hat{\theta} \xrightarrow{p} \theta_0$.

Steps in Proof: Want $\forall \delta > 0$, $P\left(|\hat{\theta} - \theta_0| \leq \delta\right) \to 1$.

1. First, by condition (2), $\forall \delta > 0$ $\exists \epsilon > 0$ s.t. $|Q(\theta) - Q(\theta_0)| < \epsilon \implies |\theta - \theta_0| < \delta$. Therefore the problem is translated into showing $P\left(|Q(\hat{\theta}) - Q(\theta_0)| < \epsilon\right) \to 1$, or equivalently $P\left(Q(\hat{\theta}) - Q(\theta_0) > -\epsilon\right) \to 1$.

2. Since $Q_T(\hat{\theta}) \geq Q_T(\theta_0)$ by condition (3), the decomposition
\[
Q(\hat{\theta}) - Q(\theta_0) = \left[Q(\hat{\theta}) - Q_T(\hat{\theta})\right] + \left[Q_T(\hat{\theta}) - Q_T(\theta_0)\right] + \left[Q_T(\theta_0) - Q(\theta_0)\right]
\]
shows that the event $Q(\hat{\theta}) - Q(\theta_0) > -\epsilon$ is implied by $Q(\hat{\theta}) - Q_T(\hat{\theta}) > -\epsilon/2$ and $Q_T(\theta_0) - Q(\theta_0) > -\epsilon/2$. But both of these last two events are implied by the event $\sup_{\theta \in \Theta} |Q_T(\theta) - Q(\theta)| < \epsilon/2$, the probability of which tends to 1 by condition (1).
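The consistency argument can be mirrored in a toy sketch where everything is explicit. Here $Q_n(\theta) = -\frac{1}{n}\sum_t (z_t - \theta)^2$ is my own example: its limit $Q(\theta) = -(\theta - \theta_0)^2 - 1$ is continuous and uniquely maximized at $\theta_0$, and the maximization runs over a compact grid.

```python
import random

def m_estimate(n, theta0=1.5, seed=0):
    """argmax over a grid on Theta = [0, 3] of
    Q_n(theta) = -(1/n) * sum_t (z_t - theta)^2, with z_t ~ N(theta0, 1).
    The limit Q(theta) = -(theta - theta0)^2 - 1 is uniquely maximized at
    theta0, and Q_n converges to it uniformly on the compact Theta."""
    rng = random.Random(seed)
    z = [rng.gauss(theta0, 1.0) for _ in range(n)]
    grid = [i / 100.0 for i in range(301)]      # compact parameter set
    return max(grid, key=lambda th: -sum((zt - th) ** 2 for zt in z) / n)

for n in (20, 500, 10_000):
    print(n, m_estimate(n))
```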

Counter-examples: Read Ex 4.1.1-4.1.3, Amemiya pp. 108-109.

Consistency of estimated var-cov matrices, Jacobians, etc.: Read Thm 4.1.5, Amemiya p. 113. Note that it is sufficient for uniform convergence to hold over a shrinking neighborhood of $\theta_0$; the Newey and McFadden chapter makes extensive use of this in many lemmas. That is, one only needs, for all $\delta_n \to 0$,
\[
P\left(\sup_{|\theta - \theta_0| < \delta_n} |Q_T(\theta) - Q_T(\theta_0)| > \epsilon\right) \to 0,
\]
and the same proof goes through. However, $\delta_n$ may go to 0 arbitrarily slowly, so essentially there is no difference.

Conditions for (stochastic) Uniform Convergence: Read Davidson Ch 21; skip the measurability problem part.

First think about a sequence of deterministic functions $f_n(\theta)$: when does such a sequence converge uniformly?

Uniform equicontinuity for a deterministic sequence of functions:

\[
\lim_{\delta \to 0} \sup_n \sup_{|\theta' - \theta| < \delta} |f_n(\theta') - f_n(\theta)| = 0.
\]

This is not very useful: what if $f_n(\theta)$ is discontinuous but the size of the jump goes to 0? Say $\theta \in [0,1]$, $f_n(\theta) = 0$ for $\theta \in [0, 1/2]$, and $f_n(\theta) = \frac{1}{n}$ for $\theta \in (1/2, 1]$. We would still like to say that $f_n(\theta) \to 0$ uniformly in $\theta$, so the following modification is needed:

Asymptotic uniform equicontinuity for a deterministic sequence of functions:

\[
\lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|\theta' - \theta| < \delta} |f_n(\theta') - f_n(\theta)| = 0.
\]

Convince yourself that this definition works for the previous example. The idea is to bound the fluctuations between fixed grid points of $\Theta$.

Uniform convergence of a deterministic sequence of functions: Read Thm 21.7, Davidson p. 335.

Statement of theorem: $\Theta$ compact; $\sup_{\theta \in \Theta} |f_n(\theta)| \to 0$ if and only if $f_n(\theta) \to 0$ for each $\theta$ and $f_n$ is asymptotically uniformly equicontinuous.

Proof (sufficiency only): Partition $\Theta$ into $m$ balls of radius $\delta$ each, and let $\tilde{\theta}_i$ be the center of the $i$th ball, $i = 1, \ldots, m$. Note the following:

\[
\sup_{\theta \in \Theta} |f_n(\theta)| \leq \max_{1 \leq i \leq m} \sup_{|\theta' - \tilde{\theta}_i| < \delta} |f_n(\theta') - f_n(\tilde{\theta}_i)| + \max_{1 \leq i \leq m} |f_n(\tilde{\theta}_i)|
\leq \sup_{|\theta - \theta'| < \delta} |f_n(\theta) - f_n(\theta')| + \max_{1 \leq i \leq m} |f_n(\tilde{\theta}_i)|.
\]
Take $\limsup$ over $n$ and then $\delta \to 0$ of the leftmost and rightmost sides. For the "only if" part, see Davidson p. 336.

Now turning attention to the main problem: the stochastic case.

Stochastic uniform equicontinuity: See Davidson Ch 21, p. 327. You may also consult the original papers of Andrews and Newey in the reading list, if you have time and enjoy it.

Definition: A sequence of random functions $Q_n(\theta)$ is stochastically uniformly equicontinuous if $\forall \epsilon > 0$,
\[
\lim_{\delta \to 0} \limsup_{n \to \infty} P\left(\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon\right) = 0.
\]

Alternative definition: $\forall \epsilon > 0$, $\exists \delta > 0$ such that
\[
\limsup_{n \to \infty} P\left(\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon\right) < \epsilon.
\]

Yet another definition: $\forall \epsilon > 0, \eta > 0$, $\exists \delta > 0$ such that
\[
\limsup_{n \to \infty} P\left(\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon\right) < \eta.
\]

Convince yourself that these definitions are all the same thing.

Uniform convergence in probability (Davidson Thm 21.9, p. 337): If $Q_n(\theta) \xrightarrow{p} 0$ for each $\theta$, and $Q_n(\theta)$ is stochastically equicontinuous on the compact set $\Theta$, then $\sup_{\theta \in \Theta} |Q_n(\theta)| \xrightarrow{p} 0$.

Proof: Given $\delta > 0$, the parameter set $\Theta$ can always be partitioned into a finite number $m$ of balls of radius no larger than $\delta$; take $\tilde{\theta}_i$ to be the center of each ball, $i = 1, \ldots, m$. Then, by the triangle inequality,
\[
P\left(\sup_{\theta \in \Theta} |Q_n(\theta)| > \epsilon\right)
\leq P\left(\max_{1 \leq i \leq m} \sup_{|\theta - \tilde{\theta}_i| < \delta} \left[|Q_n(\theta) - Q_n(\tilde{\theta}_i)| + |Q_n(\tilde{\theta}_i)|\right] > \epsilon\right)
\]
\[
\leq P\left(\max_{1 \leq i \leq m} \sup_{|\theta - \tilde{\theta}_i| < \delta} |Q_n(\theta) - Q_n(\tilde{\theta}_i)| > \epsilon/2\right) + P\left(\max_{1 \leq i \leq m} |Q_n(\tilde{\theta}_i)| > \epsilon/2\right)
\]
\[
\leq P\left(\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon/2\right) + \sum_{i=1}^{m} P\left(|Q_n(\tilde{\theta}_i)| > \epsilon/2\right).
\]
Take $n \to \infty$ to get rid of the second term and $\delta \to 0$ to get rid of the first term.

Comment: Although useful as a high-level condition, the stochastic equicontinuity condition itself is not easy to use (it may not be directly useful at all). If you are interested more in stochastic equicontinuity and the related subject of empirical processes, where they are really useful in tackling nonsmooth problems, consult the Andrews chapter in the Handbook as well as the two books by Pollard.

For simple problems where the objective function is smooth, differentiable, etc., uniform convergence can be verified without any of these fancy conditions; see Amemiya Ex 4.3.1 and Ex 4.3.2, p. 131. Still, it is good to have a simple sufficient condition for stochastic equicontinuity.

Lipschitz condition for stochastic uniform equicontinuity: Read Lemma 2.9, Newey and McFadden p. 2138, and Theorem 21.10, Davidson p. 339; consult the original Andrews (1992) and Newey (1991) papers only if you are interested.

Statement of condition: If $\forall \theta, \theta' \in \Theta$, $|Q_n(\theta) - Q_n(\theta')| \leq B_n d(\theta, \theta')$, where $\lim_{\delta \to 0} \sup_{|\theta - \theta'| < \delta} d(\theta, \theta') = 0$ and $B_n = O_p(1)$, then $Q_n(\theta)$ is stochastically equicontinuous.

Proof: Note that $\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon \implies B_n \sup_{|\theta - \theta'| < \delta} d(\theta, \theta') > \epsilon$. Therefore
\[
\lim_{\delta \to 0} \limsup_{n \to \infty} P\left(\sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon\right)
\leq \lim_{\delta \to 0} \limsup_{n \to \infty} P\left(B_n > M_\delta\right) = 0,
\]
where $M_\delta = \epsilon / \sup_{|\theta - \theta'| < \delta} d(\theta, \theta') \to \infty$ as $\delta \to 0$ and $B_n = O_p(1)$.

Example: Suppose $Q_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} f(z_t, \theta)$ with $z_t$ iid and $f(z_t, \theta)$ differentiable with derivative $f_\theta(z_t, \theta)$. Then $|Q_n(\theta) - Q_n(\theta')| \leq \frac{1}{n} \sum_{t=1}^{n} |f_\theta(z_t, \bar{\theta})| \, |\theta - \theta'|$ for some $\bar{\theta}$ between $\theta$ and $\theta'$. If $b(z_t) = \sup_{\theta \in \Theta} |f_\theta(z_t, \theta)|$ is such that $Eb(z_t) < \infty$, then the Lipschitz condition holds with $B_n = \frac{1}{n} \sum_{t=1}^{n} b(z_t)$, because $P(B_n > M) \leq \frac{Eb(z_t)}{M}$.

But what to do when the Lipschitz condition is not applicable?

A Uniform WLLN: See Theorem 4.2.1, Amemiya p. 116, which is a slightly weaker version of Lemma 2.4, Newey and McFadden p. 2129.

Statement of theorem: $\Theta$ compact, $y_t$ iid, $g(y_t, \theta)$ continuous in $\theta$ for each $y_t$ a.s., $Eg(y_t, \theta) = 0$, and $E \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty$; then $\lim_{n \to \infty} P\left(\sup_{\theta \in \Theta} \left|\frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta)\right| > \epsilon\right) = 0$ for all $\epsilon > 0$.

Proof: Use pointwise convergence + stochastic equicontinuity.

1. $E \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty \implies E|g(y_t, \theta)| < \infty$ for each $\theta$, so use SLLN 2 to conclude that $\frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta) \xrightarrow{a.s.\,(p)} 0$ for each $\theta$.

2. Verify stochastic equicontinuity for $\frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta)$:
\[
\sup_{|\theta - \theta'| < \delta} \left|\frac{1}{n} \sum_{t=1}^{n} \left[g(y_t, \theta) - g(y_t, \theta')\right]\right|
\leq \sup_{|\theta - \theta'| < \delta} \frac{1}{n} \sum_{t=1}^{n} \left|g(y_t, \theta) - g(y_t, \theta')\right|
\leq \frac{1}{n} \sum_{t=1}^{n} \sup_{|\theta - \theta'| < \delta} \left|g(y_t, \theta) - g(y_t, \theta')\right|.
\]
Therefore, by the Markov inequality,
\[
\lim_{\delta \to 0} \limsup_{n \to \infty} P\left(\sup_{|\theta - \theta'| < \delta} \left|\frac{1}{n} \sum_{t=1}^{n} \left[g(y_t, \theta) - g(y_t, \theta')\right]\right| > \epsilon\right)
\leq \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{E \sum_{t=1}^{n} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')|}{n \epsilon}
= \lim_{\delta \to 0} \frac{1}{\epsilon} E \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| = 0.
\]

The final equality uses the (uniform, because $\Theta$ is compact) continuity of $g(y_t, \theta)$ and DOM: $\lim_{\delta \to 0} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| = 0$ almost surely, and $E \sup_{\delta} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| \leq 2 E \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty$.
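The uniform WLLN can also be eyeballed numerically. A sketch with $g(y, \theta) = \cos(\theta y) - \sin(\theta)/\theta$ for $y \sim$ Uniform$(0,1)$ (my own example; $E\cos(\theta y) = \sin(\theta)/\theta$, $g$ is continuous in $\theta$, and $|g| \leq 2$ supplies the dominance condition):

```python
import math
import random

def sup_deviation(n, seed=0, grid=100):
    """max over a grid of theta in (0, 5] of
    |(1/n) * sum_t cos(theta * y_t) - sin(theta)/theta|, y_t ~ Uniform(0,1).
    The uniform WLLN says this supremum converges to 0 in probability."""
    rng = random.Random(seed)
    y = [rng.random() for _ in range(n)]
    worst = 0.0
    for i in range(1, grid + 1):
        th = 5.0 * i / grid
        emp = sum(math.cos(th * yt) for yt in y) / n
        worst = max(worst, abs(emp - math.sin(th) / th))
    return worst

for n in (100, 1_000, 10_000):
    print(n, sup_deviation(n))
```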
