7 Essential Asymptotics and Applications
Asymptotic theory is the study of how distributions of functions of a set of random variables behave when the number of variables becomes large. One practical context for this is statistical sampling, when the number of observations taken is large. Distributional calculations in probability are typically such that exact calculations are difficult or impossible. For example, one of the simplest possible functions of $n$ variables is their sum, and yet in most cases we cannot find the distribution of the sum for fixed $n$ in an exact closed form. But the central limit theorem allows us to conclude that in some cases the sum will behave like a normally distributed random variable when $n$ is large. Similarly, the role of general asymptotic theory is to provide approximate answers in many types of problems, often very complicated ones, where exact solutions are out of reach. The nature of the theory is such that the approximations have remarkable unity of character, indeed nearly unreasonable unity of character. Asymptotic theory is the single most unifying theme of probability and statistics. In statistics particularly, nearly every method, rule, or tradition has its root in some result in asymptotic theory. No other branch of probability and statistics has such an incredibly rich body of literature, tools, and applications, in amazingly diverse areas and problems. Skills in asymptotics are nearly indispensable for a serious statistician or probabilist.

Numerous excellent references on asymptotic theory are available. A few among them are Bickel and Doksum (2007), van der Vaart (1998), Lehmann (1999), Hall and Heyde (1980), Ferguson (1996), and Serfling (1980). A recent reference is DasGupta (2008). These references have a statistical undertone. Treatments of asymptotic theory with a probabilistic undertone include Breiman (1968), Ash (1972), Chow and Teicher (1988), Petrov (1975), Bhattacharya and Rao (1986), Cramér (1946), and the all-time classic, Feller (1971). Other specific references are given in the sections.

In this introductory chapter, we lay out the basic concepts of asymptotics with concrete applications. More specialized tools are treated separately in the subsequent chapters.

7.1 Some Basic Notation and Convergence Concepts

Some basic definitions, notation, and concepts are put together in this section.

Definition 7.1. Let $a_n$ be a sequence of real numbers. We write $a_n = o(1)$ if $\lim_{n \to \infty} a_n = 0$. We write $a_n = O(1)$ if there exists $K < \infty$ such that $|a_n| \le K$ for all $n \ge 1$. More generally, if $a_n, b_n > 0$ are two sequences of real numbers, we write $a_n = o(b_n)$ if $\frac{a_n}{b_n} = o(1)$; we write $a_n = O(b_n)$ if $\frac{a_n}{b_n} = O(1)$.

Remark: Note that the definition allows the possibility that a sequence $a_n$ which is $O(1)$ is also $o(1)$. The converse is always true, i.e., $a_n = o(1) \Rightarrow a_n = O(1)$.

Definition 7.2. Let $a_n, b_n$ be two real sequences. We write $a_n \sim b_n$ if $\frac{a_n}{b_n} \to 1$ as $n \to \infty$. We write $a_n \asymp b_n$ if $0 < \liminf \frac{a_n}{b_n} \le \limsup \frac{a_n}{b_n} < \infty$ as $n \to \infty$.

Example 7.1. Let $a_n = \frac{n}{n+1}$. Then $|a_n| \le 1$ for all $n \ge 1$; so $a_n = O(1)$. But $a_n \to 1$ as $n \to \infty$; so $a_n$ is not $o(1)$.

However, suppose $a_n = \frac{1}{n}$. Then, again, $|a_n| \le 1$ for all $n \ge 1$; so $a_n = O(1)$. But this time $a_n \to 0$ as $n \to \infty$; so $a_n$ is both $O(1)$ and $o(1)$. In this case, $a_n = O(1)$ is a weaker statement than $a_n = o(1)$. Next, suppose $a_n = -n$. Then $|a_n| = n \to \infty$ as $n \to \infty$; so $a_n$ is not $O(1)$, and therefore also cannot be $o(1)$.
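The few lines of Python below are not part of the original text; they are a minimal numerical sketch tabulating the two sequences of Example 7.1, to make the distinction between $O(1)$ and $o(1)$ concrete.

```python
# Tabulate the sequences of Example 7.1: a_n = n/(n+1) is O(1) but not o(1),
# while a_n = 1/n is both O(1) and o(1).
for n in (10, 100, 1000, 10_000):
    print(f"n = {n:>6}:  n/(n+1) = {n / (n + 1):.6f}   1/n = {1 / n:.6f}")
# n/(n+1) stays bounded by 1 yet converges to 1, not 0; 1/n converges to 0.
```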
Example 7.2. Let $c_n = \log n$, and $a_n = \frac{c_{n+k}}{c_n}$, where $k \ge 1$ is a fixed positive integer. Thus,
$$a_n = \frac{c_{n+k}}{c_n} = \frac{\log(n+k)}{\log n} = \frac{\log n + \log(1 + \frac{k}{n})}{\log n} = 1 + \frac{\log(1 + \frac{k}{n})}{\log n} \to 1 + 0 = 1.$$
Therefore, $a_n = O(1)$ and $a_n \sim 1$, but $a_n$ is not $o(1)$. The statement that $a_n \sim 1$ is stronger than saying $a_n = O(1)$.

Example 7.3. Let $a_n = \frac{1}{\sqrt{n+1}}$. Then,
$$
\begin{aligned}
a_n &= \frac{1}{\sqrt{n}} + \left(\frac{1}{\sqrt{n+1}} - \frac{1}{\sqrt{n}}\right)
= \frac{1}{\sqrt{n}} + \frac{\sqrt{n} - \sqrt{n+1}}{\sqrt{n}\sqrt{n+1}}
= \frac{1}{\sqrt{n}} - \frac{1}{\sqrt{n}\sqrt{n+1}\,(\sqrt{n} + \sqrt{n+1})} \\
&= \frac{1}{\sqrt{n}} - \frac{1}{\sqrt{n}\cdot\sqrt{n}\,(1 + o(1))\,(2\sqrt{n} + o(1))}
= \frac{1}{\sqrt{n}} - \frac{1}{2n^{3/2}(1 + o(1))(1 + o(1))} \\
&= \frac{1}{\sqrt{n}} - \frac{1}{2n^{3/2}(1 + o(1))}
= \frac{1}{\sqrt{n}} - \frac{1}{2n^{3/2}}(1 + o(1))
= \frac{1}{\sqrt{n}} - \frac{1}{2n^{3/2}} + o(n^{-3/2}).
\end{aligned}
$$
In working with asymptotics, it is useful to be skilled in calculations of the type of this example.

To motivate the first probabilistic convergence concept, we give an illustrative example.

Example 7.4. For $n \ge 1$, consider the simple discrete random variables $X_n$ with the pmf $P(X_n = \frac{1}{n}) = 1 - \frac{1}{n}$, $P(X_n = n) = \frac{1}{n}$. Then, for large $n$, $X_n$ is close to zero with a large probability. Although for any given $n$, $X_n$ is never equal to zero, the probability of it being far from zero is very small for large $n$. For example, $P(X_n > .001) \le .001$ if $n \ge 1000$. More formally, for any given $\epsilon > 0$, $P(X_n > \epsilon) \le P(X_n > \frac{1}{n}) = \frac{1}{n}$, if we take $n$ to be so large that $\frac{1}{n} < \epsilon$. As a consequence, $P(|X_n| > \epsilon) = P(X_n > \epsilon) \to 0$, as $n \to \infty$. This example motivates the following definition.

Definition 7.3. Let $X_n$, $n \ge 1$, be an infinite sequence of random variables defined on a common sample space $\Omega$. We say that $X_n$ converges in probability to $c$, a specific real constant, if for any given $\epsilon > 0$, $P(|X_n - c| > \epsilon) \to 0$, as $n \to \infty$. Equivalently, $X_n$ converges in probability to $c$ if given any $\epsilon > 0$, $\delta > 0$, there exists an $n_0 = n_0(\epsilon, \delta)$ such that $P(|X_n - c| > \epsilon) < \delta$ for all $n \ge n_0(\epsilon, \delta)$.

If $X_n$ converges in probability to $c$, we write $X_n \stackrel{P}{\Rightarrow} c$, or sometimes also $X_n \stackrel{P}{\to} c$.

However, sometimes the sequence $X_n$ may get close to some random variable, rather than a constant. Here is an example of such a situation.

Example 7.5. Let $X, Y$ be two independent standard normal variables. Define a sequence of random variables $X_n$ as $X_n = X + \frac{Y}{n}$. Then, intuitively, we feel that for large $n$, the $\frac{Y}{n}$ part is small, and $X_n$ is very close to the fixed random variable $X$. Formally, $P(|X_n - X| > \epsilon) = P(|\frac{Y}{n}| > \epsilon) = P(|Y| > n\epsilon) = 2[1 - \Phi(n\epsilon)] \to 0$, as $n \to \infty$. This motivates a generalization of the previous definition.

Definition 7.4. Let $X_n, X$, $n \ge 1$, be random variables defined on a common sample space $\Omega$. We say that $X_n$ converges in probability to $X$ if given any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$, as $n \to \infty$. We denote it as $X_n \stackrel{P}{\Rightarrow} X$, or $X_n \stackrel{P}{\to} X$.

Definition 7.5. A sequence of random variables $X_n$ is said to be bounded in probability or tight if, given $\epsilon > 0$, one can find a constant $k$ such that $P(|X_n| > k) \le \epsilon$ for all $n \ge 1$.

Notation: If $X_n \stackrel{P}{\Rightarrow} 0$, then we write $X_n = o_p(1)$. More generally, if $a_n X_n \stackrel{P}{\Rightarrow} 0$ for some positive sequence $a_n$, then we write $X_n = o_p(\frac{1}{a_n})$. If $X_n$ is bounded in probability, then we write $X_n = O_p(1)$. If $a_n X_n = O_p(1)$, we write $X_n = O_p(\frac{1}{a_n})$.

Proposition: Suppose $X_n = o_p(1)$. Then $X_n = O_p(1)$. The converse is, in general, not true.

Proof: If $X_n = o_p(1)$, then by definition of convergence in probability, given $c > 0$, $P(|X_n| > c) \to 0$, as $n \to \infty$. Thus, given any fixed $\epsilon > 0$, for all large $n$, say $n \ge n_0(\epsilon)$, $P(|X_n| > 1) < \epsilon$. Next find constants $c_1, c_2, \ldots, c_{n_0}$ such that $P(|X_i| > c_i) < \epsilon$, $i = 1, 2, \ldots, n_0$. Choose $k = \max\{1, c_1, c_2, \ldots, c_{n_0}\}$. Then $P(|X_n| > k) < \epsilon$ for all $n \ge 1$. Therefore, $X_n = O_p(1)$.

To see that the converse is, in general, not true, let $X \sim N(0, 1)$, and define $X_n \equiv X$ for all $n \ge 1$. Then $X_n = O_p(1)$. But $P(|X_n| > 1) \equiv P(|X| > 1)$, which is a fixed positive number, and so $P(|X_n| > 1)$ does not converge to zero; thus $X_n$ is not $o_p(1)$. ♣
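As a numerical companion to Example 7.5 and the proposition above, the following Monte Carlo sketch (not part of the original text; the sample size, $\epsilon$, and seed are arbitrary choices) estimates $P(|X_n - X| > \epsilon)$ for $X_n = X + \frac{Y}{n}$, and contrasts it with the constant sequence $X_n \equiv X$, which is $O_p(1)$ but not $o_p(1)$.

```python
import math
import numpy as np

# Monte Carlo companion to Example 7.5 and the o_p/O_p proposition.
rng = np.random.default_rng(0)          # fixed seed for reproducibility
reps, eps = 200_000, 0.1
X = rng.standard_normal(reps)
Y = rng.standard_normal(reps)

for n in (1, 10, 100):
    # X_n = X + Y/n, so |X_n - X| = |Y|/n and X_n - X = o_p(1):
    sim = np.mean(np.abs(Y / n) > eps)              # estimates P(|X_n - X| > eps)
    exact = 1.0 - math.erf(n * eps / math.sqrt(2))  # equals 2[1 - Phi(n*eps)]
    print(f"n = {n:>3}:  P(|X_n - X| > {eps}) ~ {sim:.5f}  (exact {exact:.5f})")

# By contrast, X_n = X for every n is O_p(1) but not o_p(1):
print("P(|X| > 1) ~", np.mean(np.abs(X) > 1))       # stays near 2[1 - Phi(1)] = 0.3173
```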
Definition 7.6. Let $\{X_n, X\}$ be defined on the same probability space. We say that $X_n$ converges almost surely to $X$ (or $X_n$ converges to $X$ with probability 1) if $P(\omega : X_n(\omega) \to X(\omega)) = 1$. We write $X_n \stackrel{a.s.}{\to} X$ or $X_n \stackrel{a.s.}{\Rightarrow} X$.

Remark: If the limit $X$ is a finite constant $c$ with probability one, we write $X_n \stackrel{a.s.}{\Rightarrow} c$. If $P(\omega : X_n(\omega) \to \infty) = 1$, we write $X_n \stackrel{a.s.}{\to} \infty$.

Almost sure convergence is a stronger mode of convergence than convergence in probability. In fact, a characterization of almost sure convergence is that for any given $\epsilon > 0$,
$$\lim_{m \to \infty} P(|X_n - X| \le \epsilon \ \forall\, n \ge m) = 1.$$
It is clear from this characterization that almost sure convergence is stronger than convergence in probability. However, the following relationships hold.

Theorem 7.1. (a) Let $X_n \stackrel{P}{\Rightarrow} X$. Then there is a subsequence $X_{n_k}$ such that $X_{n_k} \stackrel{a.s.}{\to} X$.
(b) Suppose $X_n$ is a monotone nondecreasing sequence and that $X_n \stackrel{P}{\Rightarrow} X$. Then $X_n \stackrel{a.s.}{\to} X$.

Example 7.6. (Pattern in Coin Tossing). For iid Bernoulli trials with a success probability $p = \frac{1}{2}$, let $T_n$ denote the number of times in the first $n$ trials that a success is followed by a failure. Denoting $I_i = I\{\text{$i$th trial is a success and $(i+1)$th trial is a failure}\}$, we have $T_n = \sum_{i=1}^{n-1} I_i$, and therefore $E(T_n) = \frac{n-1}{4}$, and
$$\mathrm{Var}(T_n) = \sum_{i=1}^{n-1} \mathrm{Var}(I_i) + 2\sum_{i=1}^{n-2} \mathrm{Cov}(I_i, I_{i+1}) = \frac{3(n-1)}{16} - \frac{2(n-2)}{16} = \frac{n+1}{16}.$$
By Chebyshev's inequality, $\frac{T_n}{n} - \frac{E(T_n)}{n} \stackrel{P}{\Rightarrow} 0$, and since $\frac{E(T_n)}{n} = \frac{n-1}{4n} \to \frac{1}{4}$, it follows that $\frac{T_n}{n} \stackrel{P}{\Rightarrow} \frac{1}{4}$.
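A small simulation (again not part of the original text; $n$, the number of replications, and the seed are arbitrary choices) corroborates the mean, the variance, and the concentration of $\frac{T_n}{n}$ around $\frac{1}{4}$ in Example 7.6.

```python
import numpy as np

# Simulate Example 7.6: T_n counts success-followed-by-failure patterns
# in n fair Bernoulli trials; compare with E(T_n) = (n-1)/4, Var(T_n) = (n+1)/16.
rng = np.random.default_rng(1)
n, reps = 200, 50_000
trials = rng.integers(0, 2, size=(reps, n))              # 1 = success, 0 = failure
Tn = np.sum((trials[:, :-1] == 1) & (trials[:, 1:] == 0), axis=1)

print("E(T_n):   simulated", round(Tn.mean(), 3), "  theory", (n - 1) / 4)
print("Var(T_n): simulated", round(Tn.var(), 3), "  theory", (n + 1) / 16)
print("P(|T_n/n - 1/4| > 0.05) ~", np.mean(np.abs(Tn / n - 0.25) > 0.05))
```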