FOUNDATIONS OF STOCHASTIC
MODELING
LECTURE 1 – 19th January 2011
Probability Preliminaries
Set operations
o When we talk of a probability space, we mean a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where
  . $\Omega$ is the space of outcomes.
  . $\mathcal{F}$ is a $\sigma$-field – the space of "admissible sets/events". The sets we will be talking about henceforth reside in $\mathcal{F}$.
  . $\mathbb{P}$ is a probability distribution/measure based on which statements are made about the probability of various events, $\mathbb{P}: \mathcal{F} \to [0,1]$. In particular, for $A \in \mathcal{F}$, $\mathbb{P}(A) = \mathbb{P}(\{\omega \in \Omega : \omega \in A\})$.
o Note the following basic property of unions and intersections of sets (De Morgan's laws):
  $\left[\bigcup_n A_n\right]^c = \bigcap_n A_n^c$
Or equivalently:
  $\left[\bigcap_n A_n\right]^c = \bigcup_n A_n^c$
o When we say an event A holds "almost surely", we mean that
  . $\mathbb{P}(A) = 1$
  . $A \in \mathcal{F}$ (in other words, A is measurable). We will assume this always holds throughout this course and omit this condition.
o Proposition: Let $\{A_n\}$ be a sequence of (measurable) events such that $\mathbb{P}(A_n) = 1$ for all n. Then $\mathbb{P}\left(\bigcap_n A_n\right) = 1$.
Proof: First consider that
Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011
  $\left[\bigcap_n A_n\right]^c = \bigcup_n A_n^c$
Then consider that, using the union bound,
  $\mathbb{P}\left(\bigcup_n A_n^c\right) \le \sum_n \mathbb{P}(A_n^c) = \sum_n \left(1 - \mathbb{P}(A_n)\right) = \sum_n 0 = 0$
The union bound, used in the first line, simply states that if we add the probabilities of events without subtracting their intersections, the result is greater than or equal to the probability of the union (which leaves out intersections).
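The union bound is easy to check numerically. Below is a quick Monte Carlo sketch (not from the lecture; the three events and the sample size are arbitrary choices): the bound holds for empirical frequencies of any collection of events, overlapping or not.

```python
import numpy as np

# Sketch: empirical check of the union bound P(U_n A_n) <= sum_n P(A_n)
# for three deliberately overlapping events built from a single uniform draw.
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)

events = [u < 0.3, (u > 0.2) & (u < 0.6), u > 0.5]  # overlapping events
p_union = np.mean(np.any(events, axis=0))            # frequency of the union
p_sum = sum(float(np.mean(e)) for e in events)       # sum of the frequencies

print(p_union, p_sum)
assert p_union <= p_sum   # the union bound holds for empirical frequencies too
```

Here the three events actually cover all of [0,1), so the union's frequency is 1 while the sum of the individual frequencies is about 1.2 – the slack is exactly the double-counted intersections.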
o Definition (limsup and liminf): Let $\{a_n\}$ be a real-valued sequence. Then we write
  $\limsup_n a_n := \inf_m \sup_{n \ge m} a_n$
This is the limiting least upper bound on the tails of the sequence. Similarly,
  $\liminf_n a_n := \sup_m \inf_{n \ge m} a_n$
Graphically (source: Wikipedia):
[Figure: a sequence $a_n$ oscillating between the curves $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$]
Note that these limits need not be finite. Remarks:
  . If $\limsup_n a_n < a$, then $a_n < a$ eventually (i.e., for sufficiently large n). If $\limsup_n a_n > a$, then $a_n > a$ infinitely often – in other words, there are infinitely many elements of the sequence above a.
  . Similarly, $\liminf_n a_n > a \Rightarrow a_n > a$ ev., and $\liminf_n a_n < a \Rightarrow a_n < a$ i.o.
o Definition (limsup and liminf for sets): Let $\{A_n\}$ be a sequence of measurable sets. Then
  $\limsup_n A_n := \bigcap_m \bigcup_{n \ge m} A_n = \lim_{m \to \infty} \bigcup_{n \ge m} A_n$
  $\liminf_n A_n := \bigcup_m \bigcap_{n \ge m} A_n = \lim_{m \to \infty} \bigcap_{n \ge m} A_n$
We can interpret these as follows. The limsup of $A_n$ is the set of outcomes that are contained in infinitely many of the $A_n$ (but not necessarily in all the $A_n$ past a point – the outcome could "bounce in and out" of the $A_n$):
  $\limsup_n A_n = \{\omega \in \Omega : \omega \in A_n \text{ i.o.}\}$
Similarly, the liminf of $A_n$ is the set of outcomes that eventually appear in an $A_n$ and then in all $A_n$ past that point:
  $\liminf_n A_n = \{\omega \in \Omega : \omega \in A_n \text{ ev.}\}$
Clearly, if an outcome is in the liminf, it also appears infinitely often (because it appears in all the $A_n$ past a point), and so $\liminf_n A_n \subseteq \limsup_n A_n$.
Remark: Take $\{A_n \text{ ev.}\} = \liminf_n A_n$. Then
  $\{A_n \text{ ev.}\}^c = \left[\bigcup_m \bigcap_{n \ge m} A_n\right]^c = \bigcap_m \left(\bigcap_{n \ge m} A_n\right)^c = \bigcap_m \bigcup_{n \ge m} A_n^c = \limsup_n A_n^c = \{A_n^c \text{ i.o.}\}$
o Proposition: Let $\{A_n\}$ be a sequence of measurable sets. Then
  . $\mathbb{P}\left(\liminf_n A_n\right) \le \liminf_n \mathbb{P}(A_n)$
  . $\mathbb{P}\left(\limsup_n A_n\right) \ge \limsup_n \mathbb{P}(A_n)$
(The first statement can be thought of as a generalization of Fatou's Lemma for probabilities.)
Proof of (i): Let us define $B_m = \bigcap_{n \ge m} A_n$. Then we know that $B_m \subseteq B_{m+1} \subseteq \cdots$. In other words, the sets $B_m$ increase monotonically to
  $B = \bigcup_m B_m = \bigcup_m \bigcap_{n \ge m} A_n = \liminf_n A_n$
Since the events are increasing, a simple form of monotone convergence gives us that $\mathbb{P}(B_m) \to \mathbb{P}(B)$. But we also have that
  $\mathbb{P}(B_m) \le \mathbb{P}(A_n) \quad \text{for all } n \ge m$
because $B_m$ is the intersection of all events from $A_m$ onwards, so its probability is less than or equal to the probability of any single one of them. Thus
  $\mathbb{P}(B_m) \le \inf_{n \ge m} \mathbb{P}(A_n)$
  $\sup_m \mathbb{P}(B_m) \le \sup_m \inf_{n \ge m} \mathbb{P}(A_n)$
  $\mathbb{P}(B) \le \liminf_n \mathbb{P}(A_n)$
And given how we have defined B:
  $\mathbb{P}\left(\liminf_n A_n\right) \le \liminf_n \mathbb{P}(A_n)$
As required. ∎
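The tail-supremum and tail-infimum in the sequence definitions above are easy to compute for a concrete example. A sketch (the sequence $a_n = (-1)^n(1 + 1/n)$ and the truncation length are illustrative choices, not from the lecture); its limsup is $+1$ and its liminf is $-1$:

```python
import numpy as np

# Sketch: tail suprema/infima of a_n = (-1)^n * (1 + 1/n).
n = np.arange(1, 10_001)
a = (-1.0) ** n * (1 + 1 / n)

# sup_{k >= m} a_k and inf_{k >= m} a_k for every m, computed from the tail
tail_sup = np.maximum.accumulate(a[::-1])[::-1]
tail_inf = np.minimum.accumulate(a[::-1])[::-1]

m = 5000                         # a "large" m: tail bounds are near their limits
print(tail_sup[m], tail_inf[m])  # close to +1 and -1 respectively
```

Since $\sup_{k \ge m} a_k$ is non-increasing in m and $\inf_{k \ge m} a_k$ is non-decreasing, the two printed values approximate $\inf_m \sup_{k \ge m} a_k = 1$ and $\sup_m \inf_{k \ge m} a_k = -1$.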
Borel-Cantelli Lemmas & Independence
o Proposition (First Borel-Cantelli Lemma): Let $\{A_n\}$ be a sequence of measurable events such that $\sum_n \mathbb{P}(A_n) < \infty$. Then $\mathbb{P}(A_n \text{ i.o.}) = 0$. In other words, $\mathbb{P}(\limsup_n A_n) = 0$.¹ We offer two proofs – the first is somewhat mechanical, the second is more intuitive.
Proof (version 1): Consider that
  $\limsup_n A_n = \bigcap_m \bigcup_{n \ge m} A_n$
As such, for every m,
  $\mathbb{P}\left(\limsup_n A_n\right) \le \mathbb{P}\left(\bigcup_{n \ge m} A_n\right) \le \sum_{n \ge m} \mathbb{P}(A_n) \to 0 \quad \text{as } m \to \infty$
(the tail of a convergent series vanishes). As required. ∎
The second proof will require a lemma:
Lemma (Fubini's): If $\{X_n\}$ is a sequence of real-valued random variables with $X_n \ge 0$, then $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$ (which could be infinite). This is effectively a condition under which we can exchange a summation and an integration.
¹ It might not be clear that the statements $\mathbb{P}(A_n \text{ i.o.}) = 0$ and $\mathbb{P}(\limsup_n A_n) = 0$ are equivalent. To see this more clearly, note that the first statement is simply a shorthand for $\mathbb{P}(\omega \in \Omega : \omega \in A_n \text{ i.o.}) = 0$. This is clearly identical, by definition, to $\mathbb{P}(\limsup_n A_n) = 0$.
Proof (version 2): Let
  $X_n(\omega) = \mathbb{1}_{A_n}(\omega) = \begin{cases} 1 & \text{if } \omega \in A_n \\ 0 & \text{otherwise} \end{cases}$
Note that $\mathbb{E}(X_n) = \mathbb{P}(A_n)$. We also have
  $\mathbb{E}\left(\sum_n \mathbb{1}_{A_n}\right) = \sum_n \mathbb{E}\left(\mathbb{1}_{A_n}\right) = \sum_n \mathbb{P}(A_n) < \infty$
This means that the random variable $\sum_n \mathbb{1}_{A_n}$ must be finite for almost every outcome in $\Omega$; in other words, $\sum_n \mathbb{1}_{A_n} < \infty$ a.s. This must mean that $\mathbb{1}_{A_n} = 1$ only finitely many times. Thus, $A_n$ occurs only finitely many times. ∎
o Definition (independence): Random variables $X_1, \ldots, X_n$ are independent if and only if
  $\mathbb{P}\left(X_1 \le x_1, \ldots, X_n \le x_n\right) = \prod_{i=1}^n \mathbb{P}\left(X_i \le x_i\right)$
for all $x_1, \ldots, x_n \in \mathbb{R}$.
Definition (independence): Sets $A_1, \ldots, A_n$ are said to be independent if and only if the random variables $\mathbb{1}_{A_1}, \ldots, \mathbb{1}_{A_n}$ are independent.
o Proposition (Second Borel-Cantelli Lemma): Let $\{A_n\}$ be a sequence of independent measurable events such that $\sum_n \mathbb{P}(A_n) = \infty$. Then $\mathbb{P}(A_n \text{ i.o.}) = 1$.
Proof: We showed in a remark above that
  $\{A_n \text{ i.o.}\}^c = \{A_n^c \text{ ev.}\} = \bigcup_m \bigcap_{n \ge m} A_n^c$
Let us fix $m \ge 1$.
Then, using independence and the inequality $1 - x \le e^{-x}$,
  $\mathbb{P}\left(\bigcap_{n \ge m} A_n^c\right) = \prod_{n \ge m} \mathbb{P}(A_n^c) = \prod_{n \ge m} \left(1 - \mathbb{P}(A_n)\right) \le \prod_{n \ge m} \exp\left(-\mathbb{P}(A_n)\right) = \exp\left(-\sum_{n \ge m} \mathbb{P}(A_n)\right) = 0$
But then we have that
  $\mathbb{P}\left(\bigcup_m \bigcap_{n \ge m} A_n^c\right) \le \sum_m \mathbb{P}\left(\bigcap_{n \ge m} A_n^c\right) = 0$
As required. ∎

Notions of convergence
o Definition (convergence almost surely): $X_n \to X$ almost surely if
  $\mathbb{P}\left(\{\omega \in \Omega : X_n(\omega) \to X(\omega)\}\right) = 1$
More generally, $\{X_n\}$ converges almost surely if
  $\mathbb{P}\left(\limsup_n X_n = \liminf_n X_n\right) = 1$
This is the notion of convergence that most closely maps to convergence concepts in real analysis. Let us understand these two statements intuitively.
  . The first statement tells us that for almost every outcome $\omega$ in the sample space $\Omega$, the sequence $X_n(\omega)$ tends to $X(\omega)$.
  . The second statement is a shorthand for the statement that
  $\mathbb{P}\left(\{\omega : \limsup_n X_n(\omega) = \liminf_n X_n(\omega)\}\right) = 1$
This is, again, simply the statement in real analysis that $\limsup_n x_n = \liminf_n x_n$ if and only if $\lim_n x_n$ exists. Thus, we require the limit to exist for almost every outcome $\omega$.
o Example: Consider the sequence $X_n = \frac{1}{n} U$, where $U \sim U[0,1]$. We claim that $X_n \to 0$ almost surely.
Proof: In this case, $\Omega = [0,1]$. For any $\omega$ we might draw, we will find
  $0 \le X_n(\omega) \le \frac{1}{n} \to 0$
As required. ∎
o Definition (convergence in probability): $X_n \to_p X$ as $n \to \infty$ if, for all $\varepsilon > 0$,
  $\mathbb{P}\left(|X_n - X| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty$
Let us understand this concept as compared to convergence almost surely.
  . In this case, we consider all the outcomes for which $X_n$ is far from $X$, and we require the probability of these outcomes to get smaller as n progresses. This, however, allows the possibility that for some outcomes $\omega$, $X_n(\omega)$ is far from $X(\omega)$ infinitely often, provided that the probability of these $\omega$ is small and gets smaller.
  . In the case of almost sure convergence, we only allow the deviation to be large a finite number of times. For almost every outcome, we require $X_n(\omega) \to X(\omega)$.
Perhaps this appears clearer if we write the condition for convergence almost surely as
  $\mathbb{P}\left(|X_n - X| > \varepsilon \text{ i.o.}\right) = 0$
The difference between the two statements above really hinges on the novelty of the BC Lemmas. If $A_n$ is the event that "$X_n$ is far from $X$", convergence in probability requires $\lim_n \mathbb{P}(A_n) = 0$, whereas convergence almost surely requires that $\mathbb{P}(A_n \text{ i.o.}) = 0$ – which would, for example, be satisfied if $\sum_n \mathbb{P}(A_n) < \infty$.
o Another interesting way to view the difference between convergence almost surely and in probability is in terms of experiments. Imagine carrying out an "experiment" by generating random variables $X_1, X_2, \ldots$. Each such experiment corresponds to a different outcome $\omega \in \Omega$.
  . If $X_n \to X$ almost surely, then it is certain that in every experiment we might carry out, the values $X_1(\omega), X_2(\omega), \ldots$ will tend to $X(\omega)$.
  . If $X_n \to_p X$, then it is possible that in some experiments (for a select subset of $\Omega$), $X_1(\omega), X_2(\omega), \ldots$ will not tend to $X(\omega)$. The probability of this happening decreases with the number of X generated, but it is still possible that in certain experiments, it will happen.
o Example: Let $X_n = \mathbb{1}\{U_n \le \frac{1}{n}\}$, where the $U_n$ are IID U[0,1] random variables. Let us fix $\varepsilon \in (0,1)$, and consider the definition:
  $\mathbb{P}\left(|X_n - 0| > \varepsilon\right) = \mathbb{P}(X_n > \varepsilon) = \mathbb{P}\left(U_n \le \tfrac{1}{n}\right) = \tfrac{1}{n} \to 0$
This proves the $X_n$ do indeed converge to 0 in probability. Consider, however, that
  $\sum_n \mathbb{P}(X_n > \varepsilon) = \sum_n \tfrac{1}{n} = \infty$
But since our $X_n$ are independent, we know by the second BC Lemma that $\{X_n > \varepsilon\}$ i.o. almost surely. So $X_n$ cannot be converging to 0 almost surely. Note, on the other hand, that if we replaced our $1/n$ by $1/n^2$, we would find (using the first Borel-Cantelli Lemma) that $X_n$ does indeed converge to 0 almost surely.
o Claim: $X_n \to X$ a.s. $\Rightarrow$ $X_n \to_p X$ as $n \to \infty$. As our example shows, however, the converse is false.
Proof: Recall that convergence almost surely can be written as
  $\mathbb{P}\left(|X_n - X| > \varepsilon \text{ i.o.}\right)$
  $= 0$
or in other words
  $\mathbb{P}\left(\limsup_n \{|X_n - X| > \varepsilon\}\right) = 0$
By the proposition above, however, $\limsup_n \mathbb{P}(A_n) \le \mathbb{P}(\limsup_n A_n)$. As such
  $\limsup_n \mathbb{P}\left(|X_n - X| > \varepsilon\right) \le \mathbb{P}\left(\limsup_n \{|X_n - X| > \varepsilon\}\right) = 0$
So the probability of large deviations falls to 0 – this is the definition of convergence in probability. ∎
o Remark: It is interesting to note that if $X_n \to_p X$, it is possible to find a subsequence $X_{n_k} \to X$ a.s., $k = 1, 2, \ldots$. This solidifies our intuition that convergence in probability differs from convergence almost surely only as a result of a few "freak outcomes".
Proof: Assume we have $X_{n_k}$. Pick $n_{k+1}$ to be the smallest $n > n_k$ such that
  $\mathbb{P}\left(|X_n - X| \ge \tfrac{1}{k}\right) \le \tfrac{1}{2^k}$
(This is always possible by the definition of convergence in probability.) Since $\sum_k 2^{-k} < \infty$, the almost sure convergence of $X_{n_k}$ follows by the BC Lemma. ∎
o Example: Take $\Omega = [0,1]$ and $\mathbb{P}$ = uniform distribution. We then define random variables $X_1$ and $X_2$ as follows:
  $X_1(\omega) = \begin{cases} 1 & \omega \in [0, \tfrac{1}{2}) \\ 0 & \text{otherwise} \end{cases} \qquad X_2(\omega) = \begin{cases} 0 & \omega \in [0, \tfrac{1}{2}) \\ 1 & \text{otherwise} \end{cases}$
[Figure: graphs of $X_1$ and $X_2$ – indicators of the two halves of [0,1]]
Similarly, we define $X_3$, $X_4$ and $X_5$ to be random variables divided into three parts – the indicators of $[0, \tfrac{1}{3})$, $[\tfrac{1}{3}, \tfrac{2}{3})$ and $[\tfrac{2}{3}, 1]$:
[Figure: graphs of $X_3$, $X_4$ and $X_5$ – indicators of the three thirds of [0,1]]
We continue this pattern to form a "pyramid" of random variables.
Now, consider $\varepsilon \in (0,1)$ – since $X_n$ is equal to 0 or 1, saying $X_n > \varepsilon$ is the same as saying $X_n = 1$. Now, consider that $\mathbb{P}(X_1 = 1) = \mathbb{P}(X_2 = 1) = \tfrac{1}{2}$. Similarly, for random variables divided into three parts, the probability is $\tfrac{1}{3}$, etc. As such
  $\mathbb{P}(X_n > \varepsilon) \to 0 \quad \text{as } n \to \infty$
And so $X_n \to_p 0$. However, if we fix any $\omega \in [0,1]$, $X_n(\omega) = 1$ infinitely often (because every row of random variables will involve at least one that is equal to 1 at that $\omega$). Thus, $X_n$ does not tend to 0 almost surely.
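The pyramid above can be simulated directly. A sketch (the 100-row cutoff and the choice $\omega = 0.7$ are arbitrary illustrative values): row k holds the k indicators of $[j/k, (j+1)/k)$, so exactly one indicator per row fires at any fixed $\omega$, while the probability of firing shrinks like $1/k$.

```python
# Sketch of the "pyramid"/sliding-indicator sequence. For a fixed omega,
# X_n(omega) = 1 once per row (so infinitely often as rows grow), even
# though P(X_n = 1) = 1/k -> 0 along the rows: convergence in probability
# without almost-sure convergence.
omega = 0.7                      # an arbitrary fixed outcome in [0, 1)
hits = 0
for k in range(1, 101):          # rows 1 through 100 of the pyramid
    for j in range(k):
        if j / k <= omega < (j + 1) / k:
            hits += 1            # this row's indicator fired at omega

print(hits)                      # exactly one hit per row
```

The count equals the number of rows: no matter how far down the pyramid we go, the deviation event keeps recurring for this (and every) $\omega$.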
o Definition (convergence in expectation) – also called $L_1$ convergence. We say that $X_n \to_{L_1} X$ as $n \to \infty$ if
  $\mathbb{E}|X_n - X| \to 0 \quad \text{as } n \to \infty$
Similarly, $X_n \to_{L_p} X$ if
  $\mathbb{E}|X_n - X|^p \to 0 \quad \text{as } n \to \infty$
o Claim: $X_n \to_{L_1} X$ implies that $\mathbb{E}|X_n| \to \mathbb{E}|X|$ and $\mathbb{E}(X_n) \to \mathbb{E}(X)$.
Proof: For the first part, consider that
  $|X_n| = |X_n - X + X| \le |X_n - X| + |X|$
  $|X_n| - |X| \le |X_n - X|$
and symmetrically with the roles of $X_n$ and $X$ reversed, so $\big| |X_n| - |X| \big| \le |X_n - X|$. Taking expectations yields the result. For the second part, note that
  $\left|\mathbb{E}(X_n) - \mathbb{E}(X)\right| = \left|\mathbb{E}(X_n - X)\right| \le \mathbb{E}|X_n - X|$
where the last step follows by Jensen's Inequality. Taking limits yields the answer required. ∎
o Theorem (Markov's Inequality): Let $X \ge 0$ have finite expectation. Then for all $a > 0$,
  $\mathbb{P}(X > a) \le \frac{\mathbb{E}(X)}{a}$
Proof:
  $\mathbb{E}(X) = \mathbb{E}(X ; X > a) + \mathbb{E}(X ; X \le a) \ge \mathbb{E}(X ; X > a) \ge a\,\mathbb{P}(X > a)$
As required. ∎
o Claim: If $X_n \to_{L_1} X$ then $X_n \to_p X$.
Proof: Fix an $\varepsilon > 0$ and use Markov's Inequality:
  $\mathbb{P}\left(|X_n - X| > \varepsilon\right) \le \frac{\mathbb{E}|X_n - X|}{\varepsilon} \to 0$
As required. ∎
o Example: Consider $X_n = n\,\mathbb{1}\{U \le \frac{1}{n}\}$, where U is U[0,1]. Clearly, $X_n \to 0$ a.s. (Note that this does not hold true for $\omega = 0$. However, this set has measure 0, and so the convergence is still almost sure.) However, it is clear that
  $\mathbb{E}(X_n) = n\,\mathbb{P}\left(U \le \tfrac{1}{n}\right) = 1$
Clearly, therefore, almost-sure convergence does not imply $L_1$ convergence. (This illustrates an interesting example where taking limits inside and outside expectations gives a different result. Indeed, $1 = \lim_n \mathbb{E}(X_n) \ne \mathbb{E}(\lim_n X_n) = 0$.)
o Example: Let $X_n$ be a sequence of IID random variables with exponential distribution with parameter 1. We will prove that
  $\limsup_n \frac{X_n}{\log n} = 1 \quad \text{a.s.}$
Intuitively, this implies that if we simulate a certain number of exponentially distributed random variables and plot them against their index, and then plot the curve $\log n$, an infinite number of these points will lie arbitrarily close to the curve! This is quite shocking, since one might expect very few very large values!
Proof: Fix $\varepsilon > 0$.
Consider
  $\mathbb{P}\left(X_n > (1+\varepsilon)\log n\right) = e^{-(1+\varepsilon)\log n} = \frac{1}{n^{1+\varepsilon}}$
This means that
  $\sum_n \mathbb{P}\left(X_n > (1+\varepsilon)\log n\right) < \infty$
So by BC-1, $\mathbb{P}\left(X_n > (1+\varepsilon)\log n \text{ i.o.}\right) = 0$. As such, $X_n \le (1+\varepsilon)\log n$ ev. almost surely, and therefore
  $\limsup_n \frac{X_n}{\log n} \le 1 + \varepsilon$
By the same reasoning,
  $\mathbb{P}\left(X_n > (1-\varepsilon)\log n\right) = \frac{1}{n^{1-\varepsilon}} \qquad \sum_n \mathbb{P}\left(X_n > (1-\varepsilon)\log n\right) = \infty$
So by BC-2 and the independence of the $X_n$, $\mathbb{P}\left(X_n > (1-\varepsilon)\log n \text{ i.o.}\right) = 1$, and
  $\limsup_n \frac{X_n}{\log n} \ge 1 - \varepsilon \quad \text{a.s.}$
Since $\varepsilon$ was arbitrary, the two results together imply our result. ∎
Note: This only applies to the limsup – to say that the actual sequence tends to 1 is nonsensical, because most of the $X_n$ will indeed be very small!

LECTURE 2 – 26th January 2011
Interchange arguments
o Example: In our last lecture, we considered the sequence of random variables $X_n = n\,\mathbb{1}\{U \le \frac{1}{n}\}$. We argued that $X_n \to 0$ almost surely (as $n \to \infty$). We showed, however, that $\mathbb{E}(X_n) = 1$ for all n. We might wonder whether it is generally true that
  $\mathbb{E}\left(\lim_n X_n\right) \stackrel{?}{=} \lim_n \mathbb{E}(X_n)$
Clearly, this is not the case here. It looks like certain conditions need to be satisfied.
o Theorem (BDD – Bounded Convergence): Let $\{X_n\}$ be a sequence of random variables such that $|X_n| \le K < \infty$ (where K is a deterministic constant) and $X_n \to X$ almost surely. Then $\mathbb{E}(X_n) \to \mathbb{E}(X)$. (Clearly, in the example above, the n that sits outside the indicator prevents us from bounding $X_n$.)
Proof: For any given $\varepsilon > 0$, let
  $A_n = \{\omega \in \Omega : |X_n(\omega) - X(\omega)| < \varepsilon\}$
Let $B_n = \Omega - A_n$. Because $X_n \to X$ a.s., $\mathbb{P}(B_n) \to 0$. Thus, using Jensen's inequality,
  $\left|\mathbb{E}(X_n) - \mathbb{E}(X)\right| \le \mathbb{E}|X_n - X| = \int_{A_n} |X_n - X|\,\mathbb{P}(\mathrm{d}\omega) + \int_{B_n} |X_n - X|\,\mathbb{P}(\mathrm{d}\omega) \le \varepsilon\,\mathbb{P}(A_n) + 2K\,\mathbb{P}(B_n) \to 0$
In the last line, we used the fact that for all outcomes in $A_n$, the absolute difference is less than $\varepsilon$ (by definition) and that for all outcomes in $B_n$, the absolute difference is at most $2K$ (because $|X_n| \le K$ and $|X| = \lim_n |X_n| \le K$). ∎
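The counterexample driving this section can be seen numerically. A sketch (not from the lecture; the sample size and the value of n are arbitrary): across many independent "experiments", $X_n = n\,\mathbb{1}\{U \le 1/n\}$ is 0 in nearly all of them, yet its average stays near 1 – it is precisely the unbounded factor n that defeats the interchange.

```python
import numpy as np

# Sketch: X_n = n * 1{U <= 1/n}, one draw of U per simulated experiment.
# X_n -> 0 a.s. (for each U > 0, X_n = 0 once n > 1/U), but E(X_n) = 1.
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)    # one U per experiment

n = 1000
x_n = n * (u <= 1 / n)           # realisations of X_n across experiments

print(x_n.mean())                # Monte Carlo estimate of E(X_n), near 1
print(np.mean(x_n == 0))         # fraction of experiments with X_n = 0, near 1
```

The rare experiments where $X_n \ne 0$ carry value n, which exactly balances their probability $1/n$ – the mass "escapes to infinity" rather than vanishing.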
Remark: It is clear that this theorem still holds if $X_n$ only converges in probability to $X$ rather than almost surely, because all that is required is for the probability of rogue events to fall to 0. The fact that $|X| \le K$ can be deduced from the fact that there is a subsequence $X_{n_k}$ that converges to $X$ almost surely (see above).
o Lemma (Fatou): If $\{X_n\}$ is a sequence of non-negative random variables, then
  $\mathbb{E}\left(\liminf_n X_n\right) \le \liminf_n \mathbb{E}(X_n)$
(Note: there is no converse for the limsup.)
Proof: Let $Y_m = \inf_{n \ge m} X_n$ and $Y = \lim_m Y_m = \liminf_n X_n$. Note that:
  . $Y_m \uparrow Y$ a.s.
  . $\inf_{n \ge m} \mathbb{E}(X_n) \ge \mathbb{E}\left(\inf_{n \ge m} X_n\right) = \mathbb{E}(Y_m)$. To see why, consider the $X_n$ ($n \ge m$) that has lowest expectation – call it $X_{\min}$. The LHS then has value $\int X_{\min}(\omega)\,\mathbb{P}(\mathrm{d}\omega)$. The RHS, however, has value $\int \inf_{n \ge m} X_n(\omega)\,\mathbb{P}(\mathrm{d}\omega)$, whose integrand is pointwise no larger, so the RHS is smaller than or equal to the LHS.
Now, pick some k:
  $\liminf_n \mathbb{E}(X_n) \ge \lim_{m \to \infty} \mathbb{E}(Y_m) \ge \lim_{m \to \infty} \mathbb{E}\left(\min[Y_m, k]\right)$
The absolute value of the variable whose expectation we are taking is now bounded by k, because the $X_n$ are non-negative, and so $Y_m$ is non-negative. So we can apply bounded convergence:
  $\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}\left(\lim_m \min[Y_m, k]\right) = \mathbb{E}\left(\min[Y, k]\right)$
As $k \to \infty$, the last line tends to $\mathbb{E}(Y) = \mathbb{E}(\liminf_n X_n)$. ∎
o Theorem (DOM – Dominated Convergence): Let $\{X_n\}$ be a sequence of random variables such that $|X_n| \le Y$ with $\mathbb{E}(Y) < \infty$, and $X_n \to X$ almost surely. Then $\mathbb{E}(X_n) \to \mathbb{E}(X)$.
Proof: Since $|X_n| \le Y$, we have that $Y + X_n \ge 0$ and $Y - X_n \ge 0$. Applying Fatou's Lemma to both variables, we obtain
  $\liminf_n \mathbb{E}(Y + X_n) \ge \mathbb{E}(Y + X) \qquad \liminf_n \mathbb{E}(Y - X_n) \ge \mathbb{E}(Y - X)$
Since Y has finite expectation, we can subtract $\mathbb{E}(Y)$ from both sides above, and obtain
  $\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}(X) \qquad \limsup_n \mathbb{E}(X_n) \le \mathbb{E}(X)$
Together, the last two statements imply that $\lim_n \mathbb{E}(X_n) = \mathbb{E}(X)$. ∎
o Theorem (MON – Monotone Convergence): Let $\{X_n\}$ be a sequence of random variables such that $0 \le X_1 \le X_2 \le \cdots$ almost surely. Then
  . $X_n \uparrow X$ (possibly $\infty$) almost surely.
  . $\mathbb{E}(X_n) \uparrow \mathbb{E}(X)$ (possibly $\infty$).
Proof: Since $X_n \ge 0$, we can apply Fatou's Lemma to get
  $\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}\left(\liminf_n X_n\right) = \mathbb{E}(X)$
However, the fact that $X_n \le X$ also gives $\mathbb{E}(X_n) \le \mathbb{E}(X)$, so $\limsup_n \mathbb{E}(X_n) \le \mathbb{E}(X)$. These two statements together imply that $\lim_n \mathbb{E}(X_n) = \mathbb{E}(X)$. ∎
o Proposition (Fubini I): Let $\{X_n\}$ be a sequence of random variables with $X_n \ge 0$ for all n. Then
  $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$
Proof: Define
  $S_n = \sum_{m=1}^n X_m \qquad S = \sum_{m=1}^\infty X_m \quad \text{(possibly } \infty\text{)}$
(The last limit exists because the $X_n$ are non-negative, which means the sequence $\{S_n\}$ is non-decreasing almost surely – there is no outcome in the probability space in which adding an extra term to the sum decreases it.) By monotone convergence, we have that $\mathbb{E}(S_n) \to \mathbb{E}(S)$. However, for finite n,
  $\mathbb{E}(S_n) = \mathbb{E}\left(\sum_{m=1}^n X_m\right) = \sum_{m=1}^n \mathbb{E}(X_m)$
And
  $\mathbb{E}(S) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
Combining these two results yields
  $\sum_{m=1}^\infty \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
As required. ∎
o Proposition (Fubini II): Let $\{X_n\}$ be a sequence of random variables with $\mathbb{E}\left(\sum_n |X_n|\right) < \infty$. Then
  $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$
Proof: As above, set $S_n = \sum_{m=1}^n X_m$. By the triangle inequality, we have
  $|S_n| = \left|\sum_{m=1}^n X_m\right| \le \sum_{m=1}^n |X_m| \le \underbrace{\sum_{m=1}^\infty |X_m|}_{Y} < \infty \quad \text{a.s.}$
(The last inequality follows because the statement of the Theorem includes the fact that the expectation of the infinite sum is finite. If the sum were infinite on a set of non-zero measure, the expectation would blow up. Thus, the sum must be finite almost surely.) Now, note that
  $|S - S_n| = \left|\sum_{m=1}^\infty X_m - \sum_{m=1}^n X_m\right| = \left|\sum_{m=n+1}^\infty X_m\right| \le \sum_{m=n+1}^\infty |X_m| \to 0 \quad \text{a.s. as } n \to \infty$
(The last line follows from the fact that the sum is finite almost surely. Thus, if we continually whittle away at the sum from below, it will eventually shrink to 0 almost surely.) The above implies that $S_n \to S = \sum_{m=1}^\infty X_m$ almost surely. We can now apply dominated convergence, with Y as the dominating variable.
First note that, for finite n,
  $\sum_{m=1}^n \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^n X_m\right) = \mathbb{E}(S_n)$
Now take limits of both sides:
  $\sum_{m=1}^\infty \mathbb{E}(X_m) = \lim_n \mathbb{E}(S_n)$
By dominated convergence, we have
  $\lim_n \mathbb{E}(S_n) = \mathbb{E}\left(\lim_n S_n\right) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
Combining the last two equations, we do indeed find that
  $\sum_{m=1}^\infty \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
As required. ∎
o We have, so far, looked at sufficient conditions for interchange. Is there a "common thread" that runs through these conditions?
o Definition (Uniform Integrability): A sequence of random variables $\{X_n\}$ is said to be uniformly integrable (u.i.) if for all $\varepsilon > 0$, there exists a $K(\varepsilon) < \infty$ such that
  $\sup_n \mathbb{E}\left(|X_n| ; |X_n| > K(\varepsilon)\right) \equiv \sup_n \mathbb{E}\left(|X_n|\,\mathbb{1}\{|X_n| > K(\varepsilon)\}\right) \le \varepsilon$
Note that the supremum implies that the K we find must work uniformly for the entire sequence – in other words, for a given $\varepsilon$, there must be a single $K(\varepsilon)$ which makes the tail expectation small for every n. Requiring this to be true for a single $X_n$ is actually trivial, provided the expectation is finite, because a finite expectation implies that the tail eventually "dies out".
o Proposition: Let $\{X_n\}$ be a sequence of random variables and suppose $X_n \to X$ almost surely.
  . If $\{X_n\}$ are uniformly integrable, then $\mathbb{E}(X_n) \to \mathbb{E}(X)$. In fact, a stronger result applies – that $\mathbb{E}|X_n - X| \to 0$.
  . If $X_n \ge 0$ for all n and $\mathbb{E}(X_n) \to \mathbb{E}(X) < \infty$, then $\{X_n\}$ are uniformly integrable.
In the absence of the non-negativity requirement in the second statement, this would be an "if and only if" statement. We have almost identified the "bare minimum" condition for interchange.
Proof:
  . Part 1: First, let us verify that X is integrable:
  $\mathbb{E}|X| = \mathbb{E}\left(\liminf_n |X_n|\right) \stackrel{\text{Fatou}}{\le} \liminf_n \mathbb{E}|X_n| \le \sup_n \mathbb{E}|X_n| = \sup_n \left[\mathbb{E}\left(|X_n| ; |X_n| \le a\right) + \mathbb{E}\left(|X_n| ; |X_n| > a\right)\right] \le a + \varepsilon < \infty$
where the last step uses the fact that the $\{X_n\}$ are u.i. Now, let $Y_n = |X_n - X|$. We then have $0 \le Y_n \le |X_n| + |X|$. Hence, $\{Y_n\}$ is also uniformly integrable (since $\{X_n\}$ is uniformly integrable, and X is integrable).
Write
  $\mathbb{E}(Y_n) = \mathbb{E}\left(Y_n ; Y_n > a\right) + \mathbb{E}\left(Y_n ; Y_n \le a\right) \le \varepsilon + \mathbb{E}\left(Y_n\,\mathbb{1}\{Y_n \le a\}\right)$
Now, note that $Y_n\,\mathbb{1}\{Y_n \le a\} \le a$, so by bounded convergence – and since $Y_n\,\mathbb{1}\{Y_n \le a\} \to 0$ a.s. because $X_n \to X$ a.s. – we have that $\mathbb{E}\left(Y_n\,\mathbb{1}\{Y_n \le a\}\right) \to 0$. Thus, $\mathbb{E}(Y_n) \to 0$, as required.
  . Part 2: Given $a > 1$, define
  $f_a(x) = \begin{cases} x & x \in [0, a-1] \\ \text{(line connecting } a-1 \text{ down to } 0\text{)} & x \in [a-1, a] \\ 0 & x > a \end{cases}$
Clearly, $f_a$ is continuous, and $x\,\mathbb{1}\{x \le a-1\} \le f_a(x) \le x\,\mathbb{1}\{x \le a\}$. Now
  $\mathbb{E}\left(X_n ; X_n > a\right) = \mathbb{E}(X_n) - \mathbb{E}\left(X_n ; X_n \le a\right) \le \mathbb{E}(X_n) - \mathbb{E}\left[f_a(X_n)\right]$
Now, $f_a$ is clearly bounded, and $X_n \to X$. Thus, applying bounded convergence, $\mathbb{E}[f_a(X_n)] \to \mathbb{E}[f_a(X)]$. Furthermore, by the statement of the theorem, we have $\mathbb{E}(X_n) \to \mathbb{E}(X)$ (the $X_n$ being non-negative). Thus, the RHS above can be made arbitrarily close to the difference between these two expectations. In other words, there exists an $n_0$ such that for all $n > n_0$,
  $\mathbb{E}\left(X_n ; X_n > a\right) \le \mathbb{E}(X) - \mathbb{E}\left[f_a(X)\right] + \varepsilon \le \mathbb{E}(X) - \mathbb{E}\left(X ; X \le a-1\right) + \varepsilon = \mathbb{E}\left(X ; X > a-1\right) + \varepsilon$
and the first term can be made small by taking a large, since X is integrable. ∎
o Proposition (Sufficient conditions for u.i.):
  . If $\sup_n \mathbb{E}|X_n|^p < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.
  . If $|X_n| \le Y_n$ and $\{Y_n\}$ is uniformly integrable, then $\{X_n\}$ is uniformly integrable.
  . If $\{X_n\}$ and $\{Y_n\}$ are uniformly integrable, then $\{X_n + Y_n\}$ is uniformly integrable.
Proof (part 1): We will prove the first part of the statement above. To do that, however, we will need to use Holder's Inequality.
Proposition (Holder's Inequality): Let X and Y be random variables such that $\mathbb{E}|X|^p < \infty$ and $\mathbb{E}|Y|^q < \infty$. Pick $p, q > 1$ such that $p^{-1} + q^{-1} = 1$. Then
  $\mathbb{E}|XY| \le \left(\mathbb{E}|X|^p\right)^{1/p}\left(\mathbb{E}|Y|^q\right)^{1/q}$
When $p = q = 2$, this inequality is none other than the Cauchy-Schwarz inequality:
  $\mathbb{E}|XY| \le \sqrt{\mathbb{E}(X^2)\,\mathbb{E}(Y^2)}$
Proof: If $X = 0$ or $Y = 0$ a.s., the inequality holds trivially. As such, suppose $x = \left(\mathbb{E}|X|^p\right)^{1/p} > 0$ and $y = \left(\mathbb{E}|Y|^q\right)^{1/q} > 0$.
Suppose we are able to show that, for every point $\omega$ in the sample space,
  $\frac{|X(\omega)|}{x} \cdot \frac{|Y(\omega)|}{y} \le \frac{1}{p}\frac{|X(\omega)|^p}{x^p} + \frac{1}{q}\frac{|Y(\omega)|^q}{y^q}$
Then, taking expectations and rearranging, this becomes Holder's Inequality. To show this statement is true, we write $a = |X(\omega)|/x$ and $b = |Y(\omega)|/y$. The statement then becomes
  $ab \le \frac{a^p}{p} + \frac{b^q}{q}$
with a and b non-negative. We can therefore write $a = e^{s/p}$ and $b = e^{t/q}$. The expression then follows trivially from the convexity of the exponential function:
  $\exp\left(\frac{s}{p} + \frac{t}{q}\right) \le \frac{e^s}{p} + \frac{e^t}{q}$
We have therefore proved Holder's Inequality. ∎ We will also require the following result:
Lemma: If $p > 1$, then $\mathbb{E}|X|^p < \infty \Rightarrow \mathbb{E}|X|^q < \infty$ for $1 < q < p$.
Proof:
  $\mathbb{E}|X|^q = \mathbb{E}\left(|X|^q ; |X| < 1\right) + \mathbb{E}\left(|X|^q ; |X| \ge 1\right) \le 1 + \mathbb{E}\left(|X|^p ; |X| \ge 1\right) < \infty$
As required. ∎
Now, fix $K < \infty$ and $n \ge 1$, and choose p and q such that $p^{-1} + q^{-1} = 1$. We then use Holder's Inequality to obtain
  $\mathbb{E}\left(|X_n| ; |X_n| > K\right) = \mathbb{E}\left(|X_n|\,\mathbb{1}\{|X_n| > K\}\right) \le \left(\mathbb{E}|X_n|^p\right)^{1/p}\left(\mathbb{E}\,\mathbb{1}\{|X_n| > K\}\right)^{1/q} = \left(\mathbb{E}|X_n|^p\right)^{1/p}\mathbb{P}\left(|X_n| > K\right)^{1/q}$
Taking a supremum,
  $\mathbb{E}\left(|X_n| ; |X_n| > K\right) \le \left(\sup_n \mathbb{E}|X_n|^p\right)^{1/p}\mathbb{P}\left(|X_n| > K\right)^{1/q}$
By the assumption in the theorem, the first expression is finite. Let its value be C:
  $\mathbb{E}\left(|X_n| ; |X_n| > K\right) \le C\,\mathbb{P}\left(|X_n| > K\right)^{1/q}$
Using Markov's Inequality,
  $\mathbb{P}\left(|X_n| > K\right)^{1/q} \le \left(\frac{\mathbb{E}|X_n|}{K}\right)^{1/q} \le \left(\frac{\sup_n \mathbb{E}|X_n|}{K}\right)^{1/q}$
Now, it is clear that the first moment of $X_n$ is finite uniformly in n, because we already know a higher moment has finite expectation (by the Lemma above). Writing $C' = (\sup_n \mathbb{E}|X_n|)^{1/q}$,
  $\mathbb{E}\left(|X_n| ; |X_n| > K\right) \le \frac{C\,C'}{K^{1/q}}$
The RHS now no longer depends on n, so we can write
  $\sup_n \mathbb{E}\left(|X_n| ; |X_n| > K\right) \le \frac{C\,C'}{K^{1/q}}$
For any $\varepsilon$, provided we choose $K \ge (C\,C'/\varepsilon)^q$, our condition for uniform integrability is satisfied. ∎
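Holder's inequality can be sanity-checked on simulated data. A sketch (the distributions and exponents below are arbitrary choices, not from the lecture): applied to the empirical measure of a sample, the inequality holds exactly, not just approximately.

```python
import numpy as np

# Sketch: empirical check of E|XY| <= (E|X|^p)^{1/p} (E|Y|^q)^{1/q}
# with conjugate exponents 1/p + 1/q = 1, on arbitrary simulated data.
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = rng.exponential(size=100_000)

p, q = 3.0, 1.5                  # conjugate: 1/3 + 2/3 = 1
lhs = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)

print(lhs, rhs)
assert lhs <= rhs   # Holder holds exactly for the empirical distribution
```

This works because a sample average is itself an expectation under the empirical (uniform-on-the-sample) distribution, and the proof above applies to any probability measure.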
Proof (part 1 – alternative): We propose an alternative, somewhat simpler proof of part 1:
  $\mathbb{E}\left(|X_n| ; |X_n| > a\right) \le \mathbb{E}\left(|X_n|\left(\frac{|X_n|}{a}\right)^{p-1} ; |X_n| > a\right) = \frac{1}{a^{p-1}}\mathbb{E}\left(|X_n|^p ; |X_n| > a\right) \le \frac{K}{a^{p-1}}$
where $K = \sup_n \mathbb{E}|X_n|^p$. Setting $a \ge (K/\varepsilon)^{1/(p-1)}$ gives the required result. ∎

Kolmogorov's 3-Series Theorem
o Let $\{X_n\}$ be a sequence of random variables, and let
  $S_n = \sum_{m=1}^n X_m$
We now consider the question of when $S_n$ converges. What condition is needed on the distribution of the $X_n$ for this to happen? Clearly, if the $X_n$ are IID, this will not generally be the case, since every additional variable adds something to the sum – IID summands are either "dead at the start" or "never die out".
o Theorem (Kolmogorov 3-series): Let $\{X_n\}$ be a sequence of independent random variables. Then $\sum_n X_n$ converges if and only if for some (and therefore every) $K > 0$, the following conditions hold:
  . $\sum_n \mathbb{P}\left(|X_n| > K\right) < \infty$ (clearly, for example, if the $X_n$ are IID without bounded support, we'll never be able to satisfy this). This is a statement that the "mass in the tails" must decrease with n.
  . $\sum_n \mathbb{E}\left(X_n ; |X_n| \le K\right)$ converges.
  . $\sum_n \mathrm{Var}\left(X_n ; |X_n| \le K\right) < \infty$
Note that there are no issues of existence of the expected value and the variance in the last two points because these are taken over a finite range – namely on $X_n \in [-K, K]$.
Proof: There is a simple proof of this result which relies on martingales. We will cover it later in this course.
o Proposition (2nd version of 3-series theorem): Let $\{X_n\}$ be independent random variables with $\mathbb{E}(X_n) = 0$ for all n. If $\sum_n \mathrm{Var}(X_n) < \infty$, then $S_n = \sum_{m=1}^n X_m$ converges almost surely.
Proof: Again, using martingale theory.
o Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID exponential random variables with mean 1. Does the sum $S_n = \sum_{m=1}^n X_m$ converge?
The deterministic series $\sum_n 1/n$ does not converge, and we wonder whether the exponential variables will be close enough to 0 often enough to "fix" that. In light of the result we derived in the previous lecture (that draws from an exponential will be arbitrarily large infinitely often), one might expect this series not to converge. Let us check the conditions formally.
  . First condition:
  $\mathbb{P}\left(|X_n| > K\right) = \mathbb{P}\left(\frac{Y_n}{n} > K\right) = \mathbb{P}\left(Y_n > nK\right) = e^{-nK}$
This does indeed sum to a finite number:
  $\sum_n \mathbb{P}\left(|X_n| > K\right) < \infty$
The first condition is therefore met.
  . Second condition:
  $\mathbb{E}\left(X_n ; |X_n| \le K\right) = \mathbb{E}\left(\frac{Y_n}{n} ; Y_n \le nK\right) = \frac{1}{n}\mathbb{E}\left(Y_n ; Y_n \le nK\right)$
But we note that
  $\mathbb{E}\left(Y_1 ; Y_1 \le s\right) = \int_0^s y e^{-y}\,\mathrm{d}y > 0 \quad \forall s > 0$
It is clear, therefore, that condition 2 is violated, because the terms behave like a positive constant times $1/n$, and the sum of $1/n$ diverges. So, as we expected, $S_n$ does not converge almost surely.
o Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID random variables with
  $\mathbb{P}(Y_n = 1) = p = 1 - \mathbb{P}(Y_n = -1)$
Once again, we wonder whether $S_n = \sum_{m=1}^n X_m$ converges. The intuition here is that the deterministic series $\sum_n (-1)^n / n$ does converge. Our version is this sum, but instead of deterministically flipping between positive and negative, we randomly switch between positive and negative. We wonder whether this will "spoil" the convergence. We fix $K > 0$ (say K = 1) and test the three conditions:
  . First condition:
  $\mathbb{P}\left(|X_n| > 1\right) = 0$
So clearly, this sum does converge.
  . Second condition:
  $\mathbb{E}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n}\mathbb{E}\left(Y_n ; |Y_n| \le n\right) = \frac{1}{n}\mathbb{E}(Y_n) = \frac{2p - 1}{n}$
We know, however, that $\sum_n \frac{1}{n}$ diverges, so the infinite sum of these expectations only converges if p = ½, in which case the numerator is 0. We therefore restrict our attention to that case.
  . Third condition (assuming p = ½):
  $\mathrm{Var}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n^2}\mathrm{Var}\left(Y_n ; |Y_n| \le n\right) = \frac{1}{n^2}\mathrm{Var}(Y_n) = \frac{\mathbb{E}(Y_n^2) - 0}{n^2} = \frac{1}{n^2}$
The infinite sum of the variances is therefore finite. As such, our sum does converge.
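The contrast between the two examples can be seen by simulating partial sums. A sketch (the seed and series length are arbitrary): random $\pm 1$ numerators over n settle down to a limit, while exponential numerators over n keep growing.

```python
import numpy as np

# Sketch: partial sums of sum_n Y_n/n for two choices of IID numerators.
# Signs +/-1 (p = 1/2): the 3-series theorem says the sum converges a.s.
# Exponential(1): the second 3-series condition fails, so it diverges.
rng = np.random.default_rng(2)
N = 200_000
inv_n = 1.0 / np.arange(1, N + 1)

signs = rng.choice([-1.0, 1.0], size=N)
expos = rng.exponential(size=N)

s_signs = np.cumsum(signs * inv_n)   # fluctuates, then settles down
s_expos = np.cumsum(expos * inv_n)   # keeps growing, roughly like log n

print(s_signs[-1], s_expos[-1])
print(abs(s_signs[-1] - s_signs[N // 2]))  # late partial sums barely move
```

The last line makes the almost-sure convergence visible: between the halfway point and the end, the $\pm 1$ partial sum moves only by a tiny amount, while the exponential partial sum has grown by roughly $\log 2$ on average.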
o Example: Set $X_n = Y_n / \sqrt{n}$, where the $Y_n$ are IID random variables with probability ½ of being 1 and probability ½ of being -1. Again, the deterministic sum $\sum_n (-1)^n / \sqrt{n}$ converges, and we wonder whether adding randomness will make a difference. Since, in this case, $\mathbb{E}(X_n) = 0$, we can simply apply the second form of the 3-series theorem:
  $\mathrm{Var}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n}\mathrm{Var}(Y_n) = \frac{\mathbb{E}(Y_n^2)}{n} = \frac{1}{n}$
The infinite sum of these variances clearly diverges, and so $S_n$ does not converge. It is interesting to note how sensitive this result is to scaling – in the $1/n$ case, the convergence in the deterministic case carries over to the random case, whereas in the $1/\sqrt{n}$ case, it does not.

The Strong Law of Large Numbers
o Theorem (SLLN): Let $\{X_n\}$ be IID with $\mathbb{E}|X_1| < \infty$. Then
  $\frac{1}{n}\sum_{k=1}^n X_k \to \mathbb{E}(X_1) \quad \text{a.s.}$
Note: there is a simpler proof of this theorem that assumes the fourth moment of $X_1$ is bounded. We prove the more general theorem that does not require this stipulation.
Proof: Let us fix n.
  . Step 1 – Truncation: Let
  $Y_n = \begin{cases} X_n & |X_n| \le n \\ 0 & \text{otherwise} \end{cases}$
In a sense, $\{Y_n\}$ is a "truncated" form of $\{X_n\}$. We will first prove the result for the truncated form, and later show that the result then carries over to the non-truncated form.
  . Step 2: Write $Y_n$ as
  $Y_n = \underbrace{\left[Y_n - \mathbb{E}(Y_n)\right]}_{w_n} + \mathbb{E}(Y_n)$
Now, note that since $Y_n$ only contains values of $X_n$ that are no larger than n in absolute value,
  $\mathbb{E}(Y_n) = \mathbb{E}\left(X_n ; |X_n| \le n\right) = \mathbb{E}\left(X_1\,\mathbb{1}\{|X_1| \le n\}\right)$
(using the fact that the $X_n$ are identically distributed). We will now attempt to evaluate this expectation using dominated convergence. Note that
1. $\left|X_1\,\mathbb{1}\{|X_1| \le n\}\right| \le |X_1|$
2. Since $X_1$ has finite mean, it is the case that $X_1\,\mathbb{1}\{|X_1| \le n\} \to X_1$ a.s.
(1) justifies using dominated convergence, and (2) gives us the limit. We get
  $\mathbb{E}\left(X_1\,\mathbb{1}\{|X_1| \le n\}\right) \to \mathbb{E}(X_1), \quad \text{i.e. } \mathbb{E}(Y_n) \to \mathbb{E}(X_1)$
  . Step 3: We have shown that as $n \to \infty$, $Y_n$ can be written as $Y_n = w_n + \mathbb{E}(Y_n)$ with $\mathbb{E}(Y_n) \to \mathbb{E}(X_1)$, or in other words that
  $\frac{1}{n}\sum_{k=1}^n Y_k = \frac{1}{n}\sum_{k=1}^n w_k + \frac{1}{n}\sum_{k=1}^n \mathbb{E}(Y_k)$
The obvious next step is to show that the first term above tends to 0 almost surely. To do this, we will need two lemmas.
Lemma (Cesaro Sum Property): Let $\{a_n\}$ be a real-valued sequence with $a_n \to a_\infty$. Then
  $\frac{1}{n}\sum_{k=1}^n a_k \to a_\infty \quad \text{as } n \to \infty$
Proof: Let $\varepsilon > 0$, and choose an N such that $a_k > a_\infty - \varepsilon$ whenever $k \ge N$. Then
  $\liminf_n \frac{1}{n}\sum_{k=1}^n a_k \ge \liminf_n \left\{\frac{1}{n}\sum_{k=1}^N a_k + \frac{n - N}{n}\left(a_\infty - \varepsilon\right)\right\} \ge 0 + a_\infty - \varepsilon$
By a similar argument, $\limsup_n \frac{1}{n}\sum_{k=1}^n a_k \le a_\infty$. The result follows. ∎
Lemma (Kronecker's Lemma): Let $\{a_n\}$ be a real-valued sequence. Then
  $\sum_{k=1}^\infty \frac{a_k}{k} \text{ converges} \quad \Rightarrow \quad \frac{1}{n}\sum_{k=1}^n a_k \to 0$
Proof: Let $u_n = \sum_{k \le n} \frac{a_k}{k}$. By the assumption in our proof, $u_\infty = \lim_n u_n$ exists. Then
  $\frac{a_n}{n} = u_n - u_{n-1}$
We then have
  $\sum_{k=1}^n a_k = \sum_{k=1}^n k\left(u_k - u_{k-1}\right) = n u_n - \sum_{k=1}^{n-1} u_k$
As such,
  $\frac{1}{n}\sum_{k=1}^n a_k = u_n - \frac{1}{n}\sum_{k=1}^{n-1} u_k$
The first term clearly tends to $u_\infty$ as $n \to \infty$. By the Cesaro Sum Property, so does the second term. Thus
  $\frac{1}{n}\sum_{k=1}^n a_k \to u_\infty - u_\infty = 0$
As required. ∎
Using these two lemmas, our (rough) strategy will be as follows:
  Step 3a: Show that $\sum_{k=1}^n \frac{w_k}{k}$ converges almost surely, using the Kolmogorov 3-series theorem.
  Step 3b: Use Kronecker's Lemma to deduce that for every outcome in the sample space for which $\sum_{k=1}^n \frac{w_k}{k}$ converges, $\frac{1}{n}\sum_{k=1}^n w_k$ converges to 0.
  Step 3c: Note that since $\sum_{k=1}^n \frac{w_k}{k}$ converges almost surely, $\frac{1}{n}\sum_{k=1}^n w_k$ converges to 0 almost surely.
Step 3a: Note that since $w_k = Y_k - \mathbb{E}(Y_k)$, we can conclude that $\mathbb{E}(w_k) = 0$. Furthermore, the $w_k$ are independent of each other (because they are built up from the $Y_k$, which are built up from the $X_k$, which are independent). We can therefore use the second form of the Kolmogorov 3-series Theorem to show that $\sum_{k=1}^n \frac{w_k}{k}$ converges almost surely.
(Note that some steps in the exposition below require results about moments of random variables which we will prove in a homework – these are indicated by a *.)
$$\operatorname{Var}\left(\frac{w_k}{k}\right) = \frac{1}{k^2}\operatorname{Var}\big(Y_k - \mathbb{E}(Y_k)\big) = \frac{1}{k^2}\operatorname{Var}(Y_k) = \frac{\mathbb{E}(Y_k^2) - [\mathbb{E}(Y_k)]^2}{k^2} \le \frac{\mathbb{E}(Y_k^2)}{k^2}$$
$$\stackrel{*}{=} \frac{1}{k^2}\int_0^\infty 2y\, \mathbb{P}(|Y_k| > y)\, \mathrm{d}y = \frac{1}{k^2}\int_0^k 2y\, \mathbb{P}(|X_k| > y)\, \mathrm{d}y = \frac{1}{k^2}\int_0^k 2y\, \mathbb{P}(|X_1| > y)\, \mathrm{d}y$$
As such
$$\sum_{k=1}^\infty \operatorname{Var}\left(\frac{w_k}{k}\right) \le \sum_{k=1}^\infty \frac{1}{k^2}\int_0^k 2y\, \mathbb{P}(|X_1| > y)\, \mathrm{d}y$$
By Fubini's Theorem, we can interchange the summation and integration, since the integrands are all positive:
$$\sum_{k=1}^\infty \operatorname{Var}\left(\frac{w_k}{k}\right) \le \int_0^\infty 2y\, \mathbb{P}(|X_1| > y) \sum_{k=1}^\infty \frac{1}{k^2}\mathbf{1}\{y \le k\}\, \mathrm{d}y = \int_0^\infty 2y\, \mathbb{P}(|X_1| > y) \sum_{k = \lceil y \rceil}^\infty \frac{1}{k^2}\, \mathrm{d}y$$
Note, however, that the sum in this expression can be bounded by an integral (for example, comparing with $1/(k-1)^2$ works well), yielding a bound of the form $C/(y+1)$. Thus, we can write
$$\sum_{k=1}^\infty \operatorname{Var}\left(\frac{w_k}{k}\right) \le \int_0^\infty 2y\, \mathbb{P}(|X_1| > y)\, \frac{C}{y+1}\, \mathrm{d}y \le C_3 \int_0^\infty \mathbb{P}(|X_1| > y)\, \mathrm{d}y \stackrel{*}{=} C_3\, \mathbb{E}|X_1| < \infty$$
So the sum does indeed converge.
. Steps 3b & 3c: Consider the following two events
$$A = \left\{ \omega : \sum_{k=1}^\infty \frac{w_k(\omega)}{k} \ \text{converges} \right\} \qquad B = \left\{ \omega : \frac{1}{n}\sum_{k=1}^n w_k(\omega) \to 0 \right\}$$
Now, by Kronecker's Lemma, $A \subseteq B$. But by Step 3a, $\mathbb{P}(A) = 1$. Thus, $\mathbb{P}(B) = 1$ and
$$\frac{1}{n}\sum_{k=1}^n w_k \to 0 \quad \text{a.s.}$$
Thus, from Step 2,
$$\frac{1}{n}\sum_{k=1}^n Y_k \to \mathbb{E}(X_1) \quad \text{a.s.}$$
. Step 4: We have now proved the SLLN for the truncated series $\{Y_n\}$. All that remains to be shown is that the result also applies to the full series $\{X_n\}$. Now, consider that
$$\mathbb{P}(X_n \ne Y_n) = \mathbb{P}(|X_n| > n)$$
But note that
$$\sum_n \mathbb{P}(|X_n| > n) = \sum_n \mathbb{P}(|X_1| > n) \le \mathbb{E}|X_1| < \infty$$
So by the first Borel–Cantelli Lemma, $\mathbb{P}(|X_n| > n \text{ i.o.}) = 0$; in other words, for large enough $n$, $|X_n| \le n$ almost surely. Thus, for large enough $n$, $X_n = Y_n$ almost surely. Now, consider that, by the triangle inequality and the Cesàro sum property,
$$\left| \frac{1}{n}\sum_{k=1}^n X_k - \frac{1}{n}\sum_{k=1}^n Y_k \right| \le \frac{1}{n}\sum_{k=1}^n |X_k - Y_k| \to \lim_n |X_n - Y_n|$$
But we have already shown that $\lim_n |X_n(\omega) - Y_n(\omega)| = 0$ almost surely.
As such
$$\left| \frac{1}{n}\sum_{k=1}^n X_k - \frac{1}{n}\sum_{k=1}^n Y_k \right| \to 0 \quad \text{a.s.}$$
Thus, the SLLN holds for the original sequence $\{X_n\}$. $\square$
o Example: Let $\{X_n\}$ be an IID sequence of random variables with $\mathbb{E}(X_n) = 0$ and $\mathbb{E}(X_n^2) = \sigma^2 < \infty$. Let $S_n = \sum_{k=1}^n X_k$. We know, from the SLLN, that
$$\frac{S_n}{n} \to 0 \quad \text{a.s.}$$
We might wonder, however, whether this still holds true if, instead of dividing by $n$, we divide by some quantity $a_n$, where $a_n \to \infty$, but possibly slower than $n$. We might wonder what the smallest such $a_n$ is that still makes the ratio converge.
Let us consider, for example, $a_n = \sqrt{n (\log n)^{1+\varepsilon}}$. $\log n$ grows extremely slowly, so this is really very close to $\sqrt{n}$ growth. Consider that
$$\operatorname{Var}\left(\frac{X_n}{a_n}\right) = \frac{\sigma^2}{n(\log n)^{1+\varepsilon}}$$
And this implies that
$$\sum_n \operatorname{Var}\left(\frac{X_n}{a_n}\right) < \infty$$
Thus, by the Kolmogorov 3-series theorem, $\sum_n \frac{X_n}{a_n}$ converges almost surely, and by Kronecker's Lemma (in its general form, with the increasing sequence $a_n$ in place of $n$), this means that $S_n/a_n \to 0$ a.s. Thus, we see that even with such a slowly-growing $a_n$, an almost-sure result still holds.
However, it is interesting to note that if we take $a_n = \sqrt{n}$ (slightly smaller), then no almost sure result holds anymore. In fact, we will show in the next lecture that the best we can say is $S_n/\sqrt{n} \Rightarrow \sigma N(0,1)$. These arguments are clearly extremely sensitive to scaling.

LECTURE 3 – 2nd February 2011
Weak convergence & the Central Limit Theorem
o Definition (Weak convergence): A sequence of random variables $\{X_n\}$ is said to converge weakly to a random variable $X$, denoted $X_n \Rightarrow X$ as $n \to \infty$, if and only if
$$\mathbb{E}\big[ f(X_n) \big] \to \mathbb{E}\big[ f(X) \big]$$
for all bounded continuous functions $f$. Comments:
. This is clearly a very difficult definition to use, because there are "too many" bounded continuous functions. We will see later, however, that it suffices to consider a smaller set of functions that forms a "basis" for these functions.
. If $X_n \to X$ almost surely, then $f(X_n) \to f(X)$, because $f$ is continuous, and $\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)]$ by bounded convergence (since the functions are bounded). Thus, almost sure convergence implies weak convergence.
. Each member of $\{X_n\}$, as well as $X$, need not even live on the same probability space. Indeed, the definition only mentions convergence of expectations, which are effectively integrals
$$\int f(x)\, \mathrm{d}F_n(x) \to \int f(x)\, \mathrm{d}F(x)$$
Nothing prevents the integrals from being taken over different spaces. Note that this is radically different to the context of almost sure convergence, which fixes an $\omega$ in the probability space and requires convergence. This is not possible here, because the variables might not even live in the same probability space.
o Definition (Weak convergence, equivalent form): A sequence of real-valued random variables $\{X_n\}$ converges weakly to $X$ if and only if $F_n(x) \to F(x)$ for all continuity points $x$ of $F$, where $X_n \sim F_n$ and $X \sim F$. (This definition makes it even clearer that the variables can live in different spaces.)
o Example: Take $X_1, X_2, \ldots$ IID, with $\mathbb{E}(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2$. Then
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sqrt{n}} \Rightarrow \sigma N(0,1)$$
This is none other than the central limit theorem. (Note the slight abuse of notation, which will continue throughout the course – when we write $X_n \Rightarrow F$, we mean $X_n \Rightarrow X$ where $X \sim F$.) We will prove this later in this lecture.
o Example (Birthday Problem): Let $X_1, X_2, \ldots$ be IID, uniformly distributed on $\{1, \ldots, n\}$ and independent of each other (if the $X_i$ were birthdays, we would have $n = 365$). Now, fix $k < n$ (the number of people amongst whom we are looking for a match), and let
$$Y_n = \min\{ i : X_i = X_j \ \text{for some } j < i \}$$
This is effectively the smallest $i$ for which $X_i$ matches an earlier $X_j$. We are now interested in the quantity
$$\mathbb{P}(\text{No match in a group of } k \text{ people}) = \mathbb{P}(Y_n > k) = 1 \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n}$$
We could plug numbers for $n$ and $k$ into the expression above and calculate the product.
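Doing exactly that is a one-liner. A quick numerical aside (ours, not the lecture's), for the classic $n = 365$, $k = 23$, together with a Monte Carlo sanity check:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 365, 23, 20000

# Exact product: P(no match among k people) = prod_{i=2}^{k} (n - i + 1)/n
exact = np.prod((n - np.arange(1, k)) / n)

# Monte Carlo check: draw k "birthdays" per trial and test for distinctness
draws = rng.integers(0, n, size=(trials, k))
no_match_freq = np.mean([len(set(row)) == k for row in draws])
```

The exact product is the famous $\approx 0.4927$ – a group of 23 people has a roughly even chance of containing a shared birthday.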
This is somewhat inconvenient – let's try to use weak convergence arguments to get an approximation instead. Let us fix $x > 0$:
$$\mathbb{P}\left( \frac{Y_n}{\sqrt{n}} > x \right) = \mathbb{P}\big( Y_n > x\sqrt{n} \big) = \prod_{i=2}^{\lfloor x\sqrt{n} \rfloor} \frac{n-i+1}{n} = \prod_{i=2}^{\lfloor x\sqrt{n} \rfloor} \left( 1 - \frac{i-1}{n} \right)$$
Let us take logarithms of both sides:
$$\log \mathbb{P}\left( \frac{Y_n}{\sqrt{n}} > x \right) = \sum_{i=2}^{\lfloor x\sqrt{n} \rfloor} \log\left( 1 - \frac{i-1}{n} \right)$$
We now need a quick definition.
Definition: $f(x) \sim g(x)$ as $x \to \infty$ if $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$. Similarly, $a_n \sim b_n$ if $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$. Note that this does not necessarily imply that either function/series converges.
Now, note that $\log(1 - y) \sim -y$ as $y \to 0$. Therefore
$$\log \mathbb{P}\left( \frac{Y_n}{\sqrt{n}} > x \right) \sim -\sum_{i=2}^{\lfloor x\sqrt{n} \rfloor} \frac{i-1}{n} = -\frac{1}{n}\sum_{i=2}^{\lfloor x\sqrt{n} \rfloor} (i-1) \sim -\frac{1}{n} \cdot \frac{x^2 n}{2} = -\frac{x^2}{2}$$
As such
$$\mathbb{P}\left( \frac{Y_n}{\sqrt{n}} > x \right) \to \exp\left( -\tfrac{1}{2}x^2 \right)$$
The RHS is clearly the tail of a distribution, because it takes value 1 at $x = 0$ and decreases thereafter, and we therefore have weak convergence by the second definition.
Before we comment on this result, let us quote a quick lemma.
Lemma:
$$\left( \frac{1}{x} - \frac{1}{x^3} \right) e^{-x^2/2} \ \le\ \int_x^\infty e^{-y^2/2}\, \mathrm{d}y \ \le\ \frac{1}{x}\, e^{-x^2/2}$$
Proof: For the upper bound, note that $y/x \ge 1$ over the range of integration:
$$\int_x^\infty e^{-y^2/2}\, \mathrm{d}y \le \int_x^\infty \frac{y}{x}\, e^{-y^2/2}\, \mathrm{d}y = \frac{1}{x}\, e^{-x^2/2}$$
As required. For the lower bound:
$$\int_x^\infty e^{-y^2/2}\, \mathrm{d}y \ge \int_x^\infty \left( 1 - \frac{3}{y^4} \right) e^{-y^2/2}\, \mathrm{d}y = \left( \frac{1}{x} - \frac{1}{x^3} \right) e^{-x^2/2}$$
(The last equality can be verified by differentiating the result.)
Corollary: $\int_x^\infty e^{-y^2/2}\, \mathrm{d}y \sim \frac{1}{x}\, e^{-x^2/2}$ as $x \to \infty$. Making this more specific to a normally distributed random variable, suppose $X \sim N(0, \sigma^2)$. Then
$$\mathbb{P}(X > x) = \int_x^\infty \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{y^2}{2\sigma^2} \right) \mathrm{d}y = \frac{1}{\sqrt{2\pi}} \int_{x/\sigma}^\infty e^{-y^2/2}\, \mathrm{d}y \sim \frac{1}{\sqrt{2\pi}} \cdot \frac{\sigma}{x}\, e^{-x^2/(2\sigma^2)}$$
It is interesting to note, therefore, how similar our result looks to the central limit theorem, despite the fact that we did not center the original variables $Y_n$. The distribution we eventually obtained has a tail which, by the corollary, looks similar to that of a normal random variable.
(In fact, the random variable with this tail is a Rayleigh random variable.)
Remark: A fully rigorous treatment of this example will be done in homework 1.
o Example: Let $X_1, X_2, \ldots$ be IID $N(0, \sigma^2)$ random variables, and let $M_n = \max(X_1, \ldots, X_n)$. Consider the case $\sigma = 1$, and let $a_n$ be such that $\mathbb{P}(X_1 > a_n) = \frac{1}{n}$ (we will show in homework 1 that $a_n \sim \sqrt{2\log n}$). We would like to show that
$$\mathbb{P}\big( a_n(M_n - a_n) \le x \big) \to \exp\{ -e^{-x} \} \qquad \forall x \in \mathbb{R}$$
We first note the RHS has no discontinuity points. Next, note that as $x \to \infty$, the RHS tends to 1, whereas as $x \to -\infty$, the RHS tends to 0. Finally, we note the RHS is monotone. Thus, the RHS is a valid cumulative distribution function. Let us now show convergence:
$$\mathbb{P}\left( M_n \le \frac{x}{a_n} + a_n \right) = \mathbb{P}\left( X_1 \le \frac{x}{a_n} + a_n, \ldots, X_n \le \frac{x}{a_n} + a_n \right) = \left[ \mathbb{P}\left( X_1 \le \frac{x}{a_n} + a_n \right) \right]^n = \left[ 1 - \mathbb{P}\left( X_1 > \frac{x}{a_n} + a_n \right) \right]^n$$
We will now require a lemma.
Lemma: Consider two real sequences $\{\alpha_n\}$ and $\{\beta_n\}$. If $\alpha_n \to 0$, $\beta_n \to \infty$ and $\alpha_n\beta_n \to c$, then $(1 + \alpha_n)^{\beta_n} \to e^c$ as $n \to \infty$.
This is precisely the situation in this case, with $\beta_n = n \to \infty$ and $\alpha_n = -\mathbb{P}\big( X_1 > \frac{x}{a_n} + a_n \big) \to 0$, since $a_n \to \infty$. We must now consider the limit of $\alpha_n\beta_n$. To do this, we will make extensive use of the lemma in the previous example (namely that $\mathbb{P}(X_1 > x) \sim \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}$):
$$n\, \mathbb{P}\left( X_1 > \frac{x}{a_n} + a_n \right) \sim n \cdot \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\frac{x}{a_n} + a_n} \cdot \exp\left\{ -\frac{1}{2}\left( \frac{x}{a_n} + a_n \right)^2 \right\}$$
$$= \frac{1}{\sqrt{2\pi}} \cdot \frac{n}{\frac{x}{a_n} + a_n} \cdot e^{-x}\, e^{-x^2/(2a_n^2)}\, e^{-a_n^2/2}$$
Now, using $\frac{1}{n} = \mathbb{P}(X_1 > a_n) \sim \frac{1}{a_n\sqrt{2\pi}}\, e^{-a_n^2/2}$, we have $e^{-a_n^2/2} \sim \frac{\sqrt{2\pi}\, a_n}{n}$, and so
$$n\, \mathbb{P}\left( X_1 > \frac{x}{a_n} + a_n \right) \sim \frac{a_n}{\frac{x}{a_n} + a_n} \cdot e^{-x}\, e^{-x^2/(2a_n^2)} \ \to\ e^{-x}$$
As such, by the lemma,
$$\mathbb{P}\left( M_n \le \frac{x}{a_n} + a_n \right) = \left[ 1 - \mathbb{P}\left( X_1 > \frac{x}{a_n} + a_n \right) \right]^n \to e^{-e^{-x}}$$
As required.
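This limit can be probed by simulation, though convergence in $n$ is notoriously slow. A rough sketch (ours, not the lecture's; `gauss_isf` is a hypothetical helper that solves $\mathbb{P}(Z > a) = p$ by bisection), checking that the normalized maximum puts roughly half its mass below the Gumbel median $-\log\log 2 \approx 0.3665$:

```python
import math
import numpy as np

def gauss_isf(p):
    """Solve P(Z > a) = p for standard normal Z by bisection on erfc."""
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * math.erfc(mid / math.sqrt(2)) > p:  # survival still too large
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = np.random.default_rng(3)
n, trials = 5000, 2000
a_n = gauss_isf(1.0 / n)                       # P(X_1 > a_n) = 1/n; a_n ~ sqrt(2 log n)
M = rng.standard_normal((trials, n)).max(axis=1)
z = a_n * (M - a_n)                            # should be approximately Gumbel

# The Gumbel CDF exp(-e^{-x}) has median -log(log 2)
frac_below_median = (z <= -math.log(math.log(2))).mean()
```

With $n = 5000$ the fit is only rough (the error decays like $1/\log n$), but the empirical fraction should already sit near $1/2$.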
o Definition (Tightness): A sequence of random variables $\{X_n\}$ with corresponding distribution functions $\{F_n\}$ is said to be tight if $\forall \varepsilon > 0$, $\exists K(\varepsilon) < \infty$ such that
$$\sup_n \mathbb{P}\big( |X_n| > K(\varepsilon) \big) \le \varepsilon$$
Equivalently, $\sup_n \big[ 1 - F_n(K(\varepsilon)) + F_n(-K(\varepsilon)) \big] \le \varepsilon$.
Comment: A sufficient condition for tightness is $\sup_n \mathbb{E}|X_n| < \infty$. To see why, take $K > 0$ and consider that, by Markov's Inequality,
$$\mathbb{P}\big( |X_n| > K \big) \le \frac{\mathbb{E}|X_n|}{K} \le \frac{\sup_n \mathbb{E}|X_n|}{K}$$
which can be made smaller than any $\varepsilon$ by choosing $K$ large enough.
o Proposition: Let $\{X_n\}$ be a sequence of random variables.
. If $X_n \Rightarrow X$, then $\{X_n\}$ is tight.
. If $\{X_n\}$ is tight, then there exists a subsequence $\{n_k\}$ such that $\{X_{n_k}\}$ converges weakly.
This is somewhat analogous to the concept of compactness in real analysis.
o Definition (Characteristic Function): The characteristic function (CF) of a random variable $X$ is given by
$$\varphi_X(\theta) = \mathbb{E}\big[ \exp(i\theta X) \big] \qquad \theta \in \mathbb{R}$$
Remarks:
. $\varphi_X(0) = 1$.
. $|\varphi_X(\theta)| \le 1$, because, by Jensen's inequality, $|\varphi_X(\theta)| \le \mathbb{E}\big| \exp(i\theta X) \big| = 1$.
. $\varphi_X^{(n)}(0) = i^n\, \mathbb{E}(X^n)$, provided $\mathbb{E}|X|^n < \infty$.
. If $X$ and $Y$ are independent, $\varphi_{X+Y}(\theta) = \varphi_X(\theta)\, \varphi_Y(\theta)$, though the converse is false.
o Examples: If $X \sim N(\mu, \sigma^2)$, then
$$\varphi_X(\theta) = \exp\left( i\mu\theta - \tfrac{1}{2}\sigma^2\theta^2 \right)$$
Note also that if $Y = \mu + \sigma X$, then
$$\varphi_Y(\theta) = \mathbb{E}\big[ e^{i\theta(\mu + \sigma X)} \big] = e^{i\theta\mu}\, \varphi_X(\sigma\theta)$$
This is convenient in relating all normally distributed random variables to the standard normal.
o Proposition (Lévy continuity theorem): Let $\{X_n\}$ be a sequence of random variables with distribution functions $\{F_n\}$. If
. $\varphi_{X_n}(\theta) \to \varphi(\theta)$ for all $\theta \in \mathbb{R}$,
. $\varphi(\theta)$ is continuous at 0,
then $X_n \Rightarrow X$, where $X$ has characteristic function $\varphi(\theta)$.
Comments:
. If $X_n \Rightarrow X$, then the two conditions in the proposition trivially hold – this is really an "if and only if" statement.
. One might wonder why the second condition in the theorem is needed. To see why, consider, for example, $X_n \sim N(0, n)$. We then have $\varphi_{X_n}(\theta) = e^{-\theta^2 n/2}$, and as $n$ grows large, this converges to
$$\varphi_{X_n}(\theta) \to \begin{cases} 0 & \text{if } \theta \ne 0 \\ 1 & \text{if } \theta = 0 \end{cases}$$
This does not satisfy our continuity requirement at 0.
To see why this matters, consider that with $K > 0$,
$$\mathbb{P}(X_n > K) = \mathbb{P}\left( N(0,1) > \frac{K}{\sqrt{n}} \right) \to \frac{1}{2}$$
In other words, the sequence in question is not tight – tail probabilities grow with $n$. The continuity condition, therefore, can be thought of as a tightness condition.

The Central Limit Theorem
o Theorem: Let $X_1, X_2, \ldots$ be IID with mean $\mathbb{E}(X_1) = \mu$ and variance $\sigma^2 = \operatorname{Var}(X_1) < \infty$. Then
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sqrt{n}} \Rightarrow \sigma N(0,1)$$
Proof: It suffices to prove this for the case $\mu = 0$. Let $S_n = \sum_{i=1}^n X_i$, and consider its characteristic function:
$$\varphi_{S_n/\sqrt{n}}(\theta) = \mathbb{E}\big[ \exp( i\theta S_n/\sqrt{n} ) \big] = \mathbb{E}\left[ \exp\left( \frac{i\theta}{\sqrt{n}} \sum_{j=1}^n X_j \right) \right] = \mathbb{E}\left[ \prod_{j=1}^n \exp\left( \frac{i\theta}{\sqrt{n}} X_j \right) \right] = \prod_{j=1}^n \mathbb{E}\left[ \exp\left( \frac{i\theta}{\sqrt{n}} X_j \right) \right] = \left[ \varphi_X\left( \frac{\theta}{\sqrt{n}} \right) \right]^n$$
Now, let us take a Taylor expansion:
$$\varphi_X\left( \frac{\theta}{\sqrt{n}} \right) = \varphi_X(0) + \frac{\theta}{\sqrt{n}}\,\varphi'_X(0) + \frac{\theta^2}{2n}\,\varphi''_X(0) + R_n = 1 + \frac{\theta}{\sqrt{n}}\, i\,\mathbb{E}(X) - \frac{\theta^2}{2n}\, \mathbb{E}(X^2) + R_n = 1 - \frac{\theta^2\sigma^2}{2n} + R_n$$
Now, consider that
$$\varphi_{S_n/\sqrt{n}}(\theta) = \left( 1 - \frac{\theta^2\sigma^2}{2n} + R_n \right)^n = (1 + \alpha_n)^{\beta_n}$$
Now, suppose we can show that $\alpha_n\beta_n = n\big[ -\frac{\theta^2\sigma^2}{2n} + R_n \big] = -\frac{\theta^2\sigma^2}{2} + nR_n \to -\frac{\theta^2\sigma^2}{2}$ (or, in other words, that $nR_n \to 0$) as $n \to \infty$. Then we would have
$$\varphi_{S_n/\sqrt{n}}(\theta) \to \exp\left( -\frac{\theta^2\sigma^2}{2} \right) = \varphi(\theta)$$
This is the characteristic function of $N(0, \sigma^2)$, and it is continuous at $\theta = 0$. Thus, all the conditions of Lévy's theorem hold, which proves the central limit theorem.
All we now need to do is to prove that $nR_n \to 0$ as $n \to \infty$. To do this, first note the following standard result from deterministic analysis (proved by induction on $n$, integrating by parts):
$$\left| e^{ix} - \sum_{m=0}^{n} \frac{(ix)^m}{m!} \right| \le \min\left( \frac{|x|^{n+1}}{(n+1)!},\ \frac{2|x|^n}{n!} \right)$$
Let us apply this with $n = 2$, $x = \theta X$, and take expectations.
Using Jensen's inequality on the LHS (i.e. $|\mathbb{E}(\cdot)| \le \mathbb{E}|\cdot|$), we obtain
$$\left| \mathbb{E}\big[ e^{i\theta X} \big] - \sum_{m=0}^{2} \frac{\mathbb{E}\big[ (i\theta X)^m \big]}{m!} \right| \le \mathbb{E}\left[ \min\left( \frac{|\theta X|^3}{3!},\ \frac{2|\theta X|^2}{2!} \right) \right] =: \mathbb{E}\big[ f(X, \theta) \big]$$
Now, observe that if $\theta \to 0$, then $f(X, \theta) \to 0$ almost surely (for each fixed value of $X$, both terms in the minimum vanish). Similarly, note that $f(X, \theta)$ is bounded by $|X|^2\theta^2$, and since $\mathbb{E}|X|^2 = \sigma^2 < \infty$, we can say that $f(X, \theta) \le Y$ with $\mathbb{E}(Y) < \infty$. We can therefore apply the dominated convergence theorem to conclude that
$$\mathbb{E}\big[ f(X, \theta) \big] \to 0 \quad \text{as } \theta \to 0$$
But we have $R_n = \mathbb{E}\big[ f(X, \theta/\sqrt{n}) \big]$, and so
$$n R_n = n\, \mathbb{E}\left[ \min\left( \frac{|X|^3 |\theta|^3}{3!\, n^{3/2}},\ \frac{|X|^2\theta^2}{n} \right) \right] = \mathbb{E}\left[ \min\left( \frac{|X|^3 |\theta|^3}{3!\, \sqrt{n}},\ |X|^2\theta^2 \right) \right]$$
The quantity inside the expectation goes to 0 as $n \to \infty$ for every $\theta$, and is dominated by $|X|^2\theta^2$, which has finite expectation; hence, by dominated convergence, $nR_n \to 0$. Note that we seem to need a third moment, but the clever combination in the minimum falls back on the second moment whenever $|X|$ is large – this is what makes the interchange possible. $\square$

Odds & Ends
o Proposition (The Skorohod Representation): Let $\{X_n\}$ be a sequence of random variables such that $X_n \Rightarrow X$. Then there exists a probability space $(\Omega', \mathcal{F}', \mathbb{P}')$ supporting a sequence $\{X'_n\}$ and $X'$ such that
$$X'_n \stackrel{d}{=} X_n \qquad \text{and} \qquad X' \stackrel{d}{=} X$$
and $X'_n \to X'$ almost surely.
Proof: Let $F_n$ be the distribution of $X_n$ and $F$ be the distribution of $X$. We know that
$$F_n(x) \to F(x) \qquad (\dagger)$$
at all continuity points $x$ of $F$. Now, define
$$F^{-1}(p) = \inf\{ x : F(x) \ge p \}$$
(This generalized inverse deals with cases in which there are discontinuities or flat stretches in the distribution.) One can check that $(\dagger)$ implies that
$$F_n^{-1}(p) \to F^{-1}(p) \qquad (\ddagger)$$
for all continuity points $p$ of $F^{-1}$ (and recall that, being monotone, $F^{-1}$ has at most countably many discontinuity points). Now, take $(\Omega', \mathcal{F}', \mathbb{P}') = \big( [0,1], \mathcal{B}, \text{uniform distribution} \big)$. Let $U \sim U[0,1]$, and let
$$X'_n = F_n^{-1}(U) \qquad X' = F^{-1}(U)$$
Note that
$$\mathbb{P}(X'_n \le x) = \mathbb{P}\big( F_n^{-1}(U) \le x \big) = \mathbb{P}\big( U \le F_n(x) \big) = F_n(x)$$
Similarly, $\mathbb{P}(X' \le x) = F(x)$. As such, $X' \stackrel{d}{=} X$ and $X'_n \stackrel{d}{=} X_n$, as required.
Note also, however, that by $(\ddagger)$, $X'_n(\omega) \to X'(\omega)$ for all but the discontinuity points of $F^{-1}$ in $\Omega' = [0,1]$. Since there are at most countably many such points, the set of such points has measure 0. Thus, $X'_n \to X'$ almost surely. $\square$
o Proposition (Continuous Mapping Theorem – CMT): Let $\{X_n\}$ be such that $X_n \Rightarrow X$. Let $f$ be a function with $\mathbb{P}(X \in D_f) = 0$, with $D_f$ being the set of discontinuity points of $f$. Then
$$f(X_n) \Rightarrow f(X)$$
o Proposition (Converging Together Lemma – CTL): If $\{X_n\}$ is a sequence such that $X_n \Rightarrow X$ and $\{Y_n\}$ is a sequence such that $Y_n \Rightarrow a$ (where $a$ is a deterministic constant), then
. $X_n + Y_n \Rightarrow X + a$
. $X_n Y_n \Rightarrow aX$
. $\dfrac{X_n}{Y_n} \Rightarrow \dfrac{X}{a}$, provided $a \ne 0$.

LECTURE 4 – 9th February 2011
Introduction to Large Deviations
o Consider a sequence of IID random variables, $X_1, X_2, \ldots \sim N(\mu, \sigma^2)$. Let $S_n = \sum_{i=1}^n X_i$. Fix $a > \mu$. We are interested in the value
$$\mathbb{P}(S_n > na)$$
In the case of the Gaussian distribution, we can use the bounds derived earlier in these notes:
$$\mathbb{P}(S_n > na) = \mathbb{P}\left( \frac{S_n - n\mu}{\sigma\sqrt{n}} > \frac{\sqrt{n}(a - \mu)}{\sigma} \right) = \mathbb{P}\left( Z > \frac{\sqrt{n}(a-\mu)}{\sigma} \right) \le \frac{1}{\sqrt{2\pi}} \cdot \frac{\sigma}{\sqrt{n}(a-\mu)}\, \exp\left( -\frac{n(a-\mu)^2}{2\sigma^2} \right)$$
And
$$\mathbb{P}(S_n > na) \ge \frac{1}{\sqrt{2\pi}} \left( \frac{\sigma}{\sqrt{n}(a-\mu)} - \frac{\sigma^3}{n^{3/2}(a-\mu)^3} \right) \exp\left( -\frac{n(a-\mu)^2}{2\sigma^2} \right)$$
Taking logarithms and dividing by $n$, we find that, for $n$ sufficiently large ($c_1$ and $c_2$ are constants),
$$-\frac{(a-\mu)^2}{2\sigma^2} - \frac{\log n}{2n} - \frac{c_1}{n} \ \le\ \frac{1}{n}\log \mathbb{P}(S_n > na) \ \le\ -\frac{(a-\mu)^2}{2\sigma^2} - \frac{\log n}{2n} + \frac{c_2}{n}$$
Now, this means that
$$\limsup_n \frac{1}{n}\log \mathbb{P}(S_n > na) \le -\frac{(a-\mu)^2}{2\sigma^2} \qquad\qquad \liminf_n \frac{1}{n}\log \mathbb{P}(S_n > na) \ge -\frac{(a-\mu)^2}{2\sigma^2}$$
As such, we find that
$$\frac{1}{n}\log \mathbb{P}(S_n > na) \to -\frac{(a - \mu)^2}{2\sigma^2}$$
This result is the meta-result of large deviations theory. It is easy to derive in the case of a Gaussian variable – we seek to develop this result in more generality.
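The Gaussian meta-result can be checked exactly, since $S_n \sim N(n\mu, n\sigma^2)$ means the tail probability is available in closed form via `erfc`. A quick numerical aside (ours, not the lecture's), with $\mu = 0$, $\sigma = 1$, $a = 0.5$, so that the predicted limit is $-(a-\mu)^2/(2\sigma^2) = -0.125$:

```python
import math

mu, sigma, a = 0.0, 1.0, 0.5
limit = -(a - mu) ** 2 / (2 * sigma ** 2)      # predicted limit: -0.125

rates = []
for n in [10, 100, 1000, 4000]:
    z = math.sqrt(n) * (a - mu) / sigma        # standardised threshold
    tail = 0.5 * math.erfc(z / math.sqrt(2))   # exact P(S_n > n a)
    rates.append(math.log(tail) / n)           # (1/n) log P(S_n > n a)
```

The sequence `rates` climbs towards $-0.125$ from below, the gap being the $\frac{\log n}{2n}$ and $\frac{c}{n}$ corrections visible in the sandwich above. (We stop at $n = 4000$ only because the exact tail underflows double precision for much larger $n$.)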
o Let us develop some insight into the exact meaning of "large deviations". The central limit theorem would tell us that
$$S_n \approx n\mu + \sigma\sqrt{n}\, Z \qquad Z \sim N(0,1)$$
A "natural" question, therefore, would be to consider
$$\mathbb{P}\big( S_n > n\mu + K\sqrt{n} \big)$$
The theory of large deviations goes even further – since $a > \mu$, we can write
$$\mathbb{P}(S_n > na) = \mathbb{P}\big( S_n > n\mu + K_n\sqrt{n} \big) \qquad \text{with } K_n = (a - \mu)\sqrt{n} \to \infty$$
In other words, the central limit theorem deals with "normal" deviations, of order $\sqrt{n}$ from the mean. The theory of large deviations refers to larger deviations, of order $n$. We will now see why the central limit theorem is powerless to deal with these large deviations.
o Consider a sequence of IID random variables, $X_1, X_2, \ldots$, with $\mathbb{E}(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2 < \infty$. As ever, let $S_n = \sum_{i=1}^n X_i$. Fix $a > \mu$. We might be tempted to naively apply the central limit theorem and write
$$\mathbb{P}(S_n > na) = \mathbb{P}\left( \frac{S_n - n\mu}{\sigma\sqrt{n}} > \frac{\sqrt{n}(a-\mu)}{\sigma} \right) \approx \mathbb{P}\left( Z > \frac{\sqrt{n}(a-\mu)}{\sigma} \right) \to 0 \quad \text{as } n \to \infty$$
Though this statement is correct, it isn't particularly informative – it's quite obvious that the probability in question tends to 0! This happens because of the factor of $\sqrt{n}$ on the RHS of the inequality. The central limit theorem is clearly not equipped to deal with large deviations.
o Theorem (Cramér): Let $X_1, X_2, \ldots$ be IID with mean $\mu$ and such that $M(\theta) = \mathbb{E}\big( e^{\theta X_1} \big)$ exists for all $\theta \in \mathbb{R}$, and write $\psi(\theta) = \log M(\theta)$. Then, for $a > \mu$,
$$\frac{1}{n}\log \mathbb{P}(S_n > na) \to -I(a) \qquad \text{where} \qquad I(a) = \sup_{\theta} \big\{ \theta a - \psi(\theta) \big\}$$
o Example: Let us apply this theorem to Gaussian random variables, for which $M(\theta) = \exp\big( \theta\mu + \tfrac{1}{2}\theta^2\sigma^2 \big)$. In this case,
$$I(a) = \sup_\theta \left\{ \theta a - \theta\mu - \tfrac{1}{2}\theta^2\sigma^2 \right\}$$
This is simply a quadratic, with argmax $\theta^* = \frac{a - \mu}{\sigma^2}$. We then get
$$I(a) = \frac{(a - \mu)^2}{2\sigma^2}$$
which is indeed what we found using Gaussian tail bounds.
o Proof of Cramér's Theorem: We will carry out this proof in two steps.
. Step 1: establish an upper bound
$$\limsup_n \frac{1}{n}\log \mathbb{P}(S_n > na) \le -I(a)$$
. Step 2: establish a lower bound
$$\liminf_n \frac{1}{n}\log \mathbb{P}(S_n > na) \ge -I(a)$$
Fix $a > \mu$ and $\theta > 0$.
Step 1: Recall from Markov’s Inequality that for any a > 0 , we have Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011 Foundations of Stochastic Modeling Page 42 ()X ()X >£a 1 1 a For this proof, we will need a more general form of Markov’s Inequality. Lemma (Generalized Form of Markov’s Inequality): Let g : ++ be an increasing function such that gX()1 <¥. We then have that (gX()) ()X >£a 1 1 g()a Proof: We have that gX()³>é gX (); X aù ()111ëê ûú ³>gX()(aa 1 ) As required. Now, consider that in our case. Since q is strictly positive, we can write ()()SnaSnn>=qq > na =>()exp(qqSnan ) exp( ) qS £ ()een -qna (This is often known as a Chernoff bound. Note that we have used the generalized form of Markov’s Inequality in the last line). We then have qS Snae>£-qna en ( n ) ( ) -qna é n ù = eX exp êq ú ()ë åi=1 1 û n = eX-qna exp(q ) ()i=1 1 n -qna éù = eMëûêú()q = ee-qjqna n () =--exp( na [qjq ( )]) Let us try to optimize the bound (ie: make it as tight as possible). To do this, we choose q such that qqjqÎ-arg supq>0 {}a ( ) Note, however, that Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011 Foundations of Stochastic Modeling Page 43 The derivative of our objective, at q = 0 , is given by a - jm¢(0)=->a 0 . We will show, below that our objective is concave. Together, these two points imply that extending our optimization problem to q Î will not change our optimum, since it necessarily lies in the positive quadrant. Thus, we can choose q such that qqqÎ-arg supqÎ {}a j ( ) This is precisely the optimization problem involved in finding I(a). Going back to the above, we therefore find that ()()Snan >£exp - nIa ( ) Precisely as requested. All that remains to do is to prove that jq ( ) is a convex function. Fix a Î (0,1) and qq12, Î . We then have jaq()()12+-[1 aq ] = logM [ aq 1 +- (1 aq ) 2 ] aqXX(1- a ) q = log() [ee11 21 ] aqXX(1- a ) q = log() [ee12 ] ????????? . Step 2: We now proof the lower bound, in a very different way. 
Assume, for notational convenience, that $X$ has a density $p(x)$ – this is not, however, required. Then, fix $\theta > 0$ and define a new density
$$p_\theta(x) = \frac{e^{\theta x}\, p(x)}{M(\theta)} \ge 0$$
which is a bona fide density (it is non-negative and integrates to 1). Suppose $g : \mathbb{R} \to \mathbb{R}$ is such that $\mathbb{E}|g(X_1)| < \infty$. Then
$$\mathbb{E}\big[ g(X) \big] = \int g(x)\, p(x)\, \mathrm{d}x = \int g(x)\, \frac{p(x)}{p_\theta(x)}\, p_\theta(x)\, \mathrm{d}x = \mathbb{E}_\theta\left[ g(X)\, \frac{p(X)}{p_\theta(X)} \right]$$
Similarly, for $g : \mathbb{R}^n \to \mathbb{R}$,
$$\mathbb{E}\big[ g(X_1, \ldots, X_n) \big] = \mathbb{E}_\theta\left[ g(X_1, \ldots, X_n)\, \frac{p(X_1, \ldots, X_n)}{p_\theta(X_1, \ldots, X_n)} \right]$$
This is, effectively, a change of measure, and $p/p_\theta$ is a likelihood ratio. Now, consider
$$\mathbb{P}(S_n > na) = \mathbb{E}\big[ \mathbf{1}\{S_n > na\} \big] = \mathbb{E}_\theta\left[ \mathbf{1}\{S_n > na\} \prod_{i=1}^n \frac{p(X_i)}{p_\theta(X_i)} \right] = \mathbb{E}_\theta\left[ \mathbf{1}\{S_n > na\} \prod_{i=1}^n \frac{M(\theta)}{e^{\theta X_i}} \right] = \mathbb{E}_\theta\Big[ \exp\big( -\theta S_n + n\psi(\theta) \big) ;\ S_n > na \Big]$$
Now, fix a $\delta > 0$:
$$\mathbb{P}(S_n > na) \ge \mathbb{E}_\theta\Big[ \exp\big( -\theta S_n + n\psi(\theta) \big) ;\ na < S_n \le na + \delta n \Big]$$
$$\ge \exp\big( -\theta(na + \delta n) + n\psi(\theta) \big)\, \mathbb{P}_\theta\big( na < S_n \le na + \delta n \big) = e^{-n(\theta a - \psi(\theta))}\, e^{-\theta\delta n}\, \mathbb{P}_\theta\big( na < S_n \le na + \delta n \big)$$
This is starting to look like what we want – it is crucial, however, to ensure that the last probability stays bounded away from 0 – if not, its logarithm tends to negative infinity. Unfortunately, according to our original distribution, it does fall to 0. Our hope is to choose our new distribution (characterized by $\theta$) to ensure we get what we want. Note that
$$\mathbb{E}_\theta(X) = \int x\, \frac{e^{\theta x}\, p(x)}{M(\theta)}\, \mathrm{d}x = \frac{M'(\theta)}{M(\theta)} = \psi'(\theta)$$
Now, suppose there exists a $\theta^*$ such that $\psi'(\theta^*) = a$ [in fact, we have assumed that $X_1$ has a moment generating function that exists over the whole real line, so there does exist such a $\theta^*$].² Then the mean of $S_n$ under our new distribution would indeed be $na$, which gives us hope that the probability above will be bounded away from 0.
Formally, by the central limit theorem, under $\theta^*$,
$$\frac{S_n - na}{\sqrt{n}} \Rightarrow \sigma^* N(0,1)$$
if $(\sigma^*)^2 < \infty$, where $(\sigma^*)^2 = \mathbb{E}_{\theta^*}(X^2) - a^2$. Under the conditions of Cramér's Theorem, it turns out this is also satisfied. We then have that
$$\mathbb{P}_{\theta^*}\big( na < S_n \le na + \delta n \big) = \mathbb{P}_{\theta^*}\left( 0 < \frac{S_n - na}{\sqrt{n}} \le \delta\sqrt{n} \right) \to \frac{1}{2} > 0$$
Taking logarithms in our lower bound, dividing by $n$, letting $n \to \infty$ and then $\delta \downarrow 0$, we obtain
$$\liminf_n \frac{1}{n}\log \mathbb{P}(S_n > na) \ge -\big( \theta^* a - \psi(\theta^*) \big) = -I(a)$$
The last equality holds because $\sup_{\theta \in \mathbb{R}} \{ \theta a - \psi(\theta) \}$ is a concave program, which implies first-order conditions are necessary and sufficient. In this case, they read $a = \psi'(\theta)$ – precisely the equation defining $\theta^*$. $\square$

Applications of Large Deviations Theory
o Example: Let $X_1, X_2, \ldots$ be IID $N(\mu, \sigma^2)$ variables. Fix $a > \mu$. We saw earlier that
$$\mathbb{P}(S_n > na) \le \exp\big( -nI(a) \big) = \exp\left( -n\, \frac{(a-\mu)^2}{2\sigma^2} \right)$$
The central limit theorem allows us to "handwave" $S_n \approx n\mu + \sigma\sqrt{n}\, N(0,1)$. Using this result, however, we can be more exact – we can find a sequence $a_n$ such that $S_n$ eventually lies below $n\mu + a_n\sqrt{n}$ almost surely. Consider:
$$\mathbb{P}\big( S_n > n\mu + a_n\sqrt{n} \big) = \mathbb{P}\left( S_n > n\left( \mu + \frac{a_n}{\sqrt{n}} \right) \right) \le \exp\left( -n\, \frac{(a_n/\sqrt{n})^2}{2\sigma^2} \right) = \exp\left( -\frac{a_n^2}{2\sigma^2} \right)$$
We now need to choose an $a_n$ that ensures our envelope is summable. Let us consider, for $\delta > 0$, $a_n = \sqrt{2(1+\delta)\sigma^2\log n}$. The resulting envelope $a_n\sqrt{n}$ is only slightly larger than the $\sqrt{n}$ scale that we might expect to work given the central limit theorem. With that choice of envelope,
$$\mathbb{P}\big( S_n > n\mu + a_n\sqrt{n} \big) \le \exp\big( -(1+\delta)\log n \big) = n^{-(1+\delta)}$$
This is indeed summable. Thus, by the first Borel–Cantelli Lemma, $S_n \le n\mu + a_n\sqrt{n}$ eventually, almost surely. Using a similar kind of argument, we could show that $S_n \ge n\mu - a_n\sqrt{n}$ eventually, almost surely. This is a much more precise statement than that of the central limit theorem.
o Moderate deviations theory. We saw that a "typical" deviation is of order $\sqrt{n}$, and a "large" deviation is of order $n$. What about deviations in between?
Theorem (Moderate Deviations): Under the conditions of Cramér's Theorem, for any sequence $a_n$ such that
. $\dfrac{a_n}{\sqrt{n}} \to \infty$
. $\dfrac{a_n}{n} \to 0$
and for any $a > 0$,
$$\frac{n}{a_n^2}\, \log \mathbb{P}\big( S_n > n\mu + a\, a_n \big) \to -\frac{a^2}{2\sigma^2} \qquad \text{as } n \to \infty$$
² We mentioned that it is enough for the moment generating function to exist in a neighborhood of 0, "and some extra conditions". These extra conditions are precisely those that ensure such a $\theta^*$ does exist.
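Before turning to the proof, the Gaussian case can again be checked exactly, since $S_n \sim N(0, n)$ when $\mu = 0$ and $\sigma = 1$. A quick numerical aside (ours, not the lecture's), with $a = 1$ and $a_n = n^{3/4}$, which satisfies both growth conditions:

```python
import math

# For X_i ~ N(0,1), S_n ~ N(0, n) exactly, so the moderate-deviations rate
# (n / a_n^2) * log P(S_n > a_n) can be evaluated in closed form via erfc.
# With a_n = n^{3/4}, the theorem predicts a limit of -1/2.
rates = []
for n in [10**2, 10**4, 10**6]:
    a_n = n ** 0.75
    tail = 0.5 * math.erfc(a_n / math.sqrt(2 * n))   # P(N(0, n) > a_n)
    rates.append((n / a_n ** 2) * math.log(tail))
```

The entries of `rates` climb towards $-1/2$ (roughly $-0.71$, $-0.53$, $-0.50$), consistent with the theorem.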
Proof of upper bound: WLOG, set $\mu = 0$ and $a = 1$. We have
$$\psi(0) = 0 \qquad \psi'(0) = 0 \qquad \psi''(0) = \sigma^2$$
We then obtain, as $\theta \to 0$,
$$\psi(\theta) \sim \frac{\theta^2\sigma^2}{2}$$
(Recall that $f(x) \sim g(x)$ means $\lim \frac{f(x)}{g(x)} = 1$.) Let us now use a Chernoff bound to deduce that
$$\mathbb{P}(S_n > a_n) \le \exp(-\theta a_n)\, \mathbb{E}\big[ \exp(\theta S_n) \big] = e^{n\psi(\theta) - \theta a_n}$$
But taking $\theta = \theta_n \to 0$, so that $\psi(\theta_n) \sim \frac{\theta_n^2\sigma^2}{2}$,
$$\mathbb{P}(S_n > a_n) \le C \exp\left( \frac{n\theta^2\sigma^2}{2} - \theta a_n \right)$$
Optimizing over $\theta$ (the optimum is $\theta_n = \frac{a_n}{n\sigma^2} \to 0$, consistent with our approximation), we find
$$\mathbb{P}(S_n > a_n) \le C \exp\left( -\frac{a_n^2}{2n\sigma^2} \right)$$
As required. $\square$

LECTURE 5 – 16th February 2011
Random Walks & Martingales
Random walks
o Definition (Random Walk): A random process $\{S_n : n \ge 0\}$ is called a random walk (RW) if it can be represented as $S_n = S_0 + \sum_{i=1}^n X_i$, with the $\{X_n\}$ independently and identically distributed, and independent of $S_0$.
Remarks: If $S_0 = 0$:
. $\mathbb{E}(S_n) = n\,\mathbb{E}(X_1)$ and $\operatorname{Var}(S_n) = n\operatorname{Var}(X_1)$
. By the Strong Law of Large Numbers, $S_n \approx n\,\mathbb{E}(X_1)$
. $S_n \approx n\,\mathbb{E}(X_1) + \sigma\sqrt{n}\, N(0,1)$
o Question: Suppose $T$ is a positive, integer-valued random variable. Is it the case that $\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1)$?
o Let us consider the situation in which $T$ is independent of the $\{S_n\}$ (which is equivalent to saying $T$ is independent of the $\{X_n\}$). We then have
$$\mathbb{E}(S_T) = \mathbb{E}\left( \sum_{i=1}^T X_i \right) = \mathbb{E}\left[ \mathbb{E}\left( \sum_{i=1}^T X_i \,\Big|\, T \right) \right] = \mathbb{E}\big[ T\,\mathbb{E}(X_1) \big] = \mathbb{E}(T)\,\mathbb{E}(X_1)$$
Unfortunately, this answer is not particularly interesting or useful, simply because in all "interesting" situations, $T$ is allowed to depend on the behavior of the system. Let us consider some more interesting cases.
o Example: Consider a random walk defined as follows:
$$X_i \stackrel{\text{IID}}{\sim} \begin{cases} 1 & \text{with prob } p \\ -1 & \text{with prob } 1-p \end{cases} \qquad p > \frac{1}{2} \qquad S_n = \sum_{i=1}^n X_i$$
In this case, $\frac{S_n}{n} \to 2p - 1 > 0$ a.s., and so $S_n \to \infty$ almost surely. Now, imagine $S_0 = -1$ and define
$$T = \sup\{ n \ge 0 : S_n = 0 \}$$
Because $S_n \to \infty$, we have $\mathbb{P}(T < \infty) = 1$. Furthermore, by definition, $S_T = 0$, so $\mathbb{E}(S_T) = 0$.
However, we have that $\mathbb{E}(X_1) = 2p - 1 > 0$ and that $\mathbb{E}(T) \ge 1$. Thus, it is clear, in this case, that
$$\mathbb{E}(S_T) \ne \mathbb{E}(T)\,\mathbb{E}(X_1)$$
Clearly, therefore, there can be some issues. In this case, the problem is that the time $T$ is anticipatory; it is determined by full knowledge of the future of the random walk.
o Definition (Filtration): We previously defined a probability space by the triplet $(\Omega, \mathcal{F}, \mathbb{P})$, where $\mathcal{F}$ represented possible events. We define $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$ to be the sigma-algebra generated by $X_0, \ldots, X_n$. This sigma-algebra contains all the information that is available from knowing the values of the variables $X_0, \ldots, X_n$. We then say that $\{\mathcal{F}_n\}$ is a filtration, and clearly, $\mathcal{F}_0 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \cdots \subseteq \mathcal{F}$.
o Definition (Stopping time): Suppose $T$ is a non-negative integer-valued random variable. Then $T$ is said to be a stopping time with respect to an underlying sequence $\{X_n\}$ if, for each $k \ge 0$,
$$\mathbf{1}\{T = k\} = f_k(X_0, \ldots, X_k)$$
where $f_k$ is a deterministic function. In other words, we require
$$\{T = k\} \in \mathcal{F}_k \quad \forall k$$
o Example (Hitting Times): Let $T = \inf\{ n \ge 0 : X_n \in A \}$. We then have
$$\{T = k\} = \{ X_0 \notin A, \ldots, X_{k-1} \notin A,\ X_k \in A \}$$
Similarly, we could define $T = \inf\{ n \ge 0 : S_n \in A \}$, and we would then have
$$\{T = k\} = \{ S_0 \notin A, \ldots, S_{k-1} \notin A,\ S_k \in A \}$$
o Proposition (Wald's First Identity): Let $S_n$ be a random walk $S_n = \sum_{i=1}^n X_i$ with $S_0 = 0$, and let $T$ be a stopping time with respect to $\{\mathcal{F}_n\}$ (where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$).
. If $X_i \ge 0$, then $\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1)$ [this could, of course, be infinity].
. If $\mathbb{E}|X_i| < \infty$ and $\mathbb{E}(T) < \infty$, then $\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1)$.
Proof: Let $S_T = \sum_{i=1}^T X_i = \sum_{i=1}^\infty X_i \mathbf{1}\{i \le T\}$. We then have
$$\mathbb{E}(S_T) = \mathbb{E}\left( \sum_{i=1}^\infty X_i \mathbf{1}\{i \le T\} \right)$$
Now, let us do both parts:
. First part: $X_i \ge 0$, and indicators are always non-negative, so by Fubini I, we can interchange the expectation and the sum:
$$\mathbb{E}(S_T) = \sum_{i=1}^\infty \mathbb{E}\big( X_i \mathbf{1}\{i \le T\} \big)$$
Consider, however, that
$$\mathbf{1}\{T \ge i\} = 1 - \mathbf{1}\{T < i\} = 1 - \mathbf{1}\{T \le i-1\}$$
This implies that $\{T \ge i\} \in \mathcal{F}_{i-1}$.
Going back to our sum:
$$\mathbb{E}(S_T) = \sum_{i=1}^\infty \mathbb{E}\Big[ \mathbb{E}\big( X_i \mathbf{1}\{i \le T\} \mid \mathcal{F}_{i-1} \big) \Big] = \sum_{i=1}^\infty \mathbb{E}\Big[ \mathbf{1}\{i \le T\}\, \mathbb{E}\big( X_i \mid \mathcal{F}_{i-1} \big) \Big] = \sum_{i=1}^\infty \mathbb{E}\big[ \mathbf{1}\{i \le T\} \big]\, \mathbb{E}(X_1)$$
Using Fubini I again:
$$\mathbb{E}(S_T) = \mathbb{E}\left( \sum_{i=1}^\infty \mathbf{1}\{i \le T\} \right) \mathbb{E}(X_1) = \mathbb{E}(T)\,\mathbb{E}(X_1)$$
. Second part: In this case, consider that, by the triangle inequality,
$$|S_T| \le \sum_{i=1}^T |X_i|$$
By part 1, however,
$$\mathbb{E}\left( \sum_{i=1}^T |X_i| \right) = \mathbb{E}(T)\,\mathbb{E}|X_1| < \infty$$
The result then follows by Fubini II.
o Example (Gambler's Ruin): Consider a random walk in which $S_0 = 0$ and $\mathbb{P}(X = 1) = \mathbb{P}(X = -1) = \frac{1}{2}$. We let $Z_n = k + S_n$, with $0 < k < N$; intuitively, $k$ is our initial wealth, and $Z_n$ is our wealth at time $n$. Let
$$T = \inf\{ n \ge 0 : Z_n = 0 \text{ or } Z_n = N \} = \inf\{ n \ge 0 : S_n = -k \text{ or } S_n = N - k \}$$
Consider that $\mathbb{E}(Z_T) = k + \mathbb{E}(S_T)$. But consider also that, by definition,
$$\mathbb{E}(Z_T) = 0 \cdot \mathbb{P}(Z_T = 0) + N \cdot \mathbb{P}(Z_T = N) = N\, \mathbb{P}(Z_T = N)$$
Now, unfortunately, the $X_i$ are not non-negative. However, $|X_i| \le 1$, and so $X_i$ clearly has finite expectation. Let us now assume $\mathbb{E}(T) < \infty$ (this will be proved in homework 2). By Wald I, we then have
$$\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1) = 0$$
And so
$$N\, \mathbb{P}(Z_T = N) = k \qquad\Longrightarrow\qquad \mathbb{P}(Z_T = N) = \frac{k}{N}$$
o Proposition (Wald II): Suppose $\{X_i\}$ are IID bounded random variables, with $\mathbb{E}(X_1) = 0$ and $\mathbb{E}(X_1^2) = \sigma^2$. Let $T$ be a stopping time with respect to $\{\mathcal{F}_n\}$, such that $\mathbb{E}(T^2) < \infty$. Then
$$\operatorname{Var}(S_T) = \sigma^2\, \mathbb{E}(T)$$
Remark: If $T$ were independent of the $X_i$, the result would be obvious. We seek to extend it to stopping times.
Proof: Clearly, under the assumptions of this proposition, Wald I holds – as such, $\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1) = 0$. Now,
$$\mathbb{E}(S_T^2) = \mathbb{E}\left[ \left( \sum_{i=1}^T X_i \right)^2 \right] = \mathbb{E}\left[ \sum_{i=1}^T X_i^2 + 2\sum_{i=1}^{T-1}\sum_{j=i+1}^T X_i X_j \right]$$
But by Wald I (applied to the non-negative variables $X_i^2$),
$$\mathbb{E}\left( \sum_{i=1}^T X_i^2 \right) = \mathbb{E}(T)\,\mathbb{E}(X_1^2) = \sigma^2\,\mathbb{E}(T)$$
Now, consider the second term:
$$\mathbb{E}\left( \sum_{i=1}^{T-1}\sum_{j=i+1}^T X_i X_j \right) = \mathbb{E}\left( \sum_{i=1}^\infty \sum_{j=i+1}^\infty X_i X_j \mathbf{1}\{T - 1 \ge i\}\, \mathbf{1}\{j \le T\} \right)$$
Now, imagine it were possible to apply Fubini II.
We then have
$$\mathbb{E}\left( \sum_{i=1}^{T-1}\sum_{j=i+1}^T X_i X_j \right) = \sum_{i=1}^\infty \sum_{j=i+1}^\infty \mathbb{E}\big( X_i X_j \mathbf{1}\{T-1 \ge i\}\, \mathbf{1}\{j \le T\} \big)$$
But note that because of the way we have written these sums, $T \ge j$ implies $T - 1 \ge j - 1 \ge i$. As such, we have that $\mathbf{1}\{T-1 \ge i\}\mathbf{1}\{T \ge j\} = \mathbf{1}\{T \ge j\}$. As such,
$$\mathbb{E}\left( \sum_{i=1}^{T-1}\sum_{j=i+1}^T X_i X_j \right) = \sum_{i=1}^\infty \sum_{j=i+1}^\infty \mathbb{E}\big( X_i X_j \mathbf{1}\{T \ge j\} \big) = \sum_{i=1}^\infty \sum_{j=i+1}^\infty \mathbb{E}\Big[ \mathbb{E}\big( X_i X_j \mathbf{1}\{T \ge j\} \mid \mathcal{F}_{j-1} \big) \Big]$$
$$= \sum_{i=1}^\infty \sum_{j=i+1}^\infty \mathbb{E}\Big[ X_i \mathbf{1}\{T \ge j\}\, \mathbb{E}\big( X_j \mid \mathcal{F}_{j-1} \big) \Big] = \sum_{i=1}^\infty \sum_{j=i+1}^\infty \mathbb{E}\big[ X_i \mathbf{1}\{T \ge j\} \cdot 0 \big] = 0$$
(Here we used that $\{T \ge j\} = \{T \le j-1\}^c \in \mathcal{F}_{j-1}$, and $X_i \in \mathcal{F}_{j-1}$ for $i < j$.) Now, let us justify the use of Fubini II. $X$ is bounded, and so $|X| \le K$:
$$S_T^2 = \left( \sum_{i=1}^T X_i \right)^2 \le \left( \sum_{i=1}^T |X_i| \right)^2 \le (KT)^2 = K^2 T^2$$
This has a finite expectation, since $T$ has a second moment. $\square$
o Example (Gambler's Ruin): In this case, $\mathbb{E}(X_1^2) = \sigma^2 = 1$, and we assume $\mathbb{E}(T^2) < \infty$ (again, see homework 2). We then have
$$\mathbb{E}(Z_T^2) = 0^2 \cdot \mathbb{P}(Z_T = 0) + N^2\, \mathbb{P}(Z_T = N) = N^2 \cdot \frac{k}{N} = kN$$
However, we also have
$$\mathbb{E}(Z_T^2) = \mathbb{E}\big[ (k + S_T)^2 \big] = k^2 + 2k\,\mathbb{E}(S_T) + \mathbb{E}(S_T^2) = k^2 + \sigma^2\,\mathbb{E}(T) = k^2 + \mathbb{E}(T)$$
And so
$$\mathbb{E}(T) = kN - k^2 = k(N - k)$$

Martingales
o Definition (Martingale): A process $\{M_n : n \ge 0\}$ is called a martingale (MG) with respect to a filtration $\{\mathcal{F}_n\}$ if
. $M_n \in \mathcal{F}_n$
. $\mathbb{E}|M_n| < \infty \ \forall n$
. $\mathbb{E}(M_n \mid \mathcal{F}_{n-1}) = M_{n-1}$
o Example (Random walk): Let $M_n = S_n$, with $M_0 = 0$. Then
. $S_n \in \mathcal{F}_n$
. Provided $\mathbb{E}|X_i| < \infty$, then $\mathbb{E}|S_n| \le n\,\mathbb{E}|X_1| < \infty$
. $\mathbb{E}(S_n \mid \mathcal{F}_{n-1}) = \mathbb{E}(S_{n-1} + X_n \mid \mathcal{F}_{n-1}) = S_{n-1} + \mathbb{E}(X_n)$
Thus, if we let $\mathbb{E}(X_n) = 0$, then our random walk is a martingale.
o Example (Variance martingale): Consider $M_n = S_n^2 - n\sigma^2$, with $\mathbb{E}(X_1^2) < \infty$ and $\mathbb{E}(X_1) = 0$. Then
. Condition 1 holds.
. $\mathbb{E}|M_n| \le \mathbb{E}(S_n^2) + n\sigma^2 = 2n\sigma^2 < \infty \ \forall n$
. $\mathbb{E}(M_n \mid \mathcal{F}_{n-1}) = \mathbb{E}\big( S_n^2 \mid \mathcal{F}_{n-1} \big) - n\sigma^2 = \mathbb{E}\big[ (S_{n-1} + X_n)^2 \mid \mathcal{F}_{n-1} \big] - n\sigma^2 = S_{n-1}^2 + 2S_{n-1}\,\mathbb{E}(X_n) + \sigma^2 - n\sigma^2 = M_{n-1}$
o Consider that if we let $D_n = M_n - M_{n-1}$, we can write $M_n = \sum_{i=1}^n D_i + M_0$.
o Proposition: Let $\{M_n : n \ge 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ and put $D_i = M_i - M_{i-1}$. Then
. $\mathbb{E}(D_i) = 0 \ \forall i$
. If $\sup_{i \ge 1} \mathbb{E}(D_i^2) < \infty$, then $\mathbb{E}(D_i D_j) = 0 \ \forall i \ne j$
Remarks:
. Note that this theorem almost achieves the representation of a general martingale as a random walk.
Indeed, it expresses our martingale as a sum of mean-0 uncorrelated random variables. The D_i, however, need not be independent (and indeed, in most "typical" cases, they will not be).
. As we will see in the proof, the boundedness condition is required to be able to argue, via Cauchy-Schwarz, that

    E|D_i D_j| \le \sqrt{ E[D_i^2] } \sqrt{ E[D_j^2] } < \infty

so that the expectations below are well defined. (Note: in this case, it would be enough to require the second moment of each difference to be bounded, but it seems less clunky to require the slightly more stringent uniform boundedness condition.)
Proof: Fix i \ge 1:

    E[D_i] = E[ M_i - M_{i-1} ] = E[ E( M_i | F_{i-1} ) - M_{i-1} ] = E[ M_{i-1} - M_{i-1} ] = 0

Now, fix j > i \ge 1:

    E[D_i D_j] = E[ D_i ( M_j - M_{j-1} ) ]
               = E[ E( D_i ( M_j - M_{j-1} ) | F_{j-1} ) ]
               = E[ D_i ( E( M_j | F_{j-1} ) - M_{j-1} ) ]
               = 0

(Note that this can be used to prove Var(M_n) = Var(M_0) + \sum_{i=1}^n Var(D_i).)
o Example: Let {X_n} be IID, with E[X_i] = 1 and E|X_i| < \infty, and let M_n = \prod_{i=1}^n X_i. This is a martingale:
. Condition 1 is satisfied.
. E|M_n| = \prod_{i=1}^n E|X_i| < \infty (by independence)
. Conditioning,

    E[ M_n | F_{n-1} ] = E[ ( \prod_{i=1}^{n-1} X_i ) X_n | F_{n-1} ] = ( \prod_{i=1}^{n-1} X_i ) E[ X_n | F_{n-1} ] = M_{n-1} E[X_n] = M_{n-1}

It is quite astounding, therefore, that even this process can be written as the sum of uncorrelated increments!
Optional Stopping Theorem for Martingales
o Question: If T is a stopping time, when is it true that E[M_T] = E[M_0]? This is the question that will concern us in this section. Let us consider some simple examples.
o Example: Let M_n = S_n - n\mu, with M_0 = 0 and E[X_1] = \mu. When is it the case that E[M_T] = E[M_0] = 0? Effectively, we are asking when it is the case that

    E[S_T] = \mu E[T]

This is precisely the subject matter of Wald's First Identity.
o Example: Let M_n = S_n^2 - n\sigma^2, with E[X_1] = 0 and E[X_1^2] = \sigma^2 < \infty. When is it true that E[M_T] = E[M_0] = 0? Effectively, we are asking when it is the case that

    E[S_T^2] = \sigma^2 E[T]

This is precisely the subject matter of Wald's Second Identity.
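Both Wald identities can be sanity-checked by simulation on the gambler's ruin example above. The following is a minimal sketch (the function name and parameters are ours, not from the lecture): Wald I predicts P(Z_T = N) = k/N and Wald II predicts E[T] = k(N - k).

```python
import random

def gamblers_ruin(k, N, trials=20000, seed=0):
    """Simulate the symmetric gambler's ruin started at wealth k with
    absorbing barriers at 0 and N.  Returns the empirical probability of
    hitting N and the empirical mean absorption time."""
    rng = random.Random(seed)
    wins, total_steps = 0, 0
    for _ in range(trials):
        z, t = k, 0
        while 0 < z < N:
            z += 1 if rng.random() < 0.5 else -1
            t += 1
        wins += (z == N)
        total_steps += t
    return wins / trials, total_steps / trials

# Theory for k = 3, N = 10: P(hit N) = 0.3 and E[T] = 3 * 7 = 21.
p_hat, t_hat = gamblers_ruin(3, 10)
```

With 20,000 trials both estimates should land well within Monte Carlo error of the Wald predictions.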
o Proposition: Let {M_n : n \ge 0} be a martingale with respect to {F_n}, and let T be a stopping time. Then for each m \ge 1,

    E[ M_{T \wedge m} ] = E[ M_0 ]

where T \wedge m = \min(T, m).
LECTURE 6 – 24th February 2011
o Proof: Let {D_i} be the martingale differences, and write

    M_{T \wedge n} = M_0 + \sum_{i=1}^{T \wedge n} D_i = M_0 + \sum_{i=1}^n D_i 1\{T \ge i\}

Take expectations:

    E[ M_{T \wedge n} ] = E[M_0] + \sum_{i=1}^n E[ D_i 1\{T \ge i\} ]

We know, however, that {T \ge i} \in F_{i-1}, so

    E[ M_{T \wedge n} ] = E[M_0] + \sum_{i=1}^n E[ E( D_i 1\{T \ge i\} | F_{i-1} ) ]
                        = E[M_0] + \sum_{i=1}^n E[ 1\{T \ge i\} E( D_i | F_{i-1} ) ]
                        = E[M_0]

As required.
Remark: Consider that
. M_{T \wedge n} \to M_T as n \to \infty. As such, E[ \lim_n M_{T \wedge n} ] = E[ M_T ].
. By the theorem above, however, E[ M_{T \wedge n} ] = E[ M_0 ], which implies that \lim_n E[ M_{T \wedge n} ] = E[ M_0 ].
Thus, every optional stopping theorem boils down to the following interchange argument: if we can make the interchange

    E[ \lim_n M_{T \wedge n} ] = \lim_n E[ M_{T \wedge n} ]

then E[ M_T ] = E[ M_0 ].
o Corollary I: Let {M_n : n \ge 0} be a martingale with respect to {F_n} and T be a stopping time such that T is bounded (in other words, there exists a K < \infty such that P(T \le K) = 1). Then E[ M_T ] = E[ M_0 ].
o Corollary II: Let |M_{T \wedge n}| \le Z for all n, with E[Z] < \infty. Then by dominated convergence, the interchange holds and E[ M_T ] = E[ M_0 ].
o Corollary III: Let {M_n : n \ge 0} be a martingale with respect to {F_n} and T be a stopping time such that E[T] < \infty. Provided the martingale differences are conditionally uniformly bounded ( E[ |D_i| | F_{i-1} ] \le C < \infty ), then E[ M_T ] = E[ M_0 ].
Remark: This is a stronger version of Wald I. There, we had essentially established the OST under the conditions E|X_1| < \infty and E[T] < \infty. In the case of a random walk, the X_i are exactly the martingale differences. This theorem is slightly stronger because here, it is enough for the conditional increments to be bounded.
Proof: We have

    M_{T \wedge n} = M_0 + \sum_{i=1}^{T \wedge n} D_i

M_0 is integrable.
Now consider that

    | \sum_{i=1}^{T \wedge n} D_i | \le \sum_{i=1}^{T \wedge n} |D_i| \le \sum_{i=1}^\infty |D_i| 1\{T \ge i\}

Let us now take expectations. By Fubini I, we can swap the expectation and sum, since the summands are positive:

    E[ \sum_{i=1}^\infty |D_i| 1\{T \ge i\} ] = \sum_{i=1}^\infty E[ |D_i| 1\{T \ge i\} ] = \sum_{i=1}^\infty E[ 1\{T \ge i\} E( |D_i| | F_{i-1} ) ]

Since the conditional martingale differences are uniformly bounded by C, this is at most C \sum_{i=1}^\infty P(T \ge i) = C E[T]. As such,

    | M_{T \wedge n} | \le |M_0| + \sum_{i=1}^\infty |D_i| 1\{T \ge i\} =: Z

M_0 is integrable, and by the statement of the theorem E[T] < \infty, so E[Z] \le E|M_0| + C E[T] < \infty. Our martingale is therefore bounded by an integrable variable, and the OST holds by Corollary II.
Martingale Limit Theory & Concentration Inequalities
o In Optional Stopping Theory, we were looking for results about random times T. We now consider very large (deterministic) times n.
o Proposition (Martingale Convergence Theorem): Let {M_n : n \ge 0} be a martingale with respect to {F_n} such that \sup_{n \ge 1} E|M_n| < \infty (note: this is a stronger condition than E|M_n| < \infty for all n, because each term could be finite and yet the supremum infinite). Then M_n \to M_\infty a.s. for some a.s. finite random variable M_\infty.
Proof: See Williams, page 109.
o Recall that M_n = M_0 + \sum_{i=1}^n D_i. This is a random-walk-like structure, even though the D_i may not be independent. We can write

    M_n / n = M_0 / n + (1/n) \sum_{i=1}^n D_i

At an intuitive level, since the D_i are uncorrelated, we might expect the SLLN to hold and the last sum to converge to 0. If we succeed in showing that, we have effectively shown that M_n / n converges to 0.
o Proposition (Martingale SLLN): Let {M_n : n \ge 0} be a martingale with respect to {F_n} such that \sup_{n \ge 1} E[D_n^2] < \infty (this, as we saw above, is the requirement for the increments to be uncorrelated). Then

    M_n / n \to 0 a.s.

Proof: Put \tilde M_n = \sum_{k=1}^n D_k / k (where, as usual, D_k = M_k - M_{k-1}). Clearly \tilde M_n \in F_n, and

    E[ \tilde M_n | F_{n-1} ] = \sum_{k=1}^{n-1} D_k / k + (1/n) E[ D_n | F_{n-1} ] = \tilde M_{n-1}

and E|\tilde M_n| < \infty for all n, due to the conditions on the second moments of D_n. Thus, it is a martingale.
Now

    E[ \tilde M_n^2 ] = E[ \sum_{k=1}^n D_k^2 / k^2 ] + E[ \sum_{i \ne j} D_i D_j / (ij) ]

The second term is equal to 0, since the D_k are uncorrelated. Thus

    E[ \tilde M_n^2 ] \le \sup_{k \ge 1} E[D_k^2] \sum_{k=1}^n 1/k^2 < \infty

This implies that \sup_{n \ge 1} E|\tilde M_n| \le \sup_{n \ge 1} \sqrt{ E[\tilde M_n^2] } < \infty. (Notice that it is often easier to work with second moments than with first moments, despite the fact that finiteness of second moments is a stronger condition than finiteness of first moments.) By the Martingale Convergence Theorem, \tilde M_n \to \tilde M_\infty a.s. Now, let

    A = \{ \omega : \sum_{k=1}^n D_k(\omega)/k converges \}, with P(A) = 1
    B = \{ \omega : (1/n) \sum_{k=1}^n D_k(\omega) \to 0 \}

By Kronecker's Lemma, A \subseteq B. Thus P(B) = 1. This proves our theorem.
Remark: When we proved the SLLN, we went to great pains to work with a first moment only. In this case, we work with second moments, which makes the proof much simpler. It is interesting to note, however, that the scaling D_k / k is "overkill"; our series would also have converged with k^{(1/2)+\delta} in place of k. Carrying the entire proof through, we obtain M_n / n^{(1/2)+\delta} \to 0 a.s. Thus, assuming more than first moments has indeed resulted in a stronger conclusion. Mapping this back to the IID world and letting S_n = \sum_{i=1}^n X_i with E[X_1^2] < \infty, we have obtained the stronger result that

    S_n / ( n^{1/2} (\log n)^{1+\delta} ) \to 0 a.s.

o Proposition (Central Limit Theorem for Martingales): Let {M_n : n \ge 0} be a martingale with respect to {F_n} and put V_n = \max_{1 \le i \le n} |D_i|. If
. \sup_n E[V_n^2] / n < \infty
. V_n / \sqrt{n} \to 0 in probability
. (1/n) \sum_{i=1}^n D_i^2 \to \sigma^2 in probability (deterministic and finite); this makes our martingale "very similar" to a random walk.
Then

    M_n / ( \sigma \sqrt{n} ) => N(0, 1)

Remark: The third condition is key here. For a random walk S_n = \sum_{i=1}^n Y_i, it is the case that Var(S_n) = n Var(Y_1). Similarly, for a martingale, Var(M_n) = Var(M_0) + \sum_{i=1}^n Var(D_i); we want to ensure these two quantities behave as similarly as possible.
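The Martingale SLLN above can be illustrated on a martingale whose increments are genuinely dependent (the scale of each step depends on the past) but conditionally mean-zero and bounded, so the proposition applies and M_n/n should vanish. A simulation sketch with illustrative names:

```python
import random

def martingale_average(n, seed=0):
    """Return M_n / n for a martingale with increments D_k = c_k * eps_k,
    where eps_k = +/-1 fair and the scale c_k is F_{k-1}-measurable (it
    depends on the sign of M_{k-1}).  Then E[D_k | F_{k-1}] = 0 and
    E[D_k^2] <= 4, so the martingale SLLN gives M_n / n -> 0 a.s."""
    rng = random.Random(seed)
    M = 0.0
    for _ in range(n):
        scale = 2.0 if M > 0 else 1.0             # depends only on the past
        eps = 1.0 if rng.random() < 0.5 else -1.0
        M += scale * eps
    return M / n

vals = [abs(martingale_average(n)) for n in (10_000, 100_000)]
```

The fluctuations of M_n/n are of order 1/sqrt(n), so both values should be small.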
o Proposition (Azuma-Hoeffding Inequality): Let {M_n : n \ge 0} be a martingale with respect to {F_n} such that |M_n - M_{n-1}| = |D_n| \le c_n for all n. Then, for all n and all \lambda > 0,

    P( |M_n - M_0| \ge \lambda ) \le 2 \exp( - \lambda^2 / ( 2 \sum_{k=1}^n c_k^2 ) )

Remark: Suppose that c_k \le c for all k. Then we can rewrite this as

    P( |M_n - M_0| \ge \lambda ) \le 2 \exp( - \lambda^2 / ( 2 c^2 n ) )
    P( |M_n - M_0| \ge x c \sqrt{n} ) \le 2 \exp( - x^2 / 2 )

Or, choosing \lambda = \sqrt{ \kappa n \log n }, we obtain

    P( |M_n - M_0| \ge \sqrt{ \kappa n \log n } ) \le 2 \exp( - \kappa \log n / (2 c^2) ) = 2 n^{ - \kappa/(2c^2) }

Choosing, for example, \kappa = 4c^2 produces a summable sequence, which can be used (via Borel-Cantelli) to obtain an almost sure result.
Proof (the proof of Azuma's Inequality was actually covered in the next lecture; for expositional purposes, we present it here instead): Define D_k = M_k - M_{k-1}. Let \theta > 0, and write

    D_k = (1/2) ( 1 - D_k / c_k ) ( -c_k ) + (1/2) ( 1 + D_k / c_k ) ( c_k )

Both terms in parentheses are non-negative (since the martingale differences are bounded by c_k) and add up to one. As such, we can use Jensen's Inequality (convexity of the exponential) to write

    e^{\theta D_k} \le (1/2) ( 1 - D_k/c_k ) e^{-\theta c_k} + (1/2) ( 1 + D_k/c_k ) e^{\theta c_k}
                  = ( e^{\theta c_k} + e^{-\theta c_k} ) / 2 + ( D_k / (2 c_k) ) ( e^{\theta c_k} - e^{-\theta c_k} )

Recall, however, that since M is a martingale, E[ D_k | F_{k-1} ] = 0. Recall, in addition, that ( e^x + e^{-x} ) / 2 \le e^{x^2/2}, which can be proved using a Taylor expansion. As such,

    E[ e^{\theta D_k} | F_{k-1} ] \le ( e^{\theta c_k} + e^{-\theta c_k} ) / 2 \le e^{ (\theta c_k)^2 / 2 }

Now, consider

    E[ e^{ \theta (M_n - M_0) } ] = E[ \prod_{k=1}^n \exp( \theta D_k ) ]
                                 = E[ E( \prod_{k=1}^n \exp( \theta D_k ) | F_{n-1} ) ]
                                 = E[ \prod_{k=1}^{n-1} \exp( \theta D_k ) E( e^{\theta D_n} | F_{n-1} ) ]
                                 \le e^{ (\theta c_n)^2 / 2 } E[ \prod_{k=1}^{n-1} \exp( \theta D_k ) ]

Applying this process repeatedly, we get

    E[ e^{ \theta (M_n - M_0) } ] \le \exp( \sum_{k=1}^n (\theta c_k)^2 / 2 )

However, by Markov's inequality,

    P( M_n - M_0 \ge \lambda ) = P( e^{ \theta (M_n - M_0) } \ge e^{\theta \lambda} )
                               \le E[ e^{ \theta (M_n - M_0) } ] e^{-\theta \lambda}
                               \le \exp( (\theta^2 / 2) \sum_{k=1}^n c_k^2 - \theta \lambda )

Optimizing over \theta to make the bound as tight as possible, we find \theta^* = \lambda / \sum_{i=1}^n c_i^2. Substituting back into the equation, we obtain the required result (the two-sided bound follows by applying the same argument to -M_n).
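As a quick sanity check on the inequality, one can compare the empirical tail of a +/-1 random walk (a martingale with c_k = 1) against the bound 2 exp(-lambda^2/(2n)). A sketch with illustrative names and parameters:

```python
import math
import random

def azuma_check(n=200, lam=30.0, trials=5000, seed=0):
    """Empirical tail P(|M_n| >= lam) for a fair +/-1 random walk
    (martingale differences bounded by c_k = 1), next to the
    Azuma-Hoeffding bound 2 * exp(-lam^2 / (2n))."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        M = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        exceed += (abs(M) >= lam)
    bound = 2.0 * math.exp(-lam ** 2 / (2.0 * n))
    return exceed / trials, bound

emp, bnd = azuma_check()
```

The empirical tail is well below the bound; Azuma-Hoeffding is not tight for the simple walk, but it holds with no distributional assumptions beyond bounded increments.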
o Example: Suppose the X_i are IID uniform random variables taking values in the set A = {1, ..., N}. Let B = (b_1, ..., b_k), with b_i \in A and k << n. Now, draw n IID random copies of X, and write X_1^n = (X_1, ..., X_n). We wonder how many times the specific sequence B will appear (as a contiguous block) in X_1^n. We will call this number R_{n,k}. Finding the distribution of this is difficult. However,

    E[ R_{n,k} ] = ( n - k + 1 ) (1/N)^k

(the number of ways we can "place" B on X_1^n by "sliding" it up and down, times the probability of finding B at a particular position). Let M_0 = E[ R_{n,k} ], and let

    M_i = E[ R_{n,k} | F_i ],  F_i = \sigma( X_1, ..., X_i )

This is then a martingale, and M_n = R_{n,k}. But consider that since each X_i can be part of at most k placements of B,

    | M_i - M_{i-1} | = | E[ R_{n,k} | F_i ] - E[ R_{n,k} | F_{i-1} ] | \le k

Using the A-H inequality, we then have

    P( | M_n - M_0 | \ge \lambda ) \le 2 \exp( - \lambda^2 / ( 2 n k^2 ) )
    P( | R_{n,k} - E[R_{n,k}] | \ge \lambda ) \le 2 \exp( - \lambda^2 / ( 2 n k^2 ) )
    P( | R_{n,k} - E[R_{n,k}] | \ge x k \sqrt{n} ) \le 2 \exp( - x^2 / 2 )

We can then construct a confidence interval

    R_{n,k} \in [ E[R_{n,k}] - c_\alpha k \sqrt{n} , E[R_{n,k}] + c_\alpha k \sqrt{n} ]

where c_\alpha is chosen to make the probability of the interval \ge 1 - \alpha.
o Example: Let {X_n : n \ge 0} be an irreducible, recurrent Markov Chain (MC) with transition matrix P. Then, the only bounded solution to Pf = f is a constant vector (a multiple of 1). (Note: the state space could be infinite.)
Remark: We can think of the vector f as a function f : S \to R, where S is the set of states of the chain. Note as well that this is not the steady-state equation, which satisfies \pi = \pi P.
Proof: Let M_n = f(X_n) for an f that satisfies the assumption of the proposition. The main question here is one of uniqueness, since it is clear that a constant vector solves Pf = f. Let us first show that M_n is a martingale:
. E|M_n| = E|f(X_n)| \le c, since f is bounded.
. M_n \in F_n = \sigma( X_1, ..., X_n ) (in fact, it only depends on the last coordinate X_n).
. E[ M_n | F_{n-1} ] = E[ f(X_n) | X_{n-1} ] = (Pf)(X_{n-1}) = f(X_{n-1}) = M_{n-1}
As such, {M_n : n \ge 0} is a martingale with respect to {F_n}. Note that \sup_{n \ge 1} E|M_n| < \infty, because we have assumed that f is bounded. Thus, M_n \to M_\infty almost surely. Suppose there exist x, y \in S such that

    f(x) < f(y)

Since {X_n : n \ge 0} is irreducible and recurrent, it visits both x and y infinitely often, so

    \liminf_n f(X_n) \le f(x) a.s.
    \limsup_n f(X_n) \ge f(y) a.s.

This means, however, that \liminf_n f(X_n) \ne \limsup_n f(X_n), which contradicts the convergence statement.
LECTURE 7 – 2nd March 2011
Stochastic stability (some parts of this topic were covered at the end of the previous lecture; for expositional purposes, we present the material here instead)
o Deterministic motivation: Consider a dynamical system X(t) for which dX(t)/dt = f(X(t)), with f : R \to R (this can be thought of as the "equation of motion" of the system). Now, consider an "energy" function g : R \to R_+, with

    d g(X(t)) / dt \le -\epsilon,  \epsilon > 0

This is effectively a statement of the fact that the energy of the system is "forced" down to 0, since it is "constantly decreasing". As such, there exists t_0(x, \epsilon) such that g(X(t)) = 0 for all t \ge t_0; in other words, our dynamical process is "pushed" towards a "stable" state.
o We now need to adapt this idea to a discrete stochastic process. We can write our "equation of motion" as X_{n+1} - X_n = f(X_n). To add stochasticity, we can write X_{n+1} = \psi( X_n, \epsilon_{n+1} ), where the \epsilon_i are random variables. This is effectively the definition of a Markov process, provided \epsilon_{n+1} is independent of the history of the process. If we make the \epsilon_i IID, the Markov process becomes time-homogeneous. Now consider the "energy" function; it would be too strong to ask for the energy of the process to decrease along every path.
We therefore require it to decrease in expectation:

    E[ g(X_{n+1}) - g(X_n) | X_n ] \le -\epsilon

Since we need this to be true whatever state our process starts in, we can write E_x[ \cdot ] = E[ \cdot | X_0 = x ], and the condition above becomes

    E_x[ g(X_1) ] - g(x) \le -\epsilon

o We now specialize this to a particular stochastic process. Consider a Markov chain {X_n : n \ge 0} that is irreducible. We would like to know whether the chain has a steady state. For a finite state space, all we need is to check for solutions to \pi = \pi P, \pi \cdot 1 = 1; indeed, the existence of such a \pi is associated with positive recurrence. If the state space is countably infinite, however, things get slightly more complicated; the two equations above do not suffice, and we also require \pi \ge 0, which makes things more complicated. Here, we attempt to find simpler conditions for stability to hold.
o Proposition: Let {X_n : n \ge 0} be an irreducible Markov chain on a countable state space S. Let K \subseteq S be a set containing a finite number of states. Then, if there exists a function g : S \to R_+ such that (recall E_x[\cdot] = E[\cdot | X_0 = x])

    E_x[ g(X_1) ] - g(x) \le -\epsilon  for all x \in K^c and some \epsilon > 0
    E_x[ g(X_1) ] < \infty  for all x \in K

then {X_n : n \ge 0} is positive recurrent.
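Before turning to the proof, here is a quick simulation illustration of a drift condition at work, on the linear chain X_{n+1} = a X_n + Z_{n+1} analyzed at the end of this section. With |a| < 1 and bounded noise, g(x) = |x| has negative drift outside a bounded set, and paths started far from the origin are pulled back and stay stochastically bounded. A sketch (names and parameters are ours):

```python
import random

def ar1_path(a, n, x0=50.0, seed=0):
    """Sample path of X_{n+1} = a*X_n + Z_{n+1} with Z uniform on [-1, 1].
    For |a| < 1 the drift E|aX + Z| - |X| <= -(1-a)|X| + E|Z| is negative
    outside a bounded set, so the chain returns toward the origin."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n):
        x = a * x + rng.uniform(-1.0, 1.0)
        path.append(x)
    return path

path = ar1_path(0.8, 2000)
tail = path[200:]   # after the initial transient has died out
```

Once the transient decays, |X_n| is bounded by the geometric series 1/(1 - a) = 5 of noise contributions.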
o Proof: Let g be as in the proposition, and fix x \in S. Construct

    M_n = g(X_n) - g(x) - \sum_{k=0}^{n-1} (Ag)(X_k)

where (Ag)(y) = E_y[ g(X_1) ] - g(y). Telescoping g(X_n) - g(X_0) under E_x (so that g(X_0) = g(x)),

    M_n = \sum_{k=0}^{n-1} ( g(X_{k+1}) - E[ g(X_{k+1}) | X_k ] ) = \sum_{k=1}^n D_k

with D_k = g(X_k) - E[ g(X_k) | X_{k-1} ]. Now, defining F_n = \sigma( X_0, ..., X_n ), we have that

    E[ D_k | F_{k-1} ] = E[ g(X_k) | X_{k-1} ] - E[ g(X_k) | X_{k-1} ] = 0

and so the D_k are martingale differences and M_n is a martingale. Now, consider the set K \subseteq S in the proposition. Let T_K = \inf\{ n \ge 0 : X_n \in K \}, and fix m. T_K \wedge m is a bounded stopping time, so we can apply the OST:

    E_x[ M_{T_K \wedge m} ] = E_x[ M_0 ] = 0

By plugging this into the definition of the martingale M_n,

    E_x[ g( X_{T_K \wedge m} ) ] - g(x) - E_x[ \sum_{k=0}^{(T_K \wedge m) - 1} (Ag)(X_k) ] = 0

If x \in K^c, then up to the hitting time all the summands are evaluated at states outside K, and are therefore less than or equal to -\epsilon. As such,

    E_x[ \sum_{k=0}^{(T_K \wedge m) - 1} (Ag)(X_k) ] \le -\epsilon E_x[ T_K \wedge m ]

Combining our two results, and using g \ge 0:

    0 \le E_x[ g( X_{T_K \wedge m} ) ] = g(x) + E_x[ \sum_{k=0}^{(T_K \wedge m)-1} (Ag)(X_k) ] \le g(x) - \epsilon E_x[ T_K \wedge m ]

And so E_x[ T_K \wedge m ] \le g(x)/\epsilon. But 0 \le T_K \wedge m \uparrow T_K, so by monotone convergence,

    E_x[ T_K ] \le g(x)/\epsilon < \infty  for all x \in K^c

There is therefore a finite upper bound on the expectation; the mean return time to the finite set K is finite, which yields positive recurrence.
o "Application": Consider the stochastic system X_{n+1} = a X_n + Z_{n+1} with {Z_n} IID and X_0 = x. We want conditions on a and the distribution of Z such that X_n is stable. Use g(x) = |x|:

    E_x[ g(X_1) ] = E| a x + Z_1 | \le |a| |x| + E|Z_1|

So if E|Z_1| < \infty and |a| < 1, then we can make this work, because we would then have

    E_x[ g(X_1) ] - g(x) \le -( 1 - |a| ) |x| + E|Z_1| \le -\epsilon  provided |x| \ge ( E|Z_1| + \epsilon ) / ( 1 - |a| )

Queuing
Sample path methods
o We will consider a simple queuing model:
. One single buffer, with unlimited space.
. FIFO (first-in-first-out) service.
. A single server which processes work at unit rate.
This model is often called the G/G/1 queuing model; arrival times are arbitrarily distributed, workloads are arbitrarily distributed (neither necessarily independent) and there is a single server.
o The variability in the system comes from the flow of work into the system. The points in our sample space are \omega = \{ (t_n, S_n) : n \in Z \}, with S_n > 0 and both S_n and t_n finite. A single point in the sample space is an infinite set of these pairs.
Here is a pictorial representation of a single point \omega \in \Omega in our sample space:
[Figure: workloads S_1, ..., S_6 shown as vertical bars of varying heights at the arrival times t_1, ..., t_6 on the time axis.]
o We assume the flow is stationary; in other words, \theta_t \omega =_d \omega for any \omega \in \Omega, where \theta_t shifts every point in time: \theta_t (t_n, S_n) = ( t_n + t, S_n ).
o Definition (load): We define the load of the system as

    \rho = E[ \sum_{n : t_n \in [0,1]} S_n ]

The positioning of the interval [0,1] does not matter, since the flow is stationary. If the incoming load is greater than 1, the system will eventually saturate; if it is less than 1, it will rest sometimes. The case \rho = 1 needs separate analysis.
o We also assume the flows are ergodic, so that

    ( \sum_{n : t_n \in [s,t]} S_n ) / ( t - s ) \to \rho  as s \to -\infty or t \to \infty

o Let W_t(\omega) denote the work in the system at time t for a point \omega in the sample space. We will simply denote this as W_t. The time between "empty times" at the server is called a busy cycle.
o Now, starting with load a at time 0,

    W_t = a                                              t = 0
    W_t = ( a - t )^+                                    0 \le t \le t_1
    W_t = ( a - t_1 )^+ + S_1                            t = t_1
    W_t = ( W_{t_n} - ( t - t_n ) )^+                    t \in [ t_n, t_{n+1} )
    W_t = ( W_{t_n} - ( t_{n+1} - t_n ) )^+ + S_{n+1}    t = t_{n+1}

These are called Lindley's Equations.
o Set s < t and \omega \in \Omega. Define X_t^s(\omega) to be the work in the system at time t given the system is empty at time s. For the point \omega depicted above (and assuming s < t_1), X_t^s would look like this:
[Figure: the sawtooth path of X_t^s, jumping by S_n at each arrival t_n and draining at unit rate in between.]
o For s' < s, it is clear that

    X_t^{s'}( \omega ) \ge X_t^s( \omega )

Because this variable is non-decreasing as s decreases, X_t^*( \omega ) = \lim_{s \to -\infty} X_t^s( \omega ) exists. This is the "steady state" of the system: what we would observe if we had started the system a very, very long time in the past.
o Now, define a time-shift operator \theta_\tau so that X_t^s( \theta_\tau \omega ) = X_{t+\tau}^{s+\tau}( \omega ). As such, we have

    \lim_{s \to -\infty} X_t^s( \theta_\tau \omega ) = \lim_{s \to -\infty} X_{t+\tau}^{s+\tau}( \omega ) = X_{t+\tau}^*( \omega ) = X_t^*( \theta_\tau \omega )

As such, X_t^*( \theta_\tau \omega ) = X_{t+\tau}^*( \omega ); shift invariance also holds for the starred process.
LECTURE 8 – 23rd March 2011
o We have seen a number of properties of X_t^*, including the fact that it exists.
It might, however, be equal to infinity. We now look for conditions under which X_t^* is finite.
o Proposition: If \rho < 1, then X_t^* < \infty a.s. for all t.
Proof: Fix t \in R and \omega \in \Omega and define

    T_t^s( \omega ) = \sup\{ \tau < t : X_\tau^s( \omega ) = 0 \}

Intuitively, this is the last empty time before time t (this is clearly not a stopping time; just a random time):
[Figure: the path X_t^s between s and t, with the last empty time T_t^s marked.]
Now, consider that (dropping the argument of X_t^s for simplicity)

    X_t^s = X_{T_t^s}^s + \sum_{n : t_n \in [T_t^s, t]} S_n - ( t - T_t^s )

(work in the system at T_t^s, plus work entering the system in [T_t^s, t], minus work processed in [T_t^s, t]; the server is busy throughout that interval). Since X_{T_t^s}^s = 0,

    X_t^s = \sum_{n : t_n \in [T_t^s, t]} S_n - ( t - T_t^s ) \le \sum_{n : t_n \in [T_t^s, t]} S_n

Now clearly, for s' < s, T_t^{s'} \le T_t^s; in other words, T_t^s decreases as s decreases. This is because by moving s (an empty time) further into the past, we are potentially introducing more work into the system. This implies that T_t^{-\infty} = \lim_{s \to -\infty} T_t^s exists. Let us assume that T_t^{-\infty} > -\infty. This implies that there is some finite time in the past at which the system was empty, which implies that the amount of work at t, X_t^*, is finite.
All we therefore need to show, therefore, is that \rho < 1 implies T_t^{-\infty} > -\infty a.s. We do this by contradiction; suppose \rho < 1 and T_t^{-\infty} = -\infty on a set of sample paths of positive probability. From the identity above, we have

    X_t^s = \sum_{n : t_n \in [T_t^s, t]} S_n - ( t - T_t^s )

Note, however, that X_t^s \ge 0 (there can never be negative work in the system), and so, dividing by t - T_t^s throughout, we obtain

    ( \sum_{n : t_n \in [T_t^s, t]} S_n ) / ( t - T_t^s ) \ge 1

Note that if we were to replace T_t^s with s, the LHS of this expression would form a sequence with limit \rho as s \to -\infty. By the definition of a limit, however, this is also true for any subsequence of that sequence. But since we have assumed T_t^s \to -\infty as s \to -\infty, the LHS above is such a subsequence.
Thus, letting s \to -\infty, we find

    ( \sum_{n : t_n \in [T_t^s, t]} S_n ) / ( t - T_t^s ) \to \rho

even though each term of the sequence is at least 1. This is a contradiction, since we have assumed \rho < 1. As such, T_t^{-\infty} > -\infty a.s. and X_t^* < \infty a.s.
o Definition (Coupling Time): Let W_{t,\tau}^a( \omega ) denote the load in a system, at time t and along sample path \omega, given that the system is allowed to start with load a at time \tau. As before, let X_t^*( \omega ) denote the load in the system at time t given that the system started empty at t = -\infty (the "steady state"). Then T_0, called the coupling time, is the first time at which both paths "couple"; past that point, the two paths behave identically.
[Figure: the transient path W_{t,\tau}^a starting at load a above the steady-state path X_t^*; the two paths merge at time T_0 and coincide thereafter.]
o Proposition: If \rho < 1, then T_0 < \infty a.s.
Proof: WLOG, assume \tau = 0 and denote W_t^a = W_{t,0}^a. Also assume, WLOG, that a > X_0^*( \omega ) (i.e. the transient process starts higher than the steady-state process, as pictured above). In this case, the diagram above should make it clear that the coupling time is the first time at which both processes are empty. Since the transient process is more loaded than the steady-state process, this is effectively the time at which the transient process first empties. In other words,

    T_0 = \inf\{ t > 0 : W_t^a = 0 \}

Assume \rho < 1 and suppose T_0 = \infty on a set of nonzero probability. This implies that W_t^a > 0 for all t \ge 0 and that the server is constantly working for all t > 0. As such,

    0 \le W_t^a = a + \sum_{n : t_n \in [0,t]} S_n - t

(work at time 0, plus new work in [0, t], minus work processed in [0, t]). Dividing by t and re-arranging, we obtain

    1 \le a/t + ( \sum_{n : t_n \in [0,t]} S_n ) / t \to \rho  as t \to \infty

This is a contradiction, and so T_0 < \infty a.s.
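The coupling argument can be watched happening in simulation: run the same arrival stream through two workload processes, one started with a positive initial load and one started empty. The heavier path dominates until it first empties, after which the two paths agree forever. A sketch, assuming Poisson arrivals and exponential workloads purely for concreteness (names are ours):

```python
import random

def workloads_at_arrivals(a, taus, sizes):
    """Workload of a unit-rate single-server queue seen just before each
    arrival, started with load a at time 0: the work drains at rate 1
    between arrivals and jumps by the job size at each arrival."""
    w, out = a, []
    for tau, s in zip(taus, sizes):
        w = max(w - tau, 0.0)   # drain over the inter-arrival gap
        out.append(w)           # workload the arriving job finds
        w += s                  # the job deposits its work
    return out

rng = random.Random(42)
taus = [rng.expovariate(1.0) for _ in range(2000)]   # arrival rate 1
sizes = [rng.expovariate(2.0) for _ in range(2000)]  # mean size 1/2, rho = 1/2
w_hi = workloads_at_arrivals(5.0, taus, sizes)   # transient start
w_lo = workloads_at_arrivals(0.0, taus, sizes)   # started empty

# first arrival index at which the two paths have coupled
couple = next((i for i, (h, l) in enumerate(zip(w_hi, w_lo)) if h == l), None)
```

The two paths first agree when the heavier one hits zero, and from that index on they are identical, which is exactly the coupling event in the proof.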
To show this formally, consider that

    \hat p_t( A ) = (1/t) \int_0^{T_0} 1\{ W_s^a \in A \} ds + (1/t) \int_{T_0}^t 1\{ W_s^a \in A \} ds
                  \le T_0 / t + (1/t) \int_{T_0}^t 1\{ X_s^* \in A \} ds
                  \to 0 + P( X_t^* \in A )

We can bound the estimator the other way by simply ignoring the first term. Either way, we find that \hat p_t( A ) \to P( X_t^* \in A ) a.s. as t \to \infty. Our estimator is therefore consistent. Intuitively, this is because the chains eventually couple.
o We now ask how fast this convergence occurs.
o Proposition: If \rho < 1,

    \sup_{A_1, ..., A_n} | P( W_{t_1+t}^a \in A_1, ..., W_{t_n+t}^a \in A_n ) - P( X_{t_1}^* \in A_1, ..., X_{t_n}^* \in A_n ) | \to 0  as t \to \infty

where A_1, ..., A_n are measurable sets, for all ( t_1, ..., t_n ) and all a > 0. This is called convergence in total variation and is stronger than weak convergence.
Proof: We prove the theorem when n = 2; this generalizes straightforwardly. Fix a > 0, t_1, t_2, t > 0 and sets A_1 and A_2. Now, using shift invariance of the starred process,

    P( W_{t_1+t}^a \in A_1, W_{t_2+t}^a \in A_2 ) - P( X_{t_1}^* \in A_1, X_{t_2}^* \in A_2 )
        = P( W_{t_1+t}^a \in A_1, W_{t_2+t}^a \in A_2 ) - P( X_{t_1+t}^* \in A_1, X_{t_2+t}^* \in A_2 )

Calling the first event \Gamma_t and the second \Gamma_t^*, we can write this as

    P( \Gamma_t, T_0 > t ) + P( \Gamma_t, T_0 \le t ) - P( \Gamma_t^*, T_0 > t ) - P( \Gamma_t^*, T_0 \le t )

Note, however, that if T_0 \le t, the steady-state process is identical to the transient process from time t onwards. As such, the second and fourth terms cancel, and we can write this as

    P( \Gamma_t, T_0 > t ) - P( \Gamma_t^*, T_0 > t ) \le \max\{ P( \Gamma_t, T_0 > t ), P( \Gamma_t^*, T_0 > t ) \} \le P( T_0 > t ) \to 0

where the last step follows from the fact that T_0 < \infty a.s. if \rho < 1. Since this bound does not depend on the sets A_1, A_2, we can take a supremum over all such sets and obtain our required result.
o We can compare this to what happens in Markov chains. Consider a finite-state, irreducible Markov chain, and consider the quantity

    \max_{(i,j)} | P( X_n = i | X_0 = j ) - \pi_i | \le \max_j P_j( T > n ) \le \max_j E_j[ e^{\theta T} ] e^{-\theta n }

Here, the first probability is taken with respect to a Markov chain that starts in an arbitrary state j (the transient process), \pi is the steady state, and T is the corresponding coupling time.
Since the exponential moments of T are finite (see homework 2), this implies that this difference falls exponentially fast.
o Proposition: If \rho > 1, then for all a > 0, \liminf_{t \to \infty} W_t^a / t > 0 almost surely. In other words, the workload increases linearly with time.
Proof: Denote the cumulative idle time up to time t as I_t. We then have

    W_t^a = a + \sum_{n : t_n \in [0,t]} S_n - ( t - I_t ) \ge \sum_{n : t_n \in [0,t]} S_n - t

Dividing by t,

    W_t^a / t \ge ( \sum_{n : t_n \in [0,t]} S_n ) / t - 1

Letting t \to \infty, we get that

    \liminf_{t \to \infty} W_t^a / t \ge \rho - 1 > 0

This is intuitively sensible: the amount that accumulates in the system is whatever load there is in excess of 1.
Little's Law (Conservation Laws)
o We now consider a setting in which work ( t_n, S_n : n \in N ) enters a system and then leaves the system at times ( d_n : n \in N ). We let q_n be the sojourn time of the nth job in the system, given by q_n = d_n - t_n.
[Figure: jobs (t_n, S_n) entering a "System" box and departing at times d_n, with q_n = d_n - t_n.]
We do not put any constraints on the working of the system. For example, the server could spend some of its time idling. In particular, we do not assume that the server processes work on a FIFO basis, so it is not necessarily the case that d_1 < d_2 < ... . We define the following quantities:

    A(t) = \sup\{ n : t_n \le t \} = number of arrivals in [0, t]
    D(t) = number of departures in [0, t]
    N(t) = A(t) - D(t) = number of jobs in the system at time t

The only two assumptions we make are that the following two statements are true for every sample path:
. \lim_{t \to \infty} A(t)/t = \lambda \in (0, \infty)
. \lim_{n \to \infty} (1/n) \sum_{k=1}^n q_k = q \in (0, \infty)
o Proposition: Under these assumptions,

    (1/t) \int_0^t N(s) ds \to \lambda q

Intuitively, this states that the average number of jobs in the system at any given time is equal to the average number of arrivals per unit time multiplied by the average sojourn time.
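Before the proof, note that over a finite horizon by which the system has emptied, the relation is an exact sample-path identity: the time integral of N(s) equals the total sojourn time accrued, so (1/t) times the integral equals (A(t)/t) times the mean sojourn. A toy check with hand-picked arrival and service times (ours, purely for illustration), assuming a single FIFO server:

```python
def sojourn_times(arrivals, services):
    """Sojourn time of each job in a FIFO single-server queue: a job
    starts service at max(its arrival, previous departure)."""
    d, q = 0.0, []
    for t, s in zip(arrivals, services):
        d = max(t, d) + s
        q.append(d - t)
    return q

def time_average_in_system(arrivals, services, horizon):
    """(1/t) * integral_0^t N(s) ds, computed as the total time jobs
    spend in the system during [0, horizon]."""
    d, total = 0.0, 0.0
    for t, s in zip(arrivals, services):
        d = max(t, d) + s
        total += min(d, horizon) - t   # time job spends in [0, horizon]
    return total / horizon

arrivals = [0.5, 1.0, 2.5, 4.0, 6.0]
services = [1.0, 0.4, 0.3, 1.5, 0.2]
horizon = 10.0                       # all departures occur before this
q = sojourn_times(arrivals, services)
lam_hat = len(arrivals) / horizon    # empirical arrival rate A(t)/t
q_bar = sum(q) / len(q)              # empirical mean sojourn time
lhs = time_average_in_system(arrivals, services, horizon)
```

Here lhs and lam_hat * q_bar agree exactly, because every job has departed by the horizon so no boundary terms are cut off.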
Proof:

    \int_0^t N(s) ds = \int_0^t ( A(s) - D(s) ) ds

Note that we can write

    A(s) = \sum_n 1\{ t_n \le s \}
    D(s) = \sum_n 1\{ d_n \le s \}

As such,

    N(s) = \sum_n ( 1\{ t_n \le s \} - 1\{ d_n \le s \} ) = \sum_n 1\{ t_n \le s \le d_n \}

(The last equality follows because any jobs that arrive after s won't be counted at all, and any jobs that arrive and leave before s will be counted by both indicators and therefore cancel out.) By swapping the summation and integration (valid by Fubini), we obtain

    \int_0^t N(s) ds = \sum_n \int_0^t 1\{ t_n \le s \le d_n \} ds = \sum_n \{ amount of time job n was in the system during [0, t] \}

We can bound this above by considering the sojourn times of all arrivals up to and including time t (though some of them may overrun past t) and bound it below by considering the sojourn times of all jobs that depart before time t (even though some jobs that leave after t do spend some time in the system before t). This gives

    \sum_{i=0}^{D(t)} q_{n_i} \le \int_0^t N(s) ds \le \sum_{n=0}^{A(t)} q_n

where n_i is the index of the ith job to leave the system (since we have not assumed a FIFO processing discipline, we cannot assume that n_i = i). Before we continue, we will need the following claim.
Claim: Under the two assumptions above,

    q_n / t_n \to 0  as n \to \infty

Intuitively, the second assumption states that q_n grows slower than n, so we would expect this to be true.
Proof: Consider that

    q_n / n = (1/n) \sum_{k=1}^n q_k - ( (n-1)/n ) (1/(n-1)) \sum_{k=1}^{n-1} q_k \to q - q = 0

Consider also that

    n / t_n = A(t_n) / t_n \to \lambda  as t_n \to \infty

Finally,

    q_n / t_n = ( q_n / n ) ( n / t_n ) \to 0 \cdot \lambda = 0

As required.
Now, by our claim, for all \epsilon > 0, there exists an N(\epsilon) such that for n > N(\epsilon),

    q_n / t_n = ( d_n - t_n ) / t_n \le \epsilon

This implies that d_n \le ( 1 + \epsilon ) t_n for all n > N(\epsilon). This means that all jobs after the N(\epsilon)th job that arrive in [0, t] will have departed by time t( 1 + \epsilon ).
Therefore

    \sum_{n = N(\epsilon)+1}^{A(t)} q_n \le \sum_{i=0}^{D( t(1+\epsilon) )} q_{n_i}

Putting this together with the bounds developed above, we find that

    ( 1/(t(1+\epsilon)) ) \sum_{n=N(\epsilon)+1}^{A(t)} q_n \le ( 1/(t(1+\epsilon)) ) \int_0^{t(1+\epsilon)} N(s) ds \le ( 1/(t(1+\epsilon)) ) \sum_{n=1}^{A(t(1+\epsilon))} q_n

Let's first consider the upper bound:

    ( 1/(t(1+\epsilon)) ) \sum_{n=1}^{A(t(1+\epsilon))} q_n = ( A(t(1+\epsilon)) / (t(1+\epsilon)) ) ( 1/A(t(1+\epsilon)) ) \sum_{n=1}^{A(t(1+\epsilon))} q_n \to \lambda q

The first factor tends to \lambda, by assumption 1. The second factor tends to q because
. By assumption 1, A(t) \to \infty as t \to \infty.
. By assumption 2, (1/n) \sum_{k=1}^n q_k \to q, which means any subsequence thereof also tends to q. Since A( t(1+\epsilon) ) \to \infty, the second factor above is precisely such a subsequence.
This implies that

    \limsup_{t \to \infty} (1/t) \int_0^t N(s) ds \le \lambda q

Now the lower bound:

    ( 1/(t(1+\epsilon)) ) \sum_{n=N(\epsilon)+1}^{A(t)} q_n = ( 1/(1+\epsilon) ) ( A(t)/t ) ( (1/A(t)) \sum_{n=0}^{A(t)} q_n - (1/A(t)) \sum_{n=0}^{N(\epsilon)} q_n ) \to \lambda q / ( 1 + \epsilon )

Again, the first factor A(t)/t tends to \lambda by assumption 1. The bracketed term tends to q by a similar logic as above, noting that since N(\epsilon) < \infty, the second sum in brackets, divided by A(t), tends to 0. As such, we get

    \liminf_{t \to \infty} (1/t) \int_0^t N(s) ds \ge \lambda q / ( 1 + \epsilon )

Since this is true for all \epsilon > 0, this, together with the lim sup above, proves our theorem.
LECTURE 9 – 25th March 2011
Single-server queue; IID case
o We now specialize our analysis to a situation in which the workloads and inter-arrival times are IID. Letting \tau_n = t_n - t_{n-1} be the time gap before the nth job arrives, this situation requires the {S_n} and {\tau_n} to be IID. There is, once again, only one server. We often denote this situation GI/GI/1.
o We also denote by w_n the time that the nth job has to wait in queue before it is served. In that respect, d_n = t_n + w_n + S_n.
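These waiting times can be computed two ways: by direct FIFO bookkeeping from the definitions, or by the one-dimensional recursion w_{n+1} = (w_n + S_n - tau_{n+1})^+ derived next; the two agree path by path. A sketch with illustrative names:

```python
import random

def waits_recursion(S, tau):
    """FIFO waiting times via the reflected (Lindley-type) recursion
    w_{n+1} = (w_n + S_n - tau_{n+1})^+, with w_0 = 0 (job 0 finds an
    empty system).  tau[n] is the gap between arrivals n and n+1."""
    w = [0.0]
    for n in range(len(tau)):
        w.append(max(0.0, w[-1] + S[n] - tau[n]))
    return w

def waits_direct(S, tau):
    """The same quantities from arrival/departure bookkeeping:
    start_n = max(t_n, d_{n-1}), w_n = start_n - t_n, d_n = start_n + S_n."""
    t, d, w = 0.0, 0.0, []
    for n, s in enumerate(S):
        start = max(t, d)
        w.append(start - t)
        d = start + s
        if n < len(tau):
            t += tau[n]
    return w

rng = random.Random(0)
S = [rng.expovariate(1.2) for _ in range(500)]    # mean workload < mean gap
tau = [rng.expovariate(1.0) for _ in range(499)]
w1 = waits_recursion(S, tau)
w2 = waits_direct(S, tau)
```

The agreement of the two computations is exactly the content of the recursion derivation.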
Now, consider that

    w_{n+1} = 0  if d_n \le t_{n+1},  and  w_{n+1} = d_n - t_{n+1}  if d_n > t_{n+1}

i.e. w_{n+1} = ( d_n - t_{n+1} )^+. As such,

    w_{n+1} = ( w_n + S_n - \tau_{n+1} )^+

Let Z_n = S_{n-1} - \tau_n. We then have

    w_{n+1} = ( w_n + Z_{n+1} )^+ = \max( 0, w_n + Z_{n+1} )

Since the Z_n are IID random variables, this is none other than a random walk "capped off" at the origin. Letting s_n = \sum_{j=1}^n Z_j (with s_0 = 0), we find that

    w_1 = \max( Z_1, 0 ) = \max( s_1, 0 )
    w_2 = ( w_1 + Z_2 )^+ = \max( 0, s_2 - s_1, s_2 ) = s_2 - \min_{0 \le k \le 2} s_k

In fact, carrying this analysis forwards, we find that

    w_n = s_n - \min_{0 \le k \le n} s_k

The second term takes into account the fact that we have a reflected random walk, and prevents the walk from going negative.
o One thing the GI/GI/1 framework gives us over the G/G/1 framework is that we can now say something more about the distribution of the waiting times:

    w_n = s_n - \min_{0 \le k \le n} s_k = \max_{0 \le k \le n} ( s_n - s_k ) = \max_{0 \le k \le n} \sum_{j=k+1}^n Z_j

Consider, however, that the Z_j are IID. As such, we can change indices on the Z at will and still maintain equality in distribution. Therefore

    w_n =_d \max_{0 \le k \le n} \sum_{j=1}^k Z_j = \max_{0 \le k \le n} s_k = M_n

Since M_n is a non-decreasing sequence, M_n \uparrow M_\infty = \max_{k \ge 0} s_k a.s. As such, w_n => M_\infty in distribution as n \to \infty. [Note: convergence in distribution is the best we can hope for in this case, because M_n has some structure (the fact it's non-decreasing) that w_n does not. However, since this is a Markov chain, a stationary distribution is all we could really want.]
o If the random walk has positive drift, in other words if E[Z_n] = E[S_{n-1}] - E[\tau_n] > 0, then the random walk drifts to infinity and the waiting times get infinitely large. This is consistent with our findings in the G/G/1 queue, since E[\tau_{n+1}] > E[S_n] corresponds to \rho < 1.
On the other hand, if the random walk has negative drift, the chain is stable and the waiting times return to 0 infinitely often almost surely (we motivated this result in homework 2 using a simpler reflected random walk).

The single-server M/M/1 Markovian queue
o We now consider the most tractable of all single-server queue models: the M/M/1 queue. In that case, we assume the {S_n} are IID and exponentially distributed with parameter μ, whereas the {τ_n} are IID and exponentially distributed with parameter λ.
o Consider the process
  X(t) = number of jobs in the system at time t ≥ 0
{X(t) : t ≥ 0} is a continuous-time Markov chain (CTMC) with countable state space. In fact, it is a birth-and-death process. [Transition-rate diagram: states 0, 1, 2, …, with up-transitions at rate λ and down-transitions at rate μ.]
We now proceed to analyze this CTMC (notation: f(h) = o(h) means lim_{h→0} f(h)/h = 0). Consider t > 0 and a small h > 0. We have
  P(X(t+h) = j | X(t) = i) = λh + o(h)            if j = i + 1
                           = μh + o(h)            if j = i − 1
                           = 1 − (λ+μ)h + o(h)    if j = i
                           = o(h)                 otherwise
Our transition matrix P_h then takes the form
  P_h = [ 1−λh    λh          0           0    ⋯
          μh      1−(λ+μ)h    λh          0    ⋯
          0       μh          1−(λ+μ)h    λh   ⋯
          ⋮                   μh          ⋱      ]
We can write this as
  P_h = I + [ −λ    λ         0        ⋯
              μ     −(λ+μ)    λ        ⋯
              0     μ         −(λ+μ)   ⋯
              ⋮               μ        ⋱  ] h + o(h)
  P_h = I + Qh + o(h)
Q is the rate matrix, defined by Q = lim_{h→0} (P_h − I)/h.
The steady-state equations in this case are
  π P_h = π,  π ≥ 0,  π · e = 1
Feeding our expression for P_h into the first equation, we obtain
  π + πQh + o(h) = π  ⟹  πQ = 0
For our particular matrix, this gives
  −λπ_0 + μπ_1 = 0
  λπ_{n−1} − (μ+λ)π_n + μπ_{n+1} = 0
Letting ρ = λ/μ and assuming ρ < 1, we obtain
  π_1 = ρπ_0
  ρπ_{n−1} − (1+ρ)π_n + π_{n+1} = 0
Solving this recurrence relation (it gives π_n = cρ^n, and Σ_n π_n = 1 forces c = 1 − ρ), we obtain
  π_n = (1−ρ)ρ^n,  n ≥ 0
This expression passes a "sanity check" at n = 0: the probability the chain is empty is π_0 = 1 − ρ = 1 − λ/μ, which we would expect to be the long-run fraction of time the system is empty.
o We can use Little's Law to good effect; consider the following two examples, the first trivial, the second less so.
. Consider the server as the "system". Let S be the number of items in the server. The average sojourn time in the server is 1/μ. As such, by Little's Law,
  E_π(S) = λ/μ = ρ
Note, however, that S = 1·1{system is busy} + 0·1{system is idle}. As such, E_π(S) = P(system is busy) = ρ. The result above is therefore consistent with what we would expect.
. Consider the queue as the "system". Let Q be the number of items in the queue. The sojourn time in the queue is simply the waiting time w. As such, by Little's Law,
  E_π(Q) = λ E_π(w)
(where w is the waiting time for any new item that joins the queue). Now, note that Q(t) = (X(t) − 1)^+ (we must subtract any item currently in the server). As such,
  E_π(Q) = Σ_{n=1}^∞ (n−1)(1−ρ)ρ^n = ρ²/(1−ρ)
Therefore
  E_π(w) = ρ/(μ(1−ρ))
o We now consider the more difficult problem of deriving the probability distribution of the waiting times, P_π(w > x). Once again, recall w is the wait time experienced by a random job entering the queue.
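The stationary distribution and the mean queue length above lend themselves to a numerical sanity check. The sketch below (the values λ = 0.8, μ = 1 and the truncation level are arbitrary choices) verifies the balance equations πQ = 0 and E_π(Q) = ρ²/(1−ρ) for π_n = (1−ρ)ρ^n:

```python
# Sanity-check sketch: the geometric distribution pi_n = (1-rho)*rho^n should
# satisfy the M/M/1 balance equations and give E[Q] = rho^2 / (1 - rho).
lam, mu = 0.8, 1.0      # arbitrary rates with rho < 1
rho = lam / mu
N = 200                 # truncation level; rho^N is negligible

pi = [(1 - rho) * rho ** n for n in range(N)]

# Balance equations: -lam*pi_0 + mu*pi_1 = 0, and for n >= 1,
# lam*pi_{n-1} - (lam+mu)*pi_n + mu*pi_{n+1} = 0.
assert abs(-lam * pi[0] + mu * pi[1]) < 1e-12
for n in range(1, N - 1):
    assert abs(lam * pi[n - 1] - (lam + mu) * pi[n] + mu * pi[n + 1]) < 1e-12

# Mean queue length E[(X-1)^+] under pi.
mean_queue = sum((n - 1) * pi[n] for n in range(1, N))
assert abs(mean_queue - rho ** 2 / (1 - rho)) < 1e-6
```

The truncation is harmless here because ρ^200 is far below floating-point noise.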
  P_π(w > x) = 0 · P_π(X = 0) + Σ_{k=1}^∞ P(w > x | X = k) P_π(X = k)
             = Σ_{k=1}^∞ P(w > x | X = k) (1−ρ)ρ^k
Now, note that since the exponential distribution is memoryless, we can write
  (w | X = k) =_d S̃ + Σ_{j=1}^{k−1} S_j =_d Erlang(k, μ)
where S̃ is the remaining time of the job currently in the server (memorylessness makes it exp(μ) again) and the S_j are the processing times for the k−1 other jobs waiting. (The Erlang distribution is the distribution of the sum of IID exponential variables; it is a special case of the Gamma distribution.) As such,
  f_w(x) = Σ_{k=1}^∞ (1−ρ)ρ^k · μ e^{−μx} (μx)^{k−1}/(k−1)!
         = μρ(1−ρ) e^{−μx} Σ_{k=0}^∞ (μρx)^k / k!
         = μρ(1−ρ) e^{−μ(1−ρ)x}
And so
  w ~ exp(μ(1−ρ)) with probability ρ;  w = 0 with probability 1−ρ
And
  P_π(w > x) = ρ e^{−μ(1−ρ)x}

The M/G/1 queue
o We now consider a slightly more general case in which the times between arrivals {τ_n} are exponential, but the processing times {S_n} now follow a general distribution. This complicates matters slightly, because X(t) – the number of jobs in the system – is no longer sufficient to totally describe the state of the system at time t. We also require R(t), the residual processing time of the job currently in service. (This was not the case when processing times followed a memoryless exponential distribution.)
o To sidestep this complication, we will consider the following embedded Markov chain X_n. Let T_n be the time at which the nth job concludes processing, and denote
  X_n := X(T_n^+) = number of jobs in the system immediately after the nth job has departed
We then have
  X_{n+1} = (X_n − 1)^+ + A_{n+1}
where A_{n+1} is the number of arrivals during the processing time of the (n+1)th job. In this case, by assumption, the A_n are IID, and (A_n | S_n = s) ~ Po(λs), where λ is the rate of arrivals.
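The M/M/1 waiting-time tail derived above, P_π(w > x) = ρe^{−μ(1−ρ)x}, can be spot-checked by Monte Carlo via the Lindley recursion from the previous section. The sketch below uses arbitrary parameters (λ = 0.5, μ = 1) and a crude burn-in, so it is illustrative rather than definitive:

```python
import math
import random

# Monte Carlo sketch: simulate M/M/1 waiting times with the Lindley recursion
# and compare the empirical tail with P(w > x) = rho * exp(-mu*(1-rho)*x).
random.seed(1)
lam, mu = 0.5, 1.0
rho = lam / mu

n = 200_000
w, waits = 0.0, []
for _ in range(n):
    S = random.expovariate(mu)
    tau = random.expovariate(lam)
    w = max(0.0, w + S - tau)
    waits.append(w)

burn = waits[n // 2:]  # discard the first half as (a generous) transient
for x in (0.5, 1.0, 2.0):
    emp = sum(1 for v in burn if v > x) / len(burn)
    theory = rho * math.exp(-mu * (1 - rho) * x)
    assert abs(emp - theory) < 0.03
```

The tolerance is loose because consecutive waiting times are autocorrelated, which inflates the Monte Carlo error relative to an IID sample of the same size.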
o Now, using the ergodic theorem for DTMCs (and assuming we have stability, i.e. ρ < 1), we have
  P(X_n = j) → π(j) as n → ∞
Furthermore, from the G/G/1 case, we know that for each sample path the long-run time averages exist, which implies that
  P(X(t) = j) → π̂(j) as t → ∞
The challenge now, however, is to prove that π = π̂.
o Theorem: π = π̂.
Proof: Define the following two processes:
  A_j(t) = number of arrivals in [0, t) that "find" the system in state j
  D_j(t) = number of departures in [0, t) that "leave" the system in state j
Also let A(t) and D(t) be the total number of arrivals and departures up to time t, respectively. We now derive or quote a number of seemingly unrelated facts and then combine them to prove our result.
1. For any given sample path ω, it is clear that
  |A_j(t) − D_j(t)| ≤ 1 for all t ≥ 0
This is because A_j(t) is the number of "up-crossings" of X(t) over the line X(t) = j, whereas D_j(t) is the number of "down-crossings". Since the path moves in unit jumps, every up-crossing must be followed by a down-crossing, and so the difference in numbers will be at most 1.
2. Using the ergodic theorem for Markov chains, we have
  lim_{t→∞} D_j(t)/D(t) = lim_{n→∞} (1/n) Σ_{k=1}^n 1{X_k = j} = π(j) a.s.
(The first equality follows from the fact that X_k is precisely the state of the system after the kth departure.)
3. Using the bound in (1), we have D_j(t) − 1 ≤ A_j(t) ≤ D_j(t) + 1, so
  (D_j(t) − 1)/A(t) ≤ A_j(t)/A(t) ≤ (D_j(t) + 1)/A(t)
and, since D(t)/A(t) → 1, both outer terms tend to π(j) by (2). And so
  A_j(t)/A(t) → π(j) a.s.
4. Letting t_k be the time of the kth arrival, we have
  lim_{t→∞} A_j(t)/A(t) = lim_{n→∞} (1/n) Σ_{k=1}^n 1{X(t_k⁻) = j}
5. A well-known property of queues with Poisson arrivals is PASTA (Poisson Arrivals See Time Averages – see the addendum at the end of this lecture):
  lim_{t→∞} (1/t) ∫_0^t 1{X(s) = j} ds = lim_{n→∞} (1/n) Σ_{k=1}^n 1{X(t_k⁻) = j}
where t_k is the time of the kth arrival to the system.
Effectively, this states that in working out the average occupancy of the queue, we don't need to sample at every time step; it is enough to sample at arrivals.
6. By the ergodic theorem for Markov chains,
  π̂(j) = lim_{t→∞} (1/t) ∫_0^t 1{X(s) = j} ds a.s.
7. Finally, combine all of the above:
  π̂(j) = lim_{t→∞} (1/t) ∫_0^t 1{X(s) = j} ds        (by 6)
        = lim_{n→∞} (1/n) Σ_{k=1}^n 1{X(t_k⁻) = j}    (by 5)
        = lim_{t→∞} A_j(t)/A(t)                       (by 4)
        = π(j) a.s.                                   (by 3)
This proves the required result.
o Example: We can use this result to derive the Pollaczek–Khinchine formula for the M/G/1 queue. First, consider that by the above,
  E_π̂[X(t)] = E_π[X_n]
Recall further that
  X_{n+1} = (X_n − 1)^+ + A_{n+1},  (A_{n+1} | S_{n+1} = s) ~ Po(λs)
We can write this as
  X_{n+1} = X_n − 1{X_n ≥ 1} + A_{n+1}
Squaring this expression (we simply write 1 ≡ 1{X_n ≥ 1} to save space), we obtain
  X_{n+1}² = X_n² + 1 + A_{n+1}² − 2X_n·1 + 2X_n A_{n+1} − 2A_{n+1}·1
Taking expectations in stationarity, E_π(X_{n+1}²) = E_π(X_n²), so the squared terms cancel. We can simplify using
  E_π(1{X_n ≥ 1}) = P_π(X_n ≥ 1) = ρ
(the long-run fraction of time the server is busy), together with X_n · 1{X_n ≥ 1} = X_n and the independence of A_{n+1} from X_n:
  0 = ρ + E(A²) − 2E_π(X) + 2ρE_π(X) − 2ρ²
Note that
  E(A) = E[E(A | S)] = λE(S) = ρ
  Var(A) = E[Var(A | S)] + Var(E(A | S)) = λE(S) + λ²Var(S) = ρ + λ²σ_S²
so E(A²) = ρ + λ²σ_S² + ρ². Therefore
  0 = 2ρ + λ²σ_S² + ρ² − 2ρ² − 2(1−ρ)E_π(X)
  2(1−ρ)E_π(X) = 2ρ − ρ² + λ²σ_S²
  E_π(X) = ρ + (ρ² + λ²σ_S²)/(2(1−ρ)) = ρ + ρ²(1 + CV²)/(2(1−ρ))
where CV² = σ_S²/E(S)², so that λ²σ_S² = ρ²CV². For M/M/1, CV = 1, and we recover the expression obtained above. Finally, by Little's Law applied to the queue,
  λE_π(w) = E_π(X − 1)^+ = E_π(X) − P_π(X ≥ 1) = E_π(X) − ρ
Plugging in from the above,
  λE_π(w) = ρ²(1 + CV²)/(2(1−ρ))  ⟹  E_π(w) = ρ(1 + CV²)/(2μ(1−ρ))
All of this is true only if we know that E_π(X²) exists.
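The Pollaczek–Khinchine expression for E_π(w) can be checked by simulation. The sketch below does this for an M/D/1 queue (deterministic service, so CV = 0) using the Lindley recursion; the parameter values and tolerance are arbitrary choices, so treat it as illustrative:

```python
import random

# Monte Carlo sketch: check the Pollaczek-Khinchine mean wait
#   E[w] = rho * (1 + CV^2) / (2 * mu * (1 - rho))
# for an M/D/1 queue (CV = 0) against a long Lindley-recursion simulation.
random.seed(2)
lam, mu = 0.5, 1.0
rho = lam / mu
S = 1.0 / mu            # constant (deterministic) service time

n = 400_000
w, total = 0.0, 0.0
for _ in range(n):
    tau = random.expovariate(lam)
    w = max(0.0, w + S - tau)
    total += w
mean_wait = total / n

pk = rho * (1 + 0.0) / (2 * mu * (1 - rho))  # P-K prediction with CV^2 = 0
assert abs(mean_wait - pk) < 0.03
```

Note that deterministic service halves the mean wait relative to M/M/1 at the same load, exactly the (1 + CV²)/2 factor in the formula.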
It turns out that E(S³) < ∞ implies E_π(X²) < ∞, but this condition is excessive – an artifact of our having taken the simpler route in the proof. Really, all we need is the second moment of S; the easy approach to the problem leads to an excessive condition. The result is the Pollaczek–Khinchine formula. Note, finally, that the two steady states being equal (π = π̂) is special to Poisson arrivals; it doesn't usually happen.

Addendum on PASTA
o The principle of PASTA (Poisson Arrivals See Time Averages) states that for any stochastic process X(t) over a state space S, and for any A ⊆ S,
  lim_{n→∞} (1/n) Σ_{k=1}^n 1{X(t_k) ∈ A} = lim_{t→∞} (1/t) ∫_0^t 1{X(s) ∈ A} ds
provided the t_k form a Poisson process – in other words, t_k − t_{k−1} ~ exp(λ), IID. This is really quite a surprising result! The RHS has full information and sees the system at all times. The LHS only sees it at a selection of times.
o For an intuitive proof, write A(t) = cumulative number of samples up to time t. The statement above can then be written as
  lim_{t→∞} (1/A(t)) ∫_0^t 1{X(s) ∈ A} dA(s) = lim_{t→∞} (1/t) ∫_0^t 1{X(s) ∈ A} ds
Now, write
  M(t) = ∫_0^t 1{X(s) ∈ A} dĀ(s) ≈ Σ_{k=1}^n 1{X(k) ∈ A} (Ā(k+1) − Ā(k))
where dĀ(s) = dA(s) − λ ds, i.e. Ā(t) = A(t) − λt. This is a martingale, because the increments dĀ(s) are independent (by the "independent increments" property of Poisson processes) and mean 0 (because of our normalization). Furthermore, the conditional increments are uniformly bounded (since they are counts). As such,
  M(t)/t → 0 a.s.
In other words,
  lim_{t→∞} (1/(λt)) ∫_0^t 1{X(s) ∈ A} dA(s) = lim_{t→∞} (1/t) ∫_0^t 1{X(s) ∈ A} ds
Finally, use lim_{t→∞} A(t)/t = λ to conclude that λt/A(t) → 1. This gives the required result.

LECTURE 10 – 30th March 2011

Renewal & Regenerative Processes

Renewal processes
o Let {X_n} be IID, with E(X_1) = μ < ∞ and P(X_1 = 0) < 1. Let S_n = Σ_{i=1}^n X_i, with S_0 = 0. Let N(t) = sup{n ≥ 1 : S_n ≤ t}. Then {N(t) : t ≥ 0} is called a renewal process.
o Definition (renewal function): The renewal function is defined as m(t) = E[N(t)].
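A renewal function can be estimated by straightforward Monte Carlo. As a sketch (the choice of μ and the horizon are arbitrary), the code below estimates m(t) = E[N(t)] for exponential inter-renewal times, where the Poisson identity m(t) = t/μ provides an exact benchmark:

```python
import random

# Monte Carlo sketch: estimate m(t) = E[N(t)] with N(t) = sup{n >= 1 : S_n <= t}.
# For exp(1/mu) inter-renewal times the renewal process is Poisson, so
# m(t) = t/mu exactly, which we use as a check.
random.seed(3)
mu = 2.0  # mean inter-renewal time (arbitrary)

def count_renewals(t):
    """One sample of N(t): count partial sums S_n that land in [0, t]."""
    n, s = 0, random.expovariate(1 / mu)
    while s <= t:
        n += 1
        s += random.expovariate(1 / mu)
    return n

t = 10.0
reps = 100_000
m_hat = sum(count_renewals(t) for _ in range(reps)) / reps
assert abs(m_hat - t / mu) < 0.05
```

For general inter-renewal distributions the same estimator works; only the closed-form benchmark is special to the exponential case.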
o Example: Let X_1 ~ exp(1/μ). {N(t) : t ≥ 0} is then a Poisson process. Consider, incidentally, that in a Poisson process, the following two facts are true. Much of our work in this section will be concerned with generalizing these results to general renewal processes:
. E(N(t)) = m(t) = t/μ = t/E(X_1) (generalizes to the elementary renewal theorem).
. m′(t) = 1/μ = 1/E(X_1) (generalizes to the key renewal theorem).
o Proposition: m(t) = Σ_{n=1}^∞ F_n(t), where F_n(t) = P(S_n ≤ t).
Proof: Note that
  N(t) = Σ_{n=1}^∞ 1{S_n ≤ t}
  E[N(t)] = Σ_{n=1}^∞ P(S_n ≤ t)
where, in the last line, we used Fubini.
Remark: The CDF of S_n is the n-fold convolution of the CDF of each individual variable X:
  F_n(·) = F_1(·) * F_1(·) * ⋯ * F_1(·)  (n-fold)
For example, F_2(t) = ∫ F_1(t − s) dF_1(s).

Laplace Transforms & Co.
o The Laplace transform of a distribution with CDF F(x) is defined as
  F̂(s) = ∫_0^∞ e^{−sx} dF(x) = ∫_0^∞ e^{−sx} f(x) dx
A few important results:
. Laplace transforms of convolutions: in CDF form, if F = A * B, then F̂(s) = Â(s)B̂(s).
. If f is a density function, then F̂(s) = ∫_0^∞ e^{−sx} dF(x) < 1 for s > 0.
o Together, the points above imply that
  m̂(s) = Σ_{n=1}^∞ F̂(s)^n = F̂(s)/(1 − F̂(s))
Similarly, re-arranging,
  F̂(s) = m̂(s)/(1 + m̂(s))
As such, there is a one-to-one correspondence between F and m.
o Theorem: If b(t) is bounded on finite intervals, then the solution to the renewal equation
  a = b + a * F,  i.e.  a(t) = b(t) + ∫_0^t a(t − s) dF(s)
is a = b + b * m:
  a(t) = b(t) + ∫_0^t b(t − s) dm(s)
Verification: Taking the Laplace transform of the first equation,
  â(s) = b̂(s) + â(s)F̂(s)
As such,
  â(s) = b̂(s)/(1 − F̂(s)) = b̂(s) + b̂(s)F̂(s)/(1 − F̂(s)) = b̂(s) + b̂(s)m̂(s)
Taking the Laplace transform of the proposed solution, we do indeed find this equation is satisfied.
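The renewal equation a = b + a * F can also be solved numerically by discretizing the convolution. The sketch below (the rate, the forcing function b, the step size, and the tolerance are all arbitrary choices) checks the result against the closed form available for exponential inter-renewal times, where dm(s) = ds/μ gives a(t) = b(t) + (1/μ)∫_0^t b(s) ds:

```python
import math

# Numerical sketch: solve a(t) = b(t) + int_0^t a(t-s) dF(s) on a grid and
# compare with the closed form for F ~ exp(rate r): a(t) = b(t) + r*int_0^t b(s) ds.
r = 0.5                      # renewal rate, 1/E[X] (arbitrary)
b = lambda t: math.exp(-t)   # a bounded, integrable forcing function (arbitrary)

h, T = 0.005, 5.0            # grid step and horizon
n = int(T / h)
a = [0.0] * (n + 1)
for i in range(n + 1):
    t = i * h
    # Left-point Riemann sum for int_0^t a(t-s) * r*exp(-r*s) ds.
    conv = sum(a[i - j] * r * math.exp(-r * j * h) * h for j in range(1, i + 1))
    a[i] = b(t) + conv

closed = b(T) + r * (1 - math.exp(-T))   # r * int_0^T e^{-s} ds
assert abs(a[n] - closed) < 0.03
```

The O(h) quadrature error is amplified through the Volterra recursion, which is why the comparison uses a loose tolerance; a finer grid or trapezoidal weights would tighten it.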
o Example: Consider that
  E[N(t) | X_1 = x] = 0 if x > t
  E[N(t) | X_1 = x] = 1 + E[N(t) − N(x) | X_1 = x] = 1 + m(t − x) if x ≤ t
However,
  m(t) = E[N(t)] = ∫_0^∞ E[N(t) | X_1 = x] dF(x) = ∫_0^t (1 + m(t − x)) dF(x) = F(t) + (m * F)(t)
This equation is in the form of a renewal equation, with b = F. The solution is therefore
  m(t) = F(t) + ∫_0^t F(t − s) dm(s)
o Example: Let us consider the example X ~ exp(1/μ), so that m(t) = t/μ and dm(t) = dt/μ. Thus, the solution to the standard renewal equation is
  a(t) = b(t) + (1/μ) ∫_0^t b(t − s) ds = b(t) + (1/μ) ∫_0^t b(s) ds
For the integral to remain finite, we need b(t) → 0 (indeed, b integrable). As such, we might expect that
  a(t) → (1/μ) ∫_0^∞ b(s) ds as t → ∞
It turns out this last conclusion holds generally, not just for Poisson processes.

Some Theorems
o Proposition (SLLN for renewal processes):
  N(t)/t → 1/E(X_1) a.s. as t → ∞
Proof: Consider that S_{N(t)} ≤ t < S_{N(t)+1}. As such,
  S_{N(t)}/N(t) ≤ t/N(t) ≤ (S_{N(t)+1}/(N(t)+1)) · ((N(t)+1)/N(t))
However, by the SLLN, S_n/n → E(X_1), and (N(t)+1)/N(t) → 1 a.s. Feeding this into the above proves our theorem.
o Proposition (Elementary/Baby Renewal Theorem):
  m(t)/t = E[N(t)]/t → 1/E(X_1) as t → ∞
Proof: See homework 4.
o Proposition (CLT for renewal processes): Suppose that Var[X_1] = σ² < ∞. Then
  √t (N(t)/t − 1/E(X_1)) ⇒ (σ/E(X_1)^{3/2}) N(0,1)
Or, in other words,
  N(t) ≈ t/E(X_1) + √t (σ/E(X_1)^{3/2}) N(0,1)
Proof: Throughout this proof, we will write E(X_1) = μ.
Step 1: We first show that
  √t (N(t)/S_{N(t)} − 1/μ) ⇒ (σ/μ^{3/2}) N(0,1)
We do this as follows:
  N(t)/S_{N(t)} − 1/μ = (1/(S_{N(t)}/N(t))) (1 − S_{N(t)}/(μN(t)))
                      = (1/(S_{N(t)}/N(t))) · (1/μ) · (μ − S_{N(t)}/N(t))
so that
  √t (N(t)/S_{N(t)} − 1/μ) = √(t/N(t)) · (1/(S_{N(t)}/N(t))) · (1/μ) · √N(t) (μ − S_{N(t)}/N(t))
Now, consider the last term. Since N(t) → ∞, it is a random subsequence of −√n (S_n/n − μ). This latter expression does tend to σN(0,1) in distribution, by the standard CLT (the sign is immaterial by symmetry). The question is whether the weak convergence also holds for the random subsequence. To answer this question, we will require a side lemma.
Lemma (Anscombe's lemma / random time-change lemma): If {Z_n} are IID with mean 0 and E(Z_1²) = σ² < ∞, and {N(t) : t ≥ 0} is an integer-valued process such that N(t)/t → β ∈ (0, ∞), then
  (1/√N(t)) Σ_{i=1}^{N(t)} Z_i ⇒ σN(0,1)
In other words, we require our random index to grow linearly to infinity for convergence in distribution to hold. Intuitively, this is because we need variances to accumulate fast enough for the CLT to be valid.
In this case, note that N(t)/t → 1/E(X_1) a.s., so β = 1/E(X_1) ∈ (0, ∞) and Anscombe's lemma applies. Since the remaining factors converge a.s. (√(t/N(t)) → √μ and S_{N(t)}/N(t) → μ), we can then use the converging-together lemma on the expression quoted above to obtain the desired result for this step.
Step 2: We now use step 1 to prove our result. Consider that
  S_{N(t)}/N(t) ≤ t/N(t) ≤ (S_{N(t)+1}/(N(t)+1)) · ((N(t)+1)/N(t)),  with (N(t)+1)/N(t) → 1 a.s.
Both sides of the "sandwich" are in the form used in step 1. Therefore, this immediately leads to the desired result.
o Definition (Lattice/Arithmetic distribution): A distribution F is said to be of lattice type if there exists an h > 0 such that F is supported on {nh : n ∈ ℕ}. For example, the Poisson distribution is lattice with h = 1.
o Theorem (Blackwell's Theorem): If F is non-lattice, then
  m(t + a) − m(t) → a · (1/E(X_1)) as t → ∞
for all a > 0.
Remark: This result is not implied by the fact that m(t)/t → 1/E(X_1), because this theorem concerns increments of the renewal function m.
o Definition: A function f : ℝ₊ → ℝ₊ is directly Riemann integrable (dRi) if
. ∫_0^∞ f̄_h(x) dx < ∞ for some h > 0, and
. ∫_0^∞ f̄_h(x) dx − ∫_0^∞ f_h(x) dx → 0 as h ↓ 0
where
  f̄_h(x) = Σ_{k=0}^∞ sup{f(y) : kh ≤ y < (k+1)h} · 1{x ∈ [kh, (k+1)h)}
  f_h(x) = Σ_{k=0}^∞ inf{f(y) : kh ≤ y < (k+1)h} · 1{x ∈ [kh, (k+1)h)}
Direct Riemann integrability can be thought of as an extension of Riemann integrability to the infinite half-line, as opposed to a finite interval.
Remark: If f : ℝ₊ → ℝ, we simply require both f⁺ and f⁻ to be dRi for f to be dRi.
Remark: Any of the following conditions is sufficient for f : ℝ₊ → ℝ₊ to be dRi:
. f is continuous with compact support;
. f is bounded and continuous, and ∫_0^∞ f̄_h(x) dx < ∞ for some h > 0;
. f is non-increasing and ∫_0^∞ f(x) dx < ∞.
o Theorem (Key Renewal Theorem): Provided the usual assumptions on F hold (IID increments with finite mean that are not point masses at 0) and the renewal process is non-lattice, then for any b : ℝ₊ → ℝ that is dRi,
  a(t) → (1/μ) ∫_0^∞ b(s) ds as t → ∞
where a is the solution to the renewal equation a = b + a * F.

LECTURE 11 – 6th April 2011

Regenerative processes
o A regenerative process is a stochastic process with time points at which, from a probabilistic point of view, the process restarts itself. A good example is a CTMC (for example, corresponding to an M/M/1 queue): we can consider the process to "reset" each time the queue empties.
o Definition: Let (X(t) : t ≥ 0) be a stochastic process whose sample paths are right-continuous with left limits (RCLL, or CADLAG). Without loss of generality, we let renewals occur when X(t) = 0. Now, define the following quantities:
. τ(n+1) = inf{t > τ(n) : X(t) = 0, X(t⁻) ≠ 0}, the time of the (n+1)th renewal;
. τ_{n+1} = τ(n+1) − τ(n), the inter-renewal time;
. X_n(t) = X(τ(n−1) + t) for t ∈ [0, τ_n), and X_n(t) = Δ for t ≥ τ_n, where Δ is outside the state space of X (the nth cycle).
Then X is said to be regenerative if
. τ_n < ∞ a.s. for all n;
. τ(n) → ∞ as n → ∞;
. X_0, X_1, … are independent;
. X_1, X_2, … are identically distributed (the first cycle might have a different distribution if the system doesn't start "empty").
o Example (CTMC): Let (X(t) : t ≥ 0) be a CTMC with state space S = ℕ. We can then define a renewal process using any state x ∈ S as our "renewal state". Our only requirement is that x be at least null recurrent; we must return to the state infinitely often.
o Example (DTMC): Let (X_n : n ≥ 0) be a DTMC on S = ℕ. We can then define a renewal process using any state x ∈ S, provided the chain is irreducible and recurrent.
o Definition (recurrence): A regenerative process is positive recurrent if E(τ_1) < ∞ and null recurrent otherwise.
o Proposition (SLLN for regenerative processes): Let (X(t) : t ≥ 0) be a regenerative process over a state space S with E(τ_1) < ∞, and let f : S → ℝ be such that
  E(∫_{τ(0)}^{τ(1)} |f(X(s))| ds) < ∞
Then
  (1/t) ∫_0^t f(X(s)) ds → E(∫_{τ(0)}^{τ(1)} f(X(s)) ds) / E(τ_1) a.s. as t → ∞
Proof: Let
  Y_k(f) = ∫_{τ(k−1)}^{τ(k)} f(X(s)) ds and N(t) = sup{n ≥ 0 : τ(n) ≤ t}
This means that we can write
  c(t) := ∫_0^t f(X(s)) ds = Σ_{k=0}^{N(t)} Y_k(f) + ∫_{τ(N(t))}^t f(X(s)) ds
As such,
  c(t)/t = Y_0(f)/t + (1/t) Σ_{k=1}^{N(t)} Y_k(f) + (1/t) ∫_{τ(N(t))}^t f(X(s)) ds
By the assumption in the theorem, Y_0(f) is finite a.s., and so the first term vanishes in the limit. Now,
  (1/t) Σ_{k=1}^{N(t)} Y_k(f) = (N(t)/t) · (1/N(t)) Σ_{k=1}^{N(t)} Y_k(f)
Note, however, that the Y_k(f) (k ≥ 1) are IID. Furthermore,
  E|Y_k(f)| = E|∫_{τ(k−1)}^{τ(k)} f(X(s)) ds| ≤ E(∫_{τ(0)}^{τ(1)} |f(X(s))| ds) < ∞
by assumption, and so we can use the SLLN (since N(t) → ∞), together with the SLLN for renewal processes, to write the above as
  (N(t)/t) · (1/N(t)) Σ_{k=1}^{N(t)} Y_k(f) → E(Y_1(f))/E(τ_1)
Precisely as required.
All we now need to do is to show that the last term vanishes:
  |(1/t) ∫_{τ(N(t))}^t f(X(s)) ds| ≤ (1/t) ∫_{τ(N(t))}^{τ(N(t)+1)} |f(X(s))| ds
We now use the following lemma.
Lemma: If {a_n} is a real-valued sequence such that (1/n) Σ_{i=1}^n a_i → ā, then (1/n) max_{1≤i≤n} a_i → 0 as n → ∞.
Notice that (1/N(t)) Σ_{k=1}^{N(t)} Y_k(|f|) → E(Y_1(|f|)) a.s., since N(t) → ∞. As such, (1/N(t)) max_{1≤k≤N(t)} Y_k(|f|) → 0 a.s. Therefore,
  (1/t) ∫_{τ(N(t))}^t |f(X(s))| ds ≤ (N(t)/t) · (1/N(t)) max_{1≤k≤N(t)+1} Y_k(|f|) → (1/E(τ_1)) · 0 = 0 a.s.
As expected.
o We have just proved that (1/t) ∫_0^t f(X(s)) ds → E(Y_1(f))/E(τ_1) a.s. This suggests that
  E_π[f(X(s))] = E(Y_1(f))/E(τ_1)
o Proposition (CLT for regenerative processes): Let (X(t) : t ≥ 0) be a regenerative process on a state space S with E(τ_1²) < ∞. Let f : S → ℝ be a function satisfying E(Y_1(|f|)²) < ∞. Then
  √t ((1/t) ∫_0^t f(X(s)) ds − E(Y_1(f))/E(τ_1)) ⇒ sN(0,1)
with s² = E(Y_1(f_c)²)/E(τ_1), where f_c(·) = f(·) − E(Y_1(f))/E(τ_1).
o Theorem: Let (X(t) : t ≥ 0) be a positive recurrent regenerative process on a state space S ⊆ ℝ^d. Suppose that either of the following conditions holds:
. F(x) = P(τ_1 ≤ x) has a density, and there is a function h : S → ℝ that is bounded;
. F(x) = P(τ_1 ≤ x) is non-lattice, and there is a function h that is bounded and continuous.
Then
  E[h(X(t))] → E[∫_{τ(0)}^{τ(1)} h(X(s)) ds] / E(τ_1) as t → ∞
In other words, X(t) ⇒ X(∞) as t → ∞, and
  P(X(∞) ∈ A) = E[∫_{τ(0)}^{τ(1)} 1{X(s) ∈ A} ds] / E(τ_1)
Remark: If X(0) =_d X(∞), then (X(t) : t ≥ 0) is stationary.
Proof: Fix t > 0, and suppose τ(0) = 0 (without loss of generality).
  a(t) := E[h(X(t))]
        = E[h(X(t)); τ_1 > t] + E[h(X(t)); τ_1 ≤ t]
        = b(t) + ∫_0^t E[h(X(t)) | τ_1 = s] dF(s)
        = b(t) + ∫_0^t E[h(X(τ(1) + t − s))] dF(s)
        = b(t) + ∫_0^t E[h(X(τ(0) + t − s))] dF(s)
(where, in the last step, we have used the fact that every cycle has the same distribution). As such,
  a(t) = b(t) + ∫_0^t a(t − s) dF(s),  i.e.  a = b + a * F
Now, consider that b(t) = E[h(X(t)); τ_1 > t], so
  |b(t)| ≤ E[|h(X(t))|; τ_1 > t] ≤ ‖h‖_∞ P(τ_1 > t) =: g(t)
where g is bounded, non-increasing, and
  ∫_0^∞ g(t) dt = ‖h‖_∞ E(τ_1) < ∞
As such, by the sufficient conditions for direct Riemann integrability, g is dRi. Now, if |b| ≤ g and g is dRi, then b is dRi. And since F is non-lattice, all the conditions of the key renewal theorem hold, and therefore
  a(t) → (1/E(τ_1)) ∫_0^∞ b(s) ds
       = (1/E(τ_1)) ∫_0^∞ E[h(X(s)); τ_1 > s] ds
       = (1/E(τ_1)) E[∫_0^∞ h(X(s)) 1{τ_1 > s} ds]
       = (1/E(τ_1)) E[∫_0^{τ_1} h(X(s)) ds]
where the exchange of expectation and integral follows by Fubini, since h is bounded. Now, since h is an arbitrary bounded (continuous) function, the above implies that X(t) ⇒ X(∞); finally, taking h(·) = 1{· = x}, we recover the last required statement.
o Example: Let (X(t) : t ≥ 0) be a positive recurrent regenerative process. Fix a set A ∈ S, and let
  T(A) = inf{t ≥ 0 : X(t) ∈ A}
Again, for simplicity, assume τ(0) = 0. We are now interested in the expression P(T(A) > t), especially for t large. For t > 0, note that
  a(t) := P(T(A) > t)
        = P(T(A) > t, τ_1 > t) + P(T(A) > t, τ_1 ≤ t)
        = P(T(A) > t, τ_1 > t) + ∫_0^t P(T(A) > t, T(A) > τ_1 | τ_1 = s) dF(s)
        = P(T(A) > t, τ_1 > t) + ∫_0^t P(T(A) > t − s) P(T(A) > τ_1 | τ_1 = s) dF(s)
where in the last step we used regeneration: on the event {T(A) > τ_1 = s}, the hitting time after the first cycle, T̃(A) = inf{t > τ(1) : X(t) ∈ A}, is distributed as a fresh copy of T(A) shifted by s.
Now,
  a(t) = b(t) + β ∫_0^t a(t − s) dF̃(s),  i.e.  a = b + β(a * F̃)
where
  b(t) = P(T(A) > t, τ_1 > t),  β = P(T(A) > τ_1),  F̃(s) = P(τ_1 ≤ s | T(A) > τ_1)
Since β < 1, this is a "defective" renewal equation. For fixed λ ∈ ℝ, multiply through by e^{λt}:
  a(t)e^{λt} = b(t)e^{λt} + β ∫_0^t e^{λ(t−s)} a(t − s) e^{λs} dF̃(s)
  a_λ(t) = b_λ(t) + ∫_0^t a_λ(t − s) dF_λ(s),  where dF_λ(s) = β e^{λs} dF̃(s)
Suppose there exists a λ such that F_λ is a bona fide distribution. Then
  F_λ(∞) = β ∫_0^∞ e^{λs} dF̃(s) = 1,  i.e.  β^{−1} = ∫_0^∞ e^{λs} dF̃(s) = E(e^{λZ})
where Z ~ F̃. Assume there is such a solution λ* (one can motivate its existence with a heuristic argument). By appealing to the key renewal theorem,
  a_{λ*}(t) = e^{λ*t} P(T(A) > t) → (1/μ_{λ*}) ∫_0^∞ e^{λ*s} b(s) ds =: η,  where μ_{λ*} = ∫_0^∞ x dF_{λ*}(x)
And so
  a(t) = P(T(A) > t) ~ η e^{−λ*t}
So the probability has an exponential-type tail.

Index
Bold numbers indicate definitions. Italicized numbers indicate examples.
Anscombe's Lemma, 92
Azuma's Inequality, see Martingale
Baby Renewal Theorem, 91
Bounds on normal tails, 33
CLT for renewal processes, 91
Coupling time, 71, 72, 73
Elementary Renewal Theorem, 91
Filtration, 50
G/G/1 queue, 68; instability analysis, 74; stability analysis, 70, 72
GI/GI/1 queue, 78
Hölder's Inequality, 20
Laplace transform, 89
Lattice distribution, 93
Lindley's equations, 69
Little's Law, 74
M/G/1 queue, 83; Pollaczek–Khinchine formula, 85
M/M/1 queue, 79
Martingale, 54, 55; as a sum of differences, 54; Azuma's Inequality, 61, 62; Central Limit Theorem, 60; convergence theorem, 58, 63; SLLN, 59
Normal tails, bounds on, see Bounds on normal tails
Optional stopping theorem, 56; failure of, 50; special cases, 57
PASTA, 85
Pollaczek–Khinchine formula, 85
Queues, see G/G/1 queue; GI/GI/1 queue; M/G/1 queue; M/M/1 queue; Little's Law
Random walk, 49; as a martingale, 54; asymmetric, 50; symmetric, 52, 53
Renewal equation, 89
Renewal function, 88
Sample path methods, 67
SLLN for renewal processes, 90
Stochastic stability, 64; deterministic motivation, 64
Stopping time, 50
Wald's First Identity, 51, 52, 56
Wald's Second Identity, 52, 53, 56