
FOUNDATIONS OF STOCHASTIC MODELING

LECTURE 1 – 19th January 2010

Probability Preliminaries

• Set operations

o When we talk of a probability space, we quote a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where:
- $\Omega$ is the space of outcomes.
- $\mathcal{F}$ is a $\sigma$-field – the space of "admissible sets/events". The sets we will be talking about henceforth reside in $\mathcal{F}$.
- $\mathbb{P}$ is a probability distribution/measure based on which statements are made about the probability of various events; $\mathbb{P}: \mathcal{F} \to [0,1]$. In particular, for $A \in \mathcal{F}$, $\mathbb{P}(A) = \mathbb{P}(\omega \in \Omega : \omega \in A)$.

o Note the following basic property of unions and intersections of sets (De Morgan's laws):

$$\Big[\bigcup_n A_n\Big]^c = \bigcap_n A_n^c$$

Or equivalently:

$$\Big[\bigcap_n A_n\Big]^c = \bigcup_n A_n^c$$

o When we say an event A holds "almost surely", we mean that:
- $\mathbb{P}(A) = 1$.
- $A \in \mathcal{F}$ (in other words, A is measurable). We will assume this always holds throughout this course and omit this condition.

o Proposition: Let $\{A_n\}$ be a sequence of (measurable) events such that $\mathbb{P}(A_n) = 1$ for all n. Then $\mathbb{P}\big(\bigcap_n A_n\big) = 1$.

Proof: First consider that

Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011

$$\Big[\bigcap_n A_n\Big]^c = \bigcup_n A_n^c$$

Then consider that, using the union bound,

$$\mathbb{P}\Big(\bigcup_n A_n^c\Big) \le \sum_n \mathbb{P}(A_n^c) = \sum_n \big(1 - \mathbb{P}(A_n)\big) = \sum_n 0 = 0$$

The union bound, used in the first line, simply states that if we add the probabilities of events without subtracting their intersections, the result will be greater than or equal to the probability of the union (which leaves out intersections). ∎

o Definition (limsup and liminf): Let $\{a_n\}$ be a real-valued sequence. Then we write

$$\limsup_n a_n := \inf_m \sup_{n \ge m} a_n$$

This is, in the limit, the least upper bound on the tail of the sequence. Similarly,

$$\liminf_n a_n := \sup_m \inf_{n \ge m} a_n$$

Graphically (source: Wikipedia): [figure omitted – the sequence $a_n$ oscillates between an upper envelope decreasing to $\limsup_{n\to\infty} a_n$ and a lower envelope increasing to $\liminf_{n\to\infty} a_n$]

Note that these limits need not be finite. Remarks:

- If $\limsup_n a_n < a$, then $a_n < a$ eventually (i.e., for sufficiently large n). If $\limsup_n a_n > a$, then $a_n > a$ infinitely often – in other words, there are infinitely many elements of the sequence above a.

- Similarly, $\liminf_n a_n > a \Rightarrow a_n > a$ ev., and $\liminf_n a_n < a \Rightarrow a_n < a$ i.o.

o Definition (limsup and liminf for sets): Let $\{A_n\}$ be a sequence of measurable sets. Then


$$\limsup_n A_n := \bigcap_m \bigcup_{n \ge m} A_n = \lim_{m\to\infty} \bigcup_{n \ge m} A_n \qquad\qquad \liminf_n A_n := \bigcup_m \bigcap_{n \ge m} A_n = \lim_{m\to\infty} \bigcap_{n \ge m} A_n$$

We can interpret these as follows. The lim sup of $A_n$ is the set of outcomes that are contained in infinitely many of the $A_n$ (but not necessarily in all the $A_n$ past a point – the outcome could "bounce in and out" of the $A_n$):

$$\limsup_n A_n = \{\omega \in \Omega : \omega \in A_n \text{ i.o.}\}$$

Similarly, the lim inf of $A_n$ is the set of outcomes that eventually appear in an $A_n$ and then in all $A_n$ past that point:

$$\liminf_n A_n = \{\omega \in \Omega : \omega \in A_n \text{ ev.}\}$$

Clearly, if an outcome is in the lim inf, it also appears infinitely often (because it appears in all the $A_n$ past a point), and so $\liminf_n A_n \subseteq \limsup_n A_n$.

Remark: Take $\{A_n \text{ ev.}\} = \liminf_n A_n$. Then

$$\{A_n \text{ ev.}\}^c = \Big[\bigcup_m \bigcap_{n \ge m} A_n\Big]^c = \bigcap_m \Big(\bigcap_{n \ge m} A_n\Big)^c = \bigcap_m \bigcup_{n \ge m} A_n^c = \limsup_n A_n^c = \{A_n^c \text{ i.o.}\}$$

o Proposition: Let $\{A_n\}$ be a sequence of measurable sets. Then:

- $\mathbb{P}(\liminf_n A_n) \le \liminf_n \mathbb{P}(A_n)$
- $\mathbb{P}(\limsup_n A_n) \ge \limsup_n \mathbb{P}(A_n)$

(The first statement can be thought of as a generalization of Fatou's Lemma for probabilities.)

Proof of (i): Let us define $B_m = \bigcap_{n \ge m} A_n$. Then we know that $B_m \subseteq B_{m+1} \subseteq \cdots$. In other words, the sets $B_m$ increase monotonically to

$$B = \bigcup_m B_m = \bigcup_m \bigcap_{n \ge m} A_n = \liminf_n A_n$$

Since the events are increasing, a simple form of monotone convergence gives us that $\mathbb{P}(B_m) \uparrow \mathbb{P}(B)$. But we also have that

$$\mathbb{P}(B_m) \le \mathbb{P}(A_n) \quad \text{for all } n \ge m$$


because $B_m$ is the intersection of all events from $A_m$ onwards, so its probability is less than or equal to the probability of any single event. Thus

$$\mathbb{P}(B_m) \le \inf_{n \ge m} \mathbb{P}(A_n)$$
$$\sup_m \mathbb{P}(B_m) \le \sup_m \inf_{n \ge m} \mathbb{P}(A_n)$$
$$\mathbb{P}(B) \le \liminf_n \mathbb{P}(A_n)$$

And given how we have defined B:

$$\mathbb{P}(\liminf_n A_n) \le \liminf_n \mathbb{P}(A_n)$$

As required. ∎

• Borel-Cantelli Lemmas & Independence

o Proposition (First Borel-Cantelli Lemma): Let $\{A_n\}$ be a sequence of measurable events such that $\sum_n \mathbb{P}(A_n) < \infty$. Then $\mathbb{P}(A_n \text{ i.o.}) = 0$. In other words,¹ $\mathbb{P}(\limsup_n A_n) = 0$. We offer two proofs – the first is somewhat mechanical, the second is more intuitive.

Proof (version 1): Consider that

$$\limsup_n A_n = \bigcap_m \bigcup_{n \ge m} A_n$$

As such, for every m,

$$\mathbb{P}(\limsup_n A_n) \le \mathbb{P}\Big(\bigcup_{n \ge m} A_n\Big) \le \sum_{n \ge m} \mathbb{P}(A_n) \to 0 \quad \text{as } m \to \infty$$

since the series converges. As required. ∎

The second proof will require a lemma:

Lemma (Fubini): If $\{X_n\}$ is a sequence of real-valued random variables with $X_n \ge 0$, then $\mathbb{E}\big(\sum_n X_n\big) = \sum_n \mathbb{E}(X_n)$ (which could be infinite). This is

¹ It might not be clear that the statements $\mathbb{P}(A_n \text{ i.o.}) = 0$ and $\mathbb{P}(\limsup_n A_n) = 0$ are equivalent. To see this more clearly, note that the first statement is simply shorthand for $\mathbb{P}(\omega \in \Omega : \omega \in A_n \text{ i.o.}) = 0$. This is clearly identical, by definition, to $\mathbb{P}(\limsup_n A_n) = 0$.


effectively a condition under which we can exchange a summation and an integration.

Proof (version 2): Let

$$X_n(\omega) = \mathbf{1}_{A_n}(\omega) = \begin{cases} 1 & \text{if } \omega \in A_n \\ 0 & \text{otherwise} \end{cases}$$

Note that $\mathbb{E}(X_n) = \mathbb{P}(A_n)$. We also have that

$$\mathbb{E}\Big(\sum_n \mathbf{1}_{A_n}\Big) = \sum_n \mathbb{E}(\mathbf{1}_{A_n}) = \sum_n \mathbb{P}(A_n) < \infty$$

This means that the random variable $\sum_n \mathbf{1}_{A_n}$ must be finite outside a set of measure zero; in other words, $\sum_n \mathbf{1}_{A_n} < \infty$ a.s.

This must mean that $\mathbf{1}_{A_n} = 1$ only finitely many times. Thus, $A_n$ occurs only finitely many times. ∎
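This countable-occurrence behaviour can be watched numerically. Below is a small Python sketch (an illustration, not part of the lecture), taking the independent events $A_n = \{U_n \le 1/n^2\}$, so that $\sum_n \mathbb{P}(A_n) = \sum_n 1/n^2 < \infty$ and BC-1 predicts only finitely many occurrences on almost every sample path:

```python
import random

# First Borel-Cantelli, numerically: with A_n = {U_n <= 1/n^2} we have
# sum_n P(A_n) < infinity, so on a typical sample path only finitely many
# A_n occur -- and they cluster at small n.
def occurrences(n_max, seed):
    rng = random.Random(seed)
    # indices n at which the event A_n = {U_n <= 1/n^2} occurs
    return [n for n in range(1, n_max + 1) if rng.random() <= 1.0 / n**2]

hits = occurrences(n_max=100_000, seed=0)
print(len(hits), max(hits, default=0))  # a handful of hits, all at small n
```

Running this for larger and larger horizons does not add new hits: the tail $\sum_{n \ge m} 1/n^2$ is already tiny for moderate m.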

o Definition (independence): A sequence of random variables $\{X_1, \ldots, X_n\}$ is independent if and only if

$$\mathbb{P}(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^n \mathbb{P}(X_i \le x_i)$$

for all $x_1, \ldots, x_n \in \mathbb{R}$.

Definition (independence): A sequence of sets $\{A_1, \ldots, A_n\}$ is said to be independent if and only if the sequence of random variables $\{\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_n}\}$ is independent.

o Proposition (Second Borel-Cantelli Lemma): Let $\{A_n\}$ be a sequence of independent measurable events such that $\sum_n \mathbb{P}(A_n) = \infty$. Then $\mathbb{P}(A_n \text{ i.o.}) = 1$.

Proof: We showed in a remark above that

$$\{A_n \text{ i.o.}\}^c = \{A_n^c \text{ ev.}\} = \bigcup_m \bigcap_{n \ge m} A_n^c$$

Let us fix $m \ge 1$. Then, using independence and the inequality $1 - x \le e^{-x}$,

$$\mathbb{P}\Big(\bigcap_{n \ge m} A_n^c\Big) = \prod_{n \ge m} \mathbb{P}(A_n^c) = \prod_{n \ge m} \big(1 - \mathbb{P}(A_n)\big) \le \prod_{n \ge m} \exp\big(-\mathbb{P}(A_n)\big) = \exp\Big(-\sum_{n \ge m} \mathbb{P}(A_n)\Big) = 0$$

But then we have that

$$\mathbb{P}\Big(\bigcup_m \bigcap_{n \ge m} A_n^c\Big) \le \sum_m \mathbb{P}\Big(\bigcap_{n \ge m} A_n^c\Big) = 0$$

As required. ∎

• Notions of convergence

o Definition (convergence almost surely): $X_n \to X$ almost surely if

$$\mathbb{P}\{\omega \in \Omega : X_n(\omega) \to X(\omega)\} = 1$$

More generally, $\{X_n\}$ converges almost surely if

$$\mathbb{P}\big(\limsup_n X_n = \liminf_n X_n\big) = 1$$

This is the notion of convergence that most closely maps to convergence concepts in real analysis.

Let us understand these two statements intuitively.

- The first statement tells us that for almost every outcome in the sample space $\Omega$, the sequence of random variables $X_n$ tends to X.
- The second statement is shorthand for the statement that

$$\mathbb{P}\{\omega : \limsup_n X_n(\omega) = \liminf_n X_n(\omega)\} = 1$$

This is, again, simply the statement from real analysis that $\limsup_n x_n = \liminf_n x_n$ if and only if $\lim_n x_n$ exists. Thus, we require the limit to exist for (almost) every outcome $\omega$.

o Example: Consider the sequence $X_n = \frac{1}{n}U$, where $U \sim U[0,1]$. We claim that $X_n \to 0$ almost surely.

Proof: In this case, $\Omega = [0,1]$. For any $\omega$ we might draw, we will find

$$0 \le X_n(\omega) \le \frac{1}{n} \to 0$$

As required. ∎


o Definition (convergence in probability): $X_n \to_p X$ as $n \to \infty$ if, for all $\varepsilon > 0$,

$$\mathbb{P}\big(|X_n - X| > \varepsilon\big) \to 0 \quad \text{as } n \to \infty$$

Let us understand this concept as compared to convergence almost surely.

- In this case, we consider all the outcomes for which $X_n$ is far from X, and we require the probability of these outcomes to get smaller as n progresses. This, however, allows the possibility that for some outcomes $\omega$, $X_n(\omega) \not\to X(\omega)$ infinitely often, provided that the probability of these $\omega$ is small and gets smaller.
- In the case of almost sure convergence, we only allow the deviation to be large a finite number of times. For (almost) every outcome, we require $X_n(\omega) \to X(\omega)$. Perhaps this appears clearer if we write the condition for convergence almost surely as

$$\mathbb{P}\big(|X_n - X| > \varepsilon \text{ i.o.}\big) = 0$$

The difference between the two statements above really hinges on the novelty of the BC Lemmas. If $A_n$ is the event that "$X_n$ is far from X", convergence in probability requires $\lim_{n\to\infty} \mathbb{P}(A_n) = 0$, whereas convergence almost surely requires that $\mathbb{P}(A_n \text{ i.o.}) = 0$ – which would, for example, be satisfied if $\sum_n \mathbb{P}(A_n) < \infty$.

o Another interesting way to view the difference between convergence almost surely and in probability is in terms of experiments. Imagine carrying out an "experiment" by generating random variables $X_1, X_2, \ldots$. Each such experiment corresponds to a different outcome $\omega \in \Omega$.

. If XXn  almost surely, then it is certain that in every experiment we

might carry out, the values XX12(),(),ww will tend to X()w .

. If XXnp , then it is possible that in some experiments (for a select

subset of W ), XX12(),(),ww will not tend to X (w ). The probability of

Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011 Foundations of Stochastic Modeling Page 8

this happening decreases with the number of X generated, but it is still possible that in certain experiments, it will happen. Example: Let X =  , where the U are IID U[0, 1] random variables. Let o n U £ 1 n { n n } us fix e Î (0,1), and consider the definition

$$\mathbb{P}\big(|X_n - 0| > \varepsilon\big) = \mathbb{P}(X_n = 1) = \mathbb{P}\big(U_n \le \tfrac{1}{n}\big) = \tfrac{1}{n} \to 0$$

This proves the $X_n$ do indeed converge to 0 in probability. Consider, however, that

$$\sum_n \mathbb{P}(X_n > \varepsilon) = \sum_n \tfrac{1}{n} = \infty$$

But since our $X_n$ are independent, we know by the second BC Lemma that $\{X_n > \varepsilon\}$ occurs infinitely often, almost surely. So $X_n$ cannot be converging to 0 almost surely.

Note, on the other hand, that if we replaced our $1/n$ by $1/n^2$, we would find (using the first Borel-Cantelli Lemma) that $X_n$ does indeed converge to 0 almost surely. ∎
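The contrast between the two scalings can be simulated. The Python sketch below (an illustration, not from the lecture) counts how often $X_n = 1$ along a single sample path: with threshold $1/n$ the hits keep accumulating (roughly like $\log N$), while with $1/n^2$ they stop:

```python
import random

# One sample path of X_n = 1{U_n <= threshold(n)} up to n_max; return the
# number of times X_n = 1.  For threshold 1/n the count grows without bound
# (BC-2: the event recurs i.o.); for 1/n^2 it stays bounded (BC-1).
def hit_count(threshold, n_max, seed):
    rng = random.Random(seed)
    return sum(1 for n in range(1, n_max + 1) if rng.random() <= threshold(n))

hits_1_over_n = hit_count(lambda n: 1.0 / n, 1_000_000, seed=1)
hits_1_over_n2 = hit_count(lambda n: 1.0 / n**2, 1_000_000, seed=1)
print(hits_1_over_n, hits_1_over_n2)
```

The first count keeps increasing as `n_max` grows; the second does not, exactly as the two Borel-Cantelli lemmas predict.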

o Claim: XXnnp¥ a.s. X Xn as . As our example shows, however, the converse is false. Proof: Recall that convergence almost surely can be written as

()XXn ->e i.o. = 0

or in other words

 lim supnn{}XX->e = 0 ( )

We have, however, that lim infnnAAÍ lim supnn. As such

 lim infnn{ XX->e} = 0 ()

()XXn ->e e.v. = 0 This naturally implies that eventually the probability of large deviations falls to 0 – this is the definition of convergence in probability. 


o Remark: It is interesting to note that if $X_n \to_p X$, it is possible to find a subsequence $X_{n_k} \to X$ a.s., $k = 1, 2, \ldots$. This solidifies our intuition that convergence in probability differs from convergence almost surely only as a result of a few "freak outcomes".

Proof: Pick $n_{k+1}$ to be the smallest $n > n_k$ such that

$$\mathbb{P}\big(|X_n - X| \ge \tfrac{1}{k}\big) \le \tfrac{1}{2^k}$$

(This is always possible by the definition of convergence in probability.) The almost sure convergence of $X_{n_k}$ then follows by the first BC Lemma. ∎

o Example: Take $\Omega = [0,1]$ and $\mathbb{P}$ = the uniform distribution. We then define random variables $X_1$ and $X_2$ as follows:

$$X_1(\omega) = \begin{cases} 1 & \omega \in [0, \tfrac{1}{2}) \\ 0 & \text{otherwise} \end{cases} \qquad\qquad X_2(\omega) = \begin{cases} 0 & \omega \in [0, \tfrac{1}{2}) \\ 1 & \text{otherwise} \end{cases}$$

[Figure omitted: the graphs of $X_1$ and $X_2$, each the indicator of one half of $[0,1]$.]

Similarly, we define $X_3$, $X_4$ and $X_5$ to be the indicators of the three thirds of $[0,1]$. [Figure omitted: the graphs of $X_3$, $X_4$ and $X_5$, the indicators of $[0,\tfrac13)$, $[\tfrac13,\tfrac23)$ and $[\tfrac23,1]$.] We continue this pattern to form a "pyramid" of random variables.


Now, consider $\varepsilon \in (0,1)$ – since $X_n$ is equal to 0 or 1, saying $X_n > \varepsilon$ is the same as saying $X_n = 1$. Now, consider that $\mathbb{P}(X_1 = 1) = \mathbb{P}(X_2 = 1) = \tfrac{1}{2}$. Similarly, for random variables divided into three parts, the probability is $\tfrac{1}{3}$, etc. As such,

$$\mathbb{P}(X_n > \varepsilon) \to 0 \quad \text{as } n \to \infty$$

And so $X_n \to_p 0$.

However, if we fix any $\omega \in [0,1]$, $X_n(\omega) = 1$ infinitely often (because every level of the pyramid will involve at least one variable that is equal to 1 at that $\omega$). Thus, $X_n$ does not tend to 0 almost surely.
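The pyramid can be sketched in code. Below is an illustrative Python version (not from the lecture; the indexing helper and the use of half-open intervals throughout are my own conventions) showing that a fixed $\omega$ is hit exactly once per level:

```python
from fractions import Fraction

# Level k of the pyramid consists of the k indicators of [j/k, (j+1)/k),
# j = 0, ..., k-1.  Then P(X_n = 1) = 1/k -> 0 (convergence in probability),
# yet every fixed omega is hit once per level, i.e. infinitely often.
def typewriter(n):
    """Map the n-th variable (1-indexed) to (k, j): the indicator of
    [j/k, (j+1)/k)."""
    k = 1
    while n > k:
        n -= k
        k += 1
    return k, n - 1

def X(n, omega):
    k, j = typewriter(n)
    return 1 if Fraction(j, k) <= omega < Fraction(j + 1, k) else 0

omega = Fraction(1, 3)
hits = [n for n in range(1, 56) if X(n, omega) == 1]  # levels 1..10 = 55 vars
print(hits)  # exactly one hit per level: 10 hits among the first 55 variables
```

Exact rationals (`Fraction`) are used so the interval membership tests involve no floating-point error.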

o Definition (convergence in expectation) – also called $L_1$ convergence: We say that $X_n \to_{L_1} X$ as $n \to \infty$ if

$$\mathbb{E}|X_n - X| \to 0 \quad \text{as } n \to \infty$$

Similarly, $X_n \to_{L_p} X$ if

$$\mathbb{E}|X_n - X|^p \to 0 \quad \text{as } n \to \infty$$

o Claim: XX implies that XX and  (XX ) ( ). nL1 n n Proof: For the first part, consider that

XX=+XXX- £-+XX n n n -£XXXXnn- Taking expectations yields the result. For the second part, note that

()()()XXnn-= X - X £( XX n -) Where the last step follows by Jensen’s Inequality. Taking limits yields the answer required. 

o Theorem (Markov’s Inequality): Let X > 0 have finite expectation. Then for all a > 0 ()X ()Xa>£ a Proof:

Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011 Foundations of Stochastic Modeling Page 11

()XXXaXXa=>+£()() ,  , ³>()XX, a ³>aXa() As required. 
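As a quick Monte-Carlo check of the bound (an illustration only; the exponential distribution is just a convenient test case, since there $\mathbb{E}(X) = 1$ and $\mathbb{P}(X > a) = e^{-a}$):

```python
import random

# Markov's inequality, numerically: for X ~ exp(1), the true tail e^{-a}
# should sit below the Markov bound E(X)/a = 1/a for every a > 0.
rng = random.Random(2)
sample = [rng.expovariate(1.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)                       # estimate of E(X), ~1
for a in (1.0, 2.0, 5.0):
    tail = sum(x > a for x in sample) / len(sample)    # estimate of P(X > a)
    assert tail <= mean / a                            # Markov's bound holds
    print(a, tail, mean / a)
```

The bound is loose (at $a = 5$ the true tail is about $e^{-5} \approx 0.007$ versus the bound $0.2$), which is typical of Markov's inequality.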

o Claim: If XX then XX . nL1 np Proof: Fix an e > 0 and use Markov’s Inequality

( XXn - )  XX->£e 0 ()n e As required.  Example: Consider Xn=  , where U is U[0,1]. Clearly, X  0 a.s. (Note o n U £ 1 n { n } that this does not hold true for w = 0 . However, this set has measure 0, and so the convergence is still almost sure). However, it is clear that XnU=£=1 1 . Clearly, therefore, L convergence and almost-sure ()n ()n 1 convergence are not compatible. (This illustrates an interesting example where taking limits inside and outside expectations gives a different result. Indeed,

1= l imnn()XXnn¹=() lim 0 .)

o Example: Let $\{X_n\}$ be a sequence of IID random variables, exponentially distributed with parameter 1. We will prove that

$$\limsup_n \frac{X_n}{\log n} = 1 \quad \text{a.s.}$$

Intuitively, this implies that if we simulate a certain number of exponentially distributed random variables and plot them against their index, and then plot the curve $\log n$, an infinite number of these points will lie arbitrarily close to the curve! This is quite shocking, since one might expect very few very large values!

Proof: Fix $\varepsilon > 0$. Consider

$$\mathbb{P}\big(X_n > (1+\varepsilon)\log n\big) = e^{-(1+\varepsilon)\log n} = \frac{1}{n^{1+\varepsilon}}$$

This means that $\sum_n \mathbb{P}\big(X_n > (1+\varepsilon)\log n\big) < \infty$


So by BC-1,

$$\mathbb{P}\big(X_n > (1+\varepsilon)\log n \text{ i.o.}\big) = 0$$

As such,

$$X_n \le (1+\varepsilon)\log n \text{ ev.} \quad \text{almost surely}$$

and therefore

$$\limsup_n \frac{X_n}{\log n} \le 1 + \varepsilon$$

By the same reasoning,

$$\mathbb{P}\big(X_n > (1-\varepsilon)\log n\big) = \frac{1}{n^{1-\varepsilon}} \qquad \Rightarrow \qquad \sum_n \mathbb{P}\big(X_n > (1-\varepsilon)\log n\big) = \infty$$

So by BC-2 and the independence of the $X_n$,

$$\mathbb{P}\big(X_n > (1-\varepsilon)\log n \text{ i.o.}\big) = 1 \qquad \Rightarrow \qquad \limsup_n \frac{X_n}{\log n} \ge 1 - \varepsilon \quad \text{a.s.}$$

Since $\varepsilon$ was arbitrary, the two boxed results together imply our result. ∎

Note: This only applies to the lim sup – to say that the actual sequence tends to 1 is nonsensical, because most of the $X_n$ will indeed be very small!
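The two Borel-Cantelli halves of this argument show up clearly in simulation. The Python sketch below (an illustration, not from the lecture) counts exceedances of $(1\pm\varepsilon)\log n$ along one sample path: crossings of the upper curve are rare (a finite handful), while crossings of the lower curve keep occurring all the way to the horizon:

```python
import math
import random

# limsup X_n / log n = 1 for IID exp(1): exceedances of (1+eps) log n are
# summable (BC-1, finitely many), exceedances of (1-eps) log n are not
# (BC-2, infinitely many).
rng = random.Random(4)
eps = 0.5
above_hi, above_lo = [], []
for n in range(2, 200_000):
    x = rng.expovariate(1.0)
    if x > (1 + eps) * math.log(n):
        above_hi.append(n)
    if x > (1 - eps) * math.log(n):
        above_lo.append(n)
print(len(above_hi), len(above_lo), above_lo[-1])
```

With this $\varepsilon$, the expected number of upper crossings is $\sum_n n^{-3/2} < \infty$, while lower crossings number roughly $2\sqrt{N}$ up to time N and so never stop.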

LECTURE 2 – 26th January 2010

• Interchange arguments

o Example: In our last lecture, we considered the sequence of random variables $X_n = n\,\mathbf{1}\{U \le \frac{1}{n}\}$. We argued that $X_n \to 0$ almost surely (as $n \to \infty$). We showed, however, that $\mathbb{E}(X_n) = 1$ for all n. We might wonder whether it is generally true that

$$\mathbb{E}\big(\lim_n X_n\big) \overset{?}{=} \lim_n \mathbb{E}(X_n)$$

Clearly, this is not the case here. It looks like certain conditions need to be satisfied.


o Theorem (BDD – Bounded Convergence): Let $\{X_n\}$ be a sequence of random variables such that $|X_n| \le K < \infty$ (where K is a deterministic constant) and $X_n \to X$ almost surely. Then $\mathbb{E}(X_n) \to \mathbb{E}(X)$. (Clearly, in the example above, the n that sits outside the indicator prevents us from bounding $X_n$.)

Proof: For any given $\varepsilon > 0$, let

$$A_n = \{\omega \in \Omega : |X_n(\omega) - X(\omega)| < \varepsilon\}$$

Let $B_n = \Omega \setminus A_n$. Because $X_n \to X$ a.s. (and hence in probability), $\mathbb{P}(B_n) \to 0$. Thus, using Jensen's inequality in the first step,

$$\big|\mathbb{E}(X_n) - \mathbb{E}(X)\big| \le \mathbb{E}|X_n - X| = \int_{A_n} |X_n - X|\,\mathrm{d}\mathbb{P} + \int_{B_n} |X_n - X|\,\mathrm{d}\mathbb{P} \le \varepsilon\,\mathbb{P}(A_n) + 2K\,\mathbb{P}(B_n) \to \varepsilon$$

In the last inequality, we used the fact that for all outcomes in $A_n$, the absolute difference is less than $\varepsilon$ (by definition) and that for all outcomes in $B_n$, the absolute difference is at most $2K$ (because $|X| = \lim_n |X_n| \le K$). Since $\varepsilon$ was arbitrary, the result follows. ∎

Remark: It is clear that this theorem still holds if $X_n$ only converges in probability to X rather than almost surely, because all that is required is for the probability of rogue events to fall to 0. The fact that $|X| \le K$ can then be deduced from the fact that there is a subsequence $X_{n_k}$ that converges to X almost surely (see above).

o Lemma (Fatou): If $\{X_n\}$ is a sequence of non-negative random variables, then

$$\mathbb{E}\big(\liminf_n X_n\big) \le \liminf_n \mathbb{E}(X_n)$$

(Note: there is no counterpart for the lim sup without further assumptions.)

Proof: Let $Y_m = \inf_{n \ge m} X_n$ and $Y = \lim_m Y_m = \liminf_n X_n$. Note that:

- $Y_m \uparrow Y$ a.s.
- $\inf_{n \ge m} \mathbb{E}(X_n) \ge \mathbb{E}\big(\inf_{n \ge m} X_n\big) = \mathbb{E}(Y_m)$. To see why, note that for each fixed $n \ge m$, $X_n(\omega) \ge \inf_{n \ge m} X_n(\omega)$ pointwise, so $\mathbb{E}(X_n) \ge \mathbb{E}(Y_m)$; taking the infimum over $n \ge m$ preserves the inequality.


Now, pick some $k < \infty$. Then

$$\liminf_n \mathbb{E}(X_n) \ge \lim_m \mathbb{E}(Y_m) \ge \lim_m \mathbb{E}\big(\min[Y_m, k]\big)$$

The absolute value of the variable of which we are taking an expectation is now bounded by k, because the $X_n$ are non-negative, and so $Y_m$ is non-negative. So we can apply bounded convergence:

$$\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}\big(\lim_m \min[Y_m, k]\big) = \mathbb{E}\big(\min[Y, k]\big)$$

As $k \to \infty$, the last expression tends to $\mathbb{E}(Y) = \mathbb{E}(\liminf_n X_n)$. ∎

o Theorem (DOM – Dominated Convergence): Let $\{X_n\}$ be a sequence of random variables such that $|X_n| \le Y$ with $\mathbb{E}(Y) < \infty$, and $X_n \to X$ almost surely. Then $\mathbb{E}(X_n) \to \mathbb{E}(X)$.

Proof: Since $|X_n| \le Y$, we have that $Y + X_n \ge 0$ and $Y - X_n \ge 0$. Applying Fatou's Lemma to both variables, we obtain

$$\liminf_n \mathbb{E}(Y + X_n) \ge \mathbb{E}(Y + X) \qquad\qquad \liminf_n \mathbb{E}(Y - X_n) \ge \mathbb{E}(Y - X)$$

Since Y has finite expectation, we can subtract $\mathbb{E}(Y)$ from both sides above, and obtain

$$\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}(X) \qquad\qquad \liminf_n \big[-\mathbb{E}(X_n)\big] \ge -\mathbb{E}(X) \;\Leftrightarrow\; \limsup_n \mathbb{E}(X_n) \le \mathbb{E}(X)$$

Together, the last two inequalities imply that $\lim_n \mathbb{E}(X_n) = \mathbb{E}(X)$. ∎

o Theorem (MON – Monotone Convergence): Let $\{X_n\}$ be a sequence of random variables such that $0 \le X_1 \le X_2 \le \cdots$ almost surely. Then:

- $X_n \to X$ (possibly $\infty$) almost surely.
- $\mathbb{E}(X_n) \to \mathbb{E}(X)$ (possibly $\infty$).

Proof: Since $X_n \ge 0$, we can apply Fatou's Lemma to get

$$\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}\big(\liminf_n X_n\big) = \mathbb{E}(X)$$

However, the fact that $X_n \le X$ also gives $\mathbb{E}(X_n) \le \mathbb{E}(X)$, and hence $\limsup_n \mathbb{E}(X_n) \le \mathbb{E}(X)$. These two statements together imply that $\lim_n \mathbb{E}(X_n) = \mathbb{E}(X)$. ∎


o Proposition (Fubini I): Let $\{X_n\}$ be a sequence of random variables with $X_n \ge 0$ for all n. Then

$$\mathbb{E}\Big(\sum_n X_n\Big) = \sum_n \mathbb{E}(X_n)$$

Proof: Define

$$S_n = \sum_{m=1}^n X_m \qquad\qquad S = \sum_{m=1}^\infty X_m \quad \text{(possibly } \infty\text{)}$$

(The last sum is well defined because the $X_n$ are non-negative, which means the sequence $\{S_n\}$ is non-decreasing almost surely – there is no outcome in the probability space on which adding an extra term to the sum decreases it.) By monotone convergence, we have that $\mathbb{E}(S_n) \to \mathbb{E}(S)$. However, since each $S_n$ is a finite sum,

$$\mathbb{E}(S_n) = \mathbb{E}\Big(\sum_{m=1}^n X_m\Big) = \sum_{m=1}^n \mathbb{E}(X_m) \to \sum_{m=1}^\infty \mathbb{E}(X_m)$$

And

$$\mathbb{E}(S) = \mathbb{E}\Big(\sum_{m=1}^\infty X_m\Big)$$

Combining these two results yields

$$\mathbb{E}\Big(\sum_{m=1}^\infty X_m\Big) = \sum_{m=1}^\infty \mathbb{E}(X_m)$$

As required. ∎

o Proposition (Fubini II): Let $\{X_n\}$ be a sequence of random variables with $\mathbb{E}\big(\sum_n |X_n|\big) < \infty$. Then

$$\mathbb{E}\Big(\sum_n X_n\Big) = \sum_n \mathbb{E}(X_n)$$

Proof: As above, set $S_n = \sum_{m=1}^n X_m$. By the triangle inequality, we have

$$|S_n| = \Big|\sum_{m=1}^n X_m\Big| \le \sum_{m=1}^n |X_m| \le \underbrace{\sum_{m=1}^\infty |X_m|}_{Y} < \infty \quad \text{a.s.}$$

(The last inequality follows because the statement of the proposition includes the fact that the expectation of the infinite sum of $|X_m|$ is finite. If the sum were infinite on a set of non-zero measure, the expectation would blow up. Thus, the sum must be finite almost surely.)

Now, note that

$$\Big|S_n - \sum_{m=1}^\infty X_m\Big| = \Big|\sum_{m=n+1}^\infty X_m\Big| \le \sum_{m=n+1}^\infty |X_m| \to 0 \quad \text{a.s. as } n \to \infty$$

(The last line follows from the fact that the sum is finite almost surely. Thus, if we continually whittle away at the sum from below, it will eventually shrink to 0 almost surely.)

The above implies that $S_n \to \sum_{m=1}^\infty X_m$ almost surely. We can now apply dominated convergence, with Y as the dominating variable. First note that, since each $S_n$ is a finite sum,

$$\mathbb{E}(S_n) = \mathbb{E}\Big(\sum_{m=1}^n X_m\Big) = \sum_{m=1}^n \mathbb{E}(X_m)$$

Now take limits on both sides:

$$\lim_n \mathbb{E}(S_n) = \lim_n \sum_{m=1}^n \mathbb{E}(X_m) = \sum_{m=1}^\infty \mathbb{E}(X_m)$$

By dominated convergence, we have

$$\lim_n \mathbb{E}(S_n) = \mathbb{E}\big(\lim_n S_n\big) = \mathbb{E}\Big(\sum_{m=1}^\infty X_m\Big)$$

Combining the last two equations, we do indeed find that

$$\sum_{m=1}^\infty \mathbb{E}(X_m) = \mathbb{E}\Big(\sum_{m=1}^\infty X_m\Big)$$

As required. ∎

o We have, so far, looked at sufficient conditions for interchange. Is there a “common thread” that runs through these conditions?

o Definition (Uniform Integrability): A sequence of random variables $\{X_n\}$ is said to be uniformly integrable (u.i.) if for all $\varepsilon > 0$, there exists a $K(\varepsilon) < \infty$ such that

$$\sup_n \mathbb{E}\big(|X_n| ; |X_n| > K(\varepsilon)\big) \equiv \sup_n \mathbb{E}\big(|X_n|\,\mathbf{1}\{|X_n| > K(\varepsilon)\}\big) \le \varepsilon$$

Note that the supremum implies that the K we find must work uniformly for the entire sequence – in other words, for a given $\varepsilon$, there must be a single $K(\varepsilon)$ which makes the tail expectation small for every n. Requiring this to be true for a single $X_n$ is actually trivial, provided the expectation is finite, because a finite expectation implies that the tail eventually "dies out".

o Proposition: Let $\{X_n\}$ be a sequence of random variables and suppose $X_n \to X$ almost surely.

- If $\{X_n\}$ are uniformly integrable, then $\mathbb{E}(X_n) \to \mathbb{E}(X)$. In fact, a stronger result applies – that $\mathbb{E}|X_n - X| \to 0$.
- If $X_n \ge 0$ for all n and $\mathbb{E}(X_n) \to \mathbb{E}(X) < \infty$, then $\{X_n\}$ are uniformly integrable.

In the absence of the non-negativity requirement in the second statement, this would be an "if and only if" statement. We have almost identified the "bare minimum" condition for interchange.

Proof:

- Part 1: First, let us verify that X is integrable:

$$\mathbb{E}|X| = \mathbb{E}\big(\liminf_n |X_n|\big) \overset{\text{Fatou}}{\le} \liminf_n \mathbb{E}|X_n| \le \sup_n \mathbb{E}|X_n| = \sup_n \big[\mathbb{E}(|X_n| ; |X_n| \le a) + \mathbb{E}(|X_n| ; |X_n| > a)\big] \overset{\text{u.i.}}{\le} a + \varepsilon < \infty$$

Now, let $Y_n = |X_n - X|$. We then have $0 \le Y_n \le |X_n| + |X|$. Hence, $\{Y_n\}$ is also uniformly integrable (since $\{X_n\}$ is uniformly integrable, and X is integrable). Write

$$\mathbb{E}(Y_n) = \mathbb{E}(Y_n ; Y_n > a) + \mathbb{E}(Y_n ; Y_n \le a) \le \varepsilon + \mathbb{E}\big(Y_n\,\mathbf{1}\{Y_n \le a\}\big)$$

Now, note that $Y_n\,\mathbf{1}\{Y_n \le a\} \le a$, so by bounded convergence – and since $Y_n\,\mathbf{1}\{Y_n \le a\} \to 0$ a.s. because $X_n \to X$ a.s. – we have that $\mathbb{E}\big(Y_n\,\mathbf{1}\{Y_n \le a\}\big) \to 0$. Thus, $\limsup_n \mathbb{E}(Y_n) \le \varepsilon$, and since $\varepsilon$ was arbitrary, $\mathbb{E}(Y_n) \to 0$, as required.


- Part 2: Given $a > 1$, define

$$f_a(x) = \begin{cases} x & x \in [0, a-1] \\ (a-1)(a-x) & x \in [a-1, a] \quad \text{(the line connecting } (a-1, a-1) \text{ to } (a, 0)\text{)} \\ 0 & x > a \end{cases}$$

Clearly, $f_a$ is continuous, and

$$x\,\mathbf{1}\{x \le a-1\} \le f_a(x) \le x\,\mathbf{1}\{x \le a\}$$

Now,

$$\mathbb{E}(X_n ; X_n > a) = \mathbb{E}(X_n) - \mathbb{E}(X_n ; X_n \le a) \le \mathbb{E}(X_n) - \mathbb{E}\big[f_a(X_n)\big]$$

Now, $f_a$ is clearly bounded, and $X_n \to X$ a.s. Thus, applying bounded convergence, $\mathbb{E}[f_a(X_n)] \to \mathbb{E}[f_a(X)]$. Furthermore, by the statement of the proposition, we have $\mathbb{E}(X_n) \to \mathbb{E}(X)$. Thus, the RHS above can be made arbitrarily close to the difference between these two limiting expectations. In other words, there exists an $n_0$ such that for all $n > n_0$,

$$\mathbb{E}(X_n ; X_n > a) \le \mathbb{E}(X) - \mathbb{E}\big[f_a(X)\big] + \varepsilon \le \mathbb{E}(X) - \mathbb{E}(X ; X \le a-1) + \varepsilon = \mathbb{E}(X ; X > a-1) + \varepsilon$$

Since X is integrable, the first term on the right can be made arbitrarily small by choosing a large enough; the finitely many indices $n \le n_0$ can then be handled individually, since each $X_n$ is integrable. ∎

o Proposition (Sufficient conditions for u.i.):

- If $\sup_n \mathbb{E}|X_n|^p < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.
- If $|X_n| \le Y_n$ and $\{Y_n\}$ is uniformly integrable, then $\{X_n\}$ is uniformly integrable.
- If $\{X_n\}$ and $\{Y_n\}$ are uniformly integrable, then $\{X_n + Y_n\}$ is uniformly integrable.

Proof (part 1): We will prove the first part of the statement above. To do that, however, we will need to use Hölder's Inequality.


Proposition (Hölder's Inequality): Let X and Y be random variables such that $\mathbb{E}|X|^p < \infty$ and $\mathbb{E}|Y|^q < \infty$, where $p, q > 1$ are such that $p^{-1} + q^{-1} = 1$. Then

$$\mathbb{E}|XY| \le \big(\mathbb{E}|X|^p\big)^{1/p} \big(\mathbb{E}|Y|^q\big)^{1/q}$$

When $p = q = 2$, this inequality is none other than the Cauchy-Schwarz inequality:

$$\mathbb{E}(XY) \le \sqrt{\mathbb{E}(X^2)\,\mathbb{E}(Y^2)}$$

Proof: If $X = 0$ or $Y = 0$ a.s., the inequality holds trivially. As such, suppose $x = (\mathbb{E}|X|^p)^{1/p} > 0$ and $y = (\mathbb{E}|Y|^q)^{1/q} > 0$. Suppose we are able to show that, for every point $\omega$ in the sample space,

$$\frac{|X(\omega)\,Y(\omega)|}{xy} \le \frac{1}{p}\,\frac{|X(\omega)|^p}{x^p} + \frac{1}{q}\,\frac{|Y(\omega)|^q}{y^q}$$

Then, taking expectations and rearranging, this becomes Hölder's Inequality. To show this statement is true, we write $a = |X(\omega)|/x$ and $b = |Y(\omega)|/y$. The statement then becomes

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}$$

with a and b non-negative (the case where either is 0 is trivial). We can therefore write $a = e^{s/p}$ and $b = e^{t/q}$. The expression then follows from the convexity of the exponential function:

$$\exp\Big(\frac{s}{p} + \frac{t}{q}\Big) \le \frac{e^s}{p} + \frac{e^t}{q}$$

We have therefore proved Hölder's Inequality. ∎

We will also require the following result:

Lemma: If $p > q > 1$, then $\mathbb{E}|X|^p < \infty \Rightarrow \mathbb{E}|X|^q < \infty$.

Proof:

$$\mathbb{E}|X|^q = \mathbb{E}\big(|X|^q ; |X| < 1\big) + \mathbb{E}\big(|X|^q ; |X| \ge 1\big) \le 1 + \mathbb{E}\big(|X|^p ; |X| \ge 1\big) < \infty$$

As required. ∎

Now, fix $K < \infty$ and $n \ge 1$, and take p from the statement of the proposition with q such that $p^{-1} + q^{-1} = 1$. We then use Hölder's Inequality to obtain

$$\mathbb{E}\big(|X_n| ; |X_n| > K\big) = \mathbb{E}\big(|X_n|\,\mathbf{1}\{|X_n| > K\}\big) \le \big(\mathbb{E}|X_n|^p\big)^{1/p} \Big(\mathbb{E}\,\mathbf{1}\{|X_n| > K\}\Big)^{1/q}$$

Taking a supremum over the first factor,

$$\mathbb{E}\big(|X_n| ; |X_n| > K\big) \le \Big(\sup_n \mathbb{E}|X_n|^p\Big)^{1/p}\,\mathbb{P}\big(|X_n| > K\big)^{1/q}$$

By the assumption in the proposition, the first expression is finite. Let its value be C:

$$\mathbb{E}\big(|X_n| ; |X_n| > K\big) \le C\,\mathbb{P}\big(|X_n| > K\big)^{1/q}$$

Using Markov's Inequality,

$$\mathbb{E}\big(|X_n| ; |X_n| > K\big) \le C\left(\frac{\mathbb{E}|X_n|}{K}\right)^{1/q} \le C\left(\frac{\sup_n \mathbb{E}|X_n|}{K}\right)^{1/q}$$

Now, the first moment of $X_n$ is bounded uniformly in n, because we already know a higher moment is (by the lemma above); write $C' = \sup_n \mathbb{E}|X_n|$. Then

$$\sup_n \mathbb{E}\big(|X_n| ; |X_n| > K\big) \le C\left(\frac{C'}{K}\right)^{1/q}$$

The RHS no longer depends on n. For any $\varepsilon$, provided we choose $K \ge C'(C/\varepsilon)^q$, our condition for uniform integrability is satisfied. ∎


Proof (part 1 – alternative): We propose an alternative, somewhat simpler proof of part 1. On the event $\{|X_n| > a\}$ we have $(|X_n|/a)^{p-1} \ge 1$, so

$$\mathbb{E}\big(|X_n| ; |X_n| > a\big) \le \mathbb{E}\left(|X_n| \left\{\frac{|X_n|}{a}\right\}^{p-1} ; |X_n| > a\right) = \frac{1}{a^{p-1}}\,\mathbb{E}\big(|X_n|^p ; |X_n| > a\big) \le \frac{K}{a^{p-1}}$$

where $K = \sup_n \mathbb{E}|X_n|^p$. Setting $a \ge (K/\varepsilon)^{1/(p-1)}$ gives the required result. ∎

• Kolmogorov's 3-Series Theorem

o Let $\{X_n\}$ be a sequence of random variables, and let

$$S_n = \sum_{m=1}^n X_m$$

We now consider the question of when $S_n$ converges. What condition is needed on the distribution of the $X_n$ for this to happen? Clearly, if the $X_n$ are IID (and non-degenerate), this will not be the case, since every additional variable in the sum adds something to the sum – IID variables are either "dead at the start" or "never die out".

o Theorem (Kolmogorov 3-series): Let $\{X_n\}$ be a sequence of independent random variables. Then $\sum_n X_n$ converges (almost surely) if and only if for some (and therefore every) $K > 0$, the following conditions hold:

- $\sum_n \mathbb{P}(|X_n| > K) < \infty$. (Clearly, for example, if the $X_n$ are IID without bounded support, we'll never be able to satisfy this.) This is a statement that the "mass in the tails" must decrease with n.
- $\sum_n \mathbb{E}(X_n ; |X_n| \le K)$ converges.
- $\sum_n \mathrm{Var}(X_n ; |X_n| \le K) < \infty$.

Note that there are no issues of existence of the expected value and the variance in the last two points because these are taken over a bounded range – namely on $X_n \in [-K, K]$.

Proof: There is a simple proof of this result which relies on martingales. We will cover it later in this course.


o Proposition (2nd version of 3-series theorem): Let $\{X_n\}$ be independent random variables with $\mathbb{E}(X_n) = 0$ for all n. If $\sum_n \mathrm{Var}(X_n) < \infty$, then $S_n = \sum_{m=1}^n X_m$ converges almost surely.

Proof: Again, using martingale theory.

o Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID exponential random variables with mean 1. Does the sum $S_n = \sum_m X_m$ converge? The deterministic series $\sum_n 1/n$ does not converge, and we wonder whether the exponential variables will be close enough to 0 often enough to "fix" that. In light of the result we derived in the previous lecture (that draws from an exponential will be arbitrarily large infinitely often), one might expect this series not to converge. Let us check the conditions formally, for a fixed $K > 0$.

- First condition:

$$\mathbb{P}(X_n > K) = \mathbb{P}\Big(\frac{Y_n}{n} > K\Big) = \mathbb{P}(Y_n > nK) = e^{-nK}$$

This does indeed sum to a finite number:

$$\sum_n \mathbb{P}(X_n > K) < \infty$$

The first condition is therefore met.

- Second condition:

$$\mathbb{E}(X_n ; X_n \le K) = \mathbb{E}\Big(\frac{Y_n}{n} ; Y_n \le nK\Big) = \frac{1}{n}\,\mathbb{E}(Y_n ; Y_n \le nK)$$

But we note that

$$\mathbb{E}(Y_1 ; Y_1 \le s) = \int_0^s y e^{-y}\,\mathrm{d}y > 0 \quad \forall s > 0$$

and this increases to $\mathbb{E}(Y_1) = 1$ as $s \to \infty$, so the terms above behave like $1/n$. It is clear, therefore, that condition 2 is violated, because the sum of $1/n$ diverges. So, as we expected, $S_n$ does not converge almost surely. ∎
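The divergence is easy to see numerically. The Python sketch below (an illustration, not from the lecture) records the partial sums at a few checkpoints; they keep growing roughly like the harmonic series, rather than settling to a limit:

```python
import random

# S_n = sum_{k<=n} Y_k / k with Y_k IID exp(1): E(S_n) is the harmonic sum
# H_n ~ log n, and Var(S_n) stays bounded, so the partial sums drift upward
# like log n -- no almost-sure limit.
rng = random.Random(5)
s, checkpoints = 0.0, {}
for k in range(1, 1_000_001):
    s += rng.expovariate(1.0) / k
    if k in (10**3, 10**4, 10**5, 10**6):
        checkpoints[k] = s
print({k: round(v, 2) for k, v in checkpoints.items()})
```

Each tenfold increase in the horizon adds roughly $\ln 10 \approx 2.3$ to the partial sum, just as the failed second condition predicts.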

o Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID random variables with

$$\mathbb{P}(Y_n = 1) = p = 1 - \mathbb{P}(Y_n = -1)$$

Once again, we wonder whether $S_n = \sum_{m=1}^n X_m$ converges. The intuition here is that the deterministic series $\sum_n (-1)^n / n$ does converge. Our version is this sum, but instead of deterministically flipping between positive and negative, we randomly switch between positive and negative. We wonder whether this will "spoil" the convergence. We fix $K > 0$ (say $K = 1$) and test the three conditions:

- First condition:

$$\mathbb{P}(|X_n| > 1) = 0$$

So clearly, this sum does converge.

- Second condition:

$$\mathbb{E}(X_n ; |X_n| \le 1) = \frac{1}{n}\,\mathbb{E}(Y_n ; |Y_n| \le n) = \frac{1}{n}\,\mathbb{E}(Y_n) = \frac{2p - 1}{n}$$

We know, however, that $\sum_n \frac{1}{n}$ diverges, so the infinite sum of these expectations only converges if $p = \tfrac{1}{2}$, in which case the numerator is 0. We therefore restrict our attention to that case.

- Third condition (assuming $p = \tfrac{1}{2}$):

$$\mathrm{Var}(X_n ; |X_n| \le 1) = \frac{1}{n^2}\,\mathrm{Var}(Y_n ; |Y_n| \le n) = \frac{1}{n^2}\,\mathrm{Var}(Y_n) = \frac{\mathbb{E}(Y_n^2) - 0}{n^2} = \frac{1}{n^2}$$

The infinite sum of the variances is therefore finite. As such, our sum does converge. ∎
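The random-sign series can be simulated directly. The Python sketch below (an illustration, not from the lecture) records each sample path of $S_n = \sum_{k \le n} Y_k/k$ at several late checkpoints; within each path the recorded values barely move, exactly as convergence demands:

```python
import random

# S_n = sum_{k<=n} Y_k / k with IID signs Y_k = +/-1, p = 1/2.  All three
# series conditions hold, so each sample path settles to a (path-dependent)
# limit; the tail beyond n has standard deviation ~ 1/sqrt(n).
def path(seed, n_max):
    rng = random.Random(seed)
    s, recorded = 0.0, []
    for k in range(1, n_max + 1):
        s += rng.choice((1, -1)) / k
        if k % (n_max // 5) == 0:
            recorded.append(s)
    return recorded

paths = [path(seed, 500_000) for seed in range(3)]
for vals in paths:
    print(vals)   # within each path, the five recorded values are nearly equal
```

Different seeds settle to different limits – the limit is a genuine random variable – but every path settles.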

o Example: Set $X_n = Y_n / \sqrt{n}$, where the $Y_n$ are IID random variables with probability ½ of being 1 and probability ½ of being –1. Again, the deterministic sum $\sum_n (-1)^n / \sqrt{n}$ converges, and we wonder whether adding randomness will make a difference. Since, in this case, $\mathbb{E}(X_n) = 0$ and $|X_n| \le 1$, the first two conditions of the 3-series theorem hold trivially with $K = 1$, and only the third needs checking:

$$\mathrm{Var}(X_n ; |X_n| \le 1) = \mathrm{Var}(X_n) = \mathbb{E}(X_n^2) = \frac{1}{n}$$

The infinite sum of these variances clearly diverges, so by the "only if" direction of the 3-series theorem, $S_n$ does not converge. It is interesting to note how sensitive this result is to scaling – in the $1/n$ case, the convergence in the deterministic case carries over to the random case, whereas in the $1/\sqrt{n}$ case, it does not. ∎

• The Strong Law of Large Numbers

o Theorem (SLLN): Let $\{X_n\}$ be IID with $\mathbb{E}|X_1| < \infty$. Then

$$\frac{1}{n}\sum_{k=1}^n X_k \to \mathbb{E}(X_1) \quad \text{a.s.}$$

Note: there is a simpler proof of this theorem that assumes the fourth moment of $X_1$ is bounded. We prove the more general theorem that does not require this stipulation.

Proof:

- Step 1 – Truncation: Let

$$Y_n = \begin{cases} X_n & |X_n| \le n \\ 0 & \text{otherwise} \end{cases}$$

In a sense, $\{Y_n\}$ is a "truncated" form of $\{X_n\}$. We will first prove the result for the truncated form, and later show that the result then carries over to the non-truncated form.

- Step 2: Write $Y_n$ as

$$Y_n = \underbrace{\big[Y_n - \mathbb{E}(Y_n)\big]}_{w_n} + \mathbb{E}(Y_n)$$

Now, note that since $Y_n$ only contains values of $X_n$ that are at most n in absolute value,

$$\mathbb{E}(Y_n) = \mathbb{E}\big(X_n ; |X_n| \le n\big) = \mathbb{E}\big(X_n\,\mathbf{1}\{|X_n| \le n\}\big) = \mathbb{E}\big(X_1\,\mathbf{1}\{|X_1| \le n\}\big)$$

(the last step because the $X_n$ are identically distributed).


We will now evaluate this expectation using dominated convergence. Note that:

1. $|X_1\,\mathbf{1}\{|X_1| \le n\}| \le |X_1|$.
2. Since $X_1$ has finite mean, it is the case that $|X_1| < \infty$ a.s., and therefore $X_1\,\mathbf{1}\{|X_1| \le n\} \to X_1$ a.s.

(1) justifies using dominated convergence, and (2) gives us the limit. We get

$$\mathbb{E}\big(X_1\,\mathbf{1}\{|X_1| \le n\}\big) \to \mathbb{E}(X_1) \qquad \text{i.e.} \qquad \mathbb{E}(Y_n) \to \mathbb{E}(X_1)$$

- Step 3: We have shown that as $n \to \infty$, $Y_n$ can be written as $Y_n = w_n + \mathbb{E}(Y_n)$ with $\mathbb{E}(Y_n) \to \mathbb{E}(X_1)$, or in other words that

$$\frac{1}{n}\sum_{k=1}^n Y_k = \frac{1}{n}\sum_{k=1}^n w_k + \frac{1}{n}\sum_{k=1}^n \mathbb{E}(Y_k)$$

The obvious next step is to show that the first term above tends to 0 almost surely. To do this, we will need two lemmas.

Lemma (Cesaro Sum Property): Let {an } be a real valued

sequence with aan  ¥ . Then

1 n aa¥as n å k ¥ n k =1 Proof: Let e > 0 , and choose an N such that

aak > ¥ -³e whenever kN Then

1 nNïïìü1 nN- lim inf aaa³ lim inf ïï+- ( e) nk¥ ååkk==11n íý k¥ nnïïïîþn ï

³+0 a¥ -e

By a similar argument, lim sup £ a¥ . The result follows. 

Lemma (Kronecker’s Lemma): Let {an } be a real valued sequence. Then


\[
\left(\sum_{k=1}^{\infty} \frac{a_k}{k} \ \text{converges}\right) \;\Longrightarrow\; \left(\frac{1}{n}\sum_{k=1}^{n} a_k \to 0\right)
\]

Proof: Let $u_n = \sum_{k \le n} \frac{a_k}{k}$. By the assumption of the lemma, $u_\infty = \lim_n u_n$ exists. Then
\[
\frac{a_n}{n} = u_n - u_{n-1}
\]
We then have
\[
\sum_{k=1}^n a_k = \sum_{k=1}^n k\,(u_k - u_{k-1}) = n u_n - \sum_{k=1}^n u_{k-1}
\]
As such
\[
\frac{1}{n}\sum_{k=1}^n a_k = u_n - \frac{1}{n}\sum_{k=1}^n u_{k-1}
\]
The first term clearly tends to $u_\infty$ as $n \to \infty$. By the Cesaro Sum Property, so does the second term. Thus
\[
\frac{1}{n}\sum_{k=1}^n a_k \to u_\infty - u_\infty = 0
\]
As required. 

Using these two lemmas, our (rough) strategy will be as follows:

 Step 3a: Show that $\sum_{k=1}^{n} \frac{w_k}{k}$ converges almost surely, using the Kolmogorov 3-series theorem.
 Step 3b: Use Kronecker's Lemma to deduce that for every outcome in the sample space for which $\sum_{k=1}^{n} \frac{w_k}{k}$ converges, $\frac{1}{n}\sum_{k=1}^n w_k$ converges to 0.
 Step 3c: Note that since $\sum_{k=1}^{n} \frac{w_k}{k}$ converges almost surely, $\frac{1}{n}\sum_{k=1}^n w_k$ converges to 0 almost surely.

Step 3a: Note that since $w_k = Y_k - \mathbb{E}(Y_k)$, we can conclude that $\mathbb{E}(w_k) = 0$. Furthermore, the $w_k$ are independent of each other (because they are built up from the $Y_k$, which are built up from the $X_k$, which are independent). We can therefore use the second form of the Kolmogorov 3-series Theorem to show that $\sum_{k=1}^{n} \frac{w_k}{k}$ converges almost surely. (Note that


some steps in the exposition below require results about moments of random variables which we will prove in a homework – these are indicated by a *.)

w 1 ark =- ar()YY ( ) ( k ) k 2 kk 1 = ar()Y k 2 k ()YY22- () = kk k 2 ()Y 2 £ k k 2 1 ¥ *2=>yY()k y d y k 2 ò0 1 k =>2yX()k yy d k 2 ò0 1 k =>2yX()1 yy d k 2 ò0 As such

¥¥w 1 k ark £> 2yX yy d ååk ==1 ( k ) k 1 2 ( 1 ) k ò0 By Frobini’s Lemma, we can interchange the summation and integration, since the integrands are all positive

¥ w ¥ ¥ 1 ark £> 2yX yy d åk=1 ()kykò åk=1 2 {}£ ()1 0 k ¥ ¥ 1 £>2yX y d y ò ()1 åk=êúy 2 0 ëûêúk Note, however, that the sum in this expression above can – in theory – be bounded by some integral (for example, 1/(k – 1)2 works well). Thus, we can write

¥ w ¥ C ark £>1 2yX y åk =1 ()k ò0 ()1 Cy2 + ¥ £>CXy 31ò0 ()

* = CX3 1 <¥ So the sum does indeed converge. . Step 3b & 3c: Consider the following two events


\[
A = \left\{\omega \,:\, \sum_{k=1}^{\infty} \frac{w_k(\omega)}{k} \ \text{converges}\right\}
\qquad
B = \left\{\omega \,:\, \frac{1}{n}\sum_{k=1}^{n} w_k(\omega) \to 0\right\}
\]
Now, by Kronecker's Lemma, $A \subseteq B$. But by Step 3a, $\mathbb{P}(A) = 1$. Thus, $\mathbb{P}(B) = 1$ and
\[
\frac{1}{n}\sum_{k=1}^{n} w_k \to 0 \quad \text{a.s.}
\]
Thus, from Step 2,
\[
\frac{1}{n}\sum_{k=1}^{n} Y_k \to \mathbb{E}(X_1) \quad \text{a.s.}
\]

. Step 4: We have now proved the SLLN for the truncated series {Yn } . All that remains to be shown is that the result also applies to the full series

{Xn } . Now, consider that

\[
\mathbb{P}(X_n \ne Y_n) = \mathbb{P}(|X_n| > n)
\]
But note that
\[
\sum_n \mathbb{P}(|X_n| > n) = \sum_n \mathbb{P}(|X_1| > n) \le \mathbb{E}|X_1| < \infty
\]
So by the first Borel–Cantelli Lemma, $|X_n| \le n$ for all large enough $n$, almost surely. Thus, for large enough $n$, $X_n = Y_n$ almost surely. Now, consider that, by the triangle inequality,
\[
\left|\frac{1}{n}\sum_{k=1}^{n} X_k - \frac{1}{n}\sum_{k=1}^{n} Y_k\right| \le \frac{1}{n}\sum_{k=1}^{n} |X_k - Y_k|
\]
But we have just shown that $X_k - Y_k = 0$ for all large enough $k$, almost surely; by the Cesaro sum property, the right-hand side therefore tends to 0 almost surely. As such
\[
\frac{1}{n}\sum_{k=1}^{n} X_k - \frac{1}{n}\sum_{k=1}^{n} Y_k \to 0 \quad \text{a.s.}
\]
Thus, the SLLN holds for the original sequence $X_n$. 
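The conclusion of the theorem is easy to see in simulation. A minimal Monte Carlo sketch (the Exp(1) distribution, seed, and sample size are hypothetical choices):

```python
# Monte Carlo sanity check of the SLLN: for IID Exp(1) variables
# (mean 1, finite first moment), the sample mean settles at 1.
import random

random.seed(0)
n = 200000
total = 0.0
for _ in range(n):
    total += random.expovariate(1.0)  # Exp(1) draw, mean 1
sample_mean = total / n
print(sample_mean)  # close to E(X_1) = 1
```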

o Example: Let $\{X_n\}$ be an IID sequence of random variables with $\mathbb{E}(X_n) = 0$ and $\mathbb{E}(X_n^2) = \sigma^2 < \infty$. Let $S_n = \sum_{k=1}^n X_k$. We know, from the SLLN, that
\[
\frac{S_n}{n} \to 0 \quad \text{a.s.}
\]


We might wonder, however, whether this still holds true if, instead of dividing by

n, we divide by some quantity $a_n$, where $a_n \to \infty$, but possibly slower than n.

We might wonder what the smallest such an is that still makes the sum converge.

Let us consider, for example, $a_n = \sqrt{n (\log n)^{1+\varepsilon}}$. Since $\log n$ grows extremely slowly, this is really very close to $\sqrt{n}$ growth. Consider that
\[
\operatorname{Var}\!\left(\frac{X_n}{a_n}\right) = \frac{\sigma^2}{n(\log n)^{1+\varepsilon}}
\]
And this implies that
\[
\sum_n \operatorname{Var}\!\left(\frac{X_n}{a_n}\right) < \infty
\]
Thus, by the Kolmogorov 3-series theorem, $\sum_n \frac{X_n}{a_n}$ converges almost surely, and by Kronecker's Lemma, this means that $S_n / a_n \to 0$. Thus, we see that even with such a slowly growing $a_n$, an almost-sure result still holds.

However, it is interesting to note that if we take $a_n = \sqrt{n}$ (slightly smaller), then no almost-sure result holds anymore. In fact, we will show in the next lecture that the best we can say is $S_n/\sqrt{n} \Rightarrow \sigma N(0,1)$. These arguments are clearly extremely sensitive to scaling. 
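These scaling regimes can be explored by simulation. A minimal sketch with centered $\pm 1$ coin flips ($\sigma^2 = 1$; the choices $\varepsilon = 0.5$, the seed, and the sample size are hypothetical):

```python
# With a_n = sqrt(n (log n)^(1+eps)), S_n / a_n drifts to 0, while
# S_n / sqrt(n) keeps fluctuating on a unit scale.
import math
import random

random.seed(1)
eps = 0.5
n = 1_000_000
s = 0
for _ in range(n):
    s += 1 if random.random() < 0.5 else -1
a_n = math.sqrt(n * math.log(n) ** (1 + eps))
print(s / a_n, s / math.sqrt(n))
```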

LECTURE 3 – 2nd February 2011

 Weak Convergence & the Central Limit Theorem

o Definition (Weak convergence): A sequence of random variables $\{X_n\}$ is said to converge weakly to a random variable $X$, denoted $X_n \Rightarrow X$ as $n \to \infty$, if and only if
\[
\mathbb{E}\big[f(X_n)\big] \to \mathbb{E}\big[f(X)\big]
\]
for all bounded continuous functions $f$. Comments:
. This is clearly a very difficult definition to use, because there are "too many" bounded continuous functions. We will see later, however, that it


suffices to consider a smaller set of functions that is a “basis” for these functions.

. If XXn  almost surely, then fX()n  fX (), because f is continuous, and éùéùfX() fX (), by bounded convergence (since the functions ëûëûêúêún are bounded). Thus, almost sure convergence implies weak convergence.

. Each member of {Xn } as well as X need not even live on the same probability space. Indeed, the theorem only mentions convergence of expectations, which are effectively integrals

\[
\int f(x)\,\mathrm{d}F_n(x) \to \int f(x)\,\mathrm{d}F(x)
\]
Nothing prevents the integrals from being taken over different spaces. Note that this is radically different from the context of almost sure convergence, which fixes an $\omega$ in the probability space and requires convergence. This is not possible here, because the variables might not even live in the same probability space.

o Definition (Weak convergence): A sequence of real-valued random variables $\{X_n\}$ converges weakly to $X$ if and only if
\[
F_n(x) \to F(x)
\]
at all continuity points $x$ of $F$, where $X_n \sim F_n$ and $X \sim F$. (This definition makes it even clearer that the variables can live in different spaces.)

o Example: Take $X_1, X_2, \ldots$ IID, with $\mathbb{E}(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2$. Then
\[
\frac{\sum_{i=1}^{n} (X_i - \mu)}{\sqrt{n}} \Rightarrow \sigma N(0,1)
\]
This is none other than the central limit theorem. (Note the slight abuse of notation, which will continue throughout the course – when we say $\Rightarrow F$, we mean $\Rightarrow X$ where $X \sim F$.) We will prove this later in this lecture. 

o Example (Birthday Problem): Let $X_1, X_2, \ldots$ be IID, uniformly distributed on

$\{1, \ldots, n\}$ (if the $X_n$ were birthdays, we would have n = 365). Now, fix $k < n$ (the number of people amongst whom we are looking for a match), and let


\[
Y_n = \min\{\,i \,:\, X_i = X_j \ \text{for some } j < i\,\}
\]
This is effectively the smallest $i$ for which $X_i$ matches an earlier $X$. We are now interested in the quantity
\[
\mathbb{P}(\text{No match in group of } k \text{ people}) = \mathbb{P}(Y_n > k) = 1 \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n}
\]
We could plug in numbers of $n$ and $k$ into the expression above and calculate the product. This is somewhat inconvenient – let's try to use weak convergence arguments to get an approximation to this instead. Let us fix $x > 0$.

\[
\mathbb{P}\!\left(\frac{Y_n}{\sqrt{n}} > x\right) = \mathbb{P}\big(Y_n > x\sqrt{n}\big) = \prod_{i=2}^{\lfloor x\sqrt{n}\rfloor} \frac{n-i+1}{n} = \prod_{i=2}^{\lfloor x\sqrt{n}\rfloor} \left(1 - \frac{i-1}{n}\right)
\]
Let us take logarithms of both sides
\[
\log \mathbb{P}\!\left(\frac{Y_n}{\sqrt{n}} > x\right) = \sum_{i=2}^{\lfloor x\sqrt{n}\rfloor} \log\!\left(1 - \frac{i-1}{n}\right)
\]
We now need a quick definition.

Definition: $f(x) \sim g(x)$ as $x \to \infty$ if $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$. Similarly, $a_n \sim b_n$ if $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$. Note that this does not necessarily imply either function/sequence converges.

Now, note that $\log(1-x) \sim -x$ as $x \to 0$. As such, $\log\big(1 - \frac{y}{n}\big) \sim -\frac{y}{n}$. Therefore
\[
\log \mathbb{P}\!\left(\frac{Y_n}{\sqrt{n}} > x\right) \sim -\sum_{i=2}^{\lfloor x\sqrt{n}\rfloor} \frac{i-1}{n} = -\frac{1}{n}\sum_{i=2}^{\lfloor x\sqrt{n}\rfloor} (i-1) \sim -\frac{1}{n}\cdot\frac{x^2 n}{2} = -\frac{x^2}{2}
\]
As such
\[
\mathbb{P}\!\left(\frac{Y_n}{\sqrt{n}} > x\right) \to \exp\!\left(-\tfrac{1}{2}x^2\right)
\]


The limiting function $1 - \exp(-x^2/2)$ is clearly a valid distribution function (it equals 0 at $x = 0$ and increases to 1), and we therefore have weak convergence by the second definition.
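The quality of this approximation is easy to check numerically. A minimal sketch for the classical $n = 365$, $k = 23$ case (the function name is a hypothetical choice):

```python
# Exact no-match probability in the birthday problem versus the weak-limit
# approximation P(Y_n / sqrt(n) > x) ~ exp(-x^2 / 2).
import math

def p_no_match(n, k):
    """Exact P(no shared birthday among k people, n equally likely days)."""
    p = 1.0
    for i in range(1, k):
        p *= (n - i) / n
    return p

n, k = 365, 23
exact = p_no_match(n, k)
approx = math.exp(-(k / math.sqrt(n)) ** 2 / 2)
print(exact, approx)
```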

Before we comment on this result, let us quote a quick Lemma

Lemma:
\[
\left(\frac{1}{x} - \frac{1}{x^3}\right) e^{-x^2/2} \le \int_x^\infty e^{-y^2/2}\,\mathrm{d}y \le \frac{1}{x}\, e^{-x^2/2}
\]
Proof: For the upper bound, since the integral is over $y \ge x$,
\[
\int_x^\infty e^{-y^2/2}\,\mathrm{d}y \le \int_x^\infty \frac{y}{x}\, e^{-y^2/2}\,\mathrm{d}y = \frac{1}{x}\, e^{-x^2/2}
\]
As required. For the lower bound:
\[
\int_x^\infty e^{-y^2/2}\,\mathrm{d}y \ge \int_x^\infty \left(1 - \frac{3}{y^4}\right) e^{-y^2/2}\,\mathrm{d}y = \left(\frac{1}{x} - \frac{1}{x^3}\right) e^{-x^2/2}
\]
(The last equality can be shown by differentiating the result.) 

Corollary: $\int_x^\infty e^{-y^2/2}\,\mathrm{d}y \sim \frac{1}{x} e^{-x^2/2}$ as $x \to \infty$. Making this more specific to a normally distributed random variable, suppose $X \sim N(0,\sigma^2)$. Then
\[
\mathbb{P}(X > x) = \int_x^\infty \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{y^2}{2\sigma^2}\right) \mathrm{d}y = \frac{1}{\sqrt{2\pi}} \int_{x/\sigma}^\infty e^{-y^2/2}\,\mathrm{d}y \sim \frac{1}{\sqrt{2\pi}}\,\frac{\sigma}{x}\, e^{-x^2/2\sigma^2}
\]

It is interesting to note, therefore, how similar our result looks to the central limit theorem, despite the fact that we did not center the original variables $Y_n$. The distribution we eventually obtained has a tail integral which, by the corollary, looks similar to that of a normal random variable. (In fact, the random variable with this tail integral is a Rayleigh random variable.)
 Remark: A full rigorous treatment of this example will be done in homework 1.
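The tail bounds in the lemma can be verified directly against the exact Gaussian tail, using $\int_x^\infty e^{-y^2/2}\,\mathrm{d}y = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$. A minimal sketch:

```python
# Check of the Gaussian tail bounds: for x > 0,
#   (1/x - 1/x^3) e^{-x^2/2} <= int_x^inf e^{-y^2/2} dy <= (1/x) e^{-x^2/2}.
import math

def tail_integral(x):
    """Exact value of int_x^inf e^{-y^2/2} dy via the complementary error function."""
    return math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))

for x in (1.5, 2.0, 3.0, 5.0):
    lo = (1 / x - 1 / x**3) * math.exp(-x * x / 2)
    hi = (1 / x) * math.exp(-x * x / 2)
    t = tail_integral(x)
    print(x, lo <= t <= hi, hi / t)  # the ratio hi/t approaches 1 as x grows
```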

o Example: Let $X_1, X_2, \ldots$ be IID $N(0,\sigma^2)$ random variables, and let $M_n = \max(X_1, \ldots, X_n)$. Consider the case $\sigma^2 = 1$, and let $a_n$ be such that
\[
\mathbb{P}(X_1 > a_n) = \frac{1}{n}
\]
(we will show in homework 1 that $a_n \sim \sqrt{2\log n}$). We would like to show that


-x  {aann()Mx-£ n } exp{ - e} "Î x We first note the RHS has no discontinuity points. Next, note that as x ¥, the RHS tends to 1, whereas as x -¥, the RHS tends to 0. Finally, we note the RHS is monotone. Thus, the RHS is a valid cumulative distribution function. Let us now show convergence:

\[
\mathbb{P}\!\left(M_n \le \frac{x}{a_n} + a_n\right) = \mathbb{P}\!\left(X_1 \le \frac{x}{a_n} + a_n, \ldots, X_n \le \frac{x}{a_n} + a_n\right) = \left[\mathbb{P}\!\left(X_1 \le \frac{x}{a_n} + a_n\right)\right]^n = \left[1 - \mathbb{P}\!\left(X_1 > \frac{x}{a_n} + a_n\right)\right]^n
\]
We will now require a lemma.

Lemma: Consider two real sequences $\{a_n\}$ and $\{b_n\}$. If $a_n \to 0$, $b_n \to \infty$ and $a_n b_n \to c$, then $(1 + a_n)^{b_n} \to e^c$ as $n \to \infty$.

This is precisely the situation in this case, with $b_n = n \to \infty$ and $a_n = -\mathbb{P}\big(X_1 > \frac{x}{a_n} + a_n\big) \to 0$, since $a_n \to \infty$. We must now consider whether $a_n b_n$ converges. To do this, we will make extensive use of the Lemma in the previous example (namely that $\mathbb{P}(X_1 > x) \sim \frac{1}{x\sqrt{2\pi}} e^{-x^2/2}$):
\[
n\,\mathbb{P}\!\left(X_1 > \frac{x}{a_n} + a_n\right) \sim n \cdot \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{x/a_n + a_n} \exp\!\left\{-\frac{1}{2}\left(\frac{x}{a_n} + a_n\right)^{\!2}\right\}
\]
\[
= n \cdot \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{x/a_n + a_n} \exp\!\left\{-\frac{x^2}{2a_n^2} - x - \frac{a_n^2}{2}\right\}
= \frac{e^{-x}\, e^{-x^2/2a_n^2}}{x/a_n + a_n} \cdot \underbrace{n\,\frac{1}{\sqrt{2\pi}}\, e^{-a_n^2/2}}_{\sim\, n\, a_n\, \mathbb{P}(X_1 > a_n)\, =\, a_n}
\]
\[
\sim \frac{a_n}{x/a_n + a_n}\, e^{-x^2/2a_n^2}\, e^{-x} \to e^{-x}
\]
(in the middle step we used $\mathbb{P}(X_1 > a_n) = 1/n$ together with the tail asymptotic $\mathbb{P}(X_1 > a_n) \sim \frac{1}{a_n\sqrt{2\pi}} e^{-a_n^2/2}$). As such, by the Lemma,


é ùn MX£+xxaa =-1 >+ ()nnaaê ()1 nú nnë û -x  e-e As required. 

o Definition (Tightness): A sequence of random variables $\{X_n\}$ with corresponding distribution functions $\{F_n\}$ is said to be tight if $\forall \varepsilon > 0$, $\exists K(\varepsilon) < \infty$ such that
\[
\sup_n \mathbb{P}\big\{|X_n| > K(\varepsilon)\big\} \le \varepsilon
\]
Equivalently,
\[
\sup_n \big[1 - F_n(K(\varepsilon)) + F_n(-K(\varepsilon))\big] \le \varepsilon
\]
Comment: A sufficient condition for tightness is $\sup_n \mathbb{E}|X_n| < \infty$. To see why, take $K > 0$, and consider that by Markov's Inequality
\[
\mathbb{P}\big(|X_n| > K\big) \le \frac{\mathbb{E}|X_n|}{K} \le \frac{\sup_n \mathbb{E}|X_n|}{K}
\]
which can be made arbitrarily small by choosing $K$ large.

o Proposition: Let $\{X_n\}$ be a sequence of random variables.
. If $X_n \Rightarrow X$, then $\{X_n\}$ is tight.
. If $\{X_n\}$ is tight, then there exists a subsequence $\{n_k\}$ such that $\{X_{n_k}\}$ converges weakly.
This is somewhat analogous to the concept of compactness in real analysis.

o Definition (Characteristic Function): The characteristic function (CF) of a random variable $X$ is given by
\[
\varphi_X(\theta) = \mathbb{E}\big[\exp(i\theta X)\big] \qquad \theta \in \mathbb{R}
\]
Remarks:
. $\varphi_X(0) = 1$
. $|\varphi_X(\theta)| \le 1$, because by Jensen's inequality, $|\varphi_X(\theta)| \le \mathbb{E}\big|\exp(i\theta X)\big| = 1$
. $\varphi_X^{(n)}(0) = i^n\,\mathbb{E}(X^n)$, provided $\mathbb{E}|X|^n < \infty$
. If $X$ and $Y$ are independent, $\varphi_{X+Y}(\theta) = \varphi_X(\theta)\,\varphi_Y(\theta)$, though the converse is false.


o Examples: If $X \sim N(\mu,\sigma^2)$, then
\[
\varphi_X(\theta) = \exp\!\left(i\mu\theta - \tfrac{1}{2}\sigma^2\theta^2\right)
\]
Note also that if $Y = \mu + \sigma X$, then
\[
\varphi_Y(\theta) = \mathbb{E}\big(e^{i\theta[\mu + \sigma X]}\big) = e^{i\theta\mu}\,\varphi_X(\sigma\theta)
\]
This is convenient in relating all normally distributed random variables to the standard normal. 
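The closed form for the normal CF can be checked by direct quadrature of $\mathbb{E}[e^{i\theta X}]$. A minimal sketch (the parameter values and helper name are hypothetical choices; a midpoint Riemann sum stands in for exact integration):

```python
# Quadrature check of the normal characteristic function:
# E[exp(i theta X)] for X ~ N(mu, sigma^2) vs exp(i mu theta - sigma^2 theta^2 / 2).
import cmath
import math

def cf_numeric(theta, mu, sigma, span=10.0, steps=40000):
    """Midpoint Riemann sum for E[e^{i theta X}] with X ~ N(mu, sigma^2)."""
    lo, hi = mu - span * sigma, mu + span * sigma
    dx = (hi - lo) / steps
    total = 0 + 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        dens = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        total += cmath.exp(1j * theta * x) * dens * dx
    return total

mu, sigma, theta = 0.7, 1.3, 0.9
numeric = cf_numeric(theta, mu, sigma)
closed = cmath.exp(1j * mu * theta - sigma**2 * theta**2 / 2)
print(abs(numeric - closed))  # tiny discretization error
```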

o Proposition (Levy characterization theorem): Let $\{X_n\}$ be a sequence of random variables with distribution functions $\{F_n\}$. If
. $\varphi_{X_n}(\theta) \to \varphi(\theta)$ for all $\theta \in \mathbb{R}$
. $\varphi(\theta)$ is continuous at 0
then $X_n \Rightarrow X$ and $X$ has characteristic function $\varphi(\theta)$. Comments:

. If XXn  , then the two conditions in the proposition trivially hold – this is really an “if and only if” statement. . One might wonder why the second condition in the theorem is needed. To

see why, consider, for example, XNnn ~(0,). We then have

2 jq()= e-q n/2 , and as n grows large, this converges to Xn ì ï0if q ¹ 0 jqX () í n ï1if q=0 îï This does not satisfy our continuity requirements at 0. To see why, consider that with K > 0,

æöK 1 XK>=ç N(0,1) >÷  ()n ç ÷ èøç n ÷ 2 In other words, the distribution in question is not tight – tail probabilities grown with n. The continuity condition, therefore, can be thought of as a tightness condition.


 The Central Limit Theorem

o Theorem: Let $X_1, X_2, \ldots$ be IID with mean $\mathbb{E}(X_1) = \mu$ and variance $\sigma^2 = \operatorname{Var}(X_1)$. Then
\[
\frac{\sum_{i=1}^{n}(X_i - \mu)}{\sqrt{n}} \Rightarrow \sigma N(0,1)
\]
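Before the proof, the mechanism behind it can be previewed numerically: for fair $\pm 1$ coin flips, $\varphi_X(\theta) = \cos\theta$ and $\sigma^2 = 1$, so $[\varphi_X(\theta/\sqrt{n})]^n$ should approach $e^{-\theta^2/2}$. A minimal sketch (the value of $\theta$ is an arbitrary choice):

```python
# CLT via characteristic functions for fair +/-1 coin flips:
# [cos(theta / sqrt(n))]^n converges to exp(-theta^2 / 2).
import math

theta = 1.7
target = math.exp(-theta**2 / 2)
errs = []
for n in (10, 100, 10000):
    val = math.cos(theta / math.sqrt(n)) ** n
    errs.append(abs(val - target))
print(errs)  # shrinking toward 0
```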

Proof: It suffices to prove this for the case $\mu = 0$. Let $S_n = \sum_{i=1}^n X_i$, and consider its characteristic function
\[
\varphi_{S_n/\sqrt{n}}(\theta) = \mathbb{E}\big[\exp\big(i\theta S_n/\sqrt{n}\big)\big] = \mathbb{E}\left[\exp\left(\frac{i\theta}{\sqrt{n}}\sum_{j=1}^n X_j\right)\right] = \mathbb{E}\left[\prod_{j=1}^n \exp\left(\frac{i\theta}{\sqrt{n}} X_j\right)\right] = \prod_{j=1}^n \mathbb{E}\left[\exp\left(\frac{i\theta}{\sqrt{n}} X_j\right)\right] = \left[\varphi_X\!\left(\frac{\theta}{\sqrt{n}}\right)\right]^n
\]
Now, let us take a Taylor expansion
\[
\varphi_X\!\left(\frac{\theta}{\sqrt{n}}\right) = \varphi_X(0) + \frac{\theta}{\sqrt{n}}\,\varphi_X'(0) + \frac{\theta^2}{2n}\,\varphi_X''(0) + R_n = 1 + \frac{\theta}{\sqrt{n}}\, i\,\mathbb{E}(X) - \frac{\theta^2}{2n}\,\mathbb{E}(X^2) + R_n = 1 - \frac{\theta^2\sigma^2}{2n} + R_n
\]
Now, consider that
\[
\varphi_{S_n/\sqrt{n}}(\theta) = \left[\varphi_X\!\left(\frac{\theta}{\sqrt{n}}\right)\right]^n = \left(1 - \frac{\theta^2\sigma^2}{2n} + R_n\right)^{\!n} = (1 + a_n)^{b_n}
\]
Now, suppose we can show that
\[
a_n b_n = n\left[-\frac{\theta^2\sigma^2}{2n} + R_n\right] = -\frac{\theta^2\sigma^2}{2} + nR_n \to -\frac{\theta^2\sigma^2}{2}
\]
(or in other words, that $nR_n \to 0$) as $n \to \infty$. Then we would have
\[
\varphi_{S_n/\sqrt{n}}(\theta) \to \exp\!\left(-\frac{\theta^2\sigma^2}{2}\right) = \varphi(\theta)
\]


This is the characteristic function of $N(0,\sigma^2)$, and it is continuous at $\theta = 0$. Thus, all the conditions of Levy's Theorem hold, which proves the central limit theorem.

All we now need to do is to prove that $nR_n \to 0$ as $n \to \infty$. To do this, first note the following standard result from deterministic analysis:

\[
\left| e^{ix} - \sum_{m=0}^{n} \frac{(ix)^m}{m!} \right| \le \frac{|x|^{n+1}}{(n+1)!} \wedge \frac{2|x|^n}{n!}
\]
Here, $a \wedge b = \min(a, b)$. Let us apply this with $n = 2$ and $x = \theta X$, take expectations, and use Jensen's Inequality on the LHS to obtain
\[
\left| \mathbb{E}\big(e^{i\theta X}\big) - \sum_{m=0}^{2} \frac{\mathbb{E}\big[(i\theta X)^m\big]}{m!} \right| \le \mathbb{E}\left[\frac{|\theta X|^3}{3!} \wedge \frac{2|\theta X|^2}{2!}\right] =: \mathbb{E}\big[f(X, \theta)\big]
\]
Now, observe that if $\theta \to 0$, $f(X, \theta) \to 0$ almost surely, since $|X|$ is finite almost surely.

Similarly, note that $f(X,\theta)$ is bounded by $|X|^2\theta^2$, and since $\mathbb{E}|X|^2 = \sigma^2 < \infty$, we can say that $f(X,\theta) \le Y$ with $\mathbb{E}(Y) < \infty$. We can therefore apply the dominated convergence theorem to conclude that
\[
\mathbb{E}\big[f(X,\theta)\big] \to 0 \quad \text{as } \theta \to 0
\]
But we have

R =  fX(,q ) n ()n ïïìü3 ïïX q 2 ïïn 2 q =nXíýïï ïï3! n îþïï ìü3 3 ïïq 2 ïïX 2 ïïn q =íïïX ý ïï3! n îþïï


The integrand on the right goes to 0 as $n \to \infty$ for every $\theta$, and it is dominated by $|X|^2\theta^2$, which has finite expectation; hence, by dominated convergence, $nR_n \to 0$. Note that a third moment appears to be needed here, but the clever combination with the second-moment term (via the minimum) allows the interchange to be justified using second moments only. 

 Odds & Ends

o Proposition (The Skorohod Representation): Let $\{X_n\}$ be a sequence of random variables such that $X_n \Rightarrow X$. Then there exists a probability space $(\Omega', \mathcal{F}', \mathbb{P}')$ supporting a sequence $\{X_n'\}$ and $X'$ such that
\[
X_n' \stackrel{d}{=} X_n \quad \text{and} \quad X' \stackrel{d}{=} X
\]
and $X_n' \to X'$ almost surely.

Proof: Let $F_n$ be the distribution of $X_n$ and $F$ be the distribution of $X$. We know that
\[
F_n(x) \to F(x) \qquad (\dagger)
\]
at all continuity points $x$ of $F$. Now, define
\[
F^{-1}(p) = \inf\{x : F(x) \ge p\}
\]
(This generalized inverse deals with cases in which there are discontinuities in the distribution.) It can be shown that $(\dagger)$ implies
\[
F_n^{-1}(p) \to F^{-1}(p) \qquad (\ddagger)
\]
at all continuity points $p$ of $F^{-1}$ (and recall that a monotone function such as $F^{-1}$ has at most countably many discontinuity points).

Now, take ()(W, , = [0,1], , uniform di stribution ). Let U ~ U[0,1]. Let ¢¢--11 XFUnn==() XFU () Note that ¢¢ ()(Xxnn£= FUx() £ )

=£()UFxn ()

= Fxn ()

d d ¢ ¢ ¢ Similarly, (XxFx£=) (). As such, XX= and XXnn= , as required. Note ¢¢ also, however, that by () , XXn  for all but continuity points in W . Since


there are only countably many such points, the set of such points has measure 0. Thus, $X_n' \to X'$ almost surely. 
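The quantile-coupling construction in the proof can be sketched concretely. A minimal example (assuming, hypothetically, $X_n \sim \text{Exp}(1 + 1/n) \Rightarrow X \sim \text{Exp}(1)$, whose quantile functions are explicit):

```python
# Skorohod construction with exponential distributions: one uniform U
# couples the quantile representatives X'_n = F_n^{-1}(U), which then
# converge pointwise to X' = F^{-1}(U).
import math
import random

def exp_quantile(p, rate):
    """Generalized inverse F^{-1}(p) for the Exp(rate) distribution."""
    return -math.log(1.0 - p) / rate

random.seed(7)
u = random.random()               # one U ~ Uniform[0,1]
x_lim = exp_quantile(u, 1.0)      # X'(omega) = F^{-1}(U)
gaps = [abs(exp_quantile(u, 1.0 + 1.0 / n) - x_lim) for n in (1, 10, 1000)]
print(gaps)  # decreasing to 0: convergence on the new space
```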

o Proposition (Continuous Mapping Theorem – CMT): Let $\{X_n\}$ be such that $X_n \Rightarrow X$. Let $f$ be a function with $\mathbb{P}(X \in D_f) = 0$, with $D_f$ being the set of discontinuity points of $f$. Then
\[
f(X_n) \Rightarrow f(X)
\]

o Proposition (Converging Together Lemma – CTL): If $\{X_n\}$ is a sequence such that $X_n \Rightarrow X$ and $\{Y_n\}$ is a sequence such that $Y_n \Rightarrow a$ (where $a$ is a deterministic constant), then
. $X_n + Y_n \Rightarrow X + a$
. $X_n Y_n \Rightarrow aX$
. $\frac{X_n}{Y_n} \Rightarrow \frac{X}{a}$, provided $a \ne 0$.

LECTURE 4 – 9th February 2011  Introduction to Large Deviations

o Consider a sequence of IID random variables, $X_1, X_2, \ldots \sim N(\mu, \sigma^2)$. Let $S_n = \sum_{i=1}^n X_i$. Fix $a > \mu$. We are interested in the value
\[
\mathbb{P}(S_n > na)
\]
In the case of the Gaussian distribution, we can use the bounds derived earlier in these notes:

\[
\mathbb{P}(S_n > na) = \mathbb{P}\!\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} > \frac{\sqrt{n}(a - \mu)}{\sigma}\right) = \mathbb{P}\!\left(Z > \sqrt{n}\,\frac{a-\mu}{\sigma}\right) \le \frac{1}{\sqrt{2\pi}}\,\frac{\sigma}{\sqrt{n}(a-\mu)} \exp\!\left(-\frac{n(a-\mu)^2}{2\sigma^2}\right)
\]
And
\[
\mathbb{P}(S_n > na) \ge \frac{1}{\sqrt{2\pi}}\left(\frac{\sigma}{\sqrt{n}(a-\mu)} - \left[\frac{\sigma}{\sqrt{n}(a-\mu)}\right]^{3}\right)\exp\!\left(-\frac{n(a-\mu)^2}{2\sigma^2}\right)
\]

Taking logarithms, we find that (c1, c2 and c3 are constants)


\[
-\frac{(a-\mu)^2}{2\sigma^2} - c_1\frac{\log n}{n} - \frac{c_2}{n} \ \le\ \frac{1}{n}\log \mathbb{P}(S_n > na) \ \le\ -\frac{(a-\mu)^2}{2\sigma^2} - \frac{\log n}{2n} + \frac{c_3}{n}
\]
Now, this means that
\[
\limsup_n \frac{1}{n}\log \mathbb{P}(S_n > na) \le -\frac{(a-\mu)^2}{2\sigma^2}
\qquad
\liminf_n \frac{1}{n}\log \mathbb{P}(S_n > na) \ge -\frac{(a-\mu)^2}{2\sigma^2}
\]
As such, we find that
\[
\frac{1}{n}\log \mathbb{P}(S_n > na) \to -\frac{(a-\mu)^2}{2\sigma^2}
\]
This result is the prototypical result of large deviations theory. It is easy to derive in the case of a Gaussian variable – we seek to develop this result in more generality.

o Let us develop some insight into the exact concept of “large deviations”. The central limit theorem would tell us that

\[
S_n \approx n\mu + \sigma\sqrt{n}\,Z \qquad Z \sim N(0,1)
\]
A "natural" question, therefore, would be to consider
\[
\mathbb{P}\big(S_n > n\mu + K\sqrt{n}\big)
\]
The theory of large deviations goes even further – since $a > \mu$, we can write
\[
\mathbb{P}(S_n > na) = \mathbb{P}\big(S_n > n\mu + K_n\sqrt{n}\big) \qquad K_n = (a - \mu)\sqrt{n} \to \infty
\]
In other words, the central limit theorem deals with "normal" deviations, of order $\sqrt{n}$ from the mean. The theory of large deviations refers to larger deviations, of order $n$. We will now see why the central limit theorem is powerless to deal with these large deviations.

o Consider a sequence of IID random variables $X_1, X_2, \ldots$, with $\mathbb{E}(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2 < \infty$. As ever, let $S_n = \sum_{i=1}^n X_i$. Fix $a > \mu$. We might be tempted to naively apply the Central Limit Theorem and write
\[
\mathbb{P}(S_n > na) = \mathbb{P}\!\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} > \frac{\sqrt{n}(a-\mu)}{\sigma}\right) \approx \mathbb{P}\!\left(Z > \frac{\sqrt{n}(a-\mu)}{\sigma}\right) \to 0 \quad \text{as } n \to \infty
\]


Though this statement is correct, it isn't particularly informative – it's quite obvious that the probability in question tends to 0! This happens because of the factor of $\sqrt{n}$ on the RHS of the inequality. The central limit theorem is clearly not equipped to deal with large deviations.

o Theorem (Cramer): Let $X_1, X_2, \ldots$ be IID with mean $\mathbb{E}(X_1) = \mu$ and such that $M(\theta) = \mathbb{E}\big(e^{\theta X_1}\big)$ exists for all $\theta \in \mathbb{R}$. Then
\[
\frac{1}{n}\log \mathbb{P}(S_n > na) \to -I(a)
\]
where
\[
I(a) = \sup_{\theta \in \mathbb{R}}\big[\theta a - \log M(\theta)\big] = \sup_{\theta \in \mathbb{R}}\big[\theta a - \psi(\theta)\big]
\]
(we denote $\psi(\theta) = \log M(\theta)$). This is the convex conjugate of $\psi$. Remark: The same conclusions hold if $M(\theta)$ exists in a neighborhood of the origin (plus some mild technicalities).

o Example: Let us apply this theorem to Gaussian random variables, for which $M(\theta) = \exp\big(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2\big)$. In this case,
\[
I(a) = \sup_\theta \left\{\theta a - \mu\theta - \tfrac{1}{2}\sigma^2\theta^2\right\}
\]
This is simply a quadratic, with argmax $\theta^* = \frac{a - \mu}{\sigma^2}$. We then get
\[
I(a) = \frac{(a-\mu)^2}{2\sigma^2}
\]
which is indeed what we found using Gaussian tail bounds. 
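The Legendre-transform computation can be reproduced numerically by brute-force maximization over a grid. A minimal sketch (the values of $\mu$, $\sigma^2$, $a$ and the grid are hypothetical choices):

```python
# Numerical check of the Gaussian rate function: maximize theta*a - psi(theta)
# over a grid and compare with the closed form (a - mu)^2 / (2 sigma^2).
mu, sigma2 = 1.0, 2.0
a = 2.5

def psi(theta):
    """log M(theta) for the N(mu, sigma^2) distribution."""
    return mu * theta + 0.5 * sigma2 * theta**2

best = max(theta * a - psi(theta)
           for theta in (k / 10000.0 for k in range(0, 50000)))
closed = (a - mu) ** 2 / (2 * sigma2)
print(best, closed)
```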

o Proof of Cramer’s Theorem: We will carry out this proof in two steps . Step 1: establish an upper bound 1 lim sup log()Sn>£-aIa ( ) nnn . Step 2: establish a lower bound 1 lim inf log()Sn>³-aIa ( ) nnn Fix a > m and q > 0 . . Step 1: Recall from Markov’s Inequality that for any a > 0 , we have


()X ()X >£a 1 1 a For this proof, we will need a more general form of Markov’s Inequality. Lemma (Generalized Form of Markov’s Inequality): Let

g : ++ be an increasing function such that  gX()1 <¥. We then have that (gX()) ()X >£a 1 1 g()a Proof: We have that

gX()³>é gX (); X aù ()111ëê ûú ³>gX()(aa 1 ) As required. 

Now, consider that in our case, since $\theta$ is strictly positive, we can write
\[
\mathbb{P}(S_n > na) = \mathbb{P}(\theta S_n > \theta na) = \mathbb{P}\big(\exp(\theta S_n) > \exp(\theta na)\big) \le \mathbb{E}\big(e^{\theta S_n}\big)\, e^{-\theta na}
\]
(This is often known as a Chernoff bound. Note that we have used the generalized form of Markov's Inequality in the last line.) We then have
\[
\mathbb{P}(S_n > na) \le e^{-\theta na}\,\mathbb{E}\big(e^{\theta S_n}\big) = e^{-\theta na}\,\mathbb{E}\left[\exp\left(\theta\sum_{i=1}^n X_i\right)\right] = e^{-\theta na}\prod_{i=1}^n \mathbb{E}\big(e^{\theta X_1}\big) = e^{-\theta na}\big[M(\theta)\big]^n = e^{-\theta na}\, e^{n\psi(\theta)} = \exp\big(-n[\theta a - \psi(\theta)]\big)
\]
(where $\psi(\theta) = \log M(\theta)$). Let us try to optimize the bound (ie: make it as tight as possible). To do this, we choose $\theta$ such that
\[
\theta \in \arg\sup_{\theta > 0}\big\{\theta a - \psi(\theta)\big\}
\]
Note, however, that


 The derivative of our objective at $\theta = 0$ is given by $a - \psi'(0) = a - \mu > 0$.
 We will show, below, that our objective is concave.
Together, these two points imply that extending our optimization problem to $\theta \in \mathbb{R}$ will not change our optimum, since it necessarily lies in the positive half-line. Thus, we can choose $\theta$ such that
\[
\theta \in \arg\sup_{\theta \in \mathbb{R}}\big\{\theta a - \psi(\theta)\big\}
\]
This is precisely the optimization problem involved in finding $I(a)$. Going back to the above, we therefore find that
\[
\mathbb{P}(S_n > na) \le \exp\big(-nI(a)\big)
\]
precisely as requested. All that remains is to prove that $\psi(\theta) = \log M(\theta)$ is a convex function (so that $\theta a - \psi(\theta)$ is concave). Fix $\alpha \in (0,1)$ and $\theta_1, \theta_2 \in \mathbb{R}$. We then have
\[
\psi\big(\alpha\theta_1 + [1-\alpha]\theta_2\big) = \log M\big[\alpha\theta_1 + (1-\alpha)\theta_2\big] = \log \mathbb{E}\Big[\big(e^{\theta_1 X_1}\big)^{\alpha}\big(e^{\theta_2 X_1}\big)^{1-\alpha}\Big]
\]
By Holder's inequality, this is at most
\[
\log\Big\{\big[\mathbb{E}\,e^{\theta_1 X_1}\big]^{\alpha}\big[\mathbb{E}\,e^{\theta_2 X_1}\big]^{1-\alpha}\Big\} = \alpha\,\psi(\theta_1) + (1-\alpha)\,\psi(\theta_2)
\]
as required.

. Step 2: We now prove the lower bound, in a very different way. Assume, for notational convenience, that $X$ has a density $p(x)$ – this is not, however, required. Then, fix $\theta > 0$ and define a new density
\[
p_\theta(x) = \frac{e^{\theta x}\, p(x)}{M(\theta)} \ge 0
\]
which is a bona-fide density. Suppose $g: \mathbb{R} \to \mathbb{R}$ is such that $\mathbb{E}|g(X_1)| < \infty$. Then
\[
\mathbb{E}\big[g(X_1)\big] = \int g(x)\,p(x)\,\mathrm{d}x = \int g(x)\,\frac{p(x)}{p_\theta(x)}\,p_\theta(x)\,\mathrm{d}x = \mathbb{E}_\theta\!\left[g(X_1)\,\frac{p(X_1)}{p_\theta(X_1)}\right]
\]
Similarly,
\[
\mathbb{E}\big[g(X_1, \ldots, X_n)\big] = \mathbb{E}_\theta\!\left[g(X_1, \ldots, X_n)\,\frac{p(X_1)\cdots p(X_n)}{p_\theta(X_1)\cdots p_\theta(X_n)}\right]
\]


Where here, g : n  . This is, effectively, a change of measure, and

pp/ q is a likelihood ratio.

Now, consider é ù Sna>= ( n ) êúSna> ëû{}n éùpX(,, X ) = êú1 n q êú{}Sna> n pX(,, X ) ëûêú01 n éùn pX() = êúi q êú{}Sna> i=1 n pX() ëûêúq i é ù n pX() = ê i ú q ê {}Sna> i=1 qX ú n epXMi ()/()q ëê i ûú éùn éù êúêúM()q = êúëû q êú{}Sna> n êún exp q X å i ëûêú()i=1 é ù =-+ exp qS njq() q ê Sna> ()n ú ë {}n û Now, fix a d > 0 é ù Sna>³êúexp -+qjq Sn ( ) ()nnq na<£+ S nad n () ëûêú{}n é ù ³-++ê expqdna n n jq ( ) ú q na<£+ S nad n ()() ëê {}n ûú =<£+eenaSnan--na(())qjq - qd n d q ( n ) This is starting to look like what we want – it is crucial, however, to ensure that the last probability stays bounded away from 0 – if not, it tends to negative infinity. Unfortunately, according to our original distribution, it does fall to 0. Our hope is to choose our new distribution

(characterized by q ) to ensure we get what we want.

Note that xeqx p() x  ()Xx= d q ò M()q M ¢()q = M()q = jq¢()


Now, suppose there exists a $\theta^*$ such that $\psi'(\theta^*) = a$ [in fact, we have assumed $X_1$ has a moment generating function that exists over the real line, so there does exist such a $\theta$].² Then the mean of $S_n$ under our new distribution would indeed be $na$, which gives us hope that the probability above will be bounded away from 0. Formally, by the central limit theorem, under $\theta^*$,
\[
\frac{S_n - na}{\sqrt{n}} \Rightarrow \sigma^* N(0,1)
\]
if $(\sigma^*)^2 < \infty$, where $(\sigma^*)^2 = \mathbb{E}_{\theta^*}(X^2) - a^2$. Under the conditions of Cramer's Theorem, it turns out this is also satisfied.

We then have that
\[
\mathbb{P}_{\theta^*}\big(na < S_n \le na + \delta\sqrt{n}\big) = \mathbb{P}_{\theta^*}\!\left(0 < \frac{S_n - na}{\sqrt{n}} \le \delta\right) \to \mathbb{P}\big(0 < \sigma^* Z \le \delta\big) > 0
\]
as required. Now, returning to our expression above,
\[
\frac{1}{n}\log \mathbb{P}(S_n > na) \ge -\big[\theta^* a - \psi(\theta^*)\big] - \frac{\theta^*\delta}{\sqrt{n}} + \frac{1}{n}\log \mathbb{P}_{\theta^*}\big(na < S_n \le na + \delta\sqrt{n}\big)
\]
Taking a lim inf,
\[
\liminf_n \frac{1}{n}\log \mathbb{P}(S_n > na) \ge -\big[\theta^* a - \psi(\theta^*)\big]
\]
The last thing we now need to verify is that the $\theta^*$ required is indeed the maximizer of the function above, to get $I(a)$. To do that, note that $\sup_{\theta \in \mathbb{R}}\{\theta a - \psi(\theta)\}$ is a concave program, which implies first-order conditions are necessary and sufficient. In this case, they are $a = \psi'(\theta)$, precisely as desired. 

 Applications of Large Deviations Theory

² We mentioned that it is enough for the moment generating function to exist in a neighborhood of 0, "and some extra conditions". These extra conditions are precisely those that ensure such a $\theta$ does exist.


o Example: Let $X_1, X_2, \ldots$ be IID $N(\mu, \sigma^2)$ variables. Fix $a > \mu$. We saw earlier that
\[
\mathbb{P}(S_n > na) \le \exp\big(-nI(a)\big) = \exp\!\left(-n\,\frac{(a-\mu)^2}{2\sigma^2}\right)
\]

The central limit theorem allows us to "handwave" $S_n \approx n\mu + \sigma\sqrt{n}\,N(0,1)$. Using this result, however, we can be more exact – we can find a sequence $a_n$ such that $S_n$ eventually lies below $n\mu + a_n$ almost surely. Consider:
\[
\mathbb{P}(S_n > n\mu + a_n) = \mathbb{P}\!\left(S_n > n\left(\mu + \frac{a_n}{n}\right)\right) \le \exp\!\left(-n\,\frac{(a_n/n)^2}{2\sigma^2}\right) = \exp\!\left(-\frac{a_n^2}{2n\sigma^2}\right)
\]

We now need to choose an $a_n$ that ensures our envelope is summable. Let us consider, for $\delta > 0$, $a_n = \sqrt{2(1+\delta)\sigma^2\, n \log n}$. This is very close to $\sqrt{n}$, which we might expect to work given the central limit theorem. With that choice of envelope,
\[
\mathbb{P}(S_n > n\mu + a_n) \le \exp\big(-(1+\delta)\log n\big) = n^{-(1+\delta)}
\]
This is indeed summable. Thus, by the first Borel–Cantelli Lemma, $\{S_n \le n\mu + a_n\}$ eventually, almost surely. Using a similar kind of argument, we could show that $\{S_n \ge n\mu - a_n\}$ eventually, almost surely. This is a much more precise statement than that of the central limit theorem. 

o Moderate deviations theory. We saw that a "typical" deviation is of order $\sqrt{n}$, and a "large" deviation is of order $n$. What about deviations in between?
Theorem (Moderate Deviations): Under the conditions of Cramer's Theorem, for any sequence $a_n$ such that
. $\frac{a_n}{\sqrt{n}} \to \infty$
. $\frac{a_n}{n} \to 0$
and for any $a > 0$,
\[
\frac{n}{a_n^2}\log \mathbb{P}\big(S_n > n\mu + a\,a_n\big) \to -\frac{a^2}{2\sigma^2} \quad \text{as } n \to \infty
\]


Proof of upper bound: WLOG, set $\mu = 0$ and $a = 1$. Note that
\[
\psi(0) = 0 \qquad \psi'(0) = 0 \qquad \psi''(0) = \sigma^2
\]
(where $\psi(\theta) = \log M(\theta)$). We then obtain, as $\theta \to 0$,
\[
\psi(\theta) \sim \frac{\theta^2\sigma^2}{2}
\]
(Recall that $f(x) \sim g(x)$ means $\lim \frac{f(x)}{g(x)} = 1$.) Let us now use a Chernoff bound to deduce that
\[
\mathbb{P}\!\left(\frac{S_n}{n} > \frac{a_n}{n}\right) \le \exp(-\theta_n a_n)\,\mathbb{E}\big[\exp(\theta_n S_n)\big] = e^{n\psi(\theta_n)}\,e^{-\theta_n a_n}
\]
But $\psi(\theta_n) \sim \frac{\theta_n^2\sigma^2}{2}$ for $\theta_n \to 0$, so choosing $\theta_n = \frac{a_n}{n\sigma^2} \to 0$,
\[
\mathbb{P}\!\left(\frac{S_n}{n} > \frac{a_n}{n}\right) \le C\exp\!\left(n\,\frac{\theta_n^2\sigma^2}{2} - \theta_n a_n\right) = C\exp\!\left(-\frac{a_n^2}{2n\sigma^2}\right)
\]
As required. 

LECTURE 5 – 16th February 2011

Random Walks & Martingales

 Random walks

o Definition (Random Walk): A random process $\{S_n : n \ge 0\}$ is called a random walk (RW) if it can be represented as
\[
S_n = S_0 + \sum_{i=1}^n X_i
\]
with the $\{X_n\}$ independently and identically distributed, and independent of $S_0$.

Remarks: If $S_0 = 0$:
. $\mathbb{E}(S_n) = n\,\mathbb{E}(X_1)$ and $\operatorname{Var}(S_n) = n\operatorname{Var}(X_1)$
. By the Strong Law of Large Numbers, $S_n \approx n\,\mathbb{E}(X_1)$
. By the CLT, $S_n \approx n\,\mathbb{E}(X_1) + \sigma\sqrt{n}\,N(0,1)$


o Question: Suppose $T$ is a positive, integer-valued random variable. Is it the case that $\mathbb{E}(S_T) = \mathbb{E}(T)\,\mathbb{E}(X_1)$?

o Let us consider the situation in which $T$ is independent of the $\{S_n\}$ (which is equivalent to saying $T$ is independent of the $\{X_n\}$). We then have
\[
\mathbb{E}(S_T) = \mathbb{E}\left(\sum_{i=1}^T X_i\right) = \mathbb{E}\left[\mathbb{E}\left(\sum_{i=1}^T X_i \,\Big|\, T\right)\right] = \mathbb{E}\big[T\,\mathbb{E}(X_1)\big] = \mathbb{E}(T)\,\mathbb{E}(X_1)
\]
Unfortunately, this answer is not particularly interesting or useful, simply because in all "interesting" situations, $T$ is allowed to depend on the behavior of the system. Let us consider some more interesting cases.

o Example: Consider a random walk defined as follows:
\[
X_i \stackrel{\text{IID}}{\sim} \begin{cases} 1 & \text{with prob } p \\ -1 & \text{with prob } 1-p \end{cases} \qquad p > \frac{1}{2} \qquad S_n = \sum_{i=1}^n X_i
\]
In this case, $\frac{S_n}{n} \to 2p - 1 > 0$ a.s., and so $S_n \to \infty$ almost surely. Now, imagine $S_0 = -1$ and define
\[
T = \sup\{n \ge 0 : S_n = 0\}
\]
Because $S_n \to \infty$, we have $\mathbb{P}(T < \infty) = 1$. Furthermore, by definition, $S_T = 0$, so $\mathbb{E}(S_T) = 0$. However, we have that $\mathbb{E}(X_1) = 2p - 1 > 0$ and that $\mathbb{E}(T) \ge 1$. Thus, it is clear, in this case, that
\[
\mathbb{E}(S_T) \ne \mathbb{E}(T)\,\mathbb{E}(X_1)
\]
Clearly, therefore, there can be some issues. In this case, the problem is that the time $T$ is anticipatory; it is determined by full knowledge of the future of the random walk.

o Definition (Filtration): We previously defined a probability space by the triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\mathcal{F}$ represented possible events. We define $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$ to be the sigma algebra generated by $X_0, \ldots, X_n$. This sigma algebra contains all the information that is available from knowing the values of the variables $X_0, \ldots, X_n$. We then say that $\{\mathcal{F}_n\}$ is a filtration, and clearly,
\[
\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \cdots \subseteq \mathcal{F}
\]


o Definition (Stopping Time): Suppose $T$ is a non-negative integer-valued random variable. Then $T$ is said to be a stopping time with respect to an underlying sequence $\{X_n\}$ if, for each $k \ge 0$,
\[
\mathbf{1}_{\{T = k\}} = f_k(X_0, \ldots, X_k)
\]
where $f_k$ is a deterministic function. In other words, we require
\[
\{T = k\} \in \mathcal{F}_k \quad \forall k
\]

o Example (Hitting Times): Let $T = \inf\{n \ge 0 : X_n \in A\}$. We then have
\[
\{T = k\} = \{X_0 \notin A, \ldots, X_{k-1} \notin A, X_k \in A\} \in \mathcal{F}_k
\]
Similarly, we could define $T = \inf\{n \ge 0 : S_n \in A\}$, and we would then have
\[
\{T = k\} = \{S_0 \notin A, \ldots, S_{k-1} \notin A, S_k \in A\} \in \mathcal{F}_k
\]

n Proposition (Wald’s First Identity): Let S be a random walk SX= , o n niå i=1

with S0 = 0, and let T be a stopping time with respect to the sequence {n }

(where nn= s(,,X1  X ))

. If Xi ³ 0 , then ()STXT = ()()1 [this could, of course, we infinity].

. If  Xi <¥ and  (T ) <¥, then ()STXT = ()()0

Proof: Let
\[
S_T = \sum_{i=1}^T X_i = \sum_{i=1}^\infty X_i \mathbf{1}_{\{i \le T\}}
\]
We then have
\[
\mathbb{E}(S_T) = \mathbb{E}\left(\sum_{i=1}^\infty X_i \mathbf{1}_{\{i \le T\}}\right)
\]
Now, let us do both parts:
. First part: $X_i \ge 0$, and indicators are always non-negative, so by Fubini I, we can interchange the expectation and the sum:
\[
\mathbb{E}(S_T) = \sum_{i=1}^\infty \mathbb{E}\big(X_i \mathbf{1}_{\{i \le T\}}\big)
\]
Consider, however, that
\[
\mathbf{1}_{\{T \ge i\}} = 1 - \mathbf{1}_{\{T < i\}} = 1 - \mathbf{1}_{\{T \le i-1\}}
\]
This implies that $\mathbf{1}_{\{T \ge i\}}$ is $\mathcal{F}_{i-1}$-measurable. Going back to our sum,


¥ é ù (|SX) =  Tiå ê iT£ i-1 ú i=1 ëê (){} ûú ¥ é ù =  X |  å ê iT£ ()i i-1 ú i=1 ë {} û ¥ éù = êú ()X åi=1 ë {}i£T 1 û ¥ = ()X 1 åi=1 (){}i£T Using Fubini I again:

¥ ((S ))= X T 1 ()åi=1 {}iT£

= ()()XT1 . Second part: In this case, consider that, by the triangle inequality

T S £ X T å i=1 i By part 1, however,

T XTX=<¥() ()åi=1 i 1 The result then follows by Fubini II. 

o Example (Gambler's Ruin): Consider a random walk in which $S_0 = 0$ and $\mathbb{P}(X = 1) = \mathbb{P}(X = -1) = \tfrac{1}{2}$. We let $Z_n = k + S_n$, with $0 < k < N$; intuitively, k is our initial wealth, and $Z_n$ is our wealth at time n. Let
$$T = \inf\{n \geq 0 : Z_n = 0 \text{ or } Z_n = N\} = \inf\{n \geq 0 : S_n = -k \text{ or } S_n = N - k\}$$
Consider that $\mathbb{E}[Z_T] = k + \mathbb{E}[S_T]$. But consider also that, by definition,
$$\mathbb{E}[Z_T] = 0 \cdot \mathbb{P}(Z_T = 0) + N \cdot \mathbb{P}(Z_T = N) = N\,\mathbb{P}(Z_T = N)$$
Now, unfortunately, the $X_i$ are not non-negative. However, $|X_i| \leq 1$, and so it clearly has finite expectation. Let us now assume $\mathbb{E}[T] < \infty$ (this will be proved in homework 2). By Wald I, we then have
$$\mathbb{E}[S_T] = \mathbb{E}[T]\,\mathbb{E}[X_1] = 0$$
And so
$$N\,\mathbb{P}(Z_T = N) = k \qquad \Longrightarrow \qquad \mathbb{P}(Z_T = N) = k/N$$
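This prediction is easy to check by simulation (a sketch; the particular values of k and N are arbitrary choices):

```python
import random

def ruin_probability(k, N, n_paths=100_000, seed=2):
    """Estimate P(Z_T = N) for the symmetric gambler's ruin started at
    wealth k with absorbing barriers 0 and N; Wald I predicts k / N."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_paths):
        z = k
        while 0 < z < N:
            z += rng.choice((-1, 1))
        wins += (z == N)
    return wins / n_paths

p_hat = ruin_probability(k=3, N=10)
print(p_hat)  # predicted: k / N = 0.3
```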


o Proposition (Wald II): Suppose $\{X_i\}$ are IID bounded random variables, with $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = \sigma^2$. Let T be a stopping time with respect to $\{\mathcal{F}_n\}$, and such that $\mathbb{E}[T^2] < \infty$. Then
$$\mathrm{Var}(S_T) = \sigma^2\,\mathbb{E}[T]$$

Remark: If T were independent of the $X_i$, the result would be obvious. We seek to extend this to stopping times.
Proof: Clearly, under the assumptions of this proposition, Wald I holds; as such, $\mathbb{E}[S_T] = \mathbb{E}[T]\,\mathbb{E}[X_1] = 0$. Now
$$\mathbb{E}[S_T^2] = \mathbb{E}\left[\left(\sum_{i=1}^T X_i\right)^2\right] = \mathbb{E}\left[\sum_{i=1}^T X_i^2\right] + 2\,\mathbb{E}\left[\sum_{i=1}^{T-1}\sum_{j=i+1}^{T} X_i X_j\right]$$
But by Wald I,
$$\mathbb{E}\left[\sum_{i=1}^T X_i^2\right] = \mathbb{E}[T]\,\mathbb{E}[X_1^2] = \sigma^2\,\mathbb{E}[T]$$
Now, consider the second term:
$$\mathbb{E}\left[\sum_{i=1}^{T-1}\sum_{j=i+1}^{T} X_i X_j\right] = \mathbb{E}\left[\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty} X_i X_j \mathbb{1}_{\{T \geq j\}}\right]$$
Now, imagine it were possible to apply Fubini 2. We would then have
$$\mathbb{E}\left[\sum_{i=1}^{T-1}\sum_{j=i+1}^{T} X_i X_j\right] = \sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty} \mathbb{E}\left[X_i X_j \mathbb{1}_{\{T \geq j\}}\right]$$
But note that $\mathbb{1}_{\{T \geq j\}} = 1 - \mathbb{1}_{\{T \leq j-1\}} \in \mathcal{F}_{j-1}$, and $X_i \in \mathcal{F}_{j-1}$ since $i < j$. As such,
$$\sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty} \mathbb{E}\left[X_i X_j \mathbb{1}_{\{T \geq j\}}\right] = \sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty} \mathbb{E}\left[X_i \mathbb{1}_{\{T \geq j\}}\,\mathbb{E}\left[X_j \mid \mathcal{F}_{j-1}\right]\right] = \sum_{i=1}^{\infty}\sum_{j=i+1}^{\infty} \mathbb{E}\left[X_i \mathbb{1}_{\{T \geq j\}} \cdot 0\right] = 0$$
Now, let us justify the use of Fubini 2. X is bounded, so $|X| \leq K$ and
$$S_T^2 = \left(\sum_{i=1}^T X_i\right)^2 \leq \left(\sum_{i=1}^T |X_i|\right)^2 \leq (KT)^2 = K^2 T^2$$
This has a finite expectation, since T has a second moment. □

o Example (Gambler's Ruin): In this case, $\mathbb{E}[X_1^2] = \sigma^2 = 1$, and we assume $\mathbb{E}[T^2] < \infty$ (again, see homework 2). We then have
$$\mathbb{E}[Z_T^2] = 0^2 \cdot \mathbb{P}(Z_T = 0) + N^2 \cdot \mathbb{P}(Z_T = N) = N^2\,(k/N) = kN$$
However, we also have
$$\mathbb{E}[Z_T^2] = \mathbb{E}\left[(k + S_T)^2\right] = k^2 + 2k\,\mathbb{E}[S_T] + \mathbb{E}[S_T^2] = k^2 + \sigma^2\,\mathbb{E}[T] = k^2 + \mathbb{E}[T]$$
And so
$$\mathbb{E}[T] = kN - k^2 = k(N - k)$$
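This second prediction can also be checked numerically (a sketch; k and N are again arbitrary choices):

```python
import random

def mean_duration(k, N, n_paths=50_000, seed=3):
    """Estimate E[T], the expected duration of the symmetric gambler's
    ruin started at k; Wald II predicts k (N - k)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_paths):
        z, t = k, 0
        while 0 < z < N:
            z += rng.choice((-1, 1))
            t += 1
        total += t
    return total / n_paths

t_hat = mean_duration(k=3, N=10)
print(t_hat)  # predicted: k (N - k) = 21
```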

o Definition (Martingale): A process $\{M_n : n \geq 0\}$ is called a martingale (MG) with respect to a filtration $\{\mathcal{F}_n\}$ if
. $M_n \in \mathcal{F}_n$
. $\mathbb{E}|M_n| < \infty \quad \forall n$
. $\mathbb{E}\left[M_n \mid \mathcal{F}_{n-1}\right] = M_{n-1}$

o Example (Random walk): Let $M_n = S_n$, with $M_0 = 0$. Then
. $S_n \in \mathcal{F}_n$
. Provided $\mathbb{E}|X_i| < \infty$, then $\mathbb{E}|S_n| \leq n\,\mathbb{E}|X_1| < \infty$.
. $\mathbb{E}\left[S_n \mid \mathcal{F}_{n-1}\right] = \mathbb{E}\left[S_{n-1} + X_n \mid \mathcal{F}_{n-1}\right] = S_{n-1} + \mathbb{E}[X_n]$
Thus, if we let $\mathbb{E}[X_n] = 0$, then our random walk is a martingale.

o Example (Variance martingale): Consider $M_n = S_n^2 - n\sigma^2$, with $\mathbb{E}[X_1^2] = \sigma^2 < \infty$ and $\mathbb{E}[X_1] = 0$. Then
. Condition 1 holds.
. $\mathbb{E}|M_n| \leq \mathbb{E}[S_n^2] + n\sigma^2 = 2n\sigma^2 < \infty \quad \forall n$
. $\mathbb{E}\left[M_n \mid \mathcal{F}_{n-1}\right] = \mathbb{E}\left[(S_{n-1} + X_n)^2 \mid \mathcal{F}_{n-1}\right] - n\sigma^2 = S_{n-1}^2 + 2S_{n-1}\mathbb{E}[X_n] + \sigma^2 - n\sigma^2 = S_{n-1}^2 - (n-1)\sigma^2 = M_{n-1}$
o Consider that if we let $D_n = M_n - M_{n-1}$, we can write $M_n = \sum_{i=1}^n D_i + M_0$.

o Proposition: Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ and put $D_i = M_i - M_{i-1}$. Then
. $\mathbb{E}[D_i] = 0 \quad \forall i$
. If $\sup_{i \geq 1} \mathbb{E}[D_i^2] < \infty$, then $\mathbb{E}[D_i D_j] = 0 \quad \forall i \neq j$
Remarks:
. Note that this theorem almost achieves the representation of a general martingale as a random walk. Indeed, it expresses our martingale as a sum of mean-0 uncorrelated random variables. The $D_i$, however, need not be independent (and indeed, in most "typical" cases, they will not be).
. The boundedness condition is required to be able to argue, via Cauchy-Schwarz, that the cross-terms are integrable:
$$\mathbb{E}|D_i D_j| \leq \sqrt{\mathbb{E}[D_i^2]\,\mathbb{E}[D_j^2]} < \infty$$
(Note: in this case, it would be enough to require the second moment of each difference to be finite, but it seems less clunky to require the slightly more stringent uniform boundedness condition.)
Proof: Fix $i \geq 1$:
$$\mathbb{E}[D_i] = \mathbb{E}[M_i - M_{i-1}] = \mathbb{E}\left[\mathbb{E}\left[M_i \mid \mathcal{F}_{i-1}\right] - M_{i-1}\right] = 0$$
Now, fix $j > i \geq 1$:
$$\mathbb{E}[D_i D_j] = \mathbb{E}\left[D_i(M_j - M_{j-1})\right] = \mathbb{E}\left[\mathbb{E}\left[D_i(M_j - M_{j-1}) \mid \mathcal{F}_{j-1}\right]\right] = \mathbb{E}\left[D_i\left(\mathbb{E}\left[M_j \mid \mathcal{F}_{j-1}\right] - M_{j-1}\right)\right] = 0$$
(Note that this can be used to prove $\mathrm{Var}(M_n) = \mathrm{Var}(M_0) + \sum_{i=1}^n \mathrm{Var}(D_i)$.) □


o Example: Let $\{X_n\}$ be IID, with $\mathbb{E}[X_i] = 1$ and $\mathbb{E}|X_i| < \infty$. And let $M_n = \prod_{i=1}^n X_i$. This is a martingale:
. Condition 1 is satisfied.
. $\mathbb{E}|M_n| = \prod_{i=1}^n \mathbb{E}|X_i| < \infty$
. Conditioning,
$$\mathbb{E}\left[M_n \mid \mathcal{F}_{n-1}\right] = \mathbb{E}\left[\prod_{i=1}^n X_i \,\Big|\, \mathcal{F}_{n-1}\right] = \left(\prod_{i=1}^{n-1} X_i\right)\mathbb{E}\left[X_n \mid \mathcal{F}_{n-1}\right] = M_{n-1}\,\mathbb{E}[X_n] = M_{n-1}$$
It is quite astounding, therefore, that even this process can be written as the sum of uncorrelated increments!
Optional Stopping Theorem for Martingales

o Question: If T is a stopping time, when is it true that $\mathbb{E}[M_T] = \mathbb{E}[M_0]$? This is the question that will be concerning us in this section. Let us consider some simple examples.
o Example: Let $M_n = S_n - n\mu$, with $M_0 = 0$ and $\mathbb{E}[X_1] = \mu$. When is it the case that $\mathbb{E}[M_T] = \mathbb{E}[M_0] = 0$? Effectively, we are asking when it is the case that
$$\mathbb{E}[S_T] = \mu\,\mathbb{E}[T]$$
This is precisely the subject matter of Wald's First Identity. □

o Example: Let $M_n = S_n^2 - n\sigma^2$, with $\mathbb{E}[X_1] = 0$ and $\mathbb{E}[X_1^2] = \sigma^2 < \infty$. When is it true that $\mathbb{E}[M_T] = \mathbb{E}[M_0] = 0$? Effectively, we are asking when it is the case that
$$\mathbb{E}[S_T^2] = \sigma^2\,\mathbb{E}[T]$$
This is precisely the subject matter of Wald's Second Identity. □

o Proposition: Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$, and let T be a stopping time. Then for each $m \geq 1$,
$$\mathbb{E}\left[M_{T \wedge m}\right] = \mathbb{E}[M_0]$$
where $T \wedge m = \min(T, m)$.


LECTURE 6 – 24th February 2011

o Proof: Let $\{D_i\}$ be the martingale differences, and write
$$M_{T \wedge n} = \sum_{i=1}^{T \wedge n} D_i + M_0 = \sum_{i=1}^{n} D_i \mathbb{1}_{\{T \geq i\}} + M_0$$
Take expectations:
$$\mathbb{E}\left[M_{T \wedge n}\right] = \sum_{i=1}^{n} \mathbb{E}\left[D_i \mathbb{1}_{\{T \geq i\}}\right] + \mathbb{E}[M_0]$$
We know, however, that $\mathbb{1}_{\{T \geq i\}} \in \mathcal{F}_{i-1}$, so
$$\mathbb{E}\left[M_{T \wedge n}\right] = \sum_{i=1}^{n} \mathbb{E}\left[\mathbb{1}_{\{T \geq i\}}\,\mathbb{E}\left[D_i \mid \mathcal{F}_{i-1}\right]\right] + \mathbb{E}[M_0] = \mathbb{E}[M_0]$$
As required. □
Remark: Consider that
. $M_{T \wedge n} \to M_T$ as $n \to \infty$ a.s. As such, $\mathbb{E}\left[\lim_{n \to \infty} M_{T \wedge n}\right] = \mathbb{E}[M_T]$.
. By the theorem above, however, $\mathbb{E}\left[M_{T \wedge n}\right] = \mathbb{E}[M_0]$, which implies that $\lim_{n \to \infty} \mathbb{E}\left[M_{T \wedge n}\right] = \mathbb{E}[M_0]$.
Thus, every optional stopping theorem boils down to the following interchange argument; if we can make the interchange, then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$:
$$\mathbb{E}\left[\lim_{n \to \infty} M_{T \wedge n}\right] = \lim_{n \to \infty} \mathbb{E}\left[M_{T \wedge n}\right]$$
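The proposition can be verified exactly on a small example by propagating the distribution of a stopped symmetric walk (the barriers and horizon below are arbitrary test choices, not from the lecture):

```python
from fractions import Fraction

def expected_stopped_value(a=-3, b=5, m=10):
    """Compute E[S_{T ^ m}] exactly for a symmetric +/-1 walk with
    T = inf{n : S_n in {a, b}}; by the proposition it must equal
    E[S_0] = 0, whatever a < 0 < b and m are."""
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}          # exact distribution of the stopped walk
    for _ in range(m):
        new = {}
        for s, p in dist.items():
            if s in (a, b):          # already stopped: mass is frozen
                new[s] = new.get(s, Fraction(0)) + p
            else:                    # unstopped mass splits evenly
                for step in (-1, 1):
                    new[s + step] = new.get(s + step, Fraction(0)) + p * half
        dist = new
    return sum(s * p for s, p in dist.items())

print(expected_stopped_value())  # exactly 0
```

Each step preserves the mean of the distribution (unstopped mass splits symmetrically, stopped mass is frozen), which is precisely the martingale argument in the proof.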

o Corollary I: Let $(M_n : n \geq 0)$ be a martingale with respect to $\{\mathcal{F}_n\}$ and T be a stopping time such that T is bounded (in other words, there exists a $K < \infty$ such that $\mathbb{P}(T \leq K) = 1$). Then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.
o Corollary II: Let $|M_{T \wedge n}| \leq Z$, with $\mathbb{E}[Z] < \infty$. Then by dominated convergence, the interchange holds and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.
o Corollary III: Let $(M_n : n \geq 0)$ be a martingale with respect to $\{\mathcal{F}_n\}$ and T be a stopping time such that $\mathbb{E}[T] < \infty$. Provided the martingale differences are uniformly bounded ($\mathbb{E}\left[|D_i| \mid \mathcal{F}_{i-1}\right] \leq C < \infty$), then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.


Remark: This is a stronger version of Wald I. There, we had essentially established the OST under the conditions $\mathbb{E}|X_1| < \infty$, $\mathbb{E}[T] < \infty$. In the case of a random walk, the $X_i$ are exactly the martingale differences. This theorem is slightly stronger because here, it is enough for the conditional increments to be bounded.
Proof: We have
$$M_{T \wedge n} = \sum_{i=1}^{T \wedge n} D_i + M_0$$
$M_0$ is integrable. Now consider that
$$\left|\sum_{i=1}^{T \wedge n} D_i\right| \leq \sum_{i=1}^{T \wedge n} |D_i| \leq \sum_{i=1}^{\infty} |D_i| \mathbb{1}_{\{T \geq i\}}$$
Let us now take expectations. By Fubini I, we can then swap the expectation and sum, since the summands are non-negative:
$$\mathbb{E}\left[\sum_{i=1}^{\infty} |D_i| \mathbb{1}_{\{T \geq i\}}\right] = \sum_{i=1}^{\infty} \mathbb{E}\left[|D_i| \mathbb{1}_{\{T \geq i\}}\right] = \sum_{i=1}^{\infty} \mathbb{E}\left[\mathbb{1}_{\{T \geq i\}}\,\mathbb{E}\left[|D_i| \mid \mathcal{F}_{i-1}\right]\right]$$
Since the conditional martingale differences are uniformly bounded by C, this is $\leq C\,\mathbb{E}[T]$. As such,
$$|M_{T \wedge n}| \leq \sum_{i=1}^{\infty} |D_i| \mathbb{1}_{\{T \geq i\}} + |M_0|$$
The right-hand side has expectation at most $C\,\mathbb{E}[T] + \mathbb{E}|M_0| < \infty$, so our martingale is bounded by an integrable variable, and the OST holds by Corollary II. □
Martingale Limit Theory & Concentration Inequalities
o In Optional Stopping Theory, we were looking for results about random times T. We now consider behavior at very large (deterministic) times.

o Proposition (Martingale Convergence Theorem): Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ such that $\sup_{n \geq 1} \mathbb{E}|M_n| < \infty$ (note: this is a stronger condition than $\mathbb{E}|M_n| < \infty$ for all n, because each term could be finite while the supremum diverges). Then
$$M_n \to M_\infty \quad \text{a.s.}$$
for some a.s.-finite random variable $M_\infty$.


Proof: See Williams, page 109

o Recall that $M_n = M_0 + \sum_{i=1}^n D_i$. This is a random-walk-like structure, even though the $D_i$ may not be independent. We can write
$$\frac{M_n}{n} = \frac{M_0}{n} + \frac{1}{n}\sum_{i=1}^n D_i$$
At an intuitive level, since the $D_i$ are uncorrelated, we might expect the SLLN to hold and the last sum to converge to 0. If we succeed in showing that, we have effectively shown that $M_n/n$ converges to 0.

o Proposition (Martingale SLLN): Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ such that $\sup_{n \geq 1} \mathbb{E}[D_n^2] < \infty$ (this, as we saw above, guarantees that the increments are uncorrelated). Then
$$\frac{M_n}{n} \to 0 \quad \text{a.s.}$$
Proof: Put $\widetilde{M}_n = \sum_{k=1}^n \frac{D_k}{k}$ (where, as usual, $D_k = M_k - M_{k-1}$). Clearly, $\widetilde{M}_n \in \mathcal{F}_n$, and
$$\mathbb{E}\left[\widetilde{M}_n \mid \mathcal{F}_{n-1}\right] = \sum_{k=1}^{n-1} \frac{D_k}{k} + \frac{1}{n}\,\mathbb{E}\left[D_n \mid \mathcal{F}_{n-1}\right] = \widetilde{M}_{n-1}$$
and $\mathbb{E}|\widetilde{M}_n| < \infty$ for all n, due to the conditions on the second moments of $D_n$. Thus, it is a martingale. Now

$$\mathbb{E}\left[\widetilde{M}_n^2\right] = \mathbb{E}\left[\sum_{k=1}^n \frac{D_k^2}{k^2}\right] + \mathbb{E}\left[\sum_{i \neq j} \frac{D_i D_j}{ij}\right]$$
The second term is equal to 0, since the D are uncorrelated. Thus
$$\mathbb{E}\left[\widetilde{M}_n^2\right] \leq \sup_{k \geq 1} \mathbb{E}[D_k^2] \sum_{k=1}^{n} \frac{1}{k^2} < \infty$$
This implies that
$$\sup_{n \geq 1} \mathbb{E}\left[\widetilde{M}_n^2\right] < \infty \quad \Longrightarrow \quad \sup_{n \geq 1} \mathbb{E}|\widetilde{M}_n| < \infty$$
(Notice that it is often easier to work with second moments rather than first moments, despite the fact that finiteness of second moments is a stronger condition than finiteness of first moments.) If so, by the Martingale Convergence Theorem, $\widetilde{M}_n \to \widetilde{M}_\infty$ a.s. Now, let

$$A = \left\{\omega : \sum_{k=1}^n \frac{D_k(\omega)}{k} \to \widetilde{M}_\infty(\omega)\right\}, \qquad \mathbb{P}(A) = 1$$
$$B = \left\{\omega : \frac{1}{n}\sum_{k=1}^n D_k(\omega) \to 0\right\}$$
By Kronecker's Lemma, $A \subseteq B$. Thus, $\mathbb{P}(B) = 1$. This proves our theorem. □
Remark: When we proved the SLLN, we went to great pains to work with a first moment only. In this case, we work with second moments, which makes the proof much simpler. It is interesting to note, however, that the scaling of $D_k/k$ is "overkill". Our series would also have converged for $D_k / k^{(1/2)+\delta}$. Carrying the entire proof through, we obtain $M_n / n^{(1/2)+\delta} \to 0$ a.s. Thus, assuming more than first moments has indeed resulted in a stronger conclusion. Mapping this back to the IID world and letting $S_n = \sum_{i=1}^n X_i$ with $\mathbb{E}[X_1^2] < \infty$, we have obtained the stronger result that
$$\frac{S_n}{\sqrt{n}\,(\log n)^{1+\delta}} \to 0 \quad \text{a.s.}$$
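A simulated illustration of the proposition (a sketch; the particular increment construction, in which $D_k$ depends on the past through $M_{k-1}$, is an assumed example, chosen so that the increments are dependent but still martingale differences):

```python
import math
import random

def martingale_path(n, seed=4):
    """A martingale with dependent (but uncorrelated) increments:
    D_k = e_k (1 + 0.5 sin(M_{k-1})), with e_k a fair +/-1 coin, so
    E[D_k | F_{k-1}] = 0 and sup_k E[D_k^2] <= 2.25 < infinity."""
    rng = random.Random(seed)
    m = 0.0
    for _ in range(n):
        e = rng.choice((-1.0, 1.0))
        m += e * (1.0 + 0.5 * math.sin(m))
    return m

for n in (10**3, 10**5):
    print(n, martingale_path(n) / n)  # |M_n / n| shrinks toward 0
```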

o Proposition (Central Limit Theorem for Martingales): Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ and put $V_n = \max_{1 \leq i \leq n} |D_i|$. If
. $\sup_n \frac{\mathbb{E}[V_n^2]}{n} < \infty$
. $V_n / \sqrt{n} \to 0$ in probability
. $\frac{1}{n}\sum_{i=1}^n D_i^2 \to \sigma^2$ in probability ($\sigma^2$ deterministic and finite). This makes our martingale "very similar" to a random walk.
Then
$$\frac{1}{\sqrt{n}}\,M_n \Rightarrow \sigma N(0, 1)$$
Remark: The third condition is key here. For a random walk $S_n = \sum_{i=1}^n Y_i$, it is the case that $\mathrm{Var}(S_n) = n\,\mathrm{Var}(Y_1)$. Similarly, for a martingale, $\mathrm{Var}(M_n) = \mathrm{Var}(M_0) + \sum_{k=1}^n \mathrm{Var}(D_k)$; we want to ensure these two results are as close to each other as possible.

o Proposition (Azuma-Hoeffding Inequality): Let $\{M_n : n \geq 0\}$ be a martingale with respect to $\{\mathcal{F}_n\}$ such that
$$|M_n - M_{n-1}| = |D_n| \leq c_n \quad \forall n$$
Then, for all $n > 0$, $\lambda > 0$,
$$\mathbb{P}\left(|M_n - M_0| \geq \lambda\right) \leq 2\exp\left(-\frac{\lambda^2}{2\sum_{k=1}^n c_k^2}\right)$$
Remark: Suppose that $c_k \leq c$ for all k. Then we can re-write this as
$$\mathbb{P}\left(|M_n - M_0| \geq \lambda\right) \leq 2\exp\left(-\frac{\lambda^2}{2c^2 n}\right), \qquad \mathbb{P}\left(|M_n - M_0| \geq x c\sqrt{n}\right) \leq 2\exp\left(-\frac{x^2}{2}\right)$$
Or, choosing $\lambda = \kappa\sqrt{n \log n}$, we obtain
$$\mathbb{P}\left(|M_n - M_0| \geq \kappa\sqrt{n \log n}\right) \leq 2\exp\left(-\frac{\kappa^2 \log n}{2c^2}\right) = 2\,n^{-\kappa^2/(2c^2)}$$
Choosing, for example, $\kappa = 4c$ produces a summable sequence, which can be used to obtain an almost sure result.
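The bound is simple to compare with simulation (a sketch; the choices n = 100, $\lambda$ = 20, and the fair $\pm 1$ walk, for which c = 1, are assumptions):

```python
import math
import random

def azuma_vs_empirical(n=100, lam=20.0, n_paths=50_000, seed=5):
    """Compare the Azuma-Hoeffding bound 2 exp(-lam^2 / (2 n)) (c_k = 1
    for a fair +/-1 walk, M_0 = 0) with the simulated tail P(|M_n| >= lam)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        m = 0
        for _ in range(n):
            m += rng.choice((-1, 1))
        hits += (abs(m) >= lam)
    bound = 2.0 * math.exp(-lam * lam / (2.0 * n))
    return hits / n_paths, bound

tail, bound = azuma_vs_empirical()
print(tail, bound)  # the empirical tail sits well below the bound
```

The bound is conservative (here roughly $2e^{-2} \approx 0.27$ against a true tail near 0.05), which is the price of its complete generality.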

Proof³: Define $D_k = M_k - M_{k-1}$. Let $\theta > 0$, and write
$$D_k = \frac{1}{2}\left(1 - \frac{D_k}{c_k}\right)(-c_k) + \frac{1}{2}\left(1 + \frac{D_k}{c_k}\right)(c_k)$$
Both terms in parentheses are non-negative (since the martingale differences are bounded by $c_k$) and the coefficients add up to one. As such, we can use Jensen's Inequality (convexity of the exponential) to write
$$e^{\theta D_k} \leq \frac{1}{2}\left(1 - \frac{D_k}{c_k}\right)e^{-\theta c_k} + \frac{1}{2}\left(1 + \frac{D_k}{c_k}\right)e^{\theta c_k} = \frac{e^{\theta c_k} + e^{-\theta c_k}}{2} + \frac{D_k}{2c_k}\left(e^{\theta c_k} - e^{-\theta c_k}\right)$$

3 The proof of Azuma’s Inequality was actually covered in the next lecture. For expositional purposes, we chose to present this material here instead.


Recall, however, that since M is a martingale, $\mathbb{E}\left[D_k \mid \mathcal{F}_{k-1}\right] = 0$. Recall, in addition, that
$$\frac{e^{x} + e^{-x}}{2} \leq e^{x^2/2}$$
which can be proved using a Taylor expansion. As such,
$$\mathbb{E}\left[e^{\theta D_k} \mid \mathcal{F}_{k-1}\right] \leq \frac{e^{\theta c_k} + e^{-\theta c_k}}{2} \leq e^{(\theta c_k)^2/2}$$
Now, consider
$$\mathbb{E}\left[e^{\theta(M_n - M_0)}\right] = \mathbb{E}\left[\prod_{k=1}^n e^{\theta D_k}\right] = \mathbb{E}\left[\mathbb{E}\left[\prod_{k=1}^n e^{\theta D_k} \,\Big|\, \mathcal{F}_{n-1}\right]\right] = \mathbb{E}\left[\prod_{k=1}^{n-1} e^{\theta D_k}\,\mathbb{E}\left[e^{\theta D_n} \mid \mathcal{F}_{n-1}\right]\right] \leq e^{(\theta c_n)^2/2}\,\mathbb{E}\left[\prod_{k=1}^{n-1} e^{\theta D_k}\right]$$
Applying this process repeatedly, we get
$$\mathbb{E}\left[e^{\theta(M_n - M_0)}\right] \leq \exp\left(\sum_{k=1}^n \frac{(\theta c_k)^2}{2}\right)$$
However, by Markov's inequality,
$$\mathbb{P}\left(M_n - M_0 \geq \lambda\right) = \mathbb{P}\left(e^{\theta(M_n - M_0)} \geq e^{\theta\lambda}\right) \leq \mathbb{E}\left[e^{\theta(M_n - M_0)}\right]e^{-\theta\lambda} \leq \exp\left(\frac{\theta^2}{2}\sum_{k=1}^n c_k^2 - \theta\lambda\right)$$
Optimizing over $\theta$ to make the bound as tight as possible, we find $\theta^* = \lambda / \sum_{k=1}^n c_k^2$. Substituting back into the equation (and treating the lower tail symmetrically), we obtain the required result. □

o Example: Suppose the $X_i$ are IID uniform random variables taking values in the set $\mathcal{A} = \{1, \ldots, N\}$. Let $B = (b_1, \ldots, b_k)$, with $b_i \in \mathcal{A}$ and $k \ll n$. Now, draw n IID random copies of X, and write
$$X_1^n = (X_1, \ldots, X_n)$$
We wonder how many times the specific sequence B will appear in $X_1^n$. We will call this number $R_{n,k}$. Finding the distribution of this is difficult. However,


$$\mathbb{E}[R_{n,k}] = \underbrace{(n - k + 1)}_{\substack{\text{number of ways we can ``place'' } B \\ \text{by ``sliding'' it along } X_1^n}} \times \underbrace{(1/N)^k}_{\substack{\text{probability of finding } B \\ \text{at a particular position}}}$$
Let
$$M_0 = \mathbb{E}[R_{n,k}]$$
And let
$$M_i = \mathbb{E}\left[R_{n,k} \mid \mathcal{F}_i\right], \qquad \mathcal{F}_i = \sigma(X_1, \ldots, X_i)$$
This is then a martingale, and $M_n = R_{n,k}$. But consider that since each X can only be part of at most k of the placements,
$$|M_i - M_{i-1}| = \left|\mathbb{E}\left[R_{n,k} \mid \mathcal{F}_i\right] - \mathbb{E}\left[R_{n,k} \mid \mathcal{F}_{i-1}\right]\right| \leq k$$
Using the A-H inequality, we then have
$$\mathbb{P}\left(|M_n - M_0| > \lambda\right) \leq 2\exp\left(-\frac{\lambda^2}{2nk^2}\right) \quad \Longrightarrow \quad \mathbb{P}\left(|R_{n,k} - \mathbb{E}[R_{n,k}]| > \lambda\right) \leq 2\exp\left(-\frac{\lambda^2}{2nk^2}\right)$$
$$\mathbb{P}\left(|R_{n,k} - \mathbb{E}[R_{n,k}]| > x k\sqrt{n}\right) \leq 2\exp\left(-\frac{x^2}{2}\right)$$
We can then construct a confidence interval
$$\mathbb{E}[R_{n,k}] \in \left[R_{n,k} - c_\alpha k\sqrt{n},\; R_{n,k} + c_\alpha k\sqrt{n}\right]$$
where $c_\alpha$ is chosen to make the probability of the interval $\geq 1 - \alpha$. □
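A small simulation of this example (a sketch; the alphabet size, string length, and pattern below are assumed choices):

```python
import random

def pattern_counts(n=2000, N=4, pattern=(1, 2), n_paths=2000, seed=6):
    """Simulate R_{n,k}: occurrences of a fixed pattern B in an IID uniform
    string over {1,...,N}; compare the sample mean with (n - k + 1) N^{-k}."""
    rng = random.Random(seed)
    k = len(pattern)
    mean_pred = (n - k + 1) / N**k
    total = 0
    for _ in range(n_paths):
        x = [rng.randint(1, N) for _ in range(n)]
        total += sum(1 for i in range(n - k + 1)
                     if tuple(x[i:i + k]) == pattern)
    return total / n_paths, mean_pred

mean_obs, mean_pred = pattern_counts()
print(mean_obs, mean_pred)  # predicted mean: (n - k + 1) / N^k
```

The A-H bound above says deviations of order much larger than $k\sqrt{n}$ (about 90 here) are exponentially unlikely, which matches the tight spread seen in the samples.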

o Example: Let $\{X_n : n \geq 0\}$ be an irreducible, recurrent Markov chain (MC) with transition matrix P. Then, the only bounded solution to
$$Pf = f$$
is a constant vector (a multiple of $\mathbf{1}$). (Note: the state space could be infinite.)
Remark: We can think of the vector f as a function $f : \mathcal{S} \to \mathbb{R}$, where $\mathcal{S}$ is the set of states of the chain. Note as well that this is not the steady state equation, which satisfies $\pi = \pi P$.


Proof: Let $M_n = f(X_n)$ for an f that satisfies the assumptions of the proposition. The main question here is one of uniqueness, since it is clear that a constant vector can solve $Pf = f$. Let us first show that $M_n$ is a martingale:
. $|M_n| = |f(X_n)| \leq c$, since f is bounded.
. $M_n \in \mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ (in fact, it only depends on the last $X_n$).
. $\mathbb{E}\left[M_n \mid \mathcal{F}_{n-1}\right] = \mathbb{E}\left[f(X_n) \mid X_{n-1}\right] = (Pf)(X_{n-1}) = f(X_{n-1}) = M_{n-1}$
As such, $\{M_n : n \geq 0\}$ is a martingale with respect to $\{\mathcal{F}_n\}$. Note that $\sup_{n \geq 1} \mathbb{E}|M_n| < \infty$, because we have assumed that f is bounded. Thus, $M_n \to M_\infty$ almost surely. Suppose there exist $x, y \in \mathcal{S}$ such that
$$f(x) < f(y)$$
Since $\{X_n : n \geq 0\}$ is irreducible and recurrent, every state is visited infinitely often, and so
$$\liminf_n f(X_n) \leq f(x) \quad \text{a.s.}, \qquad \limsup_n f(X_n) \geq f(y) \quad \text{a.s.}$$
This means, however, that $\liminf_n f(X_n) \neq \limsup_n f(X_n)$, which contradicts the convergence statement. □
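For a finite chain this can also be seen numerically: if $Pf = f$ then $f = P^n f$ for every n, and for an irreducible aperiodic P the iterates of any vector flatten to a constant, so a bounded harmonic f must itself be constant (the matrix and starting vector below are arbitrary test choices, not from the lecture):

```python
def mat_vec(P, f):
    """Compute the vector Pf."""
    return [sum(P[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

def iterate(P, f, n=200):
    """Apply P to f n times; for irreducible aperiodic P, the result
    approaches a constant vector (the limit 1 * (pi . f))."""
    for _ in range(n):
        f = mat_vec(P, f)
    return f

# a small irreducible, aperiodic transition matrix (illustrative choice)
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
g = iterate(P, [3.0, -1.0, 7.0])
print(g)  # all three entries agree to many decimal places
```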

LECTURE 7 – 2nd March 2011
Stochastic stability⁴
o Deterministic motivation: Consider a dynamical system X(t) for which
$$\frac{\mathrm{d}}{\mathrm{d}t}X(t) = f(X(t)), \qquad f : \mathbb{R} \to \mathbb{R}$$
(this can be thought of as the "equation of motion" of the system). Now, consider an "energy" function $g : \mathbb{R} \to \mathbb{R}_+$, with
$$\frac{\mathrm{d}}{\mathrm{d}t}\,g(X(t)) \leq -\varepsilon, \qquad \varepsilon > 0$$
⁴ Some parts of this topic were covered at the end of the previous lecture. For expositional purposes, we chose to present the material here instead.
This is effectively a statement of the fact that the energy of the system is "forced" down to 0, since it is "constantly decreasing". As such, $\exists\, t_0(x, \varepsilon)$ such that $g(X(t)) = 0$ for all $t \geq t_0$; in other words, our dynamical process is "pushed" towards a "stable" state.

o We now need to adapt this idea to a discrete stochastic setting. We can write our "equation of motion" as $X_{n+1} - X_n = f(X_n)$. To add stochasticity, we can write $X_{n+1} = \psi(X_n, \varepsilon_{n+1})$, where the $\varepsilon_i$ are random variables. This is effectively the definition of a Markov process, provided $\varepsilon_{n+1}$ is independent of the past. If we make the $\varepsilon_i$ IID, the Markov process becomes time-homogeneous.
Now consider the "energy" function; it would be too strong to ask for the energy of the process to decrease along every path. We therefore require it to decrease in expectation:
$$\mathbb{E}\left[g(X_{n+1}) - g(X_n) \mid X_n\right] \leq -\varepsilon$$
Since we need this to be true whatever state our process first starts in, we can write $\mathbb{E}_x[\cdot] = \mathbb{E}[\cdot \mid X_0 = x]$, and the condition above becomes
$$\mathbb{E}_x\left[g(X_1)\right] - g(x) \leq -\varepsilon$$
o We now specialize this to a particular stochastic process. Consider a Markov chain $\{X_n : n \geq 0\}$ that is irreducible. We would like to know whether the chain has a steady state. For a finite state space, all we need is to check for solutions to $\pi P = \pi$, $\pi\mathbf{1} = 1$; indeed, the existence of such a $\pi$ is associated with positive recurrence. If the state space is countably infinite, however, things get slightly more complicated; the two equations above do not suffice, and we also require $\pi \geq 0$, which makes things more complicated. Here, we attempt to find simpler conditions for stability to hold.

o Proposition: Let $\{X_n : n \geq 0\}$ be an irreducible Markov chain on a countable state space $\mathcal{S}$. Let $K \subseteq \mathcal{S}$ be a set containing a finite number of states. Then, if there exists a function $g : \mathcal{S} \to \mathbb{R}_+$ such that (recall $\mathbb{E}_x[\cdot] = \mathbb{E}[\cdot \mid X_0 = x]$)
$$\mathbb{E}_x\left[g(X_1)\right] - g(x) \leq -\varepsilon \qquad \forall x \in K^c \text{ and some } \varepsilon > 0$$
$$\mathbb{E}_x\left[g(X_1)\right] < \infty \qquad \forall x \in K$$
then $\{X_n : n \geq 0\}$ is positive recurrent.

o Proof: Let g be as in the proposition, and fix $x \in \mathcal{S}$. Construct
$$M_n = g(X_n) - g(x) - \sum_{k=0}^{n-1} (Ag)(X_k)$$
where $(Ag)(x) = \mathbb{E}_x[g(X_1)] - g(x)$. Write
$$M_n = g(X_n) - g(x) - \sum_{k=0}^{n-1} \left(\mathbb{E}_{X_k}[g(X_1)] - g(X_k)\right) = \sum_{k=1}^{n} \left(g(X_k) - \mathbb{E}_{X_{k-1}}[g(X_1)]\right) = \sum_{k=1}^{n} D_k$$
Now, defining $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$, we have that
$$\mathbb{E}\left[D_k \mid \mathcal{F}_{k-1}\right] = \mathbb{E}\left[g(X_k) \mid X_{k-1}\right] - \mathbb{E}_{X_{k-1}}[g(X_1)] = 0$$
And so $D_k$ is a martingale difference. Now, consider the set $K \subseteq \mathcal{S}$ in the proposition. Let $T_K = \inf\{n \geq 0 : X_n \in K\}$, and fix $m \in \mathbb{N}$. $T_K \wedge m$ is a bounded stopping time, so we can apply the OST:
$$\mathbb{E}\left[M_{T_K \wedge m}\right] = \mathbb{E}[M_0] = 0$$
By plugging this into the definition of the martingale $M_n$,
$$\mathbb{E}\left[g(X_{T_K \wedge m})\right] - g(x) - \mathbb{E}\left[\sum_{k=0}^{(T_K \wedge m) - 1} (Ag)(X_k)\right] = 0$$
If $x \in K^c$, consider the sum: up to the hitting time, all the summands are evaluated outside K, and will therefore be less than or equal to $-\varepsilon$. As such,
$$\mathbb{E}\left[\sum_{k=0}^{(T_K \wedge m) - 1} (Ag)(X_k)\right] \leq -\varepsilon\,\mathbb{E}\left[T_K \wedge m\right]$$
Combining our two results (and using $g \geq 0$),
$$0 \leq \mathbb{E}\left[g(X_{T_K \wedge m})\right] = g(x) + \mathbb{E}\left[\sum_{k=0}^{(T_K \wedge m) - 1} (Ag)(X_k)\right] \leq g(x) - \varepsilon\,\mathbb{E}_x\left[T_K \wedge m\right]$$
And so
$$\mathbb{E}_x\left[T_K \wedge m\right] \leq g(x)/\varepsilon$$
But $0 \leq T_K \wedge m \uparrow T_K$ as $m \to \infty$. So by monotone convergence,
$$\mathbb{E}_x\left[T_K\right] \leq g(x)/\varepsilon \qquad \forall x \in K^c$$
There is therefore a finite upper bound on the expectation; the return time to the finite set K is therefore finite in expectation. □

o "Application": Consider the stochastic system $X_{n+1} = aX_n + Z_{n+1}$ with $\{Z_n\}$ IID and $X_0 = x$. We want conditions on a and on the distribution of Z such that $X_n$ is stable. Use $g(x) = |x|$:
$$\mathbb{E}_x\left[g(X_1)\right] = \mathbb{E}\left|ax + Z_1\right| \leq |a||x| + \mathbb{E}|Z_1|$$
So if $\mathbb{E}|Z_1| < \infty$ and $|a| < 1$, then we might be able to make this work, because we would then have
$$\mathbb{E}_x\left[g(X_1)\right] - g(x) \leq -(1 - |a|)|x| + \mathbb{E}|Z_1| \leq -\varepsilon \qquad \text{provided } |x| \geq \frac{\mathbb{E}|Z_1| + \varepsilon}{1 - |a|}$$

Queuing
Sample path methods
o We will consider a simple queuing model:
. One single buffer, with unlimited space.
. FIFO (first-in-first-out) service.
. A single server which processes work at unit rate.
This model is often called the G/G/1 queuing model; arrival times are arbitrarily distributed, workloads are arbitrarily distributed (neither necessarily independent) and there is a single server.


o The variability in the system comes from the flow of work into the system. The points in our sample space are $\omega = \{(t_n, S_n) : n \in \mathbb{Z}\}$, with $S_n > 0$ and both $S_n$ and $t_n$ finite. A single point in the sample space is an infinite set of these numbers. [Figure: a pictorial representation of a single point $\omega \in \Omega$; workloads $S_1, \ldots, S_6$ drawn as bars at arrival times $t_1, \ldots, t_6$.]
o We assume the flow is stationary; in other words, $\theta_t\omega \overset{\text{dist}}{=} \omega$ for any $\omega \in \Omega$, where $\theta_t\omega = \{(t_n + t, S_n)\}$ for $\omega = \{(t_n, S_n)\}$.
o Definition (load): We define the load of the system as
$$\rho = \mathbb{E}\left[\sum_{n : t_n \in [0,1]} S_n\right]$$
The positioning of the interval [0, 1] does not matter, since the flow is stationary. If the incoming load is greater than 1, the system will eventually saturate; if it is less than 1, it'll rest sometimes. The case $\rho = 1$ needs separate analysis.

o We also assume the flows are ergodic, so that
$$\frac{\sum_{n : t_n \in [s,t]} S_n}{t - s} \to \rho \qquad \text{as } s \to -\infty \text{ or } t \to \infty$$
o Let $W_t(\omega)$ denote the work in the system at time t for a point $\omega$ in the sample space. We will simply denote this as $W_t$. The time between "empty times" at the server is called a busy cycle.
o Now, for a system that starts with workload a at time 0,
$$W_t(\omega) = \begin{cases} a & t = 0 \\ (a - t)^+ & 0 \leq t < t_1 \\ (a - t_1)^+ + S_1 & t = t_1 \\ \left(W_{t_n} - (t - t_n)\right)^+ & t \in [t_n, t_{n+1}) \\ \left(W_{t_n} - (t_{n+1} - t_n)\right)^+ + S_{n+1} & t = t_{n+1} \end{cases}$$
These are called Lindley's Equations.
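Evaluated at the arrival epochs, the equations collapse to the familiar discrete Lindley recursion, sketched below (the deterministic toy input is an assumption for illustration):

```python
def workload_at_arrivals(inter, service):
    """Workload found by each arriving job, from Lindley's recursion
    W_{n+1} = (W_n + S_n - (t_{n+1} - t_n))^+, with W_1 = 0 for a
    system starting empty; inter[n] = t_{n+1} - t_n, service[n] = S_n."""
    w, found = 0.0, []
    for tau, s in zip(inter, service):
        found.append(w)                 # workload seen by this arrival
        w = max(w + s - tau, 0.0)       # serve at unit rate until next arrival
    return found

# toy input: arrivals 1 apart, one big job followed by three small ones
print(workload_at_arrivals([1, 1, 1, 1], [2, 0.5, 0.5, 0.5]))
# -> [0.0, 1.0, 0.5, 0.0]
```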


o Set $s < t$ and $\omega \in \Omega$. Define $X_t^s(\omega)$ to be the work in the system at time t given the system is empty at time s. [Figure: for the point $\omega$ depicted above (and assuming $s < t_1$), the path of $X_t^s$ starting from 0 at time s.]
o For $s' < s$, it is clear that
$$X_t^{s'}(\omega) \geq X_t^{s}(\omega)$$
Because this variable is non-decreasing as s decreases, $X_t^*(\omega) = \lim_{s \to -\infty} X_t^s(\omega)$ exists. This is the "steady state" of the system; what we would observe if we had started the system a very, very long time in the past.
o Now, define a time-shift operator $\theta_\tau$ so that $X_t^s(\theta_\tau\omega) = X_{t+\tau}^{s+\tau}(\omega)$. As such, we have
$$\lim_{s \to -\infty} X_t^s(\theta_\tau\omega) = \lim_{s \to -\infty} X_{t+\tau}^{s+\tau}(\omega) = X_{t+\tau}^*(\omega) = X_t^*(\theta_\tau\omega)$$
As such, $X_t^*(\theta_\tau\omega) = X_{t+\tau}^*(\omega)$; shift invariance also holds for the starred process.

LECTURE 8 – 23rd March 2011

o We have seen a number of properties of $X_t^*$, including the fact that it exists. It might, however, be equal to infinity. We now look for conditions under which $X_t^*$ is finite.
o Proposition: If $\rho < 1$, then $X_t^* < \infty$ a.s. for all t.
Proof: Fix $t \in \mathbb{R}$ and $\omega \in \Omega$ and define
$$T_t^s(\omega) = \sup\{\tau < t : X_\tau^s(\omega) = 0\}$$
Intuitively, this is the last empty time before time t (this is clearly not a stopping time; just a random time):


[Figure: the path of $X_\tau^s$ between s and t, with $T_t^s$ marked as the last time the path touches 0 before t.]
Now, consider that (dropping the argument of $X_t^s$ for simplicity)
$$X_t^s = \underbrace{X_{T_t^s}^s}_{\text{work in system at } T_t^s} + \underbrace{\sum_{n : t_n \in [T_t^s, t]} S_n}_{\text{work entering the system in } [T_t^s, t]} - \underbrace{(t - T_t^s)}_{\text{work processed in } [T_t^s, t]} = \sum_{n : t_n \in [T_t^s, t]} S_n - (t - T_t^s) \leq \sum_{n : t_n \in [T_t^s, t]} S_n$$
Now clearly, for $s' < s$, $T_t^{s'} \leq T_t^s$; in other words, $T_t^s$ decreases as s decreases. (This is because by moving s, an empty time, further into the past, we are potentially introducing more work into the system.) This implies that $T_t^{-\infty} = \lim_{s \to -\infty} T_t^s$ exists. Let us assume that $T_t^{-\infty} > -\infty$. This implies that there is some finite time in the past at which the system was empty, which implies that the amount of work at t, $X_t^*$, is finite.
All we therefore need to do is to show that
$$\rho < 1 \quad \Longrightarrow \quad T_t^{-\infty} > -\infty \quad \text{a.s.}$$
We do this by contradiction; suppose $\rho < 1$ and $T_t^{-\infty} = -\infty$ on a set of sample paths of positive probability. From the second line above, we have
$$X_t^s = \sum_{n : t_n \in [T_t^s, t]} S_n - (t - T_t^s)$$
Note, however, that $X_t^s \geq 0$ (there can never be negative work in the system), and so, dividing by $t - T_t^s$ throughout, we obtain
$$\frac{\sum_{n : t_n \in [T_t^s, t]} S_n}{t - T_t^s} \geq 1$$
Note that if we were to replace $T_t^s$ with s, the LHS of this expression would form a sequence with limit $\rho$ as $s \to -\infty$. By the definition of a limit, however, this is also true for any subsequence of that sequence. But since we have assumed $T_t^s \to -\infty$ as $s \to -\infty$, the LHS above is such a subsequence. Thus, letting $s \to -\infty$, we find
$$1 \leq \frac{\sum_{n : t_n \in [T_t^s, t]} S_n}{t - T_t^s} \to \rho$$
This is a contradiction, since we have assumed $\rho < 1$. As such, $T_t^{-\infty} > -\infty$ a.s. and $X_t^* < \infty$ a.s. □

o Definition (Coupling Time): Let $W_{t,\tau}^a(\omega)$ denote the load in a system, at time t and along sample path $\omega$, given that the system is allowed to start with load a at time $\tau$. As before, let $X_t^*(\omega)$ denote the load in the system at time t given that the system started empty at $t = -\infty$ (the "steady state"). Then $T_0$, called the coupling time, is the first time at which both the paths "couple"; past that point, the two paths behave identically. [Figure: the paths of $W_{t,\tau}^a$ (started at level a) and $X_t^*$, which merge at time $T_0$ and coincide thereafter.]
o Proposition: If $\rho < 1$, then $T_0 < \infty$ a.s.
Proof: WLOG, assume $\tau = 0$ and denote $W_t^a \equiv W_{t,0}^a$. Also assume, WLOG, that $a > X_0^*(\omega)$ (ie: the transient process starts higher than the steady-state process, as pictured above). In this case, the diagram above should make it clear that the coupling time is the first time at which both processes are empty. Since the transient process is more loaded than the steady-state process, this is effectively the time at which the transient process first empties. In other words,
$$T_0 = \inf\{t > 0 : W_t^a = 0\}$$


Assume $\rho < 1$ and suppose $T_0 = \infty$ on a set of nonzero probability. This implies that $W_t^a > 0 \;\forall t \geq 0$ and that the server is constantly working for all $t > 0$. As such,
$$0 \leq \underbrace{W_t^a}_{\text{work at } t} = a + \underbrace{\sum_{n : t_n \in [0,t]} S_n}_{\text{new work in } [0,t]} - \underbrace{t}_{\text{work processed in } [0,t]}$$
Dividing by t and re-arranging, we obtain
$$1 \leq \frac{a}{t} + \frac{\sum_{n : t_n \in [0,t]} S_n}{t} \to \rho \qquad \text{as } t \to \infty$$
This is a contradiction, and so $T_0 < \infty$ a.s. □
o Example: This theorem comes in particularly useful, for example, in estimating quantities like $\mathbb{P}(X_t^* \in A)$. Consider the following estimator, based on a queue started at an arbitrary point a:
$$\hat{p}_t(A) = \frac{1}{t}\int_0^t \mathbb{1}_{\{W_s^a \in A\}}\,\mathrm{d}s$$
As t grows, one might expect this to be a good estimator for $\mathbb{P}(X_t^* \in A)$. To show this formally, consider that
$$\hat{p}_t(A) = \frac{1}{t}\int_0^{T_0} \mathbb{1}_{\{W_s^a \in A\}}\,\mathrm{d}s + \frac{1}{t}\int_{T_0}^{t} \mathbb{1}_{\{W_s^a \in A\}}\,\mathrm{d}s \leq \frac{T_0}{t} + \frac{1}{t}\int_{T_0}^{t} \mathbb{1}_{\{X_s^* \in A\}}\,\mathrm{d}s \to \mathbb{P}(X_t^* \in A)$$
We can bound the estimator the other way by simply ignoring the first term. Either way, we find that $\hat{p}_t(A) \to \mathbb{P}(X_t^* \in A)$ a.s. as $t \to \infty$. Our estimator is therefore consistent. Intuitively, this is because the chains eventually couple. □

o We now ask how fast this convergence occurs...
o Proposition: If $\rho < 1$,
$$\sup_{A_1, \ldots, A_n}\left|\mathbb{P}\left(W_{t_1+t}^a \in A_1, \ldots, W_{t_n+t}^a \in A_n\right) - \mathbb{P}\left(X_{t_1}^* \in A_1, \ldots, X_{t_n}^* \in A_n\right)\right| \to 0 \qquad \text{as } t \to \infty$$
where $A_1, \ldots, A_n$ are measurable sets, for all $(t_1, \ldots, t_n) \in \mathbb{R}^n$ and $a > 0$. This is called convergence in total variation and is stronger than weak convergence.


Proof: We prove the theorem when $n = 2$. This generalizes straightforwardly. Fix $a > 0$, $t_1, t_2 \in \mathbb{R}$, $t > 0$ and sets $A_1$ and $A_2$. Now, by shift invariance of the starred process,
$$\mathbb{P}\left(W_{t_1+t}^a \in A_1, W_{t_2+t}^a \in A_2\right) - \mathbb{P}\left(X_{t_1}^* \in A_1, X_{t_2}^* \in A_2\right) = \mathbb{P}\left(W_{t_1+t}^a \in A_1, W_{t_2+t}^a \in A_2\right) - \mathbb{P}\left(X_{t_1+t}^* \in A_1, X_{t_2+t}^* \in A_2\right)$$
Calling the first event $E_t$ and the second $E_t^*$, we can write this as
$$\mathbb{P}\left(E_t, T_0 > t\right) + \mathbb{P}\left(E_t, T_0 \leq t\right) - \mathbb{P}\left(E_t^*, T_0 > t\right) - \mathbb{P}\left(E_t^*, T_0 \leq t\right)$$
Note, however, that if $T_0 \leq t$, the steady-state process is identical to the transient process. As such, the second and fourth terms cancel, and we can write
$$\left|\mathbb{P}\left(E_t, T_0 > t\right) - \mathbb{P}\left(E_t^*, T_0 > t\right)\right| \leq \max\left\{\mathbb{P}\left(E_t, T_0 > t\right), \mathbb{P}\left(E_t^*, T_0 > t\right)\right\} \leq \mathbb{P}(T_0 > t) \to 0 \qquad \text{as } t \to \infty$$
where the last step follows from the fact that $T_0 < \infty$ a.s. if $\rho < 1$. Since this result does not depend on the sets $A_1$, $A_2$, we can take a supremum over all such sets and obtain our required result. □

o We can compare to what happens in Markov chains. Consider a finite-state, irreducible Markov chain, and consider the quantity
$$\max_{i,j}\left|\mathbb{P}\left(X_n = i \mid X_0 = j\right) - \pi_i\right| \leq \mathbb{P}(T > n) \leq \mathbb{E}\left[e^{\theta T}\right]e^{-\theta n}$$
Here, the first probability is taken with respect to a Markov chain that starts in an arbitrary state j (the transient process). The second probability is taken with respect to the steady state, and T is the coupling time. Since the exponential moments of T are finite (see homework 2), this implies that this difference falls exponentially fast.

o Proposition: If $\rho > 1$, then for all $a > 0$, $\liminf_{t \to \infty} \frac{W_t^a}{t} > 0$ almost surely. In other words, the workload increases linearly with time.
Proof: Denote the cumulative idle time up to time t as $I_t$. We then have
$$W_t^a = a + \sum_{n : t_n \in [0,t]} S_n - (t - I_t) \geq \sum_{n : t_n \in [0,t]} S_n - t$$
Dividing by t,
$$\frac{W_t^a}{t} \geq \frac{\sum_{n : t_n \in [0,t]} S_n}{t} - 1$$
Letting $t \to \infty$, we get that
$$\liminf_{t \to \infty} \frac{W_t^a}{t} \geq \rho - 1 > 0$$
This is intuitively sensible; the amount that accumulates in the system is whatever load there is in excess of 1. □
Little's Law (Conservation Laws)

o We now consider a setting in which work $\{(t_n, S_n) : n \in \mathbb{N}\}$ enters a system and then leaves the system at times $\{d_n : n \in \mathbb{N}\}$. We let $q_n$ be the sojourn time of the nth job in the system, given by $q_n = d_n - t_n$. [Figure: jobs $(t_n, S_n)$ flow into a "System" box and leave at times $d_n$, with $q_n = d_n - t_n$.]
We do not put any constraints on the working of the system. For example, the server could spend some of its time idling. In particular, we do not assume that the server processes work on a FIFO basis, so it is not necessarily the case that $d_1 < d_2 < \cdots$.
We define the following quantities:
$$A(t) = \sup\{n : t_n \leq t\} = \text{number of arrivals in } [0, t]$$
$$D(t) = \text{number of departures in } [0, t]$$
$$N(t) = A(t) - D(t) = \text{number of jobs in the system at time } t$$
The only two assumptions we make are that the following two statements are true for every sample path:
. $\lim_{t \to \infty} \frac{A(t)}{t} = \lambda \in (0, \infty)$
. $\lim_{n \to \infty} \frac{1}{n}\sum_{k=1}^n q_k = q \in (0, \infty)$
o Proposition: Under these assumptions,
$$\frac{1}{t}\int_0^t N(s)\,\mathrm{d}s \to \lambda q$$


Intuitively, this states that the average number of jobs in the system at any given time is equal to the average number of arrivals per unit time multiplied by the average sojourn time.
Proof:
$$\int_0^t N(s)\,\mathrm{d}s = \int_0^t \left(A(s) - D(s)\right)\mathrm{d}s$$
Note that we can write
$$A(s) = \sum_{n} \mathbb{1}_{\{t_n \leq s\}}, \qquad D(s) = \sum_{n} \mathbb{1}_{\{d_n \leq s\}}$$
As such,
$$N(s) = \sum_{n}\left(\mathbb{1}_{\{t_n \leq s\}} - \mathbb{1}_{\{d_n \leq s\}}\right) = \sum_{n} \mathbb{1}_{\{t_n \leq s \leq d_n\}}$$
(The last equality follows because any jobs that arrive after s won't be counted at all, and any jobs that arrive and leave before s will be counted by both indicators and therefore cancel out.) By swapping the summation and integration (valid by Fubini), we obtain
$$\int_0^t N(s)\,\mathrm{d}s = \sum_{n}\int_0^t \mathbb{1}_{\{t_n \leq s \leq d_n\}}\,\mathrm{d}s = \sum_{n}\left\{\text{amount of time job } n \text{ was in the system during } [0, t]\right\}$$
We can bound this above by considering the sojourn time of all arrivals up to and including time t (though some of them may overrun past t) and lower bound it by considering the sojourn time of all jobs that depart before time t (even though some jobs that leave after t do spend some time in the system before t). This gives
$$\sum_{i=0}^{D(t)} q_{n_i} \leq \int_0^t N(s)\,\mathrm{d}s \leq \sum_{n=0}^{A(t)} q_n$$
where $n_i$ is the index of the ith job to leave the system (since we have not assumed a FIFO processing discipline, we cannot assume that $n_i = i$).

Before we continue, we will need the following claim Claim: Under the two assumptions above

Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011 Foundations of Stochastic Modeling Page 74

q n ¥0as n t n

Intuitively, the second assumption states that $q_n$ grows more slowly than $n$, so we would expect this to be true. Proof: Consider that
$$\frac{q_n}{n} = \frac{1}{n}\sum_{k=1}^n q_k - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{k=1}^{n-1} q_k \to q - q = 0$$
Consider also that
$$\frac{n}{t_n} = \frac{A(t_n)}{t_n} \to \lambda \quad \text{as } t_n \to \infty$$
Finally,
$$\frac{q_n}{t_n} = \frac{q_n}{n}\cdot\frac{n}{t_n} \to 0 \cdot \lambda = 0$$
As required. 

Now by our claim, for all $\varepsilon > 0$ there exists an $N(\varepsilon)$ such that for $n > N(\varepsilon)$,
$$\frac{q_n}{t_n} = \frac{d_n - t_n}{t_n} \le \varepsilon$$
This implies that $d_n \le (1+\varepsilon)t_n$ for all $n > N(\varepsilon)$. This means that all jobs after the $N(\varepsilon)$th job that arrive in $[0,t]$ will have departed by time $t(1+\varepsilon)$. Therefore
$$\sum_{n=N(\varepsilon)+1}^{A(t)} q_n \le \sum_{i=1}^{D(t(1+\varepsilon))} q_{n_i}$$
Putting this together with the bounds developed above, we find that
$$\frac{1}{t(1+\varepsilon)}\sum_{n=N(\varepsilon)+1}^{A(t)} q_n \le \frac{1}{t(1+\varepsilon)}\int_0^{t(1+\varepsilon)} N(s)\,ds \le \frac{1}{t(1+\varepsilon)}\sum_{n=1}^{A(t(1+\varepsilon))} q_n$$
Let's first consider the upper bound:
$$\frac{1}{t(1+\varepsilon)}\sum_{n=1}^{A(t(1+\varepsilon))} q_n = \frac{A(t(1+\varepsilon))}{t(1+\varepsilon)}\cdot\frac{1}{A(t(1+\varepsilon))}\sum_{n=1}^{A(t(1+\varepsilon))} q_n \to \lambda q$$
The first term tends to $\lambda$, by assumption 1. The second term tends to $q$ because:
. By assumption 1, $A(t) \to \infty$ as $t \to \infty$.


. By assumption 2, $\frac{1}{n}\sum_{k=1}^n q_k \to q$, which means any subsequence thereof also tends to $q$. Since $A(t(1+\varepsilon)) \to \infty$, the second term above is precisely such a subsequence.
This implies that
$$\limsup_{t\to\infty}\frac{1}{t}\int_0^t N(s)\,ds \le \lambda q$$
Now the lower bound:
$$\frac{1}{t(1+\varepsilon)}\sum_{n=N(\varepsilon)+1}^{A(t)} q_n = \frac{1}{1+\varepsilon}\cdot\frac{A(t)}{t}\cdot\frac{1}{A(t)}\left(\sum_{n=1}^{A(t)} q_n - \sum_{n=1}^{N(\varepsilon)} q_n\right) \to \frac{\lambda q}{1+\varepsilon}$$
Again, $A(t)/t$ tends to $\lambda$ by assumption 1. The bracketed average tends to $q$ by a similar logic as above, noting that since $N(\varepsilon) < \infty$, the second sum in brackets, divided by $A(t)$, tends to 0. As such, we get
$$\liminf_{t\to\infty}\frac{1}{t}\int_0^t N(s)\,ds \ge \frac{\lambda q}{1+\varepsilon}$$
Since this is true for all $\varepsilon > 0$, this, together with the lim sup above, proves our theorem. 
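Little's Law is a pathwise statement, so it can be checked directly on a simulated sample path. The sketch below is not from the lecture; the FIFO discipline and the M/M/1-style exponential inputs are illustrative assumptions, chosen only to generate a concrete path:

```python
import random

random.seed(42)
lam, mu, n_jobs = 1.0, 2.0, 20_000   # illustrative arrival/service rates

# Build one sample path of a FIFO single-server queue.
t, free_at = 0.0, 0.0
arrivals, departures = [], []
for _ in range(n_jobs):
    t += random.expovariate(lam)     # arrival time t_n
    s = random.expovariate(mu)       # workload S_n
    start = max(t, free_at)          # service starts once the server frees up
    free_at = start + s
    arrivals.append(t)
    departures.append(free_at)

T = arrivals[-1]                     # observation window [0, T]
# Time spent in the system during [0, T], summed over jobs (the Fubini step).
area = sum(min(d, T) - a for a, d in zip(arrivals, departures))
N_bar = area / T                     # time-average number in system
lam_hat = n_jobs / T                 # empirical arrival rate
q_bar = sum(d - a for a, d in zip(arrivals, departures)) / n_jobs
print(N_bar, lam_hat * q_bar)        # Little's law: these should be close
```

The two printed values differ only by edge effects at time $T$, which vanish as the horizon grows.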

LECTURE 9 – 25th March 2011

 Single-server queue; IID case
o We now specialize our analysis to a situation in which the workloads and inter-arrival times are IID. Letting $\tau_n = t_n - t_{n-1}$ be the time gap before the $n$th job arrives, this situation requires the $\{S_n\}$ and $\{\tau_n\}$ to be IID sequences. There is, once again, only one server. We often denote this situation GI/GI/1.
o We also denote by $w_n$ the time that the $n$th job has to wait in queue before it is served. In that respect, $d_n = t_n + w_n + S_n$. Now, consider that


$$w_{n+1} = \begin{cases} 0 & d_n \le t_{n+1} \\ d_n - t_{n+1} & d_n > t_{n+1} \end{cases} \;=\; (d_n - t_{n+1})^+$$
As such,
$$w_{n+1} = (w_n + S_n - \tau_{n+1})^+$$
Let $Z_n = S_{n-1} - \tau_n$. We then have
$$w_{n+1} = (w_n + Z_{n+1})^+ = \max(0,\, w_n + Z_{n+1})$$
Since the $Z_n$ are IID random variables, this is none other than a random walk "capped off" at the origin. Letting $s_n = \sum_{j=1}^n Z_j$, we find that
$$w_1 = \max(Z_1, 0) = \max(s_1, 0)$$
$$w_2 = (w_1 + Z_2)^+ = s_2 + \max(0, -s_1, -s_2) = s_2 - \min_{0\le k\le 2} s_k$$
In fact, carrying this analysis forwards, we find that
$$w_n = s_n - \min_{0\le k\le n} s_k$$
The second term takes into account the fact that we have a reflected random walk, and prevents the walk from going negative.
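The recursion and the reflected-walk formula can be checked against each other numerically. A minimal sketch (the exponential inputs are an illustrative assumption; the identity is pathwise and holds for any inputs):

```python
import random

random.seed(0)
n = 1000
S = [random.expovariate(2.0) for _ in range(n)]    # service times S_0, S_1, ...
tau = [random.expovariate(1.0) for _ in range(n)]  # inter-arrival times

# Lindley recursion: w_{j+1} = (w_j + S_j - tau_{j+1})^+, with w_0 = 0.
w = [0.0]
for j in range(n - 1):
    w.append(max(0.0, w[-1] + S[j] - tau[j + 1]))

# Reflected-walk formula: w_m = s_m - min_{0<=k<=m} s_k, with Z_j = S_{j-1} - tau_j.
s, s_min, w2 = 0.0, 0.0, [0.0]
for j in range(n - 1):
    s += S[j] - tau[j + 1]
    s_min = min(s_min, s)
    w2.append(s - s_min)

gap = max(abs(a - b) for a, b in zip(w, w2))
print(gap)   # the two constructions agree pathwise (up to rounding)
```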

o One thing the GI/GI/1 framework gives us over the G/G/1 framework is that we can now say something more about the distribution of the waiting times:
$$w_n = s_n - \min_{0\le k\le n} s_k = \max_{0\le k\le n}(s_n - s_k) = \max_{0\le k\le n}\sum_{j=k+1}^n Z_j$$
Consider, however, that the $Z_j$ are IID. As such, we can change indices on the $Z_j$ at will and still maintain equality in distribution. Therefore
$$w_n \stackrel{d}{=} \max_{0\le k\le n}\sum_{j=1}^k Z_j = \max_{0\le k\le n} s_k = M_n$$
Since $M_n$ is a non-decreasing sequence, $M_n \to M_\infty = \max_{k\ge 0} s_k$ a.s. As such, $w_n \Rightarrow M_\infty$ as $n \to \infty$. [Note: convergence in distribution is the best we can hope


for in this case, because $M_n$ has some structure (the fact that it is non-decreasing) that $w_n$ does not. However, since $\{w_n\}$ is a Markov chain, a stationary distribution is all we could really want.]

o If the random walk has positive drift – in other words, if $\mathbb{E}(Z_n) = \mathbb{E}(S_{n-1}) - \mathbb{E}(\tau_n) > 0$ – then the random walk drifts to infinity and the waiting times get infinitely large. This is consistent with our findings for the G/G/1 queue, since $\rho < 1$ is precisely the condition $\mathbb{E}(\tau_{n+1}) > \mathbb{E}(S_n)$. On the other hand, if the random walk has negative drift, the chain is stable and the waiting times return to 0 infinitely often almost surely (we motivated this result in homework 2 using a simpler reflected random walk).

 The single-server M/M/1 Markovian queue
o We now consider the most tractable of all single-server queue models: the M/M/1 queue. In this case, we assume the $\{S_n\}$ are IID and exponentially distributed with parameter $\mu$, whereas the $\{\tau_n\}$ are IID and exponentially distributed with parameter $\lambda$.

o Consider the process
$$X(t) = \text{number of jobs in the system at time } t \ge 0$$
$\{X(t) : t \ge 0\}$ is a continuous-time Markov chain with countable state space. In fact, it is a birth-and-death process:

[Diagram: states 0, 1, 2, … with transitions to the right at rate $\lambda$ and to the left at rate $\mu$.]

We now proceed to analyze this CTMC (note that $f(h) = o(h)$ means $\lim_{h\to 0} f(h)/h = 0$). Consider $t > 0$ and a small $h > 0$. We have, for $i \ge 1$,
$$\mathbb{P}\{X(t+h) = j \mid X(t) = i\} = \begin{cases} \lambda h + o(h) & j = i+1 \\ \mu h + o(h) & j = i-1 \\ 1 - (\lambda+\mu)h + o(h) & j = i \\ o(h) & \text{otherwise} \end{cases}$$
(from state 0 the only transition is to state 1, at rate $\lambda$). Our transition matrix $P_h$ then takes the form
$$P_h = \begin{pmatrix} 1-\lambda h & \lambda h & 0 & \cdots \\ \mu h & 1-(\lambda+\mu)h & \lambda h & 0 \\ 0 & \mu h & 1-(\lambda+\mu)h & \ddots \\ \vdots & 0 & \ddots & \ddots \end{pmatrix} + o(h)$$
We can write this as
$$P_h = I + \begin{pmatrix} -\lambda & \lambda & 0 & \cdots \\ \mu & -(\lambda+\mu) & \lambda & 0 \\ 0 & \mu & -(\lambda+\mu) & \ddots \\ \vdots & 0 & \ddots & \ddots \end{pmatrix} h + o(h)$$

$$P_h = I + Qh + o(h)$$
where $Q$ is the rate matrix, defined by $Q = \lim_{h\to 0}\frac{P_h - I}{h}$. The steady-state equations in this case are
$$\pi P_h = \pi \qquad \pi \ge 0 \qquad \pi \cdot e = 1$$
Feeding our expression for $P_h$ into the first equation, we obtain
$$\pi + \pi Q h + o(h) = \pi \quad\Longrightarrow\quad \pi Q = 0$$
For our particular matrix, this gives
$$-\lambda\pi_0 + \mu\pi_1 = 0 \qquad \lambda\pi_{n-1} - (\lambda+\mu)\pi_n + \mu\pi_{n+1} = 0$$
Letting $\rho = \lambda/\mu$ and assuming $\rho < 1$, we obtain
$$\pi_1 = \rho\pi_0 \qquad \rho\pi_{n-1} - (1+\rho)\pi_n + \pi_{n+1} = 0$$
Solving this recurrence relation, we obtain $\pi_n = c\rho^n$, and $\sum_n \pi_n = 1$ gives $c = 1-\rho$.


So
$$\pi_n = (1-\rho)\rho^n \qquad n \ge 0$$
This expression passes a "sanity check" at $n = 0$: the probability the chain is empty is given by $\pi_0 = 1-\rho = 1-\lambda/\mu$, which we would expect to be the long-run fraction of time the system is empty.
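A quick numerical sanity check, not in the notes (the values of $\lambda$ and $\rho$ are arbitrary illustrations): verify that $\pi_n = (1-\rho)\rho^n$ satisfies the balance equations derived above.

```python
rho = 0.6
lam = 1.2
mu = lam / rho
pi = [(1 - rho) * rho ** n for n in range(200)]

# Global balance: -lam*pi_0 + mu*pi_1 = 0, and for n >= 1,
# lam*pi_{n-1} - (lam + mu)*pi_n + mu*pi_{n+1} = 0.
boundary = -lam * pi[0] + mu * pi[1]
interior = max(abs(lam * pi[n - 1] - (lam + mu) * pi[n] + mu * pi[n + 1])
               for n in range(1, 199))
total = sum(pi)   # ~1, up to geometric truncation error
print(boundary, interior, total)
```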

o We can use Little’s Law to good effect; consider the following two examples; the first is trivial, the second less so . Consider the server as the “system” Let S be the number of items in the server. The average sojourn time in the server is 1 . As such, by Little’s Law m l  ()S ==r p m

Note, however, that S =⋅10⋅ 11System is busy+ System is idle . As such,

p (S )==( System is busy) r . The result above is therefore consistent with what we would expect. . Consider the queue as the “system” Let Q be the number of items in the queue. The sojourn time in the

queue is simply the waiting time, wi. As such, by Little’s Law

pp()Qw= l () (where w is the waiting time for any new item that joins the queue).

+ Now, note that Qt()=-( Xt () 1) (we must subtract any item currently in the server). As such

2 ¥ r  ()Qn=--= ( 1)(1rr )n p ån=1 1 - r Therefore r  ()w = p mr(1- )


o We now consider the more difficult problem of deriving the probability distribution of the waiting times, $\mathbb{P}_\pi(w > x)$. Once again, recall $w$ is the wait experienced by a random job entering the queue.
$$\mathbb{P}_\pi(w > x) = \mathbb{P}(w > x \mid X = 0)\pi_0 + \sum_{k=1}^\infty \mathbb{P}(w > x \mid X = k)\pi_k = \sum_{k=1}^\infty \mathbb{P}(w > x \mid X = k)(1-\rho)\rho^k$$
Now, note that since the exponential distribution is memoryless, we can write
$$(w \mid X = k) \;\stackrel{d}{=}\; \underbrace{R}_{\substack{\text{remaining time for job} \\ \text{currently in server}}} + \underbrace{\sum_{j=1}^{k-1} S_j}_{\substack{\text{processing times for the} \\ k-1 \text{ other jobs waiting}}} \;\stackrel{d}{=}\; \text{Erlang}(k, \mu)$$
(The Erlang distribution is the distribution of a sum of IID exponential variables; it is a special case of the Gamma distribution.) As such,
$$f_w(x) = \sum_{k=1}^\infty \rho^k(1-\rho)\frac{\mu e^{-\mu x}(\mu x)^{k-1}}{(k-1)!} = \mu\rho(1-\rho)e^{-\mu x}\sum_{k=0}^\infty \frac{(\mu\rho x)^k}{k!} = \mu\rho(1-\rho)e^{-\mu(1-\rho)x}$$
And so
$$w \sim \begin{cases} \exp(\mu(1-\rho)) & \text{with probability } \rho \\ 0 & \text{with probability } 1-\rho \end{cases}$$
And $\mathbb{P}(w > x) = \rho\, e^{-\mu(1-\rho)x}$.

 The M/G/1 queue
o We now consider a slightly more general case in which the times between arrivals $\{\tau_n\}$ are exponential, but the processing times $\{S_n\}$ now follow a general distribution. This complicates matters slightly, because $X(t)$ – the number of jobs in the system – is no longer sufficient to describe the state of the system at time $t$. We also require $R(t)$, the residual processing time of the job currently in service. (This was not the case when processing times followed a memoryless exponential distribution.)


o To sidestep this complication, we will consider the following embedded Markov chain $X_n$. Let $T_n$ be the time at which the $n$th job concludes processing, and denote
$$X_n \equiv X(T_n^+) = \text{number of jobs in the system immediately after the } n\text{th job has departed}$$
We then have
$$X_{n+1} = (X_n - 1)^+ + A_{n+1}$$
where $A_{n+1}$ is the number of arrivals during the processing time of the $(n+1)$th job. In this case, by assumption, the $A_n$ are IID, and $(A_n \mid S_n = s) \sim \text{Po}(\lambda s)$, where $\lambda$ is the rate of arrivals.

o Now, using the ergodic theorem for DTMCs (and assuming we have stability, i.e. $\rho < 1$), we have
$$\mathbb{P}(X_n = j) \to \pi(j) \quad \text{as } n\to\infty$$
Furthermore, from the G/G/1 case, we know that for each sample path the long-run fraction of time spent in each state exists, which implies that
$$\mathbb{P}(X(t) = j) \to \hat\pi(j) \quad \text{as } t\to\infty$$
The challenge now, however, is to prove that $\pi = \hat\pi$.

o Theorem: $\pi = \hat\pi$.
Proof: Define the following two processes:
$$A_j(t) = \text{number of arrivals in } [0,t) \text{ that "find" the system in state } j$$
$$D_j(t) = \text{number of departures in } [0,t) \text{ that "leave" the system in state } j$$
Also let $A(t)$ and $D(t)$ be the total numbers of arrivals and departures up to time $t$, respectively. We now derive or quote a number of seemingly unrelated facts and then combine them to prove our result.
1. For any given sample path $\omega$, it is clear that
$$|A_j(t) - D_j(t)| \le 1 \qquad \forall t \ge 0$$

This is because $A_j(t)$ is the number of "up-crossings" of $X(t)$ over the line $X(t) = j$ whereas $D_j(t)$ is the number of "down-crossings". Since the path moves in unit steps, every up-crossing must be followed by a down-crossing before the next up-crossing, and so the difference in the two counts is at most 1.
2. Using the ergodic theorem for Markov chains, we have
$$\lim_{t\to\infty}\frac{D_j(t)}{D(t)} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n \mathbb{1}\{X_k = j\} = \pi(j) \quad \text{a.s.}$$
(The first equality follows from the fact that $X_k$ is precisely the state of the system after the $k$th departure.)
3. Using the bound in (1), we have $D_j(t) - 1 \le A_j(t) \le D_j(t) + 1$, so
$$\frac{D_j(t)-1}{D(t)}\cdot\frac{D(t)}{A(t)} \le \frac{A_j(t)}{A(t)} \le \frac{D_j(t)+1}{D(t)}\cdot\frac{D(t)}{A(t)}$$
Since the stable queue empties infinitely often, $D(t)/A(t) \to 1$, and so by (2),
$$\frac{A_j(t)}{A(t)} \to \pi(j) \quad \text{a.s.}$$

4. Letting $t_k$ be the time of the $k$th arrival, we have
$$\lim_{t\to\infty}\frac{A_j(t)}{A(t)} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n \mathbb{1}\{X(t_k^-) = j\}$$
5. A well-known property of queues with Poisson arrivals is PASTA (Poisson Arrivals See Time Averages – see the addendum at the end of this lecture):
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s) = j\}\,ds = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n \mathbb{1}\{X(t_k^-) = j\}$$
where $t_k$ is the time of the $k$th arrival to the system. Effectively, this states that in working out the time-average occupancy of the queue, we don't need to observe the system at all times; it is enough to sample it at arrival instants.
6. By the ergodic theorem for Markov chains,
$$\hat\pi(j) = \lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s) = j\}\,ds \quad \text{a.s.}$$
7. Finally, combine all of the above:


$$\hat\pi(j) \stackrel{(6)}{=} \lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s)=j\}\,ds \stackrel{(5)}{=} \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n \mathbb{1}\{X(t_k^-)=j\} \stackrel{(4)}{=} \lim_{t\to\infty}\frac{A_j(t)}{A(t)} \stackrel{(3)}{=} \pi(j) \quad \text{a.s.}$$
Which proves the required result. 

o Example: We can use this result to derive the Pollaczek–Khinchine formula for the average waiting time in an M/G/1 queue. First, consider that by the above,
$$\mathbb{E}_\pi[X(t)] = \mathbb{E}_\pi[X_n]$$
Recall further that
$$X_{n+1} = (X_n - 1)^+ + A_{n+1} \qquad (A_n \mid S_n = s) \sim \text{Po}(\lambda s)$$
We can write this as follows (denoting $\mathbb{1} \equiv \mathbb{1}\{X_n \ge 1\}$ to save space, and noting that $X_n\mathbb{1} = X_n$ and $\mathbb{1}^2 = \mathbb{1}$):
$$X_{n+1} = X_n - \mathbb{1} + A_{n+1}$$
Squaring this expression, we obtain
$$X_{n+1}^2 = X_n^2 + \mathbb{1} + A_{n+1}^2 - 2X_n + 2X_n A_{n+1} - 2\,\mathbb{1}\,A_{n+1}$$
Taking expectations in steady state (so that $\mathbb{E}_\pi X_{n+1}^2 = \mathbb{E}_\pi X_n^2$), using the independence of $A_{n+1}$ and $X_n$, and noting that $\mathbb{E}_\pi(\mathbb{1}\{X_n\ge 1\}) = \rho$ (take expectations of the un-squared recursion to get $\mathbb{E}_\pi\mathbb{1} = \mathbb{E}(A) = \lambda\mathbb{E}(S) = \rho$), we obtain
$$0 = \rho + \mathbb{E}(A^2) - 2\mathbb{E}_\pi X + 2\rho\,\mathbb{E}_\pi X - 2\rho^2$$
Note that
$$\mathbb{E}(A) = \mathbb{E}[\mathbb{E}(A\mid S)] = \lambda\mathbb{E}(S) = \rho$$
$$\mathrm{Var}(A) = \mathbb{E}[\mathrm{Var}(A\mid S)] + \mathrm{Var}[\mathbb{E}(A\mid S)] = \lambda\mathbb{E}(S) + \lambda^2\mathrm{Var}(S) = \rho + \lambda^2\sigma^2$$
so that $\mathbb{E}(A^2) = \rho + \lambda^2\sigma^2 + \rho^2$. Substituting,
$$0 = 2\rho + \lambda^2\sigma^2 - \rho^2 - 2(1-\rho)\,\mathbb{E}_\pi X$$
$$\mathbb{E}_\pi X = \rho + \frac{\rho^2 + \lambda^2\sigma^2}{2(1-\rho)} = \rho + \frac{\rho^2(1 + \mathrm{CV}^2)}{2(1-\rho)}$$
where $\mathrm{CV}^2 = \sigma^2/[\mathbb{E}(S)]^2$ is the squared coefficient of variation of the service time. For M/M/1, $\mathrm{CV} = 1$, and this reduces to $\mathbb{E}_\pi X = \rho/(1-\rho)$, consistent with the expression obtained earlier.

lrwX=-=-(1)(1)+  X11=-³=  X  X 1  () X - pp p{}X³1 pX³1 p() p

Plugging in from the above

rr222(1++CV ) 2r r (1 + CV 2 ) l w ==+ p 2(1---rmrmr ) (1 ) 2 (1 )

All true only if we know the expectation of X2 exists. Thus, out that

32 ()S <¥pX <¥, this is excessive because we took the simpler way to prove it. Really, all we need is the second moment. Easy approach to problem leads to an excessive condition.

This is the Pollaczek–Khinchine formula. (Note that the equality of the embedded-chain and time-stationary distributions that we used is special; it doesn't usually happen.)

 Addendum on PASTA
o The principle of PASTA (Poisson Arrivals See Time Averages) states that for any suitable stochastic process $X(t)$ over a state space $\mathcal{S}$, and for any $A \subseteq \mathcal{S}$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n \mathbb{1}\{X(t_k)\in A\} = \lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s)\in A\}\,ds$$


provided the $t_k$ form a Poisson process – in other words, $t_k - t_{k-1} \sim \exp(\lambda)$ IID. This is really quite a surprising result! The RHS has full information and sees the system at all times; the LHS only sees it at a selection of times.
o For an intuitive proof, write $A(t)$ = cumulative number of samples up to time $t$. The statement above can then be written as
$$\lim_{t\to\infty}\frac{1}{A(t)}\int_0^t \mathbb{1}\{X(s)\in A\}\,dA(s) = \lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s)\in A\}\,ds$$
Now, write
$$M(t) = \int_0^t \mathbb{1}\{X(s)\in A\}\,dA(s) - \lambda\int_0^t \mathbb{1}\{X(s)\in A\}\,ds = \int_0^t \mathbb{1}\{X(s)\in A\}\,d\bar{A}(s)$$
where $d\bar{A}(s) = dA(s) - \lambda\,ds$, so that $\bar{A}(t) = A(t) - \lambda t$. This is a martingale, because the increments $d\bar{A}(s)$ are independent (by the "independent increments" property of Poisson processes) and have mean 0 (because of our normalization). Furthermore, the conditional increments are uniformly bounded (since they are counts). As such,
$$\frac{M(t)}{t} \to 0 \quad \text{a.s.}$$
In other words,
$$\lim_{t\to\infty}\frac{A(t)}{\lambda t}\cdot\frac{1}{A(t)}\int_0^t \mathbb{1}\{X(s)\in A\}\,dA(s) = \lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbb{1}\{X(s)\in A\}\,ds$$
Finally, use $\lim_{t\to\infty} A(t)/t = \lambda$ to conclude that the prefactor converges to 1. This gives the required result.
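PASTA is easy to see numerically. A sketch (not from the notes; the M/M/1 setting and parameters are illustrative assumptions) compares the time-average fraction of time the server is busy with the fraction of Poisson arrivals that find it busy:

```python
import random

random.seed(3)
lam, mu, n = 1.0, 2.0, 100_000     # illustrative M/M/1 parameters
t, free_at = 0.0, 0.0
busy_time, seen_busy = 0.0, 0
for _ in range(n):
    dt = random.expovariate(lam)
    # Server is busy on [t, min(t + dt, free_at)).
    busy_time += max(0.0, min(t + dt, free_at) - t)
    t += dt
    if free_at > t:                # the arriving job finds the server busy
        seen_busy += 1
    free_at = max(free_at, t) + random.expovariate(mu)

time_avg = busy_time / t
arr_avg = seen_busy / n
print(time_avg, arr_avg)           # both estimate rho = lam / mu = 0.5
```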

LECTURE 10 – 30TH MARCH 2011

Renewal & Regenerative Processes

 Renewal processes
o Let $\{X_n\}$ be IID, with $\mathbb{E}(X_1) = \mu < \infty$ and $\mathbb{P}(X_1 = 0) < 1$. Let $S_n = \sum_{i=1}^n X_i$, with $S_0 = 0$. Let $N(t) = \sup\{n \ge 1 : S_n \le t\}$. Then $\{N(t): t\ge 0\}$ is called a renewal process.
o Definition (renewal function): The renewal function is defined as $m(t) = \mathbb{E}[N(t)]$.


o Example: Let $X_1 \sim \exp(1/\mu)$ (mean $\mu$). $\{N(t): t\ge 0\}$ is then a Poisson process. Consider, incidentally, that for a Poisson process the following two facts are true. Much of our work in this section will be concerned with generalizing these results to general renewal processes:
. $\mathbb{E}(N(t)) = m(t) = t/\mu = t/\mathbb{E}(X_1)$ (generalizes to the elementary renewal theorem).
. $m'(t) = 1/\mathbb{E}(X_1)$ (generalizes to the key renewal theorem). 
o Proposition: $m(t) = \sum_{n=1}^\infty F_n(t)$, where $F_n(t) = \mathbb{P}(S_n \le t)$.
Proof: Note that
$$N(t) = \sum_{n=1}^\infty \mathbb{1}\{S_n \le t\} \quad\Longrightarrow\quad \mathbb{E}[N(t)] = \sum_{n=1}^\infty \mathbb{P}(S_n \le t)$$
where, in the last step, we used Fubini. 
Remark: The CDF of $S_n$ is the $n$-fold convolution of the CDF of each individual variable $X$:
$$F_n(\cdot) = \underbrace{F_1(\cdot) * F_1(\cdot) * \cdots * F_1(\cdot)}_{n\text{-fold}}$$
For example,
$$F_2(t) = \int F_1(t-s)\,dF_1(s)$$

 Laplace Transforms & Co.
o The Laplace transform of a distribution with CDF $F(x)$ is defined as
$$\hat{F}(s) = \int_0^\infty e^{-sx}\,dF(x) = \int_0^\infty e^{-sx}f(x)\,dx$$
A few important results:
. Laplace transforms of convolutions: in CDF form, if $F = A * B$, then $\hat{F}(s) = \hat{A}(s)\hat{B}(s)$.
. If $f$ is a density function, then for $s > 0$,
$$\hat{F}(s) = \int_0^\infty e^{-sx}\,dF(x) < 1$$

o Together, the points above imply that


$$\hat{m}(s) = \sum_{n=1}^\infty \hat{F}(s)^n = \frac{\hat{F}(s)}{1-\hat{F}(s)}$$
Similarly, re-arranging,
$$\hat{F}(s) = \frac{\hat{m}(s)}{1+\hat{m}(s)}$$
As such, there is a one-to-one correspondence between $F$ and $m$.

o Theorem: If $b(t)$ is bounded on bounded intervals, then the solution to the following renewal equation $a = b + a * F$, i.e.
$$a(t) = b(t) + \int_0^t a(t-s)\,dF(s)$$
is $a = b + b * m$:
$$a(t) = b(t) + \int_0^t b(t-s)\,dm(s)$$
Verification: Taking the Laplace transform of the first equation,
$$\hat{a}(s) = \hat{b}(s) + \hat{a}(s)\hat{F}(s)$$
As such,
$$\hat{a}(s) = \frac{\hat{b}(s)}{1-\hat{F}(s)} = \hat{b}(s) + \hat{b}(s)\frac{\hat{F}(s)}{1-\hat{F}(s)} = \hat{b}(s) + \hat{b}(s)\hat{m}(s)$$
Taking the Laplace transform of the proposed solution, we do indeed find this equation is satisfied. 

o Example: Consider that
$$\mathbb{E}[N(t)\mid X_1 = x] = \begin{cases} 0 & x > t \\ 1 + \mathbb{E}[N(t)-N(x)\mid X_1 = x] = 1 + m(t-x) & x \le t \end{cases}$$
However,
$$m(t) = \mathbb{E}[N(t)] = \int_0^\infty \mathbb{E}[N(t)\mid X_1 = x]\,dF(x) = \int_0^t [1 + m(t-x)]\,dF(x) = F(t) + (m * F)(t)$$
This equation is in the form of a renewal equation, with $b = F$. The solution is therefore
$$m(t) = F(t) + \int_0^t F(t-s)\,dm(s)$$
o Example: Let us consider the case $X \sim \exp(1/\mu)$, so that $m(t) = t/\mu$ and $dm(t) = dt/\mu$. Thus, the solution to the standard renewal equation is
$$a(t) = b(t) + \frac{1}{\mu}\int_0^t b(t-s)\,ds = b(t) + \frac{1}{\mu}\int_0^t b(s)\,ds$$
For the integral to have a finite limit, we need $b(t) \to 0$. As such, we might expect that
$$a(t) \to \frac{1}{\mu}\int_0^\infty b(s)\,ds \quad \text{as } t\to\infty$$
It turns out this last conclusion holds generally, not just for Poisson processes.

 Some Theorems
o Proposition (SLLN for renewal processes):
$$\frac{N(t)}{t} \to \frac{1}{\mathbb{E}(X_1)} \quad \text{a.s. as } t\to\infty$$

Proof: Consider that $S_{N(t)} \le t \le S_{N(t)+1}$. As such,
$$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} \le \frac{S_{N(t)+1}}{N(t)+1}\cdot\frac{N(t)+1}{N(t)}$$
However, by the SLLN, $S_n/n \to \mathbb{E}(X_1)$, and $\frac{N(t)+1}{N(t)} \to 1$ a.s. Feeding this into the above proves our theorem. 
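The SLLN for renewal processes is easy to observe numerically. A sketch (the exponential increments with mean 2 are an illustrative assumption; any IID positive increments would do):

```python
import random

random.seed(5)
mean_x, t_max = 2.0, 50_000.0
s, n = 0.0, 0
while True:
    s += random.expovariate(1.0 / mean_x)   # IID renewal increments, mean 2
    if s > t_max:
        break
    n += 1

rate = n / t_max
print(rate, 1 / mean_x)                     # N(t)/t vs 1/E(X_1)
```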

o Proposition (Elementary/Baby Renewal Theorem):
$$\frac{m(t)}{t} = \frac{\mathbb{E}[N(t)]}{t} \to \frac{1}{\mathbb{E}(X_1)} \quad \text{as } t\to\infty$$
Proof: See homework 4.

o Proposition (CLT for renewal processes): Suppose that $\mathrm{Var}[X_1] = \sigma^2 < \infty$. Then
$$\sqrt{t}\left(\frac{N(t)}{t} - \frac{1}{\mathbb{E}(X_1)}\right) \Rightarrow \frac{\sigma}{[\mathbb{E}(X_1)]^{3/2}}\,N(0,1)$$
Or, in other words,
$$N(t) \approx \frac{t}{\mathbb{E}(X_1)} + \sqrt{t}\,\frac{\sigma}{[\mathbb{E}(X_1)]^{3/2}}\,N(0,1)$$
Proof: Throughout this proof, we will write $\mathbb{E}(X_1) = \mu$.
Step 1: We first show that
$$\sqrt{t}\left(\frac{N(t)}{S_{N(t)}} - \frac{1}{\mu}\right) \Rightarrow \frac{\sigma}{\mu^{3/2}}\,N(0,1)$$
We do this as follows:
$$\sqrt{t}\left(\frac{N(t)}{S_{N(t)}} - \frac{1}{\mu}\right) = -\sqrt{\frac{t}{N(t)}}\cdot\frac{1}{\mu\,S_{N(t)}/N(t)}\cdot\frac{1}{\sqrt{N(t)}}\big(S_{N(t)} - \mu N(t)\big)$$
The first factor tends to $\sqrt{\mu}$ a.s. (by the SLLN for renewal processes), and the second to $1/\mu^2$ a.s. Now, consider the last factor: since $N(t)\to\infty$, it is a random-index version of $\frac{1}{\sqrt{n}}(S_n - \mu n)$, which tends to $\sigma N(0,1)$ in distribution by the standard CLT. The question is whether the weak convergence also holds along the random sequence $N(t)$. To answer this question, we will require a side lemma.

Lemma (Anscombe's Lemma / random time-change lemma): If $\{Z_n\}$ are IID with mean 0 and $\mathbb{E}(Z_1^2) = \sigma^2 < \infty$, and $\{N(t): t\ge 0\}$ is an integer-valued process such that $N(t)/t \to \beta \in (0,\infty)$ in probability, then
$$\frac{1}{\sqrt{N(t)}}\sum_{i=1}^{N(t)} Z_i \Rightarrow \sigma N(0,1)$$
In other words, we require our random index to grow linearly to infinity for convergence in distribution to hold. Intuitively, this is because we need variances to accumulate fast enough for the CLT to be valid.

In this case, note that $N(t)/t \to 1/\mathbb{E}(X_1)$ a.s., so Anscombe's lemma applies with $\beta = 1/\mathbb{E}(X_1)$. We can then use the converging-together lemma on the expression quoted above to obtain the desired result for this step (the sign does not matter, since the normal distribution is symmetric).

Step 2: We now use step 1 to prove our result. Consider that
$$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} \le \frac{S_{N(t)+1}}{N(t)+1}\cdot\frac{N(t)+1}{N(t)}$$
Both sides of the "sandwich" are in the form used in step 1. Therefore, this immediately leads to the desired result. 

o Definition (lattice/arithmetic distribution): A distribution $F$ is said to be of lattice type if there exists an $h > 0$ such that $F$ is supported on $\{nh : n \in \mathbb{N}\}$. For example, the Poisson distribution is lattice with $h = 1$.
o Theorem (Blackwell's Theorem): If $F$ is non-lattice, then for all $a > 0$,
$$m(t+a) - m(t) \to \frac{a}{\mathbb{E}(X_1)} \quad \text{as } t\to\infty$$
Remark: This result is not implied by the fact that $\frac{m(t)}{t} \to \frac{1}{\mathbb{E}(X_1)}$, because this theorem concerns increments of the renewal function $m$.

o Definition: A function $f: \mathbb{R}_+ \to \mathbb{R}_+$ is directly Riemann integrable (dRi) if, defining for $h > 0$
$$\bar{f}_h = h\sum_{k=0}^\infty \sup\{f(x): x\in[kh,(k+1)h)\} \qquad \underline{f}_h = h\sum_{k=0}^\infty \inf\{f(x): x\in[kh,(k+1)h)\}$$
we have $\bar{f}_h < \infty$ for all $h > 0$, and $\bar{f}_h - \underline{f}_h \to 0$ as $h \downarrow 0$. Direct Riemann integrability can be thought of as an extension of Riemann integrability to the infinite half-line, as opposed to a bounded interval.
Remark: If $f: \mathbb{R}_+ \to \mathbb{R}$, we simply require both $f^+$ and $f^-$ to be dRi for $f$ to be dRi.
Remark: Any of the following conditions is sufficient for $f: \mathbb{R}_+ \to \mathbb{R}_+$ to be dRi:
. $f$ is continuous with compact support.
. $f$ is bounded and continuous, and $\bar{f}_h < \infty$ for some $h > 0$.

. $f$ is non-increasing and $\int_0^\infty f(x)\,dx < \infty$.
o Theorem (Key Renewal Theorem): Provided the usual assumptions on $F$ hold (IID increments with finite mean that are not point masses at 0) and the renewal process is non-lattice, then for any $b: \mathbb{R}_+ \to \mathbb{R}$ that is dRi,
$$a(t) \to \frac{1}{\mu}\int_0^\infty b(s)\,ds \quad \text{as } t\to\infty$$
where $a$ is the solution to the renewal equation $a = b + a * F$.

LECTURE 11 – 6TH APRIL 2011

 Regenerative processes
o A regenerative process is a stochastic process with time points at which, from a probabilistic point of view, the process restarts itself. A good example is a CTMC (for example, corresponding to an M/M/1 queue); we can consider that the process "resets" each time the queue empties.
o Definition: Let $(X(t): t\ge 0)$ be a stochastic process with sample paths that are right-continuous with left limits (RCLL or CADLAG). Without loss of generality, we let renewals occur at visits to $X(t) = 0$. Now, define the following quantities:
. $\tau(n+1) = \inf\{t \ge \tau(n): X(t) \ne 0,\, X(t^-) = 0\}$, the time of the $(n+1)$th renewal.
. $\tau_{n+1} = \tau(n+1) - \tau(n)$, the inter-renewal time.
. $X_n(t) = \begin{cases} X(\tau(n-1)+t) & t\in[0,\tau_n) \\ \Delta & t \ge \tau_n \end{cases}$, where $\Delta$ is outside the state space of $X$ (the $n$th cycle).
Then $X$ is said to be regenerative if:
. $\tau_n < \infty$ a.s. for all $n$;
. $\tau(n) \to \infty$ as $n\to\infty$;
. $X_0, X_1, \ldots$ are independent;
. $X_1, X_2, \ldots$ are identically distributed (the first cycle might have a different distribution if the system doesn't start "empty").
o Example (CTMC): Let $(X(t): t\ge 0)$ be a CTMC with state space $\mathcal{S} = \mathbb{N}$. We can then define a regenerative structure using any state $x\in\mathcal{S}$ as our "renewal state". Our only requirement is that $x$ be at least null recurrent; we must return to the state infinitely often. 
o Example (DTMC): Let $(X_n : n \ge 0)$ be a DTMC on $\mathcal{S} = \mathbb{N}$. We can then define a regenerative structure using any state $x\in\mathcal{S}$, provided the chain is irreducible and recurrent. 

o Definition (recurrence): A regenerative process is positive recurrent if $\mathbb{E}(\tau_1) < \infty$ and null recurrent otherwise.
o Proposition (SLLN for regenerative processes): Let $(X(t): t\ge 0)$ be a regenerative process over a state space $\mathcal{S}$ with $\mathbb{E}(\tau_1) < \infty$, and let $f: \mathcal{S}\to\mathbb{R}$ be such that
$$\mathbb{E}\left(\int_{\tau(0)}^{\tau(1)} |f(X(s))|\,ds\right) < \infty$$


Then
$$\frac{1}{t}\int_0^t f(X(s))\,ds \to \frac{\mathbb{E}\left(\int_{\tau(0)}^{\tau(1)} f(X(s))\,ds\right)}{\mathbb{E}(\tau_1)} \quad \text{a.s. as } t\to\infty$$
Proof: Let
$$Y_k(f) = \int_{\tau(k-1)}^{\tau(k)} f(X(s))\,ds \qquad N(t) = \sup\{n \ge 0: \tau(n) \le t\}$$
This means that we can write
$$c(t) = \int_0^t f(X(s))\,ds = Y_0(f) + \sum_{k=1}^{N(t)} Y_k(f) + \int_{\tau(N(t))}^t f(X(s))\,ds$$
As such,
$$\frac{c(t)}{t} = \frac{1}{t}Y_0(f) + \frac{1}{t}\sum_{k=1}^{N(t)} Y_k(f) + \frac{1}{t}\int_{\tau(N(t))}^t f(X(s))\,ds$$
By the assumption in the theorem, $Y_0(f)$ is finite, and so the first term vanishes in the limit. Now,
$$\frac{1}{t}\sum_{k=1}^{N(t)} Y_k(f) = \frac{N(t)}{t}\cdot\frac{1}{N(t)}\sum_{k=1}^{N(t)} Y_k(f)$$
Note, however, that the $Y_k(f)$, $k\ge 1$, are IID. Furthermore,
$$\mathbb{E}|Y_k(f)| = \mathbb{E}\left|\int_{\tau(k-1)}^{\tau(k)} f(X(s))\,ds\right| \le \mathbb{E}\left(\int_{\tau(0)}^{\tau(1)} |f(X(s))|\,ds\right) < \infty$$
by assumption, and so we can use the SLLN (since $N(t)\to\infty$) together with the SLLN for renewal processes to conclude that the above tends to $\mathbb{E}(Y_1(f))/\mathbb{E}(\tau_1)$ a.s., precisely as required. All we now need to do is to show that the last term vanishes:
$$\left|\frac{1}{t}\int_{\tau(N(t))}^t f(X(s))\,ds\right| \le \frac{1}{t}\int_{\tau(N(t))}^{\tau(N(t)+1)} |f(X(s))|\,ds \le \frac{N(t)}{t}\cdot\frac{1}{N(t)}\max_{1\le k\le N(t)+1}\{Y_k(|f|)\}$$
We now use the following lemma.
Lemma: If $\{a_n\}$ is a real-valued sequence such that $\frac{1}{n}\sum_{i=1}^n a_i$ converges, then $\frac{1}{n}\max_{1\le i\le n}\{a_i\} \to 0$ as $n\to\infty$.
Notice that $\frac{1}{N(t)}\sum_{k=1}^{N(t)} Y_k(|f|) \to \mathbb{E}(Y_1(|f|))$, since $N(t)\to\infty$. As such, $\frac{1}{N(t)}\max_{1\le k\le N(t)+1}\{Y_k(|f|)\} \to 0$ a.s., and since $N(t)/t \to 1/\mathbb{E}(\tau_1)$, the last term vanishes. As expected. 
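The ratio $\mathbb{E}(Y_1(f))/\mathbb{E}(\tau_1)$ suggests estimating steady-state quantities by averaging over regeneration cycles. A sketch for an M/M/1 queue (not from the notes; the parameters are illustrative, and the process is taken to regenerate each time an arrival hits an empty system), estimating $\mathbb{P}_\pi(X > 0) = \rho$:

```python
import random

random.seed(9)
lam, mu, n_cycles = 1.0, 2.0, 20_000   # illustrative M/M/1 parameters

cycle_time, busy_time = 0.0, 0.0
for _ in range(n_cycles):
    # One cycle: starts when an arrival hits an empty system and ends
    # when the next arrival again finds the system empty.
    t, x = 0.0, 1
    while x > 0:                        # busy period: system non-empty
        a = random.expovariate(lam)     # time to next arrival
        d = random.expovariate(mu)      # time to next departure
        step = min(a, d)
        busy_time += step
        t += step
        x += 1 if a < d else -1
    t += random.expovariate(lam)        # idle period until the next arrival
    cycle_time += t

est = busy_time / cycle_time            # E[Y_1(f)] / E[tau_1], f = 1{x > 0}
print(est)                              # estimates P_pi(X > 0) = rho = 0.5
```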

t ()Y (f ) We have just proved that 1 fXs() d s 1 . This suggests that o t () a.s. () ò0  t1 ()Yf()  éùfXs() = 1 . p ()()t ëûêú1 o Proposition (CLT for regenerative processes): Let ()Xt(:) t³ 0 be a

2 regenerative process on a state space  with ()t1 <¥. Let f :    be a

2 function satisfying ()Yf1(| |) <¥. Then

æöt ç fXs()() d s  Yt()÷ çò0 ()1 ÷ t ç - ÷  sN ()0, 1 ç t ()t ÷ ç 1 ÷ èç ø÷

2 2 (())Yf1 With s = ()Yf1 ()/(c t1 ) where ffc ()⋅= () ⋅- . ()t1 o Theorem: Let ()Xt(:) t³ 0 be a positive recurrent regenerative process on a state space S Í d . Suppose that either of the following conditions hold

d . F()xx==()t1 has a density and there is a function h :  that is bounded

. F()xx==()t1 is non-lattice and there is a function h that is bounded and continuous Then

é t(1) ù  êúò hXs(()) d s éùëûêút(0)  êúhXt(())¥ as t ëû  éùt ëê 1 ûú Then Xt( )as¥Xt() ¥


and
$$\mathbb{P}(X(t)\in A) \to \frac{\mathbb{E}\left[\int_{\tau(0)}^{\tau(1)} \mathbb{1}\{X(s)\in A\}\,ds\right]}{\mathbb{E}(\tau_1)}$$

Remark: If $X(0) \stackrel{d}{=} X(\infty)$, then $(X(t): t\ge 0)$ is stationary.
Proof: Fix $t > 0$, and suppose $\tau(0) = 0$ (without loss of generality).
$$a(t) = \mathbb{E}[h(X(t))] = \mathbb{E}[h(X(t));\, \tau_1 > t] + \mathbb{E}[h(X(t));\, \tau_1 \le t]$$
$$= b(t) + \int_0^t \mathbb{E}[h(X(t))\mid \tau_1 = s]\,dF(s) = b(t) + \int_0^t \mathbb{E}[h(X(\tau(1)+t-s))]\,dF(s) = b(t) + \int_0^t \mathbb{E}[h(X(\tau(0)+t-s))]\,dF(s)$$
(where, in the last step, we have used the fact that every cycle has the same distribution). As such,
$$a(t) = b(t) + \int_0^t a(t-s)\,dF(s) \qquad\Longleftrightarrow\qquad a = b + a * F$$
Now, consider that $b(t) = \mathbb{E}[h(X(t));\, \tau_1 > t]$, so
$$|b(t)| \le \|h\|_\infty\,\mathbb{P}(\tau_1 > t) =: g(t)$$
where $g(t)$ is bounded, continuous from the right, non-increasing, and
$$\int_0^\infty g(t)\,dt = \|h\|_\infty\,\mathbb{E}(\tau_1) < \infty$$
As such, by the sufficient conditions for direct Riemann integrability, $g$ is dRi. Now, if $|b| \le g$ and $g$ is dRi, then $b$ is dRi. And since $F$ is non-lattice, all the conditions of the key renewal theorem hold, and therefore
$$a(t) \to \frac{1}{\mathbb{E}(\tau_1)}\int_0^\infty b(s)\,ds = \frac{1}{\mathbb{E}(\tau_1)}\int_0^\infty \mathbb{E}[h(X(s));\, \tau_1 > s]\,ds = \frac{1}{\mathbb{E}(\tau_1)}\,\mathbb{E}\left[\int_0^{\tau_1} h(X(s))\,ds\right]$$


where the last step follows by Fubini, since $h$ is bounded. Now, since $h$ was an arbitrary bounded (continuous) function, the above implies that $X(t) \Rightarrow X(\infty)$; finally, taking $h(\cdot) = \mathbb{1}\{\cdot \in A\}$, we recover the last required statement. 

o Example: Let $(X(t): t\ge 0)$ be a positive recurrent regenerative process. Fix a set $A \subseteq S$, and let $T(A) = \inf\{t \ge 0: X(t)\in A\}$. Again, for simplicity, assume $\tau(0) = 0$. We are now interested in the expression $\mathbb{P}(T(A) > t)$, especially for $t$ large. For $t > 0$, note that
$$a(t) := \mathbb{P}(T(A) > t) = \mathbb{P}(T(A) > t,\, \tau_1 > t) + \mathbb{P}(T(A) > t,\, \tau_1 \le t)$$
$$= b(t) + \int_0^t \mathbb{P}(T(A) > t \mid T(A) > \tau_1,\, \tau_1 = s)\,\mathbb{P}(T(A) > \tau_1 \mid \tau_1 = s)\,dF(s)$$
By regeneration, $\mathbb{P}(T(A) > t \mid T(A) > \tau_1,\, \tau_1 = s) = a(t-s)$, so
$$a(t) = b(t) + \beta\int_0^t a(t-s)\,d\tilde{F}(s) \qquad \text{where } \beta = \mathbb{P}(T(A) > \tau_1), \quad \tilde{F}(s) = \mathbb{P}(\tau_1 \le s \mid T(A) > \tau_1)$$
i.e. $a = b + \beta(a * \tilde{F})$: a "defective" renewal equation, since $\beta < 1$. For fixed $\lambda\in\mathbb{R}$, multiply through by $e^{\lambda t}$:
$$e^{\lambda t}a(t) = e^{\lambda t}b(t) + \beta\int_0^t e^{\lambda(t-s)}a(t-s)\,e^{\lambda s}\,d\tilde{F}(s)$$
Writing $a_\lambda(t) = e^{\lambda t}a(t)$, $b_\lambda(t) = e^{\lambda t}b(t)$ and $dF_\lambda(s) = \beta e^{\lambda s}\,d\tilde{F}(s)$, this reads
$$a_\lambda(t) = b_\lambda(t) + \int_0^t a_\lambda(t-s)\,dF_\lambda(s)$$
Suppose there exists a $\lambda$ such that $F_\lambda$ is a bona fide distribution, i.e.
$$F_\lambda(\infty) = \beta\int_0^\infty e^{\lambda s}\,d\tilde{F}(s) = 1$$
Assume there is such a solution $\lambda^*$ (existence can be motivated by a heuristic argument). By appealing to the key renewal theorem,
$$e^{\lambda^* t}\,\mathbb{P}(T(A) > t) = a_{\lambda^*}(t) \to \frac{\int_0^\infty b_{\lambda^*}(s)\,ds}{\int_0^\infty x\,dF_{\lambda^*}(x)} =: \eta$$
And so $\mathbb{P}(T(A) > t) \sim \eta\,e^{-\lambda^* t}$: the hitting-time probability has an exponential-type tail. 


Index

Bold numbers indicate definitions. Italicized numbers indicate examples.

Anscombe's Lemma ...... 92
Azuma's Inequality ...... See Martingale
Baby Renewal Theorem ...... 91
Bounds on normal tails ...... 33
CLT for Renewal Processes ...... 91
Coupling time ...... 71, 72, 73
Elementary Renewal Theorem ...... 91
Filtration ...... 50
G/G/1 Queue ...... 68
  Instability analysis ...... 74
  Stability analysis ...... 70, 72
GI/GI/1 Queue ...... 78
Holder's Inequality ...... 20
Laplace transform ...... 89
Lattice distribution ...... 93
Lindley's Equations ...... 69
Little's Law ...... 74
Load ...... 68
M/G/1 Queue ...... 83
  Pollaczek–Khinchine formula ...... 85
M/M/1 Queue ...... 79
Martingale ...... 54, 55
  as a sum of differences ...... 54
  Azuma's Inequality ...... 61, 62
  Central Limit Theorem ...... 60
  convergence theorem ...... 58
  SLLN ...... 59
Martingale Convergence Theorem ...... 63
Normal tails, bounds on ...... See Bounds on normal tails
Optional stopping theorem ...... 56
  failure of ...... 50
  Special cases ...... 57
PASTA ...... 85
Pollaczek–Khinchine formula ...... 85
Queues
  G/G/1 queue ...... 68
  GI/GI/1 queue ...... 78
  Little's Law ...... 74
  M/G/1 queue ...... 83
  M/M/1 queue ...... 79
Random walk ...... 49
  as a martingale ...... 54
  Asymmetric ...... 50
  Symmetric ...... 52, 53
Renewal equation ...... 89
Renewal function ...... 88
Sample path methods ...... 67
SLLN for renewal processes ...... 90
Stochastic stability ...... 64
  Deterministic motivation ...... 64
Stopping time ...... 50
Wald's First Identity ...... 51, 52, 56
Wald's Second Identity ...... 52, 53, 56
