Probability and Random Processes
Serik Sagitov, Chalmers University of Technology and Gothenburg University

Abstract

Lecture notes based on the book Probability and Random Processes by Geoffrey Grimmett and David Stirzaker. Last updated June 2, 2014.

Contents

Abstract
1 Random events and random variables
  1.1 Probability space
  1.2 Conditional probability and independence
  1.3 Random variables
  1.4 Random vectors
  1.5 Filtration
2 Expectation and conditional expectation
  2.1 Expectation
  2.2 Conditional expectation and prediction
  2.3 Multinomial distribution
  2.4 Multivariate normal distribution
  2.5 Sampling from a distribution
  2.6 Probability generating, moment generating, and characteristic functions
3 Convergence of random variables
  3.1 Borel-Cantelli lemmas
  3.2 Inequalities
  3.3 Modes of convergence
  3.4 Continuity of expectation
  3.5 Weak law of large numbers
  3.6 Central limit theorem
  3.7 Strong LLN
4 Markov chains
  4.1 Simple random walks
  4.2 Markov chains
  4.3 Stationary distributions
  4.4 Reversibility
  4.5 Branching process
  4.6 Poisson process and continuous-time Markov chains
  4.7 The generator of a continuous-time Markov chain
5 Stationary processes
  5.1 Weakly and strongly stationary processes
  5.2 Linear prediction
  5.3 Linear combination of sinusoids
  5.4 The spectral representation
  5.5 Stochastic integral
  5.6 The ergodic theorem for the weakly stationary processes
  5.7 The ergodic theorem for the strongly stationary processes
  5.8 Gaussian processes
6 Renewal theory and Queues
  6.1 Renewal function and excess life
  6.2 LLN and CLT for the renewal process
  6.3 Stopping times and Wald's equation
  6.4 Stationary excess lifetime distribution
  6.5 Renewal-reward processes
  6.6 Regeneration technique for queues
  6.7 M/M/1 queues
  6.8 M/G/1 queues
  6.9 G/M/1 queues
  6.10 G/G/1 queues
7 Martingales
  7.1 Definitions and examples
  7.2 Convergence in $L^2$
  7.3 Doob's decomposition
  7.4 Hoeffding's inequality
  7.5 Convergence in $L^1$
  7.6 Bounded stopping times. Optional sampling theorem
  7.7 Unbounded stopping times
  7.8 Maximal inequality
  7.9 Backward martingales. Strong LLN
8 Diffusion processes
  8.1 The Wiener process
  8.2 Properties of the Wiener process
  8.3 Examples of diffusion processes
  8.4 The Ito formula
  8.5 The Black-Scholes formula

1 Random events and random variables

1.1 Probability space

A random experiment is modeled in terms of a probability space $(\Omega, \mathcal{F}, P)$:

• the sample space $\Omega$ is the set of all possible outcomes of the experiment,

• the $\sigma$-field (or sigma-algebra) $\mathcal{F}$ is a collection of measurable subsets $A \subset \Omega$ (which are called random events) satisfying
  1. $\emptyset \in \mathcal{F}$,
  2. if $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (countable unions),
  3. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$ (complementary event),

• the probability measure $P$ is a function on $\mathcal{F}$ satisfying three probability axioms:
  1. if $A \in \mathcal{F}$, then $P(A) \geq 0$,
  2. $P(\Omega) = 1$,
  3. if $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$ are all disjoint, then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$.
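For a concrete illustration of the axioms, here is a minimal Python sketch (not part of the notes; the two-coin space and the helper prob are our own choices). It builds the probability space of two fair coin tosses, where every subset of $\Omega$ is an event and $P$ is determined by the weights of the individual outcomes, and numerically confirms normalization, additivity for disjoint events, and the complement rule.

from itertools import product

# Sample space of two coin tosses and a uniform probability measure on it.
omega = list(product("HT", repeat=2))      # outcomes: HH, HT, TH, TT
weights = {w: 0.25 for w in omega}         # fair, independent coins

def prob(event):
    """P(A) as the sum of the weights of the outcomes belonging to A."""
    return sum(weights[w] for w in event)

A = {w for w in omega if w[0] == "H"}      # heads on the first coin
B = {w for w in omega if w == ("T", "T")}  # two tails; disjoint from A
A_c = set(omega) - A                       # complementary event

assert abs(prob(set(omega)) - 1.0) < 1e-12              # P(Omega) = 1
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12   # additivity for disjoint events
assert abs(prob(A_c) - (1.0 - prob(A))) < 1e-12         # P(A^c) = 1 - P(A)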
De Morgan's laws
$$\Big(\bigcap_i A_i\Big)^c = \bigcup_i A_i^c, \qquad \Big(\bigcup_i A_i\Big)^c = \bigcap_i A_i^c.$$

Properties derived from the axioms
$$P(\emptyset) = 0, \qquad P(A^c) = 1 - P(A), \qquad P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Inclusion-exclusion rule
$$P(A_1 \cup \ldots \cup A_n) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{n+1} P(A_1 \cap \ldots \cap A_n).$$

Continuity of the probability measure
• if $A_1 \subset A_2 \subset \ldots$ and $A = \bigcup_{i=1}^{\infty} A_i = \lim_{i\to\infty} A_i$, then $P(A) = \lim_{i\to\infty} P(A_i)$,
• if $B_1 \supset B_2 \supset \ldots$ and $B = \bigcap_{i=1}^{\infty} B_i = \lim_{i\to\infty} B_i$, then $P(B) = \lim_{i\to\infty} P(B_i)$.

1.2 Conditional probability and independence

If $P(B) > 0$, then the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

The law of total probability and the Bayes formula. Let $B_1, \ldots, B_n$ be a partition of $\Omega$, then
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i), \qquad P(B_j \mid A) = \frac{P(A \mid B_j) P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i) P(B_i)}.$$

Definition 1.1 Events $A_1, \ldots, A_n$ are independent, if for any subset of events $(A_{i_1}, \ldots, A_{i_k})$
$$P(A_{i_1} \cap \ldots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}).$$

Example 1.2 Pairwise independence does not imply independence of three events. Toss two coins and consider three events
• A = {heads on the first coin},
• B = {tails on the first coin},
• C = {one head and one tail}.
Clearly, $P(A \mid C) = P(A)$ and $P(B \mid C) = P(B)$, but $P(A \cap B \mid C) = 0$.

1.3 Random variables

A real random variable is a measurable function $X : \Omega \to \mathbb{R}$, so that different outcomes $\omega \in \Omega$ can give different values $X(\omega)$. Measurability of $X(\omega)$:
$$\{\omega : X(\omega) \leq x\} \in \mathcal{F} \quad \text{for any real number } x.$$
The probability distribution $P_X(B) = P(X \in B)$ defines a new probability space $(\mathbb{R}, \mathcal{B}, P_X)$, where $\mathcal{B} = \sigma(\text{all open intervals})$ is the Borel sigma-algebra.

Definition 1.3 Distribution function (cumulative distribution function)
$$F(x) = F_X(x) = P_X\{(-\infty, x]\} = P(X \leq x).$$
In terms of the distribution function we get
$$P(a < X \leq b) = F(b) - F(a), \qquad P(X < x) = F(x-), \qquad P(X = x) = F(x) - F(x-).$$
Any monotone right-continuous function with
$$\lim_{x\to-\infty} F(x) = 0 \quad \text{and} \quad \lim_{x\to\infty} F(x) = 1$$
can be a distribution function.

Definition 1.4 The random variable $X$ is called discrete, if for some countable set of possible values
$$P(X \in \{x_1, x_2, \ldots\}) = 1.$$
Its distribution is described by the probability mass function $f(x) = P(X = x)$. The random variable $X$ is called (absolutely) continuous, if its distribution has a probability density function $f(x)$:
$$F(x) = \int_{-\infty}^{x} f(y)\,dy \quad \text{for all } x,$$
so that $f(x) = F'(x)$ almost everywhere.

Example 1.5 The indicator of a random event $1_A = 1_{\{\omega \in A\}}$ with $p = P(A)$ has a Bernoulli distribution
$$P(1_A = 1) = p, \qquad P(1_A = 0) = 1 - p.$$
For several events, $S_n = \sum_{i=1}^{n} 1_{A_i}$ counts the number of events that occurred. If independent events $A_1, A_2, \ldots$ have the same probability $p = P(A_i)$, then $S_n$ has a binomial distribution Bin$(n, p)$:
$$P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$

Example 1.6 (Cantor distribution) Consider $(\Omega, \mathcal{F}, P)$ with $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$, and
$$P([0,1]) = 1,$$
$$P([0, 1/3]) = P([2/3, 1]) = 2^{-1},$$
$$P([0, 1/9]) = P([2/9, 1/3]) = P([2/3, 7/9]) = P([8/9, 1]) = 2^{-2},$$
and so on. Put $X(\omega) = \omega$; its distribution, called the Cantor distribution, is neither discrete nor continuous. Its distribution function, called the Cantor function, is continuous but not absolutely continuous.
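Example 1.6 can be simulated: a Cantor-distributed random variable has the same law as $\sum_{i \geq 1} 2 B_i 3^{-i}$ with independent Bernoulli(1/2) digits $B_i$ (a standard fact; the notes do not give code, and the function name cantor_sample is our own). The Python sketch below truncates the series and checks the probabilities $P(X \leq 1/3) = 2^{-1}$ and $P(X \leq 1/9) = 2^{-2}$ stated in the example by Monte Carlo.

import random

def cantor_sample(depth=40, rng=random):
    """One draw from the Cantor distribution: X = sum of 2*B_i / 3**i with
    B_i independent Bernoulli(1/2) ternary digits, truncated at `depth`."""
    return sum(2 * rng.randint(0, 1) / 3**i for i in range(1, depth + 1))

random.seed(0)
n = 100_000
xs = [cantor_sample() for _ in range(n)]

# Monte Carlo estimates of the probabilities listed in Example 1.6:
print(sum(x <= 1/3 for x in xs) / n)   # close to 2**-1 = 0.5
print(sum(x <= 1/9 for x in xs) / n)   # close to 2**-2 = 0.25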
1.4 Random vectors

Definition 1.7 The joint distribution of a random vector $X = (X_1, \ldots, X_n)$ is the function
$$F_X(x_1, \ldots, x_n) = P(\{X_1 \leq x_1\} \cap \ldots \cap \{X_n \leq x_n\}).$$

Marginal distributions
$$F_{X_1}(x) = F_X(x, \infty, \ldots, \infty), \quad F_{X_2}(x) = F_X(\infty, x, \infty, \ldots, \infty), \quad \ldots, \quad F_{X_n}(x) = F_X(\infty, \ldots, \infty, x).$$

The existence of the joint probability density function $f(x_1, \ldots, x_n)$ means that the distribution function
$$F_X(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \ldots \int_{-\infty}^{x_n} f(y_1, \ldots, y_n)\,dy_1 \ldots dy_n \quad \text{for all } (x_1, \ldots, x_n),$$
is absolutely continuous, so that $f(x_1, \ldots, x_n) = \frac{\partial^n F(x_1, \ldots, x_n)}{\partial x_1 \ldots \partial x_n}$ almost everywhere.

Definition 1.8 Random variables $(X_1, \ldots, X_n)$ are called independent if for any $(x_1, \ldots, x_n)$
$$P(X_1 \leq x_1, \ldots, X_n \leq x_n) = P(X_1 \leq x_1) \cdots P(X_n \leq x_n).$$
In the jointly continuous case this is equivalent to
$$f(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).$$

Example 1.9 In general, the joint distribution cannot be recovered from the marginal distributions. If
$$F_{X,Y}(x, y) = xy\,1_{\{(x,y) \in [0,1]^2\}},$$
then the vectors $(X, Y)$ and $(X, X)$ have the same marginal distributions.

Example 1.10 Consider
$$F(x, y) = \begin{cases} 1 - e^{-x} - xe^{-y} & \text{if } 0 \leq x \leq y, \\ 1 - e^{-y} - ye^{-y} & \text{if } 0 \leq y \leq x, \\ 0 & \text{otherwise.} \end{cases}$$
Show that $F(x, y)$ is the joint distribution function of some pair $(X, Y)$. Find the marginal distribution functions and densities.

Solution. Three properties should be satisfied for $F(x, y)$ to be the joint distribution function of some pair $(X, Y)$:
1. $F(x, y)$ is non-decreasing in both variables,
2. $F(x, y) \to 0$ as $x \to -\infty$ and $y \to -\infty$,
3. $F(x, y) \to 1$ as $x \to \infty$ and $y \to \infty$.
Observe that
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \partial y} = e^{-y} 1_{\{0 \leq x \leq y\}}$$
is always non-negative. Thus the first property follows from the integral representation
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,du\,dv,$$
which, for $0 \leq x \leq y$, is verified as
$$\int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,du\,dv = \int_{0}^{x} \Big( \int_{u}^{y} e^{-v}\,dv \Big) du = 1 - e^{-x} - xe^{-y},$$
and for $0 \leq y \leq x$ as
$$\int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,du\,dv = \int_{0}^{y} \Big( \int_{u}^{y} e^{-v}\,dv \Big) du = 1 - e^{-y} - ye^{-y}.$$
The second and third properties are straightforward.
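The marginals asked for in Example 1.10 follow from the marginal-distribution formulas above: letting $y \to \infty$ gives $F_X(x) = 1 - e^{-x}$ and $f_X(x) = e^{-x}$ for $x \geq 0$, while letting $x \to \infty$ gives $F_Y(y) = 1 - e^{-y} - ye^{-y}$ and $f_Y(y) = ye^{-y}$ for $y \geq 0$. As a sanity check, the following Python sketch (our own, assuming NumPy and SciPy are available; the helper names F_X and F_Y are ours) integrates the joint density over its support and compares with these closed forms.

import numpy as np
from scipy import integrate

# Joint density from Example 1.10: f(x, y) = e^{-y} on the region 0 <= x <= y.
f = lambda x, y: np.exp(-y)

def F_X(x):
    """Marginal cdf of X: integrate f over {0 <= u <= x, u <= v < infinity}, infinity truncated at 50."""
    val, _ = integrate.dblquad(lambda v, u: f(u, v), 0.0, x, lambda u: u, lambda u: 50.0)
    return val

def F_Y(y):
    """Marginal cdf of Y: integrate f over {0 <= v <= y, 0 <= u <= v}."""
    val, _ = integrate.dblquad(lambda u, v: f(u, v), 0.0, y, lambda v: 0.0, lambda v: v)
    return val

print(F_X(1.0), 1 - np.exp(-1.0))                       # both approx. 0.6321
print(F_Y(2.0), 1 - np.exp(-2.0) - 2.0 * np.exp(-2.0))  # both approx. 0.5940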