Summer Workshop on Distribution Theory & its Summability Perspective
[Cover figure: a histogram of the drink mix distribution. Vertical axis: Frequencies (0 to 50); horizontal axis: Amount of Drink Mix (in ounces), from 16 to 22, with the center of mass marked at 18.9.]
M. Kazım Khan, Kent State University (USA)
Place: Ankara University, Department of Mathematics
Dates: 16 May - 27 May 2011
Supported by: The Scientific and Technical Research Council of Turkey (TÜBİTAK)
Preface
This is a collection of lecture notes I gave at Ankara University, Department of Mathematics, during the last two weeks of May, 2011. I am grateful to Professor Cihan Orhan and the Scientific and Technical Research Council of Turkey (TÜBİTAK) for the invitation and support. The primary focus of the lectures was to introduce the basic components of distribution theory and to bring out the role that summability theory plays in it. I did not assume any prior knowledge of probability theory on the part of the participants. Therefore, the first few lectures were completely devoted to building the language of probability and distribution theory. These are then used freely in the rest of the lectures. To save some time, I did not prove most of these results. A few lectures then deal with Fourier inversion theory, specifically from the summability perspective. The next batch consists of convergence concepts, where I introduce the weak and the strong laws of large numbers. Again, long proofs were omitted. A notable exception deals with the results that involve the uniformly integrable sequence spaces. Since this is a new concept from the summability perspective, I have tried to sketch some of the proofs. I must acknowledge the legendary Turkish hospitality of all the people I came to meet. As always, it was a pleasure visiting Turkey and I hope to have the chance to visit again.
Mohammad Kazım Khan, Kent State University, Kent, Ohio, USA.
List of Participants
1- AYDIN, Didem, Ankara Üniversitesi
2- AYGAR, Yelda, Ankara Üniversitesi
3- AYKOL, Canay, Ankara Üniversitesi
4- BAŞCANBAZ TUNCA, Gülen, Ankara Üniversitesi
5- CAN, Çağla, Ankara Üniversitesi
6- CEBESOY, Şerifenur, Ankara Üniversitesi
7- COŞKUN, Cafer, Ankara Üniversitesi
8- ÇETİN, Nursel, Ankara Üniversitesi
9- DÖNE, Yeşim, Ankara Üniversitesi
10- ERDAL, İbrahim, Ankara Üniversitesi
11- GÜREL, Övgü, Ankara Üniversitesi
12- İPEK, Pembe, Ankara Üniversitesi
13- KATAR, Deniz, Ankara Üniversitesi
14- ORHAN, Cihan, Ankara Üniversitesi
15- SAKAOĞLU, İlknur, Ankara Üniversitesi
16- SOYLU, Elis, Ankara Üniversitesi
17- ŞAHİN, Nilay, Ankara Üniversitesi
18- TAŞ, Emre, Ankara Üniversitesi
19- ÜNVER, Mehmet, Ankara Üniversitesi
20- YARDIMCI, Şeyhmus, Ankara Üniversitesi
21- YILMAZ, Başar, Ankara Üniversitesi
22- YURDAKADİM, Tuğba, Ankara Üniversitesi
Contents

Preface
List of Participants
Contents
List of Figures

1 Modeling Distributions
   1.1 Distributions
   1.2 Probability Space & Random Variables
2 Probability Spaces & Random Variables
3 Expectations
   3.1 Properties of Lebesgue integral
   3.2 Covariance
4 Various Inequalities
   4.1 Hölder & Minkowski's Inequalities
   4.2 Jensen's Inequality
5 Classification of Distributions
   5.1 Absolute Continuity & Singularity
6 Conditional Distributions
   6.1 Conditional Expectations
7 Conditional Expectations & Martingales
   7.1 Properties of E(X | Y)
   7.2 Martingales
8 Independence & Transformations
   8.1 Transformations of Random Variables
   8.2 Sequences of Independent Random Variables
   8.3 Generating Functions
9 Ranks, Order Statistics & Records
10 Fourier Transforms
   10.1 Examples
11 Summability Assisted Inversion
12 General Inversion
   12.1 Fourier & Dirichlet Series
13 Basic Limit Theorems
   13.1 Convergence in Distribution
   13.2 Convergence in Probability & WLLN
14 Almost Sure Convergence & SLLN
15 The $L_p$ Spaces & Uniform Integrability
   15.1 Uniform Integrability
16 Laws of Large Numbers
   16.1 Subsequences & Kolmogorov Inequality
17 WLLN, SLLN & Uniform SLLN
   17.1 Glivenko-Cantelli Theorem
18 Random Series
   18.1 Zero-One Laws & Random Series
   18.2 Refinements of SLLN
19 Kolmogorov's Three Series Theorem
20 The Law of Iterated Logarithms
List of Figures
1.1 A Histogram for the Drink Mix Distribution
1.2 Inverse Image of an Interval
8.1 Inverse of a Distribution Function
11.1 Triangular Density
12.1 Dirichlet Kernels for $n = 5$ and $n = 8$
12.2 Fejér Kernels for $T = 5$ and $T = 8$
12.3 Poisson Kernels for $r = 0.8$ and $r = 0.9$
14.1 Density of Random Harmonic Series
Lecture 1

Modeling Distributions

A phenomenon, when repeatedly observed, gives rise to a distribution. In other words, a distribution is our way of capturing the variability in the phenomenon. Such distributions arise in almost all fields of endeavor. In the social sciences they are used to keep tabs on social indicators; in finance they are used to study and quantify the financial health of corporations and to price various assets and derived securities such as options and bonds. Data distributions appear in statistics. In mathematics, distributions of zeros of orthogonal polynomials appear, and the distribution of primes is a fundamental entity. In the natural sciences, about one and a half centuries ago Maxwell conjured up a distribution to describe the speed of molecules in an ideal gas, which was later observed to be quite accurate. Genetic diversity and its quantification is still in its infancy in terms of discovering the underlying distributions that it hides.

In this chapter we will collect the tools that are quite effective in studying distributions. We will present the following basic notions:

• Some examples of distributions,
• A framework by which distributions can be modeled,
• Transforms of distributions, such as moment generating functions and characteristic functions,
• Conditional probabilities and conditional expectations.

These results will be used in the remainder of the book.

1.1 Distributions

Any characteristic, when repeatedly measured, yields a collection of measured/collected responses. The word "variable" is used for the characteristic that is being measured, since it may vary from measurement to measurement. The collection of all the measured responses is called the "distribution" of the variable. Sometimes the word "data" is also used to refer to the distribution of the variable. Distributions may be real or just imagined entities. Here we collect a few examples of distributions of the following sorts to show their vast diversity.

(i) Distributions arising while measuring mass produced products.
(ii) Distributions arising in categorical populations.
(iii) Distribution of Stirling numbers.
(iv) Distribution of zeros of orthogonal polynomials.
(v) Distributional convergence of summability theory.
(vi) Distributions of eigenvalues of Toeplitz matrices.
(vii) Maxwell's law of ideal gas.
(viii) Distribution of primes.
(ix) The Feynman-Kac formula and partial differential equations.

Of course, this is just a tiny sample of topics from an enormous field. One obvious omission is the field of Schwartz's distributions. This is purely because there are excellent books on the subject.¹ We will, however, briefly visit this branch while discussing summability assisted Fourier inversion theory.

¹For instance, see ...

Example - 1.1.1 - (Measurement distributions — accuracy of automatic filling machines) Kountry Times makes 20 ounce cans of lemonade drink mix. Due to unknown random fluctuations, the actual fill weight of each can is rarely equal to 20 oz. Here is a collection of fill weights of 200 randomly chosen cans.

18.3 19.4 18.8 19.6 19.8 17.7 18.2 20.1 17.2 18.8 19.0
18.6 18.0 18.9 19.1 17.2 17.3 19.4 18.6 20.5 20.8 19.9
18.7 16.7 19.2 18.8 18.3 18.3 18.3 17.9 18.2 17.5 17.6
19.7 20.5 19.5 18.6 19.9 19.3 18.5 19.9 18.7 20.3 19.2
18.9 18.6 19.4 18.7 18.5 19.2 17.3 18.0 17.7 19.2 19.1
18.8 18.3 21.0 18.0 18.9 19.9 21.4 18.8 19.0 18.9 18.7
18.9 19.2 17.6 20.0 19.5 19.4 18.3 19.9 18.4 18.3 18.6
19.4 17.7 18.8 17.8 19.2 18.6 20.2 19.0 18.3 18.3 19.0
18.4 19.4 19.4 17.9 19.2 18.5 17.7 19.3 19.0 16.7 18.3
19.7 18.8 19.4 20.3 18.3 18.6 19.4 18.4 18.6 19.1 18.0
18.8 18.3 18.7 19.1 17.8 17.5 17.0 19.4 19.2 19.8 18.6
17.7 17.9 19.1 18.2 19.5 19.6 20.4 20.7 19.8 18.9 19.2
17.8 21.0 17.5 17.9 18.5 21.1 19.8 18.3 20.2 17.4 18.8
18.5 19.7 19.0 18.3 19.3 18.8 18.1 17.8 19.1 20.1 19.9
21.0 17.9 18.3 17.1 18.7 18.5 19.1 17.6 20.4 19.2 19.2
20.2 17.4 18.4 18.9 18.4 18.8 18.3 19.8 18.7 19.1 20.4
18.7 18.9 18.0 20.7 20.8 19.9 20.6 19.2 18.4 18.5 18.5
18.4 19.9 17.9 19.4 19.2 20.4 19.7 17.5 19.0 17.9 18.4
19.7 19.1

In this example, the feature being measured is the fill weight (measured in ounces). We see an unexpectedly large amount of variability. The issue is: "Does the distribution say anything about whether the advertised average fill weight of 20 oz is being met or not?"
The average, or the mean, and the variance of this collected distribution, now denoted as $x_1, x_2, \dots, x_n$, are

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = 18.940,$$

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\sum_{i=1}^{n} x_i^2 - n(\bar{x})^2}{n-1} = 0.8574.$$

The mean gives a feeling that the automatic filling machine may be malfunctioning. Since the units of variance are squared, we work with its positive square root, $S$, called the standard deviation. For our fill weights distribution $S = 0.926$ oz. The standard deviation typically gives a scale by which we can gauge the "width" of the distribution. Typically, plus/minus three times the standard deviation around the mean contains most of the values of the distribution. Note that the smallest value of our data distribution is 16.7 oz, and the largest value is 21.4 oz. In this case all the values of the distribution lie within $3S$ of the mean.

Of course, knowing this distribution of 200 observations is only partially interesting. The real aim is to conclude something about the source of these 200 observations, called the population distribution, which is a mythical entity and represents how the automatic filling machine behaves. To get a feel for, and then model, the shape of the source distribution we resort to figures. We make some groups, also called bins, say $J_1 = (16.5, 17.0]$, $J_2 = (17.0, 17.5]$, etc., and count the number of observations that fall into these bins. Dividing the frequencies by the total number of observations gives the relative frequency distribution, which does not change the shape. A plot of this frequency distribution is called a histogram of the distribution.

Fill Weights | Frequency | Relative Frequency
(16.5, 17.0] |  2 | 0.010
(17.0, 17.5] |  8 | 0.040
(17.5, 18.0] | 23 | 0.115
(18.0, 18.5] | 33 | 0.165
(18.5, 19.0] | 44 | 0.220
     ⋮       |  ⋮ |   ⋮

[Figure 1.1: A Histogram for the Drink Mix Distribution. Frequencies plotted against the amount of drink mix in ounces; the center of mass of the histogram sits at 18.9.]

With measurement data of this sort one tends to observe such bell shaped histograms. The superimposed curve is called a normal curve and is proportional to

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( \frac{-1}{2\sigma^2}(x - \mu)^2 \right), \qquad -\infty < x < \infty.$$

Most measurement type distributions are mathematically modeled by a normal curve.³ Symbolically we denote this by $X \sim N(\mu, \sigma^2)$, where $X$ represents the fill weight of a randomly chosen can. The letter $N$ stands for the word "normal distribution", $\mu$ is the center or mean, and $\sigma$ is the standard deviation. In words, the modeled density describes where a randomly selected can's fill weight will fall. The histogram reflects the empirical evidence for our model. The quantity $P(a < X \le b)$ then models the proportion of cans whose fill weight falls in the bin $(a, b]$.

³Normal distributions are also called Gaussian distributions since the German mathematician/astronomer Carl Friedrich Gauss (1777-1855) showed their importance as models of measurement errors in celestial objects.
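To make these computations concrete, here is a minimal Python sketch, not part of the original notes, that computes the sample mean, sample variance, standard deviation, and the bin frequencies from a list of fill weights. Only the first row of the 200 observations is typed out here, so the printed numbers match the text only when all 200 values are supplied.

```python
import math

# First row of the 200 fill weights from Example 1.1.1 (illustration only);
# extend this list with the remaining rows to reproduce xbar = 18.940, S = 0.926.
weights = [18.3, 19.4, 18.8, 19.6, 19.8, 17.7, 18.2, 20.1, 17.2, 18.8, 19.0]

n = len(weights)
xbar = sum(weights) / n                                   # sample mean
s2 = sum((x - xbar) ** 2 for x in weights) / (n - 1)      # sample variance
s = math.sqrt(s2)                                         # standard deviation
print(f"mean = {xbar:.3f}, S^2 = {s2:.4f}, S = {s:.3f}")
print("all within 3S of the mean:",
      all(abs(x - xbar) <= 3 * s for x in weights))

# Frequency and relative frequency over the half-ounce bins (16.5, 17.0], ...
edges = [16.5 + 0.5 * i for i in range(11)]               # 16.5, 17.0, ..., 21.5
for lo, hi in zip(edges, edges[1:]):
    freq = sum(1 for x in weights if lo < x <= hi)        # bins are (lo, hi]
    print(f"({lo:.1f}, {hi:.1f}]  freq = {freq:2d}  rel = {freq / n:.3f}")
```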
Example - 1.1.2 - (Distributions arising in categorical populations) Consider a population of voters, each preferring one of two candidates, with preferences coded as $X = 0$ or $X = 1$:

Values of X  |          0 |          1
Frequencies  | 80,000,000 | 85,000,000

The resulting relative frequency distribution is specified by the proportion

$$p = \frac{85{,}000{,}000}{80{,}000{,}000 + 85{,}000{,}000} = \frac{85}{165} = 0.51515,$$

where $X$ denotes the preference of an individual, represented in the coded form of 0 or 1. Note that the population mean is $p$ and the population variance is $p(1-p)$.

Example - 1.1.3 - (Distribution of zeros of orthogonal polynomials) So far we talked about data distributions and their sources. As mentioned in the beginning, distributions appear everywhere. Consider a sequence of polynomials $p_n(x)$, $n = 0, 1, 2, \dots$, where $p_0(x) \equiv 1$ and $p_n(x)$ is of degree $n$. Suppose there exist real constants $a_n, b_n$, with $a_n > 0$ for all $n \ge 0$, such that

$$a_{n+1}\, p_{n+1}(x) + (b_n - x)\, p_n(x) + a_n\, p_{n-1}(x) \equiv 0, \qquad n \ge 1.$$

There is a result of Favard which says that each $p_n(x)$ has exactly $n$ distinct real zeros, which we denote as

$$x_{1n} < x_{2n} < \cdots < x_{nn}.$$

The issue is: what are they? Having these zeros gives us an extremely fast numerical integration method (called the Gaussian quadrature), among other benefits. If it is …

Example - 1.1.4 - (Convergence and summability) A matrix summability method consists of numbers $a_{nk}$, $n, k = 0, 1, 2, \dots$, arranged in a matrix form, $A = [a_{nk}]$. Such a matrix is constructed with the aim of converting a nonconvergent sequence, $x_0, x_1, \dots$, into a convergent one. In other words, if

$$y_n := \sum_{k=0}^{\infty} a_{nk}\, x_k, \qquad n = 0, 1, 2, \dots,$$

then our hope is that $y_n$ should converge. However, when $(x_k)$ is itself convergent to some number $\ell$, then we insist that $(y_n)$ should also be convergent to the same $\ell$. A matrix $A = [a_{nk}]$ which has this "convergence reproducing" property is called regular. To handle the kind of examples we will present, we need a bit more general concept that allows $x = (x_{kn})$ to be a matrix as well, and the $x_{kn}$ need not be numbers but could be functions. When $x_{kn} = x_k$ for all $n$, we revert to the classical summability. There are four notions of convergence.

(i) Let $y_n = \sum_{k=0}^{\infty} x_{kn}\, a_{nk}$ be defined for all $n$, called the $A$-transform of $x$. We say that $x$ is $A$-summable to $\alpha$ if $y_n \to \alpha$ (a numerical sketch of this notion appears after this list). This notion can be extended to the case when the $x_{kn}$ and $\alpha$ lie in a normed linear space.

(ii) Let the $x_{kn}$ be real, and let $F(t)$ be a distribution, i.e., a nondecreasing right continuous function with $F(-\infty) = 0$ and $F(+\infty) = 1$. We say $x$ is $A$-distributionally convergent to $F$ if for all $t$ at which $F$ is continuous we have

$$\lim_{n\to\infty} \sum_{k:\, x_{kn} \le t} a_{nk} = F(t).$$

This notion can be extended to higher dimensional forms when both $x_{kn}$ and $t$ are $d$-dimensional vectors.

(iii) We say $x = (x_{kn})$ is $A$-statistically convergent to $\alpha$ if for every $\varepsilon > 0$ we have

$$\lim_{n\to\infty} \sum_{k:\, |x_{kn} - \alpha| > \varepsilon} a_{nk} = 0.$$

This notion can be extended to the case when the $x_{kn}$ and $\alpha$ lie in a topological space. Example 1.1.3 uses this notion of convergence for the $\{a_k\}$ and $\{b_k\}$ sequences, with the matrix $A$ being the Cesàro matrix.

(iv) We say $x = (x_{kn})$ is $A$-strongly convergent to $\alpha$ if

$$\lim_{n\to\infty} \sum_{k=0}^{\infty} |x_{kn} - \alpha|\, a_{nk} = 0.$$

This notion can be extended to the case when the $x_{kn}$ and $\alpha$ lie in a metric space.
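The following Python sketch, my illustration rather than part of the notes, demonstrates notion (i) with the classical Cesàro matrix, $a_{nk} = 1/(n+1)$ for $k \le n$ and $0$ otherwise: the bounded nonconvergent sequence $x_k = (-1)^k$ is $A$-summable to $0$, while a convergent sequence keeps its limit, as regularity requires.

```python
# Illustration (not from the notes): the A-transform y_n = sum_k a_nk x_k
# for the Cesaro matrix reduces to running averages of the sequence.

def cesaro_transform(x):
    """Return y_n = (x_0 + ... + x_n) / (n + 1) for each n."""
    y, s = [], 0.0
    for n, xk in enumerate(x):
        s += xk
        y.append(s / (n + 1))
    return y

osc = [(-1) ** k for k in range(10000)]             # 1, -1, 1, -1, ... diverges
conv = [2.0 + 1.0 / (k + 1) for k in range(10000)]  # converges to 2

print(cesaro_transform(osc)[-1])    # near 0: the Cesaro (C, 1) limit of (-1)^k
print(cesaro_transform(conv)[-1])   # near 2: regularity preserves the limit
```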
Example - 1.1.5 - (Distribution of primes) Let $\pi(n)$ be the prime counting function. That is, $\pi(n)$ is the total number of primes that lie in the interval $(0, n]$. Gauss, as a teenager, conjectured that

$$\pi(n) \sim \frac{n}{\ln n}.$$

The prime number theorem says that

$$\pi(n) \sim \mathrm{Li}(n) := \int_2^n \frac{1}{\ln x}\, dx \sim \sum_{j=0}^{\infty} \frac{j!\, n}{(\ln n)^{j+1}}.$$

This was proved by both Hadamard and de la Vallée Poussin in 1896, by showing that the Riemann zeta function $\zeta(z)$ has no zeros of the type $1 + it$. Hardy and Wright's⁴ book provides more details.

In 1914 Littlewood⁵ showed that $\pi(n) - \mathrm{Li}(n)$ is positive and negative infinitely often. Since $\mathrm{Li}(n) \sim \frac{n}{\ln n} + \frac{n}{(\ln n)^2} + \frac{2n}{(\ln n)^3} + \cdots$, Chebyshev asked about the behavior of the ratio

$$X_n := \frac{\pi(n)}{n/\ln n}, \qquad n = 1, 2, \dots.$$

Chebyshev showed that $\frac{7}{8} < \liminf_n X_n \le \limsup_n X_n < \frac{9}{8}$. The recent book of Havil⁶ shows that if $\lim_n X_n$ exists then $\lim_n X_n = 1$. As evidence of the deep roots of $\pi(x)$, the Riemann hypothesis is equivalent to the statement

$$|\pi(n) - \mathrm{Li}(n)| = O\big( (\ln n)\sqrt{n}\, \big).$$

For more, see Ingham.⁷

⁴Hardy, G. H. and Wright, E. M., An Introduction to the Theory of Numbers, 5th ed., Oxford University Press, 1979.
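As a quick numerical check of this example (a sketch of mine, not from the notes), the following Python code computes $\pi(n)$ with a sieve and compares it against Gauss's $n/\ln n$, a four-term truncation of the $\mathrm{Li}(n)$ series, and the Chebyshev ratio $X_n$.

```python
# Illustration (not from the notes): pi(n) versus its classical approximations.
import math

def prime_count(n):
    """pi(n) via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"                      # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

def li_series(n, terms=4):
    """Truncation of Li(n) ~ sum_j j! * n / (ln n)^(j+1)."""
    ln = math.log(n)
    return sum(math.factorial(j) * n / ln ** (j + 1) for j in range(terms))

for n in (10**4, 10**5, 10**6):
    pi_n = prime_count(n)
    print(n, pi_n, round(n / math.log(n)), round(li_series(n)),
          round(pi_n / (n / math.log(n)), 4))    # the ratio X_n tends to 1
```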
Example - 1.1.6 - Of course there are many more examples, such as:

(i) Asymptotic normality of the Stirling numbers.
(ii) Distributions of eigenvalues of Toeplitz matrices.
(iii) Distributions of eigenvalues of random matrices — Wigner's law.
(iv) Maxwell's law of ideal gas — distribution of the speed of molecules in an ideal gas.
(v) The Feynman-Kac formula. Solutions of many types of PDEs and Brownian motion go hand in hand, the poster child being the heat equation. This link has found an unexpected admirer, namely the financial industry, since it ties very nicely into the price of various derived financial securities such as the call and put and many other options.

The above examples give a glimpse of the importance of the concept of a distribution. Probability theory provides the ideal language to express the concepts of distribution theory. Therefore we start off by building some basic structures of probability theory.

1.2 Probability Space & Random Variables

Our aim is to construct a mathematical structure to house the concept of a distribution. Distributions always describe some features of some variables. Since variables may have random components in them, distributions are often linked to probability theory through a concept called a random variable, a random variable being a function defined over a probability space. To see what we need, consider the diagram below, in which $\omega$ belongs to some abstract set, $\Omega$, shown as the horizontal axis for convenience. To connect to the histogram of $X$: if $J = (a, b]$ is any bin, the area of the rectangle over it represents the "size" of the event

$$A = X^{-1}(J) = \{\omega \in \Omega : X(\omega) \in J\}.$$

[Figure 1.2: Inverse Image of an Interval]

This puts several requirements on the collection $\mathcal{E}$ of sets for which such a size is to be defined.

• Since $X^{-1}\big(\bigcup_i J_i\big) = \bigcup_i X^{-1}(J_i) = \bigcup_i A_i$, we insist that $\bigcup_i A_i \in \mathcal{E}$ whenever each $A_i \in \mathcal{E}$. In particular, since $\mathbb{R} = \bigcup_i (-i, i]$, we insist that $\Omega = \bigcup_i X^{-1}((-i, i]) \in \mathcal{E}$.

• Since $J^c = (-\infty, a] \cup (b, \infty)$ is a union of some other $J$'s, we insist that $A^c = X^{-1}(J^c)$ should also be in our collection. More generally, if $A \in \mathcal{E}$ then $A^c \in \mathcal{E}$.

• The concept of size should be defined for all $A \in \mathcal{E}$. Furthermore, the concept of size should respect disjointness. That is, if $A_1, A_2, \dots \in \mathcal{E}$ are pairwise disjoint, then their individual sizes should add up to the size attached to their union $\bigcup_i A_i$.
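Before freezing the definitions, here is a small Python sketch of these requirements, my illustration rather than part of the notes, in a finite, equally likely setting: two fair dice, with $X$ the sum of the faces. The event $A = X^{-1}(J)$ attached to a bin $J = (a, b]$ is just a subset of $\Omega$, and its "size" is obtained by counting.

```python
# Illustration (not from the notes): events as inverse images of bins.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # Omega: 36 equally likely outcomes

def X(w):
    """The random variable: sum of the two faces."""
    return w[0] + w[1]

def inverse_image(a, b):
    """A = X^{-1}((a, b]) = {w in Omega : a < X(w) <= b}."""
    return [w for w in omega if a < X(w) <= b]

A = inverse_image(6, 8)                        # the bin J = (6, 8]: sums 7 and 8
print(len(A), Fraction(len(A), len(omega)))    # 11 outcomes, P(A) = 11/36

# Inverse images respect unions: X^{-1}(J1 U J2) = X^{-1}(J1) U X^{-1}(J2)
lhs = set(inverse_image(2, 8))
rhs = set(inverse_image(2, 6)) | set(inverse_image(6, 8))
print(lhs == rhs)                              # True
```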
Around 1930, A. N. Kolmogorov realized that all of the above requirements were part and parcel of the then newly discovered Lebesgue measure and integration theory. His 1933 book on the foundations of probability theory, detailing this, is now a classic. Let us collect and freeze these notions for our future use.

Definition - 1.2.1 - (Probability space) A probability space $(\Omega, \mathcal{E}, P)$ consists of the following items.

• $\Omega$ = the set of all possible outcomes of an experiment, also called the sample space.
• $\mathcal{E}$ = the set of all subsets of $\Omega$ for which a (probability) function (measure) $P$ can be defined. Each member of $\mathcal{E}$ is called an event (or a measurable set).

The collection of all events, $\mathcal{E}$, must obey the conditions

(i) $\Omega \in \mathcal{E}$,
(ii) if $A \in \mathcal{E}$ then $A^c \in \mathcal{E}$,
(iii) if $A_1, A_2, \dots \in \mathcal{E}$, then $\bigcup_i A_i \in \mathcal{E}$.

Any collection of subsets of the space $\Omega$ that obeys the above conditions is called a sigma field. The probability measure, $P$, is a real valued function over $\mathcal{E}$, with the following requirements:

(i) $P(\Omega) = 1$,
(ii) $0 \le P(A) \le 1$, for any $A \in \mathcal{E}$,
(iii) if $A_1, A_2, \dots \in \mathcal{E}$ are disjoint then $P\big(\bigcup_i A_i\big) = \sum_i P(A_i)$.

Remark - 1.2.1 - Note that the definition of $P$ is tied to the collection of events (sigma field). Condition (i) of the definition of a sigma field is needed for condition (i) of the definition of $P$ to avoid logical inconsistencies. The same is the case with the third conditions of the two concepts.

Exercise - 1.2.1 - Let $(\Omega, \mathcal{E}, P)$ be a probability space for a random experiment. Show that $P$ satisfies the following properties for any $A, B \in \mathcal{E}$:

(i) $P(\emptyset) = 0$,
(ii) $P(A^c) = 1 - P(A)$,
(iii) $P(A^c \cap B) = P(B) - P(A \cap B)$,
(iv) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$,
(v) if $A \subseteq B$, then $P(A) \le P(B)$.

Lecture 2

Probability Spaces & Random Variables

Remark - 2.0.2 - (Is P a continuous function?) The usual notion of continuity does not apply to the function $P$ since its domain has not been given any topological structure. The problem is that we cannot talk about $\lim_{n\to\infty} A_n$ for an arbitrary sequence of sets (events) in $\mathcal{E}$. For some special sequences of sets, "convergence of sets" can be defined. When $A_1 \subseteq A_2 \subseteq \cdots$, then we define $\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n$. Similarly, if $A_1 \supseteq A_2 \supseteq \cdots$, then $\lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n$. A question arises: for such sets is $P$ continuous, in the sense that

$$P\big( \lim_{n\to\infty} A_n \big) = \lim_{n\to\infty} P(A_n)?$$

Here is a result that answers this question.

Theorem - 2.0.1 - (The continuity property of $P$) If $\{A_n, n \ge 1\}$ and $\{B_n, n \ge 1\}$ are sequences of events such that $A_1 \subseteq A_2 \subseteq \cdots$ and $B_1 \supseteq B_2 \supseteq \cdots$, then

(i) $\lim_{n\to\infty} P(A_n) = P\big(\lim_{n\to\infty} A_n\big)$, and (ii) $\lim_{n\to\infty} P(B_n) = P\big(\lim_{n\to\infty} B_n\big)$.

Proof: Note that $\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n = A_1 \cup (A_2 - A_1) \cup (A_3 - A_2) \cup \cdots$, where the unions on the right side are disjoint. Thus,

$$P\Big( \bigcup_{n=1}^{\infty} A_n \Big) = P(A_1) + P(A_2 - A_1) + P(A_3 - A_2) + \cdots
= P(A_1) + \lim_{n\to\infty} \sum_{i=1}^{n-1} P(A_{i+1} - A_i)$$
$$= P(A_1) + \lim_{n\to\infty} \sum_{i=1}^{n-1} \big( P(A_{i+1}) - P(A_i) \big)
= P(A_1) + \lim_{n\to\infty} P(A_n) - P(A_1) = \lim_{n\to\infty} P(A_n).$$

The reader should prove part (ii) (cf. Exercise (2.0.2)). ♠

As we saw above, a monotone sequence of sets has a limit. In general, if $A_1, A_2, \dots$ is any sequence of sets, the new sequence $B_1 = \bigcup_{k \ge 1} A_k$, $B_2 = \bigcup_{k \ge 2} A_k$, ..., $B_n = \bigcup_{k \ge n} A_k$, for $n = 1, 2, \dots$, becomes monotone. That is, $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$. Hence the sequence $B_1, B_2, \dots$ has a limit, which is called the $\limsup_n A_n$:

$$\limsup_{n\to\infty} A_n := \lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty} B_n = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k.$$

Similarly, the new sequence $C_n = \bigcap_{k \ge n} A_k$ is a monotone sequence, since $C_1 \subseteq C_2 \subseteq \cdots$. It also has a limit, called the $\liminf_n A_n$:

$$\liminf_{n\to\infty} A_n := \lim_{n\to\infty} C_n = \bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} \bigcap_{k \ge n} A_k.$$

Since $C_n \subseteq B_n$ for every $n$, their respective limits also share the same relationship, namely $\liminf_n A_n \subseteq \limsup_n A_n$. Note that the definition of $\mathcal{E}$ ensures that both $\liminf_n A_n$ and $\limsup_n A_n$ are in $\mathcal{E}$ whenever all $A_n$ are in $\mathcal{E}$.

In the probability literature, the event $\bigcup_i A_i$ is often read as "at least one of the $A_i$ occurs". Similarly, the event $\bigcap_i A_i$ is often read as "every one of the $A_i$ occurs". Continuing this further, the event $\limsup_n A_n$ stands for "infinitely many of the $A_i$ occur" and $\liminf_n A_n$ stands for "all but finitely many of the $A_i$ occur". The reader should try to see why this interpretation is justified. Here is another consequence of the definition of a probability function.

Theorem - 2.0.2 - (The first Borel-Cantelli lemma) Let $A_1, A_2, \dots$ be a sequence of events. If $\sum_n P(A_n) < \infty$ then $P(\limsup_n A_n) = 0$.

Proof: Note that if $B_n = \bigcup_{k \ge n} A_k$ then $B_1 \supseteq B_2 \supseteq \cdots$. Thus, by the continuity property of $P$,

$$0 \le P\big( \limsup_n A_n \big) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty} P\Big( \bigcup_{k \ge n} A_k \Big).$$

By the subadditivity property of $P$, we get $P\big( \bigcup_{k \ge n} A_k \big) \le \sum_{k \ge n} P(A_k)$. Since the tail of a convergent series goes to zero,

$$0 \le P\big( \limsup_n A_n \big) = \lim_{n\to\infty} P\Big( \bigcup_{k \ge n} A_k \Big) \le \lim_{n\to\infty} \sum_{k \ge n} P(A_k) = 0. \;♠$$
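The set-theoretic $\limsup$ and $\liminf$ can be seen concretely in Python (a sketch of mine, not from the notes). For a sequence alternating between two disjoint sets $E$ and $O$, every point of $E \cup O$ lies in infinitely many $A_n$ but no point lies in all but finitely many, so $\limsup_n A_n = E \cup O$ while $\liminf_n A_n = \emptyset$. The code evaluates the defining intersections and unions over a long finite tail.

```python
# Illustration (not from the notes): limsup/liminf of an alternating sequence.
E, O = {0, 2, 4}, {1, 3, 5}
A = [E if n % 2 == 0 else O for n in range(50)]   # A_n alternates between E and O

def limsup(sets):
    # intersection over n of the tail unions  U_{k>=n} A_k
    return set.intersection(*(set.union(*sets[n:]) for n in range(len(sets) - 1)))

def liminf(sets):
    # union over n of the tail intersections  I_{k>=n} A_k
    return set.union(*(set.intersection(*sets[n:]) for n in range(len(sets) - 1)))

print(limsup(A))   # {0, 1, 2, 3, 4, 5}: every point is in infinitely many A_n
print(liminf(A))   # set(): no point is in all but finitely many A_n
```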
Remark - 2.0.3 - (Inclusion-exclusion principle) It is a natural question to ask, "can one find the probability of a union of events when one knows only the probabilities of the individual events?" The answer is yes, provided the probabilities of their intersections are known; the resulting formula is called the inclusion-exclusion principle and is due to H. Poincaré:

$$P\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n).$$

Writing $B_j$ for the sum of the probabilities of all $j$-fold intersections, $B_j = \sum_{i_1 < \cdots < i_j} P(A_{i_1} \cap \cdots \cap A_{i_j})$, we therefore have

$$P(A_1^c \cap A_2^c \cap \cdots \cap A_n^c) = 1 - P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - \sum_{j=1}^{n} (-1)^{j-1} B_j = \sum_{j=0}^{n} (-1)^{j} B_j, \quad \text{where } B_0 = 1,$$

and it represents the probability of none of the $A_i$'s occurring. This is a special case of a yet more general result due to Jordan, who proved it in 1927. It says that

$$P\big( \{\text{exactly } k \text{ events among } A_1, \dots, A_n \text{ will occur}\} \big) = \sum_{j=k}^{n} (-1)^{j-k} \binom{j}{k} B_j,$$

which reduces to the result of Poincaré for $k = 0$. Both of these results can be proved by induction and are left for the reader as exercises.

Remark - 2.0.4 - (Various assignment methods) How should one define the function $P : \mathcal{E} \to [0, 1]$ so that the three requirements of its definition are fulfilled and at the same time $P$ is realistic? The word "realistic" points towards our desire that it should be applicable in various real life situations. This is a modeling issue. Typically any one of the following four techniques is invoked, due to various reasonings:

• … that the outcomes should show no preference, the above counting method breaks down. Its natural analog then becomes
$$P(A) = \frac{\text{size of } A}{\text{size of } \Omega}.$$

• (Independence). Another distinct modeling technique that sets probability theory apart from other disciplines is the modeling tool of independence. We will briefly describe this concept a bit later.

Example - 2.0.1 - (Secretary's matching problem — equilikely probability space) Here we illustrate the use of the inclusion-exclusion principle applied to a particular equilikely probability space and solve the secretary's matching problem. A secretary types $n$ letters addressed to $n$ different people. Then he types $n$ envelopes with the same $n$ addresses. However, while putting the letters into the envelopes, he puts the letters into the envelopes randomly. (The word "random" here stands for no preference for any particular letter going into any particular envelope. This then can be interpreted to mean that the resulting probability space is equilikely.) We would like to know the probability that at least one of the letters is correctly put into its own envelope. Let $A_i$ be the event that letter $i$ goes into its own envelope ($1 \le i \le n$), i.e., a match occurs for the $i$th letter. We want the probability that at least one of the $A_i$'s occurs, i.e., $P\big( \bigcup_{i=1}^{n} A_i \big)$. To find $P(A_i)$, $P(A_i \cap A_j)$, $P(A_i \cap A_j \cap A_k)$, ..., $i \ne j \ne k$, etc., we proceed as follows. There are $n!$ ways to place the letters into the $n$ envelopes. Therefore, we see that

$$P(A_i) = \frac{(n-1)!}{n!} = \frac{1}{n}, \qquad i = 1, 2, \dots, n.$$

Similarly, $P(A_i \cap A_j) = \frac{(n-2)!}{n!} = \frac{1}{n(n-1)}$ and $P(A_i \cap A_j \cap A_k) = \frac{1}{n(n-1)(n-2)}$, etc. So, by the result of Poincaré,

$$P\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{i=1}^{n} \frac{1}{n} - \sum_{i<j} \frac{1}{n(n-1)} + \sum_{i<j<k} \frac{1}{n(n-1)(n-2)} - \cdots$$
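A brute force check of the matching probability (my sketch, not from the notes): since there are $\binom{n}{j}$ equal terms at stage $j$, the inclusion-exclusion sum above collapses to $\sum_{j=1}^{n} (-1)^{j-1}/j!$, and for small $n$ this can be compared against direct enumeration of all $n!$ equally likely placements. Both columns agree, and both approach the well known limit $1 - e^{-1} \approx 0.6321$.

```python
# Illustration (not from the notes): the secretary's matching probability.
from itertools import permutations
from math import factorial

def by_inclusion_exclusion(n):
    """P(at least one match) = sum_{j=1}^{n} (-1)^(j-1) / j!."""
    return sum((-1) ** (j - 1) / factorial(j) for j in range(1, n + 1))

def by_enumeration(n):
    """Fraction of the n! placements with at least one fixed point."""
    hits = sum(any(p[i] == i for i in range(n)) for p in permutations(range(n)))
    return hits / factorial(n)

for n in (3, 5, 7):
    print(n, round(by_inclusion_exclusion(n), 6), round(by_enumeration(n), 6))
```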