Lecture Notes for the Introduction to Probability Course
Vladislav Kargin
December 5, 2020

Contents

1 Combinatorial Probability and Basic Laws of Probabilistic Inference
  1.1 What is randomness and what is probability?
    1.1.1 The classical definition of probability
    1.1.2 Kolmogorov's Axiomatic Probability
    1.1.3 More about set theory and indicator functions
  1.2 Discrete Probability Models
  1.3 Sample-point method (Combinatorial probability)
  1.4 Conditional Probability and Independence
    1.4.1 Conditional probability
    1.4.2 Independence
    1.4.3 Law of total probability
  1.5 Bayes' formula
2 Discrete random variables and probability distributions
  2.1 Pmf and cdf
  2.2 Expected value
    2.2.1 Definition
    2.2.2 Expected value for a function of a r.v.
    2.2.3 Properties of the expected value
  2.3 Variance and standard deviation
    2.3.1 Definition
    2.3.2 Properties
  2.4 The zoo of discrete random variables
    2.4.1 Binomial
    2.4.2 Geometric
    2.4.3 Negative binomial
    2.4.4 Hypergeometric
    2.4.5 Poisson
  2.5 Moments and moment-generating function
  2.6 Markov's and Chebyshev's inequalities
3 Chapter 4: Continuous random variables
  3.1 Cdf and pdf
  3.2 Expected value and variance
  3.3 Quantiles
  3.4 The Uniform random variable
  3.5 The normal (or Gaussian) random variable
  3.6 Exponential and Gamma distributions
  3.7 Beta distribution
4 Multivariate Distributions
  4.1 Discrete random variables
    4.1.1 Joint and marginal pmf
    4.1.2 Conditional pmf and conditional expectation
  4.2 Dependence between random variables
    4.2.1 Independent random variables
    4.2.2 Covariance
    4.2.3 Correlation coefficient
  4.3 Continuous random variables
    4.3.1 Joint cdf and pdf
    4.3.2 Calculating probabilities using joint pdf
    4.3.3 Marginal densities
    4.3.4 Conditional density (pdf)
    4.3.5 Conditional Expectations
    4.3.6 Independence for continuous random variables
    4.3.7 Expectations of functions and covariance for continuous r.v.'s
    4.3.8 Variance and conditional variance
  4.4 The multinomial distribution
  4.5 The multivariate normal distribution
5 Functions of Random Variables
  5.1 The method of cumulative distribution functions
    5.1.1 Functions of one random variable
    5.1.2 Functions of several random variables
  5.2 Random variable generation
  5.3 The pdf transformation method
    5.3.1 A function of one random variable
    5.3.2 Functions of several variables
  5.4 Method of moment-generating functions
  5.5 Order Statistics
6 Sampling distributions, Law of Large Numbers, and Central Limit Theorem
  6.1 Sampling distributions
  6.2 Law of Large Numbers (LLN)
  6.3 The Central Limit Theorem

Chapter 1
Combinatorial Probability and Basic Laws of Probabilistic Inference

1.1 What is randomness and what is probability?

"Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means."
Bertrand Russell

1.1.1 The classical definition of probability

Probability theory grew out of the study of games of chance, and the definition of probability in the times of the French mathematician Laplace (who taught mathematics to Napoleon) was very simple. We start by considering this definition and then move to a more modern definition due to Kolmogorov.

Definition 1.1.1 (Laplace's definition).
If each case is equally probable, then the probability of a random event comprised of several cases is equal to the number of favorable cases divided by the number of all possible cases.

From the point of view of this definition, probability is the art of counting all possible and all favorable cases. We will practice this art in the first 5% of the course.

Example 1.1.2. Roll a fair die. What is the probability that you get an even number? This is easy: three of the six equally likely outcomes (2, 4, and 6) are favorable, so the probability is 3/6 = 1/2. However, it may become more difficult to decide which outcomes are equally probable in more complicated examples.

Example 1.1.3. Suppose that we throw two fair dice. Is a throw of 11 as likely as a throw of 12? (A throw is the sum of the numbers on the dice.) Leibniz (one of the inventors of calculus) wrote in a letter: "... For example, with two dice, it is equally likely to throw twelve points, than to throw eleven; because one or the other can be done in only one manner." Leibniz makes a mistake here. In order to see it, look at this super-simple example.

Example 1.1.4. Suppose we toss two fair coins. What is the probability that we observe a head and a tail? Leibniz would presumably have answered 1/3. There are only 3 possibilities: Head and Head, Head and Tail, Tail and Tail. All possibilities are equally probable, so every one of them should have probability 1/3. However, if we repeat the coin-tossing experiment a large number of times, we find that the relative frequency of this event is close to 1/2, not to 1/3. Maybe these possibilities are not equally probable after all. Indeed, the coins are distinguishable and the list of possible outcomes is in fact HH = "H on the first coin, H on the second coin", HT, TH and TT. If we assume that these 4 outcomes are equally probable, then there are two favorable outcomes, HT and TH, and, by Laplace's definition, the probability is 2/4 = 1/2. This agrees with experiment, eventually justifying our assumption.
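The relative-frequency claim is easy to check on a computer. The following short Python script is a minimal sketch, not part of the original notes; the function name, seed, and number of trials are our own choices. It tosses two fair coins many times and reports the fraction of trials in which one head and one tail are observed.

    import random

    def toss_two_coins(num_trials=100_000, seed=0):
        """Estimate the probability of 'one head and one tail' with two fair coins."""
        rng = random.Random(seed)
        favorable = 0
        for _ in range(num_trials):
            first = rng.choice("HT")   # first coin: H or T, each with probability 1/2
            second = rng.choice("HT")  # second coin, tossed independently
            if first != second:        # the outcome is HT or TH
                favorable += 1
        return favorable / num_trials

    print(toss_two_coins())  # prints a value close to 0.5, not to 1/3

The simulated frequency near 0.5 supports the model with four equally likely outcomes rather than Leibniz's model with three.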
The assumption that the coins are distinguishable and the four possible outcomes are equiprobable works in the real world which we observe every day. However, if we consider less usual objects, for example, objects in quantum mechanics, then we encounter objects for which this assumption is violated. In particular, elementary particles called bosons (photons, for example, are bosons) are indistinguishable. So if we set up an experiment in which we use the polarizations of two photons as coins, then the possible outcomes in this experiment will be HH, HT and TT, since we are not able to say which photon is which. Moreover, these three outcomes are equiprobable. For this experiment, Leibniz's prediction is correct and the probability of HT is 1/3. The probability rules based on this set of assumptions are called Bose-Einstein statistics. Note that Laplace's definition is still fine in this situation; it simply happens that the set of equiprobable outcomes is different. Another set of probability rules that arises in quantum mechanics is Fermi-Dirac statistics for fermion particles. If the coins behaved like electron spins, then only HT would be a possible outcome and the probability to observe it would be 1.

We will use the standard assumption for coins and similar objects in this course, and we will do many calculations based on Laplace's definition of probability.

Why do we need something else besides the Laplace definition? Even in usual circumstances, there are situations in which Laplace's definition is not sufficient. This is especially relevant when we have an uncountable infinity of possible outcomes.

Example 1.1.5. In shooting at a square target we assume that every shot hits the target and that "every point of the target has an equal probability to be hit." What is the probability that a shot hits the lower-left quadrant of the target?

The Laplace definition fails since there are infinitely many points that we can hit in the target and infinitely many points in the lower-left quadrant of the target. Dividing infinity by infinity is not helpful. A possible approach is to say that the probability of hitting a specific region is proportional to the area of this region. This trick works in this example, but it is not sufficiently general to serve as a good definition in many other cases.

We will learn later that the correct approach here is to define the probability density at each point of the target.

Here is an example showing that assigning a correct probability density is sometimes not easy, and that the density usually should be chosen based on experiments.

Example 1.1.6 (Bertrand's paradox). Take a circle of radius 1 and choose one of its chords at random. What is the probability that this chord will be longer than the side of the regular triangle inscribed in the circle?

Solution #1: Take a point in the interior of the circle at random and construct the chord whose midpoint is the chosen point. Now we can rotate the triangle so that one of its sides is parallel to the chord. Then it is clear that the chord is longer than the triangle's side if and only if the distance from the midpoint of the chord to the center is smaller than the distance from the midpoint of the triangle's side to the center. This distance is the radius of the circle inscribed in the triangle, which is easy to calculate as 1/2. So the conclusion is that the chord will be longer than the side of the triangle if and only if its midpoint lies inside the concentric circle of radius 1/2.
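Solution #1 can also be checked numerically. The sketch below is our own illustration, not part of the notes; the function name, seed, and trial count are assumptions. It picks the chord's midpoint uniformly at random inside the unit circle and counts how often the midpoint falls within distance 1/2 of the center. Since the favorable region is a disk of radius 1/2, the area argument predicts the answer (1/2)^2 = 1/4 under this particular way of choosing the chord.

    import random

    def bertrand_by_random_midpoint(num_trials=100_000, seed=0):
        """Estimate P(chord longer than the triangle side) when the chord's
        midpoint is chosen uniformly at random inside the unit circle."""
        rng = random.Random(seed)
        longer = 0
        accepted = 0
        while accepted < num_trials:
            x = rng.uniform(-1.0, 1.0)
            y = rng.uniform(-1.0, 1.0)
            if x * x + y * y >= 1.0:   # rejection sampling: keep only points inside the circle
                continue
            accepted += 1
            if x * x + y * y < 0.25:   # midpoint within distance 1/2 of the center
                longer += 1
        return longer / num_trials

    print(bertrand_by_random_midpoint())  # typically close to 0.25

Other natural ways of "choosing a chord at random" lead to different answers, which is why the example is called a paradox.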