Introduction to Probability and Statistics

Prof. Dr. Alexander Drewitz
Universität zu Köln

Preliminary version of October 4, 2019

Contents

1 Probability theory
  1.1 A short history of classical probability theory
  1.2 Mathematical modelling of random experiments
  1.3 σ-algebras, probability measures, and probability spaces
  1.4 Examples of probability spaces
    1.4.1 Discrete probability spaces
    1.4.2 Continuous probability spaces
  1.5 Conditional probabilities
  1.6 Independence
    1.6.1 Finite products of probability spaces
  1.7 Random variables
  1.8 Sums of independent random variables and convolutions
  1.9 Specific distributions
    1.9.1 Discrete distributions
    1.9.2 Distributions with densities
  1.10 Expectation
    1.10.1 Second (and higher) moments
  1.11 Generating functions
  1.12 Convergence of random variables
    1.12.1 Almost sure convergence
    1.12.2 Convergence in L^p
    1.12.3 Convergence in probability
    1.12.4 Convergence in distribution
  1.13 Some fundamental tools and inequalities
    1.13.1 Markov's and Chebyshev's inequalities
    1.13.2 The Borel-Cantelli lemmas
    1.13.3 Jensen's inequality
  1.14 Interdependence of types of convergence of random variables
  1.15 Laws of large numbers
    1.15.1 Weak law of large numbers
    1.15.2 Strong law of large numbers
    1.15.3 Are we investigating unicorns?
  1.16 Central limit theorem
  1.17 Markov chains
    1.17.1 Stationary distribution
    1.17.2 Classification of states
    1.17.3 The Perron-Frobenius theorem
    1.17.4 Quantitative convergence of Markov chains to equilibrium

2 SRW basics – insert at some point
    2.0.1 Recurrence and transience of simple random walk
    2.0.2 Gambler's ruin
    2.0.3 Reflection principle
    2.0.4 Ballot theorem
    2.0.5 Arcsine law for last visits
    2.0.6 Arcsine law for sojourn times

3 Statistics
  3.1 Estimators
    3.1.1 Properties of estimators
    3.1.2 Maximum likelihood estimators
    3.1.3 Fisher information and the Cramér-Rao inequality
    3.1.4 Consistency of estimators
  3.2 Confidence regions
    3.2.1 One recipe for constructing confidence regions
    3.2.2 Order statistics
  3.3 Tests
  3.4 Testing for alternatives ('Alternativtests')
    3.4.1 Chi-square test of goodness of fit ('Chiquadrat-Anpassungstest')

Chapter 1
Probability theory

1.1 A short history of classical probability theory

Arguably, the historically acknowledged hour of birth of probability theory was a series of letters exchanged between Pascal and de Fermat in 1654. Most prominently, they were investigating the so-called 'problem of points', whose setting is as follows: two players, A and B, repeatedly play a game in which each player has the same chance of winning.
The first player to win a fixed number of games (say, three) wins the prize money. Now imagine that, due to unforeseen circumstances, the tournament has to be stopped prematurely at a time when player A has won twice and player B once. How should the prize money be distributed?

In fact, this problem had been around for quite some time when Pascal and de Fermat solved it: attempts at a solution had been given by Pacioli in 1494, Cardano in 1539, and Forestani in 1603, just to name a few. It had also been suggested that the problem is really a jurisdictional one, not a mathematical one. The bulk of those solutions either comes out of (mathematically) thin air or looks naive, or at least implausible, from today's point of view.

Either way, in the exchange of letters mentioned above, de Fermat gave a solution to the problem by essentially enumerating all possible outcomes of the remaining games and then distributing the money in the same ratio in which these outcomes make player A or B win, respectively.[1] Pascal, on the other hand, realised that this combinatorial approach becomes prohibitively expensive when the winner is determined not by three wins but by an arbitrary number of wins. He solved the problem recursively, thereby reducing the computational complexity and at the same time already introducing the fundamental concept of an 'expectation' (without naming it this way), see Definition 1.10.1. Both approaches are illustrated in the short computational sketch at the end of this section.

[1] Inherent to this approach is the assumption that all possible outcomes of the remaining games are equally likely, i.e., that they have the same probability (although the definition of probability in this setting is not clear either). There are at least two different interpretations: the 'objective' one, derived through symmetries such as in dice or coin tosses (keep in mind, though, that coins and dice usually do have some asymmetries; for example, dice usually have a number of cavities on each side specifying the value of the roll) or by investigating the frequencies with which certain events happen, and the 'subjective' one (epistemic or Bayesian), where a probability is a measure of the intensity of belief. We will soon have a closer look at the frequentist motivation.

Further motivation came from the desire to gauge insurance premia; it is not completely clear, however, why it was exactly during the above centuries that probability theory came into being. After all, gambling had been a favourite pastime for millennia,[2] and insurance and law (see e.g. [Ber12]) would also have been useful areas of application of probability theory. It may be argued that the means to 'generate' randomness had not been sufficiently sophisticated before: in place of dice, for example, people used so-called 'astragali', a type of bone. Indeed, according to today's knowledge, it was only in the 10th century that all possible results of throwing a die several times were completely enumerated. A further impediment might have been that (excessive) gambling was frowned upon by the Catholic Church. Cardano, who has already been mentioned above and who was also a medic, found a neat way around this moral difficulty when explaining why he dealt with gambling: 'Even if gambling were altogether an evil, still, on account of the very large number of people who play, it would seem to be a natural evil. For that very reason it ought to be discussed by a medical doctor like one of the incurable diseases.'

[2] Gambling is documented in ancient China, and around 500 BC people were gambling (throwing dice?) in the streets of Rome.
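Neither letter contains anything like code, of course; the following Python sketch merely illustrates the two approaches under the assumptions made in the text, namely that the individual games are independent and that each player wins any single game with probability 1/2 (all function names are our own). In the situation from the letters, A needs one more win and B two, and both methods award A three quarters of the prize.

```python
from functools import lru_cache
from itertools import product

def fermat_share(a: int, b: int) -> float:
    """de Fermat's enumeration: play out all sequences of the at most
    a + b - 1 further games needed to decide the match and count the
    fraction in which A collects the a wins she still needs."""
    n = a + b - 1                 # this many further games always suffice
    favourable = sum(1 for seq in product("AB", repeat=n)
                     if seq.count("A") >= a)
    return favourable / 2 ** n

@lru_cache(maxsize=None)
def pascal_share(a: int, b: int) -> float:
    """Pascal's recursion: the probability that A wins the match when A
    still needs a wins and B still needs b, each game being fair and
    independent of all others."""
    if a == 0:
        return 1.0                # A has already won the match
    if b == 0:
        return 0.0                # B has already won the match
    return 0.5 * (pascal_share(a - 1, b) + pascal_share(a, b - 1))

# The situation from the letters: first to three wins, A leads 2:1,
# so A needs one more win while B needs two.
print(fermat_share(1, 2))   # 0.75
print(pascal_share(1, 2))   # 0.75
```

The recursion also makes Pascal's complexity advantage visible: memoised, it touches at most (a + 1)(b + 1) intermediate states, whereas the enumeration grows like 2^(a + b).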
1.2 Mathematical modelling of random experiments

So while the intuition of probability has been around for a long time from a practical point of view, it was only in 1933 that A. N. Kolmogorov started developing an axiomatic mathematical theory, see [Kol33]. While it is worthwhile to have a look at this original article, other texts suitable for accompanying this lecture are the books by A. Rényi [Ren69], I. Schneider [Sch88], G. Gigerenzer and C. Krüger [GK99], as well as U. Krengel [Kre05] and R. Durrett [Dur10].

One of the principal goals of probability theory is to describe experiments which are either random or for which one does not have sufficient information or sufficient computing power to describe them deterministically. For the latter, consider e.g. the rules of playing roulette in a casino. Even after the croupier has spun the wheel as well as the ball, a player is still allowed to make a bet until 'rien ne va plus'. However, once ball and wheel have been spun, essentially all the information needed to compute the outcome of this particular instance of the game (which is by now essentially deterministic, apart from possible air currents, vibrations, etc.) is in principle available. Hence, if a player were fast enough to use all this information to compute the outcome of the game, she would be able to beat the house. Without technical support (see e.g. so-called 'roulette computers'), however, a random modelling of this situation is still very close to reality. In the same spirit, a sequence of dice rolls or coin tosses would be considered a random experiment. We will not go into the more philosophical question of whether randomness actually exists (see also 'Bell's theorem' for a contribution on the existence of randomness).

A possible outcome of an experiment such as the ones described above is usually referred to as an elementary event and denoted by ω. The set of elementary events is most often denoted by Ω (so ω ∈ Ω). An event is a set A ⊂ Ω of elementary events to which we want to be able to associate a probability P(A) via a suitable function P : 2^Ω → [0, 1] defined on the power set 2^Ω of Ω (i.e., on the set of all subsets of Ω). We will see later on that wanting to associate a probability to each A ⊂ Ω is generally asking too much, and such a function P will have to be restricted to a subset of 2^Ω. As long as Ω is either finite or countable, however, such problems will not arise.

Given an elementary event ω ∈ Ω (i.e., the outcome of a single realisation of a random experiment) and an event A ⊂ Ω, the interpretation is that the event A has occurred if ω ∈ A and has not occurred if ω ∉ A.

The first goal in describing experiments as outlined above is to find Ω and P in a way that is suitable for the respective experiment.

Example 1.2.1.
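To make the notation above concrete, here is a minimal sketch of a finite sample space together with the induced map P on its power set. The fair-die setting and all names in the code are our own illustration (under the assumption of a uniform probability measure), not the notes' Example 1.2.1.

```python
from fractions import Fraction

# Sample space for a single roll of a fair die: the set of elementary events.
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(A: frozenset) -> Fraction:
    """Uniform probability measure on the power set of omega:
    each elementary event is equally likely, so P(A) = |A| / |omega|."""
    if not A <= omega:
        raise ValueError("an event must be a subset of omega")
    return Fraction(len(A), len(omega))

# The event 'the roll is even' has occurred iff the observed outcome lies in it.
even = frozenset({2, 4, 6})
print(P(even))           # 1/2
print(P(omega))          # 1: the whole space is the certain event
print(P(frozenset()))    # 0: the empty set is the impossible event
```

Since Ω is finite here, P really is defined on all of 2^Ω, in line with the remark above; the need to restrict P to a smaller collection of events only arises for uncountable Ω.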