
5 | Probability Spaces and Random Variables


In this chapter, we review some essentials of probability theory as required for the theory of the GLM. We focus on the particularities and inner logic of the probability theory model rather than its practical application and primarily aim to establish important concepts and notation that will be used in subsequent sections. In Section 5.1, we first introduce the basic notion of a probability space as a model for experiments that involve some degree of uncertainty. We then discuss some elementary aspects of probability in Section 5.2, which mainly serve to ground the subsequently discussed theory of random variables and random vectors. The fundamental mathematical construct to model univariate data endowed with uncertainty is the concept of a random variable. We focus on different ways of specifying probability distributions of random variables, notably probability mass and density functions for discrete and continuous random variables, respectively, in Section 5.3. The concise mathematical representation of more than one data point requires the concept of a random vector. In Section 5.4, we first discuss the extension of random variable concepts to the multivariate case of random vectors and then focus on three concepts that arise only in the multivariate scenario and are of immense importance for statistical data analysis: marginal distributions, conditional distributions, and independent random variables.

5.1 Probability spaces

Probability spaces Probability spaces are very general and abstract models of random experiments. We use the following definition.

Definition 5.1.1 (Probability space). A probability space is a triple (Ω, A, P), where

• Ω is a set of elementary outcomes ω,

• A is a σ-algebra, i.e., A is a set of subsets of Ω with the following properties:

◦ Ω ∈ A,
◦ A is closed under the formation of complements, i.e., if A ∈ A, then also A^c := Ω \ A ∈ A,
◦ A is closed under countable unions, i.e., if A1, A2, A3, ... ∈ A, then ∪_{i=1}^{∞} Ai ∈ A,

• P is a probability measure, i.e., P is a mapping P : A → [0, 1] with the following properties:

◦ P is normalized, i.e., P(∅) = 0 and P(Ω) = 1, and
◦ P is σ-additive, i.e., if A1, A2, ... is a pairwise disjoint sequence in A (i.e., Ai ∈ A for i = 1, 2, ... and Ai ∩ Aj = ∅ for i ≠ j), then P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai).

•

Example A basic example is a probability space that models the throw of a die. In this case the elementary outcomes ω ∈ Ω model the six faces of the die, i.e., one may define Ω := {1, 2, 3, 4, 5, 6}. If the die is thrown, it will roll, and once it comes to rest, its upper surface will show one of the elementary outcomes. The typical σ-algebra used in the case of discrete and finite outcome sets (such as the current Ω) is the power set P(Ω) of Ω. It is a basic exercise in probability theory to show that the power set indeed fulfils the properties of a σ-algebra as defined above. Because P(Ω) contains all subsets of Ω, it also contains the elementary sets {1}, {2}, ..., {6}, which thus get allocated a probability P({ω}) ∈ [0, 1], ω ∈ Ω by the probability measure P. Probabilities of sets containing a single elementary outcome are also often written simply as P(ω) (:= P({ω})). The typical value ascribed to P(ω), ω ∈ Ω, if used to model a fair die, is P(ω) = 1/6.

The σ-algebra P(Ω) contains many more sets than the sets of elementary outcomes. The purpose of these additional elements is to model all sorts of events to which an observer of the random experiment may want to ascribe probabilities. For example, the observer may ask “What is the probability that the upper surface shows a number larger than three?”. This event corresponds to the set {4, 5, 6}, which, because the σ-algebra P(Ω) contains all possible subsets of Ω, is contained in P(Ω). Likewise, the observer may ask “What is the probability that the upper surface shows an even number?”, which corresponds to the subset {2, 4, 6} of Ω. The probability measure P is defined in such a manner that the answers to the following questions are predetermined: “What is the probability that the upper surface shows nothing?” and “What is the probability that the upper surface shows any number in Ω?”. The element of P(Ω) that corresponds to the first question is the empty set ∅, and by definition of P, P(∅) = 0. This models the idea that one of the elementary outcomes, i.e., one surface with pips, will show up on every instance of the random experiment. If this is not the case, for example because the pips have worn off at one of the surfaces, the probability space model as sketched thus far is not a good model of the die experiment. The element of P(Ω) that corresponds to the second question is Ω itself. Here, the definition of the probability measure assigns P(Ω) = 1, i.e., the probability that something unspecific will happen is one. Again, if the die falls off the table and cannot be recovered, the probability space model and the experiment are not in good alignment.

Finally, the definition of the probability space as provided above allows one to evaluate probabilities for certain events based on the probabilities of other events by means of the σ-additivity of P. Assume for example that the probability space models the throw of a fair die, such that P({ω}) = 1/6 by definition. Based on this assumption, the σ-additivity property allows one to evaluate the probabilities of many other events. Consider for example an observer who is interested in the probability of the event that the surface of the die shows a number smaller or equal to three. Because the elementary events {1}, {2}, {3} are pairwise disjoint, and because the event of interest can be written as the countable union {1, 2, 3} = {1} ∪ {2} ∪ {3} of these events, one may evaluate the probability of the event of interest by

P(∪_{i=1}^{3} {i}) = Σ_{i=1}^{3} P(i) = 1/6 + 1/6 + 1/6 = 1/2.
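The die model can also be rendered computationally. The following minimal Python sketch (the names Omega, P_elem, and P are hypothetical and chosen here for illustration only) represents the elementary outcomes and evaluates event probabilities by summing elementary probabilities, i.e., by σ-additivity, under the fair-die assumption.

    # Minimal sketch of the fair-die probability space (assumption: P({w}) = 1/6).
    # Events are represented as Python sets A ⊆ Omega; their probability is
    # obtained by sigma-additivity, i.e., by summing elementary probabilities.
    from fractions import Fraction

    Omega = {1, 2, 3, 4, 5, 6}                    # elementary outcomes
    P_elem = {w: Fraction(1, 6) for w in Omega}   # fair-die assumption

    def P(A):
        """Probability of an event A, an element of the power set of Omega."""
        return sum(P_elem[w] for w in A)

    print(P(set()))       # impossible event: 0
    print(P(Omega))       # certain event: 1
    print(P({4, 5, 6}))   # "larger than three": 1/2
    print(P({1, 2, 3}))   # union {1} ∪ {2} ∪ {3}: 1/6 + 1/6 + 1/6 = 1/2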
The die example is concerned with the case that a probability space is used to model a random experiment with a finite number of elementary outcomes. In the modelling of scientific experiments, the elementary outcomes are often modelled by the set of real numbers or real-valued vectors. Much of the theoretical development of modern probability theory in the early twentieth century was concerned with the question of how ideas from basic probability with finite elementary outcome spaces can be generalized to the continuous outcome space case of real numbers and vectors. In fact, it is perhaps the most important contribution of the probability space model as defined above and originally developed by Kolmogorov (1956) to be applicable in both the discrete-finite and the continuous-infinite elementary outcome set scenarios. The study of probability spaces for Ω := R or Ω := Rn, n > 1 is a central topic in probability theory which we by and large omit here. We do however note that the σ-algebras employed when Ω := Rn, n ≥ 1 are the so-called Borel σ-algebras, commonly denoted by B for n = 1 and Bn for n > 1. The mathematical construction of these σ-algebras is beyond our scope, but for the theory of the GLM, it is not unhelpful to think of Borel σ-algebras as power sets of R or Rn, n > 1. This is factually wrong, as it can be shown that there are in fact more subsets of R or Rn, n > 1 than there are elements in the corresponding Borel σ-algebras. Nevertheless, many events of interest, such as the event that the elementary outcome of a random experiment with outcome space R falls into a real interval [a, b], are in B.

5.2 Elementary probabilities

We next discuss a few elementary aspects of probabilities defined on probability spaces. Throughout, let (Ω, A, P) denote a probability space, such that P : A → [0, 1] is a probability measure.

Interpretation

We first note that the probability P(A) of an event A is associated with at least two interpretations. From a Frequentist perspective, the probability of an event corresponds to the idealized long run frequency of observing the event A. From a Bayesian perspective, the probability of an event corresponds to the degree of belief that the event is true. Notably, both interpretations are subjective in the sense that the Frequentist perspective envisions an idealized long run frequency which can never be realized in practice, while the Bayesian belief interpretation is explicitly subjective and specific to a given observer. However, irrespective of the specific interpretation of the probability of an event, the logical rules for probabilistic inference, also known as probability calculus, are identical under both interpretations.


Basic properties We next note the following basic properties of probabilities, which follow directly from the probability space definition.

Theorem 5.2.1 (Properties of probabilities). Let (Ω, A, P) denote a probability space. Then the following properties hold.

(1) If A ⊂ B, then P(A) ≤ P(B).
(2) P(A^c) = 1 − P(A).
(3) If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).
(4) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

As an example, we prove property (4) of Theorem 5.2.1 below.

Proof. With the fact that any union of two sets A, B ⊂ Ω can be written as the union of the pairwise disjoint sets A ∩ B^c, A ∩ B, and A^c ∩ B (cf. Section 2 | Sets, sums, and functions) and with the additivity of P for disjoint events, we have

P(A ∪ B) = P(A ∩ B^c) + P(A ∩ B) + P(A^c ∩ B)
         = P(A ∩ B^c) + P(A ∩ B) + P(A^c ∩ B) + P(A ∩ B) − P(A ∩ B)
         = P((A ∩ B^c) ∪ (A ∩ B)) + P((A^c ∩ B) ∪ (A ∩ B)) − P(A ∩ B)    (5.1)
         = P(A) + P(B) − P(A ∩ B).
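Property (4) can also be verified numerically in the fair-die model sketched in Section 5.1. The following Python snippet is an illustration only; the event sets and the helper P are hypothetical names reintroduced here so that the snippet is self-contained.

    # Check of the inclusion-exclusion property (4) for two die events,
    # assuming a fair die with P({w}) = 1/6.
    from fractions import Fraction

    def P(A):
        return sum(Fraction(1, 6) for w in A)      # event probability under a fair die

    A = {2, 4, 6}                                  # "even number of pips"
    B = {1, 2, 3}                                  # "at most three pips"
    print(P(A | B) == P(A) + P(B) - P(A & B))      # True, cf. property (4)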

Independence An important feature of many probabilistic models is the independence of events. Intuitively, independence models the absence of deterministic and stochastic influences between events. Notably, independence can either be assumed and thus build into a probabilistic model by design or independence can follow from the design of the model. Regardless of the origin of the independence of events, we use the following definitions.

Definition 5.2.1 (Independent events). Let (Ω, A, P) denote a probability space. Two events A ∈ A and B ∈ A are independent, if P(A ∩ B) = P(A)P(B). (5.2)

A set of events {Ai | i ∈ I} ⊂ A with index set I is independent, if for every finite subset J ⊂ I

P(∩_{j∈J} Aj) = Π_{j∈J} P(Aj). (5.3)

•

Notably, disjoint events with positive probability, such as observing an even or an odd number of pips in the die experiment, are not independent: if P(A) > 0 and P(B) > 0, then P(A)P(B) > 0, but P(A ∩ B) = P(∅) = 0, and thus P(A ∩ B) ≠ P(A)P(B).
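In the fair-die model, independence and its absence can be made concrete with a small Python sketch (an illustration under the fair-die assumption; the event names are hypothetical):

    # Checking Definition 5.2.1 for die events under the fair-die assumption.
    from fractions import Fraction

    def P(A):
        return sum(Fraction(1, 6) for w in A)      # event probability under a fair die

    A = {2, 4, 6}       # even number of pips,  P(A) = 1/2
    B = {1, 2, 3, 4}    # at most four pips,    P(B) = 2/3
    C = {1, 3, 5}       # odd number of pips,   P(C) = 1/2

    print(P(A & B) == P(A) * P(B))   # True:  P(A ∩ B) = P({2, 4}) = 1/3 = (1/2)(2/3)
    print(P(A & C) == P(A) * P(C))   # False: disjoint events with positive probability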

Conditional probability The basis for many forms of probabilistic inference is the of an event given that another event occurs. We use the following definition.

Definition 5.2.2 (Conditional probability). Let (Ω, A, P) denote a probability space and let A, B ∈ A with P(B) > 0. Then the conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B). (5.4)

•


Without proof, we note that for any fixed B ∈ A, P(·|B) is a probability measure, i.e., P(·|B) ≥ 0, P(Ω|B) = 1, and for pairwise disjoint A1, A2, ... ∈ A, P(∪_{i=1}^{∞} Ai|B) = Σ_{i=1}^{∞} P(Ai|B). Note that the rules of probability apply to the events on the left of the vertical bar. Intuitively, P(A|B) is the fraction of times the event A occurs among those times in which the event B occurs. This fraction is already determined up to proportionality by P(A ∩ B), the idealized relative frequency or the degree of belief that the events A and B occur together. Division of P(A ∩ B) by P(B) yields a normalized measure. Furthermore, in most probabilistic models P(A|B) ≠ P(B|A). For example, the probability of exhibiting respiratory symptoms after contracting corona virus does not necessarily equal the probability of contracting corona virus when exhibiting respiratory symptoms. Finally, a mathematical extension of conditional probability to the case of P(B) = 0 is possible, but technically beyond our scope.

Rearranging the definition of conditional probability allows for expressing the probability of two events occurring jointly as the product of the conditional probability of one event given the other and the probability of the conditioning event. This fact is routinely used in the construction of probabilistic models. Formally, we have the following theorem, which follows directly from the definition of conditional probability.

Theorem 5.2.2 (Joint and conditional probabilities). Let (Ω, A, P) denote a probability space and let A, B ∈ A with P(A), P(B) > 0. Then

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A). (5.5)

For independent events, knowledge of the occurrence of one of the events does not affect the probability of the other event occurring:

Theorem 5.2.3 (Conditional probability for independent events). Let (Ω, A, P) denote a probability space and let A, B ∈ A with P(A), P(B) > 0 denote two independent events. Then

P(A|B) = P(A) and P(B|A) = P(B). (5.6)



Proof. With the definitions of conditional probability and independent events, we have

P(A|B) = P(A ∩ B) / P(B) = P(A)P(B) / P(B) = P(A), (5.7)

and analogously for P(B|A).
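Conditional probabilities in the die model can likewise be evaluated directly. The following Python sketch (fair-die assumption; helper names are hypothetical) computes P(A|B) and verifies equation (5.5) as well as Theorem 5.2.3 for two independent events:

    # Conditional probability in the fair-die model, cf. Definition 5.2.2,
    # Theorem 5.2.2, and Theorem 5.2.3.
    from fractions import Fraction

    def P(A):
        return sum(Fraction(1, 6) for w in A)     # event probability under a fair die

    def P_cond(A, B):
        return P(A & B) / P(B)                    # P(A|B), assuming P(B) > 0

    A = {2, 4, 6}       # even number of pips
    B = {1, 2, 3, 4}    # at most four pips

    print(P_cond(A, B) == P(A))                                     # True: A, B independent
    print(P(A & B) == P_cond(A, B) * P(B) == P_cond(B, A) * P(A))   # True, cf. (5.5)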

5.3 Random variables and distributions

The fundamental construct for the mathematical representation of numerical data endowed with uncertainty is the random variable. From a mathematical perspective, random variables are neither random nor variables. Instead, random variables are functions that map elements of a probability outcome space Ω into another outcome space Γ. Γ is either a discrete set X, in which case the functions are referred to as discrete random variables, or Γ is the real line R, in which case the functions are referred to as continuous random variables. If Γ is a multidimensional space, the respective functions are referred to as random vectors. In the current section, we are concerned with some fundamental aspects of random variables. In Section 5.4, we consider their multivariate generalization as random vectors.

Measurable functions and random variables First, not all functions that map elements of a probability outcome space Ω onto elements of another outcome space Γ are random variables. A fundamental feature of random variables is that they are measurable. In the mathematical literature, the terms measurable function and random variable are hence used interchangeably. To make the concept of a measurable function precise, let (Ω, A, P) denote a probability space, and let

ξ : Ω → Γ, ω ↦ ξ(ω) (5.8)

denote a function. Assume further that there exists a σ-algebra S on Γ. The tuple of a set Γ and a σ-algebra S is referred to as a measurable space (for every probability space (Ω, A, P), (Ω, A) thus forms

a measurable space). Finally, for every set S ∈ S let ξ−1(S) denote the preimage of S under ξ. The preimage of S ∈ S under ξ is the set of all ω ∈ Ω that are mapped onto elements of S by ξ, i.e.,

ξ−1(S) := {ω ∈ Ω|ξ(ω) ∈ S}. (5.9)

Now, if the preimages of all S ∈ S are elements of the σ-algebra A on Ω, then ξ is called a measurable function. Formally, we have the following definition.

Definition 5.3.1 (Measurable function). Let (Ω, A, P) be a probability space, let (Γ, S) denote a measurable space, and let

ξ : Ω → Γ, ω ↦ ξ(ω) (5.10)

be a function. If

ξ−1(S) ∈ A for all S ∈ S, (5.11)

then ξ is called a measurable function. •

A measurable function ξ : Ω → Γ is called a random variable:

Definition 5.3.2 (Random variable). Let (Ω, A, P) denote a probability space and let ξ :Ω → Γ denote a function. If ξ is a measurable function, then ξ is called a random variable. •

Probability distributions The condition of measurability of the function ξ has a fundamental consequence for the sets in S: because the probability measure P allocates a probability P(A) to all sets in A, and because, by definition of the measurability of ξ, all preimages ξ−1(S) of all sets S ∈ S are sets in A, the construction of a random variable allows for allocating a probability to all sets S ∈ S - namely the probability of the preimage ξ−1(S) ∈ A under P. This entails the induction of a probability measure on the measurable space (Γ, S). This induced probability measure is called the probability distribution of the random variable ξ and is denoted by Pξ. We use the following definition.

Definition 5.3.3 (Probability distribution). Let (Ω, A, P) denote a probability space, let (Γ, S) denote a measurable space, and let

ξ : Ω → Γ, ω ↦ ξ(ω) (5.12)

denote a random variable. Then the probability measure Pξ defined by

Pξ : S → [0, 1], S ↦ Pξ(S) := P(ξ−1(S)) (5.13)

is called the probability distribution of the random variable ξ. •

Intuitively, the notion of randomness in the values ξ(ω) of ξ is captured by this construction as follows: in a first step, an element ω ∈ Ω is selected according to the probability P({ω}) that is allocated to ω by the probability measure P on (Ω, A). In a second step, this ω is mapped onto an element ξ(ω) in Γ, which is also referred to as a realization of the random variable ξ. Across realizations, the values of ξ exhibit a probability distribution that depends both on the properties of P and ξ and is denoted by Pξ. Figure 5.1 visualizes the situation.

Clearly, if Γ = Ω, S = A and ξ := id, then P and Pξ are identical. Importantly, the union of the measurable space (Γ, S) and the probability measure Pξ forms the probability space (Γ, S, Pξ). In most probabilistic models, it is the latter probability space that takes center stage. Most commonly, the random variable outcome set is given by the real line Γ := R and the σ-algebra corresponds to the Borel σ-algebra S := B. Moreover, the probability measure Pξ is usually directly defined by means of a probability density function (see below). Notably, given the probability space (R, B, Pξ), an underlying probability space (Ω, A, P) can always be constructed post-hoc by setting ξ := id.
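The pushforward construction of Definition 5.3.3 can be illustrated in the die model with a short Python sketch (the random variable ξ is here chosen, purely for illustration, as the indicator of an even face; all names are hypothetical):

    # Induced distribution of a random variable on the fair-die space:
    # P_xi(S) := P(xi^{-1}(S)) = P({w in Omega | xi(w) in S}).
    from fractions import Fraction

    Omega = {1, 2, 3, 4, 5, 6}
    def P(A):
        return sum(Fraction(1, 6) for w in A)        # fair-die probability measure

    def xi(w):
        return 1 if w % 2 == 0 else 0                # maps a face to 0 (odd) or 1 (even)

    def P_xi(S):
        return P({w for w in Omega if xi(w) in S})   # probability of the preimage

    print(P_xi({1}))      # probability that xi realizes the value 1: 1/2
    print(P_xi({0, 1}))   # probability of the full outcome space of xi: 1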


Figure 5.1. Random variables and probability distributions. The random variable ξ : Ω → Γ maps the probability space (Ω, A, P) to the measurable space (Γ, S) and induces the distribution Pξ(S) := P(ξ−1(S)) = P({ω ∈ Ω|ξ(ω) ∈ S}). For a detailed discussion, please refer to the main text.

Notation In the following, we discuss a number of notational conventions with regards to probability distributions. We first note that random variables of the form ξ :Ω → Γ are often written as

ξ : (Ω, A) → (Γ, S) or ξ : (Ω, A, P) → (Γ, S). (5.14)

Neither notation is inherently meaningful, as the random variable ξ only maps elements of Ω onto elements of Γ. Presumably, the notations of (5.14) evolved to stress the fact that the concept of a random variable entails the theoretical overhead of probability distributions that relate to S, A and P as described above. Second, the following notational conventions for events in A are commonly employed:

{ξ ∈ S} := {ω ∈ Ω|ξ(ω) ∈ S}
{ξ = x} := {ω ∈ Ω|ξ(ω) = x}
{ξ < x} := {ω ∈ Ω|ξ(ω) < x}    (5.15)
{ξ ≤ x} := {ω ∈ Ω|ξ(ω) ≤ x}
{ξ > x} := {ω ∈ Ω|ξ(ω) > x}
{ξ ≥ x} := {ω ∈ Ω|ξ(ω) ≥ x}

for S ∈ S and x ∈ Γ and

{x1 < ξ < x2} := {ω ∈ Ω|x1 < ξ(ω) < x2}
{x1 ≤ ξ < x2} := {ω ∈ Ω|x1 ≤ ξ(ω) < x2}    (5.16)
{x1 < ξ ≤ x2} := {ω ∈ Ω|x1 < ξ(ω) ≤ x2}
{x1 ≤ ξ ≤ x2} := {ω ∈ Ω|x1 ≤ ξ(ω) ≤ x2}

for x1, x2 ∈ Γ, x1 ≤ x2 and similarly for larger than relationships. These conventions entail the following conventions for expressing the probabilistic behaviour of random variables, here demonstrated for a selection of the events listed above:

Pξ(ξ ∈ S) := P({ξ ∈ S}) = P({ω ∈ Ω|ξ(ω) ∈ S}) (5.17)
Pξ(ξ = x) := P({ξ = x}) = P({ω ∈ Ω|ξ(ω) = x}) (5.18)
Pξ(ξ ≤ x) := P({ξ ≤ x}) = P({ω ∈ Ω|ξ(ω) ≤ x}) (5.19)
Pξ(x1 ≤ ξ ≤ x2) := P({x1 ≤ ξ ≤ x2}) = P({ω ∈ Ω|x1 ≤ ξ(ω) ≤ x2}). (5.20)

Because of the redundancy in the reference to ξ in symbols of the form Pξ(ξ ≥ s), the subscript is often omitted, i.e., the expression is written as P(ξ ≥ s). Note that this notation entails the danger of confusing

the underlying probability measure P of the probability space (Ω, A, P) with the induced probability measure Pξ on (Γ, S). However, as remarked above, (Ω, A, P) plays only a background role in most applied cases and hence this danger is usually negligible. We next consider the direct specification of probability distributions by means of cumulative distribution functions, probability mass functions, and probability density functions.

Cumulative distribution functions

One way to specify the probability distribution Pξ of a random variable is to define its cumulative distribution function. We denote the cumulative distribution function of a random variable ξ by Pξ and use the following definition. Definition 5.3.4 (Cumulative distribution function). Let ξ be a real-valued random variable. Then a cumulative distribution function of ξ is a function defined as

Pξ : Γ → [0, 1], x ↦ Pξ(x) := Pξ(ξ ≤ x). (5.21)

•

Intuitively, Pξ(x) represents the probability that the random variable ξ takes on a value equal to or smaller than x. It thus follows that 1 − Pξ(x) represents the probability that the random variable ξ takes on a value larger than x. Importantly, by specifying the functional form of a cumulative distribution function Pξ, the probability of all events {ξ ≤ x} for x ∈ Γ is defined. An alternative and much more common approach to define the probability distributions of random variables is by means of probability mass and probability density functions.

Probability mass functions Probability mass functions are used to define the distributions of discrete random variables ξ : Ω → X with discrete and finite (or at least countable) outcome set X. We use the following definitions.

Definition 5.3.5 (Discrete random variable, probability mass function). Let (Ω, A, P) denote a probability space. A random variable ξ : Ω → X is called discrete, if its outcome space X contains only finitely many or countably many elements xi, i = 1, 2, .... The probability mass function (PMF) of a discrete random variable ξ is denoted by pξ and is defined as

pξ : X → [0, 1], xi ↦ pξ(xi) := Pξ(ξ = xi). (5.22)

•

Note that by definition, PMFs are non-negative and normalized, i.e.,

pξ(xi) ≥ 0 for all xi ∈ X and Σ_{xi∈X} pξ(xi) = 1, (5.23)

respectively. Both properties follow directly from the definition of a probability distribution as a probability measure.

The cumulative distribution function of a discrete random variable ξ with PMF pξ evaluates to

Pξ : X → [0, 1], x ↦ Pξ(x) := Pξ(ξ ≤ x) = Σ_{xi≤x} pξ(xi) (5.24)

and is also referred to as cumulative mass function (CMF).
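As a computational illustration, the following Python sketch evaluates the PMF and CMF of a hypothetical discrete random variable with outcome set {1, 2, 3, 4} (the probability values are chosen for demonstration only):

    # A PMF as a dictionary and its cumulative mass function (CMF).
    p = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}      # non-negative and summing to one

    def cmf(x):
        # P(xi <= x): sum of the PMF over all outcomes x_i <= x, cf. (5.24)
        return sum(p_i for x_i, p_i in p.items() if x_i <= x)

    print(sum(p.values()))    # 1.0: normalization
    print(cmf(2))             # P(xi <= 2) = 0.1 + 0.4 = 0.5
    print(1 - cmf(2))         # P(xi > 2) = 0.5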

Probability density functions

Probability density functions are used to define the distributions of continuous random variables ξ :Ω → R. We use the following definitions.

Definition 5.3.6 (Continuous random variable, probability density function). Let (Ω, A, P) denote a probability space. A random variable ξ : Ω → R is called a continuous random variable or a real-valued random variable. The probability density function (PDF) of a continuous random variable is defined as a function

pξ : R → R≥0, x ↦ pξ(x) (5.25)

with the properties


(1) ∫_{−∞}^{∞} pξ(x) dx = 1, and

(2) Pξ(x1 ≤ ξ ≤ x2) = ∫_{x1}^{x2} pξ(x) dx for all x1, x2 ∈ R with x1 ≤ x2.

•

Property (2) of Definition 5.3.6 is central to the understanding of PDFs: the probability of a continuous random variable ξ to take on values in an interval [x1, x2] ⊂ R is obtained by integrating its associated PDF on the interval [x1, x2]. Notably, the probability for a continuous random variable ξ to take on any specific value x ∈ R is zero, because by property (2) of Definition 5.3.6, we have

Pξ(ξ = x) = Pξ(x ≤ ξ ≤ x) = ∫_{x}^{x} pξ(s) ds = 0. (5.26)

Also note that the motivation of the term probability density relates closely to the physical relations between mass, density, and volume,

Mass = Density × Volume. (5.27)

Physical density is a measure of the physical mass of a material per unit volume. To obtain the physical mass of an object of a given material with arbitrary volume, the physical density of the material has to be multiplied with the volume of the object. In analogy and with the intuition of definite integrals (cf. Section 3 | Calculus), to obtain the probability mass that is associated with a given interval of the real numbers, the size of the interval has to be multiplied with the associated values of the probability density. The cumulative distribution function of a continuous real-valued random variable ξ with PDF pξ evaluates to

Pξ : R → [0, 1], x ↦ Pξ(x) = ∫_{−∞}^{x} pξ(s) ds (5.28)

and is also referred to as cumulative density function (CDF). With the intuition of indefinite integrals (cf. Section 3 | Calculus), we thus see that PDFs can be regarded as derivatives of CDFs - or vice versa, CDFs can be regarded as anti-derivatives of PDFs, in symbols

pξ(x) = d/dx Pξ(x). (5.29)

Finally, with the properties of basic integrals, we have the following possibility to evaluate the probability that a continuous random variable takes on values in an interval [x1, x2] by means of its CDF (and likewise for semi-open and open intervals):

Pξ(x1 ≤ ξ ≤ x2) = Pξ(x2) − Pξ(x1). (5.30)
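The relation between PDF, CDF, and interval probabilities can be checked numerically. The following Python sketch uses an exponential random variable with PDF pξ(x) = exp(−x) for x ≥ 0 as a stand-in example (this specific distribution is an assumption made here for illustration and is not part of the text):

    # Interval probabilities of an exponential random variable, evaluated once
    # by numerically integrating the PDF (eq. 5.28) and once via the CDF (eq. 5.30).
    import math

    def pdf(x):
        return math.exp(-x) if x >= 0 else 0.0

    def cdf(x):
        return 1.0 - math.exp(-x) if x >= 0 else 0.0   # antiderivative of the PDF

    def integrate(f, a, b, n=100000):
        # midpoint Riemann-sum approximation of the definite integral of f on [a, b]
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

    x1, x2 = 0.5, 2.0
    print(integrate(pdf, x1, x2))   # P(x1 <= xi <= x2) by integrating the PDF
    print(cdf(x2) - cdf(x1))        # the same probability via the CDF, eq. (5.30)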

5.4 Random vectors and multivariate probability distributions

Random vectors Random vectors are the multivariate extension of random variables. We use the following definition.

Definition 5.4.1 (Random vector). Let (Ω, A, P) denote a probability space and let (Γn, Sn) denote the n-dimensional measurable space. Then a function

ξ : Ω → Γn, ω ↦ ξ(ω) (5.31)

is called an n-dimensional random vector, if it is a measurable function, i.e., if

ξ−1(S) ∈ A for all S ∈ Sn. (5.32)

Without proof, we note that a multivariate function ξ = (ξ1, ..., ξn)^T is a measurable function, if its component functions ξ1, ..., ξn are measurable functions. This implies that the component functions of a random vector are random variables. n-dimensional random vectors may thus be conceived as the concatenation of n random variables, while random variables are one-dimensional random vectors.


Multivariate probability distributions Multivariate probability distributions are the probability distributions of random vectors. In complete analogy to the random variable scenario, we use the following definition.

Definition 5.4.2 (Multivariate probability distribution). Let (Ω, A, P) denote a probability space, let (Γn, Sn) denote the n-dimensional measurable space, and let

ξ : Ω → Γn, ω ↦ ξ(ω) (5.33)

denote a random vector. Then the probability measure Pξ defined by

Pξ : Sn → [0, 1], S ↦ Pξ(S) := P(ξ−1(S)) = P({ω ∈ Ω|ξ(ω) ∈ S}) (5.34)

is called the multivariate probability distribution of the random vector ξ. •

For simplicity, the multivariate nature of the probability distribution of a random vector is often left implicit, such that one simply speaks of the probability distribution of a random vector.

Notation The notational conventions for events discussed in Section 5.3 extend to the multivariate case. For example, for S ∈ Sn and x ∈ Γn, we have

Pξ(ξ ∈ S) := P({ξ ∈ S}) = P({ω ∈ Ω|ξ(ω) ∈ S})
Pξ(ξ = x) := P({ξ = x}) = P({ω ∈ Ω|ξ(ω) = x})    (5.35)
Pξ(ξ ≤ x) := P({ξ ≤ x}) = P({ω ∈ Ω|ξ(ω) ≤ x})
Pξ(x1 ≤ ξ ≤ x2) := P({x1 ≤ ξ ≤ x2}) = P({ω ∈ Ω|x1 ≤ ξ(ω) ≤ x2}).

Note that relational operators such as ≤ are understood to hold component-wise for multivariate entities, e.g., x ≤ y for x, y ∈ Γn is understood as xi ≤ yi for all i = 1, ..., n.

Multivariate cumulative distribution functions

One way to specify the probability distribution Pξ of a random vector is to define its multivariate cumulative distribution function. In analogy to the random variable scenario, we use the following definition. Definition 5.4.3 (Multivariate cumulative distribution function). Let ξ be a random vector. Then a multivariate cumulative distribution function of ξ is a function

Pξ : Γn → [0, 1], x ↦ Pξ(x) := Pξ(ξ ≤ x). (5.36)

•

More commonly employed alternatives for specifying the probability distributions of random vectors are multivariate probability mass and density functions. The intuitions for probability mass and density functions established for random variables extend to random vectors.

Multivariate probability mass functions Multivariate probability mass functions are used to define the distributions of discrete random vectors. We use the following definitions.

Definition 5.4.4 (Discrete random vector, multivariate probability mass function). Let (Ω, A, P) denote a probability space. A random vector ξ : Ω → X is called discrete, if its outcome space X contains only finitely many or countably many elements xi, i = 1, 2, .... The multivariate probability mass function of a discrete random vector ξ is denoted by pξ and is defined as

pξ : X → [0, 1], xi ↦ pξ(xi) := Pξ(ξ = xi). (5.37)

•

Like their univariate counterparts, multivariate PMFs are non-negative and normalized.


Example. To exemplify the concept of multivariate PMF, we consider a discrete two-dimensional random vector ξ = (ξ1, ξ2) taking values in X = X1 × X2 with X1 := {1, 2, 3} and X2 := {1, 2, 3, 4}. An exemplary two-dimensional PMF of the form

pξ : {1, 2, 3} × {1, 2, 3, 4} → [0, 1], (x1, x2) ↦ pξ(x1, x2) (5.38)

is specified in Table 5.1. Note that Σ_{x1=1}^{3} Σ_{x2=1}^{4} pξ(x1, x2) = 1.

pξ(x1, x2)   x2 = 1   x2 = 2   x2 = 3   x2 = 4
x1 = 1        0.1      0.0      0.2      0.1
x1 = 2        0.1      0.2      0.0      0.0
x1 = 3        0.0      0.1      0.1      0.1

Table 5.1. An exemplary bivariate PMF.
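The PMF of Table 5.1 can be represented as a two-dimensional array, which also makes the normalization easy to verify (a sketch using NumPy; the array name p is hypothetical):

    # Joint PMF of Table 5.1; rows correspond to x1 = 1, 2, 3, columns to x2 = 1, ..., 4.
    import numpy as np

    p = np.array([[0.1, 0.0, 0.2, 0.1],
                  [0.1, 0.2, 0.0, 0.0],
                  [0.0, 0.1, 0.1, 0.1]])

    print(p.sum())     # 1.0: the PMF is normalized
    print(p[0, 2])     # p_xi(x1 = 1, x2 = 3) = 0.2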

Multivariate probability density functions Multivariate probability density functions are used to define the distributions of continuous random vectors. We use the following definitions.

Definition 5.4.5 (Continuous random vector, multivariate probability density function). Let (Ω, A, P) denote a probability space. A random vector ξ :Ω → Rn is called a continuous random vector. The multivariate probability density function of a continuous random vector is defined as a function

pξ : Rn → R≥0, x ↦ pξ(x), (5.39)

such that

(1) ∫_{Rn} pξ(x) dx = 1, and

(2) Pξ(x1 ≤ ξ ≤ x2) = ∫_{x11}^{x21} ··· ∫_{x1n}^{x2n} pξ(s1, ..., sn) ds1 ··· dsn for all x1, x2 ∈ Rn with x1 ≤ x2.

•

As in the random variable scenario, we have

Pξ(ξ = x) = Pξ(x ≤ ξ ≤ x) = ∫_{x1}^{x1} ··· ∫_{xn}^{xn} pξ(s1, ..., sn) ds1 ··· dsn = 0. (5.40)

As for the probability distributions of random vectors, we often omit the qualifying adjective multivariate when discussing the PMFs and PDFs of random vectors.

Marginal distributions Marginal distributions are the probability distributions of the components of random vectors. In the following, we first define marginal distributions and discuss how univariate marginal distributions can be evaluated based on multivariate PMFs and PDFs. We then discuss an example for the marginal distributions of a two-dimensional discrete random vector. Examples for marginal distributions of multivariate continuous vectors are discussed in the context of Gaussian distributions in Section 7 | Probability distributions.

Definition 5.4.6 (Marginal random variables and vectors, marginal probability distributions). Let (Ω, A, P) denote a probability space, let ξ : Ω → Γn denote a random vector, let Pξ denote the probability distribution of ξ, and let Γ(i) denote the outcome space of the ith component of ξ such that Γn = ×_{i=1}^{n} Γ(i). Then the probability distribution defined by

Pξi : S → [0, 1], S ↦ Pξi(S) := Pξ(Γ(1) × ··· × Γ(i−1) × S × Γ(i+1) × ··· × Γ(n)) for S ⊆ Γ(i) (5.41)

is called the ith univariate marginal distribution of ξ. •


Without proof, we note that marginal distributions can be evaluated from multivariate PMFs and PDFs by means of summation and integration, respectively.

Theorem 5.4.1 (Marginal probability mass functions, marginal probability density functions). Let ξ denote a discrete random vector with probability mass function pξ. Then the probability mass function of the ith component ξi of ξ evaluates to

pξi : R → [0, 1], xi ↦ pξi(xi) := Σ_{x1} ··· Σ_{xi−1} Σ_{xi+1} ··· Σ_{xn} pξ(x). (5.42)

Similarly, let ξ denote a continuous random vector with probability density function pξ. Then the probability density function of the ith component ξi of ξ evaluates to

pξi : R → R≥0, xi ↦ pξi(xi) := ∫_{x1} ··· ∫_{xi−1} ∫_{xi+1} ··· ∫_{xn} pξ(x) dx1 ··· dxi−1 dxi+1 ··· dxn. (5.43)



Example To exemplify the concept of a marginal PMF, we reconsider the discrete two-dimensional random vector ξ = (ξ1, ξ2) taking values in X = X1 × X2 with X1 := {1, 2, 3} and X2 := {1, 2, 3, 4} and

PMF specified in Table 5.1. Based on Theorem 5.4.1, the marginal PMFs pξ1 and pξ2 of ξ evaluate as specified in Table 5.2 below. Note that Σ_{x1=1}^{3} pξ1(x1) = 1 and Σ_{x2=1}^{4} pξ2(x2) = 1.

pξ(x1, x2)   x2 = 1   x2 = 2   x2 = 3   x2 = 4   pξ1(x1)
x1 = 1        0.1      0.0      0.2      0.1      0.4
x1 = 2        0.1      0.2      0.0      0.0      0.3
x1 = 3        0.0      0.1      0.1      0.1      0.3
pξ2(x2)       0.2      0.3      0.3      0.2

Table 5.2. Exemplary marginal PMFs.
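Starting from the array representation of the joint PMF introduced above, the marginal PMFs of Table 5.2 are obtained by summation along the respective other dimension (a NumPy sketch; all names are hypothetical):

    # Marginal PMFs of Table 5.2 by summation over the joint PMF of Table 5.1.
    import numpy as np

    p = np.array([[0.1, 0.0, 0.2, 0.1],
                  [0.1, 0.2, 0.0, 0.0],
                  [0.0, 0.1, 0.1, 0.1]])

    p_x1 = p.sum(axis=1)     # marginal PMF of xi_1: [0.4, 0.3, 0.3]
    p_x2 = p.sum(axis=0)     # marginal PMF of xi_2: [0.2, 0.3, 0.3, 0.2]
    print(p_x1, p_x1.sum())  # normalization of the first marginal
    print(p_x2, p_x2.sum())  # normalization of the second marginal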

Conditional distributions

Recall that for a probability space (Ω, A, P) and two events A, B ∈ A with P(B) > 0, the conditional probability of event A given event B is defined as

P(A|B) = P(A ∩ B) / P(B). (5.44)

Analogously, for the distribution of two random variables ξ1 and ξ2, the conditional probability distribution of ξ1 given ξ2 is defined in terms of events A = {ξ1 ∈ X1} and B = {ξ2 ∈ X2}. To introduce conditional distributions, we first consider the case of two-dimensional (bivariate) discrete and continuous random vectors.

Definition 5.4.7 (Conditional PMF, discrete conditional distribution). Let ξ = (ξ1, ξ2)^T denote a discrete random vector with PMF pξ = pξ1,ξ2 and marginal PMFs pξ1 and pξ2. Then the conditional PMF of ξ1 given ξ2 = x2 is defined as

pξ1|ξ2 : R → [0, 1], x1 ↦ pξ1|ξ2(x1|x2) := pξ1,ξ2(x1, x2) / pξ2(x2) for pξ2(x2) > 0 (5.45)

and the conditional PMF of ξ2 given ξ1 = x1 is defined as

pξ2|ξ1 : R → [0, 1], x2 ↦ pξ2|ξ1(x2|x1) := pξ1,ξ2(x1, x2) / pξ1(x1) for pξ1(x1) > 0. (5.46)

The discrete distributions with PMFs pξ1|ξ2 (·|ξ2 = x2) and pξ2|ξ1(·|ξ1 = x1) are called the conditional distributions of ξ1 given ξ2 = x2 and ξ2 given ξ1 = x1, respectively. •


In complete analogy to the conditional probabilities of events, we have

pξ1|ξ2(x1|x2) = pξ1,ξ2(x1, x2) / pξ2(x2) = P({ξ1 = x1} ∩ {ξ2 = x2}) / P(ξ2 = x2) (5.47)

and likewise for pξ2|ξ1. Like conditional probabilities, conditional PMFs behave like proper probability measures in their first argument.

Example. Consider the earlier example of the two-dimensional PMF pξ1,ξ2 and its marginal PMFs pξ1 and pξ2 documented in Table 5.1 and Table 5.2. For this example, the conditional PMFs of ξ2 given

ξ1 = 1, ξ1 = 2, and ξ1 = 3 are evaluated in Table 5.3 below. Note the qualitative similarity of pξ1,ξ2 (x1, x2) and pξ2|ξ1 (x2|x1).

pξ2|ξ1(x2|x1)        x2 = 1          x2 = 2          x2 = 3          x2 = 4
pξ2|ξ1(x2|x1 = 1)    0.1/0.4 = 1/4   0.0/0.4 = 0     0.2/0.4 = 1/2   0.1/0.4 = 1/4    Σ_{x2=1}^{4} pξ2|ξ1(x2|x1) = 1
pξ2|ξ1(x2|x1 = 2)    0.1/0.3 = 1/3   0.2/0.3 = 2/3   0.0/0.3 = 0     0.0/0.3 = 0      Σ_{x2=1}^{4} pξ2|ξ1(x2|x1) = 1
pξ2|ξ1(x2|x1 = 3)    0.0/0.3 = 0     0.1/0.3 = 1/3   0.1/0.3 = 1/3   0.1/0.3 = 1/3    Σ_{x2=1}^{4} pξ2|ξ1(x2|x1) = 1

Table 5.3. Exemplary conditional PMF.
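The conditional PMFs of Table 5.3 correspond to a row-wise division of the joint PMF array by the marginal probabilities of ξ1, cf. equation (5.46) (a NumPy sketch reusing the hypothetical array representation introduced above):

    # Conditional PMFs p(x2 | x1) of Table 5.3: divide each row of the joint PMF
    # by the corresponding marginal probability p_xi1(x1).
    import numpy as np

    p = np.array([[0.1, 0.0, 0.2, 0.1],
                  [0.1, 0.2, 0.0, 0.0],
                  [0.0, 0.1, 0.1, 0.1]])
    p_x1 = p.sum(axis=1)

    p_x2_given_x1 = p / p_x1[:, None]
    print(p_x2_given_x1[0])            # p(x2 | x1 = 1): [0.25, 0.0, 0.5, 0.25]
    print(p_x2_given_x1.sum(axis=1))   # each conditional PMF sums to one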

Similarly, we have the following definition for conditional distributions of continuous random variables.

Definition 5.4.8 (Conditional PDF, continuous conditional distribution). Let ξ = (ξ1, ξ2)^T denote a continuous random vector with PDF pξ = pξ1,ξ2 and marginal PDFs pξ1 and pξ2. Then the conditional PDF of ξ1 given ξ2 is defined as

pξ1|ξ2 : R → R≥0, x1 ↦ pξ1|ξ2(x1|x2) := pξ1,ξ2(x1, x2) / pξ2(x2) for pξ2(x2) > 0, (5.48)

and the conditional PDF of ξ2 given ξ1 = x1 is defined as

pξ2|ξ1 : R → R≥0, x2 ↦ pξ2|ξ1(x2|x1) := pξ1,ξ2(x1, x2) / pξ1(x1) for pξ1(x1) > 0. (5.49)

The continuous distributions with PDFs pξ1|ξ2(·|ξ2 = x2) and pξ2|ξ1(·|ξ1 = x1) are called the conditional distributions of ξ1 given ξ2 and of ξ2 given ξ1, respectively. •

Finally, the two-dimensional scenario discussed thus far can be generalized to the multivariate scenario in terms of the following definition, which covers both the discrete and continuous settings.

Definition 5.4.9 (Multivariate conditional PMF and PDF). Let ξ = (ξ1, ξ2) denote an n-dimensional random vector, where ξ1 and ξ2 denote k- and (n − k)-dimensional random vectors, respectively. Let pξ1,ξ2 denote the PMF or PDF of ξ and let pξ2 denote the (n − k)-dimensional marginal PMF or PDF of ξ2. Then, for ξ2 = x2, the conditional k-dimensional PMF or PDF of ξ1 given ξ2 is defined as

pξ1|ξ2 : Rk → R≥0, x1 ↦ pξ1|ξ2(x1|x2) := pξ1,ξ2(x1, x2) / pξ2(x2) for pξ2(x2) > 0. (5.50)

•

Independence

In analogy to the definition of independent events (cf. 5.2.1), two random variables ξ1 and ξ2 are called independent, if {ξ1 ∈ S1} and {ξ2 ∈ S2} are independent events for all S1 and S2. We use the following definition.

Definition 5.4.10 (Independent random variables). Two random variables ξ1 : Ω → Γ(1) and ξ2 : Ω → Γ(2) are independent, if for every S1 ⊆ Γ(1) and S2 ⊆ Γ(2) it holds that

P(ξ1 ∈ S1, ξ2 ∈ S2) = P(ξ1 ∈ S1)P(ξ2 ∈ S2). (5.51) •


As in the elementary probability scenario, independence of random variables implies that

P({ξ1 ∈ S1}|{ξ2 ∈ S2}) = P({ξ1 ∈ S1}) (5.52) or, intuitively, that knowledge of the fact that ξ2 ∈ S2 does not affect the probability of the event ξ1 ∈ S1. Without proof, we note the following theorem that transfers the definition of independent random variables to their respective PMF or PDF.

Theorem 5.4.2 (Independence and PMF/PDF factorization). Let ξ1 :Ω → X1 and ξ2 :Ω → X2 denote discrete random variables with PMF pξ1,ξ2 and marginal PMFs pξ1 and pξ2 , respectively. Then ξ1 and ξ2 are independent, if and only if

pξ1,ξ2 (x1, x2) = pξ1 (x1)pξ2 (x2) for all (x1, x2) ∈ X1 × X2. (5.53)

Similarly, let ξ1 and ξ2 denote continuous random variables with PDF pξ1,ξ2 and marginal PDFs pξ1 and pξ2 , respectively. Then ξ1 and ξ2 are independent, if and only if

pξ1,ξ2(x1, x2) = pξ1(x1)pξ2(x2) for all (x1, x2) ∈ R2. (5.54)


Notably, the PMF or PDF property

pξ1,ξ2 (x1, x2) = pξ1 (x1)pξ2 (x2) (5.55) is referred to as factorization of the PMF or PDF. The independence of two random variables is thus equivalent to the factorization of their bivariate PMF or PDF.

Example Consider the earlier example of a bivariate PMF and its associated marginal PMFs (cf. Table 5.2). Because

pξ1,ξ2(1, 1) = 0.1 ≠ 0.08 = pξ1(1)pξ2(1), (5.56)

the random variables ξ1 and ξ2 are not independent. For the marginal distributions specified in Table 5.2, the bivariate PMF for independent ξ1 and ξ2 is documented in Table 5.4 below.

pξ1,ξ2(x1, x2)   x2 = 1   x2 = 2   x2 = 3   x2 = 4   pξ1(x1)
x1 = 1            0.08     0.12     0.12     0.08     0.40
x1 = 2            0.06     0.09     0.09     0.06     0.30
x1 = 3            0.06     0.09     0.09     0.06     0.30
pξ2(x2)           0.20     0.30     0.30     0.20

Table 5.4. A factorized PMF
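Computationally, the factorization criterion of Theorem 5.4.2 amounts to comparing the joint PMF array with the outer product of its marginals (a NumPy sketch; the outer product reproduces Table 5.4):

    # Independence check by PMF factorization, cf. Theorem 5.4.2.
    import numpy as np

    p = np.array([[0.1, 0.0, 0.2, 0.1],
                  [0.1, 0.2, 0.0, 0.0],
                  [0.0, 0.1, 0.1, 0.1]])
    p_x1, p_x2 = p.sum(axis=1), p.sum(axis=0)   # marginal PMFs of Table 5.2

    p_factorized = np.outer(p_x1, p_x2)         # p_xi1(x1) * p_xi2(x2), Table 5.4
    print(np.allclose(p, p_factorized))         # False: xi_1 and xi_2 are not independent
    print(p_factorized)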

The bivariate case of two independent random variables is generalized to the case of n independent random variables in the following definition.

Definition 5.4.11 (n independent random variables). n random variables ξ1, ..., ξn are independent, if for every S1 ⊆ Γ(1), ..., Sn ⊆ Γ(n),

P(ξ1 ∈ S1, ..., ξn ∈ Sn) = Π_{i=1}^{n} P(ξi ∈ Si). (5.57)

If the random variables have a multivariate PMF or PDF pξ1,...,ξn (x1, ..., xn) with marginal PMFs or

PDFs pξi , i = 1, ..., n, then independence holds if

pξ1,...,ξn(x1, ..., xn) = Π_{i=1}^{n} pξi(xi). (5.58)

•

The special case of n independent random variables with identical marginal distributions serves as a fundamental assumption in many statistical settings. We use the following definition.


Definition 5.4.12 (Independent and identically distributed random variables). n random variables ξ1, ..., ξn are called independent and identically distributed (iid), if and only if

(1) ξ1, ..., ξn are independent random variables, and

(2) each ξi has the same marginal distribution for i = 1, ..., n. •

In Section 7 | Probability distributions, we consider the case of n iid Gaussian random variables and how their joint distribution can be represented by a multivariate Gaussian distribution.
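As a computational illustration of the iid concept, the following Python sketch draws one realization of n iid standard normal random variables (the use of NumPy's random number generator and the chosen parameter values are assumptions made here for demonstration purposes):

    # One realization of (xi_1, ..., xi_n) for n iid standard normal random variables.
    import numpy as np

    rng = np.random.default_rng(0)                       # seeded random number generator
    samples = rng.normal(loc=0.0, scale=1.0, size=5)     # n = 5 iid draws from N(0, 1)
    print(samples)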

5.5 Bibliographic remarks

The presented material is standard and can be found in any introductory textbook on probability and statistics. DeGroot and Schervish (2012) and Wasserman (2004) are the main sources for the presentation provided here. Excellent introductions to modern probability theory include Billingsley (1995), Fristedt and Gray (1998), Rosenthal (2006), and, from a statistical perspective, Shao (2003).

5.6 Study questions

1. Write down the definition of a probability space.
2. Write down the definition of the independence of two events A and B.
3. Write down the definition of a random variable.
4. Write down the definition of the cumulative distribution function of a random variable.
5. Write down the definitions of a PMF and a PDF.
6. Write down the definition of a random vector.
7. Write down the definition of the cumulative distribution function of a random vector.
8. Write down the definition of a multivariate PMF and a multivariate PDF.

9. Write down the definition of the independence of n random variables ξi, i = 1, ..., n.

10. What does it mean for n random variables ξ1, ..., ξn to be iid?

References

Billingsley, P. (1995). Probability and Measure. Wiley Series in Probability and Statistics. Wiley, New York, 3rd edition.
DeGroot, M. H. and Schervish, M. J. (2012). Probability and Statistics. Addison-Wesley, Boston, 4th edition.
Fristedt, B. E. and Gray, L. F. (1998). A Modern Approach to Probability Theory. Birkhäuser, Boston.
Kolmogorov, A. N. (1956). Foundations of the Theory of Probability. Chelsea Publishing Company, New York.
Rosenthal, J. S. (2006). A First Look at Rigorous Probability Theory. World Scientific, Singapore; Hackensack, N.J., 2nd edition.
Shao, J. (2003). Mathematical Statistics. Springer Texts in Statistics. Springer, New York, 2nd edition.
Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer, New York.
