Chapter 1

Probability Theory

1.1 Set Theory

One of the main objectives of a statistician is to draw conclusions about a population of objects by conducting an experiment. The first step in this endeavor is to identify the possible outcomes or, in statistical terminology, the sample space.

Definition 1.1.1 The set, S, of all possible outcomes of a particular experiment is called the sample space for the experiment.

If the experiment consists of tossing a coin, the sample space contains two outcomes, heads and tails; thus, S = {H, T}. Consider an experiment where the observation is reaction time to a certain stimulus. Here, the sample space would consist of all positive numbers, that is, S = (0, ∞).

The sample space can be classified into two types: countable and uncountable. If the elements of a sample space can be put into 1–1 correspondence with a subset of the integers, the sample space is countable. Otherwise, it is uncountable.

Definition 1.1.2 An event is any collection of possible outcomes of an experiment, that is, any subset of S (including S itself).

Let A be an event, a subset of S. We say the event A occurs if the outcome of the experiment is in the set A.

We first define two relationships of sets, which allow us to order and equate sets:

A ⊂ B ⇔ x ∈ A ⇒ x ∈ B (containment)

A = B ⇔ A ⊂ B and B ⊂ A. (equality)

Given any two events (or sets) A and B, we have the following elementary set operations: Union: The union of A and B, written A∪B, is the set of elements that belong to either A or B or both:

A ∪ B = {x : x ∈ A or x ∈ B}.

Intersection: The intersection of A and B, written A ∩ B, is the set of elements that belong to both A and B:

A ∩ B = {x : x ∈ A and x ∈ B}.

Complementation: The complement of A, written Ac, is the set of all elements that are not in A:

Ac = {x : x ∉ A}.

Example 1.1.1 (Event operations) Consider the experiment of selecting a card at random from a standard deck and noting its suit: clubs (C), diamonds (D), hearts (H), or spades (S). The sample space is S = {C, D, H, S}, and some possible events are

A = {C,D}, and B = {D,H,S}.

From these events we can form

A ∪ B = {C, D, H, S}, A ∩ B = {D}, and Ac = {H, S}.

Furthermore, notice that A ∪ B = S and (A ∪ B)c = ∅, where ∅ denotes the empty set (the set consisting of no elements).

Theorem 1.1.1 For any three events, A, B, and C, defined on a sample space S,

a. Commutativity. A ∪ B = B ∪ A, A ∩ B = B ∩ A.

b. Associativity. A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C.

c. Distributive Laws. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

d. DeMorgan’s Laws. (A ∪ B)c = Ac ∩ Bc, (A ∩ B)c = Ac ∪ Bc.
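These identities are easy to check mechanically. The following is a quick sanity check using Python's built-in set type; the sample space and events are illustrative choices, not from the text.

```python
# Sanity check of the distributive and DeMorgan laws (Theorem 1.1.1)
# on small, arbitrarily chosen sets.
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7}

def comp(X, S=S):
    """Complement of X relative to the sample space S."""
    return S - X

# Distributive Laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# DeMorgan's Laws
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)
```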

The operations of union and intersection can be extended to infinite collections of sets as well. If A1, A2, A3, ... is a collection of sets, all defined on a sample space S, then

∪_{i=1}^∞ Ai = {x ∈ S : x ∈ Ai for some i},

∩_{i=1}^∞ Ai = {x ∈ S : x ∈ Ai for all i}.

For example, let S = (0, 1] and define Ai = [(1/i), 1]. Then

∪_{i=1}^∞ Ai = ∪_{i=1}^∞ [(1/i), 1] = (0, 1],

∩_{i=1}^∞ Ai = ∩_{i=1}^∞ [(1/i), 1] = {1}.

It is also possible to define unions and intersections over uncountable collections of sets. If Γ is an index set (a set of elements to be used as indices), then

∪_{a∈Γ} Aa = {x ∈ S : x ∈ Aa for some a},

∩_{a∈Γ} Aa = {x ∈ S : x ∈ Aa for all a}.

Definition 1.1.3 Two events A and B are disjoint (or mutually exclusive) if A ∩ B = ∅. The events A1, A2, ... are pairwise disjoint (or mutually exclusive) if Ai ∩ Aj = ∅ for all i ≠ j.

Definition 1.1.4 If A1, A2, ... are pairwise disjoint and ∪_{i=1}^∞ Ai = S, then the collection A1, A2, ... forms a partition of S.

1.2 Basics of Probability Theory

When an experiment is performed, the realization of the experiment is an outcome in the sample space. If the experiment is performed a number of times, different outcomes may occur each time or some outcomes may repeat. This "frequency of occurrence" of an outcome can be thought of as a probability. More probable outcomes occur more frequently. If the outcomes of an experiment can be described probabilistically, we are on our way to analyzing the experiment statistically.

1.2.1 Axiomatic Foundations

Definition 1.2.1 A collection of subsets of S is called a sigma algebra (or Borel field), denoted by B, if it satisfies the following three properties:

a. ∅ ∈ B (the empty set is an element of B).

b. If A ∈ B, then Ac ∈ B (B is closed under complementation).

c. If A1, A2, ... ∈ B, then ∪_{i=1}^∞ Ai ∈ B (B is closed under countable unions).

Example 1.2.1 (Sigma algebra-I) If S is finite or countable, these technicalities really do not arise, and we can define, for a given sample space S,

B = {all subsets of S, including S itself}.

If S has n elements, there are 2^n sets in B. For example, if S = {1, 2, 3}, then B is the following collection of 2³ = 8 sets: {1}, {1, 2}, {1, 2, 3}, {2}, {1, 3}, ∅, {3}, {2, 3}.
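The count 2^n can be verified by generating the power set directly; a minimal sketch with the standard library:

```python
# Enumerate all subsets of S = {1, 2, 3}; there should be 2^3 = 8 of
# them, including the empty set and S itself (Example 1.2.1).
from itertools import chain, combinations

S = {1, 2, 3}
power_set = list(chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1)))

assert len(power_set) == 2 ** len(S)   # 8 subsets
assert () in power_set                 # the empty set
assert (1, 2, 3) in power_set          # S itself
```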

Example 1.2.2 (Sigma algebra-II) Let S = (−∞, ∞), the real line. Then B is chosen to contain all sets of the form

[a, b], (a, b], (a, b), [a, b)

for all real numbers a and b. Also, from the properties of B, it follows that B contains all sets that can be formed by taking (possibly countably infinite) unions and intersections of sets of the above varieties.

Definition 1.2.2 Given a sample space S and an associated sigma algebra B, a probability function is a function P with domain B that satisfies

1. P (A) ≥ 0 for all A ∈ B.

2. P (S) = 1.

3. If A1, A2, ... ∈ B are pairwise disjoint, then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).

The three properties given in the above definition are usually referred to as the Axioms of Probability or the Kolmogorov Axioms. Any function P that satisfies the Axioms of Probability is called a probability function. The following gives a common method of defining a legitimate probability function.

Theorem 1.2.1 Let S = {s1, ..., sn} be a finite set. Let B be any sigma algebra of subsets of S. Let p1, ..., pn be nonnegative numbers that sum to 1. For any A ∈ B, define P(A) by

P(A) = Σ_{i: si ∈ A} pi.

(The sum over an empty set is defined to be 0.) Then P is a probability function on B. This remains true if S = {s1, s2, ...} is a countable set.

Proof: We will give the proof for finite S. For any A ∈ B, P(A) = Σ_{i: si ∈ A} pi ≥ 0, because every pi ≥ 0. Thus, Axiom 1 is true. Now,

P(S) = Σ_{i: si ∈ S} pi = Σ_{i=1}^n pi = 1.

Thus, Axiom 2 is true. Let A1, ..., Ak denote pairwise disjoint events. (B contains only a finite number of sets, so we need to consider only finite disjoint unions.) Then,

P(∪_{i=1}^k Ai) = Σ_{j: sj ∈ ∪_{i=1}^k Ai} pj = Σ_{i=1}^k Σ_{j: sj ∈ Ai} pj = Σ_{i=1}^k P(Ai).

The first and third equalities are true by the definition of P(A). The disjointness of the Ai's ensures that the second equality is true, because the same pj's appear exactly once on each side of the equality. Thus, Axiom 3 is true and Kolmogorov's Axioms are satisfied. □

Example 1.2.3 (Defining probabilities-II) The game of darts is played by throwing a dart at a board and receiving a score corresponding to the number assigned to the region in which the dart lands. For a novice player, it seems reasonable to assume that the probability of the dart hitting a particular region is proportional to the area of the region. Thus, a bigger region has a higher probability of being hit. The dart board has radius r and the distance between rings is r/5. If we make the assumption that the board is always hit, then we have

P(scoring i points) = (Area of region i) / (Area of dart board).

For example,

P(scoring 1 point) = (πr² − π(4r/5)²) / πr² = 1 − (4/5)².

It is easy to derive the general formula, and we find that

P(scoring i points) = ((6 − i)² − (5 − i)²) / 5²,   i = 1, ..., 5,

independent of π and r. The sum of the areas of the disjoint regions equals the area of the dart board. Thus, the probabilities that have been assigned to the five outcomes sum to 1, and, by Theorem 1.2.1, this is a probability function.

1.2.2 The Calculus of Probabilities

Theorem 1.2.2 If P is a probability function and A is any set in B, then

a. P (∅) = 0, where ∅ is the empty set.

b. P (A) ≤ 1.

c. P (Ac) = 1 − P (A).

Theorem 1.2.3 If P is a probability function and A and B are any sets in B, then

a. P (B ∩ Ac) = P (B) − P (A ∩ B).

b. P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

c. If A ⊂ B, then P (A) ≤ P (B).
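These identities can be spot-checked by enumeration on a small uniform sample space; the events below are arbitrary illustrative choices, not from the text.

```python
# Spot-checking Theorems 1.2.2 and 1.2.3 with exact rational
# arithmetic on a 12-point uniform sample space.
from fractions import Fraction

S = set(range(12))
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6, 7, 8}

def prob(E):
    return Fraction(len(E), len(S))

assert prob(set()) == 0                                # P(∅) = 0
assert prob(S - A) == 1 - prob(A)                      # P(Ac) = 1 − P(A)
assert prob(B - A) == prob(B) - prob(A & B)            # Theorem 1.2.3a
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)  # Theorem 1.2.3b
```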

Formula (b) of Theorem 1.2.3 gives a useful inequality for the probability of an intersection. Since P(A ∪ B) ≤ 1, we have

P(A ∩ B) ≥ P(A) + P(B) − 1.

This inequality is a special case of what is known as Bonferroni's inequality.

Theorem 1.2.4 If P is a probability function, then

a. P(A) = Σ_{i=1}^∞ P(A ∩ Ci) for any partition C1, C2, ...;

b. P(∪_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai) for any sets A1, A2, .... (Boole's Inequality)

Proof: Since C1, C2, ... form a partition, we have that Ci ∩ Cj = ∅ for all i ≠ j, and S = ∪_{i=1}^∞ Ci. Hence,

A = A ∩ S = A ∩ (∪_{i=1}^∞ Ci) = ∪_{i=1}^∞ (A ∩ Ci),

where the last equality follows from the Distributive Law. We therefore have

P(A) = P(∪_{i=1}^∞ (A ∩ Ci)).

Now, since the Ci are disjoint, the sets A ∩ Ci are also disjoint, and from the properties of a probability function we have

P(∪_{i=1}^∞ (A ∩ Ci)) = Σ_{i=1}^∞ P(A ∩ Ci),

establishing (a). To establish (b) we first construct a disjoint collection A1*, A2*, ..., with the property that ∪_{i=1}^∞ Ai* = ∪_{i=1}^∞ Ai. We define Ai* by

A1* = A1,   Ai* = Ai \ (∪_{j=1}^{i−1} Aj),   i = 2, 3, ...,

where the notation A \ B denotes the part of A that does not intersect with B. In more familiar symbols, A \ B = A ∩ Bc. It should be easy to see that ∪_{i=1}^∞ Ai* = ∪_{i=1}^∞ Ai, and we therefore have

P(∪_{i=1}^∞ Ai) = P(∪_{i=1}^∞ Ai*) = Σ_{i=1}^∞ P(Ai*),

where the last equality follows since the Ai*'s are disjoint. To see this, we write

Ai* ∩ Ak* = {Ai \ (∪_{j=1}^{i−1} Aj)} ∩ {Ak \ (∪_{j=1}^{k−1} Aj)}   (definition of Ai*)
          = {Ai ∩ (∪_{j=1}^{i−1} Aj)c} ∩ {Ak ∩ (∪_{j=1}^{k−1} Aj)c}   (definition of \)
          = Ai ∩ (∩_{j=1}^{i−1} Ajc) ∩ Ak ∩ (∩_{j=1}^{k−1} Ajc).   (DeMorgan's Laws)

Now if i > k, the first intersection above will be contained in the set Akc, which will have an empty intersection with Ak. If k > i, the argument is similar. Further, by construction Ai* ⊂ Ai, so P(Ai*) ≤ P(Ai) and we have

Σ_{i=1}^∞ P(Ai*) ≤ Σ_{i=1}^∞ P(Ai),

establishing (b). □

There is a similarity between Boole's Inequality and Bonferroni's Inequality. If we apply Boole's Inequality to the complements Aic, we have

P(∪_{i=1}^n Aic) ≤ Σ_{i=1}^n P(Aic),

and using the facts that ∪Aic = (∩Ai)c and P(Aic) = 1 − P(Ai), we obtain

1 − P(∩_{i=1}^n Ai) ≤ n − Σ_{i=1}^n P(Ai).

This becomes, on rearranging terms,

P(∩_{i=1}^n Ai) ≥ Σ_{i=1}^n P(Ai) − (n − 1),

which is a more general version of the Bonferroni Inequality.
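The general Bonferroni bound can be checked numerically by enumeration; the events below are arbitrary illustrative choices on a 20-point uniform space.

```python
# Checking P(A1 ∩ ... ∩ An) ≥ Σ P(Ai) − (n−1) with exact rationals.
from fractions import Fraction

S = set(range(20))
events = [set(range(0, 15)), set(range(3, 18)), set(range(5, 20))]

def prob(E):
    return Fraction(len(E), len(S))

inter = set(S)
for A in events:
    inter &= A          # intersection of all the events

lower = sum(prob(A) for A in events) - (len(events) - 1)
assert prob(inter) >= lower   # 1/2 ≥ 1/4 for these choices
```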

1.2.3 Counting

Methods of counting are often used in order to construct probability assignments on finite sample spaces, although they can be used to answer other questions also. The following theorem is sometimes known as the Fundamental Theorem of Counting.

Theorem 1.2.5 If a job consists of k separate tasks, the ith of which can be done in ni ways, i = 1, . . . , k, then the entire job can be done in n1 × n2 × · · · × nk ways.

Example 1.2.4 For a number of years the New York state lottery operated according to the following scheme. From the numbers 1,2, ..., 44, a person may pick any six for her ticket. The winning number is then decided by randomly selecting six numbers from the forty-four. So the first number can be chosen in 44 ways, and the second number in 43 ways, making a total of 44 × 43 = 1892 ways of choosing the first two numbers. However, if a person is allowed to choose the same number twice, then the first two numbers can be chosen in 44 × 44 = 1936 ways.

The above example makes a distinction between counting with replacement and counting without replacement. The second crucial element in counting is whether or not the ordering of the tasks is important. Taking all of these considerations into account, we can construct a 2 × 2 table of possibilities.

Number of possible arrangements of size r from n objects:

              Without replacement    With replacement
Ordered       n!/(n−r)!              n^r
Unordered     C(n, r)                C(n+r−1, r)

Let us consider counting all of the possible lottery tickets under each of these four cases.

Ordered, without replacement: From the Fundamental Theorem of Counting, there are

44 × 43 × 42 × 41 × 40 × 39 = 44!/38! = 5,082,517,440

possible tickets.

Ordered, with replacement: Since each number can now be selected in 44 ways, there are

44 × 44 × 44 × 44 × 44 × 44 = 44^6 = 7,256,313,856

possible tickets.

Unordered, without replacement: From the Fundamental Theorem, six numbers can be arranged in 6! ways, so the total number of unordered tickets is

(44 × 43 × 42 × 41 × 40 × 39)/(6 × 5 × 4 × 3 × 2 × 1) = 44!/(6!38!) = 7,059,052.

Unordered, with replacement: In this case, the total number of unordered tickets is

(44 × 45 × 46 × 47 × 48 × 49)/(6 × 5 × 4 × 3 × 2 × 1) = 49!/(6!43!) = 13,983,816.
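All four counts from the 2 × 2 table can be reproduced with the standard library's combinatoric functions (n = 44 numbers, r = 6 picks, as in the lottery example):

```python
# The four lottery counts, one per cell of the 2x2 table.
import math

n, r = 44, 6
ordered_without   = math.perm(n, r)           # n!/(n-r)!
ordered_with      = n ** r                    # n^r
unordered_without = math.comb(n, r)           # C(n, r)
unordered_with    = math.comb(n + r - 1, r)   # C(n+r-1, r)

assert ordered_without == 5_082_517_440
assert ordered_with == 7_256_313_856
assert unordered_without == 7_059_052
assert unordered_with == 13_983_816
```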

1.2.4 Enumerating outcomes

The counting techniques of the previous section are useful when the sample space S is a finite set and all the outcomes in S are equally likely. Then probabilities of events can be calculated by simply counting the number of outcomes in the event. Suppose that S = {s1, ..., sN} is a finite sample space. Saying that all the outcomes are equally likely means that P({si}) = 1/N for every outcome si. Then, for any event A,

P(A) = Σ_{si ∈ A} P({si}) = Σ_{si ∈ A} 1/N = (# of elements in A)/(# of elements in S).

Example 1.2.5 Consider choosing a five-card poker hand from a standard deck of 52 playing cards. What is the probability of having four aces? If we specify that four of the cards are aces, then there are 48 different ways of specifying the fifth card. Thus,

P(four aces) = 48/C(52, 5) = 48/2,598,960.

The probability of having four of a kind is

P(four of a kind) = (13 × 48)/C(52, 5) = 624/2,598,960.

The probability of having exactly one pair is

P(exactly one pair) = C(13, 1) C(4, 2) C(12, 3) 4³ / C(52, 5) = 1,098,240/2,598,960.
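The poker counts above follow directly from `math.comb`; a quick sketch reproducing each numerator and the denominator:

```python
# Poker-hand counts from Example 1.2.5, assuming all C(52, 5) hands
# are equally likely.
from math import comb

total = comb(52, 5)
assert total == 2_598_960

four_aces = 48                                    # fifth card is free
four_kind = 13 * 48                               # any rank, then fifth card
one_pair  = 13 * comb(4, 2) * comb(12, 3) * 4**3  # pair rank, suits, kickers

assert four_kind == 624
assert one_pair == 1_098_240
```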

1.3 Conditional Probability and Independence

All of the probabilities that we have dealt with thus far have been unconditional probabilities. A sample space was defined and all prob- abilities were calculated with respect to that sample space. In many instances, however, we are in a position to update the sample space based on new information. In such cases, we want to be able to update probability calculations or to calculate conditional probabilities.

Example 1.3.1 (Four aces) Four cards are dealt from the top of a well-shuffled deck. What is the probability that they are the four aces? The probability is 1/C(52, 4) = 1/270,725. We can also calculate this probability by an "updating" argument, as follows. The probability that the first card is an ace is 4/52. Given that the first card is an ace, the probability that the second card is an ace is 3/51. Continuing this argument, we get the desired probability as

(4/52)(3/51)(2/50)(1/49) = 1/270,725.

Definition 1.3.1 If A and B are events in S, and P(B) > 0, then the conditional probability of A given B, written P(A|B), is

P(A|B) = P(A ∩ B)/P(B).   (1.1)

Note that what happens in the conditional probability calculation is that B becomes the sample space: P(B|B) = 1. The intuition is that our original sample space, S, has been updated to B. All further occurrences are then calibrated with respect to their relation to B.

Example 1.3.2 (Continuation of Example 1.3.1) Calculate the conditional probabilities given that some aces have already been drawn.

P(4 aces in 4 cards | i aces in i cards)
  = P({4 aces in 4 cards} ∩ {i aces in i cards}) / P(i aces in i cards)
  = P(4 aces in 4 cards) / P(i aces in i cards)
  = (4 − i)!48!/(52 − i)! = 1/C(52 − i, 4 − i).

For i = 1, 2, and 3, the conditional probabilities are .00005, .00082, and .02041, respectively.

Example 1.3.3 (Three prisoners) Three prisoners, A, B, and C, are on death row. The governor decides to pardon one of the three and chooses at random the prisoner to pardon. He informs the warden of his choice but requests that the name be kept secret for a few days.

The next day, A tries to get the warden to tell him who has been pardoned. The warden refuses. A then asks which of B or C will be executed. The warden thinks for a while, then tells A that B is to be executed.

Warden's reasoning: Each prisoner has a 1/3 chance of being pardoned. Clearly, either B or C must be executed, so I have given A no information about whether A will be pardoned.

A's reasoning: Given that B will be executed, either A or C will be pardoned. My chance of being pardoned has risen to 1/2.

Let A, B, and C denote the events that A, B, or C is pardoned, respectively. We know that P(A) = P(B) = P(C) = 1/3. Let W denote the event that the warden says B will die. Using (1.1), A can update his probability of being pardoned to

P(A|W) = P(A ∩ W)/P(W).

What is happening can be summarized in this table:

Prisoner pardoned    Warden tells A    Probability
A                    B dies            1/6
A                    C dies            1/6
B                    C dies            1/3
C                    B dies            1/3

Using this table, we can calculate

P(W) = P(warden says B dies)
     = P(warden says B dies and A pardoned)
       + P(warden says B dies and C pardoned)
       + P(warden says B dies and B pardoned)
     = 1/6 + 1/3 + 0 = 1/2.

Thus, using the warden's reasoning, we have

P(A|W) = P(A ∩ W)/P(W)
       = P(warden says B dies and A pardoned)/P(warden says B dies)
       = (1/6)/(1/2) = 1/3.

However, A falsely interprets the event W as equal to the event Bc and calculates

P(A|Bc) = P(A ∩ Bc)/P(Bc) = (1/3)/(2/3) = 1/2.

We see that conditional probabilities can be quite slippery and require careful interpretation.
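The three-prisoners table can be turned into code directly; a sketch with exact rational arithmetic (the dictionary encoding is my own):

```python
# Enumerating the joint distribution of (prisoner pardoned, what the
# warden tells A); the warden flips a fair coin when A is pardoned.
from fractions import Fraction

table = {
    ("A", "B dies"): Fraction(1, 6),
    ("A", "C dies"): Fraction(1, 6),
    ("B", "C dies"): Fraction(1, 3),
    ("C", "B dies"): Fraction(1, 3),
}

P_W = sum(p for (who, told), p in table.items() if told == "B dies")
P_A_and_W = table[("A", "B dies")]

assert P_W == Fraction(1, 2)
assert P_A_and_W / P_W == Fraction(1, 3)   # the warden is right
```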

The following are several variations of (1.1):

P(A ∩ B) = P(A|B)P(B),
P(A ∩ B) = P(B|A)P(A),
P(A|B) = P(B|A) P(A)/P(B).

Theorem 1.3.1 (Bayes’ Rule) Let A1, A2, ... be a partition of the sample space, and let B be any set. Then, for each i = 1, 2,...,

P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^∞ P(B|Aj)P(Aj).

Example 1.3.4 (Coding) When coded messages are sent, there are sometimes errors in transmission. In particular, Morse code uses "dots" and "dashes", which are known to occur in the proportion of 3:4. This means that for any given symbol,

P(dot sent) = 3/7 and P(dash sent) = 4/7.

Suppose there is interference on the transmission line, and with probability 1/8 a dot is mistakenly received as a dash, and vice versa. If we receive a dot, can we be sure that a dot was sent? Using Bayes' Rule, we can write

P(dot sent | dot received) = P(dot received | dot sent) P(dot sent)/P(dot received).

Now,

P(dot received) = P(dot received ∩ dot sent) + P(dot received ∩ dash sent)
               = P(dot received | dot sent)P(dot sent) + P(dot received | dash sent)P(dash sent)
               = (7/8) × (3/7) + (1/8) × (4/7) = 25/56.

So we have

P(dot sent | dot received) = ((7/8) × (3/7)) / (25/56) = 21/25.

In some cases it may happen that the occurrence of a particular event, B, has no effect on the probability of another event, A. For these cases, we have the following definition.

Definition 1.3.2 Two events, A and B, are statistically independent if P(A ∩ B) = P(A)P(B).

Example 1.3.5 The gambler Chevalier de Mere was particularly interested in the event that he could throw at least 1 six in 4 rolls of a die. We have

P(at least 1 six in 4 rolls) = 1 − P(no six in 4 rolls)
                             = 1 − Π_{i=1}^4 P(no six on roll i)
                             = 1 − (5/6)⁴ = 0.518.

The equality in the second line follows by the independence of the rolls.

Theorem 1.3.2 If A and B are independent events, then the following pairs are also independent:

a. A and Bc.

b. Ac and B.

c. Ac and Bc.

Proof: (a) can be proved as follows.

P(A ∩ Bc) = P(A) − P(A ∩ B)
          = P(A) − P(A)P(B)   (A and B are independent)
          = P(A)(1 − P(B)) = P(A)P(Bc). □
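The de Mere calculation from Example 1.3.5 is a one-liner, worth having as a sanity check:

```python
# At least one six in four independent rolls of a fair die.
p = 1 - (5/6)**4
assert round(p, 3) == 0.518
```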

Example 1.3.6 (Tossing two dice) Let an experiment consist of tossing two dice. For this experiment the sample space is

S = {(1, 1), (1, 2),..., (1, 6), (2, 1),..., (2, 6),..., (6, 1),..., (6, 6)}.

Define the following events:

A = {doubles appear} = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)},

B = {the sum is between 7 and 10}

C = {the sum is 2, 7 or 8}.

Thus, we have P(A) = 1/6, P(B) = 1/2 and P(C) = 1/3. Furthermore,

P(A ∩ B ∩ C) = P(the sum is 8, composed of double 4s)
            = 1/36 = (1/6) × (1/2) × (1/3)
            = P(A)P(B)P(C).

However,

P(B ∩ C) = P(sum equals 7 or 8) = 11/36 ≠ P(B)P(C).

Similarly, it can be shown that P(A ∩ B) ≠ P(A)P(B). Therefore, the requirement P(A ∩ B ∩ C) = P(A)P(B)P(C) is not a strong enough condition to guarantee pairwise independence.
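The claim is easy to verify by enumerating the 36 equally likely outcomes; a sketch (event predicates named after the text's A, B, C):

```python
# Checking that the triple product holds while pairwise independence
# fails, as in Example 1.3.6.
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

def prob(E):
    return Fraction(sum(1 for s in S if E(s)), len(S))

A = lambda s: s[0] == s[1]                 # doubles appear
B = lambda s: 7 <= s[0] + s[1] <= 10       # sum between 7 and 10
C = lambda s: s[0] + s[1] in (2, 7, 8)     # sum is 2, 7 or 8

ABC = lambda s: A(s) and B(s) and C(s)
BC  = lambda s: B(s) and C(s)

assert prob(ABC) == prob(A) * prob(B) * prob(C)   # both equal 1/36
assert prob(BC) != prob(B) * prob(C)              # 11/36 vs 1/6
```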

Definition 1.3.3 A collection of events A1, ..., An is mutually independent if for any subcollection Ai1, ..., Aik, we have

P(∩_{j=1}^k Aij) = Π_{j=1}^k P(Aij).

Example 1.3.7 (Three coin tosses) Consider the experiment of tossing a coin three times. The sample space is

{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.

Let Hi, i = 1, 2, 3, denote the event that the ith toss is a head. For example, H1 = {HHH, HHT, HTH, HTT}. Assuming that the coin is fair and has an equal probability of landing heads or tails on each toss, the events H1, H2 and H3 are mutually independent. To verify this we note that

P(H1 ∩ H2 ∩ H3) = P({HHH}) = 1/8 = P(H1)P(H2)P(H3).

To verify the condition in Definition 1.3.3, we must also check each pair. For example,

P(H1 ∩ H2) = P({HHH, HHT}) = 2/8 = P(H1)P(H2).

The equality is also true for the other two pairs. Thus, H1, H2 and H3 are mutually independent.

1.4 Random variables

Example 1.4.1 (Motivating example) In an opinion poll, we might decide to ask 50 people whether they agree or disagree with a certain issue. If we record a "1" for agree and "0" for disagree, the sample space for this experiment has 2^50 elements. If we define a variable X = number of 1s recorded out of 50, we have captured the essence of the problem. Note that the sample space of X is the set of integers {0, 1, 2, ..., 50} and is much easier to deal with than the original sample space.

In defining the quantity X, we have defined a mapping (a function) from the original sample space to a new sample space, usually a set of real numbers. In general, we have the following definition.

Definition 1.4.1 A random variable is a function from a sample space S into the real numbers.

Example 1.4.2 (Random variables) In some experiments random variables are implicitly used; some examples are these.

Experiment                                             Random variable
Toss two dice                                          X = sum of the numbers
Toss a coin 25 times                                   X = number of heads in 25 tosses
Apply different amounts of fertilizer to corn plants   X = yield/acre

Suppose we have a sample space

S = {s1, . . . , sn} with a probability function P and we define a random variable X with range X = {x1, . . . , xm}. We can define a probability function PX on X in the following way. We will observe X = xi if and only if the outcome of the random experiment is an sj ∈ S such that X(sj) = xi. Thus,

PX(X = xi) = P ({sj ∈ S : X(sj) = xi}). (1.2)

Note that PX is an induced probability function on X, defined in terms of the original function P. Later, we will simply write PX(X = xi) = P(X = xi).

Theorem 1.4.1 The induced probability function defined in (1.2) defines a legitimate probability function in that it satisfies the Kolmogorov Axioms.

Proof: X is finite, so B is the set of all subsets of X. We must verify each of the three axioms.

(1) If A ∈ B, then PX(A) = P(∪_{xi∈A} {sj ∈ S : X(sj) = xi}) ≥ 0, since P is a probability function.

(2) PX(X) = P(∪_{i=1}^m {sj ∈ S : X(sj) = xi}) = P(S) = 1.

(3) If A1, A2, ... ∈ B are pairwise disjoint, then

PX(∪_{k=1}^∞ Ak) = P(∪_{k=1}^∞ {∪_{xi∈Ak} {sj ∈ S : X(sj) = xi}})
               = Σ_{k=1}^∞ P(∪_{xi∈Ak} {sj ∈ S : X(sj) = xi}) = Σ_{k=1}^∞ PX(Ak),

where the second equality follows from the fact that P is a probability function. □

A note on notation: Random variables will always be denoted with uppercase letters and the realized values of the variable will be denoted by the corresponding lowercase letters. Thus, the random variable X can take the value x.

Example 1.4.3 (Three coin tosses-II) Consider again the experiment of tossing a fair coin three times independently. Define the random variable X to be the number of heads obtained in the three tosses. A complete enumeration of the value of X for each point in the sample space is

s       HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(s)    3    2    2    2    1    1    1    0

The range of the random variable X is X = {0, 1, 2, 3}. Assuming that all eight points in S have probability 1/8, by simply counting in the above display we see that the induced probability function on X is given by

x            0     1     2     3
PX(X = x)    1/8   3/8   3/8   1/8
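The induced pmf can be obtained by direct enumeration of the eight sample points; a minimal sketch:

```python
# Inducing the pmf of X = number of heads from the 8 equally likely
# outcomes of three coin tosses (Example 1.4.3).
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))          # 8 sample points

def pmf(x):
    return Fraction(sum(1 for s in S if s.count("H") == x), len(S))

assert [pmf(x) for x in range(4)] == [Fraction(1, 8), Fraction(3, 8),
                                      Fraction(3, 8), Fraction(1, 8)]
```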

The previous illustrations had both a finite S and a finite X, and the definition of PX was straightforward. Such is also the case if X is countable. If X is uncountable, we define the induced probability function, PX, in a manner similar to (1.2). For any set A ⊂ X,

PX(X ∈ A) = P ({s ∈ S : X(s) ∈ A}). (1.3)

This does define a legitimate probability function for which the Kolmogorov Axioms can be verified.

1.5 Distribution Functions

Definition 1.5.1 The cumulative distribution function (cdf) of a random variable X, denoted by FX(x), is defined by

FX(x) = PX(X ≤ x), for all x.

Example 1.5.1 (Tossing three coins) Consider the experiment of tossing three fair coins, and let X = number of heads observed. The cdf of X is

FX(x) = 0     if −∞ < x < 0,
      = 1/8   if 0 ≤ x < 1,
      = 1/2   if 1 ≤ x < 2,
      = 7/8   if 2 ≤ x < 3,
      = 1     if 3 ≤ x < ∞.

Remark:

1. FX is defined for all values of x, not just those in X = {0, 1, 2, 3}. Thus, for example,

FX(2.5) = P(X ≤ 2.5) = P(X = 0, 1, or 2) = 7/8.

2. FX has jumps at the values xi ∈ X, and the size of the jump at xi is equal to P(X = xi).

3. FX(x) = 0 for x < 0 since X cannot be negative, and FX(x) = 1 for x ≥ 3 since X is certain to be less than or equal to such a value.
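These remarks can be seen concretely by building the step cdf from the pmf; a sketch for the three-coin example:

```python
# The step cdf of Example 1.5.1, obtained by summing the pmf of
# X = number of heads in three fair tosses.
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    return sum(p for k, p in pmf.items() if k <= x)

assert F(-1) == 0                 # FX(x) = 0 for x < 0
assert F(2.5) == Fraction(7, 8)   # constant between the jump points
assert F(3) == 1                  # FX(x) = 1 for x >= 3
```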

FX is right-continuous, namely, the function is continuous when a point is approached from the right. The property of right-continuity is a consequence of the definition of the cdf. In contrast, if we had defined FX(x) = PX(X < x), FX would then be left-continuous.

Theorem 1.5.1 The function FX(x) is a cdf if and only if the following three conditions hold:

a. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

b. F (x) is a nondecreasing function of x.

c. F(x) is right-continuous; that is, for every number x0, lim_{x↓x0} F(x) = F(x0).

Example 1.5.2 (Tossing for a head) Suppose we do an experiment that consists of tossing a coin until a head appears. Let p = probability of a head on any given toss, and define X = number of tosses required to get a head. Then, for any x = 1, 2, ...,

P (X = x) = (1 − p)x−1p.

The cdf is

FX(x) = P(X ≤ x) = Σ_{i=1}^x P(X = i) = Σ_{i=1}^x (1 − p)^{i−1} p = 1 − (1 − p)^x.

It is easy to show that if 0 < p < 1, then FX(x) satisfies the conditions of Theorem 1.5.1. First,

lim_{x→−∞} FX(x) = 0,

since FX(x) = 0 for all x < 0, and

lim_{x→∞} FX(x) = lim_{x→∞} (1 − (1 − p)^x) = 1,

where x goes through only integer values when this limit is taken. To verify property (b), we simply note that the sum contains more positive terms as x increases. Finally, to verify (c), note that, for any x, FX(x + ε) = FX(x) if ε > 0 is sufficiently small. Hence,

lim_{ε↓0} FX(x + ε) = FX(x),

so FX(x) is right-continuous.
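The closed form 1 − (1 − p)^x can be checked against the partial sums of the pmf; a sketch with an illustrative value of p:

```python
# Geometric cdf of Example 1.5.2: partial sums of the pmf should
# reproduce the closed form F(x) = 1 - (1-p)^x.
p = 0.3   # illustrative choice, any 0 < p < 1 works

def pmf(x):
    return (1 - p)**(x - 1) * p

def cdf(x):
    return 1 - (1 - p)**x

for x in range(1, 10):
    partial = sum(pmf(i) for i in range(1, x + 1))
    assert abs(partial - cdf(x)) < 1e-12
```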

Example 1.5.3 (Continuous cdf) An example of a continuous cdf (the logistic distribution) is the function

FX(x) = 1/(1 + e^{−x}).

It is easy to verify that

lim_{x→−∞} FX(x) = 0 and lim_{x→∞} FX(x) = 1.

Differentiating FX(x) gives

(d/dx) FX(x) = e^{−x}/(1 + e^{−x})² > 0,

showing that FX(x) is increasing. FX is not only right-continuous, but also continuous.

Definition 1.5.2 A random variable X is continuous if FX(x) is a continuous function of x. A random variable X is discrete if FX(x) is a step function of x.

We close this section with a theorem formally stating that FX completely determines the probability distribution of a random variable X. This is true if P(X ∈ A) is defined only for events A in B1, the smallest sigma algebra containing all the intervals of real numbers of the form (a, b), [a, b), (a, b], and [a, b]. If probabilities are defined for a larger class of events, it is possible for two random variables to have the same distribution function but not the same probability for every event (see Chung 1974, page 27).

Definition 1.5.3 The random variables X and Y are identically distributed if, for every set A ∈ B1, P(X ∈ A) = P(Y ∈ A).

Note that two random variables that are identically distributed are not necessarily equal. That is, the above definition does not say that X = Y .

Example 1.5.4 (Identically distributed random variables) Consider the experiment of tossing a fair coin three times. Define the random variables X and Y by

X =number of heads observed and Y =number of tails observed.

For each k = 0, 1, 2, 3, we have P (X = k) = P (Y = k). So X and Y are identically distributed. However, for no sample point do we have X(s) = Y (s).

Theorem 1.5.2 The following two statements are equivalent:

a. The random variables X and Y are identically distributed.

b. FX(x) = FY (x) for every x.

Proof: To show equivalence we must show that each statement implies the other. We first show that (a) ⇒ (b).

Because X and Y are identically distributed, for any set A ∈ B1, P (X ∈ A) = P (Y ∈ A). In particular, for every x, the set (−∞, x] is in B1, and

FX(x) = P (X ∈ (−∞, x]) = P (Y ∈ (−∞, x]) = FY (x).

The above argument shows that if the X and Y probabilities agree on all sets, then they agree on intervals. To show (b) ⇒ (a), we must prove that if the X and Y probabilities agree on all intervals, then they agree on all sets. For more details see Chung (1974, Section 2.2). □

1.6 Density and Mass Functions

Definition 1.6.1 The probability mass function (pmf) of a discrete random variable X is given by

fX(x) = P (X = x) for all x.

Example 1.6.1 (Geometric probabilities) For the geometric distribution of Example 1.5.2, we have the pmf

fX(x) = P(X = x) = (1 − p)^{x−1} p   for x = 1, 2, ...,
                 = 0                 otherwise.

From this example, we see that a pmf gives us “point probability”. In the discrete case, we can sum over values of the pmf to get the cdf. The analogous procedure in the continuous case is to substitute integrals for sums, and we get

P(X ≤ x) = FX(x) = ∫_{−∞}^x fX(t) dt.

Using the Fundamental Theorem of Calculus, if fX(x) is continuous, we have the further relationship

(d/dx) FX(x) = fX(x).

Definition 1.6.2 The probability density function or pdf, fX(x), of a continuous random variable X is the function that satisfies

FX(x) = ∫_{−∞}^x fX(t) dt for all x.   (1.4)

A note on notation: The expression "X has a distribution given by FX(x)" is abbreviated symbolically by "X ∼ FX(x)", where we read the symbol "∼" as "is distributed as". We can similarly write X ∼ fX(x) or, if X and Y have the same distribution, X ∼ Y.

In the continuous case we can be somewhat cavalier about the specification of interval probabilities. Since P(X = x) = 0 if X is a continuous random variable,

P (a < X < b) = P (a < X ≤ b) = P (a ≤ X < b) = P (a ≤ X ≤ b).

Example 1.6.2 (Logistic probabilities) For the logistic distribution, we have

FX(x) = 1/(1 + e^{−x}),

and hence

fX(x) = (d/dx) FX(x) = e^{−x}/(1 + e^{−x})²,

and

P(a < X < b) = FX(b) − FX(a)
            = ∫_{−∞}^b fX(x) dx − ∫_{−∞}^a fX(x) dx
            = ∫_a^b fX(x) dx.
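The relationship P(a < X < b) = F(b) − F(a) can be checked numerically for the logistic distribution by integrating the pdf; a sketch using a simple midpoint Riemann sum (the interval endpoints are illustrative):

```python
# Logistic cdf and pdf of Examples 1.5.3 and 1.6.2: integrating the
# pdf over (a, b) should agree with F(b) - F(a).
import math

def F(x):
    return 1 / (1 + math.exp(-x))

def f(x):
    return math.exp(-x) / (1 + math.exp(-x))**2

a, b, n = -1.0, 2.0, 100_000
h = (b - a) / n
riemann = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

assert abs(riemann - (F(b) - F(a))) < 1e-6
```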

Theorem 1.6.1 A function fX(x) is a pdf (or pmf) of a random variable X if and only if

a. fX(x) ≥ 0 for all x.

b. Σ_x fX(x) = 1 (pmf) or ∫_{−∞}^∞ fX(x) dx = 1 (pdf).

Proof: If fX(x) is a pdf (or pmf), then the two properties are immediate from the definitions. In particular, for a pdf, using (1.4) and Theorem 1.5.1, we have that

1 = lim_{x→∞} FX(x) = ∫_{−∞}^∞ fX(t) dt.

The converse implication is equally easy to prove. Once we have fX(x), we can define FX(x) and appeal to Theorem 1.5.1. □