1 Probability Space

Introduction to Probability Theory

Unless otherwise noted, references to Theorems, page numbers, etc. are from Casella-Berger, chapter 1.

Statistics: draw conclusions about a population of objects by sampling from the population.

1 Probability space

We start by introducing the mathematical concept of a probability space, which has three components (Ω, B, P), respectively the sample space, the event space, and the probability function. We cover each in turn.

Ω: sample space. The set of outcomes of an experiment. Example: tossing a coin twice,

    Ω = {HH, HT, TT, TH}.

An event is a subset of Ω. Examples: (i) "at least one head" is {HH, HT, TH}; (ii) "no more than one head" is {HT, TH, TT}; etc.

In probability theory, the event space B is modelled as a σ-algebra (or σ-field) of Ω, which is a collection of subsets of Ω with the following properties:

(1) ∅ ∈ B.
(2) If an event A ∈ B, then A^c ∈ B (closed under complementation).
(3) If A_1, A_2, ... ∈ B, then ∪_{i=1}^∞ A_i ∈ B (closed under countable union). (A countable sequence is one that can be indexed by the natural numbers.)

Additional properties:

(4) (1) + (2) imply Ω ∈ B.
(5) (3) + De Morgan's Laws ((A ∪ B)^c = A^c ∩ B^c) imply ∩_{i=1}^∞ A_i ∈ B (closed under countable intersection).

Consider the two-coin toss example again. Even for this simple sample space Ω = {HH, HT, TT, TH}, there are multiple σ-algebras:

1. {∅, Ω}: the "trivial" σ-algebra.
2. The "powerset" P(Ω), which contains all the subsets of Ω.

In practice, rather than specifying a particular σ-algebra from scratch, there is usually a class of events of interest, C, which we want to be included in the σ-algebra. Hence, we wish to "complete" C by adding events to it so that we get a σ-algebra. For example, consider the 2-coin toss example again. We find the smallest σ-algebra containing {HH}, {HT}, {TH}, {TT}; we call this the σ-algebra "generated" by the fundamental events {HH}, {HT}, {TH}, {TT}. Here it is the full powerset P(Ω).

Formally, let C be a collection of subsets of Ω. The minimal σ-field generated by C, denoted σ(C), satisfies: (i) C ⊂ σ(C); (ii) if B' is any other σ-field containing C, then σ(C) ⊂ B'.

Finally, a probability function P assigns a number (a "probability") to each event in B. It is a function mapping B → [0, 1] satisfying:

1. P(A) ≥ 0, for all A ∈ B.
2. P(Ω) = 1.
3. Countable additivity: if A_1, A_2, ... ∈ B are pairwise disjoint (i.e., A_i ∩ A_j = ∅ for all i ≠ j), then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Define: the support of P is the set {A ∈ B : P(A) > 0}.

Example: return to the 2-coin toss. Assuming that the coin is fair (50/50 chance of getting heads/tails), the probability function for the σ-algebra consisting of all subsets of Ω includes:

    Event A              P(A)
    {HH}                 1/4
    {HT}                 1/4
    {TH}                 1/4
    {TT}                 1/4
    ∅                    0
    Ω                    1
    {HH, HT, TH}         3/4   (using pt. (3) of the definition above)
    {HH, HT}             1/2
    ...
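To make the finite example concrete, here is a minimal Python sketch (an illustration added here, not part of the original notes): it builds the powerset σ-algebra for the two-coin toss, defines the fair-coin probability function, and checks the three defining properties. The helper names power_set and P are chosen only for this sketch.

```python
from itertools import chain, combinations

# Sample space for two tosses of a coin
omega = {"HH", "HT", "TH", "TT"}

def power_set(s):
    """All subsets of s: the largest sigma-algebra on a finite sample space."""
    s = list(s)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return [frozenset(c) for c in subsets]

B = power_set(omega)  # event space: 2^4 = 16 events

def P(event):
    """Fair-coin probability function: each outcome carries mass 1/4."""
    return len(event) / len(omega)

# The three defining properties, checked on this finite space
assert all(P(A) >= 0 for A in B)                       # 1. nonnegativity
assert P(frozenset(omega)) == 1                        # 2. P(Omega) = 1
A1, A2 = frozenset({"HH"}), frozenset({"HT", "TH"})    # disjoint events
assert P(A1 | A2) == P(A1) + P(A2)                     # 3. additivity (finite case)

print(len(B))                            # 16
print(P(frozenset({"HH", "HT", "TH"})))  # "at least one head" -> 0.75
```

Running it lists 16 events and returns 3/4 for "at least one head", matching the table above.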
1.1 Probability on the real line

In statistics, we frequently encounter probability spaces defined on the real line (or a portion thereof). Consider the following probability space: ([0, 1], B([0, 1]), µ).

1. The sample space is the real interval [0, 1].

2. B([0, 1]) denotes the "Borel" σ-algebra on [0, 1]. This is the minimal σ-algebra generated by the elementary events {[0, b) : 0 ≤ b ≤ 1}. This collection contains things like [1/2, 2/3], [0, 1/2] ∪ (2/3, 1], and {1/2}.

   • To see this, note that closed intervals can be generated as countable intersections of open intervals (and vice versa):

         lim_{n→∞} [0, 1/n)            = ∩_{n=1}^∞ [0, 1/n)            = {0},
         lim_{n→∞} (0, 1/n)            = ∩_{n=1}^∞ (0, 1/n)            = ∅,
         lim_{n→∞} (a − 1/n, b + 1/n)  = ∩_{n=1}^∞ (a − 1/n, b + 1/n)  = [a, b],     (1)
         lim_{n→∞} [a + 1/n, b − 1/n]  = ∪_{n=1}^∞ [a + 1/n, b − 1/n]  = (a, b).

     (The limits have an unambiguous meaning because the set sequences are monotonic.)

   • Thus, B([0, 1]) can equivalently be characterized as the minimal σ-field generated by: (i) the open intervals (a, b) in [0, 1]; (ii) the closed intervals [a, b]; (iii) the closed half-lines [0, a]; and so on.

   • Moreover, it is also the minimal σ-field containing all the open sets in [0, 1]: B([0, 1]) = σ(open sets on [0, 1]).

   • This last characterization of the Borel field, as the minimal σ-field containing the open subsets, can be generalized to any metric space (i.e., any space where "openness" is defined). This includes R, R^k, and even function spaces (e.g. L^2[a, b], the space of square-integrable functions on [a, b]).

3. µ(·) is Lebesgue measure: for A ∈ B, µ(A) is the sum of the lengths of the intervals contained in A. E.g. µ([1/2, 2/3]) = 1/6, µ([0, 1/2] ∪ (2/3, 1]) = 5/6, µ({1/2}) = 0.

More examples: consider the measurable space ([0, 1], B). Are the following probability measures?

   • For some δ ∈ [0, 1] and A ∈ B, P(A) = µ(A) if µ(A) ≤ δ, and P(A) = 0 otherwise.
   • P(A) = 1 if A = [0, 1], and P(A) = 0 otherwise.
   • P(A) = 1, for all A ∈ B.

Can you figure out an appropriate σ-algebra for which these functions are probability measures? For the third example: take the σ-algebra {∅, [0, 1]}.

1.2 Additional properties of probability measures (CB Thms 1.2.8-11)

For a probability function P and A, B ∈ B:

   • P(∅) = 0;
   • P(A) ≤ 1;
   • P(A^c) = 1 − P(A);
   • P(B ∩ A^c) = P(B) − P(A ∩ B);
   • P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
   • Subadditivity (Boole's inequality): for events A_i, i ≥ 1,
         P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i);
   • Monotonicity: if A ⊂ B, then P(A) ≤ P(B);
   • P(A) = Σ_{i=1}^∞ P(A ∩ C_i) for any partition C_1, C_2, ... of Ω.

By manipulating the above properties, we get

    P(A ∩ B) = P(A) + P(B) − P(A ∪ B)                                    (2)
             ≥ P(A) + P(B) − 1,

which is called the Bonferroni bound on the joint event A ∩ B. (Note: when P(A) and P(B) are small, the bound is < 0, in which case it is trivially correct. Also, the bound is always ≤ 1.)

With three events, the above properties imply

    P(∪_{i=1}^3 A_i) = Σ_{i=1}^3 P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + P(A_1 ∩ A_2 ∩ A_3),

and with n events we have

    P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k)
                       + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n).

This equality, the inclusion-exclusion formula, can be used to derive a wide variety of bounds (depending on what is known and unknown).

2 Updating information: conditional probabilities

Consider a given probability space (Ω, B, P).

Definition 1.3.2: if A, B ∈ B and P(B) > 0, then the conditional probability of A given B, denoted P(A|B), is

    P(A|B) = P(A ∩ B) / P(B).

If you interpret P(A) as "the probability that the outcome of the experiment is in A", then P(A|B) is "the probability that the outcome is in A, given that you know it is in B".

   • If A and B are disjoint, then P(A|B) = 0/P(B) = 0.
   • If A ⊂ B, then P(A|B) = P(A)/P(B) ≤ 1. Here "B is necessary for A".
   • If B ⊂ A, then P(A|B) = P(B)/P(B) = 1. Here "B implies A".

As CB point out, when you condition on the event B, then B becomes the sample space of a new probability space, for which P(·|B) is the appropriate probability measure:

    Ω    →  B
    B    →  {A ∩ B : A ∈ B}
    P(·) →  P(·|B)
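As a quick numerical illustration of this definition and of the re-normalization just described (a sketch added here, not from the notes; it assumes two fair dice, and the helper names P and P_given are made up for the example):

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes
omega = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

def P_given(A, B):
    """Conditional probability P(A|B) = P(A intersect B) / P(B); needs P(B) > 0."""
    return P(A & B) / P(B)

A = {w for w in omega if w[0] + w[1] == 8}   # "the two dice sum to 8"
B = {w for w in omega if w[0] == 3}          # "the first die shows 3"

print(P(A))           # 5/36
print(P_given(A, B))  # 1/6: within B, only (3, 5) sums to 8

# Conditioning on B yields a genuine probability measure on the reduced space:
assert P_given(set(omega), B) == 1            # P(Omega | B) = 1
assert sum(P_given({w}, B) for w in B) == 1   # conditional masses within B sum to 1
```

The last two assertions are exactly the point above: restricted to events of the form A ∩ B, the function P(·|B) satisfies the probability axioms with B playing the role of the sample space.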
From manipulating the conditional probability formula, you get

    P(A ∩ B) = P(A|B) · P(B) = P(B|A) · P(A)
    ⇒  P(A|B) = P(B|A) · P(A) / P(B).

For a partition of disjoint events A_1, A_2, ... of Ω: P(B) = Σ_{i=1}^∞ P(B ∩ A_i) = Σ_{i=1}^∞ P(B|A_i) P(A_i). Hence

    P(A_i|B) = P(B|A_i) · P(A_i) / Σ_{j=1}^∞ P(B|A_j) P(A_j),

which is Bayes' Rule.

Example: Let's Make a Deal.

   • There are three doors (numbered 1, 2, 3). Behind one of them, a prize has been randomly placed.
   • You (the contestant) bet that the prize is behind door 1.
   • Monty Hall opens door 2, and reveals that there is no prize behind door 2.
   • He asks you: "do you want to switch your bet to door 3?"

Informally: MH has revealed that the prize is not behind door 2. There are two cases: either (a) it is behind door 1, or (b) it is behind door 3. In which case is MH's opening door 2 more probable? In case (a), MH could have opened either door 2 or door 3; in case (b), MH is forced to open door 2 (since he cannot open door 1, because you chose that door). MH's opening of door 2 is more probable under case (b), so you should switch. (This is actually a "maximum likelihood" argument.)

More formally, define two random variables D (the door behind which the prize is) and M (the door which Monty opens).
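As an illustration of where the formal argument heads (added here, not part of the original notes), the following Python sketch applies Bayes' Rule to the case where you pick door 1 and Monty opens door 2, and checks the answer with a quick simulation; the helper function play is hypothetical.

```python
import random
from fractions import Fraction

# Bayes' Rule: you pick door 1, Monty opens door 2. Condition on the event M = 2.
prior = {d: Fraction(1, 3) for d in (1, 2, 3)}        # P(D = d)
likelihood = {1: Fraction(1, 2),   # D = 1: Monty chooses door 2 or 3 at random
              2: Fraction(0),      # Monty never opens the prize door
              3: Fraction(1)}      # D = 3: door 2 is his only legal choice

evidence = sum(likelihood[d] * prior[d] for d in (1, 2, 3))              # P(M = 2) = 1/2
posterior = {d: likelihood[d] * prior[d] / evidence for d in (1, 2, 3)}
print(posterior)   # D = 1 -> 1/3, D = 2 -> 0, D = 3 -> 2/3: switching is better

# Monte Carlo check of the same conclusion
def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize, pick = random.randint(1, 3), 1
        opened = random.choice([d for d in (1, 2, 3) if d != pick and d != prize])
        final = next(d for d in (1, 2, 3) if d not in (pick, opened)) if switch else pick
        wins += (final == prize)
    return wins / trials

print(play(switch=False))  # about 1/3
print(play(switch=True))   # about 2/3
```

The exact posterior matches the informal argument: conditioning on Monty's choice moves the probability that the prize is behind door 3 from 1/3 to 2/3, while door 1 stays at 1/3.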