Probability Theory 1 Sample Spaces and Events

Total Page:16

File Type:pdf, Size:1020Kb

Probability Theory 1 Sample Spaces and Events 18.310 lecture notes February 10, 2015 Probability Theory Lecturer: Michel Goemans These notes cover the basic definitions of discrete probability theory, and then present some results including Bayes' rule, inclusion-exclusion formula, Chebyshev's inequality, and the weak law of large numbers. 1 Sample spaces and events To treat probability rigorously, we define a sample space S whose elements are the possible outcomes of some process or experiment. For example, the sample space might be the outcomes of the roll of a die, or flips of a coin. To each element x of the sample space, we assign a probability, which will be a non-negative number between 0 and 1, which we will denote by p(x). We require that X p(x) = 1; x2S so the total probability of the elements of our sample space is 1. What this means intuitively is that when we perform our process, exactly one of the things in our sample space will happen. Example. The sample space could be S = a; b; c , and the probabilities could be p(a) = 1=2, p(b) = 1=3, p(c) = 1=6. f g If all elements of our sample space have equal probabilities, we call this the uniform probability distribution on our sample space. For example, if our sample space was the outcomes of a die roll, the sample space could be denoted S = x1; x2; : : : ; x6 , where the event xi correspond to rolling i. f g The uniform distribution, in which every outcome xi has probability 1/6 describes the situation for a fair die. Similarly, if we consider tossing a fair coin, the outcomes would be H (heads) and T (tails), each with probability 1/2. In this situation we have the uniform probability distribution on the sample space S = H; T . We define an event Afto beg a subset of the sample space. For example, in the roll of a die, if the event A was rolling an even number, then A = x2; x4; x6 . The probability of an event A, f g denoted by P(A), is the sum of the probabilities of the corresponding elements in the sample space. For rolling an even number, we have 1 (A) = p(x ) + p(x ) + p(x ) = P 2 4 6 2 Given an event A of our sample space, there is a complementary event which consists of all points in our sample space that are not in A. We denote this event by A. Since all the points in a sample space S add to 1, we see that : X X X P(A) + P( A) = p(x) + p(x) = p(x) = 1; : x2A x=2A x2S Prob-1 and so P( A) = 1 P(A). Note that,: although− two elements of our sample space cannot happen simultaneously, two events can happen simultaneously. That is, if we defined A as rolling an even number, and B as rolling a small number (1,2, or 3), then it is possible for both A and B to happen (this would require a roll of a 2), neither of them to happen (this would require a roll of a 5), or one or the other to happen. We call the event that both A and B happen \A and B", denoted by A B (or sometimes A B), and the event that at least one happens \A or B", denoted by A B (or^ sometimes A B).\ Suppose that we have two events A and B. These divide our_ sample space into four[ disjoint parts, corresponding to the cases where both events happen, where one event happens and the other does not, and where neither event happens, see Figure 1. These cases cover the sample space, accounting for each element in it exactly once, so we get P(A B) + P(A B) + P( A B) + P( A B) = 1: ^ ^ : : ^ : ^ : A B S A B A B B A ∧ ¬ ∧ ∧ ¬ A B ¬ ∧ ¬ Figure 1: Two events A and B as subsets of the state space S. 2 Conditional probability and Independence Let A be an event with non-zero probability. We define the probability of an event B conditioned on event A, denoted by P(B A), to be j P(A B) P(B A) = ^ : j P(A) Why is this an interesting notion? Let's give an example. Suppose we roll a fair die, and we ask what is the probability of getting an odd number, conditioned on having rolled a number that is at most 3? Since we know that our roll is 1, 2, or 3, and that they are equally likely (since we started with the uniform distribution corresponding to a fair die), then the probability of each of 1 these outcomes must be 3 . Thus the probability of getting an odd number (that is, of getting 1 2 or 3) is 3 . Thus if A is the event \outcome is at most 3" and B is the event \outcome is odd", then we would like the mathematical definition of the \probability of B conditioned on A" to give P(B A) = 2=3. And indeed, mathematically we find j P(B A) 2=6 2 P(B A) = ^ = = : j P(A) 1=2 3 The intuitive reason for which our definition of P(B A) gives the answers we wanted is that the probability of every outcome in A gets multiplied by j 1 when one conditions on the event A. P(A) Prob-2 It is a simple calculation to check that if we have two events A and B, then P(A) = P(A B)P(B) + P(A B)P( B): j j: : Indeed, the first term is P(A B) and the second term P(A B). Adding these together, we get ^ ^ : P(A B) + P(A B) = P(A): ^ ^ : If we have two events A and B, we say that they are independent if the probability that both happen is the product of the probability that the first happens and the probability that the second happens, that is, if P(A B) = P(A) P(B): ^ · Example. For a die roll, the events A of rolling an even number, and B of rolling a number less 1 1 1 or equal to 3 are not independent, since P(A) P(B) = P(A B). Indeed, 2 2 = 6 . However, if · 6 ^ · 6 1 you define C to be the event of rolling a 1 or 2, then A and C are independent, since P(A) = 2 , 1 1 P(C) = , and P(A C) = . 3 ^ 6 Let us now show on an example that our mathematical definition of independence does capture the intuitive notion of independence. Let's assume that we toss two coins (not necessarily fair coins). The sample space is S = HH;HT;TH;TT (where the first letter represents the result of the first coin). Let us denote thef event of the firstg coin being a tail by T , and the event of the ◦ second coin being a tail by T and so on. By definition, we have P(T ) = p(TH) + p(TT ) and so on. Suppose that knowing◦ that the first coin is a tail doesn't change◦ the probability that the second coin is a tail. This gives P( T T ) = P( T ): ◦ j ◦ ◦ Moreover, by definition of conditional probability, P(TT ) P( T T ) = : ◦ j ◦ P(T ) ◦ Combining these equations gives P(TT ) = P(T )P( T ); ◦ ◦ or equivalently P(T T ) = P(T )P( T ): ◦ ^ ◦ ◦ ◦ Which is the condition we took to define the independence. Conclusion: knowing that the first coin is a tail doesn't change the probability that the second coin is a tail is the same as what we defined as \independence" between the events T and T . ◦ ◦ More generally, suppose that A and B are independent. In this case, we have P(A B) P(A)P(B) P(B A) = ^ = = P(B): j P(A) P(A) That is, if two events are independent, then the probability of B happening, conditioned on A happening is the same as the probability of B happening without the conditioning. It is straight- forward to check that the reasoning can be reversed as well: if the probability of B does not change when you condition on A, then the two events are independent. Prob-3 We define k events A1 :::Ak to be independent if the intersection of any subset of these events is equal to the product of their probability, that is, if for all 1 i1 < i2 < < is k, ≤ ··· ≤ P(Ai1 Ai2 Ais ) = P(Ai1 )P(Ai2 ) P(Ais ): ^ ^ · · · ^ ··· It is possible to have a set of three events such that any two of them are independent, but all three are not independent. It is an interesting exercise to try to find such an example. If we have k probability distributions on sample spaces S1 :::Sk, we can construct a new prob- ability distribution called the product distribution by assuming that these k processes are inde- pendent. Our new sample space is made of all the k-tuples (s1; s2; : : : ; sk) where si Si. The probability distribution on this sample space is defined by 2 k Y p(s1; s2; : : : ; sk) = p(si): i=1 For example, if you roll k dice, your sample space will be the set of tuples (s1; : : : ; sk) where si x1; x2; : : : ; x6 . The value of si represents the result of the i-th die (for instance si = x3 means2 f that the i-thg die rolled 3).
Recommended publications
  • Effective Program Reasoning Using Bayesian Inference
    EFFECTIVE PROGRAM REASONING USING BAYESIAN INFERENCE Sulekha Kulkarni A DISSERTATION in Computer and Information Science Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy 2020 Supervisor of Dissertation Mayur Naik, Professor, Computer and Information Science Graduate Group Chairperson Mayur Naik, Professor, Computer and Information Science Dissertation Committee: Rajeev Alur, Zisman Family Professor of Computer and Information Science Val Tannen, Professor of Computer and Information Science Osbert Bastani, Research Assistant Professor of Computer and Information Science Suman Nath, Partner Research Manager, Microsoft Research, Redmond To my father, who set me on this path, to my mother, who leads by example, and to my husband, who is an infinite source of courage. ii Acknowledgments I want to thank my advisor Prof. Mayur Naik for giving me the invaluable opportunity to learn and experiment with different ideas at my own pace. He supported me through the ups and downs of research, and helped me make the Ph.D. a reality. I also want to thank Prof. Rajeev Alur, Prof. Val Tannen, Prof. Osbert Bastani, and Dr. Suman Nath for serving on my dissertation committee and for providing valuable feedback. I am deeply grateful to Prof. Alur and Prof. Tannen for their sound advice and support, and for going out of their way to help me through challenging times. I am also very grateful for Dr. Nath's able and inspiring mentorship during my internship at Microsoft Research, and during the collaboration that followed. Dr. Aditya Nori helped me start my Ph.D.
    [Show full text]
  • 1 Dependent and Independent Events 2 Complementary Events 3 Mutually Exclusive Events 4 Probability of Intersection of Events
    1 Dependent and Independent Events Let A and B be events. We say that A is independent of B if P (AjB) = P (A). That is, the marginal probability of A is the same as the conditional probability of A, given B. This means that the probability of A occurring is not affected by B occurring. It turns out that, in this case, B is independent of A as well. So, we just say that A and B are independent. We say that A depends on B if P (AjB) 6= P (A). That is, the marginal probability of A is not the same as the conditional probability of A, given B. This means that the probability of A occurring is affected by B occurring. It turns out that, in this case, B depends on A as well. So, we just say that A and B are dependent. Consider these events from the card draw: A = drawing a king, B = drawing a spade, C = drawing a face card. Events A and B are independent. If you know that you have drawn a spade, this does not change the likelihood that you have actually drawn a king. Formally, the marginal probability of drawing a king is P (A) = 4=52. The conditional probability that your card is a king, given that it a spade, is P (AjB) = 1=13, which is the same as 4=52. Events A and C are dependent. If you know that you have drawn a face card, it is much more likely that you have actually drawn a king than it would be ordinarily.
    [Show full text]
  • Notes on Elementary Martingale Theory 1 Conditional Expectations
    . Notes on Elementary Martingale Theory by John B. Walsh 1 Conditional Expectations 1.1 Motivation Probability is a measure of ignorance. When new information decreases that ignorance, it changes our probabilities. Suppose we roll a pair of dice, but don't look immediately at the outcome. The result is there for anyone to see, but if we haven't yet looked, as far as we are concerned, the probability that a two (\snake eyes") is showing is the same as it was before we rolled the dice, 1/36. Now suppose that we happen to see that one of the two dice shows a one, but we do not see the second. We reason that we have rolled a two if|and only if|the unseen die is a one. This has probability 1/6. The extra information has changed the probability that our roll is a two from 1/36 to 1/6. It has given us a new probability, which we call a conditional probability. In general, if A and B are events, we say the conditional probability that B occurs given that A occurs is the conditional probability of B given A. This is given by the well-known formula P A B (1) P B A = f \ g; f j g P A f g providing P A > 0. (Just to keep ourselves out of trouble if we need to apply this to a set f g of probability zero, we make the convention that P B A = 0 if P A = 0.) Conditional probabilities are familiar, but that doesn't stop themf fromj g giving risef tog many of the most puzzling paradoxes in probability.
    [Show full text]
  • Random Variable = a Real-Valued Function of an Outcome X = F(Outcome)
    Random Variables (Chapter 2) Random variable = A real-valued function of an outcome X = f(outcome) Domain of X: Sample space of the experiment. Ex: Consider an experiment consisting of 3 Bernoulli trials. Bernoulli trial = Only two possible outcomes – success (S) or failure (F). • “IF” statement: if … then “S” else “F” • Examine each component. S = “acceptable”, F = “defective”. • Transmit binary digits through a communication channel. S = “digit received correctly”, F = “digit received incorrectly”. Suppose the trials are independent and each trial has a probability ½ of success. X = # successes observed in the experiment. Possible values: Outcome Value of X (SSS) (SSF) (SFS) … … (FFF) Random variable: • Assigns a real number to each outcome in S. • Denoted by X, Y, Z, etc., and its values by x, y, z, etc. • Its value depends on chance. • Its value becomes available once the experiment is completed and the outcome is known. • Probabilities of its values are determined by the probabilities of the outcomes in the sample space. Probability distribution of X = A table, formula or a graph that summarizes how the total probability of one is distributed over all the possible values of X. In the Bernoulli trials example, what is the distribution of X? 1 Two types of random variables: Discrete rv = Takes finite or countable number of values • Number of jobs in a queue • Number of errors • Number of successes, etc. Continuous rv = Takes all values in an interval – i.e., it has uncountable number of values. • Execution time • Waiting time • Miles per gallon • Distance traveled, etc. Discrete random variables X = A discrete rv.
    [Show full text]
  • Lecture 2: Modeling Random Experiments
    Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2021 Lecture 2: Modeling Random Experiments Relevant textbook passages: Pitman [5]: Sections 1.3–1.4., pp. 26–46. Larsen–Marx [4]: Sections 2.2–2.5, pp. 18–66. 2.1 Axioms for probability measures Recall from last time that a random experiment is an experiment that may be conducted under seemingly identical conditions, yet give different results. Coin tossing is everyone’s go-to example of a random experiment. The way we model random experiments is through the use of probabilities. We start with the sample space Ω, the set of possible outcomes of the experiment, and consider events, which are subsets E of the sample space. (We let F denote the collection of events.) 2.1.1 Definition A probability measure P or probability distribution attaches to each event E a number between 0 and 1 (inclusive) so as to obey the following axioms of probability: Normalization: P (?) = 0; and P (Ω) = 1. Nonnegativity: For each event E, we have P (E) > 0. Additivity: If EF = ?, then P (∪ F ) = P (E) + P (F ). Note that while the domain of P is technically F, the set of events, that is P : F → [0, 1], we may also refer to P as a probability (measure) on Ω, the set of realizations. 2.1.2 Remark To reduce the visual clutter created by layers of delimiters in our notation, we may omit some of them simply write something like P (f(ω) = 1) orP {ω ∈ Ω: f(ω) = 1} instead of P {ω ∈ Ω: f(ω) = 1} and we may write P (ω) instead of P {ω} .
    [Show full text]
  • 39 Section J Basic Probability Concepts Before We Can Begin To
    Section J Basic Probability Concepts Before we can begin to discuss inferential statistics, we need to discuss probability. Recall, inferential statistics deals with analyzing a sample from the population to draw conclusions about the population, therefore since the data came from a sample we can never be 100% certain the conclusion is correct. Therefore, probability is an integral part of inferential statistics and needs to be studied before starting the discussion on inferential statistics. The theoretical probability of an event is the proportion of times the event occurs in the long run, as a probability experiment is repeated over and over again. Law of Large Numbers says that as a probability experiment is repeated again and again, the proportion of times that a given event occurs will approach its probability. A sample space contains all possible outcomes of a probability experiment. EX: An event is an outcome or a collection of outcomes from a sample space. A probability model for a probability experiment consists of a sample space, along with a probability for each event. Note: If A denotes an event then the probability of the event A is denoted P(A). Probability models with equally likely outcomes If a sample space has n equally likely outcomes, and an event A has k outcomes, then Number of outcomes in A k P(A) = = Number of outcomes in the sample space n The probability of an event is always between 0 and 1, inclusive. 39 Important probability characteristics: 1) For any event A, 0 ≤ P(A) ≤ 1 2) If A cannot occur, then P(A) = 0.
    [Show full text]
  • Probability and Counting Rules
    blu03683_ch04.qxd 09/12/2005 12:45 PM Page 171 C HAPTER 44 Probability and Counting Rules Objectives Outline After completing this chapter, you should be able to 4–1 Introduction 1 Determine sample spaces and find the probability of an event, using classical 4–2 Sample Spaces and Probability probability or empirical probability. 4–3 The Addition Rules for Probability 2 Find the probability of compound events, using the addition rules. 4–4 The Multiplication Rules and Conditional 3 Find the probability of compound events, Probability using the multiplication rules. 4–5 Counting Rules 4 Find the conditional probability of an event. 5 Find the total number of outcomes in a 4–6 Probability and Counting Rules sequence of events, using the fundamental counting rule. 4–7 Summary 6 Find the number of ways that r objects can be selected from n objects, using the permutation rule. 7 Find the number of ways that r objects can be selected from n objects without regard to order, using the combination rule. 8 Find the probability of an event, using the counting rules. 4–1 blu03683_ch04.qxd 09/12/2005 12:45 PM Page 172 172 Chapter 4 Probability and Counting Rules Statistics Would You Bet Your Life? Today Humans not only bet money when they gamble, but also bet their lives by engaging in unhealthy activities such as smoking, drinking, using drugs, and exceeding the speed limit when driving. Many people don’t care about the risks involved in these activities since they do not understand the concepts of probability.
    [Show full text]
  • Is the Cosmos Random?
    IS THE RANDOM? COSMOS QUANTUM PHYSICS Einstein’s assertion that God does not play dice with the universe has been misinterpreted By George Musser Few of Albert Einstein’s sayings have been as widely quot- ed as his remark that God does not play dice with the universe. People have naturally taken his quip as proof that he was dogmatically opposed to quantum mechanics, which views randomness as a built-in feature of the physical world. When a radioactive nucleus decays, it does so sponta- neously; no rule will tell you when or why. When a particle of light strikes a half-silvered mirror, it either reflects off it or passes through; the out- come is open until the moment it occurs. You do not need to visit a labora- tory to see these processes: lots of Web sites display streams of random digits generated by Geiger counters or quantum optics. Being unpredict- able even in principle, such numbers are ideal for cryptography, statistics and online poker. Einstein, so the standard tale goes, refused to accept that some things are indeterministic—they just happen, and there is not a darned thing anyone can do to figure out why. Almost alone among his peers, he clung to the clockwork universe of classical physics, ticking mechanistically, each moment dictating the next. The dice-playing line became emblemat- ic of the B side of his life: the tragedy of a revolutionary turned reaction- ary who upended physics with relativity theory but was, as Niels Bohr put it, “out to lunch” on quantum theory.
    [Show full text]
  • Topic 1: Basic Probability Definition of Sets
    Topic 1: Basic probability ² Review of sets ² Sample space and probability measure ² Probability axioms ² Basic probability laws ² Conditional probability ² Bayes' rules ² Independence ² Counting ES150 { Harvard SEAS 1 De¯nition of Sets ² A set S is a collection of objects, which are the elements of the set. { The number of elements in a set S can be ¯nite S = fx1; x2; : : : ; xng or in¯nite but countable S = fx1; x2; : : :g or uncountably in¯nite. { S can also contain elements with a certain property S = fx j x satis¯es P g ² S is a subset of T if every element of S also belongs to T S ½ T or T S If S ½ T and T ½ S then S = T . ² The universal set ­ is the set of all objects within a context. We then consider all sets S ½ ­. ES150 { Harvard SEAS 2 Set Operations and Properties ² Set operations { Complement Ac: set of all elements not in A { Union A \ B: set of all elements in A or B or both { Intersection A [ B: set of all elements common in both A and B { Di®erence A ¡ B: set containing all elements in A but not in B. ² Properties of set operations { Commutative: A \ B = B \ A and A [ B = B [ A. (But A ¡ B 6= B ¡ A). { Associative: (A \ B) \ C = A \ (B \ C) = A \ B \ C. (also for [) { Distributive: A \ (B [ C) = (A \ B) [ (A \ C) A [ (B \ C) = (A [ B) \ (A [ C) { DeMorgan's laws: (A \ B)c = Ac [ Bc (A [ B)c = Ac \ Bc ES150 { Harvard SEAS 3 Elements of probability theory A probabilistic model includes ² The sample space ­ of an experiment { set of all possible outcomes { ¯nite or in¯nite { discrete or continuous { possibly multi-dimensional ² An event A is a set of outcomes { a subset of the sample space, A ½ ­.
    [Show full text]
  • 1 — a Single Random Variable
    1 | A SINGLE RANDOM VARIABLE Questions involving probability abound in Computer Science: What is the probability of the PWF world falling over next week? • What is the probability of one packet colliding with another in a network? • What is the probability of an undergraduate not turning up for a lecture? • When addressing such questions there are often added complications: the question may be ill posed or the answer may vary with time. Which undergraduate? What lecture? Is the probability of turning up different on Saturdays? Let's start with something which appears easy to reason about: : : Introduction | Throwing a die Consider an experiment or trial which consists of throwing a mathematically ideal die. Such a die is often called a fair die or an unbiased die. Common sense suggests that: The outcome of a single throw cannot be predicted. • The outcome will necessarily be a random integer in the range 1 to 6. • The six possible outcomes are equiprobable, each having a probability of 1 . • 6 Without further qualification, serious probabilists would regard this collection of assertions, especially the second, as almost meaningless. Just what is a random integer? Giving proper mathematical rigour to the foundations of probability theory is quite a taxing task. To illustrate the difficulty, consider probability in a frequency sense. Thus a probability 1 of 6 means that, over a long run, one expects to throw a 5 (say) on one-sixth of the occasions that the die is thrown. If the actual proportion of 5s after n throws is p5(n) it would be nice to say: 1 lim p5(n) = n !1 6 Unfortunately this is utterly bogus mathematics! This is simply not a proper use of the idea of a limit.
    [Show full text]
  • Probability Theory Review 1 Basic Notions: Sample Space, Events
    Fall 2018 Probability Theory Review Aleksandar Nikolov 1 Basic Notions: Sample Space, Events 1 A probability space (Ω; P) consists of a finite or countable set Ω called the sample space, and the P probability function P :Ω ! R such that for all ! 2 Ω, P(!) ≥ 0 and !2Ω P(!) = 1. We call an element ! 2 Ω a sample point, or outcome, or simple event. You should think of a sample space as modeling some random \experiment": Ω contains all possible outcomes of the experiment, and P(!) gives the probability that we are going to get outcome !. Note that we never speak of probabilities except in relation to a sample space. At this point we give a few examples: 1. Consider a random experiment in which we toss a single fair coin. The two possible outcomes are that the coin comes up heads (H) or tails (T), and each of these outcomes is equally likely. 1 Then the probability space is (Ω; P), where Ω = fH; T g and P(H) = P(T ) = 2 . 2. Consider a random experiment in which we toss a single coin, but the coin lands heads with 2 probability 3 . Then, once again the sample space is Ω = fH; T g but the probability function 2 1 is different: P(H) = 3 , P(T ) = 3 . 3. Consider a random experiment in which we toss a fair coin three times, and each toss is independent of the others. The coin can come up heads all three times, or come up heads twice and then tails, etc.
    [Show full text]
  • Probabilities, Random Variables and Distributions A
    Probabilities, Random Variables and Distributions A Contents A.1 EventsandProbabilities................................ 318 A.1.1 Conditional Probabilities and Independence . ............. 318 A.1.2 Bayes’Theorem............................... 319 A.2 Random Variables . ................................. 319 A.2.1 Discrete Random Variables ......................... 319 A.2.2 Continuous Random Variables ....................... 320 A.2.3 TheChangeofVariablesFormula...................... 321 A.2.4 MultivariateNormalDistributions..................... 323 A.3 Expectation,VarianceandCovariance........................ 324 A.3.1 Expectation................................. 324 A.3.2 Variance................................... 325 A.3.3 Moments................................... 325 A.3.4 Conditional Expectation and Variance ................... 325 A.3.5 Covariance.................................. 326 A.3.6 Correlation.................................. 327 A.3.7 Jensen’sInequality............................. 328 A.3.8 Kullback–LeiblerDiscrepancyandInformationInequality......... 329 A.4 Convergence of Random Variables . 329 A.4.1 Modes of Convergence . 329 A.4.2 Continuous Mapping and Slutsky’s Theorem . 330 A.4.3 LawofLargeNumbers........................... 330 A.4.4 CentralLimitTheorem........................... 331 A.4.5 DeltaMethod................................ 331 A.5 ProbabilityDistributions............................... 332 A.5.1 UnivariateDiscreteDistributions...................... 333 A.5.2 Univariate Continuous Distributions . 335
    [Show full text]