Probabilities and Random Variables
Chapter 1: Probabilities and random variables

Probability theory is a systematic method for describing randomness and uncertainty. It prescribes a set of mathematical rules for manipulating and calculating probabilities and expectations. It has been applied in many areas: gambling, insurance, finance, the study of experimental error, statistical inference, and more.

One standard approach to probability theory (but not the only approach) starts from the concept of a sample space, which is an exhaustive list of possible outcomes in an experiment or other situation where the result is uncertain. Subsets of the list are called events. For example, in the very simple situation where 3 coins are tossed, the sample space might be

    S = {hhh, hht, hth, htt, thh, tht, tth, ttt}.

There is an event corresponding to "the second coin landed heads", namely,

    {hhh, hht, thh, tht}.

Each element in the sample space corresponds to a uniquely specified outcome. Notice that S contains nothing that would specify an outcome like "the second coin spun 17 times, was in the air for 3.26 seconds, rolled 23.7 inches when it landed, then ended with heads facing up". If we wish to contemplate such events we need a more intricate sample space S. Indeed, the choice of S, that is, the detail with which possible outcomes are described, depends on the sort of events we wish to describe.

In general, a sample space can make it easier to think precisely about events, but it is not always essential. It often suffices to manipulate events via a small number of rules (to be specified soon) without explicitly identifying the events with subsets of a sample space.

If the outcome of the experiment corresponds to a point of a sample space belonging to some event, one says that the event has occurred. For example, with the outcome hhh each of the events {no tails}, {at least one head}, {more heads than tails} occurs, but the event {even number of heads} does not.

The uncertainty is modelled by a probability assigned to each event. The probability of an event E is denoted by PE. One popular interpretation of P (but not the only one) is as a long run frequency: in a very large number (N) of repetitions of the experiment,

    (number of times E occurs)/N ≈ PE,

provided the experiments are independent of each other. (More about independence soon.)

As many authors have pointed out, there is something fishy about this interpretation. For example, it is difficult to make precise the meaning of "independent of each other" without resorting to explanations that degenerate into circular discussions about the meaning of probability and independence. This fact does not seem to trouble most supporters of the frequency theory. The interpretation is regarded as a justification for the adoption of a set of mathematical rules, or axioms. See Chapter 2 for an alternative interpretation, based on fair prices.

The first four rules are easy to remember if you think of probability as a proportion. One more rule will be added soon.

Rules for probabilities.

(P1) 0 ≤ PE ≤ 1 for every event E.
(P2) For the empty subset ∅ (= the "impossible event"), P∅ = 0.
(P3) For the whole sample space S (= the "certain event"), PS = 1.
(P4) If an event E is a disjoint union of a sequence of events E1, E2, ..., then PE = Σ_i PE_i.

<1> Example. Find P{at least two heads} for the tossing of three coins.
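The worked solution is collected at the end of the chapter. As a quick cross-check, the following Python sketch (an aside of mine, not part of the solution) enumerates the sample space, assuming fair coins and independent tosses so that all eight outcomes are equally likely, and adds up single-outcome probabilities as rule (P4) allows.

    from itertools import product

    # Sample space for three tosses; assuming fair coins and independent
    # tosses, each of the 8 outcomes has probability 1/8.
    S = ["".join(outcome) for outcome in product("ht", repeat=3)]

    # The event {at least two heads} as a subset of S.
    event = [s for s in S if s.count("h") >= 2]

    # Rule (P4): the event is a disjoint union of single-outcome events,
    # so its probability is the sum of their probabilities.
    prob = sum(1 / len(S) for _ in event)

    print(S)      # ['hhh', 'hht', 'hth', 'htt', 'thh', 'tht', 'tth', 'ttt']
    print(prob)   # 0.5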
Note: The examples are collected together at the end of each chapter.

Probability theory would be very boring if all problems were solved like that: break the event into pieces whose probabilities you know, then add. Things become much more interesting when we recognize that the assignment of probabilities depends on what we know or have learnt (or assume) about the random situation. For example, in the last problem we could have written

    P{at least two heads | coins fair, "independence", ...} = ...

to indicate that the assignment is conditional on certain information (or assumptions). The vertical bar stands for the word given; that is, we read the symbol as "probability of at least two heads given that ...".

If the conditioning information is held fixed throughout a calculation, the conditional probabilities P(... | info) satisfy rules (P1) through (P4). For example, P(∅ | info) = 0, and so on. In that case one usually doesn't bother with the "given ...", but if the information changes during the analysis the conditional probability notation becomes most useful.

The final rule for (conditional) probabilities lets us break the occurrence of an event into a succession of simpler stages, whose conditional probabilities might be easier to calculate or assign. Often the successive stages correspond to the occurrence of each of a sequence of events, in which case the notation is abbreviated:

    P(... | event A and event B have occurred and previous info)
    or P(... | A ∩ B ∩ previous info), where ∩ means intersection,
    or P(... | A, B, previous info),
    or P(... | A ∩ B) if the "previous info" is understood,
    or P(... | AB), where AB is an abbreviation for A ∩ B.

The commas in the third expression are open to misinterpretation, but convenience recommends the more concise notation.

Remark. I must confess to some inconsistency in my use of parentheses and braces. If the "..." is a description in words, then {...} denotes the subset of S on which the description is true, and P{...} or P{... | info} seems the natural way to denote the probability attached to that subset. However, if the "..." stands for an expression like A ∩ B, the notation P(A ∩ B) or P(A ∩ B | info) looks nicer to me. It is hard to maintain a convention that covers all cases. You should not attribute much significance to differences in my notation involving a choice between parentheses and braces.

Rule for conditional probability.

(P5) If A and B are events then P(A ∩ B | info) = P(A | info) · P(B | A, info).

The frequency interpretation might make it easier for you to appreciate this rule. Suppose that in N "independent" repetitions (given the same initial conditioning information) A occurs N_A times and A ∩ B occurs N_{A∩B} times. Then, for big N,

    P(A | info) ≈ N_A / N   and   P(A ∩ B | info) ≈ N_{A∩B} / N.

If we ignore those repetitions where A fails to occur then we have N_A repetitions given the original information and occurrence of A, in N_{A∩B} of which the event B also occurs. Thus P(B | A, info) ≈ N_{A∩B} / N_A. The rest is division.

In my experience, conditional probabilities provide a more reliable method for solving problems traditionally handled by counting arguments (combinatorics). I find it hard to be consistent about how I count, to make sure every case is counted once and only once, to decide whether order should matter, and so on. The next Example illustrates my point.

<2> Example. What is the probability that a hand of 5 cards contains four of a kind?
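The conditioning argument appears at the end of the chapter. For comparison, one can also get the answer by brute-force counting: the following Python sketch (my aside, assuming every one of the 2,598,960 five-card hands is equally likely) counts the hands that contain four cards of the same rank. The enumeration takes a little while, but it requires no probabilistic reasoning at all.

    from collections import Counter
    from itertools import combinations
    from math import comb

    # A deck of 52 cards: 13 ranks, 4 suits; assuming every 5-card hand
    # is equally likely.
    deck = [(rank, suit) for rank in range(13) for suit in range(4)]

    hands = comb(52, 5)                       # 2,598,960 possible hands
    four_of_a_kind = sum(
        1
        for hand in combinations(deck, 5)
        if max(Counter(rank for rank, _ in hand).values()) == 4
    )

    print(four_of_a_kind, hands)              # 624 2598960
    print(four_of_a_kind / hands)             # about 0.00024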
I wrote out many of the gory details to show you how the rules reduce the calculation to a sequence of simpler steps. In practice, one would be less explicit, to keep the audience awake.

The statement of the next example is taken verbatim from the delightful Fifty Challenging Problems in Probability by Frederick Mosteller, one of my favourite sources for elegant examples. One could learn a lot of probability by trying to solve all fifty problems. The underlying question has resurfaced in recent years in various guises. See

    http://en.wikipedia.org/wiki/Monty_Hall_problem
    http://en.wikipedia.org/wiki/Marilyn_vos_Savant#The_Monty_Hall_problem

to understand why probabilistic notation is so valuable. The lesson is: be prepared to defend your assignments of conditional probabilities.

<3> Example. Three prisoners, A, B, and C, with apparently equally good records have applied for parole. The parole board has decided to release two of the three, and the prisoners know this but not which two. A warder friend of prisoner A knows who are to be released. Prisoner A realizes that it would be unethical to ask the warder if he, A, is to be released, but thinks of asking for the name of one prisoner other than himself who is to be released. He thinks that before he asks, his chances of release are 2/3. He thinks that if the warder says "B will be released," his own chances have now gone down to 1/2, because either A and B or B and C are to be released. And so A decides not to reduce his chances by asking. However, A is mistaken in his calculations. Explain. (A simulation sketch for this example appears at the end of this section.)

You might have the impression at this stage that the first step towards the solution of a probability problem is always an explicit specification of a sample space. In fact that is seldom the case. An assignment of (conditional) probabilities to well chosen events is usually enough to set the probability machine in action. Only in cases of possible confusion (as in the last Example), or where great mathematical precision is needed, do I find a list of possible outcomes worthwhile to contemplate. In the next Example, construction of a sample space would be a nontrivial exercise, but conditioning helps to break a complex random mechanism into a sequence of simpler stages.
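Returning to the three prisoners of Example <3>: a simulation can make the disagreement concrete. The sketch below rests on two assumptions of mine (the first implicit in the problem, the second the crucial one A forgets to make explicit): each pair of prisoners is equally likely to be released, and when both B and C are to be released the warder names one of them at random. Under those assumptions, the long-run frequency of A's release among the trials in which the warder says "B" stays near 2/3, not 1/2.

    import random

    # Simulation for the three-prisoners example. Assumptions: each pair of
    # prisoners is equally likely to be released, and when both B and C are
    # released the warder names one of them at random.
    trials = 200_000
    says_B = 0
    says_B_and_A_released = 0

    for _ in range(trials):
        released = random.choice([{"A", "B"}, {"A", "C"}, {"B", "C"}])
        # The warder names a released prisoner other than A.
        answer = random.choice(sorted(released - {"A"}))
        if answer == "B":
            says_B += 1
            if "A" in released:
                says_B_and_A_released += 1

    # Frequency version of P(A released | warder says "B"), as in rule (P5).
    print(says_B_and_A_released / says_B)     # close to 2/3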