Probability: a Brief Introduction (Dana Longcope 1/18/05)
Probability is a vast sub-topic within mathematics, with numerous applications in physics, quantum mechanics being only one. Mathematical treatments can appear quite daunting, but fortunately most of us have experience with random processes in life, games of chance and such things.

The key concept in probability theory is the random variable. A random variable $x$ is one which will assume a different value each time we measure it (sometimes we say each time it is "realized"). If we measure it $N$ times we find $N$ different values; we refer to the $i$th measurement as $x_i$. There are basically two different kinds of random variable: discrete variables and continuous variables. We discuss these separately below.

Discrete random variables

A discrete random variable is one which can take only discrete values, let's say integers. For example, $d$ is the number of spots showing on a 6-sided die after I roll it (i.e. it is the roll of a die). $d$ may therefore take on the values 1, 2, ..., 6, and no others; I cannot roll the values $d = \pi/2$ or $d = \sqrt{2}$. I might, for example, roll my die 10 different times and obtain the 10 realizations

    $d_1 = 5,\ d_2 = 1,\ d_3 = 3,\ d_4 = 4,\ d_5 = 4,\ d_6 = 3,\ d_7 = 6,\ d_8 = 2,\ d_9 = 3,\ d_{10} = 5$

but the next time I rolled 10 times I would get 10 different values. Since $d$ is a random variable we don't know its value prior to rolling (at least, that is the basic hypothesis of random variables).

We characterize a random variable by listing the probabilities of its various outcomes. We denote by $P_d$ the probability that a given realization will assume the value $d$. A probability $P_j = 0$ means that outcome $d = j$ is completely impossible (so $P_7 = 0$, since the die doesn't have a 7-spotted side); $P_j = 1$ means that that particular outcome is a certainty. These are the two extremes in probability, and every probability must be within the range $0 \le P_j \le 1$, always. There is no such thing as a negative probability, or a probability more certain than perfect certainty.

In the case of a fair 6-sided die we know that all 6 possible outcomes are equally likely. Furthermore, since the sum of all probabilities must be one (more on that below), the value of each one must be $1/6$:

    $P_1 = \tfrac{1}{6},\ P_2 = \tfrac{1}{6},\ P_3 = \tfrac{1}{6},\ P_4 = \tfrac{1}{6},\ P_5 = \tfrac{1}{6},\ P_6 = \tfrac{1}{6}.$

If I want to know the probability that $d$ will take one value from a set of possibilities, I sum the probabilities of each outcome in the set. For example, the probability that $d$ will be an even number is

    $P(d\ \text{is even}) = P_2 + P_4 + P_6 = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}.$

A simple consequence of this fact is that if I sum up the probabilities of all possible outcomes I must get 1: we are perfectly certain that $d$ will assume some value. In probability this is called normalization:

    $\sum_j P_j = 1.$    (1)

Let's consider taking a function of our random variable: $f(d)$. Since $d$ will take on only integer values, $f$ need only be defined for integers. Perhaps I am playing a game where a die roll $d$ wins me $f(d)$ dollars from the following payoff table:

    d      1     2     3     4     5     6
    f(d)  -1     0    -1    0.5   0.5   0.5

(A negative value of $f$ means that I lose $|f|$ dollars.) The natural thing to ask is whether I should play this game. To answer this we compute the mean value, or expectation, of the function $f(d)$. The mean is defined as a sum over all possibilities

    $\langle f \rangle = \sum_d P_d\, f(d).$    (2)

From the payoff table above we find $\langle f \rangle = -1/12$. This means I lose, on average, $0.08 each time I roll the die. Of course, I never lose $0.08 on a particular roll; that is just the mean value.
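As a quick cross-check of eqs. (1) and (2), here is a small Python sketch (not part of the original notes; the names P, f, and mean_f are just illustrative) that encodes the fair-die probabilities and the payoff table above and evaluates the sum:

    from fractions import Fraction

    # Fair 6-sided die: P_d = 1/6 for d = 1, ..., 6
    P = {d: Fraction(1, 6) for d in range(1, 7)}

    # Payoff table f(d) from the text, in dollars (negative = I lose money)
    f = {1: Fraction(-1), 2: Fraction(0), 3: Fraction(-1),
         4: Fraction(1, 2), 5: Fraction(1, 2), 6: Fraction(1, 2)}

    # Normalization, eq. (1): the probabilities sum to 1
    assert sum(P.values()) == 1

    # Mean (expectation), eq. (2): <f> = sum_d P_d f(d)
    mean_f = sum(P[d] * f[d] for d in P)
    print(mean_f, float(mean_f))   # -1/12, about -0.083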
One should be careful not to confuse the mean with the experimental average. The mean $\langle f \rangle$ is found from knowledge of the probabilities. It is a precise number which is always the same. We will always use means in Quantum Mechanics I. The average, for which I'll use the notation $\bar{f}$, comes from a set of $N$ experimental realizations $d_i$:

    $\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(d_i).$

This is what you compute in the laboratory. Using the 10 realizations from earlier I get $\bar{f} = -0.15$. (This time at least, I seem to have lost even more than I "expected".) The average will be different every time you perform a new set of experiments, and will almost never be the same as the mean. The useful relationship is that $\bar{f}$ will be approximately equal to $\langle f \rangle$ as long as $N$ is "large enough". This little tidbit goes by the name of the law of large numbers. The trick is to know what "large enough" really means, but we cannot get into that here. This is (probably) the last we'll say of averages in this class.

Definition (2) gives the recipe for computing the mean of any function $f(d)$. It is worth making note of a few properties of the mean.

1. The mean is linear: If my function can be expressed as the sum of two functions, $f(d) = g(d) + h(d)$, then the mean of the sum is the sum of the means,

    $\langle g + h \rangle = \langle g \rangle + \langle h \rangle.$

If $\alpha$ is a constant (i.e. it does not depend on $d$ and is not otherwise random) then I can take it outside the mean:

    $\langle \alpha f \rangle = \alpha \langle f \rangle.$

2. The mean of a number is that number:

    $\langle 3 \rangle = \sum_j 3 P_j = 3 \sum_j P_j = 3,$

where I've used the fact that $P_j$ is normalized (i.e. eq. [1]). Since the mean of a function, $\langle f \rangle$, is not itself a random variable, we consider it to be a number. This means that the mean of a mean is that mean:

    $\langle \langle f \rangle \rangle = \langle f \rangle.$

This looks somewhat puzzling at first, but we will run into its likeness often in the future.

3. The mean of a product is NOT the product of the means: It is usually a bad idea to discuss something that is not true. But this case appears so often, and can cause so much harm if it is mistakenly used, that I felt it worth stating up front. In mathematical terms,

    $\langle g h \rangle \ne \langle g \rangle \langle h \rangle.$

Please note that there is a not-equals sign in this expression. Among many other things, this means that $\langle f^2 \rangle$ is different from $\langle f \rangle^2$. These are two different things.

It is common to take means of the random variable itself and of various powers of it. For example, $\langle d \rangle = 3.5$ for the 6-sided die. This tells us that the mean roll is a 3.5, although it's not easy to know what that means. One way to state it is to say that $\langle d \rangle$ is the centroid of the distribution $P_d$.

A given roll will differ from the mean by an amount $\Delta d = d - \langle d \rangle$. If I use this to find the mean departure from the mean, I find

    $\langle \Delta d \rangle = \langle d - \langle d \rangle \rangle = \langle d \rangle - \langle \langle d \rangle \rangle = 0.$

(This was done laboriously on purpose; please check that you understand each step.) The trivial result came about because $d$ goes above the mean by as much as it goes below the mean ($\Delta d$ is positive as much as it is negative). We can obtain a more informative result by calculating the mean of the square of the departure:

    $\langle (\Delta d)^2 \rangle = \langle [d - \langle d \rangle]^2 \rangle = \langle d^2 - 2 d \langle d \rangle + \langle d \rangle^2 \rangle$
    $\qquad = \langle d^2 \rangle - 2 \langle d \langle d \rangle \rangle + \langle \langle d \rangle^2 \rangle = \langle d^2 \rangle - 2 \langle d \rangle \langle d \rangle + \langle d \rangle^2$
    $\qquad = \langle d^2 \rangle - \langle d \rangle^2.$

Note that this would also be trivial if $\langle d^2 \rangle$ were the same as $\langle d \rangle^2$; but it is not. In fact $\langle (\Delta d)^2 \rangle \ge 0$, since it is a sum of non-negative numbers, so this exercise proves that $\langle d^2 \rangle \ge \langle d \rangle^2$ for any random variable.
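As a sanity check on the identity just derived, the following Python sketch (not from the original notes; the helper mean() and the variable names are mine) computes $\langle \Delta d \rangle$ and $\langle (\Delta d)^2 \rangle$ exactly for the fair die, confirming that the first is zero and the second equals $\langle d^2 \rangle - \langle d \rangle^2$:

    from fractions import Fraction

    # Fair 6-sided die: P_d = 1/6 for d = 1, ..., 6
    P = {d: Fraction(1, 6) for d in range(1, 7)}

    def mean(g):
        """Mean of a function g(d) over the distribution P, per eq. (2)."""
        return sum(P[d] * g(d) for d in P)

    mean_d = mean(lambda d: d)                    # <d> = 7/2 = 3.5

    # Mean departure from the mean: <d - <d>> = 0
    print(mean(lambda d: d - mean_d))             # 0

    # Mean squared departure equals <d^2> - <d>^2
    print(mean(lambda d: (d - mean_d) ** 2))      # 35/12
    print(mean(lambda d: d ** 2) - mean_d ** 2)   # 35/12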
This last quantity, which characterizes how far a given roll is expected to differ from the mean roll, is called the variance of the random variable $d$:

    $\mathrm{Var}(d) = \langle (\Delta d)^2 \rangle = \langle d^2 \rangle - \langle d \rangle^2.$

It is common to discuss the square root of the variance, called the standard deviation,

    $\sigma_d = \sqrt{\langle (\Delta d)^2 \rangle},$

which tells, in some sense, how far from the mean the value is likely to be: it is the "width" of the distribution. For the case of die rolls we find that $\langle d^2 \rangle = 91/6$, so $\sigma_d = 1.71$. A roll will be, on average, within about 1.71 of the mean value 3.5. This statement will appear puzzling, knowing what you do about dice.
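The law-of-large-numbers remark from earlier can be seen at work here as well. The sketch below (an illustration only, assuming a fair die and my own helper names) simulates $N$ rolls and computes the experimental average and the root-mean-square departure from it; as $N$ grows these should settle near $\langle d \rangle = 3.5$ and $\sigma_d \approx 1.71$.

    import random
    from math import sqrt

    def sample_stats(N):
        """Experimental average and rms departure for N simulated fair-die rolls."""
        rolls = [random.randint(1, 6) for _ in range(N)]
        avg = sum(rolls) / N
        rms = sqrt(sum((d - avg) ** 2 for d in rolls) / N)
        return avg, rms

    # The mean 3.5 and standard deviation 1.71 are fixed numbers;
    # these experimental estimates fluctuate, but approach them for large N.
    for N in (10, 100, 10_000, 1_000_000):
        print(N, sample_stats(N))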