Chapter 4: Probability Distributions

4.1 Random Variables

A random variable is a function X that assigns a numerical value x to each possible outcome in the sample space. An event can be associated with a single value of the random variable, or it can be associated with a range of values of the random variable. The probability of an event can then be described as

$$P(A) = P(X = x_i) \quad \text{or} \quad P(A) = P(x_l \le X \le x_u).$$

Other sets of values of the random variable could also be used to describe an event. If $x_i$, $i = 1, 2, \dots, N$ are all the possible values of the random variable associated with the sample space, then

$$\sum_{i=1}^{N} P(X = x_i) = 1.$$

e.g. Each (composite) outcome consists of 3 ratings (M, P, C). Let $M_1$, $P_1$ and $C_1$ be the preferred ratings. Let X be the function that assigns to each outcome the number x of preferred ratings the outcome possesses.

[Tree diagram: each branch $(M_i, P_j, C_k)$ lists the probability of that outcome and its value of X; e.g. the branch $(M_1, P_1, C_1)$ has probability 0.03 and X = 3.]

Since each outcome has a probability, we can compute the probability of getting each value x = 0, 1, 2, 3 of the function X:

x | P(X = x)
0 | 0.18
1 | 0.50
2 | 0.29
3 | 0.03

Random variables X can be classified by the number of values x they can assume. The two common types are
- discrete random variables, with a finite or countably infinite number of values;
- continuous random variables, having a continuum of values for x.

In summary:
1. A value of a random variable may correspond to several random events.
2. An event may correspond to a range of values (or ranges of values) of a random variable.
3. But a given value (in its legal range) of a random variable corresponds to a random event.
4. Different values of the random variable correspond to mutually exclusive random events.
5. Each value of a random variable has a corresponding probability.
6. All possible values of a random variable correspond to the entire sample space.
7. The summation of the probabilities corresponding to all values of a random variable must equal unity.

A fundamental problem is to find the probability of occurrence for each possible value x of the random variable X:

$$P(X = x) = \sum_{\text{all outcomes } A \text{ assigned value } x} P(A).$$

This is the problem of identifying the probability distribution for a random variable. The probability distribution of a discrete random variable X can be listed as a table of the possible values x together with the probability P(X = x) for each:

$x_1$ | $P(X = x_1)$
$x_2$ | $P(X = x_2)$
$x_3$ | $P(X = x_3)$
...

It is standard notation to refer to the values P(X = x) of the probability distribution by f(x):

$$f(x) \equiv P(X = x).$$

The probability distribution always satisfies the conditions

$$f(x) \ge 0 \quad \text{and} \quad \sum_{\text{all } x} f(x) = 1.$$

e.g. Check whether the following can serve as probability distributions: $f(x) = \frac{x-2}{2}$ for x = 1, 2, 3, 4, and $f(x) = \frac{x^2}{25}$ for x = 0, 1, 2, 3, 4.

Since the probability distribution for a discrete random variable is a tabular list, it can also be represented as a histogram, the probability histogram. For a discrete random variable, the height of the bin at value x is f(x); the width of the bin is meaningless. The probability histogram is commonly drawn either with touching bins or in Pareto style (also referred to as a bar chart).

[Figure: probability histogram of f(x) for the number of preferred ratings.]

Of course one can also compute the cumulative distribution function (or cumulative probability function)

$$F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty$$

and plot it in the ways learned in Chapter 2 (with the consideration that the x-axis is not continuous but discrete).
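To make the tabulated distribution concrete, here is a minimal Python sketch (the values are taken from the preferred-ratings example above; the variable names are illustrative) that checks the two conditions $f(x) \ge 0$ and $\sum f(x) = 1$ and accumulates the cumulative distribution $F(x) = P(X \le x)$:

```python
# Probability distribution of X = number of preferred ratings (from the example above)
f = {0: 0.18, 1: 0.50, 2: 0.29, 3: 0.03}

# The two defining conditions of a discrete probability distribution
assert all(p >= 0 for p in f.values())       # f(x) >= 0 for every x
assert abs(sum(f.values()) - 1.0) < 1e-9     # sum over all x of f(x) = 1

# Cumulative distribution function F(x) = P(X <= x), built as a running sum
F = {}
total = 0.0
for x in sorted(f):
    total += f[x]
    F[x] = total

print(F)   # roughly {0: 0.18, 1: 0.68, 2: 0.97, 3: 1.0}, up to floating-point rounding
```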
[Figure: cumulative distribution F(x) for the number of preferred ratings.]

We now start to discuss the probability distributions for many discrete random variables that occur in nature.

4.2 Binomial Distribution

Bernoulli distribution: In probability theory and statistics, the Bernoulli distribution, named after the Swiss scientist Jacob Bernoulli, is a discrete probability distribution which takes the value 1 with success probability p and the value 0 with failure probability q = 1 − p. So if X is a random variable with this distribution, we have

$$P(X = 1) = p; \qquad P(X = 0) = q = 1 - p.$$

Mean and variance of a random variable X:

(1) Mean (mathematical expectation, expectation, average, etc.):
$$\mu = \bar{x} = E(X) = \sum_i x_i \, P(X = x_i)$$

(2) Variance:
$$\mathrm{Var}(X) = E\!\left[(X - \bar{x})^2\right] = \sigma^2 = \sum_i (x_i - \mu)^2 \, P(X = x_i)$$

$\sigma$ is called the standard deviation.

For a random variable with the Bernoulli distribution we have

$$\mu = E(X) = p$$
$$\mathrm{Var}(X) = \sigma^2 = (1 - p)^2 p + p^2 q = q^2 p + p^2 q = pq(p + q) = pq.$$

Binomial distribution: We can refer to an ordered sequence of length n as a series of n repeated trials, where each trial produces a result that is either "success" or "failure". We are interested in the random variable that reports the number x of successes in the n trials. Each trial is a Bernoulli trial, which satisfies:
a) there are only two outcomes for each trial;
b) the probability of success is the same for each trial;
c) the outcomes for different trials are independent.

We are talking about the events $A_i$ in the sample space S, where

$A_1$ = s _ _ _ _ … _;  $A_2$ = _ s _ _ _ … _;  $A_3$ = _ _ s _ _ … _;  … ;  $A_n$ = _ _ _ _ _ … s;

where by b) $P(A_1) = P(A_2) = \dots = P(A_n)$ and by c) $P(A_i \cap A_j) = P(A_i) \cdot P(A_j)$ for all distinct pairs i, j.

e.g. A police roadblock checking for drivers who are wearing seatbelts:
condition a): two outcomes, "y" or "n";
conditions b) & c): if the events $A_1$ to $A_n$ cover all cars stopped, then b) and c) will be satisfied.

If, however, event $A_1$ is broken into two mutually exclusive sub-events, $A_{1<}$ (all outcomes s _ _ _ … _ in which driver 1 is less than 21) and $A_{1>}$ (all outcomes s _ _ _ … _ in which driver 1 is 21 or older), it is entirely likely that $P(A_{1<}) \ne P(A_{1>})$, and we would not be dealing with Bernoulli trials. If someone caught not wearing a seatbelt began to warn oncoming cars approaching the roadblock, then the independence condition $P(A_i \cap A_j) = P(A_i) \cdot P(A_j)$ would fail, and we would also not be dealing with Bernoulli trials.

Note that in our definition of Bernoulli trials the number of trials n is fixed in advance.

All sequences of Bernoulli trials of length n have the same probability distribution (a consequence of the assumptions behind the definition of Bernoulli trials). This probability distribution is called the binomial probability distribution for n. (It is called this because each trial has a binomial outcome, "s" or "f", and the sequences generated (the composite outcomes) are binomial sequences.)

e.g. The binomial probability distribution for n = 3, p = ½. The sample space has $2^3 = 8$ outcomes:

x = 3: sss
x = 2: ssf, sfs, fss
x = 1: sff, fsf, ffs
x = 0: fff

P(sss) = ½ · ½ · ½ = 1/8;  P(ssf) = ½ · ½ · (1 − ½) = 1/8;  P(fsf) = (1 − ½) · ½ · (1 − ½) = 1/8; etc.

Probability distribution:

x    | 0   | 1   | 2   | 3
f(x) | 1/8 | 3/8 | 3/8 | 1/8

or equivalently

$$f(x) = \binom{3}{x} \left(\tfrac{1}{2}\right)^x \left(1 - \tfrac{1}{2}\right)^{3-x}, \quad x = 0, 1, 2, 3.$$

From this example we see that the binomial probability distribution, which governs Bernoulli trials of length n, is

$$f(x) \equiv b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \qquad \text{(BPD)}$$

where p is the (common) probability of success in any trial, and x = 0, 1, 2, …, n.
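The following minimal Python sketch (standard library only; the function name binom_pmf is illustrative) evaluates (BPD) directly and reproduces the n = 3, p = ½ table above:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x), the binomial probability (BPD)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Reproduce the n = 3, p = 1/2 example: f(0), ..., f(3) = 1/8, 3/8, 3/8, 1/8
for x in range(4):
    print(x, binom_pmf(x, 3, 0.5))

# The probabilities sum to 1 (binomial theorem with p + (1 - p) = 1)
assert abs(sum(binom_pmf(x, 3, 0.5) for x in range(4)) - 1.0) < 1e-12
```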
Note:

1. The term on the RHS of (BPD) is the x'th term of the binomial expansion of $[p + (1 - p)]^n$, i.e.

$$[p + (1 - p)]^n = \sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n - x},$$

which also proves that

$$\sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n - x} = 1^n = 1.$$

2. (BPD) is a 2-parameter family of distribution functions, characterized by the choice of n and p.

e.g. In 60% of all solar-heat installations, the utility bill is reduced by at least one third. What is the probability that the utility bill will be reduced by at least one third in a) 4 of 5 installations? b) at least 4 of 5 installations?

a) Take "s" = "reduced by at least 1/3" and "f" = "reduced by less than 1/3", so $P(A_i) = p = 0.6$. Assume condition c) of the Bernoulli trial assumptions holds. Then

$$f(4) = b(4; 5, 0.6) = \binom{5}{4} (0.6)^4 (0.4)^1 \approx 0.259.$$

b) We want

$$f(4) + f(5) = b(4; 5, 0.6) + b(5; 5, 0.6) = \binom{5}{4} (0.6)^4 (0.4)^1 + \binom{5}{5} (0.6)^5 (0.4)^0 \approx 0.337.$$

[Figure: examples of the binomial distribution for various n and p.]

Cumulative binomial probability distribution: the function

$$B(x; n, p) \equiv \sum_{k=0}^{x} b(k; n, p) \qquad \text{(CBPD)}$$

is the probability of x or fewer successes in n Bernoulli trials, where p is the probability of success on each trial. From (CBPD) we see

$$b(x; n, p) = B(x; n, p) - B(x - 1; n, p).$$

Values of $B(x; n, p)$ are tabulated for various n and p values in Table 1 of Appendix B.

[Figure: cumulative binomial distribution.]

e.g. The probability is 0.05 for flange failure under a given load L.
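A companion Python sketch for the cumulative distribution, assuming the binom_pmf helper from the previous sketch; it reproduces the solar-heat numbers and checks the relation $b(x; n, p) = B(x; n, p) - B(x - 1; n, p)$:

```python
def binom_cdf(x: int, n: int, p: float) -> float:
    """B(x; n, p) = sum_{k=0}^{x} b(k; n, p): probability of x or fewer successes."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Solar-heat example: p = 0.6, n = 5 installations
print(binom_pmf(4, 5, 0.6))                           # a) exactly 4 of 5: ~0.2592
print(binom_pmf(4, 5, 0.6) + binom_pmf(5, 5, 0.6))    # b) at least 4 of 5: ~0.3370
print(1 - binom_cdf(3, 5, 0.6))                       # same as b), via the cumulative form

# b(x; n, p) = B(x; n, p) - B(x - 1; n, p)
assert abs(binom_pmf(4, 5, 0.6) - (binom_cdf(4, 5, 0.6) - binom_cdf(3, 5, 0.6))) < 1e-12
```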