Basic Statistics for SGPE Students. Part II: Probability theory

Nicolai Vitt [email protected]

University of Edinburgh

September 2019

Thanks to Achim Ahrens, Anna Babloyan and Erkal Ersoy for creating these slides and allowing me to use them.

Outline

1. Probability
   - Conditional probability and independence
   - Bayes' theorem

2. Probability distributions
   - Discrete and continuous probability functions
   - Probability density function & cumulative distribution function
   - Binomial, Poisson and normal distributions
   - E[X] and V[X]

3. Descriptive statistics
   - Sample statistics (mean, variance, percentiles)
   - Graphs (box plot, histogram)
   - Data transformations (log transformation, unit of measurement)
   - Correlation vs. Causation

4. Statistical inference
   - Population vs. sample
   - Central limit theorem
   - Confidence intervals
   - Hypothesis testing and p-values

Probability

Example II.1 A fair coin is tossed three times.

Sample space and event
The (mutually exclusive and exhaustive) list of possible outcomes of an experiment is known as the sample space and is denoted as S. An event E is a single outcome or group of outcomes in the sample space. That is, E is a subset of S.

In this example, S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}, where H and T denote head and tail. Suppose we are interested in the event 'at least two heads'. The corresponding subset is E = {HHH, THH, HTH, HHT}. What is the probability of the event E?

Let's take a step back: What is probability?

Classical Interpretation (Jacob Bernoulli, Pierre-Simon Laplace)
If outcomes are equally likely, they must have the same probability. For example, when a coin is tossed, there are two possible outcomes: head and tail. More generally, if there are n equally likely outcomes, then the probability of each outcome is 1/n.

Frequency Interpretation
The probability that a specific outcome of a process will be obtained is the relative frequency with which that outcome would be obtained if the process were repeated a large number of times under the same conditions.

[Figure: relative frequency of heads plotted against the number of tosses (0 to 100) for two trials. As we make more and more tosses, the proportion of tosses that produce head approaches 0.5. We say that 0.5 is the probability of head.]
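To see the frequency interpretation at work, one can simulate repeated coin tosses and track the running proportion of heads. The sketch below is only an illustration (not part of the original slides); the function name and the choice of 100 tosses are arbitrary.

```python
import random

def running_proportion_of_heads(n_tosses, seed=None):
    """Toss a fair coin n_tosses times and return the running proportion of heads."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        proportions.append(heads / i)
    return proportions

props = running_proportion_of_heads(100, seed=1)
print(props[9], props[49], props[99])  # proportion of heads after 10, 50 and 100 tosses
```

As the number of tosses grows, the printed proportions tend to settle around 0.5, mirroring the figure described above.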

Subjective Interpretation (Bayesian approach)
The probability that a person assigns to a possible outcome represents his or her own judgement (based on the person's beliefs and information). Another person, who may have different beliefs or different information, may assign a different probability to the same outcome. This leads to the distinction between prior and posterior beliefs.

Thinking about chance...

[Carl Friedrich] Gauss's conversation turned to chance, the enemy of all knowledge, and the thing he had always wished to overcome. Viewed from up close, one could detect the infinite fineness of the web of causality behind every event. Step back and larger patterns appeared: Freedom and Chance were a question of distance, a point of view. Did he understand? Sort of, said Eugen wearily, looking at his pocket watch.
(from Measuring the World by Daniel Kehlmann)

Properties of probability

Rule 1
For any event A, 0 ≤ P(A) ≤ 1. Furthermore, P(S) = 1.

Rule 2: Complement rule
Ac denotes the complement of event A. Then P(Ac) = 1 − P(A).

Rule 3: Multiplication rule
Two events A and B are independent of each other if and only if
P(AB) = P(A and B) = P(A ∩ B) = P(A)P(B).

[On the slides, each rule is illustrated with a Venn diagram in the sample space S.]

Rule 4: Addition rule
If two events A and B are mutually exclusive, then
P(A or B) = P(A ∪ B) = P(A) + P(B).

Rule 5
If event B is a subset of event A, then P(B) ≤ P(A).

What is the probability of E?

Example II.1 A fair coin is tossed three times.
S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}
E = {HHH, THH, HTH, HHT}
What is P(E)?

First, note that, because the coin is fair,
P(H) = P(T) = 1/2.
Second, since each toss is independent of the previous ones, we can use Rule 3 (Multiplication Rule):
P(HHH) = P(H)P(H)P(H) = 1/2 × 1/2 × 1/2 = 1/8,
and, following the same reasoning, P(THH) = P(HTH) = ... = 1/8.
Third, using Rule 4 (Addition Rule),
P(E) = P(HHH) + P(THH) + P(HTH) + P(HHT) = 4/8 = 1/2.
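The same answer can be checked by brute force: enumerate the eight equally likely outcomes and count those with at least two heads. This is a small illustrative sketch, not part of the original slides.

```python
from itertools import product

# All 2^3 = 8 equally likely outcomes of three fair coin tosses
sample_space = list(product("HT", repeat=3))

# Event E: at least two heads
event = [outcome for outcome in sample_space if outcome.count("H") >= 2]

print(len(event) / len(sample_space))  # 0.5
```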

Generalised addition rule

Example II.2 A fair six-sided die is rolled. The sample space is given by

S = {1, 2, 3, 4, 5, 6}.

Let E1 be the event ‘obtain 3 or 4’ and let E2 denote the event ‘smaller than 4’. Thus,

E1 = {3, 4} and E2 = {1, 2, 3}

It is immediately clear that P(E1) = 2/6 and P(E2) = 3/6. But what is the probability that either E1 or E2 occurs? That is, what is P(E1 ∪ E2)?

Since E1 and E2 are not mutually exclusive, we cannot apply Rule 4 (Addition Rule). But we can generalise Rule 4.


Rule 4': (General) Addition rule
For any two events A and B,
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(AB).
Note that if A and B are mutually exclusive, P(AB) = 0. Therefore, Rule 4 is a special case of Rule 4'. Applying Rule 4', we get

P(E1 ∪ E2) = P(E1) + P(E2) − P(E1E2)
           = [P(3) + P(4)] + [P(1) + P(2) + P(3)] − P(3)
           = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 − 1/6 = 4/6.
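The general addition rule is easy to verify mechanically by treating events as sets of outcomes. The sketch below is illustrative only (not part of the slides) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
E1 = {3, 4}      # 'obtain 3 or 4'
E2 = {1, 2, 3}   # 'smaller than 4'

def prob(event):
    # Each outcome of a fair die is equally likely
    return Fraction(len(event), len(sample_space))

# General addition rule: P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
lhs = prob(E1 | E2)
rhs = prob(E1) + prob(E2) - prob(E1 & E2)
print(lhs, rhs)  # 2/3 2/3
```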

Conditional probability

Example II.3 Suppose that, on any particular day, Anna is either in a good mood (A) or in a bad mood (Ac). Also, on any particular day, the sun is either shining (B) or not (Bc). Anna's mood depends on the weather, such that she is more likely to be in a good mood when the sun is shining.

[Venn diagram: the event A inside the sample space S.]
The blue area A, which represents the probability that Anna is in a good mood, is rather small compared to the full rectangle (≈ 35%). In general, it is more likely that Anna is in a bad mood.

[Venn diagram: the events A and B with their overlap, showing the regions ABc, AB and AcB.]
This graph shows both events, A and B, and their overlap.

[Venn diagram: conditioning on B; only the circle B is shown, split into AB and AcB.]
Now, suppose the sun is shining. We can discard the remaining sample space and focus on B. The area AB takes up most of the area in the circle. That is, given that B occurred, it is more likely that Anna is in a good mood, although, in general, she is more often in a bad mood.


Rule 3': General Multiplication rule
If A and B are any two events with P(A) > 0 and P(B) > 0, then
P(AB) = P(A)P(B|A) = P(B)P(A|B).

P(A|B) is the conditional probability of the event A given that the event B has occurred.

Conditional probability
From Rule 3' follows the definition of conditional probability:
P(A|B) = P(AB) / P(B).

Note that, if A and B are independent, then
P(A|B) = P(A)P(B) / P(B) = P(A).
Thus, Rule 3 is a special case of Rule 3'.

Example II.4 The following table contains counts (in thousands) of persons aged 25 and older, classified by educational attainment and employment status:

Education                      Employed   Unemployed   Not in labor force     Total
Did not finish high school       11,521          886               14,226    26,633
High school degree               36,857        1,682               22,834    61,373
Some college                     34,612        1,275               13,944    49,831
Bachelor's degree or higher      43,182          892               12,546    56,620
Total                           126,172        4,735               63,550   194,457

Is employment status independent of educational attainment?

Suppose we randomly draw a person from the population. What is the probability that the person is employed?
P(employed) = 126,172 / 194,457 = 0.6488.

Now, suppose we randomly draw another person and are given the information that the person did not finish high school. What is the probability that the person is employed given that the person did not finish high school?
P(employed | did not finish high school) = 11,521 / 26,633 = 0.4326.

We can display the relationship between education and employment in a probability table.

Education                      Employed   Unemployed   Not in labor force     Total
Did not finish high school      0.05925      0.00456              0.07316   0.13696
High school degree              0.18954      0.00865              0.11742   0.31561
Some college                    0.17800      0.00656              0.07171   0.25626
Bachelor's degree or higher     0.22206      0.00459              0.06452   0.29117
Total                           0.64884      0.02435              0.32681   1.00000

The probabilities in the body of the table are joint probabilities. For example,

P(no high school ∩ unemployed) = P(unemp.)P(no high school | unemp.)
                               = (4,735 / 194,457) × (886 / 4,735)
                               = P(no high school)P(unemp. | no high school)
                               = (26,633 / 194,457) × (886 / 26,633)
                               = 886 / 194,457 = 0.00456.


The probabilities in the right-most column and in the bottom row are called marginal probabilities. For example,

P(High school degree) = 61,373 / 194,457 = 0.31561.


Note that, under independence, we would have

P(no high school ∩ employed) = P(employed)P(no high school) = 0.64884 × 0.13696 = 0.08887 ≠ 0.05925,

which indicates that educational attainment and employment are not independent.
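The joint, marginal and conditional probabilities above can all be reproduced from the count table. Here is a small illustrative sketch in plain Python (not part of the original slides); the dictionary keys are shorthand labels of my own choosing.

```python
# Counts (in thousands): rows are education levels,
# columns are (employed, unemployed, not in labor force)
counts = {
    "no high school": (11_521, 886, 14_226),
    "high school":    (36_857, 1_682, 22_834),
    "some college":   (34_612, 1_275, 13_944),
    "bachelor+":      (43_182, 892, 12_546),
}
total = sum(sum(row) for row in counts.values())

# Marginal probabilities
p_employed = sum(row[0] for row in counts.values()) / total
p_no_hs = sum(counts["no high school"]) / total

# Joint and conditional probabilities
p_no_hs_and_employed = counts["no high school"][0] / total
p_employed_given_no_hs = p_no_hs_and_employed / p_no_hs

print(round(p_employed, 4))              # 0.6488
print(round(p_employed_given_no_hs, 4))  # 0.4326
print(round(p_employed * p_no_hs, 5))    # 0.08887, not equal to the joint 0.05925
```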


Example II.5 Ms Smith, Ms Brown and Ms Thomson want to spend a day in Edinburgh, but cannot agree on what to do. They decide to vote. Each person can choose between theatre (T) and cinema (C). Ms Smith and Ms Thomson decide independently, but Ms Brown is affected by Ms Thomson. The probabilities can be summarised as follows:
P(Thomson = T) = 0.2
P(Brown = T | Thomson = T) = 0.8
P(Brown = T | Thomson = C) = 0.05
P(Smith = T) = 0.8

What is the probability that the majority (i.e. at least two) will vote in favour of theatre?
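The slides leave this as an exercise. One way to answer it is to enumerate the eight possible vote combinations and add up the probabilities of those with at least two votes for theatre; a brief illustrative sketch (not from the slides), where the function and variable names are my own:

```python
from itertools import product

p_thomson_T = 0.2
p_smith_T = 0.8
p_brown_T_given_thomson = {"T": 0.8, "C": 0.05}  # Brown's vote depends on Thomson's

def prob(thomson, brown, smith):
    """Joint probability of one particular combination of votes."""
    p = p_thomson_T if thomson == "T" else 1 - p_thomson_T
    p_brown_T = p_brown_T_given_thomson[thomson]
    p *= p_brown_T if brown == "T" else 1 - p_brown_T
    p *= p_smith_T if smith == "T" else 1 - p_smith_T
    return p

p_majority_theatre = sum(
    prob(t, b, s)
    for t, b, s in product("TC", repeat=3)
    if (t, b, s).count("T") >= 2
)
print(round(p_majority_theatre, 3))  # 0.224
```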

Independence versus disjointness

Recall that P(A and B) = P(A)P(B) holds if and only if A and B are independent. Furthermore,

P(A and B) = 0

holds if and only if the events A and B are disjoint or mutually exclusive. Therefore, if A and B are nontrivial events (i.e. P(A) and P(B) are nonzero), then they cannot be both independent and mutually exclusive.

Remark
Independent and disjoint do not mean the same thing! Disjointness means that A and B cannot occur at the same time. Independence means that the occurrence of A has no influence on the probability that B happens, and vice versa.

Bayes' theorem

Derivation
From Rule 3',
P(A|B) = P(AB) / P(B)    (1)
P(B|A) = P(AB) / P(A)    (2)

We can rewrite (2) as P(B|A)P(A) = P(AB) and substitute the expression into (1) to get
P(A|B) = P(B|A)P(A) / P(B).    (3)
Furthermore, P(B) = P(BA) + P(BAc), and from (2), P(B) = P(B|A)P(A) + P(B|Ac)P(Ac). Therefore, we can write (3) as

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac)P(Ac)].


Bayes' theorem
For any two events A and B with 0 < P(A) < 1 and 0 < P(B) < 1,
P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac)P(Ac)].

Bayes’ theorem provides a simple rule for computing the conditional probability of the event A given B from the conditional probability of B given A (and the unconditional probability of A).


Example II.6 Suppose you have three coins in a box. Two of them are fair and the other one is counterfeit and always lands heads. Thus, if you randomly pick one coin, there is a 1/3 chance that the coin is counterfeit; i.e. P(counterfeit) = 1/3. P(counterfeit) is the prior (or unconditional) probability. Now, you toss the randomly picked coin three times and get three heads.

We are interested in the (posterior) probability that the coin is counterfeit conditional on observing three heads. That is,
P(counterfeit|HHH) = P(HHH|counterfeit)P(counterfeit) / [P(HHH|counterfeit)P(counterfeit) + P(HHH|fair)P(fair)].

We know from above that
P(counterfeit) = 1/3, P(fair) = 2/3,
P(HHH|counterfeit) = 1, P(HHH|fair) = 1/8.

Thus,
P(counterfeit|HHH) = (1 × 1/3) / (1 × 1/3 + 1/8 × 2/3) = 4/5.
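A small helper function makes this kind of posterior calculation mechanical. The sketch below is illustrative only (not part of the slides); the function name is my own.

```python
from fractions import Fraction

def posterior(prior_a, likelihood_given_a, likelihood_given_not_a):
    """Bayes' theorem: P(A|B) from P(A), P(B|A) and P(B|Ac)."""
    numerator = likelihood_given_a * prior_a
    denominator = numerator + likelihood_given_not_a * (1 - prior_a)
    return numerator / denominator

# Example II.6: three heads from a randomly picked coin
print(posterior(Fraction(1, 3), Fraction(1), Fraction(1, 8)))  # 4/5
```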

Example II.7 Suppose a test for an illegal drug correctly identifies drug users 90% of the time and gives a positive reading for non-drug users only 1% of the time. One person in a thousand in the population is a drug user. Timmy is tested positive, indicating that he is a drug user. How likely is it that Timmy is actually a drug user?

We are looking for
P(user|pos.) = P(pos.|user)P(user) / [P(pos.|user)P(user) + P(pos.|non-user)P(non-user)].

From the text above, we know that P(user) = 0.001, P(non-user) = 0.999, P(pos.|user) = 0.9 and P(pos.|non-user) = 0.01. Therefore,
P(user|pos.) = (0.9 × 0.001) / (0.9 × 0.001 + 0.01 × 0.999) ≈ 0.083.



The prior (unconditional) probability that Timmy is a drug user is P(user) = 0.001. Based on the information from the test, we update the prior probability of 0.001 upwards to a posterior probability of approximately 0.083. This probability is surprisingly low. Despite the positive test result and despite the test being quite reliable, it is more likely that Timmy is not a drug user than that he is a drug user!



We can display the relationship between test results and drug consumption in a probability table:

Drug user?   Positive   Negative   Total
Non-user       0.0099     0.9891   0.999
User           0.0009     0.0001   0.001
Total          0.0108     0.9892   1.000

P(user ∩ positive) = P(user)P(pos.|user) = 0.001 × 0.9 = 0.0009
P(non-user ∩ positive) = P(non-user)P(pos.|non-user) = 0.999 × 0.01 = 0.00999
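The probability table can be built up from the prior and the test's error rates, and the posterior read off as a ratio. An illustrative sketch (not from the slides):

```python
p_user = 0.001
p_pos_given_user = 0.90
p_pos_given_nonuser = 0.01

# Joint probabilities, as in the probability table above
joint = {
    ("user", "positive"):     p_user * p_pos_given_user,
    ("user", "negative"):     p_user * (1 - p_pos_given_user),
    ("non-user", "positive"): (1 - p_user) * p_pos_given_nonuser,
    ("non-user", "negative"): (1 - p_user) * (1 - p_pos_given_nonuser),
}

# Marginal probability of a positive test, then the posterior P(user | positive)
p_positive = joint[("user", "positive")] + joint[("non-user", "positive")]
p_user_given_positive = joint[("user", "positive")] / p_positive

print(round(p_user_given_positive, 3))  # 0.083
```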

Monty Hall problem

You are in a game show. There are three doors. Behind one door is a car; behind the other two doors are goats. You pick one door (here door 1). The game host then opens another door which has a goat behind it (here door 2). Finally, the game host gives you the chance to switch to the other closed door (here door 3). Should you stick to your door or switch? Does it matter?

The answer seems obvious: it should not make a difference. There are two doors left, so the probability of winning should be 0.5, independent of how you decide. However, this reasoning is wrong! To see why, let's list all nine different cases and see which strategy is more successful.


Your pick   Car behind   Stick   Switch
Door 1      Door 1       WIN     LOSE
Door 1      Door 2       LOSE    WIN
Door 1      Door 3       LOSE    WIN
Door 2      Door 1       LOSE    WIN
Door 2      Door 2       WIN     LOSE
Door 2      Door 3       LOSE    WIN
Door 3      Door 1       LOSE    WIN
Door 3      Door 2       LOSE    WIN
Door 3      Door 3       WIN     LOSE

Whichever door we pick initially, sticking wins in only one of the three equally likely car positions, while switching wins in the other two. If we switch, we have a 2/3 chance of winning! Watch video
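A quick simulation confirms the 2/3 figure. This is an illustrative sketch only (not part of the slides); the number of repetitions and the seed are arbitrary.

```python
import random

def play(switch, rng):
    """Play one round of the Monty Hall game; return True if the player wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(42)
n = 100_000
print(sum(play(switch=True, rng=rng) for _ in range(n)) / n)   # close to 2/3
print(sum(play(switch=False, rng=rng) for _ in range(n)) / n)  # close to 1/3
```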

The Birthday Problem

Example II.8 Suppose there is a group of k (2 ≤ k ≤ 365) people. What is the probability that at least two people in the group share the same birthday (i.e. year of birth does not matter)? Ignore February 29 and assume that each of the 365 days of a year is equally likely to be the birthday of any person and that birthdays of the group members are unrelated (no twins).

It turns out that it is easier to start with the question “What is the probability that no one in the group shares a birthday?”. Note that

P(at least two share a birthday) = 1 − P(no one shares a birthday).

Let's start with k = 2. Given that the first person has her birthday on any arbitrary day of the year, the probability that the second person does not have the same birthday is 364/365.



k = 3: The probability that three persons do not share the same birthday is (364/365) × (363/365). And, in general,

[364 · 363 · 362 · ... · (365 − k + 1)] / 365^(k−1).



Note that
n(n − 1) ... (n − k + 1) = n(n − 1) ... (n − k + 1) × [(n − k)(n − k − 1) ... 1] / [(n − k)(n − k − 1) ... 1] = n! / (n − k)!,
where n! = n(n − 1) ... 1 and 0! = 1. Thus, we can write the above as
P(no one shares a birthday) = 365! / [(365 − k)! 365^k],
and the solution is
P(at least two share a birthday) = 1 − 365! / [(365 − k)! 365^k].



The table shows the probability p that at least two people in a group of k people will have the same birthday.

 k     p
 5     0.027
10     0.117
15     0.253
20     0.411
22     0.476
23     0.507
25     0.569
30     0.706
40     0.891
50     0.970
60     0.994

Watch video
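The probabilities in the table follow directly from the formula above; a short illustrative sketch (not part of the slides):

```python
def p_shared_birthday(k):
    """Probability that at least two of k people share a birthday (365 equally likely days)."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (365 - i) / 365  # the (i+1)-th person avoids the first i birthdays
    return 1 - p_no_match

for k in (5, 10, 20, 23, 30, 50, 60):
    print(k, round(p_shared_birthday(k), 3))
```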

Sampling with replacement

The birthday problem is an example of sampling with replacement.

Sampling with replacement (stylised example)
A box contains n balls numbered 1, ..., n. First, one ball is selected at random from the box and its number noted. This ball is then put back in the box and another ball is selected. Thus, it is possible that the same ball is selected again. This process is called sampling with replacement. It is assumed that each of the n balls is equally likely to be selected at each stage and that the selections are independent of each other. Suppose we pick k balls. There are in total n^k different outcomes. The probability assigned to each outcome is 1/n^k.

Sampling without replacement

Example II.9 Suppose we have a box of 6 different books and we randomly arrange the books on a shelf. What is the probability that, by chance, the books are ordered alphabetically?

There are 6 · 5 · 4 · 3 · 2 · 1 = 6! = 720 distinct ways of arranging 6 books, but only one order is alphabetically correct. Thus, p = 1/720.

More generally: Permutations
Suppose that k cards are to be selected and removed from a deck of n cards without replacement. Each possible distinct outcome is called a permutation. The total number of permutations is
P_{n,k} = n(n − 1) ... (n − k + 1) = n! / (n − k)!,
where a! = a(a − 1)(a − 2) ... 1 and 0! = 1.
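Both calculations are easy to check in code. The sketch below is illustrative only (not part of the slides); the deck size n = 52 and k = 5 are arbitrary example values.

```python
from math import factorial, perm  # math.perm requires Python 3.8+

# Example II.9: 6 books, only one of the 6! orderings is alphabetical
print(1 / factorial(6))  # 0.001388... = 1/720

# Number of permutations P_{n,k} = n! / (n − k)!
n, k = 52, 5
print(perm(n, k), factorial(n) // factorial(n - k))  # both print 311875200
```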

Summary

- Frequentist approach: The probability of an outcome is the relative frequency with which that outcome would be obtained if the experiment were repeated a large number of times.

- Independence and disjointness are not the same! If two events A and B are mutually exclusive (or disjoint), then P(AB) = 0. If two events are independent, then the occurrence of A has no influence on the probability that B occurs, and vice versa.

- Bayes' theorem provides a rule for computing the conditional probability of the event A given B from the conditional probability of B given A. It is the building block of Bayesian econometrics.
