What Is Bayesian Inference? Bayesian Inference Is at the Core of the Bayesian Approach, Which Is an Approach That Allows Us to Represent Uncertainty As a Probability
Learn to Use Bayesian Inference in SPSS With Data From the National Child Measurement Programme (2016–2017)

© 2019 SAGE Publications Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This example dataset introduces Bayesian Inference. Bayesian statistics (the general name for all Bayesian-related topics, including inference) has become increasingly popular in recent years, due predominantly to the growth of ever more powerful and sophisticated statistical software. However, Bayesian statistics grew from the ideas of an English mathematician, Thomas Bayes, who lived and worked in the first half of the 18th century, and those ideas have been refined and adapted by statisticians and mathematicians ever since. Despite its longevity, the Bayesian approach did not become mainstream: the Frequentist approach was and remains the dominant means of conducting statistical analysis. However, there is renewed interest in Bayesian statistics, prompted partly by software development and partly by a growing critique of the limitations of the null hypothesis significance testing that dominates the Frequentist approach. This renewed interest can be seen in the incorporation of Bayesian analysis into mainstream statistical software, such as IBM® SPSS®, and into many major statistics textbooks.

Bayesian Inference is at the heart of Bayesian statistics and differs from Frequentist approaches in how it views probability. In the Frequentist approach, probability is the product of the frequency of random events occurring
over a long series of repeated trials or experiments. For example, if we want to calculate the probability of seeing tails in a coin toss, the Frequentist approach posits that the more times we toss the coin, the closer the proportion of tails will come to the “true” probability of the coin coming up tails. Crucially, the researcher does not incorporate prior knowledge (e.g., the coin’s composition or prior coin toss experiments) into the test.

In contrast, Bayesian Inference incorporates prior knowledge. For example, we may have a hunch that the coin used in the test is flawed and may favour one side over the other, or we may find that in the first series of tosses the same side always comes up. This prior belief about the fairness of the coin is taken into account when we review the final result. Let’s say that out of 1,000 flips we got 800 tails, which suggests that the coin is biased. In the Bayesian approach, we would arrive at our final view of the coin (the posterior belief) by updating our earlier view (the prior belief) in light of these observations. Thus, Bayesian Inference allows for the incorporation of prior knowledge, whether from other studies, observations, or even subjective experience. The Frequentist approach, built on the null hypothesis, assumes no prior knowledge; Bayesian Inference does not use null hypotheses.

Bayesian Inference can be applied to a range of statistical tests and analyses. Bayesian statistics can be complex, and this Guide provides only an introductory review. This Guide will outline Bayesian Inference generally and will then provide a specific example of how to conduct Bayesian Inference in an Independent Samples t test. An Independent Samples t test examines whether the mean of a continuous variable (e.g., age, height, weight) differs across the two levels or categories of a dichotomous categorical variable (e.g., male/female or rich/poor).
This example describes an Independent Samples t test using Bayesian Inference, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate the test with a subset of data from the 2016–2017 National Child Measurement Programme (Year 6). Specifically, we test whether the mean BMI of boys and girls in their final year of primary school differs. This page provides links to this sample dataset and a guide to producing an Independent Samples t test using Bayesian Inference in statistical software.

What Is Bayesian Inference?

Bayesian Inference is at the core of the Bayesian approach, which allows us to represent uncertainty as a probability. One way to understand the Bayesian approach is to contrast it with the Frequentist approach, which bases probabilities on repeatable, random events and has null hypothesis testing at its heart. In contrast, Bayesian Inference does not test null hypotheses but incorporates prior knowledge, and it does not rely on repetition or, necessarily, on randomness.

To illustrate, let’s imagine that we are interested in the performance of school children in a maths test. We take a random sample of 500 children from 20 schools within one city. The Frequentist approach would test a null hypothesis stating that there would be no variance in the children’s scores: they should all achieve a similar result, given the same test, the same age group, and supposedly the same maths syllabus. A Bayesian approach would not have a null hypothesis but would state what is known as a prior distribution.
Let’s say the Bayesian researcher knew that the test scores from the previous cohort had shown a specific variance; this would be the starting point for her analysis. In other words, prior knowledge is being incorporated. That prior knowledge might also be based on a reading of similar studies which showed a possible variance. Once the data are tested, both researchers find a clear gender divide in the test scores, but we might argue that, because the Bayesian researcher has incorporated prior knowledge, we may have more confidence in her results. Similarly, if the Frequentist researcher had not achieved an appropriate significance level, he would have had to fail to reject the null hypothesis, and that would end the research in its current form. Significance testing is easily influenced by sample size and composition. In contrast, the Bayesian researcher could continue to collect and analyse data, incorporating new findings into her probability calculation. For example, as her research expands, the gender difference may decline and she may start to find that household income or syllabus becomes more prominent. This approach is therefore more flexible and, in a sense, more intuitive.

In simple terms, a Frequentist researcher would calculate the betting odds of a horse race as equal across all the horses, whereas the Bayesian researcher would incorporate prior racing form into the calculation.
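The idea of starting from a prior belief and revising it as data arrive can be sketched with the coin example from the Introduction (800 tails in 1,000 flips). The Guide does not say how the prior is represented, so this minimal sketch assumes a Beta prior on the probability of tails, a common textbook choice for coin-toss data. The function names, the prior parameters (10, 10), and the split into two batches are hypothetical and chosen only for illustration.

```python
def update_beta(alpha, beta, tails, heads):
    """Return Beta posterior parameters after observing the given counts.

    alpha and beta are the Beta prior parameters (an assumed prior form);
    alpha is paired with tails and beta with heads.
    """
    return alpha + tails, beta + heads


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: a mild belief that the coin is fair.
prior = (10.0, 10.0)

# Update once with all 1,000 flips from the Guide's example.
posterior = update_beta(*prior, tails=800, heads=200)
print(beta_mean(*prior))                # 0.5
print(round(beta_mean(*posterior), 3))  # 0.794

# Update in two hypothetical batches: the posterior from the first batch
# becomes the prior for the second, and the end result is the same.
step = update_beta(*prior, tails=400, heads=100)
step = update_beta(*step, tails=400, heads=100)
print(step == posterior)                # True
```

With this assumed prior, the estimated probability of tails moves from 0.5 to roughly 0.79. The estimate is pulled almost all the way to the observed proportion of 0.8 because 1,000 flips far outweigh a prior worth 20 imaginary flips.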
Calculating Bayesian Inference

Bayes’ Theorem

At the heart of Bayesian Inference is Bayes’ Theorem, Equation 1 below:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where:

• P(A|B) = probability of A given B
• P(B|A) = probability of B given A
• P(A) = probability of A
• P(B) = probability of B

P(A|B) and P(B|A) are known as conditional probabilities: each is the probability of one event occurring given that the other event has already occurred.

To illustrate, let’s imagine that you work all day in a windowless lab and, as the end of your working day nears, you wonder what the chance is that it is raining. You wonder this because you forgot to wear a raincoat today. Based on meteorological data for the city where you live, you quickly estimate the probability of rain at 0.16. This is a low probability, and so you feel less worried about the missing raincoat. As you walk towards the exit, your boss appears. It has been sunny recently and your boss, who hates the sun, has been very grumpy, so you quickly estimate that the probability of him being happy is 0.3. However, he is smiling and laughing, which makes you wonder again whether it is raining, as his mood is greatly affected by the weather; he especially likes rain. Let’s say that the probability that he is happy given that it is raining is 0.95. You now wonder whether you should have brought your raincoat, so you use Bayes’ Theorem to calculate the probability that it is raining given that your boss is happy.
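The arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch rather than part of the Guide: the function and variable names are hypothetical, and the three input probabilities are the ones stated in the example. The last step is an added consistency check that uses the law of total probability to recover the implied probability that the boss is happy when it is not raining, a quantity the Guide does not state.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) using Bayes' Theorem: P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in the interval (0, 1]")
    return p_b_given_a * p_a / p_b


# Values taken from the rain example in the text.
p_rain = 0.16              # P(A): probability that it is raining
p_happy = 0.3              # P(B): probability that the boss is happy
p_happy_given_rain = 0.95  # P(B|A): boss is happy given that it is raining

p_rain_given_happy = bayes_posterior(p_happy_given_rain, p_rain, p_happy)
print(round(p_rain_given_happy, 3))  # 0.507

# Consistency check (not in the Guide): by the law of total probability,
# P(B) = P(B|A)P(A) + P(B|not A)P(not A), so P(B|not A) must lie in [0, 1].
p_happy_given_dry = (p_happy - p_happy_given_rain * p_rain) / (1 - p_rain)
print(round(p_happy_given_dry, 3))   # 0.176
```

The second figure confirms that the three stated probabilities are mutually consistent: they imply that the boss is happy on about 18% of dry days.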
$$P(A \mid B) = \frac{0.95 \times 0.16}{0.3} = 0.507$$

where:

• P(A|B) = probability that it is raining given that your boss is happy = 0.507
• P(B|A) = probability that your boss is happy given that it is raining = 0.95
• P(A) = probability that it is raining = 0.16
• P(B) = probability that your boss is happy = 0.3

The probability that it is raining given that your boss is happy is 0.507, or 50.7%, so it is slightly more likely to be raining outside than not. It is a shame that you don’t have your raincoat.

Conducting Bayesian Analysis: Prior and Posterior Distributions

Bayesian analysis uses different terminology from Frequentist analysis, so it is useful to review that terminology alongside the key steps in a Bayesian approach.