An Introduction to JASP: A Free and User-Friendly Package

James E Bartlett

1 Introduction 1.1 What is JASP? JASP is a free, open-source alternative to SPSS that allows you to perform both simple and complex analyses in a user-friendly package. The aim is to allow you to conduct both Classical analyses (stuff with p values) and Bayesian analyses (outlined in section 8), but with the advantage of a drag-and-drop interface that is intuitive to use.

1.2 The development of JASP JASP is still in development with new features being added almost on a monthly basis. This means you should constantly be checking their Twitter (@JASPStats) or Facebook (JASPStats) accounts to see if there is a new version available. This guide currently supports the features available in version 0.8.6 (as of February 28, 2018). If this is slightly out of date and there is a new feature you are confused about, feel free to email me and remind me to update it.

1.3 Why JASP? Although many universities predominantly use SPSS, it is extremely expensive which means you probably cannot use it unless you are affiliated with a university, and even then the licensing means it is often a nightmare to use on your own computer. JASP is a free, open-source alternative that aims to give you a simple and user-friendly output, making it ideal for students who are still getting to grips with statistics in psychology. Here are just a few benefits of using JASP:

1.3.1 Effect sizes Effect sizes are one of the most important values to report when analysing data. However, despite many articles and an APA task force (1999...no one ever listens) explaining their importance, SPSS only offers a limited number of effect size options and many simple effect sizes have to be calculated manually. On the other hand, JASP allows you to simply tick a box to provide an effect size for each test, and even provides multiple options for some statistical tests.
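To make the "calculated manually" point concrete, here is a minimal sketch in Python of the pooled-standard-deviation formula for Cohen's d, the standardised effect size used throughout this guide. The function name and the example numbers are hypothetical; this is the standard textbook formula, not anything JASP-specific.

```python
import math

def cohens_d(group1, group2):
    """Cohen's d for two independent groups using the pooled standard
    deviation (the convention JASP pairs with the Student T-Test)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    # Sample variances (n - 1 in the denominator)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd
```

With two made-up groups, `cohens_d([1, 2, 3], [4, 5, 6])` gives -3.0: the first group scores three pooled standard deviations below the second.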

1.3.2 Continuously updated output Imagine you have gone through all of the menus in SPSS, only to realise you forgot to click one option that you wanted included in the output. You would have to go back through the menus, select that one option, and rerun the whole analysis, printing it below the first output. This looks incredibly messy and takes a lot of time. In JASP, all of the options and results are presented on the same screen. If you want another option to be presented, all you have to do is tick a box and the results are updated in seconds.

1.3.3 Minimalist design For each statistical test, SPSS provides every value you will ever need and more. This can be very confusing when you are getting to grips with statistics, and you can easily report the wrong value as SPSS also has its own naming conventions (e.g. Sig. instead of p value). In JASP, the aim is minimalism. You start off with the bare-bones result, and you have the option to select additional information if and when you need it.

1.3.4 Reproducible analyses A large number of errors being reported in psychological research has led to calls to improve the reproducibility of research findings (Munafo et al. 2017). Reproducibility means that you can show someone exactly how you got to the results you included in your report. In JASP, you have the opportunity to save your data and analyses together as a .jasp file. This preserves the analyses you performed to show yourself (thinking of your future self is probably the most important factor, as even you will probably forget which options you selected) and others months or years after conducting the analyses. In SPSS, you can save the output file, but this relies on you reverse engineering all the options that you selected, providing room for error if you miss one of the options. SPSS also creates unnecessary barriers to accessing data, as you cannot open .sav files without a valid SPSS license. Therefore, your data would not be accessible to anyone who did not have access to SPSS.

1.4 Using JASP 1.4.1 How to download JASP JASP can be downloaded for free on their website for either Windows, OSX, or Linux (if that's your thing). Installing it should be pretty straightforward, just follow the instructions it provides you. After installing it and opening the program, you will find the "Welcome to JASP" window shown in Figure 1.

Figure 1: The JASP startup window

1.4.2 Entering data in JASP The first difference you will find between SPSS and JASP is how you enter data. In SPSS, you enter data manually through the data view screen. In JASP, there is currently no facility to enter data directly, and you have to load the data after it has been created in a different program. JASP currently supports SPSS .sav files (useful if you already have data in SPSS, as you do not need to use SPSS to view or analyse it), Excel .csv files, and Open Office .ods files. If your data is a normal Excel Workbook (.xlsx), you first have to convert it to a .csv file. If you need to create a .csv file from a .xlsx file, here is a useful link that explains how to create one. One feature in JASP is what they call 'data synchronisation'. Although the data is still hosted as a .csv or a .sav file, you can double click on a data point in JASP and it will open the data file in either Excel, SPSS, or Open Office (the only downside is that if you no longer have access to SPSS, you would not be

able to edit the original data file). You can then make changes to the file and it will automatically update in JASP when you have saved it. For further details, see the JASP tutorial. To start any data analysis, we need to load some data to use. At the top of the "welcome" window, there are two tabs: File and Common. Click on the File tab and you will see the window in Figure 2. Here you will have your recently opened files on the right if you have used it before, but yours should be blank the first time you use it. To open a file, click on Computer > Browse, then select the data file from wherever you saved it on your computer. After it has loaded, the "welcome" window will look slightly different as your data will be visible on the left side of the window like in Figure 3.
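If you need a .csv file and do not have Excel to hand, one option is to create it programmatically. Here is a minimal sketch using Python's standard csv module; the file name and the example data are made up for illustration.

```python
import csv

# Hypothetical data: one nominal factor and one scale variable.
# The first row becomes the column names JASP displays.
rows = [
    ["Condition", "Score"],
    [0, 5.2],
    [0, 4.8],
    [1, 7.8],
    [1, 6.9],
]

# newline="" prevents blank rows on Windows
with open("example-data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

The resulting file can be opened directly in JASP via File > Computer > Browse.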

Figure 2: Opening a data file in JASP. You can choose from different files in your computer, or open a file directly from the Open Science Framework

1.4.3 Changing data types The next step is to make sure JASP knows what type of data your variables are (e.g. nominal, ordinal). Unlike SPSS, JASP does its best to guess the data type for you. The symbols at the top of the columns in Figure 3 look like three circles for nominal data, three lines like a podium for ordinal data, and a little ruler for scale data. Most of the time JASP gets it right and the columns have the right symbol. However, sometimes it is not quite right, and if you click on the symbol you can change the variable from one data type to another. Another difference between JASP and SPSS is how you define nominal factor levels. In SPSS, you might remember that to define an independent groups factor such as nationality you need to assign each level a value and label. For example, you could list the first participant as German and label them with a 1, the second person could be Chinese and be assigned a 2. Every German participant would be identified by the number 1, every Chinese participant would be labeled 2, and so on. However, in JASP you have the choice of using values as factor levels or using labels as factor levels. You just need to make sure the variable type is nominal (three little circles) at the top of the column. An important thing to note is that if you use labels, all of them need to be exactly the same to be considered the same condition throughout the dataset. For example, German could not be spelled as german or GERMAN, or JASP would think these are three different conditions. It has to be written exactly the same, capitals and spaces and everything. We will come back to creating a label for values in section 3.2.1.
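A quick way to catch inconsistent labels before loading your file into JASP is to scan them programmatically. This is a hypothetical sketch in Python; the function and the nationality data are illustrative, not part of JASP.

```python
def inconsistent_levels(labels):
    """Group labels that differ only in capitalisation or surrounding
    spaces; JASP would silently treat each spelling as its own condition."""
    groups = {}
    for label in set(labels):
        groups.setdefault(label.strip().lower(), set()).add(label)
    # Keep only the canonical forms that have more than one spelling
    return {key: spellings for key, spellings in groups.items()
            if len(spellings) > 1}

# Hypothetical nationality column with one inconsistent spelling
nationality = ["German", "german", "Chinese", "German", "Chinese"]
problems = inconsistent_levels(nationality)
```

Here `problems` flags that "German" and "german" would count as two separate conditions, so you would fix the spelling before importing the file.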

Figure 3: An empty window with a data file loaded

2 Guide Organisation

This guide currently covers three basic statistical tests: T-Tests, correlations, and ANOVA. The first part of this guide focuses on how these can be analysed using the classical approach to statistics through the use of Null Hypothesis Significance Testing (NHST). The Bayesian equivalent of the T-Test is then introduced in section 7. Throughout the guide, the aim is to demonstrate how basic analyses that you may be familiar with performing in other statistical packages can be performed in JASP, and to offer some practical recommendations. There are some digressions into topics that are not usually discussed in standard textbooks, such as choosing between the Student and Welch's T-Test, or outlining different types of effect size for ANOVA. However, the main focus is on the process of performing the analyses, and not on the rationale and background to using them. If you are unfamiliar with any of the tests, there are other more comprehensive sources that will act as a guide (e.g. Field 2011; Baguely 2012). The data for all of the examples are from real published research and were made available on the Open Stats Lab (McIntyre 2016). For the classical examples, all the analyses you are going to perform are the same as those performed in the original research. We are then going to take another look at some of the studies to see how they can be analysed using Bayesian statistics. Some of the data sets have been modified slightly to remove or recalculate some variables. The data sets that have been used throughout this guide can be found on the Open Science Framework. To download all the data sets together, you can click on the Data folder and select download as zip. Alternatively, you can just download whichever data set you need for each example.

3 Independent Samples T-test 3.1 Study background The first example that we are going to look at is from a study by Schroeder and Epley (2015). The aim of the study was to investigate whether delivering a short speech to a potential employer would be more effective at landing you a job than writing the speech down and the employer reading it themselves. Thirty-nine professional recruiters were randomly assigned to receive a job application speech as either a transcript for them to read, or an audio recording of the applicant reading

the speech. The recruiters then rated the applicants on perceived intellect, their impression of the application, and whether they would recommend hiring the candidate. All ratings were on a Likert scale ranging from 0 (low intellect, impression etc.) to 10 (high impression, recommendation etc.).

3.2 3.2.1 Loading the data Firstly, we need to open the data file for this example. Look back at section 1.4 on how to open a .csv file and open Schroeder-Epley-data.csv from the folder you downloaded whilst reading section 2. Your window should now look like Figure 3. The next thing to do is to make sure JASP has correctly labeled each column. The variables we are interested in for the first example are Condition, Intellect_Rating, Impression_Rating, and Hire_Rating. Condition is our independent variable and indicates whether the participant was provided with a transcript (value 0) or an audio recording (value 1). It should be labeled as nominal data and have the three little circles explained in section 1.4.3. One helpful feature in JASP is that you can relabel your conditions to something more useful. It could be annoying to remember which condition is 1 and which is 0. If you double click on the factor name (Condition), a window will appear which allows you to create a label for condition 0 and 1, as shown in Figure 4. The other three variables are our dependent variables and each should be labeled as scale (a little ruler). Intellect_Rating and Impression_Rating are both identified correctly. However, Hire_Rating may have been labeled as a nominal variable and needs changing to scale. Click on the three circles and change it to a ruler.

Figure 4: Changing factor labels

3.2.2 Getting a first look at the data From the window in Figure 3, click on the Descriptives tab > Descriptive Statistics to find the new window in Figure 5. From here, we can take a look at the data by ticking the box 'display boxplots' and dragging all three of our dependent variables into the white box to the right of the full list of variables. This will fill the table in the far right screen with the data for the three dependent variables and provide you with three boxplots. However, this only provides you with the descriptive statistics for the whole sample. This is not very informative, as we had two independent groups: one for those provided the transcripts, and one for those provided the audio recordings.

We can split the boxplots up into our two groups by dragging the Condition variable to the split box. This divides each of the ratings into our two groups to provide the plots shown in Figure 6. Taking a look at each boxplot and the descriptive statistics, we can see that when the participants were provided with audio recordings, they provided higher ratings of intellect, impression, and recommendation than those provided with just transcripts.

Figure 5: An empty window for displaying descriptive statistics

3.3 Data screening From the boxplots, we can see that the participants who listened to the audio recordings provided higher ratings than those who just read the transcript. This is only a visual inspection, and we would ideally like to perform some inferential tests. However, before we go rushing to click some buttons, we need to make sure the data is suitable for parametric tests. We want to compare two independent groups (remember what the independent variable was), so we want there to be little influence from outliers, we want to assume normality, and we want the variances in both groups to be approximately equal.

3.3.1 Assessing outliers The first thing we can do is look back at the boxplots we created in section 3.2.2. These figures present the median as the central line, the first and third quartiles as the edges of the box, and then 1.5 times the inter-quartile range above and below the box as lines. Values outside of these lines are what can be considered outliers. In this example Intellect_Rating has a few outliers above and below, but they do not seem to be particularly extreme (you could follow this up more formally by assessing the scores after they have been converted to standardised values, but you cannot do this in JASP; you would have to do this in SPSS).
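If you do want to screen standardised values outside of SPSS, a rough sketch in Python looks like the following. The cutoff of |z| > 3.29 is a common rule of thumb for "extreme" scores under normality, not a JASP or SPSS default, and the data are hypothetical.

```python
import statistics

def standardised_outliers(scores, cutoff=3.29):
    """Return values whose z-score exceeds the cutoff in absolute size.
    The default |z| > 3.29 is a rule-of-thumb assumption, not a
    package default."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    return [x for x in scores if abs((x - mean) / sd) > cutoff]
```

For example, `standardised_outliers([1, 2, 3, 2, 1, 100], cutoff=2)` flags only the 100, the single score far from the rest.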

3.3.2 Assessing normal distribution and homogeneity of variance The next thing to do is to check if the data appear to have been sampled from a normal distribution, and whether the variances are approximately equal. This involves clicking on T-Tests > Independent Samples T-Test to open a new menu below the list of variables and a blank analysis table on the right of the screen. Note one of the nice features of JASP is that you do not have to

go back through the menus and click a range of options. The new analysis just appears below the old one (which is now greyed out but still visible if you scroll up). We can drag all of the dependent variables into the dependent variable box and drag Condition into Grouping Variable. We now have a range of options to click and the temptation to start looking at T-Tests is almost irresistible. All we have to do is look at a few more tables and then we are ready to go. On the menu below the list of variables, click on Normality and Equality of Variances under the Assumption Checks heading, and also click Descriptives under Additional Statistics. You should now get something that looks like Figure 7. Another useful design feature in JASP is that the tables are designed in APA style, so you can easily copy and paste them providing the variables have appropriate names. First, we will look at the Shapiro-Wilk test, which assesses the assumption of normality. The idea behind this is that the assumption of normality is the null hypothesis of the test. Therefore, if you get p < .05, the data do not appear to come from a normal distribution and the assumption of normality is violated. This test and a similar one called the Kolmogorov-Smirnov test can also be found in SPSS. However, although the Shapiro-Wilk test is generally considered to be better, there are issues with using both, and assessing normality visually, such as with Q-Q plots (not available here in JASP, but they can be produced in SPSS), is highly recommended (if you are interested in learning more, consider reading this). As you selected all three DVs, the Shapiro-Wilk table reports the test for each one and is divided by condition as we have independent groups. As we can see in each row, none of the tests are significant, so we can (tentatively) conclude the assumption of normality has not been violated in this example.

Figure 6: Boxplots and descriptive statistics split by condition
Secondly, we will look at Levene's test for the assumption of equal variances (homogeneity of variance). You should have come across this test previously, and it uses a similar logic to the Shapiro-Wilk test. The null hypothesis is that the variances are equal between the groups, therefore a sufficiently large difference in variance between the groups will be indicated by a significant result. This test is also heavily criticised for reasons similar to the Shapiro-Wilk test above, so any conclusion you make should be in conjunction with a careful look at the data using plots (scroll up to the boxplots: is the variance roughly the same for each condition?). However, the Levene's test suggests that the assumption of equal variances has not been violated and we can continue.
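For readers who want to cross-check these assumption tests outside JASP, both are available in Python's scipy library (assumed installed). The two groups below are hypothetical stand-ins for the transcript and audio conditions, not the real data.

```python
from scipy import stats

# Hypothetical ratings for the two independent groups
transcript = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 3.9, 5.0]
audio = [6.1, 7.0, 5.8, 6.5, 7.2, 6.0, 5.9, 6.8]

# Shapiro-Wilk per group: p < .05 would suggest non-normality
w_transcript, p_norm_transcript = stats.shapiro(transcript)
w_audio, p_norm_audio = stats.shapiro(audio)

# Levene's test across groups: p < .05 would suggest unequal variances
levene_stat, p_levene = stats.levene(transcript, audio)
```

As in JASP, each p value is checked against .05, remembering that with these tests a significant result indicates a violated assumption.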

Figure 7: Assessing normality and homogeneity of variance

3.4 Data analysis This is the moment you have been waiting for. After all the visualising and checking, you want to finally look at some inferential statistics. We can stay on the analysis page similar to Figure 7, as most of the results are already here; we were just ignoring them temporarily and require a few more options. On the menu section, Student should be selected under Tests by default, but we also want to select Welch. Under Additional Statistics, we also want to select Mean difference and Effect size. If you really want to tidy things up, you could always untick both of the boxes under Assumption Checks. Remember JASP automatically updates, so you can select the information when and if you need it. You should have a window that looks like Figure 8. Looking at the Independent Samples T-Test table, we have all the information we want (and in contrast to SPSS, all the information we need). We have both a Student T-Test (this produces the same result as SPSS) and a Welch T-Test (Welch's T-Test should be the default option, but see appendix section 11.1 for more information). Remember what the boxplots and descriptive statistics showed us: participants who were provided with audio recordings gave higher ratings than those provided with transcripts. We can now support this using the T-Test result for intellect, impression, and hiring recommendation, but we will only go through the intellect result here. In published articles, T-Tests should be reported in the standard format of: t (df) = t statistic, p value, effect size. For intellect, we would write the result for the Student T-Test up as t (37) = −3.53, p = .001, Cohen's d = −1.13. As we selected the mean difference, this is the unstandardised (or simple) effect size between the two conditions, and simply tells us what the difference was between the means for intellect in our two conditions.
This shows that those in the transcript condition rated the applicant 1.99 points on our scale lower on average than those in the audio recording condition. This makes sense in our example, but if another study was performed using a different scale, the mean difference in each study would not be comparable. This is where Cohen's d comes in. It is a standardised effect size that expresses the difference between two conditions in terms of standard deviations. In our example, those in the transcript condition rated the applicant 1.13 standard deviations lower on average than those in the audio recording condition. As this is a standardised unit, we would be able to compare this to other studies that used a different scale. To interpret this result, we can look at the guidelines Cohen (1988) originally suggested. He suggested results can be considered as small (±0.2), medium (±0.5), and large (±0.8) effects. However, this was only ever meant as a heuristic, and it is important that you compare the effects to those found in

the published literature when you perform your own research. These guidelines should be a last resort when you have no other point of comparison. Following these rough guidelines for this example, there appears to be a large effect between the two conditions. Putting this all together, we could conclude something like this: "An independent samples T-Test indicated that participants in the audio recording condition rated the applicant significantly higher on a rating of intellect than participants in the transcript condition, t (37) = −3.53, p = .001, Cohen's d = −1.13. This shows that there was a large effect between the two conditions."

Figure 8: Independent samples T-Test results
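As a cross-check on the JASP table, both the Student and Welch versions of the independent samples T-Test can be run in Python with scipy (assumed installed). The ratings below are hypothetical, not the Schroeder and Epley data.

```python
from scipy import stats

# Hypothetical intellect ratings, split by condition by hand
# (JASP splits on the grouping variable for you)
transcript = [3.0, 4.5, 2.5, 4.0, 3.5, 2.0, 4.2, 3.8]
audio = [5.5, 6.0, 4.8, 6.5, 5.0, 6.2, 5.8, 4.9]

# Student T-Test: assumes equal variances (JASP's 'Student' row)
t_student, p_student = stats.ttest_ind(transcript, audio, equal_var=True)

# Welch T-Test: drops that assumption (JASP's 'Welch' row)
t_welch, p_welch = stats.ttest_ind(transcript, audio, equal_var=False)

# The unstandardised effect size JASP labels 'Mean Difference'
mean_difference = sum(transcript) / len(transcript) - sum(audio) / len(audio)
```

With these made-up numbers the t statistic is negative, because the first group entered (transcript) has the lower mean, mirroring the sign of the result reported above.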

4 Example Two: Paired Samples T-Test 4.1 Study background The next study we are going to look at is by Mehr, Song and Spelke (2016). They were interested in whether singing to infants conveyed important information about social affiliation. Infants become familiar with melodies that are repeated in their specific culture. The authors were interested in whether a novel person (someone they had never seen before) could signal to the child that they are a member of the same social group and attract their attention by singing a familiar song to them. Mehr et al. (2016) invited 32 infants and their parents to participate in a repeated measures experiment. Firstly, the parents were asked to repeatedly sing a previously unfamiliar song to the infants for two weeks. When they returned to the lab, the researchers measured the baseline gaze (where they were looking) of the infants towards two unfamiliar people on a screen who were just silently smiling at them. This was measured as the proportion of time looking at the individual who would later sing the familiar song (0.5 would indicate half the time was spent looking at the familiar singer; values closer to one indicate looking at them for longer etc.). The two silent people on the screen then took it in turns to sing a lullaby. One of the people sang the song that the infant's parents had been told to sing for the previous two weeks, and the other one sang a song with the same lyrics and rhythm, but with a different melody. During this period, the amount of time spent looking at each singer was measured (this was measured as the total gaze time (ms) as opposed to a proportion). Mehr et al. (2016) then repeated the gaze procedure to the two people at the start of the experiment to provide a second measure of gaze as a proportion of looking at the familiar

singer. We are therefore interested in whether the infants increased the proportion of time spent looking at the singer who sang the familiar song after they sang, in comparison to before they sang to the infants.

4.2 Data analysis The first thing we need to do is load a new data file. Go back to the folder you downloaded at the beginning of section 2 and open Mehr-study1-data.csv. Think about the process we went through in the first example to explore the data, and then you need to think about how you are going to analyse the data. The conditions of interest are called 'Baseline_Proportion_Gaze_to_Singer' and 'Test_Proportion_Gaze_to_Singer'. As this study is repeated measures, we will be needing a paired samples T-Test, but the remainder of the procedure is the same as the Independent Samples T-Test.

1. Explore descriptive statistics and create a box plot for each condition.
2. Assess parametric assumptions. We do not need to worry about homogeneity of variance in this example as it is a repeated measures design.
3. To perform a Paired Samples T-Test, click on T-Tests > Paired Samples T-Test this time. Drag both variables into the analysis space. Remember to look at the effect size!
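The steps above can also be verified outside JASP with scipy (assumed installed). The gaze proportions below are hypothetical, and the paired-data version of Cohen's d shown here (mean difference divided by the standard deviation of the differences, sometimes called d_z) is one of several conventions.

```python
import statistics
from scipy import stats

# Hypothetical gaze proportions before and after the singing phase
baseline = [0.48, 0.52, 0.45, 0.50, 0.47, 0.55, 0.49, 0.51]
test = [0.55, 0.60, 0.50, 0.58, 0.53, 0.62, 0.54, 0.59]

# Paired samples T-Test works on the per-infant difference scores
t_stat, p_value = stats.ttest_rel(baseline, test)

# Paired-data Cohen's d (d_z): mean difference / SD of the differences
diffs = [b - t for b, t in zip(baseline, test)]
d_z = statistics.mean(diffs) / statistics.stdev(diffs)
```

Because every hypothetical infant looks longer at the familiar singer in the test phase, both the t statistic and d_z come out negative with baseline entered first.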

5 Pearson’s Correlation 5.1 Study background Now that we have seen how you can run T-Tests, the next test to perform is a simple correlation between two continuous variables. Correlations allow us to assess the degree of relationship between two variables. The first example we are going to work with is from Beall, Hofer and Shaller (2016) who investigated if the outbreak of infectious diseases can influence voting behaviour. They were specifically interested in the emergence of the Ebola virus and whether it was associated with support for a more conservative candidate over a liberal candidate in the US Federal elections. There are two variables of interest that we are going to investigate: frequency of Google searches for the Ebola virus, and political support for either a conservative or liberal candidate in the 2014 US Federal elections. The first variable is called Daily.Ebola.Search.Volume and is the search volume for particular topics in a geographical region. The topic with the highest search volume in a particular day is scored 100, and all other topics are expressed as a percentage of that value. Therefore, the closer the value is to 100, the more people Googled the Ebola virus on a specific day. The second variable is called Voter.Intention.Index. This was calculated by subtracting the percentage of voters who intended to support a liberal candidate in the election from the percentage of voters who intended to support a conservative candidate. Therefore, positive values indicate greater support for conservative candidates and negative values indicate greater support for liberal candidates.

5.2 Descriptive statistics Start by loading Beall-Hofer-Shaller-data.csv into JASP. We are going to start again by looking at the descriptive statistics. Enter both variables listed above into the empty white box seen in Figure 5 and used in the previous examples.

5.3 Data screening We can now start to think about whether a parametric version of correlation is appropriate for our data. We want both variables to be measured on a continuous scale, we want both measurements to be in pairs, and we want there to be no outliers. We will think about each one in turn. Both of the variables are measured on a continuous scale. Remember to check that JASP has correctly worked out what type of data each variable is by looking at the top of the columns in the data view screen, as seen in Figure 3. We want both variables to have a little ruler on top, which they both should have.

Next, we want both variables to be in pairs. We start to have a small problem here. If you looked closely when you opened the data, or when you were exploring the descriptive statistics, you might have seen that we have 65 rows of data for the Ebola search index, but only 24 rows for the voter intention index. This makes sense, as the data is split into days and the voter intention index is based on polling data. We do not have polling data for every day, so the correlation will only be based on the rows where we have both a voter intention index and an Ebola Google search volume. This leaves us with 24 complete pairs of data to analyse. Finally, we want there to be no outliers. Outliers are extremely problematic for correlations, as they can bias the correlation value. We can look back at the boxplots you created during the tasks in section 5.2. The data appear fine, so we are all set to go ahead and calculate some correlations.

5.4 Data analysis We will be using a different analysis tab than the one we used for the T-Tests. Firstly, click on Regression > Correlation Matrix to open a new analysis window. The next thing we want to do is to drag both of the variables into the empty white box like we did for the descriptive statistics. This will fill the table with two numbers: one for Pearson's r (the correlation value) and one for the p value. We also want to tick the box for a correlation matrix to visualise the correlation. This is incredibly important for correlations, as radically different distributions can produce similar correlation values. This will produce the screen you can see in Figure 9. One of the first things you might notice in comparison to SPSS is that you only get one set of numbers in the correlation table. SPSS gives you the full matrix of every combination of your variables (including correlating the same two variables). In JASP, it just provides you with what you need to know for two variables: what is the correlation coefficient, and what is the p value. Simple. We could write the result up like this: "The correlation between daily Ebola search volume and voter intention was small and non-significant, Pearson's r(22) = .17, p = .430." We have the (22) after r as that is the degrees of freedom for a correlation. JASP does not provide it directly, but it is the number of observations in the analysis minus two (the sample was 24 matching pairs of data, so 24 − 2 = 22).
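The same correlation coefficient, p value, and degrees of freedom can be reproduced in Python with scipy (assumed installed). The paired observations below are hypothetical stand-ins for the complete cases, not the real search and polling data.

```python
from scipy import stats

# Hypothetical complete pairs (JASP drops days without polling data)
search_volume = [10, 25, 30, 45, 50, 60, 70, 85]
voter_intention = [-2.0, -1.0, 0.5, 1.0, 0.5, 2.0, 1.5, 3.0]

# Pearson's r and its p value, as in JASP's correlation table
r, p = stats.pearsonr(search_volume, voter_intention)

# JASP does not print the degrees of freedom; for a correlation it is n - 2
df = len(search_volume) - 2
```

With eight hypothetical pairs, `df` is 6, and the r(df) = ..., p = ... reporting format above follows directly.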

Figure 9: Correlation matrix and scatter plot

6 One-way Independent Groups ANOVA 6.1 Study background For this study we are going to look at James, Bonsall, and Hoppitt (2015). They were interested in how you can treat people who have flashbacks following traumatic experiences (note that the participants did not experience real trauma, but were shown violent videos to induce "experimental trauma"). Simply trying to forget a traumatic experience is not very effective, and research suggests that changing the traumatic memory can help. James et al. (2015) included four different experimental conditions to investigate whether the process of changing a traumatic memory can be improved by pairing reactivating the memory with playing a computer game. The idea is that the visual stimulus of the computer game would interfere with the memory that is being changed and prevent further traumatic flashbacks. The four levels were:

1. A control condition (shown a short musical filler task).
2. Reactivation and Tetris (shown the traumatic video again and played Tetris after a filler task).
3. Tetris only.
4. Reactivation only.

6.2 Data analysis The first thing we need to do is open the data file for this example. Open James-2015-data.csv. As the study has four levels of manipulation and each participant completed only one of the levels, we are dealing with a one-way independent groups ANOVA. The IV in the data set is simply called Condition. The DV is the number of intrusions (flashbacks to the traumatic experience) that the participant experienced over a 7 day period after completing the experimental manipulation. A lower number means that the treatment is more effective, as the participant experienced fewer flashbacks. This variable is called Number_of_Intrustions in the data set.

6.2.1 Descriptive statistics and parametric assumptions You can explore the simple descriptive statistics in the same way as in the previous examples through the descriptives drop-down menu. However, as the designs become more complicated, it is recommended to visualise the data to more easily understand the relationship between conditions in the IV. One of the nice features of JASP is that it is very simple to plot the results of your analyses, and you can also request the descriptive statistics in the ANOVA analysis window. Click on ANOVA > ANOVA for an independent groups ANOVA. You should get a window looking like Figure 10, but without any options selected yet. Drag Number_of_Intrusions into the Dependent Variable space, and Condition into the Fixed Factors space. For the time being we are only going to worry about the descriptive statistics and plotting the data. Select descriptive statistics under Additional Options, and drag Condition into the Horizontal axis space under Descriptive Plots to get a window looking exactly like Figure 10. I also selected to display error bars using the 95% confidence interval (CI) to display the amount of uncertainty we have. As you can see from both the descriptive statistics and the plot, the combined treatment of both reactivation and playing Tetris has been the most effective, as it has the lowest number of intrusions. The last thing to do before looking at the ANOVA results is to take a look at the parametric assumptions. Figure 11 displays the result of selecting both options under Assumption Checks. Back in section 3.3.2, we went over why Levene's test is not a great way of assessing homogeneity of variance. Here it is telling us that the assumption has been violated, and if we were to look at boxplots through the descriptive statistics (section 3.2.2), we can see that most of the conditions are approximately equal, but the control condition has quite a large spread compared to the other three conditions.
Figure 10: Descriptive statistics and displaying a simple line plot for our data

Moving on to the Q-Q plot (I mentioned in section 3.3.2 that these could not be created for T-Tests, but one is helpfully provided in the ANOVA options), this is slightly more worrying. It compares the actual residuals from the ANOVA model to what would be expected from fitting a theoretical normal distribution. If all the dots fall on a straight line, it suggests the residuals are normally distributed and this assumption is met. However, this plot shows that there are some outliers in the top right corner and the data is skewed as it snakes around the line (skew could also be investigated by creating histograms for each condition in descriptive statistics, and this would lead you to approximately the same decision). One option would be to try and transform the data to make it more normal. A common “get out of jail free card” that people report when using ANOVA is that it is “robust to violations of parametric assumptions”. This is only partially true when the sample size is equal in each group (as it is in this case), and it further requires the sample size to be large enough to be approximately normal. For our purposes in demonstrating how to use ANOVA in JASP, we will continue with the data in its present form, but note that if you were analysing this data for real, it would be more appropriate to explore transforming the data, or consider a non-parametric equivalent.

6.2.2 Main ANOVA results Now it is time to look at the main ANOVA results. The main table can be viewed in Figure 12. The only additional option that has been selected is to display each measure of effect size that is provided. Eta squared (η2) tells you how much variance in your DV is explained by your IV. You can also select partial eta squared (ηp2), which in the case of a one-way ANOVA is exactly the same as eta squared. SPSS only provides you with partial eta squared as a measure of effect size. The difference between these two measures appears when you add additional factors to create a factorial ANOVA. Eta squared tells you how much variance is explained by your IV as a proportion of the total variance. Therefore, for a factorial ANOVA factor 1 may explain 7% of the variance in your DV, and factor 2 may explain 12% of the variance. The remainder will be explained by the interaction and error, but together they will add up to 100%. However, this may not be entirely useful for comparing values across studies, as each study will have a different amount of total variance. Partial eta squared is the amount of variance explained by the IV in comparison to both the variance explained by the IV and the error associated with it (the variance it cannot explain), with all the other IVs partialled out (hence partial eta squared). This explains why the two are the same when you only have one IV, as there is only the variance explained by the one IV and the error. We also have omega squared (ω2), which is slightly smaller than eta squared. This provides the same information, the proportion of variance explained by the IV, but it also corrects for the bias due to it being an estimate from the sample and not from the population. As the sample size increases, the difference between eta squared and omega squared will decrease. As there are only 18 participants in each group, the effect size is 27% smaller in this example.

Figure 11: Assessing ANOVA parametric assumptions using Levene’s test and Q-Q plots

There is even a partial version of omega squared, but unfortunately it is not available in JASP. Further information on different types of effect sizes can be found in Lakens (2013). As we only have one IV, the most appropriate choice is to report omega squared to ensure it is less biased due to it being estimated from a sample. After that little digression, we can focus on what the table is telling us for the main ANOVA results. This is very straightforward, and we can see that there is a significant effect of condition on the number of intrusive memories. We can also see that Condition explains 10.4% of the variance in the number of intrusions if we use the less biased omega squared. This could be written as “there was a significant effect of condition on the number of intrusive memories, F (3, 68) = 3.80, p = .01, ω2 = 0.10”. This is all well and good, but we typically do not want to stop there; what we really want to know is how our conditions differed, and which one resulted in the lowest number of intrusive memories. We have two options for this: we can either use planned contrasts if we had specific predictions, or we can use post-hoc tests to examine every possible comparison.
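The relationship between eta squared and omega squared can be checked with a short back-of-envelope calculation outside JASP. The sketch below uses the standard one-way formulas (see Lakens 2013) and sets MS_error to an arbitrary 1, which is harmless because both ratios are invariant to a common scaling of the sums of squares; it approximately recovers the values reported for F(3, 68) = 3.80:

```python
def eta_sq(ss_effect, ss_total):
    """Proportion of total variance explained by the effect."""
    return ss_effect / ss_total

def omega_sq(ss_effect, df_effect, ms_error, ss_total):
    """Bias-corrected variance explained (one-way ANOVA formula)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# Reconstruct the sums of squares from F(3, 68) = 3.80, up to a constant
ms_error, df_effect, df_error = 1.0, 3, 68
ss_effect = 3.80 * ms_error * df_effect   # MS_effect * df_effect
ss_error = ms_error * df_error
ss_total = ss_effect + ss_error           # holds for a one-way design

print(round(eta_sq(ss_effect, ss_total), 2))                         # 0.14
print(round(omega_sq(ss_effect, df_effect, ms_error, ss_total), 2))  # 0.10
```

Note that omega squared comes out roughly a quarter smaller than eta squared here, matching the 27% difference mentioned above.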

6.2.3 Planned contrasts If we have specific predictions and we only want to examine a subset of the total possible comparisons, we can use planned contrasts. You may remember that SPSS provides you with an option to set up your own contrasts using coefficients, or you can select from a range of existing contrasts. There is slightly less flexibility in JASP, as you can only select from a range of contrasts and not define your own. For this example, one possibility may have been to predict that the combination of both treatments would be the most effective (as per the original article), and you could use this as the standard to compare to all the other conditions. For this to work, we need to slightly rearrange the data, which can simply be done by double clicking on the Condition header to rename the factor labels like in Figure 4. Both treatments is currently in second place, but we can rearrange this by selecting it and clicking the up arrow to move it into first place. If we then select the Simple contrast under Contrasts (click none and select simple), it uses both treatments as the condition to compare all the other treatments to. Looking back at Figure 12, we can see that both treatments result in significantly fewer intrusions than both the control condition and reactivation on its own. However, they do not produce significantly fewer intrusions than Tetris on its own. This could be written as “planned contrasts showed that both treatments produced fewer intrusive memories than both the control condition (t(34) = 3.04, p = .003) and reactivation in isolation (t(34) = 2.78, p = .007). On the other hand, there was not a significant difference between both treatments and Tetris in isolation (t(34) = 1.89, p = .063)”. Note here that the degrees of freedom are not reported, but they can be calculated by adding the sample sizes of the two groups together minus 2 (i.e. 36 - 2 = 34).

Figure 12: Main ANOVA results and planned contrasts

6.2.4 Post-hoc tests If we did not have any specific predictions, an alternative would be to conduct post-hoc tests and examine each combination of our conditions (it is important to emphasise that this decision should always be made prior to examining any data). If we select Condition under Post Hoc Tests, and select Effect size and Bonferroni correction, you should get a window that looks like Figure 13. The first thing to notice is that, unlike SPSS, JASP does not provide you with a matrix of each condition repeating every combination. You are just presented with each unique combination. In this example, the conclusions do not change if we use post-hoc tests, as both treatments still result in significantly fewer intrusions than control and reactivation. None of the other comparisons are statistically significant. The option to present effect sizes is extremely useful, and you can refer back to section 3.4 if you need a reminder on how to interpret Cohen’s d. The results could be written up similar to how they were presented in section 6.2.3, but there would obviously be more tests to report and you could include Cohen’s d as it is presented in section 3.4.
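For reference, the Bonferroni correction JASP applies here is simple enough to sketch by hand: each p value is multiplied by the number of comparisons made. The p value below is borrowed from the contrast reported earlier purely for illustration; JASP reports the adjusted values directly:

```python
from itertools import combinations

def bonferroni(p, m):
    """Bonferroni-adjusted p value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * m)

conditions = ["Control", "Reactivation", "Tetris", "Both treatments"]
m = len(list(combinations(conditions, 2)))  # unique pairs among 4 groups
print(m)                                    # 6
print(round(bonferroni(0.003, m), 3))       # 0.018
```

So with four conditions there are six unique comparisons, and an unadjusted p of .003 would survive the correction.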

Figure 13: Main ANOVA results and post-hoc tests

7 One-way Repeated Measures ANOVA 7.1 Study background The next study we are going to investigate is Harvie, Broeker and Smith (2015). They were interested in whether visual information about your body in space can affect your perception of pain. This is similar to science fiction movies such as the Matrix, where people experience pain in their real bodies when their virtual reality bodies have also experienced pain. Harvie et al. (2015) used participants who experienced neck pain and surrounded them with a 360 degree screen while they wore virtual reality goggles. This meant that when they turned their heads, they moved around the scene they were viewing through the goggles. To investigate how visual information affects their perception of pain, participants completed three conditions. To provide a baseline measure, they asked participants to look around the scene until they felt pain in their neck and measured the amount of rotation (in degrees) they were able to move their head. They then created two artificial conditions where their actual movements resulted in less movement through the goggles (understated visual feedback) or more movement (overstated visual feedback). Therefore, through the participants’ eyes they perceived either more or less movement without realising it was being manipulated. This was measured as a proportion of the movement they were able to perform during the baseline measure. Therefore, values less than one mean that they moved less than in their baseline, and values greater than one mean that they moved more than in their baseline.

7.2 Data analysis Time to open another data file: open Harvie-2015-data.csv from the folder you downloaded in section 2. In this study we have one repeated measures IV with three levels (visual feedback). As this is a repeated measures experiment, these levels will be three separate columns in our data set. The columns we will need for our IV are called Understated_Visual_Feedback, Accurate_Visual_Feedback, and Overstated_Visual_Feedback. If you look closely at the Accurate_Visual_Feedback column, all the values are 1, as the other two conditions are expressed as a proportion of this condition. We have one DV (degree of rotation as a proportion of the baseline measurement), which is made up of the measurements in each column of our IV; we do not need to include a specific measurement column like we did for the independent groups ANOVA.

7.2.1 Descriptive statistics and parametric assumptions Similar to the last section, we are not going to repeat the procedure for getting descriptive statistics, as you should be pretty good at it by now. We can focus on getting the ANOVA window up and running to assess the parametric assumptions. Click on ANOVA > Repeated Measures ANOVA to get an empty window like Figure 14. Instead of directly dragging our variables into the spaces like in the previous examples, we need to do a bit of specifying whenever we use the repeated measures ANOVA window. Under Repeated Measures Factors, we need to tell JASP what our IV is (or IVs, if you have a factorial repeated measures ANOVA) and how many levels it has. Double click on RM Factor 1 and give it a name; I’ve called it Visual Feedback. Then click on Level 1 and rename it Understated, Level 2 can be Normal, and Level 3 can be Overstated. Notice how Repeated Measures Cells changes as we specify our IV and its levels. This is where we now need to drag our variables. Highlight each variable and click on the arrow to the left of Repeated Measures Cells to place each variable into its respective cell (make sure the names match). You should now have a window that looks like the left side of Figure 15. We can now focus on examining whether the parametric assumptions have been met. We need to worry about whether the residuals are normally distributed and whether we have sphericity (are the variances of the differences between each pair of levels approximately equal?). In contrast to the independent groups ANOVA, we do not have the option to look at a Q-Q plot, so it will be difficult to assess the assumption of normality in JASP (hopefully it will be included in one of the next releases). JASP does provide us with the ability to assess whether the assumption of sphericity has been met. In this case, it even tells us that the assumption has been violated before we explicitly ask for Mauchly’s test, as a note under the Within Subjects Effects table. If we select Sphericity tests under Assumption Checks, we will get the table displayed in Figure 15. As we already knew from the note, sphericity has been violated and we will have to apply a Greenhouse-Geisser correction to the degrees of freedom. This is easily done: you just have to select Greenhouse-Geisser as the only option under Sphericity Corrections. In keeping with the minimalist design, we just get two rows of output with our corrected results, instead of the pages of repeated values that SPSS insists upon.

Figure 14: Empty window for a Repeated Measures (RM) ANOVA
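To see what the Greenhouse-Geisser correction actually does, here is a small sketch: both degrees of freedom are simply multiplied by a sphericity estimate epsilon. The epsilon of 0.79 below is back-calculated from the corrected F(1.58, 74.26) reported in the next section; JASP estimates epsilon from the data's covariance matrix, which is not reproduced here:

```python
k, n = 3, 48                          # levels of the IV, number of participants
df1, df2 = k - 1, (k - 1) * (n - 1)   # uncorrected dfs: 2 and 94

def gg_correct(df1, df2, epsilon):
    """Multiply both degrees of freedom by the sphericity estimate epsilon."""
    return df1 * epsilon, df2 * epsilon

eps = 0.79                            # inferred from the reported F(1.58, 74.26)
c1, c2 = gg_correct(df1, df2, eps)
print(round(c1, 2), round(c2, 2))     # 1.58 74.26
```

The smaller epsilon is (it ranges from 1 down to 1/(k-1)), the more severe the sphericity violation and the harsher the correction.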

7.2.2 Main ANOVA results We are now very close to looking at the main results; we just need to select a few options. As always, we want a measure of effect size, and we will report omega squared as a less biased measure of the proportion of variance explained, as we only have one IV (remember the differences from section 6.2.2). We can also select some planned contrasts: following the predictions from the paper, we expect the understated visual feedback to produce greater rotation than normal feedback, and overstated visual feedback to produce less rotation than normal feedback. With a bit of trial and error, the repeated option under Contrasts provides us with these two comparisons. You should now have a window that looks like Figure 16. There is a significant effect of visual feedback on the degree of rotation. Looking at the omega squared value, this explains 20% of the variance in degree of rotation. This could be written up as “there was a significant effect of visual feedback on the degree of rotation that the participants were able to rotate their necks, F (1.58, 74.26) = 18.91, p < .001, ω2 = .20”. We can then follow this up with our planned comparisons to investigate how each manipulation compared to the baseline level of rotation. With the aid of the plot in Figure 15, we can see that understated visual feedback resulted in a greater degree of rotation than normal visual feedback (t(47) = 5.28, p < .001). Overstated visual feedback resulted in a lower degree of rotation relative to the baseline than normal visual feedback (t(47) = 5.37, p < .001). Similar to the planned comparisons in section 6.2.3, the degrees of freedom are not reported, but they can be worked out as the sample size minus 1 (in this case 48 - 1 = 47). If we did not have any specific predictions, we could follow the procedure in section 6.2.4 to produce post-hoc tests in the same way as in the independent groups ANOVA.

Figure 15: Plotting the RM ANOVA and assessing parametric assumptions
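The repeated measures planned comparisons above are essentially paired samples t-tests, and the degrees of freedom rule (sample size minus 1) falls out of the calculation. A minimal pure-Python sketch with made-up rotation proportions, not the real Harvie et al. data:

```python
import math

def paired_t(x, y):
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)   # mean difference over its standard error
    return t, n - 1

# Hypothetical rotation proportions for five participants
understated = [1.12, 1.05, 1.20, 1.08, 1.15]
normal = [1.0] * 5                      # baseline is 1 by definition
t, df = paired_t(understated, normal)
print(round(t, 2), df)
```

With 48 participants, as in the study, the same function would return df = 47.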

8 A Brief Introduction to Bayesian Statistics 8.1 ”Bayesian what now?” Until this point, all of the examples have been analysed using classical or orthodox statistics based on Null Hypothesis Significance Testing (NHST). This relies on a perspective of probability based on long-run relative frequencies. Probability is interpreted through a hypothetically infinite number of repetitions of a particular event. Therefore, probability can only be attributed to a collective, not to single observations. Despite this, many research articles talk about p values as if they provide a probability attached to specific events. NHST is designed to create objective decision making criteria with the aim of controlling long-run error rates. This allows you to state in advance how many errors you are willing to make (in the long run) in claiming there is an effect when there is not one present (type one errors), and claiming there is not an effect when there is one (type two errors). This sets up a decision making procedure where you can reject the null hypothesis and conclude there is some non-zero effect. However, this might leave you feeling short-changed. As scientists, we often want to answer different questions than whether there is an effect or not, such as ”what is the probability of this theory being correct in comparison to another?”. This is where Bayesian statistics comes in. Bayesian statistics follows a different interpretation of probability. Rather than an objective interpretation using long-run relative frequencies, probability is based on the degree of belief in whether something is true given the information that is available (for an overview of the interpretations of probability and a comprehensive introduction to Bayesian statistics, see Etz and Vandekerckhove 2017). This allows you to attach statements of probability to specific events, such as the likelihood of one theory in comparison to another.

Figure 16: Main RM ANOVA results and planned contrasts

8.2 Bayesian statistics in JASP The primary aim of JASP is to make Bayesian statistics accessible to everyone. Bayesian analyses can be performed in other software packages such as R, but this requires knowledge of writing code, which is not immediately accessible if you have no experience. Therefore, JASP provides access to Bayesian statistics and removes some of the computational barriers that were previously in place. However, the full extent of Bayesian statistics is not currently available in JASP, and if you require specialist models beyond ANOVA, you will need to explore packages such as R or WinBUGS. Two of the examples from earlier in the guide will now be repeated to show you how they can be analysed from a Bayesian perspective.

8.3 Bayesian hypothesis testing The first task will be to reanalyse the Mehr et al. (2016) study from example two. Part of their study was to initially test whether the gaze time of the child was the same towards both the researcher who was going to sing the familiar song and the researcher who was going to sing the unfamiliar song. This was important as the children should not be gravitating towards one particular researcher before they start singing. However, a lack of difference in gaze time was the conclusion from a non-significant paired samples T-Test. This perpetuates the fallacy that a large p value indicates there is a small effect and the null hypothesis should be accepted. As Dienes (2014) highlights, this conclusion cannot be made using p values alone, and it is an area where Bayesian statistics can be particularly useful. We will now analyse this example using a Bayesian paired samples T-Test to see if there is support for the null hypothesis in comparison to an alternative hypothesis, or whether the test was insensitive to detect a difference.

8.3.1 Reanalysis one: performing a Bayesian paired samples t-test Go back to the data folder and open ’Mehr-study1-data.csv’. Using the T-Tests tab, select the Bayesian Paired Samples T-Test as opposed to the Paired Samples T-Test used in example two. This should open a new set of menu options and an empty table on the right. Similar to the procedure for a regular T-Test, drag the two variables Familiarization_Gaze_to_Familiar and Familiarization_Gaze_to_Unfamiliar into the white box with the little ruler in the bottom right corner to have a screen looking like Figure 17.

Figure 17: JASP window for a Bayesian Paired Samples T-Test

8.3.2 Menu options in JASP Although the menu options and table layout look similar to when you performed a regular paired samples T-Test, there will be some new options and terminology that need outlining first.

1. Hypothesis Although this option is available for a regular t-test, it is somewhat more important when using Bayesian analyses as it affects the prior (which will be explained in more detail shortly). The default option is a non-directional hypothesis, indicated by the ’≠’ symbol, which states you expect there to be a difference but it could be in either a positive or negative direction. The next options are directional predictions where you expect either measure 1 to be bigger than measure 2 (Measure 1 > Measure 2), or measure 2 to be bigger than measure 1 (Measure 1 < Measure 2). It is important that you choose one of these options based on theory or previous research.

2. Bayes Factor This is the first new term we have come across in this example. A Bayes Factor is a ratio of how well two models perform in explaining the data (Marsman and Wagenmakers 2016). The default model of comparison in JASP is a null model centered on zero (if you want to test two competing alternative models you will have to use alternative software). We have three options to choose from. The default is BF10, which indicates that the alternative hypothesis is the numerator and the null is the denominator. This tells you how well the alternative model explains the data in comparison to the null model. The second option is BF01, which is the opposite and tells you how well the null model explains the data in comparison to the alternative. The third option presents the same BF10 ratio on a log scale.

3. Plots This section contains quite a few options, and each plot will be explained in greater detail in the next section when we look at how to interpret the output. The first option creates a plot of the prior and posterior distribution. This will show you what the probability distribution looks like after considering the prior information you provided and the available data. The second option presents a robustness check of the Bayes Factor and choice of prior information. As there is flexibility when choosing a prior, it is good practice to vary the prior to see how it

affects the posterior and your conclusions. The third option is known as sequential analysis. This shows how the Bayes Factor changes with each additional data point and tracks the evidence in favour of either the null hypothesis or the alternative hypothesis. The last option is simply a plot of the average value in each condition and the 95% credibility interval around the average (this is similar to the 95% CI in classical statistics, but has a different definition).

4. Prior This is one of the defining points of Bayesian statistics, and it is where you indicate any existing knowledge about the effect of interest for your alternative model. Under the Prior drop-down tab, JASP allows you to choose one of three different types of prior distribution. It is very rare you will be able to say ”I only expect an effect size of Cohen’s d = .03”, so you specify the peak of the distribution on the most likely effect sizes, and increasingly unlikely effect sizes are found in the tails of the distribution. For these examples we are going to use the default option. This is a specific type of distribution known as the Cauchy probability distribution. You may be more familiar with the normal distribution, so you can compare the appearance of the two in Figure 18A. Clicking on the default option allows you to change one parameter: the scale of the distribution. Figure 18B shows how different values of the scale change the distribution. A narrow prior closer to 0 makes the distribution very tall with narrow tails. This indicates that you have a high prior belief that the effect will be somewhere around zero. A wider prior makes it much shorter with wider tails. This indicates you are less certain the effect will be close to zero, and you allocate more equal probability to the effect being somewhere along the range.
The default option that JASP uses is 0.707, which can be interpreted as being 50% certain that the effect will be between -0.707 and 0.707 (for a two-sided hypothesis). Although it will not be covered here, if you have more prior information about the size of the effect you expect to find, you can choose an informed prior, where you specify both the scale, as we have just outlined, and the location of the distribution’s peak for a range of probability distributions. It is important you choose a prior based on your previous knowledge, and you will have to justify your choice when you report the analysis.
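The 50% interpretation of the 0.707 default can be verified directly from the Cauchy distribution's closed-form CDF. A short sketch in plain Python, no JASP involved:

```python
import math

def cauchy_cdf(x, scale):
    """CDF of a Cauchy distribution centred on zero with the given scale."""
    return 0.5 + math.atan(x / scale) / math.pi

scale = 0.707  # JASP's default prior width
mass = cauchy_cdf(scale, scale) - cauchy_cdf(-scale, scale)
print(round(mass, 3))  # 0.5: half the prior mass lies within +/- one scale
```

This holds for any scale value: the interval from minus the scale to plus the scale is always the interquartile range of a Cauchy distribution centred on zero.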


Figure 18: Properties of a Cauchy probability distribution. (A) In comparison to a normal distribution with similar dimensions, although the mean and SD that describe the normal distribution do not apply to the Cauchy. (B) The appearance of the Cauchy distribution depends on the scale. These options refer to the most likely range of effects in psychological research (Rouder et al. 2016). A prior width of 0.707 is the default option in JASP (shown in green).

8.3.3 Interpreting the output Now that you know what all of the options do, it is time to go through the analysis and see what it is telling us. The prediction in this example is that there will be little to no difference between the gaze proportion towards the researcher singing the familiar song and the researcher singing the unfamiliar song. Considering this, the hypothesis option can remain the same, as we are not expecting either measure to be bigger. We are focusing on support for the null hypothesis in comparison to the alternative hypothesis, so we can click BF01 under Bayes Factor to have the result in favour of the null. For this example, we are just going to use the default Cauchy distribution centered on zero with a scale of 0.707. If we had more information about the likely size of the effect, we could select a more informative prior. Under plots, select Prior and Posterior (including additional

information), Bayes Factor robustness check, and Sequential analysis. As we looked at the two measures back in section 4, we will ignore the descriptive plots for now, but remember you would want to explore the two measurements if this was a new analysis. Starting from the top of the analysis window, the first piece of information you get is the Bayes Factor, which tells us the data are approximately 5.1 times more likely under the null model than under the alternative model. This suggests there is moderate evidence in support of the null model (we will shortly go over grades of evidence, but Wagenmakers et al. 2010 suggest Bayes Factors between 3 and 10 indicate moderate evidence). We also have a measure of error associated with the estimation of the Bayes Factor, as it is based on simulations. As long as this is approximately below 10%, there should be nothing to worry about. The error in this example is 0.06%, which suggests the estimate is quite accurate. We now know that there is moderate evidence in favour of the null hypothesis, and we can start to explore the graphical options that JASP provides to understand the result in a little more detail. The first is the Prior and Posterior plot in Figure 19. The prior distribution is the dotted line, and it is the result of the parameters you specified in the prior and hypothesis sections. The posterior distribution is the solid black line and results from combining the prior and the available data. What this is showing is that the most likely value of the effect size is just above zero, approximately 0.06 if you look at the median value. Look at the shape of the posterior distribution. It is much taller than the prior, and the tails are almost entirely within effect sizes of -0.8 and 0.8. This suggests that the effect size is likely to be within these boundaries, and with a greater number of measurements, the estimate of the effect size could be made more precise.
The other information provided here is the 95% credibility interval, which contains the values that you can be 95% certain will contain the true population value (remember this is a different interpretation to a classical 95% CI). This is also displayed graphically through the line above the posterior distribution. This suggests there is 95% certainty that the true population effect size is between -0.39 and 0.50. The last thing to note is that you are provided with values for both BF01 and BF10. These quantify the same evidence but in a slightly different way. As we selected BF01, the evidence was presented in favour of the null. However, if we had chosen BF10, we would have been presented with evidence in favour of the alternative model. It is still the same ratio of evidence between the two models, but one model is either on the top or the bottom (it is actually the reciprocal of our original value: calculate 1/5.103 and compare it to the Bayes Factor shown for BF10).
The second plot that we are presented with in the analysis window is the robustness check in Figure 20. This shows how the Bayes Factor changes with the width of the Cauchy prior you defined previously. As there is flexibility in choosing a prior, it is good practice to explore different values to see how robust the conclusions are to changes in the prior. The plot shows the result of the Bayes Factor for the prior you chose, a wide prior, and an ultrawide prior. We can see from the plot that as the prior width gets wider, the Bayes Factor in favour of the null hypothesis increases. Remember that the Bayes Factor quantifies the degree of belief in one hypothesis over another. What this is showing us is that if we express less certainty about the size of the effect in the prior, the Bayes Factor increases to show that the degree of belief in the null model should be increased. The Bayes Factor also approaches one as the width of the Cauchy prior gets smaller, which suggests a high degree of belief that the effect is close to zero. However, remember that we are using the default prior for our alternative model, which is centered on zero. If we express too much certainty that the effect is close to zero for the alternative model, it will be almost the same as the null model, which explains why the two models become equivalent with a Bayes Factor of 1. This becomes more useful when an informative prior is used and the center can be defined as a non-zero value, but note that the robustness check plot currently does not support informed priors, so you would have to check it by manually adjusting the prior parameters. The last plot we are going to look at is the result of the sequential analysis in Figure 21. This shows how the value of the Bayes Factor changes with each additional measurement. When using NHST, looking at the data and seeing how the results change with additional data without any corrections is considered a questionable research practice (although see Lakens 2016 for how sequential analysis can be performed using NHST whilst still controlling your error rates). However, sequential analysis can be performed using the Bayesian approach without this concern (although see Schönbrodt et al. 2015 for further considerations). Figure 21 shows that the Bayes Factor dances around between 1 and 3 until we have approximately 12 measurements, at which point it steadily increases until it reaches 5.1 for the whole sample. There are two important points to highlight here about sequential analysis from Schönbrodt et al. (2015). Firstly, it is important that a minimum number of participants can be justified and sampled before sequential analysis begins. Secondly, there are different recommendations on what boundary to use for evidence in favour of one model over another. As reviewed in Kruschke and Liddell (2017), some authors have suggested a threshold of 3 for making decisions, whilst others have suggested 6 for early stages of research and 10 for later stages of research. Schönbrodt et al. (2015) highly recommend a boundary of at least 5 to adequately control for false positives and negatives.

Figure 19: The Prior and Posterior Plot. This shows the probability distribution of the prior you define, and the probability distribution of the posterior, which combines the prior with the observed data. The 95% credibility interval is shown above the distributions as a horizontal line. The plot is shown with the additional information option selected, which provides the median effect size and 95% credibility interval, as well as specific values for BF10 and BF01.
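The evidence bands discussed above (Bayes Factors between 3 and 10 as moderate, following Wagenmakers et al. 2010), together with the reciprocal relationship between BF01 and BF10, can be sketched as follows. The three labels are a simplified scheme for illustration; published classification tables are more fine-grained:

```python
def evidence_label(bf):
    """Rough verbal label for a Bayes Factor greater than 1 (simplified bands)."""
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "moderate"
    return "strong"

bf01 = 5.10                  # evidence for the null over the alternative
bf10 = 1 / bf01              # the same evidence, expressed the other way round
print(evidence_label(bf01))  # moderate
print(round(bf10, 3))        # 0.196
```

This reproduces the conclusion in the text: BF01 = 5.10 counts as moderate evidence for the null, and the equivalent BF10 is roughly 0.2.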

8.3.4 Reporting the results

Hopefully you now have some idea of how to interpret the output JASP provides for a Bayesian T-Test. We will now go over how you could report the results in a report, although please note that there is currently no standardised way of reporting Bayes Factors in APA style. Some information is reported in the same way as for a classical analysis: you will still need to report the descriptive statistics of the relevant variables and the sample size. In addition to the actual results, van de Schoot and Depaoli (2014) highlight several key components that should be reported in an empirical research paper utilising Bayesian statistics. These include what statistical program was used, a discussion and justification of the prior used (even if it is the default option), and a sensitivity analysis (or robustness check in JASP) of the prior. One way of reporting the results we have looked at could be:

"It was predicted that there would be no difference between the gaze proportion of infants towards the researcher who sang the familiar song and the researcher who sang the unfamiliar song. The data were analysed using the statistical package JASP (JASP Team 2017; Version 0.8.5.1). The prior for the non-directional alternative model was a Cauchy prior centred on zero with a scale of 0.707. A Bayesian paired samples T-Test found moderate evidence in favour of the null model in comparison to the alternative model (BF01 = 5.10). Choosing a narrower Cauchy prior width of 0.5 would still result in moderate evidence in favour of the null model (BF01 = 3.78), whilst a wider prior of 1.0 would provide similar results (BF01 = 7.02)."

Figure 20: The Robustness check plot. This shows how changing the default Cauchy prior width affects the Bayes Factor, here in favour of the null model (BF01). The grade of evidence is shown on the right hand y-axis.

There are just a few additional notes for the end of this example. Firstly, although the use of Bayesian statistics is increasing, there are still many people who are unfamiliar with it or feel uncomfortable with solely reporting Bayesian statistics. Therefore, many papers that are starting to use Bayesian methods combine them with classical statistics to provide an alternative perspective (e.g. Mason et al. 2017). Secondly, the choice of prior is very important as it can affect your conclusions. It is crucial to justify which prior you have used by discussing likely effect sizes derived from previous research. We have used the default prior in this example as the focus is on the analysis process. However, informed priors should be used wherever possible (see Rouder et al. 2009 for more information on priors in Bayesian T-Tests).

9 Summary

Hopefully you have found these examples informative, and you can now see there are viable alternatives to SPSS. JASP is a lightweight alternative that helpfully provides you with information that you just do not get in SPSS. Although it is not perfect (it is still in development and not even at version 1.0 yet!), there are many advantages it can offer you as a psychology student. Firstly, it provides you with a simple output containing just the values you need, with the option to add more information as and when required. Think back to when you first came across an SPSS output: you are bombarded with information before you even know what to do with it. JASP, on the other hand, is more intuitive, and is hopefully a little easier to understand if you are still getting to grips with statistics in psychology. Secondly, it produces common effect sizes such as Cohen's d with the simple press of a button. Jacob Cohen (1990: 12), a famous statistician in psychology, said "the primary product of a research inquiry is one or more measures of effect size, not p values". Despite its popularity, SPSS provides only a handful of effect sizes and you are required to calculate most of them separately. JASP allows you to simply tick a box to produce them, which saves you time and reduces the likelihood of making a mistake if you were to calculate them by hand. Finally, the main motivation behind creating JASP was to make Bayesian statistics open to more people. Historically, Bayesian statistics were restricted to those with good statistical and computational knowledge. However, Bayesian methods are becoming more common in psychological research and it is important that they are accessible to a larger number of researchers.

However, at the moment JASP simply cannot do everything you need, so it is unlikely that you will solely use JASP to analyse your data. There is nothing wrong with switching between different statistics packages to play to each of their strengths. Hopefully you will see that JASP can offer several benefits, and statistics might appear a little less frightening. Who knows, maybe in a few years you will be solely taught using JASP! If you have any further questions about any of the topics covered (or not covered) in this guide, feel free to email me at [email protected]. Feedback on this guide is always welcome!

Figure 21: Sequential analysis plot. This shows how the Bayes Factor is updated with each additional measurement. Similar to the Robustness check plot in Figure 20, the grade of evidence is shown on the right hand y-axis.

10 Additional Resources

• A free online textbook by Craig Wendorf can be downloaded from the Open Science Framework. This has chapters on SPSS and JASP, and provides some additional content that was not covered in this session.
• JASP have their own YouTube channel with a few tutorials on how to perform simple Bayesian analyses (read appendix 9.1 if you still have no idea what this is). However, if you search for JASP on YouTube, there are also a few user-made tutorials on importing data and on statistical designs you may be more familiar with, which you might want to look at if you want some guidance in your own time.

11 References

APA Task Force on Statistical Inference (1999) 'Statistical Methods in Psychology Journals'. American Psychologist 54 (8), 594-604
Baguley, T. (2012) Serious Stats: A Guide to Advanced Statistics for the Behavioural Sciences. Basingstoke: Palgrave Macmillan
Beall, A. T., Hofer, M. K., and Schaller, M. (2016) 'Infections and elections: Did an Ebola outbreak influence the 2014 U.S. federal elections (and if so, how)?' Psychological Science 27, 595-605
Cohen, J. (1988) Statistical Power Analysis for the Behavioural Sciences. 2nd edition. New Jersey: Lawrence Erlbaum Associates
Cohen, J. (1990) 'Things I Have Learned (So Far)'. American Psychologist 45 (12), 1304-1312

Dawtry, R. J., Sutton, R. M., and Sibley, C. G. (2015) 'Why wealthier people think people are wealthier, and why it matters: From social sampling to attitudes to redistribution'. Psychological Science 26, 1389-1400
Dienes, Z. (2014) 'Using Bayes to Get the Most out of Non-Significant Results'. Frontiers in Psychology 5, 1-17
Etz, A. and Vandekerckhove, J. (2017) 'Introduction to Bayesian Inference for Psychology'. Retrieved from osf.io/preprints/psyarxiv/q46q3
Field, A. (2013) Discovering Statistics using IBM SPSS Statistics. London: SAGE
Harvie, D. S., Broecker, M., Smith, R. T., Meulders, A., Madden, V. J., and Moseley, G. L. (2015) 'Bogus visual feedback alters onset of movement-evoked pain in people with neck pain'. Psychological Science 26, 385-392
James, E. L., Bonsall, M. B., Hoppitt, L., Tunbridge, E. M., Geddes, J. R., Milton, A. L., and Holmes, E. A. (2015) 'Computer Game Play Reduces Intrusive Memories of Experimental Trauma via Reconsolidation-Update Mechanisms'. Psychological Science 26, 1201-1215
Kruschke, J. K. and Liddell, T. M. (2017) 'Bayesian Data Analysis for Newcomers'. Psychonomic Bulletin and Review, 1-29
Lakens, D. (2013) 'Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for T-Tests and ANOVAs'. Frontiers in Psychology 4, 1-12
Lakens, D. (2017) 'Equivalence Tests: A Practical Primer for T Tests, Correlations, and Meta-Analyses'. Social Psychological and Personality Science, 1-8
Mason, A., Ludwig, C., and Farrell, S. (2016) 'Adaptive Scaling of Reward in Episodic Memory: A Replication Study'. The Quarterly Journal of Experimental Psychology 70 (11), 2306-2318
Mehr, S. A., Song, L. A., and Spelke, E. S. (2016) 'For 5-month-old infants, melodies are social'. Psychological Science 27, 486-501
Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., and Wagenmakers, E. J. (2016) 'Is There a Free Lunch in Inference?' Topics in Cognitive Science 8 (3), 520-547
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., and Iverson, G. (2009) 'Bayesian T Tests for Accepting and Rejecting the Null Hypothesis'. Psychonomic Bulletin and Review 16 (2), 225-237
Schönbrodt, F. D., Wagenmakers, E., Zehetleitner, M., and Perugini, M. (2015) 'Sequential Hypothesis Testing With Bayes Factors: Efficiently Testing Mean Differences'. Psychological Methods
Schroeder, J. and Epley, N. (2015) 'The sound of intellect: Speech reveals a thoughtful mind, increasing a job candidate's appeal'. Psychological Science 26, 877-891
van de Schoot, R. and Depaoli, S. (2014) 'Bayesian Analyses: Where to Start and What to Report'. The European Health Psychologist 16 (2), 75-84
Wagenmakers, E., Wetzels, R., Borsboom, D., and van der Maas, H. L. J. (2011) 'Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi: Comment on Bem (2011)'. Journal of Personality and Social Psychology 100 (3), 426-432

12 Appendix

12.1 Student or Welch T-Test?

When you perform an independent samples T-Test, one of the parametric assumptions you test for is homogeneity of variance. The issues surrounding Levene's test have already been mentioned in section 3.3.2, but a further issue is that it is extremely unlikely that two groups will have exactly equal variances. Unequal variances can be problematic and lead to an increase in type one error rates (Lakens 2015). An alternative approach is to use Welch's T-Test by default, which corrects for unequal variances and sample sizes. This essentially applies a correction that decreases the degrees of freedom and consequently the power of the test. Therefore, as the difference in variances increases, the degrees of freedom decrease to make the test increasingly conservative. This attempts to ensure the false-positive error rate does not rise above 5%. For example, in section 3.4 you might have noticed that if you selected Welch's T-Test, the degrees of freedom changed from 37 to 33.43. It is generally recommended to use the Welch version by default as you do not need to conduct Levene's test first (saving you time), and it provides essentially the same result when there are equal variances and sample sizes in each group. For a more thorough explanation, read Lakens' (2015) blog post or, if you are interested, a more technical paper by Derrick et al. (2016).
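To make the correction concrete, here is a minimal Python sketch of the Welch t statistic and the Welch-Satterthwaite degrees of freedom. The two groups are simulated, hypothetical data (not the example from section 3.4), chosen so that one group has a much larger variance:

```python
import numpy as np
from scipy import stats

def welch_t(x, y):
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    se2 = v1 / n1 + v2 / n2
    t = (np.mean(x) - np.mean(y)) / np.sqrt(se2)
    # Unequal variances shrink the degrees of freedom below the Student
    # value (n1 + n2 - 2), making the test more conservative.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical data: two groups with clearly unequal variances.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 20)
y = rng.normal(0.0, 3.0, 19)

t, df = welch_t(x, y)
print(f"t = {t:.2f}, Welch df = {df:.1f} (Student df would be {len(x) + len(y) - 2})")
```

The same statistic is available directly via `scipy.stats.ttest_ind(x, y, equal_var=False)`; when the variances and sample sizes are equal, the Welch df approaches the Student value, which is why defaulting to Welch costs you very little.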
