Tea: a High-Level Language and Runtime System for Automating Statistical Analyses
Total Page:16
File Type:pdf, Size:1020Kb
Tea: A High-level Language and Runtime System for Automating Statistical Analyses Eunice Jun, Maureen Daum, Jared Roesch Sarah E. Chasins University of Washington University of California, Berkeley Emery D. Berger Rene Just, Katharina Reinecke University of Massachusetts Amherst University of Washington Figure 1: Overview of Tea’s language constructs and its runtime system’s compilation process. The colors correspond to the annotations provided in Figure 2. ABSTRACT KEYWORDS Though statistical analyses are centered on research ques- statistical analysis, declarative programming language tions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have 1 INTRODUCTION statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea1, a high-level declar- The enormous variety of modern quantitative ative language and runtime system. In Tea, users express methods leaves researchers with the nontrivial their study design, any parametric assumptions, and their task of matching analysis and design to the re- hypotheses. Tea compiles these high-level specifications into search question. - Ronald Fisher [11] a constraint satisfaction problem that determines the set of Since the development of modern statistical methods (e.g., valid statistical tests, and then executes them to test the hy- Student’s t-test, ANOVA, etc.), statisticians have acknowl- pothesis. We evaluate Tea using a suite of statistical analyses edged the difficulty of identifying which statistical tests peo- arXiv:1904.05387v1 [cs.PL] 10 Apr 2019 drawn from popular tutorials. We show that Tea generally ple should use to answer their specific research questions. matches the choices of experts while automatically switch- Almost a century later, choosing appropriate statistical tests ing to non-parametric tests when parametric assumptions for evaluating a hypothesis remains a challenge. As a con- are not met. We simulate the effect of mistakes made by non- sequence, errors in statistical analyses are common [20], expert users and show that Tea automatically avoids both especially given that data analysis has become a common false negatives and false positives that could be produced by task for people with little to no statistical expertise. the application of incorrect statistical tests. A wide variety of tools (such as SPSS [46], SAS [45], and JMP [43]), programming languages (R [44]), and libraries 1named after Fisher’s “Lady Tasting Tea” experiment [11] (including numpy [32], scipy [17], and statsmodels [37]), enable people to perform specific statistical tests, but they do not address the fundamental problem that users may not arxiv, April 2019, on-line know which statistical test to perform and how to verify that 2019. https://doi.org/0000001.0000001 specific assumptions about their data hold. In fact, all of these tools place the burden of valid, replica- limitations and future work and how Tea is different from ble statistical analyses on the user and demand deep knowl- related work. We conclude by providing information on how edge of statistics. Users not only have to identify their re- to use Tea. search questions, hypotheses, and domain assumptions, but also must select statistical tests for their hypotheses (e.g., 2 USAGE SCENARIO Student’s t-test or one-way ANOVA). For each statistical This section describes how an analyst who has no statistical test, users must be aware of the statistical assumptions each background can use Tea to answer their research questions. test makes about the data (e.g., normality or equal variance We use as an example analyst a historical criminologist who between groups) and how to check for them, which requires wants to determine how imprisonment differed across re- additional statistical tests (e.g., Levene’s test for equal vari- gions of the US in 19602. Figure 2 shows the Tea code for ance), which themselves may demand further assumptions this example. about the data. This entire process requires significant knowl- The analyst specifies the data file’s path in Tea. Tea han- edge about statistical tests and their preconditions, as well dles loading and storing the data set for the duration of the as the ability to perform the tests and verify their precondi- analysis session. The analyst does not have to worry about tions. This cognitively demanding process can easily lead to transforming the data in any way. mistakes. The analyst asks if the probability of imprisonment was This paper presents Tea, a high-level declarative language higher in southern states than in non-southern states. The for automating statistical test selection and execution that analyst identifies two variables that could help them answer abstracts the details of statistical analysis from the users. Tea this question: the probability of imprisonment (‘Prob’) and captures users’ hypotheses and domain knowledge, trans- geographic location (‘So’). for non-southern. Using Tea, the lates this information into a constraint satisfaction problem, analyst defines the geographic location as a dichotomous identifies all valid statistical tests to evaluate a hypothesis, nominal variable where ‘1’ indicates a southern state and and executes the tests. Figure 1 illustrates Tea’s compilation ‘0’ indicates a non-southern state, and indicates that the process. Tea’s higher-level, declarative nature aims to lower probability of imprisonment is a numeric data type (ratio) the barrier to valid, replicable analyses. with a range between 0 and 1. We have designed Tea to integrate directly into common The analyst then specifies their study design, defining the data analysis workflows for users who have minimal pro- study type to be ‘observational study’ (rather than experi- gramming experience. Tea is implemented as an open-source mental study) and defining the independent variable to be the Python library, so programmers can use Tea wherever they geographic location and the outcome (dependent) variable use Python, including within Python notebooks. to be the probability of imprisonment. In addition, Tea is flexible. Its abstraction of the analy- Based on their prior research, the analyst knows that the sis process and use of a constraint solver to select tests is probability of imprisonment in southern and non-southern designed to support its extension to emerging statistical states is normally distributed. The analyst provides an as- methods, such as Bayesian analysis. Currently, Tea supports sumptions clause to Tea in which they specify this domain frequentist Null Hypothesis Significance Testing (NHST). knowledge. They also specify an acceptable Type I error rate The paper makes the following contributions: (probability of finding a false positive result), more collo- • Tea, a novel domain-specific language (DSL) for auto- quially known as the ‘significance threshold’α ( = :05) that matically selecting and executing statistical analyses is acceptable in criminology. If the analyst does not have based on users’ hypotheses and domain knowledge assumptions or forgets to provide assumptions, Tea will use ( Section 4), the default of α = :05. • the Tea runtime system, which formulates statistical The analyst hypothesizes that southern states will have test selection as a maximum constraint satisfaction a higher probability of imprisonment than non-southern problem ( Section 5), and states. The analyst directly expresses this hypothesis in Tea. • an initial evaluation showing that Tea can express and Note that at no point does the analyst indicate which statistical execute common NHST statistical tests ( Section 6). tests should be performed. From this point on, Tea operates entirely automatically. We start with a usage scenario that provides an overview When the analyst runs their Tea program, Tea checks proper- of Tea (Section 2). We discuss the concerns about statistics in ties of the data and finds that Student’s t-test is appropriate. the HCI community that shaped Tea’s design (Section 3), the implementation of Tea’s programming language (Section 4), the implementation of Tea’s runtime system (Section 5), and 2The example is taken from [8] and [39]. The data set comes as part of the the evaluation of Tea as a whole (Section 6). We then discuss MASS package in R. Tea executes the Student’s t-test and non-parametric alter- natives, such as the Mann-Whitney U test, which provide alternative, consistent results. Tea generates a table of results from executing the tests, ordered by their power (i.e., results from the parametric t- test will be listed first given that it has higher power than the non-parametric equivalent). Based on this output, the analyst concludes that their hypothesis—that the probabil- ity of imprisonment was higher in southern states than in non-southern states in 1960—is supported. The results from alternative statistical tests support this conclusion, so the analyst can be confident in their assessment. The analyst can now share their Tea program with col- leagues. Other researchers can easily see what assumptions the analyst made and what the intended hypothesis was (since these are explicitly stated in the Tea program), and reproduce the exact results using Tea. 3 DESIGN CONSIDERATIONS In designing Tea’s language and runtime system, we consid- ered best practices for conducting statistical analyses and Figure 2: Sample Tea program specification