Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures Jacob O

CHI 2011 • Session: Research Methods May 7–12, 2011 • Vancouver, BC, Canada The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures Jacob O. Wobbrock,1 Leah Findlater,1 Darren Gergle,2 James J. Higgins3 1The Information School 2School of Communication 3Department of Statistics DUB Group Northwestern University Kansas State University University of Washington Evanston, IL 60208 USA Manhattan, KS 66506 USA Seattle, WA 98195 USA [email protected] [email protected] wobbrock, [email protected] ABSTRACT some methods for handling nonparametric factorial data Nonparametric data from multi-factor experiments arise exist (see Table 1) [12], most are not well known to HCI often in human-computer interaction (HCI). Examples may researchers, require advanced statistical knowledge, and are include error counts, Likert responses, and preference not included in common statistics packages.1 tallies. But because multiple factors are involved, common Table 1. Some possible analyses for nonparametric data. nonparametric tests (e.g., Friedman) are inadequate, as they Method Limitations are unable to examine interaction effects. While some statistical techniques exist to handle such data, these General Linear Models Can perform factorial parametric analyses, (GLM) but cannot perform nonparametric analyses. techniques are not widely available and are complex. To address these concerns, we present the Aligned Rank Mann-Whitney U, Can perform nonparametric analyses, but Kruskal-Wallis cannot handle repeated measures or analyze Transform (ART) for nonparametric factorial data analysis multiple factors or interactions. in HCI. The ART relies on a preprocessing step that “aligns” Wilcoxon, Friedman Can perform nonparametric analyses and data before applying averaged ranks, after which point handle repeated measures, but cannot common ANOVA procedures can be used, making the ART analyze multiple factors or interactions. accessible to anyone familiar with the F-test. Unlike most χ2, Logistic Regression, Can perform factorial nonparametric articles on the ART, which only address two factors, we Generalized Linear analyses, but cannot handle repeated generalize the ART to N factors. We also provide ARTool Models (GZLM) measures. and ARTweb, desktop and Web-based programs for aligning Generalized Linear Can perform factorial nonparametric analyses and ranking data. Our re-examination of some published Mixed Models (GLMM), and handle repeated measures, but are not HCI results exhibits advantages of the ART. Generalized Estimating widely available and are complex. Equations (GEE) Author Keywords: Statistics, analysis of variance, ANOVA, factorial analysis, nonparametric data, F-test. Kaptein et al.’s [7] Can perform factorial nonparametric analyses nonparametric method and handle repeated measures, but requires ACM Classification Keywords: H.5.2 [Information different mathematics and software modules interfaces and presentation]: User interfaces – for each type of experiment design. ([7] evaluation/methodology, theory and methods. focused only on 2×2 mixed designs.) General Terms: Experimentation, Measurement, Theory. Aligned Rank Transform Can perform factorial nonparametric analyses (ART) and handle repeated measures. Requires INTRODUCTION only an ANOVA after data alignment and Studies in human-computer interaction (HCI) often ranking, provided for by ARTool or ARTweb. generate nonparametric data from multiple independent variables. Examples of such data may include error counts, A remedy to the paucity of nonparametric factorial analyses Likert responses, or preference tallies. Often complicating would be a procedure that retains the familiarity and this picture are correlated data arising from repeated interpretability of the familiar parametric F-test. We present measures. Two analysis approaches for data like these just such an analysis called the Aligned Rank Transform appear regularly in the HCI literature. The first simply uses (ART). The ART relies on an alignment and ranking step before using F-tests. We offer two equivalent tools to do the a parametric F-test, which risks violating ANOVA assumptions and inflating Type I error rates. The second alignment and ranking, one for the desktop (ARTool) and uses common one-way nonparametric tests (e.g., Friedman), one on the Web (ARTweb). Unlike the advanced methods foregoing the examination of interaction effects. Although 1 A useful page from statistics consulting at UCLA for choosing the right Permission to make digital or hard copies of all or part of this work for statistical test has a conspicuous omission marked with “???”, which personal or classroom use is granted without fee provided that copies are appears in the slot for analyzing factorial ordinal data. See not made or distributed for profit or commercial advantage and that copies http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm. Similarly, bear this notice and the full citation on the first page. To copy otherwise, another useful page giving the rationale for myriad statistical tests fails to or republish, to post on servers or to redistribute to lists, requires prior describe any nonparametric factorial analysis for repeated measures data. specific permission and/or a fee. See http://www.ats.ucla.edu/stat/spss/whatstat/whatstat.htm. Both pages CHI 2011, May 7–12, 2011, Vancouver, BC, Canada. were accessed January 12, 2011. Copyright 2011 ACM 978-1-4503-0267-8/11/05....$10.00. 143 CHI 2011 • Session: Research Methods May 7–12, 2011 • Vancouver, BC, Canada shown in Table 1, researchers only familiar with ANOVA with matching levels of independent variables (X1 and X2). can use, interpret, and report results from the ART. Thus, the cell mean in row 1 is the mean Y of s1 and s5. We describe the ART and contribute: (1) its generalization Table 2. Example calculation of cell means. from two factors to N factors, (2) the ARTool and ARTweb Subject X1 X2 Y cell mean programs for easy alignment and ranking, and (3) a re- s1 a a 12 (12+19)/2 examination of some published HCI data. s2 a b 7 (7+16)/2 s3 b a 14 (14+14)/2 THE ALIGNED RANK TRANSFORM FOR N FACTORS s4 b b 8 (8+10)/2 This section describes the Aligned Rank Transform, s5 a a 19 (12+19)/2 s6 a b 16 (7+16)/2 generalizing it to an arbitrary number of factors. The ART is s7 b a 14 (14+14)/2 for use in circumstances similar to the parametric ANOVA, s8 b b 10 (8+10)/2 except that the response variable may be continuous or ordinal, and is not required to be normally distributed. Step 2. Compute estimated effects for all main and interaction effects. Let A, B, C, and D be factors with Background levels Ai, i = 1..a; Bj, j = 1..b; Ck, k = 1..c; Dℓ, ℓ = 1..d. Rank transformations have appeared in statistics for years Let be the mean response Y for rows where factor A is [12]. Conover and Iman’s [1] Rank Transform (RT) applies Ai i ranks, averaged in the case of ties, over a data set, and then at level i. Let ABi j be the mean response Yij for rows where uses the parametric F-test on the ranks, resulting in a factor A is at level i and factor B is at level j. And so on. Let nonparametric factorial procedure. However, it was µ be the grand mean of Y over all rows. subsequently discovered that this process produces inaccurate results for interaction effects [5,11], making the One-way effects. The estimated main effect for a factor A RT method unsuitable for factorial designs. with response Yi is The Aligned Rank Transform (ART) [2,10] corrects this Ai . problem, providing accurate nonparametric treatment for Two-way effects. The estimated effect for an A*B both main and interaction effects [4,5,11]. It relies on a interaction with response Y is preprocessing step that first “aligns” the data for each effect ij (main or interaction) before assigning ranks, averaged in ABAB . i j i j the case of ties. Data alignment is an established process in statistics [6] by which effects are estimated as marginal Three-way effects. The estimated effect for an A*B*C means and then “stripped” from the response variable so interaction with response Yijk is that all effects but one are removed. ABCABACBCABC . ijk ij ik jk i j k Let’s consider an example: In a two-factor experiment with Four-way effects. The estimated effect for an A*B*C*D effects A, B, and A*B, and response Y, testing for the interaction with response Y is significance of effect A means first stripping from Y ijkℓ ABCDABCABDACDBCD estimates of effects B and A*B, leaving only a possible i j k i j k i j i k j k effect of A behind. This alignment results in YA′, whose ABACADBCBDCD i j i k i j k j k values are then ranked, producing YA″. Lastly, a full- ABCD . factorial ANOVA is run with YA″ as the response and model i j k terms A, B, and A*B, but importantly, only the effect of A is N-way effects. The generalized estimated effect for an N- examined in the ANOVA table; B and A*B are ignored. This way interaction is process is then repeated for the effects of B and A*B, i.e., on aligned ranks YB″ and YA*B″, respectively. Thus, to use N way the ART, responses Y from a study must be aligned and N-1 way N-2 way N-3 way N-4 way ranked for each effect of interest. This is a tedious process to do by hand, but ARTool or ARTweb make it easy. ... N-h way // if h is odd, or ART Procedure for N Factors We present the ART procedure in five steps: N-h way // if h is even Step 1. Compute residuals. For each raw response Y, ..

Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures Jacob O

14. Non-Parametric Tests You Know the Story. You've Tried for Hours To

Statistical Analysis in JASP

Friedman Test

Compare Multiple Related Samples

Repeated-Measures ANOVA & Friedman Test Using STATCAL

Chapter 4. Data Analysis

9 Blocked Designs

Basic Statistical Analysis Aov(), Anova(), Lm(), Glm(): Linear And

12.9 Friedman Rank Test: Nonparametric Analysis for The

Sample Size Calculation with Gpower

BMC Immunology Biomed Central

Exact P-Values for Pairwise Comparison of Friedman Rank Sums