Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design

Alexander Eiselmayer¹, Chat Wacharamanotham¹, Michel Beaudouin-Lafon², Wendy E. Mackay²
¹ University of Zurich, Zurich, Switzerland
² Univ. Paris-Sud, CNRS, Inria, Université Paris-Saclay, Orsay, France

To cite this version: Alexander Eiselmayer, Chat Wacharamanotham, Michel Beaudouin-Lafon, Wendy Mackay. Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design. CHI 2019 - The ACM CHI Conference on Human Factors in Computing Systems, ACM, May 2019, Glasgow, United Kingdom. pp.1–11. hal-02127273

HAL Id: hal-02127273
https://hal.archives-ouvertes.fr/hal-02127273
Submitted on 13 May 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

[Figure 1: Touchstone2 interface showing ① experiment bricks, ② a trial table, and ③ a power chart (x-axis: Number of Participants; y-axis: Power).]
Figure 1. Touchstone2 experiments consist of interactive “bricks” ① that specify independent variables, blocking, counterbalancing and timing, and generate an interactive trial table ② and an interactive statistical power chart ③.

ABSTRACT
Touchstone2 offers a direct-manipulation interface for generating and examining trade-offs in experiment designs. Based on interviews with experienced researchers, we developed an interactive environment for manipulating experiment design parameters, revealing patterns in trial tables, and estimating and comparing statistical power. We also developed TSL, a declarative language that precisely represents experiment designs. In two studies, experienced HCI researchers successfully used Touchstone2 to evaluate design trade-offs and calculate how many participants are required for particular effect sizes. We discuss Touchstone2’s benefits and limitations, as well as directions for future research.

CCS CONCEPTS
• Human-centered computing → HCI design and evaluation methods; Laboratory experiments;

KEYWORDS
Experiment design; Randomization; Counterbalancing; Power analysis; Reproducibility

ACM Reference Format:
Alexander Eiselmayer, Chat Wacharamanotham, Michel Beaudouin-Lafon, and Wendy E. Mackay. 2019. Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3290605.3300447

CHI 2019, May 4–9, 2019, Glasgow, Scotland, UK
© 2019 Association for Computing Machinery.
This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK, https://doi.org/10.1145/3290605.3300447.

1 INTRODUCTION
Human-Computer Interaction (HCI) researchers often compare the effectiveness of interaction techniques or other independent variables with respect to specified measures, e.g. speed and accuracy. Designing such experiments is deceptively tricky: researchers must not only control for extraneous nuisance variables, such as fatigue and learning effects, but also weigh the costs of adding more conditions or participants versus the benefits of higher statistical power.

Unfortunately, the problem is greater than simply helping individual researchers design experiments. The natural sciences face a “reproducibility crisis”: a recent survey of over 1500 scientists indicated that “more than 70% have tried and failed to reproduce another scientist’s experiments.” [1]. One explanation is the number of researcher degrees of freedom: the methodological decisions from study design up to publication [28], including how many participants are recruited and assigned to which conditions [31]. Cockburn et al. [5] argue persuasively in favor of pre-registering these decisions, in line with other scientific disciplines. However, to make this possible, the HCI community needs a common language for defining and sharing experiment designs. We also need tools for exploring design trade-offs, and capturing the final design for easy comparison with published designs.

Our goal is to help HCI researchers generate and weigh design choices to balance the inherent trade-offs among alternative designs. We present Touchstone2 (Figure 1), a software tool for creating, comparing and sharing experiments that includes:
● a visual environment to manipulate experiment designs and their parameters;
● a graphical interface to weigh alternative designs and highlight trial table patterns;
● an interactive visualization to assess statistical power;
● an online workspace to compare and share designs; and
● a declarative language, TSL, to describe complex experiments with minimal constructs and operators.

After discussing related work, we present the results of an interview study that informed the design of Touchstone2. Next, we present the design rationale for Touchstone2 and the TSL language, as well as the results of two workshops with HCI researchers to assess the interface. Finally, we discuss the benefits and limitations of Touchstone2, as well as directions for future research.

2 RELATED WORK
This paper focuses on two aspects of experiment design: counterbalancing¹ and a priori power analysis. The research literature includes different conventions for representing experiment designs, and provides some software packages for ensuring counterbalancing and assessing power.

Representing experiment designs
Individual research disciplines use various techniques for optimizing experiment designs. For example, industrial manufacturing uses Response surface design [2] and the Taguchi method [23] for between-subjects designs. They treat product elements as experiment subjects and focus solely on de-

[…] dependent variables (DV), and results. This helped automate hypothesis generation and testing for yeast genomics experiments [16]. However, since experiments in this domain are restricted to simple Latin square designs, expo omits blocking and counterbalancing. Papadopoulos et al. [24] present veevvie, an ontology that describes Information Visualization data at the trial level, which unfortunately precludes specifying trial order.

[Figure 2: four panels. ① Generalized linear model: each coefficient represents the effect of a condition of an IV on a DV (e.g., time); the intercept often represents the baseline response of the DV; interaction effects are captured by additional terms; epsilon captures the unexplained random errors. ② Cell-mean table: mean of the DV for each combination of Pointing (Mouse, Trackpad) and Display (Desktop, Wall): Desktop 10 sec and 15 sec, Wall 20 sec and 40 sec. ③ Trial table: columns ParticipantID, Pointing, Display, Replication, Trial; each row is a trial. ④ Design matrix: columns ParticipantID, Pointing, Display; in studies with multiple replications, trial rows with the same conditions can be averaged as a response for one row in the design matrix.]
Figure 2. Four experiment designs representations [7]².

The statistical literature [7, 10] argues that experiment designs serve two primary goals: 1) explaining effects and 2) explaining the assignment of treatment conditions to subjects³. To explain effects, generalized linear models (GLM) determine the appropriate statistical procedures for data analysis (Figure 2 ①). Cell-mean tables ② summarize levels of dependent variables for each condition (often used in statistical reports and for power analyses).

Treatment condition assignments are often displayed as trial tables, with one trial per line ③, but their length and complexity make them cumbersome to manipulate. Design matrices provide two-dimensional representations of GLM coefficients, but without order information ④, as each row in a design matrix may correspond to multiple replicated trials. Text descriptions are also possible, but the lack of agreed-upon formats and minimum ‘completeness’ requirements increases the likelihood