Dynamic Graphics and Reporting for Statistics Yihui Xie Iowa State University
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2013

Part of the Computer Sciences Commons, and the Statistics and Probability Commons

Recommended Citation
Xie, Yihui, "Dynamic Graphics and Reporting for Statistics" (2013). Graduate Theses and Dissertations. 13518. https://lib.dr.iastate.edu/etd/13518

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.

Dynamic graphics and reporting for statistics
by
Yihui Xie

A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

Major: Statistics

Program of Study Committee:
Dianne Cook, Co-major Professor
Heike Hofmann, Co-major Professor
Jarad Niemi
Huaiqing Wu
Gray Calhoun

Iowa State University
Ames, Iowa
2013
Copyright © Yihui Xie, 2013. All rights reserved.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
1 PROBLEM STATEMENT
2 OVERVIEW
  2.1 Animations
  2.2 Interactive Graphics
  2.3 Dynamic Reporting
  2.4 Connections
3 SCOPE

I STATISTICAL ANIMATIONS
4 INTRODUCTION
5 STATISTICS AND ANIMATIONS
  5.1 Iterative Algorithms
  5.2 Random Numbers and Simulations
  5.3 Resampling Methods
  5.4 Dynamic Trends
6 DESIGN AND CONTENTS
  6.1 The Basic Schema
  6.2 Tools for Exporting Animations
    6.2.1 HTML pages
    6.2.2 PDF animations via LaTeX
    6.2.3 GIF animations, Flash animations and videos
    6.2.4 The animation recorder
    6.2.5 Other R packages
  6.3 Topics in statistics
  6.4 Demos
7 EXAMPLES
8 CONCLUSIONS

II INTERACTIVE GRAPHICS
9 INTRODUCTION
10 THE MVC ARCHITECTURE
11 REACTIVE PROGRAMMING
  11.1 Mutaframes
  11.2 Reference Classes
  11.3 Reactive Programming Behind cranvas
12 AN ANATOMY OF INTERACTIONS
  12.1 Input Actions
  12.2 Reactive Data Objects
  12.3 Interactions
    12.3.1 Brushing and selection
    12.3.2 Linking
    12.3.3 Zooming and panning
    12.3.4 Querying/identifying
    12.3.5 Direct manipulation
13 CONCLUSIONS

III DYNAMIC REPORTING
14 Introduction
15 A WEB APPLICATION
16 DESIGN
  16.1 Parser
  16.2 Evaluator
  16.3 Renderer
17 FEATURES
  17.1 Code Decoration
  17.2 Graphics
    17.2.1 Graphical Devices
    17.2.2 Plot Recording
    17.2.3 Plot Rearrangement
    17.2.4 Plot Size
    17.2.5 The tikz Device
  17.3 Cache
  17.4 Code Externalization
  17.5 Chunk Reference
  17.6 Evaluation of Chunk Options
  17.7 Child Document
  17.8 R Notebook
18 EXTENSIBILITY
  18.1 Hooks
  18.2 Language Engines
19 DISCUSSION

IV IMPACT AND FUTURE WORK
20 IMPACT
21 FUTURE WORK
  21.1 Animations
  21.2 Interactive Graphics
  21.3 Dynamic Reporting
BIBLIOGRAPHY

LIST OF TABLES

6.1 The exporting utilities in the animation package.
7.1 The accuracy rates under different numbers of features with 5-fold cross-validation
16.1 Code syntax for different document formats
16.2 Output hook functions and the object classes of results from the evaluate package.
20.1 The numbers of downloads and initial publication dates of the top 35 R packages

LIST OF FIGURES

2.1 The basic idea of animations
2.2 The scheme of interactive graphics
2.3 The central role of the data object in interactive graphics
2.4 Overview of dynamic reporting
2.5 Connections between animations, interactive graphics and reporting
5.1 An illustration of a sequence of images
5.2 The gradient descent algorithm applied to two bivariate objective functions with different step lengths
5.3 Four basic types of sampling methods
5.4 Simulation of Buffon’s Needle
5.5 Bootstrapping for the variable eruptions of the faithful data
5.6 Obama’s acceptance speech in Chicago
6.1 A demonstration of the Brownian motion
6.2 The interface of the animation in the HTML page (in the Firefox browser).
6.3 A 3D animation demo (created by the rgl package)
7.1 The QQ plot for random numbers from $N(0, 1)$ with a sample size 20 (top) and 100 (bottom) respectively
7.2 The normality of the sample mean $\bar{X}_n$ as the sample size $n$ increases from 1 to 50 with $X \sim \mathrm{Unif}(0, 1)$ (top) and $\mathrm{Cauchy}(0, 1)$ (bottom)
7.3 An illustration of the 5-fold cross-validation for finding the optimum number of gene features based on LDA
9.1 A representation of the pipeline in the cranvas package
10.1 Brush a scatterplot using the MVC design
11.1 The original scatterplot is automatically updated
11.2 A grand tour through the flea data
12.1 Histogram of sales prices
12.2 Morph from a choropleth chart of the US to a population based cartogram
12.3 The conversion from a data frame to a mutaframe
12.4 Create a scatter plot and attach a meta object to it
12.5 One-to-one linking
12.6 Linked map and scatterplot
12.7 Zooming into the pollen data
12.8 Bar chart and spine plot of the number of bedrooms in housing sales
17.1 There is no need to explicitly print grid graphics
17.2 Two expressions produce two plots
20.1 The number of visits of the knitr website
20.2 The scatterplot of downloads vs initial publication dates
ACKNOWLEDGEMENTS

I am very grateful to my major professors Di Cook and Heike Hofmann, and my committee members Jarad Niemi, Huaiqing Wu, and Gray Calhoun, for their time and feedback on my work. I learned a lot about statistical computing and graphics from Di and Heike during my stay here at Iowa State University. I’m always impressed by their sharp vision when looking at statistical graphics, and influenced by their enthusiasm in finding out interesting patterns in practical data. I’ll remember well Di’s reaction to the iris data, and Heike’s finding about why the brush rectangle did not work.

I have been lucky enough to be able to work in areas that I’m truly interested in, since I was an undergraduate student in the School of Statistics, Renmin University of China. My former advisor at Renmin, Yanyun Zhao, was very open-minded, just like my current advisors at Iowa State. He did not really know all the technical details of my work, but he always encouraged and supported me when I spent a lot of time on R and statistical graphics. With that much freedom, I was able to attend a statistical graphics workshop in Germany in 2008, where I met Di and Heike for the first time, and acquainted myself with gurus in this area. I was the first (and so far the only) winner of the John Chambers Statistical Software Award in China. I worked as a summer intern at the AT&T Labs with Simon Urbanek (2012) and Fred Hutchinson Cancer Research Center with Raphael Gottardo (2013). Both mentors also gave me full freedom to combine my research interests with their projects. As a highly self-motivated person, I feel that is where my productivity primarily comes from.

When I first came to the US in 2009, I had a personal misfortune but got quick help from the Department of Statistics and the International Students Office here, even though many people did not know me at that time. I deeply appreciate that, and hope I can do something in return in my future career.
I took 15 courses in total at Iowa State University, and I thank all the instructors for what.