Efficient R Programming: a Practical Guide to Smarter Programming

Total Page:16

File Type:pdf, Size:1020Kb

Efficient R Programming: a Practical Guide to Smarter Programming Efficient R Programming A Practical Guide to Smarter Programming Colin Gillespie and Robin Lovelace Efficient R Programming by Colin Gillespie and Robin Lovelace Copyright © 2017 Colin Gillespie, Robin Lovelace. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Nicole Tache Production Editor: Nicholas Adams Copyeditor: Gillian McGarvey Proofreader: Christina Edwards Indexer: WordCo Indexing Services Interior Designer: David Futato Cover Designer: Randy Comer Illustrator: Rebecca Demarest December 2016: First Edition Revision History for the First Edition 2016-11-29: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491950784 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Efficient R Programming, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-491-95078-4 [LSI] Preface Efficient R Programming is about increasing the amount of work you can do with R in a given amount of time. It’s about both computational and programmer efficiency. There are many excellent R resources about topics such as visualization (e.g., Chang 2012), data science (e.g., Grolemund and Wickham 2016), and package development (e.g., Wickham 2015). There are even more resources on how to use R in particular domains, including Bayesian statistics, machine learning, and geographic information systems. However, there are very few unified resources on how to simply make R work effectively. Hints, tips, and decades of community knowledge on the subject are scattered across hundreds of internet pages, email threads, and discussion forums, making it challenging for R users to understand how to write efficient code. In our teaching we have found that this issue applies to beginners and experienced users alike. Whether it’s a question of understanding how to use R’s vector objects to avoid for loops, knowing how to set up your .Rprofile and .Renviron files, or the ability to harness R’s excellent C++ interface to do the heavy lifting, the concept of efficiency is key. The book aims to distill tips, warnings, and tricks of the trade into a single, cohesive whole that provides a useful resource to R programmers of all stripes for years to come. The content of the book reflects the questions that our students from a range of disciplines, skill levels, and industries have asked over the years to make their R work faster. How to set up my system optimally for R programming work? How can one apply general principles from computer science (such as do not repeat yourself, aka DRY) to the specifics of an R script? How can R code be incorporated into an efficient workflow, including project inception, collaboration, and write-up? And how can one quickly learn how to use new packages and functions? The book answers these questions and more in 10 self-contained chapters. Each chapter starts with the basics and gets progressively more advanced, so there is something for everyone in each one. While more advanced topics such as parallel programming and C++ may not be immediately relevant to R beginners, the book helps to navigate R’s infamously steep learning curve with a commitment to starting slow and building on strong foundations. Thus even experienced R users are likely to find previously hidden gems of advice. While teaching this material, we commonly hear “Why didn’t anyone tell me that before?” Efficient programming should not be seen as an optional extra, and the importance of efficiency grows with the size of projects and datasets. In fact, this book was devised while teaching a course called R for Big Data, when it quickly became apparent that if you want to work with large datasets, your code must work efficiently. Even with small datasets, efficient code that is both fast to write and fast to run is a vital component of successful R projects. We found that the concept of efficient programming is important in all branches of the R community. Whether you are a sporadic user of R (e.g., for its unbeatable range of statistical packages), looking to develop a package, or working on a large collaborative project in which efficiency is mission-critical, code efficiency will have a major impact on your productivity. Ultimately, efficiency is about getting more output for less work input. To take the analogy of a car, would you rather drive 1,000 km on a single tank (or a single charge of batteries) or refuel a heavy, clunky, ugly car every 50 km? Or would you prefer to choose an altogether more efficient vehicle and cycle? In the same way, efficient R code is better than inefficient R code in almost every way: it is easier to read, write, run, share, and maintain. This book cannot provide all the answers about how to produce such code, but it certainly can provide ideas, example code, and tips to make a start in the right direction of travel. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions. Bold Indicates the names of R packages. Constant width Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords. Constant width bold Shows commands or other text that should be typed literally by the user. Constant width italic Shows text that should be replaced with user-supplied values or by values determined by context. T IP This element signifies a tip or suggestion. NOT E This element signifies a general note. WARNING This element indicates a warning or caution. Using Code Examples Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/csgillespie/efficient. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Efficient R Programming by Colin Gillespie and Robin Lovelace (O’Reilly). Copyright 2017 Colin Gillespie, Robin Lovelace, 978-1-491-95078-4.” If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected]. O’Reilly Safari Safari (formerly Safari Books Online) is a membership-based training and reference platform for enterprise, government, educators, and individuals. Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others. For more information, please visit http://oreilly.com/safari. How to Contact Us Please address comments and questions concerning this book to the publisher: O’Reilly Media, Inc. 1005 Gravenstein Highway North Sebastopol, CA 95472 800-998-9938 (in the United States or Canada) 707-829-0515 (international or local) 707-829-0104 (fax) We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/efficient-r-programming. To comment or ask technical questions about this book, send email to [email protected]. For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Find us on Facebook: http://facebook.com/oreilly Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://www.youtube.com/oreillymedia Acknowledgments This book was written in the open, and many people contributed pull requests to fix minor problems. Special thanks goes to O’Reilly who allowed this process and everyone who contributed via GitHub: @Delvis, @richelbilderbeek, @adamryczkowski, @CSJCampbell, @tktan, @nachti, Conor Lawless, @timcdlucas, Dirk Eddelbuettel, @wolfganglederer, @HenrikBengtsson, @giocomai, and @daattali. Many thanks also to the detailed feedback from the technical reviewers, Richard Cotton and Garrett Grolemund.
Recommended publications
  • Installing R
    Installing R Russell Almond August 29, 2020 Objectives When you finish this lesson, you will be able to 1) Start and Stop R and R Studio 2) Download, install and run the tidyverse package. 3) Get help on R functions. What you need to Download R is a programming language for statistics. Generally, the way that you will work with R code is you will write scripts—small programs—that do the analysis you want to do. You will also need a development environment which will allow you to edit and run the scripts. I recommend RStudio, which is pretty easy to learn. In general, you will need three things for an analysis job: • R itself. R can be downloaded from https://cloud.r-project.org. If you use a package manager on your computer, R is likely available there. The most common package managers are homebrew on Mac OS, apt-get on Debian Linux, yum on Red hat Linux, or chocolatey on Windows. You may need to search for ‘cran’ to find the name of the right package. For Debian Linux, it is called r-base. • R Studio development environment. R Studio https://rstudio.com/products/rstudio/download/. The free version is fine for what we are doing. 1 There are other choices for development environments. I use Emacs and ESS, Emacs Speaks Statistics, but that is mostly because I’ve been using Emacs for 20 years. • A number of R packages for specific analyses. These can be downloaded from the Comprehensive R Archive Network, or CRAN. Go to https://cloud.r-project.org and click on the ‘Packages’ tab.
    [Show full text]
  • Read.Csv("Titanic.Csv", As.Is = "Name") Factors - Categorical Variables
    Introduction to R by Debby Kermer George Mason University Library Data Services Group http://dataservices.gmu.edu [email protected] http://dataservices.gmu.edu/workshop/r 1. Create a folder with your name on the S: Drive 2. Copy titanic.csv from R Workshop Files to that folder History R ≈ S ≈ S-Plus Open Source, Free www.r-project.org Comprehensive R Archive Network Download R from: cran.rstudio.com RStudio: www.rstudio.com R Console Console See… > prompt for new command + waiting for rest of command R "guesses" whether you are done considering () and symbols Type… ↑[Up] to get previous command Type in everything in font: Courier New 3+2 3- 2 Objects Objects nine <- 9 Historical Conventions nine Use <- to assign values Use . to separate names three <- nine / 3 three Current Capabilities my.school <- "gmu" = is okay now in most cases is okay now in most cases my.school _ RStudio: Press Alt - (minus) to insert "assignment operator" Global Environment Script Files Vectors & Lists numbers <- c(101,102,103,104,105) the same numbers <- 101:105 numbers <- c(101:104,105) numbers[ 2 ] numbers[ c(2,4,5)] Vector Variable numbers[-c(2,4,5)] numbers[ numbers > 102 ] RStudio: Press Ctrl-Enter to run the current line or press the button: Files & Functions Files Pane 2. Choose the "S" Drive 1. Browse Drives 3. Choose your folder Working Directory Functions read.table( datafile, header=TRUE, sep = ",") Function Positional Named Named Argument Argument Argument Reading Files Thus, these would be identical: titanic <- read.table( datafile, header=TRUE,
    [Show full text]
  • R Programming for Data Science
    R Programming for Data Science Roger D. Peng This book is for sale at http://leanpub.com/rprogramming This version was published on 2015-07-20 This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. ©2014 - 2015 Roger D. Peng Also By Roger D. Peng Exploratory Data Analysis with R Contents Preface ............................................... 1 History and Overview of R .................................... 4 What is R? ............................................ 4 What is S? ............................................ 4 The S Philosophy ........................................ 5 Back to R ............................................ 5 Basic Features of R ....................................... 6 Free Software .......................................... 6 Design of the R System ..................................... 7 Limitations of R ......................................... 8 R Resources ........................................... 9 Getting Started with R ...................................... 11 Installation ............................................ 11 Getting started with the R interface .............................. 11 R Nuts and Bolts .......................................... 12 Entering Input .......................................... 12 Evaluation ...........................................
    [Show full text]
  • Thesis Entire.Pdf (12.90Mb)
    Faculty OF Science AND TECHNOLOGY Department OF Computer Science TOWARD ReprODUCIBLE Analysis AND ExplorATION OF High-ThrOUGHPUT Biological Datasets — Bjørn Fjukstad A DISSERTATION FOR THE DEGREE OF Philosophiae Doctor – 2018 This thesis document was typeset using the UiT Thesis LaTEX Template. © 2018 – http://github.com/egraff/uit-thesis “Ta aldri problemene på forskudd, for da får du dem to ganger, men ta gjerne seieren på forskudd, for hvis ikke er det altfor sjelden du får oppleve den.” –Ivar Tollefsen AbstrACT There is a rapid growth in the number of available biological datasets due to the advent of high-throughput data collection instruments combined with cheap compute infrastructure. Modern instruments enable the analysis of biological data at different levels, from small DNA sequences through larger cell structures, and up to the function of entire organs. These new datasets have brought the need to develop new software packages to enable novel insights into the underlying biological mechanisms in the development and progression of diseases such as cancer. The heterogeneity of biological datasets require researchers to tailor the explo- ration and analyses with a wide range of different tools and systems. However, despite the need for their integration, few of them provide standard inter- faces for analyses implemented using different programming languages and frameworks. In addition, because of the many tools, different input parame- ters, and references to databases, it is necessary to record these correctly. The lack of such details complicates reproducing the original results and the reuse of the analyses on new datasets. This increases the analysis time and leaves unrealized potential for scientific insights.
    [Show full text]
  • R Snippets Release 0.1
    R Snippets Release 0.1 Shailesh Nov 02, 2017 Contents 1 R Scripting Environment 3 1.1 Getting Help...............................................3 1.2 Workspace................................................3 1.3 Arithmetic................................................4 1.4 Variables.................................................5 1.5 Data Types................................................6 2 Core Data Types 9 2.1 Vectors..................................................9 2.2 Matrices................................................. 15 2.3 Arrays.................................................. 22 2.4 Lists................................................... 26 2.5 Factors.................................................. 29 2.6 Data Frames............................................... 31 2.7 Time Series................................................ 36 3 Language Facilities 37 3.1 Operator precedence rules........................................ 37 3.2 Expressions................................................ 38 3.3 Flow Control............................................... 38 3.4 Functions................................................. 42 3.5 Packages................................................. 44 3.6 R Scripts................................................. 45 3.7 Logical Tests............................................... 45 3.8 Introspection............................................... 46 3.9 Coercion................................................. 47 3.10 Sorting and Searching.........................................
    [Show full text]
  • Working in the Tidyverse Desi Quintans, Jeff Powell 2 Contents
    Working in the Tidyverse Desi Quintans, Jeff Powell 2 Contents 1 Preface 7 Workshop goals . .7 Assumed knowledge . .7 Conventions in this manual . .8 2 Setting up the Tidyverse 9 Software to install . .9 Installing course material and packages . .9 Workflow suggestions . 10 An example of a small Rmarkdown document . 12 3 Tidyverse: What, why, when? 15 What is the Tidyverse? . 15 What are the advantages of Tidyverse over base R? . 16 What are the disadvantages of the Tidyverse? . 16 4 The Pipeline 17 What does the pipe %>% do?........................................ 17 Using the pipe . 18 Debugging a pipeline . 22 5 What do all these packages do? 25 Importing data . 25 Manipulating data frames . 26 Modifying vectors . 27 Iterating and functional programming . 28 Graphing . 29 Statistical modelling . 29 3 4 CONTENTS 6 Importing data 31 Importing one CSV spreadsheet . 31 Importing several CSV spreadsheets . 32 Saving (exporting) to a .RDS file . 34 7 Reshaping and completing 35 What is tidy data? . 35 Preparing a practice dataset for reshaping . 36 Spreading a long data frame into wide . 37 Gathering a wide data frame into long . 38 A quick word about dots ......................................... 38 Separating one column into several . 40 Completing a table . 41 8 Joining data frames together 43 Classic row-binding and column-binding . 43 Joining data frames by value . 44 9 Choosing and renaming columns 49 Choosing columns by name . 49 Choosing columns by value . 52 Renaming columns . 53 10 Choosing rows 55 Building a dataset with duplicated rows . 55 Reordering rows . 55 Removing duplicate rows . 57 Removing rows with NAs.......................................... 58 Choosing rows by value .
    [Show full text]
  • Analytical and Other Software on the Secure Research Environment
    Analytical and other software on the Secure Research Environment The Research Environment runs with the operating system: Microsoft Windows Server 2016 DataCenter edition. The environment is based on the Microsoft Data Science virtual machine template and includes the following software: • R Version 4.0.2 (2020-06-22), as part of Microsoft R Open • R Studio Desktop 1.3.1093 working with R 4.0.2 • Anaconda 3, including an environment for Python 3.8.5 • Python, 3.8.5 as part of the Anaconda base environment • Jupyter Notebook, as part of the Anaconda3 environment • Microsoft Office 2016 Standard edition, including Word, Excel, PowerPoint, and OneNote (Access not included) • JuliaPro 0.5.1.1 and the Juno IDE for Julia • PyCharm Community Edition, 2020.3 • PLINK • JAGS • WinBUGS • OpenBUGS • stan and rstan • Apache Spark 2.2.0 • SparkML and pySpark • Apache Drill 1.11.0 • MAPR Drill driver • VIM 8.0.606 • TensorFlow • MXNet, MXNet Model Server • Microsoft Cognitive Toolkit (CNTK) • Weka • Vowpal Wabbit • xgboost • Team Data Science Process (TDSP) Utilities • VOTT (Visual Object Tagging Tool) 1.6.11 • Microsoft Machine Learning Server • PowerBI • Docker version 10.03.5, build 2ee0c57608 • SQL Server Developer Edition (2017), including Management Studio and SQL Server Integration Services (SSIS) • Visual Studio Code 1.17.1 • Nodejs • 7-zip • Evince PDF Viewer • Acrobat Reader • Microsoft Photo Viewer • PowerShell 6 March 2021 Version 1.2 And in the Premium research environments: • STATA 16.1 • SAS 9.4, m4 (academic license) Users also have the ability to bring in additional software if the software was specified in the data request, the software runs in the operating system described above, and the user can provide Vivli with any necessary licensing keys.
    [Show full text]
  • Opening Reproducible Research: Research Project Website and Blog Blog Post Authors
    Opening Reproducible Research: research project website and blog Blog post authors: Daniel Nüst Lukas Lohoff Marc Schutzeichel Markus Konkol Matthias Hinz Rémi Rampin Vicky Steeves Opening Reproducible Research | doi:10.5281/zenodo.1485438 Blog posts Demo server update 14 Aug 2018 | By Daniel Nüst We’ve been working on demonstrating our reference-implementation during spring an managed to create a number of example workspaces. We now decided to publish these workspaces on our demo server. Screenshot 1: o2r reference implementation listing of published Executable Research Compendia . The right-hand side shows a metadata summary including original authors. The papers were originally published in Journal of Statistical Software or in a Copernicus Publications journal under open licenses. We have created an R Markdown document for each paper based on the included data and code following the ERC specification for naming core files, but only included data, an R Markdown document and a HTML display file. The publication metadata, the runtime environment description (i.e. a Dockerfile ), and the runtime image (i.e. a Docker image tarball) were all created during the ERC creation process without any human interaction (see the used R code for upload), since required metadata were included in the R Markdown document’s front matter. The documents include selected figures or in some cases the whole paper, if runtime is not extremely long. While the paper’s authors are correctly linked in the workspace metadata (see right hand side in Screenshot 1), the “o2r author” of all papers is o2r team member Daniel since he made the uploads.
    [Show full text]
  • Imagej2: Imagej for the Next Generation of Scientific Image Data Curtis T
    Rueden et al. BMC Bioinformatics (2017) 18:529 DOI 10.1186/s12859-017-1934-z SOFTWARE Open Access ImageJ2: ImageJ for the next generation of scientific image data Curtis T. Rueden1, Johannes Schindelin1,2, Mark C. Hiner1, Barry E. DeZonia1, Alison E. Walter1,2, Ellen T. Arena1,2 and Kevin W. Eliceiri1,2* Abstract Background: ImageJ is an image analysis program extensively used in the biological sciences and beyond. Due to its ease of use, recordable macro language, and extensible plug-in architecture, ImageJ enjoys contributions from non-programmers, amateur programmers, and professional developers alike. Enabling such a diversity of contributors has resulted in a large community that spans the biological and physical sciences. However, a rapidly growing user base, diverging plugin suites, and technical limitations have revealed a clear need for a concerted software engineering effort to support emerging imaging paradigms, to ensure the software’s ability to handle the requirements of modern science. Results: We rewrote the entire ImageJ codebase, engineering a redesigned plugin mechanism intended to facilitate extensibility at every level, with the goal of creating a more powerful tool that continues to serve the existing community while addressing a wider range of scientific requirements. This next-generation ImageJ, called “ImageJ2” in places where the distinction matters, provides a host of new functionality. It separates concerns, fully decoupling the data model from the user interface. It emphasizes integration with external applications to maximize interoperability. Its robust new plugin framework allows everything from image formats, to scripting languages, to visualization to be extended by the community. The redesigned data model supports arbitrarily large, N-dimensional datasets, which are increasingly common in modern image acquisition.
    [Show full text]
  • Using R to Analyse Key UK Surveys UK Data Service – Using R to Analyse Key UK Surveys
    ukdataservice.ac.uk Using R to analyse key UK surveys UK Data Service – Using R to analyse key UK surveys Author: UK Data Service Updated: July 2019 Version: 1.3 We are happy for our materials to be used and copied but request that users should: link to our original materials instead of re-mounting our materials on your website cite this an original source as follows: Pierre Walthery (updated by Rosalynd Southern, 2013 and Ana Morales, 2017). Using R to analyse key UK surveys. UK Data Service, University of Essex and University of Manchester. UK Data Service – Using R to analyse key UK surveys Contents 1. Introduction ......................................................................................................................... 4 1.1. What is R? ................................................................................................................... 4 1.2. The pros and the cons of R ..................................................................................... 5 2. Using R: essential information ......................................................................................... 7 2.1. Installing and loading user-written packages ..................................................... 9 2.2. Getting help ............................................................................................................. 10 2.3. Interacting with R: command line vs graphical interface ............................... 12 2.4. Objects in R .............................................................................................................
    [Show full text]
  • Jsr223: a Java Platform Integration for R with Programming Languages Groovy, Javascript, Jruby, Jython, and Kotlin by Floid R
    CONTRIBUTED RESEARCH ARTICLES 440 jsr223: A Java Platform Integration for R with Programming Languages Groovy, JavaScript, JRuby, Jython, and Kotlin by Floid R. Gilbert and David B. Dahl Abstract The R package jsr223 is a high-level integration for five programming languages in the Java platform: Groovy, JavaScript, JRuby, Jython, and Kotlin. Each of these languages can use Java objects in their own syntax. Hence, jsr223 is also an integration for R and the Java platform. It enables developers to leverage Java solutions from within R by embedding code snippets or evaluating script files. This approach is generally easier than rJava’s low-level approach that employs the Java Native Interface. jsr223’s multi-language support is dependent on the Java Scripting API: an implementation of “JSR-223: Scripting for the Java Platform” that defines a framework to embed scripts in Java applications. The jsr223 package also features extensive data exchange capabilities and a callback interface that allows embedded scripts to access the current R session. In all, jsr223 makes solutions developed in Java or any of the jsr223-supported languages easier to use in R. Introduction About the same time Ross Ihaka and Robert Gentleman began developing R at the University of Auckland in the early 1990s, James Gosling and the so-called Green Project Team was working on a new programming language at Sun Microsystems in California. The Green Team did not set out to make a new language; rather, they were trying to move platform-independent, distributed computing into the consumer electronics marketplace. As Gosling explained, “All along, the language was a tool, not the end” (O’Connell, 1995).
    [Show full text]
  • 263Section on Health Policy Statistics AM Ro
    ✪ Themed Session ■ Applied Session ◆ Presenter 262 Section on Bayesian Statistical 264 Section on Physical and Science A.M. Roundtable Discussion (fee Engineering Sciences A.M. Roundtable event) Discussion (fee event) Section on Bayesian Statistical Science Section on Physical and Engineering Sciences Tuesday, August 2, 7:00 a.m.–8:15 a.m. Tuesday, August 2, 7:00 a.m.–8:15 a.m. Gender Issues in Academia and How to Balance Engaging Stochastic Spatiotemporal Work and Life Methodologies in Renewable Energy Research FFrancesca Dominici, Harvard School of Public Health, Boston, FAlexander Kolovos, SAS Institute, Inc., 100 SAS Campus Dr., 02115, [email protected] S3042, Cary, NC 27513 USA, [email protected] Key Words: gender issue, balancing work and life Key Words: spatiotemporal, stochastic modeling, renewable energy, energy research, solar, wind Despite interventions by leaders in higher education, women are still under-represented in academic leadership positions. This dearth of This roundtable builds on a discussion initialized at JSM2010. Last women leaders is no longer a pipeline issue, raising questions as to the year the panel explored the interest in connecting statistical methodol- root causes for the persistence of this pattern. We have identified four ogies with research in the fields of renewable energy and sustainability. themes as the root causes for the under-representation of women in In particular, stochastic spatiotemporal analysis can provide a plethora leadership positions from focus group interviews of senior women fac- of tools for fundamental aspects in the modeling of attributes related ulty leaders at Johns Hopkins. These causes are found in routine prac- to renewable energy resources such as solar radiation, wind fields, tidal tices surrounding leadership selection as well as in cultural assumptions waves.
    [Show full text]