Enhancing Reproducibility and Collaboration Via Management of R Package Cohorts
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Navigating the R Package Universe by Julia Silge, John C
CONTRIBUTED RESEARCH ARTICLES 558 Navigating the R Package Universe by Julia Silge, John C. Nash, and Spencer Graves Abstract Today, the enormous number of contributed packages available to R users outstrips any given user’s ability to understand how these packages work, their relative merits, or how they are related to each other. We organized a plenary session at useR!2017 in Brussels for the R community to think through these issues and ways forward. This session considered three key points of discussion. Users can navigate the universe of R packages with (1) capabilities for directly searching for R packages, (2) guidance for which packages to use, e.g., from CRAN Task Views and other sources, and (3) access to common interfaces for alternative approaches to essentially the same problem. Introduction As of our writing, there are over 13,000 packages on CRAN. R users must approach this abundance of packages with effective strategies to find what they need and choose which packages to invest time in learning how to use. At useR!2017 in Brussels, we organized a plenary session on this issue, with three themes: search, guidance, and unification. Here, we summarize these important themes, the discussion in our community both at useR!2017 and in the intervening months, and where we can go from here. Users need options to search R packages, perhaps the content of DESCRIPTION files, documenta- tion files, or other components of R packages. One author (SG) has worked on the issue of searching for R functions from within R itself in the sos package (Graves et al., 2017). -
Revolution R Enterprise 6.1 README
Revolution R Enterprise 6.1 README Revolution R Enterprise 6.1 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster, Windows HPC Server Azure Burst cluster, or IBM Platform Computing LSF cluster. RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user. Installation instructions and instructions for getting started are provided in your confirmation e-mail. What’s New in Revolution R Enterprise 6.1 Big Data Decision Tree Models New RevoScaleR function rxDTree can be used to create decision tree models. It is based on a binning algorithm so that it can scale to huge data. Both classification and regression trees are supported. The model objects returned can be made to inherit from the rpart class of the rpart package, so that plot.rpart, text.rpart, and printcp can be used for subsequent analysis. Prediction for models fitted by rxDTree can be done using rxPredict. See Chapter 10 of the RevoScaleR User’s Guide for examples on how to create decision tree models with rxDTree. Additional information is available in the rxDTree help file, seen by entering ?rxDTree at the R command line. Support for Compression in .xdf Files RevoScaleR’s .xdf files can now be created using zlib compression. -
Introduction to Ggplot2
Introduction to ggplot2 Dawn Koffman Office of Population Research Princeton University January 2014 1 Part 1: Concepts and Terminology 2 R Package: ggplot2 Used to produce statistical graphics, author = Hadley Wickham "attempt to take the good things about base and lattice graphics and improve on them with a strong, underlying model " based on The Grammar of Graphics by Leland Wilkinson, 2005 "... describes the meaning of what we do when we construct statistical graphics ... More than a taxonomy ... Computational system based on the underlying mathematics of representing statistical functions of data." - does not limit developer to a set of pre-specified graphics adds some concepts to grammar which allow it to work well with R 3 qplot() ggplot2 provides two ways to produce plot objects: qplot() # quick plot – not covered in this workshop uses some concepts of The Grammar of Graphics, but doesn’t provide full capability and designed to be very similar to plot() and simple to use may make it easy to produce basic graphs but may delay understanding philosophy of ggplot2 ggplot() # grammar of graphics plot – focus of this workshop provides fuller implementation of The Grammar of Graphics may have steeper learning curve but allows much more flexibility when building graphs 4 Grammar Defines Components of Graphics data: in ggplot2, data must be stored as an R data frame coordinate system: describes 2-D space that data is projected onto - for example, Cartesian coordinates, polar coordinates, map projections, ... geoms: describe type of geometric objects that represent data - for example, points, lines, polygons, ... aesthetics: describe visual characteristics that represent data - for example, position, size, color, shape, transparency, fill scales: for each aesthetic, describe how visual characteristic is converted to display values - for example, log scales, color scales, size scales, shape scales, .. -
Sharing and Organizing Research Products As R Packages Matti Vuorre1 & Matthew J
Sharing and organizing research products as R packages Matti Vuorre1 & Matthew J. C. Crump2 1 Oxford Internet Institute, University of Oxford, United Kingdom 2 Department of Psychology, Brooklyn College of CUNY, New York USA A consensus on the importance of open data and reproducible code is emerging. How should data and code be shared to maximize the key desiderata of reproducibility, permanence, and accessibility? Research assets should be stored persistently in formats that are not software restrictive, and documented so that others can reproduce and extend the required computations. The sharing method should be easy to adopt by already busy researchers. We suggest the R package standard as a solution for creating, curating, and communicating research assets. The R package standard, with extensions discussed herein, provides a format for assets and metadata that satisfies the above desiderata, facilitates reproducibility, open access, and sharing of materials through online platforms like GitHub and Open Science Framework. We discuss a stack of R resources that help users create reproducible collections of research assets, from experiments to manuscripts, in the RStudio interface. We created an R package, vertical, to help researchers incorporate these tools into their workflows, and discuss its functionality at length in an online supplement. Together, these tools may increase the reproducibility and openness of psychological science. Keywords: reproducibility; research methods; R; open data; open science Word count: 5155 Introduction package standard, with additional R authoring tools, provides a robust framework for organizing and sharing reproducible Research projects produce experiments, data, analyses, research products. manuscripts, posters, slides, stimuli and materials, computa- Some advances in data-sharing standards have emerged: It tional models, and more. -
Rkward: a Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R
JSS Journal of Statistical Software June 2012, Volume 49, Issue 9. http://www.jstatsoft.org/ RKWard: A Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R Stefan R¨odiger Thomas Friedrichsmeier Charit´e-Universit¨atsmedizin Berlin Ruhr-University Bochum Prasenjit Kapat Meik Michalke The Ohio State University Heinrich Heine University Dusseldorf¨ Abstract R is a free open-source implementation of the S statistical computing language and programming environment. The current status of R is a command line driven interface with no advanced cross-platform graphical user interface (GUI), but it includes tools for building such. Over the past years, proprietary and non-proprietary GUI solutions have emerged, based on internal or external tool kits, with different scopes and technological concepts. For example, Rgui.exe and Rgui.app have become the de facto GUI on the Microsoft Windows and Mac OS X platforms, respectively, for most users. In this paper we discuss RKWard which aims to be both a comprehensive GUI and an integrated devel- opment environment for R. RKWard is based on the KDE software libraries. Statistical procedures and plots are implemented using an extendable plugin architecture based on ECMAScript (JavaScript), R, and XML. RKWard provides an excellent tool to manage different types of data objects; even allowing for seamless editing of certain types. The objective of RKWard is to provide a portable and extensible R interface for both basic and advanced statistical and graphical analysis, while not compromising on flexibility and modularity of the R programming environment itself. Keywords: GUI, integrated development environment, plugin, R. -
Frequently Asked Questions About Rcpp
Frequently Asked Questions about Rcpp Dirk Eddelbuettel Romain François Rcpp version 0.12.7 as of September 4, 2016 Abstract This document attempts to answer the most Frequently Asked Questions (FAQ) regarding the Rcpp (Eddelbuettel, François, Allaire, Ushey, Kou, Chambers, and Bates, 2016a; Eddelbuettel and François, 2011; Eddelbuettel, 2013) package. Contents 1 Getting started 2 1.1 How do I get started ?.....................................................2 1.2 What do I need ?........................................................2 1.3 What compiler can I use ?...................................................3 1.4 What other packages are useful ?..............................................3 1.5 What licenses can I choose for my code?..........................................3 2 Compiling and Linking 4 2.1 How do I use Rcpp in my package ?............................................4 2.2 How do I quickly prototype my code?............................................4 2.2.1 Using inline.......................................................4 2.2.2 Using Rcpp Attributes.................................................4 2.3 How do I convert my prototyped code to a package ?..................................5 2.4 How do I quickly prototype my code in a package?...................................5 2.5 But I want to compile my code with R CMD SHLIB !...................................5 2.6 But R CMD SHLIB still does not work !...........................................6 2.7 What about LinkingTo ?...................................................6 -
R Generation [1] 25
IN DETAIL > y <- 25 > y R generation [1] 25 14 SIGNIFICANCE August 2018 The story of a statistical programming they shared an interest in what Ihaka calls “playing academic fun language that became a subcultural and games” with statistical computing languages. phenomenon. By Nick Thieme Each had questions about programming languages they wanted to answer. In particular, both Ihaka and Gentleman shared a common knowledge of the language called eyond the age of 5, very few people would profess “Scheme”, and both found the language useful in a variety to have a favourite letter. But if you have ever been of ways. Scheme, however, was unwieldy to type and lacked to a statistics or data science conference, you may desired functionality. Again, convenience brought good have seen more than a few grown adults wearing fortune. Each was familiar with another language, called “S”, Bbadges or stickers with the phrase “I love R!”. and S provided the kind of syntax they wanted. With no blend To these proud badge-wearers, R is much more than the of the two languages commercially available, Gentleman eighteenth letter of the modern English alphabet. The R suggested building something themselves. they love is a programming language that provides a robust Around that time, the University of Auckland needed environment for tabulating, analysing and visualising data, one a programming language to use in its undergraduate statistics powered by a community of millions of users collaborating courses as the school’s current tool had reached the end of its in ways large and small to make statistical computing more useful life. -
Revolution R Enterprise™ 7.1 Getting Started Guide
Revolution R Enterprise™ 7.1 Getting Started Guide The correct bibliographic citation for this manual is as follows: Revolution Analytics, Inc. 2014. Revolution R Enterprise 7.1 Getting Started Guide. Revolution Analytics, Inc., Mountain View, CA. Revolution R Enterprise 7.1 Getting Started Guide Copyright © 2014 Revolution Analytics, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Revolution Analytics. U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Revolution R, Revolution R Enterprise, RPE, RevoScaleR, RevoDeployR, RevoTreeView, and Revolution Analytics are trademarks of Revolution Analytics. Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners. Revolution Analytics. 2570 W. El Camino Real Suite 222 Mountain View, CA 94040 USA. Revised on March 3, 2014 We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Revolution document, send e-mail to [email protected]. We’d love to hear from you. Contents Chapter 1. What Is Revolution R Enterprise? .................................................................... -
Iotools: High-Performance I/O Tools for R by Taylor Arnold, Michael J
CONTRIBUTED RESEARCH ARTICLE 6 iotools: High-Performance I/O Tools for R by Taylor Arnold, Michael J. Kane, and Simon Urbanek Abstract The iotools package provides a set of tools for input and output intensive data processing in R. The functions chunk.apply and read.chunk are supplied to allow for iteratively loading contiguous blocks of data into memory as raw vectors. These raw vectors can then be efficiently converted into matrices and data frames with the iotools functions mstrsplit and dstrsplit. These functions minimize copying of data and avoid the use of intermediate strings in order to drastically improve performance. Finally, we also provide read.csv.raw to allow users to read an entire dataset into memory with the same efficient parsing code. In this paper, we present these functions through a set of examples with an emphasis on the flexibility provided by chunk-wise operations. We provide benchmarks comparing the speed of read.csv.raw to data loading functions provided in base R and other contributed packages. Introduction When processing large datasets, specifically those too large to fit into memory, the performance bottleneck is often getting data from the hard-drive into the format required by the programming environment. The associated latency comes from a combination of two sources. First, there is hardware latency from moving data from the hard-drive to RAM. This is especially the case with “spinning” disk drives, which can have throughput speeds several orders of magnitude less than those of RAM. Hardware approaches for addressing latency have been an active area of research and development since hard-drives have existed. -
R Programming for Data Science
R Programming for Data Science Roger D. Peng This book is for sale at http://leanpub.com/rprogramming This version was published on 2015-07-20 This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. ©2014 - 2015 Roger D. Peng Also By Roger D. Peng Exploratory Data Analysis with R Contents Preface ............................................... 1 History and Overview of R .................................... 4 What is R? ............................................ 4 What is S? ............................................ 4 The S Philosophy ........................................ 5 Back to R ............................................ 5 Basic Features of R ....................................... 6 Free Software .......................................... 6 Design of the R System ..................................... 7 Limitations of R ......................................... 8 R Resources ........................................... 9 Getting Started with R ...................................... 11 Installation ............................................ 11 Getting started with the R interface .............................. 11 R Nuts and Bolts .......................................... 12 Entering Input .......................................... 12 Evaluation ........................................... -
Integrating R with Azure for High-Throughput Analysis Hugh Analysis Shanahan
Integrating R with Azure for High- throughput Integrating R with Azure for High-throughput analysis Hugh analysis Shanahan Hugh Shanahan Department of Computer Science Royal Holloway, University of London [email protected] @HughShanahan Hugh Shanahan Integrating R with Azure for High-throughput analysis Applicability to other domains Integrating R with Azure for High- throughput analysis This project started out doing something very specific Hugh for the domain I work in (Computational Biology). Shanahan I promise that there will be no Biology in this talk !! Realised can be extended to running high-throughput jobs in R. Contrast with MapReduce / R formalisms (HadoopStreaming, Rhipe, Revolution Analytics, ... ) - parallelisation happens outside of individual R script. Hugh Shanahan Integrating R with Azure for High-throughput analysis Applicability to other domains Integrating R with Azure for High- throughput analysis This project started out doing something very specific Hugh for the domain I work in (Computational Biology). Shanahan I promise that there will be no Biology in this talk !! Realised can be extended to running high-throughput jobs in R. Contrast with MapReduce / R formalisms (HadoopStreaming, Rhipe, Revolution Analytics, ... ) - parallelisation happens outside of individual R script. Hugh Shanahan Integrating R with Azure for High-throughput analysis Applicability to other domains Integrating R with Azure for High- throughput analysis This project started out doing something very specific Hugh for the domain I work in (Computational Biology). Shanahan I promise that there will be no Biology in this talk !! Realised can be extended to running high-throughput jobs in R. Contrast with MapReduce / R formalisms (HadoopStreaming, Rhipe, Revolution Analytics, .. -
BIG DATA 50 the Hottest Big Data Startups of 2014
BIG DATA 50 The hottest big data startups of 2014 Jeff Vance Table of Contents Big Data Startup Landscape – Overview .................................................................................. i About the Author ................................................................................................................. iii Introduction – the Big Data Boom .......................................................................................... 1 Notes on Methodology & the origin of the Big Data 50 ............................................................. 2 The Big Data 50 ..................................................................................................................... 5 Poised for Explosive Growth ....................................................................................................... 5 Entrigna ................................................................................................................................... 5 Nuevora ................................................................................................................................... 7 Roambi .................................................................................................................................... 9 Machine Learning Mavens ........................................................................................................ 10 Oxdata ................................................................................................................................... 10 Ayasdi ...................................................................................................................................