The R Journal Volume 3/2, December 2011

A peer-reviewed, open-access publication of the R Foundation for Statistical Computing

Contents

Editorial

Contributed Research Articles

Creating and Deploying an Application with (R)Excel and R
glm2: Fitting Generalized Linear Models with Convergence Problems
Implementing the Compendium Concept with Sweave and DOCSTRIP
Watch Your Spelling!
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming
Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions
Using the Google Visualisation API with R
GrapheR: a Multiplatform GUI for Drawing Customizable Graphs in R
rainbow: An R Package for Visualizing Functional Time Series

Programmer's Niche

Portable C++ for R Packages

News and Notes

R's Participation in the Google Summer of Code 2011
Conference Report: useR! 2011
Forthcoming Events: useR! 2012
Changes in R
Changes on CRAN
News from the Bioconductor Project
R Foundation News

The R Journal is a peer-reviewed publication of the R Foundation for Statistical Computing. Communications regarding this publication should be addressed to the editors. All articles are licensed under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0, http://creativecommons.org/licenses/by/3.0/).

Prospective authors will find detailed and up-to-date submission instructions on the Journal's homepage.

Editor-in-Chief:
Heather Turner
Statistics Department
University of Warwick
Coventry CV4 7AL
UK

Editorial Board: Peter Dalgaard, Martyn Plummer, and Hadley Wickham.

Editor Programmer’s Niche: Bill Venables

Editor Help Desk: Uwe Ligges

Editor Book Reviews:
G. Jay Kerns
Department of Mathematics and Statistics
Youngstown State University
Youngstown, Ohio 44555-0002
USA
[email protected]

R Journal Homepage: http://journal.r-project.org/

Email of editors and editorial board: [email protected]

The R Journal is indexed/abstracted by EBSCO and DOAJ.


Editorial

by Heather Turner

Following the guiding principles set down by its predecessor, R News, the journal has maintained the policy of allowing authors to retain copyright of their published articles. However the lack of an explicit licence has meant that the rights of others regarding material published in the journal have been unclear. Therefore from this issue, all articles will be licensed under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0, http://creativecommons.org/licenses/by/3.0/). This license is the standard set by the SPARC Europe Seal for Open Access Journals and ensures compatibility with other open access journals such as the Journal of Statistical Software.

The key features of the CC BY 3.0 license are that anyone is free to share or to adapt the licensed work, including for commercial use, provided that they cite the original article. This increases the value of articles published in The R Journal, whilst ensuring that authors receive appropriate credit for their work.

In the immediate term, the new licensing policy will mean that members of the community are free to translate articles in the journal, an initiative already planned by the R User's Group in South Korea (http://www.openstatistics.net/). It also opens up other possibilities, such as distributing sources alongside articles, that could be considered in future.

The selection of contributed articles in this issue is again testament to the diverse concerns of the R community. In the area of statistical methodology, Taylor Arnold and John Emerson introduce new functions for non-parametric goodness-of-fit tests, whilst Ian Marschner and joint authors Haizhou Wang and Mingzhou Song propose improved algorithms for generalized linear models and 1-D k-means clustering respectively. On graphics, Han Lin Shang introduces the rainbow package for visualizing time series; Markus Gesmann and Diego de Castillo demonstrate use of the Google visualisation API with R, and Maxime Hervé presents a GUI designed to help R novices create publication-quality graphics in R. Providing easier access to R's functionality is also the focus of our first article, which demonstrates how to create an Excel application from an R package, using rpart as an example. Finally the area of documentation and reproducible research is represented by articles on the spell-checking of Rd files and vignettes, and the implementation of the compendium concept with Sweave and DOCSTRIP.

In addition to these articles, we have the first Programmer's Niche article since becoming The R Journal. This column is intended to provide "programming pearls" for R developers—short articles that elucidate programming concepts or technical issues in developing R packages. In this instance, the article is not on R itself, but on writing portable C++ for R packages, problems with which cause many an error, especially on Solaris systems.

In the usual round up of news, readers are particularly encouraged to take a look at the report on R's participation in the Google Summer of Code 2011. As well as reporting back on an especially productive season, the GSoC 2011 administrators discuss the costs and benefits of involvement in this scheme and note that suggestions for 2012 projects are already being collected.

Finally I would like to thank the associate editors and column editors for their work and support during my year's tenure as Editor-in-Chief. In 2012, this role will be taken on by Martyn Plummer. Peter Dalgaard will stand down from the editorial board after four years of service and Deepayan Sarkar will join us to fill his place. As Deepayan is currently based in Delhi, the editorial board will thus span three continents, with myself and Martyn in Europe and Hadley Wickham in the US. Such international diversity is important, when the R community itself is spread around the globe. Hopefully as the R Journal continues to establish itself and we look to expand the editorial board, the diversity of the R community will be increasingly represented.

Heather Turner
Statistics Department, University of Warwick, Coventry, UK
[email protected]


Contributed Research Articles

Creating and Deploying an Application with (R)Excel and R

by Thomas Baier, Erich Neuwirth and Michele De Meo

Abstract: We present some ways of using R in Excel and build an example application using the package rpart. Starting with simple interactive use of rpart in Excel, we eventually package the code into an Excel-based application, hiding all details (including R itself) from the end user. In the end, our application implements a service-oriented architecture (SOA) with a clean separation of presentation and computation layer.

Motivation

Building an application for end users is a very challenging goal. Building a statistics application normally involves three different roles: application developer, statistician, and user. Often, statisticians are programmers too, but are only (or mostly) familiar with statistical programming (languages) and definitely are not experts in creating rich user interfaces/applications for (casual) users.

For many—maybe even for most—applications of statistics, Microsoft Excel is used as the primary user interface. Users are familiar with performing simple computations using the spreadsheet and can easily format the result of the analyses for printing or inclusion in reports or presentations. Unfortunately, Excel does not provide support for doing more complex computations and analyses and also has documented weaknesses for certain numerical calculations.

Statisticians know a solution for this problem, and this solution is called R. R is a very powerful programming language for statistics with lots of methods from different statistical areas implemented in various packages by thousands of contributors. But unfortunately, R's user interface is not what everyday users of statistics in business expect.

The following sections will show a very simple approach allowing a statistician to develop an easy-to-use and maintainable end-user application. Our example will make use of R and the package rpart, and the resulting application will completely hide the complexity and the R user interface from the user.

rpart implements recursive partitioning and regression trees. These methods have become powerful tools for analyzing complex data structures and have been employed in the most varied fields: CRM, financial risk management, insurance, pharmaceuticals and so on (for example, see Altman (2002), Hastie et al. (2009), Zhang and Singer (1999)). The main reason for the wide distribution of tree-based methods is their simplicity and intuitiveness. Prediction of a quantitative or categorical variable is done through a tree structure, which even non-professionals can read and understand easily. The application of a computationally complex algorithm thus results in an intuitive and easy to use tool. Prediction of a categorical variable is performed by a classification tree, while the term regression tree is used for the estimation of a quantitative variable.

Our application will be built for Microsoft Excel and will make use of R and rpart to implement the functionality. We have chosen Excel as the primary tool for performing the analysis because of various advantages:

• Excel has a familiar and easy-to-use user interface.
• Excel is already installed on most of the workstations in the industries we mentioned.
• In many cases, data collection has been performed using Excel, so using Excel for the analysis seems to be the logical choice.
• Excel provides many features to allow a high-quality presentation of the results. Pre-configured presentation options can easily be adapted even by the casual user.
• Output data (mostly graphical or tabular presentation of the results) can easily be used in further processing—e.g., embedded in PowerPoint slides or Word documents using OLE (a subset of COM, as in Microsoft Corporation and Digital Equipment Corporation (1995)).

We are using R for the following reasons:

• One cannot rely on Microsoft Excel's numerical and statistical functions (they do not even give the same results when run in different versions of Excel). See McCullough and Wilson (2002) for more information.
• We are re-using an already existing, tested and proven package for doing the statistics.
• Statistical programmers often use R for performing statistical analysis and implementing functions.

Our goal for creating an RExcel-based application is to enable any user to be able to perform the computations and use the results without any special knowledge of R (or even of RExcel). See Figure 1 for an example of the application's user interface. The results of running the application are shown in Figure 2.


Figure 1: Basic user interface of the RExcel based end-user application.

Figure 2: Computation results.

There are a few alternatives to RExcel for connecting Excel with R:

XLLoop provides an Excel Add-In which is able to access various programming languages from within formulae. It supports languages like R, Javascript and others. The "backend" languages are accessed using TCP/IP communication. In contrast to this, RExcel uses COM, which has a very low latency (and hence is very fast). Additionally, the architecture of RExcel supports a completely invisible R server process (which is what we need for deploying the application), or can provide access to the R window while developing/testing.

inference for R provides similar functionality as RExcel. From the website, the latest supported R version is 2.9.1, while RExcel supports current R versions immediately after release (typically new versions work right out of the box without an update in RExcel or statconnDCOM).

Integrating R and Microsoft Excel

Work on integration of Excel and R has been ongoing since 1998. Since 2001, a stable environment has been available for building Excel documents based upon statistical methods implemented in R. This environment consists of plain R (for Windows) and a toolbox called statconnDCOM with its full integration with Excel using RExcel (which is implemented as an add-in for Microsoft Excel).

Like all Microsoft Office applications and some third-party applications, Excel provides a powerful scripting engine combined with a highly productive visual development environment. This environment is called Visual Basic for Applications, or VBA for short (see Microsoft Corporation (2001)). VBA allows creation of "worksheet functions" (also called user defined functions or UDFs), which can be used in a similar way as Excel's built-in functions (like, e.g., "SUM" or "MEAN") to dynamically compute values to be shown in spreadsheet cells, and Subs which allow arbitrary computations, putting results into spreadsheet ranges. VBA code can be bundled as an "Add-In" for Microsoft Excel which provides an extension both in the user interface (e.g., new menus or dialogs) and in functionality (additional sheet functions similar to the built-in sheet functions).

Extensions written in VBA can access third-party components compatible with Microsoft's Component Object Model (COM, see Microsoft Corporation and Digital Equipment Corporation (1995)). The link between R and Excel is built on top of two components: RExcel is an Add-In for Microsoft Excel implementing a spreadsheet-style interface to R and statconnDCOM. statconnDCOM exposes a COM component to any Windows application which encapsulates R's functionality in an easy-to-use way.

statconnDCOM is built on an extensible design with exchangeable front-end and back-end parts. In the context of this article, the back-end is R. A back-end implementation for Scilab (INRIA (1989)) is also available. The front-end component is the COM interface implemented by statconnDCOM. This implementation is used to integrate R or Scilab into Windows applications. A first version of an Uno (OpenOffice.org (2009)) front-end has already been released for testing and download. Using this front-end, R and


Scilab can easily be integrated into OpenOffice.org applications, like Calc (see OpenOffice.org (2006)). ROOo is available via Drexel (2009) and already supports Windows and MacOS X.

RExcel supports various user interaction modes:

Scratchpad and data transfer mode: Menus control data transfer from R to Excel and back; commands can be executed immediately, either from Excel cells or from the R command line.

Macro mode: Macros, invisible to the user, control data transfer and R command execution.

Spreadsheet mode: Formulas in Excel cells control data transfer and command execution; automatic recalculation is controlled by Excel.

Throughout the rest of this article, we will describe how to use scratchpad mode and macro mode for prototyping and implementing the application. For more information on RExcel and statconnDCOM see Baier and Neuwirth (2007).

Implementing the classification and regression tree

With RExcel's tool set we can start developing our application in Excel immediately. RExcel has a "scratchpad" mode which allows you to write R code directly into a worksheet and to run it from there. Scratchpad mode is the user interaction mode we use when prototyping the application. We will then transform the prototypical code into a "real" application. In practice, the prototyping phase in scratchpad mode will be omitted for simple applications. In an Excel hosted R application we also want to transfer data between Excel and R, and transfer commands may be embedded in R code. An extremely simplistic example of code in an R code scratchpad range in Excel might look like this:

#!rput inval 'Sheet1'!A1
result<-sin(inval)
#!rget result 'Sheet1'!A2

R code is run simply by selecting the range containing code and choosing Run R Code from the pop-up menu. Lines starting with #! are treated as special RExcel commands. rput will send the value (contents) of a cell or range to R, rget will read a value from R and store it into a cell or range. In the example, the value stored in cell A1 of sheet Sheet1 will be stored in the R variable inval. After evaluating the R expression result<-sin(inval), the value of the R variable result is stored in cell A2. Table 1 lists all special RExcel commands and provides a short description. More information can be found in the documentation of RExcel.

In the example below, we have a dataset on the credit approval process for 690 subjects. For each record, we have 15 input variables (qualitative and quantitative) while variable 16 indicates the outcome of the credit application: positive (+) or negative (−). Based on this training dataset, we develop our classification tree, which is used to show the influence of the 15 variables on the loan application and to distinguish "good" applicants from "risky" applicants (so as to estimate variable 16). The risk manager will use this model fully integrated into Excel for the further credit approval process.

The goal of the code snippet below is to estimate the model and display the chart with the classification tree. The risk manager can easily view the binary tree and use this chart to highlight the splits. This will help discover the most significant variables for the credit approval process. For example, in this case it is clear that the predictor variable V9 determines the best binary partition in terms of minimizing the "impurity measure". In addition, the risk manager will notice that when V9 equals a, the only significant variable to observe is V4. The graphical representation of the classification tree is shown in Figure 3 on page 8.

library(rpart)
#!rputdataframe trainingdata \
    'database'!A1:P691
fit<-rpart(V16~.,data=trainingdata)
plot(fit,branch=0.1,uniform=T,margin=.1, \
    compress=T,nspace=0.1)
text(fit,fancy=T,use.n=T)
#!insertcurrentrplot 'database'!T10
graphics.off()

Note: \ in the code denotes a line break for readability and should not be used in the real spreadsheet.

After estimating the model and visualizing the results, you can use the classification tree to make predictions about the variable V16: the applicants will be classified as "good" or "risky". By running the following code, the 15 observed variables for two people are used to estimate the probability of credit approval. Generally, a "positive" value (+) with probability higher than 0.5 will indicate that the loan should be granted. So the risk manager using this model will decide to grant the credit in both cases.

library(rpart)
#!rputdataframe trainingdata \
    'database'!A1:P691
#!rputdataframe newcases \
    'predict'!A1:O3
outdf<-as.data.frame(predict(fit,newcases))
predGroup <- ifelse(outdf[,1]>0.5, \
    names(outdf[1]),names(outdf[2]))
res<-cbind(predGroup,outdf)
#!rgetdataframe res 'predict'!R1
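For readers who want to try the computation outside Excel, the following plain R sketch consolidates the two snippets above. Here trainingdata and newcases are assumed to be data frames standing in for the spreadsheet ranges transferred by the #!rputdataframe commands; note that the fitted tree must exist in the R session before predict is called.

library(rpart)
# trainingdata and newcases stand in for the spreadsheet
# ranges 'database'!A1:P691 and 'predict'!A1:O3.
fit <- rpart(V16 ~ ., data = trainingdata)
outdf <- as.data.frame(predict(fit, newcases))
predGroup <- ifelse(outdf[, 1] > 0.5,
                    names(outdf)[1], names(outdf)[2])
res <- cbind(predGroup, outdf)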


Command                                Description
#!rput variable range                  store the value (contents) of a range in an R variable
#!rputdataframe variable range         store the value of a range in an R data frame
#!rputpivottable variable range        store the value of a range in an R variable
#!rget r-expression range              store the value of the R expression in the range
#!rgetdataframe r-expression range     store the data frame value of the R expression in the range
#!insertcurrentplot cell-address       insert the active plot into the worksheet

Table 1: Special RExcel commands used in sheets
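Since rget and rgetdataframe accept an R expression rather than only a variable name, a result can be computed and transferred in one step. The short scratchpad sketch below illustrates this; the sheet name and cell addresses are invented for the illustration.

#!rput n 'Sheet1'!B1
xs <- rnorm(n)
#!rget mean(xs) 'Sheet1'!B2
#!rget sd(xs) 'Sheet1'!B3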

Figure 3: Graphical representation of the classification tree in the RExcel based application.


The potential of this approach is obvious: Very complex models, such as classification and regression trees, are made available to decision makers (in this case, the risk manager) directly in Microsoft Excel. See Figure 2 on page 6 for a screen-shot of the presentation of the results.

Tidying up the spreadsheet

In an application to be deployed to end users, R code should not be visible to the users. Excel's mechanism for performing operations on data is to run macros written in VBA (Visual Basic for Applications, the programming language embedded in Excel). RExcel implements a few functions and subroutines which can be used from VBA. Here is a minimalistic example:

Sub RExcelDemo()
    RInterface.StartRServer
    RInterface.GetRApply "sin", _
        Range("A2"), Range("A1")
    RInterface.StopRServer
End Sub

GetRApply applies an R function to arguments taken from Excel ranges and puts the result in a range. These arguments can be scalars, vectors, matrices or dataframes. In the above example, the function sin is applied to the value from cell A1 and the result is put in cell A2. The R function given as the first argument to GetRApply does not necessarily have to be a named function; any function expression can be used.

RExcel has several other functions that may be called in VBA macros. RRun runs any code given as a string. RPut and RGet transfer matrices, and RPutDataframe and RGetDataframe transfer dataframes. RunRCall, similar to GetRApply, calls an R function but does not transfer any return value to Excel. A typical use of RunRCall is for calling plot functions in R. InsertCurrentRPlot embeds the current R plot into Excel as an image embedded in a worksheet.

In many cases, we need to define one or more R functions to be used with GetRApply or the other VBA support functions and subroutines. RExcel has a mechanism for that. When RExcel connects to R (using the command RInterface.StartRServer), it will check whether the directory containing the active workbook also contains a file named RExcelStart.R. If it finds such a file, it will read its contents and evaluate them with R (source command in R). After doing this, RExcel will check if the active workbook contains a worksheet named RCode. If such a worksheet exists, its contents also will be read and evaluated using R.

Some notes on handling R errors in Excel: The R implementation should check for all errors which are expected to occur in the application. The application itself is required to pass correctly typed arguments when invoking an R function. If RExcel calls a function in R and R throws an error, an Excel dialog box will pop up informing the user of the error. An alternative to extensive checking is to use R's try mechanism to catch errors in R and handle them appropriately.

Using these tools, we now can define a macro performing the actions our scratchpad code did in the previous section. Since we can define auxiliary functions easily, we can also now make our design more modular. The workhorse of our application is the following VBA macro:

Sub PredictApp()
    Dim outRange As Range
    ActiveWorkbook.Worksheets("predict") _
        .Activate
    If Err.Number <> 0 Then
        MsgBox "This workbook does not " _
            & "contain data for rpartDemo"
        Exit Sub
    End If
    ClearOutput
    RInterface.StartRServer
    RInterface.RunRCodeFromRange _
        ThisWorkbook.Worksheets("RCode") _
        .UsedRange
    RInterface.GetRApply _
        "function(" _
        & "trainingdata,groupvarname," _
        & "newdata)predictResult(fitApp(" _
        & "trainingdata,groupvarname)," _
        & "newdata)", _
        ActiveWorkbook.Worksheets( _
        "predict").Range("R1"), _
        AsSimpleDF(DownRightFrom( _
        ThisWorkbook.Worksheets( _
        "database").Range("A1"))), _
        "V16", _
        AsSimpleDF(ActiveWorkbook _
        .Worksheets("predict").Range( _
        "A1").CurrentRegion)
    RInterface.StopRServer
    Set outRange = ActiveWorkbook _
        .Worksheets("predict").Range("R1") _
        .CurrentRegion
    HighLight outRange
End Sub

The function predictResult is defined in the worksheet RCode.
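The contents of the RCode worksheet are not listed in the article. The sketch below only illustrates what R functions matching the fitApp and predictResult interfaces used by the macro might look like; the names and signatures follow the macro's usage, while the bodies are illustrative assumptions rather than the authors' actual code.

# Sketch only: plausible definitions matching the macro's interface.
library(rpart)

fitApp <- function(trainingdata, groupvarname) {
  # Fit a classification tree predicting the grouping
  # variable from all other columns.
  rpart(as.formula(paste(groupvarname, "~ .")), data = trainingdata)
}

predictResult <- function(fit, newdata) {
  # Class probabilities for each new case ...
  probs <- as.data.frame(predict(fit, newdata))
  # ... plus the most probable group, as in the article's
  # 'res' data frame (group label first, then probabilities).
  data.frame(predGroup = colnames(probs)[max.col(as.matrix(probs))],
             probs)
}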

Packaging the application

So far, both our code (the R commands) and data have been part of the same spreadsheet. This may be convenient while developing the RExcel-based application or if you are only using the application for yourself, but it has to be changed for redistribution to a wider audience.

We will show how to divide the application into two parts, the first being the Excel part, the second being the R part. With a clear interface between these, it will be easy to update one of them without affecting the other. This will make testing and updating easier. Separating the R implementation from the Excel (or RExcel) implementation will also allow an R expert to work on the R part and an Excel expert to work on the Excel part.

As an additional benefit, exchanging the Excel front-end with a custom application (e.g., written in a programming language like C#) will be easier, too, as the R implementation will not have to be changed (or tested) in this case.

The application's R code interface is simple, yet powerful. RExcel will only have to call a single R function called approval. Everything else is hidden from the user (including the training data). approval takes a data frame with the data for the cases to be decided upon as input and returns a data frame containing the group classification and the probabilities for all possible groups. Of course, the return value can also be shown in a figure.

Creating an R package

The macro-based application built in section "Tidying up the spreadsheet" still contains the R code and the training data. We will now separate the implementation of the methodology and the user interface by putting all our R functions and the training data into an R package. This R package will only expose a single function which gets data for the new cases as input and returns the predicted group membership as result. Using this approach, our application now has a clear architecture. The end user workbook contains the following macro:

Sub PredictApp()
    Dim outRange As Range
    ClearOutput
    RInterface.StartRServer
    RInterface.RRun "library(RExcelrpart)"
    RInterface.GetRApply "approval", _
        Range("'predict'!R1"), _
        AsSimpleDF(Range("predict!A1") _
        .CurrentRegion)
    RInterface.StopRServer
    Set outRange = Range("predict!R1") _
        .CurrentRegion
    HighLight outRange
End Sub

In this macro, we start a connection to R, load the R package, call the function provided by this package and then immediately close the connection to R.

The statconnDCOM server can reside on another machine than the one where the Excel application is running. The server to be used by RExcel can be configured in a configuration dialog or even from within VBA macros. So with minimal changes, the application created can be turned into an application which uses a remote server. A further advantage of this approach is that the R functions and data used by the application can be managed centrally on one server. If any changes or updates are necessary, the Excel workbooks installed on the end users' machines do not need to be changed.

Building a VBA add-in for Excel

The implementation using an R package still has some shortcomings. The end user Excel workbook contains macros, and often IT security policies do not allow end user workbooks to contain any executable code (macros). Choosing a slightly different approach, we can put all the VBA macro code in an Excel add-in. In the end user workbook, we just place buttons which trigger the macros from the add-in. When opening a new spreadsheet to be used with this add-in, Excel's template mechanism can be used to create the buttons on the new worksheet. The code from the workbook described in section "Tidying up the spreadsheet" cannot be used "as is" since it is written under the assumption that both the training data and the data for the new cases are contained in the same workbook. The necessary changes, however, are minimal. Converting the workbook from section "Tidying up the spreadsheet" again poses the problem that the data and the methodology are now deployed on the end users' machines. Therefore updating implies replacing the add-in on all these machines. Combining the add-in approach with the packaging approach from section "Creating an R package" increases modularization. With this approach we have:

• End user workbooks without any macro code.
• Methodology and base data residing on a centrally maintained server.
• Connection technology for end users installed for all users in one place, not separately for each user.

Deploying the Application

Using the application on a computer requires installation and configuration of various components. The required (major) components are:

• Microsoft Excel, including RExcel and statconnDCOM
• R, including rscproxy


• The VBA Add-In and the R package created throughout this article

For the simplest setup, all components are installed locally. As an alternative, you can also install Excel, RExcel and our newly built VBA Add-In on every workstation locally and install everything else on a (centralized) server machine. In this case, R, rscproxy, our application's R package and statconnDCOM are installed on the server machine and one has to configure RExcel to use R via statconnDCOM on a remote server machine. Please beware that this kind of setup can be a bit tricky, as it requires a correct DCOM security setup (using the Windows tool dcomcnfg).

Downloading

All examples are available for download from the Download page on http://rcom.univie.ac.at.

Bibliography

E. I. Altman. Bankruptcy, Credit Risk, and High Yield Junk Bonds. Blackwell Publishers Inc., 2002.

T. Baier. rcom: R COM Client Interface and internal COM Server, 2007. R package version 1.5-1.

T. Baier and E. Neuwirth. R (D)COM Server V2.00, 2005. URL http://cran.r-project.org/other/DCOM.

T. Baier and E. Neuwirth. Excel :: COM :: R. Computational Statistics, 22(1):91–108, April 2007. URL http://www.springerlink.com/content/uv6667814108258m/.

Basel Committee on Banking Supervision. International Convergence of Capital Measurement and Capital Standards. Technical report, Bank for International Settlements, June 2006. URL http://www.bis.org/publ/bcbs128.pdf.

L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.

J. M. Chambers. Programming with Data. Springer, New York, 1998. URL http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/. ISBN 0-387-98503-4.

R. Drexel. ROOo, 2009. URL http://rcom.univie.ac.at/.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

INRIA. Scilab. INRIA ENPC, 1989. URL http://www.scilab.org.

B. D. McCullough and B. Wilson. On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Computational Statistics and Data Analysis, 40:713–721, 2002.

Microsoft Corporation. Microsoft Office 2000/Visual Basic Programmer's Guide. In MSDN Library, volume Office 2000 Documentation. Microsoft Corporation, October 2001. URL http://msdn.microsoft.com/.

Microsoft Corporation and Digital Equipment Corporation. The component object model specification. Technical Report 0.9, Microsoft Corporation, October 1995. Draft.

OpenOffice.org. OpenOffice, 2006. URL http://www.openoffice.org/.

OpenOffice.org. Uno, 2009. URL http://wiki.services.openoffice.org/wiki/Uno/.

H. Zhang and B. Singer. Recursive Partitioning in the Health Sciences. Springer-Verlag, New York, 1999. ISBN 0-387-98671-5.

Thomas Baier
Department of Scientific Computing
University of Vienna
1010 Vienna
Austria
[email protected]

Erich Neuwirth
Department of Scientific Computing
University of Vienna
1010 Vienna
Austria
[email protected]

Michele De Meo
Venere Net Spa
Via della Camilluccia, 693
00135 Roma
Italy
[email protected]

glm2: Fitting Generalized Linear Models with Convergence Problems

by Ian C. Marschner

Abstract: The R function glm uses step-halving to deal with certain types of convergence problems when using iteratively reweighted least squares to fit a generalized linear model. This works well in some circumstances but non-convergence remains a possibility, particularly with a non-standard link function. In some cases this is because step-halving is never invoked, despite a lack of convergence. In other cases step-halving is invoked but is unable to induce convergence. One remedy is to impose a stricter form of step-halving than is currently available in glm, so that the deviance is forced to decrease in every iteration. This has been implemented in the glm2 function available in the glm2 package. Aside from a modified computational algorithm, glm2 operates in exactly the same way as glm and provides improved convergence properties. These improvements are illustrated here with an identity link Poisson model, but are also relevant in other contexts.

It is not too uncommon for iteratively reweighted least squares (IRLS) to exhibit convergence problems when fitting a generalized linear model (GLM). Such problems tend to be most common when using a non-standard link function, such as a log link binomial model or an identity link Poisson model. Consequently, most commonly used statistical software has the provision to invoke various modifications of IRLS if non-convergence occurs.

In the stats package of R, IRLS is implemented in the glm function via its workhorse routine glm.fit. This routine deals with specific types of convergence problems by switching to step-halving if iterates display certain undesirable properties. That is, if a full Fisher scoring step of IRLS will lead to either an infinite deviance or predicted values that are invalid for the model being fitted, then the increment in parameter estimates is repeatedly halved until the updated estimates no longer exhibit these features. This is achieved through repeated application of the call

start <- (start + coefold)/2

where coefold and start contain estimates from the previous and current iterations, respectively.

Although this approach works well in some contexts, it can be prone to fail in others. In particular, although the step-halving process in glm.fit will throw an errant iterative sequence back into the desired region, the sequence may repeatedly try to escape that region and never converge. Furthermore, it is even possible for the IRLS iterative sequence to be such that step-halving is never invoked in glm.fit, yet the sequence does not converge. Such behavior is typically accompanied by a deviance sequence that increases in one or more of the iterations. This suggests a modification to glm.fit which has been implemented in the glm2 package (Marschner, 2011).

As motivation for the proposed modification, we begin by discussing the potential for non-convergence using some numerical examples of the above types of behavior. The glm2 package is then discussed, which consists of a main function glm2 and a workhorse routine glm.fit2. These are modified versions of glm and glm.fit, in which step-halving is used to force the deviance to decrease from one iteration to the next. It is shown that this modification provides improved convergence behavior.

Non-convergence

We start by providing illustrations of the two types of non-convergence alluded to above. Specifically, we will consider situations in which standard IRLS does not converge, and for which either: (i) the step-halving in glm.fit is invoked but cannot induce convergence; or (ii) the step-halving in glm.fit is never invoked despite the non-convergence of IRLS. These two types of behavior will be illustrated using an identity link Poisson regression model, which can be prone to convergence problems as the link function does not automatically respect the non-negativity of the Poisson means. This context is useful for studying the convergence properties of algorithms based on IRLS, because for this particular GLM a reliable alternative algorithm exists which is not based on IRLS, as described by Marschner (2010). While we focus on the identity link Poisson model here, the same behavior can occur in other models and the documentation for the glm2 package includes a log link binomial example.

In Table 3.2 of Agresti (2007) a data set is presented that is amenable to analysis using a linear Poisson model. The data set consists of a count response variable $y_i$, $i = 1,\ldots,173$, called Sa in Agresti (2007), with observed values ranging from 0 through 15 and a mean of 2.9. Also presented are a number of potential covariates. Here we define three covariates, $x_{i1}$, $x_{i2}$ and $x_{i3}$, $i = 1,\ldots,173$, similarly to Marschner (2010). The first two of these covariates, $x_{i1}$ and $x_{i2}$, are defined by dichotomizing the covariates called C and S in Agresti (2007), such that $x_{i1} = 1\{C > 3\}$ and $x_{i2} = 1\{S < 3\}$. This yields two 0/1 binary covariates with means 0.4 and 0.3, respectively. The third


covariate, $x_{i3}$, is a continuous covariate called W in Agresti (2007), which is shifted here by subtracting the smallest value, so that it ranges from 0 through 12.5 with a mean of 5.3.

Assuming $y_i$ is an observation from a Poisson distribution with mean $\mu_i$, the identity link model is

$$\mu_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \alpha_3 x_{i3}$$

with parameter vector $\theta = (\alpha_0, \alpha_1, \alpha_2, \alpha_3)$. This model can be fitted in R by first defining a response vector y with y[i] $= y_i$, and corresponding covariate vectors x1, x2 and x3 with c(x1[i],x2[i],x3[i]) $= (x_{i1}, x_{i2}, x_{i3})$, along with a data frame poisdata through the assignment poisdata <- data.frame(y,x1,x2,x3). The call to fit the model is then

glm(y ~ x1 + x2 + x3, data = poisdata,
    family = poisson(link = "identity"),
    start = rep(1,4))

which includes an initial estimate through the start argument because the default choice is invalid for this model. Below we also use the control argument to monitor the iterative behavior.

As discussed later in the paper, the above call produces satisfactory convergence for these data; however, a bootstrap analysis using the same call leads to quite severe numerical instability. Using the sample function, bootstrap replications were generated by sampling with replacement from the original data, after which glm was applied to each replication. This process readily produced replications for which glm failed, and a collection of 100 such replications was generated in which non-convergence occurred. Two of these replications are discussed as examples in this section, while the full collection of 100 replications is discussed further in the next section.

Figure 1 displays the lack of convergence using glm for the two illustrative replications. Increasing the maximum number of iterations does not alleviate the problem. Also plotted in Figure 1 is the deviance achieved by the maximum likelihood estimate (MLE), calculated using the non-IRLS linear Poisson method of Marschner (2010). For Figure 1(a) this minimum possible deviance is 604.0 with an MLE of $\hat\theta = (-0.095, -0.385, 0.618, 0.530)$, while for Figure 1(b) the minimum possible deviance is 656.3 with an MLE of $\hat\theta = (0.997, -1.344, -0.169, 0.524)$. Inspection of the score functions reveals both MLEs are stationary and in the interior of the parameter space.

These two examples illustrate the two scenarios of non-convergence described at the beginning of this section. In Figure 1(a), step-halving was invoked in 28 of the 100 iterations, showing that glm can fail to converge even with step-halving. In Figure 1(b) step-halving was not invoked, showing that glm can fail to converge without ever making use of step-halving. The latter example is indicative of a potential problem with Newton-type algorithms, which can have a so-called attracting periodic cycle. In this case IRLS is attracted to a cycle of two iterate values, with deviances of 673.2 and 691.1, and then subsequently oscillates between those values.

Although these results use a specific starting value, the non-convergence cannot be remedied with better initial estimates. This is illustrated in Figure 1(a), where the iterates get very close to the optimal value. Only when the initial estimate is identical to the MLE does glm converge (in one iteration).

[Figure 1: Examples of non-convergence in glm. Panels (a) and (b) plot the deviance against the iteration number; dashed lines denote the optimal deviance.]

The glm2 package

An important feature of Figure 1 is that the iterative sequences display instances of increasing deviance


from one iteration to the next. It is well known that step-halving can be used in Newton-type algorithms to force the objective function to improve monotonically (Lange, 2010, p. 251). This approach is used to improve the convergence properties of IRLS in other standard statistical software. Here we discuss its implementation in the glm2 function available within the glm2 package, and the associated improvements in convergence properties.

The source code for glm.fit has two step-halving blocks called inner loops 1 and 2. These occur within the main loop immediately prior to the test of convergence, and they use step-halving to rectify a divergent deviance and invalid predicted values, respectively. The main change implemented in the glm2 package is that the modified routine glm.fit2 has a further step-halving block, called inner loop 3, which tests whether the deviance is lower than in the previous iteration. If not, step-halving is invoked until the deviance is lowered, leading to an iterative sequence that monotonically decreases the deviance.

Convergence in glm.fit occurs if abs(rel) < control$epsilon, where rel is the relative change (dev - devold)/(0.1 + abs(dev)) from the current deviance devold to the updated deviance dev, and control$epsilon is a positive tolerance. In glm.fit2 the same convergence test is used, and if it is not satisfied because rel <= -control$epsilon, then iterations proceed as in glm.fit. However, if convergence is not satisfied because rel >= control$epsilon, then inner loop 3 invokes step-halving until rel <= -control$epsilon. Thus, as well as decreasing the deviance, the step-halving takes rel from one side of the convergence region to the other, therefore never causing false convergence.

The glm2 function is essentially identical to glm, except the default fitting method is glm.fit2. This allows glm2 to be called with the same calling sequence as glm. Alternatively, in version 2.12.1 and later, it should be possible to achieve the same effect by passing glm.fit2 to glm through the method argument, instead of the default method glm.fit. Indeed, existing scripts calling glm should work unchanged with the new fitting function, after first executing

glm <- function(..., method = glm2::glm.fit2){
    stats::glm(..., method = method)
}

which makes glm.fit2 the default fitting method when glm is called.

In the previous section we discussed a data set for which 100 bootstrap replications were generated where glm failed to converge. When glm2 was used, convergence was achieved for all 100 replications with an initial value of 1 for each of the four parameters. This included the two illustrative examples plotted in Figure 1, which both converged to the MLEs produced by the non-IRLS method of Marschner (2010). This convergence is displayed in Figure 2, together with the path that glm would take.

[Figure 2: Convergence of glm2 (squares) for the examples presented in Figure 1. Circles denote the path that glm would take.]

Using glm2, Figure 3 provides medians and interquartile ranges of the estimates from the 100 replications that did not converge in glm. Also shown is the same information for 100 replications where glm did converge. In each case the two sets of estimates are relatively consistent. This would seem to suggest that estimates from the replications in which glm failed to converge are not particularly unusual compared to those for which it did converge. In particular, as demonstrated in Figure 1, non-convergence of glm can occur even when the MLE is a stationary point in the interior of the parameter space.
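The following standalone R sketch illustrates the idea behind inner loop 3. It is a schematic re-implementation of the monotone step-halving rule, simplified to "halve until the deviance decreases"; it is not the glm.fit2 source, and deviance_of stands for a caller-supplied function evaluating the model deviance at a coefficient vector.

# Schematic sketch of monotone step-halving (illustration only).
monotone_step <- function(start, coefold, devold, deviance_of,
                          max_halvings = 30) {
    dev <- deviance_of(start)
    k <- 0
    # Halve the increment until the deviance decreases
    # (or give up after max_halvings attempts).
    while (dev >= devold && k < max_halvings) {
        start <- (start + coefold)/2
        dev <- deviance_of(start)
        k <- k + 1
    }
    list(coef = start, dev = dev)
}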


As well as the functions glm2 and glm.fit2, the glm2 package provides data sets and example code illustrating the usefulness of the proposed change. This allows reproduction of the behavior described here for the identity link Poisson model, and also provides an example for the log link binomial model.

[Figure 3: Medians and interquartile ranges of estimates in 100 replications that converged (squares) and 100 replications that did not converge (circles) in glm, shown for each parameter. The latter were calculated using glm2.]

Discussion

This paper describes a modification that improves the convergence properties of glm, and which is available in the glm2 function in the glm2 package. Following the recommendation of the editors, the modification has been presented as a separate package rather than as a code snippet. This has the advantage of facilitating thorough testing of the proposed change, and also allows some example analyses to be made available through the package. These examples are promising, but additional testing would be required before considering a change to glm and its ancillaries in the standard R stats package.

While glm2 should converge whenever glm converges, it may take a different path to convergence. This will occur whenever a convergent glm sequence has an iteration where the deviance increased. The fact that glm2 may take a different path to convergence should be inconsequential in practice, particularly as it will be a more stable monotone path.

Although we have found cases in which glm has convergence problems, in other contexts it copes very well with numerical difficulties. The data set from which the bootstrap replications were generated is an example. Using an identity link Poisson model, the existing step-halving in glm allows it to converge to the non-stationary MLE $\hat\theta = (0.578, -0.626, 0.048, 0.484)$. Since the fitted value for the observed covariate pattern $(x_{i1}, x_{i2}, x_{i3}) = (1,1,0)$ is $0.578 - 0.626 + 0.048 = 0$, and all other fitted values are positive, the estimate produced is on the boundary of the parameter space. Not all GLM software would be able to converge to a non-stationary boundary point such as this.

Finally, we end with an observation on predicted values when using a non-standard link function. When glm (or glm2) converges with a link function that does not respect the natural parameter restrictions of the error distribution, such as in the example here, the predicted values will meet these restrictions for all observed covariate combinations. However, predicted values for other covariate combinations may not meet these restrictions. This is true not only for extreme covariate values outside the observed ranges, but also for unobserved combinations that are within the observed ranges. For example, the values $x_{i1} = 1$, $x_{i2} = 0$ and $x_{i3} = 0$ are all observed in the above data set, but the combination $(x_{i1}, x_{i2}, x_{i3}) = (1,0,0)$ was not observed, and has a negative predicted value of $0.578 - 0.626 = -0.048$. Although not available in glm or glm2, in principle it is possible to use a smaller parameter space that only has valid predicted values for covariate combinations that are within the observed ranges of the individual covariates. For the above data set this leads to the estimate $\hat\theta = (0.640, -0.640, 0.000, 0.476)$ (Marschner, 2010), which is slightly different to the one produced by glm and has a valid predicted value for the combination (1,0,0). A discussion of the relative merits of these parameter spaces is not attempted here, but it is important to understand which one is being used when one fits a GLM with a non-standard link.

Bibliography

A. Agresti. An Introduction to Categorical Data Analysis. Wiley, Hoboken, USA, second edition, 2007.

K. Lange. Numerical Analysis for Statisticians. Springer, New York, USA, second edition, 2010.

I. C. Marschner. Stable computation of maximum likelihood estimates in identity link Poisson regression. Journal of Computational and Graphical Statistics, 19(3):666–683, 2010.

I. C. Marschner. glm2: Fitting generalized linear models, 2011. URL http://CRAN.R-project.org/package=glm2. R package version 1.0.

Ian Marschner
Department of Statistics
Macquarie University
NSW 2109
Australia
[email protected]


Implementing the Compendium Concept with Sweave and DOCSTRIP

by Michael Lundholm

Abstract: This article suggests an implementation of the compendium concept by combining Sweave and the LaTeX literate programming environment DOCSTRIP.

Introduction

Gentleman and Lang (2007) introduced compendiums as a mechanism to combine text, data, and auxiliary software into a distributable and executable unit, in order to achieve reproducible research[1]:

". . . research papers with accompanying software tools that allow the reader to directly reproduce the result and employ the methods that are presented . . . " Gentleman and Lang (2007, abstract)

Gentleman (2005) provides an example of how the compendium concept can be implemented. The core of the implementation is a Sweave source file.[2] This source file is then packaged together with data and auxiliary software as an R package.

In this article I suggest an alternative implementation of the compendium concept combining Sweave with DOCSTRIP. The latter is the LaTeX literate programming environment used in package documentation and as an installation tool.[3] DOCSTRIP has previously been mentioned in the context of reproducible research, but then mainly as a hard to use alternative to Sweave.[4] Here, instead, DOCSTRIP and Sweave are combined.

Apart from the possibility to enjoy the functionality of Sweave and packages such as xtable etc., the main additional advantages are that

• in many applications almost all code and data can be kept in a single source file,
• multiple documents (i.e., PDF files) can share the same Sweave code chunks.

This means not only that administration of an empirical project is facilitated but also that it becomes easier to achieve reproducible research. Since DOCSTRIP is a part of every LaTeX installation, a Sweave user need not install any additional software. Finally, Sweave and DOCSTRIP can be combined to produce more complex projects such as R packages.

One example of the suggested implementation will be given. It contains R code common to more than one document; an article containing the advertisement of the research (using the terminology of Buckheit and Donoho (1995)), and one technical documentation of the same research. In the following I assume that the details of Sweave are known to the readers of the R Journal. The rest of the article will (i) give a brief introduction to DOCSTRIP, (ii) present and comment the example and (iii) close with some final remarks.

DOCSTRIP

Suppose we have a source file the entire or partial content of which should be tangled into one or more result files. In order to determine which part of the source file should be tangled into a certain result file, (i) the content of the source file is tagged with none, one or more tags (tag-lists) and (ii) the various tag-lists are associated with the result files in a DOCSTRIP "installation" file.

There are several ways to tag parts of the source file:

• A single line: Start the line with '%<tag-list>'.
• Several lines, for instance one or more code or text chunks in Sweave terminology: On a single line before the first line of the chunk enter the start tag '%<*tag-list>' and on a single line after the last line of the chunk the end tag '%</tag-list>'.
• All lines: Lines that should be in all result files are left untagged.

'tag-list' is a list of tags combined with the Boolean operators '|' (logical or), '&' (logical and) and '!' (logical negation). A frequent type of list would be, say, 'tag1|tag2|tag3', which will tangle the tagged material whenever 'tag1', 'tag2' or 'tag3' is called for into the result files these tags are associated with.

The initial '%' of the tags must be in the file's first column or else the tag will not be recognised as a DOCSTRIP tag. Also, tags must be matched, so a start tag with 'tag-list' must be closed by an end tag with 'tag-list'. This resembles the syntax of LaTeX environments rather than the Sweave syntax, where the end of a code or text chunk is indicated by the beginning of the next text or code chunk. Note also that tags cannot be nested.[5]

[1] Buckheit and Donoho (1995).
[2] Leisch. See also Leisch (2008) and Meredith and Racine (2008).
[3] Mittelbach et al. (2005) and Goossens et al. (1994, section 14.2).
[4] Hothorn (2006) and Rising (2008).
[5] More exactly: outer tags, which are described here, cannot be nested, but inner tags can be nested with outer tags. See Goossens et al. (1994, p. 820) for details.


The following source file ('docex.txt') exemplifies all three types of tags:

1 %<file1|file2>This line begins both files.
2 %<*file1>
3
4 This is the text that should be included in file1
5
6 %</file1>
7
8 This is the text to be included in both files
9
10 %<*file2>
11 This is the text that should be included in file2
12 %</file2>
13 %<*file1|file2>
14 Also text for both files.
15 %</file1|file2>

For instance, line 1 is a single line tagged 'file1' or 'file2', line 2 starts and line 6 ends a tag 'file1', and line 13 starts and line 15 ends a tag 'file1' or 'file2'. Lines 7−9 are untagged.

The next step is to construct a DOCSTRIP installation file which associates each tag with one or more result files:

1 \input docstrip.tex
2 \keepsilent
3 \askforoverwritefalse
4 \nopreamble
5 \nopostamble
6 \generate{
7 \file{file1.txt}{\from{docex.txt}{file1}}
8 \file{file2.txt}{\from{docex.txt}{file2}}
9 }
10 \endbatchfile

Line 1 loads DOCSTRIP. Lines 2−5 contain options that basically tell DOCSTRIP not to issue any messages, to write over any existing result files and not to mess up the result files with pre- and post-ambles.[6] The action takes place on lines 6−9 within the command '\generate{}', where lines 7−8 associate the tags 'file1' and 'file2' in the source file 'docex.txt' with the result files 'file1.txt' and 'file2.txt'.[7]

We name this file 'docex.ins', where '.ins' is the conventional extension for DOCSTRIP installation files. DOCSTRIP is then invoked with

latex docex.ins

A log-file called 'docex.log' is created, from which we here show the most important parts (lines 56−67):

56 Generating file(s) ./file1.txt ./file2.txt
57 \openout0 = `./file1.txt'.
58
59 \openout1 = `./file2.txt'.
60
61
62 Processing file docex.txt (file1) -> file1.txt
63 (file2) -> file2.txt
64 Lines processed: 15
65 Comments removed: 0
66 Comments passed: 0
67 Codelines passed: 8

We see that two result files are created from the 15 lines of code in the source file. First 'file1.txt';

1 This line begins both files.
2
3 This is the text that should be included in file1
4
5
6 This is the text to be included in both files
7
8 Also text for both files.

and 'file2.txt';

1 This line begins both files.
2
3 This is the text to be included in both files
4
5 This is the text that should be included in file2
6 Also text for both files.

Note that some lines are blank in both the original source file and the result files. Disregarding these, the two result files together have 8 lines of code. The untagged material in lines 7−9 in the source file is tangled into both result files; the blank lines 7 and 9 in the source file result in the blank lines 5 and 7 in 'file1.txt' and the blank lines 2 and 4 in 'file2.txt'.

[6] Pre- and postambles are text lines starting with a comment character. Since result files may be processed by software using different comment characters, some care is needed when using pre- and postambles constructed by DOCSTRIP. See Goossens et al. (1994, p. 829f and 833f) on how to set up pre- and postambles that are common to all result files from a given installation file.
[7] From the example one may infer that multiple source files are possible, although the compendium implementation discussed later would in most cases have only one.

Example

In the following a simple example will be given of how DOCSTRIP can be combined with Sweave to implement the compendium concept. The starting point is a "research problem" which involves loading some data into R, preprocessing the data, conducting an estimation and presenting the result. The purpose is to construct a single compendium source file which contains the code used to create (i) an "article" PDF file which will provide a brief account of the test and (ii) a "technical documentation" PDF file which gives a more detailed description of loading and preprocessing data and the estimation. The source file also contains the code of a BibTeX database file and the


DOCSTRIP installation file. Although this source file is neither a LaTeX file nor a Sweave file, I will use the extension ‘.rnw’ since it is first run through Sweave. Here we simplify the example by using data from an R package, but if the data set is not too large it could be a part of the source file.

We can think of the “article” as the “advertisement” intended for journal publication and the “technical documentation” as a more complete account of the actual research intended to be available on (say) a web place. However, tables, graphs and individual statistics should originate from the same R code, so whenever Sweave is executed these are updated in both documents. There may also be common text chunks, and when they are changed in the source file both documents are updated via the result files.

The example code in the file ‘example_source.rnw’ is as follows:

 1 %<*article|techdoc>
 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 3 %% Author: Michael Lundholm
 4 %% Email: [email protected]
 5 %% Date: 2010-09-06
 6 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 7 %% The project files consists of
 8 %% * example_source.rnw (THIS FILE)
 9 %% To create other files execute
10 %%   R CMD Sweave example_source.rnw
11 %%   latex example.ins
12 %%   pdflatex example_techdoc.tex
13 %%   bibtex example_techdoc
14 %%   pdflatex example_article.tex
15 %%   bibtex example_article
16 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
17 \documentclass{article}
18 \usepackage{Sweave,amsmath,natbib}
19 %</article|techdoc>
20 %<article>\title{Article}
21 %<techdoc>\title{Technical documentation}
22 %<*article|techdoc>
23 \author{M. Lundholm}
24 \begin{document}
25 \maketitle
26 This note replicates the \citet[p. 56ff]{Kleiber08}
27 estimation of a price per citation elasticity of
28 library subscriptions for economics journals equal
29 to $\input{coef.txt}$.
30 %</article|techdoc>
31 %<*techdoc>
32 The data, available in R package AER on CRAN, is loaded:
33 <<>>=
34 data("Journals",package="AER")
35 @
36 %</techdoc>
37 %<*article|techdoc>
38 The data set includes the number of library
39 subscriptions ($S_i$), the number of citations
40 ($C_i$) and the subscription price for libraries
41 ($P_i$). We want to estimate the model
42 $$\log(S_i)=\alpha_0+\alpha_1
43 \log\left(P_i/C_i\right)+\epsilon_i,$$
44 where $P_i/C_i$ is the price per citation.
45 %</article|techdoc>
46 %<*techdoc>
47 We then define the price per citation, include
48 the variable in the data frame \texttt{Journals}
49 <<>>=
50 Journals$citeprice <- Journals$price/Journals$citations
51 @
52 and estimate the model:
53 <<>>=
54 result <- lm(log(subs)~log(citeprice),data=Journals)
55 @
56 %</techdoc>
57 %<*article|techdoc>
58 The result with OLS standard errors is in
59 Table~\ref{ta:est}.
60 <<>>=
61 library(xtable)
62 xtable(summary(result),label="ta:est",
63 caption="Estimation results")
64 @
65 <<>>=
66 write(round(coef(result)[[2]],2),file="coef.txt")
67 @
68 \bibliographystyle{abbrvnat}
69 \bibliography{example}
70 \end{document}
71 %</article|techdoc>
72 %<*bib>
73  @Book{ Kleiber08,
74 author = {Christian Kleiber and Achim Zeileis},
75 publisher = {Springer},
76 year = {2008},
77 title = {Applied Econometrics with {R}}}
78 %</bib>
79 %<*dump>
80 <<>>=
81 writeLines(
82 "\\input docstrip.tex
83 \\keepsilent
84 \\askforoverwritefalse
85 \\nopreamble
86 \\nopostamble
87 \\generate{
88 \\file{example_article.tex}%
89 {\\from{example_source.tex}{article}}
90 \\file{example_techdoc.tex}%
91 {\\from{example_source.tex}{techdoc}}
92 \\file{example.bib}%
93 {\\from{example_source.tex}{bib}}}
94 \\endbatchfile
95 ",con="example.ins")
96 @
97 %</dump>

The compendium source file contains the following DOCSTRIP tags (for their association to files, see below):

• ‘article’ associated with ‘example_article.tex’, which contains the code to the “advertisement” article,
• ‘techdoc’ associated with ‘example_techdoc.tex’, which contains the code to the technical documentation,
• ‘bib’ associated with ‘example.bib’, which contains the code to the BibTeX database file,
• ‘dump’ associated with no file.

Note that the tags ‘article’ and ‘techdoc’ overlap with each other but not with ‘bib’ and ‘dump’, which in turn are mutually exclusive. There is no untagged material.

Lines 2-15 contain general information about the project, which could be more or less elaborate. Here it just states that the project is distributed as a single source file and how the compendium source file should be processed to get the relevant output


Article

M. Lundholm
September 15, 2010

This note replicates the Kleiber and Zeileis [2008, p. 56ff] estimation of a price per citation elasticity of library subscriptions for economics journals equal to -0.53. The data set includes the number of library subscriptions (S_i), the number of citations (C_i) and the subscription price for libraries (P_i). We want to estimate the model

log(S_i) = α_0 + α_1 log(P_i/C_i) + ε_i,

where P_i/C_i is the price per citation. The result with OLS standard errors is in Table 1.

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        4.7662      0.0559    85.25    0.0000
log(citeprice)    -0.5331      0.0356   -14.97    0.0000

Table 1: Estimation results

References

C. Kleiber and A. Zeileis. Applied Econometrics with R. Springer, 2008.

Figure 1: ‘example_article.pdf’.

‘example_article.pdf’ and ‘example_techdoc.pdf’. When the instructions are followed, Sweave is run first on ‘example_source.rnw’, creating the file ‘example_source.tex’, in which the Sweave code chunks are replaced by the corresponding R output code wrapped with LaTeX typesetting commands. One of the R functions used in this Sweave session is writeLines() (see lines 80-96), so that the DOCSTRIP installation file ‘example.ins’ is created before DOCSTRIP is run.

This file ‘example_source.tex’ is the DOCSTRIP source file from which the DOCSTRIP utility, together with the installation file ‘example.ins’, creates the result files ‘example_article.tex’, ‘example_techdoc.tex’ and ‘example.bib’. The two first result files share some but not all code from the DOCSTRIP source file. The result files are then run with the LaTeX family of software (here pdflatex and BibTeX) to create two PDF files, ‘example_article.pdf’ and ‘example_techdoc.pdf’. These are shown in Figures 1-2.

Note that the entire bibliography (BibTeX) file is included on lines 73-77 and extracted with DOCSTRIP. Note also, regarding line 73, that if the @ indicating a new bibliographical entry is in column 1 it is mistaken by Sweave for a new text chunk and will be removed, with errors as the result when BibTeX is run.8

The bibliography database file is common to both ‘example_article.tex’ and ‘example_techdoc.tex’. Here the documents have the same single reference, but in real implementations bibliographies would probably not overlap completely. This way of handling references is then preferable since all bibliographical references occur only once in the source file.9

In LaTeX, cross references are handled by writing information to the auxiliary file, which is read by later LaTeX runs. This handles references to an object located both before and after the reference in the LaTeX file. In Sweave, ‘\Sexpr{}’ can be used to refer to R objects created before but not after the reference is made. This is not exemplified here. But since Sweave and LaTeX are run sequentially, an object can be created by R, written to a file (see the code chunk on lines 65-67) and then be used in the LaTeX run with the command ‘\input{}’ (see code line 29).

Final comments

By making use of combinations of DOCSTRIP and (say) ‘writeLines()’ and by changing the order in which Sweave and DOCSTRIP are executed, the applications can be made more complex. Such examples may be found in Lundholm (2010a,b).10 Also, the use of DOCSTRIP can facilitate the creation of R packages, as exemplified by the R data package sifds available on CRAN (Lundholm, 2010c). Another type of example would be teaching material, where this article may itself serve as an example. Apart from the DOCSTRIP

8 The tag ‘dump’ is a safeguard against this material being allocated to some result file by DOCSTRIP; in this case to the BibTeX database file.
9 One alternative would be to replace the command ‘\bibliography{example}’ on line 69 with the content of ‘example_article.bbl’ and ‘example_techdoc.bbl’ appropriately tagged for DOCSTRIP. However, this procedure would require an “external” bibliography database file. The problem then is that each time the database is changed, manual updating of the parts of ‘example_source.rnw’ that create ‘example_article.bbl’ and ‘example_techdoc.bbl’ is required. Creating the bibliography database file via DOCSTRIP makes this manual updating unnecessary.
10 An early attempt to implement the ideas presented in this article can be found in Arai et al. (2009).


Technical documentation

M. Lundholm
September 15, 2010

This note replicates the Kleiber and Zeileis [2008, p. 56ff] estimation of a price per citation elasticity of library subscriptions for economics journals equal to -0.53. The data, available in R package AER on CRAN, is loaded:

> data("Journals", package = "AER")

The data set includes the number of library subscriptions (S_i), the number of citations (C_i) and the subscription price for libraries (P_i). We want to estimate the model

log(S_i) = α_0 + α_1 log(P_i/C_i) + ε_i,

where P_i/C_i is the price per citation. We then define the price per citation, include the variable in the data frame Journals

> Journals$citeprice <- Journals$price/Journals$citations

and estimate the model:

> result <- lm(log(subs) ~ log(citeprice), data = Journals)

The result with OLS standard errors is in Table 1.

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        4.7662      0.0559    85.25    0.0000
log(citeprice)    -0.5331      0.0356   -14.97    0.0000

Table 1: Estimation results

References

C. Kleiber and A. Zeileis. Applied Econometrics with R. Springer, 2008.

Figure 2: ‘example_techdoc.pdf’.

installation file and a Bash script file, all code used to produce this article is contained in a single source file. The Bash script, together with DOCSTRIP, creates all example files including the final PDF files; that is, all example code is executed every time this article is updated. So, if the examples are changed, an update of the article via the Bash script also updates the final PDF files in Figures 1-2.11

Colophon

This article was written on an i486-pc-linux-gnu platform using R version 2.11.1 (2010-05-31), LaTeX2e (2005/12/01) and DOCSTRIP 2.5d (2005/07/29).

Acknowledgement

The compendium implementation presented here was partially developed in joint projects with Mahmood Arai, to whom I owe several constructive comments on a previous version.

Bibliography

M. Arai, J. Karlsson, and M. Lundholm. On fragile grounds: A replication of “Are Muslim immigrants different in terms of cultural integration?” Accepted for publication in the Journal of the European Economic Association, 2009. URL http://www.eeassoc.org/index.php?site=JEEA&page=55.

J. B. Buckheit and D. L. Donoho. WaveLab and reproducible research. In A. Antoniadis and G. Oppenheim, editors, Wavelets and Statistics, Lecture Notes in Statistics 103, pages 55–81. Springer Verlag, 1995.

R. Gentleman. Reproducible research: A bioinformatics case study. 2005. URL http://www.bioconductor.org/docs/papers/2003/Compendium/Golub.pdf.

R. Gentleman and D. T. Lang. Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16:1–23, 2007. URL http://pubs.amstat.org/doi/pdfplus/10.1198/106186007X178663.

M. Goossens, F. Mittelbach, and A. Samarin. The LaTeX Companion. Addison-Wesley, Reading, MA, USA, second edition, 1994.

T. Hothorn. Praktische Aspekte der Reproduzierbarkeit statistischer Analysen in klinischen Studien. Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2006. URL http://www.imbe.med.uni-erlangen.de/~hothorn/talks/AV.pdf.

11 The compendium source files of projects mentioned in this paragraph, including this article, can be found at http://people.su.se/~lundh/projects/.


F. Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In W. Härdle and B. Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575–580, 2002.

F. Leisch. Sweave User Manual, 2008. URL http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf. R version 2.7.1.

M. Lundholm. Are inflation forecasts from major Swedish forecasters biased? Research paper in economics 2010:10, Department of Economics, Stockholm University, 2010a. URL http://swopec.hhs.se/sunrpe/abs/sunrpe2010_0010.htm.

M. Lundholm. Sveriges Riksbank’s inflation interval forecasts 1999–2005. Research paper in economics 2010:11, Department of Economics, Stockholm University, 2010b. URL http://swopec.hhs.se/sunrpe/abs/sunrpe2010_0010.htm.

M. Lundholm. sifds: Swedish inflation forecast data set, 2010c. URL http://www.cran.r-project.org/web/packages/sifds/index.html. R package version 0.9.

E. Meredith and J. S. Racine. Towards reproducible econometric research: The Sweave framework. Journal of Applied Econometrics, 2008. URL http://dx.doi.org/10.1002/jae.1030. Published online: 12 Nov 2008.

F. Mittelbach, D. Duchier, J. Braams, M. Woliński, and M. Wooding. The DOCSTRIP program. Version 2.5d, 2005. URL http://tug.ctan.org/tex-archive/macros/latex/base/.

B. Rising. Reproducible research: Weaving with Stata. StataCorp LP, 2008. URL http://www.stata.com/meeting/italy08/rising_2008.pdf.

Michael Lundholm
Department of Economics
Stockholm University
SE–106 91 Stockholm
Sweden
[email protected]


Watch Your Spelling!

by Kurt Hornik and Duncan Murdoch

Abstract We discuss the facilities in base R for spell checking via Aspell, Hunspell or Ispell, which are useful in particular for conveniently checking the spelling of natural language texts in package Rd files and vignettes. Spell checking performance is illustrated using the Rd files in package stats. This example clearly indicates the need for a domain-specific statistical dictionary. We analyze the results of spell checking all Rd files in all CRAN packages and show how these can be employed for building such a dictionary.

R and its add-on packages contain large amounts of natural language text (in particular, in the documentation in package Rd files and vignettes). This text is useful for reading by humans and as a testbed for a variety of natural language processing (NLP) tasks, but it is not always spelled correctly. This is not entirely surprising, given that available spell checkers are not aware of the special Rd file and vignette formats, and thus rather inconvenient to use. (In particular, we are unaware of ways to take advantage of the ESS (Rossini et al., 2004) facilities to teach Emacs to check the spelling of vignettes by only analyzing the LaTeX chunks in suitable TeX modes.)

In addition to providing facilities to make spell checking for Rd files and vignettes more convenient, it is also desirable to have programmatic (R-level) access to the possibly mis-spelled words (and suggested corrections). This will allow their inclusion into automatically generated reports (e.g., by R CMD check), or aggregation for subsequent analyses. Spell checkers typically know how to extract words from text, employing language-dependent algorithms for handling morphology, and compare the extracted words against a known list of correctly spelled ones, the so-called dictionary. However, NLP resources for statistics and related knowledge domains are rather scarce, and we are unaware of suitable dictionaries for these. The result is that domain-specific terms in texts from these domains are typically reported as possibly mis-spelled. It thus seems attractive to create additional dictionaries based on determining the most frequent possibly mis-spelled terms in corpora such as the Rd files and vignettes in the packages in the R repositories.

In this paper, we discuss the spell check functionality provided by aspell() and related utilities made available in the R standard package utils. After a quick introduction to spell checking, we indicate how aspell() can be used for checking individual files and packages, and how this can be integrated into typical work flows. We then compare the agreement of the spell check results obtained by two different programs (Aspell and Hunspell) and their respective dictionaries. Finally, we investigate the possibility of using the CRAN package Rd files for creating a statistics dictionary.

Spell checking

The first spell checkers were widely available on mainframe computers in the late 1970s (e.g., http://en.wikipedia.org/wiki/Spell_checker). They actually were “verifiers” instead of “correctors”, reporting words not found in the dictionary but not making useful suggestions for replacements (e.g., as close matches in the Levenshtein distance to terms from the dictionary). In the 1980s, checkers became available on personal computers, and were integrated into popular word-processing packages like WordStar. Recently, applications such as web browsers and email clients have added spell check support for user-written content. Extending coverage from English to beyond western European languages has required software to deal with character encoding issues and increased sophistication in the morphology routines, particularly with regard to heavily agglutinative languages like Hungarian and Finnish, and resulted in several generations of open-source checkers. Limitations of the basic unigram (single-word) approach have led to the development of context-sensitive spell checkers, currently available in commercial software such as e.g. Microsoft Office 2007. Another approach to spell checking is using adaptive domain-specific models for mis-spellings (e.g., based on the frequency of word n-grams), as employed for example in web search engines (“Did you mean . . . ?”).

The first spell checker on Unix-alikes was spell (originally written in 1971 in PDP-10 Assembly language and later ported to C), which read an input file and wrote possibly mis-spelled words to output (one word per line). A variety of enhancements led to (International) Ispell (http://lasr.cs.ucla.edu/geoff/ispell.html), which added interactivity (hence “i”-spell), suggestion of replacements, and support for a large number of European languages (hence “international”). It pioneered the idea of a programming interface, originally intended for use by Emacs, and awareness of special input file formats (originally, TeX or nroff/troff).

GNU Aspell, usually called just Aspell (http://aspell.net), was mainly developed by Kevin Atkinson and designed to eventually replace Ispell. Aspell can either be used as a library (in fact, the Omegahat package Aspell (Temple Lang, 2005) provides a fine-grained R interface to this) or as a standalone program. Compared to Ispell, Aspell can also easily check documents in UTF-8 without having to use a special dictionary, and supports using multiple dictionaries. It also

tries to do a better job suggesting corrections (Ispell only suggests words with a Levenshtein distance of 1), and provides powerful and customizable TeX filtering. Aspell is the standard spell checker for the GNU software system, with packages available for all common Linux distributions. E.g., for Debian/Ubuntu flavors, aspell contains the programs, aspell-en the English language dictionaries (with American, British and Canadian spellings), and libaspell-dev provides the files needed to build applications that link against the Aspell libraries.

See http://aspell.net for information on obtaining Aspell, and available dictionaries. A native Windows build of an old release of Aspell is available on http://aspell.net/win32/. A current build is included in the Cygwin system at http://cygwin.com,1 and a native build is available at http://www.ndl.kiev.ua/content/patch-aspell-0605-win32-compilation.

Hunspell (http://hunspell.sourceforge.net/) is a spell checker and morphological analyzer designed for languages with rich morphology and complex word compounding or character encoding, originally designed for the Hungarian language. It is in turn based on Myspell, which was started by Kevin Hendricks to integrate various open source spelling checkers into the OpenOffice.org build, and facilitated by Kevin Atkinson. Unlike Myspell, Hunspell can use Unicode UTF-8-encoded dictionaries. Hunspell is the spell checker of OpenOffice.org, Mozilla Firefox 3 & Thunderbird and Google Chrome, and it is also used by proprietary software like MacOS X. Its TeX support is similar to Ispell, but nowhere near that of Aspell.

Again, Hunspell is conveniently packaged for all common Linux distributions (with Debian/Ubuntu packages hunspell for the standalone program, dictionaries including hunspell-en-us and hunspell-en-ca, and libhunspell-dev for the application library interface). For Windows, the most recent available build appears to be version 1.2.8 on http://sourceforge.net/projects/hunspell/files/.

Aspell and Hunspell both support the so-called Ispell pipe interface, which reads a given input file and then, for each input line, writes a single line to the standard output for each word checked for spelling on the line, with different line formats for words found in the dictionary, and words not found, either with or without suggestions. Through this interface, applications can easily gain spell-checking capabilities (with Emacs the longest-serving “client”).

Spell checking with R

The basic function for spell checking provided by package utils is aspell(), with synopsis

aspell(files, filter, control = list(),
       encoding = "unknown", program = NULL)

Argument files is a character vector with the names of the files to be checked (in fact, in R 2.12.0 or later alternatively a list of R objects representing connections or having suitable srcrefs), control is a list or character vector of control options (command line arguments) to be passed to the spell check program, and program optionally specifies the name of the program to be employed. By default, the system path is searched for aspell, hunspell and ispell (in that order), and the first one found is used. Encodings which cannot be inferred from the files can be specified via encoding. Finally, one can use argument filter to specify a filter for processing the files before spell checking, either as a user-defined function, or a character string specifying a built-in filter, or a list with the name of a built-in filter and additional arguments to be passed to it. The built-in filters currently available are "Rd" and "Sweave", corresponding to functions RdTextFilter and SweaveTeXFilter in package tools, with self-explanatory names: the former blanks out all non-text in an Rd file, dropping elements such as \email and \url or as specified by the user via argument drop; the latter blanks out code chunks and Noweb markup in an Sweave input file.

aspell() returns a data frame inheriting from "aspell" with the information about possibly mis-spelled words, which has useful print() and summary() methods. For example, consider ‘lm.Rd’ in package stats, which provides the documentation for fitting linear models. Assuming that src is set to the URL for the source of that file, i.e. "http://svn.R-project.org/R/trunk/src/library/stats/man/lm.Rd", and f is set to "lm.Rd", we can obtain a copy and check it as follows:

> download.file(src, f)
> a <- aspell(f, "Rd")
> a

accessor
  lm.Rd:128:25

ANOVA
  lm.Rd:177:7

datasets
  lm.Rd:199:37

differenced
  lm.Rd:168:57

formulae
  lm.Rd:103:27

Ihaka
  lm.Rd:214:68

logicals
  lm.Rd:55:26

priori
  lm.Rd:66:56

regressor
  lm.Rd:169:3

1 The Cygwin build requires Unix-style text file inputs, and Cygwin-style file paths, so it is not fully compatible with the aspell() function in R.


(results will be quite different if one forgets to use the Rd filter). The output is self-explanatory: for each possibly mis-spelled word, we get the word and the occurrences in the form file:linenum:colnum. Clearly, terms such as ‘ANOVA’ and ‘regressor’ are missing from the dictionary; if one was not sure about ‘datasets’, one could try visiting http://en.wiktionary.org/wiki/datasets.

If one is only interested in the possibly mis-spelled words themselves, one can use

> summary(a)

Possibly mis-spelled words:
[1] "accessor"    "ANOVA"     "datasets"
[4] "differenced" "formulae"  "Ihaka"
[7] "logicals"    "priori"    "regressor"

(or directly access the Original variable); to see the suggested corrections, one can print with verbose = TRUE:

> print(subset(a,
+              Original %in%
+                c("ANOVA", "regressor")),
+       verbose = TRUE)

Word: ANOVA (lm.Rd:177:7)
Suggestions: AN OVA AN-OVA NOVA ANICA ANIA
  NOVAE ANA AVA INVAR NOV OVA UNIV ANITA
  ANNORA AVIVA ABOVE ENVOY ANNA NEVA ALVA
  ANYA AZOV ANON ENVY JANEVA ANODE ARNO
  IVA ANVIL NAIVE OLVA ANAL AVON AV IONA
  NV NAVY OONA AN AVOW INFO NAVE ANNOY
  ANN ANY ARV AVE EVA INA NEV ONO UNA
  ANCHOVY ANA'S

Word: regressor (lm.Rd:169:3)
Suggestions: regress or regress-or regress
  regressive regressed regresses
  aggressor regressing egress regrets
  Negress redress repress recross regrows
  egress's Negress's

Note that the first suggestion is ‘regress or’.

The print method also has an indent argument controlling the indentation of the locations of possibly mis-spelled words. The default is 2; Emacs users may find it useful to use an indentation of 0 and visit output (directly when running R from inside ESS, or redirecting output to a file using sink()) in grep-mode.

The above spell check results were obtained using Aspell with an American English dictionary. One can also use several dictionaries:

> a2 <- aspell(f, "Rd",
+              control =
+                c("--master=en_US",
+                  "--add-extra-dicts=en_GB"))
> summary(a2)

Possibly mis-spelled words:
[1] "accessor"    "ANOVA"    "datasets"
[4] "differenced" "Ihaka"    "logicals"
[7] "priori"      "regressor"

> setdiff(a$Original, a2$Original)

[1] "formulae"

This shows that Aspell no longer considers ‘formulae’ mis-spelled when the “British” dictionary is added, and also exemplifies how to use the control argument: using Hunspell, the corresponding choice of dictionaries could be achieved using control = "-d en_US,en_GB". (Note that the dictionaries differ, as we shall see below.) This works for “system” dictionaries; one can also specify “personal” dictionaries (using the common command line option ‘-p’).

We already pointed out that Aspell excels in its TeX filtering capabilities, which is obviously rather useful for spell-checking R package vignettes in Sweave format. Similar to Ispell and Hunspell, it provides a TeX mode (activated by the common command line option ‘-t’) which knows to blank out TeX commands. In addition to the others, it allows one to selectively blank out options and parameters of these commands. This works via specifications of the form ‘name signature’, where the first gives the name of the command and the second is a string containing ‘p’ or ‘o’ indicating to blank out parameters or options, or ‘P’ or ‘O’ indicating to keep them for checking. E.g., ‘foo Pop’ specifies to check the first parameter and then skip the next option and the second parameter (even if the option is not present), i.e. following the pattern \foo{checked}[unchecked]{unchecked}, and is passed to Aspell as

--add-tex-command="foo Pop"

Aspell’s TeX filter is already aware of a number of common (LaTeX) commands (but not, e.g., of \citep used for parenthetical citations using natbib). However, this filter is based on C++ code internal to the Aspell data structures, and hence not reusable for other spell checkers: we are currently exploring the possibility to provide similar functionality using R.

Package utils also provides three additional utilities (exported in R 2.12.0 or later): aspell_package_Rd_files(), aspell_package_vignettes(), and aspell_write_personal_dictionary_file(). The first two, obviously for spell checking the Rd files and (Sweave) vignettes in a package, are not only useful for automatically computing all relevant files and choosing the appropriate filter (and, in the case of vignettes, adding a few commands like \Sexpr and \citep to the blank out list), but also because they support a package default mechanism described below.
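For instance, a package maintainer could run these utilities directly on a source package. A minimal sketch, where the directory name "mypkg" is hypothetical and where we assume the true positives have already been corrected so that the remaining results are false positives worth recording:

> ## Hypothetical package directory "mypkg": check its Rd files and
> ## vignettes, then save the remaining (false positive) words as a
> ## personal dictionary file, to be moved to "mypkg/.aspell/".
> a <- aspell_package_Rd_files("mypkg")
> v <- aspell_package_vignettes("mypkg")
> summary(v)
> aspell_write_personal_dictionary_file(v, out = "vignettes.pws")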

To see a simple example, we use Aspell to check the spelling of all Rd files in package stats:

> require("tools")
> drop <-
+   c("\\author", "\\source", "\\references")
> ca <- c("--master=en_US",
+         "--add-extra-dicts=en_GB")
> asa <- aspell_package_Rd_files("stats",
+   drop = drop, program = "aspell",
+   control = ca)

This currently (2011-05-11) finds 1114 possibly mis-spelled words which should hopefully all be false positives (as we regularly use aspell() to check the spelling in R’s Rd files). The most frequent occurrences are

> head(sort(table(asa$Original),
+           decreasing = TRUE))

 quantile dendrogram        AIC univariate
       57         44         42         42
quantiles      ARIMA
       30         23

which should clearly be in every statistician’s dictionary!

Package defaults for spell checking (as well as a personal dictionary file) can be specified in a ‘defaults.R’ file in the ‘.aspell’ subdirectory of the top-level package source directory, and are provided via assignments to suitably named lists, as e.g.

vignettes <-
  list(control = "--add-tex-command=mycmd op")

for vignettes and assigning to Rd_files for Rd files defaults. For the latter, one can also give a drop default specifying additional Rd sections to be blanked out, e.g.,

Rd_files <-
  list(drop = c("\\author",
                "\\source",
                "\\references"))

The default lists can also contain a personal element for specifying a package-specific personal dictionary. A possible work flow for package maintainers is the following: one runs the spell checkers and corrects all true positives, leaving the false positives. Obviously, these should be recorded so that they are not reported the next time when texts are modified and checking is performed again. To do this one re-runs the check utility (e.g., aspell_package_vignettes()) and saves its results into a personal dictionary file using aspell_write_personal_dictionary_file(). This file is then moved to the ‘.aspell’ subdirectory (named, e.g., ‘vignettes.pws’) and then activated via the defaults file using, e.g.,

vignettes <-
  list(control = "--add-tex-command=mycmd op",
       personal = "vignettes.pws")

Following a new round of checking and correcting, the procedure is repeated (with the personal dictionary file temporarily removed before re-running the utility and re-creating the dictionary file; currently, there is no user-level facility for reading/merging personal dictionaries).

A different work flow is to keep the previous spell check results and compare the current one to the most recent previous one. In fact, one of us (KH) uses this approach to automatically generate “check diffs” for the R Rd files, vignettes and Texinfo manuals and correct the true positives found.

A spell check utility R itself does not (yet) provide is one for checking a list of given words, which is easily accomplished as follows:

> aspell_words <- function(x, control = list()) {
+   con <- textConnection(x)
+   on.exit(close(con))
+   aspell(list(con), control = control)
+ }

Of course, this reports useless locations, but is useful for programmatic dictionary lookup:

> print(aspell_words("flavour"), verbose = TRUE)

Word: flavour (:1:1)
Suggestions: flavor flavors favor flour
  flair flours floury Flor flavor's Fleur
  floor flyover flour's

(again using Aspell with an American English dictionary).

Spell checking performance

Should one use Aspell or Hunspell (assuming convenient access to both)? Due to its more powerful morphological algorithms, Hunspell is slower. For vignettes, Aspell performance is clearly superior, given its superior TeX capabilities. For Rd files, we compare the results we obtain for package stats. Similar to the above, we use Hunspell on these Rd files:

> ch <- "-d en_US,en_GB"
> ash <- aspell(files, filter,
+               program = "hunspell", control = ch)

(In fact, this currently triggers a segfault on at least Debian GNU/Linux squeeze on file ‘lowess.Rd’ once the en_GB dictionary is in place: so we really check this file using en_US only, and remove two GB spellings.) We then extract the words and terms (unique words) reported as potentially mis-spelled:

> asaw <- asa$Original; asat <- unique(asaw)
> ashw <- ash$Original; asht <- unique(ashw)
> allt <- unique(c(asat, asht))


This gives 1114 and 355 words and terms for Aspell, and 789 and 296 for Hunspell, respectively. Cross-tabulation yields

> tab <- table(Aspell = allt %in% asat,
+              Hunspell = allt %in% asht)
> tab

       Hunspell
Aspell  FALSE TRUE
  FALSE     0   27
  TRUE     86  269

and the amount of agreement can be measured via the Jaccard dissimilarity coefficient, which we compute “by hand” via

> sum(tab[row(tab) != col(tab)]) / sum(tab)

[1] 0.2958115

which is actually quite large. To gain more insight, we inspect the most frequent terms found only by Aspell

> head(sort(table(asaw[! asaw %in% asht]),
+           decreasing = TRUE), 4L)

  quantile univariate  quantiles         th
        57         42         30         18

and similarly for Hunspell:

> head(sort(table(ashw[! ashw %in% asat]),
+           decreasing = TRUE), 4L)

      's Mallows' hessian     BIC
      21        6       5       4

indicating that Aspell currently does not know quantiles, and some differences in the morphological handling (the apostrophe s terms reported by Hunspell are all instances of possessive forms of words typeset using markup blanked out by the filtering). It might appear that Hunspell has a larger and hence “better” dictionary: but in general, increasing dictionary size will also increase the true negative rate. To inspect commonalities, we use

> head(sort(table(asaw[asaw %in%
+                      intersect(asat, asht)]),
+           decreasing = TRUE),
+      12L)

 dendrogram         AIC       ARIMA        ARMA
         44          42          23          22
   Wilcoxon     Tukey's         GLM         Nls
         16          15          12          10
periodogram         MLE        GLMs        IWLS
         10           9           8           8

re-iterating the fact that a statistics dictionary is needed.

We can also use the above to assess the “raw” performance of the spell checkers, by counting the number of terms and words in the stats Rd file corpus. Using the text mining infrastructure provided by package tm (Feinerer et al., 2008), this can be achieved as follows:

> require("tm")
> texts <-
+   lapply(files, RdTextFilter, drop = drop)
> dtm <- DocumentTermMatrix(
+   Corpus(VectorSource(texts)),
+   control = list(removePunctuation = TRUE,
+                  removeNumbers = TRUE,
+                  minWordLength = 2L))

(the control argument is chosen to match the behavior of the spell checkers). This gives

> dtm

A document-term matrix (304 documents, 3533 terms)

Non-/sparse entries: 31398/1042634
Sparsity           : 97%
Maximal term length: 24

> sum(dtm)

[1] 69910

with 3533 terms and 69910 words, so that Aspell would give “raw” false positive rates of 10.05 and 1.59 percent for terms and words, respectively (assuming that the regular spell checking of the Rd files in the R sources has truly eliminated all mis-spellings).

Towards a dictionary for statistics

Finally, we show how the spell check results can be employed for building a domain-specific dictionary (supplementing the default ones). Similar to the above, we run Aspell on all Rd files in CRAN and base R packages (Rd files are typically written in a rather terse and somewhat technical style, but should nevertheless be a good proxy for current terminology in computational statistics). With results in ac, we build a corpus obtained by splitting the words according to the Rd file in which they were found, and compute its document-term matrix:

> terms <- ac$Original
> files <- ac$File
> dtm_f <- DocumentTermMatrix(
+   Corpus(VectorSource(split(terms, files))),
+   control =
+     list(tolower = FALSE,
+          minWordLength = 2L))

(Again, minWordLength = 2L corresponds to the Aspell default to ignore single-character words only.) This finds 50658 possibly mis-spelled terms in 43287 Rd files, for a total number of 332551 words. As is quite typical in corpus linguistics, the term frequency distribution is heavily left-tailed

> require("slam")
> tf <- col_sums(dtm_f)
> summary(tf)


   Min. 1st Qu.  Median    Mean 3rd Qu.     Max.
  1.000   1.000   2.000   6.565   4.000 3666.000

and can conveniently be visualized by plotting the cumulative frequencies for the terms sorted in decreasing order (see Figure 1):

> tf <- sort(tf, decreasing = TRUE)
> par(las = 1)
> plot(seq_along(tf),
+      cumsum(tf) / sum(tf),
+      type = "l",
+      xlab = "Number of most frequent terms",
+      ylab = "Proportion of words",
+      panel.first = abline(v = 1000, col = "gray"))

Figure 1: The number of possibly mis-spelled words (cumulative term frequency) in the Rd files in the CRAN and R base packages versus the number of terms (unique words), with terms sorted in decreasing frequency. The vertical line indicates 1000 terms.

We see that the 1000 most frequent terms already cover about half of the words. The corpus also reasonably satisfies Heap’s Law (e.g., http://en.wikipedia.org/wiki/Heaps%27_law or Manning et al. (2008)), an empirical law indicating that the vocabulary size V (the number of different terms employed) grows polynomially with text size T (the number of words), as shown in Figure 2:

> Heaps_plot(dtm_f)

(Intercept)           x
  0.2057821   0.8371087

(the regression coefficients are for the model log(V) = α + β log(T)).

Figure 2: An illustration of Heap’s Law for the corpus of possibly mis-spelled words in the Rd files in the CRAN and R base packages, taking the individual files as documents.

To develop a dictionary, the term frequencies need further normalization to weight their “importance”. In fact, to adjust for possible authoring and package size effects, it seems preferable to aggregate the frequencies according to package. We then apply a simple, so-called binary weighting scheme which counts occurrences only once, so that the corresponding aggregate term frequencies become the numbers of packages the terms occurred in. Other weighting schemes are possible (e.g., normalizing by the total number of words checked in the respective package). This results in term frequencies tf with the following characteristics:

> summary(tf)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   1.000   2.038   1.000 682.000

> quantile(tf, c(seq(0.9, 1.0, by = 0.01)))

 90%  91%  92%  93%  94%  95%  96%  97%  98%  99% 100%
   3    3    3    4    4    5    6    7   10   17  682

Again, the frequency distribution has a very heavy left tail. We apply a simple selection logic, and drop terms occurring in less than 0.5% of the packages (currently, about 12.5), leaving 764 terms for possible inclusion into the dictionary. (This also eliminates the terms from the few CRAN packages with non-English Rd files.) For further processing, it is highly advisable to take advantage of available lexical resources. One could programmatically employ the APIs of web search engines: but note that the spell correction facilities these provide are not domain specific. Using WordNet (Fellbaum, 1998), the most prominent lexical database for the English language, does not help too much: it only has entries for about 10% of our terms.

We use the already mentioned Wiktionary (http://www.wiktionary.org/), a multilingual, web-based project to create a free content dictionary which is run by the Wikimedia Foundation (like its sister project Wikipedia) and which has successfully been used for semantic web tasks (e.g., Zesch et al., 2008). Wiktionary provides a simple API at http://en.wiktionary.org/w/api.php for English language queries. One can look up given terms by queries using parameters action=query, format=xml, prop=revision, rvprop=content and titles as a list of the given terms collapsed by a vertical bar (actually, a maximum of 50 terms can be looked up in one query). When doing so via R, one can conveniently use the XML package (Temple Lang, 2010) to parse the result. For our terms, the lookup returns information for 416 terms. However, these can not be accepted unconditionally into the dictionary: in addition to some terms being flagged as mis-spelled or archaic, some terms have possible meanings that were almost certainly not intended (e.g., “wether” as a castrated buck goat or ram). In addition, 2-character terms may need special attention (e.g., ISO language codes most likely not intended). Therefore, we extract suitable terms by serially working through suitable subsets of the Wiktionary results (e.g., terms categorized as statistical or mathematical (unfortunately, only a very few), acronyms or initialisms, and plural forms) and inspecting these for possible inclusion. After these structured eliminations, the remaining terms with query results as well as the ones Wiktionary does not know about (the majority of which actually are mis-spellings) are handled. Quite a few of our terms are surnames, and for now we only include the most frequent ones.

> p <- readLines("en-stats.pws")
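The Wiktionary lookup described above can be scripted along the following lines. This is only an illustration: the helper name is ours, the query parameters are those given in the text, and error handling is omitted.

library(XML)
## Illustrative helper: look up at most 50 terms in one Wiktionary API
## query and return the titles of the pages that were found.
wiktionary_lookup <- function(terms) {
  stopifnot(length(terms) <= 50L)
  url <- paste0("http://en.wiktionary.org/w/api.php",
                "?action=query&format=xml",
                "&prop=revision&rvprop=content",
                "&titles=",
                URLencode(paste(terms, collapse = "|"),
                          reserved = TRUE))
  doc <- xmlParse(url)
  ## pages unknown to Wiktionary carry a 'missing' attribute
  xpathSApply(doc, "//page[not(@missing)]",
              xmlGetAttr, "title")
}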


We finally obtain a dictionary with 443 terms, which we save into ‘en-stats.pws’ using aspell_write_personal_dictionary_file(). We intend to make this easily available from within R at some point in the future.

A few false positives remain systematically: Aspell’s word extraction turns ‘1st’ and ‘2nd’ into ‘st’ and ‘nd’ (and so forth), which clearly are not terms to be included in the dictionary. Similarly, ‘a-priori’ is split, with ‘priori’ reported as mis-spelled (actually, Aspell has some support for accepting “run-together words”). These and other cases could be handled by enhancing the filtering routines before calling the spell checkers.

We re-run the checks on stats using this dictionary:

> asap <-
+   aspell(files, filter, program = "aspell",
+          control = c(ca, "-p ./en-stats.pws"))

(the strange ‘./’ is needed to ensure that Aspell does not look for the personal dictionary in its system paths). This reduces the number of possibly mis-spelled words found to 411, and the “estimated” false positive rates for terms and words to 5.97 and 0.59 percent, respectively.

Many of the remaining false positives could also be eliminated by using the appropriate Rd markup (such as \acronym or \code). With R 2.12.0 or later, one can also use markup in titles. In fact, R 2.12.0 also adds \newcommand tags to the Rd format to allow user-defined macros: using, e.g., \newcommand{\I}{#1} one could wrap content to be ignored for spell checking into \I{}, as currently all user-defined macros are blanked out by the Rd filtering. Similar considerations apply to vignettes when using Aspell’s superior filtering capabilities.

Using a custom dictionary has significantly reduced the false positive rate, demonstrating the usefulness of the approach. Clearly, more work will be needed: modern statistics needs better lexical resources, and a dictionary based on the most frequent spell check false alarms can only be a start. We hope that this article will foster community interest in contributing to the development of such resources, and that refined domain specific dictionaries can be made available and used for improved text analysis with R in the near future.

Bibliography

Ingo Feinerer, Kurt Hornik, and David Meyer. Text mining infrastructure in R. Journal of Statistical Software, 25(5):1–54, 2 2008. ISSN 1548-7660. URL http://www.jstatsoft.org/v25/i05.

Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998. ISBN 978-0-262-06197-1.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. URL http://nlp.stanford.edu/IR-book/information-retrieval-book.html.

Anthony J. Rossini, Richard M. Heiberger, Rodney Sparapani, Martin Mächler, and Kurt Hornik. Emacs Speaks Statistics: A multi-platform, multi-package development environment for statistical analysis. Journal of Computational and Graphical Statistics, 13(1):247–261, 2004.

Duncan Temple Lang. Interface to aspell library, 2005. URL http://www.omegahat.org/Aspell. R package version 0.2-0.

Duncan Temple Lang. XML: Tools for parsing and generating XML within R and S-Plus, 2010. URL http://CRAN.R-project.org/package=XML. R package version 3.1-1.

Torsten Zesch, Christof Müller, and Iryna Gurevych. Using Wiktionary for computing semantic relatedness. In AAAI’08: Proceedings of the 23rd National Conference on Artificial Intelligence, pages 861–866. AAAI Press, 2008. ISBN 978-1-57735-368-3.

Kurt Hornik
Department of Finance, Accounting and Statistics
Wirtschaftsuniversität Wien
Augasse 2–6, 1090 Wien
Austria
[email protected]

Duncan Murdoch
Department of Statistical and Actuarial Sciences
University of Western Ontario
London, Ontario, Canada
[email protected]


Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming

by Haizhou Wang and Mingzhou Song

Abstract The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.

Introduction

Cluster analysis offers a useful way to organize and represent complex data sets. It is used routinely for data analysis in fields such as bioinformatics. The k-means problem is to partition data into k groups such that the sum of squared Euclidean distances to each group mean is minimized. However, the problem is NP-hard in a general Euclidean space, even when the number of clusters k is 2 (Aloise et al., 2009; Dasgupta and Freund, 2009), or when the dimensionality is 2 (Mahajan et al., 2009). The standard iterative k-means algorithm (Lloyd, 1982) is a widely used heuristic solution. The algorithm iteratively calculates the within-cluster sum of squared distances, modifies group membership of each point to reduce the within-cluster sum of squared distances, and computes new cluster centers until local convergence is achieved. The time complexity of this standard k-means algorithm is O(qknp), where q is the number of iterations, k is the number of clusters, n is the sample size, and p is the dimensionality (Manning et al., 2008).

However, the disadvantages of various heuristic k-means algorithms in repeatability, optimality, and runtime may limit their usage in medical and scientific applications. For example, when patients are clustered according to diagnostic measurements on expression of marker genes, it can be critical that a patient consistently falls into the same diagnostic group to receive appropriate treatment, regardless of how many times the clustering is applied. In this case, both cluster repeatability and optimality are important. The result of heuristic k-means clustering, heavily dependent on the initial cluster centers, is neither always optimal nor repeatable. Often one restarts the procedure a number of times to mitigate the problem. However, when k is big, the number of restarts for k-means to approach an optimal solution can be prohibitively high and may lead to a substantial increase in runtime. As clustering is often a preprocessing step in data analysis or modeling, such inadequacy can pose a nuisance for further steps.

In response to this challenge, our objective is to develop a practical and convenient clustering algorithm in R to guarantee optimality in a one-dimensional (1-D) space. The motivation originated from discrete system modeling such as Boolean networks (Kauffman, 1969, 1993) and generalized logical networks (Song et al., 2009), where continuous values must be converted to discrete ones before modeling. Our algorithm follows a comparable dynamic programming strategy used in a 1-D quantization problem to preserve probability distributions (Song et al., 2010), the segmented least squares problem and the knapsack problem (Kleinberg and Tardos, 2006). The objective function in the k-means problem is the sum of squares of within-cluster distances. We present an exact dynamic programming solution with a runtime of O(n^2 k) to the 1-D k-means problem.

We implemented the algorithm in the R package Ckmeans.1d.dp (Song and Wang, 2011) and evaluated its performance by simulation studies. We demonstrate how the standard k-means algorithm may return unstable clustering results, while our method guarantees optimality. Our method is implemented in C++ to take advantage of the practical speedup of a compiled program. This C++ implementation is further wrapped in an R package so that it can be conveniently called in R.

Optimal 1-D clustering by dynamic programming

We first define the k-means problem. Let x_1, ..., x_n be an input array of n numbers sorted in non-descending order. The problem of 1-D k-means clustering is defined as assigning elements of the input 1-D array into k clusters so that the sum of squares of within-cluster distances from each element to its corresponding cluster mean is minimized. We refer to this sum as within-cluster sum of squares, or withinss for short.

We introduce a dynamic programming algorithm to guarantee optimality of clustering in 1-D. We define a sub-problem as finding the minimum withinss of clustering x_1, ..., x_i into m clusters. We record the corresponding minimum withinss in entry D[i,m] of an n+1 by k+1 matrix D. Thus D[n,k] is the minimum withinss value of the original problem. Let j be the index of the smallest number in cluster m in an optimal solution to D[i,m]. It is evident that D[j-1, m-1] must be the optimal withinss for the first j-1 points in m-1 clusters, for otherwise one would have a better solution to D[i,m]. This establishes the optimal substructure for dynamic programming and leads to the recurrence equation

  D[i,m] = \min_{m \le j \le i} \left\{ D[j-1, m-1] + d(x_j, \ldots, x_i) \right\}, \quad 1 \le i \le n, \; 1 \le m \le k,

where d(x_j, ..., x_i) is the sum of squared distances from x_j, ..., x_i to their mean. The matrix is initialized as D[i,m] = 0 when m = 0 or i = 0.

Using the above recurrence, we can obtain D[n,k], the minimum withinss achievable by a clustering of the given data. In the meanwhile, D[n,m] gives the minimum withinss if all n numbers are clustered into m groups. By definition each entry requires O(n^2) time to compute using the given recurrence if d(x_j, ..., x_i) is computed in linear time, resulting in O(n^3 k) total runtime. However, d(x_j, ..., x_i) can be computed in constant time in the recurrence. This is possible because d(x_j, ..., x_i) can be computed progressively based on d(x_{j+1}, ..., x_i) in constant time. Using a general index from 1 to i, we iteratively compute

  d(x_1, \ldots, x_i) = d(x_1, \ldots, x_{i-1}) + \frac{i-1}{i} \, (x_i - \mu_{i-1})^2,
  \qquad \mu_i = \frac{x_i + (i-1)\,\mu_{i-1}}{i},

where \mu_i is the mean of the first i elements. This iterative computation of d(x_j, ..., x_i) reduces the overall runtime of the dynamic programming algorithm to O(n^2 k).

To find a clustering of data with the minimum withinss of D[n,k], we define an auxiliary n by k matrix B to record the index of the smallest number in cluster m:

  B[i,m] = \operatorname*{argmin}_{m \le j \le i} \left\{ D[j-1, m-1] + d(x_j, \ldots, x_i) \right\}, \quad 1 \le i \le n, \; 1 \le m \le k.

Then we backtrack from B[n,k] to obtain the starting and ending indices for all clusters and generate an optimal solution to the k-means problem. The backtracking is done in O(k) time.

The space complexity of the dynamic programming is O(nk), because we use an (n+1) × (k+1) matrix D to record minimum withinss and an n × k matrix B for backtracking.

Implementation

We implemented this dynamic programming algorithm and created an R package Ckmeans.1d.dp. To take advantage of the runtime speed-up of a compiled language over R, an interpreted language, we developed the dynamic programming solution in C++. We compiled the C++ code to a binary dynamic library, which can be called within R through function Ckmeans.1d.dp() provided in the package.
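For illustration, the recurrence and backtracking above can be transcribed almost literally into R. This is a didactic sketch only, not the package's optimized C++ implementation: it recomputes d() in linear time (giving O(n^3 k) rather than O(n^2 k)), and infeasible entries are initialized with Inf so that every point must be assigned to some cluster.

ck1d_sketch <- function(x, k) {
  stopifnot(!is.unsorted(x), k >= 1, k <= length(x))
  n <- length(x)
  withinss <- function(s) sum((s - mean(s))^2)
  ## D[i + 1, m + 1] holds the paper's D[i, m]: x[1..i] in m clusters
  D <- matrix(Inf, n + 1, k + 1)
  D[1, ] <- 0                       # zero points cost nothing
  B <- matrix(1L, n, k)             # index of smallest number in cluster m
  for (m in seq_len(k)) {
    for (i in m:n) {
      js <- m:i                     # candidate start indices of cluster m
      cost <- sapply(js, function(j) D[j, m] + withinss(x[j:i]))
      D[i + 1, m + 1] <- min(cost)
      B[i, m] <- js[which.min(cost)]
    }
  }
  ## backtrack the cluster boundaries from B[n, k]
  cluster <- integer(n)
  i <- n
  for (m in k:1) {
    j <- B[i, m]
    cluster[j:i] <- m
    i <- j - 1L
  }
  list(cluster = cluster, withinss = D[n + 1, k + 1])
}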

Performance evaluation and comparison with the standard k-means

Simulated data sets

We simulated data sets containing clusters using 1-D Gaussian mixture models. In a Gaussian mixture model, the number of components is the true number of clusters. The mean of each Gaussian component is randomly sampled from the uniform distribution from -1 to 1, and the standard deviation of a component is uniformly distributed from 0 to 0.2.

Optimality

We first compared the optimality of Ckmeans.1d.dp() and the standard k-means method on 1-D data. We used the Hartigan and Wong (1979) implementation of k-means, as provided by the kmeans() function in R. The core of the kmeans() function is also implemented in C/C++. As Ckmeans.1d.dp() guarantees optimality of clustering, we define the relative difference from the kmeans() clustering result to the optimal value produced by Ckmeans.1d.dp() as

  \frac{\mathrm{withinss}(\text{k-means}) - \mathrm{withinss}(\text{Ckmeans.1d.dp})}{\mathrm{withinss}(\text{Ckmeans.1d.dp})},

which measures deviation of the k-means result from the optimal value.

The data sets were randomly generated from Gaussian mixture models with 2 to 50 components. We ran Ckmeans.1d.dp() and kmeans() on input data given the correct number of clusters k. The relative difference in withinss from kmeans() to the optimal value produced by Ckmeans.1d.dp() is shown as a function of k in Figure 1.

Although one hopes to get better clusters by restarting k-means several times, the simulation suggests that it still cannot guarantee optimality when there are many clusters in the data. This clearly demonstrates the advantage of Ckmeans.1d.dp() in optimality.

Runtime

We then compared the empirical runtime of Ckmeans.1d.dp() with the standard k-means method on 1-D data. Using 1-D Gaussian mixture models, we generated five input arrays of size 100, 1,000, 10,000, 100,000, and 1,000,000, respectively. The Gaussian mixture models had 2 to 50 components.
The backtrack gests that it still cannot guarantee optimality when is done in O(k) time. there are many clusters in the data. This clearly The space complexity of the dynamic program- demonstrates the advantage of Ckmeans.1d.dp() in ming is O(nk) because we use an (n + 1) (k + 1) optimality. matrix D to record minimum withinss and× an n k matrix B for backtracking. × Runtime Implementation We then compared the empirical runtime of We implemented this dynamic programming algo- Ckmeans.1d.dp() with the standard k-means method rithm and created an R package Ckmeans.1d.dp. To on 1-D data. Using 1-D Gaussian mixture models, we take advantage of the runtime speed up by a compiled generated five input arrays of size 100, 1,000, 10,000, language over R, an interpreted language, we de- 100,000, and 1,000,000, respectively. The Gaussian veloped the dynamic programming solution in C++. mixture models had 2 to 50 components.


Figure 1: The accuracy of kmeans() becomes worse as the number of clusters increases. Accuracy is indicated by the relative difference in withinss from kmeans() to the optimal value returned by Ckmeans.1d.dp(). The input data sets of size 10,000 were sampled from Gaussian mixture models of 2 to 50 components. kmeans() was set to restart 20 times for each data set. (Plot: “Accuracy of standard k-means algorithm relative to optimal solution”; x-axis “Number of clusters k”, y-axis “Relative difference”.)

Figure 2: Runtime as a function of input size for Ckmeans.1d.dp() and kmeans(). kmeans() shows an advantage in runtime without any constraints on optimality. The number of clusters is 2 and there is no restart for k-means. (Plot: “Runtime as a function of input size”; x-axis “Input size n”, y-axis “Runtime (seconds)”, both on log scales.)

We ran both programs on each input array ten times to obtain average empirical runtimes in three different settings. All simulations were run on a desktop computer with an Intel Core i7 860 processor and 8GB memory, running OpenSUSE 11.3 and R 2.11.1.

In the first setting (Figure 2), runtime is obtained as a function of input data size from 100 to 1,000,000 for both Ckmeans.1d.dp() and kmeans() without constraints on optimality. The number of components in the Gaussian mixture models is fixed to 2. According to Figure 2, as the input size increases, the runtime of Ckmeans.1d.dp() increases faster than the runtime of kmeans(). However, the result returned by kmeans() may not be optimal (the restart of kmeans() was set to 1). Without any constraints on optimality, kmeans() demonstrates an advantage in runtime over Ckmeans.1d.dp(), whose runtime is quadratic in n.

In the second setting (Figure 3), runtime is obtained as a function of the number of clusters for Ckmeans.1d.dp() and kmeans() without constraints on optimality. All input data sets are of the same size 10,000 and were generated from Gaussian mixture models with 2 to 25 components.

In Figure 3, the runtime of Ckmeans.1d.dp() increases linearly with the number of clusters k, as does the runtime of kmeans(). kmeans(), without any constraints on optimality, still shows an advantage in absolute runtime over Ckmeans.1d.dp().

[Figure 3: panel “Runtime comparison (k-means not optimal)”; axes: runtime (seconds) vs. number of clusters k]

Figure 3: Comparison of runtime as a function of the number of clusters between Ckmeans.1d.dp() and kmeans(). The input data are mixed Gaussian of fixed size 10,000 but with an increasing number of components, representing the clusters in the data. Although the k-means algorithm took less runtime, it was run with no restart and thus may not be optimal.

In the third setting (Figure 4), we plot the runtime of Ckmeans.1d.dp() and kmeans() as a function of the number of clusters, obtained from exactly the same input data as in the second setting, but with a constraint on the optimality of k-means such that its withinss has a relative difference of less than 10⁻⁶ from the optimal value.
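For readers who want to reproduce a single timing of this kind, here is a minimal sketch of one run in the second setting (our illustration only; the authors' full simulation scripts are linked later in the article, and timings are machine-dependent):

library(Ckmeans.1d.dp)
set.seed(1)
n <- 10000; k <- 25
centers <- runif(k, -1, 1)            # component means, as in the simulation
sds     <- runif(k, 0, 0.2)           # component standard deviations
comp    <- sample(k, n, replace = TRUE)
x       <- rnorm(n, mean = centers[comp], sd = sds[comp])
system.time(Ckmeans.1d.dp(x, k))      # optimal result in a single run
system.time(kmeans(x, k))             # heuristic, single start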


[Figure 4: panel “Runtime comparison (k-means near optimal)”; axes: runtime (seconds) vs. number of clusters k; restart counts printed next to the kmeans() curve range from 1 up to 756,801]

Figure 4: Runtime as a function of the number of clusters, with kmeans() restarted until it reached a relative difference of less than 10⁻⁶ to the optimal value. The sample size is always 10,000. The number next to each blue square is the number of restarts needed for kmeans(). The runtime of kmeans() appears to grow exponentially and that of Ckmeans.1d.dp() always grows linearly with the number of clusters.

Since Ckmeans.1d.dp() returns an optimal result in only one run, its runtime is the same as in the second setting. On the other hand, kmeans() was restarted a number of times in order to approach an optimal solution within the given tolerance. The number of restarts increases substantially to produce a nearly optimal solution as the number of clusters k increases. As shown in Figure 4, the runtime of Ckmeans.1d.dp() increases linearly with k, but the runtime of kmeans() appears to increase exponentially with k. The runtime of kmeans() almost always far exceeds that of Ckmeans.1d.dp() when k is above 15. This suggests an advantage of Ckmeans.1d.dp() over kmeans() in both runtime and optimality when k is big.

The R source code for the simulation studies is available from http://www.cs.nmsu.edu/~joemsong/R-Journal/Ckmeans-Simulation.zip.

Introduction to the R package Ckmeans.1d.dp

The Ckmeans.1d.dp package implements the dynamic programming algorithm for 1-D clustering that we described above. In the package, the Ckmeans.1d.dp() function performs the clustering on a 1-D input vector using a given number of clusters.

The following example illustrates how to use the package. Figure 5 visualizes the input data and the cluster result obtained by Ckmeans.1d.dp() in this example.

# a one-dimensional example
# with a two-component Gaussian mixture model
x <- rnorm(50, mean = 1, sd = 0.3)
x <- append(x, rnorm(50, sd = 0.3))
result <- Ckmeans.1d.dp(x, 2)
plot(x, col = result$cluster)
abline(h = result$centers, col = 1:2, pch = 8,
       cex = 2)

[Figure 5: scatter plot of x against its index, points colored by cluster, with two horizontal lines at the cluster centers]

Figure 5: Two clusters obtained from applying function Ckmeans.1d.dp() on input x sampled from a Gaussian mixture model with two components. The horizontal axis is the index number of each point in x. The two clusters are indicated using red and black circles. The two horizontal lines represent the centers of each cluster.

Summary

The main purpose of our work is to develop an exact solution to 1-D clustering in a practical amount of time, as an alternative to heuristic k-means algorithms. We thus developed the Ckmeans.1d.dp package using dynamic programming to guarantee clustering optimality in O(n²k) time. The algorithm is implemented in C++ with an interface to the R language. It is provided as a package to R and has been tested under recent versions of Windows, Linux and MacOS. By simulation studies, we demonstrated its advantage over the standard k-means algorithm when optimality is required.

There are two limitations of our Ckmeans.1d.dp method: it only applies to 1-D data, and its quadratic runtime in the sample size may not be acceptable in all applications.

However, where applicable, the impact of our method is its optimality and repeatability. As our simulation studies show, when the number of clusters is big, our method not only guarantees optimality but also has a fast runtime. In this situation, our method is preferred.

We applied our Ckmeans.1d.dp package to quantize biological data in the DREAM Challenges (Prill et al., 2010) for qualitative dynamic modeling using generalized logical networks (Song et al., 2009).

It is of great interest whether the 1-D dynamic programming strategy can be extended to multi-dimensional spaces so that clustering can be done in time polynomial in k. However, this seems to us to remain open indefinitely.

Acknowledgement

We gratefully acknowledge the financial support of the reported work by grant number HRD-0420407 from the U.S. National Science Foundation.

Bibliography

D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn., 75(2):245–248, 2009.

S. Dasgupta and Y. Freund. Random projection trees for vector quantization. IEEE T. Inform. Theory, 55(7):3229–3242, 2009.

J. A. Hartigan and M. A. Wong. Algorithm AS 136: A k-means clustering algorithm. J. Roy. Stat. Soc. C-App., 28(1):100–108, 1979.

S. A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theoretical Biology, 22(3):437–467, 1969.

S. A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, USA, June 1993.

J. Kleinberg and E. Tardos. Algorithm Design. Addison-Wesley, USA, 2006.

S. P. Lloyd. Least squares quantization in PCM. IEEE T. Inform. Theory, 28:129–137, 1982.

M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem is NP-hard. In WALCOM '09: Proceedings of the 3rd International Workshop on Algorithms and Computation, pages 274–285, Berlin, Heidelberg, 2009. Springer-Verlag.

C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008. ISBN 0521865719, 9780521865715.

R. J. Prill, D. Marbach, J. Saez-Rodriguez, P. K. Sorger, L. G. Alexopoulos, X. Xue, N. D. Clarke, G. Altan-Bonnet, and G. Stolovitzky. Towards a rigorous assessment of systems biology models: The DREAM3 challenges. PLoS ONE, 5(2):e9202, 02 2010.

M. Song and H. Wang. Ckmeans.1d.dp: Optimal distance-based clustering for one-dimensional data, 2011. URL http://CRAN.R-project.org/package=Ckmeans.1d.dp. R package version 2.2.

M. Song, C. K. Lewis, E. R. Lance, E. J. Chesler, R. K. Yordanova, M. A. Langston, K. H. Lodowski, and S. E. Bergeson. Reconstructing generalized logical networks of transcriptional regulation in mouse brain from temporal gene expression data. EURASIP J. Bioinform. Syst. Biol., 2009:1–13, 2009.

M. Song, R. M. Haralick, and S. Boissinot. Efficient and exact maximum likelihood quantisation of genomic features using dynamic programming. Int. J. Data Min. Bioinform., 4(2):123–141, 2010.

Mingzhou Song
Department of Computer Science
New Mexico State University
United States
[email protected]

Haizhou Wang
Department of Computer Science
New Mexico State University
United States
[email protected]


Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions

by Taylor B. Arnold and John W. Emerson

Abstract Methodology extending nonparametric goodness-of-fit tests to discrete null distributions has existed for several decades. However, modern statistical software has generally failed to provide this methodology to users. We offer a revision of R's ks.test() function and a new cvm.test() function that fill this need in the R language for two of the most popular nonparametric goodness-of-fit tests. This paper describes these contributions and provides examples of their usage. Particular attention is given to various numerical issues that arise in their implementation.

Introduction

Goodness-of-fit tests are used to assess whether data are consistent with a hypothesized null distribution. The χ² test is the best-known parametric goodness-of-fit test, while the most popular nonparametric tests are the classic test proposed by Kolmogorov and Smirnov, followed closely by several variants on Cramér-von Mises tests.

In their most basic forms, these nonparametric goodness-of-fit tests are intended for continuous hypothesized distributions, but they have also been adapted for discrete distributions. Unfortunately, most modern statistical software packages and programming environments have failed to incorporate these discrete versions. As a result, researchers would typically rely upon the χ² test or a nonparametric test designed for a continuous null distribution. For smaller sample sizes, in particular, both of these choices can produce misleading inferences.

This paper presents a revision of R's ks.test() function and a new cvm.test() function to fill this void for researchers and practitioners in the R environment. This work was motivated by the need for such goodness-of-fit testing in a study of Olympic figure skating scoring (Emerson and Arnold, 2011). We first present overviews of the theory and general implementation of the discrete Kolmogorov-Smirnov and Cramér-von Mises tests. We discuss the particular implementation of the tests in R and provide examples. We conclude with a short discussion, including the state of existing continuous and two-sample Cramér-von Mises testing in R.

Kolmogorov-Smirnov test

Overview

The most popular nonparametric goodness-of-fit test is the Kolmogorov-Smirnov test. Given the cumulative distribution function F_0(x) of the hypothesized distribution and the empirical distribution function F_data(x) of the observed data, the test statistic is given by

    D = \sup_x \left| F_0(x) - F_{data}(x) \right|   (1)

When F_0 is continuous, the distribution of D does not depend on the hypothesized distribution, making this a computationally attractive method. Slakter (1965) offers a standard presentation of the test and its performance relative to other algorithms. The test statistic is easily adapted for one-sided tests. For these, the absolute value in (1) is discarded and the tests are based either on the supremum of the remaining difference (the ‘greater’ testing alternative) or on replacing the supremum with a negative infimum (the ‘lesser’ testing alternative). Tabulated p-values have been available for these tests since 1933 (Kolmogorov, 1933).

The extension of the Kolmogorov-Smirnov test to non-continuous null distributions is not straightforward. The formula of the test statistic D remains unchanged, but its distribution is much more difficult to obtain; unlike the continuous case, it depends on the null model. Use of the tables associated with continuous hypothesized distributions results in conservative p-values when the null distribution is discontinuous (see Slakter (1965), Goodman (1954), and Massey (1951)). In the early 1970s, Conover (1972) developed the method implemented here for computing exact one-sided p-values in the case of discrete null distributions. The method developed in Gleser (1985) is used to provide exact p-values for two-sided tests.

Implementation

The implementation of the discrete Kolmogorov-Smirnov test involves two steps. First, the particular test statistic is calculated (corresponding to the desired one-sided or two-sided test). Then, the p-value for that particular test statistic may be computed.

The form of the test statistic is the same as in the continuous case; it would seem that no additional work would be required for the implementation, but this is not the case. Consider two non-decreasing functions f and g, where the function f is a step function with jumps on the set {x_1, ..., x_N} and g is continuous (the classical Kolmogorov-Smirnov situation).

In order to determine the supremum of the difference between these two functions, notice that

    \sup_x | f(x) - g(x) |
      = \max_i \left[ \max\left( |g(x_i) - f(x_i)|,\ \lim_{x \to x_i} |g(x) - f(x_{i-1})| \right) \right]   (2)
      = \max_i \left[ \max\left( |g(x_i) - f(x_i)|,\ |g(x_i) - f(x_{i-1})| \right) \right]   (3)

Computing the maximum over these 2N values (with f equal to F_data(x) and g equal to F_0(x) as defined above) is clearly the most efficient way to compute the Kolmogorov-Smirnov test statistic for a continuous null distribution. When the function g is not continuous, however, equality (3) does not hold in general, because we cannot replace \lim_{x \to x_i} g(x) with the value g(x_i).

If it is known that g is a step function, it follows that for some small ε,

    \sup_x | f(x) - g(x) | = \max_i \left( |g(x_i) - f(x_i)|,\ |g(x_i - \epsilon) - f(x_{i-1})| \right)   (4)

where the discontinuities in g are more than some distance ε apart. This, however, requires knowledge that g is a step function as well as of the nature of its support (specifically, the break-points). As a result, we implement the Kolmogorov-Smirnov test statistic for discrete null distributions by requiring the complete specification of the null distribution.

Having obtained the test statistic, the p-value must then be calculated. When an exact p-value is required for smaller sample sizes, the methodology in Conover (1972) is used for one-sided tests. For two-sided tests, the methods presented in Gleser (1985) lead to exact two-sided p-values. This requires the calculation of rectangular probabilities for uniform order statistics as discussed by Niederhausen (1981). Full details of the calculations are contained in the source code of our revised function ks.test() and in the papers of Conover and Gleser.

For larger sample sizes (or when requested for smaller sample sizes), the classical Kolmogorov-Smirnov test is used and is known to produce conservative p-values for discrete distributions; the revised ks.test() supports estimation of p-values via simulation if desired.
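As an illustration of these 2N comparisons (a standalone sketch, not the dgof internals), the two-sided statistic for a fully specified discrete null can be computed as follows, where y is the null cdf supplied as an "ecdf" or "stepfun" object and ks_stat_discrete is our own name for the helper:

ks_stat_discrete <- function(x, y) {
  pts <- sort(unique(c(knots(y), x)))        # all candidate jump points
  Fdata <- ecdf(x)
  eps <- if (length(pts) > 1) min(diff(pts)) / 2 else 1
  max(abs(Fdata(pts) - y(pts)),              # differences at each jump
      abs(Fdata(pts - eps) - y(pts - eps)))  # and just below each jump
}
ks_stat_discrete(c(0, 1), ecdf(c(0, 1)))     # 0, as argued in the Examples below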
Cramér-von Mises tests

Overview

While the Kolmogorov-Smirnov test may be the most popular of the nonparametric goodness-of-fit tests, Cramér-von Mises tests have been shown to be more powerful against a large class of alternative hypotheses. The original test was developed by Harald Cramér and Richard von Mises (Cramér, 1928; von Mises, 1928) and further adapted by Anderson and Darling (1952) and Watson (1961). The original test statistic, W², Anderson's A², and Watson's U² are:

    W^2 = n \cdot \int_{-\infty}^{\infty} \left[ F_{data}(x) - F_0(x) \right]^2 \, dF_0(x)   (5)

    A^2 = n \cdot \int_{-\infty}^{\infty} \frac{\left[ F_{data}(x) - F_0(x) \right]^2}{F_0(x)\left[ 1 - F_0(x) \right]} \, dF_0(x)   (6)

    U^2 = n \cdot \int_{-\infty}^{\infty} \left[ F_{data}(x) - F_0(x) - W \right]^2 \, dF_0(x)   (7)

As with the original Kolmogorov-Smirnov test statistic, these all have test statistic null distributions which are independent of the hypothesized continuous models. The W² statistic was the original test statistic. The A² statistic was developed by Anderson in the process of generalizing the test for the two-sample case. Watson's U² statistic was developed for distributions which are cyclic (with an ordering to the support but no natural starting point); it is invariant to cyclic reordering of the support. For example, a distribution on the months of the year could be considered cyclic.

It has been shown that these tests can be more powerful than Kolmogorov-Smirnov tests against certain deviations from the hypothesized distribution. They all involve integration over the whole range of data, rather than use of a supremum, so they are best suited for situations where the true alternative distribution deviates a little over the whole support rather than having large deviations over a small section of the support. Stephens (1974) offers a comprehensive analysis of the relative powers of these tests.

Generalizations of the Cramér-von Mises tests to discrete distributions were developed in Choulakian et al. (1994). As with the Kolmogorov-Smirnov test, the forms of the test statistics are unchanged, and the null distributions of the test statistics are again hypothesis-dependent. Choulakian et al. (1994) does not offer finite-sample results, but rather shows that the asymptotic distributions of the test statistics under the null hypothesis each involve consideration of a weighted sum of independent chi-squared variables (with the weights depending on the particular null distribution).

Implementation

Calculation of the three test statistics is done using the matrix algebra given by Choulakian et al. (1994).

The only notable difficulty in the implementation of the discrete form of the tests involves calculating the percentiles of the weighted sum of chi-squares,

    Q = \sum_{i=1}^{p} \lambda_i \chi^2_{i,1\,\mathrm{df}}   (8)

where p is the number of elements in the support of the hypothesized distribution. Imhof (1961) provides a method for obtaining the distribution of Q, easily adapted for our case because the chi-squared variables have only one degree of freedom. The exact formula given for the distribution function of Q is

    P\{Q \ge x\} = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \frac{\sin[\theta(u,x)]}{u\,\rho(u)} \, du   (9)

for continuous functions θ(·, x) and ρ(·) depending on the weights λ_i.

There is no analytic solution to the integral in (9), so the integration is accomplished numerically. This seems fine in most situations we considered, but numerical issues appear in the regime of large test statistics x (or, equivalently, small p-values). The function θ(·, x) is linear in x; as the test statistic grows, the corresponding periodicity of the integrand decreases and the approximation becomes unstable. As an example of this numerical instability, the red line plotted in Figure 1 shows the non-monotonicity of the numerical evaluation of equation (9) for a null distribution that is uniform on the set {1, 2, 3}.

[Figure 1: p-value against test statistic; legend: unstable asymptotic p-value from (9), bound given by (11), bound given by (13)]

Figure 1: Plot of calculated p-values for given test statistics using numerical integration (red) compared to the conservative chi-squared bound (dashed blue) and the Markov inequality bound (dashed green). The null distribution is uniform on the set {1, 2, 3} in this example. The sharp variations in the calculated p-values are a result of numerical instabilities, and the true p-values are bounded by the dashed curves.

We resolve this problem by using a combination of two conservative approximations to avoid the numerical instability. First, consider the following inequality:

    P\left( \sum_{i=1}^{p} \lambda_i \chi^2_1 \ge x \right) \le P\left( \lambda_{\max} \sum_{i=1}^{p} \chi^2_1 \ge x \right)   (10)
      = P\left( \chi^2_p \ge \frac{x}{\lambda_{\max}} \right)   (11)

The values for the weighted sum can thus be bounded using a simple transformation and a chi-squared distribution of a higher degree of freedom. Second, consider the Markov inequality:

    P\left( \sum_{i=1}^{p} \lambda_i \chi^2_1 \ge x \right) \le E\left[ \exp\left( t \sum_{i=1}^{p} \lambda_i Z_i^2 \right) \right] \exp(-tx)   (12)
      = \frac{\exp(-tx)}{\prod_{i=1}^{p} \sqrt{1 - 2t\lambda_i}}   (13)

where the bound can be minimized over t ∈ (0, 1/(2λ_max)). The upper bounds for the p-value given by (11) and (13) are both calculated, and the smaller is used in cases where the numerical instability of (9) may be a concern.

The original formulation, numerical integration of (9), is preferable for most p-values, while the upper bound described above is used for smaller p-values (smaller than 0.001, based on our observations of the numerical instability of the original formulation). Figure 1 shows the bounds with the blue and green dashed lines; values in red exceeding the bounds are a result of the numerical instability. Although it would be preferable to determine the use of the bound based on values of the test statistic rather than the p-value, the range of “extreme” values of the test statistic varies with the hypothesized distribution.
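For concreteness, a small R sketch of the two bounds follows (a direct transcription of (11) and (13); the function name and interface are ours, not part of dgof, and lambda is assumed to hold positive weights):

pval_upper_bound <- function(x, lambda) {
  p <- length(lambda)
  b1 <- pchisq(x / max(lambda), df = p, lower.tail = FALSE)       # bound (11)
  f  <- function(t) exp(-t * x) / sqrt(prod(1 - 2 * t * lambda))  # rhs of (13)
  b2 <- optimize(f, c(0, 0.999 / (2 * max(lambda))))$objective    # bound (13)
  min(b1, b2)  # the smaller of the two bounds is used
}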

Kolmogorov-Smirnov and Cramér-von Mises tests in R

Functions ks.test() and cvm.test() are provided for convenience in package dgof, available on CRAN. Function ks.test() offers a revision of R's Kolmogorov-Smirnov function ks.test() from recommended package stats; cvm.test() is a new function for Cramér-von Mises tests.

The revised ks.test() function supports one-sample tests for discrete null distributions by allowing the second argument, y, to be an empirical cumulative distribution function (an R function with class "ecdf") or an object of class "stepfun" specifying a discrete distribution. As in the original version of ks.test(), the presence of ties in the data (the first argument, x) generates a warning unless y describes a discrete distribution. If the sample size is less than or equal to 30, or when exact = TRUE, exact p-values are provided (a warning is issued when the sample size is greater than 30 due to possible numerical instabilities).
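Regarding the second argument, a discrete null placing probability 1/3 on each of the values 1, 2 and 3 can, for instance, be supplied in either form (a short sketch using base R; the object names are ours):

f0a <- ecdf(1:3)                        # class "ecdf"
f0b <- stepfun(1:3, c(0, 1/3, 2/3, 1))  # class "stepfun"; cdf jumps at 1, 2, 3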


As with ks.test(), cvm.test() returns an object of One-sample Kolmogorov-Smirnov test class "htest". data: x D^+ = 0.04, p-value = 0.7731 Examples alternative hypothesis: the CDF of x lies above the null hypothesis Consider a toy example with observed data of length In contrast, the option exact=FALSE results in the p- 2 (specifically, the values 0 and 1) and a hypothesized value obtained by applying the classical Kolmogorov- null distribution that places equal probability on the Smirnov test, resulting in a conservative p-value: values 0 and 1. With the current ks.test() function in R (which, admittedly, doesn’t claim to handle discrete > dgof::ks.test(x, ecdf(1:10), distributions), the reported p-value, 0.5, is clearly in- + alternative = "g", exact = FALSE) correct: One-sample Kolmogorov-Smirnov test > stats::ks.test(c(0, 1), ecdf(c(0, 1))) data: x One-sample Kolmogorov-Smirnov test D^+ = 0.04, p-value = 0.9231 alternative hypothesis: data: c(0, 1) the CDF of x lies above the null hypothesis D = 0.5, p-value = 0.5 alternative hypothesis: two-sided The p-value may also be estimated via a Monte Carlo simulation: Instead, the value of D given in equation (1) should > dgof::ks.test(x, ecdf(1:10), be 0 and the associated p-value should be 1. Our re- + alternative = "g", vision of ks.test() fixes this problem when the user + simulate.p.value = TRUE, B = 10000) provides a discrete distribution: One-sample Kolmogorov-Smirnov test > library(dgof) > dgof::ks.test(c(0, 1), ecdf(c(0, 1))) data: x D^+ = 0.04, p-value = 0.7717 One-sample Kolmogorov-Smirnov test alternative hypothesis: the CDF of x lies above the null hypothesis data: c(0, 1) D = 0, p-value = 1 A different toy example shows the dangers of alternative hypothesis: two-sided using R’s existing ks.test() function with discrete data: Next, we simulate a sample of size 25 from the dis- crete uniform distribution on the integers 1,2,...,10 > dgof::ks.test(rep(1, 3), ecdf(1:3)) { }


        One-sample Kolmogorov-Smirnov test

data:  rep(1, 3)
D = 0.6667, p-value = 0.07407
alternative hypothesis: two-sided

If, instead, either exact = FALSE is used with the new ks.test() function, or if the original stats::ks.test() is used, the reported p-value is 0.1389 even though the test statistic is the same.

We demonstrate the Cramér-von Mises tests with the same simulated data:

> cvm.test(x, ecdf(1:10))

        Cramer-von Mises - W2

data:  x
W2 = 0.057, p-value = 0.8114
alternative hypothesis: Two.sided

> cvm.test(x, ecdf(1:10), type = "A2")

        Cramer-von Mises - A2

data:  x
A2 = 0.3969, p-value = 0.75
alternative hypothesis: Two.sided

We conclude with a toy cyclical example showing that the test is invariant to cyclic reordering of the support:

> set.seed(1)
> y <- sample(1:4, 20, replace = TRUE)
> cvm.test(y, ecdf(1:4), type = 'U2')

        Cramer-von Mises - U2

data:  y
U2 = 0.0094, p-value = 0.945
alternative hypothesis: Two.sided

> z <- y %% 4 + 1
> cvm.test(z, ecdf(1:4), type = 'U2')

        Cramer-von Mises - U2

data:  z
U2 = 0.0094, p-value = 0.945
alternative hypothesis: Two.sided

In contrast, the Kolmogorov-Smirnov or the standard Cramér-von Mises tests produce different results after such a reordering. For example, the default Cramér-von Mises test yields p-values of 0.8237 and 0.9577 with the original and transformed data y and z, respectively.

Discussion

This paper presents the implementation of several nonparametric goodness-of-fit tests for discrete null distributions. In some cases the p-values are known to be exact. In others, conservativeness in special cases with small p-values has been established. Although we provide for Monte Carlo simulated p-values with the new ks.test(), no simulations may be necessary for these methods; they were generally developed during an era when extensive simulations may have been prohibitively expensive or time-consuming. However, this does raise the possibility that alternative tests relying upon modern computational abilities could provide even greater power in certain situations, a possible avenue for future work.

In the continuous setting, both the Kolmogorov-Smirnov and the Cramér-von Mises tests have two-sample analogues. When data are observed from two processes or sampled from two populations, the hypothesis tested is whether they came from the same (unspecified) distribution. In the discrete case, however, the null distribution of the test statistic depends on the underlying probability model, as discussed by Walsh (1963). Such an extension would require the specification of a null distribution, which generally goes unstated in two-sample goodness-of-fit tests. We note that Dufour and Farhat (2001) explored two-sample goodness-of-fit tests for discrete distributions using a permutation test approach.

Further generalizations of goodness-of-fit tests for discrete distributions are described in the extended study of de Wet and Venter (1994). There are existing R packages for certain types of Cramér-von Mises goodness-of-fit tests for continuous distributions. Functions implemented in package nortest (Gross, 2006) focus on the composite hypothesis of normality, while package ADGofTest (Bellosta, 2009) provides the Anderson-Darling variant of the test for general continuous distributions. Packages CvM2SL1Test (Xiao and Cui, 2009a) and CvM2SL2Test (Xiao and Cui, 2009b) provide two-sample Cramér-von Mises tests with continuous distributions. Package cramer (Franz, 2006) offers a multivariate Cramér test for the two-sample problem. Finally, we note that the discrete goodness-of-fit tests discussed in this paper do not allow the estimation of parameters of the hypothesized null distributions (see Lockhart et al. (2007) for an example).

Acknowledgement

We are grateful to Le-Minh Ho for his implementation of the algorithm described in Niederhausen (1981) for calculating rectangular probabilities of uniform order statistics. We also appreciate the feedback from two reviewers of this paper as well as from a reviewer of Emerson and Arnold (2011).

Bibliography


T. W. Anderson and D. A. Darling. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Annals of Mathematical Statistics, 23:193–212, 1952.

C. J. G. Bellosta. ADGofTest: Anderson-Darling GoF test, 2009. URL http://CRAN.R-project.org/package=ADGofTest. R package version 0.1.

V. Choulakian, R. A. Lockhart, and M. A. Stephens. Cramér-von Mises statistics for discrete distributions. The Canadian Journal of Statistics, 22(1):125–137, 1994.

W. J. Conover. A Kolmogorov goodness-of-fit test for discontinuous distributions. Journal of the American Statistical Association, 67(339):591–596, 1972.

H. Cramér. On the composition of elementary errors. Skand. Akt., 11:141–180, 1928.

T. de Wet and J. Venter. Asymptotic distributions for quadratic forms with applications to tests of fit. Annals of Statistics, 2:380–387, 1994.

J.-M. Dufour and A. Farhat. Exact nonparametric two-sample homogeneity tests for possibly discrete distributions. Unpublished manuscript, October 2001. URL http://hdl.handle.net/1866/362.

J. W. Emerson and T. B. Arnold. Statistical sleuthing by leveraging human nature: A study of Olympic figure skating. The American Statistician, 2011. To appear.

C. Franz. cramer: Multivariate nonparametric Cramer-Test for the two-sample-problem, 2006. R package version 0.8-1.

L. J. Gleser. Exact power of goodness-of-fit tests of Kolmogorov type for discontinuous distributions. Journal of the American Statistical Association, 80(392):954–958, 1985.

L. A. Goodman. Kolmogorov-Smirnov tests for psychological research. Psychological Bulletin, 51:160–168, 1954.

J. Gross. nortest: Tests for Normality, 2006. R package version 1.0.

J. P. Imhof. Computing the distribution of quadratic forms in normal variables. Biometrika, 48:419–426, 1961.

A. Kolmogorov. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4:83–91, 1933.

R. Lockhart, J. Spinelli, and M. Stephens. Cramér-von Mises statistics for discrete distributions with unknown parameters. Canadian Journal of Statistics, 35(1):125–133, 2007.

F. Massey. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46:68–78, 1951.

H. Niederhausen. Sheffer polynomials for computing exact Kolmogorov-Smirnov and Rényi type distributions. The Annals of Statistics, 9(5):923–944, 1981. ISSN 0090-5364.

M. J. Slakter. A comparison of the Pearson chi-square and Kolmogorov goodness-of-fit tests with respect to validity. Journal of the American Statistical Association, 60(311):854–858, 1965.

M. A. Stephens. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347):730–737, 1974.

R. E. von Mises. Wahrscheinlichkeit, Statistik und Wahrheit. Julius Springer, Vienna, Austria, 1928.

J. E. Walsh. Bounded probability properties for Kolmogorov-Smirnov and similar statistics for discrete data. Annals of the Institute of Statistical Mathematics, 15:153–158, 1963.

G. S. Watson. Goodness of fit tests on the circle. Biometrika, 48:109–114, 1961.

Y. Xiao and Y. Cui. CvM2SL1Test: L1-version of Cramer-von Mises Two Sample Tests, 2009a. R package version 0.0-2.

Y. Xiao and Y. Cui. CvM2SL2Test: Cramer-von Mises Two Sample Tests, 2009b. R package version 0.0-2.

Taylor B. Arnold
Yale University
24 Hillhouse Ave.
New Haven, CT 06511 USA
[email protected]

John W. Emerson
Yale University
24 Hillhouse Ave.
New Haven, CT 06511 USA
[email protected]


Using the Google Visualisation API with R

by Markus Gesmann and Diego de Castillo

Abstract The googleVis package provides an interface between R and the Google Visualisation API to create interactive charts which can be embedded into web pages. The best known of these charts is probably the Motion Chart, popularised by Hans Rosling in his TED talks^1. With the googleVis package users can easily create web pages with interactive charts based on R data frames and display them either via the local R HTTP help server or within their own sites.

[…] between data analysts and others, the authors of this article started the development of the googleVis package (Gesmann and de Castillo, 2011).

Google Visualisation API

The Google Visualisation API allows users to create interactive charts as part of Google documents, spreadsheets and web pages. This text will focus on the usage of the API as part of web sites.

The Google Public Data Explorer (http://www.google.com/publicdata/home) provides a good example, demonstrating the use of interactive charts and how they can help to analyse data. The charting data can either be embedded into the html file or read dynamically. The key to the Google Visualisation API is that the data is structured in a “DataTable”, and this is where the googleVis package helps. It uses the functionality of the RJSONIO package (Temple Lang, 2011) to transform R data frames into JSON^3 objects as the basis for a DataTable.

As an example, we will look at the html code of a motion chart from Google's visualisation gallery^2, which generates output similar to Figure 1:

[36-line html listing from Google's visualisation gallery; not recoverable from this copy]

The code and data are processed and rendered in the browser and are not submitted to any server^4. You will notice that the above html code has five generic parts^5:

• references to Google's AJAX (l. 4) and Visualisation API (ll. 7–8),

• data to visualise as a DataTable (ll. 11–24),

• an instance call to create the chart (ll. 25–26),

• a method call to draw the chart including options, shown here as width and height (l. 27),

• an HTML <div> element to add the chart to the page (ll. 32–34).

^1 http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
^2 http://code.google.com/apis/visualization/documentation/index.html
^3 http://www.json.org/
^4 http://code.google.com/apis/visualization/documentation/gallery/motionchart.html#Data_Policy
^5 For more details see http://code.google.com/apis/chart/interactive/docs/adding_charts.html


[Annotated screenshot: the Motion Chart controls — chart type (bubble, bar or line chart), colour and size indicators, linear/log axis scales, variable selection, x- and y-axis indicators (with time selectable on the x-axis), trails to follow a selected item, animation speed, play/stop, settings (opacity of non-selected items and further advanced options), and zoom in/out. Adapted from www.gapminder.org, which used an original idea by www.juicygeography.co.uk]

Figure 1: Overview of a Google Motion Chart. Screenshot of the output of plot(gvisMotionChart(Fruits, idvar = ’Fruit’, timevar = ’Year’))

These principles hold true for most of the interactive charts of the Google Visualisation API. However, before you use the API you should read the Google Visualisation API Terms of Service^6 and the Google Maps/Google Earth APIs Terms of Service^7.

The googleVis package

The googleVis package provides an interface between R and the Google Visualisation API. The functions of the package allow the user to visualise data stored in R data frames with the Google Visualisation API.

Version 0.2.12 of the package provides interfaces to Motion Charts, Annotated Time Lines, Geo Maps, Maps, Geo Charts, Intensity Maps, Tables, Gauges, and Tree Maps, as well as Line-, Bar-, Column-, Area-, Combo-, Scatter-, Candlestick-, Pie- and Org Charts; see Figure 2 for some examples.

The output of a googleVis function is html code that contains the data and references to JavaScript functions hosted by Google. A browser with an Internet connection is required to view the output, and for Motion Charts, Geo Maps and Annotated Time Lines also Flash. The actual chart is rendered in the browser.

Please note that Flash charts may not work when loaded as a local file due to security settings, and therefore may need to be displayed via a web server. Fortunately, R comes with an internal HTTP server which allows the googleVis package to display pages locally. Other options are to use the R.rsp package or RApache (Horner, 2011) with brew (Horner, 2011). Both R.rsp and brew have the capability to extract and execute R code from html code, similar to the approach taken by Sweave (Leisch, 2002) for LaTeX.

^6 http://code.google.com/apis/visualization/terms.html
^7 http://code.google.com/apis/maps/terms.html


Figure 2: Screenshot of some of the outputs of demo(googleVis). Clockwise from top left: gvisMotionChart, gvisAnnotatedTimeLine, gvisGeoMap, gvisTreeMap, gvisTable, and gvisMap.

The individual functions of the googleVis package are documented in detail in the help pages and package vignette. Here we will cover only the principles of the package.

As an example we will show how to generate a Motion Chart as displayed in Figure 1. It works similarly for the other APIs. Further examples are covered in the demos of the googleVis package and on the project Google Code site.

The design of the visualisation functions is fairly generic. The name of the visualisation function is 'gvis' followed by the chart type. Thus for the Motion Chart we have:

gvisMotionChart(data, idvar = 'id',
                timevar = 'date', options = list())

Here data is the input data.frame and the arguments idvar and timevar specify the column names of the id variable and time variable for the plot, while display options are set in an optional list. The options and data requirements follow those of the Google Visualisation API and are documented in the help pages, see help('gvisMotionChart').

The output of a googleVis function is a list of lists (a nested list) containing information about the chart type, chart id, and the html code in a sub-list split into header, chart, caption and footer.

The idea behind this concept is that users get a complete web page while at the same time they can extract only specific parts, such as the chart. This is particularly helpful if the package functions are used in solutions where the user wants to feed the visualisation output into other sites, or would like to embed them into rsp-pages, or use RApache or Google Gadgets.

The output of a googleVis function will be of class "gvis" and "list". Generic print (print.gvis) and plot (plot.gvis) methods exist to ease the handling of such objects.

To illustrate the concept we shall create a motion chart using the Fruits data set.

Motion chart example

Following the documentation of the Google Motion Chart API, we need a data set which has at least four columns: one identifying the variable we would like to plot, one time variable, and at least two numerical variables; further numerical and character columns are allowed.

As an example we use the Fruits data set:

R> data(Fruits)
R> Fruits[, -7]  # ignore column 7

    Fruit Year Location Sales Expenses Profit
1  Apples 2008     West    98       78     20
2  Apples 2009     West   111       79     32
3  Apples 2010     West    89       76     13
4 Oranges 2008     East    96       81     15
5 Bananas 2008     East    85       76      9
6 Oranges 2009     East    93       80     13
7 Bananas 2009     East    94       78     16
8 Oranges 2010     East    98       91      7
9 Bananas 2010     East    81       71     10

Here we will use the columns 'Fruit' and 'Year' as id and time variable, respectively.

R> M <- gvisMotionChart(Fruits, idvar = "Fruit",
+                       timevar = "Year")

The structural output of gvisMotionChart is a list of lists as described above:

R> str(M)

List of 3
 $ type   : chr "MotionChart"
 $ chartid: chr "MotionChartID12ae2fff"
 $ html   :List of 4
  ..$ header : chr "Data: Fruit...
  ..$ footer : chr "\n\...
 - attr(*, "class")= chr [1:2] "gvis" "list"

The first two items of the list contain information about the chart type used and the individual chart id. The html output is a list with header, chart, caption and footer. This allows the user to extract only certain parts of the page, or to create a complete html page.

The header part of the html page has only basic html and formatting tags and provides a simple layout, see

R> print(M, 'header')  # output not shown here

The actual Google visualisation code is stored with the data in the named character vector chart of the html list, see

R> print(M, 'chart')  # output not shown here

The sub-items of the chart object give the user more granular access to JavaScript functions and html tags of the visualisation. A basic chart caption and html footer complete the html list:

R> print(M, 'caption')  # output not shown here
R> print(M, 'footer')   # output not shown here

Displaying "gvis" objects

To display the page locally, type:

R> plot(M)  # returns invisibly the file name

The plot method for "gvis" objects creates html files in a temporary folder using the type and chart id information of the object, and it will display the output using the R HTTP help web server locally. The R command tempdir() shows the path of the per-session temporary directory.

Further examples are part of the googleVis demos, including one showing how a Geo Map can be animated with additional JavaScript; see demo(package = "googleVis").

Combining charts with gvisMerge

The function gvisMerge takes two "gvis" objects and merges the underlying components into one page. The charts are aligned either horizontally or vertically next to each other in an HTML table.

The output of gvisMerge is a "gvis" object again. This allows us to apply the same function iteratively to create more complex chart layouts. The following example produces the output shown in Figure 3:

G <- gvisGeoChart(Exports, "Country", "Profit",
                  options = list(width = 200, height = 100))
T <- gvisTable(Exports,
               options = list(width = 200, height = 270))
M <- gvisMotionChart(Fruits, "Fruit", "Year",
                     options = list(width = 400, height = 370))
GT <- gvisMerge(G, T, horizontal = FALSE)
GTM <- gvisMerge(GT, M, horizontal = TRUE,
                 tableOptions =
                   "bgcolor = \"#CCCCCC\" cellspacing = 10")
plot(GTM)
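Because a "gvis" object such as GTM stores its page as plain character vectors in the html sub-list, the complete page can also be written to a file directly with base R (a hypothetical one-liner for illustration, not a package function):

cat(unlist(GTM$html), file = "GTM.html", sep = "\n")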


Figure 3: Three charts combined with gvisMerge.

Embedding googleVis in web sites dynamically

With the R packages R.rsp and brew we have two options to integrate R snippets into html code. While the R.rsp package comes with its own internal web server, brew requires the Apache HTTP server with the RApache module installed.

Both packages allow the user to embed R code into html code. The R code is filtered by the R.rsp or RApache web server and executed at run time. As an example, we can insert the above motion chart into a page:

<% library(googleVis) %>
<% M <- gvisMotionChart(Fruits, idvar = "Fruit",
                        timevar = "Year") %>
<%= M$html$chart %>

The syntax of R.rsp and brew are very similar. The R code included in <%...%> is executed when read by the HTTP server, but no R output will be displayed. To insert the R output into the html code we have to add an equal sign, <%=...%>, which acts as a cat statement. In the example above the chart code is embedded into the html code.

Examples for both approaches are part of the googleVis package. For more information see the vignettes and help files of the packages.

Summary

Combining R with the Google Visualisation API enables users to quickly create powerful analysis tools, which can be shared online. The user interface is easily accessible both to data and non-data analysts.

The “Statistics Relating to Lloyd's” site (http://www.lloyds.com/stats) is an example where the functions of the googleVis package were used to create a data visualisation page. Further case studies and links to alternative packages are available on the project Google Code site^8.

Bibliography

H. Bengtsson. R.rsp: R server pages, 2011. URL http://CRAN.R-project.org/package=R.rsp. R package version 0.7.0.

M. Gesmann and D. de Castillo. googleVis: Using the Google Visualisation API with R, 2011. URL http://code.google.com/p/google-motion-charts-with-r/. R package version 0.2.12.

J. Horner. brew: Templating framework for report generation, 2011. URL http://CRAN.R-project.org/package=brew. R package version 1.0-6.

J. Horner. RApache: Web application development with R and Apache, 2011. URL http://www.rapache.net/.

F. Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In W. Härdle and B. Rönz, editors, Compstat 2002 — Proceedings in Computational Statistics, pages 575–580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9.

S. P. Saaibi. R/RMETRICS Generator Tool for Google Motion Charts. 4th R/Rmetrics User and Developer Workshop, Meielisalp, Lake Thune, Switzerland, June 27 – July 1, 2010. URL https://www.rmetrics.org/.

D. Temple Lang. RJSONIO: Serialize R objects to JSON, JavaScript Object Notation, 2011. URL http://CRAN.R-project.org/package=RJSONIO. R package version 0.96-0.

Markus Gesmann
googleVis project
London
UK
[email protected]

Diego de Castillo
googleVis project
Cologne
Germany
[email protected]

^8 http://code.google.com/p/google-motion-charts-with-r/


GrapheR: a Multiplatform GUI for Drawing Customizable Graphs in R

by Maxime Hervé

Abstract This article presents GrapheR, a Graphical User Interface allowing the user to draw customizable and high-quality graphs without knowing any R commands. Six kinds of graph are available: histograms, box-and-whisker plots, bar plots, pie charts, curves and scatter plots. The complete process is described with the examples of a bar plot and a scatter plot illustrating the legendary puzzle of African and European swallows' migrations.

Observation and aims

Unlike statistical software based on a Graphical User Interface (GUI), such as Minitab® or Stata®, the R language allows you to control exactly how your data will be displayed. This comes at a cost: you must know the exact arguments of the commands to get the result you want.

In this context, beginners find that drawing elaborate graphs is often a long and difficult process made up of lots of trials and slight code modifications. Some well-known R GUIs already exist (e.g. R Commander (Fox, 2005), JGR (Helbig et al., 2005), Sci-Views R (Grosjean, 2010), Rattle (Williams, 2009) or playwith (Andrews, 2010)), but the graphs they produce have mainly an explorative function in data analysis, and are not customizable enough to yield publication-ready material.

Therefore, many R users, especially beginners (but not only), go back to Excel®-like software and their clickable interfaces to quickly draw the graphs they want to publish. In the compromise between speed and simplicity vs. graph quality, they must sacrifice graph quality. This is regrettable, because R can do lots of things Excel® cannot, by combining high- and low-level graphical functions. The goal of the GrapheR (Hervé, 2011) package is therefore to combine the simplicity of a GUI with the powerful capabilities and the graphical quality of R.

To be immediately accessible to beginners, GrapheR does not require any knowledge of the R language. Indeed, the loading of the dataset, the choice of the variables to be represented and the configuration of all graphical options are made with menus, checkboxes and other clickable tools.

The visual structure and functioning of the GUI are entirely based on the tcltk package developed by Peter Dalgaard (Dalgaard, 2001, 2002), which adapts the Tcl/Tk language to R.

Launching the interface

As with any other package, GrapheR is loaded via library(GrapheR), require(GrapheR) or the ‘Packages’ menu of the Windows R GUI. Launching the interface is the only step that requires the user to enter a command: run.GrapheR(). The interface opens and the console can now be reduced. (The complete set of typed commands is summarized in the short sketch after the following overview.)

Description of the interface

The interface is divided into three blocks (Figure 1):

1. The navigation bar: it contains seven groups of buttons, each corresponding to one (obligatory or optional) step of the process. From left to right:

   A. Loading and modifying the dataset.

   B. Setting up a graph. From left to right: histogram, box-and-whisker plot, bar plot, pie chart, curve and scatter plot.

   C. Opening a new graphics device: when a graph is drawn, it is in the active window or in a new device if none is open. This button allows the user to open a new graphics device in which several graphs can be drawn.

   D. Draw a graph.

   E. Adding elements to the graph. From left to right: add a horizontal line, add a vertical line, add any other kind of line, add a theoretical distribution curve, add text, and add p-values.

   F. Saving graph(s).

   G. Miscellaneous options: user language and help.

2. The messages frame: when the mouse is over some specific elements of the GUI or when some specific actions are performed, messages are displayed in this frame. Three kinds of message can be displayed:

   • in blue: informative messages,

   • in green: warnings, for when particular attention should be paid to a specific point (for example: if this option is chosen, then that one cannot be),

   • in red: error messages, for when the action requested cannot be completed. The message indicates the origin of the error.

3. The settings block: it is divided into four to six sub-blocks, each containing options relating to a given theme: general parameters of the graph, title of the graph, legend... Each sub-block can be opened or closed with the corresponding arrow situated on the right of the interface. Of course, defined settings are not lost when a sub-block is closed.
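As promised above, the only commands a GrapheR session ever requires are the following (a minimal sketch; installing the package once beforehand with install.packages("GrapheR") is assumed):

library(GrapheR)   # or require(GrapheR), or the 'Packages' menu on Windows
run.GrapheR()      # open the GUI; everything else is point-and-click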


Figure 1: User interface – global view (display under Windows 7)

Use

The examples shown here are based on the dataset provided in the package, called ‘Swallows’. To load it, use the command data(Swallows). This (fictional!) dataset exemplifies the legendary puzzle of African and European swallows' migrations (Gilliam and Jones, 1975).

Loading and modifying the dataset

The first sub-block (Figure 2) allows you to load the dataset. Data can be imported from an external file (‘txt’ and ‘csv’ extensions are available so far) or can be an already existing R object of class "data.frame" (i.e. a table).

The next sub-block allows you to get information on the dataset structure. When a variable is selected in the list on the left, its type (numeric, factor, logical...) and its summary are displayed in the frame on the right.

The next two sub-blocks allow the dataset to be modified if needed:

• by renaming variables (for example if the dataset does not contain their names),

• by converting variables into factors. The conversion can be applied to variables of class "character" or to numeric variables (in this case values can be grouped into classes). The latter case is necessary when a factor is numerically coded, e.g. a binary factor (0/1), as otherwise R would treat it as a numeric variable.

Setting up a graph

Once the dataset is ready, click on the button corresponding to the type of graph to be drawn (histogram, box-and-whisker plot, bar plot, pie chart, curve or scatter plot).

Whatever the type of graph chosen, all parameters have a default value except the general parameters, which correspond to the variable(s) to be represented. Hence, to quickly draw a graph, only the general parameters must be defined. If they are not (or not all) defined, an error occurs when trying to draw the graph.

Here the aim is to draw two graphs: a bar plot displaying a size comparison of African and European swallows, and a scatter plot displaying the relationship between weight and size in the two species. Let's begin with the bar plot. We start by clicking on the ‘Bar plot’ button in the navigation bar.

We want to show mean sizes of swallows depending on their species. In the general parameters, we hence choose the ‘Means’ type (bar plots can also display proportions), ‘Size’ as the variable to be represented and ‘Species’ as the obligatory factor (Figure 3). A second, optional, factor could be added if, for example, we would like to display mean sizes of swallows depending on their species and their sex (you can try it: the dataset contains a ‘Sex’ factor).


Figure 2: Loading the dataset


Figure 3: Illustrative bar plot


Graph title and axis legends are optional (no text is written by default). In our example we call our graph ‘Mean size comparison’ and name the axes ‘Species’ (horizontal axis) and ‘Size (cm)’ (vertical axis). Bar names are by default the names of the correspond- ing levels of factor 1. However we could change this, which would only change the names displayed on the graph (and not the ones in the dataset). This allows you to use names containing several words, spaces or accents on your graph (which is either impossible or strongly discouraged for object names). Lower and upper limits of the vertical axis can be changed, and a logarithmic scale can be used. If limit values are left on ‘Auto’, GrapheR will adjust them automatically. Here the lower limit is left on ‘Auto’, whilst the upper limit is set to ‘23’, for reasons explained later. If no second factor is defined, all bars are by de- fault solid white with a black border and no hatchings. We can set these three parameters (bar color, border Figure 4: Opening a new graphics device color and hatchings), but in this example we simply change bar color to a classic grey. If a second factor is After creating the new graphics device, we can defined, one color/hatching pattern can be attributed draw our graph by clicking on the ‘DRAW’ button. by level of this factor. In this case colors are by default set to different shades of grey (borders are always black). Bars can be stacked in this situation, but if you Adding elements to the graph choose this option, no error bar can be drawn. Means and percentages in scientific bar plots are Once the graph is drawn, elements can be added to nearly always supplied with error bars, but getting complete it: those bars right using the command line is notori- ously challenging for R beginners. GrapheR makes • one or several horizontal line(s) it easy to draw error bars to represent either the stan- dard deviation, the standard error of the mean or the • one or several vertical line(s) 95% confidence interval of the mean. Here we choose for example ‘standard errors’. • any other kind of line(s) When a second factor is defined, a legend box can be added to the graph. Its items are by default the • one or several theoretical distribution curve(s): levels of this factor. As for bar names, legend items only on density histograms can be renamed, which will not change actual level names in the dataset. A title can also be added to the • text legend. Finally, its position on the graph can be set. Once all options have been thus configured, just • p-values: only on bar plots click on the ‘DRAW’ button of the navigation bar. . . et voilà! By default the graph is produced in the active These elements are always added to the last graph graphics device, but here we want to display two drawn. graphs in the same window. Hence we click on the For each element, clicking on the corresponding ‘window’ button of the navigation bar before drawing button in the navigation bar opens a dialog box on our bar plot. the right of the interface. In our bar plot example, let’s assume that we had Opening a new graphics device used R beforehand to perform a statistical test to com- pare the sizes of our two species. We can now use A dialog box opens on the right of the interface, allow- the p-values tool to add the result of this test on the ing specification of the number of graphs to be drawn graph: just click on the ‘p-val’ button. 
Figure 4: Opening a new graphics device

After creating the new graphics device, we can draw our graph by clicking on the 'DRAW' button.

Adding elements to the graph

Once the graph is drawn, elements can be added to complete it:

• one or several horizontal line(s)

• one or several vertical line(s)

• any other kind of line(s)

• one or several theoretical distribution curve(s): only on density histograms

• text

• p-values: only on bar plots

These elements are always added to the last graph drawn. For each element, clicking on the corresponding button in the navigation bar opens a dialog box on the right of the interface.

In our bar plot example, let's assume that we had used R beforehand to perform a statistical test to compare the sizes of our two species. We can now use the p-values tool to add the result of this test on the graph: just click on the 'p-val' button.

Two designs are available, depending on the number of bars to be compared (Figure 5). Whatever the design, the process is the same: enter the text you want, customize its size and color if needed, and finally click on the 'Select' button. Then just click on the bars to be compared on the graph.


Figure 5: Adding p-values to the bar plot

The result, sometimes, will not look good the first time, because there was not enough space above the bars to add text. If you want to add text above your graph, remember to allow some space by setting a higher upper limit for the vertical axis (this is the origin of the value '23' in Figure 3). If you omitted this, no problem: drawing the graph again is a matter of seconds in GrapheR. The first graph is now ready (left part of Figure 6).

Second graph

The second graph we want to draw is a scatter plot displaying the relationship between weight and size in the two species. We start by clicking on the 'Scatter plot' button of the navigation bar. In the general parameters, we define 'Weight' as the X variable and 'Size' as the Y variable (Figure 7).

To draw a scatter plot by species, we add an optional factor, 'Species', and select its two levels from the list. Finally, for aesthetic reasons we choose to draw a box around the graph by ticking the 'Box' checkbox. We add a title for the graph ('Weight - size relationship'), for the horizontal axis ('Weight (g)') and for the vertical axis ('Size (cm)').

For each scatter plot, a different point symbol (as well as its color and size) can be defined. By default all symbols are empty circles. When only one scatter plot is drawn the default color is black, whereas when several are drawn colors are set to different shades of grey. Here we choose full circles for the two scatter plots and change only the second color (corresponding to the 'European' level) to a dark red.

A line/curve can be added per scatter plot, chosen among different types:

• a linear regression line (performed with the least squares method)

• a linear regression line (performed with the least rectangles method)

• a quadratic regression line

• a simple tendency curve

Here no variable is independent or dependent, but both are interdependent. We hence choose to add a linear regression line performed with the least rectangles method to each scatter plot. Among the three available line types (full, large dashed or fine dashed), we select 'full'.

We add a legend box by ticking the 'Add a legend' checkbox. We give it a title ('Species') and set its position to 'top - left'. We leave the default items, which correspond to the levels of the factor defined in the general parameters.

Finally, we click on the 'DRAW' button of the navigation bar... and the second graph is born (Figure 6)!

Figure 6: Final plots


Figure 7: Illustrative scatter plot


Saving graph(s)

Once all graphs are drawn, they can be saved by clicking on the 'Save' button of the navigation bar. In the dialog box opening on the right of the interface, select the device containing the graph to be saved (Figure 8). Then choose the extension of the file to be created ('jpg', 'png' and 'pdf' are available) and the image length. Image height is automatically calculated from the length value. Finally click on the 'Save' button.

Figure 8: Saving graph(s)
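In base R terms, the 'Save' dialog corresponds roughly to redrawing the graph on a file device; here is a minimal sketch assuming the standard device functions (GrapheR's actual saving code is not shown in the article, and the file name is hypothetical):

# width (the "image length") is chosen by the user;
# GrapheR computes the height automatically
png("barplot.png", width = 800, height = 500)
# ... redraw the graph ...
dev.off()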

Changing the user language

To change the user language, click on the 'lang' button of the navigation bar. A dialog box opens on the right of the interface (Figure 9). Choose the desired language in the drop-down menu. To fix your preference for the future, tick the 'Save the preference' checkbox. Click on the 'Ok' button to validate. The interface is closed and re-opened in the chosen language (but note that any dataset loaded in the previous language session is lost and has to be re-loaded).

Figure 9: Changing the user language

Implementing GrapheR in another language

Implementing GrapheR in another language is very easy, because no word appearing in the interface is written in the code. The button names, etc. come from an external file which is loaded depending on the language setting. In the current version (1.9-66), only English and French are available (files 'Language_en.csv' and 'Language_fr.csv' in the 'lang' directory).

Therefore, adding a new language just requires a (strict) translation of each line of one of the existing files (including spaces before and/or after words). The new file must then be saved as 'Language_XX.csv'.

If you want to implement GrapheR in your language, you are most welcome. In that case, remember that it would be a good idea (but a tougher job) to also translate the user manual (contained in the 'doc' directory). If you want to participate, do not hesitate to contact me. I will take care of all screenshots to be included in the manual, and then add the link to the new language file in the code.

Acknowledgments

I am very grateful to Denis Poinsot, Dennis Webb, Clément Goubert and two anonymous referees for their precious advice on GUI ergonomics, GrapheR English translation and previous versions of this manuscript.


Maxime Hervé
UMR 1099 BiO3P (Biology of Organisms and Populations applied to Plant Protection)
University of Rennes 1 - Agrocampus Ouest - INRA
263 avenue du Général Leclerc, 35042 Rennes Cedex
France
[email protected]

rainbow: An R Package for Visualizing Functional Time Series

by Han Lin Shang

Abstract Recent advances in computer technology have tremendously increased the use of functional data, whose graphical representation can be infinite-dimensional curves, images or shapes. This article describes four methods for visualizing functional time series using an R add-on package. These methods are demonstrated using age-specific Australian fertility data from 1921 to 2006 and monthly sea surface temperatures from January 1950 to December 2006.

Introduction

Recent advances in computer technology have enabled researchers to collect and store high-dimensional data. When the high-dimensional data are repeatedly measured over a period of time, a time series of functions can be observed. Although one can display high-dimensional time series by adapting multivariate techniques, it is important to take the smoothness of functions into account (Ramsay and Dalzell, 1991). It is the smooth property of functions that separates functional time series from multivariate time series. Unlike longitudinal time series, functional time series mitigates the problem of missing data by an interpolation or smoothing technique, and so functional time series can be thought of as continuous.

Visualization aids the discovery of characteristics in data that might not have been apparent in mathematical models and summary statistics. Yet visualization has not yet received much attention in the literature of functional data analysis. Notable exceptions are the phase-plane plot of Ramsay and Ramsey (2002), which highlights important distributional characteristics using the first and second derivatives of functional data, and the singular value decomposition (SVD) plot of Zhang et al. (2007), which displays the changes in latent components in relation to increases of the sample size or dimensionality. Another exception is the rainbow plot of Hyndman and Shang (2010), which can simultaneously display functional data and identify possible outliers.

The aim of this article is to collect the R code that implements these graphical techniques. The R code of the phase-plane plot is included in the fda package (Ramsay et al., 2011), while the others are included in the rainbow package (Shang and Hyndman, 2011). In addition, this article also presents the use of animation, which can easily be used with all three graphical techniques in order to visualize the time-varying features of the data. The article proceeds as follows. Visualization methods of functional time series are first reviewed. Illustrated by two data sets, the visualization methods are then demonstrated using the rainbow package. Conclusions are given in the end.

Data sets

The visualization methods are demonstrated using age-specific Australian fertility rates and monthly sea surface temperatures, both included in rainbow. The details of these two data sets are described below.

Figure 1 shows annual age-specific Australian fertility rates between ages 15 and 49 observed from 1921 to 2006. These data were obtained from the Australian Bureau of Statistics (Cat No. 3105.0.65.001, Table 38). The fertility rates are defined as the number of live births at 30 June each year, per 1000 of the female resident population of the same age.

Figure 1: Smoothed Australian fertility rates between ages 15 and 49 observed from 1921 to 2006.

Although the four graphical techniques work equally well for plotting un-smoothed multivariate data, functional data ought to be smooth in nature. Therefore, the fertility rates were smoothed using a weighted median smoothing B-spline, constrained to be concave (see He and Ng, 1999; Hyndman and Ullah, 2007, for details).

Figure 2 shows monthly sea surface temperatures (in °C) from January 1950 to December 2006. These data were obtained from the National Oceanic and Atmospheric Administration (http://www.cpc.noaa.gov/data/indices/sstoi.indices). These sea surface temperatures were measured by moored buoys in the "Niño region", defined as the area within the coordinates 0-10° South and 90-80° West.

Figure 2: Smoothed monthly sea surface temperatures (in °C) from January 1950 to December 2006.

The sea surface temperatures were smoothed using a smoothing spline with the smoothing parameter determined by generalized cross validation. Each curve represents the smoothed sea surface temperatures in each year.

Functional time series visualization methods

Rainbow plot

The rainbow plot is a graphical display of all the functional data, with the only additional feature being a rainbow color palette based on an ordering of the data. By default, the rainbow plot displays functional data that are naturally ordered by time. Functional data can also be ordered by halfspace location depth (Tukey, 1975) and highest density regions (Hyndman, 1996). The depth and density orderings lead to the developments of the functional bagplot and functional HDR boxplot, described in the next subsections.

As the referees pointed out, the rainbow plot (with the default rainbow color palette) may not be suitable for readers with color blindness. To mitigate this problem, the plot.fds function allows users to specify their preferred color, such as the heat or terrain palettes. In addition to the computer-screen based RGB colors, the plot.fds function allows users to utilize the perceptually-based Hue-Chroma-Luminance (HCL) colors included in the colorspace package (Ihaka et al., 2011). The use of HCL colors is superior to RGB colors for readability and color separation, and it is thus preferred (Zeileis et al., 2009).

Figure 1 presents the rainbow plot of the smoothed fertility rates in Australia between ages 15 and 49 observed from 1921 to 2006. The fertility rates from the distant past years are shown in red, while the most recent years are shown in violet. The peak of fertility rates occurred around 1961, followed by a rapid decrease during the 1980s, due to the increasing use of contraceptive pills. Then there is an increase in fertility rates at higher ages in the most recent years, which may be caused by a tendency to postpone child-bearing while pursuing careers. The rainbow plot is useful to reveal pattern changes for functional time series with a trend. It was produced by the following code:

# load the package used throughout this article
library("rainbow")
# plot.type = "functions", curves are plotted by time
# the most recent curve is shown in purple
# the distant past curve is shown in red
plot(Australiasmoothfertility, plot.type = "functions",
     plotlegend = TRUE)
plot(ElNinosmooth, plot.type = "functions",
     plotlegend = TRUE)

For functional time series without a trend (e.g., Figure 2), the rainbow plot can still be used by constructing other order indexes, such as halfspace location depth and highest density regions. The colors of curves are then chosen in a rainbow color palette according to the ordering of depth or density.

Figure 3: Rainbow plot with depth ordering. The median curve is shown in black.

Figure 4: Rainbow plot with density ordering. The mode curve is shown in black.

Figures 3 and 4 present the rainbow plots of sea surface temperatures ordered by halfspace location depth and highest density regions. The colors reflect the ordering and follow the order of the rainbow. The curves closest to the center of the data set are shown in red, whereas the most outlying curves are shown in violet. The curves are plotted in the order of depth and density, so the red curves are mostly obscured, but the violet curves are clearly seen even if they overlap with the majority of the data. These rainbow plots were produced using the following code.

# plot.type = "depth", curves are plotted by depth
# depth is distance between median and each curve
# median curve (black line) is the center
plot(ElNinosmooth, plot.type = "depth",
     plotlegend = TRUE)

# plot.type = "density", curves are plotted by density
# mode (black line) has the highest density
plot(ElNinosmooth, plot.type = "density",
     plotlegend = TRUE)

Functional bagplot

Adopting the idea of projection pursuit (Cook et al., 1995), Hyndman and Shang (2010) use a robust functional principal component analysis to decompose functional data into the first two functional principal components and their principal component scores. As surrogates for functional data, the bivariate principal component scores can be ordered by Tukey's halfspace location depth and plotted in a familiar two-dimensional graph.

Following Jones and Rice (1992) and Sood et al. (2009), the functional bagplot is considered as a mapping of the bivariate bagplot (Rousseeuw et al., 1999) of the first two robust principal component scores to the functional curves. The functional bagplot displays the median curve, and the inner and outer regions. The inner region is defined as the region bounded by all curves corresponding to the points in the bivariate bag. Hence, 50% of curves are in the inner region. The outer region is similarly defined as the region bounded by all curves corresponding to the points within the bivariate fence region. The colors of bivariate outliers are matched to the same colors of functional outliers.

Figures 5 and 6 display the bivariate and functional bagplots of the sea surface temperature data.

Figure 5: The bivariate bagplot.

Figure 6: The functional bagplot.

The detected outliers in the sea surface temperature data are the years 1982-1983 and 1997-1998. The sea surface temperatures during 1982-1983 began in June 1982 with a moderate increase, then there were abnormal increases between September 1982 and June 1983 (Timmermann et al., 1999). The sea surface temperatures during 1997-1998 were also unusual: they became extremely warm in the latter half of 1997, and stayed high for the early part of 1998.

In Figure 5, the dark gray region shows the 50% bag, and the light gray region exhibits the customary 99% fence. These convex hulls correspond directly to the equivalent regions with similar colors and shading in the functional bagplot (in Figure 6). Points outside these regions are defined as outliers. The different colors for these outliers enable the functional outliers to be matched to the bivariate outliers. The red asterisk marks the Tukey median of the bivariate principal component scores, and the solid black curve shows the median curve. The dotted blue line in the functional bagplot gives 95% pointwise confidence intervals for the median curve. These bagplots were produced using the following code.

# plot.type = "bivariate", the bivariate principal
# component scores are displayed
# type = "bag" requests the bagplot
fboxplot(ElNinosmooth, plot.type = "bivariate",
         type = "bag", ylim = c(-10, 20), xlim = c(-10, 20))

# plot.type = "functional", the bivariate pc scores
# are matched to corresponding curves
fboxplot(ElNinosmooth, plot.type = "functional",
         type = "bag")

Functional highest density region (HDR) boxplot

The bivariate principal component scores can also be ordered by the highest density regions. The highest density regions are the quantiles of a two-dimensional Parzen-Rosenblatt kernel density estimate, where the bandwidths are chosen by a plug-in method (Hyndman, 1996). In comparison to a depth-measure approach, the density-measure approach is able to display multimodality if it is present in the data.

The functional HDR boxplot is a mapping of the bivariate HDR boxplot (Hyndman, 1996) of the first two robust principal component scores to the functional curves. The functional HDR boxplot displays the modal curve (i.e., the curve with the highest density), and the inner and outer regions. The inner region is defined as the region bounded by all the curves corresponding to the points inside the 50% bivariate HDR. Thus, 50% of curves are in the inner region. The outer region is similarly defined as the region bounded by all the curves corresponding to the points within the outer bivariate HDR. The colors of bivariate outliers are matched to the same colors of functional outliers.

Figures 7 and 8 display the bivariate and functional HDR boxplots of the sea surface temperature data set. As with any outlier detection method, the coverage probability of the outer region needs to be pre-specified. If we set the coverage probability of the outer region to be 93%, then the outliers detected would match the results obtained by the bagplot. This indicates that these outliers are not only far from the median, but also have the lowest density.

Figure 7: The bivariate HDR boxplot.

Figure 8: The functional HDR boxplot.

In Figure 7, the dark and light gray regions show the 50% HDR and the 93% outer HDR, respectively. These correspond directly to the equivalent regions with similar colors and shading in the functional HDR boxplot (in Figure 8). Points outside these outer regions are identified as the outliers. The use of different colors for these outliers enables the functional outliers to be matched with the bivariate outliers. The red dot in the bivariate HDR boxplot marks the mode of the bivariate principal component scores, and it corresponds to the solid black curve in the functional HDR boxplot.

These HDR boxplots were produced using the following code.

# type = "hdr" requests the HDR boxplot
# alpha requests the coverage probability of inner
# and outer HDR regions, customarily c(0.05, 0.5)
fboxplot(ElNinosmooth, plot.type = "bivariate",
         type = "hdr", alpha = c(0.07, 0.5),
         ylim = c(-10, 20), xlim = c(-10, 20))

fboxplot(ElNinosmooth, plot.type = "functional",
         type = "hdr", alpha = c(0.07, 0.5))

Singular value decomposition (SVD) plot

Zhang et al. (2007) proposed an interactive plot for visualizing patterns of functional data and multivariate data. They use the idea of projection pursuit to find only those low-dimensional projections that expose interesting features of the high-dimensional point cloud. As a popular projection pursuit technique, singular value decomposition (SVD) decomposes high-dimensional smoothed multivariate data into singular columns, singular rows, and singular values ordered by the amount of explained variance.

Zhang et al. (2007) discretize a set of functional data on a dense grid, denoted as f(x_i) = [f_1(x_i), ..., f_n(x_i)]', for i = 1, ..., p, where p is the number of covariates and n is the number of curves. Let {r_i; i = 1, ..., p} and {c_j; j = 1, ..., n} be the row and column vectors of the (n x p) matrix f(x_i), respectively. The SVD of f(x_i) is defined as

f(x_i) = s_1 u_1 v_1^T + s_2 u_2 v_2^T + ... + s_K u_K v_K^T,

where the singular columns u_1, ..., u_K form K orthonormal basis functions for the column space spanned by {c_j}; the singular rows v_1, ..., v_K form K orthonormal basis functions for the row space spanned by {r_i}; and ^T symbolizes the vector transpose. The vectors {u_k} and {v_k} are called singular columns and singular rows, respectively. The scalars s_1, ..., s_K are called singular values. The matrix {s_k u_k v_k^T; k = 1, ..., K} is referred to as the SVD component.
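To make the decomposition concrete, the following minimal base-R sketch builds the sum of the first K SVD components with svd(); the data matrix here is simulated for illustration and is not part of the rainbow package.

# hypothetical (n x p) matrix: n = 57 curves observed at p = 12 points
F <- matrix(rnorm(57 * 12), nrow = 57, ncol = 12)
dec <- svd(F)
# rank-K approximation: s_1 u_1 v_1^T + ... + s_K u_K v_K^T
K <- 3
FK <- dec$u[, 1:K] %*% diag(dec$d[1:K]) %*% t(dec$v[, 1:K])
# the residuals should be centered around zero if the first K
# components account for most of the variation
res <- F - FK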


Figure 9: The animation of the SVD plot is started by clicking on any graph when using Adobe Reader.

The interactive plot of Zhang et al. (2007) captures the changes in the singular columns as the number of curves gradually increases. Similarly, it also captures the changes in the singular rows as the number of covariates gradually increases. The interactive plot simultaneously presents the column and row information of a two-way matrix, to relate the matrix to the corresponding curves, to show local variation, and to highlight interactions between columns and rows of a two-way matrix.

By using the animate package (Grahn, 2010), a set of dynamic movies can be created (see Figure 9 for an example). The advantage of creating these movies is the ability to demonstrate the time-varying features of the singular components, and to highlight outliers if present in the data. The animation is started by clicking on any graph when using Adobe Reader.

Figure 9 shows the SVD plot of the sea surface temperature data set. The first SVD component captures the seasonal pattern, while the second and third SVD components show the contrasts of sea surface temperatures among different months. Note that SVD2 and SVD3 are on a much smaller scale in comparison to SVD1, because SVD1 accounts for most of the curves' variation. The functional time series can be approximated by the summation of the first three SVD components. From the residual plot, outliers can be identified if they differ significantly from zero.

The animated SVD plot was produced in a LaTeX editor, such as WinEdt. To create the animation in Figure 9, a series of 57 figures (as there are 57 years in the sea surface temperature data set) are linked together as follows:

% load a package in WinEdt
\usepackage{animate}
% link a series of 57 figures for producing
% one animation
\begin{figure}
\animategraphics{figure_}{1}{57}
\end{figure}

In R, the non-animated SVD plot can be produced using the following code.

# order represents the number of SVD components
# as the number of SVD components increases
# the residuals should be centered around zero
# plot can be suppressed by setting plot = FALSE
SVDplot(ElNinosmooth, order = 3, plot = TRUE)

Conclusions

This article described four graphical methods included in the rainbow package for visualizing functional time series. These methods can be categorized as graphical techniques using projection pursuit. Each of these methods has its unique advantages for revealing the characteristics of functional time series. Some of the methods enable us to identify and analyze abnormal observations, while others can be very useful in visualizing trends. Overall, these graphical methods present a summary of functional time series, and should be considered as the first step of functional time series analysis.

Acknowledgements

Thanks to the editors and reviewers for constructive comments and suggestions that have significantly improved this article. Thanks also to Professor Rob Hyndman for helping with R code.

Bibliography

D. Cook, A. Buja, J. Cabrera, and C. Hurley. Grand tour and projection pursuit. Journal of Computational and Graphical Statistics, 4(3):155-172, 1995. URL http://www.jstor.org/stable/1390844.

A. Grahn. The Animate Package, 2010. URL http://www.tug.org/texlive/Contents/live/texmf-dist/doc/latex/animate/animate.pdf.

X. He and P. Ng. COBS: qualitatively constrained smoothing via linear programming. Computational Statistics, 14(3):315-337, 1999. URL http://papers.ssrn.com/sol3/papers.cfm?abstract_id=185108.

R. J. Hyndman. Computing and graphing highest density regions. The American Statistician, 50(2):120-126, 1996. URL http://www.jstor.org/stable/2684423.

R. J. Hyndman and H. L. Shang. Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19(1):29-45, 2010. URL http://pubs.amstat.org/doi/pdf/10.1198/jcgs.2009.08158.

R. J. Hyndman and M. S. Ullah. Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics & Data Analysis, 51(10):4942-4956, 2007. URL http://portal.acm.org/citation.cfm?id=1241107.1241182.

R. Ihaka, P. Murrell, K. Hornik, and A. Zeileis. colorspace: Color Space Manipulation, 2011. URL http://CRAN.R-project.org/package=colorspace. R package version 1.1-0.

M. C. Jones and J. A. Rice. Displaying the important features of large collections of similar curves. The American Statistician, 46(2):140-145, 1992. URL http://www.jstor.org/stable/2684184.

J. O. Ramsay and C. J. Dalzell. Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society: Series B, 53(3):539-572, 1991. URL http://www.jstor.org/stable/2345586.

J. O. Ramsay and J. B. Ramsey. Functional data analysis of the dynamics of the monthly index of nondurable goods production. Journal of Econometrics, 107(1-2):327-344, 2002. URL http://ideas.repec.org/a/eee/econom/v107y2002i1-2p327-344.html.

J. O. Ramsay, H. Wickham, S. Graves, and G. Hooker. fda: Functional Data Analysis, 2011. URL http://CRAN.R-project.org/package=fda. R package version 2.2.6.

P. J. Rousseeuw, I. Ruts, and J. W. Tukey. The bagplot: a bivariate boxplot. The American Statistician, 53(4):382-387, 1999. URL http://www.questia.com/googleScholar.qst?docId=5001888966.

H. L. Shang and R. J. Hyndman. rainbow: Rainbow plots, bagplots and boxplots for functional data, 2011. URL http://CRAN.R-project.org/package=rainbow. R package version 2.6.

A. Sood, G. M. James, and G. J. Tellis. Functional regression: A new model for predicting market penetration of new products. Marketing Science, 28(1):36-51, 2009. URL http://mktsci.journal.informs.org/cgi/content/abstract/mksc.1080.0382v1.

A. Timmermann, J. Oberhuber, A. Bacher, M. Esch, M. Latif, and E. Roeckner. Increased El Niño frequency in a climate model forced by future greenhouse warming. Nature, 398(6729):694-697, 1999. URL http://www.nature.com/nature/journal/v398/n6729/abs/398694a0.html.

J. W. Tukey. Mathematics and the picturing of data. In R. D. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523-531. Canadian Mathematical Congress, Vancouver, 1975.

A. Zeileis, K. Hornik, and P. Murrell. Escaping RGBland: selecting colors for statistical graphics. Computational Statistics and Data Analysis, 53(9):3259-3270, 2009. URL http://www.sciencedirect.com/science/article/B6V8V-4VM43VH-1/2/b63f4a1a558083ab2b98f12e446d1ac7.

L. Zhang, J. S. Marron, H. Shen, and Z. Zhu. Singular value decomposition and its visualization. Journal of Computational and Graphical Statistics, 16(4):833-854, 2007. URL http://pubs.amstat.org/doi/abs/10.1198/106186007X256080?journalCode=jcgs.

Han Lin Shang
Department of Econometrics and Business Statistics,
Monash University, Caulfield East, Victoria 3145,
Melbourne, Australia.
[email protected]


Portable C++ for R Packages

by Martyn Plummer

Abstract Package checking errors are more common on Solaris than Linux. In many cases, these errors are due to non-portable C++ code. This article reviews some commonly recurring problems in C++ code found in R packages and suggests solutions.

CRAN packages are tested regularly on both Linux and Solaris. The results of these tests can be found at http://cran.r-project.org/web/checks/check_summary.html. Currently, 24 packages generate errors on Linux while 125 packages generate errors on Solaris.¹ A major contribution to the higher frequency of errors on Solaris is lack of portability of C++ code. The CRAN Solaris checks use the Oracle Solaris Studio 12.2 compiler, which has a much more stringent interpretation of the C++ standard than the GCC 4.6.1 compiler used for the checks on Linux, and will therefore reject code that compiles correctly with GCC.

¹ Patched version of R 2.14.0, on 10 December 2011, x86 platform.

It seems plausible that most R package developers work with GCC and are therefore not aware of portability issues in their C++ code until these are shown by the CRAN checks on Solaris. In fact, many of the testing errors are due to a few commonly recurring problems in C++. The aims of this article are to describe these problems, to help package authors diagnose them from the Solaris error message, and to suggest good practice for avoiding them.

The scope of the article is limited to basic use of C++. It does not cover the use of the Rcpp package (Eddelbuettel and Francois, 2011) or the Scythe statistical library (Pemstein et al., 2011), which are used to support C++ code in some R packages, nor issues involved in writing your own templates.

Before describing the portability issues in detail, it is important to consider two general principles that underlie most portability problems.

Firstly, C++ is not a superset of C. The current C standard is ISO/IEC 9899:1999, usually referred to as C99 after its year of publication. Most C++ compilers support ISO/IEC 14882:1998 (C++98), which predates it.² Thus, the two languages have diverged, and there are features in C99 that are not available in C++98.

² Although a technical corrigendum of the C++ standard was published in 2003, it provided no new features.

The g++ compiler allows C99 features in C++ code. These features will not be accepted by other compilers that adhere more closely to the C++98 standard. If your code uses C99 features, then it is not portable.

The C++ standard is evolving. In August 2011, the ISO approved a new C++ standard, which was published in September 2011 and is known as C++11. This should remove much of the divergence between the two languages. However, it may take some time for the new C++11 standard to be widely implemented in C++ compilers and libraries. Therefore this article was written with C++98 in mind.

The second general issue is that g++ has a permissive interpretation of the C++ standard, and will typically interpret ambiguous code for you. Other compilers require stricter conformance to the standard and will need hints for interpreting ambiguous code. Unlike the first issue, this is unlikely to change with the evolving C++ standard.

The following sections each describe a specific issue that leads to C++ portability problems. By far the most common error message produced on Solaris is 'The function foo must have a prototype'. In the following, this is referred to as a missing prototype error. Problems and solutions are illustrated using C++ code snippets. In order to keep the examples short, ellipses are used in place of code that is not strictly necessary to illustrate the problem.

C99 functions

Table 1 shows some C functions that were introduced with the C99 standard and are not supported by C++98. These functions are accepted by g++ and will therefore pass R package checks using this compiler, but will fail on Solaris with a missing prototype error.

C99 function    R replacement
expm1(x)        expm1(x)
log1p(x)        log1p(x)
trunc(x)        ftrunc(x)
round(x)        fprec(x, 0)
lgamma(x)       lgammafn(x)

Table 1: Some expressions using C99 functions and their portable replacements using functions declared in the 'Rmath.h' header.

R packages have access to C functions exposed by the R API, which provides a simple workaround for these functions. All of the expressions in the left hand column of Table 1 can be replaced by the portable expressions on the right hand side if the header 'Rmath.h' is included.

A less frequently used C99 function is the cube root function cbrt. The expression cbrt(x) can be replaced by std::pow(x, (1./3.)) using the pow function defined in the header 'cmath'.


C99 macros for special values

The C99 standard also introduced the constants NAN and INFINITY as well as the macros isfinite, isinf, isnan and fpclassify to test for them. None of these are part of the C++98 standard. Attempts to use the macros on Solaris will result in a missing prototype error, and the constants will result in the error message "NAN/INFINITY not defined".

As with the C99 functions above, the R API provides some facilities to replace this missing functionality. The R macros R_FINITE and ISNAN and the R function R_IsNaN are described in the R manual "Writing R Extensions" and are accessed by including the header file 'R.h'. They are not exactly equivalent to the C99 macros because they are adapted to deal with R's missing value NA_REAL.

If you need access to a non-finite value then 'R_ext/Arith.h' provides R_PosInf, R_NegInf and R_NaReal (more commonly used as NA_REAL).
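As a short sketch of how these facilities combine in practice (again assuming compilation within an R package; the function itself is hypothetical):

#include <R.h>  /* makes R_FINITE and ISNAN available */

/* Return the reciprocal of x, propagating NaN and
   non-finite values as R's missing value */
double safeReciprocal(double x)
{
    if (ISNAN(x) || !R_FINITE(x) || x == 0.0)
        return R_NaReal;  /* i.e. NA_REAL */
    return 1.0 / x;
}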

Variable-length arrays

A variable-length array is created when the size of the array is determined at runtime, not compile time. A simple example is

void fun(int n) {
    double A[n];
    ...
}

Variable-length arrays are not part of the C++98 standard. On Solaris, they produce the error message "An integer constant expression is required within the array subscript operator".

Variable-length arrays can be replaced by an instantiation of the vector template from the Standard Template Library (STL). Elements of an STL vector are accessed using square bracket notation just like arrays, so it suffices to replace the definition of the array with

std::vector<double> A(n);

The STL vector template includes a destructor that will free the memory when A goes out of scope.

A function that accepts a pointer to the beginning of an array can be modified to accept a reference to an STL vector. For example, this

void fun(double *A, unsigned int length) { ... }

may be replaced with

void fun(std::vector<double> &A) {
    unsigned int length = A.size();
    ...
}

Note that an STL vector can be queried to determine its size, so the size does not need to be passed as an additional argument.

External library functions that expect a pointer to a C array, such as BLAS or LAPACK routines, may also be used with STL vectors. The C++ standard guarantees that the elements of a vector are stored contiguously. The address of the first element (e.g. &a[0]) is thus a pointer to the start of an underlying C array that can be passed to external functions. For example:

int n = 20;
std::vector<double> a(n);
... // fill in a
double nrm2 = cblas_dnrm2(n, &a[0], 1);

Note however that boolean vectors are an exception to this rule. They may be packed to save memory, so it is not safe to assume a one-to-one correspondence between the underlying storage of a boolean vector and a boolean array.

Function overloading

The C++ standard library provides overloaded versions of most mathematical functions, with versions that accept (and return) a float, double or long double. If an integer constant is passed to these functions, then g++ will decide for you which of the overloaded functions to use. For example, this expression is accepted by g++:

#include <cmath>
using std::sqrt;

double z = sqrt(2);

The Oracle Solaris Studio compiler will produce the error message 'Overloading ambiguity between std::sqrt(double) and std::sqrt(float)'. It requires a hint about which version to use. This hint can be supplied by ensuring that the constant is interpreted as a double:

double z = sqrt(2.);

In this case '2.' is a double constant, rather than an integer, because it includes a decimal point. To use a float or long double, add the qualifying suffix F or L respectively.

The same error message arises when an integer variable is passed to an overloaded function inside an expression that does not evaluate to a floating point number:

#include <cmath>
using std::sqrt;

bool fun(int n, int m) {
    return n > m * sqrt(m);
}


In this example, the compiler does not know if m should be considered a float, double or long double inside the sqrt function because the return type is bool. The hint can be supplied using a cast:

return n > m * sqrt(static_cast<double>(m));

Namespaces for C functions

As noted above, C99 functions are not part of the C++98 standard. C functions from the previous C90 standard are allowed in C++98 code, but their use is complicated by the issue of namespaces. This is a common cause of missing prototype errors on Solaris.

The C++ standard library offers two distinct sets of header files for C90 functions. One set is called '<cname>' and the other is called '<name.h>', where "name" is the base name of the header (e.g. 'math', 'stdio', ...). It should be noted that the '<name.h>' headers in the C++ standard library are not the same files as their namesakes provided by the C standard library.

Both sets of header files in the C++ standard library provide the same declarations and definitions, but differ according to whether they provide them in the standard namespace or the global namespace. The namespace determines the function calling convention that must be used:

• Functions in the standard namespace need to be referred to by prefixing std:: to the function name. Alternatively, in source files (but not header files) the directive using std::foo; may be used at the beginning of the file to instruct the compiler that foo always refers to a function in the standard namespace.

• Functions in the global namespace can be referred to without any prefix, except when the function is overloaded in another namespace. In this case the scope resolution prefix '::' must be used. For example:

using std::vector;

namespace mypkg {
    //Declare overloaded sqrt
    vector<double> sqrt(vector<double> const &);
    //Use sqrt in global namespace
    double y = ::sqrt(2.0);
}

Although the C++98 standard specifies which namespaces the headers '<cname>' and '<name.h>' should use, it has not been widely followed. In fact, the C++11 standard has been modified to conform to the current behaviour of C++ compilers (JTC1/SC22/WG21 - The C++ Standards Committee, 2011, Appendix D.5), namely:

• Header '<cname>' provides declarations and definitions in the namespace std. It may or may not provide them in the global namespace.

• Header '<name.h>' provides declarations and definitions in the global namespace. It may or may not provide them in the namespace std.

The permissiveness of this standard makes it difficult to test code for portability. If you use the '<cname>' headers, then g++ puts functions in both the standard and global namespaces, so you may freely mix the two calling conventions. However, the Oracle Solaris Studio compiler will reject C function calls that are not resolved to the standard namespace.

The key to portability of C functions in C++ code is to use one set of C headers consistently and check the code on a platform that does not use both namespaces at the same time. This rules out g++ for testing code with the '<cname>' headers. Conversely, the GNU C++ standard library header '<name.h>' does not put functions in the std namespace, so g++ may be used to test code using '<name.h>' headers. This is far from an ideal solution. These headers were meant to simplify porting of C code to C++ and are not supposed to be used for new C++ code. However, the fact that the C++11 standard still includes these deprecated headers suggests a tacit acceptance that their use is still widespread.

Some g++ shortcuts

The g++ compiler provides two further shortcuts for the programmer which may occasionally throw up missing prototype errors on Solaris.

Headers in the GNU C++ standard library may be implicitly included in other headers. For example, in version 4.5.1, the '<iostream>' and '<vector>' headers both include the '<string>' header. If your source file includes either of these two headers, then you may use strings without an #include <string> statement, relying on this implicit inclusion. Other implementations of the C++ standard library may not do this implicit inclusion.

When faced with a missing prototype error on Solaris, it is worth checking a suitable reference (such as http://www.cplusplus.com) to find out which header declares the function according to the C++ standard, and then ensure that this header is explicitly included in your code.

A second shortcut provided by g++ is argument-dependent name lookup (also known as Koenig lookup), which may allow you to omit scope resolution of functions in the standard namespace. For example:

#include <vector>
#include <algorithm>
using std::vector;

void fun(vector<double> &y)
{
    sort(y.begin(), y.end());
}

Since the arguments to the sort algorithm are in the std namespace, g++ looks in the same namespace for a definition of sort. This code therefore compiles correctly with g++. Compilers that do not support Koenig lookup require the sort function to be resolved as std::sort.

Conclusions

The best way to check for portability of C++ code is simply to test it with as many compilers on as many platforms as possible. On Linux, various third-party commercial compilers are available at zero cost: the Intel C++ Composer XE compiler is free for non-commercial software development (note that you must comply with Intel's definition of non-commercial); the PathScale EkoPath 4 compiler recently became open source; the Oracle Solaris Studio compilers may be downloaded for free from the Oracle web site (subject to license terms). The use of these alternative compilers should help to detect problems not detected by GCC, although it may not uncover all portability issues, since they also rely on the GNU implementation of the C++ standard library. It is also possible to set up alternate testing platforms inside a virtual machine, although the details of this are beyond the scope of this article.

Recognizing that most R package authors do not have the time to set up their own testing platforms, this article should help them to interpret the feedback from the CRAN tests on Solaris, which provide a rigorous test of conformity to the current C++ standard. Much of the advice in this article also applies to a C++ front-end or an external library to which an R package may be linked.

An important limitation of this article is the assumption that package authors are free to modify the source code. In fact, many R packages are wrappers around code written by a third party. Of course, all CRAN packages published under an open source license may be modified according to the licence conditions. However, some of the solutions proposed here, such as using functions from the R API, may not be suitable for third-party code as they require maintaining a patched copy. Nevertheless, it may still be useful to send feedback to the upstream maintainer.

Acknowledgement

I would like to thank Douglas Bates and an anonymous referee for their helpful comments.

Bibliography

D. Eddelbuettel and R. Francois. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8):1-18, 2011. ISSN 1548-7660. URL http://www.jstatsoft.org/v40/i08.

JTC1/SC22/WG21 - The C++ Standards Committee. Working draft, standard for programming language C++. Technical report, ISO/IEC, February 2011. URL http://www.open-std.org/JTC1/SC22/WG21.

D. Pemstein, K. M. Quinn, and A. D. Martin. The Scythe statistical library: An open source C++ library for statistical computation. Journal of Statistical Software, 42(12):1-26, 2011. ISSN 1548-7660. URL http://www.jstatsoft.org/v42/i12.

Martyn Plummer
International Agency for Research on Cancer
150 Cours Albert Thomas
69372 Lyon Cedex 08, France
[email protected]


R’s Participation in the Google Summer of Code 2011 by Claudia Beleites and John C. Nash mentoring organizations. The administrators (includ- ing us) were therefore required to participate in an Abstract This article describes the activity of online chat “Deduplication” meeting, though there R in the 2011 Google Summer of Code (GSoC), have not been such conflicts with the R GSoC student which Wikipedia describes as follows. candidates.

The Google Summer of Code (GSoC) is an annual program, first held from May to August 2005, in which Google awards stipends (of 5000 USD, as of 2010) to hundreds of students who successfully complete a requested free or open source software coding project during the summer.

Essential elements of GSoC

Google has structured the GSoC to provide student resources for coding to open source software projects. While results can be used as part of a thesis or article, the focus is coding. Students receive a stipend of US$ 5000 paid in 3 installments, which are only paid after each of the student's proposal, mid-term and final work has passed an evaluation. The open source projects, of which the R Foundation for Statistical Computing is one, are required to provide mentors for each student who makes a proposal, and the mentoring organization is given a gratuity of US$ 500 per student who is funded.

The task of proposing coding projects falls to the mentoring organizations, and can be initiated by either prospective mentors or prospective students. Mentoring organizations "appoint" administrators for the GSoC activity. This year (2011) was the first we have had two people as co-admins. We can both attest that without this sharing, R's effort this year would very likely have foundered due to events in our lives that would have prevented critical tasks from being completed. The administrators shepherd the development of proposals from either mentors or students, try to encourage students to rework these proposals (the final proposals must come from the students), and guide the process of ranking them. Google has provided a web-based system (Melange) to manage the several thousand proposals, their ranking, and the awarding of "slots" (funded students).

Google decides the number of slots each mentoring organization will receive. In 2008 and 2009, R got 4 slots, in 2010 we got 5, but in 2011 we were offered 15. The workload for the mentors and administrators grew similarly, as we had to rank approximately 40 proposals from which the funded 15 were chosen. It is possible for a student to propose more than one activity and be ranked highly by two mentoring organizations. The administrators (including us) were therefore required to participate in an online chat "Deduplication" meeting, though there have not been such conflicts with the R GSoC student candidates.

Completed proposals in 2011

Sadly, one of our students did not carry out the proposed work but vanished for weeks without satisfactory explanation. This is not unknown; in the past, 10-20% of GSoC students failed to complete their work (this year: 12%) (Google Summer of Code, 2011). We had one out of five in 2010. However, we did get 14 projects to a satisfactory state of completion this year. Seven of these related to graphical user interfaces (GUIs), images or visualization, four were related to optimization (two of these were also in the GUI/visualisation category), and one was related to OpenMP. We will very briefly summarize these, naming the students and mentors.

Andrew Rich: Manipulating RStudio Graphics Towards Creating Intuitive Mathematical Comprehension (mentored by Daniel Kaplan & J. J. Allaire)
Extends the popular RStudio infrastructure to provide some interactive graphical tools useful in teaching mathematical and statistical concepts.

Sebastian Mellor: Developing a hyperSpec GUI (mentored by Claudia Beleites and Colin Gillespie)
Developed hyperSpecGUI with a graphical user interface to guide spike filtering for Raman spectra as demo application.

Xiaoyue Cheng: Cranvastime: Interactive longitudinal and temporal data plots (mentored by Di Cook and Heike Hofmann)
Cranvastime is part of cranvas, interactive plots in R. It provides tools to explore irregular periodicity by wrapping and folding time series, to zoom and pan to focus on details, to link to other statistical plots, and to explore cross-correlation in multiple series and individual patterns in longitudinal data.

Yixuan Qiu: A GUI based package to assist optimization problems in R (mentored by John Nash and Ben Bolker)
The optimgui package from this project is directed towards the task of writing the objective function and related functions used by R optim() and related tools. Users frequently find writing such objective functions is prone to errors or numerical instabilities.


Paula Moraga: DClusterm: Model-based detection of disease clusters (mentored by Virgilio Gómez-Rubio and Barry Rowlingson)
The DClusterm package produced by this project implements several methods for the detection of disease clusters in spatial and spatio-temporal data. The methods implemented in the package allow the use of GLM, GLMM and zero-inflated models for modelling the data.

Kåre Jacobsen: Exploratory visualization of dynamic stochastic processes (mentored by Niels Richard Hansen)
Wrote the processDataAnim package.

Tuo Zhao: HUGE: High-dimensional Undirected Graph Estimation (mentored by Kathryn Roeder and Han Liu)
Produced the huge package.

Sunil Kumar: Image Analysis in R (mentored by Ian Fellows)
Led to the RImageJ package.

Lei Jiang: OpenMP parallel framework for R (mentored by Ferdinand Jamitzky, George Ostruchov, and Pragneshkumar B. Patel)
The packages ROpenMP and forpar implement, respectively, sequential-to-parallel source code transformation based on user-added OpenMP style directives (as comments in the code) using existing parallel packages in R, and loading of the desired parallel backend in an on-the-fly manner for more ease in benchmarking.

Alexander Pillhöfer: optile: Category order optimization for graphical displays of categorical data (mentored by Antony Unwin)
Produced the extracat package.

Yu Chen: TradeAnalytics toolchain enhancements (mentored by Brian Peterson)
The toolchain is used for forensic analysis of trades on financial markets.

Jennifer Feder Bobb: Convergence acceleration of the Expectation-Maximization (EM) algorithms in computational statistics: A suite of cutting-edge acceleration schemes (mentored by Ravi Varadhan)
This work extended SQUAREM to build the turboEM package, for which different algorithms and documentation were developed in the following work.

Hui Zhao: R-EM-Accelerator—Smarter Iterative Schemes Save Your Time (mentored by Roger Peng and Ravi Varadhan)
Implemented the EM accelerator and benchmark examples in the turboEM package.

Juemin Yang: SpAM: an implementation of the Sparse Additive Models (SpAM) (mentored by Han Liu)
The SpAM package provides an implementation of the Sparse Additive Models (SpAM) and Greedy Sparse Additive Models (G-SpAM). Both models are modern nonparametric predictive methods. They simultaneously conduct prediction (e.g. regression/classification), function estimation (e.g. smoothing and curve fitting), and feature selection. We targeted high-dimensional data analysis with d » n.

The packages can be accessed at http://code.google.com/p/google-summer-of-code-2011-r/.

Outcomes, benefits and costs

The main outcome of a GSoC student effort should be the code that becomes part of the mentoring organization's project, in our case, R packages. However, the short duration of the GSoC "Summer" means that it is more realistic to view the contribution as advancing the particular project of the student or mentor. An important adjunct to this is that the student is introduced into the developer community. In the longer term, we may see some impact from the code, as, for example, with the roxygen package. Furthermore, we have had GSoC students in one year return as mentors in another (e.g., I. Fellows, F. Schönbrodt).

The very presence of R in the GSoC list of mentoring organizations is a benefit in bringing student attention to the R Foundation and its work. The student stipends and modest gratuity are more concrete benefits.

There are considerable costs associated with these benefits of course. The mentors and administrators, as well as the students, are under a fairly tight timeline dictated by Google. This timeline is really only sensible in the context of the North American – possibly even Californian – academic calendar. Several of our 2011 students had major exams in the middle of their work period. The timeline is shown in the figure below.

[Figure: GSoC 2011 timeline for Google, the administrators, the mentors and the students, tracking the stages — program announced, organization application and acceptance, student application and acceptance, coding, mid-term evaluation, final evaluation, mentor summit — from January to December.]


Before the organization can apply to participate in GSoC, mentors have to prepare a statement of "project ideas". Students can later respond to these or propose others, then after communicating with the mentors prepare a project proposal. Due to the tight and strict deadlines, mentors should start to develop ideas and possibly also to look out for students early (thin line in the time graph). It is strongly encouraged that the mentors pose some form of test or other evaluation of the capability of the student to do the coding.

Once the student proposals are submitted (this is done on the Melange system), the mentors must then rank them over a roughly two-week period. This is, in our opinion, one of the least pleasant aspects of GSoC. In some cases, we noted good projects from students that did not find a suitable mentor. And mentors really are the champions of particular proposals. We also saw several good proposals for one topic, and only weaker ones for another. Choosing was not easy. Besides the quality of the proposal, more parameters for the selection were needed. While these could have been challenged by mentors, there was very little criticism of them. They were:

• To favour projects with a primary and alternate mentor over those with just one mentor.

• To allow only one project to be funded per primary mentor.

• To try to find a level of balance between topic areas.

• To accept complete proposals only.

As administrators, our most troubling issue was a mentor proposal that received almost unanimous criticism from other mentors as scientifically unsound; the project was subsequently rejected. Our main concern is less with an unhappy prospective mentor than with the student who put in effort to prepare a proposal that is rejected through misjudgement or misdirection of the mentor rather than bad quality of the student's proposal. Possibly we could benefit from a separate committee to review mentor ideas before these are suggested for student consideration. However, the time line is very tight, and our resources are already very limited.

We managed communications for the collective of students and mentors with the Google Groups mailing list [email protected]. For mentors-only communication, a separate mailing list [email protected] is available. In addition, Melange automatically put students on a student GSoC mailing list, and mentors on the list [email protected]. While traffic on the gsoc-r list was relatively light, the google-summer-of-code-mentors list was quite "noisy", with many trivial or non-GSoC items.

The Melange system Google chose to use for managing the GSoC effort is an in-house Google project, and is constantly being developed. This made it at times unstable. While Google staff were very friendly and replied promptly to our problems, mentors as well as students had to ask for assistance more than once.

Once the projects were going, we encouraged mentors and students to keep in regular communication. Some were lucky enough to be able to have face-to-face meetings. For the rest, we had email, Skype, and instant messaging, but for at least one of us there were issues of time-zone differences which made it critical to agree when to communicate.

For the administrators, it was important to hear from the mentors and their students, and in this we are less satisfied. Essentially, we need a "status board" with green, yellow and red lights to let us know which projects are going well, possibly in difficulty, or moribund, respectively.

As mentioned, we had one "vanishing mentor" who turned up just in time for both mid-term and final evaluations, and this for a project with but one mentor. In that case, the mentor was having to relocate – ordinary life got in the way of GSoC. For the student, however, this is not a very good situation, and we were fortunate to have a student who was self-reliant and competent. In future, we would push harder to ensure that each project is well supported by at least two mentors. In addition, we would wish the student had contacted us earlier when he did not get a response from the mentor. The "status board" idea, however implemented, would be high on our priorities for such reasons.

We had one student who did very little work, did not reply for an extended period, and was failed at the mid-term point. This is, of course, extremely unfortunate, but seems to be an anticipated occurrence, in that the overall GSoC failure rate appears to be between 10 and 20% and related traffic at the GSoC mentors mailing list refers to it as a known issue. While it is recommended that proposals from mentors include a "test" to demonstrate student capability, there is little to assess reliability. Possibly proposals from students would have a higher chance of avoiding such drop-outs.

On the other hand, we were very happy when one of the high-ranked students told us promptly that he had got a proper job offer and would like to withdraw his proposal. This was early enough so that the slot could be awarded to another student.

Advice for future students, mentors and administrators

Our main piece of advice to all is to begin early. The project ideas, initial communications between students and mentors to organize good proposals, and the preparation and refinement of student proposals all benefit from as much time as possible. In fact, it is now the time to fill in the GSoC 2012 project idea page at the R-Wiki (http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2012).

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859 NEWSAND NOTES 67 the preparation and refinement of student proposals work with students who bring their own proposal? If all benefit from as much time as possible. In fact so, we need to work out how to match such students it is now the time to fill in the GSoC2012 project with appropriate prospective mentors. idea page at the R-Wiki (http://rwiki.sciviews. Next year’s admins will be Toby Dylan Hock- org/doku.php?id=developers:projects:gsoc2012). ing and John C. Nash with Virgilio Gómez-Rubio as Prospective mentors should arrange for backup in backup. the form of a co-mentor. As administrators, we would recommend that the R GSoC effort not qualify any proposal without both a primary and backup mentor. Bibliography Mentors should also recognize that the GSoC coding Google Summer of Code. ProgramStatistics: Pro- and a research practicum or thesis work are not the gram Information for Past Years. http://code. same. Our understanding is that Google regards the google.com/p/google-summer-of-code/wiki/ stipend as for coding, not as a general scholarship or ProgramStatistics?ts=1315502962&updated= bursary. The very name “Summer of Code” connotes ProgramStatistics, 11 2011. a view that the coding is a form of summer employ- ment. Mentors should also be aware that students can range in level from a first-year undergraduate to John C. Nash a PhD student about to defend. Telfer School of Management, University of Ottawa As we have already suggested, we have particular 55 Laurier E, Ottawa, ON, Canada, K1N 6N5 concerns that students who make proposals based Canada on mentors’ ideas appear to have a large advantage [email protected] over students with their own proposal. It is clear we need to find mentors for students with their own Claudia Beleites proposal. At the time when the proposals are being Department of Spectroscopy and Imaging refined, it is late in the day to arrange suitable support Institute of Photonic Technology and guidance for students who we cannot expect to Albert-Einstein-Str. 9, 07745 Jena be well-connected to our R community. Can we ask Germany for volunteer potential mentors who are willing to [email protected]


Conference Report: useR! 2011

by Heather Turner

The seventh international R user conference, useR! 2011, took place at the University of Warwick, Coventry, 16-18 August 2011.

Following previous useR! conferences, this meeting of the R user community aimed to provide a platform for R users to discuss and exchange ideas of how R can be used for statistical computation, data analysis and visualization.

The conference attracted close to 450 participants, from 36 countries across North and South America, Europe, Africa, Asia and Australasia. The technical program comprised 136 regular talks, 30 lightning talks, two panel sessions, 21 regular posters and another 16 late-breaking posters, as well as eight invited talks. The social program consisted of an opening mixer, a poster reception sponsored by Revolution Analytics and a conference dinner sponsored by RStudio.

Pre-conference Tutorials

Prior to the conference, sixteen half-day tutorials were held. As in previous years, these tutorials were extremely popular, with 167 participants attending one or more of the following courses:

• Douglas Bates: Fitting and evaluating mixed models using lme4.
• Roger Bivand and Edzer Pebesma: Handling and analyzing spatio-temporal data in R.
• Marine Cadoret and Sébastien Lê: Analysing categorical data in R.
• Stephen Eglen: Emacs Speaks Statistics.
• Andrea Foulkes: High-dimensional data methods with R.
• Frank E Harrell Jr: Regression modeling strategies using the R package rms.
• Søren Højsgaard: Graphical models and Bayesian networks with R.
• G. Jay Kerns: Introductory probability and statistics using R.
• Max Kuhn: Predictive modeling with R and the caret package.
• Fausto Molinari, Enrico Branca, Francesco De Filippo and Rocco Claudio Cannizzaro: R-Adamant: Applied financial analysis & risk management.
• Martin Morgan: Bioconductor for the analysis of high-throughput genomic data.
• Paul Murrell: Introduction to grid graphics.
• Giovanni Petris: State space models in R.
• Karline Soetaert and Thomas Petzoldt: Simulating differential equation models in R.
• Antony Unwin: Graphical data analysis.
• Brandon Whitcher, Jörg Polzehl, Karsten Tabelow: Medical image analysis for MRI.

Invited Talks

The invited talks provided a focus point for the conference, where the participants gathered to hear from distinguished R users/developers. The logistical constraint of having to divide into two rooms, one with the speaker and one with a video link, seemed to pose little inconvenience to participants and overall there was a positive response to the following talks:

• Adrian Bowman: Modelling Three-dimensional Surfaces in R.
• Lee Edlefsen: High Performance Computing in R.
• Ulrike Grömping: Design of Experiments in R.
• Wolfgang Huber: From Genomes to Phenotypes.
• Brian Ripley: The R Development Process.
• Jonathan Rougier: Nomograms for visualising relationships between three variables.
• Simon Urbanek: R Graphics: Supercharged - Recent Advances in Visualization and Analysis of Large Data in R.
• Brandon Whitcher: Quantitative Analysis of Medical Imaging Data in R.

User-contributed Sessions

The themes of this year's Focus sessions were as follows:

• Computational Physics and Chemometrics
• Data Management
• Data Mining
• Development of R
• Dimensionality Reduction and Variable Selection
• Ecology and Ecological Modelling
• Finance
• Genomics and Bioinformatics
• High Performance Computing
• Hydrology and Soil Science
• Inference
• Interfaces
• Modelling Systems and Networks
• Multivariate Data
• Neuroscience
• Official and Social Statistics
• Population Genetics
• Process Optimization
• Programming
• Psychometrics
• Statistical Modelling
• Reporting Technologies and Workflows
• R in the Business World
• Spatio-Temporal Statistics
• Teaching
• Visualisation and Graphics


These themes were also represented in the poster session and in the seven kaleidoscope sessions that presented talks particularly suitable for a wider audience.

A new feature of the useR! 2011 conference was the opportunity for participants to give a Lightning Talk, a 5-minute presentation on any R-related topic aimed particularly at those new to R. This resulted in three highly enjoyable sessions on the themes:

• Community and Communication
• Statistics and Programming
• Package Showcase

Participants seemed to appreciate this fast-paced introduction to a wide range of topics and it provided lots of scope for discussion as we moved on to dinner and the evening poster reception.

Organisers

Many thanks go to this year's program committee:

Ramón Díaz-Uriarte, John Fox, Romain François, Robert Gramacy, Paul Hewson, Torsten Hothorn, Kate Mullen, Brian Peterson, Thomas Petzoldt, Anthony Rossini, Barry Rowlingson, Carolin Strobl, Stefan Theussl, Heather Turner, Hadley Wickham and Achim Zeileis.

and also the local organisers:

John Aston, Julia Brettschneider, David Firth, Ashley Ford, Ioannis Kosmidis, Tom Nichols, Elke Thönnes and Heather Turner.

Further Information

The useR! 2011 website, http://www.R-project.org/useR-2011/ provides a record of the conference. Where authors have made them available, slides are accessible via the online conference schedule. In addition videos of several of the invited talks have been put on the R-bloggers website: http://www.r-bloggers.com/RUG/category/user-conference/.

Heather Turner
Statistics Department,
University of Warwick,
Coventry, UK
[email protected]


Forthcoming Events: useR! 2012

The eighth international R user conference useR! 2012 will take place at Vanderbilt University, Nashville, Tennessee, USA, 12-15 June 2012. Following previous useR! conferences, this meeting of the R user community will

• focus on R as the 'lingua franca' of data analysis and statistical computing;
• provide a platform for R users to discuss and exchange ideas on how R can be used for statistical computation, data analysis, visualization and exciting applications in various fields;
• give an overview of the new features of the ever evolving R project.

The program comprises invited talks, user-contributed sessions, lightning talks, a poster session, and pre-conference tutorials and short courses.

Invited talks

The program of invited talks will represent the spectrum of interest from important technical developments to exciting applications of R, presented by experts in the field. The list of invited talks will be announced shortly.

User-contributed sessions

In the contributed sessions, presenters will share innovative and interesting uses of R, covering topics such as:

• Bayesian statistics
• Big data
• Bioinformatics
• Business analytics and financial modeling
• Chemometrics and computational physics
• Data mining
• Econometrics & finance
• Environmetrics & ecological modeling
• High performance computing
• Imaging
• Interfaces with other languages/software
• Machine learning
• Multivariate statistics
• Nonparametric statistics
• Pharmaceutical statistics
• Psychometrics
• Spatial statistics
• Statistics in the social and political sciences
• Teaching
• Visualization & graphics
• Web services with R

Abstracts may be submitted for oral or poster presentation. The poster session will be part of a major social event on one of the evenings during the conference, or during the main program of talks. Submissions for contributed talks will be considered for the following types of session:

• useR! Kaleidoscope: These sessions give a broad overview of the many different applications of R and should appeal to a wide audience.

• useR! Focus Session: These sessions cover topics of special interest and may be more technical.

In addition to the main program of talks and based on their successful introduction at useR! 2011, all participants are invited to present a Lightning Talk, for which no abstract is required. These talks provide a 5-minute platform to speak on any R-related topic and should particularly appeal to those new to R. A variation of the pecha kucha[1] and ignite[2] formats will be used in which speakers must provide 15 slides to accompany their talk and each slide will be shown for 20 seconds.

The scientific program is organized by members of the program committee comprising Claudia Beleites, Dirk Eddelbuettel, John Fox, Benjamin French, Tal Galili, Jeffrey Horner, Andy Liaw, Uwe Ligges, Steve Miller, Kate Mullen, Bill Pikounis, Matt Shotwell, Heather Turner, Ravi Varadhan, and Achim Zeileis.

Pre-conference training

Tutorials Before the start of the official program, half-day tutorials will be offered on Tuesday, June 12th. The tutorials will address such topics as R for beginners, R for users of other statistical packages, R for statistical modeling, reproducible reporting with Sweave, web services with R, and graphics.

Short courses A new feature of useR! 2012 is the full-day pre-conference short courses, to be held on Monday, June 11. These will include Frank Harrell's Regression Modeling Strategies and the rms Package course.

Tutorial/short course presenters include Bill Venables, Terry Therneau, Robert Muenchen, Hadley Wickham, Doug Bates, Jeffrey Horner, Uwe Ligges, and Frank Harrell.

Location & surroundings

Participants will be well catered for at Vanderbilt University, with multiple nearby hotels as well as on-campus housing available. The University is near downtown Nashville which is close to a major airport.

[1] http://en.wikipedia.org/wiki/Pecha_Kucha
[2] http://en.wikipedia.org/wiki/Ignite_(event)


Nashville is in middle Tennessee and is known as Music City USA for good reason, being the heart of country music, having a world-class symphony orchestra which has won 6 Grammy awards in the past 3 years, and hosting great jazz, bluegrass, blues, and many other types of music.

One of the largest music festivals in the US, Bonnaroo, takes place June 7-10 only 65 miles from Nashville. Another large festival, the country music CMA Fest, takes place in Nashville over the same days. The Cirque du Soleil show, Michael Jackson THE IMMORTAL World Tour comes to Nashville on June 12-13.

Further information

A web page offering more information on useR! 2012, including details regarding registration and abstract submission, is available at http://www.R-project.org/useR-2012. We hope to meet you in Nashville!

The organizing committee:
Frank Harrell (chair), Stephania McNeal-Goddard, Matt Shotwell, JoAnn Alvarez, John Bock, Audrey Carvajal, WJ Cunningham, Chris Fonnesbeck, Tatsuki Koyama, Yanna Song, Sherry Stokes, Yuwei Zhu
[email protected]


Changes in R

From version 2.13.1 to version 2.14.0

by the R Core Team

CHANGES IN R VERSION 2.14.0

SIGNIFICANT USER-VISIBLE CHANGES

• All packages must have a namespace, and one is created on installation if not supplied in the sources. This means that any package without a namespace must be re-installed under this version of R (but previously-installed data-only packages without R code can still be used).

• The yLineBias of the X11() and windows() families of devices has been changed from 0.1 to 0.2: this changes slightly the vertical positioning of text in the margins (including axis annotations). This is mainly for consistency with other devices such as quartz() and pdf(). (Wish of PR#14538.)

There is a new graphics parameter "ylbias" which allows the y-line bias of the graphics device to be tweaked, including to reproduce output from earlier versions of R.

• Labeling of the p-values in various anova tables has been rationalized to be either "Pr(>F)" or "Pr(>Chi)" (i.e. the "Pr(F)", "Pr(Chi)" and "P(>|Chi|)" variants have been eliminated). Code which extracts the p value via indexing by name may need adjustment.

• :: can now be used for datasets made available for lazy-loading in packages with namespaces (which makes it consistent with its use for data-only packages without namespaces in earlier versions of R).

• There is a new package parallel.

It incorporates (slightly revised) copies of packages multicore and snow (excluding MPI, PVM and NWS clusters). Code written to use the higher-level API functions in those packages should work unchanged (apart from changing any references to their namespaces to a reference to parallel, and links explicitly to multicore or snow on help pages).

It also contains support for multiple RNG streams following L'Ecuyer et al (2002), with support for both mclapply and snow clusters. This replaces functions like clusterSetupRNG() from snow (which are not in parallel).

The version released for R 2.14.0 contains base functionality: higher-level convenience functions are planned (and some are already available in the 'R-devel' version of R).

• Building PDF manuals (for R itself or packages, e.g. via R CMD check) by default requires the LaTeX package 'inconsolata': see the section on 'Making the manuals' in the 'R Installation and Administration Manual'.

• axTicks(*, log=TRUE) has changed in some cases to satisfy the documented behavior and be consistent.

NEW FEATURES

• txtProgressBar() can write to an open connection instead of the console.

• Non-portable package names ending in '.' are no longer allowed. Nor are single-character package names (R was already disallowed).

• regexpr() and gregexpr() with perl = TRUE allows Python-style named captures. (Wish and contribution of PR#14518.)

• The placement of 'plotmath' text in the margins of plots done by base graphics now makes the same vertical adjustment as ordinary text, so using ordinary and plotmath text on the same margin line will seem better aligned (but not exactly aligned, since ordinary text has descenders below the baseline and plotmath places them on the baseline). (Related to PR#14537.)

• sunflowerplot() now has a formula interface. (Wish of PR#14541.)

• iconv() has a new argument toRaw to handle encodings such as UTF-16 with embedded nuls (as was possible before the CHARSXP cache was introduced).

It will also accept as input the type of list generated with toRaw = TRUE.

• Garbage-collecting an unused input text connection no longer gives a warning (since it 'connects' to nothing outside R).

• read.table() and scan() have gained a text argument, to allow reading data from a (possibly literal) character string.

• optim(*, method = .) now allows method = "Brent" as an interface to optimize(), for use in cases such as mle() where optim() is used internally.

• mosaicplot() gains a border argument. (Wish of PR#14550.)
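As a quick illustrative sketch of three of the additions above (the text= argument, named captures, and the "Brent" method), with data and patterns invented for the example:

    ## read a small table from a literal string
    d <- read.table(text = "x y\n1 2\n3 4", header = TRUE)

    ## Python-style named captures with perl = TRUE
    m <- regexpr("(?<user>[^@]+)@(?<host>.+)", "jane@example.org", perl = TRUE)
    attr(m, "capture.names")    # "user" "host"

    ## one-dimensional optimisation via the new method = "Brent"
    optim(par = 1, fn = function(x) (x - 2)^2, method = "Brent",
          lower = -10, upper = 10)$par    # essentially 2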


• smooth.spline() gains a tol argument which controls how different x values need to be to be treated as distinct. The default has been changed to be more reliable for inputs whose range is small compared to their maximum absolute value. (Wish of PR#14452.)

• gl() runs faster by avoiding calling factor().

• The print() method for object.size() accepts 'B' as well as 'b' as an abbreviation for 'bytes'.

• unlink() gains a force argument to work like rm -f and if possible override restrictive permissions.

• pbirthday() and qbirthday() now use exact calculations for coincident = 2.

• unzip() and unz() connections have been updated with support for more recent Zip64 features (including large file sizes and bzip2 compression, but not UTF-8 file names).

unzip() has a new option to restore file times from those recorded (in an unknown timezone) in the zip file.

• update.packages() now accepts a character vector of package names for the oldPkgs argument. (Suggestion of Tal Galili.)

• The special reference class fields .self and .refClassDef are now read-only to prevent corrupting the object.

• decompose() now returns the original series as part of its value, so it can be used (rather than reconstructed) when plotting. (Suggestion of Rob Hyndman.)

• Rao's efficient score test has been implemented for glm objects. Specifically, the add1, drop1, and anova methods now allow test = "Rao" (a small sketch is given below).

• If a saved workspace (e.g. '.RData') contains objects that cannot be loaded, R will now start with a warning message and an empty workspace, rather than failing to start.

• The formula method for plot() no longer places package stats on the search path (it loads the namespace instead).

• There now is a genuine "function" method for plot() rather than the generic dispatching internally to graphics::plot.function(). It is now exported, so can be called directly as plot.function().

• The one-sided ks.test() allows exact = TRUE to be specified in the presence of ties (but the approximate calculation remains the default: the 'exact' computation makes assumptions known to be invalid in the presence of ties).

• The behaviour of curve(add = FALSE) has changed: it now no longer takes the default x limits from the previous plot (if any): rather they default to c(0, 1) just as the "function" method for plot(). To get the previous behaviour use curve(add = NA), which also takes the default for log-scaling of the x-axis from the previous plot.

• Both curve() and the plot() method for functions have a new argument xname to facilitate plots such as sin(t) vs t.

• The local argument to source() can specify an environment as well as TRUE (parent.env()) and FALSE (.GlobalEnv). It gives better error messages for other values, such as NA.

• vcov() gains methods for classes "summary.lm" and "summary.glm".

• The plot() method for class "profile.nls" gains ylab and lty arguments, and passes ... on to plot.default.

• Character-string arguments such as the mode argument of vector(), as.vector() and is.vector() and the description argument of file() are required to be of length exactly one, rather than any further elements being silently discarded. This helps catch incorrect usage in programming.
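A minimal sketch of the new score test; the model and data set (the standard infert data) are our choice for illustration, not part of the NEWS entry:

    fit <- glm(case ~ spontaneous + induced, data = infert,
               family = binomial())
    anova(fit, test = "Rao")   # analysis-of-deviance table with a Rao score test
    drop1(fit, test = "Rao")   # score tests for dropping each term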

• strptime() now accepts times such as '24:00' for midnight at the end of the day, for although these are disallowed by POSIX 1003.1-2008, ISO 8601:2004 allows them (see the sketch below).

• The length argument of vector() and its wrappers such as numeric() is required to be of length exactly one (other values are now an error rather than giving a warning as previously).
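For the strptime() change, a one-line sketch (the date is arbitrary):

    ## '24:00' is accepted as end-of-day midnight per ISO 8601:2004,
    ## i.e. the same instant as "2012-01-01 00:00"
    strptime("2011-12-31 24:00", format = "%Y-%m-%d %H:%M")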

• Assignment of names() to S4 objects now checks for a corresponding "names" slot, and generates a warning or an error if that slot is not defined. See the section on slots in ?Classes.

• vector(len) and length(x) <- len no longer accept TRUE/FALSE for len (not that they were ever documented to, but there was special-casing in the C code).

• The default methods for is.finite(), is.infinite() and is.nan() now signal an error if their argument is not an atomic vector.

• There is a new function Sys.setFileTime() to set the time of a file (including a directory). See its help for exactly which times it sets on various OSes.
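A small sketch of the new Sys.setFileTime() (the file name and time offset are arbitrary):

    f <- tempfile()
    writeLines("some text", f)
    Sys.setFileTime(f, Sys.time() - 86400)   # back-date the file by one day
    file.info(f)$mtime                        # check the new modification time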


• The file times reported by file.info() are reported to sub-second resolution on systems which support it. (Currently the POSIX 2008 and FreeBSD/Darwin/NetBSD methods are detected.)

• New function getCall(m) as an abstraction for m$call, enabling update()'s default method to apply more universally. (NB: this can be masked by existing functions in packages.)

• Sys.info() gains a euser component to report the 'effective' user on OSes which have that concept.

• The result returned by try() now contains the original error condition object as the "condition" attribute.

• All packages with R code are lazy-loaded irrespective of the 'LazyLoad' field in the 'DESCRIPTION' file. A warning is given if the 'LazyLoad' field is overridden.

• Rd markup has a new '\figure' tag so that figures can be included in help pages when converted to HTML or LaTeX. There are examples on the help pages for par() and points().

• The built-in httpd server now allows access to files in the session temporary directory tempdir(), addressed as the '/session' directory on the httpd server.

• Development versions of R are no longer referred to by the number under which they might be released, e.g. in the startup banner, R --version and sessionUtils(). The correct way to refer to a development version of R is 'R-devel', preferably with the date and SVN version number, e.g. 'R-devel (2011-07-04 r56266)'.

• There is a new function texi2pdf() in package tools, currently a convenience wrapper for texi2dvi(pdf = TRUE).

• There are two new options for typesetting PDF manuals from Rd files. These are 'beramono' and 'inconsolata', and use the named font for monospaced output. They are intended to be used in combination with 'times', and 'times,inconsolata,hyper' is now the default for the reference manual and package manuals. If you do not have that font installed, you can set R_RD4PDF to one of the other options: see the 'R Installation and Administration Manual'.

• There is a new type of RNG, "L'Ecuyer-CMRG", implementing L'Ecuyer (1999)'s 'combined multiple-recursive generator' 'MRG32k3a'. See the comments on ?RNG, and the sketch below.

• help.search() and ?? can now display vignettes and demos as well as help pages. The new option "help.search.types" controls the types of documentation and the order of their display.

This also applies to HTML searches, which now give results in all of help pages, vignettes and demos.

• tools::Rdiff() (by default) and R CMD Rdiff now ignore differences in pointer values when comparing printed environments, compiled byte code, etc.

• The "source" attribute on functions created with keep.source=TRUE has been replaced with a "srcref" attribute. The "srcref" attribute references an in-memory copy of the source file using the "srcfilecopy" class or the new "srcfilealias" class.

• New items "User Manuals" and "Technical Papers" have been added to the HTML help main page. These link to vignettes in the base and recommended packages and to a collection of papers about R issues, respectively.

• Documentation and messages have been standardized to use "namespace" rather than "name space".

• setGeneric() now looks in the default packages for a non-generic version of a function if called from a package with a namespace. (It always did for packages without a namespace.)

• Setting the environment variable _R_WARN_ON_LOCKED_BINDINGS_ will give a warning if an attempt is made to change a locked binding.

• \SweaveInput is now supported when generating concordances in Sweave().

• findLineNum() and setBreakpoint() now allow the environment to be specified indirectly; the latter gains a clear argument to allow it to call untrace().

• The body of a closure can be one of further types of R objects, including environments and external pointers.

• The Rd2HTML() function in package tools now has a stylesheet argument, allowing pages to be displayed in alternate formats.

• New function requireNamespace() analogous to require(), returning a logical value after attempting to load a namespace.

• Automatic printing for reference classes is now done by the $show() method. A method is defined for class 'envRefClass' and may be overridden for user classes (see the ?ReferenceClasses example). S4 show() methods should no longer be needed for reference classes.
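Selecting the new generator is a one-liner; a minimal sketch (the seed value is arbitrary):

    RNGkind("L'Ecuyer-CMRG")   # switch to the combined multiple-recursive generator
    set.seed(42)
    runif(3)
    RNGkind("default")         # restore the default Mersenne-Twister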


• socketConnection() now has a timeout argument. It is now documented that large values (package snow used a year) do not work on some OSes.

• The initialization of the random-number generator now uses the process ID as well as the current time, just in case two R processes are launched very rapidly on a machine with low-resolution wall clock (some have a resolution of a second; modern systems have microsecond-level resolution).

• New function pskill() in the tools package to send a terminate signal to one or more processes, plus constants such as SIGTERM to provide a portable way to refer to signals (since the numeric values are OS-dependent).

• New function psnice() in the tools package to return or change the 'niceness' of a process. (Refers to the 'priority class' on Windows.)

• list.dirs() gains a recursive argument.

• An 'Authors@R' field in a package 'DESCRIPTION' file can now be used to generate 'Author' and 'Maintainer' fields if needed, and to auto-generate package citations.

• New utility getElement() for accessing either a list component or a slot in an S4 object.

• stars() gains a col.lines argument, thanks to Dustin Sallings. (Wish of PR#14657.)

• New function regmatches() for extracting or replacing matched or non-matched substrings from match data obtained by regexpr(), gregexpr() and regexec() (a small sketch is given below).

• help(package = "pkg_name", help_type = "HTML") now gives HTML help on the package rather than text help. (This gives direct access to the HTML version of the package manual shown via help.start()'s 'Packages' menu.)

• agrep() gains a fixed argument to optionally allow approximate regular expression matching, and a costs argument to specify possibly different integer match costs for insertions, deletions and substitutions.

• read.dcf() and write.dcf() gain a keep.white argument to indicate fields where whitespace should be kept as is.

• available.packages() now works around servers that fail to return an error code when 'PACKAGES.gz' does not exist. (Patch submitted by Seth Schommer.)

• readBin() can now read more than 2^31 - 1 bytes in a single call (the previously documented limitation).

• New function regexec() for finding the positions of matches as well as all substrings corresponding to parenthesized subexpressions of the given regular expression.

• New function adist() in package utils for computing 'edit' (generalized Levenshtein) distances between strings.

• Class "raster" gains an is.na method to avoid confusion from the misuse of the matrix method (such as PR#14618).

• The identical() function gains an ignore.bytecode argument to control comparison of compiled functions.

• pmin() and pmax() now warn if an argument is partially recycled (wish of PR#14638).

• The default for image(useRaster=) is now taken from option "preferRaster": for the small print see ?image.

• str() now displays reference class objects and their fields, rather than treating them as classical S4 classes.

• New function aregexec() in package utils for finding the positions of approximate string matches as well as all substrings corresponding to parenthesized subexpressions of the given regular expression.

• download.file() has an extra argument to pass additional command-line options to the non-default methods using command-line utilities.

cacheOK = FALSE is now supported for method = "curl".

• interaction.plot(*, type = .) now also allows type "o" or "c".

• axTicks(*, log=TRUE) did sometimes give more values than the ticks in the corresponding graphics::axis(). By default, it now makes use of the new (graphics-package independent) axisTicks() which can make use of a new utility .axisPars(). Further, it now returns a decreasing sequence (as for log=FALSE) when usr is decreasing.

• Using fix() or edit() on an R object (except perhaps a matrix or data frame) writes its temporary file with extension '.R' so editors which select their mode based on the extension will select a suitable mode.
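A small sketch of the new regexec() and regmatches() working together (the example strings are invented):

    x <- c("alice@example.com", "bob@example.org")
    m <- regexec("([^@]+)@(.+)", x)
    ## each element holds the full match followed by the two
    ## parenthesized substrings (user and domain)
    regmatches(x, m)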


GRAPHICS DEVICES

• The pdf() device makes use of Flate compression: this is controlled by the new logical argument compress, and is enabled by default.

• Devices svg(), cairo_pdf() and cairo_ps() gain a family argument.

On a Unix-alike X11() gains a family argument. This is one of the x11.options() and so can be passed as an argument to the bmp(), jpeg(), png() and tiff() devices.

Analogous changes have been made on Windows, so all built-in R graphics devices now have a family argument except pictex() (which has no means to change fonts).

• The bmp(), jpeg(), png() and tiff() devices now make use of the antialias argument for type = "quartz".

• There are several new built-in font mappings for X11(type = "Xlib"): see the help on X11Fonts().

• There is a new type X11(type = "dbcairo") which updates the screen less frequently: see its help page.

• The X11() device now makes use of cursors to distinguish its states. The normal cursor is an arrow (rather than a crosshair); the crosshair is used when the locator is in use, and a watch cursor is shown when plotting computations are being done. (These are the standard names for X11 cursors: how they are actually displayed depends on the window manager.)

• New functions dev.hold() and dev.flush() for use with graphics devices with buffering. These are used for most of the high-level graphics functions such as boxplot(), so that the plot is only displayed when the page is complete. Currently implemented for windows(buffered = TRUE), quartz() and the cairographics-based X11() types with buffering (which are the default on-screen devices). See the sketch below.

• New function dev.capture() for capture of bitmap snapshots of image-based devices (a superset of the functionality provided by grid.cap() in grid).

• The default colormodel for pdf() and postscript() is now called "srgb" to more accurately describe it. (Instead of "rgb", and in the case of postscript() it no longer switches to and from the gray colorspace, by default.)

The colormodel for postscript() which does use both gray and sRGB colorspaces is now called "srgb+gray".

Plots which are known to use only black/white/transparent can advantageously use colormodel = "gray" (just as before, but there is now slightly more advantage in doing so).

• postscript() with values colormodel = "rgb" and colormodel = "rgb-nogray" give the behaviour prior to R 2.13.0 of uncalibrated RGB, which under some circumstances can be rendered much faster by a viewer.

pdf(colormodel = "rgb") gives the behaviour prior to R 2.13.0 of uncalibrated RGB, which under some circumstances can be rendered faster by a viewer, and the files will be smaller (by about 9KB if compression is not used).

• The postscript() device only includes the definition of the sRGB colorspace in the output file for the colormodels which use it.

• The postscript() and pdf() devices now output greyscale raster images (and not RGB) when colormodel = "gray".

• postscript(colormodel = "gray") now accepts non-grey colours and uses their luminance (as pdf() long has).

• colormodel = "grey" is allowed as an alternative name for postscript() and pdf().

• pdf() in the default sRGB colorspace outputs many fewer changes of colorspace, which may speed up rendering in some viewing applications.

• There is a new function dev.capabilities() to query the capabilities of the current device. The initial set of capabilities are support for semi-transparent colours, rendering and capturing raster images, the locator and for interactive events.

• For pdf(), maxRasters is increased as needed so the argument is no longer used.

SWEAVE & VIGNETTES

• Options keep.source = TRUE, figs.only = FALSE are now the default.

• The way the type of user-defined options is determined has changed. Previously they were all regarded as logical: now the type is determined by the value given at first use.

• The allowed values of logical options are now precisely those allowed for character inputs to as.logical(): this means that 't' and 'f' are no longer allowed (although T and F still are).
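A minimal sketch of dev.hold()/dev.flush() on a buffered screen device (the plotted data are random):

    dev.hold()                     # suspend screen updates
    plot(rnorm(100), type = "l")
    abline(h = 0, col = "gray")
    dev.flush()                    # show the finished plot in one go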


• The preferred location for vignette sources is now the directory 'vignettes' and not 'inst/doc': R CMD build will now re-build vignettes in directory 'vignettes' and copy the '.Rnw' (etc) files and the corresponding PDFs to 'inst/doc'. Further files to be copied to 'inst/doc' can be specified via the file 'vignettes/.install_extras'.

• R CMD Sweave now supports a '--driver' option to select the Sweave driver: the default is equivalent to '--driver=RweaveLatex'.

• R CMD Sweave and R CMD Stangle support options '--encoding' and '--options'.

• The Rtangle() driver allows output = "stdout" or output = "stderr" to select the output or message connection. This is convenient for scripting using something like

    R CMD Stangle --options='output="stdout"' foo.Rnw > foo2.R

• There is a new option pdf.compress controlling whether PDF figures are generated using Flate compression (they are by default).

• R CMD Sweave now has a '--pdf' option to produce a PDF version of the processed Sweave document.

• It is no longer allowed to have two vignettes with the same vignette basename (e.g. 'vig.Rnw' and 'vig.Snw'). (Previously one vignette hid the other in the vignette() function.)

C-LEVEL FACILITIES

• Function R_tmpnam2 has been added to the API to allow a temporary filename to include a specified extension.

PACKAGE INSTALLATION

• Package 'DESCRIPTION' file field 'KeepSource' forces the package to be installed with keep.source = TRUE (or FALSE). (Suggestion of Greg Snow. Note that as all packages are lazy-loaded, this is now only relevant at installation.)

There are corresponding options '--with-keep.source' and '--without-keep.source' for R CMD INSTALL.

• R CMD INSTALL has a new option '--byte-compile' to byte-compile the packages during installation (since all packages are now lazy-loaded). This can be controlled on a per-package basis by the optional field 'ByteCompile' in the 'DESCRIPTION' file.

• A package with R code but without a 'NAMESPACE' file will have a default one created at R CMD build or R CMD INSTALL time, so all packages will be installed with namespaces. A consequence of this is that .First.lib() functions need to be copied to .onLoad() (usually) or .onAttach(). For the time being, if there is an auto-generated 'NAMESPACE' file and no .onLoad() nor .onAttach() function is found but .First.lib() is, it will be run as the attach hook (unless the package is one of a list of known exceptions, when it will be run as the load hook).

• A warning is given if test-loading a package changes a locked binding in a package other than itself. It is likely that this will be disallowed in future releases. (There are pro tem some exceptions to the warning.)

• A dependency on SVN revision is allowed for R, e.g. R (>= r56550). This should be used in conjunction with a version number, e.g. R (>= 2.14.0), R (>= r56550) to distinguish between R-patched and R-devel versions with the same SVN revision.

• installed.packages() now hashes the names of its cache files to avoid very rare problems with excessively long path names. (PR#14669)

• A top-level 'COPYING' file in a package is no longer installed (file names 'LICENSE' or 'LICENCE' having long been preferred).

UTILITIES

• R CMD check now gives an error if the R code in a vignette fails to run, unless this is caused by a missing package.

• R CMD check now unpacks tarballs in the same way as R CMD INSTALL, including making use of the environment variable R_INSTALL_TAR to override the default behaviour.

• R CMD check performs additional code analysis of package startup functions, and notifies about incorrect argument lists and (incorrect) calls to functions which modify the search path or inappropriately generate messages.

• R CMD check now also checks compiled code for symbols corresponding to functions which might terminate R or write to 'stdout'/'stderr' instead of the console.

• R CMD check now uses a pdf() device when checking examples (rather than postscript()).

• R CMD check now checks line-endings of makefiles and C/C++/Fortran sources in subdirectories of 'src' as well as in 'src' itself.


• R CMD check now reports as a NOTE what look like methods documented with their full names even if there is a namespace and they are exported. In almost all cases they are intended to be used only as methods and should use the '\method' markup. In the other rare cases the recommended form is to use a function such as coefHclust which would not get confused with a method, document that and register it in the 'NAMESPACE' file by s3method(coef, hclust, coefHclust).

• The default for the environment variable _R_CHECK_COMPACT_DATA2_ is now true: thus if using the newer forms of compression introduced in R 2.10.0 would be beneficial is now checked (by default).

• Reference output for a vignette can be supplied when checking a package by R CMD check: see 'Writing R Extensions'.

• R CMD Rd2dvi allows the use of LaTeX package 'inputenx' rather than 'inputenc': the value of the environment variable RD2DVI_INPUTENC is used. (LaTeX package 'inputenx' is an optional install which provides greater coverage of the UTF-8 encoding.)

• Rscript on a Unix-alike now accepts file names containing spaces (provided these are escaped or quoted in the shell).

• R CMD build on a Unix-alike (only) now tries to preserve dates on files it copies from its input directory. (This was the undocumented behaviour prior to R 2.13.0.)

DEPRECATED AND DEFUNCT

• require() no longer has a save argument.

• The gamma argument to hsv(), rainbow(), and rgb2hsv() has been removed.

• The '--no-docs' option for R CMD build -binary is defunct: use '--install-args' instead.

• The option '--unsafe' to R CMD INSTALL is defunct: use the identical option '--no-lock' instead.

• The entry point pythag formerly in 'Rmath.h' is defunct: use instead the C99 function hypot.

• R CMD build -binary is formally defunct: R CMD INSTALL -build has long been the preferred alternative.

• zip.file.extract() is now defunct: use unzip() or unz() instead.

• R CMD Rd2dvi without the '--pdf' option is now deprecated: only PDF output will be supported in future releases (since this allows the use of fonts only supported for PDF), and only R CMD Rd2pdf will be available.

• Options such as '--max-nsize' and the function mem.limits() are now deprecated: these limits are nowadays almost never used, and are reported by gc() when they are in use.

• Forms like binomial(link = "link") for GLM families deprecated since R 2.4.0 are now defunct.

• The declarativeOnly argument to loadNamespace() (not relevant since R 2.13.0) has been removed.

• Use of library.dynam() without specifying all the first three arguments is deprecated. (It is often called from a namespace, and the defaults are only appropriate to a package.)

Use of chname in library.dynam() with the extension '.so' or '.dll' (which is clearly not allowed according to the help page) is deprecated. This also applies to library.dynam.unload() and useDynLib directives in NAMESPACE files.

• It is deprecated to use mean(x) and sd(x) directly on data frames (or also matrices, for sd) instead of simply using sapply (see the sketch after this list).

In the same spirit, median(x) now gives an error for a data frame x (it often gave nonsensical results).

• The keep.source argument to library() and require() is deprecated: it was only used for packages installed without lazy-loading, and now all packages are lazy-loaded.

• Using a false value for the 'DESCRIPTION' field 'LazyLoad' is deprecated.

INSTALLATION

• The base and recommended packages are now byte-compiled (equivalent to make bytecode in R 2.13.x).

• Configure option '--with-system-zlib' now only makes use of the basic interface of zlib and not the C function 'gzseek' which has shown erroneous behaviour in zlib 1.2.4 and 1.2.5.

• The zlib in the R sources is now version 1.2.5. (This is safe even on 32-bit Linux systems because only the basic interface is now used.)

• The '.afm' files in package grDevices are now installed as compressed files (as long done on Windows), saving ca 2MB on the installed size.
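For the mean()/sd() deprecation above, the recommended replacement is a one-liner (the data frame is invented for illustration):

    d <- data.frame(a = 1:4, b = (1:4)^2)
    sapply(d, mean)             # column means, instead of mean(d)
    vapply(d, sd, numeric(1))   # likewise for sd, with a checked return type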


• The non-screen cairo-based devices are no longer in the X11 module and so can be installed without X11. (We have never seen a Unix-alike system with cairographics installed but not X11, but a user might select '--without-x'.)

• Configure will try to use -fobjc-exceptions for the Objective-C compiler (if present) to ensure that even compilers that do not enable exceptions by default (such as vanilla gcc) can be used. (Objective-C is currently only used on Mac OS X.)

• The system call times is required.

• The C99 functions acosh, asinh, atanh, snprintf and vsnprintf are now required.

• There is no longer support for making DVI manuals via make dvi, make install-dvi and similar. Only PDF manuals are supported (to allow the use of fonts which are only available for PDF.)

• The 'configure' arguments used during configuration of R are included as a comment in 'Makeconf' for informative purposes on Unix-alikes in a form suitable for shell execution. Note that those are merely command-line arguments, they do not include environment variables (one more reason to use configure variables instead) or site configuration settings.

• Framework installation now supports DESTDIR (Mac OS X only).

• Java detection (R CMD javareconf) works around bogus java.library.path property in recent Oracle Java binaries.

BUG FIXES

• The locale category 'LC_MONETARY' was only being set on startup on Windows: that is now done on Unix-alikes where supported.

• Reference class utilities will detect an attempt to modify methods or fields in a locked class definition (e.g., in a namespace) and generate an error.

• The formula methods for lines(), points() and text() now work even if package stats is not on the search path.

• In principle, S4 classes from different packages could have the same name. This has not previously worked. Changes have now been installed that should allow such classes and permit methods to use them. New functions className() and multipleClasses() are related tools for programming.

• Work around an issue in Linux (a system select call resetting tv) which prevented internet operations from timing out properly.

• Several stack trampling and overflow issues have been fixed in TRE, triggered by agrep and friends with long patterns. (PR#14627.)

• ("design infelicity") Field assignments in reference classes are now consistent with slots in S4 classes: the assigned value must come from the declared class (if any) for the field or from a subclass.

• The methods objects constructed for "coerce" and "coerce<-" were lacking some essential information in the generic, defined and target slots; as() did not handle duplicate class definitions correctly.

• The parser no longer accepts the digit 8 in an octal character code in a string, nor does it accept unterminated strings in a file. (Reported by Bill Dunlap.)

• The print() method for class "summary.aov" did not pass on argument digits when summary() was called on a single object, and hence used more digits than documented.

• The X11() device's cairo back-end produced incorrect capture snapshot images on big-endian machines.

• loglin() gave a spurious error when argument margin consisted of a single element of length one. (PR#14690)

• loess() is better protected against misuse, e.g. zero-length span. (PR#14691)

• HoltWinters() checks that the optimization succeeded. (PR#14694)

• The (undocumented) inclusion of superclass objects in default initializing of reference classes overwrote explicit field arguments. The bug is fixed, the feature documented and a test added.

• round(x, -Inf) now does something sensible (return zero rather than NA).

• signif(x, -Inf) now behaves as documented (signif(x, 1)) rather than giving zero.

• The "table" method for Axis() hardcoded side = 1, hence calls to plot(<table>, ...) labelled the wrong axis. (PR#14699)

• Creating a connection might fail under gctorture(TRUE).


• stack() and unstack() converted character columns to factors.

unstack() sometimes produced incorrect results (a list or a vector) if the factor on which to un-split had only one level.

• On some systems help(".C", help_type = "pdf") and similar generated file names that TeX was unable to handle.

• Non-blocking listening socket connections continued to report isIncomplete() as true even when the peer had closed down and all available input had been read.

• The revised HTML search system now generates better hyperlinks to help topics found: previously it gave problems with help pages with names containing e.g. spaces and slashes.

• A late change in R 2.13.2 broke '\Sexpr' expressions in Rd files.

• The creation of ticks on log axes (including by axTicks()) sometimes incorrectly omitted a tick at one end of the range by rounding error in a platform-dependent way. This could be seen in the examples for axTicks(), where with axis limits c(0.2, 88) the tick for 0.2 was sometimes omitted.

• qgamma() for small shape underflows to 0 rather than sometimes giving NaN. (PR#8528, PR#14710)

• mapply() now gives an explicit error message (rather than an obscure one) if inputs of zero and positive length are mixed.

• Setting a Hershey font family followed by a string height query would crash R.

• R CMD javareconf -e would fail for some shells due to a shift error. Also the resulting paths will no longer contain $(JAVA_HOME) as that can result in an unintended substitution based on 'Makeconf' instead of the shell setting.

• The print() methods for classes "Date", "POSIXct" and "POSIXlt" respect the option "max.print" and so are much faster for very long datetime vectors. (Suggestion of Yohan Chalabi.)

• untar2() now works around errors generated with tar files that use more than the standard 6 digits for the checksum. (PR#14654)

• install.packages() with Ncpus > 1 guards against simultaneous installation of indirect dependencies as well as direct ones.

• Sweave now knows about a few more Windows encodings (including cp1250 and cp1257) and some inputenx encodings such as koi8-r.

• postscript(colormodel = "rgb-nogray") no longer sets the sRGB colorspace for each colour and so some viewers may render its files much faster than the default colormodel = "rgb".

• The default for pdf(maxRasters=) has been increased from 64 to 1000.

• readBin() now warns if signed = FALSE is used inappropriately (rather than being silently ignored); see the sketch below.

It enforces the documented limit of 2^31 - 1 bytes in a single call.

• PCRE has been updated to version 8.13, a bug-fix release with updated Unicode tables (version 6.0.0). An additional patch (r611 from PCRE 8.20-to-be) has been added to fix a collation symbol recognition issue.

INSTALLATION

• It is possible to build in 'src/extra/xdr' on more platforms. (Needed since glibc 2.14 hides its RPC implementation.)

• configure will find the Sun TI-RPC implementation of xdr (in 'libtirpc') provided its header files are in the search path: see the 'R Installation and Administration Manual'.
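A small sketch of an appropriate use of signed = FALSE (reading single unsigned bytes; the input bytes are invented):

    con <- rawConnection(as.raw(c(1, 200, 255)))
    readBin(con, "integer", n = 3, size = 1, signed = FALSE)  # 1 200 255
    close(con)
    ## signed = FALSE with size = 4 would now trigger a warning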

CHANGES IN R VERSION 2.13.2

NEW FEATURES

• mem.limits() now reports values larger than the maximum integer (previously documented to be reported as NA), and allows larger values to be set, including Inf to remove the limit.

PACKAGE INSTALLATION

• Using a broad exportPattern directive in a 'NAMESPACE' file is no longer allowed to export internal objects such as .onLoad and .__S3MethodsTable__.

These are also excluded from imports, along with .First.lib.


BUG FIXES

• fisher.test() had a buglet: if arguments were factors with unused levels, levels were dropped and you would get an error saying that there should be at least two levels, inconsistently with pre-tabulated data. (Reported by Michael Fay.)

• package.skeleton() will no longer dump S4 objects supplied directly rather than in a code file. These cannot be restored correctly from the dumped version. (PR#14671)

• '\Sexpr[results=rd]' in an '.Rd' file now first tries parse_Rd(fragment=FALSE) to allow Rd section-level macros to be inserted.

• The print() method for class "summary.aov" did not pass on arguments such as signif.stars when summary() was called on a single object. (PR#14684)

• In rare cases ks.test() could return a p-value very slightly less than 0 by rounding error.

• Build-time expressions in help files did not have access to functions in the package being built (with R CMD build).

• If trunc() was called on a "POSIXlt" vector and the result was subsetted, all but the first element was converted to NA. (PR#14679)

• Because quote() did not mark its result as being in use, modification of the result could in some circumstances modify the original call.

• cbind() and rbind() could cause memory corruption when used on a combination of raw and logical/integer vectors.

• Plotting pch = '.' now guarantees at least a one-pixel dot if cex > 0.

• The very-rarely-used command-line option '--max-vsize' was incorrectly interpreted as a number of Vcells and not in bytes as documented. (Spotted by Christophe Rhodes.)

• The HTML generated by Rd2HTML() comes closer to being standards compliant.

• filter(x, recursive = TRUE) gave incorrect results on a series containing NAs. (Spotted by Bill Dunlap.)

• Profiling stats::mle() fits with a fixed parameter was not supported. (PR#14646)

• retracemem() was still using positional matching. (PR#14650)

• The quantile method for "ecdf" objects now works and is documented.

• xtabs(~ .., ..., sparse=TRUE) now also works together with an exclude = .. specification.

• decompose() computed an incorrect seasonal component for time series with odd frequencies.

• The pdf() device only includes the definition of the sRGB colorspace in the output file for the "rgb" colormodel (and not for "gray" nor "cmyk"): this saves ca 9KB in the output file.

• .hasSlot() wrongly gave FALSE in some cases.

• Sweave() with keep.source=TRUE could generate spurious NA lines when a chunk reference appeared last in a code chunk.

CHANGES IN R VERSION 2.13.1

NEW FEATURES

• iconv() no longer translates NA strings as "NA".

• persp(box = TRUE) now warns if the surface extends outside the box (since occlusion for the box and axes is computed assuming the box is a bounding box). (PR#202.)

• RShowDoc() can now display the licences shipped with R, e.g. RShowDoc("GPL-3").

• New wrapper function showNonASCIIfile() in package tools.

• nobs() now has a "mle" method in package stats4.

• trace() now deals correctly with S4 reference classes and corresponding reference methods (e.g., $trace()) have been added.

• xz has been updated to 5.0.3 (very minor bugfix release).

• tools::compactPDF() gets more compression (usually a little, sometimes a lot) by using the compressed object streams of PDF 1.5.

• cairo_ps(onefile = TRUE) generates encapsulated EPS on platforms with cairo >= 1.6.

• Binary reads (e.g. by readChar() and readBin()) are now supported on clipboard connections. (Wish of PR#14593.)

• as.POSIXlt.factor() now passes ... to the character method (suggestion of Joshua Ulrich). [Intended for R 2.13.0 but accidentally removed before release.]


• vector() and its wrappers such as integer() LICENCE and double() now warn if called with a length argument of more than one element. This • No parts of R are now licensed solely under helps track down user errors such as calling GPL-2. The licences for packages rpart and sur- double(x) instead of as.double(x). vival have been changed, which means that the licence terms for R as distributed are GPL-2 | GPL-3. INSTALLATION • Building the vignette PDFs in packages grid and utils is now part of running make from an DEPRECATED AND DEFUNCT SVN checkout on a Unix-alike: a separate make • The internal functions .readRDS() and vignettes step is no longer required. .saveRDS() are now deprecated in favour of These vignettes are now made with the public functions readRDS() and saveRDS() keep.source = TRUE and hence will be introduced in R 2.13.0. laid out differently. • make install-strip failed under some config- • Switching off lazy-loading of code via the uration options. ‘LazyLoad’ field of the ‘DESCRIPTION’ file is now deprecated. In future all packages will be lazy- • Packages can customize non-standard loaded. installation of compiled code via a src/install.libs.R script. This allows pack- • The off-line help() types ‘"postscript"’ and ages that have architecture-specific binaries ‘"ps"’ are deprecated. (beyond the package’s shared objects/DLLs) to be installed in a multi-architecture setting. UTILITIES SWEAVE & VIGNETTES • R CMD check on a multi-architecture installa- • Sweave() and Stangle() gain an encoding ar- tion now skips the user’s ‘.Renviron’ file for the gument to specify the encoding of the vi- architecture-specific tests (which do read the gnette sources if the latter do not contain a architecture-specific ‘Renviron.site’ files). This ‘\usepackage[]{inputenc}’ statement specify- is consistent with single-architecture checks, ing a single input encoding. which use ‘--no-environ’. • There is a new Sweave option figs.only = • R CMD build now looks for ‘DESCRIPTION’ TRUE to run each figure chunk only for each se- fields ‘BuildResaveData’ and ‘BuildKeepEmpty’ lected graphics device, and not first using the for per-package overrides. See ‘Writing R Exten- default graphics device. This will become the sions’. default in R 2.14.0. • Sweave custom graphics devices can have a cus- tom function foo.off() to shut them down. BUG FIXES • Warnings are issued when non-portable file- • plot.lm(which = 5) was intended to order fac- names are found for graphics files (and chunks tor levels in increasing order of mean standard- if split = TRUE). Portable names are regarded ized residual. It ordered the factor labels cor- as alphanumeric plus hyphen, underscore, plus rectly, but could plot the wrong group of residu- and hash (periods cause problems with recog- als against the label. (PR#14545) nizing file extensions). • mosaicplot() could clip the factor labels, and • The Rtangle() driver has a new option could overlap them with the cells if a non- show.line.nos which is by default false; if true default value of cex.axis was used. (Related to it annotates code chunks with a comment giving PR#14550.) the line number of the first line in the sources (the behaviour of R >= 2.12.0). • dataframe[[row,col]] now dispatches on [[ • Package installation tangles the vignette methods for the selected column. (Spotted by sources: this step now converts the vignette Bill Dunlap). 
BUG FIXES

• plot.lm(which = 5) was intended to order factor levels in increasing order of mean standardized residual. It ordered the factor labels correctly, but could plot the wrong group of residuals against the label. (PR#14545)
• mosaicplot() could clip the factor labels, and could overlap them with the cells if a non-default value of cex.axis was used. (Related to PR#14550.)
• dataframe[[row,col]] now dispatches on [[ methods for the selected column. (Spotted by Bill Dunlap.)
• sort.int() would strip the class of an object, but leave its object bit set. (Reported by Bill Dunlap.)


• pbirthday() and qbirthday() did not implement the algorithm exactly as given in their reference and so were unnecessarily inaccurate. pbirthday() now solves the approximate formula analytically rather than using uniroot() on a discontinuous function. The description of the problem was inaccurate: the probability is a tail probability (‘2 or more people share a birthday’). See the console sketch after this list.
• Complex arithmetic sometimes warned incorrectly about producing NAs when there were NaNs in the input.
• seek(origin = "current") incorrectly reported it was not implemented for a gzfile() connection.
• c(), unlist(), cbind() and rbind() could silently overflow the maximum vector length and cause a segfault. (PR#14571)
• The fonts argument to X11(type = "Xlib") was being ignored.
• Reading (e.g. with readBin()) from a raw connection was not advancing the pointer, so successive reads would read the same value. (Spotted by Bill Dunlap; also illustrated in the sketch below.)
• Parsed text containing embedded newlines was printed incorrectly by as.character.srcref(). (Reported by Hadley Wickham.)
• decompose() used with a series of a non-integer number of periods returned a seasonal component shorter than the original series. (Reported by Rob Hyndman.)
• fields = list() failed for setRefClass(). (Reported by Michael Lawrence.)
• Reference classes could not redefine an inherited field which had class "ANY". (Reported by Janko Thyson.)
• Methods that override previously loaded versions will now be installed and called. (Reported by Iago Mosqueira.)
• addmargins() called numeric(apos) rather than numeric(length(apos)).
• The HTML help search sometimes produced bad links. (PR#14608)
• Command completion will no longer be broken if tail.default() is redefined by the user. (Problem reported by Henrik Bengtsson.)
• LaTeX rendering of markup in titles of help pages has been improved; in particular, \eqn{} may be used there.
• isClass() used its own namespace as the default of the where argument inadvertently.
• Rd conversion to latex mis-handled multi-line titles (including cases where there was a blank line in the ‘\title’ section). (It seems this happened only in 2.13.0 patched.)
• postscript() with an sRGB colormodel now uses sRGB for raster images (in R 2.13.[01] it used uncalibrated RGB). There is no longer an undocumented 21845-pixel limit on raster images.
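Two of these fixes can be checked from the console. A minimal sketch of ours (the values in the comments are what a fixed R returns; the four-byte example data are arbitrary):

> # the birthday problem: the tail probability that 2 or more of
> # 23 people share a birthday is just over one half
> pbirthday(23)   # approximately 0.507
> qbirthday(0.5)  # 23
> # successive reads from a raw connection now advance the pointer:
> con <- rawConnection(as.raw(1:4))
> readBin(con, "integer", n = 2, size = 1)  # 1 2
> readBin(con, "integer", n = 2, size = 1)  # 3 4 (no longer 1 2 again)
> close(con)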


Changes on CRAN

2011-06-01 to 2011-11-30

by Kurt Hornik and Achim Zeileis

New packages in CRAN task views

Bayesian BCBCSF, PottsUtils, bayesQR, profdpm.
ChemPhys hyperSpec.
Cluster mritc, protoclust, tclust.
Distributions AtelieR, glogis, smoothmest.
Econometrics mvProbit, rms.
Environmetrics EcoHydRology, HydroMe, aqp, climatol, forecast, hydroGOF, hydroTSM, latticeDensity, oce, openair, seas, simba, soil.spec, soiltexture, topmodel, wasim.
ExperimentalDesign DiceEval, DiceView, GAD, TEQR, Vdgraph, asd, gsDesign, mkssd, mxkssd, odprism, osDesign, qtlDesign, qualityTools, support.CEs.
Finance NMOF, PairTrading, betategarch, lifecontingencies, maRketSim, opefimor, rugarch.
HighPerformanceComputing OpenCL.
NaturalLanguageProcessing RTextTools.
OfficialStatistics SeleMix, deducorrect, editrules.
Optimization ROI*, Rmosek, minpack.lm.
Phylogenetics OUwie, caper, distory, motmot, phylosim, phytools, treebase.
Psychometrics EstCRM, KernSmoothIRT, TripleR, cacIRT, fwdmsa, mRm, mirt*, mixRasch, modelfree, psychomix, psychotools.
Robust FRB, RobAStBase, RobustAFT, multinomRob, rgam.
SocialSciences rms.
Spatial Metadata, UScensus2000blkgrp, UScensus2000cdp, UScensus2000tract, adehabitat, landsat.
TimeSeries ctarma, cts, hydroGOF, hydroTSM, wq.

(* = core package)

New contributed packages

ACNE, ANN, APSIMBatch, AUCRF, AWS.tools, Aelasticnet, AtelieR, AutoSEARCH, BCBCSF, BNPdensity, BaSTA, Barnard, BinNor, BiomarkeR, CDM, CHCN, CLSOCP, CONOR, CONORData, CRM, CatDyn, CityPlot, CommonTrend, CompModSA, DATforDCEMRI, DPM.GGM, DWD, DiagTest3Grp, DirichletReg, Dodge, EL, EcoHydRology, EstCRM, ExomeCNV, FFD, FIAR, FLLat, FastRWeb, FlexParamCurve, ForImp, FourScores, GGEBiplotGUI, GMD, GMMBoost, GUTS, GWAtoolbox, GhcnDaily, HAC, HMP, HMPTrees, HTSCluster, HiveR, Hotelling, IBDsim, ISDA.R, ISIPTA, ISOweek, JoSAE, KRLS, KernSmoothIRT, KoNLP, LDcorSV, LEAPFrOG, LGS, LN3GV, LTPDvar, LaF, Lahman, LargeRegression, LatticeKrig, LifeTables, M3, MAINT.Data, MBCluster.Seq, MCUSUM, MDM, MFSAS, MIPHENO, MPCI, MPTinR, MSeasy, MSeasyTkGUI, MVPARTwrap, MVR, Maeswrap, Mangrove, MatrixEQTL, McParre, MergeGUI, Metadata, MigClim, MindOnStats, MissingDataGUI, MultiPhen, NBDdirichlet, NMOF, NPsimex, NSA, NanoStringNorm, NeMo, ODB, ORCME, OUwie, OpenCL, OpenRepGrid, PACE, PAWL, PCICt, PP, PSCBS, PairTrading, PerfMeas, PhaseType, PoiClaClu, PopGenKit, PrecipStat, QLSpline, R1magic, R2STATS, RDF, REGENT, RHT, RHive, RMAWGEN, RMongo, RNetLogo, ROI, ROI.plugin.glpk, ROI.plugin.quadprog, ROI.plugin.symphony, RRF, RSKC, RSofia, RTextTools, RVowpalWabbit, RWeather, RXKCD, Rcell, Rclusterpp, RcmdrPlugin.KMggplot2, RcmdrPlugin.coin, RcmdrPlugin.mosaic, RcppEigen, Rdrools, Rdroolsjars, Rearrangement, ResourceSelection, Rfun, RghcnV3, Rjms, Rjmsjars, Rknots, Rlof, Rmosek, RobustAFT, Rwinsteps, SBSA, SNPRelate, SPEI, SPIAssay, SVGMapping, SearchTrees, SeleMix, SpatialVx, SpatioTemporal, SportsAnalytics, StateTrace, StochaTR, StreamingLm, SuperLearner, Survgini, SynergizeR, TAHMMAnnot, TDMR, TSPC, TextRegression, ThreeWay, Tinflex, TradeStrategyAnalyzer, Tsphere, UScensus2010, VBLPCM, VBMA4hmm, Vdgraph, VisCov, WaveletCo, abcdeFBA, abn, aggrisk, allelematch, alphashape3d, anaglyph, aqfig, arf3DS4, arrayhelpers, ascrda, auteur, bamdit, barcode, batade, bcool, bdpv, betategarch, betfairly, bfa, bimetallic, binomTools, bios2mds, biseVec, blm, blme, bstats, bwsurvival, cacIRT, calibFit, calmate, caper, capushe, cardidates, caribou, cccrm, cda, cghseg, cit, climdex.pcic, clpAPI, clusterPower, coefplot, colorout, comorbidities, confReg, conjoint, cotrend, cplexAPI, cplm, crimCV, crn, crs, ctarma, cvTools, cvplogistic, cyphid, cytoDiv, dbstats, dcens, dcmle, detect, devtools, dgof, dhglm, dielectric, dinamic, dmRTools, doRNG, drawExpression, dynpred, dynsurv, ebal, edcc, edesign, eigeninv, elec.strat, emdist, emma, epade, etable, evtree, excel.link, ezsim, fairselect, fdaMixed, fdrci, features, finebalance, fitDRC, flexsurv, flip, forams, fpp, fun, gRim, gamboostLSS, gb, gbRd, gcmr, genridge, ggdendro, ggmap, glm2, glmmLasso, glogis, glpkAPI, goric, gpairs, granovaGG, graphComp, gridSVG, growthrate, gsmoothr, gstudio, gtx, gwrr, h5r, hdlm, holdem, iFad, iRefR, iRegression, idr, informR, int64, intRegGOF, intergraph, isopat, iteRates, itsmr, jpeg, kBestShortestPaths, kinship2, klausuR, last.call, latticeDensity, lazyWeave, lgcp, libamtrack, lifecontingencies, liso, lmmfit, logconcens, logcondiscr, lorec, maRketSim, mail, maxent, metaLik, metrumrg, minimax, mirt, mixcat, mixexp, mleur, mmeln, motmot, mpc, multicool, multitable, mvProbit, mvtsplot, ndl, ndvits, nfda, nloptr, objectProperties, objectSignals, obliqueRF, ocomposition, odprism, opefimor, opencpu.encode, oposSOM, orsk, osDesign, osmar, paloma, partykit, pbkrtest, pcrcoal, pdist, permute, phytools, pks, planar, playitbyr, plsRbeta, plumbr, polyphemus, polytomous, ppcor, ppiPre, prLogistic, pragma, probemapper, productplots, protoclust, psychomix, psychotools, pxR, qPCR.CT, qmrparser, qqplotter, qrfactor, qtlmt, rJPSGCS, rJavax, randomNames, rasclass, rasterVis, rationalfun, rdatamarket, recommenderlabBX, recommenderlabJester, reglogit, rfPermute, rfishbase, rje, rmongodb, robeth, robustreg, rockchalk, rococo, roxygen2, rplotengine, rrBlupMethod6, rrlda, rsae, rsem, rtf, rtfbs, rugarch, rzmq, s4vd, saemix, sampSurf, sas7bdat, scales, semdiag, separationplot, siatclust, smcUtils, smfsb, softclassval, sperich, spider, spt, squash, sss, standGL, stratasphere, stremo, support.CEs, survC1, swst, taRifx, tables, tabplotGTK, tabuSearch, tilting, topologyGSA, treebase, trifield, truncSP, txtplot, useful, vegclust, viopoints, walkscoreAPI, weightedScores, whisker, wordcloud, xgrid.

Other changes

• The following packages which were previously double-hosted on CRAN and Bioconductor were moved to the Archive, and are now solely available from Bioconductor: CORREP, Icens, RBGL, RLMM, gpls, graphite, hopach, impute, maanova, qvalue.
• The following packages were moved to the Archive: ADaCGH, Ardec, Bhat, DescribeDisplay, Design, LLdecomp, MAMA, MHadaptive, Peak2Trough, QRMlib, RBrownie, RStackExchange, Rsac, SQLiteDF, TreeRank, ZIGP, arrayImpute, arrayMissPattern, atmi, bdoc, bs, diseasemapping, fArma, fCalendar, fSeries, fUtilities, flexCrossHaz, googlebigquery, halp, infochimps, irtProb, maticce, miniGUI, muS2RC, muStat, muUtil, npmc, powerpkg, tpsDesign, uniPlot.
• The following packages were resurrected from the Archive: SNPMaP, SNPMaP.cdm, SoPhy, birch, csampling, cts, elec, formula.tools, hierfstat, marg, sabreR.

Kurt Hornik
WU Wirtschaftsuniversität Wien, Austria
[email protected]

Achim Zeileis
Universität Innsbruck, Austria
[email protected]
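A practical aside to the task view listings above: the ctv package can install or update the member packages of a view in one call. A minimal sketch of ours (the "TimeSeries" view is just an example; ctv itself is an ordinary CRAN package):

> install.packages("ctv")
> library("ctv")
> install.views("TimeSeries")  # install all packages in the view
> update.views("TimeSeries")   # later, pick up packages newly added to it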


News from the Bioconductor Project

We are pleased to announce Bioconductor 2.9, released on 14 October 2011. Bioconductor 2.9 is compatible with R 2.14.0, and consists of 516 software packages and more than 500 up-to-date annotation packages. There are 60 new software packages, and enhancements to many others. Explore Bioconductor at http://bioconductor.org. Install packages with

> source("http://bioconductor.org/biocLite.R")
> # install standard packages...
> biocLite()
> # ...or only VariantAnnotation
> biocLite("VariantAnnotation")

A Bioconductor Amazon Machine Instance is available; see http://bioconductor.org/help/bioconductor-cloud-ami.

New and revised packages

This release includes new packages for diverse areas of high-throughput analysis. Highlights include:

High-throughput sequencing: DEXSeq, DiffBind, EDASeq, REDseq, ReQON, Repitools, TSSi, VariantAnnotation, cummeRbund, fastseg, htSeqTools, nucleR, r3Cseq, tweeDEseq.

Microarrays: AGDEX, CNAnorm, CellNOptR, Cormotif, DECIPHER, GWASTools, MmPalateMiRNA, cn.mops, cqn, exomeCopy, minfi, rqubic, stepwiseCM.

Advanced statistical implementations: DTA, RCASPAR, dks, randPack, sva, trigger.

Networks and pathways: DOSE, GOFunction, GRENITS, GeneExpressionSignature, IdMappingRetrieval, PAN, PREDA, RTopper, RamiGO, RedeR, graphite, inSilicoDb, predictionet.

Mass spectrometry, qPCR, and flow cytometry: NormqPCR, ReadqPCR, flowType, flowWorkspace, iontree, isobar, mzR, ncdfFlow.

Visualization: biovizBase, ggbio, qtbase, qtpaint.

Our large collection of microarray- and organism-specific annotation packages has, as usual, been updated to include current information. This release introduces a select interface to annotation packages, simplifying queries for multiple types of annotation within and between data sources. The new VariantAnnotation package processes VCF files, including interfaces to SIFT and PolyPhen databases for assessing consequences of sequence changes.
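The select() interface just described can be sketched with one of the organism-level annotation packages. The example below is ours rather than the Bioconductor team's: it assumes the org.Hs.eg.db package is installed, and the argument names keys and cols reflect our understanding of the interface in this release:

> library("org.Hs.eg.db")
> # one call maps Entrez identifiers to several annotation types:
> select(org.Hs.eg.db, keys = c("10", "100"), cols = c("SYMBOL", "GENENAME"))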
notation, continued efforts to ease cloud-based use of Bioconductor, and approaches to ease development Our large collection of microarray- and organism- and support of increasingly inter-related packages. specific annotation packages have, as usual, been up- dated to include current information. This release Bioconductor Team introduces a select interface to annotation packages, Program in Computational Biology simplifying queries for multiple types of annotation Fred Hutchinson Cancer Research Center


R Foundation News

by Kurt Hornik

Donations and new members

Donations

Geoffrey S Hubona, The Georgia R School, USA
Packt Publishing Limited, UK

New supporting members

Philippe Bartil Lecavalier, Canada
Bernhard Lehnert, Germany
Joshua Wiley, USA
Thomas Zumbrunn, Switzerland

Kurt Hornik
WU Wirtschaftsuniversität Wien, Austria
[email protected]
