The R Journal Volume 4/1, June 2012
Total Page:16
File Type:pdf, Size:1020Kb
The Journal Volume 4/1, June 2012 A peer-reviewed, open-access publication of the R Foundation for Statistical Computing Contents Editorial . .3 Contributed Research Articles Analysing Seasonal Data . .5 MARSS: Multivariate Autoregressive State-space Models for Analyzing Time-series Data . 11 openair – Data Analysis Tools for the Air Quality Community . 20 Foreign Library Interface . 30 Vdgraph: A Package for Creating Variance Dispersion Graphs . 41 xgrid and R: Parallel Distributed Processing Using Heterogeneous Groups of Apple Com- puters . 45 maxent: An R Package for Low-memory Multinomial Logistic Regression with Support for Semi-automated Text Classification . 56 Sumo: An Authenticating Web Application with an Embedded R Session . 60 From the Core Who Did What? The Roles of R Package Authors and How to Refer to Them . 64 News and Notes Changes in R . 70 Changes on CRAN . 80 R Foundation News . 96 2 The Journal is a peer-reviewed publication of the R Foun- dation for Statistical Computing. Communications regarding this pub- lication should be addressed to the editors. All articles are licensed un- der the Creative Commons Attribution 3.0 Unported license (CC BY 3.0, http://creativecommons.org/licenses/by/3.0/). Prospective authors will find detailed and up-to-date submission in- structions on the Journal’s homepage. Editor-in-Chief: Martyn Plummer Editorial Board: Heather Turner, Hadley Wickham, and Deepayan Sarkar Editor Programmer’s Niche: Bill Venables Editor Help Desk: Uwe Ligges Editor Book Reviews: G. Jay Kerns Department of Mathematics and Statistics Youngstown State University Youngstown, Ohio 44555-0002 USA [email protected] R Journal Homepage: http://journal.r-project.org/ Email of editors and editorial board: [email protected] The R Journal is indexed/abstracted by EBSCO, DOAJ. The R Journal Vol. 4/1, June 2012 ISSN 2073-4859 3 Editorial by Martyn Plummer Interface for R to call arbitrary native functions with- out the need for C wrapper code. There is also an ar- Earlier this week I was at the “Premières Rencontres ticle from our occasional series “From the Core”, de- R” in Bordeaux, the first francophone meeting of R signed to highlight ideas and methods in the words users (although I confess that I gave my talk in En- of R development core team members. This article glish). This was a very enjoyable experience, and by Kurt Hornik, Duncan Murdoch and Achim Zeilies not just because it coincided with the Bordeaux wine explains the new facilities for handling bibliographic festival. The breadth and quality of the talks were information introduced in R 2.12.0 and R 2.14.0. I both comparable with the International UseR! con- strongly encourage package authors to read this ar- ferences. It was another demonstration of the extent ticle and implement the changes to their packages. to which R is used in diverse disciplines. Doing so will greatly facilitate the bibliometric anal- As R continues to expand into new areas, we yses and investigations of the social structure of the see the development of packages designed to ad- R community. dress the specific needs of different user communi- Elsewhere in this issue, we have articles describ- ties. This issue of The R Journal contains articles on ing the season package for displaying and analyz- two such packages. Karl Ropkins and David Carslaw ing seasonal data, which is a companion to the book describe how the design of the openair package was Analysing Seasonal Data by Adrian Barnett and An- influenced by the need to make an accessible set of nette Dobson; the Vdgraph package for drawing tools available for people in the air quality commu- variance dispersion graphs; and the maxent pack- nity who are not experts in R. Elizabeth Holmes and age which allows fast multinomial logistic regression colleagues introduce the MARSS package for mul- with a low memory footprint, and is designed for ap- tivariate autoregressive state-space models. Such plications in text classification. models are used in many fields, but the MARSS package was motivated by the particular needs of re- The R Journal continues to be the journal of searchers in the natural and environmental sciences, record for the R project. The usual end matter sum- There are two articles on the use of R outside the marizes recent changes to R itself and on CRAN. It desktop computing environment. Sarah Anoke and is worth spending some time browsing these sec- colleagues describe the use of the Apple Xgrid sys- tions in order to catch up on changes you may have tem to do distributed computing on Mac OS X com- missed, such as the new CRAN Task View on dif- puters, and Timothy Bergsma and Michael Smith ferential equations. Peter Dalgaard highlighted the demonstrate the Sumo web application, which in- importance of this area in his editorial for volume 2, cludes an embedded R session. issue 2. Two articles are of particular interest to develop- On behalf of the editorial board I hope you enjoy ers. Daniel Adler demonstrates a Foreign Function this issue. The R Journal Vol. 4/1, June 2012 ISSN 2073-4859 4 The R Journal Vol. 4/1, June 2012 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 5 Analysing Seasonal Data by Adrian G Barnett, Peter Baker and Annette J Dobson but often very effective analyses, as we describe be- low. Abstract Many common diseases, such as the flu More complex seasonal analyses examine non- and cardiovascular disease, increase markedly stationary seasonal patterns that change over time. in winter and dip in summer. These seasonal Changing seasonal patterns in health are currently patterns have been part of life for millennia and of great interest as global warming is predicted to were first noted in ancient Greece by both Hip- make seasonal changes in the weather more extreme. pocrates and Herodotus. Recent interest has fo- Hence there is a need for statistical tools that can es- cused on climate change, and the concern that timate whether a seasonal pattern has become more seasons will become more extreme with harsher extreme over time or whether its phase has changed. winter and summer weather. We describe a set Ours is also the first R package that includes the of R functions designed to model seasonal pat- case-crossover, a useful method for controlling for terns in disease. We illustrate some simple de- seasonality. scriptive and graphical methods, a more com- This paper illustrates just some of the functions of plex method that is able to model non-stationary the season package. We show some descriptive func- patterns, and the case-crossover to control for tions that give simple means or plots, and functions seasonal confounding. whose goal is inference based on generalised linear models. The package was written as a companion to In this paper we illustrate some of the functions a book on seasonal analysis by Barnett and Dobson of the season package (Barnett et al., 2012), which (2010), which contains further details on the statisti- contains a range of functions for analysing seasonal cal methods and R code. health data. We were motivated by the great inter- est in seasonality found in the health literature, and the relatively small number of seasonal tools in R (or Analysing monthly seasonal pat- other software packages). The existing seasonal tools terns in R are: Seasonal time series are often based on data collected baysea • the function of the timsac package and every month. An example that we use here is the decompose stl the and functions of the stats monthly number of cardiovascular disease deaths in package for decomposing a time series into a people aged ≥ 75 years in Los Angeles for the years trend and season; 1987–2000 (Samet et al., 2000). Before we examine • the dynlm function of the dynlm package and or plot the monthly death rates we need to make the ssm function of the sspir package for fitting them more comparable by adjusting them to a com- dynamic linear models with optional seasonal mon month length (Barnett and Dobson, 2010, Sec- components; tion 2.2.1). Otherwise January (with 31 days) will likely have more deaths than February (with 28 or • the arima function of the stats package and the 29). Arima function of the forecast package for fit- In the example below the monthmean function ting seasonal components as part of an autore- is used to create the variable mmean which is the gressive integrated moving average (ARIMA) monthly average rate of cardiovascular disease model; and deaths standardised to a month length of 30 days. As the data set contains the population size (pop) we can • the bfast package for detecting breaks in a sea- also standardise the rates to the number of deaths sonal pattern. per 100,000 people. The highest death rate is in Jan- uary (397 per 100,000) and the lowest in July (278 per These tools are all useful, but most concern decom- 100,000). posing equally spaced time series data. Our package includes models that can be applied to seasonal pat- > data(CVD) terns in unequally spaced data. Such data are com- > mmean = monthmean(data = CVD, mon in observational studies when the timing of re- resp = CVD$cvd, adjmonth = "thirty", sponses cannot be controlled (e.g. for a postal sur- pop = pop/100000) vey). > mmean In the health literature much of the analysis of Month Mean seasonal data uses simple methods such as compar- January 396.8 ing rates of disease by month or using a cosinor re- February 360.8 gression model, which assumes a sinusoidal seasonal March 327.3 pattern. We have created functions for these simple, April 311.9 The R Journal Vol.