R News: The Newsletter of the R Project — Volume 6/2, May 2006

Editorial

by Paul Murrell

Welcome to the first regular issue of R News for 2006. This is a bumper issue, with thirteen contributed articles. Many thanks to the contributors and reviewers for all of the enthusiasm and effort that they put into these articles.

This issue begins with some "methodological" papers. Peter Ruckdeschel, Matthias Kohl, Thomas Stabla, and Florian Camphausen introduce their distr package, which provides a set of S4 classes and methods for defining distributions of random variables. This is followed by an article from David Clifford and Peter McCullagh on the many uses of the regress package, then Lukasz Komsta describes his outliers package, which contains general tests for outlying observations.

The next set of articles involve applications of R to particular settings. David Kane and Jeff Enos describe their portfolio package for working with equity portfolios. Roman Pahl, Andreas Ziegler, and Inke König discuss the GroupSeq package for designing clinical trials. Using R/Sweave in everyday clinical practice is the topic of Sven Garbade and Peter Burgard's article, while M. Wangler, J. Beyersmann, and M. Schumacher focus on length of hospital stay with the changeLOS package.

Next, there are three graphics articles. Nitin Jain and Gregory Warnes describe the "balloonplot", a tool for visualizing tabulated data. Jung Zhao explains how to produce pedigree plots in R, and Brian Ripley and I describe some improvements in R's font support for PDF and PostScript graphics output.

SAS users struggling to make the transition to R might enjoy Søren Højsgaard's article on his doBy package, and the contributed articles are rounded out by two pieces from Robin Hankin: one on the onion package for normed division algebras and one on the ResistorArray package for analysing resistor networks.

David Meyer provides us with a review of the book "R Graphics".

This issue accompanies the release of R version 2.3.0. Changes in R itself and new CRAN packages that have appeared since the release of R 2.2.0 are described in the relevant regular sections. There is also an update of individuals and institutions who are providing support for the R Project via the R Foundation, and there is an announcement and call for abstracts for the DSC 2007 conference to be held in Auckland, New Zealand.

A new addition with this issue is "News from the Bioconductor Project" by Seth Falcon, which describes important features of the 1.8 release of Bioconductor packages.

Finally, I would like to remind everyone of the upcoming useR! 2006 conference. It has attracted a huge number of presenters and promises to be a great success. I hope to see you there!

Paul Murrell
The University of Auckland, New Zealand
[email protected]

Contents of this issue:

Editorial
S4 Classes for Distributions
The regress function
Processing data for outliers
Analysing equity portfolios in R
GroupSeq—Designing clinical trials using group sequential designs
Using R/Sweave in everyday clinical practice
changeLOS: An R-package for change in length of hospital stay based on the Aalen-Johansen estimator
Balloon Plot
Drawing pedigree diagrams with R and graphviz
Non-Standard Fonts in PostScript and PDF Graphics
The doBy package
Normed division algebras with R: introducing the onion package
Resistor networks R: Introducing the ResistorArray package
The good, the bad, and the ugly—Review of Paul Murrell's new book: "R Graphics"
Changes in R
Changes on CRAN
R Foundation News
News from the Bioconductor Project
Forthcoming Events: DSC 2007

S4 Classes for Distributions

by Peter Ruckdeschel, Matthias Kohl, Thomas Stabla, and Florian Camphausen

distr is an R package that provides a conceptual treatment of random variables (r.v.'s) by means of S4 classes. A mother class Distribution is introduced with slots for a parameter and for methods r, d, p, and q, consecutively for simulation and for evaluation of the density, c.d.f., and quantile function of the corresponding distribution. All distributions of the base package are implemented as subclasses. By means of these classes, we can automatically generate new objects of these classes for the laws of r.v.'s under standard mathematical univariate transformations and under convolution of independent r.v.'s. In the distrSim and distrTEst packages, we also provide classes for a standardized treatment of simulations (also under contamination) and evaluations of statistical procedures on such simulations.

Motivation

R contains powerful techniques for virtually any useful distribution using the suggestive naming convention [prefix]<name> for functions, where [prefix] stands for r, d, p, or q, and <name> is the name of the distribution.

There are limitations of this concept: you can only use distributions that are already implemented in some library or for which you have provided an implementation. In many natural settings you want to formulate algorithms once for all distributions, so you should be able to treat the actual distribution as if it were a variable.

You may of course paste together a prefix and the value of <name> as a string and then use eval(parse(....)). This is neither very elegant nor flexible, however.

Instead, we would prefer to implement the algorithm by passing an object of some distribution class as an argument to a function. Even better, though, we would use a generic function and let the S4 dispatching mechanism decide what to do at run-time. In particular, we would like to automatically generate the corresponding functions r, d, p, and q for the law of expressions like X+3Y for objects X and Y of class Distribution, or, more generally, of a transformation of X, Y under a function f : R² → R which is already realized as a function in R.

These operations are possible with the distr package. As an example, try

> require("distr")
Loading required package: distr
[1] TRUE
> N <- Norm(mean=2,sd=1.3)
> P <- Pois(lambda=1.2)
> (Z <- 2*N+3+P)
Distribution Object of Class: AbscontDistribution
> plot(Z)
> p(Z)(0.4)
[1] 0.002415384
> q(Z)(0.3)
[1] 6.70507
> r(Z)(10)
 [1] 11.072931  7.519611 10.567212  3.358912
 [5]  7.955618  9.094524  5.293376  5.536541
 [9]  9.358270 10.689527

Figure 1: density, c.d.f. and quantile function of Z (three panels: "Density of AbscontDistribution", "CDF of AbscontDistribution", and "Quantile of AbscontDistribution").

Comment: N and P are assigned objects of classes "Norm" and "Pois", respectively, with corresponding parameters. The command Z <- 2*N+3+P generates a new distribution object and assigns it to Z — an object of class "AbscontDistribution". Identifying N, P, Z with r.v.'s distributed according to these distributions, L(Z) = L(2N + 3 + P). Hence, p(Z)(0.4) returns P(Z ≤ 0.4); q(Z)(0.3) returns the 30%-quantile of Z; r(Z)(10) generates 10 pseudo random numbers distributed according to Z; and the plot command produces Figure 1.

Concept

Our class design is deliberately open so that our classes may easily be extended by any volunteer in the R community; loose ends are multivariate distributions, time series distributions, and conditional distributions. As an exercise, the reader is encouraged to implement extreme value distributions from package evd. The greatest effort will be the documentation.

As to the naming of the slots, our goal was to preserve naming and notation from the base package as far as possible so that any programmer familiar with S could quickly use our distr package. In addition, as the distributions already implemented in R are all well tested and programmed, we use the existing r, d, p, and q-functions wherever possible, simply wrapping them in small code sniplets to our class hierarchy. All this should make intensive use of object orientation in order to be able to use inheritance and method overloading.

Contrary to the standard R functions like rnorm, we only permit length 1 for parameters like mean, because we see the objects as implementations of univariate random variables, for which vector-valued parameters make no sense; rather, one could gather several objects with possibly different parameters in a vector or list of distributions. Of course, the original functions rnorm, etc., remain unchanged and still allow vector-valued parameters.

Organization in classes

Loosely speaking, distr contains distribution classes and corresponding methods, distrSim simulation classes and corresponding methods, and distrTEst an evaluation class and corresponding methods. The latter two groups of classes are to be considered only as tools that allow a unified treatment of simulations and evaluation of statistical estimation (perhaps also, later, tests and predictions) under varying simulation situations.

Distribution classes

The Distribution class and its descendants implement the concept of a random variable/distribution in R. They all have a param slot for a parameter, an img slot for the range of the corresponding r.v., and r, d, p, and q slots.

Subclasses

At present, the distr package is limited to univariate distributions; these are derived from the subclass UnivariateDistribution, and as typical subclasses, we introduce classes for absolutely continuous (a.c.) and discrete distributions — AbscontDistribution and DiscreteDistribution. The latter has an extra slot support, a vector containing the support of the distribution, which is truncated to the lower/upper TruncQuantile in case of an infinite support. TruncQuantile is a global option of distr described in Ruckdeschel et al. (2005, Section 4).

As subclasses, we have implemented all parametric families from the base package simply by providing wrappers to the original R functions of form [prefix]<name>. More precisely, as subclasses of class AbscontDistribution, we have implemented Beta, Cauchy, Chisq, Exp, Logis, Lnorm, Norm, Unif, and Weibull; to avoid name conflicts, the F-, t-, and Γ-distributions are implemented as Fd, Td, and Gammad. As subclasses of class DiscreteDistribution, we have implemented Binom, Dirac, Geom, Hyper, NBinom, and Pois.

Parameter classes

The param slot of a distribution class is of class Parameter. With the method liesIn, some checking is possible; for details confer Ruckdeschel et al. (2005).

Simulation classes and evaluation class

Simulation classes and an evaluation class are provided in the distrSim and distrTEst packages, respectively. As a unified class for both "real" and simulated data, DataClass serves as a common mother class for both types of data. For simulations, we gather all relevant information in the derived classes Simulation and, for simulations under contamination, ContSimulation.

When investigating properties of a procedure by means of simulations, one typically evaluates this procedure on a lot of simulation runs, giving a result for each run. These results are typically worth storing. To organize all relevant information about these results, we introduce an Evaluation class. For details concerning these groups of classes, consult Ruckdeschel et al. (2005).

Methods

We have made available quite general arithmetical operations for our distribution objects, generating new image distributions automatically. These arithmetics operate on the corresponding r.v.'s and not on the distributions. (For the latter, they would only make sense in restricted cases like convex combinations.)

Affine linear transformations

We have overloaded the operators "+", "-", "*", and "/" such that affine linear transformations that involve only single univariate r.v.'s are available; i.e., expressions like Y=(3*X+5)/4 are permitted for an object X of class AbscontDistribution or DiscreteDistribution (or some subclass), producing an object Y that is also of class AbscontDistribution or DiscreteDistribution (in general). Here the corresponding transformations of the d, p, and q-functions are determined analytically.

Unary mathematical operations

The math group of unary mathematical operations is also available for distribution classes; so expressions like exp(sin(3*X+5)/4) are permitted. The corresponding r-method consists in simply performing the transformation on the simulated values of X. The corresponding (default) d-, p- and q-functions are obtained by simulation, using the technique described in the following subsection.

Construction of d, p, and q from r

For automatic generation of new distributions, in particular those arising as image distributions under transformations of correspondingly distributed r.v.'s, we provide ad hoc methods that should be overloaded by more exact ones wherever possible: the function RtoDPQ (re)constructs the slots d, p, and q from slot r, generating a certain number K of random numbers according to the desired law, where K is controlled by a global option of the distr package, as described in Ruckdeschel et al. (2005, Section 4). For a.c. distributions, a density estimator is evaluated along this sample, the distribution function is estimated by the empirical c.d.f., and, finally, the quantile function is produced by numerical inversion. Of course the result is rather crude as it relies only on the law of large numbers, but in this manner all transformations within the math group become available.

Convolution

A convolution method for two independent r.v.'s is implemented by means of explicit calculations for discrete summands, and by means of the FFT¹ if one of the summands is a.c. This method automatically generates the law of the sum of two independent variables X and Y of any univariate distributions — or, in S4 jargon, the addition operator "+" is overloaded for two objects of class UnivariateDistribution and corresponding subclasses.

Overloaded generic functions

Methods print, plot, and summary have been overloaded for our new classes Distribution (package distr), DataClass, Simulation, ContSimulation (package distrSim), as well as EvaluationClass (package distrTEst), to produce "pretty" output. For Distribution, plot displays the density/probability function, the c.d.f. and the quantile function of a distribution. For objects of class DataClass — or of a corresponding subclass — plot graphs the sample against the run index, and in the case of ContSimulation, the contaminating variables are highlighted by a different color. For an object of class EvaluationClass, plot yields a boxplot of the results of the evaluation.

Simulation

For the classes Simulation and ContSimulation in the distrSim package, we normally will not save the current values of the simulation, so when declaring a new object of either of the two classes, the slot Data will be empty (NULL). We may fill the slot by an application of the method simulate to the object. This has to be redone whenever another slot of the object is changed. To guarantee reproducibility, we use the slot seed. This slot is controlled and set through Paul Gilbert's setRNG package. For details confer Ruckdeschel et al. (2005).

Options

Contrary to the generally preferable functional style in S, we have chosen to set a few "constants" globally in order to retain binary notation for operators like "+". Analogously to the options command in R, you can specify a number of these global constants. We only list them here; for details consult Ruckdeschel et al. (2005) or see ?distroptions. Distribution options include

• DefaultNrFFTGridPointsExponent
• DefaultNrGridPoints
• DistrResolution
• RtoDPQ.e
• TruncQuantile

With distroptions(), you can inspect all current options, and you can modify them by distroptions("<option name>", <option value>).
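The construction described under "Construction of d, p, and q from r" is easy to picture in plain R. The following is a minimal sketch of the idea behind RtoDPQ, not the package's actual implementation; the sample size K here simply stands in for the value derived from the RtoDPQ.e option listed above.

> ## Sketch only: build crude d, p and q functions from a simulator r
> RtoDPQsketch <- function(r, K = 1e5) {
+   z <- r(K)
+   list(d = approxfun(density(z)),   # density estimator along the sample
+        p = ecdf(z),                 # empirical c.d.f.
+        q = function(s) quantile(z, s, names = FALSE))  # numerical inversion
+ }
> DPQ <- RtoDPQsketch(function(n) exp(rnorm(n)))  # image of N(0,1) under exp
> DPQ$q(0.5)  # close to the true median exp(0) = 1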

¹ Details to be found in Kohl et al. (2005).
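The discrete branch of the convolution method is elementary to verify in base R. The following check is an illustration only, not code from distr; it convolves two Pois(1.2) probability functions — the same parameter used for P above — and compares the result with the known Pois(2.4) law of the sum:

> supp <- 0:20
> px <- dpois(supp, 1.2)
> pz <- sapply(supp, function(k) sum(px[1:(k+1)] * rev(px[1:(k+1)])))
> all.equal(pz, dpois(supp, 2.4))
[1] TRUE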


System/version requirements etc.

As our packages are completely written in R, there are no dependencies on the underlying OS. After some reorganization, the three packages distr, distrSim, and distrTEst, as described in this paper, are only available from R version 2.2.0 on, but there are precursors of our packages running under older versions of R; see Ruckdeschel et al. (2005). To control the seed of the random number generator in our simulation classes in the distrSim package, we use Paul Gilbert's setRNG package, available from CRAN. Packages distr, distrSim, and distrTEst are distributed under the terms of the GNU GPL Version 2, June 1991 (see http://www.gnu.org/copyleft/gpl.html).

Examples

In the demo folder of the distr package, the interested reader will find the sources for some worked-out examples; we confine ourselves to discussing only one (the shortest) of them — NormApprox.R. In basic statistics courses, the 12-fold convolution of Unif(0, 1) variables is a standard illustration of the CLT. It is also an opaque generator of N(0, 1) variables; see Rice (1988, Example C, p. 163).

> N <- Norm(0,1) # a standard normal
> U <- Unif(0,1)
> U2 <- U + U
> U4 <- U2 + U2
> U8 <- U4 + U4
> U12 <- U4 + U8
> NormApprox <- U12 - 6

In the 3rd to 6th lines, we try to minimize the number of calls to FFT for reasons of both time and accuracy; in the end, NormApprox contains the centered and standardized 12-fold convolution of Unif(0, 1). Keep in mind that instead of U <- Unif(0,1), we might have assigned any distribution to U without having to change the subsequent code — except for a different centering/standardization. For a more sophisticated use of FFT for convolution powers, see nFoldConvolution.R in the demo folder.

Next we explore the approximation quality of NormApprox for N(0, 1) graphically (see Figure 2):

> par(mfrow = c(2,1))
> x <- seq(-4,4,0.001)
> plot(x, d(NormApprox)(x), type = "l",
+      xlab = "", ylab = "Density",
+      main = "Exact and approximated density")
> lines(x, d(N)(x), col = "red")
> legend(-4, d(N)(0),
+        legend = c("NormApprox", "Norm(0,1)"),
+        fill = c("black", "red"))
> plot(x, d(NormApprox)(x) - d(N)(x),
+      type = "l", xlab = "",
+      ylab = "\"black\" - \"red\"",
+      col = "darkgreen", main = "Error")
> lines(c(-4,4), c(0,0))

Figure 2: densities of N(0, 1) and NormApprox and their difference (panels "Exact and approximated density" and "Error").

Details/odds and ends

For further details of the implementation, see Ruckdeschel et al. (2005). In particular, we recommend Thomas Stabla's utility standardMethods for the automatic generation of S4 accessor and replacement functions. For more details, see ?standardMethods.

For odds and ends, consult the web-page for our packages, http://www.uni-bayreuth.de/departments/math/org/mathe7/DISTR/.

Acknowledgement

We thank Martin Mächler and Josef Leydold for their helpful suggestions in conceiving the package. John Chambers also gave several helpful hints and insights. We got stimulating replies to an RFC on r-devel by Duncan Murdoch and Gregory Warnes. We also thank Paul Gilbert for drawing our attention to his package setRNG and making it available in a stand-alone version.

Bibliography

J. M. Chambers. Programming with Data. A Guide to the S Language. Springer, 1998. URL http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/.


M. Kohl, P. Ruckdeschel, and T. Stabla. General Purpose Convolution Algorithm for Distributions in S4-Classes by means of FFT. Technical Report, Feb. 2005. Also available under http://www.uni-bayreuth.de/departments/math/org/mathe7/RUCKDESCHEL/pubs/comp.pdf.

J. A. Rice. Mathematical Statistics and Data Analysis. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1988.

P. Ruckdeschel, M. Kohl, T. Stabla, and F. Camphausen. S4 Classes for Distributions — a manual for packages distr, distrSim, distrTEst, version 1.6, Oct. 2005. http://www.uni-bayreuth.de/departments/math/org/mathe7/DISTR.

Peter Ruckdeschel
Matthias Kohl
Thomas Stabla
Florian Camphausen
Mathematisches Institut
Universität Bayreuth
D-95440 Bayreuth
Germany
[email protected]

The regress function

An R function that uses the Newton-Raphson algorithm for fitting certain doubly linear Gaussian models.

by David Clifford and Peter McCullagh

Introduction

The purpose of this article is to highlight the many uses of the regress function contained in the regress package. The function can be used to fit linear Gaussian models in which the mean is a linear combination of given covariates, and the covariance is a linear combination of given matrices. A Newton-Raphson algorithm is used to maximize the residual log likelihood with respect to the variance components. The regression coefficients are subsequently obtained by weighted least squares, and a further matrix computation gives the best linear predictor for the response on a further out-of-sample unit.

Many Gaussian models have a covariance structure that can be written as a linear combination of matrices, for example random effects models, polynomial spline models and some multivariate models. However, it was our research on the nature of spatial variation in crop yields, McCullagh and Clifford (2006), that motivated the development of this function.

We begin this paper with a review of the kinds of spatial models we wish to fit and explain why a new function is needed to achieve this. Following this we discuss other examples that illustrate the broad range of uses for this function. Uses include basic random effects models, multivariate linear models and examples that include prediction and smoothing. The techniques used in these examples can also be applied to growth curves.

Spatial Models

McCullagh and Clifford (2006) model spatial dependence of crop yields as a planar Gaussian process in which the covariance function at points z, z' in the plane has the form

    σ₀² δ_{z−z'} + σ₁² K(|z − z'|)    (1)

with non-negative coefficients σ₀² and σ₁². In the first term, Dirac's delta is the covariance function for white noise, and is independent on non-overlapping sets. The second term is a stationary isotropic covariance function from the Matérn class or power class.

McCullagh and Clifford (2006) proposed that the spatial correlation for any crop yield is well described by a convex combination of the Dirac function and the logarithmic function associated with the de Wijs process. The de Wijs process (de Wijs, 1951, 1953) is a special case of both the Matérn and power classes and is invariant to conformal transformation. The linear combination of white noise and the de Wijs process is called the conformal model. We examined many examples from a wide variety of crops worldwide, comparing the conformal model against the more general Matérn class. Apart from a few small-scale anisotropies, little evidence was found of systematic departures from the conformal model. Stein (1999) is a good reference for more details on spatial models.

It is important to point out that the de Wijs process and certain other boundary cases of the Matérn family are instances of generalized covariance functions corresponding to intrinsic processes. The fitting of such models leads to the use of generalized covariance matrices that are positive definite on certain contrasts. Such matrices usually have one or more negative eigenvalues, and as such the models cannot be fitted using standard functions in R.
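To see why standard software balks, consider a one-dimensional analogue (an illustration, not taken from the article): K(h) = −|h| is a valid generalized covariance — it is positive definite on zero-sum contrasts — yet the matrix it generates is not positive definite, as a two-line base-R check shows:

x <- 0:10
min(eigen(-abs(outer(x, x, "-")))$values)  ## strictly negative

Functions that require a positive definite covariance matrix therefore cannot use such a matrix directly.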

Clifford (2005) shows how to compute the generalized covariance matrices when yields are recorded on rectangular plots laid out on a grid. For the de Wijs process this is an analytical calculation, and the spatialCovariance package in R can be used to evaluate the covariance matrices.

In order to illustrate exactly what the spatial model looks like, let us consider the Fairfield Smith wheat yield data (Fairfield Smith, 1938). We wish to fit the conformal model to this dataset and to include row and column effects in our model of the mean. Assume at this point that the following information is loaded into R: the yield y and factors row and col that indicate the position of each plot.

The plots are 6 inches by 12 inches in a regular grid of 30 rows and 36 columns with no separations between plots. Computing the covariance matrix involves two steps. The first stage, called precompute, is carried out once for each dataset. The second stage, called computeV, is carried out for each member of the Matérn family one wishes to fit to the model.

require("spatialCovariance")
foot <- 0.3048 ## convert from ft to m
info <- precompute(nrows=30,ncols=36,
                   rowwidth=0.5*foot,
                   colwidth=1*foot,
                   rowsep=0,colsep=0)
V <- computeV(info)

The model we wish to fit to the response variable y, observed on plots of common area |A| = 0.0465 m², is

    y ~ row + col + spatial error

where the spatial error has generalized covariance structure

    Σ = σ₀²|A| I + σ₁²|A|² V.    (2)

The fact that yield is an extensive variable means one goes from Equation 1 to Equation 2 by aggregating yield over plots, i.e. by integrating the covariance function over plots.

We fit this model using the regress function. One can easily check that V has a negative eigenvalue and hence the model cannot be fitted using other packages such as lme, for example.

require("regress")
model1 <- regress(y~row+col,~V,identity=TRUE)

In order to determine the residual log likelihood relative to a white noise baseline, we need to fit the model in which σ₁² = 0, i.e. Σ = σ₀²|A| I. The residual log likelihood is the difference in log likelihood between these two models.

> BL <- regress(y~row+col,~,identity=TRUE)
> model1$llik - BL$llik
[1] 12.02406
> summary(model1,fixed.effects=FALSE)

Maximised Resid. Log Likelihood is -4373.721

Linear Coefficients: not shown

Variance Coefficients:
         Estimate Std. Error
identity 39243.69   2139.824
V        60491.21  19664.338

The summary of the conformal model shows that the estimated variance components are σ̂₀²|A| = 3.92 × 10⁴ and σ̂₁²|A|² = 6.05 × 10⁴. The standard errors associated with each parameter are based on the Fisher Information at the point of convergence.

Random Effects Models

Exchangeable Gaussian random effects models, also known as linear mixed effects models (Pinheiro and Bates, 2000), can also be fitted using regress. However, the code is not optimised to use the groupedData structure in the way lmer is, and regress cannot handle the large datasets that one can fit using lmer (Bates, 2005).

The syntax for basic models using regress is straightforward. Consider the example in Venables and Ripley (2002)[p. 286] which uses data from Yates (1935). In addition to the treatment factors N and V, corresponding to nitrogen levels and varieties, the model has a block factor B and a random B:V interaction. The linear model is written explicitly as

    y_bnv = μ + α_n + β_v + η_b + ζ_bv + ε_bnv    (3)

where the distributions of the random effects are η_b ~ N(0, σ²_B), ζ_bv ~ N(0, σ²_B:V) and ε_bnv ~ N(0, σ²), all independent with independent components, Venables and Ripley (2002). The regress syntax mirrors how the model is defined in (3).

oats.reg <- regress(Y~1+N+V,~B+I(B:V),
                    identity=TRUE,data=Oats)

Similar syntax is used by the lmer function, but the residual log likelihoods reported by the two functions are not the same. The difference is attributable in part to the term +½ log|XᵀX|, which is constant for comparison of two variance-component models, but is affected by the basis vectors used to define the subspace for the means. The value reported by regress is unaffected by the choice of basis vectors.
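For readers who want to see where the basis dependence enters, recall the standard form of the residual (REML) log likelihood — this is textbook algebra, not a quotation from the regress documentation. For y ~ N(Xβ, Σ),

    \ell_R = -\tfrac{1}{2}\log|\Sigma| - \tfrac{1}{2}\log|X^\top\Sigma^{-1}X| - \tfrac{1}{2}\, y^\top P y,
    \qquad P = \Sigma^{-1} - \Sigma^{-1}X\,(X^\top\Sigma^{-1}X)^{-1}X^\top\Sigma^{-1}.

Replacing X by XA for an invertible A shifts \ell_R by the constant -\log|\det A|, i.e. by a change of basis for the mean subspace; adding the term \tfrac{1}{2}\log|X^\top X| cancels that shift, which is why the value reported by regress does not depend on the basis chosen.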


While the regress function can be used for basic random effects models, the lmer function can be used to fit models where the covariance structure cannot be written as a linear combination of known matrices. The regress function does not apply to such models. An example of such a model is one that includes a random intercept and slope for a covariate, where a change of scale for the covariate would result in a different model unless the intercept and slope are correlated. In lmer, the groupedData structure automatically includes a correlation parameter in such a model. See Bates (2005) for an example of such a model.

Multivariate Models Using regress

In this section we show how to fit a multivariate model. The example we choose to work with is a very basic multivariate example, but the method illustrates how complex models can be fitted. Suppose that the observations are i.i.d. bivariate normal, i.e. Y_i ~ N(μ, Σ), where

    μ = ( μ₁ )      Σ = (  σ₁²    ρσ₁σ₂ )
        ( μ₂ ),         ( ρσ₁σ₂    σ₂²  ).

The goal is to estimate the parameter (μ₁, μ₂, σ₁², σ₂², γ = ρσ₁σ₂).

Let Y denote a vector of length 2n created by concatenating the observations for each unit. The model for the data can be written as Y ~ N(Xμ, Iₙ ⊗ Σ), where X = 1 ⊗ I₂ and 1 is a vector of ones of length n. Next we use the fact that one can write Σ as

    Σ = σ₁² ( 1 0 )  +  γ ( 0 1 )  +  σ₂² ( 0 0 )
            ( 0 0 )       ( 1 0 )         ( 0 1 )

to express the model as Y ~ N(Xμ, D), where

    D = σ₁² Iₙ ⊗ ( 1 0 )  +  γ Iₙ ⊗ ( 0 1 )  +  σ₂² Iₙ ⊗ ( 0 0 ).
                 ( 0 0 )            ( 1 0 )             ( 0 1 )

The code to generate such data and to fit such a model is given below; note that A ⊗ B = kronecker(A,B) in R.

library("regress")
library("MASS") ## needed for mvrnorm
n <- 100
mu <- c(1,2)
Sigma <- matrix(c(10,5,5,10),2,2)
Y <- mvrnorm(n,mu,Sigma)
## this simulates multivariate normal rvs
y <- as.vector(t(Y))
X <- kronecker(rep(1,n),diag(1,2))

V1 <- matrix(c(1,0,0,0),2,2)
V2 <- matrix(c(0,0,0,1),2,2)
V3 <- matrix(c(0,1,1,0),2,2)

sig1 <- kronecker(diag(1,n),V1)
sig2 <- kronecker(diag(1,n),V2)
gam <- kronecker(diag(1,n),V3)

reg.obj <- regress(y~X-1,~sig1+sig2+gam,
                   identity=FALSE,start=c(1,1,0.5))

The summary of this model shows that the estimated mean parameters are μ̂₁ = 1.46 and μ̂₂ = 1.78. The estimates for the variance components are σ̂₁² = 9.25, σ̂₂² = 10.27 and γ̂ = 3.96. The true parameter is (μ₁, μ₂, σ₁², σ₂², γ = ρσ₁σ₂) = (1, 2, 10, 10, 5).

Maximised Residual Log Likelihood is -315.48

Linear Coefficients:
   Estimate Std. Error
X1    1.461      0.304
X2    1.784      0.320

Variance Coefficients:
     Estimate Std. Error
sig1    9.252      1.315
sig2   10.271      1.460
gam     3.963      1.058

A closed form solution exists for the estimate of Σ in this case, Σ̂ = (Y − Ȳ)ᵀ(Y − Ȳ)/(n − 1), where Y denotes the n × 2 matrix of observations. Our computed result can be checked using:

> Sig <- var(Y)
> round(Sig, 3)
      [,1]   [,2]
[1,] 9.252  3.963
[2,] 3.963 10.271

It may be of interest to fit the sub-model with equal variances, σ² = σ₁² = σ₂², a case for which no closed form solution exists. This model can be fitted using the code shown below. The estimate for σ² is σ̂² = 9.76, and we can see that the difference in residual log likelihood between the sub-model and the full model is only 0.16 units.

> sig <- sig1+sig2
> reg.obj.sub <- regress(y~X-1,~sig+gam,
+                identity=FALSE,start=c(1,.5))
> summary(reg.obj.sub)

Maximised Residual Log Likelihood is -315.65

Linear Coefficients:
   Estimate Std. Error
X1    1.461      0.312
X2    1.784      0.312

Variance Coefficients:
    Estimate Std. Error
sig    9.761      1.059
gam    3.963      1.059

The argument start allows one to specify the starting point for the Newton-Raphson algorithm. Another argument, pos, allows one to force certain parameters to be positive or negative; in this example we may only be interested in cases with negative correlation. By default the support for the variance components is the real line.
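As a quick sanity check on the Kronecker construction used above (plain base R, using the objects already defined and the true parameter values), the three variance-component matrices reassemble Iₙ ⊗ Σ exactly:

D <- 10*sig1 + 10*sig2 + 5*gam   ## true values of sigma1^2, sigma2^2, gamma
all.equal(D, kronecker(diag(1,n), Sigma))  ## TRUE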


Prediction and Smoothing

regress can also be used to implement smoothing and best linear prediction, McCullagh (2005)[example 5]. Stein (1999) notes that in the geostatistical literature best linear prediction is also known as kriging, and in effect

    BLP(x) = E(Y(x) | data)

for an out-of-sample unit whose x-value is x.

Smoothing usually requires the choice of a bandwidth in each example, Efron (2001), but bandwidth selection occurs automatically through REML estimation of polynomial spline models. This is best illustrated by the cubic spline generalized covariance function (Wahba, 1990; Green and Silverman, 1994). The cubic spline model has a two-dimensional kernel spanned by the constant and linear functions of x. Consequently the model matrix X must include ~ 1 + x, but could also include additional functions.

In the code below we simulate data as in Wahba (1990)[p. 45], but after generating data for one hundred points in the unit interval we take a sample of these points to be used for fitting purposes. The goal of fitting a smooth curve to the data is achieved by the best linear predictor, which is a cubic spline. The spline is computed at all x-values and has knots at the observed values. This experiment can be carried out in the following four steps.

## 1: Data simulation
n <- 101
x <- (0:100)/100
prb <- (1 + cos(2*pi*x))/2
## indicator of which x-values are observed
obs <- (runif(n) < prb)
mu <- 1 + x + sin(3*log(x+0.1))/(2+x)
## full data
y <- mu + rnorm(n,0,0.1)

## 2. Fit the cubic spline to observed data
one <- rep(1,n)
d <- abs(x %*% t(one) - one %*% t(x))
V <- d^3
X <- model.matrix(y~1+x)
fit <- regress(y[obs]~X[obs,],~V[obs,obs],
               identity=TRUE)

## 3. BLP at all x given the observed data
wlsfit <- X %*% fit$beta
blp <- wlsfit + fit$sigma[1]*V[,obs] %*%
       fit$W %*% (y-wlsfit)[obs]

## 4. Plot of results
plot(x[obs],y[obs], xlab="x",ylab="y")
lines(x,mu,lty=2)
lines(x,blp)
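Step 3 above is a direct transcription of the Gaussian conditional mean. Writing Σ̂ for the fitted covariance of the observed responses, and assuming — as the code suggests — that fit$W stores Σ̂⁻¹ and fit$sigma[1] the fitted coefficient on V, the quantity blp computed above is

    \mathrm{BLP}(x) = X\hat\beta + \hat\sigma_1^2\, V[x,\mathrm{obs}]\, \widehat\Sigma^{-1}\,(y - X\hat\beta)_{\mathrm{obs}},

i.e. the kriging predictor E(Y(x) | data) quoted at the start of this section.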

The results of this experiment can be seen in Figure 1. The points are the observed data, the solid line is the best linear predictor and the dashed line shows the function used to generate the data.

Figure 1: Smoothing spline and best linear predictor fitted by REML.

Bibliography

D. Bates. Fitting linear mixed models in R. R News, 5(1), May 2005.

D. Clifford. Computation of spatial covariance matrices. Journal of Computational and Graphical Statistics, 14(1):155–167, 2005.

H. de Wijs. Statistics of ore distribution. Part I: frequency distribution of assay values. Journal of the Royal Netherlands Geological and Mining Society, 13:365–375, 1951. New Series.

H. de Wijs. Statistics of ore distribution. Part II: Theory of binomial distribution applied to sampling and engineering problems. Journal of the Royal Netherlands Geological and Mining Society, 15:125–24, 1953. New Series.

B. Efron. Selection criteria for scatterplot smoothers. Annals of Statistics, 2001.

H. Fairfield Smith. An empirical law describing heterogeneity in the yields of agricultural crops. Journal of Agricultural Science, 28:1–23, January 1938.

P. Green and B. Silverman. Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London, 1994.

P. McCullagh. Celebrating Statistics: Papers in honour of Sir David Cox on his 80th birthday, chapter Exchangeability and regression models, pages 89–113. Oxford, 2005.


P. McCullagh and D. Clifford. Evidence for conformal invariance of crop yields. Accepted at Proceedings of the Royal Society A, 2006.

J. Pinheiro and D. Bates. Mixed-effects models in S and S-PLUS. New York: Springer-Verlag, 2000.

M. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, 1999.

W. Venables and B. Ripley. Modern Applied Statistics with S. Springer-Verlag New York, Inc., 4th edition, 2002.

G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.

F. Yates. Complex experiments. Journal of the Royal Statistical Society (Supplement), 2:181–247, 1935.

David Clifford, CSIRO
[email protected]

Peter McCullagh, University of Chicago
[email protected]

Processing data for outliers

by Lukasz Komsta

The results obtained by repeating the same measurement several times can be treated as a sample coming from an infinite, most often normally distributed population.

In many cases, for example quantitative chemical analysis, there is no possibility to repeat the measurement many times due to the very high costs of such validation. Therefore, all estimates and parameters of an experiment must be obtained from a small sample.

Some repeats of an experiment can be biased by crude error, resulting in values which do not match the other data. Such values, called outliers, are very easy to identify in large samples. But in small samples, often of fewer than 10 values, identifying outliers is more difficult, but even more important. A small sample contaminated with outlying values will result in estimates significantly different from the real parameters of the whole population (Barnett, 1994).

The problem of identifying outliers in small samples properly (and making a proper decision about removing or leaving suspicious data) is very old, and the first papers discussing this problem were published in the 1920s. The problem remained unresolved until the 1950s, due to the lack of computing technology needed to perform valid Monte-Carlo simulations. Although current trends in outlier detection rely on robust estimates, the tests described below are still in use in many cases (especially chemical analysis) due to their simplicity.

Dixon test

All concepts of outlier analysis were collected by Dixon (1950):

1. chi-squared scores of data
2. score of extreme value
3. ratio of range to standard deviation (two opposite outliers)
4. ratio of variances without suspicious value(s) and with them
5. ratio of ranges and subranges

The last concept seemed to the author to have the best performance, and Dixon introduced his famous test in the next part of the paper. He defined several coefficients which must be calculated on an ordered sample x(1), x(2), ..., x(n). The formulas below show these coefficients in two variants for each of them — when the suspicious value is lowest and highest, respectively.

    r10 = (x(2) − x(1)) / (x(n) − x(1)),    (x(n) − x(n−1)) / (x(n) − x(1))     (1)
    r11 = (x(2) − x(1)) / (x(n−1) − x(1)),  (x(n) − x(n−1)) / (x(n) − x(2))     (2)
    r12 = (x(2) − x(1)) / (x(n−2) − x(1)),  (x(n) − x(n−1)) / (x(n) − x(3))     (3)
    r20 = (x(3) − x(1)) / (x(n) − x(1)),    (x(n) − x(n−2)) / (x(n) − x(1))     (4)
    r21 = (x(3) − x(1)) / (x(n−1) − x(1)),  (x(n) − x(n−2)) / (x(n) − x(2))     (5)
    r22 = (x(3) − x(1)) / (x(n−2) − x(1)),  (x(n) − x(n−2)) / (x(n) − x(3))     (6)

The critical values of the above statistics were not given in the original paper, but only discussed. A year later (Dixon, 1951) the next paper with critical values appeared. Dixon discussed the distribution of his variables in a very comprehensive way, and concluded that a numerical procedure to obtain critical values is available only for a sample with n = 3. Quantiles for other sample sizes were obtained in this paper by simulations and tabularized.

Dixon did not publish the 0.975 quantile for his test, which is needed to perform a two-sided test at the 95% confidence level. These values were obtained later by interpolation (Rorabacher, 1991), with some corrections of typographical errors in the original Dixon table.
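For orientation, the simplest of these ratios is easy to compute by hand. The following sketch (plain R, not code from the outliers package) evaluates r10 in both of its variants for an unordered sample, using the six measurements analysed in the Examples section below:

> r10 <- function(x) {
+   x <- sort(x); n <- length(x)
+   c(low  = (x[2] - x[1]) / (x[n] - x[1]),
+     high = (x[n] - x[n-1]) / (x[n] - x[1]))
+ }
> r10(c(56.5, 55.1, 57.2, 55.3, 57.4, 60.5))
       low       high
0.03703704 0.57407407

The "high" value 0.5741 is exactly the Q statistic that dixon.test reports for these data later in this article.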


Grubbs test

Other criteria for outlying observations proposed by Dixon were discussed at the same time by Grubbs (1950). He proposed three variants of his test. For one outlier:

    G = |x_outlying − x̄| / s,    U = S₁² / S²    (7)

For two outliers on opposite tails:

    G = (x_max − x_min) / s,    U = S₂² / S²    (8)

For two outliers on the same tail:

    U = G = S₂² / S²    (9)

where S₁² is the variance of the sample with one suspicious value excluded, and S₂² is the variance with two values excluded.

If the estimators in equation (7) are biased (with n in the denominator), then a simple dependence occurs between S and U:

    U = 1 − G²/(n − 1)    (10)

and it makes the U and G statistics equivalent in their testing power.

The G distribution for the one-outlier test was discussed earlier (Pearson and Chekar, 1936), and the following formula can be used to approximate the critical value:

    G = t_(α/n, n−2) √( (n − 1) / (n − 2 + t²_(α/n, n−2)) )    (11)

When discussing the (8) statistic, Grubbs gives only G values, because U was too complicated to calculate. He writes: ". . . the limits of integration do not turn out to be functions of single variables and the task of computing the resulting multiple integral may be rather difficult."

The simple formula for approximating critical values of the G value (range to standard deviation) was given by David, Hartley and Pearson (1954):

    G = √( 2(n − 1) t²_(α/n(n−1), n−2) / (n − 2 + t²_(α/n(n−1), n−2)) )    (12)

The ratio of variances used to test for two outliers on the same tail (9) is not available to integrate in a numerical manner. Thus, critical values were simulated by Grubbs and must be approximated by interpolation from tabularized data.

Cochran test

The other test used to detect crude errors in an experiment is the Cochran test (Snedecor and Cochran, 1980). It is designed to detect an outlying (or inlying) variance in a group of datasets. It is based on a simple statistic — the ratio of the maximum (or minimum) variance to the sum of all variances:

    C = S²_max / Σ S²    (13)

The critical value approximation can be obtained using the following formula:

    C = 1 / (1 + (k − 1) F_(α/k, n(k−1), n))    (14)

where k is the number of groups, and n is the number of observations in each group. If groups have different lengths, the arithmetic mean is used as n.
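The statistic (13) itself is elementary to compute directly. As a plain base-R illustration (not code from the outliers package), here it is for the eight group variances analysed in the Examples section below:

> v <- c(1.2, 2.5, 2.9, 3.5, 3.6, 3.9, 4.0, 7.9)
> max(v) / sum(v)  ## outlying-variance variant
[1] 0.2677966
> min(v) / sum(v)  ## inlying-variance variant
[1] 0.04067797

These match the C values reported by cochran.test later in this article.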


The R implementation

Package outliers, available now on CRAN, contains a set of functions for performing these tests and calculating their critical values. The most difficult problem to solve was introducing a proper and well-working algorithm to provide quantile and p-value calculation of tabularized distributions by interpolation.

After some experiments, I decided to use regression of a 3rd degree orthogonal polynomial, fitted to the four given points, taken from the original Dixon or Grubbs table, nearest to the argument. When an argument comes from outside the tabularized range, it is extrapolated by fitting a linear model to the first (or last) two known values. This method is implemented in the internal function qtable, used for calculation of Dixon values and Grubbs values for two outliers at the same tail.

The proposed algorithm provides a good continuity-like effect of quantile calculation, as shown in Figures 1 and 3. Critical values can be obtained by qdixon, qgrubbs, and qcochran. The corresponding reverse routines (calculating p-values) are named pdixon, pgrubbs, and pcochran. The continuity effect of the reverse functions is depicted in Figure 2.

Figure 1: Interpolated quantile curves of the Dixon test for n = 3 (solid), n = 4 (dashed), n = 5 (dotted) and n = 6 (dotdashed).

Figure 2: Interpolated p-value curve of the Dixon test for n = 4, with percentage points given in the original Dixon table.

Figure 3: Interpolated quantile curves of the Grubbs test for two outliers on one tail; n = 4 (solid), n = 5 (dashed), n = 6 (dotted) and n = 7 (dotdashed).

The most important functions implemented in the package are dixon.test, grubbs.test and cochran.test. According to the given parameters, these functions can perform all of the tests mentioned above. Additionally, there is an easy way to obtain different kinds of data scores with the scores function.

It is also possible to obtain the most suspicious value (the largest difference from the mean) with the outlier function. If this value is examined and proved to be an outlier, it can be easily removed or replaced by the arithmetic mean by rm.outlier.

Examples

Suppose we have six measurements, where the last one is visually larger than the others. We can now test whether it should be treated as an outlier and removed from further calculations.

> x = c(56.5,55.1,57.2,55.3,57.4,60.5)
> dixon.test(x)

        Dixon test for outliers

data:  x
Q = 0.5741, p-value = 0.08689
alternative hypothesis: highest value 60.5 is an outlier

> grubbs.test(x)

        Grubbs test for one outlier

data:  x
G = 1.7861, U = 0.2344, p-value = 0.06738
alternative hypothesis: highest value 60.5 is an outlier

> scores(x) # this is only an example, not recommended for small samples
[1] -0.2551552 -0.9695897  0.1020621 -0.8675276  0.2041241
[6]  1.7860863
> scores(x,prob=0.95)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
> scores(x,prob=0.975)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
>
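The numbers printed above can be reproduced from the definitions with a few lines of base R (an illustration only, independent of the package internals):

> max(abs(x - mean(x))) / sd(x)  ## Grubbs G of equation (7)
[1] 1.786086
> x1 <- x[-which.max(abs(x - mean(x)))]
> sum((x1 - mean(x1))^2) / sum((x - mean(x))^2)  ## U as a ratio of sums of squares
[1] 0.234375
> (x - mean(x)) / sd(x)  ## the default scores() values
[1] -0.2551552 -0.9695897  0.1020621 -0.8675276  0.2041241  1.7860863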


As we see, both tests did not reject the null hypothesis, and the suspicious value should not be removed. Further calculations should be done on the full sample.

Another example is testing a simple dataset for two opposite outliers, and then for two outliers at one tail:

> x = c(41.3,44.5,44.7,45.9,46.8,49.1)
> grubbs.test(x,type=11)

        Grubbs test for two opposite outliers

data:  x
G = 2.9908, U = 0.1025, p-value = 0.06497
alternative hypothesis: 41.3 and 49.1 are outliers

> x = c(45.1,45.9,46.1,46.2,49.1,49.2)
> grubbs.test(x,type=20)

        Grubbs test for two outliers

data:  x
U = 0.0482, p-value = 0.03984
alternative hypothesis: highest values 49.1 , 49.2 are outliers
>

In the first dataset, the smallest and greatest values should not be rejected. The second example rejects the null hypothesis: 49.1 and 49.2 are outliers, and calculations should be made without them.

The last example is testing for outlying variance. We have calculated the variance in 8 groups (5 measurements in each group) of results and obtained: 1.2, 2.5, 2.9, 3.5, 3.6, 3.9, 4.0, 7.9. We must now check whether the smallest or largest variance should be considered in calculations:

> v = c(1.2, 2.5, 2.9, 3.5, 3.6, 3.9, 4.0, 7.9)
> n = rep(5,8)
> cochran.test(v,n)

        Cochran test for outlying variance

data:  v
C = 0.2678, df = 5, k = 8, p-value = 0.3579
alternative hypothesis: Group 8 has outlying variance
sample estimates:
  1   2   3   4   5   6   7   8
1.2 2.5 2.9 3.5 3.6 3.9 4.0 7.9

> cochran.test(v,n,inlying=TRUE)

        Cochran test for inlying variance

data:  v
C = 0.0407, df = 5, k = 8, p-value < 2.2e-16
alternative hypothesis: Group 1 has inlying variance
sample estimates:
  1   2   3   4   5   6   7   8
1.2 2.5 2.9 3.5 3.6 3.9 4.0 7.9
>

The tests show that the first group has an inlying variance, significantly smaller than the others.

Bibliography

V. Barnett, T. Lewis. Outliers in Statistical Data. Wiley, 1994.

W.J. Dixon. Analysis of extreme values. Ann. Math. Stat., 21(4):488–506, 1950.

W.J. Dixon. Ratios involving extreme values. Ann. Math. Stat., 22(1):68–78, 1951.

D.B. Rorabacher. Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon Q Parameter and Related Subrange Ratios at the 95 percent Confidence Level. Anal. Chem., 63(2):139–146, 1991.

F.E. Grubbs. Sample Criteria for testing outlying observations. Ann. Math. Stat., 21(1):27–58, 1950.

E.S. Pearson, C.C. Chekar. The efficiency of statistical tools and a criterion for the rejection of outlying observations. Biometrika, 28(3):308–320, 1936.

H.A. David, H.O. Hartley, E.S. Pearson. The distribution of the ratio, in a single normal sample, of range to standard deviation. Biometrika, 41(3):482–493, 1954.

G.W. Snedecor, W.G. Cochran. Statistical Methods. Iowa State University Press, 1980.

Lukasz Komsta
Department of Medicinal Chemistry
Skubiszewski Medical University of Lublin, Poland
[email protected]

Analysing equity portfolios in R

Using the portfolio package

by David Kane and Jeff Enos

Introduction

R is used by major financial institutions around the world to manage billions of dollars in equity (stock) portfolios. Unfortunately, there is no open source R package for facilitating this task. The portfolio package is meant to fill that gap. Using portfolio, an analyst can create an object of class portfolioBasic (weights only) or portfolio (weights and shares), examine its exposures to various factors, calculate its performance over time, and determine the contributions to performance from various categories of stocks. Exposures, performance and contributions are the basic building blocks of portfolio analysis.


One Period, Long-Only

Consider a simple long-only portfolio formed from the universe of 30 stocks in the Dow Jones Industrial Average (DJIA) in January, 2005. Start by loading and examining the input data.

> library(portfolio)
> data(dow.jan.2005)
> summary(dow.jan.2005)

    symbol              name
 Length:30          Length:30
 Class :character   Class :character
 Mode  :character   Mode  :character

     price                   sector
 Min.   : 21.0   Industrials   :6
 1st Qu.: 32.1   Staples       :6
 Median : 42.2   Cyclicals     :4
 Mean   : 48.6   Financials    :4
 3rd Qu.: 56.0   Technology    :4
 Max.   :103.3   Communications:3
                 (Other)       :3

    cap.bil         month.ret
 Min.   : 22.6   Min.   :-0.12726
 1st Qu.: 53.9   1st Qu.:-0.05868
 Median : 97.3   Median :-0.02758
 Mean   :126.0   Mean   :-0.02914
 3rd Qu.:169.3   3rd Qu.: 0.00874
 Max.   :385.9   Max.   : 0.04468

> head(dow.jan.2005)
     symbol                         name price
140      AA                    ALCOA INC 31.42
214      MO             ALTRIA GROUP INC 61.10
270     AXP          AMERICAN EXPRESS CO 56.37
294     AIG AMERICAN INTERNATIONAL GROUP 65.67
946      BA                    BOEING CO 51.77
1119    CAT              CATERPILLAR INC 97.51
          sector cap.bil month.ret
140    Materials   27.35 -0.060789
214      Staples  125.41  0.044681
270   Financials   70.75 -0.051488
294   Financials  171.04  0.009441
946  Industrials   43.47 -0.022600
1119 Industrials   33.27 -0.082199

The DJIA consists of exactly 30 large US stocks. We provide a minimal set of information for constructing a long-only portfolio. Note that cap.bil is market capitalization in billions of dollars, price is the per share closing price on December 31, 2004, and month.ret is the one month return from December 31, 2004 through January 31, 2005.

In order to create an object of class portfolioBasic, we can use this data and form a portfolio on the basis of a "nonsense" variable like price.

> p <- new("portfolioBasic",
+          date = as.Date("2004-12-31"),
+          id.var = "symbol", in.var = "price",
+          sides = "long", ret.var = "month.ret",
+          data = dow.jan.2005)
An object of class "portfolioBasic" with 6 positions

Selected slots:
name: Unnamed portfolio
date: 2004-12-31
in.var: price
ret.var: month.ret
type: equal
size: quintile

> summary(p)
Portfolio: Unnamed portfolio

      count weight
Long:     6      1

Top/bottom positions by weight:
   id  pct
1 AIG 16.7
2 CAT 16.7
3 IBM 16.7
4 JNJ 16.7
5 MMM 16.7
6 UTX 16.7

In other words, we have formed a portfolio of the 6 highest priced stocks out of the 30 stocks in the DJIA. The id.var argument causes the portfolio to use symbol as the key for identifying securities throughout the analysis. in.var selects the variable by which stocks are ranked in terms of desirability. sides set to long specifies that we want a long-only portfolio. ret.var selects the return variable for measuring performance. In this case, we are interested in how the portfolio does for the month of January 2005.

The default arguments to portfolioBasic form equal-weighted positions; in this case, each of the 6 stocks has 16.67% of the resulting portfolio. The defaults also lead to a portfolio made up of the best 20% (or quintile) of the universe of stocks provided to the data argument. Since 20% of 30 is 6, there are 6 securities in the resulting portfolio.

Exposures

Once we have a portfolio, the next step is to analyse its exposures. These can be calculated with respect to both numeric and factor variables. The method exposure will accept a vector of variable names.

> exposure(p, exp.var = c("price", "sector"))
numeric
  variable exposure
1    price     85.1


sector
     variable exposure
2 Industrials    0.500
1  Financials    0.167
3     Staples    0.167
4  Technology    0.167

The weighted average price of the portfolio is 85. In other words, since the portfolio is equal-weighted, the average share price of AIG, CAT, IBM, JNJ, MMM, and UTX is 85. This is a relatively high price, but makes sense since we explicitly formed the portfolio by taking the 6 highest priced stocks out of the DJIA.

Each of the 6 stocks is assigned a sector and 3 of them are Industrials. This compares to the 20% of the entire universe of 30 stocks that are in this sector. In other words, the portfolio has a much higher exposure to Industrials than the universe as a whole. Similarly, the portfolio has no stocks from the Communications, Cyclicals or Energy sectors despite the fact that these make up almost 27% of the DJIA universe.

Performance

Time plays two roles in the portfolio class. First, there is the moment of portfolio formation. This is the instant when all of the data, except for future returns, is correct. After this moment, of course, things change. The price of AIG on December 31, 2004 was $65.67, but by January 5, 2005 it was $67.35.

The second role played by time in the portfolio class concerns future returns. ret.var specifies a return variable which measures the performance of individual stocks going forward. These returns can be of any duration — an hour, a day, a month, a year — but should generally start from the moment of portfolio formation. In this example, we are using one month forward returns. Now that we know the portfolio's exposures at the start of the time period, we can examine its performance for the month of January.

> performance(p)
Total return: -1.71 %

Best/Worst performers:
   id weight      ret  contrib
2 CAT  0.167 -0.08220 -0.01370
3 IBM  0.167 -0.05234 -0.00872
6 UTX  0.167 -0.02583 -0.00431
1 AIG  0.167  0.00944  0.00157
4 JNJ  0.167  0.02018  0.00336
5 MMM  0.167  0.02790  0.00465

The portfolio lost 1.7% of its value in January. The worst performing stock was CAT (Caterpillar), down more than 8%. The best performing stock was MMM (3M), up almost 3%. The contrib (contribution) of each stock to the overall performance of the portfolio is simply its weight in the portfolio multiplied by its return. The sum of the 6 individual contributions yields -1.7%.

Contributions

The contributions of individual stocks are not that interesting in and of themselves. More important is to examine summary statistics of the contributions across different categories of stocks. Consider the use of the contribution method:

> contribution(p, contrib.var = c("sector"))
sector
         variable weight  contrib     roic
5  Communications  0.000  0.00000  0.00000
6   Conglomerates  0.000  0.00000  0.00000
7       Cyclicals  0.000  0.00000  0.00000
8          Energy  0.000  0.00000  0.00000
1      Financials  0.167  0.00157  0.00944
2     Industrials  0.500 -0.01336 -0.02671
9       Materials  0.000  0.00000  0.00000
3         Staples  0.167  0.00336  0.02018
4      Technology  0.167 -0.00872 -0.05234
10      Utilities  0.000  0.00000  0.00000

contribution, like exposure, accepts a vector of variable names. In the case of sector, the contribution object displays the 10 possible values, the total weight of the stocks for each level and the sum of the contributions for those stocks. Only 4 of the 10 levels are represented by the stocks in the portfolio. The other 6 levels have zero weight and, therefore, zero contributions.

The sector with the biggest weight is Industrials, with half of the portfolio. Those 3 stocks did poorly, on average, in January and are therefore responsible for -1.3% in total losses. There is only a single Technology stock in the portfolio, IBM. Because it was down 5% for the month, its contribution was -0.87%. Since IBM is the only stock in its sector in the portfolio, the contribution for the sector is the same as the contribution for IBM.

The last column in the display shows the roic — the return on invested capital — for each level. This captures the fact that raw contributions are a function of both the total size of positions and their return. For example, the reason that the total contribution for Industrials is so large is mostly because they account for such a large proportion of the portfolio. The individual stocks performed poorly, on average, but not that poorly. Much worse was the performance of the Technology stock. Although the total contribution from Technology was only 60% of that of Industrials, this was on a total position weight only 1/3 as large. In other words, the return on total capital was much worse in Technology than in Industrials, even though Industrials accounted for a larger share of the total losses.
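The arithmetic behind these tables is worth seeing once in plain R. This is an illustration using the numbers printed above, not a function from the portfolio package:

> w   <- rep(1/6, 6)  ## equal weights on the six long positions
> ret <- c(-0.08220, -0.05234, -0.02583, 0.00944, 0.02018, 0.02790)
> w * ret             ## per-stock contrib, e.g. CAT: -0.0137
> sum(w * ret)        ## about -0.0171, i.e. the -1.71% total return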


Think about roic as useful in determining contributions on the margin. Imagine that you have the chance to move $1 out of one sector and into another. Where should that marginal dollar come from? Not from the sector with the worst total contribution. Instead, the marginal dollar should come from the sector with the worst roic and should be placed into the sector with the best roic. In this example, we should move money out of Technology and into Staples.

contribution can also work with numeric variables.

> contribution(p, contrib.var = c("cap.bil"))

cap.bil
      rank    variable weight   contrib     roic
1  1 - low (22.6,50.9]  0.167 -0.013700 -0.08220
2        2 (50.9,71.1]  0.333  0.000345  0.00103
3        3  (71.1,131]  0.000  0.000000  0.00000
4        4   (131,191]  0.500 -0.003787 -0.00757
5 5 - high   (191,386]  0.000  0.000000  0.00000

Analysing contributions only makes sense in the context of categories into which each position can be placed. So, we need to break up a numeric variable like cap into discrete, exhaustive categories. The contribution function provides various options for doing so, but most users will be satisfied with the default behavior of forming 5 equal sized quintiles based on the distribution of the variable in the entire universe.

In this example, we see that there are no portfolio holdings among the biggest 20% of stocks in the DJIA. Half the portfolio comes from stocks in the second largest quintile. Within each category, the analysis is the same as that above for sectors. The worst performing category in terms of total contribution is the smallest quintile. This category also has the lowest roic.

One Period Long-Short

Having examined a very simple long-only portfolio in order to explain the concepts behind exposures, performance and contributions, it is time to consider a more complex case, a long-short portfolio which uses the same underlying data.

> p <- new("portfolioBasic",
+          date = as.Date("2004-12-31"),
+          id.var = "symbol", in.var = "price",
+          type = "linear", sides = c("long",
+          "short"), ret.var = "month.ret",
+          data = dow.jan.2005)

An object of class "portfolioBasic" with 12 positions

Selected slots:
name: Unnamed portfolio
date: 2004-12-31
in.var: price
ret.var: month.ret
type: linear
size: quintile

> summary(p)

Portfolio: Unnamed portfolio

       count weight
Long:      6      1
Short:     6     -1

Top/bottom positions by weight:
     id    pct
12  UTX  28.57
5   IBM  23.81
2   CAT  19.05
8   MMM  14.29
1   AIG   9.52
10  PFE  -9.52
9  MSFT -14.29
11  SBC -19.05
6  INTC -23.81
4   HPQ -28.57

Besides changing to a long-short portfolio, we have also provided a value of "linear" to the type argument. As the summary above shows, this yields a portfolio in which the weights on the individual positions are (linearly) proportional to their share prices. At the end of 2004, the lowest priced stock in the DJIA was HPQ (Hewlett-Packard) at $20.97. Since we are using price as our measure of desirability, HPQ is the biggest short position. Since we have not changed the default value for size from "quintile," there are still 6 positions per side.

> plot(p)

[Figure 1 has two panels: "Weight vs. in.var" (position weights against price) and "Weight distribution" (histogram of position weights).]

Figure 1: Plot of a portfolioBasic object.


Figure 1 shows the result of calling plot on this portfolio object. The top plot displays the relationship between position weights and their corresponding in.var values, while the bottom plot shows a histogram of position weights.

The display for the exposure object is now somewhat different to accommodate the structure of a long-short portfolio.

> exposure(p, exp.var = c("price", "sector"))

numeric
  variable long short exposure
1    price 92.6 -24.2     68.4

sector
        variable   long   short exposure
2    Industrials 0.6190  0.0000   0.6190
1     Financials 0.0952  0.0000   0.0952
3        Staples 0.0476 -0.0952  -0.0476
5 Communications 0.0000 -0.2381  -0.2381
4     Technology 0.2381 -0.6667  -0.4286

The exposure to price is simply the weighted average of price on the long side of the portfolio, where the weighting is done in proportion to the size of the position in each stock. The same is true on the short side. Since the linear weighting used here emphasises the tail of the distribution, the long exposure is greater than the long exposure of the equal weighted portfolio considered above, $93 versus $85.

Since the weights on the short side are actually negative — in the sense that we are negatively exposed to positive price changes in these stocks — the weighted average on the short side for a positive variable like price is also negative. Another way to read this is to note that the weighted average price on the short side is about $24, but that the portfolio has a negative exposure to this number because these positions are all on the short side.

One reason for the convention of using a negative sign for short side exposures is that doing so makes the overall exposure of the portfolio into a simple summation of the long and short exposures. (Note the assumption that the weights on both sides are equal. In future versions of the portfolio package, we hope to weaken these and other requirements.) For this portfolio, the overall exposure is 68. Because the portfolio is long the high priced stocks and short the low priced ones, the portfolio has a positive exposure to the price factor. Colloquially, we are "long price."

A similar analysis applies to sector exposures. We have 62% of our long holdings in Industrials but zero of our short holdings. We are, therefore, 62% long Industrials. We have 24% of the long holdings in Technology, but 67% of the short holdings; so we are 43% short Technology.

Calling plot on the exposure object produces a bar chart for each exposure category, as seen in Figure 2. Since all numeric exposures are grouped together, a plot of numeric exposure may only make sense if all numeric variables are on the same scale. In this example, the only numeric exposure we have calculated is that to price.

> plot(exposure(p, exp.var = c("price",
+     "sector")))

Figure 2: Plot of an exposure object for a single period. Bar charts are sorted by exposure.

The performance object is similar for a long-short portfolio.

> performance(p)

Total return:  2.19 %

Best/Worst performers:
     id  weight      ret  contrib
2   CAT  0.1905 -0.08220 -0.01566
5   IBM  0.2381 -0.05234 -0.01246
12  UTX  0.2857 -0.02583 -0.00738
3   DIS -0.0476  0.02986 -0.00142
1   AIG  0.0952  0.00944  0.00090
8   MMM  0.1429  0.02790  0.00399
6  INTC -0.2381 -0.04019  0.00957
10  PFE -0.0952 -0.10152  0.00967
11  SBC -0.1905 -0.06617  0.01260
4   HPQ -0.2857 -0.06581  0.01880

The portfolio was up 2.2% in January. By default, the summary of the performance object only provides the 5 worst and best contributions to return. HPQ was the biggest winner because, though it was only down 7% for the month, its large weighting caused it to contribute almost 2% to overall performance. The 19% weight of CAT in the portfolio placed it as only the third largest position on the long side, but its -8% return for the month made it the biggest drag on performance.
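To make the exposure arithmetic described above concrete: a side's exposure to a numeric variable is just the position-weighted average of that variable. A minimal sketch with hypothetical weights and prices:

## Weighted-average exposure on the long side (hypothetical values).
w  <- c(0.29, 0.24, 0.19, 0.14, 0.09, 0.05)  # long weights, sum to 1
px <- c(100, 98, 93, 90, 85, 80)             # hypothetical prices
sum(w * px)                                  # long exposure to price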


The contribution function provides similar output for a long-short portfolio.

> contribution(p, contrib.var = c("cap.bil",
+     "sector"))

cap.bil
      rank    variable weight  contrib     roic
1  1 - low (22.6,50.9] 0.0952 -0.01566 -0.16440
2        2 (50.9,71.1] 0.3810  0.01399  0.03671
3        3  (71.1,131] 0.0952  0.01260  0.13235
4        4   (131,191] 0.3095 -0.00103 -0.00334
5 5 - high   (191,386] 0.1190  0.01202  0.10098

sector
         variable weight contrib    roic
1  Communications 0.1190  0.0112  0.0939
6   Conglomerates 0.0000  0.0000  0.0000
7       Cyclicals 0.0000  0.0000  0.0000
8          Energy 0.0000  0.0000  0.0000
2      Financials 0.0476  0.0009  0.0189
3     Industrials 0.3095 -0.0191 -0.0616
9       Materials 0.0000  0.0000  0.0000
4         Staples 0.0714  0.0106  0.1488
5      Technology 0.4524  0.0183  0.0404
10      Utilities 0.0000  0.0000  0.0000

As in the last example, the weight column reflects the proportion of the portfolio invested in each category. In this case, we have 45% of the total capital (or weight) of the long and short sides considered together invested in the Technology sector. We know from the exposure results above that most of this is invested on the short side, but in the context of contributions, it does not matter on which side the capital is deployed.

Plotting objects of class contribution produces output similar to plotting exposures, as can be seen in Figure 3. This time, however, we see values of roic plotted against the categories that make up each contrib.var. For numeric variables, roic for each interval is shown; numeric intervals will always appear in order.

> plot(contribution(p, contrib.var = c("cap.bil",
+     "sector")))

[Figure 3 has two panels: "roic by cap.bil" and "roic by sector".]

Figure 3: Plot of a contribution object for a single period.

Multi-Period, Long-Short

Analysing a portfolio for a single time period is a useful starting point, but any serious research will require considering a collection of portfolios over time. The help pages of the portfolio package provide detailed instructions on how to construct a portfolioHistory object. Here, we will just load up the example object provided with the package and consider the associated methods for working with it.

> data(global.2004.history)
> global.2004.history

An object of class "portfolioHistory"
Contains:

Object of class "exposureHistory"
Number of periods: 12
Period range: 2003-12-31 -- 2004-11-30
Data class: exposure
Variables: numeric currency sector

Object of class "performanceHistory"
Number of periods: 12
Period range: 2003-12-31 -- 2004-11-30
Data class: performance

Object of class "contributionHistory"
Number of periods: 12
Period range: 2003-12-31 -- 2004-11-30
Data class: contribution
Variables: currency sector liq.w

Note that the portfolios analysed in this object were created on a monthly frequency for 2004 using a universe of the 500 largest global stocks each month. Market capitalization in billions of dollars (cap.bil) was used as a measure of desirability, so the portfolio is long the mega-large caps and short the merely typical large caps.

We have designed the architecture of the portfolio package so that working with multi-period portfolios is similar to working with single period portfolios. For starters, we can call the exposure method on objects of class portfolioHistory just as we called them on objects of class portfolio.
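Conceptually, each history method applies its single-period counterpart to every stored period and then summarises the results, typically by averaging. In miniature (a sketch of our own with hypothetical monthly exposure vectors, not package internals):

## The multi-period idea in miniature: one summary per period,
## then a plain average across periods.
exp.by.month <- list(c(cap.bil = 97.1, liq.w = 2.1),
                     c(cap.bil = 99.4, liq.w = 1.9),
                     c(cap.bil = 99.6, liq.w = 2.2))
Reduce(`+`, exp.by.month) / length(exp.by.month)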


An added call to summary prints a summary of the returned object.

> summary(exposure(global.2004.history))

Mean exposure:
numeric
   variable   long   short exposure
11  cap.bil 109.20 -10.483    98.71
1     liq.w   1.22   0.822     2.04

currency
   variable     long     short  exposure
6       JPY 0.062978 -0.158931 -0.095952
8       SEK 0.000948 -0.040208 -0.039260
5       HKD 0.000000 -0.017283 -0.017283
1       AUD 0.001242 -0.005734 -0.004491
7       NOK 0.000000 -0.003949 -0.003949
9       SGD 0.000000 -0.000438 -0.000438
3       EUR 0.179902 -0.176904  0.002998
4       GBP 0.113589 -0.109119  0.004470
2       CHF 0.042340 -0.019232  0.023108
10      USD 0.598999 -0.468202  0.130797

sector
         variable    long    short  exposure
10      Utilities 0.00803 -0.08832 -0.080290
3       Cyclicals 0.04980 -0.12475 -0.074945
7       Materials 0.00112 -0.05876 -0.057636
6     Industrials 0.04147 -0.09762 -0.056152
2   Conglomerates 0.00000 -0.00356 -0.003563
9      Technology 0.08784 -0.08730  0.000545
8         Staples 0.24521 -0.23168  0.013532
5      Financials 0.24544 -0.19245  0.052982
4          Energy 0.10328 -0.03456  0.068719
1  Communications 0.19144 -0.05385  0.137588

The difficulty in considering exposures — basically a static concept about how the portfolio looks at this moment — in the context of a time series of portfolio objects is that it is not obvious how to summarize the exposures over time, especially in the context of a portfolio whose overall size is changing. In this release of the portfolio package, we will simply report the average exposure for all time periods.

For this example portfolio, it is easy to see that the portfolio is long the biggest stocks. The average market capitalization is over $100 billion on the long side versus only $10 billion on the short side. This cap exposure causes the portfolio to take a similar "bet" on liquidity, since larger stocks are, on average, more liquid than smaller stocks. The liq.w variable is our measure of liquidity — basically how much stock is traded on a "typical" day — scaled to have mean 0 and variance 1 for the universe.

The currency and sector exposures are, likewise, simple averages of the 12 monthly exposures. Even though the portfolio is constructed without reference to currency or sector, it still ends up with non-trivial exposures to these variables. In a typical month, the portfolio is 14% long Communications and 10% short Japan.

Plotting the exposureHistory object yields box and whisker plots of net exposure for each exposure variable, as in Figure 4.

> plot(exposure(global.2004.history))

Figure 4: Plot of an exposureHistory object.

The performance method aggregates performance for each individual time period into an overall figure and provides a listing of the largest contributions.

> summary(performance(global.2004.history))

Performance summary (frequency = 12):

                ret
mean        -0.0081
mean (ann)  -0.0971
sd           0.0152
sd (ann)     0.0527
sharpe      -1.8439

Best period:
                 date    ret
2004-03-31 2004-03-31 0.0149

Worst period:
                 date     ret
2004-10-31 2004-10-31 -0.0292

Top/Bottom performers (total contribution):
          id  contrib
140     EBAY  0.00423
231   MRW.LN  0.00387
234       MU  0.00357
338      XOM  0.00337
196      JNJ  0.00333
211      LTR -0.00416
323 VOLVB.SS -0.00453
329      WLP -0.00474
227      MON -0.00494
11      5201 -0.00555
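The annualized figures in this summary can be checked by hand, assuming the usual monthly-to-annual scaling; the package's exact convention is not spelled out here, so treat this as a sketch:

## Reproducing mean (ann), sd (ann) and sharpe from the monthly values.
mean.month <- -0.0081
sd.month   <-  0.0152
c(mean.ann = 12 * mean.month,                          # about -0.097
  sd.ann   = sqrt(12) * sd.month,                      # about  0.053
  sharpe   = 12 * mean.month / (sqrt(12) * sd.month))  # about -1.84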


The best month for the portfolio was April 2004 when it earned 1.5%. The worst month was a loss of almost 3% in November. The biggest contributor on the positive side was EBAY. The mean annualized return of the portfolio is almost -10%; the annualized volatility is 5.3%. Sharpe ratios — defined¹ as the mean return divided by the standard deviation of return — are generally not provided for unprofitable portfolios, but the ratio here is -1.8.

Note that the portfolioHistory object does not keep a record of every position for each time period. By design, it only retains summary information, albeit on a stock-by-stock basis. We want to be able to use this package for high frequency data — say, a decade worth of daily portfolios — so it is not easy (or even possible) to manipulate an object which maintains information for every stock for each time period.

As with the exposure method, the contribution method works with portfolioHistory objects in the same way that it works with portfolio objects.

> summary(contribution(global.2004.history))

Mean contribution:
currency
   variable   weight     contrib     roic
1       AUD 0.003488 -0.00014179 -0.02622
2       CHF 0.030786  0.00000868 -0.00291
11      DKK 0.000000  0.00000000  0.00000
3       EUR 0.178403 -0.00104765 -0.00620
4       GBP 0.111354 -0.00048120 -0.00409
5       HKD 0.008641  0.00011748 -0.00720
6       JPY 0.110954  0.00005111 -0.00273
21      NOK 0.001974 -0.00012765 -0.03745
7       SEK 0.020578 -0.00111390 -0.04674
31      SGD 0.000219 -0.00000963 -0.01264
8       USD 0.533601 -0.00534746 -0.00996

sector
         variable  weight    contrib     roic
1  Communications 0.12610  0.0008478  0.00814
11  Conglomerates 0.00179 -0.0000746 -0.01039
2       Cyclicals 0.08961 -0.0025430 -0.02863
3          Energy 0.07077  0.0008145  0.01384
4      Financials 0.22484 -0.0010191 -0.00416
5     Industrials 0.07153 -0.0014918 -0.01990
6       Materials 0.03067 -0.0002470 -0.01246
7         Staples 0.24511 -0.0036175 -0.01483
8      Technology 0.09007  0.0002594  0.00231
9       Utilities 0.04950 -0.0011930 -0.02607

liq.w
  variable weight  contrib    roic
1  1 - low 0.2628 -0.00390 -0.0140
2        2 0.1607 -0.00545 -0.0349
3        3 0.0668 -0.00136 -0.0197
4        4 0.0952 -0.00167 -0.0156
5 5 - high 0.4144  0.00384  0.0090

Again, the basic approach consists of calculating the contributions for each time period separately, saving those results, and then presenting an average for the entire period.

Figure 5 shows the plot of the resulting contributionHistory object. As with exposure history, the box and whisker plot shows the median and quartile range of variable values over the given time period.

> plot(contribution(global.2004.history))

Figure 5: Plot of a contributionHistory object.

Conclusion

The current release of the portfolio package is meant to serve as a proof-of-concept. Relatively sophisticated portfolio analytics are possible using an open source package. Although money management is largely a zero-sum game — every dollar that I make is a dollar that someone else has lost — there is no reason why we might not cooperate on the tools that we all use.

David Kane and Jeff Enos
Kane Capital Management, Cambridge, Massachusetts, USA
[email protected] and [email protected]

¹The definition of the Sharpe ratio subtracts the risk-free rate from the mean return in the numerator. But, in the context of a long-short portfolio, this is typically ignored.


GroupSeq—Designing clinical trials using group sequential designs

by Roman Pahl, Andreas Ziegler, and Inke R. König

Introduction

Clinical trials are the gold standard to prove superiority of new interventions. An attractive variant to these studies is to use group sequential designs (Ziegler et al., 2003). Widely used since their introduction in the 1970s and 1980s, many different versions of group sequential designs have been developed. Common to all these designs is their complex computation for which software tools are required. In this context, existing software tools like ADDPLAN (http://www.addplan.com/) often are not free of charge, making it valuable to consider freely distributed software alternatives.

GroupSeq provides a set of several methods for computing group sequential boundaries of well established group sequential designs, among others the designs by Pocock (1977) and O'Brien and Fleming (1979). More precisely, the original designs are approximated by the α-spending function approach which was introduced by Lan and DeMets (1983) and Kim and DeMets (1987), for which a Fortran program by Reboussin et al. (2003) has been available. For GroupSeq, the latter implementation was completely recoded in R, and, in addition, new features were added. For instance, GroupSeq allows one to calculate exact Pocock bounds beside the estimated ones obtained by the α-spending approach.

Furthermore, as an important feature, this application is embedded in a graphical user interface (GUI) using the Tcl/Tk interface in the R tcltk package. Note that there is also a graphical user interface by Reboussin et al. (2003), although it only runs under Windows. In contrast, the GroupSeq GUI is available on several platforms, and, moreover, provides customization possibilities; i.e., the user may create as many windows as desired. Thus, he or she may perform multiple tasks such as computing and comparing several designs at the same time.

The computations yield valid results for any test which is based on normally distributed test statistics with independent increments, survival studies, and certain longitudinal designs. Using the α-spending function approach, interim analyses need not be equally spaced, and their number need not be specified in advance.

Group sequential tests

This paper only gives a basic introduction to group sequential designs. For a comprehensive overview of this topic, see, e.g., Jennison and Turnbull (2000) and Wassmer (2001).

The classical theory of testing goes back to Neyman and Pearson (1928) and is based on a sample of fixed size. In this sample, the null hypothesis H0 is tested against an alternative hypothesis H1. A significance level α has to be defined a priori, which implies the probability of the null hypothesis being falsely rejected. Importantly, the evaluation always takes place at the end, after all observations have been made.

By contrast, group sequential designs allow consecutive testing, with possible rejection of the null hypothesis after observation of each group in a sequence of groups. Therefore, the significance levels corresponding to each group have to be adjusted appropriately. Many different group sequential designs have been developed. The main difference among these is the manner of allocating the specific significance levels. A further distinction consists in equally sized groups as in classical group sequential testing (Pocock, 1977; O'Brien and Fleming, 1979) versus unequally sized groups (Kim and DeMets, 1987).

Determining critical regions

Normally one wants to compare two population mean values $\mu_1$ and $\mu_2$, i.e., one wants to test the null hypothesis $H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 \neq \mu_2$. Consider a standard normally distributed Z statistic

\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \qquad (1) \]

with $\bar{X}_1$, $\bar{X}_2$ being the means of two independent samples of sizes $n_1$, $n_2$ which are distributed as $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Assuming $n_1 = n_2 = n$ and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, analogously to (1), one can define a statistic per group $k$:

\[ Z_k = \frac{\bar{X}_{1k} - \bar{X}_{2k}}{\sigma} \sqrt{\frac{n}{2}} \]

The standardized overall statistic up to group $k$ is then defined as

\[ Z_k^* = \frac{\sum_{j=1}^{k} \sqrt{n_j}\, Z_j}{\sqrt{\sum_{j=1}^{k} n_j}} \]


In contrast to $Z_1, Z_2, \ldots, Z_K$, the statistics $Z_1^*, Z_2^*, \ldots, Z_K^*$ are statistically dependent on each other. The joint multivariate normal distribution of $Z_1^*, Z_2^*, \ldots, Z_K^*$ therefore has to be computed numerically.

A specific group sequential design is characterized by its continuation regions $\xi_k$ with $k = 1, 2, \ldots, K-1$, where $\xi_k$ denotes the region in which the study continues, i.e., $H_0$ has not been rejected by then. The last acceptance region in the sequence is denoted $\xi_K$, and the probability of type I error for test $K$ is given by

\[ 1 - P_{H_0}\!\left( \bigcap_{k=1}^{K} \{ Z_k^* \in \xi_k \} \right), \]

where $P_{H_0}$ denotes the distribution of $Z_k^*$ under $H_0$. One obtains the corresponding power with regard to the alternative hypothesis $H_1$ by

\[ 1 - P_{H_1}\!\left( \bigcap_{k=1}^{K} \{ Z_k^* \in \xi_k \} \right). \]

The α-spending function approach

The α-spending function, or use function, approach was introduced by Lan and DeMets (1983) and Kim and DeMets (1987). It allows analyses at arbitrary points of time in the study and hence of arbitrarily sized groups. This is achieved via the α-spending function $\alpha^*(t_k)$ which determines the type I error "spent" until time point $t_k$, with $k$ denoting the $k$-th interim analysis. Usually, the entire period of analysis is standardized to one so that $0 < t_1 < t_2 < \ldots < t_K = 1$, with $t_k = \sum_{i=1}^{k} n_i / N$.

Given the maximum sample size $N$ and the α-spending function $\alpha^*(t_k)$, the critical regions are obtained by recursive calculation. Setting $t_1 = n_1/N$, one obtains

\[ P_{H_0}( |Z_1^*| \geq u_1 ) = \alpha^*(t_1) \qquad (2) \]

for the first critical region. Analogously to (2), the remaining probabilities are obtained by

\[ P_{H_0}\!\left( \bigcap_{i=1}^{k-1} \{ |Z_i^*| < u_i \} \cap \{ |Z_k^*| \geq u_k \} \right) = \alpha^*(t_k) - \alpha^*(t_{k-1}). \]

The α-spending function approach thus is a very flexible tool for designing various studies because neither group size nor number of interim analyses have to be specified in advance. All calculations performed by GroupSeq are based on the α-spending approach. In this context, the classic designs are estimated as special cases by the following α-spending functions:

• Pocock: $\alpha_1^*(t_k) = \alpha \ln[1 + (e - 1) t_k]$

• O'Brien and Fleming: $\alpha_2^*(t_k) = 4 \left( 1 - \Phi\!\left( \Phi^{-1}(1 - \alpha/4) / \sqrt{t_k} \right) \right)$

Implementation

The program by Reboussin et al. (2003) constituted the basis for GroupSeq. Recoding in R followed the software engineering principles of unitization and re-usability; i.e., the tasks were divided into many subfunctions, which also allows improvement in further development. One goal consisted in maintaining the performance of the application. This was achieved by converting the "low level syntax" of Fortran into R specific functions as well as vector and matrix operations. Some algorithms also were slightly improved, e.g., by using Newton's method for optimization, replacing a bisection method. The interested reader is referred to the source code, which is commented in detail. As it turned out, the performance of GroupSeq is generally comparable with the Fortran implementation. Notable differences occur only when the number of interim analyses rises to more than ten. This, however, will usually not occur in practice. Still, efficiency could be improved by outsourcing some computations into compiled C/C++ code.

A main focus was on improving the user interface. At the time of development of GroupSeq, the Fortran program only provided a command line interface. Because of this, GroupSeq was embedded in a GUI using Tcl/Tk, making it available for every platform on which R runs. The customization of the GUI is left to the user, who may create a separate window for each task, which allows any arrangement and positioning on the desktop. For those who are familiar with the program by Reboussin et al. (2003), the menu structure of that program was retained in GroupSeq.

Additionally, GroupSeq provides new features that are not part of the program by Reboussin et al. (2003). Specifically, new functionality is included that allows the user to compute exact Pocock bounds.

Package features

GroupSeq provides a GUI using the Tcl/Tk interface in the R tcltk package. Hence, GroupSeq's usage is almost self-explanatory. After loading the package, the program starts automatically. The user will see the menu shown in Figure 1.

A task is chosen by selecting an appropriate line and confirming with "Perform Selected Task". For the purpose of multitasking, the user may open as many task windows as desired. Every task is processed independently. Choosing task "-1- Compute Bounds" will lead to the dialog box given in Figure 2.


The user may choose between (1) and (2), which are estimates of the classic designs by Pocock (1977) and O'Brien and Fleming (1979), respectively. (3) corresponds to the design proposed by Kim and DeMets (1987), and (4) to a family of functions proposed by Hwang et al. (1990). Finally, (5) calculates the exact bounds for the Pocock design. All GroupSeq operating windows are designed so that irrelevant information is concealed. For example, if the user chooses to use unequal time points and different spending functions for upper and lower bounds, the window will appear as shown in Figure 3, with GroupSeq providing meaningful default values.
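The two spending functions quoted in the previous section are easy to evaluate directly. The following sketch is our own code, not GroupSeq internals; it shows the cumulative α spent at four equally spaced looks for an overall level of 0.05:

## alpha-spending functions for the Pocock and O'Brien-Fleming
## approximations, as given above.
alpha.pocock <- function(t, alpha) alpha * log(1 + (exp(1) - 1) * t)
alpha.obf    <- function(t, alpha) 4 * (1 - pnorm(qnorm(1 - alpha/4) / sqrt(t)))
t <- c(0.25, 0.5, 0.75, 1)
round(alpha.pocock(t, 0.05), 4)  # spends alpha early
round(alpha.obf(t, 0.05), 4)     # spends alpha late; both reach 0.05 at t = 1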

Figure 1: Menu on program start.

The number of interim analyses is selected by a drop down menu at the top of the window. An α-spending function can be selected and must be confirmed by the “CONFIRM FUNCTION” button.

Figure 2: Window after "-1- Compute Bounds" was chosen.

Figure 3: User specification in "-1- Compute Bounds".


Figure 4: Displaying results of calculated bounds.

Figure 6: Saved results (viewed in Mozilla Firefox).

Pressing the "CALCULATE" button at the bottom of the window initiates the calculation, and each of the results is shown in a new, separate window. Figure 4 gives an example. The parameter values that were used in the computation are shown at the top part of the window, and the most important parameters can also be seen in the blue window title bar. This feature helps the user to keep track of many windows when elaborating many designs at once. The results of the computation are presented in a table. In the case of task -1-, the resulting bounds, α spent per stage, and α spent in total are displayed. In a next step, the resulting critical bounds can be visualized by choosing "Show Graph" (see Figure 5), which requires the R graphics environment.

Figure 5: Graphical visualization of results.

By using the option "Save to file", the results may be saved into a table in common *.html file format, which conserves meta information and therefore provides the possibility of further processing (see Figure 6).

Choosing task "-2- Compute drift given power and bounds" from the main menu will lead to the window shown in Figure 7. Besides specifying the desired power, the user may enter specific bounds instead of computing them with a spending function. Activating the corresponding check box changes the current window as shown in Figure 8. Calculating the effect under the alternative hypothesis leads to the window shown in Figure 9.

Figure 7: "-2- Compute drift given power and bounds" was chosen.

The table contains exit probabilities (i.e., 1 − β) corresponding to the probability per stage of rejecting the null hypothesis under the alternative. The effect (drift) hereby is implicitly given by the pre-specified power. The cumulative exit probabilities, in turn, sum up to the power at the last analysis.

Task "-3- Compute probabilities given bounds and drift" works the other way around. Specifying a certain drift, which, again, corresponds to the expected effect size at the last analysis (see Figure 10), the exit probabilities are calculated, and these yield the overall power at the last analysis. The computation, of course, also depends on the given bounds and interim times.

Figure 8: User enters specific bounds.

Figure 9: Displaying results of calculated drift under pre-specified power (here: exact Pocock bounds were used).

Figure 10: User enters drift in Task -3- window (detail).

Finally, users may select task "-4- Compute confidence interval". As shown in Figure 11, the confidence level and the overall effect (as a Z-value) have to be entered.

Figure 11: User enters confidence level and desired effect in Task -4- window (detail).


Figure 12: Resulting confidence interval (Two-Sided-Bounds).

The window displaying the calculated confidence interval differs somewhat from the other windows (see Figure 12), as no table is presented. The bounds can still be visualized using "Show Graph", however.

Future development

Future versions of GroupSeq will include additional software ergonomic improvements such as a status bar during calculation. Additionally, an "undo" function will be added, as well as a feature to let users specify their own default values, thereby achieving more efficiency in frequent use of GroupSeq. Furthermore, it is planned to implement adaptive group sequential designs.

Bibliography

I. K. Hwang, W. J. Shih, and J. S. DeCani. Group sequential designs using a family of type I error probability spending functions. Statistics in Medicine, 9:1439–1445, 1990.

C. Jennison and B. W. Turnbull. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, 2000.

K. Kim and D. DeMets. Design and analysis of group sequential tests based on the type I error spending rate function. Biometrika, 74:25–36, 1987.

K. K. G. Lan and D. L. DeMets. Discrete sequential boundaries for clinical trials. Biometrika, 70:659–663, 1983.

J. Neyman and E. S. Pearson. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A:263–295, 1928.

P. C. O'Brien and T. R. Fleming. A multiple testing procedure for clinical trials. Biometrics, 35:549–556, 1979.

S. J. Pocock. Group sequential methods in the design and analysis of clinical trials. Biometrika, 64:191–199, 1977.

D. M. Reboussin, D. L. DeMets, K. Kim, and K. K. G. Lan. Programs for Computing Group Sequential Boundaries Using the Lan-DeMets Method, Version 2.1. 2003. URL http://www.biostat.wisc.edu/landemets/.

G. Wassmer. Statistische Testverfahren für gruppensequentielle und adaptive Pläne in klassischen Studien. Alexander Mönch Verlag, Köln, 2001.

A. Ziegler, G. Dahmen, I. R. König, G. Wassmer, P. Hildebrand, and M. Birth. Accelerating clinical trials by flexible designs. Good Clinical Practice Journal, 10:16–19, 2003.

Roman Pahl
[email protected]
Andreas Ziegler
[email protected]
Inke R. König
[email protected]
Institute of Medical Biometry and Statistics
University at Lübeck, Germany

Using R/Sweave in everyday clinical practice

by Sven Garbade and Peter Burgard

The Sweave package by Friedrich Leisch is a powerful tool to combine R with LaTeX facilities for text formatting (R Development Core Team, 2005; Leisch, 2002a,b). Sweave allows the dynamic generation of statistical reports by using literate data analysis. A further application of Sweave is the package vignette (Leisch, 2003). An introduction to the abilities of Sweave can be found on Friedrich Leisch's Homepage (URL: http://www.ci.tuwien.ac.at/~leisch/) and in several publications, for example Leisch (2002b,c).

Some issues ago Damian Betebenner wrote an article on how to use R/Sweave in batch mode for automatic generation of prototype reports (Betebenner, 2005). His article pointed out very helpful and instructive notes on how to combine R and Sweave in batch processing and how to use control structures with Sweave.

Our article focuses on a combination of R/Sweave/TclTk in batch mode to produce statistical reports of a neuropsychological examination. The emphasis of this article is on the background, functional range and handling of our tool and its use in our hospital.

Background and Motivation

We routinely use the computer based neuropsychological test-battery CORTEX (COntextual Reaction Time EXercises) in our metabolic polyclinic for clinical diagnostics, monitoring and evaluation. A neuropsychological test-battery is a set of several psychological tests, each testing a specific function of the brain. The emphasis of our tests is on cognitive skills, namely visual short term memory, detection and recognition abilities. The results from the tests give an insight into a patient's cognitive state, and are useful for further diagnostics and treatment recommendations.

Currently six different tests are bundled in the battery. An investigation with all six tests lasts about one hour. Here we describe briefly the six tests; for a more detailed description of some of these tests and their theoretical background see Burgard et al. (1997).

The first test is a simple visuomotor reaction task requiring the testee to press the button of a response panel as fast as possible after a square appears on the monitor screen (Test Reaction). The test CPT (Continuous Performance Task) measures sustained attention by a binary choice task. The testee is required to press the response button of the dominant hand as quickly as possible when four dots appear, and the response button of the non-dominant hand when three or five dots appear. The LDT (Letter Detection Task) requires the testee to scan four letters displayed in a rectangular layout around the center of the computer screen for one, two or three target letters. The number of target letters determines task difficulty. In the VWM (Visual Working Memory) task the testee sees a 7 × 7 matrix filled with black and white squares for one second. After a retention interval of four seconds, the previous matrix and three other matrixes are shown, and the testee has to click on the previously shown matrix. Task difficulty is determined by similarity between the target matrix and the three comparative matrixes. The Dual Task contains three different tasks, and the testee has to deal with visual and auditory stimuli. In the first task, the testee has to press a response button as fast as possible when a particular tone among other tones is played; in the second task, large or small rectangles are displayed in random order and the testee has to press a button when the small rectangle appears. The third task is a combination of both; i. e., the testee has to press a response key when the small rectangle appears or the particular tone is played. The Tracking task is a test of the ability of eye-hand coordination. The testee has to pursue manually with the mouse pointer a randomly moving target on the computer screen. The target is presented among 10 other moving objects. To complicate this task, the movements of the mouse pointer are delayed. The results of each test are saved as integer variables in a tab-separated text file.

The neuropsychological tests and the analysis tool for generating statistical reports are implemented on a notebook computer running a Debian GNU/Linux system. The response panel is connected to the computer with a USB data acquisition board (USB-DUX). Using a notebook guarantees mobility, and the neuropsychological examinations can be arranged in almost every room in our clinic.

The motivation for the present project was the need for a tool which could be easily operated by psychologists, nurses, physicians, etc., to generate a statistical report based on the results of the neuropsychological tests immediately after testing.

Handling and functional range

A screen-shot of the main window is shown in Figure 1. A click on the Help button starts a PDF viewer showing an instruction manual, and the Quit button closes the application.

The process to generate a statistical report with our tool can be summarized in four steps:

1. The user selects with radio buttons the neuropsychological test for which a report should be generated.

2. The user browses the file system and chooses one or more raw data files associated with the selected neuropsychological test. Choosing data files from several testees as well as repeated measurements of the same testee is possible.

3. The user can add information about the testee and investigator not provided by the data files from the neuropsychological test-battery, for example, names, diagnosis, date of birth, etc.

4. When all raw data files are selected, the user generates the statistical report with a single mouse click.


Figure 1: Screen shot of the main window. The label on top of the Data frame indicates that no data files are selected.

The main window has four frames with several buttons. The following paragraphs describe briefly the functions and use of the program.

Data. The first frame, Data, integrates buttons for loading and exporting data files. On top of this group, information about how many data files were loaded by the user and for which test the report will be generated is displayed.

The data frame can be exported to a tab-separated text file with a click on the Export data button. The variables in the data frame can be inspected by a click on Variables.

A click on the Create Data button opens the dialog shown in Figure 2. The dialog provides buttons for browsing the filesystem, selecting, deleting and inspecting raw data files saved by the neuropsychological test-battery. Once a file is selected, its name and path are shown in the list.

The callback function for the Add button opens a file selection dialog to browse the file system. With Remove the user can delete a marked data file from the list. The File Info button calls a function to display the content of a marked data file.

The Load button calls a function which does several computations. First, duplicate entries are identified. Then the selected files shown in the list widget are merged into a data frame. Clicking on the Cancel button closes the dialog without applying any changes.

Test. The Test frame is a set of several radio buttons. The value of the activated radio button defines the Sweave file associated with a statistical report. The value is also used to create the window title and label in Figure 2.

Report. The Create Report button calls the function responsible for constituting the correct command for generating the statistical report. When the report is created successfully, a PDF viewer will be started showing the generated report.

Personal Data. Clicking the button in this frame opens a new window where the user can add information not provided by the raw data from the test-battery (see Figure 3). These data concern the name of the testee, diagnosis and date of birth, the values of medical parameters, the investigator's name, and possible remarks. Only information that is present is included in the statistical report.

Figure 2: The dialog for creating the data frame with some raw data files selected. The last file in the list is marked by a mouse click.


Figure 3: The dialog for adding additional information about testee and investigator.

The red label in the Personal Data frame in the main window (see Figure 1) changes to a green colored label "Personal data loaded" when any text is supplied. When all characters from the text entries are deleted, the label color changes back to red and the label text to "No personal data loaded."

Variables and statistics

The results of each neuropsychological test are saved as integer variables in a tab-separated text file, one row per trial and one column per variable. The data files are saved in separate directories, one directory for each neuropsychological test. The amount of saved data depends on the neuropsychological test. Examples of saved variables are the reaction time a testee needed to respond to the appearance of a stimulus, the number of stimuli on the screen the testee had to attend to, the mode of the response, e. g., wrong, correct or false alarm (that is, the testee reported that there was a target to respond to, but indeed there was none), or the distance in pixels between the mouse pointer and the target to pursue. Because all variables are saved as integers, the program recodes integer variables representing a grouping variable into a factor variable, e. g., 0/1 to wrong/correct response.

The raw data files from the test-battery provide no information about the ID (Identification Number) of the testee or the date of the psychological examination. But the file name does: the ID, date and number of the examination are extracted from the filename. The names of the data files have the pattern id_date_time.dat, e. g. 004_041110_1513.dat for testee 004 tested on 10 November 2004 at 15:13 o'clock (see marked file in Figure 2). A function splits the filename at the underscores and builds three factor variables coding the ID, date and time of examination and the session, that is, the number of the examination. The id part of the filename is only evaluated up to the first dot ".", e. g. 001 and 001.MSUD result in the same ID 001. This gives the investigator the possibility to include some more information into the file name without disturbing the computation of the ID, date and session factors.

The statistical reports are compiled using the basic LaTeX article class and have about three to five pages, depending on the neuropsychological test. Each report consists of a fixed part, that is a header and a basic analysis, and a second part covering a detailed statistical description of the variables acquired by the neuropsychological test.

The header contains a logo, the address of our hospital, a title, and the issue date of the report. Then variables from the text entries of the personal data dialog (see Figure 3) are included, when available. The subsequent basic analysis lists the number of testees, trials and dates of examination, and the names of the data files the statistics are based on.

The second part describes the results of the neuropsychological test by tables and graphics, mostly scatter plots and trellis graphics. The statistical reports were designed to give a first glance into the data, and therefore they have a primarily descriptive character. In most cases, for each variable of interest (e. g. the reaction time from the Reaction test) the mean, median and standard deviation across testees and number of examinations are computed and displayed graphically. So a report allows the comparison of the results from one or more testees at several dates of examination.

Some remarks about the implementation

In principle, our tool has two parts: an R/TclTk function containing all routines for the graphical user interface (GUI) and data manipulation, and several Sweave style files with R/LaTeX routines for the dynamic generation of statistical reports.
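Returning to the file-name convention described above: it can be parsed with a few lines of base R. This is our own sketch, not the tool's actual code:

## Extract ID, date and time from "id_date_time.dat"; the id part is
## evaluated only up to the first dot, so "001.MSUD" also yields "001".
fname <- "004_041110_1513.dat"
parts <- strsplit(sub("\\.dat$", "", fname), "_")[[1]]
id    <- sub("\\..*$", "", parts[1])  # "004"
date  <- parts[2]                     # "041110" -> 10 November 2004
time  <- parts[3]                     # "1513"   -> 15:13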


A simple flowchart of the tool is shown in Figure 4. The GUI was built with the tcltk package that comes with R. For a short but comprehensive introduction to the tcltk package see Dalgaard (2001), the Homepage of Wettenhall, and of course the R help pages.

The R/TclTk function holds the rank of a "master" function. It provides dialogs for the user and defines and calls R functions to generate a data frame suitable to load within an Sweave file. Further, it constitutes the correct command to generate the desired statistical report.

The code chunks in Sweave files have no access to variables within the running "master" function. To supply the Sweave files with all necessary data, the "master" function saves an R image which includes all variables and a data frame created from the selected raw data files. The variables are, for example, names, investigator, and diagnosis that come from the dialog in Figure 3, as well as some other variables that might be used to create a statistical report, e. g., the value of the activated radio button in the Test frame in the main window, and the path and names of the selected raw data files.

To generate the statistical report, two Sweave processes are started sequentially (see the sketch after Figure 4). Both Sweave processes load the previously saved R image. The first Sweave process evaluates the test-specific Sweave file; the second Sweave command processes the template Sweave file and includes the previously generated test-specific LaTeX file. The resulting LaTeX file is then processed by pdfLaTeX using the texi2dvi() function in package tools. When the report is generated successfully, a PDF reader showing the statistical report is started.

Because the program has to deal with different Sweave and R files, each report is built in a separate directory which includes an Sweave file for the test-specific routines and sometimes an additional file with R functions. The template Sweave file and the R image are stored in a shared directory. In this way, a more maintainable structure can be achieved.

Summary

Our purpose was to introduce a tool we developed with a combination of R, Sweave and the tcltk package and which we now use routinely in our everyday clinical practise. The sophisticated statistical analysis and graphic capabilities of R can be bundled with Sweave and TclTk to build an application allowing the user to generate a comprehensive report of a neuropsychological examination immediately after testing, even without statistical expertise.

Figure 4: Simple flowchart of the application.
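A minimal sketch of the two-pass report generation described above; the file names are hypothetical placeholders, and the .Rnw files themselves load the saved R image in a code chunk:

## Two Sweave passes followed by pdfLaTeX via tools::texi2dvi().
library(tools)
Sweave("test-specific.Rnw")           # pass 1: test-specific routines
Sweave("template.Rnw")                # pass 2: template includes pass 1 output
texi2dvi("template.tex", pdf = TRUE)  # produces template.pdf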

This software is currently in a testing phase and under active development. We are still adapting some of the statistical reports to the needs of the users, and maybe we will expand the number of tests of our neuropsychological test-battery and therewith the range of statistical reports our tool supports. But so far, we can point out that using R/Sweave/TclTk is much more powerful compared to data analysis of test results in the past. Some of the neuropsychological tests we briefly reported about formerly ran in an old version of MS-DOS, and subsequent data analysis was a bit hampered because exporting the raw data from an examination to a statistical analysis system was necessary. After reprogramming the neuropsychological tests on a modern computer system with the facilities of R, it was a relatively simple matter to combine the neuropsychological test-battery with a convenient tool for exploring the data.

One further improvement we are considering is an interface for connecting our tool to a data base system. This offers the opportunity to save and reload easily the data from an examination.

Acknowledgment

We would like to thank an anonymous reviewer who gave us helpful remarks on a previous version of this manuscript, as well as suggestions about how to improve our code.

Bibliography

D. Betebenner. Using control structures with Sweave. R News, 5(1):40–44, May 2005. URL http://CRAN.R-project.org/doc/Rnews/.

P. Burgard, F. Rey, A. Rupp, V. Abadie, and J. Rey. Neuropsychologic functions of early treated patients with phenylketonuria, on and off diet: results of a cross-national and cross-sectional study. Pediatric Research, 41:368–374, 1997.

P. Dalgaard. A primer on the R-Tcl/Tk package. R News, 1(3):27–31, September 2001. URL http://CRAN.R-project.org/doc/Rnews/.

F. Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. Presented at "Österreichische Statistiktage 2002", Vienna, Austria, October 23–25, 2002a.

F. Leisch. Sweave, part I: Mixing R and LaTeX. R News, 2(3):28–31, December 2002b. URL http://CRAN.R-project.org/doc/Rnews/.

F. Leisch. Sweave user manual. Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, Vienna, Austria, 2002c. URL http://www.ci.tuwien.ac.at/~leisch/Sweave.

F. Leisch. Sweave, part II: Package vignettes. R News, 3(2):21–24, October 2003. URL http://CRAN.R-project.org/doc/Rnews/.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org. ISBN 3-900051-07-0.

USB-DUX. URL http://www.linux-usb-daq.co.uk/.

J. Wettenhall. URL http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/.

Sven Garbade and Peter Burgard
University Hospital for Pediatric and Adolescent Medicine
Department of General Pediatrics
METABNET - Epidemiological Register
D-69120 Heidelberg, Germany
www.metabnet.de
[email protected]

changeLOS: An R-package for change in length of hospital stay based on the Aalen-Johansen estimator

by M. Wangler, J. Beyersmann, and M. Schumacher

Introduction

Length of hospital stay (LOS) is used to assess the utilization of hospital resources, the costs and the general impact of a disease. Change in LOS due to a complication is frequently used to assess the impact and the costs of a complication. Prominent examples include nosocomial infections and adverse drug events. In a data analysis, it is important to regard the timing of events: a complication can only have an effect on LOS once it has occurred.

Classical methods of two-group comparison, including matching, are commonly used, but inadequate. Multi-state models provide a suitable framework for analyzing change in LOS. For an overview on multi-state models cf. Andersen and Keiding (2002). Change in LOS has been considered in a multi-state framework by Schulgen and Schumacher (1996), Schulgen et al. (2000), Beyersmann et al. (2005) and Beyersmann (2005). We introduce an R package to adequately analyze change in LOS. The R package is based on multi-state model theory. The main features are:

• Describe multi-state models
• Compute and plot transition probabilities (Aalen-Johansen estimator)
• Compute and plot change in LOS

The estimation techniques used are fully nonparametric, allowing for a nonhomogeneous stochastic process (Glidden, 2002). However, at present, adjusting for covariates is not included. In both these aspects, the package changeLOS differs from the package msm.

Simple example: Competing Risks

A simple example for a multi-state model is the competing risks model, shown in Figure 1: state 0 (Alive) has transitions to state 1 (Discharge) and to state 2 (Death).

Figure 1: Multi-state model: competing risks.

With our R package changeLOS it is possible to:

• describe the state names and possible transitions (function msmodel(...)):

      0     1     2
0  TRUE  TRUE  TRUE
1 FALSE  TRUE FALSE
2 FALSE FALSE  TRUE

• compute the Aalen-Johansen estimator for the matrix of transition probabilities P(s, t) over a time interval (s, t] (function aj(...)); for the competing risks model these are the cumulative incidence functions P01(0, t) and P02(0, t)

• plot the temporal dynamics of the data illustrated by transition probabilities (function plot.aj(...))

Figure 2: Cumulative incidence functions P01(0, t) and P02(0, t).

Change in LOS

The model

The following model is used to analyze change in LOS due to a complication, or in general, an intermediate event: state 0 (no complication) has transitions to state 1 (complication), to state 2 (discharge) and to state 3 (death); state 1 has transitions to states 2 and 3.

Figure 3: Multi-state model: change in LOS.

The first step is to describe the multi-state model with the function msmodel(...):

## possible transitions
> tra <- matrix(F, 4, 4)
> diag(tra) <- T
> tra[1, ] <- T
> tra[2, 3:4] <- T

## describe the multi-state model
> my.model <- msmodel(c("0","1","2","3"),
+                     tra, cens.name="cens")

The data set

A real data set (Grundmann et al., 2005) comes with the package and can be loaded by

> data(los.data)
> los.data[1:4,]

and looks like the following example:

  adm.id j.01 j.02 j.03 j.12 j.13 cens
1     12  Inf    3  Inf  Inf  Inf  Inf
2     32  Inf  Inf   20  Inf  Inf  Inf
3     41   20  Inf  Inf   40  Inf  Inf
4    175    6  Inf  Inf  Inf  Inf   26

los.data has one row per patient and includes the variables j.01, j.02, j.03, j.12, j.13 and cens. The variables starting with 'j' hold the time when a respective transition (or 'jump') was observed. E.g., the third patient experiences a complication 20 days after admission to hospital and is discharged 40 days after admission. Entries for non-observed transitions have to be set to infinity, which is represented in R by Inf. In addition, if a patient is censored, i.e. still in hospital by the end of the study, the variable cens is set to the time when censoring occurred. If the patient is not censored, it is set to Inf.

Before further computations, the patient-orientated los.data data set must be transformed to a transition-orientated data set:

> my.observ <- prepare.los.data(x=los.data)

Then the data set looks like:

   id from   to time
1  12    0    2    3
2  32    0    3   20
3  41    0    1   20
4  41    1    2   40
5 175    0    1    6
6 175    1 cens   26

Now each row represents a transition. E.g. for the patient with the id 41 there are two entries: one entry for the transition from state 0 to state 1 and one entry for the transition from state 1 to state 2.

The Aalen-Johansen estimator

> my.trans <- trans(model=my.model,
+                   observ=my.observ)

computes the Aalen-Johansen estimator for the matrix of transition probabilities P(u−, u) for all observed transition times u. The entry (l, m) of the matrix denotes the estimated probability that state m has been reached by time u given state l has been occupied just before time u. The estimator for P(u−, u) is described by Andersen et al. (1993) at the bottom of p. 288. Non-diagonal entries (h, j) are given as the number of observed transitions from state h to state j, divided by the number of individuals in state h just prior to time u. The diagonal elements are chosen such that the sum of each row equals 1.

The Aalen-Johansen estimator for P(s, t) can then be computed as the matrix product of all matrices P(u−, u) for all transition times u in (s, t]:

> my.aj <- aj(my.trans, s=0, t=80)

The temporal dynamics of the data can be illustrated by the transition probabilities:

> plot(my.aj, c("0","0","0","0","1","1"),
+      c("0","1","2","3","2","3"))

Figure 4: Aalen-Johansen estimators for P00(s,t), P01(s,t), P02(s,t), P03(s,t), P12(s,t) and P13(s,t).

Expected change in LOS

The function clos estimates the expected change in length of stay (LOS) associated with an intermediate event (IE), using the Aalen-Johansen estimator for the matrix of transition probabilities.

> los <- clos(model=my.model, observ=my.observ)

los will be an object of class clos. This object is a list which contains, among others, the following arguments:

• cLOS: change in LOS
• e.given.1: estimates E(LOS | state 1 on day t) for each day
• e.given.0: estimates E(LOS | state 0 on day t) for each day
• weights: weights for the weighted average
• trans: matrices of transition probabilities P(u−, u) for all observed transition times u

The expected change in LOS is computed as the difference between e.given.1 and e.given.0 for each day, and the overall change in LOS cLOS as a weighted average of these quantities.
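The matrix-product construction described above is easy to mimic by hand; a sketch with two hypothetical one-step matrices whose rows each sum to one:

## Aalen-Johansen estimate of P(s, t) as the product of the one-step
## transition matrices P(u-, u) over the jump times in (s, t].
P1 <- matrix(c(0.9, 0.1,
               0.0, 1.0), nrow = 2, byrow = TRUE)
P2 <- matrix(c(0.8, 0.2,
               0.0, 1.0), nrow = 2, byrow = TRUE)
Reduce(`%*%`, list(P1, P2))  # rows still sum to one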


Figure 5 shows the graph obtained by plotting the change in LOS:

> plot(los, xlim=c(0,80), ylim.1=c(0,120))

Figure 5: Expected change in LOS due to a complication visualized. The top panel shows the weights by time t; the bottom panel shows the expected LOS given the intermediate event has or has not occurred by time t.

> summary(los)

The summary for objects of class clos displays the following values (absolute and in percent):

• Number of observed patients
• Number of patients being discharged
• Number of patients who die
• Number of patients being censored
• Number of patients who experienced the intermediate event (IE)
• Number of patients who experienced the IE being discharged
• Number of patients who experienced the IE and died
• Number of patients who experienced the IE and were censored

Progressive Disability Model

Change in LOS and the impact of an intermediate event on mortality can also be investigated in a so-called progressive disability model, shown in Figure 6.

Figure 6: Progressive disability model. State 0 (no complication) and state 1 (complication) each lead to a discharge and a death state: states 2 and 3 are discharge and death without the intermediate event, states 4 and 5 discharge and death after it.

Here, the first state carries information on whether or not an individual has passed through the intermediate state. This multi-state model can be described, and the Aalen-Johansen estimator for transition probabilities can be computed, with our R package changeLOS.

Summary

In order to get realistic information on change in LOS due to a complication, it is crucial to adequately account for the timing of events. This can be achieved as outlined by Beyersmann et al. (2005). However, while tools of standard survival analysis have become an integral part of most statistical software packages, the multi-state methods envisaged by Beyersmann et al. (2005) and Beyersmann (2005) are less accessible. We have introduced the R package changeLOS so that it is freely available, easy to use and allows people to visualize the situation at hand. changeLOS is an R package to:

• describe any multistate model
• compute and visualize transition probabilities (Aalen-Johansen estimator)
• compute and visualize change in LOS

Bibliography

P. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer Series in Statistics. New York, NY: Springer, 1993.

P. Andersen and N. Keiding. Multi-state models for event history analysis. Statistical Methods in Medical Research, 11(2):91–115, 2002.

J. Beyersmann. On change in length of stay associated with an intermediate event: estimation within multi-state models and large sample properties. PhD thesis, University of Freiburg, Faculty of Mathematics and Physics, Germany, 2005.

J. Beyersmann, P. Gastmeier, H. Grundmann, S. Bärwolff, C. Geffers, M. Behnke, H. Rüden, and M. Schumacher. Assessment of prolongation of intensive care unit stay due to nosocomial infections, using multistate models. Accepted for Infection Control and Hospital Epidemiology, 2005.

M. Schumacher. Assessment of prolongation of in- G. Schulgen, A. Kropec, I. Kappstein, F. Daschner, tensive care unit stay due to nosocomial infections, and M. Schumacher. Estimation of extra hospital using multistate models. Accepted for Infection Con- stay attributable to nosocomial infections: hetero- trol and Hospital Epidemiology, 2005. geneity and timing of events. Journal of Clinical Epi- demiology, 53:409–417, 2000. D. Glidden. Robust innference for event probabili- ties with non-Markov data. Biometrics, 58:361–368, 2002. H. Grundmann, S. Bärwolff, F. Schwab, A. Tami, G. Schulgen and M. Schumacher. Estimation of pro- M. Behnke, C. Geffers, E. Halle, U. Göbel, longation of hospital stay attributable to nosoco- R. Schiller, D. Jonas, I. Klare, K. Weist, W. Witte, mial infections. Lifetime Data Analysis, 2:219–240, K. Beck-Beilecke, M. Schumacher, H. Rüden, and 1996. P. Gastmeier. How many infections are caused by patient-to-patient transmission in intensive care Matthias Wangler units? Critical Care Medicine (in press), 2005. [email protected]

Balloon Plot

Graphical tool for displaying tabular data

by Nitin Jain and Gregory R. Warnes

Introduction

Numeric data is often summarized using rectangular tables. While these tables allow presentation of all of the relevant data, they do not lend themselves to rapid discovery of important patterns. The primary difficulty is that the visual impact of numeric values is not proportional to the scale of the numbers represented.

We have developed a new graphical tool, the "balloonplot", which augments the numeric values in tables with colored circles whose area is proportional to the size of the corresponding table entry. This visually highlights the prominent features of the data, while preserving the details conveyed by the numeric values themselves.

In this article, we describe the balloonplot, as implemented by the balloonplot function in the gplots package, and describe the features of our implementation. We then provide an example using the "Titanic" passenger survival data. We conclude with some observations on the balloonplot relative to the previously developed "mosaic plot".

Function description

The balloonplot function accepts a table (to be displayed as found) or lists of vectors for x (column category), y (row category) and z (data value) from which a table will be constructed. (A minimal call is sketched after the option list below.)

The balloonplot function creates a graphical table where each cell displays the appropriate numeric value plus a colored circle whose size reflects the relative magnitude of the corresponding component. The area of each circle is proportional to the frequency of the data. (The circles are scaled so that the circle for the largest value fills the available space in the cell.) As a consequence, the largest values in the table are "spotlighted" by the biggest circles, while smaller values are displayed with smaller circles. Of course, circles can only have positive radius, so the radius of circles for cells with negative values is set to zero. (A warning is issued when this "truncation" occurs.)

When labels are present on the table or provided to the function, the graphical table is appropriately labeled. In addition, options are provided to allow control of various visual features of the plot:

• rotation of the row and column headers
• balloon color and shape (globally or individually)
• number of displayed digits
• display of entries with zero values
• display of marginal totals
• display of cumulative histograms
• x- and y-axes group sorting
• formatting of row and column labels
• traditional graphics parameters (title, background, etc.)
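Here is the minimal call promised above: a small sketch passing a two-way table directly to balloonplot (the table and its labels are invented for illustration):

library(gplots)

# A sketch: balloonplot() applied directly to a small two-way table
counts <- as.table(matrix(c(1, 4, 9, 16), nrow=2,
                   dimnames=list(Row=c("a", "b"),
                                 Col=c("x", "y"))))
balloonplot(counts)  # one circle per cell, area proportional to the count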

Example using the Titanic data set

For illustration purposes, we use the Titanic data set from the datasets package. Titanic provides survival status for passengers on the tragic maiden voyage of the ocean liner "Titanic", summarized according to economic status (class), sex, and age. Typically, the number of surviving passengers is shown in tabular form, as in Figure 1.

Figure 1: Tabular representation of survived population by gender and age.

Figure 1 was created by calling balloonplot with the balloon color set to match the background color and most options disabled. Note that one must actively focus on the individual cell values in order to see any pattern in the data.

Now, we redraw the table with light-blue circles ('balloons') superimposed over the numerical values (Figure 2). This is accomplished using the code:

library(gplots)
data(Titanic)

# Convert to 1 entry per row format
dframe <- as.data.frame(Titanic)

# Select only surviving passengers
survived <- dframe[dframe$Survived=="Yes",]

attach(survived)
balloonplot(x=Class,
            y=list(Age, Sex),
            z=Freq,
            sort=TRUE,
            show.zeros=TRUE,
            cum.margins=FALSE,
            main="BalloonPlot : Surviving passengers")
title(main=list("Circle area is proportional to\nnumber of passengers",
                cex=0.9),
      line=0.5)
detach(survived)

Figure 2: Balloon plot of surviving individuals by class, gender and age.

With the addition of the blue "spotlights", whose area is proportional to the magnitude of the data value, it is easy to see that only adult females and adult male crew members survived in large numbers. Also note the addition of row and column marginal totals.

Of course, the number of surviving passengers is only half of the story. We could create a similar plot showing the number of passengers who did not survive. Alternatively, we can simply add survival status as another variable to the display, setting the color of the circles to green for passengers who survived, and magenta for those who did not (Figure 3). This conveys considerably more information than Figures 1 and 2 without substantial loss of clarity. The large magenta circles make it clear that most passengers did not survive.

To further improve the display, we add a visual representation of the row and column sums (Figure 4). This is accomplished using light grey bars behind the row and column headers. The length of each bar is proportional to the corresponding sum, allowing rapid visual ascertainment of their relative sizes. We have also added appropriately colored markers adjacent to the headers under "Survived" to emphasize the meaning of each color.


Figure 3: Balloon plot of Titanic passengers by gender, age and class. Green circles represent passengers who survived and magenta circles represent the passengers who did not survive.

Figure 4: Balloon plot of all the passengers of Titanic, stratified by survival, age, sex and class.

attach(dframe)
colors <- ifelse(Survived=="Yes", "green",
                 "magenta")

balloonplot(x=Class,
            y=list(Survived, Age, Sex),
            z=Freq,
            sort=FALSE,
            dotcol=colors,
            show.zeros=TRUE,
            main="BalloonPlot : Passenger Class by Survival, Age and Sex")
points(x=1, y=8, pch=20, col="magenta")
points(x=1, y=4, pch=20, col="green")
title(main=list("Circle area is proportional to number of passengers",
                cex=0.9),
      line=0.5)
detach(dframe)

It is now easy to see several facts:

• A surprisingly large fraction (885/2201) of passengers were crew members
• Most passengers and crew were adult males
• Most adult males perished
• Most women survived, except in 3rd class
• Only 3rd class children perished

Perhaps the most striking fact is that survival is lowest among 3rd class passengers for all age and gender groups. It turns out that there is a well known reason for this difference in survival. Passengers in 1st and 2nd class, as well as crew members, had better access to the lifeboats. Since there were too few lifeboats for the number of passengers and crew, most women and children among the 1st class, 2nd class and crew found space in a lifeboat, while many of the later-arriving 3rd class women and children were too late: the lifeboats had already been filled and had moved away from the quickly sinking ship.

Discussion

Our goals in developing the balloonplot were twofold: first, to improve the ability of viewers to quickly perceive trends, and second, to minimize the need for viewers to learn new idioms. With these goals in mind, we have restricted ourselves to simple modifications of the standard tabular display.

Other researchers have pursued more general approaches to the visual display of tabular data. (For a review of that work, see Hartigan and Kleiner (1981) or Friendly (1992).) One of the most popular methods developed by these researchers is the mosaic plot (Snee, 1974). We have previously experimented with mosaic plots. Unfortunately, we found that they do not lend themselves to rapid ascertainment of trends, particularly by untrained users. Even trained users find that they must pay careful attention in order to decode the visual information presented by the mosaic plot. In contrast, balloonplots lend themselves to very quick perception of important trends, even for

users who have never encountered them before. While there are, of course, tasks for which the mosaic plot is preferable, we feel that the balloonplot serves admirably to allow high-level patterns to be quickly perceived by untrained users.

Conclusion

Using the well worn Titanic data, we have shown how balloonplots help to convey important aspects of tabular data, without obscuring the exact numeric values. We hope that this new approach to visualizing tabular data will assist other statisticians in more effectively understanding and presenting tabular data.

We wish to thank Ramon Alonso-Allende ([email protected]) for the discussion on R-help which led to the development of balloonplot, as well as for the code for displaying the row and column sums.

Bibliography

M. Friendly. Graphical methods for categorical data. Proceedings of SAS SUGI 17 Conference, 1992.

J. A. Hartigan and B. Kleiner. Mosaics for contingency tables. Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. New York: Springer-Verlag, 1981.

R. D. Snee. Graphical display of two-way contingency tables. The American Statistician, 28:9–12, 1974.

Gregory R. Warnes, Pfizer Inc., USA
[email protected]

Nitin Jain, Smith Hanley Inc, USA
[email protected]

Drawing pedigree diagrams with R and graphviz

by Jing Hua Zhao

Human genetic studies often involve data collected from families, and graphical display of such data is useful. The wide interest in such data over the years has led to many software packages, both commercial and non-commercial. A recent account of these packages is available (Dudbridge et al., 2004), and a very flexible package Madeline (http://eyegene.ophthy.med.umich.edu/madeline/index.html) is now released under the GNU General Public License. A comprehensive list of many packages, including the package LINKAGE (Terwilliger and Ott, 1994) for human parametric linkage analysis and GAS (Genetic Analysis System, http://users.ox.ac.uk/~ayoung/gas.html) for some other analyses, can be seen at the linkage server at Rockefeller University (http://linkage.rockefeller.edu).

Here I describe two functions in R that are able to draw pedigree diagrams: the first being plot.pedigree in kinship, developed for S-PLUS by Terry Therneau and Beth Atkinson and ported to R by the author, and the second pedtodot in gap, based on David Duffy's gawk script (http://www2.qimr.edu.au/davidD/Course/pedtodot), which requires graphviz (http://www.graphviz.org). Both are easy to use and can draw many pedigree diagrams quickly to a single file, and can therefore serve as alternatives to some programs that only offer interactive use.

Representation of pedigrees

The key element in storing pedigrees in a database is the so-called family trio, containing an individual's own, father's and mother's IDs. Founders, namely individuals whose parents are not in the pedigree, have these set to zero or missing. An individual's gender (e.g. 1=male, 2=female) is included as auxiliary information, together with a pedigree ID in order to maintain multiple pedigrees in a single database, each record of which indicates a node in the pedigree graph.

For instance, information for pedigree number 10081 in Genetic Analysis Workshop 14 (GAW14, http://www.gaworkshop.org) is shown as follows.

pid   id father mother sex affected
10081  1      2      3   2        2
10081  2      0      0   1        1
10081  3      0      0   2        2
10081  4      2      3   2        2
10081  5      2      3   2        1
10081  6      2      3   1        1
10081  7      2      3   2        1
10081  8      0      0   1        1
10081  9      8      4   1        1
10081 10      0      0   2        1
10081 11      2     10   2        1
10081 12      2     10   2        2


10081 13      0      0   1        1
10081 14     13     11   1        1
10081 15      0      0   1        1
10081 16     15     12   2        1

Here all IDs are integers with the obvious meanings just described, and the variable affected indicates whether an individual is alcoholic (1=nonalcoholic, 2=alcoholic) according to the DSMIIIR and Feighner definition ALDX1 in the dataset.

In human genetic linkage studies, this is also called pre-makeped format, since these IDs can also be string variables, e.g. individuals' names, and a utility program makeped in LINKAGE can be used to generate the serial integer IDs and perform simple checks on errors in family structure(s).

Suppose this is kept in a text file called 10081.pre; we use

pre <- read.table("10081.pre", header=TRUE)

to read it into object pre.

The pedigree-drawing algorithm

Typically, in a pedigree diagram males and females are shown as squares and circles, respectively. Spouses can form marriage nodes from which nodes for children are derived. It is also customary to draw pedigree diagrams top down, so that children at a given generation could have children of their own in the next generation. This implies that a conceptually simple algorithm for pedigree drawing would involve sorting members of a pedigree by generation, aligning members of the same generation horizontally and those at different generations vertically. In other words, the family is drawn as a directed graph with members as nodes, ordered by their generation numbers. The algorithm could be more involved if there are marriage loops in the family, i.e. overlapping generations, or if the pedigree is too large to fit in a single page. Interested readers can find more details on the algorithmic aspects of pedigree-drawing in Tores and Barillot (2001).

Fortunately, there is software publicly available that implements this algorithm. Among these the most notable is graphviz (http://www.graphviz.org), consisting of the programs dot, dotty, neato and lneato. In the following section, two implementations of the pedigree-drawing algorithm, with or without use of graphviz, will be described.

Drawing pedigree diagrams with R and graphviz

via plot.pedigree in kinship

Package kinship was developed for linear mixed and mixed-effects Cox models of family data, but it has function plot.pedigree for drawing pedigree diagrams. As an S3 function for class 'pedigree', it can be used as plot if supplied with an argument of class 'pedigree'.

library(kinship)
attach(pre)
par(xpd=TRUE)
ped <- pedigree(id,father,mother,sex,affected)
plot(ped)

This gives Figure 1. We can use R devices such as postscript to keep the diagram in an external file, e.g., postscript("10081.ps"); plot(ped); dev.off(). Either postscript or pdf format can hold more than one pedigree diagram.

Figure 1: Pedigree 10081 by kinship.

via pedtodot in gap

The diagram produced by plot.pedigree in kinship is a still image, so nodes in the pedigree graph cannot be pulled and dragged; this might not be trivial to implement. However, we can use graphviz to achieve this. A nice feature of the package is its ability to interpret the dot language, which is in ASCII format and allows for text-editing. Keeping this in mind, we can simply generate a dot file for given pedigree(s) using dot syntax.

library(gap)
pedtodot(pre)

By default, this generates 10081.dot, which can be used by dot, e.g. dot -Tps -o figure2.ps 10081.dot, giving Figure 2.
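To give a flavour of the dot syntax involved, the following sketch writes a hand-rolled dot description of a single trio, here individual 1 with father 2 and mother 3 from the table above (illustrative only; not the exact file pedtodot produces):

# father 2 (box) and mother 3 (circle) both point to child 1
cat('digraph trio {
  "2" [shape=box];
  "3" [shape=circle];
  "1" [shape=circle];
  "2" -> "1";
  "3" -> "1";
}
', file="trio.dot")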


Figure 2: Pedigree 10081 by dot.

Alternatively, we use

library(gap)
pedtodot(pre,dir="forward")

Besides dot, we can also use neato -Tps -o figure3.ps 10081.dot, giving a more liberal graph (Figure 3). Its unusual looks make it appealing for exposing the peeling algorithm in the likelihood calculation of pedigrees.

Figure 3: Pedigree 10081 by neato.

A single file containing all pedigree diagrams can be generated through the use of the R sink command and screen output, using the sink=FALSE option in pedtodot. For example, given that all the pre-makeped pedigrees are contained in the object gaw14ped, we can use the following commands

sink("gaw14.dot")
pedtodot(gaw14ped,sink=FALSE)
sink()

to generate a complete list of pedigree diagrams, available from the web site http://www.mrc-epid.cam.ac.uk/~jinghua.zhao/r-progs.htm.

Summary and further remarks

Further information about the two functions is available from the packages themselves. Note that plot.pedigree in kinship does not require a pedigree ID to be specified, while pedtodot in gap does. Unlike plot.pedigree, pedtodot requires graphviz to visualize graphics, but its output can be edited with dotty and be printed out in multiple pages when a pedigree diagram is too big to fit in a single page. Both can produce a set of pedigree diagrams in a single file.

Although makeped is a utility program in LINKAGE, this function is also available in package gap. If the pedigree file is in post-makeped format, then the option makeped=TRUE can be used. However, pedtodot can also use string IDs, or a file in the so-called GAS format in which gender can take values 'm', 'f', etc.

Most pedigrees in current studies have moderate sizes; therefore sparse two-dimensional arrays are used to keep track of marriages and the children from them, which are in turn used in generating the dot script. This is less efficient than the original gawk script in that the latter uses individual IDs directly to index the marriage arrays, leading to shorter code and better use of memory. In addition, further improvement is possible by introducing other features available in the dot language.

It is notable that R package sem has function path.diagram to generate dot files, and the Bioconductor (http://www.bioconductor.org) package Rgraphviz also uses graphviz. If more such packages appear, it is desirable to be familiar with the dot language and/or have support for it in R. Although as yet human genetic linkage and genome-wide association analysis is not widely conducted with R, this might change in the near future, as has been demonstrated by the great success of the Bioconductor project. I believe the two R functions described in this note will be very useful to researchers in their genetic data analysis.

Bibliography

F. Dudbridge, T. Carver, and G. Williams. Pelican: pedigree editor for linkage computer analysis. Bioinformatics, 20(14):2327–2328, 2004.

F. Tores and E. Barillot. The art of pedigree drawing: algorithmic aspects. Bioinformatics, 17(2):174–179, 2001.


J. Terwilliger and J. Ott. Handbook of Human Genetic Linkage. The Johns Hopkins University Press, Baltimore, 1994.

Jing Hua Zhao
MRC Epidemiology Unit
[email protected]

Non-Standard Fonts in PostScript and PDF Graphics

by Paul Murrell and Brian Ripley

Introduction

By default, all of the text produced by R graphics devices uses a sans-serif font family (e.g., Helvetica), and in a PostScript or PDF plot only text in Western European languages has been handled until recently. This article describes several ways in which the range of font families has been extended for R 2.3.0.

It has been possible to change the overall font family that is used to draw text within a plot (Murrell, 2004), but the choice of font family for PDF and PostScript devices was largely restricted to the set of Adobe Core 14 fonts: Helvetica (4 faces), Times (4 faces), Courier (4 faces), Symbol, and ZapfDingbats (see Table 1). As from R 2.3.0 much more choice is available.

The examples in this article are written following the conventions needed for a modern Unix-alike operating system such as Linux, but will work with minor changes (for example to the locale names) on most other R platforms including Windows.

A little terminology will be helpful. A character is an abstract concept, and a glyph is a visual representation of a character. Characters can have more than one representation, for example crossed and uncrossed sevens, and the variants on Greek letters much loved by mathematicians (see ?plotmath). This matters as East Asian languages may write the same character in different ways.¹

Minor conveniences

The default font family on a PDF or PostScript device can be controlled using the family argument. For example, the expression pdf(family="Times") opens a PDF device with the default font family set to Times (a serif font). This is the simplest way to select a font family, if the same font (possibly in different faces, e.g. bold) is to be used for all of the text in a plot.

Prior to R 2.3.0, only a fixed set of font families could be specified via the family argument (mostly related to the Adobe Core 14 fonts), but it is now possible to specify any valid font family as the default. For example, the expression pdf(family="mono") opens a PDF device with the default font family set to be a mono-spaced font (by default Courier on PDF and PostScript devices).

The section on "Font databases" later in this article will clarify what constitutes a "valid" font family.

Changing font family in-line

While it has been possible for some time to change the font family within a plot (e.g., have the title in Times and the remaining labels in Helvetica), in the traditional graphics system this change could only be made via the par() function.

It is now also possible to specify the family argument in low-level traditional graphics functions. For example, if the default font family was sans-serif, adding a label to a plot with a serif font used to require code of the form . . .

> op <- par(family="serif")
> text("A label")
> par(op) # restore the setting

. . . but now the same thing can be achieved by the following code.

> text("A label", family="serif")

Internationalization

As Martin Maechler likes to say, R has 'for ever' supported ISO Latin 1, the character set of the major² Western European languages. Further internationalization features were introduced into R in version 2.1.0 (Ripley, 2005), which allowed users in non-Western-European locales to use variable names and produce output in their native language.

There was good support for graphical devices on Windows, and as from R 2.3.0 on NT-based versions of Windows it is even possible to change to another language and produce output in that language.

¹http://en.wikipedia.org/wiki/Han_unification
²The definition is rather circular, as those are the languages written in Latin-1. Icelandic is the most prominent exception.


Table 1: Predefined font families for the PDF and PostScript devices which cover most of the Adobe Core 14 fonts. The Adobe Core fonts are usually available with Adobe (Acrobat) Reader and in PostScript printers and viewers. The ZapfDingbats font is always included in PDF files produced by R, being used to draw small circles.

  "Helvetica"              "Times"            "Courier"
  Helvetica                Times              Courier
  Helvetica-Bold           Times-Bold         Courier-Bold
  Helvetica-Oblique        Times-Italic       Courier-Oblique
  Helvetica-BoldOblique    Times-BoldItalic   Courier-BoldOblique
  Symbol                   Symbol             Symbol
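The family names in the header row of Table 1 can be used directly, as with the family argument described earlier; a small sketch putting all three on one PDF page:

> pdf("core14.pdf", width=3, height=1.5)
> library(grid)
> grid.text("Helvetica", y=0.75, gp=gpar(fontfamily="Helvetica"))
> grid.text("Times",     y=0.50, gp=gpar(fontfamily="Times"))
> grid.text("Courier",   y=0.25, gp=gpar(fontfamily="Courier"))
> dev.off()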

The support for the X11() graphics device was as good as the X server could provide, often limited by the availability of glyphs unless extra fonts were installed and selected.

These features did not at first include graphical output on PDF or PostScript devices, but this has been much improved in R 2.3.0, with support for many Eastern European and Asian locales.

European and Cyrillic fonts

R now automatically selects an appropriate encoding for PDF and PostScript output based on your locale. In multi-byte locales such as those using UTF-8, an appropriate single-byte encoding is chosen, so now the PDF and PostScript devices allow encodings other than ISO Latin 1.

For example, consider the character ń, which is needed in a Polish locale to write 'hello'. This character is part of the ISO Latin 2 character set. For demonstration purposes, we will temporarily switch to a Polish locale using the following call:

> Sys.setlocale(category="LC_CTYPE",
    locale="pl_PL.utf8")

Now when we start a PDF or PostScript device, it automatically selects an ISO Latin 2 encoding so that the appropriate characters are available.

To show an example for users who do not have a Polish keyboard, we will use an explicit Unicode escape sequence to enter the character we want. This will only work in UTF-8 locales in R 2.3.0.³

> library(grid)
> pdf("polish.pdf", width=3, height=0.8)
> grid.text(" Dzie\u0144 dobry
    is 'hello' in Polish ")
> dev.off()

Dzień dobry
is 'hello' in Polish

On systems without access to UTF-8 locales, we can achieve the same thing by switching to a Latin-2 locale: a Windows⁴ user could write

> Sys.setlocale(category="LC_CTYPE",
    locale="polish")
> grid.text(" Dzie\xf1 dobry
    is 'hello' in Polish ")

It is still possible to explicitly specify an encoding when starting the pdf device if the automated selection is not correct on your platform.

This process ought to work well whenever the language you want can be written in a single-byte locale: this includes all the Western and Eastern European, Cyrillic and Greek languages written in ISO 8859-1 (Western), -2 (Eastern), -5 (Cyrillic), -7 (Greek), -13 (Baltic rim), -15 (Western with Euro) and KOI8-R, KOI8-U and the equivalent Microsoft codepages.

Installed fonts

By specifying the correct encoding you can create a PDF or PostScript file that contains the correct description of the characters you want, but if you actually want to view or print that text, you also need to have an appropriate font installed on your system that contains the relevant characters.

In the above example we were fortunate because the default Helvetica font does include the glyph ń. In general, you may have to specify an appropriate font (and ensure that it is installed on your system!). The section "Font databases" will describe the issue of specifying new fonts in detail.

Even if the glyph is known to R, it may not be known to your viewer. For example, to view non-Western European languages with Acrobat Reader you may need to install a Central European font pack.⁵

³Unicode escape sequences will work in all locales in R 2.3.1 on platforms which support multi-byte character sets. Also, as there are at least three incompatible single-byte encodings for Russian, we do not attempt to show how this might be done other than on a UTF-8 system.
⁴NT4, 2000 or later, or (untested) a Polish-language version of 95/98/ME.
⁵http://www.adobe.com/products/acrobat/acrrasianfontpack.html


A word of warning: the Adobe Core fonts are often replaced by clones. This happens both in PostScript viewers (ghostscript uses the equivalent URW fonts) and PDF viewers (recent versions of Acrobat Reader use multiple-master fonts), as well as in some printers. So it can be hard to tell if you have exceeded the set of glyphs that can reasonably be assumed to be always available. If you use anything beyond ISO Latin 1, it is safest to embed the fonts (see below). However, for local use where up-to-date URW fonts can be assumed, the coverage is much wider, including Eastern European and Cyrillic glyphs.

CJK fonts

Some support has also been added for locales with very large character sets, such as the Chinese, Japanese, and Korean (CJK) ideographic languages. On PDF and PostScript devices, it is now possible to select from a small set of font families covering many of the glyphs used in writing those languages.

There are tens of thousands of (originally) Chinese characters which are common to Chinese (where they are called hanzi), Japanese (kanji), and Korean (hanja). However, different glyphs may be used, for example in Simplified Chinese (used in most of PR China and in Singapore) and Traditional Chinese (used in Taiwan and some of PR China).

For example, the following code selects one of the CJK font families to display 'hello' in (Traditional) Chinese.

> pdf("chinese.pdf", width=3, height=1)
> grid.text("\u4F60\u597D", y=2/3,
    gp=gpar(fontfamily="CNS1"))
> grid.text(
    "is 'hello' in (Traditional) Chinese",
    y=1/3)
> dev.off()

你好
is 'hello' in (Traditional) Chinese

Note that we select the CJK font family only to display Chinese: although these font families can display English, they do so poorly. You should switch back to a more standard font such as Helvetica for producing Latin characters. The appropriate CJK fonts include non-Chinese glyphs, for example Hiragana and Katakana for use in Japanese and Hangul for use in Korean.

Note also that in this example we did not need to select an appropriate UTF-8 locale: it will in fact work in any UTF-8 locale, and also in the older multi-byte locales used on Unix-alikes (such as EUC-JP) and in the double-byte locales used on Windows for the CJK languages.

Again, R does not check whether the fonts are properly installed on your system. For example, in order to view the file chinese.pdf with Adobe Reader, you will have to download the Traditional Chinese font pack.⁶

The CJK font families available in R have been chosen because they are widely (or freely) available for common print or viewing technology. Table 2 lists the font families and where they can be obtained. The CJK fonts can be used in mathematical formulas (see ?plotmath), but the metric information on the sizes of characters is not as good as for Type 1 fonts, so the result may not be aesthetically pleasing.

Font databases

In the previous two sections we mentioned that, in addition to having the correct encoding, you may also have to specify an appropriate font in order to get all of the characters that you need. Here is an example where that becomes necessary: we want to write 'hello' in Russian, and the Cyrillic characters are not included in Helvetica, so we need to specify a font that does contain Cyrillic characters. (A simpler workaround would be to use the URWHelvetica family, which has included Cyrillic glyphs since 2002, but the versions installed with ghostscript on Windows are from 2000. So we need to illustrate a more general procedure.)

Again, we will pretend to be Russian temporarily by setting the locale for the R session:

> Sys.setlocale(category="LC_CTYPE", locale="ru_RU.utf8")

Now the single-byte encoding is automatically set to ISO 8859-5 (Cyrillic), but we must also specify a font that contains Cyrillic characters. We will generate a new one using the function Type1Font(), specifying the font name and a path to metric (.afm) files for the font.

The font used in this example is from the PSCyr font pack⁷. For this demonstration, the files have only been downloaded to a local directory (not formally installed) to keep the path manageable⁸.

R News ISSN 1609-3631 Vol. 6/2, May 2006 44

Table 2: The CJK fonts available for use in R. These were collected from examples found on various systems, and suggestions of Ei-ji Nakama. Font family name in the font database PostScript font name PDF font name

"Japan1" HeiseiKakuGo-W5: a Linotype KozMinPro-Regular-Acro: from the Japanese printer font. Adobe Reader 7.0 Japanese Font Pack

"Japan1HeiMin" HeiseiMin-W3: a Linotype Japanese HeiseiMin-W3-Acro: a version of printer font. the PostScript font, from the Acrobat Reader 4.0 Japanese Font Pack.

"Japan1GothicBBB" GothicBBB-Medium: a Japanese- GothicBBB-Medium market PostScript printer font.

"Japan1Ryumin" Ryumin-Light: a Japanese-market Ryumin-Light PostScript printer font.

"Korea1" Baekmuk-Batang: a True- HYSMyeongJoStd-Medium-Acro: Type font found on some from the Adobe Reader 7.0 Korean Linux systems, and from ftp: Font Pack //ftp.mizi.com/pub/baekmuk/.

"Korea1deb" Batang-Regular: a TrueType font HYGothic-Medium-Acro: from the found on some Linux systems, prob- Adobe Reader 4.0 Korean Font Pack. ably the same as Baekmuk-Batang.

"CNS1" (Traditional Chinese) MOESung-Regular: from Ken MSungStd-Light-Acro: from the Lunde’s CJKV resources; can be Adobe Reader 7.0 Traditional Chi- installed for use with ghostscript nese Font Pack.

"GB1" (Simplified Chinese) BousungEG-Light-GB: a True- STSong-Light-Acro: from the Type font found on some Adobe Reader 7.0 Simplified Chi- Linux systems, and from nese Font Pack. ftp://ftp.gnu.org/pub/non-gnu/ chinese-fonts-truetype/.


The following command creates a Type 1 font object.

> RU <- Type1Font(
    "TimesNewRomanPSMT-Regular",
    c("fonts/afm/public/pscyr/times.afm",
      "fonts/afm/public/pscyr/timesbd.afm",
      "fonts/afm/public/pscyr/timesi.afm",
      "fonts/afm/public/pscyr/timesbi.afm"))

This next command registers the Type 1 font in the font database for PDF devices. The database records all valid font family names for use with PDF devices. After this, for the duration of the current R session, the font family "RU" can be used as a font family name in graphics functions.

> pdfFonts(RU=RU)

Now we can use this font family like any other to produce Cyrillic characters (we do not assume a Russian keyboard, so we fake typing the Cyrillic keys by specifying the appropriate Unicode values⁹).

> pdf("russian.pdf", width=3, height=0.8)
> grid.text("\u0417\u0434\u0440
  \u0430\u0432\u0441\u0442\u0432
  \u0443\u0439\u0442\u0435
  is 'hello' in Russian (Cyrillic)",
  gp=gpar(fontfamily="RU"))
> dev.off()

Здравствуйте
is 'hello' in Russian (Cyrillic)

A separate function, postscriptFonts(), is provided for querying and modifying the font database for PostScript devices, and there is a CIDFont() function for defining new CJK (CID-keyed) fonts¹⁰.
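For PostScript output, the registration above would be mirrored by something like the following sketch, using postscriptFonts() in the same style as pdfFonts():

> postscriptFonts(RU=RU)     # register for PostScript devices
> names(postscriptFonts())   # list the known font families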

Embedding fonts

The Russian example above is not quite complete yet. If you try to view it with Adobe Reader, it will almost certainly not look as we intended¹¹. The problem is that Adobe Reader knows nothing about the PSCyr font, so it substitutes a font it knows about, trying to make it look as close as possible to the font we've asked for.

By defining a Type 1 font object, we have given R enough information about the Cyrillic font to create a PDF or PostScript file, but if we want to view that file or print it, we may also have to install the font for the viewing software or the printer. Furthermore, if we want to share the file with someone else (e.g., put it on the web), then that someone else may have to install the font.

The best solution to all of these problems is to embed (enough of) the font within the file itself. This means that the font becomes part of the document, and viewing software or printers do not need any additional information to display or print the text. (Note that licence restrictions on the fonts may limit or prohibit this approach.)

The new R function embedFonts() performs this task (by calling ghostscript¹²). Here is how to embed the PSCyr fonts for the russian.pdf plot; the file russianembed.pdf is the result. Because we have the fonts in a local directory, we must specify a font path so that ghostscript knows where to find the font.

> embedFonts("russian.pdf",
    outfile="russianembed.pdf",
    fontpaths="fonts/type1/public/pscyr")

For most of the examples in this article we have embedded the fonts using this function, which is why you can view all of these different character sets without having to download any extra fonts.

A Computer Modern fonts example

Computer Modern fonts are the fonts designed by Donald Knuth for the TeX system (Knuth, 1984) and are the default fonts used in LaTeX (Lamport, 1994).

For a long time, it has been possible to make use of Computer Modern fonts to produce PostScript plots specifically for inclusion in LaTeX documents via a special "ComputerModern" font family (see ?postscript). However, this mechanism does have some limitations: some characters are not available (e.g., less-than and greater-than signs, curly braces, and a number of mathematical symbols) and, more importantly, it does not work for PDF output. A simple way to avoid these restrictions is to use a different set of Computer Modern fonts that contain a more complete set of characters.

⁹This series of Unicode characters has been broken across lines for this article due to typesetting limitations; do not reproduce these line breaks if you want to try this code yourself.
¹⁰Defining a new CID-keyed font for use with PDF devices requires in-depth knowledge of Adobe font technology and the font itself, so is not recommended except perhaps for expert users.
¹¹What this actually looks like for you will depend on the fonts you have installed for Adobe Reader; however, it is unlikely that you will have the PSCyr fonts, so it should at least look different to the "correct" output. On a vanilla Linux system, the Cyrillic characters are all missing, and the Latin text is in a serif font.
¹²Ghostscript should be installed already on most Linux systems and is available as a standard download and install for Windows (http://www.cs.wisc.edu/~ghost/). You may need to tell R where ghostscript is installed via the environment variable R_GSCMD.

Gaining access to a wider set of Computer Modern characters can be achieved simply by using a version of the Computer Modern fonts that has been reordered and regrouped for different encoding schemes. We will use two such fonts in this example.

The first set of fonts is the CM-LGC font pack¹³, which provides a Type 1 version of most of the Computer Modern fonts with an ISO Latin 1 encoding (e.g., a less-than sign will produce a less-than sign, not an upside-down exclamation mark as in TeX). The second font is a special Computer Modern symbol font¹⁴ developed by Paul Murrell which covers almost all of the Adobe Symbol encoding (for producing mathematical formulas).

The following code demonstrates the use of the CM-LGC fonts and the special Computer Modern symbol font to produce a PDF format "plot" that includes mathematical symbols. The fonts have again been unpacked in a local directory.

First of all, we define a new Type 1 Font object for these fonts and register it with the PDF font database:

> CM <- Type1Font("CM",
    c(paste("cm-lgc/fonts/afm/public/cm-lgc/",
            c("fcmr8a.afm", "fcmb8a.afm",
              "fcmri8a.afm", "fcmbi8a.afm"),
            sep=""),
      "cmsyase/cmsyase.afm"))
> pdfFonts(CM=CM)

Now we can use this font family in a PDF plot:

pdf("cm.pdf", width=3, height=3,
    family="CM")
# ... much drawing code omitted ...
dev.off()

Finally, we can embed these fonts in the document so that we can embed the plot in other documents (such as this article) for anyone to view or print.

> embedFonts("cm.pdf",
    outfile="cmembed.pdf",
    fontpaths=
      c("cm-lgc/fonts/type1/public/cm-lgc",
        "cmsyase"))

The final result is shown in Figure 1.

Figure 1: A "plot" that demonstrates the use of Computer Modern fonts in PDF output ("Computer Modern Font Comparison", showing samples of the cm-lgc, cmr, cmsyase, and cmsy fonts).

Summary

With R 2.3.0, it is now easier to produce text in PDF and PostScript plots using character sets other than English. There is built-in support for a wider range of encodings and fonts, and there is improved support for making use of non-standard fonts in plots. There is also now a utility for embedding fonts so that plots can be more easily included in other documents and shared across systems.

We believe that it is now possible for users to produce high-quality graphical output in all human languages with a sizeable number of R users. Anyone whose native language is not covered is invited to contribute suitable encoding files and locale mappings so our coverage can be extended.

Acknowledgements

Ei-ji Nakama provided the initial patch of the C code for non-ISO Latin 1 encodings in multi-byte locales, and for CJK (CID-keyed) font support, and he and colleagues have patiently explained Japanese typography to us.

Bibliography

D. Knuth. The TeXbook. Addison-Wesley, Reading, MA, 1984.

L. Lamport. LaTeX: a document preparation system. Addison-Wesley, Reading, MA, 1994.

P. Murrell. Fonts, lines, and transparency in R graphics. R News, 4(2):5–9, September 2004. URL http://CRAN.R-project.org/doc/Rnews/.

B. D. Ripley. Internationalization features of R 2.1.0. R News, 5(1):2–7, May 2005. URL http://CRAN.R-project.org/doc/Rnews/.

Paul Murrell
The University of Auckland, New Zealand
[email protected]

Brian D. Ripley
University of Oxford, UK
[email protected]

¹³http://www.ctan.org/tex-archive/help/Catalogue/entries/cm-lgc.html, and available as a package for some Linux systems.
¹⁴http://www.stat.auckland.ac.nz/~paul/R/CM/CMR.html

The doBy package

by Søren Højsgaard

This article is not about rocket science; in fact it is not about science at all. It is a description of yet another package with utility functions.

I have used R in connection with teaching generalized linear models and related topics to Ph.D. students within areas like agronomy, biology, and veterinary science at the Danish Institute of Agricultural Sciences. These students, many of whom are familiar with the SAS system, have almost all come to appreciate R very quickly. However, they have also from time to time complained that certain standard tasks are hard to do in R – and certainly harder than in SAS. The doBy package is an attempt to make some of these standard tasks easier.

Airquality data

The presentation of the package is based on the airquality dataset, which contains air quality measurements in New York, May to September 1973. (Note that months are coded as 5, ..., 9.)

> head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

With the summary procedure of SAS (PROC SUMMARY) one can easily calculate things like "the mean and variance of x for each combination of two factors A and B". To calculate the mean and variance of Ozone and Wind for each combination of Month, do:

> summaryBy(Ozone + Wind ~ Month,
+   data = airquality, FUN = c(mean, var),
+   prefix = c("m", "v"), na.rm = TRUE)

  Month m.Ozone m.Wind v.Ozone v.Wind
1     5   23.62 11.623   493.9 12.471
2     6   29.44 10.267   331.5 14.207
3     7   59.12  8.942  1000.8  9.217
4     8   59.96  8.794  1574.6 10.407
5     9   31.45 10.180   582.8 11.980

The result above can clearly be obtained in other ways, for example by using the aggregate function, the summarize function in the Hmisc package, or by

> a <- by(airquality, airquality$Month,
+   function(d) {
+     c(mean(d[, c("Ozone", "Wind")], na.rm = T),
+       diag(var(d[, c("Ozone", "Wind")], na.rm = T)))
+   })
> do.call("rbind", a)

However, my students have found this somewhat cumbersome!

The orderBy function

Ordering (or sorting) a data frame is possible with the orderBy function. Suppose we want to order the dataframe by Temp and by Month (within Temp), and that the ordering should be decreasing. This can be achieved by:

> x <- orderBy(~Temp + Month,
+   data = airquality, decreasing = T)

The first lines of the result are:

    Ozone Solar.R Wind Temp Month Day
120    76     203  9.7   97     8  28
122    84     237  6.3   96     8  30
121   118     225  2.3   94     8  29
123    85     188  6.3   94     8  31
126    73     183  2.8   93     9   3
127    91     189  4.6   93     9   4

Again, this can clearly be achieved in other ways, but presumably not with so few commands as above.

The splitBy function

Suppose we want to split data into a list of dataframes, e.g. one dataframe for each month. This can be achieved by:

> x <- splitBy(~Month, data = airquality)

Information about the grouping is stored as a dataframe in an attribute called groupid:

> attr(x, "groupid")
  Month
1     1
2     2
3     3
4     4
5     5
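Since the result behaves essentially as a list of dataframes, the usual apply machinery can be used on it; a small sketch (assuming ordinary list semantics for the split object):

> # mean wind speed within each month's dataframe
> sapply(x, function(d) mean(d$Wind, na.rm = TRUE))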


The sampleBy function

Suppose we want a random sample of 50% of the observations from a dataframe. This can be achieved with:

> sampleBy(~1, frac = 0.5, data = airquality)

Suppose instead that we want a systematic sample of every fifth observation within each month. This is achieved with:

> sampleBy(~Month, frac = 0.2,
+   data = airquality, systematic = T)

The subsetBy function

Suppose we want to take out those rows within each month for which the wind speed is larger than the mean wind speed (within the month). This is achieved by:

> subsetBy(~Month, subset = "Wind>mean(Wind)",
+   data = airquality)

Note that the statement "Wind>mean(Wind)" is evaluated within each month.
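For comparison, a base-R sketch of the same selection shows what subsetBy saves us from writing:

> do.call("rbind",
+   lapply(split(airquality, airquality$Month),
+          function(d) d[d$Wind > mean(d$Wind), ]))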

The esticon function

Consider a linear model which explains Ozone as a linear function of Month and Wind:

> airquality <- transform(airquality,
+   Month = factor(Month))
> m <- lm(Ozone ~ Month * Wind,
+   data = airquality)
> coefficients(m)
(Intercept)      Month6      Month7
     50.748     -41.793      68.296
     Month8      Month9        Wind
     82.211      23.439      -2.368
Month6:Wind Month7:Wind Month8:Wind
      4.051      -4.663      -6.154
Month9:Wind
     -1.874

When a parameter vector β of (systematic) effects has been estimated, interest is often in a particular estimable function, i.e. a linear combination λᵀβ, and/or in testing the hypothesis H0: λᵀβ = β0, where λ is a specific vector defined by the user.

Suppose for example we want to calculate the expected difference in ozone between consecutive months at wind speed 10 mph (which is about the average wind speed over the whole period). The esticon function provides a way of doing so. We can specify several λ vectors at the same time by row-binding of λᵀ; for example, the first row below, -(βMonth6 + 10 βMonth6:Wind), is the expected May-minus-June difference at wind speed 10:

> Lambda <- rbind(
+   c(0, -1,  0,  0,  0, 0, -10,   0,   0,   0),
+   c(0,  1, -1,  0,  0, 0,  10, -10,   0,   0),
+   c(0,  0,  1, -1,  0, 0,   0,  10, -10,   0),
+   c(0,  0,  0,  1, -1, 0,   0,   0,  10, -10))
> esticon(m, Lambda)
Confidence interval ( WALD ) level = 0.95
  beta0 Estimate Std.Error t.value
1     0   1.2871    10.238  0.1257
2     0 -22.9503    10.310 -2.2259
3     0   0.9954     7.094  0.1403
4     0  15.9651     6.560  2.4337
   DF Pr(>|t|) Lower.CI Upper.CI
1 106  0.90019  -19.010   21.585
2 106  0.02814  -43.392   -2.509
3 106  0.88867  -13.069   15.060
4 106  0.01662    2.959   28.971

In other cases, interest is in testing a hypothesis of a contrast H0: Λβ = β0, where Λ is a matrix. For example, a test of no interaction between Month and Wind can be made by testing jointly that the last four parameters in m are zero (observe that the test is a Wald test):

> Lambda <- rbind(
+   c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
+   c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
+   c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0),
+   c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1))
> esticon(m, Lambda, joint.test = T)
  X2.stat DF Pr(>|X^2|)
1   22.11  4  0.0001906

For a linear normal model, one would typically prefer to do a likelihood ratio test instead. However, for generalized estimating equations of glm-type (as dealt with in the packages geepack and gee) there is no likelihood. In this case the esticon function provides an operational alternative.

Observe that another function for calculating contrasts as above is the contrast function in the Design package, but it applies to a narrower range of models than esticon does.

Final remarks

The ease in using the data-oriented functions lies in that 1) the formula language is used in the specification of both the variables and the grouping, and 2) the functions take a data argument. My "biologically oriented" students (and ditto colleagues) seem to appreciate that.


On their wishlist are facilities along the line of the ESTIMATE and LSMEANS statements available in many SAS procedures (including GLM, MIXED and GENMOD) for easy specification of various contrasts (LSMEANS is sometimes also denoted "population means"). While LSMEANS are often misused (and certainly misinterpreted), such facilities would be nice to have in R. I would like to encourage anyone who has implemented such facilities in a reasonable level of generality to come forward.

Søren Højsgaard
Statistics and Decision Analysis Unit
Department of Genetics and Biotechnology
Danish Institute of Agricultural Sciences
Tjele, Denmark
[email protected]

Normed division algebras with R: introducing the onion package

The eccentric cousin and the crazy old uncle

by Robin K. S. Hankin

Preface

An algebra is a vector space V over a field (here the real numbers) in which the vectors may be multiplied. In addition to the usual vector space axioms, one requires, for any λ, µ ∈ R and x, y, z ∈ V:

• x(λy + µz) = λxy + µxz
• (λy + µz)x = λyx + µzx

where multiplication is denoted by juxtaposition; note the absence of associativity. A division algebra is a nontrivial algebra in which division is possible: for any a ∈ V and any nonzero b ∈ V, there exists precisely one element x with a = bx and precisely one element y with a = yb. A normed division algebra is a division algebra with a norm ||·|| satisfying ||xy|| = ||x|| ||y||.

There are precisely four normed division algebras: the reals themselves (R), the complex numbers (C), the quaternions (H) and the octonions (O); the generic term is "onion", although the term includes other algebras such as the sedenions.

The R programming language (R Development Core Team, 2004) is well-equipped to deal with the first two: here, I introduce the onion package, which provides some functionality for the quaternions and the octonions, and illustrate these interesting algebras using numerical examples.

Introduction

Historically, the complex numbers arose from a number of independent lines of inquiry, and our current understanding of them (viz. z = x + iy; the Argand plane) developed over the eighteenth and nineteenth centuries.

Hamilton was one of many mathematicians to attempt to extend the complex numbers to a third dimension and discover what would in our terminology be a three dimensional normed division algebra. We now know that no such thing exists: division algebras all have dimension 2^n for n some non-negative integer.

Hamilton came upon the multiplicative structure of the quaternions in a now-famous flash of inspiration on 16th October 1843: the quaternions are obtained by adding the elements i, j, and k to the real numbers and requiring that

    i² = j² = k² = ijk = -1.    (1)

A general quaternion is thus written a + bi + cj + dk, with a, b, c, d being real numbers; complex arithmetic is recovered if c = d = 0. Hamilton's relations above, together with distributivity and associativity, yield the full multiplication table, and the quaternions are the unique four dimensional normed division algebra.

However, Hamilton's scheme was controversial as his multiplication was noncommutative: a shocking suggestion at the time. Today we recognize many more noncommutative operations (such as matrix multiplication), but even Hamilton had difficulty convincing his contemporaries that such operations were consistent, let alone worth studying.

The octonions

The fourth and final normed division algebra is that of the octonions. These were discovered around 1844 and are an eight-dimensional algebra over the reals. The full multiplication table is given by Baez (2001).

Package onion in use

A good place to start is function rquat(), which returns a quaternionic vector of a specified length, whose elements are random small integers:


> library(onion)
> x <- rquat(5)
> names(x) <- letters[1:5]
> print(x)
   a b c d e
Re 4 3 2 1 3
i  3 2 3 4 1
j  2 4 2 3 1
k  3 3 4 4 4

This quaternionic vector, of length 5, is of the form of a four-row matrix. The rownames are the standard basis for quaternions, and their entries may be manipulated as expected; for example, we may set component k to be component j plus 10:

> k(x) <- j(x) + 10
> x
    a  b  c  d  e
Re  4  3  2  1  3
i   3  2  3  4  1
j   2  4  2  3  1
k  12 14 12 13 11

Quaternionic vectors behave largely as one might expect. For example, we may concatenate x with a basis quaternion (the four bases are represented in the package by H1, Hi, Hj, and Hk) and multiply the result by the first element:

> c(x, Hk) * x[1]
      a    b    c    d    e
Re -141 -170 -149 -170 -125 -12
i    24   -3   18    9   23   2
j    16    4   12   23  -11  -3
k    96  100   72   65   81   4

And indeed we may explicitly verify that quaternionic multiplication is not commutative using the commutator() function, which returns xy - yx:

> y <- rquat(5)
> commutator(x, y)
     a   b   c   d   e
Re   0   0   0   0   0
i   20  12  32  98  62
j  -42 -48 -24 -44 -40
k    2  12  -4 -20  -2

It is possible to verify that quaternionic multiplication is associative using function associator(x,y,z), which returns (xy)z - x(yz) and is thus identically zero for associative operators:

> associator(x, y, rquat(5))
   a b c d e
Re 0 0 0 0 0
i  0 0 0 0 0
j  0 0 0 0 0
k  0 0 0 0 0

Compare the octonions, which are not associative:

> associator(roct(3), roct(3), roct(3))
    [1]  [2]  [3]
Re    0    0    0
i   472 -408  242
j  -158  112  -66
k  -106 -408   84
l  -492  128  506
il  506   54 -700
jl  250  298  610
kl -160  518 -768

Many transcendental functions operate on onions, via a reasonably straightforward generalization of the complex case. Consider the square root, defined as exp(log(x)/2). That this obeys similar rules to the usual square root may be demonstrated for octonions by calculating sqrt(x)*sqrt(x) - x, and showing that its modulus is small:

> x <- roct(3)
> Mod(sqrt(x) * sqrt(x) - x)
[1] 7.601583e-15 5.291934e-15
[3] 4.110825e-15

showing acceptable accuracy in this context. However, many identities that are true for the real or complex case fail for quaternions or octonions; for example, although log(x²) = 2 log(x), it is not generally the case that log(xy) = log(x) + log(y), as we may verify numerically:

> x <- rquat(3)
> y <- rquat(3)
> Mod(log(x * x) - 2 * log(x))
[1] 0.000000e+00 3.845925e-16
[3] 0.000000e+00
> Mod(log(x * y) - log(x) - log(y))
[1] 0.0000000 0.6770686 1.1158104
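In the same numerical spirit, Hamilton's defining relations (1) can be checked directly using the basis quaternions:

> Mod(Hi*Hi + H1)       # i^2 = -1, so this modulus is zero
> Mod(Hi*Hj*Hk + H1)    # ijk = -1, likewise zero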

Practical applications

I now show the quaternions in use: they can be used to rotate an object in 3D space in an elegant and natural way. If we wish to rotate vector v, considered to be a triple of real numbers, we first identify its three components with the imaginary part of a quaternion with zero real part (i.e., v = 0 + v_1 i + v_2 j + v_3 k). Then the transformation defined by

    v -> v' = z v z^(-1)

where z ∈ H, is a rotation¹. Note that the noncommutativity of the quaternions means that the mapping is not necessarily the identity. This scheme may be used to produce Figure 1.

> data(bunny)
> par(mfrow = c(2, 2))
> par(mai = rep(0.1, 4))
> p3d(rotate(bunny, H1), box = FALSE,
+     d0 = 0.4, r = 1e+06, h = NULL,
+     pch = 16, cex = 0.3, theta = 0,
+     phi = 90, main = "z=1 (identity)")
> p3d(rotate(bunny, Hi), box = FALSE,
+     d0 = 0.4, r = 1e+06, h = NULL,
+     pch = 16, cex = 0.3, theta = 0,
+     phi = 90, main = "z=i")
> p3d(rotate(bunny, H1 - Hj), box = FALSE,
+     d0 = 0.4, r = 1e+06, h = NULL,
+     pch = 16, cex = 0.3, theta = 0,
+     phi = 90, main = "z=1-j")
> p3d(rotate(bunny, Hall), box = FALSE,
+     d0 = 0.4, r = 1e+06, h = NULL,
+     pch = 16, cex = 0.3, theta = 0,
+     phi = 90, main = "z=1+i+j+k")

[Figure 1: The Stanford Bunny (Turk, 2005) in various rotations, plotted using p3d(). Depth cue is via progressive greying out of points further from the eye, as though viewed through a mist.]

Translating between quaternionic rotation and (say) the equivalent orthogonal matrix is straightforward.

Conclusions

Quaternions and octonions are interesting and instructive examples of normed division algebras. Quaternions are a natural and efficient representation of three-dimensional rotations, and this paper includes a brief illustration of their use in this context.

Octonions' applicability to physics is currently unclear, but a steady trickle of papers has appeared in which the octonions are shown to be related to spinor geometry and quantum mechanics (see Baez, 2001, and references therein). Reading the literature, one cannot help feeling that octonions will eventually reveal some deep physical truth. When this happens, the onion package ensures that R will be ready to play its part.

Acknowledgments

I would like to acknowledge the many stimulating and helpful comments made by the R-help list over the years.

Bibliography

J. C. Baez. The octonions. Bulletin of the American Mathematical Society, 39(2):145-205, 2001.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004. URL http://www.R-project.org. ISBN 3-900051-07-0.

G. Turk. The Stanford bunny, 2005. Computer Graphics Laboratory, http://graphics.stanford.edu/data/3Dscanrep/.

Robin Hankin
National Oceanography Centre, Southampton
European Way
Southampton SO14 3ZH
United Kingdom
[email protected]

¹The real part of v' is again zero and thus may be interpreted as a three-vector. It is straightforward to prove that v·w = v'·w'. In R idiom, one would numerically verify these identities by evaluating d <- (z*v/z) %.% (z*w/z) - v %.% w (where v, w, z are quaternions) and observing that Mod(d) is small.
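A minimal sketch of the check described in the footnote, using only the idioms quoted there (the quaternions are random small integers, so z is almost surely nonzero):

    library(onion)
    v <- rquat(1); w <- rquat(1); z <- rquat(1)
    d <- (z*v/z) %.% (z*w/z) - v %.% w
    Mod(d)   # small: the mapping preserves dot products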


Resistor networks R: Introducing the ResistorArray package

Electrical properties of resistor networks

by Robin K. S. Hankin

Introduction

Many elementary physics courses show how resistors combine in series and parallel (Figure 1); the equations are

    R_series   = R_1 + R_2                     (1)
    R_parallel = (R_1^(-1) + R_2^(-1))^(-1)    (2)

However, these rules break down for many systems such as the Wheatstone bridge (Figure 2); the reader who doubts this should attempt to apply equations 1 and 2 and find the resistance between points A and B in the general case.

[Figure 1: Two resistors in series (top) and parallel (bottom).]

[Figure 2: The Wheatstone bridge.]

This paper introduces ResistorArray (Hankin, 2005), an R (R Development Core Team, 2005) package to determine resistances and other electrical properties of such networks.

Although this paper uses the language of electrical engineering, the problem considered is clearly very general: many systems are composed of isolated nodes between which some quantity flows, and the steady states of such systems are generally of interest.

Package ResistorArray has been applied to such diverse problems as the diffusion of nutrients among fungal hyphae networks, the propagation of salinity between (moored) oceanographical buoys, and hydraulic systems such as networks of sewage pumps.
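As a warm-up, equations 1 and 2 are one-liners in R (a sketch of mine, not part of the ResistorArray package):

    series   <- function(R1, R2) R1 + R2
    parallel <- function(R1, R2) 1/(1/R1 + 1/R2)
    series(1, 2)     # 3 ohms
    parallel(1, 2)   # 0.6666667 ohms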

Matrix formulation

The general problem of determining the resistance between two nodes of a resistor network requires matrix techniques¹.

Consider a network of n nodes, with node i connected to node j by a resistor of resistance R_ij. Then the network has a "conductance matrix" L with

    L_ij = -1/R_ij                   if i != j
    L_ij = sum_{k: k != j} 1/R_kj    if i = j       (3)

Thus L is a symmetrical matrix, whose row sums and column sums are zero (and which is therefore singular). Then the analogue of Ohm's law (viz V = IR) would be

    L v = i        (4)

where v = (v_1, ..., v_n) is a vector of potentials and i = (i_1, ..., i_n) is a vector of currents; here i_p is the current flow into node p. Equation 4 is thus a restatement of the fact that charge does not accumulate at any node.

Each node of the circuit may either be fed a known current² and we have to calculate its potential; or it is maintained at a known potential and we have to calculate the current flux into that node to maintain that potential. There are thus n unknowns altogether.

Thus some elements of v and i are known and some are unknown. Without loss of generality, we may partition these vectors into known and unknown parts: v = (v^k', v^u')' and i = (i^u', i^k')'. Thus the known elements of v are v^k = (v_1, ..., v_p)': these would correspond to nodes that are maintained at a specified potential; the other elements v^u = (v_{p+1}, ..., v_n)' correspond to nodes that are at an unknown potential that we have to calculate.

The current vector i may similarly be decomposed, but in a conjugate fashion; thus elements i^u = (i_1, ..., i_p)' correspond to nodes that require a certain, unknown, current to be fed into them to maintain potentials v^k; the other elements i^k = (i_{p+1}, ..., i_n)' would correspond to nodes that have a known current flowing into them and whose potential we seek.
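The properties of equation 3 can be checked directly in base R; a minimal sketch for a toy three-node ring with one-ohm edges (an example of mine, not from the paper; the matrix is called L3 to avoid clashing with the L constructed later):

    R3 <- matrix(1, 3, 3)       # R3[i,j] = resistance between nodes i and j
    diag(R3) <- Inf             # no resistor from a node to itself
    L3 <- -1/R3                 # off-diagonal entries of equation 3
    diag(L3) <- 0
    diag(L3) <- -rowSums(L3)    # diagonal entries: sum of 1/R over the other nodes
    rowSums(L3)                 # all zero, as claimed
    isSymmetric(L3)             # TRUE
    qr(L3)$rank                 # 2 < 3, so L3 is indeed singular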

¹The method described here is a standard application of matrix algebra, but does not appear to be described in the literature. Many books and papers, such as Doyle and Snell (1984), allude to it; but a formal description does not appear to have been published.
²This would include a zero current: in the case of the Wheatstone bridge of Figure 2, nodes 2 and 3 have zero current flux, so i_2 and i_3 are known and set to zero.

Equation 4 may thus be expressed in terms of a suitably partitioned matrix equation:

    ( A    B ) ( v^k )   ( i^u )
    ( B^T  C ) ( v^u ) = ( i^k )        (5)

where, in R idiom, A = L[1:p, 1:p], B = L[1:p, (p+1):n], and C = L[(p+1):n, (p+1):n]. Straightforward matrix algebra gives

    v^u = C^(-1) (i^k - B^T v^k)                     (6)
    i^u = (A - B C^(-1) B^T) v^k + B C^(-1) i^k      (7)

Equations 6 and 7 are implemented by circuit().

Package ResistorArray in use

Consider the Wheatstone bridge, illustrated in Figure 2. Here the resistance between node 1 and node 4 is calculated; all resistances are one ohm except R_34, which is 2 ohms. This resistor array may be viewed as a skeleton tetrahedron, with edge 1-4 missing. We may thus use function tetrahedron() to generate the conductance matrix; this is

R> L <- tetrahedron(c(1, 1, Inf, 1, 1, 2))
R> L

     [,1] [,2] [,3] [,4]
[1,]    2   -1 -1.0  0.0
[2,]   -1    3 -1.0 -1.0
[3,]   -1   -1  2.5 -0.5
[4,]    0   -1 -0.5  1.5

Observe that L[1,4] == L[4,1] == 0, as required.

The resistance may be determined by function circuit(); there are several equivalent methods. Here node 1 is earthed, by setting it to zero volts, and the others are given a floating potential; viz v = c(0, NA, NA, NA). Then one amp is fed into node 4, zero amps into nodes 2 and 3, and node 1 an unknown current, to be determined; viz currents = c(NA, 0, 0, 1). The node 1-to-node 4 resistance will then be given by the potential at node 4:

R> circuit(L, v = c(0, NA, NA, NA),
+          currents = c(NA, 0, 0, 1))

$potentials
[1] 0.0000000 0.5454545 0.4545455 1.1818182

$currents
[1] -1  0  0  1

Thus the resistance is about 1.181818 Ω, comparing well with the theoretical value of 117/99 Ω. Note that the system correctly returns the net current required to maintain node 1 at 0 V as -1 A (which is clear from conservation of charge). The potential difference between nodes 2 and 3 indicates a nonzero current flow along R_23; currents through each resistor are returned by function circuit() if argument give.internal is set to TRUE.
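The same answer can be obtained by carrying out equations 6 and 7 by hand; a minimal sketch, reusing the conductance matrix L from the tetrahedron() call above (node 1 has known potential, so p = 1):

    p  <- 1
    A  <- L[1:p, 1:p, drop = FALSE]
    B  <- L[1:p, (p+1):4, drop = FALSE]
    C  <- L[(p+1):4, (p+1):4]
    vk <- 0                                  # node 1 earthed
    ik <- c(0, 0, 1)                         # one amp into node 4
    vu <- solve(C, ik - t(B) %*% vk)         # equation 6
    iu <- (A - B %*% solve(C, t(B))) %*% vk +
          B %*% solve(C, ik)                 # equation 7
    vu   # 0.5454545 0.4545455 1.1818182, matching circuit()'s $potentials
    iu   # -1, as in $currents above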

Conclusions

Package ResistorArray solves a problem that is frequently encountered in electrical engineering: determining the electrical properties of resistor networks. The techniques presented here are applicable to many systems composed of isolated nodes between which some quantity flows; the steady states of such systems generally have an electrical analogue.

This paper explicitly presents a vectorized solution method that uses standard numerical techniques, and discusses the corresponding R idiom.

Acknowledgements

I would like to acknowledge the many stimulating and helpful comments made by the R-help list over the years.

Bibliography

P. G. Doyle and J. L. Snell. Random walks and electric networks. The Mathematical Association of America, 1984.

R. K. S. Hankin. ResistorArray: electrical properties of resistor networks, 2005. R package version 1.0-9.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org. ISBN 3-900051-07-0.

Robin Hankin
National Oceanography Centre, Southampton
European Way
Southampton SO14 3ZH
United Kingdom
[email protected]

The good, the bad, and the ugly—Review of Paul Murrell's new book: "R Graphics"

by David Meyer

At the DSC 2001 workshop in Vienna, Paul Murrell presented an overview on the "old" R graphics system using the title: "The good, the bad, and the ugly", basically meaning that the traditional graphics system, albeit flexible, was getting old and the interface was cumbersome in several aspects. Think, e.g., of the mysteries of the dozens of par() parameters whose lengthy help page most useRs keep browsing on a regular basis to filter out the appropriate settings. Thanks to Paul Murrell's new book, the secrets of both traditional graphics and the new, modern grid system get unveiled, preventing useRs from writing "ugly" code. Starting from scratch, both architectures are presented and compared in detail, complemented with illustrative examples and summary tables. The text not only covers basic elements such as points, lines, segments, and text, but also discusses the most important high-level plot functions available in base R, as well as some popular extension packages such as scatterplot3d. In addition, a clear effort has been made to enable the readers to develop their own graphical methods, in the R spirit of "users becoming developers".

The first part of the book covers the traditional graphics system. It starts with some simple plots using plot() and friends, and proceeds with a detailed chapter on customizing, covering all low-level and high-level par() settings, multiple plots, annotation, and more complex, superposed plots. Even experienced users will discover "new" features. For example, most people will be familiar with line types, but what about the various line join and line ending styles, or the correct handling of fonts?

The second part is devoted to grid, the new graphics system developed by the book author himself. Murrell chooses a practical approach and first introduces the lattice package, R's version of Trellis graphics. After giving an overview on all high-level lattice functions, the author provides customization examples. Only after this "teaser" is the reader confronted with the basic elements of the grid package. The text explains the idea behind concepts like units, graphical context, viewports, and how existing plot elements can easily be integrated in more complex displays. Particular emphasis is given to the possibilities of interactive plot modifications and off-screen computations. Finally, a whole chapter is devoted to the development of new grid-based functions and objects that can be reused by others.

Paul Murrell's book is the first publication entirely devoted to R graphics, written by the authoritative expert in the field. It is definitely a must-have for novices and professionals alike, the ultimate guide to the power (and beauty) of R graphics.

David Meyer
Department of Information Systems and Operations, Vienna University of Economics and Business Administration
Augasse 2-6, A-1090 Wien, Austria
[email protected]

Changes in R

by the R Core Team

User-visible changes

• The data frame argument to transform() is no longer called 'x', but '_data'. Since this is an invalid name, it is less likely to clash with names given to transformed variables. (People were getting into trouble with transform(data, x=y+z); see the sketch below.)

• In the grid package there are new 'arrow' arguments to grid.line.to(), grid.lines(), and grid.segments() (grid.arrows() has been deprecated). The new 'arrow' arguments have been added BEFORE the 'name', 'gp' and 'vp' arguments, so existing code that specifies any of these arguments *by position* (not by name) will fail.

• all.equal() is more stringent; see the PR#8191 bug fix below.
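For example, the following now does what it appears to do (a minimal sketch; under the old argument name 'x' the new column clashed with the data frame argument):

    d <- data.frame(y = 1:3, z = 4:6)
    transform(d, x = y + z)   # adds a column x equal to y + z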


New features

• arima.sim() has a new argument 'start.innov' for compatibility with S-PLUS. (If not supplied, the output is unchanged from previous versions of R.)

• arrows() has been changed to be more similar to segments(): for example col=NA omits the arrow rather than, as previously (undocumented), using par("col").

• as.list() now accepts symbols (as given by as.symbol() aka as.name()).

• atan2() now allows one complex and one numeric argument.

• The 'masked' warnings given by attach() and library() now only warn for functions masking functions or non-functions masking non-functions.

• New function Axis(), a generic version of axis(), with Date and POSIX[cl]t methods. This is used by most of the standard plotting functions (boxplot, contour, coplot, filled.contour, pairs, plot.default, rug, stripchart), which will thus label x or y axes appropriately.

• pbeta() now uses TOMS708 in all cases and so is more accurate in some cases (e.g. when lower.tail = FALSE and when one of the shape parameters is very small).

• [qr]beta(), [qr]f() and [qr]t() now have a non-centrality parameter.

• [rc]bind() and some more cases of sub-assignment are implemented for raw matrices. (PR#8529 and PR#8530)

• The number of lines of deparsed calls printed by browser() and traceback() can be limited by the option "deparse.max.lines". (Wish of PR#8638.)

• New canCoerce() utility function in "methods" package.

• [pq]chisq() are considerably more accurate for moderate (up to 80) values of ncp, and lower.tail = FALSE is fully supported in that region. (They are somewhat slower than before.)

• chol(pivot = TRUE) now gives a warning if used on a (numerically) non-positive-definite matrix.

• chooseCRANmirror() consults the CRAN master (if accessible) to find an up-to-date list of mirrors.

• cov.wt() is more efficient for 'cor = TRUE' and has a new 'method' argument which allows 'Maximum Likelihood'.

• do.call() gains an 'envir' argument.

• eigen() applied to an asymmetric real matrix now uses a tolerance to decide if the result is complex (rather than expecting the imaginary parts of the eigenvalues to be exactly zero).

• New function embedFonts() for embedding fonts in PDF or PostScript graphics files.

• fisher.test() now uses p-values computed via hypergeometric distributions for all 2 by 2 tables. This might be slightly slower for a few cases, but works much better for tables with some large counts. There is a new option to simulate the p-value for larger than 2 x 2 tables (see the sketch below).

• for() now supports raw vectors as the set of indices.

• getNativeSymbolInfo() is vectorized for the 'name' argument. It returns a named list of NativeSymbolInfo objects, but is backward compatible by default when called with a character vector of length 1, returning the NativeSymbolInfo object.

• help.search() no longer attempts to handle packages installed prior to R 2.0.0, and reports the current path to the package (rather than where it was originally installed: this information is not shown by the print() method).

• Added "hexmode" to parallel "octmode".

• install.packages() now does tilde expansion on file paths supplied as 'pkgs'.

• install.packages() has additional arguments 'configure.args' and 'clean' which allow the caller to provide additional arguments to the underlying R CMD INSTALL shell command when installing source packages on a Unix-alike.

• is.loaded() has a new argument 'type' to confine the search to symbols for .C, .Fortran, .Call or .External: by default it looks for a symbol which will match any of them. It is now internal and not primitive, so argument matching works in the usual way.

• The symmetry test for matrices used in eigen() has been "exported" as the 'matrix' method of a new S3 generic isSymmetric().

• .leap.seconds and the internal adjustment code now know about the 23rd leap second on 2005-12-31: the internal code uses a run-time test to see if the OS does.
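A minimal sketch of the new simulation option for fisher.test() on a table larger than 2 x 2 (the data are made up for illustration):

    tab <- matrix(c(3, 1, 2, 4, 5, 2), nrow = 2)        # a 2 x 3 table
    fisher.test(tab)                                    # exact p-value
    fisher.test(tab, simulate.p.value = TRUE, B = 1e4)  # simulated p-value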


• The 'col' argument of legend() now defaults to par("col") (which defaults to "black", the previous default), so that the lines/symbols are shown in the legend in the colour that is used on the plot.

• log2() and log10() call C functions of the same name if available, and will then be more likely to be precise to machine accuracy.

• new.packages() gains a ... argument to pass e.g. 'destdir' to install.packages(). (Wish of PR#8239.)

• nls() now supports 'weights'.

• The vector passed as the first argument of the 'fn' and 'gr' arguments of optim() has the names (if any) given to argument 'par'.

• options(expressions) is temporarily increased by 500 during error-handling. This enables e.g. traceback() to work when the error is reaching the limit on the nesting of expressions.

• page() accepts general R objects, not just names (and previously undocumented) character strings. This allows the object to be specified as a call, for example. More options are allowed in its '...' argument.

• pairs() allows a wider class of inputs, including data frames with date and date-time columns.

• par() and the in-line use of graphical parameters produce more informative error messages, distinguishing between non-existent pars and inappropriate use of valid pars.
  Graphical parameters 'family', 'lend', 'ljoin' and 'lmitre' can now be set in-line.
  There is no longer a warning if non-settable pars are used in-line, but there is an appropriate warning if unknown pars are passed.
  The length limit for the 'family' parameter has been increased to 200 bytes, to allow for the names of some CID-keyed fonts in multi-byte locales.

• The pdf() device now allows 'family' to be specified in the same generality as postscript().

• The pdf() device writes /FontDescriptor entries for all fonts except the base 14, and does not write font entries for unused fonts.

• Plotmath allows 'vartheta', 'varphi' and 'varsigma' (or 'stigma') as synonyms for 'theta1', 'phi1' and 'sigma1', and the help page has a note for TeX users.

• plot.xy() now takes its default arguments from the corresponding par() settings, so points(type="l") and lines(type="p") behave in the same way (and more obviously, also for type="b").

• poly() has a new argument 'raw', mainly for pedagogical purposes.

• The class "POSIXlt" now supports fractional seconds (as "POSIXct" has always done). The printing of fractional seconds is controlled by the new option "digits.secs", and by default is off.

• postscript() supports family = "ComputerModernItalic" for Computer Modern with italic (rather than slanted) faces.

• The postscript()/pdf() font metrics for the 14 standard fonts (only, not the rest of the common 35) have been updated to versions from late 1999 which cover more glyphs. There are also a few differences in the metrics and hence the output might be slightly different in some cases.

• The way families can be specified for postscript() and pdf() has been expanded to include CID-keyed fonts, with new functions Type1Font() and CIDFont() to set up such font families.

• prettyNum() has new arguments 'preserve.width' and 'zero.print'. When the former is not "none", as in calls from format() and formatC(), the resulting strings are kept at the desired width when possible even after adding of 'big.mark' or 'small.mark'.

• proc.time() and system.time() now record times to 1 ms accuracy where available (most Unix-like systems).

• The initialization methods for the quasi() family have been changed to depend on the variance function, and in particular to work better for the "mu(1-mu)" variance function. (PR#8486)

• read.table() gains a 'flush' argument passed to scan().

• require() now takes a 'lib.loc' argument.

• The second argument 'size' to sample() is required to have length 1, so that errors when supplying arguments are more easily detected.

• The default is now compress = !ascii in save() (but not save.image()).

• scan() and write.table() now have some interruptibility, which may be useful when processing very large files.

• A new heuristic test, seemsS4Object(), is supplied, along with a similar C-level test, R_seemsS4Object(object). The test detects probable S4 objects by their class's attribute. See the help page.

• S3 classes can now be made non-virtual S4 classes by supplying a prototype object in the arguments to setOldClass().

• splinefun() returns a function that now also has a 'deriv' argument and can provide up to the 3rd derivative of the interpolating spline, thanks to Berwin Turlach.

• stopifnot(A) now gives a better error message when A has NAs, and uses "not all TRUE" when A has length >= 2.

• str()'s default method has a new argument 'strict.width' which can be used to produce strict 'width' conforming output. A new options(str = list(strict.width = *)) setting allows one to control this for a whole session.

• summary.nls() has a new argument 'correlation' that defaults to FALSE (like summary.lm).

• Sys.sleep() has sub-millisecond resolution on Unix-alikes with gettimeofday().

• Sys.time() now has sub-millisecond accuracy on systems supporting the POSIX call gettimeofday, and clock-tick accuracy on Windows.

• The new function timestamp() adds a time stamp to the saved command history on consoles which support it.

• New function tcrossprod() for efficiently computing x %*% t(x) and x %*% t(y) (illustrated in the sketch below).

• The suffix used by tempfile() is now in hex on all platforms and guaranteed to be at least 6 hex digits (usually 8).

• trace() now works more consistently and more like its documentation, in particular the assertions about old tracing being removed for new. For debugging purposes (of R) a mechanism for debugging the trace computations themselves was added. See trace.R.

• The implementation of trace() has been made more general by calling a function to do the trace interaction, and recover() now detects trace calls to trim the irrelevant code underneath.

• unserialize() can now also read a byte stream from a raw vector.

• The useDynLib() directive in the NAMESPACE file now accepts the names of the native/foreign symbols that are to be resolved in the DLL for use in .C/.Call/.Fortran/.External calls. These can be used as regular R variables instead of the (routine name, PACKAGE) pairs currently recommended. Alternative names can be given for the R variables mapping to these symbols. The native routine registration information can also be used directly via useDynLib(name, .registration = TRUE). See the 'Writing R Extensions' manual for more details. checkFF() (package 'tools') has been updated accordingly.

• validObject() has an option complete=TRUE that recursively checks the objects in the slots. Not used when new(...) checks validity.

• New Vectorize() function, a wrapper for mapply() (illustrated in the sketch below).

• write.ftable() has gained an argument 'append = FALSE' (thanks to Stephen Weigand).

• On Unix-alikes, X11() now has arguments to request the initial position of the window, and 'gamma' defaults to the value of getOption("gamma"). These changes are consistent with the windows() device.

• X11() and the Unix-alike data entry window can have properties (including geometry) set by X resources: see their help files.

• xy.coords() & xyz.coords() now have NULL defaults for their 'y' or 'y' and 'z' arguments. This is more consistent with their earlier documentation, and may be convenient for using them.

• Non-syntactic names of list elements are now printed quoted by backticks rather than double quotes.

• There is some basic checking for imminent C stack overflow (when the evaluation depth and the user interrupts are checked). On systems with suitable OS support (not Windows), segfaults from C stack overflow are caught and treated as an R error. New function Cstack_info() reports on stack size and usage. options(expressions) reverts to the default of 5000 now that stack checking is in place.
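Two of the new functions mentioned above in action (a minimal sketch):

    x <- matrix(1:6, 2)
    all.equal(tcrossprod(x), x %*% t(x))   # TRUE, computed in one step

    vrep <- Vectorize(function(x, times)
      paste(rep(x, times), collapse = ""))
    vrep(c("a", "b"), 1:2)                 # "a" and "bb", via mapply()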


• Package tcltk does not try to initialize Tk on Unix-alikes unless a DISPLAY variable is present. This allows packages dependent on tcltk to be installed without access to an X server.

• The code used to guess timezone offsets where not supplied by the OS uses a different algorithm that is more likely to guess the summer-time transitions correctly.

• Package tools contains translation tables 'Adobe_glyphs' and 'charset_to_Unicode'.

• Changed the environment tree to be rooted in an empty environment, available as emptyenv(). baseenv() has been modified to return an environment with emptyenv() as parent, rather than NULL.

• gettext has been updated to 0.14.5.

• PCRE has been updated to version 6.4.

• The method $.DLLInfo resolves the specified symbol in the DLL, returning a NativeSymbolInfo object. Use [[ to access the actual values in the DLLInfo object.

• On systems with either vasprintf or both va_copy and a vsnprintf which reports the size of buffer required, connections such as gzfile() and bzfile() can now write arbitrarily long lines, not just 100000 chars.

• The R session temporary directory is now set in C code using the same algorithm whether or not the shell front-end is used and on all platforms. This looks at environment variables TMPDIR, TMP and TEMP in turn, and checks if they point to a writable directory.

• Some of the classical tests put unnecessary restrictions on the LHS in the formula interface (e.g., t.test(x+y ~ g) was not allowed).

• On suitably equipped Unix-alike systems, segfaults, illegal operations and bus errors are caught and there is a simple error-handler which gives the user some choice as to what to do in interactive use. [Experimental.]
  On Windows access violations and illegal instructions are caught with a simple error handler. [Experimental.]

• Tracebacks now include calls such as .C/.Fortran/.Call, which will help if errors occur in R code evaluated by compiled code and in tracebacks presented by the segfault etc handlers.

• Treatment of signature objects and method definition objects has been modified to give cleaner printing and more consistency in the treatment of signatures. A sometimes useful utility, methodSignatureMatrix(), is now exported.

• R refrains from printing a final EOL upon exiting the main loop if the quiet flag is on and if the save action is known (e.g. this is true for --slave).

Deprecated & defunct

• The deprecated and undocumented use of atan() with two arguments has been removed: instead use atan2().

• write.table0() is defunct in favour of write.table().

• format.char() is defunct in favour of format.default().

• Support for the long-deprecated (and no longer documented) arguments --min-vsize --min-nsize --max-vsize --max-nsize --vsize --nsize of R CMD BATCH has been removed.

• The 'debian' subdirectory has been removed from the sources.

• The 'vfont' argument of axis() and mtext() has been removed: use par(family=) instead.

• The unused graphical parameter "type" has been removed: it invited confusion with the 'type' argument to default methods of plot(), points() and lines().

• nlsMethod() and profiler() are no longer exported from the stats namespace (and nlsMethod.plinear() is no longer registered as a method, as nlsMethod() was not generic).

• The re-named tcltk functions tkcmd, tkfile.tail, tkfile.dir, tkopen, tkclose, tkputs, and tkread are now formally deprecated.

• Argument 'col' of bxp() is now formally deprecated.

• Use of NULL as an environment is deprecated and gives a warning.

• postscriptFont() is deprecated in favour of Type1Font() (which is just a change of name).

• La.chol() and La.chol2inv() are deprecated (they have since R 1.7.0 been the same as the default options of chol() and chol2inv()).

• La.svd(method = "dgesvd") is deprecated.


• The use of install.R and R_PROFILE.R files in packages is deprecated: use the DESCRIPTION file instead to arrange to save an image or to load dependent packages.
  The following command-line options to INSTALL are deprecated (use the fields in the DESCRIPTION file instead): -s --save --no-save --lazy --no-lazy --lazy-data --no-lazy-data

• Graphical parameter 'tmag' (which is long unused) is deprecated.

Internationalization

A set of patches supplied by Ei-ji Nakama has been incorporated.

• New postscript encodings for CP1253, CP1257 and Greek (ISO 8859-7).

• Support for East Asian CID-keyed fonts in pdf() and postscript(). Although these usually contain Latin characters, no accurate AFMs are available, and so CID-keyed fonts are intended only for use with CJK characters.

• Wide-character width functions wc[s]width are provided that overcome problems found with OS-supplied ones (and those previously used by R on Windows). This means that double-width CJK characters are now supported on all platforms. It seems that the width of some characters (and not just CJK characters) depends on which CJK locale's fonts are in use and also on the OS.
  Revised wide-character classification functions are provided for use on Windows, AIX and MacOS X to replace deficient OS-supplied ones.

• There is support for MBCS charsets in the pictex() graphics device, and rotated (by 90 degrees) text may work better.

• The \u (and \U except on Windows) notation for characters which is supported by the parser in all MBCS charsets is now always interpreted as a Unicode point, even on platforms which do not encode wchar_t in Unicode. These are now a syntax error in single-byte locales.

• The default encoding for postscript() and pdf() is chosen to be suitable for the current locale, if that is a single-byte locale which is supported. This covers European (including Greek) and Cyrillic languages.
  In UTF-8 locales, a suitable single-byte encoding is chosen for postscript() and pdf(), and text translated to it.

• xfig() gains an 'encoding' argument.

• There are some message translations into Spanish.

Installation changes

• The encoding files for pdf()/postscript() have been moved to directory 'enc' in package 'grDevices'.

• Support for MBCS is only enabled if iconv is found and it supports enough conversions. (libiconv does.)

• In an MBCS locale, make check now translates the graphics examples from Latin-1. This ensures that they will work correctly in UTF-8: it is possible that in other MBCS locales they will now fail (rather than work completely incorrectly).

• There is a new test, 'test-Docs', which as part of 'make check-devel' tests the code in the documentation. Currently it runs doc/manual/R-{exts,intro}.R and the compiled code in R-exts.c.

• The workaround to allow an external LAPACK-containing BLAS such as libsunperf to be used with the internal LAPACK has been removed. If you have such a library you may now need to use --with-lapack. It is no longer possible to use some older versions of libsunperf, e.g. Forte 7 on 64-bit builds.

• A substitute for mkdtemp is provided, so it is now always used for R_TempDir.

• Most of the functions checked for by 'configure' also have declarations checked for in the appropriate header.

• The top-level documentation files AUTHORS, COPYING.LIB, COPYRIGHTS, FAQ, RESOURCES, THANKS have been moved to doc, and COPYING and NEWS are installed there. The file Y2K has been removed from the distribution.

• The extension .lo is no longer used in building R (only in the optional build of libRmath.so): this allows a considerable simplification of the Makefiles.

• Direct support for f2c has been removed: it can still be used via a script which makes it look like a Fortran compiler. (src/scripts/f77_f2c is an example of such a script.)

• There is a new flag SAFE_FFLAGS which is used for the compilation of dlamc.f. It is set by configure for known problem cases (recent g77 and gfortran), but can be overridden by the user.


• The standard autoconf macros for large-file support are now used, and these are enabled unless --disable-largefile is specified. This replaces --enable-linux-lfs (and is now selected by default).

• Visibility attributes are used where supported (gcc4/gfortran on some platforms, also gcc3/g77 on FC3 and partially elsewhere). The main benefit should be faster loading (and perhaps better optimized code) in some of the dynamic shared objects (e.g. libR.so and stats.so).

• The *PICFLAGS are taken to be -fpic rather than -fPIC where possible. This will make no difference on most platforms: -fPIC is needed on Sparc (and still used there), but -fpic should give slightly better performance on PowerPC (although -fPIC is used on PPC64 as it is needed to build libR.so there).

• More use is made of inlining for small utility functions such as isReal. Because this can only be done portably with C99 constructs (and we know of no actual implementation), this is only done for the GNU C compiler.

• There is an experimental feature to allow shared installations of sub-architectures. See the R-admin manual.

• All platforms now use R's internal implementation of strptime, which allows fractional seconds. (The major platforms were already using it.)

• The dlcompat work-around for old Mac OS X systems (<= 10.2) has been removed. External dlcompat must be installed if needed.

Utilities

• R CMD check now uses an install log by default.

• R CMD check works for packages whose package name is different from the directory name in which it is located.

• R CMD INSTALL now uses more randomness in the temporary directory name even on systems without mktemp -d.

• R CMD f77 has been removed now that f2c is no longer supported.

• The version string shown in the startup message and by "R --version", and that stored in variable R.version.string, are now in exactly the same format.

• The base name of a help file needs to be valid as part of a file:// URL, so R CMD check now checks that the names are ASCII and do not contain %.

• R CMD check now warns about unknown sections in Rd files, and invalid names for help, demo and R files, as well as unlikely file names in the 'src' directory. The latter is controlled by option --check-subdirs and by default is done if checking a tarball without a configure script.
  R CMD build excludes invalid files in the 'man', 'R' and 'demo' subdirectories.

• \usepackage[noae]{Sweave} in the header of an Sweave file suppresses auto-usage of the ae package ("almost European" fonts) and T1 input encoding.

Documentation

• Rd format now allows \var{} markup inside \code{} and \examples{}.

• Markup such as --, ---, < and > is handled better when converting .Rd files to [C]HTML.

• There is new markup \link[=dest]{name} to generate a link to topic 'dest' which is shown as 'name', and \linkS4class{abc} which expands to \link[=abc-class]{abc}, for cross-referencing the recommended form of documentation for S4 classes.

Package Installation

• There is now some support for Fortran 90/95 code in packages: see 'Writing R Extensions'.

• Installation of man sources and demos is now done by R code. The restrictions on the names of help files, R files and of demos are now enforced (see 'Writing R Extensions').

• Packages that contain compiled code can now have more than one dot in their name even on Windows.

• The Meta/hsearch.rds database saved now contains LibPath="". This information is now always recreated when help.search() is run, but the field is retained for back-compatibility.

• update.packages() now has a '...' argument to be passed to install.packages(), including the formerly separate arguments 'destdir' and 'installWithVers'.

• Make macros AR and RANLIB are now declared in etc/Makeconf for use by packages that wish to make static libraries.


C-level facilities

• qgamma and rgamma in Rmath.h now check for non-positive arguments.

• The BLAS that ships with R now contains the complete set of double-complex BLAS routines, rather than just those used in R.
  <R_ext/BLAS.h> has been corrected to add the missing double-precision BLAS functions drotmg and drotm, and to exclude lsame (which is a Lapack auxiliary function and is now declared in <R_ext/Lapack.h>). It also includes the double complex routines added for this release of R, provided Fortran doublecomplex is usable on the platform.

• <R_ext/BLAS.h> and <R_ext/Lapack.h> now declare all the entry points as 'extern'.

• The flag SAFE_FFLAGS is made available to packages via etc/Makeconf and R CMD config. It can be used where optimization needs to be defeated, e.g. in LAPACK setup.

• getNativeSymbolInfo has a withRegistrationInfo argument which causes the address field to be a reference to the registration information if it is available for that symbol. If the registration information is not available, the address is a reference to the native symbol. The default is FALSE, which is backward compatible, returning just the address of the symbol and ignoring registration information.

• errorcall and warningcall are now declared in <R_ext/Error.h> (they might be needed in front-ends).

• R_FlushConsole and R_ProcessEvents are now declared in <...>.

• The R_Sock* functions supporting socket connections are no longer declared in R-ftp-http.h as they are not loaded into R itself, and are now hidden in the module's DLL on suitable systems.

Bug fixes

• Quoted arguments to the R script after --args are now passed quoted to the R executable and so will be shown as expected by commandArgs(). (They were previously split at whitespace even inside quotes on Unix-alikes but not on Windows.)

• axis() now supports pars 'xaxp'/'yaxp' as in-line arguments.

• sort() now does not return inappropriate attributes such as "dim" and "tsp": it only returns names. sort(x, partial=) no longer returns unsorted names, and drops names (since it is supplied for efficiency).

• Use of non-central F in pf() gives accurate values for larger ncp.

• R CMD build --binary does a better job of cleaning up after failure to re-make vignettes.

• reg-test-1.R tested system(intern=TRUE), which depends on popen and so is not supported on all platforms.

• Changed apparent mis-spelling of "Gibraltar" in dataset 'eurodist'.

• sysconf() is now used to find the number of clock ticks/second: under some circumstances glibc reported CLK_TCK = 60 when the true value was 100.

• identical() was not allowing for embedded nuls in character strings. (NB: the comparison operators including == do not, and never will.)

• The profile() and profiler() methods for "nls" objects now support algorithm = "plinear" and algorithm = "port".

• The signal handlers for signals USR1 and USR2 were not restored if the signal arrived when interrupts were suspended.

• Certain combinations of S4 inheritance could cause inherited methods to override some directly specified methods.

• Some cases of named signatures in calls to setMethod() caused errors.

• all.equal() is now more consistent and "picky" about mismatching attributes, in particular names(); this is a part of the propositions by Andy Piskorski (PR#8191).

• load() when applied to a connection leaves it open/not as it found it, and checks explicitly for having a binary readable connection.

• The p-values given by stat.anova() (called from several anova() methods) are now NA (rather than spurious) if non-nested models give rise to changes in deviance with a different sign from changes in degrees of freedom.

• Built-ins were reported as the relevant call in C-level error()s iff R profiling was in progress. Now they are never reported.

• Too-long signatures (with no names) were not being caught in setMethod().


• Slot names in prototype() are being more thoroughly checked.

• signif() is more likely to follow the 'round to even' rule for exactly representable numbers, e.g. signif(0.25, 1). (Related to PR#8452.)

• nls() now works correctly with some low-dimensional fits, e.g. with one or zero non-linear parameters.

• glm() could give an inappropriate error message if all possible coefficients were invalid (e.g. a log-linear binomial model with no intercept and a not all positive predictor).

• solve() gives clearer error messages for some incorrect usages. (PR#8494 and similar)

• The gaussian() family was missing the 'valideta' component (which could be needed for the "inverse" link function).
  The starting values supplied by the gaussian family could be invalid for the "log" and "inverse" link functions. This is now reported.

• data.matrix() did not work correctly on zero-row data frames. (PR#8496 and other problems.)

• The DSC comments in the files from postscript(onefile=FALSE) now label all files as having page 1 of 1, as some other software seems to expect that.

• The axis labels chosen for logarithmic axes are now less likely to be linear and inappropriate (when the range is more than 10 and less than 100). (PR#1235)

• Staircase lines (types "s" and "S") are now drawn continuously rather than a point at a time, and so line types, mitring and so on work. (PR#2630)

• Calling par(mfg) before doing any plotting resulted in NewPage never being called on the device, which in turn resulted in incorrect output for postscript() and pdf() devices. (Reported by Marc Schwartz in discussion of the non-bug PR#7820.)

• terms.formula needed to add parentheses to formulae with terms containing '|'. (PR#8462)

• pbirthday() and qbirthday() now also work for very improbable events, those you are typically *not* interested in.

• Only source help files starting with an upper- or lower-case letter or digit and extension .Rd or .rd are documented to be processed. This is more liberal in that starting with a digit is now also allowed, but the rule is now enforced.

• nls(algorithm="port") was always taking positive numeric differences and so could exceed the upper bounds.

• methods:::.asEnvironmentPackage() was not allowing for versioned installs.

• .find.package() now reports which package(s) it cannot find in case it stops with an error.

• The standard Unix-alike version of file.show() gives an informative message if it cannot open a file rather than the (possibly incorrect) 'NO FILE'.

• window() did not allow non-overlapping ranges with extend = TRUE. (PR#8545)

• pbinom(size = 0) now returns correct values (not NaN). (PR#8560) (This and some neighbouring fixes are checked in the sketch below.)

• [dp]binom(x, *) for x < 0 now always returns 0. (PR#8700) Analogous change in pgeom(), pnbinom() and ppois().

• [dqpr]geom() and [dpqr]nbinom() now all consistently accept prob = 1 but not prob = 0. qgeom(prob=1) now gives the correct values (not -1).

• INSTALL on Unix-alikes was not loading dependent packages when preparing for lazy-loading.

• qcauchy(1) now gives +Inf instead of just a very large number.

• df(0, f1, *) now properly returns Inf, 1, or 0 for f1 < , = , or > 2.

• qbinom(), qnbinom() and qpois() now use a better search and normally reach the answer very quickly when it is large (instead of being slow or infinite-looping).

• pt(x, df) lost accuracy in the far tails (when |x| > 1e154) for small df (like df = 0.001, for which such extremes are not unlikely).

• dbeta(x, a, b) underflowed internally and incorrectly gave 0 for very small x and a.

• None of the warnings about convergence failures or loss of precision in nmath (distribution and special functions) were being reported to the R user.

• dnt was missing from standalone nmath (under Unix-alikes).

• split() now accepts factors with numeric (but not storage mode integer) codes.
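Several of the distribution fixes above are easy to verify (a sketch; expected values as stated in the items):

    pbinom(0, size = 0, prob = 0.5)    # 1, no longer NaN
    dbinom(-1, size = 3, prob = 0.5)   # 0 for x < 0
    qgeom(0.5, prob = 1)               # 0, no longer -1
    qcauchy(1)                         # Inf, not merely a very large number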


• The utilities such as 'check' now report active version numbers again, as SVN 'last changed revision' numbers.

• addmargins() did not accept a name for 'FUN', only an expression.

• '+' for POSIXt objects now takes the tzone from whichever object has it, so date+x is the same as x+date if x is numeric.

• mean.default() and var() compute means with an additional pass and so are often more accurate, e.g. the variance of a constant vector is (almost) always zero and the mean of such a vector will be equal to the constant value to machine precision. (PR#1228) (See the sketch below.)
  sum(), prod(), mean(), rowSums() and friends use a long double accumulator where available and so may be more accurate. (This is particularly helpful on systems such as Sparc and AMD64 where long double gives considerably greater exponent range and precision than double.)

• read.dcf() now gives a warning on malformed lines.

• add1.[g]lm now try harder to use the environment of the formula in the original fit to look for objects such as the 'data' and 'subset' arguments.

• gaussian()$aic was inconsistent with e.g. the lm results from AIC() and extractAIC() for weighted fits: it treated the weights as case weights and not variance factors.

• system() on Unix-alikes ignored non-logical values of 'intern' and treated 'intern = NA' as true.

• as.table() now produces non-NA rownames when converting a matrix of more than 26 rows. (PR#8652)

• Partial sorting used an algorithm that was intended only for a few values of 'partial' and so could be far slower than a full sort. It now switches to a barebones full sort for more than 10 values of 'partial' and uses a more efficient recursive implementation for 2...10.

• summary.glm() returned an estimate of dispersion of Inf for a gaussian glm with zero residual degrees of freedom and then treated that as a known value. It now uses the estimate NaN, which is consistent with summary.lm().

• Sys.sleep() on Unix-alikes was restricted to about 2147 seconds and otherwise might never have returned. (PR#8678)

• is(obj, Cl) could wrongly report TRUE when Cl was a classUnion and multiple inheritance was involved.

• confint[.lm / .default] used label "100 %" for level = 0.999.

• Empty entries (i.e., extraneous ",") in NAMESPACE files now give a better error message early at parsing time instead of a less comprehensible one later at load time.

• all.equal(n1, n2) could erroneously return NA when n1, n2 contained large integers.

• anova.mlm() didn't handle multi-df effects properly in the single-model case. (PR#8679)

• anova.mlm() had its colnames mangled by data.frame() (needed check.names=FALSE).

• summary.glm() gave an NA estimate of dispersion for fits with zero weights. (PR#8720)

• qhyper() had too small a tolerance for right-continuity on some platforms, so was not always an inverse to phyper().

• rownames<-.data.frame() tested the length(s) of the replacement value(s) before coercion, which can change the length (e.g. for class "POSIXlt"). Ditto dimnames<-.data.frame().

• max() and min() ignored the largest/smallest representable integer, as well as Inf/-Inf. (PR#8731)

• write.table() assumed factors had integer codes: it now allows malformed factors with numeric codes (and otherwise throws an error).

• Worked around a Solaris restriction which meant that Sys.sleep() was only effective for times of up to one second.

• sink(, split=TRUE) now works correctly, but is allowed only on platforms that support va_copy or __va_copy. (PR#8716)

• factanal(), prcomp() and princomp() now only check that columns in the model frame that will be used are numeric (they previously also checked columns which were part of negative terms in the formula).

• Misuse of $ in apply could corrupt memory. (PR#8718)

• apply() could fail if the function returned NULL (e.g. if there was a single row).

• registerS3method() failed due to a typo. (It was almost never used.)
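The improved accuracy of mean() and var() noted above is easy to see (a sketch):

    x <- rep(1e10 + pi, 1e5)    # a constant vector with a large offset
    var(x)                      # (almost) exactly zero with the extra pass
    mean(x) - (1e10 + pi)       # essentially zero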


• Registering an S3 method for an S3 generic in another package that was converted to an S4 generic in the same package as the S3 method registered the method in the wrong place.

• Recall() used lookup for the function in use and so could fail if that was an S3 method not on the search path.

• Rdconv -t Ssgm failed if it encountered \link[opt]{arg}.

• uniroot() did not give a warning (as documented) if it failed to converge in 'maxiter' steps. (PR#8751)

• eapply() (and as.list.environment()) did not work for the base environment/namespace. (PR#8761)

• logLik.nls() assumed that sigma^2 had been estimated, but did not count this in the 'df' attribute.

• expand.grid() was constructing a data frame 'by hand' and so setting integer row.names (which are documented to be character). It now sets character row names, and row.names.data.frame() coerces to character.

• qbeta() used == on volatile doubles for its convergence test, which failed with gcc 3.3.x on ix86 Linux. We now use a less fragile test (and lose a negligible amount of accuracy).

• ls.str() was missing inherits=FALSE, and so could have reported on an object of the same name but a different mode in the enclosure of the given environment.

• Added protection in configure against systems for which using xmkmf fails to report a C or C++ compiler.

Changes on CRAN by Kurt Hornik the procedure is computationally fast, a level- dependent cross validation can be performed for wavelet shrinkage of various data such as a New contributed packages data with correlated errors. By Donghoh Kim and Hee-Seok Oh. BayesTree Software accompanying the paper “Bayesian Additive Regression Trees” by DescribeDisplay Produce publication quality Chipman, George and McCulloch (2005), see graphics from output of GGobi’s describe dis- http://gsbwww.uchicago.edu/fac/robert. play plug-in. By , Di Cook, mcculloch/research. By Hugh Chipman and and Andreas Buja. Robert McCulloch. FLCore Contains the core classes and methods for BayesValidate Implements the software validation FLR, a framework for fisheries modeling and method described in the paper “Validation of management strategy simulation in R. Devel- Software for Bayesian Models using Posterior oped by a team of fisheries scientists in various Quantiles” (Cook, Gelman, and Rubin, 2005). countries. More information can be found at It inputs a function to perform Bayesian in- http://flr-project.org/, including a devel- ference as well as functions to generate data opment mailing list. By the FLR Team and var- from the Bayesian model being fit, and repeat- ious contributors. Initial design by Laurence T. edly generates and analyzes data to check that Kell & Philippe Grosjean. the Bayesian inference program works prop- erly. By Samantha Cook. FactoMineR Exploratory data analysis. By François Husson, Sébastien Lê, and Jérémy Mazet. CTFS The CTFS Large Plot Forest Dynamics Analy- FortranCallsR Teaches how to implement Fortran ses. By Pamela Hall. Code calling R functions. By Diethelm Wuertz.

CVThresh Carries out a level-dependent cross- FracSim Perform simulation of one- and two- validation method for the selection of a thresh- dimensional fractional and multifractional olding value in wavelet shrinkage. This pro- Lévy motions. By S. Déjean and S. Cohen. cedure is implemented by coupling a con- ventional cross validation with an imputation FunCluster Performs a functional analysis of mi- method due to a limitation of data length, a croarray expression data based on Gene On- power of 2. It can be easily applied to classical tology & KEGG functional annotations. From leave-one-out and k-fold cross validation. Since expression data and functional annotations

R News ISSN 1609-3631 Vol. 6/2, May 2006 65

FunCluster builds classes of putatively co- PK Estimation of pharmacokinetic parameters. By regulated biological processes through a spe- Martin J. Wolfsegger and Thomas Jaki. cially designed clustering procedure. By Cor- QCA Performs the Quine-McCluskey algorithm neliu Henegar. for Qualitative Comparative Analysis, as de- GammaTest A suite of Gamma test analysis tools. scribed in “The Comparative Method. Mov- By Samuel E. Kemp. ing beyond qualitative and quantitative strate- gies” by Charles C. Ragin, Berkeley: University GroupSeq Performing computations related to of California Press, 1987. It currently handles group sequential boundaries. The computa- about 8 conditions and one outcome. While tions are done via the alpha sending approach theoretically it could handle more conditions, i.e., interim analyses need not to be equally it requires a lot of computer resources and spaced, and their number need not to be spec- is memory hungry; future versions will have ified in advance. It is appropriate for any trial more functions to address this problem, as well based on normally distributed test statistics as functions for fuzzy-set QCA. The package with independent increments, survival studies, doesn’t currently handle missing values in the and certain longitudinal designs. Among other data, therefore it is not yet possible to perform things it computes critical boundaries for var- simplifying assumptions. By Adrian Dusa. ious spending functions and for prespecified power and drift. Confidence intervals are also RBloomberg Fetch data from Bloomberg. By Robert obtained. The package provides this in a GUI Sams. and users have the option of graphical output RWeka An R interface to Weka (Version 3.4.7). Weka of the results and/or saving designs into ‘.html’ is a collection of machine learning algorithms file tables allowing further processing. By Ro- for data mining tasks written in Java, con- man Pahl. taining tools for data pre-processing, classifi- cation, regression, clustering, association rules, HSAUR Functions, data sets, analyses and exam- and visualization. Both the R interface and ples from the book “A Handbook of Statisti- Weka itself are contained in the RWeka pack- cal Analyses Using R” by Brian S. Everitt and age. For more information on Weka see Torsten Hothorn, Chapman & Hall/CRC, 2006. http://www.cs.waikato.ac.nz/ml/weka/. By The first chapter of the book, which is enti- Kurt Hornik, with contributions from Christian tled “An Introduction to R”, is completely in- Buchta, Torsten Hothorn, Alexandros Karat- cluded in this package, for all other chapters, zoglou, David Meyer, and Achim Zeileis. a vignette containing all data analyses is avail- able. By Brian S. Everitt and Torsten Hothorn. RcppTemplate Package template illustrating the use of the Rcpp R/C++ interface class library. By JointGLM Joint modeling of mean and dispersion Dominick Samperi. through two interlinked GLM’s. By Mathieu Ribatet and Bertrand Iooss. Rmdr R-Multifactor Dimensionality Reduction (MDR), a nonparametric and genetic model- LoopAnalyst Tools to conduct Levins’ Loop Anal- free alternative to logistic regression for detect- ysis. Loop analysis makes qualitative predic- ing and characterizing nonlinear interactions tions of variable change in a system of causally among discrete genetic and environmental at- interdependent variables, where “qualitative” tributes. By Mounir Aout, with contributions means sign only (i.e., increases, decreases, non from C. Wachter. change, and ambiguous). 
This implementation includes output support for graphs in ‘.dot’ StoppingRules Stopping rules for microarray clas- file format for use with visualization software sifiers. By Wenjiang J. Fu et al. (functions) and such as graphviz (http://www.graphviz.org). M. T. Mader (packaging). Loop Analyst provides tools for the construc- VDCutil The VDC system is an open source digital tion and output of community matrices, com- library system for quantitative data. This pack- putation and output of community effect ma- age supports on-line analysis using VDC and trices, tables of correlations, adjoint, absolute Zelig, accuracy, and R2HTML. By Micah Alt- feedback, weighted feedback and weighted man and Akio Sone. prediction matrices, and feedback, path and loop enumeration tools. By Alexis Dinno. actuar Collection of functions related to actuarial science applications, namely credibility theory LowRankQP Routines and documentation for solv- and risk theory, for the moment. The package ing quadratic programming problems where also includes the famous Hachemeister (1975) the hessian is represented as the product of two data set. By Vincent Goulet and Sébastien Au- matrices. By J. T. Ormerod and M. P. Wand. clair.

aplpack Functions for drawing some special plots: stem and leaf, bagplot, Chernoff faces, and inspection of a 3-dimensional point cloud. By Peter Wolf.

blockrand Create randomizations for block random clinical trials. Can also produce a PDF file of randomization cards. By Greg Snow.

calibrate Draw calibrated scales with tick marks on (non-orthogonal) variable vectors in scatterplots and biplots. By Jan Graffelman.

classInt Choose univariate class intervals for mapping or other graphics purposes. By Roger Bivand.

clusterRepro Validate microarray clusters via reproducibility. By Amy Kapp and Rob Tibshirani.

cocorresp Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximizes the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities. Original Matlab routines by C.J.F. ter Braak and A.P. Schaffers; R port by Gavin L. Simpson. Function simpls is based on simpls.fit from package pls by Ron Wehrens and Bjørn-Helge Mevik.

coxrobust Robust fitting of the proportional hazards regression model. By Tadeusz Bednarski and Filip Borowicz.

crq Quantile regression for randomly censored data. By Stephen Portnoy, with contributions from Tereza Neocleous and Roger Koenker.

cwhmisc A bundle of miscellaneous functions by Christian W. Hoffmann. Contains packages cwhmath, cwhplot, cwhprint, cwhstring, cwhstat, and cwhtool.

data.table Data frames without rownames. The white book specifies that data frames must have rownames. This package defines a new class, data.table, which operates just like a data.frame but uses up to 10 times less memory and can be up to 10 times faster to create (and copy). It also takes the opportunity to allow subset()- and with()-like expressions for subscripting (a short sketch follows this page's listing). By Matt Dowle.

denpro Provides tools to visualize (1) multivariate density functions and density estimates with level set trees, (2) level sets with shape trees, (3) multivariate data with tail trees, (4) scales of multivariate density estimates with mode graphs and branching maps, and (5) anisotropic spread with 2D volume functions and 2D probability content functions. With level set trees one visualizes mode structure, with shape trees one visualizes shapes of level sets of unimodal densities, and with tail trees one visualizes connected data sets. The kernel estimator is implemented, but the package may be applied for visualizing other density estimates, which have to be setwise constant. By Jussi Klemelä.

diveMove Functions to filter and summarize time-depth recorder (TDR) data, and miscellaneous functions for handling location data. By Sebastian P. Luque.

doBy Facilities for groupwise computations. By Søren Højsgaard.

drc Non-linear regression analysis for multiple curves with focus on concentration-response, dose-response and time-response curves used, for example, in environmental sciences, pharmacology, toxicology and weed science. By Christian Ritz and Jens Strebig.

ecodist Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community data. By Sarah Goslee and Dean Urban.

femmeR Plot and summarize results calculated by the modeling environment FEMME (Soetaert, 2002). By Henrik Andersson, Andreas Hofmann and Karline Soetaert.

financial Time value of money, cash flows and other financial functions. By Lukasz Komsta.

fuzzyRankTests Fuzzy rank tests and confidence intervals. By Charles J. Geyer.

gamair Data sets used in the book "Generalized Additive Models: An Introduction with R" by Simon Wood, CRC, 2006. By Simon Wood.

ggplot Grammar-of-graphics based plots for R. See http://had.co.nz/ggplot/ for more information. By Hadley Wickham.

glmpath A path-following algorithm for L1-regularized generalized linear models and the Cox proportional hazards model. By Mee Young Park and Trevor Hastie.

grnnR Synthesizes a generalized regression neural network from the supplied training data, P(atterns) and T(argets). By Arnold Arrington.

gsubfn Miscellaneous string utilities. By G. Grothendieck.
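The sketch promised under the data.table entry above. Nothing beyond what that entry states is assumed, namely that data.table() constructs an object that behaves like a data.frame:

  library(data.table)
  dt <- data.table(id = 1:6, group = rep(c("a", "b"), 3), y = rnorm(6))
  class(dt)              # the new class, behaving like a data.frame
  dt[dt$group == "a", ]  # data.frame-style subscripting works as usual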

R News ISSN 1609-3631 Vol. 6/2, May 2006 67 haplo.ccs Estimates haplotype and covariate rela- respect to the product moments. This package tive risks in case-control data by weighted is oriented around the FORTRAN algorithms logistic regression. Diplotype probabilities, of J.R.M. Hosking, and the nomenclature for which are estimated by EM computation with many of the functions parallels that of the progressive insertion of loci, are utilized as Hosking library. Numerous additional features weights. By Benjamin French and Thomas are added to aid in extension of the breadth of Lumley. L-moment application. Much theoretical exten- sion of L-moment theory has occurred in recent hddplot Use known groups in high-dimensional years. E.A.H. Elamir and A.H. Seheult have data to derive scores for plots. Cross-validated developed the trimmed L-moments, which are linear discriminant calculations determine the implemented in this package. Further, recent optimum number of features. Test and training developments by Robert Serfling and Peng scores from successive cross-validation steps Xiao have extended L-moments into multivari- determine, via a principal components calcu- ate space; the so-called sample L-comoments lation, a low-dimensional global space onto are implemented here. The supported dis- which test scores are projected, in order to plot tributions with moment type shown as L (L- them. Further functions are included for didac- moments) or TL (trimmed L-moments) include tic purposes. By John Maindonald. the Cauchy(TL), Exponential(L), Gamma(L), hdrcde Computation of highest density regions in Generalized Extreme Value(L), Generalized one and two dimensions and kernel estimation Lambda(L/TL), Generalized Logistic (L), Gen- of univariate density functions conditional on eralized Normal(L), Generalized Pareto(L/TL), one covariate. By Rob Hyndman. Gumbel(L), Normal(L), Kappa(L), Pearson Type III(L), and Wakeby(L). By William H. hybridHclust Hybrid hierarchical clustering via Asquith. mutual clusters. By Hugh Chipman and Rob Tibshirani, with tsvq code originally from longitudinal General data structures and func- Trevor Hastie. tions for longitudinal data with multiple vari- igraph Routines for creating and manipulating ables, repeated measurements, and irregularly graphs, and graph visualization. It can handle spaced time points. It also implements a graphs with millions of vertices and edges. By shrinkage estimator of dynamical correlation Gabor Csardi. and dynamical covariance. By Rainer Opgen- Rhein and Korbinian Strimmer. km.ci Computes various confidence intervals for the Kaplan-Meier estimator, namely: Petos CI, lsa Latent Semantic Analysis (LSA). The basic idea Rothman CI, CI’s based on Greenwoods vari- of LSA is that texts do have a higher order (la- ance, Thomas and Grunkemeier CI and the si- tent semantic) structure which, however, is ob- multaneous confidence bands by Nair and Hall scured by word usage (e.g., through the use of and Wellner. By Ralf Strobl. synonyms or polysemy). By using conceptual knnFinder Finds the p number of near neighbors for indices that are derived statistically via a trun- every point in a given data set in O(M log M) cated singular value decomposition (a two- time. By Samuel E. Kemp. mode factor analysis) over a given document- term matrix, this variability problem can be kzft Functions implementing Kolmogorov- overcome. By Fridolin Wild. Zurbenko Fourier transform based peri- odograms and smoothing methods. 
By Wei lspls Implements the LS-PLS (least squares — par- Yang and Igor Zurbenko. tial least squares) method described in for in- lmomco Implements the statistical theory of L- stance Jørgensen, K., Segtnan, V. H., Thyholt, moments including L-moment estimation, K., Næs, T. (2004) A Comparison of Methods probability-weighted moment estimation, pa- for Analysing Regression Models with Both rameter estimation for numerous familiar and Spectral and Designed Variables. Journal of not-so-familiar distributions, and L-moment Chemometrics, 18(10), 451–464. By Bjørn-Helge estimation for the same distributions from the Mevik. parameters. L-moments are derived from the expectations of order statistics and are linear mapLD Measures linkage disequilibrium and con- with respect to the probability-weighted mo- structs haplotype blocks using the method de- ments. L-moments are directly analogous to scribed in Gabriel et al (2002) and Wall & the well-known product moments; however, L- Prichard (2003). By Peter Hu and Jared Lunce- moments have many advantages including un- ford, with contributions from Xiang Yu, Bret biasedness, robustness, and consistency with Musser and Peggy Wong.
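The base-R illustration promised under the lsa entry above. It deliberately uses only svd() rather than the lsa package's own functions (whose names the entry does not give); rows are terms, columns are documents, and the rank-2 reconstruction lets the synonyms "car" and "automobile" share weight across documents:

  tdm <- matrix(c(1, 1, 0,    # "car"
                  1, 0, 0,    # "automobile", a synonym of "car"
                  0, 1, 1,    # "flower"
                  0, 0, 1),   # "petal"
                nrow = 4, byrow = TRUE,
                dimnames = list(c("car", "automobile", "flower", "petal"),
                                c("doc1", "doc2", "doc3")))
  s <- svd(tdm)          # two-mode factor analysis of the term-document matrix
  k <- 2                 # truncate: keep only the k largest singular values
  latent <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
  round(latent, 2)       # smoothed matrix reveals the latent structure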

monreg Estimates monotone regression and variance functions in a nonparametric model. By Kay Pilz and Steffanie Titoff, with earlier developments by Holger Dette and Kay Pilz.

nltm Implements nonlinear transformation models (proportional odds, Gamma frailty, proportional hazards, ...) for survival analysis; see Tsodikov (2003), "Semiparametric models: a generalized self-consistency approach", Journal of the Royal Statistical Society B, 65, Part 3, 759–774. By Gilda Garibotti and Alexander Tsodikov.

numDeriv Accurate numerical derivatives. By Paul Gilbert.

nws Coordination and parallel execution facilities, as well as limited cross-language data exchange, using the netWorkSpaces server developed by Scientific Computing Associates, Inc. By Nick Carriero, with support and contributions from Gregory R. Warnes.

pbatR A frontend to the PBAT software, automatically reading in the output from the pbat program and displaying the corresponding figure when appropriate (i.e., PBAT-logrank). Includes support for multiple processes. By Christoph Lange (PBAT) and Thomas Hoffmann (R interface).

pcaPP Robust PCA by Projection Pursuit. By Peter Filzmoser, Heinrich Fritz, and Klaudius Kalcher.

polyapost Generate dependent samples from a non-full-dimensional polytope via a Markov chain sampler. By Glen Meeden and Radu Lazar.

portfolio Classes for analyzing and implementing portfolios. By Jeff Enos and David Kane.

pwr Power analysis functions along the lines of Cohen (1988). By Stéphane Champely.

quantregForest Quantile Regression Forests, a tree-based ensemble method for estimation of conditional quantiles. It is particularly well suited for high-dimensional data. Predictor variables of mixed classes can be handled. By Nicolai Meinshausen.

rJava Low-level interface to a Java VM, very much like .C/.Call and friends. Allows creation of objects, calling methods and accessing fields (a short example follows this page's listing). By Simon Urbanek.

rake Raking a survey data set entails re-weighting a sample by making the sample marginal totals agree with the population marginal totals for two survey response variables. Raking is a robust technique that is often useful for dealing with nonresponse. This package streamlines the process of raking by creating the special rake class, which is essentially a summary of the sample weights (a small base-R illustration of raking itself follows this page's listing). By Toby Dylan Hocking.

rankreg Obtains the rank regression estimator for the AFT model with right-censored data. Tests of a given value of the regression coefficient and a re-sampling variance estimator can also be computed. By Mai Zhou; Splus original of aft.fun by Jin Zhezhen.

rda Shrunken Centroids Regularized Discriminant Analysis for classification in high-dimensional data. By Yaqian Guo, Trevor Hastie, and Robert Tibshirani.

relaimpo Provides several metrics for assessing relative importance in linear models. These can be printed, plotted and bootstrapped. The recommended metric is lmg, which provides a decomposition of the model explained variance into non-negative contributions. There is a version of this package available that additionally provides a new and also recommended metric called pmvd. If you are a non-US user, you can download this extended version from Ulrike Groemping's web site. By Ulrike Groemping.

rggobi An interface from R to ggobi for programmatic dynamic, interactive visualization. By Duncan Temple Lang, Debby Swayne, Hadley Wickham, and Michael Lawrence.

riv Finds a robust instrumental variables estimator using a high-breakdown-point S-estimator of multivariate location and scatter. By Beat Kaufmann, with contributions from R. H. Zamar and G. V. Cohen-Freue.

roblm Robust regression estimators. By Matias Salibian-Barrera.

robustbase "Essential" robust statistics. The goal is to provide tools for analyzing data with robust methods, including regression methodology and multivariate statistics, where we strive to cover the upcoming book "Robust Statistics: Theory and Methods" by Maronna, Martin and Yohai (Wiley, 2006). By Valentin Todorov, Andreas Ruckstuhl, Matias Salibian-Barrera, and Martin Maechler, based on code by many authors, notably Peter Rousseeuw and Christophe Croux; see file 'Copyrights'.

rrp Random Recursive Partitioning method for matching, missing-data imputation, and nonparametric classification and prediction. By S. M. Iacus.
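The small illustration promised under the rake entry above. It shows the raking idea itself (iterative proportional fitting) in base R and does not touch the rake package's class, whose interface the entry does not spell out:

  w <- matrix(c(30, 20, 25, 25), nrow = 2)  # weighted counts, two binary variables
  row.pop <- c(60, 40)                      # known population row totals
  col.pop <- c(45, 55)                      # known population column totals
  for (i in 1:25) {                         # alternate until both margins agree
    w <- w * (row.pop / rowSums(w))         # scale rows to population totals
    w <- sweep(w, 2, col.pop / colSums(w), "*")  # then scale columns
  }
  rowSums(w)  # now matches row.pop
  colSums(w)  # now matches col.pop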

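The example promised under the rJava entry above, exercising object creation and method calls. The functions .jinit(), .jnew() and .jcall() are rJava's core interface, though their exact behaviour in this early version is an assumption:

  library(rJava)
  .jinit()                                 # start the Java VM
  s <- .jnew("java/lang/String", "hello")  # create a Java object
  .jcall(s, "I", "length")                 # call a method: returns 5L
  .jcall(s, "Ljava/lang/String;", "toUpperCase")  # returns "HELLO"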
rtiff Read TIFF format images and return them as a pixmap object. Because the resulting object can be very large for even modestly sized TIFF images, images can be reduced as they are read for improved performance. This package is a wrapper around libtiff (http://www.libtiff.org), on which it depends. By using libtiff's high-level TIFFReadRGBAImage function, the package inherently supports a wide range of image formats and compression schemes. It also provides an implementation of the Ridler autothresholding algorithm for easy generation of binary masks. By Eric Kort.

rv Simulation-based random variable object class. By Jouni Kerman.

scope Calculate, per data frame row, a value that depends on information in a relevant subset of rows and columns. These functions create and refine scope objects, which identify relevant rows on a per-row basis. Columns can be aggregated within relevant scopes to aid identification of a row of interest, from which an arbitrary column value can be selected. By Tim Bergsma.

scuba Dive profiles, decompression models and gas calculations for scuba diving. By Adrian Baddeley.

seas Capable of deriving seasonal statistics, such as "normals", and analysis of seasonal data, such as departures. The package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters and bars for summed normals. There are many functions specific to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation interarrivals. However, the package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology. By M. W. Toews.

seewave Functions for analyzing, manipulating, displaying, editing and synthesizing time waves (particularly sound). The package handles time analysis (oscillograms and envelopes), spectral content, resonance quality factor, cross-correlation and autocorrelation, zero-crossing, dominant frequency, and 2D and 3D spectrograms. By Jérôme Sueur, Thierry Aubin and Caroline Simonis-Sueur.

smatr Provides methods for fitting bivariate lines in allometry using the major axis (MA) or standardised major axis (SMA), and for making inferences about such lines. The available methods of inference include confidence intervals and one-sample tests for slope and elevation, tests for a common slope or elevation amongst several allometric lines, construction of a confidence interval for a common slope or elevation, and tests for no shift along a common axis amongst several samples. By David Warton, translated to R by John Ormerod.

spgwr Functions for computing geographically weighted regressions, based on work by Chris Brunsdon, Martin Charlton and Stewart Fotheringham (http://ncg.nuim.ie/ncg/GWR/index.htm). By Roger Bivand and Danlin Yu.

ssanv Functions to calculate sample size for two-sample difference-in-means tests. Performs adjustments for either nonadherence or the variability that comes from using data to estimate parameters. By Michael Fay.

sudoku Generates, plays, and solves Sudoku puzzles. The GUI playSudoku() needs package tkrplot if you are not on Windows (a usage line follows this page's listing). By David Brahm and Greg Snow, with contributions from Curt Seeliger and Henrik Bengtsson.

surveillance A framework for the development and evaluation of outbreak detection algorithms in routinely collected public health surveillance data. Currently contains an implementation of the procedures described in Stroup et al. (1989) and Farrington et al. (1996), a Bayesian approach, and the method used at the Robert Koch Institute, Germany. Contains several real-world data sets and the ability to simulate outbreak data. By M. Höhle, C. Lang, and A. Riebler.

tcltk2 A series of widgets (themed controls, tktable, combobox, multi-column list, etc.) and various functions (under Windows: DDE exchange, access to the registry and icon manipulation) to supplement tcltk. By Philippe Grosjean.

tgp Bayesian semiparametric and nonstationary regression by treed Gaussian processes with jumps to the limiting linear model (LLM). Special cases implemented include Bayesian linear models, linear CART, and stationary separable and isotropic Gaussian process regression. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. By Robert B. Gramacy.

truncgof Goodness-of-fit tests and some adjusted exploratory tools allowing for left-truncated data. By Thomas Wolter.

tsfa Extraction of factors from multivariate time series. By Paul Gilbert and Erik Meijer.
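The usage line promised under the sudoku entry above, using only the function named there:

  library(sudoku)  # off Windows, also install tkrplot for the GUI
  playSudoku()     # opens the interactive puzzle GUI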

twang Functions for propensity score estimating and weighting, nonresponse weighting, and diagnosis of the weights. By Greg Ridgeway, Dan McCaffrey, Andrew Morral.

untb Utilities for biodiversity data. Includes the simulation of ecological drift under Hubbell's Unified Neutral Theory of Biodiversity, and the calculation of various diagnostics such as Preston curves. By Robin K. S. Hankin.

wccsom SOM networks for comparing patterns with peak shifts. By Ron Wehrens.

Other changes

• Packages fdim and sound were resurrected from the Archive.

• Packages Malmig, nlrq and nprq were moved from the main CRAN section to the Archive.

Kurt Hornik
Wirtschaftsuniversität Wien, Austria
[email protected]

R Foundation News

by Kurt Hornik

Donations and new members

Donations

Gordon Blunt (UK)
BC Cancer Agency (Canada)
David Kinniburgh (UK)
Richard Leeuw (USA)

New benefactors

Numbers International Pty Ltd, Australia
Institute of Mathematical Statistics, USA

New supporting institutions

Department of Statistics, University of California at Los Angeles, USA
Department of Statistics, Brigham Young University, USA
Max-Planck-Institut für demographische Forschung (MPI), Rostock, Germany
Center für digitale Systeme, Freie Universität Berlin, Germany

New supporting members

Axel Benner (Germany)
Gordon Blunt (UK)
Christopher Brown (USA)
Seth Falcon (USA)
Jutta Gampe (Germany)
Jerome Goudet (Switzerland)
Philippe Grosjean (Belgium)
Leopoldo E. Guzman (USA)
Lorenz Gygax (Switzerland)
Matthias Kohl (Germany)
Ralph Leonhardt (Germany)
A.I. McLeod (Canada)
Katharine Mullen (Netherlands)
Christophe Pouzat (France)
Michael H. Prager (USA)
Manel Salamero (Spain)
Keith Satherley (Australia)
Christof Schramm (Germany)
Gordon Smyth (Australia)
Arthur J. Stock (Canada)
Alejandro Veen (USA)
Matthew Wilkes (UK)

Kurt Hornik
Wirtschaftsuniversität Wien, Austria
[email protected]

News from the Bioconductor Project

by Seth Falcon

On April 27, 2006 the Bioconductor project released version 1.8, designed for the 2.3 release series of R. This release brings 35 newly contributed packages, for a total of 173 packages. The sustained increase in the number of contributed packages demonstrates that the ideas of "publishing software, not just papers about software" and "reproducible research" have been adopted by many bioinformaticians around the world.

To help navigate the growing collection of packages, we have implemented a categorization system, inspired by Achim Zeileis' and Kurt Hornik's ctv package, called biocViews. The views generated by biocViews integrate Bioconductor's three main package repositories (software, annotation data, and experiment data) and provide users with assistance in finding packages appropriate for their analysis needs. You can see the views for yourself by browsing to http://bioconductor.org/packages/1.8/BiocViews.html. The views in which a given package appears are determined by the biocViews field in the package's 'DESCRIPTION' file; developers can control the categorization of their packages by setting the values of this field.

Another new addition with the 1.8 release is a collection of microarray annotation support packages for Affymetrix GeneChips. These cdf and probe packages are produced by the Molecular and Behavioral Neuroscience Institute (MBNI) at the University of Michigan. These new packages are fundamentally different from standard packages based on the Affymetrix probe set designations: each probe sequence is blasted against the current UniGene build and then filtered using a particular annotation source such as GenBank, Entrez Gene, ENSEMBL, or Tigr. Only those probes that blast to a unique genomic sequence are retained, so each probe set interrogates only one transcript. There are also packages designed for special situations, such as analyzing chimpanzee samples with human chips, and packages with 3'-biased probe sets for analyzing samples that may contain degraded mRNA. A word of caution is perhaps in order: the results obtained using the new packages will differ from those obtained with the standard packages, and despite the advantage of more up-to-date annotation source data, it is not clear that the results based upon the new cdf packages should be preferred. For more information, visit the MBNI web page at http://brainarray.mhri.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp.

Looking towards the 1.9 release, the core team will be working on improvements that allow Bioconductor to work with larger datasets, and on integrating new core classes for representing data from array-based technologies such as SNP and tiling arrays.

To this end, we have begun a redesign of our annotation data packages to allow them to use SQLite rather than R environments as the underlying data storage mechanism. This will provide additional flexibility, a slight decrease in disk use, and a significantly smaller memory footprint. The SQLite-based annotation packages will have a compatibility layer that will allow most programs that work with the current annotation packages to continue to function without change. However, we recommend that interested users familiarize themselves with the RSQLite package.

Many of our users are unable to process their microarray data on their current hardware due to memory limitations, and we anticipate that these problems will increase as more investigators attempt to process larger numbers of arrays simultaneously and as high-throughput technologies like SNP chips and tiling arrays become more popular. To address these issues, we will be working on bounded-memory algorithms. As a starting point, we will implement a microarray normalization routine that allows processing of an arbitrary number of arrays while requiring memory for just three arrays.

Seth Falcon
Fred Hutchinson Cancer Research Center
[email protected]

Forthcoming Events: DSC 2007

The fifth conference on Directions in Statistical Computing will take place in Auckland, New Zealand, on February 15 and 16, 2007. The conference will deal with future directions in (open source) statistical computing and graphics.

Call for Papers

We invite abstracts on topics presenting innovations or exciting applications of statistical computing.

A web page offering more information on DSC 2007 and on-line registration is available at http://www.stat.auckland.ac.nz/dsc-2007/. Please send all abstracts to [email protected].

We hope to see you in Auckland!

The organizing committee:
Paul Murrell, Ross Ihaka, David Scott, and Thomas Yee.
[email protected]


Editor-in-Chief:
Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland

Editorial Board:
Torsten Hothorn and John Fox.

Editor Programmer's Niche:
Bill Venables

Editor Help Desk:
Uwe Ligges

Email of editors and editorial board:
[email protected]

R News is a publication of the R Foundation for Statistical Computing; communications regarding this publication should be addressed to the editors. All articles are copyrighted by the respective authors. Please send submissions to regular columns to the respective column editor, and all other submissions to the editor-in-chief or another member of the editorial board (more detailed submission instructions can be found on the R homepage).

R Project Homepage:
http://www.R-project.org/

This newsletter is available online at
http://CRAN.R-project.org/doc/Rnews/
