INVITED SECTION:THE FUTUREOF R 9

Collaborative Software Development Using R-

Special invited paper on “The Future of R” Forge, host a project-specific website, and how pack- age maintainers submit a package to the Compre- by Stefan Theußl and Achim Zeileis hensive R Archive Network (CRAN, http://CRAN. R-project.org/). Finally, we summarize recent de- velopments and give a brief outlook to future work. Introduction

Open source software (OSS) is typically created in a R-Forge decentralized self-organizing process by a commu- nity of developers having the same or similar inter- R-Forge offers a central platform for the develop- ests (see the famous essay by Raymond, 1999). A ment of R packages, R-related software and other key factor for the success of OSS over the last two projects. decades is the Internet: Developers who rarely meet R-Forge is based on GForge (Copeland et al., face-to-face can employ new means of communica- 2006) which is an open source fork of the 2.61 Source- tion, both for rapidly writing and deploying software Forge code maintained by Tim Perdue, one of the (in the spirit of Linus Torvald’s “release early, release original SourceForge authors. GForge has been mod- often paradigm”). Therefore, many tools emerged ified to provide additional features for the R commu- that assist a collaborative software development pro- nity, namely a CRAN-style repository for hosting de- cess, including in particular tools for source code velopment releases of R packages as well as a quality management (SCM) and . management system similar to that of CRAN. Pack- In the R world, SCM is not a new idea; in fact, ages hosted on R-Forge are provided in source form the R Development Core Team has always been us- as well as in binary form for Mac OS X and Windows. ing SCM tools for the R sources, first by means of They can be downloaded from the website of the cor- Concurrent Versions System (CVS, see Cederqvist et responding project on R-Forge or installed directly in al., 2006), and then via Subversion (SVN, see Pilato R; for a package foo, say, install.packages("foo", et al., 2004). A central repository is hosted by ETH repos = "http://R-Forge.R-project.org"). Zürich mainly for managing the development of the On R-Forge, developers organize their work in base R system. Mailing lists like R-help, R-devel and so-called “Projects”. Every project has various tools many others are currently the main communication and web-based features for software development, channels in the R community. communcation and other services. All features men- Also beyond the base system, many R con- tioned in the following sections are accessible via so- tributors employ SCM tools for managing their R called “Tabs”: e.g., user accounts can be managed in packages, e.g., via web-based SVN repositories like the My Page tab or a list of available projects can be SourceForge (http://SourceForge.net/) or Google displayed using the Project Tree tab. Code (http://Code.Google.com/). However, there Since starting the platform in early 2007, more has been no central SCM repository providing ser- and more interested users registered their projects on vices suited to the specific needs of R package de- R-Forge. Now, after more than two years of develop- velopers. Since early 2007, the R-project offers such ment and testing, around 350 projects and more than a central platform to the R community. R-Forge 900 users are registered on R-Forge. This and the (http://R-Forge.R-project.org/) provides a set of steadily growing list of feature requests show that tools for source code management and various web- there is a high demand for centralized source code based features. It aims to provide a platform for management tools and for releasing prototype code collaborative development of R packages, R-related frequently among the R community. software or further projects. R-Forge is closely re- In the next three sections, we summarize the core lated to the most famous of such platforms—the features of R-Forge and what R-Forge offers to the R world’s largest OSS development website—namely community in terms of collaborative development of http://SourceForge.net/. R-related software projects. The remainder of this article is organized as fol- lows. First, we present the core features that R- Source code management Forge offers to the R community. Second, we give a hands-on tutorial on how users and developers can When carrying out software projects, source files get started with R-Forge. In particular, we illustrate change over time, new files get created and old files how people can register, set up new projects, use R- deleted. Typically, several authors work on several Forge’s SCM facilities, provide their packages on R- computers on the same and/or different files and

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859 10 INVITED SECTION:THE FUTUREOF R keeping track of every change can become a tedious velopers so that they can improve the package to task. In the open source community, the general so- pass all tests on R-Forge and subsequently on CRAN. lution to this problem is to use version control, typi- Second, bug tracking systems allow users to no- cally provided by the majority of SCM tools. For this tify package authors about problems they encounter. reason R-Forge utilizes SVN to facilitate the devel- In the spirit of OSS—given enough eyeballs, all oper’s work when creating software. bugs are shallow (Raymond, 1999)—peer code re- A central repository ensures that the developer view leads to an overall improvement of the quality always has access to the current version of the of software projects. project’s source code. Any of the authorized col- laborators can “check out” (i.e., download) or “up- Additional features date” the project file structure, make the necessary changes or additions, delete files from the current re- A number of further tools, of increased interest for vision and finally “commit” changes or additions to larger projects, help developers to coordinate their the repository. More than that, SVN keeps track of work and to communicate with their user base. the complete history of the project file structure. At These tools include: any point in the development stage it is possible to go back to any previous stage in the history to inspect • Project websites: Developers may present their and restore old files. This is called version control, as work on a subdomain of R-Forge, e.g., http: every stage automatically is assigned a unique ver- //foo.R-Forge.R-project.org/, or via a link sion number which increases over time. to an external website. On R-Forge such a version-controlled repository foo-commits@ is automatically created for each project. To get • Mailing lists: By default a list lists.R-Forge.R-project.org started, the project members just have to install the is automati- client of their choice (e.g., Tortoise SVN on Windows cally created for each project. Additional mail- or svnX on Mac OS X) and check out the repository. ing lists can be set up easily. In addition to the inherent backup of every version • Project categorization: Administrators may within the repository a backup of the whole reposi- categorize their project in several so-called tory is generated daily. “Trove Categories” in the Admin tab of their A rights management system assures that, by de- project (under Trove Categorization). For each fault, anonymous users have read access and devel- category three different items can be selected. opers have write access to the data associated with a This enables users to quickly find what they are certain project on R-Forge. More precisely, registered looking for using the Project Tree tab. users can be granted one of several roles: e.g., the “Administrator” has all rights including the right to • News: Announcements and other information add new users to the project or release packages di- related to a project can be put on the project rectly to CRAN. He/she is usually the package main- summary page as well as on the home page of tainer, the project leader or has registered the project R-Forge. The latter needs approval by one of originally. Other members of a project typically have the R-Forge administrators. All items are avail- either the role “Senior Developer” or “Junior Devel- able as RSS feeds. oper” which both have permission to commit to the project SVN repository and examine the log files in • Forums: Discussion forums can be set up sepa- the R Packages tab (the differences between the two rately by the project administrators. roles are subtle, e.g., senior developers additionally have administrative privileges in several places in the project). When we speak of developers in sub- How to get started sequent sections we refer to project members having the rights at least of a junior developer. This section is intended to be a hands-on tutorial for new users. Depending on familiarity with the sys- tems/tools involved the instructions might be some- Release and quality management what brief. In case of problems we recommend con- sulting the user’s manual (R-Forge Administration Development versions of a software project are typ- and Development Team, 2008) which contains de- ically prototypes and are subject to many changes. tailed step-by-step instructions. Thus, R-Forge offers two tools which assist the devel- When accessing the URL http://R-Forge. opers in improving the quality of their source code. R-project.org/ the home page is presented (see First, it offers a quality management system sim- Figure1). Here one can ilar to that of CRAN. Packages on R-Forge are checked in a standardized way on different plat- • login, forms based on R CMD check at least once daily. The resulting log files can be examined by the project de- • register a user or a project,

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859 INVITED SECTION:THE FUTUREOF R 11

Figure 1: Home page of R-Forge on 2009-03-10

• download the documentation, Registering a project

There are two possibilities to register a project: Click- • examine the latest news and changes, ing on Register Your Project on the home page or going to the My Page tab and clicking on Register Project (see • go to a specific project website either by search- Figure2 below the main tabs). Both links lead to a ing for available projects (top middle of the form which has to be filled out in order to finish the page), by clicking on one of the projects listed registration process. In the text field “Project Pub- on the right, or by going through the listing in lic Description” the registrant is supposed to enter a the Project Tree tab. concise description of the project. This text and the “Project Full Name” will be shown in several places Registered users can access their personal page on R-Forge (see e.g., Figure1). The text entered in the via the tab named My Page. Figure2 shows the per- field “Project Purpose And Summarization” is addi- sonal page of the first author. tional information for the R-Forge administrators, in- spected for approval. “Project Name” refers to the name which uniquely determines the project. In Registering as a new user the case of a project that contains a single R pack- age, the project Unix name typically corresponds to To use R-Forge as a developer, one has to register as the package name (in its lower-case version). Restric- a site user. A link on the main web site called New tions according to the Unix file system convention Account on the top right of the home page leads to force Unix names to be in lower case (and will be the corresponding registration form. converted automatically if they are typed in upper After submitting the completed form, an e-mail is case). sent to the given address containing a link for activat- After the completed form is submitted, the ing the account. Subsequently, all R-Forge features, project has to be approved by the R-Forge admin- including joining existing or creating new projects, istrators and a confirmation e-mail is sent to the are available to logged-in users. registrant upon approval. After that, the regis-

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859 12 INVITED SECTION:THE FUTUREOF R

Figure 2: The My Page tab of the first author trant automatically becomes the project administra- if the package is already version-controlled some- tor and the standardized web area of the project where else—the corresponding parts of the reposi- (http://R-Forge.R-project.org/projects/foo/) is tory including the history can be migrated to R-Forge immediately available on R-Forge. This web area in- (see Section 6 of the user’s manual). cludes a Summary page, an Admin tab visible only to The SCM tab of a project explains how the project administrators, and various other tabs de- the corresponding SVN repository located at pending on the features enabled for this project in the svn://svn.R-Forge.R-project.org/svnroot/foo Admin tab. To present the new project to a broader can be checked out. From this URL the sources are community the name of the project additionally is checked out either anonymously without write per- promoted on the home page under “Recently Reg- mission (enabled by default) or as developer using istered Projects” (see Figure1). an encrypted authentication method, namely secure Furthermore, within an hour after approval a de- shell (ssh). Via this secure protocol (svn+ssh:// fol- fault mailing list and the project’s SVN repository lowed by the registered user name) developers have containing a ‘README’ file and two pre-defined di- full access to the repository. Typically, the developer rectories called ‘pkg’ and ‘www’ are created. The con- is authenticated via the registered password but it is tent of these directories is used by the R-Forge server also possible to upload a secure shell key (updated for creating R packages and a project website, respec- hourly) to make use of public/private key authenti- tively. cation. Section 3.2 of the user’s manual explains this process in more detail. SCM and R packages To make use of the package builds and checks the package source code has to be put into the ‘pkg’ The first step after creation of a project is typically directory of the repository (i.e., ‘pkg/DESCRIPTION’, to start generation of content for one (or more) R ‘pkg/R’, ‘pkg/man’, etc.) or, alternatively, a subdirec- package(s) in the ‘pkg’ directory. Developers can ei- tory of ‘pkg’. The latter structure allows developers to ther start committing changes via SVN as usual or— have more than one package in a single project; e.g.,

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859 INVITED SECTION:THE FUTUREOF R 13 if a project consists of the packages foo and bar, then Recent and future developments the source code is located in ‘pkg/foo’ and ‘pkg/bar’, respectively. In this section, we briefly summarize the major R-Forge automatically examines the ‘pkg’ direc- changes and updates to the R-Forge system during tory of every repository and builds the package the last few months and give an outlook to future sources as well as the package binaries on a daily developments. basis for Mac OS X and Windows (if applicable). Recently added features and major changes in- The package builds are provided in the R Pack- clude: ages tab (see Figure3) for download or can be in- stalled directly in R using install.packages("foo", • New defaults for freshly registered projects: repos="http://R-Forge.R-project.org"). Further- Only the tabs Lists, SCM and R packages are en- more, in the R Packages tab developers can examine abled initially. Forums, Tracker and News can logs of the build and check process on different plat- be activated separately (in order not to over- forms. whelm new users with too many features) us- To release a package to CRAN the project ad- ing Edit Public Info in the Admin tab of the ministrator clicks on Submit this package to CRAN in project. Experienced users can decide which the R Packages tab. Upon confirmation, a message features they want and activate them. will be sent to [email protected] and the latest successfully built source package (the ‘.tar.gz’ file) • An enhanced structure in the SVN repository is automatically copied to ftp://CRAN.R-project. allowing multiple packages in a single project org/incoming/. Note that packages are built once (see above). daily, i.e., the latest source package does not include • The R package RForgeTools (Theußl, 2009) more recent code committed to the SVN repository. contains platform-independent package build- Figure3 shows the R Packages tab of the tm ing and quality management code used on the http://R-Forge.R-project.org/projects/ project ( R-Forge servers. tm/) as one of the project administrators would see it. Depending on whether you are a member of the • A modified News submit page offering two project or not and your rights you will see only parts types of submissions: project-specific and of the information provided for each package. global news. The latter needs approval by one of the R-Forge administrators (default: project- Further steps only submission). A customized project website, accessible through • Circulation of SVN commit messages can be http://foo.R-Forge.R-project.org/ where foo enabled in the SCM Admin tab by project ad- corresponds to the unix name of the project, is man- ministrators. The mailing list mentioned in the aged via the ‘www’ directory. The website gets up- text field is used for delivering the SVN commit dated every hour. messages (default: off). The changes made to the project can be exam- • Mailing list search facilities provided by the ined by entering the corresponding standardized Swish-e engine which can be accessed via the web area. On entry, the Summary page is shown. List tab (private lists are not included in the Here, one can search index). • examine the details of the project including a Further features and improvements which are short description and a listing of the adminis- currently on the wishlist or under development in- trators and developers, clude • follow a link leading to the project homepage, • a , • examine the latest news announcements (if available), • task management facilities,

• go to other sections of the project like Forums, • a re-organized tracker more compatible with R Tracker, Lists, R Packages, etc. package development and,

• follow the download link leading directly to • an update of the underlying GForge sys- the available packages of the project (i.e., the tem to its successor FusionForge (http:// R Packages tab). FusionForge.org/).

Furthermore, meta-information about a project For suggestions, problems, feature requests, and can be supplied in the Admin tab via so-called “Trove other questions regarding R-Forge please contact Categorization”. [email protected].

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859 14 INVITED SECTION:THE FUTUREOF R

Figure 3: R Packages tab of the tm project

Acknowledgments 2004. Full book available online at http:// svnbook.red-bean.com/. Setting up this project would not have been possi- ble without Douglas Bates and the University of Wis- R-Forge Administration and Development Team. R- consin as they provided us with a server for hosting Forge User’s Manual, 2008. URL http://download. this platform. Furthermore, the authors would like R-Forge.R-project.org/R-Forge.pdf. to thank the Computer Center of the Wirtschaftsuni- E. S. Raymond. The Cathedral and the Bazaar. versität Wien for their support and for providing us Knowledge, Technology, and Policy, 12(3):23–49, 1999. with additional hardware as well as a professional server infrastructure. S. Theußl. RForgeTools: R-Forge Build and Check Tools, 2009. URL http://R-Forge.R-project.org/ projects/site/ Bibliography . R package version 0.3-3.

P. Cederqvist et al. Version Management with CVS. Stefan Theußl Network Theory Limited, Bristol, 2006. Full Department of Statistics and Mathematics book available online at http://ximbiot.com/ WU Wirtschaftsuniversität Wien cvs/manual/. Austria [email protected] T. Copeland, R. Mas, K. McCullagh, T. Per- due, G. Smet, and R. Spisser. GForge Man- ual, 2006. URL http://gforge.org/gf/download/ Achim Zeileis docmanfileversion/8/1700/gforge_manual.pdf. Department of Statistics and Mathematics WU Wirtschaftsuniversität Wien C. M. Pilato, B. Collins-Sussman, and B. W. Fitz- Austria patrick. Version Control with Subversion. O’Reilly, [email protected]

The R Journal Vol. 1/1, May 2009 ISSN 2073-4859