The R Journal Volume 5/1, June2013
Total Page:16
File Type:pdf, Size:1020Kb
The Journal Volume 5/1, June 2013 A peer-reviewed, open-access publication of the R Foundation for Statistical Computing Contents Editorial..................................4 Contributed Research Articles RTextTools: A Supervised Learning Package for Text Classification.........6 Generalized Simulated Annealing for Global Optimization: The GenSA Package... 13 Multiple Factor Analysis for Contingency Tables in the FactoMineR Package.... 29 Hypothesis Tests for Multivariate Linear Models Using the car Package....... 39 osmar: OpenStreetMap and R......................... 53 ftsa: An R Package for Analyzing Functional Time Series............. 64 Statistical Software from a Blind Person’s Perspective............... 73 PIN: Measuring Asymmetric Information in Financial Markets with R....... 80 QCA: A Package for Qualitative Comparative Analysis.............. 87 An Introduction to the EcoTroph R Package: Analyzing Aquatic Ecosystem Trophic Networks.................................. 98 stellaR: A Package to Manage Stellar Evolution Tracks and Isochrones....... 108 Let Graphics Tell the Story - Datasets in R.................... 117 Estimating Spatial Probit Models in R...................... 130 ggmap: Spatial Visualization with ggplot2.................... 144 mpoly: Multivariate Polynomials in R..................... 162 beadarrayFilter: An R Package to Filter Beads.................. 171 Fast Pure R Implementation of GEE: Application of the Matrix Package....... 181 RcmdrPlugin.temis, a Graphical Integrated Text Mining Solution in R....... 188 Possible Directions for Improving Dependency Versioning in R.......... 197 Translating Probability Density Functions: From R to BUGS and Back Again..... 207 News and Notes Conference Review: The 6th Chinese R Conference................ 210 Conference Report: R/Finance 2013...................... 212 The R User Conference 2013.......................... 215 2 News from the Bioconductor Project...................... 218 R Foundation News............................. 220 Changes in R................................ 221 Changes on CRAN.............................. 239 The R Journal Vol. 5/1, June 2013 ISSN 2073-4859 3 The R Journal is a peer-reviewed publication of the R Foundation for Statistical Computing. Communications regarding this publication should be addressed to the editors. All articles are licensed under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0, http://creativecommons.org/licenses/by/3.0/). Prospective authors will find detailed and up-to-date submission instructions on the Journal’s homepage. Editor-in-Chief: Hadley Wickham Editorial Board: Martyn Plummer, Deepayan Sarkar and Bettina Grün R Journal Homepage: http://journal.r-project.org/ Email of editors and editorial board: [email protected] The R Journal is indexed/abstracted by EBSCO, DOAJ, and Thomson Reuters. The R Journal Vol. 5/1, June 2013 ISSN 2073-4859 4 Editorial by Hadley Wickham I’m very pleased to published my first issue of the R Journal as editor. As well as this visible indication of my working, I’ve also been hard at work behind the scenes to modernise some of the code that powers the R Journal. This includes: • The new one-column style that you’ll see in this issue. We changed because the R Journal is an electronic journal, and one column is much easier to view on screen. It also makes it much easier to intermingle code and text, which is so important for the R Journal. We also made a few tweaks to the typography which will hopefully make it a little more pleasant to view. Love it? Hate it? Please let me know and we may do another round of smaller tweaks for the next issue. • I’ve also written an R package to manage the submission process. You won’t notice this directly as a reader or author, but it should help us to keep track of articles and make sure we respond in a timely manner. • The R Journal website is now built using jekyll, a modern static site generator. You shouldn’t notice any difference, but it gives us a bit more freedom to introduce new features. In this issue We have a bumper crop of articles. I’ve roughly categorised them below. Visualisation: •“ osmar: OpenStreetMap and R” by Manuel J. A. Eugster and Thomas Schlesinger • “Let Graphics Tell the Story - Datasets in R” by Antony Unwin, Heike Hofmann and Dianne Cook •“ ggmap: Spatial Visualization with ggplot2” by David Kahle and Hadley Wickham Text mining: • “RTextTools: A Supervised Learning Package for Text Classification” by Timothy P. Jurka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman and Wouter van Atteveldt • “RcmdrPlugin.temis, a Graphical Integrated Text Mining Solution in R” by Milan Bouchet-Valat and Gilles Bastin Modelling: • “Multiple Factor Analysis for Contingency Tables in the FactoMineR Package” by Belchin Kostov, Mónica Bécue-Bertaut and François Husson • “Hypothesis Tests for Multivariate Linear Models Using the car Package” by John Fox, Michael Friendly and Sanford Weisberg • “Estimating Spatial Probit Models in R” by Stefan Wilhelm and Miguel Godinho de Matos • “Fast Pure R Implementation of GEE: Application of the Matrix Package” by Lee S. McDaniel, Nicholas C. Henderson and Paul J. Rathouz •“ ftsa: An R Package for Analyzing Functional Time Series” by Han Lin Shang The R Journal Vol. 5/1, June 2013 ISSN 2073-4859 5 The R ecosystem: • “Statistical Software from a Blind Person’s Perspective” by A. Jonathan R. Godfrey • “Possible Directions for Improving Dependency Versioning in R” by Jeroen Ooms • “Translating Probability Distributions: From R to BUGS and Back Again“ by David LeBauer, Michael Dietze, and Ben Bolker And a range of papers applying R to diverse subject areas: • “PIN: Measuring Asymmetric Information in Financial Markets with R” by Paolo Zagaglia • “Generalized Simulated Annealing for Global Optimization: The GenSA Package” by Yang Xiang, Sylvain Gubian, Brian Suomela and Julia Hoeng • “QCA: A Package for Qualitative Comparative Analysis” by Alrik Thiem and Adrian Du¸sa • “An Introduction to the EcoTroph R Package: Analyzing Aquatic Ecosystem Trophic Networks” by Mathieu Colléter, Jérôme Guitton and Didier Gascuel • “stellaR: A Package to Manage Stellar Evolution Tracks and Isochrones” by Matteo Dell’Omodarme and Giada Valle •“ mpoly: Multivariate Polynomials in R” by David Kahle • “The beadarrayFilter: An R Package to Filter Beads“, by Anyiawung Chiara Forcheh, Geert Verbeke, Adetayo Kasim, Dan Lin, Ziv Shkedy, Willem Talloen, Hinrich W.H. Goehlmann, and Lieven Clement. Hadley Wickham, RStudio [email protected] The R Journal Vol. 5/1, June 2013 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 6 RTextTools: A Supervised Learning Package for Text Classification by Timothy P. Jurka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman, and Wouter van Atteveldt Abstract Social scientists have long hand-labeled texts to create datasets useful for studying topics from congressional policymaking to media reporting. Many social scientists have begun to incorporate machine learning into their toolkits. RTextTools was designed to make machine learning accessible by providing a start-to-finish product in less than 10 steps. After installing RTextTools, the initial step is to generate a document term matrix. Second, a container object is created, which holds all the objects needed for further analysis. Third, users can use up to nine algorithms to train their data. Fourth, the data are classified. Fifth, the classification is summarized. Sixth, functions are available for performance evaluation. Seventh, ensemble agreement is conducted. Eighth, users can cross-validate their data. Finally, users write their data to a spreadsheet, allowing for further manual coding if required. Introduction The process of hand-labeling texts according to topic, tone, and other quantifiable variables has yielded datasets offering insight into questions that span social science, such as the representation of citizen priorities in national policymaking (Jones et al., 2009) and the effects of media framing on public opinion (Baumgartner et al., 2008). This process of hand-labeling, known in the field as “manual coding”, is highly time consuming. To address this challenge, many social scientists have begun to incorporate machine learning into their toolkits. As computer scientists have long known, machine learning has the potential to vastly reduce the amount of time researchers spend manually coding data. Additionally, although machine learning cannot replicate the ability of a human coder to interpret the nuances of a text in context, it does allow researchers to examine systematically both texts and coding scheme performance in a way that humans cannot. Thus, the potential for heightened efficiency and perspective make machine learning an attractive approach for social scientists both with and without programming experience. Yet despite the rising interest in machine learning and the existence of several packages in R, no package incorporates a start-to-finish product that would be appealing to those social scientists and other researchers without technical expertise in this area. We offer to fill this gap through RTextTools. Using a variety of existing R packages, RTextTools is designed as a one-stop-shop for conducting supervised learning with textual data. In this paper, we outline a nine-step process, discuss the core functions of the program, and demonstrate the use of RTextTools with a working example. The core philosophy driving RTextTools’ development is to create a package that is easy to