The R Programming Environment
Appendix A: The R Programming Environment

This is a brief introduction to R, where to get it from, and some pointers to documents that teach its general use. A list of the available packages that extend R's functionality for financial time series analysis, and which are used in this book, can be found at the end. In between, the reader will find some general programming tips for dealing with time series. Be aware that all the R programs in this book (the R Examples) can be downloaded from http://computationalfinance.lsi.upc.edu.

A.1 R, What is it and How to Get it

R is a free software version of the commercial object-oriented statistical programming language S. R is free under the terms of the GNU General Public License and it is available from the R Project website, http://www.r-project.org/, for all common operating systems, including Linux, MacOS and Windows. To download it, follow the CRAN link and select the mirror site nearest to you. An appropriate IDE for R is RStudio, which can be obtained for free at http://rstudio.org/ under a GNU free software license (there is also a link to RStudio from the CRAN). By the way, CRAN stands for Comprehensive R Archive Network.

Although R is defined at the R Project site as a “language and environment for statistical computing and graphics”, R goes well beyond that primary purpose, as it is a very flexible and extensible system, which has in fact been greatly extended by contributed “packages” that add further functionality to the program in all imaginable areas of scientific research. And it is continuously growing. In the realm of Econometrics and Quantitative Finance there is a large number of contributed packages, where one can find all the necessary functions for the analysis of financial instruments. Many of these packages are used for the financial analytics experiments performed throughout the book, and so it is necessary to install them in order to reproduce all the R Examples.
There are instructions on how to do this in the next section.

A. Arratia, Computational Finance, Atlantis Studies in Computational Finance and Financial Engineering 1, DOI: 10.2991/978-94-6239-070-6, © Atlantis Press and the authors 2014

The best place to start learning R is the home page of the R Project. Once there, go to Manuals and download An Introduction to R; afterwards, go to Books and browse through the large list of R-related literature. Each package also comes with interactive Help detailing the usage of each function, together with examples.

A.2 Installing R Packages and Obtaining Financial Data

After downloading and installing R, tune it up with the contributed packages for doing serious financial time series analysis and econometrics listed in Sect. A.4. Except for RMetrics, to install a package XX, where XX stands for the code name within brackets in the package references list, enter the following command from the R console:

    install.packages("XX")

and once installed, to load it into your R workspace, type:

    library(XX)

For example, to install and load the package quantmod type:

    install.packages("quantmod")
    library(quantmod)

RMetrics is a suite of packages for financial analytics. It contains all the packages prefixed with f (e.g. fArma, fGarch, etc.) and others; each of the f packages can be installed on its own using the previous installation procedure, or, to install all the packages within the RMetrics set, type:

    source("http://www.rmetrics.org/Rmetrics.R")
    install.Rmetrics( )

Then load each package with the library( ) command (e.g., library(fArma)). For more details and tutorials on RMetrics go to www.rmetrics.org.

Installing packages from RStudio is easier: just click on the “Packages” window, then on “Install Packages” and in the pop-up window type the code name of the package. Make sure to select “Install all dependencies”.
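When working from the console rather than RStudio, the install-then-load routine described above can be wrapped so that only packages not yet present are downloaded. The following is a minimal sketch, not the book's own procedure: the helper name missing_packages and the example package list are hypothetical, and restricting installation to interactive sessions is an assumption made here to avoid unexpected downloads in batch runs.

```r
# Hypothetical helper (not part of R or of the book): returns the names in
# `pkgs` whose namespaces cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example package list; replace with the code names listed in Sect. A.4.
wanted <- c("quantmod", "xts")
to_install <- missing_packages(wanted)

# Install only what is missing, together with its dependencies, and only
# in an interactive session (an assumption of this sketch).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

Passing dependencies = TRUE plays the same role as ticking “Install all dependencies” in RStudio. After installation, each package still has to be loaded with library( ).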
To get acquainted with all existing R packages for empirical work in Finance, browse through the list of topics at the CRAN Task View in Empirical Finance: http://cran.r-project.org/web/views/Finance.html.

Sources of Financial Data

Financial data can be freely obtained from various sources. For example: Yahoo Finance (http://finance.yahoo.com/), Google Finance (www.google.com/finance), MSN Moneycentral (http://moneycentral.msn.com), and the Federal Reserve Bank of St. Louis - FRED (http://research.stlouisfed.org/fred2/). One can obtain the data manually by directly accessing these sites, or access the server through various functions built into some of the R packages. In the package quantmod there is the getSymbols method, and in Rmetrics there is the fImport package for importing market data from the internet. To obtain US companies' filings (useful for evaluating a company's business) the place to go is EDGAR at http://www.sec.gov/. Also, there are various R packages of data, or that include some data sets for self-testing of their functions. One R package of data used in this book for testing econometric models is AER; the fOptions package comes with some market data to test the portfolio analytics methods.

A.3 To Get You Started in R

Getting ready to work with R. The first thing you should do at your R console is to set your working directory. A good candidate is the directory where you want to save and retrieve all your data. Do this with setwd("<path-to-dir>"), where <path-to-dir> is the path to the directory on your computer. The next step is to load the necessary libraries with library( ).

Obtaining financial data, saving and reading.
The most frequent method that we use to get financial data from websites like Yahoo or the Federal Reserve (FRED) is

    library(quantmod)
    getSymbols("<ticker>", src='yahoo')

where <ticker> is the ticker symbol of the stock that we want (to learn the ticker, look up the stock at finance.yahoo.com). The getSymbols function retrieves financial data and converts it to xts format. This is an “extensible time-series object”, and among other amenities it allows handling the indices of the time series in Date format. We need to load library(xts) to handle xts objects. Based on our own experience, it seems that the only way to save to disk all the information contained in an xts object is with saveRDS(), which saves files in binary format. This in fact makes the data files portable across all platforms. To reload the data from your working directory into the R working environment use readRDS(). Here is a full example of obtaining the data for Apple Inc. stock from Yahoo, saving it to a working directory, whose path is in a string variable wdir, and later retrieving the data with readRDS:

    wdir = "<path-to-working-directory>"
    setwd(wdir)
    library(quantmod); library(xts)
    getSymbols("AAPL", src='yahoo')
    saveRDS(AAPL, file="Apple.rds")
    Apple = readRDS("Apple.rds")

The suffix .rds identifies the file as being of type RDS (binary) in your system. Sometimes we also get files from other sources in .csv format. In that case the file is saved to disk with the save command, and read from disk with read.csv() or read.zoo().

Some manipulations of xts objects. With the data in xts format, it can be explored by dates. Comments in R instructions are preceded by #. The command names() shows the headers of the column entries of the data set:

    appl = AAPL['2010/2012']   #consider data from 2010 to 2012
    names(appl)

The output shows the headers of the data.
These are the daily Open, High, Low, Close, Volume and Adjusted Close price:

    [1] "AAPL.Open"   "AAPL.High"   "AAPL.Low"
    [4] "AAPL.Close"  "AAPL.Volume" "AAPL.Adjusted"

We choose to plot the Adjusted Close. To access the column use $:

    plot(appl$AAPL.Adjusted, main="Apple Adj. Close")

Handling missing or impossible values in data. In R, missing values (whether character or numeric) are represented by the symbol NA (Not Available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (Not a Number). To exclude missing values from an analysis you can either delete them from the data object X with na.omit(X), or set the parameter na.rm=TRUE in the function (many functions, such as mean() and sum(), have this feature). Note that na.omit() does a listwise deletion, and this could be a problem when comparing two time series, for their lengths might not be equal. You can check the length of a list object with length(X).

This is enough to get you started. The R Examples in the book will teach you further programming tricks. Enjoy!

A.4 References for R and Packages Used in This Book

[R] R Core Team (2013). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
[AER] Kleiber, C. and Zeileis, A. (2008). Applied Econometrics with R, (Springer, New York).
[caret] Kuhn, M. and various contributors (2013). caret: Classification and Regression Training.
[e1071] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. and Leisch, F. (2012). e1071: Misc. Functions of the Dept. of Statistics (e1071), TU Wien.
[fArma] Würtz, D. et al. (2012). fArma: ARMA Time Series Modelling.
[fExoticOptions] Würtz, D. et al. (2012). fExoticOptions: Exotic Option Valuation.
[fGarch] Würtz, D. and Chalabi, Y. with contribution from Miklovic, M., Boudt, C., Chausse, P. and others (2012). fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling.
[fOptions] Würtz, D. et al. (2012). fOptions: Basics of Option Valuation.
[kernlab] Karatzoglou, A., Smola, A., Hornik, K.