The R Programming Environment

Appendix A: The R Programming Environment

This is a brief introduction to R, where to get it and some pointers to documents that teach its general use. A list of the available packages that extend R's functionality for financial time series analysis, and which are used in this book, can be found at the end. In between, the reader will find some general programming tips for dealing with time series. Be aware that all the R programs in this book (the R Examples) can be downloaded from http://computationalfinance.lsi.upc.edu.

A.1 R, What is it and How to Get it

R is a free software version of the commercial object-oriented statistical programming language S. R is free under the terms of the GNU General Public License and is available from the R Project website, http://www.r-project.org/, for all common operating systems, including Linux, MacOS and Windows. To download it, follow the CRAN link and select the mirror site nearest to you. An appropriate IDE for R is R Studio, which can be obtained for free at http://rstudio.org/ under the same GNU free software license (there is also a link to R Studio from CRAN). By the way, CRAN stands for Comprehensive R Archive Network.

Although R is defined at the R Project site as a “language and environment for statistical computing and graphics”, R goes well beyond that primary purpose: it is a very flexible and extensible system, which has in fact been greatly extended by contributed “packages” that add further functionality to the program in all imaginable areas of scientific research. And it is continuously growing. In the realm of Econometrics and Quantitative Finance there is a large number of contributed packages, where one can find all the necessary functions for the analysis of financial instruments. Many of these packages are used for the financial analytics experiments performed throughout the book, and so it is necessary to install them in order to reproduce all the R Examples.
There are instructions on how to do this in the next section.

A. Arratia, Computational Finance, Atlantis Studies in Computational Finance and Financial Engineering 1, DOI: 10.2991/978-94-6239-070-6, © Atlantis Press and the authors 2014

The best place to start learning R is the home page of the R Project. Once there, go to Manuals and download An Introduction to R; afterwards, go to Books and browse through the large list of R-related literature. Each package also comes with an interactive Help detailing the usage of each function, together with examples.

A.2 Installing R Packages and Obtaining Financial Data

After downloading and installing R, tune it up with the contributed packages for doing serious financial time series analysis and econometrics listed in Sect. A.4. Except for RMetrics, to install a package XX, where XX stands for the code name within brackets in the package references list, enter the following command from the R console:

    install.packages("XX")

and once installed, to load it into your R workspace, type:

    library(XX)

For example, to install and load the package quantmod type:

    install.packages("quantmod")
    library(quantmod)

RMetrics is a suite of packages for financial analytics. It contains all the packages prefixed with f (e.g. fArma, fGarch, etc.) and others; each of the f packages can be installed on its own using the previous installation procedure, or, to install all the packages within the RMetrics set, type:

    source("http://www.rmetrics.org/Rmetrics.R")
    install.Rmetrics()

Then load each package with the library() command (e.g., library(fArma)). For more details and tutorials on RMetrics go to www.rmetrics.org.

Installing packages from RStudio is easier: just click on the “Packages” window, then on “Install Packages” and in the pop-up window type the code name of the package. Make sure to select “Install all dependencies”.
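When several packages are needed at once, the console procedure described above can be wrapped so that only the packages not yet present are requested. The following is a minimal sketch, not part of the book's R Examples: the helper name missing_packages and the particular package list are assumptions made for illustration, and the actual installation is only attempted in an interactive session.

```r
# Hypothetical helper (not from the book): return the names of the
# packages in `pkgs` that cannot be loaded on this system.
missing_packages <- function(pkgs) {
  have <- vapply(pkgs,
                 function(p) requireNamespace(p, quietly = TRUE),
                 logical(1))
  pkgs[!have]
}

# Illustrative list of code names taken from this appendix.
needed <- c("quantmod", "xts", "fGarch")
to_install <- missing_packages(needed)
print(to_install)

# Install only what is missing, together with its dependencies.
# Guarded by interactive() so that sourcing this script in batch mode
# never triggers a download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

After installation, each package still has to be loaded with library(), exactly as described above.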
To get acquainted with all existing R packages for empirical work in Finance, browse through the list of topics at the CRAN Task View in Empirical Finance: http://cran.r-project.org/web/views/Finance.html.

Sources of Financial Data

Financial data can be freely obtained from various sources. For example: Yahoo finance (http://finance.yahoo.com/), Google finance (www.google.com/finance), MSN Moneycentral (http://moneycentral.msn.com), and the Federal Reserve Bank of St. Louis - FRED (http://research.stlouisfed.org/fred2/). One can obtain the data manually by directly accessing these sites, or access the server through various functions built into some of the R packages. In the package quantmod there is the getSymbols method, and in Rmetrics there is the fImport package for importing market data from the internet. To obtain US companies' filings (useful for evaluating a company’s business) the place to go is EDGAR at http://www.sec.gov/. Also, there are various R packages of data, or that include some data sets for self-testing of their functions. One R package of data used in this book for testing econometric models is AER; the fOptions package comes with some market data to test the portfolio analytics methods.

A.3 To Get You Started in R

Getting ready to work with R. The first thing you should do at your R console is to set your working directory. A good candidate is the directory where you want to save and retrieve all your data. Do this with setwd("<path-to-dir>"), where <path-to-dir> is the path to the directory. The next step is to load the necessary libraries with library().

Obtaining financial data, saving and reading.
The most frequent method that we use to get financial data from websites like Yahoo or the Federal Reserve (FRED) is

    library(quantmod)
    getSymbols("<ticker>", src='yahoo')

where <ticker> is the ticker symbol of the stock that we want (to learn the ticker, look up the stock at finance.yahoo.com). The getSymbols function retrieves financial data and converts it to xts format. This is an “extensible time-series object”, and among other amenities it allows handling the indices of the time series in Date format. We need to load library(xts) to handle xts objects.

Based on our own experience, it seems that the only way to save to disk all the information contained in an xts object is with saveRDS(), which saves files in binary format. This in fact makes the data files portable across all platforms. To reload the data from your working directory into the R working environment use readRDS(). Here is a full example of obtaining the data for Apple Inc. stock from Yahoo, saving it to a working directory, whose path is in a string variable wdir, and later retrieving the data with readRDS:

    wdir = "<path-to-working-directory>"
    setwd(wdir)
    library(quantmod); library(xts)
    getSymbols("AAPL", src='yahoo')
    saveRDS(AAPL, file="Apple.rds")
    Apple = readRDS("Apple.rds")

The suffix .rds identifies the file as being of type RDS (binary) in your system. Sometimes we also get files from other sources in .csv format. In that case the file is saved to disk with the save command, and read from disk with read.csv() or read.zoo().

Some manipulations of xts objects. With the data in xts format, it can be explored by dates. Comments in R instructions are preceded by #. The command names() shows the headers of the column entries of the data set:

    appl = AAPL['2010/2012']   # consider data from 2010 to 2012
    names(appl)

The output shows the headers of the data.
These are the daily Open, High, Low, Close, Volume and Adjusted Close price:

    [1] "AAPL.Open"   "AAPL.High"   "AAPL.Low"
    [4] "AAPL.Close"  "AAPL.Volume" "AAPL.Adjusted"

We choose to plot the Adjusted Close. To access the column use $:

    plot(appl$AAPL.Adjusted, main="Apple Adj. Close")

Handling missing or impossible values in data. In R, missing values (whether character or numeric) are represented by the symbol NA (Not Available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (Not a Number). To exclude missing values from analysis you can either delete them from the data object X with na.omit(X), or set the parameter na.rm=TRUE in the function (most functions have this feature). Note that na.omit() does a listwise deletion, and this could be a problem when comparing two time series, for their lengths might not be equal. You can check the length of a list object with length(X).

This is enough to get you started. The R Examples in the book will teach you further programming tricks. Enjoy!

A.4 References for R and Packages Used in This Book

[R] R Core Team (2013). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
[AER] Kleiber, C. and Zeileis, A. (2008). Applied Econometrics with R, (Springer, New York).
[caret] Kuhn, M. and various contributors (2013). caret: Classification and Regression Training.
[e1071] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. and Leisch, F. (2012). e1071: Misc. Functions of the Dept. of Statistics (e1071), TU Wien.
[fArma] Würtz, D. et al (2012). fArma: ARMA Time Series Modelling.
[fExoticOptions] Würtz, D. et al (2012). fExoticOptions: Exotic Option Valuation.
[fGarch] Würtz, D. and Chalabi, Y. with contribution from Miklovic, M., Boudt, C., Chausse, P. and others (2012). fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling.
[fOptions] Würtz, D. et al (2012). fOptions: Basics of Option Valuation.
[kernlab] Karatzoglou, A., Smola, A., Hornik, K.
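
Returning to the remarks in Sect. A.3 on missing and impossible values, the behaviour of NA, NaN, na.rm and na.omit() can be tried out with a few lines of base R. This is only a sketch: the vectors x and y are made-up numbers, not data from the book, and a data frame stands in for a pair of aligned time series.

```r
# Two made-up series of equal length, each with one missing value.
x <- c(1.0, NA, 3.0, 4.0)
y <- c(2.0, 2.5, NA, 4.5)

mean(x)                  # NA: a single missing value propagates
mean(x, na.rm = TRUE)    # mean of the non-missing values 1, 3 and 4

0 / 0                    # NaN: an impossible value
is.na(0 / 0)             # TRUE: is.na() also flags NaN

# Dropping NAs series by series keeps 3 values in each,
# but they no longer refer to the same positions.
length(na.omit(x))
length(na.omit(y))

# Listwise deletion on both series together keeps only the
# positions where x and y are both observed.
both <- na.omit(data.frame(x = x, y = y))
nrow(both)
```

Here each series keeps three values after na.omit(), yet only two positions are complete in both, which is why the two series should be cleaned together before being compared.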