Basic Introduction to R and Bioconductor

Total Page:16

File Type:pdf, Size:1020Kb

Basic Introduction to R and Bioconductor Basic Introduction to R and Bioconductor. Aedin Culhane 18th Sept 2009 ([email protected]) R (www.r-project.org) is an open source interactive computer system for visualization and analysis of statistical data. Bioconductor (www.bioconductor.org) is a project within R. The Bioconductor project is a source of many resources relevant to data analysis for high-throughput biology. Installing R and Bioconductor 1. Download R from http://www.r-project.org. The primary portal for R is http://cran.r- project.org . Many local mirrors exist and these may provide quicker downloads. R can be download precompiled for Linux, Mac OS and Windows. If you are using Linux, R can be installed using yum install. To install R simply, click on download and select the version appropriate for your operating system. If you use linux, you may be able to use (apt-get install R or yum) depending on the version of Linux you use. For the most users, this precompiled version of R is sufficient. However if you wish to write an extension package for R/Bioconductor, you need to install R from source. Please refer to the R FAQ for directions on this. Windows users will need to install unix tools/perl/latex, please refer to the excellent guide at http://www.murdoch- sutherland.com/Rtools/. There is also a more (complex) complete description on the bioconductor developers wiki (http://wiki.fhcrc.org/bioc/HowTo/Build_R_on_Windows) R Extension Packages On http://cran.r-project.org/ or the mirror you are using. To find out about contributed package please click on the link Contributed extension packages at the bottom of the main download page. This will link to a page with lots of information on contributed packages and to “CRAN Task Views”. CRAN task views, provide many categories of contributed packages and makes installation of these packages simpler. For example to install genetics packages type install.packages("ctv") library(ctv) install.views("Genetics") Note these categories contain many packages, for example this Genetics contains 29 packages. Installing Individual Contributed or Extension Packages Although a team of statisticians and programmers maintain R, many independent groups submit contributed packages. This is a very extensive resource of hundreds of programs and will not be install by default. R extensions can be installed easily. R extensions are stored in repositories. Select the package repositories using Packages - Select repositories The R repositories are: CRAN: basic R distribution CRAN (extras)- Contributed R packages. There are several hundred. Bioconductor: The Bioconductor Project produces an open source software framework that will assist biologists and statisticians working in bioinformatics, with primary emphasis on inference using DNA microarrays. A CRAN style R package repository is available via http://www.bioconductor.org/. Omegahat: The Omegahat Project for Statistical Computing provides a variety of open- source software for statistical applications, with special emphasis on web-based software, Java, the Java virtual machine, and distributed computing. A CRAN style R package repository is available via http://www.omegahat.org/R/. Select one repository, it may provide a select of CRAN mirrors, select one close by. You can now download extension packages. There are several ways to install extension packages 1. From the GUI, select Packages - Install Packages. Note only the packages from the selected repositories will be shown. 2. From command line type install.packages(“packagename”) 3. Download a compressed copy of the package from the R website and install from local zip or .tar.gz file Installing Bioconductor Bioconductor can be installed using this approach however a simpler method is to use the following commands: source("http://www.bioconductor.org/biocLite.R") biocLite() This will automatically download core bioconductor packages. Due to the number and size of Bioconductor, many useful packages will not be installed, so these can be installed as described above For more complex installs of specific subsets of packages see http://www.bioconductor.org/docs/install/ To install a specific package, simply include the package name between the brackets eg biocLite(‘biomart’) On windows, you can also use the drop down menus Packages -> install package(s) but ensure that you have first select the bioconductor repository using Packages -> select repositories. By default only CRAN is selected. To install bioconductor packages, bioconductor needs to be selected. Then select Packages -> install package(s) and choose a package from the long list. Select one or more packages. Hold down the shift key to select many packages that are consecutive or hold control to select non-continuous packages. Using a script to simplify Installation of R and Bioconductor Once you are familiar with R and Bioconductor, you may wish to script the above progress so that updates to R are simpler ;-) I use the following script, which I saved to a plain text file called bioC_install.R ### Packages for R. install.packages("ade4") install.packages("scatterplot3d") install.packages("Rcurl") install.packages("fastICA") install.packages("e1071") ### BioC Packages for R. source("http://www.bioconductor.org/biocLite.R") biocLite() biocLite("made4") biocLite("hgu133plus2.db") biocLite("biomaRt") To install these packages, I simply type command source(“./bioC_install.R”) and then go for coffee ;-) R and Bioconductor are pretty “intelligent”, if you wish to install a package that has dependencies (needs other packages) it will automatically install these. Bioconductor Packages Task Overview Bioconductor (http://www.bioconductor.org) has a really nice “task views” interface to packages in Bioconductor. To find out more about these click on Software Packages Task views are grouped into 3 categories, Software Annotation Data Experiment Data. Click on each of these clicks to explore the packages in Bioconductor. Bioconductor Task Views - Categories Software packages are sub divided into the categories: • Annotation • AssayDomains • AssayTechnologies • BiologicalDomains • Infrastructure • Bioinformatics Each contains a long list of contributed packages. For example clicking on Technology returns a list of technologies software categories • Microarray • MicrotitrePlateAssay • MassSpectrometry • SAGE • FlowCytometry • Sequencing • HighThroughputSequencing and MassSpectrometry contains a list of relevant packages: and each package page give details, help and a link to a vignette (tutorial). Under Documentation, Click on the link to the pdf file. This is the vignette which will give an overview (tutorial) and examples of work-flows for the package Bioconductor Annotation Data Packages Bioconductor contains hundreds of annotation packages. These are an incredible value resource in Bioconductor. This resource can be searched by organism, chip manufacturer, chip name etc Subviews • Organism • ChipManufacturer • CustomCDF • CustomArray • CustomDBSchema • FunctionalAnnotation • SequenceAnnotation • ChipName For example click on chip manufacturer - Agilent or Organism - Homo sapiens Note annotation packages have the ending “.db” If you click on Affymetrix, you will see each array is represented by 3 annotation packages, in addition to the .db file, there is a cdf and probe package also These contain probe annotation, chip description file (cdf) and probe sequence data (probe). The latter two are compiled from Affymetrix or manufacturer information. To install an annotation package, follow the same guidelines as package installation source("http://www.bioconductor.org/biocLite.R") biocLite("hgu133plus2.db") To find out about a package: library(hgu133plus2.db) hgu133plus2.db() Quality control information for hgu133plus2 Date built: Created: Mon Sep 25 20:14:15 2006 Number of probes: 54675 Probe number missmatch: None Probe missmatch: None Mappings found for probe based rda files: hgu133plus2ACCNUM found 54675 of 54675 hgu133plus2CHRLOC found 42738 of 54675 hgu133plus2CHR found 47342 of 54675 hgu133plus2ENTREZID found 47430 of 54675 hgu133plus2ENZYME found 4755 of 54675 hgu133plus2GENENAME found 41845 of 54675 hgu133plus2GO found 35836 of 54675 hgu133plus2MAP found 46947 of 54675 hgu133plus2OMIM found 27681 of 54675 hgu133plus2PATH found 10232 of 54675 hgu133plus2PMID found 46000 of 54675 hgu133plus2REFSEQ found 45868 of 54675 hgu133plus2SUMFUNC found 0 of 54675 hgu133plus2SYMBOL found 47391 of 54675 hgu133plus2UNIGENE found 47147 of 54675 Mappings found for non-probe based rda files: hgu133plus2CHRLENGTHS found 25 hgu133plus2ENZYME2PROBE found 792 hgu133plus2GO2ALLPROBES found 6978 hgu133plus2GO2PROBE found 5128 hgu133plus2ORGANISM found 1 hgu133plus2PATH2PROBE found 183 hgu133plus2PFAM found 33303 hgu133plus2PMID2PROBE found 130997 hgu133plus2PROSITE found 23213 Bioconductor Experiment Data Packages Experiment data package contain published data pre-prepared for Bioconductor. For example these packages include data from leukemia gene expression studies (Golub et al., 1999), colon cancer gene expression profiling (Alon et al. 1999), etc. Many of the tutorial s(vignettes) in Bioconductor use these data in exercises. What Packages Should I download? There are so many packages it can be very confusing as to what is the best selection to download. There is no one answer to this question. Bioconductor is expanding rapidly to enable analysis of many different data types for example microarray data, proteomics, sequencing, etc. So it really depends on the analysis you wish to do. There
Recommended publications
  • Analysis of Cell-Based Rnai Screens Comment Michael Boutros*, Lígia P Brás†‡ and Wolfgang Huber†
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Springer - Publisher Connector Open Access Software2006BoutrosetVolume al. 7, Issue 7, Article R66 Analysis of cell-based RNAi screens comment Michael Boutros*, Lígia P Brás†‡ and Wolfgang Huber† Addresses: *Signaling and Functional Genomics, German Cancer Research Center, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany. †EMBL - European Bioinformatics Institute, Cambridge CB10 1SD, UK. ‡Centre for Chemical and Biological Engineering, IST, Technical University of Lisbon, Av. Rovisco Pais, P-1049-001 Lisbon, Portugal. Correspondence: Michael Boutros. Email: [email protected]. Wolfgang Huber. Email: [email protected] reviews Published: 25 July 2006 Received: 27 March 2006 Revised: 7 June 2006 Genome Biology 2006, 7:R66 (doi:10.1186/gb-2006-7-7-r66) Accepted: 25 July 2006 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2006/7/7/R66 © 2006 Boutros et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. reports Analysis<p>cellHTS of cell-based is a new method RNAi screens for the analysis and documentation of RNAi screens.</p> Abstract RNA interference (RNAi) screening is a powerful technology for functional characterization of deposited research biological pathways. Interpretation of RNAi screens requires computational and statistical analysis techniques. We describe a method that integrates all steps to generate a scored phenotype list from raw data.
    [Show full text]
  • Annexes : Installation Du Logiciel R Et Des Packages R
    Annexes : Installation du logiciel R et des packages R Pr´e-requis et objectif • Aucun pr´e-requis n’est n´ecessaire.La lecture du chapitre 1 pourrait tou- tefois se r´ev´eler int´eressante. • Ce chapitre d´ecritcomment installer le logiciel R, dans sa version x (rem- placer partout dans le reste de ce document x par le num´ero de la derni`ere version disponible), sous le syst`emed’exploitation Microsoft Windows et aussi comment ajouter des packages suppl´ementaires sous Windows ou sous Linux. SECTION C.1 Installation de R sous Microsoft Windows Commencez par t´el´echarger le logiciel R (fichier R-x-win.exe o`u x est le nu- m´ero de la derni`ereversion disponible) `a l’aide de votre navigateur web usuel `a l’adresse suivante : http://cran.r-project.org/bin/windows/base/ Enregistrez ensuite ce fichier ex´ecutable sur le bureau de Windows puis double- cliquez sur le fichier R-x-win.exe dont voici l’icˆone : . Le logiciel s’installe alors et vous n’avez plus qu’`a suivre les instructions qui s’affichent et `a conserver les options propos´ees par d´efaut. Lorsque l’icˆone est ajout´ee sur le bureau, l’installation peut ˆetre consi- d´er´ee comme termin´ee. 574 Le logiciel R SECTION C.2 Installation de packages suppl´ementaires De nombreux modules (packages ou librairies) suppl´ementaires sont dispo- nibles sur le site internet : http://cran.r-project.org/web/packages/available_packages_by_name. html, ou bien encore ici : http://cran.r-project.org/bin/windows/contrib/, dans le dossier corres- pondant au num´ero x de votre version de R.
    [Show full text]
  • December 2016, Volume 34 No.2
    ISSN 0735-1348 Department of Physics, East Carolina University, 1000 East 5th Street, Greenville, NC 27858, USA http://www.ecu.edu/cs-cas/physics/ancient-timeline/ December 2016, Volume 34 No.2 IRSL dating of fast-fading sanidine feldspars from Sulawesi, Indonesia 1 Bo Li, Richard G. Roberts, Adam Brumm, Yu-Jie Guo, Budianto Hakim, Muhammad Ramli, Maxime Aubert, Rainer Grün, Jian-xin Zhao, E. Wahyu Saptomo Bayesian statistics in luminescence dating: The ‘baSAR’-model and its 14 implementation in the R package ‘Luminescence’ Norbert Mercier, Sebastian Kreutzer, Claire Christophe, Guillaume Guérin, Pierre Guibert, Christelle Lahaye, Philippe Lanos, Anne Philippe, and Chantal Tribolo RLumShiny - A graphical user interface for the R Package ‘Luminescence’ 22 Christoph Burow, Sebastian Kreutzer, Michael Dietze, Margret C. Fuchs, Manfred Fischer, Christoph Schmidt, Helmut Brückner Thesis abstracts 33 Bibliography 36 Ancient TL Started by the late David Zimmerman in 1977 EDITOR Regina DeWitt, Department of Physics, East Carolina University, Howell Science Complex, 1000 E. 5th Street Greenville, NC 27858, USA; Tel: +252-328-4980; Fax: +252-328-0753 ([email protected]) EDITORIAL BOARD Ian K. Bailiff, Luminescence Dating Laboratory, Univ. of Durham, Durham, UK ([email protected]) Geoff A.T. Duller, Institute of Geography and Earth Sciences, Aberystwyth University, Ceredigion, Wales, UK ([email protected]) Sheng-Hua Li, Department of Earth Sciences, The University of Hong Kong, Hong Kong, China ([email protected]) Shannon Mahan, U.S. Geological Survey, Denver Federal Center, Denver, CO, USA ([email protected]) Richard G. Roberts, School of Earth and Environmental Sciences, University of Wollongong, Australia ([email protected]) REVIEWERS PANEL Richard M.
    [Show full text]
  • Bioconductor Annual Report 2011
    Bioconductor Annual Report 2011 Martin Morgan Fred Hutchinson Cancer Research Center July 2011 Contents 1 Project Scope1 1.1 Packages................................1 1.2 Courses and Conferences.......................2 1.3 Community..............................3 1.4 Publication..............................3 1.5 Funding................................4 2 Core Tasks & Capabilities5 2.1 Automated Package Building and Testing.............5 2.2 Package submission management..................5 2.3 Annotation data package building..................6 2.4 Other Tasks..............................6 2.5 Hardware...............................6 2.6 Key Personnel.............................6 A Appendix: Proposal Specific Aims 11 1 Project Scope Bioconductor provides access to software for the analysis and comprehension of high throughput genomic data. Packages are written in the R programming lan- guage by members of the Bioconductor team and the international community. Bioconductor was started approximately 10 years ago (Fall, 2001) by Dr. Robert Gentleman and others, and now consists of >460 packages for the analysis of data ranging from expression microarrays through next-generation sequencing. 1.1 Packages R software packages represent the primary product of the Bioconductor project. Packages are produced by the Bioconductor team or are from international con- tributors. Table1 summarizes growth in the number of packages hosted by 1 Table 1: Number of contributed packages included in each Bioconductor release. Releases occur twice per year. Release N Release N Release N 2002 1.0 15 2006 1.8 172 2010 2.6 389 1.1 20 1.9 188 2.7 419 2003 1.2 30 2007 2.0 214 2011 2.8 467 1.3 49 2.1 233 2004 1.4 81 2008 2.2 260 1.5 100 2.3 294 2005 1.6 123 2009 2.4 320 1.7 141 2.5 352 Bioconductor, with 467 packages available in the current release.
    [Show full text]
  • Virtual Communities for Parents of Children with Special Needs in Taiwan: Emotional Support, Information, and Advocacy
    Virtual communities for parents of children with special needs in Taiwan: Emotional support, information, and advocacy A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Humanities 2017 I-Jung, Lu Manchester Institute of Education School of Environment, Education and Development List of Contents LIST OF CONTENTS ................................................................................................. 1 LIST OF TABLES ....................................................................................................... 6 LIST OF FIGURES ..................................................................................................... 7 ABSTRACT .................................................................................................................. 8 DECLARATION.......................................................................................................... 9 COPYRIGHT STATEMENT ..................................................................................... 9 ACKNOWLEDGEMENT ......................................................................................... 10 GLOSSARY................................................................................................................ 11 CHAPTER 1. INTRODUCTION ............................................................................. 14 1.1 Parents in the virtual world ................................................................... 14 1.2 Support for parents ..................................................................................
    [Show full text]
  • Analyzing the Role of Genetics and Genomics in Cardiovascular Disease
    IT'S COMPLICATED: ANALYZING THE ROLE OF GENETICS AND GENOMICS IN CARDIOVASCULAR DISEASE by JEFFREY HSU Submitted in partial fulfillment of the requirements For the degree of Doctor of Philosophy Dissertation Adviser: Dr. Jonathan D Smith Department of Molecular Medicine CASE WESTERN RESERVE UNIVERSITY January 2014 CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES We hereby approve the thesis/dissertation of: Jeffrey Hsu candidate for the Doctor of Philosophy degree*. Thomas LaFramboise Thesis Committee Chair Mina Chung David Van Wagoner David Serre Jonathan D Smith Date: 5/24/2013 We also certify that written approval has been obtained for any proprietary material contained therein. i Contents Abstract 1 Acknowledgments 3 1 Introduction 4 1.1 Heritability of cardiovascular diseases . .4 1.2 Genome wide era and complex diseases . .7 1.3 Functional genomics of disease associated loci . .8 1.4 Animal models for functional studies of CAD . 13 2 Genetic-Genomic Replication to Identify Candidate Mouse Atheroscle- rosis Modifier Genes 14 2.1 Introduction . 14 2.2 Methods . 15 2.2.1 Mouse and cell studies . 15 2.3 Results . 17 2.3.1 Atherosclerosis QTL replication in a new cross . 17 2.3.2 Protein coding differences between AKR and DBA mice resid- ing in ATH QTLs . 22 2.3.3 eQTL in bone marrow derived macrophages and endothelial cells 30 2.3.4 Macrophage eQTL replication between different crosses and dif- ferent platforms . 44 3 Transcriptome Analysis of Genes Regulated by Cholesterol Loading in Two Strains of Mouse Macrophages Associates Lysosome Pathway and ER Stress Response with Atherosclerosis Susceptibility 51 3.1 Introduction .
    [Show full text]
  • Kurt Hornik I
    R FAQ Frequently Asked Questions on R Version 2.6.2007-10-22 ISBN 3-900051-08-9 Kurt Hornik i Table of Contents 1 Introduction............................... 1 1.1 Legalese .................................................... 1 1.2 Obtaining this document..................................... 1 1.3 Citing this document ........................................ 1 1.4 Notation.................................................... 1 1.5 Feedback ................................................... 2 2 R Basics .................................. 3 2.1 What is R? ................................................. 3 2.2 What machines does R run on?............................... 3 2.3 What is the current version of R?............................. 4 2.4 How can R be obtained? ..................................... 4 2.5 How can R be installed? ..................................... 4 2.5.1 How can R be installed (Unix) ........................... 4 2.5.2 How can R be installed (Windows) ....................... 5 2.5.3 How can R be installed (Macintosh) ...................... 5 2.6 Are there Unix binaries for R? ............................... 6 2.7 What documentation exists for R? ............................ 6 2.8 Citing R .................................................... 8 2.9 What mailing lists exist for R? ............................... 9 2.10 What is CRAN? ........................................... 10 2.11 Can I use R for commercial purposes? ...................... 10 2.12 Why is R named R? ....................................... 11 2.13 What
    [Show full text]
  • Kurt Hornik I
    R faq Frequently Asked Questions on R Version 2.7.2008-04-18 ISBN 3-900051-08-9 Kurt Hornik i Table of Contents 1 Introduction ............................... 1 1.1 Legalese ................................................ 1 1.2 Obtaining this document................................. 1 1.3 Citing this document .................................... 1 1.4 Notation................................................ 1 1.5 Feedback ............................................... 2 2 R Basics .................................. 3 2.1 What is R? ............................................. 3 2.2 What machines does R run on?........................... 3 2.3 What is the current version of R?......................... 4 2.4 How can R be obtained? ................................. 4 2.5 How can R be installed? ................................. 4 2.5.1 How can R be installed (Unix) ................... 4 2.5.2 How can R be installed (Windows) ............... 5 2.5.3 How can R be installed (Macintosh).............. 5 2.6 Are there Unix binaries for R? ........................... 6 2.7 What documentation exists for R? ........................ 7 2.8 Citing R................................................ 8 2.9 What mailing lists exist for R? ........................... 9 2.10 What is cran? ....................................... 10 2.11 Can I use R for commercial purposes? .................. 11 2.12 Why is R named R? ................................... 11 2.13 What is the R Foundation? ............................ 11 3 R and S .................................
    [Show full text]
  • The R Journal Volume 5/1, June2013
    The Journal Volume 5/1, June 2013 A peer-reviewed, open-access publication of the R Foundation for Statistical Computing Contents Editorial..................................4 Contributed Research Articles RTextTools: A Supervised Learning Package for Text Classification.........6 Generalized Simulated Annealing for Global Optimization: The GenSA Package... 13 Multiple Factor Analysis for Contingency Tables in the FactoMineR Package.... 29 Hypothesis Tests for Multivariate Linear Models Using the car Package....... 39 osmar: OpenStreetMap and R......................... 53 ftsa: An R Package for Analyzing Functional Time Series............. 64 Statistical Software from a Blind Person’s Perspective............... 73 PIN: Measuring Asymmetric Information in Financial Markets with R....... 80 QCA: A Package for Qualitative Comparative Analysis.............. 87 An Introduction to the EcoTroph R Package: Analyzing Aquatic Ecosystem Trophic Networks.................................. 98 stellaR: A Package to Manage Stellar Evolution Tracks and Isochrones....... 108 Let Graphics Tell the Story - Datasets in R.................... 117 Estimating Spatial Probit Models in R...................... 130 ggmap: Spatial Visualization with ggplot2.................... 144 mpoly: Multivariate Polynomials in R..................... 162 beadarrayFilter: An R Package to Filter Beads.................. 171 Fast Pure R Implementation of GEE: Application of the Matrix Package....... 181 RcmdrPlugin.temis, a Graphical Integrated Text Mining Solution in R....... 188 Possible
    [Show full text]
  • October 2007 Volume 32 Number 5
    OCTOBER 2007 VOLUME 32 NUMBER 5 THE USENIX MAGAZINE OPINION Musings RIK FARROW OSES Why Some Dead OSes Still Matter ANDREY MIRTCHOVSKI AND LATCHESAR IONKOV SECURITY (Digital) Identity 2.0 CAT OKITA LAW Hardening Your Systems Against Litigation ALEXANDER MUENTZ SYSADMIN An Introduction to Logical Domains, Part 2 OCTAVE ORGERON VOIP IP Telephony HEMANT SENGAR COLUMNS Practical Perl Tools: Let Me Draw You a Picture DAVID BLANK-EDELMAN iVoyeur: Opaque Brews DAVID JOSEPHSEN /dev/random ROBERT G. FERRELL STANDARDS Whither C++? NICHOLAS M. STOUGHTON BOOK REVIEWS Book Reviews ELIZABETH ZWICKY ET AL. USENIX NOTES Creating the EVT Workshop DAN WALLACH 2008 USENIX Nominating Committee ELLIE YOUNG Letters to the Editor CONFERENCES 2007 USENIX Annual Technical Conference Linux Symposium 2007 The Advanced Computing Systems Association Upcoming Events INTERNET MEASUREMENT CONFERENCE 2007 1ST USENIX W ORKSHOP ON LARGE -S CALE (IMC 2007 ) EXPLOITS AND EMERGENT THREATS (LEET ’08) Sponsored by ACM SIGCOMM in cooperation with USENIX Co-located with NSDI ’08 OCTOBER 24–26 , 2007, SAN DIEGO , CA, USA APRIL 15, 2008, SAN FRANCISCO , CA, USA http://www.imconf.net/imc-2007/ 5TH USENIX S YMPOSIUM ON NETWORKED 21 ST LARGE INSTALLATION SYSTEM SYSTEMS DESIGN AND IMPLEMENTATION ADMINISTRATION CONFERENCE (LISA ’07 ) (NSDI ’08) Sponsored by USENIX and SAGE Sponsored by USENIX in cooperation with ACM SIGCOMM and ACM SIGOPS NOVEMBER 11–16 , 2007, DALLAS , TX , USA http://www.usenix.org/lisa07 APRIL 16–18, 2008, SAN FRANCISCO , CA, USA http://www.usenix.org/nsdi08 ACM/IFIP/USENIX
    [Show full text]
  • An R Package to Filter Beads by Anyiawung Chiara Forcheh, Geert Verbeke, Adetayo Kasim, Dan Lin, Ziv Shkedy, Willem Talloen, Hinrich W.H
    CONTRIBUTED RESEARCH ARTICLES 171 beadarrayFilter: An R Package to Filter Beads by Anyiawung Chiara Forcheh, Geert Verbeke, Adetayo Kasim, Dan Lin, Ziv Shkedy, Willem Talloen, Hinrich W.H. Göhlmann and Lieven Clement Abstract Microarrays enable the expression levels of thousands of genes to be measured simultane- ously. However, only a small fraction of these genes are expected to be expressed under different experimental conditions. Nowadays, filtering has been introduced as a step in the microarray pre- processing pipeline. Gene filtering aims at reducing the dimensionality of data by filtering redundant features prior to the actual statistical analysis. Previous filtering methods focus on the Affymetrix platform and can not be easily ported to the Illumina platform. As such, we developed a filtering method for Illumina bead arrays. We developed an R package, beadarrayFilter, to implement the latter method. In this paper, the main functions in the package are highlighted and using many examples, we illustrate how beadarrayFilter can be used to filter bead arrays. Introduction Gene expression patterns are commonly assessed by microarrays that can measure thousands of genes simultaneously. However, in a typical microarray experiment, only a small fraction of the genes are informative which motivated the development of gene filtering methods. Gene filtering aims at reducing the dimensionality of data by filtering redundant features prior to the actual statistical analysis. This has been shown to improve differential expression analysis with Affymetrix microarrays (e.g., Talloen et al., 2007; Calza et al., 2007; Kasim et al., 2010). Although different microarrays platforms share the same principle of hybridizing DNA to a complementary probe, they differ considerably by design.
    [Show full text]
  • Downloaded Freely from Comprehensive R Archive Network Resource
    1875-0362/19 Send Orders for Reprints to [email protected] 40 The Open Bioinformatics Journal Content list available at: https://openbioinformaticsjournal.com LETTER maGUI: A Graphical User Interface for Analysis and Annotation of DNA Microarray Data Dhammapal Bharne1, Praveen Kant1 and Vaibhav Vindal*, 1 1Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, India. Abstract: Summary: maGUI is a graphical user interface designed to analyze microarray data produced from experiments performed on various platforms such as Affymetrix, Agilent, Illumina, and Nimblegen and so on, automatically. It follows an integrated workflow for pre-processing and analysis of the microarray data. The user may proceed from loading of microarray data to normalization, quality check, filtering, differential gene expression, principal component analysis, clustering and classification. It also provides miscellaneous applications such as gene set test and enrichment analysis for identifying gene symbols using Bioconductor packages. Further, the user can build a co-expression network for differentially expressed genes. Tables and figures generated during the analysis can be viewed and exported to local disks. The graphical user interface is very friendly especially for the biologists to perform the most microarray data analyses and annotations without much need of learning R command line programming. Availability and Implementation: maGUI is an R package which can be downloaded freely from Comprehensive R Archive Network resource. It can be installed in any R environment with version 3.0.2 or above. Keywords: Graphical user interface, R programming language, Bioconductor, Comprehensive R Archive Network, Microarray data analysis, Gene set test analysis, Gene set enrichment analysis.
    [Show full text]