Sharing and Organizing Research Products As R Packages Matti Vuorre1 & Matthew J

Total Page:16

File Type:pdf, Size:1020Kb

Sharing and Organizing Research Products As R Packages Matti Vuorre1 & Matthew J Sharing and organizing research products as R packages Matti Vuorre1 & Matthew J. C. Crump2 1 Oxford Internet Institute, University of Oxford, United Kingdom 2 Department of Psychology, Brooklyn College of CUNY, New York USA A consensus on the importance of open data and reproducible code is emerging. How should data and code be shared to maximize the key desiderata of reproducibility, permanence, and accessibility? Research assets should be stored persistently in formats that are not software restrictive, and documented so that others can reproduce and extend the required computations. The sharing method should be easy to adopt by already busy researchers. We suggest the R package standard as a solution for creating, curating, and communicating research assets. The R package standard, with extensions discussed herein, provides a format for assets and metadata that satisfies the above desiderata, facilitates reproducibility, open access, and sharing of materials through online platforms like GitHub and Open Science Framework. We discuss a stack of R resources that help users create reproducible collections of research assets, from experiments to manuscripts, in the RStudio interface. We created an R package, vertical, to help researchers incorporate these tools into their workflows, and discuss its functionality at length in an online supplement. Together, these tools may increase the reproducibility and openness of psychological science. Keywords: reproducibility; research methods; R; open data; open science Word count: 5155 Introduction package standard, with additional R authoring tools, provides a robust framework for organizing and sharing reproducible Research projects produce experiments, data, analyses, research products. manuscripts, posters, slides, stimuli and materials, computa- Some advances in data-sharing standards have emerged: It tional models, and more. However, the potential added value is becoming more popular to share data on the Open Science of these products is not fully realized due to limited sharing Framework (OSF). However, those materials often contain and curating practices. Although more transparent commu- idiosyncratic file organization and minimal or missing doc- nication of these research products has recently been encour- umentation for raw data. In specific areas, organization and aged (Houtkoop et al., 2018; Klein et al., 2018; Lindsay, documentation standards have emerged, (e.g., the BIDS frame- 2017; Martone, Garcia-Castro, & VandenBos, 2018; Rouder, work in neuroscience, Gorgolewski et al., 2016), but they 2016; Rouder, Haaf, & Snyder, 2019; Vanpaemel, Vermorgen, usually only consider data and code instead of the project as Deriemaecker, & Storms, 2015; Wicherts, Borsboom, Kats, a whole. More comprehensive proposals are described in the & Molenaar, 2006), these efforts often focus narrowly on Transparency and Openness Promotion (Nosek et al., 2015), sharing data (and sometimes analysis code). Further, the prac- and Peer Reviewers’ Openness initiative guidelines (Morey et tical value of sharing is often limited by poor documentation, al., 2016), but these fall short of describing detailed standards incompatible file formats, and lack of organization, result- for organization and metadata. Additionally, these standards 1 ing in low rates of reproducibility (Hardwicke et al., 2018). (if they exist) may not be widely known, recognized, agreed Standardization of protocols for sharing would be beneficial; upon, and/or adopted at large. but, such standards have not emerged, possibly due to the variance of research in psychology. Instead of developing We sought a standard for organizing and sharing that would another standard, we suggest borrowing existing standards adhere to the FAIR (Findable, Accessible, Interoperable, and practices from software engineering. Specifically, the R Reusable) guidelines to maximize the reuse potential of data and support “discovery through good data management” (Wilkinson et al., 2016, see also general discussion). Addi- tionally, we recognized the added value of including other re- The authors declare equal authorship. MV is funded by the Ado- search outputs (“products”; e.g., manuscripts) beyond datasets lescent Well-Being in the Digital Age project funded by the Huo Family Foundation. 1By “reproducibility”, we mean computational or analytic repro- Correspondence concerning this article should be addressed to ducibility by which previous results can be reproduced using existing Matti Vuorre, 1 St Giles, Oxford OX1 3JS, United Kingdom. E-mail: data. Contrast this to “replicability”, in which entire procedures are [email protected] repeated to generate new data to test a scientific idea. 2 MATTI VUORRE1 & MATTHEW J. C. CRUMP2 in a reproducible collection of materials. We identified the created (see Figure 1). Functions in R packages are portable, R package standard with modern online-based workflows as such that others can install the package from their R console, a solution that doesn’t present overwhelming overhead for load it, and start using the functions immediately. Packages already busy researchers. Here, we discuss the R package can also depend on other packages (and be depended on), standard for creating reproducible research material contain- such that R automatically installs any requirements for your ers for data, analysis code, and metadata. Then, we introduce functions to work appropriately. Functions within R packages additional R packages for incorporating other research prod- are documented in a standardized manner, and the documen- ucts into the reproducible container. These additional tools tation for a function can be viewed in R (e.g. try ?mean) or are easily accessible through our vertical package, thus named online. because an entire research project can be completed from top Wrapping packages also provide a reuse benefit: Functions to bottom as a stack of R processes from within the RStudio can be difficult to find in old scripts, but easy to find and load software platform. if they are called from an existing package. Thus, formally in- R Packages cluding one’s functions in R packages facilitates reproducibil- ity and sharing. R is a programming language for statistical analyses (R Core Team (2020); https://www.r-project.org), and RStudio is its as- Data sociated integrated development environment (IDE; RStudio Team (2016); https://www.rstudio.com/). Both are free, open Broadly, there are 3 steps to including data in an R package: source and work on Windows, Mac, and Linux systems. These 1. placing raw data in the “data-raw” directory, 2. creating tools are already widely used among psychology researchers: an R script that processes the raw data and creates an R data In one crowd-sourced analysis, 16 of 29 analysis teams used object into the “data” directory, and 3. documenting the final R (Silberzahn et al., 2018), and many psychologists have data object. already developed R packages (see also Yee & Debbie, 2017, First, raw data in any format is stored in the “data-raw” direc- and http://r4stats.com/articles/popularity/). tory to ensure that the method of sharing remains software- A cornerstone of R are user-created packages, which contain agnostic. If possible, that raw data should be in a text based functions, (meta)data, and documentation in a standard for- format, rather than proprietary formats such as SPSS files. mat that allows seamless sharing across users and operating Second, instructions for converting and cleaning the raw data systems. Data and functions in R packages are immediately into analyzable format should also be included in this direc- available to others through the R console, and documentation tory (the raw data files should not themselves be modified). can be viewed in R or online. We first outline how data These pre-processing steps should be saved so that the com- and functions are included in R packages, and then turn to plete path from raw data to results is transparent and repro- including other assets, such as manuscripts and posters. A ducible, preferably in an R script in the “data-raw” directory. complete guide to R packages is outside our scope (see Wick- That script should end with creating an R data object in the ham (2015)). “data” directory. Placing the R data object there ensures that it is included in the resulting R package, which makes using When an R package is created as shown in Figure 1, the re- the data effortless for any R user, and eschews the need to quired files and directories are automatically created. At this download additional files. Loading the R package makes the point, the package can be installed (click “Install and Restart” included data immediately available in the R console. in RStudio’s “Build” tab) and shared online (see below). How- ever, to make it useful, content needs to be added, beginning Finally, the resulting data object in “data” should be doc- with the package’s description. That information (e.g., au- umented in a standardized format by placing a data.R thors, dependencies) is written in the “DESCRIPTION” file in file in the “R” directory, a process discussed in more de- a machine readable format. To edit that file, click on its name tail in our complete online supplementary tutorial (https: in RStudio’s File browser. The “blank.Rproj”, “man” and //crumplab.github.io/vertical/), and in Wickham (2015). That “NAMESPACE” files/directories shown in Figure 1 should standard format of documentation is especially useful because not be edited by users, and we thus turn to R functions. it allows datasets’ documentation to be viewed in R (e.g., type ?attitude in the R console) and online (see below). Functions R packages thus provide a useful standard for functions and Researchers often develop and use custom functions in their (meta)data. However, as we will see, the package structure analysis. Although it is possible to declare them directly in can be extended to include analyses, manuscripts, posters, the analysis scripts, we suggest declaring functions using the and presentations as well. The complete project is then easily R package standard, by placing the function’s source code shared online, and even showcased as a website, as we will into a file in the “R” directory.
Recommended publications
  • Navigating the R Package Universe by Julia Silge, John C
    CONTRIBUTED RESEARCH ARTICLES 558 Navigating the R Package Universe by Julia Silge, John C. Nash, and Spencer Graves Abstract Today, the enormous number of contributed packages available to R users outstrips any given user’s ability to understand how these packages work, their relative merits, or how they are related to each other. We organized a plenary session at useR!2017 in Brussels for the R community to think through these issues and ways forward. This session considered three key points of discussion. Users can navigate the universe of R packages with (1) capabilities for directly searching for R packages, (2) guidance for which packages to use, e.g., from CRAN Task Views and other sources, and (3) access to common interfaces for alternative approaches to essentially the same problem. Introduction As of our writing, there are over 13,000 packages on CRAN. R users must approach this abundance of packages with effective strategies to find what they need and choose which packages to invest time in learning how to use. At useR!2017 in Brussels, we organized a plenary session on this issue, with three themes: search, guidance, and unification. Here, we summarize these important themes, the discussion in our community both at useR!2017 and in the intervening months, and where we can go from here. Users need options to search R packages, perhaps the content of DESCRIPTION files, documenta- tion files, or other components of R packages. One author (SG) has worked on the issue of searching for R functions from within R itself in the sos package (Graves et al., 2017).
    [Show full text]
  • Introduction to Ggplot2
    Introduction to ggplot2 Dawn Koffman Office of Population Research Princeton University January 2014 1 Part 1: Concepts and Terminology 2 R Package: ggplot2 Used to produce statistical graphics, author = Hadley Wickham "attempt to take the good things about base and lattice graphics and improve on them with a strong, underlying model " based on The Grammar of Graphics by Leland Wilkinson, 2005 "... describes the meaning of what we do when we construct statistical graphics ... More than a taxonomy ... Computational system based on the underlying mathematics of representing statistical functions of data." - does not limit developer to a set of pre-specified graphics adds some concepts to grammar which allow it to work well with R 3 qplot() ggplot2 provides two ways to produce plot objects: qplot() # quick plot – not covered in this workshop uses some concepts of The Grammar of Graphics, but doesn’t provide full capability and designed to be very similar to plot() and simple to use may make it easy to produce basic graphs but may delay understanding philosophy of ggplot2 ggplot() # grammar of graphics plot – focus of this workshop provides fuller implementation of The Grammar of Graphics may have steeper learning curve but allows much more flexibility when building graphs 4 Grammar Defines Components of Graphics data: in ggplot2, data must be stored as an R data frame coordinate system: describes 2-D space that data is projected onto - for example, Cartesian coordinates, polar coordinates, map projections, ... geoms: describe type of geometric objects that represent data - for example, points, lines, polygons, ... aesthetics: describe visual characteristics that represent data - for example, position, size, color, shape, transparency, fill scales: for each aesthetic, describe how visual characteristic is converted to display values - for example, log scales, color scales, size scales, shape scales, ..
    [Show full text]
  • Rkward: a Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R
    JSS Journal of Statistical Software June 2012, Volume 49, Issue 9. http://www.jstatsoft.org/ RKWard: A Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R Stefan R¨odiger Thomas Friedrichsmeier Charit´e-Universit¨atsmedizin Berlin Ruhr-University Bochum Prasenjit Kapat Meik Michalke The Ohio State University Heinrich Heine University Dusseldorf¨ Abstract R is a free open-source implementation of the S statistical computing language and programming environment. The current status of R is a command line driven interface with no advanced cross-platform graphical user interface (GUI), but it includes tools for building such. Over the past years, proprietary and non-proprietary GUI solutions have emerged, based on internal or external tool kits, with different scopes and technological concepts. For example, Rgui.exe and Rgui.app have become the de facto GUI on the Microsoft Windows and Mac OS X platforms, respectively, for most users. In this paper we discuss RKWard which aims to be both a comprehensive GUI and an integrated devel- opment environment for R. RKWard is based on the KDE software libraries. Statistical procedures and plots are implemented using an extendable plugin architecture based on ECMAScript (JavaScript), R, and XML. RKWard provides an excellent tool to manage different types of data objects; even allowing for seamless editing of certain types. The objective of RKWard is to provide a portable and extensible R interface for both basic and advanced statistical and graphical analysis, while not compromising on flexibility and modularity of the R programming environment itself. Keywords: GUI, integrated development environment, plugin, R.
    [Show full text]
  • R Generation [1] 25
    IN DETAIL > y <- 25 > y R generation [1] 25 14 SIGNIFICANCE August 2018 The story of a statistical programming they shared an interest in what Ihaka calls “playing academic fun language that became a subcultural and games” with statistical computing languages. phenomenon. By Nick Thieme Each had questions about programming languages they wanted to answer. In particular, both Ihaka and Gentleman shared a common knowledge of the language called eyond the age of 5, very few people would profess “Scheme”, and both found the language useful in a variety to have a favourite letter. But if you have ever been of ways. Scheme, however, was unwieldy to type and lacked to a statistics or data science conference, you may desired functionality. Again, convenience brought good have seen more than a few grown adults wearing fortune. Each was familiar with another language, called “S”, Bbadges or stickers with the phrase “I love R!”. and S provided the kind of syntax they wanted. With no blend To these proud badge-wearers, R is much more than the of the two languages commercially available, Gentleman eighteenth letter of the modern English alphabet. The R suggested building something themselves. they love is a programming language that provides a robust Around that time, the University of Auckland needed environment for tabulating, analysing and visualising data, one a programming language to use in its undergraduate statistics powered by a community of millions of users collaborating courses as the school’s current tool had reached the end of its in ways large and small to make statistical computing more useful life.
    [Show full text]
  • R Programming for Data Science
    R Programming for Data Science Roger D. Peng This book is for sale at http://leanpub.com/rprogramming This version was published on 2015-07-20 This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. ©2014 - 2015 Roger D. Peng Also By Roger D. Peng Exploratory Data Analysis with R Contents Preface ............................................... 1 History and Overview of R .................................... 4 What is R? ............................................ 4 What is S? ............................................ 4 The S Philosophy ........................................ 5 Back to R ............................................ 5 Basic Features of R ....................................... 6 Free Software .......................................... 6 Design of the R System ..................................... 7 Limitations of R ......................................... 8 R Resources ........................................... 9 Getting Started with R ...................................... 11 Installation ............................................ 11 Getting started with the R interface .............................. 11 R Nuts and Bolts .......................................... 12 Entering Input .......................................... 12 Evaluation ...........................................
    [Show full text]
  • R: a Swiss Army Knife for Market Research
    R: A Swiss Army Knife for Market Research Enhancing work practices with R Martin Chan – Consultant, Rainmakers CSI 12th September, 2018 1 2 This presentation is about using R in business environments or conditions where its usage is less intuitively suitable… … and a story of how R has been transformative for our work practices 2 3 Who are we? What do we do? 3 4 What How Where Answer strategic Estimate size of market, and size of Across multiple questions to help our opportunity, anticipating trends industries and clients grow profitably markets Utilise resources available within an Inform strategic decisions organisation (e.g. stakeholder knowledge, • FMCG through creative and existing research) • Finance & analytical thinking Insurance Develop consumer targeting frameworks – • Travel identify core consumers and areas of • Media opportunities 4 5 A team with a mix of backgrounds from strategy, brand planning, marketing, research… (where analytics is a key, but only one of the components of our work) 5 6 What’s different about the use of R in our work? 6 Challenges of Using R Nature of data: Nature of U&A Client 1 2 3 disparate, patchy data requirement and expectations (of outputs) U&A – research with the aim to understand a market and identify growth opportunities by answering questions on whom to target, with what, and how. (Source: https://www.ipsos.com/en/ipsos-encyclopedia- 7 usage-attitude-surveys-ua) Nature of data: 1 disparate, patchy 8 9 Historical survey data – often designed for different purposes Stakeholder Interviews Our data come (Qualitative) together in and collected from different samples different forms, like pieces of Population / Demographic data (from census, World Bank Customer Interviews (Qualitative) JIGSAW research etc.) Pricing data Historical segmentation work (e.g.
    [Show full text]
  • An Introduction to R Graphics 4. Ggplot2
    Data Visualization in R 4. ggplot2 Michael Friendly SCS Short Course Sep/Oct, 2018 http://www.datavis.ca/courses/RGraphics/ Resources: Books Hadley Wickham, ggplot2: Elegant graphics for data analysis, 2nd Ed. 1st Ed: Online, http://ggplot2.org/book/ ggplot2 Quick Reference: http://sape.inf.usi.ch/quick-reference/ggplot2/ Complete ggplot2 documentation: http://docs.ggplot2.org/current/ Winston Chang, R Graphics Cookbook: Practical Recipes for Visualizing Data Cookbook format, covering common graphing tasks; the main focus is on ggplot2 R code from book: http://www.cookbook-r.com/Graphs/ Download from: http://ase.tufts.edu/bugs/guide/assets/R%20Graphics%20Cookbook.pdf Antony Unwin, Graphical Data Analysis with R R code: http://www.gradaanwr.net/ 2 Resources: Cheat sheets • Data visualization with ggplot2: https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2- cheatsheet-2.1.pdf • Data transformation with dplyr: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/data- transformation-cheatsheet.pdf 3 What is ggplot2? • ggplot2 is Hadley Wickham’s R package for producing “elegant graphics for data analysis” . It is an implementation of many of the ideas for graphics introduced in Lee Wilkinson’s Grammar of Graphics . These ideas and the syntax of ggplot2 help to think of graphs in a new and more general way . Produces pleasing plots, taking care of many of the fiddly details (legends, axes, colors, …) . It is built upon the “grid” graphics system . It is open software, with a large number of gg_ extensions. See: http://www.ggplot2-exts.org/gallery/ 4 Follow along • From the course web page, click on the script gg-cars.R, http://www.datavis.ca/courses/RGraphics/R/gg-cars.R • Select all (ctrl+A) and copy (ctrl+C) to the clipboard • In R Studio, open a new R script file (ctrl+shift+N) • Paste the contents (ctrl+V) • Run the lines (ctrl+Enter) to along with me ggplot2 vs base graphics Some things that should be simple are harder than you’d like in base graphics Here, I’m plotting gas mileage (mpg) vs.
    [Show full text]
  • R Programming
    Chapter 6 R Programming 6.1 R, Matlab, and Python R is a programming language especially designed for data analysis and data visualization. In some cases it is more convenient to use R than C++ or Java, making R an important data analysis tool. We describe below similarities and di↵erences between R and its close relatives Matlab and Python. R is similar to Matlab and Python in the following ways: • They run inside an interactive shell in their default setting. In some cases this shell is a complete graphical user interface (GUI). • They emphasize storing and manipulating data as multidimensional arrays. • They interface packages featuring functionalities, both generic, such as ma- trix operations, SVD, eigenvalues solvers, random number generators, and optimization routines; and specialized, such as kernel smoothing and sup- port vector machines. • They exhibit execution time slower than that of C, C++, and Fortran, and are thus poorly suited for analyzing massive1 data. • They can interface with native C++ code, supporting large scale data analysis by implementing computational bottlenecks in C or C++. The three languages di↵er in the following ways: • R and Python are open-source and freely available for Windows, Linux, and Mac, while Matlab requires an expensive license. • R, unlike Matlab, features ease in extending the core language by writing new packages, as well as in installing packages contributed by others. 1The slowdown can be marginal or significant, depending on the program’s implementation. Vectorized code may improve computational efficiency by performing basic operations on entire arrays rather than on individual array elements inside nested loops.
    [Show full text]
  • Evaluating the Design of the R Language Objects and Functions for Data Analysis
    Evaluating the Design of the R Language Objects and Functions For Data Analysis Floreal´ Morandat Brandon Hill Leo Osvald Jan Vitek Purdue University Abstract. R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely lin- guistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features. 1 Introduction Over the last decade, the R project has become a key tool for implementing sophisticated data analysis algorithms in fields ranging from computational biology [7] to political science [11]. At the heart of the R project is a dynamic, lazy, functional, object-oriented programming language with a rather unusual combination of features. This computer language, commonly referred to as the R language [15,16] (or simply R), was designed in 1993 by Ross Ihaka and Robert Gentleman [10] as a successor to S [1]. The main differences with its predecessor, which had been designed at Bell labs by John Chambers, were the open source nature of the R project, vastly better performance, and, at the language level, lexical scoping borrowed from Scheme and garbage collection [1]. Released in 1995 under a GNU license, it rapidly became the lingua franca for statistical data analysis. Today, there are over 4 000 packages available from repositories such as CRAN and Bioconductor.1 The R-forge web site lists 1 242 projects.
    [Show full text]
  • Using R to Analyse Key UK Surveys UK Data Service – Using R to Analyse Key UK Surveys
    ukdataservice.ac.uk Using R to analyse key UK surveys UK Data Service – Using R to analyse key UK surveys Author: UK Data Service Updated: July 2019 Version: 1.3 We are happy for our materials to be used and copied but request that users should: link to our original materials instead of re-mounting our materials on your website cite this an original source as follows: Pierre Walthery (updated by Rosalynd Southern, 2013 and Ana Morales, 2017). Using R to analyse key UK surveys. UK Data Service, University of Essex and University of Manchester. UK Data Service – Using R to analyse key UK surveys Contents 1. Introduction ......................................................................................................................... 4 1.1. What is R? ................................................................................................................... 4 1.2. The pros and the cons of R ..................................................................................... 5 2. Using R: essential information ......................................................................................... 7 2.1. Installing and loading user-written packages ..................................................... 9 2.2. Getting help ............................................................................................................. 10 2.3. Interacting with R: command line vs graphical interface ............................... 12 2.4. Objects in R .............................................................................................................
    [Show full text]
  • Interfacing NEOS from R the R Package Rneos
    Interfacing NEOS from R The R package rneos Bernhard Pfaff [email protected] Invesco Asset Management Deutschland GmbH, Frankfurt am Main 5th R/Rmetrics Meielisalp Workshop June 26{30, 2011 Pfaff (Invesco) rneos Meielisalp 1 / 12 Overview Overview Network-Enabled Optimisation System (NEOS) Overview Available Solvers Interfaces The R package rneos Overview Implementation of the API Example Outlook Pfaff (Invesco) rneos Meielisalp 2 / 12 NEOS Overview NEOS Overview http://www.neos-server.org Server framework for solving optimisation problems. Why using NEOS? Optimisation software does not need to be installed locally. Computational burdensome problems are transferred to remote machines. Help: via Email, FAQ, User Guide (Wiki). Pfaff (Invesco) rneos Meielisalp 3 / 12 NEOS Available Solvers NEOS Optimisation problems (in alphabetical order) Bound Constrained Optimisation Combinatorial Optimisation and Integer Programming Complementarity Problems Global Optimisation Linear Network Programming Linear Programming Mixed Integer Linear Programming Mixed Integer Nonlinearly Constrained Optimisation Nonlinearly Constrained Optimisation Non-differentiable Optimisation Semidefinite Programming Semi-infinite Optimisation Stochastic Linear Programming Second Order Conic Programming Unconstrained Optimisation Pfaff (Invesco) rneos Meielisalp 4 / 12 NEOS Interfaces NEOS Interfaces Through Internet: Upload of model and data files Through Email: Upload of model and data files AMPL/GAMS via Kestrel NEOS API (XML-RPC): Available clients Python Perl PHP
    [Show full text]
  • BRB-Arraytools Version 4.6 User's Manual
    BRB-ArrayTools Version 4.6 User’s Manual by Dr. Richard Simon Biometrics Research Branch National Cancer Institute and BRB-ArrayTools Development Team The EMMES Corporation June, 2017 Table of Contents Table of Contents 2 Introduction 6 Purpose of this software 6 Overview of the software’s capabilities 6 A note about single-channel experiments 10 Installation 12 System Requirements 12 Installing the software components 12 Loading the add-in into Excel 12 Collating the data 12 Overview of the collating step 12 Input to the collating step 14 Input data elements 14 Expression data 14 Gene identifiers 15 Experiment descriptors 16 Minimal required data elements 16 Required file formats and folder structures 17 Using the collation dialogs 18 Collating data using the data import wizard 18 Special data formats 24 Collating Affymetrix data from CHP files exported into text format 24 Importing Affymetrix data from text or binary CEL files 27 Importing Affymetrix Gene ST1.0, 1.1, 2.0, 2.1, Clariom D, Clariom S .CEL files 28 Importing RNA-Seq data outputted from Galaxy web tool 28 Importing RNA-Seq count data 28 Collating data from an NCI mAdb archive 29 Collating GenePix data 31 Collating Agilent data 31 Collating Illumina expression data 32 Collating Illumina methylation data 33 Importing NanoString .RCC data 34 Collating from NCBI GEO Import Tool 34 Output of the collating step 35 Organization of the project folder 35 The collated project workbook 35 Filtering the data 39 Spot filters 39 Intensity filter 39 Spot flag filter 40 2 Spot size filter
    [Show full text]