Introduction to Reproducible Science in R 2 Contents

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Reproducible Science in R 2 Contents Brian Lee Yung Rowe Introduction to Reproducible Science in R 2 Contents 1 The reproducible scientific method 1 1.1 The rise of operational models . 3 1.2 The importance of being functional . 8 1.2.1 Determinism of programs . 8 1.2.2 Lazy evaluation . 10 1.2.3 Generic versus specific types . 10 1.2.4 Computational graphs . 11 1.3 The UNIX philosophy . 13 1.4 Other approaches to reproducible science . 13 1.4.1 The tidyverse . 14 1.4.2 ReproZip . 14 1.4.3 REANA . 14 1.5 Coding conventions . 15 1.6 Package dependencies and datasets . 15 1.7 Summary . 16 1.8 Exercises . 17 I Tools for Model Development 19 2 Core tools and requirements 21 2.1 GNU/Linux . 21 2.1.1 The command shell . 22 2.1.2 File permissions and scripts . 25 2.1.3 Linking commands with the pipe . 26 2.1.4 File redirects . 28 2.1.5 Return codes . 29 2.1.6 Processes . 30 2.1.7 Variables . 32 2.1.8 Functions . 33 2.1.9 Conditional expressions . 35 2.1.10 Conditional statements . 36 2.1.11 Case statements . 37 2.1.12 Loops . 37 2.2 The make build tool . 41 2.2.1 Variables . 42 2.2.2 Conditional blocks . 44 i ii 2.2.3 Macros and functions . 44 2.3 Literate programming and LATEX................. 45 2.3.1R documentation . 46 2.3.2 Articles, papers, and reports . 48 2.3.3 Notebooks . 50 2.4 Containerization and Docker . 50 2.5R, Python, and interoperability . 52 2.6 The crant utility . 55 2.6.1 Creating projects and packages . 56 2.6.2 Building packages . 58 2.6.3 Developing multiple packages . 58 2.6.4 Installing packages . 59 2.7 Exercises . 60 3 Project conventions 63 3.1 Directory structure . 64 3.2 CreatingRfiles . 66 3.3 Working with dependencies . 69 3.4 Documenting your work . 70 4 Source code management 71 4.1 Version management . 73 4.2 Branches . 76 4.3 Merging branches . 78 4.4 Remote repositories . 83 4.5 Exercises . 85 5 Formalizing code in a package 87 5.1 Creating a package . 87 5.2 Building a package . 89 5.3 Using packages . 91 5.4 Testing packages . 92 5.5 Continuous integration . 93 5.6 Exercises . 95 6 Developing with containers 97 6.1 Working in a container . 100 6.1.1 Volume mapping . 102 6.1.2 Using graphics in a container . 103 6.2 Managing containers . 103 6.3 Working with images . 104 6.4 Running a notebook from a container . 104 6.5 Exercises . 107 II Model Development Workflows 109 iii 7 Workflows and repeatability 111 8 Exploratory analysis and hypothesis creation 115 9 Model design 127 10 Operationalization 135 10.1 Data processing as a pipeline . 137 10.2 Job initiation . 138 10.3 Scheduling jobs with cron.................. 141 10.4 Event-driven processing . 142 10.5 Error handling . 145 10.6 Development environments . 149 11 Reporting and visualization 153 11.1 Rmarkdown and knitr forflat reports . 154 11.2 Notebooks for interactive documents . 159 11.3 Shiny for interactive visualizations . 159 11.4 OpenCPU for dashboards . 162 11.5 Summary . 162 11.6 Exercises . 163 III A Discipline for Data Science 165 12 Algorithm styles 167 13 A brief history of object-oriented programming 173 14 Elements of a functional programming language 179 14.1 A taste of functional programming sorcery . 179 14.1.1 Everything I see is a function to me . 180 14.1.2 Iteration over values . 182 14.1.3 Composing functions . 185 14.1.4 Eliminating conditional expressions . 186 14.1.5 How functional is your code? . 187 14.2 Properties of the functional programming style . 188 14.3 Function representation . 190 14.3.1 Function signatures . 191 14.3.2 Function definitions . 191 14.3.3 Function application . 191 14.4 Vectorization . 191 14.5 First-Class Functions . 194 14.6 Closures . 196 iv 15 Programming conventions 203 15.1 PERL’s three virtues . 203 15.2 The Erlang manifesto . 204 15.3 Self-documenting code . 204 15.4 Abbreviations . 205 15.5 Suffixes . 206 15.6 Line length . 206 15.7 Indentation . 207 15.8 Spaces . 207 15.9 Consistency . 208 16 Troubleshooting and diagnostics 209 16.1 Testing . 209 16.1.1 Edge cases and well-formed data . 209 16.1.2 Function boundaries . 211 16.1.3 Property testing . 211 16.1.4 Model validation . 213 16.1.5 A note on test coverage . 213 16.2 Logging . 214 16.3 Debugging . 214 16.4 Exercises . 215 17 Mathematical data structures 217 17.1 Scalars . 218 17.2 Sets . 219 17.3 Tuples . 219 17.4 Relations . 220 17.5 Predicates . 220 17.6 Vectors . 220 17.7 Matrices . 221 17.8 Tensors . 221 18 Data structures in data science 223 18.1 Tabular data . 223 18.2 Hierarchical data . 224 18.3 Summary . 224 18.4 Exercises . 224 19 Common data transformations 227 19.1 Retrieving and transforming input . 228 19.2 Data normalization . 229 19.3 Feature engineering . 230 19.4 Partitioning and aggregation . 230 19.5 Matching model interfaces . ..
Recommended publications
  • Linux Fundamentals (GL120) U8583S This Is a Challenging Course That Focuses on the Fundamental Tools and Concepts of Linux and Unix
    Course data sheet Linux Fundamentals (GL120) U8583S This is a challenging course that focuses on the fundamental tools and concepts of Linux and Unix. Students gain HPE course number U8583S proficiency using the command line. Beginners develop a Course length 5 days solid foundation in Unix, while advanced users discover Delivery mode ILT, vILT patterns and fill in gaps in their knowledge. The course View schedule, local material is designed to provide extensive hands-on View now pricing, and register experience. Topics include basic file manipulation; basic and View related courses View now advanced filesystem features; I/O redirection and pipes; text manipulation and regular expressions; managing jobs and processes; vi, the standard Unix editor; automating tasks with Why HPE Education Services? shell scripts; managing software; secure remote • IDC MarketScape leader 5 years running for IT education and training* administration; and more. • Recognized by IDC for leading with global coverage, unmatched technical Prerequisites Supported distributions expertise, and targeted education consulting services* Students should be comfortable with • Red Hat Enterprise Linux 7 • Key partnerships with industry leaders computers. No familiarity with Linux or other OpenStack®, VMware®, Linux®, Microsoft®, • SUSE Linux Enterprise 12 ITIL, PMI, CSA, and SUSE Unix operating systems is required. • Complete continuum of training delivery • Ubuntu 16.04 LTS options—self-paced eLearning, custom education consulting, traditional classroom, video on-demand
    [Show full text]
  • Research Computing and Cyberinfrastructure Team
    Research Computing and CyberInfrastructure team ! Working with the Linux Shell on the CWRU HPC 31 January 2019 7 February 2019 ! Presenter Emily Dragowsky KSL Data Center RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari! Brian Christian, Emily Dragowsky, Jeremy Fondran, Cal Frye,! Sanjaya Gajurel, Matt Garvey, Theresa Griegger, Cindy Martin, ! Sean Maxwell, Jeno Mozes, Nasir Yilmaz, Lee Zickel Preface: Prepare your environment • User account ! # .bashrc ## cli essentials ! if [ -t 1 ] then bind '"\e[A": history-search-backward' bind '"\e[B": history-search-forward' bind '"\eOA": history-search-backward' bind '"\eOB": history-search-forward' fi ! This is too useful to pass up! Working with Linux • Preamble • Intro Session Linux Review: Finding our way • Files & Directories: Sample work flow • Shell Commands • Pipes & Redirection • Scripting Foundations • Shell & environment variables • Control Structures • Regular expressions & text manipulation • Recap & Look Ahead Rider Cluster Components ! rider.case.edu ondemand.case.edu University Firewall ! Admin Head Nodes SLURM Science Nodes Master DMZ Resource ! Data Manager Transfer Disk Storage Nodes Batch nodes GPU nodes SMP nodes Running a job: where is it? slide from Hadrian Djohari Storing Data on the HPC table from Nasir Yilmaz How data moves across campus • Buildings on campus are each assigned to a zone. Data connections go from every building to the distribution switch at the center of the zone and from there to the data centers at Kelvin Smith Library and Crawford Hall. slide
    [Show full text]
  • Bash Guide for Beginners
    Bash Guide for Beginners Machtelt Garrels Garrels BVBA <tille wants no spam _at_ garrels dot be> Version 1.11 Last updated 20081227 Edition Bash Guide for Beginners Table of Contents Introduction.........................................................................................................................................................1 1. Why this guide?...................................................................................................................................1 2. Who should read this book?.................................................................................................................1 3. New versions, translations and availability.........................................................................................2 4. Revision History..................................................................................................................................2 5. Contributions.......................................................................................................................................3 6. Feedback..............................................................................................................................................3 7. Copyright information.........................................................................................................................3 8. What do you need?...............................................................................................................................4 9. Conventions used in this
    [Show full text]
  • Linux Cheat Sheet
    1 of 4 ########################################### # 1.1. File Commands. # Name: Bash CheatSheet # # # # A little overlook of the Bash basics # ls # lists your files # # ls -l # lists your files in 'long format' # Usage: A Helpful Guide # ls -a # lists all files, including hidden files # # ln -s <filename> <link> # creates symbolic link to file # Author: J. Le Coupanec # touch <filename> # creates or updates your file # Date: 2014/11/04 # cat > <filename> # places standard input into file # Edited: 2015/8/18 – Michael Stobb # more <filename> # shows the first part of a file (q to quit) ########################################### head <filename> # outputs the first 10 lines of file tail <filename> # outputs the last 10 lines of file (-f too) # 0. Shortcuts. emacs <filename> # lets you create and edit a file mv <filename1> <filename2> # moves a file cp <filename1> <filename2> # copies a file CTRL+A # move to beginning of line rm <filename> # removes a file CTRL+B # moves backward one character diff <filename1> <filename2> # compares files, and shows where differ CTRL+C # halts the current command wc <filename> # tells you how many lines, words there are CTRL+D # deletes one character backward or logs out of current session chmod -options <filename> # lets you change the permissions on files CTRL+E # moves to end of line gzip <filename> # compresses files CTRL+F # moves forward one character gunzip <filename> # uncompresses files compressed by gzip CTRL+G # aborts the current editing command and ring the terminal bell gzcat <filename> #
    [Show full text]
  • Cooper (2007) Advanced Bash Scripting Guide
    Advanced Bash-Scripting Guide An in-depth exploration of the art of shell scripting Mendel Cooper <[email protected]> 5.1 10 November 2007 Revision History Revision 4.3 29 Apr 2007 Revised by: mc 'INKBERRY' release: Minor Update. Revision 5.0 24 Jun 2007 Revised by: mc 'SERVICEBERRY' release: Major Update. Revision 5.1 10 Nov 2007 Revised by: mc 'LINGONBERRY' release: Minor Update. This tutorial assumes no previous knowledge of scripting or programming, but progresses rapidly toward an intermediate/advanced level of instruction . all the while sneaking in little snippets of UNIX® wisdom and lore. It serves as a textbook, a manual for self-study, and a reference and source of knowledge on shell scripting techniques. The exercises and heavily-commented examples invite active reader participation, under the premise that the only way to really learn scripting is to write scripts. This book is suitable for classroom use as a general introduction to programming concepts. The latest update of this document, as an archived, bzip2-ed "tarball" including both the SGML source and rendered HTML, may be downloaded from the author's home site. A pdf version is also available ( pdf mirror site). See the change log for a revision history. Dedication For Anita, the source of all the magic Advanced Bash-Scripting Guide Table of Contents Chapter 1. Why Shell Programming?...............................................................................................................1 Chapter 2. Starting Off With a Sha-Bang........................................................................................................3
    [Show full text]
  • Linux Shells Book Chapter 5
    Linux Shells Book Chapter 5 What is a shell? Examples: bash Bourne Again shell ksh Korn shell tcsh C shell 1 Linux Shells Linux default shell /bin/bash How do I know what shell I am running? echo $SHELL env 2 Shell Operations Shell invocation sequence 1. read special startup file containing initialization info 2. display prompt and wait for user command 3. execute command if “end of file” exit shell otherwise execute command entered 3 Shells operations Shell command examples ls ps -ef | sort | ul -tdumb | lp “\” serves as line extension character e.g.: echo this is a long command that does \ not fit in one line 4 Commands - executables Where are the shell commands? most commands invoke utility programs shell executes executable stored in file system e.g., ls (\bin\ls) 5 Commands - built-in Where are the shell commands? other commands are built-in, e.g., echo [option]... [string] which displays line of text cd bash built-ins man cd outputs bash, :, ., [, alias, bg, bind, break, builtin, cd, command, compgen, complete, continue, declare, dirs, disown, echo, enable, eval, exec, exit, export, fc, fg, getopts, hash, help, history, jobs, kill, let, local, logout, popd, printf, pushd, pwd, read, readonly, return, set, shift, shopt, source, suspend, test, times, trap, type, typeset, ulimit, umask, unalias, unset, wait - bash built-in commands, see bash(1) 6 Shell Metacharacters 7 Redirection Output Redirection Store output of process in file echo hello world > file writes string into file echo more words >> file append string to file Input Redirection
    [Show full text]
  • Kamiak Cheat Sheet
    Kamiak Cheat Sheet Logging in to Kamiak ssh [email protected] ssh -X [email protected] X11 graphics Transferring Files to and from Kamiak From your laptop, not logged into Kamiak scp -r myFile [email protected]:~ Copy to Kamiak scp -r [email protected]:~/myFile . Copy from Kamiak rsync -ravx myFile/ [email protected]:~/myFile Synchronize Linux Commands cd Go to home directory cd .. Go up one level (.. is parent, . is current) cd ~/myPath Go to path relative to home (~ is home) ls List members of current directory pwd Show path of current directory mkdir -pv myFolder Create a directory (folder is synonymous with directory) cp -r -i myFrom myTo Copy file, -r for entire folder, -i prompt before overwrite mv -i myFrom myTo Move file or folder (-i to prompt before overwrite) rm -i myFile Delete file (-i to prompt before overwrite) rm -r -I myFolder Delete folder (-I prompt if delete more than 3 files) rm -r -f myFolder Delete entire folder, without asking rmdir myFolder Delete folder only if empty more myFile Display text file, one page at a time cat myFile Display entire file cat myFile* Matches all files beginning with myFile du -hd 1 . See disk space on folder df -h . See disk space on volume man cp Manual page for command Ctl-c Kill current command Ctl-z Suspend current command bg Run suspended command in background fg Run suspended command in foreground disown -h Disconnect from terminal - 1 - hpc.wsu.edu/cheat-sheetJuly 8, 2021 hpc.wsu.edu/training/slides hpc.wsu.edu/training/follow-along Startup and
    [Show full text]
  • What's a Script?
    Linux shell scripting – “Getting started ”* David Morgan *based on chapter by the same name in Classic Shell Scripting by Robbins and Beebe What ’s a script? text file containing commands executed as a unit “command” means a body of code from somewhere from where? – the alias list, in the shell’s memory – the keywords, embedded in the shell – the functions that are in shell memory – the builtins, embedded in the shell code itself – a “binary” file, outboard to the shell 1 Precedence of selection for execution aliases keywords (a.k.a. reserved words) functions builtins files (binary executable and script) – hash table – PATH Keywords (a.k.a. reserved words) RESERVED WORDS Reserved words are words that have a special meaning to the shell. The following words are recognized as reserved when unquoted and either the first word of a simple command … or the third word of a case or for command: ! case do done elif else esac fi for function if in select then until while { } time [[ ]] bash man page 2 bash builtin executables source continue fc popd test alias declare fg printf times bg typeset getopts pushd trap bind dirs hash pwd type break disown help read ulimit builtin echo history readonly umask cd enable jobs return unalias caller eval kill set unset command exec let shift wait compgen exit local shopt complete export logout suspend * code for a bash builtin resides in file /bin/bash, along with the rest of the builtins plus the shell program as a whole. Code for a utility like ls resides in its own dedicated file /bin/ls, which holds nothing else.
    [Show full text]
  • Bash Guide for Beginners
    Bash Guide for Beginners Machtelt Garrels Xalasys.com <tille wants no spam _at_ xalasys dot com> Version 1.8 Last updated 20060315 Edition Bash Guide for Beginners Table of Contents Introduction.........................................................................................................................................................1 1. Why this guide?...................................................................................................................................1 2. Who should read this book?.................................................................................................................1 3. New versions, translations and availability.........................................................................................2 4. Revision History..................................................................................................................................2 5. Contributions.......................................................................................................................................3 6. Feedback..............................................................................................................................................3 7. Copyright information.........................................................................................................................3 8. What do you need?...............................................................................................................................4 9. Conventions used in this
    [Show full text]
  • Learning the Bash Shell, 3Rd Edition
    1 Learning the bash Shell, 3rd Edition Table of Contents 2 Preface bash Versions Summary of bash Features Intended Audience Code Examples Chapter Summary Conventions Used in This Handbook We'd Like to Hear from You Using Code Examples Safari Enabled Acknowledgments for the First Edition Acknowledgments for the Second Edition Acknowledgments for the Third Edition 1. bash Basics 3 1.1. What Is a Shell? 1.2. Scope of This Book 1.3. History of UNIX Shells 1.3.1. The Bourne Again Shell 1.3.2. Features of bash 1.4. Getting bash 1.5. Interactive Shell Use 1.5.1. Commands, Arguments, and Options 1.6. Files 1.6.1. Directories 1.6.2. Filenames, Wildcards, and Pathname Expansion 1.6.3. Brace Expansion 1.7. Input and Output 1.7.1. Standard I/O 1.7.2. I/O Redirection 1.7.3. Pipelines 1.8. Background Jobs 1.8.1. Background I/O 1.8.2. Background Jobs and Priorities 1.9. Special Characters and Quoting 1.9.1. Quoting 1.9.2. Backslash-Escaping 1.9.3. Quoting Quotation Marks 1.9.4. Continuing Lines 1.9.5. Control Keys 4 1.10. Help 2. Command-Line Editing 2.1. Enabling Command-Line Editing 2.2. The History List 2.3. emacs Editing Mode 2.3.1. Basic Commands 2.3.2. Word Commands 2.3.3. Line Commands 2.3.4. Moving Around in the History List 2.3.5. Textual Completion 2.3.6. Miscellaneous Commands 2.4. vi Editing Mode 2.4.1.
    [Show full text]
  • 1.5 Declaring Pre-Tasks
    Invoke Release Jul 09, 2021 Contents 1 Getting started 3 1.1 Getting started..............................................3 2 The invoke CLI tool 9 2.1 inv[oke] core usage..........................................9 3 Concepts 15 3.1 Configuration............................................... 15 3.2 Invoking tasks.............................................. 22 3.3 Using Invoke as a library......................................... 30 3.4 Loading collections........................................... 34 3.5 Constructing namespaces........................................ 35 3.6 Testing Invoke-using codebases..................................... 42 3.7 Automatically responding to program output.............................. 46 4 API 49 4.1 __init__ ................................................ 49 4.2 collection .............................................. 50 4.3 config ................................................. 52 4.4 context ................................................ 58 4.5 exceptions .............................................. 63 4.6 executor ................................................ 66 4.7 loader ................................................. 68 4.8 parser ................................................. 69 4.9 program ................................................ 72 4.10 runners ................................................ 76 4.11 tasks .................................................. 87 4.12 terminals ............................................... 89 4.13 util ..................................................
    [Show full text]
  • Bash Reference Manual Reference Documentation for Bash Edition 2.5, for Bash Version 2.05
    Bash Reference Manual Reference Documentation for Bash Edition 2.5, for Bash Version 2.05. Mar 2001 Chet Ramey, Case Western Reserve University Brian Fox, Free Software Foundation Copyright c 1991-1999 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the con- ditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another lan- guage, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. Chapter 1: Introduction 1 1 Introduction 1.1 What is Bash? Bash is the shell, or command language interpreter, for the gnu operating system. The name is an acronym for the `Bourne-Again SHell', a pun on Stephen Bourne, the author of the direct ancestor of the current Unix shell /bin/sh, which appeared in the Seventh Edition Bell Labs Research version of Unix. Bash is largely compatible with sh and incorporates useful features from the Korn shell ksh and the C shell csh. It is intended to be a conformant implementation of the ieee posix Shell and Tools specification (ieee Working Group 1003.2). It offers functional improvements over sh for both interactive and programming use. While the gnu operating system provides other shells, including a version of csh, Bash is the default shell.
    [Show full text]