
Reproducible Research
Release 1.0
Fotis E. Psomopoulos
Oct 24, 2018

Table of Contents

1 Setting up
  1.1 Git
  1.2 Jupyter
  1.3 R / RStudio
  1.4 Install the R Kernel for Jupyter
2 Introduction to the Reproducible analysis and Research Transparency workshop
  2.1 Learning objectives for this workshop:
  2.2 Replication and reproduction
  2.3 Science retracts paper without agreement of lead author
  2.4 Retracted, but not fraud
  2.5 The Four Facets of reproducibility:
  2.6 References / Sources
3 Transparent, Reproducible and Open Research
  3.1 Transparency
  3.2 Questionable Research Practice
  3.3 Reproducibility
  3.4 Open reporting of results
  3.5 References / Sources
4 Tools and Platforms for Reproducibility and Transparency in research
  4.1 R And The History of Reproducible Research
  4.2 The Open Science Framework
  4.3 Dryad
  4.4 Figshare
  4.5 Zenodo
  4.6 References / Sources
5 Jupyter Notebook for Open Science
  5.1 Introduction to Markdown
  5.2 Markdown
  5.3 So, what is a Jupyter notebook?
  5.4 Cool! How do I install Jupyter?
  5.5 Ok, I’m set! What’s next?
  5.6 The Future of Jupyter
  5.7 Reading material
  5.8 References / Sources
6 R for Reproducible Scientific Analysis (Jupyter)
  6.1 Data
  6.2 Code and document
  6.3 Loading and Cleaning Data
  6.4 Subsetting our data
  6.5 Using the vegan package
  6.6 Results
  6.7 Conclusions
  6.8 References / Sources
7 R for Reproducible Scientific Analysis (RMarkdown / knitr)
  7.1 RMarkdown
  7.2 Creating a .Rmd File
  7.3 Anatomy of Rmarkdown file
  7.4 Chunk Labels
  7.5 Chunk Options
  7.6 Global Chunk Options
  7.7 Tables
  7.8 Citations and Bibliography
  7.9 Bibliography
  7.10 Placement
  7.11 Citation Styles
  7.12 Citations
  7.13 Publishing on RPubs
  7.14 Amazing Resources for learning Rmarkdown
  7.15 References / Sources
8 Versioning with Git
  8.1 Software Carpentry source lesson
  8.2 Automated Version Control
  8.3 How can version control help me make my work more open?
  8.4 Storing our newly created Jupyter file to GitHub
  8.5 References / Sources
9 Docker and Reproducibility
  9.1 Docker
  9.2 When to build a Docker image
  9.3 Distributing a Docker Image
  9.4 Running a Docker image
  9.5 Jupyter, Docker, MyBinder
  9.6 References / Sources
10 Indices and tables

CHAPTER 1
Setting up

1.1 Git

If Git is not already available on your machine, you can try to install it via your distro’s package manager. For Debian/Ubuntu run

    sudo apt-get install git

and for Fedora run

    sudo yum install git

Windows

• https://git-for-windows.github.io/
• Download the Git for Windows installer.
• Run the installer.

In any case, also create an account on GitHub; it will be useful for the hands-on exercises.

1.2 Jupyter

Jupyter notebook is an interactive web application that allows you to type and edit lines of code and see the output. The software requires a Python installation, but currently supports interaction with over 40 languages.

Jupyter notebook examples

1.2.1 Installation

Jupyter install instructions

For new users, installation of Anaconda is highly recommended.
Anaconda conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

Use the following installation steps:

1. Download Anaconda. We recommend downloading Anaconda’s latest Python 3 version (currently Python 3.5).
2. Install the version of Anaconda which you downloaded, following the instructions on the download page.

Install on UNIX machines:

First we will install Anaconda, a Python distribution that includes the conda package manager.

    curl -OL https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.0-Linux-x86_64.sh
    bash Anaconda2-2.4.0-Linux-x86_64.sh

Install on Mac OS X:

First we will install Anaconda, a Python distribution that includes the conda package manager.

    curl -OL http://repo.continuum.io/archive/Anaconda2-4.1.1-MacOSX-x86_64.sh
    bash Anaconda2-4.1.1-MacOSX-x86_64.sh -b

(If you are prompted, press Enter, then keep pressing Enter to move through the instructions. Be careful not to keep pressing Enter without reading; otherwise you will end up answering No. You will need to type ‘Yes’ to continue with the installation.)

After the Anaconda install has finished, type:

    source ~/.bashrc
    conda install jupyter
    conda install -c r r r-essentials

This will install packages allowing you to open either a new Python .ipynb or an R .ipynb.

Navigate to the directory on your computer with the files you want to explore. Then type:

    jupyter notebook

This will open your browser with a list of files; you should see the files in the repository. Click “New” to open the pull-down menu. Under ‘Notebooks’, click either R or Python to start your new notebook!

1.2.2 Using Jupyter notebooks:

The main keyboard command to remember is how to execute the code from a cell. Type code into a cell and then hit Shift-Enter.

If you’re in Python 2, type:

    print "Hello World!"

or for Python 3:

    print("Hello World!")

Then press Shift-Enter.

For more instructions, the Help menu has a good tour and detailed information. Notebooks can be downloaded locally by going to the File menu, then selecting Download and choosing a file type to download.

1.2.3 References for learning Python

• http://rosalind.info/problems/locations/
• http://learnpythonthehardway.org/book/
• http://www.learnpython.org/
• http://www.pythontutor.com/visualize.html#mode=edit

1.3 R / RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Install R by downloading and running the correct installer file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select “Run as administrator” instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R). Also, please install the RStudio IDE.

1.3.1 Install the required packages

    install.packages(c("dplyr", "tidyr", "vegan", "ggplot2"))

1.4 Install the R Kernel for Jupyter

Jupyter supports Python by default, but this functionality can be expanded by using additional kernels. For our case, we will also use the R kernel. After launching R, run the following commands:

    install.packages('devtools')
    devtools::install_github('IRkernel/IRkernel')
    IRkernel::installspec()  # to register the kernel in the current R installation

That’s it, all
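
As a quick sanity check once the steps in this chapter are complete, the sketch below reports which of the command-line tools installed above can be found on the PATH. The helper name `check_tool` and its output wording are illustrative assumptions, not part of the workshop material. It relies only on `command -v`, the standard POSIX shell lookup.

```shell
#!/bin/sh
# Hypothetical helper (not from the workshop material): report whether a
# command can be found on the PATH. Returns 0 if found, 1 if missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools installed in this chapter: Git, Anaconda (conda), Jupyter and R.
# "|| true" keeps the loop going when a tool is missing.
for tool in git conda jupyter R; do
    check_tool "$tool" || true
done
```

If Jupyter is reported as found, running `jupyter kernelspec list` afterwards should show an entry for the R kernel (registered under the name `ir`) once `IRkernel::installspec()` has been run.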
