John W. Emerson, Department of Statistics, Yale University © 2013

STAT 625: Statistical Case Studies

John W. Emerson
Yale University

Abstract

This term, I’ll generally present brief class notes and scripts, but much of the lecture material will simply be presented in class and/or available online. However, student participation is a large part of this course, and at any time I may call upon any of you to share some of your work. Be prepared. This won’t be a class where you sit and absorb material. I strongly encourage you to work with other students; you can learn a lot from each other. To get anything out of this course you’ll need to put in the time for the sake of learning and strengthening your skills, not for the sake of jumping through the proverbial hoop.

Lesson 1: Learning how to communicate. This document was created using LaTeX and Sweave (which is really part of R). There are some huge advantages to a workflow like this, relating to a topic called reproducible research. Here’s my thought: pay your dues now, reap the benefits later. Don’t trust me on everything, but trust me on this. A sister document was created using R Markdown and the markdown package, and this is also acceptable. Microsoft Office is not.

So, in today’s class I’ll show you how I produced these documents and I’ll give you the required files. I recommend you figure out how to use LaTeX on your own computer. You already have Sweave, because it comes with R. Use of markdown is even easier (it’s a simple installation from CRAN), though it doesn’t easily produce nice PDF files; it’s better for publishing to the web. Both are integrated nicely in the RStudio environment.

Secondly, you need to learn to use our department server, Euler, which is “euler.stat.yale.edu”, for the purpose of building a course web page. I’ll talk about this more in class.
A brief word of caution, though: if you visit http://euler.stat.yale.edu/ you will be redirected to http://statistics.yale.edu (which does not “live” on euler). In contrast, http://euler.stat.yale.edu/~jay “lives” on euler. There is no such thing as http://statistics.yale.edu/~jay.

1 Our computing environment(s)

I’m sure we have a mix of Mac and PC users (and perhaps a Linux enthusiast or two). In theory, everything we do should be platform independent, and you are encouraged to become more familiar with advanced computing on your personal computers. However, some aspects of this course may be most efficient if we work together in the same environment once in a while. And I’d like you to have a web page for your course work, hence the following material.

Everyone has (or could soon have) an account on Euler, the department Linux server. Everyone can log in to Euler remotely, regardless of their type of personal computer or location on (or off) campus.

1.1 Accessing Euler from a Mac via SSH

Good news, Mac users: you can open a terminal (if you have never done this, ask for help) and simply type one of the following commands:

    ssh euler.stat.yale.edu -l NETID
    ssh NETID@euler.stat.yale.edu

Then enter your password when prompted, and you’re in!

1.2 Accessing Euler from a PC via SSH (or PuTTY)

An SSH client is needed to connect remotely, and may or may not be available on Yale’s computers. If you need it on your own PC, you can download PuTTY from

    http://www.yale.edu/software/

Once it’s installed, you want to connect to euler.stat.yale.edu with your NETID and password. You may want WinSCP for file transfers (the PuTTY suite also includes a command-line secure FTP client, PSFTP).

1.3 Accessing Euler’s filesystem

I grudgingly admit that “drag and drop” interactivity with the Linux filesystem can be convenient, but it is not sufficient for this course.
You’ll need to become proficient at a basic level with SSH (and perhaps the sister program, SFTP, for transferring files).

2 Getting started on Euler

Once you’re in, you should have a screen like the one in Figure 1. At this point, a good rule of thumb is: DON’T TOUCH THE MOUSE. As a first example, we’ll create a folder (directory) for the course, and make sure a few folder permissions are set properly. Linux is case-sensitive, so be careful. I’ll discuss these commands in class, and they should soon become second nature.

    pwd                 # print the current (home) directory
    ls                  # list its contents
    ls -lat             # long listing, including hidden files, newest first
    chmod 755 .         # set permissions on the current directory itself
    ls -lat
    mkdir Stat625       # create the course directory
    ls -lat
    chmod 755 Stat625   # set permissions on the new directory
    ls -lat
    cd Stat625          # move into it
    pwd
    ls

Figure 1: A screenshot showing an SSH session to euler.

You can see what many of these commands do, although chmod may be less than obvious. Each file or directory has a set of permissions, such as drwxr-xr-x; this particular example says “this is a directory” (because it starts with a d), “I have read, write, and execute permission” (because of the first triplet rwx), and “people in my group as well as everyone else on the system have read and execute permission, but not write permission” (because of the second and third triplets, r-x). Reading each triplet as a three-bit binary number (rwx is 111, r-x is 101) gives the octal digits 755; think about it. Essentially, anyone can enter the directory and see the contents, but only you can create files inside this directory. In fact, depending on your user defaults, the chmod commands above may not have been necessary. Use of 644 is common for files where you want to have read-write permission for yourself, but only read permission for anyone else; 700 (for directories) or 600 (for files) restricts use to you and you alone. Maybe we can explore this in class, but it isn’t critical.
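The correspondence between numeric modes and permission triplets can be sketched in a few lines of shell. This is only an illustration: mode_to_symbolic is a hypothetical helper name, not a standard command, and it assumes a plain three-digit mode (no setuid, setgid, or sticky bits).

```shell
#!/bin/sh
# Hypothetical helper (not a standard command): convert a three-digit
# octal mode such as 755 into the rwx string that "ls -l" displays.
# Assumption: only the ordinary permission digits 0-7 are given.
mode_to_symbolic() {
    mode=$1
    out=""
    while [ -n "$mode" ]; do
        rest=${mode#?}           # everything after the first digit
        digit=${mode%"$rest"}    # the first digit on its own
        mode=$rest
        # Each octal digit is three bits: read = 4, write = 2, execute = 1.
        if [ $(( $digit & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
        if [ $(( $digit & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
        if [ $(( $digit & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    done
    printf '%s\n' "$out"
}

# Print the four modes mentioned in the text next to their triplets.
for m in 755 644 700 600; do
    printf '%s  %s\n' "$m" "$(mode_to_symbolic "$m")"
done
```

Running the script should print rwxr-xr-x for 755, rw-r--r-- for 644, rwx------ for 700, and rw------- for 600, which you can compare against the output of ls -lat on Euler.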
The Linux system has help pages (manual pages, or man for short), too; for help on pwd, for example:

    -bash-3.2$ man pwd

Press the space bar to page through the manual, and q to exit. The following web page has a nice summary of a bunch of useful Linux commands:

    http://www.ss64.com/bash/

2.1 R on Euler

Next, let’s fire up R and play a bit. Simply type R at the prompt. Your prompt may look a little different, depending on the default settings:

    -bash-3.2$ R

The result should be familiar, as will the following commands. Notice how easy it is to integrate graphics into a document. Now I admit a single “cut and paste” into Word isn’t that bad, but... graphics change, and you’ll generally have many more of them, which can be a pain. Of course, being able to display code and results in a document without painful formatting by hand is pretty cool.

    > sayhello <- function(x) cat(paste("Hello, ", x, "!\n", sep=""))
    > sayhello("Jay")
    Hello, Jay!
    > normalsample <- rnorm(100)
    > summary(normalsample)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    -3.0110 -0.5820  0.1094  0.1036  1.0090  2.5890
    > ls()
    [1] "normalsample" "sayhello"

Now, if you are logged into Euler and running R interactively, you can’t view any graphics. However, if you are using Sweave the story is different... as this document demonstrates. To see the example, examine the Computing1.Rnw file and see where Figure 2 is produced.

Figure 2: A sample histogram produced using Sweave. (Plot title: Histogram of normalsample; x-axis: normalsample, from −3 to 3; y-axis: Frequency, from 0 to 20.)

3 Editing files

This gets a little stickier. When I used to work in Windows I would transfer the file to my Windows laptop, edit the file locally using “Notepad” (not Microsoft Word!), and then transfer the file back to euler.
This is kind of clunky, though, and problems sometimes occur with end-of-line characters (see below for more on this). There is another potential danger: you might lose track of which copy is current, the local copy or the copy on the server, and mistakenly edit the wrong one. There is a real potential for lost work and general confusion with this approach.

There are basically two alternatives: (1) edit the file directly on Euler inside SSH using a Linux editor like “vi” or “emacs”, or (2) edit the file over a network connection to the filesystem. See below for information on this second route.

3.1 Using “vi”

A neat little introduction to vi is

    http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html

although after the introduction it moves quickly on to features that, frankly, I never use. There is a small set of commands that I find most useful.

    Command      Action
    arrow keys   move around
    <Esc>        exit from insertion mode once you get into it; the most
                 commonly used ways of entering insertion mode follow
    i            insert here at my present location
    a            insert (append) just after my present location
    o            open a new line below the present one for insertion
    O            open a new line above the present one for insertion
    x            delete this character
    dd           delete this line
    ZZ           save and exit
    :q!          exit without saving
    :w           save the file without exiting

A more advanced (or, at least, exhaustive) reference is available from the author of vi, Bill Joy:

    http://docs.freebsd.org/44doc/usd/12.vi/paper.html