Introduction to Stata

Klaus Kähler Holst, Thomas Scheike
Department of Biostatistics, University of Copenhagen
27 Apr 2016
http://www.biostat.ku.dk/~ts/stata-course

Basic introduction to Stata with an emphasis on data processing and statistical analyses.

Time plan
8:15-11:30   Lecture & Practicals
11:30-12:15  Lunch
12:15-15:00  Lectures & Practicals

Course outline
Apr 27: UI, Data, Graphics
Apr 28

Practicals

- Stata on your own laptops (preferably version ≥ 12)
- Datasets available from the homepage:
  http://publicifsv.sund.ku.dk/~ts/stata-course/data
  http://publicifsv.sund.ku.dk/~kkho/undervisning/data
- Focus on Stata command syntax (via do-files). Reproducibility!
- ... but feel free to explore and learn from the menu interface (and perhaps discover new functions). The command syntax is also shown there; notice "Copy Command to Clipboard".

Literature

- Stata documentation (access the 11,000+ pages of documentation from the Help menu), or browse online: http://www.stata.com/features/documentation/
- Stata Journal: http://www.stata-journal.com/

    net from http://www.stata-journal.com/software

- http://www.stata.com/support/faqs/
- Stata forum: http://www.statalist.org/
- http://www.ats.ucla.edu/stat/stata/
- SSC: http://ideas.repec.org/s/boc/bocode.html
- Statistics with Stata: Version 12, Eighth Edition, by Lawrence C. Hamilton. Cengage 2013

Example session

Commands are entered interactively (shell-like) and with a syntax not unlike spoken language.
Abbreviations can be used (sum instead of summarize, reg instead of regress, ...).

    use iris
    describe

    (Edgar Anderson’s Iris Data)

    Contains data from iris.dta
      obs:           150                    Edgar Anderson’s Iris Data
     vars:             5                    8 Sep 2014 17:32
     size:         5,400
    ------------------------------------------------------------------
                  storage   display   value
    variable name   type    format    label     variable label
    ------------------------------------------------------------------
    Sepal_Length    double  %9.0g               Sepal.Length
    Sepal_Width     double  %9.0g               Sepal.Width
    Petal_Length    double  %9.0g               Petal.Length
    Petal_Width     double  %9.0g               Petal.Width
    Species         long    %9.0g     Species   Species
    ------------------------------------------------------------------
    Sorted by:

    sum

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
    Sepal_Length |       150    5.843333    .8280661        4.3        7.9
     Sepal_Width |       150    3.057333    .4358663          2        4.4
    Petal_Length |       150       3.758    1.765298          1        6.9
     Petal_Width |       150    1.199333    .7622377         .1        2.5
         Species |       150           2    .8192319          1          3

    correlate Sepal_Length Sepal_Width Petal_Length

    (obs=150)

                 | Sepa~gth Sepa~dth Peta~gth
    -------------+---------------------------
    Sepal_Length |   1.0000
     Sepal_Width |  -0.1176   1.0000
    Petal_Length |   0.8718  -0.4284   1.0000
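The remark on abbreviations can be tried out with a short do-file. This is only a sketch: it assumes the auto.dta example dataset that ships with Stata (loaded with sysuse) rather than the course's iris.dta, so that it runs without downloading anything.

```stata
* Sketch: abbreviated and full command names are interchangeable.
* Assumption: uses auto.dta (shipped with Stata), not the course data.
sysuse auto, clear

summarize price mpg        // full command name
sum price mpg              // abbreviation, same result

regress price mpg weight   // full command name
reg price mpg weight       // abbreviation, same fit
```

In do-files that are shared with others, the full command names are usually easier to read; abbreviations are mostly a convenience for interactive work.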

Example session

    scatter Sepal_Length Petal_Length if Species==1

[Figure: scatter plot of Sepal.Length (vertical axis) against Petal.Length (horizontal axis) for Species==1]

History of statistical computing

[Figure: timeline, 1957-2014, of programming languages and statistical packages: SAS, Octave, JMP, R, SPSS, Fortran, Minitab, Clojure (Incanter), Pascal, Genstat, (X)Lisp-Stat, C++, JavaScript, Matlab, S-plus, Java, Stata, Swift, Common Lisp, Python, BMDP, S, B, Scheme, C, Julia, Lisp]

Stata history

Originally built around a REPL (Read-Eval-Print Loop) interface (or runs in batch mode).

1985: Stata 1.0 release, DOS
1986: program
1987: anova, logit
1988: Stata for UNIX, stcox
1990: Ado-files, reshape
1996: glm, xtreg, xtgee
1999: arima, arch
2003: Stata 8: GUI, graphics update, manova
2005: Stata 9: matrix language Mata, xtmixed, svyset
2012: sem, marginsplot
2013: gsem, teffects
2015: IRT, Bayesian inference, ...

Source: The Stata Journal (2005) 5, Number 1, pp. 2-18

Stata

- Documentation, support
- Graphical user interface
- Cross-platform (Windows, Mac OS X, Linux)
- Feature rich: http://www.stata.com/features/
- User-contributed

Stata versions:

- Stata/MP (parallel processing)
- Stata/SE (large data)
- Stata/IC (2,000 variables, up to 800 predictors)
- Small Stata

Stata compared to other computer languages

- Simple, consistent command syntax (REPL or batch)
- Stata operates on a single(!) dataset in memory (spreadsheet implementation)
- Results are stored internally (not in user-assigned objects) and accessed via post-commands
- Libraries (ado-files) are loaded automatically

The implication is a much simpler syntax, close to spoken language:

    Stata            R
    use iris         load("iris.rda")
    regress y x      l <- lm(y~x,data=iris)
                     summary(l)

Vocabulary: Stata vs. computer science

    Stata syntax     Translation
    do-file          script
    ado-file         library
    macro            "variable"
    variable         column, vector, field
    program          function
    function         (math) function

User interface

- Graphical user interface
- Interactive command-line interface on Linux or with SE/MP
- Running Stata in batch mode (command-line argument "-b")
- Advanced users may prefer to use their own editor (emacs/ESS, notepad++, ...), providing syntax highlighting, completion, interaction with the Stata console (or batch), ...

Learn to navigate your user interface effectively. Stata has a powerful built-in GUI:

- Data browser/editor
- Graphics interaction
- Help-file browser
- Do-file editor

It pretty much follows the Common User Access standard (CUA). The modifier key under Windows and Linux is ctrl and under Mac OS X ⌘ (the latter will be used onwards).
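The point that results are stored internally and accessed via post-commands may be easier to see in a small sketch. It assumes the auto.dta example dataset shipped with Stata; return list, ereturn list, r(), e() and _b[] are the standard ways of reading back what the previous command left behind.

```stata
* Sketch: reading stored results after a command.
* Assumption: uses auto.dta (shipped with Stata), not the course data.
sysuse auto, clear

summarize mpg
return list                          // r-class results left by summarize
display "mean mpg = " r(mean)

regress price mpg
ereturn list                         // e-class results left by regress
display "slope = " _b[mpg] ", N = " e(N)
```

Note that the stored results are overwritten by the next command of the same class, so values needed later should be copied into a macro or scalar straight away.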

GUI

Main windows and their shortcuts:

    Input (Command window)   ⌘+1
    Results window           ⌘+2
    Command history          ⌘+3
    Variable list            ⌘+4
    Property window          ⌘+5

Command window

Commands are entered directly, resembling a Unix shell. Unlike most matrix REPL languages, you cannot just enter an expression but need to "display"/"echo" the result:

    display 2+2

Some shell-like commands exist:

    pwd

Actual shell escape (command: ! or shell):

    ! convert graph.pdf graph.png

ctrl+R and ctrl+B cycle through the command history. The history can be viewed with:

    #r 10
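A few more commands in the same spirit, as a sketch to be run from a do-file (when typing in the Command window, leave out the // comments, which are only recognised in do-files). Nothing here depends on the course data.

```stata
* Sketch: using display as a calculator, plus some shell-like commands.
display 2+2                 // expressions must be wrapped in display
display sqrt(2)
display %9.3f exp(1)        // an optional display format comes first
pwd                         // print the working directory
dir                         // list files in the working directory
```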

Do files

Files with extension .do contain Stata scripts.

Reproducible research: aim towards writing every analysis step in do-files.

You can execute arbitrarily many other do-files within a do-file (modular programming) via the command

    do dofile

Nesting of do-calls is limited to 64.

A do-file may contain just a sequence of commands for data manipulation and regression analyses, but also actual programming elements such as loops (foreach, forvalues, while), branches (if, else), and functions (program). This allows for complex tasks and reduces repetitive work, which is often error-prone and difficult to maintain.

Syntax of do-files

- Stata is case sensitive (e.g. CamelCase is allowed, also underscore).
- Use short, meaningful (and consistent) names for variables (use labels to describe them in detail).
- capture prevents Stata from halting and captures error messages.
- pause: debugging of do-files.
- #delimit ; allows multi-line statements (not interactively).
- C++-like comments:

    /* This also works for
       multiple lines */
    // Single line, can also use '*' (weird)

Comment your code and support it with display statements:

    display as text "{hline 50}" char(13) "{bf: Result}" char(13) "{hline 50}"

    --------------------------------------------------
    Result
    --------------------------------------------------
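The programming elements listed above (program, foreach, forvalues, if/else, capture) could be combined roughly as follows. This is a sketch only: the program name show_mean and the variable no_such_variable are made up for illustration, and the data are auto.dta shipped with Stata rather than the course data.

```stata
* Sketch of programming elements in a do-file.
* Assumptions: auto.dta (shipped with Stata); show_mean is a hypothetical name.
sysuse auto, clear

* A program (Stata's term for a function)
capture program drop show_mean
program define show_mean
    args varname
    quietly summarize `varname'
    display as text "`varname': mean = " as result %9.2f r(mean)
end

* Loop over variables
foreach v of varlist price mpg weight {
    show_mean `v'
}

* Loop over a numeric range, with a branch
forvalues i = 1/3 {
    if `i' == 2 {
        display "i is two"
    }
    else {
        display "i is `i'"
    }
}

* capture suppresses the error; _rc holds the return code
capture confirm variable no_such_variable
display "return code: " _rc
```

The capture program drop line makes the do-file safe to rerun, since defining a program that already exists is an error.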

Do editor

    Do-file editor                       ⌘+9
    New (do) file                        ⌘+N
    Open existing file                   ⌘+O
    Select region
    Execute region (including lines      ctrl+D, Mac OS X: ⇧+⌘+D
    that are only partially marked)

Log files

To capture a log of the Stata session to a file:

    log using test, replace

Every input and output will be captured to the markup-language file test.smcl.

    log close

To open in the Viewer:

    view test.smcl

Translate to plain ASCII:

    translate test.smcl test.log
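A do-file will often wrap its whole analysis in a log. The sketch below assumes the file name test from the commands above and uses auto.dta (shipped with Stata) as stand-in content; the text option of log using writes a plain-text log directly, which avoids the separate translate step.

```stata
* Sketch: logging from within a do-file.
* Assumptions: log name "test" as above; auto.dta as example content.
capture log close                 // close a log left open earlier, if any
log using test, replace text      // text option: writes test.log, not test.smcl
sysuse auto, clear
summarize price
log close
type test.log                     // print the plain-text log
```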

Reference: http://www.stata.com/manuals13/gsw13.pdf

Data editor

Data Editor: ⌘+8

The data editor provides a nice spreadsheet view of the data and can be accessed via the shortcut ⌘+8 or

    edit

Copy-paste works between other spreadsheets (Excel, OpenOffice, and even HTML tables). As a rule of thumb: do not directly alter data in the data editor! Document your data processing steps in a do-file.

A non-destructive view is typically preferred:

    browse
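Following the rule of thumb above, a correction that might otherwise be typed into the data editor can be written as commands instead. The sketch assumes auto.dta (shipped with Stata); the variable name rep78_missing and the cut-off of 15000 are purely illustrative.

```stata
* Sketch: documenting data changes as commands rather than manual edits.
* Assumptions: auto.dta; rep78_missing and the 15000 cut-off are hypothetical.
sysuse auto, clear
count if missing(rep78)                        // inspect before changing anything
generate byte rep78_missing = missing(rep78)   // flag instead of overwriting
label variable rep78_missing "rep78 not recorded"
replace price = . if price > 15000             // hypothetical cleaning rule
list make price rep78 rep78_missing in 1/5     // check the result
```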

Data editor

Filter: live filtering of the data (non-destructive).

    list in 1/5      // head
    list in -10/l    // tail
    list if Species==1

Snapshots: save and restore snapshots of the data.

    snapshot save, label("Original data")
    // snapshot list

    snapshot 1 (Original data) created at 19 Sep 2014 10:50

    capture drop if Sepal_Length>1.8 & Species!=1
    display _N

    50

    snapshot restore 1
    display _N

    150

Logical expressions

Comparison operators:

    a == b    true if a equals b
    a != b    true if a is not equal to b
    a > b     true if a is greater than b
    a >= b    true if a is greater than or equal to b
    a < b     true if a is less than b
    a <= b    true if a is less than or equal to b

Avoid the common mistake of testing equality with a single '='.
Be careful with equality checks for floating-point numbers.

Logical expressions are combined with

    and         expr1 & expr2
    or          expr1 | expr2
    negation    !expr

    display !(2<3 & 2==3)

    1

We use logical expressions to subset data with the if qualifier, or for conditional constructs:

    assert Sepal_Length>0

    list if Sepal_Length>6 & Sepal_Width>3.1 & Species==2

         +------------------------------------------------------+
         | Sepa~gth   Sepa~dth   Peta~gth   Peta~dth    Species |
         |------------------------------------------------------|
     51. |        7        3.2        4.7        1.4 versicolor |
     52. |      6.4        3.2        4.5        1.5 versicolor |
     57. |      6.3        3.3        4.7        1.6 versicolor |
         +------------------------------------------------------+

    regress Sepal_Length Petal_Length if Species==2

Graph

Graph window/editor: ⌘+6
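The warning about equality checks on floating-point numbers can be made concrete with a small sketch on a toy dataset (the variable names x and y are arbitrary). A variable created with generate is stored as float by default, whereas a literal such as 0.1 is evaluated in double precision, so the two need not compare equal.

```stata
* Sketch: why == on non-integer values needs care.
* Assumption: a toy dataset created on the spot; x and y are arbitrary names.
clear
set obs 3
generate x = 0.1                 // stored as float (the default type)
count if x == 0.1                // literal is double precision: no matches
count if x == float(0.1)         // round the literal to float first: all match
generate double y = 0.1
count if y == 0.1                // a double variable compares as expected
```

Wrapping the constant in float(), or storing the variable as double, are the two usual ways around this.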

Viewer (help)

Help viewer: ⌘+7

Stata has a very extensive, well-documented help system. The documentation is available online at http://www.stata.com/features/documentation/ and lots of video material is accessible from the Stata home page.

All 11,000 pages are available from the menu: Help > PDF Documentation. The PDF files are cross-document linked, but the newest Adobe Reader removed this navigation for security reasons. Solution: go to Preferences > Documents and uncheck "Open cross-document links in same window".

Viewer (help)

Good place to start:

    help contents

Specific topics:

    help logistic
    help spaghetti

To narrow the search:

    search logistic, local     // Stata keyword database
    search logistic, manual    // search the Stata manual
    search logistic, net       // online resources (also user-contributed)
    search logistic, sj        // search The Stata Journal (and STB)

Stata specific files

.dta
Data files (platform-independent format). Contain data in rectangular (spreadsheet) format with attached labels and notes. Examples: with mydata.dta in the current working directory

    use mydata

or reference relative to the working directory

    use ../data/mydata

.do
Do-files (scripts)

    do myfile

or open and process in the Do-file editor.

.ado
Program files (automatic do-files). Install new programs in the local personal ado-directory or via ssc.

.smcl, .log
Stata log files (output dump) via the log command.

.hlp
Ado documentation (plain or smcl format).

.gph, .grec
Stata graph and graph edit files (via the Graph editor).

Ado files

Ado files are automatically loaded do-files containing programs. In other statistical programs, software add-ons have to be explicitly loaded by the user.

Many commands in Stata are defined as ado files, e.g.

    which ci

    /Applications/Stata/ado/base/c/ci.ado
    *! version 3.3.16 11feb2013

To view the source code use type or

    viewsource ci

    *! version 3.3.16 11feb2013
    program define ci, rclass byable(recall)
    version 6, missing
    global S_1 /* # obs */
    global S_3 /* mean */
    global S_4 /* se of mean */
    ...

Ado files are searched for in the current working directory (pwd) and in the Stata system directories. The personal ado-directory is:

    Windows:   (probably) c:\ado\personal
    Mac OS X:  ~/Library/Application Support/Stata/ado/personal
    Linux:     ~/ado/personal

    sysdir

    STATA:    /Applications/Stata/
    BASE:     /Applications/Stata/ado/base/
    SITE:     /Applications/ado/
    PLUS:     ~/Library/Application Support/Stata/ado/plus/
    PERSONAL: ~/Library/Application Support/Stata/ado/personal/
    OLDPLACE: ~/ado/

Installing additional software

No need to reinvent the wheel. Find packages and add-ons:

    findit schemes

Sources:

- Boston College Statistical Software Components (SSC) archive
- Stata Journal (SJ)

    ssc hot

    Jul 2014
    Rank   # hits    Package     Author(s)
    ----------------------------------------------
       1   11533.1   outreg2     Roy Wada
       2   10871.4   estout      Ben Jann
       3    6419.7   csipolate   Nicholas J. Cox
    ...

    ssc install outreg2

Keeping up to date:

    adoupdate
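In a do-file meant to run on several machines, it can be convenient to install a user-written package only when it is missing. The sketch below uses estout from the ssc hot list above as the example package; it assumes an internet connection and write access to the PLUS directory.

```stata
* Sketch: install a package from SSC only if it is not already available.
* Assumptions: internet access; estout chosen only as an example package.
capture which estout
if _rc {
    ssc install estout
}
which estout                // report where the ado-file was found
adopath                     // list the directories searched for ado-files
```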

profile.do

You may install a file named profile.do in the "Ado" search path, for example to hold:

- Options for graphics, output, random seeds, etc.
- Global macro definitions
- The default working directory

An example profile.do:

    set more off
    set searchdefault all
    set scheme s2mono
    noisily pwd

System options can be examined with the query command, and details seen with

    help query

File system

Try to stay organized with do-files for specific projects in their own directory, e.g.

    project/         do-files
    project/data/    data files (dta, csv)
    project/figs/    graphics output

To set the working directory to your home directory:

    cd

Create a new directory in the current working directory and change to it:

    mkdir project
    cd project

Display the current working directory:

    pwd
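The directory layout suggested above could be set up from a do-file along these lines. This is a sketch under stated assumptions: the global macro names datadir and figdir and the file names auto_copy and price_mpg are invented for illustration, and auto.dta (shipped with Stata) stands in for real project data.

```stata
* Sketch: create a project skeleton and use global macros for its paths.
* Assumptions: datadir, figdir, auto_copy and price_mpg are hypothetical names.
capture mkdir project             // capture: no error if it already exists
cd project
foreach d in data figs {
    capture mkdir `d'
}
global datadir "data"
global figdir  "figs"

sysuse auto, clear
save "$datadir/auto_copy", replace
scatter price mpg
graph save "$figdir/price_mpg", replace   // writes a .gph file
pwd
```

Keeping the paths in global macros means that only two lines need to change if the data or figures are moved.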