Introduction to Stata

DEPARTMENT OF BIOSTATISTICS, UNIVERSITY OF COPENHAGEN

http://www.biostat.ku.dk/~ts/stata-course
Klaus K. Holst, Thomas Scheike
Klaus Kähler Holst, 27 Apr 2016

Basic introduction to Stata with an emphasis on data processing and statistical analyses.

Time plan

    8:15 – 11:30    Lecture & Practicals
    11:30 – 12:15   Lunch
    12:15 – 15:00   Lectures & Practicals

Course outline

    Apr 27   UI, Data, Graphics
    Apr 28   Statistics

Practicals

- Stata on your own laptops (preferably version ≥ 12)
- Datasets available from the homepage:
  http://publicifsv.sund.ku.dk/~ts/stata-course/data
  http://publicifsv.sund.ku.dk/~kkho/undervisning/data
- Focus on Stata command syntax (via do-files). Reproducibility!
- ... but feel free to explore and learn from the menu interface (and perhaps discover new functions). The command syntax is also shown there; notice "Copy Command to Clipboard".

Literature

- Stata documentation (access the 11,000+ pages of documentation from the Help menu), or browse online: http://www.stata.com/features/documentation/
- Stata Journal: http://www.stata-journal.com/

      net from http://www.stata-journal.com/software

- http://www.stata.com/support/faqs/
- Stata forum: http://www.statalist.org/
- http://www.ats.ucla.edu/stat/stata/
- SSC: http://ideas.repec.org/s/boc/bocode.html
- Statistics with Stata: Version 12, Eighth Edition, by Lawrence C. Hamilton. Cengage, 2013.

Example session

Commands are entered interactively (shell-like) and with a syntax not unlike spoken language.

    use iris
    (Edgar Anderson's Iris Data)

    describe

    Contains data from iris.dta
      obs:           150                          Edgar Anderson's Iris Data
     vars:             5                          8 Sep 2014 17:32
     size:         5,400
    -------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    Sepal_Length    double  %9.0g                 Sepal.Length
    Sepal_Width     double  %9.0g                 Sepal.Width
    Petal_Length    double  %9.0g                 Petal.Length
    Petal_Width     double  %9.0g                 Petal.Width
    Species         long    %9.0g      Species    Species
    -------------------------------------------------------------------------------
    Sorted by:

Abbreviations can be used (sum instead of summarize, reg instead of regress, ...).

    sum

        Variable |   Obs        Mean    Std. Dev.    Min    Max
    -------------+---------------------------------------------
    Sepal_Length |   150    5.843333    .8280661     4.3    7.9
     Sepal_Width |   150    3.057333    .4358663       2    4.4
    Petal_Length |   150       3.758    1.765298       1    6.9
     Petal_Width |   150    1.199333    .7622377      .1    2.5
         Species |   150           2    .8192319       1      3

    correlate Sepal_Length Sepal_Width Petal_Length
    (obs=150)

                 | Sepa~gth Sepa~dth Peta~gth
    -------------+---------------------------
    Sepal_Length |   1.0000
     Sepal_Width |  -0.1176   1.0000
    Petal_Length |   0.8718  -0.4284   1.0000

    scatter Sepal_Length Petal_Length if Species==1

[Figure: scatter plot of Sepal.Length (vertical axis) against Petal.Length (horizontal axis) for Species 1]

History of statistical computing

[Figure: timeline, 1957 to 2014, of languages and packages used for statistical computing, including Fortran, Lisp, BMDP, SPSS, Genstat, Pascal, S, SAS, C, Scheme, Minitab, Matlab, C++, Common Lisp, Stata, S-plus, (X)Lisp-Stat, JMP, Python, Octave, R, Java, JavaScript, B, Clojure (Incanter), Julia and Swift]

Stata history

Originally built around a REPL (Read-Eval-Print Loop) interface; Stata can also run in batch mode.
    1985: Stata 1.0 release, DOS
    1986: program
    1987: anova, logit
    1988: Stata for UNIX, stcox
    1990: Ado-files, reshape
    1996: glm, xtreg, xtgee
    1999: arima, arch
    2003: Stata 8: GUI, graphics update, manova
    2005: Stata 9: matrix language Mata, xtmixed, svyset
    2012: sem, marginsplot
    2013: gsem, teffects
    2015: IRT, Bayesian inference, ...

The Stata Journal (2005) 5, Number 1, pp. 2–18

Stata

- Simple, consistent command syntax (REPL or batch)
- Documentation, support
- Graphical user interface
- Cross-platform (Windows, Mac OS X, Linux)
- Feature rich: http://www.stata.com/features/
- User-contributed software

Stata versions

- Stata/MP (parallel processing)
- Stata/SE (large data)
- Stata/IC (2,000 variables, up to 800 predictors)
- Small Stata

Stata compared to other computer languages

- Stata operates on a single(!) dataset in memory (spreadsheet implementation)
- Results are stored internally (not in user-assigned objects) and accessed via post-commands
- Libraries (ado-files) are loaded automatically
- The implication is a much simpler syntax, close to spoken language

    Stata            R
    use iris         load("iris.rda")
    regress y x      l <- lm(y~x, data=iris)
                     summary(l)

Vocabulary: Stata vs. computer science

    Stata syntax     Translation
    do-file          script
    ado-file         library
    macro            "variable"
    variable         column, vector, field
    program          function
    function         (math) function

User interface

- Graphical user interface
- Interactive command-line interface on Linux or with SE/MP
- Running Stata in batch mode (command line argument "-b")
- Advanced users may prefer to use their own editor (emacs/ESS, notepad++, ...), providing syntax highlighting, completion, and interaction with the Stata console (or batch)

Learn to navigate your user interface effectively.

GUI

Stata has a powerful built-in GUI:

- Data browser/editor
- Graphics interaction
- Help-file browser
- Do-file editor

It largely follows the Common User Access standard (CUA). The modifier key under Windows and Linux is ctrl and under Mac OS X it is ⌘ (the latter will be used onwards).

    Input              ⌘ + 1
    Results window     ⌘ + 2
    Command history    ⌘ + 3
    Variable list      ⌘ + 4
    Property window    ⌘ + 5

Command window

Commands are entered directly, resembling a Unix shell. More so than in other matrix REPL languages: you cannot just enter an expression but need to "display"/"echo" the result.

    display 2+2

Some shell commands exist:

    pwd

Actual shell escape (command: ! or shell):

    ! convert graph.pdf graph.png

Use ctrl + R and ctrl + B to cycle through the history. The history can be viewed with:

    #r 10

Syntax of Do-files

- Stata is case sensitive (e.g. CamelCase is allowed, also underscore)
- Use short, meaningful (and consistent) names for variables (use labels to describe them in detail)
- capture prevents Stata from halting and captures error messages
- pause: debugging of do-files
- #delimit ; allows multi-line statements (not interactively)
- C++-like comments:

      /* This also works for multiple lines */
      // Single line, can also use '*' (weird)

- Comment your code and support it with display statements

Do files

- Files with extension .do contain Stata scripts
- Reproducible research: aim towards writing every analysis step in do-files
- You can execute arbitrarily many other do-files within a do-file (modular programming) via the command

      do dofile

- Nesting of do-calls is limited to 64

A do-file may contain just a sequence of commands for doing data manipulation and regression analyses, but also actual programming elements such as loops (foreach, forvalues, while), branches (if, else), and functions (program).
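The programming elements named above can be sketched in a short do-file. This is only an illustration under stated assumptions: it presumes iris.dta (the course dataset) is in the working directory with the variable names shown in the example session, and the program name meanof is hypothetical, not part of Stata or the course material.

```stata
* Assumes iris.dta (the course dataset) is in the working directory.
use iris, clear

* foreach: summarize each measurement in turn
foreach v of varlist Sepal_Length Sepal_Width Petal_Length Petal_Width {
    quietly summarize `v'
    display as text "`v'" _col(20) "mean = " as result %6.3f r(mean)
}

* forvalues with an if/else branch: count observations per species code
forvalues s = 1/3 {
    quietly count if Species == `s'
    if r(N) > 0 {
        display "Species `s': " r(N) " observations"
    }
    else {
        display "Species `s': no observations"
    }
}

* program: a small reusable command (the name meanof is hypothetical)
capture program drop meanof
program define meanof
    syntax varname
    quietly summarize `varlist'
    display "Mean of `varlist': " r(mean)
end

meanof Sepal_Length
```

Stored results such as r(mean) and r(N) are how the loop bodies read back what the previous command computed, which is the "results stored internally" design in practice.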
These programming elements allow for complex tasks and reduce repetitive work, which is often error prone and difficult to maintain.

A display statement can mark up a section of the output:

    display as text "{hline 50}" char(13) "{bf: Result}" char(13) "{hline 50}"
    --------------------------------------------------
    Result
    --------------------------------------------------

Log files

To capture a log of the Stata session to a file:

    log using test, replace

Every input and output will be captured to the markup language file test.smcl.

    log close

To open it in the Viewer:

    view test.smcl

Translate to plain ASCII:

    translate test.smcl test.log

Do editor

- Do-file editor: ⌘ + 9
- New (do) file: ⌘ + N; open existing file: ⌘ + O
- Select a region and execute it (lines that are only partially marked are included): ctrl + D, Mac OS X: ⌘ + ⇧ + D

See also http://www.stata.com/manuals13/gsw13.pdf

Data editor

The data editor provides a nice spreadsheet view of the data and can be accessed via the shortcut ⌘ + 8 or

    edit

Copy-paste works between other spreadsheets (Excel, OpenOffice, and even HTML tables).

As a rule of thumb: do not directly alter data in the data editor! Document your data processing steps in a do-file.

A non-destructive view is typically preferred:

    browse

Filter: live filtering of the data (non-destructive).

    list in 1/5      // head
    list in -10/l    // tail
    list if Species==1

Snapshots: save and restore snapshots of the data.

    snapshot save, label("Original data")
    // snapshot list
    snapshot 1 (Original data) created at 19 Sep 2014 10:50

    capture drop if Sepal_Length >1.8 & Species!=1
    display _N
    50

    snapshot restore 1
    display _N
    150

Logical expressions

Comparison operators:

    a == b    true if a equals b
    a != b    true if a is not equal to b
    a > b     true if a is greater than b
    a >= b    true if a is greater than or equal to b
    a < b     true if a is less than b
    a <= b    true if a is less than or equal to b

Avoid the common mistake of testing equality with a single '='.

Be careful with equality checks on floating-point numbers.

Logical expressions are combined with:

    and         expr1 & expr2
    or          expr1 | expr2
    negation    !expr

    display !(2<3 & 2==3)
    1

We use logical expressions to subset data with the if statement and in conditional constructs.

    assert Sepal_Length>0

    list if Sepal_Length>6 & Sepal_Width>3.1 & Species==2

         +--------------------------------------------------------+
         | Sepa~gth   Sepa~dth   Peta~gth   Peta~dth      Species |
         |--------------------------------------------------------|
     51. |        7        3.2        4.7        1.4   versicolor |
     52. |      6.4        3.2        4.5        1.5   versicolor |
     57. |      6.3        3.3        4.7        1.6   versicolor |
         +--------------------------------------------------------+

    regress Sepal_Length Petal_Length if Species==2

Graph

Graph window/editor: ⌘ + 6

Viewer (help)

Stata has a very extensive, well-documented help system. The documentation is available online at http://www.stata.com/features/documentation/ and a lot of video material is accessible from the Stata home page.
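Returning to the warning about equality checks on floating-point numbers: the sketch below shows one way the problem can arise and two common workarounds. It uses a one-observation toy dataset with a hypothetical variable x rather than the iris data, whose measurements are stored as double.

```stata
* 0.1 has no exact binary representation, so a float variable holding 0.1
* differs slightly from the double-precision literal 0.1.
clear
set obs 1
generate float x = 0.1

display x == 0.1                // 0: float(0.1) is not equal to double 0.1
display x == float(0.1)         // 1: compare at the same precision
display reldif(x, 0.1) < 1e-6   // 1: tolerance-based comparison
```

Storing variables as double, or rounding both sides with float() before comparing, avoids the silent mismatch in if conditions such as list if x == 0.1.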
