
Stata Workshop

Session 1

1. Introduction
2. Research question
3. Stata basics
4. Importing data into Stata
5. Cleaning a dataset

1. Introduction

• Data Services Lab and overview of the workshop.

2. Research Question

• To show what Stata is used for (and what it is useful for), we will pose a research question and try to answer it. The question:

Do females earn less than males in Oregon?

• To answer this question, we will use data from the March 2012 Current Population Survey.

• We will use a small subset of the data extract prepared by the Center for Economic and Policy Research:

http://ceprdata.org/cps-uniform-data-extracts/march-cps-supplement/march-cps-documentation/

3. Stata basics

• Opening Stata.
• Description of .do, .dta and .dct files.
• Using the menus vs. the command line.
• Syntax note: Stata is case sensitive and does not handle spaces well (for example, variable names cannot contain them). Many commands can be typed in full or in abbreviated form, and Stata will understand either.
• Finding help: using the help and search commands.
• Online video tutorials from Stata (http://www.stata.com/links/video-tutorials/) and DSL’s Stata page (go to http://ssil.uoregon.edu/dsl, then the ‘Data Services Lab’ tab, click on ‘Statistical Help’, and then on the Stata link).
• Example of a DO file. Download and open the dsl_stata_workshop1.do file from the workshop’s page (in practice, download the entire .zip file):

http://ssil.uoregon.edu/stata-workshop/
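
The basics above can be tried out in a few lines. This is only a minimal sketch of what a DO file might contain, not the contents of dsl_stata_workshop1.do. It assumes the auto example dataset that ships with Stata as a stand-in for the workshop data.

```stata
* Minimal DO file sketch (assumption: uses Stata's built-in auto
* dataset, not the workshop's CPS extract)
clear all
sysuse auto, clear

* The same command written in full and in abbreviated form
summarize price
su price

* Stata is case sensitive: the variable is called price, not Price.
* capture noisily lets the DO file continue after the error message.
capture noisily summarize Price

* Finding help
help summarize
search regression
```

Running the file from top to bottom should show the same summary table twice, then an error for Price, since no variable with that capitalization exists.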

4. Importing data into Stata

• There are many ways of loading data into Stata’s memory, depending on the format of the original data.

• 4 basic methods:
  o Copy and paste
  o insheet command for .csv files
  o infile command for dictionary (.dct) files
  o New import excel command
• Don’t know which method to use for your particular data set? Use help/search or come to the Data Services Lab!

• Download the following files from the workshop’s page:
  o cps_data.raw
  o cps_dictionary.dct
  o cps2012_extract.dta
  o cps_data.csv

• In Stata, go back to the DO file: dsl_stata_workshop1.do. Now we will try the different methods…
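
As a rough guide to what these methods look like as commands, here is a sketch. It assumes the downloaded files sit in Stata's current working directory, and it is not a copy of the workshop's DO file. The Excel file name is hypothetical, since no Excel file is part of the download, so that line is left commented out.

```stata
* Sketch only: assumes the workshop files are in the current
* working directory (check with pwd, change with cd)

* A file already in Stata format is opened with use
use cps2012_extract.dta, clear

* Comma-separated file
insheet using cps_data.csv, clear

* Raw data read through a dictionary. The using() option names the
* raw data file; it may be unnecessary if the dictionary itself
* already points to cps_data.raw.
infile using cps_dictionary.dct, using(cps_data.raw) clear

* Excel workbook. cps_data.xlsx is a hypothetical file name.
* import excel using cps_data.xlsx, firstrow clear

* Check what was loaded
describe
```

Each command uses the clear option because Stata holds one dataset in memory at a time and will not overwrite unsaved data without it.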

5. Cleaning a dataset

Labels

• You can label variables and the values of variables. Labels add descriptive text (metadata) to variables.
• Labeling a variable gives it a short description, because the variable’s name is often not very informative (e.g. ‘wkslyr’). Labels also help when creating graphs, since Stata can incorporate the information in the labels into graphs.
• Labeling a variable’s values is useful for categorical data, such as the ‘female’ variable in our dataset (go to the DO file).
• Note that you can label variables and values using the ‘Variables’ menu in the upper right-hand corner of the main screen, but we will use the DO file and the actual commands in order to learn!
• Where can I see a variable’s labels? In the ‘Variables’ menu or by using the describe and codebook commands.
• Review of label commands: label variable, label list, label define, label values.
• Where do I get the information to label values? Answer: usually from the codebook that comes with the dataset or from the survey questionnaire.
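
The label commands can be sketched on a tiny made-up dataset. The four observations, the label text and the 0/1 coding of female are all assumptions for illustration; the real descriptions and codes should come from the codebook.

```stata
* Toy data typed in by hand (made up; not the CPS extract)
clear
input female wkslyr
0 52
1 40
1 0
0 26
end

* Label the variables themselves (label text is assumed here)
label variable wkslyr "Weeks worked last year"
label variable female "Respondent is female"

* Define a set of value labels, then attach it to a variable.
* female_lbl is a name chosen for this example; 0/1 coding is assumed.
label define female_lbl 0 "Male" 1 "Female"
label values female female_lbl

* See the results
label list female_lbl
describe
codebook female
tabulate female
```

Note the two steps for values: label define only creates the mapping, and nothing changes in the data until label values attaches it to a variable.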

Deleting variables and observations

• Deleting observations vs. deleting variables: eliminate data from states other than Oregon. Look at the ‘state’ variable in the Data Editor. How do we know which value represents Oregon? Look at the codebook (download from the website).

• Delete individuals not in the labor force (babies, retired folks).

• Commands used: keep, drop and list, accompanied by if and the operators ==, !=, &, |, >=.
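
A sketch of these commands on made-up data follows. The state codes here are invented (2 merely stands in for Oregon; the real code is in the codebook), and age and inlf are hypothetical variables used to mark who is in the labor force.

```stata
* Toy data (made up). inlf is a hypothetical 0/1 indicator for
* being in the labor force; state codes are invented.
clear
input state age inlf
1 34 1
2 45 1
2 2 0
2 70 0
1 29 1
2 51 1
end

* Placeholder: in this toy data only, 2 plays the role of Oregon
local oregon 2

* Look before deleting
list if state == `oregon' & age >= 16
list if state != `oregon' | inlf == 0

* Deleting observations: keep says what stays, drop says what goes
keep if state == `oregon'
drop if inlf != 1

* Deleting a variable: no if condition, just the variable name
drop inlf

count
```

With this toy data, two observations remain at the end. Because a local macro is used, the lines should be run together as one DO file rather than one at a time.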

Generating new variables, replacing and renaming variables

• Generating new variables (note: variable names cannot contain spaces) and replacing values for new or existing variables.

• Commands used: generate, replace, rename.
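
These three commands can be sketched as follows, again on made-up data. The names full_year and weeks_last_year, and the 50-week cutoff, are hypothetical choices for illustration.

```stata
* Toy data (made up; not the CPS extract)
clear
input female wkslyr
0 52
1 40
1 0
0 26
end

* generate creates a new variable; names use underscores, not spaces
generate full_year = 0

* replace changes values of an existing variable.
* !missing() keeps missing values from counting as "50 or more".
replace full_year = 1 if wkslyr >= 50 & !missing(wkslyr)

* rename gives an existing variable a more informative name
rename wkslyr weeks_last_year

list
```

The check for missing values matters because Stata treats a missing number as larger than any real number, so a bare wkslyr >= 50 would also be true for missing observations.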