Naming and Labeling Data

Note: The recently updated IPA Stata 102 training covers data cleaning, including naming and labeling. Also, see the guide to checking incoming data or the manual on high-frequency checks, applicable to both CAI and paper-and-pencil projects.

Variable names and labels are largely a matter of personal preference, and different PIs prefer different styles, so there is no formal convention. “Check with your PI” is usually good advice, but a very busy PI might not respond, so here are some guidelines:

1. All variables should have labels, and all multiple-choice variables should have value labels.
2. The labeling system should be internally consistent.
3. It should be easy to connect the variable in the dataset with the question on the questionnaire. Most analysis is done with the questionnaire in hand.
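Guideline 1 can be checked mechanically. The following is a minimal sketch, not an official IPA check: it loads Stata's bundled auto dataset, deliberately blanks one variable label for illustration, and then lists every variable whose label is empty.

```stata
* Illustration only: use Stata's bundled example data and blank one label.
sysuse auto, clear
label variable mpg ""

* Report every variable that has no variable label.
foreach v of varlist _all {
    local lbl : variable label `v'
    if `"`lbl'"' == "" {
        display as error "`v' has no variable label"
    }
}
```

On a real project you would drop the first two commands and run the loop on your own dataset. Whether a variable is multiple choice, and therefore needs a value label, still has to be judged against the questionnaire.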

To learn the commands for creating labels in Stata, type help label in Stata.

The most common way to name variables is to use the question number from the questionnaire as the variable name and provide a descriptive label.

The basic format here is:

Variable name: question_number
Variable label: descriptive label

So if you had questions 101 through 103 from a questionnaire called “QA,” the names and labels might be:

    label var qa_101 "Has children under 15"
    label var qa_102a "Number boys under 15"
    label var qa_102b "Number boys in school"
    label var qa_103a "Number girls under 15"
    label var qa_103b "Number girls in school"

A second good way is to use a descriptive variable name, then put the question number in the label.

The basic format here is:

Variable name: descriptive_name
Variable label: [question_number] descriptive label

There is an example of this in the document library that uses a style similar to:

    label var child15 "[QA.101] Has children under 15"
    label var child15B "[QA.102a] Number boys under 15"
    label var child15BS "[QA.102b] Number boys in school"
    label var child15G "[QA.103a] Number girls under 15"
    label var child15GS "[QA.103b] Number girls in school"
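One advantage of putting the question number in the label is that it stays searchable. As a sketch, assuming a tiny made-up dataset that holds two of the variables above, Stata's lookfor command can find a variable from the question number printed on the questionnaire:

```stata
* Hypothetical one-row dataset, built only to demonstrate the search.
clear
set obs 1
generate child15 = .
generate child15GS = .
label var child15 "[QA.101] Has children under 15"
label var child15GS "[QA.103b] Number girls in school"

* lookfor searches variable names and variable labels for the string.
lookfor "QA.103b"
```

The same search works in the other direction with the first naming style, since there the question number is part of the variable name.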

A practical tip on creating value labels: it can be useful to change the delimiter to a semicolon so that a single command can span several lines in your do-file, making it easier to read. Type help delimit in Stata to learn about delimiters. An example would be:

    #delimit ;
    label def sex
        0 "Male 0"
        1 "Female 1"
    ;
    label def reg
        1 "Northern 1"
        2 "Southern 2"
        3 "Western 3"
        4 "Eastern 4"
        5 "Central 5"
    ;
    #delimit cr
    label values female sex
    label values region reg
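An alternative that avoids changing the delimiter is Stata's /// line-continuation marker, which, like #delimit, is meant for do-files rather than the interactive command window. This sketch defines the same region label that way; the replace option is added here only so the snippet can be rerun without error.

```stata
* Same value label as above, written with /// line continuation.
label define reg      ///
    1 "Northern 1"    ///
    2 "Southern 2"    ///
    3 "Western 3"     ///
    4 "Eastern 4"     ///
    5 "Central 5"     ///
    , replace

* Show the stored mapping to confirm the definition.
label list reg
```

Which style to use is a matter of preference; the main point is to pick one and apply it consistently across the project's do-files.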

Note that each value label includes its numeric value. This is not strictly necessary, but it can be very useful when you want to refer to specific values, for example when writing a condition such as region == 3.
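If you would rather not type the numbers by hand, Stata's numlabel command can add them to existing value labels. It adds the number as a prefix, not as a suffix as in the example above. A minimal sketch using the bundled auto dataset, whose foreign variable carries the value label origin:

```stata
* Illustration only: bundled example data with an existing value label.
sysuse auto, clear

* Prefix each label in origin with its numeric value, then inspect it.
numlabel origin, add
tabulate foreign

* Remove the prefix again if it is no longer wanted.
numlabel origin, remove
```

Because numlabel changes the stored labels, it is worth agreeing with your PI whether the numbered labels should be saved in the dataset or applied only temporarily during cleaning.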