
SIDM 4: Publication quality tables, loops in Stata and Outputting results

Publication quality tables, loops and outputting results

Learn an efficient way to produce publication quality tables from Stata & save loads of time (using Excel). Learn simple loops in Stata & save more time. For sophisticated users, there are ways to automate the extraction of statistical results which can save time, when there is considerable repetition.

Recap on preceding workshops, SIDM1, SIDM2 and SIDM3.

These covered finding your way around Stata, useful basic commands, good house-keeping (such as importance of keeping a do file of commands), and getting your data into an appropriate format for analysis. This includes the need to check what you are doing, as you go along, and to label data appropriately. This was extended to include merging datasets, and restructuring datasets.

Learning objectives of this Session SIDM4

The easiest way to produce tables from Stata output is probably to cut and paste from the output window into Excel, copying data in table format. This guide teaches you quick ways within Excel to create good-looking tables.

Loops are a way of repeating some Stata code, so that it runs on different variables/ values. This can save time compared to cutting and pasting, especially when you need to go back and amend the code.

Relatively simple loops are the limit to which automation can help save time for many users. For sophisticated users/ those duplicating very many similar analyses: there are higher levels of automation for extracting regression coefficients and similar.

Contents

1. Producing quality tables from Stata output using excel
   Student Exercises: Producing tables in Excel from Stata output
2. Loops in Stata: repeatedly running code on different variables/ with different values
   Student Exercises on loops
3. Accessing saved results of analyses for use in other commands – useful with very many repetitions
   Student Exercises on extracting data saved by Stata commands
4. Outputting Data from Stata

SDM=Stata Data Management.doc Hilary Watt SIDM=Stata Introduction and Data Management.doc workshops SCCS=Stata Commands Crib Sheet.xls 4.1


1. Producing quality tables from Stata output using excel

Select the table required from the Stata output window, being sure to select complete rows. Right click within Stata, and select "copy as table". Move to Excel and paste. Check that the data falls reasonably well into columns; if not, try again. (Note that copying as a picture and pasting into Word is useful if you just want a record of certain parts of the output, perhaps adding graphs in between, but you will not be able to manipulate it.)

This web example pastes into Word, but you can use the same method to copy into Excel: http://www.ats.ucla.edu/stat/stata/faq/outgraph.htm

Excel tables, efficient methods to create these:

In excel, I often write confidence intervals, surrounded by brackets, for example: 23.4 (19.3, 27.5) - all in one excel cell. It is straightforward to do this, referring to data in other cells, and then copy down/ across to get different rows/ columns of the table.

For example, if the excel cells D4 & E4 contain these values: D4=23.346, E4=43.22, then a cell containing:

="("&D4&", "&E4&")"

contains various parts separated by & (& = concatenate, that is, write one thing after the other):

="(" shows (
&D4 shows 23.346
&", " shows , (remember to include any required spaces in quotes, such as after the comma)
&E4 shows 43.22
&")" shows )

resulting in (23.346, 43.22) when ="("&D4&", "&E4&")" is written in an Excel cell. Type the quotes as plain straight double quotes, since Excel does not accept curly quotes in formulas.

By referring to data contained in cells, such as D4 and E4, copying this command down will automatically refer to data in cells below. Copying right refers to cells to the right.

It is usually necessary to round off numbers, to avoid loads of decimal places in the output. When D4=23.346, then:

round(D4,1)=23.3 (to 1 decimal place)
round(D4,0)=23 (to nearest whole number)
round(D4,2)=23.35 (to 2 decimal places)

="("&round(D4,1)&", "&round(E4,1)&")" in an Excel cell will show: (23.3, 43.2)

An alternative to the round Excel command is to use the text Excel command. This has the advantage of always giving the same number of decimal places, even when the last decimal place is 0. When D4=23.346, then:

text(D4,"0.0")=23.3 (to 1 decimal place)
text(D4,"0")=23 (to nearest whole number)
text(D4,"0.00")=23.35 (to 2 decimal places)
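If you prefer, the same kind of cell contents can be built in Stata before copying across to Excel. This is only a sketch of one possible approach: the variable names lci and uci are made up for illustration, and the values are the D4 and E4 values used above. The string() function with a display format behaves like the Excel text command, always giving the requested number of decimal places.

```stata
* Sketch only: lci and uci are hypothetical variables holding the two limits
clear
input lci uci
23.346 43.22
end

* string(x, "%9.1f") converts a number to text with 1 decimal place;
* + joins pieces of text, like & in Excel
gen ci = "(" + string(lci, "%9.1f") + ", " + string(uci, "%9.1f") + ")"
list ci
```

The resulting string variable can be copied into Excel or Word in the same way as any other output.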


See producing tables in excel by cutting and pasting from stata output.xls

It is easy to then cut and paste from Excel into a Word document.

More details on concatenating, in case you have not understood so far:
https://support.office.com/en-us/article/CONCAT-function-9b1a9a3f-94ff-41af-9736-694cbd6b4ca2?ui=en-US&rs=en-US&ad=US
https://support.office.com/en-us/article/CONCATENATE-function-8f8ae884-2ca8-4f7a-b093-75d702bea31d

Student Exercises: Producing tables in Excel from Stata output

Use dataset ihddata3.dta.

a) Summarise the data for age, bmival and nummeds. Copy this data into Excel from the output window (select complete rows then right click to copy as a table, then paste into Excel). Now produce a table, which contains 4 columns as follows:
   Col 1 – variable name of interest with units (write these in yourself)
   Col 2 – the number of observations with non-missing data
   Col 3 – mean (SD), i.e. mean with the SD in brackets in the same cell (concatenate using & symbols)
   Col 4 – [min – max], i.e. min & max values, with square brackets around and "–" between

b) Now add extra lines to the table, with columns 1 and 2 as above, for data: doctor diagnosed diabetes (dm1), presence of chronic pain (cpain) and presence of CVD (cvd1). Remember to check how these are coded. Col 3 should now contain the number with each condition (and the percentage with each condition in brackets). Col 4 is empty.

c) Now add extra lines to the table with numbers and percentages by smoking group, with a line for each smoking category. Also add extra lines with numbers and percentages by BMI category.

You can then cut and paste this table into word. It is easiest to change the presentation in Excel beforehand, if desired.

2. Loops in Stata: repeatedly running code on different variables/ with different values

The most widely used loops in Stata are the foreach loop, which loops over different variable names (out of a list of variables), and the forvalues loop, which loops over numerical values (either consecutive numbers, or else numbers which increase by a specific set value). The syntax is as follows:

An example is:

foreach xxx in var1 var2 var3 {   // list of variables to loop over; line ends with open bracket {
    replace `xxx'=. if `xxx'==999   // commands referring to `xxx' - note the 2 different types of quotes
}   // this close bracket goes on its own line

This performs the following commands. More generally, it repeats all the command lines which are enclosed (on separate lines) between the foreach brackets { }:

replace var1=. if var1==999   // first time around the loop, `xxx' replaced by var1
replace var2=. if var2==999   // second time around the loop, `xxx' replaced by var2
replace var3=. if var3==999   // third time around the loop, `xxx' replaced by var3

Since only these 3 variables are listed in the foreach command, Stata moves on to the command lines that follow the loop after this. There can be many commands inside the loop, using `xxx' to refer to each of the variables that are being looped over in turn.

Note the use of different single quotes at the two sides of `xxx'. Start with the left single quote ` (the backtick, from the far left key on standard keyboards, sloping top left to bottom right). Then after xxx write ' (the ordinary single quote or apostrophe, from the right hand side of standard keyboards, which either looks vertical or else slopes from bottom left to top right). Note: xxx in the foreach statement can be replaced with any other word/letters that you like. The same word/letters are then referred to within the loop in single quotes.
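To try the loop above without touching your own data, here is a small self-contained sketch. The variables var1 to var3 and their values are invented purely for illustration; the count line is an extra I have added so that you can see how many values are about to be recoded.

```stata
* Sketch only: var1-var3 are made-up variables created here for illustration
clear
set obs 5
gen var1 = cond(_n == 2, 999, _n)        // 999 in row 2
gen var2 = cond(_n == 4, 999, _n * 10)   // 999 in row 4
gen var3 = _n * 100                      // contains no 999 values

foreach xxx in var1 var2 var3 {
    count if `xxx' == 999                // how many values are about to be recoded
    replace `xxx' = . if `xxx' == 999    // recode 999 to missing
}

list                                     // check the result by eye
```

Run all of these lines together from a do file, so that the loop is executed as one block.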

Local constant values

The local command can be used to store a constant value that you can then refer to:

local k=12

This sets k=12, and stores this as a constant within Stata, though only temporarily.

local k=12   // this needs to be run in a do file at the same time as the following commands
di `k'   // this displays the value of k, which is set temporarily to 12
gen newvar=var12+`k'   // this creates a new variable "newvar" by adding 12 to var12
local k=`k'+1   // this adds 1 to k - note the quotes around k on the right hand side of the = sign, but not on the left hand side!!

Note that for this to work, you need to select several statements in your do file at the same time, and "do" them all at once, since the local value will otherwise disappear too soon.
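Here is a runnable sketch of the local commands above. The variable var12 is made up here (with values 1, 2, 3) so that the example works on an empty dataset; everything else follows the commands already described.

```stata
* Sketch only: var12 is a hypothetical variable created for illustration
clear
set obs 3
gen var12 = _n                 // values 1, 2, 3

local k = 12
display `k'                    // k is currently 12
gen newvar = var12 + `k'       // adds 12 to var12
local k = `k' + 1              // quotes on the right of the = sign only
display `k'                    // k is now 13
gen newvar2 = var12 + `k'      // adds 13 to var12
list
```

If you run these lines one at a time rather than together, the display lines will fail or show nothing, because the local k no longer exists by the time the next line runs.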


It is possible to use “local” statements and a foreach loop to create a correspondence between the list of variables and “local” numbers:

local k=3   // this needs to be run in a do file at the same time as the following commands
foreach xxx in aaa bbb ccc {   // list of variables to loop over; line ends with open bracket {
    rename `xxx' var`k'   // commands referring to `xxx' - note the 2 different types of quotes
    local k=`k'+1   // this adds 1 to k - note use of quotes
}   // this close bracket goes on its own line

k starts at 3, so this renames aaa to var3, bbb to var4, and ccc to var5. There is a correspondence between starting variable names and numbers within the loop as follows:

Initially `xxx'=aaa, `k'=3 THEN `xxx'=bbb, `k'=4 THEN `xxx'=ccc, `k'=5.
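The renaming loop can be tried out on a made-up dataset as sketched below. The variables aaa, bbb and ccc are created here only for illustration, and the display line is an extra that I have added so you can watch the correspondence between `xxx' and `k' as the loop runs.

```stata
* Sketch only: aaa, bbb and ccc are made-up variables
clear
set obs 2
gen aaa = 1
gen bbb = 2
gen ccc = 3

local k = 3
foreach xxx in aaa bbb ccc {
    display "renaming `xxx' to var`k'"   // shows the pairing on each pass
    rename `xxx' var`k'
    local k = `k' + 1                    // move on to the next number
}

describe                                 // variables are now var3, var4, var5
```

Stata also accepts local ++k as a shorthand for local k = `k' + 1, if you prefer it.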

Forvalues loops

forvalues uses very similar syntax, but is used for looping over consecutive numbers (or numbers with the same interval between them). "kkk=2/8" implies taking kkk=2, then kkk=3, then 4, up to 8 (i.e. increasing in steps of 1). "kkk=10(5)40" implies starting with kkk=10, and increasing in steps of 5, up to 40.

forvalues kkk=2/8 { // loops the following commands with kkk=2, 3, 4, 5, 6, 7 then 8

gen cutoff`kkk'=0 if wage!=. /* creates a new var with a number at the end of its name (cutoff2 then cutoff3 ...) and sets this new var=0 for non-missing wages */

replace cutoff`kkk'=1 if wage> `kkk' & wage!=. /* sets cutoff#=1 when wage is above the specified cut-off value and not missing */

} // need to end loop with this on its own line

forvalues kkk=10(5)40 { /* loops the following commands with kkk=10,15,20,25,30,35,40 (from 10 to 40 in steps of 5) */

gen cutoff`kkk'=0 if wage!=. // as above

replace cutoff`kkk'=1 if wage> `kkk' & wage!=. // as above

}
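The gen and replace pair inside these loops can also be written as a single line, as sketched below. This is an alternative, not a correction. I use the auto dataset that is installed with Stata, with price standing in for wage, and cut-offs chosen arbitrarily for illustration.

```stata
* Sketch only: price from the auto example dataset stands in for wage
sysuse auto, clear

forvalues kkk = 4000(2000)10000 {
    * (price > `kkk') is 1 when true and 0 when false;
    * the if condition leaves the new variable missing when price is missing
    gen byte cutoff`kkk' = price > `kkk' if !missing(price)
}

summarize cutoff*     // each mean is the proportion above that cut-off
```

The check for missing values matters because Stata treats missing as larger than any number, so without it a missing price would be coded as above every cut-off.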

Looping over non-consecutive numbers, with unequal intervals, and over other things too

Foreach can be used with numbers, and with numbers that do not have an equal interval between them (unlike forvalues, which needs equal intervals between the numbers):

foreach ssss of numlist 1 3 4 2 53 432 {   // specify whatever numbers we like to loop over

di `ssss' // this simply displays each number in turn – replace/ add other code as required

}

The Stata manual gives further ways in which foreach can be used (specifying names of variables to be created, or specifying names of macros), in case the above is not flexible enough, or in case you particularly want to program in a way that will save you time. This looks quite sophisticated to me, so I think what I have already described here is enough for nearly everyone. http://www.stata.com/manuals13/pforeach.pdf

If you want to loop over all possible values, without needing to specify what these values are, then read the following: http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/
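One common way of looping over all values that a variable takes is the levelsof command combined with foreach. The sketch below is my own illustration using the auto dataset installed with Stata (the variable rep78 and the macro name levels are just for this example), so it may differ in detail from the approach in the link.

```stata
* Sketch only: uses the auto example dataset
sysuse auto, clear

levelsof rep78, local(levels)      // stores the distinct non-missing values of rep78

foreach l of local levels {        // note: "of local" plus the macro name, without quotes
    display "rep78 = `l'"
    summarize price if rep78 == `l'
}
```

levelsof leaves out missing values by default, so the loop runs once for each observed category only.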

Student Exercises on loops:

Use data ihddata3.dta.

d) Look to see which variables have negative numbers. Find out if these may correspond to a useful category (label list may help). Recode all negative numbers to missing values, for all variables where it seems appropriate to do this. Summarise the data, to see if other values look reasonable, as far as you can tell, and to see how much missing data there is.

e) Calculate the log of 4 values in a loop – of age and nummeds. Summarise to see what you have, paying attention to missing values.

f) Use a local statement, as well as a foreach loop, to create graphs to check the normality of bmi and age, and their logs. Save the results as graph1.gph, graph2.gph or similar, with different numerical values as part of the names of the graphs. This will allow you to see which is closer to the normal distribution. Then show more than one graph at the same time. Here are some hints:

qnorm bmi, saving(graph1.gph, replace) /* this shows a normal probability plot, which is a straight line for a Normal distribution, and saves it as a graph named graph1.gph. The replace option means over-write any graph of this name already present in the working directory. */
graph combine graph1.gph graph2.gph   // combines graphs so that both are viewed together


Note that for the graph combine statement to work, you need to have changed directory to one of your own; this does not seem to work when graphs are saved into the default Stata directory.

g) Use a forvalues statement to create several dummy variables for BMI, coding them to 1 when they are above the specified cut-off value, and to zero when they are below it. Firstly, go up from 20 to 40 in steps of 1, then go from 10 to 60 in steps of 5.

This is as far as most users need to get to. Aiming to automate more can increase the time taken, since writing programmes to automate can in itself be time-consuming.

3. Accessing saved results of analyses for use in other commands – useful with very many repetitions

Remember that using the egen command is a simple way to create a variable which contains summary statistics, which you can then use in calculations. This is generally simpler than using saved results, so is the first thing to try (if the summary statistic that you want is available within egen; help egen will tell you this).
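As a reminder of how egen can be used in this way, here is a short sketch that standardises a variable. It uses the auto dataset installed with Stata, and the new variable names are made up for illustration.

```stata
* Sketch only: uses the auto example dataset
sysuse auto, clear

egen mean_price = mean(price)      // the overall mean, repeated in every row
egen sd_price   = sd(price)        // the overall SD, repeated in every row

gen z_price = (price - mean_price) / sd_price   // standardised price
summarize z_price                  // mean should be about 0 and SD about 1
```

Adding the by prefix (for example, bysort foreign: egen ...) gives the summary statistic within groups instead.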

Help stored results for this – return list and ereturn list

After any command, you can look to see whether or not stata has saved any results. Try the commands ereturn list (which lists saved results from most regression analyses and some other commands), and return list (which lists saved results from many simpler commands). This will show you what is available and what things are called.
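A minimal sketch of looking at and using saved results, based on the auto dataset installed with Stata. The local name m is just for illustration.

```stata
* Sketch only: uses the auto example dataset
sysuse auto, clear

summarize price
return list                 // shows r(N), r(mean), r(sd), r(min), r(max) and others
display r(mean)             // use a saved result directly
local m = r(mean)           // or copy it, before another command replaces r()

regress price mpg
ereturn list                // shows e(N), e(r2), e(b), e(V) and others
display e(r2)               // R-squared of the regression just run
display `m'                 // the copied mean is still available
```

Copying a result into a local straight away is a good habit, since the saved results only last until the next command of the same type is run.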

The main results of a regression equation are saved in matrices: e(b) contains the regression coefficients, and e(V) contains the covariance matrix of the regression coefficients. The only elements of matrix e(V) that you are likely to use are the diagonal elements, which are the SE's squared, for each regression coefficient in turn. Since these results are given as matrices, matlist is useful, since it lists the contents of matrices. The el function extracts elements from matrices, so el(e(V),2,3) extracts the 2nd row, 3rd column from matrix e(V) (although it does not always seem to work directly in practice in my experience).

matlist e(b)   // lists the matrix of regression coefficients
matlist e(V)   // lists matrix e(V), which is the covariance matrix saved after regression analysis
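One way of getting at individual elements, which may avoid the difficulties with el mentioned above, is to copy e(b) and e(V) into named matrices first and then use square brackets. This is a sketch on the auto dataset installed with Stata; the matrix names b and V are my own choice.

```stata
* Sketch only: uses the auto example dataset
sysuse auto, clear
regress price mpg weight

matrix b = e(b)            // row vector of coefficients: mpg, weight, _cons
matrix V = e(V)            // covariance matrix of the coefficients
matlist b
matlist V

display "coefficient of mpg = " b[1,1]
display "SE of mpg          = " sqrt(V[1,1])   // square root of diagonal element

* the same two numbers are also available directly by variable name
display _b[mpg] "   " _se[mpg]
```

The _b[] and _se[] shortcuts are often the simplest option when you only need the coefficient and SE of a named variable.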


corr age bmival   // find correlation between age and bmi
return list   // see what results are saved
matlist r(C)   // list the matrix, which contains the correlation coefficient(s)
gen corr1=el(r(C),1,2)   /* takes the 1st row, 2nd column element from matrix r(C), and puts the result into variable corr1 */
summ corr1   // corr1 is full of identical values, all = correlation between age and bmi
gen float corr2=el(r(C),1,2) if _n==1   /* mystery to me why the above command without an if statement works, but this does not - _n==1 implies first row, so attempting to put result into 1st row (possibly because summ is itself a command that saves results, so running it replaces the saved r(C)) */

gen nnn=_n   // create a new variable which simply counts the row number
gen float corr2=el(r(C),1,2) if nnn==1   /* this time it works - puts results into the first line of new variable corr2 */
summ corr2   // now there is just one value in variable corr2, which is in the first row
gen corrvars="age bmi" if nnn==1   /* keep a record of what correlation this is of, by adding a string variable containing variable names of the 2 relevant variables */

corr age nummeds   // find a second correlation
replace corr2=el(r(C),1,2) if nnn==2   // puts results into the second line of variable corr2
replace corrvars="age nummeds" if nnn==2   // again record what correlation this is
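A more robust version of the same idea is sketched below: copy r(C) into a named matrix immediately after corr, so that it does not matter if a later command (such as summ) replaces the saved results. This uses the auto dataset installed with Stata rather than ihddata3.dta, and the names C, row, corr2 and corrvars are just for illustration.

```stata
* Sketch only: uses the auto example dataset
sysuse auto, clear
gen row = _n                           // row counter

corr price mpg
matrix C = r(C)                        // copy straight away
summarize price                        // this replaces r(), but C is unaffected
gen corr2 = C[1,2] if row == 1         // 1st row, 2nd column of the copied matrix
gen corrvars = "price mpg" if row == 1

corr price weight
matrix C = r(C)
replace corr2 = C[1,2] if row == 2
replace corrvars = "price weight" if row == 2

list corrvars corr2 in 1/2
```

Each new correlation simply overwrites the matrix C, after its value has been stored in the next row of corr2.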

Loop to extract results of regression or other analyses

I've written a loop which extracts the results of linear regression analyses, using the saved results that are shown by the ereturn list command. It is in the do file "extracting output results from regression and putting into dataset2.do". This is useful when you are undertaking very many regression analyses, say 20 or more, which are identical except for one variable which keeps on changing. It is slightly fiddly to amend to a specific situation, but is very flexible in what you can save. You can also amend it to extract information from any analysis where there are Stata saved results.
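To give a flavour of what such a loop can look like, here is a short sketch using Stata's postfile commands. This is not the do file mentioned above, which may well work differently; it is an alternative illustration on the auto dataset installed with Stata, with made-up names (results, regout, xvar, coef, se).

```stata
* Sketch only: an illustration using postfile, not the author's do file
sysuse auto, clear

tempname results                       // handle for the results file
tempfile regout                        // temporary dataset to hold the results
postfile `results' str32 xvar double(coef se) using "`regout'", replace

foreach x in mpg weight length {       // the one variable that keeps changing
    regress price `x'
    post `results' ("`x'") (_b[`x']) (_se[`x'])   // one row per regression
}

postclose `results'

use "`regout'", clear                  // replaces the data in memory with the results
list
```

Because the final use command replaces the data in memory, you may want to save your main dataset first, or put preserve before the loop and restore afterwards.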

Student Exercises on extracting data saved by Stata commands:

Using ihddata3.dta.

h) Perform a ttest of bmival by sex (help ttest). Type return list, to see the saved results. Create a new variable which contains the difference in means and the SE of the difference in means, using the saved results.

i) Perform a linear regression of bmival on age and sex (regress bmival age i.sex). See what results are saved by Stata (using ereturn list). Find the regression coefficients and their standard errors in the Stata output (the SE's are not directly listed in the output but can be derived from the output). You will need the matlist command here, which prints out the named matrix.

j) Look at the do file, which performs multiple regression analyses, looping over different results. Amend this so that it also saves the regression coefficient of age, and the R square value (if you want to practice). If you want to automatically extract data in this way, amend it to your data and use it: "extracting output results from regression and putting into dataset2.do".

4. Outputting Data from Stata

I don't use these myself and don't know anyone who routinely uses these commands, so they might not be worth the time and effort to learn. But they are potentially worthwhile if you are doing loads of regressions. The above method does not work well when different regressions have different numbers of variables in them.

This section points to material that others have written, which will help you get output out of Stata in a more user-friendly form. Many of these commands are not currently formally a part of Stata, as you will have it installed on your own computer.

Installing new Stata commands: some commands are written by users. If you type “help tabout” and get the error message help for tabout is not found, then this implies that you need to install tabout before you can use it. You only need to do this once per computer.

Much of Stata is written by its users. They can write new commands, which are not necessarily formally a part of Stata. Stata does at times incorporate the most useful of these into new releases of Stata, after putting them through a certain amount of testing. Others are available for you to install on your individual version of Stata.

For many of these options for outputting data, you will need to install new commands into your version of Stata, which is why I teach this here:

Installing new commands from the internet into your version of Stata: http://www.ats.ucla.edu/stat/stata/faq/findit.htm
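As a sketch of what installation can look like in practice: many user-written commands, including the ones mentioned in this section, are held on the SSC archive and can be installed with ssc install. This needs an internet connection, and it is my own suggestion rather than the method in the link above, which uses findit.

```stata
* Sketch only: requires an internet connection; run once per computer
ssc install estout, replace      // provides estout, esttab and eststo
ssc install outreg2, replace
ssc install tabout, replace

which estout                     // confirms the command is now installed
help estout                      // the help file should now be found
```

The replace option simply updates the command if an older copy is already installed.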

Estout can be used quite flexibly, to output a table of linear regression coefficients, in a number of different formats. The following link uses the estout command with different options, to give different tables. You might want to look through these, and copy and paste the syntax of your favourite layout. Then pay attention to the earlier part of the file, to see how to go about it. This is very flexible and pretty easy to implement. http://www.ats.ucla.edu/stat/stata/faq/estout.htm
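Here is a minimal sketch of the kind of syntax involved, using esttab, which is installed as part of the estout package and is a simpler wrapper around estout. It assumes that the package has already been installed, uses the auto dataset installed with Stata, and the model names m1 and m2 and the file name regtable.csv are made up for illustration.

```stata
* Sketch only: assumes the estout package is already installed
sysuse auto, clear

eststo clear                               // remove any previously stored models
eststo m1: regress price mpg               // run and store the first model
eststo m2: regress price mpg weight        // run and store the second model

* table in the results window: coefficients and SEs to 2 decimal places, plus R-squared
esttab m1 m2, b(2) se(2) r2

* the same table written to a file that Excel can open
esttab m1 m2 using "regtable.csv", csv b(2) se(2) r2 replace
```

The file is written to the current working directory, so change directory to one of your own first.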

Outreg2 (and outreg) is an alternative, as shown here:

http://www.ats.ucla.edu/stat/stata/faq/outreg.htm
http://dss.princeton.edu/training/Outreg2.pdf

The following tutorial includes outputting tables for logistic regression: http://www2.fiu.edu/~tardanic/make.pdf

Here is a tutorial on outputting of tables from stata into a format that you can use in your reports: http://www.ianwatson.com.au/stata/tabout_tutorial.pdf

You can also save results into excel in a very flexible way. I haven’t explored this much, but I suspect this might be useful for people producing a report in the same format repeatedly, but possibly too fiddly for most purposes https://www.youtube.com/watch?v=MUQ3E8hIQZE

A menu of different ways of exporting results from Stata http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/misc/exporting_results
