NESUG 2009 Coders' Corner

Techniques for Labeling Variables
Paulette W. Staum, Paul Waldron Consulting, West Nyack, NY

ABSTRACT
This paper demonstrates techniques for automating the creation of variable labels. The first example creates labels that are similar, except that each label includes a different numeric suffix. The second example imports labels from an external data source. The third example captures and modifies labels from an existing data set using PROC SQL and DICTIONARY.COLUMNS.

INTRODUCTION
Creating labels for variables is virtuous, but tedious. If you have hundreds of variables, your LABEL statement can go on for pages. No one wants to type all those labels. If the text for the labels is similar or is available electronically, there are better methods than typing the text in your program. In theory, there could be SAS® syntax to set a variable label to the value of another variable. It might look like CALL SETLABEL( varname, varlabel). Since this syntax does not exist (yet), programmers have developed a variety of techniques for labeling.

EXAMPLE 1: CREATING SIMILAR LABELS WITH NUMERIC SUFFIXES
Some data sets have many variables with names and labels that have the same beginnings but different numeric suffixes. Here is an example of specifying labels for variables. The LABEL statement can be used in a DATA step or a PROC step.

   label Sales1990 = "Annual Sales 1990"
         Sales1991 = "Annual Sales 1991"
         Sales1992 = "Annual Sales 1992"
         ...
         Sales2009 = "Annual Sales 2009"
         Sales2010 = "Annual Sales 2010"
         ;

Figure 1

A simple macro can minimize the typing (or copying, pasting and editing) required to label this series of variables. Save the macro in an autocall library for easy usage in any program. The macro needs to know four things, so it requires values for four parameters:

   Parameter   Description
   Var         Common beginning stem for variable names
   Start       Number of first variable in series
   NVars       Count of variables to be labeled
   Label       Common beginning text for variable labels

   %macro VarLabels(Var= ,Start=1 ,NVars= ,Label= );
      %local i;
      %do i = &start %to &start + &NVars - 1;
         &var&i = "&label &i"
      %end;
   %mend VarLabels;

The following macro call is equivalent to writing the label statement in Figure 1, but it is much easier and shorter.

   data mynewds2;
      array Sales {*} Sales1990 - Sales2010;
      label %VarLabels( Var=Sales, Start=1990, NVars=21, Label=Annual Sales ) ;
      ...
   run;
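One way to check what the macro generates before using it in a step is to write its output to the log with %PUT. The sketch below repeats the macro definition so it can run on its own; the three-variable series and the data set name work.sales are hypothetical, chosen only for illustration. It also shows the same call inside PROC DATASETS, since the LABEL statement is valid there as well.

```sas
%macro VarLabels(Var= ,Start=1 ,NVars= ,Label= );
   %local i;
   %do i = &start %to &start + &NVars - 1;
      &var&i = "&label &i"
   %end;
%mend VarLabels;

/* Write the generated name="label" pairs to the log without running a step */
%put %VarLabels(Var=Sales, Start=1990, NVars=3, Label=Annual Sales);

/* Hypothetical test data set with three variables in the series */
data work.sales;
   array Sales {*} Sales1990 - Sales1992;
run;

/* Apply the labels by changing only the data set descriptor */
proc datasets lib=work nolist;
   modify sales;
   label %VarLabels(Var=Sales, Start=1990, NVars=3, Label=Annual Sales);
quit;
```

Running with OPTIONS MPRINT is another way to see the generated LABEL text in the log.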


EXAMPLE 2: COPYING LABELS FROM AN EXTERNAL SOURCE
Sometimes variable labels are available from a data dictionary or other external source. You can import the labels and generate a PROC DATASETS step that assigns the labels to your variables. Here is a sample Excel spreadsheet with variable names and labels.

   Variable         Label
   NAICS            North American Industry Classification System
   Description      Industry
   TotalSales       Total Sales (thousands)
   PerCapitaSales   Per Capita Sales

For a few variables, copying and pasting label text can be acceptable. For many variables, it is both tedious and likely to lead to errors. Automating the process is a better approach. If SAS/ACCESS for PC files is available, PROC IMPORT can create a SAS data set from this Excel spreadsheet. Otherwise, export the spreadsheet to a text file and create a data set from the text file. Note that the first row of the spreadsheet contains column headers that PROC IMPORT will use as the variable names in the newly created data set of labels.

   proc import datafile='c:\nesug2009\labels\columndescriptions.xls'
      dbms=excel
      out=work.labels ;
      getnames=yes;
   run;
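If SAS/ACCESS for PC files is not available, the text-file route mentioned above could look something like the following sketch. It assumes the spreadsheet was saved as a comma-separated file with a header row; the file name columndescriptions.csv and the variable lengths are assumptions, not part of the original example.

```sas
/* Hypothetical path: assumes the spreadsheet was exported as CSV */
data work.labels;
   /* DSD handles quoted fields that contain commas; FIRSTOBS=2 skips the header row */
   infile 'c:\nesug2009\labels\columndescriptions.csv' dsd firstobs=2 truncover;
   length Variable $32 Label $256;
   input Variable $ Label $;
run;
```

The resulting data set has the same two variables, Variable and Label, that PROC IMPORT would create from the column headers.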

Here is code to create an empty sample data set with variables to label.

   libname state 'c:\nesug2009\StateData';
   data State.Industry;
      length NAICS $6 Description $70 TotalSales 8 PerCapitaSales 8;
      * STOP keeps the step from writing an observation, so the data set is empty;
      stop;
   run;

LABEL or ATTRIB statements could provide labels for these variables, but if there are hundreds or thousands of variables, it will be much faster to generate code to create the labels. PROC DATASETS is an efficient way to label the variables. (A DATA step could be used instead, but PROC DATASETS has the advantage of processing only the data set descriptor, not the entire data set.) Here is an example.

   proc datasets lib=state nolist;
      modify industry;
      label NAICS = 'North American Industry Classification System';
      label Description = 'Industry';
      label TotalSales = 'Total Sales (thousands)';
      label PerCapitaSales = 'Per Capita Sales';
   quit;

Figure 2

There are three approaches to generating this PROC DATASETS code.

   1. Create macro variables for every variable name and label. Use a macro with a DO loop to generate label statements, similar to the macro in Example 1. This would clutter the macro symbol table, leading to slower processing. It would also require the use of quoting functions to protect against special characters in the labels.
   2. Use CALL EXECUTE to stack the generated code in the queue for execution after the DATA step.
   3. Save the generated code in a temporary or permanent file for easy review.

The example below generates PROC DATASETS code and saves it in a permanent or temporary file.


   * Identify file to hold generated label statements;
   *filename gencode temp;  * to put generated code in a temporary file;
   filename gencode 'c:\nesug2009\labels\gencode.sas';  * in a permanent file;

   data _null_;
      set work.labels end=eof;
      file gencode;

      * Begin step;
      if _n_ = 1 then do;
         put "proc datasets lib=state nolist;" /
             " modify Industry;";
      end;

      * Generate label statement for each variable;
      * Use either single or double quotes to surround the label,
        depending on which kind of quotes is in your labels;
      if indexc(label,"'")=0 then
         put " label " variable " = '" label "';";
      else if indexc(label,'"')=0 then
         put " label " variable ' = "' label '";';

      * End PROC DATASETS step;
      if eof then put "quit;";
   run;

   %include gencode;
   filename gencode clear;

The effect of this code is the same as the PROC DATASETS code in Figure 2.
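For comparison, the second approach in the list above (CALL EXECUTE) might be sketched as follows. To keep the sketch runnable on its own, it builds hypothetical stand-ins for the sample data sets in the WORK library rather than using the STATE library. One caveat to keep in mind: text passed to CALL EXECUTE goes through the macro processor, so labels containing & or % may need extra protection that this sketch does not provide.

```sas
/* Hypothetical stand-in for State.Industry, kept in WORK */
data work.industry;
   length NAICS $6 Description $70 TotalSales 8 PerCapitaSales 8;
   stop;
run;

/* Hypothetical stand-in for the imported data set of labels */
data work.labels;
   infile datalines dsd;
   length Variable $32 Label $256;
   input Variable $ Label $;
   datalines;
NAICS,North American Industry Classification System
Description,Industry
TotalSales,Total Sales (thousands)
PerCapitaSales,Per Capita Sales
;

/* Queue the generated statements; they run after this DATA step ends */
data _null_;
   set work.labels end=eof;
   if _n_ = 1 then
      call execute('proc datasets lib=work nolist; modify industry;');
   /* QUOTE adds double quotes and doubles any embedded double quotes */
   call execute(catx(' ', 'label', variable, '=', quote(trim(label)), ';'));
   if eof then call execute('quit;');
run;
```

Unlike the file-based approach, the generated statements are not saved for review; they appear only in the log.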

EXAMPLE 3: MODIFYING LABELS FROM A SIMILAR DATA SET
PROC SQL can easily create a second data set from a sample data set with the exact same variables and labels.

   proc sql;
      create table work.StateIndustry2 like work.StateIndustry;
   quit;

However, modifying the variable labels will require a different approach. The first task is to retrieve the labels from an existing data set. The second task is to use string processing functions to modify the original variable labels. PROC SQL provides access to the DICTIONARY.COLUMNS table. This table has a complete description of the variables in a data set, including the labels for each variable. SASHELP.VCOLUMN provides access to the same information from a DATA step. In a DATA step, the same code generation techniques that were used in Example 2 can be applied to modify existing labels. PROC CONTENTS shows the structure of SASHELP.VCOLUMN. The variables LIBNAME, MEMNAME, NAME and LABEL hold the names of the data library, data set, and variables, plus the text of the variable labels.

   #   Variable   Type   Len   Flags   Label
   1   libname    Char     8   P--     Library Name
   2   memname    Char    32   P--     Member Name
   3   memtype    Char     8   P--     Member Type
   4   name       Char    32   P--     Column Name
   . . .
   9   label      Char   256   P--     Column Label
   . . .
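The same information can be retrieved with PROC SQL from DICTIONARY.COLUMNS. The query below is a minimal sketch that assumes the STATE libref is already assigned and that State.Industry exists. Values of LIBNAME and MEMNAME are stored in uppercase, so the comparison values are uppercase too.

```sas
proc sql;
   /* List each variable name and its current label for one data set */
   select name, label
      from dictionary.columns
      where libname = 'STATE' and memname = 'INDUSTRY';
quit;
```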


As an example, let’s create variable labels for similar data sets for two states, with variable labels that explicitly identify the relevant state. Let’s use data sets for Vermont and Maine as examples. Here is the code to generate for each state. Note that the labels for the TotalSales and PerCapitaSales variables should be modified, but not the labels for the NAICS code or description variables.

   proc datasets library=state nolist;
      modify VermontIndustry;
      label NAICS = "North American Industry Classification System";
      label Description = "Industry" ;
      label TotalSales = "Vermont Total Sales (thousands)" ;
      label PerCapitaSales = "Vermont Per Capita Sales" ;
   quit;

Figure 3A

   proc datasets library=state nolist;
      modify MaineIndustry;
      label NAICS = "North American Industry Classification System";
      label Description = "Industry" ;
      label TotalSales = "Maine Total Sales (thousands)" ;
      label PerCapitaSales = "Maine Per Capita Sales" ;
   quit;

Figure 3B

First, here is a simpler example that modifies all labels for a state. To change some labels, but not all labels, see the Appendix.

   %let state=Vermont;
   filename gencode 'c:\nesug2009\labels\gencode_example3.sas';

   data _null_;
      * Read original variable names and labels;
      set sashelp.vcolumn end=eof;
      where libname="STATE" and memname="INDUSTRY";

      file gencode;

      if _n_=1 then do;
         put "proc datasets library=state nolist; " /
             " modify &state.industry;" ;
      end;

      attrib newlabel length=$256;
      * Put quotation marks around the trimmed concatenation
        of the state and the original label text;
      newlabel = quote( trim( catx(' ', "&state ", label) ) );
      * Create label statement for each variable to re-label;
      put ' label ' name ' = ' newlabel '; ' ;

      if eof then put "quit;";
   run;

   %include gencode;
   filename gencode clear;


This generates code that is almost equivalent to the code in Figure 3A, but it modifies the labels for all variables in the input data set, including NAICS and Description. This problem is easy to solve by creating a macro with parameters that name which variables should have their labels modified. A sample macro, ModVarLabels, is in the Appendix. It provides a framework that can be customized to meet your needs. It allows you to select or exclude variables, but it does not implement changing variable names in addition to labels, using a placeholder to determine where to insert text, or handling double quotes in labels.

CONCLUSIONS
When you have similar data sets or variables, you can automate creating variable labels by using macros and/or information available in an external source or in a related data set. SAS provides tools for each step in this process. PROC IMPORT provides access to external sources. DICTIONARY.COLUMNS or SASHELP.VCOLUMN provides convenient access to information about existing SAS data set variables. PROC DATASETS is an efficient tool for adding or changing labels. Code generation lets you dynamically create data-dependent statements with varying names and labels. Macro language can package your automated labeling for ease of use.

REFERENCES
Chow, Adam, “Macro To Put Variable Labels Into a SAS® Data Set From a Crosswalk Table”, http://www.lexjansen.com/wuss/2007/CodersCorner/COD_Chow_MacroToPut.pdf

DiIorio, Frank and Abolafia, Jeff, “Dictionary Tables and Views: Essential Tools for Serious Applications”, http://www2.sas.com/proceedings/sugi29/237-29.pdf

Lafler, Kirk Paul, “Exploring DICTIONARY Tables and Views”, http://www2.sas.com/proceedings/sugi30/070-30.pdf

Pierchala, Carl E., “Accessing a SAS Variable Label: An Example of the Use of Macro Variables”, http://www.nesug.org/Proceedings/nesug97/begtut/pierchal.pdf

Ravi, Prasad, “Renaming All Variables in a SAS® Data Set Using the Information from PROC SQL's Dictionary Tables”, http://www2.sas.com/proceedings/sugi28/118-28.pdf

SAS Institute, “Create variable labels from data set values”, http://support.sas.com/kb/26/139.html

ACKNOWLEDGMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION
Your comments and questions are welcome. Please contact me:

   Paulette W. Staum
   Paul Waldron Consulting
   2 Tupper Lane
   West Nyack, NY 10994
   staump@optonline.


APPENDIX

   %macro ModVarLabels(labellib=, labelds=, newlib= ,newds= , select= ,exclude= , text= );

      %local i ;

      /* Identify file to hold generated label statements */
      filename gencode temp;

      data _null_;
         /* Read original variable names and labels */
         set sashelp.vcolumn end=eof;
         where libname="%upcase(&labellib)" and memname="%upcase(&labelds)";

         /* Point to file to hold generated label statements */
         file gencode;

         attrib newlabel length=$256;

         if _n_=1 then do;
            put "proc datasets library=&newlib nolist; " /
                " modify &newds;" ;
         end;

         /* Specify select= or exclude=, not both. If both are given, select= is used. */
         %if %length(&select) %then %do;
            /* If variable is in list of selected variables */
            if upcase(name) in (
               %do i = 1 %to %sysfunc(countw(&select, %str( )));
                  "%upcase(%scan(&select, &i, %str( )))"
               %end;
               ) then do;
         %end;
         %else %if %length(&exclude) %then %do;
            /* If variable is not in list of excluded variables */
            if upcase(name) not in (
               %do i = 1 %to %sysfunc(countw(&exclude, %str( )));
                  "%upcase(%scan(&exclude, &i, %str( )))"
               %end;
               ) then do;
         %end;

         newlabel = quote( trim( catx(' ', "&text ", label) ) );

         /* Create label statement for each variable to re-label */
         put ' label ' name ' = ' newlabel '; ' ;

         %if %length(&select&exclude) %then %do;
            end;
            /* Variable was not chosen: keep its original label */
            else put ' label ' name ' ="' label '"; ' ;
         %end;

         /* End proc datasets step */
         if eof then do;
            put "quit;";
         end;
      run;

      %include gencode;
      filename gencode clear;

   %mend ModVarLabels;

   %ModVarLabels(labellib=state, labelds=industry,
                 newlib=state, newds=VermontIndustry,
                 exclude=NAICS Description ,
                 text=Vermont );
   %ModVarLabels(labellib=state, labelds=industry,
                 newlib=state, newds=MaineIndustry,
                 exclude=NAICS Description ,
                 text=Maine );
