Techniques for Labeling SAS® Variables

NESUG 2009 Coders' Corner

Techniques for Labeling Variables
Paulette W. Staum, Paul Waldron Consulting, West Nyack, NY

ABSTRACT

This paper demonstrates techniques for automating the creation of variable labels. The first example creates labels that are similar, except that each label includes a different numeric suffix. The second example imports labels from an external data source. The third example captures and modifies labels from an existing data set using PROC SQL and DICTIONARY.COLUMNS.

INTRODUCTION

Creating labels for variables is virtuous, but tedious. If you have hundreds of variables, your LABEL statement can go on for pages. No one wants to type all those labels. If the text for the labels is similar or is available electronically, there are better methods than typing the text in your program.

In theory, there could be SAS® syntax to set a variable label to the value of another variable. It might look like CALL SETLABEL( varname, varlabel). Since this syntax does not exist (yet), programmers have developed a variety of techniques for labeling.

EXAMPLE 1: CREATING SIMILAR LABELS WITH NUMERIC SUFFIXES

Some data sets have many variables with names and labels that have the same beginnings but different numeric suffixes. Here is an example of specifying labels for variables. The LABEL statement can be used in a DATA step or a PROC step.

    Label Sales1990 = "Annual Sales 1990"
          Sales1991 = "Annual Sales 1991"
          Sales1992 = "Annual Sales 1992"
          ...
          Sales2009 = "Annual Sales 2009"
          Sales2010 = "Annual Sales 2010"
          ;

Figure 1

A simple macro can minimize the typing (or copying, pasting and editing) required to label this series of variables. Save the macro in an autocall library for easy use in any program.
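As a minimal sketch of the autocall setup mentioned above, the statements below point SAS at a folder of macro source files. The folder path and the fileref name mymacs are hypothetical and are not taken from the paper. The sketch assumes each macro is stored in its own file named after the macro, with a .sas extension (on UNIX hosts the file name must be lowercase).

```sas
* Hypothetical folder holding one .sas file per macro (an assumption);
filename mymacs 'c:\nesug2009\macros';

* Turn on autocall and search the hypothetical folder first,
  then the default SASAUTOS libraries;
options mautosource sasautos=(mymacs sasautos);
```

With these options in effect, the first call to a macro stored in that folder compiles it automatically, so no %INCLUDE of the macro source is needed in each program.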
The macro needs to know four things, so it requires values for four parameters:

    Parameter   Description
    Var         Common beginning stem for variable names
    Start       Number of first variable in series
    NVars       Count of variables to be labeled
    Label       Common beginning text for variable labels

    %macro VarLabels(Var= ,Start=1 ,NVars= ,Label= );
       %local i;
       %do i = &start %to &start + &NVars - 1;
          &var&i = "&label &i"
       %end;
    %mend VarLabels;

The following macro call is equivalent to writing the LABEL statement in Figure 1, but it is much easier and shorter.

    data mynewds2;
       array Sales {*} Sales1990 - Sales2010;
       label %VarLabels( Var=Sales, Start=1990, NVars=21, Label=Annual Sales ) ;
       ...
    run;

EXAMPLE 2: COPYING LABELS FROM AN EXTERNAL SOURCE

Sometimes variable labels are available from a data dictionary or other external source. You can import the labels and generate a PROC DATASETS step that assigns the labels to your variables. Here is a sample Excel spreadsheet with variable names and labels.

    Variable         Label
    NAICS            North American Industry Classification System
    Description      Industry
    TotalSales       Total Sales (thousands)
    PerCapitaSales   Per Capita Sales

For a few variables, copying and pasting label text can be acceptable. For many variables, it is both tedious and likely to lead to errors. Automating the process is a better approach.

If SAS/ACCESS for PC files is available, PROC IMPORT can create a SAS data set from this Excel spreadsheet. Otherwise, export the spreadsheet to a text file and create a data set from the text file. Note that the first row of the spreadsheet contains column headers that PROC IMPORT will use as the variable names in the newly created data set of labels.

    proc import datafile='c:\nesug2009\labels\columndescriptions.xls'
       dbms=excel out=work.labels replace;
       getnames=yes;
    run;

Here is code to create an empty sample data set with variables to label.
    libname state 'c:\nesug2009\StateData';

    data State.Industry;
       length NAICS $6 Description $70 TotalSales 8 PerCapitaSales 8;
    run;

LABEL or ATTRIB statements could provide labels for these variables, but if there are hundreds or thousands of variables, it will be much faster to generate code to create the labels. PROC DATASETS is an efficient way to label the variables. (A DATA step could be used instead, but PROC DATASETS has the advantage of processing only the data set descriptor, not the entire data set.) Here is an example.

    proc datasets lib=state nolist;
       modify industry;
       label NAICS = 'North American Industry Classification System';
       label Description = 'Industry';
       label TotalSales = 'Total Sales (thousands)';
       label PerCapitaSales = 'Per Capita Sales';
    quit;

Figure 2

There are three approaches to generating this PROC DATASETS code.

1. Create macro variables for every variable name and label. Use a macro with a DO loop to generate label statements, similar to the macro in Example 1. This would clutter the macro symbol table, leading to slower processing. It would also require the use of quoting functions to protect against special characters in the labels.
2. Use CALL EXECUTE to stack the generated code in the queue for execution after the DATA step.
3. Save the generated code in a temporary or permanent file for easy review.

The example below generates PROC DATASETS code and saves it in a permanent or temporary file.
    * Identify file to hold generated label statements;
    *filename gencode temp; * to put generated code in a temporary file;
    filename gencode 'c:\nesug2009\labels\gencode.sas'; * in a permanent file;

    data _null_;
       set work.labels end=eof;
       file gencode;
       * Begin step;
       if _n_ = 1 then do;
          put "proc datasets lib=state nolist;"
            / " modify Industry;";
       end;
       * Generate label statement for each variable;
       * Use either single or double quotes to surround the label, depending on which kind of quotes is in your labels.;
       if indexc(label,"'")=0 then
          put " label " variable " = '" label "';";
       else if indexc(label,'"')=0 then
          put " label " variable ' = "' label '";';
       * End PROC DATASETS step;
       if eof then put "quit;";
    run;

    %include gencode;
    filename gencode clear;

The effect of this code is the same as the PROC DATASETS code in Figure 2.

EXAMPLE 3: MODIFYING LABELS FROM A SIMILAR DATA SET

PROC SQL can easily create a second data set from a sample data set with the exact same variables and labels.

    proc sql;
       create table work.StateIndustry2 like work.StateIndustry;
    quit;

However, modifying the variable labels will require a different approach. The first task is to retrieve the labels from an existing data set. The second task is to use string processing functions to modify the original variable labels.

PROC SQL provides access to the DICTIONARY.COLUMNS table. This table has a complete description of the variables in a data set, including the labels for each variable. SASHELP.VCOLUMN provides access to the same information from a DATA step. In a DATA step, the same code generation techniques that were used in Example 2 can be applied to modify existing labels.

PROC CONTENTS shows the structure of SASHELP.VCOLUMN. The variables LIBNAME, MEMNAME, NAME and LABEL hold the names of the data library, data set, and variables, plus the text of the variable labels.
    #    Variable    Type    Len    Flags    Label
    1    libname     Char      8    P--      Library Name
    2    memname     Char     32    P--      Member Name
    3    memtype     Char      8    P--      Member Type
    4    name        Char     32    P--      Column Name
    .
    9    label       Char    256    P--      Column Label
    .

As an example, let’s create variable labels for similar data sets for two states, with variable labels that explicitly identify the relevant state. Let’s use data sets for Vermont and Maine as examples. Here is the code to generate for each state. Note that the labels for the TotalSales and PerCapitaSales variables should be modified, but not the labels for the NAICS code or description variables.

    proc datasets library=state nolist;
       modify VermontIndustry;
       label NAICS = "North American Industry Classification System";
       label Description = "Industry" ;
       label TotalSales = "Vermont Total Sales (thousands)" ;
       label PerCapitaSales = "Vermont Per Capita Sales" ;
    quit;

Figure 3A

    proc datasets library=state nolist;
       modify MaineIndustry;
       label NAICS = "North American Industry Classification System";
       label Description = "Industry" ;
       label TotalSales = "Maine Total Sales (thousands)" ;
       label PerCapitaSales = "Maine Per Capita Sales" ;
    quit;

Figure 3B

First, here is a simpler example that modifies all labels for a state. To change some labels, but not all labels, see the Appendix.
    %let state=Vermont;
    filename gencode 'c:\nesug2009\labels\gencode_example3.sas';

    data _null_;
       * Read original variable names and labels;
       set sashelp.vcolumn end=eof;
       where libname="STATE" and memname="INDUSTRY";
       file gencode;
       if _n_=1 then do;
          put "proc datasets library=state nolist; "
            / " modify &state.industry;" ;
       end;
       attrib newlabel length=$256;
       * Put quotation marks around the trimmed concatenation of the state and the original label text;
       newlabel = quote( trim( catx(' ', "&state ", label) ) );
       * Create label statement for each variable to re-label;
       put ' label ' name ' = ' newlabel '; ' ;
       if eof then put "quit;";
    run;

    %include gencode;
    filename gencode clear;

This generates code that is almost equivalent to the code in Figure 3A, but it modifies the labels for all variables in the input data set, including NAICS and Description. This limitation is easy to address by creating a macro with parameters that name the variables whose labels should be modified. A sample macro, ModVarLabels, is in the appendix. It provides a framework that can be customized to meet your needs. It allows you to select or exclude variables, but it does not implement changing variable names in addition to labels, using a placeholder to determine where to insert text, or handling double quotes in labels.

CONCLUSIONS

When you have similar data sets or variables, you can automate creating variable labels by using macros and/or information available in an external source or in a related data set. SAS provides tools for each step in this process. PROC IMPORT provides access to external sources. DICTIONARY.COLUMNS or SASHELP.VCOLUMN provides convenient access to information about existing SAS data set variables. PROC DATASETS is an efficient tool for adding or changing labels.
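The Example 3 code reads SASHELP.VCOLUMN in a DATA step. As a complement, here is a minimal sketch of the same idea using PROC SQL and DICTIONARY.COLUMNS directly. It is one possible variation, not the paper's code. The data set work.industry and the macro variable labelpairs are hypothetical names introduced for illustration. The sketch assumes the labels contain no & or % characters and that the generated text fits in a single macro variable.

```sas
* Hypothetical sample data set with two labeled variables (an assumption);
data work.industry;
   length TotalSales 8 PerCapitaSales 8;
   label TotalSales     = 'Total Sales (thousands)'
         PerCapitaSales = 'Per Capita Sales';
   stop;  * create the variables without writing any rows;
run;

%let state = Vermont;

* Build one name = "state label" pair per variable;
* LIBNAME and MEMNAME values in DICTIONARY.COLUMNS are upper case;
proc sql noprint;
   select catx(' ', name, '=', quote(catx(' ', "&state", label)))
      into :labelpairs separated by ' '
      from dictionary.columns
      where libname = 'WORK' and memname = 'INDUSTRY';
quit;

* Apply the generated pairs; only the descriptor is updated;
proc datasets library=work nolist;
   modify industry;
   label &labelpairs;
quit;
```

This variation keeps the generated text in a macro variable instead of a file, which is compact for a handful of variables. Writing the code to a file, as in Examples 2 and 3, remains easier to review when there are many labels.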
