Punctuation in SAS Programming
John Morrill, Quintiles, Inc., Kansas City, MO
George Li, Quintiles, Inc., Kansas City, MO


ABSTRACT
It's the little things that get you! Punctuation is often relegated to less than even a second thought, only to be considered when an error exists. It is our contention that a systematic review of punctuation in SAS programming yields interesting tidbits and valuable efficiencies for even experienced users. In this paper we first briefly identify the common uses of several types of punctuation. Then, from the less obvious uses we found during our search, we present the cases most likely to increase efficiency or provide new functionality. Examples of coverage include the when and why of single versus double punctuation marks (for example, the semicolon, the period, the @ sign, and the pipe, among others), including a discussion of macro processing, a review of punctuation usage differences between SQL and traditional SAS code, and some interesting notes on the underscore.

INTRODUCTION
Programmers may be an odd lot, but even we don't normally sit around pondering punctuation. That is why a systematic review of punctuation used in SAS programming may yield some interesting tidbits. In this paper we make some general observations on punctuation. Then, by punctuation type, we briefly mention obvious uses and give examples of less routine uses where appropriate. Where appropriate, we discuss double punctuation and equivalent uses. We focus on uses applicable to batch programming and leave out uses from the interactive world and specialized procedures such as SAS/ACCESS, AF, and FSP. Also, we have included several special characters that are not generally grouped into the punctuation category.

It sounds funny to refer to "working with punctuation" in the same way that we would "work with lab data" or "work with macros." Nonetheless, before presenting details for each punctuation type, we make two observations about working with punctuation. First, macro quoting functions such as %STR, %NRSTR, %QUOTE, %NRQUOTE, %BQUOTE, %NRBQUOTE, and %SUPERQ are quite useful in that they mask special characters during compilation of a macro. %STR, for example, is useful for treating a semicolon as text rather than part of a statement or when dealing with unbalanced quotation marks or parentheses. An example of this is found when defining a macro variable containing a single quote: %let a=%str(Ed%'s ball);. SAS Help gives the following example showing how to maintain a blank as a blank in counting words: %let word=%qscan(&string,&count,%str( ));. (There is a blank within the %str parentheses.)

A second observation regarding working with punctuation involves noting that differences in punctuation usage exist between the SAS SQL procedure and the SAS Data Step. SAS SQL can have efficiencies over the Data Step, but since SQL existed outside of SAS previous to version 6.06, its conventions are different from traditional SAS code. These differences show up even in punctuation usage. The comma is required as a variable name separator in SQL statements such as SELECT and ORDER BY, yet Data Step statements such as KEEP and BY separate variable names with blanks. And of course the semicolon is widely used in the Data Step for each statement, but you may see only one semicolon in the whole SQL code if there is only one SELECT or CREATE statement. We now move on to a look at each special character.

AMPERSAND (&)
The ampersand is most readily known as part of a reference to a macro variable. The ampersand substitutes for the logical AND in multiple conditions, for example, if var1=1 & var2='';. The ampersand also has a special use in the INPUT statement, indicating that a character value may have one or more single embedded blanks. Two or more ampersands used next to each other indicate that multiple passes will be made to fully resolve the macro variable. For example, if we define

    %let indic1=3;
    %let indic2=4;
    %let drug_3=drug_c;
    %let drug_4=drug_d;

then we can make reference to &&drug_&indic1 and &&drug_&indic2 with multiple passes resolving to suit our needs. After one pass, these become &drug_3 and &drug_4, respectively.

ASTERISK (*)
Besides serving as the multiplication operator, the asterisk is found in various places to indicate interactions or cross-classification (such as the TABLES statement in PROC FREQ) and in indicating the number of times to repeat a character string (as in PROC REPORT). It is used to comment statements alone or in combination with the slash (/* ... */) or the percent sign (%*). The asterisk also compels SAS to determine an array subscript by counting the number of variables in the array. Two asterisks indicate the exponentiation operator.

AT SIGN (@)
The at sign is used for pointer control in the INPUT and PUT statements. As line-hold specifiers, the single trailing @ holds a data line for another INPUT or PUT statement within the same iteration of the DATA step, while the double trailing @@ holds the line even across iterations of the DATA step.

BLANK ( )
No discussion of special characters would be complete without mention of the blank, also known as the space. It is the ultimate separator and functions as a delimiter in place of a comma in many instances. For example, if var1 in (2 3 4); is equivalent to if var1 in (2,3,4);. The blank is the only character for which multiple occurrences are universally fine wherever one occurrence is fine, except within quoted text.

CARET (^)
The caret is a synonym for the operator NOT and is used in conjunction with the equal sign such that ^= is equivalent to NE to mean "not equal to."

COLON (:)
An excellent review of the colon (Luo, 2001) will not be completely repeated here. Perhaps the simplest feature that is mentioned in this reference has the most widespread use: the variable name wildcard function. PROC PRINT; VAR x:; will print all variables beginning with x. This feature is also useful when dropping, keeping, and summing similar variables. This works with datasets as well and is useful, for example, in deleting temporary datasets in macro calls: proc datasets; delete _:;. The colon is rarely a synonym of another character, and it functions in the creation of macro variables in SQL (in the INTO clause).

COMMA (,)
The comma separates arguments in a function, values in the DO statement and the IN operator, and parameters in macro calls. For the IN operator, commas are optional unless used in SQL, where they are required. Commas are often used as delimiters in data files (CSV stands for comma separated values). Special informats and formats (e.g., comma7.) exist for inputting and outputting values with commas separating the thousands, for example. Double commas are often seen in ordered macro calls where at least one parameter is left blank to allow the default parameter to be used. Finally, the comma is useful in specifying an irregular iteration pattern in a do loop, as in do i=1 to 4, 7, 9, 20;, and can be used to delimit character values in a do loop such as do day='MON', 'WED', 'SAT';.

EQUAL SIGN (=)
Besides being the comparison operator, the equal sign is also used to assign values.

EXCLAMATION MARK (!)
The exclamation mark is sometimes a synonym for the pipe (|) and is sometimes called a bang. It is used in concatenation and as the logical OR. In some instances it is automatically defined as the SAS root (!SASROOT) and is used in path expressions. See PIPE for further details.

HYPHEN (-)
The hyphen is sometimes equivalent to a slash and is sometimes called a dash. Of course it is a minus sign when used as an operator, and can be used to refer to similarly named variables in a sequence (e.g., x1-x8). Note that this feature does not work with dataset names. The hyphen is used for pointer control in an INPUT and PUT statement when the pointer should be backed up a relative number of columns. The hyphen is a prefix operator, in this case used in conjunction with parentheses: =-(&val1).

At times one may wish to override a character such as the hyphen where it appears in certain output. The formchar option can be used to do this. An example that changes the default lines used in common PROC REPORT or PROC TABULATE output is options formchar='|_---|+|---+=|-/<>*';

PERIOD (.)
The period is, of course, commonly seen in numeric data as the decimal placeholder, though the convention in many countries is to use the comma for this purpose. It is also the default indicator of missing numeric values, though the character displayed can be altered with the MISSING= system option. The period must be used when specifying informats and formats and is also used as a delimiter in pathnames. Double periods are often used with macro variables. For example, in proc sort data=&lib..demog; the first period designates the end of the macro variable &lib while the second period separates the library name from the member name demog. Two important variables specified with a period are created automatically in a Data Step that uses a BY statement: first.ByVariableName and last.ByVariableName. These enable a great deal of useful processing based on an observation being the first or last of a given value for a given variable. And by the way, who said the period could replace the hyphen in separating the area code, prefix, and phone number?!

PIPE (|)
The pipe is sometimes a synonym for the exclamation mark. It substitutes for the logical OR as in if var1 IN (1 2 3) | var2 IN (4 5 6);. The double pipe is used in the concatenation of string variables as in var3=trim(var1)||var2;.

PLUS SIGN (+)
The plus sign is the addition operator. Like the hyphen, the plus is used for pointer control in an INPUT and PUT statement. The plus is used when the pointer should be moved forward a relative number of columns. Like the hyphen, the plus sign is also a prefix operator as in x=+(-1). For some reason, this formulation helps when specifying pointer control movement.

POUND SIGN (#)
The pound sign is used for the line pointer control in the INPUT and PUT statements.

QUESTION MARK (?)
The question mark is a synonym of CONTAINS. For example, if the variable names in the dataset any has four observations, each containing one of Edwards, Smith, Smithson, and Smithsonian, then where upcase(names) ? 'SMITH'; selects all but Edwards. Note that this operator can only be used with a character variable.

The question mark can also be a format modifier used to suppress error messages in the log, as the following log illustrates. This is not to be confused with the nofmterr option, which suppresses errors if a requested format is not loaded.

    1 data test1;
    2 rawdate='02/29/2002';
    3 date1=input(rawdate, mmddyy10.);
    4 run;

    NOTE: Invalid argument to function INPUT at line 3 column 9.
    rawdate=02/29/2002 date1=. _ERROR_=1 _N_=1
    NOTE: Mathematical operations could not be performed at the following
    places. The results of the operations have been set to missing values.
    Each place is given by: (Number of times) at (Line):(Column).
    1 at 3:9
    NOTE: The data set WORK.TEST1 has 1 observations and 2 variables.

    5 data test2;
    6 rawdate='02/29/2002';
    7 date1=input(rawdate, ?mmddyy10.);
    8 run;

    rawdate=02/29/2002 date1=. _ERROR_=1 _N_=1
    NOTE: The data set WORK.TEST2 has 1 observations and 2 variables.

    9 data test3;
    10 rawdate='02/29/2002';
    11 date1=input(rawdate, ??mmddyy10.);
    12 run;

    NOTE: The data set WORK.TEST3 has 1 observations and 2 variables.

Since 02/29/2002 (February 29, 2002) is not a valid date, SAS cannot convert it into a valid SAS date value. For the first DATA step, SAS printed a note saying 'Invalid argument to function INPUT' and set the automatic variable _ERROR_ to 1 in the log file. When a single ? is used in the second DATA step, the 'Invalid argument' message was suppressed but SAS still set _ERROR_=1 in the log file. As we can see, the double ?? in the third DATA step suppressed both the 'Invalid argument' message and the setting of _ERROR_ to 1. However, all three DATA steps set the numeric variable date1 to missing.

QUOTE (' and ")
The quote is used to indicate literal (e.g., character or date) strings, including empty strings. Double single quotes are used when a single quote is needed in resolution. An example of this is illustrated with the title statement title 'Joe & Bob''s BBQ';. In this case, the title resolves to "Joe & Bob's BBQ" and, since we used single quotes instead of double quotes to surround the entire title, the ampersand remains unresolved as we desire, since it is part of these fellows' business name.

The double quote is a synonym of the single quote for many uses but also triggers the processing of a macro variable as text. This latter function is found in the following example. A printout would show 5 as the value for both X and X2, but X is a character variable while X2 is a numeric variable.

    %let y=5;
    data any;
    X="&y";
    X2=&y;
    run;

SEMI-COLON (;)
Probably no SAS programmer has gone his or her entire career without leaving a semi-colon off the end of a statement! Semi-colons are optional at the end of a macro call. Multiple semi-colons are often seen when conditionally processing part of a macro. For example, in %if condition %then var1=5;; the first semi-colon ends the %if statement itself, so the text generated when condition is true is var1=5 without a semi-colon, while the second semi-colon supplies the statement-ending semi-colon for the generated statement. Another place multiple semi-colons are used is with the cards4 statement, whose data lines end with a line of four semi-colons; this is necessary when semi-colons in the data must be read as data rather than terminating the input. The semi-colon is second only to the blank in multiple occurrences being acceptable if one occurrence is acceptable.

SLASH (/) AND BACKSLASH (\)
The slash is sometimes equivalent to a hyphen. One obvious use of a slash is to precede an option list in many procedures, and of course it is the division operator. It is also the default split character in PROC REPORT and is used to move the pointer to the next line in INPUT and PUT statements. The slash and backslash are both used as directory level separators, depending on the operating system. In order to write platform independent code, the automatic macro variable &SYSSCP can be used to detect the current operating system to determine which separator to use. Finally, the slash is part of a certain kind of comment (/* ... */).

TILDE (~)
The tilde is a format modifier in the INPUT statement. It triggers special treatment of single quotation marks, double quotation marks, and delimiters in character values in that delimiters within quoted character values are read as characters instead of delimiters. The tilde is a synonym for the caret in meaning NOT.

UNDERSCORE (_)
While some punctuation marks (slash, backslash, period) are used in pathnames or to separate parts of names, the underscore is the only special character which SAS names (variables, data sets, formats, librefs, etc.) may contain. Two names universally reserved across the SAS system contain underscores: _N_ and _ERROR_. Along with the letters A through Z, the underscore is one of 27 characters recognized as special missing values, enabling the differentiation among types of missing values.

The underscore is used in default names in several places in SAS. For example, if in a procedure you don't specify a dataset where one is expected, the procedure by default assumes the previously used or generated dataset, which is stored in _last_. If for some reason this is not suitable, it can be changed using, for example, options _last_=mylast;. Another example of this is that the underscore is the default variable prefix in PROC TRANSPOSE. And since SAS-created variables begin with an underscore, drop _:; removes all of these variables. Data _null_ is used when data step processing is required but no usual output dataset is desired. Also, %put _automatic_; displays all automatic variables while _local_ and _user_ have corresponding results. Other examples of automatic variables exist. When defining an array with array question q1-q9;, for example, the _i_ variable is automatically and temporarily assigned as an iteration index for the current data step. And _all_ is treated as a variable list in data any; set a; if condition then put _all_; yet it is treated as a dataset name in proc contents data=libref._all_;.

Another area of underscores is seen in the following code.

    Array chars _character_;
    do over chars;
    if first.suno then chars=' ';
    end;
    Array nums _numeric_;
    do over nums;
    if nums=. then nums=0;
    end;

The above two statements containing an underscore merit special mention. The first sets all character variables to blank. This is useful to ensure that the retain statement does not carry over unwanted values. The second changes the value of numeric variables from missing to zero. Another handy use of the _numeric_ feature is found in proc freq; tables _numeric_;. This automatically selects all numeric variables for inclusion into the TABLES statement.

OTHER SPECIAL CHARACTERS
Of course there are other keyboard characters that are used in SAS. The dollar sign ($) is used in arrays and in specifying character formats. The percent sign (%) is used in the macro language as the macro keyword prefix. Parentheses (()), square brackets ([]), and curly brackets ({}) are used in various places with varying levels of equivalency. The less than sign (<) and the greater than sign (>) are comparison operators when used separately, and this is not just for numeric values but also for dates and character values. When used together, >< is the minimum operator and <> is the maximum operator. The equal sign and asterisk used together (=*) is the 'sounds like' operator and can be used in a fuzzy sort of way to catch phonetically similar strings. It is no longer a common keyboard character, but a synonym of the caret and tilde for NOT is found in the optional hyphen (¬). The tick (`), which is not to be confused with the single quote, represents a challenge to SAS users to find a use distinct from the single quote.

CONCLUSION
There are hundreds of uses of punctuation and special characters. Along with a few examples, this present partial repository provides impetus to ponder prodigious employment of punctuation. We hope this has been useful. We welcome candidates for a more complete listing.

REFERENCES
Luo, Haiping, 2001, That Mysterious Colon (:). Proceedings of the Twenty-sixth Annual SAS Users Group International Conference, Long Beach, CA, Paper 73-26.

ACKNOWLEDGMENTS
The authors would like to thank Jim Edgington and Indra Fernando for valuable suggestions.

CONTACT INFORMATION
John Morrill
Quintiles, Inc.
P.O. Box 9708
Kansas City MO 64134-0708
Phone: (816) 767-6000
E-mail: [email protected]

George Li
Quintiles, Inc.
P.O. Box 9708
Kansas City MO 64134-0708
Phone: (816) 767-6000
E-mail: [email protected]
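
As a recap of a few uses described in this paper (the double ampersand, the double period, the ?? format modifier, the colon wildcard, and last.variable), the following is a minimal sketch rather than code from the authors. The macro variable values for indic1 and drug_3 come from the ampersand discussion; the libref value work, the data set names visits and totals, and the variables id, x1, and x2 are assumptions made purely for illustration.

```sas
/* Illustrative recap only; names marked "assumed" are not from the paper. */

/* Double ampersand: two passes, first to &drug_3, then to drug_c */
%let indic1=3;
%let drug_3=drug_c;
%put Resolved value: &&drug_&indic1;

/* Double period: the first ends the macro variable reference, the second
   separates libref and member name (work is an assumed libref value) */
%let lib=work;
%put Two-level name: &lib..demog;

/* ?? format modifier: no invalid-argument note, and _ERROR_ stays 0 */
data _null_;
  rawdate='02/29/2002';
  date1=input(rawdate, ?? mmddyy10.);
  put date1= _error_=;
run;

/* Assumed sample data, already in order of id */
data visits;
  input id x1 x2;
  datalines;
1 10 20
1 30 40
2 50 60
;
run;

/* Colon wildcard in a variable list, plus last.id from the BY statement */
data totals;
  set visits;
  by id;
  total=sum(of x:);   /* sums every variable whose name starts with x */
  if last.id;         /* keep only the last observation for each id */
run;
```

Running the sketch should write drug_c and work.demog to the log from the two %put statements, and the totals data set should contain one observation per id. If the input were not already ordered by id, a PROC SORT would be needed before the BY statement.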