Punctuation in SAS Programming
John Morrill, Quintiles, Inc., Kansas City, MO
George Li, Quintiles, Inc., Kansas City, MO
ABSTRACT

It's the little things that get you! Punctuation is often relegated to less than even a second thought, only to be considered when an error exists. It is our contention that a systematic review of punctuation in SAS programming yields, even for experienced users, a number of interesting tidbits and valuable efficiencies. In this paper we first briefly identify the common uses of several types of punctuation. Then, from the less obvious uses we found during our search, we present the cases most likely to increase efficiency or provide new functionality. Examples of coverage include the when and why of single versus double punctuation marks (e.g., the semicolon, the period, the question mark, the exclamation mark, the @ sign, and the pipe), including a discussion of macro processing, a review of punctuation usage differences between SQL and traditional SAS code, and some interesting notes on the underscore character.

INTRODUCTION

Programmers may be an odd lot, but even we don't normally sit around pondering punctuation. That is why a systematic review of punctuation used in SAS programming may yield some interesting tidbits. In this paper we make some general observations on punctuation. Then, by punctuation type, we briefly mention obvious uses and give examples of less routine uses where appropriate. We also discuss double punctuation and equivalent uses where relevant. We focus on uses applicable to batch programming and leave out uses from the interactive world and specialized procedures such as SAS/ACCESS, AF, and FSP. We have also included several special characters that are not generally grouped into the punctuation category.

It sounds funny to refer to "working with punctuation" in the same way that we would "work with lab data" or "work with macros." Nonetheless, before presenting details for each punctuation type, we make two observations about working with punctuation. First, macro quoting functions such as %STR, %NRSTR, %QUOTE, %NRQUOTE, %BQUOTE, %NRBQUOTE, and %SUPERQ are quite useful in that they mask special characters during compilation of a macro. %STR, for example, is useful for treating a semicolon as text rather than part of a statement, or when dealing with unbalanced quotation marks or parentheses. An example of this is found when defining a macro variable containing a possessive: %let a=%str(Ed%'s ball);. SAS Help gives the following example showing how to maintain a blank as a blank in counting words: %let word=%qscan(&string,&count,%str( ));. (There is a space within the %str parentheses.)

A second observation regarding working with punctuation is that differences in punctuation usage exist between the SAS SQL procedure and the SAS Data Step. SAS SQL can have efficiencies over the Data Step, but since SQL existed outside of SAS before version 6.06, its conventions are different from traditional SAS code. These differences show up even in punctuation usage. The comma is required as a variable name separator in SQL statements such as SELECT and ORDER BY, yet it is optional in the corresponding Data Step. And of course the semicolon is widely used in the Data Step to end each statement, but you may see only one semicolon in the whole SQL code if there is only one SELECT or CREATE statement. We now move on to a look at each special character.

AMPERSAND (&)

The ampersand is most readily known as part of a reference to a macro variable. The ampersand substitutes for the logical AND in multiple conditions, for example, if var1=1 & var2='Y';. The ampersand also has a special use in the INPUT statement, indicating that a character value may have one or more single embedded blanks. Two or more ampersands used next to each other indicate that multiple passes will be made to fully resolve the macro variable. For example, if we define

    %let indic1=3;
    %let indic2=4;
    %let drug_3=drug_c;
    %let drug_4=drug_d;

then we can make reference to &&drug_&indic1 and &&drug_&indic2 with multiple passes resolving to suit our needs. After one pass, these become &drug_3 and &drug_4, respectively.

ASTERISK (*)

Besides serving as the multiplication operator, the asterisk is found in various places to indicate interactions or cross-classification (such as the TABLES statement in PROC FREQ) and to indicate the number of times to repeat a character string (as in PROC REPORT). It is used to comment statements, either alone or in conjunction with the slash (/* … */) or the percent sign (%*). The asterisk also compels SAS to determine an array subscript by counting the number of variables in the array. Two asterisks indicate the exponentiation operator.

AT SIGN (@)

The at sign is used for column pointer control in the INPUT and PUT statements. As line-hold specifiers, both the single trailing @ and the double trailing @@ indicate that a data line should be held for another INPUT or PUT statement; the double trailing @@ holds the line even across iterations of the DATA step.

BLANK ( )

No discussion of special characters would be complete without mention of the blank, also known as the space. It is the ultimate separator and functions as a delimiter in place of a comma in many instances. For example, if var1 in (2 3 4); is equivalent to if var1 in (2,3,4);. The blank is the only character for which multiple occurrences are acceptable wherever a single occurrence is, except within quoted text.

CARET (^)

The caret is a synonym for the operator NOT and is used in conjunction with the equal sign such that ^= is equivalent to NE to mean "not equal to."

COLON (:)

An excellent review of the colon (Luo, 2001) will not be completely repeated here. Perhaps the simplest feature mentioned in this reference has the most widespread use: the variable name wildcard function. PROC PRINT; VAR x:; will print all variables beginning with x. This feature is also useful when dropping, keeping, and summing similar variables. This works with datasets as well and is useful, for example, in deleting temporary datasets in macro calls, for example, proc datasets; delete _:;. In rare cases the colon is a synonym for the hyphen; it is also used in the creation of macro variables in SQL.

COMMA (,)

The comma separates arguments in a function, values in the DO statement and the IN operator, and parameters in macro calls. For the IN operator, commas are optional unless used in SQL, where they are required. Commas are often used as delimiters in data files (CSV stands for comma separated values). Special informats and formats (e.g., comma7.) exist for inputting and outputting values with commas separating the thousands, for example. Double commas are often seen in ordered macro calls where at least one parameter is left blank to allow the default parameter to be used. Finally, the comma is useful in specifying an irregular iteration pattern in a do loop: do i=1 to 4, 7, 9, 20; and can be used to delimit character values in a do loop such as do day='MON', 'WED', 'SAT'; (here day is an illustrative index variable name).

EQUAL SIGN (=)

Besides the equality operator, the equal sign is also used to assign values.

EXCLAMATION MARK (!)

The exclamation mark is sometimes a synonym for the pipe (|) and is sometimes called a bang. It is used in concatenation and as the logical OR. In some instances it is automatically defined as SAS root and is used in path/directory expressions. See PIPE for further details.

HYPHEN (-)

The hyphen is sometimes equivalent to a slash and is sometimes called a dash. Of course it is a minus sign when used as an operator, and it can be used to refer to similarly named variables in a sequence (e.g., x1-x8). Note that this feature does not work with dataset names. The hyphen is used for pointer control in INPUT and PUT statements when the pointer should be backed up a relative number of columns. The hyphen is also a prefix operator, in this case used in conjunction with parentheses: x=-(&val1).

At times one may wish to override a character such as the hyphen where it appears in certain output. The formchar option can do this. An example that changes the default lines used in common PROC REPORT or PROC TABULATE output is options formchar='|_---|+|---+=|-/<>*';

PERIOD (.)

The period is, of course, commonly seen in numeric data as the decimal placeholder, though the convention in many countries is to use the comma for this purpose. It is also the default indicator of missing numeric values, though this can be altered in an options statement. The period must be used when specifying informats and formats and is also used as a delimiter in pathnames. Double periods are often used with macro variables. For example, in proc sort data=&lib..demog; the first period designates the end of the macro variable &lib while the second period separates the library name from the member name demog. Two important variables specified with a period are created automatically in a dataset in conjunction with a by statement: first.ByVariableName and last.ByVariableName. These enable a great deal of useful processing based on an observation being the first or last of

The question mark can also be a format modifier used to suppress error messages in the log, as the following log illustrates. This is not to be confused with the nofmterr option, which suppresses errors if a requested format is not loaded.

    1    data test1;
    2    rawdate='02/29/2002';
    3    date1=input(rawdate, mmddyy10.);
    4    run;

    NOTE: Invalid argument to function INPUT at line 3 column 9.
    rawdate=02/29/2002 date1=. _ERROR_=1 _N_=1
    NOTE: Mathematical operations could not be performed at the following places.
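As a minimal sketch of the question mark modifier described above, the step below repeats the same conversion with a double question mark (??) placed before the informat. In standard SAS behavior, a single ? suppresses the invalid-data message, while ?? also prevents _ERROR_ from being set to 1. The data set name test2 and the PUT statement are illustrative assumptions, not part of the original example.

```sas
/* Illustrative sketch only: test2 is a hypothetical data set name. */
data test2;
  rawdate = '02/29/2002';                  /* invalid date: 2002 is not a leap year */
  date1 = input(rawdate, ?? mmddyy10.);    /* ?? suppresses the note and leaves _ERROR_ unset */
  put rawdate= date1= _error_=;            /* write the values to the log for inspection */
run;
```

The conversion still yields a missing value for date1; the modifier only changes what is reported in the log, so it is best reserved for cases where invalid values are expected and handled.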