Efficient Use of Numeric and Character Data Types
Total Page:16
File Type:pdf, Size:1020Kb
Efficient Use of Numeric and Character Data Types Una Smith, Duke University Abstract • To enhance program clarity, elegance and conVe nience. A basic understanding of data types in computer pro gramming la.nguages is necessary for the efficient con~ • To enhance end-user quality control and data struction of both data sets and programs in the SAS' safety. System. This paper addresses the efficient use of nu meric and character data types at two levels of project development: initial plauning and ongoing mainte- Progress toward the first two of these goals can be nance. precisely .stated in terms of the computer resources For researchers and planners, some general con they use. Frequently, there is a tradeoff between tbe cepts of good data design are outlined! with an em two; in order to write faster data steps, SAS pre phasis on data types; these concepts should help you grammers must often larger data sets. Tbis usually to design data. sets tha.t are both clearer and easier occurs when character variables are replaced with nu to work with. SAS users who have not agonized over meric. ones, or vice versa. The interaction may also the choice of numeric or character variables may dis be a positive one, with improvements on both counts; cover that there are good reaso"ns for exercising care this tends to occur most often when the two remain ing goals) convenience and control Me kept in mind. in ~his matter. Those who have may find a. review of f the major issues helpful as they learn how to use the These two goals are very subjective; experienced users powerful SAS data step more efficiently. ga.in a.n intuitive sense about which techniques are ef fective for the kinds of data they work with. There For the data base manager 1 some pra.ctical strate is generally a tradeoff between these goals as well; gies for working around sub~optimal types ate de scribed. The significance of numeric variable lengths by definition, expert users are capable of writing per for storage requirements on various computer plat fectly correct programs tha.t most inexperienced users forms is covered, and the use and misuse of boolean would not understand. concatenated, date, time and datetirne values is di~ "Efficiency" can mean tuning your code for each cussed. Important points are illustra.ted with com of these, but there may be unexpected interactions plete SAS data step programs. between them; I make no attempt at providing the ~/right" answer) only suggestions for thoughtful decision-making; they may help you find the best so 1 Introduction lution for your own computing problems. In particular, I show how some common usages of Experienced users of the SAS System typica.lly have numeric and/or character types affect each strategy. four goals in mind when thinking about good data When you return to your familiar data sets at the end design: of this conference) you may well see some of your fa • To reduce CPU usage or improve turn-around vQrite variables in a new light. Or maybe not. Either times. way, you will have a. better feeling for the data; you will see more clearly what the data is, not just what it • To reduce storage requirements. look. like. Also, you will be better able to choose be tween numeric and character vstiables to make your I SAS is a registered kadeJILark of SAS lnstitl.lle. Inc., C&ry, NC, USA. The word SAS, when used as a. noun in this paper, data sets and progra.ms both more efficient and easier refers to the SAS programming language. to understand. 329 2 Data Types 3. Character functiollB Data types are abstractions that nonetheless have 4. Date and time functions very powerful, almost physical roles in computer func tion. A data type is a contract betwc<:n your soft 5. State and Zipcode functions ware and your computer) whereby a. name or number 6. Special functions stored in an electrical circuit somewhere inside the computer's memory can be retrieved a.gain in a. form 7. System functions that the software recogni.es. A data type is also a contract between your soft;:. 8. Truncation ware and you, specifying exa.c.t1y what you may do For example, the SAS substr function expects to data of that type. The functions and procedures three arguments; the first argument must be a vaiue built into a software package also use data types to of type character, such as 'SUGI 16'. The two define the types of input they require and output they remaining arguments must both be numeric valuesj create. Each function and procedure specifies exactly such as 1 and 4. Thus the expression the data type that each value must have when used as an argument to it. Likewise, the result of a function substr( 'suin 16', 1, 4) returns the result has a predefined data type. Note that, by definition, lSUGI', a charader value. The numbers and types a procedure may modify the values of its arguments, but does not give a separate result. Also note that a of arguments expected by each SAS function are ex value may be either a constant, or the contents of a plained in detail in "II versions of the SAS manuals. variable. 2.3 Values 2.1 Operators Most computer programming languages ha.ve large The Pascal language has several types of operators sets of data types, and allow the construction of ad for numeric valuesj division. is performed by both div, ditional user-defined types. As a rule of thumb, the which gives integer results, and /, which gives real more specialized a language is, the fewer da.ta types results. SAS has a rich variety of unique operators, it supports. The REXX language has no data types: as well as a variety of functions that substitute for SAS has two. The Pascal language, in its simplest special operators. For """mple, the SAS boolean op form, includes four types: char for single characters, eratots and. oro, and not operate on numeric values. integer for numbers such as 0, 1, 2 etc., boolean for the values True and False j and real for numbers such as 0.0, 1.33 ... , etc. 2.2 Functions The SAS language includes only numeric and char As was mentioned briefly at the start of this sec acter data types. The type of .. constant value is d.,. termined by the absence or presence of quotes around tion j the argument or arguments to each SAS func tion must be of a particular type, either numeric or the value. The type of a variable value is determined character, and the result of each function is also of by an attrib, length, retain or inpllt statement, a particular type. The SAS functions are grouped, or by assignment from a function or operator. more or less according to these data types, as follows: 1. numeric fundions, including 2.4 Variables (a) Arithmetic Because SAS has only two data types, numeric and character, the choice of which to use for variables be (b) Financial comes an issue for careful consideration. (Relatively (c) Mathematical little thought is needed to select an appropriate data (d) Probability distributions type when there are do.ens of types to choose from!) SAS has several unusual conventions which supetfi.~ (e) Quantiles cially approximate the function of additional data (f) Random number generation types. These are the numeric SAS date, time, and (g) Sample statistics datetime values and their associated SAS informats, (h) Trigonometric and hyperbolic formats and functions, and the Mt mask convention along with the byte and rank functions. These con 2. Array functions ventions must be understood in detail if they are to be 330 used efficiently and Bafely. In particular, it is impor Use erlreme caution; read the varjous discussions on tant to know how the number of bytes used to store a this topic in the SAS documentation and talk to " numeric variable affects the accuracy of values under professional SAS programmer, and proceed with -cau these conventions. conventions tion, or play it safe and use .. length of 8 bytes for all variables whose values may have decimal parts. 2.5 Informats and Formats The following is another quick and dirty, but defini~ tive test of the least number of bytes needed to store The utility of both data types (but particularly the all values of a variable correctly: numeric type) i. enhanced hy the use of informats 1. Use a procedure such as univariate to find the and formats, and the format procedure which en l minimum and maximum values stored in the vari ables your to create your own SAS forma.ts. Note a.ble, jf necessary. that user-defined informat. are not allowed, for rea 2. Figure out how many bytes you probably need, sons that go heyond the scope of this paper. SAS llsing the technique described above, or use the informats and formats, respectivelYl affect interpre least number that your SAS implementation will tation of data values only on input and output; it allow. On most computers, at least 3 bytes are may be helpful to think of them as filtecs or tem needed to store any numeric varia.ble, but on it plates which allow data to pass through in only one few machines 2 bytes may be enough. If you direction. These and other conventions are discussed don't want to bother with this step, use 3. at greater length in the following sections. J. Run this program: Yolet least 3·. 3 Numeric Conventions Xlet minimum : 0; Y.let maximum ;;;; 99.99; To represent a fractional or floating point value such %let interval 0.01; as 0.333333 ..