An Introduction to the SAS System
Dileep K. Panda
Directorate of Water Management, Bhubaneswar-751023

Introduction

SAS (originally an abbreviation of Statistical Analysis System) is one of the most widely used statistical software packages in academia and industry. The SAS software was developed in the late 1960s at North Carolina State University, and SAS Institute was formed in 1976. The SAS System is a collection of products available from SAS Institute in North Carolina. SAS software combines a statistical package, a database management system, and a high-level programming language.

SAS is an integrated system of software solutions that performs the following tasks:
• Data entry, retrieval, and management
• Report writing and graphics design
• Statistical and mathematical analysis
• Business forecasting and decision support
• Operations research and project management
• Applications development

At the core of the SAS System is the Base SAS software. The Base SAS software includes a fourth-generation programming language and ready-to-use programs called procedures. These integrated procedures handle data manipulation, information storage and retrieval, statistical analysis, and report writing. Additional components offer capabilities for data entry, retrieval, and management; report writing and graphics; statistical and mathematical analysis; business planning, forecasting, and decision support; operations research and project management; quality improvement; and applications development.

In general, the Base SAS software has the following capabilities:
• A data management facility
• A programming language
• Data analysis and reporting utilities

Learning to use Base SAS enables you to work with these features of SAS. It also prepares you to learn other SAS products, because all SAS products follow the same basic rules.
The SAS products include:
• SAS Foundation 9.2
• SAS Enterprise Guide 4.2
• SAS OLAP Cube Studio 4.2
• SAS Information Map Studio 4.2
• SAS Enterprise Miner 6.1
• SAS IML Studio
• SAS BI Server
• The JMP software (not part of Base SAS)

Most of the enhancements included in SAS 9.2 are a direct response to requests from users. SAS 9.2 provides more analytical and graphical capabilities through components such as:
• Base SAS - data management and basic procedures
• SAS/STAT - statistical analysis
• SAS/GRAPH - presentation-quality graphics
• SAS/Genetics - genetics data analysis
• SAS/OR - operations research
• SAS/ETS - econometrics and time series analysis
• SAS/IML - Interactive Matrix Language
• PROC SQL (part of Base SAS) - Structured Query Language

There are other specialized products for access to databases and for connectivity between different machines running SAS.

With the enhancements to SAS Data Integration Studio, you can create and debug jobs more efficiently. In the SAS Business Intelligence applications, all of the user interfaces have been updated to increase productivity and usability. SAS 9.2 includes significant performance improvements through better use of the underlying hardware and software. The improvements include more widespread use of 64-bit computing, as well as an increased ability to utilize and manage grids. Support for multi-threaded processing has been extended across the platform, from SAS procedures to SAS servers to grid enablement. In addition, new in-database features substantially improve performance by moving analytic tasks closer to your data.

The SAS Windowing Environment

The SAS windowing environment is composed of windows that enable you to accomplish specific tasks.

Program Editor enables you to enter, edit, and submit SAS programs.

Enhanced Program Editor (available only in the Windows operating environment) enables you to enter, edit, and submit SAS programs.
The Editor provides a number of useful editing features, such as colour coding and syntax checking of the SAS language. The initial Editor window title is "Editor - Untitled" until you open a file or save the contents of the editor to a file; the window title then changes to reflect that filename. When the content of the editor is modified, an asterisk is added to the title.

Log displays messages about your SAS session and any SAS programs you submit.

Output enables you to browse output from SAS programs that you submit. By default, the Output window is positioned behind the other windows. When you create output, the Output window automatically moves to the front of your display.

Results helps you navigate and manage output from SAS programs that you submit. You can view, save, or print individual items of output.

The window on the left side of the display is the SAS Explorer window, which you can use to assign and locate SAS libraries, files, and other items. The window at the top right is the Log window; it contains the SAS log for the session. The window at the bottom right is the Program Editor window, which provides an editor in which you edit your SAS programs.

In a typical session, you:
• Invoke the SAS System and include a SAS program in your session.
• Submit a program and browse the results.
• Navigate the SAS windowing environment.

The main windows are used as follows:
• Program Editor: for submitting SAS programs (new or existing). Multiple windows can be opened with different SAS programs.
• Log window: for information about the processing of SAS programs, including errors, warnings, and time taken.
• Output window: for showing the output produced by SAS programs.

Data Management Facility

SAS organizes data into a rectangular form or table that is called a SAS data set. The example data set used in this section describes participants in a 16-week weight program at a health and fitness club. The data for each participant includes an identification number, name, team name, and weight (in U.S.
pounds) at the beginning and end of the program.

[Figure: the rectangular form of a SAS data set]

In a SAS data set, each row represents information about an individual entity and is called an observation. Each column represents the same type of information and is called a variable. Each separate piece of information is a data value. In a SAS data set, an observation contains all the data values for an entity; a variable contains the same type of data value for all entities.

A SAS program is a sequence of steps that the user submits for execution. DATA steps are typically used to create SAS data sets. PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).

To build a SAS data set with Base SAS, you write a program that uses statements in the SAS programming language. A portion of a SAS program that begins with a DATA statement and typically creates a SAS data set or a report is called a DATA step. The following SAS program creates a SAS data set named WEIGHT_CLUB from the health club data:

/* Build the data set WEIGHT_CLUB from in-stream data lines */
data weight_club;
   /* IdNumber and Name use column input; the rest use list input */
   input IdNumber 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
   /* Weight lost over the program */
   Loss=StartWeight-EndWeight;
   datalines;
1023 David Shaw          red 189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          red 210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     red 127 118
;
run;

The following list explains the parts of the preceding program, in order:
• The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB.
• The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them (IdNumber, Name, Team, StartWeight, and EndWeight).
• The third statement is an assignment statement. It calculates the weight each person lost and assigns the result to a new variable, Loss.
• The DATALINES statement indicates that data lines follow.
• The data lines follow the DATALINES statement.
This approach to processing raw data is useful when you have only a few lines of data. (Later sections show ways to access larger amounts of data that are stored in files.)
• The semicolon signals the end of the raw data and is a step boundary. It tells SAS that the preceding statements are ready for execution.

By default, the data set WEIGHT_CLUB is temporary; that is, it exists only for the current job or session.

Programming Language

Elements of the SAS Language

The statements that created the data set WEIGHT_CLUB are part of the SAS programming language. The SAS language contains statements, expressions, functions and CALL routines, options, formats, and informats - elements that many programming languages share. However, the way you use the elements of the SAS language depends on certain programming rules. The most important rules are listed in the next two sections.

Rules for SAS Statements

The conventions that are shown in the programs in this documentation, such as indenting of subordinate statements, extra spacing, and blank lines, are for the purpose of clarity and ease of use. They are not required by SAS. There are only a few rules for writing SAS statements:
• SAS statements end with a semicolon.
• You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
• You can begin SAS statements in any column of a line and write several statements on the same line.
• You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
• Words in SAS statements are separated by blanks or by special characters (such as the equal sign and the minus sign in the calculation of the Loss variable in the WEIGHT_CLUB example).

In summary, SAS statements:
• Usually begin with a keyword (DATA, PROC, etc.)
• Always end with a semicolon (;)
• Are case-insensitive
• Use one or more blank spaces to separate words
• Are free-format: they can start and stop anywhere on a line
• Can span multiple lines
• Can share a line with other statements

SAS names are used for SAS data set names, variable names, and other items. The rules for naming SAS data sets and variables are as follows:
A SAS name can contain from one to 32 characters.