Chapter 1 Lecture Notes

A CONCISE GUIDE TO THE SAS STATISTICAL PACKAGE VERSION 9

Professor Thornton Economics 415 Econometrics

INTRODUCTION

This guide provides an overview of the SAS statistical package, version 9, and an explanation of a number of useful SAS commands and capabilities. It does not explain all SAS commands and capabilities. SAS is an extremely powerful statistical package. If you want to learn more about what it can do, consult the appropriate SAS user's guide or one of the many SAS companion books available in bookstores, which explain various facets of the SAS system in more detail.

DATA SETS

In this guide, SAS commands are explained in the context of examples. The examples are based on the following five data sets. Each data set is assumed to be contained in an ASCII text file on a disk in drive A. If your data files are in a different location, such as drive C or a USB flash drive/memory stick assigned to drive F, modify the examples below accordingly (e.g., replace the letter a with the letter c or f). Using a USB flash drive/memory stick is recommended.

DATA7-2

The data file DATA7-2 comes with the Ramanathan econometrics textbook. It consists of a cross-section of 49 workers. The variables are WAGE = monthly wage, EDUC = years of education beyond the eighth grade, EXPER = years of experience, AGE = age of worker, GENDER = indicator variable for gender (1 if male, 0 if female), RACE = indicator variable for race (1 if white, 0 if nonwhite), CLERICAL = indicator variable for clerical worker (1 if clerical worker, 0 otherwise), MAINT = indicator variable for maintenance worker (1 if maintenance worker, 0 otherwise), CRAFTS = indicator variable for crafts worker (1 if crafts worker, 0 otherwise).

CPS85

The data file CPS85 consists of 527 randomly selected employed workers from the May 1985 Current Population Survey conducted by the Department of Commerce. The Current Population Survey is a monthly survey of over 50,000 households, and it serves as the basis for the national employment and unemployment statistics. The variables are: ED = years of education, SOUTH = dummy variable (1 if worker lives in the South, 0 otherwise), NONWH = dummy variable (1 if worker is nonwhite, 0 otherwise), HISP = dummy variable (1 if worker is Hispanic, 0 otherwise), FE = dummy variable (1 if worker is female, 0 otherwise), MARR = dummy variable (1 if worker is married with spouse present in household, 0 otherwise), MARRFE = dummy variable (1 if worker is a married female with spouse present in household, 0 otherwise), EX = years of labor market experience, UNION = dummy variable (1 if worker has a union job, 0 otherwise), WAGE = average hourly earnings in constant 2003 dollars, AGE = age in years, MANUF = dummy variable (1 if worker works in manufacturing industry, 0 otherwise), CONSTR = dummy variable (1 if worker works in construction industry, 0 otherwise), MANAG = dummy variable (1 if worker is managerial or administrative, 0 otherwise), SALES = dummy variable (1 if worker is in sales, 0 otherwise), CLER = dummy variable (1 if worker is a clerical worker, 0 otherwise), SERV = dummy variable (1 if worker is a service worker, 0 otherwise), PROF = dummy variable (1 if worker is professional or technical, 0 otherwise).

DOCTOR1

The data file DOCTOR1 consists of a cross-section of 87 primary care physicians for the year 1985. The variables are ID = identification number for the physician, VISITS = number of patient visits per week, HOURS = physician hours worked per week, AIDES = number of non-physician employees in the medical practice, DOCWAGE = average hourly earnings of physician, AIDEWAGE = weekly wage of non-physician employee.

DOCTOR2

The data file DOCTOR2 consists of the same cross-section of 87 primary care physicians as in the data file DOCTOR1. The variables are ID = identification number for the physician, PRICE = fee charged by doctor for office visit with an established patient, YDUM = indicator variable for non-medical income (1 if non-medical income of more than $10,000, 0 otherwise), AGE = age of physician, PCINC = per capita income in county in which physician practices, POPTOT = population in county in which physician practices.

MACROCON

The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The variables are YEAR = year, CONS = annual consumption spending in billions of dollars, DISINC = annual disposable income in billions of dollars, PRICE = consumer price index, PRIME = the prime interest rate, UN = unemployment rate.

BACKGROUND INFORMATION

SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS allows you to read data in a variety of different formats, transform the data to conduct statistical analyses, analyze the data, and present the results.

A SAS program has two major components: Data Steps and Procedures. The data step allows you to read SAS data sets or raw data, perform transformations on the data, create new variables, and recode existing variables. The data step is the component of the program that creates SAS datasets. The procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and procedures consist of one or more statements. A statement is usually identified by a keyword that suggests the statement's function (e.g., INPUT, INFILE, MEANS, RUN). Every statement ends with a semicolon.
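The following minimal sketch shows a complete program with one data step and one procedure. To keep it self-contained, it uses a made-up dataset named TOY whose two observations are entered in-stream after a DATALINES statement (DATALINES is a synonym for CARDS); the dataset name and the values are hypothetical and are not taken from any of the data sets above.

```sas
/* Data step: creates the temporary SAS dataset TOY (hypothetical values). */
DATA toy;
  INPUT wage educ;   /* names the variables, in the order the values appear */
  DATALINES;
1345 4
2500 8
;
RUN;

/* Procedure: presents the data created by the data step. */
PROC PRINT DATA=toy;
RUN;
```

Each statement, including DATALINES, ends with a semicolon; the data lines themselves do not, and a line holding only a semicolon marks the end of the in-stream data.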

EXECUTING A SAS PROGRAM

A SAS program can be executed in different ways. The two most important ways are batch mode and interactive windows mode. In batch mode you use a text editor (such as Microsoft WordPad) to write a SAS program in an input file in ASCII format. You then tell SAS to execute the program in the input file and place the resulting output in an output file. You then use a text editor to view the output file.

In interactive windows mode, you can either type SAS statements in a Program Editor window or use the SAS program builder. To use the SAS program builder, you use the mouse to point and click on the appropriate selections and enter the necessary information in dialogue boxes. When SAS statements are executed, the output is displayed in an Output window. A Log window is also displayed that contains the log for any SAS statements that are executed. The Log window is very useful when writing SAS programs. The log is displayed whether or not the program works. It repeats the SAS statements that are executed, documents any SAS datasets that are created, warns you about potential problems with your program, and gives error messages for mistakes such as incorrect syntax.

This guide explains how to create and execute SAS programs in interactive windows mode, using both the Program Editor window and the program builder.

CREATING A SAS DATASET

The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can be used to read raw data into a SAS dataset. This process is called importing. The raw data used to create a SAS dataset can be in a number of different formats and locations. The data is usually either stored in an external data file or is entered manually when you write a SAS program. Data entered manually when writing a SAS program is called in-stream data.

The most important SAS statements in a data step are DATA, INFILE, CARDS, INPUT, and LIBNAME. The DATA statement gives a name to the SAS dataset you are creating. The INFILE statement tells SAS that the raw data are located in an external data file. The CARDS statement tells SAS that the data will be entered manually in the program. The INPUT statement names the variables in the dataset. It also tells SAS the layout of the raw data. The LIBNAME statement tells SAS where to store the SAS dataset you create so that you can save it for future use.

EXTERNAL DATA FILE

The following example explains how to create a SAS dataset with raw data that is contained in an external data file in ASCII text format.

Example #1

The data file named DATA7-2 is an example of an external data file in ASCII text format. This file contains only numbers – the names of the variables are documented in a separate location. The file is located on a disk in drive A. This file has 49 observations on 9 variables. Each row is an observation (also called a record). Each column is a variable (also called a field). This is a common layout for most external data files. Thus, there are 49 rows and 9 columns of numbers. The names of the variables are WAGE, EDUC, EXPER, AGE, GENDER, RACE, CLERICAL, MAINT, CRAFTS. Data for the variable WAGE are contained in column 1 of the data file, data for the variable EDUC are contained in column 2, and so on.

You want to create a SAS dataset named EARNINGS with the data contained in the external data file named DATA7-2.

Program Editor Window

After you start SAS, you should see three separate windows on the computer screen: the Explorer window, the Log window, and the Program Editor window. Enter the following SAS statements in the Program Editor window:

DATA earnings;
INFILE 'a:data7-2.txt';
INPUT wage educ exper age gender race clerical maint crafts;
PROC PRINT data=earnings;
RUN;

This is a SAS program. A SAS program can be written in uppercase, lowercase, or both. In this example, keywords are in uppercase and the rest of each statement is in lowercase. This program has one data step and one procedure, and it consists of five SAS statements. If you wish, you can write more than one statement on the same line, and a statement can extend over more than one line; however, each statement must end with a semicolon. In the above example, each line has one statement. The DATA statement tells SAS to create a SAS dataset and name it EARNINGS. The INFILE statement tells SAS to read the raw data in the external ASCII text file named DATA7-2 located on the disk in drive A. Note that the name and location of the file must be enclosed in quotation marks (single quotation marks are used throughout this guide). The INPUT statement tells SAS the names of the variables. You must list the variable names in the same order as the columns of the data file, which is the order given in the dataset documentation for this class; if you don't, SAS will read the values into the wrong variables. The PROC PRINT statement tells SAS to display the dataset EARNINGS in the Output window so you can see it. Note that if you did not include DATA = EARNINGS in this statement, SAS would print the most recently created SAS dataset. This is true for any PROC statement. The RUN statement tells SAS to execute the previous statements.

Executing the Program

There are two ways to execute the above SAS program: 1) click the Submit button on the toolbar (the button with a picture of a runner); or 2) click Run on the menu bar, then click Submit on the Run menu. Note that SAS will execute all statements that appear in the Program Editor window. If you want SAS to execute only a subset of these statements, use the mouse to highlight them, and then click Submit (or click Run, then Submit).

Viewing the Output

After SAS executes the program, the Output window appears and the data are displayed. To close the Output window, click the box marked X in the upper right-hand corner of the Output window. After you close the Output window, you should see three windows on your screen: the Log window, the Program Editor window, and the Results window.

Storing the SAS Dataset

The SAS dataset named EARNINGS is a temporary SAS dataset. It is saved in the SAS Library named Work. To verify this, proceed as follows. Close the Results window and open the Explorer window. To do this, click the Explorer button at the bottom of the Results window. Click the Libraries icon that appears in the Explorer window. Click the Work icon. Click the Earnings icon. This opens the Viewtable window, which contains your data. This verifies that the SAS library named Work contains the SAS dataset named EARNINGS. Once your SAS session ends, this dataset is automatically deleted. To make EARNINGS a permanent SAS dataset, you must use a LIBNAME statement. This is explained below.

Comments

The INPUT statement tells SAS the names of the variables, the type of each variable, and how the data are arranged. In the above example, the only information provided in the INPUT statement is the names of the variables. This is because all of the variables are numeric and there is at least one blank space between the values on each data line of the external file. If the dataset includes one or more character variables (variables that contain letters of the alphabet), then the symbol $ must be placed in the INPUT statement directly after the name of each character variable. If there is not at least one blank space between the values on the data lines, then you must tell SAS the column numbers in which the data for each variable are located in the data file.
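Both cases are sketched below with made-up in-stream data; the dataset names WORKERS and WORKERS2 and the variable NAME are hypothetical. In the first data step, the $ after NAME marks it as a character variable. In the second, the values are packed together without blanks, so each variable is followed by the columns it occupies on the data line (this form is called column input).

```sas
/* List input: values separated by blanks; $ marks NAME as a character variable. */
DATA workers;
  INPUT name $ wage educ;
  DATALINES;
Smith 1345 4
Jones 2500 8
;
RUN;

/* Column input: no blanks between values, so give the columns of each variable. */
/* WAGE is in columns 1-4 and EDUC in columns 5-6, so 134504 is read as 1345 and 4. */
DATA workers2;
  INPUT wage 1-4 educ 5-6;
  DATALINES;
134504
250008
;
RUN;
```

Because column numbers are counted from the start of each data line, the data lines for column input should not be indented.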

IN-STREAM DATA

A SAS dataset can be created by entering data manually into a SAS program. Example #2 using in-stream data has been deleted since this approach is seldom used to create a SAS dataset.

CREATING A PERMANENT SAS DATASET

The SAS dataset EARNINGS created in the above example is a temporary SAS dataset. It is saved in the SAS Library named Work. Once your session ends, this data set is automatically deleted. The following example explains how to create a permanent SAS dataset.

Example #3

You want to create a permanent SAS dataset using the data contained in the file named DATA7-2 located on a disk in drive A. You want to save this SAS dataset on the disk in drive A.

Enter the following SAS statements in the Program Editor window:

LIBNAME econ415 'a:';
DATA econ415.earnings;
INFILE 'a:data7-2.txt';
INPUT wage educ exper age gender race clerical maint crafts;
RUN;

The LIBNAME statement tells SAS to store the SAS dataset that follows in the library named ECON415, which is located on the disk in drive A. Note that the location of the library must be enclosed in quotation marks. The LIBNAME statement assigns the name ECON415 to that location; by default, the location itself (here, the disk in drive A) must already exist. If the library already contains SAS datasets, SAS stores the new dataset alongside them. You can store as many SAS datasets as you want in a single library. The DATA statement tells SAS to create a SAS dataset named EARNINGS and store it in the library named ECON415. Note that you must prefix the name of the dataset with the name of the library in which it will be stored, separated by a period. The rest of the statements are the same as in the above example.
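To see how one-level and two-level names relate, the sketch below creates a dataset without a library prefix and then reads it back with the prefix WORK. The dataset names EARN0 and EARN0COPY and their values are hypothetical. By default SAS stores a dataset with a one-level name in the WORK library, so DATA earn0; and DATA work.earn0; refer to the same temporary dataset; only a library assigned with LIBNAME, such as ECON415, keeps a dataset after the session ends.

```sas
/* One-level name: by default SAS stores EARN0 in the temporary WORK library. */
DATA earn0;
  INPUT wage educ;
  DATALINES;
1345 4
;
RUN;

/* Two-level name (library.dataset) referring to the same temporary dataset. */
DATA earn0copy;
  SET work.earn0;
RUN;
```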

ACCESSING A PERMANENT SAS DATASET

The following examples explain how to load a permanent SAS dataset that you have created and create new temporary or permanent SAS datasets from it.

Example #4

You want to access the dataset named EARNINGS, which is stored in the library named ECON415 on a disk in drive A. You want to create a temporary SAS dataset named EARN1.

LIBNAME econ415 'a:';
DATA earn1;
SET econ415.earnings;
RUN;

The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to read the permanent SAS dataset named EARNINGS that is located in the library named ECON415. To verify that you have accessed EARNINGS and created EARN1, click the Libraries icon in the Explorer window. There is now an icon for ECON415. If you click ECON415, you will see an EARNINGS icon. If you click the Work icon, you will see an icon for the temporary dataset EARN1. Note that when you end your session, the temporary dataset EARN1 will be deleted. If you want to store this new dataset permanently in the library named ECON415, then replace the DATA statement above with the following DATA statement:

DATA econ415.earn1;

If you want to store all changes made in the current session in the permanent SAS dataset named EARNINGS, then replace the DATA statement above with the following DATA statement

DATA econ415.earnings;

In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS dataset EARNINGS with any changes that you make to the data during the current session.

Program Builder

To access EARNINGS, right-click (make sure that you right-click, not left-click) the Libraries icon in the Explorer window. Click New… This opens the New Library Dialogue box. In the Name box type ECON415. In the Path box type A:. Click OK. If you click on the Libraries icon, you will now see an icon for ECON415. To create the temporary SAS dataset named EARN1, click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click EARNINGS. Click Open. This opens the spreadsheet that contains the EARNINGS data. On the menu bar click File. Point to Save. Click Data… A Save Data box appears. Highlight the Library name WORK. Next to Data Set: type EARN1. Click OK. If you want to store EARN1 permanently in the library named ECON415, then highlight the library ECON415 rather than the library WORK. If you want to save any changes that you make in the current session in the dataset EARNINGS, then simply use this dataset during your session.

CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS

Assignment statements and logical expressions can be used for many purposes, such as creating new variables from existing variables, recoding variables, and deleting observations from the current sample. Each of these is explained below.

ASSIGNMENT STATEMENTS

Assignment statements allow you to create new variables from existing variables. Assignment statements use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition), and - (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and division (from left to right), and then addition and subtraction (from left to right). The natural logarithm is computed with the LOG function.
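The order of operations can be checked with a short data step like the one below. The dataset name OPS and its variables are hypothetical; a DATA step with no INPUT or SET statement runs once and writes a single observation holding the results.

```sas
/* Hypothetical check of the order in which arithmetic operators are applied. */
DATA ops;
  a = 2 + 3 * 4;      /* * before +: 14 */
  b = (2 + 3) * 4;    /* parentheses first: 20 */
  c = 12 / 3 * 2;     /* / and * have equal priority, left to right: 8 */
  d = 2 * 3 ** 2;     /* ** before *: 18 */
  e = LOG(EXP(1));    /* LOG is the natural logarithm, so this is 1 up to rounding */
RUN;

PROC PRINT DATA=ops;
RUN;
```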

Example #5

You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains all the variables in EARNINGS plus additional variables that you want to create.

LIBNAME econ415 'a:';
DATA earn1;
SET econ415.earnings;
logwage = log(wage);
yearwage = wage*12;
daywage = wage / 30;
agesq = age**2;
agecub = age**3;
toteduc = educ + 8;
RUN;

SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.

LOGICAL EXPRESSIONS

Logical expressions use conditional IF, THEN, ELSE statements, and comparison and logical operators. The comparison operators are:

Comparison                     Symbol   Mnemonic
Equal to                       =        eq
Greater than                   >        gt
Less than                      <        lt
Greater than or equal to       >=       ge
Less than or equal to          <=       le
Not equal to                   ^=       ne
Equal to one value in a list            in
Not equal to any value in a list        not in

The logical operators are:

Logical operator    Symbol   Mnemonic
And                 &        and
Or                  |        or
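The sketch below applies IN, AND, and OR to three made-up observations; the dataset OPS2 and the variables IN_LIST, BOTH, and EITHER are hypothetical. It relies on the fact that SAS evaluates a comparison or logical expression to 1 when it is true and 0 when it is false, so the result can be stored directly in a variable.

```sas
/* Hypothetical observations used to illustrate comparison and logical operators. */
DATA ops2;
  INPUT educ age gender;
  in_list = (educ IN (4, 8));              /* 1 if educ equals 4 or 8 */
  both    = (age GE 50 AND gender EQ 1);   /* 1 only if both conditions hold */
  either  = (educ = 11 OR age >= 57);      /* 1 if at least one condition holds */
  DATALINES;
4 52 1
11 30 0
6 60 0
;
RUN;

PROC PRINT DATA=ops2;
RUN;
```

For these values the first observation has IN_LIST = 1, BOTH = 1, EITHER = 0, and the other two have IN_LIST = 0, BOTH = 0, EITHER = 1.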

In the following example, a description of each logical expression and its use is given directly below the expression for ease of reference.

Example #6

You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new variables, recode existing variables, and delete observations from the sample to construct EARN1.

LIBNAME econ415 'a:';
DATA earn1;
SET econ415.earnings;

This accesses the permanent SAS dataset named EARNINGS from the library named ECON415, and creates the temporary SAS dataset named EARN1.

IF educ > 4 THEN college = 1; ELSE college = 0;

This creates a dummy variable named college that can take two values: 1 or 0. The IF THEN statement assigns a value of 1 to the variable college if the variable educ is greater than 4. The ELSE statement assigns a value of 0 to the variable college for all other observations.

IF age > 50 THEN newage = 2; ELSE IF age > 25 THEN newage = 1; ELSE newage = 0;

This creates a categorical variable called newage that can take three values: 2, 1, or 0. The IF THEN statement assigns a value of 2 to the variable newage if the variable age is greater than 50. The ELSE IF THEN statement assigns a value of 1 to the variable newage if the variable age is greater than 25 and less than or equal to 50. The ELSE statement assigns a value of 0 to the variable newage for all remaining observations. Note that only one ELSE statement is allowed per IF THEN statement.
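One caution about chains like this one: in comparisons SAS treats a missing numeric value (.) as smaller than any number, so an observation with a missing age fails both tests and falls into the final ELSE, receiving newage = 0. The sketch below uses hypothetical ages that include the boundary values 50 and 25 and one missing value; it shows this behavior and a variant, NEWAGE2 (a hypothetical name), that tests for a missing value first.

```sas
/* Hypothetical ages, including the boundary values and one missing value. */
DATA agecat;
  INPUT age;
  /* Chain from the example: a missing age ends up in the ELSE branch. */
  IF age > 50 THEN newage = 2;
  ELSE IF age > 25 THEN newage = 1;
  ELSE newage = 0;
  /* Variant: keep missing ages missing instead of coding them 0. */
  IF age = . THEN newage2 = .;
  ELSE IF age > 50 THEN newage2 = 2;
  ELSE IF age > 25 THEN newage2 = 1;
  ELSE newage2 = 0;
  DATALINES;
60
50
25
.
;
RUN;

PROC PRINT DATA=agecat;
RUN;
```

Here the ages 60, 50, and 25 receive newage = 2, 1, and 0, and the missing age receives newage = 0 but newage2 = . (missing).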

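LENGTH sex $ 6;
IF gender = 1 THEN sex = 'male'; ELSE sex = 'female';

This creates a character variable named sex that can take two values: male or female. The LENGTH statement reserves six characters for sex; without it, SAS would set the length of sex from the first value assigned in the program ('male', four characters) and would store female as fema. The IF THEN statement assigns the value male to the variable sex if the variable gender is equal to 1. The ELSE statement assigns the value female to the variable sex for all other observations.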

IF wage > 1300;

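This keeps any observation for which the variable wage is greater than 1300. It deletes all observations for which wage is 1300 or less, as well as any observation with a missing value of wage (SAS treats a missing value as smaller than any number).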

IF exper = 1 THEN delete;

This deletes any observation for which the variable exper is equal to 1.

IF exper = 3 and gender = 1 then delete;

This deletes any observation for which both the variable exper is equal to 3 and the variable gender is equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.

IF educ = 11 or age >= 57 then delete;

This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater than or equal to 57.

IF wage = . THEN delete;

SAS represents a missing numeric value with a period (.). This deletes any observation for which the variable wage has a missing value.

IF age = . then age = 65;

This assigns the value 65 to the variable age for any observation in which age is missing.

RUN;

DELETING VARIABLES FROM A SAS DATASET

Example #7

You want to create two new permanent SAS datasets from the permanent SAS dataset named EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain the variables WAGE, EDUC.

LIBNAME econ415 'a:';
DATA econ415.earnsub1;
SET econ415.earnings;
KEEP wage educ exper age;
DATA econ415.earnsub2;
SET econ415.earnsub1;
KEEP wage educ;
RUN;

An alternative program that would accomplish the same task is the following.

LIBNAME econ415 'a:';
DATA econ415.earnsub1;
SET econ415.earnings;
DROP gender race clerical maint crafts;
DATA econ415.earnsub2;
SET econ415.earnsub1;
DROP exper age;
RUN;

The LIBNAME statement tells SAS to access and/or store permanent SAS datasets in the library named ECON415, which is located on the disk in drive A. The first DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB1 and store it in the library named ECON415. The first SET statement tells SAS to read the permanent SAS dataset named EARNINGS located in the library named ECON415. The KEEP statement tells SAS to include only the variables WAGE, EDUC, EXPER, and AGE from the dataset EARNINGS in the dataset EARNSUB1 (equivalently, to leave out the variables GENDER, RACE, CLERICAL, MAINT, and CRAFTS). Alternatively, the DROP statement tells SAS to leave the variables GENDER, RACE, CLERICAL, MAINT, and CRAFTS out of the dataset EARNSUB1 (equivalently, to include only WAGE, EDUC, EXPER, and AGE). The second DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB2 and store it in the library named ECON415. The second SET statement tells SAS to read the permanent SAS dataset named EARNSUB1 located in the library named ECON415. The KEEP statement tells SAS to include the variables WAGE and EDUC from the dataset EARNSUB1 in the dataset EARNSUB2. Alternatively, the DROP statement tells SAS to leave the variables EXPER and AGE out of the dataset EARNSUB2.
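The equivalence of KEEP and DROP can be checked on a small made-up dataset, as sketched below. EARNINGS_DEMO, SUB_KEEP, and SUB_DROP are hypothetical temporary datasets standing in for EARNINGS and its subsets; both resulting datasets should contain only WAGE and EDUC.

```sas
/* Hypothetical stand-in for EARNINGS with a few of its variables. */
DATA earnings_demo;
  INPUT wage educ exper age gender;
  DATALINES;
1345 4 2 30 1
2500 8 10 45 0
;
RUN;

/* KEEP lists the variables to retain. */
DATA sub_keep;
  SET earnings_demo;
  KEEP wage educ;
RUN;

/* DROP lists the variables to leave out; the result is the same here. */
DATA sub_drop;
  SET earnings_demo;
  DROP exper age gender;
RUN;

PROC CONTENTS DATA=sub_keep;
RUN;
```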

Program Builder

To access EARNINGS, right-click the Libraries icon in the Explorer window. Click New… This opens the New Library Dialogue box. In the Name box type ECON415. In the Path box type A:. Click OK. To create the permanent SAS dataset named EARNSUB1, click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click EARNINGS. Click Open. This opens the spreadsheet that contains the EARNINGS data. On the menu bar click File. Point to Save. Click Data… A Save Data box appears. Highlight the Library name ECON415. Next to Data Set: type EARNSUB1. Click OK. Close the spreadsheet that contains the file EARNINGS. Open the dataset EARNSUB1. To do this, click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click EARNSUB1. Click Open. To delete the variable GENDER, click on GENDER in the spreadsheet. Click Edit in the menu bar. Click Delete. This deletes the variable GENDER from the spreadsheet. Repeat this process to delete the variables RACE, CLERICAL, MAINT, and CRAFTS. Save the file EARNSUB1. To do so, click File. Point to Save. Click Data… A Save Data box appears. Click OK. To create the permanent SAS dataset EARNSUB2, repeat the steps given above and delete the variables EXPER and AGE from the dataset EARNSUB1 to create EARNSUB2.

DISPLAYING A SAS DATASET

Example #8

You want to display the data in the permanent SAS dataset named EARNINGS.

LIBNAME econ415 'a:';
DATA earn1;
SET econ415.earnings;
PROC PRINT data=earn1;
RUN;

The temporary SAS dataset EARN1 that contains the data from the permanent SAS dataset EARNINGS will be displayed in the Output Window.

Program Builder

Access the permanent SAS dataset named EARNINGS. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click EARNINGS. Click Open. A spreadsheet appears that contains the EARNINGS data. Alternatively, click the Libraries icon in the Explorer window. Click the ECON415 icon. Click the EARNINGS icon. This opens the Viewtable window that contains the data.

COMBINING TWO OR MORE DATASETS

Almost any combination of SAS datasets is possible. Three frequently used techniques for combining SAS datasets are matched merge, concatenation, and interleaving. This section explains the matched merge technique. A matched merge combines two or more datasets by connecting observations through a common variable: the observations in the datasets are matched according to the values of a BY variable. Each observation in the new dataset contains all of the variables from each of the separate datasets.

Example #9

You want to combine the two external ascii data files named DOCTOR1 and DOCTOR2 located on the disk in drive A into a single permanent SAS dataset named DOCTOR and store it in the library named ECON415 on the disk in drive A.

LIBNAME econ415 'a:';
DATA doctor1;
INFILE 'a:doctor1.txt';
INPUT id visits hours aides docwage aidewage;
DATA doctor2;
INFILE 'a:doctor2.txt';
INPUT id price ydum age pcinc poptot;
RUN;
PROC SORT data=doctor1; BY id; RUN;
PROC SORT data=doctor2; BY id; RUN;
DATA econ415.doctor;
MERGE doctor1 doctor2;
BY id;
RUN;

This program creates two temporary SAS datasets named DOCTOR1 and DOCTOR2. A matched merge requires each dataset to be sorted by the BY variable, so the two PROC SORT steps sort DOCTOR1 and DOCTOR2 by ID; if the raw data files are already in ID order, the sorts leave them unchanged. The program then merges these two temporary SAS datasets by the variable ID to create the permanent SAS dataset named DOCTOR. The dataset DOCTOR contains the values of the variables ID, VISITS, HOURS, AIDES, DOCWAGE, AIDEWAGE, PRICE, YDUM, AGE, PCINC, and POPTOT for each ID number (i.e., each physician).
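A matched merge with a BY statement expects each dataset to be sorted by the BY variable. The sketch below illustrates this with two tiny made-up datasets, D1 and D2 (hypothetical stand-ins for DOCTOR1 and DOCTOR2 with one variable each), whose ID values are entered in different orders. After both are sorted by ID, MERGE and BY line up the observations for the same physician.

```sas
/* Hypothetical stand-ins for DOCTOR1 and DOCTOR2; D1 is deliberately unsorted. */
DATA d1;
  INPUT id visits;
  DATALINES;
2 110
1 95
;
RUN;

DATA d2;
  INPUT id price;
  DATALINES;
1 30
2 25
;
RUN;

/* Sort both datasets by the BY variable before merging. */
PROC SORT DATA=d1; BY id; RUN;
PROC SORT DATA=d2; BY id; RUN;

/* Each output observation combines the variables of both datasets for one ID. */
DATA doc;
  MERGE d1 d2;
  BY id;
RUN;

PROC PRINT DATA=doc;
RUN;
```

The result has one observation per ID: ID 1 with VISITS = 95 and PRICE = 30, and ID 2 with VISITS = 110 and PRICE = 25.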

Examples #10 through #21 below use the data in the external data file named CPS85, which is assumed to be located on a disk in drive A. The following program creates a permanent SAS dataset named CPS85A and saves it in the library named ECON415 located on the disk in drive A. Its final IF statement deletes any worker whose recorded wage is zero.

LIBNAME econ415 'a:';
DATA econ415.cps85a;
INFILE 'a:cps85.txt';
INPUT ed south nonwh hisp fe marr marrfe ex union wage age manuf constr manag sales cler serv prof;
IF WAGE=0 THEN DELETE;
RUN;

FREQUENCY DISTRIBUTIONS AND SCATTER DIAGRAMS

The easiest way to display frequency distributions and scatter diagrams is to use the program builder.

Example #10

You want to access the permanent SAS dataset named CPS85A which is stored in the library named ECON415 on a disk in drive A. You want to display an absolute frequency distribution for the variables WAGE and ED, a relative frequency distribution for the variables WAGE and ED, and a scatter diagram for the variables WAGE and ED.

Program Builder

Access the permanent SAS dataset named CPS85A. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click CPS85A. Click Open. A spreadsheet appears that contains the CPS85A data. Click Analyze on the menu bar. A pop-up menu appears that has eight choices. Three of these choices are Histogram/Bar Chart (Y), Scatter Plot (Y,X), and Distribution (Y). To display an absolute frequency distribution of WAGE, click Histogram/Bar Chart (Y). Click Wage. Click the Y button. Click the OK button. The absolute frequency distribution of WAGE is now displayed. To display a relative frequency distribution of WAGE, click Distribution (Y). Click Wage. Click the Y button. Click the OK button. The relative frequency distribution of WAGE is now displayed. In addition, an assortment of descriptive statistics, such as the mean, variance, standard deviation, and coefficient of variation, is also provided. Repeat this sequence of steps to display the absolute and relative frequency distributions for ED. To display a scatter diagram of WAGE and ED, click Scatter Plot (Y,X). Click Wage. Click the Y button. Click Ed. Click the X button. Click the OK button. A scatter diagram for WAGE and ED is now displayed.

DESCRIPTIVE STATISTICS

Example #11

You want to access the permanent SAS dataset named CPS85A which is stored in the library named ECON415 on a disk in drive A. You want to calculate the mean, variance, standard deviation, and coefficient of variation for the variables WAGE, ED, EX, FE, AGE, UNION. You also want to calculate the covariances and correlation coefficients for these variables.

LIBNAME econ415 'a:';
DATA cps85b;
SET econ415.cps85a;
PROC MEANS mean var std cv max min;
VAR wage ed ex fe age union;
PROC CORR COV;
VAR wage ed ex fe age union;
RUN;

The LIBNAME, DATA, and SET statements access the permanent SAS dataset named CPS85A and create the temporary SAS dataset named CPS85B. Note that this temporary dataset will be deleted when your session ends. The PROC MEANS statement and the options MEAN, VAR, STD, CV, MAX, and MIN tell SAS to calculate the mean, variance, standard deviation, coefficient of variation, and maximum and minimum values. The VAR statement tells SAS to calculate these statistics for the variables WAGE, ED, EX, FE, AGE, and UNION only. If you omit the VAR statement, then SAS will calculate descriptive statistics for all numeric variables in the dataset. The PROC CORR COV statement tells SAS to calculate the correlation matrix and the covariance matrix. The VAR statement that follows it tells SAS to calculate the correlation coefficients and covariances for the variables WAGE, ED, EX, FE, AGE, and UNION only. If you want SAS to provide a full range of descriptive statistics, you can replace the PROC MEANS mean var std cv max min; statement with the following statement.

PROC UNIVARIATE;

SAS will provide a large number of different types of descriptive statistics for the variables WAGE, ED, EX, FE, AGE, UNION.
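The coefficient of variation that PROC MEANS reports is 100 times the standard deviation divided by the mean, and by default the variance is computed with n - 1 in the denominator. The sketch below checks both on three hypothetical wages (2, 4, and 6), for which the mean is 4, the variance is (4 + 0 + 4)/2 = 4, the standard deviation is 2, and the coefficient of variation is 50. The dataset names W, STATS, and CHECK and the variable names M, V, S, and C are assumptions for illustration; the OUTPUT statement saves the statistics in a dataset instead of printing them.

```sas
/* Hypothetical wages used to check the definitions of the statistics. */
DATA w;
  INPUT wage;
  DATALINES;
2
4
6
;
RUN;

/* NOPRINT suppresses the printed table; OUTPUT saves the statistics in STATS. */
PROC MEANS DATA=w NOPRINT;
  VAR wage;
  OUTPUT OUT=stats MEAN=m VAR=v STD=s CV=c;
RUN;

/* Recompute the coefficient of variation by hand: 100 * std / mean. */
DATA check;
  SET stats;
  cv_by_hand = 100 * s / m;
RUN;

PROC PRINT DATA=check;
RUN;
```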

Program Builder

Access the permanent SAS dataset named CPS85A. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click CPS85A. Click Open. A spreadsheet appears that contains the CPS85A data. Click Analyze on the menu bar. Click Distribution (Y). Click Wage. Click the Y button. Click Ed. Click the Y button. Click Ex. Click the Y button. Click Fe. Click the Y button. Click Age. Click the Y button. Click Union. Click the Y button. Click the OK button. Relative frequency distributions for WAGE, ED, EX, FE, AGE, and UNION are now displayed. In addition, a full range of descriptive statistics is provided below the relative frequency distributions. These are the same descriptive statistics that are provided by the statement PROC UNIVARIATE. To calculate the covariances and correlation coefficients, with the spreadsheet open click Analyze on the menu bar. Click Multivariate (Y's). Click the Output button. Click the boxes next to CORR and COV. Click the OK button. Click Wage. Click the Y button. Click Ed. Click the Y button. Click Ex. Click the Y button. Click Fe. Click the Y button. Click Age. Click the Y button. Click Union. Click the Y button. Click the OK button.

LINEAR REGRESSION

Example #12

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED. You also want to print the variance-covariance matrix for the parameter estimates.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed / covb;
RUN;

The PROC REG statement tells SAS to run a linear regression using the OLS estimator. The MODEL statement tells SAS the dependent variable, explanatory variable(s), and any optional output to print. The dependent variable is on the left-hand side of the equal sign and the explanatory variable(s) are on the right-hand side. The / separates the regression equation from the options. The option covb tells SAS to display the variance-covariance matrix of estimates in the Output window along with the standard regression results. If you do not give SAS any options, then you do not have to include the / .
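If you want to use the estimates or their variance-covariance matrix in later calculations, PROC REG can also save them to a SAS dataset with the OUTEST= and COVOUT options. A sketch, assuming CPS85B exists as created above; the output dataset name EST is hypothetical:

```sas
/* OUTEST= saves the coefficient estimates; COVOUT adds the      */
/* variance-covariance matrix of the estimates to the same data. */
PROC REG DATA=cps85b OUTEST=est COVOUT;
  MODEL wage = ed / covb;
RUN;
QUIT;

/* The row with _TYPE_='PARMS' holds the estimates; the rows with */
/* _TYPE_='COV' hold the variance-covariance matrix.              */
PROC PRINT DATA=est;
  VAR _TYPE_ _NAME_ intercept ed;
RUN;
```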

Program Builder

Access the permanent SAS dataset named CPS85A. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click CPS85A. Click Open. A spreadsheet appears that contains the CPS85A data. Click Analyze on the menu bar. Click Fit (Y X). Click the Output button. Click the box next to Estimated Covariance Matrix. Click the OK button. Click Wage. Click the Y button. Click Ed. Click the X button. Click the OK button.

Example #13

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You also want to print the variance-covariance matrix for the parameter estimates.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe / covb;
RUN;

This program is the same as the program for example #12, except that we include the two additional explanatory variables, EX and FE, in the MODEL statement.

Program Builder

Access the permanent SAS dataset named CPS85A. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click CPS85A. Click Open. A spreadsheet appears that contains the CPS85A data. Click Analyze on the menu bar. Click Fit (Y X). Click the Output button. Click the box next to Estimated Covariance Matrix. Click the OK button. Click Wage. Click the Y button. Click Ed. Click the X button. Click Ex. Click the X button. Click Fe. Click the X button. Click the OK button.

Example #14

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You want to test the following hypotheses.

1) Education and experience have no joint effect on wage; that is, the coefficients of ED and EX are jointly equal to zero.
2) The marginal effects of ED and EX are equal; that is, the coefficients of ED and EX are equal.
3) The sum of the marginal effects of ED and EX is equal to 2; that is, the sum of the coefficients of ED and EX is 2.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  TEST ed = 0, ex = 0;
  TEST ed = ex;
  TEST ed + ex = 2;
RUN;

Note that one or more TEST statements can follow a MODEL statement. Because we are testing three different hypotheses for the same regression model, three TEST statements follow the MODEL statement. When you test a joint hypothesis (i.e., two or more restrictions jointly), you separate the equations that define the individual restrictions with commas after the word TEST.
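Each TEST statement can also be given a label, which makes it easier to match each F-test in the Output window to its hypothesis. A sketch, assuming CPS85B exists as created above; the labels JOINT, EQUAL and SUMTWO are hypothetical names:

```sas
PROC REG DATA=cps85b;
  MODEL wage = ed ex fe;
  joint:  TEST ed = 0, ex = 0;   /* H0: both coefficients are zero      */
  equal:  TEST ed = ex;          /* H0: equal marginal effects          */
  sumtwo: TEST ed + ex = 2;      /* H0: the two coefficients sum to 2   */
RUN;
QUIT;
```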

Program Builder

Not applicable.

Example #15

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE, and impose the restriction that the coefficients of ED and EX are equal. Thus, your objective is to estimate a restricted model that imposes a restriction on the model parameters.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  RESTRICT ed = ex;
RUN;

The RESTRICT statement tells SAS to impose a restriction on the parameters of the statistical model. The restriction that you want to impose is given by the equation after the RESTRICT statement. Note that the format of the RESTRICT statement is identical to the format of the TEST statement. SAS will display the parameter estimates for the restricted model in the Output window. In addition, it provides an estimate for a parameter called RESTRICT. This is the estimate of the Lagrange multiplier that is introduced when the model is estimated subject to the restriction. If the Lagrange multiplier is zero, the restriction is not binding, and the restricted and unrestricted estimates coincide. A t-test of the null hypothesis that the coefficient of RESTRICT is zero is therefore a test of the restriction. In this example, the t-test cannot reject this null hypothesis, which indicates that the data are consistent with the restriction.
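Because the restriction says that ED and EX share one coefficient, the same restricted estimates can be obtained by regressing WAGE on the sum ED + EX and on FE, which is a useful check on the RESTRICT output. A sketch, assuming the LIBNAME statement above has been run; the dataset CPS85R and the variable EDEX are hypothetical names:

```sas
DATA cps85r;
  SET econ415.cps85a;
  edex = ed + ex;          /* hypothetical variable: ED + EX */
RUN;

PROC REG DATA=cps85r;
  MODEL wage = edex fe;    /* coefficient of EDEX = common coefficient of ED and EX */
RUN;
QUIT;
```

The coefficient of EDEX should equal the restricted coefficients of ED and EX, and the intercept and the coefficient of FE should match the restricted estimates as well.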

Program Builder

Not applicable.

Example #16

You want to use the SAS dataset named CPS85A to run a linear regression of WAGE on ED, EX and FE. You want to check for multicollinearity among the explanatory variables. To do this you want to run a regression of each explanatory variable on all remaining explanatory variables so you can calculate variance inflation factors. You also want to calculate the correlation coefficients for the explanatory variables.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  MODEL ed = ex fe;
  MODEL ex = ed fe;
  MODEL fe = ed ex;
PROC CORR;
  VAR ed ex fe;
RUN;

You can use the R² statistic from each of the last three models to calculate a variance inflation factor for ED, EX, and FE: VIF = 1/(1 - R²), where R² comes from the regression of that variable on the other explanatory variables. You can also check the correlation matrix for high correlation coefficients between the explanatory variables. Note that SAS will display certain multicollinearity diagnostics, such as eigenvalues and condition indexes, if you use the following MODEL statement:

MODEL wage = ed ex fe / collin;
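PROC REG can also print the variance inflation factors directly. The VIF and TOL options of the MODEL statement report, for each explanatory variable, VIF = 1/(1 - R²) and the tolerance 1 - R², where R² is from the regression of that variable on the other explanatory variables. The DATA step at the end only illustrates the formula; the R² value of 0.90 in it is made up, not taken from CPS85A, and the dataset name VIFCALC is hypothetical:

```sas
/* VIF and TOL print the variance inflation factor and tolerance */
/* for each explanatory variable; COLLIN adds eigenvalues and    */
/* condition indexes. Assumes CPS85B exists as created above.    */
PROC REG DATA=cps85b;
  MODEL wage = ed ex fe / vif tol collin;
RUN;
QUIT;

/* The VIF formula by hand, with an illustrative R-square */
DATA vifcalc;
  r2  = 0.90;           /* made-up auxiliary-regression R-square */
  vif = 1/(1 - r2);     /* about 10                               */
  tol = 1 - r2;         /* tolerance is the reciprocal of VIF     */
RUN;
```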

Program Builder

Run the four separate regressions using example #12 as a prototype of how to run a regression. Calculate the correlation coefficients as in the section entitled DESCRIPTIVE STATISTICS. If you want SAS to display certain multicollinearity diagnostics, such as eigenvalues and condition indexes, then before you run the regression of WAGE on ED, EX, and FE, click the Output button in the Fit (Y X) dialogue box. Click the box next to Collinearity Diagnostics. Click the OK button.

Example #17

You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You want to use the testing-up approach and do a Lagrange multiplier test to test whether the variables NONWH and MARR should be included in the model. You also want to use the testing-down approach and do an F-test to test whether the variables NONWH and MARR belong in the model.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  OUTPUT out=cps85b residual=resid;
PROC REG;
  MODEL resid = ed ex fe nonwh marr;
RUN;
DATA cps85c;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe nonwh marr;
  TEST nonwh = 0, marr = 0;
RUN;

The OUTPUT statement that follows the MODEL statement for the regression of WAGE on ED, EX, FE tells SAS to save the residuals from this regression as the variable named RESID (residual=resid) and to include RESID in the temporary SAS dataset named CPS85B (out=cps85b). To calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of RESID on ED, EX, FE, NONWH, MARR (R² = 0.0102) and multiply it by the sample size (n = 530). For this example, the Lagrange multiplier test statistic is LM = (0.0102)(530) = 5.41. The second set of commands, beginning with the second DATA statement (DATA cps85c) and ending with the second RUN statement, is the code for the F-test.
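Under the null hypothesis, LM has approximately a chi-square distribution with degrees of freedom equal to the number of variables being tested (here 2, for NONWH and MARR). A sketch of the comparison in a DATA step, using the numbers above; PROBCHI returns the chi-square cumulative distribution function and CINV its quantile, and the dataset name LMTEST is hypothetical:

```sas
DATA lmtest;
  n    = 530;                   /* sample size                          */
  r2   = 0.0102;                /* unadjusted R-square, RESID regression */
  q    = 2;                     /* variables tested: NONWH and MARR     */
  lm   = n*r2;                  /* LM = n x R-square                    */
  pval = 1 - PROBCHI(lm, q);    /* p-value from chi-square(q)           */
  crit = CINV(0.95, q);         /* 5% critical value                    */
RUN;

PROC PRINT DATA=lmtest;
RUN;
```

With these numbers the p-value is about 0.067, and LM is below the 5% critical value of about 5.99, so at the 5% level the test does not reject the hypothesis that NONWH and MARR can be excluded.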

Program Builder

Run the regression of WAGE on ED, EX, FE using example #12 as a prototype of how to run a regression. However, before you run the regression, click the Output button in the Fit (Y X) dialogue box. Click the Output Variables button. Click the box next to Residual. Click the OK button. When you run the regression, SAS will save the residuals as the variable R_WAGE_2. This variable will now appear in the spreadsheet containing the CPS85A data. Run the regression of R_WAGE_2 on ED, EX, FE, NONWH, MARR. Calculate the LM test statistic using the output from this regression.

Example #18

You want to use the SAS dataset CPS85A to estimate a varying slope parameter model where WAGE depends upon ED, EX, FE, and the interaction variable EDFE, which is the product of ED and FE. This interaction variable allows the coefficient of ED to depend upon FE.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
  edfe = ed*fe;
PROC REG;
  MODEL wage = ed ex fe edfe;
RUN;

Note that to estimate this model, you must first create an interaction term for ED and FE.
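Because EDFE = ED*FE, the effect of one more year of education is the coefficient of ED for men (FE = 0) and the sum of the coefficients of ED and EDFE for women (FE = 1). TEST statements can check both. A sketch, assuming the LIBNAME statement above has been run; the labels SLOPE_DIFF and WOMEN_ED are hypothetical names:

```sas
DATA cps85b;
  SET econ415.cps85a;
  edfe = ed*fe;                       /* interaction of ED and FE */
RUN;

PROC REG DATA=cps85b;
  MODEL wage = ed ex fe edfe;
  slope_diff: TEST edfe = 0;          /* H0: same ED coefficient for women and men */
  women_ed:   TEST ed + edfe = 0;     /* H0: ED has no effect on WAGE for women    */
RUN;
QUIT;
```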

Program Builder

Access the permanent SAS dataset named CPS85A. (See section entitled ACCESSING A PERMANENT SAS DATASET). Click Solutions. Point to Analysis. Click Interactive Data Analysis. Click ECON415. Click CPS85A. Click Open. A spreadsheet appears that contains the CPS85A data. Click Analyze on the menu bar. Click Fit (Y X). Click Wage. Click the Y button. Click Ed. Click the X button. Click Ex. Click the X button. Click Fe. Click the X button. Click Ed. Hold down the Ctrl key on the keyboard and click Fe. Click the Cross button. This creates the interaction term for ED and FE. Click the OK button.

Example #19

You want to use the SAS dataset CPS85A to estimate a log-linear functional form, where the logarithm of WAGE depends upon ED, EX, FE.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
  logwage = log(wage);
PROC REG;
  MODEL logwage = ed ex fe;
RUN;

Note that to estimate this model, you must first create a new variable named LOGWAGE, which is the natural logarithm of the variable WAGE.
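In the log-linear model, 100 times the coefficient of ED is approximately the percentage change in WAGE from one more year of education; the exact percentage change for a one-unit change is 100(exp(b) - 1), where b is the coefficient. A sketch that saves the estimates with OUTEST= and computes both, assuming the LIBNAME statement above has been run; the datasets EST_LOG and PCT are hypothetical, and the IF statement guards against taking the logarithm of a zero or negative wage:

```sas
DATA cps85b;
  SET econ415.cps85a;
  IF wage > 0 THEN logwage = LOG(wage);   /* LOG is the natural logarithm */
RUN;

PROC REG DATA=cps85b OUTEST=est_log NOPRINT;
  MODEL logwage = ed ex fe;
RUN;
QUIT;

DATA pct;
  SET est_log;                       /* the variable ED holds the ED coefficient  */
  approx_pct = 100*ed;               /* approximate % change in WAGE per year of ED */
  exact_pct  = 100*(EXP(ed) - 1);    /* exact % change in WAGE per year of ED       */
RUN;

PROC PRINT DATA=pct;
  VAR ed approx_pct exact_pct;
RUN;
```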

Example #20

You want to use the SAS dataset CPS85A to run an instrumental variables regression of WAGE on ED, EX, and FE using the two-stage least squares estimator. You assume ED is the endogenous explanatory variable. The instrumental variables are NONWH and MARR. You also want to calculate the F-statistic for the null hypothesis that NONWH and MARR have no joint effect on ED in the first-stage regression, to check the strength (relevance) of the instrumental variables NONWH and MARR.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC SYSLIN 2sls;
  ENDOGENOUS ed;
  INSTRUMENTS ex fe nonwh marr;
  MODEL wage = ed ex fe;
RUN;
DATA cps85c;
  SET econ415.cps85a;
PROC REG;
  MODEL ed = ex fe nonwh marr;
  TEST nonwh = 0, marr = 0;
RUN;

The PROC SYSLIN statement tells SAS that you are going to estimate at least one equation in a system of linear equations. The option 2SLS tells SAS to estimate the equation(s) using the two-stage least squares estimator. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement lists the variables used in the first-stage regression: the exogenous explanatory variables EX and FE as well as the excluded instruments NONWH and MARR. The MODEL statement tells SAS the equation to estimate. The second set of commands, beginning with the second DATA statement (DATA cps85c) and ending with the second RUN statement, is the code for the first-stage regression of ED on EX, FE, NONWH, MARR and for the F-statistic that is used to check instrument strength or relevance.
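To see what the 2SLS option does, you can carry out the two stages yourself. This sketch, assuming CPS85B exists as created above, should reproduce the 2SLS coefficient estimates when the first stage uses EX, FE, NONWH and MARR; the dataset STAGE1 and the variable EDHAT are hypothetical names. The standard errors printed in the second stage are not valid 2SLS standard errors, because they are based on residuals computed with EDHAT instead of ED, so use PROC SYSLIN for inference.

```sas
/* Stage 1: regress the endogenous ED on all exogenous variables and instruments */
PROC REG DATA=cps85b NOPRINT;
  MODEL ed = ex fe nonwh marr;
  OUTPUT OUT=stage1 PREDICTED=edhat;   /* fitted values of ED */
RUN;
QUIT;

/* Stage 2: replace ED with its fitted value EDHAT */
PROC REG DATA=stage1;
  MODEL wage = edhat ex fe;   /* coefficients match 2SLS; standard errors do not */
RUN;
QUIT;
```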

Example #21

You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You then want to estimate this model using the FGLS estimator (weighted least squares) assuming that the variance of the error term is a linear function of ED.

LIBNAME econ415 'a:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  OUTPUT out=cps85b residual=resid;
DATA cps85c;
  SET cps85b;
  residsq = resid**2;
PROC REG;
  MODEL residsq = ed;
  OUTPUT out=cps85c predicted=varhat;
DATA cps85d;
  SET cps85c;
  IF varhat <= 0 THEN varhat = residsq;
  w = 1/varhat;   /* weight = reciprocal of the estimated error variance */
PROC REG;
  MODEL wage = ed ex fe;
  WEIGHT w;
RUN;

In this program we use three DATA statements to create three temporary SAS datasets. The OUTPUT statement that follows the MODEL statement for the regression of RESIDSQ on ED tells SAS to save the predicted values of RESIDSQ from this regression as the variable named VARHAT (predicted=varhat) and to include this variable in the temporary SAS dataset named CPS85C (out=cps85c). The conditional IF THEN statement tells SAS to replace any value of VARHAT that is negative or zero with the value of RESIDSQ. We must do this because a variance estimate must be positive before it can be used to form a weight. The WEIGHT statement that follows the last MODEL statement tells SAS to run a weighted least squares regression using the variable W as the weight. PROC REG minimizes the sum of the squared residuals multiplied by W, so W must be the reciprocal of the estimated error variance (W = 1/VARHAT). This is the FGLS estimator.
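An equivalent way to see what the WEIGHT statement does is to transform the data by hand: divide every variable, including the constant term, by the estimated standard deviation of the error and run OLS without an intercept. Because PROC REG weights each squared residual by W, this transformed regression gives the same estimates as weighting with W = 1/VARHAT. A sketch, assuming the dataset CPS85C from the program above; the dataset CPS85T and the variables SDHAT, ONE_S, WAGE_S, ED_S, EX_S and FE_S are hypothetical names:

```sas
/* Assumes CPS85C (with RESIDSQ and VARHAT) from the FGLS program above */
DATA cps85t;
  SET cps85c;
  IF varhat <= 0 THEN varhat = residsq;   /* same adjustment as in the program */
  sdhat  = SQRT(varhat);                  /* estimated standard deviation      */
  one_s  = 1/sdhat;                       /* transformed constant term         */
  wage_s = wage/sdhat;
  ed_s   = ed/sdhat;
  ex_s   = ex/sdhat;
  fe_s   = fe/sdhat;
RUN;

PROC REG DATA=cps85t;
  MODEL wage_s = one_s ed_s ex_s fe_s / NOINT;  /* coefficient of ONE_S is the intercept */
RUN;
QUIT;
```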

Program Builder

Not applicable.

The following program creates a permanent SAS dataset named MACROCON and saves it in the library named ECON415 located on the disk in drive A. The dataset MACROCON is used in example #22 and example #23.

LIBNAME econ415 'a:';
DATA econ415.macrocon;
  INFILE 'a:macrocon.txt';
  INPUT year cons disinc price prime un;
RUN;
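Before using MACROCON, it is worth checking that the INPUT statement read the file correctly. A sketch using PROC CONTENTS, which lists the variables and the number of observations, and PROC PRINT with the OBS= dataset option, which prints only the first few observations:

```sas
/* Assumes the LIBNAME statement above has been run */
PROC CONTENTS DATA=econ415.macrocon;    /* variables and number of observations */
RUN;

PROC PRINT DATA=econ415.macrocon (OBS=5);   /* first five observations only */
RUN;
```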

Example #22

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. Real consumption expenditures are defined as CONS divided by PRICE, with the appropriate adjustment for the decimal point (that is, CONS divided by PRICE/100). Real disposable income is defined as DISINC divided by PRICE, with the same adjustment. You want to do a Lagrange multiplier test to test for second-order autocorrelation.

LIBNAME econ415 'a:';
DATA con1;
  SET econ415.macrocon;
  rcons = cons/(price/100);
  rdisinc = disinc/(price/100);
PROC REG;
  MODEL rcons = rdisinc prime;
  OUTPUT out=con1 residual=resid;
DATA con2;
  SET con1;
  resid1 = lag1(resid);
  resid2 = lag2(resid);
PROC REG;
  MODEL resid = rdisinc prime resid1 resid2;
RUN;

The assignment statements for RCONS and RDISINC tell SAS to create the new variables RCONS and RDISINC and save them in the temporary SAS dataset named CON1. The OUTPUT statement that follows the MODEL statement for the regression of RCONS on RDISINC and PRIME tells SAS to save the residuals from this regression as the variable named RESID and to include RESID in the temporary SAS dataset named CON1. The second DATA statement tells SAS to create a second temporary SAS dataset named CON2. The SET statement tells SAS to include all of the variables in the temporary SAS dataset CON1 in CON2. The assignment statement RESID1 = LAG1(RESID) tells SAS to create a new variable named RESID1 that is equal to RESID lagged one period. The assignment statement RESID2 = LAG2(RESID) tells SAS to create a new variable named RESID2 that is equal to RESID lagged two periods. The variables RESID1 and RESID2 are saved in the temporary SAS dataset CON2. To calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of RESID on RDISINC, PRIME, RESID1, and RESID2 (R² = 0.31) and multiply it by the sample size (n = 35). Note that you lose two observations when running this regression because it includes a variable that is lagged two periods. For this example, the Lagrange multiplier test statistic is LM = (0.31)(35) = 10.85.
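As in example #17, LM is compared with a chi-square distribution, here with 2 degrees of freedom because the auxiliary regression includes two lagged residuals. The DATA step below does this with the numbers above; the dataset name LMAUTO is hypothetical. PROC AUTOREG can also compute a Lagrange multiplier test for serial correlation directly with the GODFREY= option of the MODEL statement; its statistic may differ somewhat from the hand calculation because of how the missing pre-sample residuals are handled.

```sas
DATA lmauto;
  n    = 35;                    /* observations in the auxiliary regression   */
  r2   = 0.31;                  /* unadjusted R-square from that regression   */
  q    = 2;                     /* number of lagged residuals: RESID1, RESID2 */
  lm   = n*r2;                  /* LM = n x R-square                          */
  pval = 1 - PROBCHI(lm, q);    /* p-value from chi-square(q)                 */
  crit = CINV(0.95, q);         /* 5% critical value                          */
RUN;

PROC PRINT DATA=lmauto;
RUN;

/* Built-in alternative: Godfrey LM tests for serial correlation up to order 2 */
PROC AUTOREG DATA=con1;
  MODEL rcons = rdisinc prime / GODFREY=2;
RUN;
```

With these numbers the p-value is about 0.004, and LM is well above the 5% critical value of about 5.99, so the test rejects the hypothesis of no second-order autocorrelation at the 5% level.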

Program Builder

Too cumbersome because of the need to create lagged variables.

Example #23

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. You want to estimate this model using the FGLS Cochrane-Orcutt estimator to correct for first-order autocorrelation.

LIBNAME econ415 'a:';
DATA con1;
  SET econ415.macrocon;
  rcons = cons/(price/100);
  rdisinc = disinc/(price/100);
PROC AUTOREG;
  MODEL rcons = rdisinc prime / nlag=1 iter itprint converge=0.0001;
RUN;

The PROC AUTOREG statement tells SAS to run a linear regression and correct for autocorrelation. The MODEL statement tells SAS to run a linear regression of RCONS on RDISINC and PRIME. The / tells SAS that options follow. The option NLAG=1 tells SAS to correct for first-order autocorrelation. The ITER option tells SAS to use the Cochrane-Orcutt estimator, which involves iterating. The ITPRINT option tells SAS to print each iteration so you can see how the estimate of the autocorrelation coefficient (ρ) changes. The CONVERGE=0.0001 option tells SAS to stop iterating when the estimates of ρ from two successive iterations differ by no more than 0.0001. If you do not include the CONVERGE option, SAS will use its own default convergence criterion. It is important to note that SAS prints the negative of the estimate of the autocorrelation coefficient ρ. Thus, if SAS prints a negative value, the estimate of ρ is positive, indicating positive autocorrelation. If SAS prints a positive value, the estimate of ρ is negative, indicating negative autocorrelation.
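The iterations combine two steps: estimate ρ from the OLS residuals, then rerun the regression on quasi-differenced data. A hand-coded sketch of a single Cochrane-Orcutt step, assuming the temporary dataset CON1 from the program above; the datasets CO1, CO2, CO3 and RHOEST and the macro variable RHO are hypothetical names. It drops the first observation, as the Cochrane-Orcutt transformation does, and performs only one step, so its estimates need not match the iterated PROC AUTOREG results exactly.

```sas
/* Step 1: OLS residuals */
PROC REG DATA=con1 NOPRINT;
  MODEL rcons = rdisinc prime;
  OUTPUT OUT=co1 RESIDUAL=resid;
RUN;
QUIT;

DATA co2;
  SET co1;
  resid1 = LAG1(resid);            /* residual lagged one period */
RUN;

/* Step 2: rho-hat from regressing e(t) on e(t-1) without an intercept */
PROC REG DATA=co2 OUTEST=rhoest NOPRINT;
  MODEL resid = resid1 / NOINT;
RUN;
QUIT;

DATA _NULL_;
  SET rhoest;
  CALL SYMPUTX('rho', resid1);     /* store rho-hat in macro variable RHO */
RUN;

/* Step 3: quasi-difference the data; the first observation becomes missing */
DATA co3;
  SET con1;
  rcons_s   = rcons   - (&rho)*LAG1(rcons);
  rdisinc_s = rdisinc - (&rho)*LAG1(rdisinc);
  prime_s   = prime   - (&rho)*LAG1(prime);
RUN;

/* Step 4: OLS on the transformed data; the intercept estimates b0*(1 - rho) */
PROC REG DATA=co3;
  MODEL rcons_s = rdisinc_s prime_s;
RUN;
QUIT;
```

RHO is on the usual scale, so compare it with the negative of the autoregressive estimate printed by PROC AUTOREG.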
