
A CONCISE GUIDE TO THE SAS STATISTICAL PACKAGE
Version 9.3 and 9.4
Professor Thornton
Economics 515 Econometrics
INTRODUCTION
This guide provides an overview of the SAS statistical package and an explanation of a number of useful SAS commands and capabilities. It does not explain all SAS commands and capabilities. SAS is an extremely powerful statistical package, and if you desire to learn more about what it can do you should consult the appropriate SAS Users Manual or purchase one of the many SAS companion books available in bookstores that provide a more detailed explanation about various facets of the SAS system.
DATA SETS
In this guide, SAS commands are explained in the context of examples. The examples are based on the following eight data sets. It is assumed that each data set is contained in a file on a memory stick in drive E. If your data files are on drive C, or on a memory stick located in a different drive such as drive F, modify the examples below accordingly (e.g., replace the letter E with the letter C or F).
WAGEDATA
The data file DATA7-2 comes with the Ramanathan econometrics textbook. It consists of a cross-section of 49 workers. The variables are WAGE = monthly wage, EDUC = years of education beyond the eighth grade, EXPER = years of experience, AGE = age of worker, GENDER = indicator variable for gender (1 if male, 0 if female), RACE = indicator variable for race (1 if white, 0 if nonwhite), CLERICAL = indicator variable for clerical worker (1 if clerical worker, 0 otherwise), MAINT = indicator variable for maintenance worker (1 if maintenance worker, 0 otherwise), CRAFTS = indicator variable for crafts worker (1 if crafts worker, 0 otherwise).
CPS85
The data file CPS85 consists of 526 randomly selected employed workers from the May 1985 Current Population Survey conducted by the Department of Commerce. This is a survey of over 50,000 households conducted monthly, and it serves as the basis for the national employment and unemployment statistics. The variables are: ED = years of education, SOUTH = dummy variable (1 if worker lives in the South, 0 otherwise), NONWH = dummy variable (1 if worker is nonwhite, 0 otherwise), HISP = dummy variable (1 if worker is Hispanic, 0 otherwise), FE = dummy variable (1 if worker is female, 0 otherwise), MARR = dummy variable (1 if worker is married with spouse present in household, 0 otherwise), MARRFE = dummy variable (1 if worker is a married female with spouse present in household, 0 otherwise), EX = years of labor market experience, UNION = dummy variable (1 if worker has a union job, 0 otherwise), WAGE = average hourly earnings in constant 2003 dollars, AGE = age in years, MANUF = dummy variable (1 if worker works in the manufacturing industry, 0 otherwise), CONSTR = dummy variable (1 if worker works in the construction industry, 0 otherwise), MANAG = dummy variable (1 if worker is managerial or administrative, 0 otherwise), SALES = dummy variable (1 if worker is in sales, 0 otherwise), CLER = dummy variable (1 if worker is a clerical worker, 0 otherwise), SERV = dummy variable (1 if worker is a service worker, 0 otherwise), PROF = dummy variable (1 if worker is professional or technical, 0 otherwise).
MACROCON
The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The variables are YEAR = year, CONS = annual consumption spending in billions of dollars, DISINC = annual disposable income in billions of dollars, PRICE = consumer price index, PRIME = the prime interest rate, UN = unemployment rate.
DEMAND
The data file DEMAND consists of prices and quantities purchased of three goods, and income, for a cross section of 30 individual consumers. These data are simulated, not real world data. The variables are: Q1 = quantity purchased of good 1, Q2 = quantity purchased of good 2, Q3 = quantity purchased of good 3, P1 = price of good 1, P2 = price of good 2, P3 = price of good 3, I = consumer income.
PRODUCER
The data file PRODUCER consists of cross-section data for 92 dairy farm households for the year 1986. These data were obtained from a random sample of Utah dairy farmers in five counties that were the major dairy production centers. The variables are: OUTPUT = pounds of milk produced per year, LABOR = hours worked per year by household members, CAPITAL = units of capital, LAND = units of land, PCAPITAL = price per unit of capital, PLAND = price per unit of land, POUTPUT = price per pound of milk, PLABOR = hourly wage of labor. Note that the price of labor and the price of land do not vary across dairy farms; that is, all 92 dairy farms can purchase labor and land at the same price.
LABOR
The data file LABOR consists of cross-section data for 100 families taken from the 1976 Panel Study of Income Dynamics, and is based on data for the year 1975. The variables are: LFP = a dummy variable for wife labor force participation (1 if wife worked in 1974, 0 otherwise), WHRS = wife’s hours of work in 1975, KL6 = number of children less than 6 years old in household, K618 = number of children between 6 and 18 in the household, WA = wife’s age, WE = wife’s years of education, WW = wife’s hourly wage for 1975, HHRS = husband’s hours worked in 1975, HA = husband’s age, HE = husband’s years of education, HW = husband’s hourly wage rate for 1975, FAMINC = total family income for 1975, MTR = marginal tax rate for wife, WMED = wife’s mother’s years of education, WFED = wife’s father’s years of education, UN = unemployment rate in county of residence (percentage), CIT = dummy variable for urban area (1 if family lives in large city, 0 otherwise), AX = wife’s years of labor market experience.
BACKGROUND INFORMATION
SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS allows you to read data in a variety of different formats, transform the data to conduct statistical analyses, analyze the data, and present the results.
A SAS program has two major components: Data Steps and Procedures. The data step allows you to read SAS data sets or raw data, perform transformations on the data, create new variables, and recode existing variables. The data step is the component of the program that creates SAS datasets. The procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and procedures are made up of one or more statements. A statement is usually identified by a keyword that suggests the statement’s function (e.g., REG, MEANS, RUN). Every statement ends with a semicolon.
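As a minimal sketch of these two components, the program below uses a DATA step to build a small temporary dataset from lines of data typed into the program, and then a PROC step to analyze it. The dataset name TINY and its values are hypothetical, chosen only for illustration.

```sas
/* DATA step: create a temporary dataset named TINY (hypothetical data) */
data tiny;
  input wage educ;          /* read two numeric variables from each data line */
  datalines;
1200 4
1500 6
1800 8
;
run;

/* PROC step: compute the mean of each variable in TINY */
proc means data=tiny mean;
  var wage educ;
run;
```

Each step ends at its RUN statement, and every statement, including DATALINES, ends with a semicolon.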
EXECUTING A SAS PROGRAM
A SAS program can be executed in different ways. The two most important ways are batch mode and interactive windows mode. In batch mode you use a text editor (such as Microsoft WordPad) to write a SAS program in an input file saved as a text document (.txt). You then tell SAS to execute the program in the input file and place the resulting output in an output file. You then use a text editor to view the output file.
In interactive windows mode, you type SAS statements in a Program Editor window. When SAS statements are executed, the output is displayed in an Output window. A Log window is also displayed that contains the log for any SAS statements that are executed. The Log window is very useful when writing SAS programs. The log is displayed whether the program works or not. It repeats the SAS statements that are executed, documents any SAS datasets that are created, warns you about potential problems with your program, and gives error messages for mistakes such as incorrect syntax.
This guide explains how to create and execute SAS programs in interactive windows mode using the Program Editor.
CREATING A SAS DATASET
The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can be used to read raw data into a SAS dataset. This process is called importing. The raw data used to create a SAS dataset can be in a number of different formats and locations. This guide explains how to import an Excel file, create a temporary SAS dataset, create a permanent SAS dataset that you can save for future use in a SAS library, and access a SAS dataset stored in the library.
Example
The Excel file WAGEDATA has 49 observations on 9 variables. The names of the variables are WAGE, EDUC, EXPER, AGE, GENDER, RACE, CLERICAL, MAINT, CRAFTS. You want to create a temporary SAS dataset named EARNINGS, and a SAS library named ECON415. You then want to save the temporary dataset EARNINGS as a permanent SAS dataset also named EARNINGS.
If you are using SAS 9.3, you can import an Excel file directly. If you are using SAS 9.4, you must first save the Excel file as a CSV (comma delimited) file. To do this in Excel, click File on the menu bar in the upper left-hand corner, then click Save As. In the Save as type box, scroll down the list of file types, click CSV (Comma delimited), and click Save.
Now launch SAS. On the menu bar in the upper left-hand corner, click File, then click Import Data… Under Select a data source from the list below, Microsoft Excel Workbook should appear; if not, find it in the list of choices. (If you saved the file as a CSV file, select the comma-separated values file type from this list instead; the worksheet question described below applies only to Excel workbooks.) Click Next. In the dialog box next to Workbook, enter the name and location of the file you want to import, in this example E:\wagedata. In the dialog box that appears, SAS asks What table do you want to import? This is the name of the worksheet in the Excel file you are importing. The Excel file WAGEDATA has only one worksheet, named data. It should already appear in the box; if not, select it. Click Next. In the dialog box that appears, enter the name you want to give the temporary SAS dataset you are creating, in this example earnings, and click Finish.
To verify that you have created a temporary SAS dataset named EARNINGS, click the Explorer button on the toolbar, click the Work icon, and then click the earnings icon.
To create a new SAS library named ECON415, click Tools on the menu bar in the upper left-hand corner, then click New Library. In the box next to Name, enter ECON415. In the box next to Path, enter E:\. Click OK. Next, click the Explorer button on the toolbar and click Work. Use the mouse to drag the dataset named EARNINGS from the Work folder to the Econ415 folder. To verify that you have created a permanent SAS dataset named EARNINGS, click the Econ415 folder.
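The same import can also be written as a short program instead of using the menus. This is a sketch that assumes the worksheet was saved as the CSV file E:\wagedata.csv with the variable names in the first row; the file name and path are assumptions, so adjust them to match your own file.

```sas
/* Read the CSV file (assumed location) into a temporary dataset named EARNINGS */
proc import datafile="e:\wagedata.csv"
            out=work.earnings
            dbms=csv
            replace;
  getnames=yes;   /* the first row holds the variable names */
run;

/* Define the library ECON415 on drive E */
libname econ415 'e:';

/* Save a permanent copy of EARNINGS in the library ECON415 */
data econ415.earnings;
  set work.earnings;
run;
```

The REPLACE option lets PROC IMPORT overwrite WORK.EARNINGS if it already exists, which is convenient when you rerun the program.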
ACCESSING A PERMANENT SAS DATASET
The following examples explain how to load a permanent SAS dataset that you have created and create new temporary or permanent SAS datasets from it.
Example
You want to access the dataset named EARNINGS which is stored in the library named ECON415 on a disk on drive E. You want to create a temporary SAS data set named EARN1.
In the Program Editor window type the following statements.
LIBNAME econ415 'e:';
DATA earn1;
  SET econ415.earnings;
RUN;
The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to read the permanent SAS dataset named EARNINGS in the library named ECON415. To verify that you have accessed EARNINGS and created EARN1, click the Libraries icon in the Explorer window. There is now an icon for ECON415. If you click ECON415, you will see an EARNINGS icon. If you click the Work icon, you will see an icon for the temporary dataset EARN1. Note that the temporary dataset EARN1 is deleted when you end your session. If you want to store this new dataset permanently in the library named ECON415, replace the DATA statement above with the following DATA statement:
DATA econ415.earn1;
If you want to save all changes made in the current session in the permanent SAS dataset named EARNINGS, replace the DATA statement above with the following DATA statement:
DATA econ415.earnings;
In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS dataset EARNINGS with any changes that you make to the data during the current session.
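Instead of clicking through the Explorer window, you can check a dataset with PROC CONTENTS, which lists its variables, their types, and the number of observations. In this sketch, VARLIST is a hypothetical name for an output dataset that holds one row per variable.

```sas
libname econ415 'e:';

/* Describe the permanent dataset and save one row per variable in VARLIST */
proc contents data=econ415.earnings out=varlist;
run;

/* List the datasets currently in the temporary WORK library, such as EARN1 */
proc datasets library=work;
quit;
```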
CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS
Assignment statements and logical expressions can be used for many purposes, such as creating new variables from existing variables, recoding variables, and deleting observations from the current sample. Each of these uses is explained below.
ASSIGNMENT STATEMENTS
Assignment statements allow you to create new variables from existing variables. Assignment statements use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition), and - (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and division, and then addition and subtraction; multiplication and division are evaluated from left to right, as are addition and subtraction. The natural logarithm is computed with the LOG function.
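A quick way to see the order of operations is to compute a few expressions with and without parentheses. The dataset ORDER and its variables are hypothetical; each comment gives the value the expression should produce.

```sas
data order;
  a = 2 + 3*4;       /* multiplication first: 2 + 12 = 14 */
  b = (2 + 3)*4;     /* parentheses first: 5*4 = 20 */
  c = 12/2*3;        /* same priority, left to right: 6*3 = 18 */
  d = 2*3**2;        /* exponentiation first: 2*9 = 18 */
  e = log(exp(1));   /* LOG is the natural logarithm: 1 */
run;

proc print data=order;
run;
```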
Example
You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains all the variables in EARNINGS plus additional variables that you want to create.
LIBNAME econ415 'e:';
DATA earn1;
  SET econ415.earnings;
  logwage = log(wage);
  yearwage = wage*12;
  daywage = wage / 30;
  agesq = age**2;
  agecub = age**3;
  toteduc = educ + 8;
RUN;
SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.
LOGICAL EXPRESSIONS
Logical expressions use conditional IF, THEN, and ELSE statements together with comparison and logical operators. Each comparison operator can be written either as a symbol or as a mnemonic. The comparison operators are:
Equal to: = or EQ
Greater than: > or GT
Less than: < or LT
Greater than or equal to: >= or GE
Less than or equal to: <= or LE
Not equal to: ^= or NE
In a list of values: IN
Not in a list of values: NOT IN
The logical operators are:
And: & or AND
Or: | or OR
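The IN and NOT IN operators are convenient when a variable is compared with several values at once. This sketch uses a small hypothetical dataset DEMO typed into the program; the variable names follow the WAGEDATA example, but the values are made up.

```sas
data demo;
  input educ age;
  /* IN: true when educ equals any value in the list */
  if educ in (4, 8) then grp = 1;
  else grp = 0;
  /* NOT IN combined with AND and the mnemonic GE */
  if educ not in (4, 8) and age ge 30 then flag = 1;
  else flag = 0;
  datalines;
4 25
8 40
6 35
2 20
;
run;
```

Here grp is 1 for the first two observations, and flag is 1 only for the third observation (educ = 6, age = 35).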
In the following example, a description of each logical expression and its use is given directly below the expression for ease of reference.
Example
You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new variables, recode existing variables, and delete observations from the sample to construct EARN1.
Commands
LIBNAME econ415 'e:';
DATA earn1;
  SET econ415.earnings;
This accesses the permanent SAS dataset named EARNINGS from the library named ECON415, and creates the temporary SAS dataset named EARN1.
IF educ > 4 THEN college = 1; ELSE college = 0;
This creates a dummy variable named college that takes the value 1 or 0. The IF THEN statement assigns a value of 1 to college if educ is greater than 4. The ELSE statement assigns a value of 0 to college for all other observations. Note that SAS treats a missing value as smaller than any number, so an observation with a missing value of educ is also assigned college = 0.
IF age > 50 THEN newage = 2; ELSE IF age > 25 THEN newage = 1; ELSE newage = 0;
This creates a categorical (multinomial) variable named newage that takes one of three values: 2, 1, or 0. The IF THEN statement assigns a value of 2 to newage if age is greater than 50. The ELSE IF THEN statement assigns a value of 1 to newage if age is greater than 25 but less than or equal to 50. The ELSE statement assigns a value of 0 to newage for all remaining observations. Note that only one ELSE statement is allowed per IF THEN statement.
LENGTH sex $ 6; IF gender = 1 THEN sex = 'male'; ELSE sex = 'female';
This creates a character variable named sex that takes one of two values: male or female. The IF THEN statement assigns the value male to sex if gender is equal to 1. The ELSE statement assigns the value female to sex for all other observations. The LENGTH statement matters here: without it, SAS sets the length of sex from the first value it encounters in the program ('male', 4 characters), so female would be stored as fema.
IF wage > 1300;
This keeps only the observations for which wage is greater than 1300 and deletes all other observations, including those with wage equal to 1300 or less and those with a missing value of wage.
IF exper = 1 THEN delete;
This deletes any observation for which the variable exper is equal to 1.
IF exper = 3 and gender = 1 then delete;
This deletes any observation for which both the variable exper is equal to 3 and the variable gender is equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.
IF educ = 11 or age >= 57 then delete;
This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater than or equal to 57.
IF wage = . THEN delete;
SAS represents a missing numeric value with a period (.). This deletes any observation for which wage is missing.
IF age = . then age = 65;
This assigns the value 65 to age for any observation with a missing value of age.
RUN;
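Because SAS treats a missing value as smaller than any number, an IF THEN/ELSE statement such as the one that creates college assigns the ELSE value to observations with a missing educ. If you would rather keep such observations missing, one option is to test for a missing value first, as in this sketch with a hypothetical dataset DEMO2 and made-up values.

```sas
data demo2;
  input educ;
  /* Without this first branch, a missing educ would fall through to
     college = 0, because . compares as smaller than any number */
  if missing(educ) then college = .;
  else if educ > 4 then college = 1;
  else college = 0;
  datalines;
6
.
3
;
run;
```

The three observations receive college = 1, missing, and 0, respectively.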
DELETING VARIABLES FROM A SAS DATASET
Example
You want to create two new permanent SAS datasets from the permanent SAS dataset named EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain the variables WAGE, EDUC.
LIBNAME econ415 'e:';
DATA econ415.earnsub1;
  SET econ415.earnings;
  KEEP wage educ exper age;
DATA econ415.earnsub2;
  SET econ415.earnsub1;
  KEEP wage educ;
RUN;
An alternative program that accomplishes the same task replaces the first KEEP statement with the first DROP statement below, and the second KEEP statement with the second DROP statement.
DROP gender race clerical maint crafts;
DROP exper age;
The LIBNAME statement tells SAS to access and store permanent SAS datasets in the library named ECON415, which is located on the disk in drive E. The first DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB1 and store it in ECON415. The first SET statement tells SAS to read the permanent SAS dataset named EARNINGS in ECON415. The first KEEP statement tells SAS to include only the variables WAGE, EDUC, EXPER, and AGE from EARNINGS in EARNSUB1; equivalently, the first DROP statement tells SAS to exclude the variables GENDER, RACE, CLERICAL, MAINT, and CRAFTS. The second DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB2 and store it in ECON415. The second SET statement tells SAS to read the permanent SAS dataset named EARNSUB1 in ECON415. The second KEEP statement tells SAS to include only WAGE and EDUC from EARNSUB1 in EARNSUB2; equivalently, the second DROP statement tells SAS to exclude EXPER and AGE.
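KEEP and DROP can also be written as dataset options in parentheses after the dataset name, so that only the listed variables are read in. This sketch produces the same EARNSUB1 and EARNSUB2 datasets directly from EARNINGS.

```sas
libname econ415 'e:';

/* DROP= as a dataset option: read EARNINGS without the five indicator variables */
data econ415.earnsub1;
  set econ415.earnings(drop=gender race clerical maint crafts);
run;

/* KEEP= as a dataset option: read only WAGE and EDUC from EARNINGS */
data econ415.earnsub2;
  set econ415.earnings(keep=wage educ);
run;
```

Because the option acts as the data are read, variables that are not needed are never brought into the step.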
DISPLAYING A SAS DATASET
Example
You want to display the data in the permanent SAS dataset named EARNINGS.
LIBNAME econ415 'e:';
DATA earn1;
  SET econ415.earnings;
PROC PRINT data=earn1;
RUN;
The temporary SAS dataset EARN1, which contains the data from the permanent SAS dataset EARNINGS, will be displayed in the Output window.
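PROC PRINT can also read the permanent dataset directly, without a DATA step, and the OBS= dataset option and the VAR statement limit how much is printed. A sketch that prints the first 10 observations of three variables:

```sas
libname econ415 'e:';

/* Print only the first 10 observations of WAGE, EDUC, and EXPER */
proc print data=econ415.earnings(obs=10);
  var wage educ exper;
run;
```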
DESCRIBING AND ANALYZING DATA
Examples #7 through #17 below involve describing and analyzing data. The data are contained in the Excel file CPS85, which is assumed to be located on a disk in drive E. Create a temporary SAS dataset named CPS85A. Save this temporary SAS dataset as a permanent SAS dataset named CPS85A in the library ECON415.
FREQUENCY DISTRIBUTIONS AND SCATTER DIAGRAMS
Example
You want to access the permanent SAS dataset named CPS85A, which is stored in the library named ECON415 on drive E. You want to display an absolute frequency distribution for the variables WAGE and ED, a relative frequency distribution for the variables WAGE and ED, and a scatter diagram for the variables WAGE and ED.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC UNIVARIATE NOPRINT;
  VAR wage ed;
  HISTOGRAM wage ed;
PROC SGPLOT;
  SCATTER x = ed y = wage;
RUN;
The LIBNAME, DATA, and SET statements access the permanent SAS dataset named CPS85A and create the temporary SAS dataset named CPS85B. Note that this temporary dataset will be deleted when your session ends. The PROC UNIVARIATE statement with the NOPRINT option tells SAS to obtain the information required to construct a histogram and to suppress the printed tables of statistics. The VAR statement tells SAS to obtain this information for the variables WAGE and ED. The HISTOGRAM statement tells SAS to construct histograms for WAGE and ED. The PROC SGPLOT statement tells SAS to construct a graph that plots data points. The SCATTER statement tells SAS to construct a scatter diagram: x = ed places ED on the horizontal axis, and y = wage places WAGE on the vertical axis.
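By default, the HISTOGRAM statement scales the vertical axis as percentages, which gives a relative frequency distribution. The VSCALE= option can request counts instead, which gives an absolute frequency distribution, and the REG statement in PROC SGPLOT adds a fitted regression line to the scatter diagram. A sketch:

```sas
libname econ415 'e:';

/* Absolute frequencies: the vertical axis shows counts */
proc univariate data=econ415.cps85a noprint;
  histogram wage ed / vscale=count;
run;

/* Relative frequencies: the vertical axis shows percentages (the default) */
proc univariate data=econ415.cps85a noprint;
  histogram wage ed / vscale=percent;
run;

/* Scatter diagram of WAGE against ED with the fitted OLS line */
proc sgplot data=econ415.cps85a;
  reg x=ed y=wage;
run;
```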
DESCRIPTIVE STATISTICS
Example
You want to access the permanent SAS dataset named CPS85A which is stored in the library named ECON415 on a disk in drive E. You want to calculate the mean, variance, standard deviation, and coefficient of variation for the variables WAGE, ED, EX, FE, AGE, UNION. You also want to calculate the covariances and correlation coefficients for these variables.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC MEANS mean var std cv max min;
  VAR wage ed ex fe age union;
PROC CORR COV;
  VAR wage ed ex fe age union;
RUN;
The LIBNAME, DATA, and SET statements access the permanent SAS dataset named CPS85A and create the temporary SAS dataset named CPS85B. The PROC MEANS statement with the options MEAN, VAR, STD, CV, MAX, and MIN tells SAS to calculate the mean, variance, standard deviation, coefficient of variation (reported as a percentage), and maximum and minimum values. The VAR statement tells SAS to calculate these statistics for the variables WAGE, ED, EX, FE, AGE, and UNION only. If you omit the VAR statement, SAS calculates the statistics for all numeric variables in the dataset. The PROC CORR statement with the COV option tells SAS to calculate the correlation matrix and the covariance matrix. The VAR statement that follows tells SAS to calculate the correlation coefficients and covariances for WAGE, ED, EX, FE, AGE, and UNION only. If you want SAS to provide a full range of descriptive statistics, you can replace the statement PROC MEANS mean var std cv max min; with the following statement.
PROC UNIVARIATE;
SAS will provide a large number of different types of descriptive statistics for the variables WAGE, ED, EX, FE, AGE, UNION.
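SAS reports the coefficient of variation as a percentage, that is, 100 times the standard deviation divided by the mean. The sketch below checks this on a hypothetical three-observation dataset CVDEMO with values 2, 4, and 6, for which the mean is 4, the standard deviation is 2, and the CV is 50.

```sas
data cvdemo;
  input wage;
  datalines;
2
4
6
;
run;

/* Save the mean, standard deviation, and SAS's CV in the dataset STATS */
proc means data=cvdemo noprint;
  var wage;
  output out=stats mean=m std=s cv=cv_sas;
run;

/* Recompute the coefficient of variation by hand: 100 * std / mean */
data stats;
  set stats;
  cv_hand = 100 * s / m;
run;

proc print data=stats;
  var m s cv_sas cv_hand;
run;
```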
LINEAR REGRESSION
Example
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED. You also want to print the variance-covariance matrix for the parameter estimates.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed / covb;
RUN;
The PROC REG statement tells SAS to run a linear regression using the OLS estimator. The MODEL statement tells SAS the dependent variable, the explanatory variable(s), and any optional output to print. The dependent variable is on the left-hand side of the equal sign and the explanatory variable(s) are on the right-hand side. The forward slash (/) separates the regression equation from the options. The option covb tells SAS to display the variance-covariance matrix of the estimates in the Output window along with the standard regression results. If you do not give SAS any options, you do not have to include the forward slash.
Example
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You also want to print the variance-covariance matrix for the parameter estimates.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe / covb;
RUN;
This program is the same as the program for the previous example, except that two additional explanatory variables, EX and FE, are included in the MODEL statement.
Example
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You want to test the following hypotheses:
1) Education and experience have no joint effect on wage; that is, the coefficients of ED and EX are jointly equal to zero.
2) The marginal effects of ED and EX are equal; that is, the coefficients of ED and EX are equal.
3) The sum of the marginal effects of ED and EX is equal to 2; that is, the sum of the coefficients of ED and EX is 2.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  TEST ed = 0, ex = 0;
  TEST ed = ex;
  TEST ed + ex = 2;
RUN;
Note that one or more TEST statements can follow a MODEL statement. Because we are testing three different hypotheses for the same regression model, three TEST statements follow the MODEL statement. When you test a joint hypothesis (i.e., two or more restrictions jointly), put all the restrictions in a single TEST statement and separate the equations that define them with commas.
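When several TEST statements follow one MODEL statement, a label before each TEST statement makes it easier to match each result in the Output window to its hypothesis. The labels JOINT, EQUAL, and SUMTWO below are hypothetical names.

```sas
libname econ415 'e:';

proc reg data=econ415.cps85a;
  model wage = ed ex fe;
  joint:  test ed = 0, ex = 0;   /* hypothesis 1: ED and EX jointly zero */
  equal:  test ed = ex;          /* hypothesis 2: equal coefficients */
  sumtwo: test ed + ex = 2;      /* hypothesis 3: coefficients sum to 2 */
run;
quit;
```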
Example
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE, and impose the restriction that the coefficients of ED and EX are equal. Thus, your objective is to estimate a restricted model that imposes a restriction on the model parameters.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  RESTRICT ed = ex;
RUN;
The RESTRICT statement tells SAS to impose a restriction on the parameters of the statistical model. The restriction you want to impose is given by the equation after the RESTRICT keyword. Note that the format of the RESTRICT statement is identical to the format of the TEST statement. SAS displays the parameter estimates for the restricted model in the Output window. In addition, it reports an estimate for a parameter labeled RESTRICT. This is the estimated Lagrange multiplier associated with the restriction. If this Lagrange multiplier is not significantly different from zero, the restricted and unrestricted estimates are not significantly different; that is, imposing the restriction does not significantly worsen the fit of the model. In this example, a t-test cannot reject the null hypothesis that the coefficient of RESTRICT is zero. This indicates that the data are consistent with the restriction.
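It can help to see what the restriction does to the model. If the coefficients of ED and EX share a common value b, then b·ED + b·EX = b·(ED + EX), so the restricted model can also be estimated by regressing WAGE on the sum ED + EX and on FE. In this sketch the dataset CPS85R and the variable EDEX are hypothetical names; the coefficient of EDEX should match the common estimate of the ED and EX coefficients that RESTRICT reports.

```sas
libname econ415 'e:';

data cps85r;
  set econ415.cps85a;
  edex = ed + ex;   /* the common coefficient multiplies the sum of ED and EX */
run;

/* Restricted model estimated by substitution */
proc reg data=cps85r;
  model wage = edex fe;
run;
quit;
```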
Example
You want to use the SAS dataset named CPS85A to run a linear regression of WAGE on ED, EX and FE. You want to check for multicollinearity among the explanatory variables. To do this you want to run a regression of each explanatory variable on all remaining explanatory variables so you can calculate variance inflation factors. You also want to calculate the correlation coefficients for the explanatory variables.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  MODEL ed = ex fe;
  MODEL ex = ed fe;
  MODEL fe = ed ex;
PROC CORR;
  VAR ed ex fe;
RUN;
You can use the R² statistic from each of the last three models to calculate the variance inflation factors for ED, EX, and FE. You can check the correlation matrix for high correlation coefficients between the explanatory variables. Note that SAS will display certain multicollinearity diagnostics, such as eigenvalues and condition indexes, if you use the following MODEL statement:
MODEL wage = ed ex fe / collin;
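The variance inflation factor for an explanatory variable is VIF = 1/(1 - R²), where R² comes from the regression of that variable on the other explanatory variables. PROC REG can also print the VIFs directly with the VIF and TOL options (the tolerance is 1 - R²). The sketch below shows both; the R² value of 0.25 in the second step is hypothetical.

```sas
libname econ415 'e:';

/* VIF and TOL add variance inflation factors and tolerances
   to the parameter estimates table */
proc reg data=econ415.cps85a;
  model wage = ed ex fe / vif tol;
run;
quit;

/* VIF from an auxiliary R-squared; the value 0.25 is hypothetical */
data vifcalc;
  rsq = 0.25;
  vif = 1 / (1 - rsq);   /* 1/(1 - 0.25) = 1.333... */
run;
```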
Example
You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You want to use the testing-up approach and do a Lagrange multiplier test to test whether the variables NONWH and MARR should be included in the model. You also want to use the testing-down approach and do an F-test to test whether the variables NONWH and MARR belong in the model.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  OUTPUT out=cps85b residual=resid;
PROC REG;
  MODEL resid = ed ex fe nonwh marr;
RUN;
DATA cps85c;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe nonwh marr;
  TEST nonwh = 0, marr = 0;
RUN;
The OUTPUT statement that follows the MODEL statement for the regression of WAGE on ED, EX, and FE tells SAS to save the residuals from this regression as the variable named RESID (residual=resid) and to include RESID in the temporary SAS dataset named CPS85B (out=cps85b). To calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of RESID on ED, EX, FE, NONWH, and MARR (R² = 0.0102) and multiply it by the sample size (n = 530). For this example, the Lagrange multiplier test statistic is LM = (0.0102)(530) = 5.41. Under the null hypothesis, LM has a chi-square distribution with 2 degrees of freedom, the number of variables being tested (NONWH and MARR). The second set of commands, beginning with the second DATA statement (DATA cps85c) and ending with the second RUN statement, performs the F-test.
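The final step of the test can also be done in SAS. This sketch uses the numbers from the text (n = 530, R² = 0.0102, and 2 degrees of freedom) with the SDF and QUANTILE functions to compute the p-value and the 5% critical value of the chi-square distribution; the dataset name LMTEST is hypothetical.

```sas
data lmtest;
  n    = 530;                          /* sample size used in the text */
  rsq  = 0.0102;                       /* unadjusted R-squared of the auxiliary regression */
  df   = 2;                            /* number of added variables: NONWH and MARR */
  lm   = n * rsq;                      /* LM statistic, about 5.41 */
  p    = sdf('CHISQ', lm, df);         /* upper-tail p-value */
  crit = quantile('CHISQ', 0.95, df);  /* 5% critical value, about 5.99 */
run;

proc print data=lmtest;
run;
```

With LM of about 5.41 and a 5% critical value of about 5.99, the p-value is roughly 0.067, so at the 5% level the LM test does not reject the null hypothesis that the coefficients of NONWH and MARR are zero.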
Example
You want to use the SAS dataset CPS85A to estimate a varying slope parameter model where WAGE depends upon ED, EX, FE, and the interaction variable EDFE, which is the product of ED and FE. This interaction variable allows the coefficient of ED to depend upon FE.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
  edfe = ed*fe;
PROC REG;
  MODEL wage = ed ex fe edfe;
RUN;
Note that to estimate this model, you must first create an interaction term for ED and FE.
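With the interaction term, the model is WAGE = b0 + b1·ED + b2·EX + b3·FE + b4·EDFE + u, so the marginal effect of ED is b1 + b4·FE: b1 for men (FE = 0) and b1 + b4 for women (FE = 1). The sketch below tests whether the effect of education for women is zero and whether it differs between men and women; the labels WOMEN and NODIFF are hypothetical.

```sas
libname econ415 'e:';

data cps85b;
  set econ415.cps85a;
  edfe = ed*fe;   /* interaction of education and the female dummy */
run;

proc reg data=cps85b;
  model wage = ed ex fe edfe;
  women:  test ed + edfe = 0;   /* marginal effect of ED for women is zero */
  nodiff: test edfe = 0;        /* same ED effect for men and women */
run;
quit;
```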
Example
You want to use the SAS dataset CPS85A to estimate a log-linear functional form, where the logarithm of WAGE depends upon ED, EX, FE.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
  logwage = log(wage);
PROC REG;
  MODEL logwage = ed ex fe;
RUN;
Note that to estimate this model, you must first create a new variable named LOGWAGE, which is the natural logarithm of the variable WAGE.
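In the log-linear model, 100 times a coefficient approximates the percentage change in WAGE from a one-unit increase in the variable. For a dummy variable such as FE, the exact percentage difference is 100·(exp(b) - 1). This sketch assumes that the PROC REG output table ParameterEstimates, saved with ODS OUTPUT, has columns named Variable and Estimate; the dataset names PE and PCT are hypothetical.

```sas
libname econ415 'e:';

data cps85b;
  set econ415.cps85a;
  logwage = log(wage);
run;

/* Save the parameter estimates table in the dataset PE */
ods output ParameterEstimates=pe;
proc reg data=cps85b;
  model logwage = ed ex fe;
run;
quit;

/* Exact percentage effect implied by each coefficient */
data pct;
  set pe;
  pct_effect = 100 * (exp(Estimate) - 1);
run;

proc print data=pct;
  var Variable Estimate pct_effect;
run;
```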
Example
You want to use the SAS dataset CPS85A to run an instrumental variables regression of WAGE on ED, EX, and FE using the two-stage least squares estimator. You assume ED is the endogenous explanatory variable. The instrumental variables are NONWH and MARR. You also want to calculate the F-statistic for the null hypothesis that NONWH and MARR have no joint effect on ED in the first-stage regression, to check the strength (relevance) of the instrumental variables NONWH and MARR.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC SYSLIN 2sls;
  ENDOGENOUS ed;
  INSTRUMENTS ex fe nonwh marr;
  MODEL wage = ed ex fe;
RUN;
DATA cps85c;
  SET econ415.cps85a;
PROC REG;
  MODEL ed = ex fe nonwh marr;
  TEST nonwh = 0, marr = 0;
RUN;
The PROC SYSLIN statement tells SAS that you are going to estimate at least one equation in a system of linear equations. The option 2SLS tells SAS to estimate the equation(s) using the two-stage least squares estimator. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement tells SAS the variables to use as instruments in the first stage. This list includes the exogenous explanatory variables EX and FE as well as the excluded instruments NONWH and MARR, so the first stage is the same regression of ED on EX, FE, NONWH, and MARR that is run below. The MODEL statement tells SAS the equation to estimate. The second set of commands, beginning with the second DATA statement (DATA cps85c) and ending with the second RUN statement, runs the first-stage regression of ED on EX, FE, NONWH, and MARR and calculates the F-statistic used to check instrument strength or relevance.
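To see the two stages explicitly, you can run them as two ordinary regressions. In this sketch the dataset STAGE1 and the variable EDHAT are hypothetical names. The second-stage coefficients should match the PROC SYSLIN 2SLS estimates when the first stage uses EX, FE, NONWH, and MARR on the same observations, but the standard errors printed in the second stage are not valid 2SLS standard errors, so PROC SYSLIN should be used for inference.

```sas
libname econ415 'e:';

/* Stage 1: regress the endogenous ED on the exogenous explanatory
   variables and the instruments, and save the fitted values as EDHAT */
proc reg data=econ415.cps85a;
  model ed = ex fe nonwh marr;
  output out=stage1 predicted=edhat;
run;
quit;

/* Stage 2: replace ED with EDHAT; coefficients only, not inference */
proc reg data=stage1;
  model wage = edhat ex fe;
run;
quit;
```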
Example
You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You then want to estimate this model using the FGLS estimator (weighted least squares) assuming that the variance of the error term is a linear function of ED.
LIBNAME econ415 'e:';
DATA cps85b;
  SET econ415.cps85a;
PROC REG;
  MODEL wage = ed ex fe;
  OUTPUT out=cps85b residual=resid;
DATA cps85c;
  SET cps85b;
  residsq = resid**2;
PROC REG;
  MODEL residsq = ed;
  OUTPUT out=cps85c predicted=varhat;
DATA cps85d;
  SET cps85c;
  IF varhat <= 0 THEN varhat = residsq;
  w = 1/varhat;
PROC REG;
  MODEL wage = ed ex fe;
  WEIGHT w;
RUN;
In this program we use three DATA statements to create three temporary SAS datasets. The OUTPUT statement that follows the MODEL statement for the regression of RESIDSQ on ED tells SAS to save the predicted values of RESIDSQ from this regression as the variable named VARHAT (predicted=varhat) and to include this variable in the temporary SAS dataset named CPS85C (out=cps85c). The conditional IF THEN statement tells SAS to replace any value of VARHAT that is negative or zero with the value of RESIDSQ. We must do this because VARHAT is an estimated variance, and the weight is its reciprocal, so it must be positive. The WEIGHT statement that follows the last MODEL statement tells SAS to run a weighted least squares regression using W as the weight. PROC REG minimizes the sum of the weighted squared residuals, so the weight must be inversely proportional to the error variance; W is therefore the reciprocal of the estimated variance, not of the estimated standard deviation. This is the FGLS estimator.
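A common complement to FGLS is to keep the OLS estimates and report standard errors that are robust to heteroskedasticity of unknown form. In this sketch, the SPEC option requests White's test of the first and second moment specification, and the HCC option adds heteroskedasticity-consistent standard errors to the parameter estimates table.

```sas
libname econ415 'e:';

/* SPEC: White's specification test; HCC: heteroskedasticity-consistent
   standard errors alongside the usual OLS results */
proc reg data=econ415.cps85a;
  model wage = ed ex fe / spec hcc;
run;
quit;
```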
Examples #21 and #22 use the data contained in the Excel file MACROCON, which is assumed to be located on a disk in drive E. First create a temporary SAS dataset named MACROCON, and then save this temporary SAS dataset as a permanent SAS dataset named MACROCON in the library ECON415.
Example
You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. Real consumption expenditures are defined as CONS divided by PRICE, with the appropriate adjustment for the decimal point (the program divides PRICE by 100). Real disposable income is defined as DISINC divided by PRICE, with the same adjustment. You want to do a Lagrange multiplier test for second-order autocorrelation.
LIBNAME econ415 'e:';
DATA con1; SET econ415.macrocon;
rcons = cons/(price/100); rdisinc = disinc/(price/100);
PROC REG; MODEL rcons = rdisinc prime; OUTPUT out=con1 residual=resid;
DATA con2; SET con1;
resid1 = lag1(resid); resid2 = lag2(resid);
PROC REG; MODEL resid = rdisinc prime resid1 resid2;
RUN;
The assignment statements for RCONS and RDISINC tell SAS to create the new variables RCONS and RDISINC and save them in the temporary SAS dataset named CON1. The OUTPUT statement that follows the MODEL statement for the regression of RCONS on RDISINC and PRIME tells SAS to save the residuals from this regression as the variable named RESID and to include RESID in the temporary SAS dataset named CON1. The second DATA statement tells SAS to create a second temporary SAS dataset named CON2. The SET statement tells SAS to include all of the variables in the temporary SAS dataset CON1 in the temporary SAS dataset CON2. The assignment statement RESID1 = LAG1(RESID) tells SAS to create a new variable named RESID1 that is equal to RESID lagged one period. The assignment statement RESID2 = LAG2(RESID) tells SAS to create a new variable named RESID2 that is equal to RESID lagged two periods. The variables RESID1 and RESID2 are saved in the temporary SAS dataset CON2. To calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of RESID on RDISINC, PRIME, RESID1, and RESID2 (R² = 0.31) and multiply it by the sample size (n = 35). Note that you lose two observations when running this regression because one of the variables is lagged two periods. For this example, the Lagrange multiplier test statistic is LM = (0.31)(35) = 10.85.
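Under the null hypothesis of no autocorrelation, the LM statistic is approximately chi-square distributed. Its degrees of freedom equal the number of lagged residuals in the auxiliary regression, which is 2 here.

The short Python sketch below (not SAS code) computes the statistic and its p-value from the numbers in the example. It uses the closed-form chi-square tail probability that exists when the degrees of freedom are even. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


r2, n, p = 0.31, 35, 2              # values from the example; p = number of lagged residuals
lm = n * r2                         # LM = n * R^2
p_value = chi2_sf_even_df(lm, p)
crit_5pct = -2.0 * math.log(0.05)   # 5% critical value of chi-square(2)
print(f"LM = {lm:.2f}, p-value = {p_value:.4f}, 5% critical value = {crit_5pct:.2f}")
```

LM = 10.85 exceeds the 5% critical value of about 5.99, and the p-value is about 0.0044. The null hypothesis of no second-order autocorrelation is therefore rejected at the 5% level.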
Example
You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. You want to estimate this model using the FGLS Cochrane-Orcutt estimator to correct for first-order autocorrelation.
LIBNAME econ415 'e:';
DATA con1; SET econ415.macrocon;
rcons = cons/(price/100); rdisinc = disinc/(price/100);
PROC AUTOREG itprint;
MODEL rcons = rdisinc prime / nlag=1 iter converge=0.0001;
RUN;
The PROC AUTOREG statement tells SAS to run a linear regression and correct for autocorrelation. The option ITPRINT tells SAS to print out each iteration that SAS performs so you can see how the estimate of the autocorrelation coefficient (ρ) changes. The MODEL statement tells SAS to run a linear regression of RCONS on RDISINC and PRIME. The / tells SAS that options follow. The option NLAG=1 tells SAS to correct for first-order autocorrelation. The ITER option tells SAS to use the Cochrane-Orcutt estimator, which involves doing iterations. The CONVERGE=0.0001 option tells SAS to stop iterating when the estimates of ρ from two successive iterations differ by no more than 0.0001. If you do not include the CONVERGE option, SAS will use its own default convergence criterion. It is important to note that SAS prints the negative of the estimate of the autocorrelation coefficient ρ. Thus, if SAS prints a negative value, the estimate of ρ is positive, indicating positive autocorrelation. If SAS prints a positive value, the estimate of ρ is negative, indicating negative autocorrelation.
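The iteration controlled by the ITER and CONVERGE= options can be sketched in Python (not SAS code) for a model with a single regressor. This is the textbook Cochrane-Orcutt procedure:
1. Estimate ρ from the residuals.
2. Quasi-difference the data, dropping the first observation.
3. Re-estimate the coefficients.
4. Stop when successive estimates of ρ differ by no more than the convergence criterion.

PROC AUTOREG's own algorithm may differ in details such as the treatment of the first observation. The function names and the simulated data are hypothetical.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of an OLS regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def cochrane_orcutt(x, y, converge=1e-4, max_iter=100):
    """Iterated Cochrane-Orcutt for y_t = b0 + b1*x_t + u_t, u_t = rho*u_{t-1} + e_t."""
    b0, b1 = ols_simple(x, y)                 # start from OLS
    rho = 0.0
    for _ in range(max_iter):
        u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        new_rho = (sum(u[t] * u[t - 1] for t in range(1, len(u)))
                   / sum(u[t - 1] ** 2 for t in range(1, len(u))))
        # Quasi-difference the data; the first observation is dropped.
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a, b1 = ols_simple(xs, ys)
        b0 = a / (1.0 - new_rho)              # transformed intercept is b0 * (1 - rho)
        done = abs(new_rho - rho) <= converge
        rho = new_rho
        if done:
            break
    return b0, b1, rho


# Simulated (hypothetical) data: b0 = 2, b1 = 0.5, rho = 0.6.
random.seed(2)
x = [random.uniform(0, 10) for _ in range(500)]
u, y = 0.0, []
for xi in x:
    u = 0.6 * u + random.gauss(0, 1)
    y.append(2.0 + 0.5 * xi + u)
b0_hat, b1_hat, rho_hat = cochrane_orcutt(x, y)
print(b0_hat, b1_hat, rho_hat)
```

The sketch reports ρ with its usual sign. Following the note above, SAS would print the negative of this estimate.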
NONLINEAR, SYSTEMS OF EQUATIONS, AND LIMITED DEPENDENT VARIABLE MODELS
Many of the following examples use the data in the external data files named DEMAND, PRODUCER, and LABOR, which are assumed to be located on a memory stick in drive E. The following programs create permanent SAS datasets for each of these files. Note that the Program Builder cannot be used for these more complex models; therefore, you must type the appropriate SAS statements in the Program Editor Window.
LIBNAME econ515 'e:'; DATA econ515.demand; INFILE 'e:demand'; INPUT p1 p2 p3 I q1 q2 q3; RUN;
LIBNAME econ515 'e:'; DATA econ515.producer; INFILE 'e:producer'; INPUT output labor capital land pcapital pland poutput plabor; RUN;
LIBNAME econ515 'e:'; DATA econ515.labor; INFILE 'e:labor'; INPUT lfp whrs kl6 k618 wa we ww hhrs ha he hw faminc mtr wmed wfed un cit ax; RUN;
NONLINEAR LEAST SQUARES REGRESSION
Example
You want to use the SAS dataset DEMAND to create a new variable, Q1EXP = P1*Q1, and use the nonlinear least squares estimator to run a regression of Q1EXP on P1, P2, P3, and I, that is nonlinear in parameters. In particular, you want to estimate a Stone-Geary demand equation.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand; q1exp = p1*q1;
PROC NLIN method=dud;
PARMS a=50 c=0.5 d=30 e=40;
MODEL q1exp = a*p1 + c*(i - a*p1 - d*p2 - e*p3);
RUN;
An assignment statement is used to create the new variable Q1EXP. The PROC NLIN statement tells SAS to run a nonlinear regression using the nonlinear least squares estimator. The option METHOD=DUD tells SAS to use the DUD ("doesn't use derivatives") algorithm, which approximates the derivatives numerically, so you do not have to supply them. Alternatively, you can provide your own derivatives with a statement of the form DER.parametername = followed by the expression for the derivative, one for each parameter. The PARMS statement tells SAS the names of the parameters and their starting values. The MODEL statement tells SAS the functional form to estimate. Note that if you omit the METHOD= option, PROC NLIN uses the Gauss-Newton iterative algorithm by default. Other algorithms are also available; to use one, specify it with the METHOD= option in the PROC NLIN statement.
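The sketch below shows what an iterative nonlinear least squares fit of the Stone-Geary equation does, written in Python (not SAS). It swaps in a damped Gauss-Newton algorithm with forward-difference derivatives; it is not the DUD method the program above requests.

The starting values are the ones in the PARMS statement. The "true" parameter values and the price and income data are hypothetical, chosen so that the data fit the model exactly.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def stone_geary(theta, p1, p2, p3, inc):
    """Right-hand side of the MODEL statement: a*p1 + c*(i - a*p1 - d*p2 - e*p3)."""
    a, c, d, e = theta
    return a * p1 + c * (inc - a * p1 - d * p2 - e * p3)


def gauss_newton(f, theta, data, y, max_iter=200, h=1e-6):
    """Damped Gauss-Newton least squares with forward-difference derivatives."""
    theta = list(theta)
    k = len(theta)

    def ssr(th):
        return sum((yi - f(th, *row)) ** 2 for row, yi in zip(data, y))

    for _ in range(max_iter):
        resid = [yi - f(theta, *row) for row, yi in zip(data, y)]
        jac = []                                   # n x k matrix of derivatives
        for row in data:
            base = f(theta, *row)
            grad = []
            for j in range(k):
                bumped = list(theta)
                bumped[j] += h
                grad.append((f(bumped, *row) - base) / h)
            jac.append(grad)
        jtj = [[sum(g[r] * g[c] for g in jac) for c in range(k)] for r in range(k)]
        jtr = [sum(g[r] * e for g, e in zip(jac, resid)) for r in range(k)]
        step = solve(jtj, jtr)                     # Gauss-Newton direction
        current, lam = ssr(theta), 1.0
        while lam > 1e-10:                         # halve the step until the SSR falls
            trial = [t + lam * s for t, s in zip(theta, step)]
            if ssr(trial) < current:
                theta = trial
                break
            lam /= 2.0
        else:
            return theta                           # no further improvement: converged
    return theta


# Hypothetical prices, income, and "true" parameters; q1exp fits the model exactly.
data = [(1 + t % 5, 2 + 0.5 * (t % 3), 1 + 0.3 * (t % 7), 200 + 10 * t) for t in range(30)]
true_theta = [45.0, 0.4, 25.0, 35.0]
q1exp = [stone_geary(true_theta, *row) for row in data]
est = gauss_newton(stone_geary, [50, 0.5, 30, 40], data, q1exp)  # PARMS starting values
print(est)
```

Because these data contain no error, the algorithm should return the assumed true values. With real data it returns the values that minimize the sum of squared residuals.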
LINEAR SEEMINGLY UNRELATED REGRESSIONS
Example
You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly. For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i / covb;
good2: MODEL q2 = p1 p2 i / covb;
RUN;
The PROC SYSLIN statement tells SAS that you are going to estimate a system of linear equations. The option SUR tells SAS to estimate the system of equations using the FGLS estimator (Zellner’s SUR estimator). If you want SAS to estimate the system of equations using the iterated FGLS estimator (iterated SUR estimator), replace the option SUR with the option ITSUR. The option VARDEF=N tells SAS to use the sample size as the denominator when calculating estimates of the variances and covariances. If you omit this option, SAS will use the degrees of freedom (n – k) as the denominator. The MODEL statement tells SAS the equation to estimate. Each MODEL statement is prefixed with a name for the equation followed by a colon. In the above example, the name of the first equation is GOOD1 and the name of the second equation is GOOD2. You may use any name you desire. The option COVB tells SAS to print out the variance/covariance matrix of estimates for the system of equations.
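Zellner's two-step SUR (FGLS) estimator for two equations can be sketched in Python (not SAS code):
1. Estimate each equation by OLS.
2. Form the residual covariance matrix Sigma, dividing by the sample size as VARDEF=N does.
3. Apply GLS to the stacked system using the inverse of Sigma.

The function names and simulated data are hypothetical, and the sketch omits standard errors.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def cross(xa, xb):
    """X_a'X_b for data matrices stored as lists of rows."""
    return [[sum(ra[r] * rb[c] for ra, rb in zip(xa, xb)) for c in range(len(xb[0]))]
            for r in range(len(xa[0]))]


def cross_vec(xa, y):
    """X_a'y."""
    return [sum(ra[r] * yi for ra, yi in zip(xa, y)) for r in range(len(xa[0]))]


def scale(m, s):
    return [[s * v for v in row] for row in m]


def ols(x, y):
    return solve(cross(x, x), cross_vec(x, y))


def sur_two_equations(x1, y1, x2, y2):
    """Two-step SUR for two equations; Sigma uses the divisor n, as with VARDEF=N."""
    n = len(y1)
    b1, b2 = ols(x1, y1), ols(x2, y2)                    # step 1: OLS equation by equation
    e1 = [yi - sum(b * v for b, v in zip(b1, row)) for row, yi in zip(x1, y1)]
    e2 = [yi - sum(b * v for b, v in zip(b2, row)) for row, yi in zip(x2, y2)]
    s11 = sum(u * u for u in e1) / n
    s22 = sum(u * u for u in e2) / n
    s12 = sum(u * v for u, v in zip(e1, e2)) / n
    det = s11 * s22 - s12 * s12
    w11, w12, w22 = s22 / det, -s12 / det, s11 / det     # elements of Sigma^{-1}
    # Step 2: GLS on the stacked system; block (i, j) is w_ij * X_i'X_j.
    top = [r1 + r2 for r1, r2 in zip(scale(cross(x1, x1), w11), scale(cross(x1, x2), w12))]
    bottom = [r1 + r2 for r1, r2 in zip(scale(cross(x2, x1), w12), scale(cross(x2, x2), w22))]
    rhs = ([w11 * p + w12 * q for p, q in zip(cross_vec(x1, y1), cross_vec(x1, y2))]
           + [w12 * p + w22 * q for p, q in zip(cross_vec(x2, y1), cross_vec(x2, y2))])
    beta = solve(top + bottom, rhs)
    k1 = len(x1[0])
    return beta[:k1], beta[k1:]


# Simulated (hypothetical) data with correlated errors across the two equations.
random.seed(3)
n_obs = 500
xa = [[1.0, random.uniform(0, 10), random.uniform(0, 10)] for _ in range(n_obs)]
xb = [[1.0, row[1]] for row in xa]                      # equation 2 uses fewer regressors
shocks = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_obs)]
y1 = [1.0 + 2.0 * r[1] - 1.0 * r[2] + e for r, (e, _) in zip(xa, shocks)]
y2 = [-1.0 + 0.5 * r[1] + 0.8 * e + 0.6 * v for r, (e, v) in zip(xb, shocks)]
b_eq1, b_eq2 = sur_two_equations(xa, y1, xb, y2)
print(b_eq1, b_eq2)
```

A useful check: when both equations have exactly the same regressors, SUR reproduces equation-by-equation OLS. The second equation in the DEMAND example omits P3, which is why SUR can differ from OLS there.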
Example
You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly. For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I. You want to test the two cross-equation restrictions that the coefficient of P1 in equation 1 is equal to the coefficient of P1 in equation 2, and the coefficient of P2 in equation 1 is equal to the coefficient of P2 in equation 2.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i;
good2: MODEL q2 = p1 p2 i;
STEST good1.p1 = good2.p1, good1.p2 = good2.p2;
RUN;
The STEST statement tells SAS that you want to test a cross-equation restriction. The form of the STEST statement is the same as the TEST statement in PROC REG, except you must attach the name of the equation to the variable so that SAS knows to which equation the variable belongs. The STEST statement calculates the F-statistic for the approximate F-Test.
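A test of linear restrictions like the one STEST performs can be written in the standard Wald form, where q is the number of restrictions:

W = (Rb - r)'(RVR')^(-1)(Rb - r), F = W/q

The Python sketch below (not SAS code) evaluates this form. The coefficient vector b, ordered GOOD1.P1, GOOD1.P2, GOOD2.P1, GOOD2.P2, and its covariance matrix V are hypothetical numbers. A real SUR covariance matrix would also have nonzero cross-equation terms. SAS's exact scaling and degrees of freedom for its approximate F-test may differ.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def wald_test(r_mat, b, v, r_vec):
    """Return W = (Rb - r)'(R V R')^{-1}(Rb - r) and F = W / q."""
    d = [sum(rij * bj for rij, bj in zip(row, b)) - ri for row, ri in zip(r_mat, r_vec)]
    r_t = [list(col) for col in zip(*r_mat)]
    rvr = matmul(matmul(r_mat, v), r_t)
    w = sum(di * si for di, si in zip(d, solve(rvr, d)))
    return w, w / len(r_vec)


# Hypothetical estimates ordered as good1.p1, good1.p2, good2.p1, good2.p2.
b = [1.0, 2.0, 0.5, 2.0]
v = [[0.10, 0.0, 0.0, 0.0],
     [0.0, 0.10, 0.0, 0.0],
     [0.0, 0.0, 0.15, 0.0],
     [0.0, 0.0, 0.0, 0.10]]
r_mat = [[1, 0, -1, 0],    # good1.p1 - good2.p1 = 0
         [0, 1, 0, -1]]    # good1.p2 - good2.p2 = 0
w, f = wald_test(r_mat, b, v, [0.0, 0.0])
print(w, f)
```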
Example
You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly. For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I. You want to impose the two cross-equation restrictions that the coefficient of P1 in equation 1 is equal to the coefficient of P1 in equation 2, and the coefficient of P2 in equation 1 is equal to the coefficient of P2 in equation 2.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i;
good2: MODEL q2 = p1 p2 i;
SRESTRICT good1.p1 = good2.p1, good1.p2 = good2.p2;
RUN;
The SRESTRICT statement tells SAS that you want to impose a cross-equation restriction. The form of the SRESTRICT statement is the same as the RESTRICT statement in PROC REG, except you must attach the name of the equation to the variable so that SAS knows to which equation the variable belongs.
NONLINEAR SEEMINGLY UNRELATED REGRESSIONS
Example
You want to use the SAS dataset DEMAND to create two new variables, Q1EXP = P1*Q1 and Q2EXP = P2*Q2, and jointly estimate two equations that are nonlinear in parameters. For equation 1, the dependent variable is Q1EXP; for equation 2, the dependent variable is Q2EXP. In both equations, the independent variables are P1, P2, P3, and I.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand; q1exp = p1*q1; q2exp = p2*q2;
PROC MODEL;
PARMS a c d e f;
q1exp = a*p1 + c*(i - a*p1 - d*p2 - e*p3);
q2exp = d*p2 + f*(i - a*p1 - d*p2 - e*p3);
FIT q1exp q2exp / itsur;
RUN;
Two assignment statements are used to create the new variables Q1EXP and Q2EXP. The PROC MODEL procedure can be used to estimate, and simulate, systems of linear or nonlinear equations. The PROC MODEL statement tells SAS that you are going to estimate or simulate a system of linear or nonlinear equations. The PARMS statement tells SAS the names of the parameters and, optionally, their starting values. If you don’t include starting values (as in this example), SAS uses a small default starting value (0.0001) for each parameter. The next two assignment statements tell SAS the specific functional form of the equations to be estimated. Note that in this example, the parameters a, d, and e are forced to be the same in both equations because the same letter is used to designate them in each equation. The FIT statement tells SAS the equations to be estimated, which are indicated by the left-hand side variables. The option ITSUR tells SAS to estimate the system of equations using the iterated seemingly unrelated regressions estimator. The default maximum number of iterations is 40. If you want to increase or decrease the maximum number of iterations, then after ITSUR include the option MAXIT = and the number of iterations.
LINEAR TWO-STAGE LEAST SQUARES REGRESSION
Example
You want to use the SAS dataset PRODUCER to create three new variables, the logarithm of OUTPUT, LABOR, and CAPITAL, and use the two-stage least squares (2SLS) estimator to run a linear regression of the log of output on the log of labor and the log of capital. You assume that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL, LAND, PCAPITAL, and POUTPUT are exogenous variables.
LIBNAME econ515 'e:';
DATA producer; SET econ515.producer;
loutput = log(output); llabor = log(labor); lcapital = log(capital);
PROC SYSLIN 2sls vardef=n first;
ENDOGENOUS loutput llabor;
INSTRUMENTS lcapital land pcapital poutput;
pf: MODEL loutput = llabor lcapital;
RUN;
The first three assignment statements create the new variables. The PROC SYSLIN statement tells SAS that you are going to estimate at least one equation in a system of linear equations. The option 2SLS tells SAS to estimate the system of equations using the two-stage least squares estimator. The option VARDEF=N tells SAS to use the sample size as the denominator when calculating estimates of the variances and covariances. If you omit this option, SAS will use the degrees of freedom (n – k) as the denominator. The option FIRST tells SAS to print out the results of the first-stage regression. Note that since you used the option VARDEF=N, the standard errors for the first stage regression and second stage regression will use the sample size as the denominator when making the calculation. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement tells SAS the variables that you will use as instruments to create an instrumental variable(s). The MODEL statement tells SAS the equation to estimate. The model statement is prefixed with a name for the equation followed by a colon. In the above example, the name of the equation is PF (which is short for production function). You may use any name you desire.
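The two stages that give 2SLS its name can be sketched in Python (not SAS code):
1. Regress each right-hand side variable on the instruments.
2. Run OLS of the dependent variable on the fitted values.

Exogenous regressors (here the constant) are included among the instruments, so they reproduce themselves in the first stage. The function names and simulated data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def cross(xa, xb):
    """X_a'X_b for data matrices stored as lists of rows."""
    return [[sum(ra[r] * rb[c] for ra, rb in zip(xa, xb)) for c in range(len(xb[0]))]
            for r in range(len(xa[0]))]


def cross_vec(xa, y):
    """X_a'y."""
    return [sum(ra[r] * yi for ra, yi in zip(xa, y)) for r in range(len(xa[0]))]


def ols(x, y):
    return solve(cross(x, x), cross_vec(x, y))


def two_sls(y, x, z):
    """2SLS: first-stage OLS of each column of X on Z, then OLS of y on the fitted values."""
    fitted_cols = []
    for j in range(len(x[0])):
        g = ols(z, [row[j] for row in x])                # first-stage regression
        fitted_cols.append([sum(gi * zi for gi, zi in zip(g, zrow)) for zrow in z])
    x_hat = [list(row) for row in zip(*fitted_cols)]
    return ols(x_hat, y)                                 # second-stage regression


# Simulated (hypothetical) data: endog is correlated with the error u.
random.seed(4)
z, x, y = [], [], []
for _ in range(1000):
    z1, z2, v, e = (random.gauss(0, 1) for _ in range(4))
    endog = 1.0 + z1 + 0.5 * z2 + v
    u = 0.8 * v + e
    z.append([1.0, z1, z2])            # instruments, including the constant
    x.append([1.0, endog])             # regressors: constant and endogenous variable
    y.append(2.0 + 1.5 * endog + u)
b_2sls = two_sls(y, x, z)
print(b_2sls)
```

The second-stage coefficients are the 2SLS estimates. The standard errors from a naive second-stage OLS are not correct, because the residuals must be computed with the original regressors; PROC SYSLIN handles this for you. When there are exactly as many instruments as regressors, 2SLS reduces to the simple IV estimator (Z'X)^(-1)Z'y, which makes a convenient check.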
NONLINEAR TWO-STAGE LEAST SQUARES REGRESSION
You want to use the SAS dataset PRODUCER to use a nonlinear two-stage least squares (N2SLS) estimator to run a regression of OUTPUT on LABOR and CAPITAL, that is nonlinear in parameters. In particular, you want to estimate a constant elasticity of substitution (CES) production function. You assume that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL, LAND, PCAPITAL, and POUTPUT are exogenous variables.
LIBNAME econ515 'e:';
DATA producer; SET econ515.producer;
PROC MODEL;
PARMS a=2.1 e=0.3 c=0.1 d=0.5;
output = a*(e*labor**(-c) + (1-e)*capital**(-c))**(-d/c);
ENDOGENOUS output labor;
INSTRUMENTS capital land pcapital poutput;
FIT output / n2sls;
RUN;
The PROC MODEL statement tells SAS that you are going to estimate or simulate at least one equation in a system of linear or nonlinear equations. The PARMS statement tells SAS the names of the parameters and their starting values. The next assignment statement tells SAS the specific functional form of the equations to be estimated. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement tells SAS the variables that you will use as instruments to create an instrumental variable(s). The FIT statement tells SAS the equation(s) to be estimated, which are indicated by the left-hand side variable. The option N2SLS tells SAS to estimate the equation(s) using the nonlinear two-stage least squares estimator. This estimator uses an iterative procedure to obtain the estimates. The default maximum number of iterations is 40. If you want to increase or decrease the maximum iterations, then after N2SLS include the option MAXIT = and the number of iterations.
LINEAR THREE-STAGE LEAST SQUARES REGRESSION
Example
You want to use the SAS dataset PRODUCER to estimate two simultaneous equations jointly. Equation 1 is a production function. The left-hand side variable is the log of OUTPUT. The right-hand side variables are the log of LABOR and the log of CAPITAL. Equation 2 is a labor demand equation. The left-hand side variable is the log of LABOR. The right-hand side variables are the log of the real price of labor (LRPLAB), the log of the real price of capital (LRPCAP), and the log of OUTPUT. You assume that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL and the real prices are exogenous variables.
LIBNAME econ515 'e:';
DATA producer; SET econ515.producer;
loutput = log(output); llabor = log(labor); lcapital = log(capital);
lrplab = log(plabor/poutput); lrpcap = log(pcapital/poutput);
PROC SYSLIN 3sls vardef=n first;
ENDOGENOUS loutput llabor;
INSTRUMENTS lcapital lrplab lrpcap;
pf: MODEL loutput = llabor lcapital;
ld: MODEL llabor = lrplab lrpcap loutput;
RUN;
The first five assignment statements create the new variables. The PROC SYSLIN statement tells SAS that you are going to estimate a system of linear equations. The option 3SLS tells SAS to estimate the system of equations using the three-stage least squares estimator. If you want SAS to estimate the system of equations using the iterated three-stage least squares estimator, replace the option 3SLS with the option IT3SLS. The option VARDEF=N tells SAS to use the sample size as the denominator when calculating estimates of the variances and covariances. If you omit this option, SAS will use the degrees of freedom (n – k) as the denominator. The option FIRST tells SAS to print out the results of the first-stage regression(s). Note that since you used the option VARDEF=N, the standard errors for both the first-stage and second-stage regressions will use the sample size as the denominator. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement tells SAS the variables that you will use as instruments to create the instrumental variable(s). The MODEL statements tell SAS the equations to estimate. Each MODEL statement is prefixed with a name for the equation followed by a colon. In the above example, the names of the equations are PF and LD. You may use any names you desire.
NONLINEAR THREE-STAGE LEAST SQUARES REGRESSION
Example
You want to use the SAS dataset PRODUCER to estimate two simultaneous equations jointly; at least one of these equations is nonlinear in parameters. Equation 1 is a constant elasticity of substitution (CES) production function, and therefore is nonlinear in parameters. The left-hand side variable is OUTPUT. The right-hand side variables are LABOR and CAPITAL. Equation 2 is a labor demand equation that is linear in parameters. The left-hand side variable is LABOR. The right-hand side variables are the real price of labor (RPLAB), the real price of capital (RPCAP), and OUTPUT. You assume that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL, RPLAB, and RPCAP are exogenous variables.
LIBNAME econ515 'e:';
DATA producer; SET econ515.producer;
rplab = plabor/poutput; rpcap = pcapital/poutput;
PROC MODEL;
PARMS a=2.1 e=0.3 c=0.1 d=0.5 b=0 f=0 g=0 h=0;
output = a*(e*labor**(-c) + (1-e)*capital**(-c))**(-d/c);
labor = b + f*rplab + g*rpcap + h*output;
ENDOGENOUS output labor;
INSTRUMENTS capital rplab rpcap;
FIT output labor / n3sls;
RUN;
The two assignment statements create the new variables RPLAB and RPCAP. The PROC MODEL statement tells SAS that you are going to estimate or simulate a system of linear or nonlinear equations. The PARMS statement tells SAS the names of the parameters and their starting values. Each parameter needs its own name: here the intercept of the labor demand equation is named B so that it is not confused with the distribution parameter E of the CES production function. The next two assignment statements tell SAS the specific functional forms of the equations to be estimated. The ENDOGENOUS statement tells SAS the endogenous variables. The INSTRUMENTS statement tells SAS the variables that you will use as instruments to create instrumental variables for OUTPUT and LABOR. The FIT statement tells SAS the equations to be estimated, which are indicated by the left-hand side variables. The option N3SLS tells SAS to estimate the system of equations using the nonlinear three-stage least squares estimator. This estimator uses an iterative procedure to obtain the estimates. The default maximum number of iterations is 40. If you want to increase or decrease the maximum number of iterations, then after N3SLS include the option MAXIT = and the number of iterations.
BINARY PROBIT REGRESSION
Example
You want to use the SAS dataset LABOR to estimate a labor force participation equation for women. The dependent variable is LFP (a dummy variable). The independent variables are WA, WE, KL6, K618, CIT, and UN. You want to analyze the impact that each of the independent variables has on the probability that a woman will choose to work (the probability that LFP = 1).
LIBNAME econ515 'e:';
DATA labor; SET econ515.labor; lfpnew = 1 - lfp;
PROC PROBIT; CLASS lfpnew;
MODEL lfpnew = wa we kl6 k618 cit un;
RUN;
The assignment statement creates a new variable, named LFPNEW in this example, that equals one minus LFP. By default, PROC PROBIT models the probability of the lower value of the dependent variable. In this example, LFP = 1 if a woman works and LFP = 0 if she does not. Therefore, if you used LFP as the dependent variable, SAS would estimate the probability that a woman does not work. Because LFPNEW = 0 when a woman works, using LFPNEW as the dependent variable makes SAS estimate the probability that a woman works. The PROC PROBIT statement tells SAS to estimate a probit model. The CLASS statement tells SAS that LFPNEW is a classification (categorical) variable, so that it is treated as the binary response. The MODEL statement tells SAS the equation to estimate.
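Probit coefficients are not themselves marginal effects. With Phi and phi the standard normal CDF and density:
- P(LFP = 1) is Phi(x'b).
- For a continuous regressor x_k, the effect on that probability is phi(x'b) * b_k.

The Python sketch below (not SAS code) evaluates both at a chosen x. The coefficient values and the woman's characteristics are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_effects(beta, x):
    """Return Phi(x'beta) and the marginal effects phi(x'beta) * beta_k."""
    xb = sum(b * v for b, v in zip(beta, x))
    return norm_cdf(xb), [norm_pdf(xb) * b for b in beta]


# Hypothetical coefficients for (constant, WA, WE) and one woman's values.
beta = [0.2, -0.03, 0.12]
x_i = [1.0, 40.0, 12.0]
prob, effects = probit_effects(beta, x_i)
print(prob, effects[1:])   # effects[0] belongs to the constant and is not meaningful
```

For a dummy regressor such as CIT, the effect is usually computed instead as the difference between the probabilities with the dummy set to 1 and to 0.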
BINARY LOGIT REGRESSION
Example
Same as for probit model. The SAS statements are the same as the probit model, except you must include the option D = LOGISTIC in the MODEL statement.
MODEL lfpnew = wa we kl6 k618 cit un / d = logistic;
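For the logit model, the probability and marginal effect are:
- P = Lambda(x'b) = 1/(1 + exp(-x'b)).
- The marginal effect of a continuous regressor is Lambda(x'b)(1 - Lambda(x'b))b_k.

A matching Python sketch (not SAS code), with hypothetical coefficients:

```python
import math


def logit_effects(beta, x):
    """Return Lambda(x'beta) and the marginal effects Lambda*(1 - Lambda)*beta_k."""
    xb = sum(b * v for b, v in zip(beta, x))
    p = 1.0 / (1.0 + math.exp(-xb))
    return p, [p * (1.0 - p) * b for b in beta]


# Hypothetical coefficients for (constant, WA, WE) and one woman's values.
prob, effects = logit_effects([0.3, -0.05, 0.2], [1.0, 40.0, 12.0])
print(prob, effects[1:])   # effects[0] belongs to the constant and is not meaningful
```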
TOBIT (CENSORED) REGRESSION
Example
Suppose you are using the Limdep data file LABORSUPPY.LPJ. You want to estimate an hours of work equation for wives. The dependent variable is WHRS. The explanatory variables are KL6, K618, WA, WE. Fifty of the 100 wives in the sample do not work, and therefore WHRS is zero for these 50 observations. However, we do have data for the explanatory variables for all 100 wives. In this case, the distribution of the dependent variable, WHRS, is censored from below at zero. The appropriate regression model is a Tobit (censored) regression model.
LIBNAME econ515 'e:';
DATA labor; SET econ515.labor;
IF whrs = 0 THEN lower = .; ELSE lower = whrs;
PROC LIFEREG;
MODEL (lower, whrs) = kl6 k618 wa we / d = normal covb itprint;
RUN;
The IF/THEN/ELSE statement creates the variable named LOWER, which is missing when WHRS is zero and equal to WHRS otherwise. The PROC LIFEREG statement tells SAS to estimate a censored regression model. The MODEL statement tells SAS the equation to estimate. Note that the dependent variable (lower, whrs) specifies two variables, a lower and an upper value for each observation. When LOWER is missing, SAS treats the value of WHRS for that observation as censored from below (left-censored at zero); when LOWER equals WHRS, the observation is uncensored. The option D = NORMAL tells SAS to assume that the error term of the underlying (latent) regression is normally distributed, which gives the Tobit model. The option COVB tells SAS to print out the variance/covariance matrix of estimates. The option ITPRINT tells SAS to print out the iterations.
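A Tobit likelihood combines two kinds of observations:
- A censored observation (WHRS = 0, LOWER missing) contributes log Phi(-x'b/sigma).
- An uncensored observation contributes the log normal density of WHRS, log phi((y - x'b)/sigma) - log sigma.

The Python sketch below (not SAS code) evaluates this standard Tobit log-likelihood for given b and sigma. PROC LIFEREG with D = NORMAL maximizes a likelihood of this form over the coefficients and the scale. The function name and the inputs are hypothetical.

```python
import math


def tobit_loglik(beta, sigma, x, y):
    """Log-likelihood of a Tobit model with y censored from below at zero."""
    ll = 0.0
    for row, yi in zip(x, y):
        xb = sum(b * v for b, v in zip(beta, row))
        if yi <= 0.0:
            # censored observation (LOWER missing): P(latent y <= 0) = Phi(-xb/sigma)
            ll += math.log(0.5 * (1.0 + math.erf(-xb / (sigma * math.sqrt(2.0)))))
        else:
            # uncensored observation: log of the normal density of y
            z = (yi - xb) / sigma
            ll += -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll


# Hypothetical inputs: a constant and KL6 only, three wives, two of whom work.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
whrs = [1200.0, 0.0, 800.0]
print(tobit_loglik([1000.0, -400.0], 900.0, x, whrs))
```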
MATRIX COMMANDS
To do matrix and vector operations, you use PROC IML. IML stands for Interactive Matrix Language. The general form of a SAS matrix program is
PROC IML; IML statements; QUIT;
PROC IML tells SAS that you want to start doing matrix operations. QUIT tells SAS that you are finished doing matrix operations. IML statements are arranged in groups called modules. To begin a module, you use a START statement. To end a module, you use a FINISH statement. To execute the module, you use a RUN statement. The general form of an IML module is
START module name; IML statements; FINISH; RUN module name;
Defining Matrices and Vectors
Example
Suppose that you want to create the following matrices and vectors.
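$$
a = \begin{bmatrix} 7 & 4 & 9 \\ 2 & 8 & 9 \\ 5 & 4 & 7 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 & 5 & 2 \\ 6 & 7 & 4 \\ 7 & 1 & 3 \end{bmatrix}, \quad
c = \begin{bmatrix} 5 \\ 2 \\ 9 \end{bmatrix}, \quad
d = \begin{bmatrix} 1 & 8 & 9 \end{bmatrix}
$$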
PROC IML; START first; a = {7 4 9, 2 8 9, 5 4 7}; b = {1 5 2, 6 7 4, 7 1 3}; c = {5,2,9}; d = {1 8 9}; PRINT ,a,b,c,d; FINISH; RUN first; QUIT;
The PROC IML statement tells SAS to start doing matrix operations. The START statement tells SAS to begin a new module and name this module FIRST. The next four assignment statements create two matrices named a and b, a column vector named c, and a row vector named d. Note that braces { } are used to define matrices and vectors, and the rows of a matrix or vector are separated by commas. The PRINT statement tells SAS to show the matrices in the output window. If commas are used to separate the names of the matrices and vectors (as in the example above), this tells SAS to print each matrix and vector on a new line. The FINISH statement tells SAS to end the module. The RUN statement tells SAS to execute the statements in the module FIRST. If you do not give a name to your module in the START statement, SAS names it MAIN, and a RUN statement without a name executes the MAIN module. Thus, in this example it is not necessary to name the module.
Defining Matrices and Vectors with Existing Data
Example
You want to use the SAS dataset DEMAND to create a column vector for the variable Q1, and a data matrix that includes P1, P2, I, and a column of 1’s for the constant term.
LIBNAME econ515 'e:';
DATA demand1; SET econ515.demand;
PROC IML;
START first; USE demand1; READ all var{q1} into q1; PRINT q1; FINISH; RUN first;
START second; USE demand1; READ all var{p1 p2 I} into Z;
t = NROW(Z); ones = J(t, 1, 1); X = ones||Z; PRINT / X; FINISH; RUN second;
QUIT;
The LIBNAME, DATA, and SET commands tell SAS to access the permanent SAS dataset named DEMAND and put the data in the temporary SAS dataset named DEMAND1. The PROC IML statement tells SAS to start doing matrix operations. The START statement tells SAS to begin a new module, and name this module FIRST. The USE statement tells SAS to open the SAS dataset named DEMAND1 for reading in PROC IML. The READ statement tells SAS to take data from the SAS dataset DEMAND1 and place it in a matrix or vector. The options ALL, VAR{Q1}, and INTO Q1 tell SAS to use all of the observations for the variable Q1 and read them into a vector named Q1. The PRINT statement tells SAS to display the vector Q1 in the output window. The FINISH statement tells SAS to end the module. The RUN statement tells SAS to execute the statements in the module FIRST. The second START statement tells SAS to begin a new module, and name this module SECOND. The USE statement again opens the SAS dataset named DEMAND1 for reading. The READ statement tells SAS to take data from the SAS dataset DEMAND1 and place it in a matrix. The options ALL, VAR{P1 P2 I}, and INTO Z tell SAS to use all of the observations for the variables P1, P2, and I, and read them into a matrix named Z.
The T = NROW(Z) statement (which uses the NROW function) tells SAS to count the number of rows in the matrix Z and assign this value to T. The ONES = J(t, 1, 1) statement (which uses the J function) tells SAS to create a matrix in which every element has the same value. This statement tells SAS to create a matrix named ONES with T rows and 1 column, and fill it with 1’s. Therefore, SAS will create a Tx1 column vector of 1’s. The X = ONES||Z statement tells SAS to create a new matrix named X, which places the vector ONES and the matrix Z side by side (this is called horizontal concatenation, and the operator for this is ||). The PRINT / X statement tells SAS to display the matrix X in the output window. The option “/” tells SAS to skip to a new page when displaying X. The FINISH statement tells SAS to end the module. The RUN statement tells SAS to execute the statements in the module SECOND. The QUIT statement tells SAS that you are finished with matrix operations.
Defining an Identity Matrix
An identity matrix is a square matrix with ones on the principal diagonal and zeros off the principal diagonal.
Example
You want to create a 30x30 identity matrix named IMATRIX.
PROC IML; START ; imatrix = I(30); PRINT imatrix; FINISH; RUN; QUIT;
The imatrix = I(30) statement tells SAS to create an identity matrix (I is called the I function) named IMATRIX. The number inside parentheses defines the number of rows and columns of the matrix.
Matrix Operations and Matrix Algebra
The following is an example of some matrix operations that can be done with PROC IML. The SAS statements are in the form of NAME = OPERATION, where NAME is the name of the matrix or scalar that results from the operation that is performed.
Example
You want to create two matrices, a and b, and two vectors, c and d, and perform a number of operations on them. A description of each operation is given in the comment to the right of the SAS statement.
PROC IML; START first; a = {7 4 9, 2 8 9, 5 4 7}; * Module first creates the matrices and vectors; b = {1 5 2, 6 7 4, 7 1 3}; c = {5,2,9}; d = {1 8 9}; FINISH; RUN first; START second;
e = a||b; * e is the horizontal concatenation of a and b (a and b are placed side by side to create e);
f = a//b; * f is the vertical concatenation of a and b (a is stacked on top of b to create f);
g = BLOCK(a,b); * g is a block diagonal matrix with a and b as separate blocks;
h = DIAG(c); * h is a diagonal matrix with the elements of the vector c on the principal diagonal;
i = INV(a); * i is the inverse of the matrix a;
j = DET(a); * j is the determinant of the matrix a;
k = TRACE(a); * k is the trace of the matrix a;
l = T(a); * l is the transpose of a;
m = a+b; * m is the sum of a and b;
n = a-b; * n is the difference between a and b;
o = a*b; * o is the matrix product of a and b;
p = 5#b; * p is the matrix that results from multiplying each element in b by the scalar 5;
q = b/5; * q is the matrix that results from dividing each element in b by the scalar 5;
r = a@b; * r is the Kronecker product of a and b;
s = VECDIAG(a); * s is a vector whose elements are the elements of the principal diagonal of a;
PRINT ,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s;
FINISH; RUN second; QUIT;
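To make the less familiar operators concrete, the Python sketch below (standard library only, not SAS code) mimics several of them on the same a and b: horizontal (||) and vertical (//) concatenation, the Kronecker product (@), T, TRACE, and DET. The function names are hypothetical stand-ins for the IML operators.

```python
def hconcat(a, b):
    """a || b: place a and b side by side."""
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def vconcat(a, b):
    """a // b: stack a on top of b."""
    return [list(r) for r in a] + [list(r) for r in b]


def kron(a, b):
    """a @ b: Kronecker product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def transpose(a):
    """T(a)."""
    return [list(col) for col in zip(*a)]


def trace(a):
    """TRACE(a): sum of the principal diagonal."""
    return sum(a[i][i] for i in range(len(a)))


def det(a):
    """DET(a) by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in a]
    n, result = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result                     # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result


a = [[7, 4, 9], [2, 8, 9], [5, 4, 7]]
b = [[1, 5, 2], [6, 7, 4], [7, 1, 3]]
e = hconcat(a, b)     # 3 x 6
f = vconcat(a, b)     # 6 x 3
r = kron(a, b)        # 9 x 9
print(det(a), trace(a), transpose(a))
```

For these matrices, the determinant of a is -24 and its trace is 22. The product a@b is a 9x9 matrix whose upper-left 3x3 block is 7 times b.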
Matrix Command Examples
Example
You want to use the SAS dataset named DEMAND to calculate estimates of the parameters of the linear regression of Q1 on a constant, P1, P2, and I, using the OLS estimator. You want to calculate the variance/covariance matrix of estimates, standard errors of the estimates, and t-statistics for the zero null hypothesis.
LIBNAME econ515 'e:'; DATA demand1; * Accesses SAS dataset DEMAND and creates temporary SAS; SET econ515.demand; * data set DEMAND1;
PROC IML; * Initiates matrix operations;
START first; USE demand1; READ all var{q1} into q1; * Module FIRST creates vector of observations for q1 and data; READ all var{p1 p2 I} into z; * matrix; t = NROW(z); ones = J(t, 1, 1); x = ones||z; PRINT q1 x; FINISH; RUN first;
START second; * Starts module named SECOND;
xt = T(x); * Transpose of data matrix;
xtx = xt*x; * Transpose of data matrix times the data matrix;
xtxi = INV(xtx); * Inverse of xtx;
b = xtxi*xt*q1; * OLS estimator;
q1fit = x*b; * Vector of fitted values for q1;
res = q1-q1fit; * Vector of residuals;
rss = T(res)*res; * Residual sum of squares;
df = NROW(x)-NCOL(x); * Degrees of freedom;
sig2 = rss/df; * Estimate of the error variance;
covb = sig2#xtxi; * Variance/covariance matrix;
stderr = SQRT(VECDIAG(covb)); * Vector of standard errors of the estimates;
tstat = b/stderr; * Vector of t-statistics;
PRINT b df sig2 stderr tstat covb; * Display estimates;
Results = b || stderr || tstat; * Create matrix with estimates, standard errors, and t-statistics;
estname = {int p1 p2 I}; * Create row names;
col = {estimate se tstat}; * Create column names;
PRINT / results [rowname = estname colname = col] covb; * Display estimates in table form;
FINISH; * Ends module named SECOND;
RUN second; * Runs module named SECOND;
QUIT; * Ends matrix operations;
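The same OLS calculations can be reproduced in Python (standard library only, not SAS code) as a cross-check of the IML formulas:
- b = (X'X)^(-1)X'y
- sig2 = RSS/(n - k)
- Cov(b) = sig2 (X'X)^(-1)
- the standard errors are the square roots of the diagonal of Cov(b)
- t = b/se, element by element

The function names and the four-observation data set are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """INV(a): solve a x = e_j for each unit vector e_j."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return transpose(cols)


def ols_matrix(x, y):
    xt = transpose(x)                                         # xt = T(x)
    xtxi = inverse(matmul(xt, x))                             # xtxi = INV(xt*x)
    b = [row[0] for row in matmul(matmul(xtxi, xt), [[v] for v in y])]  # b = xtxi*xt*q1
    res = [yi - sum(bj * v for bj, v in zip(b, row)) for row, yi in zip(x, y)]
    rss = sum(e * e for e in res)                             # rss = T(res)*res
    df = len(x) - len(x[0])                                   # df = NROW(x)-NCOL(x)
    sig2 = rss / df                                           # sig2 = rss/df
    covb = [[sig2 * v for v in row] for row in xtxi]          # covb = sig2#xtxi
    stderr = [math.sqrt(covb[i][i]) for i in range(len(b))]   # SQRT(VECDIAG(covb))
    tstat = [bi / si for bi, si in zip(b, stderr)]            # tstat = b/stderr
    return b, df, sig2, stderr, tstat, covb


# Hypothetical toy data: a constant and one regressor.
x = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
b, df, sig2, stderr, tstat, covb = ols_matrix(x, y)
print(b, df, sig2, stderr, tstat)
```

For this toy data set the fitted line is 1.1 + 1.1x. The residual sum of squares is 2.7 with 2 degrees of freedom, and the slope's standard error is the square root of 0.27, about 0.52.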