United States Department of Agriculture
Technical Bulletin Number 1792
Consumer Demand Analysis When Zero Consumption Occurs: The Case of Cigarettes
James R. Blaylock
William N. Blisard

It's Easy To Order Another Copy!

Just dial 1-800-999-6779. Toll free in the United States and Canada. Other areas, please call 1-301-725-7937.

Ask for Consumer Demand Analysis When Zero Consumption Occurs: The Case of Cigarettes (TB-1792).

The cost is $8.00 per copy. For non-U.S. addresses (includes Canada), add 25 percent. Charge your purchase to your VISA or MasterCard, or we can bill you. Or send a check or purchase order (made payable to ERS-NASS) to:

ERS-NASS P.O. Box 1608 Rockville, MD 20849-1608.

We'll fill your order by first-class mail.

Consumer Demand Analysis When Zero Consumption Occurs: The Case of Cigarettes. By James R. Blaylock and William N. Blisard. Commodity Division, Economic Research Service, U.S. Department of Agriculture. Technical Bulletin No. 1792.

Abstract

Analysts use household survey information when attempting to model the demand for certain commodities. However, many individuals do not purchase or use specific agricultural commodities during a survey period. This presents a problem for analysts because it is unclear what a zero purchase implies. It could mean that the individual never uses the product or that the survey period was too short. Or perhaps the individual would use the product if the price were lower. By focusing on cigarettes, a commodity with well-known characteristics and patterns of use, we are able to examine methods of treating zero observations. Our results generally support the conclusions that analysts should thoroughly understand the characteristics of the commodity to be studied, should apply appropriate econometric techniques, and should make full and creative use of all survey information. Survey designers should structure survey questions to permit separation of the sample into users, never-users, and potential users.

Keywords: Zero consumption, cigarettes, Cragg model, double-hurdle model

1301 New York Ave., NW Washington, DC 20005-4788 September 1991

Contents

Summary
Introduction
Theoretical Foundations
Statistical Models
Descriptive Statistics and Model Specification
Some Statistical Notes
Empirical Findings
Summary of Empirical Findings
References

Summary

Analysts use household survey information when attempting to model the demand for certain commodities. However, many individuals do not purchase or use specific agricultural commodities during a survey period. This presents a problem for analysts because it is unclear what a zero purchase implies. It could mean that the individual never uses the product or that the survey period was too short. Or perhaps the individual would use the product if the price were lower. By focusing on cigarettes, a commodity with well-known characteristics and patterns of use, we are able to examine methods of treating zero observations.

Our results generally support the conclusions that analysts should thoroughly understand the characteristics of the commodity to be studied, should apply appropriate econometric techniques, and should make full and creative use of all survey information. Survey designers should structure survey questions to permit separation of the sample into users, never-users, and potential users.

Zero consumption can have several different meanings. One is that all individuals are potential users of the good. This implies that households could be induced to use a product if their socioeconomic characteristics, income level, or relative prices changed. For some agricultural products, this assumption may be invalid, resulting in biased estimates of income elasticities and of the relationship between demographic characteristics and consumption behavior. For example, a person may not eat certain foods because of medical conditions, religious faith, special diet, or palatability reasons.

In many cases it is impossible from survey data to distinguish whether the individual is a nonuser or simply did not purchase the product during the survey period. For example, if individuals do not consume meat because they are vegetarians, then they have no influence on the demand curve for beef. On the other hand, if individuals did not eat beef during the survey period because their income was low, then they provide valuable input into the Engel curve for beef.

In general, households can be separated into three groups: (1) those that never use the product, (2) infrequent users such as households that use the product but the survey period was too short to record their use, and (3) potential users who might use the product if certain economic or other factors changed, such as a lower price or increased income.

Survey designers need to ask questions that may identify infrequent users, nonusers, and potential users so that researchers can formulate more effective modeling strategies. For example, if a survey question identifies members in the household as vegetarians, then this information can be used in modeling the demand for meat. Information about an individual's religious and ethnic background is valuable for treating zero observations on pork consumption. Questions probing whether or not individuals ever use or how frequently they use certain major commodities can be incorporated into econometric demand analysis.

Consumer Demand Analysis When Zero Consumption Occurs: The Case of Cigarettes

James R. Blaylock William N. Blisard

Introduction

Agricultural economists often rely on household survey data to quantify the relationships between consumption of a commodity and various household socioeconomic characteristics. Many households will not, however, record purchasing or using given products during a survey period. In general, these households can belong to at least one of three groups: those that never use the product in question; infrequent users, that is, those that use the product but for whom the survey period was too short to record it; and those that would use the product if certain economic or other factors changed, such as a lower relative commodity price or increased income.

Our research objective is to examine how various types of econometric models can be used to model more rigorously the zeros present in household survey data. We focus on using survey information that is often unavailable or, when available, unused. The approaches we use also illustrate how and why researchers should clearly understand the characteristics of the commodity to be analyzed and exploit any special or unusual characteristics in their models.

In situations in which some individuals report no purchase of a particular good, Tobit estimation is often used for demand analysis. One assumption underlying this statistical technique is that all individuals are potential users of the good or, alternatively, that all zero observations represent standard corner solutions. It is therefore implicitly assumed that households could be induced to use a product if their socioeconomic characteristics, income level, or relative prices changed. For some agricultural products, this assumption may be plausible. For other products, however, it may be invalid, resulting in inefficient estimates of income elasticities and of the empirical associations between demographic characteristics and consumption behavior. For example, a person may not eat certain foods because of medical conditions, religious faith, special diet, or palatability. In many cases it is impossible to distinguish from survey data whether an individual is a nonuser or a consumer who did not consume or purchase the product during the survey period. For example, a person who does not eat meat at all, such as a vegetarian, has no influence on the demand curve for beef. On the other hand, a person who did not eat beef during the survey period because of low income provides valuable input into the demand curve for beef.

Based on input from researchers, survey designers sometimes include questions that are useful for determining whether or not a person uses a product and/or how often the person uses it. For example, the recently released individual intake part of the U.S. Department of Agriculture's (USDA) Nationwide Food Consumption Survey includes extensive questions on the use of alcoholic beverages, as well as questions on the frequency of use of other products. As this trend is likely to continue, it is important for analysts to become familiar with statistical and econometric techniques that allow all available information to be used in estimating demand relationships.

In this study, we examine methods of treating zero observations by focusing on the smoking behavior of women participating in the 1985-86 USDA Continuing Survey of Food Intakes by Individuals, Low-Income Women Ages 19-50. We chose to analyze smoking behavior for several reasons. First, tobacco is a product with well-known characteristics: it is addictive and frequently used. Cigarette use represents not only a consumption decision but also a form of social behavior. This suggests that the same factor may influence the decision to start smoking and the decision of how much to smoke in different ways. This highlights another shortcoming of the Tobit model, which restricts the coefficients of the starting and consumption equations to have the same sign and magnitude. There is no a priori reason why this should be the case. An alternative is to model the decisions of whether to consume and how much to consume separately. This technique is known as the double-hurdle approach in the economics literature, and the well-known Cragg model is an example (5).¹

Second, our data contain additional information that permits separation of the sample's nonsmokers into two groups: those who have never smoked and ex-smokers. We will specify several different theoretical models and their statistical counterparts, each treating the nonsmokers differently. First, we assume that no one is at a standard corner solution; that is, only current smokers contribute to the demand curve. A Heckman selectivity model is used in this case (10). Further, we will show that if the quit and consumption equations in the Heckman model are uncorrelated, then the researcher can estimate the demand curve by ordinary least squares over a data set that contains smokers only.

In other models, by contrast, we assume that current nonsmokers may have some effect on the demand curve for cigarettes. In this case we retain the zero consumption values. A key assumption underlying these models is that there may be potential smokers within the current nonsmoking group. These may be people who have never smoked, but the group also includes those who smoked before and quit. Under this assumption, we estimate demand curves with a Cragg model over a data set that includes only current and ex-smokers.²

Third, cigarettes are viewed as a detrimentally addictive and frequently consumed agricultural commodity. These views give rise to a number of practical issues that further motivate this research. For instance, public groups, as well as Federal, State, and local governments, are increasingly focused on inducing individuals to stop smoking or, at a minimum, restrict their use of tobacco products. Tobacco companies have come under increased attack for targeting advertising messages to specific population subgroups. The most recent controversy concerned tobacco advertising targeted at young females and Blacks. These events highlight the need to understand the relationships between an individual's socioeconomic characteristics and the decisions to start smoking, how many cigarettes to smoke, and, for some, whether to quit. One of our objectives is to specify and estimate statistical models to help clarify these relationships.

A British study by Jones used econometric models somewhat similar to ours to analyze the starting, quitting, and consumption decisions (11). Jones used only a limited set of demographic explanatory variables and emphasized attitude questions as explanatory factors. This limits the use of his models for identifying individuals most at risk of starting to smoke and those who are heavy users.

¹Italicized numbers in parentheses refer to sources listed in the References at the end of the report.

²See Blaylock and Blisard (3) for an analysis of current smokers versus nonsmokers.

We have been unable to locate any studies using U.S. data that analyze smoking behavior with the appropriate statistical models. On the other hand, marketing-type studies that analyze the characteristics of smokers are not unusual. The problem with these studies, however, is that they do not have the structure associated with economic models. This results in few testable hypotheses relating to the effects of individual or household characteristics on behavior and accounts for the focus on noneconomic factors as explanatory variables (2).

Some studies using U.S. household expenditure data have focused on the estimation of Engel curves for cigarettes (12). The problems associated with this type of analysis, especially for studying smoking behavior, center around the data. For example, a researcher is unable to identify the actual smoker or smokers in a household containing more than one person. Also, the use of cigarettes out of inventories results in some households being identified as nonusers since cigarette expenditures for their survey period are zero. Likewise, the identification of ex-smokers is usually impossible.

Our data set contains considerable demographic, economic, and smoking behavior information on the respondents. Our modeling efforts reveal that, all else equal, the probability that a woman starts smoking increases if she resides in a nonrural location, lives outside the West, has relatively few years of formal education, has no job outside the home, and is non-Hispanic. We also found that household income did not affect the odds of a woman starting to smoke. Statistical analysis indicates that the number of cigarettes consumed daily reaches a maximum at about age 40 and is higher if a woman lives outside the West, owns a home, and is non-Black. Smokers in ill health tend to smoke fewer cigarettes than their counterparts in good health, all else equal. The probability of quitting smoking increases for those residing in the West, for those with higher education levels, and for non-Blacks and non-Hispanics.

The empirical results of this modeling effort give rise to a more general conclusion: researchers and survey designers should acquire as much knowledge as possible about a commodity that they wish to study. For example, information on usage patterns can help in modeling any zero consumption that is due to infrequent use. Likewise, survey designers should make every effort to include survey questions which may help identify users, nonusers, and ex-users so that researchers can formulate more effective modeling strategies. For example, if a survey question identifies vegetarians, this information can be used in modeling the demand for meat. Information about an individual's religious and ethnic background is similarly valuable for treating zero observations on some food products. Answers to questions probing whether or not individuals use or how frequently they use certain major commodities also include potentially valuable information that can be incorporated into econometric demand analysis.

Theoretical Foundations

In standard theory, we assume that a set of goods is permanently included in each consumer's utility function; if we observe zero consumption of a good, we must then assume that we are observing a standard corner solution. In this case, a standard Tobit model or one of its variants will suffice for estimation purposes.
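As a point of reference for the models that follow, the standard Tobit log-likelihood can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code used in this study: the function names are hypothetical, and the linear index βz and the scale σ are taken as given rather than estimated.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_logpdf(x: float) -> float:
    """Log of the standard normal density, ln phi(x)."""
    return -0.5 * (x * x + math.log(2.0 * math.pi))


def tobit_loglik(y: list[float], index: list[float], sigma: float) -> float:
    """Tobit log-likelihood with censoring at zero (hypothetical helper).

    y      -- observed consumption; 0.0 is read as a standard corner solution
    index  -- the linear index beta'z for each observation
    sigma  -- standard deviation of the latent disturbance
    """
    total = 0.0
    for yi, m in zip(y, index):
        if yi <= 0.0:
            # Corner solution: P(Y* <= 0) = Phi(-beta'z / sigma)
            total += math.log(norm_cdf(-m / sigma))
        else:
            # Positive use: normal density of Y* at the observed value
            total += norm_logpdf((yi - m) / sigma) - math.log(sigma)
    return total
```

Because the same index βz enters both branches, any variable that raises the probability of positive use must also raise its expected level; this is the restriction criticized in the introduction.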

Unfortunately, micro data suggest that not all consumers have the same preference structure and, hence, not all goods appear in each consumer's utility function. Pudney models this case using discrete random preference regimes (16). His approach assumes that smokers have a different preference structure than nonsmokers. In this case, observed zeros reflect the decision not to smoke and, therefore, only smokers determine the parameters of the Engel curve for cigarettes. As noted by Blundell and Meghir, all nonparticipants are assumed not to want to consume cigarettes, and the demand curve is estimated over participants only (4).

Another situation can occur when the decision to purchase a good and the decision about how much to consume are linked. This occurs when consumers evaluate their utility levels with and without the good in question and then determine whether or not to consume it. This may be a plausible model for cigarette consumption since cigarettes are addictive and consumed often (few people smoke only on special occasions). That is, consumption levels of addictive commodities are difficult to adjust freely once use begins. Thus, it appears reasonable that cigarette consumption levels are determined first and then enter into the decision about whether or not to start smoking.

This theoretical model also exploits the notion that there are certain characteristics of smoking which relate directly to the qualitative distinction between smoking and nonsmoking and are independent of the amount consumed. These characteristics are primarily related to the perception of smoking as a form of social behavior and are related to factors such as the prestige or stigma of smoking among different social groups.

Assume an individual's utility function takes the form

$$U = U(DEY_1, Y_2, \ldots, Y_n;\ W(S), P(Q)) \qquad (1)$$

where $Y_1$ denotes cigarettes, $Y_2, \ldots, Y_n$ denote all other goods, W(S) represents qualitative characteristics of smoking, P(Q) contains qualitative characteristics related to quitting such as health and the addictiveness of cigarettes, and

D = 1 if an individual is an actual or a potential smoker, D = 0 otherwise, and
E = 1 if an individual is an actual or potential quitter, E = 0 otherwise.

For simplicity, we assume that cigarettes are the only good for which starting and quitting decisions are relevant; that is, all other goods are consumed at positive levels. This formulation assumes that the decisions to start and to quit smoking are not symmetrical. That is, different factors enter into the decision to start versus not to start and the decision to quit versus to continue smoking. For example, strong evidence of a clear life-cycle pattern in starting and quitting lends support to treating the start and quit decisions separately (14).

In the context of an explicit starting decision, it seems reasonable that individuals will compare their welfare at zero cigarette consumption with their welfare at the level of consumption they will choose once having started. Likewise, in the context of an explicit quitting decision, it seems plausible that individuals once having started will compare their welfare at their current level of consumption with zero consumption should they quit.

The continuous aspects of the start, quit, and consumption choices are represented by the utility function given in (1). Assuming intertemporal separability, the current consumption decision is based on the indirect utility function:

$$v(p, m) = \max\,[\,U(DEY_1, Y_2, \ldots, Y_n;\ W(S), P(Q)) \mid p'y = m\,] \qquad (2)$$

where p is a vector of prices and m is total expenditure or income.

The criterion for starting is then assumed to be:

$$D = \begin{cases} 1 & \text{if } I > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $I = [v(\cdot) - v^*(\cdot)] + [W(S) - W^*(S)]$. In the start decision, consumers compare their level of utility, $v(\cdot)$, at positive levels of cigarette consumption with their utility, $v^*(\cdot)$, at zero consumption, given prices and income. Included in the start equation is the term $W(S) - W^*(S)$, the net effect of the qualitative factors on the start decision. If $v(\cdot) - v^*(\cdot)$ is negative, it would be because of a high own price of cigarettes or a very low income (assuming cigarettes have a positive income elasticity). If $W(S) - W^*(S)$ is positive, the qualitative factors associated with smoking outweigh those associated with nonsmoking. Starting can therefore occur even if $v(\cdot) - v^*(\cdot)$ is negative, provided $W(S) - W^*(S)$ is positive and offsetting. Zero consumption may be due to a corner solution, that is, to a high relative price in $v^*(\cdot)$ and/or low income, or it may be due to negative qualitative factors.
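The start criterion in (3) can be illustrated with a small Python function. This is a sketch only: the function name is hypothetical, and the numbers in the usage lines are made-up utility differences chosen to show the offsetting case described above, in which $v(\cdot) - v^*(\cdot)$ is negative but $W(S) - W^*(S)$ is positive and large enough to induce starting.

```python
def starts_smoking(utility_gap: float, qualitative_gap: float) -> int:
    """Start indicator D from criterion (3) (hypothetical helper).

    utility_gap     -- v(.) - v*(.): utility at the chosen positive consumption
                       level minus utility at zero consumption
    qualitative_gap -- W(S) - W*(S): net effect of qualitative factors
    """
    index = utility_gap + qualitative_gap  # the index I in (3)
    return 1 if index > 0 else 0


# Hypothetical values: a high price or low income makes the utility gap negative
print(starts_smoking(-0.5, 0.8))  # prints 1: qualitative factors offset the gap
print(starts_smoking(-0.5, 0.2))  # prints 0: they do not
```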

Given D = 1, the quit decision is modeled in much the same way as the start decision:

$$E = \begin{cases} 1 & \text{if } X > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

where X is defined analogously to I, with the term $P(Q) - P^*(Q)$ denoting the net effect of qualitative factors on the quit decision. Zero consumption may consequently be due to a corner solution, that is, to a high relative price in $v^*(\cdot)$ and/or low income, or it may be due to D or E equal to zero.

Before developing our statistical models, we briefly describe our data set, the 1985-86 USDA Continuing Survey of Food Intakes by Individuals, Low-Income Women (CSFII). The survey is designed primarily to gather data on food intake for nutritional analysis. It consequently contains considerable demographic information on the survey participants. Participants were asked several questions about cigarette use such as: (1) Do you currently smoke and if so how many cigarettes do you smoke per day (week or month)? (2) Have you smoked 100 or more cigarettes during your lifetime? A positive response to question (2) indicates that the individual is or was a smoker at one time. A positive response to question (2) and a negative response to (1) identifies an ex-smoker.
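The way the two survey questions separate the sample can be written as a simple rule. The Python sketch below is illustrative: the function names and labels are hypothetical, and it assumes that anyone who currently smokes is classed as a current smoker regardless of the answer to question (2).

```python
def smoking_status(smokes_now: bool, smoked_100_or_more: bool) -> str:
    """Classify a respondent from the two CSFII cigarette questions.

    smokes_now          -- "yes" to question (1): currently smokes
    smoked_100_or_more  -- "yes" to question (2): 100+ cigarettes in lifetime
    """
    if smokes_now:
        # Assumption: current use takes precedence over the lifetime question
        return "current smoker"
    if smoked_100_or_more:
        # Smoked at one time but does not smoke now
        return "ex-smoker"
    return "never smoked"


def has_started(status: str) -> int:
    """Start indicator: 1 for current and ex-smokers, 0 for never-smokers."""
    return 0 if status == "never smoked" else 1


print(smoking_status(False, True))  # prints ex-smoker
```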

Statistical Models

One unique feature of the CSFII is the identification of ex-smokers. This permits potentially valuable separation of the sample into three groups: smokers, ex-smokers, and nonsmokers. Use of sample separation information such as this can improve the efficiency of our model estimators. For example, from a behavioral standpoint, ex-smokers are more likely to start smoking than nonsmokers. First, assume that the start, quit, and consumption equations are linear in their parameters ($\alpha$, $\gamma$, $\beta$) and have additive disturbance terms (u, r, v), and that the variables influencing the three decisions are contained in matrices x, k, z. Mathematically:

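Observed consumption:

$$Y = D\,E\,Y^{**} \qquad (5)$$

Starting equation:

$$S = \alpha x + u, \qquad D = 1 \text{ if } S > 0, \text{ else } D = 0; \qquad u \sim N(0, 1) \qquad (6)$$

Quitting equation:

$$Q = \gamma k + r, \qquad E = 1 \text{ if } Q > 0, \text{ else } E = 0; \qquad r \sim N(0, 1) \qquad (7)$$

Consumption equation:

$$Y^{**} = \max[0, Y^*], \qquad Y^* = \beta z + v; \qquad v \sim N(0, \sigma^2) \qquad (8)$$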

The dependent variable in the start equation, S, represents whether or not a person has ever smoked. The dependent variable in the quit equation, Q, represents whether or not a starter has quit.

A positive level of cigarette consumption, Y, is observed only if both D and E equal 1 and $Y^{**} > 0$. Thus, a censored sample is assumed. This is a "triple-hurdle" model because three hurdles must be passed before a positive level of consumption is observed.
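The observation rule in (5)-(8) can be made concrete by simulating it. The Python sketch below draws the three disturbances, applies the three hurdles, and records observed consumption. The index values (αx, γk, βz), σ, ρ, and the seed are hypothetical; u is drawn independently, while r and v may be given a correlation ρ (ρ = 0 gives three independent hurdles).

```python
import math
import random


def observed_consumption(s: float, q: float, y_star: float) -> float:
    """Apply the three hurdles: Y = D * E * max(0, Y*)."""
    d = 1 if s > 0 else 0              # start hurdle, equation (6)
    e = 1 if q > 0 else 0              # quit hurdle, equation (7)
    y_double_star = max(0.0, y_star)   # consumption hurdle, equation (8)
    return d * e * y_double_star


def simulate(n: int, ax: float, gk: float, bz: float,
             sigma: float, rho: float = 0.0, seed: int = 0) -> list[float]:
    """Draw n observations of Y under hypothetical index values."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                       # u ~ N(0, 1)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r = e1                                        # r ~ N(0, 1)
        # v ~ N(0, sigma^2) with corr(r, v) = rho
        v = sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        out.append(observed_consumption(ax + u, gk + r, bz + v))
    return out


ys = simulate(20000, ax=0.0, gk=0.0, bz=0.0, sigma=1.0)
print(sum(y > 0 for y in ys) / len(ys))  # near 1/8 when all indices are 0 and rho = 0
```

With all three indices at zero and independent errors, each hurdle is passed with probability one-half, so only about one observation in eight shows positive consumption even though every simulated person has the same latent demand.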

A general statistical model assumes that the equation error terms, u, r, and v, are dependent. To our knowledge, this type of trivariate model with dependence has not been successfully estimated. To reduce the complexity of the model for estimation, some restrictions must be placed on the joint distribution of the three equation error terms. Several alternative models are examined below.

Our first set of models assumes that the start equation is independent of the quit and consumption equations. To allow for the possibility that the quit and consumption decisions are made simultaneously, their error terms are assumed to be correlated:

$$(r, v) \sim BVN(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \qquad (9)$$

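This is a double-hurdle model with dependence, sometimes referred to as a dependent Cragg model in the literature (13, 4, 5). Using NS to denote those who never smoked, S to denote current and ex-smokers, + to denote current smokers, and − to denote ex-smokers, the likelihood functions for this model are:

$$\prod_{NS} [1 - P(u > -\alpha x)] \prod_{S} P(u > -\alpha x) \qquad (10A)$$

$$\prod_{-} [1 - P(r > -\gamma k)\,P(v > -\beta z \mid r > -\gamma k)] \prod_{+} P(r > -\gamma k)\,P(v > -\beta z \mid r > -\gamma k)\,g(Y^* \mid v > -\beta z,\ r > -\gamma k) \qquad (10B)$$

or

$$\prod_{NS} [1 - \Phi(\alpha x)] \prod_{S} \Phi(\alpha x) \qquad (11A)$$

$$\prod_{-} [1 - \Phi_2(\gamma k, \beta z/\sigma, \rho)] \prod_{+} \Phi\left[\frac{\gamma k + (\rho/\sigma)(Y^* - \beta z)}{\sqrt{1 - \rho^2}}\right] \frac{1}{\sigma}\,\phi\left(\frac{Y^* - \beta z}{\sigma}\right) \qquad (11B)$$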

where P denotes probability; $\Phi$ and $\phi$ denote the standard normal distribution and density functions, respectively, and $\Phi_2$ the standard bivariate normal distribution function; $\rho$ is a correlation coefficient; and $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$. Equations (10A) and (11A) are standard probit models for the start decision and are estimated independently of the quit and consumption equations. Note that only the sample of ex- and current smokers is used for estimating the quit-consumption decisions. In this model, zero consumption may be a result of the quit or consumption decisions.
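The quit-consumption part of (11B) can be written out per observation. In the Python sketch below, the bivariate normal distribution function is computed by one-dimensional Simpson integration, a simple numerical stand-in chosen for this illustration rather than the routine used by the authors; the function names are hypothetical, and |ρ| < 1 is assumed.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a: float, b: float, rho: float, n: int = 2000, lower: float = -10.0) -> float:
    """Phi_2(a, b, rho) via composite Simpson's rule (n even, |rho| < 1).

    Uses Phi_2(a, b, rho) = integral from -inf to a of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) dx, truncated at `lower`.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n

    def f(x: float) -> float:
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0


def dependent_cragg_term(y: float, gk: float, bz: float, sigma: float, rho: float) -> float:
    """Log-likelihood contribution of one starter under (11B)."""
    if y <= 0.0:
        # Ex-smoker: fails the quit hurdle or the consumption hurdle
        return math.log(1.0 - bvn_cdf(gk, bz / sigma, rho))
    resid = (y - bz) / sigma
    # P(r > -gk | v = y - bz) for the correlated pair (r, v)
    cond = norm_cdf((gk + rho * resid) / math.sqrt(1.0 - rho * rho))
    return math.log(cond) + math.log(norm_pdf(resid)) - math.log(sigma)
```

Setting ρ = 0 makes $\Phi_2$ factor into $\Phi(\gamma k)\,\Phi(\beta z/\sigma)$, which is one quick way to check the sketch against the independent Cragg model.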

An independent Cragg model is nested within the dependent Cragg model by assuming that the error terms, r and v, in the quit-consumption equations are independent; that is, $\rho = 0$. An independently estimated probit start equation is also assumed. The likelihood functions for this model are:

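$$\prod_{NS} [1 - \Phi(\alpha x)] \prod_{S} \Phi(\alpha x), \qquad \prod_{-} [1 - \Phi(\gamma k)\,\Phi(\beta z/\sigma)] \prod_{+} \Phi(\gamma k)\,\frac{1}{\sigma}\,\phi\left(\frac{Y^* - \beta z}{\sigma}\right) \qquad (12)$$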

where the first product corresponds to an independent probit start equation and the second to the independent Cragg model. This model has been used by Atkinson and others (1) and by Deaton and Irish (6), while Haines and others (9) use a simpler version. This version of the Cragg model implies that consumers decide their optimal consumption levels and then, given this information, decide whether or not to start smoking. In contrast, the dependent Cragg model assumes that feedback runs in both directions rather than only from the consumption to the start equation.

The Tobit model is nested within the Cragg model: it is obtained by setting $\Phi(\gamma k) = 1$, that is, by making the probability of participation equal to one. One advantage of the Cragg model over the Tobit model is that the former allows variables to have differing effects on the consumption and participation decisions.
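The nesting of the Tobit within the Cragg model can be checked numerically. In the Python sketch below (hypothetical function names and parameter values), the Cragg contribution from (12) is evaluated with a very large γk, so that $\Phi(\gamma k)$ equals one in floating point, and compared with the corresponding Tobit contribution.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_logpdf(x: float) -> float:
    return -0.5 * (x * x + math.log(2.0 * math.pi))


def cragg_term(y: float, gk: float, bz: float, sigma: float) -> float:
    """Independent Cragg contribution for one starter (second part of (12))."""
    if y <= 0.0:
        # Zero if either the participation or the consumption hurdle fails
        return math.log(1.0 - norm_cdf(gk) * norm_cdf(bz / sigma))
    return math.log(norm_cdf(gk)) + norm_logpdf((y - bz) / sigma) - math.log(sigma)


def tobit_term(y: float, bz: float, sigma: float) -> float:
    """Tobit contribution: a single index governs both zeros and levels."""
    if y <= 0.0:
        return math.log(norm_cdf(-bz / sigma))
    return norm_logpdf((y - bz) / sigma) - math.log(sigma)


# With participation certain, the two contributions coincide (differences near 0)
for y in (0.0, 2.5):
    print(cragg_term(y, gk=40.0, bz=0.7, sigma=1.3) - tobit_term(y, bz=0.7, sigma=1.3))
```

Because γk and βz are separate indices in `cragg_term`, a variable can raise participation while lowering the consumption level, which the single-index Tobit cannot accommodate.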

Dominance models form another class of specifications. These models imply that no individual is observed at a standard corner solution because, once the decisions to start and not to quit are made, none of the zeros are generated by the consumption decision. This means that individuals with zero consumption, unlike in the Cragg and Tobit models, have no influence on the Engel curve for cigarettes. For commodities such as cigarettes, which are both addictive and used frequently, this is appealing because it suggests that zero consumption should not be thought of in terms of marginal adjustments or as a result of infrequency of purchase. Statistically, dominance implies that the consumption hurdle is always passed once the start and quit hurdles are cleared, so that the density g(·) of observed consumption is not truncated at zero; this is equivalent to setting $\Phi(\cdot) = 1$ for the consumption hurdle.

The most general dominance model considered here assumes that the error terms of the quit and consumption equations are correlated and that quitting dominates the consumption equation. This is a bivariate probit/sample selection model and can be estimated using the LIMDEP statistical package (8). This model can be simplified by maintaining the dominance of quitting over consumption but assuming that the probit errors are independent (that is, an independent start equation) and that the errors of the quit and consumption equations are correlated. The likelihood functions for this dominant quit model (the second in logarithmic form) are:

$$\prod_{NS} [1 - \Phi(\alpha x)] \prod_{S} \Phi(\alpha x)$$

$$\sum_{+} \left\{ -\tfrac{1}{2}\ln\sigma^2 + \ln\phi\left(\frac{Y^* - \beta z}{\sigma}\right) + \ln\Phi\left[\frac{\gamma k + \rho\,J\left(\frac{Y^* - \beta z}{\sigma}\right)}{\sqrt{1 - \rho^2}}\right] \right\} + \sum_{-} \ln[1 - \Phi(\gamma k)] \qquad (13)$$

where the first expression is an independent probit for the start decision, the second covers the quit and consumption decisions, and $J = \Phi^{-1} G$, where G denotes the distribution function for v (13, 14).

This model is often referred to as a Heckman sample selectivity model and can be estimated by maximum likelihood methods (5, 13). It differs from the dependent Cragg model in that individuals with zero consumption provide no restrictions on the parameters of the consumption equation.
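The per-observation form of (13) makes the last point easy to see. The Python sketch below assumes that v is normal, so that J reduces to the standardized residual; the function name and parameter values are hypothetical. An ex-smoker's contribution involves only γk, so changing βz or σ leaves it unchanged.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_logpdf(x: float) -> float:
    """Log of the standard normal density."""
    return -0.5 * (x * x + math.log(2.0 * math.pi))


def dominant_quit_term(y: float, gk: float, bz: float, sigma: float, rho: float) -> float:
    """Contribution of one starter to the quit-consumption part of (13).

    Assumes v is normal, so J reduces to the standardized residual.
    y = 0.0 marks an ex-smoker; y > 0 is a current smoker's daily consumption.
    """
    if y <= 0.0:
        # Ex-smoker: only the quit probit enters; bz and sigma play no role,
        # so zeros place no restrictions on the consumption parameters
        return math.log(1.0 - norm_cdf(gk))
    resid = (y - bz) / sigma  # J((Y* - beta z) / sigma) under normality
    selection = norm_cdf((gk + rho * resid) / math.sqrt(1.0 - rho * rho))
    return -0.5 * math.log(sigma ** 2) + norm_logpdf(resid) + math.log(selection)
```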

The dominance or Heckman model can be further simplified by assuming that the quit and consumption equations are independent; that is, $\rho = 0$. This model, termed the complete dominance model, separates into three independent components: a probit for starting, a probit for quitting, and ordinary least squares for the consumption equation over smokers only. The model implies that decisions concerning starting, quitting, and how much to smoke are independently made and that once the decision to smoke (and not to quit) is made, no one is at a standard corner solution. A summary of the models is presented in table 1.
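Under complete dominance, the consumption equation is simply a regression over current smokers. The Python sketch below shows that step with a single regressor for brevity; the actual consumption equation has many explanatory variables, and the data here are hypothetical. Zeros from ex-smokers are dropped rather than treated as censored observations.

```python
def ols_smokers_only(x: list[float], y: list[float], smoker: list[bool]) -> tuple[float, float]:
    """Intercept and slope from OLS of y on x, using smokers only."""
    pairs = [(xi, yi) for xi, yi, s in zip(x, y, smoker) if s]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: smokers follow y = 2 + 3x; nonsmokers report zero
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 8.0, 0.0, 14.0, 0.0]
smoker = [True, True, False, True, False]
print(ols_smokers_only(x, y, smoker))  # approximately (2.0, 3.0)
```

Including the two zeros in the regression would pull both coefficients away from the smokers' relationship, which is why the complete dominance model restricts the consumption equation to smokers.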

In all of the above models, if a parametric restriction is valid but not imposed, the resulting estimators lose efficiency. However, if the restriction is not valid but imposed anyway, the estimators are inconsistent and cannot be used for statistical testing.
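One standard way to judge whether a restriction of the kind listed in table 1 can be imposed is a likelihood-ratio test, with degrees of freedom equal to the number of restrictions. This section does not specify the test procedure used in the study, so the Python sketch below is a generic illustration: the function names and log-likelihood values are hypothetical, and the chi-square tail probability is computed from its closed-form series for integer degrees of freedom. The test presumes the more general model is correctly specified.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1 and add the terms of the df -> df + 2 recurrence
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def likelihood_ratio_test(loglik_general: float, loglik_nested: float,
                          n_restrictions: int) -> tuple[float, float]:
    """LR statistic 2(lnL_general - lnL_nested) and its asymptotic p-value."""
    stat = 2.0 * (loglik_general - loglik_nested)
    return stat, chi2_sf(stat, n_restrictions)


# Hypothetical log-likelihoods for a model pair separated by one restriction
print(likelihood_ratio_test(-1510.2, -1512.9, 1))
```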

Descriptive Statistics and Model Specification

The variables and their definitions as used in the various empirical models of smoking behavior are presented in table 2. Of course, not all of these variables are used in each equation. Table 3 contains the mean values for all of the model variables for the entire sample and for subsamples containing current smokers, ex-smokers, and nonsmokers. The data indicate that about 48 percent of the sample of 2,962 women have smoked at one time or another. About 39 percent of the total sample are current smokers, and over 14 percent of the current nonsmokers were smokers at one time. Approximately 9 percent of the entire sample are ex-smokers.
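The percentages quoted above follow directly from the subsample counts reported in table 3: 1,164 current smokers, 255 ex-smokers, and 1,798 current nonsmokers (a group that includes the ex-smokers, since 1,164 + 1,798 = 2,962). The short Python calculation below reproduces them; the dictionary labels are only for display.

```python
# Sample counts from table 3
n_total = 2962
n_current = 1164
n_ex = 255
n_current_nonsmokers = 1798  # includes the ex-smokers

assert n_current + n_current_nonsmokers == n_total

shares = {
    "ever smoked": (n_current + n_ex) / n_total,            # mean of Start in table 3
    "current smokers": n_current / n_total,
    "ex-smokers among current nonsmokers": n_ex / n_current_nonsmokers,
    "ex-smokers in full sample": n_ex / n_total,            # mean of Quit in table 3
}
for label, value in shares.items():
    print(f"{label}: {value:.3f}")
# ever smoked: 0.479
# current smokers: 0.393
# ex-smokers among current nonsmokers: 0.142
# ex-smokers in full sample: 0.086
```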

Table 1--Nesting of the models

A. Double-hurdle models
  1. Double-hurdle with dependence
     a. Independent starting equation (probit)
     b. Dependence between quitting and consumption (maximum likelihood)
     Restrictions = 1
  2. Double-hurdle with independence
     a. Independent starting equation (probit)
     b. Cragg model (independence between quit and consumption) (maximum likelihood)
     Restrictions = number of variables in quit equation
  3. Tobit
     a. Independent starting equation (probit)
     b. Tobit equation (Tobit estimation)

B. Dominance models
  1. Dependence between start and quit
     a. Start and quit equations (bivariate probit)
     b. Consumption equation (Heckman's two-step)
     Restrictions = 1
  2. Dominant quit model
     a. Independent start equation (probit)
     b. Dependence between quit and consumption equations, but quit decision dominates
     Restrictions = 1
  3. Complete dominance model
     a. Independent start equation (probit)
     b. Independent quit equation (probit)
     c. Consumption equation (ordinary least squares)

Note: Restrictions refer to the number of restrictions to be placed on the more general model to derive the nested or simpler version.

Table 2--Variable definitions used in empirical models of smoking behavior

Variable: Definition
South: Equals one if household resides in the South, else zero.
East: Equals one if household resides in the East, else zero.
North Central: Equals one if household resides in the North Central region, else zero.
West: Omitted base group.
Race: Equals one if household head is Black, else zero.
Household size: Number of persons in household.
Age: Age of female head in years.
Age²: Squared age of female head in years.
Central city: Equals one if household resides in central city, else zero.
Suburban: Equals one if household resides in suburban area, else zero.
Work: Equals one if female head is employed outside of home, else zero.
Education: Years of formal education of female head.
Weight/height: Weight (lbs) of female head divided by her height (inches).
Physical activity: Equals one if female head's usual level of physical activity at job/housework is heavy, two if moderate, three if light, four if none (bedridden/confined to wheelchair).
Homeowner: Equals one if household owns a home, else zero.
Ethnicity: Equals one if non-Hispanic, else zero.
Male present: Equals one if adult male present, else zero.
Income: Yearly per person income.
Health: Equals one if female indicates her health is good or better, else zero.
Cigarettes: Number of cigarettes consumed daily.
Start: Equals one if person has ever smoked, else zero.
Quit: Equals one if person currently smokes, zero if she has started but stopped.

Table 3--Sample mean values for all model variables

Variable                Full sample   Current smokers   Ex-smokers   Nonsmokers
East                    0.239         0.269             0.184        0.220
North Central           0.195         0.236             0.220        0.168
South                   0.331         0.315             0.310        0.341
West                    0.235         0.180             0.286        0.271
Race                    0.313         0.310             0.208        0.315
Household size          4.092         3.794             3.788        4.285
Age                     31.706        31.588            32.055       31.783
Age²                    1,076.055     1,064.275         1,094.200    1,083.701
Central city            0.413         0.432             0.357        0.400
Suburban                0.270         0.286             0.275        0.260
Work                    0.055         0.041             0.063        0.065
Education               11.259        11.041            11.702       11.399
Weight/height           2.342         2.266             2.479        2.391
Physical activity       2.088         2.061             2.043        2.105
Homeowner               0.318         0.268             0.337        0.350
Ethnicity               0.707         0.759             0.741        0.675
Male present            0.525         0.412             0.537        0.580
Income                  2,547.521     2,603.482         2,843.430    2,511.300
Health                  0.808         0.763             0.792        0.838
Cigarettes              6.566         16.709            0.000        0.000
Start                   0.479         1.000             1.000        0.142
Quit                    0.086         0.000             1.000        0.142
Number of observations  2,962         1,164             255          1,798

Note: The nonsmoker column covers all women who do not currently smoke (ex-smokers and never-smokers); its 1,798 observations plus the 1,164 current smokers make up the full sample of 2,962.

The data reveal that a higher proportion of current smokers than nonsmokers live in the East and North Central regions and are non-Black and non-Hispanic. Smokers tend to be younger, live in smaller households, and reside in central city and suburban locations more often than nonsmokers. Nonsmokers are more likely than smokers to work outside the home, to own a home, and to have more education. On average, smokers consumed 16.7 cigarettes per day, and average daily consumption per individual in the survey was 6.6 cigarettes.
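The columns of Table 3 can be cross-checked against one another. The Python sketch below is an illustration, not part of the original analysis. It assumes that the nonsmoker column pools ex-smokers and never-smokers, which the observation counts suggest (1,164 + 1,798 = 2,962). Under that assumption, each full-sample mean should equal the observation-weighted average of the current-smoker and nonsmoker means.

```python
# Observation counts from Table 3.
N_CURRENT = 1164      # current smokers
N_NONSMOKER = 1798    # assumed to be ex-smokers plus never-smokers
N_FULL = 2962
N_EX = 255            # ex-smokers

assert N_CURRENT + N_NONSMOKER == N_FULL

def pooled_mean(mean_current: float, mean_nonsmoker: float) -> float:
    """Observation-weighted average of the current-smoker and nonsmoker means."""
    total = mean_current * N_CURRENT + mean_nonsmoker * N_NONSMOKER
    return total / N_FULL

# variable: (reported full-sample mean, current-smoker mean, nonsmoker mean)
TABLE3_ROWS = {
    "East": (0.239, 0.269, 0.220),
    "Household size": (4.092, 3.794, 4.285),
    "Cigarettes": (6.566, 16.709, 0.000),
}

for name, (reported, current, nonsmoker) in TABLE3_ROWS.items():
    print(f"{name:<15} reported {reported:.3f}  pooled {pooled_mean(current, nonsmoker):.3f}")

# If ex-smokers sit inside the nonsmoker column, its Start mean is 255 / 1,798.
print(f"Start mean among nonsmokers: {N_EX / N_NONSMOKER:.3f}")
```

Each pooled value matches the reported full-sample mean to three decimals. The last line reproduces the 0.142 Start and Quit means in the nonsmoker column.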

Our models are concerned with three aspects of smoking behavior: (1) the start decision: whether or not a woman has ever smoked, (2) the quit decision: whether or not a woman who has smoked has quit, and (3) the number of cigarettes consumed.
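The three decisions imply nested estimation samples. The start equation uses every respondent (n = 2,962 in tables 4-9), and the quit equation uses only women who have ever smoked (n = 1,419). In the dominance models, the consumption equation uses only current smokers (n = 1,164). In the Cragg-type models of tables 4 and 5, it is instead fitted over all 1,419 starters.

A minimal Python sketch of this sample separation follows. The `Respondent` record and its field names are hypothetical and do not describe the actual survey file.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Respondent:
    # Hypothetical survey record; field names are illustrative only.
    ever_smoked: bool
    smokes_now: bool
    cigarettes_per_day: float

def smoking_group(r: Respondent) -> str:
    """Classify a respondent as a never-smoker, ex-smoker, or current smoker."""
    if not r.ever_smoked:
        return "never"
    return "current" if r.smokes_now else "ex"

def estimation_samples(
    sample: List[Respondent],
) -> Tuple[List[Respondent], List[Respondent], List[Respondent]]:
    """Split a sample into the observations used by each equation."""
    start_sample = list(sample)                          # start: everyone
    quit_sample = [r for r in sample if r.ever_smoked]   # quit: starters only
    consumption_sample = [                               # consumption (dominance models)
        r for r in sample if smoking_group(r) == "current"
    ]
    return start_sample, quit_sample, consumption_sample

# A toy sample, not survey data.
toy = [
    Respondent(ever_smoked=False, smokes_now=False, cigarettes_per_day=0.0),
    Respondent(ever_smoked=True, smokes_now=False, cigarettes_per_day=0.0),
    Respondent(ever_smoked=True, smokes_now=True, cigarettes_per_day=20.0),
    Respondent(ever_smoked=True, smokes_now=True, cigarettes_per_day=5.0),
]
print([len(s) for s in estimation_samples(toy)])   # [4, 3, 2]
```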

In general, it is difficult (often impossible) to rationalize why a variable should affect starting or quitting but not consumption (see (16) for a general discussion of this issue in Cragg-type models). Consequently, the starting, quitting, and consumption equations are postulated to be functions of region and urban location of residence, number of children present, age, age squared, employment status, education, ethnicity, income, race, homeownership, and presence of an adult male. In addition, the starting equation includes a variable indicating the physical activity level of the respondent, the quit equation includes a variable measuring the respondent's weight-to-height ratio, and the consumption equation includes a self-evaluated health status variable. It is unclear whether persons in good health at the time of the survey are more or less likely to start smoking than those reporting bad health. For example, those currently in bad health may have been in ill health their entire lives and thus never acquired the habit. In that case, bad health would be associated with nonsmoking. Conversely, it may be the case that, all else equal, good health increases the probability of not observing a smoker. The latter would be the situation if smokers, in general, have poorer health than nonsmokers. Thus, health status is not a good predictor of starting or quitting but may be associated with the level of cigarette consumption. We postulate that current physical activity levels may reflect health awareness attitudes developed over a lifetime and thus be an important explanatory factor in the participation decision. The next section discusses in more detail the hypothesized effects of other variables.

Some Statistical Notes

The independent and dependent Cragg models were estimated with software supplied to us by A. M. Jones, University of York. Other models were estimated using the LIMDEP statistical package (8). The likelihood ratio method is used to test the hypothesis that a nested or constrained version of one model is statistically equivalent to a more general or unconstrained model. The likelihood ratio is defined as the value of the constrained likelihood function divided by the value of the unconstrained likelihood function. It can be shown that, in large samples, minus twice the logarithm of the likelihood ratio is asymptotically distributed as chi-squared, with degrees of freedom equal to the number of restrictions placed on the unconstrained model to derive the simpler model:

$$\chi^2_{df} = -2\left[\ln L_R - \ln L_U\right]$$

where L_U and L_R denote the unrestricted and restricted likelihood functions, each evaluated at its maximum.
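The statistic above can be computed directly from reported log likelihoods. The following Python sketch is illustrative only; the authors used LIMDEP and Jones's software, not this code. It evaluates the chi-squared upper-tail probability with the standard recurrence for the regularized upper incomplete gamma function. As a result, it needs only the standard library and handles integer degrees of freedom only.

As a usage example, it compares the complete dominance model with the dominant quit model, which differ by one restriction (rho = 0). The unrestricted value is the Table 8 quit and consumption log likelihood (-4,916.9). The restricted value is the Table 9 quit plus consumption log likelihoods (-620.72 + -4,296.56). We assume these are the values behind the reported statistic.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    # Step up two degrees of freedom at a time:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while 2.0 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(loglik_restricted: float, loglik_unrestricted: float, df: int):
    """Return (statistic, p-value) for -2[ln L_R - ln L_U] ~ chi-squared(df)."""
    statistic = -2.0 * (loglik_restricted - loglik_unrestricted)
    return statistic, chi2_sf(statistic, df)

# Complete dominance (restricted) vs. dominant quit (unrestricted), one restriction.
stat, p_value = likelihood_ratio_test(-620.72 + -4296.56, -4916.9, df=1)
print(f"LR = {stat:.2f}, p = {p_value:.2f}")   # LR = 0.76, p = 0.38
```

The statistic of 0.76 rounds to the 0.8 reported in the text. At p = 0.38, it is far from significant at the 10-percent level.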

Empirical Findings

Parameter estimates for the various sample separation models are presented in tables 4-9. Regardless of the model, the start equation correctly classifies about 65 percent of the observations as smokers (current and ex-smokers) or nonsmokers, using the (0.5, 0.5) criterion. Under this criterion, a classification is correct when the predicted probability of starting is equal to or greater than 0.5 for an actual starter and below 0.5 for a nonstarter. Using the same criterion, the quit equation correctly classifies about 94 percent of the observations as current smokers or ex-smokers.
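The (0.5, 0.5) criterion is simple to apply once fitted probit indexes are available. Because the normal CDF is increasing and equals 0.5 at zero, a predicted probability of at least 0.5 is the same as a nonnegative fitted index x'b. A minimal Python sketch follows; the index values and outcomes in it are hypothetical, not survey data.

```python
import math
from typing import Sequence

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def percent_correctly_classified(
    fitted_index: Sequence[float], outcome: Sequence[int], cutoff: float = 0.5
) -> float:
    """Percent of observations correctly classified by a probit model.

    An observation counts as correct when the predicted probability is at
    least `cutoff` for an actual 1, or below `cutoff` for an actual 0.
    """
    correct = 0
    for xb, y in zip(fitted_index, outcome):
        predicted_one = normal_cdf(xb) >= cutoff
        if predicted_one == (y == 1):
            correct += 1
    return 100.0 * correct / len(outcome)

# Hypothetical fitted indexes x'b and observed start decisions (1 = ever smoked).
index_values = [1.0, -1.0, 0.2, -0.3]
started = [1, 0, 0, 1]
print(percent_correctly_classified(index_values, started))   # 50.0
```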

A likelihood ratio test (χ² = 0.2, df = 1) does not reject the hypothesis that the independent Cragg model (table 5) is an acceptable alternative to the double-hurdle model with dependence (table 4). This indicates that the decisions about quitting and consumption levels are not made simultaneously.

A likelihood ratio test overwhelmingly rejects the null hypothesis that the independent Cragg model and the standard Tobit model (table 6) are statistically equivalent (χ² = 120.7). This implies that the decisions about (1) quitting and (2) how many cigarettes to consume daily are not based on the same decisionmaking structure. The race variable provides a graphic example of the erroneous conclusions that would be drawn from using the estimated Tobit model instead of the Cragg model. This variable is statistically significant at usual confidence levels in the Tobit equation. Its negative sign indicates that Black females are more likely to quit smoking and consume fewer cigarettes than similar non-Blacks. However, in the Cragg model, race is insignificant but positive in the quit equation and negative and significant in the consumption equation. The quit-equation sign implies that Black females are less likely to quit smoking than non-Black females, all else equal. The consumption equation, like the Tobit, indicates that non-Blacks consume more cigarettes than Blacks, all else equal. Hence, drawing inferences about the effects of race on the quit decision from the Tobit equation alone would lead to erroneous conclusions, because this model does not allow a variable to have different effects on the quit and consumption decisions.
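The reported statistic can be reproduced from the log likelihoods in tables 5 and 6, on the assumption that those are the values being compared. Both models share the same independent start equation, with a log likelihood of -1,939.1 in each table. Its contribution therefore cancels, and only the quit and consumption parts drive the comparison, with the Tobit treated as the restricted model. A short Python check:

```python
START_LOGLIK = -1939.1     # identical start probit in tables 5 and 6
CRAGG_LOGLIK = -4832.43    # independent Cragg quit + consumption (table 5), unrestricted
TOBIT_LOGLIK = -4892.8     # Tobit consumption (table 6), restricted

# Adding the shared start equation to both totals leaves the statistic unchanged.
total_cragg = START_LOGLIK + CRAGG_LOGLIK
total_tobit = START_LOGLIK + TOBIT_LOGLIK
lr_statistic = -2.0 * (total_tobit - total_cragg)
print(f"LR = {lr_statistic:.1f}")   # LR = 120.7
```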

The dominant quit model (table 8) is a statistically acceptable simplification of the bivariate probit/sample selection model (table 7) (χ² = 0.4, df = 1). This indicates that the start and quit decisions are independent.¹ Furthermore, a likelihood ratio test indicates that the complete dominance model (table 9) is an acceptable simplification of the dominant quit model (χ² = 0.8, df = 1). This is confirmed by the low and statistically insignificant value of rho (-0.13) in the dominant quit model. This likelihood ratio test is also a test for sample selection bias (7). Hence, we find no evidence that sample selection bias is a problem.
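Table 7 reports a single log likelihood for the jointly estimated start, quit, and consumption equations. Table 8, by contrast, reports the independent start probit separately. Assuming the reported statistic is built from these table values, the start log likelihood has to be added back to Table 8 before the comparison. The Python sketch below does so and compares the result with the conventional 10-percent critical value of chi-squared with one degree of freedom (2.706).

```python
CHI2_CRITICAL_10PCT_DF1 = 2.706    # chi-squared(1) critical value, 10-percent level

loglik_table7 = -6855.82           # joint start/quit/consumption model (unrestricted)
loglik_table8 = -1939.1 + -4916.9  # independent start probit + dominant quit (restricted)

lr_statistic = -2.0 * (loglik_table8 - loglik_table7)
reject = lr_statistic > CHI2_CRITICAL_10PCT_DF1
print(f"LR = {lr_statistic:.2f}; reject independence of start and quit: {reject}")
```

The statistic of 0.36 rounds to the 0.4 reported in the text, well below the critical value.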

Therefore, the independent Cragg and the complete dominance models, both with an independent starting equation, appear to be the simplest acceptable formulations for modeling the start, quit, and consumption behavior in our sample. Both models imply that the error terms of the start, quit, and consumption equations are independent. The principal difference lies in the treatment of the zeros. The Cragg model implies that zeros can be caused either by quitting or by nonpurchase. It also implies that the market demand curve for cigarettes should be estimated over smokers and ex-smokers. The complete dominance model implies that no one is at a standard corner solution, since only current smokers are included in the consumption equation.
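The different treatment of zeros shows up directly in the likelihood. The sketch below writes per-observation log-likelihood contributions for an independent double-hurdle (Cragg) model and for a standard Tobit. It assumes normal errors and the usual textbook forms, and it is an illustration, not the estimation code the authors used.

The names `w_gamma` (the first-hurdle index, here the quit equation, where passing the hurdle means the woman still smokes) and `x_beta` (the latent consumption index) are hypothetical. When the first hurdle is passed with certainty, the Cragg contribution collapses to the Tobit one. The complete dominance model needs neither expression for zeros, because consumption is fitted only over current smokers.

```python
import math

def normal_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_logpdf(z: float) -> float:
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def cragg_loglik(y: float, w_gamma: float, x_beta: float, sigma: float) -> float:
    """Independent double-hurdle contribution.

    A zero arises either from failing the first hurdle or, having passed it,
    from a corner solution in the consumption equation.
    """
    p_hurdle = normal_cdf(w_gamma)
    if y == 0:
        return math.log(1.0 - p_hurdle * normal_cdf(x_beta / sigma))
    return math.log(p_hurdle) + normal_logpdf((y - x_beta) / sigma) - math.log(sigma)

def tobit_loglik(y: float, x_beta: float, sigma: float) -> float:
    """Standard Tobit contribution: every zero is a corner solution."""
    if y == 0:
        return math.log(1.0 - normal_cdf(x_beta / sigma))
    return normal_logpdf((y - x_beta) / sigma) - math.log(sigma)

# Hypothetical index values, not estimates from the tables.
for y in (0.0, 12.0):
    print(y, round(cragg_loglik(y, 0.5, 8.0, 11.0), 4), round(tobit_loglik(y, 8.0, 11.0), 4))
```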

We tentatively conclude that the complete dominance model is the preferred specification for the following reasons: (1) dependence between the errors of the start, quit, and consumption equations is rejected in all models; (2) sample selection bias is not present in the dominant quit model; and (3) the dominance assumption is intuitively plausible for modeling smoking behavior. Fortunately, the coefficient estimates for the quit and consumption equations are similar in sign across the complete dominance and independent Cragg models. The major differences involve the North Central, South, and suburban dummy variables and the race and ethnicity indicators in the quit equation, which are significant in the complete dominance model but not in the independent Cragg model. On the other hand, the coefficient for the male present variable is significant in the Cragg model but not in the complete dominance model. The consumption-equation coefficients of the two models are very similar with respect to direction of effect and significance levels. The remainder of this report focuses on the complete dominance formulation.

¹ Blaylock and Blisard find that participation (defined as being a current smoker or a nonsmoker) and consumption are also independent.

Table 4--Double-hurdle model with dependence between quitting and consumption and an independent starting equation

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.479        0.398
East                0.202*        0.071
North Central       0.413*        0.075
South               0.156*        0.069
Central city        0.091         0.065
Suburban            0.133*        0.064
Children            -0.096*       0.016
Age                 0.788*        0.229
Age²                -0.120*       0.338
Work                -0.201*       0.107
Homeownership       -0.054        0.058
Education           -0.047*       0.010
Physical activity   -0.100*       0.036
Ethnicity           0.284*        0.054
Race                -0.213*       0.061
Male present        -0.360*       0.052
Income              0.012         0.014
Log likelihood = -1,939.1

Quitting equation (n = 1,419)
Constant            5.267*        1.346
East                0.464*        0.201
North Central       0.312         0.197
South               0.162         0.195
Central city        0.304         0.211
Suburban            0.158         0.154
Children            0.025         0.053
Age                 -0.914        0.769
Age²                0.110         0.110
Work                -0.164        0.268
Homeownership       0.141         0.158
Education           -0.142*       0.035
Weight/height       -0.484*       0.105
Ethnicity           0.031         0.196
Race                1.290         0.791
Male present        -0.251*       0.145
Income              0.026         0.034

Consumption (n = 1,419)
Constant            -5.175        6.152
East                4.552*        1.120
North Central       4.585*        1.508
South               4.626*        1.112
Central city        0.539         0.995
Suburban            1.490*        0.976
Children            -0.411*       0.246
Age                 13.387*       3.580
Age²                -1.699*       0.527
Work                -1.475        1.832
Homeownership       1.595*        0.924
Education           -0.069        0.176
Health status       -7.186*       1.530
Ethnicity           3.724*        0.845
Race                -6.930*       0.925
Male present        0.214         0.778
Income              -0.327*       0.191
Sigma               10.992*       0.285
Rho                 -0.112        0.157
Log likelihood = -4,832.23

Note: * indicates significant at 10-percent level.

Table 5--Independent double-hurdle (Cragg) model between quitting and consumption and an independent starting equation

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.479        0.398
East                0.202*        0.071
North Central       0.413*        0.075
South               0.156*        0.069
Central city        0.091         0.065
Suburban            0.133*        0.064
Children            -0.096*       0.016
Age                 0.788*        0.229
Age²                -0.120*       0.338
Work                -0.201*       0.107
Homeownership       -0.054        0.058
Education           -0.047*       0.010
Physical activity   -0.100*       0.036
Ethnicity           0.284*        0.054
Race                -0.213*       0.061
Male present        -0.360*       0.052
Income              0.012         0.014
Log likelihood = -1,939.1

Quitting equation (n = 1,419)
Constant            5.377*        1.369
East                0.454*        0.203
North Central       0.315         0.202
South               0.165         0.211
Central city        0.308         0.246
Suburban            0.163         0.156
Children            0.024         0.057
Age                 -0.971        0.781
Age²                0.116         0.111
Work                -0.176        0.269
Homeownership       0.017         0.199
Education           -0.142*       0.036
Weight/height       -0.477*       0.108
Ethnicity           -0.136        0.164
Race                1.521         1.789
Male present        -0.241*       0.148
Income              0.024         0.035

Consumption (n = 1,419)
Constant            -4.629        6.049
East                4.733*        1.071
North Central       4.700*        1.118
South               4.692*        1.093
Central city        0.612         0.987
Suburban            1.548*        0.958
Children            -0.399*       0.224
Age                 13.106*       3.508
Age²                -1.667*       0.517
Work                -1.500        1.800
Homeownership       1.537*        0.918
Education           -0.109        0.166
Health status       -7.194*       1.532
Ethnicity           3.757*        0.824
Race                -6.757*       0.891
Male present        0.108         0.762
Income              -0.317*       0.187
Sigma               10.999*       0.762
Log likelihood = -4,832.43

Note: * indicates significant at 10-percent level.

Table 6--Independent start equation with Tobit consumption

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.479        0.398
East                0.202*        0.071
North Central       0.413*        0.075
South               0.156*        0.069
Central city        0.091         0.065
Suburban            0.133*        0.064
Children            -0.096*       0.016
Age                 0.788*        0.229
Age²                -0.120*       0.338
Work                -0.201*       0.107
Homeownership       -0.054        0.058
Education           -0.047*       0.010
Physical activity   -0.100*       0.036
Ethnicity           0.284*        0.054
Race                -0.213*       0.061
Male present        -0.360*       0.052
Income              0.012         0.014
Log likelihood = -1,939.1

Tobit (n = 1,419)
Constant            3.899         6.171
East                6.174*        1.065
North Central       5.477*        1.110
South               5.218*        1.067
Central city        1.723*        0.974
Suburban            2.351*        0.959
Children            -0.280        0.247
Age                 7.958*        3.627
Age²                -1.054*       0.536
Work                -2.115        1.715
Homeownership       0.901         0.897
Education           -0.573*       0.174
Health status       5.224*        1.644
Ethnicity           3.478*        0.841
Race                -3.925*       0.905
Male present        -0.711        0.279
Income              -0.215        0.188
Sigma               12.887*       0.279
Log likelihood = -4,892.8

Note: * indicates significant at 10-percent level.

Location of residence may influence smoking behavior for several reasons. The regional and urban dummy variables may serve as proxies for cigarette price differentials, and the population in some areas may be more tolerant of smoking as a mode of social behavior because of tradition or economic ties to the tobacco industry. Also, all else equal, residents of central cities or suburban areas may be more likely to smoke than their rural counterparts because of exposure to intense peer pressure and advertising messages as well as the stress of urban life.

The complete dominance model confirms many of these hypotheses. For example, residents of the West or South are less likely to start smoking than residents of other regions but are more likely to quit if they did start, all else equal. However, southern women tend to smoke more cigarettes than women living in other regions, about four more per day than women in the West. The model also indicates that residing in a suburban area tends to increase the odds of starting, all else equal.

Table 7--Dependence between starting and quitting equations, quit dominates consumption

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.470        0.398
East                0.206*        0.071
North Central       0.411*        0.075
South               0.161*        0.069
Central city        0.094         0.065
Suburban            0.136*        0.064
Children            -0.098*       0.016
Age                 0.776*        0.229
Age²                -0.118*       0.338
Work                -0.211*       0.107
Homeownership       -0.057        0.058
Education           -0.049*       0.010
Physical activity   -0.104*       0.036
Ethnicity           0.282*        0.054
Race                -0.209*       0.061
Male present        -0.361*       0.052
Income              0.013         0.014

Quitting equation (n = 1,419)
Constant            2.447*        0.701
East                0.432*        0.123
North Central       0.362*        0.127
South               0.198*        0.120
Central city        0.112         0.115
Suburban            0.188*        0.110
Children            0.021         0.030
Age                 -0.009        0.062
Age²                0.062         0.421
Work                -0.141        0.188
Homeownership       -0.094        0.104
Education           -0.107*       0.022
Weight/height       -0.411*       0.073
Ethnicity           0.163*        0.096
Race                0.319*        0.119
Male present        -0.108        0.093
Income              0.004         0.021
Rho                 0.013         0.210

Consumption (n = 1,164)
Constant            -0.092        5.309
East                3.771*        1.013
North Central       5.449*        1.086
South               5.300*        0.925
Central city        0.822         0.805
Suburban            1.044         0.819
Children            -0.432*       0.201
Age                 11.062*       3.128
Age²                -1.384*       0.456
Work                -1.255        1.542
Homeownership       1.538*        0.742
Education           -0.037        0.164
Health status       -6.542*       0.971
Ethnicity           2.993*        0.759
Race                -6.235*       0.849
Male present        0.220         0.629
Income              -0.274*       0.164
Sigma               9.727*        0.229
Lambda              -0.103        0.306

Log likelihood = -6,855.82

Note: * indicates significant at 10-percent level.

Table 8--Dominant quit with independent start

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.479        0.398
East                0.202*        0.071
North Central       0.413*        0.075
South               0.156*        0.069
Central city        0.091         0.065
Suburban            0.133*        0.064
Children            -0.096*       0.016
Age                 0.788*        0.229
Age²                -0.120*       0.338
Work                -0.201*       0.107
Homeownership       -0.054        0.058
Education           -0.047*       0.010
Physical activity   -0.100*       0.036
Ethnicity           0.284*        0.054
Race                -0.213*       0.061
Male present        -0.360*       0.052
Income              0.012         0.014
Log likelihood = -1,939.1

Quitting equation (n = 1,419)
Constant            2.457*        0.701
East                0.421*        0.123
North Central       0.356*        0.127
South               0.195*        0.120
Central city        0.111         0.115
Suburban            0.183*        0.110
Children            0.019         0.030
Age                 -0.011        0.062
Age²                0.060         0.421
Work                -0.147        0.188
Homeownership       -0.094        0.104
Education           -0.101*       0.022
Weight/height       -0.409*       0.073
Ethnicity           0.159*        0.096
Race                0.316*        0.119
Male present        -0.117        0.093
Income              0.004         0.021

Consumption (n = 1,164)
Constant            -0.089        5.309
East                3.763         1.013
North Central       5.453*        1.086
South               5.298*        0.925
Central city        0.820         0.805
Suburban            1.049         0.819
Children            -0.439*       0.201
Age                 11.058*       3.128
Age²                -1.384*       0.456
Work                -1.254        1.542
Homeownership       1.536*        0.742
Education           -0.038        0.164
Health status       -6.534*       0.971
Ethnicity           2.987*        0.759
Race                -6.234*       0.849
Male present        0.223         0.629
Income              -0.270*       0.164
Sigma               9.732*        0.229
Rho                 -0.134        0.301
Log likelihood = -4,916.9

Note: * indicates significant at 10-percent level.

Table 9--Complete dominance model

Variable            Coefficient   Standard error

Starting equation (n = 2,962)
Constant            -0.479        0.398
East                0.202*        0.071
North Central       0.413*        0.075
South               0.156*        0.069
Central city        0.091         0.065
Suburban            0.133*        0.064
Children            -0.096*       0.016
Age                 0.788*        0.229
Age²                -0.120*       0.338
Work                -0.201*       0.107
Homeownership       -0.054        0.058
Education           -0.047*       0.010
Physical activity   -0.100*       0.036
Ethnicity           0.284*        0.054
Race                -0.213*       0.061
Male present        -0.360*       0.052
Income              0.012         0.014
Log likelihood = -1,939.1

Quitting equation (n = 1,419)
Constant            2.456*        0.704
East                0.420*        0.121
North Central       0.357*        0.123
South               0.196*        0.117
Central city        0.109         0.111
Suburban            0.183*        0.108
Children            0.020         0.029
Age                 -0.010        0.062
Age²                0.048         0.422
Work                -0.152        0.183
Homeownership       -0.093        0.101
Education           -0.100*       0.022
Weight/height       -0.404*       0.071
Ethnicity           0.161*        0.099
Race                0.318*        0.112
Male present        -0.116        0.089
Income              0.004         0.021
Log likelihood = -620.72

Consumption (n = 1,164)
Constant            -0.089        5.086
East                3.763*        0.884
North Central       3.674*        0.926
South               4.154*        0.891
Central city        0.886         0.800
Suburban            1.150         0.794
Children            -0.432*       0.203
Age                 10.973*       2.985
Age²                -1.378*       0.441
Work                -1.324        1.453
Homeownership       1.511*        0.745
Education           -0.004        0.141
Health status       -6.522*       1.339
Ethnicity           3.070*        0.697
Race                -6.113*       0.727
Male present        0.163         0.631
Income              -0.271*       0.156
Log likelihood = -4,296.56

Note: * indicates significant at 10-percent level.

As the number of children in a household increases, the odds that a woman will start smoking decline, all else equal. Likewise, a woman in a household with more children tends to consume fewer cigarettes. One possible reason for these findings is that women with more children risk exposing more people to the potential health hazards of passive smoke. Also, since larger households contain more members, they are more likely to include individuals who are less tolerant of smoking as a form of social behavior. However, the number of children in a household did not significantly influence the odds of quitting.

There is much evidence that a clear life-cycle pattern exists for smoking. Marsh and Matheson found that 75 percent of smokers were smoking by the age of 18 and only 7 percent started after the age of 25 (15). Those who quit smoking tended to be in their 30's and 40's. This life-cycle pattern was modeled by including age and age squared in the starting, quitting, and consumption equations. Empirical results confirm that a life-cycle pattern exists for starting and consumption, but the age variables are insignificant in the quitting equation. All else the same, the probability of observing a starter peaks at about age 33, and the consumption of cigarettes increases up to about age 40 and then declines.
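The peak ages follow from the quadratic in age. An expression b1 × age + b2 × age² with b2 < 0 reaches its maximum at age = -b1 / (2 × b2). Because the normal CDF is increasing, the start probability peaks where the probit index does.

The Python check below uses the Table 9 coefficients. Table 2 describes age in years, but the reported peaks of about 33 and 40 are reproduced only if age enters the regressions in tens of years. That scaling is our inference, not something the report states.

```python
def turning_point(b_linear: float, b_squared: float) -> float:
    """Maximizer of b_linear * a + b_squared * a**2 (requires b_squared < 0)."""
    if b_squared >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b_linear / (2.0 * b_squared)

# Table 9 coefficients on age and age squared.
start_peak = turning_point(0.788, -0.120)           # starting equation (probit index)
consumption_peak = turning_point(10.973, -1.378)    # consumption equation

# Assumption: age is measured in decades, so multiply by 10 to get years.
print(f"start peaks near age {10 * start_peak:.0f}")              # 33
print(f"consumption peaks near age {10 * consumption_peak:.0f}")  # 40
```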

The hypothesis that years of formal education make individuals more cognizant of the dangers of cigarette consumption is partially confirmed. More educated women are found to be less likely to start and more likely to quit but a smoker's education level did not affect the number of cigarettes consumed daily.

We postulate that working women are less likely to be smokers than similar nonworking women because of the increased emphasis on a smoke-free work environment. We also expect that females with active lifestyles may be more health conscious and less likely to smoke than those with a more sedentary lifestyle. The starting equation gives credence to these hypotheses, but the work variable was not significant in the quit or consumption equations.

Owning a home did not significantly affect the odds of starting or quitting smoking but increased the number of cigarettes consumed daily by about 1.5. This variable may be a proxy for wealth or asset levels. Women residing in households with an adult male present are found to have a lower probability of starting, but the variable did not significantly influence the quitting or consumption decisions, all else equal. This variable may be a crude indicator of the stress of an individual's living environment. Stress may influence a person's dependence on the psychological support provided by smoking.

The education, work, and male present variables can be viewed as measuring the effects of social interactions. These variables were found to significantly influence the start but not the consumption decision. This finding is consistent with the view that social interaction is mostly a qualitative factor relating to the decision to begin smoking and that the level of consumption is a more private decision.

We found that non-Black females are more likely to quit than their Black counterparts but they are also more likely to start. Non-Black smokers were found to consume about six more cigarettes per day than Black smokers, all else equal. Non-Hispanics were found to have a higher probability of starting, to have a lower probability of quitting, and were also found to smoke more, about three cigarettes more per day than Hispanics, all else equal. The effects of these variables on smoking behavior, which cannot be predicted a priori, may relate to cultural and environmental factors as well as tastes and preferences.

Income did not significantly influence the odds of starting or quitting. Income has a significant negative effect on consumption levels. The estimated elasticity at the sample means was -0.04. This indicates that as income increases, the number of cigarettes consumed declines. It should be kept in mind that this finding is for low-income women and may not be applicable to the entire population.
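For a linear consumption equation, the elasticity at the means is the income coefficient times mean income divided by mean consumption. The Python sketch below uses the Table 9 income coefficient (-0.271) and the current-smoker means from Table 3 (income 2,603.482, cigarettes 16.709). The reported -0.04 is reproduced only if income enters the regression in thousands of dollars, which is an assumption here. With income in raw dollars, the same formula would give roughly -42.

```python
def elasticity_at_means(coefficient: float, mean_x: float, mean_y: float) -> float:
    """Elasticity of y with respect to x for a linear equation, at the sample means."""
    return coefficient * mean_x / mean_y

INCOME_COEFFICIENT = -0.271        # Table 9, consumption equation
MEAN_INCOME_DOLLARS = 2603.482     # Table 3, current smokers
MEAN_CIGARETTES = 16.709           # Table 3, current smokers

# Assumption: income enters the regression in thousands of dollars.
elasticity = elasticity_at_means(
    INCOME_COEFFICIENT, MEAN_INCOME_DOLLARS / 1000.0, MEAN_CIGARETTES
)
print(f"income elasticity at the means: {elasticity:.2f}")   # -0.04
```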

Poor health is a significant determinant of the number of cigarettes consumed daily. Results show that smokers in ill health tend to smoke about six and a half fewer cigarettes per day than their counterparts in good health, all else equal. Higher values of the weight-to-height ratio increase the odds that a woman has quit smoking. This variable may be related to health status. As expected, higher levels of physical activity significantly reduced the odds that a woman will start smoking.

The profile of a low-income woman most likely to start smoking is: resides in a suburban area of the North Central region, lives in a small household, is not employed outside the home, has little education, is non-Hispanic, is about 33 years old, and does not have an adult male in the household. The profile of a quitter is: resides in the West, is non-Black and Hispanic, is educated, and lives with an adult male. A profile of a heavy smoker would include the following characteristics: resides in the South, lives in a small household, is about 40 years old, is non-Hispanic and non-Black, and evaluates her health as good.

Summary of Empirical Findings

To correctly model the demand for a product one should account for the peculiarities of the commodity. This is very important for cigarettes, where we believe a double-hurdle model is appropriate and testing is necessary to determine whether or not the decisions to start, quit, and how much to consume are separate, endogenous choices.

Our models indicate that a separate, exogenously determined starting equation is probably satisfactory. We also found strong evidence that the quit and consumption decisions are independent. The simplest models found to be satisfactory are the independent Cragg model and the complete dominance model. The major difference between these models lies in the treatment of zero values. The sample-separation Cragg model postulates that both ex- and current smokers have information to contribute to the Engel equation. On the other hand, the complete dominance model postulates that no one is at a standard corner solution. We believe the complete dominance model is preferred over the Cragg model in our sample.

The conclusion that the dominance models are "better" than the Cragg-type models is not based on any statistical test but rather on the characteristics and use patterns of cigarettes. However, the dominance and Cragg models are quite similar with few significant differences with respect to the sign of coefficients in the start, quit, and consumption models.

The most important variables influencing the starting decision are location of residence, number of children, age, education, ethnicity, level of physical activity, and the presence of an adult male. Region, education, weight/height, race, ethnicity, and the presence of an adult male were the most significant variables in the quitting equation. Consumption was influenced most by region, household size, age, homeownership, race, ethnicity, and health status. If public advertising messages are designed to encourage people not to begin smoking, they might focus on non-Hispanics who have little education. If the message is to encourage people to reduce consumption, the target group would include younger women, non-Hispanics, and non-Blacks. Messages designed to encourage quitting among the women least likely to quit would target Black women with little education living in the East and North Central regions.

References

(1) Atkinson, A.B., J. Gomulka, and N.H. Stern. "Household Expenditure on Tobacco 1970-1980: Evidence from the Family Expenditure Survey." LSE ESRC Programme on Taxation, Incentives and the Distribution of Income. Discussion Paper 57, 1984.

(2) Bagozzi, G.S. "A Prospective for Theory Construction in Marketing," Journal of Marketing 48, (1984) 11-29.

(3) Blaylock, J. and W. Blisard. "U.S. Cigarette Consumption: The Case of Low-Income Women." Unpublished paper.

(4) Blundell, R.W. and C. Meghir. "Bivariate Alternative to the Univariate Tobit Model," Journal of Econometrics 33, (1987) 179-200.

(5) Cragg, J.G. "Some Statistical Models for Limited Dependent Variables with Applications to the Demand for Durable Goods," Econometrica 39, (1971) 829-44.

(6) Deaton, A. and M. Irish. "Statistical Models for Zero Expenditures in Household Budgets," Journal of Public Economics 23, (1984) 59-80.

(7) Dhrymes, P.V. "Limited Dependent Variables," in Z. Griliches and M.D. Intriligator (eds.). Handbook of Econometrics. Vol. III, North-Holland, Amsterdam, (1986).

(8) Greene, W.H. Limdep User's Manual. New York University, 1986.

(9) Haines, P., D. Guilkey, and B. Popkin. "Modeling Food Consumption Decisions as a Two-Step Process," American Journal of Agricultural Economics 70, (1988) 543-552.

(10) Heckman, J.J. "Sample Selection Bias as a Specification Error," Econometrica 47, (1979) 153-161.

(11) Jones, A.M. "A Double-Hurdle Model of Cigarette Consumption." Discussion Paper 128, Department of Economics, University of York, 1987.

(12) Lee, J.W. and A. Kidane. "Tobacco Consumption Pattern: A Demographic Analysis," Atlantic Economic Journal 15, (1987) 92.

(13) Lee, L. and G. Maddala. "The Common Structure of Tests for Selectivity Bias, Serial Correlation, Heteroscedasticity and Non-Normality in the Tobit Model," International Economic Review. 1-20, 1985.

(14) Maddala, G. Limited Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press, 1983.

(15) Marsh, A. and J. Matheson. Smoking Attitudes and Behavior: An Enquiry on Behalf of the Department of Health and Social Security. HMSO, London, 1983.

(16) Pudney, S. Modelling Individual Choice: The Econometrics of Corners, Kinks, and Holes. London: Basil Blackwell, 1989.
