1999: Comparing Multilevel and Single-Level Negative

COMPARING MULTILEVEL AND SINGLE-LEVEL NEGATIVE BINOMIAL REGRESSION MODELS OF PERSONAL CRIMES: EVIDENCE FROM THE NATIONAL CRIME VICTIMIZATION SURVEY. Andromachi Tseloni, Department of Criminology and Criminal Justice, University of Maryland, 2220 LeFrak Hall, College Park MD 20742 Key words: Multilevel Models, Negative Binomial, The sample, in principle, represents the non Personal Crimes institutionalized permanent residents of the U.S. 12 years or older. The Address List from the Decennial 1. Introduction Census provides the sampling frame for a rotating panel Multilevel/hierarchical modeling explicitly oi" housing units. The survey collects detailed " accounts for the clustering of the units of analysis in information about the victimization experiences of each surveys which use multi-stage sampling, such as the member 12 years or older of the households which live National Crime Victimization Survey (NCVS). It thus in the selected housing units. The housing units remain avoids atomistic or ecological fallacy which occurs in the NCVS sample for three years and their eligible when one analyzes such data assuming that they are the residents are interviewed every six months during that product of simple random sampling. In empirical period. victimization research the clustering of crime incidents The 1994 NCVS includes information within individuals, individuals within households, collected during three interviews between January 1994 households within neighborhoods etc. makes multilevel and June 1995. Therefore it includes information on modeling quite attractive. To my knowledge, the only victimizations which occurred during a period of one published studies in the field of victimology using and a half years. Being a sample of housing units the hierarchical techniques are Rountree et al. (1994) and NCVS has serious attrition problems . since it does not Rountree and Land (1996) which model the risk of follow (or trace back) the households if they move out burglary and violent crime in the neighborhoods of (or in) the selected units before the end of the survey. Seattle. Neither, however, discusses how its results The sample of this study consists of all the differ from conventional regression modeling nor does households which occupied the selected housing units it analyze the implications of random effects. at the time of the first interview (January to June 1994) This study is concerned with comparing the of the 1994 NCVS public use file. Households which results between estimated single-level" and multilevel moved in after this period have been removed from the models of personal crimes drawn from the 1994 NCVS. data set. Thus the sampling unit of the employed data The main interest here is to find out whether and, if so, set is the household (rather than the housing unit) with how the multilevel specification offers new insights all its members 12 years or older. Housing units which into victimization. Unlike most empirical victimization were occupied by more than one household during the research this work focuses on personal crime counts or period of the survey are not overcounted here. Another incidence rates rather than risks which ignore the very advantage is that most included households (91.3%) prevalent phenomenon of crime concentration (Farrell, had a bounding (or pre-survey) interview. This is very 1992; Osborn and Tseloni, 1998). The statistical important because it ensures lack of telescoping in specification employed is the negative binomial crime reporting. The data offer a natural 2 level regression model which accounts for the population hierarchy of individuals (level-1 unit) nested within unexplained heterogeneity in cross section analyses households (level-2 unit). Table 1 presents the (Osborn and Tseloni, 1998: 308). distribution of individuals per household in the sample. The next section is an overview of the data set employed. Sections 3 and 4 describe the variables used 3. The response variable in this study and give theoretical justification for their The response variable of this study, I/,./., is a selection. Section 5 presents the statistical models and count which gives the number of personal crimes each section 6 gives the results. A concluding section 7 household member 12 years or older has experienced discusses the implications of this exercise and offers during a maximum of 18 months prior to the last suggestions for future research. interview of the NCVS. Personal crimes is a composite variable which includes rape, sexual assault, robbery, 2. The data assault, threats, pocket-picking and larceny. In The study employs data from the 1994 NCVS. particular, Yu takes on values yo. =0, 1 ..... where i The NCVS is conducted monthly by the Census Bureau denotes the individual andj his or her household. of the U.S. on behalf of the Bureau of Justice Statistics. 19 Table 1: Number of Individuals 12 years old or older guardianship through knowledge of the neighborhood per Household. and friendship networks. Number of Individuals Frequency Two dummy variables, 6 and 12 months, 1 27,011 which are defined at the person level, capture the effect 2 89,126 of shorter reference periods for those who moved out 3 42,051 before the end of the survey. 4 26,307 The level-2 or household covariates include 5 8,718 indicators of affluence, such as number of cars owned, 6 2,307 annual family income, tenure, and type of 7 681 accommodation; household composition, namely 8 257 number of adults and children in the household; and 9 108 protection against crime. The last is measured by 10 52 whether devices against intruders are fitted and 11 11 participation in neighborhood watch. Place size in Total 196,629 terms of population and urban area of residence which are also defined at the household level depict physical Table 2 presents the empirical frequency proximity to potential offenders. distributions of personal victimizations which occurred Table 3 summarizes the descriptive statistics, within 6, 12 or 18 months. The distribution of the total namely mean, standard deviation (S.D.) where number of personal crimes (last column) is clearly very appropriate, and range of values, of the set of covariates skewed and the bulk of cases are concentrated around used in this study. They are qualitative except for age zero events (97.5%). Overdispersion, whereby the and number of adults which are discrete. The reference variance is greater than the mean, exists in the data. The category of each non-binary qualitative covariate is negative binomial regression model which allows for indicated as the base in Table 3 and shown in extra-Poisson variation seems appropriate for modeling parentheses next to the variable name in Table 4 of personal crimes (Cameron and Trivedi, 1986; modeling results below. McCullagh and Nelder, 1989). 5. The statistical model 4. The covariates 5.1. The multilevel Poisson model The empirical modeling of this study follows Goldstein (1995) describes multilevel models the routine activities or lifestyle theory of criminal for proportions and presents models for counts as an victimization. Proponents of the theory (Hindelang et extension of the former. The negative binomial al., 1978; Cohen and Felson, 1979; Felson, 1998) argue multilevel model derives as an extension of the that the demographic and socio-economic Poisson. First I define the Poisson multilevel model. characteristics of individuals and their households, as Let /zU be the expected number of well as their lifestyle patterns and everyday routine victimizations. The log link function for the Poisson activities determine their exposure to crime. With model with random coefficients is regard to personal crime they do so by determining ln/z 0 - r/0 - Xijfl + ZP=0 urjzTij + ~.rr=p+l bl rj Z rj individuals' vulnerability and their chances of coming (1) into contact with motivated offenders through social where 7:=0,1 .... r with r being the total number of and physical proximity in the absence of effective random coefficients in the model including the guardians. intercept. Note that the coefficients are random at To model the number of personal level-2 (household), or higher if there was any. The victimizations the characteristics of the individuals and level-1 (individual) randomness solely defines the their households, which theory and previous research probability distribution of the observed response suggest, are employed here. Similarly to the response (Goldstein et al., 1998). variable they are drawn from the 1994 NCVS. The covariates which refer to the person (level-l) include X~j is a row vector of K (K >_ r) covariates for the 0" socio-demographic characteristics, such as sex, age, individual including the intercept, some of which may race, marital status, educational level, and employment refer to the individuals' household, zoij-1, zrij = xro status, as well as lifestyle factors, namely shopping, for 77=1.... p, are covariates for the 0" individual with evenings out, and use of public transportation. Length random effects, zrj = xrj, for 77=p+1..., r, refer to the of time living at the address is an indicator of r-p random effects covariates for the j household. 20 [urj]_ N (0, f)u) is the random departure from thej-th variable var(Udj) is restricted to zero for avoiding household (Goldstein, 1995). overspecification in the level-2 random part of the The probability distribution for I10. follows the model. In this case the variance of /tO between Poisson distribution so that the probability that Yo takes households of the base category, Xdj- O, is given by the specific value y ij is: var(uo), whereas var(uo.Q+2cov(uoj, U,l~ gives the Pr(Y0 - Y o ) - exp(-/l(i)/l~0 yii! , yij=O,1 ..... (2) between households with Xdj -- 1 variance (Goldstein, 1995). Thus, each household type defined by the with the usual property that E(~=var(Y~.j)--=~O " which dummy variable has different unexplained equals to exp(r/0") from Eq.

1999: Comparing Multilevel and Single-Level Negative

Statistical Approaches for Highly Skewed Data 1

Negative Binomial Regression Models and Estimation Methods

Measures of Fit for Logistic Regression Paul D

Multiple Binomial Regression Models of Learning Style Preferences of Students of Sidhu School, Wilkes University

Regression Models for Count Data Jason Brinkley, Abt Associates

Generalized Linear Models and Point Count Data: Statistical Considerations for the Design and Analysis of Monitoring Studies

Lecture 3 Residual Analysis + Generalized Linear Models

The Overlooked Potential of Generalized Linear Models in Astronomy, I: Binomial Regression

Negative Binomial Regression Second Edition, Cambridge University Press

R-Squared Measures for Count Data Regression Models with Applications to Health Care Utilization

Negative Binomial Regression

Multiple Approaches to Analyzing Count Data in Studies of Individual