If you have issues viewing or accessing this file contact us at NCJRS.gov.

'!Jr- • ~,-.. -- ____ •_____ ~-.--.;~~--, ~.-....., .'~-----~~"- .~ ..... ~~~_-'~ ...... ,-.-.-.,---t.\...-_~_ " . 4 .,. I National Criminal Justice Reference Service I------~~------nCJrs

This microfiche was produced from documents received for inclusion in the NCJRS data base. Since NCJRS cannot exercise .. MEASURING ASSOCIATIONS WITH control over the physical condition of the documents submitted, RANDOMIZED RESPONSE* the individual frame quality will vary. The resolution (;'hart on this frame may be used to evaluate the docum~nt quality.

1.0 :~ I~II~ IIIII~ w~ ~M 2.2 James Alan Fox L\l D36 W ~ I:iu u_n~ 2.0 College of Criminal Justice ... ~ 1.1 ... " .. ~I 'f" ! .

'" and

" i '~ , 111111.25 1I1111.~ 111111.6 ( ) . 1- I Paul E. Tracy Center for Studies in Criminology and Criminal Law II' University of Pennsylvania MICROCOPY RESOLUTION TEST CHART 11 NATIONAL BUR(AU OF STANDARDS.1963.A f 00 vi· ' !I' ' - l (April, 1982) II Microfilming procedures used to create this fiche comply with the standards set forth in 41CFR 101-11.504.

I! 1/ cr , , Points of view or opinions stated in this document are those of the author(s) and do not represent the official position or policies of the U. S. Department of Justice.

*Order of authors names determined by lottery. Direct communications to National Institute of Justice ! James Alan Fox, College of Criminal Ju'stice, Northeaste'r:n Unj.versity, Department of Justice f,,,> Boston, 02115. The aut.hors gratefully acknowledge Wesley Washington, D. C. 20531 Skogan of Northwestern University whose inquiry stimulated this note.

: 0 '0 ' :3/27/84\

) " ~. "1~------

')

MEASURING ASSOCIATIONS WITH

RANDOMIZED RESPONSE Since its introduction by W'arner (1965) the randomized response

Abstract ., approach has been the focus of considerable theoretical literature but has not been readily welcomed by survey researchers. The reluctance to The vi~w of many in the soc.ial science research community, it utilize randomized response is perhaps rooted in the fact that the de- appears, underestimates the scope and potential of the randomized response cision to employ the technique is irreversible. That is, unlike analytic technique, particularly with regard to its analytic capabilities. That innovations which can be experimented with and then discarded by the re- is, there seems to be some concern that this data collection strategy is, searcher without altering the quantity or quality of survey response data, by its nature, analytically restrictive. However, in this note we demon- the researcher cannot recapture information that is lost or altered by strate how the quantitative, unrelated question model of randomized response using a randomized response data collection strategy. It is easily under- can be reformulated into a measurement error model, which in turn allows a t stood, therefore, why randomized response is considered by some to be more wide range of multivariate approaches. Last, we illustrate through a f ) of a validation check than a primary method of collecting data (see, e.g., simulated correlation experiement the type of adjustments that need only Penick and Owens, 1976). be made to 'manipulate randomized response data more powerfully than has Skepticism over the randomized response technique appears to sur- been the practice in the past. round two principal issues: whether the reduction in bias earned through

randomized response outweighs the inefficiency of the technique and the

extent to which the technique limits analyt:f.c capabilities. The former U.S. Depertment ofJuatlee NatlOMIlnlttltute of Ju.tlce ThIs document hal bMn reproduced 8)(actly as reoefl/od ftom tho issue has been the topic of a number of validation efforts (see, e.g., ptI'IOn or 6fganlzaUon orIgoInaUng II. Polntl) of view or oplnluns 8111100 In thla documtnt 110 thoto of IIle authm and do oot oocessorlly t~r"'ol ltIe QlJc:ia/ potIUoo 01' poIIeIet ollhe NaUoo1I1 Institute of Folsom, 1974; Locander, 1974; Locander et. al., 1976; Bradburn and Sudman, ~~oo. .

~t1on 10 reproducalhll ~ materia' has boon 1979; and Tracy and Fox, 1981) which seem to suggest that the randomized granted by • Public Danain,/Natl.onal Institute of Justice ----- response technique has value in reducing systematic response error. The ------.------to the Nr.lonal Criminal Juab Retll'lII1Ce Setvloe (NCJRS). second concern, more partict~arly, that multivariate statistics are not Fur1hlw reproduction outllde of tt;. NOJRS a)'Stem requlro/i permlD' IkJ.n of Ihe~ 0WMr. possible with data ga.thered through randomized response methods, is still

an issue around which there is, considet'able confusion. The application

. , of univariate summary measures (e,g., proportions and means) and the

1 comparison of these measures across suhgroups are ohvious. HO't"ever, the richer ~, multivariate approaches (e.g., correlations, ANOVA; logit analysis) require t individual level data ...·hich are seemingly unavailahle 'Idth the randomized \ \ Distribution of Measurement Error in Randomized Responses response technique. Sudman and Bradhurn's (1982:81) recent comment reflects this \ ~ I, Consider the quantitative, unrelated question model employing an By using this procedure, you can estimate the undesirahle hehavior of a group; and, at the same time, the respo.ndent's I alternative question having a known distribution (see Greenberg et a1. , a.nonymity is fully protected. With this method, ho ...·ever, you I cannot relate individual characteristics of respondents to in­ 1971). More particularly, a survey respondent is instructed to manipulate dividual behavior. That is, standard regression procedures are not possihle at an individual level. If you have a very large a randomizing device following a bernoulli distribution with known para­ sample, group characteristics can he related to the estimates I ohtained from randomized response. For example, you could look I meter p, and then either to answer a sensitive question having unknown at all the ans ...·ers of young ~'omen and compare them to all the answers of men and older age groups. On the whole, however, much mean and variance, J.I and 0'2, or to report innocuously a random digit information is lost '1"hen randomized response is used. x x I (or nonsensitive response) having known location and scale parameters, In this note ~'e sho~T hm,' unkno ...n individual level data can he estimated for and 0' 2 , depending on the outcome of the randomizing device. use in multivariate analyses ...·ithout loss of information, explicating mathematically Y Let represent the verbal (observed) response given by the ith an approach recommended conceptually by Boruch (1972:410) (see also Boruch and respondent, and let and y' be the underlying (unobserved) sensitive Cecil, 1979:142). MOre precisely, we demonstrate that randomized response ohser- i and nonsensitive scores. As dictated by the randomizing device with vations can he treated as individual level scores that are contaminated hy se1ec tion probability p, random measurement error and, consequently, that ~ultivariate measures need

only he corrected for or purged of the effects of this error. Xi with probability p As is true of most of the theoretical and applied literature concerning z (1) i = randomized response, efforts to develop bivariate and multivariate approaches Yi with probability 1-p for randomized response data have focused ~ainly on dichotomous (or polytomous) Conditional on fixed xi' data in multh'ay contingency tahles (see Boruch, 1971; Barksdale, 1971; Drane,

1976; Cl icker ?-nd Igl e'\l'icz, 1976; Chen, 1979; Kraemer, 1980; and Tamhane, 1981) (2) with only limited attention to quantitative responses (see Rosenherg, 1979; Thus, although individual scores on the sensitive question (x) are un- Kraemer, 1980; and Himmelfarb and Edgell, 1981). The "errors in variahles" known, we can estimate approach to quantitative randomized response data presented here, like Chen's

(1979) misclassification perspective in the qualitative case, is sufficiently (3) gene~al to apply to a variety of data collection strategies and analytic techniques.

l 3 ~...... ,~ ----

Thus, We ~an define, further, the measurement error introduced by random-

ized response as ~ = (l-p) (~ ... ~ ) r p x y'

(4) 2 o == r Alternatively, the estimated individual scores derived from Equation (3)

can be viewed as the actual score plus a disturbance term, i.e., ...:nd

(5) \.is = ~ yx- u

Considering next properties of the disturbance u ' we sub- i 2 222 o = C1 /p + 0 • stitute Equation (3) into Equation (4) , s Y x·

u Substituting rand s back into Equation (6), it can be shown that i =

~ = P Jl (l-p) \.i 0 (7) From Equation (1) u r + s'=

and [x - (l-p) ~ l/p - x w.p. p i y i ~ u i = (6) w.p. l-p i ~ For simplicity, set + (l-p) o2 /p2 + 0 2 + (~ ... ~ ) 2J j [ y !f 1. x x r = [Xi ... (l-p) ~y]/p ... xi ' i I • (l-p) 2 ! f02 + 0 / + (~ _ ~ )2] (8) • (1-p) i p Lx y p x y p (Xi - ~y) ; >~ ~, i producing the desired expressions for the mean a~'ld variance of the dis- ~ and 1 turbance term •• :j Next, the covariance between Xi and u can be obtained in a simi­ :ir i 'I <,~ ,I lar fashion. We have ./

4 5 f'' , possible so long as the summary statistics (e.g., correlations) are G = E(xu) - E(x) E(u) = E(xu) xu corrected for attenuation produced by the randomized response procedure. because E(u) = O. Multiplying Equation (6) by xi' and collecting That is, the randomized response estimated scores are unbiased and, 2 xi terms, although containing measurement error, the effects of this unreliability can be corrected, as we demonstrate in the next sp-ction. 2 )/ (1 -p ) ( xi - xi ~y P w.p. p

w.p. 1-p

Next, we take expected values and collect covariance terms,

(l-p) (a2 + ~2 _ ~ ~) w.p. p p x x x y E(xu) = 2 2 (a + p J.l ~) / p - Ca '" ~.J w. p. I-p xy x y x ~

Thus,

(l-p) a E(xu) a 0 (9) xu = = p xy = since, by design of the randomized response procedure, and its 1 alternative are independent.

In sum, the randomized response procedure in effect contaminates a response by a random disturbance term having a zero mean, a variance given in Equation (8), and, most importantly, a zero covariance with the , variable under investigation. Therefore, since the distributional pro- I, perties of this type of measurement error are known and are quite trac- I~ J :) table, any multivariate approach using the estimated scores in (3) are I °1 ! ,I j j J I 1 6 7

~ ~., -~--~~------'-~~-'----'

Thus, the correlation between the unobserved true scores is Correcting Summary Statistics for Measurement Error from Randomized Response (11)

In order to demonstrate the correction procedure, consider, for ex- ample, estimating the correlation between two sensitive variables xl and Since the sample correlation of and is available directly, we

2 / 0 2 o 2/20 in order x which are surveyed under alternative question randomized response de­ need only construct estimates of the ratios o and u x 2 u xl 2 2 l 2 2 signs. The estimated individual scores, given by to achieve an estimate of Further, because v and (J are Yl Y2 known and and are ob~ervable, we derive below estimates of 2 0 and and then, using Equation (8), estimates of and x (10) 1 follow. Geueralizing Equation (1), the observed scores can, as before, be re-expressed as

w.p. I-p j.

where j = 1, 2. The mean and variance of z. are easily shown to be J where u and u are uncorrelated with their respective "true" scores il i2 (12) llZ = P )1 + (l-p.) lly , as well as with each other. ~ j j Xj J j

Because of this strict independence of the error terms, ~! I, and a,.,.x x = 0 x x ,and the r.:orrelation between the estimated scores becomes l 2 l 2 i 1 ,t1 £

1 Substituting (12) into (13) and simplifying algebraically, ~ r ~ "l A I ~f" 222 2 I } a • P 0 + (1 - P ) 0 + P (1 - Pj)(llx - II ) (14) I j j ,1 ~ j ~ ~ j ~

,\t1 ~ I i.l j 9 8 'I t l ,I - --~--. --~- ~~-~~~ r-

r1" \) \ ~ \ \ t \t< \" To illustrate this cor.rection procedure, we have simulated random- Solving for ;, ized response data for estimating the correlation between two randomized i 1 ) response measures as well as between a randomized response and a direct 2 2 (1 - p ) 0 ] /p . (15) = [0z. j Y j i question measure.. For both cases the true correlation between the two J j t "sensitive" measures and was set at .6. Both measures as well We have, as a result, sample estimates as their respective alternative responses, and Y , were drawn from 2 ... A2 2 normal distributions. The location and scale parameters in each random- :0: (16) 0 l_2o - Pj (1 - p. )(~x· - II ) (1- Pj O~j] /Pj Xj Zj J j Y ) I j ized response pair were purposely set close but unequal (i.e., II = 20 x ' 1 2 A2 - 2 f 2 2 2 z .) / (n·- 1) ; and I = 50, a = 100; II 55, 0 105). where Pj , II and 0 are known ; 0 = I: (zij- x = = Yj Yj Zj J ~ 2 Y2 Y2 f In the first simulation the probabilities of selecting the sensitive i question, PI =.6 and P2 = .7. The observed (attenuated) correlation

II (18) 2 and the corrected correlation computed from Equation are shown in Also, given 0 from Equation (16), we can generalize from Equation (8), I Xj Tabl,! 1 for varying sample sizes. As expected the correlations, after

correction for substantial attenuation, hover around the true value of .6 oA2 + 0 2 / p. + (llA x - lly. )2J (17) [ Xj Yj J j J and, more importantly, the standard errors of the corrected coefficients,

although inflated, grow acceptably small in moderately sized samples. In sum, we arrive at a corrected estimate of p ,the correlation x x Next, we considered the more usual case where one of the variables is l 2 of two randomized response measures, by expressing Equation (11) in sample ... measured directly (e.g.~ age), that is, where PI = 1.0. Further, we re­ form duced the selection probability, P2' to .5 (which incr~ases respondent 3 protection) as we would recommend in a highly sensitive inquiry. The (18) simulated correlations under these conditions are given in Table 2, again

revealing that these correlations can be satisfactorily corrected for con- ... 2 ... 2 where o and o are provided respectively in (16) and (17), and x. u. J J taminatic..l produced by the randomized response procedure. As before, the

p... A is the sample correlation of the scores estimated in Equation (10). x x corrected correlations are unbiased and have standard errors which become l 2 reasonably small for most applications.

11

10 .--,---.-~--~~

TABLE 1* TABLE 2* Simulated Correlation of Two 'Randomized Responses Simulated Correlations of a Randomized and a Direct Response

Attenuated Correlation Corrected Correlation rr ' 'Sample -Size Mean S.E. Mean S.E. I: 'Attenuated Correla tion COrrected Correlation ,i'~ -Sample-Size Mean S~E. 100 .2297 .0974 .6195 .2768 \ S.E. Mean ' j 250 .2322 .0611 .6203 .1697 100 .2751 .1058 .6002 .2512

500 .2301 .0450 .6119 .1194 250 .2793 .0654 .5923 .1374 750 .2292 .0347 .6063 .0901 500 .2880 .0436 .6033 .0968

1000 .2310 .0302 ~6120 .0818 750 .2907 .0355 .6069 .0734

1500 .2310 .0274 .6099 .0722 1000 .2914 .0319 .6074 .0683 2000 .2290 .0220 .6049 .0586 1500 .2842 .0299 .5942 .0652

2500 .2264 .0212 .5969 .0545 2000 .2912 .0230 .6109 .0508 2500 .. 2889 .0207 .6030 :0445

*Selection probabilities P1 c .6 and P2 = .7; true correlation '" Selection probability p Q ~5; true correlation p = .60. p .6. Simulation includes 100 trials. = Simulation includes 100 trials.

12 13 ,f '~ I I,

Notes 1. On occasions, the condition that the sensitive scores and their

associated disturbances are uncorre1ated, which is central to this Conclusion derivation and is assured here by the use of random digits for non-

We have demonstrated how estimation with the unrelated question ran- sensitive responses, may be restricting. For example, one may wish

domized response model. can be treated as simply a problem of measurement to employ, rather than random digits, a substantively meaningful, yet innocuous, alternative question (see Greenberg, et a1., 1971), perhaps error. As a result, the sizable literature on analysis of variables with error can be invoked to fashion analytic techniques for randomized response even one that elicits socially desirable resp0ns~s in order to

data. Also, the measurement error model derived here is structurally equi­ neutra.1ize the embarr8,55ing nature of the sensiti've question (Zdep and

valent to Warner's (1971) additive randomized response model in which re- Rhodes, 1976; also see Miller, 1981; and Fox and Tracy, 1981). Because

spondents verbally report the sum of their sensitive score and a random the sensitive and nonaansitive responses would no longer be necessarily

number. Although we feel that Warner's additive contamination model is uncorre1ated, the derived disturbance term would be correlated with

problematic as a data collection strategy (see, Fox and Tracy, 1980), the true sensitive score. Nevertheless, more complex interview designs, Rosenberg's (1979) full exposition of multivariate methods for the additive involving multiple samples and multiple alternative questions (e.g., see

randomized response model can be extended directly to data collected Fb1som et al., 1973), will permit an estimate of the covariance of Xi and 4 u so that the measurement model can be identified entirely. through the unrelated question approach as well. Therefore, although we i

have illustrated here only the use of correlations for unrelated question, 2. There are certain statistical advantages to using a distribution for

ranoomized response data t a full range of multivariate techniques, both the alternative response that is the same as that expected for the

categorical and quantitative, can be adapted in similar fashion. sensitive response. However, we tried here to approximate the usual case

where these expec ta t ions at'e fall ible .'

3. For discussions of. choosing selection probabilities as well as other

design parameters, see Fox and Tracy (1980), and Greenberg et a1. (1977). I :i 4. Unlike the unrelated question design (which we prefer for data ~, ~ . collection) the additive randomized response model, because it can be 1 ~ 1. 5 ~ .' :1 ) it ,j I~ j '~ j, -15- 14 ~ ----~~~------~ r IW(+:'I ~

1f

i. f\'.."; \ designed with random digits that are dj.stributed normally, produces References

estimates that enjoy certain desirable statistical properties (e.g., \ Barksdale, lv. B.

maximum likelihood). Nonetheless, both plans permit usual hypothesis 1971 "New randomized response techniques for control of nonsampling

tests as a consequence of the Central Limit Theorem (see Rosenberg, 1979: 91). errors in surveys." Ph.D. Dissertation, Department of Bio- statistics, University of North Carolina, Chapel Hill.

Chen, T. T.

1979 "Analysis of randomized response as purposively misclassified

data." Proceedings of the American Statistical Association,

Section on Survey Research Methods: 158-163.

Clicker, R. P. and B. Iglewicz

1976 '~arner's randomized response technique: the two sensitive

questions case." :rroceedings of the American Statistical

Association, Social Statistics Section: 260-263.

Drane, W.

1976 "On the theory of randomized responses to two sensitive questions."

Communications in Statistics - Theory and Methods A5:565-574.

Folsom, R. E.

1974 "A randomized response validation study: comparison of direct

and randomized reporting in DUI arrests." Research Triangle

Institute report No. 254-807.

Folsom R. 'E., B. G. Greenberg, D. G. Horvitz, and J. R. Abernathy

1973 "The two alternative questions randomized response model for

human surveys." Journal of the American Statistical Association

68:525-30.

-17- -16- r~~~-~-~""''''----

~

i,. r.' Ife' , • • \

'. ~ ~,j \ Locander, W. B., S. Sudman and N. N. Bradburn ~ Fox, J. A. and P. E. Tracy i 1976 "An investigation of the interview method, threat and response ! :1980 "The randomized response approach: applicability to distortion." Journal of the American Statistical Association criminal justice research and evaluation." Evaluation I!, 71:269-75

Review 4:601-22. Miller, J.D.

1981 "Reaffinning the viability of the randomized response approach." 1981 "Complexities of the randomized response solution." American

American Sociological Review 46:930-933. Sociological Review 46:928-930.

Greenberg, B. G., R. R. Kuebler, J. R. Abernathy and D. G. Horvitz Penick, B. K. E. and M. E. B. Owens

1971 "Application of the randomized response technique in obtaining 1976 Surveying Crime. Washington, D.C.: National Academy of

quantitative data." Journal of the American Statistical Associ- Sciences.

ation 66: 24 3-50. Rosenberg, M. J.

1977 "Respondent hazards in the unrelated question randomized re- 1979 '~ultivariable analysis by a randomized response technique for

sponse model." Journal of Statistical Planning and Inference disclosure control." Ph.D. Dissertation, Department of Bio-

1:53-60. statistics, University of Michigan.

Hlirumelfarb, S. H. and S. E. Edgell Tamhane, A. C.

1981 "Association of variables measured by the additive constants 1981 "Randomized response techniques for mul tiple sensitive attributes."

model of the randomized response technique." Department of Journal of the American Statistical Association 76:916-923.

Psychology, University of Louisville, mimeo. Tracy, P. E. and J. A. Fox

Kraemer, H. C. 1981 The validity of randomized response for sensitive measurements."

1980 "Estimation and testing of bivariate association using data American Sociological Review 46:187-200.

generated by the randomized response technique." Psychological Warner, S. L.

Bulletin 87:304-308. 1965 "Randomized response: a survey technique for eliminating evasive

Locander, W. B. answer bias." Journal of the American Statistical Association 1974 "An investigation of intervie~7 method, threat and response 60:63-69.

distortion." Ph.D. Dissertation, Department of Business Admin- 1971 "The linear randomized response model." Journal of the American

istration, University of Illinois. Statistical Association 66:884-888.

I " 19 18 , ,.

\ t f Zdep, S. M. and I. N. Rhodes 1 1976-77 ''Making the randomized response technique work." Public Opinion Quarterly 40:531-7.

,

20 r

I.V

.... ~ \ \ )