CHAPTER EIGHT

Behavior Genetic Research Methods Testing Quasi-Causal Hypotheses Using Multivariate Twin Data

ERIC TURKHEIMER AND K. PAIGE HARDEN

I find the use of a correlation coefficient a dangerous personality is in principle no less extensive than the symptom. It is an enemy of generalization, a focuser on intersection of genetics and personality. As such it is the “here and now” to the exclusion of the “there and too vast to review in a single chapter. then.” Any influence that exerts selection on one variable Research methods in both behavioral genetics and and not the other will shift the correlation coefficient. personality are currently at a crossroads. Although the What usually remains constant under such circumstances history of the behavioral genetics of personality has is one of the regression coefficients. If we wish to seek for its origins in animal breeding, and the foundational constancies, then, regression coefficients are much more likely to serve us than correlation coefficients. work in the field is largely about temperament in dogs (Tukey, 1969, p. 89). (Scott & Fuller, 1965), what has come to be thought of as behavioral genetics are the methods of quan- titative genetics, in which genetic and environmen- tal processes are inferred from differences in genetic INTRODUCTION and environmental relationships in twin and sibling The suggested topic of this chapter – methods for pairs, families, and pedigrees. Over roughly the same behavioral genetic research in personality – is poten- period, “personality” has come to refer largely to indi- tially misleading. Behavioral genetics, of course, is vidual differences in human personality, especially as simply the science of genetics in its many forms as they are assessed via paper-and-pencil and self-report. it is applied to behavior, and for the most part there The intersection of these more specific paradigms, in is no reason for genetic research methods for the which correlations between the self-reported person- study of behavior to be any different than genetic ality scores of family members are analyzed using methods applied to nonbehavioral characteristics of quantitative genetic statistical models, has defined the organisms. The genetics of behavior can be studied in behavioral genetics of personality for the last 50 years humans or in nonhuman animals, using correlational (Tellegen, Lykken, Bouchard, Wilcox, Segal, & Rich, or experimental methods; it can be inferred from pat- 1988). terns of familial relations or observed more directly in That methodological era is coming to an end. The DNA. Like any characteristic of an organism, behav- wider availability of specific genetic markers and the ior can be thought of in terms of individual differences sequencing of the human genome has supplemented, or unvarying species-typical characteristics; it can be and to some degree supplanted, the classical quanti- studied in the cross-sectional context of the current tative methods of the last century (Charney, 2012). moment or as an ongoing process in a life span or Researchers in personality have recognized the lim- evolutionary time; and it can be seen as an aspect itations of a narrow focus on self-report (Oltmanns of normal functioning or as a reflection of disorder & Turkheimer, 2009), and the recent explosion in and distress. The same is true of personality. There is evolutionary thinking in the behavioral sciences has little about personality that requires it to be studied had a profound effect on the field (Penke, Denissen, differently than diabetes, or height, for that matter. & Miller, 2007), in particular by rekindling interest The domain of behavioral genetic research methods in in species-general aspects of personality as opposed

159

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 160 ERIC TURKHEIMER AND K. PAIGE HARDEN

to individual differences, and by reminding us of confound them. The branch of personality psychol- the importance of personality in nonhuman animals ogy that interfaces with social consists (Gosling & John, 1999). largely of this kind of work. It is even possible to con- With that in mind, it may seem retrograde to focus duct randomized experimental research while includ- a review of behavioral genetic research methods in ing genetic information (Burt, 2009), although this is personality on twin methods for the study of individ- not often attempted in humans. ual differences in self-reported responses in humans. When studying human individual differences in The choice can be defended on several grounds, other personality, the observations are usually correlational, than the simple fact that this is where the expertise beginning with the most fundamental observations in of the authors happens to lie. First is to counter the personality, the patterns of association among per- too-frequent tendency in the behavioral sciences to sonality items that have been the basis for factor move on from one poorly understood method to the analytic studies of personality structure since Cattell next, motivated not by the theoretical completion (1957). Even at a more molar level, the basic observa- of the old paradigm but rather by the availability tions of personality science usually involve statistical of new technology. This review endeavors to show associations, either among the personality traits them- that both the foundations and implications of classical selves or with external variables that are indicators twin studies of personality have not been fully under- of validity. We refer to these relations as phenotypic stood. Related to this motivation, the new molecular associations, with “phenotype” denoting a character- genetic methodologies have themselves led to a com- istic of an organism at the observational level, as op- plex tangle of methodological difficulties, which one posed to its underlying causes. Phenotypic associa- of us has recently reviewed elsewhere (Turkheimer, tions with personality are easy to observe, but in 2012), although not in the context of personality per nonexperimental work the important underlying se (but see Munafo, Clark, Moore, Payne, Walton, & questions are about cause: What causes individual dif- Flint, 2003). Finally, there is the undeniable but some- ferences in personality, and what do individual differ- what mysterious fact that notwithstanding the thou- ences in personality cause? For example, does neu- sands of twin studies of personality that have been roticism cause poor physical health (Shipley, Weiss, conducted, and an equally rich history of theoretical Der, Taylor, & Deary, 2007)? Does military service writing on the subject, the quantitative genetics of per- change one’s personality, or are men with certain per- sonality remains stubbornly controversial, both widely sonality characteristics more likely to select military accepted as foundational yet regularly rejected as mis- service (Jackson, Thoemmes, Jonkmann, Ludtke, & leading or worse. Its merits and implications continue Trautwein, 2012)? Do changes in impulsivity cause a to be debated in the top journals (Charney, 2012). The young adult to “mature out” of alcohol use, or does perhaps unrealistic goal of this chapter is to soften the heavy drinking cause increases in impulsivity (Little- disagreement about the genetics of behavior by refor- field, Sher, & Wood, 2009; Quinn, Stappenbeck, & mulating its methodological foundation of twin and Fromme, 2011)? These causal questions are very dif- family studies. Later, we also apply our reformulation ficult to answer, and phenotypic associations alone of older methods to gain realistic understanding of the are causally ambiguous. The only way to demonstrate newer ones that capitalize on the availability of mea- conclusively that a phenotypic association between sured DNA. heavy drinking and impulsivity is causal would be to assign individuals randomly to different heavy- drinking conditions and see what happens. Random PERSONALITY AS NONEXPERIMENTAL experimentation of this kind is, of course, often impos- SCIENCE sible for practical or ethical reasons. In the absence Focusing the chapter on individual differences in of random assignment, how can a social scientist humans highlights a particularly problematic aspect proceed? of scientific inference in the human behavioral sci- As is usually the case in the social sciences, the ences: the inference of causality from nonexperimen- answer is that social scientists can resort to quasi- tal data (see West, Cham, & Liu, Chapter 4 in this experimental methods, in the hope of capturing some volume). It is, of course, possible to study personality of the causal certainty offered by the idealized random experimentally, using random assignment to experi- experiment (see West et al., Chapter 4 in this volume). mental conditions to isolate causal effects of manipu- Suppose, for example, there existed pairs of children lations from extraneous variables that might otherwise who had been matched for cultural background and

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 161

genetic predisposition, yet who nonetheless differed in the reader lay aside what he or she already knows some hypothesized causal factor. To continue with the or has heard about twin studies, including the idea example of heavy drinking and impulsivity, if there that the purpose of behavioral genetics is to estimate were pairs of children matched for genetic and envi- the magnitude of genetic and environmental contri- ronmental family background but differing in their butions to a trait. In later sections, we expand on our drinking habits, and if differences in drinking within simple MZ-twin regression analysis to show how it pairs of matched children was still related to their per- intersects with more complex – and perhaps more sonality, this association could not be the result of familiar – methods for analyzing twin data. (The environmental or genetic family background, because reader interested in a more traditional introduction to the pairs had been matched for these traits. Asso- the twin method can see Plomin, DeFries, Knopik, & ciations within matched pairs would not prove cau- Neiderhiser, 2012 for an exhaustive account, or Neale sation, because matching can never be comprehen- & Maes, 2007 for a more computationally oriented sive and perfect, but to the extent the matching approach.). succeeded in holding constant important confounds, the within-pair associations would strengthen our RELIGIOSITY AND DELINQUENCY IN MZ TWINS impression that the association may have a causal basis. We have adopted the qualified term quasi-causal To provide a brief substantive background for our to denote associations that have survived analysis example, the incidence of delinquency increases so using quasi-experimental methods. dramatically in adolescence that some researchers Needless to say, the matched pairs we have consider it to be developmentally normative (Moffitt, described do exist: they are called identical (monozy- 1993). Adolescents commit more than 30% of major gotic; MZ) twins reared together. Other kinds of famil- crimes in the (Federal Bureau of Inves- ial clusters – fraternal (dizygotic; DZ) twins, siblings, tigation, 2004). One potential protective factor against half-siblings, twins reared apart, adoptive siblings, and delinquency is religiosity – that is, affiliation with and so forth – are also matched, but to a lesser degree involvement in religious organizations and activities. than are identical twins reared together. This chapter Religious involvement may decrease problem behav- makes the case that the essential contribution of what ior by instilling beliefs about divine sanctions, encour- is commonly called behavior genetics is the use of aging prosocial ties that foster concern for collective such familial clusters to obtain a significant but imper- well-being, facilitating the intergenerational commu- fect degree of quasi-experimental control over nonex- nication of conforming values, and buffering against perimental phenotypic associations. To illustrate this psychological distress that otherwise may be acted out point, we begin by presenting a regression-based anal- in problem behavior (Alpert, 1939; Smith, 2003). A ysis of MZ twin data on religiosity (involvement in meta-analysis of 60 studies concluded that there is organized religious activities) and delinquent behav- a moderate negative relationship between religiosity ior during adolescence. This starting-off point differs and delinquency (Baier & Wright, 2001). However, from what is typically thought of as a “twin study” in the association between religiosity and delinquency two important respects. First, the focus of our anal- is confounded by numerous variables related to both, ysis is on the relation between two individual differ- including genetic factors (Koenig, McGue, Krueger, & ences variables, rather than on dividing the sources Bouchard, 2005; Miles & Carey, 1997). of variation in a single behavior into genetic versus Twin data on religiosity and delinquency are drawn environmental components. By the end of this first from the National Longitudinal Study of Adolescent analysis, we will not know much about how much Health (Add Health), a nationally representative study genes matter for either religiosity or delinquency, but designed to assess adolescent health and risk behavior, we will know much more about how they are related collected in four waves between 1994 and 2008. Add to each other. Second, we begin by using data from Health participants were recruited using a stratified only MZ twins rather than from both MZ and DZ school-based sampling design. A randomly selected twins. As we describe later, the familiar decomposi- subsample of 20,745 participants (randomly selected tion of observed variance into genetic and environ- from school rosters) completed a 90-minute in-home mental components depends on comparing the rela- interview between April and December 1995 (Wave I tive similarity of MZ versus DZ twins, but we hope to interview; 10,480 female, 10,264 male). Participants convince the reader that the clearest exposition of the ranged in age from 11 to 21 years (M = 16 years, twin method starts elsewhere. Indeed, we request that 25th percentile = 14 years, 75th = 17 years). The

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 162 ERIC TURKHEIMER AND K. PAIGE HARDEN

design features of the Add Health data set have been RANDOM EFFECTS MODELS extensively described elsewhere (Harris, 2011), and The most readily apparent analytic complexity intro- interested readers are referred to the Add Health web- duced by MZ twin data (as opposed to data on sin- site (http://www.cpc.unc.edu/addhealth) for addi- gletons) is that observations may no longer be con- tional information. In this chapter we use a subsam- sidered independent: individuals are clustered within ple of the Add Health participants that comprises all twin pairs. Failure to consider this nonindependence adolescents between 11 and 20 years old who were may cause serious bias in the estimation of parame- identified as monozygotic (MZ) or dizygotic (DZ) twins ters and standard errors. Random effects models, also raised together in the same household (N = 289 MZ known as hierarchical linear models or mixed effects pairs, 451 DZ pairs). Twin zygosity was determined models, are a popular approach for the analysis of clus- primarily on the basis of self-report and four ques- tered data. (For a comprehensive introduction to ran- tionnaire items concerning how often twins were con- dom effects models, see Raudenbush and Bryk (2002) fused with one another and the similarity of their or Schoemann, Rhemtulla, and Little, Chapter 21 in physical appearance. Eighty-nine pairs of uncertain this volume.) A basic mixed effects model for our data, zygosity were determined to be identical if they shared analogous to a simple regression in non-clustered data, five or more genetic markers. Details regarding the is as follows: Add Health twins sample are described by Harris, Halpern, Smolen, and Haberstick (2006). Yij = B00 + (B01 × Xij) + u0J + eij (8.1) Religiosity was measured using four items (rated on four-point or five-point ordinal scale) assessing impor- The subscripts ij represent the ith twin within the jth tance of religion, frequency of prayer, attendance at pair. The first part of Model 1 is directly comparable religious services, and attendance at youth groups. to traditional regression analysis, with B00 represent- Additional religiosity items that focused primarily on ing the population intercept and β01representing the type of affiliation (e.g., identification as born-again) expected increase in delinquency given an increase or theological beliefs (e.g., divine authorship of sacred in one unit religiosity. Together this represents the texts) were excluded. Individuals who denied any reli- fixed portion of the model. The latter, random por- gious affiliation were not assessed for religiosity during tion of the model is composed of u0j, the pair-level the interview; they were assigned scores as appropri- error of prediction (i.e., the difference between the ate (e.g., “Never” for frequency of prayer; “Not at all population-level intercept B00 and the intercept for a important” for importance of religion.) Items scored given twin pair); and eij, the individual-level error of in reverse direction, such that higher scores reflect prediction. As applied to our example, u0j reflects the less religiosity, were reversed numerically. Religiosity extent to which a twin pair is, on average, less or more scores were computed by summing responses on the delinquent than the overall population; eij reflects four items (M = 9.44, SD = 4.22, median = 9, range the extent to which an individual twin is less or = 4 to 17, alpha = 0.76). more delinquent than the pair average. Unaccounted- Adolescents were also asked how often in the last for variation in delinquency is thus divided into two 12 months they had engaged in each of 15 antisocial components: one part shared by twins in a pair and behaviors: Never (0), One or Two Times (1), Three another part independent among individual twins. or Four Times (2), or Five or More Times (3). In Taken together, the fixed effects and random effects addition, they were asked how often in the last 12 parts comprise a model closely related to traditional months each of 4 violent events happened: Never (0), regression analysis; the only addition is an estimate of Once (1), More Than Once (2). A previous confir- the dependence between observations within a cluster matory factor analysis of this data indicated that 11 (i.e., the pair-level residual variation). items pertaining to theft, deception, and public row- This model was estimated in MZ twin pairs using diness were indicative of a single factor, labeled here PROC MIXED in SAS (see Appendix A at the end of as delinquency (Harden, Mendle, Hill, Turkheimer, & the chapter for code). Results are listed under Model 1 Emery, 2008). (The remaining items were indicative in Table 8.1. Consistent with previous epidemiolog- of a factor pertaining to aggressive or violent behav- ical research, religiosity was significantly associated ior.) Delinquency factor scores (M = 0.12, SD = 0.81, with lower delinquency (β = 0.027). Of the total range = –1.11 to 3.58, 25th percentile = –0.50, 75th unaccounted-for variation in delinquency (0.324 + percentile = 0.68) were estimated using the program 0.343 = 0.667), 51.4% (0.343/0.667) was attributable Mplus (Muthen´ & Muthen,´ 1998–2010). to genetic and environmental factors that make twins

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 163

TABLE 8.1. Results from Mixed Effects Models of Religiosity and Delinquency

MZ Twins Only MZ and DZ Twins Model 1 Model 2 Model 3 Model 4 Fixed Effects Intercept −.178 (.097) −.353 (.113)∗ −.353 (.113)∗ −.404 (.069)∗ Religiosity .027 (.009)∗ −.008 (.017) Religiosity Deviation −.008 (.017) −.008 (.018) Deviation∗Zygosity .027 (.021) (DZ Effect – MZ Effect) Religiosity Pair Average .055 (.020)∗ .047 (.011)∗ .050 (.007)∗ Random Effects MZ Twin Pair .343 (.048)∗ .335 (.046)∗ .335 (.046)∗ .314 (.045)∗ Within-MZ Residual .324 (.029)∗ .317 (.028)∗ .317 (.028)∗ .317 (.028)∗ DZ Twin Pair .225 (.031)∗ Within-DZ Residual .389 (.028)∗

∗ Significant at P < .05.

similar. Put another way, the intraclass correlation significantly predict delinquency (β01 =−0.008), as for twin pairs was 0.51. The remaining 48.6% of would be predicted by a causal hypothesis. the variance existed within twin pairs and can thus A difficulty with including the pair average as a be attributed to environmental influences that make predictor is that it may be strongly correlated with an twins different and to measurement error. individual’s score. To ameliorate the problem of multi- Besides dividing unaccounted-for variation in collinearity, Model 2 may be re-parameterized to yield delinquency into between-pair and within-pair com- orthogonal covariates, namely the twin-pair average ponents (and providing accurate estimates of the stan- (X¯ 0 j ) and the deviation of each twin from the twin-pair dard errors), the results of Model 1 tell us nothing that average (Xij − X¯ 0 j ): we could not have known from an ordinary correla- ¯ ¯ tional study. There is, however, an additional piece of Yij = B00 +(BB × X0 j )+(BW ×(Xij − X0J ))+u0 j +eij information that we can include in the model – the (8.3) average level of religiosity for the twin pair (X¯ 0 j ): The between-cluster regression coefficient BB esti- Yij = B00 + (B01 × Xij) + (B02×X¯ 0 j ) + u0 j + eij (8.2) mates whether pairs with higher average religiosity have lower average delinquency, including the effects Consider how the addition of this one piece of of the unmeasured covariates that vary at the pair information changes the meaning of the parameter level. In contrast, the within-cluster regression coef- B01, which now quantifies whether an individual who ficient BW estimates whether the MZ twin with higher is more religious is less delinquent, controlling for the religiosity than his or her co-twin also has lower delin- overall level of religiosity in his or her twin pair. Because quency than his or her co-twin. Results from this religiosity is not randomly assigned, controlling for the model (Model 3) are shown in Table 8.1. The regres- average level of religiosity in the pair essentially con- sion of delinquency on religiosity within twin pairs is trols for being from the “type” of family that is reli- not significant (βW =−0.008), but the regression on gious, including all the between-family genetic and mean pair religiosity is significant (βB = 0.047). Model environmental differences that are confounded with 3 is merely a re-parameterization of Model 2, such average religiosity. Results from this model are listed that BB in Model 3 equals the sum of the two regres- under Model 2 in Table 8.1. Twin pairs who are sion coefficients from Model 2 (.053 – .008 = .047), more religious, on average, have lower levels of delin- but because the covariates are orthogonal the stan- quency (β02 = 0.055). However, if Twin A is more reli- dard error for the between-cluster regression is slightly gious than Twin B, this within-pair difference does not smaller in Model 3.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 164 ERIC TURKHEIMER AND K. PAIGE HARDEN

Again, how to interpret these results? The within- confounded only by those factors that vary within cluster regression is not biased by the exclusion of twin pairs. cluster-level confounds; therefore, the within-cluster Inclusion of DZ twin pairs complicates the analysis regression better approximates the “true” quasi-causal of between- and within-pair variances to some extent, relation between religiosity and delinquency in the but ultimately allows the estimation of a third vari- population. In other words, BW is the key parame- ance, and with the addition of some statistical and ter of interest for evaluating a quasi-causal hypoth- biological assumptions leads to the familiar terms of esis. In the current analysis, the quasi-causal rela- the classical twin model. In MZ twins, who are iden- tion between religiosity and delinquency appears to tical genetically, within-pair confounds are necessar- be nonexistent: Families who are more religious are ily environmental in origin. In DZ twins, who share less delinquent, but a twin who is more religious than only 50% of their genes, there are both genetic and his co-twin is not. Another way of making the same environmental within-pair confounds. Therefore, to point is to observe that to the extent the relationship is the extent that the relation between religiosity and ultimately causal, the structural relation between reli- delinquency is attributable to genetic confounds, there giosity and delinquency should not differ depending should be a larger within-pair effect for DZ twin pairs on whether one is comparing means of twin pairs, than for MZ twin pairs. To model this, we can spec- deviations of individual twins from their pair mean, ify an additional interaction term to our mixed effects or unrelated individuals (see the discussion of rat pups model: in Turkheimer & Waldron, 2000). These comparisons are invariant causally but differ in their confounds: Yij = B00 + (BB × X¯ 0 j ) + (BW × (Xij − X¯ 0 j )) twin comparisons are confounded only by environ- + (B × ZYG) + (B × ZYG mental influences unique to each twin, while compar- 03 04 ¯ isons between unrelated individuals are confounded × (Xij − X0 j )) + u0 j + eij (8.4) by all genetic and environmental differences between families. Inequality of BW and BB, therefore, suggests When zygosity (abbreviated ZYG) is dummy-coded as the operation of third-variable confounds operating 0 in MZ twins and 1 in DZ twins, BW estimates the between families. the within-pair regression for MZ twins, whereas B04 Comparison of within- and between-cluster regres- estimates the difference between MZ and DZ twins in sion coefficients is a strategy found frequently in the within-pair effect. To the extent that confounding the medical literature, with the aim of disentangling variables are genetic in origin, the within-pair effect “maternal” factors from “fetal origins” of disease aeti- will be larger for DZ twins than MZ twins. In contrast, ology (for a review, see Carlin, Gurrin, Sterne, Morley, MZ and DZ twin pairs raised together control equally & Dwyer, 2005). Mixed effects models of twin data well for shared environmental factors, thus there will have been productively applied to the study of birth be no difference between pair types if the relevant weight and cord blood erythropoietin (Morley, Moore, confounds are environmental in origin. There is no Dwyer, Owens, Umstad, & Carlin, 2005), birth weight reason to expect a main effect of zygosity on the out- and blood pressure (Mann, De Stavola, & Leon, 2004), come of interest; however, it is necessary to include and tobacco use and bone density (Hopper & Seeman, the main effect of a covariate included in an interac- 1994), to name just a few examples. This statistical tion term. method would be just as productively applied to the Results from this model (Model 4), using data from study of psychological development as to the study of both MZ and DZ twin pairs, are shown in Table 8.1. disease. See Appendix A at the end of the chapter for the corresponding SAS program. The primary result of Model 4 is the same as Model 3: differences in reli- Differentiating Genetic and Shared giosity between MZ twins do not significantly pre- Environmental Confounds dict delinquency, inconsistent with a causal hypoth- Thus far, we have considered average family reli- esis (BW =−0.008). Second, the within-pair effect giosity and within-twin pair differences in religios- is not significantly larger in DZ pairs than in MZ ity in pairs of identical twins. The former effect is pairs (B04 = 0.027), indicating that the association confounded by all genetic and environmental fac- between religiosity and delinquency is attributable tors that vary between families and are systemati- to environmental factors that vary between families. cally associated with religiosity. The latter effect is Third, the residual variance shared by MZ twins

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 165

(0.314) is larger than the variance shared by DZ twins Nevertheless, this extension may be valuable in more (0.225), indicating that genetic factors account for fully characterizing the relation between predictor and residual variation in delinquency not accounted for by outcome. religiosity. Assuming that the between-pair variance in MZ In the classical twin study, the three pieces of avail- twins equals all genetic variance and all shared envi- able information – between- and within-pair vari- ronmental variance, whereas the twin-pair variance ances and zygosity – are re-parameterized to yield the in DZ twins equals one-half the genetic variance plus three components of the classical twin model: addi- all shared environmental variance, simple arithmetic tive genetic influences (A), which are assumed to be yields a more precise estimate for the genetic variance perfectly correlated in MZ twin pairs and correlated at (A = 2 × (MZ Twin Pair Var – DZ Twin Pair Var) = 0.5 in DZ twin pairs; shared environmental influences 0.178) and shared environmental variance (C = MZ (C), which are environmental influences that make Twin Pair Var − A = 0.136). The within-pair vari- twins raised in the same home more similar, regard- ance in MZ pairs is a direct estimate of the E variance less of zygosity; and non-shared environmental influ- (0.317). ences (E), which are environmental influences that In summary, random effects models of the relation are unique to each twin and thus contribute to within- between an environment and putative psychological pair differences, plus measurement error. Differentiat- outcome in twin pairs can be used to assess the fol- ing these sources of variation depends on the relative lowing: (1) the extent to which within-twin pair dif- similarity of MZ and DZ twins for a given pheno- ferences in environmental experience predict differ- type. MZ twins are assumed to share all of their genes ences in the outcome of interest, as would be expected and, by definition, their shared environment; conse- under a causal hypothesis; (2) the extent to which the quently, to the extent that MZ twins are not perfectly within-pair regression and the between-cluster regres- correlated for a phenotype, this is reflected in the esti- sion are different, suggesting the operation of con- mate for E. To the extent that MZ twins, who share founding variables; (3) the extent to which the within- all of their genes, are more similar than DZ twins, pair regression in DZ pairs differs from MZ pairs, which who are assumed to share 50% of their genes, this suggests that the relevant confounds are genetic in ori- will be reflected in the estimate of A. Finally, to the gin; and (4) the extent to which the residual within- extent that the similarity of DZ twins exceeds half that and between-pair variance components differ between observed in MZ twins, this will be reflected in the esti- MZ and DZ pairs, which suggests that genetic fac- mate of C. tors account for variation in outcome independent of Although most commonly discussed in terms of the predictor of interest. The proposed mixed effects genetic and shared environmental contributions to an model, therefore, provides a rich characterization of individual phenotype, we can also apply these same how X and Y are related, with very minimal program- concepts to the association between two phenotypes, ming (six lines of SAS code). just as we could with the between and within vari- Nevertheless, translating parameters that are speci- ances in an analysis of MZ twins. To the extent the fied in terms of pair means and deviations or between- relation between religiosity and delinquency is truly and within-cluster variation into the components causal, one would expect the same causal forces to be familiar in behavior geneticists – namely, additive operating within families as between them. Our MZ genetics, shared environment, and non-shared envi- twin analysis, however, indicated that the association ronment – requires either post hoc arithmetic of the between delinquency and religiosity was driven by kind we have used here or rather elaborate weight- between-family confounds rather than a “true” quasi- ing schemes (McArdle & Prescott, 2005). Moreover, causal effect of religious involvement: Twin pairs who whereas the current example clearly indicates that the were more religious were less delinquent, but twins relevant confounds were shared environmental in ori- who differed in their religiosity did not differ in their gin, there will obviously be cases in which the con- delinquency. This model could be extended to ask founding variables are both genetic and environmen- whether the between-family confounds that drive the tal, and the proposed random effects model does not religiosity-delinquency association are genetic versus explicitly quantify genetic versus shared environmen- shared environmental in origin. It should be noted tal selection effects. We now turn our attention to an that both of these extensions are somewhat tan- analytic approach more commonly used in behavior gential to the central point of the analysis: testing genetics that addresses some of these shortcomings: whether the quasi-causal hypothesis can be excluded. structural equation modeling.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 166 ERIC TURKHEIMER AND K. PAIGE HARDEN

a) rMZ/rDZ rMZ/rDZ

bP bP

Religiosity Delinquency Religiosity Delinquency

TWIN 1 TWIN 2

b) r=1.0/0.5 r=1.0/0.5

r=1.0 r=1.0

A C E A C E A C E A C E a2 c2 e2 a2 c2 e2 a2 c2 e2 a2 c2 e2

bP bP

Religiosity Delinquency Religiosity Delinquency TWIN 1 TWIN 2

Figure 8.1. Phenotypic regression model in a pair of twins. giosity and delinquency, separately for MZ and DZ Delinquency is regressed on religiosity, with equal regression twins. Also similar to the random effect regressions, coefficients bP for each member of the pair. Curved paths rep- those MZ and DZ covariances can be re-parameterized resent twin correlations for delinquency and religiosity. as ACE variances, as shown in Figure 8.1b. Notably, the unstandardized regression of delinquency on reli- giosity, bP,isthesame in both figures; partitioning STRUCTURAL EQUATION MODELS the variance of religiosity and delinquency has no Unstandardized ACE Regression consequences for the regression coefficient between them. Since the development of powerful and relatively Nevertheless, in bivariate twin data of the kind user-friendly structural equation modeling software, required to fit the model in Figure 8.1b, one can fit such as Mplus or Mx, twin data are most commonly a model, illustrated in Figure 8.2, in which separate analyzed using structural equation models, which, regressions are estimated for Y on the three biomet- above and beyond offering a graphical means of rep- ric components of X (only one twin is shown). If the resenting the underlying equations, make it simpler true model is represented by Figure 8.1a (i.e., X causes to execute the reparameterizations of variances that Y), it is easy to see what will happen in a bivariate we conducted in the preceding section using post hoc model. Because the phenotype X is the sum of its arithmetic. Figure 8.1a is a SEM representation of a three unstandardized components A, C, and E, we will twin regression, with delinquency predicted from reli- have giosity in twin pairs. As was the case in the random

effects regressions employed in the previous section, bP (A + C + E) = bP A + bP C + bP E (8.5) the dependence of the observations taken from the same twin pair is modeled explicitly, here simply by This shows that when the relationship between X including estimates of the twin correlations for reli- and Y is causal at the level of the phenotype, the

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 167

a2 c2 e2

A C E

a’2 bP 1.0 1.0 1.0 bP A bP

c’2 C

Religiosity Delinquency e’2 E

three unstandardized coefficients in a bivariate ACE Figure 8.2. Phenotypic regression model in which twin regression model will be equal to each other and covariances in MZ and DZ twins have been decomposed into to the phenotypic causal parameter that underlies biometric components; doing so has no effect on the regres- them. sion bP. The reasons for this equivalence are simple but easily misunderstood, and very important. Intuitions about the parameters of the classical twin model in correlated with a host of unmeasured confounds that the context of regression may be better served by will then bias the phenotypic regression of delin- temporarily suspending the classical “genes and envi- quency on religiosity and vitiate the causal interpreta- ronment” interpretation of the twin model in which tion of simple regressions. In a manner entirely anal- the A term is identified with a latent construct called ogous to the random effects models developed earlier, “the additive effect of genes,” the C term is identi- with some key assumptions, twin models can pro- fied as “the shared environment” and the E term as vide insight into the nature of the confounding and the “nonshared environment” in favor of something any causal relations that remain. The key to the anal- more literal, as follows. The classical twin model can ysis is that any unmeasured confounds of religios- use covariances in identical and fraternal twins to par- ity can themselves be partitioned into ACE compo- tition a phenotype into three components, defined by nents, in the same way that confounds in a random their covariances between members of twin pairs. One effects model can be decomposed into a component (E) is uncorrelated between members of a pair, one correlated with variation in twin pair means and (C) is perfectly correlated between members of a pair, another component correlated with variation within and one (A) is correlated 1.0 in identical twins and pairs. The ACE components of the unmeasured con- 0.5 in fraternal twins. All three are components of the founds will then differentially bias the ACE regression phenotype being analyzed, and their sum, A+C+E, coefficients with which they are associated. The model is equal to the phenotype. Understood this way, the including the unmeasured confounds is illustrated in invariance of structural regression components in an Figure 8.3 unstandardized bivariate model of a causal relation Figure 8.3 is identical to Figure 8.2, except for is completely unsurprising. If one unit change in the the addition of unobserved confounds inside the dot- phenotype of religiosity causes bP units of change in ted rectangle, and dotted arrows with coefficients ′ ′ ′ delinquency, that will remain the case no matter what bA, bB and bE , which represent the causal effects component of the phenotype one is examining, and of the confounds on the outcome. The confounds whether one thinks of the origins of the component as are perfectly correlated with the predictor of interest, genetic or environmental. because by definition the predictor and the uncon- The foregoing discussion has not taken into account trolled confounds cannot be distinguished. In actual the problem at the heart of this chapter, namely that data, the ACE regression model is fit without refer- individuals are not randomly assigned to religiosity ence to the unmeasured confounds, which then bias conditions, with the result that religiosity is potentially the unstandardized ACE regressions away from the

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 168 ERIC TURKHEIMER AND K. PAIGE HARDEN

1.0 1.0 1.0

Uncontrolled Confounds of Religiosity a 2 c 2 e 2 A C E a2 A C E R A b’A b’C b’E bP bP c2 bP R C

e2 Delinquency E R Religiosity

Figure 8.3. Genetically informed phenotypic regression, delinquency that are not shared by identical twins. including unmeasured confounds (only one twin shown). At the very least, we can say that the E regression Latent regressions along A, C, and E paths are equal to bP, is free of the many genetic and environmental con- but observed regressions are biased by unmeasured A and C confounds in dotted box. founds that identical twins share. The partial solution to the underdetermination of Equation 8.6 (or, alter- natively, the twin-based quasi-experimental solution to the unavailability of random assignment in studies true causal value bP. In the model fit to the observed of natural variation in humans) is as follows: To the data, extent we are willing to make the quasi-experimental ′ ′ ′ assumption that relations within identical twin pairs bA = bP + b ; bC = bP + b ; bE = bP + b (8.6) A C E are unconfounded by other variables, which is to say ′ This situation would appear to present something of that bE is zero, then bE estimates bP, the true causal a dilemma, because there are three equations (one relation, while bA and bC estimate the sum of the for each of the observed ACE regressions) and four causal coefficient plus the effects of confounds that unknowns (the true causal parameter and three ACE vary in each of the domains, respectively. confounds). This is no small shortcoming, and should The E regression estimates the causal effect of X serve to remind us of the impossibility of reaching on Y to the extent we can assume that differences strong causal inferences from nonexperimental data. within MZ pairs are not confounded by uncontrolled Nonetheless, the ACE partitioning of the unmeasured factors. We have already seen why this is a reason- confounds offers a partial solution. The A and C terms, able assumption to make. MZ twins are, to a first representing variation in the confounds shared by approximation, genetically identical, so differences in identical twins, includes most of the plausible con- delinquency associated with differences in religiosity founds of developmental causal relations. Socioeco- within pairs cannot be attributed to genetic differences nomic status, parental education, culture, and, of between the pairs. When twins are reared together, course, genotype are all shared by identical twins, and they are roughly identical for a host of socioeconomic therefore are not potential explanations of differences variables that might otherwise explain differences in between members of a pair. delinquency: socioeconomic status, place of residence, In contrast, the E component represents aspects and so on. It is, however, quite possible to think of of religiosity and its confounds that are uncorrelated uncontrolled third variables that might confound the within pairs of identical twins. It is more difficult – within-pair association. For example, parents might not impossible, just more difficult – to posit con- choose to send one child to a religious school and founds of the relationship between religiosity and the other to a public school. The child in the religious

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 169

school would become more religious, and might also model of religiosity can be thought of in an abstract make friends that predisposed him or her lower lev- way as a stand-in for genetic variance in religiosity, els of delinquency than the twin in the public school, the component itself is religiosity, phenotypic religios- even in the absence of any direct causal effect of reli- ity. The same is true of the C and E terms, which refer giosity on delinquency. Accepting the E regression as not so much to latent shared and nonshared environ- an estimate of the causal effect is therefore an assump- ments underlying religiosity as to a portion of religios- tion, or one might better say an approximation, of the ity itself. By naming the latent variance that is not cor- true state of affairs. In the absence of random assign- related between twins the non-shared environment, ment, strict inference of causality is simply impossible, behavior geneticists have promoted as environmental and all twins offer is a quasi-experimental method for what is really just within-pair variation in the phe- approaching it. As before, we prefer the term quasi- notype of interest. Although the origin of within-twin causal to describe a regression of Y on X (in either ran- pair variation is, by definition, environmental influ- dom effects or structural equation models) that has ences (and measurement error) that make twins raised survived exposure to a genetically informed design in the same family different, the effect of within-twin that controls for genetic and shared environmental pair variation consists of the effect of X itself, con- confounds. founded by the effects of any uncontrolled variables The latent E component of religiosity is analogous that also vary within pairs. to the within-pair deviation in religiosity (xij − x.j), When describing the relation of a latent E vari- entered as a covariate in mixed effects Models 2 – able with other variables in the model, many authors 4, above, and they offer the same inferential benefits have mistakenly concluded that the E regression esti- and limitations. To understand this connection, con- mates only the role of non-shared environmental con- sider again the case of MZ twins. They are necessar- founds, and have neglected that this relation is actu- ily identical for additive genetic and shared environ- ally the best estimate of the phenotypic causal effects mental factors; any difference between MZ twins is of X itself. Given that detection of this causal effect is reflected in their scores on the latent E variable. That usually the primary goal of the study, misinterpreting is, a higher score on the latent E variable indicates that the term “non-shared” can sometimes snatch defeat an individual twin has higher levels of religiosity, rela- from the jaws of victory in an otherwise success- tive to his or her co-twin. Consequently, the bE path is ful study. For example, Pike, McGuire, Hetherington, directly analogous to the within-pair regression coef- Reiss, and Plomin (1996), although implying that their ficient βW.ThebE path asks, if Twin A has higher lev- goal was to evaluate a causal hypothesis (“differen- els of religiosity than his or her co-twin (i.e., higher tial treatment affects adolescent adjustment”, p. 599), latent E scores), does Twin A also have lower levels described their results as follows: “mothers’ negativ- of delinquency? Therefore, the regression on E is the ity is significantly associated with depressive symp- key parameter of interest for testing a causal hypoth- toms through nonshared environmental processes. esis about the relation between religiosity and delin- (p. 597)” A reader could easily interpret this state- quency. ment as meaning that some other environmental pro- Some readers may find this interpretation of the cess, unique to each sibling, was responsible for both regression on E to be counterintuitive. Historically, maternal negativity and depressive symptoms, rather there have been two important obstacles to a proper than as evidence for a quasi-causal effect of moth- understanding of quasi-causal relations in behavior ers’ negativity on depression. Similarly, Spotts, Ped- genetics, and the crucial role that is played by “non- erson, Neiderheiser, Reiss, Lichtenstein, Hansson, & shared environmental” differences within pairs of MZ Cederblad (2005) described a previous result (Reiss, twins. First, the three variance components of the Neiderhiser, Hetherington, & Plomin, 2000) as fol- classical twin model – additive genetics, shared and lows: “For example, more than half the correlation unshared environment – have been reified as “genes,” between mother’s positivity and child’s social respon- “shared environment,” and “nonshared environment” sibility is accounted for by genetic influences, with the for so long that it is easy to forget that ultimately remainder being accounted for by shared and non- they are all just components of the variable being ana- shared environmental influences. (p. 339)” This state- lyzed, specifically the predictor in a bivariate regres- ment makes it seem as though the association has sion model, which in this case is phenotypic religiosity. been carved up into various types of confounds – That is to say, although according to a fairly restrictive a little attributable to genes, a little to social class set of assumptions the A component in a classical twin or other between-family environmental differences, a

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 170 ERIC TURKHEIMER AND K. PAIGE HARDEN

little to peer groups or other within-family envi- is an information-theoretic fit criterion that estimates ronmental differences – with nothing left over to the Bayes factor, the ratio of posterior to prior odds in comprise a causal relation. In fact, that “non-shared comparisons of a model with a saturated one (Raftery environmental influences” account for the “correla- & Richardson, 1996; Schwarz, 1978). BIC outper- tion between mother’s positivity and child’s social forms other fit criteria in its ability to discriminate responsibility” means that within-pair differences in between multivariate behavior genetic models, partic- maternal positivity predicted within-pair differences in ularly for complex model comparisons in large sam- social responsibility – a quasi-causal relation. A third ples, and is more robust to distributional misspecifi- example is a report by McGue, Iacono, and Krueger cations (Markon & Krueger, 2004). Interpretation of (2006), whose stated goal was to evaluate whether BIC values is entirely comparative, with lower val- early adolescent problem behavior is related to adult ues of BIC indicating better model fit. RMSEA mea- disinhibitory psychopathology via a causal mecha- sures error in approximating data from the model-per- nism, but who devoted a single sentence to describing, model parameter (Steiger, 1990). RMSEA values of without comment, whether the twin with more prob- less than 0.05 indicate a close fit, and values up to 0.08 lem behavior grew up to have more psychopathology represent reasonable errors of approximation. Browne than his or her co-twin – “non-shared environmental and Cudeck (1993) have argued that the RMSEA pro- factors accounted for the remaining 11% and 5% of vides very useful information about the degree to the correlation [in females and males, respectively]” which a given model approximates population values. (p. 599). Differences in model χ2 are themselves distributed as χ2, with df equal to the difference between the models’ df. We fit a series of five nested models to data from MODELING SEQUENCE MZ and DZ twins in the program Mplus (Muthen´ We previously described four ways in which the rela- & Muthen,´ 1998–2010). Results from the full model tion between an environment and putative psycholog- (Model 5) are summarized in Table 8.2. Of the ical outcome in twin pairs can be characterized, each total variance in religiosity, additive genetic effects of which is explicitly characterized in the unstandard- accounted for approximately 24% [4.30/(4.30+9.06+ ized bivariate ACE regression model. First, the extent 4.64)], shared environmental influences for approxi- to which within-twin pair differences in environmen- mately 50%, and non-shared environmental influ- tal experience predict differences in the outcome of ences for approximately 26%. Similarly, of the unique interest, as would be predicted by a causal hypothe- variance in delinquency, additive genetic effects sis, will be reflected in the be path. If be can be fixed accounted for approximately 25%, shared environ- to zero without significant loss of model fit, this pro- ment for 20%, and non-shared environment for 55%. vides disconfirmatory evidence regarding the quasi- Notice that the estimate for non-shared environmen- causal hypothesis. Second, to the extent that the asso- tal variance (EDel = 0.334) is equal (to the second ciation between X and Y is confounded by variables decimal place) to the within-pair random effect, θ MZ, that differ between unrelated individuals, the be path estimated in the mixed effects Models 2–4. The regres- will differ from the ba and bc paths. If the three paths sion onto E is not significantly different from zero cannot be fixed to equality, this provides evidence that (be = 0.000, 95% CI = –0.031, 0.30), a result that fal- the quasi-causal association is confounded. Third, the sifies the hypothesis that religiosity causes decreases in relative magnitudes of ba and bc indicate the extent adolescent delinquency. The regression onto A is also to which the confounding variables are genetic or not significantly different from zero (ba = 0.043, 95% environmental in origin. Finally, the extent to which CI = –0.056, 0.142), suggesting that genetic pathways genetic, shared environmental, and non-shared envi- are not a significant confound of the quasi-causal rela- ronmental factors contribute to residual variation in tion between religiosity and delinquency. The regres- delinquency is directly estimated by the A, C, and E sion onto C is significantly different from zero (bc = components of delinquency. 0.061, 95% CI = 0.020, 0.102), suggesting that envi- Nested SEMs can be compared using two mea- ronmental circumstances related to familial religios- sures of goodness-of-fit, Bayesian Information Crite- ity, such as parental education or socioeconomic sta- rion (BIC), and Root Mean Square Error of Approx- tus, account for the association between religiosity and imation (RMSEA), as well as differences in χ2. BIC delinquency.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 171

TABLE 8.2. Results from Structural Equation Models of Religiosity and Delinquency

Model 5: Model 6: Model 7: Full Model Regressions Equal No A or E Regression∗∗ Parameters Unstandardized Standardized Unstandardized Standardized Unstandardized Standardized Variance Components of Religiosity Additive 4.30 (1.21)∗ h2 = 24% 4.19 (1.21)∗ h2 = 23% 4.30 (1.21)∗ h2 = 24% Genetic Shared 9.06 (.121)∗ c2 = 50% 9.15 (1.21)∗ c2 = 51% 9.04 (1.21)∗ c2 = 50% Environmental Non-Shared 4.64 (.39)∗ e2 = 26% 4.66 (.40)∗ e2 = 26% 4.64 (.39)∗ e2 = 26% Environmental Regression Paths ∗ Areligion → .043 (.050) β = .111 .036 (.005) β = .091 [0] [0] Delinquency ∗ ∗ ∗ Creligion → .061 (.021) β = .228 .036 (.005) β = .135 .078 (.013) β = .292 Delinquency ∗ Ereligion → .000 (.016) β =−.001 .036 (.005) β = .096 [0] [0] Delinquency Residual Variance Components of Delinquency Additive .151 (.073)∗ h2 = 25% .144 (.073)∗ h2 = 23% .158 (.072)∗ h2 = 27% Genetic Shared .124 (.060)∗ c2 = 20% .136 (.058)∗ c2 = 22% .103 (.058) c2 = 17% Environmental Non-Shared .334 (.026)∗ e2 = 55% .341 (.027)∗ e2 = 55% .334 (.026)∗ e2 = 56% Environmental Model Fit Indices χ 2 (df, P) 30.36 (17, .02) 42.09 (19, .002) 31.64 (19, .03) χ 2 ( df, P) − 11.73 (2, .003) 1.28 (2, .527) CFI / TLI .979 / .985 .963 /. 977 .980 / .987 RMSEA .046 .057 .044

∗ Significant at P < .05. ∗∗ Model accepted as the best representation of the data.

Model 6 tested whether environmental and genetic nents to zero, allowing the association between reli- differences between families confound the associa- giosity and delinquency to be accounted for entirely tion between religiosity and delinquency by fixing the by environmental differences between families. Com- regression paths to equality. This model resulted in a pared to the full model (Model 5), Model 7 did not fit significant increase in χ2(࢞χ2 = 11.73, ࢞df = 2, P = significantly worse (࢞χ2 = 1.28, ࢞df = 2, P = 0.53), 0.003) and an increase in RMSEA, suggesting that the indicating that shared environmental confounds could association is indeed confounded by between-family account for the association between delinquency and differences. The parameter estimates from Model 5, religiosity. in which the only significant path between religiosity The same conclusion is apparent, regardless of and delinquency was the regression on C, suggested whether we use mixed effects models or structural that the relevant between-family confounds were equation models – religiosity is not causally related shared environmental in origin. Model 7, then, fixed to delinquency, but families whose environmental cir- the regressions of delinquency on the A and E compo- cumstances make them the “type” of family to be

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 172 ERIC TURKHEIMER AND K. PAIGE HARDEN

a2 c2 e2

A C E

2 a R b -b 1 b -b 1 P A P C A

2 c R

bP C

e2 E R

Figure 8.4. Genetically informed phenotypic regression, illustrated in Figure 8.4. As before, the outcome vari- including unmeasured confounds (only one twin shown). able is regressed on the unstandardized A and C com- Delinquency is regressed on religiosity along A, C and E ponents of the predictor, but instead of also being pathways. E pathway estimates bP. regressed on the E term, it is regressed on the full phe- notype of the predictor. This parameterization most religious have less delinquent children. This example, closely approximates the substantive goal of research in particular, demonstrates the usefulness of ostensibly of this kind, which is to examine the phenotypic “genetic” designs in evaluating hypotheses about phe- regression of Y on X while controlling for the between- notypic causation in the presence of genetic and envi- pair genetic and environmental confounds contained ronmental confounds. The structural equation mod- in A and C, respectively. els have the added benefit of explicitly parameterizing The interpretation of the coefficients in the geneti- genetic and shared environmental variance compo- cally informed phenotypic regression model is to mul- nents, as well as separate genetic and shared environ- tiply through the coefficients for the ACE regression mental regressions. However, fitting these structural Equation 8.6. Redistributing, we obtain, equation models requires substantially more program- Y = b ′ A + b ′′ C + (b b ′ )(A + C + E) (8.7) ming than the mixed effects models (see Appendix B A C P E at the end of the chapter for program). The sum of A, C,andE, of course, is simply X.Soin this parameterization, the regression on the pheno- ALTERNATIVE PARAMETERIZATIONS type tests for the quasi-causal effect, on the assump- tion that there are no confounds within pairs of iden- The genetically informed regression model we have tical twins. The A and C regressions test the difference described provides a straightforward test of hypothe- between the quasi-causal effect and the effects of the ses regarding environmental causation, yet a scan A and C confounds, respectively. They are equal to of the behavior genetics literature reveals a plethora zero when the quasi-causal relation is unconfounded. of alternative parameterizations for bivariate twin or In this parameterization, the sequence of inferences is sibling data. While these parameterizations are, in to test the phenotypic regression for the quasi-causal most cases, mathematically indistinguishable (Loehlin, effect, then test whether b and b differ from 0 to test 1996), there are important conceptual differences A C for the presence of confounding, then test the differ- among them that have implications for interpretative ence between b and b to determine if confounds are clarity. We include discussion of two reparameteriza- A C genetic or shared environmental in origin, and finally tions that were not included in Loehlin (1996). One of to test the residual variation in Y, as before. these we recommend, the other not. The results of fitting the genetically informed phe- notypic regression model are given in Table 8.3. The Genetically Informed Phenotypic Regression phenotypic regression, analogous to the non-shared A re-parameterization of the bivariate regression environmental regression in the previous models, is model that we find to be interpretively useful is equal to zero, suggesting that once shared familial

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 173

TABLE 8.3. Results from Alternative Parameterization of Bivariate Cholesky (with Phenotypic Path from Religiosity to Delinquency)

Model 8: Alternative Full Model (Phenotypic Path) Parameters Unstandardized Standardized Variance Components of Religiosity Additive Genetic 4.30 (1.21)∗ h2 = 24% Shared Environmental 9.06 (.121)∗ c2 = 50% Non-Shared Environmental 4.64 (.39)∗ e2 = 26% Regression Paths Areligion → Delinquency .043 (.050) β = .112 ∗ Creligion → Delinquency .061 (.021) β = .229 Religion → Delinquency .000 (.016) β =−.002 Residual Variance Components of Delinquency Additive Genetic .151 (.073)∗ h2 = 25% Shared Environmental .124 (.060)∗ c2 = 20% Non-Shared Environmental .334 (.026)∗ e2 = 55% Model Fit Indices χ 2 (df, P) 30.36 (17, .02) χ 2 ( df, P) − CFI / TLI .979 / .985 RMSEA .046

∗ Significant at P < .05.

confounds have been controlled, there is no reason E to the phenotype. We refer to this as an unstan- to hypothesize a quasi-causal relation between reli- dardized parameterization. It is far more common to giosity and delinquency. The values of the A and C standardize the A, C, and E components of the pre- regressions are the same as they were in the previous dictor variable to a variance of 1, and estimate the models, although this is not the usual result. In gen- paths to the observed variable, which we call a stan- eral, the A and C regressions in a phenotypic model dardized parameterization. In univariate behavioral will be equal to the differences between the A and C genetics, the choice between the standardized and regressions and the E regression in an ACE regression unstandardized parameterizations is trivial. In the un- model, which is to say they measure the magnitude standardized parameterization, the variance accoun- of the bias introduced by the A and C confounds. In ted for by A, for example, is equal to the estimated this example, however, the quasi-causal parameter is A variance; in the standardized parameterization, it is equal to exactly zero, so we conclude that the phe- equal to the square of the estimated path. The fit of notypic regression is all confound, and the A and C the models is identical, and in a univariate design they regressions are the same in the ACE and phenotypic are identical in interpretation. regression models. In the context of multivariate regression mod- els, however, these parameterizations have important conceptual differences. The complication arises from INTERPRETATION AND STANDARDIZATION the fact that the variances of the latent compo- The reader experienced in fitting SEM models to twin nents of X have consequences for the regressions of data may have noticed that we have chosen to param- Y on X. Return to Equation 8.7. Under conditions eterize our models by estimating A, C, and E vari- of no confounding, the unstandardized regression bE ances while fixing to 1.0 the paths from A, C, and estimates the quasi-causal regression bP, and the three

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 174 ERIC TURKHEIMER AND K. PAIGE HARDEN

ACE regressions bA, bC,andbE will be equal to each pose that data on religiosity and delinquency were other. If we consider instead the standardized regres- collected in the Netherlands rather than the United sions βA, βC,andβE, these relations no longer hold. States, and that the phenotypic causal relationship We have, instead, between religiosity and delinquency was the same there as here, but for whatever reason in the Nether- β = b var(A); β = b var(C ); β = b var(E) (8.8) A p c p E p lands twins varied less in their religiosity. If the un- When the shared and non-shared variances are stan- standardized model were fit, the investigator would dardized, the regression of Y on the latent variables gain insight into the similarities and differences no longer depends solely on the structural coefficient between the American and Dutch populations: the relating the phenotypes, but is a function of this coeffi- quasi-causal relations represented by the unstandard- cient and the magnitude of the A, C,andE components ized regressions (presumably the point of the study) of the predictor variable. Even when the phenotypic would be identical, but the A, C,andE variance com- relation between X and Y is invariant – a structural, ponents would be generally be different from each causally determined property – the magnitude of the other, and different in the Netherlands than they latent variances can be expected to vary from popula- were in the United States. If the standardized model tion to population and study to study. In particular – were fit, the structural invariance of the unstandard- as would be considered obvious in a typical regression ized parameter would be lost. The standardized regres- context – the amount of variance in Y accounted for sion parameters would be equal to the product of the by the ACE components of X depends on the relative invariant unstandardized parameters and the respec- magnitudes of the ACE components. tive ACE variances. The investigator would observe Although the issue of standardization may at first that the regression coefficient linking adolescent rule- seem to be a technical issue of biometric structural breaking to the E component of religiosity is larger in equation modeling, in fact it cuts to the heart of the Dutch study than in the American one, and, in the behavior genetic enterprise. We will therefore the opaque language commonly encountered in mul- make the point in a few different ways. One way tivariate behavior genetic research, conclude that the to understand the difference between standardized relation between religiosity and rule-breaking appears and unstandardized regressions is as the difference to be mediated along non-shared environmental between structural regression parameters (estimated pathways. by unstandardized regression coefficients) and vari- Another way of understanding the issue is in terms ance explained (estimated by standardized regres- of metrics. When a phenotype is partitioned into sions). Consider a concrete physical system that has unstandardized ACE components with estimated vari- been designed in a way that a change of one unit of X ances, all three components are expressed in the same causes a change of two units of Y, with no other causes units of X, so regression coefficients of Y on X are and no error. In any set of observations in which X all expressed in the same units, that is, units of Y varies, the unstandardized regression of Y on X will per units of X. If the latent components are standard- estimate the causal parameter – that is, two. (The stan- ized to have unit variance, however, their metric is dard error of the estimate will increase as the variance no longer in units of X, but instead is equal to the of X decreases, and of course no estimate can be com- standard deviations of each particular component. The puted when the variance of X becomes zero.) But what A regression is in units of Y per standard deviation is the amount of variance in Y that is accounted for by of A,theC regression in terms of the standard devi- X, or equivalently, the standardized regression coeffi- ation of C, and so forth. In a biometric study, the cient of Y on X? That quantity is equal to the square of magnitudes of these standard deviations are going to the causal parameter multiplied by the variance of X. differ from each other, and across studies they will In conditions of high variability in X, it will account for vary even more. Therefore the standardized regression a large amount of variability in Y, and in conditions of coefficients relating Y to the biometric components of low variability in X it will account for a small amount X will no longer be the same metric, and cannot be of variability; but the unstandardized parameter, two meaningfully compared. units of Y per unit of X, remains constant regardless of Suppose a social psychologist interested in the fun- how variance in X may change. damental attribution error set herself the task of deter- The same kind of thinking applies to regressions mining the percentage of variance that the fundamen- with error of the kind we are considering here. Sup- tal attribution error (FAE; the FAE refers to a tendency

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 175

to emphasize internal as opposed to situational expla- Heritability nations of the behavior of other people.) accounts for We have saved until last the most important rea- in some outcome. The research program would be a son for the misapprehension of research methods in lost cause, because even assuming a fixed causal effect behavioral genetics: the concept of heritability, which of the FAE that can be observed across situations, we have intentionally delayed mentioning until this there is no invariant percentage of variance accounted sentence. We have conducted a reasonably extensive for. In situations in which reliance on the FAE varies review of behavioral genetic research methods with- a great deal, it will account for a lot of variance in out once referring to the construct that most defenders Y; in situations where it hardly varies at all, it will or critics of the field view as its central idea. (The most account for very little. On the other hand, although recent biologically oriented broadside against behav- the question of how much variance is accounted for ior genetics [Charney (2012)] refers to the object of by the fundamental attribution error is ill-conceived its derision as “heritability studies.”) A full evaluation and not useful scientifically, it would be incorrect to of heritability would take us far afield, but note that conclude on this basis that the FAE itself was causally heritability is the proportion of phenotypic variability unimportant or that its effect could not be quantified. accounted for by the total effect of genotype (broad Based on either randomized experimentation or what- heritability) or by the additive effect of genotype (nar- ever quasi-experimentation can be cobbled together, row heritability). Heritability is thus a standardized causation is quantifiable by unstandardized regression variance component, and as such it is not invariant as coefficients, which are invariant against changes in the the genetic or phenotypic variance changes, so it is not variance of X. The percentage of variance explained, a meaningful indicator of the causal effect of genotype quantified by standardized regression parameters, is on phenotype. not invariant, even when the underlying causal pro- A good way to characterize the unstandardized cesses are. multivariate models we have described in this chap- To summarize: The structural constant underly- ter is that rather than being focused on the esti- ing covariation between a cause and an effect is the mation of heritability, they are designed to estimate unstandardized regression coefficient, expressing the relationships between variables that are invariant as units of Y caused by each unit of change in X.When regards heritability. Religiosity and delinquency are regressions are standardized, they no longer estimate heritable – everything is heritable or potentially so this quantity. Instead they estimate the variance in (Turkheimer, 2000) – and it would not be difficult to Y that is accounted for by X (if Y is also standardized, estimate heritability coefficients using the models we the variance explained will be a proportion), a quan- have described. The goal of our analyses, however, is tity that depends on the structural parameter and the not to estimate these coefficients, but rather to esti- variance of X. Questions of how many units of reduc- mate the part of regression of delinquency on reli- tion in delinquency are caused by an increase in one giosity that is independent of their heritabilities. We unit in religiosity, and whether that value is the same want to estimate the extent to which delinquency is for religiosity itself as for the uncontrolled genetic and causally associated with religiosity above and beyond environmental traits that confound it, despite their any variation in genetic background they may share. myriad methodological and interpretive complexities, In the same way, the models control for any shared at least have an invariant correct answer that a dili- environmental variability that is common to religios- gent investigator can hope to estimate. It is true that ity and delinquency, again without attending to the relying on unstandardized coefficients forces us to take magnitude of the shared environmental variance com- seriously the sometimes arbitrary units in which our ponent. The only way to accomplish such invariance constructs are measured, but that is ultimately a good is to estimate unstandardized regressions, because the thing, and in any event throwing the units away by standardized alternative will depend in part on magni- standardizing makes regression analyses even harder tudes of the biometric variances. to interpret. How much variance in delinquency is accounted for by religiosity and its ACE components MOLECULAR GENETIC APPROACHES has no invariant answer: It depends on the variability of delinquency and its biometric decomposition in a Twenty years ago, most scientists studying the genetics particular situation, and is not a worthwhile scientific of personality would not have predicted that the most question. crucial questions regarding causation would involve

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 176 ERIC TURKHEIMER AND K. PAIGE HARDEN

complex issues in the design and analysis of twin stud- locate some of the variance in personality somewhere ies. As the Human Genome Project neared its com- on the X chromosome. The results of the study fore- pletion, it was widely anticipated that the availability shadowed much of what has happened since in the of data from actual DNA, as opposed to the statisti- molecular genetics of personality. Of the 16 scales of cal inferences of quantitative genetics, would provide the 16PF, one (Q2, self-sufficiency vs. group adher- the causal, biologically based foundation that twin and ence) was more similar in the brothers concordant for family studies lacked. From the beginning, personal- color-blindness compared to the discordant brothers: ity has played a signature role in the development of The authors conclude with a recommendation that the what are called “molecular” genetic methods for the finding be replicated. The authors do not even men- study of behavior, providing some of the earliest suc- tion statistical power, but with 17 sibling pairs it is cesses but also some of the greatest frustrations. On the obviously quite limited. More advanced linkage meth- assumption that the molecular genetic methods are ods (e.g., Fullerton et al., 2003) use multiple mark- not yet as widely incorporated into the general body ers to evaluate the probability of linkage continuously of research methods in personality, we review some across the genome. of the basic techniques that are available. The review Linkage analysis has an important disadvantage as suggests that, in fact, the problems of scientific infer- a means of studying personality. Although linkage ence facing DNA-based studies of behavior turn out to can be detected with reasonable power using plausi- have much in common with traditional family studies: ble sample sizes for single genes of large effect, it is The core scientific problem is still the inference of cau- almost certain that no such genes exist for normal sation in a nonexperimental setting, and the contrast- variation in personality. For multiple genes of very ing of comparisons within and between family mem- small effect, which is just as certainly the situation that bers continues to play a crucial role. does obtain, the power to detect linkage is very small (Risch, 1990). At the same time, when genetic mul- tiple markers are used in combination with multiple Linkage Analysis potential outcomes, the number of significance tests The first DNA-based method that was applied to employed increases rapidly, meaning that both Type personality, linkage analysis operates within families. 1 and Type 2 error rates are often severely inflated. The word “linkage” refers the nonindependence of Another problem is that linkage analysis does not genetic loci that occur close to each other on a chro- identify a single genetic locus that is associated with mosome, a phenomenon called “linkage disequilib- a phenotype, but rather a region of a chromosome rium” (LD). In general, genes on different chromo- within which variation in a gene is likely to be in LD somes are passed on independently, and crossover with an outcome. Further analysis must be conducted processes lead to independence for genes well sepa- (using association methods described later in the chap- rated on a single chromosome. Genes close together ter) to establish the particular gene that is responsible on the same chromosome will tend to be transmitted for the linkage. together, however. If within a family (either a com- plete pedigree or a pair of siblings and their parents), Candidate Gene Studies individuals who share a behavioral trait are also iden- tical by descent (IBD) for a particular gene, it can be In contrast to linkage analysis, which is conducted inferred that the trait in question is related to the gene within families, most candidate gene or association stud- or to another gene close to it on the chromosome. ies are conducted between families. In its most basic Linkage analysis has been the earliest molecu- form, the candidate gene study is about as simple as a lar method to be adopted in the study of behav- study can be: A candidate genetic locus is mapped and ior because it requires minimal knowledge of actual a personality variable measured in a sample of unre- genetic sequence. The first linkage study of person- lated individuals, and the outcome of the study is the ality to be reported (Benjamin, Press, Maoz, & Bel- correlation between the two. Most of the well-known maker, 1993) looked for linkage between the 16 PF molecular genetic studies of personality have been and phenotypic color-blindness in 17 pairs of brothers association studies. In what is still the most widely of whom at least one was color-blind. Because color- cited study of this kind, Ebstein et al. (1996) studied blindness was known to be caused by a single X-linked the relation between novelty seeking as measured by locus, to the extent brothers concordant for color- Tridimensional Personality Questionnaire (Cloninger, blindness were more similar for personality, it would Svrakic, & Przybeck, 1993) and the D4 dopamine

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 177

receptor gene (DRD4) in a sample 124 normal vol- any trait of psychological interest, any one association unteers. Thirty-four participants who had at least one would be tiny at best. The extremity of the inferen- copy of the seven repeat allele scored about half a stan- tial problems brought on by this situation has led to a dard deviation higher than 90 who did not. radical restructuring of the way GWAS-based science Whether or not the relation between DRD4 and is conducted. When candidate gene studies were first novelty seeking has held up is a matter of meta- introduced, the theory-dependent process by which analytic controversy that we cannot review here genes became candidates seemed like a tonic for the (Kluger, Siegfried, & Epstein, 2002; Munafo, Yalcin, atheoretical gene searches of the early linkage era. But Saffron, & Flint, 2008). What has become crystal clear, it turned out that the theory by which genes were however, is that the effect size of half a standard devi- selected could not be separated from the chaotic prob- ation was unrealistically large. In the years since the lems of the winner’s curse and publication bias. GWAS heady early days of the genome project, it has become finally made these problems intractable, and the field clear that associations between individual genetic loci has reversed course: rather than being guided by and psychological outcomes are small and context- theory-driven hypotheses, GWAS is now conducted dependent, even when they appear to be statistically completely atheoretically, using highly stringent reliable. It has proven extremely difficult to screen (p < 10–8) significance levels to guard against Type I reports of association studies for what is known as the error. GWAS has produced important discoveries in “winner’s curse” (Xiao & Boehnke, 2009): Faced with the medical domain (Visscher & Montgomery, 2009), thousands of potential alleles, thousands of potential but it has been disappointing at best in the behavioral outcomes, intense pressure to publish, and a strin- sciences, and particularly so for personality (Munafo, gent peer-review system that prefers positive results, Clark, Moore, Payne, Walton, & Flint, 2003). even a well-intentioned and honest community of There is another threat to the validity of association scientists will produce effect sizes that are severely studies, usually identified with GWAS but in fact rel- biased upward. The winner’s curse is not exclusive evant to all studies of genetic association: population to genomics; it is rampant in the behavioral sciences stratification, sometimes called the “chopsticks gene” generally. It is just that the modern technology of problem (Hamer, 2000). Here is how one could find genomics has brought with it an expectation of scien- a gene associated with the use of chopsticks. Assem- tific rigor and replicability that the social sciences have ble a sample consisting of half North American and long since gotten used to doing without (Turkheimer, half Japanese participants, and identify a gene – any 2011). gene will do – that occurs more frequently in the Japanese than in the North Americans. Assuming the Japanese are more likely to use chopsticks, eating Genome-Wide Association Studies style will necessarily be correlated with variation in The Gordian knot of methodology in association the gene. Population stratification is usually controlled studies – myriad hypotheses, small effects, inadequate either by ensuring ethnic homogeneity in the sample power, and results that seem to depend on context – or by using statistical methods like principal compo- has combined with the next wave of technological nent analysis to identify ethnic dimensions in the SNPs possibility in genomics to launch a new paradigm of and then controlling for them statistically. The prob- DNA-based research. Genome-wide association stud- lem, however, extends beyond the confines of eth- ies, or GWAS, capitalize on the existence of sin- nic identification (Turkheimer, 2011). In the context gle nucleotide polymorphisms (SNPs), individual seg- of GWAS, culture of origin is an instance of the core ments of DNA that take only two values of the four problem that we identified in the analysis of twin stud- (ATCG) that are possible. Upward of a million SNPs ies – a shared environmental confound. In the chop- can be inexpensively assessed in a dense array across sticks example, the problem is not that the candidate the genome. SNPs are not genes – they are indicators gene and chopsticks may not actually be associated, of genes, and associations between SNPs and outcomes because their correlation is a simple statistical fact. are indicative of corresponding associations with genes The problem is that the statistical association is not an at some location on the chromosome close enough to indicator of an actual causal pathway from the gene to be in LD with the SNP. the chopstick, in exactly the same sense that holds for With GWAS, it is possible to search for associations an association between religiosity and delinquency. In between personality and literally millions of locations fact, in a mixed sample of Japanese and North Amer- in the genome. It quickly became apparent that for ican youth in which the Japanese are more religious

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 178 ERIC TURKHEIMER AND K. PAIGE HARDEN

and less delinquent, there would be a statistical associ- complex trait analysis (GCTA) uses GWAS data in a ation that would probably not be indicative of a causal novel way that closes the methodological gap between relation. quantitative and molecular genetics (Visscher, Yang, & What can be done to rescue causal inference under Goddard, 2010; Yang et al., 2010; see also Turkheimer, such extreme nonexperimental conditions? Resisting 2012). In a sample of individuals from whom SNP the temptation to say “nothing,” one interesting pos- chips have been obtained, pairwise coefficients are sibility is to return to within- and between-families computed that quantify the degree of genetic similar- approach that we endorsed as a structure for causal ity between pairs of individuals across the SNPs. These inference in twin studies, and which formed the basis coefficients, which are analogous to the coefficients of for linkage studies in the early days of molecular genetic relatedness in twin and family studies (e.g., genetics. An association between a gene and chopstick siblings are on average 50% genetically related), are use within families (the sibling with the gene used generally close to zero, and in fact pairs with coef- chopsticks; the one without the gene did not) effec- ficients higher than 2.5% are usually eliminated as tively rules out population stratification as an alter- genetically related. Once the genetic similarity matrix native explanation. Designs combining within- and is obtained, one can compute the relationship between between-families genetic associations and the statis- the degree of SNP-based genetic similarity and the tical methods to analyze them are available and pre- degree of similarity in the trait of interest, obtaining date GWAS (Fulker, Cherny, Sham, & Hewitt, 1999), a proportion of variance that is essentially a heritabil- but have not been widely employed for personality. ity coefficient computed in a sample of unrelated indi- One exception is a study by Middeldorp, de Geus, viduals. These heritabilities are generally smaller than Beem, Lakenberg, Hottenga, Slagboom, and Boomsma those computed from family members, but consider- (2007), which studied the relation between the sero- ably larger than the percentage of variance that can be tonin transporter gene (5-HTTLPR) and neuroticism, obtained by adding up the effects of individual SNPs or anxiety, and depression in a sample of 1,804 twins, genes. both sibling and parents. Only two of the eighteen Vinkhuyzen et al. (2012) conducted GCTA of within-family tests reached a significance level of neuroticism and extraversion scores in a sample of p<.05, leading the authors to reject the hypothesis of a approximately 12,000 individuals collected from sev- causal relation between 5-HTTLPR and the outcomes. eral research centers. Across the centers, the traits had The molecular genetics of personality has reached quantitative heritabilities (computed in the usual way) a conundrum. One can design “theory-driven” stud- of .4 to .45. In contrast, 6% of the variation in neuroti- ies within and between families, which control for cism and 12% of the variation in extraversion could be a subset of potential confounds of genomic causa- explained by SNP-based similarity. These proportions tion, but which are unavoidably contaminated by data are somewhat smaller than they have been for other exploration and the winner’s curse, cherry picking of traits, like height and intelligence, for which almost results, and publication bias. These studies wind up half of the phenotypic heritability has been recovered looking like non-genomic social science: locally inter- from the SNPs, although 6% and 12% are still signif- esting but frustratingly noncumulative. Or, one can icantly more than the 1% that can be recovered by opt for GWAS of massive populations with tiny p lev- quantifying the effects of individual genes. It should els, atheoretical by design and blind to the possibil- be emphasized that the methodology of GCTA is in ity of noncausal confounds, hoping for a few reliably fact much more similar to a twin study than it is to significant effects that collectively account for a few a GWAS. No distinction is made between the effects of percent of the variance at best, and which have not, individual SNPs, and no inference of the causal effects in the behavioral sciences at any rate, produced sub- of individual SNPs is even attempted (Turkheimer, stantive causal science. What would seem to be the 2012). logical compromise – GWAS of enormous samples of siblings – simply isn’t practical. CONCLUSIONS AND RECOMMENDATIONS Historically, it is undeniable that behavior genetics has Genome-Wide Complex Trait Analysis progressed from Galtonian ideas about “nature and We close this section with a consideration of nurture,” by way of supportable notions of heritabil- the newest molecular genetic method. Genome-wide ity in animal breeding, to a long era of concern, if not

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 179

outright obsession, with the values of heritability coef- the phenotypic association to risk of disconfir- ficients for human individual differences. Other than mation. the important task of disconfirming any remnants of 2. Regression-based genetically informative analy- blank-slate environmentalism mistakenly held over ses can be conducted more or less equivalently from previous eras of behaviorism or psychoanalysis, using multilevel models, structural equation mod- this effort was in our view not especially productive. els, or a combination of the two. Multilevel mod- Heritability is greater than zero for all individual dif- els usually have the advantage of being easier to ferences, and takes a determinate value for none of code, whereas structural equation models have them. Figuring out how “genetic” traits are, either in the advantage of greater flexibility, especially in absolute terms or relative to each other, is a lost cause: their ability to re-parameterize random variances Everything is genetic to some extent and nothing is into the familiar biometric components. completely so. There is little more to be said. 3. Although behavior genetic designs are commonly But despite the endless assertions of heritability thought of as a means of identifying and con- and the similarly endless denunciations of behavior trolling genetic effects, shared environmental con- genetic studies and their conclusions, both of which founds are often equally important threats to continue unabated to this day, we contend that her- causal hypotheses. If neuroticism is associated itability was never the most important motivation with poorer school performance, but living in for human of behavioral genetics. Instead, behavioral a violent neighborhood contributes to both, the genetics is justified by the simple observation that shared environmental effect of neighborhood is there is more than one reason why differences among an alternative, noncausal environmental explana- people are correlated with each other, either within tion of the phenotypic association. individual lives or across genetically related individu- 4. Causal hypotheses are almost always about phe- als. There are genetic as well as environmental reasons notypic relations among variables, not relations why extraverted mothers have extraverted children, among abstract variance components presum- or why religiously committed youth are less likely to ably representing genetic and environmental pro- become delinquent. Any question worth asking about cesses underlying observed behavior. Non-shared the behavioral genetics of personality comes down to environmental regressions are usually the best a question about why two or more traits are related available estimate of causal relations among to each other, and like any other kind of association- uncontrolled variables, because neither genes nor based psychology, such questions are ultimately about shared environments can account for them. The whether and how one trait causes another. Once that non-shared environment plays a special role in point is conceded, a huge segment of nonexperimental genetically informed social scientific methodol- human psychology threatens to collapse unless geneti- ogy because it encompasses associations among cally informative designs can be called on to support it, variables that cannot be accounted for by shared and such designs turn out not to depend on point esti- genes or environments, and are thus more plausi- mates of heritability at all; indeed, their correct analy- ble instances of phenotypic causation. sis relies on methods that are invariant with regard to 5. Notwithstanding the aforementioned, uncon- changes in the genetic and environmental variability trolled associations within identical twin pairs of individual differences. are not immune from confounds, and behavior Our analysis leads to a several specific recom- genetic methodology is ultimately just another mendations for the conduct of genetically informed quasi-experimental tool in the social scientific research in personality, and we will close by enumer- workshop. Once phenotypic associations have ating them. survived exposure to analyses of genetic and shared environmental confounds, confidence in 1. Behavioral genetic investigations of relations the causal relation may increase, but it is not among personality variables or between person- proven. We prefer the term “quasi-causal” to ality and exogenous variables should begin with describe the hypothesis that remains. an observation of a phenotypic association, which 6. From the point of view of understanding will usually be uncontrolled by random assign- the relationship between behavior genetics and ment. The goal of the genetically informed part the rest of naturalistic developmental psychol- of the analysis is to expose the causal basis of ogy, the inferential imperfection of genetically

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 180 ERIC TURKHEIMER AND K. PAIGE HARDEN

informed designs is a good thing. Too often, does matter, because familial variance in neuroti- proponents and detractors of behavior genetics cism and its familial covariance with other traits describe the discipline as though it were some- are alternative explanations of causal hypotheses how alien to the rest of social scientific methodol- about its phenotypic origins and consequences. ogy, generating either robustly scientific or falsely 10. Molecular genetic methods have added to the reductionist genetic counterhypotheses to psycho- tools available to behavior geneticists, but they logical theories of human development. Neither have not replaced twin studies and quantitative is true. Behavior genetics is only a threat to psy- genetic statistical methods. Just as we have con- chological theories in the same sense that the tended that the goal of twin studies was never cross-lagged panel design is. Yes, behavior genetic to quantify the magnitude of genetic effects on designs can sometimes make it harder to believe phenotypes, the goal of molecular genetics is not in causal hypotheses (Turkheimer & Waldron, to discover the individual genes that underlie dif- 2000), but that is as it should be, and ultimately ferences in personality. Instead, the goal of both behavior genetics is no more probative of causal quantitative and molecular genetics is to aid in relations than any other quasi-experimental the identification of causal processes in develop- method. ment, and in that regard molecular genetics faces 7. To a surprising degree, issues of standardization many of the same problems as quantitative genet- are crucial to placing behavior genetic methodol- ics, often in even more intractable forms. ogy on a strong foundation. Since Tukey, it has 11. Viewed in this way – as a quasi-experimental been well-understood that only unstandardized method that sits alongside many others – behav- regression coefficients provide invariant estimates ior genetic research methods can be seen for of causal relations in the face of changes in the what they truly are rather than as the threat- variances of predictor and outcome, but unfortu- ening or na¨ıve stereotypes that are often repre- nately those old insights have been lost in con- sented. Behavior genetics is not a radical, reduc- temporary practice that still relies on correlations tive alternative to psychological explanation of and standardized “beta weights.” Accounting for behavior, as earlier critics once feared (Lewon- variance and explaining causation are two differ- tin, Rose, & Kamin, 1984), and it is not a poorly ent and ultimately independent enterprises, and specified, dumbed-down version of the astound- science is almost always properly concerned with ing understanding of genomics we have achieved the latter. at a biological level of analysis, as more recent 8. As Tukey famously said about correlation coef- critics have contended (Charney, 2012). Behavior ficients, we believe that the world would be a genetics is not oversimplified genomics any more less confusing and contentious place without her- than nonexperimental developmental psychology itability coefficients, at least if one is concerned is oversimplified developmental biology. Behav- with a more complex and uncontrollable aspect of ior genetics is ordinary social science, with all the behavior than, say, milk production in cows. As problems that come with a lack of experimental with the heritability of milk production, the her- control; it is, however, social science conducted a itability of neuroticism informs us that we could little more carefully, analyzed with a little more selectively breed for human neuroticism if we realism about why individual differences are asso- wanted to, but fortunately we do not. Genes, in ciated with each other, and interpreted with a lit- the very abstract sense in which the term is used tle more skepticism about the vagaries of correla- in human quantitative genetics, influence neu- tion and causation. roticism, and this will generally ensure that the heritability of neuroticism is not zero. Beyond that the numerical value of heritability is indetermi- REFERENCES nate, and the question of “how important” genes Alpert, H. (1939). Emile Durkheim and sociologismic psy- are to differences in neuroticism has no meaning- chology. American Journal of Sociology, 45(1), 64–70. ful answer. Baier, C. J., & Wright, B. R. E. (2001). “If you love me, 9. Skepticism about the utility of heritability coef- keep my commandments”: A meta-analysis of the effect ficients should not be a basis for believing that of religion on crime. Journal of Research in Crime and Delin- genetic variance in neuroticism does not matter. It quency, 38(1), 3–21.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 181

Benjamin, J., Press, J., Maoz, B., Belmaker, R. H. (1993). Harden, K. P., Mendle, J., Hill, J. E., Turkheimer, E., & Linkage of a normal personality trait to the color- Emery, R. E. (2008). Rethinking timing of first sex and blindness gene: Preliminary evidence. Biological Psychia- delinquency. Journal of Youth and Adolescence, 37(4), 373– try, 34(8), 581–583. 385. Browne, M. W., & Cudeck, R. (1993). Alternative ways of Harris, K. M. (2011). Design features of Add Health.Chapel assessing model fit. In K. A. Bollen, & J. S. Long (Eds.), Hill, NC: Carolina Population Center at the University of Testing structural equation models (pp. 136–162). Newbury North Carolina at Chapel Hill. Available at http://www. Park, CA: Sage Publications. cpc.unc.edu/projects/addhealth/data/guide. Burt, S. A. (2009). A mechanistic explanation of popularity: Harris, K. M., Halpern, C. T., Smolen, A., & Haberstick, B. Genes, rule breaking, and evocative gene-environment C. (2006). The National Longitudinal Study of Adoles- correlations. Journal of Personality and Social Psychology, cent Health (Add Health) twin data. Twin Research and 96(4), 783–794. Human Genetics, 9, 988–997. Carlin, J. B., Gurrin, L. C., Sterne, J. A., Morley, R., & Hopper, J. L., & Seeman, E. (1994). The bone density of Dwyer, T. (2005). Regression models for twin studies: A female twins discordant for tobacco use. New England critical review. International Journal of Epidemiology, 34(5), Journal of Medicine, 330(6), 387–392. 1089–1099. Jackson, J. J., Thoemmes, F., Jonkmann, K., Ludtke,¨ O., Cattell, R. B. (1957). Personality and motivation structure and & Trautwein, U. (2012). Military training and personal- measurement. Oxford: World Book Company. ity trait development does the military make the man, Charney, E. (2012). Behavior genetics and postgenomics. or does the man make the military? Psychological Science, Behavioral and Brain Sciences, 35(5), 331–358. 23(3), 270–277. Cloninger, C. R., Svrakic, D. M., & Przybeck, T. R. (1993). Kazdin, A. E. (1997). Parent management training: Evi- A psychobiological model of temperament and character. dence, outcomes, and issues. Journal of the American Archives of General Psychiatry, 50(12), 975–990. Academy of Child and Adolescent Psychiatry, 36(10), 1349– Ebstein, R. P., Novick, O., Umansky, R., Priel, B., Osher, 1356. Y., Blaine, D., Bennett, E. R., Nemanov, L., Katz, M., & Koenig, L. B., McGue, M., Krueger, R. F., & Bouchard, Belmaker, R. H. (1996). Dopamine D4 receptor (D4DR) T. J. (2005). Genetic and environmental influences exon III polymorphism associated with the human per- on religiousness: Findings for retrospective and current sonality trait of Novelty Seeking. Nature Genetics, 12, 78– religiousness ratings. Journal of Personality, 73(2), 471– 80. 488. Ellison, C. G., & Sherkat, D. E. (1995). The “semi- Laub, J. H., & Sampson, R. J. (1993). Turning points in the involuntary institution” revisited: Regional variations life course: Why change matters to the study of crime. in church participation among Black Americans. Social Criminology, 31(3), 301–325. Forces, 73(4), 1415–1437. Lewontin, R. C., Rose, S. R., & Kamin, L. J. (1984). Not in Farrington, D. P. (2005). Childhood origins of antisocial our genes: Biology, ideology, and human nature. : behavior. & Psychotherapy, 12(3), 177– Pantheon Books. 190. Littlefield, A. K., Sher, K. J., & Wood, P. K. (2009). Federal Bureau of Investigation. (2004). Crime in the United Is “maturing out” of problematic alcohol involvement States: 2004. Washington, DC: U.S. Government Printing related to personality change? Journal of Abnormal Psy- Office. chology, 118(2), 360. Fulker,D.W.,Cherny,S.S.,Sham,P.C.&Hewitt,J.K. Loehlin, J. C. (1996). The Cholesky approach: A cautionary (1999). Combined linkage and association sib-pair anal- note. Behavior Genetics, 26(1), 65–69. ysis for quantitative traits. American Journal of Human Mann, V., De Stavola, B. L., & Leon, D. A. (2004). Sepa- Genetics, 64(1), 259–267. rating within and between effects in family studies: An Fullerton, J., Cubin, M., Tiwan, H., Wang, C., Bomhra, application to the study of blood pressure in children. A., Davidson, S., Miller, S., Fairburn, C., Goodwin, G., Statistics in Medicine, 23(17), 2745–2756. Neale, M. C., Fiddy, S., Mott, R., Allison, D. B., & Flint, Markon, K. E., & Krueger, R. F. (2004). An empirical J. (2003). Linkage analysis of extremely discordant and comparison of information-theoretic selection criteria for concordant sibling pairs identifies quantitative-trait loci multivariate behavior genetic models. Behavior Genetics, that influence variation in the human personality trait 34(6), 593–610. neuroticism. American Journal of Human Genetics, 72(4), McArdle, J. J., & Prescott, C. A. (2005). Mixed-effects vari- 879–890. ance components models for biometric family analyses. Gosling, S. D., & John, O. P. (1999). Personality dimen- Behavior Genetics, 16(1), 163–200. sions in nonhuman animals: A cross-species review. Cur- McGue, M., Iacono, W. G., & Krueger, R. (2006). The asso- rent Directions in Psychological Science, 8(3), 69–75. ciation of early adolescent problem behavior and adult Hamer, D. (2000). Beware the chopsticks gene. Molecular psychopathology: A multivariate behavioral genetic per- Psychiatry, 5(1), 11–13. spective. Behavior Genetics, 36(4), 591–602.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 182 ERIC TURKHEIMER AND K. PAIGE HARDEN

Middeldorp, C. M., deGeus, E. J. C., Beem, A. L., Laken- Raftery, A. E., & Richardson, S. (1996). Model selection for berg, N., Hottenga, J., Slagboom, P. E., & Boomsma, D. I. generalized linear models via GLIB, with application to (2007). Family based association analyses between the epidemiology.InD.A.Berry&D.K.Stangl(Eds.),Baye- serotonin transporter gene polymorphism (5-HTTLPR) sian biostatistics (pp. 321–354). New York: Marcel Dekker. and neuroticism, anxiety and depression. Behavior Genet- Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear ics, 37(2), 294–301. models: Applications and data analysis methods. Thousand Miles, D. R., & Carey, G. (1997). Genetic and environmen- Oaks, CA: Sage Publications. tal architecture of human aggression. Journal of Personal- Reiss, D., Neiderhiser, J. M., Hetherington, E. M., & Plomin, ity and Social Psychology, 72(1), 207–217. R. (2000). The relationship code: Deciphering genetic and Moffitt, T. E. (1993). Adolescence-limited and life-course- social influences on adolescent development. Cambridge, MA: persistent antisocial behavior: A developmental taxon- Harvard University Press. omy. Psychological Review, 100(4), 674–701. Risch, N. (1990). Linkage strategies for genetically complex Morley, R., Moore, V. M., Dwyer, T., Owens, J. A., Umstad, traits. II. The power of affected relative pairs. American M. P., & Carlin, J. B. (2005). Association between ery- Journal of Human Genetics, 46(2), 229–241. thropoietin in cord blood of twins and size at birth: Does Sampson, R. J., Morenoff, J. D., & Gannon-Rowley, T. it relate to gestational factors or to factors during labor or (2002). Assessing “neighborhood effects”: Social pro- delivery? Pediatric Research, 57, 680–684. cesses and new directions in research. Annual Review of Munafo, M. R., Clark, T. G., Moore, L. R., Payne, Sociology, 28, 443–478. E., Walton, R., & Flint, J. (2003). Genetic polymor- Schwarz, G. (1978). Estimating the dimension of a phisms and personality in healthy adults: A systematic model.The Annals of Statistics, 6(2), 461–464. review and meta-analysis. Molecular Psychiatry, 8(5), 471– Scott, J. P., & Fuller, J. L. (1965).Genetics and the social behav- 484. ior of the dog. Chicago: University of Chicago Press. Munafo, M. R., Yalcin, B., Saffron, A. W., & Flint, J. (2008). Shipley, B. A., Weiss, A., Der, G., Taylor, M. D., & Deary, Association of the dopamine D4 receptor (DRD4) gene I. J. (2007). Neuroticism, extraversion, and mortal- and approach-related personality traits: Meta-analysis ity in the UK Health and Lifestyle Survey: A 21-year and new data. Biological Psychiatry, 63(2), 197–206. prospective cohort study. Psychosomatic Medicine, 69(9), Muthen,´ B., & Muthen,´ L. (2010). Mplus user’s guide (6th 923–931. ed.). Los Angeles, CA: Muthen´ & Muthen.´ Smith, D. N. (2003).Hinduism and modernity. Malden, MA: Neale, M. C., & Maes, H. H. (2007). Methodology for genetic Blackwell Publishing. studies of twins and families. Dordrecht, The Netherlands: Spitz, E., Moutier, R., Reed, T., Busnel, M. C., & Marcha- Kluwer Academic. land, C. (1996). Comparative diagnoses of twin zygosity Oltmanns, T. F., & Turkheimer, E. (2009). Person percep- by SSLP variant analysis, questionnaire, and dermato- tion and personality pathology. Current Directions in Psy- glyphic analysis. Behavior Genetics, 26(1), 55–63. chological Science, 18(1), 32–36. Spotts, E. L., Pederson, N. L., Neiderhiser, J. M., Reiss, D., Pearce, L. D., & Axinn, W. G. (1998). The impact of fam- Lichtenstein, P., Hansson, K., & Cederblad, M. (2005). ily religious life on the quality of mother-child relations. Genetic effects on women’s positive mental health: Do American Sociological Review, 63(6), 810–828. marital relationships and social support matter? Journal Penke, L., Denissen, J. J. A., & Miller, G. F. (2007). The of Family Psychology, 19(3), 339–349. evolutionary genetics of personality. European Journal of Steiger, J. H. (1990). Structural model evaluation and mod- Personality, 21(5), 549–587. ification: An interval estimation approach. Multivariate Pike, A., McGuire, S., Hetherington, E. M., Reiss, D., & Behavioral Research, 25(2), 173–180. Plomin, R. (1996). Family environment and adolescent Tellegen, A., Lykken, D. T., Bouchard, T. J., Wilcox, K. J., depressive symptoms and antisocial behavior: A mul- Segal, N. L., & Rich, S. (1988). Personality similarity in tivariate genetic analysis.Developmental Psychology, 32(4), twins reared apart and together. Journal of Personality and 590–603. Social Psychology, 54(6), 1031–1039. Plomin, R., DeFries, J. C., Knopik, V. S., & Neiderhiser, Tukey, J. W. (1954). Causation, regression, and path anal- J. M. (2013). Behavioral genetics (6th ed.). New York: ysis.InO.Kempthorne,T.A.Bancroft,J.W.Gowen,& Worth Publishers. J. L. Lush (Eds.), Statistics and mathematics in biology (pp. Quinn, P. D., Stappenbeck, C. A., & Fromme, K. (2011). 35–66). Ames: Iowa State University Press. Collegiate heavy drinking prospectively predicts change Tukey, J. W. (1969). Analyzing data: Sanctification or in sensation seeking and impulsivity. Journal of Abnormal detective work? American Psychologist, 24(2), 83–91. Psychology, 120(3), 543. Turkheimer, E. (2000). Three laws of behavior genetics and Quinsey, V. L., Skilling, T. A., Lalumiere,` M. L., & Craig, what they mean. Current Directions in Psychological Science, W. M. (2004). Juvenile delinquency: Understanding the ori- 9(5), 160–164. gins of individual differences. Washington, DC: American Turkheimer, E. (2011). Genome wide association studies Psychological Association. of behavior are social science. In K. S. Plaisance & T. A.

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 183

C. Reydon (Eds.), Philosophy of behavioral biology (pp. 43– of the heritability for human height.’ Twin Research and 64). New York: Springer. Human Genetics, 13, 517–524. Turkheimer, E. (2012). Still missing. Research in Human Wallace, J. M., Brown, T. N., Bachman, J. G., & Laveist, Development, 8(3–4), 227–241. T. A. (2003). The influence of race and religion on absti- Turkheimer, E., & Waldron, M. (2000). Nonshared envi- nence from alcohol, cigarettes and marijuana among ronment: A theoretical, methodological, and quantita- adolescents. Journal of Studies on Alcohol, 64(6), 843–848. tive review. Psychological Bulletin, 126(1), 78–108. Wilcox, W. B. (1998). Conservative protestant childrear- Vinkhuyzen, A. A. E., Pedersen, N. L., Yang, J., Lee, S. H., ing: Authoritarian or authoritative? American Sociological Magnusson, P. K. E, et al. (2012). Common SNPs explain Review, 63(6), 796–809. some of the variation in the personality dimensions of Wilson, J., & Sherkat, D. E. (1994). Returning to the fold. neuroticism and extraversion. Translational Psychiatry, 2, Journal for the Scientific Study of Religion, 33(2), 148–161. e102. doi:10.1038/tp.2012.27 Xiao, R., & Boehnke, M. (2009). Quantifying and correct- Visscher, P. M., & Montgomery, G. W. (2009). Genome- ing for the winner’s curse in genetic association studies. wide association studies and human disease: From Genetic Epidemiology, 33(5), 453–462. trickle to flood. Journal of the American Medical Association, Yang, J., Benyamin, B., McEvoy, B. P., Gordon, S., 302(18), 2028–2029. Henders, A. K. et al. (2010).Common SNPs explain a Visscher, P. M., Yang, J., & Goddard, M. E. (2010). A com- large proportion of the heritability for human height. mentary on ‘common SNPs explain a large proportion Nature Genetics, 42(7), 565–569.

APPENDIX A: SAS CODE f11 = Delinquency factor score at Wave I religsum = Individual religiosity sum score religavg = Twin pair average religiosity religdev = Individual deviation from twin pair average religiosity MZ = 0 or 1; Monozygotic twin pair

MODEL 1 proc mixed data=mztwin12 method= ml covtest noclprint; class pair; model f11 = religsum / solution; random intercept / subject=pair type=vc; run;

MODEL 2 proc mixed data=mztwin12 method= ml covtest noclprint; class pair; model f11 = religsum religavg / solution; random intercept / subject=pair type=vc; run;

MODEL 3 proc mixed data=mztwin12 method = ml covtest noclprint; class pair; model f11 = religdev religavg / solution; random intercept / subject=pair type=vc; run;

MODEL 4 proc mixed data=twin12 method = ml covtest noclprint; class pair MZ twinid;

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 184 ERIC TURKHEIMER AND K. PAIGE HARDEN

model f11 = religdev religavg religdev*MZ / solution; random intercept / subject=pair group=MZ type=vc; repeated / group=MZ type=vc; run;

SAS OUTPUT FOR MODEL 4 The SAS System 12:43 Thursday, December 1, 2011 75 The Mixed Procedure Model Information Data Set WORK.TWIN12 Dependent Variable f11 Covariance Structure Variance Components Subject Effect PAIR Group Effects MZ, MZ Estimation Method ML Residual Variance Method None Fixed Effects SE Method Model-Based Degrees of Freedom Method Containment Dimensions Covariance Parameters 4 Columns in X 5 Columns in Z Per Subject 2 Subjects 644 Max Obs Per Subject 2 Number of Observations Number of Observations Read 1370 Number of Observations Used 1286 Number of Observations Not Used 84 Iteration History Iteration Evaluations -2 Log Like Criterion 0 1 3035.96177444 1 2 2906.05165136 0.00000000 Convergence criteria met. Covariance Parameter Estimates Standard Z Cov Parm Subject Group Estimate Error Value Pr > Z Intercept PAIR MZ 0 0.2112 0.03214

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 185

(CONTINUED ON NEXT PAGE) 6.57 <.0001 Intercept PAIR MZ 1 0.3354 0.04633 7.24 <.0001 Residual MZ 0 0.3889 0.02778 14.00 <.0001 Residual MZ 1 0.3172 0.02831 11.20 <.000

The SAS System 12:43 Thursday, December 1, 2011 76 The Mixed Procedure Fit Statistics -2 Log Likelihood 2906.1 AIC (smaller is better) 2922.1 AICC (smaller is better) 2922.2 BIC (smaller is better) 2957.8 Solution for Fixed Effects Standard Effect MZ Estimate Error DF t Value Pr > |t| Intercept -0.4036 0.06920 641 -5.83 <.0001 religdev -0.00848 0.01681 641 -0.50 0.6138 religavg 0.05042 0.006817 641 7.40 <.0001 religdev*MZ 0 0.02717 0.02068 641 1.31 0.1894 religdev*MZ 1 0 . . .. Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F religdev 1 641 0.24 0.6221 religavg 1 641 54.69 <.0001 religdev*MZ 1 641 1.73 0.189

APPENDIX B: MPLUS CODE data: file = model6.txt; variable: names = pair zygo f11a f11b religa religb; missing =.; grouping = zygo (1=mz 2=dz);

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 186 ERIC TURKHEIMER AND K. PAIGE HARDEN

usevariable = religa religb f11a f11b; analysis: type = missing h1; model = nocovariances; model: a11 by religa@1; c11 by religa@1; e11 by religa@1; a21 by religb@1; c21 by religb@1; e21 by religb@1; [religa* religb*] (relmean); [a11-e21@0]; a11*4 (amz); c11*8 (c); e11*4 (e); a21*4 (amz); c21*8 (c); e21*4 (e); a11 with a21*4 (amz); c11 with c21*8 (c); religa@0; religb@0; religa with religb@0; f11a on a11*.01 (areg) c11*.04 (creg) e11*.01 (ereg); f11b on a21*.01 (areg) c21*.04 (creg) e21*.01 (ereg); a12 by f11a@1; c12 by f11a@1; e12 by f11a@1; a22 by f11b@1; c22 by f11b@1; e22 by f11b@1; [f11a* f11b*] (delmean); [a12-e22@0]; a12* (xmz); c12* (y); e12* (z); a22* (xmz); c22* (y); e22* (z); a12 with a22* (xmz); c12 with c22* (y); f11a@0; f11b@0; f11a with f11b@0; model dz: a11 with a21*4 (adz);

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012 BEHAVIOR GENETIC RESEARCH METHODS 187

a12 with a22* (xdz); model constraint: amz= 2*adz; xmz =2*xdz; output: sampstat tech1 cinterval standardized;

Downloaded from https://www.cambridge.org/core. Universiteit Leiden / LUMC, on 06 Nov 2018 at 14:12:58, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511996481.012