Psychological Assessment © 2011 American Psychological Association 2011, Vol. 23, No. 2, 337–353 1040-3590/11/$12.00 DOI: 10.1037/a0021746

Deriving Childhood Temperament Measures From Emotion-Eliciting Behavioral Episodes: Scale Construction and Initial Validation

Jeffrey R. Gagne and Carol A. Van Hulle, University of Wisconsin–Madison

Nazan Aksan, Koc University

Marilyn J. Essex and H. Hill Goldsmith, University of Wisconsin–Madison

The authors describe the development and initial validation of a home-based version of the Laboratory Temperament Assessment Battery (Lab-TAB), which was designed to assess childhood temperament with a comprehensive series of emotion-eliciting behavioral episodes. This article provides researchers with general guidelines for assessing specific behaviors using the Lab-TAB and for forming behavioral composites that correspond to commonly researched temperament dimensions. We used mother ratings and independent postvisit observer ratings to provide validity evidence in a community sample of 4.5-year-old children. Twelve Lab-TAB behavioral episodes were employed, yielding 24 within-episode temperament components that collapsed into 9 higher level composites (Anger, Sadness, Fear, Shyness, Positive Expression, Approach, Active Engagement, Persistence, and Inhibitory Control). These dimensions of temperament are similar to those found in questionnaire-based assessments. Correlations among the 9 composites were low to moderate, suggesting relative independence. As expected, agreement between Lab-TAB measures and postvisit observer ratings was stronger than agreement between the Lab-TAB and mother questionnaire. However, for Active Engagement and Shyness, mother ratings did predict child behavior in the Lab-TAB quite well. Findings demonstrate the feasibility of emotion-eliciting temperament assessment methodologies, suggest appropriate methods for data aggregation into trait-level constructs, and set some expectations for associations between Lab-TAB dimensions and the degree of cross-method convergence between the Lab-TAB and other commonly used temperament assessments.

Keywords: temperament, children, behavioral assessment, scale construction, Laboratory Temperament Assessment Battery (Lab-TAB)

This article was published Online First April 11, 2011.

Jeffrey R. Gagne, Carol A. Van Hulle, and H. Hill Goldsmith, Department of Psychology, University of Wisconsin–Madison; Nazan Aksan, Department of Psychology, Koc University; Marilyn J. Essex, Department of Psychiatry, University of Wisconsin–Madison.

Nazan Aksan is now at Department of Psychology, University of Iowa.

This research was supported in part by National Institute of Mental Health and Health Research Institute, University of Wisconsin–Madison, Grants R01-MH044340 and P50-MH052354. The project described was also supported by National Institute of Mental Health Award Number T32-MH018931 (Program Director: R. J. Davidson). We also acknowledge support from National Institutes of Health Grants R37-MH050560, P30-HD03352, and P50-MH084051.

Correspondence concerning this article should be addressed to Jeffrey R. Gagne, Department of Psychology, University of Wisconsin–Madison, 1202 West Johnson Street, Madison, WI 53706-1611. E-mail: [email protected]

Temperament traits are conceptualized as behavioral and emotional dimensions that develop early in childhood and collectively form the basis for later personality (Goldsmith et al., 1987). Childhood temperament is often assumed (and to a degree has been demonstrated) to reflect biological individuality, to remain relatively stable across development, to have less cognitive, affective, and social complexity than personality traits, and to include self-regulatory as well as reactive components (Buss & Plomin, 1975, 1984; Goldsmith et al., 1987; Rothbart, 1989; Rothbart & Bates, 1998; Thomas & Chess, 1977). Research on childhood temperament intersects with several subdisciplines in psychology, including personality, developmental, clinical, and behavioral neuroscience (Gagne, Vendlinski, & Goldsmith, 2009). The connections between early temperament traits and later personality and psychopathology have been intensely studied over the last decade (Caspi, Roberts, & Shiner, 2005; Goldsmith, Lemery, & Essex, 2004). However, unlike personality research, which typically employs interview and questionnaire methods, temperament research has historically been rooted in observational and laboratory-based assessment techniques.

Although even prehistoric humans probably invoked the concept of temperament in dealing with one another, the Greek physician Galen (129–199 CE) is generally credited with creating the first influential, systematic account of how aspects of temperament corresponded with specific personality profiles and diseases. At the turn of the 20th century, the Russian physiologist Pavlov studied “transmarginal inhibition” in canines. Pavlov and his followers observed reactive and regulatory aspects of the canine nervous system, phenomena that overlap with current conceptualizations of temperament. In the early 20th century, the German


psychiatrist Kretschmer believed that body type was related to temperament and that the extremes of thinness and obesity were differentially related to distinct psychiatric disorders. In the late 1940s and 1950s, Escalona employed detailed and intensive observational methods to examine individual differences in personality development in a sample of healthy infants (Escalona & Leitch, 1952). Although she focused on psychoanalytic conceptions of personality and did not explicitly use the term temperament in her work, Escalona’s observations of early emerging patterns of emotions and behavioral repertoires laid some of the groundwork for contemporary temperament research. Although conceptions and theories changed over time, these early theorists all emphasized an observational quality to the study of temperament.

Thomas and Chess began the classic New York Longitudinal Study (NYLS; Thomas, Chess, & Birch, 1968) of infant temperament in the early 1950s; they used ratings of nine temperament dimensions based on parent interviews and, later, questionnaires. Children with high or low ratings on these dimensions were considered at risk for poor psychological adjustment. This approach to conceptualizing temperament as a dimensional construct had a significant impact on temperament theory and research in the latter half of the 20th century. The original Thomas and Chess (Thomas, Chess, & Birch, 1968) temperament dimensions were activity, regularity, initial reaction, adaptability, intensity, mood, distractibility, persistence/attention span, and sensitivity. Currently, the most commonly examined temperament dimensions are activity level, anger/frustration, behavioral inhibition/fear, effortful control, and positive affect (Gagne et al., 2009). Although these dimensions are not representative of any single theoretical framework regarding the structure of temperament, researchers have reached some consensus about the importance of these widely studied traits.

In addition to dimensional traits, temperament can also be conceptualized categorically or typologically. Thomas and Chess (Thomas, Chess, & Birch, 1968) included “easy,” “difficult,” and “slow-to-warm-up” categories in their theory of temperament, and other behavioral scientists have also conceptualized temperament categorically. Meehl (1992) and Kagan (1994), among others, emphasized that the issue of dimensions versus types is conceptually complex and that types remain a viable yet understudied alternative to temperamental dimensions. Although our research group has described typological approaches to temperament using configural frequency analysis (Aksan et al., 1999), this article adopts a dimensional conceptualization.

Several temperament researchers have designed theoretically comprehensive questionnaires covering multiple temperament dimensions; these efforts are similar to approaches in the personality domain that emphasize multiple traits. In some cases, temperament researchers have adapted the popular five-factor model from adult personality research for use with children (Kohnstamm, Halverson, Mervielde, & Havill, 1998). In another approach, the 15 scales of the Children’s Behavior Questionnaire have been factor analyzed to yield three factors (Putnam & Rothbart, 2006). Typically, temperament measures are derived in a top-down fashion, and questionnaire items reflect the concerns of developmental psychologists (e.g., Buss & Plomin, 1975, 1984; Gartstein & Rothbart, 2003; Goldsmith, 1996; Putnam, Gartstein, & Rothbart, 2006) or clinicians (Thomas & Chess, 1977). Many temperament taxonomies and factor structures of temperament dimensions have originated from these questionnaire-based methods. Even when factor-analytic methods are used to determine dimensionality, the resulting structures reflect the temperament theory that informed item generation (Kohnstamm et al., 1998).

Most studies use parental rating scales as the primary means of assessing temperament, typically without direct observation as a check on the validity of the questionnaire measures. Unfortunately, using parent ratings as the sole basis for assessment of child temperament introduces the potential for several biases. Specifically, parent ratings of temperament in studies with siblings often show contrast effects whereby the parent exaggerates differences between siblings or assimilation effects whereby the parent exaggerates similarities between siblings. Parent-rated temperament often evinces higher age-to-age stability than lab-based ratings, suggesting that parent expectations about child behavior may contribute to this stability (Saudino, 2003a). However, it is also possible that lab ratings are not as reliable or valid as parent ratings or that the two types of assessment tap different aspects of behavior. Another criticism of parent ratings is the possibility of “halo effects” or “cries for help,” in which case the parents’ affective relationship with the child biases reports of temperament. Yet another criticism is that parents lack knowledge of the typical level and range of behavior of large, representative comparison groups of same-age children and are thus unable to accurately scale their own children’s behavior (see Mangelsdorf, Schoppe, & Buur, 2000, for further discussion of this issue).

A special issue in Infant Behavior and Development (2003) focused on this important topic of parents as informants about the temperament of their own children (Goldsmith & Hewitt, 2003; Hwang & Rothbart, 2003; Saudino, 2003a, 2003b; Seifer, 2003). These researchers also indicated that some parent assessments of temperament were less susceptible to rater biases than others, particularly rating systems that do not rely on global judgments of behavior and focus on specific, concrete behaviors. For instance, certain temperament questionnaires (e.g., the Children’s Behavior Questionnaire) employ specific content and time frames in items that allow parents to access more specific memories of their child’s behavior (e.g., “has difficulty sitting still at dinner”). Measures that use global items such as “my child is highly active” appear to leave more room for biases to intrude. Although we need more studies that empirically compare different types of parent ratings, the issue of contrast effects in twin and family research clearly occurs most often in global questionnaire ratings (Goldsmith & Hewitt, 2003; Hwang & Rothbart, 2003; Saudino, 2003a).

Although most temperament studies have relied on global parent reports for behavioral assessment, several researchers have advocated the use of laboratory methods, and some suggest that a multimethod perspective provides the best evidence for the significance of temperament to important developmental outcomes (Hwang & Rothbart, 2003). However, the literature on early temperament clearly shows that parental report and observational measures of the putatively “same” temperament trait often lack substantial agreement (Goldsmith, Rieser-Danner, & Briggs, 1991; Mangelsdorf et al., 2000; Saudino & Cherny, 2001; Saudino, Wertz, Gagne, & Chawla, 2004; Seifer, Sameroff, Barrett, & Krafchuck, 1994). Conceptually, this lack of agreement might result from flaws in one or both of the assessment approaches (which seems to be the default assumption in the literature) or from

fundamental differences in the features of temperament captured by the two approaches. That is, the nature of, say, anger proneness assessed by independent observers may differ from the nature of anger proneness that parents report. The plausibility of this latter explanation must be adjudicated empirically. For instance, Saudino (2009) showed that parent-rated activity level tapped different genetic and environmental factors than did lab-based and mechanical ratings of activity level in toddlers. Other studies have shown that parent and lab ratings have differential correlates to outcomes such as maternal depression (Gartstein & Marmion, 2008; Hayden, Klein, & Durbin, 2005). Whether weak convergence of parental report questionnaires and observational measures is due to different sources of systematic error or to these two assessment approaches tapping partially different temperament traits, or both, the advantage of employing carefully constructed observational ratings in addition to parent report is apparent.

Those who include lab-based assessments of temperament in their programs of research often focus on one specific dimension. For example, Kagan’s research on reactively fearful children primarily used objective laboratory fear and inhibition paradigms (Kagan, 1994). Saudino and Eaton have produced a series of studies examining activity level using a combination of parent, laboratory, and mechanical actometer measures (Saudino & Eaton, 1991, 1995). Kochanska’s research on effortful control and inhibitory control also combined parent and lab-based assessments of temperament (Kochanska, Murray, & Harlan, 2000; Kochanska, Murray, Jacques, Koenig, & Vandegeest, 1996). Measuring a single dimension at a time allows researchers to be efficient. However, there is a natural trade-off between depth and breadth of measurement in observational research. The narrower approach needs to be balanced by research that takes a broader and more integrative view, or many important aspects of temperament will not be properly understood. Such research would be more likely to be conducted if there were a set of easily available and well-validated observational assessment tools and protocols that can be used to measure a broad range of temperament-related variables.

Some researchers have assessed temperament more broadly by examining a wider range of dimensions in the lab. Most of these assessments were relatively free flowing in administration and were often conducted in conjunction with cognitive and motor testing in early childhood. In the early stages of the Louisville Twin Study, Matheny (1980, 1983) used observational measures of child temperament based on the items from Bayley’s Infant Behavior Record (IBR; Bayley, 1969). The IBR is used to assess a range of temperament-like infant behaviors that are observed in the testing situation of the Bayley Scales of Infant Development (Bayley, 1969). A factor analysis of the IBR suggests that it measures task orientation, affect-extraversion, activity, auditory-visual awareness, and motor coordination. It is noteworthy that these assessments were not focused on eliciting specific temperament-related behavior. With the IBR and similar measures, child testers or observers provide global ratings based on impressions of the child’s behavior during the testing situations that targeted other characteristics.

A few investigators have developed more fine-grained methods for measuring and rating temperament in an observational setting. Rothbart (1986) assessed activity level, smiling and laughter, distress to limitations, fear, and vocal activity during bath, feeding, and play situations in a home visit with infants and their parents; the frequency and intensity of target behaviors as well as contextual information regarding the situations and the behavior of the parent were obtained. Rothbart (1988) also assessed similar traits in structured tasks in the laboratory setting. This work of Rothbart’s was one direct precursor of the approach reported in this article (Goldsmith & Rothbart, 1991). In Louisville Twin Study work later than that described above, a series of behavioral vignettes (i.e., visible barrier with attractive toy, puppet, and mechanical toy games) was employed in the laboratory to derive scores of emotional tone, activity, attentiveness, and social orientation as judged by observers across all the episodes (Riese, 1998). Although the ratings and coding systems were more sophisticated than the IBR, these assessments were also not designed to elicit particular behaviors or to obtain a specific structure of temperament.

In addition to Rothbart’s (1988) research mentioned above, a handful of studies have employed laboratory situations or episodes that evoke and assess multiple specific dimensions of temperament. For example, in another study of infant temperament and emotion that was a direct precursor to the methods reported in the current article, Goldsmith and Campos (1990) assessed fearfulness and joy/pleasure using several laboratory vignettes. Fear was assessed in two visual cliff episodes and one stranger approach episode, and joy was assessed in a series of four gamelike episodes. Subsequent studies from Goldsmith’s group elaborated upon this approach (e.g., Pfeifer, Goldsmith, Davidson, & Rickman, 2002). In the MacArthur Longitudinal Twin Study (Robinson, McGrath, & Corley, 2001), researchers included distinct episodes that elicit anger (restraint and toy removal), prohibition/inhibition (prohibition of touching an attractive toy), and behavioral inhibition (stranger approach) during home and laboratory visits with young children. Despite these examples, more comprehensive laboratory-based investigations of child temperament are the exception rather than the rule, and the assessments in the studies just reviewed were not usually designed to develop a taxonomic structure.

Developing valid and reliable objective temperament assessment methods will allow us to complement parent rating scales and return to the tradition exemplified by the early work of Pavlov and Escalona. As previously mentioned, objective assessment can take the form of free-flowing, unstructured observations or structured tasks and episodes that are designed to elicit specific behavioral responses and quantify individual differences in those responses. The purpose of this article is to examine the psychometric properties of a comprehensive, in-home behavioral assessment of temperament in preschool-age children. The in-home assessment was modified from the Preschool version of the Laboratory Temperament Assessment Battery (Lab-TAB; Goldsmith, Reilly, Lemery, Longley, & Prescott, 1993), a laboratory-based temperament measure that includes several behavioral episodes corresponding to the full range of temperament dimensions. We refer collectively to the episodes in this study as the Lab-TAB, although administration is in the home rather than the lab context. Although we expect there to be some variance in behavioral observation from place to place, almost all aspects of administration and coding are consistent across the lab- and home-based versions of the Lab-TAB. The great majority of Lab-TAB episodes were originally intended to tap a single dimension of temperament; however, behavioral reactions in some episodes appear to be influenced by multiple traits. In the present study, we selected a community sample and used mother

ratings and independent postvisit ratings to provide validity evidence for our home-based Lab-TAB assessment.

Much of what we know about child temperament assessment has been derived from questionnaire methodology, and the implications for temperament assessment using a lab-based methodology are not straightforward. For questionnaires, the basic unit of information is the item, and items are systematically organized into scales that reflect particular dimensions. Well-known standards and practices have been developed to optimize what item features are, how items should be combined to form scales, and what interrelationships are to be expected among scales (Clark & Watson, 1995; Loevinger, 1957). Because observational assessments of temperament have rarely been organized into comprehensive protocols, little consensus exists on best practices and standards for how to assess specific behaviors, how to form behavioral composites, and what expectations to have for associations between dimensions. The same psychometric principles needed to ensure reliability and foster validity apply to both questionnaire and observational assessments, but practical assessment guidelines may be very different for the two modalities, as we shall describe in the article.

A central question is whether the dimensions of temperament that can be recovered from the Lab-TAB are similar to temperament questionnaire scales. If lab-based dimensions are comparable in nature to those derived from questionnaires, will correlations from “corresponding” dimensions show convergent validity across the methods? Based on previous findings, it is unlikely that the convergence between mother ratings and the Lab-TAB measures will be strong. However, we anticipate that agreement will be comparable to the typical levels of parent-laboratory agreement noted in the literature (Goldsmith et al., 1991; Mangelsdorf et al., 2000; Saudino & Cherny, 2001; Saudino et al., 2004; Seifer et al., 1994). Convergence between Lab-TAB measures and postvisit observer ratings will most likely be stronger than mother questionnaire by Lab-TAB convergence because both the postvisit observers and the Lab-TAB coders will be rating child behavior from overlapping contexts. The purpose of this article is not to promote the Lab-TAB assessment for general use, but to demonstrate the feasibility of explicitly behaviorally based temperament assessment methodologies, to suggest methods for data aggregation into trait-level constructs, and to set some expectations for the degree of cross-method convergence that such assessments are likely to evince. We do not claim the superiority of objective laboratory ratings, but we offer an alternative or an adjunct to standard assessment methods.

Method

Sample

The sample consisted of 408 children assessed at 4.5 years of age. Families resided in either the Milwaukee (78%) or the Madison, Wisconsin, metropolitan areas (Hyde, Klein, Essex, & Clark, 1995). This was a study of maternity leave and health outcomes; therefore, all mothers were required to be employed or full-time homemakers. Participants were selected during the second trimester of pregnancy with the target child, and only mothers who were cohabiting with the baby’s biological father and older than 18 years of age were included. Of the 570 families that were initially recruited, 451 remained in the study at 4.5 years when the home-based assessment of temperament was conducted, although not all families participated in this aspect of the study. Inclusion criteria for this study focused on the mothers, and there were no exclusionary criteria for children with disabilities (e.g., Fragile X, Down’s Syndrome). However, none of the children who participated in the 4.5-year assessment had any major disabilities. There were nearly equal numbers of girls (228) and boys (223), and 93% of the mothers identified themselves as Caucasian (2.4% African American, 1.8% Hispanic, 2% Indian-Alaskan, and 0.7% Asian). Approximately 44% of the children in the study participated in center-based day care at least 8 hr per week, 17% were in home-based day care, and the remaining children were not in day care (or data were missing). At the birth of the target child (1990–1991), the average age of mothers and fathers was 29 years and 32 years, respectively. Thirty-eight percent of the children were first-born, 37% were second-born, 19% third-born, and the remaining 6% were later born. At the time of recruitment, 36.8% of the participating mothers and 40.5% of the fathers had a high school or technical degree or less, and the remainder had a college degree or postcollege education. At the time of pregnancy for the target child, 57% of participants had combined family incomes of less than $50,000 per year.

Laboratory Temperament Assessment Battery–Home Version (Lab-TAB)

During home visits, child temperament was assessed with the Lab-TAB–Home Version, a comprehensive home-based temperament assessment that includes behavioral episodes corresponding to dimensions of temperament. This administration of the Lab-TAB included 12 standardized behavioral episodes intended to elicit targeted affective and behavioral reactions. These episodes comprised a revised version of the Preschool Lab-TAB (Goldsmith et al., 1993). Preschool Lab-TAB data have previously shown convergent validity with temperament questionnaires (i.e., the Children’s Behavior Questionnaire) as well as continuity across age in several studies (e.g., Pfeifer et al., 2002). The administration and coding of this home-based version were almost identical to the lab versions. The only significant differences were that the child was in the home, the camera was present as opposed to being in a control room, and the props that the experimenter used were in a bag in the room. Although there is always variance in behavioral observation from place to place (including from laboratory to laboratory), our goal was to make the administration as consistent as possible across home administrations. The Lab-TAB assessments typically lasted about 40 min and occurred at the midpoint of a 5-hr home visit that included other activities such as maternal interviews and play sessions with a sibling. One child tester administered the battery after establishing appropriate rapport with the child. Eight individuals served as child testers for the 4.5-year visits, and each was highly trained and monitored to achieve consistency of administration. During administration of the Lab-TAB episodes, children’s behavior was videotaped and later coded in the laboratory. Thirteen percent of the sample was rated by a second observer, and every Lab-TAB episode had a mean kappa of .90 or higher, reflecting chance-corrected interrater agreement.

Table 1 lists the 12 Lab-TAB episodes, beginning with a descriptive title that is used throughout the article (Bookmark,

Box Empty, Dinky Toys, End of the Line, Perpetual Motion, Popping Bubbles, Pop-Up Snakes, Snack Delay, Spider, Stranger Approach, Transparent Box, Workbench). Brief descriptions of the episodes as well as the broad domains of temperamental reactions that each episode was intended to elicit are also noted. Within each episode epoch or trial, a number of child responses are coded. Lab-TAB coding involves multiple domains of responses including facial, vocalic, motoric, behavioral, and postural modalities (e.g., smiling, reaching, crying, touching, or changes in facial expression). Sometimes the presence or absence of a response is simply noted; however, more often parameters of the response, such as latency, duration, and intensity, are timed or rated. Expressive (e.g., facial and vocal) measures and instrumental or motoric measures often fall into different clusters and can be classified as different episode component scores. For example, the Box Empty episode yields anger, sadness, and approach scores. If episode component scores are intercorrelated, an overall episode summary score is often justified. However, we used many episode-level component scores in the present analyses. The actual process of scoring and scale construction for the Lab-TAB is a key element of our results and will be described in the next section.

Table 1
Lab-TAB Episodes, Descriptions, and Target Responses

| Domain | Lab-TAB episode | Description | Target responses |
| --- | --- | --- | --- |
| Negative affectivity | Box Empty | Failed expectations due to a wrapped box (present) being empty | Sadness, Anger, Negative Affect |
| Negative affectivity | End of the Line | Unreasonable prohibition (a novel toy is retrieved by the parent after a demonstration) | Anger, Sadness, Negative Affect |
| Negative affectivity | Stranger Approach | Social interaction with unfamiliar adult wearing hat and sunglasses | Shyness, Person Fear |
| Negative affectivity | Spider | Physical contact with unknown, hidden object (a jumping spider toy) | Object Fear, Startle |
| Negative affectivity | Transparent Box | Desirable object kept out of reach (locked inside a transparent box) | Anger, Sadness, Frustration, Persistence (in trying to open the box) |
| Positive affectivity | Bookmark | Paper, stamps, and markers are used to make a bookmark with experimenter | Seated activity level, engagement, |
| Positive affectivity | Perpetual Motion | Seated activity with “space wheel” toy, child left alone with toy for 2 min | Engagement, persistence |
| Positive affectivity | Popping Bubbles | Blowing (low intensity) and popping (high intensity) bubbles using a bubble gun | High & low intensity pleasure |
| Positive affectivity | Pop-Up Snakes | Tester surprises child with pop-up snake in a can, child and tester then surprise parent | Pleasure, contentment, anticipatory positivity |
| Positive affectivity | Workbench | Child manipulates various objects on a toy workbench, fine motor activity | Seated activity level, engagement |
| Behavioral control-regulation | Dinky Toys | Child forced to make a toy choice among many alternatives, 2 trials | Inhibitory control |
| Behavioral control-regulation | Snack Delay | Child must wait for a signal before eating a snack (M&M’s candies or Goldfish crackers), 6 trials | Inhibitory control and |

Note. Lab-TAB = Laboratory Temperament Assessment Battery.

Postvisit Observer Temperament Ratings

In addition to the Lab-TAB assessments, the child tester and the person who videotaped the child during the entire visit completed postvisit ratings of child temperament. These ratings were conducted independently by the two observers after reviewing the videotape immediately upon return to the project offices after the home visit. These postvisit observer ratings included 28 items rated on a 1–5 scale. The items were intended to be unipolar, with 1 describing the absence of the quality being rated (e.g., positive affect, energy, cooperation) and 5 describing the intense, consistent, and/or extreme reaction. For example, for the Impulsivity rating, a score of 1 indicated “no sign of impulsivity, ever”; 2 indicated “only slight or ambiguous signs of impulsivity; but restrained quickly”; 3 indicated “unambiguous tendency towards impulsivity; often shows some restraint”; 4 indicated “typically impulsive; may show signs of restraint in 1 or 2 situations”; and 5 indicated “consistently impulsive, shows little, if any, inhibition.” Sixteen of the 28 postvisit rating items overlapped with similar items on the Behavior Rating Scales (BRS) from the Bayley Scales of Infant Development–II (Bayley, 1993), although slight modifications were made for age appropriateness (see Goldsmith, 1978, for more detail). Use of Bayley Scale items to assess temperament yields interrater reliability estimates of 87% (Goldsmith & Gottesman, 1981; Matheny, 1980). The other items included four ratings of the child’s behavior with the mother and eight items constructed to test for convergent validity for behaviors scored during the Lab-TAB episodes. The postvisit rating variables that we used in our analyses were Anger Proneness, Sadness, Fear, Shyness, Positive Affect, Exuberance, Hyperactivity, Persistence, and Impulsivity. For all items, a kappa of .85 or higher was obtained.

Children’s Behavior Questionnaire (CBQ)

An abridged 80-item version of the CBQ (Rothbart, Ahadi, & Hershey, 1994) was completed before the home visit by the mother. The CBQ requires parents to judge their children’s reactions to a variety of situations over the last 6 months (e.g., “Can lower his/her voice when asked to do so”) and is appropriate for children from 3 years to 7 years of age (Rothbart, Ahadi, Hershey, & Fisher, 2001). Each item is rated on a 1–7 scale, with 1 indicating the reaction is extremely untrue of the child and 7 indicating that the reaction is extremely true. CBQ scores have shown high internal consistency, parental agreement, and convergent validity with socialization-relevant traits (Rothbart et al., 2001) and have been used in numerous studies with a wide range of empirical correlates. At the time the CBQ was completed, the mothers had no knowledge of how their children would behave during the Lab-TAB assessment. The eight CBQ scales that we used were selected for overlap with temperament dimensions assessed in the Lab-TAB, and each CBQ scale had 10 items. Estimates of internal consistency for each CBQ scale were as follows: Anger (α = .78), Fear (α = .73), Shyness (α = .92), Sadness (α = .63), Approach (α = .74), Activity Level (α = .73), Attentional Focusing (α = .78), and Inhibitory Control (α = .82).

Results

Construction of the Lab-TAB Temperamental Composite Measures

Each of the 12 Lab-TAB episodes used in the home administration at 4.5 years yields substantial raw data. Multiple response categories (ranging from 2 to perhaps 8–10) are scored in each episode. Typically, we scored the latency of the first response, the occurrence or nonoccurrence of the response in each scoring interval of 5–10 s (or, in some cases, each discrete trial), and in most cases, the magnitude or intensity of each response. These responses are depicted in the top level of Figure 1 (this figure should be read and understood in conjunction with Table 2). From these raw data, we derived the latency or speed, the mean level of response, and the peak intensity of the response. Latency, mean level, and peak are referred to as parameters of response. Sometimes, the simple occurrence of a behavior was noted. The raw data points can range from 12 to 197 per episode.

its We computed descriptive statistics for each directly coded vari- be of Data Analysis able (i.e., parameters of response) to detect variables with low to

one variance or markedly nonnormal distributions. Then, the lowest

not Approach. This article reports on the initial steps taken in or order composite variables were computed at this stage (i.e., means is deriving scores from the Lab-TAB—Home Version as well as within the four parameters of a given response: occurrence, inten- and evidence for convergent validity. An average score approach to sity, peak, and latency); this level is referred as the “response data reduction and composite construction was employed, be- user parameter” level in Figure 1. In cases in which distributions were

Association ginning with raw scores from each Lab-TAB episode. We used skewed, appropriate transformations were made. Most transforma- average as opposed to differentially weighted scores because tions involved either a reciprocal square root function for latency average scores tend to replicate better, and there is typically individual measures to transform them to speed measures or a square root very little practical difference when the two types of scores are

the function for other positively skewed variables. Because later steps

Psychological related to external variables. Although differentially weighted of in the process would involve combining measures that were scored scores are more precise than average scores, the differential use in different metrics, we also applied z-transformations to each weights are usually sample-specific. We based our validity variable.

American evidence on convergence with the postvisit observer ratings and The next step in data reduction was to combine correlated

personal the CBQ scales. the parameters of the same response within an episode (e.g., latency, the by Imputation. Prior to analyses, we used imputation to avoid mean level, and peak angry facial expression). These response

for biases due to missing data (Graham, 2009). A total of 408 parameter level composites were then examined for covariation children completed at least most of the Lab-TAB episodes at with other response parameter level composites, still within the solely age 4.5 years. Out of this group of 408, 382 children (94%) had episode (e.g., postural anger and facial anger). Often, principal copyrighted complete Lab-TAB, postvisit observer rating, and CBQ data.

is component analysis, common factor analysis, or simply examina- Eleven children were missing the CBQ, six were missing the tion of the correlation matrices suggested that a higher level, intended Stranger Approach episode of the Lab-TAB, and nine were is within-episode component was justified. For example, “facial an- missing scattered single variables. We used Little’s Missing ger,” “postural anger,” and “protest” lower level composites could document Completely at Random Test (MCAR; Little, 1998) to determine article be combined into an “anger” component within the End-of-the- This whether these data were missing at random. Little’s MCAR is a Line episode. These episode level components (also depicted in This chi-square test, whereby if the p value is not significant, then Figure 1) were always calculated by taking the mean of the the data may be assumed to be MCAR. The resulting MCAR constituent measures, with means still being calculated in the few score for our data, ␹2(195, N ϭ 408) ϭ 187.25, p ϭ .64, cases in which one or more of the constituent measure was miss- indicates that we cannot reject the hypothesis that the data are ing. Not every measure in the raw data was used in a component. MCAR. On the basis of this finding and the relatively small The subsets of items used in each episode level component, the amount of missing data, we had sufficient justification to im- components themselves, and the internal consistency estimates for pute the missing values in our data set using the SPSS Missing the components are presented for each episode in Table 2. Value Analysis expectation maximization (EM) algorithm As can be seen from Table 2, this process yielded a total of 24 (Dempster, Laird, & Rubin, 1977). After this single imputation within-episode components for the 12 episodes. 
For example, this procedure, neither means nor standard deviations of any study process yielded anger and sadness components within Box Empty, variables differed in the 1st or 2nd decimal place from their End-of-the-Line, and Transparent Box episodes. Fear (object) was values before imputation. Therefore, all data analyses used the derived from the Jumping Spider episode, and Shyness (social complete set of data, with imputed values where necessary, for fear) was derived from the Stranger Approach. The Positive Ex- all 408 participants. pression components were derived from Bookmark, Popping Bub- DERIVING CHILDHOOD TEMPERAMENT MEASURES 343

Discrete Trials or Episode Intervals Raw behavioral data: Box Empty facial (scores range from 0-3) and postural sadness (scores range from 0-2) variables across epochs-see Table 2.

Box Empty Box Empty Box Empty Box Empty Speed to Mean Speed to Mean Facial Facial Postural Postural Sadness Sadness Sadness Sadness Response parameter level: the mean, peak and speed of Box Empty Box Empty Box Empty facial and postural sadness are calculated. Peak Facial Peak Postural Sadness Sadness

Episode level components: broadly. response parameters are averaged to form Box Empty episode Sadness score. publishers.

allied Box Empty End of Line Transparent Box disseminated its Sadness Sadness Sadness be of to one not or is

and Primary level user

Association temperament composite: episode level components individual Sadness averaged to form Sadness the composite. Psychological of use

Figure 1. Depiction of the levels of analysis in deriving primary level temperament composites from the American Laboratory Temperament Assessment Battery. personal the the by

for bles, and Pop-Up Snakes episodes; Approach (anticipatory posi- The conceptual starting point was our theoretical view of temper- tive affect) components were derived from Box Empty, Perpetual ament as involving emotional and regulatory dimensions of be- solely Motion, Popping Bubbles, and Pop-Up Snakes episodes; and the havior. Some expectations were also set by earlier preschool copyrighted

is Active Engagement components were derived from both Work- Lab-TAB results (Pfeifer et al., 2002). Intercorrelations among the bench and Bookmark episodes. Finally, Persistence components component measures from each episode indicated that the degree intended

is were derived from Perpetual Motion and Transparent Box epi- of overall shared variance from one episode to the next ranged sodes, whereas Inhibitory Control components were derived from from low to moderate. Thus, in forming these higher level tem- document

article Snack Delay and Dinky Toy episodes. The internal consistency or peramental composites, similar responses from different episodes This alpha estimates for these 24 episode specific components are were unit weighted and averaged. For example, sadness compo- This presented in Table 2. Twenty of these 24 components had internal nents from all three episodes were averaged to consistencies in the range of .70 to .90. Furthermore, components form the Sadness composite. from the same episode such as anger and sadness components from the Box Empty episodes often showed low correlations indicating Descriptive Statistics that the components were tapping largely independent sources of variance. Table 3 lists the means and standard deviations of all tempera- Because our was mainly in behavioral tendencies that ment dimensions in the study for both boys and girls. For Lab- transcend any one episode or situation, in the third step we calcu- TAB dimensions, boys had significantly higher levels of anger, lated higher level “temperamental” composites, which are shown approach, active engagement, and persistence than girls and girls at the bottom level of Figure 1. In other words, we were more were higher on inhibitory control. Girls had higher sadness and confident in designating a measure as “temperamental” when it shyness scores on the postvisit ratings and boys had higher exhibited cross-situational consistency. The empirical starting exuberance, hyperactivity and impulsivity scores. On the CBQ, point for forming these higher level composites was the correlation girls had higher sadness, attentional focusing and inhibitory matrix of the 24 within-episode components (see the Appendix). control, and boys had higher activity level. Gender differences 344 GAGNE, VAN HULLE, AKSAN, ESSEX, AND GOLDSMITH

Table 2
Lab-TAB Content of Temperament Composites

Temperament dimension | Constituents of the episode level components | Items | Internal consistency (α/r) | Mean corrected item-total r
Anger | Box Empty episode: Anger component of mean, peak, speed of facial anger, postural anger, speed of frustration | 9 | .93 | .73
Anger | End of the Line episode: Anger component of mean, peak, speed of facial anger, postural anger, speed of frustration | 9 | .87 | .61
Anger | Transparent Box episode: Anger component of mean, peak, speed of facial anger, postural anger, speed of frustration | 9 | .84 | .55
Sadness | Box Empty episode: Sadness component of mean, peak, speed of facial sadness, postural sadness | 6 | .87 | .68
Sadness | End of the Line episode: Sadness component of mean, peak, speed of facial sadness, postural sadness | 6 | .80 | .55
Sadness | Transparent Box episode: Sadness component of mean, peak, speed of facial sadness, postural sadness | 6 | .85 | .64
Fear | Spider episode: Approach-related fear component of initial touch and peak wariness of approach | 2 | .77 |
Fear | Spider episode: Post-approach fear expression component of mean facial fear, bodily fear, vocal distress and withdrawal | 4 | .87 | .71
Shyness | Stranger Approach episode: Shyness component of approach and shyness ratings across 2 raters | 4 | .90 | .78
Positive Expression | Bookmark episode: Smiling component of mean, peak and speed of smiling | 3 | .73 | .56
Positive Expression | Popping Bubbles episode, low intensity trials: Positive affect expression component of mean, peak and speed of smiling, percentage intervals laughter | 6 | .78 | .57
Positive Expression | Popping Bubbles episode, high intensity trials: Positive affect expression component of mean, peak and speed of smiling, percentage intervals laughter | 4 | .84 | .65
Positive Expression | Pop-Up Snakes episode: Positive affect expression component of mean, peak and speed of smiling, % intervals laughter | 14 | .88 | .54
Approach | Box Empty episode: component of mean, peak, speed of anticipatory behavior | 3 | .94 | .94
Approach | Perpetual Motion episode: Approach component of mean and peak of active approach, mean frequency of touches, and speed of anticipatory behavior | 4 | .86 | .71
Approach | Popping Bubbles episode, low intensity trials: Approach component of mean, peak, speed of vigor of approach | 3 | .76 | .61
Approach | Popping Bubbles episode, high intensity trials: Approach component of mean, peak, speed of vigor of approach | 2 | .83 |
Approach | Pop-Up Snakes episode: Approach component of mean, peak, speed of vigor of approach | 6 | .84 | .63
Active Engagement | Bookmark episode: Active engagement component of mean, peak of active approach | 2 | .62 |
Active Engagement | Workbench episode: Activity level component of mean, peak of play | 2 | .62 |
Persistence | Perpetual Motion episode: Persistence component of % time on task and latency to off-task behavior | 2 | .52 |
Persistence | Transparent Box episode: Persistence component of % time on task and latency to off-task behavior | 2 | .83 |
Inhibitory Control | Dinky Toys episode: Inhibitory control component of mean and speed of impulsivity across trials | 4 | .50 | .30
Inhibitory Control | Snack Delay episode: Inhibitory control component of global inhibitory control across trials | 4 | .75 | .43

Note. Lab-TAB = Laboratory Temperament Assessment Battery.
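The aggregation steps that lead to the components in Table 2 (transform, standardize, average, then check internal consistency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring code: the function names and toy scores are hypothetical, latencies are assumed to be strictly positive, missing scores are represented as None, and alpha is computed with the standard Cronbach formula on complete data.

```python
import math
import statistics

def to_speed(latencies):
    """Reciprocal square root: converts latencies (assumed > 0) to speed scores."""
    return [1.0 / math.sqrt(x) for x in latencies]

def zscores(values):
    """Standardize a variable across children; None marks a missing score."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in values]

def row_mean(*columns):
    """Unit-weighted mean per child, still computed when some constituents are missing."""
    out = []
    for row in zip(*columns):
        present = [v for v in row if v is not None]
        out.append(statistics.fmean(present) if present else None)
    return out

def cronbach_alpha(columns):
    """Cronbach's alpha for complete data: k/(k-1) * (1 - sum(item var) / var(total))."""
    k = len(columns)
    totals = [sum(row) for row in zip(*columns)]
    item_var = sum(statistics.variance(c) for c in columns)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

# Hypothetical scores for five children in one episode (illustration only).
latency_s = [4.0, 9.0, 1.0, 16.0, 2.0]   # latency to first sad expression, in seconds
mean_level = [1.2, 0.4, 2.1, 0.2, None]  # mean facial sadness; one child missing
peak_level = [2.0, 1.0, 3.0, 1.0, 2.0]   # peak facial sadness

speed_z = zscores(to_speed(latency_s))
mean_z = zscores(mean_level)
peak_z = zscores(peak_level)
episode_sadness = row_mean(speed_z, mean_z, peak_z)  # episode level component
```

Averaging z-scores rather than raw scores keeps a variable with a wide raw range, such as latency in seconds, from dominating the component, which is the stated reason for the z-transformations. Averaging such episode level components across episodes in the same way would give the higher level composite.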

are fairly consistent across the three types of ratings and follow patterns that are typically found in the literature.

Interrelations Among the Lab-TAB Temperamental Dimension Composites

The intercorrelations between the Lab-TAB temperamental composites are displayed in Table 4. In general, correlations between the composites were low to moderate. The patterns of covariance shown are typical of child temperament dimensions in the literature. For example, correlations in both the positive affectivity domain (Positive Expression, Approach, and Active Engagement) and the regulation domain (Inhibitory Control and Persistence) were moderate, ranging from .20 to .56. On the other hand, correlations in the negative affectivity domain (Anger, Sadness,

Fear, and Shyness) ranged from −.12 to .10, suggesting that negative emotions are more distinct from one another in early childhood than are positive emotions.¹ Anger was positively related to the positive affectivity dimensions, presumably due to common approach motivation, and Anger was negatively associated with Shyness and the regulatory dimensions. Inhibitory Control showed negative associations with Positive Affect and a positive correlation with Shyness.

We used Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy to determine the factorability of the intercorrelation matrix shown in Table 4. Bartlett's test of sphericity calculates the determinant of the matrix of the sums of products and cross-products (S) from which the intercorrelation matrix is derived. The determinant of the matrix S is converted to a chi-square statistic and tested for significance. The null hypothesis is that the intercorrelation matrix comes from a population in which the variables are noncollinear (i.e., an identity matrix). If two variables share a common factor with other variables, their partial correlation will be small. If the variables in the matrix are measuring a common factor, the KMO value should be close to 1.0. If the variables are not measuring a common factor, the KMO will be closer to 0.0. The result of Bartlett's test, χ²(36, N = 408) = 454.06, p < .0001, indicates that the matrix departs significantly from an identity matrix. However, the KMO of .715 suggests that the degree of common variance among the measures in Table 4 is marginal (a threshold of .70 is often considered the minimal level needed to justify factor analysis of the matrix). In summary, the results of these two tests are somewhat equivocal as to whether factor analyzing the matrix of Lab-TAB variables is fully justified; we interpret this result to suggest that the primary trait level of analysis is preferable for interpretation.

Table 4
Correlations: Lab-TAB Temperament Dimension Composites

Temperament dimension | Anger | Sadness | Fear | Shyness | Positive Expression | Approach | Active Engagement | Persistence | Inhibitory Control
Anger | - | .08 | .06 | −.12* | .25** | .42** | .18** | −.27** | −.30**
Sadness | | - | .10* | .05 | −.02 | −.05 | −.08 | −.12* | .00
Fear | | | - | −.02 | .00 | −.06 | −.02 | −.12* | −.09
Shyness | | | | - | −.16** | −.18** | −.09 | .00 | .19**
Positive Expression | | | | | - | .56** | .20** | −.20** | −.30**
Approach | | | | | | - | .28** | −.18** | −.44**
Active Engagement | | | | | | | - | .01 | −.12*
Persistence | | | | | | | | - | .24**
Inhibitory Control | | | | | | | | | -

Note. N = 408. Lab-TAB = Laboratory Temperament Assessment Battery.
* p < .05. ** p < .01 (two tailed).

¹ Given that the pattern of correlations suggests that negative emotions are more distinct from one another in early childhood than are positive emotions, we sought to formally test the idea that coherence within positive affect may be greater than within negative affect dimensions. Because our N (408 for all tests below) was large, any formal test of equality in subsets of correlations speaking to coherence within and across those two domains is overpowered. Hence, we decided to test two sets of pattern hypotheses against a null pattern that assumed uniformity in coherence within and across positive and negative affect dimensions. In this baseline null model, we assumed that the coherence within positive affect and coherence within negative affect dimensions would be similar to each other, χ²(8, N = 408) = 119.35, p = .00; root-mean-square error of approximation (RMSEA) 90% CI [.15, .21]; comparative fit index (CFI) = .64; Akaike information criterion (AIC) = 154.16. We tested this model against two alternative models. In Model A, correlations were set equal only within the positive affect domain, and all other correlations were freely estimated. The fit of this model was inadequate, χ²(2, N = 408) = 43.41, p = .00; RMSEA 90% CI [.18, .29]; CFI = .87; AIC = 98.09, but it represented a significant improvement in fit over the baseline model, Δχ²(6) = 75.89, p = .00. In Model B, correlations were set equal only within the negative affect domain, and all other correlations were freely estimated. The fit of Model B was also inadequate, χ²(5, N = 408) = 13.84, p = .02; RMSEA 90% CI [.12, .11]; CFI = .97; AIC = 56.69. However, Model B represented a significant improvement in fit over the baseline model, Δχ²(3) = 105.46, p = .00. However, Model B is not nested within Model A, which incorporates equality in coherence only in positive affect. The chi square to the degree of freedom ratio, the RMSEA interval, and the CFI indicate an adequate fit, and the AIC indicates that Model B fits better than Model A. This sequence of tests indicates that coherence within and across positive and negative affect dimensions is not equal.

Correlations Between the Lab-TAB Composites and Other Temperament Measures

Postvisit observer temperament ratings. Table 5 shows the intercorrelations between the Lab-TAB composite measures and the conceptually matching cross-situational variables derived from the postvisit observer ratings. All of these correlations were modest to moderate in magnitude, and many provide a degree of convergent validation for scores on the composite Lab-TAB measures, which were derived from microanalytic coding of specific situations. The sadness, persistence, and active engagement by hyperactivity correlations were all under .30, indicating somewhat lower convergence. The negative association between Lab-TAB Inhibitory Control and postvisit ratings of Impulsivity shows that children with poor inhibitory behavior have increased levels of impulsivity.

CBQ temperament dimensions. To further examine the convergent validity of these composite Lab-TAB scores, we used maternal reports on eight scales from the CBQ. The correlations between cross-situational composites from the Lab-TAB and CBQ are presented in Table 6. Eight of the cross-situational Lab-TAB dimensional composites have a closely corresponding CBQ scale, and four of these Lab-TAB composites have correlations of small magnitude with the corresponding CBQ scale (Fear, Shyness, Inhibitory Control, and Active Engagement). In contrast, Sadness, Anger, Approach, and Persistence did not correlate significantly with the corresponding CBQ scales. The pattern of correlations between the CBQ and postvisit ratings (see Table 7) was similar to the pattern of Lab-TAB by CBQ correlations. The only small to moderate postvisit rating correlations were also with CBQ Fear, Shyness, Inhibitory Control, and Active Engagement. Three of these correlations were greater in magnitude than the Lab-TAB by CBQ associations. In general, associations between both objective ratings (i.e., Lab-TAB temperament composites and postvisit temperament dimensions) and the CBQ scales were much weaker than those between Lab-TAB and the postvisit observer ratings.

Regression Analyses: Lab-TAB Composites and CBQ Temperament Dimensions

The moderate or even nonexistent associations between primary level Lab-TAB composites and "corresponding" CBQ maternal report scales are open to various interpretations, including the observation we have already offered that the content of the questionnaire and laboratory measures does not align precisely (or does not align at all in some cases) between measures that on the surface appear to correspond. However, the broader question of whether maternal report can predict children's actual behavior under standardized conditions can be addressed by examining the power of the full set of eight CBQ scales that we used for predicting each primary level Lab-TAB composite. In conducting this analysis, we initially considered using only plausible CBQ scales for each regression equation. For instance, it might be plausible that Lab-TAB Inhibitory Control could be predicted by CBQ Inhibitory Control, CBQ Attentional Focusing, CBQ Approach (negatively), and perhaps CBQ Fear and Shyness (positively). However, such considerations are somewhat subjective. Instead, we decided on a more exploratory approach wherein all CBQ scales were used in each regression equation, as Table 8 shows. We expected several of the CBQ scales to have no predictive power for each Lab-TAB composite.

Table 3
Means, Standard Deviations, t-Tests: Lab-TAB, Postvisit, and CBQ Temperament Dimensions

Temperament dimension | Boys M | Boys SD | Girls M | Girls SD | t
Lab-TAB
Anger | 0.18 | 1.04 | −0.16 | 0.93 | 3.47**
Sadness | −0.03 | 1.02 | 0.03 | 0.98 | −0.60
Fear | −0.05 | 1.03 | 0.05 | 0.97 | −0.99
Shyness | 0.02 | 1.04 | −0.02 | 0.97 | 0.36
Positive Expressiveness | 0.05 | 0.94 | −0.05 | 1.05 | 0.99
Approach | 0.32 | 0.89 | −0.30 | 1.0 | 6.62**
Active Engagement | 0.11 | 0.97 | −0.10 | 1.01 | 2.17*
Persistence | 0.11 | 1.01 | −0.10 | 0.98 | 2.10*
Inhibitory Control | −0.20 | 1.03 | 0.18 | 0.93 | −3.93**
Postvisit ratings
Anger proneness | 0.10 | 1.0 | −0.09 | 1.0 | 1.85
Sadness | −0.12 | 1.04 | 0.11 | 0.95 | −2.31*
Fear | −0.01 | 1.01 | 0.01 | 0.99 | −0.29
Shyness | −0.15 | 0.93 | 0.13 | 1.04 | −2.82*
Positive affect | 0.10 | 1.04 | −0.09 | 0.95 | 1.96
Exuberance | 0.23 | 0.97 | −0.21 | 0.99 | 4.55**
Hyperactivity | 0.34 | 1.05 | −0.31 | 0.85 | 6.88**
Persistence | −0.05 | 1.0 | 0.05 | 1.0 | −1.04
Impulsivity | 0.20 | 0.96 | −0.018 | 1.0 | 3.92**
CBQ
Anger | 0.15 | 1.0 | −0.04 | 0.97 | 1.91
Sadness | −0.11 | 0.98 | 0.18 | 0.95 | −3.04**
Fear | 0.02 | 1.02 | 0.03 | 1.0 | −0.09
Shyness | −0.08 | 1.0 | 0.09 | 1.0 | −1.69
Approach | 0.04 | 1.08 | 0.02 | 0.98 | 0.10
Activity level | 0.24 | 1.06 | −0.17 | 0.90 | 4.07**
Attentional Focusing | −0.22 | 1.02 | 0.11 | 0.98 | −3.26**
Inhibitory Control | −0.22 | 1.05 | 0.16 | 0.95 | −3.78**

Note. N = 408 (196 boys and 212 girls). Lab-TAB = Laboratory Temperament Assessment Battery; CBQ = Children's Behavior Questionnaire.
* p < .05. ** p < .01 (two tailed).

As the second line of Table 8 shows, the overall multiple regression was significant at p < .05 for seven of the nine Lab-TAB composites, and the significance level for predicting Persistence was p = .05. In contrast, Lab-TAB Sadness was clearly not predicted by the set of CBQ scales. For each of the CBQ subscale predictors, different patterns emerged. CBQ Shyness positively predicted Lab-TAB Shyness and Inhibitory Control and negatively predicted Lab-TAB Anger, Positive Expression, and Approach. The CBQ Activity Level scale predicted Active Engagement and Anger in the standardized assessment with positive partial regression coefficients and Shyness with a negative coefficient. The CBQ Approach, Attentional Focusing, Fear, and Inhibitory Control scales also evinced significant predictive power of Lab-TAB temperament composites. The CBQ Approach scale did not predict the corresponding Lab-TAB composite, and the Sadness and Anger scales did not predict any Lab-TAB composites. The patterns of significant findings indicate that mother reports of child temperament can indeed predict child behavior in a standardized situation but that these predictions do not necessarily correspond to single objectively assessed dimensions of affect and/or behavior. One can compare the R² values in the Table 8 regression analyses with the r² values in the last column of Table 6 to estimate the incremental predictive power of multiple CBQ scales over a single "corresponding" CBQ scale in predicting a Lab-TAB dimension. In only some of these cases does the adjusted R² suggest incremental prediction. It is interesting that mother-rated Shyness and Activity Level show the most significant associations with children's Lab-TAB ratings.

Discussion

In this article, we examined psychometric properties of a home-based Preschool Lab-TAB assessment with a community sample of preschool-age children; we used maternal report questionnaires and observers' postvisit ratings to provide validity evidence. Because the bulk of what we know about child temperament assessment has been derived from questionnaire methodology and observational assessments of temperament have rarely been organized into comprehensive protocols, little consensus exists on the best practices for assessing specific temperament-related behaviors or for forming composites of these behaviors to reflect temperamental dispositions. Our research questions focused on whether temperament dimensions that can be recovered from the Lab-TAB are similar to temperament questionnaire scales and whether correlations from "corresponding" dimensions show convergent validity across data derived from Lab-TAB, parental questionnaire, and postvisit observation rating methods. Previous articles that offered "how-to" advice on scale derivation and that proposed general principles for data reduction or latent trait estimation (e.g., Clark & Watson, 1995, in this Journal) have not generally included assessment methods that elicit actual behavior as one of the data types considered.

The home-based administration of the Lab-TAB included 12 episodes designed to elicit targeted behavioral reactions; most of these episodes generated dozens of raw data points. From these raw data, we calculated higher level temperament composites. Our method of transforming short videotaped periods of elicited behavior into numerical scores that can be used to study temperament is not meant to be prescriptive; rather, the method is meant to illustrate a strategy that can be modified by other investigators. This process reflects an implicit model of temperament that includes the concept that the behavioral level of temperament is organized around affective and/or motivational constructs, such as anger proneness. The implicit model does not include the concept that the behavioral level of temperament is organized around parameters of response, such as latency or duration of response. Thus, an early stage in the transformation of raw behavioral data into temperament measures involves averaging across response parameters. Likewise, the modality of response (e.g., facial, gestural, vocal) does not define temperament, and thus we also averaged across modality in deriving affective and/or motivational composites within episodes (where the episode is the task or trial that is defined by standardized affective incentives). Once scores defined by a common affective core are derived for each episode, our implicit model of temperament calls for averaging (e.g., by taking a principal component) the common scores across episodes. This process was illustrated in Table 2. The reason for averaging across episodes is that theories of temperament (as well as theories of personality more generally) posit cross-situational consistency as a defining feature of traits. Indeed, Allport (1937) defined "trait," in part, as "the capacity to render many stimuli functionally equivalent."

The decision to aggregate across episodes does not mean that situations are unimportant as organizers of behavior. Some researchers will find that a specific Lab-TAB episode captures a context that is crucial for a given study and, thus, would not pursue

Table 5
Correlations: Lab-TAB Temperament Dimensions and Postvisit Temperament Dimensions

Lab-TAB composite | Corresponding postvisit rating | Correlation
Anger | Anger proneness | .33
Sadness | Sadness | .25
Fear | Fear | .76
Shyness | Shyness | .46
Positive Expressiveness | Positive affect | .48
Approach | Exuberance | .59
Active Engagement | Hyperactivity | .21
Persistence | Persistence | .24
Inhibitory Control | Impulsivity | −.51

Note. N = 408. All correlations significant at the p < .01 level (two tailed). Lab-TAB = Laboratory Temperament Assessment Battery.

Table 6
Correlations: Lab-TAB Temperament Dimensions and CBQ Temperament Questionnaire Scales

Lab-TAB composite | Most comparable CBQ questionnaire scale | Correlation | r²
Anger | Anger | .03 | .00
Sadness | Sadness | .00 | .00
Fear | Fear | .15** | .02
Shyness | Shyness | .32** | .10
Approach | Approach | .09 | .01
Active Engagement | Activity level | .20** | .04
Persistence | Attentional focusing | .09 | .01
Inhibitory Control | Inhibitory control | .19** | .04

Note. N = 408. Lab-TAB = Laboratory Temperament Assessment Battery; CBQ = Children's Behavior Questionnaire.
** p < .01 (two tailed).
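As a rough cross-check on Table 6, the r² column and the significance flags can be recomputed from the reported correlations and N = 408. The sketch below is illustrative only and is not the authors' analysis: the function names are hypothetical, and the p value uses a normal approximation to the t distribution, an assumption that is reasonable only because the sample is large.

```python
import math
from statistics import NormalDist

def r_squared(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r

def approx_two_tailed_p(r, n):
    """Approximate two-tailed p for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    which is an approximation suited to large n only.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Correlations as reported in Table 6 (Lab-TAB composite with matching CBQ scale).
table6 = {"Fear": 0.15, "Shyness": 0.32, "Approach": 0.09, "Inhibitory Control": 0.19}
for name, r in table6.items():
    print(name, round(r_squared(r), 2), approx_two_tailed_p(r, 408) < 0.01)
```

Under this approximation, the correlations of .15 and above fall below p = .01 and the correlation of .09 does not, in line with the pattern of flags in Table 6. Even the largest value, for Shyness, corresponds to roughly 10% shared variance.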

Table 7
Correlations: Postvisit Temperament Dimensions and CBQ Temperament Questionnaire Scales

Postvisit ratings    Most comparable CBQ questionnaire scales    Correlation
Temperament dimensions
  Anger Proneness    Anger                                        .08
  Sadness            Sadness                                      .09
  Fear               Fear                                         .15**
  Shyness            Shyness                                      .49**
  Exuberance         Approach                                     .10
  Hyperactivity      Activity Level                               .29**
  Persistence        Attentional Focusing                         .04
  Impulsivity        Inhibitory Control                          −.22**
Note. N = 408. CBQ = Children's Behavior Questionnaire.
** p < .01 (two tailed).

aggregation. The power of situations (in this case, Lab-TAB episodes) to organize behavior is also one of the key reasons that the most basic elements of behavior (e.g., intensity of an angry facial expression in the End of the Line episode or vigor of approach in the Popping Bubbles episode) across all the episodes cannot be submitted to an exploratory factor analysis to derive temperament trait measures. Basic elements of elicited behavior are not comparable to the initial questionnaire items in a large pool that might be used to derive a set of questionnaire-based trait dimensions. When answering a questionnaire, the respondent is not constrained in answering Item 2 by his or her response to Item 1. This independence does not apply to actual behavior; the child who has withdrawn across a room in one 10-s interval of the Dinky Toys episode cannot touch the toys during the same interval. Similarly, once a child sadly resigns from trying to open the Transparent Box with faulty keys, he or she is highly unlikely to shift to anger, frustration, or persistence within the next few seconds.

Once formed, the Lab-TAB temperamental composites showed low to moderate intercorrelations, as anticipated by prior studies (e.g., Gagne & Goldsmith, 2011; Pfeifer et al., 2002). Covariances between dimensions within the positive affectivity domain and within the behavioral control-regulation domain were significant. In contrast, the negative affectivity dimensions showed little association with one another, indicating that negative emotions as assessed by these procedures are more distinct from one another in early childhood than positive emotions. However, Anger was positively correlated with the positive affectivity dimensions and negatively correlated with Shyness and the regulatory dimensions. We interpret this set of relations as indicative of a common approach motivation, whereby angry children are more likely to be generally expressive and active, and less likely to inhibit their behavior. Similarly, Inhibitory Control was negatively correlated with Positive Affect and positively correlated with Shyness. These findings are generally consistent with both contemporary theoretical and empirical perspectives on temperament.

The hypothesized associations between the Lab-TAB composites and the corresponding postvisit observer rating variables were all moderate (correlations ranged from .21 to .76). This pattern of correlations provides convergent validity for the Preschool Lab-TAB assessment. Lab-TAB temperament dimensions were derived from microanalytic coding of specific situations, whereas the postvisit ratings were based on raters' global impressions of child behavior across the entire home visit, including transitions between situations. For example, in the Lab-TAB Persistence episodes ("Perpetual Motion" and "Transparent Box") temperament is assessed in a very specific manner: the child is presented with the stimuli designed to evoke a persistent response. However, some children may not be sufficiently engaged with the eliciting stimuli, or their emotional reactions may prevent a persistent strategy. The coding of persistent behavior in these episodes is also very specific and may be qualitatively different from observer impressions across all of the Lab-TAB episodes. Differences in the magnitude

Table 8
Regression Analyses: Predicting Each Primary Level Lab-TAB Composite From the Set of Eight CBQ Scales

                            Laboratory Temperament Assessment Battery: Primary level composite variables
                                                                                Positive     Active                                 Inhibitory
Model                       Shyness      Fear         Anger        Sadness      Expression   Engagement   Approach     Persistence  Control
F, full model                6.80         2.00         3.59         0.17         5.66         4.17         8.27         1.93         4.63
p, full model               <.001         .046        <.001         .995        <.001        <.001        <.001         .054        <.001
Multiple R                   .35          .20          .26          .06          .32          .28          .38          .19          .29
Adjusted R²                  .10          .02          .05         −.02          .08          .06          .13          .02          .07
Predictors: CBQ scales
  B: Activity level         −.14 (.07)   −.05 (.07)    .16 (.07)    .01 (.07)    .04 (.07)    .24 (.07)    .03 (.06)    .05 (.07)    .03 (.07)
  B: Shyness                 .29 (.05)   −.08 (.06)   −.16 (.06)    .04 (.06)   −.22 (.06)   −.03 (.05)   −.28 (.05)    .09 (.06)    .16 (.06)
  B: Approach                .04 (.06)    .09 (.06)   −.03 (.06)   −.02 (.06)   −.00 (.06)   −.14 (.06)    .02 (.06)   −.12 (.06)   −.08 (.06)
  B: Attentional Focusing   −.03 (.06)   −.04 (.06)    .01 (.06)    .02 (.06)   −.13 (.06)   −.16 (.06)   −.03 (.06)    .10 (.06)    .01 (.06)
  B: Fear                   −.06 (.05)    .16 (.05)   −.01 (.05)    .00 (.06)   −.08 (.05)   −.07 (.05)    .01 (.05)    .01 (.05)    .05 (.05)
  B: Inhibitory Control     −.01 (.07)   −.04 (.07)   −.03 (.07)    .02 (.07)   −.06 (.07)    .11 (.07)   −.18 (.07)    .01 (.07)    .21 (.07)
  B: Sadness                 .04 (.06)    .03 (.06)   −.04 (.06)    .00 (.06)   −.03 (.06)    .11 (.06)   −.06 (.06)    .09 (.06)    .07 (.06)
  B: Anger                  −.04 (.07)   −.06 (.07)   −.01 (.07)    .01 (.07)   −.01 (.07)    .03 (.07)   −.08 (.06)    .05 (.07)    .04 (.07)

Note. N = 408. Each regression had degrees of freedom of (8, 375). B, the unstandardized partial regression coefficient, was equivalent to the standardized beta because all predictors and outcomes were standardized. Standard errors of the estimate are in parentheses. Significant partial regression coefficients (p < .05) are printed in bold font. Lab-TAB = Laboratory Temperament Assessment Battery; CBQ = Children's Behavior Questionnaire.
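The Adjusted R² row of Table 8 can be approximately recovered from the Multiple R row with the usual adjustment formula, 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below is only an illustrative check, not the authors' computation. It assumes n = 384 cases per regression, a value inferred here from the reported degrees of freedom (8, 375) rather than stated in the table, and it uses the two-decimal Multiple R values as printed, so results can differ from the printed row by about .01.

```python
def adjusted_r_squared(multiple_r, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    r_squared = multiple_r ** 2
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


N_PREDICTORS = 8      # the eight CBQ scales entered in each regression
RESIDUAL_DF = 375     # from the Note to Table 8: df = (8, 375)
# Assumption: n is inferred from the degrees of freedom (375 + 8 + 1 = 384).
N_CASES = RESIDUAL_DF + N_PREDICTORS + 1

# (Multiple R, printed Adjusted R^2) for each composite, copied from Table 8.
TABLE_8 = {
    "Shyness": (0.35, 0.10),
    "Fear": (0.20, 0.02),
    "Anger": (0.26, 0.05),
    "Sadness": (0.06, -0.02),
    "Positive Expression": (0.32, 0.08),
    "Active Engagement": (0.28, 0.06),
    "Approach": (0.38, 0.13),
    "Persistence": (0.19, 0.02),
    "Inhibitory Control": (0.29, 0.07),
}

for composite, (multiple_r, printed) in TABLE_8.items():
    estimate = adjusted_r_squared(multiple_r, N_CASES, N_PREDICTORS)
    print(f"{composite:20s} R = {multiple_r:.2f}  "
          f"adjusted R^2 = {estimate:+.3f}  (printed {printed:+.2f})")
```

The negative value for Sadness arises because the adjustment penalizes the eight predictors more than the very small R² can offset.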

of postvisit rating by Lab-TAB correlations might be affected by these differences, and dimensions that reflect lower agreement, such as persistence, may be a more subtle quality to observers than, for instance, overt fearful reactions.

As expected, the correlations between Lab-TAB dimensions and the most comparable CBQ questionnaire scales were much lower than those between Lab-TAB and the postvisit observer ratings. Poor agreement between lab and parent ratings of temperament is consistent with the larger cross-informant agreement in the child development and child psychopathology literatures. Lower covariance could be due to limited content overlap between the measures, despite similarities in how the dimensions are named. As described earlier, the postvisit observer ratings were conducted after the Lab-TAB assessments, which were witnessed by the observers as a part of the visit. Therefore, the overlapping behavioral "content" being tapped by both the Lab-TAB and postvisit observer ratings probably contributed to their higher covariance, as opposed to the CBQ scales, which were based on mothers' impressions of child behavior in the home. In addition, CBQ ratings reflect child behavior across many contexts over time, whereas Lab-TAB and postvisit ratings reflect behaviors present only during the Lab-TAB situations and the home visit.

We also used an exploratory approach wherein all CBQ scales were entered in a regression equation to predict each Lab-TAB temperament composite. Our goal was to investigate whether the "noncorresponding" CBQ scales held predictive power for each Lab-TAB composite and whether each CBQ scale showed independent predictive power. Of course, we expected that several of the CBQ scales would hold no predictive power. The overall multiple regression was significant for seven of the nine Lab-TAB composites, and the regression for Lab-TAB Persistence was borderline significant (Sadness was the Lab-TAB dimension clearly not predicted by the CBQ scales). These findings suggest that maternal reports of child temperament can indeed predict child behavior in a standardized assessment but that single dimensions of mother report do not typically correspond to single dimensions of standardized assessments. Mother-assessed Shyness and Activity Level showed the strongest associations with children's Lab-TAB scores, a finding that mirrored our correlational results. Perhaps mothers report Shyness and Activity Level more accurately than other temperamental dimensions. However, although the internal consistency of CBQ Shyness assessment was higher than most other CBQ dimensions (α = .92), the internal consistency of CBQ Activity Level was fairly typical (α = .73). Alternatively, Shyness and Activity Level reactions may have more salience for mothers on a day-to-day basis, resulting in better prediction of Lab-TAB temperament. For instance, the r² for prediction of Lab-TAB Shyness by CBQ Shyness was as high (10%) as the adjusted R² for prediction of Lab-TAB Shyness by all CBQ scales (also 10%). Perhaps mothers observe more instances of salient activity level and shyness in their children than they do of fear, anger, sadness, and so on, and this larger parental "database" of activity and shyness observations leads to questionnaire responses that correspond better with Lab-TAB behavior. In any case, the results suggest that not all domains of parental report possess equal predictive power.

Unlike the Lab-TAB coders, mothers and observers made "summary" judgments and attended to more globally meaningful units of child behavior. At this level of analysis, the ratings extracted a macro level view of temperament as opposed to the Lab-TAB composites, which focused on emotional reactions at a very discrete, or micro, level. This contrast between micro and macro levels of analysis likely contributes to instances of low covariance between the Lab-TAB and maternal ratings in this study. Although a few of the associations between Lab-TAB and observer postvisit ratings were somewhat modest (e.g., Sadness, Active Engagement, and Persistence), most were quite strong. Relations between the Lab-TAB and CBQ were weaker, with half of the temperament dimensions showing no covariance (Anger, Sadness, Approach, and Persistence). The same four dimensions were not significantly correlated between the CBQ and postvisit ratings. However, the correlations between CBQ and postvisit observer ratings were slightly higher than those between the CBQ and Lab-TAB, particularly for the more "salient" dimensions of Shyness and Activity. We suggest that the shared macro level of analysis most likely contributed to the slightly higher associations. Yet, if this level of analysis issue were solely driving the lack of covariance between the Lab-TAB and the CBQ, we would have expected weaker correlations between Lab-TAB and the postvisit observer ratings than we actually observed.

Although the issue was not addressed empirically in our study, we suggest that another element in interpreting our results is that mothers probably view their children's behavior much more pragmatically than do researchers who are trained to assess the subtleties of emotional reactions and changes. Mothers' concerns may be with the relative "success" of the child's behavior; that is, mothers may focus on the child's ability to negotiate a task or situation regardless of the affective quality of the child's engagement in the task. For example, if a child tolerates the presence of strangers with little intervention, and these situations usually "work out" without disrupting others, mothers might be less inclined to rate the child as being fearful of strangers, regardless of fearful emotional expressions that the child might show during interactions with strangers. Analogous lab-based ratings (e.g., in the Lab-TAB Stranger Approach episode) focus on the specific emotional reactions of the child and do not emphasize the resolution of the episode.

A clear advantage of objective assessment of temperament is its flexibility. Different data reduction schemes allow temperament to be characterized in terms of specific reactions to specific eliciting stimuli, as well as in terms of more functional units of behavior such as emotion-attention or emotion-action pairings that occur simultaneously. For example, in the Transparent Box episode, the child is assessed for a range of anger and sadness reactions (e.g., facial, bodily, vocal), persistence ("stops"), and attention (gaze aversion). The meaning of attending to this task and expressing anger simultaneously is qualitatively different from simply being rated "high" on questionnaire scales tapping attention and anger; that is, the Lab-TAB reaction is a functional, contextualized attention–action pairing. The Lab-TAB allows us to encapsulate situational expressions of emotion and attention and/or action across a multitude of episodes with various target and secondary behaviors assessed. Depending on the coding and data reduction scheme, such episodes and coding systems can lend greater richness and flexibility to temperament assessment than standardized questionnaire or global observer ratings. This molecular view can contribute to the work of researchers interested in the specificity of [...] in the context of individual tasks, multiple

expressions of emotion within a situation (e.g., anger and sadness), and the stability of emotion and temperament across eliciting contexts. It is important to note that Lab-TAB episodes can also be coded from a global perspective (much like some parent and observer ratings), wherein coders view a range of videotaped episodes and rate their overall impression of the child's temperament. Thus, depending on the goals of a given study, objective assessment of temperament can yield different types of data.

A few researchers have approached Lab-TAB coding from both micro and macro levels of analysis in the same investigation. In one study examining laboratory-based temperament assessment of preschoolers for associations with later childhood disorders, micro level and global coding schemes were developed for 12 Lab-TAB episodes (Hayden et al., 2005). The molecular coding followed the system of affective coding devised by Goldsmith and used in this article. The global coding scheme involved the raters watching an entire episode and making a rating based on all behaviors relevant to each dimension of temperament. Findings indicated that laboratory-based positive [...] predicted depression risk and that both micro and macro level coding schemes were fairly congruent in predictive power. These global coding schemes were considered successful and used in subsequent analyses (Durbin, Hayden, Klein, & Olino, 2007).

The use of the Lab-TAB procedures in these studies also demonstrates the stability of temperament assessment from one developmental period to another. Lab-TAB ratings of positive and negative emotionality at age 3 years showed moderate to high levels of stability with matched laboratory assessments at 5–6 years and 9 years (Durbin et al., 2007). These results were particularly robust in that stability was high even though the tasks were different at the three ages. Stability was higher in emotional aspects of temperament compared with other dimensions. One of the disadvantages of parent report is the potential overestimation of stability due to parents' wish to present a consistent picture of their child's behavior (Durbin, 2010; Kagan, 1998). Laboratory temperament batteries that employ multiple layered tasks not only provide a more nuanced measure of individual differences in emotional reactivity but also may offer more accurate estimates of change and stability (Durbin, 2010).

Although in this article we opted for simple data analytic approaches, the data structure produced by Lab-TAB affords structural equation modeling and multilevel modeling (MLM) approaches. Kiel and Buss (2006) applied MLM to Lab-TAB data from toddlers tested in our laboratory, and Durbin (2010) discussed the advantages of MLM approaches. In brief, MLM can be employed to model the temperamental reactivity for a particular emotion as a function of the "potency" of the task or situation. The variable of interest (i.e., a specific emotion or temperament dimension) is identified as the dependent variable, and the tasks are assigned a potency value related to ability to elicit that trait. For example, the Stranger Approach task would have a high potency value for fear, but a low value for other aspects of temperament (Durbin, 2010). These analyses provide both overall emotion or temperament scores (the intercept) and the slope of the emotion variable across increasingly potent lab episodes. MLM can improve the estimation of cross-contextual consistency and inconsistency and suggest better strategies for relating emotion variables to outcomes of interest based on the contexts that are most salient to elicitation.

It is not our ultimate goal to replace parent ratings of temperament with laboratory assessments. Although significant methodological concerns attend the use of parental report, questionnaires are inexpensive and easy to administer. In addition, parental perspectives on their children's behavior are valuable and impossible to capture using laboratory measures. Rather than downplaying the importance of parental questionnaire assessment, our intention is to offer a psychometrically sound, objective measure of temperament as an option for a temperament assessment strategy. As previously mentioned, reliance on any one approach to temperament assessment carries with it risks. The use of multiple sources of information about participants' behavior in a quantitative analysis allows for firmer conclusions about the behavior being investigated (Saudino, 2005) and relations to important developmental outcomes.

We view the main limitations of the Lab-TAB approach as being the following: (a) the range of contexts and incentive events that are sampled is limited; (b) the procedures are relatively time-consuming and expensive, at least as compared with questionnaires (but not when compared with common assessment methods in cognitive, social, and [...] approaches); (c) exact replication of procedures across laboratories or different field settings is difficult, given factors ranging from room dimensions to subtle differences in examiner behavior; and (d) some theories of temperament posit typologies and Lab-TAB is designed to produce dimensional outcomes (although profiles or types can be derived statistically). Also, in this study, our sample was relatively homogeneous in racial and ethnic composition, which limits generalizability. A more general problem with lab- and home-based objective behavioral assessment of temperament is the question of how best to derive measures of temperament traits, and of course, the purpose of this article is to suggest a fruitful strategy.

References

Aksan, N., Goldsmith, H. H., Smider, N., Essex, M., Clark, R., Hyde, J., Klein, M., & Vandell, D. (1999). Derivation and prediction of temperamental types among preschoolers. Developmental Psychology, 35, 958–971. doi:10.1037/0012-1649.35.4.958

Allport, G. W. (1937). Personality: A psychological interpretation. New York, NY: Henry Holt.

Bayley, N. (1969). Bayley Scales of Infant Development. New York, NY: The Psychological Corporation.

Bayley, N. (1993). Bayley Scales of Infant Development (2nd ed.). San Antonio, TX: The Psychological Corporation.

Buss, A. H., & Plomin, R. (1975). A temperament theory of personality development. Oxford, England: Wiley.

Buss, A. H., & Plomin, R. (1984). Temperament: Early developing personality traits. Hillsdale, NJ: Erlbaum.

Caspi, A., Roberts, B. W., & Shiner, R. L. (2005). Personality development: Stability and change. Annual Review of Psychology, 56, 453–484. doi:10.1146/annurev.psych.55.090902.141913

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in scale development. Psychological Assessment, 7, 309–319. doi:10.1037/1040-3590.7.3.309

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.

Durbin, C. E. (2010). Modeling temperamental risk for depression using developmentally sensitive laboratory paradigms. Child Development Perspectives, 4, 168–173.

Durbin, C. E., Hayden, E. P., Klein, D. N., & Olino, T. M. (2007). Stability of laboratory assessed temperament traits from ages 3 to 7. Emotion, 7, 388–399. doi:10.1037/1528-3542.7.2.388

Escalona, S., & Leitch, M. (1952). Early phases of personality development: A non-normative study of infant behavior. Monographs of the Society for Research in Child Development, 17, 72.

Gagne, J. R., & Goldsmith, H. H. (2011). A longitudinal analysis of anger and inhibitory control in twins from 12 to 36 months of age. Developmental Science, 14, 112–124.

Gagne, J. R., Vendlinski, M. K., & Goldsmith, H. H. (2009). The genetics of childhood temperament. In Y.-K. Kim (Ed.), Handbook of behavioral genetics (pp. 251–267). New York, NY: Springer. doi:10.1007/978-0-387-76727-7_18

Gartstein, M. A., & Marmion, J. (2008). Fear and positive affectivity in infancy: Convergence/discrepancy between parent-report and laboratory-based indicators. Infant Behavior & Development, 31, 227–238. doi:10.1016/j.infbeh.2007.10.012

Gartstein, M. A., & Rothbart, M. K. (2003). Studying infant temperament via the revised infant behavior questionnaire. Infant Behavior & Development, 26, 64–86. doi:10.1016/S0163-6383(02)00169-8

Goldsmith, H. H. (1978). Behavior genetic analyses of early personality (temperament): Developmental perspectives from the longitudinal study of twins during infancy and early childhood (Doctoral dissertation). University of Minnesota, Twin Cities, Minnesota.

Goldsmith, H. H. (1996). Studying temperament via construction of the Toddler Behavior Assessment Questionnaire. Child Development, 67, 218–235. doi:10.2307/1131697

Goldsmith, H. H., Buss, A. H., Plomin, R., Rothbart, M. K., Thomas, A., Chess, S., . . . McCall, R. B. (1987). Roundtable: What is temperament? Four approaches. Child Development, 58, 505–529. doi:10.2307/1130527

Goldsmith, H. H., & Campos, J. J. (1990). The structure of temperamental fear and pleasure in infants: A psychometric perspective. Child Development, 61, 1944–1964. doi:10.2307/1130849

Goldsmith, H. H., & Gottesman, I. I. (1981). Origins of variation in behavioral style: A longitudinal study of temperament in young twins. Child Development, 52, 91–103. doi:10.2307/1129218

Goldsmith, H. H., & Hewitt, E. C. (2003). Validity of parental report of temperament: Distinctions and needed research. Infant Behavior & Development, 26, 108–111. doi:10.1016/S0163-6383(02)00172-8

Goldsmith, H. H., Lemery, K. S., & Essex, M. J. (2004). Temperament as a liability factor for childhood behavioral disorders: The concept of liability. In L. F. DiLalla (Ed.), Behavior genetics principles: Perspectives in development, personality, and psychopathology (pp. 19–39). Washington, DC: American Psychological Association. doi:10.1037/10684-002

Goldsmith, H. H., Reilly, J., Lemery, K. S., Longley, S., & Prescott, A. (1993). Preliminary manual for the Preschool Laboratory Temperament Assessment Battery (Technical Report Version 1.0). Madison, WI: Department of Psychology, University of Wisconsin–Madison.

Goldsmith, H. H., Rieser-Danner, L. A., & Briggs, S. (1991). Evaluating convergent and discriminant validity of temperament questionnaires for preschoolers, toddlers, and infants. Developmental Psychology, 27, 566–579. doi:10.1037/0012-1649.27.4.566

Goldsmith, H. H., & Rothbart, M. K. (1991). Contemporary instruments for assessing early temperament by questionnaire and in the laboratory. In J. Strelau & A. Angleitner (Eds.), Explorations in temperament (pp. 249–272). New York, NY: Plenum Press.

Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. doi:10.1146/annurev.psych.58.110405.085530

Hayden, E. P., Klein, D. N., & Durbin, C. E. (2005). Parent reports and laboratory assessments of child temperament: A comparison of their associations with risk for depression and externalizing disorders. Journal of Psychopathology and Behavioral Assessment, 27, 89–100. doi:10.1007/s10862-005-5383-z

Hwang, J., & Rothbart, M. K. (2003). Behavior genetics studies of infant temperament: Findings vary across parent-report instruments. Infant Behavior & Development, 26, 112–114. doi:10.1016/S0163-6383(02)00173-X

Hyde, J. S., Klein, M. H., Essex, M. J., & Clark, R. (1995). Maternity leave and women's mental health. Psychology of Women Quarterly, 19, 257–285. doi:10.1111/j.1471-6402.1995.tb00291.x

Kagan, J. (1994). Galen's prophecy: Temperament in human nature. New York, NY: Basic Books.

Kagan, J. (1998). Biology and the child. In N. Eisenberg (Ed.), Handbook of child psychology: Vol. 3. Social, emotional, and personality development (5th ed., pp. 177–235). New York, NY: Wiley.

Kiel, E. J., & Buss, K. A. (2006). Maternal accuracy in predicting toddlers' behaviors and associations with toddlers' fearful temperament. Child Development, 77, 355–370. doi:10.1111/j.1467-8624.2006.00875.x

Kochanska, G., Murray, K. T., & Harlan, E. T. (2000). Effortful control in early childhood: Continuity and change, antecedents, and implications for social development. Developmental Psychology, 36, 220–232. doi:10.1037/0012-1649.36.2.220

Kochanska, G., Murray, K., Jacques, T. Y., Koenig, A. L., & Vandegeest, K. A. (1996). Inhibitory control in young children and its role in emerging internalization. Child Development, 67, 490–507. doi:10.2307/1131828

Kohnstamm, G. A., Halverson, C. F., Mervielde, I., & Havill, V. L. (1998). Analyzing parental free descriptions of child personality. In G. A. Kohnstamm, C. F. Halverson, I. Mervielde, & V. L. Havill (Eds.), Parental descriptions of child personality: Developmental antecedents of the Big Five? (pp. 1–19). Mahwah, NJ: Erlbaum.

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198–1202.

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694. doi:10.2466/PR0.3.7.635-694

Mangelsdorf, S. C., Schoppe, S. J., & Buur, H. (2000). The meaning of parental reports: A contextual approach to the study of temperament and behavior problems in childhood. In V. J. Molfese & D. L. Molfese (Eds.), Temperament and personality development across the life span (pp. 121–140). Mahwah, NJ: Erlbaum.

Matheny, A. P., Jr. (1980). Bayley's Infant Behavior Record: Behavioral components and twin analyses. Child Development, 51, 1157–1167. doi:10.2307/1129557

Matheny, A. P., Jr. (1983). A longitudinal twin study of stability of components from Bayley's Infant Behavior Record. Child Development, 54, 356–360. doi:10.2307/1129696

Meehl, P. E. (1992). Factors and taxa, traits and types, differences of degree and differences in kind. Journal of Personality, 60, 117–174. doi:10.1111/j.1467-6494.1992.tb00269.x

Pfeifer, M., Goldsmith, H. H., Davidson, R. J., & Rickman, M. (2002). Continuity and change in inhibited and uninhibited children. Child Development, 73, 1474–1485. doi:10.1111/1467-8624.00484

Putnam, S. P., Gartstein, M. A., & Rothbart, M. K. (2006). Measurement of fine-grained aspects of toddler temperament: The Early Childhood Behavior Questionnaire. Infant Behavior & Development, 29, 386–401. doi:10.1016/j.infbeh.2006.01.004

Putnam, S. P., & Rothbart, M. K. (2006). Development of short and very short forms of the Children's Behavioral Questionnaire. Journal of Personality Assessment, 87, 102–112. doi:10.1207/s15327752jpa8701_09

Riese, M. L. (1998). Predicting infant temperament from neonatal reactivity for AGA/SGA twin pairs. Twin Research, 1, 65–70. doi:10.1375/136905298320566366

Robinson, J. L., McGrath, J., & Corley, R. P. (2001). The conduct of the study: Sample and procedures. In R. N. Emde & J. K. Hewitt (Eds.), Infancy to early childhood: Genetic and environmental influences on developmental change (pp. 23–41). New York, NY: Oxford University Press.

Rothbart, M. K. (1986). Longitudinal observation of infant temperament. Developmental Psychology, 22, 356–365. doi:10.1037/0012-1649.22.3.356

Rothbart, M. K. (1988). Temperament and the development of inhibited approach. Child Development, 59, 1241–1250. doi:10.2307/1130487

Rothbart, M. K. (1989). Temperament in childhood: A framework. In G. A. Kohnstamm, J. E. Bates, & M. K. Rothbart (Eds.), Temperament in childhood (pp. 59–73). Chichester, England: Wiley.

Rothbart, M. K., Ahadi, S. A., & Hershey, K. L. (1994). Temperament and social behavior in childhood. Merrill-Palmer Quarterly, 40, 21–39.

Rothbart, M. K., Ahadi, S. A., Hershey, K. L., & Fisher, P. (2001). Investigations of temperament at 3–7 years: The Children's Behavior Questionnaire. Child Development, 72, 1394–1408. doi:10.1111/1467-8624.00355

Rothbart, M. K., & Bates, J. E. (1998). Temperament. In N. Eisenberg (Vol. Ed.), Handbook of child psychology: Social, emotional, & personality development (pp. 105–176). New York, NY: Wiley.

Saudino, K. J. (2003a). Parent ratings of infant temperament: Lessons from twin studies. Infant Behavior & Development, 26, 100–107. doi:10.1016/S0163-6383(02)00171-6

Saudino, K. J. (2003b). The need to consider contrast effects in parent-rated temperament. Infant Behavior & Development, 26, 118–120. doi:10.1016/S0163-6383(02)00175-3

Saudino, K. J. (2005). Multiple informants. In B. Everitt & D. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science (Vol. 4, pp. 1332–1333). Chichester, England: Wiley.

Saudino, K. J. (2009). Do different measures tap the same genetic influences? A multi-method study of activity level in young twins. Developmental Science, 12, 626–633. doi:10.1111/j.1467-7687.2008.00801.x

Saudino, K. J., & Cherny, S. S. (2001). Parental ratings of temperament in twins. In R. N. Emde & J. K. Hewitt (Eds.), Infancy to early childhood: Genetic and environmental influences on developmental change (pp. 73–88). New York, NY: Oxford University Press.

Saudino, K. J., & Eaton, W. O. (1991). Infant temperament and genetics: An objective twin study of motor activity level. Child Development, 62, 1167–1174. doi:10.2307/1131160

Saudino, K. J., & Eaton, W. O. (1995). Continuity and change in objectively assessed temperament: A longitudinal twin study of activity level. British Journal of Developmental Psychology, 13, 81–95.

Saudino, K. J., Wertz, A. E., Gagne, J. R., & Chawla, S. (2004). Night and day: Are siblings as different in temperament as parents say they are? Journal of Personality and Social Psychology, 87, 698–706. doi:10.1037/0022-3514.87.5.698

Seifer, R. (2003). Twin studies, biases of parents, and biases of researchers. Infant Behavior & Development, 26, 115–117. doi:10.1016/S0163-6383(02)00174-1

Seifer, R., Sameroff, A. J., Barrett, L. C., & Krafchuck, E. (1994). Infant temperament measured by multiple observations and mother report. Child Development, 65, 1478–1490. doi:10.2307/1131512

Thomas, A., & Chess, S. (1977). Temperament and development. New York, NY: Brunner/Mazel.

Thomas, A., Chess, S., & Birch, H. G. (1968). Temperament and behavior disorders in children. New York, NY: New York University Press.

Appendix

Table A1 Correlations: Lab-TAB Episode-Level Temperament Components

Episode level component                     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24

 1. Box Empty Anger                         –  .34  .31  .18 −.12  .08  .00  .06 −.10  .13  .08  .08  .12  .18  .20  .13  .24  .05  .24  .04 −.06  .14  .08 −.20
 2. End of the Line Anger                 .34    –  .26  .04 −.11  .08  .07  .10 −.10  .16  .18  .20  .03  .23  .34  .26  .22  .06  .02  .07 −.13  .21  .10 −.34
 3. Transparent Box Anger                 .31  .26    –  .04  .01  .17 −.06 −.01 −.07  .14  .10  .11  .15  .17  .28  .13  .23  .14  .06  .13 −.02  .32  .15 −.13
 4. Box Empty Sadness                     .18  .04  .04    –  .11  .26  .08  .08  .00 −.03  .03  .00 −.02 −.01 −.04  .05  .10 −.13  .01 −.04  .01  .06 −.04 −.04
 5. End of the Line Sadness              −.12 −.11  .01  .11    –  .07  .03 −.02  .00 −.13 −.06 −.07 −.09 −.03 −.13  .10 −.06  .03 −.07  .04  .00 −.07 −.04  .04
 6. Transparent Box Sadness               .08  .08  .17  .26  .07    –  .03  .03  .09  .08  .07  .14 −.02  .09 −.02 −.02  .00 −.05 −.11 −.06 −.03  .34  .01 −.07
 7. Jumping Spider Approach Fear          .00  .07 −.06  .08  .03  .03    –  .69  .01 −.07  .03  .03 −.16  .00 −.03 −.01  .01 −.26  .00 −.09  .11  .05  .09 −.01
 8. Jumping Spider Post-Approach Fear     .06  .10 −.01  .08 −.02  .03  .69    – −.05  .09  .13  .13 −.13 −.01  .03  .13  .15 −.27  .07  .05 −.08  .10  .17 −.02
 9. Stranger Approach Shyness            −.10 −.10 −.07  .00  .00  .09  .01 −.05    – −.08 −.14 −.10 −.13 −.07 −.05 −.16 −.12 −.16 −.08 −.06  .09  .07 −.13  .16
10. Bookmark Positive Affect              .13  .16  .14 −.03 −.13  .08 −.07  .09 −.08    –  .34  .33  .17  .19  .21  .27  .19  .06  .18  .06 −.10  .14  .17 −.18
11. Popping Bubbles Low Positive Affect   .08  .18  .10  .03 −.06  .07  .03  .13 −.14  .34    –  .50  .27  .24  .22  .51  .29  .13  .02  .09 −.12  .11  .14 −.20
12. Popping Bubbles High Positive Affect  .08  .20  .11  .00 −.07  .14  .03  .13 −.10  .33  .50    –  .27  .24  .20  .41  .39  .14  .09  .11 −.17  .15  .20 −.23
13. Pop-Up Snakes Positive Affect         .12  .03  .15 −.02 −.09 −.02 −.16 −.13 −.13  .17  .27  .27    –  .16  .11  .20  .19  .50  .12  .14 −.02 −.01  .12  .00
14. Box Empty Approach                    .18  .23  .17 −.01 −.03  .09  .00 −.01 −.07  .19  .24  .24  .16    –  .24  .25  .20  .17  .11  .08 −.16  .14  .17 −.26
15. Perpetual Motion Approach             .20  .34  .28 −.04 −.14 −.02 −.03  .03 −.05  .21  .22  .20  .11  .24    –  .34  .29  .14  .13  .15  .18  .25  .26 −.29
16. Popping Bubbles Low Approach          .13  .26  .13  .05 −.09 −.02 −.01  .13 −.16  .27  .51  .41  .20  .25  .34    –  .40  .16  .08  .18 −.15  .14  .15 −.27
17. Popping Bubbles High Approach         .24  .22  .23  .10 −.06  .00  .01  .15 −.12  .19  .29  .39  .19  .20  .29  .40    –  .07  .19  .20 −.06  .08  .21 −.24
18. Pop-Up Snakes Approach                .05  .06  .14 −.13  .03 −.05 −.26 −.27 −.16  .06  .13  .14  .50  .17  .14  .16  .07    – −.01  .13  .03 −.01  .10 −.07
19. Bookmark Active Engagement            .24  .02  .06  .01 −.06 −.11  .00  .07 −.08  .18  .02  .09  .12  .11  .13  .08  .19 −.01    –  .14  .01 −.03  .12  .00
20. Workbench Active Engagement           .04  .07  .13 −.04  .04 −.06 −.09  .05 −.06  .06  .09  .11  .14  .08 −.05  .18  .20  .13  .14    –  .00  .02  .10 −.03
21. Perpetual Motion Persistence         −.06 −.13 −.02  .01  .00 −.03 −.11 −.08  .09 −.10 −.12 −.17 −.02 −.16  .18 −.15 −.06  .03  .01  .00    – −.10 −.09  .12
22. Transparent Box Persistence           .14  .21  .32  .06 −.07  .34  .05  .10  .07  .14  .11  .15 −.01  .14  .25  .14  .08 −.01 −.03  .02 −.10    –  .15 −.19
23. Dinky Toys Inhibitory Control         .08  .10  .15 −.04 −.04  .01  .09  .17 −.13  .17  .14  .20  .12  .17  .26  .15  .21  .10  .12  .10 −.09  .15    – −.13
24. Snack Delay Inhibitory Control        .20 −.34 −.13 −.04  .04 −.07 −.01 −.02  .16 −.18 −.20 −.23  .00 −.26 −.29 −.27 −.24 −.07  .00 −.03  .12 −.19 −.13    –

Note. N = 408. Correlations with an absolute value of .099 or greater and .129 or greater are significant at a p < .05 level and p < .01 (two tailed), respectively (values in the table are rounded to two decimal places).
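As a supplement to the Discussion's description of multilevel modeling (Durbin, 2010), the following minimal sketch illustrates the intercept-and-slope idea: a child's overall level of an emotion and the change in that emotion across increasingly potent episodes. It is not a multilevel model and not the procedure used by Kiel and Buss (2006) or Durbin (2010). It fits a separate ordinary least squares line for each child, without the partial pooling a true MLM provides. The potency ratings, child labels, and scores are hypothetical values chosen only for illustration.

```python
from statistics import mean


def child_level_and_slope(potency, scores):
    """Fit score = level + slope * (potency - mean potency) for one child.

    Because potency is centered, the fitted intercept equals the child's
    mean score across episodes (the "overall level").
    """
    if len(potency) != len(scores) or len(potency) < 2:
        raise ValueError("need at least two (potency, score) pairs of equal length")
    potency_mean = mean(potency)
    score_mean = mean(scores)
    sum_sq = sum((p - potency_mean) ** 2 for p in potency)
    if sum_sq == 0:
        raise ValueError("potency values must not all be equal")
    cross = sum((p - potency_mean) * (s - score_mean)
                for p, s in zip(potency, scores))
    return score_mean, cross / sum_sq


# Hypothetical potency ratings (0 = low, 3 = high) of four episodes for fear.
FEAR_POTENCY = [0, 1, 2, 3]

# Hypothetical fear scores for two children across the same four episodes.
children = {
    "child_a": [0.0, 0.5, 1.0, 1.5],  # fear rises steadily with potency
    "child_b": [1.0, 1.0, 1.0, 1.0],  # same fear level in every episode
}

estimates = {
    child: child_level_and_slope(FEAR_POTENCY, scores)
    for child, scores in children.items()
}

for child, (level, slope) in estimates.items():
    print(f"{child}: overall level = {level:.2f}, "
          f"slope per potency unit = {slope:.2f}")
```

In this toy example the two children differ in slope rather than only in average level, which is the kind of cross-contextual information that a single aggregated composite score does not separate out.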

Received March 26, 2010
Revision received July 16, 2010
Accepted September 14, 2010