
EPIB 668 Examples of familial aggregation and heritability studies

Aurélie Labbe - Winter 2011.

OUTLINE

- What is familial aggregation?
- Case-control and cohort design: the family history approach
- Example 1: Familial aggregation of lung cancer
- Twin study design
- Example 2: Familial aggregation of cancer
- Heritability: definition
- The variance component model to compute heritability
- Twin studies to estimate heritability
- Example 3: Compulsive hoarding

WHAT IS FAMILIAL AGGREGATION?

Genetic questions

What is familial aggregation?

- First step in pursuing a possible genetic etiology of the disease
- Based on phenotypic data only (no DNA needed)
- Demonstrate that the disease tends to run in families more than would be expected by chance
- Examine how that familial tendency is modified by the degree or type of relationship, age or environmental factors

Important: familial aggregation does not separate genetic from environmental effects.

Rationale of aggregation studies

- Identify a group of individuals with a specific disease and determine whether their relatives have an excess frequency of the same disease compared with an appropriate reference population
- Often the outcome of interest is a disease (i.e. affected vs. non-affected), but it can also be a physiological trait with a continuous distribution (e.g. cholesterol level)

CASE-CONTROL AND COHORT DESIGN: THE FAMILY HISTORY APPROACH

Issues in designing family studies

- Goal is to demonstrate that the disease tends to run in families more than would be expected by chance
- Need to sample families rather than individuals
- In standard epidemiological studies, it is hard enough to find appropriate sampling frames for individuals; this is even harder for families (in most countries, including the US)
- For these reasons, most studies of familial aggregation of disease are based on ascertainment of probands (cases or controls), followed by identification of their family members

Different types of designs

- Case-control or cohort approach:
  - Can obtain information from the probands themselves concerning their family history (presence of disease in their relatives)
  - Can enroll relatives in the study for a more detailed evaluation
  - Usually easier to adapt the case-control design for family studies than the cohort design
- Other designs: twin studies

Familial aggregation based on Family History (FH)

- Positive FH = presence of disease in one or more first-degree relatives
- FH should not be considered a simple attribute of a person, comparable to age or cigarette smoking
- FH depends on many factors:
  - Number and types of relatives
  - Biologic relationship with the case/control
  - Age distribution of relatives
  - Disease frequency in the population

More on Family History...

- FH is in general obtained from the cases and controls themselves (or from spouses and parents, if they are not available)
- FH can be:
  - Abbreviated: ask about the presence of disease in relatives
  - Detailed: detailed inquiry about relatives of cases and controls, with and without the disease
- For example, one can extend the FH to a multilevel scale, such as:
  - First degree, other relative, none
  - Two or more, one, none
  - Parent or sib under age 50, parent or sib over age 50, none
- Plenty of scope for imagination!
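The three multilevel scales listed above can be sketched in code. This is only an illustration: the `Relative` record, its field names and the relation labels are hypothetical and do not come from any particular study.

```python
from dataclasses import dataclass
from typing import List, Optional

# Relation labels treated as first-degree relatives (hypothetical coding).
FIRST_DEGREE = {"parent", "sibling", "child"}


@dataclass
class Relative:
    """One reported relative of a proband (hypothetical record layout)."""
    relation: str                          # e.g. "parent", "sibling", "aunt"
    affected: bool                         # reported disease status
    age_at_diagnosis: Optional[int] = None


def fh_by_degree(relatives: List[Relative]) -> str:
    """Scale 1: first degree / other relative / none."""
    affected = [r for r in relatives if r.affected]
    if any(r.relation in FIRST_DEGREE for r in affected):
        return "first degree"
    return "other relative" if affected else "none"


def fh_by_count(relatives: List[Relative]) -> str:
    """Scale 2: two or more / one / none affected relatives."""
    n = sum(r.affected for r in relatives)
    return "two or more" if n >= 2 else ("one" if n == 1 else "none")


def fh_by_age(relatives: List[Relative], cutoff: int = 50) -> str:
    """Scale 3: affected parent or sib, split by earliest age at diagnosis."""
    ages = [r.age_at_diagnosis for r in relatives
            if r.affected and r.relation in {"parent", "sibling"}
            and r.age_at_diagnosis is not None]
    if not ages:
        return "none"
    if min(ages) < cutoff:
        return f"parent or sib under age {cutoff}"
    return f"parent or sib over age {cutoff}"


# Made-up family, for illustration only.
family = [Relative("parent", True, 62),
          Relative("sibling", True, 47),
          Relative("aunt", False)]
print(fh_by_degree(family), "|", fh_by_count(family), "|", fh_by_age(family))
```

The same family can land in different categories depending on the scale chosen, which is one reason FH cannot be treated as a simple attribute of the proband.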

The ideal Family History information...

For whatever type of relatives to be considered, the ideal information would be:

- Number of relatives of each type at risk
- Their disease status (including specific diagnoses)
- Sexes
- Ages at risk (birth dates and dates of diagnosis or death)

There is a trade-off between the number of relative types considered and the extent and quality of the information collected.

Example 1

Familial cancer history and lung cancer risk in United States nonsmoking men and women.

Example 1

- Population-based case-control study of lung cancer in nonsmoking men and women in New York State from 1982 to 1984
- In-person interviews completed for 437 lung cancer cases and 437 matched population controls (matched on sex, age, smoking history, county of residence and type of interview)
- Cases and controls reported their FH of cancer in parents, siblings and children
- Information collected on relatives:
  - country of birth
  - year of birth and death
  - smoking status (yes/no)
  - presence of cancer (yes/no), type of cancer and year of diagnosis

Example 1

Table 1 Sociodemographic characteristics of nonsmoking men and women with lung cancer, population controls, and case and control family members; New York Lung Cancer Study, 1982–1984

| Characteristic | Cases | Controls |
|---|---|---|
| Sex: Females | 218 (49.9%) | 218 (49.9%) |
| Sex: Males | 219 (50.1%) | 219 (50.1%) |
| Mean age (range) | 67 (31–81) | 68 (35–82) |
| Smoking history: Never | 197 (45.1%) | 197 (45.1%) |
| Smoking history: Former | 240 (54.9%) | 240 (54.9%) |
| Interview type: Self | 296 (67.7%) | 305 (69.8%) |
| Interview type: Surrogate | 141 (32.3%) | 132 (30.2%) |
| Mean years school (range) | 11.5 (0–24) | 12.7 (3–20) |
| Histologic type: Adenocarcinoma | 222 (50.8%) | |
| Histologic type: Epidermoid/squamous | 109 (24.9%) | |
| Histologic type: Other | 106 (24.3%) | |
| Marital status: Ever | 374 (86.8%) | 398 (91.7%) |
| Marital status: Never | 57 (13.2%) | 36 (8.3%) |
| Mean age fathers | 70 | 71 |
| % Fathers who smoked | 56 | 55 |
| Mean age mothers | 72 | 74 |
| % Mothers who smoked | 12 | 11 |
| Mean age brothers | 60 | 62 |
| Mean No. of brothers (range) | 1.82 (0–8) | 1.49 (0–8) |
| % Brothers who smoked | 62 | 58 |
| Mean age sisters | 62 | 65 |
| Mean No. of sisters (range) | 1.75 (0–6) | 1.56 (0–7) |
| % Sisters who smoked | 33 | 31 |
| Mean age children | 35 | 35 |
| Mean No. of children (range) | 1.93 (0–8) | 2.08 (0–11) |
| % Children who smoked | 42 | 38 |

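Example 1: Lung cancer risk

Table 2 Parental cancer history and risk of lung cancer in nonsmoking men and women; New York Lung Cancer Study, 1982–1984

| Relative | Cancer in relative | History | No. of cases | No. of controls | OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Father | Any cancer | Yes | 75 | 49 | 1.67 (1.12–2.48) | 0.01 |
| | | No | 296 | 322 | | |
| | Lung cancer | Yes | 16 | 9 | 1.87 (0.79–4.42) | 0.15 |
| | | No | 355 | 362 | | |
| | Aerodigestive tract cancer | Yes | 27 | 11 | 2.78 (1.30–5.95) | 0.009 |
| | | No | 344 | 360 | | |
| | Digestive cancer | Yes | 25 | 21 | 1.20 (0.66–2.17) | 0.55 |
| | | No | 346 | 350 | | |
| | Prostate cancer | Yes | 8 | 8 | 1.00 (0.38–2.66) | 1.00 |
| | | No | 363 | 363 | | |
| Mother | Any cancer | Yes | 75 | 63 | 1.24 (0.85–1.80) | 0.26 |
| | | No | 301 | 313 | | |
| | Lung cancer | Yes | 2 | 3 | 0.67 (0.11–3.99) | 0.66 |
| | | No | 374 | 373 | | |
| | Aerodigestive tract cancer | Yes | 2 | 4 | 0.50 (0.09–2.73) | 0.42 |
| | | No | 374 | 372 | | |
| | Digestive cancer | Yes | 27 | 22 | 1.28 (0.69–2.37) | 0.44 |
| | | No | 349 | 354 | | |
| | Breast cancer | Yes | 25 | 13 | 2.00 (1.00–4.00) | 0.049 |
| | | No | 351 | 363 | | |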

Example 1: Lung cancer risk

- Cases were more likely than their matched controls to report a paternal history of any cancer and of aerodigestive tract cancer
- Cases and controls reported a similar maternal history of cancer, with the exception of breast cancer, which was reported about twice as often for mothers of cases as for mothers of matched controls (OR = 2.00)
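To see where an odds ratio such as the one for paternal history of any cancer comes from, it can be recomputed from the 2 × 2 counts in Table 2. The sketch below assumes a crude (unmatched) odds ratio with a Woolf log-based interval; the study may well have used a matched analysis, so small differences from the published interval are to be expected. The function name is illustrative.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a = cases with positive FH,  b = controls with positive FH,
    c = cases with negative FH,  d = controls with negative FH.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Father, any cancer (Table 2): Yes 75 cases / 49 controls, No 296 / 322.
or_, lower, upper = odds_ratio_ci(a=75, b=49, c=296, d=322)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under these assumptions the crude estimate comes out close to the published 1.67 (1.12–2.48).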

Example 1: Lung cancer risk

Table 3 Cancer history in siblings and offspring and risk of lung cancer in nonsmoking men and women; New York Lung Cancer Study, 1982–1984

| Relative | Cancer in relative | History | No. of cases | No. of controls | OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Sister | Any cancer | Yes | 84 | 54 | 1.66 (1.11–2.47) | 0.01 |
| | | No | 243 | 257 | | |
| | Lung cancer | Yes | 9 | 2 | 4.14 (0.88–19.46) | 0.07 |
| | | No | 318 | 309 | | |
| | Aerodigestive tract cancer | Yes | 11 | 3 | 3.50 (0.96–12.74) | 0.058 |
| | | No | 316 | 308 | | |
| | Digestive cancer | Yes | 18 | 11 | 1.70 (0.77–3.77) | 0.19 |
| | | No | 309 | 300 | | |
| | Breast cancer | Yes | 24 | 11 | 2.07 (0.99–4.31) | 0.053 |
| | | No | 303 | 300 | | |

Example 1: Lung cancer risk

- Cases were significantly more likely than their matched controls to report having had sisters with any cancer
- Cases were significantly more likely than controls to report having had brothers with any cancer

Example 1: Cohort analysis

- Cohorts of fathers, mothers, sisters and brothers were constructed
- Variables included were the age and smoking status of the cohort member
- An indicator variable was included to indicate whether the relative was related to a case or to a control
- Logistic models were fitted to predict the occurrence of various cancers in the cohort of interest
- Risk ratios were computed comparing the cancer risk of case relatives with that of control relatives, controlling for the relative's age and smoking status
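One way to write the model described above is sketched here. The notation (Y, CaseRel, Age, Smoke and the β coefficients) is introduced for illustration and is not taken from the study; Y_i = 1 if relative i developed the cancer of interest, and CaseRel_i = 1 if relative i belongs to a case rather than a control.

```latex
\operatorname{logit} P(Y_i = 1)
  = \beta_0 + \beta_1\,\mathrm{CaseRel}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Smoke}_i ,
\qquad
\widehat{RR} \approx e^{\hat\beta_1}
```

Strictly, exp(β1) from a logistic model is an odds ratio; it approximates the risk ratio when the cancer is rare in the cohort.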

Example 1: Cohort analysis

Table 4 Cohort analysis of parental cancer history and risk of lung cancer in nonsmoking men and women; New York Lung Cancer Study, 1982–1984

| Relative | Cancer in relative | History | RR (95% CI) | P |
|---|---|---|---|---|
| Father | Any cancer | Yes | 1.90 (1.27–2.85) | 0.002 |
| | | No | | |
| | Lung cancer | Yes | 1.85 (0.80–4.33) | 0.15 |
| | | No | | |
| | Aerodigestive tract cancer | Yes | 2.77 (1.34–5.74) | 0.0059 |
| | | No | | |
| | Digestive cancer | Yes | 1.27 (0.69–2.34) | 0.44 |
| | | No | | |
| | Prostate cancer | Yes | 1.51 (0.52–4.42) | 0.45 |
| | | No | | |

Example 1: Cohort analysis

- Fathers of cases were at greater risk of developing any cancer and aerodigestive tract cancers compared to fathers of controls.
- Mothers of cases were at greater risk of developing any cancer and breast cancer compared to mothers of controls.

TWIN STUDY DESIGN

Other type of design: twin studies

- Provide a simple way to separate genetic from environmental factors (in theory)
- Usually collect monozygotic (MZ) and dizygotic (DZ) twins:
  - MZ twins share 100% of their genes
  - DZ twins share 50% of their genes on average (as full sibs do)
- By comparing MZ to DZ twins for concordance of disease, one can assess the relative importance of genetic and environmental factors

Twin study: concordance rate

- There are two main types of concordance rate (the choice depends on the question being asked)
- Let C be the number of concordant affected twin pairs
- Let D be the number of discordant twin pairs
- Casewise concordance rate: 2C/(2C + D). Estimates the probability that one twin is affected given that the other is affected.
- Pairwise concordance rate: C/(C + D). Estimates the proportion of twinships in the population with at least one affected twin that are concordant (if twinships have been identified by complete ascertainment; otherwise, divide C by 2).
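The two definitions above translate directly into code. This is a minimal sketch; the function names are illustrative and the counts in the example are made up.

```python
def casewise_concordance(c: int, d: int) -> float:
    """2C / (2C + D): probability that a twin is affected given an affected co-twin."""
    return 2 * c / (2 * c + d)


def pairwise_concordance(c: int, d: int) -> float:
    """C / (C + D): share of affected twinships that are concordant.

    Assumes complete ascertainment of twinships, as stated on the slide.
    """
    return c / (c + d)


# Hypothetical counts, for illustration only.
c, d = 10, 80
print(casewise_concordance(c, d))  # 20 affected twins with an affected co-twin, out of 100 affected twins
print(pairwise_concordance(c, d))  # 10 concordant pairs out of 90 affected pairs
```

The casewise rate counts individuals and the pairwise rate counts pairs, which is why the same data give two different numbers.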

Twin study: comparison of concordance rate

- If MZ twins have a higher concordance than DZ twins, there is suggestive evidence for a genetic basis of the disease
- Any discordance between MZ twins automatically leads one to conclude that environmental factors must play some role in risk
- Since DZ twins and full sibs share on average 50% of their genes, a higher concordance rate in DZ twins than in full sibs further points to a role for shared environmental factors
- In practice, however, interpretation of twin concordance rates is rarely straightforward

Example 2

Environmental and heritable factors in the causation of cancer.

The New England Journal of Medicine

© Copyright, 2000, by the Massachusetts Medical Society

VOLUME 343 JULY 13, 2000 NUMBER 2

ENVIRONMENTAL AND HERITABLE FACTORS IN THE CAUSATION OF CANCER

Analyses of Cohorts of Twins from Sweden, Denmark, and Finland

PAUL LICHTENSTEIN, PH.D., NIELS V. HOLM, M.D., PH.D., PIA K. VERKASALO, M.D., PH.D., ANASTASIA ILIADOU, M.SC., JAAKKO KAPRIO, M.D., PH.D., MARKKU KOSKENVUO, M.D., PH.D., EERO PUKKALA, PH.D., AXEL SKYTTHE, M.SC., AND KARI HEMMINKI, M.D., PH.D.

Example 2: Background

- The contribution of hereditary factors to the causation of sporadic cancer is unclear.
- Studies of twins make it possible to estimate the overall contribution of inherited genes to the development of malignant diseases.

Example 2: Method

- The authors combined data on 44,788 pairs of twins listed in the Swedish, Danish, and Finnish twin registries.
- Goal is to assess the risks of cancer at 28 anatomical sites for the twins of persons with cancer.
- Statistical modeling was used to estimate the relative importance of heritable and environmental factors in causing cancer at 11 of those sites.

Example 2: Results

- At least one cancer occurred in 10,803 persons among 9,512 pairs of twins.
- An increased risk was found among the twins of affected persons for stomach, colorectal, lung, breast, and prostate cancer.

Example 2: Table 2 (extract)

| Site of cancer | Subjects | Zygosity | No. of concordant affected pairs | No. of discordant pairs | Relative risk (95% CI)† | Concordance‡ |
|---|---|---|---|---|---|---|
| Larynx | Men | MZ | 2 | 22 | 119.2 (23.6–601.1) | 0.15 |
| | | DZ | 1 | 36 | 42.4 (5.4–334.6) | 0.05 |
| | Women | MZ | 0 | 3 | - | 0 |
| | | DZ | 0 | 11 | - | 0 |
| Lung | Men | MZ | 15 | 233 | 7.7 (4.4–13.6) | 0.11 |
| | | DZ | 24 | 436 | 6.7 (4.3–10.5) | 0.10 |
| | Women | MZ | 3 | 63 | 25.3 (7.4–87.0) | 0.09 |
| | | DZ | 1 | 185 | 1.8 (0.2–12.8) | 0.01 |
| Breast | Men | MZ | 0 | 2 | - | 0 |
| | | DZ | 0 | 5 | - | 0 |

Example 2: Interpretation of concordance

Lung cancer in MZ male twin pairs:

| | Twin 1: Disease + | Twin 1: Disease - |
|---|---|---|
| Twin 2: Disease + | a = 15 | b = 233/2 |
| Twin 2: Disease - | c = 233/2 | d = 6983 |

- In this paper, concordance is the proportion of all persons with cancer whose twin had cancer at the same site (Table 2 legend)
- Here, number of males with lung cancer who have an affected twin (lung cancer) = 15 × 2 = 30
- Here, number of males with lung cancer = 15 × 2 + 233 = 263
- Concordance is 30/263 = 0.11
- 11% of all males with lung cancer had a MZ twin with lung cancer

Example 2: Interpretation of concordance in males

- For MZ men, concordance is 0.11
- For DZ men, concordance is 0.10
- Nearly identical concordance in MZ and DZ twins gives no evidence for a genetic basis of lung cancer in males
- Since only 11% of all males with lung cancer had a MZ twin with lung cancer, there is evidence for an environmental factor in risk

Example 2: Interpretation of concordance in females

- For MZ women, concordance is 0.09
- For DZ women, concordance is 0.01
- The concordance for MZ twins is 9 times higher than the concordance for DZ twins. This suggests a genetic basis of lung cancer in females
- Since only 9% of all females with lung cancer had a MZ twin with lung cancer, there is evidence for an environmental factor in risk as well
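The concordance column for lung cancer can be recomputed from the pair counts in the Table 2 extract, using the paper's definition (affected persons with an affected twin, divided by all affected persons). The sketch below only redoes that arithmetic; the variable and function names are illustrative.

```python
# (concordant affected pairs C, discordant pairs D) for lung cancer,
# copied from the Table 2 extract.
lung_pairs = {
    ("men", "MZ"): (15, 233),
    ("men", "DZ"): (24, 436),
    ("women", "MZ"): (3, 63),
    ("women", "DZ"): (1, 185),
}


def concordance(c: int, d: int) -> float:
    """Affected persons with an affected twin (2C) over all affected persons (2C + D)."""
    return 2 * c / (2 * c + d)


rates = {key: concordance(c, d) for key, (c, d) in lung_pairs.items()}

for sex in ("men", "women"):
    mz, dz = rates[(sex, "MZ")], rates[(sex, "DZ")]
    print(f"{sex}: MZ = {mz:.2f}, DZ = {dz:.2f}, MZ/DZ ratio = {mz / dz:.1f}")
```

The rounded rates match the table. The MZ/DZ ratio for women computed from unrounded rates is closer to 8 than to 9, because the figure of 9 comes from dividing the rounded values 0.09 and 0.01; with a single concordant DZ pair, either number is very imprecise.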

HERITABILITY: GENERAL DEFINITION

Genetic epidemiology questions

Introduction

- After documenting familial aggregation for a trait or disease, the next logical step is to ask how much of the familial aggregation can be attributed to genetic causes
- Heritability is typically used to answer this question
- Heritability was originally developed for continuous traits (e.g. cholesterol level, BMI, ...)
- The concept can also be applied to qualitative traits (disease status), but the question needs to be reframed

Datasets

- Heritability is computed from phenotypic data only (no DNA needed), measured on relatives
- The dataset is a set of N families (twins, sib-pairs, nuclear families, more complex families, ...) in which the phenotype is measured on each subject within each family

Definition

- Heritability is defined as the proportion of the trait variation directly attributable to genetic differences among individuals, relative to the total variation in a population
- Heritability is a ratio of variances (i.e. it lies between 0 and 1)
- A high heritability constitutes circumstantial evidence for genetic control of a trait
- A high heritability means that a large proportion of the phenotypic variation among relatives follows patterns predicted by simple genetic factors

Examples of heritability estimates

- ADHD: 80%
- Childhood delinquency: 20%-40%
- Fingerprint ridge count: 98%
- Height: 66%
- IQ: 34%
- Social maturity score: 16%
- Alzheimer disease: 60%-80%

Note: these estimates may vary a lot from one study to another.

THE VARIANCE COMPONENT MODEL TO COMPUTE HERITABILITY

Introduction

- Statistical techniques used to compute heritability are largely derived from analysis of variance
- Analysis focuses on correlations or covariances among relatives
- It attempts to partition these observed correlations (or covariances) into components attributable to shared genes and shared environments
- From these components, heritability can be computed

Model for quantitative traits

Variation of a trait can be separated into genetic and environmental components. The basic model for the trait in the population supposes that:

Var (Trait) = Var (Genetic effect) + Var (Environmental effect)

These two components are called the genetic variance and the environmental variance. Heritability = Var (Genetic effect)/Var (Trait).

Example: Heritability and phenotype variation

- Genotypic difference makes a 3-unit difference in phenotype
- Environmental difference also makes a 3-unit difference
- Heritability = 50%

Example: Heritability and phenotype variation

- Genotypic difference makes a 3-unit difference in phenotype
- Environmental difference makes only a 1-unit difference
- Heritability > 50%
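The two examples above can be checked numerically with the variance decomposition Var(Trait) = Var(Genetic effect) + Var(Environmental effect). The sketch below rests on assumptions that are not stated on the slides: two genotypes and two environments, all equally frequent, independent of each other, and acting additively.

```python
from itertools import product
from statistics import pvariance


def heritability(genetic_shift: float, env_shift: float) -> float:
    """Var(G) / Var(Trait) in a balanced 2 genotypes x 2 environments design.

    Assumed setup (illustrative): each genotype and each environment has
    frequency 1/2, they are independent, and their effects simply add up.
    """
    g_effects = [0.0, genetic_shift]
    e_effects = [0.0, env_shift]
    # One phenotype per genotype-environment combination, all equally likely.
    phenotypes = [g + e for g, e in product(g_effects, e_effects)]
    return pvariance(g_effects) / pvariance(phenotypes)


h_equal = heritability(3, 3)    # genetic and environmental shifts both 3 units
h_small_env = heritability(3, 1)  # environmental shift only 1 unit
print(h_equal, h_small_env)
```

Under these assumptions the first case gives one half, and the second gives a value well above one half, since variances scale with the square of the shift.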

Assumption for heritability analysis

- We assume that there is no statistical gene-environment interaction (additivity of the model): a given genotypic difference makes the same phenotypic difference in all environments
- Gene-environment interaction: different populations live in very different environments, leading to different phenotypic effects for any given genotype

Heritability values

- If heritability = 0, all observed phenotypic variation is attributable to non-genetic factors
- If heritability = 1, all observed phenotypic variation is due to genetic differences
- Greater diversity of environmental factors in a population will lower the estimated heritability
- Populations with a more homogeneous environment will give higher estimates of heritability
- A population that is relatively genetically homogeneous will produce a lower estimate of heritability
- It is hazardous to apply an estimated heritability obtained from one population to another population for predictive purposes

TWIN STUDIES TO ESTIMATE HERITABILITY

Twin studies

- Twins are uniquely matched for age and many environmental factors
- They have been used to study the role of genetic factors for a large number of different traits
- Goal is to compare similarities (correlations) and/or differences (within-pair variances) in MZ and DZ twins to infer and measure genetic control
- Greater concordance in MZ twins compared to DZ twins can argue in favor of genetic factors
- Any discordance in MZ twins underscores a role for environmental factors

Twin studies: key assumptions

- MZ twins are completely identical for all genetic factors
- DZ twins are no more alike genetically than full sibs and share, on average, half of their genes
- Both types of twins are sampled from the same gene pool
- The environmental components of variance are similar for both types of twins

Note that the last assumption may be quite weak, especially for behavioral traits.

Computing heritability

$$H^2 \approx 2\,(\mathrm{Concordance}_{MZ} - \mathrm{Concordance}_{DZ})$$

- If the correlation (concordance) of the phenotype is the same in DZ and MZ twins, heritability is equal to 0
- If the correlation (concordance) is greater in MZ twins than in DZ twins, heritability is positive, and it grows with the size of the difference
- This confirms our previous statement: greater concordance in MZ twins compared to DZ twins can argue in favor of genetic factors
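The approximation above (often called Falconer's formula) is a one-line computation. The sketch below uses made-up twin correlations purely for illustration; the function name is hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """H^2 ~ 2 * (r_MZ - r_DZ), with r the twin correlation (or concordance)."""
    return 2 * (r_mz - r_dz)


# Hypothetical correlations, for illustration only.
h2_some_genetic = falconer_h2(r_mz=0.60, r_dz=0.35)  # MZ more alike than DZ
h2_no_genetic = falconer_h2(r_mz=0.40, r_dz=0.40)    # MZ and DZ equally alike
print(h2_some_genetic, h2_no_genetic)
```

Because this is a simple difference of two estimated quantities, the result is not constrained to lie between 0 and 1: it exceeds 1 whenever the MZ value is more than 0.5 above the DZ value, and it is negative when DZ twins are more alike than MZ twins. Model-fitting approaches avoid this by estimating the variance components directly.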

Example 3

PREVALENCE AND HERITABILITY OF COMPULSIVE HOARDING

Prevalence and Heritability of Compulsive Hoarding: A Twin Study

Alessandra C. Iervolino, Ph.D., Nader Perroud, M.D., Miguel Angel Fullana, Ph.D., Michel Guipponi, Ph.D., Lynn Cherkas, Ph.D., David A. Collier, Ph.D., David Mataix-Cols, Ph.D.

Objective: Compulsive hoarding is a serious health problem for the sufferers, their families, and the community at large. It appears to be highly prevalent and to run in families. However, this familiality could be due to genetic or environmental factors. This study examined the prevalence and heritability of compulsive hoarding in a large sample of twins.

Method: A total of 5,022 twins completed a validated measure of compulsive hoarding. The prevalence of severe hoarding was determined using empirically derived cutoffs. Genetic and environmental influences on compulsive hoarding were estimated using liability threshold models, and maximum-likelihood univariate model-fitting analyses were employed to decompose the variance in the liability to compulsive hoarding into additive genetic and shared and nonshared environmental factors (female twins only; N=4,355).

Results: A total of 2.3% of twins met criteria for caseness, with significantly higher rates observed for male (4.1%) than for female (2.1%) twins. Model-fitting analyses in female twins showed that genetic factors accounted for approximately 50% of the variance in compulsive hoarding, with nonshared environmental factors and measurement error accounting for the other half.

Conclusions: Compulsive hoarding is highly prevalent and heritable, at least in women, with nonshared environmental factors also likely to play an important role.

(Am J Psychiatry 2009; 166:1156–1161)

Background

- Compulsive hoarding is defined as the acquisition of a large number of possessions with failure to discard them
- It represents a serious health problem for the sufferers, their families, and the community at large
- It is associated with substantial psychiatric and medical comorbidity, including obsessive-compulsive disorder, mood disorders, social phobia, personality disorders, obesity, etc.
- Therapeutic response to antidepressants and behavior therapy is poor or partial at best

Goal of the study

- To estimate the prevalence of severe compulsive hoarding in a sample of 5,022 monozygotic and dizygotic twins
- To estimate the contribution of genetic and shared and nonshared environmental factors to compulsive hoarding in a subsample of 4,355 female twins

Sample

- Participants were MZ and DZ twins from the Twins UK adult twin registry
- The sample available for analysis included 2,053 twin pairs
- Over 80% of the sample was female
- Mean age was 55.5 years

Phenotype

- The Hoarding Rating Scale-Self Report (HRS-SR) is a brief self-administered instrument consisting of five items (clutter, difficulty discarding, excessive acquisition, distress, and impairment)
- Each item is measured on a Likert scale ranging from 0 (none) to 8 (extreme), with 4 reflecting moderate symptoms
- The total score can range from 0 to 40
- For the prevalence analysis, the phenotype was the affection status (total score greater than 17)
- For the heritability analysis, the phenotype was the total score

Heritability analysis

- Maximum-likelihood univariate model-fitting analyses were employed to decompose the variance in the liability to compulsive hoarding into additive genetic (A) and shared (C) and nonshared (E) environmental factors
- To establish the best fit for the data, alternative models were tested by systematically dropping paths from the full model (i.e., AE, CE, E)

TABLE 2. Genetic Raw Ordinal Model-Fitting Analyses for the HRS-SR and Standardized Parameter Estimates, Based on the Full ACE Model (a)

| Model (b) | –2 Log Likelihood | χ2 | Δdf | p | Akaike's Information Criterion | Compare To |
|---|---|---|---|---|---|---|
| 1. Fully saturated | 9345.8 | | | | | |
| 2. ACE | 9350.2 | 4.33 | 9 | 0.88 | –13.7 | 1 |
| 3. AE | 9350.3 | 0.1 | 1 | 0.75 | –15.6 | 2 |
| 4. CE | 9374.5 | 24.4 (c) | 1 | <0.01 (c) | 23.37 | 2 |
| 5. E | 9576.0 | 225.9 (c) | 2 | <0.01 (c) | 221.87 | 2 |

(a) HRS-SR=Hoarding Rating Scale–Self Report; A=additive genetic effects; C=shared environmental effects; E=nonshared environmental effects. Thresholds for first- and second-born monozygotic and dizygotic female twins could be equated without any loss in fit (χ2=7.2, df=9, n.s.) and were as follows: –0.14 (95% CI=–0.18 to –0.09), 0.91 (95% CI=0.86 to 0.96), and 2.03 (95% CI=1.94 to 2.12).

(b) Standardized estimates were as follows: A: 0.49 (95% CI=0.30 to 0.57); C: 0.03 (95% CI=0 to 0.20) (parameter can be dropped without any loss in fit); E: 0.48 (95% CI=0.43 to 0.55).

(c) Significant decline in fit. The critical χ2 value for 1 degree of freedom at the 0.05 significance level is 3.84.

Results from heritability analysis

Table 2 shows that the best model is the AE model:

$$\sigma^2 = \sigma_a^2 + \sigma_e^2$$

- This means that there is no shared environmental variance
- The variance component (VC) estimates are: $\sigma_a^2 = 0.49$ and $\sigma_e^2 = 0.48$
- So, the estimated heritability is around 50%
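The model selection behind this conclusion can be retraced from the –2 log-likelihoods in Table 2: each reduced model is compared with the full ACE model through the difference in –2 log-likelihood, which is referred to a chi-square distribution. The sketch below assumes the usual likelihood-ratio reading of the table; the 1-df critical value (3.84) is the one quoted in the table footnote, and 5.99 is the standard tabulated value for 2 df. Differences are computed from the rounded likelihoods shown on the slide, so they can differ from the published χ2 column in the last digit.

```python
# -2 log-likelihoods as reported in Table 2 of the hoarding study.
minus2ll = {"ACE": 9350.2, "AE": 9350.3, "CE": 9374.5, "E": 9576.0}

# Number of parameters dropped relative to the full ACE model.
dropped_df = {"AE": 1, "CE": 1, "E": 2}

# Critical chi-square values at the 0.05 level, by degrees of freedom.
critical = {1: 3.84, 2: 5.99}


def compare_to_ace(model: str):
    """Return (chi-square difference, df, True if the fit is significantly worse)."""
    chi2 = round(minus2ll[model] - minus2ll["ACE"], 1)
    df = dropped_df[model]
    return chi2, df, chi2 > critical[df]


for model in dropped_df:
    chi2, df, worse = compare_to_ace(model)
    verdict = "significant loss of fit" if worse else "no loss of fit"
    print(f"{model} vs ACE: chi2 = {chi2}, df = {df}: {verdict}")
```

Only the AE model can replace ACE without a significant loss of fit, which is why the shared environment component C is dropped while A and E are kept.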

CONCLUSIONS / DISCUSSION

Conclusions / Discussion

- Studies of familial aggregation are normally the first step in investigating a possible genetic etiology for a disease or trait
- A disease or trait that does not run in families is unlikely to have a strong genetic component
- Aggregation studies require the collection of family data
- However, the sampling plan might comprise only a set of unrelated individuals (cases or controls), who are then asked about their family histories without formally enrolling their family members as study subjects
- FH is strongly linked to the status of the proband (think more about that...)

Conclusions / Discussion

- Disease outcome of the probands can be analyzed in relation to their FH as predictor variables (case-control approach)
- Reported disease outcome of the family members can also be treated as a vector of response variables to be analyzed in relation to the disease status of the proband (cohort approach)
- In the latter case, some allowance is needed for the dependence of family members' outcomes on each other (statistical analyses become more complex)

Conclusions / Discussion

- Familial aggregation cannot separate genetic from environmental effects
- A twin study is the only design that can give an insight into genetic and environmental components (not very formal)
- When we study a quantitative trait instead of a disease affection status (e.g. blood pressure, BMI, etc.), the environmental and genetic components can be formally defined and computed through the heritability

Conclusion

- Once a phenotype shows familial aggregation, one needs to assess the relative importance of genetic control
- The statistical tools we use reflect the imbalance between genetic and environmental factors jointly controlling the phenotype

Final remarks

- Heritability calculation is based on models that require strong assumptions (e.g. no epistatic effect, no G × E correlation, no G × E interaction, etc.)
- Heritability estimates should be taken with caution
- Estimates can vary a lot from one study to another
- Heritability models differ a lot in animal studies, where genotypic or environmental factors are controlled