SNP ASSOCIATIONS WITH TUBERCULOSIS SUSCEPTIBILITY IN A UGANDAN
HOUSEHOLD CONTACT STUDY
by
ALLISON REES BAKER
Submitted in partial fulfillment of the requirements
For the degree of Master of Science
Thesis Advisor: Dr. Catherine M. Stein
Department of Epidemiology and Biostatistics
CASE WESTERN RESERVE UNIVERSITY
August, 2010
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
We hereby approve the thesis/dissertation of
______
candidate for the ______degree *.
(signed)______(chair of the committee)
______
______
______
______
______
(date) ______
*We also certify that written approval has been obtained for any proprietary material contained therein.
Table of Contents
List of Tables
Acknowledgements
List of Commonly Used Abbreviations
Chapter 1: Literature Review
    1.1. Genetics of Susceptibility to Tuberculosis
        1.1.1. History and Epidemiology of Tuberculosis
        1.1.2. Candidate Genes
        1.1.3. Genome-wide Linkage Scans
    1.2. Methods for Fine Mapping Analysis
    1.3. Imputation
Chapter 2: Specific Aims
    2.1. Specific Aim 1
    2.2. Specific Aim 2
    2.3. Specific Aim 3
Chapter 3: Methods
    3.1. Data Description
        3.1.1. Sample
        3.1.2. Descriptive Statistics
    3.2. Genotyping
    3.3. Analysis Strategy
        3.3.1. Aim 1: Candidate Gene Analysis
        3.3.2. Aim 2: Fine Mapping Analysis
        3.3.3. Aim 3: Imputation
Chapter 4: Results and Discussion
    4.1. Results
        4.1.1. Candidate Gene Analysis
        4.1.2. Fine-Mapping Analysis
        4.1.3. Imputation
    4.2. Discussion
    4.3. Conclusions and Future Directions
Bibliography
List of Tables
Table 1. Descriptive Statistics
Table 2. Candidate Gene SNPs Departing from HWE
Table 3. Candidate Gene Analysis Results
Table 4. Fine Mapping SNPs Departing from HWE
Table 5. Haplotype Analysis Results for TLR2
Table 6. Haplotype Analysis Results for TLR4
Table 7. Haplotype Analysis Results for TLR6
Table 8. Haplotype Analysis Results for TIRAP
Table 9. Results for Imputed Genotypes on Chromosome 7p
Table 10. Results for Imputed Genotypes on Chromosome 20q
Acknowledgements
I would like to acknowledge and thank my thesis advisor, Dr. Catherine Stein, for providing me with direction and leadership throughout my academic program. I am exceedingly grateful for the incredible mentoring, support and guidance received from
Drs. Courtney Gray and Emma Larkin. Thank you also to Drs. Robert Igo and Robert
Elston, and thanks to Robert Goodloe for his programming assistance and sharing in the student experience. A very special thank you is dedicated to my devoted mother and father, and to my husband, Dave, for without his endearing love and endless support, none of my success would be possible.
List of Commonly Used Abbreviations
AIDS        Acquired Immune Deficiency Syndrome
ASW         African Ancestry in Southwest USA
CARD11      Caspase recruitment domain family, member 11
cM          Centimorgan
CTSZ        Cathepsin Z
GLMM        Generalized Linear Mixed Model
HIV         Human Immunodeficiency Virus
IL-1        Interleukin-1
IL-10       Interleukin-10
IL-12       Interleukin-12
IFNG1-γ     Interferon Gamma
HMM         Hidden Markov Model
HWE         Hardy-Weinberg Equilibrium
kb          Kilobasepair
LD          Linkage Disequilibrium
LTBI        Latent Mycobacterium Tuberculosis Infection
LWK         Luhya in Webuye, Kenya
MAF         Minor Allele Frequency
Mb          Megabasepair
MC3R        Melanocortin 3 Receptor
MKK         Maasai in Kinyawa, Kenya
Mtb         Mycobacterium tuberculosis
NOS2A       Nitric Oxide Synthase 2A
NRAMP1      Natural-Resistance-Associated Macrophage Protein 1
PPD         Purified Protein Derivative
QC          Quality Control
QTL         Quantitative Trait Locus
SLC11A1     Solute Carrier Family 11, Member 1
SNP         Single Nucleotide Polymorphism
TB          Tuberculosis
TBSCPB      Tuberculosis Susceptibility Variable
TDT         Transmission Disequilibrium Test
TLR-2       Toll-Like Receptor-2
TLR-4       Toll-Like Receptor-4
TNF         Tumor Necrosis Factor
TNF-α       Tumor Necrosis Factor-α
TST         Tuberculin Skin Test
UG          Uganda
YRI         Yoruba in Ibadan, Nigeria
SNP Associations with Tuberculosis Susceptibility in a Ugandan Household Contact Study
Abstract
by
ALLISON REES BAKER
The World Health Organization reports that over 9 million new cases of tuberculosis
(TB) are diagnosed each year, killing between 1.6 and 2 million individuals worldwide.
TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis (Mtb), and reports indicate that only 10% of individuals infected with Mtb actually advance to disease. Genetic linkage and association analyses have established several chromosome regions involved in TB susceptibility. This study examines the association of TB susceptibility with a selection of biologically relevant markers, a chromosome 7 region identified through a previous genome scan, and association with imputed genotypes.
Across chromosomes 7 and 20, 564 Ugandan individuals were genotyped at 1,417 SNPs.
None of the candidate genes or fine mapping SNPs were found significantly associated with TB susceptibility (P > 0.10). Five imputed SNPs were significant at the P = 0.01 level. Suggested future work includes GWAS and resequencing analyses.
Chapter 1: Literature Review
1.1. Genetics of Susceptibility to Tuberculosis
1.1.1. History and Epidemiology of Tuberculosis
The World Health Organization (WHO) reports that over 9 million new cases of tuberculosis (TB) are diagnosed each year, killing between 1.6 and 2 million individuals worldwide (World Health Organization 2009). TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis (Mtb), but reports indicate that only 10% of individuals infected with Mtb actually advance to disease (Murray et al. 1990). The pathogenesis of TB follows a two-stage process: a productive infection of Mtb whereby symptoms do not develop, followed by Mtb replication and the expression of disease symptoms, such as persistent cough and pulmonary cavities on chest x-ray (Comstock
1982). A simple skin-test is most commonly used to detect latent Mtb infection, testing reaction to purified protein derivative (PPD). TB disease is characterized by growth of
Mtb on culture, presence of cavities on x-ray, and symptoms such as cough and fever.
Studies have established a direct association between human immunodeficiency virus (HIV) status and TB susceptibility (Chaisson et al. 1987; Pitchenik et al. 1984;
Sunderam et al. 1986; Louie et al. 1986). Due to the growing rate of HIV and acquired immune deficiency syndrome (AIDS), the number of TB cases continues to rise. In fact, having AIDS increases the risk for contracting TB by almost 100 times (Horner and Moss
1991). Almost 3 million of these newly diagnosed cases are found in Africa. In Uganda alone, 426 individuals per 100,000 have been diagnosed with TB (World Health
Organization 2009). The WHO most recently reports that 38% of incident adult TB
cases are also infected with HIV. Because HIV inhibits the immune system, which in
turn is directly attacked by the Mtb, monitoring TB in developing countries where HIV
and AIDS are highly prevalent is of great interest to the research community.
Several arguments have been made for a genetic risk factor in TB development, based on the notion that such a small percentage of individuals infected with Mtb
progress to disease development. Although results remain somewhat inconsistent, animal
models (Skamene et al. 1998; Kramnik et al. 2000), twin studies (Kallmann 1943;
Simonds 1963; Comstock 1978; Wiart et al. 2004), segregation analysis (Shaw et al.
1997), candidate gene studies (Bellamy and Hill 1998; Bellamy 1998a; Bellamy et al.
1998b; Leandro et al. 2009; Stein et al. 2007; Pacheco et al. 2008), linkage analysis
(Bellamy et al. 2000; Cervino et al. 2002; Greenwood et al. 2000; Jamieson et al. 2004;
Miller et al. 2004), and fine-mapping analyses (Cervino et al. 2002) have found evidence in support of a genetic component of TB. Furthermore, European populations have greater TB resistance than populations of African ancestry, possibly the result of longer
exposure times (Dubos 1952). Möller and Hoal suggested that these population
differences are not due only to socioeconomic factors but also to non-environmental
factors, as evident in a United States nursing home study in which individuals of
African descent were found to be twice as likely as individuals of European descent
to be Mtb infected (Möller et al. 2010a; Stead et al. 1990). Thus, an inherited
genetic component of host defense against TB development may exist.
Shaw and colleagues (Shaw et al. 1997) performed a segregation analysis of 98
Brazilian families to determine the mode of inheritance for genetic susceptibility to TB.
After testing four different models (major gene, sporadic, polygenic, and multifactorial),
the investigators concluded that a general two-locus major gene model best described TB susceptibility,
although it was only marginally preferred over a single-locus model (Shaw et al. 1997).
Another segregation and commingling analysis was carried out by Stein et al. (2005),
where antigen-induced tumor necrosis factor α (TNF-α) expression was used as an
endophenotype for TB, such that a major gene model with three underlying means
explained one-third of the phenotypic variance (Stein et al. 2005). Whereas Shaw et al.
(1997) analyzed TB status as a binary trait and suggested an underlying oligogenic
model, Stein and colleagues (2005) concluded that a major gene underlay TNF-α
expression levels in response to stimulation with Mtb culture filtrate, and that
heterogeneity could be explained by age and HIV status. From these suggested models,
linkage and association analyses have been conducted, thus establishing possible
candidate genes involved in TB susceptibility.
1.1.2. Candidate Genes
Several candidate genes have been examined for association with susceptibility to TB
development after infection, and findings from these analyses are inconsistent.
A major candidate gene likely to be involved in TB susceptibility is the natural-resistance-associated macrophage protein 1 gene, formerly NRAMP1 and now referred to as the SLC11A1 (solute carrier family 11 member 1) gene. Skamene et al. (1998) and
Kramnik et al. (2000) showed that the mouse ortholog Nramp1 on chromosome 1 was correlated with susceptibility to infection with Mtb. In a large case-control study in The
Gambia, Africa, Bellamy et al. (1998b) found that in a sample of 827 HIV-uninfected subjects, polymorphisms in the NRAMP1 gene located on chromosome 2q35 were
significantly associated with TB. Although this study design was not able to distinguish between susceptibility to TB infection and susceptibility to clinical TB diagnosis, it provided valuable insight into the genetic susceptibility to TB development.
Several other studies have examined SLC11A1. A meta-analysis by Li et al.
(2006) focused on four specific variants (for HIV-negative individuals only) within the
SLC11A1 gene and reported very inconsistent evidence for its association with TB susceptibility (Li et al. 2006a). Significant linkage of active TB to a variant slightly distal to SLC11A1 was found in a large Aboriginal Canadian family exposed to a TB epidemic (LOD score = 3.81) (Greenwood et al. 2000). Taken together, these findings suggest that SLC11A1 may play at least a marginal role in TB susceptibility.
Also recently identified via animal models is a region on chromosome 1 containing the intracellular pathogen resistance gene (Ipr1), affecting TB resistance in mice (Pan et al. 2005). A West African study identified an association with the closest human homologue of Ipr1, the nuclear body protein gene (SP110) on chromosome 2 (Tosh et al. 2006). However, replication studies in South Africa (Babb et al. 2007a), Ghana (Thye et al. 2006), and Russia (Szeszko et al. 2007) did not confirm this association. Another gene identified through transgenic mouse models is the Toll-like receptor gene, TLR2: TLR2 knock-out mice are highly susceptible to TB infection
(Reiling et al. 2002; Drennan et al. 2004). In fact, mice with the TLR2 gene survived the
Mtb infection much longer than those mice not expressing the gene (Heldwein et al.
2003).
Several candidate genes for TB susceptibility have been identified through biologic plausibility, and results from association analyses of these genes with TB are
inconsistent. The vitamin D receptor (VDR) gene has been inconsistently associated with
TB infection and/or disease in several populations (Bellamy et al. 1999; Selvaraj et al.
2000; Roth et al. 2004; Liu et al. 2004; Chen et al. 2006; Babb et al. 2007b). A meta-analysis by Lewis et al. of the association between the VDR and TB risk reported inconclusive results because the studies reviewed were underpowered due to sample size limitations (Lewis et al. 2005). Other studies have found inconsistent results for association of TB disease with TLR2 (Ben-Ali et al. 2004; Ogus et al. 2004; Thuong et al. 2007; Yim et al. 2006; Bochud et al. 2003), interferon gamma (IFNG) (Rossouw et al. 2003; Lopez-Maderuelo et al. 2003; Etokebe et al. 2006; Moran et al. 2007; Amim et al. 2008; Pacheco et al. 2008), the Interleukin-1 (IL1) complex of genes [IL1: (Bellamy et al. 1998a; Wilkinson et al. 1999); IL1B: (Kusuhara et al. 2007; Awomoyi et al. 2005;
Gomez et al. 2006); IL1RA: (Bellamy et al. 1998a; Wilkinson et al. 1999)], Interleukin-6 (IL6) (Ladel et al. 1997; Oral et al. 2006), Interleukin-10 (IL10) (Lopez-Maderuelo et al. 2003; Stein et al. 2007; Pacheco et al. 2008), Interleukin-12 (IL12) (Leandro et al.
2009), lipase-encoding lipR (Sheline et al. 2009), pro-apoptotic P2X7 (Li et al. 2002;
Nino-Moreno et al. 2007), and nitric oxide synthase 2A (NOS2A) (NOS2A was also
confirmed in knock-out models) (Gomez et al. 2007; Jamieson et al. 2004; Qu et al.
2007; MacMicking et al. 1997). In a 2009 study of Ghanaian patients diagnosed with
clinical TB and Mtb exposed controls, four IL10 promoter variants were genotyped (Thye
et al. 2009a). After analyzing a set of haplotypes reconstructed from these variants, these
authors found the haplotype associated with higher production of IL10 occurred
significantly less often in the PPD-negative controls than in the TB cases (odds ratio (OR) =
2.15, 95% confidence interval (CI) = [1.3–3.6]) and PPD-positive controls (OR = 2.09,
95% CI = [1.2–3.5]) (Thye et al. 2009a). These results further support the view that Mtb-exposed
individuals who retain PPD-negative status are genetically distinct from TB cases and
latently-infected individuals.
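As a rough illustration of how a reported odds ratio and its confidence interval relate to a test statistic, the sketch below back-calculates an approximate standard error, Wald z-score, and two-sided P-value from the first OR quoted above (2.15, 95% CI 1.3 to 3.6). It assumes the interval was a symmetric Wald interval on the log-odds scale, which the source does not state, and the function name `wald_from_ci` is hypothetical. Because the published bounds are rounded, the result is only indicative.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate SE of log(OR), Wald z, and two-sided p from a reported 95% CI.

    Assumption: the CI is symmetric on the log scale (a Wald interval).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return se, z, p

# Values quoted in the text for PPD-negative controls versus TB cases.
se, z, p = wald_from_ci(2.15, 1.3, 3.6)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.4f}")
```

Under these assumptions the z-score is a little under 3, which agrees with a 95% interval that excludes 1.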
In a case-control study of African-Americans and Caucasians, an analysis of 39
tag SNPs in the nitric oxide synthase 2A (NOS2A) gene found nine single nucleotide
polymorphisms (SNPs) significantly associated with TB (P < 0.05), multiple SNP
interactions between NOS2A and the interferon gamma receptor 1 (IFNGR1) gene (P
ranging from 0.0004 to 0.0006), and interactions between NOS2A and the Toll-like
receptor-4 (TLR4) gene (0.002 < P < 0.005), in the African-American individuals only
(Velez et al. 2009). Other studies suggest interactions between TLR4, VDR, IL12 and
IFNG (Ben-Ali et al. 2004; Ogus et al. 2004; Yim et al. 2006). Inconsistent results in
these studies are primarily due to underpowered analyses via small sample sizes,
differences in phenotype definition, racial/ethnic diversity within the study sample, and
discrepancies in the characterization of controls.
Another promising candidate gene is tumor necrosis factor (TNF), which encodes
TNF-α. TNF-α is a proinflammatory cytokine modulated by T cell–macrophage
interaction that mediates granuloma formation as well as the suppression of TB infection
(Keane et al. 2001). In mouse strains deficient in TNF, TB is sometimes lethal, thereby
suggesting evidence of association between the TNF gene and disease (Flynn et al. 1993;
Botha and Ryffel 2003). Although the association between TB susceptibility and TNF
was not found significant in a 2008 meta-analysis (Leandro et al. 2009), variation in the
TNF gene or its promoter region has been associated with increased risk for infectious
diseases, such as malaria, leprosy, and HIV disease progression, as well as certain
autoimmune disorders (e.g. asthma, systemic lupus erythematosus, rheumatoid arthritis,
Crohn’s disease, sarcoidosis, psoriasis, and diabetes) (Bidwell et al. 2001; Haukim et al.
2002). In an analysis of 177 Ugandan pedigrees where TB prevalence was high, Stein et al. found that TNF-α had an estimated heritability of 68% (Stein et al. 2003).
Additionally, Stein et al. (2007) used an intermediate phenotype of TNF-α levels in 398 related individuals from Kampala, Uganda to study associations with candidate genes related to TNF-α regulation. Results showed that in both HIV-negative and HIV-positive individuals, the candidate genes IL10, interferon-gamma receptor 1 (IFNGR1) and TNF-α receptor 1 (TNFR1) were linked and associated with both TB and TNF-α regulation, with the TNFR1 association being novel (Stein et al. 2007). Several other studies have found inconsistent results for association of TB disease with TNF (Correa et al. 2005;
Oral et al. 2006; Stein et al. 2007; Pacheco et al. 2008; Möller et al. 2010b).
In addition to Stein and colleagues (2007), who considered intermediate phenotypes of TB rather than a binary trait (presence/absence of TB), Flores-Villanueva et al. (2005) considered a
Mexican sample of incident confirmed TB cases and healthy (albeit non-vaccinated) controls, such that all individuals were HIV-negative (Flores-Villanueva et al. 2005).
They found that the monocyte chemoattractant protein-1 (MCP1) in the 17q11.2 region was strongly associated with susceptibility to TB development post Mtb infection (P =
0.0003). These results were successfully replicated in a Korean sample. Considering non-vaccinated controls with recent exposure to a TB case assured the researchers that these controls were infected with Mtb but were not TB-diseased (Flores-Villanueva et al.
2005).
Recently, Thye et al. (2009b) examined a set of polymorphisms within the MCP1
gene in a large Ghanaian sample in West Africa. Considering case-control data in
addition to affected nuclear families and a replication analysis of Russian cases and
controls, they found that one of the polymorphisms, MCP1 – 2581G, was significantly associated with resistance to TB in the cases versus the controls (corrected P = 0.0012,
OR = 0.81, 95% CI [0.73–0.91]) and in the nuclear families (P = 0.04, OR = 0.72, 95%
CI not reported), in addition to the MCP1-326C variant being significant in both samples.
However, no associations with infection resistance or disease susceptibility were found in the Russian sample (Thye et al. 2009b). This suggests a possible genetic difference between the African sample and the Russian sample, and haplotype analyses in this same study identified differences in linkage disequilibrium (LD) structure between the
Russians and Africans (Thye et al. 2009b). Further analyses are necessary to determine
disease causality, such as fine mapping and resequencing, as well as continual
examination of differences in LD structure across and within different ethnic groups.
1.1.3. Genome-wide Linkage Scans
In addition to candidate gene studies illustrating evidence of a genetic component to TB
susceptibility, linkage analyses have also provided evidence that genetics influences
TB resistance and/or infection. To further study genetic susceptibility to TB,
Bellamy et al. (2000) conducted a two-stage genome-wide microsatellite scan of African
individuals in The Gambia and South Africa with an average density of one marker every
11 cM. Considering 92 concordantly affected sibling pairs, seven chromosomal regions
were identified in the full genome-wide analysis (LOD score > 1.0), and markers in these regions were then
genotyped in a second set of individuals from the same African regions. Results from the
analysis of the combined data showed suggestive linkage to TB susceptibility on
chromosomes 15q (LOD = 1.82) and Xq (LOD = 2.18), the latter possibly explaining the
excess of TB cases commonly observed in males relative to females (Bellamy et al. 2000).
Miller et al. (2004) conducted a genome-wide non-parametric linkage scan of 405
microsatellite markers across the genome, set approximately 10 cM apart, in 26 Brazilian
families from areas with high TB prevalence (Miller et al. 2004). Eight regions in the
genome provided evidence for suggestive linkage (P < 0.05) to TB, with replicated
linkage peaks on chromosomes 10 (10q26.13) and 20 (20p21.1) (Miller et al. 2004).
Another genome-wide scan on 96 Moroccan families revealed a major
susceptibility locus for TB on chromosome 8q12-13 (Baghdadi et al. 2006). In this
model-free linkage scan, 388 microsatellite markers spanning the genome (average
spacing about 10 cM) were analyzed, with a maximum LOD score of 3.49 (P = 3 × 10⁻⁵). These authors further investigated the area through a model-based linkage analysis of a chromosome 8 region, finding a maximum LOD score similar to the model-free estimate (LOD score = 3.38, P = 4 × 10⁻⁵). Because dividing the data into two sets of
pedigrees (those with and those without affected parents) revealed stronger linkage at the
susceptibility locus in families with at least one affected parent than in the entire sample,
Baghdadi et al. concluded that an autosomal dominant allele of a major susceptibility
gene controlled TB disease in these Moroccan families.
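The LOD scores and P-values quoted above can be related through a standard asymptotic approximation: a LOD score is multiplied by 2 ln(10) to obtain a likelihood-ratio chi-square statistic with one degree of freedom, and the tail probability is halved because the linkage test is one-sided. The sketch below applies that conversion. It is an assumption that Baghdadi et al. obtained their P-values this way (the source does not say), and the function name `lod_to_p` is hypothetical, although the values reported in the text appear consistent with this approximation.

```python
import math

def lod_to_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumption: 2*ln(10)*LOD follows a chi-square distribution with 1 df
    under the null, and the test is one-sided (tail probability halved).
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

for lod in (3.49, 3.38):  # model-free and model-based maxima quoted above
    print(f"LOD = {lod:.2f} -> approximate P = {lod_to_p(lod):.1e}")
```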
To focus on the intermediate phenotypes related to progression from Mtb
infection to TB, Stein et al. (2008) conducted a full family-based genome-wide linkage
analysis of microsatellite markers on 803 Ugandan individuals, both HIV-negative and
HIV-positive. Phenotypes considered included exposure to TB, exposure to TB coupled
with Mtb infection, and culture-confirmed TB. Suggestive linkage to TB disease was
found on a 34 cM long segment on chromosome 7 (P = 0.0002), in addition to a 25 cM
long region on chromosome 20 (P = 0.002) (Stein et al. 2008). Of specific interest is the
reported region on chromosome 7, 7p22-7p21, as it contains the IL6 gene, an
immunoregulatory cytokine which inhibits production of TNF-α and IL-1β, and thus may
be harmful in mycobacterial infections (van Crevel et al. 2002). Ladel et al. found that
IL6-deficient mice were highly susceptible to TB infection and that this infection was
lethal (Ladel et al. 1997), while Oral et al. (2006) did not find significant differences in
the distribution of the IL6 gene polymorphisms or differences in IL6 allele frequencies
between TB cases and controls. Such contradictory evidence warrants further investigation of
the IL6 gene’s involvement in TB susceptibility.
Further upstream in this chromosome 7 region is the gene CARD11 (caspase
recruitment domain family, member 11), which is part of the NOD-like receptor (NLR) pathway. This gene is of interest because NLRs have non-redundant roles in Mtb
recognition (Berrington and Hawn 2007). CARD11 has been associated with TB in a
yet-unpublished genome-wide association study of TB conducted in a Vietnamese
population (Dr. Thomas Hawn, personal communication).
The same region on chromosome 20q13 observed by Stein et al. (2008) was
found to be a major susceptibility locus for TB in a study of South African and Malawian
sibling pairs, HIV-negative and HIV-positive cases included (Cooke et al. 2008). South
Africans considered in this study were of mixed races, and no population stratification
testing or adjustment was performed. Two genes in this chromosome 20 region,
melanocortin 3 receptor (MC3R) and cathepsin Z (CTSZ), were mapped in South African and Malawian populations, a novel discovery in TB susceptibility (Cooke et al. 2008).
Single-point and multi-point sibling pair linkage analysis performed on a set of 402 microsatellite markers identified 64 markers for further study. Forty SNPs in the 1-LOD drop interval around the highest linkage peak on chromosome 20 were then analyzed for association with disease in a large independent West African case-control sample.
Adjusting for age, sex, ethnicity, and HIV status, a logistic regression analysis found significant evidence for association of a protective effect against TB with polymorphisms in both MC3R (protective genotype AA) and CTSZ (protective genotype TT) genes
(Cooke et al. 2008). Both chromosome 20 genes are biologically relevant to TB susceptibility as MC3R plays a suggested role in the regulation of energy homeostasis, while CTSZ is expressed in cancer cell lines with possible involvement in host defense and tumorigenesis (Pruitt et al. 2007).
Most recently, Mahasirimongkol et al. (2009) conducted a genome-wide linkage analysis in Thai pedigrees. In 93 Thai families, a nonparametric multipoint linkage analysis was conducted using MERLIN (Abecasis et al.
2002). Haplotypes were built based on LD between SNPs, and these inferred haplotypes were used as multiallelic markers for the linkage analysis. Accounting for LD in this way, Mahasirimongkol et al. found a maximum LOD score of 2.29 on chromosome
5q23.2-31.3. Additionally, these authors conducted an ordered subset analysis by minimum age of onset of TB, finding two regions of suggestive linkage, 17p13-13.1
(maximum LOD score 2.57) and 20p13-12.3 (maximum LOD score 3.33). (Minimum age of TB onset was used because the authors assumed TB to occur at a younger age
through actual immunological impairment rather than repeated exposure to Mtb or reduced immune response due to old age.) This chromosome 20 region is about 70 cM away from the 20q13 region found by Cooke et al. (2008) and Stein et al. (2008), and thus does not cover the CTSZ gene or MC3R gene.
TB infection is only one phase of the disease’s progression. The latent phase of the disease is also crucial, as roughly 20% of individuals with long term exposure to TB display a natural resistance to infection (Rieder 1999). In an analysis of tuberculin skin test (TST) reactivity, a method of intracutaneous testing for tuberculin sensitivity, a genome-wide linkage search conducted by Cobat et al. (2009) found two loci linked to
TST reaction. Model-free linkage analysis of adjusted residuals was conducted in a
South African population consisting of 128 nuclear families, whereby residuals were adjusted for previous TB diagnosis, age, and sex. Significant linkage for TST positivity (where a “positive TST” was any visible skin reaction of diameter greater than
0 mm) was found on chromosomal region 11p14 (LOD = 3.81), with a lack of response indicating Mtb resistance. Furthermore, these authors considered a second quantitative phenotype, the extent of TST reactivity. Significant linkage to this phenotype was found at chromosomal region 5p15 (LOD = 4.00); fine association mapping of this region using an r² threshold of 80% and minor allele frequency cutoff of 5% identified the
SLC6A3 gene (solute carrier family 6 member 3) as a possible candidate gene for TST reaction. In a Ugandan sample analyzed by Stein et al. (2008), this same region was found suggestive for linkage (P = 0.0005) with persistently negative TST. From these results, the authors concluded individuals who did not become infected, despite persistent exposure to TB, retained this resistance via a genetic difference in their T-cell regulators.
Though there exists an array of evidence for a genetic role in TB susceptibility, no
precise causal genetic variants have been identified. Therefore, the linkage results
reported here should be further evaluated using other genetic analyses, such as
association-based tests and fine mapping approaches.
1.2. Methods for Fine Mapping Analysis
In order to identify the specific gene(s) underlying the trait of interest, linkage
scans must be followed up by fine mapping analyses and genetic sequencing. A “1-LOD
drop interval” from the strongest linkage signal is equivalent to a 96.8% confidence
interval for the location of the site causing linkage (Mangin et al. 1994). It has been
shown that for almost all quantitative trait locus (QTL) models, this 1-LOD interval has
the correct probability of containing the exact QTL (Mangin et al. 1994). Although this
may not hold for all methods of analysis, it seems an appropriate basis. Examples of fine
mapping approaches presented here examine associations with diseases other than TB,
but these methodologies provide information applicable to my analyses conducted on TB
susceptibility.
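One way to see where a figure of roughly 96.8% for the 1-LOD drop interval can come from is the usual asymptotic argument: a drop of one LOD unit corresponds to a likelihood-ratio statistic of 2 ln(10), about 4.6, and the probability that a chi-square variable with one degree of freedom falls below that value is about 0.968. The sketch below computes this directly. It assumes the 1-df chi-square approximation holds, which, as noted above, may not be true for every method of analysis; the function name `lod_drop_coverage` is hypothetical.

```python
import math

def lod_drop_coverage(drop=1.0):
    """Nominal coverage of a LOD-drop support interval.

    Assumption: 2*ln(10)*drop is compared to a chi-square with 1 df,
    whose CDF at x equals erf(sqrt(x / 2)).
    """
    chi2 = 2.0 * math.log(10.0) * drop
    return math.erf(math.sqrt(chi2 / 2.0))

print(f"1-LOD drop: nominal coverage = {lod_drop_coverage(1.0):.3f}")
```

The same function can be called with other drop sizes to compare the nominal coverage of wider or narrower support intervals under the same assumption.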
Examples of fine mapping methods include that of Li et al. (2006b), who studied the association between haplotypes and susceptibility to age-related macular degeneration
(AMD). Considering an area already known to be strongly associated with AMD, the authors conducted a single-SNP association test on 84 SNPs in a region of 123 kb, on related affected individuals versus unrelated controls. Forward stepwise logistic regression was used to construct haplotypes associated with AMD: at each step in the regression, the SNP that increased the likelihood ratio statistic the most was added to
the model, and a permutation approach was then used to compare haplotypes between
cases and controls (Li et al. 2006b).
Jallow et al. conducted a fine-resolution multipoint analysis (Jallow et al. 2009).
After a genome-wide association analysis was performed, an area on chromosome 11 was identified for fine mapping. A 111 kb region in the center of the strongest signal on chromosome 11p15 was sequenced in 62 randomly selected
Gambian controls, who then served as a reference panel for imputation in 2,500 individuals.
Imputation of SNPs satisfying genome-wide association study (GWAS) quality control assessments and of SNPs with relative significance was conducted (Marchini et al. 2007), and a test for trend was applied at each imputed SNP to study the association between any imputed SNP and disease. Their results provided evidence that using multipoint association mapping via model-based imputation can identify the causal variant within a
GWAS signal (Jallow et al. 2009).
Xing et al. identified genetic linkage of systemic lupus erythematosus (SLE) in an African American family-based sample to a region on chromosome 13 previously identified by these authors, confirming prior results. These authors finely mapped 324 microsatellite markers (average distance 11.35 cM apart) plus an additional 12 microsatellites, spanning a localized region of 29.62 cM (with average marker distance
1.97 cM) (Xing et al. 2005). The distribution of alleles shared identical by descent
(IBD) at 13q32 was compared between affected relative pairs, followed by the construction of haplotypes using a two-stage hidden Markov model (HMM) approach as implemented in GENEHUNTER (Kruglyak et al. 1996) and the expectation-maximization
(EM) algorithm as applied in the Statistical Analysis of Genetic Epidemiology
(S.A.G.E.) program DECIPHER (S.A.G.E. v. 6.1.0.) at a ~25.3 cM region. Comparing
these haplotypes against the distribution of alleles shared IBD in affected relative pairs
finely mapped the disease locus (Xing et al. 2005).
Cervino et al. (2002) conducted a two-stage non-parametric sibling-pair linkage
analysis in Gambian and South African populations to identify a TB susceptibility locus
(Cervino et al. 2002). Stage 1 considered a set of 299 microsatellite markers across all chromosomes, followed by a second linkage analysis of seven chromosomal regions, including additional Gambian and South African families. Combining the entire dataset provided the best evidence for linkage at a 14 cM region on chromosome 15q11-13
(combined LOD score 2.00). To follow up these results, 10 microsatellite markers and five SNPs were considered for a fine mapping test of association within the families, plus an additional 44 Guinea-Conakry families. A transmission disequilibrium test (TDT) and exact symmetry test found that a seven base pair deletion in the UBE3A gene was marginally associated with TB (P = 0.002).
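For readers unfamiliar with the TDT, the sketch below shows the standard biallelic form of the statistic: among heterozygous parents of affected offspring, the number of times the allele of interest is transmitted (b) is compared with the number of times it is not transmitted (c) using a McNemar-type chi-square, (b - c)² / (b + c), with one degree of freedom. The transmission counts used here are hypothetical and are not taken from Cervino et al. (2002); the exact symmetry test used alongside the TDT in that study is not shown, and the function name `tdt` is an illustrative choice.

```python
import math

def tdt(transmitted, untransmitted):
    """Biallelic TDT: McNemar-type chi-square (1 df) and its asymptotic p-value."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical counts for illustration only.
chi2, p = tdt(transmitted=30, untransmitted=12)
print(f"TDT chi-square = {chi2:.2f}, P = {p:.4f}")
```

Only heterozygous parents contribute to the counts, which is what makes the test robust to population stratification in family-based samples.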
To accurately localize the disease variant, fine mapping analyses consider a refined region of the genome, with marker densities usually higher than those studied in genome-wide linkage and/or association analyses. Li et al. (2006b) examined SNPs in affected related and unrelated individuals in a haplotype-based approach, while Jallow et al. (2009) applied an imputation-based method to a set of unrelated individuals. Xing et al. (2005) considered African American families in a replication analysis of microsatellite markers using an IBD and haplotype-based approach, while Cervino et al. (2002) performed a TDT and exact symmetry test of microsatellites in a set of African families. These fine mapping approaches provide insight into the analysis of the genetic susceptibility to disease. The choice of method must be appropriate to the type of markers involved, subject selection, ascertainment methods, and sample size.
1.3. Imputation
A method shown to increase the power of detecting an association between a genetic variant and the defined phenotype is the prediction of genotypes at untyped markers, or imputation (Marchini et al. 2007; Servin and Stephens 2007). Analysis of imputed genotypes provides an accurate and valid method of replication, gives better resolution of detected associations, and often approximates the results that would have been obtained by directly genotyping all SNPs near the loci of interest (Servin and Stephens 2007).
Imputation can additionally provide a strategy for quality control by identifying genotyping errors and can be used in family-based studies to help replace missing genotypes of untyped family members (Ellinghaus et al. 2009). A reference population must be identified in order to impute the unknown genotypes, as imputation relies on similar patterns of LD between the study population and a reference population. Currently, the populations provided by the International HapMap Project (2003) are the best characterized reference populations available, although projects such as the 1000 Genomes Project will provide reference haplotypes from more populations in the near future.
Several existing software packages implement SNP genotype imputation. These include BEAGLE (Browning and Browning 2007; Browning and Browning 2009), BIM-BAM (Bayesian Imputation-Based Association Mapping) (Servin and Stephens 2007), IMPUTE (Marchini et al. 2007), MACH (Li and Abecasis 2006), and PLINK (Purcell et al. 2007). Both BIM-BAM and BEAGLE implement haplotype-clustering methods via an HMM. MACH and IMPUTE also apply HMMs, whereas PLINK is centered around multi-marker tagging with basic expectation-maximization (EM) phasing algorithms (Pei et al. 2008). MACH can use the phased reference haplotypes from HapMap directly from the downloadable files, while IMPUTE and BEAGLE use their own reference formats, although HapMap Phase II files can be downloaded from the IMPUTE and BEAGLE websites (Ellinghaus et al. 2009). A comparison of these programs, excluding BIM-BAM (which focuses less on imputation and more on association, and therefore lacks a measure of imputation confidence), examined their accuracies, efficacies, and runtimes (Nothnagel et al. 2009). Considering a German sample of 449 unrelated individuals and using the HapMap CEU reference panel, Nothnagel and colleagues concluded that PLINK failed to impute over 60% of the imputable SNPs and was consistently inaccurate compared to IMPUTE and MACH. IMPUTE and MACH illustrated trade-offs between accuracy and efficacy, although overall they outperformed BEAGLE in both areas. Defining imputation efficacy as “the proportion of imputable SNPs for which the program-specific confidence in an imputed genotype equaled or exceeded a given confidence threshold”, and imputation accuracy as “the concordance rate between the imputed and observed genotypes of these SNPs”, Nothnagel et al. (2009) found that MACH and IMPUTE had almost identical trade-offs between accuracy and efficacy, irrespective of the imputation basis and when varying confidence threshold values. Although BEAGLE was only slightly less accurate than IMPUTE and MACH, PLINK performed consistently more poorly (Nothnagel et al. 2009).
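The two quoted definitions can be made concrete with a short sketch. The Python below is illustrative only: the record layout (confidence, imputed genotype, observed genotype), the function name, and the toy values are assumptions and are not taken from Nothnagel et al. (2009) or from any imputation program. For brevity it scores individual genotype calls rather than whole SNPs.

```python
from typing import List, Tuple

# Hypothetical record: (confidence in the imputed call, imputed genotype, observed genotype).
Record = Tuple[float, str, str]


def efficacy_and_accuracy(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Return (efficacy, accuracy) at a given confidence threshold."""
    if not records:
        raise ValueError("no imputable genotypes supplied")
    # Efficacy: share of imputable calls whose confidence meets or exceeds the threshold.
    retained = [r for r in records if r[0] >= threshold]
    efficacy = len(retained) / len(records)
    if not retained:
        return efficacy, float("nan")
    # Accuracy: concordance between imputed and observed genotypes among retained calls.
    concordant = sum(1 for _, imputed, observed in retained if imputed == observed)
    return efficacy, concordant / len(retained)


# Toy data, invented for illustration.
toy = [
    (0.99, "AA", "AA"),
    (0.95, "AG", "AG"),
    (0.92, "GG", "AG"),
    (0.60, "AG", "AA"),
]
eff, acc = efficacy_and_accuracy(toy, threshold=0.90)
```

Raising the threshold in such a sketch typically lowers efficacy while raising accuracy, which is the trade-off described in the comparison.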
Furthermore, IMPUTE requires that the user define the recombination and mutation rates and is therefore more sensitive to model misspecification (Browning 2008), whereas MACH estimates these rates from the sample data using an iterative EM approach in which the recombination and mutation rates are re-estimated at the end of each iteration. Because of this iterative approach, MACH requires more computational power and running time than IMPUTE and BEAGLE. Additionally, Nothnagel et al. (2009) concluded that MACH and BEAGLE were more user-friendly than IMPUTE. IMPUTE, MACH, and BIM-BAM are considered computationally intensive tools because all observed genotypes are considered when each missing genotype is imputed. PLINK and BEAGLE, on the other hand, can be computationally more efficient because they focus on genotypes for a small number of nearby markers when imputing each missing genotype (Li et al. 2009).
A second review by Ellinghaus et al. (2009) of BEAGLE, IMPUTE, and MACH extended that of Nothnagel et al. (2009). This review provided a detailed comparison of the three programs, making suggestions to the user based on resource accessibility (computer, documentation, cost of license, etc.), input (genotype format, reference data format, conversion utilities), processing (runtime, maximum memory allowance, error correction, etc.), output (quality measures, file size, file format, etc.), and other items such as X-chromosome imputation and accuracy estimation (Ellinghaus et al. 2009). Each program has its limitations: BEAGLE is marginally less accurate than IMPUTE and MACH, IMPUTE is susceptible to model misspecification, and MACH has the potential for longer runtimes. IMPUTE and MACH can only work with diallelic markers, whereas BEAGLE can handle multi-allelic markers.
Imputation can be a very useful tool in the detection of associations between genetic markers and disease. However, appropriate selection of the specific imputation program should be considered.
Chapter 2: Specific Aims
The purpose of this study was to examine the association of TB susceptibility and a
selection of biologically relevant markers and to fine map a region identified through a
previous genome scan, utilizing an imputation analysis.
2.1. Specific Aim 1
To Perform a Candidate-Gene Analysis for TB Susceptibility
Rationale: To identify any possible associations between TB susceptibility and a set of
pre-selected, biologically relevant candidate genes.
Approach: Four candidate genes, IL6, CARD11, CTSZ, and MC3R, were selected for analysis, based on previous findings of suggestive linkage (IL6, CARD11) and replicated linkage (CTSZ, MC3R) (Stein et al. 2008). Within these regions, 57 tag SNPs were selected and genotyped on an Illumina 1536-SNP BeadArray. Association analyses were conducted using a generalized linear mixed model approach as implemented in SAS’s PROC GLIMMIX (Cary software, North Carolina SAS Institute Inc. 2004).
2.2. Specific Aim 2
To Perform a Fine-Mapping Analysis for TB Susceptibility
Rationale: To more finely map a major TB susceptibility locus on chromosome 7p identified through a genome-wide linkage scan (Stein et al. 2008).
Approach: SNPs were selected at equidistant intervals across a region on chromosome 7 based on a well-defined SNP quality score. Association analyses were conducted using a generalized linear mixed model approach as implemented in SAS’s PROC GLIMMIX (Cary software, North Carolina SAS Institute Inc. 2004).
2.3. Specific Aim 3
To Impute Missing Genotypes and Analyze for Associations with TB Susceptibility
Rationale: To increase the power and resolution of a scan for associations between genetic variants and the defined TB susceptibility phenotype.
Approach: Based on the similarity between the LD structure in three HapMap populations (the Yoruba in Ibadan, Nigeria, the Maasai in Kinyawa, Kenya, and the Luhya in Webuye, Kenya) and that of my Ugandan data, unknown SNP genotypes were imputed using the MACH 1.0 software package (Li and Abecasis 2006). The candidate gene analysis and fine-mapping analysis were carried out using the imputed genotypes in SAS’s PROC GLIMMIX (Cary software, North Carolina SAS Institute Inc. 2004).
Chapter 3: Methods
3.1. Data Description
3.1.1. Sample
Study participants were recruited to the Household Contact Study in Kampala,
Uganda (Phase I), between 1995 and 1999, and the Kawempe Community Health Study in Uganda (Phase II), which enrolled study subjects between April 2002 and February
2004 (Guwatudde et al. 2003; Stein et al. 2005). Both studies are supported by the
Tuberculosis Research Unit (Principal Investigator: Dr. W. Henry Boom). Phase I of the study (1995 to 1999) ascertained households through a TB index case. Index cases were identified at the Uganda National Tuberculosis and Leprosy Program (NTLP) Treatment Center as having a positive acid-fast bacilli (AFB) smear and a positive Mtb culture. Households were included in the study if the index case lived with at least one other individual and all household members (including the index case and the parents/guardians of children aged 18 years or less) provided informed consent to participate in the study. Phase II of the study (2002 to 2004) ascertained families through index cases who were required only to have a positive Mtb culture and to be referred to the study by the NTLP. All
household contacts were defined as persons residing in the household for at least seven
consecutive days during the three month period prior to the TB diagnosis of the index
case.
Upon enrollment, all participants were given physical examinations, including
HIV testing. Individuals in the households suspected of having TB received chest x-rays and provided sputum samples for culture and AFB smear. In the case of young children, gastric lavage was performed. Based on these collections, TB diagnoses were assigned as defined by the American Thoracic Society (ATS). All TB-confirmed cases received short-course therapy (ATS 1994). All study participants received TST, using the
Mantoux method of intracutaneous testing for tuberculin sensitivity. Test results were
defined separately for HIV-positive children less than or equal to five years of age (TST
concluded positive if induration diameter was 5mm or greater) versus all HIV-negative
individuals greater than five years of age (TST concluded positive if induration diameter
was 10mm or greater). For those individuals with a negative TST at their baseline
evaluation, a skin test was given again at 3, 6, 12, and 24 months. Subjects with two or more negative TSTs over a two-year period were concluded to be persistently TST-negative. Most individuals, if exposed to TB, convert from non-infected (TST-negative) to Mtb-infected (TST-positive) within the first three months after exposure to an index case. Thus, if individuals did not convert from TST-negative to TST-positive within two years of exposure, it was safely assumed that they would not convert thereafter (Stein and Boom, personal communication). Individuals who converted from a negative TST upon enrollment to a positive TST during the study were identified as part of the latent Mtb infection (LTBI) group. Phase II participants with a positive TST result were offered the anti-tuberculosis medication isoniazid.
3.1.2. Descriptive Statistics
General descriptive statistics were computed for variables of interest using SAS software v9.2 (Cary software, North Carolina SAS Institute Inc. 2004) (Table 1). After removing any samples with call rates less than 90%, 564 individuals with complete genotype data were included from both Phase I and Phase II, with a median age of 16 years. The sample comprised 318 females (56.4%) and 246 males (43.6%); 430 (76.2%) individuals were HIV-negative and 89 (15.8%) were HIV-positive, while the HIV status of the remaining individuals was unknown. A total of 122 (21.6%) individuals had confirmed TB. The sample comprised 243 pedigrees, including 73 singletons, 230 parent-offspring pairs, and 32 sibling pairs, with a mean family size of 5.08 individuals (standard deviation 5.87).
3.2. Genotyping
All genotyping was performed on the Illumina BeadArray platform using a
custom 1536 SNP microarray. Pedigree structures were verified using S.A.G.E.’s
RELTEST (S.A.G.E. v6.1.0.) and RELPAIR (Epstein et al. 2000) in the Stein et al.
(2008) genome scan analysis and population substructure as tested using STRUCTURE
(Pritchard et al. 2000) was found absent from this Ugandan sample (Stein et al. 2008).
Mendelian inconsistencies were removed prior to analysis using MARKERINFO, and all
marker allele frequencies were estimated in FREQ (S.A.G.E. v6.1.0.). Call rates by plate
were considered as an additional measure of quality control (QC), and any SNPs that did not meet the call rate of 90% were excluded. Deviation from Hardy-Weinberg Equilibrium (HWE) (P < 0.001) was examined in the pooled data (cases and controls combined). Signal intensities were also verified as falling into three distinct genotype groups, and thus no SNPs were lost due to inadequate signal. Based on these QC measures, a total of 119 SNPs were excluded from analyses.
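As an illustration of the HWE check, the sketch below applies a Pearson chi-square goodness-of-fit test with one degree of freedom to genotype counts at a single diallelic SNP. The text does not state which HWE test was used, so the choice of test, the function name, and the example counts are all assumptions; an exact test may be preferable for rare alleles.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Cells with zero expectation (monomorphic SNP) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts with a deficit of heterozygotes.
chi2, p_value = hwe_chi_square(200, 0, 200)
flagged = p_value < 0.001  # threshold quoted in the text
```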
Genotyping for the candidate gene analysis was based on a set of SNPs selected after finding suggestive evidence for linkage to an area on chromosome 7 (7p22-p21) containing the IL6 gene (Stein et al. 2008). A suggestive linkage signal (P = 0.002) to
TB on chromosome 20q13 was observed in this same genome scan, replicating the Cooke
et al. (2008) results that found the MC3R and CTSZ genes on chromosome 20q13
mapped to TB in African populations (Cooke et al. 2008), and these gene regions were
genotyped as well. SNPs within the CARD11 gene were also considered as it was found
to be within the 1-LOD support interval from the significant linkage signal on
chromosome 7 (Stein et al. 2008). CARD11, which is part of the NOD-like receptor
(NLR) pathway, is of interest because NLRs have non-redundant roles in Mtb recognition
(Berrington and Hawn 2007), and because of reported association with TB (Dr. Thomas
Hawn, personal communication). Due to its large size (about 13.8 kb wide), CARD11 could not be covered with tag SNPs, and thus 17 SNPs found to be associated with this gene and TB in a Vietnamese study (Drs. Thomas Hawn and Nguyen Thuy Thuong, personal communication) were selected. However, if one of these SNPs was not available on the Illumina platform or did not meet the minor allele frequency (MAF) threshold of 5% or a defined SNP score quality criterion (Illumina SNP quality score > 0.6), it was replaced with the nearest SNP that met the MAF and SNP score thresholds. SNP quality scores were determined by Illumina and are based on the probability of success of the assay and validation of the SNP in at least two populations. Tag SNPs for IL6, CTSZ, and MC3R were selected through the tagger application of the Genome Variation Server (SeattleSNPs Program for Genomic Applications PGA, 2009) using a MAF threshold of 5%, an r² cutoff of 80%, and the HapMap Phase II populations YRI (Yoruba in Ibadan, Nigeria), MKK (Maasai in Kinyawa, Kenya), and LWK (Luhya in Webuye, Kenya). Because the tag SNPs for these three HapMap populations were not the same, any SNP that was identified as a tag SNP in any of these populations was selected for analysis. After removing seven SNPs that did not meet the call rate threshold of 90%, a total of 50 SNPs across the four candidate genes were considered in the final analysis (27 in CARD11, 16 in IL6, five in CTSZ, and two in MC3R).
The SNPs for the fine mapping analysis were selected across a 17.84-Mb region (the 1-LOD drop region) on chromosome 7p. The region was divided into equidistant windows of 11.4 kb, a size chosen to fit the remaining openings on the 1536-SNP bead array. Excluding the windows containing the 17 SNPs associated with TB in the Vietnamese study referenced above, the SNP with the highest quality score was selected from each window. This score was defined as follows:
5 = MAF > 0.15 and SNP score > 0.8;
4 = MAF > 0.10 and SNP score > 0.7;
3 = MAF > 0.10 and SNP score > 0.6;
2 = MAF > 0.10 and SNP score > 0.4;
1 = MAF > 0.05 and SNP score > 0.4.
In the case of a tie, the SNP closer to the center of the window was chosen. This strategy resulted in a total set of 1,367 SNPs for the fine mapping analysis after removing 112 SNPs that did not meet the call rate threshold of 90%.
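The window-based selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Snp` record, the function names, and the example values are hypothetical, and treating SNPs that meet none of the five tiers as ineligible is an assumption that the text does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snp:
    rsid: str         # hypothetical identifier
    position: int     # base-pair position
    maf: float        # minor allele frequency
    snp_score: float  # Illumina SNP quality score


# (tier, MAF must exceed, SNP score must exceed), checked from best to worst.
TIERS = [(5, 0.15, 0.8), (4, 0.10, 0.7), (3, 0.10, 0.6), (2, 0.10, 0.4), (1, 0.05, 0.4)]


def tier(snp: Snp) -> int:
    """Return the quality tier (5 = best); 0 means no tier is met (assumed ineligible)."""
    for level, min_maf, min_score in TIERS:
        if snp.maf > min_maf and snp.snp_score > min_score:
            return level
    return 0


def pick_snp(window_start: int, window_end: int, snps: List[Snp]) -> Optional[Snp]:
    """Pick the highest-tier SNP in the window; ties go to the SNP nearest the center."""
    center = (window_start + window_end) / 2
    candidates = [s for s in snps if window_start <= s.position < window_end and tier(s) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (tier(s), -abs(s.position - center)))


# Invented example: two tier-5 SNPs and one tier-4 SNP in a single 11.4-kb window.
example = [
    Snp("snpA", 1000, 0.20, 0.90),
    Snp("snpB", 5000, 0.20, 0.90),
    Snp("snpC", 5700, 0.12, 0.75),
]
chosen = pick_snp(0, 11400, example)
```

Here `snpB` is chosen: it ties with `snpA` on tier but lies closer to the window center, and the tier-4 `snpC` loses despite sitting exactly at the center.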
Table 1. Descriptive Statistics
              No TB       TB          Total       Chi-Square P
Males         181         65          246 (44%)
Females       261         57          318 (56%)   0.01506
HIV +         34          55          89 (16%)
HIV -         363         67          430 (76%)   < 0.0001
BCG scar      270         56          326 (58%)
No BCG scar   99          41          140 (25%)   0.00317
Total         442 (78%)   122 (22%)   564
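The chi-square P-values in Table 1 can be checked with a short calculation. The sketch below computes a Pearson chi-square statistic for a 2 × 2 table using only the Python standard library and applies it to the sex-by-TB counts above. It assumes no continuity correction was applied, which the text does not state; the function name is illustrative.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a row or column total is zero")
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variate with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Rows: males, females; columns: no TB, TB (counts from Table 1).
chi2_sex, p_sex = chi_square_2x2(181, 65, 261, 57)
```

Under that assumption the result should fall close to the tabulated value of 0.01506 for sex.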
3.3. Analysis Strategy
3.3.1. Aim 1: Candidate Gene Analysis
To assess the association between TB susceptibility and the covariates of interest, the TB
phenotype was dichotomized: (1) non-cases without TB (TST-negative) and latent Mtb infected cases (LTBI) combined, (2) index cases only (probands diagnosed with active
TB at the time of study enrollment). Individuals with active TB were coded as affected while those without active TB (the LTBI and TST-negative individuals) were coded as unaffected. Because these are household data, unrelated singleton individuals were identified within each household and treated as unrelated individuals in the analyses.
Included as covariates in the analysis were HIV status (which, as discussed, has been shown to have a significant effect on TB susceptibility), sex, and age dichotomized at five years. This binary age covariate was included partly because results are inconclusive as to whether young children in TB-endemic settings express less robust IFN-γ responses to Mtb antigens than adults (Kampmann et al. 2006; Lewinsohn et al. 2008). Furthermore, conflicting sex- and age-dependent association results have been reported with alleles of the SLC11A1 gene (Leung et al. 2007; Malik et al. 2005), as well as significant linkage with earlier-onset TB (Mahasirimongkol et al. 2009). Lewinsohn et al. (2008) found that young children (age less than five years) exposed to TB displayed more robust IFN-γ responses than adults.
Due to the relatedness of individuals in this sample, a simple logistic regression analysis that did not take familial correlations into account could not be applied.
Furthermore, because the majority of affected individuals were the parents in these families and not the offspring, the TDT, which compares the number of alleles transmitted from an informative parent heterozygous at that particular SNP to his/her affected offspring, was not appropriate for these data. Therefore, a generalized linear mixed model (GLMM) approach was applied as implemented in SAS’s PROC GLIMMIX (Cary software, North Carolina SAS Institute Inc. 2004). Whereas a generalized estimating equation approach (such as PROC GENMOD) allows only fixed effects to be modeled, PROC GLIMMIX incorporates both fixed and random effects into parameter estimation via “pseudo-likelihood” techniques as in Wolfinger and O’Connell (1993) and Breslow and Clayton (1993).
For an n × 1 vector of observations, Y, and an r × 1 vector of random effects, γ, models fit by the GLIMMIX procedure assume that $E[Y \mid \gamma] = g^{-1}(X\beta + Z\gamma)$, where $g(\cdot)$ is a differentiable monotonic link function and $g^{-1}(\cdot)$ is its inverse. The matrix X is an n × p matrix of rank k, and Z is an n × r design matrix for the model’s random effects. These random effects are assumed to be normally distributed with mean 0 and variance matrix G. The GLMM contains a linear mixed model inside the inverse link function, referred to as the linear predictor, $\eta = X\beta + Z\gamma$ (Cary software, North Carolina SAS Institute Inc. 2004).
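To illustrate how the linear predictor and the inverse link combine, the sketch below evaluates the conditional mean for a single individual under a logit link, so that the inverse link is the logistic function. All covariate codings, coefficient values, and random-effect values are invented for illustration; none are estimates from these data, and the function names are hypothetical.

```python
import math
from typing import Sequence


def inverse_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def conditional_mean(x_row: Sequence[float], beta: Sequence[float],
                     z_row: Sequence[float], gamma: Sequence[float]) -> float:
    """E[Y | gamma] for one individual: inverse link applied to x'beta + z'gamma."""
    if len(x_row) != len(beta) or len(z_row) != len(gamma):
        raise ValueError("design row and coefficient vector lengths differ")
    eta = sum(x * b for x, b in zip(x_row, beta)) + sum(z * g for z, g in zip(z_row, gamma))
    return inverse_logit(eta)


# Hypothetical fixed-effect row: intercept, SNP allele count, HIV status, sex, age group.
x_row = [1.0, 2.0, 1.0, 0.0, 1.0]
beta = [-1.5, 0.1, 1.2, 0.2, -0.3]  # illustrative values only
# Hypothetical random-effect row: the individual belongs to the second of three families.
z_row = [0.0, 1.0, 0.0]
gamma = [0.4, -0.2, 0.1]            # illustrative family effects
mu = conditional_mean(x_row, beta, z_row, gamma)
```

With these invented values the linear predictor is approximately -0.6, which the logistic function maps to a probability below one half.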
The GLIMMIX procedure distinguishes “R” and “G” random effects, depending on whether the variance of the random effect is contained in G or in the variance matrix R, such that $\mathrm{var}[Y] = A^{1/2} R A^{1/2}$, where A is a diagonal matrix of response variances and these variances are functions of the mean. If a random effect is an element of γ, it is a G-side effect. Otherwise, it is an R-side effect; R-side effects are also called “residual” effects. Standard errors of the parameter estimates are obtained from the negative of the inverse of the (observed or expected) second derivative Hessian matrix H.
In this application, the GLMM is approximated by a linear mixed model based on covariance parameter estimates. The resulting linear mixed model is then iteratively fit.
Upon convergence, the new parameter estimates update the linearization, a process which stops when parameter estimates between successive linear mixed model fits change only within a specified range (Cary software, North Carolina SAS Institute Inc. 2004).
The response variable in a GLMM is usually assumed to be independent for all
subjects; however, because these analyses are conducted on pedigree data, there exist
correlations between observations. These correlated data are modeled using the same
link function and linear predictor setup as in the standard logistic model (the independent
case), and the random component is described by the same variance functions as in the independence case. However, the correlation structure, which here is a function of the relationships between individuals in a given family, must be incorporated. PROC GLIMMIX allows the user to define the correlation structure of the data. Here, an exchangeable correlation structure within each pedigree was applied, assuming that all individuals have a correlation of 1 with themselves and a correlation ρ with any other member of their family, i.e., for each of the n = 1, 2, …, i families, the G covariance matrix is defined as
$$G = \begin{pmatrix} 1 & & \rho \\ & \ddots & \\ \rho & & 1 \end{pmatrix}.$$
This may induce some bias, since sibling pairs, for example, are more correlated than avuncular pairs. However, because most of the relative pairs in this particular family structure are parent-offspring pairs, I assumed that an exchangeable correlation structure was most appropriate.
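As a small illustration of the exchangeable structure, the sketch below builds the within-family matrix for a pedigree of a given size. The function name and the value ρ = 0.25 are assumptions chosen for the example; in the analysis ρ is estimated from the data rather than fixed.

```python
from typing import List


def exchangeable_matrix(family_size: int, rho: float) -> List[List[float]]:
    """Return a family_size x family_size matrix with 1 on the diagonal and rho elsewhere."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return [[1.0 if i == j else rho for j in range(family_size)]
            for i in range(family_size)]


# Hypothetical four-member family with an assumed within-family correlation of 0.25.
g_family = exchangeable_matrix(4, 0.25)
```

Every off-diagonal entry is the same, which is exactly why a parent-offspring pair and an avuncular pair are treated as equally correlated under this structure.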
Each SNP was entered into the GLMM individually, along with the three covariates: HIV status, dichotomized age (< 5 years or ≥ 5 years), and sex. Strong LD was observed within the candidate genes (data not shown), and LD between genes was assumed to be absent (r² = 0); therefore, a significance threshold of P = 0.0125 (0.05 / 4 genes) was applied to the candidate gene results.
3.3.2. Aim 2: Fine Mapping Analysis
A total of 1,479 SNPs were selected evenly across a 17.84-Mb region (the 1-LOD
drop interval) on chromosome 7p for fine mapping, i.e. roughly 1 SNP every 11.4kb. Of
these, 1,367 SNPs met the 90% call rate threshold. Although this chromosomal region contains the CARD11 and IL6 candidate genes, SNPs in these genes were not considered in the fine mapping analysis to avoid redundancy with the tag SNPs genotyped in the candidate gene analyses. The same approach applied to the candidate gene analysis was applied to the fine mapping analysis, again by considering TB as a binary trait. Likewise, HIV status, dichotomized age, and sex were included in the model. After careful consideration of the LD present in the fine mapping data, the statistical significance threshold for the markers assumed that the SNPs within the chromosome 7 region had an LD measure of r² = 10% on average (and therefore that 90% were independent), resulting in a significance threshold of P = 4.13 × 10^-5 (0.05/[1,344 × 0.9]).
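The two significance thresholds used in Aims 1 and 2 follow from a Bonferroni-style division of α = 0.05 by the assumed number of independent tests. The short sketch below reproduces that arithmetic; the function name is illustrative, and the inputs (4 genes; 1,344 SNPs of which 90% are treated as independent) are taken directly from the text.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: float) -> float:
    """Per-test significance threshold for a given number of independent tests."""
    if n_independent_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_independent_tests


# Aim 1: four candidate genes, assumed independent of one another.
candidate_threshold = bonferroni_threshold(0.05, 4)

# Aim 2: 1,344 SNPs, with 90% treated as independent.
fine_map_threshold = bonferroni_threshold(0.05, 1344 * 0.9)
```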
3.3.3. Aim 3: Imputation
In order to increase the power and resolution of association detection between these SNPs
and TB susceptibility, untyped genotypes were imputed using MACH 1.0 (Li and
Abecasis 2006). Based on user-friendliness, narrowed regions for imputation, and the
experience of collaborators, MACH was chosen for these analyses. Using HapMap’s
Genome Browser application, chromosomal regions for imputation were selected based
on the genotyped regions in the Ugandan data, plus an extension of 250kb at each end of
the region to capture any SNPs that may be in LD with other SNPs outside of the regions of interest. Given the parental genotypes, the offspring contribute no information, thus MACH assumes all subjects are unrelated in the reference panel. Therefore, it was necessary to correct Mendelian inconsistencies after imputation. Although some family information was
discarded through this approach, accuracy was assumed unaffected since the SNPs
provided were close together.
We used a separate dataset to guide selection of the appropriate HapMap
reference population. Several recent studies suggest that TB susceptibility is related to
the TLR pathway (Berrington and Hawn 2007). To examine the similarity in genotype
frequency and LD structure between HapMap populations and this household study
population, full-exon resequencing of TLR genes was conducted in unrelated individuals
from Uganda, all of Black African descent (Baker et al. 2009). Sequences were aligned
and analyzed with the programs PHRED/PHRAP (Ewing and Green 1998; Ewing et al.
1998) and CONSED (Gordon et al. 1998) and genotypes were constructed. The TLR
genes of interest included TLR2, TLR4, TLR6, and an adaptor-like protein involved in
TLR4 signal transduction, toll/interleukin 1 receptor domain-containing adaptor protein
(TIRAP). A Pearson χ² test with two degrees of freedom was applied to test for significant differences (at the α = 0.05 significance level) in genotype frequencies.
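The genotype-frequency comparison can be sketched as a Pearson chi-square test on a 2 × 3 table (two populations by three genotypes), which has two degrees of freedom. The function name and the counts below are invented for illustration and are not the resequencing data; the closed-form P-value relies on the fact that the survival function of a chi-square variate with two degrees of freedom is exp(-x/2).

```python
import math
from typing import Sequence, Tuple


def genotype_chi_square(pop_a: Sequence[int], pop_b: Sequence[int]) -> Tuple[float, float]:
    """Pearson chi-square (2 df) comparing three genotype counts between two populations."""
    if len(pop_a) != 3 or len(pop_b) != 3:
        raise ValueError("expected three genotype counts per population")
    total_a, total_b = sum(pop_a), sum(pop_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("a population has no genotypes")
    n = total_a + total_b
    chi2 = 0.0
    for a, b in zip(pop_a, pop_b):
        column_total = a + b
        if column_total == 0:
            # A genotype absent from both samples would reduce the degrees of freedom.
            raise ValueError("genotype absent from both populations")
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function with 2 df
    return chi2, p_value


# Invented counts for genotypes (AA, AB, BB) in two hypothetical samples.
chi2_geno, p_geno = genotype_chi_square((30, 10, 0), (0, 10, 30))
differs = p_geno < 0.05
```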
A key concept in imputation is the minor allele frequency (MAF): if MAFs differ significantly between the reference population and the study population, imputation accuracy is greatly reduced. Therefore, to analyze discrepancies in MAFs between the Uganda sample and the pre-selected reference panel, 100 imputed SNPs across the chromosome 7 and chromosome 20 regions were randomly selected, and the imputed genotypes at these SNPs were compared to genotypes from each of the three reference populations using a chi-square test of significance.
In addition to MAF and LD, heterozygosity is an important component of imputation. Nothnagel et al. (2009) found that increasing marker heterozygosity (mildly) reduced imputation accuracy and also reduced the general trade-off between accuracy and efficacy. Thus, the overall marker heterozygosity of these Ugandan genotypes was established using Haploview (Barrett et al. 2005). Additionally, estimating the haplotype frequencies using the EM algorithm (as implemented in SAS’s PROC HAPLOTYPE) in the different populations at the TLR genes provided a more appropriate comparison of genomic structure between the Ugandan and Kenyan populations, leading to a definitive reference population selection.
Pooling the imputed genotypes with the known marker information, i.e. family identification, individual identification, sex, father/mother identification, and covariate values, the candidate gene analysis and fine-mapping analysis were repeated using these pooled data.
Chapter 4: Results and Discussion
4.1. Results
4.1.1. Candidate Gene Analysis
A total of 564 Ugandan individuals were genotyped at 50 SNPs across the four
candidate genes, CARD11, IL6, CTSZ, and MC3R. In the pooled sample (TB cases and
controls combined), the genotype distributions of two SNPs significantly departed from
HWE (Table 2). Based on a defined significance threshold, P = 4.06 × 10^-5, none of the candidate gene SNPs was found to be significantly associated with TB susceptibility (Table 3). Furthermore, none of the 50 SNPs was associated with TB susceptibility at the P = 0.05 level.
Table 2. Candidate Gene SNPs Departing from HWE
SNP          Gene     Position   ObsHET   PredHET   HWpval     MAF     Alleles
rs12700594   CARD11   3123227    0.352    0.48      5.49E-07   0.4     A:G
rs12700386   IL6      22729534   0.459    0.383     1.00E-04   0.258   G:C
*All P-values less than 0.0001
Table 3. Candidate Gene Analysis Results
SNP           Gene     Location     Estimate   Standard Error   P-value
rs6976564     CARD11   2,884,623     0.00075   0.00191          0.69317
rs2644303     CARD11   2,894,071    -0.00013   0.00196          0.94841
rs6948739*    CARD11   2,899,721    -0.00026   0.00196          0.89329
rs2679251     CARD11   2,905,002     0.00025   0.00194          0.89858
rs2527516*    CARD11   2,905,445     0.00005   0.00195          0.97917
rs1878805     CARD11   2,927,940     0.00004   0.00195          0.98493
rs1636166     CARD11   2,941,892    -0.00007   0.00192          0.96952
rs10229368    CARD11   2,952,112     0.00003   0.00195          0.98689
rs746009      CARD11   2,961,390     0.00014   0.00194          0.94085
rs7794674     CARD11   2,973,945    -0.00080   0.00195          0.68235
rs4719737     CARD11   2,985,488     0.00015   0.00194          0.93656
rs1843933     CARD11   2,997,718    -0.00012   0.00196          0.95033
rs11762164    CARD11   3,008,186    -0.00005   0.00195          0.97782
rs12671372    CARD11   3,020,624     0.00051   0.00194          0.79072
rs10951005*   CARD11   3,037,641    -0.00028   0.00195          0.88762
rs10951010    CARD11   3,042,579     0.00067   0.00192          0.72541
rs7805181     CARD11   3,056,808    -0.00012   0.00196          0.95296
rs12700536*   CARD11   3,058,747    -0.00007   0.00194          0.96939
rs1976135*    CARD11   3,060,293     0.00036   0.00195          0.85295
rs1976132*    CARD11   3,060,648     0.00003   0.00195          0.98961
rs6461814*    CARD11   3,062,737    -0.00015   0.00194          0.93769
rs11772124*   CARD11   3,084,400    -0.00045   0.00195          0.81830
rs7791004     CARD11   3,089,458    -0.00052   0.00196          0.78921
rs17150474*   CARD11   3,092,123    -0.00028   0.00195          0.88538
rs6969362     CARD11   3,111,159     0.00021   0.00194          0.91401
rs12700594    CARD11   3,123,227    -0.00074   0.00186          0.68962
rs4722476     CARD11   3,157,114    -0.00001   0.00195          0.99636
rs12700386    IL6      22,729,534    0.00027   0.00188          0.88574
rs3087221     IL6      22,729,942   -0.00014   0.00197          0.94247
rs2069824     IL6      22,731,757    0.00087   0.00193          0.65350
rs2069832     IL6      22,733,958    0.00053   0.00193          0.78387
rs2069835     IL6      22,734,396    0.00040   0.00195          0.83672
rs1474347     IL6      22,734,649    0.00039   0.00195          0.84028
rs2066992     IL6      22,734,774    0.00001   0.00196          0.99724
rs2069839     IL6      22,735,020   -0.00077   0.00195          0.69093
rs2069840     IL6      22,735,097   -0.00019   0.00195          0.92356
rs1554606     IL6      22,735,232    0.00032   0.00194          0.86870
rs2069842     IL6      22,735,835    0.00010   0.00196          0.95805
rs1548216     IL6      22,736,298   -0.00029   0.00196          0.88268
rs2069843     IL6      22,736,519   -0.00039   0.00191          0.83769
rs2069845     IL6      22,736,674   -0.00033   0.00196          0.86730
rs2069846     IL6      22,736,887    0.00003   0.00196          0.98717
rs2069849     IL6      22,737,681   -0.00016   0.00197          0.93427
rs3746619     MC3R     54,257,212   -0.00119   0.00192          0.53693
rs3827103     MC3R     54,257,436   -0.00038   0.00195          0.84549
rs10369       CTSZ     57,003,851    0.00038   0.00195          0.84472
rs9760        CTSZ     57,005,158    0.00007   0.00193          0.97140
rs163790      CTSZ     57,008,989    0.00015   0.00194          0.93728
rs163800      CTSZ     57,011,903    0.00007   0.00196          0.97360
rs163801      CTSZ     57,012,009   -0.00035   0.00197          0.86077
* SNP identified previously by Vietnamese study
** Results adjusted for age (< or ≥ 5 years), sex, and HIV status
4.1.2. Fine-Mapping Analysis
Within these 564 Ugandans, a total of 1,479 SNPs were genotyped across the 17.84-Mb region on chromosome 7, and of these 1,367 met the 90% call rate threshold. Departures from HWE in the pooled sample were found in 24 SNPs (Table 4). No significant associations with TB susceptibility were found with these SNPs (P > 0.10) (data not shown).
Table 4. Fine Mapping SNPs Departing from HWE
SNP          Position   ObsHET   PredHET   HWpval     MAF     Alleles
rs6960928    1263932    0.016    0.071     2.44E-16   0.037   G:A
rs10266549   1440457    0.668    0.499     1.69E-10   0.477   G:A
rs7783310    1712658    0.575    0.473     4.66E-05   0.383   G:A
rs4256490    1857290    0.183    0.24      5.62E-05   0.14    G:A
rs1637755    2188475    0.474    0.404     8.00E-04   0.281   G:A
rs11977057   4094233    0.148    0.484     1.98E-41   0.412   C:G
rs314598     4454229    0.379    0.481     4.99E-05   0.403   G:C
rs11976063   4798495    0.56     0.459     4.36E-05   0.357   A:G
rs12540466   4917659    0.263    0.353     5.09E-06   0.228   A:T
rs627222     6094981    0.217    0.273     3.00E-04   0.163   A:G
rs38019      8014208    0.348    0.458     5.97E-06   0.355   A:T
rs11978224   8652578    0.355    0.484     6.85E-07   0.409   T:A
rs2915125    9873235    0.167    0.496     1.64E-39   0.455   A:G
rs13236165   11651725   0.224    0.499     9.98E-26   0.48    G:A
rs17165063   11707008   0.353    0.443     2.00E-04   0.331   A:T
rs12671228   12528990   0.304    0.404     4.63E-06   0.281   C:A
rs2282880    13921045   0.286    0.457     7.34E-12   0.354   A:G
rs41503      13999181   0.271    0.401     3.98E-09   0.278   T:A
rs7783337    14123511   0.325    0.414     4.71E-05   0.293   A:G
rs7811874    15308407   0.523    0.434     5.72E-05   0.318   G:A
rs10227084   15366197   0.588    0.5       9.00E-04   0.497   C:A
rs10242655   15822701   0.014    0.176     6.96E-40   0.097   C:A
rs38237      15899199   0.24     0.308     6.86E-05   0.19    A:G
* All P-values less than 0.0001.
4.1.3. Imputation
Reference Population for Imputation
In comparing genotype frequencies, the Uganda (UG) population tended to show fewer differences from the Kenyan populations (MKK, LWK) than the Yoruba population (YRI); several differences in the Ugandan population were seen in TLR6, where the Ugandan frequencies were dramatically different. Interestingly, allele frequencies for TLR6 were “flipped” in the Ugandan population (the most common genotype in Ugandan data was opposite to all the other populations).
Comparisons of LD patterns were inconclusive. Differences in LD structure were apparent between the Uganda sample, both Kenyan populations, and the Yoruba population, and thus it was difficult to decide which HapMap population was most like the Ugandan population, suggesting that multiple HapMap populations should contribute to the reference sample. These inconclusive findings were unlikely to be the result of small sample size, since the Ugandan sample was actually slightly larger than the Yoruba sample. However, a limitation of this analysis was that the SNPs covered in these populations did not overlap exactly: the Ugandan sequencing focused on exonic SNPs, while the HapMap data had broader coverage. Because of this lack of SNP overlap, LD was difficult to assess and compare.
Haplotype analyses found that the haplotypes in TLR6 in the UG individuals were not present at all in the other four populations. A Fisher’s exact test of the haplotype frequencies on TIRAP found that the UG frequencies were significantly different (at the
α = 0.05 level) from LWK (P = 0.0412), MKK (P = 0.0014), and YRI (P = 0.0076).
Moreover, YRI was also significantly different from MKK (P < 0.00001). Also interesting was that at TIRAP, the two Kenyan populations had significantly different
haplotype frequencies from one another (P = 0.0002). At TLR2, UG was significantly
different from LWK (P < 0.00001), YRI (P < 0.00001), and MKK (P < 0.00001). Also,
at TLR2, YRI was significantly different from LWK (P < 0.00001) and MKK (P <
0.00001). These haplotype analyses demonstrated that UG differs from both the Kenyan populations and the Yoruba population but that some similarities between these
populations are present, suggesting the use of an imputation reference population that
combines all three populations. Results are provided in Tables 5 - 8.
Table 5. Haplotype Analysis Results for TLR2
HAPLOTYPE: C-C C-T T-T T-C
Population:
LWK 0.06178 0.64608 0.29212
MKK 0.02043 0.617 0.36253
YRI 0.74552 0.01496 0.23951
UG 0.28573 0.49999 0.21427
Table 6. Haplotype Analysis Results for TLR4
HAPLOTYPE: A-A-G A-A-T A-G-G A-G-T G-A-G G-G-G
Population:
LWK 0.75617 0.01764 0.02792 0.06758 0.12391
MKK 0.77033 0.03297 0.03199 0.1472 0.01421
YRI 0.84913 0.01063 0.03088 0.02095 0.0834
UG 0.80263 0.03947 0.15789
Table 7. Haplotype Analysis Results for TLR6
HAPLOTYPE: A-C-C A-C-T G-C-C G-T-C A-G-C A-G-T G-G-C G-G-T
Population:
LWK 0.65348 0.15902 0.15902 0.0284
MKK 0.61879 0.32858 0.02453 0.02336
YRI 0.6902 0.23618 0.05828 0.01532
UG 0.02083 0.67708 0.01042 0.29167
Table 8. Haplotype Analysis Results for TIRAP
HAPLOTYPE: A-C-G-C G-C-A-C G-C-G-C G-T-G-C G-C-G-T
Population:
LWK 0.10565 0.02415 0.74289 0.11736
MKK 0.03668 0.67503 0.22676 0.05059
YRI 0.03064 0.0457 0.81032 0.10301
UG 0.07778 0.01111 0.85556 0.01111 0.04444
Marker Data
To ensure that imputation accuracy was not greatly reduced by MAF discrepancies, genotypes in the imputed data and in each of the pre-selected reference panels were compared at a random selection of 100 SNPs across the chromosome 7 and chromosome 20 regions. Based on these 300 chi-square tests, several significant differences were found. The average observed heterozygosity in the observed Ugandan genotypes was 40%. The average predicted heterozygosity, under the assumption of HWE, was about 39%. Therefore, heterozygosity was not suspected to significantly reduce the accuracy of my imputation.
A total of 11,872 SNPs were imputed across the chromosome 7 and chromosome
20 regions. However, after correcting for Mendelian inconsistencies, only 9,936 of these
SNPs provided sufficient information for the association analyses as inconsistent genotypes were set to missing.
Association Analysis
Results for the SNPs with association P values below 0.05 are provided in Tables
9 and 10. Of the SNPs with suggestive association to TB susceptibility, four were in close proximity with the candidate genes examined, including IL6 and MC3R. The imputation analyses found significance at five SNPs at the P = 0.01 level. Two of these
SNPs are located on the SDK1 gene, the sidekick homolog 1, cell adhesion molecule, a protein coding gene with literature supporting its role in HIV-associated nephropathy
(Kaufman et al. 2004; Kaufman et al. 2007).
Table 9. Results for Imputed Genotypes on Chromosome 7p
SNP Location Gene Estimate Std Error p-Value Gene Distal to SNP Gene Proximal to SNP
rs6952577 838627 UNC84A -0.00483 0.002152 0.02481
rs1113831 1775912 0.005669 0.002708 0.03628 46.041 kb from MAD1L1 21.796 kb from ELFN1
rs10950377 1802421 0.01361 0.006525 0.03696 19.532 kb from MAD1L1 48.305 kb from ELFN1
rs4721174 1903206 MAD1L1 -0.00421 0.001933 0.02943
rs3778994 2142382 MAD1L1 -0.00402 0.00198 0.04233
rs3735093 2254383 NUDT1 0.006108 0.002849 0.03204
rs1799832 2257049 NUDT1 -0.00506 0.002571 0.04914
rs3735111 2616459 IQCE -0.00441 0.002113 0.03668
rs4719646 2721152 AMZ1 -0.00404 0.001988 0.04190
rs798527 2739317 GNA12 3.4885 1.3873 0.01191
rs798521 2742925 GNA12 1.0455 0.4298 0.01499
rs2644295 2826744 GNA12 -0.00404 0.001909 0.03417
rs1182181 2839774 GNA12 -0.00479 0.002233 0.03195
rs1182179 2840175 GNA12 -0.00479 0.002233 0.03195
rs7802106 3442243 SDK1 0.008724 0.004374 0.04608
rs4722830*** 3468403 SDK1 0.008585 0.003142 0.00628
rs1915981*** 3477268 SDK1 0.008585 0.003142 0.00628
rs2002671 3671005 SDK1 0.008828 0.004244 0.03753
rs6964347 3673290 SDK1 0.005732 0.002756 0.03750
rs12112197 3700806 SDK1 -0.00435 0.002077 0.03622
rs13225994 3745464 SDK1 -0.01202 0.005419 0.02659
rs6975070 3759422 SDK1 -0.0041 0.002006 0.04093
rs12701221 3855197 SDK1 0.01034 0.004444 0.01992
rs17134410 4153858 SDK1 -0.00393 0.001953 0.04421
rs669028 4166428 SDK1 -0.00594 0.00293 0.04263
rs4723505 4230186 SDK1 0.005754 0.00279 0.03921
rs9638928 4304786 -0.00369 0.001878 0.04969 383.669 kb from FOXK1 29.629 kb from SDK1
rs7783362 4445115 -0.00459 0.002188 0.03592 243.34 kb from FOXK1 169.958 kb from SDK1
rs7788807 4734565 FOXK1 0.004882 0.002273 0.03170
rs11771525 5446978 -0.00474 0.002361 0.04453 34.975 kb from FBXL18 17.275 kb from TNRC18
rs10249759 6032273 EIF2AK1 0.006975 0.003363 0.03810
rs10224504 6194952 PSCD3 -0.00374 0.001874 0.04607
rs13225983 6545107 -0.01202 0.005419 0.02659 38.482 kb from ZDHHC4 54.733 kb from KDELR2
rs7794450 6592373 ZDHHC4 -0.00576 0.002392 0.01605
rs7795522 6602099 C7orf26 -0.01082 0.005458 0.04751
rs4724889 6909185 -0.00524 0.002415 0.03011 279.585 kb from C1GALT1 76.799 kb from C7orf28B
rs2163639 7162473 -0.00778 0.00323 0.01596 26.297 kb from C1GALT1 330.087 kb from C7orf28B
rs4582461 7162798 -0.00778 0.00323 0.01596 25.972 kb from C1GALT1 330.412 kb from C7orf28B
rs7806755 7209303 0.004401 0.002243 0.04973 155.465 kb from COL28A1 376.917 kb from C7orf28B
rs13246962 7219301 -0.00469 0.002155 0.02970 145.467 kb from COL28A1 386.915 kb from C7orf28B
rs10487590 7243789 C1GALT1 -0.00754 0.00324 0.01994
rs17252812 7264714 -0.00616 0.002983 0.03884 100.054 kb from COL28A1 14.208 kb from C1GALT1
rs1922630 7278275 0.007511 0.003319 0.02361 86.493 kb from COL28A1 27.769 kb from C1GALT1
rs1638201 7281776 0.007957 0.003788 0.03567 82.992 kb from COL28A1 31.27 kb from C1GALT1
rs2270080 7284116 -0.00733 0.003482 0.03534 80.652 kb from COL28A1 33.61 kb from C1GALT1
rs6463665*** 7285249 0.008222 0.003174 0.00959 79.519 kb from COL28A1 34.743 kb from C1GALT1
rs2141911 7295301 0.009059 0.004017 0.02415 69.467 kb from COL28A1 44.795 kb from C1GALT1
rs10273515 7299106 -0.0072 0.003491 0.03912 65.662 kb from COL28A1 48.6 kb from C1GALT1
rs13237015 7299447 -0.00714 0.003639 0.04963 65.321 kb from COL28A1 48.941 kb from C1GALT1
rs12673989 7310506 0.007964 0.003788 0.03549 None within 500 kb. None within 500 kb
rs9648104 7316406 0.009905 0.00407 0.01494 48.362 kb from COL28A1 65.9 kb from C1GALT1
rs1294632 7391125 COL28A1 -0.00581 0.002691 0.03089
rs10952075 7836573 0.5405 0.2749 0.04930 138.374 kb from GLCCI1 111.81 kb from RPA3
rs4725044 7909102 -0.0044 0.002115 0.03755 65.845 kb from GLCCI1 184.339 kb from RPA3
rs10486207 8077306 GLCCI1 0.007259 0.003642 0.04625
rs17153107 8644906 NXPH1 0.01804 0.007459 0.01556
rs12702800*** 8931817 0.6734 0.2242 0.00267 None within 500 kb. 172.699 kb from NXPH1
rs6967777 9037073 0.007679 0.003868 0.04709 None within 500 kb. 277.955 kb from NXPH1
rs293169 9124086 0.004271 0.001923 0.02637 None within 500 kb. 364.968 kb from NXPH1
rs293173 9125176 0.004271 0.001923 0.02637 None within 500 kb. 366.058 kb from NXPH1
rs293181 9126659 0.004271 0.001923 0.02637 None within 500 kb. 367.541 kb from NXPH1
rs293184 9127724 0.004271 0.001923 0.02637 None within 500 kb. 368.606 kb from NXPH1
rs7808679 9138949 -0.00378 0.001907 0.04763 None within 500 kb. 379.831 kb from NXPH1
rs4720828 9213749 -0.00439 0.002186 0.04443 426.675 kb from PER4 454.631 kb from NXPH1
rs2713319 9498692 0.02088 0.008718 0.01663 141.732 kb from PER4 None within 500 kb
rs1910859 9559170 -0.00517 0.002563 0.04369 81.254 kb from PER4 None within 500 kb
rs2709004 9564339 0.008581 0.004075 0.03521 76.085 kb from PER4 None within 500 kb
rs16876171 9646548 -0.00483 0.002335 0.03861 None within 500 kb. 4.576 kb from PER4
rs13234568 9853401 0.01854 0.008438 0.02800 None within 500 kb. 211.429 kb from PER4
rs16876384 10049257 -0.00409 0.001964 0.03742 None within 500 kb. 407.285 kb from PER4
rs2108004 10559409 -0.00448 0.002277 0.04888 379.93 kb from NDUFA4 None within 500 kb
rs10243246 10601851 -0.00567 0.002303 0.01388 337.488 kb from NDUFA4 None within 500 kb
rs2108016 10605337 -0.00445 0.001861 0.01674 334.002 kb from NDUFA4 None within 500 kb
rs10486094 10764944 -0.0047 0.002238 0.03555 174.395 kb from NDUFA4 None within 500 kb
rs7789764 10939888 NDUFA4 -0.00628 0.002796 0.02469
rs1616965 10945922 NDUFA4 -0.0071 0.002922 0.01517
rs6968332 11181819 -0.0043 0.002179 0.04859 194.769 kb from THSD7A 6.05 kb from PHF14
rs12673692 11535056 THSD7A -0.00587 0.002479 0.01798
rs2681050 11615999 THSD7A -0.00415 0.00206 0.04381
rs12699252 11651102 THSD7A -0.00497 0.002523 0.04901
rs6959385 11998352 0.009827 0.004396 0.02537 219.02 kb from TMEM106B 160.003 kb from THSD7A
rs10230889 12286214 -0.00437 0.001942 0.02444 50.819 kb from VWDE 42.8 kb from TMEM106B
rs2033604 12659666 SCIN -0.00604 0.002495 0.01551
rs7811183 12882595 -0.00555 0.002423 0.02188 None within 500 kb. 185.512 kb from ARL4A
rs7780825 12883832 0.01958 0.009378 0.03683 None within 500 kb. 186.749 kb from ARL4A
rs2041235 13171369 -0.00403 0.00189 0.03305 None within 500 kb. 474.286 kb from ARL4A
rs194019 13206844 -0.00498 0.00232 0.03169 None within 500 kb. None within 500 kb
rs7780526 13244474 0.01958 0.009378 0.03683 None within 500 kb. None within 500 kb
rs17167005 13324742 -0.0067 0.003208 0.03670 None within 500 kb. None within 500 kb
rs2051928 13803279 -0.7444 0.362 0.03972 94.103 kb from ETV1 None within 500 kb
rs10243424 14109647 0.01471 0.006387 0.02124 41.551 kb from DGKB 112.072 kb from ETV1
rs1367775 14437323 DGKB -0.0055 0.002791 0.04870
rs17168255 14552554 DGKB -0.00446 0.002029 0.02797
rs7796540 14946709 -0.00677 0.00303 0.02543 259.758 kb from TMEM195 99.109 kb from DGKB
rs12386767 15074710 0.01471 0.006387 0.02124 131.757 kb from TMEM195 227.11 kb from DGKB
rs12699683 15099653 0.009143 0.003575 0.01054 106.814 kb from TMEM195 252.053 kb from DGKB
rs12699696 15248099 TMEM195 0.007406 0.003679 0.04413
rs2389412 15523467 TMEM195 0.005709 0.002742 0.03736
rs11763112 15572453 -0.00522 0.002237 0.01970 44.908 kb from MEOX2 4.288 kb from TMEM195
rs16878648 15939502 -0.005 0.002478 0.04346 154.174 kb from ISPD 246.669 kb from MEOX2
rs1358449 15980657 -0.00514 0.002606 0.04853 113.019 kb from ISPD 287.824 kb from MEOX2
rs17169263 16100239 LOC729920 -0.00483 0.002316 0.03709
rs1295175 16239376 LOC729920 0.007014 0.003028 0.02054
rs7782228 17001343 -0.0064 0.003218 0.04682 303.457 kb from AHR 113.205 kb from AGR3
rs10244656 17062178 -0.01351 0.006231 0.03009 242.622 kb from AHR 174.04 kb from AGR3
rs2166939*** 17502387 -0.00531 0.001999 0.00794 294.523 kb from SNX13 150.087 kb from AHR
rs2084472 17559022 0.01451 0.005852 0.01316 237.888 kb from SNX13 206.722 kb from AHR
rs7781652 17669942 0.003981 0.001942 0.04032 126.968 kb from SNX13 317.642 kb from AHR
rs12669067 17984469 0.006301 0.002896 0.02961 48.455 kb from PRPS1L1 37.813 kb from SNX13
rs7800421 18029281 1.0346 0.4824 0.03200 3.643 kb from PRPS1L1 82.625 kb from SNX13
rs2961310 22690933 -0.00427 0.00215 0.04683 42.357 kb from IL6 184.507 kb from MGC87042
rs4321884 22707985 -0.00503 0.00212 0.01767 25.305 kb from IL6 201.559 kb from MGC87042
rs7802277 22749139 0.01018 0.004062 0.01220 69.637 kb from TOMM7 10.995 kb from IL6
* Results adjusted for age (< or ≥ 5 years), sex, and HIV status
** All P-values < 0.05, *** denotes P-value < 0.01
Table 10. Results for Imputed Genotypes on Chromosome 20q
SNP Location Gene Estimate Std Error p-Value Gene Proximal to SNP Gene Distal to SNP
rs6513195 54159977 1.2301 0.6237 0.04859 97.217 kb from MC3R 146.558 kb from CBLN4
rs163781 56997161 TH1L -0.00432 0.00191 0.02359
rs6026742 57174001 -0.00666 0.003174 0.03589 25.468 kb from ZNF831 122.705 kb from SLMO2
rs259997 57216650 C20orf174 -1.6598 0.8367 0.04730
* Results adjusted for age (< or ≥ 5 years), sex, and HIV status
** All P-values < 0.05
4.2. Discussion
From the analyses conducted here, none of the candidate gene SNPs or fine mapping SNPs were significantly associated with TB susceptibility at their respective significance thresholds (P < 0.025 and P < 4.13 × 10⁻⁵). Relaxing these thresholds to P < 0.05 still yielded no associations with these variants. Analyses of the imputed genotypes were also non-significant at their respective threshold, although 117 imputed SNPs were significant at the P < 0.05 threshold. However, none of these SNPs were within or near any of the candidate gene regions.
The genome-wide linkage results reported by Stein et al. (2008) provided promising initial evidence for TB susceptibility loci on chromosome 7 and the replicated regions on chromosome 20. Following up these linkage results, the candidate gene and fine mapping analyses here examined genetic associations in already published genomic regions. Imputation of genotypes can increase power to detect associations, but the appropriate reference panel is instrumental for accurate and precise imputation. It is possible that so few associations were found within the imputed data because the reference population selected did not fairly represent the LD structure in the Ugandan population. Also, some of the imputation software assumes individuals are unrelated, thus family data may weaken genotype imputation validity.
Recall that the GLMM applied to assess genetic association assumed an exchangeable correlation matrix to represent familial correlations, introducing potentially biased results. Ignoring avuncular and grandparental relationships may have led to under-reported levels of significance, having over-corrected for these correlations.
Assuming that the observations are more correlated than they truly are can thus increase type II error rates. In spite of this, 230 (58%) of the relative pairs were parent-offspring pairs.
It is possible that the Stein et al. (2008) results were not confirmed here because the exact familial correlations were not integrated into the analyses, as correlations between full siblings were most likely under-estimated. Future analyses that correctly account for pedigree structure will be implemented in S.A.G.E.’s ASSOC program, whereby the familial correlations are modeled as random effects and variance components (additive polygenic, nuclear family, spousal, sibling, and individual environmental) are estimated, using a linear mixed model in which each marker is included as a fixed effect and the likelihood is maximized over all parameters (Gray-McGuire et al. 2009).
Replication of an association with CTSZ was not successful. However, the significant association with CTSZ identified by Cooke et al. (2008) was based upon the analysis of a single SNP within that gene, rs34069356. When the SNP panel was designed for genotyping, this SNP did not meet my inclusion criteria because it had not been validated by Illumina. This SNP was also not imputed; thus replication of the Cooke et al. (2008) association with this SNP was not possible. The significant associations of the MC3R SNPs, rs3746619 and rs3827103, with TB susceptibility as reported by Cooke et al. (2008) were also not replicated. Recall that these significant associations from Cooke et al. (2008) were found in a South African population of fully independent sibling pairs, including “Coloureds”, South Africans of mixed ancestry. Those investigators were not able to replicate their association in a second case-control analysis taken from a West African sample. Based on the LD and haplotype analyses done here, we found that the Ugandan population carries genetic distinctions from other African populations, including novel polymorphisms and LD structure that differ from a South African sample (Baker et al. 2009). Therefore, it is not unexpected that the Cooke et al. (2008) results were not replicated.
Results for the imputation analyses should be interpreted with caution, as 61% of the chi-square tests analyzing differences between the random selection of 100 imputed SNPs and each of the reference populations (MKK, LWK, YRI) were found statistically significant. These results may be due to local ancestry at these SNPs, such that even though the Ugandan data resemble those of the three reference populations globally, locally the SNPs are similar to only a single population. Also, selective sweeps in this region may have influenced these analyses. A better approach to measuring the imputation accuracy may be to remove a selection of the real Ugandan SNPs and impute these given the HapMap reference populations, and then compare these imputed genotypes to the real genotypes.
Of the suggestively significant imputed SNPs, four were found in close proximity with the candidate genes examined, such that rs7802277 (P = 0.01220) was located 11 kb from IL6. Also, rs4321884 (P = 0.01767) was 25 kb proximal to IL6. Distal to IL6 was
rs2961310 (P = 0.04683), 42.357 kb away, and rs6513195 (P = 0.04859) was found
97.217 kb away from MC3R. The imputation analyses found significance at five SNPs at the P = 0.01 level. Two of these SNPs are located in the SDK1 gene, further supporting the role of HIV status in TB susceptibility.
4.3. Conclusions and Future Directions
One of the difficulties in studying the genetic component of TB lies in how the
phenotype is defined, as well as which populations are considered for analysis. Latent
TB is an important aspect of the immunological sequence from exposure to active TB,
thus further research must be conducted to better understand this intermediate phase and
how both environment and ancestry affect it. It may be beneficial to consider other TB phenotypes, such as grouping LTBI individuals with index cases and studying this group versus the non-TB cases. Even more informative may be considering a polytomous regression, including all three levels of TB susceptibility. A possible limitation of the analyses presented here is that only 22% of the individuals genotyped were TB cases.
This study followed the previously reported linkage signals and their significant linkage to TB susceptibility in a genome-wide scan (Stein et al. 2008). Therefore, levels of significance in these association tests should be interpreted with caution. P-values may not be the best determinant of significant association to TB susceptibility; when the purpose is to locate the most likely point in a region already identified by linkage, a P-value is irrelevant. Perhaps fitting a curve over the entire region and locating its maximum value through the use of LD may be more appropriate. Roeder et al. (2006) suggested dividing each P-value by a weight assigned to its hypothesis and then applying a traditional false discovery rate (FDR) analysis to the weighted P-values. The weights are determined by prior information, and the power to detect association increases as a result (Roeder et al. 2006). However, Roeder et al. (2006) assumed that the linkage analyses included fully informative affected sibling pairs, that the association study consisted of cases and controls (not related individuals), and that the weight selection was performed with care.
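The weighting idea described above can be sketched as follows. This is a minimal illustration, not the implementation of Roeder et al. (2006): it assumes non-negative weights that are rescaled to average one, divides each P-value by its weight, and then applies the standard Benjamini-Hochberg step-up rule. The function name, the example P-values, and the example weights are all hypothetical; in practice the weights would be derived from the prior linkage evidence.

```python
def weighted_bh(p_values, weights, q=0.05):
    """Indices of hypotheses rejected by Benjamini-Hochberg on p_i / w_i.

    Weights must be non-negative; they are rescaled here to have mean 1
    (an assumption of this sketch). A weight of zero means the hypothesis
    can never be rejected.
    """
    m = len(p_values)
    if m == 0 or len(weights) != m:
        raise ValueError("p_values and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]
    # Weighted P-values, capped at 1.
    weighted = [min(p / w, 1.0) if w > 0 else 1.0
                for p, w in zip(p_values, scaled)]
    # Step-up rule: find the largest rank k with P_(k) <= q * k / m.
    order = sorted(range(m), key=lambda i: weighted[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= q * rank / m:
            n_reject = rank
    return sorted(order[:n_reject])
```

With hypothetical P-values of 0.02, 0.04, 0.3, and 0.6, equal weights reject nothing at q = 0.05, whereas up-weighting the first hypothesis (weights 3, 0.5, 0.25, 0.25) allows it to be rejected. The gain for favoured hypotheses comes at the cost of reduced power for the down-weighted ones, which is why careful weight selection matters.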
Because association analyses are powered to detect associations with markers that are in LD with the causal SNP (or causal loci), unless the causal SNP itself has been genotyped, a haplotype analysis may provide more insight into disease susceptibility than single SNP analyses (Bailar and Hoaglin 2009). Clark (2004) explained that analyzing phased haplotypes improves the results of candidate gene studies, for three reasons. First, protein-coding (i.e. functional) genes are defined by haplotypes. Second, haplotypes are representative of the genetic differences between populations, and depending upon ancestry and demography, haplotypes composed of synonymous SNPs can be population specific (Schaid 2004). Third, analyzing haplotypes rather than SNPs reduces the dimensionality of the analysis, thus potentially increasing statistical power (Clark 2004). Haplotypes will be constructed using S.A.G.E.’s DECIPHER program, analyzed in ASSOC, and these results will be compared to the SNP results (S.A.G.E. v6.1.0).
A possible limitation to the imputation analysis may be that imputation in MACH
automatically ignores SNPs present in the pedigree file but not in the reference panel.
Therefore, future work will compare imputed genotypes for SNPs included in the
Ugandan data to the actual SNP genotypes to evaluate imputation accuracy. For those genotypes missing from the pedigree dataset, imputation accuracy cannot be directly
assessed in this way. Instead, SNPs could be deliberately omitted from the source dataset
and imputed. Comparing these imputed genotypes to the known genotypes may provide
insight into the accuracy of imputation in these instances.
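The masking comparison proposed above reduces to a simple concordance calculation once the imputed and known genotypes are aligned. The sketch below assumes genotypes coded as minor-allele counts (0, 1, or 2) with None for missing calls; the function name and the example data are hypothetical, and the imputation step itself is not shown.

```python
def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of non-missing genotype pairs (coded 0/1/2) that agree.

    Pairs in which either genotype is None are skipped, so the rate is
    computed only over genotypes that can actually be compared.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    compared = 0
    matched = 0
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        if truth is None or imputed is None:
            continue
        compared += 1
        if truth == imputed:
            matched += 1
    if compared == 0:
        raise ValueError("no non-missing genotype pairs to compare")
    return matched / compared
```

For instance, known genotypes [0, 1, 2, 1, None] compared with imputed genotypes [0, 1, 1, 1, 2] give a concordance of 0.75 over the four comparable pairs. Because overall concordance can be inflated at SNPs with low MAF, it may be more informative to report it separately by MAF bin or by genotype class.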
Other areas of interest include genome-wide association studies (GWAS).
Currently, no GWAS on TB have been published. A major limitation to these studies is
genotyping cost. Another limitation is that identification of causal variants is not
guaranteed as GWAS are intended to identify variants under the common disease,
common variant (CDCV) hypothesis, whereby common variants with small to moderate
effects are detected. This opposes the common disease, rare variant (CDRV) hypothesis,
which theorizes that common diseases are the result of several rare variants across the genome, such that each of these rare variants has a moderate to large effect on the disease. Therefore, if TB susceptibility is in fact the product of rare variants, a GWAS approach may not detect significant associations. Having found significant linkage but not association at the chromosome 7 and chromosome 20 regions (Stein et al. 2008) suggests rare variants may be responsible for genetic susceptibility to TB. To detect these rare variants, a copy number variation (CNV) analysis could be applied, whereby stretches in the genome, 1 kb or longer, present differences in the number of copies of a variant (or variants) between populations. These repeated “chunks” of the genome are then used as the units of variance under study, as opposed to single nucleotide variants.
It may be possible that additional inherited characteristics of the DNA controlled by epigenetics also influence phenotype, such that DNA transcription and/or tissue-specific expression of certain genes are regulated in the absence of altered nucleotide sequences (Möller et al. 2010a). Even more specific to the sequences themselves is a resequencing approach, such as has been conducted on the TLR genes (Ma et al. 2007).
Resequencing, like CNV analysis, assumes the CDRV hypothesis, and thus can potentially detect rare variants. In a full-exon resequenced region, Ma et al. (2007) found
coding variants in the TLR1 and TLR10 genes were significantly more expressed in
African American TB cases than African American controls. TB cases and controls matched on European and Hispanic ethnicity were also examined, and differences in the frequency of rare nonsynonymous polymorphisms were present in the Europeans (at
TLR10) and the Hispanics (at TLR2) (Ma et al. 2007). Further resequencing of exonic regions of the genome may lead to additional discoveries of rare variants influencing TB susceptibility. However, unless the sample size is very large or is heavily ascertained, a single rare variant may not be detectable; instead, the total “load” of rare mutations at many sites within a gene may be the relevant exposure for testing association.
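The notion of a total "load" of rare mutations can be illustrated with a simple collapsing score, sketched below. This is an assumption-laden example and not a method applied in this study: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, or 2, with None for missing), and the MAF cutoff of 0.01 is an arbitrary illustrative choice.

```python
def rare_variant_load(genotypes, mafs, maf_cutoff=0.01):
    """Per-individual count of minor alleles at variants with MAF < cutoff.

    genotypes: one list per individual, one entry per variant (0/1/2 or None).
    mafs: minor allele frequency of each variant, in the same order.
    """
    rare = [j for j, freq in enumerate(mafs) if freq < maf_cutoff]
    return [sum(person[j] for j in rare if person[j] is not None)
            for person in genotypes]


def mean_load_by_status(loads, is_case):
    """Mean rare-variant load in cases and in controls, as a tuple."""
    cases = [x for x, c in zip(loads, is_case) if c]
    controls = [x for x, c in zip(loads, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return sum(cases) / len(cases), sum(controls) / len(controls)
```

The resulting per-individual load would then be tested for association in place of any single variant. In family data such as these, that test would still need to account for relatedness and for the covariates used elsewhere in this thesis.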
Although experiment-wide statistical significance was not achieved in these analyses, the imputation results suggest MC3R and IL6 may play important roles in TB susceptibility, although these SNPs with suggested associations were located quite far from these genes. A major limitation to consider is the small sample size of only 564 individuals. Future work should pursue how genes influence immunological traits and their effects on TB susceptibility.
Bibliography
Abecasis, G. R., S. S. Cherny, W. O. Cookson, and L. R. Cardon, 2002 Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat.Genet 30: 97-101.
Amim, L. H., A. G. Pacheco, J. Fonseca-Costa, C. S. Loredo, M. F. Rabahi et al. 2008 Role of IFN-gamma +874 T/A single nucleotide polymorphism in the tuberculosis outcome among Brazilians subjects. Mol.Biol.Rep. 35: 563-566.
Awomoyi, A. A., M. Charurat, A. Marchant, E. N. Miller, J. M. Blackwell et al. 2005 Polymorphism in IL1B: IL1B-511 association with tuberculosis and decreased lipopolysaccharide-induced IL-1beta in IFN-gamma primed ex-vivo whole blood assay. J.Endotoxin.Res. 11: 281-286.
Babb, C., E. H. Keet, P. D. van Helden, and E. G. Hoal, 2007a SP110 polymorphisms are not associated with pulmonary tuberculosis in a South African population. Hum Genet 121: 521-522.
Babb, C., M. L. van der, N. Beyers, C. Pheiffer, G. Walzl et al. 2007b Vitamin D receptor gene polymorphisms and sputum conversion time in pulmonary tuberculosis patients. Tuberculosis (Edinb.) 87: 295-302.
Baghdadi, J. E., M. Orlova, A. Alter, B. Ranque, M. Chentoufi et al. 2006 An autosomal dominant major gene confers predisposition to pulmonary tuberculosis in adults. J Exp.Med. 203: 1679-1684.
Bailar III J. C., and D. C. Hoaglin, 2009 Medical Uses of Statistics. John Wiley & Sons, Inc., Hoboken, New Jersey.
Baker, A. B., A. Randhawa, M. Shey, M. de Kock, G. Kaplan et al. 2009 Comparison of genotype frequencies in Toll-like receptor genes in Ugandans, South Africans, and African HapMap populations. Poster presented at the American Society of Human Genetics Annual Meeting, October 2009, Honolulu, HI.
Barrett, J. C., B. Fry, J. Maller, and M. J. Daly, 2005 Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21: 263-265.
Bellamy, R., N. Beyers, K. P. McAdam, C. Ruwende, R. Gie et al. 2000 Genetic susceptibility to tuberculosis in Africans: a genome-wide scan. Proc.Natl.Acad.Sci.U.S.A 97: 8005-8009.
Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, M. Thursz et al. 1999 Tuberculosis and chronic hepatitis B virus infection in Africans and variation in the vitamin D receptor gene. J.Infect.Dis. 179: 721-724.
Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, H. C. Whittle et al. 1998a Assessment of the interleukin 1 gene cluster and other candidate gene polymorphisms in host susceptibility to tuberculosis. Tuber.Lung Dis. 79: 83-89.
Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, H. C. Whittle et al. 1998b Variations in the NRAMP1 gene and susceptibility to tuberculosis in West Africans. N.Engl.J.Med. 338: 640-644.
Bellamy, R. J., and A. V. Hill, 1998 Host genetic susceptibility to human tuberculosis. Novartis.Found.Symp. 217: 3-13.
Ben-Ali, M., M. R. Barbouche, S. Bousnina, A. Chabbou, and K. Dellagi, 2004 Toll-like receptor 2 Arg677Trp polymorphism is associated with susceptibility to tuberculosis in Tunisian patients. Clin.Diagn.Lab Immunol. 11: 625-626.
Berrington, W. R., and T. R. Hawn, 2007 Mycobacterium tuberculosis, macrophages, and the innate immune response: does common variation matter? Immunol.Rev. 219: 167-186.
Bidwell, J., L. Keen, G. Gallagher, R. Kimberly, T. Huizinga et al. 2001 Cytokine gene polymorphism in human disease: on-line databases, supplement 1. Genes Immun. 2: 61-70.
Bochud, P. Y., T. R. Hawn, and A. Aderem, 2003 Cutting edge: a Toll-like receptor 2 polymorphism that is associated with lepromatous leprosy is unable to mediate mycobacterial signaling. J Immunol. 170: 3451-3454.
Botha, T., and B. Ryffel, 2003 Reactivation of latent tuberculosis infection in TNF-deficient mice. J Immunol. 171: 3110-3118.
Breslow, N., and D. G. Clayton, 1993 Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88: 9-25.
Browning, B. L., and S. R. Browning, 2009 A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84: 210-223.
Browning, S. R., 2008 Missing data imputation and haplotype phase inference for genome-wide association studies. Hum Genet 124: 439-450.
Browning, S. R., and B. L. Browning, 2007 Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084-1097.
Cervino, A. C., S. Lakiss, O. Sow, R. Bellamy, N. Beyers et al. 2002 Fine mapping of a putative tuberculosis-susceptibility locus on chromosome 15q11-13 in African families. Hum.Mol.Genet. 11: 1599-1603.
Chaisson, R. E., G. F. Schecter, C. P. Theuer, G. W. Rutherford, D. F. Echenberg et al. 1987 Tuberculosis in patients with the acquired immunodeficiency syndrome. Clinical features, response to therapy, and survival. Am.Rev.Respir.Dis. 136: 570-574.
Chen, X. R., Y. L. Feng, Y. Ma, Z. D. Zhang, C. Y. Li et al. 2006 Study on the association of two polymorphisms of the vitamin D receptor (VDR) gene with susceptibility to pulmonary tuberculosis (PTB) in the Chinese Tibetans. Sichuan Da Xue Xue Bao Yi Xue Ban 37: 847-851.
Clark, A. G., 2004 The role of haplotypes in candidate gene studies. Genet Epidemiol. 27: 321-333.
Comstock, G. W., 1982 Epidemiology of tuberculosis. Am.Rev.Respir.Dis. 125: 8-15.
Comstock, G. W., 1978 Tuberculosis in twins: a re-analysis of the Prophit survey. Am.Rev.Respir.Dis. 117: 621-624.
Cooke, G. S., S. J. Campbell, S. Bennett, C. Lienhardt, K. P. McAdam et al. 2008 Mapping of a novel susceptibility locus suggests a role for MC3R and CTSZ in human tuberculosis. Am.J.Respir.Crit Care Med. 178: 203-207.
Correa, P. A., L. M. Gomez, J. Cadena, and J. M. Anaya, 2005 Autoimmunity and tuberculosis. Opposite association with TNF polymorphism. J.Rheumatol. 32: 219-224.
Drennan, M. B., D. Nicolle, V. J. Quesniaux, M. Jacobs, N. Allie et al. 2004 Toll-like receptor 2-deficient mice succumb to Mycobacterium tuberculosis infection. Am J Pathol. 164: 49-57.
Dubos, R. J., 1952 Discussion on treatment of tuberculous meningitis and survival of bacilli in tuberculous lesions. Am Rev.Tuberc. 65: 637-640.
Ellinghaus, D., S. Schreiber, A. Franke, and M. Nothnagel, 2009 Current software for genotype imputation. Hum.Genomics 3: 371-380.
Epstein, M. P., W. L. Duren, and M. Boehnke, 2000 Improved inference of relationship for pairs of individuals. Am J Hum Genet 67: 1219-1231.
Etokebe, G. E., L. Bulat-Kardum, M. S. Johansen, J. Knezevic, S. Balen et al. 2006 Interferon-gamma gene (T874A and G2109A) polymorphisms are associated with microscopy-positive tuberculosis. Scand.J.Immunol. 63: 136-141.
Ewing, B., and P. Green, 1998 Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194.
Ewing, B., L. Hillier, M. C. Wendl, and P. Green, 1998 Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175-185.
Flores-Villanueva, P. O., J. A. Ruiz-Morales, C. H. Song, L. M. Flores, E. K. Jo et al. 2005 A functional promoter polymorphism in monocyte chemoattractant protein-1 is associated with increased susceptibility to pulmonary tuberculosis. J.Exp.Med. 202: 1649-1658.
Flynn, J. L., J. Chan, K. J. Triebold, D. K. Dalton, T. A. Stewart et al. 1993 An essential role for interferon gamma in resistance to Mycobacterium tuberculosis infection. J.Exp.Med. 178: 2249-2254.
Gomez, L. M., J. M. Anaya, J. R. Vilchez, J. Cadena, R. Hinojosa et al. 2007 A polymorphism in the inducible nitric oxide synthase gene is associated with tuberculosis. Tuberculosis (Edinb.) 87: 288-294.
Gomez, L. M., J. F. Camargo, J. Castiblanco, E. A. Ruiz-Narvaez, J. Cadena et al. 2006 Analysis of IL1B, TAP1, TAP2 and IKBL polymorphisms on susceptibility to tuberculosis. Tissue Antigens 67: 290-296.
Gordon, D., C. Abajian, and P. Green, 1998 Consed: a graphical tool for sequence finishing. Genome Res 8: 195-202.
Gray-McGuire, C., M. Boehnke, R. Goodloe, and R. C. Elston, 2009 Research: Genetic association tests: A method for the joint analysis of family and case-control data. Human Genomics 4.
Greenwood, C. M., T. M. Fujiwara, L. J. Boothroyd, M. A. Miller, D. Frappier et al. 2000 Linkage of tuberculosis to chromosome 2q35 loci, including NRAMP1, in a large aboriginal Canadian family. Am.J.Hum.Genet. 67: 405-416.
Guwatudde, D., M. Nakakeeto, E. C. Jones-Lopez, A. Maganda, A. Chiunda et al. 2003 Tuberculosis in household contacts of infectious cases in Kampala, Uganda. Am J Epidemiol. 158: 887-898.
Haukim, N., J. L. Bidwell, A. J. Smith, L. J. Keen, G. Gallagher et al. 2002 Cytokine gene polymorphism in human disease: on-line databases, supplement 2. Genes Immun. 3: 313-330.
Heldwein, K. A., M. D. Liang, T. K. Andresen, K. E. Thomas, A. M. Marty et al. 2003 TLR2 and TLR4 serve distinct roles in the host immune response against Mycobacterium bovis BCG. J Leukoc.Biol. 74: 277-286.
Horner, P. J., and F. M. Moss, 1991 Tuberculosis in HIV infection. Int.J.STD AIDS 2: 162-167.
The International HapMap Project. 2003 Nature 426: 789-796.
Jallow, M., Y. Y. Teo, K. S. Small, K. A. Rockett, P. Deloukas et al. 2009 Genome-wide and fine-resolution association analysis of malaria in West Africa. Nat.Genet.
Jamieson, S. E., E. N. Miller, G. F. Black, C. S. Peacock, H. J. Cordell et al. 2004 Evidence for a cluster of genes on chromosome 17q11-q21 controlling susceptibility to tuberculosis and leprosy in Brazilians. Genes Immun. 5: 46-57.
Kallmann, F. J., and D. Reisner, 1943 Twin studies on the significance of genetic factors in tuberculosis. American Review of Tuberculosis 549-571.
Kampmann, B., G. Tena-Coki, and S. Anderson, 2006 Blood tests for diagnosis of tuberculosis. Lancet 368: 282-283.
Kaufman, L., G. Yang, K. Hayashi, J. R. Ashby, L. Huang et al. 2007 The homophilic adhesion molecule sidekick-1 contributes to augmented podocyte aggregation in HIV-associated nephropathy. Federation of American Societies for Experimental Biology 21: 1367-1375.
Kaufman, L., K. Hayashi, M. J. Ross, M. D. Ross, and P. E. Klotman, 2004 Sidekick-1 is upregulated in glomeruli in HIV-associated nephropathy. J Am Soc.Nephrol. 15: 1721-1730.
Keane, J., S. Gershon, R. P. Wise, E. Mirabile-Levens, J. Kasznica et al. 2001 Tuberculosis associated with infliximab, a tumor necrosis factor alpha-neutralizing agent. N.Engl.J.Med. 345: 1098-1104.
Kramnik, I., W. F. Dietrich, P. Demant, and B. R. Bloom, 2000 Genetic control of resistance to experimental infection with virulent Mycobacterium tuberculosis. Proc.Natl.Acad.Sci.U.S.A 97: 8560-8565.
Kruglyak, L., M. J. Daly, M. P. Reeve-Daly, and E. S. Lander, 1996 Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58: 1347-1363.
Kusuhara, K., K. Yamamoto, K. Okada, Y. Mizuno, and T. Hara, 2007 Association of IL12RB1 polymorphisms with susceptibility to and severity of tuberculosis in Japanese: a gene-based association analysis of 21 candidate genes. Int.J.Immunogenet. 34: 35-44.
Ladel, C. H., C. Blum, A. Dreher, K. Reifenberg, M. Kopf et al. 1997 Lethal tuberculosis in interleukin-6-deficient mutant mice. Infect.Immun. 65: 4843-4849.
Leandro, A. C., M. A. Rocha, C. S. Cardoso, and M. G. Bonecini-Almeida, 2009 Genetic polymorphisms in vitamin D receptor, vitamin D-binding protein, Toll-like receptor 2, nitric oxide synthase 2, and interferon-gamma genes and its association with susceptibility to tuberculosis. Braz.J Med.Biol.Res 42: 312-322.
Leung, K. H., S. P. Yip, W. S. Wong, L. S. Yiu, K. K. Chan et al. 2007 Sex- and age-dependent association of SLC11A1 polymorphisms with tuberculosis in Chinese: a case control study. BMC.Infect.Dis. 7: 19.
Lewinsohn, D. A., S. Zalwango, C. M. Stein, H. Mayanja-Kizza, A. Okwera et al. 2008 Whole blood interferon-gamma responses to Mycobacterium tuberculosis antigens in young household contacts of persons with tuberculosis in Uganda. PLoS.One. 3: e3407.
Lewis, S. J., I. Baker, and S. G. Davey, 2005 Meta-analysis of vitamin D receptor polymorphisms and pulmonary tuberculosis risk. Int.J.Tuberc.Lung Dis. 9: 1174-1177.
Li, Y., and G. R. Abecasis, 2006 Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet S79: 2290.
Li, C. M., S. J. Campbell, D. S. Kumararatne, R. Bellamy, C. Ruwende et al. 2002 Association of a polymorphism in the P2X7 gene with tuberculosis in a Gambian population. J Infect.Dis. 186: 1458-1462.
Li, H. T., T. T. Zhang, Y. Q. Zhou, Q. H. Huang, and J. Huang, 2006a SLC11A1 (formerly NRAMP1) gene polymorphisms and tuberculosis susceptibility: a meta-analysis. Int.J.Tuberc.Lung Dis. 10: 3-12.
Li, M., P. Atmaca-Sonmez, M. Othman, K. E. Branham, R. Khanna et al. 2006b CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age-related macular degeneration. Nat.Genet. 38: 1049-1054.
Li, Y., C. Willer, S. Sanna, and G. Abecasis, 2009 Genotype imputation. Annu.Rev.Genomics Hum Genet 10: 387-406.
Liu, W., W. C. Cao, C. Y. Zhang, L. Tian, X. M. Wu et al. 2004 VDR and NRAMP1 gene polymorphisms in susceptibility to pulmonary tuberculosis among the Chinese Han population: a case-control study. Int.J.Tuberc.Lung Dis. 8: 428-434.
Lopez-Maderuelo, D., F. Arnalich, R. Serantes, A. Gonzalez, R. Codoceo et al. 2003a Interferon-gamma and interleukin-10 gene polymorphisms in pulmonary tuberculosis. Am.J.Respir.Crit Care Med. 167: 970-975.
Louie, E., L. B. Rice, and R. S. Holzman, 1986 Tuberculosis in non-Haitian patients with acquired immunodeficiency syndrome. Chest 90: 542-545.
Ma, X., Y. Liu, B. B. Gowen, E. A. Graviss, A. G. Clark et al. 2007 Full-exon resequencing reveals toll-like receptor variants contribute to human susceptibility to tuberculosis disease. PLoS.One. 2: e1318.
MacMicking, J. D., R. J. North, R. LaCourse, J. S. Mudgett, S. K. Shah et al. 1997 Identification of nitric oxide synthase as a protective locus against tuberculosis. Proc.Natl.Acad.Sci.U.S.A 94: 5243-5248.
Mahasirimongkol, S., H. Yanai, N. Nishida, C. Ridruechai, I. Matsushita et al. 2009 Genome-wide SNP-based linkage analysis of tuberculosis in Thais. Genes Immun. 10: 77-83.
Malik, S., L. Abel, H. Tooker, A. Poon, L. Simkin et al. 2005 Alleles of the NRAMP1 gene are risk factors for pediatric tuberculosis disease. Proc.Natl.Acad.Sci.U.S.A 102: 12183-12188.
Mangin, B., B. Goffinet, and A. Rebai, 1994 Constructing confidence intervals for QTL location. Genetics 138: 1301-1308.
Marchini, J., B. Howie, S. Myers, G. McVean, and P. Donnelly, 2007 A new multipoint method for genome-wide association studies by imputation of genotypes. Nat.Genet. 39: 906-913.
Miller, E. N., S. E. Jamieson, C. Joberty, M. Fakiola, D. Hudson et al. 2004 Genome-wide scans for leprosy and tuberculosis susceptibility genes in Brazilians. Genes Immun. 5: 63-67.
Möller, M., E. de Wit, and E. G. Hoal, 2010a Past, present and future directions in human genetic susceptibility to tuberculosis. FEMS Immunol.Med.Microbiol. 58: 3-26.
Möller, M., F. Flachsbart, A. Till, T. Thye, R. D. Horstmann et al. 2010b A functional haplotype in the 3'untranslated region of TNFRSF1B is associated with tuberculosis in two African populations. Am J Respir.Crit Care Med. 181: 388-393.
Moran, A., X. Ma, R. A. Reich, and E. A. Graviss, 2007 No association between the +874T/A single nucleotide polymorphism in the IFN-gamma gene and susceptibility to TB. Int.J.Tuberc.Lung Dis. 11: 113-115.
Murray, C. J., K. Styblo, and A. Rouillon, 1990 Tuberculosis in developing countries: burden, intervention and cost. Bull.Int.Union Tuberc.Lung Dis. 65: 6-24.
Nino-Moreno, P., D. Portales-Perez, B. Hernandez-Castro, L. Portales-Cervantes, V. Flores-Meraz et al. 2007 P2X7 and NRAMP1/SLC11 A1 gene polymorphisms in Mexican mestizo patients with pulmonary tuberculosis. Clin.Exp.Immunol. 148: 469-477.
Nothnagel, M., D. Ellinghaus, S. Schreiber, M. Krawczak, and A. Franke, 2009 A comprehensive evaluation of SNP genotype imputation. Hum Genet 125: 163-171.
Ogus, A. C., B. Yoldas, T. Ozdemir, A. Uguz, S. Olcen et al. 2004 The Arg753Gln polymorphism of the human toll-like receptor 2 gene in tuberculosis disease. Eur.Respir.J. 23: 219-223.
Oral, H. B., F. Budak, E. K. Uzaslan, B. Basturk, A. Bekar et al. 2006 Interleukin-10 (IL-10) gene polymorphism as a potential host susceptibility factor in tuberculosis. Cytokine 35: 143-147.
Pacheco, A. G., C. C. Cardoso, and M. O. Moraes, 2008 IFNG +874T/A, IL10 -1082G/A and TNF -308G/A polymorphisms in association with tuberculosis susceptibility: a meta-analysis study. Hum Genet 123: 477-484.
Pan, H., B. S. Yan, M. Rojas, Y. V. Shebzukhov, H. Zhou et al. 2005 Ipr1 gene mediates innate immunity to tuberculosis. Nature 434: 767-772.
Pei, Y. F., J. Li, L. Zhang, C. J. Papasian, and H. W. Deng, 2008 Analyses and comparison of accuracy of different genotype imputation methods. PLoS.One. 3: e3551.
Pitchenik, A. E., C. Cole, B. W. Russell, M. A. Fischl, T. J. Spira et al. 1984 Tuberculosis, atypical mycobacteriosis, and the acquired immunodeficiency syndrome among Haitian and non-Haitian patients in south Florida. Ann.Intern.Med. 101: 641-645.
Pritchard, J. K., M. Stephens, and P. Donnelly, 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945-959.
Pruitt, K. D., T. Tatusova, and D. R. Maglott, 2007 NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61-D65.
Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al. 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.
Qu, Y., Y. Tang, D. Cao, F. Wu, J. Liu et al. 2007 Genetic polymorphisms in alveolar macrophage response-related genes, and risk of silicosis and pulmonary tuberculosis in Chinese iron miners. Int.J.Hyg.Environ.Health 210: 679-689.
Reiling, N., C. Holscher, A. Fehrenbach, S. Kroger, C. J. Kirschning et al. 2002 Cutting edge: Toll-like receptor (TLR)2- and TLR4-mediated pathogen recognition in resistance to airborne infection with Mycobacterium tuberculosis. J Immunol. 169: 3480-3484.
Rieder, H. L., 1999 Epidemiologic Basis of Tuberculosis Control. International Union Against Tuberculosis and Lung Disease, Paris.
Roeder, K., S. A. Bacanu, L. Wasserman, and B. Devlin, 2006 Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet 78: 243-252.
Rossouw, M., H. J. Nel, G. S. Cooke, P. D. van Helden, and E. G. Hoal, 2003 Association between tuberculosis and a polymorphic NFkappaB binding site in the interferon gamma gene. Lancet 361: 1871-1872.
Roth, D. E., G. Soto, F. Arenas, C. T. Bautista, J. Ortiz et al. 2004 Association between vitamin D receptor gene polymorphisms and response to treatment of pulmonary tuberculosis. J.Infect.Dis. 190: 920-927.
S.A.G.E. [2010] Statistical Analysis for Genetic Epidemiology, Release 6.1.0: http://darwin.cwru.edu/
SAS Institute. The Mixed Procedure. SAS/STAT User's Guide, Version 9.1.3. Cary, NC: SAS Institute.
Schaid, D. J., 2004 Evaluating associations of haplotypes with traits. Genet Epidemiol. 27: 348-364.
SeattleSNPs Program for Genomic Applications (PGA). Genome Variation Server. 2009.
Selvaraj, P., P. R. Narayanan, and A. M. Reetha, 2000 Association of vitamin D receptor genotypes with the susceptibility to pulmonary tuberculosis in female patients & resistance in female contacts. Indian J.Med.Res. 111: 172-179.
Servin, B., and M. Stephens, 2007 Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS.Genet. 3: e114.
Shaw, M. A., A. Collins, C. S. Peacock, E. N. Miller, G. F. Black et al. 1997 Evidence that genetic susceptibility to Mycobacterium tuberculosis in a Brazilian population is under oligogenic control: linkage study of the candidate genes NRAMP1 and TNFA. Tuber.Lung Dis. 78: 35-45.
Sheline, K. D., A. M. France, S. Talarico, B. Foxman, L. Zhang et al. 2009 Does the lipR gene of tubercle bacilli have a role in tuberculosis transmission and pathogenesis? Tuberculosis (Edinb.) 89: 114-119.
Simonds, B., 1963 Tuberculosis in twins. Pitman Medical, London.
Skamene, E., E. Schurr, and P. Gros, 1998 Infection genomics: Nramp1 as a major determinant of natural resistance to intracellular infections. Annu.Rev.Med. 49: 275-287.
Stead, W. W., J. W. Senner, W. T. Reddick, and J. P. Lofgren, 1990 Racial differences in susceptibility to infection by Mycobacterium tuberculosis. N.Engl.J Med. 322: 422-427.
Stein, C. M., and W. H. Boom, 2009 personal communication.
Stein, C. M., D. Guwatudde, M. Nakakeeto, P. Peters, R. C. Elston et al. 2003 Heritability analysis of cytokines as intermediate phenotypes of tuberculosis. J.Infect.Dis. 187: 1679-1685.
Stein, C. M., L. Nshuti, A. B. Chiunda, W. H. Boom, R. C. Elston et al. 2005 Evidence for a major gene influence on tumor necrosis factor-alpha expression in tuberculosis: path and segregation analysis. Hum.Hered. 60: 109-118.
Stein, C. M., S. Zalwango, A. B. Chiunda, C. Millard, D. V. Leontiev et al. 2007 Linkage and association analysis of candidate genes for TB and TNFalpha cytokine expression: evidence for association with IFNGR1, IL-10, and TNF receptor 1 genes. Hum.Genet. 121: 663-673.
Stein, C. M., S. Zalwango, L. L. Malone, S. Won, H. Mayanja-Kizza et al. 2008 Genome scan of M. tuberculosis infection and disease in Ugandans. PLoS.One. 3: e4094.
Sunderam, G., R. J. McDonald, T. Maniatis, J. Oleske, R. Kapila et al. 1986 Tuberculosis as a manifestation of the acquired immunodeficiency syndrome (AIDS). JAMA 256: 362-366.
Szeszko, J. S., B. Healy, H. Stevens, Y. Balabanova, F. Drobniewski et al. 2007 Resequencing and association analysis of the SP110 gene in adult pulmonary tuberculosis. Hum Genet 121: 155-160.
Thuong, N. T., T. R. Hawn, G. E. Thwaites, T. T. Chau, N. T. Lan et al. 2007 A polymorphism in human TLR2 is associated with increased susceptibility to tuberculous meningitis. Genes Immun. 8: 422-428.
Thye, T., E. N. Browne, M. A. Chinbuah, J. Gyapong, I. Osei et al. 2009a IL10 haplotype associated with tuberculin skin test response but not with pulmonary TB. PLoS.One. 4: e5420.
Thye, T., E. N. Browne, M. A. Chinbuah, J. Gyapong, I. Osei et al. 2006 No associations of human pulmonary tuberculosis with Sp110 variants. J Med.Genet 43: e32.
Thye, T., S. Nejentsev, C. D. Intemann, E. N. Browne, M. A. Chinbuah et al. 2009b MCP-1 promoter variant -362C associated with protection from pulmonary tuberculosis in Ghana, West Africa. Hum Mol.Genet 18: 381-388.
Tosh, K., S. J. Campbell, K. Fielding, J. Sillah, B. Bah et al. 2006 Variants in the SP110 gene are associated with genetic susceptibility to tuberculosis in West Africa. Proc.Natl.Acad.Sci.U.S.A 103: 10364-10368.
Treatment of tuberculosis and tuberculosis infection in adults and children. American Thoracic Society. 1994 Monaldi Arch.Chest Dis. 49: 327-345.
van Crevel, R., T. H. Ottenhoff, and J. W. van der Meer, 2002a Innate immunity to Mycobacterium tuberculosis. Clin.Microbiol.Rev. 15: 294-309.
Wiart, A., A. Jepson, W. Banya, S. Bennett, H. Whittle et al. 2004 Quantitative association tests of immune responses to antigens of Mycobacterium tuberculosis: a study of twins in West Africa. Twin.Res. 7: 578-588.
Wilkinson, R. J., P. Patel, M. Llewelyn, C. S. Hirsch, G. Pasvol et al. 1999 Influence of polymorphism in the genes for the interleukin (IL)-1 receptor antagonist and IL-1beta on tuberculosis. J.Exp.Med. 189: 1863-1874.
Wolfinger, R., and M. O'Connell, 1993 Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48: 233-243.
World Health Organization, 2009 Global tuberculosis control: epidemiology, strategy, financing: WHO report 2009. World Health Organization, Geneva.
Xing, C., C. Gray-McGuire, J. A. Kelly, P. Garriott, H. Bukulmez et al. 2005 Genetic linkage of systemic lupus erythematosus to 13q32 in African American families with affected male members. Hum.Genet. 118: 309-321.
Yim, J. J., H. W. Lee, H. S. Lee, Y. W. Kim, S. K. Han et al. 2006 The association between microsatellite polymorphisms in intron II of the human Toll-like receptor 2 gene and tuberculosis among Koreans. Genes Immun. 7: 150-155.