SNP ASSOCIATIONS WITH TUBERCULOSIS SUSCEPTIBILITY IN A UGANDAN

HOUSEHOLD CONTACT STUDY

by

ALLISON REES BAKER

Submitted in partial fulfillment of the requirements

For the degree of Master of Science

Thesis Advisor: Dr. Catherine M. Stein

Department of Epidemiology and Biostatistics

CASE WESTERN RESERVE UNIVERSITY

August, 2010

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

______

candidate for the ______degree *.

(signed)______(chair of the committee)

______

______

______

______

______

(date) ______

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

Table of Contents
List of Tables
Acknowledgements
List of Commonly Used Abbreviations
Chapter 1: Literature Review
1.1. Genetics of Susceptibility to Tuberculosis
1.1.1. History and Epidemiology of Tuberculosis
1.1.2. Candidate Genes
1.1.3. Genome-wide Linkage Scans
1.2. Methods for Fine Mapping Analysis
1.3. Imputation
Chapter 2: Specific Aims
2.1. Specific Aim 1
2.2. Specific Aim 2
2.3. Specific Aim 3
Chapter 3: Methods
3.1. Data Description
3.1.1. Sample
3.1.2. Descriptive Statistics
3.2. Genotyping
3.3. Analysis Strategy
3.3.1. Aim 1: Candidate Gene Analysis
3.3.2. Aim 2: Fine Mapping Analysis
3.3.3. Aim 3: Imputation
Chapter 4: Results and Discussion
4.1. Results
4.1.1. Candidate Gene Analysis
4.1.2. Fine-Mapping Analysis
4.1.3. Imputation
4.2. Discussion
4.3. Conclusions and Future Directions
Bibliography


List of Tables

Table 1. Descriptive Statistics
Table 2. Candidate Gene SNPs Departing from HWE
Table 3. Candidate Gene Analysis Results
Table 4. Fine Mapping SNPs Departing from HWE
Table 5. Haplotype Analysis Results for TLR2
Table 6. Haplotype Analysis Results for TLR4
Table 7. Haplotype Analysis Results for TLR6
Table 8. Haplotype Analysis Results for TIRAP
Table 9. Results for Imputed Genotypes on Chromosome 7p
Table 10. Results for Imputed Genotypes on Chromosome 20q

Acknowledgements

I would like to acknowledge and thank my thesis advisor, Dr. Catherine Stein, for providing me with direction and leadership throughout my academic program. I am exceedingly grateful for the incredible mentoring, support and guidance received from

Drs. Courtney Gray and Emma Larkin. Thank you also to Drs. Robert Igo and Robert

Elston, and thanks to Robert Goodloe for his programming assistance and sharing in the student experience. A very special thank you is dedicated to my devoted mother and father, and to my husband, Dave, for without his endearing love and endless support, none of my success would be possible.

List of Commonly Used Abbreviations

AIDS  Acquired Immune Deficiency Syndrome
ASW  African Ancestry in Southwest USA
CARD11  Caspase recruitment domain family, member 11
cM  Centimorgan
CTSZ  Cathepsin Z
GLMM  Generalized Linear Mixed Model
HIV  Human Immunodeficiency Virus
IL-1  Interleukin-1
IL-10  Interleukin-10
IL-12  Interleukin-12
IFNG1-γ  Interferon Gamma
HMM  Hidden Markov Model
HWE  Hardy-Weinberg Equilibrium
kb  Kilobasepair
LD  Linkage Disequilibrium
LTBI  Latent Mycobacterium Tuberculosis Infection
LWK  Luhya in Webuye, Kenya
MAF  Minor Allele Frequency
Mb  Megabasepair
MC3R  Melanocortin 3 Receptor
MKK  Maasai in Kinyawa, Kenya
Mtb  Mycobacterium tuberculosis
NOS2A  Nitric Oxide Synthase 2A
NRAMP1  Natural-Resistance-Associated Macrophage Protein 1
PPD  Purified Protein Derivative
QC  Quality Control
QTL  Quantitative Trait Locus
SLC11A1  Solute Carrier Family 11, Member 1
SNP  Single Nucleotide Polymorphism
TB  Tuberculosis
TBSCPB  Tuberculosis Susceptibility Variable
TDT  Transmission Disequilibrium Test
TLR-2  Toll-Like Receptor-2
TLR-4  Toll-Like Receptor-4
TNF  Tumor Necrosis Factor
TNF-α  Tumor Necrosis Factor-α
TST  Tuberculin Skin Test
UG  Uganda
YRI  Yoruba in Ibadan, Nigeria


SNP Associations with Tuberculosis Susceptibility in a Ugandan Household Contact Study

Abstract

by

ALLISON REES BAKER

The World Health Organization reports that over 9 million new cases of tuberculosis (TB) are diagnosed each year and that the disease kills between 1.6 and 2 million individuals worldwide.

TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis (Mtb), and reports indicate that only 10% of individuals infected with Mtb actually advance to disease. Genetic linkage and association analyses have established several chromosome regions involved in TB susceptibility. This study examines the association of TB susceptibility with a selection of biologically relevant markers, a region identified through a previous genome scan, and association with imputed genotypes.

A total of 564 Ugandan individuals were genotyped at 1,417 SNPs across chromosomes 7 and 20.

None of the candidate genes or fine mapping SNPs were found significantly associated with TB susceptibility (P > 0.10). Five imputed SNPs were significant at the P = 0.01 level. Suggested future work includes GWAS and resequencing analyses.


Chapter 1: Literature Review

1.1. Genetics of Susceptibility to Tuberculosis

1.1.1. History and Epidemiology of Tuberculosis

The World Health Organization (WHO) reports that over 9 million new cases of tuberculosis (TB) are diagnosed each year and that the disease kills between 1.6 and 2 million individuals worldwide (World Health Organization 2009). TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis (Mtb), but reports indicate that only 10% of individuals infected with Mtb actually advance to disease (Murray et al. 1990). The pathogenesis of TB follows a two-stage process: a productive infection with Mtb in which symptoms do not develop, followed by Mtb replication and the expression of disease symptoms, such as persistent cough and pulmonary cavities on chest x-ray (Comstock 1982). A simple skin test is most commonly used to detect latent Mtb infection by testing the reaction to purified protein derivative (PPD). TB disease is characterized by growth of Mtb on culture, presence of cavities on x-ray, and symptoms such as cough and fever.

Studies have established a direct association between human immunodeficiency virus (HIV) status and TB susceptibility (Chaisson et al. 1987; Pitchenik et al. 1984; Sunderam et al. 1986; Louie et al. 1986). Due to the growing rate of HIV and acquired immune deficiency syndrome (AIDS), the number of TB cases continues to rise. In fact, having AIDS increases the risk of contracting TB almost 100-fold (Horner and Moss 1991). Almost 3 million of the newly diagnosed TB cases each year are found in Africa. In Uganda alone, 426 individuals per 100,000 have been diagnosed with TB (World Health Organization 2009). The WHO most recently reports that 38% of incident adult TB cases are also infected with HIV. Because HIV inhibits the immune system, which in turn is directly attacked by Mtb, monitoring TB in developing countries where HIV and AIDS are highly prevalent is of great interest to the research community.

Several arguments have been made for a genetic risk factor in TB development, based on the notion that such a small percentage of individuals infected with Mtb

progress to disease development. Although results remain somewhat inconsistent, animal

models (Skamene et al. 1998; Kramnik et al. 2000), twin studies (Kallmann 1943;

Simonds 1963; Comstock 1978; Wiart et al. 2004), segregation analysis (Shaw et al.

1997), candidate gene studies (Bellamy and Hill 1998; Bellamy 1998a; Bellamy et al.

1998b; Leandro et al. 2009; Stein et al. 2007; Pacheco et al. 2008), linkage analysis

(Bellamy et al. 2000; Cervino et al. 2002; Greenwood et al. 2000; Jamieson et al. 2004;

Miller et al. 2004), and fine-mapping analyses (Cervino et al. 2002) have found evidence in support of a genetic component of TB. Furthermore, European populations have greater TB resistance than populations of African ancestry, possibly the result of longer

exposure times (Dubos 1952). Möller and Hoal suggested that these population differences are due not only to socioeconomic factors but also to non-environmental factors, as evident in a United States nursing home study in which individuals of African descent were twice as likely as individuals of European descent to be infected with Mtb (Möller et al. 2010a; Stead et al. 1990). Thus, host defense against TB development may have an inherited genetic component.

Shaw and colleagues (Shaw et al. 1997) performed a segregation analysis of 98

Brazilian families to determine the mode of inheritance for genetic susceptibility to TB.

After testing four different models: major gene, sporadic, polygenic, and multifactorial,

the investigators settled on a general two-locus major gene model for TB susceptibility,

which was only marginally preferred over a single-locus model (Shaw et al. 1997).

Another segregation and commingling analysis was carried out by Stein et al. (2005),

where antigen-induced tumor necrosis factor α (TNF-α) expression was used as an

endophenotype for TB, such that a major gene model with three underlying means

explained one-third of the phenotypic variance (Stein et al. 2005). Whereas Shaw et al.

(1997) analyzed TB status as a binary trait and suggested an underlying oligogenic

model, Stein and colleagues (2005) concluded that a major gene underlay TNF-α

expression levels in response to stimulation with Mtb culture filtrate, and that

heterogeneity could be explained by age and HIV status. From these suggested models,

linkage and association analyses have been conducted, thus establishing possible

candidate genes involved in TB susceptibility.

1.1.2. Candidate Genes

Several candidate genes have been examined for association to susceptibility of TB

development after infection, and findings from these analyses are inconsistent.

A major candidate gene likely to be involved in TB susceptibility is the natural-resistance-associated macrophage protein 1 gene, formerly NRAMP1 and now referred to as the SLC11A1 (solute carrier family 11 member 1) gene. Skamene et al. (1998) and

Kramnik et al. (2000) showed that the mouse ortholog Nramp1 on chromosome 1 was correlated with susceptibility to infection with Mtb. In a large case-control study in The

Gambia, Africa, Bellamy et al. (1998b) found that in a sample of 827 HIV uninfected subjects, polymorphisms in the NRAMP1 gene located on chromosome 2q35 were

significantly associated with TB. Although this study design was not able to distinguish between susceptibility to TB infection and susceptibility to clinical TB diagnosis, it provided valuable insight into the genetic susceptibility to TB development.

Several other studies have examined SLC11A1. A meta-analysis by Li et al.

(2006) focused on four specific variants (for HIV-negative individuals only) within the

SLC11A1 gene and reported very inconsistent evidence for its association with TB susceptibility (Li et al. 2006a). Significant linkage of active TB to a variant slightly distal to SLC11A1 was found in a large Aboriginal Canadian family exposed to a TB epidemic (LOD score = 3.81) (Greenwood et al. 2000). Taken together, these findings suggest that SLC11A1 may play at least a marginal role in TB susceptibility.

Also recently identified via animal models is a region on chromosome 1 containing the intracellular pathogen resistance gene (Ipr1), affecting TB resistance in mice (Pan et al. 2005). A West African study identified an association with what is the most comparable human homologue to Ipr1, the nuclear body protein gene (SP110) on chromosome 2 (Tosh et al. 2006). However, replication studies in South Africa (Babb et al. 2007a), Ghana (Thye et al. 2006), and Russia (Szeszko et al. 2007) did not confirm this association. Another gene identified through transgenic mouse models is the Toll-like receptor gene, TLR2: TLR2 knock-out mice are highly susceptible to TB infection

(Reiling et al. 2002; Drennan et al. 2004). In fact, mice with the TLR2 gene survived the

Mtb infection much longer than those mice not expressing the gene (Heldwein et al.

2003).

Several candidate genes for TB susceptibility have been identified through biologic plausibility, and results from association analyses of these genes with TB are inconsistent. The vitamin D receptor (VDR) gene has been inconsistently associated with

TB infection and/or disease in several populations (Bellamy et al. 1999; Selvaraj et al.

2000; Roth et al. 2004; Liu et al. 2004; Chen et al. 2006; Babb et al. 2007b). A meta-analysis by Lewis et al. of the association between the VDR and TB risk reported inconclusive results because the studies reviewed were underpowered due to sample size limitations (Lewis et al. 2005). Other studies have found inconsistent results for association of TB disease with TLR2 (Ben-Ali et al. 2004; Ogus et al. 2004; Thuong et al. 2007; Yim et al. 2006; Bochud et al. 2003), interferon gamma (IFNG) (Rossouw et al. 2003; Lopez-Maderuelo et al. 2003; Etokebe et al. 2006; Moran et al. 2007; Amim et al. 2008; Pacheco et al. 2008), the Interleukin-1 (IL1) complex of genes [IL1: (Bellamy et al. 1998a; Wilkinson et al. 1999); IL1B: (Kusuhara et al. 2007; Awomoyi et al. 2005; Gomez et al. 2006); IL1RA: (Bellamy et al. 1998a; Wilkinson et al. 1999)], Interleukin-6 (IL6) (Ladel et al. 1997; Oral et al. 2006), Interleukin-10 (IL10) (Lopez-Maderuelo et al. 2003; Stein et al. 2007; Pacheco et al. 2008), Interleukin-12 (IL12) (Leandro et al.

2009), lipase-encoding lipR (Sheline et al. 2009), pro-apoptotic P2X7 (Li et al. 2002;

Nino-Moreno et al. 2007), and nitric oxide synthase 2A (NOS2A) (NOS2A was also

confirmed in knock-out models) (Gomez et al. 2007; Jamieson et al. 2004; Qu et al.

2007; MacMicking et al. 1997). In a 2009 study of Ghanaian patients diagnosed with

clinical TB and Mtb exposed controls, four IL10 promoter variants were genotyped (Thye

et al. 2009a). After analyzing a set of haplotypes reconstructed from these variants, these

authors found the haplotype associated with higher production of IL10 occurred

significantly less often in the PPD-negative controls than in the TB cases (odds ratio (OR) = 2.15, 95% confidence interval (CI) = [1.3–3.6]) and PPD-positive controls (OR = 2.09, 95% CI = [1.2–3.5]) (Thye et al. 2009a). These results further support the view that Mtb-exposed

individuals who retain PPD-negative status are genetically distinct from TB cases and

latently-infected individuals.

In a case-control study of African-Americans and Caucasians, an analysis of 39

tag SNPs in the nitric oxide synthase 2A (NOS2A) gene found nine single nucleotide

polymorphisms (SNPs) significantly associated with TB (P < 0.05), multiple SNP

interactions between NOS2A and the interferon gamma receptor 1 (IFNGR1) gene (P

ranging from 0.0004 to 0.0006), and interactions between NOS2A and the Toll-like

receptor-4 (TLR4) gene (0.002 < P < 0.005), in the African-American individuals only

(Velez et al. 2009). Other studies suggest interactions between TLR4, VDR, IL12 and

IFNG (Ben-Ali et al. 2004; Ogus et al. 2004; Yim et al. 2006). Inconsistent results in

these studies are primarily due to underpowered analyses via small sample sizes,

differences in phenotype definition, racial/ethnic diversity within the study sample, and

discrepancies in the characterization of controls.

Another promising candidate gene is tumor necrosis factor (TNF) which helps

code for TNF-α. TNF-α is a proinflammatory cytokine modulated by T cell–macrophage

interaction that mediates granuloma formation as well as the suppression of TB infection

(Keane et al. 2001). In mouse strains deficient in TNF, TB is sometimes lethal, thereby

suggesting evidence of association between the TNF gene and disease (Flynn et al. 1993;

Botha and Ryffel 2003). Although the association between TB susceptibility and TNF

was not found significant in a 2008 meta-analysis (Leandro et al. 2009), variation in the

TNF gene or its promoter region has been associated with increased risk for infectious

diseases, such as malaria, leprosy, and HIV disease progression, as well as certain

autoimmune disorders (e.g. asthma, systemic lupus erythematosus, rheumatoid arthritis,

Crohn’s disease, sarcoidosis, psoriasis, and diabetes) (Bidwell et al. 2001; Haukim et al.

2002). In an analysis of 177 Ugandan pedigrees where TB prevalence was high, Stein et al. found that TNF-α had an estimated heritability of 68% (Stein et al. 2003).

Additionally, Stein et al. (2007) used an intermediate phenotype of TNF-α levels in 398 related Kampala, Uganda individuals to study associations with candidate genes related to TNF-α regulation. Results showed that in both HIV-negative and HIV-positive individuals, the candidate genes IL10, interferon-gamma receptor 1 (IFNGR1) and TNF-

α receptor - 1 (TNFR1) were linked and associated with both TB and TNF-α regulation, with the TNFR1 association being novel (Stein et al. 2007). Several other studies have found inconsistent results for association of TB disease with TNF (Correa et al. 2005;

Oral et al. 2006; Stein et al. 2007; Pacheco et al. 2008; Möller et al. 2010b).

Second to Stein and colleagues (2007) considering intermediate phenotypes of TB over a binary trait (presence/absence of TB), Flores-Villanueva et al. (2005) considered a

Mexican sample of incident confirmed TB cases and healthy (albeit non-vaccinated) controls, such that all individuals were HIV-negative (Flores-Villanueva et al. 2005).

They found that the monocyte chemoattractant protein-1 (MCP1) in the 17q11.2 region was strongly associated with susceptibility to TB development post Mtb infection (P =

0.0003). These results were successfully replicated in a Korean sample. Considering non-vaccinated controls with recent exposure to a TB case assured the researchers that these controls were infected with Mtb but were not TB-diseased (Flores-Villanueva et al.

2005).

Recently, Thye et al. (2009b) examined a set of polymorphisms within the MCP1

gene in a large Ghanaian sample in West Africa. Considering case-control data in

addition to affected nuclear families and a replication analysis of Russian cases and

controls, they found that one of the polymorphisms, MCP1-2581G, was significantly associated with resistance to TB in the cases versus the controls (corrected P = 0.0012,

OR = 0.81, 95% CI [0.73–0.91]) and in the nuclear families (P = 0.04, OR = 0.72, 95%

CI not reported), in addition to the MCP1-326C variant being significant in both samples.

However, no associations with infection resistance or disease susceptibility were found in the Russian sample (Thye et al. 2009b). This suggests a possible genetic difference between the African sample and the Russian sample, and haplotype analyses in this same study identified differences in linkage disequilibrium (LD) structure between the

Russians and Africans (Thye et al. 2009b). Further analyses are necessary to determine

disease causality, such as fine mapping and resequencing, as well as continual

examination of differences in LD structure across and within different ethnic groups.

1.1.3. Genome-wide Linkage Scans

In addition to candidate gene studies illustrating evidence of a genetic component to TB

susceptibility, linkage analyses have also provided evidence that genetics controls an

aspect of TB resistance and/or infection. To further study genetic susceptibility to TB,

Bellamy et al. (2000) conducted a two-stage genome-wide microsatellite scan of African

individuals in The Gambia and South Africa with an average density of one marker every

11 cM. Considering 92 concordantly affected sibling pairs, seven chromosomal regions

were identified in the full genome-wide analysis (LOD score > 1.0), where markers were

genotyped in a second set of individuals from the same African regions. Results from the

analysis of the combined data showed suggestive linkage to TB susceptibility on

chromosomes 15q (LOD = 1.82) and Xq (LOD = 2.18), possibly explaining the

additional cases of TB commonly found in males over females (Bellamy et al. 2000).

Miller et al. (2004) conducted a genome-wide non-parametric linkage scan of 405

microsatellite markers across the genome, set approximately 10cM apart, in 26 Brazilian

families from areas with high TB prevalence (Miller et al. 2004). Eight regions in the

genome provided evidence for suggestive linkage (P < 0.05) to TB, with replicated

linkage peaks on chromosomes 10 (10q26.13) and 20 (20p21.1) (Miller et al. 2004).

Another genome-wide scan on 96 Moroccan families revealed a major

susceptibility locus for TB on chromosome 8q12-13 (Baghdadi et al. 2006). In this

model-free linkage scan, 388 microsatellite markers spanning the genome (average

spacing about 10 cM apart) were analyzed, with a maximum LOD score of 3.49 (P = 3 × 10⁻⁵). These authors further investigated the area through a model-based linkage analysis of a chromosome 8 region, finding a maximum LOD score similar to the model-free estimate (LOD score = 3.38, P = 4 × 10⁻⁵). Because dividing the data into two sets of

pedigrees (those with and those without affected parents) revealed stronger linkage at the

susceptibility locus in families with at least one affected parent than in the entire sample,

Baghdadi et al. concluded that an autosomal dominant allele of a major susceptibility

gene controlled TB disease in these Moroccan families.
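The LOD scores and P values quoted for the Moroccan scan can be related with a short calculation. The sketch below is illustrative only and is not taken from Baghdadi et al. (2006): it assumes the common large-sample approximation in which 2 ln(10) × LOD is referred to a chi-square distribution with 1 degree of freedom and the tail probability is halved because the linkage test is one-sided. The function name `lod_to_p` is hypothetical.

```python
import math


def lod_to_p(lod: float) -> float:
    """Approximate one-sided pointwise P value for a LOD score.

    Assumption: 2 * ln(10) * LOD behaves like a chi-square variable with
    1 degree of freedom, and the test is one-sided (tail is halved).
    """
    if lod <= 0:
        return 0.5  # no evidence for linkage under this approximation
    chi_sq = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi_sq / 2.0))


if __name__ == "__main__":
    for lod in (3.49, 3.38):
        print(f"LOD = {lod:.2f} -> approximate P = {lod_to_p(lod):.1e}")
```

Under these assumptions the two LOD scores should land near the reported P values of 3 × 10⁻⁵ and 4 × 10⁻⁵; other software may use a different null distribution and give slightly different numbers.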

To focus on the intermediate phenotypes related to progression from Mtb

infection to TB, Stein et al. (2008) conducted a full family-based genome-wide linkage

analysis of microsatellite markers on 803 Ugandan individuals, both HIV-negative and

HIV-positive. Phenotypes considered included exposure to TB, exposure to TB coupled

with Mtb infection, and culture-confirmed TB. Suggestive linkage to TB disease was

found on a 34 cM long segment on chromosome 7 (P = 0.0002), in addition to a 25 cM

long region on chromosome 20 (P = 0.002) (Stein et al. 2008). Of specific interest is the reported region on chromosome 7, 7p22-7p21, as it contains the IL6 gene, which encodes an immunoregulatory cytokine that inhibits production of TNF-α and IL-1β, and thus may be harmful in mycobacterial infections (van Crevel et al. 2002). Ladel et al. found that

IL6-deficient mice were highly susceptible to TB infection and that this infection was

lethal (Ladel et al. 1997), while Oral et al. (2006) did not find significant differences in

the distribution of the IL6 gene polymorphisms or differences in IL6 allele frequencies

between TB cases and controls. Such contradictory evidence warrants further investigation of the IL6 gene’s involvement in TB susceptibility.

Further upstream in this chromosome 7 region is the gene CARD11 (caspase

recruitment domain family, member 11), which is part of the NOD-like receptor (NLR) pathway. This gene is of interest because NLRs have non-redundant roles in Mtb

recognition (Berrington and Hawn 2007). CARD11 has been associated with TB in a

yet-unpublished genome-wide association study of TB conducted in a Vietnamese

population (Dr. Thomas Hawn, personal communication).

The same region on chromosome 20q13 observed by Stein et al. (2008) was

found to be a major susceptibility locus for TB in a study of South African and Malawian

sibling pairs, HIV-negative and HIV-positive cases included (Cooke et al. 2008). South

Africans considered in this study were of mixed races, and no population stratification

testing or adjustment was performed. Two genes in this chromosome 20 region,

melanocortin 3 receptor (MC3R) and cathepsin Z (CTSZ), were mapped in South African and Malawian populations, a novel discovery in TB susceptibility (Cooke et al. 2008).

Single-point and multi-point sibling pair linkage analysis performed on a set of 402 microsatellite markers identified 64 markers for further study. Forty SNPs in the 1-LOD drop interval around the highest linkage peak on chromosome 20 were then analyzed for association with disease in a large independent West African case-control sample.

Adjusting for age, sex, ethnicity, and HIV status, a logistic regression analysis found significant evidence for association of a protective effect against TB with polymorphisms in both MC3R (protective genotype AA) and CTSZ (protective genotype TT) genes

(Cooke et al. 2008). Both chromosome 20 genes are biologically relevant to TB susceptibility as MC3R plays a suggested role in the regulation of energy homeostasis, while CTSZ is expressed in cancer cell lines with possible involvement in host defense and tumorigenesis (Pruitt et al. 2007).

Most recently, Mahasirimongkol et al. (2009) conducted a genome-wide linkage analysis in Thai pedigrees. Using 93 Thai families, a nonparametric multipoint linkage analysis was conducted using MERLIN (Abecasis et al.

2002). Haplotypes were built based on LD between SNPs, and these inferred haplotypes were used as multiallelic markers for the linkage analysis. Accounting for LD in this way, Mahasirimongkol et al. found a maximum LOD score of 2.29 on chromosome

5q23.2-31.3. Additionally, these authors conducted an ordered subset analysis by minimum age of onset of TB, finding two regions of suggestive linkage, 17p13-13.1

(maximum LOD score 2.57) and 20p13-12.3 (maximum LOD score 3.33). (Minimum age of TB onset was used because the authors assumed TB to occur at a younger age

through actual immunological impairment rather than repeated exposure to Mtb or reduced immune response due to old age.) This chromosome 20 region is about 70 cM away from the 20q13 region found by Cooke et al. (2008) and Stein et al. (2008), and thus does not cover the CTSZ gene or MC3R gene.

TB infection is only one phase of the disease’s progression. The latent phase of the disease is also crucial, as roughly 20% of individuals with long term exposure to TB display a natural resistance to infection (Rieder 1999). In an analysis of tuberculin skin test (TST) reactivity, a method of intracutaneous testing for tuberculin sensitivity, a genome-wide linkage search conducted by Cobat et al. (2009) found two loci linked to

TST reaction. Model-free linkage analysis of adjusted residuals was conducted in a

South African population consisting of 128 nuclear families, whereby residuals were adjusted for previous TB diagnosis, age, and sex. Significant linkage for TST positivity (where a “positive TST” was any visible skin reaction of diameter greater than 0 mm) was found on chromosomal region 11p14 (LOD = 3.81), with a lack of response indicating Mtb resistance. Furthermore, these authors considered a second quantitative phenotype, the extent of TST reactivity. Significant linkage to this phenotype was found at chromosomal region 5p15 (LOD = 4.00); fine association mapping of this region using an r² threshold of 80% and minor allele frequency cutoff of 5% identified the

SLC6A3 gene (solute carrier family 6 member 3) as a possible candidate gene for TST reaction. In a Ugandan sample analyzed by Stein et al. (2008), this same region was found suggestive for linkage (P = 0.0005) with persistently negative TST. From these results, the authors concluded individuals who did not become infected, despite persistent exposure to TB, retained this resistance via a genetic difference in their T-cell regulators.

Though there exists an array of evidence for a genetic role in TB susceptibility, no

precise causal genetic variants have been identified. Therefore, the linkage results

reported here should be further evaluated using other genetic analyses, such as

association-based tests and fine mapping approaches.

1.2. Methods for Fine Mapping Analysis

In order to identify the specific gene(s) underlying the trait of interest, linkage

scans must be followed up by fine mapping analyses and genetic sequencing. A “1-LOD

drop interval” from the strongest linkage signal is equivalent to a 96.8% confidence

interval for the location of the site causing linkage (Mangin et al. 1994). It has been

shown that for almost all quantitative trait locus (QTL) models, this 1-LOD interval has

the correct probability of containing the exact QTL (Mangin et al. 1994). Although this

may not hold for all methods of analysis, it seems an appropriate basis. Examples of fine

mapping approaches presented here examine associations with diseases other than TB,

but these methodologies provide information applicable to my analyses conducted on TB

susceptibility.
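One way to see where the 96.8% figure for the 1-LOD drop interval comes from is the following sketch. It assumes the usual large-sample approximation in which 2 ln(10) times the LOD drop is treated as a likelihood ratio statistic with a chi-square distribution on 1 degree of freedom; this is an illustration, not a reproduction of the derivation in Mangin et al. (1994), and the function name `lod_drop_coverage` is hypothetical.

```python
import math


def lod_drop_coverage(drop: float = 1.0) -> float:
    """Nominal coverage of a LOD-drop support interval.

    Assumption: 2 * ln(10) * drop is chi-square distributed with 1 df,
    so the coverage is P(chi-square <= 2 * ln(10) * drop).
    """
    if drop < 0:
        raise ValueError("LOD drop must be non-negative")
    lrt = 2.0 * math.log(10.0) * drop
    # For 1 df, P(chi-square <= x) = erf(sqrt(x / 2)).
    return math.erf(math.sqrt(lrt / 2.0))


if __name__ == "__main__":
    for drop in (1.0, 1.5, 2.0):
        print(f"{drop:.1f}-LOD drop -> nominal coverage {lod_drop_coverage(drop):.3f}")
```

For a drop of 1.0 the nominal coverage is approximately 0.968. Wider drops give higher nominal coverage, at the cost of a longer interval to fine map.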

Examples of fine mapping methods include that of Li et al. (2006b), who studied the association between haplotypes and susceptibility to age-related macular degeneration

(AMD). Considering an area already known to be strongly associated with AMD, the authors conducted a single-SNP association test on 84 SNPs in a region of 123 kb, on related affected individuals versus unrelated controls. Forward stepwise logistic regression was used to construct haplotypes associated with AMD: at each step in the regression, the SNP that most increased the likelihood ratio statistic was added to the model, and a permutation approach was then used to compare haplotypes between

cases and controls (Li et al. 2006b).

Jallow et al. conducted a fine-resolution multipoint analysis (Jallow et al. 2009).

After performing a genome-wide association analysis, an area on chromosome 11 was identified for fine mapping. A 111 kb region in the center of the strongest signal on chromosome 11p15 was sequenced in a reference panel of 62 randomly selected

Gambian controls and used as a reference panel for imputation for 2,500 individuals.

Imputation of SNPs satisfying genome-wide association study (GWAS) quality control assessments and of SNPs with relative significance was conducted (Marchini et al. 2007), and a test for trend was applied at each imputed SNP to study the association between any imputed SNP and disease. Their results provided evidence that using multipoint association mapping via model-based imputation can identify the causal variant within a

GWAS signal (Jallow et al. 2009).

Xing et al. identified genetic linkage of systemic lupus erythematosus (SLE) in an African American family-based sample to a region on chromosome 13 previously identified by these authors, confirming prior results. These authors finely mapped 324 microsatellite markers (average distance 11.35 cM apart) plus an additional 12 microsatellites, spanning a localized region of 29.62 cM (with average marker distance

1.97 cM) (Xing et al. 2005). The distribution of alleles shared identical by descent

(IBD) at 13q32 was compared between affected relative pairs, followed by the construction of haplotypes using a two-stage hidden Markov model (HMM) approach as implemented in GENEHUNTER (Kruglyak et al. 1996) and the expectation-maximization (EM) algorithm as applied in the Statistical Analysis for Genetic Epidemiology (S.A.G.E.) program DECIPHER (S.A.G.E. v. 6.1.0.) at a ~25.3 cM region. Comparing

these haplotypes against the distribution of alleles shared IBD in affected relative pairs

finely mapped the disease locus (Xing et al. 2005).

Cervino et al. (2002) conducted a two-stage non-parametric sibling-pair linkage analysis in Gambian and South African populations to identify a TB susceptibility locus. Stage 1 considered a set of 299 microsatellite markers across all chromosomes, followed by a second linkage analysis of seven chromosomal regions, including additional Gambian and South African families. Combining the entire dataset provided the best evidence for linkage at a 14 cM region on chromosome 15q11-13 (combined LOD score 2.00). To follow up these results, 10 microsatellite markers and five SNPs were considered for a fine mapping test of association within the families, plus an additional 44 Guinea-Conakry families. A transmission disequilibrium test (TDT) and exact symmetry test found that a seven base-pair deletion in the UBE3A gene was marginally associated with TB (P = 0.002).

To accurately localize the disease variant, fine mapping analyses consider a refined region of the genome, with marker densities usually higher than those studied in genome-wide linkage and/or association analyses. Li et al. (2006b) examined SNPs in affected related and unrelated individuals in a haplotype-based approach, while Jallow et al. (2009) applied an imputation-based method to a set of unrelated individuals. Xing et al. (2005) considered African American families in a replication analysis of microsatellite markers using an IBD and haplotype-based approach, while Cervino et al. (2002) performed a TDT and exact symmetry test of microsatellites in a set of African families.

These fine mapping approaches provide insight into the analysis of the genetic susceptibility to disease. Methods must be applied appropriately, taking into account the type of markers involved, subject selection, ascertainment methods, and sample size.

1.3. Imputation

A method shown to increase the power to detect an association between a genetic variant and the defined phenotype is the prediction of genotypes at untyped markers, or imputation (Marchini et al. 2007; Servin and Stephens 2007). Analysis of imputed genotypes provides an accurate and valid method of replication and better resolution of detected associations, and often approximates the results that would have been obtained by directly genotyping all SNPs near the loci of interest (Servin and Stephens 2007).

Imputation can additionally provide a strategy for quality control by identifying genotyping errors and can be used in family-based studies to help replace missing genotypes of untyped family members (Ellinghaus et al. 2009). A reference population must be identified in order to impute the unknown genotypes, as imputation relies on similar patterns of LD between the study population and a reference population. Currently, the populations provided by the International HapMap Project (2003) are the best characterized reference populations available, although projects such as the 1000 Genomes Project will provide reference haplotypes from more populations in the near future.

There are several existing software packages which implement SNP genotype imputation. These include BEAGLE (Browning and Browning 2007; Browning and Browning 2009), BIM-BAM (Bayesian Imputation-Based Association Mapping) (Servin and Stephens 2007), IMPUTE (Marchini et al. 2007), MACH (Li and Abecasis 2006), and PLINK (Purcell et al. 2007). Both BIM-BAM and BEAGLE implement haplotype-clustering methods via an HMM. MACH and IMPUTE also apply an HMM, whereas PLINK is centered around multi-marker tagging with basic Expectation-Maximization (EM) phasing algorithms (Pei et al. 2008). MACH can use the phased reference haplotypes from HapMap directly from the downloadable files, while IMPUTE and BEAGLE use their own reference formats, although HapMap Phase II files can be downloaded from the IMPUTE and BEAGLE websites (Ellinghaus et al. 2009). A comparison of these programs, excluding BIM-BAM (as it focuses less on imputation and more on association, and thereby lacks a measure of imputation confidence), examined their accuracy, efficacy, and runtime (Nothnagel et al. 2009). Considering a German sample of 449 unrelated individuals and using the HapMap CEU reference panel, Nothnagel and colleagues concluded that PLINK failed to impute over 60% of the imputable SNPs and was consistently inaccurate compared to IMPUTE and MACH.

IMPUTE and MACH illustrated trade-offs between accuracy and efficacy, although overall they out-performed BEAGLE in both areas. Defining imputation efficacy as “the proportion of imputable SNPs for which the program-specific confidence in an imputed genotype equaled or exceeded a given confidence threshold”, and imputation accuracy as “the concordance rate between the imputed and observed genotypes of these SNPs”, Nothnagel et al. (2009) found that MACH and IMPUTE had almost identical trade-offs between accuracy and efficacy, irrespective of the imputation basis and when varying confidence threshold values. Although BEAGLE was only slightly less accurate than IMPUTE and MACH, PLINK performed consistently more poorly (Nothnagel et al. 2009).
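The two quantities defined above can be illustrated with a minimal Python sketch. It is not code from Nothnagel et al.; the function name, the input layout (one tuple per imputed genotype call), and the toy values are assumptions made for illustration only, and the calculation is simplified to the level of individual calls.

```python
def efficacy_and_accuracy(calls, threshold):
    """Return (efficacy, accuracy) for a list of imputed genotype calls.

    calls: list of (confidence, imputed_genotype, observed_genotype) tuples
           (hypothetical layout, chosen for illustration).
    Efficacy: share of calls whose confidence meets the threshold.
    Accuracy: concordance between imputed and observed genotypes
              among the calls that meet the threshold.
    """
    if not calls:
        return 0.0, None
    kept = [(imp, obs) for conf, imp, obs in calls if conf >= threshold]
    efficacy = len(kept) / len(calls)
    if not kept:
        return efficacy, None  # accuracy is undefined when nothing passes
    accuracy = sum(imp == obs for imp, obs in kept) / len(kept)
    return efficacy, accuracy


# Toy data (invented): four calls, three of which pass a 0.9 threshold.
toy_calls = [
    (0.99, "AA", "AA"),
    (0.95, "AG", "AG"),
    (0.92, "GG", "AG"),
    (0.60, "AA", "AG"),
]
efficacy, accuracy = efficacy_and_accuracy(toy_calls, 0.9)
```

Raising the threshold in such a calculation typically lowers efficacy while raising accuracy, which is the trade-off described above.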

Furthermore, IMPUTE requires that the user define the recombination and mutation rates and is therefore more sensitive to model misspecification (Browning 2008), whereas MACH estimates the recombination rates from the sample data using an iterative EM approach, such that recombination and mutation rates are re-estimated at the end of each iteration. Because of this iterative approach, MACH requires more computational power and running time than IMPUTE and BEAGLE. Additionally, Nothnagel et al. (2009) concluded that MACH and BEAGLE were more user-friendly than IMPUTE. IMPUTE, MACH, and BIM-BAM are considered computationally intensive tools because all observed genotypes are considered when each missing genotype is imputed. In contrast, PLINK and BEAGLE can be computationally more efficient because they focus on genotypes at a small number of nearby markers when imputing each missing genotype (Li et al. 2009).

A second review by Ellinghaus et al. (2009) of BEAGLE, IMPUTE, and MACH extended that of Nothnagel et al. (2009). This review provided a detailed comparison of the three programs, making suggestions to the user based on resource accessibility (computer, documentation, cost of license, etc.), input (genotype format, reference data format, conversion utilities), processing (runtime, maximum memory allowance, error correction, etc.), output (quality measures, file size, file format, etc.), and other items such as X-chromosome imputation and accuracy estimation (Ellinghaus et al. 2009). Each program has its limitations: BEAGLE is marginally less accurate than IMPUTE and MACH, IMPUTE is susceptible to model misspecification, and MACH has the potential for longer runtimes. IMPUTE and MACH can only work with diallelic markers, whereas BEAGLE can handle multi-allelic markers.

Imputation can be a very useful tool in the detection of associations between genetic markers and disease. However, the specific imputation program should be selected with care.

Chapter 2: Specific Aims

The purpose of this study was to examine the association of TB susceptibility and a

selection of biologically relevant markers and to fine map a region identified through a

previous genome scan, utilizing an imputation analysis.

2.1. Specific Aim 1

To Perform a Candidate-Gene Analysis for TB Susceptibility

Rationale: To identify any possible associations between TB susceptibility and a set of

pre-selected, biologically relevant candidate genes.

Approach: Four candidate genes, IL6, CARD11, CTSZ, and MC3R, were selected for

analysis, based on previous findings of suggestive linkage (IL6, CARD11) and replicated

linkage (CTSZ, MC3R) (Stein et al. 2008). Within these regions, 57 tag SNPs were selected and genotyped on an Illumina 1536-SNP BeadArray. Association analyses were conducted using a generalized linear mixed model approach as implemented in SAS’s PROC GLIMMIX procedure (Cary software, North Carolina SAS Institute Inc. 2004).

2.2. Specific Aim 2

To Perform a Fine-Mapping Analysis for TB Susceptibility

Rationale: To more finely map a major TB susceptibility locus on chromosome 7p identified through a genome-wide linkage scan (Stein et al. 2008).

Approach: SNPs were selected at equal intervals across a region on chromosome 7 based on a well-defined SNP quality score. Association analyses were conducted using a generalized linear mixed model approach as implemented in SAS’s PROC GLIMMIX procedure (Cary software, North Carolina SAS Institute Inc. 2004).

2.3. Specific Aim 3

To Impute Missing Genotypes and Analyze for Associations with TB Susceptibility

Rationale: To increase the power and resolution of a scan for associations between genetic variants and the defined TB susceptibility phenotype.

Approach: Based on similarity of the LD structure in three HapMap populations (the Yoruba in Ibadan, Nigeria; the Maasai in Kinyawa, Kenya; and the Luhya in Webuye, Kenya) to that of my Ugandan data, unknown SNP genotypes were imputed using the MACH 1.0 software package (Li and Abecasis 2006). The candidate gene analysis and fine-mapping analysis were carried out using the imputed genotypes in SAS’s PROC GLIMMIX procedure (Cary software, North Carolina SAS Institute Inc. 2004).

Chapter 3: Methods

3.1. Data Description

3.1.1. Sample

Study participants were recruited to the Household Contact Study in Kampala,

Uganda (Phase I), between 1995 and 1999, and the Kawempe Community Health Study in Uganda (Phase II), which enrolled study subjects between April 2002 and February

2004 (Guwatudde et al. 2003; Stein et al. 2005). Both studies are supported by the

Tuberculosis Research Unit (Principal Investigator: Dr. W. Henry Boom). Phase I of the study (1995 to 1999) ascertained households through a TB index case. Index cases were

identified at the Uganda National Tuberculosis and Leprosy Program (NTLP) Treatment

Center as having a positive acid-fast (AFB) smear and positive Mtb culture. Households

were included in the study if the index case lived with at least one other individual and all

household individuals (index case and parents/guardians of children in the household

ages 18 years or less included) provided informed consent to participate in the study.

Phase II of the study (2002 to 2004) ascertained families through index cases only

required to have a positive Mtb culture and to be referred to the study by the NTLP. All

household contacts were defined as persons residing in the household for at least seven

consecutive days during the three month period prior to the TB diagnosis of the index

case.

Upon enrollment, all participants were given physical examinations, including

HIV testing. Individuals in the households expected to have TB received chest x-rays

and sputum samples for culture and AFB smear. In the case of young children, gastric

lavage was performed. Based on these collections, TB diagnoses were assigned as

defined by the American Thoracic Society (ATS). All TB-confirmed cases received

short-course therapy (ATS 1994). All study participants received TST, using the

Mantoux method of intracutaneous testing for tuberculin sensitivity. Test results were

defined separately for HIV-positive children less than or equal to five years of age (TST

concluded positive if induration diameter was 5mm or greater) versus all HIV-negative

individuals greater than five years of age (TST concluded positive if induration diameter

was 10mm or greater). For those individuals with a negative TST at their baseline

evaluation, a skin test was given again at 3, 6, 12, and 24 months. Subjects with two or

more negative TSTs over a two year period were concluded to be persistently TST-negative. Most individuals, if exposed to TB, convert from non-infected (TST-negative) to Mtb-infected (TST-positive) within the first three months after exposure to an index case. Thus, if individuals did not convert from TST-negative to TST-positive within two years of exposure, it was assumed that they would not convert thereafter (Stein and Boom, personal communication). Individuals who converted from a negative TST upon enrollment to a positive TST during the study were identified as part of the latent Mtb infection (LTBI) group. Phase II participants with a positive TST result were offered the anti-tuberculosis medication isoniazid.

3.1.2. Descriptive Statistics

General descriptive statistics were computed for variables of interest, using Statistical Analysis Software (SAS) v9.2 (Cary software, North Carolina SAS

Institute Inc. 2004) (Table 1). After removing any samples with call rates less than 90%,

564 individuals with complete genotype data were included from both Phase I and Phase

II, with a median age of 16 years. The sample comprised 318 females (56.4%) and 246

males (43.6%); 430 (76.2%) individuals were HIV-negative while 89 (15.8%) individuals

were HIV-positive, and the other individuals’ HIV statuses were unknown. A total of

122 (21.6%) individuals had confirmed TB. The sample comprised 243 pedigrees,

including 73 singletons, 230 parent-offspring pairs, and 32 sibling pairs, with a mean

family size of 5.08 individuals, and a standard deviation of 5.87.

3.2. Genotyping

All genotyping was performed on the Illumina BeadArray platform using a

custom 1536 SNP microarray. Pedigree structures were verified using S.A.G.E.’s

RELTEST (S.A.G.E. v6.1.0.) and RELPAIR (Epstein et al. 2000) in the Stein et al.

(2008) genome scan analysis and population substructure as tested using STRUCTURE

(Pritchard et al. 2000) was found absent from this Ugandan sample (Stein et al. 2008).

Mendelian inconsistencies were removed prior to analysis using MARKERINFO, and all

marker allele frequencies were estimated in FREQ (S.A.G.E. v6.1.0.). Call rates by plate

were considered as an additional measure of quality control (QC) and any SNPs that did

not meet the call rate of 90% were excluded. Deviation from Hardy-Weinberg Equilibrium (HWE) (P < 0.001) was examined in the pooled data (cases and controls combined). Also, signal intensities were verified as falling into three distinct genotype groups, and thus no SNPs were lost due to inadequate signal. Based on these QC measures, a total of 119 SNPs were excluded from analyses.
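As an illustration of the HWE check described above, the following Python sketch computes a one-degree-of-freedom Pearson chi-square test from genotype counts. The text does not state which HWE test was actually used, so the chi-square form, the function name, and the toy counts are all assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 df).

    Assumes a diallelic SNP with genotype counts n_aa, n_ab, n_bb.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts (invented): the first set is in exact HWE, the second
# shows a marked deficit of heterozygotes.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(40, 20, 40)
flagged = p_bad < 0.001  # threshold quoted in the text
```

For SNPs with low minor allele frequency an exact test is usually preferred over this approximation.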

Genotyping for the candidate gene analysis was based on a set of SNPs selected after finding suggestive evidence for linkage to an area on chromosome 7 (7p22-p21)

containing the IL6 gene (Stein et al. 2008). A suggestive linkage signal (P = 0.002) to

TB on chromosome 20q13 was observed in this same genome scan, replicating the Cooke

et al. (2008) results that found the MC3R and CTSZ genes on chromosome 20q13

mapped to TB in African populations (Cooke et al. 2008), and these gene regions were

genotyped as well. SNPs within the CARD11 gene were also considered as it was found

to be within the 1-LOD support interval from the significant linkage signal on

chromosome 7 (Stein et al. 2008). CARD11, which is part of the NOD-like receptor

(NLR) pathway, is of interest because NLRs have non-redundant roles in Mtb recognition

(Berrington and Hawn 2007), and because of reported association with TB (Dr. Thomas

Hawn, personal communication). Due to its large size (about 13.8 kb wide), CARD11 could not be covered with tag SNPs, and thus 17 SNPs found to be associated with this gene and TB in a Vietnamese study (Drs. Thomas Hawn and Nguyen Thuy Thuong, personal communication) were selected. However, if one of these SNPs was not

available on the Illumina platform or did not meet the minor allele frequency (MAF) threshold of 5% or a defined SNP score quality criterion (Illumina SNP quality score >

0.6), it was replaced with the nearest SNP that met the MAF and SNP score thresholds.

SNP quality scores were determined by Illumina and are based on the probability of success of the assay and validation of the SNP in at least two populations. Tag SNPs for

IL6, CTSZ, and MC3R were selected through the tagger application of the Genome

Variation Server (SeattleSNPs Program for Genomic Applications PGA, 2009) using a

MAF threshold of 5%, an r² cutoff of 80%, and the HapMap Phase II populations YRI

(Yoruba in Ibadan, Nigeria), MKK (Maasai in Kinyawa, Kenya), and LWK (Luhya in

Webuye, Kenya). Because the tag SNPs for these three HapMap populations were not

the same, any SNP that was identified in any of these populations as a tag SNP was

selected for analysis. After removing seven SNPs that did not meet the call rate threshold

of 90%, a total of 50 SNPs across the four candidate genes were considered in the final analysis (27 in CARD11, 16 in IL6, five in CTSZ, and two in MC3R).

The SNPs for the fine mapping analysis were selected across a 17.84-Mb region

(the 1-LOD drop region) on chromosome 7p. The region was divided into equidistant windows of 11.4 kb, a size chosen to fit the remaining openings on the 1536-SNP bead array. Excluding the windows containing the 17 SNPs associated with TB in the

Vietnamese study referenced above, the SNP with the highest quality score was selected from each window. This score was defined as follows: 5 = MAF > 0.15 and SNP score > 0.8; 4 = MAF > 0.10 and SNP score > 0.7; 3 = MAF > 0.10 and SNP score > 0.6; 2 = MAF > 0.10 and SNP score > 0.4; 1 = MAF > 0.05 and SNP score > 0.4. In the case of a tie, the SNP closer to the center of the window was chosen. This strategy resulted in a total set of 1,367 SNPs for the fine mapping analysis after removing 112 SNPs that did not meet the call rate threshold of 90%.
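The window-based selection rule can be sketched in Python as follows. This is an illustration of the scoring tiers and tie-break stated above, not the procedure actually used; the function names, the tuple layout, the rsIDs, and the score of 0 for SNPs that meet no tier are assumptions.

```python
# (score, minimum MAF, minimum Illumina SNP score); both comparisons are strict.
TIERS = [
    (5, 0.15, 0.8),
    (4, 0.10, 0.7),
    (3, 0.10, 0.6),
    (2, 0.10, 0.4),
    (1, 0.05, 0.4),
]


def quality_score(maf, snp_score):
    """Return the highest tier satisfied, or 0 if none is (assumption)."""
    for score, min_maf, min_snp_score in TIERS:
        if maf > min_maf and snp_score > min_snp_score:
            return score
    return 0


def pick_snp(window_start, window_size, snps):
    """Pick one SNP per window: best score, ties broken by distance to center.

    snps: list of (rsid, position, maf, snp_score) tuples in the window.
    """
    center = window_start + window_size / 2
    candidates = [
        (quality_score(maf, s), -abs(pos - center), rsid)
        for rsid, pos, maf, s in snps
    ]
    candidates = [c for c in candidates if c[0] > 0]
    return max(candidates)[2] if candidates else None


# Hypothetical SNPs in one 11.4 kb window starting at position 0.
window = [
    ("rsA", 1000, 0.20, 0.90),  # score 5, far from the center
    ("rsB", 5600, 0.20, 0.90),  # score 5, close to the center
    ("rsC", 5700, 0.12, 0.75),  # score 4, at the center
]
chosen = pick_snp(0, 11400, window)
```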


Table 1. Descriptive Statistics

               No TB        TB           Total        Chi-Square P

Males          181          65           246 (44%)
Females        261          57           318 (56%)    0.01506

HIV +          34           55           89 (16%)
HIV -          363          67           430 (76%)    < 0.0001

BCG scar       270          56           326 (58%)
No BCG scar    99           41           140 (25%)    0.00317

Total          442 (78%)    122 (22%)    564
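As a worked check of the chi-square P column in Table 1, the sketch below recomputes the test for sex and for HIV status from the tabulated counts. It assumes an uncorrected Pearson chi-square on each 2 × 2 table; the thesis does not say whether a continuity correction was applied, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    )
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Counts from Table 1: rows are (No TB, TB).
chi2_sex, p_sex = chi_square_2x2(181, 65, 261, 57)   # males, females
chi2_hiv, p_hiv = chi_square_2x2(34, 55, 363, 67)    # HIV +, HIV -
```

Under this assumption the P value for sex comes out close to the 0.01506 reported in the table.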

3.3. Analysis Strategy

3.3.1. Aim 1: Candidate Gene Analysis

To assess the association between TB susceptibility and the covariates of interest, the TB

phenotype was dichotomized: (1) non-cases without TB (TST-negative) and latent Mtb infected cases (LTBI) combined, (2) index cases only (probands diagnosed with active

TB at the time of study enrollment). Individuals with active TB were coded as affected while those without active TB (the LTBI and TST-negative individuals) were coded as unaffected. Because these are household data, unrelated singleton individuals were identified within each household and treated as unrelated individuals in the analyses.

Included as covariates in the analysis were HIV status (which as discussed has been shown to have a significant effect on TB susceptibility), sex, and age dichotomized at five years. This binary age covariate was included partly because results are inconclusive as to whether young children in TB endemic settings express less robust IFN-γ responses to Mtb antigens than adults (Kampmann et al. 2006; Lewinsohn et al. 2008). Furthermore, conflicting sex- and age-dependent association results have been reported with alleles of the SLC11A1 gene (Leung et al. 2007; Malik et al. 2005), as well as significant linkage with earlier onset TB (Mahasirimongkol et al. 2009). Lewinsohn et al. (2008) found that young children (age less than five years) exposed to TB displayed more robust IFN-γ responses than adults.

Due to the relatedness of individuals in this sample, a simple logistic regression analysis that did not take familial correlations into account could not be applied.

Furthermore, because the majority of affected individuals were the parents in these families and not the offspring, the TDT, which compares the number of alleles transmitted from an informative parent heterozygous at that particular SNP to his/her affected offspring, was not appropriate for these data. Therefore, a generalized linear mixed model (GLMM) approach was applied as implemented in SAS’s PROC GLIMMIX procedure (Cary software, North Carolina SAS Institute Inc. 2004). Whereas a generalized estimating equation approach (such as PROC GENMOD) allows for only fixed effects to be modeled, PROC GLIMMIX incorporates both fixed and random effects into parameter estimation, via “pseudo-likelihood” techniques as in Wolfinger and O’Connell (1993) and Breslow and Clayton (1993).

For an n × 1 vector of observations, Y, and an r × 1 vector of random effects, γ, models fit by the GLIMMIX procedure assume that $E[Y \mid \gamma] = g^{-1}(X\beta + Z\gamma)$, where $g(\cdot)$ is a differentiable monotonic link function and $g^{-1}(\cdot)$ is its inverse. The matrix X is an n × p matrix of rank k, and Z is an n × r design matrix for the model’s random effects. These random effects are assumed to be normally distributed with mean 0 and variance matrix G. The GLMM contains a linear mixed model inside the inverse link function, referred to as the linear predictor, $\eta = X\beta + Z\gamma$ (Cary software, North Carolina SAS Institute Inc. 2004).

The GLIMMIX procedure distinguishes “R” and “G” random effects, depending on whether the variance of the random effect is contained in G or in the variance matrix R, such that $\mathrm{var}[Y \mid \gamma] = A^{1/2} R A^{1/2}$, where A is a diagonal matrix of response variances and these variances are functions of the mean. If a random effect is an element of γ, it is a G-side effect. Otherwise, it is an R-side effect; R-side effects are also called “residual” effects. Standard errors of the parameter estimates are obtained from the negative of the inverse of the (observed or expected) second derivative (Hessian) matrix H.

In this application, the GLMM is approximated by a linear mixed model based on covariance parameter estimates. The resulting linear mixed model is then iteratively fit. Upon convergence, the new parameter estimates update the linearization, a process which stops when parameter estimates between successive linear mixed model fits change only within a specified range (Cary software, North Carolina SAS Institute Inc. 2004).

The response variable in a GLMM is usually assumed to be independent for all

subjects; however, because these analyses are conducted on pedigree data, there exist

correlations between observations. These correlated data are modeled using the same

link function and linear predictor setup as in the standard logistic model (the independent

case), and the random component is described by the same variance functions as in the independence case. However, the correlation structure, which here is a function of the relationships between individuals in a given family, must be incorporated. PROC GLIMMIX allows the user to define the correlation structure of the data. Here, an exchangeable correlation structure within each pedigree was applied, assuming that all individuals have a correlation of 1 with themselves and a correlation of ρ with any other member of their family; i.e., for each of the families n = 1, 2, …, i, the G covariance matrix is defined as

$$G = \begin{pmatrix} 1 & \cdots & \rho \\ \vdots & \ddots & \vdots \\ \rho & \cdots & 1 \end{pmatrix}.$$

This may induce some bias, since sibling pairs, for example, are more correlated than avuncular pairs. However, due to the particular family structure given here, as most of the relative pairs are parent-offspring, I assumed that an exchangeable correlation structure was most appropriate.
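To make the exchangeable structure concrete, the following Python sketch builds the within-family correlation matrix for a single pedigree. The function name and the value ρ = 0.3 are hypothetical; in the analysis itself ρ is a parameter estimated from the data.

```python
def exchangeable_matrix(family_size, rho):
    """Return the family_size x family_size exchangeable correlation matrix.

    Every individual has correlation 1 with themselves and the same
    correlation rho with every other member of the family.
    """
    return [
        [1.0 if i == j else rho for j in range(family_size)]
        for i in range(family_size)
    ]


# Illustrative three-person family with an assumed rho of 0.3.
G_family = exchangeable_matrix(3, 0.3)
# [[1.0, 0.3, 0.3],
#  [0.3, 1.0, 0.3],
#  [0.3, 0.3, 1.0]]
```

Because a single ρ is shared by all pairs, a parent-offspring pair and an avuncular pair in the same family are treated as equally correlated, which is the source of the possible bias noted above.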

Each SNP was entered into the GLMM individually, along with the three covariates: HIV status, dichotomized age (< five years or ≥ five years), and sex. Strong LD was observed within the candidate genes (data not shown), and LD between genes was assumed to be absent (r² = 0); therefore, a significance threshold of P = 0.0125 (0.05 / 4 genes) was applied to the candidate gene results.
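One way to write the single-SNP model described here is shown below. It is a sketch under stated assumptions rather than the exact specification passed to PROC GLIMMIX: it assumes a logit link, an additive (0/1/2) coding of the SNP genotype, and a family-level random effect $\gamma_i$ that induces the exchangeable within-family correlation. The subscripts (individual j in family i) and the covariate labels are introduced for illustration.

```latex
\operatorname{logit} P(Y_{ij} = 1 \mid \gamma_i)
  = \beta_0
  + \beta_1\,\mathrm{SNP}_{ij}
  + \beta_2\,\mathrm{HIV}_{ij}
  + \beta_3\,\mathrm{AGE}_{ij}
  + \beta_4\,\mathrm{SEX}_{ij}
  + \gamma_i,
\qquad \gamma_i \sim N(0, \sigma^2_{\gamma})
```

Under this formulation the test of association for a given SNP is the test of $\beta_1 = 0$, compared against the P = 0.0125 threshold.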

3.3.2. Aim 2: Fine Mapping Analysis

A total of 1,479 SNPs were selected evenly across a 17.84-Mb region (the 1-LOD

drop interval) on chromosome 7p for fine mapping, i.e. roughly 1 SNP every 11.4kb. Of

these, 1,367 total SNPs met the 90% call rate threshold. Although this chromosomal region contains the CARD11 and IL6 candidate genes, SNPs in these genes were not

considered in the fine mapping analysis to avoid redundancy with the tag SNPs

genotyped in the candidate gene analyses. The same approach applied to the candidate

gene analysis was applied to the fine mapping analysis, again by considering TB as a

binary trait. Likewise, HIV status, dichotomized age, and sex were considered in the

model. After careful consideration of the LD present in the fine mapping data, the statistical significance threshold for the markers assumed that the SNPs within the chromosome 7 region had an LD measure of r² = 10% on average (and therefore that 90% were independent), resulting in a significance threshold of P = 4.13 × 10⁻⁵ (0.05 / [1,344 × 0.9]).
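The arithmetic behind the two significance thresholds can be reproduced with a short Python sketch. The function name and the `independent_fraction` parameter are hypothetical labels for the adjustment described in the text.

```python
def adjusted_threshold(alpha, n_tests, independent_fraction=1.0):
    """Bonferroni-style threshold using an effective number of tests.

    independent_fraction is the share of tests treated as independent
    (1.0 gives the ordinary Bonferroni correction).
    """
    return alpha / (n_tests * independent_fraction)


# Candidate gene analysis: four genes treated as independent tests.
candidate_threshold = adjusted_threshold(0.05, 4)

# Fine mapping: 1,344 SNPs with 90% assumed independent.
fine_mapping_threshold = adjusted_threshold(0.05, 1344, 0.9)
```

With these inputs the first value is 0.0125 and the second is approximately 4.13 × 10⁻⁵, matching the thresholds quoted in the text.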

3.3.3. Aim 3: Imputation

In order to increase the power and resolution of association detection between these SNPs

and TB susceptibility, untyped genotypes were imputed using MACH 1.0 (Li and

Abecasis 2006). Based on user-friendliness, narrowed regions for imputation, and the

experience of collaborators, MACH was chosen for these analyses. Using HapMap’s

Genome Browser application, chromosomal regions for imputation were selected based

on the genotyped regions in the Ugandan data, plus an extension of 250 kb at each end of the region to capture any SNPs that may be in LD with other SNPs outside of the regions of interest. Given the parental genotypes, the offspring contribute no information, thus MACH assumes all subjects are unrelated in the reference panel. Therefore, it was necessary to correct Mendelian inconsistencies after imputation. Although some family information was

discarded through this approach, accuracy was assumed unaffected since the SNPs

provided were close together.

We used a separate dataset to guide selection of the appropriate HapMap

reference population. Several recent studies suggest that TB susceptibility is related to

the TLR pathway (Berrington and Hawn 2007). To examine the similarity in genotype

frequency and LD structure between HapMap populations and this household study

population, full-exon resequencing of TLR genes was conducted in unrelated individuals

from Uganda, all of Black African descent (Baker et al. 2009). Sequences were aligned

and analyzed with the programs PHRED/PHRAP (Ewing and Green 1998; Ewing et al.

1998) and CONSED (Gordon et al. 1998) and genotypes were constructed. The TLR

genes of interest included TLR2, TLR4, TLR6, and an adaptor-like protein involved in

TLR4 signal transduction, toll/interleukin 1 receptor domain-containing adaptor protein

(TIRAP). A Pearson χ² test with two degrees of freedom was applied to test for significant differences (at the α = 0.05 significance level) in genotype frequencies.
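The genotype-frequency comparison can be sketched in Python as a Pearson χ² test on a 2 × 3 table (two populations by three genotypes), which has two degrees of freedom. The function name and the genotype counts are invented for illustration and are not taken from the resequencing data.

```python
import math


def genotype_chi_square(counts_a, counts_b):
    """Pearson chi-square comparing genotype counts in two populations.

    counts_a, counts_b: (n_AA, n_AB, n_BB) for each population.
    Returns (chi2, p_value) assuming 2 degrees of freedom, which holds
    when all three genotypes are observed in the combined sample.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    n = total_a + total_b
    chi2 = 0.0
    for obs_a, obs_b in zip(counts_a, counts_b):
        column_total = obs_a + obs_b
        if column_total == 0:
            continue  # an unobserved genotype contributes nothing
        for observed, row_total in ((obs_a, total_a), (obs_b, total_b)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    # For 2 df the chi-square upper tail has the closed form exp(-x / 2).
    return chi2, math.exp(-chi2 / 2)


# Invented counts: identical distributions, then strongly "flipped" ones.
chi2_same, p_same = genotype_chi_square((30, 50, 20), (30, 50, 20))
chi2_flip, p_flip = genotype_chi_square((80, 15, 5), (5, 15, 80))
differs = p_flip < 0.05
```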

A key concept in imputation is the minor allele frequency (MAF): if the MAFs between the reference population and the study population are significantly different, imputation accuracy is greatly reduced. Therefore, to analyze discrepancies in MAFs between the Uganda sample and the pre-selected reference panel, 100 imputed SNPs across the chromosome 7 and chromosome 20 regions were randomly selected and using a chi-square test of significance, these 100 imputed genotypes were compared to genotypes from each of the three reference populations.

In addition to MAF and LD, heterozygosity is an important component of imputation. Nothnagel et al. (2009) found that increasing marker heterozygosity (mildly) reduced imputation accuracy and also reduced the general trade-off between accuracy and efficacy. Thus, the overall marker heterozygosity of these Ugandan genotypes was established using Haploview (Barrett et al. 2005). Additionally, estimating the haplotype frequencies using the EM algorithm (as implemented in SAS’s PROC HAPLOTYPE) in the different populations at the TLR genes provided a more appropriate comparison of genomic structure between the Ugandan and Kenyan populations, leading to a definitive reference population selection.

Pooling the imputed genotypes with the known marker information, i.e. family identification, individual identification, sex, father/mother identification, and covariate values, the candidate gene analysis and fine-mapping analysis were repeated using these pooled data.

Chapter 4: Results and Discussion

4.1. Results

4.1.1. Candidate Gene Analysis

A total of 564 Ugandan individuals were genotyped at 50 SNPs across the four

candidate genes, CARD11, IL6, CTSZ, and MC3R. In the pooled sample (TB cases and

controls combined), the genotype distributions of two SNPs significantly departed from

HWE (Table 2). Based on a defined significance threshold, P = 4.06 × 10⁻⁵, none of the candidate gene SNPs was found to be significantly associated with TB susceptibility (Table 3). Furthermore, none of the 50 SNPs was associated with TB susceptibility at the P = 0.05 level.

Table 2. Candidate Gene SNPs Departing from HWE

SNP          Gene     Position    ObsHET   PredHET   HWpval     MAF     Alleles
rs12700594   CARD11   3123227     0.352    0.48      5.49E-07   0.4     A:G
rs12700386   IL6      22729534    0.459    0.383     1.00E-04   0.258   G:C

*All P-values less than 0.0001

Table 3. Candidate Gene Analysis Results

SNP           Gene     Location     Estimate   Standard Error   P-value
rs6976564     CARD11   2,884,623     0.00075   0.00191          0.69317
rs2644303     CARD11   2,894,071    -0.00013   0.00196          0.94841
rs6948739*    CARD11   2,899,721    -0.00026   0.00196          0.89329
rs2679251     CARD11   2,905,002     0.00025   0.00194          0.89858
rs2527516*    CARD11   2,905,445     0.00005   0.00195          0.97917
rs1878805     CARD11   2,927,940     0.00004   0.00195          0.98493
rs1636166     CARD11   2,941,892    -0.00007   0.00192          0.96952
rs10229368    CARD11   2,952,112     0.00003   0.00195          0.98689
rs746009      CARD11   2,961,390     0.00014   0.00194          0.94085
rs7794674     CARD11   2,973,945    -0.00080   0.00195          0.68235
rs4719737     CARD11   2,985,488     0.00015   0.00194          0.93656
rs1843933     CARD11   2,997,718    -0.00012   0.00196          0.95033
rs11762164    CARD11   3,008,186    -0.00005   0.00195          0.97782
rs12671372    CARD11   3,020,624     0.00051   0.00194          0.79072
rs10951005*   CARD11   3,037,641    -0.00028   0.00195          0.88762
rs10951010    CARD11   3,042,579     0.00067   0.00192          0.72541
rs7805181     CARD11   3,056,808    -0.00012   0.00196          0.95296
rs12700536*   CARD11   3,058,747    -0.00007   0.00194          0.96939
rs1976135*    CARD11   3,060,293     0.00036   0.00195          0.85295
rs1976132*    CARD11   3,060,648     0.00003   0.00195          0.98961
rs6461814*    CARD11   3,062,737    -0.00015   0.00194          0.93769
rs11772124*   CARD11   3,084,400    -0.00045   0.00195          0.81830
rs7791004     CARD11   3,089,458    -0.00052   0.00196          0.78921
rs17150474*   CARD11   3,092,123    -0.00028   0.00195          0.88538
rs6969362     CARD11   3,111,159     0.00021   0.00194          0.91401
rs12700594    CARD11   3,123,227    -0.00074   0.00186          0.68962
rs4722476     CARD11   3,157,114    -0.00001   0.00195          0.99636
rs12700386    IL6      22,729,534    0.00027   0.00188          0.88574
rs3087221     IL6      22,729,942   -0.00014   0.00197          0.94247
rs2069824     IL6      22,731,757    0.00087   0.00193          0.65350
rs2069832     IL6      22,733,958    0.00053   0.00193          0.78387
rs2069835     IL6      22,734,396    0.00040   0.00195          0.83672
rs1474347     IL6      22,734,649    0.00039   0.00195          0.84028
rs2066992     IL6      22,734,774    0.00001   0.00196          0.99724
rs2069839     IL6      22,735,020   -0.00077   0.00195          0.69093
rs2069840     IL6      22,735,097   -0.00019   0.00195          0.92356
rs1554606     IL6      22,735,232    0.00032   0.00194          0.86870
rs2069842     IL6      22,735,835    0.00010   0.00196          0.95805
rs1548216     IL6      22,736,298   -0.00029   0.00196          0.88268
rs2069843     IL6      22,736,519   -0.00039   0.00191          0.83769
rs2069845     IL6      22,736,674   -0.00033   0.00196          0.86730
rs2069846     IL6      22,736,887    0.00003   0.00196          0.98717
rs2069849     IL6      22,737,681   -0.00016   0.00197          0.93427
rs3746619     MC3R     54,257,212   -0.00119   0.00192          0.53693
rs3827103     MC3R     54,257,436   -0.00038   0.00195          0.84549
rs10369       CTSZ     57,003,851    0.00038   0.00195          0.84472
rs9760        CTSZ     57,005,158    0.00007   0.00193          0.97140
rs163790      CTSZ     57,008,989    0.00015   0.00194          0.93728
rs163800      CTSZ     57,011,903    0.00007   0.00196          0.97360
rs163801      CTSZ     57,012,009   -0.00035   0.00197          0.86077

* SNP identified previously by Vietnamese study
** Results adjusted for age (< or ≥ 5 years), sex, and HIV status

4.1.2. Fine-Mapping Analysis

Within these 564 Ugandans, a total of 1,479 SNPs were genotyped across the 17.84-Mb region on chromosome 7, and of these 1,367 met the 90% call rate threshold. Departures from HWE in the pooled sample were found in 24 SNPs (Table 4). No significant associations with TB susceptibility were found with these SNPs (P > 0.10) (data not shown).

Table 4. Fine Mapping SNPs Departing from HWE

SNP          Position    ObsHET   PredHET   HWpval     MAF     Alleles
rs6960928    1263932     0.016    0.071     2.44E-16   0.037   G:A
rs10266549   1440457     0.668    0.499     1.69E-10   0.477   G:A
rs7783310    1712658     0.575    0.473     4.66E-05   0.383   G:A
rs4256490    1857290     0.183    0.24      5.62E-05   0.14    G:A
rs1637755    2188475     0.474    0.404     8.00E-04   0.281   G:A
rs11977057   4094233     0.148    0.484     1.98E-41   0.412   C:G
rs314598     4454229     0.379    0.481     4.99E-05   0.403   G:C
rs11976063   4798495     0.56     0.459     4.36E-05   0.357   A:G
rs12540466   4917659     0.263    0.353     5.09E-06   0.228   A:T
rs627222     6094981     0.217    0.273     3.00E-04   0.163   A:G
rs38019      8014208     0.348    0.458     5.97E-06   0.355   A:T
rs11978224   8652578     0.355    0.484     6.85E-07   0.409   T:A
rs2915125    9873235     0.167    0.496     1.64E-39   0.455   A:G
rs13236165   11651725    0.224    0.499     9.98E-26   0.48    G:A
rs17165063   11707008    0.353    0.443     2.00E-04   0.331   A:T
rs12671228   12528990    0.304    0.404     4.63E-06   0.281   C:A
rs2282880    13921045    0.286    0.457     7.34E-12   0.354   A:G
rs41503      13999181    0.271    0.401     3.98E-09   0.278   T:A
rs7783337    14123511    0.325    0.414     4.71E-05   0.293   A:G
rs7811874    15308407    0.523    0.434     5.72E-05   0.318   G:A
rs10227084   15366197    0.588    0.5       9.00E-04   0.497   C:A
rs10242655   15822701    0.014    0.176     6.96E-40   0.097   C:A
rs38237      15899199    0.24     0.308     6.86E-05   0.19    A:G

* All P-values less than 0.0001.
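The PredHET column in Table 4 can be related to the MAF column through the Hardy-Weinberg expectation 2pq. The following is a minimal Python sketch of that relationship, not the procedure used to generate the table; the function names are illustrative, and the three rows are copied from Table 4. Because the tabulated MAF is rounded to three decimals, the recomputed value is expected to agree with PredHET only approximately.

```python
def expected_heterozygosity(maf):
    """Heterozygote frequency 2pq expected under HWE for a biallelic SNP."""
    return 2.0 * maf * (1.0 - maf)


def heterozygote_deficit(obs_het, maf):
    """F = 1 - H_obs / H_exp: positive for a deficit, negative for an excess."""
    return 1.0 - obs_het / expected_heterozygosity(maf)


# (SNP, ObsHET, PredHET, MAF) copied from three rows of Table 4
rows = [
    ("rs6960928", 0.016, 0.071, 0.037),
    ("rs10266549", 0.668, 0.499, 0.477),
    ("rs11977057", 0.148, 0.484, 0.412),
]

for snp, obs_het, pred_het, maf in rows:
    recomputed = expected_heterozygosity(maf)
    f_stat = heterozygote_deficit(obs_het, maf)
    print(f"{snp}: tabulated PredHET={pred_het:.3f}, "
          f"2pq={recomputed:.3f}, F={f_stat:+.2f}")
```

A positive F (as for rs11977057) indicates fewer heterozygotes than expected, while a negative F (as for rs10266549) indicates an excess; both patterns appear among the SNPs flagged in Table 4.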

4.1.3. Imputation

Reference Population for Imputation

In comparing genotype frequencies, the Ugandan (UG) population tended to show fewer differences from the Kenyan populations (MKK, LWK) than from the Yoruba population (YRI). The largest differences were seen in TLR6, where the Ugandan frequencies differed dramatically from those of every other population. Interestingly, allele frequencies for TLR6 were “flipped” in the Ugandan population: the most common genotype in the Ugandan data was the opposite of that in all the other populations.

Comparisons of LD patterns were inconclusive. Differences in LD structure were apparent between the Ugandan sample, both Kenyan populations, and the Yoruba population, so it was difficult to decide which HapMap population was most like the Ugandan population; this suggests that multiple HapMap populations should contribute to the reference sample. These inconclusive findings were unlikely to be the result of small sample size, since the Ugandan sample was actually slightly larger than the Yoruba sample. However, a limitation of this analysis was that the SNPs covered in these populations did not overlap exactly: the Ugandan sequencing focused on exonic SNPs, while the HapMap data had broader coverage. Because of this lack of SNP overlap, LD was difficult to assess and compare.

Haplotype analyses found that the haplotypes in TLR6 in the UG individuals were not present at all in the other four populations. A Fisher’s exact test of the haplotype frequencies at TIRAP found that the UG frequencies were significantly different (at the α = 0.05 level) from LWK (P = 0.0412), MKK (P = 0.0014), and YRI (P = 0.0076). Moreover, YRI was also significantly different from MKK (P < 0.00001). Also of interest, the two Kenyan populations had significantly different haplotype frequencies from one another at TIRAP (P = 0.0002). At TLR2, UG was significantly different from LWK (P < 0.00001), YRI (P < 0.00001), and MKK (P < 0.00001). Also, at TLR2, YRI was significantly different from LWK (P < 0.00001) and MKK (P < 0.00001). These haplotype analyses demonstrated that UG differs from both the Kenyan populations and the Yoruba population but that some similarities between these populations are present, suggesting the use of an imputation reference population that combines all three populations. Results are provided in Tables 5-8.
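To illustrate the kind of comparison described above, the following Python sketch implements a two-sided Fisher's exact test for a 2 x 2 table. It is a simplification, not the analysis reported here: the haplotype comparisons involved more than two haplotype classes, whereas this sketch collapses the data to "haplotype of interest" versus "all other haplotypes" in two populations. The counts in the example are hypothetical and are not taken from Tables 5-8.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]].

    Rows are two populations; columns are chromosomes carrying the
    haplotype of interest versus any other haplotype.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first population
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical chromosome counts: population 1 (40 carriers, 10 other),
# population 2 (30 carriers, 20 other)
p_value = fisher_exact_2x2(40, 10, 30, 20)
print(f"two-sided P = {p_value:.4f}")
```

For tables with more than two haplotype classes, an exact test over the full r x c table (or a permutation approximation) would be needed in place of this collapsed version.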

Table 5. Haplotype Analysis Results for TLR2

HAPLOTYPE C-C C-T T-T T-C
Population
LWK 0.06178 0.64608 0.29212
MKK 0.02043 0.617 0.36253
YRI 0.74552 0.01496 0.23951
UG 0.28573 0.49999 0.21427

Table 6. Haplotype Analysis Results for TLR4

HAPLOTYPE A-A-G A-A-T A-G-G A-G-T G-A-G G-G-G
Population
LWK 0.75617 0.01764 0.02792 0.06758 0.12391
MKK 0.77033 0.03297 0.03199 0.1472 0.01421
YRI 0.84913 0.01063 0.03088 0.02095 0.0834
UG 0.80263 0.03947 0.15789

Table 7. Haplotype Analysis Results for TLR6

HAPLOTYPE A-C-C A-C-T G-C-C G-T-C A-G-C A-G-T G-G-C G-G-T
Population
LWK 0.65348 0.15902 0.15902 0.0284
MKK 0.61879 0.32858 0.02453 0.02336
YRI 0.6902 0.23618 0.05828 0.01532
UG 0.02083 0.67708 0.01042 0.29167

Table 8. Haplotype Analysis Results for TIRAP

HAPLOTYPE A-C-G-C G-C-A-C G-C-G-C G-T-G-C G-C-G-T
Population
LWK 0.10565 0.02415 0.74289 0.11736
MKK 0.03668 0.67503 0.22676 0.05059
YRI 0.03064 0.0457 0.81032 0.10301
UG 0.07778 0.01111 0.85556 0.01111 0.04444

Marker Data

To ensure that imputation accuracy was not greatly reduced by MAF discrepancies, genotype frequencies in the imputed data were compared with those in each of the three pre-selected reference panels at a random selection of 100 SNPs across the chromosome 7 and chromosome 20 regions. Of these 300 chi-square tests, several showed significant differences. The average observed heterozygosity in the observed Ugandan genotypes was 40%, and the average predicted heterozygosity, under the assumption of HWE, was about 39%. Therefore, heterozygosity was not suspected to significantly reduce the accuracy of the imputation.
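One such comparison can be sketched as a chi-square test of homogeneity on a 2 x 3 table of genotype counts (two samples by three genotype classes). The Python sketch below is an illustration under stated assumptions rather than the code used for the 300 tests: it assumes a biallelic SNP, assumes every genotype class is observed at least once in the pooled sample, and uses hypothetical counts. With two degrees of freedom the chi-square survival function reduces to exp(-x/2), so no statistics library is required.

```python
from math import exp


def chi_square_genotypes(counts_a, counts_b):
    """Chi-square test of homogeneity for two samples' genotype counts.

    Each argument is a sequence of three counts (AA, Aa, aa).
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    table = [list(counts_a), list(counts_b)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # Survival function of a chi-square variable with exactly 2 df
    return stat, exp(-stat / 2.0)


# Hypothetical genotype counts: imputed sample versus one reference panel
stat, p_value = chi_square_genotypes([50, 30, 20], [20, 30, 50])
print(f"chi-square = {stat:.3f}, P = {p_value:.2e}")
```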

A total of 11,872 SNPs were imputed across the chromosome 7 and chromosome 20 regions. However, after correcting for Mendelian inconsistencies, only 9,936 of these SNPs provided sufficient information for the association analyses, as inconsistent genotypes were set to missing.
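The idea of setting Mendelian-inconsistent genotypes to missing can be sketched for a single parent-offspring trio as follows. This is a minimal Python illustration, not the software actually used for the correction; the genotype encoding (two-character allele strings) and the function names are assumptions made for the example.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child could have received one allele from each parent.

    Genotypes are two-character strings such as "AG".
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def mask_inconsistent(trios):
    """Return child genotypes, replacing inconsistent calls with None."""
    return [child if is_mendelian_consistent(child, father, mother) else None
            for child, father, mother in trios]


# Hypothetical (child, father, mother) genotypes at one SNP
trios = [
    ("AG", "AA", "GG"),  # consistent: A from father, G from mother
    ("GG", "AA", "AG"),  # inconsistent: father cannot transmit G
    ("AA", "AG", "AG"),  # consistent
]
print(mask_inconsistent(trios))
```

SNPs at which many genotypes are masked in this way carry less information, which is consistent with the reduction in usable SNPs described above.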

Association Analysis

Results for the SNPs with association P-values below 0.05 are provided in Tables 9 and 10. Of the SNPs with suggestive association with TB susceptibility, four were in close proximity to the candidate genes examined (three near IL6 and one near MC3R). Five imputed SNPs were significant at the P < 0.01 level. Two of these SNPs are located in SDK1 (sidekick homolog 1, cell adhesion molecule), a protein-coding gene with literature supporting its role in HIV-associated nephropathy (Kaufman et al. 2004; Kaufman et al. 2007).

Table 9. Results for Imputed Genotypes on Chromosome 7p

SNP Location Gene Estimate Std Error p-Value Gene Distal to SNP Gene Proximal to SNP

rs6952577 838627 UNC84A -0.00483 0.002152 0.02481

rs1113831 1775912 0.005669 0.002708 0.03628 46.041 kb from MAD1L1 21.796 kb from ELFN1

rs10950377 1802421 0.01361 0.006525 0.03696 19.532 kb from MAD1L1 48.305 kb from ELFN1

rs4721174 1903206 MAD1L1 -0.00421 0.001933 0.02943

rs3778994 2142382 MAD1L1 -0.00402 0.00198 0.04233

rs3735093 2254383 NUDT1 0.006108 0.002849 0.03204

rs1799832 2257049 NUDT1 -0.00506 0.002571 0.04914

rs3735111 2616459 IQCE -0.00441 0.002113 0.03668

rs4719646 2721152 AMZ1 -0.00404 0.001988 0.04190

rs798527 2739317 GNA12 3.4885 1.3873 0.01191

rs798521 2742925 GNA12 1.0455 0.4298 0.01499

rs2644295 2826744 GNA12 -0.00404 0.001909 0.03417

rs1182181 2839774 GNA12 -0.00479 0.002233 0.03195

rs1182179 2840175 GNA12 -0.00479 0.002233 0.03195

rs7802106 3442243 SDK1 0.008724 0.004374 0.04608

rs4722830*** 3468403 SDK1 0.008585 0.003142 0.00628

rs1915981*** 3477268 SDK1 0.008585 0.003142 0.00628

rs2002671 3671005 SDK1 0.008828 0.004244 0.03753

rs6964347 3673290 SDK1 0.005732 0.002756 0.03750

rs12112197 3700806 SDK1 -0.00435 0.002077 0.03622

rs13225994 3745464 SDK1 -0.01202 0.005419 0.02659

rs6975070 3759422 SDK1 -0.0041 0.002006 0.04093

rs12701221 3855197 SDK1 0.01034 0.004444 0.01992

rs17134410 4153858 SDK1 -0.00393 0.001953 0.04421

rs669028 4166428 SDK1 -0.00594 0.00293 0.04263

rs4723505 4230186 SDK1 0.005754 0.00279 0.03921

rs9638928 4304786 -0.00369 0.001878 0.04969 383.669 kb from FOXK1 29.629 kb from SDK1

rs7783362 4445115 -0.00459 0.002188 0.03592 243.34 kb from FOXK1 169.958 kb from SDK1

rs7788807 4734565 FOXK1 0.004882 0.002273 0.03170

rs11771525 5446978 -0.00474 0.002361 0.04453 34.975 kb from FBXL18 17.275 kb from TNRC18

rs10249759 6032273 EIF2AK1 0.006975 0.003363 0.03810

rs10224504 6194952 PSCD3 -0.00374 0.001874 0.04607

rs13225983 6545107 -0.01202 0.005419 0.02659 38.482 kb from ZDHHC4 54.733 kb from KDELR2

rs7794450 6592373 ZDHHC4 -0.00576 0.002392 0.01605

rs7795522 6602099 -0.01082 0.005458 0.04751

rs4724889 6909185 -0.00524 0.002415 0.03011 279.585 kb from C1GALT1 76.799 kb from C7orf28B

rs2163639 7162473 -0.00778 0.00323 0.01596 26.297 kb from C1GALT1 330.087 kb from C7orf28B

rs4582461 7162798 -0.00778 0.00323 0.01596 25.972 kb from C1GALT1 330.412 kb from C7orf28B

rs7806755 7209303 0.004401 0.002243 0.04973 155.465 kb from COL28A1 376.917 kb from C7orf28B

rs13246962 7219301 -0.00469 0.002155 0.02970 145.467 kb from COL28A1 386.915 kb from C7orf28B

rs10487590 7243789 C1GALT1 -0.00754 0.00324 0.01994

rs17252812 7264714 -0.00616 0.002983 0.03884 100.054 kb from COL28A1 14.208 kb from C1GALT1

rs1922630 7278275 0.007511 0.003319 0.02361 86.493 kb from COL28A1 27.769 kb from C1GALT1

rs1638201 7281776 0.007957 0.003788 0.03567 82.992 kb from COL28A1 31.27 kb from C1GALT1

rs2270080 7284116 -0.00733 0.003482 0.03534 80.652 kb from COL28A1 33.61 kb from C1GALT1

rs6463665*** 7285249 0.008222 0.003174 0.00959 79.519 kb from COL28A1 34.743 kb from C1GALT1

rs2141911 7295301 0.009059 0.004017 0.02415 69.467 kb from COL28A1 44.795 kb from C1GALT1

rs10273515 7299106 -0.0072 0.003491 0.03912 65.662 kb from COL28A1 48.6 kb from C1GALT1

rs13237015 7299447 -0.00714 0.003639 0.04963 65.321 kb from COL28A1 48.941 kb from C1GALT1

rs12673989 7310506 0.007964 0.003788 0.03549 None within 500 kb. None within 500 kb

rs9648104 7316406 0.009905 0.00407 0.01494 48.362 kb from COL28A1 65.9 kb from C1GALT1

rs1294632 7391125 COL28A1 -0.00581 0.002691 0.03089

rs10952075 7836573 0.5405 0.2749 0.04930 138.374 kb from GLCCI1 111.81 kb from RPA3

rs4725044 7909102 -0.0044 0.002115 0.03755 65.845 kb from GLCCI1 184.339 kb from RPA3

rs10486207 8077306 GLCCI1 0.007259 0.003642 0.04625

rs17153107 8644906 NXPH1 0.01804 0.007459 0.01556

rs12702800*** 8931817 0.6734 0.2242 0.00267 None within 500 kb. 172.699 kb from NXPH1

rs6967777 9037073 0.007679 0.003868 0.04709 None within 500 kb. 277.955 kb from NXPH1

rs293169 9124086 0.004271 0.001923 0.02637 None within 500 kb. 364.968 kb from NXPH1

rs293173 9125176 0.004271 0.001923 0.02637 None within 500 kb. 366.058 kb from NXPH1

rs293181 9126659 0.004271 0.001923 0.02637 None within 500 kb. 367.541 kb from NXPH1

rs293184 9127724 0.004271 0.001923 0.02637 None within 500 kb. 368.606 kb from NXPH1

rs7808679 9138949 -0.00378 0.001907 0.04763 None within 500 kb. 379.831 kb from NXPH1

rs4720828 9213749 -0.00439 0.002186 0.04443 426.675 kb from PER4 454.631 kb from NXPH1

rs2713319 9498692 0.02088 0.008718 0.01663 141.732 kb from PER4 None within 500 kb

rs1910859 9559170 -0.00517 0.002563 0.04369 81.254 kb from PER4 None within 500 kb

rs2709004 9564339 0.008581 0.004075 0.03521 76.085 kb from PER4 None within 500 kb

rs16876171 9646548 -0.00483 0.002335 0.03861 None within 500 kb. 4.576 kb from PER4

rs13234568 9853401 0.01854 0.008438 0.02800 None within 500 kb. 211.429 kb from PER4

rs16876384 10049257 -0.00409 0.001964 0.03742 None within 500 kb. 407.285 kb from PER4

rs2108004 10559409 -0.00448 0.002277 0.04888 379.93 kb from NDUFA4 None within 500 kb

rs10243246 10601851 -0.00567 0.002303 0.01388 337.488 kb from NDUFA4 None within 500 kb

rs2108016 10605337 -0.00445 0.001861 0.01674 334.002 kb from NDUFA4 None within 500 kb

rs10486094 10764944 -0.0047 0.002238 0.03555 174.395 kb from NDUFA4 None within 500 kb

rs7789764 10939888 NDUFA4 -0.00628 0.002796 0.02469

rs1616965 10945922 NDUFA4 -0.0071 0.002922 0.01517

rs6968332 11181819 -0.0043 0.002179 0.04859 194.769 kb from THSD7A 6.05 kb from PHF14

rs12673692 11535056 THSD7A -0.00587 0.002479 0.01798

rs2681050 11615999 THSD7A -0.00415 0.00206 0.04381

rs12699252 11651102 THSD7A -0.00497 0.002523 0.04901

rs6959385 11998352 0.009827 0.004396 0.02537 219.02 kb from TMEM106B 160.003 kb from THSD7A

rs10230889 12286214 -0.00437 0.001942 0.02444 50.819 kb from VWDE 42.8 kb from TMEM106B

rs2033604 12659666 SCIN -0.00604 0.002495 0.01551

rs7811183 12882595 -0.00555 0.002423 0.02188 None within 500 kb. 185.512 kb from ARL4A

rs7780825 12883832 0.01958 0.009378 0.03683 None within 500 kb. 186.749 kb from ARL4A

rs2041235 13171369 -0.00403 0.00189 0.03305 None within 500 kb. 474.286 kb from ARL4A

rs194019 13206844 -0.00498 0.00232 0.03169 None within 500 kb. None within 500 kb

rs7780526 13244474 0.01958 0.009378 0.03683 None within 500 kb. None within 500 kb

rs17167005 13324742 -0.0067 0.003208 0.03670 None within 500 kb. None within 500 kb

rs2051928 13803279 -0.7444 0.362 0.03972 94.103 kb from ETV1 None within 500 kb

rs10243424 14109647 0.01471 0.006387 0.02124 41.551 kb from DGKB 112.072 kb from ETV1

rs1367775 14437323 DGKB -0.0055 0.002791 0.04870

rs17168255 14552554 DGKB -0.00446 0.002029 0.02797

rs7796540 14946709 -0.00677 0.00303 0.02543 259.758 kb from TMEM195 99.109 kb from DGKB

rs12386767 15074710 0.01471 0.006387 0.02124 131.757 kb from TMEM195 227.11 kb from DGKB

rs12699683 15099653 0.009143 0.003575 0.01054 106.814 kb from TMEM195 252.053 kb from DGKB

rs12699696 15248099 TMEM195 0.007406 0.003679 0.04413

rs2389412 15523467 TMEM195 0.005709 0.002742 0.03736

rs11763112 15572453 -0.00522 0.002237 0.01970 44.908 kb from MEOX2 4.288 kb from TMEM195

rs16878648 15939502 -0.005 0.002478 0.04346 154.174 kb from ISPD 246.669 kb from MEOX2

rs1358449 15980657 -0.00514 0.002606 0.04853 113.019 kb from ISPD 287.824 kb from MEOX2

rs17169263 16100239 LOC729920 -0.00483 0.002316 0.03709

rs1295175 16239376 LOC729920 0.007014 0.003028 0.02054

rs7782228 17001343 -0.0064 0.003218 0.04682 303.457 kb from AHR 113.205 kb from AGR3

rs10244656 17062178 -0.01351 0.006231 0.03009 242.622 kb from AHR 174.04 kb from AGR3

rs2166939*** 17502387 -0.00531 0.001999 0.00794 294.523 kb from SNX13 150.087 kb from AHR

rs2084472 17559022 0.01451 0.005852 0.01316 237.888 kb from SNX13 206.722 kb from AHR

rs7781652 17669942 0.003981 0.001942 0.04032 126.968 kb from SNX13 317.642 kb from AHR

rs12669067 17984469 0.006301 0.002896 0.02961 48.455 kb from PRPS1L1 37.813 kb from SNX13

rs7800421 18029281 1.0346 0.4824 0.03200 3.643 kb from PRPS1L1 82.625 kb from SNX13

rs2961310 22690933 -0.00427 0.00215 0.04683 42.357 kb from IL6 184.507 kb from MGC87042

rs4321884 22707985 -0.00503 0.00212 0.01767 25.305 kb from IL6 201.559 kb from MGC87042

rs7802277 22749139 0.01018 0.004062 0.01220 69.637 kb from TOMM7 10.995 kb from IL6

* Results adjusted for age (< or ≥ 5 years), sex, and HIV status
** All P-values < 0.05, *** denotes P-value < 0.01
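The two flanking-gene columns of Table 9 can be read as simple coordinate differences between a SNP and the nearest boundary of the gene on either side. The Python sketch below illustrates that arithmetic for rs1922630. The two gene boundaries are assumptions: they were back-calculated from the distances printed in Table 9 rather than taken from a genome annotation, and the function names are hypothetical.

```python
# Gene boundaries (bp) back-calculated from Table 9 distances; assumptions,
# not values from an annotation database.
C1GALT1_NEAR_BOUNDARY = 7_250_506   # boundary of the gene in the "proximal" column
COL28A1_NEAR_BOUNDARY = 7_364_768   # boundary of the gene in the "distal" column


def flanking_distances_kb(snp_pos, proximal_boundary, distal_boundary):
    """Distance in kb from a SNP to the flanking gene on each side."""
    return ((snp_pos - proximal_boundary) / 1000,
            (distal_boundary - snp_pos) / 1000)


def describe(distance_kb, gene, window_kb=500):
    """Format a distance the way the table does, with a 500-kb search window."""
    if distance_kb > window_kb:
        return "None within 500 kb"
    return f"{distance_kb:g} kb from {gene}"


proximal_kb, distal_kb = flanking_distances_kb(
    7_278_275, C1GALT1_NEAR_BOUNDARY, COL28A1_NEAR_BOUNDARY)
print(describe(distal_kb, "COL28A1"), "|", describe(proximal_kb, "C1GALT1"))
```

The same subtraction reproduces the neighboring rows of the table (for example rs1638201 at position 7281776), which is a quick way to check that distances in a reconstructed row belong to the SNP they are printed beside.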

Table 10. Results for Imputed Genotypes on Chromosome 20q

SNP Location Gene Estimate Std Error p-Value Gene Proximal to SNP Gene Distal to SNP

rs6513195 54159977 1.2301 0.6237 0.04859 97.217 kb from MC3R 146.558 kb from CBLN4

rs163781 56997161 TH1L -0.00432 0.00191 0.02359

rs6026742 57174001 -0.00666 0.003174 0.03589 25.468 kb from ZNF831 122.705 kb from SLMO2

rs259997 57216650 C20orf174 -1.6598 0.8367 0.04730

* Results adjusted for age (< or ≥ 5 years), sex, and HIV status
** All P-values < 0.05

4.2. Discussion

From the analyses conducted here, none of the candidate gene SNPs or fine mapping SNPs were found to be significantly associated with TB susceptibility at their respective significance thresholds (P < 0.025 and P < 4.13 × 10⁻⁵). Relaxing these thresholds to P < 0.05 still yielded no associations with these variants. Analyses of the imputed genotypes were also non-significant at their respective threshold, although 117 imputed SNPs were significant at the P < 0.05 threshold. However, none of these SNPs fell within the candidate genes themselves, although four lay within about 100 kb of IL6 or MC3R.

The genome-wide linkage results reported by Stein et al. (2008) provided promising initial evidence for TB susceptibility loci on chromosome 7 and the replicated regions on chromosome 20. Following up these linkage results, the candidate gene and fine mapping analyses here examined genetic associations in previously published genomic regions. Imputation of genotypes can increase power to detect associations, but an appropriate reference panel is instrumental for accurate and precise imputation. It is possible that so few associations were found within the imputed data because the reference population selected did not fairly represent the LD structure in the Ugandan population. Also, some imputation software assumes that individuals are unrelated, so the use of family data may weaken the validity of the imputed genotypes.

Recall that the GLMM applied to assess genetic association assumed an exchangeable correlation matrix to represent familial correlations, which may have biased the results. Ignoring avuncular and grandparental relationships may have over-corrected for these correlations and thus under-reported levels of significance: assuming the observations are more correlated than they truly are has the potential to increase type II error rates. That said, 230 (58%) of the relative pairs were parent-offspring pairs.
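The contrast between an exchangeable structure and a relationship-specific structure can be made concrete with a small Python sketch. It is illustrative only: the four-person family is hypothetical, and the coefficients of relationship (0.5 for first-degree and 0.25 for second-degree relatives) stand in for the relative pattern of familial correlation, not for the phenotypic correlations themselves, which also depend on heritability and shared environment.

```python
# Coefficient of relationship by relative-pair type
RELATEDNESS = {
    "parent-offspring": 0.5,
    "full-sibling": 0.5,
    "avuncular": 0.25,
    "grandparental": 0.25,
}


def exchangeable_matrix(n, rho):
    """Every pair of family members shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def relationship_matrix(members, pairs):
    """Pair-specific correlations based on the type of relationship."""
    index = {name: i for i, name in enumerate(members)}
    n = len(members)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), kind in pairs.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = RELATEDNESS[kind]
    return matrix


# Hypothetical three-generation family
family = ["grandmother", "mother", "aunt", "child"]
pairs = {
    ("grandmother", "mother"): "parent-offspring",
    ("grandmother", "aunt"): "parent-offspring",
    ("mother", "aunt"): "full-sibling",
    ("mother", "child"): "parent-offspring",
    ("aunt", "child"): "avuncular",
    ("grandmother", "child"): "grandparental",
}

exch = exchangeable_matrix(len(family), 0.5)
kin = relationship_matrix(family, pairs)
for name, e_row, k_row in zip(family, exch, kin):
    print(f"{name:12s} exchangeable={e_row} relationship-based={k_row}")
```

In this example the exchangeable matrix assigns the aunt-child and grandmother-child pairs the same correlation as first-degree pairs, which is the over-correction described above.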

It is possible that the Stein et al. (2008) results were not confirmed here because the exact familial correlations were not integrated into the analyses, as correlations between full siblings were most likely under-estimated. Future analyses that correctly account for pedigree structure will be implemented in S.A.G.E.’s ASSOC program, in which the familial correlations are modeled as random effects and variance components (additive polygenic, nuclear family, spousal, sibling, and individual environmental) are estimated using a linear mixed model in which each marker is included as a fixed effect and the likelihood is maximized over all parameters (Gray-McGuire et al. 2009).

Replication of an association with CTSZ was not successful. However, the significant association with CTSZ identified by Cooke et al. (2008) was based upon the analysis of a single SNP within that gene, rs34069356. When the SNP panel was designed for genotyping, this SNP did not meet my inclusion criteria because it had not been validated by Illumina. This SNP was also not imputed, so replication of the Cooke et al. (2008) association at this SNP was not possible. The significant associations of the MC3R SNPs rs3746619 and rs3827103 with TB susceptibility reported by Cooke et al. (2008) were also not replicated. Recall that these significant associations from Cooke et al. (2008) were found in a South African population of fully independent sibling pairs, including “Coloureds”, South Africans of mixed ancestry. Those investigators were not able to replicate their association in a second case-control analysis of a West African sample. Based on the LD and haplotype analyses done here, the Ugandan population carries genetic distinctions from other African populations, including novel polymorphisms and an LD structure different from that of a South African sample (Baker et al. 2009). Therefore, it is not unexpected that the Cooke et al. (2008) results were not replicated.

Results for the imputation analyses should be interpreted with caution, as 61% of the chi-square tests analyzing differences between the random selection of 100 imputed SNPs and each of the reference populations (MKK, LWK, YRI) were statistically significant. These results may be due to local ancestry at these SNPs, such that even though the Ugandan data resemble those of the three reference populations globally, locally the SNPs are similar to only a single population. Also, selective sweeps in this region may have influenced these analyses. A better approach to measuring imputation accuracy may be to remove a selection of the real Ugandan SNPs, impute them from the HapMap reference populations, and then compare the imputed genotypes to the real genotypes.
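The masking approach proposed above ends in a comparison between imputed and real genotypes. A minimal Python sketch of that final step is given below; the imputation itself is not shown. The genotype coding (0, 1, or 2 copies of the minor allele, with None for a missing call) and the example values are assumptions for illustration.

```python
def genotype_concordance(true_calls, imputed_calls):
    """Fraction of non-missing genotype pairs in which the imputed call
    matches the real call. Genotypes are minor-allele counts (0, 1, 2);
    None marks a missing call and is skipped."""
    pairs = [(t, i) for t, i in zip(true_calls, imputed_calls)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no overlapping non-missing genotypes to compare")
    return sum(t == i for t, i in pairs) / len(pairs)


# Hypothetical genotypes for one masked SNP across six individuals
real = [0, 1, 2, 1, None, 0]
imputed = [0, 1, 1, 1, 2, 0]
print(f"concordance = {genotype_concordance(real, imputed):.2f}")
```

Averaging this quantity over many masked SNPs, stratified by MAF, would give a direct estimate of how well a given reference panel performs for the Ugandan sample.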

Of the suggestively significant imputed SNPs, four were found in close proximity to the candidate genes examined. Three flanked IL6: rs7802277 (P = 0.01220) was located 11 kb from the gene on one side, while rs4321884 (P = 0.01767) and rs2961310 (P = 0.04683) were located 25 kb and 42 kb away, respectively, on the other side. The fourth, rs6513195 (P = 0.04859), was found 97 kb away from MC3R. Five imputed SNPs were significant at the P < 0.01 level. Two of these SNPs are located in the SDK1 gene, further supporting the role of HIV status in TB susceptibility.

4.3. Conclusions and Future Directions

One of the difficulties in studying the genetic component of TB lies in how the phenotype is defined, as well as which populations are considered for analysis. Latent TB is an important aspect of the immunological sequence from exposure to active TB; thus, further research must be conducted to better understand this intermediate phase and how both environment and ancestry affect it. It may be beneficial to consider other TB phenotypes, such as grouping LTBI individuals with index cases and studying this group versus the non-TB cases. Even more informative may be considering a polytomous regression, including all three levels of TB susceptibility. A possible limitation of the analyses presented here is that only 22% of the individuals genotyped were TB cases.

This study followed up the previously reported linkage signals, which showed significant linkage to TB susceptibility in a genome-wide scan (Stein et al. 2008). Therefore, levels of significance in these association tests should be interpreted with caution. P-values may not be the best determinant of significant association with TB susceptibility; when the purpose is to locate the most likely point in a region already identified by linkage, a P-value is irrelevant. Perhaps fitting a curve over the entire region and locating its maximum value through the use of LD may be more appropriate. Roeder et al. (2006) suggested dividing each P-value by a hypothesis-specific weight and then applying a traditional false discovery rate (FDR) analysis to the weighted P-values. The weights are determined by prior information, such as linkage results, and the power to detect association increases as a result (Roeder et al. 2006). However, Roeder et al. (2006) assumed that the linkage analyses included fully informative affected sibling pairs, that the association study consisted of cases and controls (not related individuals), and that the weight selection was performed with care.
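The weighting idea described above can be sketched as a weighted Benjamini-Hochberg procedure: each P-value is divided by its weight (with the weights scaled to average one), and the usual step-up rule is applied to the weighted values. The Python sketch below is a generic illustration under these assumptions and is not the specific scheme of Roeder et al. (2006) for deriving weights from linkage data; the P-values and weights in the example are hypothetical.

```python
def weighted_bh(p_values, weights, q=0.05):
    """Weighted Benjamini-Hochberg procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    at false discovery rate q.
    """
    m = len(p_values)
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]          # weights now average 1
    weighted = [min(1.0, p / w) if w > 0 else 1.0
                for p, w in zip(p_values, scaled)]

    # Step-up rule: find the largest rank k with weighted P <= q * k / m
    order = sorted(range(m), key=lambda i: weighted[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= q * rank / m:
            cutoff_rank = rank

    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical P-values for four SNPs; the first lies under a linkage peak
p_vals = [0.03, 0.2, 0.4, 0.6]
print(weighted_bh(p_vals, [1, 1, 1, 1]))           # unweighted analysis
print(weighted_bh(p_vals, [3, 0.5, 0.25, 0.25]))   # up-weight the first SNP
```

Because the weights must average one, up-weighting SNPs under a linkage peak necessarily down-weights the rest, which is why careful weight selection is one of the stated assumptions.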

Because association analyses detect associations with markers that are in LD with the causal SNP (or causal loci) unless the causal SNP itself has been genotyped, a haplotype analysis may provide more insight into disease susceptibility than single SNP analyses (Bailar and Hoaglin 2009). Clark (2004) explained that analyzing phased haplotypes improves the results of candidate gene studies, for three reasons. First, protein-coding (i.e. functional) genes are defined by haplotypes. Second, haplotypes are representative of the genetic differences between populations, and depending upon ancestry and demography, haplotypes composed of synonymous SNPs can be population specific (Schaid 2004). Third, analyzing haplotypes rather than SNPs reduces the dimensionality of the analysis, thus potentially increasing statistical power (Clark 2004). Haplotypes will be constructed using S.A.G.E.’s DECIPHER program and analyzed in ASSOC, and these results will be compared to the SNP results (S.A.G.E. v6.1.0).

A possible limitation to the imputation analysis may be that imputation in MACH automatically ignores SNPs present in the pedigree file but not in the reference panel. Therefore, future work will compare imputed genotypes for SNPs included in the Ugandan data to the actual SNP genotypes to evaluate imputation accuracy. For those genotypes missing from the pedigree dataset, imputation accuracy cannot be directly assessed in this way. Instead, SNPs could be deliberately omitted from the source dataset and imputed. Comparing these imputed genotypes to the known genotypes may provide insight into the accuracy of imputation in these instances.

Other areas of interest include genome-wide association studies (GWAS). Currently, no GWAS on TB have been published. A major limitation to these studies is genotyping cost. Another limitation is that identification of causal variants is not guaranteed, as GWAS are intended to identify variants under the common disease, common variant (CDCV) hypothesis, whereby common variants with small to moderate effects are detected. This contrasts with the common disease, rare variant (CDRV) hypothesis, which theorizes that common diseases are the result of several rare variants across the genome, each of which has a moderate to large effect on the disease. Therefore, if TB susceptibility is in fact the product of rare variants, a GWAS approach may not detect significant associations. Having found significant linkage but not association at the chromosome 7 and chromosome 20 regions (Stein et al. 2008) suggests that rare variants may be responsible for genetic susceptibility to TB. To detect these rare variants, a copy number variation (CNV) analysis could be applied, in which stretches of the genome 1 kb or longer differ between populations in the number of copies of a variant (or variants). These repeated “chunks” of the genome are then used as the units of variation under study, as opposed to single nucleotide variants.

It may be possible that additional inherited characteristics of the DNA controlled by epigenetics also influence phenotype, such that DNA transcription and/or tissue-specific expression of certain genes are regulated in the absence of altered nucleotide sequences (Möller et al. 2010a). Even more specific to the sequences themselves is a resequencing approach, such as has been conducted on the TLR genes (Ma et al. 2007). Resequencing, like CNV analysis, assumes the CDRV hypothesis, and thus can potentially detect rare variants. In a full-exon resequenced region, Ma et al. (2007) found that coding variants in the TLR1 and TLR10 genes were significantly more common in African American TB cases than in African American controls. TB cases and controls matched on European and Hispanic ethnicity were also examined, and differences in the frequency of rare nonsynonymous polymorphisms were present in the Europeans (at TLR10) and the Hispanics (at TLR2) (Ma et al. 2007). Further resequencing of exonic regions of the genome may lead to additional discoveries of rare variants influencing TB susceptibility. However, unless the sample size is very large or is heavily ascertained, a single rare variant may not be detectable; instead, the total “load” of rare mutations at many sites within a gene may be the relevant exposure for testing association.
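The notion of a total “load” of rare mutations can be sketched as a per-individual burden score: the number of rare minor alleles carried across all sites in a gene. The Python sketch below is a simple illustration under stated assumptions, not a method used in this thesis; the MAF cutoff of 0.01, the genotype coding (0, 1, or 2 minor alleles per site), and the example data are all hypothetical.

```python
def rare_variant_burden(genotypes, mafs, maf_cutoff=0.01):
    """Count rare minor alleles carried by each individual.

    genotypes: one list per individual, with a minor-allele count per site.
    mafs: minor allele frequency of each site, in the same order.
    """
    rare_sites = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    return [sum(person[j] for j in rare_sites) for person in genotypes]


def carrier_fraction(burdens):
    """Proportion of individuals carrying at least one rare allele."""
    return sum(b > 0 for b in burdens) / len(burdens)


# Hypothetical data: three sites in one gene, the second of which is common
site_mafs = [0.005, 0.20, 0.002]
cases = [[0, 1, 1], [1, 0, 0], [0, 2, 0], [0, 1, 1]]
controls = [[0, 1, 0], [0, 0, 0], [0, 2, 0], [1, 1, 0]]

case_burden = rare_variant_burden(cases, site_mafs)
control_burden = rare_variant_burden(controls, site_mafs)
print("cases:", case_burden, carrier_fraction(case_burden))
print("controls:", control_burden, carrier_fraction(control_burden))
```

The burden score, or the carrier indicator derived from it, would then replace the single-SNP genotype as the exposure in the association model.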

Although experiment-wide statistical significance was not achieved in these analyses, the imputation results suggest that MC3R and IL6 may play important roles in TB susceptibility, even though the SNPs with suggestive associations were located quite far from these genes. A major limitation to consider is the small sample size of only 564 individuals. Future work should pursue how genes influence immunological traits and their effects on TB susceptibility.

Bibliography

Abecasis, G. R., S. S. Cherny, W. O. Cookson, and L. R. Cardon, 2002 Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat.Genet 30: 97-101.

Amim, L. H., A. G. Pacheco, J. Fonseca-Costa, C. S. Loredo, M. F. Rabahi et al. 2008 Role of IFN-gamma +874 T/A single nucleotide polymorphism in the tuberculosis outcome among Brazilians subjects. Mol.Biol.Rep. 35: 563-566.

Awomoyi, A. A., M. Charurat, A. Marchant, E. N. Miller, J. M. Blackwell et al. 2005 Polymorphism in IL1B: IL1B-511 association with tuberculosis and decreased lipopolysaccharide-induced IL-1beta in IFN-gamma primed ex-vivo whole blood assay. J.Endotoxin.Res. 11: 281-286.

Babb, C., E. H. Keet, P. D. van Helden, and E. G. Hoal, 2007a SP110 polymorphisms are not associated with pulmonary tuberculosis in a South African population. Hum Genet 121: 521-522.

Babb, C., M. L. van der, N. Beyers, C. Pheiffer, G. Walzl et al. 2007b Vitamin D receptor gene polymorphisms and sputum conversion time in pulmonary tuberculosis patients. Tuberculosis (Edinb.) 87: 295-302.

Baghdadi, J. E., M. Orlova, A. Alter, B. Ranque, M. Chentoufi et al. 2006 An autosomal dominant major gene confers predisposition to pulmonary tuberculosis in adults. J Exp.Med. 203: 1679-1684.

Bailar, J. C., III, and D. C. Hoaglin, 2009 Medical Uses of Statistics. John Wiley & Sons, Inc., Hoboken, New Jersey.

Baker, A. B., A. Randhawa, M. Shey, M. de Kock, G. Kaplan et al. 2009 Comparison of genotype frequencies in Toll-like receptor genes in Ugandans, South Africans, and African HapMap populations. Poster presented at the American Society of Human Genetics Annual Meeting, October 2009, Honolulu, HI.

Barrett, J. C., B. Fry, J. Maller, and M. J. Daly, 2005 Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21: 263-265.

Bellamy, R., N. Beyers, K. P. McAdam, C. Ruwende, R. Gie et al. 2000 Genetic susceptibility to tuberculosis in Africans: a genome-wide scan. Proc.Natl.Acad.Sci.U.S.A 97: 8005-8009.

Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, M. Thursz et al. 1999 Tuberculosis and chronic hepatitis B virus infection in Africans and variation in the vitamin D receptor gene. J.Infect.Dis. 179: 721-724.

Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, H. C. Whittle et al. 1998a Assessment of the interleukin 1 gene cluster and other candidate gene polymorphisms in host susceptibility to tuberculosis. Tuber.Lung Dis. 79: 83-89.

Bellamy, R., C. Ruwende, T. Corrah, K. P. McAdam, H. C. Whittle et al. 1998b Variations in the NRAMP1 gene and susceptibility to tuberculosis in West Africans. N.Engl.J.Med. 338: 640-644.

Bellamy, R. J., and A. V. Hill, 1998 Host genetic susceptibility to human tuberculosis. Novartis.Found.Symp. 217: 3-13.

Ben-Ali, M., M. R. Barbouche, S. Bousnina, A. Chabbou, and K. Dellagi, 2004 Toll-like receptor 2 Arg677Trp polymorphism is associated with susceptibility to tuberculosis in Tunisian patients. Clin.Diagn.Lab Immunol. 11: 625-626.

Berrington, W. R., and T. R. Hawn, 2007 Mycobacterium tuberculosis, macrophages, and the innate immune response: does common variation matter? Immunol.Rev. 219: 167-186.

Bidwell, J., L. Keen, G. Gallagher, R. Kimberly, T. Huizinga et al. 2001 Cytokine gene polymorphism in human disease: on-line databases, supplement 1. Genes Immun. 2: 61-70.

Bochud, P. Y., T. R. Hawn, and A. Aderem, 2003 Cutting edge: a Toll-like receptor 2 polymorphism that is associated with lepromatous leprosy is unable to mediate mycobacterial signaling. J Immunol. 170: 3451-3454.

Botha, T., and B. Ryffel, 2003 Reactivation of latent tuberculosis infection in TNF-deficient mice. J Immunol. 171: 3110-3118.

Breslow, N., and D. G. Clayton, 1993 Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88: 9-25.

Browning, B. L., and S. R. Browning, 2009 A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84: 210-223.

Browning, S. R., 2008 Missing data imputation and haplotype phase inference for genome-wide association studies. Hum Genet 124: 439-450.

Browning, S. R., and B. L. Browning, 2007 Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084-1097.

Cervino, A. C., S. Lakiss, O. Sow, R. Bellamy, N. Beyers et al. 2002 Fine mapping of a putative tuberculosis-susceptibility locus on chromosome 15q11-13 in African families. Hum.Mol.Genet. 11: 1599-1603.

Chaisson, R. E., G. F. Schecter, C. P. Theuer, G. W. Rutherford, D. F. Echenberg et al. 1987 Tuberculosis in patients with the acquired immunodeficiency syndrome. Clinical features, response to therapy, and survival. Am.Rev.Respir.Dis. 136: 570-574.

Chen, X. R., Y. L. Feng, Y. Ma, Z. D. Zhang, C. Y. Li et al. 2006 Study on the association of two polymorphisms of the vitamin D receptor (VDR) gene with susceptibility to pulmonary tuberculosis (PTB) in the Chinese Tibetans. Sichuan Da Xue Xue Bao Yi Xue Ban 37: 847-851.

Clark, A. G., 2004 The role of haplotypes in candidate gene studies. Genet Epidemiol. 27: 321-333.

Comstock, G. W., 1982 Epidemiology of tuberculosis. Am.Rev.Respir.Dis. 125: 8-15.

Comstock, G. W., 1978 Tuberculosis in twins: a re-analysis of the Prophit survey. Am.Rev.Respir.Dis. 117: 621-624.

Cooke, G. S., S. J. Campbell, S. Bennett, C. Lienhardt, K. P. McAdam et al. 2008 Mapping of a novel susceptibility locus suggests a role for MC3R and CTSZ in human tuberculosis. Am.J.Respir.Crit Care Med. 178: 203-207.

Correa, P. A., L. M. Gomez, J. Cadena, and J. M. Anaya, 2005 Autoimmunity and tuberculosis. Opposite association with TNF polymorphism. J.Rheumatol. 32: 219-224.

Drennan, M. B., D. Nicolle, V. J. Quesniaux, M. Jacobs, N. Allie et al. 2004 Toll-like receptor 2-deficient mice succumb to Mycobacterium tuberculosis infection. Am J Pathol. 164: 49-57.

Dubos, R. J., 1952 Discussion on treatment of tuberculous meningitis and survival of bacilli in tuberculous lesions. Am Rev.Tuberc. 65: 637-640.

Ellinghaus, D., S. Schreiber, A. Franke, and M. Nothnagel, 2009 Current software for genotype imputation. Hum.Genomics 3: 371-380.

Epstein, M. P., W. L. Duren, and M. Boehnke, 2000 Improved inference of relationship for pairs of individuals. Am J Hum Genet 67: 1219-1231.

Etokebe, G. E., L. Bulat-Kardum, M. S. Johansen, J. Knezevic, S. Balen et al. 2006 Interferon-gamma gene (T874A and G2109A) polymorphisms are associated with microscopy-positive tuberculosis. Scand.J.Immunol. 63: 136-141.

Ewing, B., and P. Green, 1998 Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194.

Ewing, B., L. Hillier, M. C. Wendl, and P. Green, 1998 Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175-185.

Flores-Villanueva, P. O., J. A. Ruiz-Morales, C. H. Song, L. M. Flores, E. K. Jo et al. 2005 A functional promoter polymorphism in monocyte chemoattractant protein-1 is associated with increased susceptibility to pulmonary tuberculosis. J.Exp.Med. 202: 1649-1658.

Flynn, J. L., J. Chan, K. J. Triebold, D. K. Dalton, T. A. Stewart et al. 1993 An essential role for interferon gamma in resistance to Mycobacterium tuberculosis infection. J.Exp.Med. 178: 2249-2254.

Gomez, L. M., J. M. Anaya, J. R. Vilchez, J. Cadena, R. Hinojosa et al. 2007 A polymorphism in the inducible nitric oxide synthase gene is associated with tuberculosis. Tuberculosis (Edinb.) 87: 288-294.

Gomez, L. M., J. F. Camargo, J. Castiblanco, E. A. Ruiz-Narvaez, J. Cadena et al. 2006 Analysis of IL1B, TAP1, TAP2 and IKBL polymorphisms on susceptibility to tuberculosis. Tissue Antigens 67: 290-296.

Gordon, D., C. Abajian, and P. Green, 1998 Consed: a graphical tool for sequence finishing. Genome Res 8: 195-202.

Gray-McGuire, C., M. Boehnke, R. Goodloe, and R. C. Elston, 2009 Research: Genetic association tests: A method for the joint analysis of family and case-control data. Human Genomics 4.

Greenwood, C. M., T. M. Fujiwara, L. J. Boothroyd, M. A. Miller, D. Frappier et al. 2000 Linkage of tuberculosis to chromosome 2q35 loci, including NRAMP1, in a large aboriginal Canadian family. Am.J.Hum.Genet. 67: 405-416.

Guwatudde, D., M. Nakakeeto, E. C. Jones-Lopez, A. Maganda, A. Chiunda et al. 2003 Tuberculosis in household contacts of infectious cases in Kampala, Uganda. Am J Epidemiol. 158: 887-898.

Haukim, N., J. L. Bidwell, A. J. Smith, L. J. Keen, G. Gallagher et al. 2002 Cytokine gene polymorphism in human disease: on-line databases, supplement 2. Genes Immun. 3: 313-330.

Heldwein, K. A., M. D. Liang, T. K. Andresen, K. E. Thomas, A. M. Marty et al. 2003 TLR2 and TLR4 serve distinct roles in the host immune response against Mycobacterium bovis BCG. J Leukoc.Biol. 74: 277-286.

Horner, P. J., and F. M. Moss, 1991 Tuberculosis in HIV infection. Int.J.STD AIDS 2: 162-167.

The International HapMap Project. 2003 Nature 426: 789-796.

Jallow, M., Y. Y. Teo, K. S. Small, K. A. Rockett, P. Deloukas et al. 2009 Genome-wide and fine-resolution association analysis of malaria in West Africa. Nat.Genet.

Jamieson, S. E., E. N. Miller, G. F. Black, C. S. Peacock, H. J. Cordell et al. 2004 Evidence for a cluster of genes on chromosome 17q11-q21 controlling susceptibility to tuberculosis and leprosy in Brazilians. Genes Immun. 5: 46-57.

Kallmann, F. J., and D. Reisner, 1943 Twin studies on the significance of genetic factors in tuberculosis. American Review of Tuberculosis 549-571.

Kampmann, B., G. Tena-Coki, and S. Anderson, 2006 Blood tests for diagnosis of tuberculosis. Lancet 368: 282-283.

Kaufman, L., G. Yang, K. Hayashi, J. R. Ashby, L. Huang et al. 2007 The homophilic adhesion molecule sidekick-1 contributes to augmented podocyte aggregation in HIV-associated nephropathy. Federation of American Societies for Experimental Biology 21: 1367-1375.

Kaufman, L., K. Hayashi, M. J. Ross, M. D. Ross, and P. E. Klotman, 2004 Sidekick-1 is upregulated in glomeruli in HIV-associated nephropathy. J Am Soc.Nephrol. 15: 1721-1730.

Keane, J., S. Gershon, R. P. Wise, E. Mirabile-Levens, J. Kasznica et al. 2001 Tuberculosis associated with infliximab, a tumor necrosis factor alpha-neutralizing agent. N.Engl.J.Med. 345: 1098-1104.

Kramnik, I., W. F. Dietrich, P. Demant, and B. R. Bloom, 2000 Genetic control of resistance to experimental infection with virulent Mycobacterium tuberculosis. Proc.Natl.Acad.Sci.U.S.A 97: 8560-8565.

Kruglyak, L., M. J. Daly, M. P. Reeve-Daly, and E. S. Lander, 1996 Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58: 1347-1363.

Kusuhara, K., K. Yamamoto, K. Okada, Y. Mizuno, and T. Hara, 2007 Association of IL12RB1 polymorphisms with susceptibility to and severity of tuberculosis in Japanese: a gene-based association analysis of 21 candidate genes. Int.J.Immunogenet. 34: 35-44.

Ladel, C. H., C. Blum, A. Dreher, K. Reifenberg, M. Kopf et al. 1997 Lethal tuberculosis in interleukin-6-deficient mutant mice. Infect.Immun. 65: 4843-4849.

Leandro, A. C., M. A. Rocha, C. S. Cardoso, and M. G. Bonecini-Almeida, 2009 Genetic polymorphisms in vitamin D receptor, vitamin D-binding protein, Toll-like receptor 2, nitric oxide synthase 2, and interferon-gamma genes and its association with susceptibility to tuberculosis. Braz.J Med.Biol.Res 42: 312-322.

Leung, K. H., S. P. Yip, W. S. Wong, L. S. Yiu, K. K. Chan et al. 2007 Sex- and age-dependent association of SLC11A1 polymorphisms with tuberculosis in Chinese: a case control study. BMC.Infect.Dis. 7: 19.

Lewinsohn, D. A., S. Zalwango, C. M. Stein, H. Mayanja-Kizza, A. Okwera et al. 2008 Whole blood interferon-gamma responses to Mycobacterium tuberculosis antigens in young household contacts of persons with tuberculosis in Uganda. PLoS.One. 3: e3407.

Lewis, S. J., I. Baker, and G. Davey Smith, 2005 Meta-analysis of vitamin D receptor polymorphisms and pulmonary tuberculosis risk. Int.J.Tuberc.Lung Dis. 9: 1174-1177.

Li, Y., and G. R. Abecasis, 2006 Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet S79: 2290.

Li, C. M., S. J. Campbell, D. S. Kumararatne, R. Bellamy, C. Ruwende et al. 2002 Association of a polymorphism in the P2X7 gene with tuberculosis in a Gambian population. J Infect.Dis. 186: 1458-1462.

Li, H. T., T. T. Zhang, Y. Q. Zhou, Q. H. Huang, and J. Huang, 2006a SLC11A1 (formerly NRAMP1) gene polymorphisms and tuberculosis susceptibility: a meta-analysis. Int.J.Tuberc.Lung Dis. 10: 3-12.

Li, M., P. Atmaca-Sonmez, M. Othman, K. E. Branham, R. Khanna et al. 2006b CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age-related macular degeneration. Nat.Genet. 38: 1049-1054.

Li, Y., C. Willer, S. Sanna, and G. Abecasis, 2009 Genotype imputation. Annu.Rev.Genomics Hum Genet 10: 387-406.

Liu, W., W. C. Cao, C. Y. Zhang, L. Tian, X. M. Wu et al. 2004 VDR and NRAMP1 gene polymorphisms in susceptibility to pulmonary tuberculosis among the Chinese Han population: a case-control study. Int.J.Tuberc.Lung Dis. 8: 428-434.

Lopez-Maderuelo, D., F. Arnalich, R. Serantes, A. Gonzalez, R. Codoceo et al. 2003a Interferon-gamma and interleukin-10 gene polymorphisms in pulmonary tuberculosis. Am.J.Respir.Crit Care Med. 167: 970-975.

Louie, E., L. B. Rice, and R. S. Holzman, 1986 Tuberculosis in non-Haitian patients with acquired immunodeficiency syndrome. Chest 90: 542-545.

Ma, X., Y. Liu, B. B. Gowen, E. A. Graviss, A. G. Clark et al. 2007 Full-exon resequencing reveals toll-like receptor variants contribute to human susceptibility to tuberculosis disease. PLoS.One. 2: e1318.

MacMicking, J. D., R. J. North, R. LaCourse, J. S. Mudgett, S. K. Shah et al. 1997 Identification of nitric oxide synthase as a protective locus against tuberculosis. Proc.Natl.Acad.Sci.U.S.A 94: 5243-5248.

Mahasirimongkol, S., H. Yanai, N. Nishida, C. Ridruechai, I. Matsushita et al. 2009 Genome-wide SNP-based linkage analysis of tuberculosis in Thais. Genes Immun. 10: 77-83.

Malik, S., L. Abel, H. Tooker, A. Poon, L. Simkin et al. 2005 Alleles of the NRAMP1 gene are risk factors for pediatric tuberculosis disease. Proc.Natl.Acad.Sci.U.S.A 102: 12183-12188.

Mangin, B., B. Goffinet, and A. Rebai, 1994 Constructing confidence intervals for QTL location. Genetics 138: 1301-1308.

Marchini, J., B. Howie, S. Myers, G. McVean, and P. Donnelly, 2007 A new multipoint method for genome-wide association studies by imputation of genotypes. Nat.Genet. 39: 906-913.

Miller, E. N., S. E. Jamieson, C. Joberty, M. Fakiola, D. Hudson et al. 2004 Genome-wide scans for leprosy and tuberculosis susceptibility genes in Brazilians. Genes Immun. 5: 63-67.

Möller, M., E. de Wit, and E. G. Hoal, 2010a Past, present and future directions in human genetic susceptibility to tuberculosis. FEMS Immunol.Med.Microbiol. 58: 3-26.

Möller, M., F. Flachsbart, A. Till, T. Thye, R. D. Horstmann et al. 2010b A functional haplotype in the 3'untranslated region of TNFRSF1B is associated with tuberculosis in two African populations. Am J Respir.Crit Care Med. 181: 388-393.

Moran, A., X. Ma, R. A. Reich, and E. A. Graviss, 2007 No association between the +874T/A single nucleotide polymorphism in the IFN-gamma gene and susceptibility to TB. Int.J.Tuberc.Lung Dis. 11: 113-115.

Murray, C. J., K. Styblo, and A. Rouillon, 1990 Tuberculosis in developing countries: burden, intervention and cost. Bull.Int.Union Tuberc.Lung Dis. 65: 6-24.

Nino-Moreno, P., D. Portales-Perez, B. Hernandez-Castro, L. Portales-Cervantes, V. Flores-Meraz et al. 2007 P2X7 and NRAMP1/SLC11A1 gene polymorphisms in Mexican mestizo patients with pulmonary tuberculosis. Clin.Exp.Immunol. 148: 469-477.

Nothnagel, M., D. Ellinghaus, S. Schreiber, M. Krawczak, and A. Franke, 2009 A comprehensive evaluation of SNP genotype imputation. Hum Genet 125: 163-171.

Ogus, A. C., B. Yoldas, T. Ozdemir, A. Uguz, S. Olcen et al. 2004 The Arg753Gln polymorphism of the human toll-like receptor 2 gene in tuberculosis disease. Eur.Respir.J. 23: 219-223.

Oral, H. B., F. Budak, E. K. Uzaslan, B. Basturk, A. Bekar et al. 2006 Interleukin-10 (IL-10) gene polymorphism as a potential host susceptibility factor in tuberculosis. Cytokine 35: 143-147.

Pacheco, A. G., C. C. Cardoso, and M. O. Moraes, 2008 IFNG +874T/A, IL10 -1082G/A and TNF -308G/A polymorphisms in association with tuberculosis susceptibility: a meta-analysis study. Hum Genet 123: 477-484.

Pan, H., B. S. Yan, M. Rojas, Y. V. Shebzukhov, H. Zhou et al. 2005 Ipr1 gene mediates innate immunity to tuberculosis. Nature 434: 767-772.

Pei, Y. F., J. Li, L. Zhang, C. J. Papasian, and H. W. Deng, 2008 Analyses and comparison of accuracy of different genotype imputation methods. PLoS.One. 3: e3551.

Pitchenik, A. E., C. Cole, B. W. Russell, M. A. Fischl, T. J. Spira et al. 1984 Tuberculosis, atypical mycobacteriosis, and the acquired immunodeficiency syndrome among Haitian and non-Haitian patients in south Florida. Ann.Intern.Med. 101: 641-645.

Pritchard, J. K., M. Stephens, and P. Donnelly, 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945-959.

Pruitt, K. D., T. Tatusova, and D. R. Maglott, 2007 NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61-D65.

Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira et al. 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.

Qu, Y., Y. Tang, D. Cao, F. Wu, J. Liu et al. 2007 Genetic polymorphisms in alveolar macrophage response-related genes, and risk of silicosis and pulmonary tuberculosis in Chinese iron miners. Int.J.Hyg.Environ.Health 210: 679-689.

Reiling, N., C. Holscher, A. Fehrenbach, S. Kroger, C. J. Kirschning et al. 2002 Cutting edge: Toll-like receptor (TLR)2- and TLR4-mediated pathogen recognition in resistance to airborne infection with Mycobacterium tuberculosis. J Immunol. 169: 3480-3484.

Rieder, H. L., 1999 Epidemiologic Basis of Tuberculosis Control. International Union Against Tuberculosis and Lung Disease, Paris.

Roeder, K., S. A. Bacanu, L. Wasserman, and B. Devlin, 2006 Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet 78: 243-252.

Rossouw, M., H. J. Nel, G. S. Cooke, P. D. van Helden, and E. G. Hoal, 2003 Association between tuberculosis and a polymorphic NFkappaB binding site in the interferon gamma gene. Lancet 361: 1871-1872.

Roth, D. E., G. Soto, F. Arenas, C. T. Bautista, J. Ortiz et al. 2004 Association between vitamin D receptor gene polymorphisms and response to treatment of pulmonary tuberculosis. J.Infect.Dis. 190: 920-927.

S.A.G.E. [2010] Statistical Analysis for Genetic Epidemiology, Release 6.1.0: http://darwin.cwru.edu/

SAS Institute. The Mixed Procedure. SAS/STAT User's Guide, Version 913. Cary, NC: SAS Institute.

Schaid, D. J., 2004 Evaluating associations of haplotypes with traits. Genet Epidemiol. 27: 348-364.

SeattleSNPs Program for Genomic Applications (PGA). Genome Variation Server. 2009.

Selvaraj, P., P. R. Narayanan, and A. M. Reetha, 2000 Association of vitamin D receptor genotypes with the susceptibility to pulmonary tuberculosis in female patients & resistance in female contacts. Indian J.Med.Res. 111: 172-179.

Servin, B., and M. Stephens, 2007 Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS.Genet. 3: e114.

Shaw, M. A., A. Collins, C. S. Peacock, E. N. Miller, G. F. Black et al. 1997 Evidence that genetic susceptibility to Mycobacterium tuberculosis in a Brazilian population is under oligogenic control: linkage study of the candidate genes NRAMP1 and TNFA. Tuber.Lung Dis. 78: 35-45.

Sheline, K. D., A. M. France, S. Talarico, B. Foxman, L. Zhang et al. 2009 Does the lipR gene of tubercle bacilli have a role in tuberculosis transmission and pathogenesis? Tuberculosis (Edinb.) 89: 114-119.

Simonds, B., 1963 Tuberculosis in twins. Pitman Medical, London.

Skamene, E., E. Schurr, and P. Gros, 1998 Infection genomics: Nramp1 as a major determinant of natural resistance to intracellular infections. Annu.Rev.Med. 49: 275-287.

Stead, W. W., J. W. Senner, W. T. Reddick, and J. P. Lofgren, 1990 Racial differences in susceptibility to infection by Mycobacterium tuberculosis. N.Engl.J Med. 322: 422-427.

Stein, C. M., and W. H. Boom, 2009 personal communication.

Stein, C. M., D. Guwatudde, M. Nakakeeto, P. Peters, R. C. Elston et al. 2003 Heritability analysis of cytokines as intermediate phenotypes of tuberculosis. J.Infect.Dis. 187: 1679-1685.

Stein, C. M., L. Nshuti, A. B. Chiunda, W. H. Boom, R. C. Elston et al. 2005 Evidence for a major gene influence on tumor necrosis factor-alpha expression in tuberculosis: path and segregation analysis. Hum.Hered. 60: 109-118.

Stein, C. M., S. Zalwango, A. B. Chiunda, C. Millard, D. V. Leontiev et al. 2007 Linkage and association analysis of candidate genes for TB and TNFalpha cytokine expression: evidence for association with IFNGR1, IL-10, and TNF receptor 1 genes. Hum.Genet. 121: 663-673.

Stein, C. M., S. Zalwango, L. L. Malone, S. Won, H. Mayanja-Kizza et al. 2008 Genome scan of M. tuberculosis infection and disease in Ugandans. PLoS.One. 3: e4094.

Sunderam, G., R. J. McDonald, T. Maniatis, J. Oleske, R. Kapila et al. 1986 Tuberculosis as a manifestation of the acquired immunodeficiency syndrome (AIDS). JAMA 256: 362-366.

Szeszko, J. S., B. Healy, H. Stevens, Y. Balabanova, F. Drobniewski et al. 2007 Resequencing and association analysis of the SP110 gene in adult pulmonary tuberculosis. Hum Genet 121: 155-160.

Thuong, N. T., T. R. Hawn, G. E. Thwaites, T. T. Chau, N. T. Lan et al. 2007 A polymorphism in human TLR2 is associated with increased susceptibility to tuberculous meningitis. Genes Immun. 8: 422-428.

Thye, T., E. N. Browne, M. A. Chinbuah, J. Gyapong, I. Osei et al. 2009a IL10 haplotype associated with tuberculin skin test response but not with pulmonary TB. PLoS.One. 4: e5420.

Thye, T., E. N. Browne, M. A. Chinbuah, J. Gyapong, I. Osei et al. 2006 No associations of human pulmonary tuberculosis with Sp110 variants. J Med.Genet 43: e32.

Thye, T., S. Nejentsev, C. D. Intemann, E. N. Browne, M. A. Chinbuah et al. 2009b MCP-1 promoter variant -362C associated with protection from pulmonary tuberculosis in Ghana, West Africa. Hum Mol.Genet 18: 381-388.

Tosh, K., S. J. Campbell, K. Fielding, J. Sillah, B. Bah et al. 2006 Variants in the SP110 gene are associated with genetic susceptibility to tuberculosis in West Africa. Proc.Natl.Acad.Sci.U.S.A 103: 10364-10368.

Treatment of tuberculosis and tuberculosis infection in adults and children. American Thoracic Society. 1994 Monaldi Arch.Chest Dis. 49: 327-345.

van Crevel, R., T. H. Ottenhoff, and J. W. van der Meer, 2002a Innate immunity to Mycobacterium tuberculosis. Clin.Microbiol.Rev. 15: 294-309.

Wiart, A., A. Jepson, W. Banya, S. Bennett, H. Whittle et al. 2004 Quantitative association tests of immune responses to antigens of Mycobacterium tuberculosis: a study of twins in West Africa. Twin.Res. 7: 578-588.

Wilkinson, R. J., P. Patel, M. Llewelyn, C. S. Hirsch, G. Pasvol et al. 1999 Influence of polymorphism in the genes for the interleukin (IL)-1 receptor antagonist and IL-1beta on tuberculosis. J.Exp.Med. 189: 1863-1874.

Wolfinger, R., and M. O'Connell, 1993 Journal of Statistical Computation and Simulation 48: 233-243.

World Health Organization, 2009 Global tuberculosis control: epidemiology, strategy, financing: WHO report 2009. World Health Organization, Geneva.

Xing, C., C. Gray-McGuire, J. A. Kelly, P. Garriott, H. Bukulmez et al. 2005 Genetic linkage of systemic lupus erythematosus to 13q32 in African American families with affected male members. Hum.Genet. 118: 309-321.

Yim, J. J., H. W. Lee, H. S. Lee, Y. W. Kim, S. K. Han et al. 2006 The association between microsatellite polymorphisms in intron II of the human Toll-like receptor 2 gene and tuberculosis among Koreans. Genes Immun. 7: 150-155.
