EXPLORING TUBERCULOSIS GENETICS: RESISTANCE TO INFECTION, PROGRESSION TO ACTIVE DISEASE, HOST GENETICS AND MYCOBACTERIUM TUBERCULOSIS LINEAGES WITHIN A HOUSEHOLD CONTACT STUDY IN KAMPALA, UGANDA.

by

NOÉMI BORSAY HALL

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Epidemiology and Biostatistics CASE WESTERN RESERVE UNIVERSITY

May 2016

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

Noémi Borsay Hall

candidate for the degree of Doctor of Philosophy*

Committee Chair Catherine M. Stein, Ph.D.

Committee Member W. Henry Boom, M.D.

Committee Member Rob P. Igo, Jr., Ph.D.

Committee Member Nathan J. Morris, Ph.D.

Date of Defense March 16, 2016

*We also certify that written approval has been obtained for any proprietary material contained therein

Dedication

This work is dedicated to my family, especially Rev. Dr. Daniel J. Borsay, Dr. Rudy Almasy, the late Dr. A. Wahab Khair and the late Rev. Dr. Laszlo A. Borsay.

Emma Rose- have you started thinking about your dissertation topic yet?


Table of Contents

Dedication
Table of Contents
List of Tables
List of Figures
Acknowledgements
List of Commonly Used Abbreviations
Chapter 1: Introduction
    Specific Aims
        Specific Aim 1
        Specific Aim 2
        Specific Aim 3
Chapter 2: Background and Literature Review
    2.1 Epidemiology of TB
    2.2 The RSTR Phenotype
    2.3 Genetic Associations with TB
    2.4 Genetic Associations with RSTR
    2.5 SNP Annotation Databases
    2.6 Mycobacterium tuberculosis and host immune response
    2.7 Mycobacterium tuberculosis Lineage
Chapter 3: Candidate Analysis
    3.1 Introduction
    3.2 Study population
    3.3 Candidate gene genotyping
    3.4 Statistical Analysis
    3.5 Validation Analysis
    3.6 Initial Results
        Genetic association with TB
        Genetic association with RSTR
    3.7 Validation Results
        Genetic association with TB
        Genetic association with RSTR
    3.8 Meta-Analysis
        Meta-Analysis with TB
        Meta-Analysis with RSTR
    3.9 Discussion
Chapter 4: RSTR and Annotated SNP Selection
    4.1 Introduction
    4.2 Study population
    4.3 SNP Selection
    4.4 Statistical Analysis
    4.5 Results
    4.6 Discussion
    4.7 Gene Level Analysis
    4.8 Results
    4.9 Conclusions
Chapter 5: SEM and IFNγ Immune Response
    5.1 Introduction
    5.2 Study population
    5.3 Human Genotyping
    5.4 Immunological Data
    5.5 Mtb Genotyping
    5.6 Structural Equation Modeling
    5.7 Statistical Analysis
    5.8 Structural Equation Modeling Results
    5.9 Discussion
Chapter 6: Discussion and Conclusions
Appendix A
Bibliography


List of Tables

Table 1. IFNγ Response and Mtb Lineage Preliminary Data
Table 2. Illumina 10k Initial Analysis and Omni5 Validation Sample characteristics
Table 3. Results of genetic association analysis validation of TB phenotype
Table 4. Results of genetic association analysis validation of RSTR phenotype
Table 5. Comparison of Illumina 10k Initial Analysis and Omni5 Validation Sample characteristics, by phenotype
Table 6. Comparison of Age by phenotype, within Illumina 10k Initial Analysis and Omni5 Validation Sample
Table 7. Annotated SNP Selection Summary
Table 8. Annotated SNPs Associated with RSTR, p-value < 0.05
Table 9. Gene Level Analysis, including gene region ± 10kb
Table 10. Gene Level Analysis, including gene region ± 0kb
Table 11. Sample Demographics
Table 12. SNPs from of Interest included in Analysis
Table 13. Results of Structural Equation Modeling with Illumina10k and Omni5 Data

List of Figures

Figure 1. TB Pathogenesis
Figure 2. Hypothesized biological model
Figure 3. Annotated SNP Selection Process
Figure 4. Proposed modeling of IFNγ immune response, with antigen-induced IFNγ measures, Mtb lineage and human genetic information through SNPs, with corresponding equations
Figure 5. Final Model of IFNγ immune response, with antigen-induced IFNγ measures, Mtb lineage and human genetic information through SNPs, with corresponding equations


Acknowledgements

I would like to acknowledge and thank the members of my dissertation committee, Dr. Catherine M. Stein (Chair and Research and Academic Advisor), Dr. W. Henry Boom, M.D. (Director of the TBRU), Dr. Rob P. Igo, and Dr. Nathan Morris, for questioning me, challenging me, and helping me grow as a researcher. I appreciate the time you have set aside from your own work to help me complete my own. I especially thank Cathy for her continual understanding, careful reading of all my drafts, and for giving me the push I needed to complete my dissertation. Special thanks also go to Rob for sharing his R wizardry and for helping me find the peskiest bugs in my code.

I appreciate the guidance of Dr. Will Bush, who introduced me to SQL and SNP annotation, and helped to keep me accountable by checking on my progress when I needed it. Without Alberto Santana’s help in navigating Latitude, updating my software, and letting me borrow hardware, I would be technically helpless. Also, Dr. Rajeev Mehlotra’s support has been invaluable. Thank you for welcoming me into your family.

Thank you to all those at the TBRU who have made this analysis possible and this data available. I thank all the participants and their families for giving of themselves to further scientific research. A big thanks also goes out to LaShaunda Malone, data-manager extraordinaire.

I need to express my great appreciation for my fellow pre- and post-docs here in GenEpi. From helping me stay afloat with ice cream and cake, letting me bounce ideas off of them, and keeping me out of the Case Security Alerts, I am so grateful for their support, especially during the most trying times. I am glad that nobody was crushed by my cubicle walls. Who knew a tiny strip of plastic was the key to their structural integrity?

Finally, I thank my family. The family I was born into, my parents and my sister Judit, as well as the family that I am so lucky to have created with my handsome husband, Kevin, and my little proposal-crasher, Emma Rose. Without their support, I would not be following my dreams. I am proud to carry on the Borsay family legacy, and I am so lucky to have parents who have always encouraged me to focus on my studies.

Köszönöm szépen, drága Anyu és Apu!

Emma Rose, thank you for inspiring me and reminding me that every day is a good day to learn something new. Saving the best for last, I can never thank Kevin enough. Thank you for all of your patience, kindness, helpfulness, and your boundless love and support as I worked my way to completing this dissertation. Without you, I would be living in a van down by the river.


List of Commonly Used Abbreviations

ABL       Abelson
AG85B     Antigen 85 B
AIDS      Acquired immune deficiency syndrome
BCG       Bacille Calmette-Guérin (vaccine)
CAS       Central Asian lineage
CD4       Cluster of differentiation 4
COLEC10   Collectin liver 10
CXFT      Culture filtrate
DNA       Deoxyribonucleic acid
EA        Euro-American lineage
EAU       Euro-American, Ugandan sub-lineage
ENCODE    Encyclopedia of DNA Elements
ESAT6     Early secretory antigen 6
FANTOM5   Functional Annotation of the Mammalian Genome
GEE       Generalized estimating equations
GVS       Genome variation server
GWAS      Genome wide association study
IL        Interleukin
HDAC      Histone deacetylase
HIV       Human immunodeficiency virus
HLA       Human leukocyte antigen
IFNγ      Interferon-γ
IGRA      Interferon-γ release assay
kb        Kilobasepairs
KC        Kawempe Community Health Study
LCN12     Lipocalin 12
LD        Linkage disequilibrium
LWK       Luhya in Webuye, Kenya
MAF       Minor allele frequency
MDR       Multi-drug resistant
MHC       Major histocompatibility complex
MKK       Maasai in Kinyawa, Kenya
MSMD      Mendelian Susceptibility to Mycobacterial Disease
Mtb       Mycobacterium tuberculosis
NOD       Nucleotide oligomerization domain
NRAMP     Natural resistance-associated macrophage protein
NTM       Neurotrimin
PBMC      Peripheral blood mononuclear cell
PPD       Purified protein derivative of tuberculin
PTST-     Persistent tuberculin skin test negative
RNA       Ribonucleic acid
RSTR      Resister
SEM       Structural equation model
SKAT      SNP-set Kernel Association Test
SLC       Solute Carrier
SNP       Single nucleotide polymorphism
STAT1     Signal transducer and activator of transcription 1
STRUM     Structural Equation Modeling (a program in R to conduct SEM analysis)
TB        Tuberculosis
TBRU      Tuberculosis Research Unit
TICAM2    Toll-Like Receptor Adaptor Molecule 2
TIRAP     Toll-Interleukin 1 Receptor Domain Containing Adaptor Protein
TLR       Toll-like receptor
TNF       Tumor-necrosis factor
TOLLIP    Toll interacting protein
TST       Tuberculin skin test
VIT       Vitrin
WTCCC     Wellcome Trust Case Control Consortium
YRI       Yoruba in Ibadan, Nigeria

Exploring Tuberculosis Genetics: Resistance to infection, progression to active disease, host genetics and Mycobacterium tuberculosis lineages within a household contact study in Kampala, Uganda.

Abstract by

NOÉMI BORSAY HALL

Pulmonary tuberculosis disease (TB), caused by infection with Mycobacterium tuberculosis (Mtb), remains a major public health threat globally, with a high burden in Sub-Saharan Africa. In this study, we explored the stages of TB pathogenesis with a focus on two groups of individuals: those who have active TB disease and those who remain uninfected despite prolonged exposure to an index case with active TB disease. These latter individuals, who appear to persistently resist Mtb infection, are referred to as resisters (RSTR). Using data collected through the Kawempe Community Health Study in Kampala, Uganda, a household contact study with a two-year period of follow-up, we conducted regression analyses focusing on these two outcomes of interest, as well as structural equation modeling of the interferon gamma (IFNγ) immune response. First, both TB and RSTR were tested for association with 546 haplotype tagging SNPs representing 29 candidate genes. The RSTR phenotype was then the focus of an annotated SNP-based regression analysis of 305 SNPs within 17 genes shown to be differentially expressed between RSTRs and those with a positive tuberculin skin test. We then used structural equation modeling to incorporate human genetics, Mtb lineage, and IFNγ response to three different Mtb antigens in modeling the latent variable, IFNγ immune response. Through these analyses, we were able to detect associations with TB disease in IL1B, TICAM2, NOD1 and NOD2. Interestingly, SNPs within NOD1, NOD2, and TICAM2 also showed association with the RSTR phenotype. The RSTR-focused annotated SNP analysis also identified suggestive associations within the COLEC10 and HDAC4 genes. COLEC10, HDAC4 and NOD1 had not previously been shown to be associated with susceptibility to TB disease or with resistance to infection with Mtb. The NOD1 association is novel, both with TB disease and with RSTR, and is therefore of special interest. Future genetic association analysis should focus on these novel associations with RSTR and TB, in order to develop a more effective vaccine and identify drug targets for faster-acting, effective TB treatments.

Chapter 1: Introduction

Tuberculosis, along with HIV/AIDS and malaria, is one of the top three most devastating infectious diseases globally. Pulmonary tuberculosis disease (TB), caused by infection with Mycobacterium tuberculosis (Mtb), remains a major public health threat globally, with a high burden in Sub-Saharan Africa. According to the World Health Organization, Uganda’s TB incidence rate in 2014 was 161 per 100,000 people, compared to 3.1 per 100,000 in the United States.

Here at Case Western Reserve University, the goal of the Tuberculosis Research Unit (TBRU) is to “work internationally to identify new approaches to prevention, diagnosis, and treatment of TB in areas of the world where it is most common.” One of the ways in which the TBRU identifies the factors that influence TB disease and Mtb infection is through its household contact studies in Kampala, Uganda: the Household Contact Study (1995-2000) and the Kawempe Community Health Study (2002-2014). To date, clinical data and biological samples have been collected from over 3,800 enrolled individuals. In these household contact studies, individuals who presented at the study clinic with active TB disease were enrolled as index cases. All household members of the index case who provided informed consent were also enrolled and evaluated at study entry with a TST, HIV testing, a chest X-ray, and a physical exam for signs and symptoms of TB disease. Household contacts enrolled in the study received a follow-up exam every three months for the initial six months, then every six months for the next two years. The availability of these longitudinal household-level data has allowed us to explore human genetic associations with disease phenotypes of interest, as well as the genetic variability of Mycobacterium tuberculosis and the human immune response.

In the following analyses, I focused on two groups of individuals: those who have active TB disease and those who remain uninfected despite prolonged exposure to an index case with active TB disease. These latter individuals have a persistently negative TST (PTST-) result over an extended period of exposure, indicating resistance to Mtb infection, and are referred to as resisters (RSTR). This phenotype is especially of interest as studying these individuals and their associated genetic variations may reveal meaningful differences relating to their immune response to Mtb infection.

Specific Aims

Specific Aim 1:

To examine the role of 29 candidate genes previously implicated in TB in the Ugandan sample, focusing on TB disease and RSTR.

a) Analyze association between 546 haplotype tagging SNPs (Tag SNPs), chosen to represent 29 candidate genes involved in innate immunity, and two phenotypes of interest, TB disease and RSTR, within a sample of 835 individuals from the Kawempe Community Health Study.

b) Validate results of the analysis of candidate genes. This validation will be completed using data obtained from the Omni5 microarray, a 5 million SNP panel, using an independent sample of 483 individuals from the same KC Study population. This sample is independent from the sample used in Aim 1a.

Specific Aim 2:

Conduct a genetic association study using selected SNPs from the Omni5 microarray and 483 individuals from the KC Study population, comparing those who are RSTR vs. non-RSTR, including those who have TB disease and those who are TST+. SNPs within the region of genes of interest will be selected for inclusion in the analysis after being identified as functional in relevant immune response associated cell types: macrophages and CD4+ T cells. This analysis will focus on genes identified in previous linear neural network analysis and gene set enrichment analysis, such as vitrin (VIT), collectin sub-family member 10 (COLEC10), lipocalin-12 (LCN12), and genes in the histone deacetylase (HDAC) family. These genes have been identified in previous research to be differentially expressed in macrophages between TST+ individuals and RSTR individuals.

Specific Aim 3:

Use structural equation modeling to model the relationships between Mtb lineages (Central Asian, Euro-American Non-Ugandan, and Euro-American Ugandan); IFNγ response to three different Mtb antigens (Mtb culture filtrate containing multiple antigens (CXFT), antigen-85B (AG85B), and early secretory antigen-6 (ESAT6)); and human genetic polymorphisms associated with IFNγ production.

Chapter 2: Background and Literature Review

2.1 Epidemiology of TB

TB is an ancient disease, with the first known evidence of human Mtb found in 9,000-year-old remains in a Neolithic village within present-day Israel (Spigelman et al. 2015). As previously noted, TB remains a global public health burden today. In 2014, according to the World Health Organization, 9.6 million people developed TB disease worldwide, with an estimated 25% of these new cases occurring in Africa (WHO 2015). About 1.5 million people died of the disease worldwide in 2014 (WHO 2015). In recent years, multi-drug resistant TB (MDR TB) has received most of the media attention relating to TB disease. Though MDR TB contributes to the overall burden of disease, it is not the leading cause of pulmonary TB. Those with MDR TB acquire the disease either through improper adherence to the recommended first-line drug regimen or from an infectious individual with MDR TB, and in 2014, MDR TB accounted for only approximately 1.4% of total incident TB cases and 12% of the retreatment cases in Uganda. In the following analyses, we have focused on the more prevalent form of the disease: TB that is susceptible to, and responds to, first-line drugs such as isoniazid and rifampicin.

There are two stages of TB pathogenesis, as shown in Figure 1: initial infection and progression to active disease. Once an uninfected individual is exposed to Mtb, they may become infected by the mycobacterium. Although there is no gold standard for diagnosis of Mtb infection in the absence of disease, both the tuberculin skin test (TST) and the recently developed interferon gamma release assay (IGRA) provide evidence of immune sensitization to Mtb and are utilized to identify Mtb-infected individuals. Only about 10% of healthy adults with evidence of Mtb infection develop active TB disease; the rest remain latently infected. Notably, previous studies using the TST as a marker for Mtb infection have found that 10-15% of individuals exposed within their own household to TB remain TST- and therefore presumably uninfected (Stein et al. 2008, Mahan et al. 2012, Ma et al. 2014).

Figure 1. TB Pathogenesis

Studies have shown that 10-15% of individuals exposed within their own household to TB remain TST- and therefore presumably uninfected (Stein et al. 2008, Mahan et al. 2012, Ma et al. 2014). Approximately 5-10% of healthy adults with evidence of Mtb infection develop active TB disease.


The gold standard of TB disease diagnosis is growth of Mtb in culture, which can be performed using sputum collected from an individual suspected to have TB disease. The sensitivity of culture is 80-85%, with a specificity of about 98% (ATC 2000). This criterion, culture-confirmed TB disease, was used to diagnose TB disease in the current study. Another method of TB diagnosis is the sputum smear test, utilizing sputum collected from a productive cough. It is the easiest and quickest way to detect acid-fast bacilli in stained smears through microscopy. However, it is estimated that only 50% to 80% of patients with pulmonary TB disease have positive smears (ATC 2000).

Though there is no gold standard for diagnosing infection with Mtb, either the TST or IGRA is used to measure immune responses and diagnose Mtb infection in those individuals without TB disease. The TST is the most widely used and, for nearly a century until 2001, was the only method available for identifying Mtb infection. When administering a TST, an individual is injected intradermally with the purified protein derivative. The resulting skin induration should be assessed between 48 and 72 hours after injection, when the reaction peaks. This is a disadvantage, as it requires either the individual to make a return trip to the clinic where the TST was initially administered, or the clinic workers themselves to track down the tested individual within the given time period. A TST is defined as positive if the induration at the injection site is greater than 5 mm for children or those with an HIV+ diagnosis, and greater than 10 mm for all others. TST sensitivity ranges from 89% to 95%, and specificity ranges from 85% to 99% (ATC 2000, CDC 2010). In the KC Study, the TST is the method used to identify Mtb infection.
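As an illustration only, the positivity rule described above can be written as a short function. This is a minimal sketch that follows the wording of the text (strictly greater than the cutoff); the function and parameter names are hypothetical and are not taken from the KC Study data dictionary or analysis code.

```python
def tst_positive(induration_mm: float,
                 is_child: bool = False,
                 hiv_positive: bool = False) -> bool:
    """Classify a TST reading using the cutoffs stated in the text.

    Assumption: the cutoff is applied as 'greater than', following the
    wording above (5 mm for children or HIV+ individuals, 10 mm otherwise).
    """
    cutoff_mm = 5.0 if (is_child or hiv_positive) else 10.0
    return induration_mm > cutoff_mm


# Hypothetical readings, for illustration only.
child_result = tst_positive(7.0, is_child=True)   # uses the 5 mm cutoff
adult_result = tst_positive(7.0)                  # uses the 10 mm cutoff
```

The same 7 mm induration is classified differently depending on age and HIV status, which is why both covariates must be known before a TST result can be interpreted.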

The IGRA first became commercially available in 2001, with three FDA-approved tests currently on the market. For example, the IGRA marketed as the QuantiFERON-TB Gold In-Tube test (QFT-GIT) identifies Mtb-infected individuals by measuring the IFNγ production following incubation of fresh whole blood with antigens such as ESAT-6 and CFP-10 (CDC 2010). This IFNγ production minus the IFNγ production of a control sample (which contains no Mtb-specific antigens) provides a measure of TB response, serving as evidence of immune sensitization to Mtb, and indicating the presence of an Mtb infection. Use of the IGRA does not require multiple clinic visits as the TST does, but a fresh blood specimen is required. Appropriate lab facilities, such as refrigeration, and trained lab technicians are also needed to run the test correctly. IGRA sensitivity ranges from 81% to 94%, with specificity ranging from 88% to 99% (CDC 2010).
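The background-corrected measure described above is a simple subtraction, sketched here for clarity. The function names are hypothetical, and the positivity cutoff is deliberately left as a caller-supplied argument because no cutoff is stated in the text; this is an illustration of the arithmetic, not the manufacturer's full interpretation algorithm.

```python
def net_ifng_response(antigen_ifng: float, control_ifng: float) -> float:
    """IFN-gamma in the antigen-stimulated sample minus the control sample
    (the control contains no Mtb-specific antigens)."""
    return antigen_ifng - control_ifng


def igra_positive(antigen_ifng: float, control_ifng: float,
                  cutoff: float) -> bool:
    """Assumption: a result is called positive when the net response
    reaches a caller-supplied cutoff in the same units as the inputs."""
    return net_ifng_response(antigen_ifng, control_ifng) >= cutoff


# Hypothetical values, for illustration only.
net = net_ifng_response(antigen_ifng=1.25, control_ifng=0.25)
```

Subtracting the control value removes background IFNγ production that is unrelated to Mtb-specific antigens, so that only the antigen-induced response is compared against the cutoff.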

Individuals are administered a TST or an IGRA when they are suspected of having been infected with Mtb through contact with an infectious TB case. A positive TST or IGRA indicates that the individual’s immune system has come in contact with Mtb previously, and can therefore recognize the bacterium and mount an appropriate immune response, which, in the case of the TST, causes the skin induration. The individual with a positive test result can then start a preventive therapy designed to prevent progression of infection to disease. It is this progression to disease, and the process of resisting Mtb infection in the first place, that is of interest in our research, and further study of these phenotypes and their genetic associations may lead to advances in the treatment of TB disease and the prevention of Mtb infection.

Though a great deal of work has been done on human genetic susceptibility to TB disease, less has been done to examine human genetic influences on Mtb infection and resistance. In order to properly identify genes or genetic regions associated with resistance to TB, we must be able to accurately distinguish between those individuals who do not have Mtb infections because they are resistant to infection, and those who do not have Mtb infections because they have never been exposed. The RSTR phenotype itself implies the presence of innate immunity possibly following from a genetic difference within those individuals who are exposed to, yet continue to resist infection with Mtb (or at the very least, evade detection of infection through TST or IGRA).

However, this intriguing phenotype can be difficult to ascertain. Some individuals consistently exposed to Mtb have a negative TST, and therefore show no evidence of Mtb living within their bodies. Ascertainment of this phenotype, therefore, hinges on reliable detection of infection, such as through a TST or IGRA. Most studies have focused on TB disease as the phenotype of interest and have not included TST results in the characterization of non-diseased individuals (Stein 2011). Therefore, there is no quantification of the unaffected subject’s infection with Mtb as measured by TST or IGRA; instead, it is merely noted that the control is “healthy”.

The data used in this analysis were gathered from household contact studies which were uniquely designed to provide a wealth of information for genetic association studies of infectious diseases such as ours (Stein et al. 2013). Using the household contact study design to study genetic susceptibility to TB enables us to characterize the two stages of Mtb infection in the household setting among enrolled related and unrelated individuals. In this manner, individuals are captured within a wide range of ages and within all stages of infection: from uninfected with Mtb, to infected with Mtb without signs of disease, to TB disease. Data collection incorporating both environmental and host genetic information is a boon to the study of TB, as the disease itself is a result of a gene by environment interaction, involving exposure to Mtb along with genetic susceptibility to disease. The household contact study design offers an ideal way to collect and study all relevant information through extensive collection of clinical and epidemiological data over time.

2.2 The RSTR Phenotype

Our use of data from a longitudinal household contact study provides the opportunity to collect information about the exposure of enrolled individuals to those with active TB disease and Mtb infection in their homes and to collect TST results at baseline and at subsequent study visits (Stein et al. 2013). Using data from the KC study, we are able to assess the RSTR phenotype using available follow-up data to confirm a subject’s continued TST negativity despite known household exposure. The overall prevalence of RSTR in the KC study was found to be 9% (Ma et al. 2014). Though some may argue that the difference between those who are TST+ and those who are RSTR is simply degree of exposure to those with infectious TB disease, recent analysis of the KC study has shown that a risk score, reflecting exposure intensity, did not differ significantly between those who are RSTR, TST+, or TST converters (Ma et al. 2014).

The RSTR phenotype as we define it is therefore interesting, not only in that the individual does not appear to be infected by Mtb, but in that the individual resists Mtb infection over a two-year period, despite prolonged exposure to an individual with active TB disease within their household. This ability to resist infection is what is intriguing, as learning more about what genetic variants drive this phenotype may identify a focal point for vaccine development, harnessing the ability to resist Mtb infection. Studying this RSTR phenotype also aligns with the recommendations of the Gates Foundation for novel approaches to vaccine development, as it focuses on both understanding the natural immune response and the special case of unexpected immunity in the face of TB endemicity in the community at large and exposure to TB disease at the household level (Hanekom et al. 2014).

2.3 Genetic Associations with TB

Pulmonary TB is spread by airborne droplets containing Mtb, through coughing, sneezing, or even just talking. Before Mtb was isolated and identified in 1882 by Robert Koch, though, TB was noted to cluster within families, providing early evidence that susceptibility to the disease may be linked to a hereditary factor, and not only environmental factors. In early TB research, twin studies were used to support the argument that host genetics contributed to TB susceptibility. Twins provided an ideal opportunity to identify how much of TB could be linked to genetic effects and how much to shared environment. Monozygotic twins shared both genetics and environment, while dizygotic twins shared environment, but only shared about 50% of their genetic information. Twins with at least one member of the pair diagnosed with TB were studied by comparing the concordance rate between monozygotic and dizygotic twins (Kallman and Reisner 1943, Comstock 1978). One particular set of twin data originally collected in the 1950s, the Prophit Survey, has been studied extensively and different conclusions have been drawn from the data (Comstock 1978, van der Eijk et al. 2007). Within the early heritability studies, including the Prophit Survey, the concordance rate between dizygotes ranged from 6% to 25%, while concordance between monozygotes ranged from 21% to 65% (Moller and Hoal 2010). In the latest review of the Prophit study data, Van der Eijk argues that analyzing concordance within subgroups relating to intensity of exposure shows that environment has a stronger influence on TB disease incidence than genetics among twin pairs (van der Eijk et al. 2007).
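To make the concordance comparison concrete, the sketch below computes a pairwise concordance rate for each zygosity group. It assumes the pairwise definition (concordant pairs divided by all pairs with at least one affected twin); the cited studies may have used a different definition, and the counts shown are hypothetical, chosen only to illustrate the calculation.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Pairwise concordance among twin pairs with at least one affected twin.

    Assumption: concordance = concordant / (concordant + discordant).
    """
    total_pairs = concordant_pairs + discordant_pairs
    if total_pairs == 0:
        raise ValueError("at least one twin pair is required")
    return concordant_pairs / total_pairs


# Hypothetical counts, not data from any cited study.
mz_rate = pairwise_concordance(concordant_pairs=30, discordant_pairs=70)
dz_rate = pairwise_concordance(concordant_pairs=10, discordant_pairs=90)
```

A higher rate among monozygotic than dizygotic pairs is the pattern that early investigators interpreted as evidence for a genetic contribution, subject to the caveat about exposure intensity raised above.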

Animal models have also been studied to understand how genetics contributes to TB disease. Mice and rabbits provide useful models for studying TB disease, as they can easily be bred to be resistant or susceptible to TB disease (Moller et al. 2010). SLC11A1, also known as NRAMP1, was first identified by linkage and subsequent fine-mapping to be associated with TB through the use of the mouse model (Flynn 2004, Moller et al. 2010).

From this rich history of TB research, human genetic susceptibility has been demonstrated to be involved in the pathogenesis of TB, with most research focusing primarily on immune response genes (Moller et al. 2010, Moller and Hoal 2010, Stein 2011). TB is a complex disease, so no single study approach can yet identify all of the various genes that possibly contribute to the development of the disease. Past studies may not have been able to consistently replicate findings of association with various candidate genes due to inconsistency in definitions of TB disease (sputum smear vs. symptoms vs. culture confirmed) and an inability to assess exposure to Mtb within controls (ever exposed vs. latent infection). Many studies exploring TB disease and genetic risk factors have focused on a few SNPs or polymorphisms within a few candidate genes. In the analysis based on Aim 1, we took a comprehensive approach to the examination of genetic susceptibility to TB by investigating haplotype tagging SNPs in multiple candidate genes involved in innate immunity and/or the inflammatory immune response pathways that affect host response to mycobacterial invasion.

Recent research has focused on whole genome linkage scans, case-control candidate gene association studies, and genome-wide association studies (GWAS) to locate genetic regions or specific genetic variants associated with TB. The GWAS approach to identifying variants associated with TB disease has been utilized within African and Asian samples with varying success in replicating initial findings (Thye et al. 2010, Newport and Finan 2011, Mahasirimongkol et al. 2012, Png et al. 2012, Thye et al. 2012, Chimusa et al. 2014). Within African samples from Ghana, The Gambia, Malawi, and South Africa, the 11p13 locus has been identified as associated with TB disease (Thye et al. 2012, Chimusa et al. 2014). GWAS within Indonesian, Japanese, and Thai samples have identified different loci, including 20q12, but have not replicated the 11p13 locus (Mahasirimongkol et al. 2012, Png et al. 2012). However, in a GWAS performed in a Russian population, the 11p13 locus was replicated and showed association with TB susceptibility, in addition to identifying TB susceptibility associated SNPs in the ASAP1 gene, which is involved in dendritic cell migration (Curtis et al. 2015).

A recent GWAS showed variants in HLA class II gene regions associated with pulmonary TB and Mtb infection in populations with European ancestry from Iceland, Russia, and Croatia (Sveinbjornsson et al. 2016). These findings bolster previous findings in a much smaller study among 174 Portuguese, where a variant in the HLA-DRB1 gene showed higher frequency in TB patients than healthy exposed healthcare workers (Duarte et al. 2011). A GWAS performed in a Moroccan population identified SNPs in the FOXP1 and AGMO genes, both involved in macrophage function, suggestively associated with pulmonary TB (Grant et al. 2016). Results from these GWAS show that there is still little agreement across studies, which may be due to the definition of cases and controls within these studies, but may also be linked to the difference in genetic background between subjects from different continents.

Numerous studies have informed our understanding of the role of host genetics in susceptibility to TB infection and disease. The Toll-like and Nod-like receptor family of genes (TLR1, TLR2, TLR4, TLR6, TLR9, TIRAP, TOLLIP, NOD1, NOD2) are involved in recognition of mycobacteria, and a few studies have shown association between these genes and TB (Berrington and Hawn 2007, Brooks et al. 2011, Azad et al. 2012, Zhang et al. 2013, Dittrich et al. 2015).

The tumor-necrosis factor (TNF) pathway, including the products of IL1, IL4,

IL6, and IL10 genes, is a key component of the innate immune response to TB (Flynn

2004). IL10 inhibits the synthesis of pro-inflammatory cytokines, such as IL1 (Donnelly et al. 1999). A recent review of the TB genetic susceptibility literature concluded that studies on IL10 polymorphisms have shown mixed results (Azad et al. 2012). Increased expression of IL4 has been shown to be associated with TB disease (Demissie et al. 2006,

Li et al. 2012), while a different study identified an IL4 SNP associated with protection against TB disease (Abhimanyu et al. 2011). IL6 was linked with TB in a genome-wide linkage analysis (Stein et al. 2007), but the finding was not validated in a fine-mapping association study (Baker et al. 2011). A recent study used both network and pathway analysis to show that the main pathways and genes which control susceptibility to infection with mycobacteria, not limited to Mtb, involve both the innate and the adaptive immune pathways. The genes involved included IL1B, TLR1, and TLR9 in an Mtb-specific network (Lipner et al. 2016).

The IFNγ/interleukin-12 pathway, including STAT1, IL12RB1 and IL12RB2 genes, provides the primary adaptive immune response to TB (Galal et al. 2012).

IL12RB1 deficiency has been associated with Mendelian Susceptibility to Mycobacterial

Disease (MSMD) (Remus et al. 2004, van de Vosse et al. 2013), while variation in

IL12RB2 has been strongly associated with severe TB disease (Casanova 2013).

Variation in STAT1, which is involved in upregulating IFNγ response genes, has been associated with MSMD (Ozbek et al. 2005, Tsumura et al. 2012). IL12B shows a significant difference in gene expression levels after exposure to Mtb-specific antigens

(Wu et al. 2007). However, these findings are not always replicated: a study in Ghana focused on 246 variants within promoters and exonic regions of IFNγ pathway genes, including IL12A, IL12B, IL12RB1, IL12RB2, and STAT1, found no significant association between these variants and resistance or susceptibility to TB disease (Thye et al. 2010). This study used 1999 smear or culture-positive TB patients and 1589 healthy controls who did have TST results, though the TST status of the controls was not described.

In addition, genetic variation in NRAMP1 (also known as SLC11A1) has been associated with TB disease (Li et al. 2011, Stein and Baker 2011, Meilang et al. 2012), and SLC6A3 was identified through a genome-wide linkage analysis of tuberculin skin test reactivity (Cobat et al. 2009). The SLC6A3 gene resides on chromosome 5, in the same region identified by our genome scan (Stein et al. 2008).

These genes within specific pathways were selected for association analysis with

TB disease and RSTR because of their key roles in innate and adaptive immunity to TB.

This focused gene and tag SNP selection reflects our hypothesis-based approach, as opposed to agnostically analyzing millions of SNPs across the genome, and also reduces the penalty incurred by multiple testing. This multiple testing penalty likely prevented these genes of interest from showing an association within published GWAS.

Using human genetic information from individuals enrolled in the KC study, I have examined the associations between genes within the TLR pathway, the tumor necrosis factor (TNF) pathway, and the IFN pathway, and our two phenotypes of interest: TB disease and RSTR. The RSTR phenotype is hypothesized to represent innate resistance to infection, as these are individuals who maintain a negative TST over a 2-year follow-up period despite prolonged exposure to those with infectious TB disease.

2.4 Genetic Associations with RSTR

While many studies designed to uncover genetic associations with TB focus on the outcome of TB disease, previous research has linked different genomic regions to TB disease versus resistance to Mtb infection (Stein et al. 2008). Few studies have explored genetic association or genetic linkage with the TST- phenotype (Stein et al. 2008, Cobat et al. 2009, Cobat et al. 2012, Cobat et al. 2015), and only one has studied PTST- as a phenotype, which is referred to in this proposal as RSTR (Stein et al. 2008). The RSTR phenotype captures innate resistance to infection. It is a unique and interesting phenotype, as these are individuals who maintain a negative TST over a 2-year follow-up period, despite prolonged exposure to those with infectious TB disease. A longitudinal study with careful data collection is needed to characterize this phenotype, which therefore cannot be defined in studies that collect data at only one timepoint.

Using data from the first stage of the Household Contact Study in Kampala, Uganda, a linkage analysis was conducted on RSTRs compared to those with a positive TST (Stein et al. 2008). This linkage study suggested regions on chromosomes 2 and 5 which showed association with RSTR, and showed a nominal association with the gene SLC11A1.

A 2009 study focused on a genome-wide linkage search for genetic loci with an impact on TST reactivity (Cobat et al. 2009). TST reactivity is here defined as the extent of TST reactivity in millimeters, a continuous trait. Linkage analysis with the continuous trait identified the 5p15 region, within the same region identified in our earlier study

(Stein et al. 2008). Fine-mapping within this region then identified SLC6A3 as a gene that would warrant further investigation. A series of studies by this research group has focused on the 11p14 region and its association with TST negativity. The initial study was performed in a South African population and identified 11p14 as associated with TST negativity, later naming the locus TST1 (Cobat et al. 2009). Another linkage analysis followed, using results from a French household contact study, focusing on household members of patients with TB disease (Cobat et al. 2015). This study also identified 11p14-15, near the site of the TST1 locus, and then validated these results using a meta-analysis combining the French population with a South African cohort.

Though very few candidate genes have been hypothesized for PTST- or the RSTR phenotype in published research besides SLC6A3 and TST1, current work by our partners through the TBRU at the University of Washington, in Dr. Thomas Hawn’s lab, has used gene expression analysis to identify a differential gene expression response to Mtb in those who are RSTR compared to those who are TST+, as described in Section 4.1 (Seshadri 2016).

Three genetic probes were used to correctly predict TST negativity in 22 out of 28 samples tested, using linear neural network analysis followed by leave-one-out cross-validation. Of these three genes, none had been identified as associated with TB disease in published research. Lipocalin-12 (LCN12) codes for an extracellular protein for which no biological function has yet been elucidated (Peng et al. 2011). Collectin liver 10

(COLEC10) codes for a pattern recognition molecule which has been associated with the innate immune system (Ohtani et al. 2012, Axelgaard et al. 2013). Vitrin (VIT) is associated with collagen production in the eye, and mutations within this gene have been associated with von Willebrand disease, a blood clotting disorder (Mayne et al. 1999,

Whittaker and Hynes 2002).

Further analyses have identified several other genes as part of a gene set associated with TST negativity; of these fourteen, none had been previously identified in published research as being associated with TB disease. The neurotrimin (NTM) gene encodes a neural cell adhesion protein which is typically expressed in the central nervous system, lung, and heart, and may have a cardiovascular function (Krizsan-Agbas et al. 2008, Luukkonen et al. 2012). The Abelson family of nonreceptor tyrosine kinases, ABL1 and ABL2, have been studied as potential drug targets for treating tumors in various cancers (Greuber et al. 2013). ABL1 was first identified as an oncogene in leukemia, and has been useful in treating chronic myeloid leukemia, linked to the fusion oncogene BCR-ABL1 (Wong and Witte 2004). Eleven genes within the histone deacetylase (HDAC) family, HDAC1-11, involved in transcription regulation, have also been included in the analysis. None of the HDAC genes has been investigated in association with tuberculosis, though some have been studied and used in cancer treatments as potential tumor suppressors and drug targets (Xia et al. 2015). HDAC1 and HDAC2 have been implicated in maintaining CD4+ T cell lineage integrity, and HDAC1-3, 5, and 7 have been linked to T cell regulation and development (Boucheron et al. 2014, Xia et al. 2015).

Though these genes and gene sets show differential expression between RSTR subjects and TST+ subjects, the genes of interest identified have not previously been shown to be associated with TB disease or resistance to Mtb infection in current published analyses. Therefore, our objective was to conduct genetic association analysis on SNPs functionally associated with these genes of interest and tag SNPs within these gene regions specially selected from within African populations to capture nearby common genetic variation. This analysis was completed using the results of the Illumina

Omni5, a microarray containing nearly 5 million single nucleotide polymorphisms

(SNPs), on a sample of 483 individuals enrolled in the KC Study, as proposed in Aim 2.

2.5 SNP Annotation Databases

The 17 genes of interest with the RSTR phenotype were identified through experiments which showed that these genes were differentially expressed between individuals classified as RSTR and individuals classified as TST+. The genes identified in conjunction with the RSTR phenotype have not yet been studied in the context of TB disease or resistance, so SNPs cannot be chosen for analysis by referencing the literature. Selecting SNPs solely within the gene coding region is restrictive and may not capture SNPs involved in regulating the gene’s expression, as those SNPs may be located upstream or downstream of the coding region itself. We would ideally choose those SNPs that are involved in gene regulation within cells integral to the TB immune response. Consortia such as ENCODE and FANTOM5 have been working to make SNP annotation available online, through information gathered from a myriad of experiments on dozens of cell types. Chips containing millions of SNPs which densely cover the genome are increasingly affordable. These developments allow for a finely tuned analysis specific to our needs.

The Encyclopedia of DNA Elements (ENCODE) project published initial results in 2012 in a dedicated issue of Nature (Encode Project Consortium 2012, Maher 2012).

The ENCODE project set out to identify and catalog functional elements within the human genome. Genes are the most well-known functional element, and ENCODE identified 20,687 human protein-coding genes. However, genes account for only approximately 1.2% of the entire human genome. ENCODE catalogued functional elements contained within 80.4% of the human genome. Data were collected from 147 different cell types, and these cell types were divided into three tiers based on priority and depth of analysis. Tier 1 contains three cell types which were analyzed using the greatest number of tests (such as RNA-seq, methyl-seq, DNase-seq, histone modifications). Tier 2 contains 15 other highly studied cell types, including HeLa cervical carcinoma cells. The third tier contains over 100 other cell types, representing a range of biological diversity. CD4+ T cells are located within the third tier and, among the available ENCODE cell types, are the best suited for analysis with the RSTR phenotype. It is these CD4+ T cells which recognize antigens presented on MHC class II proteins and instigate the appropriate immune response, as described in the host immune response section (Section 2.6).

The Functional Annotation of the Mammalian Genome (FANTOM5) project completed and released results in 2014 in Nature (Andersson et al. 2014, Fantom Consortium et al. 2014). FANTOM5 expands on the methods and cell types used within ENCODE, providing resources for functional annotation and for understanding transcription start sites, promoters, and enhancers across cell types, utilizing both human and mouse cell lines. This project included 573 human primary cell samples, 128 mouse primary cell samples, 250 cancer cell lines, 152 human post-mortem samples, and 271 mouse developmental cell samples. Within the human primary cell samples, most cell types were available from three separate donors. Most importantly for our analysis, FANTOM5 contained two different macrophage cell types, granulocyte macrophage progenitor cells from donor bone marrow and monocyte-derived macrophage cells from donor blood, each from three donors. Macrophages are involved in the TB immune response, as described in the host immune response section (Section 2.6), and would be ideal for studying the RSTR phenotype.

2.6 Mycobacterium tuberculosis and host immune response

The immune response to TB is a process involving the reactions between Mtb and the host’s immune system, producing cytokines to fight the infection. The communication between these elements is a key part of what stands between Mtb infection and progression of infection to TB disease. Few studies focus on host genetic influences on cytokines and cytokine production in response to Mtb exposure or infection (Wheeler et al. 2006, Stein et al. 2007, Stein et al. 2008, Carmona et al. 2013). IFNγ is a key cytokine in the Mtb response, as evidenced by its use in IGRAs to detect Mtb infection. T cells produce IFNγ after recognizing, via receptors on the T cell’s surface, Mtb-associated antigen peptides presented on MHC class II proteins. Through household contact studies conducted in Uganda, IFNγ response has been shown to be a heritable trait, with an estimated heritability of 17% that ranges from 11% to 22% depending on the antigen used to elicit the IFNγ response (Stein et al. 2003, Tao et al. 2013). The presence of IFNγ informs us about the body’s immune response to TB, so it is of particular interest to identify the genetic factors underlying the IFNγ response to Mtb-specific antigens.

Antigen-induced IFNγ production has been established as having a protective role in the host immune response to Mtb (van Crevel et al. 2002, Tao et al. 2013). This immune response can be measured in vitro by the amount of cytokines produced in a blood sample incubated with a culture filtrate including Mtb-specific antigens. The Mtb-specific antigens we utilize in this study to measure IFNγ response, early secretory antigen 6 (ESAT-6) and antigen 85B (Ag85B), produce clinically relevant results. ESAT-6 is a component of IGRAs (Fletcher 2007, Nyendak et al. 2008), and Ag85B is a potential vaccine candidate (Mustafa 2002, Williams et al. 2005). Culture filtrate (CXFT), containing Mtb culture filtrate proteins, was also used in this study to elicit an IFNγ response.

The IFNγ response, representing the human immune response to TB, is the focus of the structural equation modeling (SEM) proposed in Aim 3 and described in more detail in Chapter 5. The IFNγ production levels described here are used as predictors in the SEM. SEM is a hypothesis-based approach and should not be used to explore data to generate a hypothesis of how variables relate to one another. Rather, this analysis was approached with a hypothesis grounded in the biological interactions of the host, the pathogen, and the resulting immune response, which here are humans, Mtb, and the IFNγ response, respectively. This simplified biological model for the hypothesis is illustrated in Figure 2.

Figure 2. Hypothesized biological model

[Figure labels: Mtb; Macrophage; MHC Class II; CD4+ T Cell; Peptides of antigens, such as ESAT6 and AG85B; IFNγ, produced by T-cells in response to detected antigens; Host genetic variants influence IFNγ production]

In this model, Mtb, which carries antigens specific to the mycobacterium, enters the human body, typically by being inhaled into the lungs, and is engulfed by a macrophage. Its presence is detected by CD4+ T cells when antigen-presenting cells present peptides from Mtb-specific antigens bound to major histocompatibility (MHC) class II proteins. These CD4+ T cells produce IFNγ, which in turn activates macrophages to destroy the Mtb. We hypothesize that this relationship is regulated by human genetic variants and also influenced by the Mtb lineage present.

2.7 Mycobacterium tuberculosis Lineage

Worldwide, there are seven main lineages of Mtb. These lineages have been defined by Gagneux and others on the basis of past studies and various global phylogenies (Gagneux and Small 2007, Coscolla and Gagneux 2014). Two main lineages and one sub-lineage of Mtb have been identified within our study data from Kampala, Uganda: Lineage 3, also known as the East African/Indian or Central Asian lineage (CAS); Lineage 4, also known as the Euro-American lineage (EA); and a sub-lineage of Lineage 4 which is believed to be unique to Uganda, which we refer to as Euro-American Ugandan (EAU). Previous research identifying these three groups of Mtb within the KC Study referred to the Euro-American Ugandan sub-lineage as the Mtb Uganda family (Wampande et al. 2013). That study found that, of the 1,746 Mtb isolates obtained, 63% were of the EAU lineage, 22% were of the EA lineage (Lineage 4), and 11% were of the CAS lineage (Lineage 3). Its main focus was to explore whether the dominance of the EAU lineage was due to increased virulence, as defined by the presence of cavitary disease; however, no association was found. The relationship between Mtb genotype, host genotype, and their effect on immune response is not fully understood and warrants further investigation. The genotypic variation of Mtb lineages and genetic variation within their human hosts may help to explain differences in immune response. The lineages present within the KC study sample were therefore included in the SEM analysis, as described in Chapter 5.

We have completed unpublished preliminary analyses comparing IFNγ responses to incubation with various culture filtrates by Mtb lineage, stratified by TB diagnosis (TB disease, latent TB infection [LTBI], and TST-). Analyses were conducted on 568 individuals enrolled in the KC study between 2002 and 2006.

Though HIV status was consistently negatively associated with IFNγ response, a significant association of Mtb lineage with IFNγ response was observed only within the TST- group. The EAU lineage of Mtb was associated with the IFNγ response to CXFT, but only in the TST- group (p-value = 0.03). Also within the TST- group, subjects exposed to the CAS lineage of Mtb showed a significantly lower IFNγ response to ESAT-6 (p-value = 0.007). See Table 1 for the subset of these results highlighting our findings. These findings are of interest because the variables included here (ESAT6, CXFT, and AG85B, as well as Mtb lineages EA, EAU, and CAS) were also included as predictors in the SEM described in Chapter 5.

Table 1. IFNγ Response and Mtb Lineage Preliminary Data

No TB/TST-     Beta (SE)       P-Value    (CI)
ESAT6
  Intercept    1.04 (0.33)     0.001      (0.4, 1.7) *
  Age          0.02 (0.008)    0.01       (0.003, 0.04) *
  HIV          0.1 (0.28)      0.7        (-0.4, 0.6)
  EAU          -0.6 (0.32)     0.05       (-1.3, 0.004)
  CAS          -0.89 (0.33)    0.007      (-1.5, -0.2) *
CXFT
  Intercept    1.48 (0.31)     <0.001     (0.87, 2.07) *
  Age          0.02 (0.33)     0.95       (-0.63, 0.67)
  HIV          -1.0 (0.31)     0.0009     (-1.6, -0.4) *
  EAU          0.76 (0.36)     0.03       (0.03, 1.5) *
  CAS          1.08 (0.54)     0.05       (0.009, 2.1)

* indicates a p-value less than 0.05

Chapter 3: Candidate Gene Analysis

3.1 Introduction

This chapter describes the work stemming from the proposed Aim 1 (see Section 1.2). The candidate genes included in this analysis were selected based on the literature on human genetic association with TB disease available at the time of the Illumina10k microarray selection, focusing on genes with key roles in the innate and adaptive immune response to TB. These genes, described in Section 2.3, were selected for their association with TB disease or for their association with human immune function. The tumor-necrosis factor (TNF) pathway, including the products of IL1, IL4, IL6, and IL10 genes, is a key component of the innate immune response to TB (Flynn 2004). SLC11A1, also known as NRAMP1, was first identified by linkage and subsequent fine-mapping to be associated with TB through the use of the mouse model (Flynn 2004, Moller et al. 2010). The Toll-like and Nod-like receptor family of genes (TLR1, TLR2, TLR4, TLR6, TLR9, TIRAP, TOLLIP, NOD1, NOD2) are involved in recognition of mycobacteria, and a few studies have shown association between these genes and TB (Berrington and Hawn 2007, Brooks et al. 2011, Azad et al. 2012, Zhang et al. 2013, Dittrich et al. 2015). The IFNγ/interleukin-12 pathway, including STAT1, IL12RB1 and IL12RB2 genes, provides the primary adaptive immune response to TB (Galal et al. 2012).

3.2 Study population

Data used in the following analyses were gathered from two phases of a household contact study conducted in Kampala, Uganda. Data from the Household Contact Study were collected from 1995-1999 (Guwatudde et al. 2003), while data from the Kawempe Community Health Study (KC Study) were collected from 2002-2009 (Stein et al. 2013). The study protocol was reviewed and approved by the AIDS Scientific Committee of Makerere University, the Uganda National Council on Science and Technology, and the institutional review board at University Hospitals Case Medical Center, Cleveland, OH. In these household contact studies, individuals who presented at the study clinic with active TB disease were enrolled. Household members, including those who were infected or uninfected with Mtb, were also enrolled. After the initial clinic visit, at which the necessary medical information was collected, patients enrolled in the study underwent a follow-up exam every three months for the initial six months, then every six months for the next two years. TB disease was diagnosed by culture confirmation.

RSTR was defined as an individual with a negative TST result at the initial clinical exam, followed by a negative TST at every clinic visit thereafter throughout the two-year follow-up period. A TST was defined as positive if the induration at the injection site was greater than 5 mm for children or those with an HIV+ diagnosis, and greater than 10 mm for all others. Individuals who were neither RSTR nor diagnosed with active TB disease were defined as having a latent TB infection (LTBI).
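The phenotype definitions above can be sketched as a small classification routine. This is only an illustration under stated assumptions: the function names `tst_positive` and `classify_contact` are hypothetical, child status is passed in as a flag because the age cut-off for "children" is not given here, and the list of indurations is assumed to cover the complete two-year follow-up.

```python
from typing import Sequence


def tst_positive(induration_mm: float, is_child: bool, hiv_positive: bool) -> bool:
    """Apply the study cut-offs: >5 mm for children or HIV+ individuals, >10 mm otherwise."""
    cutoff_mm = 5.0 if (is_child or hiv_positive) else 10.0
    return induration_mm > cutoff_mm


def classify_contact(
    indurations_mm: Sequence[float],
    is_child: bool,
    hiv_positive: bool,
    culture_confirmed_tb: bool,
) -> str:
    """Return 'TB', 'RSTR', or 'LTBI' for one individual (hypothetical helper)."""
    if culture_confirmed_tb:
        return "TB"
    if not indurations_mm:
        # Without any TST reading the phenotype cannot be assigned.
        raise ValueError("at least one TST reading is required")
    any_positive = any(
        tst_positive(mm, is_child, hiv_positive) for mm in indurations_mm
    )
    # RSTR requires a negative TST at every visit; everyone else is LTBI.
    return "LTBI" if any_positive else "RSTR"
```

For example, an HIV-negative adult with indurations of 4, 3, and 0 mm across follow-up would be labelled RSTR, while a single reading of 12 mm would move the same individual to LTBI.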

3.3 Candidate gene genotyping

In our analysis, we focused on 29 genes involved in the TNF, TLR/NLR, and IFNγ/IL12 pathways described in Section 2.3, genotyping 546 haplotype tagging SNPs within these genes. Our study has the advantage of utilizing tag SNPs that were chosen for their ability to “tag” certain haplotypes that may be inherited with disease-causing variants through the linkage disequilibrium (LD) structure of the genome (Stein 2011). Tag SNPs provide a powerful and less restrictive approach to detecting common variants, as compared to choosing individual potential disease-causing variants, and allow more flexibility in choosing SNPs (Li and Leal 2008). Rather than potentially missing a disease-causing SNP by excluding it from an analysis, tag SNPs allow the inclusion of multiple potential disease-causing SNPs within a given haplotype.

A custom Illumina GoldenGate 10k microarray was designed for this analysis.

Haplotype tagging SNPs were selected to capture common genetic variation (minor allele frequency ≥ 5%) with strong coverage (linkage disequilibrium r² ≥ 0.8) in any of the 3 African HapMap populations, based on our previous analyses (Baker et al. 2012). Tag SNPs were identified using Genome Variation Server (GVS) (http://gvs.gs.washington.edu/GVS137/index.jsp). Genotyping was conducted using the Illumina iSelect platform. Once SNPs were selected using GVS, their availability on the iSelect platform was verified; if a specific SNP was not available on iSelect, a nearby SNP was selected to replace it. Genotype calling and quality control were performed using Genome Studio, filtering the SNPs by call frequency, replicate errors, and clustering quality. Family relationships were corrected and resolved where needed, including defining subfamilies of first-degree relatives within households.
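The two tag SNP criteria used above (minor allele frequency ≥ 5% and r² ≥ 0.8) can be illustrated with a short sketch. It is not the GVS algorithm: it assumes phased haplotypes coded 0/1 are available, and the function names are hypothetical.

```python
from typing import Sequence


def minor_allele_frequency(alleles: Sequence[int]) -> float:
    """Frequency of the rarer allele; alleles are coded 0/1, one per haplotype."""
    p = sum(alleles) / len(alleles)
    return min(p, 1.0 - p)


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise LD r^2 between two SNPs from phased haplotypes of equal length."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return 0.0  # a monomorphic SNP carries no LD information
    return d * d / denom


def is_tagged(tag: Sequence[int], target: Sequence[int],
              maf_min: float = 0.05, r2_min: float = 0.8) -> bool:
    """True if the target SNP is common and well captured by the tag SNP."""
    return minor_allele_frequency(target) >= maf_min and r_squared(tag, target) >= r2_min
```

With these thresholds, a target SNP that is perfectly correlated with a tag SNP (r² = 1) counts as captured, whereas an uncorrelated SNP would need its own tag.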

3.4 Statistical Analysis

Genetic association analyses were conducted by logistic regression using generalized estimating equations (GEE) to account for genetic relatedness within the data, as implemented in the R package GEE. An exchangeable correlation structure was used unless the model did not converge, in which case an independence correlation structure was used instead. Data were clustered by subfamily, defined as groups of first-degree relatives living within a household. Genetic association analyses were conducted for TB disease as an outcome, and then again for RSTR as an outcome. Genotypes were coded under both additive and dominant genetic models, using the minor allele as the reference allele. Sex and HIV status were included as covariates in all analyses. A nominal significance threshold of p = 2 × 10⁻⁴, corresponding to an experiment-wide significance of α = 0.05, was determined by estimating the number of independent tests based on LD among SNPs, using the program SNPSpDlite (Nyholt 2004), and applying the Bonferroni correction.
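The arithmetic behind this threshold can be sketched as follows. The effective number of independent tests is not reported here; the value of roughly 250 used below is back-calculated from the stated threshold and α and should be read as an assumption. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_effective_tests: float) -> float:
    """Per-test p-value threshold that keeps the experiment-wide error rate at alpha."""
    return alpha / n_effective_tests


def implied_effective_tests(alpha: float, threshold: float) -> float:
    """Back-calculate the number of independent tests implied by a given threshold."""
    return alpha / threshold


# Threshold stated in the text: 2e-4 at an experiment-wide alpha of 0.05.
n_eff = implied_effective_tests(0.05, 2e-4)      # about 250 (assumed, not reported)
ld_aware = bonferroni_threshold(0.05, 250)       # 2e-4
naive = bonferroni_threshold(0.05, 546)          # about 9.2e-5 if all 546 SNPs were independent
```

The comparison shows why accounting for LD matters: treating all 546 tag SNPs as independent would roughly halve the per-test threshold.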

3.5 Validation Analysis

A validation of the results obtained from the candidate gene based analysis, as proposed in Aim 1b, was also performed using the results from an Illumina Omni5 microarray, which is described more fully in Section 4.3, on a sample of 483 independent individuals from the KC study. Those SNPs from the original analysis which were associated with the outcomes of interest with a p-value below 0.05 were selected for validation. As with the initial candidate gene based analysis described above, genetic association analyses were conducted for TB disease as an outcome, and then again for RSTR as an outcome. Genotypes were coded under both additive and dominant genetic models, using the minor allele as the reference allele. Sex and HIV status were included as covariates in all analyses, with the addition of age as a binary variable, contrasting individuals ≤ 10 years and > 10 years of age.

Genotypes of SNPs selected for validation that were not contained within the Omni5 panel were imputed. To impute the genotypes, the program SHAPEIT was used first to check SNP alignment to the same DNA strand and then to phase haplotypes (Delaneau et al. 2014). The existing SNPs and a reference sample consisting of all African samples from the 1000 Genomes Project were then used for imputation within the IMPUTE2 program (Marchini and Howie 2010, Howie et al. 2011). The African samples were used to most closely replicate the LD patterns found within our Ugandan samples. Using the pattern of LD found in the 1000 Genomes Project samples, IMPUTE2 then calculated probabilities for untyped genotypes for each individual given genotypes at typed SNPs and haplotypes in the reference panel. Given these probabilities, the most likely genotype was selected for each of the imputed SNPs and filled in for those individuals involved in the initial analysis. Only those imputed SNPs with an IMPUTE2 quality score greater than 0.7 were included in the final analysis. These results were formatted for analysis, replacing genotype posterior probabilities with allele dosages in R using Gtool. For the validation of the previous candidate gene analysis, our significance threshold was less strict: p-values less than 0.05 were classified as supportive of the initial findings.
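The handling of imputed genotype probabilities described above can be sketched as follows. This is an illustration, not the IMPUTE2 or Gtool implementation: it assumes each imputed genotype is given as a triple of posterior probabilities for 0, 1, and 2 copies of the counted allele, and the function names are hypothetical.

```python
from typing import Tuple

Probs = Tuple[float, float, float]  # P(0 copies), P(1 copy), P(2 copies)


def best_guess_genotype(probs: Probs) -> int:
    """Most likely genotype, expressed as the number of copies of the counted allele."""
    return max(range(3), key=lambda g: probs[g])


def allele_dosage(probs: Probs) -> float:
    """Expected number of copies of the counted allele (ranges from 0 to 2)."""
    _, p_one, p_two = probs
    return p_one + 2.0 * p_two


def passes_quality(quality_score: float, min_score: float = 0.7) -> bool:
    """Keep only SNPs whose imputation quality score is greater than the cut-off."""
    return quality_score > min_score
```

For a genotype with posterior probabilities (0.1, 0.7, 0.2), the best-guess call is the heterozygote while the dosage is 1.1, which shows how the dosage retains the uncertainty that a hard call discards.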

The results of the validation analysis were then combined with those of the original analysis in a meta-analysis to examine the overall strength of genetic association. For the meta-analysis, the program METAL was used with inverse variance weighting. METAL resolves possible allele mismatches caused by DNA strand differences between the populations to be combined, given the major and minor alleles used for each population’s analysis (Willer et al. 2010).
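Inverse variance weighting can be sketched in a few lines. This is a generic fixed-effect calculation rather than METAL itself: it assumes each study contributes a log odds ratio and its standard error for the same effect allele, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(
    estimates: Sequence[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Combine (beta, se) pairs; return pooled beta, pooled se, z-score, two-sided p."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Usage: pooled odds ratio from two studies' log odds ratios (illustrative numbers only).
pooled_beta, pooled_se, z_score, p_value = inverse_variance_meta([(0.2, 0.1), (0.4, 0.1)])
pooled_or = math.exp(pooled_beta)
```

Because the weights depend on the standard errors, the larger and more precise sample dominates the pooled estimate.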

3.6 Initial Results

Genetic association with TB

We examined whether 546 haplotype tagging SNPs in 29 immune pathway genes were associated with TB in 835 subjects from 481 families within 298 households. Of these, 240 individuals (28.7%) had TB (Table 2). The mean age was 18.43 years (median = 17) and 15% were HIV+. The percentage of HIV+ individuals within each group was similar, with 15% HIV+ in the TB analysis and 13% HIV+ in the RSTR analysis (data not shown).

Genetic association analysis with pulmonary TB as the outcome of interest showed that two SNPs met the study-wide significance threshold, with 19 additional SNPs showing a nominally significant association (p < 0.05) (Table 3). The top SNPs in the TB analysis included one SNP in the 5' region of TICAM2 (also known as TRAM), rs746566 (OR = 1.42, p = 3.6 × 10⁻⁶), and one SNP in IL1B, rs1143643 (OR = 1.99, p = 4.3 × 10⁻⁵). Multiple SNPs were associated with TB at the nominal p < 0.05 level in IL4 (best p = 6.9 × 10⁻³), NOD1 (p = 9.4 × 10⁻³), and TOLLIP (p = 6.8 × 10⁻³).

Genetic association with RSTR

We next examined whether the same set of SNPs was associated with RSTR in 718 individuals, including 75 individuals (10.5%) who were RSTR. None of the SNPs met the experiment-wide significance level in the analysis with RSTR as the phenotype (Table 4). However, 17 SNPs showed a nominal association at the p < 0.05 level. The top SNPs in this analysis included 2 SNPs in NOD1, 2 SNPs in NOD2, and 3 SNPs in SLC6A3.

Table 2. Illumina 10k Initial Analysis and Omni5 Validation Sample characteristics

                      ILL10k Initial Analysis Sample    Omni5 Validation Sample
Total individuals     835                               483
Families              481                               324
Females               485 (58.1%)                       239 (49.4%)
Age, years            18.4 ± 13.6                       20.8 ± 13.5
TB disease            240 / 835 (28.7%)                 203 / 483 (42.0%)
Resisters (RSTR) *    75 / 718 (10.5%)                  33 / 471 (7.0%)
HIV+                  122 (14.9%)                       64 (13.3%)

* The analysis of the RSTR phenotype was restricted to a subset of individuals with complete tuberculin skin test follow-up data, and included 718 individuals from 435 families for the Initial analysis, and 471 individuals from 315 families for the Validation analysis.
ILL10k Initial Analysis sample refers to those individuals genotyped on the Illumina10k microarray and corresponding to Aim 1a, while Omni5 Validation sample refers to those individuals genotyped on the Omni5 microarray, corresponding to Aim 1b.

3.7 Validation Results

Genetic association with TB

We selected 61 SNPs for validation: 36 SNPs had been associated with TB, 24 with RSTR, and one with both phenotypes, each with a p-value < 0.05. Of these 61 SNPs, 45 were in the Omni5 panel. The remaining 16 were imputed using the 1000 Genomes African populations as a reference. These SNPs were analyzed in 483 subjects from 324 families within 225 households. Of these, 203 individuals (42.0%) had TB (Table 5). The mean age was 20.8 years (median = 21) and 13.3% were HIV+.

Genetic association analysis with pulmonary TB as the outcome of interest showed three SNPs with a suggestive association (p < 0.05) (Table 6). The top SNPs in the TB analysis included a SNP within IL12RB2, rs3790567 (OR = 0.55, p = 0.012), a SNP within IL1A, rs2856838 (OR = 0.62, p = 0.022), and a SNP in TLR2, rs11938228 (OR = 1.79, p = 0.0025).

Genetic association with RSTR

We then performed the same analysis with RSTR as the outcome of interest in 471 individuals, including 33 individuals (7.0%) who were classified as RSTR. This analysis showed one SNP with a p-value < 0.01 (Table 7), and three SNPs which showed a nominal association (p < 0.05). These top SNPs included one SNP within TICAM2, rs2052834 (OR = 0.47, p = 0.007), one SNP within TIRAP, rs8177375 (OR = 1.95, p = 0.054), and two SNPs within NOD2, rs2111234 (OR = 1.45, p = 0.05) and rs6500328 (OR = 2.01, p = 0.053).

3.8 Meta-Analysis

Meta-Analysis with TB

Meta-analysis with pulmonary TB as the outcome of interest showed eight SNPs with a meta p-value < 0.05 (Table 6). These included one SNP each within IL1B, SLC11A1, NOD1, IL4, TLR4, and TICAM2, and two SNPs in TLR6. Of these eight, there were four SNPs with a stronger association (p-value < 0.01): a SNP within NOD1, rs7793010 (OR = 1.3, p = 0.003), a SNP within NOD2, rs6500328 (OR = 1.29, p = 0.007), a SNP within TLR6, rs5743832 (OR = 1.63, p = 0.008), and a SNP in TICAM2, rs746566 (OR = 1.3, p = 0.0001).

Meta-Analysis with RSTR

Meta-analysis with RSTR as the outcome of interest showed eight SNPs with a p-value < 0.05 (Table 7). These SNPs included one SNP each within NOD2, IL12A, IL10, and TLR2, two SNPs within IL10, and two SNPs in NOD1. Of these eight, there were two SNPs with a stronger association (p-value < 0.01): a SNP within NOD2, rs6500328 (OR = 2.17, p = 0.0059) and a SNP in TICAM2, rs2052834 (OR = 0.59, p = 0.0009).

Table 3. Results of genetic association analysis validation of TB phenotype

Columns: SNP; Gene; Model (Add, Dom); Alleles; Illumina10k (TB MAF, Control MAF, OR, p-value); Omni5 (TB MAF, Control MAF, OR, p-value, quality); Meta-analysis (OR, p-value).

In Alleles column, minor allele is listed first. TB and Control refer to minor allele frequencies (MAF). IMPUTE2 quality score (quality) is shown for imputed SNPs.

Table 4. Results of genetic association analysis validation of RSTR phenotype

Columns: SNP; Gene; Model (Add, Dom); Alleles; Illumina10k (RSTR MAF, Control MAF, OR, p-value); Omni5 (RSTR MAF, Control MAF, OR, p-value, quality); Meta-analysis (OR, p-value).

In Alleles column, minor allele is listed first. RSTR and Control refer to minor allele frequencies (MAF). IMPUTE2 quality score (quality) is shown for imputed SNPs.

Table 5. Comparison of Illumina 10k Initial Analysis and Omni5 Validation Sample characteristics, by phenotype.

Columns: RSTR, TB, and LTBI, each with ILL10k, Omni5, and p-value. Rows: N, mean age, age ≤ 10 (%), female (%), HIV (%).

P-values shown are the result of a Pearson's Chi-squared test, comparing Ill10k and Omni5 for all variables, except for the continuous variable of age, where the t-test was used. * indicates a p-value < 0.05, showing a significant difference for the given variable between the Ill10k and Omni5 samples. ILL10k refers to those individuals genotyped on the Illumina10k microarray, used in the initial analysis and corresponding to Aim 1a, while Omni5 refers to those individuals genotyped on the Omni5 microarray, used in the validation analysis and corresponding to Aim 1b.

Table 6. Comparison of Age by phenotype, within Illumina 10k Initial Analysis and Omni5 Validation Sample

Columns: ILL10k and Omni5, each with mean age for the specified phenotype (TB or RSTR), mean age for all other phenotypes, and p-value.

P-values shown are the result of a t-test. * indicates a p-value < 0.05, showing a significant age difference between the specified phenotypes, either TB vs. all other phenotypes (RSTR and LTBI), or RSTR vs. all other phenotypes (TB and LTBI).

3.9 Discussion

This study examined the association between 29 candidate genes involved in innate immune responses and two distinct phenotypes that result from Mtb exposure: resistance to infection and pulmonary TB. We identified novel associations between pulmonary TB and TICAM2; to our knowledge, we are the first to observe associations between this gene and TB. This finding was subsequently replicated in an independent data set, the Wellcome Trust Case Control Consortium (WTCCC) (Hall et al. 2015), but it was not replicated in our validation sample. However, the association retained a strong p-value in the meta-analysis (p < 0.001). Moreover, we observed several SNPs in NOD1 that were associated with TB at p < 0.01. Although our results for NOD1 did not achieve significance after multiple testing correction, this is the first report of even a suggestive association between TB and NOD1. This finding was also replicated in the WTCCC data (Hall et al. 2015), yet the association of NOD1 with TB was replicated neither in our validation sample nor in the meta-analysis.

Finally, we observed two SNPs in TOLLIP associated with TB (p < 0.05), consistent with earlier findings (Shah et al. 2012). Three SNPs within the TICAM2 gene were associated with TB, with one SNP significant at the experiment-wide threshold. In addition, one TICAM2 SNP was nominally associated with RSTR in the initial analysis and this was replicated, resulting in a strong association in the meta-analysis. TICAM2, also known as TRAM, is a Toll-like receptor (TLR) adaptor that supports TLR4-mediated immune responses (Seya et al. 2005). In a recent study, TICAM2 levels predicted with 80% accuracy whether subjects would be high or low responders to the MVA85A TB vaccine candidate (Matsumiya et al. 2013). Ours is the first study to find an association between TICAM2 genetic variants and TB, and this association was replicated with TICAM2 SNPs (p < 0.05) in the WTCCC data (Thye et al. 2012).

We observed a statistically significant association between TB and IL1B, more significant than in previous reports and in intronic rather than exonic variants (Awomoyi et al. 2005, Gomez et al. 2006). However, this association was not present in our validation sample, though the association in the meta-analysis was significant at the p < 0.05 level. Intronic SNPs in IL4 were also associated with TB. This is the first report of an association of IL4 polymorphisms with TB in an African population and replicates studies of IL4 in TB in non-Africans (Naslednikova et al. 2009, Abhimanyu et al. 2011). Our greater SNP density and use of haplotype-tagging SNPs allowed us to detect these genetic association effects (Li and Leal 2008, Stein and Baker 2011). This greater coverage of genetic variation may explain why we achieved greater significance than in previous reports (Awomoyi et al. 2005, Gomez et al. 2006).

Though not significant at the experiment-wide threshold, SNPs from both NOD1 and NOD2 were associated with TB and the RSTR phenotype, respectively. One study in a Chinese population identified a single SNP in the NOD2 gene associated with TB susceptibility, whereas we observed an association between this gene and RSTR (Zhao et al. 2012). NOD2, a cytosolic pattern recognition receptor, has been implicated in recognition of Mtb products that are secreted from the macrophage phagosome into the cytosol. Thus NOD2 may have a role in activation of the host cell inflammasome with subsequent production of mature IL1β and IL18 (Ferwerda et al. 2005, Kleinnijenhuis et al. 2009, Brooks et al. 2011). Ours is the first study to report associations between NOD1 and TB, and this finding was replicated in the WTCCC study data (Hall et al. 2015). The association between NOD1 and RSTR is likewise unique. The NOD1 SNP with the strongest association with RSTR in the initial analysis showed a slightly weaker (p < 0.05) association in the meta-analysis, yet a second NOD1 SNP associated with RSTR showed a slightly stronger association in the meta-analysis than in the initial analysis, with p < 0.01. This is noteworthy, as it is the first report of a possible role for NOD1, and no other studies have examined genetic influences on RSTR. All four of the NOD1 and NOD2 SNPs analyzed for an association with the RSTR phenotype maintained significance in the meta-analysis at the p < 0.05 level, with one NOD2 SNP showing a stronger association with p < 0.01.

Although many studies have sought genetic associations with TB disease, few have explored genetic association or genetic linkage with the TST− phenotype (Stein et al. 2008, Cobat et al. 2009). As most studies do not include TST in the characterization of non-diseased individuals, there is usually no assessment of the unaffected subject's exposure to and/or infection with Mtb (Stein and Baker 2011). Our use of data from a longitudinal household contact study not only provides the opportunity to collect follow-up data but also confirms Mtb household exposure of all study participants (Stein et al. 2013). The RSTR phenotype is of special interest as these individuals do not appear to become infected by Mtb over a 2-year period, despite heavy exposure to an individual with active pulmonary TB and residence in a high TB-endemic area (Ma et al. 2014). Though we did not find any SNPs to be significantly associated with the RSTR phenotype at the p < 2 × 10⁻⁴ level (study-wide α = 0.05), we did find a nominally significant association with three SLC6A3 SNPs. This finding is consistent with the cross-sectional study by Cobat et al., conducted in South Africa, that associated SLC6A3 with TST reactivity (Cobat et al. 2009). Following the meta-analysis, one SNP within TICAM2 (rs2052834, p = 0.0009) and one SNP within NOD2 (rs6500328, p = 0.0059) showed strong association with the RSTR phenotype, and the initial associations with SLC6A3 SNPs were no longer present. However, each of the SLC6A3 SNPs had to be imputed for the validation and subsequent meta-analysis. Though each SNP was imputed with a quality score greater than 0.95, it would have been preferable to genotype these SNPs directly.

Because we observed nominal associations between various genes and TB and not with RSTR, this further suggests that these distinct clinical outcomes are regulated by different genetic mechanisms. Larger cohorts would be ideal to more closely examine this trait, but given the currently available data, the age of the RSTRs is of interest. As shown in Table 6, the mean age of the RSTRs is significantly different from that of the other two groups, TB and LTBI, in both the Illumina10k group in the initial analysis and the Omni5 group used in the validation. The RSTRs are younger on average than the other phenotype groups. If there is something unique about those who resist infection at a young age, any extended follow-up data beyond the initial two years should be investigated to confirm whether this phenotype persists past adolescence. Our sample size also limited the number of SNPs that could be analyzed, as we included only those SNPs with more than 35 copies of the minor allele; otherwise, our model would not converge. Finally, even though HIV status was included as a covariate, the impact of HIV on the characterization of RSTR is not well known and residual confounding may remain. TST positivity is defined using a lower threshold for HIV-positive individuals, and previous work showed a difference in the distribution of HIV in RSTRs versus non-RSTRs (Ma et al. 2014).
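The minor-allele-count filter mentioned above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes genotypes are stored as 0/1/2 counts of one allele, with None for missing calls, and the function names are hypothetical.

```python
def minor_allele_copies(genotypes):
    """Count copies of the minor allele at one SNP.

    genotypes: per-individual counts (0, 1, or 2) of an arbitrary allele;
    None marks a missing call (an assumed encoding, not the study's).
    """
    called = [g for g in genotypes if g is not None]
    copies = sum(called)           # copies of the counted allele
    total = 2 * len(called)        # total chromosomes with a call
    # The minor allele is whichever allele has fewer copies.
    return min(copies, total - copies)


def passes_copy_filter(genotypes, min_copies=35):
    """Keep a SNP only if it has MORE than min_copies minor-allele copies."""
    return minor_allele_copies(genotypes) > min_copies
```

Counting copies rather than applying a frequency cut-off ties the filter directly to the amount of information available for model fitting, which is the stated reason for the filter.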

Interestingly, we observed only one SNP within the 3′ region of the SLC11A1 gene (aka NRAMP1) that was associated with TB, and it did not achieve experiment-wide statistical significance (p = 0.026). SLC11A1 has been associated with TB in meta-analyses, so the lack of statistically significant associations might be surprising (Li et al. 2006, Li et al. 2011, Meilang et al. 2012). Non-replication could be due to study design, including differences in diagnostic criteria for TB cases and controls and issues of targeted polymorphisms versus comprehensive LD coverage (Stein and Baker 2011). Another possible explanation for our weak association between TB and SLC11A1 is interaction between SLC11A1 and other genes; in one study, TLR2 acted as a modifier of SLC11A1-associated TB risk (Velez et al. 2009).

Though these two samples are similar in many ways, as they originate from the same KC study, with data ascertained and all phenotypic measurements collected following the same protocol, a meta-analysis was the most appropriate way to combine the data for analysis. The 835 samples used in the initial analysis were gathered from 2002-2008 and genotyped using the Illumina10k microarray, while the 483 samples for validation were gathered from 2005-2009 and genotyped using the Omni5 microarray. The regression analysis of the Omni5 SNPs was conducted on a subset of the original 546 SNPs in the Illumina 10k analysis. The covariates included in each analysis differed by the addition of a variable for age in the Omni5 analysis. The program METAL was used to overcome these differences between samples and combine their results in a meta-analysis. Benefits of METAL include the ability to perform inverse-variance-weighted meta-analysis, using standard errors to weight effect size estimates, as opposed to simply using p-values in a sample-size-based analysis. METAL also resolves possible allele mismatches between the populations being combined, given the major and minor alleles used in each population's analysis (Willer et al. 2010).
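The inverse-variance weighting described above can be illustrated with a short sketch. This is not METAL itself, only a generic fixed-effects calculation on the log odds ratio scale; the function names are hypothetical and any input values are illustrative rather than taken from our results.

```python
import math


def align_effect(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Express a study's log odds ratio relative to a common effect allele.

    If the study counted the opposite allele, the sign of beta is flipped.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta
    raise ValueError("alleles do not match the reference allele pair")


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis.

    betas: per-study log odds ratios, already aligned to one effect allele.
    ses:   per-study standard errors of those log odds ratios.
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]            # w_i = 1 / SE_i^2
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p-value
    return pooled_beta, pooled_se, z, p
```

The pooled odds ratio is recovered as exp(pooled beta). Because each study is weighted by its precision, a study with a smaller standard error contributes more to the pooled estimate, which is the advantage over a purely sample-size-based combination of p-values.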

The two top SNPs, one within TICAM2 and one within IL1B, which showed the strongest association with TB disease in the initial analysis, required imputation to be included in the validation sample. Though they each were imputed with a quality score greater than 0.95, which indicates a very high level of certainty for the imputed genotype, we would still have preferred to have genotyped these SNPs directly. While the TICAM2 SNP retained a high level of association in the meta-analysis, the SNP of interest in IL1B was only associated at the p < 0.05 level.

Our candidate gene, hypothesis-based approach, as opposed to a genome-wide analysis, may have prevented us from observing additional genes significantly associated with the RSTR phenotype, so further work is needed. We are also limited by the available sample size, especially within the RSTR analysis groups. RSTR is a phenotype that does not have a high prevalence in our sample. In the initial sample, 10.5% of individuals were classified as RSTR, while in the validation sample, only 7.0% were classified as RSTR.

The non-replication of our initial strong findings in association with TB in the validation sample could be due to differences between the initial sample and the validation sample. However, Table 5 shows no significant difference between the initial analysis dataset (those genotyped on the Illumina10k microarray) and the validation dataset (those genotyped on the Omni5 microarray) in the covariates age and sex. There is, however, a significant difference between the TB groups by HIV status. Though HIV status was included as a covariate in both initial and validation analyses, this may indicate a difference in clinical severity of disease associated with HIV co-morbidity. Future analysis is planned to investigate differences between the initial and validation datasets relating to the distribution of tribe and clinical differences in the TB group.

Chapter 4: RSTR and Annotated SNP Selection

4.1 Introduction

The work described in this chapter corresponds to the proposed Aim 2 (see Section 1.2). This analysis was motivated by results from Dr. Thomas Hawn's lab, which identified a differential gene expression response to Mtb in those who are RSTR compared to those who are TST+ (Seshadri 2016). Peripheral blood mononuclear cells (PBMCs) were collected from individuals in the KC study who were either RSTR or had a latent Mtb infection (TST+). Monocytes were then extracted from these cells and stimulated with Mtb, incubating overnight. The RNA levels corresponding to each gene on the chip were measured before and after Mtb stimulation. These gene expression values were then compared, first for each gene individually, then within gene networks. No significant differences were found for individual gene expression values, but significant differences were identified within gene networks. Their results have highlighted the importance of a variety of genes, most of which have never been associated with TB phenotypes before. The variants within these identified genes, as described in Section 2.3, are the focus of this analysis of association with the RSTR phenotype.

4.2 Study population

Data used in the following analyses were gathered from a household contact study, the Kawempe Community Health Study (KC Study), conducted in Kampala, Uganda, and were collected from 2005-2009 (Stein et al. 2013). In this household contact study, individuals who presented at the study clinic with active TB disease were enrolled. Household members, including those who were infected or uninfected with Mtb, were also enrolled. After the initial clinic visit, at which the necessary medical information was collected, patients enrolled in the study underwent a follow-up exam every three months for the initial six months, then every six months for the next two years. Phenotypes were defined as in Section 3.1.

4.3 SNP Selection

In this analysis, we focused on SNPs within a varying number of base pairs of the stop and start of each of the 17 previously described genes (see Section 2.3) shown to be associated with the RSTR phenotype, chosen from those SNPs included on the Human Omni5-Quad BeadChip. The Omni5 microarray, processed using the Infinium LCG genotyping assay and the iScan scanning system, includes 4,301,331 markers across the genome, which are available to power a thorough search for novel polymorphisms associated with our phenotype of interest (Xing et al. 2013). These markers include Tag SNPs from the International HapMap and 1000 Genomes projects. This offers comprehensive coverage of the genome, representing significant variation even within African populations. For the Yoruba in Ibadan, Nigeria (YRI) population, 71% of variation is captured for those SNPs with MAF > 5%, and 58% of variation is captured for those SNPs with MAF > 1%, as defined by the 1000 Genomes Project. Genotype calling and quality control were performed using Genome Studio, filtering the SNPs by call frequency, replicate errors, and clustering quality. Family relationships were corrected and resolved where needed, including defining subfamilies of first-degree relatives within households.

Following the completion of all quality control measures, SNPs were chosen for analysis based on the annotation available in the ENCODE and FANTOM5 databases, as illustrated in Fig. 3. DNase I hypersensitive sites were identified from Th1 CD4+ T cells using DNase-chip or DNase-seq by a research group at Duke University, working in collaboration with the ENCODE consortium (Crawford et al. 2006, Song and Crawford 2010, Thurman et al. 2012). DNase I hypersensitive sites, which are regions of chromatin easily cleaved by the DNase I enzyme, are typically associated with regulatory functions as they allow for greater accessibility of the chromatin. Promoters and enhancer regions were identified through cap analysis of gene expression, a method that cuts and tags sections of complementary DNA synthesized from RNA, within both granulocyte macrophage progenitor cells and monocyte-derived macrophage cells from three different adult human donors (Andersson et al. 2014, Fantom Consortium et al. 2014). The two macrophage cell lines were collected from distinct body tissues, with the progenitor cells originating from bone marrow and the monocyte-derived cells from the blood.

For the selection of SNPs within FANTOM5-determined enhancer regions, the gene region was defined as within 500 kilobase pairs (kb) of the stop and start of each of the 17 genes. This range was chosen because an estimated 64% of all enhancer regions for a given gene lie within 500 kb of the gene itself (Andersson et al. 2014). For the selection of all other annotated SNPs and Tag SNPs, the gene region was defined as within 1 kb of the stop and start of each of the 17 genes. This effectively limited the number of SNPs included in analysis and reduced the burden of multiple comparisons. This more focused selection is also intended to reveal those SNPs that are most likely to affect the immune response related functions of the gene in question.

To ensure coverage of these genes, Tag SNPs were selected within all gene regions using the NIEHS LD Tag SNP selection tool online (Xu and Taylor 2009). Tag SNPs were selected to represent LD patterns based on three different African populations available from HapMap samples: the Luhya in Webuye, Kenya (LWK), the Maasai in Kinyawa, Kenya (MKK), and the Yoruba in Ibadan, Nigeria (YRI). The LD threshold was set at r² = 0.8, so that SNPs sharing an r² of at least 0.8 were represented by a single Tag SNP. SNPs with a MAF < 0.05 were excluded from analysis.
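The two gene-region definitions above (1 kb for annotated SNPs and Tag SNPs, 500 kb for FANTOM5 enhancers) amount to a positional window around each gene. A minimal sketch of that selection step follows; the record types, function name, and coordinates are hypothetical and serve only to illustrate the windowing logic, not the actual pipeline or gene positions.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    name: str
    chrom: str
    pos: int      # base-pair position


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int    # base-pair position of the gene start
    end: int      # base-pair position of the gene stop


def snps_in_region(snps, gene, flank_bp):
    """Return SNPs on the gene's chromosome within flank_bp of its start or stop."""
    lo = max(0, gene.start - flank_bp)
    hi = gene.end + flank_bp
    return [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]


# Illustrative, made-up coordinates.
gene = Gene("GENE_X", "8", 1_000_000, 1_050_000)
snps = [
    Snp("snp_a", "8", 999_500),      # inside the 1 kb window
    Snp("snp_b", "8", 998_000),      # outside 1 kb, inside 500 kb
    Snp("snp_c", "8", 1_600_000),    # outside both windows
    Snp("snp_d", "9", 1_000_100),    # different chromosome
]
tag_window = snps_in_region(snps, gene, flank_bp=1_000)
enhancer_window = snps_in_region(snps, gene, flank_bp=500_000)
```

The wider window is applied only to enhancer annotations, so the narrow window keeps the number of tests small for all other SNP classes.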

4.4 Statistical Analysis

Genetic association analyses were conducted by logistic regression using generalized estimating equations (GEE) to account for correlations within the data caused by genetic relatedness and shared environment, as implemented in the R package gee. An exchangeable correlation structure was used unless the model did not converge, in which case an independence correlation structure was used instead. Data were clustered by subfamily, defined as groups of first-degree relatives living within a household. Genetic association analyses were conducted for RSTR as the outcome. Genotypes were coded under both additive and dominant genetic models, using the minor allele as the reference allele. Sex, HIV status, and age as a binary variable, contrasting individuals ≤ 10 years and > 10 years of age, were included as covariates in all analyses. A nominal significance threshold of p = 1 × 10⁻⁴, corresponding to an experiment-wide significance of α = 0.05, was determined by estimating the number of independent tests based on LD among SNPs, using the program SNPSpDlite (Nyholt 2004).
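The relationship between the experiment-wide α and the per-test threshold can be sketched as below. The effective-number-of-tests formula follows the eigenvalue-based estimate of Nyholt (2004); the exact computation performed by SNPSpDlite is not reproduced here, the simple division by the effective number of tests is an assumption, and the function names are hypothetical.

```python
import statistics


def nyholt_meff(eigenvalues):
    """Effective number of independent tests from the eigenvalues of the
    SNP-by-SNP LD correlation matrix (Nyholt 2004):

        Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M)

    Var is the sample variance of the M eigenvalues.
    """
    m = len(eigenvalues)
    if m < 2:
        return float(m)
    var = statistics.variance(eigenvalues)
    return 1 + (m - 1) * (1 - var / m)


def per_test_threshold(alpha, n_effective):
    """Bonferroni-style per-test threshold for n_effective independent tests."""
    return alpha / n_effective
```

Under this sketch, SNPs in complete LD collapse to a single effective test, while fully independent SNPs each count as one test. A per-test threshold of 1 × 10⁻⁴ at α = 0.05 would correspond to roughly 500 effective tests.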

Figure 3. Annotated SNP Selection Process

Gene ± 1kb, with SNPs drawn from three annotation sources and restricted to those on the Omni5:
TagSNPs: NIEHS; African populations (LWK, MKK, YRI)
FANTOM5: Macrophages (monocyte, progenitor); Enhancers, Promoters
ENCODE: CD4+ T Cell; DNase Hypersensitivity

4.5 Results

We examined a total of 305 SNPs across 17 gene regions in 471 subjects from 315 families within 223 households. Of the 471 subjects, 33 (7%) were classified as RSTR (Table 5). Among the RSTRs, the mean age was 11.5 years, while among the non-RSTRs, the mean age was 21.6 years. None of the SNPs met the experiment-wide significance level in this analysis with RSTR as the phenotype (Table 8). However, 29 SNPs showed a nominal association at the p < 0.05 level. Of these, six SNPs showed a suggestive association at the p < 0.01 level. The top SNPs in this analysis included three SNPs from the COLEC10 gene: rs4876876 (OR = 0.22, p = 3.2E-03), rs7835846 (OR = 2.1, p = 4.8E-03), and rs9650075 (OR = 1.8, p = 5.0E-03).

Table 7. Annotated SNP Selection Summary

Gene            CHR   Promoters   Dnase       TagSNPs   ENH within    Total SNPs
                      (#sites)    (#sites)              Macrophages   in OMNI5
ABL2            1     0           0           1         1             2
HDAC1           1     0           1 (1)       1         0             1
VITRIN          2     0           2 (2)       10        0             12
HDAC4           2     3 (2)       53 (40)*    26        2             81
HDAC11          3     1 (1)       2 (1)       1         0             4
HDAC3           5     0           1 (1)       2         1             3
HDAC2           6     0           0           3         0             3
HDAC9           7     0           0           55        0             55
COLEC10         8     0           0           16        0             16
LCN12           9     0           1 (1)       0         0             1
ABL1            9     0           1 (1)       10        0             11
NTM             11    0           4 (2)       96        0             100
HDAC7           12    1 (1)       0           3         0             4
HDAC5           17    0           2 (2)       4         1             7
HDAC10          22    1 (1)       4 (2)       0         0             5
HDAC6           X     0           0           1         0             1
HDAC8           X     0           0           5         0             5
Column Totals**       6           71          234       5             311

*The 53 HDAC4 SNPs in Dnase hypersensitive sites came from 40 different sites within the HDAC4 gene region, and no more than 4 SNPs came from a single site.
**The column totals for SNPs within Dnase regions, TagSNPs, and promoters do not add up exactly to the total number of SNPs in the OMNI5 dataset because several SNPs were identified as being either in a Dnase region or as a promoter AND as a TagSNP.

Table 8. Annotated SNPs Associated with RSTR, p-value < 0.05

Columns: SNP Name; Gene; Alleles; MAF (RSTR, Control); Model (Add, Dom); Annotation Source (DNase, TagSNP); OR; SE; p-value.

In Alleles column, minor allele is listed first. RSTR and Control refer to minor allele frequencies (MAF). IMPUTE2 quality score (quality) is shown for imputed SNPs. Model = Add is additive; Dom = dominant. The annotation source provides the source for including the given SNP. For example, a SNP with an annotation source of TagSNP means that the SNP was chosen to be included in the analysis as it was identified as a TagSNP within the given gene region.
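The Add and Dom model labels correspond to two ways of coding a genotype as a regression predictor. The following is a minimal sketch of that coding, assuming the minor allele is the counted allele and that genotypes are given as two-character allele strings; the function name and input format are illustrative assumptions, not the format used in the study.

```python
def code_genotype(genotype, minor_allele, model):
    """Code a genotype such as "AG" as a numeric predictor.

    model "Add": number of minor-allele copies (0, 1, or 2).
    model "Dom": 1 if at least one minor-allele copy is carried, else 0.
    """
    copies = sum(1 for allele in genotype if allele == minor_allele)
    if model == "Add":
        return copies
    if model == "Dom":
        return 1 if copies > 0 else 0
    raise ValueError("model must be 'Add' or 'Dom'")
```

Under the additive coding the odds ratio is interpreted per copy of the minor allele, while under the dominant coding it compares carriers of the minor allele with non-carriers.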

4.6 Discussion

Relatively few studies have focused on the association between human genetic factors and TST negativity or RSTR phenotypes. A series of studies by Erwin Schurr’s research group has focused on the 11p14 region and its association with TST negativity. The initial study was performed in a South African population and identified a locus on 11p14, later renamed TST1, as associated with TST negativity (Cobat et al. 2009). In the same study, the 5p15 region, near the SLC6A3 gene, was found to be associated with TST negativity, which agreed with an earlier linkage analysis using a sample from our KC Study (Stein et al. 2008, Cobat et al. 2009). Then, through a segregation analysis, a study of healthy household contacts of TB cases in Colombia found that TST reactivity was linked to a codominant gene, though the responsible gene itself was not determined by their analysis (Cobat et al. 2012). Another linkage analysis followed, using results from a French household contact study of healthy household members of patients with TB disease (Cobat et al. 2015). This study identified the 11p14-15 region, near the site of the TST1 locus, as being associated with TST negativity, and then validated these results in a meta-analysis with a South African cohort. Although these two samples came from populations with very different TB endemicity, the findings replicated. It is interesting to note that the 11p13 region has also been independently associated with TB disease (Thye et al. 2012).

However, TST negativity may not be comparable to the RSTR phenotype. The RSTR phenotype presents in individuals who maintain a negative TST over a 2-year follow-up period, despite prolonged exposure to those with infectious TB disease. TST negativity or reactivity measured at a single time point may therefore not be comparable to our specific phenotype. Characterizing the RSTR phenotype requires a longitudinal study with careful data collection, so the phenotype cannot be defined in studies that collect data at only one time point. In order to study our unique phenotype, we have taken the approach of selecting genes of interest based on our collaborator’s findings. None of these genes of interest has been shown to have a genetic association with RSTR or TB disease in published analyses. In this uncharted territory, we aimed to identify those SNPs driving the experimental association with our phenotype, and which are likely to be functionally relevant to the underlying innate immune response.

In this focused analysis within 17 gene regions, none of the selected SNPs were identified as experiment-wide significant, despite being chosen based on functional annotation within genes shown experimentally to be differentially expressed between TST- individuals and TST+ individuals. Therefore, it may be more informative to compare not just RSTR vs. non-RSTR, but specifically RSTR vs. TST+. It would also be interesting to focus on the clinical and epidemiological differences between those who are RSTR and those who are TST+, though our previous study showed that an epidemiological risk score did not differ between the two groups (Ma et al. 2014). There may be other significant differences between these groups, beyond their genetic differences, such as tribe and physical characteristics. We are also currently hampered in this analysis by our sample size, particularly the number of individuals classified as RSTR, which is our phenotype of interest. As shown in Table 2, under the Omni5 heading, only 7% of our sample was classified as RSTR. Though we would prefer to have more individuals with this phenotype to study, it is neither a common phenotype nor simple to collect. Therefore, we must work with the limited sample size at hand.

What is intriguing about our SNP-level association results is the strong showing of SNPs from COLEC10. Five SNPs show an association at the p < 0.05 level, three of them at the p < 0.01 level. COLEC10 codes for a pattern recognition molecule which has been associated with the innate immune system and, as its name suggests, is involved in regulating liver function through tissue development (Ohtani et al. 2012, Axelgaard et al. 2013, Bayarri-Olmos et al. 2015). Though this gene has not previously been identified as associated with TB or mycobacterial resistance, this novel finding is of interest through its connection with human innate immune function.

DNase I hypersensitivity sites identified in CD4+ T cells within the HDAC4 gene region show an association with the RSTR phenotype. Although this association is not experiment-wide significant, it is still an interesting finding, given that of the eight SNPs in the HDAC4 gene region with a p-value < 0.05, seven were selected because they were located within DNase I hypersensitivity regions. However, because the odds ratios do not consistently indicate either a protective effect or susceptibility, further analysis is needed to understand the biological relevance. Though eleven genes within the HDAC family were included in the analysis, none of them has previously been shown to have significant associations with TB. In a recent study, IFNγ was found to activate HDAC4 both in vitro and in vivo within a mouse model, although the model used in that study was regulation of skeletal muscle cells and their associated cellular energy (Fang et al. 2016). This association of IFNγ and HDAC4 is of interest, since the pro-inflammatory cytokine IFNγ is also involved in the human immune response to Mtb infection. Further study is needed to elucidate the connection between IFNγ and its related human immune response and HDAC4 within this framework. We also note that none of the annotated SNPs selected because they were located within enhancer regions or selected as promoters were found to be associated with the RSTR phenotype at a p-value < 0.05. The majority of SNPs included in this analysis were selected either as being within DNase hypersensitive sites or as tag SNPs, as shown in Table 7.

One of the strengths of this analysis is its focus on genes identified as part of gene sets which may regulate the immune response in macrophages, when comparing TST+ individuals, or those with a latent Mtb infection, with TST- individuals, who were able to successfully prevent an Mtb infection. This focus on finding signals within genes significantly associated with the RSTR phenotype is necessary for detecting those genes which can be used to understand biological function for vaccine development. Our approach here analyzes each SNP independently. However, these SNPs are within genes which were initially identified as part of a gene set analysis and should therefore be analyzed as a set. An analysis should follow which takes into account the fact that these SNPs may be working together. A gene-level analysis, such as that facilitated by the online interface VEGAS2, may be better suited to detecting an association with our phenotype. Future analyses should also focus on the two genes which showed interesting associations with the RSTR phenotype, COLEC10 and HDAC4. Further analysis is needed to determine how these genes may contribute to the innate immune function driving the RSTR phenotype.

4.7 Gene Level Analysis

Because these SNPs were not identified singly, but rather as genes within a gene-set pathway analysis, gene-level analysis was also conducted using the 17 identified genes. Versatile Gene-based Association Study (VEGAS) is a user-friendly approach to gene-based analysis which can be applied to family-based studies, such as ours (Liu et al. 2010). A newer version of this program, VEGAS2, uses the 1000 Genomes data as a reference set instead of HapMap data and allows for better control and personalization of the analysis (Mishra and Macgregor 2015). The user specifies what percentage, from the top 1% to 100%, of SNPs within each pre-defined gene is included in the gene-based analysis. For our analysis, 100% of the SNPs were selected. The gene region itself can also be specified as ± 0 kb, ± 10 kb, ± 20 kb, or ± 50 kb outside the gene. Both the ± 0 kb and the ± 10 kb gene regions were selected and run for our analysis, to compare the results. The user can also specify one of four populations (European, Asian, American, or African) from 1000 Genomes and then further select specific sub-populations to be used as the reference for the LD structure; for our analysis, the African population and the YRI and LWK sub-populations were selected. This version also offers the option of including genes located on sex chromosomes. The gene-based p-value generated by the program is the proportion of Monte Carlo simulated test statistics that exceed the observed gene-based test statistic.
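The definition of the gene-based p-value above can be illustrated with a minimal sketch. It assumes the gene-based test statistic is the sum of per-SNP 1-df chi-square statistics and that null statistics are simulated from a multivariate normal distribution whose correlation matrix is the LD matrix of the SNPs in the gene. The function names, the toy inputs, and the hand-written Cholesky factorization are illustrative assumptions, not the VEGAS2 implementation.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix.

    Assumption: the LD matrix is positive definite (no pair of SNPs in
    perfect LD); otherwise the square root or division below fails.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def gene_level_p_value(snp_chi2, ld_matrix, n_sim=10000, seed=1):
    """Empirical gene-level p-value (hypothetical helper, not VEGAS2 code).

    snp_chi2  : per-SNP 1-df chi-square statistics for one gene
    ld_matrix : SNP-by-SNP correlation (LD) matrix for the same SNPs
    Returns the proportion of simulated gene statistics that exceed
    the observed gene statistic.
    """
    rng = random.Random(seed)
    observed = sum(snp_chi2)
    lower = cholesky(ld_matrix)
    n_snps = len(snp_chi2)
    exceed = 0
    for _ in range(n_sim):
        # Independent standard normal draws, one per SNP.
        independent = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
        # Multiply by the Cholesky factor to impose the LD correlation.
        correlated = [
            sum(lower[i][k] * independent[k] for k in range(i + 1))
            for i in range(n_snps)
        ]
        if sum(z * z for z in correlated) > observed:
            exceed += 1
    return exceed / n_sim
```

With an identity LD matrix the simulated statistic is an ordinary chi-square with as many degrees of freedom as SNPs, which gives a simple sanity check for the sketch.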

4.8 Results

For the following gene-level analyses, both the ± 0 kb and the ± 10 kb gene regions were selected and run separately, in order to compare the results. While no genes showed a gene-level p-value < 0.05, the HDAC8 gene shows a p-value of 0.057 in the analysis including SNPs within the ± 10 kb surrounding the gene. In the same analysis including only those SNPs within the gene itself, the HDAC8 gene p-value increases slightly to 0.062.

Table 9. Gene Level Analysis, including gene region ± 10kb

Gene      nSNPs   Gene-level p-value   Top SNP       Top SNP p-value
ABL1      11      0.146                rs4740366     0.024
ABL2      2       0.252                rs12130276    0.141
COLEC10   7       0.704                rs16891948    0.294
HDAC2     3       0.657                rs2025191     0.317
HDAC3     3       0.378                rs11741808    0.187
HDAC4     76      0.261                rs4851972     0.011
HDAC5     6       0.758                rs228757      0.315
HDAC7     3       0.821                rs2544030     0.351
HDAC8     4       0.057                rs12690254    0.028
HDAC9     55      0.284                rs10224196    4.15E-03
HDAC10    4       0.616                rs114219302   0.195
HDAC11    3       0.097                rs2290194     0.056
NTM       100     0.256                rs1106363     4.21E-03
VIT       12      0.918                rs17019851    0.149

nSNPs provides the number of SNPs included in the analysis for each gene. Top SNP gives the rs number for the SNP within each gene with the lowest associated p-value, which is presented in the Top SNP p-value column.

Table 10. Gene Level Analysis, including gene region ± 0kb

Gene      nSNPs   Gene-level p-value   Top SNP       Top SNP p-value
ABL1      10      0.112                rs4740366     0.024
ABL2      2       0.288                rs12130276    0.141
COLEC10   5       0.624                rs16891948    0.294
HDAC2     3       0.662                rs2025191     0.317
HDAC3     3       0.374                rs11741808    0.187
HDAC4     76      0.283                rs4851972     0.011
HDAC5     5       0.689                rs228757      0.315
HDAC7     3       0.786                rs2544030     0.351
HDAC8     4       0.062                rs12690254    0.028
HDAC9     55      0.329                rs10224196    4.15E-03
HDAC10    3       0.409                rs114219302   0.195
HDAC11    3       0.095                rs2290194     0.056
NTM       100     0.252                rs1106363     4.21E-03
VIT       12      0.932                rs17019851    0.149

nSNPs provides the number of SNPs included in the analysis for each gene. Top SNP gives the rs number for the SNP within each gene with the lowest associated p-value, which is presented in the Top SNP p-value column.

4.9 Conclusions

What is most interesting about this gene-level analysis is also the most disappointing; this analysis did not find a significant association between any of the identified genes and the RSTR phenotype. While it may be possible that there truly is no association between these genes and our phenotype of interest, there may in fact be a different approach needed to identify this association.

VEGAS2 was designed to utilize the p-values of a completed genome-wide association analysis and calculate a gene-level p-value to determine the overall association of the SNPs within each gene with the given phenotype. The genes are defined by the program itself, given the SNPs provided and the selected gene region size.

We note that in neither gene size analysis (± 0 or ± 10 kb) was HDAC1, HDAC6, or LCN12 identified as a gene, because each of these genes contained only one SNP. VEGAS2 did recognize all other genes, each of which contained two or more SNPs. A further limitation is that VEGAS2 was intended for use with GWAS results, whereas we utilize a limited number of annotated SNPs; there may have been too few SNPs in our selected genes for this particular analysis to have meaningful results.

Moreover, not all SNPs which were used as input for VEGAS2 were actually utilized by the program in calculating the gene-level statistics. Notably, though COLEC10 showed promise in the SNP-level analysis, with three SNPs showing p-values < 0.01, none of these SNPs were selected as part of the COLEC10 gene region in either gene region size analysis. Of the initial 16 SNPs within the COLEC10 gene used as input for VEGAS2, only 5 were used in the gene region ± 0 kb and only 7 were used in the gene region ± 10 kb to calculate the gene-level statistics. This may be because, though many of these SNPs were identified as Tag SNPs based on the HapMap data, VEGAS2 uses the 1000 Genomes Project as its reference for building the genes.

Excluding individual SNPs from the gene-level analysis is inappropriate when calculating the gene-level statistic, where all SNPs are needed to contribute to the overall significance of the gene. Therefore a method should be pursued which allows for the inclusion of all SNPs within each gene region.

For any future studies, it would be beneficial to repeat both the SNP-based regression and the gene-level analysis with the RSTR phenotype given the complete set of SNPs within the gene regions of interest, as opposed to restricting to just those SNPs which have been annotated as potentially functional. These findings could also be used to develop cellular and immunology experiments.

Another proposed method for analyzing gene-level association, specifically using family data with a binary outcome, is the method implemented in the new gskat package within R. This method uses GEE, as described in Section 3.4, and the SNP-set kernel association test (SKAT), as opposed to the Monte Carlo simulation model utilized by VEGAS2. GEE can take into account the relatedness among family data in a genetic association analysis, while SKAT combines multiple SNP-level statistics into a single SNP-set level statistic. This method allows the user to specify exactly which SNPs should be included for analysis within a given region, unlike the automatic approach in VEGAS2, which led to the unwanted exclusion of SNPs in the calculation of several gene-level statistics. As opposed to using the 1000 Genomes African data as a reference, it appears that HapMap data or the user’s own sample data can be used to calculate the correlation among the related participants. An ideal method would utilize the existing data to calculate and correct for any correlations within the data.
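To make concrete how a SNP-set test combines per-SNP statistics into one set-level statistic, the following is a minimal sketch of a SKAT-style weighted sum of squared score statistics for a binary outcome. It is an illustration under strong simplifying assumptions: an intercept-only null model, user-supplied weights, and a permutation p-value that treats individuals as unrelated. It does not reproduce the gskat package or its GEE-based handling of family data, and all function names are hypothetical.

```python
import random


def skat_q(phenotype, genotypes, weights):
    """SKAT-style statistic Q = sum_j w_j * S_j**2.

    phenotype : list of 0/1 outcomes, one per individual
    genotypes : list of rows, one per individual, each a list of
                minor-allele counts (0, 1, 2) for the SNPs in the set
    weights   : one non-negative weight per SNP (assumed supplied by the user)
    S_j is the score for SNP j under an intercept-only null model:
    S_j = sum_i g_ij * (y_i - mean(y)).
    """
    mean_y = sum(phenotype) / len(phenotype)
    residuals = [y - mean_y for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += w * score * score
    return q


def permutation_p_value(phenotype, genotypes, weights, n_perm=2000, seed=1):
    """Permutation p-value for Q.

    Assumption: individuals are exchangeable (unrelated). This does NOT
    hold for family data, which is why a GEE-based method is needed there.
    """
    rng = random.Random(seed)
    observed = skat_q(phenotype, genotypes, weights)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(shuffled, genotypes, weights) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because every SNP passed in contributes to Q, this kind of statistic makes the set of included SNPs explicit, which is the property contrasted with VEGAS2 above.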

Chapter 5: SEM and IFNγ Immune Response

5.1 Introduction

The work described in this chapter stems from the proposed Aim 3 (see Section 1.2). I have used SEM to identify links between the IFNγ response and biological factors associated with TB disease. SEM is a hypothesis-based approach: we did not explore the data simply to generate a hypothesis of how variables relate to one another. Rather, this analysis uses as its basis a hypothesis grounded in the biological interactions of the host, the pathogen, and the resulting immune response, which here are, respectively, humans, Mtb, and the IFNγ response.

The simplified biological model for our hypothesis is illustrated in Figure 2, and described in Section 2.5. This figure illustrates the biological functions that occur in response to the detection of Mtb in the human body and the most important players in these interactions. We hypothesize that the IFNγ immune response is regulated by human genetic variants and also influenced by the Mtb lineage present.

5.2 Study population

Data used in the following analyses were gathered from two phases of a household contact study conducted in Kampala, Uganda. Data from the Household Contact Study were collected from 1995-1999 (Guwatudde et al. 2003), while data from the Kawempe Community Health Study (KC Study) were collected from 2002-2009 (Stein et al. 2013). Please refer to Section 3.2 for a description of the data collection.

5.3 Human Genotyping

We included individuals from the KC Study for this analysis with available genotypes for the 546 haplotype tagging SNPs described within Section 3.3, and those individuals with available genotypes within the Omni5 dataset, as described in Section 4.3; refer to those respective sections for information on data collection relating to the SNP data.

5.4 Immunological Data

As previously described (Section 2.5), IFNγ is involved in the pathogenesis of TB, and is an important focus for future drug and vaccine development for TB. In our data, immune response was measured by the level of IFNγ produced in response to various culture filtrates after incubation with patient blood samples. Blood samples were collected from each patient, as previously described (Li et al. 2012, Mahan et al. 2012). Blood samples were incubated for 7 days with a culture filtrate containing surface proteins of Mtb and other antigens (CXFT), as well as with specific antigens including Antigen-85B (Ag85B) and early secretory antigen (ESAT)-6. IFNγ in the supernatant was measured using ELISA (Cambridge Bioscience). In addition, blood samples were incubated without stimulus for 7 days and IFNγ was then measured; this was used as a control. The control IFNγ value was subtracted from the IFNγ value after stimulation with the antigens described above, and this difference was used for the analysis. When the control value was greater than the stimulated value, the difference was Winsorized to zero. Because of severe skewness of the data, values were log-transformed prior to further analysis.
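The preprocessing steps just described (control subtraction, setting negative differences to zero, log transformation) can be sketched as follows. The function names are hypothetical, and the offset of 1 added before taking the logarithm is an assumption made here so that zero values remain defined; the text does not state how zeros were handled in the transformation.

```python
import math


def net_ifng_response(stimulated, control):
    """Antigen-stimulated IFN-gamma minus the unstimulated control.

    Differences below zero (control greater than stimulated) are set to zero.
    """
    return max(stimulated - control, 0.0)


def log_transform(value, offset=1.0):
    """Natural log transform to reduce right skew.

    Assumption: an offset of 1 is added so that a net response of zero
    maps to zero instead of being undefined.
    """
    return math.log(value + offset)


def preprocess(stimulated_values, control_values):
    """Apply both steps to paired lists of stimulated and control readings."""
    return [
        log_transform(net_ifng_response(s, c))
        for s, c in zip(stimulated_values, control_values)
    ]
```

For example, `preprocess([120.0, 5.0], [20.0, 20.0])` subtracts each control reading, floors the second difference at zero, and returns the two transformed values.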

5.5 Mtb Genotyping

Mtb genotyping was performed by Dr. Edward Wampande in the laboratory of Dr. Moses Joloba in Kampala, Uganda, under the direction of Dr. Sebastien Gagneux. The methods used in extracting and genotyping the Mtb are described in detail elsewhere (Wampande et al. 2013). The Mtb lineage was determined by using the DNA in a real-time PCR assay on a Roche LightCycler machine. Primers and probes with SNPs specific to the Mtb genome were used to assign the isolate to a specific Mtb lineage based on previous phylogenetic analysis (Coscolla and Gagneux 2010). Descriptions of the Mtb lineages included in this analysis are provided in Section 2.6. Samples of Euro-American Ugandan sublineage (EAU) and Central Asian lineage (CAS) DNA were used as positive controls on the assays. Mtb lineages could only be identified in individuals who had active TB disease, since they were the only ones who could produce a sputum sample from which Mtb could be obtained. For the purposes of this analysis, only those individuals with an active case of TB disease were assigned an Mtb lineage and subsequently included in analysis.

5.6 Structural Equation Modeling

Structural equation modeling (SEM) is a hypothesis-driven approach, where we start with a hypothesized biological model and test the model fit using a system of equations to model the data. We use SEM to test our hypothesized model, representing both latent and observed variables through a series of equations, referred to as structural and measurement equations. These equations represent the interactions of the observed and unobserved, or latent, variables. Through SEM, we can include more than one outcome variable at once and can model complex relationships between variables, accounting for causal relationships between variables as well as errors associated with measuring those variables. This is better suited to what we wish to study here than regression on a single outcome variable. We have utilized SNP data and Mtb lineage information in order to understand how each of these contributes to antigen-induced IFNγ immune response, within the context of our household contact study. IFNγ immune response was treated as a latent variable, that is, a variable that is not directly measured, but is indirectly measured through other variables. The latent variable of IFNγ immune response is intended as a proxy for the human immune response to Mtb. We strive to understand how other variables, such as human SNPs and Mtb lineage, may be related to IFNγ immune response, and to apply that knowledge to understand how this may affect an individual’s ability to fight off infection. We want to know, in our sample, what factors impact the immune response, using the various types of available data.

5.7 Statistical Analysis

In order to select the genes of interest for analysis, a preliminary linear regression association analysis was conducted on the set of Tag SNPs described in Section 3.3, with IFNγ production, as described in Section 5.4, as the outcome. For those genes where multiple SNPs were found to be significantly associated with IFNγ production, a single SNP was selected to represent that gene in the final SEM analysis, based on the strength of the association in the preliminary analysis (see Table 12). Four additional SNPs were also included in the SEM analysis based on the strength of their association with TB disease in the logistic regression model described in Section 3.6.

Two sets of equations for each model tested were created: measurement equations and structural equations. These equations and all necessary data for each model were read into Strum, a user-friendly R package which allows for structural equation modeling that can take pedigree information into account, facilitating the use of family data (Morris et al. 2010). The framework utilized in Strum allows for latent measures and covariates within a structural model, allows for genetic association testing, and can also conduct family-based association analyses. Using structural equation modeling through Strum is ideal for our TB household contact study data. Because TB phenotypes typically display no obvious Mendelian inheritance patterns, though TB disease has shown familial aggregation, we understand that the disease manifestation and host immune response to infection are a result of various interconnecting factors of environment and genetics.

Within Strum’s framework of structural equation modeling, we have been able to assess all relationships within our data simultaneously. This is important, as we desired to integrate the immunological, microbiological, and genetic data available to analyze and understand their combined effects on TB. Strum is also unique in that it utilizes both a measurement equation and a structural equation with pedigree data, as opposed to previous approaches that have included only a measurement equation (Eaves et al. 1996, Dolan and Boomsma 1998, Bauman et al. 2005) or only a structural equation with no explicit measurement equation (Todorov et al. 1998, Gianola and Sorensen 2004).

To build the final model, we first tested the hypothesis that human host genetic variants affect IFNγ response. IFNγ response was modeled as a latent variable, with the measured IFNγ levels produced by incubation with three different antigens as observed endogenous indicator variables. The human genetic component was modeled through the SNPs of interest identified in the preliminary association regression analysis, entered as observed exogenous variables. Next, we included Mtb lineage in the model. IFNγ production levels and human host genetics remained in the model as determined by their significance to the model. Also note that the models included the polygenic genetic effect, p, which was calculated by Strum based on the pedigree structure present in the data, and which accounts for the relatedness within this household contact study data.

The term error included in the equations represents variance in the model, which may include independent environmental and/or unmeasured genetic variance. The results of these Strum analyses include information on model fit, including fitted parameter values, standard errors, confidence intervals, and p-values, as well as the χ2 index of fit and associated p-value. The information on the model fit for each iteration of the model informed any decision on how to modify the model to allow for the best fit. Coefficients which were found not to be statistically significant were removed from the model, and the model was re-tested for best fit. It was through this process of model fitting and re-fitting that we intended to identify the factors contributing to IFNγ response in our study sample. The proposed and final models are presented below (Figures 4 & 5).

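Figure 4. Proposed modeling of IFNγ immune response, with antigen-induced IFNγ measures, Mtb lineage and human genetic information through SNPs, with corresponding equations.

[Path diagram with nodes: SNPs, CAS, EAU, EA, p, IFNγ, CXFT, ESAT6, AG85B]

\[
\begin{pmatrix} \mathit{CXFT} \\ \mathit{ESAT6} \\ \mathit{Ag85B} \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mathit{IFNg} + \mathit{error}
\]

\[
\mathit{IFNg} = \begin{pmatrix} \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 \end{pmatrix}
\begin{pmatrix} \mathit{SNPs} \\ \mathit{CAS} \\ \mathit{EAU} \\ \mathit{EA} \end{pmatrix}
+ p + \mathit{error}
\]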

5.8 Structural Equation Modeling Results

In this Strum analysis, a sample of 1407 individuals from these household contact studies was used (Table 11) to model IFNγ response to Mtb as a latent variable, signifying a key component of the human immune response to Mtb.

In the preliminary analysis to select SNPs for inclusion in the Strum analysis, SNPs from four genes were shown to have a significant association with IFNγ response (p-values < 0.001). The single most significantly associated SNP within each of the four associated immune response genes (IL6, TICAM2, TLR4, TLR9) was then selected for inclusion in the Strum analysis. Four SNPs representing the immune response-involved genes IL4, IL1B, NOD1, and TOLLIP were also included in the analysis based on their strong association with TB disease found through our previous work (Hall et al. 2015). These SNPs and their associated p-values from the preliminary work are presented in Table 12.

Eight SNPs within human immunity genes, the three main lineages of Mtb found in our Ugandan study sample, and production levels of IFNγ in response to various Mtb-specific antigens were included in the initial model. These model predictors were then removed using backwards stepwise elimination until a model of best fit was identified, based on the comparative fit index and the theoretically corrected chi-squared p-value. Predictors were removed from the model based on the strength of their associated p-values; a predictor that was not statistically significant at the p < 0.05 level was removed. The final model, as shown in Figure 5, therefore included only three SNPs, the three main lineages of Mtb found in our Ugandan study sample, with the EA lineage used as the reference, and two Mtb-specific antigens, ESAT6 and AG85B.
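The backwards stepwise elimination described above can be sketched generically. This is an illustration under stated assumptions, not the procedure as run in Strum: `fit` is a hypothetical callback that refits the model for a given predictor list and returns a p-value per predictor, and `keep` is a hypothetical option for predictors retained regardless of significance. The p-values in the usage note are made up for illustration only.

```python
def backward_eliminate(predictors, fit, alpha=0.05, keep=()):
    """Generic backward elimination on p-values (illustrative sketch).

    predictors : list of predictor names in the starting model
    fit        : callable taking a list of names and returning a dict
                 {name: p_value} from refitting the model (hypothetical)
    alpha      : significance threshold for retaining a predictor
    keep       : names that are never removed (assumption, e.g. design
                 variables retained on subject-matter grounds)
    Returns the list of predictors remaining in the final model.
    """
    current = list(predictors)
    while True:
        candidates = [name for name in current if name not in keep]
        if not candidates:
            return current
        # Refit with the current predictors and find the weakest candidate.
        p_values = fit(current)
        worst = max(candidates, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            return current
        current.remove(worst)
```

A toy call such as `backward_eliminate(["a", "b", "c"], lambda names: {"a": 0.01, "b": 0.40, "c": 0.20})` drops `b`, then `c`, and returns `["a"]`. In practice the refit would also be judged on overall fit indices, as the text describes.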

Strum was used to comparatively fit models to identify those predictors independently associated with IFNγ response and to select the model of best fit, using the comparative fit index provided within the Strum output to compare the models. The most parsimonious model had a comparative fit index score of 0.99, which indicates a very well-fitted model, and a theoretically corrected chi-square p-value of 0.69, also indicating a good fit. A significant association was identified between IFNγ response and a SNP in TICAM2 (p-value = 0.008), which we previously identified as associated with TB. Antigens ESAT6 (p-value = 0.013) and AG85B (p-value = 0.006) were statistically significant contributors to modeling IFNγ immune response. None of the Mtb lineages showed a significant association with IFNγ immune response, and the polygenic genetic effect was also not significant.

Table 11. Sample Demographics

Total number of individuals (n)    1406
Families                           840
TB +                               448 (32%)
Mtb Lineage
  Euro-American Ugandan            214 (48%)
  Euro-American                    79 (18%)
  Central Asian Strain             41 (9%)

Table 12. SNPs from Genes of Interest included in Analysis

SNP          Gene     Function   Association   p-Value
rs2243270    IL4      intron     TB            6.93E-03
rs1143633    IL1B     intron     TB            4.25E-05
rs2069842    IL6      intron     AG85B         8.08E-04
rs2970499    NOD1     intron     TB            9.85E-03
rs12109315   TICAM2   intron     AG85B         1.63E-03
rs11536877   TLR4     intron     ESAT6         2.08E-03
rs352140     TLR9     coding     AG85B         2.94E-03
rs5743867    TOLLIP   intron     TB            6.83E-03

The Association column identifies which outcome the SNP of interest was most highly associated with, either TB disease, or IFNγ production in response to incubation with AG85B or ESAT6. The p-Value column, therefore, provides the p-value which results from this association.

Table 13. Results of Structural Equation Modeling with Illumina10k and Omni5 Data

Model Parameters               Estimate   95% CI           p-value
AG85B                          1.367      (0.4, 2.34)      5.70E-03 *
TICAM2 - rs12109315            -1.465     (-2.54, -0.38)   7.87E-03 *
ESAT6                          0.793      (0.17, 1.42)     0.013 *
NOD1 - rs2970499               0.370      (-0.04, 0.78)    0.077
CAS                            -0.369     (-0.86, 0.12)    0.138
Ugandan                        -0.249     (-0.67, 0.17)    0.244
TOLLIP - rs5743867             -0.115     (-0.38, 0.15)    0.401
polygenic genetic effect (P)   0.237      (-0.54, 0.54)    0.175

* indicates p-value < 0.05. The Estimate column provides an estimate of the contribution of each predictor to modeling the latent variable of IFNγ response, though it should not be overinterpreted; rather it is the direction of this estimate that is most informative.
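As a reading aid for Table 13, the sketch below shows how a standard error, z statistic, and two-sided p-value can be approximately recovered from an estimate and its 95% confidence interval. It assumes a symmetric Wald-type interval based on the normal distribution, which is an assumption made here and not a statement about how Strum computes its output; the function name is hypothetical, and because the tabulated limits are rounded only approximate agreement with the tabulated p-values should be expected.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover SE, z, and two-sided p-value from a symmetric 95% CI.

    Assumption: the interval is estimate +/- z_crit * SE under a normal
    approximation. Returns (standard_error, z, p_value).
    """
    standard_error = (upper - lower) / (2.0 * z_crit)
    z = estimate / standard_error
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return standard_error, z, p_value


# Inputs taken from the AG85B and TICAM2 rows of Table 13.
se_ag85b, z_ag85b, p_ag85b = wald_from_ci(1.367, 0.4, 2.34)
se_ticam2, z_ticam2, p_ticam2 = wald_from_ci(-1.465, -2.54, -0.38)
```

Under these assumptions the recovered p-values for the AG85B and TICAM2 rows fall close to the tabulated values, which suggests the reported intervals and p-values are mutually consistent.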

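Figure 5. Final Model of IFNγ immune response, with antigen-induced IFNγ measures, Mtb lineage and human genetic information through SNPs, with corresponding equations

[Path diagram with nodes: CAS, EAU, EA, TICAM2 SNP, NOD1 SNP, TOLLIP SNP, p, IFNγ, ESAT6, AG85B]

\[
\begin{pmatrix} \mathit{ESAT6} \\ \mathit{Ag85B} \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \end{pmatrix} \mathit{IFNg} + \mathit{error}
\]

\[
\mathit{IFNg} = \begin{pmatrix} \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 \end{pmatrix}
\begin{pmatrix} \mathit{SNPs} \\ \mathit{CAS} \\ \mathit{EAU} \\ \mathit{EA} \end{pmatrix}
+ p + \mathit{error}
\]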

5.9 Discussion

In building a model to appropriately describe the IFNγ immune response, we used the three types of collected information that we believe to be most important. These data were collected and included in the model because there was already a strong basis for their contribution to the host immune response to Mtb. From this final model, we conclude that a SNP within TICAM2 and IFNγ production in response to AG85B and ESAT6 contribute significantly to IFNγ immune response, while a SNP within NOD1 shows a suggestive contribution (p = 0.077).

The association with a SNP in TICAM2 is interesting given that it was also identified within Aim 1 as significantly associated with TB disease. That finding was novel, and to our knowledge, TICAM2 had not previously been found to be associated with TB disease. TICAM2 has been shown to be involved in both the TLR4-mediated pro-inflammatory cytokine production pathway and the TLR2 signaling pathway for IFN-β production (Yamamoto et al. 2003, Ohnishi et al. 2012).

NOD1 was identified as a gene of interest after several SNPs within this gene showed suggestive association with TB disease in our earlier study focusing on candidate genes (Hall et al. 2015). This was the first known association of SNPs within NOD1 and TB disease. Following our initial publication of results showing association of TB and RSTR with NOD1 SNPs, two more recent studies have investigated the role of NOD1 within macrophages. The first study describes the role of NOD1 within human alveolar macrophages and monocyte-derived macrophages after experimentally induced Mtb infection (Juarez et al. 2014). The authors concluded that NOD1 is involved within both types of macrophage, with potential for killing Mtb within a human host. Another study followed, focusing on the role of NOD1 in cytokine production within macrophages derived from murine bone marrow after experimentally induced Mtb infection (Lee et al. 2016). The authors concluded that NOD1 works in concert with NOD2 and TLRs in producing cytokines within macrophages in the mouse model. These studies support the view that NOD1, along with NOD2, is important in the immune response to Mtb. Though this analysis did not reveal a strong association with the NOD1 SNP, these studies support the need for further study of the association between NOD1 and TB. Because we limited our analysis to the top SNPs from these eight genes, we may have missed a SNP which is involved in the IFNγ response. It may be useful to utilize a GWAS approach to find SNPs which are associated with IFNγ production and, in turn, involved in the human host’s IFNγ response. Unfortunately, this was not possible in our data, given that fewer than 20 individuals included in the Omni5 genotyping also had available data for IFNγ production. Utilizing the literature to identify any SNPs or genes shown to be associated with IFNγ production, or hypothesized to be involved in the overall IFNγ response, is another valid method to determine which SNPs could be included in this SEM analysis.

IFNγ production in response to incubation with the antigens ESAT6 (p = 0.012) and AG85B (p = 0.006) also contributed significantly to the model. This is not surprising given that these variables are a direct measure of IFNγ production in response to stimulation by the given antigens. It is somewhat surprising that CXFT was not an important indicator of the latent IFNγ variable, because CXFT responses are much more robust. On the other hand, ESAT6 and AG85B are TB-specific antigens, and therefore may be more appropriate to our model.

It is also important to consider the contribution of Mtb lineages to the immune response. Lineage, though not providing strong evidence of influencing the model here, has been shown to have a significant effect on immune response. Though earlier research does not conclusively show a difference in virulence among lineages (Wampande et al. 2013), the Beijing (East Asian or Lineage 2) lineage is described as being particularly virulent (Nicol and Wilkinson 2008). More recent studies have described Mtb lineage-specific differences in cytokine induction and TLR2 and TLR4 activation (Sarkar et al. 2012, Carmona et al. 2013). In addition, a review of recent literature concluded that, of the seven worldwide lineages of Mtb, the East Asian lineage (which contains the Beijing strain) and the Euro-American lineage, both of which are more widespread globally, also tend to be more virulent than lineages that appear more geographically constrained (Coscolla and Gagneux 2014). A set of genes has been found to be differentially regulated in macrophages infected in vitro with mycobacteria, including Mtb (Blischak et al. 2015). The response among the varying strains of Mtb studied (a common laboratory strain, heat-inactivated Mtb, and a laboratory-identified virulent strain) was similar, and the CMAS gene was found to be associated with Mtb infection. Again, this study reinforces the idea that Mtb infections induce an innate immune response that is also associated with human genetics.

On the other hand, a previous study of our same study population investigating the virulence of the three main lineages did not find a significant difference in virulence as measured by cavitary disease (Wampande et al. 2013). Our previous work did show that in those who were TST-negative at baseline (that is, not infected with Mtb), exposure to CAS had a significant association (p = 0.007) with reduced IFNγ production in response to incubation with ESAT6. However, in our structural equation analysis, we included Mtb lineage only for those who had been diagnosed with culture-confirmed TB disease. In our final model, none of the lineages contributed significantly to IFNγ immune response. This is consistent with the finding that there is no difference in virulence among the three lineages, at least among those with active TB disease, as it relates to IFNγ immune response.

However, Mtb lineage remains in our final model for two main reasons. First, as just described, previous studies have shown that it affects the host’s immune response. Second, the final model with the present variables was chosen because it is the most parsimonious model. That is, removal of any further variables, even those that do not appear to contribute significantly, lowers the overall fit of the model in describing our latent variable, as measured by the χ² index of fit and its associated p-value. If the strength of our model lies in the variety of available data used to fit the IFNγ immune response, we should not weaken the model.
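One common way to check whether dropping a variable worsens model fit is a χ² difference test between nested models. The sketch below is illustrative only: it is not necessarily the procedure or software used in this analysis, the fit statistics are hypothetical, and the function names are invented for the example. It assumes the two models differ by exactly one free parameter, so the difference has one degree of freedom.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    A chi-square variable with 1 df is the square of a standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_difference_test_df1(chi2_reduced, chi2_full):
    """Compare two nested models that differ by exactly one free parameter.

    chi2_reduced: fit statistic of the model with one variable removed.
    chi2_full:    fit statistic of the model that keeps the variable.
    Returns the chi-square difference and its p-value.
    """
    delta = chi2_reduced - chi2_full
    if delta < 0:
        raise ValueError("the reduced model cannot fit better than the full model")
    return delta, chi2_sf_df1(delta)


# Hypothetical fit statistics, for illustration only (not study results).
delta, p_value = chi2_difference_test_df1(chi2_reduced=18.3, chi2_full=12.1)
print(f"chi-square difference = {delta:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest that removing the variable significantly worsens fit, which is one argument for retaining it even when its individual path is not significant.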

This analysis allows for a look at the bigger biological picture of immune response to Mtb in a human host, helping us understand how each of three factors (human genetics, Mtb lineage, and antigen-induced response) contributes overall. IFNγ immune response as a latent variable is not something that we have directly measured. Rather, it represents the total production and effect of IFNγ within a human host during the immune response to the introduction of Mtb. The main limitation follows not from the method, but from the limited data available to model this complex biological process. It is important to note the limitations of our available data.

Though we intended to use data from over 1400 individuals for this analysis, not all individuals had full information available for all variables included in the analysis. The Mtb lineage was not genotyped for most of the individuals in the Omni5 analysis, and conversely, not all individuals who had samples tested for IFNγ production were genotyped. Specifically, only those individuals with full data available for Mtb lineage and SNP genotypes were included. In follow-up analyses, an option is available to include individuals who were previously excluded because of missing values for certain variables. This option allows for pairwise analysis of predictors, as opposed to joint analysis of all predictors at once. In the joint analysis, an individual is removed if the value of any one predictor is missing. Pairwise analysis offers more flexibility, because an individual with missing values for some predictors can still be used.
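The difference between joint (listwise) and pairwise handling of missing values can be illustrated with a small sketch. The records, variable names, and helper functions below are hypothetical; they are not taken from the study data or from the SEM software used. The example only counts how many individuals each strategy retains.

```python
from itertools import combinations

# Hypothetical toy records; None marks a missing value.
records = [
    {"snp": 0, "lineage": 1, "ifng_esat6": 2.1},
    {"snp": 1, "lineage": None, "ifng_esat6": 3.4},
    {"snp": 2, "lineage": 0, "ifng_esat6": None},
    {"snp": 1, "lineage": 1, "ifng_esat6": 2.9},
    {"snp": None, "lineage": 0, "ifng_esat6": 1.8},
]
variables = ["snp", "lineage", "ifng_esat6"]


def listwise_complete(rows, names):
    """Joint analysis: keep only rows in which every variable is observed."""
    return [r for r in rows if all(r[n] is not None for n in names)]


def pairwise_counts(rows, names):
    """Pairwise analysis: count the rows usable for each pair of variables."""
    return {
        (a, b): sum(1 for r in rows if r[a] is not None and r[b] is not None)
        for a, b in combinations(names, 2)
    }


n_listwise = len(listwise_complete(records, variables))
n_pairwise = pairwise_counts(records, variables)

print("individuals retained in joint analysis:", n_listwise)
for pair, n in n_pairwise.items():
    print("individuals usable for pair", pair, ":", n)
```

In this toy data set every pair of variables retains more individuals than the joint analysis does, which is the flexibility described above. The trade-off is that each pairwise statistic is then based on a different subset of individuals.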

Biological factors for this model were chosen to clearly model and support our biological hypothesis. Future analyses using structural equation modeling may be able to incorporate other biologically relevant variables that are known to affect immune response, such as age, sex, presence of BCG scar, and known HIV status, as well as other known contributors to IFNγ immune response, such as TNFα. Larger and more complex models, which may involve multiple interactions, may produce findings that are more challenging to interpret, and this should be taken into account when selecting variables to include in future analyses.

Chapter 6: Discussion and Conclusions

Throughout this research, my goals have mirrored the mission of Case’s TBRU: to work internationally to identify new approaches to prevention, diagnosis, and treatment of TB in areas of the world where it is most common. This research has been made possible through the resources of the TBRU, and specifically the KC study, which enrolled study participants and collected their associated data in Kampala, Uganda. This international work focuses on an area of high disease burden and acute need.

Uganda is of interest as a country where TB is endemic; in 2014, its TB incidence rate was 161 per 100,000 population (WHO 2015). My work responds to the pressing need for better treatments and more effective forms of prevention, such as vaccines. We start at the level of human genetics in an effort to discover the genes involved in the human response to TB disease, either in susceptibility to disease or in resistance to the manifestation of infection with Mtb. We believe that the existence of the RSTR phenotype, which so far only we have been able to collect and characterize given our longitudinal data collection strategy, is a result of an atypical and improved innate immune response. Previous work has shown that this phenotype is not simply a result of some individuals having a reduced or otherwise different level of exposure (Ma et al. 2014).

Our analyses focusing on the RSTR phenotype have identified genes with suggestive associations: NOD1, NOD2, TICAM2, COLEC10 and HDAC4. Interestingly, of these genes, NOD1, NOD2, and TICAM2 have also shown association with TB disease. Analyses of TB disease showed associations with IL1B and TICAM2 in the candidate gene analysis. However, the lack of replication for these SNPs in the validation study sample may point to underlying differences between the initial study sample and the validation study sample, consisting of those individuals genotyped on the Illumina10k microarray and the Omni5 microarray, respectively. When these two samples were compared by age, sex, and HIV status, as shown in Table 5, the only difference between the initial and validation study samples was in HIV status among the individuals with TB disease. Further analysis will follow up on other possible sample differences, such as clinical characteristics and tribe distribution.

A NOD1 SNP also appeared to contribute to the SEM model of the IFNγ immune response. Involvement in the IFNγ immune response would go hand in hand with a possible link to resistance to disease via the RSTR phenotype. This is a novel finding, as these analyses show the first known associations of the NOD1 gene with TB disease or resistance to Mtb infection. Following our initial publication of results with association of TB and RSTR with NOD1 SNPs, two more recent studies have investigated the role of NOD1 within macrophages. The first study describes the role of NOD1 within human alveolar macrophages and monocyte-derived macrophages after experimentally induced Mtb infection (Juarez et al. 2014). They concluded that NOD1 is involved within both types of macrophage, with potential for killing Mtb within a human host. Another study followed, focusing on the role of NOD1 regarding cytokine production within macrophages derived from murine bone marrow after experimentally induced Mtb infection (Lee et al. 2016). They concluded that NOD1 works in concert with NOD2 and TLRs in producing cytokines within macrophages in the mouse model. Our research, along with this recent work investigating the biological function of NOD1 within macrophages, supports the need for further study of the associations between NOD1 and TB disease and resistance to Mtb infection.

Future work focusing on genetic associations with the RSTR phenotype should investigate these genes (NOD1, NOD2, TICAM2, COLEC10 and HDAC4), possibly through fine-mapping or as part of a GWAS. A complete GWAS of the RSTR phenotype may also provide interesting results. However, given the existing evidence pointing towards the 17 identified genes of interest, which were the focus of analysis in Chapter 4, it would be most useful to focus future research on the genetic variation within these gene regions. One approach would be to expand the inclusion of SNPs within these gene regions, both for the SNP-level association analysis and in calculating gene-level statistics. Pathway analysis including our genes of interest may also provide further evidence of association with the RSTR phenotype. Further gene expression and functional experiments could also elucidate the role of these genes in resistance to Mtb infection.

Once a strong connection with a genetic basis for the RSTR phenotype is established, the biological function should be investigated, with the end goal being identification of a biological focus for vaccine development.

Though our study may be the first to be able to characterize and collect this phenotype, the phenotype almost certainly exists in other populations. The replication of these genetic associations within other populations would confirm their importance in the development of the phenotype and provide a focus for possible vaccine development. It would also be interesting to identify this phenotype in populations outside of Uganda where TB is endemic, particularly those populations which may be exposed to different Mtb lineages, such as the more virulent Beijing strain (Nicol and Wilkinson 2008).

Within our study population and beyond, it is also of interest to confirm that this phenotype persists beyond the initial two years of follow-up. We noted an apparent difference in the age distribution between RSTRs and non-RSTRs. This difference is statistically significant in both the sample of individuals genotyped using the Illumina10k microarray and the sample genotyped using the Omni5 microarray, as shown in Table 6. To adjust for possible age-related confounding, we used the binary age variable in our regression analyses. The cutoff for children aged ≤10 years was based on reports of age-specific genetic effects for TB, differences in immune responses of children compared with adults, and unique epidemiological risk profiles for Mtb infection in children (Leung et al. 2007, Lewinsohn and Lewinsohn 2008, Lewinsohn et al. 2008, Grant et al. 2013, Ma et al. 2014).

Areas of future improvement for TB control include improved prevention of TB disease through the development of a more effective vaccine, and faster-acting, effective TB treatments (Abel et al. 2014). The use of antibiotics is only one part of the greater goal of TB control and eventual elimination. However, it is clear that this is one part that can be improved. Among the many issues with currently available TB treatments is that treatment is long, typically six months of daily medications. Recent work in the development of new treatments has led to the first release of an FDA-approved drug to treat TB in over four decades (Zuniga et al. 2015). This drug does have significant side effects, and is therefore recommended primarily for those with MDR-TB, who cannot otherwise benefit from the traditional first-line drugs. If we wish to protect people from developing Mtb infection leading to TB disease, we need to be able to harness the power of the innate immune response, as found in those who naturally resist Mtb infection, the RSTRs.

The results of studies on the association of genetics and TB disease, and especially resistance to TB, can help identify potential drug targets, supplying fodder for drug development. With this research, we aim to shed light on the mechanisms underlying TB disease susceptibility and resistance. Any new knowledge regarding the genetic basis for TB resistance can also be used towards the development of a new vaccine. As the current BCG vaccine does not protect against TB reliably in adults, the development of a more effective and reliable vaccine is sorely needed, as evidenced by the multiple new preventive TB vaccine candidates currently being developed and tested (Kaufmann et al. 2015).

Title: Polymorphisms in TICAM2 and IL1B are associated with TB
Author: N B Hall, R P Igo, L L Malone, B Truitt, A Schnell et al.
Publication: Genes and Immunity
Publisher: Nature Publishing Group
Date: Dec 18, 2014
Copyright © 2014, Rights Managed by Nature Publishing Group

Author Request

Ownership of copyright in the article remains with the Authors, and provided that, when reproducing the Contribution or extracts from it, the Authors acknowledge first and reference publication in the Journal, the Authors retain the following non-exclusive rights:

a) To reproduce the Contribution in whole or in part in any printed volume (book or thesis) of which they are the author(s).

b) They and any academic institution where they work at the time may reproduce the Contribution for the purpose of course teaching.

c) To reuse figures or tables created by them and contained in the Contribution in other works created by them.

d) To post a copy of the Contribution as accepted for publication after peer review (in Word or Text format) on the Author's own web site, or the Author's institutional repository, or the Author's funding body's archive, six months after publication of the printed or online edition of the Journal, provided that they also link to the Journal article on NPG's web site (e.g. through the DOI).

NPG encourages the self-archiving of the accepted version of your manuscript in your funding agency's or institution's repository, six months after publication. This policy complements the recently announced policies of the US National Institutes of Health, Wellcome Trust and other research funding bodies around the world. NPG recognises the efforts of funding bodies to increase access to the research they fund, and we strongly encourage authors to participate in such efforts.

Authors wishing to use the published version of their article for promotional use or on a web site must request in the normal way.

For full paper portion: Authors of original research papers published by NPG are encouraged to submit the author's version of the accepted, peer-reviewed manuscript to their relevant funding body's archive, for release six months after publication. In addition, authors are encouraged to archive their version of the manuscript in their institution's repositories (as well as their personal Web sites), also six months after original publication.

Appendix A

Bibliography

Abel, L., J. El-Baghdadi, A. A. Bousfiha, J. L. Casanova and E. Schurr (2014). "Human genetics of tuberculosis: a long and winding road." Philos Trans R Soc Lond B Biol Sci 369(1645): 20130428.

Abhimanyu, I. R. Mangangcha, P. Jha, K. Arora, M. Mukerji, J. N. Banavaliker, C. Indian Genome Variation, V. Brahmachari and M. Bose (2011). "Differential serum cytokine levels are associated with cytokine gene polymorphisms in north Indians with active pulmonary tuberculosis." Infect Genet Evol 11(5): 1015-1022.

Andersson, R., C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd, Y. Chen, X. Zhao, C. Schmidl, T. Suzuki, E. Ntini, E. Arner, E. Valen, K. Li, L. Schwarzfischer, D. Glatz, J. Raithel, B. Lilje, N. Rapin, F. O. Bagger, M. Jorgensen, P. R. Andersen, N. Bertin, O. Rackham, A. M. Burroughs, J. K. Baillie, Y. Ishizu, Y. Shimizu, E. Furuhata, S. Maeda, Y. Negishi, C. J. Mungall, T. F. Meehan, T. Lassmann, M. Itoh, H. Kawaji, N. Kondo, J. Kawai, A. Lennartsson, C. O. Daub, P. Heutink, D. A. Hume, T. H. Jensen, H. Suzuki, Y. Hayashizaki, F. Muller, Fantom Consortium, A. R. Forrest, P. Carninci, M. Rehli and A. Sandelin (2014). "An atlas of active enhancers across human cell types and tissues." Nature 507(7493): 455-461.

ATC (2000). "American Thoracic Society Diagnostic Standards and Classification of Tuberculosis in Adults and Children." Am J Respir Crit Care Med 161(4 Pt 1): 1376-1395.

Awomoyi, A. A., M. Charurat, A. Marchant, E. N. Miller, J. M. Blackwell, K. P. McAdam and M. J. Newport (2005). "Polymorphism in IL1B: IL1B-511 association with tuberculosis and decreased lipopolysaccharide-induced IL-1beta in IFN-gamma primed ex-vivo whole blood assay." J Endotoxin Res 11(5): 281-286.

Axelgaard, E., L. Jensen, T. F. Dyrlund, H. J. Nielsen, J. J. Enghild, S. Thiel and J. C. Jensenius (2013). "Investigations on collectin liver 1." J Biol Chem 288(32): 23407-23420.

Azad, A. K., W. Sadee and L. S. Schlesinger (2012). "Innate immune gene polymorphisms in tuberculosis." Infect Immun 80(10): 3343-3359.

Baker, A. R., F. Qiu, A. K. Randhawa, D. J. Horne, M. D. Adams, M. Shey, J. Barnholtz-Sloan, H. Mayanja-Kizza, G. Kaplan, W. A. Hanekom, W. H. Boom, T. R. Hawn, C. M. Stein, U. Tuberculosis Research and T. South African Tuberculosis Vaccine Initiative (2012). "Genetic variation in TLR genes in Ugandan and South African populations and comparison with HapMap data." PLoS One 7(10): e47597.

Baker, A. R., S. Zalwango, L. L. Malone, R. P. Igo, Jr., F. Qiu, M. Nsereko, M. D. Adams, P. Supelak, H. Mayanja-Kizza, W. H. Boom and C. M. Stein (2011). "Genetic susceptibility to tuberculosis associated with cathepsin Z haplotype in a Ugandan household contact study." Hum Immunol 72(5): 426-430.

Bauman, L. E., L. Almasy, J. Blangero, R. Duggirala, J. S. Sinsheimer and K. Lange (2005). "Fishing for pleiotropic QTLs in a polygenic sea." Ann Hum Genet 69(Pt 5): 590-611.

Bayarri-Olmos, R., S. Hansen, M. L. Henriksen, L. Storm, S. Thiel, P. Garred and L. Munthe-Fog (2015). "Genetic variation of COLEC10 and COLEC11 and association with serum levels of collectin liver 1 (CL-L1) and collectin kidney 1 (CL-K1)." PLoS One 10(2): e0114883.

Berrington, W. R. and T. R. Hawn (2007). "Mycobacterium tuberculosis, macrophages, and the innate immune response: does common variation matter?" Immunol Rev 219: 167-186.

Blischak, J. D., L. Tailleux, A. Mitrano, L. B. Barreiro and Y. Gilad (2015). "Mycobacterial infection induces a specific human innate immune response." Sci Rep 5: 16882.

Boucheron, N., R. Tschismarov, L. Goschl, M. A. Moser, S. Lagger, S. Sakaguchi, M. Winter, F. Lenz, D. Vitko, F. P. Breitwieser, L. Muller, H. Hassan, K. L. Bennett, J. Colinge, W. Schreiner, T. Egawa, I. Taniuchi, P. Matthias, C. Seiser and W. Ellmeier (2014). "CD4(+) T cell lineage integrity is controlled by the histone deacetylases HDAC1 and HDAC2." Nat Immunol 15(5): 439-448.

Brooks, M. N., M. V. Rajaram, A. K. Azad, A. O. Amer, M. A. Valdivia-Arenas, J. H. Park, G. Nunez and L. S. Schlesinger (2011). "NOD2 controls the nature of the inflammatory response and subsequent fate of Mycobacterium tuberculosis and M. bovis BCG in human macrophages." Cell Microbiol 13(3): 402-418.

Carmona, J., A. Cruz, L. Moreira-Teixeira, C. Sousa, J. Sousa, N. S. Osorio, A. L. Saraiva, S. Svenson, G. Kallenius, J. Pedrosa, F. Rodrigues, A. G. Castro and M. Saraiva (2013). "Mycobacterium tuberculosis Strains Are Differentially Recognized by TLRs with an Impact on the Immune Response." PLoS One 8(6): e67277.

Casanova, J. L. (2013). Towards a genetic theory of infectious diseases. Host Response in Tuberculosis Keystone Symposium, Whistler, BC, Canada.

CDC (2010). "Centers for Disease Control and Prevention Updated Guidelines for Using Interferon Gamma Release Assays to Detect Mycobacterium tuberculosis Infection." MMWR 2010 59(RR-5): 1-13.

Chimusa, E. R., N. Zaitlen, M. Daya, M. Moller, P. D. van Helden, N. J. Mulder, A. L. Price and E. G. Hoal (2014). "Genome-wide association study of ancestry-specific TB risk in the South African Coloured population." Hum Mol Genet 23(3): 796-809.

Cobat, A., L. F. Barrera, H. Henao, P. Arbelaez, L. Abel, L. F. Garcia, E. Schurr and A. Alcais (2012). "Tuberculin skin test reactivity is dependent on host genetic background in Colombian tuberculosis household contacts." Clin Infect Dis 54(7): 968-971.

Cobat, A., C. J. Gallant, L. Simkin, G. F. Black, K. Stanley, J. Hughes, T. M. Doherty, W. A. Hanekom, B. Eley, J. P. Jais, A. Boland-Auge, P. van Helden, J. L. Casanova, L. Abel, E. G. Hoal, E. Schurr and A. Alcais (2009). "Two loci control tuberculin skin test reactivity in an area hyperendemic for tuberculosis." J Exp Med 206(12): 2583-2591.

Cobat, A., C. Poirier, E. Hoal, A. Boland-Auge, F. de La Rocque, F. Corrard, G. Grange, M. Migaud, J. Bustamante, S. Boisson-Dupuis, J. L. Casanova, E. Schurr, A. Alcais, C. Delacourt and L. Abel (2015). "Tuberculin skin test negativity is under tight genetic control of chromosomal region 11p14-15 in settings with different tuberculosis endemicities." J Infect Dis 211(2): 317-321.

Comstock, G. W. (1978). "Tuberculosis in twins: a re-analysis of the Prophit survey." Am Rev Respir Dis 117(4): 621-624.

Coscolla, M. and S. Gagneux (2010). "Does M. tuberculosis genomic diversity explain disease diversity?" Drug Discov Today Dis Mech 7(1): e43-e59.

Coscolla, M. and S. Gagneux (2014). "Consequences of genomic diversity in Mycobacterium tuberculosis." Semin Immunol 26(6): 431-444.

Crawford, G. E., S. Davis, P. C. Scacheri, G. Renaud, M. J. Halawi, M. R. Erdos, R. Green, P. S. Meltzer, T. G. Wolfsberg and F. S. Collins (2006). "DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays." Nat Methods 3(7): 503-509.

Curtis, J., Y. Luo, H. L. Zenner, D. Cuchet-Lourenco, C. Wu, K. Lo, M. Maes, A. Alisaac, E. Stebbings, J. Z. Liu, L. Kopanitsa, O. Ignatyeva, Y. Balabanova, V. Nikolayevskyy, I. Baessmann, T. Thye, C. G. Meyer, P. Nurnberg, R. D. Horstmann, F. Drobniewski, V. Plagnol, J. C. Barrett and S. Nejentsev (2015). "Susceptibility to tuberculosis is associated with variants in the ASAP1 gene encoding a regulator of dendritic cell migration." Nat Genet 47(5): 523-527.

Delaneau, O., J. Marchini and The 1000 Genomes Project Consortium (2014). "Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel." Nat Commun 5: 3934.

Demissie, A., L. Wassie, M. Abebe, A. Aseffa, G. Rook, A. Zumla, P. Andersen, T. M. Doherty and V. S. Group (2006). "The 6-kilodalton early secreted antigenic target-responsive, asymptomatic contacts of tuberculosis patients express elevated levels of interleukin-4 and reduced levels of gamma interferon." Infect Immun 74(5): 2817-2822.

Dittrich, N., L. C. Berrocal-Almanza, S. Thada, S. Goyal, H. Slevogt, G. Sumanlatha, A. Hussain, S. Sur, S. Burkert, D. Y. Oh, V. Valluri, R. R. Schumann and M. L. Conrad (2015). "Toll-like receptor 1 variations influence susceptibility and immune response to Mycobacterium tuberculosis." Tuberculosis (Edinb) 95(3): 328-335.

Dolan, C. V. and D. I. Boomsma (1998). "Optimal selection of sib pairs from random samples for linkage analysis of a QTL using the EDAC test." Behav Genet 28(3): 197-206.

Donnelly, R. P., H. Dickensheets and D. S. Finbloom (1999). "The interleukin-10 signal transduction pathway and regulation of gene expression in mononuclear phagocytes." J Interferon Cytokine Res 19(6): 563-573.

Duarte, R., C. Carvalho, C. Pereira, A. Bettencourt, A. Carvalho, M. Villar, A. Domingos, H. Barros, J. Marques, P. Pinho Costa, D. Mendonca and B. Martins (2011). "HLA class II alleles as markers of tuberculosis susceptibility and resistance." Rev Port Pneumol 17(1): 15-19.

Eaves, L. J., M. C. Neale and H. Maes (1996). "Multivariate multipoint linkage analysis of quantitative trait loci." Behav Genet 26(5): 519-525.

Encode Project Consortium (2012). "An integrated encyclopedia of DNA elements in the human genome." Nature 489(7414): 57-74.

Fang, M., Z. Fan, W. Tian, Y. Zhao, P. Li, H. Xu, B. Zhou, L. Zhang, X. Wu and Y. Xu (2016). "HDAC4 mediates IFN-gamma induced disruption of energy expenditure-related gene expression by repressing SIRT1 transcription in skeletal muscle cells." Biochim Biophys Acta 1859(2): 294-305.

FANTOM Consortium and the RIKEN PMI and CLST (DGT), A. R. Forrest, H. Kawaji, M. Rehli, J. K. Baillie, M. J. de Hoon, V. Haberle, T. Lassmann, et al. (2014). "A promoter-level mammalian expression atlas." Nature 507(7493): 462-470.

Ferwerda, G., S. E. Girardin, B. J. Kullberg, L. Le Bourhis, D. J. de Jong, D. M. Langenberg, R. van Crevel, G. J. Adema, T. H. Ottenhoff, J. W. Van der Meer and M. G. Netea (2005). "NOD2 and toll-like receptors are nonredundant recognition systems of Mycobacterium tuberculosis." PLoS Pathog 1(3): 279-285.

Fletcher, H. A. (2007). "Correlates of immune protection from tuberculosis." (1566-5240 (Print)).

Flynn, J. L. (2004). "Immunology of tuberculosis and implications in vaccine development." Tuberculosis (Edinb) 84(1-2): 93-101.

Gagneux, S. and P. M. Small (2007). "Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development." Lancet Infect Dis 7(5): 328-337.

Galal, N., J. Boutros, A. Marsafy, X. F. Kong, J. Feinberg, J. L. Casanova, S. Boisson-Dupuis and J. Bustamante (2012). "Mendelian susceptibility to mycobacterial disease in egyptian children." Mediterr J Hematol Infect Dis 4(1): e2012033.

Gianola, D. and D. Sorensen (2004). "Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes." Genetics 167(3): 1407-1424.

Gomez, L. M., J. F. Camargo, J. Castiblanco, E. A. Ruiz-Narvaez, J. Cadena and J. M. Anaya (2006). "Analysis of IL1B, TAP1, TAP2 and IKBL polymorphisms on susceptibility to tuberculosis." Tissue Antigens 67(4): 290-296.

Grant, A. V., J. El Baghdadi, A. Sabri, S. El Azbaoui, K. Alaoui-Tahiri, I. Abderrahmani Rhorfi, Y. Gharbaoui, A. Abid, M. Benkirane, V. Raharimanga, V. Richard, M. Orlova, A. Boland, M. Migaud, S. Okada, D. K. Nolan, J. Bustamante, L. B. Barreiro, E. Schurr, S. Boisson-Dupuis, V. Rasolofo, J. L. Casanova and L. Abel (2013). "Age-dependent association between pulmonary tuberculosis and common TOX variants in the 8q12-13 linkage region." Am J Hum Genet 92(3): 407-414.

Grant, A. V., A. Sabri, A. Abid, I. Abderrahmani Rhorfi, M. Benkirane, H. Souhi, H. Naji Amrani, K. Alaoui-Tahiri, Y. Gharbaoui, F. Lazrak, I. Sentissi, M. Manessouri, S. Belkheiri, S. Zaid, A. Bouraqadi, N. El Amraoui, M. Hakam, A. Belkadi, M. Orlova, A. Boland, C. Deswarte, L. Amar, J. Bustamante, S. Boisson-Dupuis, J. L. Casanova, E. Schurr, J. El Baghdadi and L. Abel (2016). "A genome-wide association study of pulmonary tuberculosis in Morocco." Hum Genet 135(3): 299-307.

Greuber, E. K., P. Smith-Pearson, J. Wang and A. M. Pendergast (2013). "Role of ABL family kinases in cancer: from leukaemia to solid tumours." Nat Rev Cancer 13(8): 559-571.

Guwatudde, D., M. Nakakeeto, E. C. Jones-Lopez, A. Maganda, A. Chiunda, R. D. Mugerwa, J. J. Ellner, G. Bukenya and C. C. Whalen (2003). "Tuberculosis in household contacts of infectious cases in Kampala, Uganda." Am J Epidemiol 158(9): 887-898.

Hall, N. B., R. P. Igo, Jr., L. L. Malone, B. Truitt, A. Schnell, L. Tao, B. Okware, M. Nsereko, K. Chervenak, C. Lancioni, T. R. Hawn, H. Mayanja-Kizza, M. L. Joloba, W. H. Boom, C. M. Stein and U. Tuberculosis Research (2015). "Polymorphisms in TICAM2 and IL1B are associated with TB." Genes Immun 16(2): 127-133.

Hanekom, W., P. Johnston, G. Kaplan, C. Karp, L. Shackelton, L. Stuart and C. Wilson (2014). Revision of the Bill and Melinda Gates Foundation TB Vaccine Strategy - 2014.

Howie, B., J. Marchini and M. Stephens (2011). "Genotype imputation with thousands of genomes." G3 (Bethesda) 1(6): 457-470.

Juarez, E., C. Carranza, F. Hernandez-Sanchez, E. Loyola, D. Escobedo, J. C. Leon-Contreras, R. Hernandez-Pando, M. Torres and E. Sada (2014). "Nucleotide-oligomerizing domain-1 (NOD1) receptor activation induces pro-inflammatory responses and autophagy in human alveolar macrophages." BMC Pulm Med 14: 152.

Kallman, F. and D. Reisner (1943). "Twin studies on the significance of genetic factors in tuberculosis." Am Rev Tuberculosis 47: 549-574.

Kaufmann, S. H., T. G. Evans and W. A. Hanekom (2015). "Tuberculosis vaccines: time for a global strategy." Sci Transl Med 7(276): 276fs278.

Kleinnijenhuis, J., L. A. Joosten, F. L. van de Veerdonk, N. Savage, R. van Crevel, B. J. Kullberg, A. van der Ven, T. H. Ottenhoff, C. A. Dinarello, J. W. van der Meer and M. G. Netea (2009). "Transcriptional and inflammasome-mediated pathways for the induction of IL-1beta production by Mycobacterium tuberculosis." Eur J Immunol 39(7): 1914-1922.

Krizsan-Agbas, D., T. Pedchenko and P. G. Smith (2008). "Neurotrimin is an estrogen-regulated determinant of peripheral sympathetic innervation." J Neurosci Res 86(14): 3086-3095.

Lee, J. Y., E. H. Hwang, D. J. Kim, S. M. Oh, K. B. Lee, S. J. Shin and J. H. Park (2016). "The role of nucleotide-binding oligomerization domain 1 during cytokine production by macrophages in response to Mycobacterium tuberculosis infection." Immunobiology 221(1): 70-75.

Leung, K. H., S. P. Yip, W. S. Wong, L. S. Yiu, K. K. Chan, W. M. Lai, E. Y. Chow, C. K. Lin, W. C. Yam and K. S. Chan (2007). "Sex- and age-dependent association of SLC11A1 polymorphisms with tuberculosis in Chinese: a case control study." BMC Infect Dis 7: 19.

Lewinsohn, D. A. and D. M. Lewinsohn (2008). "Immunologic susceptibility of young children to Mycobacterium tuberculosis." Pediatr Res 63(2): 115.

Lewinsohn, D. A., S. Zalwango, C. M. Stein, H. Mayanja-Kizza, A. Okwera, W. H. Boom, R. D. Mugerwa and C. C. Whalen (2008). "Whole blood interferon-gamma responses to mycobacterium tuberculosis antigens in young household contacts of persons with tuberculosis in Uganda." PLoS One 3(10): e3407.

Li, B. and S. M. Leal (2008). "Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data." Am J Hum Genet 83(3): 311-321.

Li, H. T., T. T. Zhang, Y. Q. Zhou, Q. H. Huang and J. Huang (2006). "SLC11A1 (formerly NRAMP1) gene polymorphisms and tuberculosis susceptibility: a meta-analysis." Int J Tuberc Lung Dis 10(1): 3-12.

Li, Q., J. Li, J. Tian, B. Zhu, Y. Zhang, K. Yang, Y. Ling and Y. Hu (2012). "IL-17 and IFN-gamma production in peripheral blood following BCG vaccination and Mycobacterium tuberculosis infection in human." Eur Rev Med Pharmacol Sci 16(14): 2029-2036.

Li, X., Y. Yang, F. Zhou, Y. Zhang, H. Lu, Q. Jin and L. Gao (2011). "SLC11A1 (NRAMP1) polymorphisms and tuberculosis susceptibility: updated systematic review and meta-analysis." PLoS One 6(1): e15831.

Lipner, E. M., B. J. Garcia and M. Strong (2016). "Network Analysis of Human Genes Influencing Susceptibility to Mycobacterial Infections." PLoS One 11(1): e0146585.

Liu, J. Z., A. F. McRae, D. R. Nyholt, S. E. Medland, N. R. Wray, K. M. Brown, A. Investigators, N. K. Hayward, G. W. Montgomery, P. M. Visscher, N. G. Martin and S. Macgregor (2010). "A versatile gene-based test for genome-wide association studies." Am J Hum Genet 87(1): 139-145.

Luukkonen, T. M., M. Poyhonen, A. Palotie, P. Ellonen, S. Lagstrom, J. H. Lee, J. D. Terwilliger, R. Salonen and T. Varilo (2012). "A balanced translocation truncates Neurotrimin in a family with intracranial and thoracic aortic aneurysm." J Med Genet 49(10): 621-629.

Ma, N., S. Zalwango, L. L. Malone, M. Nsereko, E. M. Wampande, B. A. Thiel, B. Okware, R. P. Igo, Jr., M. L. Joloba, E. Mupere, H. Mayanja-Kizza, W. H. Boom, C. M. Stein and U. Tuberculosis Research (2014). "Clinical and epidemiological characteristics of individuals resistant to M. tuberculosis infection in a longitudinal TB household contact study in Kampala, Uganda." BMC Infect Dis 14: 352.

Mahan, C. S., S. Zalwango, B. A. Thiel, L. L. Malone, K. A. Chervenak, J. Baseke, D. Dobbs, C. M. Stein, H. Mayanja, M. Joloba, C. C. Whalen and W. H. Boom (2012). "Innate and adaptive immune responses during acute M. tuberculosis infection in adult household contacts in Kampala, Uganda." Am J Trop Med Hyg 86(4): 690-697.

Mahasirimongkol, S., H. Yanai, T. Mushiroda, W. Promphittayarat, S. Wattanapokayakit, J. Phromjai, R. Yuliwulandari, N. Wichukchinda, A. Yowang, N. Yamada, P. Kantipong, A. Takahashi, M. Kubo, P. Sawanpanyalert, N. Kamatani, Y. Nakamura and K. Tokunaga (2012). "Genome-wide association studies of tuberculosis in Asians identify distinct at-risk locus for young tuberculosis." J Hum Genet 57(6): 363-367.

Maher, B. (2012). "ENCODE: The human encyclopaedia." Nature 489(7414): 46-48.

Marchini, J. and B. Howie (2010). "Genotype imputation for genome-wide association studies." Nat Rev Genet 11(7): 499-511.

Matsumiya, M., E. Stylianou, K. Griffiths, Z. Lang, J. Meyer, S. A. Harris, R. Rowland, A. M. Minassian, A. A. Pathan, H. Fletcher and H. McShane (2013). "Roles for Treg expansion and HMGB1 signaling through the TLR1-2-6 axis in determining the magnitude of the antigen-specific immune response to MVA85A." PLoS One 8(7): e67922.

Mayne, R., Z. X. Ren, J. Liu, T. Cook, M. Carson and S. Narayana (1999). "VIT-1: the second member of a new branch of the von Willebrand factor A domain superfamily." Biochem Soc Trans 27(6): 832-835.

Meilang, Q., Y. Zhang, J. Zhang, Y. Zhao, C. Tian, J. Huang and H. Fan (2012). "Polymorphisms in the SLC11A1 gene and tuberculosis risk: a meta-analysis update." Int J Tuberc Lung Dis 16(4): 437-446.

Mishra, A. and S. Macgregor (2015). "VEGAS2: Software for More Flexible Gene-Based Testing." Twin Res Hum Genet 18(1): 86-91.

Moller, M., E. de Wit and E. G. Hoal (2010). "Past, present and future directions in human genetic susceptibility to tuberculosis." FEMS Immunol Med Microbiol 58(1): 3-26.

Moller, M. and E. G. Hoal (2010). "Current findings, challenges and novel approaches in human genetic susceptibility to tuberculosis." Tuberculosis (Edinb) 90(2): 71-83.

Morris, N. J., R. C. Elston and C. M. Stein (2010). "A framework for structural equation models in general pedigrees." Hum Hered 70(4): 278-286.

Mustafa, A. S. (2002). "Development of new vaccines and diagnostic reagents against tuberculosis." (0161-5890 (Print)).

Naslednikova, I. O., O. I. Urazova, O. V. Voronkova, A. K. Strelis, V. V. Novitsky, E. L. Nikulina, R. R. Hasanova, T. E. Kononova, V. A. Serebryakova, O. A. Vasileva, N. A. Suhalentseva, E. G. Churina, A. E. Kolosova and T. V. Fedorovich (2009). "Allelic polymorphism of cytokine genes during pulmonary tuberculosis." Bull Exp Biol Med 148(2): 175-180.

Newport, M. J. and C. Finan (2011). "Genome-wide association studies and susceptibility to infectious diseases." Brief Funct Genomics 10(2): 98-107.

Nicol, M. P. and R. J. Wilkinson (2008). "The clinical consequences of strain diversity in Mycobacterium tuberculosis." Trans R Soc Trop Med Hyg 102(10): 955-965.

Nyendak, M. R., P. B., M. D. Null, J. Baseke, G. Swarbrick, H. Mayanja-Kizza, M. Nsereko, D. F. Johnson, P. Gitta, A. Okwera, S. Goldberg, L. Bozeman, J. L. Johnson, W. H. Boom, D. A. Lewinsohn and D. M. Lewinsohn (2008). "Mycobacterium tuberculosis Specific CD8(+) T Cells Rapidly Decline with Antituberculosis Treatment." (1932-6203 (Electronic)).

Nyholt, D. R. (2004). "A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other." Am J Hum Genet 74(4): 765-769.

Ohnishi, H., H. Tochio, Z. Kato, N. Kawamoto, T. Kimura, K. Kubota, T. Yamamoto, T. Funasaka, H. Nakano, R. W. Wong, M. Shirakawa and N. Kondo (2012). "TRAM is involved in IL-18 signaling and functions as a sorting adaptor for MyD88." PLoS One 7(6): e38423.

Ohtani, K., Y. Suzuki and N. Wakamiya (2012). "Biological functions of the novel collectins CL-L1, CL-K1, and CL-P1." J Biomed Biotechnol 2012: 493945.

Ozbek, N., C. Fieschi, B. T. Yilmaz, L. de Beaucoudrey, B. Demirhan, J. Feinberg, Y. E. Bikmaz and J. L. Casanova (2005). "Interleukin-12 receptor beta 1 chain deficiency in a child with disseminated tuberculosis." Clin Infect Dis 40(6): e55-58.

Peng, Y., X. Zhang, J. Liu, Q. Liu, C. Guo, Y. Zhang and D. Lin (2011). "Solution structure of the protein lipocalin 12 from rat epididymis." Proteins 79(7): 2316-2320.

Png, E., B. Alisjahbana, E. Sahiratmadja, S. Marzuki, R. Nelwan, Y. Balabanova, V. Nikolayevskyy, F. Drobniewski, S. Nejentsev, I. Adnan, E. van de Vosse, M. L. Hibberd, R. van Crevel, T. H. Ottenhoff and M. Seielstad (2012). "A genome wide association study of pulmonary tuberculosis susceptibility in Indonesians." BMC Med Genet 13: 5.

Remus, N., J. El Baghdadi, C. Fieschi, J. Feinberg, T. Quintin, M. Chentoufi, E. Schurr, A. Benslimane, J. L. Casanova and L. Abel (2004). "Association of IL12RB1 polymorphisms with pulmonary tuberculosis in adults in Morocco." J Infect Dis 190(3): 580-587.

Sarkar, R., L. Lenders, K. A. Wilkinson, R. J. Wilkinson and M. P. Nicol (2012). "Modern lineages of Mycobacterium tuberculosis exhibit lineage-specific patterns of growth and cytokine induction in human monocyte-derived macrophages." PLoS One 7(8): e43170.

Seshadri, C. (2016). Monocyte Transcriptional Networks are Associated with Tuberculin Skin Test Status in Uganda. in preparation.

Seya, T., H. Oshiumi, M. Sasai, T. Akazawa and M. Matsumoto (2005). "TICAM-1 and TICAM-2: toll-like receptor adapters that participate in induction of type 1 interferons." Int J Biochem Cell Biol 37(3): 524-529.

Shah, J. A., J. C. Vary, T. T. Chau, N. D. Bang, N. T. Yen, J. J. Farrar, S. J. Dunstan and T. R. Hawn (2012). "Human TOLLIP regulates TLR2 and TLR4 signaling and its polymorphisms are associated with susceptibility to tuberculosis." J Immunol 189(4): 1737-1746.

Song, L. and G. E. Crawford (2010). "DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells." Cold Spring Harb Protoc 2010(2): pdb prot5384.

Spigelman, M., H. D. Donoghue, Z. Abdeen, S. Ereqat, I. Sarie, C. L. Greenblatt, I. Pap, I. Szikossy, I. Hershkovitz, G. K. Bar-Gal and C. Matheson (2015). "Evolutionary changes in the genome of Mycobacterium tuberculosis and the human genome from 9000 years BP until modern times." Tuberculosis (Edinb) 95 Suppl 1: S145-149.

Stein, C. M. (2011). "Genetic epidemiology of tuberculosis susceptibility: impact of study design." PLoS Pathog 7(1): e1001189.

Stein, C. M. and A. R. Baker (2011). "Tuberculosis as a complex trait: impact of genetic epidemiological study design." Mamm Genome 22(1-2): 91-99.

Stein, C. M., D. Guwatudde, M. Nakakeeto, P. Peters, R. C. Elston, H. K. Tiwari, R. Mugerwa and C. C. Whalen (2003). "Heritability analysis of cytokines as intermediate phenotypes of tuberculosis." J Infect Dis 187(11): 1679-1685.

Stein, C. M., N. B. Hall, L. L. Malone and E. Mupere (2013). "The household contact study design for genetic epidemiological studies of infectious diseases." Front Genet 4: 61.

Stein, C. M., S. Zalwango, A. B. Chiunda, C. Millard, D. V. Leontiev, A. L. Horvath, K. C. Cartier, K. Chervenak, W. H. Boom, R. C. Elston, R. D. Mugerwa, C. C. Whalen and S. K. Iyengar (2007). "Linkage and association analysis of candidate genes for TB and TNFalpha cytokine expression: evidence for association with IFNGR1, IL-10, and TNF receptor 1 genes." Hum Genet 121(6): 663-673.

Stein, C. M., S. Zalwango, L. L. Malone, S. Won, H. Mayanja-Kizza, R. D. Mugerwa, D. V. Leontiev, C. L. Thompson, K. C. Cartier, R. C. Elston, S. K. Iyengar, W. H. Boom and C. C. Whalen (2008). "Genome scan of M. tuberculosis infection and disease in Ugandans." PLoS One 3(12): e4094.

Sveinbjornsson, G., D. F. Gudbjartsson, B. V. Halldorsson, K. G. Kristinsson, M. Gottfredsson, J. C. Barrett, L. J. Gudmundsson, K. Blondal, A. Gylfason, S. A. Gudjonsson, H. T. Helgadottir, A. Jonasdottir, A. Jonasdottir, A. Karason, L. B. Kardum, J. Knezevic, H. Kristjansson, M. Kristjansson, A. Love, Y. Luo, O. T. Magnusson, P. Sulem, A. Kong, G. Masson, U. Thorsteinsdottir, Z. Dembic, S. Nejentsev, T. Blondal, I. Jonsdottir and K. Stefansson (2016). "HLA class II sequence variants influence tuberculosis risk in populations of European ancestry." Nat Genet.

Tao, L., S. Zalwango, K. Chervenak, B. Thiel, L. L. Malone, F. Qiu, H. Mayanja-Kizza, W. H. Boom, C. M. Stein and U. Tuberculosis Research (2013). "Genetic and shared environmental influences on interferon-gamma production in response to Mycobacterium tuberculosis antigens in a Ugandan population." Am J Trop Med Hyg 89(1): 169-173.

Thurman, R. E., E. Rynes, R. Humbert, J. Vierstra, M. T. Maurano, E. Haugen, N. C. Sheffield, A. B. Stergachis, H. Wang, B. Vernot, K. Garg, S. John, R. Sandstrom, D. Bates, L. Boatman, T. K. Canfield, M. Diegel, D. Dunn, A. K. Ebersol, T. Frum, E. Giste, A. K. Johnson, E. M. Johnson, T. Kutyavin, B. Lajoie, B. K. Lee, K. Lee, D. London, D. Lotakis, S. Neph, F. Neri, E. D. Nguyen, H. Qu, A. P. Reynolds, V. Roach, A. Safi, M. E. Sanchez, A. Sanyal, A. Shafer, J. M. Simon, L. Song, S. Vong, M. Weaver, Y. Yan, Z. Zhang, Z. Zhang, B. Lenhard, M. Tewari, M. O. Dorschner, R. S. Hansen, P. A. Navas, G. Stamatoyannopoulos, V. R. Iyer, J. D. Lieb, S. R. Sunyaev, J. M. Akey, P. J. Sabo, R. Kaul, T. S. Furey, J. Dekker, G. E. Crawford and J. A. Stamatoyannopoulos (2012). "The accessible chromatin landscape of the human genome." Nature 489(7414): 75-82.

Thye, T., E. Owusu-Dabo, F. O. Vannberg, R. van Crevel, J. Curtis, E. Sahiratmadja, Y. Balabanova, C. Ehmen, B. Muntau, G. Ruge, J. Sievertsen, J. Gyapong, V. Nikolayevskyy, P. C. Hill, G. Sirugo, F. Drobniewski, E. van de Vosse, M. Newport, B. Alisjahbana, S. Nejentsev, T. H. Ottenhoff, A. V. Hill, R. D. Horstmann and C. G. Meyer (2012). "Common variants at 11p13 are associated with susceptibility to tuberculosis." Nat Genet 44(3): 257-259.

Thye, T., F. O. Vannberg, S. H. Wong, E. Owusu-Dabo, I. Osei, J. Gyapong, G. Sirugo, F. Sisay-Joof, A. Enimil, M. A. Chinbuah, S. Floyd, D. K. Warndorff, L. Sichali, S. Malema, A. C. Crampin, B. Ngwira, Y. Y. Teo, K. Small, K. Rockett, D. Kwiatkowski, P. E. Fine, P. C. Hill, M. Newport, C. Lienhardt, R. A. Adegbola, T. Corrah, A. Ziegler, T. B. G. C. African, C. Wellcome Trust Case Control, A. P. Morris, C. G. Meyer, R. D. Horstmann and A. V. Hill (2010). "Genome-wide association analyses identifies a susceptibility locus for tuberculosis on chromosome 18q11.2." Nat Genet 42(9): 739-741.

Todorov, A. A., G. P. Vogler, C. Gu, M. A. Province, Z. Li, A. C. Heath and D. C. Rao (1998). "Testing causal hypotheses in multivariate linkage analysis of quantitative traits: general formulation and application to sibpair data." Genet Epidemiol 15(3): 263-278.

Tsumura, M., S. Okada, H. Sakai, S. Yasunaga, M. Ohtsubo, T. Murata, H. Obata, T. Yasumi, X. F. Kong, A. Abhyankar, T. Heike, T. Nakahata, R. Nishikomori, S. Al-Muhsen, S. Boisson-Dupuis, J. L. Casanova, M. Alzahrani, M. A. Shehri, G. Elghazali, Y. Takihara and M. Kobayashi (2012). "Dominant-negative STAT1 SH2 domain mutations in unrelated patients with Mendelian susceptibility to mycobacterial disease." Hum Mutat 33(9): 1377-1387.

van Crevel, R., T. H. Ottenhoff and J. W. van der Meer (2002). "Innate immunity to Mycobacterium tuberculosis." Clin Microbiol Rev 15(2): 294-309.

van de Vosse, E., M. H. Haverkamp, N. Ramirez-Alejo, M. Martinez-Gallo, L. Blancas-Galicia, A. Metin, B. Z. Garty, C. Sun-Tan, A. Broides, R. A. de Paus, O. Keskin, D. Cagdas, I. Tezcan, E. Lopez-Ruzafa, J. I. Arostegui, J. Levy, F. J. Espinosa-Rosales, O. Sanal, L. Santos-Argumedo, J. L. Casanova, S. Boisson-Dupuis, J. T. van Dissel and J. Bustamante (2013). "IL-12Rbeta1 deficiency: mutation update and description of the IL12RB1 variation database." Hum Mutat 34(10): 1329-1339.

van der Eijk, E. A., E. van de Vosse, J. P. Vandenbroucke and J. T. van Dissel (2007). "Heredity versus environment in tuberculosis in twins: the 1950s United Kingdom Prophit Survey Simonds and Comstock revisited." Am J Respir Crit Care Med 176(12): 1281-1288.

Velez, D. R., W. F. Hulme, J. L. Myers, M. E. Stryjewski, E. Abbate, R. Estevan, S. G. Patillo, J. R. Gilbert, C. D. Hamilton and W. K. Scott (2009). "Association of SLC11A1 with tuberculosis and interactions with NOS2A and TLR2 in African-Americans and Caucasians." Int J Tuberc Lung Dis 13(9): 1068-1076.

Wampande, E. M., E. Mupere, S. M. Debanne, B. B. Asiimwe, M. Nsereko, H. Mayanja, K. Eisenach, G. Kaplan, H. W. Boom, G. Sebastien and M. L. Joloba (2013). "Long-term dominance of Mycobacterium tuberculosis Uganda family in peri-urban Kampala-Uganda is not associated with cavitary disease." BMC Infect Dis 13(1): 484.

Wheeler, E., E. N. Miller, C. S. Peacock, I. J. Donaldson, M. A. Shaw, S. E. Jamieson, J. M. Blackwell and H. J. Cordell (2006). "Genome-wide scan for loci influencing quantitative immune response traits in the Belem family study: comparison of methods and summary of results." Ann Hum Genet 70(Pt 1): 78-97.

Whittaker, C. A. and R. O. Hynes (2002). "Distribution and evolution of von Willebrand/integrin A domains: widely dispersed domains with roles in cell adhesion and elsewhere." Mol Biol Cell 13(10): 3369-3387.

WHO (2015). Global Tuberculosis Report 2015. Geneva, Switzerland, World Health Organization.

Willer, C. J., Y. Li and G. R. Abecasis (2010). "METAL: fast and efficient meta-analysis of genomewide association scans." Bioinformatics 26(17): 2190-2191.

Williams, A., H. G. J., S. O. Clark, K. E. Gooch, K. A. Hatch, G. A. Hall, K. Huygen, T. H. M. Ottenhoff, K. L. M. C. Franken, P. Andersen, T. M. Doherty, S. H. E. Kaufmann, L. Grode, P. Seiler, C. Martin, B. Gicquel, S. T. Cole, P. Brodin, A. S. Pym, W. Dalemans, J. Cohen, Y. Lobet, N. Goonetilleke, H. McShane, A. Hill, T. Parish, D. Smith, N. G. Stoker, D. B. Lowrie, G. Kallenius, S. Svenson, A. Pawlowski, K. Blake and P. D. Marsh (2005). "Evaluation of vaccines in the EU TB Vaccine Cluster using a guinea pig aerosol infection model of tuberculosis." (1472-9792 (Print)).

Wong, S. and O. N. Witte (2004). "The BCR-ABL story: bench to bedside and back." Annu Rev Immunol 22: 247-306.

Wu, B., C. Huang, M. Kato-Maeda, P. C. Hopewell, C. L. Daley, A. M. Krensky and C. Clayberger (2007). "Messenger RNA expression of IL-8, FOXP3, and IL-12beta differentiates latent tuberculosis infection from disease." J Immunol 178(6): 3688-3694.

Xia, J., E. L. Tilahun, E. H. Kebede, T. E. Reid, L. Zhang and X. S. Wang (2015). "Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families." J Chem Inf Model 55(2): 374-388.

Xing, C., J. Huang, H. Yi-Hsiang, A. L. DeStefano, N. L. Heard-Costa, P. A. Wolf, S. Seshadri, D. P. Kiel, L. A. Cupples and J. Dupuis (2013). Evaluation of Power of the Illumina HumanOmni5M-4v1 BeadChip to Detect Risk Variants for Human Complex Diseases, Boston University School of Public Health: 1-7.

Xu, Z. and J. A. Taylor (2009). "SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies." Nucleic Acids Res 37(Web Server issue): W600-605.

Yamamoto, M., S. Sato, H. Hemmi, S. Uematsu, K. Hoshino, T. Kaisho, O. Takeuchi, K. Takeda and S. Akira (2003). "TRAM is specifically involved in the Toll-like receptor 4-mediated MyD88-independent signaling pathway." Nat Immunol 4(11): 1144-1150.

Zhang, Y., T. Jiang, X. Yang, Y. Xue, C. Wang, J. Liu, X. Zhang, Z. Chen, M. Zhao and J. C. Li (2013). "Toll-like receptor -1, -2, and -6 polymorphisms and pulmonary tuberculosis susceptibility: a systematic review and meta-analysis." PLoS One 8(5): e63357.

Zhao, M., F. Jiang, W. Zhang, F. Li, L. Wei, J. Liu, Y. Xue, X. Deng, F. Wu, L. Zhang, X. Zhang, Y. Zhang, D. Fan, X. Sun, T. Jiang and J. C. Li (2012). "A novel single nucleotide polymorphism within the NOD2 gene is associated with pulmonary tuberculosis in the Chinese Han, Uygur and Kazak populations." BMC Infect Dis 12: 91.

Zuniga, E. S., J. Early and T. Parish (2015). "The future for early-stage tuberculosis drug discovery." Future Microbiol 10(2): 217-229.