Expression Profiles of Pediatric Tuberculosis Patients and Exposed Controls from India

Jeffrey A Tornheim,1 Mandar Paradkar,2 Anil Madugundu,4,5 Nikhil Gupte,2 Vandana Kulkarni,2 Sreelakshmi Sreenivasamurthy,4,5 Remya Raja,4 Neeta Pradhan,2 Shri Vijay Bala Yogendra Shivakumar,6 Chhaya Valvi,3 Rewa Kohli,2 Padmapriyadarsini Chandrasekaran,7 Vidya Mave,2 Akhilesh Pandey,4,5 and Amita Gupta1

1Center for Clinical Global Health Education, Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD, USA, 2Byramjee Jeejeebhoy Government Medical College – Johns Hopkins University Clinical Research Site, Pune, Maharashtra, India, 3Byramjee Jeejeebhoy Government Medical College, Pune, Maharashtra, India, 4National Institute of Mental Health and Neurosciences - Institute of Bioinformatics Lab, Bangalore, India, 5Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA, 6Johns Hopkins University – India office (CCGHE), Pune, Maharashtra, India, 7National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India

BACKGROUND

• Tuberculosis (TB) is the leading infectious disease killer worldwide, and 27% of cases are in India (WHO 2016).
• Children frequently have paucibacillary disease, are too young to provide adequate sputum samples, or develop extrapulmonary disease, forcing clinicians to rely on diagnostic tests with poor sensitivity and on frequent presumptive treatment for TB (Perez-Velez NEJM 2012).
• Several studies have attempted to use biomarkers of host response to infection as an alternative means of diagnosing TB, but these studies have not included many children, Indian patients, or microbiologically confirmed patients. In addition, these studies have varied in both the technology used and the analytic approach.
• To evaluate the application of previously published transcriptomic signatures of TB to Indian children, we evaluated the complete gene transcription of children with confirmed TB and those without evidence of TB from an ongoing prospective cohort in India.

METHODS

Parent Protocol
• This study was nested in a prospective observational cohort of Indian TB patients and household contacts (The Cohort for TB Research by Indo-US Medical Partnership, "CTRIUMPh", Gupte BMJ Open 2016). Adults and children with active TB and household contacts were evaluated to measure host and microbial factors associated with: TB treatment outcomes among active TB patients, progression from infection to active TB disease among household contacts of active TB patients, and transmission of TB to contacts.

Nested Transcriptomic Substudy
• Biorepository samples were selected from CTRIUMPh participants to address the following aims:
AIM 1 – Confirm the presence of previously published gene signatures and evaluate their relative accuracy among a cohort of confirmed positive Indian pediatric TB patients
AIM 2 – Evaluate whether positive gene signatures return to normal after 6 months of treatment in a cohort of Indian confirmed pediatric TB patients
AIM 3 – Evaluate whether negative gene signatures remain negative in exposed household contacts over time
• All participants from the city of Pune, India who were <15 years old with microbiologically or histologically confirmed TB were selected for inclusion (cases). Household contacts of these participants were evaluated for TB and were confirmed not to have TB by symptom screen, tuberculin skin test (TST), and interferon gamma release assay (IGRA). Each case was age- and sex-matched with 2 household contacts who were TST/IGRA negative upon enrollment (1:2 case:control ratio). All cases and controls were HIV negative.
• Whole blood was collected in PAXgene tubes at enrollment, 1 month, and 6 months of treatment for cases, and at enrollment, 4 months, and 12 months for controls. TST and IGRAs were repeated for controls until either conversion or completion of 1 year of follow-up.

RNA Sequencing and Data Analysis
• RNAseq was performed using Illumina's HiSeq 2500 platform at MedGenome in Bangalore, India to generate an average of 170 million 100 bp paired-end reads per sample. Raw data were aligned to the human reference genome (GRCh38.10) using the Spliced Transcripts Alignment to a Reference (STAR) aligner. Protein-coding genes were selected for differential expression analysis using the DESeq2 package in R.
• Principal component analysis (PCA) was performed on log transformed data to identify important clinical covariates. Differential expression models were constructed including age, sex, site of disease for cases (pulmonary vs. extrapulmonary), TST/IGRA conversion status (controls), and time of sample collection.
• Differentially expressed genes were summarized according to false discovery rate (FDR) limits of <0.10, <0.05, and <0.01, and absolute log2-fold expression differences of >0.5, >1, >1.5, and >2.0.
• Results were compared to published lists of genes associated with TB compared to either healthy controls or latent tuberculosis infection (LTBI, Aim 1), with clinical improvement during treatment (Aim 2), and with development of infection (Aim 3).
• Power calculations were performed post-hoc using the PROPER package in R to confirm that sample size and sequencing depth were sufficient to identify differentially expressed genes.

RESULTS: Clinical Characteristics

• Transcriptomic profiling was completed for a total of 16 cases and 32 controls from CTRIUMPh:

Characteristic | CTRIUMPh Cases (N=101), N (%) | Transcriptomic Study Cases (N=16), N (%) | Transcriptomic Study Controls: Converters (N=11), N (%) | Transcriptomic Study Controls: Nonconverters (N=21), N (%)
Median Age in Years (range) | 8 (1-14) | 9.5 (3-14) | 9 (6-14) | 10 (2-14)
Male | 51 (50.5) | 8 (50.0) | 6 (54.5) | 9 (42.9)
BCG Scar | 78 (77.2) | 8 (50.0) | 5 (45.5) | 12 (57.1)
Pulmonary TB | 74 (73.3) | 8 (50.0) | NA | NA
Extrapulmonary TB: Lymph Node | 11 (36.7)¹ | 6 (37.5) | NA | NA
Extrapulmonary TB: Abdominal | 14 (46.7)¹ | 0 | NA | NA
¹ % of extrapulmonary tuberculosis patients (N=30)

• Participants in the transcriptomic study reflected demographics of the parent study, with the exception of a lower proportion of BCG scars among transcriptomic study patients.

Characteristic | CTRIUMPh Cases, N (% of Disease Type) | Transcriptomic Study, N (% of Disease Type)
History of TB Contact | 66 (65.3) | 9 (56.3)
Median Illness Duration (range) | 30 Days (0-120) | 52.5 Days (25-60)
Smear Positive Pulmonary TB | 9 (12.2) | 3 (37.5)
Xpert MTB/RIF Positive Pulmonary TB | 14 (18.9) | 6 (75.0)
Culture Positive Pulmonary TB | 13 (16.7) | 5 (62.5)
Culture Negative by 2 Weeks | 8 (61.5) | 4 (80.0)
Culture Negative by 6 Months | 13 (100.0) | 5 (100.0)
Pathologically Confirmed Extrapulmonary TB | 11 (36.7) | 8 (100.0)
Chest X-ray with Cavitary Disease | 20 (27.4) | 2 (25.0)
Chest X-ray Score (Range 0 – 40) | 25 (0 – 110) | 22 (0 – 130)

• Cases in the transcriptomic study had a shorter duration of therapy and a higher rate of early culture conversion than the parent study. All participants demonstrated clinical improvement on the expected time frame for Aim 2 analysis (transcriptomic analysis of baseline vs. 1 and 6 months).

RESULTS: Differential Gene Expression

• Aim 1: Differential gene expression between pediatric cases (N=16) and controls (N=32) at enrollment
[Figure panels: Gene Counts; Fold Change Distribution]

Number of Differentially Expressed Genes between Cases and Controls, by False Discovery Rate (FDR) and Log2-Fold Change
FDR | Any | Log2 >0.5 | Log2 >1.0 | Log2 >1.5 | Log2 >2.0
0.10 | 288 | 226 | 100 | 35 | 13
0.05 | 123 | 111 | 70 | 33 | 13
0.01 | 58 | 31 | 25 | 18 | 10

• Aim 2: Differential gene expression between cases at baseline and 6 months of treatment
[Figure panels: Gene Counts; Fold Change Distribution]

Number of Differentially Expressed Genes between Cases During Treatment, by False Discovery Rate (FDR) and Log2-Fold Change
FDR | Any | Log2 >0.5 | Log2 >1.0 | Log2 >1.5 | Log2 >2.0
0.10 | 1014 | 683 | 28 | 1 | 1
0.05 | 568 | 487 | 25 | 1 | 1
0.01 | 0 | 0 | 0 | 0 | 0

• Aim 3: Differential gene expression between controls at the time of exposure and incident infection
[Figure panels: Gene Counts; Fold Change Distribution]

Number of Differentially Expressed Genes Before & After Developing LTBI, by False Discovery Rate (FDR) and Log2-Fold Change
FDR | Any | Log2 >0.5 | Log2 >1.0 | Log2 >1.5 | Log2 >2.0
0.10 | 0 | 0 | 0 | 0 | 0
0.05 | 0 | 0 | 0 | 0 | 0
0.01 | 0 | 0 | 0 | 0 | 0

• 70 genes differentiated children with microbiologically confirmed TB from household contacts (gene expression increased for 68 and decreased for 2).
• 25 genes differentiated cases before and after TB treatment (all decreased expression with time). 6 genes were differentially expressed between 0 and 1 month of treatment (increased for 5 and decreased for 1).
• No significant difference was found among household contacts who developed incident LTBI, though the number of paired samples for this analysis was low.

RESULTS: Comparison with Published Signatures

Published signatures of gene expression in Tuberculosis
Author | Age | Application | Sequencing Method | # of Genes in Signature | False Discovery Rate | Log Fold Difference in Expression
Anderson (NEJM 2014) | Children | TB vs. LTBI | Microarray | 42 | -- | 0.5
Berry (Nature 2010) | Adults | TB vs. Not TB | Microarray | 393 | 0.01 | 1
Bloom (PLoS One 2012) | Adults | Post Treatment | Microarray | 320 | 0.01 | 1
Jenum (Sci Reports 2016) | Children | TB vs. Not TB | dcRT-MLPA | 12 | -- | --
Kaforou (PLoS One 2013) | Adults | TB vs. Not TB | Microarray | 53 | -- | 0.5
Kaforou (PLoS One 2013) | Adults | TB vs. LTBI | Microarray | 27 | -- | 0.5
Lee (BMC Bioinformatics 2016) | Adults | TB vs. LTBI | Microarray | 169 | 0.05 | 1
Lee (BMC Bioinformatics 2016) | Adults | TB vs. Not TB | Microarray | 267 | 0.05 | 1
Maertzdorf (EMBO Mol Med 2015) | Adults | TB vs. Not TB | RT-PCR | 4 | -- | --
Obermoser (Immunity 2014) | Adults | Inflammation | Microarray | 74 | 0.1 | 1
Obermoser (Immunity 2014) | Adults | Interferon γ | Microarray | 79 | 0.1 | 1
Sweeney (Lancet Resp Med 2016) | Mixed | TB vs. Not TB | Public Data | 3 | 0.01 | 0.58
Zak (Lancet 2016) | Children | Incident LTBI | RNAseq | 16 | -- | --

• Laboratory and analytic methods varied across different published signatures.

Concordance between our data and published signatures of active TB (FDR<0.05, ≥1 log2 fold change)
Applications compared: TB vs. Not TB; TB vs. LTBI; Inflammation; Interferon Gamma
Signatures compared: Berry (393 genes), Jenum (12 genes), Lee, Maertzdorf, Sweeney, Anderson (42 genes), Kaforou (27 genes), Obermoser Inflammation (74 genes), Obermoser Interferon Gamma (79 genes)

CTRIUMPh Gene | Number of Published Signatures Marked (X)
AIM2 | 2
ANKRD22 | 2
C1QB | 2
C1QC | 1
DEFA3 | 1
DHRS9 | 2
FAM26F | 1
FCGR1A | 4
FCGR1B | 3
GBP1 | 3
GBP5 | 4
GBP6 | 4
GPR84 | 1
MPO | 1
PRRG4 | 1
SEPT4 | 2

• Out of 70 differentially expressed genes between cases and controls at the time of enrollment, 15 were found in other published signatures (most frequently FCGR1A, FCGR1B, GBP1, GBP5, and GBP6).
• The greatest overlap with other published signatures was found with Sweeney (1 gene, 33.3%), followed by Kaforou (7 genes, 25.9%), Obermoser (6 genes, 7.6%), and Anderson (3 genes, 7.1%).
• None of the genes in published signatures of response to treatment were differentially expressed at 1 month, but there was overlap at 6 months of treatment with the Bloom signature (10 genes, 3.1%).

RESULTS: Power Analysis

Expected power to detect differential gene expression

Impact of Sample Size & Depth on FDR & Power
Cases | Controls | 50 Million Reads: Actual FDR | 50 Million Reads: Marginal Power | 170 Million Reads: Actual FDR | 170 Million Reads: Marginal Power
5 | 10 | 0.350 | 0.48 | 0.340 | 0.49
10 | 20 | 0.150 | 0.73 | 0.140 | 0.75
15 | 30 | 0.110 | 0.82 | 0.100 | 0.84
20 | 40 | 0.091 | 0.87 | 0.086 | 0.88
25 | 50 | 0.092 | 0.89 | 0.089 | 0.90
30 | 60 | 0.080 | 0.91 | 0.076 | 0.92

• Use of deep sequencing generated a power >80% to detect differential gene expression at the level detected with a false discovery rate <0.10.
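The gene-count grids reported in the differential expression results (FDR limit by absolute log2-fold-change threshold) can be tabulated from a per-gene results table. The Python below is a minimal sketch of that counting step only. The thresholds are those stated in the methods; the function name `count_grid`, the gene labels, and all numeric values are hypothetical and invented for illustration. The study itself ran its analysis with DESeq2 in R, and this is not that pipeline.

```python
# Hypothetical illustration: count genes by FDR limit and absolute
# log2-fold-change threshold. Values below are made up, not study data.
FDR_LIMITS = (0.10, 0.05, 0.01)
LOG2FC_THRESHOLDS = (0.0, 0.5, 1.0, 1.5, 2.0)  # 0.0 stands in for "Any"


def count_grid(results, fdr_limits=FDR_LIMITS, lfc_thresholds=LOG2FC_THRESHOLDS):
    """Return {fdr_limit: [gene count per log2FC threshold]}.

    `results` is an iterable of (gene, log2_fold_change, adjusted_p) tuples.
    A gene is counted when adjusted_p < fdr_limit and |log2FC| > threshold.
    """
    grid = {}
    for fdr in fdr_limits:
        # Absolute fold changes of the genes passing this FDR limit.
        passing = [abs(lfc) for _, lfc, padj in results if padj < fdr]
        grid[fdr] = [sum(1 for v in passing if v > t) for t in lfc_thresholds]
    return grid


# Made-up example rows: (gene label, log2 fold change, adjusted p-value).
results = [
    ("gene_a", 2.3, 0.004),
    ("gene_b", -1.2, 0.03),
    ("gene_c", 0.7, 0.08),
    ("gene_d", 0.2, 0.009),
    ("gene_e", 1.6, 0.20),
]

grid = count_grid(results)
for fdr, counts in grid.items():
    print(f"FDR<{fdr:.2f}", counts)
```

Each row of the printed grid is non-increasing from left to right, and each column is non-increasing as the FDR limit tightens, which is the same pattern seen in the reported tables.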

RESULTS: Principal Component Analysis

Evaluation of the impact of individual variables on inter-sample variance

Model (Components Identified) | Measure | PC1 | PC2 | PC3 | PC4
PCA Variance for Cases vs Controls (43) | Standard Deviation | 6.169 | 1.678 | 1.311 | 0.466
PCA Variance for Cases vs Controls (43) | Proportion of Variance | 0.885 | 0.065 | 0.040 | 0.005
PCA Variance for Cases vs Controls (43) | Cumulative Proportion | 0.885 | 0.951 | 0.991 | 0.996
PCA Variance for Cases Over Time (28) | Standard Deviation | 5.020 | 1.381 | 0.798 | 0.422
PCA Variance for Cases Over Time (28) | Proportion of Variance | 0.900 | 0.068 | 0.023 | 0.006
PCA Variance for Cases Over Time (28) | Cumulative Proportion | 0.900 | 0.968 | 0.991 | 0.997
PCA Variance for Controls Over Time (43) | Standard Deviation | 3.183 | 1.089 | 0.733 | 0.288
PCA Variance for Controls Over Time (43) | Proportion of Variance | 0.845 | 0.099 | 0.045 | 0.007
PCA Variance for Controls Over Time (43) | Cumulative Proportion | 0.845 | 0.943 | 0.988 | 0.995

• ≥99% of the variance in all models was explained by 4 principal components or fewer.
• Variance identified by principal component analysis mapped well to sex, with partial influence of age.
• As a result, age and sex were included in all models of differential gene expression.

CONCLUSIONS

• RNAseq at high depth identified a unique signature of gene expression between TB patients and household contacts at baseline and over time.
• This study is unique in its use of high depth sequencing, microbiologically confirmed pediatric TB patients, and comparison with different gene signatures according to varying thresholds of significance.
• Sex and age significantly impacted interpretation of differential gene expression in children and should be incorporated into models of differential gene expression in this population.
• Our gene signatures demonstrated incomplete overlap with previously published signatures.
• Additional work is required to determine the optimal set of genes to identify pediatric TB patients and evaluate the impact of these signatures on patients without microbiological diagnosis.

ACKNOWLEDGEMENTS

This work was graciously supported by the NIH/DBT RePORT India Consortium. This project has been funded in whole or in part with Federal funds from the Government of India's (GOI) Department of Biotechnology (DBT), the Indian Council of Medical Research (ICMR), the United States National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID), Office of AIDS Research (OAR), and distributed in part by CRDF Global. Additional support came from NIH/NIAID (UM1AI069465), the Fogarty International Center BJGMC JHU HIV TB Program (D43TW009574), the UJMT Fogarty Global Health Fellows Program (R25TW009340), the Ujala Foundation, the Gilead Foundation, the Wyncote Foundation, and Persistent Systems.

CONTACT: Jeffrey A Tornheim, MD MPH, Assistant Professor of Medicine, Center for Clinical Global Health Education, Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD USA [email protected]