CONFIDENTIAL PENTA Foundation Project

The HIV CLINICAL & EXPERIMENTAL PLATFORM Transcriptomics and Bioinformatics

Rome, Italy February 12th Mark Cameron, PhD WP4: Immunological Platform


Rapid expansion of the scope of the sequencing field

Sequencing Genomes to Reference Genetic Variation
Moving beyond genome (Epigenetics)

Genome:
-DNA-Seq
-Targeted DNA-Seq (e.g. exome)

Epigenetic State (can be heritable; goal of epigenomics):
-Methyl-Seq
-ChIP-Seq (protein:DNA) (e.g. histone modification)

Transcriptome (RNA-Seq):
-mRNA
-ncRNA/miRNA
-Rep-Seq (T/BCR repertoire)
-CLIP-Seq (protein:RNA)
-Ribo-Seq (translation)

Shendure & Lieberman-Aiden, Nat Biotech 30(11):1084-94

U. Miami/Cleveland Inter-CFAR Systems Biology Core

• Mandate: Provide hypothesis-driven, top-to-bottom, genomic assays and bioinformatics analysis
• >75 SOPs spanning kickoff, sample collection and shipping to assay, analysis and interpretation of multi-omic data
• Consulting, design, and automated reporting pipelines
• Research and clinical data integration

• Technologies:
  • Ultra-low input SOPs: SOPs to pg level, <10 cells, single cell
  • Illumina CSPRO certified & GCLP supervised
  • Illumina HiSeq SOPs: m/mi/ncRNA, exome
  • Correlate panel development and monitoring
  • Custom Panels: innate immunity, cytokines, T cell activation, IRGs, inflammasome, cancer (tumor/cell and immune), metabolism, regulation

Objectives

• We are developing:
  – Sensitive and minimally-invasive assays to assess the human transcriptome
  – Bioinformatic tools to identify additional correlates of pathogenesis or protection (i.e. biomarkers), often needed where traditional biomarkers are insufficient to fully assess a cohort response
  – Data integration methods to refine and validate biomarkers within and across studies

The Power of A Functional Genomics Perspective

New correlate signatures

Validation (e.g. other cohorts, outcomes)

Pulendran et al, Immunity 2010

Panels for Precision Medicine

Inter-CFAR Systems Biology Core Investigation Workflow

[Workflow diagram]

Stages and deliverables:
I. Kickoff Meeting (dI.1: Investigation Description)
II. Investigation Setup (dII.1: Sample Worksheet)
III. Genomics Assay (dIII.1: RNA-seq Run worksheet)
IV. Bioinformatics Preliminary Analysis (dIV.1: Preliminary Report)
V. Preliminary Findings call (dV.1: Functional Genomics Case Description)
VI. Functional Genomics Advanced Analysis (dVI.1: Advanced Report)
VII. Advanced Findings (Investigation Conclusion)

Annotations: Scope of Work; Review, Timelines and Deliverables; Iterative Preliminary Deliverables
Legend: Inter-CFAR Sys Bio Core; E Core; Protocols; Document (e.g. dI.1); Validation; 'omic platforms

Preprocessing of RNA-seq: High Capacity R Pipeline

Stages: Plexed RNA -> (Seq Run) -> Raw Fastq -> (Filter) -> HQ Fastq -> (Align) -> Mapped Reads -> (Counting) -> Feature Counts

1. (Seq Run) Create raw FASTQ files from the binary sequencer output. Pooled, tagged RNA short-read libraries are run on an Illumina HiSeq 2500 (2x50, 30M reads), as per the manufacturer's instructions.
2. (Filter) Create high-quality (HQ) FASTQ files by filtering out poor-quality base calls and adapter contamination using Trimmomatic.
3. (Align) Generate mapped read files by aligning the HQ FASTQ reads using the STAR aligner.
4. (Counting) Generate feature counts using HTSeq.
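As a rough illustration of what the counting step hands to the downstream analysis, the sketch below reads per-sample htseq-count output (one tab-separated `feature<TAB>count` line per feature, plus summary rows whose names start with `__`) and aggregates it into a features-by-samples matrix. The function names, sample names and counts are hypothetical; the core's own pipeline runs in R and may differ.

```python
from io import StringIO


def read_htseq_counts(handle):
    """Parse one htseq-count output stream into {feature: count}."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        if feature.startswith("__"):
            # Summary rows such as __no_feature are not gene counts.
            continue
        counts[feature] = int(value)
    return counts


def build_count_matrix(per_sample):
    """Aggregate {sample: {feature: count}} into a features-by-samples table.

    Features missing from a sample are filled with 0.
    """
    samples = sorted(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = [[per_sample[s].get(f, 0) for s in samples] for f in features]
    return features, samples, matrix


# Hypothetical two-sample example with made-up counts.
sample_a = StringIO("GBP1\t120\nSTAT1\t300\n__no_feature\t42\n")
sample_b = StringIO("GBP1\t80\nSTAT1\t150\nIRF1\t10\n__no_feature\t7\n")
per_sample = {
    "A": read_htseq_counts(sample_a),
    "B": read_htseq_counts(sample_b),
}
features, samples, matrix = build_count_matrix(per_sample)
for feature, row in zip(features, matrix):
    print(feature, row)
```

Keeping rows as features and columns as samples matches the layout that count-based tools such as edgeR expect on import.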

based Analysis

Stages: Counts -> (Import) -> Raw Gene Exprs -> (QC) -> Gene Exprs -> (Model) -> DE Gene Lists -> (Pathway Enrichment) -> Pathway Lists

1. (Import) Create/import the raw matrix (features by samples) by aggregating the count files into a matrix and importing the sample phenotype data in R.
2. (QC) Create the normalized expression matrix by removing sample outliers based on QC assessment and normalizing the samples to each other using edgeR.
3. (Model) Generate differential gene expression lists by performing two-group analysis via linear modeling using edgeR.
4. (Pathway Enrichment) Generate pathway enrichment lists by taking the top-ranking genes and performing pathway enrichment using Gene Set Enrichment Analysis (GSEA).
5. Run the Advanced Bioinformatics Toolkit.
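To make the pathway enrichment step more concrete, here is a minimal sketch that tests a list of top-ranking genes for over-representation in gene sets with a hypergeometric test. This is a deliberately simpler stand-in and not GSEA itself, which scores the full ranked list; the function names, gene universe and gene sets below are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, n_top, n_overlap):
    """P(overlap >= n_overlap) when n_top genes are drawn without
    replacement from a universe containing set_size pathway genes."""
    total = comb(universe_size, n_top)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_top - k)
        for k in range(n_overlap, min(set_size, n_top) + 1)
    )
    return tail / total


def enrich(top_genes, universe, gene_sets):
    """Return (set name, overlap size, p-value), most significant first."""
    universe = set(universe)
    top = set(top_genes) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = top & members
        p = hypergeom_enrichment_p(
            len(universe), len(members), len(top), len(overlap)
        )
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical universe of 20 genes and two made-up gene sets.
universe = [f"g{i}" for i in range(20)]
gene_sets = {
    "SET_A": ["g0", "g1", "g2", "g3", "g4"],
    "SET_B": ["g10", "g11", "g12", "g13", "g14"],
}
top_genes = ["g0", "g1", "g2", "g10"]
results = enrich(top_genes, universe, gene_sets)
for name, overlap, p in results:
    print(name, overlap, round(p, 4))
```

In practice the resulting p-values would still need multiple-testing correction across gene sets before being read as pathway lists.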

Identifying correlates of immunogenicity in the DC-Vax-001 clinical trial

Mark Cameron
Case Western Reserve University

DCVAX Dose Escalation Trial

■ Trial:
  ■ Randomized, double-blinded, placebo-controlled, phase I
  ■ Healthy volunteers
  ■ Evaluate safety and immunogenicity of anti-DEC205-HIV gag p24 (clade B) fusion monoclonal antibody protein vaccine plus Poly ICLC

■ Study Design (45 Subjects):
  ■ 3 groups of 9 DCVax-001/Poly ICLC, 9 Poly ICLC, 9 Saline
  ■ Dose groups (s.c.):
    ■ Low - 3 doses of 0.3 mg of DCVax-001
    ■ Mid - 3 doses of 1 mg
    ■ High - 3 doses of 3 mg

Poly ICLC induces broad IFN signaling gene expression in healthy volunteers

[Figure: peak gene expression, polyICLC vs placebo]

Caskey et al., JEM, 2011

DC-Vax Transcriptomics

Objectives
■ To determine the impact of increasing doses of DC VAX on global gene expression unique from poly-ICLC
■ To define gene expression biomarkers that correlate with outcome (ICS, titers)

Innate Arm: CD4+ IFNg+ regression with gene-expression

METHOD: mean rank ordering
DATASET: DC-Vax Post-Last Vacc
LEGEND: top 50 genes correlated to post-DC-Vax IFNg+ CD4+ cells %

Genes positively correlated with the percentage of IFNg+ CD4+ cells are still more highly expressed in high dose DCVax samples, post-last vacc

Gene signatures do not appear to be impacted by multiple vaccinations
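One way to picture how a "top 50 correlated genes" list can be produced is sketched below: each gene's expression across samples is correlated with the percentage of IFNg+ CD4+ cells, and genes are ranked by that coefficient. The slides do not state which statistic was used, so Pearson correlation is an assumption here, and the gene names and values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def top_correlated_genes(expression, phenotype, n_top=50):
    """Rank genes by correlation with the phenotype, most positive first."""
    scored = [
        (gene, pearson(values, phenotype))
        for gene, values in expression.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n_top]


# Hypothetical data: four samples, % IFNg+ CD4+ cells per sample.
phenotype = [0.1, 0.2, 0.4, 0.8]
expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [8.0, 4.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
ranked = top_correlated_genes(expression, phenotype)
for gene, r in ranked:
    print(gene, round(r, 3))
```

With only a few samples such coefficients are unstable, which is one reason a ranked signature is usually checked against other cohorts or time points.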

DCVax Antigen-Specific Stim Experiment Design

Donors (n=26):
- 9 Placebo
- 7 Poly ICLC
- DCVax (Low 0.3 mg, Mid 1 mg, High 3 mg): 10 of the highest responders regardless of dose

Time points: Day 0 (Pre-Vaccination) and Wk16 (4 Weeks Post Last Vaccination)
Stimulation: PBMC, 15 hr stim with HIV GAG or DMSO
Assay: Low Input RNA-Seq

IFNG signaling (TH1) is uniquely enriched in genes up-regulated in DCVAX vs PICLC (wk16)

Contrast: wk16, DCVAX (GAG-DMSO) vs PICLC (GAG-DMSO); # DEGs (pValue <= 0.05): 944
GSEA Pre-ranked: Pvalue * sign(logFC)
DB: mSigDB (c2+c7)
Enrichment Map: JI >= 0.25%
Social Network: rank the social network and how well those within are connected

[Figure: gene network with nodes colored by logFC, grouped into the modules INTERFERON SIGNALING, POSITIVE REGULATION OF CYTOKINE PRODUCTION, and PRO-INFLAMMATORY RESPONSE / APOPTOSIS SIGNALING. Network edges: Co-expression, Pathway, Genetic Interactions. Labeled genes include IFNG, STAT1, IRF1, GBP1, OASL, CXCL9, CXCL10, CXCL11, IL2RA and NFKB1.]
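The two quantities named on this slide can be sketched in a few lines. The first function builds a pre-ranking score from a p-value and the sign of logFC; the slide only gives "Pvalue * sign(logFC)", so the -log10 transform used here is an assumption (a common convention, not confirmed by the source). The second part links gene sets whose overlap, read here as the Jaccard index (JI), meets a threshold; treating the threshold as the fraction 0.25 is also an assumption, and the gene-set memberships are invented.

```python
from itertools import combinations
from math import copysign, log10


def prerank_score(p_value, log_fc):
    """Signed significance: magnitude -log10(p), sign taken from logFC."""
    return copysign(-log10(p_value), log_fc)


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrichment_map_edges(gene_sets, threshold=0.25):
    """Return (set1, set2, JI) for every pair with JI >= threshold."""
    edges = []
    for (n1, g1), (n2, g2) in combinations(sorted(gene_sets.items()), 2):
        ji = jaccard(g1, g2)
        if ji >= threshold:
            edges.append((n1, n2, ji))
    return edges


# Hypothetical gene-set memberships, for illustration only.
gene_sets = {
    "interferon": {"STAT1", "IRF1", "GBP1", "OASL"},
    "cytokine": {"IFNG", "IL2RA", "NFKB1", "STAT1"},
    "apoptosis": {"CASP4", "CASP7", "GBP1", "STAT1", "IRF1"},
}
edges = enrichment_map_edges(gene_sets)
print(edges)
print(prerank_score(0.01, -2.0))
```

A higher threshold gives a sparser map with tighter clusters; a lower one merges loosely related gene sets into the same module.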

Unique to IgG2: Positive association of T cell activation and cytokine gene expression with titers at 16 wk

[Figure: enriched pathways (legend: GS size, log10_pval)]
- CROSSTALK_BETWEEN_DENDRITIC_CELLS_AND_NATURAL_KILLER_CELLS
- CTLA4_SIGNALING_IN_CYTOTOXIC_T_LYMPHOCYTES
- COMMUNICATION_BETWEEN_INNATE_AND_ADAPTIVE_IMMUNE_CELLS
- SYSTEMIC_LUPUS_ERYTHEMATOSUS_SIGNALING
- GRAFT-VERSUS-HOST_DISEASE_SIGNALING
- DIFFERENTIAL_REGULATION_OF_CYTOKINE_PRODUCTION_IN_INTESTINAL_EPITHELIAL_CELLS_BY_IL-17A_AND_IL-17F
- DENDRITIC_CELL_MATURATION
- AUTOIMMUNE_THYROID_DISEASE_SIGNALING
- ROLE_OF_HYPERCYTOKINEMIA_HYPERCHEMOKINEMIA_IN_THE_PATHOGENESIS_OF_INFLUENZA
- ALLOGRAFT_REJECTION_SIGNALING
- T_HELPER_CELL_DIFFERENTIATION
- CYTOTOXIC_T_LYMPHOCYTE-MEDIATED_APOPTOSIS_OF_TARGET_CELLS
- TREM1_SIGNALING
- DIFFERENTIAL_REGULATION_OF_CYTOKINE_PRODUCTION_IN_MACROPHAGES_AND_T_HELPER_CELLS_BY_IL-17A_AND_IL-17F
- B_CELL_DEVELOPMENT
- CDC42_SIGNALING
- ALTERED_T_CELL_AND_B_CELL_SIGNALING_IN_RHEUMATOID_ARTHRITIS
- ROLE_OF_PATTERN_RECOGNITION_RECEPTORS_IN_RECOGNITION_OF_BACTERIA_AND_VIRUSES
- ACTIN_CYTOSKELETON_SIGNALING
- RHOA_SIGNALING
- REGULATION_OF_ACTIN-BASED_MOTILITY_BY_RHO

Heatmap representation of the most frequent genes in the pathways (16 wk Stim dataset), colored by pearsonCor (scale 0 to 0.8)
- Module1: T cell activation
- Module2: Cytokine/chemokines
Genes shown: TNF, CCL3, CCL4, CCL2, IL1B, CCL5, NFKB1, ATM, PIK3R1, MAPK1, HLA-DOB, PIK3CG, CD40, TLR2, PIK3C3, IL10, TLR9, IL6, CD247, CD3D, CD86, FASLG, HLA-DRA, CD28, CD40LG, HLA-DRB1, HLA-B, CD3E, FCER1G, B2M, HLA-F, HLA-DMA, HLA-DRB5, HLA-DOA, HLA-DQA1, PIK3R4

JAK/STAT signaling is enriched via association with IgG3 titers

[Figure: gene-level heatmap of pearsonCor with IgG3 titers (scale -0.8 to 0.4) for the pathways below, with an inset for JAK/STAT SIGNALING (scale 0.2 to 0.6) showing STAT1, STAT5B, TYK2, SOCS3, MAPK9, MAPK11, MAPK12 and MAPK14]
- CELL_CYCLE_REGULATION_BY_BTG_FAMILY_PROTEINS
- INTRINSIC_PROTHROMBIN_ACTIVATION_PATHWAY
- JAK/STAT SIGNALING
- IL-22_SIGNALING
- * UBIQUINONE_BIOSYNTHESIS
- * MITOCHONDRIAL_DYSFUNCTION
- * OXIDATIVE_PHOSPHORYLATION
- * PURINE_METABOLISM

* = also observed in association with IgG1

Innate Study: Day 1 type I IFN gene expression network associated with later IgG3 Ab titers

[Figure 5b: % DEGs and % ISG DEGs in PBMC, LN, FRT and GI, x-axis ticks 1, 3, 7, 10]

Immune Correlate (Biomarker) ID in EPIICAL

• Aim: to identify biomarkers that can be used as classifiers or predictors of viral control in EPIICAL cohorts.
• Accomplished by bioinformatic integration of omic, immune assay and clinical trial data
• Biomarkers that describe innate and adaptive immune responses
  – E.g. in slow vs rapid-responding individuals
• Develop specific assays (e.g. Fluidigm Panels) for monitoring these pathways in the prospective cohort and future trials

I: Biomarker Discovery

• Approach: To curate a list of biomarkers of viral control from existing scientific literature and study data and build a database for EPIICAL.

• Parameters:
  – Records of Evidence: non-data driven, known biomarkers from the literature
  – Descriptive: non-data driven, controlled, public database driven
  – Experimental Evidence: EPIICAL study (data driven) identification of biomarker potential by bioinformatics

• An iterative database will store these biomarker parameters.
• We already have a relevant biomarker database, via CFAR studies, to initially populate and rank EPIICAL biomarkers from the retrospective trial.

II: Prioritizing Biomarkers

• Approach: A scoring matrix will be used to evaluate potential biomarkers and decide which will form a panel.

• The biomarkers will be ranked according to their P-value by:
  – Testing for differential response, e.g. (S)low vs. (R)apid suppression or (S)low vs. (R)apid rebound
  – Advanced correlation testing (next slide)

• In multiple assays or studies, the biomarkers will be ranked in an absolute manner to overcome the inherent differences in variance between assays.

• An aggregate score (sum of ranks) will be calculated for each biomarker across all assays with testing for significance.

• The final prioritization score (aggregate score ranked with P-value) for each biomarker will be provided to the Fluidigm platform as a candidate panel.

III: Bioinformatics Toolset
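The rank aggregation described under "II: Prioritizing Biomarkers" could look roughly like the sketch below: biomarkers are ranked by p-value within each assay, and the ranks are summed across assays so that assays with different variances contribute on the same scale. The function names, assay names and p-values are hypothetical, tied p-values share an average rank (an assumption), and the significance test on the aggregate score mentioned above is not shown.

```python
def rank_by_p(p_values):
    """Rank biomarkers in one assay: smallest p-value gets rank 1.

    Tied p-values share the average of the ranks they span.
    """
    ordered = sorted(p_values, key=p_values.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while (j + 1 < len(ordered)
               and p_values[ordered[j + 1]] == p_values[ordered[i]]):
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average
        i = j + 1
    return ranks


def aggregate_ranks(assays):
    """Sum per-assay ranks for biomarkers measured in every assay.

    Returns (biomarker, rank sum) pairs, best (lowest sum) first.
    """
    shared = set.intersection(*(set(p) for p in assays.values()))
    per_assay = {
        name: rank_by_p({b: p[b] for b in shared})
        for name, p in assays.items()
    }
    totals = {b: sum(r[b] for r in per_assay.values()) for b in shared}
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Hypothetical p-values from two assays.
assays = {
    "rnaseq": {"STAT1": 0.001, "IRF1": 0.02, "GBP1": 0.3},
    "flow": {"STAT1": 0.04, "IRF1": 0.01, "GBP1": 0.5},
}
panel = aggregate_ranks(assays)
for biomarker, score in panel:
    print(biomarker, score)
```

Working on ranks rather than raw p-values is what makes results from different assays comparable, at the cost of discarding effect sizes.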

• We uncover statistically significant correlations using:
  – Linear Discriminant Analysis (*LDA), Support Vector Machines (*SVM), Independent Component Analysis (ICA), Principal Component Analysis (PCA), and Gene Set Enrichment Analysis (GSEA).
    * Classifier signature development

  – mixOmics for integrating transcription profiling data with flow cytometry, cytokines or other laboratory parameters.

  – Gene Signature Meta-Analysis (GSMA)
    • A systematic meta-analysis tool for pathway enrichment.
    • Integrates pathway information across studies by using the maximum P-value statistic in 4 steps: phenotype, pathway enrichment, combined scoring, control of false discovery rate.

Biomarker development as a trial deliverable
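The last two GSMA steps (combined scoring and false discovery rate control) can be illustrated with a minimal sketch, assuming the per-study pathway p-values are independent. The combined score uses the standard maximum P-value rule, where the largest of k p-values, raised to the power k, is itself a p-value under the null. Benjamini-Hochberg is used for the FDR step as an assumption, since the slides do not name the procedure; the pathway names are taken from the deck but the numbers are invented, and none of this is the GSMA implementation itself.

```python
def max_p_combined(p_values):
    """maxP rule: for k independent uniform nulls, P(max <= m) = m ** k."""
    return max(p_values) ** len(p_values)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def combine_pathways(pathway_p):
    """Map pathway -> (combined p, FDR-adjusted p) across studies."""
    names = sorted(pathway_p)
    combined = [max_p_combined(pathway_p[n]) for n in names]
    fdr = benjamini_hochberg(combined)
    return {n: (c, q) for n, c, q in zip(names, combined, fdr)}


# Hypothetical enrichment p-values for two pathways in three studies.
pathway_p = {
    "INTERFERON_SIGNALING": [0.01, 0.02, 0.03],
    "OXIDATIVE_PHOSPHORYLATION": [0.2, 0.5, 0.04],
}
summary = combine_pathways(pathway_p)
for name, (p, q) in summary.items():
    print(name, p, q)
```

The maximum P-value rule is conservative by design: a pathway scores well only if it is enriched in every study, which suits validation across cohorts.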

Biomarker Refinement and Validation (Genomics)

[Diagram: Preliminary Correlates from Pilots -> Draft Panel -> Final Panel -> Monitoring Future Trials, with Screen (retro), Screen (Prosp), Refine and Score steps between panels. Outputs: Best in Class Trial Candidate; Therapies (e.g. for non-response); Drug Discovery (Modeling).]

“Integration: Day of Big Data*”

• New technology = More data at finer resolution (clinical, flow assays, genomics, etc.)
• Need for new bioinformatic methods for analysis, integration and summary

*Nature 462, 722-723

Dr. Mark Cameron
Epidemiology & Biostatistics
Institute of Computational Biology
Case Western Reserve University
[email protected]
(216) 368-3196

Sr. Genome Research Assistants: -Pearline Cartwright -John Pyles

Bioinformaticians: -Peter Wilkinson -Brian Richardson

Post-Doctoral Fellow: -Jackie Golden

Student:
-Michael Cartwright
-Translational Bioinformatics, PhD

Acknowledgments

• Rafick Sekaly
• Genomics
  – John Pyles
  – Andrew Smith
• Bioinformatics
  – Slim Fourati
  – Aarthi Talla
  – Khader Ghneim
  – Ali Filali
  – Peter Wilkinson
• Rockefeller University
• Marina Caskey
• Sarah Schlesinger
• Christine Trumpfheller
• Gaelle Breton
• Ralph Steinman
• VRC
• Rick Koup
• Robert Bailer
• Georgia Tomaras
• Eva Chung
• Adrian McDermott Group
• VIMCs & FNIH
