Bioinformatics 1 -- Lecture 22

IMMUNOINFORMATICS:

Bioinformatics Challenges in

Most slides courtesy of Julia Ponomarenko, San Diego Supercomputer Center, or Oliver Kohlbacher, WSI/ZBIT, Eberhard-Karls-Universität Tübingen

The Immune Reaction

Vaccines have been made for 36 of >400 human pathogens

Immunological Bioinformatics, The MIT Press, 2005. +HPV & Rotavirus

 Quantum unit of

 Surface on which an antibody binds

 Comprises antigenic matter

 Linear epitope consists of a short AA sequence

 Conformational epitope depends on tertiary structure.

Two branches of immunity

Innate immunity

 Recognizes molecules, called pathogen-associated molecular patterns (~10^3), or PAMPs, shared by groups of related microbes; e.g. LPS from the gram-negative cell wall, RNA from viruses, flagellin, and glucans from fungal cell walls.

 Is antigen-nonspecific.

 Immediate or within several hours response.

 Involves body defense cells that have pattern-recognition receptors:
  . leukocytes;
  . cells that release inflammatory mediators: macrophages and mast cells;
  . natural killer cells (NK cells); and
  . complement and cytokines.

 Does not improve with repeated exposure to a given infection.

Adaptive immunity

 Recognizes epitopes, which are specific B- and T-cell recognition sites on antigens.

 Is antigen-specific.

 3 to 10 days response.

 Involves the following:
  . antigen-presenting cells (APCs) such as macrophages and dendritic cells;
  . antigen-specific B-lymphocytes (~10^9); and
  . antigen-specific T-lymphocytes (~10^12).

 Improves with repeated exposure and becomes protective.

Components of the

B and T cells recognize different epitopes of the same

T-cell epitope

 Denatured antigen
 Linear (often) peptide, 8-37 aa
 Internal (often)
 Binding to receptor: Kd 10^-5 – 10^-7 M (low affinity)
 Slow on-rate, slow off-rate (once bound, peptide may stay associated for hours to many days)

B-cell epitope

 Native or denatured (rare) antigen
 Sequential (continuous) or conformational (discontinuous)
 Accessible, hydrophilic, mobile, usually on the surface or could be exposed as a result of physicochemical change
 Binding to antibody: Kd 10^-7 – 10^-11 M (high affinity)
 Rapid on-rate, variable off-rate

B cell epitopes (magenta, orange) and T cell epitopes (blue, green, red) of hen egg-white lysozyme

PDB: 1dpx

MHC class I pathway

[Figure: intracellular pathogen (virus, mycobacteria) -> cytosolic protein -> proteasome -> peptides -> TAP -> ER -> MHC I presenting the CD8 epitope -> TCR of a CD8 CTL (TCD8+). Any cell.]

Xenoreactive complex AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS)/HLA-A2.1

[Figure: structure 1lp9 with labels T-cell receptor (Vα, Vβ), MHC class I, β-2-microglobulin.]

MHC class I pathway

Bioinformatics approaches to epitope prediction:

(1) Prediction of proteasomal cleavage sites (several methods exist, based on a small amount of in vitro data).
(2) Prediction of peptide-TAP binding (ibid.).
(3) Prediction of peptide-MHC binding.
(4) Prediction of pMHC-TCR binding.

MHC class I epitope prediction: Challenges

 High rate of pathogen mutations. Pathogens evolve to escape:
  . proteasomal cleavage (HIV);
  . TAP binding (HIV, HSV type I);
  . MHC binding.

 MHC genes are highly polymorphic (2,292 human alleles; 1,670 two years ago).

 MHC polymorphism is essential to protect the population from invasion by pathogens. However, it creates a problem for epitope-based vaccine design: a vaccine needs to contain a unique epitope binding to each MHC allele.

 Every normal (heterozygous) human expresses six different MHC class I molecules on every cell, containing α-chains derived from the two alleles of each of the HLA-A, HLA-B, and HLA-C genes inherited from the parents.

 Every human has ~10^12 lymphoid cells with a T-cell receptor repertoire of ~10^7, depending on the individual's immunological status (vaccinations, disease history, environment, etc.).

Prediction of MHC class I binding peptides – potential epitopes

 Specific to an MHC allele or allele supertype (alleles with similar sequences bind similar peptides).

 Specific to a peptide length (8-, 9-, 10-, 11-mers).

 Sequence-based approaches:
  . Gibbs sampling (when the training peptides are of different lengths)
  . Hidden Markov Models (ibid.)
  . Sequence motifs, position weight matrices
  . Artificial Neural Networks (require a large number of training examples)
  . SVM*

Peptides known to bind to the HLA-A*0201 molecule:

ALAKAAAAM
ALAKAAAAN
ALAKAAAAV
ALAKAAAAT
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV

Performance measures for prediction methods
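A prediction method has to produce a score before it can be evaluated. As a minimal sketch of the position weight matrix approach listed among the sequence-based methods, the nine HLA-A*0201 binders above can be turned into a log-odds matrix and used to score other 9-mers. The pseudocount of 1 and the uniform background frequency of 1/20 are assumptions made for illustration, not values from the slides, and the function names are hypothetical.

```python
import math
from collections import Counter

# Nine peptides listed on the slide as known HLA-A*0201 binders.
BINDERS = [
    "ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAV", "ALAKAAAAT", "GMNERPILT",
    "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0, background=1.0 / 20):
    """Return one dict per position mapping residue -> log2 odds score.

    Assumptions (not from the slides): a flat pseudocount and a uniform
    background frequency of 1/20 for every amino acid.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all training peptides must have the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds scores for a peptide."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column[aa] for column, aa in zip(pwm, peptide))


pwm = build_pwm(BINDERS)
print(round(score(pwm, "GILGFVFTM"), 2))  # a training peptide
print(round(score(pwm, "DDDDDDDDD"), 2))  # a peptide unlike the training set
```

With only nine training peptides the matrix is dominated by the pseudocount, which is one reason the slide notes that data-hungry methods such as neural networks need many more examples.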

[Figure: peptides ordered by predicted score (binding affinity value); a score threshold separates predicted binders (TP, FP) from predicted non-binders (FN, TN). Beside it, a ROC curve: true positive rate, TP / (TP + FN), against false positive rate, FP / (FP + TN), both from 0 to 1; the area under the curve is AUC, or A_ROC.]

TP + FN – actual binders (based on a defined threshold on binding affinity values)
TN + FP – actual non-binders (ibid.)

Sensitivity = TP / (TP + FN) = 6/7 = 0.86
Specificity = TN / (TN + FP) = 6/8 = 0.75

Performance measures for prediction methods (cont)
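The threshold-based measures, and the Pearson correlation between predicted scores p_i and measured values a_i, can be sketched in a few lines. The toy scores below are hypothetical, chosen only so that a threshold of 0.5 reproduces the counts behind 6/7 and 6/8 (TP = 6, FN = 1, TN = 6, FP = 2). The convention that a higher score means a predicted binder is also an assumption.

```python
import math


def confusion_counts(scores, is_binder, threshold):
    """Count TP, FP, FN, TN when 'score >= threshold' means predicted binder."""
    tp = fp = fn = tn = 0
    for s, actual in zip(scores, is_binder):
        predicted = s >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of actual binders that were predicted as binders."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual non-binders that were predicted as non-binders."""
    return tn / (tn + fp)


def pearson(p, a):
    """Pearson's correlation between predicted p_i and measured a_i."""
    n = len(p)
    mean_p, mean_a = sum(p) / n, sum(a) / n
    cov = sum((x - mean_p) * (y - mean_a) for x, y in zip(p, a))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_a = sum((y - mean_a) ** 2 for y in a)
    return cov / math.sqrt(var_p * var_a)


# Hypothetical toy data: 7 actual binders followed by 8 actual non-binders.
scores = [0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.3,
          0.7, 0.6, 0.4, 0.4, 0.3, 0.2, 0.2, 0.1]
is_binder = [True] * 7 + [False] * 8

tp, fp, fn, tn = confusion_counts(scores, is_binder, threshold=0.5)
print(tp, fp, fn, tn)
print(round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2))
```

Sensitivity and specificity depend on the chosen threshold, whereas Pearson's coefficient compares the raw predicted and measured values directly.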

[Figure: predicted score p_i (binding affinity value) with a score threshold separating TP and FP from FN and TN; agreement with the actual values a_i is summarized by Pearson's correlation coefficient.]

TP + FN – actual binders (based on a defined threshold on actual binding affinity values a_i)
TN + FP – actual non-binders (ibid.)

MHC class I pathway / MHC class II pathway

[Figure, MHC class I pathway: intracellular pathogen (virus, mycobacteria) -> cytosolic protein -> proteasome -> peptides -> TAP -> ER -> MHC I presenting the CD8 epitope -> TCR of a CD8 CTL (TCD8+). Any cell.]

[Figure, MHC class II pathway: extracellular protein -> endosome -> peptides (processing step marked "?") -> MHC II presenting the CD4 epitope -> TCR of a CD4 T cell (TCD4+). B-cell, macrophage, or dendritic cell.]

Complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT) and MHC class II

[Figure: structure with labels T-cell receptor (Vα, Vβ), MHC class II α and β chains.]

Epitope, or antigenic determinant, is defined as the site of an antigen recognized by immune response molecules (antibodies, MHC, TCR).

T cell epitope – a short linear peptide or other chemical entity (native or denatured antigen) that binds MHC (class I binds 8-10 aa peptides; class II binds 11-25 aa peptides) and may be recognized by a T-cell receptor (TCR).

T cell recognition of antigen involves the ternary complex "antigen-TCR-MHC".

[Figure: the two complexes side by side: human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT) and MHC class II; xenoreactive complex AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS)/HLA-A2.1.]

[Figure labels: MHC class II α and β chains (1fyt); MHC class I with β-2-microglobulin (1lp9).]

MHC class II epitope prediction: Challenges

 MHC class II genes are as highly polymorphic as MHC class I (1,012 human alleles to date).

 The repertoire of T-cell receptors is ~10^7 and depends on an individual's immunological status (vaccinations, disease history, environment, etc.).

 The epitope length is 9-37 aa.

 The peptide may have a non-linear conformation.

 The MHC binding groove is open at both ends, and residues outside the groove are known to affect peptide binding.

 The processing of MHC class II epitopes is still a mystery and likely depends on the antigen structure, the cell type and other factors.

Prediction of MHC class II binding peptides – potential epitopes

 Specific to an MHC allele or allele supertype (alleles with similar sequences bind similar peptides).

 Predictions for peptides of length 9 aa (the peptide-MHC binding core)

 Sequence-based approaches:
  . Gibbs sampling
  . Sequence motifs, position weight matrices
  . Machine learning: SVM, HMM, evolutionary algorithms
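Because class II predictions are made on the 9 aa binding core while the presented peptides are longer, a predictor typically has to consider every 9-mer window of a peptide. The sketch below enumerates the candidate cores of the influenza HA peptide shown on the class II slides and picks one with a scoring function. The scorer is a hypothetical stand-in (it only rewards a hydrophobic residue at core position 1) and is not any published method; a real predictor would plug in an allele-specific matrix or model here.

```python
CORE_LENGTH = 9  # length of the peptide-MHC binding core, per the slide


def candidate_cores(peptide, core_length=CORE_LENGTH):
    """Return every contiguous window of core_length residues, in order."""
    if len(peptide) < core_length:
        return []
    return [peptide[i:i + core_length]
            for i in range(len(peptide) - core_length + 1)]


def toy_core_score(core):
    """Hypothetical scorer: reward a hydrophobic residue at core position 1.

    This stands in for a real allele-specific matrix or model.
    """
    first = core[0]
    if first in "FWY":
        return 2
    if first in "ILMV":
        return 1
    return 0


def best_core(peptide, scorer=toy_core_score):
    """Pick the highest-scoring window; ties go to the leftmost window."""
    cores = candidate_cores(peptide)
    if not cores:
        raise ValueError("peptide is shorter than the binding core")
    return max(cores, key=scorer)


ha_peptide = "PKYVKQNTLKLAT"  # influenza HA peptide from the class II slides
print(candidate_cores(ha_peptide))
print(best_core(ha_peptide))
```

A 13-residue peptide yields five candidate cores, so the number of windows to score grows with peptide length, which is part of what makes class II prediction harder than class I.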

Nielsen et al., Bioinformatics 2004

Benchmarking predictions of peptide binding to MHC II (Wang et al., PLoS Comput Biol. 2007)

 Data: pairs {peptide, affinity value as IC50 in nM} for a given MHC allele.

 16 different mouse and human MHC class II alleles.

 10,017 data points.

 9 different methods were evaluated: 6 matrix-based, 2 SVM, 1 QSAR-based.

 AUC values varied from 0.5 (random prediction) to 0.83, depending on the allele.

 Comparison with 29 X-ray structures of peptide-MHC II complexes (14 different alleles):

. The success level of the binding core recognition was 21%-62%,

. with the exception of the TEPITOPE method (100%), which is based on structural information and measured affinity values for mutant variants of MHC class II and peptides (Sturniolo et al., Nature Biotechnology, 1999).

 => Structural information together with peptide-MHC binding data should improve the prediction.
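The benchmark's AUC values come from comparing predicted scores with measured IC50 values for one allele at a time. A minimal sketch of that calculation follows. The 500 nM binder cutoff and the example measurements are assumptions for illustration, not values taken from the benchmark, and the function name is hypothetical. The AUC is computed as the probability that a randomly chosen binder receives a higher predicted score than a randomly chosen non-binder.

```python
def auc_from_ic50(predicted_scores, measured_ic50_nm, binder_cutoff_nm=500.0):
    """AUC of predicted scores against binder labels derived from IC50.

    A peptide counts as an actual binder when its measured IC50 is at or
    below binder_cutoff_nm (the default cutoff is an assumption).
    Higher predicted score is taken to mean stronger predicted binding.
    """
    positives = [s for s, ic in zip(predicted_scores, measured_ic50_nm)
                 if ic <= binder_cutoff_nm]
    negatives = [s for s, ic in zip(predicted_scores, measured_ic50_nm)
                 if ic > binder_cutoff_nm]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    # Count binder/non-binder pairs ranked correctly; ties count one half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical measurements for one allele: (predicted score, IC50 in nM).
data = [(0.9, 12.0), (0.8, 150.0), (0.4, 480.0),
        (0.6, 900.0), (0.3, 5000.0), (0.1, 20000.0)]
pred = [s for s, _ in data]
ic50 = [ic for _, ic in data]
print(auc_from_ic50(pred, ic50))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.5 to 0.83 range above should be read.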

Outstanding problems in TAP/Proteasome/MHC peptide prediction

 High throughput data generation

. Currently few verified peptide sequences.

. Overfitting likely.

 Something better than the usual machine learning approach?

. Structure-based motifs, length preferences?

. Combine multiple motifs, reflecting multiple enzymes?

 Vaccine design to incorporate cleavage motifs

. Optimize recombinant vaccine for cleavage?

Cellular immunity

Epitope-based vaccines

 A major reason for analyzing and predicting epitopes is that they may lead to the development of peptide-based synthetic vaccines.

 Thousands of peptides have been pre-clinically examined; over 100 of them have progressed to phase I clinical trials and about 30 to phase II, including vaccines for foot-and-mouth virus infection, influenza, HIV.

 However, not a single peptide vaccine has passed phase III and become available to the public.

 The only successful synthetic peptide vaccine has been made against canine parvovirus (causing enteritis and myocarditis in dogs and minks). It consists of several peptides from the N-terminal region of the viral VP2 protein (residues 1–15, 7– 1, and 3–19) coupled to a carrier, and it induced an immune response in dogs and minks. However, it is expensive and has lower coverage than the conventional vaccine (attenuated virus).

Immunoinformatics

 The goals:
  . Modeling of the immune system at the population and individual levels (in silico immune system).
  . Design of medical diagnostics and therapeutic/prophylactic vaccines for cancers, autoimmune diseases, and infectious diseases.

 Epitope discovery: how the antigens are recognized by the cells of the immune system.

 Data collection and analysis: IEDB, HIV database, AntiJen (UK), IMGT (France)

 Evolution of the adaptive and innate immune system: Gary Litman (FL), Louis Du Pasquier (Switzerland)

 Evolution of pathogens and co-evolution of host and pathogen.

 Modeling of host-pathogen interactions: Leor Weinberger (UCSD)

 Deciphering regulatory networks in APCs, lymphocytes and other cells.