
Know Thine Enemy - Current Methods of Prediction

Rosanna Smith

Supervisors: Dr Matthew Davies & Prof Peter Coveney

Word Count: 4900

April 16, 2007

Contents

1 Introduction

2 Vaccination and Immunity
2.1 Innate Immunity
2.2 Adaptive Immunity
2.3 Vaccination

3 T-cell Epitope Prediction
3.1 Motifs, Quantitative Matrices and Artificial Neural Networks
3.2 Molecular Dynamics

4 B-Cell Epitope Prediction

5 Reverse Vaccinology

6 Impact of the Grid

7 Final Remarks

1 Introduction

Since Edward Jenner’s pioneering experiments in 1796 on the use of cowpox to vaccinate against human smallpox, vaccinology has protected millions of people against potentially lethal or disabling diseases such as polio, tetanus and diphtheria. Smallpox itself was declared ‘eradicated’ by the World Health Organisation in 1980 as a direct result of mass vaccination. However, there are many other infections, including malaria and HIV, for which an effective vaccine has so far eluded researchers. This essay reviews the current, mainly non-experimental, methods used in epitope prediction, which could provide an important time-saving step in the design of effective vaccines. The motivation for epitope prediction may be rooted in vaccine design, but the research area also pushes the boundaries of understanding in fundamental subjects such as protein-protein interactions. Epitope prediction also requires the development of computational methods such as pattern recognition and the extraction of information from data sets of varying size. As shall be described, epitope prediction has achieved some limited success thus far, but could hold the potential for much more.

2 Vaccination and Immunity

The aim of vaccination is to manipulate the adaptive immune response so that a potentially harmful pathogen, if encountered, is immediately recognised by the body and destroyed as quickly as possible. This involves challenging the immune system with an agent similar enough to the pathogen that the two are considered equivalent by the immune system, while the agent itself must not cause disease or serious side effects. Knowledge of how a pathogen is recognised as foreign is therefore crucial when designing an agent for use in vaccination.

2.1 Innate Immunity

The immune system has two approaches to infection: the innate immune response and the adaptive immune response. The innate immune response is the first line of defence against infection and involves the removal of pathogens by macrophages and neutrophils. These white blood cells have receptors for common pathogen surface molecules, which allow the white blood cell to distinguish the pathogen from host (self) cells. After recognition by the receptors, the pathogen is ingested (phagocytosis) and then killed. As well as directly removing the pathogen, the innate response also initiates the adaptive immune response, which takes effect approximately 96 hours after infection if the innate response has not cleared the pathogens. Although the innate response can recognise pathogens with common types of surface molecule, the macrophage and neutrophil receptors are inherited with little modification and evolve more slowly than the pathogens they encounter. This means that pathogens can evade the innate response. For example, viruses do not have any surface molecules similar to those recognised by these white cell receptors, and some bacteria, including the agent of syphilis Treponema pallidum, have developed a protective capsule to avoid detection.

2.2 Adaptive Immunity

The adaptive immune system responds to pathogens that either are not detected by the macrophages and neutrophils, or are too prevalent to be dealt with by the innate response alone. The adaptive immune response is centred on the roles played by two types of lymphocyte: T-cells and B-cells. These host cells not only detect pathogens but also take part in their removal. B-cells detect pathogens via surface receptors known as immunoglobulins. Each B-cell produces receptors of a single specificity, which will only bind to a single type of molecule. If a B-cell is activated, it will also secrete immunoglobulin molecules, known as antibodies, which have the same specificity as the receptors. The molecule from the pathogen which immunoglobulins bind is known as the antigen, and is commonly a pathogenic surface protein. On a more detailed level, the exact region on the antigen surface where the receptor or antibody binds is the B-cell epitope, and the corresponding region of the immunoglobulin molecule is the paratope. Variation of the immunoglobulin paratope is possible because the protein domain is encoded by several different gene segments. During B-cell development these gene segments undergo random somatic recombination, as well as random insertion and deletion of nucleotides. This means that although an individual inherits only a limited number of different antigen-binding site gene segments, the recombination processes result in a vast number of possible antigen-binding sites, leading to the recognition of an equally vast number of antigens.

T-cells do not detect pathogens directly; instead, the T-cell receptors recognise foreign peptides bound to MHC (major histocompatibility complex) molecules on the surface of host cells. MHC molecules are generated by the host cell, but the short peptides they bind originate from the pathogen. For T-cells the epitope is the foreign peptide, but it must be bound by an MHC molecule to attain the correct conformation for recognition by T-cell receptors. There are two classes of MHC molecule: MHC class I molecules bind peptides derived from proteins (including viral proteins) synthesised in the cytosol, and MHC class II molecules bind peptides derived from proteins within intracellular vesicles (such as phagocytosed pathogens). The host cells express both types of MHC molecule and continuously display peptides from the cytosol and intracellular vesicles on their surface, including self-peptides. However, self-peptides are not recognised by naive T-cells, as negative selection during maturation leads to self-tolerance. Within the human population there are hundreds of different versions of each class of MHC molecule, but any given individual inherits only 6-8 of these allelic variants. The two classes of MHC molecule have similar structures, with two of the four protein domains forming the walls of a groove that binds peptides. However, there are differences between the two classes that affect the prediction of peptide binding. MHC class I molecules bind peptides 8-10 amino acids long, which fit inside the groove, closed at each end. In contrast, the MHC class II binding groove is open at each end, and the peptide length ranges from 13 amino acids up to around 20.

Figure 1: Diagram of MHC molecules (grey) binding peptides (black) a) Class I b) Class II [1]

Foreign peptide fragments bound to MHC molecules are delivered to the cell surface and presented to the external environment, where they are accessible to T-cell receptors. T-cell receptors have a similar structure to part of the immunoglobulin molecule and have a similarly large number of possible binding sites. However, T-cells also have a co-receptor, which must bind simultaneously to the MHC molecule in order for the receptor to cause activation. T-cells expressing the CD8 co-receptor specifically bind to MHC class I molecules, whereas T-cells expressing the CD4 co-receptor bind to MHC class II molecules. This specificity is related to the future function of the activated T-cell. It is important that neither B-cells nor T-cells recognise host cells as foreign and mount an autoimmune response. To prevent this, they undergo negative selection during maturation: cells whose receptors recognise host proteins and peptide fragments are killed, leaving only those with receptors that may recognise foreign antigens. As the epitope for T-cells partly consists of a self protein (the MHC molecule), there is also positive selection for receptors that are able to recognise self MHC molecules. The adaptive immune response is initiated by host dendritic cells, which either engulf pathogens at the site of infection or are themselves infected by a virus or bacterium that replicates in the cytosol. The dendritic cells migrate to a lymph node, where they come into contact with naive (unactivated) CD8 and CD4 T-cells, which may recognise peptides presented by the dendritic cell from its cytosol and intracellular vesicles via the appropriate class of MHC molecule. Naive T-cells with receptors that recognise a pathogenic peptide presented by the dendritic cell are activated. Only these T-cells undergo clonal expansion, in which multiple divisions of the T-cell produce many progeny with exactly the same receptor and co-receptor.
The progeny are mature, armed effector T-cells, which act against the type of infection that caused naive T-cell activation. Effector CD8 T-cells are also known as cytotoxic T-cells, as they kill the host cell that presents the activating pathogenic peptide, preventing further replication and dispersal of the pathogen (typically a virus). Effector CD4 T-cells recognise peptides delivered by MHC class II molecules from the intracellular vesicles of macrophages, dendritic cells and B-cells, and can either activate macrophages to kill their vesicular pathogens or help activate B-cells to produce antibodies directed against the pathogen. After an infection has been cleared, some of the armed effector T-cells change into long-lived memory T-cells that will be reactivated if reinfection by the same pathogen occurs.

Naive B-cells are activated by a subset of the effector CD4 T-cells, the helper T-cells. If a naive B-cell has surface receptors that recognise an antigen on the pathogen, the pathogen is ingested by the B-cell and antigenic peptides are delivered to the B-cell surface by MHC molecules. The combined signals of antigen recognition at the surface and antigenic peptide recognition by a helper T-cell lead to B-cell activation. Upon activation the B-cell undergoes clonal expansion and subsequent differentiation, either into plasma cells, which produce large amounts of antibody with the same paratope as the original B-cell receptor, or into long-lived memory B-cells. The antibodies are secreted into the extracellular space, where they combat the pathogen by binding bacterial toxins or binding to the bacteria themselves; macrophages then recognise the non-antigen-binding end of the bound antibodies and ingest the bacteria. As has been described, the primary immune response not only fights infection but also induces the production of memory lymphocytes, which means that the response time upon reinfection is greatly decreased.

2.3 Vaccination

Traditional vaccine design involves eliciting a primary adaptive immune response using dead or attenuated forms of the whole pathogen (for example, the polio and measles viruses) or an inactivated pathogenic toxin (for example, the toxin produced by Bordetella pertussis, the cause of whooping cough). However, this approach requires the pathogen to be cultured in vitro, which is not always possible. Peptides are ideal vaccine agents, as they are more easily taken up by host cells than proteins or whole pathogens, and their specificity can prevent unwanted side effects. The experimental identification of which pathogenic proteins act as antigens is an arduous process, requiring either whole proteins or single peptides to be tested for potency from the vast range of proteins produced by the pathogen. Also, the experimental process of splitting proteins into peptides involves enzymes that could cleave the amino acid sequence at any point and, crucially, at different points from those at which the protein is cleaved in vivo. This introduces the possibility that potential antigenic peptides are missed by experimental methods because they have been cleaved. The aim of epitope prediction is not to do away with experimentation, but to pick out likely B-cell and T-cell epitopes from the vast array of possibilities, so that subsequent experimental testing is more efficient at revealing suitable vaccine candidates.

3 T-cell Epitope Prediction

The epitope for T-cell receptors is the complex formed by the binding of a small peptide to an MHC molecule. The MHC molecule on its own is not a stable structure and will not bind to a receptor, but is stabilised by the presence of the peptide. As only peptides that bind to MHC molecules are delivered to the cell surface, only this subset of peptides from the ingested pathogen can be detected by naive T-cells and possibly trigger an adaptive immune response. T-cell epitope prediction therefore focuses on predicting whether or not any given peptide from the pathogen will bind to an MHC molecule. However, a peptide that does bind is not automatically a T-cell epitope, as there may not be a complementary T-cell receptor. So although not all binders are epitopes, all epitopes are binders, and the binding approach to T-cell epitope prediction does not exclude any potential epitopes. It is relatively simple to determine the binding strength of peptides to MHC molecules using competitive binding assays. The IC50 value of a peptide is the concentration of that peptide required to inhibit by 50% the binding of a standard peptide, labelled fluorescently or with a radioisotope. Strong binders are normally defined as peptides with an IC50 below 50 nM [2]. As binding can be determined experimentally, there is a wealth of peptide binding data stored in databases such as JenPep [3], which can be used for T-cell epitope prediction.
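To make the IC50 definition concrete, the following Python sketch estimates an IC50 from a competition-assay titration and applies the 50 nM strong-binder cut-off. The titration values are invented, and interpolating linearly in log concentration is a simplifying assumption; assay data are more usually analysed by fitting a sigmoidal dose-response curve.

```python
import math

STRONG_BINDER_NM = 50.0  # strong-binder cut-off quoted in the text [2]


def estimate_ic50(concs_nm, inhibition_pct):
    """Interpolate the concentration giving 50% inhibition, linearly in log10(concentration).

    concs_nm: increasing test-peptide concentrations (nM).
    inhibition_pct: % inhibition of labelled standard-peptide binding at each concentration.
    Returns None if the data never bracket 50% inhibition.
    """
    pairs = list(zip(concs_nm, inhibition_pct))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(pairs, pairs[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


def is_strong_binder(ic50_nm):
    return ic50_nm is not None and ic50_nm < STRONG_BINDER_NM


# Hypothetical titration of one test peptide against a labelled standard peptide.
concs = [1.0, 10.0, 100.0, 1000.0]
inhibition = [10.0, 30.0, 70.0, 95.0]
ic50 = estimate_ic50(concs, inhibition)
print(f"IC50 = {ic50:.1f} nM, strong binder: {is_strong_binder(ic50)}")  # IC50 = 31.6 nM, strong binder: True
```

A peptide whose titration never reaches 50% inhibition gets no IC50 and is treated as a non-binder.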

3.1 Motifs, Quantitative Matrices and Artificial Neural Networks

Peptides that bind to MHC class I molecules are stabilised at each end by their amino and carboxyl termini, which bind strongly to invariant MHC sites (the same residues in all allelic variants). There are additional positions along the peptide, known as anchor sites, where peptide residues bind strongly to residues in the MHC groove that show allelic variation. For a given allelic variant, each strongly binding peptide will have the same (or a similar) residue at that site. For example, most variants bind a large hydrophobic residue such as valine, leucine or isoleucine at the peptide's carboxyl end, as the side chains of these amino acids fit into a complementary pocket on the MHC molecule. Binding peptides extend through each end of the MHC class II groove, so the region of the peptide (8-10 amino acids in length) that binds within the groove varies. The MHC class II molecule binds along the backbone of the peptide as well as at anchor sites. These anchor sites are less clear-cut than for MHC class I molecules, as strong binding to a given MHC class II variant does not always require the same type of peptide residue at an anchor site.

The presence of anchor residues results in common patterns (motifs) occurring in all the peptides that bind to a particular MHC molecule. Binding motifs can be found by first eluting strongly binding peptides from the MHC molecule, then sequentially degrading each peptide from the N terminus and using chromatography to identify each removed amino acid. By pooling the results for many peptides, it can be seen which amino acids occur most often at each residue position [4]. The motifs common to bound peptides can then be used to select, from all the possible peptides in a pathogenic protein, those that conform to the motif. The predictive power of this method has been tested in studies on human papillomavirus [5], which reported positive predictive values (the fraction of predicted positives that are true positives) of 27-73%, depending on the method used to determine the motif.

Refinement of the motif method leads to quantitative matrices. These take into account the contribution of every amino acid in the peptide towards binding, rather than just those at anchor sites. In this method a set of known binding peptides is used to determine a binding coefficient for every type of amino acid at each peptide position, forming a matrix of coefficients. Peptide sequences from the pathogenic protein are scored by combining the relevant elements of the coefficient matrix. Matrix methods are a step up in complexity from binding motifs and have greater predictive power. However, they assume that each amino acid contributes independently towards binding, which is not the case, and they also tend to bias results towards the set of peptides used to determine the matrix coefficients.
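The motif and quantitative-matrix approaches can be sketched in Python for 9-mer peptides binding a class I molecule. Both the anchor motif (L or M at position 2; V, L or I at position 9) and the matrix coefficients below are hypothetical placeholders chosen for illustration, not values from [4], [5] or any published matrix; the protein sequence is also made up.

```python
# Hypothetical class I anchor motif (1-based positions): L or M at P2, V, L or I at P9.
MOTIF = {2: set("LM"), 9: set("VLI")}

# Hypothetical quantitative matrix: (position, residue) -> binding coefficient.
# Pairs not listed contribute 0; a real matrix holds a value for all 9 x 20 cells.
MATRIX = {
    (1, "K"): 0.3, (2, "L"): 1.5, (2, "M"): 1.2, (3, "F"): 0.4,
    (5, "P"): -0.8, (9, "V"): 1.4, (9, "L"): 1.1, (9, "I"): 1.0,
}


def nonamers(protein):
    """Every overlapping 9-residue peptide with its 0-based start position."""
    return [(i, protein[i:i + 9]) for i in range(len(protein) - 8)]


def matches_motif(peptide, motif=MOTIF):
    """Motif method: keep a peptide only if every anchor position holds an allowed residue."""
    return all(peptide[pos - 1] in allowed for pos, allowed in motif.items())


def matrix_score(peptide, matrix=MATRIX):
    """Matrix method: sum one coefficient per position, i.e. assume independent contributions."""
    return sum(matrix.get((pos, aa), 0.0) for pos, aa in enumerate(peptide, start=1))


protein = "MKLFEAPSVLLKEWGKMSIGTV"  # made-up sequence
motif_hits = [(i, p) for i, p in nonamers(protein) if matches_motif(p)]
ranked = sorted(nonamers(protein), key=lambda item: matrix_score(item[1]), reverse=True)
print(motif_hits)  # [(1, 'KLFEAPSVL')]
print(ranked[0])   # (1, 'KLFEAPSVL')
```

The motif acts as a hard yes/no filter, whereas the matrix gives every candidate a graded score that can be ranked; the additive `matrix_score` makes the independence assumption criticised above explicit.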
Artificial neural networks (ANNs) are adaptive processing networks with the capacity to learn. They operate by creating connections between many processing units, each analogous to a single neuron in the brain, which was the original inspiration for this form of network. Each unit takes many input signals and, based on an internal weighting system, produces a single output signal, which is often one of the inputs for another unit. Initially the network does not ‘know’ anything, as the internal weightings of the units are randomised. Each time the network performs a task, the output can be deemed correct or incorrect. Weightings that lead to correct outputs are reinforced, whereas those that lead to incorrect outputs are diminished, so the network undergoes ‘supervised learning’. In the context of T-cell epitope prediction, ANNs can be trained, using known peptide binders from experiment, to recognise the patterns that differentiate binders from non-binders. The sensitivity of binding prediction (the fraction of true binders that are correctly predicted, i.e. true positives divided by the sum of true positives and false negatives) has been reported to be 80% for ANN-based methods [6], which is higher than for matrix methods. The ability of ANNs to learn means that new data can easily be incorporated, and they can also cope with datasets containing some erroneous data. One disadvantage, however, is that the peptides require alignment with the anchor positions before processing. This is relatively simple for MHC class I molecules, as there is little variation in alignment, but considerably more difficult for MHC class II molecules [7]. In general, however, ANNs provide an increasingly sophisticated and accurate method of peptide binding prediction.
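A full multilayer network is beyond a short example, but the supervised-learning loop described above can be sketched in Python with the smallest possible network: a single sigmoid unit reading a one-hot encoding of an aligned 9-mer. This unit is equivalent to logistic regression and is far simpler than the networks of [6] and [7]; the training peptides are invented, and, unlike the randomised starting weightings described in the text, the weights start at zero so that the run is deterministic.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide):
    """Encode a peptide as a 0/1 vector with one block of 20 inputs per position."""
    x = [0.0] * (20 * len(peptide))
    for pos, aa in enumerate(peptide):
        x[20 * pos + INDEX[aa]] = 1.0
    return x


def output(w, b, x):
    """A single sigmoid unit: weighted sum of the inputs squashed into (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train(peptides, labels, epochs=200, rate=0.5):
    """Supervised learning by batch gradient descent on the cross-entropy loss.

    labels: 1 for a known binder, 0 for a non-binder.
    """
    data = [(one_hot(p), y) for p, y in zip(peptides, labels)]
    n_inputs = len(data[0][0])
    w, b = [0.0] * n_inputs, 0.0  # zero start keeps the example deterministic
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_inputs, 0.0
        for x, y in data:
            err = y - output(w, b, x)  # > 0: output too low; < 0: output too high
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Strengthen weightings that move outputs towards the correct label, weaken the rest.
        w = [wj + rate * gj / len(data) for wj, gj in zip(w, grad_w)]
        b += rate * grad_b / len(data)
    return w, b


# Invented training set: "binders" carry L at position 2 and V at position 9.
binders = ["ALAKSGTEV", "YLNPTKQAV", "SLWERTFKV", "GLMDHRSAV"]
non_binders = ["AGAKSGTED", "YGNPTKQAD", "SGWERTFKD", "GGMDHRSAD"]
w, b = train(binders + non_binders, [1] * 4 + [0] * 4)
print(round(output(w, b, one_hot("TLHQEMWPV")), 2))  # unseen peptide carrying both anchors
```

Because every input is tied to a fixed peptide position, the encoding only makes sense for pre-aligned peptides, which is exactly the alignment requirement that makes MHC class II prediction harder.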

3.2 Molecular Dynamics

Molecular dynamics methods use the structures of MHC:peptide complexes determined by X-ray crystallography as the starting point for simulating the action of physical laws on the atoms in the complex. X-ray diffraction of MHC:peptide crystals can determine the average positions of the atoms in the complex, but these are not the same as the equilibrium positions that occur when the complex is in its lowest energy configuration. Molecular dynamics methods can determine the lowest energy configuration of the complex, from which the interaction energy of the peptide and MHC molecule can be calculated [8]. Comparing the interaction energies of different peptides indicates which are strong binders and therefore possible contenders as T-cell epitopes. Finding the lowest energy configuration is an iterative process over many incremental time steps. Each atom has a position and a velocity, and for the duration of a time step it is subject to a constant force due to its interactions with surrounding atoms. For example, covalently bonded atoms have an equilibrium bond length, and any perturbation from it results in a force that restores equilibrium. Under Newtonian mechanics the system is deterministic, so the positions and velocities at the end of the time step can be found. These are then used as the initial conditions for the next time step, and the associated forces are recalculated for the new positions. Iterating this process shows how the configuration changes over time when subjected to controlled environmental factors. In the case of epitope prediction, the initial atomic coordinates for the complex are those from X-ray diffraction structures, combined with the coordinates of water molecules positioned to surround the complex.

For the first time step the initial atomic velocities are randomly assigned from the Maxwell-Boltzmann velocity distribution for a given temperature. The time step used in simulation is normally one femtosecond. The lowest energy configuration is obtained by gradually decreasing the kinetic energy of the system through a friction term in the atomic equations of motion, which relates the current temperature of the system to a target temperature [9]. The complex reaches its minimum energy configuration when the kinetic energy reaches zero, and this configuration can then be used to determine the interaction energy. Molecular dynamics methods demonstrate predictive power at least comparable to that of quantitative matrices [8], with a more realistic approach, as they consider interactions on the scale of individual atoms rather than amino acids.

A major disadvantage of this method is that molecular dynamics calculations are often prohibitively expensive computationally, and the feasibility of their use depends on increased availability of computing power. This problem is discussed further in Section 6.
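The time-stepping and cooling procedure can be illustrated on a deliberately tiny system. The Python sketch below assumes two atoms on a line joined by a harmonic bond, in reduced units with k_B = 1. It uses velocity Verlet integration (which recomputes forces at the new positions rather than holding them strictly constant over the step) and a Berendsen-style velocity-rescaling term as the friction that couples the current temperature to a target temperature. All parameter values are hypothetical, the target temperature is simply set to zero rather than lowered gradually, and no claim is made that [8] or [9] use this exact scheme.

```python
import math
import random

# Toy system (hypothetical values, reduced units with k_B = 1):
# two atoms on a line joined by a harmonic "covalent bond".
K_BOND = 1.0  # spring constant
R0 = 1.0      # equilibrium bond length
MASS = 1.0    # mass of each atom


def forces(x):
    """Restoring force from U = 0.5 * K_BOND * (r - R0)**2 with r = x[1] - x[0]."""
    f = -K_BOND * ((x[1] - x[0]) - R0)  # force on atom 1
    return [-f, f]                       # atom 0 feels the equal and opposite force


def temperature(v):
    """Instantaneous temperature in 1D: kinetic energy = (N / 2) * T."""
    kinetic = sum(0.5 * MASS * vi * vi for vi in v)
    return 2.0 * kinetic / len(v)


def minimise_by_cooling(x, t_start=0.5, t_target=0.0, dt=0.01, tau=0.5,
                        steps=5000, seed=1):
    rng = random.Random(seed)
    # Initial velocities from the 1D Maxwell-Boltzmann distribution:
    # each component is Gaussian with standard deviation sqrt(T / m).
    v = [rng.gauss(0.0, math.sqrt(t_start / MASS)) for _ in x]
    x = list(x)
    f = forces(x)
    for _ in range(steps):
        # Velocity Verlet: half-step velocities, full-step positions, new forces, half-step velocities.
        v = [vi + 0.5 * dt * fi / MASS for vi, fi in zip(v, f)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = forces(x)
        v = [vi + 0.5 * dt * fi / MASS for vi, fi in zip(v, f)]
        # Berendsen-style friction: rescale velocities towards the target temperature.
        t_now = temperature(v)
        if t_now > 0.0:
            scale = math.sqrt(max(0.0, 1.0 + (dt / tau) * (t_target / t_now - 1.0)))
            v = [scale * vi for vi in v]
    return x, v


x_final, v_final = minimise_by_cooling([0.0, 1.5])  # bond initially stretched by 0.5
print(f"bond length {x_final[1] - x_final[0]:.4f}, temperature {temperature(v_final):.2e}")
```

As the kinetic energy is drained away, the bond length settles at the equilibrium value R0. A real MHC:peptide simulation differs mainly in scale (thousands of atoms, a full force field and explicit water), which is why computing cost becomes the limiting factor.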

4 B-Cell Epitope Prediction

Immunoglobulin molecules (B-cell receptors and antibodies) form a Y-shaped structure built from two identical halves, each consisting of one heavy and one light chain and containing 6 domains. As shown in Figure 2, there are constant regions, which can be one of a limited number of types, and variable regions, which take one of many possible structural variations.

Figure 2: Diagram of Immunoglobulin molecule structure showing the locations of the variable and constant regions and light and heavy chains.[10]

The antibody is structurally symmetric, so the two ‘arms’ of the Y-shape are identical and bind the same epitope. The variable regions of the heavy chain and light chain each contain 3 loop regions at the edge of a β sheet, known as the hypervariable regions or complementarity-determining regions (CDRs). These CDRs can assume a great number of structures owing to slight differences in amino acid sequence, and together all 6 CDRs form the antigen-binding site. The B-cell epitope itself is the region on the surface of a foreign body to which the paratope of a B-cell receptor (or antibody) binds. This is normally a region of protein surface where the amino acids of the epitope have been brought together by the folded protein structure from separate regions of the amino acid sequence. Hence they are known as discontinuous epitopes. All epitopes are likely to be discontinuous to some extent [ref Barlow]; however, a minority of epitopes consist primarily of a short stretch of sequential amino acids, and so are known as linear epitopes. From the nature of immunoglobulin paratopes and of the epitope sites on pathogenic proteins, it can be seen that B-cell epitope prediction is much less constrained than T-cell epitope prediction, where the epitope is known to be a short peptide and the form of its binding to the MHC molecule is also known. In contrast, a B-cell epitope could be any region of any pathogenic protein (although it is likely to be on a protein at the surface of the pathogen; see Section 5). Many studies have focused on the prediction of linear epitopes, as these can be predicted from the amino acid sequence of the pathogenic protein without requiring knowledge of its surface structure. These prediction methods have centred on the creation of amino acid propensity scales.
These scales assign a numerical weighting to each type of amino acid to reflect how likely it is to have a certain physicochemical property, such as hydrophilicity, or to be located within a certain structure, such as an α helix or β sheet. Some of these amino acid properties have been linked to antigenicity (hydrophilicity [11], location within a turn [12]), so using the associated propensity scale, a peptide of a given length can be scored by the average weighting of its constituent amino acids. By using a ‘sliding window’ to sample all peptides within the protein, a prediction profile is built up, in which peaks correspond to higher scores and therefore a higher likelihood of the corresponding peptide being an epitope [13]. However, amino acid propensity scales have been shown to perform only marginally better at epitope prediction than randomly selected weightings [14], indicating that a new approach is required.

Although many studies have focused on linear epitopes, these are a small subset, so attention is now turning towards the prediction of discontinuous epitopes. All predictive tools need a set of experimentally determined epitopes on which the analysis of epitope characteristics can be based. In the case of B-cell epitopes, this data set is obtained by X-ray diffraction of antigen-antibody complexes. Epitome [15] is a recent database detailing all known structurally determined antigen-antibody complexes, and it currently holds only 142 different epitopes. Compared with the number of known T-cell epitopes this is very small, and a predictive algorithm based on such a data set is unlikely to be accurate. If further X-ray diffraction studies of antibody:antigen complexes are carried out, it may become possible to use methods such as supervised learning to elucidate the common features of B-cell epitopes.
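The sliding-window propensity profile used for linear epitopes can be sketched in Python as follows. As an assumption for illustration, it uses the negated Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982) as a stand-in hydrophilicity scale; this is not the Parker scale of [11], and the window length of 7 and the test sequence are arbitrary choices.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Stand-in hydrophilicity scale: simply the negated hydropathy.
HYDROPHILICITY = {aa: -value for aa, value in KYTE_DOOLITTLE.items()}


def propensity_profile(protein, scale=HYDROPHILICITY, window=7):
    """Slide a window along the sequence; return (start, mean scale value) for each position."""
    return [
        (i, sum(scale[aa] for aa in protein[i:i + window]) / window)
        for i in range(len(protein) - window + 1)
    ]


def top_windows(profile, n=3):
    """The n highest-scoring windows, i.e. the peaks of the profile."""
    return sorted(profile, key=lambda item: item[1], reverse=True)[:n]


protein = "IIIIIIIKDEKRDEIIIIIII"  # made-up: hydrophilic stretch between hydrophobic runs
profile = propensity_profile(protein)
print(top_windows(profile, n=1)[0][0])  # 7: the window covering KDEKRDE
```

Given the benchmark result in [14], peaks in such a profile are weak evidence on their own.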

5 Reverse Vaccinology

A recent addition to the repertoire of epitope prediction methods is reverse vaccinology, a method that requires knowledge of the whole genome sequence of the pathogen. Reverse vaccinology does not attempt to identify a particular epitope directly; instead, it aims to identify surface proteins expressed by a pathogen. Pathogenic surface proteins are regularly lost from the pathogen surface and taken up non-selectively by dendritic cells, macrophages and B-cells, and their constituent peptides are presented to T-cells via MHC molecules. T-cell epitopes are therefore likely to originate from surface proteins of the pathogen. The genome is screened for surface proteins by comparing all the open reading frames with known protein-coding regions, using tools such as BLASTX [16]. If the predicted protein is known to function in the cytoplasm it is discarded, but all other proteins proceed to further analysis. This involves identifying whether the protein has features common to surface-associated proteins, such as transmembrane domains, or whether it shows homology to known surface proteins. Tools such as BLAST [16] and FASTA [17] are used for this screen. Filtering the proteome in this manner can reduce the number of proteins requiring experimental testing for antigenicity by up to 25% of the total [18]. This relatively simple and computationally inexpensive method provides a significant reduction in the time and cost involved in experimentally determining the antigenic proteins of a pathogen. An example of success for reverse vaccinology was the identification of several candidates for clinical development of a vaccine against Neisseria meningitidis, the bacterium responsible for meningitis B [19]. In this case the reverse vaccinology procedure identified 570 possible antigenic surface proteins, of which 22 were experimentally determined to be exposed at the surface and to induce a bactericidal antibody response. Reverse vaccinology does reduce the number of possible antigens significantly; however, further reduction could be gained by combining it with the other epitope prediction methods discussed, in order to determine which peptide(s) within the protein act as the epitope. For example, combining reverse vaccinology with subsequent T-cell epitope prediction could locate the epitope peptide within the protein sequence, leading to a peptide-based vaccine, which is preferable to a whole-protein vaccine.
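The two-stage filtering step of reverse vaccinology can be sketched in Python. The homology-search results are represented by hypothetical boolean flags (in practice they would be derived from BLAST or FASTA output), and the transmembrane check is a crude stand-in for dedicated predictors: it flags any 19-residue window whose mean Kyte-Doolittle hydropathy exceeds 1.6, a rule of thumb associated with that scale. The protein names, sequences and flags are invented.

```python
from dataclasses import dataclass

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


@dataclass
class PredictedProtein:
    name: str
    sequence: str
    cytoplasmic_homolog: bool  # hypothetical flag: matches a protein known to act in the cytoplasm
    surface_homolog: bool      # hypothetical flag: matches a known surface protein


def has_tm_segment(sequence, window=19, threshold=1.6):
    """Crude transmembrane check: some 19-residue window has mean hydropathy above 1.6."""
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        if sum(KYTE_DOOLITTLE[aa] for aa in segment) / window > threshold:
            return True
    return False


def surface_candidates(proteins):
    """Apply the two filtering steps described in the text; return names of kept proteins."""
    kept = []
    for p in proteins:
        if p.cytoplasmic_homolog:
            continue  # known cytoplasmic function: discard
        if p.surface_homolog or has_tm_segment(p.sequence):
            kept.append(p.name)  # surface-associated feature found: keep for testing
    return kept


# Invented ORF translations and homology flags.
orfs = [
    PredictedProtein("orf1", "MKT" + "L" * 20 + "KDE", False, False),
    PredictedProtein("orf2", "MKDEKRDEKRDEKRDEKRDE", False, True),
    PredictedProtein("orf3", "M" + "L" * 25, True, False),
    PredictedProtein("orf4", "MKDEKRDEKRDEKRDEKRDE", False, False),
]
print(surface_candidates(orfs))  # ['orf1', 'orf2']
```

Only the surviving proteins would go forward to experimental testing, or, as suggested above, to T-cell epitope prediction to locate candidate peptides within them.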

6 Impact of the Grid

As discussed in Section 3.2, molecular dynamics is a method of epitope prediction that would benefit greatly from an increase in available computing power. This increase is starting to appear in the form of grid computing. The premise of grid computing is that, at any given time, the potential computing power of a region far exceeds that actually being used by those with access to each constituent resource. If this untapped power could be used as a communal resource, calculations that are prohibitively time-consuming with locally available computing power could be performed on a sensible timescale. There is an analogy with the National Grid for electric power: when we switch on a kettle we do not know which power station has provided the electricity, as all generated power is pooled as a resource for the end user. An equivalent system in distributed computing, involving all computing resources, is still a long way off. However, grids have been established that involve significant numbers of high-performance computers which were previously available only to those working directly for the institution that owns them. For example, TeraGrid in the USA combines some of the computing power of 9 universities and institutions, with the resulting grid accessible to researchers at those institutions and to outside users who apply for computing time. The National Grid Service in the UK provides a similar type of grid for universities and research institutions, as well as acting as the connection between the UK grid and European grid projects such as DEISA (Distributed European Infrastructure for Supercomputing Applications). One problem facing grid computing is the development of middleware that presents a uniform interface to users and hides the heterogeneity of the providers, but progress is being made in making middleware less cumbersome for both users and providers [20].

The grids mentioned above are not designed for a particular application, and can be used for any problem requiring massive amounts of computing power. Research in the physical sciences, particularly particle and atmospheric physics, has dominated the use of grid technology so far, but applications within the biological sciences are starting to emerge. As well as the use of molecular dynamics in epitope prediction, other fields within immunology are likely to benefit from grid computing through projects such as ImmunoGrid and ViroLab. These are two Europe-wide projects, which aim to create a virtual human immune system in the case of the former and a virtual laboratory for infectious diseases in the case of the latter. Neither project would be feasible without grid computing capabilities, which demonstrates that grid computing can not only speed up existing research methods but also enable entirely new ones.

7 Final Remarks

Epitope prediction as a method for the efficient identification of potential vaccines builds on knowledge of the immune response to infection. The study of immunology has determined many of the details concerning the recognition of pathogens as foreign bodies and the means of their removal. Knowledge of the roles and structure of MHC molecules has led to T-cell epitope prediction based on MHC-binding peptides. Although the structures of immunoglobulin molecules have also been determined, using this knowledge to predict B-cell epitopes is a much harder problem than T-cell epitope prediction, and to date has achieved little tangible success in terms of potential vaccines. However, exploring what makes a surface region of a pathogenic protein act as an epitope continues to shed light on protein interactions and suitable modelling approaches. Reverse vaccinology is a recent addition to the vaccine candidate prediction repertoire, showing early success, and in combination with T-cell epitope prediction the method should become even more useful. Epitope prediction is clearly a challenging but useful field of research, and will hopefully become even more fruitful in the future, aided perhaps by the application of grid computing.

References

[1] Hertz T and Yanover C. PepDist: A new framework for protein-protein binding prediction based on learning peptide distance functions. Bioinformatics, 7, 2006.

[2] Blythe M, Doytchinova I, and Flower D. JenPep: a database of quantitative functional peptide data for immunology. Bioinformatics, 18:434–439, 2001.

[3] Blythe M, Doytchinova I, and Flower D. JenPep. www.jenner.ac.uk/JenPep.

[4] Falk K, Rotzschke O, Stevanovic S, Jung G, and Rammensee H. Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature, 351:290–296, 1991.

[5] Kast W, Brandt R, Sidney J, Drijfhout J, Kubo R, Grey H, Melief C, and Sette A. Role of HLA-A Motifs in Identification of Potential CTL Epitopes in Human Papillomavirus Type 16 E6 and E7 Proteins. Journal of Immunology, 152:3904–3912, 1994.

[6] Adams H and Koziol J. Prediction of Binding to MHC class I molecules. Journal of Immunological Methods, 185:181–190, 1995.

[7] Brusic V, Rudy G, Honeyman M, Hammer J, and Harrison L. Prediction of MHC class II-binding peptides using an evolutionary artificial neural network. Bioinformatics, 14:121–130, 1998.

[8] Davies M, Sansom C, Beazley C, and Moss D. A Novel Predictive Technique for the MHC Class II Peptide-Binding Interaction. Molecular Medicine, 9:9–12, 2003.

[9] Brunger A, Adams P, and Rice L. New Applications of simulated annealing in X-ray crystallography and solution NMR. Structure, 5:325–336, 1997.

[10] Ross J. www.jdaross.cwc.net/humoral_immunity.htm.

[11] Parker J, Guo D, and Hodges R. New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry, 25:5425, 1986.

[12] Pellequer J, Westhof E, and van Regenmortel M. Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunology Letters, 36:83–99, 1993.

[13] Pellequer J, Westhof E, and van Regenmortel M. Epitope predictions from the primary structure of proteins. Peptide Antigens, A Practical Approach, Oxford University Press.

[14] Blythe M and Flower D. Benchmarking epitope prediction: Underperformance of existing methods. Protein Science, 14:246–248, 2005.

[15] Schlessinger A, Ofran Y, Yachdav G, and Rost B. Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Research, 34:777–780, 2006.

[16] NCBI. Basic Local Alignment Search Tool (BLAST). www.ncbi.nlm.nih.gov.library.vu.edu.au/BLAST/.

[17] EMBL-EBI. FASTA–Protein Similarity Search. www.ebi.ac.uk/fasta33/.

[18] Mora M, Veggi D, Santini L, Pizza M, and Rappuoli R. Reverse vaccinology. Drug Discovery Today, 8:459–464, 2003.

[19] Pizza M and Scarlato M et al. Identification of Vaccine Candidates Against Serogroup B Meningococcus by Whole-Genome Sequencing. Science, 287:1816–1820, 2000.

[20] Coveney P, Saksena S, McKeown M, and Pickles S. The application hosting environment: lightweight middleware for grid-based computational science. Computer Physics Communications, 176:406–418, 2007.
