Revealing Protein Structures: a New Method for Mapping Antibody Epitopes
Total Page:16
File Type:pdf, Size:1020Kb
Revealing protein structures: A new method for mapping antibody epitopes Brendan M. Mumey Brian W. Bailey Edward A. Dratz Department of Computer NIH/NlAAA/DlCBWLMBB Department of Chemistry and Science Fluorescence Studies Biochemistry Montana State University Park 5 Building Montana State University Bozeman, MT 59717-3880 12420 Parklawn Dr. MSC Bozeman, MT 59717-3400 [email protected] 8115 [email protected] Bethesda, MD 20892-8115 [email protected] ABSTRACT cells [9] and each protein has a unique folded structure. Whenever A recent idea for determining the three-dimensional structure of a the 3-D folding structure of linear protein sequences can be de- protein uses antibody recognition of surface structure and random termined this information has provided fundamental insights into peptide libraries to map antibody epitope combining sites. Anti- mechanisms of action that are often extremely useful in drug de- bodies that bind to the surface of the protein of interest can be sign. Traditional methods of protein structure determination re- used as “witnesses” to report the structure of the protein as follows: quire preparation of large amounts of protein in functional form, Proteins are composed of linear polypeptide chains that come to- which often may not be feasible. Attempts are then made to grow gether in complex spatial folding patterns to create the native pro- 3-D crystals of the proteins of interest for structure determination tein structures and these folded structures form the binding sites by x-ray diffraction, however, obtaining crystals of sufficient qual- for the antibodies. Short amino acid probe sequences, which bind ity is still an art and may not be possible [24, 251. Alternatively, to the active region of each antibody, can be selected from random if the proteins are not too large, are highly water soluble, and meet sequence peptide libraries. These probe sequences can often be other criteria, methods of nuclear magnetic resonance can be used aligned to discontinuous regions of the one-dimensional target se- for structure determination [8]. It is also possible to predict 3-D quence of a protein. Such alignments indicate how pieces of the structures de nova from the sequence of amino acids in the protein, protein sequence must be folded together in space and thus provide but the available methods are not very accurate unless a 3-D struc- valuable long-range constraints for solving the overall 3-D struc- ture of a highly homologous protein is already known [2] (also see ture. This new approach is applicable to the very large number of http://predictioncenter.llnl.gov). proteins that are refractory to current approaches to structure deter- mination and has the advantage of requiring very small amounts of A large fraction of protein structures of interest (50% or more) can- the target protein. The binding site of an antibody is a surface, not not be solved by the traditional approaches above [14, 153. Thus, just a linear sequence, so the epitope mapping alignment problem the antibody imprint method is being developed to provide struc- is outside the scope of classical string alignment algorithms, such tural information on difficult cases that appear refractory to tradi- as Smith-Waterman. We formalize the alignment problem that is tional approaches [7, 21, 11. The antibody imprint method makes at the heart of this new approach, prove that the epitope mapping use of information carried in the structures of antibodies against alignment problem is NP-complete, and give some initial results proteins of interest to reveal the 3-D folding of target proteins [7, using a branch-and-bound algorithm to map two real-life cases. 21, 1, 13, 171. Antibodies tend to be highly specific for the protein structures that they recognize [20]. They can either recognize con- 1. ANTIBODY EPITOPE MAPPING tinuous epitopes or discontinuous epitopes. Discontinuous epitopes Proteins are nano-machines that carry out most of the processes provide the most useful structural information in antibody imprint- in living cells. These tiny machines are constructed from long- ing, because they can reveal distant pieces of primary sequence that chains (typically 100-1000 elements) composed of twenty different are located in close spatial proximity in the native, folded protein. amino acids arranged in characteristic sequences. Proteins must be Most antibodies recognize discontinuous epitopes on protein sur- folded into complex 3-D shapes to create the binding pockets and faces [26]. Studies of a substantial number of antibody-protein active sites necessary to carry out their myriad of different func- complexes with known x-ray structures indicate that these com- tions [5]. There are at least 30,000 different proteins in human plexes form in a lock and key manner with little or no structural change induced by complex formation [ 1 I]. Fortunately, relatively few long-distance constraints are needed to reveal the global fold- Permission to make digital or hard copies of all or part of this work for ing of proteins [lo, 121. In addition, the spatial proximity of dif- personal or classroom use is granted without fee provided that copies ferent regions of proteins can change during function and antibody are not made or distributed for profit or commercial advantage and that imprinting has the potential to reveal these structural changes, if ap- copies bear this notice and the full citation on the first page. To copy propriate antibodies can be found that recognize the different struc- otherwise, to republish, to post on servers or to redistribute to lists, tural shapes [ 11. requires prior specific permission and/or a fee. RECOMB ‘02, April 18-21, 2002 Washington, D.C., USA Copyright 2002 ACM ISBN l-58113-498-3-02/04 . ..$5.00 Briefly, the antibody imprinting method is carried out by immobi- 233 lizing antibodies (against a target of interest) on beads or in plastic occur in probe sequence when the epitope mapping is discontin- wells. Random peptide libraries are exposed to the immobilized an- uous. We also allow single position gaps in the target, reflecting tibodies so that library members that bind to the antibodies can be the possibility of a single residue insertion into the probe. To be captured on the surface. The random peptide libraries are carried on a valid alignment, each probe position and target position can be bacteriophage (which is called “phage display” of the library) [3]. used at most once per mapping. Formally, an alignment A con- Each phage has a different peptide expressed on the surface of one SiStS of a sorted Set PA = {ii < is < . .. < ik}, and another of the coat proteins of the phage and there are typically 5 . 10’ [6] SetTA = {jl,jZ,... ,jk}. with the interpretation that the i,-th and even up to 10” different sequences in the library [29]. These probe residue, s(&), is aligned to the jr,-th target residue, t(j,), probe libraries contain either linear peptides or can be constrained for 1 5 p 5 k. with circular topology where the two ends of the probe are chem- ically linked with a disulfide bond. Peptide sequences that do not We adopt a two-part scoring system to evaluate the quality of align- stick to the antibody are washed off and the tightly binding phage ments. The scoring system is composed of a substitution score and are eluted under harsher conditions. The phages that bind to the a gap cost, antibody are multiplied by growth in suitable bacteria and again score(A) = S(A) - G(A). exposed to the immobilized antibody. These cycles of binding and enrichment of members of the random peptide library are usually The S(A) component is calculated with a substitution matrix M, repeated three times to select the phages with the highest affinity similar in principal to a Dayhoff matrix, used in other protein align- to the antibody. These enriched phages are then highly diluted and ment contexts. We discuss our choice of substitution matrix in the grown as clones that arise from individual phage particles. Each of experimental results section. The matrix is also used to score un- the phage clones carry the DNA sequence that codes for the peptide aligned probe positions; if character c occurs and unaligned probe sequences that have been selected. This DNA region is amplified position i, then M(c, -) is included in S(A): by PCR, tagged with fluorescent primers and sequenced in a stan- dard automated DNA sequencer. In this way, the sequence for each peptide is discovered. These individual sequences are often highly p=l probe positions i $! PA conserved and SO-100 independent peptide sequences together de- scribe a consensus sequence, called the coltsens~s epitope of the The probe gap cost G(A) is calculated by examining the number antibody. The problem addressed in the present paper is to develop of amino acid residues skipped along the target protein sequence a means to examine and evaluate all possible ways in which a con- between successive aligned probe positions: sensus epitope can be mapped onto the target protein in question, k-l to provide proximity constraints on the 3-D structure of the pro- (T-4) = c 4lh+l - hII tein. We adopt the terminology that the consensus epitope sequence *=I forms a probe that is to be aligned to the protein target sequence. where d(z) is the cost of skipping z amino acids along the target between successive mapped probe positions. For circular probes In the remainder of the paper, we formalize the probe-target align- we also include the term d[]jk - ji I] in the above sum. The com- ment problem, describe a branch-and-bound algorithm to find op- putational problem is thus to find finding an alignment A that max- timal (and sub-optimal) alignments, prove the corresponding deci- imizes score(A).