University of Wollongong Research Online

University of Wollongong Thesis Collection

1994

Somatic hypermutation and germline evolution of immunoglobulin variable genes

Harald Sebastian Rothenfluh
University of Wollongong

Recommended Citation Rothenfluh, Harald Sebastian, Somatic hypermutation and germline evolution of immunoglobulin variable genes, Doctor of Philosophy thesis, Department of Biological Sciences, University of Wollongong, 1994. http://ro.uow.edu.au/theses/1053

Research Online is the open access institutional repository for the University of Wollongong. For further information contact Manager Repository Services: [email protected].

Somatic hypermutation and germline evolution of immunoglobulin variable genes

A thesis submitted in fulfilment of the requirements for the award of the degree

Doctor of Philosophy

from The University of Wollongong

by

Harald Sebastian Rothenfluh, B.Sc.(UNSW) Grad. Dip. Ed.(SCAE) B.Sc.Hons.(UW)

Department of Biological Sciences 1994

Declaration

This thesis is submitted in accordance with the regulations of the University of Wollongong in partial fulfilment of the requirements for the award of a Doctor of Philosophy. It does not incorporate any material previously published or written by another person except where due reference is made in the text. The experimental work described in this thesis is original work and has not been previously submitted for a degree or diploma in any university.

H. Rothenfluh July, 1994

Abstract

The work presented in this thesis was aimed at further defining the mechanism of somatic hypermutation and analysing in detail the patterns of sequence variability observed in germline heavy chain variable gene segments (VH) of the mouse. A number of mechanisms have been invoked to explain somatic hypermutation of immunoglobulin variable (IgV) region genes. Some of these mechanisms predict the presence of multiple and differentially mutated DNA or RNA copies of the rearranged IgV region. In order to detect these, RNA and DNA were isolated from the same pool of specific B cells that were isolated from the spleen of hyperimmunized mice. Although insufficient sequences were collected to allow identification of DNA and RNA from the same cell, it was found that identical N regions and VH-D or D-JH joins were present in B cells that were not clonally related. This suggests that certain CDR3 sequences that are not encoded in the germline may be selected during B cell ontogeny and/or the germinal center reaction. A major hurdle in elucidating the mechanism of somatic hypermutation is the fact that it has not yet been reproduced in vitro. In a preliminary experiment, an attempt was made to induce antigen-specific splenic B cells, both singly and in groups of 10, to undergo somatic hypermutation during in vitro culture. The cells were isolated using flow cytometry, and cultured according to a recently developed B cell activation method which utilizes the membranes of activated T cells. Some B cells were successfully induced to secrete Ig and/or proliferate; however, none of the proliferating cells that were analyzed displayed evidence of having accumulated mutations during in vitro culture. Previous reports have failed to determine the precise 5' boundary for somatic hypermutation in rearranged IgVH regions.
In order to allow a more accurate definition of where this boundary lies, the 5' flanking region sequences of a number of previously characterised B cell hybridomas were determined. These sequences were added to all previously published sequences. This data set indicates that almost 97 % of somatic mutations were found downstream of the transcription start (cap) site, and that the mutation frequency distribution around IgVH regions is asymmetrical, with a single mode centered on the rearranged VH region and a long tail extending into the J-C intron. Two classes of model are consistent with the new data: those where transcription products are the direct mutational substrates, and those where the mutational machinery operates directly on the DNA. Although the polymerase chain reaction (PCR) has revolutionized the study of genetic information, a number of in vitro artifacts can result from the use of this technique: nucleotide misincorporations and the production of hybrid DNA molecules. The fidelity of the Taq and Pfu DNA polymerases was assessed by sequencing multiple clones of PCR amplified DNA fragments. In this way it was demonstrated that Pfu DNA polymerase has a 12-fold lower error rate than Taq DNA polymerase. A number of experiments involving the restriction analysis of PCR products that were amplified from mixtures of well characterized cloned DNA revealed that under the PCR conditions used in the work carried out for this thesis, hybrid DNA molecules are produced below detectable limits. Thus the DNA sequences presented in this thesis are free of significant levels of in vitro generated artifacts. A number of laboratories previously reported the presence of hypervariable regions corresponding to the complementarity determining regions (CDR) in germline IgV genes. However, the murine germline VH gene sequences presented in this thesis also include significant amounts of non-transcribed 5' flanking region sequence.
Nucleotide and amino acid variability plots clearly illustrate the similarity between the germline VH genes and their somatically rearranged and mutated counterparts. Statistical analysis indicates that the sequence patterns are significantly different from those expected under a random point mutator model, and that there is a significant deficit of stop codons generated by nucleotide substitutions. Phylogenetic analysis revealed that the putative transcription/coding units evolved differently and more rapidly than the non-transcribed 5' flanking regions, suggesting that hyper-recombination events targeted to the putative transcription/coding regions contributed to the evolution of germline VH genes. A number of evolutionary models that have been proposed to account for the evolution of the IgV multigene family will be evaluated on the basis of how well they can explain the new data.

Acknowledgments

The time spent gathering and interpreting the data presented in this thesis was a very rewarding experience. This is in no small way due to the enthusiastic involvement of my supervisor, Associate Professor Ted Steele, at all stages of the work. I would also like to thank Ted for all of his support, and for giving me the opportunity to experience the different facets of science and the scientific process. I am also very grateful to Dr Phil Hodgkin and Alusha Mamchak, who made me welcome in their laboratory and who were very patient with me throughout the work involving flow cytometry and tissue culture. Many thanks also to Drs. Gerry Both, Linda Taylor and Stefan Eick, who were involved in many critical discussions during the early phases of this work, and to Professor Adrian Gibbs for confirming the phylogenetic analyses and Professor Bob Blanden for his very insightful involvement in discussions about the germline analyses. Several members of staff at the Department of Biological Sciences also need to be thanked: Dr. Mark Walker for helpful advice, and the technical staff who are always ready to help out with materials, in particular Wendy Forbes who re-sequenced the DNA that I PCR amplified from the single splenic B cells. However, none of this work would have been possible without the love, understanding and support I received from my wife Josie. Many thanks also to my family, whose support has been invaluable.

Contents

Declaration
Abstract
Acknowledgments
Contents
Abbreviations

Chapter 1. Introduction
  1.1 Clonal Selection
    Confirmation of Burnet's clonal selection theory
    Affinity maturation of the immune response
    Antigenic selection of B cells
  1.2 The generation of diversity
    a. The germline
    b. Rearrangement and junctional diversity
      Ordered versus stochastic rearrangement of H and L chain loci
      Allelic exclusion
      D gene reading frames
      Rearrangement signalling sequences
      Transcription control elements in rearrangement
      Joining of the genetic elements
      Junctional diversity
      VH replacement and secondary Vκ rearrangement
      Recombination activating genes (RAG) 1 and 2
      Preferential utilization of VH genes
    c. Somatic hypermutation and the germinal center
      Characteristics of somatic hypermutation in murine systems
      Timing of somatic hypermutation
      Boundaries of somatic hypermutation
      Somatic hypermutation, antigenic selection and 3-dimensional conformation of V regions
      Recruitment of new B cell clones
      Germinal centers are the site of somatic hypermutation
      The germinal center reaction
      Transgenic models for somatic hypermutation
      Somatic hypermutator models
      Somatic hypermutation due to error-prone DNA synthesis/repair
      Somatic hypermutation due to gene conversion events
      Somatic hypermutation due to an error-prone DNA→RNA→DNA loop

Chapter 2. Aims of this thesis
  2.1 Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells
  2.2 In vitro analysis of splenic antigen-specific B cells
  2.3 Determination of the 5' boundary for somatic hypermutation in VH regions
  2.4 Minimization of PCR generated artifacts
  2.5 Molecular and phylogenetic analysis of related germline VH genes

Chapter 3. Materials and Methods
  3.1 DNA, bacterial strains and cloning vectors used
    a. Genomic DNA
    b. Hybridoma DNA
    c. 40.3, 3B44 and 3B62 rearranged VH region genes
    d. A6/24 and A20/44 rearranged VH region genes
    e. Bacterial strains
    f. Cloning vectors
  3.2 Enzymes
    Thermostable DNA polymerases
    DNA modification enzymes
  3.3 Primers used in the Polymerase Chain Reaction (PCR)
  3.4 PCR with Taq DNA polymerase
  3.5 PCR with Pfu DNA polymerase
  3.6 PCR with radiolabel incorporation
  3.7 Electroelution
  3.9 Cloning and ligation of DNA
  3.10 Preparation of competent cells
  3.11 Transformation of DNA cloned into bacteriophage M13mp19
  3.12 Transformation of DNA cloned into plasmid pUC19
  3.13 Dot-spot plaque hybridization screening
  3.14 Dot-spot colony hybridization screening
  3.15 Transferring separated DNA fragments onto a membrane
  3.16 Hybridization of labeled probes to nylon membranes
  3.17 Preparation of single-stranded DNA for sequencing
  3.18 Preparation of double-stranded DNA for sequencing
  3.19 DNA sequencing
  3.20 Preparation of genomic DNA
  3.21 Isolation of both cytoplasmic RNA and nuclear DNA from the same B cell(s)
  3.22 Isolation of splenic B cells from mice hyperimmunized with NP with NIP-coated Dynabeads
  3.23 Flow cytometry
  3.24 Culture of FACS sorted cells
  3.25 Enzyme-Linked Immunosorbent Assay (ELISA)
  3.26 Enzyme-Linked Immunosorbent spot (ELISpot) assay

RESULTS

Chapter 4. Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells
  Rationale
  Strategy
  cDNA sequences
  DNA sequences
  Concluding remarks

Chapter 5. In vitro analysis of splenic antigen-specific B cells
  Rationale
  Strategy
  Concentration of H chain isotypes in pre- and post-tertiary sera
  Flow cytometric analysis of splenic cells
  Three color flow cytometric sorting of splenic B cells
  Ig secretion by cultured cells
  Molecular analysis of proliferating B cells
  Concluding remarks

Chapter 6. Determination of the 5' boundary for somatic hypermutation in VH regions
  Rationale
  Strategy
  Sequence survey of VH186.2 related germline genes
  Extension of 5' flanking region sequences of previously characterized hybridomas
  Mutation frequency distribution is asymmetric
  Concluding remarks

Chapter 7. Minimization of PCR generated artifacts
  Rationale
  Strategy
  Fidelity of Taq DNA polymerase
  PCR crossover events with Taq DNA polymerase
  PCR crossover events with Pfu DNA polymerase
  Concluding remarks

Chapter 8. Molecular analysis of VH186.2 related germline genes
  Rationale
  Strategy
  Isolation of 31 VH186.2 related germline genes from C57BL/6J genomic DNA
  DNA sequences of the 31 genes isolated from C57BL/6J DNA
  Wu-Kabat nucleotide/amino acid variability plots for the 31 C57BL/6J sequences
  Codon-by-codon analysis of the putative coding regions of the 31 C57BL/6J sequences
  Isolation of 21 VH186.2 related germline genes from BALB/c genomic DNA
  DNA sequences of the 21 genes isolated from BALB/c DNA
  Wu-Kabat nucleotide/amino acid variability plots for the 21 BALB/c sequences
  Codon-by-codon analysis of the putative coding regions of the 21 BALB/c sequences
  Concluding remarks

Chapter 9. Phylogenetic analysis of VH186.2 related germline genes
  Rationale
  Strategy
  Dendrograms for the 31 C57BL/6J sequences
  Dendrograms for the 21 BALB/c sequences
  Independent confirmation of the phylogenetic analyses
  Distribution of insertions and deletions
  Concluding remarks

Chapter 10. Analysis of genuine germline VH186.2 related genes
  Rationale
  Strategy
  Estimation of B lymphocyte contaminants in C57BL/6J and BALB/c liver tissue
  Nucleotide and amino acid variability plots for 27/30 genuine germline genes
  Codon-by-codon analysis of the genuine germline genes
  Phylogenetic analysis of 27 genuine germline genes
  Concluding remarks

Chapter 11. Molecular analysis of VH205.12 related germline genes
  Rationale and strategy
  Isolation of 20 VH205.12 related germline genes from C57BL/6J genomic DNA
  DNA sequences of the 20 genes isolated from C57BL/6J DNA
  Wu-Kabat nucleotide/amino acid variability plots for the 20 C57BL/6J sequences
  Codon-by-codon analysis of the putative coding regions of the 20 C57BL/6J sequences
  Isolation of 10 VH205.12 related germline genes from BALB/c genomic DNA
  DNA sequences of the 10 genes isolated from BALB/c DNA
  Wu-Kabat nucleotide/amino acid variability plots for the 10 BALB/c sequences
  Concluding remarks

Chapter 12. Phylogenetic analysis of VH205.12 related germline genes
  Rationale and strategy
  Dendrograms for the 20 C57BL/6J sequences
  Dendrograms for the 10 BALB/c sequences
  Distribution of insertions and deletions
  Concluding remarks

Chapter 13. Discussion and Conclusions
  Preamble
  13.1 Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells
    Deleterious somatic mutations in the literature
    Characterization of NP-specific B cells
    Clonal relationships among NP-specific B cells
    Antigenic selection of VH-D-JH junctions
    Comparison of the RNA and the DNA sequences
    Future work
  13.2 In vitro analysis of splenic antigen-specific B cells
    In vitro reproduction of somatic hypermutation
    Analysis of anti-NP sera and splenic cells
    Ig secretion and proliferation of FACS sorted splenic B cells
    Future experiments
  13.3 Determination of the 5' boundary for somatic hypermutation in VH regions
    Search for the cluster of mutations found upstream of the cap site in VH3B62
    Determination of the 5' boundary for somatic hypermutation
    Asymmetrical distribution of somatic mutations: implications for somatic mutator models
    Evaluation of somatic hypermutation models
    Which mechanism is correct?
  13.4 Minimization of PCR generated artifacts
    Error-rates of Pfu and Taq DNA polymerases
    PCR crossover events with Taq DNA polymerase
    PCR crossover events with Pfu DNA polymerase
  13.5 Molecular and phylogenetic analysis of related germline VH genes: evolutionary implications
    a) Patterns of sequence variation among murine germline IgVH genes
      Germline VH sequences
      Germline sequences with identical regions
      Observed and expected nucleotide and amino acid variability
      VH186.2 and VH205.12 related pseudogenes
      Deficit of stop codon-generating nucleotide substitutions
      Hyper-recombination targeted to coding/transcription units
      Utilization and location of VH genes belonging to the J558 family
      Summary
    b) Patterns of sequence variation among germline murine VL genes and among germline IgV genes of other vertebrate species
      Murine VκOx1 germline genes
      Human VH genes
      Rabbit VH genes
      Chicken Vλ and VH pseudogenes
      Chicken Vλ pseudogenes
      Chicken VH pseudogenes
      Xenopus VH genes
      Summary of the patterns of sequence variability among mammalian, avian and amphibian IgV germline genes
    c) Evolutionary models and the patterns of sequence variability among germline IgV genes
      Evolution of IgV genes by random mutagenesis followed by natural selection
      Gene conversion
      The neutral theory of molecular evolution
      Germline generator of IgV diversity
      Soma-to-germline genetic feedback loop
      Lymphocyte-specific, endogenous retrovirus-mediated genetic feedback loop
      Summary

Appendix A Genbank accession numbers
Appendix B Publications resulting from the work presented in this thesis
References
Response to examiner's comments

Abbreviations

A                 Adenine
Ab                Antibody
Ag                Antigen
APC               Allophycocyanin
ARS               p-azophenylarsonate
bp                base pair(s)
BSA               Bovine serum albumin
C                 Cytosine
cap               Transcription start site
cDNA              copied DNA (reverse transcript)
CDR               Complementarity determining region
conA              conalbumin A
D gene segment    Diversity gene segment
DNA               Deoxyribonucleic acid
dsDNA             Double stranded DNA
dNTP              Deoxynucleoside triphosphate
DTE               Dithioerythritol
DTT               Dithiothreitol
EDTA              Ethylenediaminetetra-acetic acid, disodium salt
FCS               Fetal calf serum
FITC              Fluorescein isothiocyanate
FR                Framework region
g                 Centrifugal force
G                 Guanine
H chain           Heavy chain
hr                hour
HRP               Horse-radish peroxidase
IgA               Immunoglobulin A
IgG               Immunoglobulin G
Igκ               Immunoglobulin kappa light chain
Igλ               Immunoglobulin lambda light chain
IgM               Immunoglobulin M
IL                Interleukin
IPTG              Isopropyl-β-D-thiogalactopyranoside
J gene segment    Joining gene segment
kb                kilobase(s)
KLH               Keyhole limpet hemocyanin
l                 liter
L chain           Light chain
M                 molarity
MAR               Matrix associated region
mIg               membrane-bound immunoglobulin
min               minute
mg                milligram
MHC               Major histocompatibility complex
ml                milliliter
µl                microliter
mM                millimolarity
mRNA              messenger ribonucleic acid
ng                nanogram
NIP               (4-hydroxy-3-iodo-5-nitrophenyl)acetic acid
NP                (4-hydroxy-3-nitrophenyl)acetyl
N region          Non-germline encoded nucleotides found at VH-D and D-JH joins (see text for definition)
PCR               Polymerase chain reaction
PE                Phycoerythrin
pg                picogram
phOx              2-phenyl-5-oxazolone
pmol              picomole
PNA               Peanut hemagglutinin
P nucleotides     Non-germline encoded nucleotides found at VH-D and D-JH joins (see text for definition)
RAPD              Random amplification of polymorphic DNA
R:S               Ratio of replacement to silent nucleotide substitutions
RSS               Recombination signal sequences
SDS               Sodium dodecyl sulfate, sodium salt
sec               second
ssDNA             single stranded DNA
T                 Thymine
Th cell           T helper (CD4+) cell
TCR               T cell receptor
V gene segment    Variable gene segment
VH                Heavy chain variable gene
VL                Light chain variable gene
X-Gal             5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside

1. INTRODUCTION

1.1 Clonal Selection

The primary purpose of the evolutionarily advanced human and murine immune systems is to protect the individual against disease. A fundamental property of the immune system is its capacity to respond to virtually any foreign substance that enters the body. This property is necessary for the system to function effectively against a wide variety of disease-causing microorganisms, some of which have a great capacity to mutate and thus change their antigenic structures. Thus, the immune system has evolved to deal with the problem of a potentially huge variety of antigenic challenges. In the late 1950s, Sir Macfarlane Burnet, adding to the earlier work of Niels Jerne and David Talmage, formulated the Clonal Selection Theory to explain the specific adaptive properties of acquired immunity (Burnet, 1957). The idea at the heart of Burnet's theory was that each B cell expressed and secreted only one specific antibody. Each specific cell was therefore potentially selectable by a specific antigenic epitope, leading to clonal growth of that specific cell line. In this way the organism could biologically amplify in a few days the antigen specific cell (or cells) and manufacture those specific antibody molecules required by the host, ensuring that high concentrations flood the lymphatics and blood plasma systems. This helps to clear the body of the infective agent via direct neutralization and indirect complement-mediated reactions (target cell lysis, opsonization and phagocytosis, chemotaxis and white cell infiltration to the depot of infection).

Confirmation of Burnet's clonal selection theory: A number of experiments eventually confirmed the 'one cell - one antibody' premise (reviewed in Ada and Nossal, 1987). Experiments carried out by Lederberg and Nossal, in which rats were immunized with one of two different flagellar antigens, determined that whereas many antibody producing cells could make specific antibody against one of the antigens, no cell made antibodies against both antigens. Using another strategy, which involved the analysis of individual antibody producing cells, Makela found that different antibody secreting cells produced antibodies with differing specificities for the eliciting antigen. This finding was compatible with Burnet's clonal selection theory, which also postulated that the antibody producing cells would accumulate mutations as they proliferated, and that any cell harbouring a mutation resulting in a higher affinity for the antigen would be positively selected by the antigen. Since different clones would contain different mutations, it follows that they would possess different antigen specificities.

Experiments that demonstrated that the initial antigen binding cells are indeed the precursors of the antibody producing cells were also carried out (reviewed in Ada and Nossal, 1987). Ada and Byrt immunized two groups of mice with one of two related 125I-labeled Salmonella antigens. This would destroy the cells that bound the radioactive antigen. The lymphocytes of these mice were then transferred to X-irradiated mice, which were immunized with both antigens (non-radioactive). It was found that the mice failed to respond to the initial, radioactively labelled antigen, but responded well against the other antigen. The strongest support for Burnet's theory came from experiments in which immune spleen cells were added to a layer of gelatin containing the antigen and non-specific cells were washed off. The antigen specific cells were removed and individually stimulated with the immunizing or a different antigen. Only the cells stimulated with the original antigen proliferated and secreted antibody.

Affinity maturation of the immune response: When an antigen enters the immune system for the second time, it will result in a secondary immune response which is more rapid and intense than the primary response. The antibodies produced during the secondary response also have higher affinity for the antigen than do primary response antibodies. The rise in antibody affinity with time during an immune response is known as affinity maturation (Siskind and Benacerraf, 1969; Berek and Milstein, 1987 and 1988). During T cell dependent immune responses, B cells are induced to become immunoglobulin (Ig) secreting cells by T helper (Th) cells. The Th cells themselves, however, need to be activated by antigen presenting cells which take up, process and display a fragment of the antigen on their surface in conjunction with class II major histocompatibility complex (MHC) molecules (Grey and Chesnut, 1985). Macrophages (Unanue, 1984), dendritic cells (Steinman et al, 1983), B cells (Lanzavecchia, 1985) and, although only inefficiently, T cells (Lanzavecchia et al, 1988) are some of the cells which have been shown to possess antigen presenting functions. The activation of both B cells and Th cells requires the secretion of specific factors by the respective activating cells (Nisbet-Brown et al, 1987; O'Garra et al, 1988; Weaver and Unanue, 1990; Hodgkin and Kehry, 1993). When an antigen enters a naive immune system it will bind to any circulating B cell bearing an Ig with some affinity for the invading antigen (Burnet, 1957). The antigen is internalised and processed by the B cell, and fragments of the captured antigen are then presented on the cell surface in conjunction with class II MHC molecules (Lanzavecchia, 1985). Th cells expressing antigen specific TCRs will recognize this MHCII/peptide complex. The binding of the Th cell to the MHCII/peptide complex presented by the B cell triggers the secretion of the CD40 ligand (CD40L) by the Th cell.
The CD40L interacts with the CD40 receptor on the B cell, which leads to the expression of CD80 by the B cell, which interacts with CD28 on the T cell (reviewed in Clark and Ledbetter, 1994). The interaction of these and other surface molecules leads to the activation of the B and T cells. The activated B cells then either become the Ig secreting plasma cells of the primary immune response or they may undergo the memory cell pathway, during the course of which the variable (V) regions of the Igs expressed by these cells are somatically hypermutated (Siekevitz et al, 1987; see below). The process of somatic hypermutation and subsequent selection of high affinity mutants takes place in the germinal centres (Jacob et al, 1991a; MacLennan, 1991; Jacob et al, 1992; see below). As the antigen is cleared from the system there is increased competition for the remaining antigen, which results in more stringent selection for higher affinity antibodies. The long lived memory B cell clones then enter the periphery and initiate a rapid secondary response with high affinity for the invading antigen.

Antigenic selection of B cells: It was shown that following the first antigenic contact, mature resting μlo δhi B cells lose expression of their membrane-bound IgD (mIgD; Butcher et al, 1982; Havran et al, 1984). This would result in the loss of approximately 80 - 90 % of mIg molecules, and leave the cell with approximately 10^ to 2 x 10^ mIgM molecules (Havran et al, 1984). It was calculated that only approximately 20 % of the surface of a cell with 10^ mIg molecules would be occupied by the mIg molecules (Havran et al, 1984). From this, it was concluded that following an 80 - 90 % reduction of mIg, the bulk of collisions between a B cell and its specific antigen would not result in an antigen-antibody interaction, and hence that the antigen binding potential of the cell would be greatly decreased. Recent in vitro experiments have demonstrated that naive B cells expressing IgM antibodies with high affinity for the antigen respond more rapidly, and that selection of high affinity clones by protein antigen is greater at reduced mIg densities (George et al, 1993). Taken together, these data suggest that receptor density and affinity are critical determinants of which B cells become clonally selected by antigen, and that only B cells expressing mIg with high avidity at low mIg densities are selected by antigen during the later stages of the immune response (George et al, 1993). Recently it was shown that affinity maturation in IgD deficient mice is delayed 3 - 4 days as compared to normal mice (Roes and Rajewsky, 1993). This suggests that prior to the loss of IgD expression in antigen activated B cells, IgD may enhance B cell-antigen or Th-B cell interactions (Coico et al, 1988; Tamma et al, 1991; Amin et al, 1991; Roes and Rajewsky, 1993).
It has also been apparent for some time that resting B cells can be activated without contact with antigen in the presence of activated Th cells (Julius and Rammensee, 1988; Owens, 1988) or by soluble factors (Jelachich et al, 1984; Abbas et al, 1990). However, B cells activated by this bystander effect were found to secrete less than half the amount of antibody secreted by antigen activated memory B cells (Johnson and Jemmerson, 1991); thus the antibodies produced by these cells are unlikely to play an important role in the specific immune response.

B cells and T cells need to be able to produce large numbers of different antigen receptor specificities so that in principle a response can be mounted against almost any antigen. Many features of the germline organisation of TCR V region gene segments and their rearrangement to form a fully functional TCR V region are similar to those of the IgV region gene segments (Gascoigne et al, 1984; Kavaler et al, 1984; Siu et al, 1984; Barth et al, 1985; Winoto et al, 1985; Becker et al, 1985). A number of studies failed to detect any evidence for somatic hypermutation of TCR V regions (Chien et al, 1984; Arden et al, 1985; Fink et al, 1986). However, since none of these studies involved driving the immune response through in vivo hyperimmunization, the apparent lack of somatic hypermutation in TCR V regions may be due to limited experimental design (reviewed in Steele et al, 1993). Indeed, in a report where tryptic peptide mapping was used to study antigen driven T cell clones, evidence for somatic mutation was found (Augustin and Sim, 1984); however, this was not followed up with more detailed molecular techniques. The study of TCR V regions is beyond the scope of this thesis, thus subsequent discussion will focus primarily on the B cell antigen receptor.

1.2 The generation of antibody diversity

a. The germline

Antibodies are composed of two identical heavy (H) chains and two identical light (L) chains that can belong to either the λ or κ light chain gene families. In the mouse, the Lκ chain genes are located on chromosome 6, the Lλ chain genes on chromosome 16 and the H chain genes on chromosome 12 (Honjo, 1983). All three chain types contain V and constant (C) regions. The VL regions are made up of V and junctional (J) gene segments (Brack et al, 1978; Bernard et al, 1978). The VH chains also contain V and J gene segments, however they are separated by a diversity (D) gene segment (Early et al, 1980). The organisation in the murine germline of the three Ig chains is shown in Figure 1.1. The Jκ3 gene segment (Tonegawa, 1983) and the Jλ4 gene segment (Blomberg and Tonegawa, 1982) are non-functional. There are at least 12 D segments present in the H chain locus (Kurosawa and Tonegawa, 1982) and the number of V genes in the Vκ locus is estimated to range between approximately 160 and 300 genes (Cory et al, 1981; Gearhart, 1982; Kofler et al, 1992). The germline repertoire of VH genes is much larger. For example, in BALB/c mice the J558 VH family contains 500 - 1000 or more VH genes (Livant et al, 1986). However, this family is larger in BALB/c mice than it is in either A/J or C57BL/6 mice (eg. see Meek et al, 1990). The J558 VH family includes the VH186.2 and the VH205.12 sub-families (see below).

[Figure 1.1 diagram. Panels: Ig κ light chain (L-V1, L-V2 ... L-Vn; Jκ1-5; Cκ; n = 200-300); Ig λ light chain (L-V, J and Cλ gene segments); Ig heavy chain (L-V1, L-V2 ... L-Vn; D; JH; Cμ, Cδ, Cγ3, Cγ1, Cγ2b, Cγ2a, Cε, Cα; n > 300).]

Figure 1.1. Schematic representation of the germline organization of the gene segments found in murine IgH and IgL chains. Symbols: L = leader sequence; V = variable gene; D = diversity gene; J = joining gene; C = constant region gene; n = number of genetic elements (see text for references); j] = indicates regions that have not yet been sequenced. NOTE: the C region genes are composed of varying numbers of introns and exons which are not shown in this diagram. (Adapted from Tonegawa, 1983; Gearhart, 1982)

VH genes are spaced at 10 - 20 kilobase (kb) intervals along the chromosome (Honjo, 1983; Bothwell, 1984; Blankenstein and Krawinkel, 1987; Rathbun et al, 1989), and these intervals are generally constant regardless of the VH family (Meek et al, 1990). The most complete map of the 14 known murine VH families available at this time is shown in Figure 1.2. Not all gene families are present as distinct clusters of VH genes on the chromosome; in several cases they are interspersed. The sizes of the gene families and the degree of interspersion can vary greatly between different inbred strains of mice as well as in wild mice (eg. see Kemp et al, 1981; Brodeur and Riblet, 1984; Blankenstein and Krawinkel, 1987; Brodeur et al, 1988; Kleinfeld and Weigert, 1989; Rathbun et al, 1989; Meek et al, 1990).

[Figure 1.2: map not recoverable from the source. Family labels appearing in the figure: J558, Q52, J606, CH27, 3609N, GAM3-8, 36-60, S107, CP3, X-24, SM-7, 3609, 7183. The telomere and centromere ends of the map are marked.]

Figure 1.2. Relative map positions of the 14 known VH gene families of the mouse. Families that are extensively interspersed are boxed. (Adapted from Kofler et al, 1992)

The presence of multiple IgV region genetic elements in the germline could theoretically give rise to more than 10^10 different antibody specificities, and the random pairing of heavy and light chains could further increase this number (Berek and Milstein, 1988). However, this is likely to be an underestimate of the true theoretical repertoire, since this calculation assumed the presence of only 100 VH genes in the germline. There are likely to be certain limitations to the theoretical size of the repertoire, since some H-L chain and V(D)J combinations will be non-functional or deleterious to the cell (Cohn, 1974; Berek and Milstein, 1988) and some H-L chain combinations are preferentially associated in specific antibodies (Mannik, 1967; Kranz and Voss, 1981). Studies involving immature B cell lines have revealed preferential utilization of the VH genes in closest proximity to the JH gene locus on the chromosome (Yancopoulos et al, 1984; Perlmutter et al, 1985), and preferential utilization of the most 5' and 3' D genetic elements (Tsukada et al, 1990). It was also shown that antibodies expressed in unimmunized mice display restricted V region gene combinations (Rajewsky et al, 1987). Therefore the potential naive repertoire is smaller than the theoretical one. Nevertheless, even if the potential repertoire were only 10% - 50% of the theoretical one, it would still be larger than the circulating B cell population expressed by the murine immune system (Berek and Milstein, 1988). Thus, the arrangement of the genetic elements coding for the antibody V region in the murine germline alone could result in a vast repertoire of different antibody specificities. The murine naive B cell population consists of approximately 10^ cells (e.g. see Rajewsky et al, 1987). However, according to the Protecton model (Cohn and Langman, 1990), it is unlikely that each naive B cell expresses a unique specificity, since for any antibody to be present at the lowest concentration required for a rapid and effective response against an antigen, an appropriate minimum number of antibody secreting B cells needs to be present. Thus, it is clear that only a fraction of the potential repertoire generated by the above mechanisms is expressed.
Nevertheless, the main force driving the evolution of the immune system has most probably been the generation of a system that could eliminate any invading antigen, and in response to this selection pressure, additional diversification mechanisms have evolved over time.

b. Rearrangement and junctional diversity

Ordered versus stochastic rearrangement of H and L chain loci: The formation of rearranged IgV regions involves the joining of the appropriate gene segments to form a continuous V region with all the intervening sequences removed (Fig. 1.3). It was proposed that the H chain locus is rearranged before rearrangement of one of the light chain loci (Alt et al, 1981; Korsmeyer et al, 1981; Siden et al, 1981).

[Figure 1.3: diagram not recoverable from the source. Panels: Germline DNA; Rearranged DNA; Processed mRNA.]

Figure 1.3. DNA rearrangement of the V region genes of an IgVH followed by the splicing of intervening sequences and C region introns during mRNA processing. The leader sequence is eventually removed from the protein. Symbols: L = Leader sequence; V = variable gene; D = diversity gene; J = joining gene; Cμ = Constant region exons (IgM); other annotations as in legend to Figure 1.1. (Adapted from Tonegawa, 1983; Gearhart, 1982)

Early studies found that the Igλ locus of B cells expressing an Lκ chain is usually not rearranged, whereas the Igκ locus of an Lλ bearing B cell has usually undergone rearrangement (Alt et al, 1980; Hieter et al, 1981; Coleclough et al, 1981). Additionally, the ratio of κ:λ chain bearing peripheral B cells is greater than 10:1 (McGuire and Vitetta, 1981). This ratio is not due to antigenic selection (Cohn and Langman, 1990), since it is already present very early in B cell ontogeny (McGuire and Vitetta, 1981; Rolink et al, 1991). The interpretation of the above data was that κ chain rearrangement precedes λ chain rearrangement. It was proposed that this might be due to a sequential mechanism whereby Igλ gene rearrangement could only occur if both Igκ alleles were non-productively rearranged (Hieter et al, 1981), or due to a stochastic model where the probability of an Igκ rearrangement is much higher than that of an Igλ rearrangement (Coleclough et al, 1981). Recent gene targeting studies, in which rearrangement of the Igκ locus was eliminated (Zou et al, 1993; Chen et al, 1993a), found that all newly generated and peripheral B cells expressed Lλ chains. One of these studies, utilizing the sensitivity of the polymerase chain reaction (PCR; Saiki et al, 1988), found that approximately 4% of Lκ bearing B cells of normal mice contained a non-productive Igλ rearrangement, and that the potential κ/λ double producing pool (i.e. B cells with productive rearrangements at both the Igκ and the Igλ loci) is less than 1% (Ehlich et al, 1993; Zou et al, 1993). Hybridomas expressing Lλ chains but with unrearranged κ chain alleles were also reported elsewhere (Berg et al, 1990). Two studies involving the analysis of IgH chain and IgL chain rearrangements in transformed pre-B cell lines that lacked a functional μH chain nevertheless detected rearrangements at the Igκ locus (Blackwell et al, 1989; Kubagawa et al, 1989).
In more recent investigations, low levels of κ chain rearrangements were detected in mice lacking the membrane exon of the μ chain, which as a result can only produce the secreted form of the μ chain (Kitamura and Rajewsky, 1992). The possibility that the secreted μH chain may somehow be responsible for the observed low level of κ chain rearrangement was eliminated by the detection of κ chain rearrangements in pre-B cell lines with non-functional D-JH rearrangements on both IgH alleles (Grawunder et al, 1993), and in mice that are unable to rearrange the IgH chain due to targeted deletion of the JH locus (Chen et al, 1993b). Taken together, the above data indicate that Igλ gene rearrangement can occur independently of rearrangement and expression of the Igκ locus, and that some rearrangement of the IgL chain loci can occur in the absence of a functional IgH chain. This is incompatible with the proposition that Ig gene rearrangement follows an ordered sequence of events (Hieter et al, 1981). Recently, a model for the rearrangement of H chains and L chains that took the above data into account was proposed (Ehlich et al, 1993). According to this model, the rearrangement process is initiated in large CD43+ B cell precursors. Rearrangements can occur first at either the H chain locus or at one of the L chain loci (usually the κ locus). In approximately 95% of these B cell precursors, H chain rearrangement is initiated first. In this case, the cells express a pre-B cell receptor and become small pre-B cells, which undergo rearrangement of their L chain loci to become mature B cells. However, if the light chain is rearranged first, the cell becomes a mature B cell following a productive H chain rearrangement, thus by-passing the small pre-B cell stage of B cell development.

Allelic exclusion: During IgL chain rearrangement one V gene is joined to a J gene (Weigert et al, 1978; Brack et al, 1978). In IgH chains, the D and the J genes are joined before a V gene is joined to the ligated D-J genes (Sakano et al, 1980; Alt and Baltimore, 1982; Yaoita et al, 1983). Most antigen receptor genetic loci are subject to allelic exclusion, i.e. only one of the alleles is productively rearranged (Alt et al, 1980; Early and Hood, 1981; Alt et al, 1984; Goverman et al, 1985; Fink et al, 1986). Thus, it was shown that the expression of a membrane bound μH chain will inhibit further rearrangement at the IgH locus (Nussenzweig et al, 1987), and IgH chain allelic exclusion was absent in mice that only produced IgM molecules lacking the transmembrane portion (Manz et al, 1988; Kitamura et al, 1991; Kitamura and Rajewsky, 1992). A non-productive rearrangement can result from incomplete rearrangements, out-of-frame rearrangements or rearrangement to a pseudo joining site (reviewed in Early and Hood, 1981). However, there are some antigen receptor loci that may not be subject to allelic exclusion. It was shown that secondary rearrangements occur at the Vκ locus, and this led to the hypothesis that productively rearranged Vκ alleles are tested for functional interaction with the H chain expressed by the same cell before allelic exclusion takes place (Harada and Yamagishi, 1991). This is supported by the fact that in transgenic mice, allelic exclusion of the endogenous κ allele by a κ transgene only occurred when a functional H-κ chain combination was present in the cell (Storb, 1987). The initial suggestion that the TCR α chain locus also may not be subject to allelic exclusion (Fondell et al, 1990) was recently confirmed by Padovan et al (1993), who found T cells expressing two different, functional α chain rearrangements.

D gene reading frames: Although additional antibody diversity stems from the ability of D genes to be read in three different reading frames (RF; Kaartinen and Mäkelä, 1985), there is a prevalence of one particular RF (Kaartinen and Mäkelä, 1985; Gu et al, 1991a), RF1, according to the nomenclature of Ichihara et al (1989). There appear to be a number of mechanisms that are responsible for the preferred utilization of RF1. Firstly, in the absence of N region addition (see below) rearrangement is promoted by short sequence homologies at the recombination sites (Gu et al, 1990; Feeney, 1992). It was shown that rearrangements that were mediated by these short sequence homologies result in the almost exclusive usage of RF1 (Gu et al, 1990, 1991). Secondly, the usage of RF2 in D-JH rearrangements results in the expression of a truncated μ chain called the Dμ protein (Reth and Alt, 1984). This protein arrests the further development of the cell by preventing VH-D-JH joining (Gu et al, 1991a). Finally, approximately 70% of D-JH rearrangements in RF3 contain stop codons, and thus cannot lead to productive VH-D-JH rearrangements (Gu et al, 1991a).
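The effect of reading frame on a D segment can be illustrated with a short sketch. The Python below translates a nucleotide sequence in its three forward frames using the standard genetic code and reports which frames contain a stop codon. The sequence used is a made-up toy string, not a real D segment, and the frame labels 1 to 3 are simply offsets from the first nucleotide; they are an assumption of this sketch and do not correspond to the RF1 to RF3 nomenclature of Ichihara et al (1989).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(segment):
    """Translate a DNA segment in its three forward frames.

    Returns a dict mapping frame label (1, 2, 3 = offset + 1) to the peptide.
    Incomplete trailing codons are ignored.
    """
    segment = segment.upper()
    frames = {}
    for offset in range(3):
        codons = [segment[i:i + 3] for i in range(offset, len(segment) - 2, 3)]
        frames[offset + 1] = "".join(CODON_TABLE[c] for c in codons)
    return frames


def frames_with_stops(segment):
    """Return the frame labels whose translation contains a stop codon."""
    return [f for f, peptide in translate_frames(segment).items() if "*" in peptide]


# Hypothetical toy sequence, chosen only so that one frame carries stop codons.
toy_segment = "ACTGATAGCTGG"
print(translate_frames(toy_segment))
print(frames_with_stops(toy_segment))
```

A frame flagged by `frames_with_stops` could not contribute to a productive rearrangement unless trimming or nucleotide addition at the junctions removed the stop codon, which is the situation described above for most D-JH rearrangements in RF3.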

Rearrangement signalling sequences: Highly conserved structures that are rearrangement signalling sequences (RSS) have been found to be associated with the V region genes of all Ig and TCR chains (Sakano et al, 1980; Early et al, 1980; Sakano et al, 1981; Siu et al, 1984; Gascoigne et al, 1984; Kavaler et al, 1984; Arden et al, 1985). A highly conserved heptamer recognition sequence is separated from a less conserved nonamer sequence by a spacer that is usually either 12 or 23 nucleotides in length (Fig. 1.4). Although the lengths of the spacers are conserved, there is no apparent conservation of the spacer sequences (Early et al, 1980; Sakano et al, 1980). One turn of the DNA helix spans 10.4 base pairs (bp; Wang, 1979), thus the spacers span 1 or 2 turns of the DNA helix. This has led to the suggestion that two V region gene segments can only be joined if a 12 bp spacer (one turn of the DNA helix) is adjacent to a 23 bp spacer (two turns of the DNA helix) (Early et al, 1980; Sakano et al, 1980). Kurosawa and Tonegawa (1982) also suggested that D-D joins may be possible if an SP2 type D segment was involved. The coding region of SP2 D segments contains three distinct regions: a conserved heptamer sequence flanked by a pentanucleotide on either side. The central heptamer is homologous to the signal heptamers, hence it could be recognized as such by the rearrangement machinery. This would create a 24 bp spacer which presumably would be consistent with the 12/23 bp spacer rule. Indeed, D-D rearrangements have now been detected in mice (Meek et al, 1989) and in humans (Davidson et al, 1990).
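The 12/23 bp spacer rule described above can be expressed as a small check. The following Python is a minimal sketch: the `Segment` class and `can_join` function are illustrative names, and the spacer lengths assigned to VH, D and JH follow the commonly described arrangement for the IgH locus (a 23 bp spacer 3' of VH, 12 bp spacers on both sides of D, and a 23 bp spacer 5' of JH). The sketch does not model the internal heptamer of SP2 type D segments, so the proposed D-D joins are outside its scope.

```python
from dataclasses import dataclass
from typing import Optional

BP_PER_TURN = 10.4  # base pairs per turn of the DNA helix (Wang, 1979)


@dataclass(frozen=True)
class Segment:
    """A gene segment with the spacer lengths (bp) of its flanking RSSs."""
    name: str
    spacer_5p: Optional[int]  # RSS spacer on the 5' side, None if no RSS there
    spacer_3p: Optional[int]  # RSS spacer on the 3' side, None if no RSS there


def helix_turns(spacer_bp):
    """Approximate number of helical turns spanned by a spacer."""
    return round(spacer_bp / BP_PER_TURN)


def can_join(upstream, downstream):
    """Apply the 12/23 rule to the RSSs facing each other across the join."""
    a, b = upstream.spacer_3p, downstream.spacer_5p
    if a is None or b is None:
        return False
    return {a, b} == {12, 23}


VH = Segment("VH", None, 23)
D = Segment("D", 12, 12)
JH = Segment("JH", 23, None)

for up, down in [(VH, D), (D, JH), (VH, JH), (D, D)]:
    print(f"{up.name}-{down.name}: {can_join(up, down)}")
```

Under these assumptions the rule permits VH-D and D-JH joins but not a direct VH-JH join, which is consistent with the requirement for a D segment in the heavy chain V region.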

[Figure 1.4: diagram not recoverable from the source. Panels: Igκ light chain; Ig heavy chain. The diagram marks the heptamer (7) and nonamer (9) signal sequences and the 12 bp and 23 bp spacers flanking the V, D and J gene segments.]

Figure 1.4. Organization of the rearrangement signal sequences and spacers around IgV region genes. The conserved heptamer found near the 3' terminus of many germline VH genes (see section 13.5c) is indicated by a shaded rectangle. Symbols: V = variable gene; D = diversity gene; J = joining gene; 7 = heptamer signal sequence; 9 = nonamer signal sequence; 12 = spacer spanning one turn of the DNA helix; 23 = spacer spanning two turns of the DNA helix. (Adapted from Tonegawa, 1983)

The presence of the heptamer/nonamer RSSs, regardless of the coding sequence, is sufficient to mediate V(D)J recombination. This was clearly demonstrated when recombination took place during an assay that utilized an extrachromosomal DNA substrate in conjunction with the heptamer-nonamer RSS (Hesse et al, 1987).

Transcription control elements in rearrangement: Recent evidence indicates that transcriptional enhancers also enhance V(D)J recombinational activity. One study utilising a cell line containing recombination substrates found that these were only rearranged efficiently in the presence of an active enhancer (Oltz et al, 1993). However, the effects of the enhancers on V(D)J rearrangement seem to differ between the different Ig loci. Analysis of V(D)J recombination in mice lacking the IgH intron enhancer indicates that D-JH recombination is only slightly affected by the lack of the enhancer, whereas VH-DJH recombination is almost entirely abolished (Serwe and Sablitzky, 1993). However, another study, in which the IgH intron enhancer or the 5' matrix associated region (MAR; Cockerill et al, 1987) was deleted, found strong inhibition of D-JH rearrangement in both cases (Chen et al, 1993c). A recent study found that the above discrepancies may be due to the insertion of prokaryotic sequences near the rearrangement substrates, which can lead to hypermethylation of the substrate DNA (Fernex et al, 1994). A transgenic model recently demonstrated that hypermethylated sequences do not undergo V(D)J rearrangement (Engler et al, 1993). Fernex et al. showed that rearrangements were not detected in transgenic mice where the rearrangement substrate was in proximity to the prokaryotic sequences, but when the substrate (minus prokaryotic sequence) was transfected into pre-B cell lines, rearrangement took place. They were then able to show that sequences downstream of the intronic enhancer were superfluous for rearrangement. Thus, the above data may suggest that a number of sequence motifs or control elements upstream of, and including, the H chain intron enhancer may play important, and possibly synergistic, roles in V(D)J rearrangement. Similarly, it was also found that the removal of the Igκ enhancer completely blocked Vκ-Jκ recombination (Takeda et al, 1993). In the chicken Igλ locus it was shown that the enhancer, the promoter and an as yet unknown signal sequence immediately 5' of the promoter are essential for the rearrangement process to occur (Lauster et al, 1993). The enhancers of the TCR α and β loci have also been implicated in the control of the tissue and stage specificity of V(D)J rearrangement (Capone et al, 1993). Thus, some of the regulatory elements of transcription also seem to play a crucial role in V(D)J recombination, possibly by making the genes involved accessible to the V(D)J recombination machinery (Chen et al, 1993c). In addition, by using non-replicative substrates, it was shown that DNA replication does not affect the efficiency of V(D)J rearrangement (Hsieh et al, 1991).

Joining of the genetic elements: The first step of the rearrangement process involves the alignment of the two genes to be joined via the rearrangement signal sequences. DNA binding proteins with high specificity for the heptamer (Aguilera et al, 1987) and nonamer (Halligan and Desiderio, 1987) recognition sequences, and for the 23 bp spacer (Matsunami et al, 1989) have been identified. The DNA strands which are aligned according to the 1 turn / 2 turn rule are nicked between the heptamer recognition sequence and the coding sequence (Alt and Baltimore, 1982; Manser, 1990a; Roth et al, 1992a). An RSS specific endonuclease that introduces double-stranded breaks at these sites has been characterized (Desiderio and Baltimore, 1984; Kataoka et al, 1984; Hope et al, 1986). The coding regions are then ligated to produce a coding joint and the signal sequences are fused to yield a signal joint. The signal joint can be of two types depending on the orientation of the signal sequences (reviewed in Gellert, 1992). Usually the rearrangement of antigen receptor V regions results in the signal joint being deleted from the chromosome as a circular piece of DNA. Excised circular DNA molecules containing signal joints were isolated from thymocyte DNA preparations (Fujimoto and Yamagishi, 1987; Okazaki et al, 1987) and, more recently, were directly observed during TCR δ chain rearrangement (Roth et al, 1992a). Another type of rearrangement event produces inverted signal joints which are not deleted from the chromosome. Signal joint inversion occurs when one of the genetic elements being rearranged is in the opposite orientation. This is the case in the human Igκ locus where approximately half of the Vκ genes are in the opposite orientation to Jκ (Weichhold et al, 1990).

Junctional diversity: A number of ancillary mechanisms of the rearrangement process contribute to antibody diversity. The coding joints are imprecise because recombination usually takes place near, but rarely precisely at, the termini of the gene segments (Gearhart, 1982; Tonegawa, 1983). This is in contrast to the signal joints, which are exactly located at the border of the heptamer RSS (Hesse et al, 1987), although signal joint nucleotide additions that do not seem to be correlated with the activity of terminal deoxynucleotidyltransferase (TdT) were reported in one study (Lieber et al, 1988a). The loss of nucleotides at the coding joints (Sakano et al, 1981; Kurosawa et al, 1981; Kurosawa and Tonegawa, 1982) could be explained by the action of an exonuclease activity acting on the ends of the coding sequences (Alt and Baltimore, 1982). The deletion of nucleotides from coding joints occurs as frequently in the fetal repertoire as it does in the adult repertoire (Meek, 1990). The enzyme TdT was suggested as being responsible for the addition of non-templated nucleotides called N regions (Alt and Baltimore, 1982) at the coding joints of IgH chains (Kurosawa and Tonegawa, 1982), and at the coding joints of TCR α (Arden et al, 1985) and β chains (Siu et al, 1984). In support of this, it was later found that the addition of N regions is directly correlated with the presence of TdT (Desiderio et al, 1984; Landau et al, 1987), and more recently, that the lymphocytes of TdT deficient mice contained few (Gilfillan et al, 1993) or no (Komori et al, 1993) N regions. In the few coding joints possessing N regions (Gilfillan et al, 1993) there were never more than one or two N nucleotides added.
Recently it was also shown that N sequence addition occurs in fewer than 30% of VH-D and D-JH junctions of the fetal repertoire, and whereas N region additions of up to 12 bp are common in the adult repertoire, they are less than 4 bp in the fetal repertoire (Lafaille et al, 1989; Gu et al, 1990; Feeney, 1990; Meek, 1990). An additional diversification mechanism is the addition of one or two P nucleotides at coding joints (Lafaille et al, 1989). P nucleotides are only added to complete coding sequences and they always form palindromes with the one or two nucleotides of the coding sequence that are adjacent to the RSS. The addition of P nucleotides does not involve TdT, since P nucleotides are present in normal numbers in TdT deficient mice (Komori et al, 1993; Gilfillan et al, 1993). A model to explain the addition of P nucleotides was first proposed by Lieber (1991) and later refined by Roth et al. (1993; reviewed in Ferguson and Thompson, 1993). The first step involves the nicking of one DNA strand between the coding sequence and the RSS. The two terminal bases of the coding sequence are then covalently sealed to form a hairpin, whilst a double-strand break is formed at the signal end. The hairpin is resolved when an endonuclease nicks one of the strands in the hairpin structure, in the process creating either a 3' overhang consisting of a palindrome (P sequence) if the nick is introduced in the anti-sense strand, or a short deletion if the nick is introduced into the sense strand. Support for hairpin formation during rearrangement comes from a number of recent findings. Firstly, double strand breaks were found near the RSS of TCR δ chains (Roth et al, 1992a). The second piece of evidence comes from mice with severe combined immune deficiency (scid), which are unable to complete T and B cell differentiation (Bosma et al, 1983). These mice can initiate but not complete the rearrangement process (Schuler et al, 1986; Lieber et al, 1988b). It was later found that the coding ends formed in scid mice are covalently sealed to form hairpin structures (Roth et al, 1992b). Hairpin structures have not yet been found in normal mice, perhaps because they are resolved very efficiently; however, the possibility exists that they are formed only in scid mice as a result of the defective recombination machinery (reviewed in Ferguson and Thompson, 1993).
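The hairpin model for P nucleotide addition lends itself to a small simulation. The sketch below is a simplified illustration under stated assumptions, not the model as formally specified by its authors: the coding end is represented by its sense strand only, `nick_offset` is a hypothetical parameter giving the distance (in nucleotides) of the nick from the hairpin tip, and the overhang created by an anti-sense nick is assumed to be filled in so that it appears in the final coding sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def resolve_hairpin(coding_end, nick_offset, strand="antisense"):
    """Resolve a hairpin-sealed coding end (sense strand, 5' to 3').

    antisense nick: the last `nick_offset` bases are followed by their
                    reverse complement, i.e. palindromic P nucleotides.
    sense nick:     the last `nick_offset` bases are lost (short deletion).
    """
    coding_end = coding_end.upper()
    if not 0 <= nick_offset <= len(coding_end):
        raise ValueError("nick_offset must lie within the coding end")
    if nick_offset == 0:
        return coding_end  # nick exactly at the tip: no addition, no deletion
    if strand == "antisense":
        return coding_end + reverse_complement(coding_end[-nick_offset:])
    if strand == "sense":
        return coding_end[:-nick_offset]
    raise ValueError("strand must be 'antisense' or 'sense'")


# Hypothetical coding end, for illustration only.
end = "GCATTC"
print(resolve_hairpin(end, 2))            # two P nucleotides appended
print(resolve_hairpin(end, 2, "sense"))   # two nucleotides deleted
```

In the anti-sense case the appended nucleotides and the terminal nucleotides of the coding end together read the same on both strands, which is why P nucleotides are only found next to complete coding sequences.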

VH replacement and secondary Vκ rearrangement: VH-VHDJH recombination events, called VH gene replacement, were shown to occur at the IgH locus in pre-B cells in vitro (Reth et al, 1986; Kleinfeld et al, 1986). It was proposed that the VH gene replacement occurred via the conserved heptamer present in framework region 3 (FR3) of most VH genes. This heptamer is identical in sequence to the heptamers found 5' of D gene segments (Early et al, 1982), thus VH-D-JH and D-JH rearrangements in effect carry a 5' heptamer which can mediate the addition of a VH gene. The hypothesis that repeated VH replacement events could occur at the IgH locus (Reth et al, 1986) was confirmed when a non-productive IgH locus was found to have undergone two successive VH replacements (Kleinfeld and Weigert, 1989). VH gene replacement was proposed to be a mechanism that can rescue a non-functional VH-D-JH rearrangement (Reth et al, 1986), and which can result in the more random VH usage in the adult repertoire as compared to the fetal repertoire (Kleinfeld et al, 1986). Secondary Vκ-Jκ rearrangements can occur at the Igκ locus (Feddersen and Van Ness, 1985), and circular DNAs containing Vκ-Jκ joints that were excised during such a secondary recombination event have been reported (Harada and Yamagishi, 1991). This mechanism was found to be important in the prevention of autoimmune disease. Auto-reactive immature B cells in the bone marrow of transgenic mice were found to alter their auto-antigen specificity by rearranging and expressing a new L chain (Tiegs et al, 1993). This mechanism can thus rescue auto-reactive B cells from any negative selection that may take place at later stages of B cell maturation in the bone marrow.

Recombination activating genes (RAG) 1 and 2: The successful transfer of DNA containing the gene for a protein with V(D)J recombinase activity into fibroblast cells (Schatz and Baltimore, 1988) resulted in the isolation and characterization of the V(D)J recombination activating gene, RAG-1 (Schatz et al, 1989). It was later found that the frequency of recombination of artificial substrates in the fibroblast cells increased 1000 fold when an additional recombination activating gene, RAG-2, was also transferred into the fibroblast cells (Oettinger et al, 1990). The two genes are each encoded by a single exon and are separated by only 8 kb of DNA (Oettinger et al, 1990). In the mouse the RAG locus has been mapped to chromosome 2 and in humans to chromosome 11 (Oettinger et al, 1992). Analysis of the amino acid sequence of RAG-1 revealed that it is homologous to the yeast gene HPR1, which in turn is related to topoisomerase I, which led to the suggestion that RAG-1 may be a topoisomerase (Wang et al, 1990a). However, when two putative topoisomerase active sites present in RAG-1 were altered, the altered RAG-1 proteins were still able to mediate site specific recombination (Kallenbach et al, 1993). Although the exact functions of the RAG-1 and RAG-2 gene products have not yet been determined, it was recently shown that mice carrying crippling germline mutations in the RAG-1 gene (Mombaerts et al, 1992) or in the RAG-2 gene (Shinkai et al, 1992) did not produce any mature B and T lymphocytes, but had no other obvious abnormalities. It was shown in both cases that this was due to the inability of the mice to initiate rearrangement of their Ig or TCR loci. The introduction of a RAG-2 expression vector into the RAG-2 deficient mice restored their ability to rearrange the B and T cell antigen receptors (Shinkai et al, 1992). Thus, in order for normal Ig or TCR rearrangement to occur, both RAG genes need to be expressed in vivo.

Preferential utilization of VH genes: Although during both Ig heavy and light chain rearrangement a gene segment can generally be rearranged to any one of a number of other gene segments (Weigert et al, 1978; Schilling et al, 1980; Gearhart et al, 1981; Manser et al, 1984; Gu et al, 1991a), restricted or preferential utilization of V gene segments has been reported (e.g. see Yancopoulos et al, 1984; Perlmutter et al, 1985; Tsukada et al, 1990; Kalled and Brodeur, 1990). Investigations into H chain rearrangement in immature B cell lines indicate that the restricted utilisation of certain VH and D genes may be due to the genes concerned being closest to the other genes involved in the recombination process (Yancopoulos et al, 1984; Perlmutter et al, 1985; Tsukada et al, 1990). However, more recently it was shown that although in BALB/c fetal liver derived A-MuLV transformed cell lines there is a preference for the rearrangement of VH genes belonging to 3' VH gene families (relative to the D gene locus), VH gene usage within these families was random and not correlated to the proximity of the VH gene to the D locus (Atkinson et al, 1993). In addition, a study of in vitro Vκ rearrangements in pre-B cell lines revealed that the Vκ genes preferentially utilized belonged to the Vκ4 gene family, which is separated from the nearest Jκ gene by several other Vκ families (Kalled and Brodeur, 1990). Comparisons of fetal lymphocyte repertoires with those of adult mice have revealed that in addition to the lack of N sequence addition, restricted V gene utilisation is also a characteristic of fetal and neonatal immune repertoires (Feeney, 1990; Gu et al, 1990; Meek, 1990; Feeney, 1992). One somewhat vague explanation for the preferential usage of certain IgV genes early in ontogeny was given by Tutter and Riblet (1989), who suggested that certain IgV genes may be conserved and preferentially utilized in V(D)J rearrangement because they possess intrinsic non-coding functions such as the ability to recombine, although no mechanistic details were given. This suggestion was based on the findings that certain probes specific for mouse group III VH genes could identify human group III VH genes and VH genes in other vertebrate groups, and that an octamer sequence motif that may promote recombination is present in FR1 of many of these genes (Tutter and Riblet, 1989). The mouse protein group III includes the VH genes belonging to the 7183, J606, S107, X-24 and 3609N gene families (reviewed in Kofler et al, 1992). In addition, most murine fetal pre-B and B cell lines express antibodies utilising group III VH genes (Perlmutter et al, 1985). A more detailed mechanism to explain biased rearrangements postulates that the RSSs associated with certain VH genes may be preferentially utilized by the recombination machinery (Atkinson et al, 1993). This proposition is supported by the finding that the preferred rearrangement of the murine Lκ locus compared to the Lλ locus is due to the preferential utilization of the κ RSSs by the recombination machinery (Ramsden and Wu, 1991). It was also proposed that certain V region genes are preferentially utilized because of short homologies at or near the recombination break-points that facilitate the pairing of the V region genes prior to the recombination process (Gu et al, 1990; Feeney, 1992). A study utilizing extrachromosomal recombination substrates transfected into murine pre-B cells found that in this selection free system there was indeed preferential rearrangement of substrates with short stretches of homology (Gerstein and Lieber, 1993). This study also revealed that the selection of partially homologous rearrangement substrates was diminished in the presence of TdT. Furthermore, it was shown that adult mice lacking the enzyme TdT display limited V gene usage, indicating that TdT plays a vital role in the diversification of the adult immune repertoire by blocking homology directed recombination (Komori et al, 1993; Gilfillan et al, 1993).

In summary, the immune system has developed a number of strategies to increase its potential repertoire of antibody specificities. The presence of multiple copies of the genetic elements that constitute a variable region can generate a large number of different combinations. The potential repertoire is further increased by the junctional diversity that is generated by inaccurate joining of the genetic elements, and by the removal and non-templated addition of nucleotides at the coding joints. However, the murine immune system has developed an additional diversification mechanism that is operative following an encounter with a T cell-dependent antigen: IgV region-specific somatic hypermutation. This mechanism will be discussed in detail in the next section.

c. Somatic hypermutation and the germinal center

Characteristics of somatic hypermutation in murine systems: The importance of somatic hypermutation in the generation of antibody diversity was demonstrated by comparing somatically mutated IgV genes that were produced during T cell dependent immune responses with their germline counterparts. This was initially established for secondary response antibodies (Gearhart et al, 1981; Pech et al, 1981; Bothwell et al, 1981; Selsing and Storb, 1981; Crews et al, 1981; Kim et al, 1981), and later for primary response antibodies (Griffiths et al, 1984; Cumano and Rajewsky, 1985; Levy et al, 1989; Tao and Bothwell, 1990; Tao et al, 1993; Apel and Berek, 1990; Kaartinen et al, 1991). The bulk of murine somatic mutations are single point mutations, with a low incidence of deletions or insertions found usually only in heavily mutated genes (Kim et al, 1981; Gearhart and Bogenhagen, 1983; Cumano and Rajewsky, 1986). It has been generally accepted that somatic hypermutation is restricted to T cell-dependent immune responses (Maizels and Bothwell, 1985). However, it was recently shown that B cells responding to the T-independent antigen α(1→6)dextran (DEX) contain somatic mutations (Akolkar et al, 1987; Wang et al, 1990b), and that antigen-specific germinal centers are formed during anti-DEX responses (Wang et al, 1994). Somatic mutations are restricted to rearranged IgV regions (Kim et al, 1981; Gorski et al, 1983; Gearhart and Bogenhagen, 1983; Roes et al, 1989), and introduced at similar rates in productive and non-productive V(D)J rearrangements (Pech et al, 1981; Gorski et al, 1983; Roes et al, 1989), whilst the mutation rate is approximately ten times lower in D-JH rearrangements (Roes et al, 1989; Sablitzky et al, 1985a).
No somatic mutations in unrearranged VH and VK genes have been detected to date; however, it has recently become apparent that unrearranged Vλ genes can be targeted by the somatic mutator, but at a reduced rate which is similar to that found in D-JH rearrangements (Weiss and Wu, 1987; Motoyama et al, 1994). In a single instance, somatic mutations were also detected in a JK-CK intron (Weber et al, 1991). CH (Kim et al, 1981) and CK (O'Brien et al, 1987) genes are not mutated; however, somatic mutations were recently found in Cλ genes, which suggests that sequences signalling the termination of mutation may be absent in this locus, or embedded in the C region, since the Jλ-Cλ intron is much shorter than those of the other Ig loci (Motoyama et al, 1993). While the somatic hypermutator mechanism is operating, the rate of nucleotide substitution in IgV regions is 10^-3 to 10^-4 per bp per generation (McKean et al, 1984; Sablitzky et al, 1985b).
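To give a feel for what a substitution rate of this magnitude implies, the following Python sketch computes the expected number of point mutations per rearranged V region per cell division. The rate bounds are those quoted above; the region length of 350 bp and the assumption of a constant, site-independent rate are illustrative assumptions only and are not taken from the cited studies.

```python
# Expected number of point mutations per V region per cell division.
# The rate bounds are quoted in the text; the region length is an assumed,
# illustrative value and not a figure from the cited studies.

RATE_LOW = 1e-4            # substitutions per bp per generation (lower bound)
RATE_HIGH = 1e-3           # substitutions per bp per generation (upper bound)
V_REGION_LENGTH_BP = 350   # hypothetical length of a rearranged V(D)J region


def expected_mutations(rate_per_bp: float, length_bp: int, generations: int = 1) -> float:
    """Expected substitutions, assuming a constant rate and independent sites."""
    return rate_per_bp * length_bp * generations


low = expected_mutations(RATE_LOW, V_REGION_LENGTH_BP)
high = expected_mutations(RATE_HIGH, V_REGION_LENGTH_BP)
print(f"{low:.3f} to {high:.2f} expected mutations per V region per division")
```

Under these assumptions a single V region would acquire, on average, about one substitution every 3 to 29 divisions while the mutator is active.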

Timing of somatic hypermutation: Somatic mutations in the murine spleen have been detected as early as day 5 of the primary response (Levy et al, 1989). Whereas few mutations are found early in the primary response (Kaartinen et al, 1983; Cumano and Rajewsky, 1985; Levy et al, 1989; Tao and Bothwell, 1990; Apel and Berek, 1990), the level of mutation increases later in the primary response (Griffiths et al, 1984; Weiss and Rajewsky, 1990; Weiss et al, 1992; Tao et al, 1993). Secondary and tertiary response antibodies display increased numbers of mutations (Berek et al, 1985; Cumano and Rajewsky, 1986; Berek et al, 1987; Blier and Bothwell, 1987; Rada et al, 1991) as well as higher affinities (reviewed in Berek and Milstein, 1987). Analysis of anti-2-phenyl-5-oxazolone (phOx) B cells isolated from draining lymph nodes revealed that somatic mutations could be detected at day 7 following a primary footpad immunization (Kallberg et al, 1993). In contrast, at day 7 following a peritoneal immunization with phOx, the majority of anti-phOx B cells isolated from the spleen are unmutated (eg. see Kaartinen et al, 1983; Griffiths et al, 1984; Berek et al, 1985; Kaartinen et al, 1986; Pelkonen et al, 1986). This suggests that somatic mutation may be initiated earlier in germinal centers in lymph nodes that are close to the site of infection than in splenic germinal centers. In addition, Kallberg et al (1993) compared two different sets of mutated K light chains: one set included not only VK-Ox1 rearrangements, but also related K chains (non-homogeneous set), whereas the other set consisted only of mutated VK-Ox1 rearranged sequences (homogeneous set). This comparison revealed that the non-homogeneous set displayed a higher incidence of somatic hypermutation at day seven of the primary response than the homogeneous set.
The interpretation of this was that the study of somatically mutated rearrangements of single VH or VK genes may actually underestimate the somatic mutation rate. Nevertheless, molecular studies of B cells from individual splenic germinal centers showed that the delayed onset of mutation in splenic germinal centers is unlikely to be due to the analysis of single VH gene rearrangements (Jacob et al, 1991a; Jacob et al, 1991b; see below).

Boundaries of somatic hypermutation: The somatic hypermutator mechanism is focused on the rearranged V(D)J region, irrespective of which J element was rearranged (Lebecque and Gearhart, 1990; Weber et al, 1991). Studies on the distribution of somatic mutations around rearranged IgVH regions indicate that there is a peak in mutation frequency over the rearranged V(D)J and a gradual decline in the 3' flanking region (Both et al, 1990; Lebecque and Gearhart, 1990). The most distal 3' mutations were found in the heavy chain intron enhancer (Lebecque and Gearhart, 1990), although this is most likely to be due to a distance effect rather than to a termination signal. Whilst Lebecque and Gearhart (1990) proposed that the promoter is the usual delimiting boundary for somatic hypermutation, Both and colleagues (1990) suggested that the transcription start site (cap) is the 5' boundary for somatic hypermutation. Recent data support the latter conclusion (Rothenfluh et al, 1993; see chapter 6). Among four hybridoma derived Vλ sequences, the 5' boundary of mutation was found to lie within the leader intron, and only two mutations were detected in the 5' non-coding region (Motoyama et al, 1991; Motoyama et al, 1994).

Somatic hypermutation, antigenic selection and 3-dimensional conformation of V regions: IgV regions are composed of three complementarity determining regions (CDR) and four framework regions (FR; Wu and Kabat, 1970). The antigen binding site at the end of each arm of the Y-shaped immunoglobulin molecule is composed of 6 polypeptide loops - the CDRs, 3 contributed by the heavy chain and 3 by the light chain - which are connected by the β sheet FRs (Alzari et al, 1988). It is well documented that there is strong selection for replacement mutations in the CDRs and selection against such changes in the FRs (reviewed in Allen et al, 1987; Berek and Milstein, 1987, 1988; Blier and Bothwell, 1988). It has been shown for a number of anti-hapten immune responses that single mutations in CDRs can greatly increase antibody affinity for the antigen. Most high-affinity anti-(4-hydroxy-3-nitrophenyl)acetyl (NP) antibodies carry a tryptophan→leucine replacement at residue 33 (Cumano and Rajewsky, 1986; Blier and Bothwell, 1987) which results in a 10-fold increase in affinity (Allen et al, 1987). The threonine→isoleucine change at codon 57 and the lysine→threonine change at codon 58 of antibodies produced against p-azophenylarsonate (Ars) together result in an eight-fold affinity increase (Sharon et al, 1989), and the histidine→glutamine/asparagine replacement at codon 34 of anti-2-phenyl-5-oxazolone (phOx) antibodies increases affinity by 8-10-fold (reviewed in Berek and Milstein, 1987). In all of the above immune responses, the critical affinity increasing mutations are seen in most mutated antibodies; indeed, the replacement of the histidine at position 34 by either a glutamine or an asparagine was found in all tertiary anti-phOx antibodies (Berek et al, 1987). In contrast, this mutation is never seen in passenger K transgenes which are not subjected to antigenic selection (Sharpe et al, 1991; Betz et al, 1993; González-Fernández and Milstein, 1993).
This indicates that antigenic selection of antibodies results in the positive selection of certain key, affinity enhancing mutations. This is supported by several recent studies involving engineered antibodies. In one example, two independently generated anti-Ars hybridomas produced antibodies that shared 4 amino acid replacement mutations in CDR2 of the heavy chain, and additional hybridomas contained various combinations of these replacements (Wysocki et al, 1990). In an extension of this study the 4 mutations were either removed from hybridomas expressing mutated IgV regions, or introduced into hybridomas expressing unmutated IgV regions (Parhami-Seren et al, 1990). This clearly demonstrated that the 4 mutations did indeed increase the affinity for the antigen. When the tryptophan→leucine mutation, which is characteristic of virtually all high affinity anti-NP antibodies, was removed from 3B44 (a somatic mutant of the VH186.2 germline IgV gene carrying an additional 7 replacement mutations in its H chain), the affinity of this antibody was reduced 10-fold (Allen et al, 1988); thus this mutation alone was responsible for the affinity increase. Somatic mutations found in a heavily mutated anti-Ars hybridoma with exceptionally high affinity were introduced into an unmutated antibody (Sharon, 1990). This demonstrated that 3 amino acid replacement mutations greatly increased affinity for the antigen, whereas several other mutations had no effect on affinity. In this study, it was also found that the contribution of the light chain to affinity increases was not always significant. This latter point is in agreement with another study which found that the Ig generating system is dependent more on VH than VL for the generation of distinct Ig specificities (Chothia and Lesk, 1987).
Studies on the three-dimensional structures of antibodies have revealed that variations at residue 71 of H chain FR3 have significant effects on CDR1 and CDR2 (Chothia et al, 1989; Tramontano et al, 1990). The position and conformation of the CDR1 and CDR2 loops vary depending on the side chain of the amino acid present at position 71 (Tramontano et al, 1990), and in a study on chimaeric mouse/human antibodies it was found that the substitution of a phenylalanine with a tyrosine at residue 71 increased affinity (Foote and Winter, 1992). However, another study demonstrated that amino acid replacements at 2 FR positions of an anti-Ars antibody had no effect on affinity (Sharon, 1990). Two recent studies, also utilizing site-directed mutagenesis, clearly demonstrated that any antigen contact residue, or residue that is critical in determining the conformation of the antigen contact sites, is conserved (Parhami-Seren, 1993; Sompuram and Sharon, 1993). In addition, determination of the three-dimensional interaction of a number of anti-phOx antibodies with the hapten phOx confirmed that most of the residues that are characteristic of mutated high affinity anti-phOx antibodies were critical antigen contact residues (Alzari et al, 1990). Thus, antigenic selection can have the following effects at any given position of the IgV region: a) if the amino acid is not directly involved in antigen contact or in maintaining the geometry of the binding site, then many conservative and some non-conservative replacement mutations will be tolerated; b) if the amino acid is directly involved in binding antigen (usually CDR residues, but also some FR residues; reviewed in Alzari et al, 1988), then non-conservative replacement mutations are not tolerated unless they optimise the antigen-antibody interaction further.
However, antigenic selection does not solely target point mutations; the length and sequence of the V-J or V-D-J joints are also conserved in some specific antibodies (Alzari et al, 1990; Ruff-Jamison and Glenney, 1993). Presumably this is because the size and the conformation of CDR3 in these cases are critical for a high affinity interaction with the antigen. Nevertheless, it has become apparent that some residues that are highly conserved in some immune responses are not essential for antigen binding. Thus, when the invariant tryptophan residue found at position 36 of VH genes was converted to a different amino acid, the antibodies were shown to have retained their antigen binding ability (Sharon, 1988). Similarly, when the CDR2 of the anti-phosphocholine antibody T15 was altered by in vitro random mutagenesis, some CDR2 regions with amino acid replacements at conserved residues did not lose their antigen binding ability (Chen et al, 1992). This study also determined that approximately 57% of the mutagenized CDR2 regions had lost or reduced their ability to bind antigen. In some cases, more than one mutation was required to bring about the loss of or a reduction in antigen binding. Thus, although some residues are critical for antigen binding, others may play a role in the maintenance of overall Ig structure or in signal transduction, or they may be required in earlier stages of B cell differentiation or development (Sharon, 1988; Chen et al, 1992). Mian et al (1989) made the interesting suggestion that, as a result of antigenic selection forces during immune responses, there is selection for amino acids at antigen binding sites that possess a wide range of biochemical/biophysical characteristics. They argued that the presence of amphipathic amino acids and those with large and flexible side chains may be of benefit at antigen contact sites.
Amphipathic amino acids would be unaffected by changes in hydrophobicity, and large amino acids could generally be involved in diverse electrostatic and van der Waals' interactions, whilst amino acids with flexible side chains could provide flexibility. Tryptophan and tyrosine are the two amino acids that best meet the above criteria, thus they would be expected to contribute significantly to the amino acid composition of antigen binding sites. As can be seen in Table 1.1, this is indeed the case. Therefore, at antigen contact residues there seems to be selection for amino acids that will not only increase affinity, but that will also provide greater structural and physico-chemical flexibility.

Table 1.1. Percentage composition of the amino acids in CDRs and at antigen binding positions (Adapted from Mian et al, 1989).

Amino acid        Variable    CDR        Potential    Known
                  regions     regions    binding      binding
                                         positions    positions
Alanine             6.44       7.06        3.50         1.1
Arginine            3.55       3.80        2.84         5.7
Asparagine          2.55       7.44       11.17         6.9
Aspartic acid       3.81       5.09        6.14         8.0
Cysteine            2.03       0.13        0.16         0.0
Glutamic acid       5.68       2.38        2.42         3.4
Glutamine           3.58       4.35        0.57         1.1
Glycine             9.40       7.01        8.68         8.0
Histidine           0.77       2.39        3.92         2.3
Isoleucine          3.98       3.41        3.43         1.1
Leucine             7.79       5.09        2.43         1.1
Lysine              4.42       4.10        2.59         0.0
Methionine          1.76       2.07        0.23         0.0
Phenylalanine       2.96       2.53        2.28         3.4
Proline             4.45       2.81        3.68         3.4
Serine             13.84      17.27       14.88        13.6
Threonine           8.66       6.05        5.90         4.6
Tryptophan          2.10       1.94        5.52        10.2
Tyrosine            5.31      11.08       17.30        25.0
Valine              6.90       4.05        2.34         1.1
Total number of
amino acids       110 816     27 007      15 054        88
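The over-representation of tryptophan and tyrosine at antigen contact positions can be made explicit by dividing the percentage at known binding positions by the percentage in variable regions as a whole. The Python sketch below does this for four rows of Table 1.1; the "enrichment" ratio is an illustrative measure introduced here and is not a statistic reported by Mian et al (1989).

```python
# Percentages copied from Table 1.1 (adapted from Mian et al, 1989), in the order:
# (variable regions, CDR regions, potential binding positions, known binding positions)
TABLE_1_1 = {
    "Serine": (13.84, 17.27, 14.88, 13.6),
    "Tryptophan": (2.10, 1.94, 5.52, 10.2),
    "Tyrosine": (5.31, 11.08, 17.30, 25.0),
    "Leucine": (7.79, 5.09, 2.43, 1.1),
}


def enrichment(amino_acid: str) -> float:
    """Ratio of the percentage at known binding positions to that in variable regions.

    This ratio is an illustrative measure defined for this sketch only.
    """
    variable, _cdr, _potential, known = TABLE_1_1[amino_acid]
    return known / variable


for name in TABLE_1_1:
    print(f"{name}: {enrichment(name):.2f}")
```

On this measure tryptophan and tyrosine are each enriched between four- and five-fold at known binding positions, serine is essentially unchanged, and leucine is strongly depleted. The small number of known binding positions (88) means that these ratios should be treated as indicative only.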

Recruitment of new B cell clones: In two well characterized T cell dependent immune responses, those against the haptens phOx in BALB/c mice (Kaartinen et al, 1983; Griffiths et al, 1984; Berek et al, 1985) and NP in C57BL mice (Makela and Karjalainen, 1977; Bothwell et al, 1981; Cumano and Rajewsky, 1985, 1986), the primary response antibodies are much less heterogeneous than the secondary response antibodies. In BALB/c mice, the primary anti-phOx response is dominated by the idiotype (id495) encoded by the VKOx1-VHOx1 gene combination (reviewed in Berek and Milstein, 1987; and in Makela and Kaartinen, 1988), whereas in C57BL mice the primary anti-NP response is dominated by the Vλ1-VH186.2 gene combination (Cumano and Rajewsky, 1985, 1986). For the anti-phOx response, it was shown that the usage of id495 varies between strains. Thus, in the anti-phOx response of C57BL, AKR and NZB mice, id495 contributes only a minor proportion of antibodies (Solin et al, 1992). Nevertheless, in most mouse strains, id495 plays a major role in the primary anti-phOx response (Makela and Kaartinen, 1988; Kaartinen et al, 1989), and the studies on somatic hypermutation involving the anti-phOx response focused on the VKOx1-VHOx1 gene combination (reviewed in Berek and Milstein, 1987, 1988). However, a recent comparison of V gene usage in the primary anti-phOx response of 12 strains of mice has revealed another major idiotype, id350 (Solin et al, 1992). Indeed, the data indicated that in the mouse species, this idiotype seems to be more important in the primary anti-phOx response than id495. In both the anti-phOx and anti-NP responses in BALB/c and C57BL mice respectively, the secondary response is dominated by new gene combinations (Makela and Karjalainen, 1977; Reth et al, 1978; Bothwell et al, 1981; Berek et al, 1985, 1987). The new gene combinations apparent in the BALB/c secondary response to phOx generally have higher affinities for the hapten (Berek et al, 1985, 1987). In another well studied system, the anti-Ars response in A/J mice, the secondary response is dominated by a VH gene that is rare in the primary response (Manser et al, 1984; Wysocki et al, 1986; Manser et al, 1987). The domination of the primary response by specific gene combinations is presumably due to the early activation of germline combinations that provide the best protection against the invading antigen, and hence are clonally selected on initial contact with the antigen. Somatic mutation and antigenic selection result in higher affinity variants of these initially dominant clones later in the immune response. However, other gene combinations, initially present in low numbers, also undergo clonal expansion, somatic hypermutation and antigenic selection, and some of these later recruits may then dominate subsequent immune responses (reviewed in Berek and Milstein, 1987, 1988). Some of the newly recruited gene combinations are successful because they utilize the same or a very similar antigen binding strategy as the initially successful gene combinations. Alzari et al (1990) showed that a secondary anti-phOx antibody, which utilized a VH gene unrelated to VHOx1 but coupled with the VKOx1 L chain, possessed a CDR3 of the same size and most of the key contact residues also found in most mutated VKOx1-VHOx1 antibodies.
However, other new recruits which have longer H chain CDR3 and L chain CDR1, but still possess good affinities for the antigen (Berek et al, 1985), clearly utilize a different, but no less successful, antigen contact strategy. A comparison of the kinetics of the VKOx1-VHOx1 (Ox) anti-phOx antibodies with anti-phOx antibodies utilizing neither of those two genes (non-Ox) found that there were no significant differences in affinity maturation, which suggests that the repertoire shift is not due to selection for higher affinities (Foote and Milstein, 1991). Both antibody types also displayed gradually declining dissociation rates with time, i.e. increased lifespan of hapten-antibody complexes in later immune responses. However, the non-Ox antibodies had significantly higher association rates than the Ox antibodies, and the association rates did not alter throughout the time course of the responses. Therefore, the recruitment and expansion of B cell clones with new gene combinations can be a consequence of the selection for antibodies with high association rates (i.e. antibodies are being selected at two levels: affinity for antigen and speed of antigen-antibody complex formation) and, although somatic hypermutation followed by antigenic selection results in higher affinity antibodies with lower dissociation rates, the association rate is not affected by the accumulation of mutations.
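The distinction between the two levels of selection can be illustrated with the standard relation between rate constants and equilibrium affinity, K_A = k_on / k_off. The Python sketch below is a minimal illustration only: all rate constants are hypothetical values chosen for round numbers and are not data from Foote and Milstein (1991).

```python
import math


def association_constant(k_on: float, k_off: float) -> float:
    """Equilibrium association constant K_A = k_on / k_off (units: M^-1)."""
    return k_on / k_off


def complex_half_life(k_off: float) -> float:
    """Half-life (s) of a hapten-antibody complex under first-order dissociation."""
    return math.log(2) / k_off


# Hypothetical rate constants, for illustration only (not measured values).
K_ON = 1e6          # M^-1 s^-1, held constant during the response
K_OFF_EARLY = 1.0   # s^-1, early response antibody
K_OFF_LATE = 0.1    # s^-1, late response antibody after mutation and selection

fold = association_constant(K_ON, K_OFF_LATE) / association_constant(K_ON, K_OFF_EARLY)
print(f"Affinity increase from slower dissociation alone: {fold:.0f}-fold")
print(f"Complex half-life: {complex_half_life(K_OFF_EARLY):.2f} s early, "
      f"{complex_half_life(K_OFF_LATE):.2f} s late")
```

With the association rate fixed, a ten-fold reduction in the dissociation rate gives a ten-fold gain in affinity and a ten-fold longer complex lifespan. A clone with a higher association rate reaches the same affinity at a faster dissociation rate, which is why the two properties can be selected independently.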

Germinal centers are the site of somatic hypermutation: It was proposed some time ago that somatic hypermutation and subsequent antigenic selection of high affinity mutants take place in germinal centers (MacLennan and Gray, 1986). This was later confirmed by a number of studies that demonstrated the clonal accumulation of somatic mutations in germinal center B cells (Apel and Berek, 1990; Berek et al, 1991; Weiss et al, 1992; McHeyzer-Williams et al, 1993), and/or the mutation and expansion of B cell clones in germinal centers (Jacob et al, 1991a; Jacob and Kelsoe, 1992; Jacob et al, 1993; Küppers et al, 1993). Germinal centers form inside lymphoid follicles a few days after antibody production in response to antigenic stimulation (reviewed in MacLennan and Gray, 1986; MacLennan et al, 1992). The structure of a typical germinal center is shown in Figure 1.5.

[Figure 1.5: diagram of a germinal center. Labelled zones: follicular mantle, outer zone, apical light zone, basal light zone, dark zone.]

Figure 1.5. Highly stylized diagram indicating the zones of a germinal center as seen at the peak of the germinal center reaction. (Adapted from MacLennan et al, 1991; Hardie et al, 1993).

Following primary immunization, two anatomically and phenotypically distinct sites of B cell proliferation become apparent within the spleen (MacLennan et al, 1988; Jacob et al, 1991b). Within 0-2 days following immunization, antigen specific B cells can be detected in the periarteriolar lymphoid sheath (PALS)-foci. Intense proliferation of 1-3 founder B cell clones results in the expansion of the PALS-foci until day 4-8. After this time, the foci decrease in size until they almost disappear after 1-2 weeks (MacLennan et al, 1988; Jacob et al, 1991b). The other site of antigen dependent B cell proliferation is the germinal center. A few days following immunization, fewer than 5 B cell blasts bearing surface immunoglobulin (sIg+) enter the follicular dendritic cell (FDC) network in the follicles (Kroese et al, 1988; Jacob et al, 1991a,b; Liu et al, 1991a). The B cell blasts undergo rapid cell cycling, and at day 3-4 of the immune response they have filled the FDC network (MacLennan et al, 1988; Liu et al, 1992). Within a short period of time the B cell blasts move to one end of the FDC network and form the dark zone (Fig. 1.5), which contains few FDCs. During this time the B cell blasts have lost expression of sIg and, among other phenotypic changes, no longer express the bcl-2 gene product, and are now called centroblasts. Although the centroblasts proliferate, they do not increase in number; rather, they give rise to non-dividing, sIg+ centrocytes. The centrocytes move into the dense FDC network of the basal light zone of the germinal center (reviewed in MacLennan et al, 1992; Liu et al, 1992). Most of the centrocytes in the basal light zone undergo apoptosis (MacLennan et al, 1988; Liu et al, 1989). It was shown that the centrocytes can be rescued from apoptosis by cross-linking their antigen receptor with anti-Ig bound to sheep red blood cells, but not by soluble anti-Ig (Liu et al, 1989). The bcl-2 gene is normally expressed in B cells but not in the centroblasts or centrocytes of germinal centers (Hockenbery et al, 1991). However, when the sIg of germinal center centrocytes are cross linked, they begin to produce the bcl-2 protein (Liu et al, 1991b). A number of studies have demonstrated that expression of the bcl-2 protein inhibits apoptosis of B cells in germinal centers and of T cells in the thymus (McDonnell et al, 1989; Hockenbery et al, 1990; Liu et al, 1991b; Sentman et al, 1991; Strasser et al, 1991). FDCs were shown to be able to rescue germinal center B cells from apoptosis (Kosco et al, 1992), as well as providing additional signals that stimulate B cell proliferation (Burton et al, 1993). This has led to the suggestion that the centrocytes are programmed to undergo apoptosis unless they express sIg of sufficient affinity to bind the antigen presented on the surface of the FDC.
The cross linking of the sIg then activates expression of the bcl-2 gene, which rescues the centrocyte from apoptosis (reviewed in MacLennan et al, 1992; Liu et al, 1992). Additional work has shown that germinal center centrocytes can also be prevented from entering the apoptotic pathway by incubation with CD23 in combination with interleukin (IL)-1α, but not with either of the two factors alone (Liu et al, 1991c). Centrocytes stimulated with CD23 and IL-1α or with IL-2 differentiated to become plasma cells (Liu et al, 1991c), whilst stimulation with CD40 alone or in combination with IL-4 resulted in the formation of memory B cells or memory B blasts respectively (reviewed in MacLennan et al, 1992; Liu et al, 1992). The FDC of the basal light zone do not express CD23, whereas apical light zone FDC display high levels of expression of CD23 (Hardie et al, 1993). This therefore suggests that centrocytes reaching the apical light zone may be stimulated to become antibody secreting plasma cells (Liu et al, 1991c). During primary immune responses in mice, germinal center size peaks at around day 9 of the response, which is followed by a reduction in size until the germinal centers have reached pre-immune size after approximately 3 weeks. During secondary responses, germinal center size peaks at day 7, and has returned to pre-immune level by day 14 (Hollowood and Macartney, 1992). It was proposed that plasma B cells and memory B cells arise from different precursor cell populations (Linton et al, 1989), and that only the memory B cell precursors can form germinal centers (Linton et al, 1992). However, this is not supported by a recent study, which found identical VH-D and D-JH joints in B cells isolated from neighbouring PALS-foci and germinal centers (Jacob et al, 1992). Therefore, after unmutated B cells have undergone proliferation and selection, some of these cells may migrate to nearby lymphoid follicles, where they form germinal centers.
This is consistent with the finding that antigen stimulated B cells, but not virgin B cells, can respond to the antigen on the surface of FDC (reviewed in MacLennan et al, 1991). The ratio of PALS-foci (100-150/spleen) to germinal centers (300-500/spleen) suggests that one PALS-focus may provide the founder cells for more than one germinal center (Jacob et al, 1992).
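The arithmetic behind this inference is simple and can be written out as follows. The Python sketch takes the ranges quoted above and computes the lowest and highest number of germinal centers per PALS-focus that they allow; it assumes, purely for illustration, that every germinal center is founded from a PALS-focus in the same spleen.

```python
# Ranges quoted in the text (per spleen).
PALS_FOCI_PER_SPLEEN = (100, 150)
GERMINAL_CENTERS_PER_SPLEEN = (300, 500)


def ratio_bounds(numerator: tuple, denominator: tuple) -> tuple:
    """Smallest and largest ratio consistent with two (low, high) ranges."""
    lowest = numerator[0] / denominator[1]
    highest = numerator[1] / denominator[0]
    return lowest, highest


print(ratio_bounds(GERMINAL_CENTERS_PER_SPLEEN, PALS_FOCI_PER_SPLEEN))  # prints (2.0, 5.0)
```

Even at the lower bound there are two germinal centers for every PALS-focus, which is the basis for suggesting that a single focus can seed more than one germinal center.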

The germinal center reaction: The information presented above suggests that the following sequence of events takes place prior to and during the germinal center reaction (see Fig. 1.6): B cells that have been activated by antigen migrate to and form oligoclonal PALS-foci where they proliferate rapidly. This occurs prior to germinal center formation, and by the time germinal centers are increasing in size to reach their peak activity, the PALS-foci are dwindling. Thus, it seems likely that PALS-foci are the site where interclonal competition for antigen by unmutated, activated B cells takes place (Jacob et al, 1991b), whereas in germinal centers intraclonal competition for antigen by mutated B cells takes place. In addition, the proliferation of antigen specific B cells in PALS-foci very early during the immune response contributes to the rapid appearance of antigen specific antibodies (Jacob et al, 1992). Somatic mutation is likely to be active in the centroblasts proliferating in the dark zone (reviewed in MacLennan et al, 1991, 1992). The mutated centroblasts give rise to progeny centrocytes which express the mutated antibody on their surface. The centrocytes move to the basal light zone, where they must bind to the antigen displayed on the FDC, otherwise they will undergo apoptosis (Fig. 1.6; Liu et al, 1989). Germinal center B cells do not usually display evidence of somatic mutations during the early part of the germinal center reaction, and at day 10 many deleterious mutations can be found, suggesting that antigenic selection has not yet had great impact (Apel and Berek, 1990; Rada et al, 1991; Berek et al, 1991; Weiss et al, 1992; Jacob et al, 1993). The mutation frequency increases until day 14-16 (Weiss et al, 1992; Jacob et al, 1993), after which it does not increase further.
At this stage there is evidence for strong antigenic selection against deleterious mutations (Weiss and Rajewsky, 1990; Weiss et al, 1992; Jacob et al, 1993; McHeyzer-Williams et al, 1993). The outer zone may provide a pathway by which the somatically mutated and antigenically selected B cells return to the dark zone, where they may undergo a further round of somatic mutation (MacLennan et al, 1991). This would certainly be consistent with the step-wise manner in which somatic mutations accumulate in IgV regions (eg. see McKean et al, 1984; Sablitzky et al, 1985b; Blier and Bothwell, 1987; Berek et al, 1987; Manser, 1989); however, in order to revert to the centroblast phenotype the centrocyte would first need to lose expression of bcl-2 and sIg.

[Figure 1.6: schematic of the germinal center reaction. Recoverable labels: mature antigen responsive B cell (Bcl-2 +ve/-ve, sIg +ve); antigenic stimulation; PALS-foci (primary response, oligoclonal growth, proliferation and secretion of unmutated germ-line V(D)Js, low affinity Ab); plasma cell; formation of germinal centers, development of lymphoid follicles; centroblast (sIg negative, Bcl-2 negative, non Ig secretor, 7 hr generation time, hypermutator on?); sIg positive, Bcl-2 negative, no division; follicular dendritic cell with Ag-Ab-C3b complexes (PALS-foci Ab?); low affinity combining sites; mutated memory B cell; mutated plasma B cell.]

Figure 1.6. Proliferation and somatic hypermutation of centroblasts in the dark zone, followed by antigenic selection of the centrocytes. Those expressing high affinity antibodies emerge from the germinal center as memory cells or plasma cells, depending on which signals were received. (Adapted from Steele et al, 1993).

It has been shown that germinal center B cells can take up, process and present the antigen displayed by FDCs (Kosco et al, 1988; Gray et al, 1991). Therefore, any B cells that present antigen may come in contact with antigen specific T cells, which are present in the light zones of later stage germinal centers (reviewed in Nieuwenhuis et al, 1992), and initiate the interaction between the two cell types. An important step of this intercellular interaction is the binding of the CD40 ligand on the T cell to the CD40 receptor on the B cell (reviewed in Clark and Ledbetter, 1994). Centrocytes that receive the CD40 mediated signal may eventually emerge as memory B cells from the germinal center (reviewed in MacLennan et al, 1992; Liu et al, 1992). Any centrocytes which did not present processed antigen to T cells presumably undergo the plasma B cell differentiation pathway when they come in contact with the CD23 expressing FDCs of the apical light zone (Liu et al, 1991c). Thus, mutated and antigenically selected plasma and memory B cells producing high affinity antibodies emerge from germinal centers.

Transgenic models for somatic hypermutation: A number of laboratories have utilized transgenic mice for the study of somatic hypermutation and the elucidation of the genetic elements involved in its control. Work with Ig transgenic mice has shown that multiple copies of the transgenes integrate into the genome, that the transgenic Ig are expressed in a tissue specific manner and that usually the integration site has no effect on expression, provided that the transgene contains the transcriptional and other control elements contained in the vicinity of the rearranged coding region (reviewed in Storb, 1987). Storb and colleagues constructed a transgene from a rearranged K chain that had previously undergone somatic hypermutation and antigenic selection (O'Brien et al, 1987). By showing that the K transgenes could undergo somatic hypermutation, they demonstrated that the cis-acting elements sufficient for the expression of the transgene were also sufficient to target the mutator mechanism, and that no additional trans-elements were necessary, because the transgenes were mutated regardless of their integration site. However, a more careful analysis of the mutation pattern in and around these transgenes (Hackett et al, 1990) found a high ratio of R:S mutations in the FRs. Because one of the expressed transgene copies contained a stop codon, it was argued that the individual transgene copies are subject to reduced selective pressures. Whereas this might be a contributing factor, the transgene under study was a gene that had previously undergone somatic hypermutation and subsequent antigenic selection, i.e. it had already proven itself to encode a high affinity antibody. Only the additional mutations not present in the original transgene were included in the analysis (Hackett et al, 1990). It is therefore reasonable to assume that any additional replacement mutations in the CDRs and at critical FR positions would be selected against, since they may reduce affinity for the antigen.
Hence the additional mutations observed in these transgenes (O'Brien et al, 1987; Hackett et al, 1990) may be a reflection of mutation/selection events that occur when a previously mutated gene undergoes the germinal center reaction for a second time. A transgene containing a rearranged VHDJH region, approximately 2.5 kb of 5' flanking region and approximately 6.3 kb of 3' flanking region (including the enhancer, Cμ genes and the Cμ switch region) was not only expressed and somatically mutated in the transgenic mice, but it also switched H chain class from IgM to IgG (Durdik et al, 1989). This suggests that the sequences sufficient for the expression of rearranged IgH chains are also sufficient for somatic hypermutation. Similar to the K transgene studied by Storb's group, the H chain used in the transgene was also obtained from a somatically mutated hybridoma; however, in this study not enough somatic mutations were detected to make any meaningful comparisons between the normal mutation patterns and the additional mutations detected in the IgH transgenes. Two different K transgenes, both of which lack the enhancer present 3' of the CK region (Meyer and Neuberger, 1989), did not undergo somatic mutation (Sharpe et al, 1990; Carmack et al, 1991). Sequence analysis revealed that the endogenous rearranged genes were mutated, indicating that the lack of mutation was due to the absence of one or several critical cis-acting control elements. Since in both cases the K transgenes were transcribed, the missing enhancer is not crucial for gene expression. However, when the region containing the 3' CK enhancer was added to one of the transgenes, it accumulated somatic mutations (Sharpe et al, 1991). This suggests that the 3' enhancer and/or sequences in close proximity to it may be crucial for the correct targeting of somatic hypermutation.
Similar data have recently been published for H chain transgenes which did not contain any sequences 3' of the intron enhancer (Giusti et al., 1992). These transgenes only underwent somatic hypermutation when they recombined with endogenous C region genes to form complete Ig molecules (Giusti et al., 1992; Giusti and Manser, 1993). Giusti and Manser showed that a somatic interchromosomal recombination event, which occurs at low frequency, is responsible for the production of the chimaeric Ig molecules (Giusti and Manser, 1994). They also demonstrated that elements or sequence motifs 5' of the H chain promoter are not necessary for somatic hypermutation (Giusti and Manser, 1993). Another transgenic model, using a construct containing the VH promoter region and the IgH intron enhancer region but with the VDJ gene replaced by a chloramphenicol acetyltransferase (CAT) gene, found that the CAT gene was mutated, but at a 5- to 10-fold lower rate than normal (Azuma et al., 1993). The IgH intron enhancer is flanked on both sides by MAR elements (Cockerill et al., 1987). The partial H chain transgene used by Manser and colleagues (Giusti et al., 1992; Giusti and Manser, 1993, 1994) contained a truncated form or no part of the 3' MAR, whereas the CAT transgene used by Azuma and colleagues contained all of the 5' and 3' MARs (Azuma et al., 1993).

Thus, the total absence of mutation in the former and the low rate of mutation in the latter transgene may be correlated with the presence of the 3' MAR. Nevertheless, it is clear that the presence of the J-C intron enhancer region is not sufficient to efficiently target the mutator mechanism to rearranged VHDJH regions. It is therefore likely that for somatic hypermutation to operate efficiently on the H chain as well as the κ light chain, the 3' enhancer must be present. The introduction of an IgH chain intron enhancer into a TCR VβDJβ transgene did not result in a transgene that could act as a substrate for somatic hypermutation, even though the endogenous IgV regions did hypermutate (Hackett et al., 1992). This suggests that TCR V regions either do not hypermutate, or the control elements recognized by the putative T cell hypermutator differ from those utilized by the B cell hypermutator (reviewed in Steele et al., 1993). Alternatively, it is possible that the somatic hypermutator acting on IgV regions cannot recognize the TCR promoter. The study of κ transgenic mice has also led to a clarification of the intrinsic specificity of the somatic hypermutation mechanism (Sharpe et al., 1991; Betz et al., 1993; Gonzalez-Fernandez and Milstein, 1993). These transgenes do not carry the characteristic, antigen-selected histidine→asparagine and tyrosine→phenylalanine replacement mutations at positions 34 and 36 respectively, but they reveal a clustering of mutations in CDR1, as well as three prominent and a number of minor mutational hot spots (Betz et al., 1993; Gonzalez-Fernandez and Milstein, 1993). These unselected mutations also showed a preference for transitions over transversions (Gonzalez-Fernandez and Milstein, 1993), which is in agreement with the mutation preference found in H chain flanking regions (Both et al., 1990; Rothenfluh et al., 1993). 
A study involving transgene-expressing hybridomas isolated from IgM H chain transgenic mice found that more than one transgene copy can be present in a single cell, that the rates of mutation can differ between these copies, and that in some cases both mutated and unmutated copies of the transgene were present (Sohn et al., 1993). Similarly, in hybridomas isolated from κ transgenic mice, most copies of the transgenes were mutated (Lozano et al., 1993). However, in the latter study it was noted that the expression of the different transgene copies varied; some copies were even downregulated. The expression of transgene copies carrying mutations that improve antigen binding was found to be higher than the expression of the other transgene copies, although it was not determined whether this was at the level of transcription or during later events. Nevertheless, this study clearly illustrates that B cells can regulate the expression of antibody genes in a manner that ensures not only that a single specificity will be expressed, but that the antibody with the highest affinity will be produced. The role that antigenic selection and intracellular processes play in this feedback signal process is not yet clear. In a recent study, mice that were unable to rearrange their endogenous IgH and IgL chains but contained human IgH and Igκ minilocus transgenes were generated

(Lonberg et al., 1994). Not only did these mice express human antibodies, but it was shown that the human transgenes could undergo isotype switching and somatic hypermutation. Therefore the control sequences for V(D)J joining, isotype switching and somatic hypermutation must be very similar, if not identical, in both species.
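
The preference for transitions over transversions reported for unselected transgene mutations can be tallied directly from an aligned germline/mutated sequence pair. The following is a minimal sketch only: the function names and the two short toy sequences are hypothetical and are not data from any of the cited studies.

```python
# Minimal sketch: tally transitions and transversions between two aligned sequences.
# The sequences at the bottom are toy examples, not data from the cited studies.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("only A, C, G and T are supported")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # A transition stays within one chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); everything else is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(germline, mutated):
    """Count transitions and transversions between two equal-length sequences."""
    if len(germline) != len(mutated):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"transition": 0, "transversion": 0}
    for g, m in zip(germline.upper(), mutated.upper()):
        if g != m:
            counts[classify_substitution(g, m)] += 1
    return counts


# Hypothetical aligned pair: A->G at position 0, G->T at position 6.
germline = "ACGTACGT"
mutated = "GCGTACTT"
counts = count_ts_tv(germline, mutated)
print(counts)
```

With twice as many possible transversions as transitions per base, random substitution would give a 1:2 ratio, so an observed excess of transitions indicates a bias in the mutational process.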

Somatic hypermutator models: An early study on somatic mutation found mutations only in IgG antibodies and none in IgM antibodies, which led to the suggestion that the somatic hypermutator mechanism is linked to IgH chain switching (Gearhart et al., 1981). However, a number of studies have since demonstrated the presence of somatic mutations in IgM antibodies (Griffiths et al., 1984; Tao et al., 1990, 1993; Kaartinen et al., 1991). Work on transgenic mice has also shown that IgH chain switching is not required for somatic hypermutation (O'Brien et al., 1987; Sohn et al., 1993). One of these studies on IgM transgenes (Sohn et al., 1993) found that some B cells expressed both IgM and IgG antibodies derived from the transgene. In these cells, the transgenes that had switched to IgG expression contained higher rates of mutation, suggesting that even though the activation of somatic mutation is not dependent on the IgH chain switch, sequences associated with the Cγ regions may enhance somatic mutation. The finding of B cells with identical mutations and IgH switch recombination events (Siekevitz et al., 1987) led to the suggestion that the IgH chain class switch may terminate somatic hypermutation (Rajewsky et al., 1987). However, this was disproved when clonally related hybridomas with identical class switch recombination events were shown to contain different mutations (Shan et al., 1990). Three major classes of models have been postulated to explain somatic hypermutation: those involving error-prone DNA synthesis/repair (Brenner and Milstein, 1966; Gearhart, 1981; Manser, 1990b), gene conversion events (Krawinkel et al., 1983; Krawinkel et al., 1986; Golding et al., 1987; Kolchanov et al., 1987; Maizels, 1989) and an error-prone DNA→RNA→DNA information loop (Steele and Pollard, 1987).

Somatic hypermutation due to error-prone DNA synthesis/repair: According to one model involving error-prone DNA repair (Fig. 1.7), a specific nicking enzyme introduces a nick into one strand of the DNA (Brenner and Milstein, 1966) in or around rearranged IgV regions (Gearhart, 1981). A 3'→5' exonuclease then removes bases from the single-strand nick. A putative error-prone DNA polymerase then repairs the gap and introduces the occasional mismatched nucleotide. A more recent modification of this type of model was the proposal that the hypermutator mechanism is dependent on the transcriptional state of the IgV regions (Lebecque and Gearhart, 1990). Local unfolding of the chromatin around the V(D)J region would thus allow access of the mutational machinery to the target area.

Figure 1.7. Specific nicking followed by exonuclease activity and error-prone DNA repair. The diagram shows a rearranged V region undergoing V region-specific nicking, removal of bases, error-prone repair and replication. The process may be repeated a number of times during clonal expansion. Symbols: i = a mutational event. (Adapted from Brenner and Milstein, 1966; Gearhart, 1981).

The in vitro error rates of DNA polymerases α and β approximate the somatic mutation rate (reviewed in Kunkel, 1988, 1991); however, neither of these enzymes displays the 2:1 transition to transversion ratio, and it is possible that the in vitro error rates are higher than the in vivo rates due to the lack of accessory factors or additional processes (Kunkel, 1991). Nevertheless, if a more accurate DNA polymerase is responsible for filling the gap(s) introduced into the IgV region, then repeated rounds of nicking, removal of nucleotides and DNA repair could explain the high mutation frequency (Gearhart, 1981). It is difficult to conceive how this process could maintain its IgV region specificity. Selsing and Storb (1981) proposed that an enzyme involved in V(D)J recombination introduces the site-specific nick. However, the fact that transgenes can hypermutate independently of rearrangement (O'Brien et al., 1987), and the step-wise manner in which somatic mutations are introduced into the target region (e.g. see Blier and Bothwell, 1987; Berek et al., 1987; Manser, 1989), are incompatible with this explanation. A more recent model (Manser, 1990b) invokes the localized replication of rearranged V(D)J genes, which is independent of chromosomal replication. This is somewhat similar to the model proposed by Bothwell (1984), but provides more mechanistic detail. Mutations are only introduced into the lagging strand because the additional mechanisms involved, i.e. the removal and filling in of the RNA primers and

ligation of the Okazaki DNA fragments, are more error-prone when the cell is not in S-phase. The newly synthesized lagging strand is now in the correct orientation for RNA synthesis (i.e. 5'→3'), whereas the leading strand is not. The mutated lagging strand is now expressed with the unmutated C gene, and if the mutated IgV region has no or low affinity for the antigen, then the mutant lagging strand duplex is removed. The process is then repeated until the B cell produces a functional, high affinity Ig or until the cell dies. It was suggested that the intron enhancer of IgH and Igκ loci may trigger the localized DNA replication; however, transgenic studies have shown that additional control elements are necessary (Sharpe et al., 1991; Giusti and Manser, 1993), and an additional mechanism would have to be invoked to explain somatic mutation in Igλ chains, since these do not contain an intron enhancer. Thus, although there is no direct evidence against this model, a number of additional assumptions need to be made for it to fit the available data. From a recent study on hybridomas containing mutated and unmutated κ transgenes, it was concluded that the process of somatic hypermutation is dependent on the orientation of the IgV regions in relation to the direction of DNA replication (Rogerson et al., 1991). This model requires the presence of a mutation initiation region upstream of the promoter, which needs to be in the correct orientation for it to be recognized by the "mutator factor". However, this model is unlikely for at least two reasons: first, in another κ transgenic system it was shown that somatic hypermutation took place regardless of the orientation of the transgene copies (Lozano et al., 1993), and second, IgH chain transgenes underwent somatic mutation even though the region upstream of the promoter was not present (Giusti and Manser, 1993). 
Somatic hypermutation due to gene conversion events: Sequence analysis of the H chain IgV region of a murine B cell hybridoma provided evidence for a gene conversion event between the rearranged VH region and a germline VH gene (Krawinkel et al., 1983). Further analysis revealed that one of the recombination points was flanked by a palindrome (Krawinkel et al., 1986). A similar gene conversion event was reported for another hybridoma, where the recombination point was also adjacent to a palindrome (Cumano and Rajewsky, 1986). Two sequence survey studies revealed that many somatic mutations were adjacent to direct repeats or palindromes (Golding et al., 1987), and that L chain IgV genes contain many direct repeats (Kolchanov et al., 1987). Subsequently, it was argued elsewhere that gene conversion events are responsible for the generation of somatic mutations in and around mammalian IgV regions (Maizels, 1989). It was shown that quasi-palindromic DNA sequences promote insertion and deletion mutations (Ripley, 1982), and that large numbers of palindromes and direct repeats are present in IgV genes (Golding et al., 1987; Kolchanov et al., 1987). Thus, if palindrome-mediated gene conversion events were the major source of point mutations, then a significant number of insertions and deletions should also be produced. However, these types of mutations are rarely seen in somatically mutated and antigenically selected IgV regions (reviewed in Berek and Milstein, 1987, 1988). Whereas this is probably due to antigenic selection, sequence analysis of non-antigenically selected and/or passenger transgenes found no frameshift mutations outside of CDR3 (Sharpe et al., 1991; Gonzalez-Fernandez and Milstein, 1993). Furthermore, other non-Ig genes also contain many palindromes and direct repeats without being hypermutated (Kolchanov et al., 1987). 
It is now well established that somatic hyperconversion events are responsible for the diversification of chicken IgV regions (Reynaud et al., 1987, 1989; see section 13.5b). Gene conversion also contributes to the diversification of rabbit IgV regions (Becker and Knight, 1990; see section 13.5b), although rearranged D and J genes in rabbits also seem to be hypermutated (Short et al., 1991). Analysis of mutations in twelve somatically hyperconverted chicken Igλ chains revealed that out of a total of 219 mutations, only 4 occurred outside of the rearranged VJ gene, one of these immediately upstream of the coding region and the other three within 100 bp downstream of the coding region (Reynaud et al., 1987). This is in contrast to somatically mutated mouse IgV regions, where many mutations occur in the flanking regions (Gearhart and Bogenhagen, 1983; Both et al., 1990; Lebecque and Gearhart, 1990; Weber et al., 1991). A number of studies compared mutated IgV sequences with germline IgV genes to detect possible donors. Two such studies failed to detect a possible germline donor for somatically acquired mutations (Chien et al., 1988; Wysocki et al., 1990). An earlier study found a possible germline donor for 1 out of 24 somatic mutations (Crews et al., 1981), whilst a recent, more comprehensive investigation found that out of 96 mutations, only 3 may have arisen by gene conversion (Milstein et al., 1992). An additional study, utilizing oligonucleotide probing of germline DNA and comparisons of previously published sequences, also found possible germline donors for a number of somatic mutations (David et al., 1993). Thus, gene conversions may be responsible for some mutations introduced into murine IgV regions; however, it is most unlikely that this type of mechanism can account for the great majority of somatic mutations observed in murine systems (reviewed in Wysocki and Gefter, 1989). 
Somatic hypermutation due to an error-prone DNA→RNA→DNA loop: According to this model (Fig. 1.8), mutations are introduced into IgV regions during transcription of the rearranged gene, and then also during reverse transcription of the RNA (Steele and Pollard, 1987). The cDNA copy is then aligned with the rearranged but unmutated V region on the chromosome, and via a gene conversion event replaces the unmutated copy (Steele et al., 1991). Mutations can be introduced at or near the sites of recombination due to heteroduplex-induced mutagenesis (Thomas and Capecchi, 1988; Weiss and Wilson, 1986), and transcription from minor upstream cap sites (Dougherty and Temin, 1987; Dornburg and Temin, 1988) could account for the very low frequency of mutations upstream of the cap site (Both et al., 1990; Lebecque and Gearhart, 1990; Rothenfluh et al., 1993).

[Figure 1.8 diagram: rearranged L-V-DJ2-J3-J4-CH DNA; error-prone transcription to RNA; cDNA; homologous alignment of cDNA with the chromosomal allele; DNA.]

Figure 1.8. The reverse transcriptase model allows the introduction of mutations into and around the rearranged IgV region during three steps: DNA dependent RNA synthesis, RNA dependent DNA synthesis and gene conversion. The reverse transcriptase primer is indicated by an arrow. Symbols: X = cross-over point, I = a mutational event. (Adapted from Steele and Pollard, 1987; Steele et al, 1991)

This model can account for the localization of mutations to rearranged IgV genes, provided the assumption is made that at least one reverse transcriptase priming site is present in the J-C intron (Steele et al., 1991). In addition, no new enzyme specificities need be invoked, since a DNA→RNA→DNA information loop could accumulate mutations at the rate of 10⁻³ to 10⁻⁴/bp/event (Reanney, 1984, 1986; Kunkel, 1991). The finding that in H chain genes the cap site is the 5' boundary of mutation (Steele et al., 1992; Rothenfluh et al., 1993; see chapter 6) is also compatible with, but not definitive proof for, this model. In a recent study, the rearranged V regions were amplified from single germinal centre B cells, which were micromanipulated from stained and fixed germinal centers, and it was found that only single copies of any one rearranged V region were present in these cells (Küppers et al., 1993). It was argued that this provides evidence against models that predict the presence of multiple copies containing different mutations in somatically hypermutating B cells. There are two such models: the DNA-based model proposed by Manser and the reverse transcriptase model of Steele and Pollard (see above). However, although many of the cells showed evidence of somatic hypermutation, it is possible that these cells expressed a previously uncharacterized

germline VH gene. It is also not clear whether they were actively undergoing somatic mutation at the time of isolation, and whether the cells were centroblasts (in which somatic hypermutation is proposed to be active - see above) or centrocytes. It was also pointed out that most of the cells were incomplete because the sections were approximately 1 cell thick. This raises the possibility that multiple and differentially mutated copies were lost in the processing of the sections. Alternatively, there may be very rapid intracellular selection of the multiple copies, followed by rapid degradation of the unsuccessful copies, so that once somatic mutation is terminated, the cell will only express one rearrangement for each H and L chain locus. Indeed, the recent data on κ transgenes presented by Milstein's group are very suggestive of a highly sophisticated selection process that acts on multiple expressed and mutated V regions present in individual B cells (Lozano et al., 1993; and see above) and seems to be able to distinguish between successful and unsuccessful V region mutants.

2. AIMS OF THIS THESIS

The early phases of the work carried out for this thesis involved the determination of the 5' boundary for somatic hypermutation. Concurrently, work carried out prior to this thesis (Rothenfluh, 1990) was extended by isolating and sequencing VH186.2 related germline genes. Analysis of the germline sequences revealed a number of interesting sequence patterns, and the germline VH sequence survey was subsequently extended to include another VH sub-family. Since it was of utmost importance to determine to what extent in vitro generated artifacts may have contributed to the data, an entire section of this thesis describes a range of experiments that directly address this issue. Although most of the later stages of this work were spent on the germline sequence survey, preliminary work was carried out on two additional projects, each of which addresses a different aspect of somatic hypermutation of murine IgV genes. The results obtained in the latter two projects will be presented and discussed first.

2.1 Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells

Although it is generally accepted that somatically mutated germinal center B cells with deleterious mutations rapidly undergo apoptosis (MacLennan et al., 1988; Liu et al., 1989), a number of groups have isolated germinal center or splenic B cells with somatic mutations that result in translation stop codons or loss of antigen binding (Apel and Berek, 1990; Berek et al., 1991; Rada et al., 1991; Weiss et al., 1992). These cells were isolated from germinal centers or spleen during the early phases of the immune response, where presumably antigenic selection has not yet had its full effect (e.g. Apel and Berek, 1990; McHeyzer-Williams et al., 1993). Some of the studies PCR amplified from the cDNA (Apel and Berek, 1990; Berek et al., 1991), whilst the others amplified from the DNA (Rada et al., 1991; Weiss et al., 1992). Two of the somatic mutator models predict the existence of multiple copies of rearranged IgV genes with different mutations in hypermutating B cells (Steele and Pollard, 1987; Manser, 1990). One of these predicts the differences to be present in the RNA/cDNA molecules (Steele and Pollard, 1987), whilst the other predicts the differences to be present at the DNA level (Manser, 1990). The data from the above studies suggest that it may be possible to isolate germinal center B cells after they have mutated, but before antigenic selection. This raises the possibility that differentially mutated copies of IgV region DNA or RNA may be detected in these cells if they are in fact present. Thus, the RNA and DNA from the same splenic B cell pool were isolated, and the rearranged VH regions were PCR amplified

from each type of nucleic acid and then sequenced. It was hoped that in this way any differences that may exist between the DNA and the RNA would be revealed.

2.2 In vitro analysis of splenic antigen-specific B cells

Although the process of somatic hypermutation is well characterised, little is known about the mechanism (reviewed in Steele, 1991). The establishment of an in vitro system in which somatic hypermutation could be reproduced would greatly facilitate the elucidation of the mechanism. Hodgkin and colleagues (1990) found that the membranes of activated T cells were sufficient to activate small resting B cells. They also found that addition of various interleukins could stimulate antibody secretion and isotype switching in these B cells. Since this system seems to provide B cells with many of the signals required for activation, it was of interest to determine if this system could also be utilized for the in vitro study of somatic hypermutation. Therefore, in collaboration with Dr. Phil Hodgkin and Alusha Mamchak of the John Curtin School of Medical Research at the Australian National University, Canberra, a preliminary study was carried out. In this initial experiment we were interested in the stimulatory effects of the above B cell activation system on antigen-specific B cells isolated from hyperimmune spleen. It was determined whether these B cells could be made to secrete Ig, proliferate and undergo somatic hypermutation without too many modifications of the system. The results of this experiment can be used to fine-tune further experiments.

2.3 Determination of the 5' boundary for somatic hypermutation in VH regions

Two recent papers addressed the issue of where the boundary for somatic hypermutation in IgVH regions lies (Both et al., 1990; Lebecque and Gearhart, 1990). Both groups found that the bulk of mutations occurred in the transcription unit; however, two hybridomas contained a total of four mutations between the cap site and the promoter region (Lebecque and Gearhart, 1990), whilst a hybridoma expressing a VH region with a somatically mutated derivative of the VH186.2 germline gene, VH3B62 (Cumano and Rajewsky, 1986), contained a cluster of 5 mutations >375 bp upstream of the cap site (Both et al., 1990). The four mutations found close to the cap site (Lebecque and Gearhart, 1990) can be accounted for by the reverse transcriptase model (Steele and Pollard, 1987; and see section 1.2c); however, it is much more difficult to reconcile the distal cluster of mutations found in 3B62 with this model. It was previously reported that two mutations found in the coding region of VH3B62 may have a germline donor (Cumano and Rajewsky, 1986). The region of homology present in the putative germline donor, H17, is flanked by a palindromic sequence (Cumano and Rajewsky, 1986), which may have promoted a gene conversion event (Krawinkel et al., 1986). Therefore it was important to determine whether there may be a germline donor for the cluster of 5 mutations found in 3B62, or if they arose somatically. In work carried out prior to this thesis (Rothenfluh, H. S. 1990. Honours Thesis. University of Wollongong), PCR and restriction analyses of germline DNA were carried out to detect a potential donor for this cluster of mutations. These strategies failed to detect a putative germline donor; however, this needs to be confirmed by a sequence survey of VH186.2 related germline genes. 
Thus, part of the work done for this thesis involved the PCR amplification and sequence determination of VH186.2 related germline genes and their 5' flanking regions that were isolated from C57BL/6J and BALB/c mice. The two strains of mice were utilized because the hybridoma 3B62 was isolated from C57BL/6J mice following a secondary immunization with the hapten NP, and the fusion partner used in the generation of the hybridoma cell lines originated from BALB/c mice (Cumano and Rajewsky, 1986). Furthermore, in order to determine whether somatic mutation in distal 5' flanking regions is a common event, the sequences for a number of anti-NP hybridomas, the coding sequences of which were previously determined (Blier and Bothwell, 1987), were extended into the distal 5' flanking regions. This would also provide additional sequence data for a more accurate determination of the 5' boundary for somatic hypermutation in H chain IgV regions. The additional sequence data could also be used to further investigate the proposition that the mutation frequency distribution around rearranged V regions may be asymmetric (Steele et al., 1992).

2.4 Minimization of PCR generated artifacts

It was shown that DNA strand breaks (Paabo et al., 1990) could result in a phenomenon called 'strand-jumping' or 'PCR crossover'. This results in the production of hybrid DNA products, parts of which are derived from two different templates. Another study suggested that if a partially degraded RNA molecule is reverse transcribed, then the incomplete cDNA molecule could act as a primer during subsequent PCR amplification and thus result in the production of chimaeric molecules (Shuldiner et al., 1989). Strand jumping would tend to cause problems especially when PCR amplifying homologous genes or cDNAs. However, Milstein and colleagues (1992) demonstrated that under optimal PCR conditions, they could not detect any strand jumping events when amplifying from related germline VκOx1 genes. Another problem associated with PCR amplification of DNA is the error rate of the thermostable DNA polymerase used. The error rate of Taq DNA polymerase appears to vary between different laboratories; e.g. 2 × 10⁻⁴ errors per nucleotide per cycle (Saiki et al., 1988), 1.1 × 10⁻⁴ (Tindall and Kunkel, 1988), 7 × 10⁻⁴ (Milstein et al., 1992), 6 × 10⁻⁵ (Weiss and Rajewsky, 1990). Another thermostable enzyme, Pfu DNA polymerase, was shown to have a 12-fold lower error rate (Lundberg et al., 1991). Thus, Taq DNA polymerase, which was used during the very early stages of the work done for this thesis, was replaced by Pfu DNA polymerase, which was the enzyme used for most of the PCR amplifications. Since a large part of this thesis involves the analysis of closely related germline V genes, it is important that the data contain few if any PCR generated artifacts. The level of enzyme generated base changes was estimated by amplifying well-defined DNA fragments from hybridoma or cloned DNAs, and sequencing the products. 
The level of strand jumping that occurred during PCR amplifications was assessed by Southern blot analysis of restricted PCR products, which were amplified from mixtures of well-defined cloned DNA.
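
To give a feel for what the quoted polymerase error rates mean in practice, the following sketch estimates the number of misincorporations expected per amplified molecule. It is an illustration under stated assumptions, not the estimation procedure used in this thesis: the linear model (rate × length × cycles) is a crude upper bound that treats every strand as having been copied in every cycle, and the fragment length and cycle number are hypothetical. The Taq rate is the Saiki et al. (1988) figure quoted above, and the Pfu rate is derived from the reported 12-fold reduction.

```python
# Rough upper-bound estimate of polymerase errors per PCR product molecule.
# Assumption: errors accumulate linearly (rate x length x cycles); real products
# have on average been copied fewer times, so true values are lower.

def expected_errors_per_product(error_rate, length_bp, cycles):
    """Expected misincorporations per product strand under the linear model."""
    return error_rate * length_bp * cycles


TAQ_RATE = 2e-4            # errors/nucleotide/cycle (Saiki et al., 1988, as quoted above)
PFU_RATE = TAQ_RATE / 12   # assumes the reported 12-fold lower error rate

LENGTH_BP = 500            # hypothetical amplicon length
CYCLES = 30                # hypothetical number of PCR cycles

taq_errors = expected_errors_per_product(TAQ_RATE, LENGTH_BP, CYCLES)
pfu_errors = expected_errors_per_product(PFU_RATE, LENGTH_BP, CYCLES)

print(f"Taq: about {taq_errors:.2f} errors per product molecule")
print(f"Pfu: about {pfu_errors:.2f} errors per product molecule")
```

Under these assumptions the Taq estimate is on the order of several changes per cloned product, which would be comparable to the small number of differences separating closely related germline genes, whereas the Pfu estimate falls well below one change per product.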

2.5 Molecular and phylogenetic analysis of related germline VH genes

Analysis of the coding and 5' flanking region sequences of the VH186.2 related germline genes revealed that the differences in the coding regions were concentrated in the regions corresponding to CDR1 and CDR2. Although this pattern amongst germline IgV genes was reported in a number of previous studies (e.g. see Bothwell et al., 1981; Givol et al., 1981; Bentley and Rabbitts, 1983; Even et al., 1985; Schiff et al., 1985; Kodaira et al., 1986; Rathbun et al., 1989; Pascual and Capra, 1991; Sims et al., 1992; Tomlinson et al., 1992), none of the previous studies attempted to provide an explanation for this pattern, nor did the earlier data include significant amounts of non-transcribed flanking sequence. The non-random concentration of sequence variability in germline IgV genes raises a number of important questions: How does a pattern normally seen in rearranged IgV genes that have been antigenically selected at the protein level arise in transcriptionally silent germline IgV genes? How are highly homologous, yet unique sequences maintained in the germline? In an attempt to answer these questions, the patterns of sequence variation and the phylogenetic relationships of the germline IgV genes and their 5' non-translated flanking regions were analyzed in detail. This was repeated for an additional related set of IgVH genes - those related to the germline gene VH205.12.
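
The observation that differences between related germline genes cluster in the CDRs amounts to counting mismatches per region in a pairwise alignment. The sketch below illustrates that bookkeeping only; the function name, the toy sequences and the region coordinates are all hypothetical and do not correspond to real VH gene boundaries or to the analysis methods described later in this thesis.

```python
# Minimal sketch: count nucleotide differences per region between two aligned genes.
# Region coordinates are 0-based, half-open (start, end) and purely illustrative.

def differences_by_region(seq_a, seq_b, regions):
    """Return a dict mapping region name to the number of mismatched positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {}
    for name, (start, end) in regions.items():
        counts[name] = sum(
            1 for a, b in zip(seq_a[start:end], seq_b[start:end]) if a != b
        )
    return counts


# Toy aligned sequences differing at positions 5 and 14.
gene_a = "AAAACCCCGGGGTTTT"
gene_b = "AAAACTCCGGGGTTAT"

# Hypothetical region layout for the toy sequences.
regions = {"FR1": (0, 4), "CDR1": (4, 8), "FR2": (8, 12), "CDR2": (12, 16)}

diffs = differences_by_region(gene_a, gene_b, regions)
print(diffs)
```

Dividing each count by the region length would give a per-site difference frequency, which allows regions of unequal size to be compared.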

3. MATERIALS AND METHODS

3.1 DNA, bacterial strains and cloning vectors used

a. Genomic DNA
C57BL/6J liver DNA was kindly provided by K. Rajewsky and his group at the Institute of Genetics, University of Cologne. Genomic DNA was also prepared from liver and embryonic tissue from the C57BL/6J and BALB/c strains held at the Animal Breeding Establishment (ABE), John Curtin School of Medical Research, Canberra. Embryos and adult livers were isolated from 8-10 week old pregnant, unimmunized females obtained from the ABE in Canberra. The tissues were collected within 16 hours of the mice being removed from the barrier-maintained clean conditions of the ABE. In order to reduce lymphocyte contamination, each mouse was exsanguinated, and intact embryos and maternal livers were removed and washed three times in sterile phosphate buffered saline (PBS, pH 7.4) prior to storage at -70°C.

b. Hybridoma DNA
The DNA from hybridomas Hl-7, Hl-8, Hl-27, Hl-30, Hl-51, Hl-29, Hl-39, Hl-45 and Hl-72 was kindly provided by A. Bothwell, whilst the DNA from hybridomas 3C52, 3A112 and 3D61 was generously provided by K. Rajewsky and his group.

c. 40.3, 3B44 and 3B62 rearranged VH region genes
The hybridomas expressing these somatically mutated IgH chain V regions were isolated during the secondary anti-NP response in C57BL/6J mice. 4.2 kb EcoRI genomic DNA fragments containing the 3B44 or 3B62 VH genes, and a 4.6 kb EcoRI fragment containing the 40.3 VH gene, were inserted into the pUC19 cloning vector (Cumano and Rajewsky, 1986). These clones were kindly provided by K. Rajewsky and his group.

d. A6/24 and A20/44 rearranged VH region genes
The hybridomas expressing these somatically mutated IgH chain V regions were isolated during an anti-idiotypic response against VH186.2 rearranged anti-NP antibodies in C57BL/6J mice (Sablitzky and Rajewsky, 1984). The 7.2 kb EcoRI fragments containing these genes were subsequently cloned into a Bluescribe vector (Both et al., 1990). These clones were kindly provided by K. Rajewsky and his group.

e. Bacterial strains
Escherichia coli (E. coli) strain JM109 was initially purchased from Invitrogen. A number of glycerol stocks were prepared and kept at -70°C.

40 f. Cloning vectors The M13mpl9 bacteriophage vector was initially purchased by the laboratory of G. Both, Division of Biomolecular Engineering, Commonwealth Scientific and Industrial Research Organization (CSIRO), North Ryde. Aliquots from the initial large scale RF M13mpl9 DNA preparation were stored in isopropanol at -70°C. The pUC19 cloning vector was purchased from Boehringer-Mannheim. Fresh batches of this DNA was continually purchased from the supplier throughout the course of this work.

3.2 Enzymes

Thermostable DNA polymerases: Taq DNA polymerase (Perkin Elmer Cetus) and Pfu DNA polymerase (Stratagene) were used for PCR amplification of DNA. Most sequencing reactions also utilized the fmol sequencing kit (Promega), which utilizes Taq DNA polymerase.

DNA modification enzymes: All other DNA modification enzymes were purchased from Boehringer Mannheim, and they were used with the supplied buffers at the recommended concentrations. In the early stages of the work carried out for this thesis, the T7 DNA polymerase sequencing kit (Pharmacia) was utilized.

3.3 Primers used in the Polymerase Chain Reaction (PCR)

The relative positions and the sequences of the primers used to PCR amplify DNA or cDNA are shown in Figure 3.1 and Table 3.1 respectively.

[Figure 3.1: schematic showing the positions of the numbered PCR primers on a) a rearranged VDJ region, b) a germline V gene and c) a cDNA; graphic not reproduced.]

Figure 3.1. Locations of the PCR primers used in this thesis. The primers used to PCR amplify the following regions are shown: a) rearranged VDJ regions from genomic DNA, b) germline VH205.12 and VH186.2 related genes, and c) cDNA from splenic B cells. Symbols: P = promoter, l = cap site, L = leader, V = V gene, D = diversity gene, J = joining gene, Cγ1 = Cγ1 constant region gene. Arrows indicate the direction of DNA synthesis initiating from the specified primer(s).

Primers 4, 5, 6 and 7 are specific for sequences immediately downstream of the JH-1, JH-2, JH-3 and JH-4 genes respectively, whilst primers 8-11 are specific for sequences within JH-1, JH-2, JH-3 and JH-4 respectively. Primers 1 and 14 are specific for the 5' flanking regions of VH186.2 and VH205.12 related germline genes respectively, whereas primer 12 is specific for a region that is highly conserved in members of both sub-families. The sequences of the primers indicated in Figure 3.1 are shown in Table 3.1.

Table 3.1. PCR primer sequences Primer Primer sequence (5'—>3') 1 GCGGTCGACGTGATGC A ATATTrTnTTn Ar 2 C£C

The sequences containing the restriction sites that were added to the 5' end of the primers are underlined (all upstream primers contain a SalI site, whereas all downstream primers contain an EcoRI site). * The sequences of these primers, without restriction sites, were previously published in McHeyzer-Williams et al., 1990.

3.4 PCR with Taq DNA polymerase

All PCR amplifications were performed in 25 µl reaction mixtures. Each PCR reaction contained 7-100 ng of genomic hybridoma DNA, 100 ng of mouse liver DNA or 100 pg of cloned DNA. Each mix also included 20 ng of each primer and 0.2 mM of each deoxynucleoside triphosphate (dNTP) in 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2-3 mM MgCl2, 0.001 % gelatin and 2.5 units of Taq DNA polymerase, which was added after the first 5 min denaturation step (95°C). The reaction mixtures were overlaid with 25 µl of mineral oil before being subjected to an appropriate number of cycles of denaturation (95°C, 30 s), annealing (60°C, 30 s) and extension (72°C, 30-60 s). All PCR amplifications were performed in a HBTR1 thermal cycler (Hybaid).

In order to prevent cross-contamination, disposable filter-protected pipette tips were utilized for the preparation of all PCR reactions. PCR reagents were aliquoted in a laminar flow hood situated in another part of the building where no post-PCR work was carried out. Filter tips were also used for the preparation of aliquots. A 'no-DNA' control was included with every amplification.
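The final concentrations quoted above can be turned into pipetting volumes with the usual C1·V1 = C2·V2 relation. The following Python fragment is only a minimal sketch of that arithmetic: the 25 µl reaction volume and the final concentrations are taken from the text, whereas every stock concentration, and the names `stock_volume_ul` and `reaction_recipe`, are hypothetical and chosen purely for illustration.

```python
# Sketch: volumes of stock solutions needed for one 25 µl Taq PCR reaction.
# Final concentrations come from section 3.4; STOCK concentrations are ASSUMED.

REACTION_VOLUME_UL = 25.0  # from the text

# name: (final concentration, assumed stock concentration), both in mM
COMPONENTS = {
    "dNTP mix (each dNTP)": (0.2, 10.0),
    "Tris-HCl pH 8.3": (10.0, 1000.0),
    "KCl": (50.0, 1000.0),
    "MgCl2": (2.0, 25.0),
}


def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Volume of stock giving final_conc in the reaction (C1*V1 = C2*V2)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_conc * reaction_volume_ul / stock_conc


def reaction_recipe(components, reaction_volume_ul):
    """Return {component: volume in µl}; the remainder is left for water,
    template, primers and enzyme."""
    recipe = {
        name: stock_volume_ul(final, stock, reaction_volume_ul)
        for name, (final, stock) in components.items()
    }
    recipe["remainder (water, template, primers, enzyme)"] = (
        reaction_volume_ul - sum(recipe.values())
    )
    return recipe


if __name__ == "__main__":
    for name, volume in reaction_recipe(COMPONENTS, REACTION_VOLUME_UL).items():
        print(f"{name}: {volume:.2f} µl")
```

In practice such volumes would be multiplied by the number of reactions (plus the 'no-DNA' control) to prepare a single master mix, which also reduces the number of pipetting steps at which cross-contamination can occur.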

3.5 PCR with Pfu DNA polymerase

The total reaction volumes and the amounts of templates and primers added are as described in section 3.4. The PCR reaction mixture contained 0.2 mM each dNTP, 20 mM Tris-HCl (pH 8.2), 10 mM KCl, 6 mM (NH4)2SO4, 2 mM MgCl2, 0.1 % Triton X-100, 10 ng/µl BSA and 2.5 units of Pfu DNA polymerase, which was added after the first denaturation step (95°C). The reaction mixtures were overlaid with 25 µl of mineral oil and subjected to an appropriate number of cycles of denaturation (95°C, 30 s), annealing (60°C, 30 s) and extension (75°C, 30-60 s). All PCR amplifications were performed in a HBTR1 thermal cycler (Hybaid). The precautions described in section 3.4 were strictly followed at all times.

3.6 PCR with radiolabel incorporation

Radioactive probes were prepared using a modification of a PCR technique that allows the incorporation of a radionucleotide (Schowalter and Sommer, 1989). The PCR reactions were prepared as described in sections 3.4 and 3.5, but 20 µCi of [α-32P]dATP (Bresatec Ltd., Thebarton 5031, South Australia) was added to the reaction mixture. The precautions described in section 3.4 were strictly followed at all times. A sample (5-10 µl) was electrophoresed on a 1.5 % agarose gel, and the remainder was extracted once with chloroform and precipitated with isopropanol.

3.7 Electroelution

DNA was initially recovered from agarose gel slices with the "Little Blue Tank" (ISCO). However, the unsatisfactory recovery yields of this and other electroelution procedures prompted the development of an alternative method. The Rothenfluh electroelutor, shown in Figure 3.2, was constructed out of perspex. Platinum wire was used for the electrodes, and rubber-coated copper-wire power leads of standard size were used. This electroelutor was developed and built independently of a commercially available model (Amicon) that also utilizes Centricon microconcentrator columns but consists of two buffer chambers. The electroelutor shown in Figure 3.2 is a simpler design and contains only one built-in buffer chamber, which avoids the use of rubber seals and reduces the amount of buffer required.

[Figure 3.2: diagram of the electroelutor with the negative and positive electrodes labelled; graphic not reproduced.]

Figure 3.2. The Rothenfluh electroelutor. Up to 6 electroelution units can be connected. Each electroelution unit is composed of a microfuge tube (A) which is inserted into the upper opening of a Centricon-100 sample reservoir (B). The positive electrodes are attached to the lid, and can be detached if fewer than 6 samples are to be electroeluted. The shaded areas are filled with 1 x TAE buffer.

The lid of a 1.5 ml microfuge tube was removed and the lower end was pierced and wrapped with Parafilm. The electroelution unit was assembled by inserting this microfuge tube into the upper opening of a Centricon-100 microconcentrator (Amicon) as shown in Figure 3.2. The agarose gel slice was placed into the microfuge tube after the electroelution unit was filled with 1 x TAE buffer. The electroelution unit was placed into the electroelutor as shown in Figure 3.2. The lid was placed into position so that the upper electrode was partially immersed in the buffer contained within the microfuge tube. Any unused upper electrodes were disconnected. The power lead attached to the lid was connected to the positive terminal, and the power lead from the lower buffer chamber was connected to the negative terminal. A potential of 100 V was applied across the apparatus for at least 15 min, depending on the size of the DNA and the percentage of the agarose gel. 15 min was sufficient to completely remove a 1 kb DNA fragment from a 0.7 % agarose gel slice. Other workers in the Department of Biological Sciences have used this electroelutor successfully for the electroelution of 5 kb DNA fragments, and for the electroelution of proteins. The successful removal of DNA fragments from the gel slice was confirmed by exposing the gel slice to a UV transilluminator. Following electroelution, the sample reservoir was centrifuged at 1,000 g for 20 min. Sterile water (750 µl) was then added and the reservoir centrifuged for a further 6 min. This was repeated once more, then the DNA was precipitated.

3.9 Cloning and ligation of DNA

All 5' primers used in PCR contained a SalI restriction site, whilst all 3' primers contained an EcoRI site (Table 3.1). Thus the electroeluted PCR fragments and 50 ng of pUC19 or M13mp19 DNA for each ligation were separately restricted with EcoRI for 1-2 hr at 37°C, following which the enzyme was heat denatured at 70°C for 10 min. The DNA was precipitated at room temperature in 0.8 M LiCl and an equal volume of isopropanol. The DNA was then restricted with SalI for 1-2 hr at 37°C. Following a 10 min heat inactivation of the second restriction enzyme, half of the PCR fragment was added to 50 ng of restricted pUC19 or M13mp19 DNA and precipitated. The remainder of the PCR amplified DNA was stored at -20°C. Following precipitation, the DNA was ligated overnight with T4 DNA ligase in an ice/water bath which was initially at 10°C. The DNA fragment amplified from hybridoma HI-72 with primers 1 and 5 was phosphorylated and concatamerized by ligation using the method described by Jung et al. (1990), prior to restriction and ligation into M13mp19.

3.10 Preparation of competent cells

E. coli JM109 cells were grown in 2 x YT broth (1.6 % tryptone, 1 % yeast extract, 0.5 % NaCl) until an optical density (OD) 500 - 550 nm was reached. The cells were then placed on ice for 10 min prior to pelleting by centrifugation for 7.5 min at 800 g. The supernatant was poured off and the pellet was resuspended in 10 ml of ice cold 100 mM CaCl2 for each 20 ml of culture. The cells were allowed to stand on ice for 20 min, following which they were centrifuged at 500 g for 10 min. The supernatant was removed and the cells were resuspended in 2 ml of ice cold 100 mM CaCl2 for each 20 ml of original culture. The competent cells were used for transformation on the same day.

3.11 Transformation of DNA cloned into bacteriophage M13mp19

The T4 DNA ligase was heat denatured for 10 min at 70°C. Each transformation also included the following controls:
- Vector DNA cut with EcoRI, unligated
- Vector DNA cut with EcoRI, ligated
- Vector DNA cut with SalI, unligated
- Vector DNA cut with SalI, ligated
- Vector DNA cut with both enzymes, unligated
- Vector DNA cut with both enzymes, ligated
- Uncut vector DNA

Each ligation reaction was mixed with 75 µl of competent cells in a 5 ml test tube, and the transformation mixtures were then placed on ice for 45 min. The cells were heat shocked at 37°C for 5 min, following which they were immediately placed on ice for 10 min. 200 µl of fresh JM109 culture, 30 µl of isopropyl-β-D-thiogalactopyranoside (IPTG, 120 mg/ml) and 30 µl of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal, 30 mg/ml) were added to each transformation, followed by 4.5 ml of melted 0.7 % YT agar (0.8 % tryptone, 0.5 % yeast extract, 0.25 % NaCl, 0.7 % nutrient agar). The transformation mixture was immediately plated onto a nutrient agar plate (60 mM K2HPO4, 33 mM KH2PO4, 7.5 mM (NH4)2SO4, 1.7 mM tri-sodium citrate, 0.002 % MgSO4, 0.2 % D-glucose, 0.0005 % thiamine-HCl), and then incubated at 37°C overnight. Opaque plaques were picked with sterile toothpicks and each was placed into a well of a 96-well microtiter plate containing 200 µl of SM buffer (50 mM Tris pH 7.5, 0.2 % MgSO4, 100 mM NaCl, 0.01 % gelatin) per well. Before being placed at 4°C, the microtiter plate was allowed to stand at room temperature for 45-60 min to allow the single-stranded phage particles to diffuse into the buffer.

3.12 Transformation of DNA cloned into plasmid pUC19

200 µl of competent cells were added to each heat denatured ligation mixture in a 1.5 ml microfuge tube. The tubes were then placed on ice for 60 min. The cells were heat shocked for 5 min at 37°C, and then placed on ice for 10 min. 1 ml of 2 x YT broth was added to the transformation mixes (the controls described in section 3.11 were also included), which were then incubated at 37°C for 60 min. The transformation mixes were then spread onto nutrient agar plates containing IPTG, X-Gal and ampicillin. The plates were incubated at 37°C overnight. The next day white colonies were picked with sterile toothpicks.

3.13 Dot-Spot plaque hybridization screening

To screen the M13mp19 plaques for insertion of the correct fragment, 5 µl of each plaque supernatant was spotted onto a nylon membrane (Hybond-N+, Amersham). The membrane was allowed to air dry and was then placed onto a piece of filter paper soaked in denaturation buffer (0.5 M NaOH, 1.5 M NaCl) for 7 min. Next, the membrane was placed onto a filter paper soaked with neutralization buffer (1.5 M NaCl, 0.5 M Tris pH 7.4, 1 mM EDTA) for 6 min, following which the membrane was placed on a filter paper soaked in 0.4 M NaOH for 20 min. The membrane was then air dried.

3.14 Dot-spot colony hybridization screening

White transformant colonies were picked with sterile toothpicks and then spotted onto a nylon membrane and, at an equivalent position, onto an ampicillin nutrient agar plate. Both were incubated at 37°C overnight. The membrane was then placed successively onto filter papers soaked in the following solutions: 10 % SDS for 3 min, denaturation buffer for 5 min, neutralization buffer for 5 min and NaOH for 20 min. The membrane was then dried in an oven pre-set at 60°C.

3.15 Transferring separated DNA fragments onto a membrane

The DNA fragments to be transferred to the nylon membrane were size separated on an agarose gel, following which the gel was soaked in transfer buffer (0.4 M NaOH, 1 M NaCl) for 15 min. The buffer was then replaced with fresh buffer, and the gel soaked for a further 20 min. A tray was then filled with approximately 1 l of transfer buffer, and a glass plate placed over the tray. Three sheets of filter paper were soaked in transfer buffer and placed onto the glass plate. The gel was placed onto the filter papers, and the edges of the gel were masked with plastic film. The membrane was placed onto the gel, and three pre-soaked sheets of filter paper were placed onto the membrane. A stack of tissue paper was then placed onto the filter papers, and weighed down with a glass plate (approximately 400 g). The DNA was then allowed to transfer to the membrane overnight. This method is similar to that described in Sambrook et al. (1989).

3.16 Hybridization of labeled probes to nylon membranes

The membrane and 5 ml of (pre)hybridization buffer (1.25 mM EDTA, 525 mM Na2HPO4, 7 % SDS, 50 µg/ml sonicated salmon sperm DNA) were placed into a glass hybridization tube supplied with the mini-hybridization oven (Hybaid). The membrane was pre-hybridized at 55°C for 60 min. The heat denatured probe was then added to the (pre)hybridization buffer, and the membrane allowed to hybridize at 55°C overnight. The next day, the buffer was removed and two 10 min low stringency (25°C) washes in 10 ml of wash buffer A (1 mM EDTA, 50 mM Na2HPO4, 5 % SDS) were carried out. Following this, a 10 min high stringency (55°C) wash in 10 ml of wash buffer B (1 mM EDTA, 100 mM Na2HPO4, 1 % SDS) was carried out. The membrane was allowed to air-dry and was then exposed to Hyperfilm-NP (Amersham).

3.17 Preparation of single-stranded DNA for sequencing

The day before the DNA extraction, 40 µl of one hybridization-positive M13mp19 plaque supernatant was added to 2 ml of 2 x YT broth which had previously been inoculated with E. coli JM109. This was incubated at 37°C overnight with shaking. The next day the cells containing the RF phase bacteriophage were pelleted and the supernatant containing the single-stranded bacteriophage particles was removed. The double-stranded bacteriophage DNA was also extracted (for method see section 3.18), and the insert released to allow size selection of clones containing an insert of the correct size. 300 µl of PEG solution (10 % PEG 6000, 14.6 % NaCl) was added to the supernatant. This was allowed to stand at room temperature for 30 min, followed by a 10 min centrifugation at 10,000 g. The supernatant was removed, and the DNA pellet was resuspended in 200 µl of sterile water and then extracted once with a 25:24:1 phenol:chloroform:isoamyl alcohol solution, once with chloroform and once with ether. The single-stranded DNA was precipitated in 0.8 M LiCl and isopropanol. Finally, the DNA was resuspended in 25 µl of water.

3.18 Preparation of double-stranded DNA for sequencing

2 ml of 2 x YT broth containing ampicillin (70-100 µg/ml) was inoculated with a hybridization-positive colony and incubated at 37°C overnight with shaking. The next day, the cells were pelleted by centrifugation at 10,000 g for 5 min and the supernatant removed. The cells were resuspended in 100 µl of lysis buffer (25 mM Tris pH 8.0, 10 mM EDTA pH 8.0, 50 mM D-glucose, 5 mg/ml lysozyme) by vortexing. 200 µl of alkaline solution (200 mM NaOH, 1 % SDS) was added, and the mixture vortexed. This was followed by a 10 min incubation on ice. 150 µl of high salt solution (3 M potassium acetate, 11.5 % glacial acetic acid) was added and the mixture was vortexed and then centrifuged at 10,000 g for 10 min. The supernatant was removed and RNAse A was added to 10 µg/ml, following which the solution was incubated at 37°C for 30-60 min. The DNA was then extracted and precipitated as described in section 3.17. The DNA was resuspended in 25 µl. 5 µl of this was restricted with EcoRI and SalI to release the insert, and then size separated on an agarose gel. This allowed selection of hybridization-positive clones that contained an insert of the correct size.

3.19 DNA sequencing

In the early stages of the work carried out for this thesis, the T7 sequencing kit (Pharmacia) was utilized. All instructions supplied by the manufacturer were followed. However, the bulk of the sequence data was obtained by using the fmol sequencing kit (Promega). Again, all instructions supplied by the manufacturer were followed. The sequences were separated in 5 % acrylamide gels. By running a short and a long gel for each sequencing reaction, 350-450 bases of sequence could be read. With the T7 sequencing kit, any ambiguities were resolved by using deaza-dGTP nucleotide mixes and/or by heating the DNA prior to carrying out the sequencing reaction. With the fmol sequencing kit, any ambiguities were resolved by sequencing the opposite strand.

3.20 Preparation of genomic DNA

Embryos or livers were washed three times in sterile saline solution prior to storage at -70°C. A small piece of tissue was used for each extraction. At all stages throughout the extraction procedure, care was taken not to expose the DNA to unnecessary shearing forces or nucleolytic enzymes (i.e. no vortexing, use of DNAse-free sterile equipment and solutions etc.). The piece of tissue was minced with a sterile razor blade, and then it was digested with proteinase K (50 µg/ml) at 55°C overnight. The next day the tissue was digested with RNAse A (10 µg/ml) for 2 hr at 37°C. The DNA was then extracted with a phenol:chloroform:isoamyl alcohol solution a number of times until the supernatant was clear, followed by a chloroform and an ether extraction. The DNA was then precipitated by adding an equal volume of isopropanol. The DNA was removed with a sterile hooked Pasteur pipette and washed in 70 % ethanol. Then the DNA was resuspended in 300 µl of sterile water and the concentration and purity determined by taking UV-light OD readings at 260 nm and 280 nm.
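The text does not state how the OD readings were converted into a concentration and a purity estimate. A minimal sketch of the standard calculation is given below, assuming the widely used conversion of 50 µg/ml of double-stranded DNA per OD260 unit (1 cm path length) and the A260/A280 ratio as the purity index; the function names, the dilution factor and the example readings are hypothetical.

```python
# Sketch: DNA concentration and purity from UV readings at 260 and 280 nm.
# ASSUMPTIONS: 1 cm path length; 1 OD260 unit = 50 µg/ml double-stranded DNA.

DSDNA_UG_PER_ML_PER_OD260 = 50.0


def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Concentration of the undiluted sample in µg/ml."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("readings and dilution factor must be positive")
    return od260 * DSDNA_UG_PER_ML_PER_OD260 * dilution_factor


def purity_ratio(od260, od280):
    """A260/A280 ratio; values near 1.8 are usually taken to indicate
    DNA that is largely free of protein."""
    if od280 <= 0:
        raise ValueError("OD280 must be positive")
    return od260 / od280


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:100 before measurement.
    od260, od280, dilution = 0.25, 0.135, 100.0
    print(f"{dna_concentration_ug_per_ml(od260, dilution):.0f} µg/ml")
    print(f"A260/A280 = {purity_ratio(od260, od280):.2f}")
```

The total yield then follows by multiplying the concentration by the resuspension volume (300 µl in the procedure above).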

3.21 Isolation of both cytoplasmic RNA and nuclear DNA from the same B cell(s)

This method (Gough, 1988) allows the isolation of cytoplasmic RNA and nuclear DNA from the same cell(s). The cytoplasmic membranes of up to 5 x 10^ cells were disrupted by vortexing in 200 µl of extraction buffer 1 (10 mM Tris (pH 7.5), 150 mM NaCl, 1.5 mM MgCl2 and 0.65 % NP-40). The nuclei were pelleted by centrifugation at 800 g for 5 min and, following removal of the supernatant, the DNA was extracted as described above. The supernatant was placed into another microfuge tube that contained 200 µl of extraction buffer 2 (7 M urea, 350 mM NaCl, 10 mM EDTA, 10 mM Tris (pH 7.5) and 1 % SDS), and vortexed. The RNA was then extracted once with a phenol:chloroform:isoamyl alcohol solution and once with chloroform. The supernatant was precipitated in 95 % ethanol at -70°C for at least 1 hr. First strand cDNA synthesis was carried out with an IgCγ1 specific primer using the Amersham cDNA synthesis kit exactly as described in the instructions provided by the manufacturer.

This method was also used to isolate the RNA from single cells. In this case the nucleus was not recovered, and 10 µg of tRNA was added as carrier.

3.22 Isolation of splenic B cells from mice hyperimmunized with NP using NIP-coated Dynabeads

Immunization schedule: 6 C57BL/6J mice were immunized intra-peritoneally with the hapten NP coupled to keyhole limpet hemocyanin (NP-KLH, kindly provided by Dr. P. Lalor of the Walter and Eliza Hall Institute of Medical Research, Melbourne, Vic., Australia). For primary immunization (day 0), each mouse was injected with 30 µg NP-KLH, 10 µg alhydrogel and 10^ cells of Bordetella pertussis in 200 µl sterile saline solution (0.85 % NaCl). For the secondary (day 14) and tertiary (day 23) immunizations, each mouse was injected with 10 µg NP-KLH in 200 µl of sterile saline solution.

Preparation of NIP-coated Dynabeads: (4-hydroxy-3-iodo-5-nitrophenyl)acetic acid (NIP) was coupled to BSA with a 10-fold molar excess of NIP over BSA. The solution was then dialysed against 2500 volumes of NaHCO3, and then against 2500 volumes of PBS (25 mM NaH2PO4, 81 mM Na2HPO4, 100 mM NaCl). The coupling ratio (NIP:BSA) was determined to be 7.6:1. The NIP-BSA was biotinylated with the Amersham Biotinylation Kit as described in the instructions supplied by the manufacturer. The biotinylated NIP-BSA was purified through a G25 Sephadex column and then coupled to the Streptavidin-coated Dynabeads (Dynal) at room temperature for 60 min. Using a magnetic separator, the NIP-BSA coated Dynabeads were washed five times with BPBS and stored at 4°C until use.

Preparation of red blood cell-free single cell suspension: The spleens from 6 hyperimmunized mice were removed at day 26, and single cell suspensions made by pushing the tissue through a wire mesh (Cunningham chamber) into 10 ml of BPBS (25 mM NaH2PO4, 81 mM Na2HPO4, 100 mM NaCl, 0.1 % BSA, 0.01 % sodium azide), and by drawing the cell suspension up and down a Pasteur pipette. Any clumps were pelleted during a very brief spin in a benchtop centrifuge, and the single cell suspension removed. The cells were pelleted by centrifugation at 800 g for 5 min. The supernatant was removed and the cells were resuspended in 2 ml ACT buffer (9 volumes of 0.83 % aqueous ammonium chloride to 1 volume Tris pH 7.65) in order to lyse the red blood cells (Boyle, 1968). This was repeated once more, and the volume brought up to 10 ml in ACT buffer. The cells were pelleted by centrifugation at 800 g for 5 min, and then resuspended in 10 ml BPBS. The cells were again pelleted as described above and resuspended in 10 ml of BPBS. A 10 µl sample of the cell suspension was stained with 90 µl of 0.5 % trypan blue and the number of live and dead cells counted on a hemocytometer.
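The conversion from a hemocytometer count to a cell concentration is not spelled out in the text. The following sketch illustrates the usual arithmetic, assuming a standard chamber in which each large square holds 0.1 µl (hence the factor of 10^4 per ml); the 1:10 dilution follows from the 10 µl sample and 90 µl trypan blue given above, while the counts and function names are hypothetical.

```python
# Sketch: cell concentration and viability from a trypan blue count.
# ASSUMPTION: each counted large square of the chamber holds 0.1 µl,
# so cells/ml = mean count per square x dilution x 10^4.

SAMPLE_UL = 10.0      # from the text
STAIN_UL = 90.0       # from the text
CHAMBER_FACTOR = 1e4  # assumed: large squares per ml of suspension


def dilution_factor(sample_ul, stain_ul):
    """Factor by which the sample was diluted by the stain."""
    return (sample_ul + stain_ul) / sample_ul


def cells_per_ml(counts_per_square, dilution):
    """Concentration of the undiluted suspension from per-square counts."""
    if not counts_per_square:
        raise ValueError("at least one square must be counted")
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution * CHAMBER_FACTOR


def viability(live, dead):
    """Fraction of counted cells that exclude trypan blue (live cells)."""
    total = live + dead
    if total == 0:
        raise ValueError("no cells counted")
    return live / total


if __name__ == "__main__":
    live_counts = [48, 52, 50, 50]  # hypothetical live cells per large square
    dead_counts = [1, 2, 0, 1]      # hypothetical stained (dead) cells
    dilution = dilution_factor(SAMPLE_UL, STAIN_UL)
    print(f"live cells/ml: {cells_per_ml(live_counts, dilution):.2e}")
    print(f"viability: {viability(sum(live_counts), sum(dead_counts)):.1%}")
```

Multiplying the concentration by the suspension volume (10 ml here) gives the total number of cells available for the subsequent Dynabead separation.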

Separation of NP+ cells with NIP-BSA coated Dynabeads: Non-specifically binding cells were removed by adding Streptavidin-coated Dynabeads to the cells at a bead:cell ratio of 0.2:1 for 45 min on ice. Using a magnetic separation device, the supernatant containing the NP specific cells was removed into a fresh microfuge tube. NIP-BSA coated Dynabeads were added to these cells at an approximate bead:cell ratio of 0.2:1. The cells and beads were incubated on ice for 60 min. The supernatant containing the non-binders was removed using the magnetic separator. The Dynabead-cell mixture was resuspended in 1 ml of BPBS. A 10 µl sample, stained with trypan blue, was examined on a hemocytometer to determine how many unbound cells were present. This was repeated 7 times, until only rosettes (cells coated with Dynabeads) were present. The cells were snap frozen and stored at -70°C.

3.23 Flow cytometry

Immunization schedule: Two C57BL/6J mice were injected intra-peritoneally with a total volume of 200 µl containing 30 µg of NP-KLH, alhydrogel and 10^ B. pertussis in sterile 0.85 % saline solution. At day 14 and day 57, they were injected with a further 10 µg of NP-KLH in 200 µl of sterile saline solution.

Preparation of single cell suspension: The spleens were removed 5 days following the tertiary immunization and single cell suspensions made by pushing the tissue through a wire mesh into 10 ml of RPMI (0.01 % RPMI, 10 % FCS, 10 mM HEPES, 5 x 10^-5 M 2-mercaptoethanol, 0.01 % non-essential amino acid solution, 1 mM sodium pyruvate, 2 mM glutamine), and drawing the cells up and down a 10 ml glass pipette. The cells were placed in a glass tube and allowed to stand at room temperature for 5 min. The supernatant was poured into a new glass tube, leaving behind any clumps of aggregated cells. The cells were pelleted by centrifugation at 800 g for 7 min. After removing the supernatant, the cells were resuspended in 5 ml of BCM medium. The cell suspension was layered onto 5 ml of Ficoll and centrifuged at 800 g for 20 min. The red blood cells and any dead cells formed a pellet. The remaining cells were removed from the interface and placed into 10 ml BCM. In order to remove any Ficoll, the cells were pelleted (800 g for 8 min) and resuspended in 10 ml BCM twice. Any BCM was removed by pelleting the cells and resuspending them in 10 ml CBSS (0.14 M NaCl, 5.4 x 10^-3 M KCl, 1.3 x 10^-3 M Na2HPO4, 4.4 x 10^-4 M KH2PO4, 5.6 x 10^-3 M glucose) twice. The cells were then pelleted a final time and resuspended in 5 ml CBSS.

Fluorochrome staining of cells: The following antibodies, which were kindly supplied by Dr. Paul Lalor, were used to label the cells:
- anti (α)-B220-phycoerythrin (PE)
- α-B220-fluorescein isothiocyanate (FITC)
- α-Igκ-allophycocyanin (APC)
- NP-APC
- α-Igλ-APC
- α-IgG1-FITC

The following aliquots of 10^ cells were stained with the indicated antibodies for flow cytometric analysis:
- α-B220-PE
- α-B220-FITC
- α-κ light chain-APC
- streptavidin-APC
- α-B220-PE / NP-APC / α-IgG1-FITC
- α-B220-FITC / α-Igκ-APC / NP-APC
- α-B220-FITC / α-Igλ-APC / NP-APC

One aliquot of unstained cells was also analyzed with a fluorescence activated cell sorter (FACS). From the analyses it was determined that the following antibodies should be used for the separation of splenic B cells via three color flow cytometry: α-Igκ-APC, NP-APC and α-IgG1-FITC. Thus, the remainder of the spleen cells were stained with these three antibodies. The FACS was then set to sort one or ten B220+, Igκ- and IgG1+ cell(s) into each single well of 96-well microtiter plates.

3.24 Culture of FACS sorted cells

The cells isolated by flow cytometry were placed in 100 µl BCM containing one of the following additives, and following 6 days of culture (37°C) they were assayed for Ig secretion (see section 3.26):

1 B cell/well:
- 20,000 D10 T cells (36 wells)
- concanavalin A (conA) stimulated T cell membranes, IL-4 and IL-5 (192 wells)

10 B cells/well:
- no additions (24 wells)
- 20,000 D10 T cells (96 wells)
- 20,000 D10 T cells, IL-4, IL-5 (96 wells)
- IL-4, IL-5 (24 wells)
- conA stimulated T cell membranes (24 wells)
- conA stimulated T cell membranes, IL-4, IL-5 (24 wells)

The remaining cells, to be used for molecular analysis, were placed in 100 µl BCM containing one of the following additives, and were cultured for 6 days at 37°C.

10 B cells/well:
- no additions (24 wells)
- IL-4, IL-5 (24 wells)
- conA stimulated T cell membranes (24 wells)
- conA stimulated T cell membranes, IL-4, IL-5 (24 wells)

Throughout in vitro culture of the sorted B cells, the wells were scanned for proliferating clones. After six days, the contents of any wells containing proliferating clones were removed for molecular analysis. This was done by drawing the contents of the well up and down a micropipettor (Gilson P-200) in order to separate the proliferating cells, and then placing the contents of the well into a small plastic tray containing 1 ml of BCM. The tray was scanned with a reverse phase microscope and any single B cells were removed using a micropipettor (Gilson P-20) with a narrow tip, and placed into a sterile 1.5 ml microfuge tube. Immediately, the first step of the RNA/DNA extraction procedure described in section 3.21 was carried out, and the RNA was stored in extraction buffer 2 on dry ice. When all the single cells had been isolated in this manner, the extraction was completed and the RNA stored in 95 % ethanol at -70°C. The remaining microtiter plates were used to assay for antibody secretion or for Ig isotyping.

3.25 Enzyme-Linked Immunosorbent Assay (ELISA)

The wells of the microtiter trays to be used were coated with antigen by adding 100 µl of NP (1 µg/ml) in carbonate buffer to each well. The plates were covered and stored at 4°C overnight. The next day, the wells were rinsed three times with PBS-0.5 % Tween solution (PBST), and 100 µl of 4 % FCS (heat inactivated) was added to each well in order to block any sites that were not bound by the antigen. The plates were incubated at 37°C for 1 hr, following which they were washed three times with PBST. Various dilutions, ranging from 1/10 to 1/10,000, of the pre- and post-tertiary immunization sera were made with BCM. The serum antibodies were allowed to bind to the antigen coated wells for 1 hr at room temperature. The wells were then rinsed three times with PBST. 75 µl of streptavidin-bound horse-radish peroxidase (HRP) was added to each well and incubated for 1 hr at room temperature, following which the plates were washed 4 times with PBST. 100 µl of AEC/H2O2 substrate solution (25 mg 3-amino-9-ethylcarbazole, 2 ml dimethylformamide, 95 ml 0.05 M acetate buffer, pH 5.0, 40 ml H2O2) was added to each well. Immediately after the colour began to develop, the reaction was terminated by the addition of 2 x citrate buffer. The microtiter plates were then immediately analyzed on a microtiter plate reader at 405 nm, with a reference of 490 nm.
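The text gives only the range of serum dilutions (1/10 to 1/10,000), not the individual steps. As an illustration, the sketch below lays out a tenfold serial dilution covering that range; the tenfold spacing, the 200 µl volume per step and the function names are assumptions and are not taken from the thesis.

```python
# Sketch: a serial dilution series spanning 1/10 to 1/10,000.
# ASSUMPTIONS: tenfold steps and 200 µl final volume per dilution.


def dilution_series(first, last, fold):
    """Total dilution factors from `first` to `last` in steps of `fold`."""
    if fold <= 1 or first <= 0 or last < first:
        raise ValueError("need fold > 1 and 0 < first <= last")
    series = []
    factor = first
    while factor <= last:
        series.append(factor)
        factor *= fold
    return series


def step_volumes_ul(fold, final_volume_ul):
    """(volume carried over, volume of diluent) for one dilution step."""
    transfer = final_volume_ul / fold
    return transfer, final_volume_ul - transfer


if __name__ == "__main__":
    transfer, diluent = step_volumes_ul(10, 200.0)
    for factor in dilution_series(10, 10_000, 10):
        print(f"1/{factor}: {transfer:.0f} µl of previous step "
              f"+ {diluent:.0f} µl BCM")
```

For the first step the "previous step" is the undiluted serum; each later step carries over the same volume from the preceding dilution.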

3.26 Enzyme-Linked Immunosorbent spot (ELISpot) assay

The membranes of the ELISpot trays to be used were coated with antigen by adding 100 µl of NP (1 µg/ml) in carbonate buffer to each well. The plates were covered and stored at 4°C overnight. The next day, the antigen solution was removed and replaced by 100 µl of 4 % heat inactivated FCS and incubated at 37°C for 1 hr. The plates were then rinsed three times with PBS. 100 µl of BCM containing the cultured FACS sorted cells from one microtiter well (see section 3.24) was added to each well, and the plates were then incubated at 37°C for 4 hr. The BCM and the cells were removed by rinsing the trays three times with PBS and three times with PBST. Following this, 100 µl of PBST containing a 1/5,000 dilution of α-total mouse Ig-HRP was added to the wells, and the trays were incubated at 4°C overnight. The following day, the ELISpot trays were rinsed three times with PBS and then immersed in 0.05 M Tris-buffered saline solution (pH 8.0). The saline solution was removed, and 100 µl of AEC/H2O2 substrate solution was added. At the first sign of coloration of the ELISpot membranes, the solution was removed and the trays were rinsed with water. After allowing the membranes to dry completely, the wells were scanned for the characteristic brown spots that indicate the presence of secreted anti-NP antibodies.

RESULTS

Chapter 4: Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells

Chapter 5: In vitro analysis of splenic antigen-specific B cells

Chapter 6: Determination of the 5' boundary for somatic hypermutation in VH regions

Chapter 7: Minimization of PCR generated artifacts

Chapter 8: Molecular analysis of VH186.2 related germline genes

Chapter 9: Phylogenetic analysis of VH186.2 related germline genes

Chapter 10: Analysis of genuine germline VH186.2 related genes

Chapter 11: Molecular analysis of VH205.12 related germline genes

Chapter 12: Phylogenetic analysis of VH205.12 related germline genes

4. Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells

Rationale: In a number of previous reports it was demonstrated that it is possible to isolate antigen-specific germinal center or splenic B cells with deleterious mutations in the DNA (Rada et al, 1991; Weiss et al, 1992) or in the RNA (Apel and Berek, 1990; Berek et al, 1991). This suggests that it is possible to isolate recently hypermutated B cells before they undergo the stringent selection processes in the basal light zone. Since these cells may allow the detection of differentially mutated RNA or DNA species (as predicted by the models proposed by Steele and Pollard, 1987, and by Manser, 1990), it was attempted to isolate antigen-specific B cells from hyperimmune spleen and compare the RNA and the DNA sequences from the same pool of B cells.

Strategy: A single cell suspension was made from 6 C57BL/6J mice hyperimmunized with NP. Antigen-specific B cells were removed from the single cell suspension with NIP-coated Dynabeads. In this manner approximately 1.8 x 10^ NP+ B cells were isolated. Fewer than 2 % of these were dead cells. The DNA and cytoplasmic RNA were isolated separately from the same pool of B cells. Although the results did not allow unequivocal identification of RNA and DNA sequences from the same B cell, a number of interesting sequence patterns at the VH-D and D-JH joins were noted.

cDNA sequences: Primer 15 was used to reverse transcribe Cγ1 rearranged VDJ regions from the cytoplasmic RNA. Primers 3 and 16 were used to PCR amplify (40 cycles) VH186.2 rearrangements from the cDNA of approximately 5 x 10^4 cells. The pooled products from 2 independent PCR amplifications were sub-cloned and sequenced. The cDNA sequences are shown in Figure 4.1.

[Figure 4.1: nucleotide alignment (positions 730-1020) of cDNA clones c13, c35(3), c2(9), c14, c48, c12, c47, c1, c34, c44, c11, c16, c10, c40(2), c43(2) and c5 with the VH186.2, DFL16.1, JH-2 and JH-4 germline sequences; CDR1, CDR2 and the position of primer 3 are marked.]

Figure 4.1. The sequences of 16 unique cDNA clones. Clonally related sequences with identical VH-D-JH joints are grouped together. Numbers shown in brackets following the name of some of the sequences indicate the number of times the sequence was isolated. The sequences are compared to the VH186.2, DFL16.1, JH-2 and JH-4 germline sequences. The numbering is according to Figure 2 in Both et al, 1990. The additional bases at some VH-D and D-JH joints are N regions. The positions of CDR1 and CDR2 are shown. The first codon of the coding sequence (CAG) is underlined. The position of the upstream primer is indicated by asterisks. Symbols: . = identical to the germline sequence; - = deletion of a base. Amino acid replacement mutations are indicated by upper case letters, silent mutations are indicated by lower case letters. Nucleotide changes shown in bold and/or underlined were documented in previous studies of anti-NP VHDJH sequences (Blier and Bothwell, 1987 and Weiss et al, 1992 respectively).

Two sets of clonally related sequences were isolated. Family 1 consists of 7 sequences (c13, c35, c2, c14, c48, c12 and c47), whilst family 2 contains 5 sequences (c11, c16, c10, c40 and c43). Since clonally related B cells probably originate from the same germinal centre (eg. see Jacob et al, 1991a, 1992), the isolation of clonally related sets of sequences from a B cell pool isolated from six spleens suggests that PCR and/or reverse transcription primer bias may have limited the diversity of genes isolated. Alternatively, some of the mice may have had a more vigorous anti-NP response than others, thus contributing more B cells to the B cell pool. Interestingly, c5 contains a D-JH join identical to that found in the members of family 2, however it possesses a different VH-D join. c1 and c34 also contain the same N region at the VH-D join, but they differ at the D-JH join. The TGG (Trp)→TTG (Leu) mutation at position 836 was only detected in 4 clones, whereas in a number of other studies on the anti-NP response this mutation was found in at least half of the sequences (eg. Cumano and Rajewsky, 1986; Blier and Bothwell, 1987; Weiss et al, 1992). This amino acid replacement was shown to confer a 10-fold increase of affinity for the hapten (Allen et al, 1987, 1988). The replacement mutations at positions 914 and 968 were found in at least half of the sequences in Figure 4.1, whereas the AGC (Ser)→ATC (Ile) mutation was found in all but two of the sequences. Although these three replacement mutations were found in other studies (see legend to Fig. 4.1), they were only present in a minority of sequences. A total of 76 replacement and 3 silent mutations were detected in the 16 sequences; this does not include any changes found within the VH-D-JH joins or 10 bp on either side. 42 of the replacement mutations were found in CDR1 and CDR2, whereas all of the silent changes were found in FR1. Since CDR1 and CDR2 occupy only approximately 20 % of the sequenced region (excluding the VH-D-JH joints and the 10 bp on either side), the data suggests that there has been selection for replacement mutations in the CDRs, and selection against such changes in the FRs.
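The enrichment argument above can be made explicit with a short calculation. The following is only a sketch: it takes the counts quoted in the text (76 replacement mutations, 42 of them in the CDRs, CDRs covering roughly 20 % of the sequenced region) and assumes, purely for illustration, that in the absence of selection each mutation would fall independently and uniformly along the sequence. That null model ignores mutational hotspots and codon composition, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts quoted in the text for the 16 cDNA sequences.
n_replacement = 76   # replacement mutations outside the junctional region
n_in_cdr = 42        # of these, located in CDR1 or CDR2
cdr_fraction = 0.20  # approximate share of the sequenced region occupied by the CDRs

# Assumed null model: mutations land uniformly, so about 20 % should hit the CDRs.
expected_in_cdr = n_replacement * cdr_fraction
observed_fraction = n_in_cdr / n_replacement
p_value = binomial_upper_tail(n_in_cdr, n_replacement, cdr_fraction)

print(f"expected in CDRs under uniform targeting: {expected_in_cdr:.1f}")
print(f"observed in CDRs: {n_in_cdr} ({observed_fraction:.1%})")
print(f"one-sided binomial tail probability: {p_value:.2e}")
```

Under this simplified model roughly 15 replacement mutations would be expected in the CDRs, against 42 observed, which is the numerical basis for the inference of selection.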

DNA sequences: Nested sets of primers were used to amplify from the DNA. In the initial PCR amplification, primer 2 and an equimolar mix of primers 4 - 7 were used, whereas in the second amplification primer 3 and an equimolar mix of primers 8 - 11 were used. The products from 2 independent PCR amplifications were pooled, then sub-cloned and sequenced. The DNA sequences are shown in Figure 4.2.

[Figure 4.2: nucleotide alignment (positions 730-1020) of DNA clones D4(10), D39, D10, D7, D36, D25, D42 and D13 with the VH186.2, DFL16.1, JH-1, JH-2 and JH-4 germline sequences; CDR1, CDR2 and the primer positions are marked.]

Figure 4.2. The sequences of 8 unique DNA clones. Clonally related sequences with identical VH-D-JH joints are grouped together. D4 was isolated 10 times. The sequences are compared to the VH186.2, DFL16.1, JH-1, JH-2 and JH-4 germline sequences. The numbering is according to Figure 2 in Both et al, 1990. The additional bases at some VH-D and D-JH joints are N regions. The positions of CDR1 and CDR2 are shown. The positions of the primers are indicated by asterisks. Symbols: . = identical to the germline sequence; - = deletion of a base. The TGA stop codon resulting from a mutation in JH-2 of D42 is indicated with lines above and below the codon. Amino acid replacement mutations are indicated by upper case letters, silent mutations are indicated by lower case letters. Nucleotide changes shown in bold and/or underlined were documented in previous studies of anti-NP VHDJH sequences (Blier and Bothwell, 1987 and Weiss et al, 1992 respectively). The first codon of the coding sequence (CAG) is underlined.

DNA clones D7 and D10 have identical VH-D-JH joins, which are also identical to those found in the cDNA clones belonging to family 1 (Fig. 4.1). However, none of the DNA sequences exactly correspond to any of the cDNA sequences. Only one JH-4 rearranged DNA clone was isolated (D13), and this clone possesses different VH-D-JH joins to the clonally related family 2 cDNA sequences (Fig. 4.1). Another clonally related set of sequences, family 3, contains three sequences. Two of these, D25 and D42, only differ by one mutation, a TGG→TGA in JH-2 of D42 which results in a stop codon. Although it has been shown that Pfu DNA polymerase has a lower error rate than Taq DNA polymerase (Lundberg et al, 1991; Rothenfluh et al, 1993; see chapter 7), it is possible that this mutation was introduced by the enzyme during one of the two rounds of PCR amplification. The JH-2 rearranged family 3 sequences contain the same VH-D join as the JH-1 rearranged DNA clone D4, but they contain different D-JH joins. It was suggested that incomplete cDNA molecules resulting from the reverse transcription of partially degraded RNA molecules can act as primers during PCR amplification, thus generating hybrid DNA products (Shuldiner et al, 1989). However, it is unlikely that D4 is such a hybrid product, since it was isolated 10 times. A total of 55 mutations were scored outside of the VH-D-JH junctional region; 51 of these were replacement mutations. 39 replacement and 2 silent nucleotide changes were present in CDR1 and CDR2, indicating that these sequences have also undergone selection for antigen-binding. However, the bulk of these mutations are concentrated in CDR2 and only three were found in CDR1. The TGG→TTG replacement mutation in CDR1 that is predominant in other anti-NP responses (eg. Cumano and Rajewsky, 1986; Blier and Bothwell, 1987; Weiss et al, 1992) was only found in one of the DNA clones. The replacement mutations that were observed in the majority of cDNA clones at positions 830, 914 and 968 are found in at most one of the 8 DNA clones isolated. The mutation pattern of the cDNA clones differs from that of the DNA clones. Although members of family 1 were isolated from both nucleic acids, no identical clones were isolated from both sets of sequences. One key difference is the absence of recurrent mutations in the DNA sequences, whereas this was characteristic of the sequences isolated from the cDNA. The differences could be due to different primer specificities, or due to the fact that the cDNA sequences only contain Cγ1 expressing antibodies. The strategies used to isolate the splenic B cells and the primers used to amplify the DNA do not allow identification of the C region expressed by the B cells.
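The distinction drawn in this chapter between silent, replacement and stop-generating mutations follows directly from the standard genetic code. As a minimal sketch (the function name `classify` is hypothetical and is not part of the analysis actually performed), the codon changes quoted in the text can be classified as follows:

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(germline_codon, mutant_codon):
    """Return 'silent', 'replacement' or 'stop' for a codon change."""
    before = CODON_TABLE[germline_codon.upper()]
    after = CODON_TABLE[mutant_codon.upper()]
    if before == after:
        return "silent"
    if after == "*":
        return "stop"
    return "replacement"


# Codon changes mentioned in the text.
for germline, mutant in [("TGG", "TTG"), ("AGC", "ATC"), ("TGG", "TGA")]:
    print(germline, "->", mutant, classify(germline, mutant))
```

The sketch treats each codon in isolation; codons carrying more than one nucleotide change would need to be scored according to whichever convention the analysis adopts.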

Concluding remarks: The data presented in this chapter indicates that insufficient sequences were isolated to allow detection of RNA and DNA sequences that originated from the same cell. It is possible that a different experimental strategy may increase the probability of isolating RNA and DNA sequences from the same cell (see section 13.1). However, some unexpected but interesting observations were made. First, it is apparent that identical N regions are present in sequences that are not clonally related. Second, sequences that contained one identical coding join but differ at the other join were isolated. The implications of these results will be discussed below (section 13.1).

5. In vitro analysis of splenic antigen-specific B cells

Rationale: The aim of this preliminary experiment was to culture antigen-specific splenic B cells in the presence of activated T cell membranes, IL-4, IL-5 and/or live T cells, and to determine whether the cells (singly or in groups of 10) could be made to secrete Ig, proliferate and undergo somatic hypermutation. The results indicate that the B cell activation system developed by Hodgkin et al (1990) does not provide the required signals for somatic hypermutation, but it is able to induce Ig secretion and proliferation of single splenic B cells or groups of 10 splenic B cells.

Strategy: A single cell suspension was made from the spleen of 6 mice hyperimmunized with NP. Antigen-specific B cells were isolated using three-color flow cytometry, and the cells were sorted singly or in groups of ten into individual wells of microtiter plates. The cells were cultured and tested for Ig secretion or proliferation. A number of single cells from the single proliferating B cell clone that was detected were isolated. The rearranged IgV region of these cells was PCR amplified and sequenced.

Concentration of H chain isotypes in pre- and post-tertiary sera: The concentrations of the IgM, IgG and IgA isotypes prior to, and following tertiary immunization were determined by an ELISA assay (see Fig. 5.1). The most abundant isotype produced during the tertiary immune response in the two mice is IgG1. Other major isotypes present are IgM, IgG2a and IgG2b; however, these isotypes were already present at similar levels prior to tertiary immunization. This is consistent with a previous study, where it was found that 20 out of 28 secondary anti-NP response hybridomas isolated from a C57BL/6J mouse secreted IgG1 antibodies (Blier and Bothwell, 1987). Data from previous studies on the anti-NP response indicated that only a small minority of primary and secondary response hybridomas expressed the IgG3 isotype (eg. see Reth et al, 1978; Blier and Bothwell, 1987; Tao and Bothwell, 1990; Tao et al, 1993). None of these studies isolated IgA expressing anti-NP antibodies.

[Figure 5.1: one graph per isotype (IgM, IgG1, IgG2a, IgG2b, IgG3, IgA); x-axis: serum dilution.]

Figure 5.1. Concentration of IgM, IgG1, IgG2a, IgG2b, IgG3 and IgA isotypes before and after tertiary immunization in the sera of the two mice used in this experiment. The legend shown in the first graph (IgM) applies to all graphs: pre-1, post-1 = pre- and post-tertiary sera of mouse 1; pre-2, post-2 = pre- and post-tertiary sera of mouse 2.

Both mice had similar pre- and post-tertiary concentrations of IgM and IgG1. Since these are probably the main isotypes involved in the anti-NP response (eg. see Reth et al, 1978; Blier and Bothwell, 1987; Tao and Bothwell, 1990; Tao et al, 1993), it indicates that both mice responded well to NP. The pre- and post-immune levels of IgG2a in mouse 1 were identical, whereas in mouse 2 there was some elevation in the expression of this isotype during the tertiary response. The most striking difference between the two mice, however, is the level of expression of IgG3. In mouse 2 this isotype was virtually absent before and after tertiary immunization. In contrast, even though the concentration of this isotype in mouse 1 was lower than that of IgM, IgG1 or IgG2a, mouse 1 contained much higher levels of this isotype than mouse 2. The fact that the pre- and post-tertiary concentrations of IgG3 in mouse 1 are not significantly different indicates that most of this isotype was produced during the primary and/or secondary responses.

Flow cytometric analysis of splenic cells: In order to determine which of the available stains to use for sorting the splenic cells with the FACS, a number of aliquots containing 10^ splenic B cells were stained as described in section 3.23. The histograms shown in Figure 5.2 show some of the results obtained.

[Figure 5.2: histogram panels a) to d); log-scale x-axes from 10^0 to 10^4; recovered axis labels: B220, NP.]

Figure 5.2. Flow cytometry of splenic B cells. a) Analysis of B220 expression of the cells that passed through the live cell gate set on the forward scatter versus side scatter plot (data not shown). The data was obtained from splenic B cells stained with α-B220-PE (a). Virtually identical results were obtained with the α-B220-FITC (data not shown). The level of Igκ chain expression (b), NP binding (c), and Igλ chain expression (d) among cells that passed through the live cell and the B220+ gates. The peak closest to each y-axis was obtained from cells labeled with streptavidin-APC and provides a control for non-specific staining.

The data indicates that over 80 % of the splenic cells were B220+ and thus were B lymphocytes. This seems very high, since a significant proportion of non-B cells (T cells and macrophages) should also have been present in the splenic B cell population. However, this is not likely to be due to non-specific staining since the B220+ population is distinct from the control population, i.e. the cells labeled with streptavidin-APC (Fig. 5.2a). Furthermore, there is a distinct lack of a B220- population, which indicates that either the non-B cells were somehow lost during the isolation procedure, or they were present in very low numbers in the two spleens.

Figure 5.2c reveals that approximately 81 % of the splenic B cells (i.e. the cells that have passed through the live cell gate and the B220+ gate) express Igκ chains. This is in good agreement with the λ light chain data (Fig. 5.2d), which indicates that approximately 16 % of splenic B cells express the Igλ chain. However, Figure 5.2d also indicates that the α-Igλ antibody bound in a non-specific manner, as evidenced by the large population of Igλ-lo cells (i.e. cells that bound the α-Igλ antibody only weakly). Approximately 27 % of the splenic B cells are specific for NP (Fig. 5.2b). Dot plots for the live B220+ cells, comparing NP binding with Igκ chain expression (Fig. 5.3a), or with Igλ chain expression (Fig. 5.3b) were also constructed.

[Figure 5.3: dot plots with NP binding on the x-axis (log scale, 10^0 to 10^4). Quadrant percentages recovered from panel a): 58.3 %, 23.0 %, 13.6 %, 5.0 %; from panel b): 11.1 %, 5.6 %, 65.22 %, 18.1 %.]

Fig. 5.3. Dot plots comparing a) binding of NP with expression of Igκ chain, and b) binding of NP with expression of Igλ chain. Each dot represents one live B220+ cell. The major cell populations are highlighted in the representative staining patterns shown on the left hand side of each dot plot.

In support of the data shown in Figure 5.2, the dot plots shown in Figure 5.3 indicate that the majority of splenic B cells did express Igκ chains. The dot plots for NP binding versus Igκ expression (Fig. 5.3a) and NP versus Igλ chain expression (Fig. 5.3b) indicate that the Igλ bearing, NP binding splenic B cells constituted approximately 5 % of the total splenic B cell pool. Between 18 and 23 % of the total splenic B cell repertoire were NP binding and Igκ expressing B cells. Thus, on day 5 of the tertiary anti-NP response in the mice used in this work, most NP binding B cells express the Igκ chain. Figure 5.2d suggested that the α-Igλ antibody stained non-specifically, however this is not evident in Figure 5.3b since the proportion of cells in each quadrant corresponds well with the equivalent quadrants of Figure 5.3a.
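The agreement between the dot plots and the single-parameter histograms can be checked by summing quadrants. The sketch below uses the quadrant percentages printed on Figure 5.3; the assignment of each percentage to a quadrant (light chain positive in the upper row, NP binding in the right-hand column) is an assumption inferred from the surrounding text, and the dictionary and function names are hypothetical.

```python
# Quadrant percentages read from Figure 5.3.
# Assumed layout: upper row = light chain positive, right column = NP binding.
FIG_5_3A = {
    ("kappa+", "NP-"): 58.3,
    ("kappa+", "NP+"): 23.0,
    ("kappa-", "NP-"): 13.6,
    ("kappa-", "NP+"): 5.0,
}
FIG_5_3B = {
    ("lambda+", "NP-"): 11.1,
    ("lambda+", "NP+"): 5.6,
    ("lambda-", "NP-"): 65.22,
    ("lambda-", "NP+"): 18.1,
}


def marginal(quadrants, label):
    """Sum the percentages of all quadrants that carry the given label."""
    return sum(pct for key, pct in quadrants.items() if label in key)


kappa_positive = marginal(FIG_5_3A, "kappa+")    # compare with ~81 % (Fig. 5.2)
lambda_positive = marginal(FIG_5_3B, "lambda+")  # compare with ~16 % (Fig. 5.2)
np_binding_a = marginal(FIG_5_3A, "NP+")         # compare with ~27 % (Fig. 5.2)
np_binding_b = marginal(FIG_5_3B, "NP+")

print(f"kappa+ cells:        {kappa_positive:.1f} %")
print(f"lambda+ cells:       {lambda_positive:.1f} %")
print(f"NP binding (plot a): {np_binding_a:.1f} %")
print(f"NP binding (plot b): {np_binding_b:.1f} %")
```

If the assumed quadrant layout is correct, the marginal totals fall close to the values quoted for the histograms in Figure 5.2.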

Three color flow cytometric sorting of splenic B cells: From the above data it was decided not to use the α-Igλ antibody for flow cytometric cell sorting. Since many B cells that have undergone the germinal center pathway express the IgG1 isotype (McHeyzer-Williams, 1991), it was decided to stain the splenic cells that were to be sorted with α-IgG1 coupled to FITC, α-NP coupled to PE and α-Igκ chain coupled to APC. The staining patterns of the splenic cell pool from which the B cells were isolated using flow cytometry are shown in Figure 5.4.

Figure 5.4. Flow cytometry of the total splenic cell population from which the B cells were isolated. Analysis of a) IgG1 H chain expression, b) expression of Igκ and c) NP binding. The lines above the histograms indicate the gates that were set for flow cytometric cell sorting (i.e. the cells that fall below the line are the cells that were sorted), and the size of the selected cell population is indicated as a percentage of the total splenic cell pool. The peak closest to each y-axis was obtained from cells labeled with streptavidin-APC and provides a control for non-specific labeling.

The data in Figure 5.4 indicates that up to 44 % of the cells used for FACS sorting were Igκ-, i.e. they expressed Igλ. This is much higher than indicated in the previous FACS analyses (Figs. 5.2 and 5.3). Nevertheless, only 0.13 % of the total splenic cell population satisfied all three requirements (indicated in Fig. 5.4) and was sorted by flow cytometry.

Ig secretion by cultured cells: The ELISpot assay (see section 3.26) was used to detect Ig secretion by the cultured cells. The results are shown in Table 5.1.

Table 5.1. Ig secretion of cultured B cells

Number of B cells/well    Ingredients added to BCM                                      Number of positive wells (a)
1                         D10 T cells, conA activated T cell membranes, IL-4, IL-5      2/36
1                         conA activated T cell membranes, IL-4, IL-5                   2/196 (b)
10                        D10 T cells, conA activated T cell membranes                  2/96
10                        D10 T cells, IL-4, IL-5                                       21/96 (c)

(a) Any well containing at least one brown spot indicative of the presence of an Ig secreting B cell was counted as a positive. (b) In one of these wells two spots were detected. (c) Four of the wells contained 2 spots, and one of the wells contained 11 spots.

Since only a small number of B cells were sorted by flow cytometry, a number of planned culture conditions could not be carried out, thus limiting the comparative value of the data presented in Table 5.1. The data suggests that all culture conditions resulted in Ig secretion by some of the B cells. When single B cells were cultured in the presence of live T cells, activated T cell membranes, IL-4 and IL-5, approximately 5.6 % of the cells were induced to secrete Ig. In the absence of live T cells, the frequency of Ig secreting B cells was reduced to approximately 1 %. Interestingly, the data suggests that one of the Ig secreting single cells cultured in the absence of whole T cells may also have proliferated, since one of the wells positive for Ig secretion contained two spots, indicating that two Ig secreting B cells were present. Approximately 2.1 % of the wells containing 10 B cells that were cultured in the absence of IL-4 and IL-5 secreted Ig. The frequency of Ig secreting B cells increased to approximately 22 % when the cells were cultured in the presence of whole T cells, IL-4 and IL-5 without the activated T cell membranes. However, in all but two of the positive wells only one Ig secreting cell was detected. Nevertheless, one well contained 11 Ig secretors, suggesting that at least some of the cells in this well proliferated.
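The percentages quoted above follow directly from the well counts in Table 5.1. As a small sketch (the condition labels are abbreviations introduced here for illustration only), the frequencies of positive wells can be recomputed as:

```python
# (cells per well, abbreviated culture condition) -> (positive wells, wells tested)
TABLE_5_1 = {
    (1, "T cells + membranes + IL-4 + IL-5"): (2, 36),
    (1, "membranes + IL-4 + IL-5"): (2, 196),
    (10, "T cells + membranes"): (2, 96),
    (10, "T cells + IL-4 + IL-5"): (21, 96),
}


def percent_positive(positive, total):
    """Percentage of wells scored positive for Ig secretion."""
    return 100.0 * positive / total


frequencies = {
    condition: percent_positive(*counts) for condition, counts in TABLE_5_1.items()
}

for (cells, condition), pct in frequencies.items():
    print(f"{cells:>2} cell(s)/well, {condition}: {pct:.1f} % positive wells")
```

For the wells seeded with 10 cells the figure is a frequency of positive wells and not of individual cells, so it is only an upper-level indication of how many cells responded.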

Molecular analysis of proliferating B cells: The cells that were cultured for this purpose were not assayed for Ig secretion; instead they were visually examined for proliferation under a reverse phase microscope. The first search was carried out on day 4 of culture and was repeated every day until day 6.

Table 5.2. Proliferation of cultured B cells

Number of B cells/well    Ingredients added to BCM                          Number of positive wells (a)
10                        -                                                 0/24
10                        conA activated T cell membranes                   0/24
10                        IL-4, IL-5                                        0/24
10                        conA activated T cell membranes, IL-4, IL-5       2/24

(a) Any well containing at least one cluster of proliferating cells was counted as positive.

As can be seen in Table 5.2, B cell proliferation only took place when the cells were cultured in the presence of activated T cell membranes as well as IL-4 and IL-5. Single clusters of proliferating cells were detected in two of the 24 wells with these culture conditions. However, the cells constituting one of these clusters were large and of irregular shape, indicating that the cluster may have consisted of contaminating cells. Therefore only the cluster that contained cells with B cell morphology (i.e. small, spherical and agranular) was selected for molecular analysis. This cluster of proliferating B cell clones contained approximately 50 - 60 cells. Following physical separation of the cluster, 13 single cells were isolated. Primer 15 was used to reverse transcribe the mRNA isolated from each of the 13 single cells. The cDNA from each cell was subjected to 40 cycles of PCR amplification with primers 2 and 15. 5 µl of each PCR reaction was subjected to a second round of amplification (40 cycles) with primers 3 and 16. In this manner amplification should only have taken place from cells that expressed the VH186.2 germline gene (or one closely related to it) and the IgG1 isotype. Successful amplification took place from 5 of the 13 cells, suggesting that the success rate of PCR amplification under the conditions used is low. Alternatively, it is possible that most of the cells did not express the genes described above, or did so below detectable levels. The sequences of the 5 PCR amplified DNA fragments were somatically mutated but identical, indicating that these cells originated from the same precursor cell and that they did not accumulate further mutations during in vitro proliferation.

[Figure 5.5: nucleotide alignment (positions 730-1020) of clone F10 with the VH186.2, DFL16.1 and JH-2 germline sequences; CDR1, CDR2 and the position of primer 3 are marked.]

Figure 5.5. Sequence of VH-D-JH region expressed by the 5 B cells compared to the VH186.2, DFL16.1 and JH-2 germline sequences. Position of primer 3 and the CDRs are indicated. Symbols: . = identity with the germline sequence; - = deletion of a nucleotide. Amino acid replacement mutations are indicated by upper case letters. The first codon of the coding region is underlined (CAG). The numbering is as shown in Both et al, 1990. Nucleotide changes shown in bold and/or underlined were documented in previous studies of anti-NP VHDJH sequences (Blier and Bothwell, 1987 and Weiss et al, 1992 respectively).

As shown in Figure 5.5, the VH region differs from the germline sequence by two nucleotide changes that result in amino acid replacements, both of which were reported in previous studies on the anti-NP response (Blier and Bothwell, 1987; Weiss et al, 1992). This strongly suggests that the precursor cell from which the five clones were derived had indeed undergone the somatic hypermutation pathway. However, due to the antibodies used in flow cytometry, it is impossible to determine whether the B cell was present in a germinal center or in another part of the spleen at the time of splenectomy.

Concluding remarks: The above data indicates that the B cell activation system originally developed by Hodgkin and colleagues (1990) needs to be modified to achieve optimal activation of single B cells isolated from hyperimmune spleen. The fact that antigen-specific splenic B cells were activated more efficiently in the presence of whole T cells indicates that additional T cell factors need to be added. Furthermore, sequence analysis of B cells that proliferated in vitro indicates that they did not accumulate further somatic mutations.

6. Determination of the 5' boundary for somatic hypermutation in VH regions

Rationale: Two previous studies which attempted to define the boundaries for somatic hypermutation in VH regions came to different conclusions. Both and colleagues (1990) argued that the cap site is the 5' boundary, whereas according to Lebecque and Gearhart (1990) the 5' boundary is situated in the promoter region. One of the VH genes sequenced in the former study, VH3B62, contained a cluster of 5 mutations > 375 bp upstream of the cap site. Since it was suggested that at least some mutations found in VH3B62 may have resulted from a gene conversion event (Cumano and Rajewsky, 1986), a search for a putative germline donor was carried out. One of the cluster of five mutations introduced an RsaI restriction site into the 5' flanking region of VH3B62 (see Fig. 6.1). A variety of PCR and hybridization strategies failed to detect this restriction site or a germline donor in C57BL/6J genomic DNA (Rothenfluh, 1990; Rothenfluh et al, 1993). However, these strategies relied on the detection of only one of the five mutations, hence an attempt was made to detect the cluster of 5 mutations by sequencing VH186.2 related germline genes amplified from BALB/c and C57BL/6J genomic DNA. Furthermore, in order to determine whether the region upstream of the cap site is normally a target for somatic hypermutation, it was important to sequence the 5' flanking regions of additional rearranged VH regions. The addition of these sequences to previously published somatically mutated VH sequences generates a larger data set which allows more precise definition of the 5' boundary for somatic hypermutation in rearranged VH regions, and it also illustrates the asymmetric nature of the distribution of somatic mutations around these genes.

Strategy: The sequence survey of germline VH186.2 related genes was carried out by PCR amplifying from C57BL/6J and BALB/c genomic DNA and sequencing the DNA fragments (see also chapter 8). In order to allow more precise definition of the 5' boundary for somatic hypermutation, the 5' flanking regions of somatically mutated IgVH regions expressed by a number of secondary anti-NP hybridomas were also PCR amplified and sequenced.

Sequence survey of VH186.2 related germline genes: The earlier strategies used to detect possible donors for the upstream cluster of mutations found in VH3B62 were limited because they were based on the detection of a single one of the 5 mutations, or on the detection of VH3B62 coding region specific sequence. To overcome this limitation, an attempt was made to identify any potential donor by sequencing a number of VH186.2 related germline genes. These genes were PCR amplified for 40 cycles with primers 1 and 12 from the following genomic DNA: C57BL/6J liver DNA (supplied by K. Rajewsky), C57BL/6J liver and embryo DNA (Canberra) and BALB/c liver or embryo DNA (Canberra). VH186.2 related germline genes were also amplified and sequenced from BALB/c DNA because the fusion partner used for the generation of hybridoma 3B62 originated from a BALB/c mouse (Cumano and Rajewsky, 1986). As can be seen in Figure 6.1, none of the 52 unique VH186.2 related genes amplified from C57BL/6J (31) and BALB/c (21) DNA contain even a single one of the 5 mutations found in the 5' flanking region of VH3B62.

[Figure 6.1: nucleotide alignment of the 5' flanking region of VH3B62 and of the C57BL/6J (C57C, C57G and C57E series) and BALB/c (BALB and BALB-E series) germline clones with the VH186.2 germline gene; the position of the 5' primer is marked by asterisks.]

Figure 6.1. VH186.2 related germline sequences amplified from various genomic DNA (C57Gx = liver DNA from German C57BL/6J; C57Cx = liver DNA from Canberra C57BL/6J; C57Ex = embryo DNA from Canberra C57BL/6J; BALBx = liver DNA from Canberra BALB/c; BALBxE = embryo DNA from Canberra BALB/c). Some of the sequences are identical over this region; however, they do have different downstream regions (see Figs. 8.1 and 8.4). Only the region spanning the cluster of 5 mutations found in VH3B62 is shown, i.e. 352 - 518 bp upstream of the cap site. The sequences are compared to the VH3B62 sequence and to the VH186.2 germline gene (which was isolated from C57BL/6J but not from BALB/c DNA). Numbering is as shown in Fig. 2 of Both et al, 1990. The position of the 5' primer is indicated by asterisks. The RsaI restriction site introduced into the 5' flanking region of VH3B62 by a T->C base change is underlined. Symbols: . = sequence identity with the VH186.2 gene; - = base deletion.

Extension of 5' flanking region sequences of previously characterized hybridomas: The 5' flanking region sequence up to a point approximately 520 bp upstream of the cap site has only been determined for 9 VH genes (Both et al, 1990; Lebecque and Gearhart, 1990). Only one of these, VH3B62, contained mutations upstream of the promoter (Both et al, 1990). In order to determine whether this region is a common substrate for somatic hypermutation, the 5' flanking region sequences of an additional 12 hybridoma VH genes were determined. The mRNA sequences for hybridomas Hl-7, Hl-8, Hl-27, Hl-30, Hl-51, Hl-29, Hl-39, Hl-45, Hl-72 (Blier and Bothwell, 1987), 3C52, 3A112 and 3D61 (Cumano and Rajewsky, 1986) were previously determined. The 5' flanking regions of hybridomas 3C52, 3A112 and 3D61 were PCR amplified, cloned and sequenced by Dr. Linda Taylor. The rearranged VH region and approximately 520 bp of non-transcribed 5' flanking region of the remaining hybridomas were subjected to 40 cycles of PCR amplification with primers 1 and 5 for JH-2 rearrangements, or with primers 1 and 7 for JH-4 rearrangements. The DNA amplified in 2 to 4 independent PCR reactions was pooled and sub-cloned. Between 3 and 5 different clones for each hybridoma were sequenced in order to detect any Taq or Pfu DNA polymerase induced errors (see chapter 7). In codons 3, 74 and 106 of Hl-7, 3 and 29 of Hl-8, 100 of Hl-27, 16, 84, 87 and 100 of Hl-30, 61 and 108 of Hl-39, 108 of Hl-45, 66 and 85 of Hl-72, 24 of 3C52 and 38 of 3D61 the DNA sequences determined here differ from the previously published mRNA sequences (Figs. 6.2 and 6.3). However, these discrepancies do not alter the conclusions drawn in the papers in which the original sequences were published (Cumano and Rajewsky, 1986; Blier and Bothwell, 1987).

[Figure 6.2: sequence alignment, positions 1 - 1032; Leader, CDR1 and CDR2 are labelled above the alignment]

VH186.2    1 CATGACTTCT TGATGCAATA TTCTGTTGAC CCATACATAT ACATAATTTA TTTCTTCTGA TAATGCTGCA ATAATCAATC ATGTGTATAT GTTTCTGAGG 100
VH186.2  101 TATGTTTTGT TTTGGTCATT TGGGTGATTT TTCGAATGTA TATGATATTG GAAAGGCAAA TGTTAATTGT ATGTATTGAA AGGAGGCTGT GACTTTTAAT 200
VH186.2  201 AAGTTAGCTG TTTTTGAGAT TTCCCATCAC TATTCTCATC TTTCTAACCA CCTGTAAATC CATCTGTCAA CTGTGTCACA GTGGGGCCAC TGTCTCAAGC 300
VH186.2  301 TGCAAATCTT TTTAGTGCAC AGGCTCTAAT GTTACATCCA TAGCCTCAAC ACAAGGTTCA GGGATGAGGT ATGGGATGAA TTTCCACAGA CAAGATGAGG 400
VH186.2  401 ACTTGGGCTT CAGTATCCTG ATTCCTGACC CAGATGTCCC TTCTTCTCCA GCAGGAGTAG GTGCTTATCT AATATGTATC CTGCTCATGA ATATGCAAAT 500
VH186.2  501 CCTGTGTGTC TACAGTGGTA AATATAGGGT TGTCTACACG ATACAAAAAA CATGAGATCA CTGTTCTCTT TACAGTTACT GAGCACACAG GACCTCACCA 600
VH186.2  601 TGGGATGGAG CTGTATCATG CTCTTCTTGG CAGCAACAGC TACAGGTAAG GGGCTCACAG TAGCAGGCTT GAGGTCTGGA CATATACATG GGTGACAATG 700
VH186.2  701 ACATCCACTT TGCCTTTCTC TCCACAGGTG TCCACTCCCA GGTCCAACTG CAGCAGCCTG GGGCTGAGCT TGTGAAGCCT GGGGCTTCAG TGAAGCTGTC 800
VH186.2  801 CTGCAAGGCT TCTGGCTACA CCTTCACCAG CTACTGGATG CACTGGGTGA AGCAGAGGCC TGGACGAGGC CTTGAGTGGA TTGGAAGGAT TGATCCTAAT 900
VH186.2  901 AGTGGTGGTA CTAAGTACAA TGAGAAGTTC AAGAGCAAGG CCACACTGAC TGTAGACAAA CCCTCCAGCA CAGCCTACAT GCAGCTCAGC AGCCTGACAT 1000
VH186.2 1001 CTGAGGACTC TGCGGTCTAT TATTGTGCAA GA 1032

Sequences aligned against VH186.2: Family 1 (JH-2 rearranged): Hl-7, Hl-8, Hl-27/30, Hl-51; Family 3: Hl-29, Hl-39, Hl-45; 3C52; JH-4 rearranged: Hl-72, 3A112, 3D61.

Figure 6.2. Sequences of the hybridoma 5' flanking regions and VH gene elements compared with the VH186.2 germline sequence. The sequences in families 1 and 3 were grouped together to highlight clonal relationships (Blier and Bothwell, 1987). The sequence Hl-27/30 represents the 5' flanking and VH coding sequence of hybridomas Hl-27 and Hl-30, which are identical. The position of the 5' primer is indicated by asterisks. The TATA box, octamer and ATG start codon are overlined. Symbols: I = putative splice sites; > = first codon of the coding region (CAG); * = putative transcription start site; . = sequence identity with the VH186.2 germline gene. The numbering is as shown in Both et al, 1990.

The 5' flanking region and VH coding region sequences for the hybridomas are shown in Figure 6.2. Over the region sequenced in this work, the VH regions expressed by hybridomas Hl-27 and Hl-30 are identical. This, and the other sequence disagreements (see above) do not alter the conclusions drawn in the papers in which the coding region sequences were originally published. Although it is possible that Hl-27 and Hl-30 may differ in the J-C intron, the consensus sequence was tentatively named Hl-27/30, and the mutations found in this sequence were scored only once in all subsequent analyses.

[Figure 6.3: alignment of the hybridoma VH-D-JH junctions against the germline segments]

VH186.2 (3' end)  TATTGTGCAA GA
DFL16.1           TATTACTACG GTAGTAGCTA C
JH-2              TACTTTGACT ACTGGGGCCA AGGCACCACT CTCACAGTCT CCTC
JH-3              GCCTGGTTTG CTTACTGGGG CCAAGGGACT CTGGTCACTG TCTCTGCA
JH-4              TACTCC TATGCTATGG ACTACTGGGG TCAAGGAACC TCAGTCACCG TCTCCTCA

Sequences aligned: Hl-7, Hl-8, Hl-27/30, Hl-51, Hl-29, Hl-39, Hl-45 (shown with JH-2); 3C52 (JH-3); Hl-72, 3A112, 3D61 (JH-4).

Figure 6.3. Sequences of the hybridoma VH-D-JH junctions compared with the germline sequences of the VH186.2, DFL16.1 and JH genes. The sequence Hl-27/30 represents the VH-D-JH junction sequence of hybridomas Hl-27 and Hl-30, which are identical. The additional sequences at some of the VH-D joints are N regions. Symbols: . = sequence identity with the respective germline gene; - = base deletion.

Of the eleven unique hybridoma sequences, only one, Hl-7, was mutated 5' to the cap site (Fig. 6.2). Three mutations occurred upstream of, but within 200 bp of, the cap site: a G->T transversion at position 374 (178 bp 5' of the cap site), and two A->G transitions at positions 536 and 541 (16 and 11 bp 5' of the cap site respectively). The distribution of mutations within the coding regions was previously described (Cumano and Rajewsky, 1986; Blier and Bothwell, 1987). Briefly, most hybridomas displayed a clustering of mutations in CDR1 (Fig. 6.2). Whereas JH-2 rearrangements also displayed a concentration of mutations in CDR2, this was much less apparent in JH-3 and JH-4 rearrangements. Four of the eleven unique VH-D joints contain N regions; however, no evidence of N region addition is found at any of the D-JH joints (Fig. 6.3).

Mutation frequency distribution is asymmetric: An earlier review of a limited data set suggested that the distribution of somatic mutations around rearranged VH regions may be asymmetric (Steele et al, 1992). In order to test this proposition, the mutation frequency distribution graph (Fig. 6.4) around a putative JH-1 rearrangement was constructed by adding the above IgH chain sequences (Figs. 6.2 and 6.3) to those analyzed in the review (Steele et al, 1992; and see legend to Fig. 6.4). A JH-1 rearrangement was chosen because it shows the total L-VDJ-C target area. In addition, most of the J-C intron sequence data was obtained from JH-1 rearranged VH genes (Both et al, 1990; Lebecque and Gearhart, 1990).

[Figure 6.4 graph: mutation frequency (%) plotted against nucleotides from cap site (axis from -1000 to 4000). Gene diagram below the graph: P, L, VDJH1, unrearranged JH genes, E.]

Figure 6.4. Distribution of somatic mutations within and upstream of the rearranged VH region genes of 21 to 29 somatically mutated sequences. The J-C intron data was obtained from 5 JH-1 rearranged genes (Both et al, 1990; Lebecque and Gearhart, 1990), which were chosen because more sequence data for that region was available for JH-1 rearrangements, and because they contain the complete J-C intron. The sequences were aligned by the cap site (position 0). The mutation frequency (%) indicates the number of mutations per 100 bases sequenced. The diagram below the graph represents the region for which sequences were available and indicates the positions of the promoter (P), cap site, leader (L), V region, unrearranged JH genes and the enhancer (E). The gap indicates the region in which mutations were not scored, i.e. the D region, 10 bp 5' of the VH-D and 10 bp 3' of the D-JH junctions. The data were grouped in 50 bp intervals. The total number of bases sequenced for the interval 550 - 501 bp 5' of the cap site was 858; for the interval 500 - 451 bases 5' of the cap site it was 994; for the intervals 450 - 301 bases 5' of the cap site it was 1000; for the next 50 bp interval 1016; for the following interval 1415 and for the intervals 150 bp 5' of the cap site to 149 bp 3' of the cap site 1450 bases were sequenced. In order to also allow alignment of the rearranged VH genes, only the sequences for McPC603, MOPC167, HPCG13, MC101, H37-311, H37-45 and H37-80 (Lebecque and Gearhart, 1990) were included in the interval 150 - 199 bp 3' of the cap site. The mutation frequency for the interval 200 - 249 bp 5' of the cap site was calculated from a total of 1438 bases sequenced; for the intervals 250 - 499 bp 3' of the cap site it was 1400 bases, whilst for the intervals immediately 3' and 5' of the VH-D-JH joins it was 814 and 1068 bases respectively. The graph was compiled from the hybridoma sequences shown in Figures 6.2 and 6.3 in addition to 40.3, 3B44, 3B62, A6/24, A20/44 (Both et al, 1990), H37-68, H37-85, H37-79, H37-78, H37-96, M460 (Clarke et al, 1990), and the sequences previously mentioned (Lebecque and Gearhart, 1990). The JH-1 - C intron down to the EcoRI site is a composite of 40.3 (Both et al, 1990) and MOPC 167, MOPC 603, HPCG13 and HPCG 15 (Lebecque and Gearhart, 1990). Due to the identity of their sequences the mutations found in this region of hybridomas Hl-27 and Hl-30 (Figs. 6.2 and 6.3) were scored only once.

Figure 6.4 reveals that the mutation frequency immediately 5' of the cap site is approximately an order of magnitude lower than that 3' of the cap site. It also illustrates that the distribution of somatic mutations around rearranged VH regions is positively skewed, with a single mode centered on the rearranged VH gene, and a long tail extending into the J-C intron, thus confirming the conclusions drawn in Steele et al, 1992.
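The binning behind a plot such as Figure 6.4 can be sketched as follows. This is a minimal illustration only, not the procedure actually used for the thesis: the function name, the toy mutation positions and the toy coverage values are hypothetical. It groups mutation positions (relative to the cap site) into 50 bp intervals and expresses each count as mutations per 100 bases sequenced.

```python
from collections import Counter

def mutation_frequency(positions, bases_sequenced, bin_size=50):
    """Return {bin_start: mutations per 100 bases sequenced}.

    positions       -- mutation positions relative to the cap site (0);
                       negative values lie 5' of the cap site
    bases_sequenced -- {bin_start: total bases sequenced in that bin}
    """
    # Floor division puts -11 into the bin starting at -50 and 12 into bin 0.
    counts = Counter((p // bin_size) * bin_size for p in positions)
    return {
        start: 100.0 * counts.get(start, 0) / total
        for start, total in sorted(bases_sequenced.items())
        if total > 0
    }

# Hypothetical toy data: one mutation 5' of the cap site, three 3' of it.
toy_positions = [-11, 12, 30, 75]
toy_coverage = {-50: 1450, 0: 1450, 50: 1450}
freq = mutation_frequency(toy_positions, toy_coverage)
```

Dividing by the number of bases sequenced in each interval, rather than by the number of sequences, is what allows intervals with different coverage to be compared on one axis.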

Concluding remarks: In this chapter it was shown that the 5 mutations found in the distal 5' flanking region of VH3B62 were not present in any of the 31 and 21 VH186.2 related sequences isolated from C57BL/6J and BALB/c DNA respectively. Taken together with previous data (Rothenfluh, 1990; Rothenfluh et al, 1993) it is therefore unlikely that a germline donor for these 5 mutations is present in the DNA of the two strains of mice. The addition of the 5' flanking region sequences presented in Figure 6.2 to the previously published rearranged VH sequences reveals that the cap site is the 5' boundary for somatic hypermutation with a low level of leakage, and that the distribution of somatic mutations around rearranged VH regions is asymmetric.

7. Minimization of PCR generated artifacts

Rationale: In chapters 8 - 12 detailed molecular and phylogenetic analyses will be applied to related germline VH sequences that were PCR amplified from genomic DNA. Since it was shown that a number of in vitro artifacts can be generated during PCR, it was important to carry out control experiments that indicate to what extent such artifacts may have contributed to the germline VH sequences presented below. The introduction of nucleotide misincorporations was greatly reduced by the use of a thermostable enzyme, Pfu DNA polymerase, that possesses 3'->5' proof-reading activity (Lundberg et al, 1991). In order to assess the level of PCR crossover or strand jumping events (Paabo et al, 1990), a number of experiments involving the restriction analysis of PCR products were carried out. The results of these experiments suggest that PCR crossover may not occur to a significant degree under optimized conditions.

Strategy: The fidelity of the Taq and Pfu DNA polymerases was assessed by sequencing PCR amplified DNA fragments of known sequence. A number of clones containing well-characterized rearranged or germline VH genes were used to assess the level of PCR crossover. Following PCR amplifications from mixtures of these clones, some containing degraded or restricted DNA, the PCR products were analyzed with restriction enzymes or sequenced. Any strand jumping events can thus be identified by unexpected restriction fragments or hybrid sequences.

Fidelity of Taq DNA polymerase: In chapter 6 between 3 and 5 subclones of each PCR fragment amplified from hybridoma DNA were sequenced in order to identify any Taq DNA polymerase induced mutations. In this manner a total of 28,354 bases of DNA that was PCR amplified from hybridoma DNA with Taq DNA polymerase was sequenced. A total of 23 single point mutations that were present in only one subclone were detected. These base changes were probably induced by the enzyme during DNA synthesis. No enzyme generated frameshift mutations were found. The preference of Taq DNA polymerase for transitions over transversions, and the predominance of TA to CG transitions (Table 7.1), are in agreement with data presented in earlier reports (Saiki et al, 1988; Tindall and Kunkel, 1988; Keohavong and Thilly, 1989).

Table 7.1. Base substitution specificity of Taq DNA polymerase

Type of nucleotide change    Frequency
Transitions
  T->C                       10/23
  A->G                       6/23
  G->A                       3/23
  C->T                       1/23
Transversions
  C->G                       1/23
  C->A                       1/23
  A->T                       1/23
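The split between transitions and transversions in Table 7.1 can be checked with a short sketch. It assumes that each pair in the table reads as original base followed by new base; the function and variable names are hypothetical and introduced only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref, alt):
    """Classify a single base change as 'transition' or 'transversion'."""
    if ref == alt:
        raise ValueError("not a substitution")
    # A transition stays within purines or within pyrimidines.
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

# Counts from Table 7.1, read as (original base, new base) -> occurrences.
table_7_1 = {
    ("T", "C"): 10, ("A", "G"): 6, ("G", "A"): 3, ("C", "T"): 1,
    ("C", "G"): 1, ("C", "A"): 1, ("A", "T"): 1,
}

totals = {"transition": 0, "transversion": 0}
for (ref, alt), n in table_7_1.items():
    totals[substitution_class(ref, alt)] += n
```

Under that reading, 20 of the 23 changes are transitions and 3 are transversions.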

The overall error rate of Taq DNA polymerase is 1 error in every 1233 nucleotides under the experimental conditions used in this work (see Materials and Methods). The average error rate per nucleotide per cycle (m) can be calculated according to the following formula (Saiki et al, 1988): m = 2(f/d), where f = observed error rate in the PCR product (1/1233), and d = the number of PCR cycles (40). Therefore the error rate of Taq DNA polymerase in this work is 4.1 x 10^-5 errors per nucleotide per cycle. However, it is important to note that this is only an average value. In fact it was found that some sequences contained no errors, whereas others contained up to four. Thus, the error rate observed in individual sequences varied from no errors to 1 error per 288 nucleotides (1.7 x 10^-4 errors/nucleotide/cycle), which is approximately four-fold higher than the average rate.
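The calculation above can be reproduced with a few lines of Python. This is a minimal sketch of the formula m = 2(f/d) as quoted from Saiki et al (1988); the function name is hypothetical, and the inputs are the counts reported in this chapter.

```python
def error_rate_per_cycle(errors, bases_sequenced, cycles):
    """Average errors per nucleotide per cycle, m = 2 * (f / d)."""
    f = errors / bases_sequenced  # observed error frequency in the PCR product
    return 2.0 * f / cycles

# Taq DNA polymerase: 23 errors in 28,354 bases after 40 cycles.
taq_average = error_rate_per_cycle(23, 28354, 40)
# Most error-prone sequence reported: 1 error per 288 nucleotides.
taq_worst = error_rate_per_cycle(1, 288, 40)

print(f"{taq_average:.1e}", f"{taq_worst:.1e}")
```

The two values round to 4.1 x 10^-5 and 1.7 x 10^-4, matching the figures given in the text.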

PCR crossover events with Taq DNA polymerase: The 5' primer (primer 1) used in the PCR amplifications in chapter 6 is specific for VH 186.2 related genes, as shown in Figure 6.1 (also see chapter 8). This primer, in combination with JH specific 3' primers was used to amplify rearranged VH regions from hybridoma DNA (Figs. 6.2 and 6.3). Thus, if strand jumping occurred to a significant extent under the conditions described it would be expected that chimaeric molecules containing partial sequences from VH186.2 related germline genes and from somatically mutated hybridoma VHDJH DNA should be detected. No such sequences were isolated, suggesting that under the conditions used strand jumping is a rare event with Taq DNA polymerase.

PCR crossover events with Pfu DNA polymerase: Since the bulk of PCR amplifications were done using Pfu DNA polymerase, it was important to assess the level of strand jumping that occurred under the conditions in which this enzyme was utilized. In vitro artifacts generated during the PCR amplification and/or cloning of the hybridoma DNA could be detected by sequencing multiple copies of the products. However, this may not always be possible for germline genes, since not all may be isolated more than once.

Amplification from degraded DNA: It was shown that significant levels of strand jumping events can occur when amplifying from damaged DNA (Paabo et al, 1990). If the genomic DNA from which PCR amplifications were carried out in this thesis was significantly degraded, it is possible that hybrid DNA molecules would be produced during PCR amplification. Therefore, equimolar mixes of cloned VH3B62 and VH40.3 DNA (see Materials and Methods) were partially (1 min) or completely (60 min) digested with RsaI or DNAseI. The DNA was then subjected to 40 rounds of PCR amplification with primers 1 and 12.

[Figure 7.1 gel image: lanes -, 1, 2, 3, 4, 5, 6 and 7; size marker at 1018 bp]

Figure 7.1. PCR amplification from degraded DNA. A no DNA negative control (Lane -) and a number of positive control reactions (Lane 1: 100 pg VH40.3; Lane 2: 100 pg VH3B62; Lane 3: 50 pg each VH40.3 and VH3B62) were included in the experiment. Lanes 4 - 7: PCR amplifications of mixtures containing 50 pg each of VH40.3 and VH3B62 DNA that were partially degraded with RsaI (Lane 4) or DNAseI (Lane 5), or completely degraded with RsaI (Lane 6) or DNAseI (Lane 7). The PCR reactions were separated on a 1.5 % agarose gel, transferred to a nylon membrane and hybridized with a full length probe which was amplified from an equimolar mixture of the two cloned DNAs with primers 1 and 12.

As shown in Figure 7.1, no product was detected in amplifications where the template DNA was partially or completely degraded. An independent repeat of the experiment (data not shown) confirmed this result.

Amplification from mixtures of degraded and undegraded DNA: In order to mimic conditions that may be conducive to the formation of hybrid DNA molecules (Paabo et al, 1990), 40 cycles of PCR amplification were carried out from 0.1 pg, 1 pg, 10 pg or 50 pg of completely RsaI degraded VH3B62 DNA mixed with 50 pg of VH40.3 DNA. The RsaI restriction patterns of the cloned VH3B62 and VH40.3 DNAs are shown in Figure 7.2. The RsaI restriction site (GT/AC) located 139 bp downstream of primer 1 in VH3B62 is the result of a GTAT->GTAC mutation in this gene (Both et al, 1990; and see Figure 6.1).

[Figure 7.2 diagram: RsaI restriction maps of the VH3B62 and VH40.3 genomic clones, with RsaI sites marked by arrows. Labelled lengths (bp) include 136, 139, 522, 255 and 2148 for VH3B62, and 136, 909 and 365 for VH40.3.]

Figure 7.2. Arrangement of VH genes in the genomic clones derived from hybridoma 3B62 and 40.3 DNA. The positions of the promoter (P), transcription start site (cap), leader (L), rearranged VH region and any unrearranged JH genes are indicated. The sizes and positions of the unrearranged JH genes are not shown to scale. RsaI recognition sites are shown by arrows. The solid line below the gene diagram represents the PCR amplified region that is defined by primers 1 and 12, and the lengths of the RsaI restriction fragments are indicated. The three RsaI fragments at the 3' end of the PCR fragment amplified from VH40.3 (in order from 5' to 3') are 7, 53 and 50 bp in length. The dashed lines on either side of the PCR amplified region indicate the distance to the next RsaI site. The sizes of the 5' and 3' terminal RsaI fragments of the PCR products include the restriction sites added to the 5' termini of the primers.

RsaI restriction of VH3B62 produces a 235 bp fragment (the RsaI fragment spanning the priming site for primer 1 in Fig. 7.2), which differs at only 2 positions from the corresponding region of VH40.3. If this RsaI fragment anneals to the complementary strand of VH40.3 and is extended during PCR amplification, then the resulting DNA fragment should be 136 bp longer than the fragment produced if primer 1 was extended. In addition, the RsaI site that was introduced into VH3B62 by a T->C change at position 141 (Fig. 6.1) will no longer be present in any such hybrid DNA products because the nucleotide change will be present on the 522 bp RsaI fragment downstream of the introduced RsaI site (Fig. 7.2). As can be seen in Figure 7.3, all PCR products indicate that no strand jumping occurred.
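The way a single T->C change creates a new RsaI site, and so alters the fragment pattern, can be illustrated with an in silico digest. This is a hedged sketch: the function name and the two 20 bp toy sequences are hypothetical and are not the real VH3B62 or VH40.3 sequences. The only enzyme property assumed is that RsaI recognizes GTAC and cuts between GT and AC.

```python
RSAI_SITE = "GTAC"  # RsaI cuts between GT and AC, leaving blunt ends
CUT_OFFSET = 2      # cut position within the recognition site

def rsai_fragments(seq):
    """Return fragment lengths from a complete RsaI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(RSAI_SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = seq.find(RSAI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical toy sequences differing by one T->C change (GTAT -> GTAC).
germline = "AAAAGTATAAAAAAAAAAAA"
mutated = "AAAAGTACAAAAAAAAAAAA"

germline_pattern = rsai_fragments(germline)  # no site, one fragment
mutated_pattern = rsai_fragments(mutated)    # new site, two fragments
```

A hybrid product that carried the germline base at this position would show the uncut pattern, which is the basis for reading strand jumping from the restriction fragments.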

Figure 7.3. Size separation of PCR products amplified from 50 pg of untreated VH40.3 DNA mixed with 0.1 pg (Lane 1), 1 pg (Lane 2), 10 pg (Lane 3) or 50 pg (Lane 4) of RsaI restricted VH3B62 DNA. Lane - is the no DNA negative control. Prior to electrophoretic separation on a 1.5 % agarose gel the PCR products were restricted with RsaI. The DNA was transferred to a nylon membrane and hybridized with a probe that was PCR amplified from an equimolar mixture of VH40.3 and VH3B62 with primers 1 and 13.

The 909 bp DNA fragment detected in all lanes corresponds to the large fragment produced following RsaI restriction of the PCR fragment amplified from VH40.3 with primers 1 and 12. Therefore if the 235 bp DNA fragment produced by RsaI restriction of VH3B62 did act as a primer, it did so below detectable limits. The complete RsaI restriction of the PCR product makes it unlikely that the negative result is due to incomplete restriction of the VH3B62 DNA.

Restriction analysis of DNA amplified from mixed templates: An additional strategy to detect any strand jumping activity during PCR amplification was to PCR amplify from an equimolar mixture of VH3B62 and VH40.3. The PCR product obtained following 40 cycles of amplification was restricted with RsaI and then size separated. As can be seen in Figure 7.4, no unexpected RsaI fragments were detected.

Figure 7.4. Size separation of RsaI restricted PCR products. PCR amplifications were carried out from 100 pg of VH40.3 DNA (Lane 1), 100 pg of VH3B62 DNA (Lane 2) and from a mixture containing 50 pg each of VH40.3 and VH3B62 DNA (Lane 3). A no DNA negative control was also included (Lane -). The RsaI fragments were separated on a 2 % agarose gel and transferred to a nylon membrane. The probe used for hybridization was amplified from an equimolar mixture of VH40.3 and VH3B62 DNA with primers 1 and 13.

The absence of any RsaI fragments not shown in Figure 7.2 indicates that any hybrid DNA molecules were present below detectable limits. Two independent repeats of this experiment confirmed this result (data not shown).

Sequence survey of PCR products amplified from mixed templates: In a final attempt to detect chimaeric DNA molecules formed as a result of strand jumping during PCR, an equimolar mixture of 4 cloned VH186.2 related germline genes (C57C-9, C57C-17, BALB-2 and BALB-23; see chapter 8) was used as template DNA. Two independent PCR amplifications (40 cycles) were performed from a mixture containing 25 pg of each clone using primers 1 and 12, and 16 clones were isolated from each PCR reaction and sequenced.

Table 7.2. Sequence survey of PCR products

Clone      Number of times       Number of times       Number of hybrid
           isolated from PCR 1   isolated from PCR 2   sequences isolated
C57C-9     3                     4                     0
C57C-17    6                     3                     0
BALB-2     4                     7                     0
BALB-23    3                     2                     0
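One way to screen sequenced clones for a single strand jump is to ask whether a clone that matches no template exactly can be written as the 5' part of one template joined to the 3' part of another. The sketch below is an illustrative assumption, not the method used in this work: the function name and the toy templates are hypothetical, and it expects all sequences to be aligned and of equal length.

```python
def classify_clone(clone, parents):
    """Return a parent name, ('hybrid', 5' parent, 3' parent), or 'unassigned'.

    clone   -- sequence of one PCR clone
    parents -- {name: sequence}; all sequences aligned and equal in length
    """
    for name, seq in parents.items():
        if clone == seq:
            return name
    # A single strand jump joins a prefix of one template to a suffix of another.
    for name5, seq5 in parents.items():
        for name3, seq3 in parents.items():
            if name5 == name3:
                continue
            for i in range(1, len(clone)):
                if clone[:i] == seq5[:i] and clone[i:] == seq3[i:]:
                    return ("hybrid", name5, name3)
    return "unassigned"

# Hypothetical toy templates (not the real germline sequences).
toy_parents = {"P1": "AAAACCCC", "P2": "AAGACCTC"}

exact = classify_clone("AAAACCCC", toy_parents)
jumped = classify_clone("AAAACCTC", toy_parents)
point_error = classify_clone("AAAACCCG", toy_parents)
```

A clone carrying only a polymerase point error falls into neither category and is reported as unassigned, so it is not mistaken for a hybrid.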

None of the 32 sequences isolated displayed any evidence of having undergone a strand jump event (Table 7.2). Taken together, the data presented in the strand jumping controls indicate that if strand jumping did take place during the PCR amplifications carried out in this work, it did so below detectable levels.

Fidelity of Pfu DNA polymerase: In chapter 6, Pfu DNA polymerase was used to PCR amplify 9,175 bp from hybridoma DNA. In this set of sequences no Pfu generated point mutations were detected. However, the above 32 sequences amplified with Pfu DNA polymerase yielded a total of 30,656 bases of DNA sequence. Only two point mutations that were probably induced by the enzyme were detected. Although both of these errors were transitions, one TA->CG and one GC->AT, there are insufficient numbers of probable Pfu DNA polymerase generated errors to make any statistically significant observations about the error specificity of this enzyme. Nevertheless, the data do allow an approximation of the error rate of the enzyme. Using the formula shown above, the overall error rate of 1 error per 15,328 nucleotides yields an error rate per nucleotide per cycle of 3.3 x 10^-6. Thus under the laboratory conditions described, the error rate of Pfu DNA polymerase was found to be more than 12-fold lower than that of Taq DNA polymerase.
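The comparison between the two enzymes follows directly from the counts reported in this chapter. The sketch below is illustrative only; the function name is hypothetical and the formula is m = 2(f/d) with d = 40 cycles.

```python
def error_rate_per_cycle(errors, bases_sequenced, cycles=40):
    """Average errors per nucleotide per cycle, m = 2 * (f / d)."""
    return 2.0 * (errors / bases_sequenced) / cycles

pfu = error_rate_per_cycle(2, 30656)   # 2 errors in 30,656 bases
taq = error_rate_per_cycle(23, 28354)  # 23 errors in 28,354 bases
fold = taq / pfu                       # how many times lower the Pfu rate is

print(f"Pfu: {pfu:.1e}  Taq: {taq:.1e}  fold difference: {fold:.1f}")
```

With only two Pfu errors observed, the fold difference is a rough estimate rather than a precise ratio.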

Concluding remarks: The above results indicate that highly accurate PCR amplification of DNA is possible with Pfu DNA polymerase. In addition, using a number of strategies it was also shown that the level of PCR strand jumping is below detectable levels under the PCR conditions used in the work carried out for this thesis. This demonstrates that it is highly unlikely that the sequence data reported on in this thesis contains any significant PCR generated artifacts, be they point mutations, frameshift mutations or hybrid sequences. Thus, the germline VH gene sequences presented in the following sections represent sequences that are present in the germline of the mouse strains analyzed and not in vitro artifacts.

8. Molecular analysis of VH186.2 related germline genes

Rationale: Figure 6.1 shows a limited portion of the 5' flanking region of the 52 VH186.2 related germline genes amplified from C57BL/6J and BALB/c genomic DNA with primers 1 and 12. The total amplified DNA fragment contains approximately 520 bp of non-transcribed 5' flanking region and all but the 42 bp which abut the 3' end of the coding region (a total of approximately 958 bp of DNA sequence depending on the number of insertions and/or deletions). Analysis of these germline sequences revealed a striking concentration of sequence diversity in the putative CDR2 of these genes. Although this has been reported by a number of earlier studies (eg. see Bothwell et al, 1981; Givol et al, 1981; Bentley and Rabbitts, 1983; Schiff et al, 1986; Kodaira et al, 1986; Reynaud et al, 1987, 1989; Pascual and Capra, 1989; Lautner-Rieske et al, 1992; Tomlinson et al, 1992; Sims et al, 1992), none of these reports included significant amounts of flanking region sequence. Furthermore, there appeared to be a deficit of crippling mutations in the putative coding regions of the sequences shown below. These findings are striking because these genes are only expressed following a productive VHDJH rearrangement in B cells, thus natural selection based on function is unlikely to act directly on these genes. Therefore, the observed patterns were subjected to statistical analysis, which revealed that these patterns are indeed significantly different from the patterns expected by random accumulation of point mutations.

Strategy: Wu-Kabat amino acid and nucleotide variability plots were constructed for the putative coding regions and for the entire PCR products respectively. The non-randomness of the pattern of sequence variation and the apparent lack of crippling mutations were further analyzed by comparing the mutation pattern expected under a random point mutator model with the observed pattern.
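For readers unfamiliar with the Wu-Kabat measure, a minimal sketch is given here. It uses the standard definition (number of different residues at a position divided by the frequency of the most common residue at that position); the function name and the toy alignment are hypothetical, and the handling of gaps or ambiguous positions is deliberately left out.

```python
from collections import Counter

def wu_kabat_variability(aligned_seqs):
    """Wu-Kabat variability for each column of equal-length aligned sequences.

    variability = (number of different residues at the position)
                  / (frequency of the most common residue at the position)
    """
    n = len(aligned_seqs)
    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        most_common_freq = counts.most_common(1)[0][1] / n
        result.append(len(counts) / most_common_freq)
    return result

# Hypothetical toy alignment of four 3-residue sequences.
toy = ["ACD", "ACE", "ACF", "AGD"]
scores = wu_kabat_variability(toy)
```

An invariant position scores 1, and the score rises both with the number of alternative residues and with how evenly they are distributed. The same function applies to nucleotide alignments, where the maximum possible score is 16 rather than 400.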

Isolation of 31 VH186.2 related germline genes from C57BL/6J genomic DNA: Using primers 1 and 12, 31 unique VH186.2 related genes were isolated by PCR amplification (40 cycles) from C57BL/6J genomic DNA (Table 8.1).

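Table 8.1. Isolation of 31 VH186.2 related germline genes from C57BL/6J DNA

a Unique    b Source of          c PCR           d Number of        e Other
sequence    genomic DNA          amplification   repeats            information
C57G8       Liver (Germany)      Taq#1           2 (2GL)
C57G10      Liver (Germany)      Taq#1           2 (1GL,1CL)
C57G9ψ      Liver (Germany)      Pfu#1           5 (2GL,3CL)        Base deletion 802; VH6†
C57G11ψ     Liver (Germany)      Pfu#1           2 (2GL)*           Base deletion 768
C57G44      Liver (Germany)      Pfu#1           5 (1GL,4CL)
C57G45      Liver (Germany)      Pfu#1                              VH8D5§
C57G46      Liver (Germany)      Pfu#1
C57C2       Liver (Canberra)     Pfu#2
C57C11      Liver (Canberra)     Pfu#2
C57C14ψ     Liver (Canberra)     Pfu#2                              Base deletion 802
C57C15      Liver (Canberra)     Pfu#2
C57C16      Liver (Canberra)     Pfu#2                              VH165.1‡
C57C19      Liver (Canberra)     Pfu#2
C57C20      Liver (Canberra)     Pfu#2           2 (2CL)
C57C22      Liver (Canberra)     Pfu#2
C57C23      Liver (Canberra)     Pfu#3                              VH102†
C57C25      Liver (Canberra)     Pfu#3
C57C26      Liver (Canberra)     Pfu#3
C57C27      Liver (Canberra)     Pfu#3
C57E1       Embryo (Canberra)    Pfu#4           11 (4GL,2CL,5E)    VH145†
C57E2       Embryo (Canberra)    Pfu#4           12 (4GL,1CL,7E)    VH186.2†
C57E3       Embryo (Canberra)    Pfu#4           5 (1GL,3CL,1E)     VH8D5§
C57E6       Embryo (Canberra)    Pfu#4
C57E22      Embryo (Canberra)    Pfu#4
C57E31      Embryo (Canberra)    Pfu#5
C57E33      Embryo (Canberra)    Pfu#5           2 (1GL,1E)         VH24.8‡
C57E35      Embryo (Canberra)    Pfu#5                              VH8D5§
C57E36      Embryo (Canberra)    Pfu#5
C57E38      Embryo (Canberra)    Pfu#5
C57E40      Embryo (Canberra)    Pfu#5
C57E44      Embryo (Canberra)    Pfu#5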

a. A list of the unique sequences that were isolated and sequenced. ψ indicates a pseudogene.
b. Germany refers to the C57BL/6J liver DNA provided by K. Rajewsky; Canberra refers to DNA isolated from mice obtained from the ABE in Canberra (see Materials and Methods). Liver tissue was taken from unimmunized adult mice, whilst embryo tissue was taken from 8-10 week pregnant unimmunized female mice.
c. Indicates which enzyme was utilized and which sequences were isolated from each independent amplification.
d. Indicates the number of times each unique sequence was isolated. The information in the brackets indicates the number of times and from which DNA it was isolated. GL = C57BL/6J liver DNA provided by K. Rajewsky; CL = DNA isolated from mice obtained from the ABE in Canberra; E = C57BL/6J embryonic DNA. * = isolated from 2 independent PCR amplifications.
e. The type of mutation that generated the pseudogene is indicated by reference to the nucleotide position at which the mutation occurred. The coding regions for some of the unique sequences isolated match those of genes previously documented by †Bothwell et al, 1981, ‡Siekevitz et al, 1987 and §Kaartinen et al, 1988.

Nine of the 31 unique sequences obtained from C57BL/6J genomic DNA were isolated from more than one independent PCR amplification. Three of the 31 coding regions are pseudogenes as a result of frameshift mutations (Table 8.1 and Fig. 8.1). However, it is possible that some of the other genes may be pseudogenes due to frameshift mutations and/or stop codons in the 3' terminal 42 bp that were not amplified (see section 13.5a). Four of the seven VH186.2 related genes documented by Bothwell et al (1981), including VH186.2 itself, were isolated by the primers used in the PCR amplifications: VH6 (C57G9), VH102 (C57C23), VH145 (C57E1) and VH186.2 (C57E2). Two of these, VH145 (C57E1) and VH186.2 (C57E2), were isolated 11 and 12 times respectively. Presumably, this represents a bias of the oligonucleotide pair used in the PCR amplifications; however, it does indicate that the primers are indeed specific for VH186.2 related genes. In addition, three other previously published genes were isolated: VH165.1 (C57C16) and VH24.8 (C57E33), which were first described by Siekevitz et al (1987), and VH8D5 (C57E3, C57E35 and C57G45), which was earlier isolated by Kaartinen et al (1988).

DNA sequences of the 31 genes isolated from C57BL/6J DNA:

[Figure 8.1: nucleotide sequence alignment of the 31 C57BL/6J clones against the VH186.2 germline sequence, numbered up to position 1030, with the leader, CDR1 and CDR2 marked.]

Figure 8.1. The sequences of 31 VH186.2 related germline genes isolated from C57BL/6J DNA. Three of the sequences represent pseudogenes (italicised sequence names, see Table 8.1). The sequence for clone C57E2 is identical to the VH186.2 sequence (Table 8.1) and is thus labeled VH186.2. The putative TATA box, octamer and ATG start codon are overlined. The positions of the primers are indicated with asterisks. Symbols: | = putative splice sites; > = first codon of the coding region (CAG); * = putative transcription start site; . = sequence identity with the VH186.2 germline gene; - = base deletion. Base insertions are indicated by additions to the individual sequence. The numbering is as shown in Both et al, 1990.

It is interesting to note that C57E3, C57E35 and C57G45 have identical putative transcription units, and differ only upstream of the cap site. In addition, the 5' non-transcribed flanking regions of C57C25 and C57C26 are identical and differ from the VH186.2 sequence by only one nucleotide at position 498 (in the octamer motif), but the putative coding regions of these genes are different. Since C57E3, C57E35 and C57G45 were isolated from different PCR amplifications (Table 8.1) - indeed, C57E3 was isolated a total of five times from three independent PCR amplifications - it is highly unlikely that they could have resulted from PCR strand jumping events. In addition, one of these sequences (C57G45) was amplified from a different genomic DNA preparation. Furthermore, chapter 7 illustrates clearly that PCR amplification under the conditions used in this work results in undetectable levels of DNA strand jumping. Thus, it is unlikely that any of the 31 VH186.2 related genes are hybrid sequences.

Six of the above sequences (C57C19, C57C14ψ, C57C25 and C57C26) contain nucleotide substitutions within the putative octamer sequence motif, and four sequences (C57C16, C57C22, C57G45, C57G46) also carry single nucleotide changes within the AT rich region (TATA box). In addition, one of the sequences (C57C25) contains a nucleotide change at the splice site. A heptamer homologue (CTCATGA) is present 2 bp upstream of the octamer motif. This heptamer differs from the consensus sequence (Poellinger et al, 1989) by one nucleotide, and approximately half of the sequences contain an additional nucleotide change within this sequence motif. If these changes interfere with the normal function of the promoter region/splice sites then the sequences may also be pseudogenes. However, recent evidence suggests that there may be considerable flexibility in some of the protein-DNA interactions that take place at promoter sequence motifs (see section 13.5a). Nevertheless, regardless of possible crippling mutations in the promoter region, approximately 90 % of the putative coding regions display open reading frames and none contain stop codons.

The homology plot shown in Figure 8.2 indicates that the 31 sequences are indeed members of the same family, since the sequence homologies are greater than 80 %. Indeed, most sequences share greater than 90 % homology. The data also indicate that the putative transcription units are somewhat more homologous than the 5' flanking regions.
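The pairwise comparison underlying a homology plot of this kind can be sketched as follows. This is a minimal illustration only: the function names, the toy sequences and the treatment of gaps (columns in which both sequences carry a deletion are skipped, while a deletion opposite a base counts as a mismatch) are assumptions made for the example and are not taken from the methods used in this work.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree.

    Both sequences must come from the same alignment (equal length).
    Columns where both sequences have a gap ('-') are ignored (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def pairwise_identities(aligned):
    """Map every unordered pair of sequence names to its percent identity."""
    return {
        (name_a, name_b): percent_identity(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Toy alignment (hypothetical sequences, not data from Figure 8.1).
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAA",
    "seq3": "ACGAACGTAC",
}
for pair, identity in pairwise_identities(toy_alignment).items():
    print(pair, f"{identity:.1f} %")
```

Applying the same function separately to the columns 5' and 3' of the cap site would give the two series of values that a plot such as Figure 8.2 compares.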

[Figure 8.2: plot against % sequence homology (86 to 100 %); series: 5' cap site and 3' cap site.]

Figure 8.2. Sequence homology plot of the 31 VH186.2 related sequences isolated from C57BL/6J DNA. Homologies are shown for the 5' non-transcribed flanking regions (5' cap) and for the putative transcription unit (3' cap).

Wu - Kabat nucleotide/amino acid variability plots for the 31 C57BL/6J sequences:

The non-transcribed 5' flanking regions and coding regions of the sequences shown in Figure 8.1 contain many coincident changes, i.e. nucleotide changes present in more than one gene (Bothwell et al, 1981). Indeed, in a number of positions the coincident changes are shared by virtually all of the sequences. However, most striking is the concentration of nucleotide variability in CDR2 of the VH186.2 related germline sequences. The standard for the analysis of V region variability was set 24 years ago by the Wu - Kabat amino acid variability plot (Wu and Kabat, 1970). In order to allow a similar analysis of the entire DNA sequences the Wu - Kabat definition of amino acid variability was modified as follows:

$$\text{Nucleotide variability} = \frac{\text{Number of different nucleotides at a given position}}{\text{Frequency of the most common nucleotide at that position}}$$
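As a worked illustration of this definition, the following sketch computes the variability of a single alignment column and of a whole set of aligned sequences. The function names are hypothetical, and the convention of returning 0 for invariant positions follows the one stated for the variability plots; how gaps were counted is not specified here, so the sketch simply treats every character in a column as a state.

```python
from collections import Counter


def nucleotide_variability(column):
    """Modified Wu - Kabat variability for one alignment column.

    variability = (number of different nucleotides)
                  / (frequency of the most common nucleotide),
    where the frequency is a fraction between 0 and 1.
    Invariant columns are assigned 0 (plotting convention).
    """
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0
    most_common_count = counts.most_common(1)[0][1]
    frequency = most_common_count / len(column)
    return len(counts) / frequency


def variability_profile(aligned_sequences):
    """Variability at every position of a list of equal-length sequences."""
    return [nucleotide_variability(col) for col in zip(*aligned_sequences)]


# One of 31 sequences differing at a position gives 2 / (30/31).
print(round(nucleotide_variability("A" * 30 + "G"), 3))
```

With one sequence out of 31 differing, the function gives 2/(30/31), i.e. approximately 2.067; with one out of 21 it gives 2/(20/21) = 2.1.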

The nucleotide variability plot for the 31 VH186.2 related sequences isolated from C57BL/6J DNA is shown in Figure 8.3.

[Figure 8.3: variability plotted against nucleotide position (-600 to 600, left) and amino acid position (0 to 100, right); the schematic below each graph marks P, cap, L, V, CDR1 and CDR2.]

Figure 8.3. Nucleotide (left) and amino acid (right) variability plots for the 31 VH186.2 related C57BL/6J sequences. The horizontal dotted line in the nucleotide variability plot indicates the level of variability where only one out of the 31 sequences contains a nucleotide change (2.067). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region included in the graph. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the cap site was assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader sequence; V = germline VH gene.

The nucleotide variability plot reveals a major variability peak in CDR2 of the sequences and a minor peak in FR1. Nucleotide variability for CDR1 and most of the FR regions seems to be lower than that generally seen in the non-transcribed 5' flanking region. In fact, the nucleotide variability for most of the coding region, except for CDR2, is only marginally higher than the background (i.e. where only one sequence differs, indicated by the dotted line). The Wu - Kabat amino acid variability plot for the partial coding regions of the 31 sequences indicates that the variability peak in CDR2 is also present at the amino acid level. In contrast, the minor nucleotide variability peak in FR1 is not present at the protein level.

Codon-by-codon analysis of the putative coding regions of the 31 C57BL/6J sequences:

In order to provide statistical support for, and to further extend, the analyses shown in Figure 8.3, a detailed codon-by-codon analysis was carried out, comparing the 'observed' amino acid replacement to silent mutation ratios (R:S) with those 'expected' under a random point mutator model. The numbers of expected mutations were calculated according to three types of point mutation models. The first is based on pure randomness, which allows all nucleotide changes to occur with equal frequency (Jukes and Cantor, 1969). The other two models correct this random expectation by considering known nucleotide substitution biases as derived from empirical data. In one case expected frequencies were determined based on meiotic selection free nucleotide substitutions in Ig genes (Kaartinen et al, 1991), and in the other by reference to data from meiotic non-Ig pseudogenes after elimination of probable changes of a methylated cytosine to thymine (Li et al, 1984; Kaartinen et al, 1991).
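The first of these models (all nucleotide changes equally likely) can be illustrated with a short sketch. It enumerates the nine possible single-nucleotide changes of each codon, classifies each as replacement, silent or stop-generating using the standard genetic code, and distributes a given number of changes in proportion. The function names and the toy input are assumptions for illustration; the sketch does not reproduce the two bias-corrected models, and it is not claimed to be the exact procedure used to generate the expected numbers reported in this chapter.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_single_changes(codon):
    """Count the 9 single-nucleotide changes of a codon by their effect."""
    original = CODON_TABLE[codon]
    tally = {"replacement": 0, "silent": 0, "stop": 0}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if mutant_aa == "*":
                tally["stop"] += 1
            elif mutant_aa == original:
                tally["silent"] += 1
            else:
                tally["replacement"] += 1
    return tally


def expected_counts(coding_seq, n_changes):
    """Expected replacement/silent/stop counts for n_changes random changes.

    Assumes every site and every substitution is equally likely.
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    totals = {"replacement": 0, "silent": 0, "stop": 0}
    for i in range(0, len(coding_seq), 3):
        for kind, n in classify_single_changes(coding_seq[i:i + 3]).items():
            totals[kind] += n
    possible = sum(totals.values())
    return {kind: n_changes * n / possible for kind, n in totals.items()}


# Toy input: four codons, CAG GTC CAA CTG (illustrative only).
print(classify_single_changes("CAG"))
print(expected_counts("CAGGTCCAACTG", 10))
```

For the codon CAG, seven of the nine possible changes are replacements, one is silent and one generates a stop codon, which illustrates why the expected R:S ratio depends on the codon composition of each subregion.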

Table 8.2. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 31 VH186.2 related germline genes isolated from C57BL/6J DNA.

Sub-section of               Numbers of nucleotide differences resulting in:
V region (codons)            Replacement             Silent               Stop
FR1 (1-30)          Obs      25                      39                   0
                    Exp‡     44.4   42.2   41.9      16.6  18.4  19.0     3.1   3.4   3.2
CDR1 (31-35)        Obs      15                      1                    0
                    Exp      13.5   12.6   12.4      1.1   1.7   1.9      1.4   1.7   1.7
FR2 (36-49)         Obs      10                      7                    0
                    Exp      12.1   11.3   11.1      3.5   3.9   4.0      1.3   1.8   1.9
CDR2a (50-58)       Obs      136                     11                   0
                    Exp      112.5  108.9  108.2     32.6  36.0  37.3     1.8   1.9   1.5
CDR2b (59-66)       Obs      22                      6                    0
                    Exp      22.9   21.4   21.2      3.1   4.9   5.5      1.9   1.7   1.3
FR3 (67-83)         Obs      27                      9                    0
                    Exp      26.5   25.4   25.2      8.5   9.5   9.8      1.1   1.2   1.0
Entire V region     Obs      235                     73                   0
                    Exp      231.9  221.8  220.0     65.4  74.4  77.5     10.6  11.7  10.6   *** *** ***

For this analysis a VH186.2 related consensus sequence was determined for the entire 52 unique sequences (i.e. 31 from C57BL/6J, 21 from BALB/c). The consensus sequence differs from the germline VH186.2 sequence only at codons 43 (CAA), 50 (AAG), 54 (AGT), 59 (AAC) and 75 (TCC).
† It is assumed that mutations at each site are equally probable; mutational hotspots are not taken into account (see Kaartinen et al, 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: *** = p<0.01; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).

It is apparent from Table 8.2 that the predicted R:S ratio is not constant throughout the VH186.2 consensus sequence. Depending on the point mutation model, the expected R:S ratio of the FRs is between 2:1 and 3:1. In CDR1, the expected R:S ratio is between 6:1 and 12:1, whilst in CDR2 it is approximately 3:1 in the amino terminal portion (CDR2a, codons 50-58), and between 3:1 and 7:1 in the carboxy terminal portion (CDR2b, codons 59-66). Thus, although the analysis in Table 8.2 would predict the presence of a variability peak in CDR1 of VH186.2 related germline genes, no such peak is present in that region (Fig. 8.3). In addition, CDR2b has a higher expected R:S ratio than CDR2a, yet most nucleotide and amino acid variability is concentrated in CDR2a, whereas CDR2b seems to be conserved.

From Table 8.2 it can also be seen that there is a statistically significant conservation of the FR1 amino acid sequence, i.e. selection against replacement mutations, accompanied by an increase in silent changes at the third codon position. In contrast, few silent mutations occur in subregion CDR1. The high R:S ratio in this subregion suggests that strong selection for point mutations resulting in amino acid replacements has taken place. In addition, there is a statistically significant lack of stop codons generated by nucleotide substitutions throughout the coding region of the sequences shown in Figure 8.1. The data presented in Table 8.2 predict that between 10 and 12 such stop codons should occur in the 31 sequences, yet not a single one is present.
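The expected R:S ratios quoted above, and a comparison of observed against expected replacement and silent counts, can be checked with a short sketch using the values of Table 8.2 (first expectation, i.e. the unbiased random model). The χ² calculation shown is a generic two-category goodness-of-fit test with 1 degree of freedom in which the expected counts are rescaled to the observed total; this is an assumption made for illustration, since the exact form of the test is not spelled out here, and the test is unreliable where an expected count is small (e.g. silent changes in CDR1).

```python
import math

# (replacement, silent) counts transcribed from Table 8.2.
OBSERVED = {
    "FR1": (25, 39),
    "CDR1": (15, 1),
    "CDR2a": (136, 11),
    "CDR2b": (22, 6),
}
# Expected counts under the unbiased random model (left-hand values).
EXPECTED_RANDOM = {
    "FR1": (44.4, 16.6),
    "CDR1": (13.5, 1.1),
    "CDR2a": (112.5, 32.6),
    "CDR2b": (22.9, 3.1),
}


def rs_ratio(replacement, silent):
    """Replacement to silent ratio."""
    return replacement / silent


def chi_square_1df(observed, expected):
    """Two-category chi-square statistic and p-value (1 df).

    Expected counts are rescaled so that they sum to the observed total.
    For 1 df the upper-tail probability equals erfc(sqrt(stat / 2)).
    """
    scale = sum(observed) / sum(expected)
    stat = sum(
        (o - e * scale) ** 2 / (e * scale)
        for o, e in zip(observed, expected)
    )
    return stat, math.erfc(math.sqrt(stat / 2.0))


for region in OBSERVED:
    expected_ratio = rs_ratio(*EXPECTED_RANDOM[region])
    stat, p_value = chi_square_1df(OBSERVED[region], EXPECTED_RANDOM[region])
    print(f"{region}: expected R:S = {expected_ratio:.1f}:1, "
          f"chi-square = {stat:.1f}, p = {p_value:.2g}")
```

For FR1 this gives an expected R:S ratio of about 2.7:1 against an observed ratio below 1:1, in line with the conservation of FR1 described above.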

Isolation of 21 VH186.2 related germline genes from BALB/c genomic DNA:

In order to test whether the patterns described above for the 31 sequences isolated from C57BL/6J are unique to this mouse strain, 21 unique VH186.2 related germline genes were isolated from BALB/c DNA as described above. A compendium of information about the sequences is shown in Table 8.3.

Table 8.3. Isolation of 21 VH186.2 related germline genes from BALB/c DNA

a Unique      b Source of           c PCR             d Number      e Other
sequence      genomic DNA           amplification     of repeats    information
BALB1         Liver (Canberra)      Pfu#1             1
BALB2         Liver (Canberra)      Pfu#1             1
BALB6         Liver (Canberra)      Pfu#1             1
BALB7         Liver (Canberra)      Pfu#1             1
BALB9         Liver (Canberra)      Pfu#1             2(2L)
BALB10        Liver (Canberra)      Pfu#1             2(2L)*
BALB12        Liver (Canberra)      Pfu#1             2(2L)*
BALB13        Liver (Canberra)      Pfu#1
BALB16ψ       Liver (Canberra)      Pfu#1                           Stop codon 883
BALB18        Liver (Canberra)      Pfu#1
BALB20        Liver (Canberra)      Pfu#2
BALB21        Liver (Canberra)      Pfu#2
BALB4ψ        Liver (Canberra)      Pfu#2                           Stop codon 883
BALB23        Liver (Canberra)      Pfu#2
BALB25        Liver (Canberra)      Pfu#2
BALB26        Liver (Canberra)      Pfu#2
BALB3E        Embryo (Canberra)     Pfu#3
BALB5E        Embryo (Canberra)     Pfu#3
BALB13Eψ      Embryo (Canberra)     Pfu#4                           Stop codon 919
BALB14E       Embryo (Canberra)     Pfu#4
BALB16E       Embryo (Canberra)     Pfu#4             2(2E)

a. A list of the unique sequences that were isolated and sequenced. ψ indicates a pseudogene.
b. Canberra refers to DNA isolated from mice obtained from the ABE in Canberra (see Materials and Methods). Liver tissue was taken from unimmunized adult mice, whilst embryo tissue was taken from 8-10 week pregnant unimmunized female mice.
c. Indicates which enzyme was utilized and which sequences were isolated from each independent amplification.
d. Indicates the number of times each unique sequence was isolated. The information in the brackets indicates the number of times it was isolated, and from which DNA it was isolated. L = liver; E = embryo; * = isolated from 2 independent PCR amplifications.
e. The type of mutation that generated the pseudogene is indicated by reference to the nucleotide position at which the mutation occurred.

None of the previously reported genes that were isolated from C57BL/6J (i.e. VH6, VH102, VH145, VH186.2, VH165.1, VH24.8 and VH8D5, see Table 8.1) were detected in BALB/c DNA. Only 2 of the 21 unique sequences were isolated more than once from independent PCR amplifications. This is in contrast to the genes isolated from C57BL/6J DNA, where 9 of the 31 genes were isolated more than once. Furthermore, neither of the 2 repeat sequences isolated from BALB/c was detected more than twice, whereas 5 of the 9 repeat sequences isolated from C57BL/6J DNA were detected more than 5 times (Table 8.1). Although these differences may indeed be indicative of a larger repertoire in BALB/c DNA (Livant et al, 1986), they may in part be due to strain-specific differences at the PCR priming sites. Three of the 21 sequences are pseudogenes as determined by the inability of the coding region to produce an intact protein (see below).

DNA sequences of the 21 genes isolated from BALB/c DNA:

[Figure 8.4: nucleotide sequence alignment of the 21 BALB/c clones against the VH186.2 germline sequence, numbered up to position 1000, with the leader, CDR1 and CDR2 marked.]

Figure 8.4. Sequences of the 21 VH186.2 related germline genes isolated from BALB/c DNA. Three of the sequences represent pseudogenes (italicised sequence names, see Table 8.3). The TATA box, octamer and ATG start codon are overlined. The positions of the primers are indicated with asterisks. Symbols: | = putative splice sites; > = first codon of the coding region (CAG); * = putative transcription start site; . = sequence identity with the VH186.2 germline gene; - = base deletion. Base insertions are indicated by additions to the individual sequence. The numbering is as shown in Both et al, 1990.
BALB7 ...T T BUBH .A A T A GAT.. .T GG. BAI.B9 BALBIO .TCC BALB12 . .GC EALB13 .TGA .A..TC. BAIB18 .TGA .A. . .C. . A. .TC. BALB23 AC. .A. .GAT. . .T GG. BALB2 0 .A. .GA... .A GC BALB25 . . A. .A. ..C.T.. .C TCC BALB21 .A A. .A. . . .GAT. BALB26 .T GG. .T .A. .A. .C .T. . TCC BAIB14E ,A A BALR23E .. .A A. BALB3E • A GAT.. .T GG. .;.A A. .A GAT.. .T GG. BALB16E 810 820 830 840 850 8 60 BALB5E 870 880 890 900 —CDR2 ****12****_> VH186.2 AGTGGTGGTA CTAAGTACAA TGAGAAGTTC AAGAGCAAGG CCACACTGAC TGTAGACAAA CCCTCCAGCA CAGCCTACAT GCAGCTCAGC AGCCTGACAT EAIB1 GA.A..AA C CA GA T T BALB2 ..A...AT C C C T BALB4 BAIB6 A C T C T T A EALB7 .CA...AT C C C. T BALB16 BALB9 .C. T. BAIB10 EALB12 GA.A. ..AA. ..CC .CA. .GA. T. CA. BAIB13 .A C . .C. T. . ..C. BME18 GA.A. .TA GC CA. .G.. T. BALB23 GA.A. .TA GC CA. .C T. BALB20 A C ..C. T. BAI.B25 .A. .A C C T. BAIS21 T. .A. BALB26 GA.A...AA. ...G..TA.. .C. .GA. BALB14E ..A...AT C ..C. T. GA.A...AA A.. .C. T. . . .A. BALB13B .G.T. BALB3E AA C . .C. T. BALB16E A C , .C. T. . ..A. BALB5E . .A. ..AT C 930 940 950 . .C. T. 970 980 990 1000 910 920 960 Figure 8.4. Sequences of the 21 VH186.2 related germline genes isolated from BALB/c DNA. 3 of the sequences represent pseudogenes (italicised sequence names, see Table 8.3). The TATA box, octamer and ATG start codon are overlined. The positions of the primers are indicated with asterisks. Symbols: I = putative splice sites; > = first codon of the coding region (CAG); * = putative transcription start site;. = sequence identity with the VH 186.2 germline gene; - = base deletion. Base insertions are indicated by additions to the individual sequence. The numbering is as shown in Both et al., 1990.

None of the 21 sequences isolated from BALB/c DNA correspond to any of the sequences found in C57BL/6J DNA. Although the 21 sequences are obviously related to VH186.2, this gene was not isolated at all from BALB/c DNA. Nevertheless, a number of sequence patterns found in the 31 C57BL/6J sequences are also found in the 21 BALB/c sequences. Thus, many of the coincident changes found in 5' flanking regions as well as in the putative coding regions of the C57BL/6J sequences are also present in the BALB/c sequences. Furthermore, the BALB/c sequences also display a significant concentration of variability in CDR2 (also see Fig. 8.6).

Two sets among the above BALB/c sequences contain identical coding regions. Both of these sets, BALB5E / BALB14E / BALB2 and BALB18 / BALB23, are identical up to a point within or near the leader sequence, but differ in the upstream regions. All of these sequences were isolated only once (Table 8.3), and none of the members of a set came from the same PCR amplification. Although it could be argued that these sequences are the result of PCR strand jumping, the data presented in chapter 7 indicate that this is not likely to be the case.

One of the 21 BALB/c sequences shown in Figure 8.4 contains a single nucleotide change in the putative octamer (BALB16ψ), two sequences (BALB25 and BALB13Eψ) contain single nucleotide changes in the putative TATA box, and one of these sequences (BALB25) also contains a change at the splice site. Thus, BALB25 may be a pseudogene due to possible defects in the promoter region and at the splice site. Almost half of the sequences contain a nucleotide change within the heptamer homologue which is present 2 bp upstream of the octamer region. The heptamer present in the VH186.2 promoter already differs by one nucleotide from the consensus (Poellinger et al, 1989), thus it is possible that the additional change may abrogate or decrease the effectiveness of this sequence motif. However, it is not clear whether this would interfere with transcription, because the heptamer is absent in some genes that can be transcribed (see section 13.5a).

The homology plot shown in Figure 8.5 indicates that most of the sequences share greater than 90 % homology. The sequence homologies of the non-transcribed 5' flanking region sequences and the putative transcription units are much more similar than they were amongst the 31 C57BL/6J sequences (Fig. 8.2). Therefore the 21 BALB/c sequences belong to the same family.

[Figure 8.5: plot against % sequence homology (85 to 99 %); series: 5' cap site and 3' cap site.]

Figure 8.5. Sequence homology plot of the 21 VH186.2 related sequences isolated from BALB/c DNA. Homologies are shown for the 5' non-transcribed flanking regions (5' cap) and for the putative transcription unit (3' cap).

Wu - Kabat nucleotide/amino acid variability plots for the 21 BALB/c sequences:

Nucleotide and amino acid variability was calculated as described above.

[Figure 8.6: variability plotted against nucleotide position (-400 to 200, left) and amino acid position (20 to 80, right); the schematic below each graph marks P, cap, L, V, CDR1 and CDR2.]

Figure 8.6. Nucleotide (left) and amino acid (right) variability plots for the 21 VH186.2 related BALB/c sequences. The horizontal dotted line in the nucleotide variability plot indicates the level of variability where only one out of the 21 sequences contains a nucleotide change (2.1). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region included in the graph. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the cap site was assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader sequence; V = germline VH gene.

Although the background level of nucleotide variability in the BALB/c sequences is higher than that found in the C57BL/6J sequences (Fig. 8.3), there is a much higher variability peak in CDR2. Two additional minor variability peaks are also present in FR1 and FR3; however, neither of these translates into the putative amino acid sequence. In contrast, the elevation of nucleotide variability in CDR2 is also present at the protein level. Although the nucleotide variability throughout most of the PCR amplified region is higher than in the C57BL/6J genes, the depression of variability in the FRs of the C57BL/6J sequences is also apparent in the sequences amplified from BALB/c DNA. As discussed above, the expected R:S ratios for CDR1 and CDR2b would predict nucleotide variability in these regions; however, Figures 8.3 and 8.6 clearly illustrate that this is not the case. As with the 31 C57BL/6J sequences, the most significant concentration of nucleotide variability in the 21 BALB/c sequences is present in CDR2a, which has a lower expected R:S ratio than either CDR1 or CDR2b.

Codon-by-codon analysis of the putative coding regions of the 21 BALB/c sequences: The codon-by-codon analysis of the 31 VH 186.2 related sequences isolated from C57BL/6J (Table 8.2) was also applied to the 21 sequences isolated from BALB/c DNA (Table 8.4).

Table 8.4. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 21 VH186.2 related germline genes isolated from BALB/c DNA.

Sub-section of               Numbers of nucleotide differences resulting in:
V region (codons)            Replacement            Silent               Stop

FR1 (1-30)         Obs       30                     28                   0
                   Exp‡      40.2   38.2   37.9     15.0  16.7  17.2     2.8   3.1   2.9
CDR1 (31-35)       Obs       25                     0                    0
                   Exp       21.1   19.7   19.3     1.7   2.7   3.0      2.3   2.6   2.7
FR2 (36-49)        Obs       13                     8                    3 (1n)
                   Exp       17.1   16.0   15.7     4.9   5.5   5.7      1.3   1.8   1.9
CDR2a (50-58)      Obs       117                    10                   0
                   Exp       97.2   94.1   93.5     28.2  31.1  32.3     1.5   1.7   1.3
CDR2b (59-66)      Obs       27                     6                    2 (1n)
                   Exp       28.7   26.7   26.5     3.9   6.1   6.9      1.9   1.7   1.3
FR3 (67-83)        Obs       22                     13                   0
                   Exp       25.7   24.7   24.5     8.2   9.2   9.5      1.1   1.1   1.0
Entire V region    Obs       234                    65                   5 (2n)
                   Exp       230.0  219.4  217.4    61.9  71.3  74.6     12.0  13.3  12.2
                                                                         *     **    *

For this analysis a VH186.2 related consensus sequence was determined for the entire 52 unique sequences (i.e. 31 from C57BL/6J, 21 from BALB/c). The consensus sequence differs from the germline VH186.2 sequence only at codons 43 (CAA), 50 (AAG), 54 (AGT), 59 (AAC) and 75 (TCC).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al, 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
n The number preceding this symbol indicates the number of neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation.
χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers. * = p ~0.05; ** = p<0.05; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).
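The unbiased 'expected' numbers (left-hand values) rest on classifying every possible single-base change in each consensus codon as replacement, silent or stop-generating. The following is a minimal sketch of that classification under the equal-probability assumption stated in the footnote. It uses the standard genetic code; the function name is illustrative, and the substitution-bias corrections (center and right-hand values) are not modelled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutations(codon):
    """Return (replacement, silent, stop) counts over the 9 single-base changes."""
    original = CODON_TABLE[codon]
    replacement = silent = stop = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa == "*":
                stop += 1
            elif aa == original:
                silent += 1
            else:
                replacement += 1
    return replacement, silent, stop

# Consensus codons named in the table legend.
for codon in ("CAA", "AAG", "AGT", "AAC", "TCC"):
    print(codon, classify_point_mutations(codon))
```

Summing these per-codon counts over a sub-region and scaling by the number of observed changes would give expected replacement, silent and stop numbers of the kind tabulated above.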

The consensus sequence used for this analysis is the same as the one used for the codon-by-codon analysis of the 31 C57BL/6J sequences, thus the distribution of expected R:S ratios is identical. As with the C57BL/6J sequences, the selection for silent mutations in FR1 and replacement mutations in subregion CDR2a is statistically significant. The difference between the expected and observed numbers of stop codons for the above sequences (p < 0.05) is less significant than it is in the C57BL/6J sequence set (p < 0.01). However, this statistical analysis includes two stop codons that were neutralized, or rescued, by at least one other nucleotide change within the same codon. If these are excluded from the calculations then there is indeed a highly significant deficit of stop codons generated by nucleotide substitutions (p < 0.01).
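As an illustration of the kind of test behind these statements, the sketch below applies a 1 df chi-square comparison to the observed and unbiased expected replacement and silent numbers for CDR2a in Table 8.4. It assumes that the expected numbers are rescaled to the observed total before comparison; the thesis does not state the exact procedure here, so both the rescaling and the function name are assumptions.

```python
import math

def chi_square_rs(obs_r, obs_s, exp_r, exp_s):
    """1 df chi-square of observed vs expected replacement and silent counts."""
    scale = (obs_r + obs_s) / (exp_r + exp_s)  # rescale expectations (assumption)
    e_r, e_s = exp_r * scale, exp_s * scale
    chi2 = (obs_r - e_r) ** 2 / e_r + (obs_s - e_s) ** 2 / e_s
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# CDR2a, Table 8.4: observed 117 R and 10 S; expected 97.2 R and 28.2 S.
chi2, p = chi_square_rs(117, 10, 97.2, 28.2)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Under these assumptions the excess of replacement changes in CDR2a falls well below p = 0.001, in line with the selection for replacement mutations described in the text.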

Concluding remarks: Thus, among the VH186.2 related sequences of both strains of mice there is statistically significant selection for amino acid diversity in CDR2a and conservation of FR1, whilst there is also a significant deficit of stop codons. Although the 52 sequences isolated from both mouse strains do not include the 3' 42 nucleotides, it is unlikely that this would greatly affect the patterns described above, unless it is assumed that this region generally contains an increased incidence of nucleotide and amino acid variability or stop codons. The seven VH186.2 related germline genes published by Bothwell and colleagues (Bothwell et al, 1981) do not support such an assumption. Furthermore, there is also no evidence of such a pattern at the 3' termini of the 67 germline VH genes in the J558 sequence database compiled by Gu et al. (1991b).

9. Phylogenetic analysis of VH186.2 related germline genes

Rationale: Each of the 958 bp fragments amplified with primers 1 and 12 from C57BL/6J or BALB/c DNA constitutes only a small portion of the 10-20 kb of DNA that contains each IgVH gene (Honjo, 1983; Bothwell, 1984; Blankenstein and Krawinkel, 1987; Rathbun et al, 1989). Under a simple neo-Darwinian model of V gene evolution (i.e. selection of individuals on the basis of advantageous random variations in the germline DNA), it would be expected that the PCR amplified DNA fragment should evolve as a contiguous unit. In order to test this directly, the entire sequence, the non-transcribed 5' flanking sequence, the putative transcribed region and the putative coding region of each complete data set (i.e. 31 C57BL/6J and 21 BALB/c sequences) were subjected to independent phylogenetic analyses. These analyses reveal that the sequence relationships differ between the 5' non-transcribed flanking regions and the putative transcription units.

Strategy: The phylogenetic trees (dendrograms) were constructed using the MegAlign software package (DNAStar, Abacus House, West Ealing, London, W13 OAS, U.K.). This program generates dendrograms by the weighted-residue method described by J. Hein (1990).

Dendrograms for the 31 C57BL/6J sequences:

Figure 9.1. Dendrograms illustrating phylogenetic sequence relationships of the 31 VH186.2 related sequences isolated from C57BL/6J DNA. Four different regions were subjected to analysis: the entire PCR fragment (upper left), the 5' non-transcribed region (upper right), the transcription unit (lower left), and the coding region (lower right). The sequence for clone C57E2 is identical to the VH186.2 sequence (Table 8.1), and is thus labeled VH186.2. The diagram below each dendrogram indicates the region that was analyzed. Symbols: P = promoter; cap = transcription start site; L = leader sequences; V = V gene; ψ = pseudogene.

Figure 9.1 illustrates clearly that the sequence relationships of the 5' non-transcribed flanking regions and the putative 3' transcribed regions differ greatly amongst the C57BL/6J sequences, probably because they have evolved quite differently. Although the topology of the dendrogram obtained for the coding regions differs from that constructed for the entire putatively transcribed region, there is more conservation of sequence relationships than there is between the dendrograms of the regions 5' and 3' of the cap site. Thus, two related sets of sequences, one composed of C57G46, C57C22, C57G45, C57E35, C57E3 and C57E44 and the other composed of C57C23, C57E38 and C57G10, are on the same branch of the dendrograms constructed for the putative transcribed and coding regions. However, in the dendrogram of the 5' flanking regions these clusters are no longer apparent. Nevertheless, there are some clusters of related sequences present in the dendrograms for the putative transcription units and coding regions (C57C2 / C57C27 / C57G8 / C57C16 / C57E6 and C57E22 / C57C14ψ / C57E36 / VH186.2 / C57E1) that are also partially present in the dendrogram of the 5' non-translated flanking regions, although they form part of a different cluster of related sequences. Only one cluster of related sequences (C57C20 / C57E31 / C57G44) is present in all dendrograms. Therefore, the phylogenetic analysis of the 31 VH186.2 related sequences isolated from C57BL/6J DNA indicates that the putative transcription/coding units have evolved differently to the 5' non-translated flanking regions. The fact that more unique nucleotide changes are present in the putative transcription units (Fig. 8.1) suggests that these regions have evolved more rapidly. The different evolutionary rates between the two regions may indicate that the putative transcription/coding units have been targeted by hyper-recombination events (A. Gibbs, personal communication).

Dendrograms for the 21 BALB/c sequences:

Figure 9.2. Dendrograms illustrating phylogenetic sequence relationships of the 21 VH186.2 related sequences isolated from BALB/c DNA. Four different regions were subjected to analysis: the entire PCR fragment (upper left), the 5' non-transcribed region (upper right), the transcription unit (lower left), and the coding region (lower right). The diagram below each dendrogram indicates the region that was analyzed. Symbols: P = promoter; cap = transcription start site; L = leader sequences; V = V gene; ψ = pseudogene.

Four main clusters of related sequences are observed in both the dendrograms of the putative transcription units and coding regions: BALB2 / BALB6 / BALB5E / BALB14E / BALB20, BALB4ψ / BALB16E / BALB9, BALB16ψ / BALB7 / BALB21 / BALB10 and BALB13Eψ / BALB12 / BALB26 / BALB18 / BALB23 / BALB1. In contrast to the C57BL/6J dendrograms (Fig. 9.1), none of these four clusters is totally absent from the dendrogram for the non-transcribed 5' flanking regions. Indeed, the cluster composed of related sequences BALB4ψ, BALB16E and BALB9 is present in all four dendrograms. Nevertheless, analysis of the dendrogram obtained for the putative transcription units reveals that many of the clusters of related sequences present in this dendrogram are altered in the dendrogram constructed from the 5' flanking region sequences (Fig. 9.2). The phylogenetic analysis of the 21 VH186.2 related sequences isolated from BALB/c DNA thus indicates that in these sequences the rates of evolution of the 5' flanking regions and the putative transcription units also differ. Taken together, the results obtained from the phylogenetic analyses of the VH186.2 related sequences that were isolated from C57BL/6J and BALB/c DNA suggest that hyper-recombination events targeted to the putative transcription/coding units may have contributed to the evolution of these genes.

Independent confirmation of the phylogenetic analyses: The data shown in Figures 9.1 and 9.2 were independently confirmed by A. Gibbs (Molecular Systematics and Evolution Group, Research School of Biological Sciences, Australian National University, Canberra, ACT, 2501, Australia), Peter Lockhart and M. Steele (Molecular Genetics Unit, Massey University, Auckland, New Zealand) by a number of methods (data not shown). First, dendrograms with very similar topologies were generated from distance matrices of percent nucleotide differences between all pairs of sequences by using the Neighbour-Joining tree method (Saitou and Nei, 1987). This revealed that there were consistently more nucleotide changes in the putative transcription unit than in the 5' non-transcribed flanking region, indicating that the putative transcription unit evolved at a higher rate. Second, the different relationship between the upstream and downstream regions was confirmed by using the 'sites test' (Kishino and Hasegawa, 1989) as implemented in the DNAPARS program of the PHYLIP 3.5 package (Felsenstein, 1993). Between 33 and 63 'parsimony sites' in each data set were compared and Neighbour-Joining trees were constructed for each data set. Superimposition of the trees onto the sequences of both 5' and 3' regions revealed that they were highly significantly different, between 5-7 standard deviations from the mean of an expected normal distribution. Third, distance matrices were computed for 300 nucleotide portions of the sequences (overlapping by 100 nucleotides) and compared with the DIPLOMO program (G. Weiller and A. Gibbs, in preparation). In this way it was found that the two 5' terminal and the two 3' terminal 300 nucleotide portions yielded very similar matrices. The topology of the dendrograms therefore differs between the putative transcribed regions and the 5' flanking regions, indicating that the different parts of the total PCR amplified regions have distinct lineal relationships. The data suggest that the PCR amplified regions may have been subjected to hyper-recombination events targeted to the transcription unit.
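The third check, distance matrices over overlapping 300 nucleotide portions, can be sketched as follows. This is an illustration only: it does not reproduce DIPLOMO or the Neighbour-Joining step, the function names are hypothetical, and it assumes pre-aligned sequences of equal length in which a gap character simply counts as a difference.

```python
def percent_difference(a, b):
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def window_starts(length, size=300, overlap=100):
    """Start offsets of full-size windows that overlap by `overlap` positions."""
    step = size - overlap
    return list(range(0, length - size + 1, step))

def windowed_distance_matrices(alignment, size=300, overlap=100):
    """One pairwise percent-difference matrix per window; rows follow sorted names."""
    names = sorted(alignment)
    length = len(alignment[names[0]])
    matrices = []
    for start in window_starts(length, size, overlap):
        block = {n: alignment[n][start:start + size] for n in names}
        matrices.append(
            [[percent_difference(block[a], block[b]) for b in names] for a in names]
        )
    return matrices

# A 958 bp alignment yields four full 300 nt windows.
print(window_starts(958))  # [0, 200, 400, 600]
```

Comparing the matrices from the first two windows with those from the last two is the kind of contrast described above for the 5' terminal and 3' terminal portions.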

Distribution of insertions and deletions: In an attempt to find further support for the proposition that hyper-recombination events have targeted the transcription unit, the distribution of insertions and deletions within the entire PCR amplified region was analyzed. A total of 21 unique nucleotide insertion or deletion events were found in the 52 sequences from both data sets. The distribution of these within the PCR amplified region reveals that there is a concentration of insertions and deletions around the VH 186.2 cap site (Fig. 9.3).

Figure 9.3. Distribution of 21 unique insertion and deletion events found in the 52 VH186.2 related germline VH genes. The data are grouped in 50 bp intervals. The relative positions of CDR1 and CDR2 are indicated by black boxes in the diagram below the graph. The cap site is assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader region; V = germline VH gene.
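The grouping into 50 bp intervals relative to the cap site can be sketched as below. The positions used are hypothetical placeholders, not the 21 events reported in the thesis, and the function name is illustrative.

```python
from collections import Counter

def bin_positions(positions, width=50):
    """Count events per `width`-bp interval; keys are the interval lower bounds."""
    # Floor division keeps negative (upstream) positions in the correct interval,
    # e.g. -30 falls in the interval starting at -50.
    return Counter((p // width) * width for p in positions)

# Hypothetical indel positions relative to the cap site (position 0).
example_positions = [-30, -10, 5, 49, 50]
for lower, count in sorted(bin_positions(example_positions).items()):
    print(f"{lower:>5} to {lower + 49:>5}: {count}")
```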

If it is assumed that nucleotide insertions and deletions are diagnostic of genetic recombination processes (e.g. see Thomas and Capecchi, 1986; Weiss and Wilson, 1988), then the data presented in Figure 9.3 are consistent with the proposition that recombination events targeting the coding and/or transcribed regions have contributed to the germline evolution of IgVH genes.

Concluding remarks: The above phylogenetic analyses show that the dendrograms generated for the 5' flanking regions and the putative transcribed regions are very different, indicating that the two regions have evolved at different rates. The fact that more nucleotide changes are present in the putative transcription units suggests that this region has evolved at a higher rate than the 5' flanking region, probably due to hyper-recombination events targeted to this region. This proposition is supported by the concentration of insertion and deletion events around the cap site.

10. Analysis of genuine germline VH186.2 related genes

Rationale: Since the 3' primer used to amplify the VH186.2 related germline genes is positioned inside the VH coding region, it could be argued that some of the sequences presented in chapter 8 may originate from rearranged and somatically mutated VH regions expressed in B cells present in the liver. However, this would assume that significant concentrations of B cells expressing somatically mutated derivatives of VH186.2 related genes are present in the liver, an assumption for which there is no support in the literature. In addition, such hypothetical contamination was reduced by exsanguination of the mice before removal of the tissues. Although it was shown that approximately 75 % of μ⁻δ⁻ B cells expressing VH186.2 antibodies in unimmunized mice were somatically mutated, this population of B cells constitutes a minute fraction of the total peripheral B cell pool (Schittek and Rajewsky, 1992). None of the 45 mutated VH186.2 sequences published by Schittek and Rajewsky correspond to any of the 52 sequences shown in Figures 8.1 and 8.4. Furthermore, a comparison of the above 52 sequences with a total of 114 somatically mutated VH186.2 sequences published by others (Cumano and Rajewsky, 1985, 1986; Tao and Bothwell, 1990; Forster and Rajewsky, 1990; McHeyzer-Williams et al, 1991; Jacob et al, 1991a; Jacob and Kelsoe, 1992; McHeyzer-Williams et al, 1993; Jacob et al, 1993; Tao et al, 1993) found no matching sequences.

Strategy: Regardless of the above arguments, it was imperative to directly address the issue of whether any of the sequences amplified from liver genomic DNA were not germline genes but genes expressed by B cells present in the tissue. One strategy involved PCR amplification of liver DNA that was seeded with specified amounts of cloned DNA containing rearranged VH regions. Another approach involved repeating the analyses described in chapters 8 and 9 with a number of genuine germline genes: those previously isolated by others, those isolated from embryonic DNA and those isolated from independent PCR amplifications.

Estimation of B lymphocyte contaminants in C57BL/6J and BALB/c liver tissue: Although it is unlikely that all the sequences that were isolated from liver tissue could originate from contaminating B lymphocytes, it was important to estimate the level of B cell contamination. This was done by PCR amplifying from liver DNA with primer 1 and a primer specific for either JH-1 (primer 4), JH-2 (primer 5), JH-3 (primer 6) or JH-4 (primer 7). In order to estimate accurately the number of contaminating lymphocytes, amplifications were also carried out from liver DNA seeded with defined amounts of the following cloned DNAs: 40.3 (JH-1 rearrangement), 3B44 (JH-2 rearrangement), and Hl-72 (JH-4 rearrangement, see chapter 6).

Figure 10.1. Results of the quantitative PCR assay to estimate the relative number of rearranged VH186.2 related sequences in the genomic DNA preparations used in this work. Control amplifications from a) BALB/c and b) C57BL/6J liver DNA alone (Lanes 1), or seeded with 15,000 (Lanes 2), 1,500 (Lanes 3) or 150 copies (Lanes 4) of cloned DNA containing the indicated JH rearrangement. No cloned JH-3 rearranged DNA was available for this experiment. c) Control amplifications from C57BL/6J liver DNA seeded with 750 (Lanes 1), 375 (Lanes 2) and 150 (Lanes 3) copies of 3B44 (JH-2) or Hl-72 (JH-4) cloned DNA.

The amplifications carried out from BALB/c DNA (Fig. 10.1a) indicate that no JH-1 rearrangements were amplified, but that very faint bands of the size expected for VH-D-JH rearrangements were amplified with the primers specific for JH-2, JH-3 and JH-4. The seeded controls for JH-2 and JH-4 rearrangements indicate that the level of amplification from unseeded BALB/c DNA is approximately equal to that obtained when the DNA was seeded with 150 copies of the control template DNA. No JH-3 rearranged control DNA was available, but the DNA band of the size expected for a PCR amplified JH-3 rearrangement is of the same intensity as the ones obtained for JH-2 and JH-4 rearrangements. This may indicate that 100 ng of BALB/c liver DNA contains approximately 150 copies each of JH-2, JH-3 and JH-4 rearrangements. However, it is also possible that the banding pattern produced in the amplifications from genomic DNA may be due to random priming, since the banding pattern obtained somewhat resembles those obtained in random amplification of polymorphic DNA (RAPD) experiments (Williams et al, 1990). Nevertheless, the data do raise the possibility that detectable numbers of rearranged templates are present in the genomic liver DNA. Thus, if it is assumed that 1 μg of mammalian genomic DNA contains approximately 3 x 10^5 copies of single-copy genes, and that the size of the germline VH186.2 related sub-family is 20 (probably a conservative estimate), then the above data suggest that 100 ng of BALB/c liver DNA contains approximately 1 somatically rearranged VH186.2 related gene per 1,333 unrearranged VH186.2 related genes. The banding pattern obtained for the C57BL/6J liver DNA (Fig. 10.1b) differs somewhat from those obtained from BALB/c liver DNA. This would be expected if the banding pattern is due to random priming.
However, very faint bands of the appropriate size for somatically rearranged VH186.2 related genes were detected in unseeded C57BL/6J liver DNA with primers specific for JH-1 and JH-3. The control amplifications containing genomic DNA seeded with a JH-1 rearranged template indicate that the product obtained from unseeded genomic DNA is much fainter than that where 150 copies of the control template are present. Therefore, fewer than 150 copies of JH-1 rearranged VH186.2 related genes are present in 100 ng of C57BL/6J liver DNA. The DNA band of the size expected for a PCR amplified JH-3 rearrangement is of the same intensity as the corresponding band obtained from BALB/c DNA, indicating that approximately 150 VH186.2 genes rearranged to JH-3 were present in the template DNA. Thus, for JH-1 and JH-3 rearrangements, the level of B lymphocyte contamination in C57BL/6J liver DNA is approximately the same as for BALB/c liver DNA (i.e. 1 rearrangement per 1,333 germline genes). No product was detectable from C57BL/6J liver DNA with JH-2 and JH-4 specific primers even when 150 copies of the appropriate control target were present (Fig. 10.1b). The data shown in Figure 10.1c indicate that for these two rearrangements the lower limit of detection by PCR is 750 control targets. Note that the band obtained with the JH-4 specific primer is barely visible. Thus, although no band of appropriate size was detected for these two rearrangements, the detection limit was higher than in BALB/c DNA. This could be due to impurities present in the C57BL/6J DNA that were absent in the BALB/c DNA. Alternatively, the low molecular weight smears (Fig. 10.1b) may indicate that the primers were 'mopped up' by non-specific priming sites, thus making fewer primer molecules available for specific amplification. For this reason it cannot be stated that no JH-2 and JH-4 rearranged templates were present, because if they were present at the same level as in BALB/c DNA they would be below the limits of detection. Thus, for JH-2 and JH-4 rearranged VH186.2 related genes in C57BL/6J liver DNA the number of somatically rearranged templates per unrearranged germline template may be the same as in BALB/c DNA, but possibly due to the reasons given above they could not be detected. Nevertheless, it is clear that the level of B lymphocyte contamination in liver genomic DNA was at most 1 rearranged VH gene for every 1,000 germline VH genes. This makes it highly unlikely that any of the 52 sequences isolated in previous chapters represent a rearranged VH gene expressed by a B lymphocyte.
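The figure of 1 rearranged gene per 1,333 germline genes follows from the assumptions stated in the text: about 3 x 10^5 single-copy gene equivalents per μg of genomic DNA, 100 ng of template, a sub-family of 20 genes, and roughly 150 copies each of the JH-2, JH-3 and JH-4 rearrangements. A minimal worked calculation is given below; the function and parameter names are illustrative.

```python
def germline_per_rearranged(copies_per_ug=300_000, template_ng=100,
                            subfamily_size=20, rearranged_copies=3 * 150):
    """Germline VH186.2 related genes per rearranged gene in the PCR template."""
    # Single-copy gene equivalents present in the template (integer arithmetic).
    genome_equivalents = copies_per_ug * template_ng // 1000
    germline_genes = genome_equivalents * subfamily_size
    return germline_genes / rearranged_copies

print(round(germline_per_rearranged()))  # 1333
```

Because the sub-family size of 20 is described as a conservative estimate, a larger sub-family would make rearranged templates proportionally rarer still.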

Nucleotide and amino acid variability plots for 27/30 genuine germline genes: A number of genuine germline VH186.2 related genes can be identified amongst the 31 and 21 sequences isolated from C57BL/6J and BALB/c DNA respectively (see Tables 8.1 and 8.3). The coding regions of C57C16, C57E23, C57E1, C57E2, C57E3, C57E35, and C57E33 match those of previously published genes. BALB10, BALB12, C57G8, C57G10, C57G9, C57G11, and C57G44 were isolated from more than one independent PCR amplification, and a total of 17 genes were amplified from embryonic DNA (5 from BALB/c and 12 from C57BL/6J). Thus, amongst the 52 sequences described above 27 can be assumed to be genuine germline genes. A nucleotide variability plot was constructed for these sequences (Figure 10.2). The amino acid variability plot was constructed from the above 27 sequences and three genes reported on by Bothwell et al. (1981) that were not isolated in this study: VH186.1, VH3 and VH23 (Figure 10.2).

Figure 10.2. Nucleotide (left) and amino acid (right) variability plot for 27 or 30 certified VH186.2 related C57BL/6J sequences respectively. The horizontal dotted line in the nucleotide variability plot indicates the level of variability where only one out of the 27 sequences contains a nucleotide change (2.08). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region included in the graph. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the cap site was assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader sequence; V = germline VH gene.

Figure 10.2 clearly demonstrates the presence of a significant nucleotide variability peak in the amino terminal portion of CDR2 (CDR2a) which is also present at the amino acid level. Although the distribution of expected R:S ratios predicts that similar variability peaks should be present in CDR1 and in the carboxy-terminal portion of CDR2 (CDR2b), no significant increase in variability is apparent in these regions. The overall reduction in nucleotide variability in the FRs observed in the 31 C57BL/6J and the 21 BALB/c sequences is also present in the above set of genuine germline sequences. Two minor variability peaks are present in FR1 and one is found in FR3, however these do not translate into amino acid variability spikes, indicating that the nucleotide changes contributing to these variability peaks are mainly silent changes. The data presented in Figure 10.2 therefore confirm the patterns obtained in the two data sets presented in chapter 8 (i.e. the 31 C57BL/6J and 21 BALB/c sequences).

Codon-by-codon analysis of the genuine germline genes: The 30 unambiguous germline sequences used for the amino acid variability plot were also subjected to codon-by-codon analysis as described for the C57BL/6J and the BALB/c data sets (Tables 8.2 and 8.4).

Table 10.1. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 30 genuine germline VH186.2 related genes.

Sub-section of               Numbers of nucleotide differences resulting in:
V region (codons)            Replacement            Silent               Stop

FR1 (1-30)         Obs       32                     38                   0
                   Exp‡      48.5   46.1   45.8     18.1  20.2  20.8     3.4   3.7   3.5
                             ***
CDR1 (31-35)       Obs       18                     1                    0
                   Exp       16.0   15.0   14.7     1.3   2.1   2.3      1.7   2.0   2.0
FR2 (36-49)        Obs       29                     10                   0
                   Exp       27.8   25.9   25.5     8.0   8.9   9.2      3.1   4.2   4.3
CDR2a (50-58)      Obs       145                    14                   0
                   Exp       121.6  117.8  117.0    35.3  39.0  40.4     1.9   2.1   1.6
CDR2b (59-66)      Obs       43                     10                   1
                   Exp       44.2   41.2   40.8     6.0   9.5   10.6     3.7   3.3   2.6
FR3 (67-83)        Obs       41                     15                   0
                   Exp       41.1   39.5   39.2     13.2  14.7  15.2     1.7   1.8   1.6
Entire V region    Obs       308                    88                   1
                   Exp       299.2  285.5  283.0    81.9  94.4  98.5     15.5  17.1  15.6

For this analysis a VH186.2 related consensus sequence was determined for the entire 52 unique sequences (i.e. 31 from C57BL/6J, 21 from BALB/c). The consensus sequence differs from the germline VH186.2 sequence only at codons 43 (CAA), 50 (AAG), 54 (AGT), 59 (AAC) and 75 (TCC).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al, 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers. ' = p<0.01; * * = p<0.001. For stop codons, the χ² value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).

The codon-by-codon analysis of the 30 certified germline genes confirms the elevation in CDR2a and the depression in FR1 of the R:S ratio that is also evident among the 31 C57BL/6J (Table 8.2) and the 21 BALB/c sequences (Table 8.4). In addition, Table 10.1 also reveals a statistically significant deficit of stop codons among the 30 genuine germline sequences.

Phylogenetic analysis of 27 genuine germline genes: The sequences of the 27 certified germline genes described above were also subjected to the same phylogenetic analysis as shown in chapter 9. The results of this analysis are shown in Figure 10.3.

Figure 10.3. Dendrograms illustrating phylogenetic sequence relationships of the 27 genuine germline VH186.2 related sequences isolated from C57BL/6J and BALB/c DNA. Four different regions were subjected to analysis: the entire PCR fragment (upper left), the 5' non-transcribed region (upper right), the transcription unit (lower left), and the coding region (lower right). The diagram below each dendrogram indicates the region that was analyzed. Symbols: P = promoter; cap = transcription start site; L = leader sequences; V = V gene; ψ = pseudogene.

Four major clusters of related sequences are found in both the dendrograms for the putative transcription units and the coding regions: C57G8 / C57C16 / BALB14E / BALB5E / BALB16E / C57E6 / C57G9ψ, C57E35 / C57G45 / C57E3 / C57E44, C57E36 / C57E1 / BALB10 / VH186.2 / C57E22, and C57C23 / C57E38 / C57G10 / C57G11ψ. It is evident from Figure 10.3 that there is little conservation of the sequence relationships among these clusters in the dendrogram constructed from the non-transcribed 5' flanking regions of these sequences. Thus, the presence of different lineage relationships in the non-transcribed flanking regions and putative transcription units of the 31 C57BL/6J sequences (Fig. 9.1) and the 21 BALB/c sequences (Fig. 9.2) is confirmed by the above analysis of 27 certified germline genes.

Concluding remarks: The PCR control experiment described in this chapter indicates that the 52 VH186.2 related sequences presented in chapter 8 represent genuine germline genes. Furthermore, 27 full-length sequences and 30 putative coding regions that represent genuine germline genes could be identified. When these were subjected to the analyses outlined above they yielded virtually identical results, indicating that the patterns observed in chapters 8 and 9 reflect patterns present in germline VH genes.

11. Molecular analysis of VH205.12 related germline genes

Rationale and strategy: In order to determine if the patterns of DNA sequence variation and the phylogenetic relationships found amongst VH186.2 related germline genes are general properties of murine IgVH genes, germline genes belonging to the VH205.12 sub-family were analyzed using the same strategies as described for the VH186.2 related genes (see chapters 8 and 9). Generally, the data agree with the results obtained from the VH186.2 related sequences; however, interpretation of the results is limited by the smaller sample sizes. Due to time constraints it was not possible to isolate additional members of the VH205.12 sub-family.

Isolation of 20 VH205.12 related germline genes from C57BL/6J genomic DNA: PCR primers 14 and 12 were used to PCR amplify (40 cycles) DNA fragments containing approximately 520 bp of non-transcribed 5' flanking region and all but the 3' terminal 42 bases of the coding region of VH205.12 related germline genes from C57BL/6J DNA. The total length of the PCR fragments is approximately 968 bp, depending on the number of insertions and deletions present. 20 unique sequences were isolated from C57BL/6J DNA (Table 11.1).

Table 11.1. Isolation of 20 VH205.12 related germline genes from C57BL/6J DNA

| Unique sequence (a) | Source of genomic DNA (b) | PCR amplification (c) | Number of repeats (d) | Other information (e) |
|---|---|---|---|---|
| C57G1 | Liver (Germany) | Pfu#1 | 1 | |
| C57G2 | " | " | 9 (4G, 5C) | VH205.12† |
| C57G3 | " | " | | |
| C57G5 | " | " | | |
| C57G6 | " | " | | |
| C57G9 | " | " | | |
| C57G14 | " | Pfu#2 | 2 (1G, 1C) | |
| C57G15ψ | " | " | | Stop codon 880 |
| C57G18ψ | " | " | | Stop codon 880 |
| C57G22ψ | " | " | | Stop codon 880 |
| C57G26 | " | " | 8 (2G, 6C) | |
| C57G30ψ | " | " | | Stop codon 1016 |
| C57C2 | Liver (Canberra) | Pfu#3 | | |
| C57C9 | " | " | 3 (1G, 2C) | |
| C57C16 | " | " | | |
| C57C17ψ | " | " | | Base deletion 826 |
| C57C18 | " | " | 2 (2C)* | |
| C57C27 | " | Pfu#4 | | |
| C57C44ψ | " | " | 3 (3C)* | Stop codon 880 |
| C57C48ψ | " | " | | Base deletion 885 |

a. A list of the unique sequences that were isolated and sequenced. ψ indicates a pseudogene.
b. Germany refers to the C57BL/6J liver DNA provided by K. Rajewsky, Canberra refers to DNA isolated from mice obtained from the ABE in Canberra (see Materials and Methods). Liver tissue was taken from unimmunized adult mice.
c. Indicates which enzyme was utilized and which sequences were isolated from each independent amplification.
d. Indicates the number of times each unique sequence was isolated. The information in the brackets indicates the number of times it was isolated, and from which DNA it was isolated. G = C57BL/6J liver DNA provided by K. Rajewsky; C = DNA isolated from mice obtained from the ABE in Canberra. * indicates that the sequence was isolated from independent PCR amplifications.
e. The type of mutation that generated the pseudogene is indicated by reference to the nucleotide position at which the mutation occurred.
† The germline gene VH205.12 was first described by Dildrop et al, 1984.

Six of the 20 sequences were isolated from more than one independent PCR amplification and two of these were isolated 8 or 9 times. Seven of the 20 genes isolated are pseudogenes, five due to stop codons generated by point mutation and two due to frameshift mutations. It is interesting to note that four of the pseudogenes (C57G15, C57G18, C57G22 and C57C44) are generated by a TGG→TGA nucleotide change at position 880 and that the sequences of these pseudogenes seem to be conserved. This is in agreement with a recent report on human germline VH genes which found that the coding sequences of 24 out of 31 pseudogenes differed by only very few nucleotide substitutions (Haino et al, 1994).

DNA sequences of the 20 genes isolated from C57BL/6J DNA:

[Alignment of Figure 11.1. Only the VH205.12 reference row is reproduced, in rows of 100 nucleotides with the first position given at the left and without alignment gaps. The rows for the 19 other clones, which mark identity with VH205.12 by dots, lost their column positions in text extraction. Annotations above the reference row: Leader (row 601), CDR1 (row 801), CDR2 (row 901), primer 12 (row 1001).]

1    CTCTTTTGGT GACAATTTAA AACAGAATTT CAAAATCAGT ATGTGACAGT GACTAGTATA CTCTTAACAA TAATAAGTAA AAATTAAACA TTTTCCACAT
101  CACATCCATG TCATTACTTT TCTCATTTGC CTTCTTCCTT ATCAACATAT GACTAAATTC TAATTAAGAC ATTAAATCTT TTTAAACTGC ACTTAGCTAA
201  AGGGTATTTC CCTCATTATC ATCAGCATTA TCTTCATCAT CATCTTCATC TTCACAATTC ATTATCTGTA AATCAAAATG TGAACATAGC CCCAGAGTGA
301  CAAAACAAAC TTAGGCCAAA CACAGATTGA GAGATTTGTC CCTGTAGTTT CAAGAATACC AGCAGTGCAG GGCTCACAGA AAATGTATGG ATCCATTTCC
401  TCAGAGAGTT ATTGGATTTG GACTAGACTA TCCTGCTGCT TGACCTATGT ACCTTTAAGT CCTTCCTCTC CAGCTTTTCT TCATTCGGAT TGGTTATTAT
501  ATACAAAGTC CCCTGGTCAT GAATATGCAA AATACCTAAG TCTATGGTAG CTAAAAACAG GGATATCAAC ACCCTGAAAA CAACATATGT ACAATGTCCT
601  CACCACAGAC ACTGAACACA CTGACTCTAA CCATGGGATG GAGCTGGATC TTTCTCTTTC TCCTGTCAGG AACTGCAGGT AAGGGGCTCA CCAGTTCCAA
701  ATCTGAAGAA AAGAAATGGC TTGGGATGTC ACTGACATCC ACTCTGTCTT TCTCTTCACA GGTGTCCTCT CTGAGGTCCA GCTGCAACAA TCTGGACCTG
801  AGCTGGTGAA GCCTGGGGCT TCAGTGAAGA TATCCTGTAA GGCTTCTGGA TACACGTTCA CTGACTACTA CATGAACTGG GTGAAGCAGA GCCATGGAAA
901  GAGCCTTGAG TGGATTGGAG ATATTAATCC TAACAATGGT GGTACTAGCT ACAACCAGAA GTTCAAGGGC AAGGCCACAT TGACTGTAGA CAAGTCCTCC
1001 AGCACAGCCT ACATGGAGCT CCGCAGCCTG ACATCTGAGG ACTCTGCAGT CTATTACTGT

Figure 11.1. The sequences of 20 VH205.12 related germline genes isolated from C57BL/6J DNA. Seven of the sequences represent pseudogenes (italicised sequence names, see Table 11.1). The sequence for clone C57G2 is identical to the VH205.12 sequence (Table 11.1), and is thus labeled VH205.12. The putative TATA box, octamer and ATG start codon are overlined. The positions of the primers are indicated with asterisks. Symbols: | = putative splice sites; > = first codon of the coding region (GAG); * = putative transcription start site; . = sequence identity with the VH205.12 germline gene; - = base deletion. Base insertions are indicated by additions to the individual sequence. The numbering is as shown in Both et al, 1990.

Inspection of the sequences presented in Figure 11.1 reveals that a number of sequences with identical putative coding regions or transcription units but different 5' flanking regions were isolated. The putative coding regions of C57C18, C57C9, C57G14 and C57G3 are identical. Whereas C57G3 differs from the other three sequences immediately 5' of the putative coding region, C57C9, C57G14, and C57C18 are identical up to a point approximately 160 bp upstream of the putative cap site. Similarly, the putative coding regions of C57G6 and C57C27 are identical but differ in their upstream regions. It is highly unlikely that this can be explained by strand jumping events that occurred during PCR amplification because three of the sequences (C57C9, C57C18, and C57G14) were isolated from independent PCR amplifications (Table 11.1).
Moreover, the data presented in chapter 7 clearly illustrate that no crossover events could be detected in a number of control experiments. The sequence C57G26 was isolated a total of 8 times from a number of different PCR amplifications (Table 11.1). C57C16 differs from C57G26 by two nucleotides at positions 765 and 766, whilst the only difference between C57C17 and C57G26 is a deletion of a single nucleotide in the putative coding region (position 826) of C57C17. These sequences may be indicative of recent duplication events. From Figure 11.1 it is also apparent that, similar to the 2 sets of VH186.2 related sequences (Figs. 8.1 and 8.4), many coincident changes are present in the collection of 20 VH205.12 related sequences isolated from C57BL/6J DNA. However, in contrast to the VH186.2 sequence sets, the 20 VH205.12 related genes isolated from C57BL/6J DNA display much more sequence variability within the 70 bp including and surrounding CDR1 than they do in CDR2 (Fig. 11.1). In addition, the sequences also display some concentration of nucleotide variability immediately 5' of the priming site of primer 12. The fact that the sequences were PCR amplifiable suggests that this variability does not extend into the actual priming site itself.

All but two of the above sequences contain an A→T nucleotide change in the octamer motif which brings about a sequence identity with the octamer that is present in the VH186.2 promoter, which is known to be functional (Ballard and Bothwell, 1986). Only one sequence (C57C44) contains a point mutation (A→T) in the putative TATA box and none of the sequences contain any changes at the splice sites. It is also possible that some of these sequences may be pseudogenes due to crippling mutations present in the recombination signal sequences that are present downstream of the coding region. In addition, no obvious heptamer homologue is present in the promoter region; however, hybridomas expressing the VH205.12 gene have been isolated (Sablitzky and Rajewsky, 1984), thus the lack of a heptamer homologue in the promoter region does not necessarily result in the lack of transcription of the gene (see section 13.5a).

The homology plot shown in Figure 11.2 indicates that all the sequences share greater than 85 % homology and are thus members of the same gene family. The sequence homologies of the non-transcribed 5' flanking region sequences and the putative transcription units are very similar.
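How the homology values were computed is not spelled out in this chapter, so the following is only a minimal sketch of one common way to obtain a pairwise percent identity from aligned sequences. The function names, the gap rule (a gap opposite a base counts as a difference, columns gapped in both sequences are skipped) and the toy sequences are assumptions made for illustration; they are not taken from the thesis.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of compared alignment columns at which two sequences agree.

    Assumption: both strings come from the same alignment (equal length).
    Columns gapped in both sequences are skipped; a gap opposite a base
    is counted as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def pairwise_identities(aligned):
    """Percent identity for every pair of named, aligned sequences."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Toy aligned fragments, invented for illustration only.
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTAT",
    "seqC": "ACGAACG-AC",
}

for pair, pct in pairwise_identities(toy).items():
    print(pair, f"{pct:.1f}%")
```

Applied separately to the columns upstream and downstream of the cap site, a calculation of this kind would give the two distributions that a plot such as Figure 11.2 compares.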

[Figure 11.2 plot. Horizontal axis: % sequence homology, 85 to 100. Legend: 5' cap site; 3' cap site.]

Figure 11.2. Sequence homology plot of the 20 VH205.12 related sequences isolated from C57BL/6J DNA. Homologies are shown for the 5' non-transcribed flanking regions (5' cap) and for the putative transcription unit (3' cap).

Wu - Kabat nucleotide/amino acid variability plots for the 20 C57BL/6J sequences: The nucleotide and amino acid variability indices were calculated for the 20 VH205.12 related genes isolated from C57BL/6J DNA, as described in chapter 8.

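[Figure 11.3 plots. Left: variability against nucleotide position (-400 to 400). Right: variability against amino acid position (20 to 80). Region diagram below the plots: P, cap, L, V, with CDR1 and CDR2 marked.]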

Figure 11.3. Nucleotide (left) and amino acid (right) variability plots for the 20 VH205.12 related C57BL/6J sequences. The horizontal dotted line in the nucleotide variability plot indicates the level of variability where only one out of the 20 sequences contains a nucleotide change (2.105). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region included in the graph. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the cap site was assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader sequence; V = germline VH gene.

The nucleotide variability plot reveals that the region upstream of the putative coding region contains variability peaks as high or higher than those found in the coding region. However, it is important to note that the FRs of the putative coding region contain very little nucleotide variability except for the regions immediately surrounding CDR1 and the 3' end of the sequenced region. In addition, the variability peaks in the region upstream of the putative coding region are due to variability at single nucleotide positions, whilst the spikes in and around CDR1 are due to nucleotide variability at a number of sites spread over approximately 70 bp (Fig. 11.1). The overall level of nucleotide variability in the FRs is much lower than that present in the upstream regions (Fig. 11.3). This pattern is also present at the amino acid level. Five major variability peaks are present in the Wu-Kabat amino acid variability plot for the putative coding regions. The first immediately precedes CDR1, whilst the second is situated in CDR1. The third and fourth are present in CDR2, and the final one is at the very 3' terminus of the sequenced region. Two other variability spikes present in FR3 at the nucleotide level do not translate to the protein level. Although the nucleotide and amino acid variability distributions are not as clear-cut as for the VH186.2 sequences (Figs. 8.3 and 8.6), they do suggest positive selection for variability in and around CDR1 and negative selection for variability in the FRs.
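The variability index itself is defined in chapter 8; as a reminder, the standard Wu-Kabat index divides the number of different residues at a position by the frequency of the most common residue at that position. The sketch below assumes that definition, together with the convention stated in the figure captions that invariant positions are plotted as 0. Function names are illustrative, and treating an alignment gap as one more residue type is an assumption.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Wu-Kabat index for one alignment column.

    index = (number of different residues) / (frequency of the most common one)
    Invariant columns return 0.0, following the plotting convention of the
    figure captions. Gaps, if present, are counted as a residue type
    (assumption).
    """
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq


def variability_profile(aligned_seqs):
    """Variability index for every column of equal-length aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_seqs)]


# One sequence out of 20 differs at a position: 2 / (19/20)
print(round(wu_kabat_variability("A" * 19 + "G"), 3))  # 2.105

# One sequence out of 10 differs at a position: 2 / (9/10)
print(round(wu_kabat_variability("A" * 9 + "G"), 3))   # 2.222
```

The first value reproduces the dotted-line level quoted for the set of 20 sequences, and the second the level quoted for a set of 10 sequences.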

Codon-by-codon analysis of the putative coding regions of the 20 C57BL/6J sequences: The codon-by-codon analysis of the putative coding regions described in chapter 8 was also applied to the putative coding region sequences for the 20 VH205.12 related sequences isolated from C57BL/6J DNA.

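Table 11.2. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 20 VH205.12 related germline genes isolated from C57BL/6J DNA.

Numbers of nucleotide differences resulting in replacement, silent and stop changes. Each Exp cell lists the three expected values (left / center / right, see ‡).

| Sub-section of V region (codons) | | Replacement | Silent | Stop | Significance markers (as extracted) |
|---|---|---|---|---|---|
| FR1 (1-30) | Obs | 27 | 64 | 0 | |
| | Exp‡ | 62.6 / 60.1 / 58.6 | 23.6 / 25.8 / 25.9 | 4.7 / 5.2 / 6.6 | |
| CDR1 (31-35) | Obs | 26 | 2 | 0 | |
| | Exp | 23.0 / 22.3 / 21.6 | 3.7 / 4.0 / 5.2 | 1.2 / 1.7 / 1.2 | |
| FR2 (36-49) | Obs | 11 | 12 | 4 | |
| | Exp | 20.4 / 18.8 / 18.1 | 4.5 / 5.4 / 5.6 | 2.1 / 2.8 / 3.3 | **** |
| CDR2a (50-58) | Obs | 21 | 4 | 0 | |
| | Exp | 19.4 / 18.9 / 18.7 | 5.6 / 6.2 / 6.4 | 0.0 / 0.0 / 0.0 | |
| CDR2b (59-66) | Obs | 15 | 0 | 0 | |
| | Exp | 11.9 / 10.9 / 10.8 | 2.3 / 3.0 / 3.3 | 0.8 / 1.1 / 0.9 | ** ** |
| FR3 (67-83) | Obs | 29 | 19 | 1 | |
| | Exp | 35.7 / 34.4 / 33.8 | 11.8 / 12.9 / 13.1 | 1.5 / 1.7 / 2.2 | *** |
| Entire V region | Obs | 129 | 101 | 5 | |
| | Exp | 173.0 / 165.4 / 161.6 | 51.5 / 57.3 / 59.5 | 10.3 / 12.5 / 14.2 | ** *** |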

For this analysis a VH205.12 related consensus sequence was determined for the 20 unique sequences. The consensus sequence differs from the germline VH205.12 sequence only at codons 6 (CAG), 22 (TGC), and 28 (ACA).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al, 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers. ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).
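The comparison of observed with expected replacement and silent counts can be sketched as a two-category goodness-of-fit χ² test with 1 df. This is only an illustration under stated assumptions: the expected pair is rescaled so that it sums to the observed replacement + silent total, which may not be exactly how the values in Table 11.2 were obtained, and the function names are hypothetical. The FR1 counts and the first of the three expected pairs are taken from Table 11.2.

```python
import math


def p_value_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi_square_two_categories(observed, expected):
    """Goodness-of-fit chi-square for two categories (1 df).

    Assumption: the expected counts are rescaled to the observed total
    before the statistic is computed.
    """
    total_obs = sum(observed)
    total_exp = sum(expected)
    scaled = [e * total_obs / total_exp for e in expected]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, scaled))
    return stat, p_value_1df(stat)


# FR1 in Table 11.2: observed 27 replacement and 64 silent changes,
# expected 62.6 and 23.6 under the unbiased point mutation model.
stat, p = chi_square_two_categories((27, 64), (62.6, 23.6))
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
```

For FR1 the statistic lies far above the 1 df threshold for p<0.001, in line with the conservation of FR1 discussed in the text.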

The detailed codon analysis of the consensus sequence of the 52 VH186.2 related sequences described in chapter 8 revealed that the expected R:S ratio varied in the different sub-regions of the VH coding sequences (Tables 8.2, 8.4 and 10.1). As seen in Table 11.2, the expected R:S ratio is also not constant throughout the VH205.12 consensus coding sequence. The expected R:S ratio of the FRs ranges from 2:1 to approximately 4:1, depending on the model used to calculate the expected numbers. In CDR2a the expected ratio is approximately 3:1, whereas in CDR1 and CDR2b the expected R:S ratios range from 4:1 to 6:1 and from 3:1 to 5:1, respectively. Thus the expected R:S ratios for CDR1 and CDR2b are lower for the VH205.12 related sequences than they are in the VH186.2 related genes. Consistent with the higher expected R:S ratio of CDR1, concentration of nucleotide and amino acid variability is present in that region of the 20 sequences isolated from C57BL/6J DNA (Figs. 11.1 and 11.3). However, variability in these sequences seems to be concentrated in CDR2a rather than in CDR2b. Although this is similar to the concentration of variability in CDR2a of VH186.2 related sequences (chapter 8), the variability in this region of the VH205.12 related sequences is much lower than in the VH186.2 sequences. Table 11.2 indicates that there is statistically significant conservation of FR1 and FR2 in the 20 VH205.12 related sequences isolated from C57BL/6J DNA, as evidenced by the significantly elevated numbers of silent nucleotide changes and the decrease in amino acid replacement changes when compared to the expected numbers. Despite the increased nucleotide and amino acid variability found in and around CDR1 (Fig. 11.3), Table 11.2 suggests that this is not statistically different from the expected level of variability. Therefore, as was the case amongst the VH186.2 related sequences (chapter 8), the sequence composition of CDR1 of the VH205.12 related sequences isolated from C57BL/6J DNA also favours replacement substitutions, i.e. any nucleotide change is more likely to produce an amino acid replacement than a silent change.
Although the observed number of stop codons due to nucleotide substitutions was lower than expected, this was statistically significant for only two of the three methods for estimating the expected numbers. Thus, a higher proportion of the VH205.12 related sequences have accumulated nucleotide substitutions that result in stop codons than is evident amongst the VH186.2 related sequences (chapter 8, see below).
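The statement that sequence composition alone can favour replacement changes can be illustrated by enumerating every possible single-base substitution in a stretch of codons and classifying each as replacement, silent or stop-generating under the standard genetic code. The sketch below is not the procedure used for Table 11.2 (it weights all substitutions equally and ignores the observed number of changes), so its proportions are not expected to match the tabulated values. The CDR1 codons (31-35) are read from the VH205.12 reference row of Figure 11.1 as GAC TAC TAC ATG AAC, which is an assumption about the reading frame of the extracted sequence; function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(codon, mutant):
    """Classify a codon change as 'replacement', 'silent' or 'stop'."""
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutant]
    if after == "*" and before != "*":
        return "stop"
    return "silent" if after == before else "replacement"


def count_possible_changes(coding_seq):
    """Count all single-base substitutions in an in-frame sequence by type."""
    counts = {"replacement": 0, "silent": 0, "stop": 0}
    usable = len(coding_seq) - len(coding_seq) % 3
    for start in range(0, usable, 3):
        codon = coding_seq[start:start + 3]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[classify_change(codon, mutant)] += 1
    return counts


# The two stop-generating changes reported for the pseudogenes.
print(classify_change("TGG", "TGA"))  # stop
print(classify_change("GGA", "TGA"))  # stop

# CDR1 codons 31-35 as read from the reference row (assumption).
cdr1 = "GACTACTACATGAAC"
print(count_possible_changes(cdr1))
```

Of the 45 possible single-base changes in these five codons, 37 are replacements, 4 are silent and 4 generate a stop codon, so even without any selection a random change in CDR1 is far more likely to alter the amino acid than to be silent.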

Isolation of 10 VH205.12 related germline genes from BALB/c genomic DNA: VH205.12 related germline genes were also amplified from BALB/c DNA as described above. Due to time limitations it was only possible to isolate and sequence 10 unique sequences.

Table 11.3. Isolation of 10 VH205.12 related germline genes from BALB/c DNA

| Unique sequence (a) | Source of genomic DNA (b) | PCR amplification (c) | Number of repeats (d) | Other information (e) |
|---|---|---|---|---|
| BALB6 | Liver (Canberra) | Pfu#1 | 1 | |
| BALB8 | " | " | 1 | |
| BALB9 | " | " | 4 (3-1, 1-2) | |
| BALB11 | " | " | 1 | |
| BALB13ψ | " | " | 1 | Stop codon 848 |
| BALB17 | " | " | 4 (3-1, 1-2) | |
| BALB19 | " | " | 5 (3-1, 2-2) | |
| BALB58ψ | " | Pfu#2 | 1 | Stop codon 848 |
| BALB67 | " | " | 1 | |
| BALB71 | " | " | 1 | |

a. A list of the unique sequences that were isolated and sequenced. ψ indicates a pseudogene.
b. These genes were amplified from only one type of genomic DNA: DNA extracted from liver tissue obtained from a BALB/c mouse from the ABE in Canberra.
c. Indicates which enzyme was utilized and which sequences were isolated from each independent PCR amplification.
d. Indicates the number of times each sequence was isolated. The number preceding the dash indicates the number of times the sequence was isolated and the number following the dash indicates one of the two PCR amplifications.
e. The type of mutation that generated the pseudogene is indicated by reference to the nucleotide position at which it occurred.

Three of the sequences were isolated from more than one independent amplification. Two of the sequences are pseudogenes due to defects in the putative coding regions. Both contain a GGA→TGA nucleotide substitution that introduces a stop codon. The VH205.12 gene was not isolated from BALB/c DNA; however, this may be due to the low number of sequences that were isolated, which does not allow statistically valid comparison to the set of 20 VH205.12 related C57BL/6J sequences described above.

DNA sequences of the 10 genes isolated from BALB/c DNA:

[Alignment of Figure 11.4. Only the VH205.12 reference row is reproduced, in rows of 100 nucleotides with the first position given at the left and without alignment gaps. The rows for the 10 BALB/c clones, which mark identity with VH205.12 by dots, lost their column positions in text extraction. Annotations above the reference row: primer 14 (row 1), Leader (row 601), CDR1 (row 801), CDR2 (row 901), 3' primer (row 1001).]

1    CTCTTTTGGT GACAATTTAA AACAGAATTT CAAAATCAGT ATGTGACAGT GACTAGTATA CTCTTAACAA TAATAAGTAA AAATTAAACA TTTTCCACAT
101  CACATCCATG TCATTACTTT TCTCATTTGC CTTCTTCCTT ATCAACATAT GACTAAATTC TAATTAAGAC ATTAAATCTT TTTAAACTGC ACTTAGCTAA
201  AGGGTATTTC CCTCATTATC ATCAGCATTA TCTTCATCAT CATCTTCATC TTCACAATTC ATTATCTGTA AATCAAAATG TGAACATAGC CCCAGAGTGA
301  CAAAACAAAC TTAGGCCAAA CACAGATTGA GAGATTTGTC CCTGTAGTTT CAAGAATACC AGCAGTGCAG GGCTCACAGA AAATGTATGG ATCCATTTCC
401  TCAGAGAGTT ATTGGATTTG GACTAGACTA TCCTGCTGCT TGACCTATGT ACCTTTAAGT CCTTCCTCTC CAGCTTTTCT TCATTCGGAT TGGTTATTAT
501  ATACAAAGTC CCCTGGTCAT GAATATGCAA AATACCTAAG TCTATGGTAG CTAAAAACAG GGATATCAAC ACCCTGAAAA CAACATATGT ACAATGTCCT
601  CACCACAGAC ACTGAACACA CTGACTCTAA CCATGGGATG GAGCTGGATC TTTCTCTTTC TCCTGTCAGG AACTGCAGGT AAGGGGCTCA CCAGTTCCAA
701  ATCTGAAGAA AAGAAATGGC TTGGGATGTC ACTGACATCC ACTCTGTCTT TCTCTTCACA GGTGTCCTCT CTGAGGTCCA GCTGCAACAA TCTGGACCTG
801  AGCTGGTGAA GCCTGGGGCT TCAGTGAAGA TATCCTGTAA GGCTTCTGGA TACACGTTCA CTGACTACTA CATGAACTGG GTGAAGCAGA GCCATGGAAA
901  GAGCCTTGAG TGGATTGGAG ATATTAATCC TAACAATGGT GGTACTAGCT ACAACCAGAA GTTCAAGGGC AAGGCCACAT TGACTGTAGA CAAGTCCTCC
1001 AGCACAGCCT ACATGGAGCT CCGCAGCCTG ACATCTGAGG ACTCTGCAGT CTATTACTGT

Figure 11.4. The sequences of 10 VH205.12 related germline genes isolated from BALB/c DNA. Two of the sequences represent pseudogenes (italicised sequence names, see Table 11.3). The putative TATA box, octamer and ATG start codon are overlined. The positions of the primers are indicated with asterisks. Symbols: | = putative splice sites; > = first codon of the coding region (GAG); * = putative transcription start site; . = sequence identity with the VH205.12 germline gene; - = base deletion. Base insertions are indicated by additions to the individual sequence. The numbering is as shown in Both et al, 1990.

As with the other sets of sequences it is apparent that coincident changes are present at a number of nucleotide positions. There seems to be less nucleotide variability in and around CDR1 and in CDR2 than in the set of 20 sequences isolated from C57BL/6J DNA (Fig. 11.1), but this may merely be a reflection of the low number of sequences. The concentration of nucleotide variability immediately upstream of the 3' primer that was present in the 20 sequences shown in Figure 11.3 is absent from the above sequences. None of the 10 VH205.12 related sequences isolated from BALB/c DNA match any of the 20 sequences that were isolated from C57BL/6J. This is similar to the VH186.2 related sequences shown in chapter 8, which also showed no overlap between the sequences isolated from the two different mouse strains. Two pairs of sequences, BALB13 / BALB58 and BALB11 / BALB19, possess identical putative coding regions but differ in other parts of the putative transcription unit and the non-transcribed 5' flanking region. Only one of these, BALB19, was isolated from independent PCR amplifications; however, both BALB13 and BALB58 were isolated from different amplifications.

All of the 10 sequences shown in Figure 11.4 contain an A→T change in the putative promoter which brings about sequence identity with the octamer found in the VH186.2 promoter, which is known to be a functional promoter (Ballard and Bothwell, 1986). None of the sequences contain any changes in the putative TATA boxes or splice sites, thus none of these sequences contain any obvious defects in the elements that are involved in transcription or splicing. However, it cannot be excluded that some of these sequences may contain crippling mutations in the control elements that are present downstream of the coding sequence.

The 10 BALB/c sequences share greater than 93 % homology and are thus members of the same gene family. The sequence homologies of the putative transcription units appear to be somewhat higher than those of the non-transcribed 5' flanking region sequences; however, this may once again reflect the small sample size.

[Figure 11.5 plot. Horizontal axis: % sequence homology, 93 to 99. Legend: 5' cap site; 3' cap site.]

Figure 11.5. Sequence homology plot of the 10 VH205.12 related sequences isolated from BALB/c DNA. Homologies are shown for the 5' non-transcribed flanking regions (5' cap) and for the putative transcription unit (3' cap).

Wu - Kabat nucleotide/amino acid variability plots for the 10 BALB/c sequences: A nucleotide variability plot was constructed for the entire PCR amplified regions and an amino acid variability plot was generated for the putative coding regions of the 10 VH205.12 related sequences isolated from BALB/c DNA (Fig. 11.6).

[Figure 11.6 plots. Left: variability against nucleotide position (-600 to 600). Right: variability against amino acid position (0 to 100). Region diagram below the plots: P, cap, L, V, with CDR1 and CDR2 marked.]

Figure 11.6. Nucleotide (left) and amino acid (right) variability plots for the 10 VH205.12 related BALB/c sequences. The horizontal dotted line in the nucleotide variability plot indicates the level of variability where only one out of the 10 sequences contains a nucleotide change (2.222). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region included in the graph. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the cap site was assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader sequence; V = V gene.

Due to the low number of sequences it is difficult to draw any firm conclusions from the variability plots shown in Figure 11.6. The nucleotide variability peaks in the putative coding region are lower than those present in the non-transcribed 5' flanking regions; however, more sequences need to be added to the set in order to determine whether the conservation of FR regions seen in the 20 sequences isolated from C57BL/6J DNA (Fig. 11.1) is also present in the above 10 sequences. If this pattern were to hold for a larger set of sequences, then it would be expected that the gaps present in the 5' flanking region, but not those in the putative coding region, would fill as more sequences are added. No attempt was made to subject the 10 VH205.12 related sequences isolated from BALB/c DNA to a detailed codon-by-codon analysis, since the number of nucleotide differences from the consensus is probably not large enough for a statistically valid analysis.

Concluding remarks: Although more VH205.12 related sequences need to be isolated, the above data nevertheless indicate that non-random concentration of sequence variability in the germline CDRs is also a feature of the VH205.12 related sequences. The proportion of pseudogenes was higher in the VH205.12 sub-family than it was in the VH186.2 sub-family, but among the 20 C57BL/6J sequences the observed number of stop codons that were generated by point mutation was still significantly lower than expected under a random point mutator model.

12. Phylogenetic analysis of VH205.12 related germline genes

Rationale and strategy: In order to assess whether different lineal relationships between the 5' non-transcribed flanking regions and putative transcription units also exist among the VH205.12 related sequences, they were subjected to the phylogenetic analysis described in chapter 9.

Dendrograms for the 20 C57BL/6J sequences:


Figure 12.1. Dendrograms illustrating phylogenetic sequence relationships of the 20 VH205.12 related sequences isolated from C57BL/6J DNA. Four different regions were subjected to analysis: the entire PCR fragment (upper left), the 5' non-transcribed region (upper right), the transcription unit (lower left), and the coding region (lower right). The sequence for clone C57G2 is identical to the VH205.12 sequence (Table 11.1), and thus is labeled VH205.12. The diagram below each dendrogram indicates the region that was analyzed. Symbols: P = promoter; cap = transcription start site; L = leader sequences; V = V gene; ψ = pseudogene.

The above analysis indicates that sequence relationships differ between the dendrograms for the putative transcription units and the 5' non-transcribed flanking regions (Fig. 12.1). Thus, sequences C57C-9, C57G-14, C57C-18, C57G-6, C57C-27, C57G-3, C57G-9, and VH205.12 are on the same branch in the dendrograms for the putative transcription and coding regions. However, in the dendrogram for the 5' flanking region many of these lineal relationships have been completely reshuffled. In addition, two pairs of pseudogenes (C57-44ψ and C57G-22ψ, C57G-15ψ and C57G-18ψ) are found on the same branch in the dendrogram for the transcribed region but on different branches in the dendrogram for the 5' flanking region. Therefore, as was the case among the VH186.2 related sequences (chapter 9), lineal relationships differ between the transcription unit sequences and the sequences of the 5' non-transcribed flanking regions.

Dendrograms for the 10 BALB/c sequences:


Figure 12.2. Dendrograms illustrating phylogenetic sequence relationships of the 10 VH205.12 related sequences isolated from BALB/c DNA. Four different regions were subjected to analysis: the entire PCR fragment (upper left), the 5' non-transcribed region (upper right), the transcription unit (lower left), and the coding region (lower right). The diagram below each dendrogram indicates the region that was analyzed. Symbols: P = promoter; cap = transcription start site; L = leader sequences; V = V gene; ψ = pseudogene.

Although probably too few sequences were obtained to draw any firm conclusions, the sequence relationships in the dendrograms obtained for the putative transcribed and coding regions are more similar than those found in the dendrograms generated for the 5' flanking regions and the putative transcription units. Therefore, the above analyses of the VH205.12 related sequences support the conclusions drawn from the phylogenetic analyses of the VH186.2 related sequences, i.e. the data indicates that the putative transcription/coding units may have been targeted by hyper-recombination events.
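The comparison made above, namely whether the closest relatives of a sequence are the same in the 5' flanking region as in the transcription unit, can be illustrated with a much simpler calculation than the dendrogram method of chapter 9. The sketch below is not that method: it assumes pre-aligned, equal-length sequences, uses the plain proportion of differing sites as the distance, and runs on invented toy sequences with hypothetical names.

```python
def p_distance(a, b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nearest_neighbour(seqs, start, end):
    """For each named sequence, find the closest other sequence within [start, end)."""
    result = {}
    for name, seq in seqs.items():
        candidates = [
            (p_distance(seq[start:end], other_seq[start:end]), other)
            for other, other_seq in seqs.items()
            if other != name
        ]
        result[name] = min(candidates)[1]
    return result

# Toy data (hypothetical): positions 0-7 stand for the 5' flank, 8-15 for the transcription unit.
toy = {
    "s1": "AAAAAAAA" + "CCCCCCCC",
    "s2": "AAAAAAAT" + "GGGGGGGG",
    "s3": "TTTTTTTT" + "CCCCCCCG",
    "s4": "TTTTTTTA" + "GGGGGGGC",
}
flank_pairs = nearest_neighbour(toy, 0, 8)
unit_pairs = nearest_neighbour(toy, 8, 16)
print(flank_pairs)  # s1 pairs with s2, s3 with s4
print(unit_pairs)   # s1 pairs with s3, s2 with s4
```

In the toy data the pairings change completely between the two regions, which is the kind of reshuffling described for Figures 12.1 and 12.2.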

Distribution of insertions and deletions: A total of 14 unique nucleotide insertion or deletion events were found in the 30 sequences from both data sets. The distribution of these within the PCR amplified region is shown in Figure 12.3.

Figure 12.3. Distribution of 14 unique insertion and deletion events found in the 30 VH205.12 related germline VH genes. The data are grouped in 50 bp intervals. The relative positions of CDR1 and CDR2 are indicated by black boxes in the diagram below the graph. The cap site is assigned position 0. Symbols: P = promoter; cap = transcription start site; L = leader region; V = germline VH gene.

It is apparent that the clustering of insertion and deletion events around the cap site of the 52 VH186.2 related sequences (see Fig. 9.3) is not present in the set of VH205.12 related sequences. Two concentrations of insertions and deletions are present in the regions 150 - 200 and 250 - 300 nucleotides upstream of the cap site. However, these 'peaks' of insertion and deletion events are less pronounced than the one centered on the cap site of VH186.2 related sequences. Therefore, although the phylogenetic analyses of the VH205.12 related sequences support the proposition that the transcription unit may be the target for hyper-recombination events, the distribution of insertions and deletions differs markedly from that seen among the VH186.2 related sequences. However, since most of the insertions and deletions contributing to the peak present at the cap site in the 52 VH186.2 related sequences (Fig. 9.3) were obtained from BALB/c sequences, it is possible that as more VH205.12 related BALB/c sequences become available, a more prominent peak of insertion/deletion events will become apparent around the cap site (discussed further in section 13.5a).
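The grouping of insertion and deletion events into 50 bp intervals relative to the cap site can be sketched as follows. The event positions used here are invented for illustration and are not the 14 events reported above; the function name is likewise hypothetical.

```python
from collections import Counter

def bin_events(positions, width=50):
    """Count events per interval; position 0 is the cap site, negative values lie upstream.

    Each key is the lower bound of its interval, e.g. -200 covers -200 to -151.
    """
    counts = Counter((pos // width) * width for pos in positions)
    return dict(sorted(counts.items()))

# Hypothetical event positions (bp relative to the cap site).
toy_positions = [-290, -260, -180, -160, -155, 10, 120]
histogram = bin_events(toy_positions)
print(histogram)  # {-300: 2, -200: 3, 0: 1, 100: 1}
```

With as few as 14 events spread over several hundred base pairs, most bins hold zero to three events, so apparent peaks should be read with caution.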

Concluding remarks: Although fewer VH205.12 related sequences were available for study, the preliminary results suggest that the 5' flanking regions and the putative transcribed regions of these sequences have also evolved differently, indicating that hyper-recombination events targeted to the transcription unit may also have contributed to the evolution of this sub-family. The distribution of insertions and deletions among these genes is inconclusive and awaits a larger sample size.

13. DISCUSSION AND CONCLUSIONS

Preamble: Although the data presented in chapter 4 did not allow unambiguous identification of DNA and RNA sequences that were isolated from the same cell, a number of interesting observations regarding VH-D and D-JH junctional sequences were made. These observations will be discussed in the context of their possible role during B cell ontogeny and/or antigenic selection. The preliminary results shown in chapter 5 indicate that the B cell activation system used needs to be developed further if it is to be used to efficiently activate single antigen-specific B cells isolated from hyperimmune spleen. Possible future strategies will be discussed. The results obtained from the experiments described in chapter 6 directly address the mechanism of somatic hypermutation. Although the results do not allow unambiguous identification of the mechanism, they allow the definition of more stringent requirements that need to be met by any putative model. Thus, a number of models that have been proposed to explain somatic hypermutation will be evaluated in the light of the results obtained in chapter 6. Any study involving the amplification of closely related sequences must take into account the fact that some of the sequences may result from PCR strand jumping events. This issue was addressed in chapter 7, and the results indicate that it is possible to reduce these in vitro artifacts to a negligible level. The results from the analyses of the related germline VH genes presented in chapters 8-12 have important implications for the evolution of germline IgV genes. However, before discussing which evolutionary mechanisms may best explain the sequence patterns observed among the murine VH genes analyzed in this thesis, the analyses described in chapters 8-12 will be applied to germline IgV genes from other vertebrate species in order to determine whether the murine VH sequence patterns described above are general features of vertebrate germline IgV genes.

13.1 Comparison of RNA and DNA sequences isolated from the same pool of anti-NP splenic B cells

Deleterious somatic mutations in the literature: It is thought that the process of somatic hypermutation is operating in the sIg⁻ centroblasts that are proliferating in the dark zone of germinal centers (MacLennan et al, 1991, 1992; and see section 1.2c). Following somatic hypermutation these cells express sIg so that they can be selected on the basis of their ability to bind antigen. Any cells that fail to bind the antigen presented by the FDCs undergo apoptosis (Liu et al, 1989). It was therefore interesting to note that a number of laboratories managed to isolate germinal center or splenic V region sequences that contained mutations that introduced stop codons, or that would result in the loss of antigen specificity (Apel and Berek, 1990; Rada et al, 1991; Berek et al, 1991; Weiss et al, 1992). V region sequences with deleterious mutations were isolated from both primary (Apel and Berek, 1990; Rada et al, 1991; Berek et al, 1991; Weiss et al, 1992) and secondary responses (Rada et al, 1991). One of these studies suggested that somatically mutated germinal center B cells may accumulate further somatic mutations whilst expressing sIg (Rada et al, 1991). This would imply that a cell can undergo antigenic selection based on its sIg whilst still being mutated. Therefore, a B cell that survived antigenic selection and is on its way out of the germinal center could replace the surface receptor that enabled it to survive with another carrying additional mutations. It was previously shown that many somatic mutations are likely to be deleterious or lethal, especially when additional mutations are introduced into a previously mutated V region (Chen et al, 1991). Thus, a prediction of the proposition put forth by Rada and colleagues is that a significant number of somatically mutated peripheral B cells should express IgV regions that contain deleterious mutations. However, this is exactly the type of situation that the germinal center has evolved to avoid. Two alternative explanations could account for the above data. First, it could be assumed that multiple somatically mutated copies of the rearranged V region can exist in any one B cell. Indeed, two of the models for somatic hypermutation predict that a somatically mutating cell should contain multiple copies of the V region genes with different mutations.
The model proposed by Manser (1990; and see section 1.2c) predicts that these differences should exist at the DNA level, whereas the model proposed by Steele and Pollard (1987; and see section 1.2c) predicts they should be present at the RNA level. Thus, according to these models it may be possible that one (or more) of the mutated copies of the V region may possess deleterious or lethal mutations. Support for this explanation comes from the interesting finding that hybridomas expressing multiple transgene copies, each differentially mutated, seem to be able to upregulate expression of the most advantageous copy, and downregulate the other copies (Lozano et al, 1993). This suggests that B cells contain an as yet unknown mechanism by which they can distinguish different somatically mutated copies of a V region. Although it could be argued that this phenomenon may be due to the integration of multiple copies of the transgene, it is difficult to conceive how B cells could develop a highly sophisticated strategy for the recognition and selection or downregulation of individual transgene copies within the time it takes to generate and grow hybridoma cells. Second, it could be argued that the sequences containing detrimental mutations were isolated from cells which no longer mutate but have not yet been selected by the antigen presented by the FDCs in the basal light zone of the germinal center. In support of this, stop codons generated by somatic mutations have only been reported in studies where the V regions were PCR amplified from the DNA or RNA of germinal center B cells (Rada et al, 1991; Berek et al, 1991; Weiss et al, 1992). In addition, a study that analyzed the V regions expressed by hybridomas that did not bind the eliciting antigen detected mutations that abrogated antigen specificity but no frameshift mutations or stop codon-generating mutations (Apel and Berek, 1990).

Characterization of NP-specific B cells: In order to allow direct sequence comparison of the RNA and DNA from the same cells, NP-specific B cells were isolated from the spleen of hyperimmunized mice and both the RNA and the DNA of these cells was isolated as described above. The sequences in Figures 4.1 and 4.2 demonstrate that the methods utilized indeed resulted in the isolation of B cells expressing somatically mutated derivatives of the VH186.2 gene which is characteristic of anti-NP B cells (Bothwell et al, 1981; Cumano and Rajewsky, 1985, 1986). The use of NIP-coated Dynabeads for the isolation of these cells does not allow determination of light chain usage by the cells. Due to the specificity of the primers used for reverse transcription of the RNA and amplification of the cDNA, it is possible to state that the RNA sequences were obtained from IgG1+ B cells. However, H chain usage could not be determined for the DNA sequences since the primers used were specific for the JH genes. Thus, although the advantage of the PCR is that it allows rapid and specific amplification of desired sequences, a key disadvantage is that it only allows very limited characterization of the target cell(s). This could be overcome by using multiple-parameter flow cytometry to isolate the cells; however, for this experiment the more basic procedure used for the isolation of the desired cells was sufficient.

Clonal relationships among NP-specific B cells: Blier and Bothwell (1987) established that it is possible to isolate somatically mutated B cells with identical VH-D-JH joins, but different somatic mutations, from individual immunized mice. Indeed, this has since been commonly used to identify somatic mutants originating from the same germinal center (eg. see Jacob et al, 1991a, 1992). It is apparent from Figures 4.1 and 4.2 that three clonally related sets of B cells were isolated in this work. Clonally related members of family 1 were isolated from both the RNA and the DNA of the B cells, whereas members of family 2 and 3 were respectively isolated from the RNA or the DNA only. 12 of the 16 cDNA and 5 of the 8 DNA sequences belong to a clonally related family. This is somewhat surprising, since this degree of clonal relatedness would only be expected if the cells were isolated from a single mouse. Therefore, the experimental conditions resulted in the isolation of a lower than expected variety of NP-specific B cells.

There are a number of possible explanations for this. First, it could be due to primer bias of one or more of the primers used in this work, or alternatively at least one of the priming sites was heavily mutated in many B cells. However, the latter explanation is unlikely since the primers were specific for regions that are not normally subject to high rates of mutation. Second, some of the mice may have had a prolific anti-NP response, thus contributing many of the B cells that were isolated. Third, a survey of published anti-NP VH-D-JH sequences revealed that in one study (McHeyzer-Williams et al, 1993) two sequences with identical VH-D-JH junctions were isolated from different mice at different times following primary immunization (see below). Finally, it was recently shown that PALS-Foci may provide the founder B cells for germinal center formation (Jacob and Kelsoe, 1992; and see section 1.2c). Antigen-specific B cells undergo proliferation without mutation in the PALS-Foci, thus it is reasonable to assume that several B cells from the same clone (i.e. identical VH-D-JH junctions) may exit the PALS-Foci and initiate the formation of a germinal center each. In this case, clonally related sequences may make up a larger than expected proportion of somatically mutated splenic B cells.
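The assignment of sequences to tentative clonal families on the basis of identical VH-D-JH joins amounts to grouping clones by their junctional sequence. The sketch below shows that bookkeeping only; the clone names and junction strings are hypothetical, and a shared junction is treated as a working criterion rather than proof of common origin.

```python
from collections import defaultdict

def group_by_junction(records):
    """Group clone names by junction string; keep only junctions shared by two or more clones."""
    families = defaultdict(list)
    for clone, junction in records:
        families[junction].append(clone)
    return {junction: clones for junction, clones in families.items() if len(clones) > 1}

# Hypothetical clones: (name, VH-D-JH junction written as one string).
toy_records = [
    ("x1", "GGC|DFL16.1|JH2"),
    ("x2", "GGC|DFL16.1|JH2"),
    ("x3", "TACG|DFL16.1|JH1"),
    ("x4", "GGG|DFL16.1|JH4"),
]
candidate_families = group_by_junction(toy_records)
print(candidate_families)  # {'GGC|DFL16.1|JH2': ['x1', 'x2']}
```

Whether such a group really descends from one precursor cell still has to be judged from the somatic mutations and the source animal.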

Antigenic selection of VH-D-JH junctions: Surprisingly, cDNA clones c1 and c34, which are not clonally related, contain the same N sequence at their VH-D junctions (Fig. 4.1). During a search of previously published anti-NP VH region sequences, other examples of identical N regions shared by different VH-D junctions were found (Fig. 13.1). Thus, cDNA clone c5 possesses a non-templated GGG trinucleotide at its VH-D junction that is also present at the VH-D junction of a number of sequences published by Blier and Bothwell (1987). Similarly, several of the anti-NP VH sequences published recently by McHeyzer-Williams et al. (1993) also contain identical N regions at their VH-D joins. It is also interesting to note that sequences 8 (Figure 2 in McHeyzer-Williams et al, 1993) and 12 (Figure 8 in McHeyzer-Williams et al, 1993) have identical VH-D-JH junctions despite being isolated from different mice at different times after immunization (Fig. 13.1).

Region:     VH186.2         DFL16.1                    JH-2
Germline:   TATTGTGCAA GA   TATTACTACG GTAGTAGCTA C    TACTTTGACT
c1 (a):     GGC — G.. —. .
c34 (a):    T.. .. GGC —A CTAC
c5 (a):     GGG AG GCTATGG
H1-19 (b):  ..C GGG A AGAGACTAC ,T
H1-9 (b):   T... .G GGG — C.TGGCA C
H1-55 (b):  -T G GGG — TGGCA C
16 (c2):    TACCCC .T.
16 (c4):    TACCCC ...
17 (c4):    TACCCC C.
Sc7:        TACCCC
8 (c2):     GGT C
12 (c8):    GGT C
2 (c8):     GGT ATC

Figure 13.1. VH-D-JH sequences isolated from B cells participating in the anti-NP response and containing identical N regions. The sequences are published in this thesis (a), or were previously published (b: Blier and Bothwell, 1987; c: McHeyzer-Williams et al, 1993 [c2, c4, c7, c8 = Figures 2, 4, 7, 8 respectively]). The sequences are compared to the 3' end of VH186.2, DFL16.1 and to the 5' end of JH-2 (cDNA clone c5 is rearranged to JH-4). Symbols: . = identity to germline sequence; - = deletion of a nucleotide; blank spaces indicate sequences not shown in the original publication. Additional nucleotides at all VH-D and some D-JH joins are the N regions.

Another unexpected finding was that some sequences shared a common VH-D join but differed at the D-JH join, or vice versa. cDNA clone c5 and the family 2 cDNA sequences share the same D-JH junction but differ at the VH-D junction, whilst the VH-D junction of DNA clone D4 is identical to that found in the family 3 DNA sequences (Fig. 13.2). Comparison of the sequences shown in Figures 4.1 and 4.2 with other sequences from the literature reveals that cDNA clone c44 also shares the same VH-D join with members of family 3 published by Blier and Bothwell (1987), however it contains a different D-JH join. These and other sequences with similar features are shown in Figure 13.2.

PFL16.1 JH-2 VH186.2 TATTGTGCAA GA TATTACTACG GTAGTAGCTA C TACTTTGACT a D4 TACG _ .ATCC TGGTACTTCG Group 1 D36a TACG D25a TACG „ D42a TACG M

a c44 GGA . . . .AATT. . . Group 2 Hl-291 GGA ...CCCAC Hl-39t GGA ...ACCAC Ml-45t GGA Hl-59k GGA

...G.TGGTT ACTAC .G. . Group 3 . . .G.TGGTT ACTAC • GAGAGGA JH-4 TATGCTATGG Group 4 c5a GGG AG ell* AG cl6s AG clO2 AG c40a AG c43a . . . AG

Figure 13.2. VH-D-JH sequences isolated from B cells participating in the anti-NP response that contain identical VH-D or D-JH junctions, published in this thesis (a), or previously published (b: Blier and Bothwell, 1987; c: Jacob et al, 1992, Figure 3D). The sequences are compared to the 3' end of VH186.2, DFL16.1 and to the 5' end of JH-2 or JH-4 (DNA clone D4 is rearranged to JH-1). Symbols: . = identity to germline sequence; - = deletion of a nucleotide; blank spaces indicate sequences not shown in the original publication. Additional nucleotides at all VH-D and some D-JH joins are the N regions.

Figure 13.2 reveals that the sequences can be placed into four groups. Group 1 consists of sequences with identical VH-D but different JH rearrangements, group 2 contains sequences with identical VH-D but different D-JH, group 3 includes sequences with identical 5' JH termini but different D segments, and group 4 is comprised of sequences with identical D-JH but different VH-D. In addition, two anti-NP antibodies with identical VH-D-JH junctions and a single nucleotide change in the VH gene were isolated from different C57BL/6J mice (McHeyzer-Williams et al, 1993; and see Figure 13.1). The presence of identical N regions and/or identical VH-D or D-JH junctions in different B cells or mice is probably due to antigenic selection for particular amino acids in CDR3 which are not encoded by the germline sequence. Such residues may be introduced into CDR3 by the enzyme TdT (Alt and Baltimore, 1982), or they may be a result of new codons that are generated when nucleotides are removed from the ends of the coding regions (Alt and Baltimore, 1982). Certain non-germline encoded amino acids may enhance antibody-antigen interactions, they may affect overall Ig structure and/or signal transduction or they may play a role at some stage of B cell development (Sharon, 1988; Alzari et al, 1990; Chen et al, 1992). An alternate explanation that could account for the presence of identical N regions in different B cells may simply be the fact that 80 - 90 % of the nucleotides found in N regions are GC (Lieber et al, 1988a). Therefore, especially when N regions are short, the nucleotide preference of the enzyme TdT is likely to generate some identical N regions in different VH-D-JH junctions. It is well established that the intense antigenic selection acting on somatically mutated B cells in the germinal center can result not only in the conservation of critical amino acid residues (eg. see Sharon, 1988; Alzari et al, 1990; Parhami-Seren et al, 1993) but also the selection for specific amino acid replacement mutations (eg. see Berek and Milstein, 1987; Allen et al, 1988). In the anti-Ars response, identical blocks of multiple amino acid replacement mutations in CDR2 were found in two hybridomas that were isolated from different A/J mice.
Since no potential germline donors for these mutations were found, it was argued they arose as a result of intense antigenic selection (Wysocki et al, 1990). Some germline encoded residues that form part of CDR3 are also highly conserved where they interact with the antigen (Alzari et al, 1990). Additionally, the sequence of CDR3 was also found to be critical in endowing an anti-phosphotyrosine antibody with a high affinity for the antigen (Ruff-Jamison and Glenney, 1993), and in the anti-NP response it was found that antigenic selection favours antibodies with short CDR3 (McHeyzer-Williams et al, 1993). Furthermore, it was recently shown that positive selection of VH12-expressing B cells in neonatal mice is critically dependent on the length and sequence of CDR3 of the H chain (Clarke and McCray, 1993). Therefore, CDR3 sequences can contribute to or determine the success of a B cell in an immune response. It has been generally assumed that B cells that contain identical VH-D-JH junctions are derived from the same precursor cell. However, the data presented in this section suggests that not only can intense antigenic selection, or selection based on Ig function and/or structure, result in the appearance of conserved CDR3 residues, it can also result in the positive selection of specific N regions and/or somatically altered VH-D-JH junctions. Thus, CDR3 sequences with identical nucleotide deletions at the VH-D-JH junctions and/or containing identical N regions can be isolated from B cells originating from different precursor cells or even from different mice.
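The alternative, purely chance-based explanation for identical N regions can be put into rough numbers. The sketch below is a back-of-envelope model only: it assumes that the two N regions have the same length, that each position is added independently, and that G and C (and likewise A and T) are equally frequent. None of these assumptions comes from the data in this thesis; only the 80 - 90 % GC content is taken from the text above.

```python
def match_probability(gc_fraction, length):
    """Chance that two independently generated N regions of equal length are identical."""
    g = c = gc_fraction / 2          # assumed: G and C equally frequent
    a = t = (1 - gc_fraction) / 2    # assumed: A and T equally frequent
    per_site = g * g + c * c + a * a + t * t  # both regions carry the same base at one site
    return per_site ** length

unbiased = match_probability(0.50, 3)   # no nucleotide preference
gc_biased = match_probability(0.85, 3)  # midpoint of the 80 - 90 % GC range
print(round(unbiased, 4), round(gc_biased, 4))  # 0.0156 0.0517
```

Under these assumptions a GC preference makes identical trinucleotide N regions roughly three times more likely than with unbiased addition, so short shared N regions such as GGG or GGC cannot by themselves distinguish chance from selection.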

Comparison of the RNA and the DNA sequences: As shown in Figures 4.1 and 4.2 the cDNA and the DNA sequences display the hallmarks of somatic mutation and antigenic selection, i.e. concentration of amino acid replacement mutations in the CDRs and conservation of the FRs. Most of the mutations detected were found in somatically mutated VH regions isolated from the anti-NP response by others (Blier and Bothwell, 1987; Weiss et al, 1992). Only approximately 4 % of the somatic mutations found in the cDNA sequences were silent, whereas in the DNA sequences approximately 8 % of the mutations scored (not including the VH-D-JH junctions and the 10 bp on either side) were silent. A number of mutations that are found in many or most of the cDNA sequences (G→C at position 753, G→T at position 830, A→G at position 914 and G→C at position 968) are present in only a minority of the primary or secondary anti-NP antibodies published by others (eg. Cumano and Rajewsky, 1986; Blier and Bothwell, 1987; Weiss et al, 1992). In contrast, the tryptophan to leucine change that is present at codon 33 in CDR1 of most of the previously published VH186.2 expressing anti-NP antibodies (eg. Cumano and Rajewsky, 1986; Blier and Bothwell, 1987; Weiss et al, 1992) was present in only a minority of the sequences isolated here. This amino acid replacement was shown to confer a 10-fold increase in affinity for the hapten NP (Allen et al, 1988). The absence of this mutation in the tertiary response B cells isolated here may indicate that the cells have been recruited into the anti-NP response at a later stage on the basis of not only affinity for antigen, but high association rate as suggested by Foote and Milstein (1991). Sequences with the same VH-D-JH junctions were isolated from both the DNA and RNA of the NP-binding B cells (Figs. 4.1 and 4.2). These were assumed to be clonally related and were tentatively placed in the same family of related sequences, family 1.
However, none of the family 1 DNA sequences completely corresponded to those isolated from the RNA. In addition, the mutations found in most cDNA sequences at positions 753, 830, 914, and 968 (Figs. 4.1 and 4.2, and see above) are present in at most one of the DNA sequences. Due to the small number of sequences it is not possible to determine whether in fact multiple copies of the VH genes with different mutations are present in any of the cells that were isolated.
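Scoring a mutation as silent or as an amino acid replacement, as done for the percentages quoted above, requires only the standard genetic code. The following sketch shows one way to do this for single codons; the example codon changes are illustrative and are not taken from Figures 4.1 and 4.2, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def classify(germline_codon, mutant_codon):
    """Label a codon change as 'silent', 'replacement' or 'stop'."""
    before = CODON_TABLE[germline_codon.upper()]
    after = CODON_TABLE[mutant_codon.upper()]
    if after == before:
        return "silent"
    if after == "*":
        return "stop"
    return "replacement"

def silent_fraction(changes):
    """Fraction of (germline, mutant) codon pairs that leave the amino acid unchanged."""
    labels = [classify(g, m) for g, m in changes]
    return labels.count("silent") / len(labels)

# Illustrative codon changes (hypothetical): Trp->Leu, Gly->Gly, Trp->stop, Ser->Asn.
toy_changes = [("TGG", "TTG"), ("GGT", "GGC"), ("TGG", "TGA"), ("AGC", "AAC")]
print([classify(g, m) for g, m in toy_changes])
print(silent_fraction(toy_changes))  # 0.25
```

The first toy change, TGG to TTG, is one codon change that would produce a tryptophan to leucine replacement of the kind discussed for codon 33.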

Future work: Despite the inability to reach any conclusions regarding the original question - i.e. can different copies of the VH genes be detected at the RNA or at the DNA level in antigen-specific splenic B cells? - the work carried out nevertheless allows fine-tuning of the design of future experiments. Since most of the sequences isolated were rearranged to JH-2, different PCR strategies should be employed. Instead of pooling the JH specific primers when amplifying from the DNA, separate amplifications should be carried out for each rearrangement. Similarly, JH specific rather than IgG1 specific primers should be used for the amplification from the cDNA. Alternatively, the experiment could be simplified by only amplifying JH-2 rearrangements since this is the most commonly used JH element in the anti-NP response (eg. see Cumano and Rajewsky, 1986; Blier and Bothwell, 1987). However, the ideal method for addressing the above question would be to use flow cytometry to isolate single NP-specific germinal center B cells expressing the IgG1 heavy chain. The following procedure should then be used to amplify from each cell: the RNA is reverse transcribed (without removal of the DNA) with an IgG1 specific primer. This is then followed by nested primer PCR amplification with IgG1 specific primers to amplify the cDNA, and primers immediately downstream of JH-2 to amplify from the DNA. In this manner it could be determined whether a sequence originated from the RNA or the DNA. This method would allow unambiguous identification of any differences between the two types of nucleic acid.

13.2 In vitro analysis of splenic antigen-specific B cells

In vitro reproduction of somatic hypermutation: Recently the cellular interactions that take place in germinal centers between B cells, Th cells and FDCs were reproduced in vitro (Kosco et al, 1992; Kosco and Gray, 1992). However, a difficulty with this procedure is that since at this stage FDCs cannot be grown in culture they must be continually isolated from mice (M. Kosco, personal communication). It has yet to be determined whether the in vitro replication of germinal centers is sufficient to induce B cells to undergo somatic hypermutation, or whether activation signals obtained prior to entry into a germinal center are also required. Recently a system was developed which used the membranes of stimulated Th cells in combination with IL-4 and IL-5 to activate small resting B cells which subsequently proliferated and secreted Ig (Hodgkin et al, 1990). Since this system is able to provide naive B cells with the signals required for differentiation, it may be possible to use this system to induce antigen-specific B cells isolated from hyperimmune spleen to undergo somatic hypermutation. Therefore, in this preliminary study an attempt was made to determine the effects of the B cell activation system first described by Hodgkin and colleagues on antigen-specific splenic B cells. NP-binding B cells were isolated using flow cytometry from single cell suspensions made from the spleen of two mice that were hyperimmunized with NP. The cells were sorted singly or in groups of 10 into individual wells of microtiter trays and cultured under defined conditions. In order to determine whether additional Th signals may be required, some of the B cells were cultured in medium that also contained live Th cells. The cultured cells were then assayed for Ig secretion and proliferation. The rearranged VH genes expressed by some of the proliferating clones were PCR amplified and sequenced.

Analysis of anti-NP sera and splenic cells: Analysis of the isotype concentrations in the pre- and post-tertiary sera of the two mice used in this study indicates that the major isotype produced during the tertiary anti-NP response is IgG1 (Fig. 13.1). The other major isotypes present are IgM, IgG2a and IgG2b; however, they were present at similar levels before the tertiary immunization. A previous study in which the same immunization protocols were used showed that IgG1 is the most common isotype found in secondary anti-NP hybridomas isolated from C57BL/6J mice (Blier and Bothwell, 1987). Flow cytometry of a sample of the splenic single cell suspension indicates that over 80 % of cells are B200+ (Fig. 5.2a). It is also apparent that there is no significant B200⁻ population in the spleen cells that were isolated. This is somewhat puzzling since even at the height of primary and secondary immune responses, germinal centers occupy less than 10 % of the total volume of the spleen (MacLennan et al., 1988). Unless it is assumed that during tertiary immune responses significantly higher numbers of germinal centers are formed, it appears that the procedure used to obtain the splenic single cell suspension somehow resulted in the loss of most non-B cells. The FACS analyses of the splenic cells (Figs. 5.2 and 5.3) indicate that 16 - 24 % of the B cells express the IgLJt chain (Figs. 5.2d and 5.3b). Approximately 5 % of the splenic B cells bind NP and express the IgLx, chain, whereas 23 % bind NP and express the IgLx; chain (Fig. 5.3). This is in good agreement with previous work which showed that primary anti-NP responses are characterised by the selection for Igλ expressing B cells (Karjalainen and Mäkelä, 1978; Reth et al, 1978), while significant recruitment of Igκ bearing B cells into the anti-NP response occurs in later anti-NP responses (Reth et al, 1978).
Therefore, the isotype expression data and the FACS analyses indicate that the anti-NP responses generated in the two mice used in this experiment generally conform well with previous studies.

Ig secretion and proliferation of FACS sorted splenic B cells: The B cells were sorted either singly or in groups of ten into the wells of microtiter dishes and then cultured in standard medium with various additions (Tables 5.1 and 5.2). Because peanut agglutinin (PNA, a germinal center B cell-specific marker) was not used to sort the cells, it is impossible to determine whether the B cells were isolated from germinal centers or from other parts of the spleen.

The ELISpot assay for Ig secretion (Table 5.1) indicates that the B cells are efficiently induced to secrete Ig in the presence of other B cells, T cells, IL-4 and IL-5. Single B cells were also induced to secrete Ig in the presence of T cells, activated T cell membranes, IL-4 and IL-5, albeit at a reduced rate. The reduction of Ig secretion when the T cells or the ILs are omitted from the medium suggests that these are required for efficient stimulation of the B cells. In two wells more spots were detected than the number of cells that were placed into the well (Table 5.1). This may indicate either that B cell proliferation occurred or that more than the programmed number of cells were placed into those particular wells. The proliferation assay was only done for wells containing 10 B cells. The only evidence for proliferation was detected when activated T cell membranes, IL-4 and IL-5 were present in the medium (Table 5.2). In wells where the membranes and/or the ILs were absent no proliferation was detected. However, it cannot be ruled out that this is due to the small number of cells used for this analysis. Nevertheless, only two proliferating clones were detected and one of these may have been a contaminant. Thus, 13 single cells were isolated from the only clone of cells that apparently consisted of B cells. PCR amplification using nested VH186.2 and Cγ1 specific primers was successful for only 5 of the 13 cells. With the available data it is impossible to determine whether this is due to technical reasons or because the other 8 cells expressed a different isotype and/or VH gene. Sequencing of the DNA fragments that were amplified from the 5 cells revealed no differences (Fig. 5.5).
Therefore, during the course of this project it was determined that B cells that were isolated from hyperimmune spleens and were FACS sorted singly or in groups of ten could be induced to secrete Ig by using the B cell activation system first described by Hodgkin et al. (1990). This was most efficient in the presence of whole T cells, indicating that additional Th cell factors play a role in activating these cells. Furthermore, the B cells could also be induced to proliferate provided that 10 B cells were cultured in the presence of the activated T cell membranes and the ILs. With further fine tuning of the system it may be possible to induce single B cells to proliferate. Molecular analysis of the B cell clones indicates that they did not hypermutate.

Future experiments: From the above results it cannot be established which B cells, if any, were isolated from germinal centers. Thus, a repeat of this experiment should ensure that germinal center B cells are isolated by using PNA binding as one of the parameters used in flow cytometry. To date, single B cells have been isolated from specific compartments of germinal centers, and the genes expressed by these cells have been characterized (e.g. Jacob et al., 1991; Küppers et al., 1993). However, because the cells are isolated from fixed tissue slices it is impossible to carry out more detailed functional studies. The results of this preliminary study indicate that as more surface markers that distinguish B cells from different compartments of germinal centers become available, the B cell activation system used here may also prove to be a very useful and rapid technique for the in vitro characterization of single germinal center B cells.

13.3 Determination of the 5' boundary for somatic hypermutation in VH regions

Search for the cluster of mutations found upstream of the cap site in VH3B62: The reverse transcriptase model for somatic hypermutation proposed by Steele and Pollard (1987) does not predict the occurrence of somatic mutations far upstream of the cap site. The mutations found between the cap site and the promoter region of two hybridoma VH regions (Lebecque and Gearhart, 1990) can be accounted for by this model (see section 1.2.c). However, the cluster of 5 mutations found > 375 bp upstream of the cap site in the VH region isolated from hybridoma 3B62 (Both et al., 1990) is incompatible with the reverse transcriptase model. This hybridoma expresses a somatically mutated derivative of the VH186.2 germline gene (Cumano and Rajewsky, 1986). There is evidence that some mutations in the coding region of VH3B62 may have arisen due to a gene conversion event between the rearranged VH3B62 gene and the H17 germline gene (Cumano and Rajewsky, 1986). It was therefore important to determine whether the 5 upstream mutations may have a potential germline donor, and whether this region is normally subjected to somatic hypermutation. In order to detect a possible germline donor for these mutations a number of PCR and hybridization strategies were employed in a previous study (Rothenfluh, 1990; Rothenfluh et al., 1993). Briefly, one of the 5 mutations introduced a diagnostic RsaI restriction site. This restriction site was not detected in any PCR products amplified from genomic DNA, nor was it detected by Southern blot hybridization of genomic DNA. VH3B62 specific primers failed to amplify any product from genomic DNA. Although none of these strategies detected a possible germline donor for the cluster of 5 mutations found > 375 bp upstream of the cap site in VH3B62 (Both et al., 1990), they relied on the detection of only one of the five mutations.
Therefore an additional attempt to detect a putative gene conversion donor sequence was made by sequencing VH186.2 related germline genes amplified from BALB/c and C57BL/6J genomic DNA. Figure 6.1 shows part of the 5' flanking region sequences for 31 VH186.2 related sequences that were isolated from C57BL/6J DNA and 21 from BALB/c DNA. None of the 5 mutations were detected in any of the 52 germline genes. This suggests that the mutations found in the 5' flanking region of VH3B62 were not generated by a gene conversion event, but that they probably arose by a somatic mutational event or as a post-fusion technical artifact. In order to shed further light on this it was therefore necessary to define the 5' boundary of somatic mutation more precisely by extending the number of sequences available for the 5' flanking region of rearranged VH regions.

Determination of the 5' boundary for somatic hypermutation: The 5' flanking region sequences up to a point approximately 520 bp upstream of the cap site were determined for 9 hybridomas produced from secondary response anti-NP B cells (Blier and Bothwell, 1987). These sequences and those of an additional 3 hybridomas (sequenced by Dr. L. Taylor) are shown in Figures 6.2 and 6.3. Only one of the twelve hybridomas, Hl-7, carried mutations upstream of the cap site, and both of these mutations fell within 200 bp of the cap site. Adding these twelve 5' flanking region sequences to those sequenced by others provides a total of 29 VH gene sequences that include the promoter region, and 20 of these extend to a point at least 520 bp upstream of the cap site (see legend to Table 13.1).

Table 13.1. Incidence of 5' distal mutations in rearranged VH genes

Location on genomic DNA       Number of mutated sequences   Total number sequenced
220-520 bp 5' of cap (a)      1                             20
Within 200 bp of cap (b)      4                             29

(a) The sequences were those shown in Figures 6.2 and 6.3, and 40.3, 3B44, 3B62, A6/24, A20/44 (Both et al., 1990), MOPC167, H37-311, H37-45, and H37-80 (Lebecque and Gearhart, 1990).
(b) The sequences used were those listed above and MOPC603, HPCG13, MC101 (Lebecque and Gearhart, 1990), H37-68, H37-85, H37-79, H37-78, H37-96 and M460 (Clarke et al., 1990).

Table 13.1 confirms the previous finding that somatic mutations are rarely seen upstream of the promoter/cap site regions (Both et al., 1990; Lebecque and Gearhart, 1990). Only four of the 29 sequences that include the promoter region contained mutations upstream but within 200 bp of the cap site, whereas only a single sequence (VH3B62) out of a total of 20 contained somatic mutations upstream of the promoter region. In the heavily mutated VH3B62 gene a cluster of five base changes was found > 375 bp upstream of the cap site (Both et al., 1990). However, the uniqueness of the 5 nucleotide substitutions found in VH3B62 and their distance from the normally mutated region suggest that these mutations may not be due to the normal somatic hypermutator mechanism. Thus, these mutations cannot be used as evidence against RNA/cDNA based mutator models (Rogerson et al., 1990). A mutation frequency distribution graph for a hypothetical VH-D-JH-1 rearrangement was plotted from a collection of sequences that included the twelve sequences shown in Figures 6.2 and 6.3, and all other rearranged VH regions available in the literature (see legend to Fig. 6.4). This graph clearly illustrates that there is a dramatic decrease in the mutation frequency around the cap site. In addition, out of a total of 407 mutations scored in sequences used in Figure 6.4, only 14 were found upstream of the cap site and only five of these mutations were found upstream of the promoter. Hence, in rearranged VH genes almost 97 % of somatic mutations fall within the transcribed region, which supports the proposition that somatic mutations are restricted to the transcription unit (Both et al., 1990; Steele et al., 1992). Therefore, a reasonable conclusion to be drawn from the above is that the cap site is indeed the 5' boundary for somatic hypermutation and that a small amount of leakage occurs. This view would be consistent with mutator models that are dependent on the transcriptional state of rearranged V genes (Steele and Pollard, 1987; Both et al., 1990; Lebecque and Gearhart, 1990; Steele et al., 1992). The model proposed by Lebecque and Gearhart predicts that the promoter is the 5' boundary for mutation (this is inconsistent with the above data) and that mutations are first introduced into genomic DNA. The model initially proposed by Steele and Pollard predicts that the cap site is the 5' boundary for somatic mutation, but that RNA/cDNA is the main nucleic acid substrate where nucleotide changes are first introduced. The observation that the 5' boundary for somatic hypermutation generally does not exceed the transcription unit may be suggestive of a model involving transcription, but is by itself insufficient evidence for a conclusion that mutations are introduced on the transcript or its associated cDNA.
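The proportions quoted above follow directly from the counts. A minimal Python sketch of the arithmetic, using only the numbers given in the text and in Table 13.1 (the variable names are illustrative and not part of the original analysis):

```python
# Counts taken from the text and Table 13.1; variable names are illustrative.
total_mutations = 407        # mutations scored for Figure 6.4
upstream_of_cap = 14         # of these, found upstream of the cap site
upstream_of_promoter = 5     # of the 14, found upstream of the promoter

within_transcribed = total_mutations - upstream_of_cap
pct_transcribed = 100 * within_transcribed / total_mutations

# Fraction of sequences carrying 5' distal mutations (Table 13.1)
pct_seqs_near_cap = 100 * 4 / 29       # within 200 bp of the cap site
pct_seqs_far_upstream = 100 * 1 / 20   # 220-520 bp 5' of the cap site

print(f"{pct_transcribed:.1f} % of mutations lie within the transcribed region")
print(f"{pct_seqs_near_cap:.1f} % of sequences are mutated within 200 bp 5' of the cap")
print(f"{pct_seqs_far_upstream:.1f} % of sequences are mutated 220-520 bp 5' of the cap")
```

The first figure is the "almost 97 %" quoted in the text; the other two express the rows of Table 13.1 as percentages.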

Asymmetrical distribution of somatic mutations: implications for somatic mutator models: In an earlier review a mutation frequency distribution graph was plotted for the sequence data available at the time, which comprised 12 to 21 somatically mutated VH region sequences (Steele et al., 1992). This analysis revealed that the mutation frequency distribution around VH regions is positively skewed with a single mode at or near the VH-D-JH coding region and a long tail into the 3' non-translated region of the JH-C intron. The data available for the VL regions suggest that the mutation frequency is also distributed in an asymmetrical fashion (Lebecque and Gearhart, 1990; Weber et al., 1991; Motoyama et al., 1991, 1994). However, in the review it was also pointed out that a larger data set was required for a more conclusive analysis of the mutation frequency distribution pattern (Steele et al., 1992). The 12 sequences presented in Figures 6.2 and 6.3 therefore provide a larger data set for the accurate determination of the underlying frequency distribution of somatic hypermutation. Adding these 12 sequences to the data set presented in the previous review clearly illustrates that the mutation frequency distribution is indeed positively skewed (Fig. 6.4). The mode of the mutation distribution curve is centered on the rearranged VH-D-JH, which is in agreement with previous work where it was found that mutations are focused on the V(D)J coding region regardless of which J element was rearranged (Lebecque and Gearhart, 1990; Weber et al., 1991). There is also a sharp decline in mutation frequency around the cap site and a long tail into the JH-C intron (Fig. 6.4).

Evaluation of somatic hypermutation models: A variety of models have been proposed to explain how somatic mutations accumulate in IgV regions (see section 1.2c). However, any model must be able to account for the asymmetrical mutation frequency distribution and predict that the cap site is the 5' boundary for somatic hypermutation in IgVH genes. In order to determine which models are compatible with these observations, somatic mutator mechanisms in which enzymes act on a prototype VH-D-JH-1 gene will be considered from a completely theoretical viewpoint below. The enzymes may act directly on the V region DNA (Brenner and Milstein, 1966; Gearhart, 1982; Bothwell, 1984; Golding et al., 1987; Kolchanov et al., 1987; Maizels, 1989; Manser, 1990), or localized error-prone DNA→RNA→cDNA synthesis may be coupled to gene conversion to integrate the mutated cDNA into the chromosome (Steele and Pollard, 1987; Steele et al., 1991). The following assumptions were made: a) the incidence of mutation in the target area is constant for a unit length of a newly synthesized nucleic acid (RNA, DNA or cDNA) or a given length of template, i.e. the number of mutational events accumulated in a target is directly proportional to its length; b) all DNA templates are repaired or all fragments (DNA or cDNA) are made with an equal probability or frequency; and c) all fragments (DNA or cDNA) undergo homologous recombination or gene conversion of the target area with equal probability or frequency. Thus, if all other factors are equal these rules mean that the frequency of mutation in a given part of the target region is proportional to the cumulative number of mutations in a given length interval. Figures 13.3 and 13.4 describe mechanisms that utilize DNA as the direct substrate for the introduction of mutations, whereas two possible mechanisms where mutations can be introduced into both RNA and cDNA are considered in Figure 13.5.
The target area is assumed to extend from 500 - 600 bp upstream of the cap site to the MAR/enhancer region. Mechanism 1: Localized error-prone DNA repair initiating from a common 5' terminus on the plus strand (+) and terminating randomly could generate a positively skewed mutation distribution. The 5' terminus is located either 500 - 600 bp upstream of the promoter region (Fig. 13.3a) or within the promoter region 100 - 200 bp upstream of the cap site (Fig. 13.3b). An alternate mechanism could involve error-prone DNA synthesis resulting in the production of fragments with different 3' termini. These mutated fragments would each have an equal probability of altering the sequence of the resident chromosomal target by gene conversion.

[Figure 13.3: panels a) and b), mutation frequency plotted over P, L, VDJH-1, JH-2, JH-3 and JH-4]

Figure 13.3. Theoretical mechanisms of somatic hypermutation capable of generating a positively skewed distribution of mutation frequency (Mechanism 1). a) A common 5' terminus 500 - 600 bp upstream of the cap site and variable 3' termini extending into the MAR/enhancer region. b) A common 5' terminus at the 5' side of the promoter and variable 3' termini extending to the MAR/enhancer region. The drawing below the graph depicts the target region, which includes the promoter (P), cap site (I), leader (L), a rearranged mouse VH-D-JH-1, the unrearranged JH-2, JH-3 and JH-4 elements and the 3' portion of the MAR/enhancer region. The solid line represents the observed distribution (see Fig. 6.4). The broken line indicates the expected distributions that are unaffected by any process that might prevent mutations occurring in the 5' region, whereas the dotted line indicates the expected distribution where the 5' region is protected from mutation.

The curves shown in Figure 13.3 could also be generated by mechanisms involving synthesis of new DNA strands or randomly initiated DNA repair on the minus (-) strand. However, a starting point 500 - 600 bp upstream of the cap site is not compatible with the observed mutation frequency distribution unless it is assumed that the upstream and promoter regions may be bound or protected in vivo by a variety of DNA binding proteins such as transcription factors (Lebecque and Gearhart, 1990). In this manner the region upstream of the promoter would generally be protected from mutation. Nevertheless, it is also reasonable to expect a similar curve without protection of the 5' terminus by DNA binding proteins (G. Kelsoe, personal communication). If a DNA polymerase with low processivity and an error-rate of (for example) one nucleotide substitution per 300 nucleotides polymerized commences action at the 5' end of the target region, then under the binomial distribution a zone of fewer mutations accumulating near this origin could be expected. If the polymerase leaves the template at random points it will generate a positively skewed population of fragment lengths, each of which could act as a gene conversion donor. Thus DNA-based models of the above types could produce the observed mutation frequency distribution without necessarily requiring the protection of the upstream region. Mechanism 2: Localized error-prone DNA repair initiating randomly within the target region would produce a normal distribution of mutation frequency (Fig. 13.4a). Alternatively, error-prone DNA synthesis by a polymerase with low processivity initiated randomly throughout the target region could also produce a set of DNA fragments of varying lengths that could be re-integrated into the chromosome via gene conversion events. Such a mechanism would also produce a normal mutation frequency distribution. A variation of the latter mechanism is the mechanism proposed by Bothwell (1984). However, error-prone over-replication of the whole target region would produce a uniform distribution of mutation frequency which is clearly incompatible with the observed distribution.
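The qualitative shapes argued for Mechanisms 1 and 2 can be illustrated with a toy calculation. The sketch below is not taken from the thesis: it assumes a hypothetical target of 100 arbitrary length units and, as a simplification, that every permitted terminus position is equally likely. Under assumption (a) above, the mutation frequency at a position is then taken to be proportional to the fraction of fragments that cover it. No protection of the 5' region is modelled.

```python
def coverage(fragments, length):
    """Return, for each position, the fraction of fragments covering it.

    Fragments are half-open intervals (start, end) on a target of `length` units.
    """
    counts = [0] * length
    for start, end in fragments:
        for pos in range(start, end):
            counts[pos] += 1
    return [c / len(fragments) for c in counts]


LENGTH = 100  # hypothetical target length in arbitrary units (an assumption)

# Mechanism 1: common 5' terminus (position 0), every 3' terminus equally likely.
mech1 = coverage([(0, end) for end in range(1, LENGTH + 1)], LENGTH)

# Mechanism 2: both termini random, every (start, end) pair equally likely.
mech2 = coverage(
    [(s, e) for s in range(LENGTH) for e in range(s + 1, LENGTH + 1)], LENGTH
)

for name, profile in (("Mechanism 1", mech1), ("Mechanism 2", mech2)):
    peak = profile.index(max(profile))
    print(f"{name}: peak at position {peak}, "
          f"5' end {profile[0]:.2f}, 3' end {profile[-1]:.2f}")
```

In this simplified model the Mechanism 1 profile is highest at the common 5' terminus and declines steadily toward the 3' end (a mode at the 5' side with a long 3' tail), whereas the Mechanism 2 profile is symmetric about the midpoint of the target. Other termination rules would change the exact curves but not these qualitative features.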

[Figure 13.4: panels a) and b), mutation frequency plotted over P, L, VDJH-1, JH-2, JH-3 and JH-4]

Figure 13.4. Theoretical mechanisms of somatic hypermutation generating symmetrical or negatively skewed distribution of mutation frequency (Mechanisms 2 and 3). a) Distribution of mutation frequencies expected under mechanism 2. b) Distribution of mutation frequencies expected under mechanism 3. The drawing below the graph depicts the target region, which includes the promoter (P), cap site (I), leader (L), a rearranged mouse VH-D-JH-1, the unrearranged JH-2, JH-3 and JH-4 elements and the 3' portion of the MAR/enhancer region. The solid line represents the observed distribution (see Fig. 6.4), whereas the broken line indicates the expected distribution.

Mechanism 3: This is similar to mechanism 1 but operates on the - strand. Error-prone DNA repair or DNA synthesis originates from the same starting point which is located in the MAR/enhancer region on the - strand. As shown in Figure 13.4b, enzymes initiating from a fixed point within the 3' end of the target region would produce a negatively skewed mutation frequency distribution with a single mode adjacent to the MAR/enhancer region and a long tail extending into the 5' portion of the mutation substrate area. Obviously, this is in conflict with the observed distribution of mutation frequency. Mechanism 4: This is a modified version of the reverse transcriptase model (Steele and Pollard, 1987; Steele et al, 1991). Pre-mRNA produced by the normal transcriptional enzymatic machinery is the primary substrate for hypermutation. The modification from the earlier version of the model is the presence of multiple reverse transcriptase priming sites scattered throughout the target region (Fig. 13.5a).

[Figure 13.5: panels a) and b), mutation frequency plotted over P, L, VDJH-1, JH-2, JH-3 and JH-4, with the populations of cDNA molecules drawn below]

Figure 13.5. Theoretical distributions of mutation frequency around a mouse VH-D-JH-1 gene generated by error-prone reverse transcription-based copying processes (Mechanisms 4 and 5). a) Mechanism 4. The expected distribution of mutation frequency is due to a population of potential gene conversion donors consisting of mutated cDNA molecules with variable 3' termini due to the presence of multiple reverse transcriptase priming sites. b) Mechanism 5. The expected mutation frequency distribution generated by a population of potential gene conversion donors with variable 5' ends due to termination of cDNA synthesis from a single initiation site. The drawing below the graph depicts the target region, which includes the promoter (P), cap site (I), leader (L), a rearranged mouse VH-D-JH-1, the unrearranged JH-2, JH-3 and JH-4 elements and the 3' portion of the MAR/enhancer region. The solid line represents the observed distribution (see Fig. 6.4), whereas the broken line indicates the expected distribution.

The multiple reverse transcriptase priming sites would produce multiple cDNA retrotranscripts with differing initiation sites but with a common terminus located at the cap site. Integration of mutated cDNA molecules produced by this mechanism into the chromosome would result in a positively skewed asymmetrical distribution of mutation frequency (Fig. 13.5a), which is consistent with the observed pattern. Mechanism 5: This mechanism is a variation of mechanism 4. In this mechanism cDNA synthesis is initiated from a common initiation site which is situated near the 3' end of the target region (Fig. 13.5b). cDNA synthesis is terminated at random, thus producing a number of cDNA molecules with variable 5' termini. Integration of the molecule(s) into the chromosome would result in a negatively skewed mutation frequency distribution (Fig. 13.5b). This is incompatible with the observed distribution. The mutation frequency distribution generated by this mechanism but assuming that cDNA synthesis usually ends at the cap site would also not match the observed distribution. Mechanism 6: This mechanism assumes that all mutations are introduced by gene conversion events between a germline donor sequence and the VH-D-JH region. It is now known that this type of mechanism is responsible for the generation of diversity in chicken IgV regions (Reynaud et al., 1987, 1989). Sequence analysis of 12 somatically hyperconverted IgV regions and approximately 200 bp of their 3' and 5' flanking region sequences (Reynaud et al., 1987) revealed that only 4 out of 219 mutations (1.8 %) fell outside of the rearranged V region. Therefore, if this mechanism were solely responsible for the introduction of mutations into murine IgV regions the mutation frequency distribution shown in Figure 13.6 would be expected. Obviously, the expected distribution of mutation frequency differs greatly from the observed distribution.

[Figure 13.6: mutation frequency plotted over P, L, VDJH-1, JH-2, JH-3 and JH-4]

Figure 13.6. Theoretical distribution of mutation frequency around a murine VH-D-JH-1 locus generated by a V region-specific hyperconversion mechanism.

If the assumption is made that the putative hyperconversion mechanism is active on the entire murine target area, a mutation frequency distribution similar to the one shown in Figure 13.4a would be expected. This also is incompatible with the observed distribution of mutation frequency.
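The opposite skews predicted for the two cDNA-based mechanisms (4 and 5) can be checked with a small calculation. This is an illustrative sketch only, not part of the original analysis: the target is reduced to 100 hypothetical positions running from the cap site (position 0) to the 3' end, priming and termination positions are assumed to be equally likely, and the mutation frequency at a position is taken to be proportional to the number of cDNA molecules covering it.

```python
def skewness(weights):
    """Moment skewness of position, weighting each position by its mutation frequency."""
    total = sum(weights)
    mean = sum(i * w for i, w in enumerate(weights)) / total
    var = sum(w * (i - mean) ** 2 for i, w in enumerate(weights)) / total
    third = sum(w * (i - mean) ** 3 for i, w in enumerate(weights)) / total
    return third / var ** 1.5


LENGTH = 100  # hypothetical number of positions from the cap site to the 3' end

# Mechanism 4: a priming site at every position; each cDNA runs back to the cap
# site, so a cDNA primed at p covers positions 0..p. Position i is therefore
# covered by the LENGTH - i cDNAs primed at or downstream of it.
mech4 = [LENGTH - i for i in range(LENGTH)]

# Mechanism 5: a single priming site at the 3' end; synthesis stops at a random
# position t, so the cDNA covers t..LENGTH-1. Position i is covered by i + 1 cDNAs.
mech5 = [i + 1 for i in range(LENGTH)]

print(f"Mechanism 4 skewness: {skewness(mech4):+.2f}")
print(f"Mechanism 5 skewness: {skewness(mech5):+.2f}")
```

Under these assumptions Mechanism 4 gives a positive skewness and Mechanism 5 an equal and opposite negative one, in line with the qualitative argument in the text.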

Which mechanism is correct? The above analyses reveal that it is reasonable to exclude mechanisms 2, 3, 5 and 6 as possible models to explain somatic hypermutation of murine IgV regions, since these mechanisms are incompatible with the observed mutation frequency distribution. Thus, mechanisms 1 and 4 remain as possible explanations for the observed data. The model invoking error-prone DNA repair initially proposed by Brenner and Milstein (1966), and the model proposed by Manser (1990) which involves error-prone DNA synthesis, both involve mutational mechanisms described by mechanism 1. These two models therefore could potentially generate the observed mutation frequency distribution provided that i) the error-prone enzymes involved in DNA repair or DNA synthesis traverse the + strand of the target region from a common upstream initiation site and terminate randomly in the downstream region; or ii) the enzymes initiate at random sites on the - strand and terminate at a common point near the cap site. Mechanism 4 is a modification of the reverse-transcriptase model initially proposed by Steele and Pollard (1987). This model can thus also be expected to generate the observed distribution of mutation frequency provided that multiple initiation sites for cDNA synthesis are present in the target region. In summary, although it is now possible to eliminate a number of models as possible explanations for somatic hypermutation, it is not yet possible to distinguish between DNA-based models of the type described by mechanism 1, and RNA/cDNA-based models as described by mechanism 4.

13.4 Minimization of PCR generated artifacts

Error rates of Pfu and Taq DNA polymerases: Nucleotide misincorporation induced by bacterial DNA polymerases during PCR is a problem that was recognized in the first comprehensive publication reporting on the use of thermostable enzymes in PCR (Saiki et al., 1988). It has since been demonstrated that the error rate of perhaps the most commonly used enzyme, Taq DNA polymerase, ranges from 1.1 x 10^-4 (Tindall and Kunkel) to 6 x 10^-5 (Weiss and Rajewsky, 1990) errors per nucleotide per cycle. In chapter 7 it was shown that the average error rate of this enzyme in the experimental conditions described in this thesis is 4.1 x 10^-5 errors per nucleotide per cycle, and hence is at the lower end of the reported error rates. Nevertheless, this effectively predicts the presence of one misincorporated nucleotide in every 1233 nucleotides sequenced. Since the PCR amplified germline VH sequences reported on in chapters 8 and 11 are almost 1000 bp in length, the use of Taq DNA polymerase would result in the presence of at least one in vitro generated nucleotide change in almost every sequence. However, the above error rate for Taq DNA polymerase is only an average value. As shown in chapter 7, the maximum error rate of Taq DNA polymerase can be an order of magnitude higher than the average. Thus, at worst the use of Taq DNA polymerase for the amplification of the germline VH gene sequences could result in several errors per germline sequence. Since this would obviously greatly affect any phylogenetic analysis of the germline VH sequences it was of utmost importance to reduce the number of in vitro generated nucleotide changes as much as possible. Using a genetic assay it was previously shown that Pfu DNA polymerase, a thermostable enzyme with 3'→5' proofreading ability, possesses an approximately 12-fold lower error rate than Taq DNA polymerase (Lundberg et al., 1991). During the course of this work two errors that were probably induced by the enzyme were detected in over 30,000 nucleotides of DNA sequence that was PCR amplified using this enzyme (chapter 7). Thus, under the conditions described in this thesis Pfu DNA polymerase has an error rate of 3.3 x 10^-6 errors per nucleotide per cycle. This error rate is in good agreement with that published by Lundberg et al. (1991). Thus, the use of Pfu DNA polymerase for PCR amplification of DNA reduces the number of enzyme generated errors to a negligible level.
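The relationship between a per-cycle error rate and the error frequency seen in the final sequences can be sketched as follows. Two things in this sketch are assumptions rather than statements from the thesis: the Taq rate is read as 4.1 x 10^-5 errors per nucleotide per cycle, and the number of cycles is set to a hypothetical 20, chosen only because it approximately reproduces both the one-in-1233 figure for Taq and the 3.3 x 10^-6 figure for Pfu. Errors are assumed to accumulate linearly with cycle number, which is a simplification.

```python
# Figures quoted in the text; the cycle number is an assumption (see lead-in).
ASSUMED_CYCLES = 20               # hypothetical: not stated in this section
taq_rate = 4.1e-5                 # errors per nucleotide per cycle (exponent as read here)
pfu_errors, pfu_nt = 2, 30_000    # errors observed / nucleotides sequenced (chapter 7)
amplicon_bp = 1000                # approximate length of the germline VH amplicons

# Simplified linear model: cumulative errors per nucleotide = rate x cycles.
taq_per_nt = taq_rate * ASSUMED_CYCLES
nt_per_taq_error = 1 / taq_per_nt
taq_errors_per_amplicon = taq_per_nt * amplicon_bp

pfu_rate = pfu_errors / pfu_nt / ASSUMED_CYCLES   # errors per nucleotide per cycle
fold_difference = taq_rate / pfu_rate

print(f"Taq: about one error per {nt_per_taq_error:.0f} nt, "
      f"{taq_errors_per_amplicon:.2f} expected errors per {amplicon_bp} bp amplicon")
print(f"Pfu: {pfu_rate:.1e} errors per nucleotide per cycle, "
      f"{fold_difference:.1f}-fold lower than Taq")
```

With these assumed inputs the Taq figure comes out close to one error per 1233 nucleotides and the Pfu rate close to 3.3 x 10^-6, roughly 12-fold lower than Taq.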

PCR crossover events with Taq DNA polymerase: Another problem associated with PCR amplification is the production of hybrid DNA products. This has been called strand jumping or PCR crossover and was shown to occur when the template DNA was damaged by sonication, depurination or UV irradiation (Pääbo et al., 1990). It was proposed that when the enzyme Taq DNA polymerase reaches a break in the DNA it terminates extension. In subsequent cycles the partial extension product can act as a primer. When amplifying from a family of homologous sequences the partially extended product may anneal to a homologous, but different sequence and extension of this 'false primer' would then result in a chimaeric DNA molecule containing partial sequences of two different template molecules. Another study suggested that incomplete cDNA synthesis followed by PCR may also result in hybrid DNA products (Shuldiner et al., 1989). In this case, if a partially degraded RNA molecule is reverse transcribed, the resulting partial cDNA may anneal to a homologous but different sequence and subsequently be extended by the DNA polymerase. This would also produce a DNA molecule containing partial sequences of two different templates. An alternative pathway resulting in the production of hybrid PCR products occurs when the enzyme terminates DNA synthesis before reaching the other end of the template, thus producing a partial DNA molecule that may anneal to another homologous sequence in later cycles (Tomlinson et al., 1992). An obvious situation where strand jumping may be a problem is when PCR amplifying from related VH genes. Indeed, a recent study involving PCR amplification of human VH genes went to extraordinary lengths to ensure no genes contained any in vitro generated artifacts (Tomlinson et al., 1992). Thus, cloned PCR fragments were probed with up to 29 different specific probes, and out of over 3,500 clones only 74 were accepted as genuine, artifact-free sequences. This seems a rather extreme culling rate since it assumes that most PCR products are hybrid molecules and/or contain at least one enzyme generated nucleotide change. This is in contrast to a similar study involving the PCR amplification of related VκOx-1 germline genes under optimized concentrations of dNTP and Mg2+ (Milstein et al., 1992). In this study, controls to approximate the level of strand jumping were carried out. Hybridization screening of over 100 M13 plaques containing PCR products amplified from a mixture of two templates and sequence determination of 20 clones did not detect any crossover events when PCR was carried out under the optimized conditions. Therefore, it seems reasonable to assume that when high molecular weight genomic DNA is used and PCR conditions are optimized, the level of strand jumping is very low. This is in agreement with the data obtained in chapter 6 where no hybrid DNA molecules were detected when DNA was amplified using the 5' primer specific for VH186.2 related germline genes in combination with a JH specific primer. Over 9,000 nucleotides of DNA sequence were amplified with Taq DNA polymerase from various somatically mutated hybridomas (Figs. 6.2, 6.3, and Table 7.1). If crossover events had occurred, hybrid DNA products containing the upstream sequence of a VH186.2 related germline gene (Fig. 8.1) and the rearranged VH-D-JH expressed by the hybridoma should have been detected. Not one of the clones that was sequenced showed any evidence of being a chimaeric molecule. Thus, under the conditions used in this work strand jumping occurred at a very low frequency, suggesting that the genomic DNA was of high quality and that Taq DNA polymerase did not terminate DNA synthesis prematurely.

PCR crossover events with Pfu DNA polymerase: In the above examples Taq DNA polymerase was used for PCR amplification. The results obtained therein cannot therefore be extended to PCR amplification with Pfu DNA polymerase. If Pfu DNA polymerase has a very low processivity it could well be that prematurely terminated DNA fragments could promote the formation of chimaeric PCR products. However, the data shown in chapter 7 indicate that if any crossover events occurred under the conditions utilized for the work carried out for this thesis, they did so below detectable limits. An attempt to PCR amplify from partially and fully degraded template DNA failed to yield a product (Fig. 7.1). This indicates that a short exposure (1 min) of the DNA to a restriction enzyme or DNase I is sufficient for the enzymes to degrade the template to the extent that there is an insufficient amount of the complete target sequence to produce a detectable product. When only one of the two DNA templates was digested by the restriction enzyme RsaI a surprising result was obtained. RsaI restriction of VH3B62 produces a 235 bp DNA fragment (Fig. 7.2) that differs from the corresponding VH40.3 sequence at only two nucleotide positions. The two nucleotide changes are 15 and 89 nucleotides distant from the 3' terminus. A study on the minimal homology requirements for PCR primers (Sommer and Tautz, 1989) indicates that a 61 % homology is sufficient for a 36 bp primer to allow successful amplification provided that the two 3' nucleotides are identical to the target sequence. This suggests that the 235 bp RsaI fragment should be able to anneal to the intact VH40.3 sequence quite efficiently. However, as seen in Figure 7.3, no evidence for any crossover events was obtained, indicating that despite the presence of a DNA fragment that should be able to promote strand jumping no such events occurred within detectable limits.
One possible explanation for this somewhat surprising result is that the shorter PCR primer (primer 1, 30 nucleotides; see Table 3.1) anneals to the target sequence more rapidly than the larger RsaI restriction fragment and thus outcompetes the latter. This is the case even when the amounts of RsaI restricted VH3B62 and intact VH40.3 present in the amplification mixture are approximately equal (Fig. 7.3, Lane 4). It would be of interest to carry out competitive annealing experiments to determine the size of DNA degradation products that would be able to compete successfully with the shorter PCR primers. Another strategy employed involved PCR amplification from an equimolar mixture of two intact and homologous but different DNA templates, and subjecting the products to restriction analysis. This is probably the strand jump control experiment that most closely reproduces the conditions in the PCR amplifications carried out for this thesis, because all due care was taken to avoid degradation or damage of the genomic DNA during the DNA extraction procedure, which makes it unlikely that many DNA strand breaks were present in the target sequences. Therefore this control experiment directly tested the proposition that crossover events are brought about when Pfu DNA polymerase prematurely terminates DNA synthesis, thus generating potential false primers for subsequent cycles. However, this strategy also failed to detect any hybrid DNA molecules (Fig. 7.4). This suggests that when the target sequence for PCR is less than or equal to approximately 1,000 bp, DNA synthesis mediated by Pfu DNA polymerase is not terminated prematurely. Alternatively, it could be assumed that in some cases Pfu DNA polymerase does 'drop off' the template prior to completing synthesis of the entire target region, but the false primers so generated are outcompeted by the shorter PCR primers.
In addition, two independent PCR amplifications were carried out from an equimolar mixture of 4 intact, homologous target sequences. 16 clones were chosen at random from each amplification and subjected to sequence analysis. The results listed in Table 7.2 demonstrate that none of the 32 sequences were hybrids. Taken together, the above data demonstrate that in vitro artifacts such as enzyme induced nucleotide changes or PCR crossover events are extremely unlikely to account for the germline and somatic sequence patterns discussed in this thesis. In addition, the data presented above and those presented by Milstein and colleagues (Milstein et al, 1992) indicate that when the target DNA is of high quality and PCR conditions are optimized, the production of hybrid DNA molecules can be reduced to a level that is below detectable limits.
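The homology figures quoted above (a 235 bp fragment differing at two positions, and the 61 % threshold for a 36 bp primer reported by Sommer and Tautz) can be checked with a short calculation. The following Python sketch is illustrative only: the function name `percent_identity` and the toy sequences are assumptions introduced here, not part of the original work.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Toy example (hypothetical sequences): one mismatch in four positions.
toy_identity = percent_identity("ACGT", "ACGA")

# Figures from the text: a 235 bp fragment with two mismatches,
# and the number of matching bases implied by 61 % homology over 36 bp.
fragment_identity = round(100.0 * (235 - 2) / 235, 1)
primer_matches = round(0.61 * 36)

print(toy_identity, fragment_identity, primer_matches)
```

On these numbers the restriction fragment is about 99 % identical to the intact template, far above the roughly 22 matching bases out of 36 implied by the primer threshold, which is why efficient annealing would have been expected.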

13.5 Molecular and phylogenetic analysis of related germline VH genes: evolutionary implications

a) Patterns of sequence variation among murine germline IgVH genes

Germline VH sequences: Primer 12 is specific for a conserved region approximately 42 bp upstream of the 3' terminus of the VH186.2 and VH205.12 germline genes (Both et al, 1990). This primer in combination with primer 1 or 14 was used to amplify germline VH genes that are related to VH186.2 or VH205.12 respectively. The related sets of genes were amplified from C57BL/6J (IgHb) and from BALB/c (IgHa) genomic DNA (see chapters 8 and 11). None of the VH186.2 related sequences amplified from C57BL/6J (Fig. 8.1), including the VH186.2 prototype sequence which was isolated 12 times from C57BL/6J DNA, matched any of those amplified from BALB/c DNA (Fig. 8.4). Similarly, the VH205.12 sequences isolated from C57BL/6J DNA (Fig. 11.1) did not correspond to any of the sequences isolated from BALB/c DNA (Fig. 11.4); however, this may be due to the smaller numbers of VH205.12 sequences isolated (20 from C57BL/6J, 10 from BALB/c). Comparison of the VH186.2 related sequences that were isolated from the two strains of mice (Figs. 8.1 and 8.4) reveals that many of the coincident changes found in the non-transcribed 5' flanking regions and in the transcription/coding unit are present in both strains. A similar comparison between the VH205.12 related genes isolated from C57BL/6J (Fig. 11.1) and BALB/c (Fig. 11.4) genomic DNA also indicates that many coincident changes are present in both sequence data sets. Thus, the sequence data presented in chapters 8 and 11 suggest that the IgHa and IgHb VH repertoires contain homologous but nevertheless different VH186.2 and VH205.12 sub-families. This is consistent with an earlier study where no allelic relationships were found among 5 VH186.2 related sequences that were isolated from C57BL/6J and BALB/c DNA (Loh et al, 1984). In addition, the two strains possess different IgH haplotypes: the IgHb haplotype in C57BL/6J mice, and the IgHa haplotype in BALB/c mice (Riblet et al, 1986). However, at present it is not possible to determine whether the above strain differences are due to rapid recent divergence from a common ancestor or to polymorphisms. Analysis of the genealogy of laboratory mouse strains (eg. Klein, 1975) indicates that there are too many gaps for the data to be useful in addressing this issue. It was suggested that the VH186.2 sub-family may contain between 35 and 50 different genes (Bothwell, 1984). This estimate was based on hybridization studies that identified 10 cross-hybridizing bands when genomic mouse DNA was hybridized with a probe based on the VH sequence expressed by an anti-NP hybridoma (Bothwell et al, 1981). Work done by Rajewsky's group suggests that the VH205.12 sub-family is larger than the VH186.2 sub-family (Gu et al, 1991b). It is likely that primer bias prevented the amplification of every member of the VH186.2 and VH205.12 sub-families, and this also makes it impossible to determine the proportion of the two sub-families that were isolated and sequenced. An attempt at solving this problem should be made by PCR amplifying members of the two VH gene sub-families with primers close to but different from primers 1 and 12. Ideally, the 3' primer should be situated in the flanking region downstream of the V segment gene to allow amplification of the entire coding region. If this resulted in the isolation of many new genes then the sequences isolated in this work represent only a small sample of the overall repertoire. However, due to time constraints this experiment was beyond the scope of this thesis.

Germline sequences with identical regions: The VH186.2 and VH205.12 related germline sequences presented above demonstrate the presence of identical coding regions or transcription units linked to different 5' non-translated flanking regions. Two sequences (C57C25 and C57C26) contain identical 5' regions and differ only in the coding region. The regions of complete sequence identity between various clones are shown in Figure 13.7. Since the PCR amplified region did not include the 42 bp at the 3' end of the genes, it cannot be excluded that the groups of sequences with identical coding/transcription units may differ in this region. In order to confirm this finding the experiment needs to be repeated with PCR primers downstream of the putative coding regions so that it can be established unambiguously that the entire V genes are completely identical. The data presented in chapter 7 indicate that it is highly unlikely that these sequences are PCR strand jumping artifacts.

[Figure 13.7 diagram. Upper panel (cap site, leader (L), VH186.2): C57C25, C57C26, C57G45, C57E35, C57E3, BALB5E, BALB14E, BALB2, BALB18, BALB23. Lower panel (cap site, leader (L), VH205.12): BALB11, BALB19, C57G6, C57C27, BALB13, BALB58, C57G3, C57C18, C57C9, C57G14.]

Figure 13.7. Germline sequences containing identical coding regions, transcription units, or 5' flanking regions. The sequences containing regions of identity are grouped together, and the regions of identity are indicated by a line. Upper diagram: VH186.2 related sequences; lower diagram: VH205.12 related sequences.

It is well accepted that multigene families such as the IgVH locus have arisen as a result of gene duplication events that probably usually occur in tandem (Ohno, 1970; Hood et al, 1975). Indeed, in the human VH locus two blocks of tandemly duplicated VH genes have been identified: one involved the tandem duplication of two VH genes (Kodaira et al, 1986), whilst in the other tandem duplication event three VH genes were duplicated (Matsuda et al, 1993). In some of these genes the homology was shown to extend into the immediate flanking regions (Haino et al, 1994), although the distance between some of the VH gene pairs differs between the two tandem homology units (Matsuda et al, 1993). In addition, it was also shown that some human germline VH genes have diverged by the exchange of short stretches of coding region sequence with other VH genes (Haino et al, 1994). The data presented in Figure 13.7, if confirmed, suggest that the entire coding/transcription unit with little or no 5' flanking region sequence may be duplicated. This may be mediated by hyper-recombination events targeted to the putative transcription/coding units (see chapter 9 and below). These events would have occurred very recently since the coding sequences have not yet accumulated any changes due to either nucleotide substitutions or recombination events. Furthermore, the finding of two identical 5' flanking region sequences that are linked to different VH186.2 related genes may be indicative of a duplication event and subsequent divergence (possibly via hyper-recombination events targeted to the coding region) of the VH gene segment but not the immediate 5' flanking region sequence. Thus, the finding of identical putative coding/transcription units linked to different 5' non-transcribed flanking regions is consistent with the proposition that hyper-recombination events targeted to these regions are active on germline IgV genes. However, this is a very preliminary interpretation of the data since it needs to be established that the putative coding sequences do not contain any differences in the regions that were not PCR amplified.

Observed and expected nucleotide and amino acid variability: The DNA sequences of the VH 186.2 related genes shown in Figures 8.1 and 8.4 indicate that nucleotide variability is concentrated in the putative CDR2 of these genes. Although the presence of sequence variability in CDRs and conservation of FR sequences has been documented by several other studies (eg. see Bothwell et al, 1981; Givol et al, 1981; Bentley and Rabbitts, 1983; Schiff et al, 1986; Kodaira et al, 1986; Reynaud et al, 1987, 1989; Pascual and Capra, 1989; Lautner-Rieske et al, 1992; Tomlinson et al, 1992; Sims et al, 1992), no serious attempt has been made to identify the evolutionary mechanism(s) that may be responsible for the introduction into the germline DNA of these sequence patterns, which strikingly resemble the somatic mutation pattern that is introduced into the rearranged IgV genes expressed by B cells during the germinal center reaction. The nucleotide variability in CDR2 of the VH186.2 related germline genes is best illustrated by the nucleotide and amino acid variability plots shown in Figures 8.3 and 8.6. These figures reveal a peak of nucleotide variability in CDR2 which also translates into the putative amino acid sequence of these genes. The FRs of the VH 186.2 related sequences display a number of minor nucleotide variability spikes which are absent at the amino acid level, indicating strong selection for silent nucleotide changes in those regions. Due to the small sample size the variability plots for the 10 VH205.12 sequences isolated from BALB/c DNA (Fig. 11.6) are not very informative. However, the variability plots for the 20 VH205.12 related sequences (Fig. 11.3) indicate that at some positions of the 5' flanking regions the nucleotide variability exceeds the variability found in CDR2. However, it is important to note that the spikes in nucleotide variability in the non-translated 5' flanking regions are due to variability at single nucleotide positions. 
This is in contrast to the putative CDR1 of the genes where nucleotide and amino acid variability is spread over 70 bp including and flanking CDR1. Furthermore, the nucleotide variability plots of the VH205.12 sequences indicate that there is very little nucleotide variability in the putative FRs of these genes. In fact the nucleotide variability of these regions is much lower than the background variability found in the 5' flanking region. It is interesting to note that in the VH186.2 related sequences nucleotide and amino acid variability is prominent in CDR2 but not in CDR1, whereas the VH205.12 related germline genes display more sequence diversity in CDR1 than in CDR2. This is similar to the situation found in human VH genes where increased variability is always present in CDR2, but the level of variability in CDR1 varies between the different VH gene families (eg. see Tomlinson et al, 1992). The fact that other laboratories have observed variability distribution patterns in germline IgV genes similar to those reported in this thesis strengthens the validity of the data. Furthermore, when nucleotide and amino acid variability plots are generated for a collection of genuine germline genes (see chapter 10) the patterns are indistinguishable from those shown in Figs. 8.3 and 8.6. The nucleotide sequences and the variability plots indicate that the variability in CDR2 is restricted to the carboxy terminal portion of CDR2 which spans residues 50-58. This is consistent with similar patterns that were reported for human (Tomlinson et al, 1992) and murine VH genes (Schiff et al, 1985). Honjo and colleagues also pointed out that the amino terminal portion of CDR2 seems to be conserved in a family-specific manner (Haino et al, 1994). On this basis CDR2 was divided into two sections: CDR2a (residues 50 - 58) and CDR2b (residues 59 - 66) for subsequent statistical analyses of the sequence data.
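The variability plots referred to above can be illustrated with the standard Wu-Kabat index, in which the variability at an aligned position is the number of different residues observed divided by the frequency of the most common residue. The Python sketch below assumes this standard definition was applied to both the nucleotide and amino acid alignments; the function names and the toy alignment are hypothetical and are not taken from the thesis data.

```python
from collections import Counter

def wu_kabat_variability(column):
    """Wu-Kabat index for one alignment column.

    variability = (number of different residues) / (frequency of most common residue)
    Gap characters ('-') are ignored; an all-gap column returns 0.0.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    most_common_freq = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_freq

def variability_profile(aligned_sequences):
    """Variability at every position of a set of equal-length aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_sequences)]

# Hypothetical toy alignment of four short amino acid sequences.
toy_alignment = ["KIDPNS", "KIDPYS", "RIDPGS", "KIDPNS"]
profile = variability_profile(toy_alignment)
print(profile)
```

A fully conserved position scores 1, whereas a position with three different residues, the commonest of which occurs in half the sequences, scores 6. Peaks of this kind are what the plots display in the CDRs.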
The statistical validity of the observed variability in the CDRs of the VH186.2 and VH205.12 related germline genes was tested by comparing the observed numbers of nucleotide substitutions with the expected numbers for each VH subregion (Tables 8.2, 8.4, 10.1 and 11.2). In order to calculate the expected numbers of silent, replacement and stop codon-generating mutations three different models were used, all of which assume that mutations can occur at all positions with equal probability without taking into account mutational hotspots (eg. Kaartinen et al, 1991). The first method assumes a random point mutation process without making any assumptions about possible nucleotide substitution biases. The second method is based on the empirical substitution frequencies that were determined for meiotic nucleotide substitutions in IgV genes that are selection free (Kaartinen et al, 1991). Finally, the third method is based on the meiotic nucleotide substitution frequencies that were determined for non-Ig pseudogenes but taking into account probable changes of a methylated 5'-GC-3' to a 5'-GT-3' (Li et al, 1984). This type of analysis indicates clearly that in the VH186.2 related genes there is strong selection for silent nucleotide changes, and strong selection against amino acid replacement changes in CDR1. In contrast, CDR2a of these genes displays evidence for strong positive selection for replacement changes accompanied by negative selection of silent changes (Tables 8.2, 8.4 and 10.1). This analysis was only carried out for the 20 VH205.12 related sequences isolated from C57BL/6J DNA, since the 10 sequences isolated from BALB/c DNA are probably too small a sample to subject to statistical analysis. In the VH205.12 related sequences (Table 11.2) it is apparent that strong sequence conservation has occurred in FR1 and in FR2. The difference between the expected and observed numbers of replacement and silent nucleotide changes in CDR2b was also statistically significant for two of the mutator models used to calculate the expected numbers; however, this may only be an artifact of the low number of nucleotide changes present in that region (15 replacement changes, no silent or stop codon-generating changes), and thus awaits confirmation by a larger data set. The codon-by-codon analysis of the 20 VH205.12 sequences isolated from C57BL/6J DNA shows that although there is a peak of variability in CDR1 of these sequences, the R:S ratio of the observed changes does not differ significantly from the expected R:S ratio. This suggests that the majority of nucleotide changes in that region will result in amino acid replacements. Therefore, the codon-by-codon analyses carried out for the VH186.2 and the VH205.12 sub-families (Tables 8.2, 8.4 and 11.2) suggest that in some FRs and/or CDRs the differences between the observed and the expected numbers of replacement and silent changes are statistically significant. This confirms that the non-random patterns apparent in germline IgV genes are in fact significantly different from the patterns expected under a model where nucleotide changes accumulate in a random fashion (see chapter 13.5b). An important feature of IgV regions that is revealed by this type of analysis is that the expected R:S ratios are not constant over the entire IgVH region. Furthermore, in the VH186.2 related sequences the expected R:S ratio of CDR2a differs from the expected R:S ratio of CDR2b. This latter point is less apparent in the 20 VH205.12 related sequences isolated from C57BL/6J DNA.
In the VH186.2 related sequences the expected R:S ratio for CDR1 ranges from 6:1 to 12:1 and is much higher than in the other germline VH sub-regions. The CDR1 R:S ratio for the 20 VH205.12 related sequences that were isolated from C57BL/6J DNA ranges between 4:1 and 6:1, and although it is higher than the R:S ratios found in the other sub-regions of the VH genes it is not elevated to the extent that it is in the VH186.2 related genes. The elevated R:S ratio apparent in CDR1 leads to the prediction that if germline VH genes are evolving via a mechanism where mutations are introduced at random into the germline VH genes, an accumulation of nucleotide substitutions leading to amino acid replacements should be present in germline CDR1, i.e. Wu-Kabat variability plots should reveal an amino acid variability peak in CDR1 of germline genes. Whereas this is the case in the VH205.12 related sequences (Fig. 11.3), no significant variability peaks are apparent in CDR1 of the VH186.2 related sequences (Figs. 8.3 and 8.6). In both VH186.2 and VH205.12 related sequences the expected R:S ratio is higher in CDR2b than it is in CDR2a, although in the VH205.12 related sequences the extent of this elevation is lower than in the VH186.2 related sequences. This would predict that nucleotide and amino acid variability should be higher in CDR2b than in CDR2a. However, the observed patterns indicate that variability is in fact concentrated in CDR2a rather than in CDR2b. Taken together, the results indicate that strong selection has acted on these germline genes to conserve the FR sequences and diversify (parts of) the CDRs in a highly specific manner. The patterns of sequence variability of these germline genes deviate significantly from the expected patterns and resemble the patterns seen in somatically mutated IgV regions. The latter finding was previously described by others (eg. see Bothwell et al, 1981; Givol et al, 1981; Bentley and Rabbitts, 1983; Schiff et al, 1986; Kodaira et al, 1986; Reynaud et al, 1987, 1989; Pascual and Capra, 1989; Lautner-Rieske et al, 1992; Tomlinson et al, 1992; Sims et al, 1992) and is generally ascribed to the fact that the selection forces acting on expressed IgV regions are similar to those acting on unrearranged germline IgV genes (eg. Bothwell et al, 1981; Kodaira et al, 1986; Pascual and Capra, 1991). However, this interpretation does not seem to take into account that unrearranged germline IgV genes cannot be expressed at the protein level. Thus it is difficult to imagine how selection could act directly on these silent genes to produce the observed highly non-random sequence patterns (also discussed in more detail below).
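The dependence of the expected R:S ratio on codon composition can be illustrated for the first of the three models, i.e. a random point mutation process with no substitution bias. The Python sketch below enumerates every possible single-nucleotide substitution in a coding sequence and classifies it as silent, replacement or stop codon-generating using the standard genetic code. It is a minimal illustration under that assumption only: the function name `expected_counts` and the example codons are hypothetical, and the empirical substitution frequencies used by the other two models are not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def expected_counts(coding_seq):
    """Count all possible single-base substitutions by outcome.

    Under a uniform (unbiased) point mutation model the expected proportions
    of silent, replacement and stop codon-generating changes are proportional
    to these counts. Trailing bases that do not form a full codon are ignored.
    """
    seq = coding_seq.upper()
    counts = {"silent": 0, "replacement": 0, "stop": 0}
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        original = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                outcome = CODON_TABLE[mutant]
                if outcome == "*":
                    counts["stop"] += 1
                elif outcome == original:
                    counts["silent"] += 1
                else:
                    counts["replacement"] += 1
    return counts

# Hypothetical single-codon examples showing how composition shifts the ratio.
met = expected_counts("ATG")   # no silent change is possible
trp = expected_counts("TGG")   # two of nine changes create a stop codon
gly = expected_counts("GGG")   # all third-position changes are silent
print(met, trp, gly)
```

Because codons such as ATG and TGG allow no silent substitutions while codons such as GGG allow three, a sub-region rich in the former has a high expected R:S ratio even in the absence of selection, which is why the expected ratio is not constant along the VH gene.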

VH186.2 and VH205.12 related pseudogenes: Three of the 31 VH186.2 related sequences isolated from C57BL/6J DNA are pseudogenes as determined by the inability of the coding regions to produce a functional protein due to single nucleotide deletions (Table 8.1). The 21 BALB/c VH186.2 related sequences include three pseudogenes that contain a stop codon (Table 8.3). Five of the 20 VH205.12 related C57BL/6J sequences are pseudogenes due to stop codons, and a further two pseudogenes were generated by single nucleotide deletions (Table 11.1). The 10 VH205.12 sequences isolated from BALB/c DNA include two pseudogenes that were generated by a stop codon (Table 11.3). Other collections of germline genes related to VH186.2 (Bothwell et al, 1981) or belonging to the J558 VH family (Gu et al, 1991b) indicate that there is no increased abundance of stop codons or frameshift mutations in the coding region that was not amplified in this work (i.e. the 42 bp at the 3' termini of the genes). Nevertheless, it is not possible to determine whether any of the amplified sequences are pseudogenes due to coding defects in the 3' terminal 42 bp of coding sequence that was not PCR amplified. However, a repeat of this experiment with a 3' PCR primer situated downstream of the 3' terminus of the VH genes should resolve this issue. It cannot be excluded that the sequence collections contain additional pseudogenes due to sequence changes in critical transcription/rearrangement control sequences. The promoter region contains a number of DNA sequence motifs that are important for correct transcription of IgV genes. Some of the well known promoter motifs are the TATA box, the IgVH octamer sequence ATGCAAAT and the IgVH upstream heptamer CTCATGC (Parslow et al, 1984; Falkner and Zachau, 1984; Poellinger et al, 1989). The conserved heptamer is normally found 2 - 22 bp upstream of the octamer. A homologue of the above heptamer sequence is present 3 bp upstream of the VH186.2 octamer (eg. see Fig. 8.1); however, this heptamer sequence differs by one nucleotide (CTCATGA) from the above consensus heptamer sequence. In contrast, a homologous heptamer sequence is absent from the VH205.12 promoter region (eg. see Fig. 11.1). The promoter regions of the human VH6 gene (Sun et al, 1994) and of the human VH-III and VH-IV genes published by Haino et al (1994) also lack a heptamer homologue. Nonetheless, it is known that VH205.12 (Sablitzky and Rajewsky, 1984), the human VH6 (Sun et al, 1994) and some of the human VH-III and VH-IV (Haino et al, 1994) germline genes are capable of being expressed. This suggests that the heptamer is either not essential for correct promoter function (also see Haino et al, 1994) or another DNA sequence motif can compensate for the lack of the heptamer. Furthermore, the fact that VH186.2 can be expressed (eg. see Bothwell et al, 1981) despite containing a heptamer that differs from the consensus by a single nucleotide change suggests that (some) nucleotide changes from the consensus heptamer sequence may not abrogate the function of this element in promoter regions where it is present. Mutational analysis of the IgVH octamer (Ballard and Bothwell, 1986) indicated that changing the octamer sequence to ATTTAAAT, ATGGCAT or ATGAGAT resulted in almost complete loss of transcription of the gene. However, the sequences immediately 5' of the octamer, including the conserved heptamer, were absent in these mutants. Therefore, it is possible that the loss of transcription was due to the absence of one or more accessory sequence motifs rather than to the mutations introduced into the octamer.
It is of interest to note that whereas the VH186.2 octamer sequence corresponds perfectly with the consensus IgVH octamer sequence published by Parslow et al. (1984), the VH205.12 octamer sequence differs from the consensus at the last position: ATGCAAAA. Strikingly, all but one of the 29 VH205.12 related sequences isolated in this work contain an A→T nucleotide substitution (relative to the VH205.12 sequence) at the 3' terminal base of the octamer, thus effectively reverting the diverged octamer sequence back to the consensus sequence (Figs. 11.1 and 11.4). However, since the VH205.12 gene is capable of being expressed, the T→A change from the consensus sequence at the most 3' base of the octamer does not inhibit the effectiveness of the octamer. There is additional evidence that complete octamer sequence conservation is not obligatory for the maintenance of a functional promoter region. Thus, at least 16 human and murine IgVH and IgVL genes that are known to be expressed contain octamers that diverge from the consensus sequence by one or two nucleotide changes (see Table I in Atchison et al, 1990). A recent study demonstrated that when the octamer sequence is altered by three nucleotides the octamer binding protein (OBP)-100 can still bind to the octamer with high affinity by interacting with the sequences immediately flanking the octamer (Baumruker et al, 1988). It was suggested that this effect may be mediated by the ability of flanking sequences to stabilize the overall DNA conformation, or by the presence of specific contact sites in the DNA sequence immediately flanking the octamer. In addition, it was recently found that an IgVκ octamer that was bound poorly by the promoter binding factors was compensated for by an upstream pyrimidine rich octamer which resulted in the expression of the gene (Atchison et al., 1990).
The only nucleotide change present in the octamer of the VH205.12 related sequences is the A→T change which produces the consensus octamer sequence; therefore all of these genes contain a functional octamer. One of the VH186.2 related sequences isolated from BALB/c DNA and 6 of the C57BL/6J sequences contain single nucleotide changes within the octamer motif (Figs. 8.1 and 8.4). However, none of these sequences contains any nucleotide changes within 20 bp on either side of the octamer motif, suggesting that if any of these single nucleotide changes do interfere with OBP-DNA interactions, high affinity OBP-DNA interaction may nevertheless be mediated by the sequences immediately flanking the octamer that may have been conserved for this reason. However, this would assume that strong selection based on function has acted on these flanking regions. It is difficult to see how such selection could act directly on the flanking region of a translationally silent gene, especially since hundreds of IgVH genes are present in the germline, thus making it unlikely that the loss of function of any one of these would be deleterious to the whole organism (see section 13.5c). The TATA boxes of the human VH-I, VH-II, VH-III and VH-IV families display sequence variation both between and within the families (Haino et al, 1994), indicating that the recognition and binding of this promoter element by the appropriate binding factor(s) may be flexible. The TATA box of the VH186.2 gene was previously determined (Both et al, 1990; see Fig. 8.1). None of the 21 VH186.2 related sequences isolated from BALB/c DNA contain any nucleotide changes in the TATA box (Fig. 8.4); however, 4 of the 31 C57BL/6J sequences contain an A→T nucleotide change. Since this nucleotide change does not alter the AT content of the TATA box it may have no adverse effect on the function of this region.
Although previous studies did not determine a TATA box for the VH205.12 gene, an AT rich region is present between 23 and 28 bp upstream of the cap site (Fig. 11.1). Only one of the VH205.12 related sequences isolated in this work contains a nucleotide change within the putative TATA box. Since this change is an A→T, it may not interfere with the function of the region. None of the VH205.12 related sequences contain any nucleotide changes at the splice sites (Figs. 11.1 and 11.4). However, one of the VH186.2 related sequences isolated from C57BL/6J DNA (C57C25, Fig. 8.1) contains an altered splice site (AG/GT→AC/GT), and one of the VH186.2 related BALB/c sequences (BALB25, Fig. 8.4) also contains a nucleotide change within the splice site (AG/AT). If these alterations interfere with splicing these two sequences may be pseudogenes. There may also be additional pseudogenes due to defects in the recombination signal sequences found in the 3' flanking regions that were not PCR amplified in this work. As discussed above, the sequences isolated in this work contain some genes that are unable to produce a functional protein due to coding region defects such as stop codons and frameshift mutations. Thus, 9.7 % of the VH186.2 related sequences isolated from C57BL/6J DNA, and 14.3 % of the BALB/c VH186.2 related sequences are pseudogenes on the grounds that they are unable to produce a functional protein. The VH205.12 sub-family appears to contain more pseudogenes: 7 out of the 20 C57BL/6J sequences, 35 %, contain coding defects. Additional pseudogenes may have been produced by mutations that interfere with the correct transcription or with the splicing of the mRNA. However, recent studies (see above) have provided evidence that the interactions between the DNA binding factors and the various promoter elements may be flexible in order to accommodate random nucleotide changes in the sequence motifs. Indeed, it appears that flanking sequences may even contain 'back-up' sequences that can compensate for the loss of function of some of the promoter sequence motifs (Atchison et al, 1990). The above results indicate that the proportion of pseudogenes in the VH186.2 sub-family is much lower than that of the VH205.12 sub-family. Although it cannot be excluded that there may be additional pseudogenes due to other defects, it is unlikely that this would raise the number of VH186.2 pseudogenes to the level found in the VH205.12 sub-family. Among human VH families the proportion of pseudogenes also varies.
Thus, approximately 50 % of the largest two human VH gene families, VH-I and VH-III, are pseudogenes, whereas in the smaller families the proportion appears to be much lower (eg. see Lee et al., 1987; Pascual and Capra, 1991; Tomlinson et al, 1992). The sequence data obtained for the VH186.2 and VH205.12 sub-families (chapters 8 and 11) indicate that the proportion of pseudogenes in murine VH sub-families may also vary. It is also interesting to note that the pseudogenes detected in this work contain only single coding defects. This is consistent with the previous finding that the majority of murine pseudogenes contain only one crippling mutation (Cohen and Givol, 1983; Blankenstein et al, 1987). In addition, some of the murine pseudogenes reported in this work (chapters 8 and 11) as well as some human pseudogenes (Tomlinson et al, 1993; Haino et al, 1994) contain the same coding region defect and significant sequence homology. Pseudogenes with multiple coding and other defects, which are thought to represent older pseudogenes, have been detected in the murine (Blankenstein et al, 1987) and human (Tomlinson et al, 1993; Haino et al, 1994) genomes. Interestingly, the ratio of (old) pseudogenes with multiple defects to (recent) pseudogenes with single defects appears to be higher in the human (Tomlinson et al, 1992; Haino et al, 1994) than in the murine IgVH locus (Cohen and Givol, 1983; Blankenstein et al, 1987; chapters 8 and 11). The presence of many IgV pseudogenes with single crippling mutations led to the suggestion that IgV pseudogenes may be corrected via gene conversion events (Cohen and Givol, 1983; Schiff et al, 1985; Blankenstein et al, 1987; Haino et al, 1994, see below).
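The pseudogene proportions quoted above follow directly from the counts of sequences with coding defects given in the text. The short Python sketch below reproduces that arithmetic; the dictionary layout is an illustrative choice, and the 20 % value for the BALB/c VH205.12 sequences is simply derived here from the 2 of 10 sequences reported, since the text does not state it as a percentage.

```python
# (pseudogenes with coding defects, total sequences isolated), as given in the text.
pseudogene_counts = {
    ("VH186.2", "C57BL/6J"): (3, 31),
    ("VH186.2", "BALB/c"): (3, 21),
    ("VH205.12", "C57BL/6J"): (7, 20),
    ("VH205.12", "BALB/c"): (2, 10),
}

# Percentage of pseudogenes per sub-family and strain, to one decimal place.
pseudogene_percent = {
    key: round(100.0 * defective / total, 1)
    for key, (defective, total) in pseudogene_counts.items()
}

for (subfamily, strain), percent in pseudogene_percent.items():
    print(f"{subfamily} ({strain}): {percent} %")
```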

Deficit of stop codon-generating nucleotide substitutions: The detailed codon-by-codon analyses of the VH186.2 related sequences (Tables 8.2, 8.4 and 10.1) and of the 20 VH205.12 related sequences (Table 11.2) reveal that the observed numbers of stop codons are significantly lower than would be expected from the random accumulation of mutations during the course of evolution. In the VH186.2 related sequences the difference is statistically significant for all three mutation models used to calculate the expected numbers, whereas in the VH205.12 related sequences the difference is only statistically significant under the two models that take into account nucleotide substitution biases (Kaartinen et al., 1991; Li et al., 1984; see above). It is likely that mutation models that take into account nucleotide substitution biases are a more accurate reflection of the expected number of mutations than the random model, which makes no assumptions at all. Probably the most accurate of the three methods is the mutation model that was derived from meiotic mutations that accumulated in non-immunoglobulin related pseudogenes and also takes into account changes of methylated C nucleotides to T (Li et al., 1984). According to this model, the difference between the expected and observed number of stop codons in the C57BL/6J VH186.2 related sequences is statistically significant (p < 0.001; Table 8.2). In the 21 BALB/c VH186.2 related sequences the difference is also statistically significant (p < 0.05); however, this calculation includes two 'rescued' stop codons, i.e. stop codons that contain one or more additional nucleotide changes which result in a codon that codes for an amino acid. If these two stop codons are eliminated from the calculations then the difference is even more significant (p < 0.01). Although the 20 VH205.12 related sequences contain 5 'functional' stop codons (i.e. stop codons that will effectively terminate translation), this is still significantly below the expected number (Table 11.2) according to the mutation model described by Li et al. (1984). The fact that the number of stop codons is significantly below the expected number indicates that these genes have undergone selection based on function. This is consistent with the distribution of nucleotide and amino acid variation found within these genes, which is also indicative of selection based on antigen binding. Indeed, it was previously pointed out that the selection forces operating on germline IgV genes and those operating on expressed, rearranged IgV regions during an immune response produce the same pattern of sequence variation within these genes, i.e. accumulation of variability within the CDRs but not in the FRs (eg. see Bothwell et al., 1981; Kodaira et al., 1986; Pascual and Capra, 1991). This view is supported by the finding that stop codons are selected against in germline VH genes as well as in expressed IgVH genes.
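The distinction drawn above between 'functional' and 'rescued' stop codons can be illustrated with a minimal Python sketch. It rests on an assumption that is not stated in this form in the text: a codon is treated as 'rescued' when it differs from the consensus codon at two or more positions, still codes for an amino acid, and one of those changes taken alone would have produced a stop codon. The function name `classify` and the example codons are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(consensus, observed):
    """Classify an observed codon relative to the consensus codon (assumed rule)."""
    if observed == consensus:
        return "unchanged"
    if CODON_TABLE[observed] == "*":
        return "functional stop"
    diffs = [i for i in range(3) if consensus[i] != observed[i]]
    if len(diffs) >= 2:
        # Would any one of the changes, taken alone, have created a stop codon?
        for i in diffs:
            intermediate = consensus[:i] + observed[i] + consensus[i + 1:]
            if CODON_TABLE[intermediate] == "*":
                return "rescued stop"
    if CODON_TABLE[observed] == CODON_TABLE[consensus]:
        return "silent"
    return "replacement"


# Hypothetical example: CAG (Gln) -> TAG would be a stop; TAT (Tyr) is 'rescued'.
print(classify("CAG", "TAT"))
```

Under this rule a codon counted as 'rescued' contributes to the stop codon-generating total but not to the number of 'functional' stop codons.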

Hyper-recombination targeted to coding/transcription units: Each VH gene is centrally situated within 10 - 20 kb of flanking sequence (reviewed in Honjo, 1983; Rathbun et al., 1989). Under a model of point mutation and selection it would be reasonable to assume that each PCR amplified region (~958 bp for VH186.2 related sequences, ~968 bp for VH205.12 related sequences) should have evolved as an intact contiguous unit. This proposition was tested by subjecting the putative coding regions, the putative transcription units and the 5' non-transcribed flanking regions of each set of sequences to independent phylogenetic analyses (Figs. 9.1, 9.2, 12.1 and 12.2). Dendrograms were constructed with the MegAlign software package, which utilizes the weighted-residue method (Hein, 1990). The validity of this method and the reproducibility of the overall topologies of the dendrograms were independently tested and confirmed by using a number of alternative phylogenetic analyses (see chapter 9). The results of the phylogenetic analyses indicate that the overall topologies of the dendrograms obtained for the putative coding and transcription units are similar but that they differ from the dendrograms obtained for the 5' flanking regions. Thus, the lineal relationships present in the 3' transcribed regions and the 5' non-transcribed regions are very different, which indicates that the two regions have probably evolved quite differently. This was found in the VH186.2 and the VH205.12 related sequences isolated from both C57BL/6J and BALB/c DNA as well as in the collection of genuine germline genes. The independent confirmation of these results further demonstrated not only that the 5' flanking regions and the transcription units evolved as separate contiguous units, but also that consistently more nucleotide changes were found in the putative 3' transcribed region.
Therefore, the results from the phylogenetic analyses of the PCR amplified sequences indicate that the 3' transcription unit has evolved at a faster rate. This could be due to hyper-recombination events targeted to the putative transcription units or the coding regions. If it is assumed that insertions and deletions can be introduced at recombination points (eg. see Thomas and Capecchi, 1986; Weiss and Wilson, 1988), then the proposition that the transcription/coding units are targets for hyper-recombination events would predict that insertion and deletion events should be concentrated around the cap site. Indeed, the distribution of 21 insertion and deletion events around the 52 VH186.2 related sequences indicates clearly that such a concentration of insertion and deletion events is centered over a 50 bp region that includes the cap site (Fig. 9.3). Examination of the 31 VH186.2 related sequences isolated from C57BL/6J DNA (Fig. 8.1) and the 21 isolated from BALB/c DNA (Fig. 8.4) reveals that most of the insertion and deletion events were contributed by the BALB/c sequences. Thus, the phylogenetic analysis of the VH186.2 related sequences and the distribution of insertions and deletions around them are consistent with the proposition that the putative transcription unit of these sequences is targeted by hyper-recombination events. From this work it cannot be established whether deletion and insertion events are also concentrated immediately downstream of the putative VH coding unit; however, this issue should be resolved by PCR amplifying and sequencing VH186.2 related sequences with a primer located downstream of the VH gene. Although the phylogenetic analysis of the VH205.12 related sequences is also consistent with the proposition that the putative transcription/coding units have been targeted by hyper-recombination events, the distribution of insertions and deletions around these sequences (Fig. 12.3) differs from that found amongst the VH186.2 related sequences.
In these sequences there are two minor peaks of insertions and deletions near the promoter (150 - 200 bp and 250 - 300 bp upstream of the cap site). Comparison of the insertion/deletion distribution graphs obtained from the VH205.12 and VH186.2 related sequences reveals that both display minor concentrations of insertions and deletions (i.e. 2-3 insertion/deletion events per 50 nucleotide region) in the regions 150 - 200 bp and 250 - 300 bp upstream of the cap site. Thus, the only inconsistency in the insertion/deletion distribution graph for the VH205.12 related sequences is the absence of a peak of insertion and deletion events around the cap site. However, since only 10 BALB/c sequences contributed to this analysis it is possible that this merely reflects the smaller size of the VH205.12 related sequence data set. This issue could be resolved by isolating and sequencing additional VH205.12 related sequences (especially from BALB/c DNA), but due to time limitations it was not possible to do this for this thesis.

The evolution of the IgV gene families has almost certainly involved gene conversion events, and there is some evidence for this in the literature (eg. see Bentley and Rabbitts, 1983; Krawinkel et al., 1983; Ferguson et al., 1989; Haino et al., 1994). It may be suggested that the evidence presented in this thesis for hyper-recombination events targeted to the putative transcription/coding units of germline IgV genes may be indicative of normal germline gene conversion events that have contributed to the evolution of IgV gene families. This, however, is unlikely for several reasons. Cohen et al. (1982) showed that 'zipper' regions (regions composed of repetitive DNA sequence elements) are found upstream of the ATG initiation codon of IgVH genes.
The demonstration that 2 out of 5 sequences that are homologous downstream of such a zipper region abruptly diverge from the other 3 immediately upstream was taken as evidence that zipper regions promote gene conversion events between different germline IgVH genes. The upstream VH186.2 germline sequence contains a 42 bp DNA region of TA repeats which is situated 573 to 615 bases upstream of the putative cap site. However, the distribution of insertions/deletions around VH186.2 related sequences (Fig. 9.3) suggests that the cap site may be a hot spot for recombination even though the area immediately surrounding the cap site does not contain a zipper region. Since the cap site is not in close proximity to a gene conversion-promoting zipper region, it seems likely that the recombination events that occur around the cap site are not of the same type as those occurring around the zipper region. Significant differences between the sequence homologies of the 5' flanking regions and the putative transcription units may interfere with the use of the zipper region as a recombination hot spot. However, the sequence homology plots shown in Figures 8.2, 8.5, 11.2 and 11.5 indicate that the 5' non-transcribed flanking regions and the putative transcription units both share sequence homologies greater than 80 %. Furthermore, in the data set that provided the greatest number of insertion and deletion events around the cap site (21 VH186.2 related sequences from BALB/c, Table 8.3), the sequence homologies of the two different regions are very similar, thus further supporting the proposition that the hyper-recombination events that are targeted to the transcription/coding units are of a different nature than the recombination events that are mediated by the zipper region.

Utilization and location of VH genes belonging to the J558 family: The J558 family, which includes the VH186.2 and the VH205.12 sub-families, is located furthest from the D locus (Fig. 1.2; reviewed in Kofler et al., 1992). It was shown that in CB.20 mice members of the J558 family are commonly rearranged and expressed (Gu et al., 1991b). This is in contrast to the human Vκ locus, where Vκ gene usage decreases with distance from the Jκ locus (Cox et al., 1994). Although the location of the J558 family is known, the relative positions of the sub-families have not yet been determined. It would also be interesting to assess the level of interspersion between the VH186.2 and VH205.12 sub-families. In addition, although it was found that generally all J558 sub-families are expressed in B cells, in some of the sub-families, including VH186.2 and VH205.12, a few VH genes seem to be preferentially utilized. It would thus also be of interest to determine the location of the preferentially utilized genes within the J558 gene family, i.e. are they the most D-proximal J558 genes? These questions can only be answered by mapping and sequencing the entire murine VH locus.

Summary: The sequence data and the analyses presented above have revealed a number of very interesting properties of murine VH186.2 and VH205.12 related germline genes. First, sequence variability of the VH186.2 related germline genes is concentrated in CDR2a. Although the expected R:S ratios determined from a consensus sequence predict that sequence variability should be present in CDR1 and CDR2b, this is not the case. In the VH205.12 related sequences the predicted sequence variability is present in CDR1, but not in CDR2b. Second, comparison of the observed numbers of replacement and silent nucleotide exchanges in these sequences with the expected numbers revealed that in several sub-regions (eg. FR1, FR2, CDR2a) the differences are statistically significant. This further emphasizes the extremely non-random nature of the distribution of sequence variability within the putative coding regions of these genes. Third, a statistically significant lack of stop codons generated by point mutations is apparent in three of the above sequence collections (statistical analysis of the BALB/c VH205.12 related sequences was not done due to the low number of sequences). The above three properties of germline IgVH genes demonstrate that these germline genes have been subjected to very strong selection based on protein function during the course of their evolution. As was pointed out previously by others (eg. Bothwell et al., 1981; Kodaira et al., 1986; Pascual and Capra, 1991), the similarities between the germline IgV genes and the expressed rearranged IgV regions are remarkable. It is therefore of interest to evaluate what type of mechanism could generate the above patterns of sequence diversity in germline IgVH genes (see section 13.5c).
Finally, the phylogenetic analyses indicate that the putative transcribed regions and the 5' non-transcribed flanking regions are evolving as independent units and that the putative transcribed regions appear to be evolving at a faster rate than the non-transcribed 5' flanking regions. This indicates that hyper-recombination events targeted to the putative transcription and/or coding units may have contributed to the evolution of these genes. This is supported by the concentration of insertions and deletions around the cap site of the VH186.2 related sequences.

b) Patterns of sequence variation among germline murine VL genes and among germline IgV genes of other vertebrate species

In order to determine whether some or all of the above properties are unique to murine IgVH germline genes or whether they can also be found in murine IgVL germline genes and in the germline IgV genes of other vertebrate species, sets of related germline IgV genes were taken from the literature and subjected to the above analyses where possible.

Murine VκOx1 germline genes: Variability plots: The sequences for a total of 26 VκOx1 germline genes were obtained from the literature (14 from Even et al., 1985; 12 from Milstein et al., 1992, group II sequences). A previously constructed variability plot for the DNA sequences of these genes (Gonzalez-Fernandez and Milstein, 1993, Fig. 5b) indicates that germline variability is concentrated in a region of approximately 60 bp which includes CDR2 and the region that immediately precedes CDR2. Another significant peak in variability corresponds to the portion of CDR3 that is coded for by the germline V genes, whereas only a very minor variability peak is present in CDR1. Thus, these genes also display a concentration of sequence variability similar to that found in the above murine VH germline genes.

Codon-by-codon analysis: The numbers of replacement, silent and stop codon-generating nucleotide changes observed in the 26 germline genes were compared to the expected numbers, which were calculated from a consensus sequence as described in chapter 8.

Table 13.2. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 26 VκOx1 related germline genes.

Sub-section of                 Numbers of nucleotide differences resulting in:
V region (codons)              Replacement             Silent                  Stop

FR1 (1-23)         Obs         62                      30                      0
                   Exp‡        67.1   65.0   64.8      22.6   24.3   24.7      2.2   2.8   2.6
CDR1 (24-34)       Obs         46                      6                       1
                   Exp         41.7   40.9   40.3      9.8    10.8   11.5      1.5   1.4   1.2
FR2 (35-49)        Obs         67                      37                      1
                   Exp         73.9   69.4   68.7      21.7   22.9   23.7      9.3   12.7  12.6     *** ***
CDR2 (50-56)       Obs         61                      3                       0
                   Exp         45.7   43.8   43.3      18.3   20.2   20.7      0.0   0.0   0.0
FR3 (67-83)        Obs         48                      32                      0
                   Exp         58.3   57.0   56.8      19.8   20.8   21.3      1.9   2.2   1.9      *** ***
CDR3 (89-95)       Obs         61                      17                      2n
                   Exp         62.2   59.6   54.9      11.4   13.9   14.5      6.3   10.5  10.6

Entire V region    Obs         345                     125                     4(2n)
                   Exp         348.9  334.7  328.8     103.6  112.9  116.4     21.2  29.6  28.9

For this analysis, a VκOx1 related consensus sequence was determined for 26 germline sequences (Even et al., 1985; Milstein et al., 1992, Group II sequences). The consensus sequence differs from the germline VκOx1 sequence only at codons 46 (CGC), 53 (AAC), and 94 (TAC).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al., 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for substitutions in meiotic non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
n Indicates the number of neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation.
χ2 probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: *** = p<0.01; **** = p<0.001. For stop codons, the χ2 value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).
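The random point mutation model described in the table legend (every site equally mutable, no substitution bias) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure of chapter 8: it enumerates the nine possible single-nucleotide changes of every consensus codon and reports the fractions that would be replacement, silent or stop codon-generating. The function name `expected_fractions` is hypothetical, and the two bias-corrected models, which would require the empirical substitution frequencies, are not implemented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_fractions(consensus):
    """Fractions of all single-base changes that are replacement, silent or stop.

    `consensus` is an upper-case, in-frame DNA coding sequence (assumption).
    """
    counts = {"replacement": 0, "silent": 0, "stop": 0}
    usable = len(consensus) - len(consensus) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = consensus[start:start + 3]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    counts["stop"] += 1
                elif CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    counts["silent"] += 1
                else:
                    counts["replacement"] += 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


# Hypothetical two-codon example (Trp, Leu).
print(expected_fractions("TGGCTG"))
```

Multiplying these fractions by the total number of nucleotide differences observed in a sub-region would give expected numbers of the kind listed in the 'Exp' rows, assuming the expectation is scaled to the observed total.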

Both FR2 and FR3 display evidence of strong sequence conservation, whereas CDR2 has undergone strong selection for replacement mutations. The differences between the observed and the expected numbers of mutations in these regions are statistically highly significant. The 26 VκOx1 germline genes also contain lower than expected numbers of stop codon-generating point mutations. For all three methods of calculating the expected numbers, the differences between the observed and expected numbers of stop codon-generating changes are statistically significant. It is interesting to note that out of a total of four stop codon-generating point mutations only 2 actually result in a 'functional' stop codon; the other two are 'rescued' by additional mutation(s) in the same codon. As with the murine VH sequences presented in chapters 8 and 11, the expected R:S ratios vary between the different sub-regions. In the FRs the expected R:S ratio is approximately 3:1 for all mutator models, whereas the expected R:S ratios of CDR1 and CDR3 are approximately 4:1 and between 4:1 and 5.5:1, respectively. Interestingly, the expected R:S ratio in CDR2 is somewhat depressed, ranging between 2.5:1 and 2:1 depending on the mutator model. Thus, the expected R:S ratios would predict the concentration of amino acid variability in CDR1 and CDR3 and reduced amino acid variability in CDR2.
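The comparison of observed with expected replacement and silent changes can be illustrated with a χ2 goodness-of-fit calculation with 1 degree of freedom. The Python sketch below is a minimal illustration and makes two assumptions that the text does not spell out: the expected numbers are rescaled so that they sum to the observed total of replacement plus silent changes, and no continuity correction is applied. The function name `chi_square_1df` is hypothetical; the example values are those listed for CDR2 under the random model in Table 13.2.

```python
import math


def chi_square_1df(obs_r, obs_s, exp_r, exp_s):
    """Chi-square statistic and p-value (1 df) for replacement vs silent counts."""
    total = obs_r + obs_s
    scale = total / (exp_r + exp_s)  # rescale expectations to the observed total
    e_r, e_s = exp_r * scale, exp_s * scale
    chi2 = (obs_r - e_r) ** 2 / e_r + (obs_s - e_s) ** 2 / e_s
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# CDR2, Table 13.2, random model: observed 61 R and 3 S; expected 45.7 R and 18.3 S.
chi2, p = chi_square_1df(61, 3, 45.7, 18.3)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

With these inputs the statistic lies well above the 1 df critical value for p = 0.001, which is in line with the strong selection for replacement mutations in CDR2 described above.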

However, the variability plot for 26 VκOx1 germline genes (Gonzalez-Fernandez and Milstein, 1993, Fig. 5b) indicates that although there is a concentration of sequence variability in CDR3, the variability of CDR1 is barely above background, and most sequence variability is present in CDR2 and in the region immediately preceding CDR2. Gonzalez-Fernandez and Milstein (1993, Fig. 5a) also constructed a variability plot for 48 VκOx1 transgenes that had accumulated somatic mutations in Peyer's patches of unimmunized mice. This revealed that, with the exception of a mutational hot spot in FR3, amino acid replacements caused by somatic mutations had accumulated in the sub-regions with the highest expected R:S ratios, i.e. in CDR1 and in CDR3.

Distribution of insertions and deletions: Although none of the 26 sequences contained sufficient non-transcribed 5' flanking region sequence for the phylogenetic analyses shown in chapter 9, the 12 germline gene sequences presented in Milstein et al. (1992) included the leader sequence. Thus, the distribution of insertions and deletions was plotted for these 12 VκOx1 germline genes (Fig. 13.8). This reveals that the putative coding regions of the 12 VκOx1 germline genes are flanked on both sides by a concentration of insertion and deletion events. Significant concentrations of insertion/deletion events are present within the leader intron, and a lesser concentration is present immediately downstream of the 3' terminus of the putative coding region.

[Figure 13.8 graph; x-axis: nucleotide position, -400 to 600]

Figure 13.8. Distribution of 31 unique insertion and deletion events around 12 VκOx1 germline genes. The data are grouped in 50 bp intervals. The diagram below the graph represents the region for which sequence data was available. The relative positions of CDR1, CDR2 and CDR3 are indicated with black boxes. Symbols: L = leader; V = V gene.
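The grouping of insertion and deletion events into 50 bp intervals that underlies distribution graphs of this kind can be sketched as below. This is a minimal Python illustration, not the procedure actually used to prepare the figure; the function name `bin_events` and the event positions are hypothetical, and positions are assumed to be given relative to the reference point that was assigned position 0.

```python
from collections import Counter


def bin_events(positions, width=50):
    """Count events per interval; keys are the lower bound of each interval."""
    # Floor division keeps negative (upstream) positions in the correct interval,
    # e.g. -30 falls in the interval starting at -50.
    bins = Counter((pos // width) * width for pos in positions)
    return dict(sorted(bins.items()))


# Hypothetical event positions (bp relative to position 0).
events = [-30, -10, 5, 49, 50, 120]
for lower, count in bin_events(events).items():
    print(f"{lower:>5} to {lower + 49:>5}: {count}")
```

A concentration of events would then appear as one or more adjacent intervals with counts clearly above those of the surrounding intervals.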

Since it was not possible to construct dendrograms for the putative coding regions and the 5' flanking regions, it is not possible to determine whether the two regions evolved differently. However, the distribution of insertions and deletions around the 12 VκOx1 germline genes indicates that hyper-recombination events may also occur in the IgVκ locus, but that in this locus hyper-recombination is targeted to the putative V coding regions.

Summary: Many of the patterns observed in the VH186.2 and VH205.12 related sub-families are also present in the 26 VκOx1 germline genes. Thus, sequence variability in these genes is also highly non-random. This is highlighted by the comparison of the variability plots obtained for germline VκOx1 genes and somatically mutated VκOx1 transgenes isolated from Peyer's patches of unimmunized mice (Gonzalez-Fernandez and Milstein, 1993, Fig. 5). The former deviates significantly from the expected pattern, whereas the latter is as predicted by an essentially random point mutation process. The number of stop codons generated by point mutation in the 26 VκOx1 germline genes is also significantly below the expected number. This suggests that the pattern of sequence variation in these germline genes is the result of strong selection for protein function. Finally, the putative coding units of 12 VκOx1 germline genes are flanked on both sides by concentrations of insertion and deletion events. If it is assumed that these events are diagnostic of recombination events, then it appears that hyper-recombination events have also targeted these IgVL chain genes, but unlike the VH186.2 and VH205.12 related genes only the V coding region is targeted by this process.

Human VH genes: The human IgH locus is located on chromosome 14 and although the organization of the genetic elements is very similar to the murine IgH locus (Fig. 1.1), there is much more intermingling of the VH gene families (reviewed in Berman and Alt, 1990; Pascual and Capra, 1991). The human IgVH locus contains 6 VH families, one of which, VH-VI, contains a single gene which is the most JH-proximal VH gene (reviewed in Pascual and Capra, 1991). VH-I and VH-III are the largest VH gene families and approximately 50 % of these genes are pseudogenes. The VH-IV family, however, contains very few pseudogenes (Lee et al., 1987; Pascual and Capra, 1991). This is similar to the scarcity of pseudogenes in the VH186.2 sub-family of the C57BL/6J strain (Table 8.1) when compared with the higher apparent proportion of pseudogenes found in the VH205.12 sub-family of the same mouse strain (Table 11.1). It was recently estimated that the total number of human IgVH genes is approximately 120, which is significantly lower than the number likely to be present in the murine IgVH locus (Matsuda et al., 1993). The evolution of the human IgVH locus has involved the translocation of VH genes to other chromosomes, as is evidenced by the presence of sequences that cross-hybridize with VH-I, VH-II and VH-III family specific primers on chromosomes other than chromosome 14 (Matsuda et al., 1990; Cherif and Berger, 1990). Four of these VH genes (orphons) that have been translocated to chromosome 16 were isolated and sequenced (Matsuda et al., 1990). The sequences of two of these revealed that they belong to the VH-I family and that both contained multiple nonsense and frameshift mutations, whereas the other two sequences, which belong to the VH-III family, appear to be functional. It is not known whether these genes can be expressed in B cells.
However, it was recently demonstrated that rearranged VH-D-JH transgenes that lack a downstream C gene can be expressed by murine B cells following interchromosomal recombination of the transgene with an endogenous C region gene (Giusti et al., 1992; Giusti and Manser, 1993, 1994). This indicates that interchromosomal gene conversion can join a rearranged VH region to a CH region, and it is thus conceivable that a similar mechanism may result in the rearrangement of functional orphon VH genes to the other genetic elements that make up the IgVH chain.

Variability plots: Amino acid variability plots for 21 VH-I and 25 VH-III genes were previously published (Tomlinson et al., 1992, Fig. 4; only functional genes were included). The amino acid variability plots for both gene families clearly illustrate the presence of amino acid variability peaks in CDR1 and in CDR2a. A minor peak is also present in FR3 of the VH-I genes; however, the level of variability in CDR2b in both gene families is equal to the background.

Recently, the sequences of 9 VH-III genes were determined up to a point approximately 250 bp upstream of the ATG initiation codon (Haino et al., 1994). The putative coding region sequences for these genes were previously determined (Matsuda et al., 1993). Although this is a small data set, it is the largest set of human VH genes for which a significant amount of upstream sequence is available. Thus, nucleotide and amino acid variability plots were constructed for these sequences (Fig. 13.9).

[Figure 13.9 graphs; x-axes: nucleotide position (left) and amino acid position (right)]

Figure 13.9. Nucleotide (left) and amino acid (right) variability plots for 9 VH-III sequences (Haino et al., 1994). The horizontal line indicates the level of variability where only 1 out of the 9 sequences contained a nucleotide change (2.25). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0. The diagram below each graph indicates the region that was analyzed. The relative positions of CDR1 and CDR2 are indicated by black boxes. In the nucleotide variability plot the first nucleotide of the putative coding region was assigned position 0. Symbols: P = promoter; L = leader; V = V gene.
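The threshold of 2.25 quoted in the figure legend is what a Wu and Kabat style variability index gives when 1 of 9 sequences differs at a position (2 different residues divided by a most-common-residue frequency of 8/9). The Python sketch below assumes that this index was the one used, which the legend itself does not state. The function names and the toy alignment are hypothetical, and gaps are not treated specially.

```python
from collections import Counter


def variability(column):
    """Variability of one alignment column (assumed Wu and Kabat style index).

    Number of different residues divided by the frequency of the most common
    residue; invariant positions are assigned 0, as in the figure legend.
    """
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0
    most_common_count = counts.most_common(1)[0][1]
    return len(counts) * len(column) / most_common_count


def variability_profile(aligned_sequences):
    """Variability at every position of a set of equal-length aligned sequences."""
    return [variability(column) for column in zip(*aligned_sequences)]


# Hypothetical alignment of 9 sequences; only the last sequence differs, at position 2.
toy = ["ACGT"] * 8 + ["ACTT"]
print(variability_profile(toy))
```

The same function can be applied to aligned nucleotide or amino acid sequences, since it only counts the distinct symbols in each column.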

Major nucleotide variability peaks present in CDR1 and CDR2 are also present in the putative amino acid sequences. A number of nucleotide variability peaks in FR1 and FR3, however, are not present at the protein level, indicating strong sequence conservation in these regions. Immediately upstream of the promoter region there is a great increase in the overall level of nucleotide variability. Examination of the sequences (Haino et al., 1994, Fig. 3) reveals that the sequences diverge significantly beyond this point, which indicates that recombination events have probably occurred in this region. Thus, the high level of nucleotide variability in this region is not likely to reflect an increased incidence of point mutation.

Codon-by-codon analysis: Although the data set is small, a detailed analysis of observed versus expected R:S ratios was carried out (Table 13.3). It must be noted that the 9 sequences do not include any pseudogenes, which may bias the observed incidence of stop codons generated by point mutations.

Table 13.3. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 9 human VH-III germline genes.

Sub-section of                 Numbers of nucleotide differences resulting in:
V region (codons)              Replacement             Silent                  Stop

FR1 (1-30)         Obs         21                      15                      0
                   Exp‡        24.9   24.0   23.8      9.9    10.7   11.1      1.2   1.3   1.2
CDR1 (31-35)       Obs         26                      9                       0
                   Exp         28.8   27.8   27.5      5.5    5.8    6.3       2.0   2.8   2.9
FR2 (36-49)        Obs         9                       26                      0
                   Exp         24.4   22.9   22.6      8.6    9.3    9.5       1.3   1.8   1.9      **** ****
CDR2a (50-58)      Obs         54                      11                      3n
                   Exp         52.0   50.0   49.2      14.3   15.3   15.8      1.7   2.7   3.0
CDR2b (59-66)      Obs         16                      13                      1n
                   Exp         21.2   20.7   20.7      7.5    8.1    7.9       1.3   1.7   1.4      ** *
FR3 (67-93)        Obs         37                      40                      0
                   Exp         55.9   53.4   52.8      18.2   19.8   20.9      2.9   3.9   3.3      ****

Entire V region    Obs         163                     114                     4n
                   Exp         207.3  198.8  196.6     64.0   69.0   71.5      9.9   13.7  13.0     *** ***

For this analysis, a VH-III consensus sequence was determined for 9 genes that also included 5' flanking region sequence (Matsuda et al., 1993; Haino et al., 1994).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al., 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
n The number preceding this symbol indicates neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation(s).
χ2 probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: * = p ~0.05; ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ2 value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).

The expected R:S ratio is not constant throughout the entire putative coding region. In all FRs and in CDR2b it ranges from approximately 2:1 to 3:1. It is slightly elevated in CDR2a, where it is between 3:1 and 4:1 depending on the mutator model. However, in CDR1 the expected R:S ratio is between 4:1 and just over 5:1. Thus, the expected R:S ratios would predict most amino acid variability to be present in CDR1 and a slight elevation of variability in CDR2a. However, as shown in Figure 13.9, amino acid variability in CDR2a is equivalent to that seen in CDR1, which is higher than expected. Furthermore, the codon-by-codon analysis also indicates that very strong selection against amino acid replacement mutations has taken place in FR2 and FR3. Although there is a significant deficit in stop codons generated by point mutations (in 2 out of 3 of the mutator models), this may be a reflection of the fact that all 9 sequences included in the analysis were functional genes. In order to determine whether there is a genuine deficit of stop codon-generating point mutations in human VH-III sequences, a codon-by-codon analysis was carried out on 25 VH-III sequences published by Pascual and Capra (1991) that included 12 pseudogenes (Table 13.4).

Table 13.4. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 25 human VH-III germline genes.

Sub-section of                 Numbers of nucleotide differences resulting in:
V region (codons)              Replacement             Silent                  Stop

FR1 (1-30)         Obs         46                      46                      3(1n)
                   Exp‡        65.8   63.2   62.7      26.0   28.3   29.2      3.1   3.5   3.1
CDR1 (31-35)       Obs         77                      10                      6(4n)
                   Exp         76.5   73.9   73.2      12.4   13.4   14.9      4.1   5.7   4.9
FR2 (36-49)        Obs         42                      66                      2(1n)
                   Exp         76.8   71.9   71.0      27.1   29.3   29.8      6.2   8.8   9.1
CDR2a (50-58)      Obs         178                     48                      0
                   Exp         175.6  171.5  166.1     47.5   51.5   57.9      2.7   2.9   2.3
CDR2b (59-66)      Obs         44                      33                      9(7n)
                   Exp         60.9   59.3   59.3      21.5   22.1   23.0      3.6   8.2   3.8      **** ***
FR3 (67-93)        Obs         121                     110                     0
                   Exp         167.7  160.1  158.2     54.5   59.4   62.8      8.8   11.6  9.9

Entire V region    Obs         508                     313                     20(13n)
                   Exp         623.3  600.0  590.4     189.2  204.0  217.5     28.5  37.0  33.1     *** **

For this analysis, a consensus sequence was determined for 25 VH-III genes (Pascual and Capra, 1991).
† It is assumed that mutations at each site are equally probable, but it does not take into account mutational hotspots (see Kaartinen et al., 1991).
‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right).
n The number preceding this symbol indicates neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation(s).
χ2 probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ2 value (1 df) is determined for the numbers of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).

Table 13.4 illustrates the fact that in all FRs there has been strong sequence conservation, i.e. there are significantly more silent changes than expected and, conversely, significantly fewer replacement changes than expected. The same pattern is also apparent in CDR2b. Interestingly, the numbers of replacement and silent changes observed in CDR1 and CDR2 do not differ significantly from the expected numbers. This indicates that in these sub-regions the observed R:S ratio matches the expected R:S ratio. From the data presented in Figure 13.9 and in the literature (Schiff et al., 1985; Tomlinson et al., 1992) it is clear that nucleotide and amino acid changes accumulate in the germline CDRs in a highly non-random fashion. The number of point mutations that generate a stop codon is significantly lower than expected under two of the three point mutation models; however, this analysis includes 7 codons that contain additional nucleotide substitutions within the codon, thus resulting in an amino acid replacement rather than a stop codon. If these are not included in the analysis then the observed number of stop codons generated by point mutations is significantly lower than the expected number under all three random point mutator models.

Distribution of insertions/deletions: Analysis of the 25 VH-III sequences (Pascual and Capra, 1991) reveals that there appears to be a concentration of insertion and deletion events in the leader intron of these sequences. Since this is similar to the distribution of insertions and deletions in VκOx1 genes (see above), the distribution of these events was plotted (Fig. 13.10). In addition, an insertion/deletion distribution graph was also constructed for 16 VH-I genes (Pascual and Capra, 1991).

[Graphs: two panels, labelled 25 VH-III and 16 VH-I, plotted against Nucleotide Position; the relative positions of the leader (L), CDR1 and CDR2 are drawn beneath each panel.]

Figure 13.10. Distribution of 16 and 53 unique insertion and deletion events found in 16 VH-I and in 25 VH-III human germline genes, respectively. All sequences were obtained from Pascual and Capra, 1991. Positions where the insertion (+) or deletion (-) of triplets of nucleotides occurred are indicated by open arrow-heads; the number of times that each event occurred is indicated. The data are grouped in 50 bp intervals. The relative positions of the leader (L), CDR1 and CDR2 are indicated in the diagrams below the graphs. The first nucleotide of the putative coding region was assigned position 0.
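The grouping of events into 50 bp intervals can be illustrated with a minimal sketch. The function name `bin_indel_positions` and the example coordinates are hypothetical and are not taken from Pascual and Capra (1991); the only conventions assumed from the legend are the 50 bp interval width and the assignment of position 0 to the first nucleotide of the putative coding region, so that upstream events carry negative coordinates.

```python
from collections import Counter


def bin_indel_positions(positions, bin_width=50):
    """Count insertion/deletion events per interval.

    Keys of the returned dict are the lower bound of each interval,
    e.g. -150 stands for positions -150 to -101.
    """
    counts = Counter()
    for pos in positions:
        # Floor division also places negative (upstream) positions
        # in the correct interval.
        counts[(pos // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical event coordinates, for illustration only.
    example_positions = [-120, -101, 10, 49, 50]
    print(bin_indel_positions(example_positions))
```

Using the lower bound of each interval as the key keeps the leader intron (negative coordinates) and the coding region (non-negative coordinates) on one continuous axis.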

The distribution of insertions and deletions around human VH-I and VH-III genes resembles the distribution around VκOx1 genes (Fig. 13.8). Significant concentrations of insertion and deletion events are apparent in the leader intron in both VH gene families, and the 3' termini of these sequences are flanked by a lesser concentration of insertions/deletions. A number of insertions and deletions were also found in CDR2 of the VH-III genes. Strikingly, almost all of these involved the insertion or deletion of nucleotide triplets or multiples thereof, and occurred in both functional genes and pseudogenes. In one of the VH-III genes the insertion of 4 codons took place near the 3' terminus of the putative coding region. Therefore it appears that the evolution of VH-III genes has involved the selection of insertions and deletions that do not alter the reading frame. It is difficult to envisage how this type of selection could act directly on germline genes, which are only translated in the context of a V(D)J rearrangement in B cells (see section 13.5c). Nevertheless, the fact that both human VH families are flanked by insertion/deletion events is consistent with the proposition that germline IgV genes are targeted by hyper-recombination events.

Phylogenetic analysis: Recently the sequences of 9 functional human VH-III genes were determined up to a point 257 bp upstream of the ATG initiation codon (Haino et al., 1994). These sequences were subjected to phylogenetic analyses similar to those described in chapters 9 and 12. However, since the distribution of insertions and deletions around VH-III genes suggests that hyper-recombination events are targeted to the putative coding units, the sequences were divided at a point approximately in the center of the leader intron and dendrograms for the regions upstream and downstream of this point were constructed (Fig. 13.11).
The dendrograms clearly demonstrate that whereas the sequence relationships for the putative coding regions and the 3' segments of the total sequence are almost identical, the sequence relationships for the 5' segments of the total sequence are very different. This indicates that the two regions have evolved very differently, and further supports the proposition that hyper-recombination events targeted to the coding regions of human VH-III genes have taken place in the evolutionary history of these genes.

Figure 13.11. Dendrograms illustrating phylogenetic sequence relationships of 9 human VH-III sequences (Haino et al., 1994). Four different regions were subjected to analysis: the entire sequenced region (upper left), the regions 5' (upper right) and 3' (lower left) of a point that bisects the leader intron, and the putative coding region (lower right). The diagram below each dendrogram indicates the region that was analyzed and the relative positions of the promoter (P), leader (L) and germline VH gene (V).

Summary: The above data demonstrate that sequence diversity in human germline VH genes is also concentrated in the CDRs. In the VH-III gene family there is strong sequence conservation not only in the FRs, but also in CDR2b. Thus, CDR2b seems to be conserved in human VH genes as well as in murine VH genes. The significant deficit of stop codons generated by point mutations observed in murine VH and VκOx1 germline genes is also apparent in the human VH-III family. Furthermore, the VH-III germline genes contain many insertions and deletions that do not alter the reading frame. This and the lower than expected number of stop codons are indicative of strong selection for protein function. The distribution of insertions and deletions around members of the VH-I and VH-III gene families and the phylogenetic analysis of 9 VH-III genes indicate that these genes may also be targets for hyper-recombination events.

Rabbit VH genes: The organization of the rabbit IgH chain locus is also very similar to the murine IgH locus (Knight, 1992). Because all of the VH genes sequenced to date are > 80 % homologous, it appears that the VH locus of the rabbit consists of a single VH gene family, the individual members of which are separated by 3 - 8 kb of sequence (e.g. see Currier et al., 1988; Knight, 1992). The rabbit VH locus contains over 100 VH genes, approximately half of which are pseudogenes (Gallarda et al., 1985; Currier et al., 1988; Knight, 1992); however, it appears that over 90 % of B cells rearrange and express only 3 - 4 VH gene segments (Short et al., 1991). Somatic diversification of the expressed VH-D-JH repertoire involves somatic gene conversion of the VH genes, which often leads to the insertion or deletion of codons (Becker and Knight, 1990). Somatic mutations have been found in the D regions of rearranged rabbit V regions (Short et al., 1991).

Variability plots: The nucleotide and amino acid variability plots for 14 rabbit VH genes that were isolated and sequenced by Roux et al. (1991) were constructed as described in chapter 8 (Fig. 13.12). Three out of the 14 sequences are pseudogenes because of stop codons, and only one of these sequences (RVH732) contains an additional crippling mutation, the deletion of a single nucleotide. A further three pseudogenes were generated by single nucleotide deletions, and in one of these sequences two frameshift mutations occurred. Although it cannot be excluded that some of the apparently functional coding regions contain defective transcription and/or recombination signals, many of the rabbit pseudogenes have thus not accumulated more than one crippling mutation.

[Graphs: two panels, variability plotted against Nucleotide Position (left) and Amino Acid Position (right); the positions of CDR1 and CDR2 are marked beneath each panel.]

Figure 13.12. Nucleotide (left) and amino acid (right) variability plots for 14 rabbit VH genes. The horizontal line indicates the level of variability where only 1 out of the 14 sequences contained a nucleotide change (2.15). Positions where the insertion (+) or deletion (-) of triplets of nucleotides occurred are indicated by open arrow-heads; the number of times that each event occurred is indicated. Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0.
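The threshold value quoted in the legend can be reproduced with a small sketch, on the assumption that the plots use the Wu-Kabat variability index (the number of different residues at a position divided by the frequency of the most common residue). The construction of the plots is described in chapter 8 and not in this section, so the function below is an illustration under that assumption; the name `wu_kabat_variability` and the example column are hypothetical.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Wu-Kabat variability for one aligned position.

    `column` holds the residue (nucleotide or amino acid) found at this
    position in each sequence. Invariant positions return 0, following
    the convention stated in the figure legends.
    """
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0
    # Frequency of the most common residue at this position.
    most_common_frequency = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_frequency


if __name__ == "__main__":
    # 14 sequences, one of which differs from the other 13 (assumed example).
    column = "A" * 13 + "G"
    print(round(wu_kabat_variability(column), 2))
```

With 14 sequences and a single deviating sequence the index is 2 / (13/14), which rounds to the value of 2.15 given in the legend.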

Nucleotide variability peaks that are also present in the amino acid sequences are found in CDR1 and CDR2a. However, concentrations of nucleotide and amino acid variability are also found in FR1 and FR3. It is not clear whether this indicates that many of the residues in FR1 and FR3 may not be critical for conservation of the overall configuration of the CDRs or whether some residues of FR1 and FR3 may also interact with the antigen. Similar to the human VH-III gene family, a number of insertions and deletions that do not alter the reading frame of the germline genes are observed in both FR1 and CDR2a in the 14 rabbit genes used for this analysis. This indicates that these germline genes have also been subjected to selection forces that favour germline genes with open reading frames; however, it is difficult to envisage how this type of selection may act directly on germline genes that are only expressed in the context of a functional V(D)J rearrangement in B cells (see section 13.5c).

Codon-by-codon analysis: When this type of analysis is applied to the 14 rabbit VH germline genes (Table 13.5), it becomes apparent that the numbers of silent and replacement nucleotide exchanges in FR1 are significantly below and above the expected numbers, respectively.

Table 13.5. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 14 rabbit VH germline genes

Sub-section of              Numbers of nucleotide differences resulting in:
V region (codons)           Replacement             Silent                  Stop
FR1 (1-30)         Obs      77                      14                      4 (3n)
                   Exp‡     64.0   61.7   62.2      27.6   29.9   30.6      3.2   3.5   3.1
                   Sig.     ****
CDR1 (31-35)       Obs      38                      6                       1n
                   Exp      38.3   36.5   36.4      5.0    6.3    6.8       1.7   2.2   1.8
FR2 (36-49)        Obs      20                      11                      8 (7n)
                   Exp      27.2   25.4   25.2      9.3    10.3   10.3      2.5   3.4   3.5
CDR2a (50-57)      Obs      44                      6                       3n
                   Exp      40.5   39.3   39.0      11.8   12.7   13.1      0.7   1.0   0.9
                   Sig.     **
CDR2b (58-65)      Obs      15                      11                      1n
                   Exp      19.9   19.1   18.8      5.6    5.7    6.0       1.5   2.2   2.2
                   Sig.     *** *** **
FR3 (66-90)        Obs      60                      16                      3 (2n)
                   Exp      55.4   53.9   53.6      21.8   23.2   23.7      1.7   2.0   1.7
Entire V region    Obs      254                     64                      20 (17n)
                   Exp      245.3  235.9  235.2     81.1   88.1   90.5      9.9   13.7  13.0
                   Sig.     ***

For this analysis, a VH consensus sequence was determined for 14 rabbit VH genes (Roux et al., 1991). † It is assumed that mutations at each site are equally probable; mutational hotspots are not taken into account (see Kaartinen et al., 1991). ‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right). n Indicates neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation(s). χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: * = p ≈ 0.05; ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the number of stop codon generating point mutations versus other mutations (i.e. replacement + silent).
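The comparison of observed with expected replacement and silent counts can be sketched as a χ² goodness-of-fit test with 1 degree of freedom. This is an illustration rather than the exact procedure used for the tables: it assumes that the expected counts are rescaled to the observed total of replacement plus silent changes, which this section does not state, and the function name `chi_square_r_vs_s` is hypothetical. For 1 df the tail probability has the closed form erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
import math


def chi_square_r_vs_s(obs_r, obs_s, exp_r, exp_s):
    """Chi-square (1 df) for observed vs expected replacement/silent counts.

    Assumption: expected counts are rescaled so that they sum to the
    observed total before the statistic is computed.
    """
    scale = (obs_r + obs_s) / (exp_r + exp_s)
    e_r, e_s = exp_r * scale, exp_s * scale
    chi2 = (obs_r - e_r) ** 2 / e_r + (obs_s - e_s) ** 2 / e_s
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # FR1 row of Table 13.5, unbiased model (left-hand expected values).
    chi2, p = chi_square_r_vs_s(obs_r=77, obs_s=14, exp_r=64.0, exp_s=27.6)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the FR1 counts for the unbiased model give a p value below 0.01, in line with the excess of replacement changes described for this region; the significance symbols printed in the table need not be reproduced exactly by this simplified calculation.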

In contrast, CDR2b displays evidence for strong sequence conservation, as shown by the statistically significant increase in silent and decrease in replacement mutations. The elevation of replacement changes and the depression of silent changes observed in CDR2a is statistically significant only for the mutator model that is derived from meiotic changes occurring in non-Ig pseudogenes and that takes into account likely nucleotide substitution biases (Li et al., 1984). For these reasons, this model is likely to be the most accurate, and thus the variability pattern of CDR2a of the 14 rabbit VH genes is probably significantly different from the expected pattern. The expected R:S ratios for FR1 and FR3 are approximately 2:1, whereas in FR2, CDR2a and CDR2b the ratio ranges from 3:1 to below 4:1 depending on the mutator model. The only significant elevation in the expected R:S ratio occurs in CDR1, where it lies between 5:1 and 8:1. Thus, the expected R:S ratios predict the presence of variability in CDR1 but not in the other regions. A total of 20 nucleotide changes that could generate a stop codon were detected in the 14 rabbit VH sequences, but this is only significantly lower than expected for the random mutation model that does not take into account any substitution biases. However, when it is taken into account that only 3 of the 20 stop codon-generating nucleotide changes actually result in a functional stop codon (the remaining 17 were 'rescued' by additional mutation(s) in the same codon), the observed number is significantly lower than expected for all three mutation models.

Summary: The rabbit VH genes analyzed above differ somewhat from the murine and human VH genes in that they display nucleotide and amino acid variability in FR1 and FR3. Similar to the other two species, they also contain variability peaks at both the nucleotide and amino acid levels in CDR1 and CDR2a, and the CDR2b sequences have been strongly conserved throughout the evolution of these genes. Furthermore, the rabbit VH genes also contain significantly fewer stop codons than expected, and most of the pseudogenes contain only a single crippling mutation; none contains more than two. This is similar to many murine and human pseudogenes that also have not diverged significantly from functional VH germline genes (see section 13.5a). Although insufficient sequence data were available to plot the distribution of insertion and deletion events or to carry out phylogenetic analyses, the above data indicate that the rabbit germline genes also display evidence of strong selection for protein function.

Chicken Vλ and VH pseudogenes: In the chicken Igλ locus the single J element is approximately 2 kb upstream of the Cλ gene, and the single functional Vλ1 gene is approximately 2 kb upstream of the J element. A 19 kb region of DNA that is situated 2.4 kb upstream of the functional Vλ1 gene and contains 25 Vλ pseudogenes was identified (Reynaud et al., 1987). Recently an additional Vλ pseudogene was detected (Kondo et al., 1993); however, this sequence has not been included in the analyses described below. Nine of the 26 pseudogenes are in the opposite orientation to the functional Vλ1 gene. The pseudogenes are non-functional due to truncations at the 3' or 5' ends of the coding regions or due to the absence or crippling of the recombination signal sequences (Reynaud et al., 1987). None of the pseudogenes contains a leader segment, and all homology to the Vλ1 gene is lost immediately upstream of the coding sequence. Strikingly, only two of the pseudogenes contain out-of-frame deletions. Only a single pseudogene, which is also truncated at the 5' end, contains a stop codon. The remainder of the sequences contain open reading frames. Furthermore, only 3 Vλ pseudogenes contain two crippling truncations.

The chicken IgH locus contains one JH element and a single functional VH gene. The two genes are separated by approximately 15 kb of DNA, which contains around 15 D elements. A cluster of VH pseudogenes spread over 60 - 80 kb is found approximately 7 kb upstream of the functional VH gene and contains approximately one pseudogene in every 850 bp of DNA (Reynaud et al., 1989). The complete or partial sequences of 18 chicken VH pseudogenes revealed that 15 possessed open reading frames. Two of the sequences contained one stop codon each and another sequence contained an out-of-frame insertion. It was also pointed out that the 18 VH pseudogenes were all fused to a D gene, which in some cases was joined to a partial J gene (Reynaud et al., 1987, Fig. 4). Indeed, analysis of the sequences of the putative VH-D germline fusions reveals that they are similar to somatically generated CDR3 regions. In some cases blocks of sequence identity between putative germline VH-D fusions and somatically rearranged VH-D joins can be detected (Reynaud et al., 1987, Figs. 3 and 4).
This is similar to the horned shark, Heterodontus francisci, where VH-D and VH-D-JH rearrangements were detected in a cDNA library that was generated from the mRNA obtained from reproductive tissue (Kokubu et al., 1988). However, whereas all of the sequenced chicken VH-D sequences are non-functional, some apparently functional VH-D-JH sequences are present in the horned shark, but it is not clear if these are expressed by B cells. No 'joined' germline V region genetic elements have been detected in mice or humans to date.

In chicken, as in mammals, junctional diversity contributes to the diversification of IgV regions (Reynaud et al., 1987; McCormack et al., 1989; Reynaud et al., 1989). Additional diversity is generated in the bursa of Fabricius by somatic hyperconversion between the pseudogenes and the rearranged IgV regions (Reynaud et al., 1987, 1989). This is mediated by intrachromosomal gene conversion (Carlson et al., 1990). In the IgVλ locus it was shown that the pseudogene donor sequences are not altered and that the unrearranged Vλ1 gene is not targeted by the hyperconversion mechanism (Carlson et al., 1990; reviewed in Bezzubova and Buerstedde, 1994). It was demonstrated that the hyperconversion mechanism is biased toward pseudogenes that are Vλ1 gene-proximal and in the opposite orientation to the Vλ1 gene (McCormack and Thompson, 1990). It was also shown that the length of sequence homology between pseudogene donors and the Vλ1 gene affects pseudogene usage during somatic hyperconversion. A study on a B cell tumour cell line, DT40, that continues somatic hyperconversion of its rearranged Vλ-J gene revealed that somatic hyperconversion was not affected by the deletion of the RAG-2 loci (Takeda et al., 1992), which indicates that the enzymes involved in V(D)J rearrangement are not likely to be involved in the somatic hyperconversion process. Recently, homologues of the yeast genes involved in mitotic and meiotic recombination, RAD51, RAD52 and RAD54, were isolated from chicken genomic DNA (Bezzubova et al., 1993a,b; Bezzubova and Buerstedde, 1994). Two of these, the RAD51 and RAD52 homologues, were shown to be expressed at high levels in the primary chicken lymphoid organs as well as in the ovaries and testis of chicken (Bezzubova et al., 1993a,b). Another experiment utilizing the DT40 cell line found that transfection of a gene construct of the rearranged Igλ gene resulted in the targeted integration of the construct into the rearranged locus (Buerstedde and Takeda, 1991). Similarly, unrearranged Igλ gene constructs integrated into the unrearranged locus, and β-actin gene constructs integrated into the β-actin locus. In contrast, targeted integration occurred at greatly reduced levels in non-B cell lines. Although the definitive experiments need to be carried out, these data lead to the tantalizing, if tentative, conclusion that the RAD homologues found in chicken may mediate highly site-specific recombination events that are responsible for somatic hyperconversion as well as germline recombination (e.g. see Bezzubova and Buerstedde, 1994).

Chicken Vλ pseudogenes: Variability plots: Nucleotide and amino acid variability plots for the 25 Vλ pseudogenes published by Reynaud et al. (1987) were constructed (Fig. 13.13).

[Graphs: two panels, variability plotted against Nucleotide Position (left) and Amino Acid Position (right); the positions of CDR1, CDR2 and CDR3 are marked beneath each panel.]

Figure 13.13. Nucleotide (left) and amino acid (right) variability plots for 25 chicken Vλ pseudogenes (Reynaud et al., 1987). The horizontal line indicates the level of variability where only 1 out of the 25 sequences contained a nucleotide change (2.08). Positions where the insertion (+) or deletion (-) of triplets of nucleotides occurred are indicated by open arrow-heads; the number of times that each event occurred is indicated. Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0.

Prominent nucleotide and amino acid variability peaks are evident in all CDRs. Although two variability peaks are evident in FR2, they are due to variability at a single codon position, whereas in the CDRs variability is spread over a number of residues. Not only does the variability spike in CDR3 resemble that found in expressed V(D)J regions, but the 3' termini of the 20 pseudogenes that are not truncated at this end also resemble rearranged IgV genes because they contain additions or deletions of 1 - 4 nucleotides (Reynaud et al., 1987, Fig. 4). Eleven insertions of single or multiple triplets of nucleotides occurred in CDR1, two triplet insertions took place in FR1 and two triplet deletions took place in FR2 (Reynaud et al., 1987, Fig. 4). Remarkably, no frameshift insertions and only two out-of-frame deletions are found in the 25 chicken Igλ pseudogenes. Therefore most insertions and deletions conserve the reading frame. This is especially striking when considering that these are pseudogenes, on which it is conventionally assumed that no selection forces are active (e.g. see Gojobori et al., 1982).

Codon-by-codon analysis: In order to determine whether the observed patterns differ significantly from the expected, the observed numbers of nucleotide substitutions were compared to the expected numbers (Table 13.6).

Table 13.6. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 25 chicken Igλ pseudogenes

Sub-section of              Numbers of nucleotide differences resulting in:
V region (codons)           Replacement             Silent                  Stop
FR1 (1-20)         Obs      75                      45                      0
                   Exp‡     80.6   77.5   77.6      34.0   37.3   37.9      5.3   5.2   4.4
CDR1 (21-28)       Obs      95                      38                      4n
                   Exp      98.9   96.6   95.9      34.3   34.5   36.0      3.8   5.9   5.2
FR2 (29-44)        Obs      52                      36                      2n
                   Exp      63.1   60.2   59.7      21.2   22.2   23.0      5.7   7.6   7.5
CDR2 (45-51)       Obs      83                      27                      1n
                   Exp      82.8   79.7   79.0      24.6   27.8   29.3      3.6   3.4   2.7
FR3 (52-83)        Obs      114                     35                      3n
                   Exp      106.6  102.9  102.8     39.1   41.6   43.0      6.4   7.9   6.7
CDR3 (84-92)       Obs      140                     25                      1
                   Exp      131.1  125.2  123.0     32.9   39.0   41.5      2.0   1.8   1.5
Entire V region    Obs      559                     206                     11 (10n)
                   Exp      563.1  542.1  538.0     186.1  202.4  210.7     26.8  31.8  28.0

For this analysis, a consensus sequence was determined for the 25 chicken Igλ pseudogenes (Reynaud et al., 1987). † It is assumed that mutations at each site are equally probable; mutational hotspots are not taken into account (see Kaartinen et al., 1991). ‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right). n Indicates neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation(s). χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: * = p ≈ 0.05; ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the number of stop codon generating point mutations versus other mutations (i.e. replacement + silent).

The statistically significant depression of replacement changes and excess of silent base changes in FR2 indicate that the sequence of this region has been highly conserved. In contrast, for two of the three mutation models the excess of amino acid replacements and the suppression of silent changes in CDR3 is statistically significant (p < 0.01). The data also indicate that the R:S ratio of the variability peaks found in CDR1 and CDR2 does not deviate significantly from the expected R:S ratio. The 25 chicken Igλ pseudogenes analyzed in Table 13.6 differ from the murine, human and rabbit genes analyzed in previous sections in that there is very little fluctuation of expected R:S ratios between the different sub-regions. The expected R:S ratio of all FRs and of CDR1 and CDR2 ranges between 2:1 and 3:1. There is a slight elevation in CDR3, where the expected R:S ratio is between 3:1 and 4:1.
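The dependence of the expected R:S ratio on codon composition can be illustrated for the simplest of the three models, the random point mutation process without substitution biases. The sketch below assumes that this model weights all nine single-nucleotide changes of each codon equally; the exact procedure used for the tables is not restated in this section, and the function names and the example sequence are hypothetical. The standard genetic code is used.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutations(codon):
    """Classify the nine single-nucleotide changes of one sense codon."""
    original = CODON_TABLE[codon]
    counts = {"replacement": 0, "silent": 0, "stop": 0}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            product = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if product == "*":
                counts["stop"] += 1
            elif product == original:
                counts["silent"] += 1
            else:
                counts["replacement"] += 1
    return counts


def expected_counts(coding_sequence):
    """Sum the classifications over all complete codons of a sequence."""
    totals = {"replacement": 0, "silent": 0, "stop": 0}
    usable = len(coding_sequence) - len(coding_sequence) % 3
    for start in range(0, usable, 3):
        codon = coding_sequence[start:start + 3]
        for kind, n in classify_point_mutations(codon).items():
            totals[kind] += n
    return totals


if __name__ == "__main__":
    totals = expected_counts("ATGGGG")  # hypothetical two-codon sequence
    print(totals, "expected R:S =", totals["replacement"] / totals["silent"])
```

Because codons differ in how many of their nine possible changes are silent, a sub-region rich in codons such as ATG or TGG has a higher expected R:S ratio than one rich in four-fold degenerate codons, which is why the expected ratio is evaluated separately for each sub-region.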

Chicken VH pseudogenes: Variability plots: The distribution of nucleotide and amino acid variability was plotted for the 18 chicken VH pseudogenes available from the literature (Reynaud et al., 1989).

[Graphs: two panels, variability plotted against Nucleotide Position (left) and Amino Acid Position (right); the positions of CDR1 and CDR2 are marked beneath each panel.]

Figure 13.14. Nucleotide (left) and amino acid (right) variability plots for 18 chicken VH pseudogenes (Reynaud et al., 1989). The horizontal line indicates the level of variability where only 1 out of the 18 sequences contained a nucleotide change (2.12). Positions where the insertion (+) or deletion (-) of triplets of nucleotides occurred are indicated by open arrow-heads; the number of times that each event occurred is indicated. Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0.

It can be seen in Figure 13.14 that nucleotide and amino acid variability is concentrated in CDR1 and CDR2. Unlike the previous variability plots for VH genes from other species (chapters 8 and 11 and see above), a concentration of variability is also present in the 3' terminus of the putative coding region, which reflects the CDR3-like structure of the 3' termini of the chicken VH pseudogenes (Reynaud et al., 1989). A minor nucleotide variability peak is present in FR3; however, this is not reflected at the amino acid level, indicating that the majority of those nucleotide changes are silent. The overall level of variability in the FRs is extremely low when compared to the variability in the CDRs. The insertion of a nucleotide triplet took place in FR1 and in CDR2, and in CDR2 two identical deletions of a nucleotide triplet were detected (Reynaud et al., 1989, Fig. 4). Only one of the 18 VH pseudogenes contains an out-of-frame insertion, and no out-of-frame deletions are present (Reynaud et al., 1989, Fig. 4). Thus, the published chicken VH pseudogenes also display highly non-random accumulation of nucleotide and amino acid changes in the CDRs but not in the FRs.

Codon-by-codon analysis: The results from this analysis are shown in Table 13.7.

Table 13.7. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 18 chicken IgVH pseudogenes

Sub-section of              Numbers of nucleotide differences resulting in:
V region (codons)           Replacement             Silent                  Stop
FR1 (1-30)         Obs      54                      15                      1
                   Exp‡     49.1   47.0   46.6      17.9   19.8   20.2      2.1   2.2   2.1
CDR1 (31-35)       Obs      54                      13                      0
                   Exp      55.1   53.7   53.1      8.9    9.2    10.3      2.9   4.1   3.6
FR2 (36-49)        Obs      29                      27                      5 (4n)
                   Exp      41.2   38.6   38.1      16.5   17.3   17.6      5.7   7.6   7.5
                   Sig.     *** ***
CDR2a (50-57)      Obs      72                      9                       0
                   Exp      66.3   63.6   62.6      14.7   17.4   18.5      2.0   2.5   2.1
                   Sig.     ** ***
CDR2b (50-65)      Obs      106                     18                      5n
                   Exp      96.8   93.9   93.3      29.5   31.6   32.9      2.7   3.5   2.8
                   Sig.     ** *** ***
FR3 (66-101)       Obs      95                      21                      2n
                   Exp      85.2   81.5   80.6      29.9   32.9   34.1      3.0   3.5   3.3
                   Sig.     *** ***
Entire V region    Obs      338                     94                      13 (11n)
                   Exp      327.4  314.7  311.7     102.7  110.8  115.1     14.1  18.4  17.1

For this analysis, a consensus sequence was determined for 18 chicken IgH pseudogenes (Reynaud et al., 1989). † It is assumed that mutations at each site are equally probable; mutational hotspots are not taken into account (see Kaartinen et al., 1991). ‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right). n Indicates neutralized stop codons, i.e. a mutation that would have resulted in a stop codon was neutralized by additional mutation(s). χ² probabilities (1 degree of freedom, df), comparing the observed numbers of replacement to silent changes against each of the expected numbers: * = p ≈ 0.05; ** = p<0.05; *** = p<0.01; **** = p<0.001. For stop codons, the χ² value (1 df) is determined for the number of stop codon generating point mutations versus other mutations (i.e. replacement + silent).

The differences between the observed and expected numbers of replacement and silent nucleotide exchanges are statistically significant in FR2, CDR2a, CDR2b and FR3. The excess of silent changes and the deficit of replacement changes in FR2 indicate strong sequence conservation. Both the amino and the carboxy regions of CDR2 display a significant increase of replacement changes and a decrease of silent changes. However, the Wu-Kabat variability plot constructed for the 18 VH pseudogenes (Fig. 13.14) indicates that amino acid variability in CDR2b is less than it is in CDR2a. Interestingly, the observed numbers of replacement and silent changes in FR3 are significantly elevated and reduced, respectively. It is apparent from the variability plots (Fig. 13.14) and from the DNA sequences of these pseudogenes (Reynaud et al., 1989, Fig. 4) that this is due to the variability spike present at the 3' termini of the VH pseudogenes rather than to a general elevation of variability throughout FR3. The number of nucleotide changes that generate stop codons is in accordance with the expected number; however, this includes many stop codon-generating changes that occur in codons that contain additional changes which 'rescue' the stop codon. In fact, only two of the observed 13 nucleotide substitutions that could generate stop codons actually do so. When this is taken into account, the number of observed nucleotide changes that produce a 'functional' stop codon is significantly below the expected number. The expected R:S ratios of the FRs range between 2:1 and 3:1 depending on the mutator model. The expected R:S ratio for CDR2b falls within approximately the same range as the FRs; however, the expected R:S ratio for CDR2a ranges between 3:1 and approximately 4:1. The highest expected R:S ratio is found in CDR1, where it lies between 5:1 and approximately 6:1. Thus, as with the other sequences, the expected R:S ratio is not constant throughout the putative coding region of the 18 chicken VH pseudogenes that were analyzed.

Summary: The data on the chicken IgVλ and IgVH pseudogenes clearly demonstrate that a number of the features observed in mammalian IgV germline genes are also present. Both data sets display significant concentration of sequence variability in the CDRs and sequence conservation of the FRs. Indeed, the Wu-Kabat amino acid variability plots closely resemble those obtained from somatically mutated IgV regions (e.g. see Wu and Kabat, 1970).
The sequence data for these pseudogenes reveal that the majority of sequences are continuous open reading frames and that almost all insertion and deletion events within the putative coding regions involve triplets of nucleotides or multiples thereof. Additionally, there is a significant deficit of stop codons in both data sets. Moreover, in several sub-regions the observed numbers of nucleotide changes differ significantly from the expected numbers, thus confirming the non-random distribution of sequence variation amongst these pseudogenes. The above data on the chicken Ig pseudogenes indicate that the sequences have been subjected to strong selection for protein function and antigen binding. However, this is extremely difficult to reconcile with the fact that all of the sequences analyzed are considered to be pseudogenes and therefore cannot possibly be subject to direct selection forces which act on the expressed proteins. Furthermore, since pseudogenes are not under any functional constraints they are expected to accumulate further mutations, some of which should result in additional crippling mutations (Li et al., 1981; Miyata and Yasunaga, 1981; Gojobori et al., 1982; Li, 1983; Kimura, 1983; Graur et al., 1989), but the above data demonstrate clearly that the chicken pseudogenes do not conform to these predictions (for further discussion see below and section 13.5c).

Xenopus VH genes: The organization of the IgVH locus in Xenopus, the South African clawed toad, is similar to the mammalian IgVH locus and contains approximately 100 VH genes, an unknown number of D elements and at least 7 JH genes (Schwager et al., 1988; Schwager et al., 1989; Wilson et al., 1992). Three VH families, VH-I, VH-II and VH-III, were identified, and the presence of a fourth VH family was deduced from cDNA sequences of expressed VH-D-JH that did not correspond with any VH-I, VH-II or VH-III sequences (Schwager et al., 1989). Examination of the 5' and 3' flanking region sequences of 9 VH-I, 8 VH-II and 5 VH-III genes that were sequenced by Schwager et al. (1989) indicates that whereas the putative VH coding sequences are > 80 % homologous, the flanking region sequences generally bear very little homology immediately 5' and 3' of the coding regions. The incidence of pseudogenes varies between the VH gene families: fewer than 15 % of VH-I and VH-II genes and approximately 50 % of VH-III genes are pseudogenes. Junctional diversity and somatic hypermutation contribute to antibody diversity (Wilson et al., 1992). The somatic hypermutation rates are somewhat lower than in mice and somatic mutations are limited to the central portion of the VH gene; none or very few occur in the JH gene (FR4). The lack of germinal centers and the small amount of affinity maturation in Xenopus immune responses led to the suggestion that Xenopus lacks the ability to selectively expand B cells that produce high affinity Ig (Wilson et al., 1992).

Variability plots: The sequences of nine VH-I germline genes were published by Schwager et al. (1989). The published sequences extended to a point 600 bp upstream of the first nucleotide of the putative VH coding region for 5 of the 9 genes, and 400 bp of downstream sequence was also determined. The 5' and 3' flanking region sequences of two out of the 9 putative coding regions show little homology with the other flanking regions. Since this would bias a nucleotide variability plot, only the 9 putative coding regions were included in this analysis (Fig. 13.15).

[Figure 13.15: two panels, variability plotted against Nucleotide Position (left) and against Amino Acid Position (right), with CDR1 and CDR2 marked beneath each x-axis.]

Figure 13.15. Nucleotide (left) and amino acid (right) variability plots for 9 Xenopus VH germline genes (Schwager et al, 1989). The horizontal line indicates the level of variability where only 1 out of the 9 sequences contained a nucleotide change (2.25). Positions at which there was no nucleotide or amino acid variability were arbitrarily assigned a value of 0.
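The value of 2.25 quoted in the legend follows from the Wu-Kabat variability index (number of different residues at a position divided by the frequency of the most common residue): with 9 sequences and a single variant, 2 / (8/9) = 2.25. The following is a minimal sketch of that calculation, assuming equal-length, gap-free aligned sequences; the function names and the toy alignment are illustrative and are not taken from the thesis.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Wu-Kabat variability for one aligned position.

    variability = (number of different residues)
                  / (frequency of the most common residue)

    Invariant positions are reported as 0, following the figure legend
    (the unmodified index would give 1.0 for them).
    """
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0
    most_common_count = counts.most_common(1)[0][1]
    # Written as k * n / max_count, which equals k / (max_count / n).
    return len(counts) * len(column) / most_common_count


def variability_profile(aligned_seqs):
    """Variability at every position of equal-length aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment: 9 sequences, one of which differs at position 2.
toy = ["ACG"] * 8 + ["ATG"]
profile = variability_profile(toy)
```

The same function can be applied to nucleotide or amino acid alignments, since it only counts distinct symbols per column.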

Although the overall level of variability in the genes analyzed is lower than in the mammalian IgV genes or in the chicken IgV pseudogenes (see above), it is clear that the majority of variability is concentrated in CDR1 and CDR2a. The sequence of CDR2b has been conserved, as have the FR sequences except for some elevation in variability found at the 3' end of FR3. Although only a small number of sequences contributed to the data, the overall variability pattern is consistent with the observed patterns in mammals and chicken. In addition, the 8 VH-II and 5 VH-III Xenopus germline gene sequences in the literature (Schwager et al, 1989) indicate that sequence variability among the members of these families is also restricted to CDR1 and CDR2.

Codon-by-codon analysis: The nucleotide substitutions present in the 9 germline VH-I genes were compared with the expected changes; however, due to the low number of sequences and nucleotide changes the data set is probably too small for a meaningful statistical analysis (Table 13.8).

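Table 13.8. Statistical analysis of the 'observed' versus the 'expected' numbers of replacement, silent and stop codon-generating mutations at the nucleotide level (assuming a point mutation process†) for 9 Xenopus VH-I germline genes. Entries are the numbers of nucleotide differences resulting in each type of change; the three expected values in each cell are given in the order left / center / right (see legend).

| Sub-section of V region (codons) | | Replacement | Silent | Stop | Significance |
|---|---|---|---|---|---|
| FR1 (1-30) | Obs | 6 | 8 | 0 | |
| | Exp‡ | 10.0 / 9.7 / 9.6 | 3.4 / 3.8 / 3.9 | 0.6 / 0.6 / 0.6 | *** *** *** |
| CDR1 (31-35) | Obs | 14 | 4 | 0 | |
| | Exp | 15.2 / 14.2 / 13.9 | 1.6 / 1.9 / 2.2 | 1.2 / 1.9 / 1.9 | |
| FR2 (36-49) | Obs | 6 | 3 | 0 | |
| | Exp | 6.2 / 5.8 / 5.7 | 2.0 / 2.1 / 2.2 | 0.8 / 1.1 / 1.0 | |
| CDR2a (50-58) | Obs | 20 | 8 | 0 | |
| | Exp | 19.6 / 19.0 / 18.8 | 6.4 / 7.0 / 7.2 | 0.0 / 0.0 / 0.0 | |
| CDR2b (59-66) | Obs | 4 | 3 | 0 | |
| | Exp | 6.0 / 6.0 / 6.0 | 2.3 / 2.2 / 2.3 | 0.7 / 0.8 / 0.7 | |
| FR3 (67-93) | Obs | 11 | 5 | 0 | |
| | Exp | 12 / 11.4 / 11.3 | 3.3 / 3.7 / 3.9 | 0.7 / 0.9 / 0.8 | |
| Entire V region | Obs | 61 | 31 | 0 | |
| | Exp | 69.0 / 66.1 / 65.3 | 19.0 / 20.7 / 21.7 | 4.0 / 5.3 / 4.9 | *** ** |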

For this analysis, a consensus sequence was determined for 9 VH-I genes (Schwager et al., 1989). † It is assumed that mutations at each site are equally probable; mutational hotspots are not taken into account (see Kaartinen et al, 1991). ‡ The three expected numbers are the expectation based on a random point mutation process, making no assumptions about nucleotide substitution biases (left), corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic 'selection free' substitutions in Ig genes (center), or according to the empirical substitution frequencies determined for meiotic non-Ig pseudogenes after elimination of probable changes of a methylated C to T (right). χ² probabilities (1 degree of freedom, df) compare the observed numbers of replacement to silent changes against each of the expected numbers. ** = p<0.05; *** = p<0.01. For stop codons, the χ² value (1 df) is determined for the number of stop codons generated by point mutations versus other mutations (i.e. replacement + silent).
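The χ² comparison described in the legend can be illustrated with the FR1 row of Table 13.8. The sketch below is one plausible reading of the test, not the procedure actually used in the thesis: it assumes a two-category Pearson χ² (replacement versus silent) in which the expected counts are rescaled to the observed total, and it obtains the 1 df p-value from the complementary error function. The function name is hypothetical.

```python
import math


def chi_square_1df(observed, expected):
    """Pearson chi-square for two categories (1 df).

    Returns (statistic, p_value). Expected counts are rescaled so that
    they sum to the observed total; this rescaling is an assumption of
    this sketch and is not stated in the source.
    """
    total_obs = sum(observed)
    total_exp = sum(expected)
    scaled = [e * total_obs / total_exp for e in expected]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, scaled))
    # For 1 df the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# FR1 of Table 13.8: 6 replacement and 8 silent changes observed,
# 10.0 and 3.4 expected under the unbiased random model (left-hand values).
stat, p = chi_square_1df([6, 8], [10.0, 3.4])
```

Under these assumptions the FR1 comparison gives a p-value below 0.01, in line with the significance recorded for that sub-region.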

Except for FR1, the numbers of observed nucleotide changes in all sub-regions fell within expected limits; however, it is clear that too few changes were present for this analysis to bear much weight. Nevertheless, one interesting point emerges from Table 13.8: not a single stop codon generated by a nucleotide substitution was present in the 9 VH-I sequences, and this is significantly below the expected number for two of the mutator models. It is probable that the absence of stop codons is real, since none of 7 Xenopus pseudogenes in the literature (Schwager et al, 1989) is disabled by a stop codon.

Distribution of insertions/deletions: Although the complete 5' flanking region sequences were determined for only 5 out of the 9 VH-I germline genes, the distribution of insertions and deletions in the available sequence was nevertheless plotted (Fig. 13.16).

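[Figure 13.16: histogram of insertion and deletion events plotted against Nucleotide Position, with a diagram beneath the x-axis marking the promoter (P), leader (L) and putative coding region (V).]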

Figure 13.16. Distribution of 25 unique insertion and deletion events found in 9 VH-I Xenopus germline genes. The sequences were obtained from Schwager et al, 1989. The data are grouped in 50 bp intervals. The relative positions of the promoter region (P) leader (L) and putative coding region (V) are indicated in the diagram below the graph. The first nucleotide of the putative coding region was assigned position 0. The number of sequences available for the interval 550-500 bp upstream of the coding region was 5, for the interval 500 - 450 5' of the coding region it was 6, for the region 450 - 0 bp upstream of the coding region it was 7, for the coding region and for 515 bp of 3' flanking region all 9 sequences were available, and 8 sequences were available to a point 616 bp 3' of the coding region, and for the remainder of the 3' flanking region 7 sequences were available.
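Because the number of available sequences differs between intervals, raw counts per 50 bp bin are not directly comparable along the plot. The sketch below shows how events could be binned and, optionally, divided by the number of sequences covering each bin. The event positions are invented for illustration, and the coverage function only encodes the 5' side of the legend; bins at or beyond position 0 are simply given the full 9 sequences, which is a simplification.

```python
from collections import Counter


def bin_start(position, width=50):
    """Left edge of the bin containing `position`.

    Position 0 is the first nucleotide of the putative coding region,
    so upstream positions are negative.
    """
    return (position // width) * width


def indel_histogram(positions, width=50):
    """Count insertion/deletion events per bin; keys are bin left edges."""
    return Counter(bin_start(p, width) for p in positions)


def coverage(bin_left):
    """Sequences available per bin, following the 5' part of the legend.

    Simplification: every bin at or downstream of position 0 is given 9.
    """
    if bin_left < -500:
        return 5
    if bin_left < -450:
        return 6
    if bin_left < 0:
        return 7
    return 9


def per_sequence_rate(histogram):
    """Events per available sequence for each occupied bin."""
    return {b: n / coverage(b) for b, n in sorted(histogram.items())}


# Hypothetical event positions (not data from the thesis).
events = [-120, -110, -101, -30, 40, 410]
histogram = indel_histogram(events)
rates = per_sequence_rate(histogram)
```

Normalising in this way is a design choice of the sketch; the figure itself reports grouped event counts.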

The insertion/deletion distribution diagram clearly illustrates a concentration of insertions and deletions in the region flanked by the leader and by the promoter. Lesser concentrations of insertion and deletion events are also present in the leader intron and downstream of the putative VH coding region. These data, taken together with the fact that the 5' and 3' flanking regions of two of the VH-I sequences bear no similarity to the other sequences, are also consistent with the proposition that the coding region is targeted by hyper-recombination events.

Summary: Although the data set available for the Xenopus VH-I gene family is small, it does indicate that the concentration of sequence variability in the CDRs and the deficit of stop codons are also apparent in these sequences. Furthermore, the putative coding regions are also flanked by insertions and deletions, which, taken together with the fact that the 5' and 3' flanking regions of 2 out of 7 or 9 sequences respectively are completely different, may indicate that Xenopus VH genes are also targeted by hyper-recombination events. Additionally, it is interesting to note that the 5' and 3' flanking region sequences of virtually every one of the Xenopus VH-II and VH-III germline genes sequenced by Schwager et al (1989, Figs. 2 and 3) bear very little similarity to any of the other related genes.

Summary of the patterns of sequence variability among mammalian, avian and amphibian IgV germline genes:

The results of the above analyses are summarized in Table 13.9.

Table 13.9. Properties of germline IgV genes from different vertebrate species

| Genes | Nucleotide and amino acid variability plots | Different lineal relationships between 5' and 3' regions | Insertions/deletions bracketing the coding/transcription unit | Significant deficit of functional stop codons |
|---|---|---|---|---|
| (a) Mouse VH | Yes | Yes | Yes | Yes |
| (b) Mouse VL | Yes | Not available | Yes | Yes |
| (c) Human VH | Yes | Yes | Yes | Yes |
| (d) Rabbit VH | Yes | Not available | Not available | Yes |
| (e) Chicken ψVL | Yes | Not available | Not available | Yes |
| (f) Chicken ψVH | Yes | Not available | Not available | Yes |
| (g) Xenopus VH | Yes | Not available | Yes | Yes |

Based on data obtained from: (a) 30 genuine VH186.2 related germline genes (chapter 10); (b) 26 VκOx1 germline genes (Even et al, 1985; Milstein et al, 1992; nucleotide variability plots for VκOx1 germline genes were previously published in Gonzalez-Fernandez and Milstein, 1993); (c) 25 human VH-III genes (Pascual and Capra, 1991) and 9 human VH-I sequences containing 5' flanking regions (Haino et al, 1994); (d) 14 rabbit VH genes (Roux et al., 1991); (e) 25 chicken Vλ pseudogenes (Reynaud et al, 1987); (f) 18 chicken VH pseudogenes (Reynaud et al, 1989); (g) 9 Xenopus VH-I genes (Schwager et al, 1989).

The germline IgV genes of all species analyzed contain non-random concentrations of sequence variability in the CDRs, although it must be noted that the 14 rabbit VH genes also contained sequence variability in some of the FRs. In several IgV sub-regions of all species except Xenopus (due to the small data set) the observed versus expected numbers of nucleotide substitutions differed significantly, further highlighting the non-random distribution of variability within these genes. Analysis of the percent differences between the observed R:S ratio and the expected R:S ratio (Table 13.10) indicates that in some sub-regions where no statistically significant differences were observed large differences between the two ratios are nevertheless apparent.

Table 13.10. Observed vs expected R:S ratios in FRs and CDRs of germline IgV genes in different vertebrate species. Values are the observed R:S / expected R:S ratio (%)†.

| Genes | FR1 | CDR1 | FR2 | CDR2‡ | FR3 | Significance |
|---|---|---|---|---|---|---|
| (a) Mouse VH | 38.24 | 281.63 | 104.63 | a: 357.63 / b: 111.72 | 105.99 | |
| (b) Mouse VL | 78.78 | 218.78 | 62.47 | 972.06 | 56.25 | *** |
| (c) Human VH | 46.52 | 156.55 | 26.74 | a: 129.17 / b: 51.67 | 43.68 | |
| (d) Rabbit VH | 270.58 | 118.32 | 74.36 | a: 246.32 / b: 43.52 | 165.81 | *** * ** |
| (e) Chicken ψVL | 81.40 | 93.85 | 55.65 | 114.01 | 136.24 | *** |
| (f) Chicken ψVH | 156.05 | 80.57 | 67.27 | a: 236.42 / b: 177.71 | 191.39 | *** *** *** |
| (g) Xenopus VH | 30.5 | 55.4 | 77.2 | a: 95.7 / b: 51.1 | 75.9 | *** |

Based on codon analyses carried out on: (a) 30 genuine germline VH genes (chapter 10); (b) 26 VκOx1 genes (Even et al, 1985; Milstein et al, 1992); (c) 25 human VH-III genes (Pascual and Capra, 1991); (d) 14 rabbit VH genes (Roux et al, 1991); (e) 25 chicken Vλ pseudogenes (Reynaud et al, 1987); (f) 18 chicken VH pseudogenes (Reynaud et al., 1989); (g) 9 Xenopus VH-I germline genes (Schwager et al, 1989). † The expected R:S ratio was based on a random point mutation process but corrected for nucleotide substitution biases according to the empirical substitution frequencies determined for meiotic substitutions in non-Ig pseudogenes after elimination of probable changes of a methylated C to T (Li et al, 1984). ‡ a: refers to CDR2a (i.e. residues 50-58 in mice, human and Xenopus, residues 50-57 in chicken and rabbit), and b: refers to CDR2b (i.e. residues 59-66 in mice, human and Xenopus, 58-65 in chicken and rabbit). * = p = 0.05; ** = p < 0.05; *** = p < 0.01; **** = p < 0.001, as determined in the codon-by-codon analyses presented in previous sections.
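The Xenopus row of Table 13.10 can be reproduced from the observed counts and the right-hand expected values of Table 13.8, which makes the definition of the percentage concrete: the observed R:S ratio divided by the expected R:S ratio, times 100. The short sketch below carries out that arithmetic; the function and variable names are illustrative only.

```python
def rs_percent(obs_r, obs_s, exp_r, exp_s):
    """Observed R:S ratio expressed as a percentage of the expected R:S ratio."""
    return 100.0 * (obs_r / obs_s) / (exp_r / exp_s)


# (observed R, observed S, expected R, expected S) for the 9 Xenopus VH-I
# genes, taken from Table 13.8 using the right-hand expected values
# (corrected with non-Ig pseudogene substitution frequencies).
xenopus = {
    "FR1": (6, 8, 9.6, 3.9),
    "CDR1": (14, 4, 13.9, 2.2),
    "FR2": (6, 3, 5.7, 2.2),
    "CDR2a": (20, 8, 18.8, 7.2),
    "CDR2b": (4, 3, 6.0, 2.3),
    "FR3": (11, 5, 11.3, 3.9),
}

percentages = {
    region: round(rs_percent(*values), 1) for region, values in xenopus.items()
}
```

A value below 100 % means that fewer replacement changes relative to silent changes were observed than the random model predicts; a value above 100 % means the opposite.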

From the sequence data and the variability plots it is apparent that the CDR2b sequences of the above VH genes are generally more conserved than the CDR2a sequences. This was previously observed by others (Schwager et al, 1989; Tomlinson et al, 1992; Haino et al, 1994). The codon-by-codon analyses also demonstrate that the expected R:S ratios vary in the different sub-regions, with elevated R:S ratios typically present in the CDRs, especially in CDR1. The apparent diversification of CDRs and conservation of FRs is therefore a property of both germline IgV genes and somatically diversified rearranged IgV regions of all of the above vertebrate species. All sequence sets analyzed above contain significantly fewer stop codons generated by point mutation than would be expected under a model whereby the germline IgV genes accumulate such nucleotide changes at random. Indeed, many point mutation-generated stop codons are 'rescued' by additional nucleotide changes within the same stop codon. In addition, insertions of single or multiple nucleotide triplets are found in the human VH-III family, the rabbit VH genes and the chicken IgVλ and IgVH pseudogenes. Furthermore, the majority of IgV pseudogenes do not appear to have accumulated more than one or a few crippling mutations and they have not diverged significantly from functional genes. It was previously suggested that the concentration of sequence variability at the putative antigen binding sites of germline IgV genes indicates that these genes are under some form of selection on the basis of protein function associated with antigen binding (eg. see Bothwell et al, 1981; Sims et al, 1992; Kodaira et al, 1986; Pascual and Capra, 1991; see below).
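The expectation under a purely random point mutation model can be made concrete by enumerating every single-nucleotide substitution in a reference sequence and classifying its outcome as replacement, silent or stop. The sketch below does this with the standard genetic code and treats all substitutions as equally likely, which corresponds only to the unbiased expectation; the corrections for substitution bias used in the codon-by-codon tables are not implemented. It assumes an in-frame reference without stop codons, and all names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def substitution_spectrum(codons):
    """Count outcomes over every possible single-base change in `codons`."""
    counts = {"replacement": 0, "silent": 0, "stop": 0}
    for codon in codons:
        original = CODON_TABLE[codon]
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue  # not a substitution
            mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if mutant == "*":
                counts["stop"] += 1
            elif mutant == original:
                counts["silent"] += 1
            else:
                counts["replacement"] += 1
    return counts


def expected_counts(codons, n_changes):
    """Expected numbers of each outcome among `n_changes` random substitutions."""
    spectrum = substitution_spectrum(codons)
    total = sum(spectrum.values())
    return {kind: n_changes * n / total for kind, n in spectrum.items()}


# Toy reference of two codons (Trp, Gly); not a sequence from the thesis.
spectrum = substitution_spectrum(["TGG", "GGG"])
```

For a real analysis the reference would be the consensus sequence of a sub-region and `n_changes` the number of nucleotide differences observed in it.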

The putative coding sequences of the murine VκOx1 germline genes, the human VH-I and VH-III families and the Xenopus VH-I genes analyzed above are flanked by concentrations of insertions and deletions. The murine VH germline sequences presented in this thesis and the 9 human VH-I germline genes analyzed above demonstrate that the sequence relationships of the transcription/coding units and the 5' flanking regions differ, indicating that the transcription/coding units evolve more rapidly than the 5' flanking regions of these genes. Taken together, the data suggest that the transcription/coding units have been targets for hyper-recombination events. This is further supported by the fact that although the putative coding sequences of Xenopus VH-I, VH-II and VH-III germline genes are highly homologous, very little sequence homology is present in the 5' and 3' flanking sequences of most of these genes (Schwager et al, 1989). The 5' flanking region sequences of the 9 VH-I germline genes that were analyzed above (Haino et al, 1994) also diverge immediately upstream of the promoter region. Thus, evidence for the presence of a hyper-recombination mechanism that can target the putative transcription/coding regions of germline IgV genes has been obtained from mouse, human and Xenopus germline sequences, indicating that site-specific hyper-recombination may be a generalized mechanism involved in the evolution of germline IgV genes (see section 13.5c). In an earlier study it was found that a germline VH gene contained direct and inverted repeats at the 5' and 3' ends of the putative coding region (Rechavi et al, 1983). These structures were proposed to be involved in the duplication and expansion of germline VH gene families. This would also be consistent with the above finding that hyper-recombination events targeting the putative coding regions of human and Xenopus VH and murine VκOx1 germline genes take place.
However, if the hyper-recombination events are mediated by direct and/or inverted repeats, these motifs should consistently coincide with the putative recombination hot spots flanking the transcription/coding units. Yet analysis of the VκOx1 sequences (Milstein et al, 1992), the human VH-I and VH-III sequences (Pascual and Capra, 1991) and the Xenopus VH sequences (Schwager et al, 1989) does not reveal an obvious correlation between insertions/deletions and direct/inverted repeats.

Arguably the most intriguing data sets presented in this section are the chicken IgVλ and IgVH pseudogenes. The Wu-Kabat plots generated for both sets of pseudogene sequences, 25 and 18 respectively, show a very clear-cut difference between the FRs and CDRs: sequence diversity is very high in the CDRs and very low in the FRs. Only a minority contain more than one crippling mutation and, remarkably, very few of these pseudogenes are crippled because of stop codons or frameshift mutations. Indeed, there is a statistically significant deficit of stop codons in both sets of sequences. In addition, the majority of insertions and deletions are in frame, i.e. single or multiple nucleotide triplets. Thus, the chicken pseudogenes also exhibit evidence of powerful selection for protein function. However, it is difficult to imagine how the selection forces could act directly on pseudogenes which are incapable of producing a functional protein. Indeed, almost all of the chicken pseudogenes are incapable of rearrangement and/or transcription. Because pseudogenes are under no functional constraint, all mutations that occur in these sequences should become fixed in the population with equal probability. Indeed, it was shown that the rate of nucleotide substitutions and frameshift mutations in non-Ig pseudogenes is much higher than in functional genes (Li et al, 1981; Miyata and Yasunaga, 1981; Gojobori et al, 1982; Li, 1983; Kimura, 1983; Graur et al, 1989).
Therefore it was proposed that pseudogenes, due to the lack of functional constraint, diverge rapidly once they have been crippled (reviewed in Li, 1983; Kimura, 1983) and as a result the highly non-random patterns seen in functional IgV genes should be much less apparent in IgV pseudogenes. However, the codon-by-codon analyses for the IgVλ and IgVH chicken pseudogenes clearly indicate that in fact the opposite is the case, i.e. the pseudogenes exhibit features that are normally found in genes that are acted upon by strong selective pressures. An earlier comparison of 7 functional murine VH genes with 9 VH pseudogenes (Schiff et al, 1985) also came to the conclusion that IgV pseudogenes seem to be subject to the same selection pressures as functional genes. It was argued that this is due to a pseudogene correction mechanism that probably involves gene conversion-mediated exchange of a crippled sequence with a functional one (Cohen and Givol, 1983; Schiff et al, 1985; Blankenstein et al, 1987; Haino et al, 1994). The organization of the chicken IgVλ and IgVH loci (Reynaud et al, 1987, 1989) provides an ideal test situation for this proposition since each locus contains a single functional gene and a large number of homologous pseudogenes that appear to have been subjected to strong selection forces. If these pseudogenes have been prevented from diverging and accruing multiple crippling mutations by gene conversion-mediated correction, then they must show evidence that the whole or part of each pseudogene has been replaced by sequence from the single functional gene. Comparison of the IgVλ and IgVH pseudogene sequences with the functional Vλ1 or VH1 genes respectively reveals that all of the pseudogenes differ from the functional genes at many if not most positions in the CDRs. The homology between the FRs of the VH pseudogenes and the functional VH1 gene appears to be higher than it is in the Igλ locus.
However, the pseudogenes at both loci contain coincident nucleotide changes at many positions (both CDRs and FRs) that are shared by most or all of the pseudogenes but not by the functional genes. Thus, if a gene conversion-mediated pseudogene correction mechanism is operative in the chicken it can only correct fragments of the FRs where the pseudogenes share limited stretches of sequence homology with their functional counterpart. The CDRs appear to be too heterogeneous for homology-mediated sequence correction to be operative. This would predict that crippling mutations should accrue mainly in the CDRs and to a much lesser extent in the FRs of the chicken pseudogenes. The data indicate that no such bias is apparent (Reynaud et al, 1987, 1989). Therefore, unless additional functional IgV genes that have so far escaped detection are present in the chicken genome, pseudogene correction mediated by gene conversion cannot explain the lack of divergence of the chicken pseudogenes. The presence of many functional genes in the IgV loci of mammals and Xenopus makes it impossible to carry out a similar test of the proposition, thus it cannot be excluded that gene correction occurs in these species. However, presumably any gene that has accrued a crippling mutation is homologous to its intact allelic counterpart or to other members of the multigene family. It would therefore be expected that a gene conversion-mediated pseudogene correction mechanism should generally act to correct any recently generated pseudogene, and thus other pseudogenes should display similar characteristics to IgV pseudogenes, i.e. generally little divergence from the functional genes and few additional crippling mutations. This is in contrast to the available data, which suggest that non-Ig pseudogenes generally diverge rapidly due to the accumulation of further mutations (Li et al, 1981; Miyata and Yasunaga, 1981; Gojobori et al, 1982; Li, 1983; Kimura, 1983; Graur et al, 1989).
It would therefore have to be assumed that the pseudogene correction mechanism has the ability to differentiate between IgV and non-Ig pseudogenes. Thus, the analysis of the murine germline VH genes isolated in this work, and the analysis of germline IgV genes from other species, has brought to light a number of highly non-random sequence patterns. Any evolutionary model that is invoked to explain the origin and maintenance of germline IgV genes must also be able to account for the sequence patterns discussed above.

c) Evolutionary models and the patterns of sequence variability among germline IgV genes

It is generally accepted that multigene families arose by gene duplication from a primordial precursor (Ohno, 1970), and that the subsequent evolution of these genes involved tandem duplication by unequal crossing over (Ohno, 1970; Edelman and Gally, 1970; Hood et al, 1975). As shown in Figure 1.1, hundreds of murine germline VH genes are arranged in tandem. A problem with maintaining tandemly arrayed homologous genes is that unequal reciprocal meiotic crossover events could result in the deletion of large portions of the tandem arrays. The data presented in this thesis and by others demonstrate that although members of IgV multigene families are generally homologous, they nevertheless display significant sequence diversity. This is in contrast to other multigene families where the need to produce large numbers of identical RNA or protein molecules has resulted in very high sequence homology (eg. Hood et al, 1975; Arnheim, 1983). Thus, any evolutionary theory must be able to account for the conservation of 'housekeeping' multigene families as well as the generation of germline diversity in the IgV multigene family. Furthermore, the data presented in previous sections demonstrate that the patterns of sequence variability observed in germline IgV genes of several vertebrate species are significantly different from those expected under a model where mutations are introduced at random throughout the evolution of these genes. Some of these patterns were previously reported; however, a number of new patterns have emerged. Several evolutionary mechanisms have been invoked in the past to account for the observed characteristics of germline IgV genes, and it will be of interest to re-examine these in the light of the above data. Furthermore, in order to consider as many options as possible, two new models, which at present may seem rather speculative, will be discussed.

Evolution of IgV genes by random mutagenesis followed by natural selection: The evolution of the IgV loci that contain large numbers of homologous yet unique sequences no doubt involved genome duplication (Ohno, 1970) and tandem duplication events (reviewed in Li, 1983). Indeed, evidence for duplicated genes has been found in both human and murine IgV loci (eg. see Kodaira et al, 1986; Chen and Yang, 1990; Sasso et al, 1992; Matsuda et al, 1993). However, whereas multigene families of 'housekeeping' genes such as rRNA, tRNA, and histone genes contain almost no sequence diversity (eg. Hood et al, 1975, Table 1), individual IgV genes can be identified by distinct diversified regions: the putative antigen contact sites (CDRs). This and the data presented in this thesis indicate that each IgV gene (including the pseudogenes) has been subjected to some form of strong selection on an individual basis. Thus, the proposition that natural selection acts on the IgV multigene families as a whole (Hood et al, 1975) and not on individual members does not fit the observed sequence patterns (eg. see chapters 8-12, section 13.5b and reviewed in Steele et al, 1993). The high R:S ratio in the antigen recognition portions of the genes coding for MHC class I and II molecules and the lower R:S ratio in the other parts of the genes was taken as evidence for overdominant selection (i.e. heterozygotes possess the highest fitness and are positively selected by Darwinian evolution; Hughes and Nei, 1988, 1989). Because of the critical functions that the MHC molecules are involved in (immune response, recognition of self vs non-self) and because they are expressed on most somatic cells, it follows that deleterious mutations will adversely affect the survival of the whole organism, whereas rare advantageous mutations may increase its survival during episodes of disease outbreak.
Thus, natural selection forces are likely to act directly on the genes coding for the MHC proteins (Hughes and Nei, 1988, 1989).

The high R:S ratio in the CDRs of germline IgV genes and the lower R:S ratio in the FRs was taken as evidence that the germline CDRs are subject to positive diversity-enhancing selection, whereas purifying selection acting on the FRs is responsible for the conservation of FR sequence (Gojobori and Nei, 1984; Tanaka and Nei, 1989). However, this proposal ignores two powerful reasons why germline IgV genes cannot be under direct selection pressures. First, germline IgV genes are only fully expressed in the context of a functional V(D)J rearrangement in B cells. Therefore the only way that direct positive and negative Darwinian selection forces can be responsible for the sequence patterns in IgV genes is if these forces can act directly on the DNA sequence. Second, selection for antibody function takes place at the B cell level and not at the whole organism level. Thus, a B cell with a non-functional rearrangement will be eliminated from the B cell pool. However, the large number of functional IgV genes ensures that sufficient B cells enter the peripheral immune system to combat virtually any invading pathogen. The fact that avian species which contain only one or two functional IgV genes (Reynaud et al, 1987, 1989; McCormack et al, 1989) are still able to mount successful humoral responses indicates that large numbers of IgV genes are not an absolute requirement for successful immune responses. Furthermore, a recent survey of the expression of germline IgV genes in humans showed that many functional mammalian germline IgV genes may never or only rarely be expressed (eg. see Cox et al, 1994). This suggests that species with large numbers of germline IgV genes may in fact be carrying many surplus genes which are not critical for the survival of the organism.
This further argues against the proposition that unrearranged germline IgV genes are subject to strong natural selection forces (also see Baltimore, 1981).

Gene conversion: There is evidence in the literature that intrachromosomal gene conversion events do occur at IgV loci (eg. see Bentley and Rabbitts, 1983; Krawinkel et al, 1983; Ferguson et al, 1989; Haino et al, 1994). However, a number of theoretical studies have shown that intrachromosomal gene conversion events are likely to maintain sequence homogeneity among members of a multigene family (eg. see Edelman and Gally, 1970; Egel, 1981; Nagylaki and Peters, 1982; Arnheim, 1983; Nagylaki, 1984). Thus, the very high sequence conservation of, for example, rRNA genes, Alu repeats and small multigene families such as the α- and γ-hemoglobin genes is likely to be due to gene conversion events (eg. see Hood et al, 1975; Arnheim, 1983). Although individual members of IgV multigene families are homologous, they are nevertheless unique sequences with most of the distinguishing changes concentrated in the CDRs. It could be assumed that biased gene conversion events (Ohta, 1984; Walsh, 1985) may mediate the diversification of the germline CDRs. However, since gene conversion tends to homogenize genes, CDR-specific gene conversion would result in the homogenization of CDRs, which is clearly not the case. In addition, strong selection forces would have to be invoked to account for the high R:S ratio in the germline CDRs since this could not be the result of gene conversion alone (eg. see Hughes and Nei, 1988). As was discussed above, it is very unlikely that individual germline IgV genes are subject to strong selection forces in species that contain many germline IgV genes (also see Baltimore, 1981). Furthermore, since this would in effect be invoking hyper-recombination events targeted to CDRs, insertions and deletions should bracket the CDRs. This is clearly not the case (also see below).
It is known that IgV genes contain palindromes and direct repeats (Golding et al, 1987; Kolchanov et al, 1987; Schwager et al, 1987) and it has been shown that such sequences may mediate gene conversion between germline IgV genes (Krawinkel et al, 1983, 1986). However, there is no evidence that palindromes and direct or inverted repeats are preferentially found at all CDR-FR borders. Thus it is difficult to imagine how gene conversion events mediated by these sequences could result in the observed distribution of sequence variability among germline IgV genes. In addition, it was shown that quasi-palindromic sequence motifs can result in significant numbers of frameshift mutations (Ripley, 1982). This leads to the prediction that if these sequence motifs are primarily responsible for the concentration of sequence variability in germline CDRs, then a significant proportion of genes should be crippled by frameshift mutations, which should mainly occur at the FR-CDR boundaries. Examination of the sequences presented in this thesis and those published by others (eg. Reynaud et al, 1987, 1989; Schwager et al, 1989; Pascual and Capra, 1991; Milstein et al, 1992) indicates that this is not the case.

The neutral theory of molecular evolution: According to this theory, the bulk of molecular evolution is the result of random fixation of selectively neutral or nearly neutral mutations (Kimura, 1968). This is based on the fact that the rate of silent nucleotide changes appears to be much higher than the rate of nucleotide changes that result in amino acid replacements. This theory leads to a number of predictions. First, nucleotide differences at the third codon positions should predominate since mutations at these positions are most likely to be neutral. Second, since pseudogenes are not under any functional constraints any mutations occurring in crippled genes are in effect selectively neutral and are equally likely to become fixed in the population. Therefore, pseudogenes should accumulate further mutations more rapidly than functional genes. There is support for both of these points in the literature. It has been shown that in genes that display a very low rate of evolution the rate of evolutionary changes at the third codon position can in fact be very high (eg. Kimura, 1977; Jukes, 1978; Kimura, 1983). It has also been shown that generally pseudogenes accumulate mutations at a much higher rate than functional genes (eg. Li et al, 1981; Miyata and Yasunaga, 1981; Gojobori et al, 1982; Li, 1983; Kimura, 1983; Graur et al, 1989). It is therefore of interest to determine whether functional germline IgV genes and IgV pseudogenes also conform to the predictions of the neutral theory.

Mutation rates at the third codon positions: The mutation rates at all three codon positions were determined for the murine VH186.2 related sequences presented in Figs. 8.1 and 8.4 and compared to those obtained for somatically mutated derivatives of the VH186.2 germline gene (Table 13.11). Two different types of somatically mutated VH186.2 genes are included: those obtained from anti-NP responses and those obtained from unimmunized mice (see legend to Table 13.11).

Table 13.11. Rates of nucleotide change in germline VH186.2 related genes and in somatically mutated VH186.2 genes. Values are the mutation rate (%) at each codon position.

| Genes used | Region | No. codons used | Position 1 | Position 2 | Position 3 | Position 3* |
|---|---|---|---|---|---|---|
| (a) VH186.2 related germline genes (BALB/c) | FR1 | 630 | 1.06 | 0.58 | 1.43 | 1.16 |
| | CDR1 | 105 | 3.81 | 1.59 | 2.54 | 0.00 |
| | FR2 | 273 | 0.61 | 1.34 | 0.98 | 0.85 |
| | CDR2 | 357 | 7.00 | 5.32 | 3.55 | 1.03 |
| | FR3 | 378 | 1.41 | 0.97 | 0.97 | 0.71 |
| | Total FR | 1281 | 1.07 | 0.86 | 1.20 | 0.96 |
| | Total CDR | 462 | 6.28 | 4.47 | 3.32 | 0.79 |
| (b) VH186.2 related germline genes (C57BL/6J) | FR1 | 1020 | 1.00 | 0.13 | 1.27 | 1.27 |
| | CDR1 | 170 | 1.37 | 0.78 | 1.37 | 0.20 |
| | FR2 | 442 | 0.01 | 0.60 | 0.75 | 0.68 |
| | CDR2 | 578 | 5.13 | 4.15 | 2.42 | 0.00 |
| | FR3 | 612 | 0.93 | 1.09 | 0.27 | 0.22 |
| | Total FR | 2074 | 0.80 | 0.51 | 0.87 | 0.84 |
| | Total CDR | 748 | 4.27 | 3.39 | 2.18 | 0.04 |
| (c) Somatically mutated VH186.2 genes (anti-NP) | FR1 | 7358 | 0.20 | 0.20 | 0.26 | 0.20 |
| | CDR1 | 1249 | 0.93 | 3.52 | 1.17 | 0.43 |
| | FR2 | 3242 | 0.26 | 0.23 | 0.57 | 0.47 |
| | CDR2 | 4284 | 0.58 | 1.00 | 0.45 | 0.29 |
| | FR3 | 6096 | 0.21 | 0.31 | 0.47 | 0.37 |
| | Total FR | 16969 | 0.21 | 0.25 | 0.40 | 0.31 |
| | Total CDR | 5533 | 0.66 | 1.57 | 0.61 | 0.32 |
| (d) Somatically mutated VH186.2 genes | FR1 | 1350 | 0.37 | 0.20 | 0.37 | 0.35 |
| | CDR1 | 225 | 0.15 | 1.20 | 1.78 | 0.44 |
| | FR2 | 585 | 0.23 | 0.34 | 0.40 | 0.28 |
| | CDR2 | 765 | 0.78 | 1.22 | 0.96 | 0.52 |
| | FR3 | 1080 | 0.31 | 0.49 | 0.71 | 0.52 |
| | Total FR | 3015 | 0.32 | 0.33 | 0.50 | 0.40 |
| | Total CDR | 990 | 0.64 | 1.21 | 1.14 | 0.51 |

* Does not include nucleotide changes that alone or in combination with additional changes in the same codon result in an amino acid replacement. The germline data were obtained from (a) the 21 VH186.2 related germline genes shown in Fig. 8.4, and (b) the 31 VH186.2 related germline genes shown in Fig. 8.1 in addition to VH186.1, VH23 and VH3, which were taken from Bothwell et al, 1981. (c) This data set contains a total of 254 sequences that were isolated during primary or secondary anti-NP responses from C57BL mice. The sequences were obtained from the following publications: Cumano and Rajewsky, 1986; Blier and Bothwell, 1987 (H1-77 was not included); Weiss and Rajewsky, 1990 (Fig. 3); Jacob et al, 1991; Jacob and Kelsoe, 1992 (Fig. 6); Weiss et al, 1992; Tao et al, 1993 (Fig. 5); McHeyzer-Williams et al, 1993 (Fig. 8); Rothenfluh et al, 1993 (Fig. 6). Any positions that were not determined in the above publications were not included in the analysis. (d) This data set contains a total of 45 somatically mutated VH186.2 sequences that were isolated from unimmunized mice (Schittek and Rajewsky, 1992, Fig. 1).
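The way the four rate columns (1, 2, 3 and 3*) relate to aligned sequences can be illustrated with a short Python sketch. The text does not spell out the counting procedure, so the sketch rests on assumptions: each sequence is compared codon by codon with a single reference sequence, a rate is the number of changes at that codon position divided by the number of codons compared (times 100), and codons containing undetermined bases are left out. The function name `codon_position_rates` and the toy sequences are hypothetical and are not taken from the thesis data.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_position_rates(reference, variants):
    """Return (rates, codons_used).

    rates maps "1", "2", "3" and "3*" to percent changes per codon compared.
    "3*" counts third-position changes only in codons whose encoded amino
    acid is unchanged (assumed reading of the table footnote).
    """
    reference = reference.upper()
    if len(reference) % 3:
        raise ValueError("reference length must be a multiple of 3")
    counts = {"1": 0, "2": 0, "3": 0, "3*": 0}
    codons_used = 0
    for seq in variants:
        seq = seq.upper()
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for start in range(0, len(reference), 3):
            ref_codon = reference[start:start + 3]
            var_codon = seq[start:start + 3]
            if ref_codon not in CODON_TABLE or var_codon not in CODON_TABLE:
                continue  # undetermined base or gap: codon not used
            codons_used += 1
            same_amino_acid = CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]
            for pos in range(3):
                if ref_codon[pos] != var_codon[pos]:
                    counts[str(pos + 1)] += 1
                    if pos == 2 and same_amino_acid:
                        counts["3*"] += 1
    if codons_used == 0:
        raise ValueError("no comparable codons")
    rates = {key: 100.0 * n / codons_used for key, n in counts.items()}
    return rates, codons_used


if __name__ == "__main__":
    # Toy data: Ala-Lys-Gly reference and three single-change variants.
    ref = "GCTAAAGGT"
    toy = ["GCCAAAGGT", "GCTAATGGT", "ACTAAAGGT"]
    print(codon_position_rates(ref, toy))
```

In the toy data one third-position change is silent (GCT to GCC) and one replaces the amino acid (AAA to AAT), so the 3* rate is half the total third-position rate. Comparing every sequence against a consensus rather than a single germline gene would be an equally plausible reading and would only change the `reference` argument.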

Table 13.11 demonstrates that there is no significant elevation in the mutation rate at the third codon position in the FRs of the germline VH sequences. The overall mutation rate increases at all three codon positions in germline CDRs; however, when third codon changes that occur at codons where the amino acid is altered are eliminated, there is in fact a significant decrease in the mutation rate at the third codon position in germline CDRs. This is remarkably similar to the mutation rate at the third codon position of the CDRs of somatically mutated VH186.2 genes, where there is also a marked decrease when the third codon mutations that alone or with additional mutations at the same codon result in an amino acid replacement have been eliminated. Thus, although the mutation rates of the germline genes are higher than those of the somatically mutated genes, the overall pattern is remarkably similar. In the CDRs there is a significant reduction in the third position mutation rate when only codons where the amino acid remains unaltered are counted, whereas in the FRs the mutation rate that includes all third position mutations and the mutation rate that does not include changes that alone or in combination with other changes alter the amino acid are very similar. Thus, the mutation rates at the third codon positions of the VH186.2 related sequences are incompatible with the prediction made by the neutral theory of molecular evolution. The data presented in Table 13.11 strengthen the previous conclusion that the pattern of sequence variability in germline IgV genes is non-random and bears a remarkable degree of resemblance to the pattern found among somatically mutated genes.

In order to determine whether the germline IgV genes of other species display a similar pattern, the analysis shown in Table 13.11 was repeated for the human, rabbit and Xenopus germline VH genes that were analyzed in the previous section (see Table 13.12).
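The contrast between FRs and CDRs in Table 13.11 can be made explicit by expressing the drop from column 3 to column 3* as a percentage of column 3. The following is a minimal Python sketch that uses only the "Total FR" and "Total CDR" rows of Table 13.11; the function names and the dictionary layout are illustrative and are not part of the original analysis, and no statistical test is implied.

```python
def relative_decrease(rate_all, rate_unaltered):
    """Percent drop from the total third-position rate (column 3) to the
    rate restricted to codons with an unchanged amino acid (column 3*)."""
    return 100.0 * (rate_all - rate_unaltered) / rate_all


# (column 3, column 3*) from the Total FR and Total CDR rows of Table 13.11.
TABLE_13_11_TOTALS = {
    "germline, BALB/c": {"FR": (1.20, 0.96), "CDR": (3.32, 0.79)},
    "germline, C57BL/6J": {"FR": (0.87, 0.84), "CDR": (2.18, 0.04)},
    "somatic, anti-NP": {"FR": (0.40, 0.31), "CDR": (0.61, 0.32)},
    "somatic, unimmunized": {"FR": (0.50, 0.40), "CDR": (1.14, 0.51)},
}


def summarize(table):
    """Relative decrease (percent, one decimal) per data set and region."""
    return {
        name: {region: round(relative_decrease(*pair), 1)
               for region, pair in regions.items()}
        for name, regions in table.items()
    }


if __name__ == "__main__":
    for name, drops in summarize(TABLE_13_11_TOTALS).items():
        print(f"{name}: FR {drops['FR']}%, CDR {drops['CDR']}%")
```

For all four data sets the proportional decrease is larger in the CDRs than in the FRs, which is the pattern described in the text for both the germline and the somatically mutated genes.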

Table 13.12. Rates of nucleotide change in human, rabbit and Xenopus germline VH genes.

Genes used   No. codons   Mutation rate (%) at each codon position
             used         1       2       3       3*

13 human VH-I germline genes (a)
FR1          390          2.31    1.37    2.14    1.79
CDR1         65           6.67    6.67    5.64    2.05
FR2          182          2.01    1.28    2.56    1.65
CDR2         206          5.02    5.50    3.88    1.43
FR3          396          0.42    1.68    2.69    1.43
Total FR     968          1.48    1.48    2.44    1.62
Total CDR    271          5.41    5.78    4.30    2.21

25 human VH-III germline genes (a)
FR1          750          1.33    0.71    2.04    1.64
CDR1         125          8.80    11.20   5.87    1.07
FR2          350          2.29    1.90    6.29    4.86
CDR2         418          6.94    8.37    10.29   4.15
FR3          2400         1.01    0.83    1.51    1.03
Total FR     3500         1.21    0.91    2.10    1.54
Total CDR    543          7.37    9.02    9.27    3.44

14 rabbit VH germline genes (b)
FR1          420          3.49    3.02    1.27    0.79
CDR1         70           9.52    4.28    7.62    2.86
FR2          196          1.36    2.21    3.23    1.70
CDR2         224          3.57    4.46    4.02    2.23
FR3          336          2.98    2.98    1.88    0.89
Total FR     952          2.87    2.80    1.89    1.02
Total CDR    294          4.99    4.42    4.88    2.38

9 Xenopus VH-I germline genes (c)
FR1          270          0.37    0.37    0.99    0.74
CDR1         45           4.44    2.96    8.15    3.7
FR2          126          1.59    0.00    0.79    0.53
CDR2         153          2.83    2.18    2.61    1.53
FR3          288          1.04    0.12    0.46    0.35
Total FR     684          0.88    0.19    0.73    0.54
Total CDR    198          3.20    2.36    3.87    2.02

* Does not include nucleotide changes that alone or in combination with additional changes in the same codon result in an amino acid replacement. (a) The VH-I and the VH-III gene sequences were obtained from Pascual and Capra, 1989. Note that Pascual and Capra published the sequences for 16 VH-I germline genes, however the sequences of three of these (ψ22-1, ψ15-1 and ψ65-3) were not used in this analysis since CDR2 and FR3 of these genes differ completely from the other sequences. (b) 14 rabbit VH sequences isolated and sequenced in Roux et al, 1991. (c) 9 germline Xenopus VH-I genes published in Schwager et al, 1989.

The data presented in Table 13.12 indicate that in the human and rabbit germline VH genes there is a decrease in the mutation rate at the third position of codons where the amino acid is unaltered when compared to the total third codon position mutation rate. However, the decrease is generally greater in CDRs than it is in FRs. The data for the Xenopus VH-I genes more closely resemble those of the murine VH186.2 related germline genes (Table 13.11), because the difference between the two mutation rates for third codon positions in FRs is very small but there is a significant difference in the CDRs.

Sequence variation in IgV pseudogenes: The neutral theory predicts that IgV pseudogenes should rapidly diverge and accumulate further mutations. However, as discussed in previous sections, this is not the case in many IgV pseudogenes that were isolated from different vertebrate species. Although in humans, rabbits and mice it cannot formally be excluded that an IgV pseudogene-specific correction mechanism exists, it was shown that this is highly unlikely to be the case in the chicken IgVλ and IgVH loci (section 13.5b). It was also shown that not only is there a lack of multiple crippling mutations in the chicken pseudogenes, but the pattern of sequence variability in these pseudogenes is very similar to the pattern seen in expressed IgV regions. Thus, the sequences of the chicken pseudogenes indicate that these crippled genes do not accumulate further mutations at random and that they display very similar sequence patterns to rearranged and expressed genes. Although the data analyzed in Tables 13.11 and 13.12 include pseudogenes, it is of interest to apply similar analyses to the chicken IgVλ and IgVH pseudogenes (Table 5.13). It is clear from this analysis that in the chicken pseudogenes there is also a decrease in the third position mutation rate in the CDRs when the codons resulting in amino acid replacements are excluded from the calculations. This is also found in the somatically hyperconverted chicken IgV genes. Some germline FRs also display a decrease in the third codon mutation rate, but generally to a lesser extent than in the CDRs. In FR1 and FR3 of the VH pseudogenes there is also a significant difference between the two mutation rates at the third codon positions.

Table 5.13. Rates of nucleotide change per nucleotide in chicken germline VH and Vλ pseudogenes and in somatically hyperconverted genes.

Genes used   No. codons   Mutation rate (%) at each codon position
             used         1       2       3       3*

25 chicken Vλ pseudogenes
FR1          375          2.4     3.56    4.09    2.76
CDR1         191          9.60    6.11    7.50    3.14
FR2          376          3.01    1.68    3.28    1.86
CDR2         168          6.94    6.55    8.53    1.78
FR3          701          2.76    2.62    1.85    1.18
CDR3         189          8.47    10.93   10.75   3.17
Total FR     1452         2.73    2.62    2.8     1.77
Total CDR    548          8.39    7.91    8.94    2.74

12 expressed chicken Vλ genes
FR1          240          1.89    0.83    1.25    1.25
CDR1         96           4.51    6.25    1.04    0.35
FR2          192          1.91    0.52    1.22    0.52
CDR2         84           1.59    1.98    1.59    0.40
FR3          396          1.43    1.85    1.51    0.34
CDR3         72           3.70    3.70    2.31    1.85
Total FR     828          1.65    1.25    1.37    0.65
Total CDR    252          3.31    4.1     1.58    0.79

18 chicken VH pseudogenes
FR1          433          1.77    2.23    1.31    0.69
CDR1         80           10.42   10.00   7.92    4.2
FR2          224          1.78    2.83    4.46    2.98
CDR2         256          5.73    6.90    4.04    0.91
FR3          620          1.94    3.12    1.29    0.32
Total FR     1277         1.85    2.77    1.85    0.91
Total CDR    336          6.85    7.64    4.96    1.69

11 expressed chicken VH genes
FR1          330          1.01    0.51    0.20    0.20
CDR1         55           4.85    6.06    4.85    1.21
FR2          154          0.22    0.43    2.16    1.95
CDR2         224          2.68    2.83    2.67    0.30
FR3          462          0.14    0.29    0.07    0.07
Total FR     946          0.45    0.39    0.45    0.42
Total CDR    279          3.11    3.45    3.11    0.48

* Does not include nucleotide changes that alone or in combination with additional changes in the same codon result in an amino acid replacement. All of the sequences were obtained from Reynaud et al, 1987 or from Reynaud et al, 1989.

Thus, the data for murine, human, rabbit and Xenopus germline VH genes presented in Tables 13.11 and 13.12 indicate that the mutation rate at the third codon position in germline FRs and CDRs is not higher than it is at the first and second positions. Contrary to the predictions of the neutral theory of evolution, the chicken pseudogenes have not diverged significantly and also display the non-random pattern of sequence variability present in functional genes. In vertebrate germline VH genes and in the chicken pseudogenes the mutation rates at the three codon positions resemble those observed in somatically altered IgV regions and are incompatible with the neutral theory of molecular evolution. Furthermore, the chicken pseudogene sequences also do not meet the predictions made by this hypothesis.

Germline generator of IgV diversity: An alternate explanation for the highly non-random sequence patterns observed in germline IgV genes may be to invoke a mechanism that acts directly on the germline DNA of the gametes or their precursors (eg. spermatogonia). This mechanism may consist of a multimeric complex that has evolved in response to the need for a diverse germline repertoire. Since it appears that a large number of germline IgV genes may not be absolutely necessary for the generation of a large and diverse repertoire of antibody specificities (see above), this mechanism probably evolved before many or most of the somatic diversification mechanisms evolved. Support for this may come from the fact that at least some of the non-random sequence patterns are also present in primitive vertebrates (eg. Kokubu et al, 1988). The mechanism would need to possess a number of novel specificities. Not only would it need to be able to differentiate between IgV and non-IgV genes, but it would also have to distinguish between CDRs and FRs. The mechanism would then introduce mainly replacement changes into the CDRs and avoid the introduction of nucleotide changes that would result in an amino acid change into the FRs. The enzymatic machinery would also need to be able to avoid the introduction of nucleotide substitutions that generate a stop codon. Alternatively, a correction mechanism could 'rescue' such stop codons by either correcting the stop codon-generating change or by introducing additional nucleotide changes into the same codon. Finally, the mechanism would also need to introduce insertions and deletions at the borders of the putative transcription/coding units. As can be seen, a mechanism acting on germline DNA would in effect have to replicate the mutation and selection processes that take place during the immune response. At present, there is no evidence for a cellular mechanism that is capable of the described specificities required for the generation of the observed germline IgV sequence diversity.

Soma-to-germline genetic feedback loop: According to this model, somatic mutation and selection genetic events can directly influence the homologous target area in the germline DNA, eg. RNA copies of somatically expressed genes could be taken up by endogenous retroviruses which could act as soma-to-germline genetic vectors (Steele, 1979). Thus, according to this specific model endogenous retroviruses package Ig RNA molecules, find their way to and infect oocytes, sperm, spermatogonia or the fetus, and a cDNA copy of the sequence then integrates into the germline DNA. There is now a body of evidence in the literature that is consistent with many of the predictions made by this model. Endogenous retroviruses are found associated with sperm cells in the male reproductive tract (Fernandes et al, 1973; Kiessling et al, 1987, 1989) and with oocytes in the ovary (Nilsson et al, 1986). Retroviral proteins have also been detected in human placentas (Maeda et al, 1983; Johansen et al, 1989) and in the uterus of pregnant mice (Strickland et al, 1979). Endogenous retroviruses are secreted by mitogen stimulated B cells (eg. Moroni and Schumann, 1975; Moroni et al, 1980; King et al, 1990) and they have also been detected in antigen dependent germinal centers (Szakal and Hanna, 1968). It was also shown that endogenous retroviruses can package cellular RNA (Ikawa et al, 1974) and in a naturally occurring mutant it was found that the host cell RNA was packaged in approximate proportion to its intracellular concentration (Gallis et al, 1979; Chen et al, 1985; Aronoff and Linial, 1991). Male mice injected intraperitoneally with infectious endogenous retroviruses were shown to transmit the virus to females during mating (Portis et al, 1987), suggesting that it may be possible for retroviruses to spread to reproductive tissues. Furthermore, processed pseudogenes, i.e. pseudogenes with the characteristics of processed genes (removal of introns, poly-A tails) have been isolated and characterized (eg. see Battey et al, 1982; Karin and Richards, 1982; Reilly et al, 1982; McCarrey and Thomas, 1987), and it was shown that processed pseudogenes can be produced by retroviral vectors (Linial, 1987; Dornburg and Temin, 1988; Tchénio et al, 1993). The presence of a DNA→RNA→DNA feedback loop mediated by endogenous retroviruses would account for the similar patterns of sequence variability in germline and somatically altered IgV genes. It could also account for the presence of apparent VH-D junctions among the chicken VH pseudogenes (Reynaud et al, 1989) and the variable 3' termini apparent in the chicken Vλ pseudogenes (Reynaud et al, 1987) and also in some VκOx1 germline genes (Milstein et al, 1992). Insertion of an incoming cDNA copy of an expressed and therefore functional IgH or IgL gene would also be able to account for the lack of stop codons in germline IgV genes as well as the different sequence relationships observed between the 5' non-transcribed regions and the putative transcription/coding units. Nevertheless, an endogenous retrovirus-mediated genetic feedback loop would predict that unmutated Ig genes expressed by plasma cells in addition to somatically mutated and antigenically selected Ig genes expressed by memory cells should return to the germline. However, the germline IgVH sequences presented in this thesis and by others (eg. Reynaud et al, 1987, 1989; Schwager et al, 1989; Roux et al, 1991; Tomlinson et al, 1992; Milstein et al, 1992; Haino et al, 1994) clearly indicate that virtually all germline IgV genes contain the characteristic concentration of sequence variability in the CDRs. Therefore, a genetic feedback loop mediated by endogenous retroviruses as envisaged by Steele (1979) can only fully explain the data if it is assumed that only the genes expressed by somatically mutated and antigenically selected memory B lymphocytes can undergo the soma to germline feedback.

Lymphocyte-specific, endogenous retrovirus-mediated genetic feedback loop: As described above, there is abundant evidence consistent with a mechanism as described by Steele (1979) and almost all of the sequence data obtained for germline IgV genes are also consistent with this proposition. Therefore it is worthwhile to consider a modification of the above model. Although certain aspects of this model are somewhat speculative, it is nevertheless consistent with the germline IgV sequence patterns reported on in this thesis. It is now well known that endogenous retroviruses are present in the mouse epididymis (Fernandes et al, 1973; Kiessling et al, 1987, 1989). These retroviruses are produced by the epithelial cells that line the lumen of the epididymis (Kiessling et al, 1989). Interestingly, no retroviral particles were produced by the testis but RNA probes detected patchy expression of endogenous retroviral genes in the spleen (Kiessling et al, 1989). It was estimated that in mice up to 10^11 - 10^12 retroviral particles are present per ml of epididymal fluid (Kiessling, 1984; Kiessling et al, 1987). It was also shown that lymphocytes and macrophages are present between epithelial cells of the epididymis (Hoffer et al, 1973; Wang and Holstein, 1983). Cells with lymphocyte and macrophage morphologies are also present in the lumen of the epididymis (Kiessling et al, 1989), and these cells are literally engorged with retroviral particles. Furthermore, it was also demonstrated that retroviral particles attach to the spermatozoa stored in the lumen of the epididymis (Kiessling et al, 1987). Although it is not clear whether endogenous retroviruses are also produced in the epididymis of other mammals, it was demonstrated that reverse transcriptase activity is present in human sperm (Witkin and Bendich, 1977).
The above data led to the suggestion that lymphocytes and macrophages that have taken up retroviral particles in the epididymis may spread these particles to other male tissues (Kiessling et al, 1989), and that sperm associated with retroviral particles may not only fertilize an oocyte but may in fact be carried to the ovary (Kiessling et al, 1987). The above data thus lead to the model depicted in Figure 13.17. It is assumed that, possibly via the interaction of a specific surface marker, only memory lymphocytes can traverse the barriers that normally prevent the entry of blood cells into the male reproductive tract, and enter the epithelium. The fact that in healthy, young males generally few lymphocytes are present in the epididymis (Hoffer et al, 1973; Wang and Holstein, 1983; Kiessling et al, 1989) may support this. The retroviruses secreted by the epididymal epithelial cells infect the lymphocytes present in the lumen or within the epithelium of the epididymis. In addition, it is also possible that the lymphocytes may secrete their own retroviral particles. Either or both of these retrovirus pathways may result in the presence of retroviral particles that have packaged a cellular RNA molecule in the epididymal fluid. Some of these particles then attach to and possibly infect the spermatozoa. In the latter case, the reverse transcript of the cellular RNA transported by the retrovirus(es) may integrate into the haploid genome of the spermatozoa; in the former case, the sperm may transport the retrovirus(es) to the released oocyte or to the ovary itself.

Figure 13.17. Memory lymphocyte specific soma-to-germline genetic feedback loop. The figure legend distinguishes retroviral particles containing retroviral genomic RNA only from retroviral particles containing memory lymphocyte RNA as well as retroviral genomic RNA.

In the chicken it was shown that homologues of three genes that are implicated in yeast mitotic and meiotic recombination are present, and at least two of these are expressed at high levels in the primary chicken lymphoid organs as well as in the ovaries and testis (Bezzubova et al, 1993a,b; Bezzubova and Buerstedde, 1994). Moreover, in an unpublished report Bezzubova and colleagues (see Bezzubova et al, 1993b) isolated a RAD54 homologue from mice, although it was not specified in which tissues it is expressed. If these gene products are expressed in mammalian reproductive systems then they may well mediate specific recombination between a somatically mutated and antigenically selected cDNA molecule and a homologous germline IgV gene. In addition, it was suggested that VH gene replacement is mediated by the conserved heptamer sequence near the 3' end of VH genes (Reth et al, 1986; Kleinfeld et al, 1986). This heptamer is identical to that found 5' of many D elements. Thus the conserved heptamer of a rearranged VH gene may interact with the nonamer immediately downstream of an unrearranged VH gene, which results in the looping out and replacement of the originally rearranged VH gene (Reth et al, 1986; Kleinfeld et al, 1986).

This heptamer is highly conserved in murine (Reth et al, 1986; Kleinfeld et al, 1986) and human (eg. see Pascual and Capra, 1991) germline VH genes, and it is also present in some rabbit (Roux et al, 1991), Xenopus (Schwager et al, 1989) and Heterodontus (Kokubu et al, 1988) germline VH genes, as well as in some of the chicken VH pseudogenes (Reynaud et al, 1989). Thus, the conserved heptamer found in many germline VH genes of several species may facilitate gene conversion events between the cDNA molecule and a germline VH gene. In the murine VH genes presented in this thesis, the putative recombination hot-spot was situated around the cap site, whereas in the VκOx1 germline genes and the germline VH genes of other species it is situated in the leader intron. This suggests that both unspliced and spliced RNA molecules can be packaged, reverse transcribed and returned to the germline by retroviruses. The former predicts a recombination hot-spot around the cap site, whereas the latter predicts a hot-spot near the 5' end of the V coding region. Any model invoking a soma-to-germline feedback loop to explain the similarities between the sequence patterns found in somatically mutated and in germline IgV genes would predict that IgV gene families that are used most should contain the highest proportion of germline genes with open reading frames. Similarly, IgV gene families that are rarely used should accumulate more pseudogenes since the soma-to-germline feedback loop would only rarely return a functional gene to the germline. It is thus of interest to note that in Xenopus the two families with the lowest proportion of pseudogenes, VH-I and VH-II, are rearranged and expressed by more B cells than the VH-III family, which contains up to 50 % pseudogenes (Schwager et al, 1989). In addition, it was also shown that whereas VH205.12 related genes are commonly rearranged and expressed in pre-B cells of CB.20 mice, they are rarely detected in adult μ+δ+ spleen cells (Gu et al, 1991b). In contrast, VH186.2 related genes are often rearranged and expressed in adult μ+δ+ spleen cells of the same mouse strain. Thus the usage of the two sub-families in adult splenic B cells appears to reflect their pseudogene content.

Summary: Germline IgV gene evolution involving random mutation followed by (overdominant) selection is very unlikely to have resulted in the observed patterns because the germline IgV genes are only transcribed and translated following a functional V(D)J rearrangement in a B cell. Furthermore, the presence of large numbers of germline IgV genes also makes it unlikely that each individual gene can be subject to strong selection forces. But the most convincing argument against this type of evolutionary mechanism comes from the chicken pseudogenes which are clearly crippled genes but still display the same patterns found in functional genes. As was discussed above, it is unlikely that these pseudogenes are corrected by a gene conversion-mediated mechanism.

It was also shown that whereas gene conversion between germline IgV genes no doubt takes place, it is unlikely to have produced the observed DNA sequence patterns. Intra-chromosomal gene conversion results in the maintenance of sequence homogeneity, thus it is difficult to explain the concentration of sequence diversity in germline CDRs with this mechanism. There is no evidence that CDR targeted gene conversion events take place. The facts that gene conversion-promoting zipper regions are present within a few hundred nucleotides upstream of the cap site and that 5' flanking region sequences of the murine germline VH genes presented in this thesis do not display significantly lower sequence homology than the putative transcription units suggest that cross-over events should normally take place at the zipper region. Further evidence against the proposition that the recombination hot-spots flanking the putative coding/transcription units are due to intra-chromosomal gene conversion events is the fact that this would result in the loss of sequence heterogeneity, and this is clearly not the case. Analysis of the mutation rate at the three codon positions of germline IgV genes reveals that the expected preponderance of mutations at the third position is not present in germline CDRs or FRs, nor is it present in the chicken pseudogenes. Indeed, this analysis showed that the pattern of the mutation rates at the three codon positions in germline IgV genes in fact bears remarkable similarity to that observed in somatically fashioned and altered IgV genes. The sequences of IgV pseudogenes generally are inconsistent with the prediction that pseudogenes should diverge much faster due to the accumulation of additional mutations. This is most clearly observed in the collection of chicken IgVλ and IgVH pseudogenes. Therefore, functional germline IgV genes and IgV pseudogenes do not conform to the predictions made by the neutral theory of molecular evolution.

A germline IgV diversification mechanism that is operative in germline tissues was discussed. A mechanism possessing the required specificities could indeed produce the patterns observed in germline IgV genes, however at the present time there is no evidence of any cellular machinery with the sophisticated molecular intelligence needed to produce the observed patterns. A soma-to-germline genetic feedback loop whereby RNA expressed by B cells is packaged by endogenous retroviruses which return a cDNA copy of this RNA to a germ cell or to the fetus would be consistent with the non-random sequence patterns found among germline IgV genes. Furthermore, reports in the literature confirm that endogenous retroviruses are indeed present in the tissues where this model predicts they should be found. However, the model as originally postulated is inconsistent with the fact that virtually all germline IgV genes contain the sequence patterns that resemble those found in somatically mutated and antigenically selected IgV region genes. A modification of this model was presented that can account for this. The memory lymphocyte-specific genetic feedback loop would return IgV genes that have been mutated and selected by antigen during an immune response to the germline, thus producing the observed sequence patterns that are so remarkably similar to those found in memory B cells. This model makes a number of testable predictions. First, most or all of the lymphocytes that are present in the epididymal lumen of young males should be memory lymphocytes (i.e. lymphocytes displaying evidence of having undergone somatic hypermutation and isotype switch). Second, immunized mice should contain higher numbers of memory lymphocytes in the epididymis than unimmunized mice. Third, the homologous chromosome pairs containing the IgV loci should exhibit signs of allelic scrambling. Finally, with the use of sophisticated in-situ hybridization and/or PCR techniques, it should be possible to detect Ig RNA in endogenous retroviral particles present in the epididymis.

Appendix A: Genbank accession numbers

Sequence  See        Accession Sequence  See        Accession Sequence  See        Accession
name      figure(s)  number    name      figure(s)  number    name      figure(s)  number

3A112     6.2, 6.3   L09566    C57C16    8.1        L26928    C57C18    11.1       L33933
3C52                 L09567    C57C19    "          L26929    C57C27               L33934
3D61                 L09568    C57C2     "          L26930    C57G14               L33935
H1-27                L09588    C57C20    "          L26931    C57G15               L33936
H1-29                L09589    C57C22    "          L26932    C57G26               L33937
H1-30                L09590    C57C23    "          L26933    C57G3                L33938
H1-39                L09591    C57C25    "          L26934    C57G6                L33939
H1-45                L09593    C57C26    "          L26935    C57C16               L33940
H1-51                L09594    C57E1     "          L26936    C57C17               L33941
H1-7                 L09595    C57C27    "          L26952    C57C2                L33942
H1-71                L09596    BALB1     8.4        L26867    C57C44               L33943
H1-8                 L09597    BALB10    "          L26868    C57C48               L33944
C57E22    8.1        L26851    BALB12    "          L26869    C57G1                L33945
C57E3                L26852    BALB13    "          L26870    C57G5                L33946
C57E31               L26853    BALB13E   "          L26871    C57G9                L33947
C57E33               L26854    BALB14E   "          L26872    C57G30               L33948
C57E35               L26855    BALB16    "          L26873    C57C9                L33949
C57E36               L26856    BALB16E   "          L26874    C57G18               L33950
C57E38               L26857    BALB18    "          L26875    C57G22               L33951
C57E40               L26858    BALB2     "          L26876    BALB11    11.4       L33952
C57E44               L26859    BALB20    "          L26877    BALB13               L33953
C57E6                L26860    BALB21    "          L26878    BALB17               L33954
C57G10               L26861    BALB23    "          L26879    BALB19               L33955
C57G11               L26862    BALB25    "          L26880    BALB58               L33956
C57G44               L26863    BALB26    "          L26881    BALB6                L33957
C57G45               L26864    BALB3E    "          L26882    BALB67               L33958
C57G8                L26865    BALB4     "          L26883    BALB71               L33959
C57G9                L26866    BALB5E    "          L26884    BALB8                L33960
C57C11               L26925    BALB6     "          L26885    BALB9                L33961
C57C14               L26926    BALB7     "          L26886
C57C15               L26927    BALB9     "          L26887

The sequences presented in Figures 4.1, 4.2 and 5.5 have been submitted to Genbank, however at the time of submission of this thesis, the accession numbers were not yet available.

Appendix B: Publications resulting from the work presented in this thesis

Publications in scientific journals:

1 Steele, E. J., Rothenfluh, H. S., and Both, G. W. 1992. Defining the nucleic acid substrate for somatic hypermutation. Immunol. Cell Biol. 70: 129-144

2 Rothenfluh, H. S., and Steele, E. J. 1993. Origin and maintenance of germ-line V genes. Immunol. Cell Biol. 71: 227-232

3 Rothenfluh, H., and Steele, T. 1993. Lamarck, Darwin and the immune system. Today's Life Science. 5[7]: 8-15 and 5[8]: 16-22 (two separate instalments, July and August issues).

4 Rothenfluh, H. S., Taylor, L., Bothwell, A. L. M., Both, G. W., and Steele, E. J. 1993. Somatic hypermutation in 5' flanking regions of heavy chain antibody variable regions. Eur. J. Immunol. 23: 2152-2159.

5 Steele, E. J., Rothenfluh, H. S., Blanden, R. V., and Ada, G. L. 1993. Affinity maturation of lymphocyte receptors and positive selection of T cells in the thymus. Immunol. Rev. 135: 5-49

6 Rothenfluh, H. S., Gibbs, A. J., Blanden, R. V., and Steele, E. J. 1994. Analysis of patterns of DNA sequence variation in flanking and coding regions of murine germline immunoglobulin variable genes - evolutionary implications. Proc. Natl Acad. Sci. USA Accepted for publication

Manuscripts in preparation:

1 Rothenfluh, H. S. Hypothesis: A lymphocyte-specific genetic feedback-loop.

2 Rothenfluh, H. S., and Steele, E. J. Minimisation of PCR generated artifacts.

3 Rothenfluh, H. S., and Steele, E. J. Selection for non-germline encoded VH-D-JH junctional sequences.

4 Rothenfluh, H. S. Pattern of nucleotide substitution in germline immunoglobulin heavy-chain variable gene segments is incompatible with the neutral theory of molecular evolution.

5 Rothenfluh, H. S., Blanden, R. V., and Steele, E. J. Evolution of vertebrate germline immunoglobulin variable genes: highly non-random patterns of DNA sequence variation in functional genes and in pseudogenes.

Other Publications:

1 Steele, E. J., and Rothenfluh, H. S. 1994. Diversity in the Immune System. In: Principles of Medical Biology. Eds.: E. E. Bittar, and N. Bittar. JAI Press Inc., CT, USA. In press

Abstracts/Oral Presentations at Conferences:

1 Rothenfluh, H. S., Bothwell, A. L. M., Both, G. W., and Steele, E. J. 1992. Definition of the nucleic acid substrate for somatic hypermutation. The 8th International Congress for Immunology, Budapest, Hungary.

2 Rothenfluh, H. S., Both, G. W., and Steele, E. J. 1992. Origin and maintenance of germ-line V genes. The joint NSW & ACT branch meeting of the Australasian Society for Immunology held at Kioloa, NSW, Australia.

3 Rothenfluh, H. S., and Steele, E. J. 1993. Somatic and germline evolution of immunoglobulin (Ig) variable (V) genes. International Congress on the Regulation of Leukocyte Production and Immune Function. Darling Harbour, NSW.

4 Steele, E. J., Rothenfluh, H. S., Ada, G. L., and Blanden, R. V. 1993. of B cell Ig hypervariation - implications for TcR variable gene diversification. International Congress on the Regulation of Leukocyte Production and Immune Function. Darling Harbour, NSW.

REFERENCES

Abbas, A. K., Urioste, S., Collins, T. L., and Boom, H. 1990. Heterogeneity of helper/inducer T lymphocytes. IV. Stimulation of resting and activated B cells by Th1 and Th2 clones. J. Immunol. 144: 2031-2037
Aguilera, R. J., Akira, S., Okazaki, K., and Sakano, H. 1987. A pre-B cell nuclear protein that specifically interacts with the immunoglobulin V-J recombination sequences. Cell 51: 909-917
Akolkar, P. N., Sidker, S. K., Bhattacharya, S. B., Liao, J., Gruezo, F., Morrison, S. L., and Kabat, E. 1987. Different VL and VH germ-line genes are used to produce similar combining sites with specificity for α(1→6)dextrans. J. Immunol. 138: 4472-4479
Allen, D., Cumano, A., Dildrop, R., Kocks, C., Rajewsky, K., Rajewsky, N., Roes, J., Sablitzky, F., and Siekevitz, M. 1987. Timing, genetic requirements and functional consequences of somatic hypermutation during B-cell development. Immunol. Rev. 96: 5-22
Allen, D., Simon, T., Sablitzky, F., Rajewsky, K., and Cumano, A. 1988. Antibody engineering for the analysis of affinity maturation of an anti-hapten response. EMBO J. 7: 1995-2001
Alt, F. W., Enea, V., Bothwell, A. L. M., and Baltimore, D. 1980. Activity of multiple light chain genes in murine myeloma cells producing a single, functional light chain. Cell 21: 1-12
Alt, F. W., Rosenberg, N., Lewis, S., Thomas, E., and Baltimore, D. 1981. Organization and reorganization of immunoglobulin genes in A-MuLV-transformed cells: Rearrangement of heavy but not light chain genes. Cell 27: 381-390
Alt, F. W., and Baltimore, D. 1982. Joining of immunoglobulin heavy chain gene segments: Implications from a chromosome with evidence of three D-JH fusions. Proc. Natl Acad. Sci. USA 79: 4118-4122
Alt, F. W., Yancopoulos, G. D., Blackwell, T. K., Wood, C., Thomas, E., Boss, M., Coffman, R., Rosenberg, N., Tonegawa, S., and Baltimore, D. 1984. Ordered rearrangement of immunoglobulin heavy chain variable region segments. EMBO J. 3: 1209-1219
Alzari, P. M., Lascombe, M., and Poljak, R.
J. 1988. Three-dimensional structure of antibodies. Ann. Rev. Immunol. 6: 555-580
Alzari, P. M., Spinelli, S., Mariuzza, R. A., Boulot, G., Poljak, R. J., Jarvis, J. M., and Milstein, C. 1990. Three-dimensional structure determination of an anti-2-phenyloxazolone antibody: The role of somatic mutation and heavy/light chain pairing in the maturation of an immune response. EMBO J. 9: 3807-3814
Amin, A. R., Tamma, S. M., Oppenheim, J. D., Finkelman, F. D., Kieda, C., Coico, R. F., and Thorbecke, G. J. 1991. Specificity of the murine IgD receptor on T cells is for N-linked glycans on IgD molecules. Proc. Natl Acad. Sci. USA 88: 9238-9242
Apel, M., and Berek, C. 1990. Somatic mutations in antibodies expressed by germinal centre B cells early after primary immunization. Int. Immunol. 2: 813-819
Arden, B., Klotz, J. L., Siu, G., and Hood, L. E. 1985. Diversity and structure of genes of the α family of mouse T-cell antigen receptor. Nature 316: 783-787
Arnheim, N. 1983. Concerted evolution of multigene families. In: Evolution of genes and proteins. Eds. Nei, M., and Koehn, R. K. Sinauer Associates Inc. Sunderland, Massachusetts, pp. 38-61
Aronoff, R., and Linial, M. 1991. Specificity of retroviral RNA packaging. J. Virol. 65: 71-80
Asano, Y., and Hodes, R. J. 1984. T cell regulation in B cell activation. T cells independently regulate the responses mediated by distinct B cell subpopulations. J. Exp. Med. 155: 1267-1276
Atchison, M. L., Delmas, V., and Perry, R. P. 1990. A novel upstream element compensates for an ineffectual octamer motif in an immunoglobulin Vκ promoter. EMBO J. 9: 3109-3117
Atkinson, M. J., Paige, C. J., and Wu, G. E. 1993. Map position and usage of 3' VH family members: usage is not position dependent. Int. Immunol. 5: 1577-1587
Augustin, A. A., and Sim, G. K. 1982. T-Cell receptors generated via mutations are specific for various major histocompatibility antigens. Cell 39: 5-12
Ballard, D. W., and Bothwell, A. 1986. Mutational analysis of the immunoglobulin heavy chain promoter region. Proc. Natl Acad. Sci. USA 83: 9626-9630
Baltimore, D. 1981. Gene conversion: Some implications for immunoglobulin genes. Cell 24: 592-594
Barth, R. K., Kim, B. S., Lan, N. C., Hunkapiller, T., Sobieck, N., Winoto, A., Gershenfeld, H., Okada, C., Hansburg, D., Weissman, I. L., and Hood, L. 1985. The murine T-cell receptor uses a limited repertoire of expressed Vβ gene segments. Nature 316: 517-523
Battey, J., Max, E. E., McBride, W. O., Swan, D., and Leder, P. 1982. A processed human immunoglobulin ε gene has moved to chromosome 9. Proc. Natl. Acad. Sci. USA 79: 5956-5960
Baumrucker, T., Sturm, R., and Herr, W. 1988. OBP100 binds remarkably degenerate octamer motifs through sequence specific interactions with flanking sequences. Genes Dev. 2: 1400-1413
Becker, D. M., Patten, P., Chien, Y., Yokota, T., Eshhar, Z., Giedlin, M., Gascoigne, N. R. J., Goodnow, C., Wolf, R., Arai, K., and Davis, M. M. 1985. Variability and repertoire size of T-cell receptor Vα gene segments. Nature 317: 430-434
Becker, R. S., and Knight, K. L. 1990. Somatic diversification of immunoglobulin heavy chain VDJ genes: Evidence for somatic gene conversion in rabbits. Cell 63: 987-997
Bentley, D. L., and Rabbitts, T. H. 1983. Evolution of immunoglobulin VH genes: Evidence indicating that recently duplicated human Vκ sequences have diverged by gene conversion. Cell 32: 181-189
Berek, C., Griffiths, G. M., and Milstein, C. 1985.
Molecular events during maturation of the immune response to oxazolone. Nature 316: 412-418
Berek, C., Jarvis, J. M., and Milstein, C. 1987. Activation of memory and virgin B cell clones in hyperimmune animals. Eur. J. Immunol. 17: 1121-1129
Berek, C., and Milstein, C. 1987. Mutation drift and repertoire shift in the maturation of the immune response. Immunol. Rev. 96: 23-41
Berek, C., and Milstein, C. 1988. The dynamic nature of the antibody repertoire. Immunol. Rev. 105: 5-26
Berek, C., Berger, A., and Apel, M. 1991. Maturation of the immune response in germinal centers. Cell 67: 1121-1129
Berg, J., McDowell, M., Jack, H., and Wabl, M. 1990. Immunoglobulin λ gene rearrangement can precede κ gene rearrangement. Dev. Immunol. 1: 53-57
Berman, J. E., and Alt, F. W. 1990. Human heavy chain variable region gene diversity, organization, and expression. Intern. Rev. Immunol. 5: 203-214
Bernard, O., Hozumi, N., and Tonegawa, S. 1978. Sequences of mouse immunoglobulin light chain genes before and after somatic changes. Cell 15: 1133-1144
Betz, A. G., Rada, C., Pannell, R., Milstein, C., and Neuberger, M. S. 1993. Passenger transgenes reveal intrinsic specificity of the antibody hypermutation mechanism: Clustering, polarity, and specific hot spots. Proc. Natl. Acad. Sci. USA 90: 2385-2388
Bezzubova, O., Shinohara, A., Mueller, R. G., Ogawa, H., and Buerstedde, J. M. 1993a. A chicken RAD51 homologue is expressed at high levels in lymphoid and reproductive organs. Nucleic Acids Res. 21: 1577-1580
Bezzubova, O. Y., Schmidt, H., Ostermann, K., Heyer, W. D., and Buerstedde, J. M. 1993b. Identification of a chicken RAD52 homologue suggests conservation of the RAD52 recombination pathway throughout the evolution of higher eukaryotes. Nucleic Acids Res. 21: 5945-5949
Bezzubova, O. Y., and Buerstedde, J. M. 1994. Gene conversion in the chicken immunoglobulin locus: A paradigm of homologous recombination in higher eukaryotes. Experientia 50: 270-276
Blackwell, T. K., Malynn, B.
A., Pollock, R. R., Ferrier, P., Covey, L. R., Fulop, G. M., Phillips, R. A., Yancopoulos, G. D., and Alt, F. W. 1989. Isolation of scid pre-B cells that rearrange kappa light chain genes: formation of normal signal and abnormal coding joins. EMBO J. 8: 735-742
Blankenstein, T., and Krawinkel, U. 1987. Immunoglobulin VH region genes of the mouse are organized in overlapping clusters. Eur. J. Immunol. 17: 1351-1357
Blankenstein, T., Bonhomme, F., and Krawinkel, U. 1987. Evolution of pseudogenes in the immunoglobulin VH-gene family of the mouse. Immunogenet. 26: 237-248
Blier, P. R., and Bothwell, A. 1987. A limited number of B cell lineages generates the heterogeneity of a secondary immune response. J. Immunol. 139: 3996-4006
Blier, P. R., and Bothwell, A. L. M. 1988. The immune response to the hapten NP in C57BL/6 mice: Insight into the structure of the B-cell repertoire. Immunol. Rev. 105: 27-43
Blomberg, B., and Tonegawa, S. 1982. DNA sequences of the joining regions of mouse λ light chain immunoglobulin genes. Proc. Natl. Acad. Sci. USA 79: 530-533
Boersch-Supan, M. E., Agarwal, S., White-Scharf, M. E., and Imanishi-Kari, T. 1985. Heavy chain variable region. Multiple gene segments encode anti-4-(hydroxy-3-nitrophenyl)acetyl idiotypic antibodies. J. Exp. Med. 161: 1272-1292
Bosma, G. C., Custer, R. P., and Bosma, M. J. 1983. A severe combined immunodeficiency mutation in the mouse. Nature 301: 527-530
Both, G. W., Taylor, L., Pollard, J. W., and Steele, E. J. 1990. Distribution of mutations around rearranged heavy-chain antibody variable-region genes. Mol. Cell. Biol. 10: 5187-5196
Bothwell, A. L. M., Paskind, M., Reth, M., Imanishi-Kari, T., Rajewsky, K., and Baltimore, D. 1981. Heavy chain variable region contribution to the NPb family of antibodies: Somatic mutation evident in a γ2a variable region. Cell 24: 625-637
Bothwell, A. L. M. 1984. The genes encoding anti-NP antibodies in inbred strains of mice. In: The Biology of Idiotypes. Eds. Greene, M.
I., and Nisonoff, A.
Boyle, W. 1968. An extension of the 51Cr-release assay for the estimation of mouse cytotoxins. Transplantation 6: 761-764
Brack, C., Hirama, M., Lenhard-Schuller, R., and Tonegawa, S. 1978. A complete immunoglobulin gene is created by somatic recombination. Cell 15: 1-14
Brenner, S., and Milstein, C. 1966. Origin of antibody variation. Nature 211: 242-243
Brodeur, P. H., and Riblet, R. 1984. The immunoglobulin heavy chain variable region (Igh-V) locus in the mouse. I. One hundred Igh-V genes comprise seven families of homologous genes. Eur. J. Immunol. 14: 922-930
Brodeur, P. H., Osman, G. E., Mackle, J. J., and Lalor, T. M. 1988. The organization of the mouse Igh-V locus. Dispersion, interspersion, and the evolution of VH gene family clusters. J. Exp. Med. 168: 2261-2278
Buerstedde, J. M., and Takeda, S. 1991. Increased ratio of targeted to random integration after transfection of chicken B cell lines. Cell 67: 179-188
Burnet, F. M. 1957. A modification of Jerne's theory of antibody production using the concept of clonal selection. Aust. J. Sci. 20: 67-69
Burton, G. F., Conrad, D. H., Szakal, A. K., and Tew, J. G. 1993. Follicular dendritic cells and B cell costimulation. J. Immunol. 150: 31-38
Butcher, E. C., Rouse, R. V., Coffman, R. L., Nottenburg, C. N., Hardy, R. R., and Weissman, I. L. 1982. Surface phenotype of Peyer's patch germinal center cells: Implications for the role of germinal centers in B cell differentiation. J. Immunol. 132: 1712-2707
Capone, M., Watrin, F., Fernex, C., Horvat, B., Krippl, B., Wu, L., Scollay, R., and Alt, F. W. 1993. TCRβ and TCRα gene enhancers confer tissue- and stage-specificity on V(D)J recombination events. EMBO J. 12: 4335-4346
Carlson, L. M., McCormack, W. T., Postema, C. E., Humphries, E. H., and Thompson, C. B. 1990. Templated insertions in the rearranged chicken IgL V gene segment arise by intrachromosomal gene conversion. Genes Dev. 4: 536-547
Carmack, C. E., Camper, S. A., Mackle, J. J., Gerhard, W.
U., and Weigert, M. G. 1991. Influence of a Vκ8 L chain transgene on endogenous rearrangements and the immune response to the HA(SB) determinant on influenza virus. J. Immunol. 147: 2024-2033
Chen, P. J., Cywinski, A., and Taylor, J. M. 1985. Reverse transcription of 7S L RNA by an avian retrovirus. J. Virol. 54: 278-284
Chen, P. P., and Yang, P. M. 1990. A segment of human VH gene locus is duplicated. Scand. J. Immunol. 31: 593-599
Chen, C., Roberts, V. A., and Rittenberg, M. B. 1992. Generation and analysis of random point mutations in an antibody CDR2 sequence: Many mutated antibodies lose their ability to bind antigen. J. Exp. Med. 176: 855-866
Chen, J., Trounstine, M., Kurahara, C., Young, F., Kuo, C., Xu, Y., Alt, F. W., and Huszar, D. 1993a. B cell development in mice that lack one or both immunoglobulin κ light chain genes. EMBO J. 12: 821-830
Chen, J., Trounstine, M., Alt, F. W., Young, F., Kurahara, C., Loring, J. F., and Huszar, D. 1993b. Immunoglobulin gene rearrangement in B cell deficient mice generated by targeted deletion of the JH locus. Int. Immunol. 5: 647-656
Chen, J., Young, F., Bottaro, A., Stewart, V., Smith, R. K., and Alt, F. W. 1993c. Mutations of the intronic IgH enhancer and its flanking sequences differentially affect accessibility of the JH locus. EMBO J. 12: 4635-4635
Cherif, D., and Berger, R. 1990. New localization of VH sequences by In Situ hybridization with biotinylated probes. Genes Chromosomes Cancer 2: 103-108
Chien, Y., Gascoigne, N. R. J., Kavaler, J., Lee, N. E., and Davis, M. M. 1984. Somatic recombination in a murine T-cell receptor gene. Nature 309: 322-326
Chien, N. C., Pollock, R. R., Desaymard, C., and Scharff, M. D. 1988. Point mutations cause the somatic diversification of IgM and IgG2a antiphosphorylcholine antibodies. J. Exp. Med. 167: 954-973
Chothia, C., and Lesk, A. M. 1987. Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196: 901-917
Chothia, C., Lesk, A.
M., Tramontano, A., Levitt, M., Smith-Gill, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., Tulip, W. R., Colman, P. M., Spinelli, S., Alzari, P. M., and Poljak, R. J. 1989. Conformations of immunoglobulin hypervariable regions. Nature 342: 877-883
Clark, E. A., and Ledbetter, J. A. 1994. How B and T cells talk to each other. Nature 367: 425-428
Clarke, S., Rickert, R., Kopke Wloch, M., Staudt, L., Gerhard, W., and Weigert, M. 1990. The BALB/c secondary response to the Sb site of influenza virus hemagglutinin. Nonrandom silent mutation and unequal numbers of VH and Vκ mutations. J. Immunol. 145: 2286-2296
Clarke, S. H., and McCray, S. K. 1993. VH CDR3-dependent positive selection of murine VH12-expressing B cells in the neonate. Eur. J. Immunol. 23: 3327-3334
Cockerill, P. N., Yen, M., and Garrard, W. T. 1987. The enhancer of the immunoglobulin heavy chain locus is flanked by presumptive chromosomal loop anchorage elements. J. Biol. Chem. 262: 5394-5397
Cohen, J. B., Effron, K., Rechavi, G., Ben-Neriah, Y., Zakut, R., and Givol, D. 1982. Simple DNA sequences in homologous flanking regions near immunoglobulin VH genes: a role in gene interaction. Nucleic Acids Res. 10: 3353-3370
Cohen, J. B., and Givol, D. 1983. Conservation and divergence of immunoglobulin VH pseudogenes. EMBO J. 2: 1795-1800
Cohn, M., and Langman, R. E. 1990. The Protecton: the unit of humoral immunity selected by evolution. Immunol. Rev. 115: 11-142
Coico, R. F., Siskind, G. W., and Thorbecke, G. J. 1988. Role of IgD and T8 cells in the regulation of the humoral immune response. Immunol. Rev. 105: 45-67
Coleclough, C., Perry, R. P., Karjalainen, K., and Weigert, M. 1981. Aberrant rearrangements contribute significantly to the allelic exclusion of immunoglobulin gene expression. Nature 290: 372-378
Cohn, M. 1974. A rationale for ordering the data on antibody diversification. In: Progress in immunology II, Vol. 2. eds. Brent, L., and Holborow, J. North-Holland Publishing Co.
Cohn, M., and Langman, R. E. 1990. The protecton: The unit of humoral immunity selected by evolution. Immunol. Rev. 115: 7-142
Cory, S., Tyler, B. M., and Adams, J. M. 1981. Sets of immunoglobulin Vκ genes homologous to ten cloned Vκ sequences: Implications for the number of germline Vκ genes. J. Mol. Appl. Genet. 1: 103-116
Cohn, M. 1974. A rationale for ordering the data on antibody diversification. In: Progress in Immunology II, Vol 2, Biological Aspects. Brent, L., and Holborow, J., eds. North-Holland/American Elsevier, Amsterdam and New York.
Cox, J. P. L., Tomlinson, I. M., and Winter, G. 1994. A directory of human germ-line Vκ segments reveals a strong bias in their usage. Eur. J. Immunol. 24: 827-836
Crews, S., Griffin, J., Huang, H., Calame, K., and Hood, L. 1981. A single VH gene segment encodes the immune response to phosphorylcholine: Somatic mutation is correlated with the class of antibody. Cell 25: 59-66
Cumano, A., and Rajewsky, K. 1985. Structure of primary anti-(4-hydroxy-3-nitrophenyl)acetyl (NP) antibodies in normal and idiotypically suppressed C57BL/6 mice. Eur. J. Immunol. 15: 512-520
Cumano, A., and Rajewsky, K. 1986. Clonal recruitment and somatic hypermutation in the generation of immunological memory to the hapten NP. EMBO J. 5: 2459-2468
Currier, S. J., Gallarda, J. L., and Knight, K. L. 1988. Partial molecular map of the rabbit VH chromosomal region. J. Immunol. 140: 1651-1659
David, V., Folk, N. L., and Maizels, N. 1992. Germ line variable regions that match hypermutated sequences in genes encoding murine anti-hapten antibodies. Genetics 132: 799-811
Davidson, A., Manheimer-Lory, A., Aranow, C., Peterson, R., Hannigan, N., and Diamond, B. 1990. Molecular characterization of a somatically mutated anti-DNA antibody bearing two systemic lupus erythematosus-related idiotypes. J. Clin. Invest. 85: 1401-1409
Desiderio, S., and Baltimore, D. 1984.
Double-stranded cleavage by cell extracts near recombinational signal sequences of immunoglobulin genes. Nature 308: 860-862
Desiderio, S. V., Yancopoulos, G. D., Paskind, M., Thomas, E., Boss, M. A., Landau, N., Alt, F. W., and Baltimore, D. 1984. Insertion of N regions into heavy-chain genes is correlated with expression of terminal deoxytransferase in B cells. Nature 311: 752-755
Dildrop, R., Bovens, J., Siekevitz, M., Beyreuther, K., and Rajewsky, K. 1984. A V region determinant (idiotype) expressed at high frequency in B lymphocytes is encoded by a large set of antibody structural genes. EMBO J. 3: 517-523
Dornburg, R., and Temin, H. M. 1988. Retroviral vector system for the study of cDNA gene formation. Mol. Cell. Biol. 8: 2328-2334
Dougherty, J. P., and Temin, H. M. 1987. A promoterless retroviral vector indicates that there are sequences in U3 required for 3' RNA processing. Proc. Natl. Acad. Sci. USA 84: 1197-1201
Durdik, J., Gerstein, R. M., Rath, S., Robbins, P. F., Nisonoff, A., and Selsing, E. 1989. Isotype switching by a microinjected μ immunoglobulin heavy chain gene in transgenic mice. Proc. Natl. Acad. Sci. USA 86: 2346-2350
Early, P., Huang, H., Davis, M., Calame, K., and Hood, L. 1980. An immunoglobulin heavy chain variable region gene is generated from three segments of DNA: VH, D and JH. Cell 19: 981-992
Early, P., and Hood, L. 1981. Allelic exclusion and nonproductive immunoglobulin gene rearrangements. Cell 24: 1-3
Early, P., Nottenburg, C., Weissman, I., and Hood, L. 1982. Immunoglobulin gene rearrangements in normal mouse B cells. Mol. Cell. Biol. 2: 829-836
Edelman, G. M., and Gally, J. A. 1970. Arrangement and evolution of eukaryotic genes. In: The Neurosciences. 2nd study program. Ed. Schmitt, F. O. Rockefeller University Press, NY. pp. 962-972
Egel, R. 1981. Intergenic conversion and reiterated genes. Nature 290: 191-192
Ehlich, A., Schaal, S., Gu, H., Kitamura, D., Müller, W., and Rajewsky, K. 1993.
Immunoglobulin heavy and light chain genes rearrange independently at early stages of B cell development. Cell 72: 695-704
Engler, P., Weng, A., and Storb, U. 1993. Influence of CpG methylation and target spacing on V(D)J recombination in a transgenic substrate. Mol. Cell. Biol. 13: 571-577
Even, J., Griffiths, G. M., Berek, C., and Milstein, C. 1985. Light chain germ-line genes and the immune response to 2-phenyloxazolone. EMBO J. 4: 3439-3445
Falkner, F. G., and Zachau, H. G. 1984. Correct transcription of an immunoglobulin κ gene requires an upstream fragment containing conserved sequence elements. Nature 310: 71-74
Feddersen, R. M., and Van Ness, B. G. 1985. Double recombination of a single immunoglobulin κ-chain allele: Implications for the mechanism of rearrangement. Proc. Natl. Acad. Sci. USA 82: 4793-4797
Feeney, A. J. 1990. Lack of N regions in fetal and neonatal mouse immunoglobulin V-D-J junctional sequences. J. Exp. Med. 172: 1377-1390
Feeney, A. J. 1992. Predominance of VH-D-JH junctions occurring at sites of short sequence homology results in limited junctional diversity in neonatal antibodies. J. Immunol. 149: 222-229
Felsenstein, J. PHYLIP version 3.5. [email protected]. (1993)
Ferguson, S. E., Cancro, M. P., and Osborne, B. A. 1989. Analysis of a novel VHS107 haplotype in CLA-2 and WSA mice. Evidence for gene conversion among IgVH genes in outbred populations. J. Exp. Med. 170: 1811-1823
Ferguson, S. E., and Thompson, C. B. 1993. A new break in V(D)J recombination. Current Biology 3: 51-53
Fernandes, G., Yunis, E. J., and Good, R. A. 1973. Reproductive deficiency of NZB male mice. Possibility of a viral basis. Lab. Invest. 29: 278-281
Fernex, C., Caillol, D., Capone, M., Krippl, B., and Ferrier, P. 1994. Sequences affecting the V(D)J recombinational activity of the IgH intronic enhancer in a transgenic substrate. Nucleic Acids Res. 22: 792-798
Fink, P. J., Matis, L. A., McElligott, D. L., Bookman, M., and Hedrick, S. M. 1986.
Correlations between T-cell specificity and the structure of the antigen receptor. Nature 321: 219-226
Fondell, J. D., Marolleau, J., Primi, D., and Marcu, K. B. 1990. On the mechanism of non-allelically excluded Vα-Jα T cell receptor secondary rearrangements in a murine T cell lymphoma. J. Immunol. 144: 1094-1103
Foote, J., and Milstein, C. 1991. Kinetic maturation of an immune response. Nature 352: 530-532
Foote, J., and Winter, G. 1992. Antibody framework residues affecting the conformation of the hypervariable loops. J. Mol. Biol. 224: 487-499
Forster, I., and Rajewsky, K. 1990. The bulk of the peripheral B-cell pool in mice is stable and not rapidly renewed from the bone marrow. Proc. Natl. Acad. Sci. USA 87: 4781-4784
Fujimoto, S., and Yamagishi, H. 1987. Isolation of an excision product of T-cell-receptor α-chain gene rearrangement. Nature 327: 242-243
Gallarda, J. L., Gleason, K. S., and Knight, K. L. 1985. Organization of rabbit immunoglobulin genes. I. Structure and multiplicity of germ-line VH genes. J. Immunol. 135: 4222-4228
Gallis, B., Linial, M., and Eisenman, R. 1979. An avian oncovirus mutant deficient in genomic RNA: Characterization of the packaged RNA as cellular messenger RNA. Virol. 94: 146-161
Gascoigne, N. R. J., Chien, Y., Becker, D. M., Kavaler, J., and Davis, M. M. 1984. Genomic organization and sequence of T-cell receptor β-chain constant- and joining-region genes. Nature 310: 387-391
Gearhart, P. J., Johnson, N. D., Douglas, R., and Hood, L. 1981. IgG antibodies to phosphorylcholine exhibit more diversity than their IgM counterparts. Nature 291: 29-34
Gearhart, P. J. 1981. Somatic mutation in anti-phosphorylcholine antibodies. In: Idiotypes. Antigens on the inside. Workshop at the Basel Institute for Immunology. Editiones Roche, Basel, Switzerland.
Gearhart, P. J. 1982. Generation of immunoglobulin variable gene diversity. Immunol. Today 4: 107-112
Gearhart, P. J., and Bogenhagen, D. F. 1983.
Clusters of point mutations are found exclusively around rearranged antibody variable genes. Proc. Natl. Acad. Sci. USA 80: 3439-3443
Gellert, M. 1992. Molecular analysis of V(D)J recombination. Annu. Rev. Genet. 22: 425-456
George, J., Penner, S. J., Weber, J., Berry, J., and Claflin, J. L. 1993. Influence of membrane Ig receptor density and affinity on B cell signalling by antigen. J. Immunol. 151: 5955-5965
Gerstein, R. M., and Lieber, M. R. 1993. Extent to which homology can constrain coding exon junctional diversity in V(D)J recombination. Nature 363: 625-627
Gilfillan, S., Dierich, A., Lemeur, M., Benoist, C., and Mathis, D. 1993. Mice lacking TdT: Mature animals with an immature lymphocyte repertoire. Science 261: 1175-1178
Giusti, A. M., Coffee, R., and Manser, T. 1992. Somatic recombination of heavy chain variable region transgenes with the endogenous immunoglobulin heavy chain locus in mice. Proc. Natl. Acad. Sci. USA 89: 10321-10325
Giusti, A. M., and Manser, T. 1993. Hypermutation is observed only in antibody H chain V region transgenes that have recombined with endogenous immunoglobulin H DNA: Implications for the location of cis-acting elements required for somatic mutation. J. Exp. Med. 177: 797-803
Giusti, A. M., and Manser, T. 1994. Somatic generation of hybrid antibody H chain genes in transgenic mice via interchromosomal gene conversion. J. Exp. Med. 179: 235-248
Givol, D., Zakut, R., Effron, K., Rechavi, G., Ram, D., and Cohen, J. C. 1981. Diversity of germ-line immunoglobulin VH genes. Nature 292: 426-430
Gojobori, T., Li, W. H., and Graur, D. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18: 360-369
Gojobori, T., and Nei, M. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1: 195-212
Golding, G. B., Gearhart, P. J., and Glickman, B. W. 1987. Patterns of somatic mutations in immunoglobulin variable genes. Genetics 115: 169-176
Gonzalez-Fernandez, A., and Milstein, C.
1993. Analysis of somatic hypermutation in mouse Peyer's patches using immunoglobulin κ light-chain transgenes. Proc. Natl. Acad. Sci. USA 90: 9862-9866
Gorski, J., Rollini, P., and Mach, B. 1983. Somatic mutations of immunoglobulin variable genes are restricted to the rearranged V gene. Science 220: 1179-1181
Gough, N. M. 1988. Rapid and quantitative preparation of cytoplasmic RNA from small numbers of cells. Anal. Biochem. 173: 93-95
Goverman, J., Minard, K., Shastri, N., Hunkapiller, T., Hansburg, D., Sercarz, E., and Hood, L. 1985. Rearranged β T cell receptor genes in a helper T cell clone specific for lysozyme: No correlation between Vβ and MHC restriction. Cell 40: 859-867
Graur, D., Shuali, Y., and Li, W. H. 1989. Deletions in processed pseudogenes accumulate faster in rodents than in humans. J. Mol. Evol. 28: 279-285
Grawunder, U., Haasner, D., Melchers, F., and Rolink, A. 1993. Rearrangement and expression of κ light chain genes can occur without μ heavy chain expression during differentiation of pre-B cells. Int. Immunol. 5: 1609-1618
Gray, D., Kosco, M., and Stockinger, B. 1991. Novel pathways of antigen presentation for the maintenance of memory. Int. Immunol. 3: 141-148
Grey, H. M., and Chesnut, R. 1985. Antigen processing and presentation to T cells. Immunol. Today 6: 101-106
Griffiths, G. M., Berek, C., Kaartinen, M., and Milstein, C. 1984. Somatic mutation and the maturation of the immune response to 2-phenyl oxazolone. Nature 312: 271-275
Gu, H., Forster, I., and Rajewsky, K. 1990. Sequence homologies, N sequence insertion and JH gene utilization in VHDJH joining: implications for the joining mechanism and the ontogenic timing of Ly1 B cell and B-CLL progenitor generation. EMBO J. 9: 2133-2140
Gu, H., Kitamura, D., and Rajewsky, K. 1991a. B cell development regulated by gene rearrangement: Arrest of maturation by membrane-bound Dμ protein and selection of DH element reading frame.
Cell 65: 47-54
Gu, H., Tarlinton, D., Müller, W., Rajewsky, K., and Forster, I. 1991b. Most peripheral B cells in mice are ligand selected. J. Exp. Med. 173: 1357-1371
Hackett, J., Rogerson, B. J., O'Brien, R. L., and Storb, U. 1990. Analysis of somatic mutations in κ transgenes. J. Exp. Med. 172: 131-137
Haino, M., Hayashida, H., Miyata, T., Shin, E. K., Matsuda, F., Nagaoka, H., Matsumura, R., Taka-ishi, S., Fukita, Y., Fujikura, J., and Honjo, T. 1994. Comparison and evolution of human immunoglobulin VH segments located in the 3' 0.8-megabase region. J. Biol. Chem. 269: 2619-2626
Halligan, B. D., and Desiderio, S. V. 1987. Identification of a DNA binding protein that recognizes the nonamer recombination signal sequence of immunoglobulin genes. Proc. Natl. Acad. Sci. USA 84: 7019-7023
Harada, K., and Yamagishi, H. 1991. Lack of feedback inhibition of Vκ gene rearrangement by productively rearranged alleles. J. Exp. Med. 173: 409-415
Hardie, D. L., Johnson, G. D., Khan, M., and MacLennan, I. C. M. 1993. Quantitative analysis of molecules which distinguish functional compartments within germinal centers. Eur. J. Immunol. 23: 997-1004
Havran, W. L., DiGiusto, D. L., and Cambier, J. C. 1984. mIgM:mIgD ratios on B cells: Mean mIgD expression exceeds mIgM by 10-fold on most splenic B cells. J. Immunol. 132: 1712-1716
Hein, J. 1990. Unified approach to alignment and phylogenies. Methods in Enzymology 183: 626-645
Hesse, J. E., Lieber, M. R., Gellert, M., and Mizuuchi, K. 1987. Extrachromosomal DNA substrates in pre-B cells undergo inversion or deletion at immunoglobulin V-(D)-J joining signals. Cell 49: 775-783
Hieter, P. A., Korsmeyer, S. J., Waldmann, T. A., and Leder, P. 1981. Human immunoglobulin κ light-chain genes are deleted or rearranged in λ-producing B cells. Nature 290: 368-372
Hockenbery, D., Nunez, G., Milliman, C., Schreiber, R. D., and Korsmeyer, S. J. 1990. Bcl-2 is an inner mitochondrial membrane protein that blocks programmed cell death.
Nature 348: 334-336
Hockenbery, D. M., Zutter, M., Hickley, W., Nahm, M., and Korsmeyer, S. J. 1991. Bcl-2 protein is topographically restricted in tissues characterized by apoptotic cell death. Proc. Natl. Acad. Sci. USA 88: 6961-6965
Hodgkin, P. D., Yamashita, L. C., Coffman, R. L., and Kehry, M. R. 1990. Separation of events mediating B cell proliferation and Ig production by using T cell membranes and lymphokines. J. Immunol. 145: 2025-2034
Hoffer, A. P., Hamilton, D. W., and Fawcett, D. W. 1973. The ultrastructure of the principal cells and intraepithelial leucocytes in the initial segment of the rat epididymis. Anat. Rec. 175: 169-202
Hollowood, K., and Macartney, J. 1992. Cell kinetics of the germinal center reaction - a stathmokinetic study. Eur. J. Immunol. 22: 261-266
Honjo, T. 1983. Immunoglobulin genes. Ann. Rev. Immunol. 1: 499-528
Hood, L., Campbell, J. H., and Elgin, S. C. R. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Ann. Rev. Genet. 9: 305-353
Hope, T. J., Aguilera, R. J., Minie, M. E., and Sakano, H. 1986. Endonucleolytic activity that cleaves immunoglobulin recombination sequences. Science 231: 1141-1145
Hsieh, C., McCloskey, R. P., Radany, E., and Lieber, M. R. 1991. V(D)J recombination: Evidence that a replicative mechanism is not required. Mol. Cell. Biol. 11: 3972-3977
Hughes, A. L., and Nei, M. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167-170
Hughes, A. L., and Nei, M. 1989. Nucleotide substitution at major histocompatibility complex class II loci: Evidence for overdominant selection. Proc. Natl Acad. Sci. USA 86: 958-962
Ikawa, Y., Ross, J., and Leder, P. 1974. An association between globin messenger RNA and 60S RNA derived from Friend leukemia virus. Proc. Natl. Acad. Sci. USA 71: 1154-1158
Jacob, J., Kelsoe, G., Rajewsky, K., and Weiss, U. 1991a.
Intraclonal generation of antibody mutants in germinal centres. Nature 354: 389-392
Jacob, J., Kassir, R., and Kelsoe, G. 1991b. In situ studies of the primary immune response to (4-hydroxy-3-nitrophenyl)acetyl. I. The architecture and dynamics of responding cell populations. J. Exp. Med. 173: 1165-1175
Jacob, J., and Kelsoe, G. 1992. In situ studies of the primary immune response to (4-hydroxy-3-nitrophenyl)acetyl. II. A common clonal origin for periarteriolar lymphoid sheath-associated foci and germinal centers. J. Exp. Med. 176: 679-687
Jacob, J., Miller, C., and Kelsoe, G. 1992. In situ studies of the antigen-driven somatic hypermutation of immunoglobulin genes. Immunol. Cell Biol. 70: 145-152
Jacob, J., Przylepa, J., Miller, C., and Kelsoe, G. 1993. In situ studies of the primary immune response to (4-hydroxy-3-nitrophenyl)acetyl. III. The kinetics of V region mutation and selection in germinal center B cells. J. Exp. Med. 178: 1293-1307
Jelachich, M. L., Grusby, M. J., Clark, D., Tasch, D., Margoliash, E., and Pierce, S. K. 1984. Synergistic effect of antigen and soluble T-cell factors in B-lymphocyte activation. Proc. Natl. Acad. Sci. USA 81: 5537-5541
Johansen, T., Holm, T., and Bjørklid, E. 1989. Members of the RTVL-H family of human endogenous retrovirus-like elements are expressed in placenta. Gene 79: 259-267
Johnson, J. G., and Jemmerson, R. 1991. Relative frequencies of secondary B cells activated by cognate vs. other mechanisms. Eur. J. Immunol. 21: 951-958
Julius, M. H., and Rammensee, H. G. 1988. T helper cell-dependent induction of resting B cell differentiation need not require cognate cell interactions. Eur. J. Immunol. 18: 375-379
Jukes, T. H., and Cantor, C. R. 1969. Evolution of protein molecules. In: Mammalian Protein Metabolism Vol. III. Munro, H. N. ed. Academic Press, New York.
Jung, V., Pestka, S. B., and Pestka, S. 1990. Efficient cloning of PCR generated DNA containing terminal restriction endonuclease recognition sites. Nucleic Acids Res. 18: 6156
Jukes, T. H. 1978. Neutral changes during divergent evolution of hemoglobins. J. Mol. Evol. 11: 267-269
Kaartinen, M., Griffiths, G. M., Markham, A. F., and Milstein, C. 1983. mRNA sequences define an unusually restricted IgG response to 2-phenyloxazolone and its early diversification. Nature 304: 320-324
Kaartinen, M., and Mäkelä, O. 1985. Reading of D genes in variable frames as a source of antibody diversity. Immunol. Today 6: 324-327
Kaartinen, M., Pelkonen, J., and Mäkelä, O. 1986. Several V genes participate in the early phenyloxazolone response in various combinations. Eur. J. Immunol. 16: 98-105
Kaartinen, M., Pelkonen, E., Even, J., and Mäkelä, O. 1988. V genes of the primary antibody response of C57BL/10 mice to the hapten phenyloxazolone. Eur. J. Immunol. 18: 1095-1100
Kaartinen, M., Solin, M., and Mäkelä, O. 1989. 'Allelic' forms of immunoglobulin V genes in different strains of mice. EMBO J. 8: 1743-1748
Kaartinen, M., Kulp, S., and Mäkelä, O. 1991. Characteristics of selection-free mutations and effects of subsequent selection. In: Somatic hypermutation in V-regions. Steele, E. J. ed. CRC Press, Boca Raton, FL.
Kallberg, E., Gray, D., and Leanderson, T. 1993. Analysis of somatic mutation activity in multiple Vκ genes involved in the response to 2-phenyl-5-oxazolone. Int. Immunol. 5: 573-581
Kalled, S. L., and Brodeur, P. H. 1990. Preferential rearrangement of Vκ4 gene segments in pre-B cell lines. J. Exp. Med. 172: 559-566
Kallenbach, S., Brinkmann, T., and Rougeon, F. 1993. Rag-1: a topoisomerase? Int. Immunol. 5: 231-232
Karin, M., and Richards, R. I. 1982. Human metallothionein genes - primary structure of the metallothionein-II gene and related processed gene. Nature 299: 797-802
Kataoka, T., Kondo, S., Nishi, M., Kodaira, M., and Honjo, T. 1984. Isolation and characterization of endonuclease J: A sequence-specific endonuclease cleaving immunoglobulin genes. Nucleic Acids Res. 12: 5995-6010
Kavaler, J., Davis, M.
M., and Chien, Y. 1984. Localization of a T-cell receptor diversity-region element. Nature 310: 421-423
Kemp, D. J., Tyler, B., Bernard, O., Gough, N., Gerondakis, S., Adams, J. M., and Cory, S. 1981. Organization of genes and spacers within the mouse immunoglobulin VH locus. J. Mol. Appl. Genet. 1: 245-261
Keohavong, P., and Thilly, W. G. 1989. Fidelity of DNA polymerases in DNA amplification. Proc. Natl. Acad. Sci. USA 86: 9253-9257
Kiessling, A. A. 1984. Evidence that reverse transcriptase is a component of murine epididymal fluid. Proc. Soc. Exp. Biol. Med. 176: 175-182
Kiessling, A. A., and Goulian, M. 1979. Detection of reverse transcriptase activity in human cells. Cancer Res. 39: 2062-2069
Kiessling, A. A., Crowell, R. C., and Connell, R. S. 1987. Sperm-associated retroviruses in the mouse epididymis. Proc. Natl. Acad. Sci. USA 84: 8667-8671
Kiessling, A. A., Crowell, R., and Fox, C. 1989. Epididymis is a principal site of retrovirus expression in the mouse. Proc. Natl. Acad. Sci. USA 86: 5109-5113
Kim, S., Davis, M., Sinn, E., Patten, P., and Hood, L. 1981. Antibody diversity: Somatic hypermutation of rearranged VH genes. Cell 27: 573-581
Kimura, M. 1968. Evolutionary rate at the molecular level. Nature 217: 624-626
Kimura, M. 1977. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267: 275-276
Kimura, M. 1983. The neutral theory of evolution. In: Evolution of genes and proteins. Eds. Nei, M., and Koehn, R. K. Sinauer Associates Inc. Sunderland, Massachusetts, pp. 208-233
King, L. B., Lund, F. E., White, D. A., Sharma, S., and Corley, R. B. 1990. Molecular events in B lymphocyte differentiation. Inducible expression of the endogenous mouse mammary tumour proviral gene, Mtv-9. J. Immunol. 144: 3218-3227
Kishino, H., and Hasegawa, M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. J. Mol.
Evol. 29: 170-179
Kitamura, D., Roes, J., Kuhn, R., and Rajewsky, K. 1991. A B cell-deficient mouse by targeted disruption of the membrane exon of the immunoglobulin μ chain gene. Nature 350: 423-426
Kitamura, D., and Rajewsky, K. 1992. Targeted disruption of μ chain membrane exon causes loss of heavy-chain allelic exclusion. Nature 356: 154-156
Klein, J. 1975. Biology of the mouse histocompatibility-2 complex. Springer Verlag, Berlin.
Kleinfeld, R., Hardy, R. R., Tarlinton, D., Dangl, J., Herzenberg, L. A., and Weigert, M. 1986. Recombination between an expressed immunoglobulin heavy-chain gene and a germline variable gene segment in a Ly 1+ B-cell lymphoma. Nature 322: 843-846
Kleinfeld, R. W., and Weigert, M. G. 1989. Analysis of VH gene replacement events in a B cell lymphoma. J. Immunol. 142: 4475-4482
Kleinfeld, R. W., and Weigert, M. G. 1989. Interspersion of the VHQ52 and VH7183 gene families in the NFS/N mouse. J. Immunol. 142: 4483-4492
Knight, K. L. 1992. Restricted VH gene usage and generation of antibody diversity in rabbit. Annu. Rev. Immunol. 10: 593-616
Kodaira, M., Kinashi, T., Umemura, I., Matsuda, F., Noma, T., Ono, Y., and Honjo, T. 1986. Organization and evolution of variable region genes of the human immunoglobulin heavy chain. J. Mol. Biol. 190: 529-541
Kofler, R., Geley, S., and Helmberg, A. 1992. Mouse variable-region gene families: Complexity, polymorphism and use in non-autoimmune responses. Immunol. Rev. 128: 5-21
Kolchanov, N. A., Solovyov, V. V., and Rogozin, I. B. 1987. Peculiarities of immunoglobulin gene structures as a basis for somatic mutation emergence. FEBS Lett. 214: 87-91
Kokubu, F., Litman, R., Shamblott, M. J., Hinds, K., and Litman, G. W. 1988. Diverse organization of immunoglobulin VH gene loci in a primitive vertebrate. EMBO J. 7: 3413-3422
Komori, T., Okada, A., Stewart, V., and Alt, F. W. 1993. Lack of N regions in antigen receptor variable region genes of TdT-deficient lymphocytes.
Science 261: 1171-1175
Kondo, T., Arakawa, H., Kitao, H., Hirota, Y., and Yamagishi, H. 1993. Signal joint immunoglobulin Vλ1-Jλ and novel joints of chimeric V pseudogenes on extrachromosomal circular DNA from chicken bursa. Eur. J. Immunol. 23: 245-249
Korsmeyer, S. J., Hieter, P. A., Ravetch, J. V., Poplack, D. G., Waldmann, T. A., and Leder, P. 1981. Developmental hierarchy of immunoglobulin gene rearrangements in human leukemic pre-B cells. Proc. Natl. Acad. Sci. USA 78: 7096-7100
Kosco, M. H., Szakal, A. K., and Tew, J. G. 1988. In vivo obtained antigen presented by germinal center B cells to T cells in vivo. J. Immunol. 140: 354-360
Kosco, M. H., Pflugfelder, E., and Gray, D. 1992. Follicular dendritic cell-dependent adhesion and proliferation of B cells in vitro. J. Immunol. 148: 2331-2339
Kosco, M. H., and Gray, D. 1992. Signals involved in germinal center reactions. Immunol. Rev. 126: 63-76
Kranz, D. M., and Voss, E. W. 1981. Restricted reassociation of heavy and light chains from hapten-specific monoclonal antibodies. Proc. Natl. Acad. Sci. USA 78: 5807-5811
Krawinkel, U., Zoebelein, G., Brüggemann, M., Radbruch, A., and Rajewsky, K. 1983. Recombination between antibody heavy chain variable-region genes: Evidence for gene conversion. Proc. Natl. Acad. Sci. USA 80: 4997-5001
Krawinkel, U., Zoebelein, G., and Bothwell, A. L. M. 1986. Palindromic sequences are associated with sites of DNA breakage during gene conversion. Nucleic Acids Res. 9: 3871-3882
Kroese, G. M., Wubbena, A. S., Seijen, H. G., and Nieuwenhuis, P. 1988. The de novo generation of germinal centers is an oligoclonal process. Adv. Exp. Med. Biol. 237: 245-250
Kubagawa, H., Cooper, M. D., Carroll, A. J., and Burrows, P. D. 1989. Light-chain expression before heavy-chain gene rearrangement in pre-B cells transformed by Epstein-Barr virus. Proc. Natl. Acad. Sci. USA 86: 2356-2360
Kunkel, T. A. 1988. Exonucleolytic proofreading. Cell 53: 837-840
Kunkel, T. A. 1991.
Hypermutation during DNA synthesis in vitro. In: Somatic hypermutation in V-regions. Ed. Steele, E. J. CRC Press, Boca Raton, FL.
Küppers, R., Zhao, M., Hansmann, M., and Rajewsky, K. 1993. Tracing B cell development in human germinal centres by molecular analysis of single cells picked from histological sections. EMBO J. 12: 4955-4967
Kurosawa, Y., von Boehmer, H., Haas, W., Sakano, H., Trauneker, A., and Tonegawa, S. 1981. Identification of D segments of immunoglobulin heavy-chain genes and their rearrangement in T lymphocytes. Nature 290: 565-570
Kurosawa, Y., and Tonegawa, S. 1982. Organization, structure, and assembly of immunoglobulin heavy chain diversity DNA segments. J. Exp. Med. 155: 201-218
Lafaille, J. J., DeCloux, A., Bonnevile, M., Takagaki, Y., and Tonegawa, S. 1989. Junctional sequences of T cell receptor γδ genes: Implications for γδ T cell lineages and for a novel intermediate of V-(D)-J joining. Cell 59: 859-870
Landau, N. R., Schatz, D. G., Rosa, M., and Baltimore, D. 1987. Increased frequency of N-region insertion in a murine pre-B-cell line infected with a terminal deoxynucleotidyl transferase retroviral expression vector. Mol. Cell. Biol. 7: 3237-3243
Lanzavecchia, A. 1985. Antigen-specific interactions between T and B cells. Nature 314: 537-539
Lanzavecchia, A., Roosnek, E., Gregory, T., Berman, P., and Abrignani, S. 1988. T cells can present antigens such as HIV gp120 targeted to their own surface molecules. Nature 334: 530-532
Lauster, R., Reynaud, C., Martensson, I., Peter, A., Bucchini, D., Jami, J., and Weill, J. 1993. Promoter, enhancer and silencer elements regulate rearrangement of an immunoglobulin transgene. EMBO J. 12: 4615-4623
Lautner-Rieske, A., Huber, C., Meindl, A., Pargent, W., Schable, K. F., Thiebe, R., Zocher, I., and Zachau, H. G. 1992. The human immunoglobulin κ locus: Characterization of the duplicated A regions. Eur. J. Immunol. 22: 1023-1029
Lebecque, S. G., and Gearhart, P. J. 1990.
Boundaries of somatic mutation in rearranged immunoglobulin genes: 5' boundary is near the promoter, and 3' boundary is ~1 kb from V(D)J gene. J. Exp. Med. 172: 1717-1727
Lee, K. H., Matsuda, F., Kinashi, T., Kodaira, M., and Honjo, T. 1987. A novel family of variable region genes of the human immunoglobulin heavy chain. J. Mol. Biol. 195: 761-768
Levy, N. S., Malipiero, U. V., Lebecque, S. G., and Gearhart, P. J. 1989. Early onset of somatic mutation in immunoglobulin VH genes during the primary immune response. J. Exp. Med. 169: 2007-2019
Li, W. H., Gojobori, T., and Nei, M. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292: 237-239
Li, W. H. 1983. Evolution of duplicate genes and pseudogenes. In: Evolution of genes and proteins. Eds. Nei, M., and Koehn, R. K. Sinauer Associates Inc. Sunderland, Massachusetts, pp. 14-37
Li, W. H., Wu, C., and Luo, C. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21: 58-71
Lieber, M. R., Hesse, J. E., Mizuuchi, K., and Gellert, M. 1988a. Lymphoid V(D)J recombination: Nucleotide insertion at signal joints as well as coding joints. Proc. Natl. Acad. Sci. USA 85: 8588-8592
Lieber, M. R., Hesse, J. E., Lewis, S., Bosma, G. C., Rosenberg, N., Mizuuchi, K., Bosma, M. J., Gellert, M. 1988b. The defect in murine severe combined immune deficiency: Joining of signal sequences but not coding segments in V(D)J recombination. Cell 55: 7-16
Lieber, M. R. 1991. Site-specific recombination in the immune system. FASEB J. 5: 2934-2944
Linial, M. 1987. Creation of a processed pseudogene by retroviral infection. Cell 49: 93-102
Linton, P., Decker, D. J., and Klinman, N. R. 1989. Primary antibody-forming cells and secondary B cells are generated from separate precursor cell subpopulations. Cell 59: 1049-1059
Linton, P., Lo, D., Lai, L., Thorbecke, G. J., and Klinman, N. R. 1992.
Among naive precursor cell subpopulations only progenitors of memory B cells originate germinal centers. Eur. J. Immunol. 22: 1293-1297
Liu, Y., Joshua, D. E., Williams, G. T., Smith, C. A., Gordon, J., and MacLennan, I. C. M. 1989. Mechanism of antigen-driven selection in germinal centres. Nature 342: 929-931
Liu, Y., Zhang, J., Lane, P. J. L., Chan, E. Y., and MacLennan, I. C. M. 1991a. Sites of specific B cell activation in primary and secondary responses to T cell-dependent and T cell-independent antigens. Eur. J. Immunol. 21: 2951-2962
Liu, Y., Mason, D. Y., Johnson, G. D., Abbot, S., Gregory, C. D., Hardie, D. L., Gordon, J., and MacLennan, I. C. M. 1991b. Germinal center cells express bcl-2 protein after activation signals which prevent their entry into apoptosis. Eur. J. Immunol. 21: 1905-1910
Liu, Y., Cairns, J. A., Holder, M. J., Abbot, S. D., Jansen, K. U., Bonnefoy, J., Gordon, J., and MacLennan, I. C. M. 1991c. Recombinant 25-kDa CD23 and interleukin 1α promote the survival of germinal center B cells: evidence for bifurcation in the development of centrocytes rescued from apoptosis. Eur. J. Immunol. 21: 1107-1114
Liu, A. H., Creadon, G., and Wysocki, L. J. 1992. Sequencing heavy- and light-chain variable genes of single B-hybridoma cells by total enzymatic amplification. Proc. Natl. Acad. Sci. USA 89: 7610-7614
Livant, D., Blatt, C., and Hood, L. 1986. One heavy chain variable region gene segment subfamily in the BALB/c mouse contains 500-1000 or more members. Cell 47: 461-470
Loh, D. Y., Bothwell, A. L. M., White-Scharf, M. E., Imanishi-Kari, T., and Baltimore, D. 1984. Molecular basis of a mouse strain-specific anti-hapten response. Cell 33: 85-93
Lonberg, N., Taylor, L. D., Harding, F. A., Trounstine, M., Higgins, K. M., Schramm, S. R., Kuo, C. C., Mashayekh, R., Wymore, K., McCabe, J. G., Munoz-O'Regan, D., O'Donnell, S. L., Lapachet, E. S. G., Bengoechea, T., Fishwild, D. M., Carmack, C. E., Kay, R. M., and Huszar, D. 1994.
Antigen-specific human antibodies from mice comprising four distinct genetic modifications. Nature 368: 856-859
Lozano, F., Rada, C., Jarvis, J. M., and Milstein, C. 1993. Affinity maturation leads to differential expression of multiple copies of a κ light-chain transgene. Nature 363: 271-273
Lundberg, K. S., Shoemaker, D. D., Adams, M. W. W., Short, J. M., Sorge, J. A., and Mathur, E. J. 1991. High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene 108: 1-6
MacDonnell, T. J., Deane, N., Platt, F. M., Nunez, G., Jaeger, U., McKearn, J. P., and Korsmeyer, S. J. 1989. bcl-2-immunoglobulin transgenic mice demonstrate extended B cell survival and follicular lymphoproliferation. Cell 57: 79-88
MacLennan, I. C. M., and Gray, D. 1986. Antigen-driven selection of virgin and memory B cells. Immunol. Rev. 91: 61-85
MacLennan, I. C. M., Liu, Y., and Ling, N. R. 1988. B cell proliferation in follicles, germinal centre formation and the site of neoplastic transformation in Burkitt's lymphoma. Curr. Top. Microbiol. Immunol. 141: 138-148
MacLennan, I. 1991. The centre of hypermutation. Nature 354: 352-353
MacLennan, I. C. M., Johnson, G. D., Liu, Y. I., and Gordan, J. 1991. The heterogeneity of follicular reactions. Res. Immunol. 142: 253-256
MacLennan, I. C. M., Liu, Y., and Johnson, G. D. 1992. Maturation and dispersal of B-cell clones during T cell-dependent antibody responses. Immunol. Rev. 126: 143-161
Maeda, S., Mellors, R. C., Mellors, J. W., Jerabek, L. B., and Zervoudakis, I. A. 1983. Immunohistologic detection of antigen related to primate type C retrovirus p30 in normal human placentas. Am. J. Path. 112: 347-356
Maizels, N., and Bothwell, A. 1985. The T-cell-independent immune response to the hapten NP uses a large repertoire of heavy chain genes. Cell 43: 715-720
Maizels, N. 1989. Might gene conversion be the mechanism of somatic hypermutation of mammalian immunoglobulin genes. Trends Genet.
5: 4-8
Mäkelä, O., and Karjalainen, K. 1977. Inherited immunoglobulin idiotypes of the mouse. Immunol. Rev. 34: 119-138
Mäkelä, O., and Karjalainen, K. 1988. Genetic control of early antibody responses. Immunol. Rev. 105: 85-96
Mannik, M. 1967. Variability in the specific interactions of H and L chains of γG-globulins. Biochem. 6: 134-142
Manser, T., Huang, S., Gefter, M. 1984. Influence of clonal selection on the expression of immunoglobulin variable region genes. Science 226: 1283-1288
Manser, T., Wysocki, L. W., Margolies, M. N., Gefter, M. L. 1987. Evolution of antibody variable region structure during the immune response. Immunol. Rev. 96: 141-161
Manser, T. 1989. Evolution of antibody structure during the immune response. J. Exp. Med. 170: 1211-1230
Manser, T. 1990a. Limits on heavy chain junctional diversity contribute to the recurrence of an antibody variable region. Molec. Immunol. 27: 503-511
Manser, T. 1990b. The efficiency of antibody affinity maturation: Can the rate of B-cell division be limiting? Immunol. Today 11: 305-308
Manz, J., Denis, K., Witte, O., Brinster, R., and Storb, U. 1988. Feedback inhibition of immunoglobulin gene rearrangement by membrane μ, but not by secreted μ heavy chains. J. Exp. Med. 168: 1363-1381
Matsuda, F., Shin, E. K., Hirabayashi, Y., Nagaoka, H., Yoshida, M. C., Zong, S. Q., and Honjo, T. 1990. Organization of variable region segments of the human immunoglobulin heavy chain: duplication of the D5 cluster within the locus and interchromosomal translocation of variable region segments. EMBO J. 9: 2501-2506
Matsuda, F., Shin, E. K., Nagaoka, H., Matsumura, R., Haino, M., Fukita, Y., Taka-ishi, S., Imai, T., Riley, J. H., Anand, R., Soeda, E., Honjo, T. 1993. Structure and physical map of 64 variable segments in the 3' 0.8-megabase region of the human immunoglobulin heavy-chain locus. Nature Genet. 3: 88-94
Matsunami, N., Hamaguchi, Y., Yamamoto, Y., Kuze, K., Kangawa, K., Matsuo, H., Kawaichi, M., and Honjo, T. 1989.
A protein binding to the Jκ recombination sequence of immunoglobulin genes contains a sequence related to the integrase motif. Nature 342: 934-937
McCarrey, J. R., and Thomas, K. 1987. Human testis-specific PGK gene lacks introns and possesses characteristics of a processed gene. Nature 326: 501-505
McCormack, W. T., Tjoelker, L. W., Carlson, L. M., Petryniak, B., Barth, C. F., Humphries, E. H., and Thompson, C. B. 1989. Chicken IgL gene rearrangement involves deletion of a circular episome and addition of single nonrandom nucleotides to both coding segments. Cell 56: 785-791
McCormack, W. T., and Thompson, C. B. 1990. Chicken IgL variable region gene conversions display pseudogene donor preference and 5' to 3' polarity. Genes Dev. 4: 548-558
McGuire, K. L., and Vitetta, E. S. 1981. κ/λ shifts do not occur during maturation of murine B cells. J. Immunol. 127: 1670-1673
McHeyzer-Williams, M. G., Nossal, G. J. V., and Lalor, P. A. 1991. Molecular characterization of single memory B cells. Nature 350: 502-505
McHeyzer-Williams, M. G., McLean, M. J., Lalor, P. A., and Nossal, G. J. V. 1993. Antigen-driven B cell differentiation in vivo. J. Exp. Med. 178: 295-307
McKean, D., Hüppi, K., Bell, M., Staudt, L., Gerhard, W., and Weigert, M. 1984. Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. Proc. Natl. Acad. Sci. USA 81: 3180-3184
Meek, K., Hasemann, C., Pollock, B., Alkan, S. S., Brait, M., Slaoui, M., Urbain, J., and Capra, J. D. 1989. Structural characterization of antiidiotypic antibodies. Evidence that Ab2s are derived from the germline differently than Ab1s. J. Exp. Med. 169: 519-533
Meek, K. 1990. Analysis of junctional diversity during B lymphocyte development. Science 250: 820-822
Meek, K., Rathbun, G., Reininger, L., Jaton, J., Kofler, R., Tucker, P. W., and Capra, J. D. 1990.
Organization of the murine immunoglobulin VH complex: Placement of two new VH families (VH10 and VH11) and analysis of VH family clustering and interdigitation. Mol. Immunol. 27: 1073-1081
Meyer, K. B., and Neuberger, M. S. 1989. The immunoglobulin κ locus contains a second, stronger B-cell-specific enhancer which is located downstream of the constant region. EMBO J. 8: 1959-1964
Mian, I. S., Bradwell, A. R., and Olson, A. J. 1991. Structure, function and properties of antibody binding sites. J. Mol. Biol. 217: 133-151
Milstein, C., Even, J., Jarvis, J. M., Gonzalez-Fernandez, A., and Gherardi, E. 1992. Non-random features of the repertoire expressed by the members of one Vκ gene family and of the V-J recombination. Eur. J. Immunol. 22: 1627-1634
Miyata, T., and Yasunaga, T. 1981. Rapidly evolving mouse α-globin-related pseudogene and its evolutionary history. Proc. Natl. Acad. Sci. USA 78: 450-453
Mombaerts, P., Iacomini, J., Johnson, R. S., Herrup, K., Tonegawa, S., and Papaioannou, V. E. 1992. RAG-1 deficient mice have no mature B and T lymphocytes. Cell 68: 869-877
Moroni, C., and Schumann, G. 1975. Lipopolysaccharide induces C-type virus in short term cultures of BALB/c spleen cells. Nature 254: 60-61
Moroni, C., Stoye, J. P., DeLamarter, J. F., Erb, P., Jay, F. A., Jongstra, J., Martin, D., and Schumann, G. 1980. Normal B-cell activation involves endogenous retroviral antigen expression: Implications for leukemogenesis. Cold Spring Harbor Symp. Quant. Biol. 44: 1205-1210
Motoyama, N., Okada, H., and Azuma, T. 1991. Somatic mutation in constant regions of mouse λ1 light chains. Proc. Natl. Acad. Sci. USA 88: 7933-7937
Motoyama, N., Miwa, T., Suzuki, Y., Okada, H., and Azuma, T. 1994. Comparison of somatic mutation frequency among immunoglobulin genes. J. Exp. Med. 179: 395-403
Nagylaki, T., and Petes, T. D. 1982. Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes. Genetics 100: 315-337
Nagylaki, T. 1984.
The evolution of multigene families under intrachromosomal gene conversion. Genetics 106: 529-548
Nieuwenhuis, P., Kroese, F. G. M., Opstelten, D., and Seijen, H. G. 1992. De novo germinal center formation. Immunol. Rev. 126: 77-98
Nilsson, B. O., Larsson, E., and Sundstrom, P. 1986. Virus p30-related protein in follicular fluids and C virus-like particles on the cell membrane of the human oocyte. J. In Vitro Fert. Embryo Transfer 3: 296-303
Nisbet-Brown, E. R., Lee, J. W. W., Cheung, R. K., and Gelfand, E. W. 1987. Antigen-specific and -nonspecific mitogenic signals in the activation of human T cell clones. J. Immunol. 138: 3713-3719
Nussenzweig, M. C., Shaw, A. C., Sinn, E., Danner, D. B., Holmes, K. L., Morse, H. C., and Leder, P. 1987. Allelic exclusion in transgenic mice that express the membrane form of immunoglobulin μ. Science 236: 816-819
O'Brien, R. L., Brinster, R. L., and Storb, U. 1987. Somatic hypermutation of an immunoglobulin transgene in κ transgenic mice. Nature 326: 405-409
O'Garra, A., Umland, S., De France, T., and Christiansen, J. 1988. "B cell factors" are pleiotropic. Immunol. Today 9: 45-54
Oettinger, M. A., Schatz, D. G., Gorka, C., and Baltimore, D. 1990. RAG-1 and RAG-2, adjacent genes that synergistically activate V(D)J recombination. Science 248: 1517-1523
Oettinger, M. A., Stanger, B., Schatz, D. G., Glaser, T., Call, K., Housman, D., and Baltimore, D. 1992. The recombination activating genes, RAG-1 and RAG-2, are on chromosome 11p in humans and chromosome 2 in mice. Immunogenetics 35: 97-101
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.
Ohta, T. 1984. Some models of gene conversion for treating the evolution of multigene families. Genetics 106: 517-528
Okazaki, K., Davis, D. D., and Sakano, H. 1987. T cell receptor δ gene sequences in the circular DNA of thymocyte nuclei: direct evidence for intramolecular DNA deletion in V-D-J joining. Cell 49: 477-485
Oltz, E. M., Alt, F.
W., Lin, W., Chen, J., Taccioli, G., Desiderio, S., and Rathbun, G. 1993. A V(D)J recombinase-inducible B-cell line: Role of transcriptional enhancer elements in directing V(D)J recombination. Mol. Cell. Biol. 13: 6223-6230
Owens, T. 1988. A noncognate interaction with anti-receptor antibody-activated helper T cells induces small resting murine B cells to proliferate and to secrete antibody. Eur. J. Immunol. 18: 395-401
Pääbo, S., Irwin, D. M., and Wilson, A. C. 1990. DNA damage promotes jumping between templates during enzymatic amplification. J. Biol. Chem. 265: 4718-4721
Padovan, E., Casorati, G., Dellabona, P., Meyer, S., Brockhaus, M., and Lanzavecchia, A. 1993. Expression of two T-cell receptor α chains: Dual receptor T cells. Science 262: 422-424
Parhami-Seren, B., Wysocki, L. J., Margolies, M. N., and Sharon, J. 1990. Clustered H chain somatic mutations shared by anti-p-azophenylarsonate antibodies confer enhanced affinity and ablate the cross-reactive idiotype. J. Immunol. 145: 2340-2346
Parhami-Seren, B., Kussie, P. H., Strong, R. K., and Margolies, M. N. 1993. Conservation of binding site geometry among p-Azophenylarsonate-specific antibodies. J. Immunol. 150: 1829-1837
Parslow, T. G., Blair, D., Murphy, W. J., and Granner, D. K. 1984. Structure of the 5' ends of immunoglobulin: A novel conserved sequence. Proc. Natl. Acad. Sci. USA 81: 2650-2654
Pascual, V., and Capra, J. D. 1991. Human immunoglobulin heavy-chain variable region genes: Organization, polymorphism, and expression. Adv. Immunol. 49: 1-74
Pech, M., Höchtl, J., Schnell, H., and Zachau, H. G. 1981. Differences between germ-line and rearranged immunoglobulin Vκ coding sequences suggest a localized mutation mechanism. Nature 291: 668-670
Pelkonen, J., Kaartinen, M., and Mäkelä, O. 1986. Quantitative representation of two germ-line V genes in the early antibody response to 2-phenyloxazolone. Eur. J. Immunol. 16: 106-109
Perlmutter, R. M., Kearney, J. F., Chang, S. P., and Hood, L. 1985.
Developmentally controlled expression of immunoglobulin VH genes. Science 227: 1597-1601
Portis, J. L., McAtee, F. J., and Hayes, S. F. 1987. Horizontal transmission of murine retroviruses. J. Virol. 61: 1037-1044
Rada, C., Gupta, S. K., Gherardi, E., and Milstein, C. 1991. Mutation and selection during the secondary response to 2-phenyloxazolone. Proc. Natl. Acad. Sci. USA 88: 5508-5512
Rajewsky, K., Forster, I., and Cumano, A. 1987. Evolutionary and somatic selection of the antibody repertoire in the mouse. Science 238: 1088-1094
Ramsden, D. A., and Wu, G. E. 1991. Mouse κ light-chain recombination signal sequences mediate recombination more frequently than those of λ light chain. Proc. Natl. Acad. Sci. USA 88: 10721-10725
Rathbun, G., Berman, J., Yancopoulos, G., and Alt, F. W. 1989. Organization and expression of the mammalian heavy-chain variable-region locus. In: Immunoglobulin genes. Honjo, T., Alt, F. W., and Rabbitts, T. H., eds. Academic Press, N. Y.
Reanney, D. 1984. Genetic noise in evolution? Nature 307: 318-319
Reanney, D. C. 1986. Genetic error and genome design. Trends Genet. 2: 41-46
Rechavi, G., Ram, D., Glazer, L., Zakut, R., and Givol, D. 1983. Evolutionary aspects of immunoglobulin heavy chain variable region (VH) gene subgroups. Proc. Natl. Acad. Sci. USA 80: 855-859
Reilly, J. G., Ogden, R., and Rossi, J. J. 1982. Isolation of a mouse pseudo tRNA gene encoding CCA - a possible example of reverse flow of genetic information. Nature 300: 287-289
Reth, M., Hammerling, G. J., and Rajewsky, K. 1978. Analysis of the repertoire of anti-NP antibodies in C57BL/6 mice by cell fusion. I. Characterization of antibody families in the primary and hyperimmune response. Eur. J. Immunol. 8: 393-400
Reth, M. G., and Alt, F. W. 1984. Novel immunoglobulin heavy chains are produced from DJH gene segment rearrangements in lymphoid cells. Nature 312: 418-423
Reth, M., Gehrmann, P., Petrac, E., and Wiese, P. 1986.
A novel VH to VHDJH joining mechanism in heavy-chain-negative (null) pre-B cells results in heavy-chain production. Nature 322: 840-842
Reynaud, C., Anquez, V., Grimal, H., and Weill, J. 1987. A hyperconversion mechanism generates the chicken light chain preimmune repertoire. Cell 48: 379-388
Reynaud, C., Dahan, A., Anquez, V., and Weill, J. 1989. Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region. Cell 59: 171-183
Riblet, R., Tutter, A., and Brodeur, P. 1986. Polymorphism and evolution of IgH-V gene families. Curr. Top. Microbiol. Immunol. 127: 167-172
Ripley, L. S. 1982. Model for the participation of quasi-palindromic DNA sequences in frameshift mutation. Proc. Natl. Acad. Sci. USA 79: 4128-4132
Roes, J., Hüppi, K., Rajewsky, K., and Sablitzky, F. 1989. V gene rearrangement is required to fully activate the hypermutation mechanism in B cells. J. Immunol. 142: 1022-1026
Roes, J., and Rajewsky, K. 1993. Immunoglobulin D (IgD) deficient mice reveal an auxiliary receptor function for IgD in antigen-mediated recruitment of B cells. J. Exp. Med. 177: 45-55
Rogerson, B., Hackett, J., Peters, A., Haasch, D., and Storb, U. 1991. Mutation pattern of immunoglobulin transgenes is compatible with a model of somatic hypermutation in which targeting of the mutator is linked to the direction of DNA replication. EMBO J. 10: 4331-4341
Rolink, A., Streb, M., and Melchers, F. 1991. The κ/λ ratio in surface immunoglobulin molecules on B lymphocytes differentiating from DHJH-rearranged murine pre-B cell clones in vitro. Eur. J. Immunol. 21: 2895-2898
Roth, D. B., Nakajima, P. B., Menetski, J. P., Bosma, M. J., and Gellert, M. 1992a. V(D)J recombination in mouse thymocytes: Double-strand breaks near T cell receptor δ rearrangement signal. Cell 69: 41-53
Roth, D. B., Menetski, J. P., Nakajima, P. B., Bosma, M. J., and Gellert, M.
1992b. V(D)J recombination: Broken DNA molecules with covalently sealed (hairpin) coding ends in scid mouse thymocytes. Cell 70: 983-991
Roth, D. B., Zhu, C., and Gellert, M. 1993. Characterization of broken DNA molecules associated with V(D)J recombination. Proc. Natl. Acad. Sci. USA 90: 10788-10792
Rothenfluh, H. S. 1990. Origin of mutations in non-translated regions upstream of immunoglobulin genes. Honours Thesis. University of Wollongong
Rothenfluh, H. S., and Steele, E. J. 1993a. Origin and maintenance of germ-line V genes. Immunol. Cell. Biol. 71: 227-232
Rothenfluh, H. S., and Steele, E. J. 1993b. Lamarck, Darwin and the immune system. Today's Life Science 5(7): 8-15, and 5(8): 16-22
Rothenfluh, H. S., Taylor, L., Bothwell, A. L. M., Both, G. W., and Steele, E. J. 1993. Somatic hypermutation in 5' flanking regions of heavy chain antibody variable regions. Eur. J. Immunol. 23: 2152-2159
Roux, K. H., Dhanarajan, P., Gottschalk, V., McCormack, W. T., and Renshaw, R. W. 1991. Latent a1 VH germ-line genes in an a2a2 rabbit: evidence for gene conversion at both the germ-line and somatic levels. J. Immunol. 146: 2027-2036
Ruff-Jamison, S., and Glenney, J. R. 1993. Requirement for both H and L chain V regions, VH and Vκ joining amino acids, and the unique H chain D region for the high affinity binding of an anti-phosphotyrosine antibody. J. Immunol. 150: 3389-3396
Sablitzky, F., and Rajewsky, K. 1984. Molecular basis of an isogeneic anti-idiotypic response. EMBO J. 3: 3005-3012
Sablitzky, F., Weisbaum, D., and Rajewsky, K. 1985a. Sequence analysis of non-expressed immunoglobulin heavy chain loci in clonally related, somatically mutated hybridoma cells. EMBO J. 4: 3435-3437
Sablitzky, F., Wildner, G., and Rajewsky, K. 1985b. Somatic mutation and clonal expansion of B cells in an antigen-driven immune response. EMBO J. 4: 345-350
Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. 1988.
Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491
Saitou, N., and Nei, M. 1987. The neighbour-joining method: A new method for constructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425
Sakano, H., Maki, R., Kurosawa, Y., Roeder, W., and Tonegawa, S. 1980. Two types of somatic recombination are necessary for the generation of complete immunoglobulin heavy-chain genes. Nature 286: 676-683
Sakano, H., Kurosawa, Y., Weigert, M., and Tonegawa, S. 1981. Identification and nucleotide sequence of a diversity DNA segment (D) of immunoglobulin heavy-chain genes. Nature 290: 562-565
Sambrook, J., Fritsch, E. F., and Maniatis, T. 1989. Molecular Cloning. A laboratory manual, 2nd edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
Sasso, E. H., van Dijk, K. W., Bull, A., Van Der Maarel, S. M., and Milner, E. C. B. 1992. VH genes in tandem array comprise a repeated germline motif. J. Immunol. 149: 1230-1236
Schatz, D. G., and Baltimore, D. 1988. Stable expression of immunoglobulin gene V(D)J recombinase activity by gene transfer into 3T3 fibroblasts. Cell 53: 107-115
Schatz, D. G., Oettinger, M. A., and Baltimore, D. 1989. The V(D)J recombination activating gene, RAG-1. Cell 59: 1035-1048
Schiff, C., Milili, M., and Fougereau, M. 1985. Functional and pseudogenes are similarly organized and may equally contribute to the extensive antibody diversity of the IgVHII family. EMBO J. 4: 1224-1230
Schilling, J., Clevinger, B., Davie, J. M., and Hood, L. 1980. Amino acid sequence of homogeneous antibodies to dextran and DNA rearrangements in heavy chain V-region gene segments. Nature 283: 35-40
Schittek, B., and Rajewsky, K. 1992. Natural occurrence and origin of somatically mutated memory B cells in mice. J. Exp. Med. 176: 427-438
Schowalter, D. B., and Sommer, S. S. 1989. The generation of radiolabeled DNA and RNA probes with polymerase chain reaction. Anal. Biochem. 177: 90-94
Schuldiner, A.
R., Nirula, A., and Roth, J. 1989. Hybrid DNA artifact from PCR of closely related target sequences. Nucleic Acids Res. 17: 4409 Schuler, W., Weiler, I. J., Schuler, A., Phillips, R. A., Rosenberg, N., Mak, T. W., Kearney, J. F., Perry, R. P., and Bosma, M. J. 1986. Rearrangement of antigen receptor genes is defective in mice with severe combined immunodeficiencv. Cell 46:963-972 J Schwager, J., Grossberger, D., and Du Pasquier, L. 1988. Organization and rearrangement of immunoglobulin M genes in the amphibian Xenopus. EMBO J. 7:2409-2415 Schwager, J., Burckert, N., Courtet, M., and Du Pasquier, L. 1989. Genetic basis of the antibody repertoire in Xenopus: analysis of the VH diversity. EMBO J. 8: 2989- 3001 Seising, E., and Storb, U. 1981. Somatic mutation of immunoglobulin light-chain variable-region genes. Cell 25: 47-58 Sentman, C. L., Shutter, J. R., Hockenbery, D., Kanagawa, O., and Korsmeyer, S. J. 1991. bcl-2 inhibits multiple forms of apoptosis but not negative selection in thymocytes. Cell 67: 879-888 Serwe, M., and Sablitzky, F. 1993. V(D)J recombination in B cells is impaired but not blocked by targeted deletion of the immunoglobulin heavy chain intron enhancer. EMBO J. 12: 2321-2327 Shan, H., Shlomchik., and Weigert, M. 1990. Heavy-chain class switch does not terminate somatic mutation. J. Exp. Med. 172: 531-536 Sharon, J. 1988. The invariant tryptophan in an H chain V region is not essential to antibody binding. J. Immunol. 140: 2666-2669 Sharon, J., Gefter, M. L., Wysocki, L. J., and Margolies, M. N. 1989. Recurrent somatic mutations in mouse antibodies to p-azophenylarsonate increase affinity for hapten. J. Immunol. 142: 596-601 Sharon, J. 1990. Structural correlates of high antibody affinity: Three engineered amino acid substitutions can increase the affinity of an anti-p-azophenylarsonate antibody 200-fold. Proc. Natl. Acad. Sci. USA 87: 4814-4817 Sharpe, M., Neuberger, M., Pannell, R., Surani, M. A., and Milstein, C. 1990. 
Lack of somatic mutation in a K light chain transgene. Eur. J. Immunol 20: 1379-1385 Sharpe, M. J., Milstein, C, Jarvis, J. M., and Neuberger, M. S. 1991. Somatic hypermutation of immunoglobulin K may depend on sequences 3' of CK and occurs on passenger transgenes. EMBO J. 10: 2139-2145 Shinkai, Y., Rathbun, G., Lam, K., Oltz, E. M., Stewart, V., Mendelsohn, M., Charron, J., Datta, M., Young, F., Stall, A. M., and Alt, F. W. 1992. RAG-2 deficient mice lack mature lymphocytes owing to inability to initiate V(D)J rearrangement. Cell 68: 855-867 Short, J. A., Sethupathi, P., Zhai, S. K., and Knight, K. L. 1991. VDJ genes in VHa2 allotype-suppressed rabbits. Limited germline VH gene usage and accumulation of somatic mutations in D regions. J. Immunol. 147: 4014-4018 Siden, E., Alt, F. W., Shinefeld, L., Sato, V., and Baltimore, D. 1981. Synthesis of immunoglobulin )i chain gene products precedes synthesis of light chains during B-lymphocyte development. Proc. Natl. Acad. Sci. USA 78: 1823-1827 Siekevitz, M., Kocks, C, Rajewsky, K., Dildrop, R. 1987. Analysis of somatic mutation and class switching in naive and memory B cells generating 24adoptiv1 e primary and secondary responses. Cell 48: 757-770 Sims, M. J., Krawinkel, U., and Taussig, M. J. 1992. Characterization of germ-line genes of the VGAM3.8 VH gene family from BALB/c mice. J. Immunol 149 1642-1648 Siu, G., Kronenberg, M., Strauss, E., Haars, R., Mak, T. W., and Hood, L. 1984. The structure, rearrangement and expression of Dfl gene segments of the murine T-cell antigen receptor. Nature 311: 344-349 Sohn, J., Gerstein, R. M., Hsieh, C, Lemer, M., and Seising, E. 1993. Somatic hypermutation of an immunoglobulin \i heavy chain transgene. J Exp Med 177:493-504 ~ 5 F' Solin M., Kaartinen, M., and Makeia, O. 1992. The same few V genes account for a majority of oxazolone antibodies in most mouse strains. Molec. Immunol 29* 1357-1362 Sompuran, S. R., and Sharon, J. 1993. 
Verification of a model of a F(ab) complex with phenylarsonate by oligonucleotide-directed mutagenesis. J. Immunol. 150: 1822- 1828 Sommer, R., and Tautz, D. 1989. Minimal homology requirements for PCR primers. Nucleic Acids Res. 17: 6749 Steele, E. J. 1979. Somatic selection and adaptive evolution. On the inheritance of acquired characters. 2nd edition. University of Chicago Press, Chicago. Steele, E. J., and Pollard, J. W. 1987. Hypothesis: Somatic hypermutation by gene conversion via the error prone DNA->RNA-»DNA information loop. Molec. Immunol. 24: 667-673 Steele, E. J. Ed. 1991. Somatic hypermutation in V-regions. CRC Press, Boca Raton, FL. Steele, E. J., Pollard, J. W., Taylor, L., and Both, G. W. 1991. Evaluation of possible mutator mechanisms active on mammalian variable region genes. In: Somatic hypermutation in V-regions. ed. Steele, E. J. CRC Press, Boca Raton, FL. Steele, E. J., Rothenfluh, H. S., and Both, G. W. 1992. Defining the nucleic acid substrate for somatic hypermutation. Immunol. Cell. Biol. 70: 129-144 Steele, E. J., Rothenfluh, H. S., Ada, G. L., and Blanden, R. V. 1993. Affinity maturation of lymphocyte receptors and positive selection of T cells in the thymus. Immunol. Rev. 135: 5-49 Steinman, R. M„ Gutchinov, B., Witmer, M. D., and Nussenzweig, M. C. 1983. Dendritic cells are the principal stimulators of the primary mixed leukocyte reaction in mice. J. Exp. Med. 157: 613-627 Storb, U. 1987. Transgenic mice with immunoglobulin genes. Annu. Rev. Immunol. 5: 151-174 Strasser, A., Harris, A. W., and Cory, S. 1991. bcl-2 transgene inhibits T cell death and perturbs thymic self-censorship. Cell 61: 889-899 Strickland, J. E., Fowler, A. K., and Hellman, A. 1979. Expression of murine leukemia virus DNA polymerase in mouse uterus during pregnancy. Biol. Reprod. 20: 751-756 Sun, Z., and Kitchingman, G. R. 1994. Analysis of the imperfect octamer-containing human immunoglobulin VH6 gene promoter. Nucleic Acids Res. 22: 850-860 Szakal, A. 
K., and Hanna, M. G. 1968. The ultrastructure of antigen localization and virus-like particles in mouse spleen germinal centers. Exp. Mol. Pathol 8: 75-89 Takeda, S., Masteller, E. L., Thompson, C. B., and Buerstedde, J. M. 1992. RAG-2 expression is not essential for chicken immunoglobulin gene conversion. Proc. Natl. Acad. Sci. USA 89: 4023-4027 Takeda, S., Zou, Y., Bluethman, H., Kitamura, D., Muller, U., and Rajewsky, K. 1993. Deletion of the immunoglobulin K chain intron enhancer abolishes K chain gene rearrangement in cis but not X chain gene rearrangement in trans. EMBO J. 12: 2329-2336 Tamma, S. M. L., Amin, A. R., Finkelman, F. D., Chen, Y, Thorbecke, G.24 J.2, and regionCoico,s Ro.f Fimmunoglobuli. 1991. IgD receptorn D. Proc.s on Natl. murin Acad.e T-helpe Sci.r cellUSAs bin88:d 9233-923 to Fd an7d Fc Tanaka, T., and Nei, M. 1989. Positive Darwinian selection observed at the variable region genes of immunoglobulins. Mol Biol. Evol 6: 447-459 Tao, W., and Bothwell, A. L. M. 1990. Formation and development of B cell lineages during a primary anti-hapten immune response. J. Immunol. 145: 3216-3222 Tao, W., Hardardottir, F., and Bothwell, A. L. M. 1993. Extensive somatic mutation in the Ig heavy chain V genes in a late primary anti-hapten immune response. Molec. Immunol. 30: 593-602 Tchenio, T., Segal-Bedirdjian, E., and Heidmann, T. 1993. Generation of processed pseudogenes in murine cells. EMBO J. 12:1487-1497 Thomas, K. R., and Capecchi, M. R. 1986. Introduction of homologous DNA sequences into mammalian cells induces mutations in the cognate gene. Nature 324: 34-38 Tiegs, S. L., Russell, D. M., and Nemazee, D. 1993. Receptor editing in self-reactive bone marrow B cells. /. Exp. Med. Ill: 1009-1029 Tindall, K. R., and Kunkel, T. A. 1988. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochem. 27: 6008-6013 Tomlinson, I. M., Walter, G., Marks, J. D., Llewelyn, M. B., and Winter, G. 1992. 
The repertoire of human germline VH sequences reveals about fifty groups of VH segments with different hypervariable loops. /. Mol Biol 227: 776-798 Tonegawa, S. 1983. Somatic generation of antibody diversity. Nature 302: 575-581 Tramontano, A., Chothia, C, and Lesk, A. M. 1990. Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region of the VH domains of immunoglobulins. /. Mol Biol 215: 175-182 Tsudaka, S., Sugiyama, H., Oka, Y., and Kishimoto, S. 1990. Estimation of D segment usage in initial D to JH joinings in a murine immature B cell line. Preferential usage of DFL16.L the most 5' D segment and DQ52, the most Jn-proximal D segment. J. Immunol. 144: 4053-4059 Tutter, A., and Riblet, R. 1989. Conservation of an immunoglobulin variable-region gene family indicates a specific, noncoding function. Proc. Natl Acad. Sci. USA 86: 7460-7464 Unanue, E. R. 1984. Antigen-presenting function of the macrophage. Ann. Rev. Immunol. 2: 395-428 Walsh, J. B. 1985. Interaction of selection and biased gene conversion in a multigene family. Proc. Natl. Acad. Sci. USA 82: 153-157 Wang, J. C. 1979. Helical repeat of DNA in solution. Proc. Natl. Acad. Sci. USA 76: 200-203 Wang, Y. F., and Holstein, A. F. 1983. Intraepithelial lymphocytes and macrophages in the human epididymis. Cell Tissue Res. 233: 517-521 Wang, J. C, Caron, P. R., and Kim, R. A. 1990a. The role of DNA topoisomerases in recombination and genome stability: A double-edged sword? Cell 62:403-406 Wang, D., Chen, H., Liao, J., Akolkar, P. N., Sikder, S. K., Gruezo, F. and Kabat, E. 1990b. J. Immunol. 145: 3002-3010 Wang, D., Wells, S. M., Stall, A. M., and Kabat, E. 1994. Reaction of germinal centers in the T-cell-independent response to the bacterial polysaccharide cc(l-»6)dextran. Proc. Natl. Acad. Sci. USA 91: 2502-2506 Weaver, C. T., and Unanue, E. R. 1990. The costimulatory function of antigen- presenting cells. Immunol. Today 11: 49-55 Weber, J. 
S., Berry, J., Manser, T., and Claflin, J. L. 1991. Position of the rearranged VK and its 5'flanking sequences determines the location of somatic mutations in the JK locus. /. Immunol 146: 3652-3655 Wecker, E., and Horak, I. 1982. Retrovirus genes in lymphocyte function and growth. Curr. Top. Microbiol. Immunol. 98: 1-132 Weichhold G. M., Klobeck, H. G., Ohnheiser, R., Combriato, G., and Zachau, H. G. 1990. Megabase inversions in the human genome as physiological events243. Nature 347:90-92 Weigert, M., Gatmaitan, L., Loh, E., Schilling, J., and Hood, L. 1978. Rearrangement of genetic information may produce immunoglobulin diversity. Nature 276: 785- 790 Weiss, R. A. 1982. Hybridomas produce viruses as well as antibodies. Immunol. Today 3: 292-294 Weiss, S., and Wu, G. E. 1987. Somatic point mutations in unrearranged immunoglobulin gene segments encoding the variable region of Xtight chains . EMBO J. 6: 921-932 Weiss, U., and Wilson, J. H. 1988. Heteroduplex-induced mutagenesis in mammalian cells. Nucleic Acids Res. 16: 2313-2322 Weiss, U., and Rajewsky, K. 1990. The repertoire of somatic antibody mutants accumulating in the memory compartment after primary immunization is restricted through affinity maturation and mirrors that expressed in the secondary response. J. Exp. Med. 172: 1681-1689 Weiss, U., Zoebelein, R., and Rajewsky, K. 1992. Accumulation of somatic mutants in the B cell compartment after primary immunization with a T cell-dependent antigen. Eur. J. Immunol. 22: 511-517 Williams, J. G. K., Kubelik, A. R., Livak, K. J., Rafalski, J. A., and Tingey, S. V. 1990. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res. 18: 6531-6535 Wilson, M., Hsu, E., Marcuz, A., Courtet, M., Du Pasquier, L., and Steinberg, C. What limits affinity maturation of antibodies in Xenopus - the rate of somatic mutation or the ability to select mutants? 1992. EMBO J. 11:4337-4347 Winoto, A., Mjolsness, S., and Hood, L. 1985. 
Genomic organization of the genes encoding mouse T-cell receptor a-chain. Nature 316: 832-836 Witkin, S. S., and Bendich, A. 1977. DNA synthesizing activity in normal human sperm. Exp. Cell Res. 106: 47-54 Wu, T., and Kabat, E. A. 1970. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132: 211-250 Wysocki, L., Manser, T., Gefter, M. 1986. Somatic evolution of variable region structures during an immune response. Proc. Natl Acad. Sci. USA 83: 1847- 1851 Wysocki, L. J., and Gefter, M. L. 1989. Gene conversion and the generation of antibody diversity. Annu. Rev. Biochem. 58: 509-531 Wysocki, L. J., Gefter, M. L., and Margolies, M. N. 1990. Parallel evolution of antibody variable regions by somatic processes: Consecutive shared somatic alterations in VH genes expressed by independently generated hybridomas apparently acquired by point mutation and selection rather than by gene conversion. /. Exp. Med. 172: 315-323 Yancopoulos, G. D., Desiderio, S. V., Paskind, M., Kearney, J. F., Baltimore, D., and Alt, F. W. 1984. Preferential utilization of the most JH-proximal VH gene segments in pre-B-cell lines. Nature 311: 727-733 Yaoita, Y., Matsunami, N., Choi, C. Y, Sugiyama, H., Kishimoto, T., and Honjo, T. 1983. The D-JH complex is an intermediate to the complete immunoglobulin heavy-chain V-region gene. Nucleic Acids Res. 11: 7303-7316 Zou, Y., Takeda, S., and Rajewsky, K. 1993. Gene targeting in the mouse IgK locus: efficient generation of X chain-expressing B cells, independent of gene rearrangements in IgK. EMBO J. 12: 811-820

Response to examiner's comments

It was argued that chapter 5 did not add anything to the thesis and could have been omitted. I would argue against this on the following grounds. The initial part of my thesis dealt with the process of somatic mutation which takes place in germinal centers. This work is described in chapters 4, 5 and 6. Although much is now known about the process of somatic hypermutation (the 5' boundary has been defined in chapter 6 of this thesis), the mechanism is still unknown, and there is still some uncertainty as to the exact location within the germinal center where somatic hypermutation occurs. The experiments described in chapter 4 attempted to test the reverse transcriptase mechanism of somatic mutation directly, whereas those described in chapter 5 involved the development of a technique that would allow somatic hypermutation to be reproduced in vitro, as well as in vitro characterization of B cells isolated from the various regions of germinal centers. Due to time limitations, it was not possible to carry out more than preliminary experiments in chapter 5. However, the results obtained are very encouraging and indicate that continuation of this work may prove very fruitful. Thus, chapters 4 to 6 are part of a concerted effort during the course of my doctoral research to elucidate the mechanism of somatic mutation.

The comment was made that the phylogenetic analyses could have been placed in a separate section, and that it would have been useful to indicate that the phylogenetic analyses were restricted to the mouse and did not encompass other species or phyla. I purposely placed the phylogenetic analyses of each set of germline sequences (chapters 9 and 12) immediately after the corresponding molecular analyses (chapters 8 and 11 respectively) because I believe that in this way the implications of all of the analyses are more obvious than if I had separated the molecular and phylogenetic analyses. In the 'Rationale' section of chapters 9 and 12 I made it clear that the phylogenetic analyses were applied to the VH186.2- and VH205.12-related sequences that I presented in chapters 8 and 11. Taken together with the fact that the sequence names are clearly identified on the dendrograms, there can be no doubt that the phylogenetic analyses were applied to each of these sets of sequences.

It was pointed out that one aspect of the memory lymphocyte specific soma-to-germline genetic feedback loop described on pages 215-217 that is still unclear is how only the V genetic element of a complete V(D)JC transcription unit can be returned to the germline. On pages 216-217 I mention the presence of a highly conserved heptamer sequence within the 3' terminus of the coding region of many VH genes found in vertebrates. I then discuss how this heptamer sequence (which is thought to mediate VH gene replacement) may facilitate gene conversion events between the incoming cDNA molecule and a homologous germline target, and in so doing may provide the specificity for the VH genetic element. However, no equivalent sequence has yet been identified in VL germline genes that might provide specificity for the VL portion of the transcript. Nevertheless, there is as yet insufficient sequence data available to completely rule out the possibility that other portions of Ig retrotranscripts may be returned to the germline.

It was pointed out that the Introduction (chapter 1) may have benefited from the inclusion of a discussion on B cell tolerance. In the introduction I have limited myself to the discussion of issues directly related to the research presented in this thesis, which has no direct bearing on tolerance. As it is, the introduction covers 35 pages, and the inclusion of a discussion of tolerance would have greatly increased the size of this section.

An alternative interpretation of some of the results presented in chapter 4 was offered. Five cDNA sequences with shared changes from the VH186.2 germline gene were isolated (page 57). It was suggested that these changes were not somatic mutations, but that the five sequences originated from a germline gene that was closely related to, but nevertheless different from, VH186.2. I classed these changes as somatic mutations because the same changes were also identified by other workers in the anti-NP response (Blier and Bothwell, 1987; Weiss et al., 1992). The possibility that these sequences represent different germline genes cannot be formally excluded. None of the 52 VH186.2-related sequences presented in this thesis (chapter 8) contain any of these three changes. However, it is possible that VH186.2-related germline genes containing the above changes exist but were not PCR amplified due to primer bias.
