Novel Genes in the Class II Region of the Human Major Histocompatibility Complex


Isabel Mary Hanson

A thesis submitted for the degree of Doctor of Philosophy in the University of London

Human Immunogenetics Laboratory Imperial Cancer Research Fund 44 Lincoln's Inn Fields London

and

Department of Genetics and Biometry University College Gower Street London

February 1991

ProQuest Number: 10610901

Published by ProQuest LLC (2017). Copyright of the Dissertation is held by the Author.

To Chris

ACKNOWLEDGEMENTS

I am very grateful to the friends and colleagues who have helped to make my three years at ICRF such an enjoyable time. First and foremost I would like to thank my supervisor, John Trowsdale, and my lab. mum, Pat Miller. Next, I gratefully acknowledge my fellow Human Immunogeneticists, past and present: Danny Altman, Ian 'Peachy Keen' Campbell, Vincent Cunliffe, Will Foulkes, Richard Glynne, Vikki Groves, Hitoshi Ikeda, Adrian 'Gene' Kelly, Lesley-Anne Kerr, Chris 'More tea, vicar?' Lock, Ruth 'Someone's stolen my bike' Lovering, Ian Mockridge, Steve Powis, Jiannis 'Disaster' Ragoussis, Philippe Sanseau, David 'Captain' Sansom and David Wilkinson. I am also indebted to Kathy Cheah, Paul Freemont, Pat Gorman, Ketan Patel, Sabine Myers, Sue 'Perfect Day' Rider, Melissa 'Ruby' Rubock, Lisa Stubbs, Susan Tonks and my external supervisor, Jonathan Wolfe. Beyond ICRF I would like to acknowledge my parents, Charles & Anne Hanson, and my parents-in-law-to-be, Ken & Penny Riley. I am particularly grateful to Patricia Cocks for providing me with a roof over my head in salubrious St. John's Wood. Last, but by no means least, I would like to thank Chris Riley for his support which never faltered even though I didn't name my new genes after him.

ABSTRACT

The aim of the work described in this thesis was to identify and characterise novel genes in the class II region of the human major histocompatibility complex (MHC).

A physical map of the region, spanning over 1Mbp, was constructed using pulsed field gel electrophoresis (PFGE) in conjunction with probes for the known class II genes. This map facilitated the direct localisation of the gene for the α2 chain of type XI fibrillar collagen, COL11A2, to a region just centromeric of the class II DP subregion. In addition, the PFGE map revealed four clusters of sites for restriction endonucleases which cut preferentially in CpG-rich regions often found at the 5' ends of genes. Three of these clusters were cloned by cosmid walking and chromosome jumping. Genomic fragments from these regions were hybridised to cDNA libraries, which resulted in the identification of five novel genes designated RING1-5. RING1, RING2 and RING5 were 95kb, 90kb and 85kb proximal to DPB2, respectively. RING3 was 35kb distal to the DNA gene. RING4 was 25kb proximal to DOB.

Nucleotide sequencing revealed that RING1-5 were not related to each other or to the class II genes. The predicted protein product of RING1 contained a novel cysteine-histidine motif which was conserved in a variety of other proteins and which was reminiscent of domains found in zinc-dependent nucleic acid binding proteins. RING3 potentially encoded a protein with striking homology to the product of fsh, a Drosophila developmental gene. The putative product of RING4 was a member of the 'ABC' superfamily of ATP-dependent transporter proteins. RING5 was the human homologue of KE4, a gene in the mouse MHC region which is a candidate for the developmental lethal fwtf

These findings may be of importance in understanding MHC/disease associations and the role of the MHC in the immune response.

CONTENTS

1. The major histocompatibility complex
1.1. Classical studies of the MHC
1.2. Function of class I and class II gene products
1.3. Molecular genetics of the MHC
1.4. Evolution of the MHC
1.5. Clinical relevance of the class II region
1.6. Evidence for other genes in the class II region
1.7. Approaches to finding novel genes

2. Materials and Methods
2.1. Bacterial cell culture
2.2. Screening of recombinant DNA libraries
2.3. Subcloning of DNA fragments
2.4. Transformation of bacterial cells with DNA by electroporation
2.5. Preparation of DNA from transformed bacterial cells
2.6. Small scale preparation of bacteriophage λ DNA
2.7. Preparation of eukaryotic DNA from cells in culture
2.8. Preparation of very high molecular weight DNA for pulsed field gel electrophoresis
2.9. Preparation of RNA
2.10. Restriction endonuclease digestion of DNA
2.11. Electrophoresis of DNA and RNA
2.12. Preparation of DNA and RNA blots
2.13. Preparation of DNA probes
2.14. Hybridisation of blots
2.15. DNA sequencing

3. Identification of potential CpG islands in the class II region by PFGE mapping
3.1. Strategy to identify potential sites of genes
3.2. Choice of materials for construction of the PFGE map
3.3. Construction of the PFGE map
3.4. Summary and discussion

4. Mapping of the COL11A2 gene to the class II region
4.1. Introduction
4.2. Mapping of COL11A2 using somatic cell hybrids
4.3. Mapping of COL11A2 by PFGE
4.4. Cosmid walking between the DP subregion and COL11A2
4.5. Summary and discussion

5. Identification of novel genes associated with clusters of rare-cutter sites
5.1. Introduction
5.2. Cloning of cluster 1 by cosmid walking
5.3. Cloning of cluster 2 by chromosome jumping
5.4. Identification of cluster 3 in previously isolated cosmid clones
5.5. Analysis of cloned regions for coding sequences
5.6. Expression patterns of RING1-5
5.7. Refinement of the physical map of the class II region
5.8. Summary and discussion

6. Characterisation of novel genes by nucleotide sequencing
6.1. Introduction
6.2. Nucleotide sequence of RING1
6.3. Partial nucleotide sequence of RING2
6.4. Nucleotide sequence of RING3
6.5. Nucleotide sequence of RING4
6.6. Partial nucleotide sequence of RING5
6.7. Summary

7. Comparative mapping of novel genes in the MHC region of mouse and man
7.1. Introduction
7.2. Determination of the positions of sequences homologous to the human genes RING1-5 and COL11A2 in the mouse MHC
7.3. Determination of the positions of sequences homologous to three mouse genes, KE3-5, in the human MHC
7.4. Summary and discussion

8. RFLP analysis of novel genes
8.1. Introduction
8.2. Approach to identifying RFLPs
8.3. RFLP analysis of COL11A2
8.4. RFLP analysis of RING1
8.5. RFLP analysis of RING2
8.6. RFLP analysis of RING3
8.7. RFLP analysis of RING4
8.8. RFLP analysis of B51
8.9. Summary and discussion

9. Concluding discussion
9.1. Advances in MHC mapping
9.2. Potential role of novel genes in class II-associated phenotypes
9.3. Additional applications of probes generated in this study
9.4. More new genes in the class II region?
9.5. Function of the MHC gene cluster

10. References

Appendices
A. Cosmid clones
B. cDNA clones
C. Gene symbols and database accession numbers

FIGURES

1.1. Position of the MHC on chromosome 6
1.2. Schematic illustration of antigen presentation
1.3. Schematic structures of class I and class II molecules
1.4. Atomic structures of class I and class II molecules
1.5. Molecular genetic maps of the MHC
1.6. Evolutionary tree for the class II genes
1.7. Organisation of murine and human MHC regions

3.1. Diagram of PFGE apparatus
3.2. PFGE mapping of the class II region
3.3. PFGE mapping of the class II region
3.4. PFGE mapping of the class II region
3.5. Physical map of the class II region showing clusters of rare-cutter sites

4.1. Generalised structure of a fibrillar collagen molecule
4.2. Mapping of COL11A2 by in situ hybridisation
4.3. Mapping of COL11A2 using somatic cell hybrids
4.4. PFGE mapping of COL11A2
4.5. Cosmid clones between COL11A2 and the DP subregion

5.1. Cosmid clones in the region centromeric of COL11A2
5.2. Detailed map of the new cosmid clones
5.3. Determination of the methylation status of the cloned rare-cutter sites in the genome
5.4. Determination of the methylation status of the cloned rare-cutter sites in the genome
5.5. Construction of a rare-cutter jumping library
5.6. Restriction maps of jumping clone X]2 and cosmid HPB.ALL 71
5.7. Determination of the methylation status of the cloned rare-cutter sites in the genome
5.8. Overview of region cloned by jumping
5.9. Map of overlapping cosmids in the DQ subregion
5.10. Determination of the methylation status of the cloned rare-cutter sites in the genome
5.11. Zoo blot analysis of genomic fragment 33X1
5.12. Northern blot analysis of genomic fragment 33X1
5.13. Novel genes in clusters 1, 2 and 3
5.14. Northern blot analyses of RING1-5
5.15. Mapping of the DNA gene
5.16. Physical map of the class II region showing the positions of RING1-5 and COL11A2

6.1. Nucleotide and amino acid sequence of RING1
6.2. Alignment of proteins sharing the RING1 cysteine-histidine motif
6.3. Hypothetical structure of the RING1 cysteine-histidine motif
6.4. Nucleotide and amino acid sequence of RING2
6.5. Restriction map of RING3 cDNA clones
6.6. Nucleotide sequence of RING3
6.7. Alignment of the RING3 amino acid sequence with fsh
6.8. Nucleotide sequence of RING4
6.9. Hydropathicity plot of the RING4 protein product
6.10. Alignment of the RING4 amino acid sequence with members of the ABC superfamily
6.11. Schematic diagram of ABC structure
6.12. Alignment of the RING5 nucleotide and amino acid sequence with KE4

7.1. Molecular map of the proximal region of the mouse MHC
7.2. Determination of the positions of sequences homologous to the human genes RING1-5 and COL11A2 in the mouse MHC
7.3. Determination of the positions of sequences homologous to the mouse genes KE4 and KE5 in the human MHC
7.4. Restriction map of cosmid HPB.ALL 51
7.5. PFGE mapping of B51
7.6. Physical map of the proximal end of the human MHC region
7.7. Comparative map of the proximal regions of the mouse and human MHCs

A.1. Restriction map of the cosmid vector cos202
B.1. Restriction map of the cDNA vector CDM8

TABLES

1.1. Serologically defined HLA specificities
1.2. Associations between the MHC and diseases

3.1. Distribution of rare-cutter sites in human DNA
3.2. PFGE fragments detected by class II probes

4.1. PFGE fragments detected by COL11A2 and DPB1

5.1. PFGE fragments detected by COL11A2 and 33X1
5.2. Zoo blot results obtained with cosmid fragments
5.3. cDNA clones isolated with cosmid fragments
5.4. Expression patterns of RING1-5

6.1. Homologues of RING1

7.1. Genomic probes for the mouse genes KE2-5
7.2. PFGE fragments detected by B51 and 33X1

8.1. RFLP analysis of COL11A2
8.2. RFLP analysis of RING1 (CEM15)
8.3. RFLP analysis of RING1 (33X1)
8.4. RFLP analysis of RING2
8.5. RFLP analysis of RING3
8.6. RFLP analysis of RING4
8.7. RFLP analysis of B51

A.1. Orientation of cosmid clones
B.1. Orientation of cDNA clones

1. The major histocompatibility complex

The major histocompatibility complex (MHC) is one of the best characterised regions of the human genome. Situated on chromosome 6, region p21.3, it spans about 4Mbp of DNA and contains over 70 genes. Three features of the MHC in particular have stimulated the enormous amount of work on this gene cluster. First, it encodes molecules which play a central role in the regulation of the immune response. Second, it is associated with numerous diseases. Third, it represents 1/750th of the human genetic material, and as such is a paradigm for the detailed molecular genetic organisation of the genome.
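The 1/750 figure follows from simple arithmetic. Taking the haploid human genome as roughly 3×10^9 bp (an approximate value assumed here, not stated in the text) and the MHC as about 4Mbp, as quoted above:

```latex
\frac{\text{size of MHC}}{\text{size of haploid genome}}
  \approx \frac{4 \times 10^{6}\ \text{bp}}{3 \times 10^{9}\ \text{bp}}
  = \frac{1}{750}
```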

1.1. Classical studies of the MHC

Many of the MHC loci were discovered by chance and were studied for several years in the absence of any knowledge of their true function. This section summarises experiments in both human and murine systems which led to the discovery of the MHC.

1.1.1. Class I molecules
Some of the first experiments which identified a phenotype controlled by the MHC were performed seventy-five years ago, when it was shown that the ability to transplant tumours successfully between mice was genetically determined, being dependent on the strains selected as donor and host (Little and Tyzzer, 1916; Klein, 1986). Much later it was shown that the outcome of tissue transplants segregated with a serologically defined blood antigen, H(histocompatibility)-2 (Gorer, 1937; Gorer et al., 1948). More refined genetic and serological analyses revealed that the situation was more complicated, with two segregant allelic series of antigens, H-2K and H-2D, playing a major role in determining the success of transplants (Amos et al., 1955). In humans, the presence of histocompatibility antigens was demonstrated when it was discovered that the sera of multiply transfused patients and multiparous women contained antibodies which were capable of agglutinating the leukocytes of some donors but not others (Dausset, 1958; Payne and Rolfs, 1958; van Rood et al., 1958). The complex patterns of reactivity of these sera with large panels of donors were statistically analysed to define the first genetically controlled human leukocyte antigens (HLA; van Rood and van Leeuwen, 1963; Payne et al., 1964). Initially, two distinct loci, HLA-A and HLA-B, were found to control the expression of HLA molecules. Family and population studies soon demonstrated that these loci were closely linked (Ceppelini et al., 1967). Later, a third closely linked locus, HLA-C, was also described (Svejgaard et al., 1973). Studies of the effects of matching at the HLA-A, -B and -C loci on the survival of human tissue transplants suggested that the HLA antigens were analogous to the products of the histocompatibility loci (H-2D and H-2K) which had previously been defined in mice. These molecules were classified together as the classical transplantation antigens, or class I antigens. The finding that humans and mice (and other species) had a gene cluster which played an important role in the survival of tissue transplants gave rise to the concept of the major histocompatibility complex.

1.1.2. Class II molecules
Evidence for a novel MHC-encoded function in man came from the finding that lymphocytes from different individuals proliferated strongly when cultured together in vitro (Bach and Hirschorn, 1964). The locus responsible for this mixed lymphocyte reaction (MLR) was closely linked to those encoding the classical transplantation antigens, but was distinct from them because the MLR proceeded even when the lymphocyte donors were matched for HLA-A, -B and -C (Yunis and Amos, 1971). The HLA-D region was thus defined. Later it became apparent from more detailed serological analysis that the HLA-D region contained two loci, HLA-DR and HLA-DQ (Tosi et al., 1978). Finally, it was found that an MLR occurred even between lymphocytes matched for DR and DQ, and that there was a third D-region locus, HLA-DP (Shaw et al., 1980).

In mice, evidence for a new MHC-linked function came from studies of the ability of different inbred strains to mount an immune response to synthetic antigens. The 'immune response' (Ir) genes controlling this phenotype were shown to map to a region of the mouse MHC distinct from those encoding the H-2K and H-2D molecules (McDevitt et al., 1972). Two Ir molecules, I-E and I-A, were defined. Later it was shown that the mouse Ir and the human D-region molecules were related, and they were termed class II antigens.

1.1.3. Class III molecules
The first gene other than a class I or class II gene was mapped to the MHC when it was discovered that mouse serum contained a serologically defined factor which was encoded in the region between the H-2K and H-2D loci (Shreffler and Owen, 1963; Shreffler and David, 1972). Later it was demonstrated that this factor was the complement component C4 (Meo et al., 1975). Similarly, in man it was found that two serologically defined serum proteins were encoded by the MHC, and later it was discovered that these proteins were the products of the two C4 loci (O'Neill et al., 1978). The genes for complement component C2 and factor B (Bf) were also found to lie in the interval between HLA-D and HLA-B (Fu et al., 1974; Lamm et al., 1976). The genes encoding C4, C2 and Bf were inseparable from one another by recombination (Weitkamp and Lamm, 1982). The interval between the class II region and the class I region which encoded the complement components was designated the class III region.

1.1.4. Mapping of the classical MHC loci
The polymorphism of the MHC molecules facilitated the mapping of the classical human MHC loci by recombination analysis, revealing the gene order DP-[DQ, DR]-[C4, C2, Bf]-B-C-A, where loci in square brackets could not be ordered relative to one another (Figure 1.1.; Weitkamp and Lamm, 1982).

The chromosomal location of the MHC was deduced from a variety of approaches. For example, HLA typing of the family of an individual with a cytologically detectable translocation breakpoint in region 21 on the short arm of chromosome 6 revealed that the breakpoint had occurred in the middle of the MHC region, thus positioning the MHC at 6p21 (Berger et al., 1979). More recently, HLA typing of γ-radiation-induced MHC-loss mutant cell lines, coupled with high-resolution cytogenetic karyotyping, facilitated the mapping of the MHC to the distal portion of band 6p21.3 (Spring et al., 1985). From recombination mapping of the MHC loci relative to other polymorphic markers on chromosome 6, the orientation of the complex on the short arm and its position relative to these markers were determined. These studies revealed that the DP locus was closest to the centromere (Figure 1.1.; Weitkamp and Lamm, 1982).

[Figure 1.1: map of human chromosome 6 showing HLA-A, HLA-C, HLA-B, Bf, C2, C4 (C4F, C4S) and HLA-D/DR, with the linked markers GLO, PGM3, ME1, SOD2 and SCA1]
Figure 1.1. Position of the classical MHC loci relative to one another and to other polymorphic markers on human chromosome 6 (from Weitkamp and Lamm, 1982).

1.2. Function of class I and class II gene products

The class I molecules were initially defined as transplantation antigens (section 1.1.1.). The class II molecules were initially defined in man as being responsible for the mixed lymphocyte reaction and in mouse as the immune response genes (section 1.1.2.). However, it is now apparent that the class I and class II molecules are closely related, both structurally and functionally. By binding peptide fragments and presenting them on the cell surface, class I and class II molecules provide the context for recognition of peptide antigen by the T-cell receptor, and as such play a central role in the control of the immune response. In general, class I molecules present peptides derived from endogenous proteins to CD8+ cytotoxic T-lymphocytes while class II molecules present peptides derived from exogenous proteins to CD4+ helper T-lymphocytes (Figure 1.2.). The interaction between the MHC/antigen complex and the T-cell receptor results in activation of the T-cell. Stimulated helper T-lymphocytes secrete lymphokines which promote antibody production by B-cells and assist in the activation of cytotoxic T-cells. Stimulated cytotoxic T-cells lyse the cell presenting the foreign antigen (Klein, 1986).

Class I molecules are expressed by virtually all nucleated cells. Class II molecules are expressed constitutively by B-lymphocytes, macrophages and activated T-lymphocytes, and de novo expression can be induced by γ-interferon in a wide variety of cell types (Cresswell, 1987).

1.2.1. Polymorphism of class I and class II gene products
It was the polymorphism of the classical class I, class II and class III molecules which initially led to their discovery. In fact, the MHC is the most polymorphic gene system known.

[Figure 1.2: two panels, 'CD8+ cytotoxic T-cell' (TCR and CD8 on the T cell; peptide bound to MHC I on the APC) and 'CD4+ helper T-cell' (TCR and CD4 on the T cell; peptide bound to MHC II on the APC)]
Figure 1.2. Highly schematic illustration of antigen presentation by class I and class II molecules to cytotoxic and helper T-cells respectively. APC, antigen presenting cell; TCR, T-cell receptor. CD8 and CD4 are 'accessory' molecules on the surface of cytotoxic and helper T-cells respectively which are also involved in the interaction between the T-cell receptor and the MHC molecule.

The development of serological reagents to characterise the human and murine class I and class II antigens revealed that, for each locus, a remarkably large number of alleles was present in the population at a significant frequency (>1%). A list of the serologically defined human class I and class II specificities is shown in Table 1.1. (Bodmer et al., 1989b). More recently, the PCR-based cloning and sequencing of numerous class II genes has revealed even greater polymorphism at the DNA level (Bodmer et al., 1990). All the serologically detectable variation in class II DR molecules is attributable to polymorphism in the β-chain, because the α-chain is monomorphic. In contrast, both DQ chains are polymorphic. The DP α-chain is polymorphic, although markedly less so than the DP β-chain (Trowsdale et al., 1985).
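The ">1%" criterion can be made concrete with a short Python sketch. It is illustrative only: the function names, the input format (one typed allele per chromosome) and the rule that a locus needs at least two alleles above the cutoff to count as polymorphic are assumptions made here, not methods described in this thesis.

```python
from collections import Counter

# Cutoff quoted in the text: an allele is "present at a significant
# frequency" when it exceeds 1% of the typed chromosomes.
POLYMORPHISM_THRESHOLD = 0.01


def common_alleles(typed_alleles: list[str],
                   threshold: float = POLYMORPHISM_THRESHOLD) -> dict[str, float]:
    """Return {allele: frequency} for alleles whose frequency exceeds threshold.

    `typed_alleles` is a hypothetical list with one entry per typed
    chromosome (two entries per diploid individual).
    """
    counts = Counter(typed_alleles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total
            for allele, n in counts.items()
            if n / total > threshold}


def is_polymorphic(typed_alleles: list[str],
                   threshold: float = POLYMORPHISM_THRESHOLD) -> bool:
    """Assumed rule: polymorphic if at least two alleles exceed the threshold."""
    return len(common_alleles(typed_alleles, threshold)) >= 2


if __name__ == "__main__":
    # Hypothetical sample of 200 chromosomes, for illustration only.
    sample = ["A1"] * 50 + ["A2"] * 149 + ["A3"] * 1
    # A3 has frequency 1/200 = 0.5%, so it falls below the 1% cutoff.
    print(common_alleles(sample))
    print(is_polymorphic(sample))
```

Counting chromosomes rather than individuals keeps the frequencies comparable with allele frequencies reported for diploid populations.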

Although some species, such as the Syrian hamster, seem to display limited MHC polymorphism, the highly polymorphic human and mouse systems are probably more typical (Klein, 1986). Because the polymorphism is especially marked in those loci which encode the functional antigen presenting molecules, and tends to be particularly concentrated within the exons encoding the domains which form the antigen binding groove, it is most likely that it has been established by natural selection rather than through random drift (Klein, 1986; Parham et al., 1988; Marsh and Bodmer, 1989; Klein and Takahara, 1990).

The major selection pressure on MHC class I and class II polymorphism has probably been exerted through pathogens (Klein, 1986). As described below, class I and class II molecules are important in determining which antigens an individual can respond to in two ways: through selective binding of specific peptide fragments, and through influencing the development of the T-cell repertoire. An individual whose MHC molecules fail to stimulate an immune response against a critical antigenic determinant on a pathogen, either because the antigen does not bind to the presenting molecules or because the individual's T-cells cannot recognise the MHC/antigen complex, is at a selective disadvantage if infected by that pathogen. However, MHC polymorphism is of great advantage to the population as a whole because it ensures that other individuals will have combinations of class I and class II alleles which are effective in responding to the same pathogen (Benacerraf, 1981; Klein, 1986). Natural selection is probably responsible not only for maintaining the multiple alleles seen at each functional class I and class II locus but also for maintaining the multiple class I and class II loci. The potential of each haplotype to encode more than one class I or class II molecule increases the range of antigens to which an individual can respond (Doherty et al., 1976).

A: A1, A2, A3, A9, A10, A11, Aw19, A23 (9), A24 (9), A25 (10), A26 (10), A28, A29 (w19), A30 (w19), A31 (w19), A32 (w19), Aw33 (w19), Aw34 (10), Aw36, Aw43, Aw66 (10), Aw68 (28), Aw69 (28), Aw74 (w19)
B: B5, B7, B8, B12, B13, B14, B15, B16, B17, B18, B21, Bw22, B27, B35, B37, B38 (16), B39 (16), B40, Bw41, Bw42, B44 (12), B45 (12), Bw46, Bw47, Bw48, B49 (21), Bw50 (21), B51 (5), Bw52 (5), Bw53, Bw54 (w22), Bw55 (w22), Bw56 (w22), Bw57 (17), Bw58 (17), Bw59, Bw60 (40), Bw61 (40), Bw62 (15), Bw63 (15), Bw64 (14), Bw65 (14), Bw67, Bw70, Bw71 (w70), Bw72 (w70), Bw73, Bw75 (15), Bw76 (15), Bw77 (15), Bw4, Bw6
C: Cw1, Cw2, Cw3, Cw4, Cw5, Cw6, Cw7, Cw8, Cw9 (w3), Cw10 (w3), Cw11
D: Dw1, Dw2, Dw3, Dw4, Dw5, Dw6, Dw7, Dw8, Dw9, Dw10, Dw11 (w7), Dw12, Dw13, Dw14, Dw15, Dw16, Dw17 (w7), Dw18 (w6), Dw19 (w6), Dw20, Dw21, Dw22, Dw23, Dw24, Dw25, Dw26
DR: DR1, DR2, DR3, DR4, DR5, DRw6, DR7, DRw8, DR9, DRw10, DRw11 (5), DRw12, DRw13 (w6), DRw14 (w6), DRw15 (2), DRw16 (2), DRw17 (3), DRw18 (3), DRw52, DRw53
DQ: DQw1, DQw2, DQw3, DQw4, DQw5 (w1), DQw6 (w1), DQw7 (w3), DQw8 (w3), DQw9 (w3)
DP: DPw1, DPw2, DPw3, DPw4, DPw5, DPw6

Table 1.1. Complete listing of serologically defined HLA specificities from the Tenth International Histocompatibility Workshop (Bodmer et al., 1989b). The broad specificity from which a split derives is given in parentheses.

1.2.2. Functional significance of class I and class II polymorphism
A series of key experiments contributing to contemporary understanding of the function of class I and class II molecules was performed by Zinkernagel and Doherty (1975; Doherty et al., 1976). Different strains of mice were infected intracerebrally with lymphocytic choriomeningitis virus (LCMV). One week later the mice were killed and their T-lymphocytes were tested for the ability to lyse LCMV-infected fibroblasts from a mouse with the H-2k MHC haplotype. The target fibroblasts were killed only by T-cells derived from H-2k strains. It was later shown that all the mouse strains produced cytotoxic T-lymphocytes in response to the viral infection, but in each case these lymphocytes were only able to lyse target cells of the same (self) MHC haplotype. Specifically, the phenotype was determined by the class I H-2K gene. On the basis of this work it was proposed that the cytotoxic T-lymphocytes had dual specificity, simultaneously recognising the foreign viral antigen and the self class I molecule. Later it was demonstrated that guinea-pig helper T-lymphocytes were only activated by antigens presented by cells of the self class II haplotype (Sprent, 1978). These studies gave rise to the concept of MHC-restricted antigen presentation, in which cytotoxic T-cells and helper T-cells recognise foreign antigen only in association with self class I or class II molecules respectively.

At first it was not known whether T-cells recognised the antigen and the MHC molecule separately, with different receptors, or together, with the same receptor, or what form the antigen was in. Over the subsequent years it became apparent that T-cells had a single receptor which recognised a single complex of MHC molecule with antigen in the form of a peptide about 5-10 residues in length (Figure 1.2.; Owen and Crumpton, 1987; Davis and Bjorkman, 1988). Proteins from which MHC-presented antigens are derived are now known to be processed by partial proteolysis before presentation (Townsend and Bodmer, 1989).

Classical in vivo studies of the control of the immune response had already indicated that class II molecules could present short synthetic peptides (Benacerraf, 1981) and it was later shown that peptides could bind directly to purified class II molecules in vitro (Babbitt et al., 1985; Buus et al., 1986). Similarly it was shown that the antigen presented by class I molecules could be mimicked by short synthetic peptides (Townsend et al., 1986) and that class I molecules purified from cells were bound to peptides of 8-9 residues in length (Elliott et al., 1990).

Classical studies on the control of the immune response in mice had indicated that the MHC was responsible for determining whether or not an immune response could be mounted against a given peptide (Benacerraf, 1981). Later it was shown that the ability to mount an immune response correlated directly with the affinity of purified MHC molecules for that peptide (Babbitt et al., 1985; Buus et al., 1986). Thus, MHC molecules dictate, by their ability to selectively bind amino acid sequences from a protein antigen, whether a T-cell response can be generated against that protein antigen (Benacerraf, 1981).

A second mechanism by which class I and class II molecules determine the specificity of an individual's immune response is through selection of the T-cell repertoire during development (Schwartz, 1989; Marrack and Kappler, 1988). This process takes place in the thymus and involves both positive and negative selection of maturing T-lymphocytes. The positive selection process is poorly understood but is thought to involve recognition by the T-cell receptor of thymically expressed self class I or class II molecules; as a result, only those T-cells bearing receptors which recognise self MHC molecules reach the pool of mature T-cells. Positive selection is the means by which all mature T-cells come to recognise self MHC molecules as restriction determinants for foreign antigens. Negative selection eliminates T-cells bearing receptors which have high affinity for self MHC plus self peptide, and which would thus be autoreactive in the periphery. There is good evidence for a role of class II molecules in negative selection: using monoclonal antibodies which recognise T-cell receptors containing the Vβ17a epitope, it was shown that T-cells carrying such receptors were selectively deleted in the thymuses of mice expressing the I-E antigen (Schwartz, 1989). Negative selection is a major mechanism through which tolerance to self proteins is established, and is thought to involve the recognition by maturing T-cells of thymically expressed MHC molecules presenting peptides from self proteins. However, this clonal deletion process does not remove all autoreactive T-cells, especially in the case of tissue-specific antigens not expressed in the thymus. Potentially autoreactive T-cell clones in the periphery may be controlled by additional mechanisms, such as suppressor T-cells, about which little is known (Schwartz, 1989).

1.2.3. Structure of class I and class II MHC molecules
Class I and class II molecules were initially characterised using standard biochemical techniques (Strominger, 1987). The functional molecules are cell surface glycoproteins. Both are heterodimers of α- and β-chains, as illustrated schematically in Figure 1.3. The mature class I or class II molecule has four extracellular domains, each of approximately ninety amino acids. For class I molecules, three of these domains are contained within the ~43kD α-chain, which is encoded within the MHC. The fourth domain is provided by β2-microglobulin (~12kD), which is encoded on human chromosome 15. In the case of class II molecules, two domains are contained within the ~34kD α-chain and two within the ~29kD β-chain, and both chains are encoded within the MHC.
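The chain and domain arrangement described above can be summarised as a small data structure. This is an illustrative Python sketch only: the class and field names are hypothetical, the domain labels (α1, α2, α3, β1, β2) follow the nomenclature used for the crystal and model structures, and the masses are the approximate values quoted in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """One polypeptide chain of an MHC molecule (hypothetical representation)."""
    name: str
    approx_kd: int                        # approximate mass quoted in the text, in kD
    extracellular_domains: tuple[str, ...]  # each domain roughly 90 amino acids
    encoded_in_mhc: bool


@dataclass(frozen=True)
class MHCMolecule:
    mhc_class: str
    chains: tuple[Chain, ...]

    def domains(self) -> list[str]:
        """All extracellular domains, listed chain by chain."""
        return [d for chain in self.chains for d in chain.extracellular_domains]

    def mhc_encoded_chains(self) -> list[str]:
        """Names of the chains whose genes lie within the MHC."""
        return [chain.name for chain in self.chains if chain.encoded_in_mhc]


CLASS_I = MHCMolecule("I", (
    Chain("alpha (heavy) chain", 43, ("alpha1", "alpha2", "alpha3"), True),
    # beta2-microglobulin is encoded on chromosome 15, outside the MHC
    Chain("beta2-microglobulin", 12, ("beta2m",), False),
))

CLASS_II = MHCMolecule("II", (
    Chain("alpha chain", 34, ("alpha1", "alpha2"), True),
    Chain("beta chain", 29, ("beta1", "beta2"), True),
))

if __name__ == "__main__":
    for molecule in (CLASS_I, CLASS_II):
        print(f"class {molecule.mhc_class}: "
              f"{len(molecule.domains())} extracellular domains; "
              f"MHC-encoded chains: {molecule.mhc_encoded_chains()}")
```

Treating β2-microglobulin as a separate chain flagged as not MHC-encoded keeps the shared "four extracellular domains, two chains" pattern visible while preserving the difference in where the genes lie.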

More recently, the crystal structures of two class I molecules, HLA-A2 and HLA-Aw68, have been determined (Figure 1.4.; Bjorkman et al., 1987a; Garrett et al., 1989). The membrane-proximal α3 domain and β2-microglobulin interact to form the base of the molecule, upon which the polymorphic α1 and α2 domains are supported. The α1 and α2 domains interact to form a cleft, the floor of which is composed of eight anti-parallel β-strands and the walls of which are formed by two α-helices. The crystal structure of a class II molecule has not yet been obtained, but a hypothetical structure has been modelled on the basis of the amino acids conserved between class II and class I (Brown et al., 1988). In this model, the α1 and β1 domains interact to form a cleft similar to that observed in the class I structure (Figure 1.4.).

[Figure 1.3: panels 'MHC class I' (expressed by virtually all nucleated cells) and 'MHC class II' (restricted expression, e.g. macrophages, B-cells and activated T-cells), each with a transmembrane region (TM) and cytoplasmic tail]
Figure 1.3. Schematic illustration of the structure of class I and class II molecules. TM, transmembrane region. Glycosylation sites are indicated by open circles. Intra-domain disulphide bridges are indicated by S-S.

Figure 1.4. (a) Three-dimensional crystal structure of the class I molecule HLA-A2 (i) from the side, showing the antigen binding groove created by the α1 and α2 domains, and (ii) looking down on the groove from above (from Bjorkman et al., 1987a). (b) Predicted structure of the antigen binding site of a class II molecule (from Brown et al., 1988).

The groove identified in the membrane-distal surface of the class I and class II molecules is thought to be the site of peptide binding. The dimensions of the cleft are appropriate for the accommodation of a peptide of between 8 and 25 amino acids in length, depending on the conformation of the peptide. In both X-ray diffraction studies, unidentified material was observed to have co-crystallised in the cleft of the class I molecules, and this material could be peptide (Bjorkman et al., 1987a; Garrett et al., 1989). In addition, many of the polymorphic residues, and many of the residues predicted from functional studies to be important for peptide binding, were found to line the groove (Bjorkman et al., 1987b; Parham et al., 1988; Marsh and Bodmer, 1989). The shape of the groove created by the polymorphic residues in HLA-A2 was quite distinct from that in HLA-Aw68, providing a structural basis for the observed allelic specificity in peptide binding (Garrett et al., 1989).

1.2.4. Peptide binding to class I and class II molecules

Class I molecules, which bind and present antigens derived from intracellular proteins, are thought to become complexed with peptide in the endoplasmic reticulum (ER) or a closely associated compartment (Townsend et al., 1989; Yewdell and Bennink, 1990). On the basis of current evidence, it seems likely that peptides are generated by proteolytic activity in the cytoplasm and then transported into the lumen of the ER, where they bind to newly synthesised class I heavy chains (Townsend and Bodmer, 1989; Townsend et al., 1989). Interestingly, a gene thought to be important for the normal association of class I heavy chains with peptide probably lies within the MHC (section 1.6.2.). Formation of the heavy chain/peptide complex induces association with β2-microglobulin and subsequent export from the ER (Townsend et al., 1989). The class I/peptide complex moves rapidly from the ER to the Golgi apparatus, where modification of carbohydrate occurs, and then to the cell surface, where the bound antigen is presented to T-lymphocytes (Neefjes et al., 1990).

Class II molecules, which bind and present peptides derived from extracellular proteins, are thought to become complexed with peptide in an endosomal compartment (Yewdell and Bennink, 1990). Newly synthesised class II molecules become associated in the ER with a protein known as invariant chain, which plays a key role in distinguishing the exogenous antigen presentation pathway taken by class II molecules from the endogenous antigen presentation pathway taken by class I molecules. In vitro studies with purified invariant chain and class II molecules have shown that binding of invariant chain to class II molecules inhibits the binding of peptides (Teyton et al., 1990). This work led to the suggestion that association of invariant chain with class II could prevent binding of cytoplasmically derived peptides to class II in the ER. Pulse-chase and immuno-electron microscopy experiments have provided evidence that the class II/invariant chain complex proceeds rapidly from the ER to the Golgi apparatus but, instead of continuing directly to the plasma membrane, is diverted to a novel post-Golgi cytoplasmic compartment, where it is retained for several hours before mature class II molecules appear on the cell surface (Neefjes et al., 1990). Targeting of the invariant chain/class II complex to the post-Golgi vesicles depends on an amino acid sequence in the N-terminus of the invariant chain molecule, identified by deletion mapping (Bakke and Dobberstein, 1990). It has been proposed that the compartment to which the complex is targeted may be an endosomal vesicle in which invariant chain dissociates, facilitating binding of class II to peptides generated by proteolysis of endosomally imported exogenous proteins (Neefjes et al., 1990; Teyton et al., 1990). This is consistent with the observation that antigen presentation by class II molecules, but not by class I molecules, is sensitive to inhibitors which disrupt the endosomal trafficking and processing pathway by which extracellular proteins are taken into the cell and degraded (Yewdell and Bennink, 1990).

1.3. Molecular genetics of the MHC

1.3.1. Application of molecular cloning and mapping techniques to the MHC region

With the advent of molecular cloning technology, it soon became apparent that the complexity of the MHC was much greater than had been previously indicated by the classical studies. The power of this technology is illustrated by comparing the map in Figure 1.1., which summarises the classical studies, to that in Figure 1.5.b., which summarises ten years of molecular studies on the MHC region. The MHC provides an excellent example of the way in which a variety of cloning and mapping methods can be used to characterise in detail the molecular genetic organisation of a region of the mammalian genome.

The first important advance made possible by cloning technology was the isolation of probes for the classically defined MHC genes. These facilitated the sequencing, and analysis of gene organisation, of individual loci. In addition, hybridisation of these probes to total human genomic DNA at reduced stringency revealed that there were many other related sequences in the human genome. For example, class I gene probes detected numerous genomic fragments, which are now known to correspond to seventeen class I genes, whereas only three class I loci had been detected by serological methods. Similarly, class II α- and β-chain probes revealed class II genes in addition to those encoding the serologically defined DP, DQ and DR antigens. These class I- and class II-related genes were isolated by exhaustive screening of genomic and cDNA libraries with probes for the classical MHC genes. All these cross-hybridising human MHC genes have now been isolated, but additional class I- and class II-related sequences may exist which have diverged too greatly to be detected by this approach.

Initially, somatic cell hybrids containing fragments of chromosome 6 and γ-irradiation-induced mutants with deletions in chromosome 6 were used to show that these related genes also mapped to the MHC region. In some cases the relative positions of genes in very close physical proximity, such as the four genes in the DP subregion, could be determined by cosmid cloning. However, the most important advance in understanding the organisation of the region came with the development of the powerful long-range physical mapping technique of pulsed field gel electrophoresis (PFGE; Schwartz and Cantor, 1984; Carle and Olson, 1984). In conjunction with the numerous MHC probes obtained from gene cloning studies, PFGE allowed the accurate mapping of genes and subregions which had not previously been linked on overlapping genomic clones.


When the experiments described in this thesis were started, most of the known MHC genes had been isolated on single genomic clones or on small contigs extending over tens of kilobases which were unlinked to one another (Figure 1.5.a). The PFGE maps revealed large gaps between these cloned regions, which could in principle contain many additional genes. Recently, several laboratories have aimed to clone and analyse the DNA in these intervals, using a 'reverse genetics' approach to detect genes in genomic clones without any knowledge of the functions those genes might encode. The application of reverse genetics techniques has resulted in a remarkable increase in our understanding of the molecular organisation of the MHC over the last three years (compare Figures 1.5.a. and 1.5.b.).

Details of the molecular genetic organisation of the different MHC regions, as deduced from the application of the approaches described above, are described in the following sections.

1.3.2. The class I region

The class I genes are found in a region spanning approximately 2Mbp at the telomeric end of the MHC. The map order of the classical class I loci, HLA-A, -B and -C, as determined by recombination analysis, has been confirmed by PFGE mapping. Thus, HLA-C is 130kb distal to HLA-B and just over 1000kb proximal to HLA-A (Carroll et al., 1987; Pontarotti et al., 1988; Ragoussis et al., 1989).

Hybridisation of class I gene probes to Southern blots of total human genomic DNA revealed the presence of numerous other class I-related sequences (Orr and DeMars, 1983). Fourteen non-A-B-C class I gene sequences, accounting for all the cross-hybridising bands seen on Southern blots, were isolated from genomic libraries. Sequencing studies showed that three of these were intact genes while the others were pseudogenes (Koller et al., 1989). The three intact genes are expressed, give protein products, and have been designated HLA-E (Koller et al., 1988), HLA-F (Geraghty et al., 1990) and HLA-G (Kovats et al., 1990). However, the functions of these non-classical class I genes are as yet unknown. Using a panel of mutant cell lines with deletions in different regions of the MHC, it was possible to demonstrate that the non-A-B-C sequences mapped to the class I region. Most of the sequences mapped close to, or between, the HLA-A, -B and -C genes, but recombination analysis showed that four (including HLA-F and -G) mapped 8cM telomeric of HLA-A, while a fifth mapped a further 2cM telomeric (Koller et al., 1989).

Another intact non-A-B-C class I gene, cdall, which maps within 50kb of HLA-A, has been described by Ragoussis et al. (1989). The relationship of this gene to those described by Koller et al. (1989) is currently unclear. In a PFGE mapping study using a class I probe at reduced stringency, all the class I-related genes were shown to lie within a region spanning a total of 2Mbp (Ragoussis et al., 1989; Figure 1.5.b.). In the human genome, the average frequency of recombination between genes is such that a genetic distance of 1cM is roughly equivalent to a physical distance of 1Mbp. This relationship holds well in the proximal portion of the class I region, where the genetic distance between HLA-B and HLA-A is roughly 1cM and the physical distance is just over 1Mbp. However, the more distal class I genes map within 1Mbp of HLA-A and yet have a genetic separation of up to 10cM (Koller et al., 1989; Ragoussis et al., 1989). This provides evidence for an unusually high frequency of recombination between HLA-A and the more distal class I genes.
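The comparison above amounts to dividing genetic distance by physical distance to obtain a local recombination rate, which can then be set against the genome-wide rule of thumb of roughly 1cM per Mbp. The short Python sketch below does this for the two intervals discussed, using rounded values taken from the text (about 1cM over just over 1Mbp for HLA-B to HLA-A, and up to 10cM within 1Mbp for HLA-A to the most distal class I genes). The function name and the exact rounded figures are illustrative assumptions, not published measurements.

```python
# Genome-wide rule of thumb quoted in the text: 1 cM corresponds to ~1 Mbp.
GENOME_AVERAGE_CM_PER_MBP = 1.0


def recombination_rate(genetic_cm: float, physical_mbp: float) -> float:
    """Return the local recombination rate in cM per Mbp (hypothetical helper)."""
    if physical_mbp <= 0:
        raise ValueError("physical distance must be positive")
    return genetic_cm / physical_mbp


# Rounded, illustrative intervals taken from the text: (genetic cM, physical Mbp).
intervals = {
    "HLA-B to HLA-A": (1.0, 1.1),           # ~1 cM over just over 1 Mbp
    "HLA-A to distal class I": (10.0, 1.0),  # up to 10 cM within 1 Mbp; 1 Mbp is an
                                             # upper bound, so this rate is a lower bound
}

for name, (cm, mbp) in intervals.items():
    rate = recombination_rate(cm, mbp)
    fold = rate / GENOME_AVERAGE_CM_PER_MBP
    print(f"{name}: {rate:.1f} cM/Mbp ({fold:.1f}x genome average)")
# prints:
# HLA-B to HLA-A: 0.9 cM/Mbp (0.9x genome average)
# HLA-A to distal class I: 10.0 cM/Mbp (10.0x genome average)
```

On these figures the HLA-B to HLA-A interval recombines at close to the genome average, whereas the distal interval recombines at least tenfold more often per unit of DNA.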

1.3.3. The class II region

The class II genes are found in a region at the centromeric end of the MHC which, from PFGE mapping, spans approximately 1Mbp (Hardy et al., 1986; Ragoussis et al., 1989; Dunham et al., 1989). The genes which encode the α and β chains of the serologically defined class II antigens DP, DQ and DR are localised in discrete genetic subregions. In addition, this region contains class II pseudogenes and genes which are apparently intact but for which no protein product has yet been identified.

The first class II cDNA clones isolated were for the DRA gene. Polysomes from a B-lymphoblastoid cell line were selected with a monoclonal antibody against the DRα chain, and the mRNA from the antibody-bound polysomes was used as a template to make cDNA (Korman et al., 1982a). The cloning of DRA facilitated the cloning of other class II α-chain genes by hybridisation of the DRA probe to genomic or cDNA libraries at low stringency (Spielman et al., 1984; Auffray et al., 1984). Similar approaches were taken to isolate DRB and DQB1 gene clones (Long et al., 1983). Again, these were used to screen libraries at reduced stringency to obtain clones for the other class II β-chain genes. To date the class II region contains 6 α-chain genes and between 7 and 11 β-chain genes, the exact number depending on the haplotype.

The DP subregion

The DP subregion spans 70kb at the proximal end of the class II region and contains two α-chain genes (DPA1 and DPA2) and two β-chain genes (DPB1 and DPB2), which are organised as shown in Figures 1.5 and 4.5. (Trowsdale et al., 1984; Servenius et al., 1984; Okada et al., 1985a). The expressed DP molecule has been shown by transfection studies to be the product of the DPA1 and DPB1 genes (Okada et al., 1985a). The DPA2 and DPB2 genes, in contrast, are non-transcribed pseudogenes, containing frame-shift mutations and defective splice junctions which would prevent the expression of a functional protein product (Gustafsson et al., 1987).

The DQ subregion

The DQ subregion contains five genes, as shown in Figure 1.5.b. The DQA1/DQB1 gene pair and the DQA2/DQB2 (formerly DXα/DXβ) gene pair were initially isolated on two unlinked sets of overlapping genomic clones (Okada et al., 1985b; Auffray et al., 1984; Jonsson et al., 1987). The two gene pairs are highly related at the nucleotide sequence level. 15kb distal to the DQA2 gene is a truncated pseudogene, DQB3 (formerly DVβ), which apparently does not contain the exons encoding the signal sequence or β2 domain (Ando et al., 1989). Sequencing of the remaining exons revealed that DQB3 is more closely related to DQB1 and DQB2 than to other class II β-chain genes. The entire DQ subregion has now been cloned on overlapping cosmids, revealing that the DQA1/DQB1 and DQA2/DQB2 gene pairs are 75kb apart (Blanck and Strominger, 1988). The DQA1 and DQB1 genes are fully functional and encode the serologically defined DQ antigen. Although the DQA2 and DQB2 genes are intact and do not appear to contain any deleterious mutations which would prevent their expression at the RNA or protein level, transcription of these loci has never been detected (Auffray et al., 1987).

The DR subregion

The DR subregion is the only class II subregion known to vary in gene number between individuals. All haplotypes carry a single α-chain gene, DRA, but the number of β-chain genes is variable. DR4 haplotypes, for example, carry four DRB genes, while DR1 and DR8 haplotypes are thought to have only one (Bohme et al., 1985; Andersson et al., 1987). The DR subregions of the DR2, DR3 and DR4 haplotypes have been analysed in detail by cosmid cloning, although in none of these studies were all the DR genes linked on a single contig (Rollini et al., 1985; Spies et al., 1985; Andersson et al., 1987; Kawai et al., 1989). The three DRB genes of the DR2 haplotype, DRB1, DRB2 and DRB3, have been linked on overlapping cosmid clones spanning 80kb, but were not linked to the DRA gene (Kawai et al., 1989). The DRB2 gene lacked an exon containing the 5' untranslated region and did not give a cell-surface product when co-transfected with the DRA gene into mouse L-cells; it is therefore probably a pseudogene. The DRB1 and DRB3 genes, in contrast, were expressed with DRA in transfection studies to give functional class II molecules (Kawai et al., 1989). The DR subregions of the DR3 and DR4 haplotypes also contained two functional DRB genes, along with one or two pseudogenes (Rollini et al., 1985; Spies et al., 1985; Andersson et al., 1987).

The DN subregion

The DN subregion is currently defined by a single gene, DNA (formerly DZα). The DNA gene was isolated by screening a human cosmid library with a DRA gene probe at reduced stringency (Spielman et al., 1984). Physical mapping has placed this gene between the DP subregion and the DOB gene (Inoko et al., 1989), but it has not been accurately positioned within this interval or oriented on the chromosome. The nucleotide sequence of a DNA genomic clone revealed that the DNA gene has a similar organisation to that of other class II α-chain genes, and that it does not contain any mutations which would prevent transcription (Trowsdale and Kelly, 1985). Indeed, the DNA gene is transcribed in B-lymphocytes, giving a major transcript of 3.5kb and a minor transcript of 1.1kb (Kelly, 1988). The major transcript is thought to result from inefficient 3' RNA processing directed by the unusual polyadenylation signal (ACTAAA) found in the DNA gene (Trowsdale and Kelly, 1985). The minor transcript, however, is processed and polyadenylated normally and does not appear to contain any deleterious mutations that would prevent translation; nevertheless, no protein product has been described (Young and Trowsdale, 1990). It has been speculated that a DNA gene product might pair with a DOB gene product to form a functional class II molecule, but this is unlikely because transcription of the DNA and DOB genes is not co-ordinately regulated (Tonnelle et al., 1985).

The DO subregion

The DO subregion is currently defined by a single gene, DOB. The DOB gene was first identified by screening a human cDNA library with a mixture of DRB and DPB probes at reduced stringency (Tonnelle et al., 1985). In an independent study, DOB was isolated from a human genomic library using a probe for the mouse class II gene Ob (formerly Aβ2) at low stringency (Servenius et al., 1987). The DOB gene is transcribed at low levels in B-lymphocytes, but no protein product has been reported. Like the DNA gene, DOB lies isolated within the class II region, and no closely physically linked α-chain gene has yet been described. As mentioned above, it is unlikely that any DOB product would pair with an α-chain from the DP, DQ, DR or DN subregions, because DOB gene expression is not co-ordinately regulated with that of the other class II loci. Specifically, DOB is transcribed only at low levels in B-lymphocytes, and transcription of DOB in fibroblasts is not inducible by γ-interferon (Tonnelle et al., 1985). Thus, it is possible that a DOA gene remains to be discovered, and that a DO molecule exists with a function distinct from that of the classical class II molecules. The DOB gene was mapped distal of the DQ subregion (Hardy et al., 1986) and has recently been accurately positioned 45kb proximal to DQB2 by cosmid walking (Blanck and Strominger, 1988; Figures 1.5. and 5.9.).

1.3.4. The class III region

The class III region, which is bounded at the centromeric end by the class II region and at the telomeric end by the class I region, has been shown by PFGE mapping to span about 1.1Mbp (Figure 1.5.; Dunham et al., 1987; Carroll et al., 1987; Ragoussis et al., 1989; Dunham et al., 1990).

With the advent of molecular cloning techniques it became possible to clone the class III complement genes which had been identified by classical studies. The genes for C4, C2 and factor B were all cloned by designing synthetic oligonucleotides based on the known protein sequences and using these to screen cDNA libraries. cDNA probes were then used to screen cosmid libraries, and overlapping clones were obtained which contained these genes (Carroll et al., 1984). The cloned complement gene subregion contained C2, Bf and two C4 genes (C4A and C4B) within 120kb of DNA. The C2 and Bf genes were separated by less than 500bp, while C4A and C4B were about 10kb apart. The products of these genes are serum glycoproteins which are components of the classical (C4 and C2) and alternative (factor B) complement cascades. These cascades ultimately result in the formation of a membrane attack complex, a transmembrane channel which causes lysis and death of the cells of invading micro-organisms (Reid, 1988). The classical pathway is activated by antibody bound to antigen and is therefore a major effector of the humoral immune response. The alternative pathway can be activated directly by the cell surface of the pathogen.

Linkage analysis in pedigrees affected by congenital adrenal hyperplasia caused by steroid 21-hydroxylase deficiency had mapped the defective locus to the class III region. Prompted by this, it was soon demonstrated that the region encompassing the complement genes also contained two genes for steroid 21-hydroxylase (21A and 21B in Figure 1.5.), one just proximal of each of the C4 genes (Carroll et al., 1985; White et al., 1985). 21A is a pseudogene. Deleterious mutations in the functional steroid 21-hydroxylase gene were shown to be responsible for congenital adrenal hyperplasia (White et al., 1985). Deletions and duplications of one or other [21-hydroxylase/C4] gene unit are not uncommon in the population (Carroll and Alper, 1987).

The genes for the related lymphokines tumour necrosis factor (TNF) α and β (Tnfa and Tnfb) were localised to the MHC by analysis of MHC deletion mutants (Spies et al., 1986) and then to the class III region by PFGE mapping (Inoko and Trowsdale, 1987; Dunham et al., 1987; Carroll et al., 1987).

PFGE mapping revealed that the distance between DRA and the complement gene cluster was about 390kb, while the distance between the complement genes and HLA-B was about 580kb (Dunham et al., 1987; Carroll et al., 1987; Ragoussis et al., 1989; Dunham et al., 1990). These intervals were clearly large enough to accommodate numerous genes in addition to the Tnf genes. Intensive efforts to clone the entire class III region were initiated, and these have resulted in over 900kb of DNA being cloned in overlapping cosmids (Spies et al., 1989a, b; Sargent et al., 1989a, b; Kendall et al., 1990; Spies et al., 1990). The remaining interval between the proximal end of the cosmid contig and the DRA gene has now been covered by yeast artificial chromosome clones (Ragoussis et al., 1991b).

Analysis of the cloned regions for coding sequences, using reverse genetics techniques, led to the discovery of at least 26 novel genes in the class III region (Figure 1.5.b.; Levi-Strauss et al., 1988; Spies et al., 1989a, b; Sargent et al., 1989a, b; Morel et al., 1989; Milner and Campbell, 1990; Kendall et al., 1990; Spies et al., 1990). The total number is uncertain because there have been two very recent independent reports of novel genes between 21B and DRA, and it is not yet clear whether both groups have identified the same genes (Kendall et al., 1990; Spies et al., 1990). In Figure 1.5.b. the data of Kendall et al. (1990) are shown because this report described seven novel genes, whereas Spies et al. (1990) described only six. The density of genes in the class III region is remarkably high. In fact, one of the novel genes (OSG in Figure 1.5.b.) actually overlaps, on the opposite strand, with the 21B gene (Morel et al., 1989).

Three of the novel genes were shown by nucleotide sequencing to be members of the heat shock protein hsp70 multigene family (Sargent et al., 1989a; Milner and Campbell, 1990). Hsp70-1 and hsp70-2 are very closely related, encoding identical protein products. Hsp70-hom shares 90% identity with hsp70-1 but, unlike hsp70-1 and -2, expression of this gene is not heat-inducible.

The nucleotide sequences of a further four of these novel genes (RD, BAT2, BAT3 and OSG) have been published. The RD gene encodes a predicted protein product of 42kD, which contains a novel motif consisting of a reiterated arginine-aspartate dipeptide (Levi-Strauss et al., 1988). BAT2 and BAT3 (G2 and G3 in Figure 1.5.b.) encode proteins of 228 and 110kD respectively, both of which are unusually rich in proline (Banerji et al., 1990). The 'opposite strand gene', OSG, which overlaps with 21B, is expressed, like 21B, in steroidogenic adrenal tissue (Morel et al., 1989). This intriguing finding suggests a functional or regulatory relationship between 21B and OSG, but the partial sequence of OSG did not reveal what this might be. In addition to the sequence data from these human genes, a partial sequence has been determined for the mouse homologue of the B144 gene, but this did not match any sequences in the nucleotide sequence databases and the function of this gene is not known (Tsuge et al., 1987). No sequence data have yet been published for the other novel genes.

1.4. Evolution of the MHC

1.4.1. Evolutionary relationships between class I and class II genes

Once the genes encoding class I and class II molecules had been cloned, it became apparent from the conservation of gene organisation, and from identity at the nucleotide and predicted amino acid sequence levels, that they were related. In particular, there was significant homology between the class II α2 domain, the class II β2 domain and the class I α3 domain. These domains were also related at the amino acid sequence level to the conserved 'antibody fold' domain found in members of the immunoglobulin supergene family (Korman et al., 1982b; Hood et al., 1986). The common ancestry of class I and class II molecules is also reflected in their related functions and their related structures (Figures 1.2. and 1.4.). It is not clear when in evolutionary time MHC molecules arose. However, class I and class II genes have now been described in amphibians, birds, reptiles and fish as well as mammals, which implies that class I and class II genes evolved in their present form before the radiation of vertebrates, which took place about 400 million years ago (Kaufman et al., 1990; Hashimoto et al., 1990).

The ancestral class II and class I genes have clearly been duplicated many times to give the multiple loci seen in the class II and class I regions of the mammalian MHC. The organisation of the human class II region suggests that the subregions were generated from a primordial α/β gene pair through a series of duplication and divergence events (Klein, 1986; Bodmer et al., 1986).

Comparisons of the nucleotide and amino acid sequences of the class II α-chain genes and their protein products revealed that DPA1, DQA1, DRA and DNA are equally diverged from one another, suggesting that they arose by duplication at roughly the same time (Figure 1.6.; Auffray et al., 1984). A similar analysis of the β-chain genes reveals that DPB1, DQB1 and DRB are equally diverged from one another but that DOB is more distantly related, suggesting that the DOB gene may have arisen before the other β-chain genes in evolutionary time (Tonnelle et al., 1985). Sequence comparisons between genes within subregions reveal that additional duplications have probably occurred much more recently (Figure 1.6.). For example, the DQA1 and DQA2 genes are 99% identical in the exon encoding the α2 domain (Auffray et al., 1987). The duplication in the DP subregion probably occurred earlier than that in the DQ subregion, because the DPA2 and DPB2 genes are significantly diverged from DPA1 and DPB1. At the nucleotide level, the identity between DPA1 and DPA2 is 76%, while between DPB1 and DPB2 it is 86%. The greater divergence of DPA2 compared with DPB2 suggests that DPA2 became a pseudogene before DPB2 (Gustafsson et al., 1987).
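Identity figures such as the 76% and 86% quoted above are, in essence, the fraction of identical positions in an alignment of two sequences. A minimal Python sketch, assuming the sequences have already been aligned (gaps written as '-') and that columns containing a gap are simply excluded from the count, which is only one of several conventions in use. The function name and the short sequences are hypothetical and are not real DP sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns that contain no gap (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 17 ungapped columns, 3 mismatches.
gene_1 = "ATGGCCCTGAGAGCTGGC"
gene_2 = "ATGGCTCTGAGG-CTGGA"
print(f"{percent_identity(gene_1, gene_2):.1f}% identity")  # prints: 82.4% identity
```

In practice the alignment step itself, and the choice of how to treat gaps, can shift such figures by several percentage points, so identities are most informative when compared across gene pairs analysed in the same way.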

1.4.2. Organisation of the MHC region in different species

As the sequences of the class II genes of mouse and man were deduced, it became apparent that the subregions were homologous. Thus, the DRA and Ea genes are more closely related to one another than to the other loci, as are the DRB and Eb genes (Kaufman et al., 1984; Denaro et al., 1985). In the same way, the DQA and DQB genes are related to the mouse Aa and Ab genes respectively (Kaufman et al., 1984). The isolated mouse β-chain gene, Ob (formerly Aβ2), is homologous to the isolated human β-chain gene, DOB (Larhammar et al., 1985; Tonnelle et al., 1985; Servenius et al., 1987). Finally, the isolated mouse β-chain pseudogene Pb (formerly Aβ3) was found to be most related to the sequences of the two β-chain genes in the human DP subregion (Widera and Flavell, 1985). When the organisation of the genes in the human and mouse class II regions was deduced by cosmid cloning and physical mapping, it was revealed that the relative positions of the homologous subregions are conserved between the two species (Steinmetz et al., 1986; Hardy et al., 1986). These sequence comparison and gene mapping data have led to the hypothesis that the overall organisation of the class II region was established before the radiation of rodents and primates about 135 million years ago (Bodmer et al., 1986; Klein, 1986). Differences between the two class II regions, such as the duplication of the human DQA1/DQB1 gene pair to give DQA2 and DQB2, and the deletion from the mouse class II region of sequences homologous to DPA1 or DPA2, have most likely occurred more recently. In contrast to the distinct conservation of subregions seen in the mouse and human class II regions, the class II genes of the chicken MHC are more closely related to one another than they are to DR, DQ or DP. Thus, the multiple chicken class II genes probably arose independently by duplication from a primordial class II gene pair after the separation of the mammalian and avian lineages about 300 million years ago (Kroemer et al., 1990).

Figure 1.6. Evolutionary tree for the class II genes. HLA and Ig refer to the primordial MHC and immunoglobulin genes. ABC refers to the primordial class I gene. Dα and Dβ refer to the primordial class II α- and β-chain genes. Subsequent events involve pairwise combinations of α and β genes except where indicated. Approximate divergence times are indicated in millions of years (from Bodmer et al., 1986). DZ is now known as DN; DX is now known as DQ2.

As in the human MHC, the mouse class I and class II regions are separated by a class III region (Figure 1.7.). The class III regions of mouse and man seem to be conserved, although many of the most recently discovered genes in the human class III region have not yet been mapped in the mouse. Both class III regions contain genes for steroid 21-hydroxylase and the complement components C4, C2 and factor B; the organisation of these closely linked genes is similar in the two species (Chaplin, 1985). The human RD gene has a homologue in the analogous position in the mouse class III region (Levi-Strauss et al., 1988). The B144, Tnfa, Tnfb and BAT1 genes have recently been linked on overlapping cosmids in mouse, and their order is conserved relative to the human class III region (Wroblewski et al., 1990). A mouse hsp70 gene has been localised to the class III region by recombination analysis, but its position relative to other class III loci is not yet known (Gaskins et al., 1990).


The major difference in organisation between the human and mouse MHCs is in the class I region. The human MHC contains seventeen class I genes, while the mouse MHC contains between 26 and 33 class I genes depending on the haplotype (Flavell et al., 1986). As mentioned previously, the human class I genes all lie telomeric of the class III region (Figure 1.7.). Most mouse class I genes are also found in a region telomeric of the class III region. However, a pair of class I genes, K and K2, are present at the centromeric end of the class II region, about 70kb proximal to Pb (Figure 1.7.; Widera and Flavell, 1985; Steinmetz et al., 1986). These genes are proposed to have arisen by duplication in the Qa region of the mouse class I gene cluster, and to have subsequently moved to their current location by an intrachromosomal double crossover event (Bodmer, 1981; Weiss et al., 1984). This organisational change is believed to have occurred since the divergence of rodents from other mammals, because it has only been found in the MHCs of the closely related rat and mouse (Klein, 1986).

1.5. Clinical relevance of the class II region

The extreme polymorphism of the MHC genes and gene products has provided a powerful system of markers with which to test for associations between the MHC and disease. In population association studies, the frequency of a particular MHC allele is determined (using serological reagents or, more recently, RFLPs or sequence-specific oligonucleotides) in a group of individuals with a particular disease and in a group of matched healthy controls. The allele frequencies in the two groups are calculated and subjected to statistical tests to determine whether a given allele is significantly more or less frequent in patients than in controls. If a significant difference is found, that allele is considered to be associated with the disease.
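The comparison described above can be sketched in a few lines of Python, assuming the typing results have been tabulated as counts in a 2x2 table (patients versus controls, allele present versus absent). The Pearson chi-square test and Woolf's cross-product ratio used here are common choices for such data and for the relative risks quoted in Table 1.2, but they are assumptions of this sketch rather than the exact statistics used in every study cited; the counts are hypothetical.

```python
import math


def association_2x2(a: int, b: int, c: int, d: int) -> dict:
    """Summarise a patient/control 2x2 table for one MHC allele.

    a = patients carrying the allele    b = patients lacking it
    c = controls carrying the allele    d = controls lacking it
    """
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Woolf's cross-product ratio; infinite when no patient lacks the allele
    # (b == 0) or no control carries it (c == 0)
    relative_risk = (a * d) / (b * c) if b * c else math.inf
    return {"chi2": chi2, "p": p_value, "relative_risk": relative_risk}


# Hypothetical counts: 90 of 100 patients and 94 of 1000 controls carry the allele
result = association_2x2(90, 10, 94, 906)
print(f"chi2 = {result['chi2']:.1f}, p = {result['p']:.1e}, "
      f"relative risk = {result['relative_risk']:.1f}")
```

These hypothetical counts mirror the B27 frequencies for ankylosing spondylitis (90% of patients, 9.4% of controls), and the cross-product ratio of about 87 is close to the relative risk listed for that association. The ratio is undefined when every patient carries the allele, which is why an association such as narcolepsy with DR2 has no finite relative risk.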

This approach has revealed significant associations between the MHC and over 40 diseases (Table 1.2.; Tiwari and Terasaki, 1985). For many of these diseases the association is strongest with the class II region; examples include insulin-dependent diabetes mellitus (IDDM), rheumatoid arthritis, narcolepsy and Hodgkin's lymphoma. The associations are never complete, reflecting the fact that these diseases usually have complex genetics and that the development of the disease phenotype probably depends on a combination of environmental and genetic factors. Part of the genetic component is generally considered to be contributed by a 'disease susceptibility' gene in the class II region which predisposes an individual to a particular disease (Bell and Todd, 1989).

Interpretations of the results of disease association studies must take into account the phenomenon of linkage disequilibrium. Infrequent recombination between loci results in linkage disequilibrium between alleles at those loci in the population; that is, alleles at two loci are found together on the same haplotype more often than would be expected from the product of their individual frequencies. It is well documented that there is very strong linkage disequilibrium between genes in the MHC region (Klein, 1986). In the class II region, alleles at the DR and DQ loci are particularly strongly associated. DP alleles, in general, are not in strong linkage disequilibrium with DR and DQ alleles, which provides evidence for a hot spot of recombination in the proximal class II region, although significant disequilibrium between DP and DR/DQ is seen in some haplotypes (Rosenberg et al., 1989). The practical significance of linkage disequilibrium is that if a disease association is found with a particular class II allele, any gene (including known class II genes and, theoretically, novel genes) in linkage disequilibrium with that allele is a candidate for the disease susceptibility gene (Bell and Todd, 1989).

| Condition | HLA allele | Patients (%) | Controls (%) | Relative risk |
|---|---|---|---|---|
| Hodgkin's disease | A1 | 40 | 32.0 | 1.4 |
| Idiopathic hemochromatosis | A3 | 76 | 28.2 | 8.2 |
| | B14 | 16 | 3.8 | 4.2 |
| Behçet's disease | B5 | 41 | 10.1 | 6.3 |
| Congenital adrenal hyperplasia | B47 | 9 | 0.6 | 15.4 |
| Ankylosing spondylitis | B27 | 90 | 9.4 | 87.4 |
| Reiter's disease | B27 | 79 | 9.4 | 37.0 |
| Acute anterior uveitis | B27 | 52 | 9.4 | 10.4 |
| Subacute thyroiditis | B35 | 70 | 14.6 | 13.7 |
| Psoriasis vulgaris | Cw6 | 87 | 33.1 | 13.3 |
| Dermatitis herpetiformis | DR3 | 85 | 26.3 | 15.4 |
| Celiac disease | DR3 (DR7 also increased) | 79 | 26.3 | 10.8 |
| IgA deficiency in blood donors | DR3 (DR7 also increased) | 64 | 26.3 | 5.0 |
| Sicca syndrome | DR3 | 78 | 26.3 | 9.7 |
| Idiopathic Addison's disease | DR3 | 69 | 26.3 | 6.3 |
| Graves' disease | DR3 | 56 | 26.3 | 3.7 |
| Insulin-dependent diabetes mellitus | DR3 and/or DR4 | 91 | 57.3 | 7.9 |
| | DR2 | 10 | 30.5 | 0.2 |
| Myasthenia gravis | DR3 | 50 | 28.2 | 2.5 |
| Systemic lupus erythematosus | DR3 | 70 | 28.2 | 5.8 |
| Idiopathic membranous nephropathy | DR3 | 75 | 20.0 | 12.0 |
| Zwa-immunized mothers | DR3 | 95 | 15 | 113 |
| Narcolepsy | DR2 (B7 also increased) | 100 | 22 | - |
| Multiple sclerosis | DR2 | 59 | 25.8 | 4.1 |
| Optic neuritis | DR2 | 46 | 25.8 | 2.4 |
| C2 deficiency | DR2, B18 | - | - | - |
| Goodpasture's syndrome | DR2 | 88 | 32.0 | 15.9 |
| Rheumatoid arthritis | DR4 | 50 | 19.4 | 4.2 |
| Pemphigus (in Jews) | DR4 | 87 | 32.1 | 14.4 |
| IgA nephropathy | DR4 | 49 | 19.5 | 4.0 |
| Hydralazine-induced SLE | DR4 | 73 | 32.7 | 5.6 |
| Postpartum thyroiditis | DR4 | 72 | 32.2 | 5.3 |
| Hashimoto's thyroiditis | DR5 | 19 | 6.9 | 3.2 |
| Pernicious anemia | DR5 | 25 | 5.8 | 5.4 |
| Juvenile rheumatoid arthritis | DRw8 | 23 | 7.5 | 3.6 |
| Primary glomerulonephritis | C4B*2.9 | 25 | 1.5 | 22.0 |

Table 1.2. Associations between the MHC and diseases. The relative risk shows how many times more frequently the disease occurs amongst individuals with a particular MHC allele than amongst those without that allele (from Klein, 1986).
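Linkage disequilibrium, as defined above, is conventionally quantified by the coefficient D, the difference between the observed frequency of a two-locus haplotype and the product of the two allele frequencies, often normalised to D' by dividing by its maximum possible magnitude. The Python sketch below shows that calculation; D and D' are standard population-genetic measures but are not used elsewhere in this chapter, the function name is hypothetical, and the example frequencies are invented for illustration.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, D') for allele A at one locus and allele B at another.

    p_ab: observed frequency of haplotypes carrying both A and B
    p_a, p_b: population frequencies of A and B individually
    """
    # D: excess of the observed haplotype frequency over the frequency
    # expected if the two alleles associated at random (p_a * p_b)
    d = p_ab - p_a * p_b
    # D' divides D by the largest magnitude it could reach given the
    # allele frequencies; the sign of D is kept
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies: allele A at 40%, allele B at 50%, 30% of haplotypes carry both
d, d_prime = linkage_disequilibrium(0.30, 0.40, 0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}")
```

A positive D means the two alleles travel together more often than chance predicts, which is the situation in which a marker allele such as a DR type can stand in for an untyped susceptibility gene nearby.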

Many of the diseases associated with the class II region have an autoimmune pathology, and consequently most efforts to explain these associations have focussed on the genes encoding the classical class II antigens. Since the products of these genes clearly play an important role in the control of the immune response (section 1.2.2.), the classical class II genes are intuitively excellent candidates for autoimmune disease susceptibility genes. The sequences of numerous disease-associated class II alleles, and of alleles in linkage disequilibrium with these marker alleles, have been determined in an attempt to identify shared residues or epitopes in the encoded class II molecules which could explain why particular haplotypes predispose to disease (Bell and Todd, 1989). As a result of these studies, particular class II antigens have been strongly implicated in the mechanism of development of IDDM and rheumatoid arthritis (Todd et al., 1988; Nepom, 1990). Shared amino acids in these allelic products are proposed to confer common structural features which either fail to mediate the deletion of potentially autoreactive T-cells in the thymus or facilitate the presentation of critical autoantigens in the periphery (Bell and Todd, 1989).

There is, however, still no proof that the class II genes implicated by these studies explain the association of any disease with the class II region. It is therefore possible that the true autoimmune disease susceptibility genes are as yet undiscovered class II genes, or genes encoding accessory functions in antigen presentation (section 1.6.). Furthermore, not all class II-associated diseases have an autoimmune pathology. Almost 100% of individuals suffering from the sleep disorder narcolepsy have DR2 (DRw15), yet show no evidence of an autoimmune response (Aldrich, 1990). Other examples of non-autoimmune diseases associated with the class II region include Hodgkin's lymphoma, chronic lymphocytic leukaemia and acute non-lymphocytic leukaemia, all of which are associated with DP alleles (Bodmer et al., 1989a; Pawelec et al., 1989). In these cases the association with the class II region may be explained by the presence of novel genes with non-immunological functions. A precedent for this situation in the human MHC is the association of congenital adrenal hyperplasia with HLA-Bw47: the molecular basis of this disease proved to be a deletion in the functional steroid 21-hydroxylase gene, which was in linkage disequilibrium with HLA-Bw47 (White et al., 1985). A complete understanding of class II-disease associations will therefore depend on characterising all of the genes in this region and determining their functions.

1.6. Evidence for other genes in the class II region

1.6.1. Novel class II genes

Evidence for the existence of a novel human class II antigen has been reported by Carra and Accolla (1987), who isolated a monoclonal antibody which immunoprecipitated a class II molecule from lysates of B-lymphoblastoid cell lines after the lysates had been cleared of DR, DQ and DP molecules. The immunoprecipitated molecules contained α- and β-chains, as judged by SDS-polyacrylamide gel electrophoresis, but 2-D peptide mapping studies revealed that these were distinct from the DR, DQ and DP α- and β-chains present in the parental cell line. As yet, the genes encoding this novel antigen have not been identified.

It may be that the novel class II molecule described in this study is a heterodimer of the DNA gene product with that of a hitherto undiscovered DNB locus, or of the DOB gene product with that of a hitherto undiscovered DOA locus. As discussed in section 1.3.3., both the DNA and DOB genes are transcribed, and the nucleotide sequences of the corresponding cDNA clones did not reveal any deleterious mutations which would prevent translation (Tonnelle et al., 1985; Young and Trowsdale, 1990). DNA homologues have been described in whale and rabbit, and the rabbit gene is expressed, but the potential of these genes to encode a protein product has not yet been determined (Kulaga et al., 1987; Trowsdale et al., 1989). The mouse homologue of DOB, Ob, is also transcribed and potentially functional. Since DOB clearly arose before the major mammalian radiation, it seems likely that it would only have been maintained in a potentially functional form in both mouse and man if it does indeed encode a protein product. It is therefore possible that the human class II region contains functional DNB or DOA genes. Alternatively, the class II molecule described by Carra and Accolla (1987) may be the product of a completely novel class II α/β gene pair. Previous attempts to isolate novel α- and β-chain genes involved screening genomic or cDNA libraries with class II gene probes at reduced stringency, and such methods would not detect highly diverged class II genes. It should be noted that a novel class II gene would not necessarily be encoded within the class II region, although class II genes unlinked to the MHC have not yet been described in any species.

1.6.2. Class I-modifying locus

Evidence for a non-class II gene potentially mapping in the class II region has come from studies of the mutant human B-lymphoblastoid cell line LBL 721.174, which has a defect in the presentation of antigen to T-cells by class I molecules (DeMars et al., 1985; Cerundolo et al., 1990). Class I molecules present short peptide fragments derived from the degradation of intracellular proteins, such as viral proteins, to cytotoxic T-lymphocytes (CTL), which then become activated to kill the infected cell. The class I molecules in LBL 721.174 are functionally normal, because they could present exogenously added peptide fragments to CTL, leading to killing of the cells. However, virally infected LBL 721.174 cells were not killed, and their class I molecules were retained in the endoplasmic reticulum (ER), where antigen binding to class I molecules is thought to occur, instead of progressing to the cell surface. This phenotype has been interpreted as showing that LBL 721.174 has a defect in the transport of intracellularly derived peptides from their site of generation in the cytosol to their site of class I binding in the ER (Cerundolo et al., 1990). Understanding the molecular basis of the defect in this mutant will provide important clues to the way in which class I molecules become complexed with antigen, about which little is known. LBL 721.174 has a deletion spanning from the DPB2 gene to the complement gene cluster, and the gene responsible for the antigen presentation defect is therefore likely to map within this interval (Cerundolo et al., 1990; DeMars et al., 1985). A strikingly similar phenotype has also been described in the rat, and in this case the gene responsible for the defect in class I antigen presentation could be mapped to the class II region of the rat MHC (Livingstone et al., 1989). Like the mouse class II region (section 1.4.2.), the rat class II region is homologous to the human class II region, and the observation of Livingstone et al. (1989) therefore provides additional evidence for a class I-modifying locus in the human class II region.

1.6.3. LMP antigens

Evidence for novel genes in the mouse MHC was obtained when an antiserum raised between mouse strains differing only in the MHC region was found to precipitate a large (~580kDa) multisubunit protein complex from a mouse macrophage cell line (Monaco and McDevitt, 1982). The complex was composed of a large number of noncovalently linked low molecular weight polypeptide (LMP) subunits which were biochemically, serologically and genetically distinct from the class I, class II and class III gene products. The subunits ranged in molecular weight from 12 to 35kD. Two of the subunits displayed electrophoretic polymorphism, and both polymorphisms mapped by recombination analysis within the mouse class II region. Lmp-7 was localised between Pb and Ab, while Lmp-2 was mapped slightly more precisely, between Pb and Ob (Monaco and McDevitt, 1986; Steinmetz et al., 1986). Neither gene has yet been cloned. The genes for the other fourteen LMP subunits could not be mapped in this study because their products were not polymorphic. A biochemically similar complex has also been described in human cells (Monaco and McDevitt, 1984). Given that the human and mouse class II regions are homologous, it is therefore possible that the human class II region also encodes LMP subunits. The function of the LMP complex is unknown, although the fact that, like class II molecules, it is expressed in macrophages and lymphocytes and is inducible by γ-interferon has led to speculation that it may provide some accessory function in antigen presentation (Monaco and McDevitt, 1986). The unusual properties of the LMP complex are shared by the eukaryotic multicatalytic proteinase (Rivett, 1989). This broad-specificity, non-lysosomal endopeptidase complex of about 600kD is composed of at least thirteen distinct subunits with molecular weights of between 20 and 35kD. An intriguing possibility is that the LMP complex is the same as, or closely related to, this high molecular weight proteinase, and generates peptides from intracellular proteins for presentation by MHC molecules (Parham, 1990).

1.7. Approaches to finding novel genes

There is substantial circumstantial evidence to support the hypothesis, on which the experiments described in this thesis are based, that there are previously undiscovered genes in the class II region of the human MHC. To summarise the preceding sections briefly, evidence that there may be additional class II genes in the class II region comes from (i) the immunoprecipitation of class II molecules other than DP, DQ and DR from B-lymphoblastoid cell lines and (ii) the finding of two potentially functional class II genes, each without a 'partner'. Evidence for non-class II genes in the class II region comes from (i) studies of class I antigen presentation mutants in human and rat which implicate a class II-encoded locus and (ii) the mapping of genes encoding two subunits of a novel multisubunit complex to the mouse class II region. In addition, disease association studies can be interpreted in terms of previously undiscovered genes (both class II and non-class II) in the class II region.

In order to analyse a region of interest in the genome for coding sequences in the absence of detailed knowledge about the function of any gene which might be present, a 'reverse genetics' strategy can be applied (Orkin, 1986). Reverse genetics methods have provided spectacular successes in the cloning of genes causing diseases which could not be analysed using more traditional approaches, such as Duchenne muscular dystrophy and cystic fibrosis (Monaco and Kunkel, 1987; Rommens et al., 1989). Over the last three years, this approach has also proved highly successful in the discovery of novel genes in the class III region of the human MHC, as described in section 1.3.4. Some of the methods which can be used have already been alluded to, but the principles will now be described in more detail.

Once the genomic region of interest has been identified, it is mapped and cloned. The most powerful technique for long-range physical mapping is currently pulsed field gel electrophoresis (Barlow and Lehrach, 1987), although irradiation-induced hybrids and naturally occurring chromosomal translocations and deletions have also proved extremely useful for determining the relative order of probes in a given region (Hastie et al., 1988; Cox et al., 1990). The region of interest is then cloned, using the previously mapped probes as starting points for the isolation of genomic clones. Genomic libraries in bacteriophage λ and cosmid vectors have been used successfully, although the relatively small insert sizes in these systems can make the cloning of large regions of DNA extremely laborious. More recently, however, yeast artificial chromosome (YAC) vectors have been developed in which hundreds of kilobases of genomic DNA can be cloned and propagated (Schlessinger, 1990). Jumping libraries and linking libraries provide additional sources of cloned genomic material (Poustka and Lehrach, 1986). Rare-cutter jumping libraries have the additional advantage that the cloned regions may contain CpG islands, which are frequently found at the 5' ends of genes (see below and Chapter 3).

The methods used to search for genes in a cloned region are based on the known properties of transcribed sequences. For example, many genes are associated with CpG islands, short (1-2kb) regions of DNA which are unusually rich in unmethylated CpG dinucleotides (Bird, 1987). The function of these regions is not yet known, but they provide convenient diagnostic markers for expressed sequences, and can be detected because they contain sites for certain restriction endonucleases which otherwise cleave DNA very infrequently (Brown and Bird, 1986; Lindsay and Bird, 1987). Another diagnostic feature of coding sequences is that they are likely to be conserved in the genomes of other organisms. Cross-hybridisation of fragments from genomic clones to genomic DNA from other species has proved a successful approach to identifying genes (Monaco et al., 1986). Furthermore, transcribed regions can be detected by hybridising fragments from genomic clones to northern blots or to cDNA libraries (Monaco et al., 1986). Very recently, 'exon-trapping' techniques have been developed to test for the presence of splice junctions in a region of cloned DNA (Duyk et al., 1990).
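CpG islands are detected experimentally here through their sites for rare-cutting enzymes such as NotI and BssHII, whose recognition sequences contain CpG dinucleotides that are scarce in bulk genomic DNA. Where sequence is available, the same idea can be sketched computationally. The Python example below scans a sequence for NotI (GCGGCCGC) and BssHII (GCGCGC) sites and reports the observed/expected CpG ratio; that ratio is a commonly used sequence-based measure of CpG enrichment, not a method used in this thesis, and the fragment is hypothetical.

```python
# Recognition sequences (5'->3'); both are palindromic, so scanning one strand
# of a double-stranded sequence finds every site
RARE_CUTTERS = {"NotI": "GCGGCCGC", "BssHII": "GCGCGC"}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)  # restart one base later to allow overlaps
    return positions


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count x length) / (C count x G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)


# Hypothetical fragment: a GC-rich stretch flanked by AT-rich sequence
fragment = "ATTATAGCGGCCGCATGCGCGCGTTAATTA"
for enzyme, site in RARE_CUTTERS.items():
    print(enzyme, find_sites(fragment, site))
print(f"CpG observed/expected = {cpg_obs_exp(fragment):.2f}")
```

A ratio near or above 1 indicates that CpG is not depleted, in contrast to bulk vertebrate DNA, where it is markedly under-represented; that depletion is why enzymes with CpG-containing sites cut so rarely outside islands.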

These approaches are theoretically applicable to any part of the human genome, including the class II region. In fact, as described in the preceding sections, many steps have already been taken towards characterising the class II region at the molecular level. In particular, physical maps of the region are available which can be used as a basis for further work, and numerous genes have been cloned which are suitable for use as probes in the construction of more detailed maps and the isolation of new genomic clones. The following chapters of this thesis describe the application of molecular mapping and cloning techniques in the class II region to determine the likely positions of genes, to obtain genomic clones covering these regions, to identify genes within these clones, and to characterise the novel genes thus found.

2. Materials and methods

2.1. Bacterial cell culture

Liquid cultures were grown by inoculating L-broth containing the appropriate antibiotic with a single bacterial colony and shaking vigorously overnight at 37°C. To grow bacterial colonies, cells were streaked onto the surface of L-agar plates containing the appropriate antibiotic and incubated with the plates inverted at 37°C overnight.

L-broth: Bacto-tryptone 10g/litre, Bacto-yeast extract 5g/litre, NaCl 10g/litre. Sterilised by autoclaving.

L-agar plates: 1.5% (w/v) Bacto-agar in L-broth. Sterilised by autoclaving. Solid media were melted by microwaving and cooled to 50°C before adding antibiotics.

Antibiotics

Ampicillin: anhydrous ampicillin was dissolved in double distilled water (DDW) at 50mg/ml, sterilised by filtration and used at a final concentration of 50µg/ml.

Tetracycline: tetracycline hydrochloride was dissolved in DDW at 12.5mg/ml, sterilised by filtration and used at a final concentration of 12.5µg/ml.

Kanamycin: kanamycin sulphate was dissolved in DDW at 10mg/ml, sterilised by filtration and used at a final concentration of 10µg/ml.

Antibiotics were stored at -20°C.
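Each antibiotic stock above is 1000 times more concentrated than its working concentration, so 1µl of stock is needed per ml of medium. A minimal Python sketch of that C1V1 = C2V2 arithmetic follows; the dictionaries simply restate the concentrations given above, and the function name is illustrative.

```python
# Stock concentrations (mg/ml) and final working concentrations (µg/ml) from section 2.1
STOCK_MG_PER_ML = {"ampicillin": 50.0, "tetracycline": 12.5, "kanamycin": 10.0}
WORKING_UG_PER_ML = {"ampicillin": 50.0, "tetracycline": 12.5, "kanamycin": 10.0}


def stock_volume_ul(antibiotic: str, medium_ml: float) -> float:
    """Volume of stock (µl) to add to medium_ml of medium, from C1 x V1 = C2 x V2.

    Treats medium_ml as the final volume, i.e. ignores the small extra
    volume contributed by the stock itself.
    """
    stock_ug_per_ml = STOCK_MG_PER_ML[antibiotic] * 1000.0
    working_ug_per_ml = WORKING_UG_PER_ML[antibiotic]
    return medium_ml * 1000.0 * working_ug_per_ml / stock_ug_per_ml


for drug in STOCK_MG_PER_ML:
    print(f"{drug}: {stock_volume_ul(drug, 500):.0f}µl stock per 500ml of medium")
```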

2.2. Screening of recombinant DNA libraries

2.2.1. Cosmid libraries

The cosmid library used in this study, a gift from Dimitri Kioussis (National Institute for Medical Research, London), was made using genomic DNA from the human T-cell line HPB.ALL in the cosmid vector cos202 (ampicillin resistant). 10⁶ recombinants were plated out and screened as follows.

The titre of the cosmid library was first determined by preparing a 10⁻² dilution of the frozen library stock (10µl stock in 990µl L-broth) and plating out dilutions of this on L-agar plates containing ampicillin. These plates were incubated at 37°C overnight and the 10⁻² dilution was stored at 4°C. The following day the number of colonies obtained for each dilution was counted and used to calculate the volume of the 10⁻² dilution required to yield 250 000 colonies (the optimal number for a 20x20cm filter).

Four 20x20cm Hybond-N membranes (Amersham) marked with an asymmetric pattern of dots were lowered onto the surface of four 245x245mm L-agar plates (Nunc). The calculated volume of the 10⁻² dilution was then spotted onto the surface of each filter and spread evenly using a flamed glass spreader. These plates, the master plates, were incubated overnight at 37°C. The next day eight additional 245x245mm plates, the replica plates, were poured and overlaid with nylon membranes. To prepare duplicate replicas of the first master, the filter from the first master plate was lifted off the agar with Millipore forceps and placed colony side up on three thicknesses of Whatman 3mm paper. The wetted filter from the first replica plate was then overlaid on this filter, covered with further layers of Whatman paper and pressed onto the master. The master pattern of spots was marked onto the replica filter before the two were pulled apart and the replica was returned to its plate. The same master filter was then overlaid with a second replica filter and the process repeated. The first master filter was then returned to its plate. When duplicate replicas had been prepared from all four master filters, all twelve plates were incubated at 37°C until the colonies had regrown. The master plates were then sealed with Parafilm and stored at 4°C.
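The titre calculation described above reduces to two divisions: colonies per µl of the 10⁻² working dilution, then the volume of that dilution needed for 250 000 colonies per filter. The Python sketch below assumes a single countable plate from a further dilution of the working stock; the function names and the colony count are hypothetical.

```python
def titre_per_ul(colonies: int, volume_plated_ul: float, further_dilution: float) -> float:
    """Colony-forming units per µl of the 10^-2 working dilution.

    further_dilution: fraction of working dilution in the plated sample
    (1.0 if plated directly, 0.01 after a further 1:100 dilution).
    """
    return colonies / (volume_plated_ul * further_dilution)


def volume_for_filter(titre: float, target_colonies: int = 250_000) -> float:
    """µl of the working dilution to spread for target_colonies on one 20x20cm filter."""
    return target_colonies / titre


# Hypothetical count: 250 colonies from 100µl of a further 1:100 dilution of the working stock
titre = titre_per_ul(250, 100.0, 0.01)
volume = volume_for_filter(titre)
print(f"titre = {titre:.0f} colonies/µl; spread {volume:.0f}µl on each of four filters")
print(f"total screened = {4 * 250_000:,} recombinants")
```

Four filters at 250 000 colonies each account for the 10⁶ recombinants screened.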

The filters from the replica plates were removed from the agar surface and processed by placing them successively (colony side up) on pads of Whatman paper soaked in denaturing solution (7min) and neutralising solution (2x 3min). The filters were then rinsed by immersion in 2x standard saline citrate (SSC) and air dried before the DNA was fixed by baking for 2hr at 80°C or by UV irradiation at 0.4 J/cm². The membranes were incubated for 1-2hr at 42°C in 500ml TENS buffer and the bacterial debris removed with a rubber policeman. After a final rinse in 2x SSC the filters were ready for hybridisation.

Following hybridisation of the replica filters with the probe of interest, a positive hybridisation signal present in duplicate was identified on the autoradiographs and used to pinpoint on the appropriate master plate the position of the region containing the positive colony. Bacterial cells were removed from this region with a toothpick or flamed loop and transferred to 1ml L-broth. After vigorous mixing to disperse the cells, dilutions of this stock were made and plated out on 140mm L-agar plates containing ampicillin and incubated at 37°C overnight. A plate bearing well separated colonies was selected for the preparation of duplicate colony lifts. A nylon membrane of appropriate size was overlaid on the agar surface for 1min. During this time the position of the filter was marked by piercing with a needle and syringe containing ink. The filter was then removed and placed successively (colony side up) on pads of Whatman paper soaked in denaturing solution (7min) and neutralising solution (2x 3min). The filter was briefly rinsed in 2x SSC and the DNA fixed onto the surface by baking or UV irradiation. Meanwhile, a second membrane was overlaid on the surface of the plate and marked in the same position with the needle and ink. This replica was processed in the same way as the first. The secondary plate was incubated at 37°C until the colonies had regrown, then stored at 4°C. Following hybridisation of the secondary filters with the probe of interest, the autoradiographs were used to identify the positions of individual positive colonies on the plates. These were picked and the cosmid DNA purified as described in section 2.5.1.

Denaturing solution: 1.5M NaCl, 0.5M NaOH

Neutralising solution: 1.5M NaCl, 0.5M Tris.Cl pH7.2, 1mM EDTA

20x SSC: 3M NaCl, 0.3M sodium citrate

TENS buffer: 50mM Tris.Cl pH8.0, 1mM EDTA, 1M NaCl, 0.1% SDS
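The solutions above are specified by molarity; converting to grams per litre is simply molarity x formula weight x volume. The Python sketch below does this for 20x SSC and the denaturing solution. It assumes that "sodium citrate" means trisodium citrate dihydrate and uses approximate formula weights; both are assumptions rather than details given in the text.

```python
# Approximate formula weights (g/mol); sodium citrate assumed to be trisodium citrate dihydrate
FORMULA_WEIGHT = {"NaCl": 58.44, "NaOH": 40.00, "trisodium citrate dihydrate": 294.10}


def grams_needed(molar: float, reagent: str, litres: float = 1.0) -> float:
    """Mass (g) of reagent for `litres` of solution at `molar` concentration."""
    return molar * FORMULA_WEIGHT[reagent] * litres


recipes = {
    "20x SSC": [(3.0, "NaCl"), (0.3, "trisodium citrate dihydrate")],
    "Denaturing solution": [(1.5, "NaCl"), (0.5, "NaOH")],
}
for name, components in recipes.items():
    masses = ", ".join(f"{grams_needed(m, r):.1f}g {r}" for m, r in components)
    print(f"{name} (1 litre): {masses}")
```

The 2x SSC used for rinsing filters is a tenfold dilution of this 20x stock.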

2.2.2. cDNA libraries

The cDNA libraries used in this study were constructed according to the method of Seed (1987) in the plasmid vector CDM8 or derivatives thereof. For each library, 2.5x10⁵ to 1x10⁶ clones, propagated in E. coli MC1061/p3, were plated out and screened in the same way as the cosmid library, using ampicillin + tetracycline selection. Once a single positive colony had been identified, plasmid DNA was purified as described in section 2.5.1. The CEM (T-cell line) cDNA library was a gift from Jenny Dunne (Lymphocyte Molecular Biology Laboratory, ICRF). The JY (B-cell line) and γ-interferon-induced macrophage (U937) cDNA libraries were a gift from Dr. David Simmonds, Oxford.

2.2.3. Jumping libraries

The rare-cutter jumping libraries used in this study were a gift from Annemarie Poustka (German Cancer Research Centre, Heidelberg, FRG). Both libraries were constructed as described in Poustka and Lehrach (1988; see also Figure 5.5.). The NotI jumping library was constructed from NotI-cut human genomic DNA which was circularised around the marker plasmid and then re-cut with BamHI. The BssHII jumping library was constructed from BssHII-cut human genomic DNA which was circularised around the marker plasmid and then re-cut with BamHI+HindIII. In both cases the marker plasmid was pMLS-Mlu-Not, which carries the supF gene, and the vector was a modified form of the bacteriophage λ strain NM1151, which contains amber mutations that are suppressed in the presence of supF. Jumping clones were propagated as temperature-sensitive lysogens of phage λ recombinants in E. coli MC1061/p3 (ampicillin + tetracycline selection). The library was handled and screened in the same way as the cosmid library, except that incubations were done at 30°C to maintain lysogenic growth. Once an individual positive bacterial colony had been detected by secondary screening, however, the phage were induced to undergo lytic growth so that preparations of λ DNA could be made. A positive colony was inoculated into 5ml L-broth containing tetracycline and ampicillin and shaken at 30°C overnight. 1ml was diluted into 50ml L-broth containing antibiotics and shaken at 30°C for 90min. The lytic growth cycle was then induced by shaking the culture at 42°C for 20min. Phage DNA was prepared as described in section 2.6., starting with the step to pellet the lysed bacterial cells.

2.3. Subcloning of DNA fragments

DNA fragments were subcloned into the plasmid vector Bluescript (ampicillin resistant), which has a useful polylinker cloning site and carries the lacZ gene, allowing colour selection of insert-carrying recombinants. Vector for subcloning was cleaved to completion (as judged by testing a sample on an agarose gel) with the appropriate enzyme(s) and then purified by phenol extraction and ethanol precipitation. Insert DNA was prepared by excising the desired fragment from a 1x TAE agarose gel and purifying the DNA using Geneclean (Bio 101). For the ligation reaction, vector and insert DNA were mixed at a ratio of one vector terminus to two insert termini and incubated overnight at 16°C with 1µl 10mM ATP, 1µl 10x ligation buffer, 1µl T4 DNA ligase (Biolabs) and DDW in a total volume of 10µl. The reaction was then ethanol precipitated and resuspended in 10µl DDW for transformation into the Bluescript host E. coli XL-1 Blue (Stratagene) by electroporation (section 2.4.2.).

10x ligation buffer: 500mM Tris.Cl pH8.0, 100mM MgCl2, 200mM dithiothreitol (DTT), 500µg/ml BSA

10mg/ml BSA: bovine serum albumin was dissolved at 10mg/ml in 10mM Tris.Cl pH7.5, 1mM EDTA
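The 1:2 vector-to-insert termini ratio used in the ligation can be turned into masses. Every linear double-stranded fragment has two ends, so the termini ratio equals the molar ratio, and the mass needed scales with fragment length. The Python sketch below assumes an average of about 650 g/mol per base pair and a vector length of 3000bp (Bluescript is close to 3kb); the amounts and function names are illustrative.

```python
def pmol_ends(mass_ng: float, length_bp: int) -> float:
    """pmol of DNA ends in a linear double-stranded fragment (two ends per molecule).

    Assumes an average mass of ~650 g/mol per base pair.
    """
    pmol_molecules = mass_ng * 1000.0 / (length_bp * 650.0)
    return 2.0 * pmol_molecules


def insert_ng_for_ratio(vector_ng: float, vector_bp: int, insert_bp: int,
                        insert_to_vector_ends: float = 2.0) -> float:
    """Mass of insert giving the requested insert:vector termini ratio."""
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ends


# Illustrative amounts: 50ng of a ~3kb vector and a 1.5kb insert
insert_ng = insert_ng_for_ratio(50.0, 3000, 1500)
ratio = pmol_ends(insert_ng, 1500) / pmol_ends(50.0, 3000)
print(f"use {insert_ng:.0f}ng insert (insert:vector ends = {ratio:.1f})")
```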

2.4. Transformation of bacterial cells with DNA by electroporation

2.4.1. Preparation of E. coli cells for electroporation

5ml L-broth were inoculated with a single colony of the desired E. coli strain and shaken at 37°C overnight. 2.5ml of this culture were diluted into 500ml L-broth in a 2 litre flask and shaken at 37°C until the OD600 reached 0.5-0.6. The culture was then chilled in an ice-water bath for 15min, transferred to pre-chilled 500ml centrifuge bottles, and spun for 20min at 4000rpm, 2°C, in a Beckman J6B. The supernatant was poured off and the pellet resuspended in 5ml ice-cold water. A further 500ml ice-cold water were added before centrifuging at 10000rpm, 2°C, for 20min. The supernatant was poured off quickly to avoid loss of the loose pellet. The cells were then resuspended and centrifuged as before. The supernatant was removed immediately and the pellet resuspended by swirling in the residual liquid. 40ml ice-cold 10% glycerol were added and the cells were pelleted at 10000rpm, 10min, 2°C. Finally the cells were resuspended in an equal volume of 10% glycerol, aliquotted into Eppendorfs (50µl each), frozen on dry ice and stored at -70°C (Ausubel et al., 1987).

2.4.2. Introduction of DNA into cells

DNA in ligation buffer was ethanol precipitated and resuspended in water before transformation. This was necessary because the salts in ligation mixes reduce the resistance of the sample to sub-optimal levels and so lower the efficiency of transformation.

DNA (typically 1-100ng) was added to 50µl electroporation-competent cells and the mixture was pipetted into the bottom of a pre-chilled electroporation cuvette. This was placed in the sample chamber of the electroporation apparatus, which was set to 2.5kV, 25µF with the pulse controller adjusted to 200Ω. The pulse was applied and the cuvette removed. 1ml SOC medium was immediately added and the mixture pipetted into a fresh Eppendorf and incubated at 37°C for 30-60min.
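The electroporation settings above (25µF, 200Ω, 2.5kV) fix the pulse length through the time constant τ = R x C. The Python sketch below illustrates why salt in the sample matters. It models an idealised circuit in which the pulse-controller resistor sits in parallel with the sample; that circuit, the sample resistances and the 0.2cm cuvette gap are all assumptions, not details given in the text.

```python
def parallel_ohm(r1: float, r2: float) -> float:
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def time_constant_ms(capacitance_uf: float, controller_ohm: float, sample_ohm: float) -> float:
    """tau = R x C for an idealised circuit with the pulse-controller resistor
    in parallel with the sample (an assumption about the apparatus)."""
    r_total = parallel_ohm(controller_ohm, sample_ohm)
    return r_total * capacitance_uf * 1e-6 * 1e3  # seconds -> milliseconds


def field_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Field strength across the cuvette."""
    return voltage_kv / gap_cm


# Hypothetical sample resistances: desalted DNA in water versus a salty ligation mix
for label, sample_ohm in [("desalted", 5000.0), ("salty", 200.0)]:
    tau = time_constant_ms(25.0, 200.0, sample_ohm)
    print(f"{label}: tau = {tau:.2f}ms")
print(f"field = {field_kv_per_cm(2.5, 0.2):.1f}kV/cm (assumed 0.2cm gap)")
```

In this model a high-resistance sample leaves τ close to the nominal 5ms (200Ω x 25µF), whereas a low-resistance, salty sample shortens the pulse considerably.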

Aliquots of the transformation mix were plated out on L-agar plates containing the appropriate antibiotic. When Bluescript plasmids were used, the plates were spread with 100µl IPTG and 40µl X-gal (for an 82mm plate) 30min before use. The efficiency of transformation by electroporation was so high for supercoiled plasmids (typically 10⁸ to 10⁹ transformants/µg) that extensive dilutions of transformation mixes were made in SOC in order to obtain single colonies (Ausubel et al., 1987).
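The need for extensive dilution follows from simple arithmetic on the efficiencies quoted above. The Python sketch below estimates the colonies expected on a plate and, conversely, calculates efficiency from a colony count. It assumes the whole transformation is recovered in about 1ml, and the DNA amount and dilutions are hypothetical.

```python
def expected_colonies(dna_ng: float, efficiency_per_ug: float, total_volume_ul: float,
                      plated_ul: float, dilution: float = 1.0) -> float:
    """Colonies expected from plating `plated_ul` of a transformation diluted by `dilution`."""
    total_transformants = dna_ng / 1000.0 * efficiency_per_ug
    return total_transformants * (plated_ul / total_volume_ul) * dilution


def transformants_per_ug(colonies: int, dna_ng: float, fraction_plated: float) -> float:
    """Efficiency: colonies per µg of the DNA that actually reached the plate."""
    return colonies / (dna_ng / 1000.0 * fraction_plated)


# Hypothetical: 1ng supercoiled plasmid at 1e9 transformants/µg, recovered in ~1ml SOC
for dilution in (1.0, 1e-2, 1e-3):
    n = expected_colonies(1.0, 1e9, 1000.0, 100.0, dilution)
    print(f"100µl of a {dilution:g} dilution -> ~{n:,.0f} colonies")
```

Plating 100µl undiluted would give on the order of 10⁵ colonies, far too many to pick; a 10⁻³ dilution brings this down to a countable plate.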

SOC medium: 0.5% Bacto-yeast extract, 2% Bacto-tryptone, 2.5mM KCl, 10mM NaCl, 10mM MgCl2, 10mM MgSO4, 20mM glucose. Sterilised by autoclaving.

IPTG: 25mg/ml isopropyl thiogalactoside in DDW, stored at -20°C.

X-gal: 25mg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactoside in dimethyl formamide, stored at -20°C.

2.5. Preparation of DNA from transformed bacterial cells

2.5.1. Small scale preparation of plasmid or cosmid DNA by alkaline lysis

This technique was used for the rapid small-scale purification of plasmid or cosmid DNA from bacterial cells (Sambrook et al., 1989).

5ml L-broth were inoculated with a single bacterial colony and shaken overnight at 37°C. 1.5ml of each culture was transferred to an Eppendorf tube and microfuged for 1min. The supernatant was removed by aspiration with a drawn-out Pasteur pipette. The cell pellet was resuspended by vortexing in 100µl GTE and the tubes were left at room temperature for 5min. The cells were lysed by adding 200µl freshly prepared 0.2M NaOH/1% SDS, which was mixed by inverting the tube. After standing on ice for 5min, the bacterial chromosomal DNA was precipitated by adding 150µl KAc pH4.8, vortexing and standing on ice for a further 5min. The chromosomal DNA was then pelleted by microfuging for 10min at 4°C, and the supernatant was transferred to a fresh Eppendorf tube. An equal volume of phenol/chloroform/isoamyl alcohol (PCIA) was added and mixed thoroughly by vortexing, and the tubes were microfuged for 5min. The aqueous phase containing the plasmid DNA was transferred to a fresh Eppendorf tube. Two volumes of absolute ethanol were added and mixed before leaving at -20°C for 2hr. Following precipitation the DNA was pelleted by microfuging for 15min at 4°C. The supernatant was discarded before addition of 500µl 70% ethanol. The tubes were microfuged again and the ethanol removed by aspiration. The pellets were dried for 5min by vacuum desiccation and then resuspended in 20µl TE pH8.0 containing 100ng/ml DNase-free RNase. 2-5µl were analysed by restriction endonuclease digestion. This technique typically yielded 2µg cosmid DNA or 5µg plasmid DNA.

GTE: 50mM glucose, 10mM EDTA, 25mM Tris.Cl pH8.0

KAc pH4.8: 60ml 5M KAc, 11.5ml glacial acetic acid, 28.5ml DDW

PCIA: phenol (melted at 65°C) was mixed with chloroform and isoamyl alcohol in the ratio 25:24:1 and buffered by equilibrating once with 1 volume 50mM Tris base, twice with 1 volume 50mM Tris.Cl pH8.0 and once with 1 volume TE pH8.0.

TE pH8.0: 10mM Tris.Cl pH8.0, 1mM EDTA pH8.0

DNase-free RNase: pancreatic RNase A was dissolved in 15mM NaCl, 10mM Tris.Cl pH7.5 at 10mg/ml, heated to 100°C for 15min and cooled slowly to room temperature before storing at -20°C.

2.5.2. Large scale preparation of plasmid or cosmid DNA by alkaline lysis

This technique was used to prepare milligram-scale quantities of high purity plasmid or cosmid DNA (Sambrook et al., 1989).

5ml L-broth were inoculated with a single bacterial colony and grown during the day (or overnight) at 37°C with vigorous shaking. This culture was then diluted into 400ml L-broth in a 2 litre flask (or split 2x 200ml in 1 litre flasks) and shaken at 37°C overnight. Bacterial cells were pelleted in a 500ml centrifuge bottle by centrifuging at 6000rpm for 10min and then resuspended thoroughly in 20ml GTE containing 4mg/ml lysozyme. This suspension was left at room temperature for 5min before adding 40ml 0.2M NaOH/0.1% SDS, mixing gently, and incubating for a further 5min on ice. 20ml 5M KAc pH4.8 were then added and mixed by vortexing. This mixture was incubated for 15min on ice and centrifuged for 15min at 8000rpm. The supernatant was filtered through a nylon tea-strainer into a 200ml glass bottle, and the nucleic acid was precipitated by adding 0.6 volumes of propan-2-ol and incubating at -20°C for 2hr. The precipitate was pelleted by spinning at 2000rpm for 20min at 4°C and the pellet was resuspended in 5ml RNasing buffer. The solution was transferred to a 50ml Falcon tube before adding 10µl 10mg/ml RNase and incubating at 37°C for 15min. 200µl 20mg/ml proteinase K and 125µl 20% SDS were then added and the incubation continued at 37°C for a further 30min. An equal volume of PCIA was added and mixed thoroughly by vortexing. The phases were separated by centrifuging at 2000rpm for 15min, and the aqueous phase was transferred to a 30ml Corex tube. 0.1 volumes 3M NaAc pH5.2 and 2 volumes absolute ethanol were added and the mixture was stored at -20°C for 2hr. The DNA was pelleted by centrifugation at 10000rpm for 20min at 4°C and resuspended in 1ml A-50 buffer. The plasmid or cosmid DNA was then separated from contaminating bacterial DNA and RNA by fractionation through a 30ml Biogel A-50 (Bio-Rad) column. Fractions contributing to the first OD260 peak were pooled in a 30ml Corex tube and ethanol precipitated. The DNA was pelleted by spinning at 10000rpm for 20min at 4°C, the supernatant was poured off and the pellet was washed in 5ml 70% ethanol before centrifuging again. The pellet was dried under vacuum and resuspended in 500µl TE pH8.0.

RNasing buffer: 0.1M NaCl, 5mM EDTA, 0.1M Tris.Cl pH8.0

Proteinase K: fungal proteinase K (BDH) was dissolved at 20mg/ml in 20mM Tris.Cl pH8.0, self-digested at 37°C for 1hr, and frozen at -20°C.

Biogel A-50 buffer: 0.5M NaCl, 25mM Tris.Cl pH8.0, 1mM EDTA

2.6. Small scale preparation of bacteriophage λ DNA

100µl phage were mixed with 100µl 10mM CaCl2, 10mM MgSO4 and 100µl of a saturated culture of host bacterial cells and incubated for 20min at 37°C. This mixture was diluted in 50ml L-broth containing 10mM MgSO4, 0.5% casamino acids and 0.2% maltose and shaken overnight in a 200ml conical flask at 37°C. 20ml aliquots were transferred to Falcon tubes and centrifuged at 4000rpm for 15min to pellet the debris of lysed bacterial cells. 15ml of supernatant were transferred to a fresh tube and incubated with 10µl 10mg/ml DNase for 15min at 37°C to degrade bacterial DNA. 5ml of AS buffer were then added and the mixture heated to 70°C for 15min to denature the phage capsids. The tube was cooled under the cold tap and 1.7ml KAc pH4.8 were added before centrifugation at 4000rpm for 15min. The supernatant was filtered through a tea strainer into a fresh Falcon tube, and 15ml propan-2-ol were added before centrifugation at 4000rpm for 15min. The supernatant was poured off and the pellet resuspended in 200µl 0.3M NaAc. The phage DNA was transferred to a microfuge tube, treated with RNase for 15min at 37°C, and extracted with PCIA. The aqueous phase was transferred to a fresh microfuge tube and precipitated with 2 volumes absolute ethanol at -20°C. The DNA was pelleted by microfugation and washed with 70% ethanol. Following a second spin the supernatant was removed by aspiration, the pellet was dried for 1-2min under vacuum and the DNA was resuspended in 50-100µl DDW. 10µl were analysed by restriction enzyme digestion.

AS buffer: 0.3M Tris.Cl pH8.0, 0.15M EDTA, 1.5% SDS

2.7. Preparation of eukaryotic DNA from cells in culture

Many of the human genomic DNA samples used to prepare Southern blots in this study were a gift from Susan Tonks (Tissue Antigen Laboratory, ICRF). Genomic DNA samples from other vertebrate species for the preparation of 'zoo' blots were obtained from Dr. Nigel Spurr (Clare Hall Laboratories, ICRF). Additional genomic DNA samples were prepared using the following protocol (Susan Tonks, personal communication).

Cells were grown in RPMI medium supplemented with 10% foetal calf serum (Gibco) at 37°C in 5% CO2. For DNA preparation, 10⁸-10⁹ cells were pelleted at 3000rpm for 5min and resuspended in 10ml PBSA. This procedure was repeated. 90ml sucrose-Triton were then added and the mixture centrifuged at 10000rpm for 10min. The supernatant was discarded and the pellet resuspended in 20ml 75mM NaCl, 24mM EDTA pH8.0. 40µl 10mg/ml RNase, 500µl 20% SDS and 200µl 20mg/ml proteinase K were added and the mixture was incubated at 37°C for 3hr. An equal volume of buffered phenol was added and mixed thoroughly by shaking. The phases were separated by spinning at 2000rpm for 5min and the aqueous phase was transferred to a fresh tube. The process was repeated using PCIA until the aqueous phase was clear. The aqueous phase was then extracted with chloroform/isoamyl alcohol (24:1) and transferred to a beaker. 0.1 volumes 0.5M NaCl and 2 volumes of absolute ethanol were then added, and the precipitated DNA strands were spooled onto a Pasteur pipette and transferred to an Eppendorf tube. 1ml TE was then added and the DNA left to dissolve at 4°C. The absorbance was measured at 260 and 280nm to estimate the ratio of DNA to protein. If OD260/OD280 was below 1.75, the sample was incubated again with proteinase K and re-extracted. A typical yield from 10⁸ cells was 200µg DNA.

PBSA: 171mM NaCl, 3.4mM KCl, 10mM Na2HPO4, 1.8mM KH2PO4

Sucrose-Triton: 0.33M sucrose, 10mM Tris.Cl pH7.5, 5mM MgCl2, 1% Triton X-100
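The purity check and yield estimate described above can be sketched as follows. This is a minimal Python illustration, assuming the standard conversion of 50µg/ml double-stranded DNA per A260 unit and a sample diluted before reading; the function and field names are hypothetical, and only the 1.75 cutoff comes from the protocol.

```python
from typing import NamedTuple

# Standard approximation: an A260 of 1.0 corresponds to ~50 ug/ml dsDNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


class DnaPrepQC(NamedTuple):
    ratio: float              # OD260/OD280
    yield_ug: float           # total DNA in the preparation
    needs_reextraction: bool  # True if the ratio falls below the cutoff


def assess_dna_prep(a260: float, a280: float, dilution_factor: float,
                    volume_ml: float, cutoff: float = 1.75) -> DnaPrepQC:
    """Hypothetical QC helper mirroring the OD260/OD280 rule in the text."""
    ratio = a260 / a280
    conc_ug_per_ml = a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor
    return DnaPrepQC(ratio, conc_ug_per_ml * volume_ml, ratio < cutoff)


# A 1:20 dilution of DNA dissolved in 1ml TE, reading A260 = 0.2, A280 = 0.1
print(assess_dna_prep(0.2, 0.1, dilution_factor=20, volume_ml=1.0))
```

With these readings the sketch reports a ratio of 2.0 and about 200µg of DNA, so no further proteinase K treatment would be indicated.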

2.8. Preparation of very high molecular weight DNA for pulsed field gel electrophoresis

DNA for pulsed field gel electrophoresis (PFGE) was prepared in the form of cells encased in blocks of low melting point agarose (LMPA) to avoid shearing (Dr. Denise Barlow, personal communication). Each LMPA block contained 3×10⁶ cells, enough for three digests.

Approximately 10⁸ cells were pelleted at 1000rpm for 5min, the supernatant was removed by aspiration and the cells resuspended in 10ml PBSA. The cells were counted using a haemocytometer, then pelleted as before and resuspended in a further 10ml PBSA. Finally, the cells were pelleted again and resuspended in 50µl PBSA per 3×10⁶ cells. An equal volume of 1% LMPA in PBSA, melted and equilibrated at 42°C, was added to the cells and mixed. 100µl aliquots were pipetted into the slots of pre-chilled block formers (LKB) which had previously been soaked in 10% hydrogen peroxide for 30min to remove all traces of nucleases. The block formers were left on ice for 30min until the agarose had set. The blocks were then transferred to a Falcon tube containing 2.5ml 0.5M EDTA, 1% sodium laurylsarcosine and 1mg/ml proteinase K, and incubated at 50°C for 48hr. A second aliquot of proteinase K was added about halfway through the incubation period. The tube was then topped up with TE pH8.0 and inverted several times to wash the blocks. The blocks were separated from the TE by passing through a tea strainer, then returned to the Falcon tube. This process was repeated three times with fresh TE. The TE was then replaced with TE containing 0.04mg/ml PMSF (freshly prepared from a 40mg/ml solution in propan-2-ol) and the blocks were incubated at 50°C for 30min to inactivate any residual protease activity. This step was repeated and the blocks were then stored in 0.5M EDTA pH8.0 at 4°C until required.
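The block arithmetic in this protocol can be laid out as a short Python sketch. It assumes that only complete 100µl blocks are cast from the cell/LMPA mixture; the function name and dictionary keys are hypothetical.

```python
CELLS_PER_BLOCK = 3e6      # from the protocol
PBSA_UL_PER_BLOCK = 50.0   # 50ul PBSA per 3x10^6 cells
DIGESTS_PER_BLOCK = 3      # each block is cut into thirds


def plan_pfge_blocks(cell_count: float) -> dict:
    """Volumes and block numbers for embedding cells in LMPA (hypothetical helper)."""
    pbsa_ul = cell_count / CELLS_PER_BLOCK * PBSA_UL_PER_BLOCK
    n_blocks = int(cell_count // CELLS_PER_BLOCK)  # complete 100ul blocks only
    return {
        "pbsa_ul": pbsa_ul,
        "lmpa_ul": pbsa_ul,  # an equal volume of 1% LMPA is added
        "blocks": n_blocks,
        "digests": n_blocks * DIGESTS_PER_BLOCK,
    }


print(plan_pfge_blocks(1e8))
```

For about 10⁸ cells this gives roughly 1.7ml each of PBSA and LMPA, 33 complete blocks and enough material for 99 digests.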

2.9. Preparation of RNA

Preparation and analysis of RNA was performed with the assistance of Dr. Ruth Lovering.

2.9.1. Isolation of total RNA from cells in culture

Cell lines were cultured in the Cell Production Unit, ICRF. RNA was isolated using RNase-free reagents, plasticware and glassware (Sambrook et al., 1989). Cells were pelleted by spinning at 2000rpm for 5min. The pellet volume was estimated and 5 volumes of guanidinium thiocyanate homogenisation buffer were added. Homogenisation was effected by drawing the lysate ten times into a syringe fitted with a 19-gauge needle. The homogenate was then layered onto a cushion of 5.7M CsCl, 0.1M EDTA in a polyallomer ultracentrifuge tube. The volume of the homogenate determined the size of the rotor used and the amount of CsCl required, as shown below.

Rotor   Homogenate volume (ml)   CsCl (ml)   Spin time (hours)   Speed (rpm)
SW60    1.2                      3.1         12                  40 000
SW41    3.5                      9.7         24                  32 000
SW28    12.0                     26.5        24                  25 000
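Choosing a rotor from this table amounts to a lookup on homogenate volume. The Python sketch below encodes the table as given; treating each homogenate volume as the maximum per tube, and returning None when a sample would have to be split across tubes, are assumptions rather than details stated in the protocol.

```python
from typing import NamedTuple, Optional


class RotorRun(NamedTuple):
    rotor: str
    homogenate_ml: float
    cscl_ml: float
    hours: int
    rpm: int


# Rows in order of increasing capacity, as in the table above.
ROTOR_TABLE = [
    RotorRun("SW60", 1.2, 3.1, 12, 40_000),
    RotorRun("SW41", 3.5, 9.7, 24, 32_000),
    RotorRun("SW28", 12.0, 26.5, 24, 25_000),
]


def choose_rotor(homogenate_ml: float) -> Optional[RotorRun]:
    """Smallest rotor whose tabulated homogenate volume accommodates the sample."""
    for run in ROTOR_TABLE:
        if homogenate_ml <= run.homogenate_ml:
            return run
    return None  # assumed: sample must be divided between several tubes


print(choose_rotor(2.0))
```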

After centrifugation, all the liquid in the tube except the last 500µl was carefully removed with a pipette and the curved base of the tube, containing the RNA pellet, was cut off with a scalpel. The remaining supernatant in the base of the tube was carefully pipetted off and the pellet was resuspended in 500µl 10mM Tris.Cl pH7.5, 5mM EDTA, 1% SDS. This was extracted twice with PCIA and the RNA was precipitated from the aqueous phase by adding 0.1 volumes 3M NaAc pH5.2 and 2 volumes absolute ethanol, and storing at -20°C overnight. The RNA was pelleted by microfuging at 4°C for 5min and the pellet was washed in 70% ethanol. After re-spinning, the supernatant was removed and the pellet dried under vacuum. The RNA was resuspended in 200µl DDW.

Homogenisation buffer: 5M guanidinium thiocyanate, 5mM sodium citrate, 1% Sarkosyl, 0.7% β-mercaptoethanol

To test for γ-interferon-inducible gene expression in colon carcinoma cell lines, cells were incubated with 300 units/ml γ-interferon for 36-48hr before RNA extraction (experiment performed by Adrian Kelly, this laboratory).

2.9.2. Isolation of poly(A)+ RNA

Polyadenylated RNA was purified from total RNA by oligo-dT cellulose chromatography using Fast-track reagents (Invitrogen). Each total RNA sample was adjusted to 0.5M NaCl and incubated with pre-equilibrated oligo-dT cellulose for 30min at room temperature. The mixture was then transferred to a spin-column and washed three times with high-salt binding buffer to remove non-polyadenylated RNA. Poly(A)+ RNA was then eluted from the column in a low-salt buffer and mixed with 0.15 volumes 2M NaAc and 2 volumes absolute ethanol. The mixture was frozen on dry ice to precipitate the RNA and then microfuged for 15min. The supernatant was removed and the pellet resuspended in low-salt buffer before storing the sample at -80°C.

2.10. Restriction endonuclease digestion of DNA

For routine mapping of plasmid, cosmid or bacteriophage DNA with restriction endonucleases, 100-500ng were typically digested in a volume of 20µl with a 2-5 fold excess of enzyme for 1-2hr. When specific fragments were required for subcloning or probe preparation, the digest was scaled up accordingly.

For preparation of genomic Southern blots, 10µg genomic DNA were typically digested overnight in a volume of 50-100µl with a 5-fold excess of enzyme. A tenth of the digest was resolved on a minigel to monitor the extent of digestion. Incomplete digests were usually diluted with an equal volume of 1x restriction buffer and supplemented with more enzyme before continuing the incubation.

For preparation of PFGE blots, blocks (section 2.8) were washed free of EDTA storage buffer by rocking in 50ml TE at room temperature (2x 30min) followed by 30min in DDW. Blocks were then cut with a scalpel into thirds, each third containing enough DNA for one digest (10µg). These were pre-equilibrated in 1ml of the appropriate buffer before digestion with a 5-fold excess of enzyme in a total volume of 100µl.

All digests were carried out in the presence of 100µg/ml BSA. The volume of enzyme added was not allowed to exceed 10% of the total volume. Genomic digests were supplemented with 5mM spermidine if the salt concentration in the buffer exceeded 50mM. Digests were incubated at the optimum temperature recommended by the manufacturer. Overnight digests with enzymes from thermophilic bacteria (e.g. TaqI, BssHII) were overlaid with paraffin oil to prevent evaporation during incubation.

Four basic 10x buffers were prepared, which were supplemented with additional salts if required according to the manufacturers' recommendations.

10x L buffer: 100mM Tris.Cl pH7.5, 100mM MgCl2, 10mM DTT
10x M buffer: 500mM NaCl, 100mM Tris.Cl pH7.5, 100mM MgCl2, 10mM DTT
10x H buffer: 1M NaCl, 100mM Tris.Cl pH7.5, 100mM MgCl2, 10mM DTT
10x VH buffer: 1.5M NaCl, 100mM Tris.Cl pH7.5, 100mM MgCl2, 10mM DTT
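The digest rules above can be expressed as a small checklist function. In this Python sketch the 1x NaCl concentrations are derived from the 10x buffer recipes; treating NaCl as the only relevant salt, the optional supplement argument, and the function and argument names are assumptions.

```python
# 1x NaCl concentrations (mM) implied by the 10x L, M, H and VH buffers.
NACL_1X_MM = {"L": 0, "M": 50, "H": 100, "VH": 150}


def digest_checklist(buffer: str, total_ul: float, enzyme_ul: float,
                     genomic: bool, overnight: bool, thermophilic: bool,
                     extra_nacl_mm: float = 0.0) -> list[str]:
    """Return extra steps a digest needs; raise if too much enzyme is added."""
    # Rule from the text: enzyme must not exceed 10% of the total volume.
    if enzyme_ul * 10 > total_ul:
        raise ValueError("enzyme volume exceeds 10% of the digest volume")
    steps = []
    # Rule from the text: spermidine for genomic digests above 50mM salt.
    if genomic and NACL_1X_MM[buffer] + extra_nacl_mm > 50:
        steps.append("add spermidine to 5mM")
    # Rule from the text: oil overlay for overnight thermophile digests.
    if overnight and thermophilic:
        steps.append("overlay with paraffin oil")
    return steps


print(digest_checklist("H", total_ul=100, enzyme_ul=10,
                       genomic=True, overnight=True, thermophilic=False))
```

Under this reading, genomic digests in H or VH buffer would receive spermidine, whereas M buffer (50mM NaCl at 1x) sits exactly at the threshold and would not.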

2.11. Electrophoresis of DNA and RNA

2.11.1. Conventional agarose gel electrophoresis

All DNA samples were mixed with 0.2 volumes of the appropriate loading buffer before electrophoresis. Routine restriction endonuclease digests of plasmid, cosmid and bacteriophage DNA were resolved on 0.8% agarose gels in 0.5x TBE buffer containing 0.5µg/ml ethidium bromide using Bio-Rad Mini-Sub minigel apparatus (gel length 10cm). Digests of genomic DNA were resolved using Bio-Rad DNA Sub-Cell apparatus (gel length 20cm) in 1x TAE buffer, typically on 0.8% agarose gels, which were suitable for most purposes. The agarose concentration was increased up to 1.2% for the resolution of smaller fragments. DNA size markers were HindIII-cut bacteriophage λ, BstEII-cut λ or HaeIII-cut ΦX174, depending on the size range of interest.

10x TBE: 108g/l Tris base, 55g/l boric acid, 20mM EDTA

10x TAE: 48.4g/l Tris base, 11.4ml/l glacial acetic acid, 20mM EDTA

5x loading buffer: 50% glycerol, 60mM EDTA, 0.25% bromophenol blue, 5x TBE or TAE

2.11.2. Pulsed field gradient gel electrophoresis

PFGE was performed using Pulsaphor apparatus (LKB). 250ml 0.8% agarose in 0.25x TBE were poured into the 20x20cm casting frame supplied. The 16-well LKB comb was used to form the slots into which the digested blocks were inserted. The blocks were sealed into the slots with molten 1% LMPA. Yeast chromosomes (S. cerevisiae strain YP148, a gift from Denise Barlow, Genome Analysis Laboratory, ICRF) and concatemers of bacteriophage λ DNA were used as size markers. The gel tank contained 2.5 litres of 0.25x TBE buffer, which was recirculated at 10°C. The electrodes were positioned in the standard double inhomogeneous field configuration as described in the LKB manual. The gel was run at 330V with a constant pulse time of 45-60sec for 22-26hr, which resolved molecules of up to 900kb.

2.11.3. Electrophoresis of RNA

RNA was resolved in formaldehyde agarose gels. 3g agarose, 20ml 10x MOPS buffer and 150ml DDW were mixed and boiled in a microwave. The gel was cooled to 50°C and 24ml 38% formaldehyde were added before pouring. RNA samples (10-15µg total RNA or 1.5-3µg poly(A)+ RNA) were desiccated under vacuum and resuspended in 5µl deionised formamide, 1µl 10x MOPS buffer, 1.6µl 38% formaldehyde and 1.4µl DDW. The samples were heated to 65°C for 5min and cooled on ice. 1µl loading dye (50% glycerol, 0.1% bromophenol blue) was added and the samples were loaded on the gel. Electrophoresis was carried out at 30-40mA overnight.

2.12. Preparation of DNA and RNA blots

Nucleic acids were routinely transferred to nylon membranes (Hybond-N, Amersham) using blotting apparatus and protocols described in Sambrook et al. (1989).

2.12.1. Preparation of DNA blots

Following electrophoresis, gels which had not been run in the presence of ethidium bromide (e.g. pulsed field gels) were stained by immersing in running buffer containing 0.5µg/ml ethidium bromide for 1hr. The gel was destained by immersing in running buffer or DDW without ethidium bromide for 30-60min. A photograph was then taken before the gel was soaked in 0.15M HCl for 2x 10min to partially depurinate the DNA. (This treatment resulted in more efficient transfer of high molecular weight DNA to the filter, because the depurinated sites are cleaved during the denaturation step, fragmenting the long molecules.) Following a rinse in DDW, the gel was soaked in denaturing solution for 2x 20min and then in neutralising solution for 2x 20min. The gel was then assembled in a capillary blotting apparatus containing 20x SSC and transfer of nucleic acid to the membrane was allowed to proceed overnight (genomic Southerns) or for 48hr (PFGE gels). When blots of minigels containing cosmid, plasmid or phage DNA were being prepared, the acid treatment step was omitted and each denaturation or neutralisation step was reduced to 10min. Up to four blots were made from a single minigel by allowing transfer to proceed for 15min each time before changing the filter.

After transfer, filters were rinsed by immersing in 2x SSC and air dried. The DNA was then fixed to the membrane by baking for 2hr at 80°C or by UV cross-linking at 0.4J/cm².

Denaturing solution: 0.5M NaOH, 1.5M NaCl

Neutralising solution: 0.5M Tris.Cl pH7.5, 1.5M NaCl, 1mM EDTA

2.12.2. Preparation of RNA blots

Following electrophoresis, northern gels were stained for 30min in 1µg/ml ethidium bromide and destained in water for 1hr before photography. Gels were then rinsed in 20x SSC and blotted onto Hybond-N overnight in 20x SSC. After RNA transfer, filters were baked for 2hr at 80°C in a vacuum oven and UV cross-linked at 0.4J/cm².

RNA loading was judged by hybridising northern blots with the probe pEDl, a cDNA subclone of the esterase-D gene ESD, which is ubiquitously expressed in a cell-cycle independent fashion (Squire et al., 1986).

2.13. Preparation of DNA probes

Probe DNA fragments were purified using Geneclean (BIO 101) from normal agarose gels run in 1x TAE buffer, or excised directly from low melting point agarose (LMPA) gels run in 0.5x TBE buffer.

Geneclean-purified DNA was labelled to high specific activity with [α-32P]dCTP by random hexamer priming (Feinberg and Vogelstein, 1983). For Genecleaned fragments, DDW was added to 10-50ng probe DNA to give a final volume of 33µl, and the mixture was boiled for 3min. It was briefly cooled on ice before the addition of 10µl OLB, 2µl 10mg/ml BSA, 2.5µl [α-32P]dCTP (10µCi/µl) and 2.5µl Klenow fragment (8U/µl, BRL).

For probes isolated on LMPA gels, the required fragment was cut out with a scalpel over a UV transilluminator and the resulting gel slice was boiled with 3 volumes DDW for 7min. After cooling to 37°C, aliquots containing 50ng DNA were transferred to fresh Eppendorf tubes and the volume adjusted to 33µl with DDW. One aliquot was then used for the labelling reaction as described above (Feinberg and Vogelstein, 1984).
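The aliquot volume follows from the amount of fragment in the slice and the fourfold dilution (slice plus 3 volumes DDW). This Python sketch assumes the fragment mass in the slice has been estimated and the slice volume in µl is known; the helper name is hypothetical.

```python
def lmpa_labelling_aliquot(slice_ul: float, fragment_ng: float,
                           target_ng: float = 50.0,
                           reaction_dna_ul: float = 33.0) -> tuple[float, float]:
    """Return (aliquot_ul, ddw_ul) giving target_ng DNA in reaction_dna_ul.

    The melted slice is diluted with 3 volumes of DDW, so the DNA is spread
    through 4x the slice volume.
    """
    conc_ng_per_ul = fragment_ng / (4 * slice_ul)
    aliquot_ul = target_ng / conc_ng_per_ul
    if aliquot_ul > reaction_dna_ul:
        raise ValueError(
            f"fragment too dilute: {target_ng}ng needs {aliquot_ul:.1f}ul, "
            f"more than {reaction_dna_ul}ul")
    return aliquot_ul, reaction_dna_ul - aliquot_ul


# A 100ul slice thought to contain 1ug of fragment
print(lmpa_labelling_aliquot(slice_ul=100, fragment_ng=1000))
```

In this example 20µl of melted, diluted slice would be made up to 33µl with 13µl DDW; a slice carrying much less DNA could not supply 50ng within the 33µl allowed for the labelling reaction.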

The reaction was typically incubated for 6hr at room temperature. After this time, probe DNA was separated from unincorporated nucleotides by passing the reaction mix through a TES-equilibrated G-50 Sephadex column prepared in a Pasteur pipette plugged with glass wool. The first radioactive peak was collected and assayed. Labelled probes were boiled for 5min before addition to hybridisation bags (section 2.14).

Probes containing repetitive DNA sequences were competed before use by adding 100µg sonicated human carrier DNA per 50ng labelled probe in a final concentration of 5x SSC, boiling for 5min, and incubating at 65°C for 100min before addition to the hybridisation bag.

OLB: 100µl solution A, 250µl solution B, 150µl solution C

Solution A: 1.25M Tris.Cl pH7.5, 0.125M MgCl2, 18µl β-mercaptoethanol, 5µl each 0.1M dATP, dTTP, dGTP (Pharmacia). (Solutions of nucleotides were prepared in 3mM Tris.Cl pH7.5, 0.2mM EDTA.)

Solution B: 2M HEPES adjusted to pH7.4 with NaOH

Solution C: hexadeoxyribonucleotides (Pharmacia) suspended at 90 OD260 units/ml in 10mM Tris.Cl, 1mM EDTA.

TES: 10mM Tris.Cl pH8.0, 10mM EDTA, 0.5% SDS

2.14. Hybridisation of blots

2.14.1. Hybridisation of DNA blots

Genomic blots were prehybridised at 65°C for 5-8hr in 6x SSC, 10% dextran sulphate, 5x Denhardt's solution, 0.5% SDS and 100µg/ml heat-denatured sonicated salmon sperm DNA. Hybridisation was carried out at 65°C overnight in fresh buffer with the addition of 10⁶ cpm probe per ml (Sambrook et al., 1989).

Cosmid blot filters and recombinant DNA library filters were hybridised in the same way except that the probe concentration was reduced to 5×10⁵ cpm/ml.
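The amount of labelled probe to add scales with the volume of hybridisation buffer. A minimal Python sketch, assuming the probe's activity has been measured as cpm per µl after the Sephadex column; the blot-type keys and function name are hypothetical labels for the two concentrations given in the text.

```python
# Probe concentrations from the text (cpm per ml of hybridisation buffer).
PROBE_CPM_PER_ML = {"genomic": 1e6, "cosmid_or_library": 5e5}


def probe_volume_ul(buffer_ml: float, probe_cpm_per_ul: float,
                    blot_type: str = "genomic") -> float:
    """Volume of labelled probe to add to a hybridisation bag."""
    cpm_needed = buffer_ml * PROBE_CPM_PER_ML[blot_type]
    return cpm_needed / probe_cpm_per_ul


# 20ml buffer and a probe counting 1e5 cpm/ul
print(probe_volume_ul(20, 1e5))
print(probe_volume_ul(20, 1e5, blot_type="cosmid_or_library"))
```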

In hybridisations using a human genomic or cDNA probe on human DNA, filters were washed for 20min at 65°C in 2x SSC, 0.5% SDS followed by 20min at 65°C in 0.1x SSC, 0.5% SDS. These washing conditions were used unless otherwise stated. When a cross-species hybridisation was being performed, lower washing stringencies were used, as described in the appropriate places in the text. A typical final wash, determined empirically, was 6x SSC at 65°C. Filters were exposed to Kodak XAR-5 autoradiography film at -80°C between intensifying screens.

100x Denhardt's solution: 2% (w/v) BSA, 2% (w/v) Ficoll, 2% (w/v) polyvinylpyrrolidone

2.14.2. Hybridisation of RNA blots

Northern blots were prehybridised at 42°C for 1hr in a buffer containing 50% deionised formamide, 6x SSPE, 5x Denhardt's solution and 100µg/ml sheared salmon sperm DNA. Hybridisation was carried out at 42°C overnight in fresh buffer with the addition of 10⁶ cpm probe per ml.

When a human cDNA probe was hybridised to a human northern blot, filters were washed at 65°C for 20min in 2x SSPE, 0.5% SDS followed by 20min at 65°C in 0.1x SSPE, 0.5% SDS. When a human genomic fragment containing an unknown amount of exon sequence was used as the probe, the washing stringency was reduced. A typical final wash, determined empirically, was 2x SSPE at 50°C.

20x SSPE: 3.6M NaCl, 0.2M sodium phosphate pH7.7, 0.02M EDTA

2.15. DNA sequencing

DNA was sequenced by the chain-termination method (Sanger et al., 1977) using Sequenase protocols and reagents (USB). The Sequenase enzyme used was a modified bacteriophage T7 DNA polymerase. Sequencing was performed on double stranded DNA obtained from maxipreps or minipreps. DNA prepared by alkaline lysis minipreps was found to be clean enough for sequencing, but the RNA had to be removed first in order to accurately determine the concentration of DNA present. Oligonucleotide primers other than those for M13-based vectors were made in the Oligonucleotide Synthesis Laboratory, ICRF.

2.15.1. Sequencing reaction

2µg template DNA, 2µl 20mM EDTA and 2µl 2M NaOH were mixed in a microfuge tube with DDW to bring the final volume to 20µl. The mixture was incubated for 5min at room temperature before neutralising with 3µl 3M NaAc pH5.2 and 7µl DDW. 75µl absolute ethanol were then added and the DNA precipitated at -70°C for 15min. The DNA was pelleted by microfuging at 4°C for 10min, washed in 100µl 70% ethanol, respun, and the supernatant removed by aspiration with a drawn-out Pasteur pipette. The pellet was briefly dried under vacuum and resuspended in 7µl DDW and 2µl 5x reaction buffer. 0.5pmol primer were added to the resuspended denatured template and the two were annealed by incubating the mixture at 65°C for 2min and then cooling slowly to below 35°C. The primer was extended by adding to the template/primer mix (10µl): 1µl 0.1M DTT, 2µl 1x labelling mix, 0.5µl [α-35S]dATP and 2µl Sequenase enzyme diluted 1:7 in TE pH8.0. The reaction was incubated at room temperature for 5min. 3.5µl were then transferred to each of four prewarmed tubes, each containing 2.5µl of one of the termination mixes. The reactions were incubated at 37°C for 5min, during which time dideoxynucleotides were incorporated into the extended DNA molecules. 4µl stop solution were then added to each tube. The samples were denatured at 75°C for 2min before resolving 2.5µl of each on a sequencing gel. The remainder of each sample was stored at -20°C.

To resolve compressions, sometimes observed on the gel owing to secondary structures in regions rich in dG and dC, the sequencing reaction was performed with dITP substituted for dGTP (and ddITP for ddGTP). dITP forms weaker secondary structures than dGTP, and these are more readily denatured during electrophoresis, improving resolution in the compressed regions.

5x reaction buffer: 200mM Tris.Cl pH7.5, 100mM MgCl2, 250mM NaCl

5x labelling mix: 7.5µM each of dGTP, dCTP and dTTP; diluted to 1x with DDW.

ddG termination mix: 80µM each of dGTP, dATP, dCTP and dTTP; 8µM ddGTP; 50mM NaCl
ddA termination mix: 80µM each of dGTP, dATP, dCTP and dTTP; 8µM ddATP; 50mM NaCl
ddC termination mix: 80µM each of dGTP, dATP, dCTP and dTTP; 8µM ddCTP; 50mM NaCl
ddT termination mix: 80µM each of dGTP, dATP, dCTP and dTTP; 8µM ddTTP; 50mM NaCl

Stop solution: 95% formamide, 20mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol
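The concentrations reached in the template denaturation and neutralisation steps of section 2.15.1, and the volume bookkeeping of the labelling step, follow from C1V1 = C2V2. A short Python sketch; the helper name is hypothetical and the volumes are those given in the protocol.

```python
def final_conc(stock_conc: float, stock_ul: float, final_ul: float) -> float:
    """C2 = C1 * V1 / V2 for a component diluted into a final volume."""
    return stock_conc * stock_ul / final_ul


# Denaturation: 2ul 2M NaOH and 2ul 20mM EDTA in a final 20ul
naoh_m = final_conc(2.0, 2, 20)     # 0.2M NaOH
edta_mm = final_conc(20.0, 2, 20)   # 2mM EDTA
# Neutralisation: 3ul 3M NaAc, with 7ul DDW, brings the volume to 30ul
naac_m = final_conc(3.0, 3, 30)     # 0.3M NaAc

# Labelling: 10ul template/primer + DTT + labelling mix + label + enzyme
labelling_ul = sum([10.0, 1.0, 2.0, 0.5, 2.0])   # 15.5ul
dispensed_ul = 4 * 3.5                           # 14ul to the four tubes
assert dispensed_ul <= labelling_ul
print(naoh_m, edta_mm, naac_m, labelling_ul, dispensed_ul)
```

The check confirms that the 15.5µl labelling reaction is enough to dispense 3.5µl into each of the four termination tubes.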

2.15.2. Denaturing polyacrylamide gel electrophoresis

Sequencing gel electrophoresis was performed using Koch-Light apparatus and gel plates. Before each run, the gel plates (40x20cm) were washed in warm soapy water, rinsed and dried. They were cleaned by wiping first with ethanol, then acetone, and finally DDW. The back plate was siliconised with Repelcote. 0.4mm spacers were positioned between the plates and the sandwich fastened together with bulldog clips. 50ml polyacrylamide gel mix were prepared and injected between the plates with a syringe. The comb was positioned and the gel was allowed to set. The gel was then assembled in the tank and warmed by electrophoresing in 1x STBE for at least 20min at 4kW. Excess urea was syringed from the loading area and 2.5µl of each sample were then loaded. A run time of about 100min resolved sequences near the primer. Longer runs of up to 4hr allowed sequences over 200 nucleotides from the primer to be read.

Following the run, the back plate was carefully lifted off and the gel (on the front plate) was fixed in a tray containing 10% glacial acetic acid, 10% methanol for 15min. The gel was then drained, transferred to a sheet of Whatman 3MM paper, covered in Saran wrap and dried at 80°C for 30min on a Bio-Rad slab drier. Autoradiography was performed overnight using Kodak XAR-5 film at room temperature. Sequences were compiled using Intelligenetics GEL and SEQ software. Comparisons of nucleotide and predicted amino acid sequences to the databases were performed using the Intelligenetics IFIND program.

Gel mix: 31.5ml 50% urea, 6.7ml 40% acrylamide stock, 5ml 10x STBE, DDW to 50ml, 400µl 100mg/ml ammonium persulphate, 60µl TEMED

40% acrylamide stock: 38% (w/v) acrylamide, 2% N,N'-methylenebisacrylamide; deionised, filtered and stored at 4°C.

10x STBE: Tris base 108g/litre, boric acid 55g/litre, EDTA 9.3g/litre

3. Identification of potential CpG islands in the class II region by PFGE mapping

3.1. Strategy to identify potential sites of genes

The approach taken to identify the position of novel genes in the class II region was based on the observation that the 5' ends of genes are frequently associated with short stretches of CpG-rich DNA, known as CpG islands (Bird, 1987).

Vertebrate DNA is relatively depleted of C+G, and the dinucleotide CpG is particularly rare, being present at only 20% of the expected frequency. The majority of CpG dinucleotides are stably methylated at the 5 position on the cytosine ring (Bird, 1986). It has been observed, however, that about 1% of the genomic DNA in a wide variety of vertebrate species is non-methylated, as judged by cleavage of genomic DNA with the methylation-sensitive restriction endonuclease HpaII (which cuts at CCGG but not CmCGG) (Cooper et al., 1983). Cloning and sequencing of a number of the HpaII fragments revealed that they were rich in C+G and that CpG was present at the expected frequency (Bird et al., 1985). Randomly selected clones were found to detect transcripts when used as probes on northern blots, and characterisation of one clone in detail revealed that the CpG-rich region was located at the 5' end of two divergent transcripts on opposite strands (Lavia et al., 1987). It had also been independently shown that non-methylated sequences characteristic of CpG islands were associated with the 5' ends of a number of vertebrate genes, such as the chicken α2(I) collagen gene (McKeon et al., 1982). A subsequent search of the gene databases revealed that all constitutively expressed 'housekeeping' genes and many tissue-specific genes had CpG-rich islands at the 5' end (Bird, 1986; Gardiner-Garden and Frommer, 1987).
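CpG depletion of the kind described above is usually expressed as an observed/expected ratio, where the expected CpG count is (count of C × count of G) / sequence length; bulk vertebrate DNA would score about 0.2 on this measure. The Python sketch below computes that ratio and applies the thresholds commonly attributed to Gardiner-Garden and Frommer (1987): length of at least 200bp, G+C above 50% and CpG observed/expected above 0.6. The function names are hypothetical, and the thresholds are parameters rather than criteria used in this thesis.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by (count C * count G) / length."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, which is safe because "CG" cannot overlap itself.
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, G+C and CpG observed/expected thresholds."""
    return (len(seq) >= min_len and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


print(cpg_obs_exp("CCGG"), looks_like_cpg_island("CG" * 150))
```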

The unusual nucleotide composition of CpG islands led to the prediction that islands would be detectable in genomic DNA as clusters of sites for the infrequently cutting restriction endonucleases used for constructing PFGE maps (Brown and Bird, 1986; Lindsay and Bird, 1987). These 'rare-cutter' enzymes have 6 or 8bp recognition sequences containing one or more CpGs, which must be unmethylated for cleavage to take place. CpG islands are therefore likely to contain sites for these enzymes. Indeed, if it is assumed that CpG islands contain 65% G+C, are not depleted in CpG and are 1kb long, whereas bulk DNA is 40% G+C with CpG occurring at 25% of the expected level, then the theoretical number of sites for each enzyme in the human genome (3×10⁹bp) and the proportion of these sites in islands can be calculated (Table 3.1).

Enzyme       Target      Total sites   % sites in   Sites per island
[class]      sequence    in genome     islands      Expected   Observed
NotI [a]     GCGGCCGC    4100          89           0.12       0.3
BssHII [a]   GCGCGC      47000         74           1.2        1.2
EagI [a]     CGGCCG      47000         74           1.2        1.1
MluI [b]     ACGCGT      37000         21           0.3        0.03
NruI [b]     TCGCGA      37000         21           0.3        0.1

Table 3.1. Distribution of selected rare-cutter sites in human DNA. For each enzyme (each recognition sequence contains two CpGs) the total number of sites, the percentage of sites in islands and the expected number of sites per island were calculated by Lindsay and Bird (1987). The observed number of sites for each enzyme was obtained from the nucleotide sequences of the CpG islands at a random selection of human genes in the databases (Bird, 1990). Class designation ([a] or [b]) is from Bird (1990).
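The calculated columns of Table 3.1 can be approximated with a simple independent-base model built on the assumptions stated in the text (islands 1kb long with 65% G+C and no CpG depletion; bulk DNA 40% G+C with CpG at 25% of the expected level; genome 3×10⁹bp). The Python sketch below is not Lindsay and Bird's own calculation. It additionally assumes about 30,000 islands, a figure not given in this section, and applies the CpG depletion factor once per CpG in the recognition sequence.

```python
GENOME_BP = 3e9      # from the text
ISLAND_BP = 1_000    # from the text: islands assumed to be 1kb long
N_ISLANDS = 30_000   # ASSUMPTION: approximate island number, not given here


def site_probability(site: str, gc: float, cpg_factor: float) -> float:
    """Chance that a given position starts `site`, with independent bases.

    gc: G+C fraction; cpg_factor: observed/expected CpG ratio, applied once
    per CpG in the recognition sequence.
    """
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p * cpg_factor ** site.count("CG")


def rare_cutter_estimate(site: str, n_islands: int = N_ISLANDS):
    """Return (total sites, % of sites in islands, sites per island)."""
    per_island = site_probability(site, gc=0.65, cpg_factor=1.0) * ISLAND_BP
    island_sites = per_island * n_islands
    bulk_bp = GENOME_BP - n_islands * ISLAND_BP
    bulk_sites = site_probability(site, gc=0.40, cpg_factor=0.25) * bulk_bp
    total = island_sites + bulk_sites
    return total, 100 * island_sites / total, per_island


TABLE_3_1 = {"NotI": "GCGGCCGC", "BssHII": "GCGCGC", "EagI": "CGGCCG",
             "MluI": "ACGCGT", "NruI": "TCGCGA"}

for name, site in TABLE_3_1.items():
    total, pct, per_island = rare_cutter_estimate(site)
    print(f"{name:7s} {site:9s} {total:8.0f} {pct:5.1f} {per_island:5.2f}")
```

Under these assumptions the sketch gives roughly 4,200 NotI sites (about 89% in islands, 0.12 per island) and roughly 47,000 BssHII or EagI sites (about 75% in islands, 1.2 per island), close to the calculated values in Table 3.1. For MluI and NruI it gives about 37,000 sites and 0.34 per island, but about 28% rather than 21% of sites in islands, so the published calculation presumably includes refinements that this simplified model lacks.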

The table shows that rare-cutter enzymes whose recognition sequences consist entirely of C+G and contain two CpG dinucleotides are predicted to be highly diagnostic for CpG islands, because the majority of sites for these enzymes should occur in islands (class [a] enzymes; Bird, 1990). Most importantly, it has been demonstrated in practice that sites for class [a] enzymes are indeed clustered in CpG islands and are associated with transcribed sequences (Brown and Bird, 1986; Lindsay and Bird, 1987; Bird, 1990). The mapping and cloning of rare-cutter sites in a genomic region of interest has successfully led to the identification of novel genes in mouse and man (Rappold et al., 1987; Estivill et al., 1987). In contrast, rare-cutter enzymes with recognition sequences containing A+T in addition to C+G (class [b] enzymes, Table 3.1) are predicted in theory, and shown in practice, to cut more frequently in inter-island DNA (Lindsay and Bird, 1987; Bird, 1990).

The feasibility of using PFGE to map the class II region has been demonstrated by Hardy et al. (1986) and more recently by Dunham et al. (1989) and Inoko et al. (1989). The aim of these studies was to use PFGE to construct detailed physical maps of the region from which the relative order of the different subregions and the distances between them could be established. In the present study it was decided to map the class II region by PFGE with the specific aim of locating class [a] rare-cutter sites which could mark the presence of genes. These sites could then be cloned and analysed in detail for evidence of transcribed sequences.

3.2. Choice of materials for construction of the PFGE map

3.2.1. Restriction endonucleases

As discussed above, class [a] rare-cutter enzymes are predicted from theory and shown in practice to cut far more frequently in CpG islands than in inter-island DNA (Table 3.1.). These were therefore selected as the enzymes of choice for finding CpG islands. However, it was also necessary to use class [b] rare-cutters, which tend to cut between CpG islands, to generate complementary, overlapping fragments that could be used to link the class [a] fragments together and build up the map. The class [a] enzymes used in this study were BssHII, EagI and NotI. The class [b] enzymes used were MluI and NruI.

3.2.2. DNA

Most individuals in the population possess two distinct class II haplotypes, whose long-range maps may differ because of polymorphisms at the rare-cutter sites, variation in DRB gene number, or differences in the amount of DNA between subregions (Dunham et al., 1989; Lawrance and Smith, 1990). To avoid difficulties in interpreting data from an MHC heterozygote, PFGE blots were prepared from the DNA of the EBV-transformed B-lymphoblastoid cell line PGF, which is homozygous for the entire MHC as judged by tissue typing. The haplotype of this cell line is A3 B7 DRw15(2) Dw2 DQw6 DPw4.

3.2.3. PFGE apparatus

PGF DNA digested with rare-cutter enzymes was resolved by pulsed field gradient gel electrophoresis using LKB Pulsaphor apparatus with the electrodes in the double inhomogeneous field configuration (Figure 3.1.).


Figure 3.1. Schematic illustration of a double inhomogeneous pulsed field gel electrophoresis apparatus. Electrodes are represented by black dots. Arrows indicate the direction of motion during electrophoresis of a DNA molecule loaded in the central well under the influence of the electric field which pulses first in the north-south direction and then in the west-east direction. Fractionation of molecules is believed to be dependent on the greater ability of small molecules to re-orientate themselves when the field direction changes (Smith et al., 1987).

A constant pulse time of 45-60sec and a run time of 20-24hr were found to resolve molecules of up to 900kb. One disadvantage of this gel system is that electrophoresis produces bowed rather than straight tracks, which can complicate comparison of fragment sizes between lanes. In practice, however, only a small amount of distortion occurred during short runs (Figure 3.2.a.). Hybridisation of blots generated with this system gave sharp bands (Figure 3.2.b.). Blots generated using the LKB hexagonal electrode array, which resolves DNA in straight tracks, gave less well defined bands (data not shown).

3.2.4. Probes

The probes for the class II DPB1, DPA1, DOB, DQA2, DRB1 and DRA genes were those described in the report of the Tenth International Histocompatibility Workshop (Marcadet et al., 1989). The probe for the DNA gene was 8bal, a 1.8kb PstI genomic fragment isolated from the cosmid JG8b (Trowsdale and Kelly, 1985).

3.3. Construction of the PFGE map

DNA from the cell line PGF was digested with rare-cutter enzymes, resolved by PFGE, transferred to nylon membranes and hybridised with probes for class II genes. The sizes of the fragments detected by each probe with each enzyme or combination of enzymes are shown in Table 3.2. Representative autoradiographs are shown in Figures 3.2.b., 3.3. and 3.4.

Figure 3.2. (a) ... pulsed-field electrophoresis. Size markers are yeast chromosomes (YC). LM, limiting mobility. (b) Autoradiographs of the blot of the same gel hybridised sequentially with the class II gene probes indicated.

Figure 3.3. Autoradiographs of a PFGE filter sequentially hybridised with probes for the class II genes DPB1, DNA and DOB. N, NotI; B, BssHII; M, MluI. The MluI+NotI double digest is partial. Size markers are concatemers of bacteriophage lambda DNA. LM, limiting mobility.

Figure 3.4. (a) ... (b) Autoradiographs of a PFGE filter hybridised sequentially with DOB and DQA2 probes, illustrating the strong and weak bands obtained with the DQA2 probe as discussed in the text. Size markers are yeast chromosomes in each case. B, BssHII; E, EagI; M, MluI.

Enzyme   DPB1   DPA1   DNA    DOB    DQA2 (S)   DQA2 (W)   DRB    DRA
BssHII   250    250    250    210    210        500        500    500
EagI     250    250    250    210    210        500        500    500
MluI     130    450    450    450    450        570        570    570
NotI     370    370    370    >900   >900       >900       >900   >900
NruI     >900   >900   >900   >900   >900       >900       >900   >900
B+E      250    250    250    210    NT         NT         500    NT
B+M      130    120    120    210    210        500        500    500
B+N      250    250    250    210    210        500        500    500
M+N      130    240    240    210    210        570        570    570

Table 3.2. Sizes of rare-cutter fragments detected by MHC class II probes on PFGE blots. Fragment sizes are in kilobases. Probes are shown across the top and rare-cutter enzymes are shown down the side (B, BssHII; E, EagI; M, MluI; N, NotI). The DQA2 probe hybridised to two bands in each track: one strong (S), corresponding to the fragment carrying the DQA2 gene, and the other weaker (W), corresponding to the fragment carrying the cross-hybridising DQA1 gene (see text for further discussion). NT, not tested. Fragment sizes were deduced from more than one blot.

The genes of the DP subregion are known from family studies to be at the centromeric end of the class II region (Shaw et al., 1981). From cosmid walking it has been established that the configuration of the genes within the subregion is DPB2-DPA2-DPB1-DPA1 (Trowsdale et al., 1984). It has also been shown by deletion mapping that the cluster is oriented on the short arm of chromosome 6 with DPB2 toward the centromere (Erlich et al., 1986). This information was used as a starting point for the construction of the PFGE map. Since the DPB1 and DPA1 genes are only 2kb apart (Kelly and Trowsdale, 1985), it was not surprising that the DPB1 and DPA1 probes detected NotI, BssHII and EagI fragments of the same size (Table 3.2.). With MluI, however, the DPB1 probe detected a 130kb fragment while the DPA1 probe detected a 450kb fragment (Figure 3.2.b.). This result indicated that there must be an MluI site in the genomic DNA between the DPA1 and DPB1 probes. These data allowed the immediate positioning of three MluI sites: the first between the DPA1 and DPB1 probes, the second 130kb proximal and the third 450kb distal (Figure 3.5.). The 250kb BssHII fragment detected by both DPB1 and DPA1 probes was cut by MluI to give a 130kb fragment detected by DPB1 and a 120kb fragment detected by DPA1 (Table 3.2.). Since the 130kb and 120kb fragments together account for the whole 250kb BssHII fragment, the flanking BssHII sites must lie 130kb proximal and 120kb distal of the central MluI site. The 370kb NotI fragment detected by both probes was also cut by MluI, to give a 130kb fragment detected by DPB1 and a 240kb fragment detected by DPA1. With additional data from EagI single and double digests (Table 3.2. and Figure 3.4.a.) it was possible to position BssHII, EagI and NotI sites on either side of the HLA-DP subregion as shown in Figure 3.5. It was already apparent that clusters of class [a] rare-cutter enzyme sites were present in the class II region: cluster 1 contained sites for BssHII, EagI and NotI, while cluster 2 contained sites for BssHII and EagI.

The DNA gene probe hybridised to fragments of the same size as those detected by the DPA1 probe in all single and double digest combinations tested (Figure 3.3. and Table 3.2.). It was concluded that the DNA gene mapped distal of the DPA1 gene, but no more than about 120kb away, as defined by the position of the distal BssHII site.

The DOB gene probe detected a 450kb MluI fragment, like the DNA and DPA1 probes (Figures 3.2. and 3.3.). However, the DOB probe detected different NotI (>900kb), EagI (210kb) and BssHII (210kb) fragments, indicating that it must map distal of the DNA gene but still within the same 450kb MluI fragment. The double digest data obtained with the DOB probe allowed EagI and BssHII sites to be positioned coincident with the central NotI site. This defined a third cluster of class [a] rare-cutter sites (Cluster 3, Figure 3.5.).

The DQA2 probe hybridised to two fragments in each track: one strongly, corresponding to the fragment carrying the DQA2 gene, and one weakly, corresponding to the fragment carrying the DQA1 gene (Figure 3.4.b.). This pattern of hybridisation has also been reported in other studies with the same probe (Biro et al., 1989; Bontrop et al., 1989). With BssHII, EagI and MluI, the strongly hybridising band (DQA2) was of a similar size to that detected with the DOB probe, while the weaker band (DQA1) was novel (Figure 3.4.b.). It was concluded that there was a fourth cluster of rare-cutter sites between the DQA2 and DQA1 genes (Cluster 4, Figure 3.5.). Because the DQA2 and DOB probes detected similarly sized fragments in each single and double digest tested, it was not possible to determine the positions of these genes relative to one another. However, the relative positions of the DOB and DQA2 genes have been established as cen-DOB-DQA2-tel for a number of different cell lines, including DR2 homozygotes, in other studies (Hardy et al., 1986; Dunham et al., 1989; Inoko et al., 1989; Blanck and Strominger, 1988). Probes for the DQB1, DQB2 and DQB3 (formerly DVB) genes were not used in the present study, since their close proximity to the DQA1 and DQA2 genes is well documented (Okada et al., 1985b; Jonsson et al., 1987; Blanck and Strominger, 1988; Ando et al., 1989; Inoko et al., 1989). The overall order of the genes in the DO and DQ subregions has been established as cen-DOB-DQB2-DQA2-DQB3-DQB1-DQA1-tel by pulsed field mapping in a DR2 homozygous cell line and by chromosome walking in a DR7 homozygous cell line (Inoko et al., 1989; Blanck and Strominger, 1988). This information was used in the construction of the map shown in Figure 3.5.

The DRB1 and DRA gene probes both hybridised to fragments of the same size as the weakly hybridising (DQA1) bands detected by the DQA2 probe (Table 3.2.). It was not possible from these data to determine the relative positions of the DQA1, DRB1 and DRA genes, but it was clear that there were no further clusters of rare-cutter sites in the class II region. For the construction of the map shown in Figure 3.5., the positions of the genes in this region were assumed to be as previously determined for a DR2 homozygous cell line: cen-DQA1-DRB1-DRB2-DRB3-DRA-tel (Inoko et al., 1989; Kawai et al., 1989).

[Figure 3.5. Physical map of the class II region of PGF, showing clusters 1, 2, 3 and 4 of class [a] rare-cutter sites.]

The PFGE data were used to construct a physical map of the class II region of the PGF haplotype (Figure 3.5.). The novel feature of this map is the presence of four clusters, as defined at the level of resolution of the PFGE technique, of sites for class [a] rare-cutter enzymes, which, as discussed previously, are highly diagnostic for CpG islands. Each cluster contains sites for one or more of the enzymes EagI, BssHII and NotI. These clusters were considered likely to be the sites of novel genes and were selected as targets for cloning and further analysis.

3.4. Summary and discussion

The physical mapping data presented here reveal that the organisation of the class II region of the DR2-homozygous cell line PGF is very similar to that of other cell lines. The class II gene order deduced for PGF (centromere-DPB1-DPA1-DNA-[DOB, DQA2]-[DQA1, DRB, DRA]-telomere) is consistent with that reported for other cell lines in other studies. In another study in which DR2 haplotypes were mapped with some of the same enzymes (MluI, BssHII and NotI), the fragment sizes detected agreed with those found here within experimental error (Dunham et al., 1989). The fragment sizes detected in PGF DNA with DRA and DRB gene probes are consistent with the conclusion of Dunham et al. (1989) that DR2 haplotypes possess more DNA in the DR subregion than DR3, DR5 and DR6 haplotypes, but less than DR4 haplotypes.

The PFGE map of the PGF class II region has revealed the presence of four clusters of sites for rare-cutter enzymes which are highly diagnostic for CpG islands and which could mark the positions of novel genes (Figure 3.5.). The next step towards testing this hypothesis was to clone the regions containing the clusters of rare-cutter sites by walking from previously cloned class II genes. In general, the precise distance between each class II gene and the nearest rare-cutter sites could not be determined from the map. However, the DP subregion is anchored to the physical map because an MluI site occurs between the probes for the DPB1 and DPA1 genes, which map only 2kb apart (Kelly and Trowsdale, 1985). Cluster 1 maps 130kb centromeric of this site and cluster 2 maps 120kb telomeric. The DP subregion has been cloned on overlapping cosmids, which extend 70kb centromeric and 20kb telomeric of the MluI site as shown in Figure 3.5. (Trowsdale et al., 1985). This previously established cosmid walk was potentially a useful starting point for chromosome walking towards clusters 1 and 2, since cluster 1 was estimated to be about 60kb (130kb - 70kb) from the proximal end of the DP region cosmid clones, while cluster 2 was about 100kb (120kb - 20kb) from the distal end of the cloned region. In fact, the walk towards cluster 1 was accelerated by the serendipitous discovery that the gene for the α2 chain of type XI collagen mapped between cluster 1 and the HLA-DP subregion, as described in Chapter 4.

4. Mapping of the COL11A2 gene to the class II region

This chapter describes the mapping of the gene for the α2 chain of type XI collagen, a component of cartilage fibrils, to the centromeric end of the class II region of the MHC.

4.1. Introduction

The mechanical strength of cartilage is provided by a three-dimensional mesh of rigid, rod-like collagen fibres composed of type II, type IX and type XI collagen molecules (Mendler et al., 1989). Type II and type XI collagens are of the fibrillar class, containing a single uninterrupted triple helical domain as shown in Figure 4.1. (Miller and Gay, 1987). In cartilage collagen fibres, type II and type XI molecules are packed together in a ratio of about 8:1 in a supramolecular staggered array (Mendler et al., 1989).

Figure 4.1. Schematic diagram of a fibrillar collagen molecule. The mature molecule is 300nm long and is composed of three polypeptide chains, α1, α2 and α3. In fibrillar collagens most of the length of the molecule is accounted for by the long, uninterrupted triple helical domain (Miller and Gay, 1987).

Type XI collagen is a heterotrimer of α1, α2 and α3 polypeptide chains (Morris and Bachinger, 1987) which are encoded by three separate genes. The α3(XI) chain is believed to be a heavily glycosylated variant of the α1 chain of type II collagen and would therefore be a product of the COL2A1 gene at 12q14.3 (Furuto and Miller, 1983; Law et al., 1986). The α1(XI) chain gene (COL11A1) and the α2(XI) chain gene (COL11A2) have been cloned and sequenced more recently (Bernard et al., 1988; Kimura et al., 1989). COL11A1 has been mapped to human chromosome 1, region p21 (Henry et al., 1988). COL11A2 was first mapped in the Human Cytogenetics Laboratory, ICRF, to human chromosome 6, region p21.3-22, by in situ hybridisation (Figure 4.2.; Hanson et al., 1989). The MHC has previously been shown to map to 6p21.3 (Spring et al., 1985).

The possibility that COL11A2 might map near the MHC was considered extremely interesting because of the involvement of cartilage collagen in a clinically important MHC-associated disease, rheumatoid arthritis (RA; Lotz and Vaughan, 1988). About 75% of RA patients have antibodies autoreactive to type II collagen, and type II collagen-reactive T-lymphocytes are found at high frequency in the synovial infiltrates of chronically affected individuals (Londei et al., 1989). The importance of an autoimmune response to cartilage collagen in the development of the arthritic condition has been demonstrated in rodents and primates where the disease can be induced with injections of type II collagen (Stuart et al., 1984). The disease can also be transferred by injection of antibodies or T-cells from affected individuals. The arthritogenic potential of type XI collagen has not yet been investigated in such detail because type XI collagen was discovered much more recently. However, it has now been reported that inoculation of mice with type XI collagen also induces the development of arthritis (Boissier et al., 1990). In view of these results, it was decided to map the COL11A2 gene in detail relative to the MHC. If the gene was found to be closely linked it was hoped to perform RFLP studies on diseased and normal individuals to test whether certain COL11A2 alleles were associated with the incidence of RA.

Figure 4.2. Histogram showing distribution of silver grains over metaphase human chromosomes following in situ hybridisation with the COL11A2 probe. Analysis of 96 metaphase chromosome spreads revealed a clustering of grains over bands 6p21.3-22 (Hanson et al., 1989).

4.2. Mapping of COL11A2 using somatic cell hybrids

To confirm and refine the COL11A2 map position, it was decided to test for the presence of the gene in the DNA of a pair of complementary human/mouse somatic cell hybrids, MCP-6 and 56-47, which contain portions of chromosome 6 overlapping at band 6p21 as shown in Figure 4.3.a. MCP-6 contains the translocation chromosome t(6;X)(6qter-6p21::Xq13-Xqter) (Goodfellow et al., 1982) and 56-47 contains the translocation chromosome t(6;17)(6pter-6p21::17p13-17qter) along with other human chromosomes (Nagarajan et al., 1986). A Southern blot was prepared from EcoRI-digested DNA from the hybrids, and from total human and mouse genomic DNA as controls. Hybridisation of this blot with the COL11A2 probe, a 4.5kb EcoRI/BamHI fragment from the cosmid cosHcol.ll (Hanson et al., 1989), gave the result shown in Figure 4.3.b. The probe detected a fragment of about 11kb in total human, MCP-6 and 56-47 DNA, but not in mouse genomic DNA at the stringency used for washing (0.1× SSC, 65°C). It was therefore concluded that the COL11A2 gene must map to 6p21.1-3, the region shared by the hybrids (Figure 4.3.a.).

Taken together, the results from in situ hybridisation (6p21.3-22, Figure 4.2.) and somatic cell hybrid mapping (6p21.1-3, Figure 4.3.) suggested that the most likely location of COL11A2 was band 6p21.3, the only band common to both intervals and the map position of the MHC. It was therefore decided to attempt to link the COL11A2 locus with the MHC by PFGE mapping.
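The step from the two partial localisations to band 6p21.3 is an intersection of two intervals. The sketch below assumes a simplified, ordered list containing only the 6p bands named in this chapter (p21.1, p21.2, p21.3 and p22, from the centromere outwards); a full ideogram would need finer sub-bands.

```python
# Simplified list of 6p bands mentioned in the text, ordered outwards from the centromere.
BANDS_6P = ["p21.1", "p21.2", "p21.3", "p22"]


def band_interval(start: str, end: str) -> set[str]:
    """All bands between two boundary bands, inclusive, given in either order."""
    i, j = sorted((BANDS_6P.index(start), BANDS_6P.index(end)))
    return set(BANDS_6P[i : j + 1])


in_situ = band_interval("p21.3", "p22")    # in situ hybridisation (Figure 4.2.)
hybrids = band_interval("p21.1", "p21.3")  # region shared by MCP-6 and 56-47
print(sorted(in_situ & hybrids))
```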

4.3. Mapping of COL11A2 by PFGE

When the COL11A2 gene probe was hybridised to PFGE blots, it detected fragments similar in size to those detected by the DPB1 gene probe in all single and double digest combinations tested (Figure 4.4. and Table 4.1.).

Figure 4.3. (a) Intact human chromosome 6 (showing the position of the MHC), and regions of chromosome 6 contained in the human/mouse hybrids 56-47 and MCP-6. (b) Autoradiograph of Southern blot of EcoRI-digested DNA hybridised with the COL11A2 probe and washed to a final stringency of 0.1× SSC, 65°C. Lane 1, total human DNA; lane 2, total mouse DNA; lane 3, MCP-6; lane 4, 56-47. Size markers (kb): 23.1, 9.4, 6.6, 4.4, 2.3.

Figure 4.4. Autoradiographs of a PFGE filter hybridised sequentially with probes for the DPB1 and COL11A2 genes (fragment sizes marked: 370, 250 and 130kb). The filter was stripped and checked for retention of signal by autoradiography between hybridisations. B, BssHII; M, MluI; N, NotI.

Digest          DPB1      COL11A2
BssHII          250kb     250kb
EagI            250kb     250kb
MluI            130kb     130kb
NotI            370kb     370kb
NruI            >900kb    >900kb
BssHII+EagI     250kb     250kb
BssHII+MluI     130kb     130kb
BssHII+NotI     250kb     250kb
MluI+NotI       130kb     130kb

Table 4.1. Sizes of fragments detected by hybridising probes for the COL11A2 and DPB1 genes to PFGE blots of DNA from the cell line PGF cut with the enzymes indicated (Figure 4.4. and Hanson et al., 1989).

It was concluded that the COL11A2 and DPB1 genes were in close physical proximity at the centromeric end of the class II region. The upper limit on the distance between the two probes is given by the length of the shortest common band detected, a 130kb MluI fragment (Figure 4.4.). As shown in Chapter 3, the telomeric end of this 130kb MluI fragment falls between the DPA1 and DPB1 genes, while the centromeric end occurs within cluster 1 (Figure 3.5.).
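The upper bound used here can be stated as a rule: if two probes detect fragments of the same size, they are taken to lie on the same fragment, so their separation cannot exceed the shortest such fragment. The sketch below applies that rule to the sizes in Table 4.1. It relies on the assumption that co-migrating bands are the same fragment, which is an inference from consistency across digests rather than proof from any single digest.

```python
# Fragment sizes (kb) from Table 4.1 as (DPB1, COL11A2); None means >900kb.
TABLE_4_1 = {
    "BssHII": (250, 250),
    "EagI": (250, 250),
    "MluI": (130, 130),
    "NotI": (370, 370),
    "NruI": (None, None),
    "BssHII+EagI": (250, 250),
    "BssHII+MluI": (130, 130),
    "BssHII+NotI": (250, 250),
    "MluI+NotI": (130, 130),
}


def max_separation(table):
    """Upper bound (kb) on probe separation: the shortest sized fragment both probes share."""
    shared = [a for a, b in table.values() if a is not None and a == b]
    return min(shared) if shared else None


print(max_separation(TABLE_4_1))
```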

4.4. Cosmid walking between the DP subregion and COL11A2

The genes of the DP subregion have previously been isolated on overlapping cosmid clones (MANN 2.3, MANN 3.6 and MANN 2.2, Figure 4.5.; Trowsdale et al., 1985). These extend approximately 70kb proximal of the MluI site between DPA1 and DPB1. The COL11A2 probe did not, however, hybridise to blots of DNA from these cosmids (data not shown). The COL11A2 gene was therefore deduced to map centromeric of the previously cloned region but still within the same 130kb MluI fragment, as shown in Figure 4.5.


The DP-region cosmid walk was extended in the centromeric direction to confirm this and, in so doing, to link the COL11A2 locus to the DP genes.

A 4.5kb ClaI/SalI fragment from the end of MANN 2.3, the most centromeric cosmid in the cloned region, was subcloned into the Bluescript vector (Figure 4.5.). The ClaI site was in the genomic insert of MANN 2.3 and the SalI site was at the cloning site of the cosmid vector, pTCF. The subcloned fragment, pCS2, was found to contain many repetitive sequences when used as a probe on Southern blots of human genomic DNA. pCS2 was therefore cleaved with RsaI to generate several smaller fragments, each of which was tested for the presence of repeats by hybridising back to genomic Southern blots. One 500bp RsaI fragment (WP in Figure 4.5.) was found to be single copy and was used to screen the cosmid library. Two new cosmids, HPB.ALL 1 and HPB.ALL 8, were isolated with this probe and were shown to span the gap between MANN 2.3 and cosHcol.ll, the cosmid clone carrying the COL11A2 gene (Figure 4.5.). Since the position and orientation of the COL11A2 gene within cosHcol.ll were already known (Dr. Kathy Cheah, personal communication), this result established that the COL11A2 gene lies 45kb centromeric of DPB2, with the 3' end of the gene nearest the MHC.

4.5. Summary and discussion

Using mapping techniques of increasing resolution, the gene for the α2 chain of type XI collagen has been localised to the centromeric end of the class II region of the MHC. This is consistent with two other reports localising COL11A2 to the short arm of chromosome 6, although neither of these studies mapped the gene relative to the MHC (Kimura et al., 1989; Law et al., 1990). This assignment is in keeping with the general trend for diverse map locations of human fibrillar collagen genes. The COL3A1 and COL5A2 loci are both found within the chromosomal region 2q24.3-31, but COL1A1, COL1A2, COL2A1, COL11A1 and COL11A2 map to 17q21-22, 7q21.3-22.1, 12q14.3, 1p21 and 6p21.3 respectively (Vuorio and de Crombrugghe, 1990).

The finding that the COL11A2 locus maps only 45kb proximal to the DPB2 gene is intriguing, given the association of RA with the class II region and the involvement of cartilage collagens described in section 4.1. However, the major association of RA with the class II region is with the DR subregion (Nepom, 1990), and studies that have investigated a possible association of RA with the DP subregion have found no statistically significant difference in the frequency of DP alleles between patients and controls (Begovich et al., 1989; Stephens et al., 1989). It is thus unlikely that the close linkage of the COL11A2 gene to the class II region contributes to the association of classical RA with the class II region. However, another MHC-associated disease, pauciarticular juvenile rheumatoid arthritis (PJRA), does show an association with DP alleles; specifically, the incidence of DPw2 is increased in patients compared to controls (Odum et al., 1986; Begovich et al., 1989; Fugger et al., 1990). Anti-collagen antibodies have been observed in PJRA patients (Lotz and Vaughan, 1988). It therefore remains possible that the association of PJRA with the DP subregion might be at least partly explained by linkage disequilibrium between alleles of the DP genes and alleles of the closely linked COL11A2 locus. Consequently it was decided to search for RFLPs at the COL11A2 locus which could be used to extend the disease association studies. This work is described in Chapter 8.

In the course of the work described in this chapter, overlapping cosmid clones were isolated which extended approximately 130kb proximal of the MluI site between DPA1 and DPB1. The genomic insert of cosHcol.ll was mapped and shown not to contain an MluI site. The proximal end of the 130kb MluI fragment (Figure 4.5.) must therefore be located just proximal of cosHcol.ll. Since this MluI site was known to map within the most centromeric of the clusters of rare-cutter sites (Cluster 1, Figure 3.5.), the cosmid walk was extended further to encompass this region, as described in Chapter 5.

5. Identification of novel genes associated with clusters of rare-cutter sites

This chapter describes the cloning of three of the clusters of rare-cutter sites identified by PFGE mapping of the class II region (Chapter 3), and the subsequent identification of five novel genes.

5.1. Introduction

The work described in Chapter 4, i.e. the positioning of the COL11A2 gene centromeric of the DP subregion, provided a useful step towards the cloning of the first cluster of rare-cutter sites identified by PFGE (Cluster 1, Figure 3.5.). As shown in Figure 4.5., the centromeric end of the 130kb MluI fragment which carries the COL11A2 and DPB1 genes must fall just beyond the proximal end of cosmid cosHcol.ll. Since this MluI site occurs in cluster 1, the cosmid walk was extended in the centromeric direction in order to encompass the rare-cutter sites of cluster 1.

5.2. Cloning of cluster 1 by cosmid walking

A 7.5kb EcoRI fragment (E3, Figure 5.2.) was isolated from the centromeric end of cosHcol.ll and used as a probe to screen the cosmid library. Four new overlapping cosmids, HPB.ALL 25, HPB.ALL 31, HPB.ALL 33 and HPB.ALL 42, were isolated and mapped with restriction enzymes, including the rare-cutters. A complete map of the cloned region is presented in Figure 5.1. and a more detailed restriction map of the new cosmids is shown in Figure 5.2. As discussed above, the PFGE data indicated that there should be an MluI site just centromeric of the cosmid cosHcol.ll, and this was indeed found to be the case. In addition to the single MluI site, this region was found to contain one BssHII site, five EagI sites and one NotI site (Figure 5.2.). To prove that these were the cluster 1 sites detected by PFGE, it was necessary to demonstrate that they were unmethylated in genomic DNA.

[Figure 5.1. Map of the cloned region, showing the COL11A2, DPB2, DPA2, DPB1 and DPA1 genes.]

Figure 5.2. Restriction maps of new overlapping cosmid clones, showing the cluster of sites for the rare-cutter enzymes BssHII, EagI, MluI and NotI. Brackets indicate a portion of cosmid HPB.ALL 33 that was found to be rearranged. Also shown are the positions of various restriction enzyme fragments which were isolated for use as probes.

The methylation status of the BssHII, MluI and NotI sites in genomic DNA was determined by hybridising PFGE blots with probes mapping distal (COL11A2) and proximal (33X1, Figure 5.2.) of these sites, as deduced from the cosmid map. The results are shown in Figure 5.3. and summarised in Table 5.1.

Digest          Probe 33X1    Probe COL11A2
BssHII          100kb         200kb
MluI            230kb         130kb
NotI            440kb         370kb
BssHII+MluI     100kb         130kb
BssHII+NotI     100kb         250kb
MluI+NotI       230kb         130kb

Table 5.1. Sizes of fragments detected by probes distal (COL11A2) and proximal (33X1) of the BssHII, MluI and NotI sites in the new cosmid clones.

The two probes detected differently sized fragments with BssHII, MluI and NotI, showing that these sites were cut and therefore unmethylated in genomic DNA.
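The inference drawn from Table 5.1 can be written out as a small sketch. It assumes, as the cosmid map indicates, that the two probes lie on opposite sides of the sites and that a single site of each enzyme separates them; the function name is illustrative only.

```python
# Fragment sizes (kb) from Table 5.1. 33X1 lies on one side of the
# rare-cutter sites and COL11A2 on the other.
TABLE_5_1 = {
    "BssHII": {"33X1": 100, "COL11A2": 200},
    "MluI": {"33X1": 230, "COL11A2": 130},
    "NotI": {"33X1": 440, "COL11A2": 370},
}


def cut_between_probes(sizes_kb):
    """Infer whether a site lying between two probes was cleaved.

    If the site were methylated (uncut), both probes would sit on the same
    fragment and detect a band of the same size. Different sizes therefore
    show that the site was cut. Equal sizes are not conclusive, because two
    different fragments can co-migrate by chance.
    """
    return len(set(sizes_kb.values())) > 1


for enzyme, sizes in TABLE_5_1.items():
    verdict = "cut (unmethylated)" if cut_between_probes(sizes) else "inconclusive"
    print(f"{enzyme}: {verdict}")
```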

To determine the methylation status of the EagI sites, Southern blots were prepared from genomic (PGF) DNA and cosmid (HPB.ALL 42) DNA cut with EagI and resolved by conventional gel electrophoresis. These blots were then hybridised with fragments adjacent to or spanning the EagI sites, as deduced from the cosmid map. One result is shown in Figure 5.4. In cosmid DNA, which is unmethylated and therefore cut at all EagI sites, the probe 33X3 detected two EagI fragments because it spans the central EagI site. In PGF DNA, however, the same probe detected a single fragment whose length was the sum of the two cosmid fragments. From these data it was deduced that the central EagI site was uncut, and therefore methylated, in genomic DNA, while the two flanking sites were cut and therefore unmethylated. In the same way it was shown that all but one of the EagI sites were unmethylated in the genome.

[Figure 5.3: map of rare-cutter sites (E, E, E, N/E, M, B/E; 10kb scale bar; centromere marked) with probes 33X1 and COL11A2; PFGE size labels 440, 370, 250, 230, 130 and 100kb]

Figure 5.3. Determination of the methylation status of cloned BssHII, MluI and NotI sites in genomic DNA. At the top is shown a map of the rare-cutter sites in the overlapping cosmid clones. The autoradiographs show a pulsed field blot hybridised sequentially with probes mapping proximal (33X1) and distal (COL11A2) of the BssHII, MluI and NotI sites in the cosmid clones.

Figure 5.4. Determination of the methylation status of cloned EagI sites in genomic DNA. At the top is shown a summary map of the rare-cutter sites in the overlapping cosmid clones. The autoradiographs show blots of cosmid DNA (HPB.ALL 42) and genomic DNA (PGF) cut with EagI and hybridised with the genomic fragment 33X3. Bars represent lambda/HindIII DNA size markers; from the top these are 23.1kb, 9.4kb, 6.6kb, 4.4kb, 2.3kb and 2.0kb.
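The EagI argument rests on simple arithmetic: blocking the central site merges the two cosmid fragments into one band whose length is their sum. The sketch below models this; the site and probe coordinates are hypothetical, since the actual positions are shown only in the figures, and the function name is illustrative.

```python
def fragments_over_probe(probe_kb, sites_kb, methylated_kb=()):
    """Sizes (kb) of the fragments that overlap a probe after digestion.

    probe_kb: (start, end) of the probe; sites_kb: positions of the enzyme's
    sites; methylated_kb: sites blocked by methylation, hence left uncut.
    Assumes the probe lies between the outermost cut sites.
    """
    blocked = set(methylated_kb)
    cuts = sorted(s for s in sites_kb if s not in blocked)
    sizes = []
    for left, right in zip(cuts, cuts[1:]):
        if left < probe_kb[1] and right > probe_kb[0]:  # fragment overlaps probe
            sizes.append(round(right - left, 2))
    return sizes


# Hypothetical coordinates: three EagI sites, probe spanning the central one.
EAGI_SITES = [0.0, 2.5, 4.0]
PROBE = (2.0, 3.0)

cosmid = fragments_over_probe(PROBE, EAGI_SITES)                        # every site cut
genomic = fragments_over_probe(PROBE, EAGI_SITES, methylated_kb=[2.5])  # central site blocked
print(cosmid, genomic)
```

If an outer site were also methylated, the genomic band would exceed the sum of the two cosmid fragments, which is why the observed sum pins the methylation to the central site alone.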

The conclusion from these studies was that all but one of the rare-cutter sites cloned by extending the cosmid walk were unmethylated in genomic DNA, and therefore that these must be the sites in cluster 1 detected by PFGE mapping. The methylation data are summarised in Figure 5.13.a.

The fact that the unmethylated rare-cutter sites in cluster 1 were spread over 20kb, whereas a typical CpG island spans 1-2kb, suggested that there might be more than one gene in this region. To test this hypothesis, cosmid fragments close to the unmethylated rare-cutter sites were isolated and used to probe northern blots, zoo blots and cDNA libraries for evidence of transcribed sequences, as described in section 5.5.
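One way to make the 20kb versus 1-2kb argument concrete is to group site positions so that any gap larger than a typical island length starts a new group. This is a hypothetical sketch: the 2kb gap threshold and the site positions are assumptions chosen for illustration, not measurements from the cosmid map, and each group is only a candidate island.

```python
def group_sites(positions_kb, max_gap_kb=2.0):
    """Group sorted site positions, starting a new group at any gap > max_gap_kb."""
    groups = []
    for pos in sorted(positions_kb):
        if groups and pos - groups[-1][-1] <= max_gap_kb:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


# Hypothetical positions (kb) for rare-cutter sites spread over about 20kb.
sites = [0.0, 0.8, 1.5, 9.0, 9.6, 18.5, 19.2, 20.0]
for group in group_sites(sites):
    print(f"{group[0]:.1f}-{group[-1]:.1f}kb: {len(group)} sites")
```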

5.3. Cloning of cluster 2 by chromosome jumping

The cloning of the unmethylated rare-cutter sites in cluster 1 made feasible the use of rare-cutter jumping libraries to clone adjacent clusters (Poustka and Lehrach, 1988). In this technique, the two ends of large genomic DNA fragments generated by cleavage with a rare-cutter enzyme are co-cloned, as shown in Figure 5.5. A probe adjacent to a site for that rare-cutter enzyme in the genome (i.e. at one end of a rare-cutter fragment) can be used to screen the library and obtain cloned DNA from the other end of the same fragment, which may be hundreds of kilobases away. It was decided to use fragments adjacent to, and distal of, the BssHII and NotI sites of cluster 1 as probes to screen BssHII and NotI jumping libraries respectively. In theory, this approach should allow the cloning of the nearest distal unmethylated BssHII site (i.e. in cluster 2) and the nearest distal unmethylated NotI site (i.e. in cluster 3).

[Figure 5.5 schematic, in four steps: (1) cleave genomic DNA with rare-cutter enzyme R; (2) circularise in the presence of a selectable marker; (3) cleave with a frequently cutting enzyme F; (4) ligate into phage arms and select for presence of the marker]

Figure 5.5. Construction of a rare-cutter jumping library (Poustka and Lehrach, 1988). Details of the enzymes and selection system used are given in the text (this chapter and Materials and Methods).

The probe used to screen the NotI jumping library was 31KN2, a 1.9kb KpnI/NotI fragment mapping adjacent to, and distal of, the NotI site in cosmid HPB.ALL 31 (Figure 5.2.). Unfortunately, no positives were obtained with this probe. One explanation for this negative result is that, by chance, the nearest site for the second enzyme used to construct the library was too far from the NotI site, generating a fragment too large to clone into the phage vector.
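The explanation offered for the failed NotI screen can be expressed as a size calculation. In this sketch the marker size, the vector capacity and all coordinates are hypothetical values chosen for illustration (the actual enzymes and vector are described in Materials and Methods). It assumes the library design of Figure 5.5, in which only the marker-containing piece of each circle is recovered.

```python
# Hypothetical cloning capacity of the phage vector (kb).
VECTOR_CAPACITY_KB = 10.0


def jump_insert_kb(fragment_kb, frequent_sites_kb, marker_kb):
    """Size of the marker-containing piece that a jumping clone must carry.

    fragment_kb: (left_end, right_end) of the rare-cutter fragment.
    After circularisation the two ends meet at the marker; the second digest
    then leaves the marker flanked by the stretch from each end to the nearest
    internal site of the frequently cutting enzyme.
    Returns None if the fragment contains no such site.
    """
    left, right = fragment_kb
    inside = [s for s in frequent_sites_kb if left < s < right]
    if not inside:
        return None
    return (min(inside) - left) + (right - max(inside)) + marker_kb


def clonable(insert_kb, capacity_kb=VECTOR_CAPACITY_KB):
    """True if the junction piece exists and fits in the vector."""
    return insert_kb is not None and insert_kb <= capacity_kb


# Hypothetical 300kb rare-cutter fragment and a hypothetical 1.5kb marker.
near = jump_insert_kb((0.0, 300.0), [2.0, 150.0, 298.7], marker_kb=1.5)
far = jump_insert_kb((0.0, 300.0), [40.0, 150.0, 298.7], marker_kb=1.5)  # one end far
print(round(near, 1), clonable(near))
print(round(far, 1), clonable(far))
```

A single distant second-enzyme site at one end is enough to push the junction piece past the vector's capacity, so that jump cannot be recovered whatever the probe.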

The probe used to screen the BssHII jumping library was jBK, a 3.7kb BssHII/KpnI fragment isolated from cosmid HPB.ALL 31 (Figure 5.2.). This probe detected five positives, which were picked and the recombinant phage DNA purified. Restriction enzyme mapping showed that four of the positives were identical (clone λj2) while the fifth was different (clone λj3).

The jumping clone λj3 was shown by restriction enzyme mapping with SalI and BamHI to contain two genomic inserts, a 0.7kb SalI/BamHI fragment and a 2.0kb SalI/BamHI fragment. In each case the SalI site was derived from the polylinker of the marker plasmid pMLS-Mlu-Not, while the BamHI site was presumably genomic. The inserts were subcloned into the Bluescript plasmid vector. The 2.0kb SalI/BamHI fragment was shown by hybridisation to derive from the starting probe jBK. The 0.7kb fragment, when hybridised to PFGE blots, detected fragments of similar size to those detected with COL11A2 and DPB1 probes; in particular, it detected the 130kb MluI band (data not shown). This result was unexpected because a successful 'jump' to the nearest unmethylated distal BssHII site (in cluster 2) should have yielded a fragment which hybridised to the 450kb MluI band (Figure 3.5.). To work out where the 0.7kb insert was derived from, it was hybridised to a blot of DNA from the entire cosmid walk, spanning DPA1 to cluster 1 (Figure 5.1.). The probe hybridised strongly to cosmid MANN 3.6 and more weakly to MANN 2.2 and MANN 2.3. This suggested that the 0.7kb insert could be derived from the DPB1 gene and was cross-hybridising to the related DPB2 gene. To test this, a portion of the subcloned insert was sequenced using primers complementary to the Bluescript vector cloning site. The partial nucleotide sequence obtained was identical over 153 nucleotides (8195-8348 in Kelly and Trowsdale, 1985) to a region of the second intron of the DPB1 gene, except for a single nucleotide mismatch at position 8270. These data confirmed that the 0.7kb insert of λj3 was derived from the DPB1 gene. The sequenced region mapped near a BssHII site in the second exon of the DPB1 gene (position 7736 in Kelly and Trowsdale, 1985).

It was therefore concluded that λj3 contained the ends of an unusual jump from the BssHII site in cluster 1 to the BssHII site in the DPB1 gene (Figure 5.8.). The BssHII site in the DPB1 gene must have been unmethylated, and hence cleaved, in the cell line from which the library was constructed, although cleavage of this BssHII site has never been observed in PFGE analysis of this region in PGF DNA (or in other published maps from other cell lines).
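The λj3 result illustrates that a jump lands at the nearest site that was cut in the DNA used to build the library, which need not match the sites seen in PGF DNA. A minimal sketch using Python's standard `bisect` module follows; the coordinates are hypothetical and only the relative order of the sites matters.

```python
import bisect


def jump_destination(start_kb, cut_sites_kb, direction=+1):
    """Nearest cleaved rare-cutter site beyond start_kb in the given direction.

    A jumping clone joins the two ends of one rare-cutter fragment, so the
    'jump' lands at the next site that was unmethylated, and hence cut, in
    the DNA used to build the library. Returns None if there is none.
    """
    sites = sorted(cut_sites_kb)
    if direction > 0:
        i = bisect.bisect_right(sites, start_kb)
        return sites[i] if i < len(sites) else None
    i = bisect.bisect_left(sites, start_kb)
    return sites[i - 1] if i > 0 else None


# Hypothetical coordinates (kb): cluster 1 site at 0, a BssHII site within a
# gene at 100, and the cluster 2 site at 250.
methylated_at_100 = [0, 250]         # DNA in which the site at 100 is blocked
unmethylated_at_100 = [0, 100, 250]  # DNA in which that site is cut

print(jump_destination(0, methylated_at_100))    # the intended long jump
print(jump_destination(0, unmethylated_at_100))  # a short jump instead
```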

The jumping clone λj2 was shown by restriction mapping with SalI, BamHI and HindIII to contain two genomic inserts, a 1.1kb SalI/HindIII fragment and a 2.0kb SalI/BamHI fragment (Figure 5.6.a.). The SalI site in each case was derived from the polylinker of the marker plasmid, while the BamHI and HindIII sites were presumably genomic. The 2.0kb SalI/BamHI insert was shown by hybridisation to be derived from the starting probe, jBK, and was probably identical to the 2.0kb BamHI/SalI insert identified in λj3. The 1.1kb SalI/HindIII fragment, designated λj2SH, was hybridised to PFGE blots and was found to detect a 250kb BssHII fragment, a 450kb MluI fragment and a 370kb NotI fragment. This pattern of hybridisation would be expected for a probe mapping between DPA2 and cluster 2 (Figure 3.5.). λj2SH was then used to screen the cosmid library, and a positive clone, HPB.ALL 71, was isolated and characterised by restriction enzyme mapping (Figure 5.6.b.). λj2SH hybridised just centromeric of the most centromeric BssHII site in HPB.ALL 71.

From the PFGE map, cluster 2 contained unmethylated sites for BssHII and EagI (Figure 3.5.). Cosmid HPB.ALL 71 was found to contain five BssHII sites and two EagI sites spread over about 7kb (Figure 5.6.b.). To prove that the sites in HPB.ALL 71 mapped to cluster 2 (and were not cryptic rare-cutter sites, as was found for λj3), it was necessary to demonstrate that at least one of the BssHII sites and one of the EagI sites in HPB.ALL 71 were unmethylated in genomic DNA.

Figure 5.6. (a) Map of the jumping clone λj2. (b) Restriction map of the cosmid HPB.ALL 71, which was isolated by screening the library with the 1.1kb HindIII/SalI genomic insert from λj2. The two BssHII fragments, 71Bs3 and 71Bs1, which were subsequently used as probes to screen the cDNA library, are shown. The scale bar applies to both clones.

[Figure 5.7: autoradiographs with size labels 9.4, 6.6, 4.4, 3.9, 2.3, 2.0 and 1.6kb]

Figure 5.7. Determination of the methylation status of cloned BssHII and EagI sites in genomic DNA. The autoradiographs show blots of genomic DNA (PGF) cut with BssHII or EagI and hybridised with the 3.9kb BssHII fragment 71Bs1 from cosmid HPB.ALL 71 (Figure 5.6.). All sizes are in kilobases.

The methylation status of these sites was tested by hybridising cosmid fragment 71Bs1 (3.9kb BssHII fragment, Figure 5.6.b.) to blots of genomic (PGF) DNA cut with BssHII or EagI and resolved by conventional electrophoresis. Probe 71Bs1 detected a 3.9kb BssHII fragment and a 1.6kb EagI fragment in genomic DNA (Figure 5.7.). Thus the two BssHII sites flanking 71Bs1 and the two EagI sites within 71Bs1 were shown to be unmethylated in genomic DNA (Figure 5.13.b.). This fact, coupled with the PFGE mapping data from λj2SH, led to the conclusion that HPB.ALL 71 must map to cluster 2 (Figure 5.8.).
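The conclusion from Figure 5.7 can be checked exhaustively by enumerating every methylation state of the BssHII sites and keeping those that reproduce the single 3.9kb band. This is a sketch only: the five site positions and the probe interval are hypothetical stand-ins for the map in Figure 5.6.b., and the function name is illustrative.

```python
from itertools import product


def bands(probe_kb, cut_sites_kb):
    """Sizes of fragments overlapping the probe; None marks a fragment that
    runs off the mapped region (its size cannot be predicted)."""
    cuts = sorted(cut_sites_kb)
    out = [round(b - a, 2) for a, b in zip(cuts, cuts[1:])
           if a < probe_kb[1] and b > probe_kb[0]]
    if not cuts or probe_kb[0] < cuts[0] or probe_kb[1] > cuts[-1]:
        out.append(None)
    return out


# Hypothetical map: five BssHII sites (kb) over about 7kb, with the probe
# lying inside the 3.9kb fragment between the sites at 2.1 and 6.0.
BSSHII_SITES = [0.0, 0.9, 2.1, 6.0, 7.0]
PROBE = (2.2, 5.9)
OBSERVED = [3.9]  # single band seen in genomic DNA

consistent = []
for flags in product([True, False], repeat=len(BSSHII_SITES)):
    state = dict(zip(BSSHII_SITES, flags))  # True = unmethylated, hence cut
    if bands(PROBE, [s for s, cut in state.items() if cut]) == OBSERVED:
        consistent.append(state)

# Only the two sites flanking the probe are fixed by this result; the other
# three sites are left undetermined by this blot.
print(len(consistent), all(s[2.1] and s[6.0] for s in consistent))
```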

To test for the presence of genes at cluster 2, fragments mapping between the BssHII sites in HPB.ALL 71 were isolated and used as probes for zoo blots and cDNA libraries as described in section 5.5.

5.4. Identification of cluster 3 in previously isolated cosmid clones

Cluster 3 was cloned by taking advantage of a previously established cosmid walk around the DOB gene (Figure 5.9.; Blanck and Strominger, 1988). These cosmid clones extended 60kb centromeric of the DOB gene. From the PFGE map shown in Figure 3.5. it was apparent that cluster 3 mapped on the centromeric side of the DOB gene, although its precise position could not be determined. It was decided to test for the presence of rare-cutter sites in the cosmids mapping centromeric of the DOB gene.

Cosmid clones U10 and U15 (a gift from George Blanck) were digested with BssHII and NotI. Cosmid U15, the most proximal clone in the walk of Blanck and Strominger (Figure 5.9.a.), was found to contain two BssHII sites and two NotI sites, as shown in Figure 5.9.b. The methylation status of these sites in genomic DNA was not determined at this stage. However, a cDNA clone subsequently isolated with a probe from U15 was hybridised to a PFGE filter of human DNA digested with BssHII and NotI in single and double digest combinations (Figure 5.10.). The cDNA probe detected a fragment of about 10kb in each track (the fragment of about 440kb in the NotI track of Figure 5.10. was a residual signal from a prior hybridisation).

This result indicated that both NotI sites and both BssHII sites in U15 are unmethylated in genomic DNA (Figure 5.13.c.).

[Figure 5.9: (a) map on a 0-250kb scale showing the DOβ, DXβ, DXα, DVβ, DQβ and DQα genes, SalI, ClaI, XhoI, Asp718 and BamHI sites, and the overlapping cosmid clones; (b) cosmids U15 and U10 with NotI, BssHII, SalI and KpnI sites, a 5kb scale bar and probe U15KN]

Figure 5.9. (a) Map of overlapping cosmid clones isolated by Blanck and Strominger (1988). The DXβ, DXα, DVβ, DQβ and DQα genes are now known as DQB2, DQA2, DQB3, DQB1 and DQA1 respectively. (b) Detailed restriction map of cosmid U15 and the overlapping portion of cosmid U10. The position of the probe U15KN is shown.

Figure 5.10. Determination of the methylation status of cloned BssHII and NotI sites in genomic DNA. The autoradiograph shows the result of hybridising the RING4 cDNA probe 2.1 to a PFGE filter of human genomic DNA (PGF) digested with BssHII, BssHII+NotI and NotI. Size markers are concatemers of bacteriophage lambda DNA.

5.5. Analysis of cloned regions for coding sequences

5.5.1. Zoo blots

Transcribed DNA sequences are more highly conserved during evolution than non-coding regions, and genomic fragments which cross-hybridise with the genomic DNA of other species may indicate the presence of genes (Monaco et al., 1986). This principle was used to test for the presence of genes in clusters 1 and 2. Restriction fragments mapping close to the unmethylated rare-cutter sites were isolated from the cosmid clones and hybridised to 'zoo blots' of EcoRI-cut genomic DNA from different vertebrate species. A representative result is shown in Figure 5.11. A summary of the results obtained with all probes tested is given in Table 5.2.

Species          33X1   31K1   jBK    71Bs3
Rhesus monkey    +      +      +      +
Pig              +      +      +      +
Rat              +      -      +      -
Mouse            +      +      +      +
Whale            +      +      +      +
Chicken          -      -      -      -

Table 5.2. Summary of results obtained by probing cosmid fragments from clusters 1 and 2 onto zoo blots of EcoRI-cut genomic DNA from a variety of vertebrate species. Positions of probes are shown in Figures 5.2. and 5.6. (+), cross hybridisation detected; (-), no cross hybridisation detected. Blots were washed to a final stringency of 2x SSC, 65°C. A representative result is shown in Figure 5.11.

[Figure 5.11: autoradiograph hybridised with probe 33X1; size markers 23.1, 9.4, 6.6, 4.4, 2.3, 2.0 and 0.6kb]

Figure 5.11. Zoo blot analysis of genomic fragment 33X1. The autoradiograph shows the result of hybridising probe 33X1 to a Southern blot of EcoRI-cut genomic DNA from different vertebrate species. The final washing stringency was 2x SSC at 65°C.

The results obtained with the probes listed in Table 5.2. provide good evidence for the conservation of these sequences in the genomes of other organisms. These fragments may therefore contain exons of genes. To test this hypothesis, the probes were subsequently hybridised to northern blots to obtain evidence for transcription.

5.5.2. RNA blots

The conserved fragments identified through zoo blotting were hybridised to northern blots of poly(A)+ RNA from a variety of cell lines to test whether they detected transcribed sequences. A representative result with the genomic fragment 33X1 is shown in Figure 5.12. 33X1 detected an RNA species of about 1.6kb in RNA from the cell lines K562 (erythroleukaemia) and U937 (histiocytic leukaemia) but not HL60 (promyelocytic leukaemia) or WEHI (mouse promyelocytic leukaemia). The RNA species detected by 33X1 is probably a polyadenylated mRNA because it is more abundant in poly(A)+ RNA than in total RNA (Figure 5.12.). Similar results were obtained using the probe 31K1 from cluster 1 and λj2SH from cluster 2 (data not shown). However, this was not conclusive evidence that these genomic fragments encoded the gene (or part of the gene) giving rise to the transcript; it remained possible that the transcribed locus was elsewhere and that the genomic fragments tested here contained pseudogenes or other sequences related to the true expressed locus which were cross-hybridising with the RNA species under the conditions used. More convincing evidence that these probes were really part of functional genes was obtained by the isolation of cDNA clones, as described below.

No hybridisation to northern blots was detected with the probe jBK. In practice, however, it was often difficult to obtain convincing results by hybridising northern blots with genomic fragments. Negative results were difficult to interpret because a given genomic fragment could contain a very small portion of an exon, which might not hybridise under the conditions used. Furthermore, a given gene might not be expressed in the cell lines tested at a level detectable by northern blotting. A more satisfactory approach to determining whether a genomic fragment encoded transcribed sequences was to hybridise that fragment directly to cDNA libraries.

[Figure 5.12: northern blot with poly(A)+ and total RNA lanes, 28S and 18S markers; probe 33X1]

Figure 5.12. Northern blot analysis of genomic fragment 33X1. The autoradiograph shows the result of hybridising probe 33X1 to a northern blot of total and poly(A)+ RNA from different cell lines as described in the text. The final washing stringency was 2x SSPE at 65°C.

5.5.3. cDNA libraries

cDNA clones were isolated by screening cDNA library filters with each of the probes 33X1, 31K1, 31KN2 (cluster 1), 71Bs3, 71Bs1 (cluster 2) and U15KN (cluster 3). All of these genomic fragments detected positive colonies. In each case, DNA from positive colonies was purified and the longest clone was generally selected for detailed characterisation (Table 5.3.). However, four cluster 2 cDNA clones were characterised because these were isolated using two different genomic fragments, 71Bs3 and 71Bs1, and it was not initially certain that only one gene was present, although this was quickly established by restriction enzyme mapping of the clones. The cDNA clone inserts were hybridised back to pulsed field blots to confirm that the only location in the human genome of sequences related to each cDNA clone was within the MHC. Each cDNA probe was found to give the hybridisation pattern expected from the known location of the corresponding genomic probe. This eliminated the possibility that the genomic fragments contained pseudogenes which were detecting cDNAs derived from transcripts of genes elsewhere.

The cDNA clones were hybridised back onto the relevant cosmid clones to define the approximate positions of the cognate genes (Figure 5.13.). These genes were designated RING (Really Interesting New Gene) 1-5. RING1, RING2 and RING5, in cluster 1, were respectively 95kb, 90kb and 85kb proximal to the DPB2 gene. RING3, in cluster 2, was 110kb distal of the DPA1 gene. RING4, in cluster 3, was 25kb proximal of the DOB gene (Figure 5.16.).

[Figure 5.13: (a) cluster 1: sites E, N/E, M, E and B; probes 33X1, 31K1 and 31KN2; 2kb scale; RING1 (cDNA CEM15), RING2 (cDNA CEM21) and RING5 (cDNA γU5). (b) cluster 2: sites E and B; probes 71Bs3 and 71Bs1; 1kb scale; RING3 (cDNAs CEM32, CEM35, CEM41 and CEM44). (c) cluster 3: sites N and B; probe U15KN; 1kb scale; RING4 (cDNA 2.1)]

Figure 5.13. New genes in clusters 1, 2 and 3. The positions of the rare-cutter sites within each cluster, together with the methylation status of each site in genomic DNA, are shown (B, BssHII; E, EagI; M, MluI; N, NotI; o, site unmethylated; x, methylated; -, not tested). The positions of the cosmid fragments from each cluster used to probe the cDNA library are indicated. Boxed regions show the positions of the five novel genes, RING1-5, as defined by mapping the cDNA clones back onto the cosmids. The maps are oriented with the centromere at the left.

Genomic probe   cDNA library    cDNA clone   Insert size   Gene
33X1            CEM (T-LCL)     CEM15        1.5kb         RING1
31K1            CEM (T-LCL)     CEM21        1.5kb         RING2
31KN2           U937+IFNγ       γU5          2.3kb         RING5
71Bs1           CEM (T-LCL)     CEM32        4.0kb         RING3
71Bs1           CEM (T-LCL)     CEM35        2.3kb         RING3
71Bs3           CEM (T-LCL)     CEM41        3.0kb         RING3
71Bs3           CEM (T-LCL)     CEM44        1.6kb         RING3
U15KN           U937            2.1          2.6kb         RING4

Table 5.3. cDNA clones isolated by screening cDNA libraries with genomic fragments from clusters 1, 2 and 3. The positions of the probes and the cognate cDNAs within each cluster are summarised in Figure 5.13.

5.6. Expression patterns of RING1-5

Northern blots containing RNA from a range of cell lines were hybridised with the cDNA clones for the novel genes RING1-5 and washed at high stringency (0.1x SSPE, 65°C). Representative results are shown in Figure 5.14. and summarised in Table 5.4.

The RING1 cDNA probe CEM15 detected a transcript of about 1.6kb in poly(A)+ RNA samples from all cell lines tested. The transcript was expressed at particularly high levels in Molt4, a T-lymphoblastoid cell line (LCL), although expression was also noticeably higher in Mann (B-LCL) than in K562 (erythroleukaemia) or U937 (macrophage).

The RING2 cDNA probe CEM21 detected a transcript of about 1.1kb in poly(A)+ RNA from T- and B-LCLs. No expression was detected in the macrophage cell line U937 or the erythroleukaemia line K562, even after prolonged autoradiographic exposure.

Figure 5.14. Hybridisation of northern blots of poly(A)+ (left hand panels) or total (right hand panel) RNA from different cell lines with cDNA probes from the genes RING1-5. Probes are: RING1, CEM15; RING2, CEM21; RING3, CEM41; RING5, γU5; RING4, 2.1. In the first four panels, T is T-LCL (Molt4), B is B-LCL (Mann), E is erythroleukaemia (K562) and M is macrophage (U937). In the right hand panel, B is B-LCL (Raji), T is T-LCL (Molt4), and 1, 2 and 3 are the colon carcinoma lines SW1222, SW620 and CC20 respectively, in the absence (-) and presence (+) of gamma interferon. Approximate transcript sizes are shown in kilobases.

The RING3 cDNA probe CEM41 detected two large transcripts, a major species of about 3.5kb and a less abundant species of about 4.5kb, in all cell lines tested. The same transcripts were detected in γ-interferon-induced and uninduced colon carcinoma cell lines SW1222, SW620 and CC20, indicating that RING3 expression is not γ-interferon inducible in these cell lines.

The RING4 cDNA probe 2.1 detected a transcript of about 2.8kb which was expressed at high levels in the B-LCL tested (Raji), and at lower levels in the T-LCL (Molt4). No RING4 expression was detected in resting colon carcinoma cell lines SW1222, SW620 and CC20, but the transcript was strongly induced in γ-interferon-treated cells. In addition, RING4 expression was found to be both α- and γ-interferon inducible in cells of the fibrosarcoma line HT1080 (J. John and G. Stark, personal communication).

The RING5 cDNA probe γU5 detected two equally abundant RNA species of about 2.0kb and 2.3kb in all poly(A)+ samples tested.

Gene    cDNA probe   RNA size (approx.)   T   B   M    E    H    IFN
RING1   CEM15        1.6kb                +   +   +    +    +    NT
RING2   CEM21        1.1kb                +   +   -    -    NT   NT
RING3   CEM41        3.5+4.5kb            +   +   +    +    +    -
RING4   2.1          2.8kb                +   +   NT   NT   NT   +
RING5   γU5          2.0+2.3kb            +   +   +    +    +    NT

Table 5.4. Summary of expression patterns of the genes RING1-5. +, expression detected; -, no expression detected; NT, not tested. T, T-LCL; B, B-LCL; M, macrophage U937; E, erythroleukaemia K562; H, HeLa; IFN, α- and γ-interferon inducible (Figure 5.14.).

5.7. Refinement of the physical map of the class II region

Blanck and Strominger (1990) have recently published a cosmid walk which extends 120kb in the vicinity of the DNA gene (Figure 5.15.a.). These cosmid clones were not linked to any other class II genes, so the precise position and orientation of the DNA gene remained ambiguous. From the PFGE mapping data summarised in Figure 3.5., it was hypothesised that cluster 2 might fall within this cosmid contig. The probe 71Bs1 (Figure 5.6.b.) from the cluster 2 cosmid HPB.ALL 71 was hybridised to DNA from the published cosmids (a gift from George Blanck). This probe cross-hybridised to the overlapping cosmid clones U22, U27 and HA14 (Figure 5.15.b.). From the map shown in Figure 5.15.a., the cross-hybridising region was about 35kb from the DNA gene. When these cosmids were mapped with BssHII, they were found to contain the same pattern of sites as HPB.ALL 71 (Figure 5.15.b.), providing additional evidence that cluster 2 is indeed contained within the cosmid walk. Since PFGE data show that the DNA gene is on the centromeric side of cluster 2 (Figure 3.5.), it follows that the DNA gene is about 35kb centromeric of RING3. Thus, the cosmid walk of Blanck and Strominger is oriented within the class II region such that the DNA gene has the same transcriptional orientation as DPA1 (i.e. the map shown in Figure 5.15.a. is oriented with the centromere towards the right). From the PFGE map it is estimated that the DNA gene is 75kb from the DPA1 gene.

The mapping of cluster 3 rare-cutter sites in the cosmid U15, which maps just centromeric to the DOB gene, anchors the cosmid walk of Blanck and Strominger (1988; Figure 5.9.a.) within the class II region relative to the PFGE map. It is estimated that the DOB gene is 160kb telomeric of the DNA gene.
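The distance estimates in this section can be chained on a single axis as a consistency check. The sketch below places DPA1 at 0 and counts the telomeric direction as positive (an arbitrary convention). The positions of DOB and RING4 relative to DPA1 are derived here by adding the quoted estimates; they are not stated in the text and carry the combined uncertainty of the PFGE measurements.

```python
# Positions in kb along the class II region, measured from DPA1, with the
# telomeric (distal) direction taken as positive.
positions = {"DPA1": 0}

positions["RING3"] = positions["DPA1"] + 110  # RING3: 110kb distal of DPA1
positions["DNA"] = positions["RING3"] - 35    # DNA: about 35kb centromeric of RING3
positions["DOB"] = positions["DNA"] + 160     # DOB: 160kb telomeric of DNA
positions["RING4"] = positions["DOB"] - 25    # RING4: 25kb centromeric of DOB

# The derived DNA-DPA1 distance agrees with the 75kb estimate quoted above.
assert positions["DNA"] - positions["DPA1"] == 75
print(positions)
```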

A physical map of the class II region showing the positions of RING1-5 and the accurately determined positions of the class II genes relative to the clusters of rare-cutter sites is shown in Figure 5.16.

[Figure 5.15 (a): map of cosmid clones on a 0-120kb scale]

Figure 5.15. (a) Map of cosmid clones in the region of the DNA gene (formerly DZα) from Blanck and Strominger (1990). (b) Autoradiographs of cosmid DNA digested with KpnI (i) or BssHII (ii) and hybridised with the 3.9kb BssHII fragment from the RING3 locus in cosmid HPB.ALL 71. '71' is cosmid HPB.ALL 71; other cosmids are shown in (a). Bars represent lambda/HindIII DNA size markers; from the top these are 23.1kb, 9.4kb, 6.6kb, 4.4kb, 2.3kb, 2.0kb and 0.56kb.

[Figure 5.16: map of the class II region with clusters 1, 2 and 3 marked]

Figure 5.16. Physical map of the class II region, showing the positions of the novel genes RING1-5 with respect to the previously known class II genes and the clusters of rare-cutter sites. Gene orientations are shown by arrows where known. The orientations of RING1, RING3 and RING4 were determined during nucleotide sequencing (Chapter 6).

5.8. Summary and discussion

The data presented here provide evidence for five new genes in the class II region of the human MHC. RING1, RING2 and RING5 map centromeric of the DP subregion, RING3 maps 35kb telomeric of DNA and RING4 maps 25kb centromeric of DOB (Figure 5.16.). The expression patterns and transcript sizes of these genes are dissimilar to those characterised for the classical class II genes. In addition, sequence data from each cDNA clone (see Chapter 6) indicate that none is related to the class II genes. Thus the discovery of RING3 and RING4 in the middle of the class II region demonstrates for the first time that the classical class II genes are interspersed with non-class II sequences. This finding has implications for the understanding of phenotypes associated with the class II region, since genes closely linked physically to the known class II genes are candidates for involvement in class II-associated diseases (section 1.5.). Furthermore, both RING3 and RING4 map in the interval believed to contain the gene responsible for a defect in antigen presentation by class I molecules (Cerundolo et al., 1990; section 1.6.2.). Elucidation of the function of these novel genes is clearly of importance. As a first step towards this goal, nucleotide sequencing of RING1-5 cDNA clones was undertaken (Chapter 6).

The finding of non-class II genes in the class II region also has implications for the understanding of the evolution of the region. The class II region of the MHC is thought to have evolved from a single primordial α/β gene pair through a series of duplication and divergence events (Klein, 1986; Bodmer et al., 1986). The similarity of the overall organisation of the classical class II genes of mouse and man has led to the hypothesis that these duplication and divergence events occurred before the radiation of rodents and primates (section 1.4.). To test whether the RING3 and RING4 genes became inserted into the middle of the human class II region after the divergence of rodents and primates, or whether their position was established beforehand, the mouse class II region was examined for the presence of the novel genes. This work is described in Chapter 7.

The strategy used to detect novel genes, i.e. the identification of CpG islands, is likely to influence the type of gene discovered. In a search of the nucleotide sequence databases, CpG-rich regions were found to be associated with all ubiquitously expressed 'housekeeping' genes, but only with some tissue-specific genes (Bird, 1986; Gardiner-Garden and Frommer, 1986). Consistent with this, expression of RING1-5 was detected in most cell lines tested (Figure 5.14. and Table 5.4.).
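CpG islands are commonly recognised by thresholds on length, G+C content and the ratio of observed to expected CpG dinucleotides, where expected CpG = (number of C x number of G) / length. The sketch below uses the widely quoted cut-offs (at least 200bp, at least 50% G+C, observed/expected at least 0.6, usually attributed to Gardiner-Garden and Frommer). It is an illustration, not the procedure used in this study, and the test sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n, c, g = len(seq), seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, G+C and CpG observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


examples = [("CpG-rich", "CG" * 150), ("GC-rich but CpG-poor", "CA" * 50 + "TG" * 50)]
for name, seq in examples:
    gc, oe = cpg_stats(seq)
    print(f"{name}: G+C={gc:.2f}, obs/exp={oe:.2f}, island={is_cpg_island(seq)}")
```

The second example has exactly 50% G+C yet no CpG dinucleotides, which is why the observed/expected ratio, not G+C content alone, is the distinguishing criterion.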

It is likely that there are further genes associated with the clusters of rare-cutter sites. Cosmid U15, for example, contains two NotI sites, both of which are unmethylated in genomic DNA. Given that most NotI sites are found in CpG islands (Table 3.1.), it would be surprising if there were no further genes in this region. Experiments to test this are currently underway in the laboratory.

In the present study, the methods used would specifically detect those genes associated with unmethylated CpG islands. CpG islands generally remain stably unmethylated in all tissues of the body, regardless of the site of expression of the associated gene (Bird, 1987). However, it has recently been reported that in some cell lines a proportion of CpG islands become irreversibly methylated and would therefore not be detectable in genomic DNA (Antequera et al., 1990). Such islands would, however, be detectable by clustering of rare-cutter sites in cloned DNA and could be tested for in the recently described yeast artificial chromosome clones which span the entire class II region (Ragoussis et al., 1991b). Such clones could also be used to test for the presence of non-island-associated genes. The recently developed 'exon trapping' technique, to test for the presence of splice junctions in a cloned region, is a potentially powerful method to identify genes regardless of whether or not they are associated with CpG islands (Duyk et al., 1990).

6. Characterisation of novel genes by nucleotide sequencing

6.1. Introduction

One method of obtaining valuable preliminary information about the possible functions of a gene is to determine the nucleotide sequence of that gene. A gene encoding a protein product should contain an open reading frame (ORF), the predicted amino acid sequence of which can be used to search protein sequence databases. Important clues as to the function of the protein may be obtained from significant matches (Doolittle, 1986). Since the RNA species detected by the probes for the novel genes RING1-5 were all found at increased frequency in poly(A)+ selected RNA samples compared to total RNA (Figure 5.14. and data not shown), it was likely that the transcripts of these genes were polyadenylated, and therefore probably encoded proteins. This chapter describes the complete nucleotide sequencing of cDNA clones for the RING1, RING3 and RING4 genes and the partial nucleotide sequencing of the RING2 and RING5 genes.

6.2. Nucleotide sequence of RING1

The nucleotide sequence of RING1 was obtained from the cDNA clone CEM15. The insert was estimated to be about 1.5kb long, and contained an internal XhoI site. The CDM8 vector contained XhoI sites on either side of the cloning site, so that two insert fragments were liberated when CEM15 was digested with XhoI. These were subcloned into the Bluescript vector to give two constructs: pXB, containing the ~0.65kb XhoI insert fragment, and pXT, containing the ~0.85kb XhoI insert fragment. The two ends of each subclone were sequenced using oligonucleotide primers complementary to the Bluescript cloning site. Thereafter, the novel sequence data obtained were used to design new complementary oligonucleotides, which were used as primers to extend the sequence further. Finally, primers were designed to sequence across the internal XhoI site of the CEM15 insert.
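The subcloning strategy depends on where XhoI (C^TCGAG) cuts. A minimal sketch of a complete digest of a linear molecule is given below; it ignores the 4-nucleotide overhangs and uses a toy sequence standing in for an insert flanked by vector XhoI sites, not the real CEM15 sequence. The function name is hypothetical.

```python
def digest_linear(seq: str, site: str = "CTCGAG", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after complete digestion of a linear DNA.

    cut_offset is the top-strand cut position within the site
    (XhoI cuts C^TCGAG, so 1). Overhangs are ignored.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy: vector site, 10 nt of insert, internal site, 20 nt of insert, vector site.
toy = "CTCGAG" + "A" * 10 + "CTCGAG" + "T" * 20 + "CTCGAG"
print(digest_linear(toy))  # [1, 16, 26, 5]
```

The two middle fragments correspond to the two insert pieces; the short end pieces stand in for vector sequence.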

The CEM15 insert was 1439 nucleotides in length. The first 1116 of these encoded a single uninterrupted ORF of 372 amino acids, ending with a stop codon. The ORF was followed by a 3' untranslated region of 323 nucleotides, terminating in 6 A residues which were potentially the start of a poly(A) tail. The sequence AACAAA, which resembles the consensus poly(A) addition signal AATAAA, started 25 nucleotides upstream of the first of the 6 A residues. The AACAAA variant seems to be relatively inefficient in initiating 3' processing of RNAs: in an in vitro 3' processing assay it directed message cleavage and polyadenylation with only 4% of the efficiency of the consensus AATAAA (Sheets et al., 1990). Nevertheless, in a search of the nucleotide sequence databases, the AACAAA variant was detected in 0.8% of vertebrate cDNAs, and therefore seems to be a naturally occurring, albeit rare, poly(A) addition signal (Sheets et al., 1990). It may be that additional nucleotides in the vicinity of the AACAAA sequence increase the efficiency of 3' processing of RING1 mRNAs in vivo. It is also possible that the 6 A residues are not part of the poly(A) tail, in which case the cDNA clone is truncated at the 3' end before the true tail.
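The positional relationship described above can be checked with a small scan: locate the run of A residues at the 3' end and look upstream for poly(A)-signal-type hexamers. This is a sketch under two assumptions: that the terminal A run marks the start of the tail (which, as noted, may not be so), and that a 40-nucleotide search window is sufficient. The list of variant hexamers is illustrative and the function names are hypothetical. The test input is nucleotides 1459-1529 of the RING1 sequence in Figure 6.1.

```python
# Illustrative hexamers to report; AATAAA is the consensus signal.
PAS_VARIANTS = ("AATAAA", "ATTAAA", "AACAAA")


def poly_a_start(seq: str) -> int:
    """0-based index of the first base of the terminal run of A residues."""
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i


def find_pas(seq: str, max_dist: int = 40) -> list[tuple[str, int]]:
    """(hexamer, distance) pairs; distance runs from the hexamer's first
    base to the first A of the terminal run (assumed tail start)."""
    seq = seq.upper()
    tail = poly_a_start(seq)
    hits = []
    for start in range(max(0, tail - max_dist), tail - 6 + 1):
        hexamer = seq[start:start + 6]
        if hexamer in PAS_VARIANTS:
            hits.append((hexamer, tail - start))
    return hits


ring1_3prime = ("AGAAGGTGAAGAACCCTCCCATTCACGCCCGCCTACCAACAACAAACGTGCTTT"
                "TTTCCTCTTTGAAAAAA")
print(find_pas(ring1_3prime))  # [('AACAAA', 25)]
```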

The ORF began at the first codon of the CEM15 insert and there was no upstream in-frame stop codon. The CEM15 cDNA clone was therefore probably truncated at the 5' end. Since the length of the cDNA insert (1.44kb without a poly(A) tail) corresponded well to the transcript size detected in poly(A)+ RNA (1.6kb with a poly(A) tail), it was concluded that the CEM15 cDNA insert was probably missing only a few nucleotides at the 5' end. It was therefore decided to obtain the sequence of the corresponding genomic region to see whether a starting methionine residue and an in-frame upstream stop codon were present.

The two subclones of CEM15, pXT and pXB, were hybridised back to cosmid DNA to determine the orientation of the RING1 gene. pXT hybridised to the 4.1kb XhoI fragment (33X3) of cosmid HPB.ALL 33 (Figure 5.2.), while pXB hybridised to the adjacent 1.7kb XhoI fragment, 33X1 (data not shown). From the sequence data it was known that pXT encoded the 5' end of RING1, while pXB encoded the 3' end. Thus, the RING1 gene must be oriented with the 3' end towards the centromere. The 4.1kb genomic XhoI fragment, 33X3, containing the 5' portion of the RING1 gene, was subcloned from the cosmid HPB.ALL 33 into the Bluescript vector. Part of the resulting construct, p33X3, was sequenced using a primer complementary to the 5' end of the CEM15 insert and extending further in the 5' direction. The sequence so obtained revealed the presence of a methionine codon just 5 codons upstream of the first codon in the CEM15 insert (Figure 6.1.). The methionine codon was in a sequence context of CCATAATGG, corresponding well to the general initiation consensus of CCACCATGG (Kozak, 1986). An in-frame termination codon was found a further 21 nucleotides upstream. This methionine was judged likely to be the initiating codon of the RING1 gene product, and for the purpose of the following discussion it will be assumed that the sequence shown in Figure 6.1., with an ORF of 377 amino acids, is correct. However, because this sequence was obtained from a genomic clone rather than a cDNA clone, there is no evidence that it is present in a mature RING1 transcript. This region could be part of an intron in the RING1 gene, and be spliced out during transcript processing.
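The comparison with the initiation consensus amounts to a position-by-position match against CCACCATGG. A minimal sketch follows; numbering positions relative to the A of ATG (+1), and singling out the purine at -3 and the G at +4 as the most influential positions, follows the usual Kozak convention but is an assumption of this sketch rather than something stated above. The function names are hypothetical.

```python
KOZAK = "CCACCATGG"  # positions -5..-1, then ATG at +1..+3, then +4


def position(index: int) -> int:
    """Map an index in the 9-nt context to Kozak numbering (there is no 0)."""
    return index - 5 if index < 5 else index - 4


def kozak_mismatches(context: str) -> list[int]:
    """Positions at which a 9-nt context differs from CCACCATGG."""
    if len(context) != len(KOZAK):
        raise ValueError("context must be 9 nt with ATG at string positions 6-8")
    return [position(i) for i, (a, b) in enumerate(zip(context.upper(), KOZAK))
            if a != b]


def key_positions_ok(context: str) -> bool:
    """Assumed key positions: purine at -3 and G at +4."""
    context = context.upper()
    return context[2] in "AG" and context[8] == "G"


for name, ctx in [("RING1", "CCATAATGG"), ("RING3", "CCACAATGG")]:
    print(name, kozak_mismatches(ctx), key_positions_ok(ctx))
```

For the RING1 context this reports mismatches at -2 and -1 only, with both assumed key positions matching the consensus.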

The predicted protein product of the RING1 gene was rich in glycine in the C-terminal two-thirds (27%, compared to a mean of 7% in human proteins; Doolittle, 1986). The N-terminal third, in comparison, was relatively depleted of glycine. The protein contained 10 cysteine residues, 7 of which were clustered together in the N-terminal region. Another feature of interest was the motif KRPR, starting at residue 172. The same sequence is found in the nuclear localisation signal of polyoma virus large-T antigen (Richardson et al., 1986).
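The compositional and motif observations above are simple string operations on the predicted protein. The sketch below uses residues 156-191 of the Figure 6.1 translation as test input; the helper names are hypothetical, and the 1-based numbering mirrors the figure.

```python
def composition(protein: str, residue: str) -> float:
    """Fraction of positions occupied by one amino acid."""
    protein = protein.upper()
    return protein.count(residue) / len(protein) if protein else 0.0


def motif_positions(protein: str, motif: str, first_residue: int = 1) -> list[int]:
    """1-based residue numbers at which an exact motif starts.

    first_residue is the number of the first residue in protein, so that a
    fragment can be numbered as in the full-length sequence.
    """
    protein, motif = protein.upper(), motif.upper()
    hits = []
    i = protein.find(motif)
    while i != -1:
        hits.append(first_residue + i)
        i = protein.find(motif, i + 1)
    return hits


# Residues 156-191 of the RING1 translation in Figure 6.1.
fragment = "DVSSDSAPDSAPGPAPKR" + "PRGGGAGGSSVGTGGGGT"
print(motif_positions(fragment, "KRPR", first_residue=156))  # [172]
print(round(100 * composition(fragment, "G")))  # 31
```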

The amino acid sequence of the predicted protein product of the RING1 gene was used to search the OWL(9.1) protein sequence database (collaboration with Dr. Paul Freemont, Protein Structure Laboratory, ICRF). When the entire RING1 sequence was used as the query sequence, weak identity was detected with a number of database sequences, all of which were related to RING1 solely on the basis of their high glycine content. However, when the glycine-rich C-terminal two-thirds of the RING1 amino acid sequence were eliminated and the 139 N-terminal amino acids were used as the query sequence, a much more significant match was detected with seven other proteins (Table 6.1.).

GGC TGC TGT TTC TAA AAC CCC TTT CCC TCT AAC CCA CAC CAC CTT TCT ACT CAC  54

TGA TGC CTT CAG GAA GCC ATA ATG GAT GGC ACA GAG|ATT GCT GTT TCC CCT CGG  108
                             M   D   G   T   E   I   A   V   S   P   R  11

TCA CTG CAT TCA GAA CTC ATG TGC CCT ATC TGC CTG GAC ATG CTG AAG AAT ACG  162
 S   L   H   S   E   L   M   C   P   I   C   L   D   M   L   K   N   T  29

ATG ACC ACC AAG GAG TGC CTC CAC AGA TTC TGC TCT GAC TGC ATT GTC ACA GCC  216
 M   T   T   K   E   C   L   H   R   F   C   S   D   C   I   V   T   A  47

CTA CGG AGC GGG AAC AAG GAG TGT CCT ACC TGC CGA AAG AAG CTG GTG TCC AAG  270
 L   R   S   G   N   K   E   C   P   T   C   R   K   K   L   V   S   K  65

CGA TCC CTA CGG CCA GAC CCC AAC TTT GAT GCC CTG ATC TCT AAG ATC TAT CCT  324
 R   S   L   R   P   D   P   N   F   D   A   L   I   S   K   I   Y   P  83

AGC CGG GAG GAA TAC GAG GCC CAT CAA GAC CGA GTG CTT ATC CGC CTG AGC CGC  378
 S   R   E   E   Y   E   A   H   Q   D   R   V   L   I   R   L   S   R  101

CTG CAC AAC CAG CAG GCA TTG AGC TCC AGC ATT GAG GAG GGG CTA CGC ATG CAG  432
 L   H   N   Q   Q   A   L   S   S   S   I   E   E   G   L   R   M   Q  119

GCC ATG CAC AGG GCC CAG CGT GTG AGG CGG CCG ATA CCA GGG TCA GAT CAG ACC  486
 A   M   H   R   A   Q   R   V   R   R   P   I   P   G   S   D   Q   T  137

ACA ACG ATG AGT GGG GGG GAA GGA GAG CCC GGG GAG GGA GAA GGG GAT GGA GAA  540
 T   T   M   S   G   G   E   G   E   P   G   E   G   E   G   D   G   E  155

GAT GTG AGC TCA GAC TCC GCC CCT GAC TCT GCC CCA GGC CCT GCT CCC AAG CGA  594
 D   V   S   S   D   S   A   P   D   S   A   P   G   P   A   P   K   R  173

CCC CGT GGA GGG GGC GCA GGG GGG AGC AGT GTA GGG ACG GGG GGA GGC GGC ACT  648
 P   R   G   G   G   A   G   G   S   S   V   G   T   G   G   G   G   T  191

GGT GGG GTG GGT GGG GGT GCC GGT TCG GAA GAC TCT GGT GAC CGG GGA GGG ACT  702
 G   G   V   G   G   G   A   G   S   E   D   S   G   D   R   G   G   T  209

CTG GGA GGG GGA ACG CTG GGC CCC CCA AGC CCT CCT GGG GCC CCC AGC CCC CCA  756
 L   G   G   G   T   L   G   P   P   S   P   P   G   A   P   S   P   P  227

GAG CCA GGT GGA GAA ATT GAG CTC GTG TTC CGG CCC CAC CCC CTG CTC GTG GAG  810
 E   P   G   G   E   I   E   L   V   F   R   P   H   P   L   L   V   E  245

AAG GGA GAA TAC TGC CAG ACG AGG TAT GTG AAG ACA ACT GGG AAT GCC ACA GTG  864
 K   G   E   Y   C   Q   T   R   Y   V   K   T   T   G   N   A   T   V  263

GAC CAC CTC TCC AAG TAC TTG GCC CTG CGC ATT GCC CTC GAG CGG AGG CAA CAG  918
 D   H   L   S   K   Y   L   A   L   R   I   A   L   E   R   R   Q   Q  281

CAG GAA GCA GGG GAG CCA GGA GGG CCT GGA GGG GGC GCC TCT GAC ACC GGA GGA  972
 Q   E   A   G   E   P   G   G   P   G   G   G   A   S   D   T   G   G  299

CCT GAT GGG TGT GGC GGG GAG GGT GGG GGT GCC GGA GGA GGT GAC GGT CCT GAG  1026
 P   D   G   C   G   G   E   G   G   G   A   G   G   G   D   G   P   E  317

GAG CCT GCT TTG CCC AGC CTG GAG GGC GTC AGT GAA AAG CAG TAC ACC ATC TAC  1080
 E   P   A   L   P   S   L   E   G   V   S   E   K   Q   Y   T   I   Y  335

ATC GCA CCT GGA GGC GGG GCG TTC ACG ACG TTG AAT GGC TCG CTG ACC CTG GAG  1134
 I   A   P   G   G   G   A   F   T   T   L   N   G   S   L   T   L   E  353

CTG GTG AAT GAG AAA TTC TGG AAG GTG TCC CGG CCA CTG GAG CTG TGC TAT GCT  1188
 L   V   N   E   K   F   W   K   V   S   R   P   L   E   L   C   Y   A  371

CCC ACC AAG GAT CCA AAG TGA CCC CAC CAG GGG ACA GCC AGA GGA AGG GGA CCA  1242
 P   T   K   D   P   K  377

TGG GGT ATC CCT GTG TCC TGG TCT ATC ACC CCA GCT TCT TTG TCC CCC AGT ACC  1296

CCC AGC CCA GCC AGC CAA TAA GAG GAC ACA AAT GAG GAC ACG TGG CTT TTA TAC  1350

AAA GTA TCT ATA TGA GAT TCT TCT ATA TTG TAC AGA GTG GGG CAA AAC ACG CCC  1404

CCA TCT GCT GCC TTT TCC ATT GCC CTG CAA CGT CCC ATC TAT ACG AGG TGT TGG  1458

AGA AGG TGA AGA ACC CTC CCA TTC ACG CCC GCC TAC CAA CAA CAA ACG TGC TTT  1512

TTT CCT CTT TGA AAAAA  1529

Figure 6.1. Nucleotide sequence of the RING1 gene. Nucleotides after the vertical line were obtained from the cDNA clone CEM15. Nucleotides before the line were obtained from a genomic subclone as described in the text. The clustered cysteine and histidine residues are circled, and the potential nuclear localisation signal is underlined.

[Figure 6.2: alignment of the cysteine-histidine motif in RING1 and its homologues.] Residues which are conserved, or conservatively substituted, between proteins constitute the motif described in the text. The identities of the homologous genes are shown in Table 6.1. and described in the text.

GENE NAME    ORGANISM                  TOTAL AMINO ACIDS   MOTIF START
RING1        Human                     377                 19
RAG-1        Human                     1043                290
rfp (ret)    Human                     513                 16
rpt-1        Mouse                     353                 15
RAD18        S. cerevisiae             487                 29
IE110        Herpes simplex virus      775                 116
VZ61         Varicella-zoster virus    467                 19
CG30         Baculovirus               264                 8

Table 6.1. Homologues of RING1 detected by searching the protein databases with the N-terminal 139 amino acids of the putative RING1 product. The total number of amino acids in the protein product of each homologue, and the position of the first cysteine residue of the conserved motif, are shown.

When these proteins were aligned to maximise identity between them, it was apparent that they all shared a previously undescribed cysteine-histidine motif, with a consensus sequence of:

-C-X-(I,V)-C-X₁₁₋₃₀-C-X-H-X-(F,I,L)-C-(I,L,M)-X₁₀₋₁₈-C-P-X-C-

(Figure 6.2.). In each protein, the motif was always contained within the N-terminal third, and frequently started very close to the N-terminus (Table 6.1.).
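The consensus can be written as a regular expression for scanning translated sequences. This is a sketch: it encodes the consensus literally, so a protein carrying a conservative substitution at one of the fixed positions (which the alignment tolerates) will be missed. The test sequences are synthetic and the function name is hypothetical.

```python
import re

# -C-X-(I,V)-C-X(11-30)-C-X-H-X-(F,I,L)-C-(I,L,M)-X(10-18)-C-P-X-C-
RING_MOTIF = re.compile(r"C.[IV]C.{11,30}C.H.[FIL]C[ILM].{10,18}CP.C")


def find_ring_motif(protein: str) -> list[int]:
    """1-based start positions of non-overlapping matches to the consensus."""
    return [m.start() + 1 for m in RING_MOTIF.finditer(protein.upper())]


# Synthetic sequence built to fit the consensus exactly.
synthetic = "MA" + "CAIC" + "A" * 12 + "CAHAFCL" + "A" * 14 + "CPAC" + "KK"
print(find_ring_motif(synthetic))  # [3]
```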

The motif is reminiscent of those found in zinc-binding transcription factors, in which cysteine residues, or a combination of cysteine and histidine residues, co-ordinate zinc ions, thereby stabilising the structure of a sequence-specific nucleic-acid-binding domain (Berg, 1990). One class of zinc-binding transcription factors includes the well-studied archetypal 'zinc-finger' protein TFIIIA from Xenopus laevis, which has zinc-dependent sequence-specific DNA- and RNA-binding activity. It is now known that there are large families of TFIIIA-related genes in the genomes of both vertebrates and invertebrates, which include developmentally important loci such as Kruppel in Drosophila melanogaster (Dressler and Gruss, 1988). The consensus amino acid sequence of the zinc-binding domain of these proteins is:

-(Y,F)-X-C-X₂/₄-C-X₃-F-X₅-L-X₂-H-X₃/₄-H-X-

A single protein may contain several of these domains. The pair of cysteine residues and the pair of histidine residues within each domain co-ordinate a single zinc ion, which is thought to induce folding of the domain to form an α-helix which interacts with DNA (Lee et al., 1989b; Berg, 1990).

A second well-characterised class of transcription factors, the steroid-hormone receptors, also have zinc-dependent sequence-specific DNA-binding domains. Each receptor molecule contains one zinc-binding domain with a consensus sequence of:

-C-X₂-C-X₁₃-C-X₂-C-X₁₅₋₁₇-C-X₅-C-X₉-C-X₂-C-X₄-C-

The first two pairs of cysteine residues co-ordinate one zinc ion while the next two pairs co-ordinate a second ion. The structure thus created is believed to contain a short α-helical region around the fourth cysteine residue which makes sequence-specific contacts with the DNA double helix (Hard et al., 1990).

A number of other distinct cysteine-histidine motifs have now been recognised in other proteins, and although these are generally less well characterised, some of them are known to be responsible for a zinc-dependent nucleic-acid-binding function (Berg, 1990). It is therefore very tempting to speculate that the RING1 motif may also be involved in metal-dependent nucleic acid binding.

The known functions of some of the RING1 homologues detected in the databases are highly suggestive of a role for the motif in DNA binding and, in some cases, transcriptional activation. The IE110 gene of herpes simplex virus type 1 encodes a nuclear phosphoprotein which is required for transcriptional activation of later virus genes (Perry et al., 1986). The mouse rpt-1 gene is expressed in resting CD4+ T-lymphocytes and encodes a nuclear protein which down-regulates expression of the interleukin-2 receptor α chain gene (Patarca et al., 1988). Normal expression of the human rfp gene is up-regulated during spermatogenesis, but the 5' end of this gene, including the region encoding the cysteine-histidine motif, has also been identified in a fusion gene, ret, which can transform NIH 3T3 cells (Takahashi and Cooper, 1987; Takahashi et al., 1988). The yeast RAD18 gene is required for postreplication repair of DNA damaged by UV light or other agents. The human RAG-1 gene activates DNA recombination at V(D)J sequences found at the T-cell receptor and immunoglobulin gene loci, a process which leads to the generation of diversity in the protein products of these genes (Schatz et al., 1989). The functions of the products of the varicella-zoster virus gene VZ61 and the baculovirus gene CG30 are unknown (Davison and Scott, 1986; Thiem and Miller, 1989). Despite the tantalising suggestion of a DNA-binding function for these gene products, there is as yet no evidence that any of the RING1 homologues bind DNA or zinc, or even that the cysteine-histidine motif is necessary for the function of these proteins. However, based on the speculation that the phylogenetically conserved cysteine and histidine residues of the RING1 motif are involved in co-ordinating divalent metal ions and creating a DNA-binding domain, a hypothetical structure for such a complex has been proposed and is presented in Figure 6.3. (Dr. Paul Freemont, personal communication).

One approach to establishing the biological function of the RING1 motif involves the PCR-based cloning of human genomic sequences which bind to the cysteine-histidine-rich region (Kinzler and Vogelstein, 1989). This method is currently being used (collaboration with Dr. Ruth Lovering) to determine whether the motif has a sequence specific DNA binding activity. The possible role of the RING1 gene product in MHC-associated phenotypes is discussed in Chapter 9.

Figure 6.3. (a) Schematic illustration of the co-ordination of zinc ions by two TFIIIA-type zinc fingers. (b) Schematic illustration of the hypothetical co-ordination of metal ions by the RING1-related motif.

RING2 CEM21 5' end

GCC CGG TCC GGC GTG TTC TGT CCT ACC TCA GGA GCG GGG AGC GGC ATC GGC CGA 54 ARSGVFCPTSGAGSGIGR 18

GCG GTC AGT GTA CGC CTG GCC GGA GAG GGG GCC ACC GTA GCT GCC TGC GAC CTG 108 AVSVRLAGEGATVAACDL 36

GAC CGG GCA GCG GCA CAG GAG ACG GTG CGG CTG CTG GGC GGG CCA GGG AGC AAG 162 DRAAAQETVRLLGGPGSK 54

GAG GGG CCG CCC CGA GGG AAC CAT GCT GCC TTC CAG GCX GAC GTG TCT GAG GCC 216 EGPPRGNHAAFQADVSEA 72

AGG GCC GCC AGG TGC CTG CTG GAA CAA GTG CAG GCC TGC TTT TCT CGC CCA CCA 270 RAARCLLEQVQACFSRPP 90

TCT GTC GTT GTG TCC TGT GCG GGC ATC ACC CAG GAX GAG TTT CTG CTG CAC ATG 324 SVVVSCAGITQDEFLLHM 108

TCX GAG GAX GAC TGG GAC AAA GTC ATA GCT GTC AAC CTC AAG GTG GCG ATC TCT 378 SEDDWDKVIAVNLKVAIS 126

GAA CCT CGC GAG TTT GGC CCC TTA GCT GGG AGG AGT TGG AGG AGG GCT GTC 429 EPREFGPLAGRSWRRAV 143

Figure 6.4. Nucleotide and predicted amino acid sequence of the 5' end of the RING2 cDNA clone CEM21. The sole methionine codon within the open reading frame is shown in bold type. The termination codons in the other two reading frames are underlined.

6.3. Partial nucleotide sequence of RING2

The two ends of the RING2 cDNA clone CEM21 were sequenced using primers complementary to the cDNA vector CDM8 on either side of the cloning site. One end (primed with the oligonucleotide CDM8-F) clearly contained a long tract of A residues, characteristic of a poly(A) tail, but could not be read further, presumably because of secondary structure created by the homopolymeric tail. The nucleotide sequence at the other end of CEM21, presumably near the 5' end of the gene, was obtained by priming the sequencing reaction with the oligonucleotide CDM8-B. This sequence was extended further in the 3' direction by priming the reaction with an oligonucleotide containing nucleotides 86-107 (Figure 6.4.). A total of 429 nucleotides were sequenced in this way (Figure 6.4.). This region potentially encoded a single long ORF of 143 amino acid residues, including a single methionine at position 108. This methionine codon was not in the optimal initiation consensus sequence (CCACCATGG; Kozak, 1986). Both other reading frames contained termination codons throughout the region sequenced. It is not possible at this stage to tell whether the ORF detected at the 5' end of the cDNA clone is that encoding the RING2 protein product (in which case the CEM21 cDNA clone is truncated at the 5' end), or whether it occurs by chance in the 3' untranslated region of the RING2 gene. This question will be resolved once the complete nucleotide sequence of CEM21 has been determined. The complete amino acid sequence of the ORF was used to search the PIR and Swiss-Prot protein sequence databases, but no significant match was found.
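The reading-frame analysis above can be reproduced by translating the sequence in all three forward frames and noting which frames contain stop codons. A minimal sketch using the standard genetic code; codons containing ambiguous characters (such as the X positions in Figure 6.4) are translated as X, reverse-strand frames are not considered, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def frames_with_stops(seq: str) -> list[int]:
    """Forward frames whose translation contains at least one stop codon."""
    return [f for f in range(3) if "*" in translate(seq, f)]


# First row of Figure 6.4.
row1 = "GCCCGGTCCGGCGTGTTCTGTCCTACCTCAGGAGCGGGGAGCGGCATCGGCCGA"
print(translate(row1))  # ARSGVFCPTSGAGSGIGR
```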

6.4. Nucleotide sequence of RING3

Four RING3 cDNA clones were isolated (Figure 5.13.). CEM32 and CEM35 were isolated by screening the library with the genomic BssHII fragment 71Bs1, while CEM41 and CEM44 were isolated by screening the library with an adjacent BssHII fragment, 71Bs3. In principle the two genomic probes could contain parts of two separate genes. However, when the cDNA clones were mapped with restriction enzymes, it was apparent that they were derived from the same gene because they all shared several sites in common (Figure 6.5.). Together, the four cDNAs provided a useful nested set of clones within the RING3 gene. The nucleotide sequence of the two ends of each cDNA was determined, using the CDM8-F and -B primers, in order to obtain the sequence of RING3 at several points.

The sequence data from the truncated 3' ends of the CEM44 and CEM41 cDNA clones revealed potential ORFs whose predicted amino acid sequences were used to search the databases. Both sequences identified the product of fsh, a Drosophila homeotic gene, as the top match. Because of the striking homology between fsh and these partial sequences from RING3, and the extremely interesting phenotypes of mutations in the fsh locus (see below), it was decided to determine the complete nucleotide sequence of the longest RING3 cDNA clone, CEM32.

The nucleotide sequence of CEM32 was obtained by sonicating the CEM32 insert and blunt-end ligating the resulting short fragments into M13. The M13 clones were then sequenced and aligned to build up the sequence of the CEM32 insert (collaboration with Dr. Stephan Beck). The nucleotide sequence of the 4004-nucleotide CEM32 insert is shown in Figure 6.6. The sequence at the 5' end of the CEM32 insert exactly matched that previously determined at the 5' end of the CEM35 insert, beginning at the same nucleotide. It is therefore very likely that this is the true 5' end of the RING3 transcript. The CEM32 insert contained a single long ORF. The first potential methionine codon within this ORF was not in a good context for initiation of translation, while the second was in the context CCACAATGG, which differs by only one nucleotide from the consensus CCACCATGG (Kozak, 1986). The second methionine was therefore considered the most likely initiating codon of the RING3 protein. Use of the second methionine would give a protein product of 754 amino acids and a 5' untranslated region of 1153 nucleotides (Figure 6.6.). The ORF was followed by a 3' untranslated region of 589 nucleotides, which was truncated before the poly(A) tail but contained a sequence exactly matching the consensus poly(A) addition signal (AATAAA) just before the end (starting at nucleotide 3991).

[Figure 6.5: restriction maps of the RING3 cDNA clones, including CEM32, and the primers which were used to obtain the nucleotide sequence at the end of each cDNA clone, as described in the text.]

[Figure 6.6: nucleotide sequence of the 4004-nucleotide RING3 cDNA insert CEM32, with the predicted amino acid sequence.]

Figure 6.6. Complete nucleotide sequence of the RING3 cDNA clone CEM32. The predicted amino acid sequence is shown, starting at the methionine residue in the best context for initiation. An additional upstream methionine codon is boxed and the in-frame upstream termination codon is underlined. The potential nuclear localisation signal within the predicted amino acid sequence is boxed. The two regions of internal homology are indicated by bold underlining.

The predicted protein product contained a potential nuclear localisation signal, KKKRK, starting at residue 508. In addition, it contained regions which were enriched in one particular amino acid: a highly acidic glutamate-rich region between residues 445 and 468, a basic lysine-rich region between residues 496 and 516, and a serine-rich region between residues 729 and 753. A further feature of interest was an internal duplication in the amino acid sequence (Figure 6.6.): two stretches of 26 amino acids starting at positions 50 and 322 shared 19 identical residues (73% identity).
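The identity figures quoted for RING3 are simple ratios over aligned columns. A minimal sketch, assuming the two stretches have already been aligned to equal length, with '-' marking gaps that count as non-identical; the input strings are placeholders, not the real RING3 segments, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns at which two equal-length strings agree.

    Gap characters ('-') never count as identities.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * same / len(a)


# Placeholder strings with 19 identities in 26 columns, as for the RING3 repeat.
print(round(percent_identity("A" * 26, "A" * 19 + "C" * 7)))  # 73
```

The same ratio gives 76% for 91 identities over a 120-residue stretch, as in the RING3-fsh comparison.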

When the amino acid sequence of the predicted protein product of the RING3 gene was used to search the protein sequence databases, a highly significant match was obtained with the 1106-amino acid product of the 5.9kb transcript of the Drosophila melanogaster gene female sterile homeotic (fsh; Haynes et al., 1989). The amino acid sequence identity between the two proteins is shown in Figure 6.7. This alignment reveals three domains which are strikingly conserved. In one stretch of 120 amino acids (positions 22-142 in RING3), 91 residues (76%) were identical in the two proteins and a further 9 were highly conservative substitutions. The human protein seems to have two large regions deleted relative to the Drosophila protein, which accounts for its smaller size. Interestingly, the two regions which appear to have been deleted from RING3 contained proposed transmembrane regions in fsh. It had previously been suggested that the fsh protein may be an integral membrane protein (Haynes et al., 1989). However, the absence of these regions in the human protein raises the possibility that they are not transmembrane domains in fsh. It may be significant that many of the stretches of amino acids which were proposed to be membrane-spanning domains were rich in alanine, the hydrophobicity of which is controversial (Haynes et al., 1989).
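Membrane-spanning segments are commonly proposed from a hydropathy profile, and the alanine question raised above can be seen directly in such a profile. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy threshold of 1.6, values often used for transmembrane prediction; whether Haynes et al. (1989) used this method or these parameters is not stated here, so the example is illustrative only, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window, in order along the sequence."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start residues of windows whose mean exceeds the threshold."""
    return [i + 1 for i, h in enumerate(hydropathy(protein, window))
            if h > threshold]


# A 19-residue poly-alanine stretch passes on its own (mean 1.8 > 1.6).
print(candidate_tm_windows("A" * 19))  # [1]
```

Because alanine scores +1.8 on this scale, an alanine-rich stretch can exceed the threshold without any strongly hydrophobic residues, which illustrates why predictions based on such stretches may be questioned.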

A further similarity between RING3 and fsh is that both give two major transcripts. The RING3 transcripts are 3.5kb and 4.5kb, while the fsh transcripts are 5.9kb and 7.6kb. The fsh transcripts arise through alternative RNA processing. The predicted product of the 7.6kb transcript contains the same 1106 amino acids encoded by the 5.9kb transcript, with an additional 946 residues at the C-terminal end (Haynes et al., 1989). The CEM32 insert is thus related to both fsh gene products. However, it seems to be analogous to the 5.9kb transcript, because the ORF terminates at the same position (Figure 6.7.). It is not clear at this stage whether CEM32 is derived from the 3.5kb or the 4.5kb RING3 transcript. It will be necessary to characterise additional cDNA clones and the genomic structure of the RING3 locus in order to determine the relationship between the RING3 transcripts and the fsh transcripts.

Figure 6.7. Alignment of the amino acid sequences of the human RING3 and Drosophila 5.9kb fsh gene products. Residues which are identical in the two proteins are indicated by colons. Dashes indicate gaps which have been introduced to maximise the homology. The alignment starts with residue 1 of the RING3 protein and residue 8 of the fsh protein.

A role for the fsh locus in Drosophila development has been suggested by the observation that fsh expression is required for normal embryonic pattern formation. It is needed both maternally during oogenesis and later in embryogenesis. Furthermore, certain mutant fsh alleles interact synergistically with mutant alleles at other developmentally important loci (such as the homeotic gene Ultrabithorax). This interaction increases the frequency of abnormalities in the segmentation pattern of the embryo (Haynes et al., 1989). The Kruppel gene is another developmentally important locus. Immunofluorescence with antibodies against the Kruppel product has shown directly that Kruppel expression is altered in fsh-mutant embryos (Huang and Dawid, 1990). These observations suggest that the fsh gene product(s) play a role in the complex interaction between gene products which is known to be critical for normal pattern formation in the developing Drosophila embryo. It is not yet known how this interaction is mediated.

Mammals, including humans, are known to have embryonically expressed homologues of many Drosophila developmental genes. A similar system of interacting gene products may therefore be involved in establishing the body pattern of mammals (Dressler and Gruss, 1988). This raises the exciting possibility that RING3 may be critical in human embryonic development. It will be of great interest to test the expression of RING3 in embryonic tissues.

The region which is duplicated within the predicted RING3 protein (Figure 6.6.) shows homology to a duplicated domain in the human CCG1 protein (Sekiguchi et al., 1988). CCG1 was identified in a transfection assay as the gene which complements temperature-sensitive mutant baby hamster kidney cell lines that arrest in the G1 phase of the cell cycle. The mechanism by which the CCG1 gene product overcomes the block in the cell cycle is unknown, and the function of the duplicated domain therefore remains obscure. The homology between the CCG1 and RING3 gene products is confined to the region of internal homology. This duplicated motif may define a novel protein domain.

6.5. Nucleotide sequence of RING4

The complete nucleotide sequence of the ~2.6kb RING4 cDNA clone 2.1 was determined by Ian Mockridge and Adrian Kelly in this laboratory. The cDNA clone contained a single long ORF which started at its 5' end. A methionine codon occurred at nucleotide 84, but it was not preceded by an in-frame upstream stop codon. A portion of the cosmid clone U15 containing the 5' end of the RING4 gene was therefore sequenced (collaboration with Dr. Stephan Beck, ICRF). The ORF was found to extend a further 180 nucleotides in the 5' direction before a stop codon was encountered. The sequence presented in Figure 6.8. is a composite of the cDNA sequence and the genomic sequence. As with RING1, there is as yet no evidence that this genomic sequence is part of the transcribed region, but for the present it will be assumed that the sequence shown in Figure 6.8. is correct.

The composite ORF contained two potential initiating methionine residues (arrowed in Figure 6.8.). The first immediately follows the upstream in-frame stop codon in the genomic sequence. The second lies at amino acid position 61 (nucleotide 211) in the cDNA clone. Use of the first methionine would give a protein of 808 amino acids, while use of the second would give a product of 748 amino acids. The ORF was followed by a 3' untranslated region of 370 nucleotides and a poly(A) tail. The sequence AATAAA started 19 nucleotides upstream of the tract of A residues; this sequence is identical to the consensus signal for 3' processing of mRNAs (Sheets et al., 1990).
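Two pieces of reasoning in this paragraph can be sketched in Python. The first is choosing an initiating ATG that lies downstream of an in-frame stop codon. The second is measuring where AATAAA sits relative to the poly(A) tail.

This is a simplified sketch on a synthetic sequence, under these assumptions:
- the reading frame is already known;
- initiation context is ignored;
- the tail is taken to be the terminal run of A residues, so a signal directly abutting the tail would be partly absorbed into it.

The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def first_atg_after_stop(seq, frame=0):
    """0-based index of the first ATG in `frame` lying downstream of an
    in-frame stop codon, or None if there is no such ATG."""
    seen_stop = False
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            seen_stop = True
        elif codon == "ATG" and seen_stop:
            return i
    return None

def orf_length_aa(seq, start):
    """Amino acids encoded from `start` up to (not including) the next in-frame stop."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return n
        n += 1
    return n  # ran off the end: no stop codon in this sequence

def polya_signal_offset(seq, signal="AATAAA", min_tail=10):
    """Nucleotides from the first base of the last `signal` to the first A of
    the poly(A) tail, or None if no tail or no signal is found."""
    tail_start = len(seq.rstrip("A"))
    if len(seq) - tail_start < min_tail:
        return None
    pos = seq.rfind(signal, 0, tail_start)
    return None if pos < 0 else tail_start - pos

# Synthetic transcript: in-frame stop, ATG, 4 codons, stop, AATAAA, spacer, poly(A)
mrna = "CCG" + "TAA" + "GCC" + "ATG" + "GCT" * 4 + "TGA" + "CCCAATAAA" + "C" * 13 + "A" * 20
start = first_atg_after_stop(mrna)
print(start, orf_length_aa(mrna, start), polya_signal_offset(mrna))  # 9 5 19

# Protein-length arithmetic from the text: starting at Met 61 drops 60 residues.
print(808 - (61 - 1))  # 748
```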

The N-terminal two-thirds of the predicted RING4 protein was highly hydrophobic in character. The potential membrane-spanning regions indicated in Figure 6.8. were revealed in a hydropathicity analysis by Dr. Michael Sternberg (Figure 6.9.). The C-terminal third, in contrast, was more hydrophilic.

The predicted RING4 amino acid sequence was then used to search the protein sequence databases. Significant homology was found between a stretch of amino acids within the C-terminal region (boxed in Figure 6.8.) and the ATP-binding site of members of the 'ABC' (ATP-binding cassette) superfamily of energy-dependent transport proteins (Figure 6.10.). This superfamily includes P-glycoprotein, the plasma-membrane-associated pump responsible for the multi-drug resistance phenotype found in some human tumour cells (Juranka et al., 1989; Hyde et al., 1990). The physiological substrate of P-glycoprotein is uncertain, but other members of the ABC superfamily are each specialised for the transport of a particular substrate. Known substrates include sugars, inorganic ions, amino acids, peptides and proteins.

[Figure 6.8: composite nucleotide sequence (2842 nucleotides) and predicted amino acid sequence of RING4]

Figure 6.8. Nucleotide sequence of the RING4 gene. Nucleotide sequences after the vertical line were obtained from the cDNA clone 2.1. Sequences before the vertical line were obtained from a genomic subclone. Two potential initiating methionine residues are arrowed. Three consensus N-linked glycosylation sites are indicated by squares. Potential transmembrane regions in the ORF are underlined. The polyadenylation sequence in the 3' untranslated region is underlined. The region encoding the ATP-binding domain is boxed.

[Figure 6.9: hydropathicity plot of the predicted RING4 protein]

Figure 6.9. Hydropathicity analysis of the predicted RING4 protein product. [...] the algorithm of Rao and Argos (1986) to predict the location of transmembrane regions. Short horizontal lines indicate the positions of putative membrane-spanning domains.
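Figure 6.9 was produced with the algorithm of Rao and Argos (1986). The general idea of a hydropathicity profile can be illustrated with a different, widely used method: the Kyte-Doolittle scale. The mean hydropathy is taken over a sliding window, and windows above a cut-off are flagged as candidate membrane-spanning segments.

This is a sketch under stated assumptions. The Kyte-Doolittle scale, the 19-residue window and the 1.6 cut-off are substitutions, not the method behind Figure 6.9. The test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values (swapped in for illustration; Figure 6.9
# used the Rao and Argos algorithm instead).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; element i is the window starting at residue i+1."""
    vals = [KD[aa] for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

def candidate_tm_segments(seq, window=19, cutoff=1.6):
    """Merge windows whose mean hydropathy exceeds `cutoff` into 1-based (start, end) ranges."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > cutoff:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments

# Synthetic sequence: charged flanks around a 19-residue hydrophobic stretch (21-39)
toy = "DKSE" * 5 + "LIVLAFLLIVGLAVLIFAL" + "KDSE" * 5
print(candidate_tm_segments(toy))
```

As with the Rao and Argos plot, the output marks candidate regions rather than proving membrane insertion. The reported boundaries are blurred by the window width.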

In general, ABC transporters have an overall organisation similar to that of P-glycoprotein (Figure 6.11.), with two highly hydrophobic integral membrane domains and two ATP-binding domains (Hyde et al., 1990; Juranka et al., 1989). The four domains may be present in one polypeptide encoded by a single gene. Alternatively, they may be encoded by more than one gene, so that the functional transporter is a multi-subunit complex. P-glycoprotein, for example, is encoded by a single gene, while the oligopeptide permease of bacteria is the product of four separate genes. The RING4 product, which contains one hydrophobic domain and one ATP-binding domain, falls between these two extremes. It resembles one half of a functional ABC transporter molecule, like the hlyB haemolysin transporter of E. coli (Hyde et al., 1990; Felmlee et al., 1985). If the RING4 product is part of an ABC transporter, the functional molecule may be a RING4 homodimer, or a heterodimer with a RING4-related gene product.
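The domain-counting argument behind the 'half-transporter' interpretation can be made explicit. Each polypeptide is summarised as a list of domain labels. 'TMD' stands for a hydrophobic membrane domain and 'NBD' for an ATP-binding domain; these labels are shorthand introduced here, not the thesis's terms. A functional transporter is assumed to need two of each. The function name is hypothetical.

```python
from collections import Counter

# Assumed requirement: two membrane domains plus two ATP-binding domains.
REQUIRED = Counter({"TMD": 2, "NBD": 2})

def copies_needed(domains):
    """How many copies of a single polypeptide would supply a complete
    transporter, or None if the chain lacks a required domain type entirely."""
    have = Counter(domains)
    per_type = []
    for domain, need in REQUIRED.items():
        if have[domain] == 0:
            return None  # must partner with a different gene product
        per_type.append(-(-need // have[domain]))  # ceiling division
    return max(per_type)

p_glycoprotein = ["TMD", "NBD", "TMD", "NBD"]  # four domains in one chain
ring4 = ["TMD", "NBD"]                          # one of each, as predicted for RING4
print(copies_needed(p_glycoprotein), copies_needed(ring4), copies_needed(["NBD"]))
```

A result of 2 for the RING4-like chain matches the homodimer option in the text. It cannot distinguish a homodimer from a heterodimer with a related chain of the same architecture.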

The possibility that the RING4 product is part of an ABC transporter may be particularly relevant to understanding the class II-associated defect in antigen presentation by class I molecules discussed in section 1.6.2. Studies of mutant cell lines in man and rat have implicated a class II-encoded function in normal class I antigen presentation in these species (Livingstone et al., 1989; Cerundolo et al., 1990). The RING4 gene maps within the interval known to contain this function in the human mutant B-cell line LBL 721.174. This cell line expresses normal class I molecules, but these are unable to present intracellular viral antigens. They also do not progress from the endoplasmic reticulum (ER), which is thought to be the site of class I/peptide interaction.

The defect can be relieved by the addition of exogenous viral peptides. These induce folding of the class I molecules, association with β2-microglobulin, and transport of the complex to the cell surface, where normal antigen presentation occurs. This observation has been interpreted as showing that LBL 721.174 is defective in peptide transport (Cerundolo et al., 1990). Intracellularly derived peptides are thought to be carried from their site of generation in the cytosol to their site of binding to class I in the ER.

Some members of the ABC family are known to be specialised for the transport of peptides and proteins:
- the product of the STE6 gene in S. cerevisiae is responsible for export of the 12-amino-acid mating pheromone a-factor (McGrath and Varshavsky, 1989);
- the product of the E. coli hlyB gene transports the protein haemolysin (Mr 107kD; Felmlee et al., 1985);
- the product of the opp (oligopeptide permease) operon of S. typhimurium transports peptides of up to 5 amino acids in length (Hiles et al., 1987).

These members of the ABC superfamily transport polypeptides which lack signal sequences across membranes by a mechanism independent of the normal secretory pathway.

[Figure 6.10: comparison of the putative ATP-binding region of the RING4 product with the ATP-binding sites of members of the ABC superfamily]

Figure 6.11. Model of the transmembrane topology of P-glycoprotein, illustrating the structure of the members of the ABC superfamily of transporters. Each circle represents an amino acid; branched structures represent N-linked carbohydrate. The tandemly duplicated structure is clearly evident. The predicted product of the RING4 gene contains one membrane-associated domain and one ATP-binding domain and would therefore have a structure similar to one half of P-glycoprotein.
An attractive hypothesis is that the RING4 product is part of a signal-independent peptide transporter associated with the ER membrane (Trowsdale et al., 1990). Such a transporter would pump peptides from the cytosol into the lumen of the ER, where binding to class I molecules then occurs. The role of the RING4 product in class I antigen presentation is currently being tested. The RING4 gene is being transfected into mutant cell lines, which are then tested for complementation of the defect (collaboration with Dr. Alain Townsend).

The RING4 gene is an exciting candidate for explaining some class II-associated autoimmune diseases. Different allelic forms of this gene could have a profound influence on the immune response. They might determine which peptides ultimately become associated with class I molecules and presented to cytotoxic T-lymphocytes. In theory, therefore, the RING4 protein could promote the presentation of certain autoantigens to autoreactive T-cells and participate in the initiation of an autoimmune response.

6.6. Partial nucleotide sequence of RING5

The two ends of the insert of the 2.3kb RING5 cDNA clone yU5 were sequenced. The two synthetic oligonucleotide primers used (CDM8-F and CDM8-B) are complementary to the CDM8 vector on either side of the cloning site. The sequence primed by CDM8-F extended for 162 nucleotides and contained a single ORF of 54 amino acid residues (Figure 6.12.). This ORF contained a stretch of hydrophobic amino acid residues followed by a region unusually rich in histidine residues. The partial amino acid sequence of the putative RING5 protein was very similar to the N-terminal region of the predicted product of a mouse gene, KE4 (St.-Jacques et al., 1990). When the RING5 and KE4 gene sequences were aligned, 145 out of 162 nucleotides (89.5%) were identical. At the amino acid level, 45 out of 54 residues (83%) were identical (Figure 6.12.).

The yU5 sequence primed by the CDM8-B oligonucleotide contained stop codons in all three reading frames. This sequence showed significant identity with the 3' untranslated region of the KE4 cDNA. It was therefore concluded that RING5 and KE4 are homologues of the same gene. Two observations are consistent with this conclusion. First, the RING5 cDNA clone yU5 detects cross-hybridising sequences 45kb proximal of the mouse Pb gene, which is known to be the map position of KE4 (see Chapter 7 for further details). Second, KE4 and RING5 were both found to be expressed in all tissues tested (St.-Jacques et al., 1990; Table 5.4.).

It is interesting, however, that RING5 appears to give rise to two major RNA species, while KE4 produces only one (Figure 5.14.; St.-Jacques et al., 1990). Furthermore, the KE4 transcript was estimated to be 2.8kb in length, while the RING5 transcripts are 2.0 and 2.3kb. The explanation for these differences must await the complete sequencing of cDNA clones for the two forms of RING5 and comparison of these sequences with KE4.
The protein-coding region of the 2.7kb KE4 cDNA sequenced by St.-Jacques et al. (1990) spanned only 1.3kb of the total. The differences in length observed between the mouse and human transcripts may therefore lie in the untranslated regions.
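Two computational steps in the RING5 analysis can be made concrete. The first is the test that the CDM8-B end sequence is closed by stop codons in every forward reading frame, which is a sign of non-coding sequence such as a 3' untranslated region. The second is the identity ratios 145/162 and 45/54. A minimal sketch, considering the forward strand only; the two test sequences are synthetic and the function names hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def frames_with_stops(seq):
    """Forward frames (0, 1, 2) that contain at least one stop codon."""
    return {f for f in range(3)
            if any(seq[i:i + 3] in STOPS for i in range(f, len(seq) - 2, 3))}

def open_in_some_frame(seq):
    """True if at least one forward frame is free of stop codons."""
    return len(frames_with_stops(seq)) < 3

utr_like = "TAACTGACTAGC"     # stop codons placed in frames 0, 1 and 2
coding_like = "ATGGCTGCTGCT"  # no stop codon in any forward frame
print(open_in_some_frame(utr_like), open_in_some_frame(coding_like))  # False True

# Identity ratios quoted in the text
print(round(100 * 145 / 162, 1), round(100 * 45 / 54))  # 89.5 83
```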

[Figure 6.12: alignment of the partial RING5 sequence with the mouse KE4 sequence; first aligned line: RING5 CCC CAC TGG GTG GCG GTG GGA CTG CTG ACC TGG GCG ACC TTG GGG CTT CTG GTG (54), encoding P H W V A V G L L T W A T L G L L V (18)]

[...] mutation tw5 (Abe et al., 1988; Artzt et al., 1988). Five genes, designated KE1-5, were identified in the cloned region. Four of these, including KE4, were found to be expressed in 5.5-day-old mouse embryos, the stage at which the tw5 mutation causes embryonic death. As a step towards characterising the candidate genes for the tw5 phenotype, St.-Jacques et al. (1990) obtained the complete nucleotide sequence of KE4. The predicted protein product contained 436 amino acids and had two domains which were extremely rich in histidine residues. The first of these is partly contained within the RING5 sequence (Figure 6.12.). Within these domains, histidine typically occupied every alternate position. The KE4 product also contained three stretches of hydrophobic residues characteristic of transmembrane domains, and a further stretch at the N-terminus characteristic of a signal sequence. The KE4 product may therefore be an integral membrane protein. Its function remains obscure at this stage. Its role in development, and particularly in the tw5 phenotype, could be tested by the construction of transgenic mice.
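The observation that histidine typically occupies every alternate residue in the KE4 histidine-rich domains can be expressed as a regular-expression search. A minimal sketch, assuming that a run of at least four histidines each separated by exactly one residue counts as a hit. The threshold, the function name and the toy sequence are hypothetical.

```python
import re

def alternating_his_runs(seq, min_his=4):
    """(start, end, histidine_count) for H-x-H-x-...-H runs, 1-based inclusive.

    The pattern (?:H.){n,}H requires at least `min_his` histidines, each
    separated from the next by exactly one residue of any type."""
    pattern = re.compile(r"(?:H.){%d,}H" % (min_his - 1))
    return [(m.start() + 1, m.end(), m.group().count("H")) for m in pattern.finditer(seq)]

# Toy sequence: one qualifying run (positions 5-13) and one short run (HQHA)
toy = "MSTK" + "HAHSHGHEH" + "KLLP" + "HQHA"
print(alternating_his_runs(toy))  # [(5, 13, 5)]
```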

6.7. Summary

The sequence data described in this chapter have given intriguing insights into the possible functions of four of the five novel genes:
- The RING1 gene encodes a protein containing a novel cysteine-histidine motif, which is conserved in a range of species including yeast and which may interact with DNA.
- The RING3 gene is homologous to the developmentally important fsh locus of D. melanogaster.
- Partial sequencing of the RING5 gene revealed an ORF with homology to the mouse KE4 gene, which maps to a region known to contain a developmental lethal.
- The RING4 gene product appears to be an ABC-type transmembrane pump, related to bacterial peptide transporters.

Experiments to gain further insights into the functions of the products of these genes, and suggestions for their possible roles in explaining MHC-associated phenotypes, are discussed in Chapter 9.

7. Comparative mapping of novel genes in the MHC region of mouse and man

This chapter describes the mapping of sequences homologous to the human genes RING1-5 and COL11A2 in the mouse MHC, and the mapping of sequences homologous to three novel mouse genes, KE1-3, in the human MHC.

7.1. Introduction

The organisation of genes in the human and mouse class II regions is very similar. The subregions are essentially homologous, and their order with respect to one another is conserved (section 1.4.). These findings have led to the proposal that the organisation of the class II region was established before the radiation of primates and rodents (Klein, 1986; Bodmer et al., 1986). Modifications of this organisation are thought to have occurred more recently (Bodmer et al., 1986). Examples are the duplication of the human DQA1 and DQB1 gene pair to give the DQA2 and DQB2 genes, and the deletion in the mouse of sequences homologous to the human genes DPA1 or DPA2. It was therefore of interest to test for homologues of the human genes RING3 and RING4 in the mouse class II region. Their presence or absence would indicate whether these genes were also present in the ancestral class II region or have more recently become inserted into the human MHC.

The region immediately centromeric of the class II region, containing the genes RING1, RING2 and RING5, is also interesting from an evolutionary point of view. The major difference in overall organisation between the mouse and human MHCs is a pair of mouse class I genes, K and K2. These map about 70kb proximal of the most proximal class II gene, Pb (Figures 1.7. and 7.1.). Sequence data and restriction mapping studies suggest that K and K2 arose by duplication in the Qa region of the mouse class I gene cluster. They are thought to have subsequently moved to their current location by an intrachromosomal double cross-over event (Bodmer, 1981; Weiss et al., 1984). This organisation is also found in the rat, but seems to be unique to rodents. Every other species examined to date, including man, has all of its class I genes grouped together in a single region mapping further towards the telomere (Klein, 1986).

Five novel genes, KE1-5, have been reported in the immediate vicinity of K and K2 (Abe et al., 1988; Figure 7.1.). This raised the possibility that these genes may also have been moved from the class I region. In that case, the human homologues of the mouse KE genes would be predicted to map to the human class I region. It has already been shown in Chapter 6 that RING5, which maps 85kb centromeric of DPB2, is the human homologue of the mouse KE4 gene, which maps 45kb centromeric of Pb. This result suggests that KE4 did not come from the class I region with K and K2. Rather, its position at the centromeric end of the class II region was established before the radiation of rodents and primates. Two questions remained: the situation for the other KE genes, and whether RING1 and RING2 are homologous to any of the KE genes. It was therefore decided to determine the positions of the KE genes in the human MHC and of the RING genes in the mouse MHC.

Figure 7.1. Molecular map of the proximal end of the mouse MHC. The main diagram shows the 600kb cosmid walk of Steinmetz et al. (1986), which links the class II genes (Pb, Ob, Ab, Aa, Eb, Eb2 and Ea) with the class I gene pair (K and K2) in the BALB/c mouse (H-2d haplotype). Above this map is shown the 190kb region cloned from the tw5 haplotype in an independent study by Uehara et al. (1987). Also indicated are the positions of the five novel genes, KE1-5, which were found within this cloned region by Abe et al. (1988). Arrows show the transcriptional orientation of the genes, where known. Both maps are to the same scale.

7.2. Determination of the positions of sequences homologous to the human genes RING1-5 and COL11A2 in the mouse MHC

The entire class II region and the H-2K subregion of the BALB/c (H-2d) mouse have been cloned in a single contig of overlapping cosmid clones (Figure 7.1.; Steinmetz et al., 1986). These cosmids were used to test whether the human genes RING1-5 and COL11A2 have homologues in the mouse MHC. cDNA probes from RING1-5 and a genomic probe from COL11A2 were hybridised to blots of DNA from the mouse cosmids. All six human genes cross-hybridised with specific mouse cosmid clones (Figure 7.2.).

The RING1, RING2 and RING5 cDNA probes (CEM15, CEM21 and yU5 respectively) all cross-hybridised with two overlapping mouse cosmids, II5.8 and II6.4 (Figure 7.2.). The RING2 and RING5 probes detected similarly sized HindIII fragments. The region of overlap between the two cosmids is about 40kb (Figure 7.1.). II5.8 and II6.4 were mapped in detail with the restriction enzymes KpnI and SalI (data not shown), using detailed cosmid maps kindly provided by Dr. Michael Steinmetz. This allowed the relative positions of the putative mouse homologues of RING1, RING2 and RING5 to be determined. These putative homologues, as defined by cross-hybridisation, were found to be about 60kb, 50kb and 45kb proximal to the Pb gene respectively. They therefore have the same relative positions in mouse as in human (Figure 7.7.). This result is consistent with RING5 being the human homologue of the mouse KE4 gene, as determined by nucleotide sequencing (section 6.6.), since KE4 is known to map 45kb proximal of Pb (Figure 7.1.).

[Figure 7.2: autoradiograph panels labelled RING1, RING2, RING5, COL11A2, RING3 and RING4]

Figure 7.2. Detection of sequences cross-hybridising to the human genes RING1-5 and COL11A2 in the mouse MHC. The autoradiographs show the results of hybridising probes from each of these genes to blots of DNA from cosmid clones in the mouse class II region (Figure 7.1.). The DNA was HindIII-cut in the first five panels and EcoRI-cut in the last panel. Probes were: RING1, CEM15 cDNA; RING2, CEM21 cDNA; RING5, yU5 cDNA; COL11A2, 4.5kb BamHI/EcoRI genomic fragment; RING3, CEM41 cDNA; RING4, 2.1 cDNA. The final washing stringency was 6x SSC, 65°C for the RING5, COL11A2 and RING4 probes and 2x SSC, 65°C for the RING1, RING2 and RING3 probes. Bars at left represent lambda/HindIII DNA size markers; from top these are 23.1kb, 9.4kb, 6.6kb, 4.4kb, 2.3kb, 2.0kb and 0.56kb.

The RING3 cDNA probe CEM41 cross-hybridised with two overlapping mouse cosmids, II2.27a and II4.24. This places the putative mouse homologue of RING3 between 30 and 40kb distal of the Pb gene (Figures 7.2. and 7.1.).

The RING4 cDNA probe 2.1 cross-hybridised with mouse cosmid II5.9, which places the mouse homologue of RING4 between 10 and 40kb proximal of the Ob gene (Figures 7.2. and 7.1.). This result has recently been confirmed by Deverson et al. (1990), who have described a gene in this region in both mouse and rat which is related to RING4 at the nucleotide sequence level.

The human COL11A2 probe, a 4.5kb genomic BamHI/EcoRI fragment, cross-hybridised with mouse cosmid II10.6 (Figure 7.2.). The bands seen in the '5.8' and '6.4' tracks are due to prior hybridisation of the same filter with the RING1 probe. Cosmid II10.6 also contains the Pb gene (Figure 7.1.). Detailed restriction enzyme mapping of II10.6 revealed that the region cross-hybridising to the COL11A2 probe was about 10kb proximal of the Pb gene (data not shown). This region has now been sequenced in the mouse, and the presence of the mouse COL11A2 gene confirmed (Dr. Kathy Cheah, personal communication; L. Stubbs and K. Cheah, manuscript in preparation).

These results showed that the positions of RING1-5 and COL11A2, relative to one another and to the characterised class II genes, are strikingly conserved in mouse and man (summarised in Figure 7.7.). The only genes in the proximal region of the human MHC without a homologue in the mouse MHC were DNA and DPA1/DPA2. Previous searches for novel class II α-chain genes in the mouse class II region had screened mouse genomic libraries at low stringency with mouse α-chain gene probes. These searches had not revealed homologues of DNA or DPA1/DPA2, although such genes may have diverged too greatly at the nucleotide sequence level to be detected in this way. It was therefore decided to hybridise the human DNA and DPA1 gene probes directly to DNA from the mouse class II region cosmids at low stringency. Washing at 55°C in 6x SSC revealed a single cross-hybridising band in cosmid II2.27a with the human DNA probe (data not shown). This result suggests that there are indeed sequences related to DNA between Pb and RING3 in the mouse (Figure 7.7.). No cross-hybridising bands were detected in the region near Pb using the human DPA1 probe under the same conditions. It remains possible that there is a mouse homologue of DPA1 but that it is highly diverged.

7.3. Determination of the positions of sequences homologous to three mouse genes, KE3-5, in the human MHC

Probes for the mouse genes KE2, KE3, KE4 and KE5 were a gift from Dr. Kuniya Abe. These probes were genomic fragments from mouse cosmid clones, as described in Table 7.1. Each has previously been reported to detect sequences in human genomic DNA (Abe et al., 1988). The conditions for cross-hybridisation to sequences in human genomic DNA were determined for each mouse probe in the present study by hybridising them to human Southern blots at 65°C and washing at gradually increasing stringency (Table 7.1.).

It was shown in section 6.6. that the RING5 gene is the human homologue of the mouse KE4 gene. It was therefore hypothesised that the human homologues of the other KE genes may also map near RING5. The KE gene probes were hybridised to blots of DNA from overlapping human cosmids in this region, using the conditions determined for detection of cross-hybridising sequences.

MOUSE GENE   PROBE IDENTITY                 CROSS-HYBRIDISATION
KE2          No.5; 1.7kb BamHI fragment     6x SSC, 65°C
KE3          No.7; 1.8kb PstI fragment      2x SSC, 65°C
KE4          No.11; 4.9kb BamHI fragment    2x SSC, 65°C
KE5          No.14; 3.9kb PstI fragment     6x SSC, 65°C

Table 7.1. Description of genomic probes for the mouse genes KE2-5. The second column shows the probe number, as assigned by Abe et al. (1988), and the size of the probe. The third column shows the washing conditions under which cross-hybridisation of each probe to human genomic DNA was observed.

The KE5 probe cross-hybridised to fragments in cosmids HPB.ALL 31 and cosHcol.ll, as shown in Figure 7.3. The large EcoRI band detected in HPB.ALL 31 is shorter than that detected in cosHcol.ll because the HPB.ALL 31 cosmid insert terminates within this EcoRI fragment (Figure 5.2.). The results are consistent with the presence of a homologue of KE5 mapping within the ~12kb EcoRI fragment just proximal to the COL11A2 gene.

The KE4 probe cross-hybridised to two adjacent EcoRI fragments in the region of overlap between cosmids HPB.ALL 31 and HPB.ALL 33 (Figures 7.3. and 5.2.). These fragments encode the RING5 and RING2 genes. In addition, the KE4 probe was found to cross-hybridise to cDNA clones for both the RING5 and RING2 genes (data not shown). This was surprising, given that nucleotide sequencing had revealed that RING2 was unrelated to KE4, while RING5 and KE4 were homologous. However, it has recently been reported that the KE4 probe, a 4.9kb genomic BamHI fragment, detects two RNA species of 1.5kb and 2.8kb on mouse northern blots and encodes part of two physically close but distinct genes (St.-Jacques et al., 1990). The 2.8kb transcript is the product of the originally described KE4 gene (Abe et al., 1988; St.-Jacques et al., 1990). Since it has already been demonstrated that RING5 is the human homologue of the true KE4 gene, it is most likely that RING2 is the homologue of the second (uncharacterised) mouse gene encoded by the KE4 probe. This explanation is consistent with the results described in section 7.2., where mouse sequences cross-hybridising to RING2 and RING5 were found to be in close proximity.

[Figure 7.3: autoradiographs for the KE4 and KE5 probes; lanes 33, 31, Col, 1, 8 and 2.3; size markers 23.1, 9.4, 6.6, 4.4, 2.3 and 2.0kb (image not reproduced).]

Figure 7.3. Detection of sequences cross-hybridising to the mouse genes KE4 and KE5 in the human MHC. The autoradiographs show the results of hybridising mouse genomic probes for the genes KE4 and KE5 to blots of EcoRI-digested DNA from overlapping human cosmids at the proximal end of the human class II region (Figure 5.1.). 33 is cosmid HPB.ALL 33; 31 is cosmid HPB.ALL 31; Col is cosmid cosHcol.ll; 1 is cosmid HPB.ALL 1; 8 is cosmid HPB.ALL 8; 2.3 is cosmid MANN 2.3. The final washing stringency was 2x SSC, 65°C for the KE4 probe and 6x SSC, 65°C for the KE5 probe.

Probes for the mouse genes KE2 and KE3, which map centromeric of the K and K2 genes, did not cross-hybridise with the human cosmid clones. The KE3 probe was therefore used to screen the human cosmid library, and a novel clone, HPB.ALL 51, was isolated (Figure 7.4.). This cosmid was mapped with restriction enzymes and found to contain a 2.6kb BamHI fragment, designated B51, which cross-hybridised with the KE3 probe. B51 was used as a probe to map the putative human homologue of the KE3 gene. B51 was first hybridised to Southern blots of DNA from the somatic cell hybrids MCP-6 and 56-47, as described for the mapping of COL11A2 in section 4.2. The B51 probe detected a large EcoRI fragment of about 15kb in both hybrids, indicating that B51 maps to 6p21 (data not shown). B51 was then mapped relative to the MHC by PFGE. In human DNA cut with NotI, MluI, MluI+NotI, NruI and NruI+NotI, B51 detected bands of the same size as those detected by the probe 33X1 (Figure 7.5. and Table 7.2.). Apart from these shared bands, 33X1 detected a 100kb BssHII fragment which was not cleaved by NruI, while B51 detected a 50kb BssHII fragment which was cut by NruI to yield a small fragment of about 10-15kb (Figure 7.5. and Table 7.2.).

These results allowed B51, the putative human homologue of the mouse KE3 gene, to be mapped accurately to a position 80-95kb centromeric of the RING1 gene (Figure 7.6.). Consistent with the genomic mapping results, the cosmid HPB.ALL 51 was found to contain a single NruI site and a single BssHII site separated by about 15kb (Figure 7.4.). Furthermore, the B51 probe mapped between these two sites, as judged by hybridising B51 to blots of cosmid DNA (data not shown).

In conclusion, the human sequences which cross-hybridise with probes for the mouse genes KE3, KE4 and KE5 are in the same order as their mouse counterparts (Figure 7.7.).

[Figure (image not reproduced); recoverable caption fragment: "...which mapped between the BssHII site and the most distal XhoI site as shown."]

[Figure 7.5: PFGE analysis with the probes 33X1 and B51 (image not reproduced). Recoverable caption text: "...hybridised sequentially with 33X1 (Figure 5.2.) and B51. Lane 1, human genomic DNA digested with NotI; lane 2, BssHII+NotI; lane 3, BssHII; lane 4, BssHII+MluI; lane 5, MluI; lane 6, MluI+NotI. The third panel shows a different filter hybridised with B51. Lane 7, NruI; lane 8, BssHII; lane 9, BssHII+NruI; lane 10, NruI+NotI; lane 11, NotI. Fragment sizes are indicated in kilobases. LM, limiting mobility. Size markers in the second panel are concatemers of bacteriophage lambda DNA."]

ENZYME(S)      B51        33X1
BssHII         50kb       100kb
MluI           230kb      230kb
NotI           440kb      440kb
NruI           >900kb     >900kb
BssHII+MluI    50kb       100kb
BssHII+NotI    50kb       100kb
MluI+NotI      230kb      230kb
NruI+NotI      100kb      100kb
NruI+BssHII    10-15kb    100kb

Table 7.2. Sizes of fragments detected on PFGE blots with the probes B51 and 33X1.
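The reasoning used to place B51 relative to 33X1 can be made explicit by tabulating Table 7.2 and separating the digests in which the two probes detect fragments of identical size from those in which they differ. The sketch below is illustrative only: the names are hypothetical, and identical sizes are treated as consistent with (not proof of) the two probes lying on the same fragment.

```python
# Fragment sizes from Table 7.2 as (B51, 33X1); kept as strings because
# some entries are ranges or bounds rather than single values.
TABLE_7_2 = {
    "BssHII":      ("50kb", "100kb"),
    "MluI":        ("230kb", "230kb"),
    "NotI":        ("440kb", "440kb"),
    "NruI":        (">900kb", ">900kb"),
    "BssHII+MluI": ("50kb", "100kb"),
    "BssHII+NotI": ("50kb", "100kb"),
    "MluI+NotI":   ("230kb", "230kb"),
    "NruI+NotI":   ("100kb", "100kb"),
    "NruI+BssHII": ("10-15kb", "100kb"),
}

def compare_probes(table):
    """Split digests into those where both probes detect a fragment of the
    same size and those where the sizes differ."""
    shared, different = [], []
    for digest, (b51, x33) in table.items():
        (shared if b51 == x33 else different).append(digest)
    return shared, different

shared, different = compare_probes(TABLE_7_2)
print("same size:", shared)
print("different size:", different)
# Every digest that separates the two probes includes BssHII.
print(all("BssHII" in d for d in different))
```

Every digest in which the sizes differ contains BssHII, which is the pattern expected if a BssHII site lies between the two probes while the NotI, MluI and NruI fragments span both.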

7.4. Summary and discussion

This chapter has described the comparative mapping of the proximal MHC regions of man and mouse. At this stage, the human homologues of the mouse KE3 and KE5 genes and the mouse homologues of the human RING1, RING2, RING3 and DNA genes have been defined only by cross-hybridisation. It will be necessary to identify and sequence the genes within these cross-hybridising regions to formally prove homology, as has already been done for the RING4, RING5/KE4, and COL11A2 genes (Deverson et al., 1990; Monaco et al., 1990; Chapter 6, this study; Dr. Lisa Stubbs, manuscript in preparation).

The comparative map of the proximal region of the mouse and human MHCs is probably the most detailed for any part of the mammalian genome (Figure 7.7.). The relative positions of the putative homologues of RING1-5, COL11A2 and KE3-5 are strikingly similar in the two species. This degree of conservation strongly suggests that the establishment of the molecular genetic organisation of the entire proximal MHC region predates the radiation of rodents and primates.

[Figure 7.7: comparative map of the proximal MHC region in mouse and man (image not reproduced).]

It is noticeable that the distances between genes are often smaller in the mouse, in keeping with the observation that the overall size of the mouse class II region is about 300kb, compared to 900kb in man. Thus, the human COL11A2 gene is 45kb proximal to the DPB2 gene, while the mouse COL11A2 gene is only about 10kb proximal to the Pb gene. Similarly, RING3 is 110kb distal to the HLA-DP subregion in man, but in the mouse the region cross-hybridising with RING3 is only 30-40kb distal of Pb. It is striking, however, that the distance between RING3 and RING4 is virtually identical in mouse and man (about 100kb, Figure 7.7.). This interval may well be the same size in the two species because it contains conserved genes.
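The contrast between compressed and conserved intervals can be quantified by asking what each mouse interval would be if the human region were shrunk uniformly by the ratio of the overall region sizes (300kb/900kb). This is a minimal sketch using the distances quoted above; taking 35kb as the midpoint of the 30-40kb mouse RING3 interval is an assumption, and the names are illustrative.

```python
HUMAN_CLASS_II_KB = 900
MOUSE_CLASS_II_KB = 300
SCALE = MOUSE_CLASS_II_KB / HUMAN_CLASS_II_KB  # about 1/3

# (human interval kb, observed mouse interval kb); 35kb is an assumed
# midpoint of the 30-40kb range quoted for the mouse RING3 interval.
INTERVALS = {
    "COL11A2 to DPB2 (human) / Pb (mouse)": (45, 10),
    "RING3 to DP (human) / Pb (mouse)": (110, 35),
    "RING3 to RING4": (100, 100),
}

def compression_ratios(intervals, scale):
    """Observed mouse interval divided by the interval expected if the
    human region were compressed uniformly by `scale`."""
    return {name: mouse_kb / (human_kb * scale)
            for name, (human_kb, mouse_kb) in intervals.items()}

for name, ratio in compression_ratios(INTERVALS, SCALE).items():
    print(f"{name}: observed/expected = {ratio:.1f}")
```

The first two intervals come out close to or below the uniform expectation (ratios of about 0.7 and 1.0), whereas the RING3-RING4 interval is about three times larger than uniform compression would predict, which is why its conservation stands out.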

It is intriguing that a homologue of the RING4 gene is conserved in the class II region of mouse and rat (this study; Deverson et al., 1990; Monaco et al., 1990). The nucleotide sequence and map location of the human RING4 gene have led to the suggestion that it encodes an ATP-driven transporter which pumps peptides from the cytoplasm into the lumen of the endoplasmic reticulum, where binding to class I molecules then occurs (section 6.5.; Trowsdale et al., 1990). The location within the MHC of a gene whose product is potentially so intimately involved with the function of class I antigens may provide a selective advantage, with RING4 and class I gene sequences co-evolving within the same gene complex, where they would rarely be separated at meiosis. From this point of view it will be interesting to test for the presence of a homologue of RING4 (and of the other RING and KE genes) in the MHCs of more distantly related species, such as the chicken.

The mouse homologues of the human RING1, RING2 and RING5 genes map within the region of the mouse MHC believed to contain the tw5 embryonic lethal gene (Artzt et al., 1988). RING5 is the human homologue of the mouse KE4 gene, which is already being considered as a candidate for tw5 (St.-Jacques et al., 1990). The mouse RING1 and RING2 genes are additional candidates for the tw5 gene. It will be of particular relevance to determine whether the mouse homologues of RING1 and RING2 are expressed in early embryogenesis.

8. RFLP analysis of novel genes

8.1. Introduction

As described in section 1.5., many diseases are known to show an association at the population level with the class II region. In theory, the association could reflect an influence of the class II allele with which it was originally detected, or of any gene in linkage disequilibrium with that marker. To determine whether any alleles of the novel genes (COL11A2 and RING1-5) are in linkage disequilibrium with class II alleles, and whether these alleles are more closely associated with any diseases than the class II alleles, it is first necessary to define polymorphisms in the novel genes with which the relevant studies can be performed.

8.2. Approach to identifying RFLPs

The genomic DNA samples used in this study were from random Caucasian donors and were a gift from Susan Tonks, Tissue Antigen Laboratory, ICRF. Genomic DNA from five individuals, giving a sample size of ten chromosomes, was typically digested with each enzyme used. Digested DNA was resolved on 0.8% agarose gels in 1x TAE buffer. Gels were run to maximise the resolution of larger fragments, so some small fragments may have run off the end of the gel during electrophoresis. On the filters of MspI- and TaqI-cut DNA, the minimum fragment size retained was about 1.5kb; on the other filters, it was about 1.2kb. All hybridisations were performed at 65°C and filters were washed to 0.1x SSC at 65°C to avoid cross-hybridisation with related sequences. In practice, some weaker bands could occasionally be detected after long exposures of hybridised filters; these were not recorded in the present analysis.

8.3. RFLP analysis of COL11A2

The probe used was a 4.5kb BamHI/EcoRI genomic fragment subcloned from the cosmid cosHcol.ll (Hanson et al., 1989). The results obtained with this probe are shown in Table 8.1.

ENZYME   CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
EcoRI    30                   11kb        0.53
                              9kb         0.47
BglII    10                   9.5kb       1.0
                              4.0kb       1.0
MspI     10                   none
TaqI     10                   4.5kb       1.0
                              2.0kb       1.0

Table 8.1. Fragment sizes detected with the genomic COL11A2 probe in Southern blots of human DNA cut with the enzymes indicated.

The COL11A2 probe detected a frequent RFLP in EcoRI-digested DNA samples. Of fifteen samples tested, two contained a single cross-hybridising fragment of about 11kb and one contained a single cross-hybridising fragment of about 9kb. These individuals were presumably homozygous for the 11kb and 9kb fragments respectively. In the other twelve samples the COL11A2 probe detected two fragments, one of 11kb and one of 9kb; these individuals were presumably heterozygotes. In the sample tested here the frequencies of the two fragments were 0.53 (11kb) and 0.47 (9kb). Cheah et al. (1990a) have reported a similar observation, and have demonstrated by family studies that the polymorphic bands are allelic. In addition, BamHI and StuI RFLPs have been demonstrated near the COL11A2 gene (Priestly et al., 1990; Cheah et al., 1990b). The potential applications of COL11A2 RFLPs are discussed in Chapter 9.
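The fragment frequencies quoted above follow from counting alleles across the genotypes: each homozygote contributes two copies of its fragment and each heterozygote one copy of each. The sketch below reproduces the EcoRI figures under the same assumption used in the text (that the 11kb and 9kb bands are alleles); the function name is illustrative.

```python
def allele_frequencies(genotype_counts):
    """Count alleles from genotype counts. Each key is a pair of allele
    names; a homozygote lists the same allele twice."""
    allele_counts = {}
    for (a1, a2), n in genotype_counts.items():
        allele_counts[a1] = allele_counts.get(a1, 0) + n
        allele_counts[a2] = allele_counts.get(a2, 0) + n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}, total

# EcoRI genotypes detected with the COL11A2 probe in fifteen individuals
col11a2_ecori = {
    ("11kb", "11kb"): 2,
    ("9kb", "9kb"): 1,
    ("11kb", "9kb"): 12,
}
freqs, n_chromosomes = allele_frequencies(col11a2_ecori)
print(n_chromosomes, {a: round(f, 2) for a, f in freqs.items()})
# -> 30 {'11kb': 0.53, '9kb': 0.47}
```

The same function applied to the B51 BamHI data described in section 8.8 (three ~15kb homozygotes and two ~15kb/2.6kb heterozygotes) gives frequencies of 0.8 and 0.2 over ten chromosomes.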

No bands were detected in MspI-digested samples, probably because the cross-hybridising fragments were small and had run off the end of the gel during electrophoresis. However, 50 MspI-cut samples were tested in the study of Cheah et al. (1990a), and no polymorphism was detected. No TaqI or BglII polymorphism was detected in the small sample used here.

8.4. RFLP analysis of RING1

The probes used were 33X1, a 1.7kb genomic XhoI fragment from cosmid HPB.ALL 33 which encodes the 3' end of the RING1 gene (Figure 5.2.), and CEM15, the RING1 cDNA clone (Figure 5.13.). Results obtained with these probes are listed in Tables 8.2. and 8.3. On the basis of these data it was concluded that the RING1 gene did not contain any high-frequency polymorphisms for the ten restriction endonucleases tested.

ENZYME    CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BamHI     10                   4.3kb       1.0
                               2.3kb       1.0
BglII     8                    4.3kb       1.0
                               2.0kb       1.0
DraI      10                   5.8kb       1.0
                               1.8kb       1.0
EcoRI     10                   ~15kb       1.0
EcoRV     10                   >20kb       1.0
HindIII   10                   >20kb       1.0
PstI      10                   5.5kb       1.0
PvuII     10                   7.7kb       1.0

Table 8.2. Results obtained by hybridising the RING1 cDNA probe CEM15 to Southern blots of human DNA cut with the enzymes shown.

ENZYME   CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BglII    10                   4.3kb       1.0
EcoRI    10                   ~15kb       1.0
MspI     10                   1.6kb       1.0
TaqI     10                   1.5kb       1.0

Table 8.3. Fragment sizes detected on Southern blots of human genomic DNA with the genomic probe 33X1, which encodes the 3' end of the RING1 gene.

8.5. RFLP analysis of RING2

The RING2 probe was the 800bp genomic fragment 31K1 (Figure 5.2.). The results obtained with this probe are shown in Table 8.4. From these data it was concluded that the part of the RING2 locus covered by the probe 31K1 did not contain any high-frequency polymorphisms for the four enzymes tested. It would be informative to repeat this study with a probe covering a larger region of the genome, such as the RING2 cDNA, CEM21.

ENZYME   CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BglII    10                   6.0kb       1.0
EcoRI    10                   6.0kb       1.0
                              3.5kb       1.0
MspI     10                   1.9kb       1.0
                              1.6kb       1.0
                              1.55kb      1.0
TaqI     10                   2.6kb       1.0

Table 8.4. Results obtained by hybridising the RING2 genomic probe 31K1 to Southern blots of human genomic DNA cut with the enzymes shown.

8.6. RFLP analysis of RING3

The RING3 probe was the cDNA clone CEM41. The results obtained with this probe are shown in Table 8.5.

ENZYME    CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BamHI     10                   >20kb       1.0
BglII     8                    >20kb       1.0
DraI      10                   5.8kb       1.0
                               3.7kb       1.0
                               1.8kb       1.0
                               1.6kb       1.0
EcoRI     10                   ~15kb       1.0
EcoRV     10                   >20kb       1.0
HindIII   10                   5.1kb       1.0
                               3.0kb       1.0
                               2.5kb       1.0
MspI      10                   11.0kb      1.0
                               9.5kb       1.0
                               2.2kb       1.0
PstI      10                   3.9kb       1.0
                               1.9kb       1.0
                               1.3kb       1.0
                               1.0kb       1.0
PvuII     10                   4.5kb       0.9
                               4.3kb       0.1
TaqI      10                   2.6kb       0.1
                               2.0kb       0.9
                               1.6kb       1.0

Table 8.5. Results obtained by hybridising the RING3 cDNA probe CEM41 to Southern blots of human genomic DNA cut with the enzymes shown.

In four of the five PvuII-cut genomic DNA samples, a fragment of 4.5kb was detected by the CEM41 probe. In the fifth sample, a fainter band of 4.5kb and an additional band of 4.3kb were observed; this individual was presumably heterozygous for a PvuII polymorphism. In TaqI digests, the CEM41 probe detected a 1.6kb fragment and a 2.0kb fragment in all five samples. One sample had an additional fragment of 2.6kb; this individual was presumably heterozygous for a TaqI polymorphism. In the small sample tested here, the 4.3kb PvuII fragment and the 2.6kb TaqI fragment were each present on one out of ten chromosomes (frequency 0.1). The two polymorphic fragments were not found in the same individual. The RING3 locus did not contain any high-frequency polymorphisms for the other eight enzymes tested.

8.7. RFLP analysis of RING4

The RING4 probe was the cDNA clone 2.1. The results obtained with this probe are shown in Table 8.6. It was concluded that the RING4 locus, as defined by the 2.1 cDNA probe, did not contain any high-frequency polymorphisms for the nine restriction endonucleases tested. However, it has recently been reported that HAM1, the rat homologue of the RING4 gene, is polymorphic (Deverson et al., 1990). Further studies on RING4 using additional enzymes and more DNA samples may reveal variation at this locus.

ENZYME    CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BamHI     10                   ~18kb       1.0
BglII     8                    6.0kb       1.0
                               3.2kb       1.0
EcoRI     10                   11kb        1.0
                               3.4kb       1.0
EcoRV     10                   >20kb       1.0
HindIII   10                   5.7kb       1.0
MspI      10                   2.1kb       1.0
                               1.8kb       1.0
PstI      10                   9.7kb       1.0
                               1.6kb       1.0
PvuII     10                   1.8kb       1.0
TaqI      10                   1.8kb       1.0
                               1.7kb       1.0

Table 8.6. Results obtained by hybridising the RING4 probe 2.1 to Southern blots of genomic DNA cut with the enzymes shown.

8.8. RFLP analysis of B51

B51 encodes the putative human homologue of the mouse KE3 gene, as described in Chapter 7. This probe, a 2.6kb BamHI fragment isolated from the cosmid HPB.ALL 51, has been mapped to a position 80-95kb centromeric of RING1 (Figure 7.6.). The results obtained with this probe are shown in Table 8.7.

B51 detected a two-allele polymorphism with BamHI. Three of the five genomic DNA samples tested contained a single hybridising fragment of about 15kb. The other two samples contained the ~15kb fragment and an additional 2.6kb fragment; these were presumably heterozygous for the polymorphism. In the small number of chromosomes tested, the frequencies of the two alleles were 0.8 (~15kb) and 0.2 (2.6kb). No bands were detected in MspI-digested human DNA, presumably because the B51 probe does not hybridise to any MspI fragments greater than 1.5kb, the minimum fragment size retained on this blot. No high-frequency polymorphisms were detected with the other eight enzymes tested. Potential applications of the B51 RFLPs are discussed in section 9.3.

ENZYME    CHROMOSOMES TESTED   FRAGMENTS   FREQUENCY
BamHI     10                   ~15kb       0.8
                               2.6kb       0.2
BglII     16                   5.4kb       1.0
DraI      10                   5.8kb       1.0
EcoRI     18                   ~15kb       1.0
EcoRV     10                   11kb        1.0
                               1.9kb       1.0
HindIII   10                   10kb        1.0
MspI      10                   none
PstI      10                   1.9kb       1.0
                               1.5kb       1.0
PvuII     10                   1.8kb       1.0
                               1.65kb      1.0
TaqI      10                   1.8kb       1.0
                               1.5kb       1.0

Table 8.7. Results obtained by hybridising the B51 probe to Southern blots of human genomic DNA cut with the enzymes shown.

8.9. Summary and discussion

RFLPs were identified at the COL11A2, RING3 and B51 loci, and their potential uses are discussed in the following chapter. In contrast, the RING1, RING2 and RING4 probes did not detect polymorphisms with the enzymes tested in the small number of individuals examined. It remains possible that RFLPs could be detected with these enzymes in a much larger study, although any such polymorphisms would probably be of low frequency in the population.
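How low that frequency must be can be estimated from the sample size. Assuming the ten chromosomes are an independent random sample (an assumption; related or non-random donors would change this), the chance of seeing a fragment of population frequency p at least once is 1 - (1 - p)^n. The sketch below is illustrative and the function names are hypothetical.

```python
import math

def detection_probability(p, n_chromosomes):
    """Chance that an allele of frequency p appears at least once among
    n independently sampled chromosomes."""
    return 1.0 - (1.0 - p) ** n_chromosomes

def chromosomes_needed(p, target=0.95):
    """Smallest sample giving at least `target` probability of detection."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

for p in (0.05, 0.1, 0.2):
    print(f"p={p}: detected in 10 chromosomes with probability "
          f"{detection_probability(p, 10):.2f}; "
          f"{chromosomes_needed(p)} chromosomes for 95% detection")
```

With ten chromosomes, an allele at 10% frequency would be missed roughly one time in three, so the negative results exclude only common polymorphisms; about 29 chromosomes would be needed to detect such an allele with 95% probability.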

A more thorough approach to identifying polymorphisms at or near these loci would involve the use of many more restriction endonucleases and longer probes, such as whole cosmids with repetitive sequences competed out. In addition, it may be advisable to resolve each digest on two different gels, run to maximise the resolution of long and short fragments respectively. In practice, however, this traditional approach to searching for polymorphism is very time-consuming and is ultimately dependent on the presence or absence of a site for a specific restriction endonuclease, which probably accounts for only a small fraction of the natural variation in the human genome. One alternative strategy is to search for dinucleotide repeat sequences, which often show high-frequency variation in the number of repeat units present at a given locus. This variation in repeat number can be detected by PCR. In a recent study of such loci in mice, 88% were found to be polymorphic between strains (Love et al., 1990).
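Where sequence is available, candidate dinucleotide repeats can be located before designing flanking PCR primers. The sketch below is a simple regular-expression scan, not a method used in this study; the six-unit minimum is an arbitrary assumption, and homodinucleotides such as AA are excluded because they are mononucleotide runs.

```python
import re

def find_dinucleotide_repeats(seq, min_units=6):
    """Return (start, motif, units) for each run of a dinucleotide repeated
    at least min_units times. A negative lookahead excludes AA, CC, GG and TT."""
    seq = seq.upper()
    pattern = re.compile(r"((?!AA|CC|GG|TT)[ACGT]{2})\1{%d,}" % (min_units - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // 2)
            for m in pattern.finditer(seq)]

# Hypothetical sequence containing a (CA)12 repeat
example = "GGATC" + "CA" * 12 + "TTAG"
print(find_dinucleotide_repeats(example))
```

The scan only identifies candidate repeats; whether the number of units actually varies between chromosomes still has to be tested by PCR across the repeat in a panel of individuals.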

9. Concluding discussion

The preceding chapters have described the identification and preliminary characterisation of five novel genes, RING1-5, in and around the class II region of the human MHC. In this chapter the general significance of this work in the context of recent progress in MHC mapping will be discussed along with suggestions for future experiments.

9.1. Advances in MHC mapping

When this study was initiated, the molecular map of the MHC region was dominated by the 'classical' class I, class II and class III genes (Figure 1.5.a.). Over the last three years, however, the powerful techniques of reverse genetics have led to the discovery of numerous novel genes within the MHC region, including those described in this study (Figure 1.5.b.).

The greatest progress has been made in the class III region. 890kb of the class III region has now been cloned in overlapping cosmids, and this interval contains at least 36 genes (section 1.3.4.). Progress in the class II region has been slower, partly because it was considered unlikely that non-class II genes would lie amongst the known class II genes, given that the region has probably evolved through multiple gene duplication events. Such a process might not be expected to occur if other genes were in the immediate vicinity, since these could be disrupted with deleterious consequences. The α- and β-globin gene complexes, both of which are thought to have evolved through gene duplication, are apparently devoid of other genes. However, it is now becoming apparent that the evolutionary history of the class II region has not precluded the presence of unrelated genes. Evidence has been presented in this thesis for two non-class II genes, RING3 (homologous to the Drosophila developmental gene fsh) and RING4 (a member of the ABC family of transmembrane transporters), at CpG islands between the DNA and DOB genes in the middle of the class II region. Another study of the human class II region has provided evidence for five genes, Y2-5, between DNA and DOB (Spies et al., 1990). From sequence data Y3 is equivalent to RING4, and from mapping data Y4 is probably equivalent to RING3. Two genes related to RING4 have recently been described in the rat class II region (Deverson et al., 1990). Finally, seven novel genes have been identified in the equivalent interval in the mouse class II region (Monaco et al., 1990). Sequence data have revealed that three of these are highly diverged class II genes which were not previously identified by cross-hybridisation with class II gene probes (Dr. John Monaco, personal communication; Monaco et al., 1990).

The region immediately centromeric of the class II region has been characterised in man for the first time in the present study. This region contains a cluster of unrelated genes: a fibrillar collagen gene (COL11A2), a gene encoding a protein containing a novel cysteine-histidine motif (RING1), a histidine-rich transmembrane protein gene (RING5), a partially characterised gene (RING2), and sequences homologous to two mouse genes (KE3 and KE5). No class II-like sequences were found in this region, and this cluster of genes most likely defines the centromeric end of the class II region and thus the proximal boundary of the whole human MHC.

In contrast to the detailed map of the centromeric end of the MHC region, much less is known about the class I region. Although considerable progress has been made in characterising and mapping the numerous class I-related genes and pseudogenes, there are very large gaps between these sequences whose genetic content is unknown. There is evidence that clinically important genes remain to be discovered in the class I region. Family studies in pedigrees with inherited forms of idiopathic haemochromatosis, spinocerebellar ataxia and juvenile myoclonic epilepsy have indicated that these diseases may be caused by genes mapping in, or close to, the class I region (Edwards et al., 1986; Spence et al., 1989). A combination of genetic and physical mapping of the class I region has provided evidence for very high recombination frequencies telomeric of the HLA-A gene (section 1.3.2.). Thus, disease genes which have been mapped many cM from HLA-A in family studies may in fact be in close physical proximity to HLA-A. Recently, YAC clones covering portions of the class I region have been described (Chimini et al., 1990); these should facilitate the search for novel genes in the class I region using the same approaches which have been successful in the class III and class II regions.

It may be many years before the full genetic content of the MHC is known, partly because the potential total number of genes is very large, given the overall size of the MHC, and partly because the reverse genetic approach to identifying novel transcribed regions may miss some genes. Ultimately, the identification of every gene in the MHC may not be possible until the entire region has been sequenced.

The intensive efforts to map, clone and sequence MHC genes have made the MHC one of the best studied regions of the human genome (Stephens et al., 1990). The detailed maps available for the MHC region and the impressive progress made towards cloning the entire region make the MHC an obvious starting point for large-scale genomic sequencing of the type envisaged by the Human Genome Project. In this context it is interesting to consider the size of the MHC as a proportion of the total genome: taking the haploid size of the human genome to be 3000Mbp, the size of chromosome 6 to be 165Mbp and the size of the MHC to be 4Mbp, the MHC represents about 2.5% of the length of chromosome 6 and about 1/750th of the length of the human genome.
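The proportions above follow directly from the three sizes given. The short calculation below uses only those figures; the exact ratio for chromosome 6 is about 2.4%, which the text rounds to about 2.5%.

```python
GENOME_MBP = 3000  # haploid human genome, as assumed in the text
CHR6_MBP = 165     # chromosome 6
MHC_MBP = 4        # MHC

fraction_of_chr6 = MHC_MBP / CHR6_MBP
genome_denominator = GENOME_MBP / MHC_MBP  # MHC is 1/750th of the genome
print(f"{fraction_of_chr6:.1%} of chromosome 6; 1/{genome_denominator:.0f} of the genome")
```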

9.2. Potential role of novel genes in class II-associated phenotypes

Regardless of whether the function of a gene is known, some progress can be made in assessing whether that gene is associated with certain diseases by identifying RFLPs in the gene and using these in population association studies, as described in section 8.1. However, knowledge of the function of a gene is clearly helpful in formulating hypotheses as to what specific phenotype that gene may influence. In the case of COL11A2, localised to the centromeric end of the class II region in this study, much was already known about the function of the gene (Chapter 4). In contrast, nothing was known about the functions of RING1-5. Partial or complete nucleotide sequences of RING1-5 were obtained as a first step towards overcoming this obstacle (Chapter 6). In four cases out of five, the predicted amino acid sequence provided clues about the possible function of the gene product and allowed hypotheses to be made about its possible role in MHC-associated diseases.

9.2.1. COL11A2

As discussed in Chapter 4, the characterisation of RFLPs at the COL11A2 locus (section 8.3.) is potentially of interest in ascertaining whether alleles of COL11A2 are associated with pauciarticular juvenile rheumatoid arthritis (PJRA). PJRA has previously been shown to display a weak but significant association with the DP region (Odum et al., 1986; Begovich et al., 1989; Fugger et al., 1990), which could in principle be explained by the involvement of an allele at a closely linked gene in linkage disequilibrium with the DP region. The association of COL11A2 RFLPs with RFLPs at the DP loci is being determined to test whether COL11A2 is in linkage disequilibrium with the DP subregion and whether it has a stronger association with PJRA than DP does.
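Linkage disequilibrium between two biallelic markers, such as a COL11A2 RFLP and a DP RFLP, is commonly summarised by D, the normalised D' and r² computed from haplotype frequencies. The sketch below uses standard textbook definitions; the haplotype counts are entirely hypothetical, and in practice the counts would first have to be established from family studies or estimated from genotypes.

```python
def linkage_disequilibrium(h11, h12, h21, h22):
    """D, D' and r^2 from counts of the four haplotypes A1B1, A1B2,
    A2B1 and A2B2 for two biallelic loci A and B."""
    total = h11 + h12 + h21 + h22
    p11, p12, p21, p22 = (h / total for h in (h11, h12, h21, h22))
    pA1, pB1 = p11 + p12, p11 + p21
    pA2, pB2 = 1 - pA1, 1 - pB1
    d = p11 * p22 - p12 * p21  # equivalent to p11 - pA1 * pB1
    if d > 0:
        d_max = min(pA1 * pB2, pA2 * pB1)
    else:
        d_max = min(pA1 * pB1, pA2 * pB2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = pA1 * pA2 * pB1 * pB2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical counts for 100 haplotypes (illustration only)
d, d_prime, r2 = linkage_disequilibrium(40, 10, 10, 40)
print(f"D={d:.2f}  D'={d_prime:.2f}  r2={r2:.2f}")
```

D' is 1 when one of the four haplotypes is absent, so it is sensitive to sample size; r² additionally reflects how well one marker predicts the other, which is the relevant quantity when asking whether a COL11A2 allele could account for an association first detected at DP.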

9.2.2. RING1

The predicted protein product of the RING1 gene contains a novel cysteine-histidine motif reminiscent of the metal-dependent nucleic acid-binding motifs found in transcription factors (section 6.2.). To test the hypothesis that the RING1 motif is a DNA-binding domain, the N-terminal third of the RING1 protein is being expressed in bacterial cells in order to prepare large quantities for use in DNA-binding assays (collaboration with Dr. Ruth Lovering).

Proof that RING1 encodes a transcription factor could be of interest considering the association of the class II region with certain cancers. One of the RING1-related proteins, containing the same cysteine-histidine motif, was identified in a transformation assay (Takahashi and Cooper, 1987). Hodgkin's lymphoma, chronic lymphocytic leukaemia and acute non-lymphocytic leukaemia all show association with certain DP alleles (Bodmer et al., 1989a; Pawelec et al., 1989). Given that RING1 is in close physical proximity to the DP subregion, it is possible that these associations are explained by alleles of RING1 in linkage disequilibrium with DP.

A possible function for RING1 in the regulation of gene expression is also of interest given that RING1 has a mouse homologue which maps within the region known to contain the mouse developmental lethal mutation tw5 (section 7.2.; Artzt et al., 1988). Many of the genes which cause embryonic abnormalities in Drosophila encode transcription factors (Dressler and Gruss, 1988). In this respect it will be of interest to determine whether expression of RING1 is developmentally regulated.

9.2.3. RING2

The partial nucleotide sequence of RING2 (section 6.3.) did not allow any conclusions to be drawn about the possible function of this gene. The main point of interest is that a sequence homologous to RING2 in the mouse maps to the region known to contain the developmental lethal mutation tw5 (section 7.2.; Artzt et al., 1988). This gene is therefore another candidate for tw5.

9.2.4. RING3

The nucleotide and predicted amino acid sequences of the RING3 gene reveal that it is the human homologue of the Drosophila developmental homeotic gene fsh (section 6.4.). Genetic experiments are underway in Drosophila in an attempt to determine how the fsh protein interacts with other known developmental genes (Huang and Dawid, 1990; Dr. Igor Dawid, personal communication). Given the striking conservation of the amino acid sequence between the human and Drosophila proteins, it seems likely that the RING3 gene product plays a role in mammalian embryonic development, as has been observed for mammalian homologues of other developmentally important Drosophila genes. In general these homologues have been studied in the mouse using transgenic techniques to determine their precise developmental and spatial expression patterns. More immediately, it would be important to determine by northern blotting whether RING3 is expressed in embryonic tissues.

Evidence that RING3 is a developmentally important gene could be of significance in understanding the association of the MHC with habitual spontaneous abortion. There are numerous reports that women who suffer from habitual abortion have a significantly higher frequency of certain MHC alleles (Christiansen et al., 1989) or share MHC alleles with their partners more frequently than expected (Tiwari and Terasaki, 1985). One interpretation of these findings is that certain MHC haplotypes carry recessive or dominant mutations in a developmentally important locus such as RING3 (or a human equivalent of tw5) which cause embryonic abnormalities, and consequently abortion. A dominant mutation carried on a maternal haplotype would be detected as an association with the MHC in affected women. A recessive mutation carried on the same haplotype in both partners would be detected as elevated haplotype sharing amongst affected couples.

9.2.5. RING4

The predicted amino acid sequence of the RING4 gene product revealed that it is a member of the ABC superfamily of transmembrane pumps (section 6.5.). This finding is of particular significance given that the RING4 gene maps to the region known to encode a locus required for normal antigen presentation by class I molecules (section 1.6.2.). It has been proposed that a mutant cell line unable to present intracellular antigens by class I molecules has a defect in the mechanism by which peptides are transported from the cytoplasm into the ER (Cerundolo et al., 1990). It has also been hypothesised that peptide transport across the ER membrane could be mediated by a system like the bacterial oligopeptide permease, which is also an ABC transporter (Townsend and Bodmer, 1989). The finding of RING4, an ABC transporter gene, in the region known to be deleted in the mutant cell line is therefore highly suggestive of a role for the RING4 gene product in peptide transport (Parham, 1990; Trowsdale et al., 1990). Furthermore, transcription of RING4 could not be detected in another mutant cell line with a similar defect in class I antigen presentation, which is thought to be caused by a point mutation (Spies et al., 1990). The RING4 gene is also being transfected into mutant cell lines in an attempt to complement the defect in class I antigen presentation (collaboration with Dr. Alain Townsend).

Proof that the RING4 gene product is a transporter which delivers cytoplasmic peptide antigens to class I molecules in the lumen of the ER may be of significance in understanding the association of the class II region with autoimmune diseases. It is conceivable that a peptide transporter could exert selectivity on the type of antigen delivered to the lumen of the ER by preferentially transporting peptides of a certain length or with certain sequence characteristics (Elliott et al., 1990). Polymorphic forms of the transporter might then deliver different selections of peptides, perhaps including critical autoantigenic epitopes from self proteins, for binding to and presentation by class I molecules. In this respect it is interesting that polymorphisms, both at the functional level and the RFLP level, have been identified at the locus equivalent to RING4 in the rat (Livingstone et al., 1989; Deverson et al., 1990), although the human RING4 locus was not polymorphic with the restriction enzymes tested (section 8.7.). It will be of great interest to determine whether RING4 is polymorphic with enzymes other than those tested in the present study, as the association of RING4 RFLPs with susceptibility to autoimmune disease could then be investigated.

9.2.6. RING5

The partial nucleotide and predicted amino acid sequences of the RING5 gene revealed that it is homologous to the mouse KE4 gene (section 6.6.). The mouse gene has been completely sequenced, and its predicted protein product has three potential transmembrane regions and two domains in which histidine is typically found at every second residue (St.-Jacques et al., 1990). Although the KE4 gene product does not have any homology with other proteins which could help shed light on its function, it is potentially interesting because (like the mouse homologues of RING1 and RING2, see above) it maps to the region known to contain the mouse developmental lethal gene tw5 (Artzt et al., 1988). Furthermore, it is expressed in early embryogenesis, the time at which the tw5 mutation is known to act. Since this mutation has been characterised in the mouse, further work on the KE4/RING5 locus would probably be most productive in this system. The construction of transgenic mice homozygous for mutant forms of KE4 could reveal whether KE4 is the tw5 gene.

9.3. Additional applications of probes generated in this study

The probes generated during the work described in this thesis provide additional accurately mapped markers for the class II region and the area immediately centromeric of the MHC. These probes have so far been used to characterise X-irradiated fusion hybrids carrying fragments of chromosome 6 (Ragoussis et al., 1991a) and yeast artificial chromosome clones covering the class II region (Ragoussis et al., 1991b).

The RFLPs detected with the B51 probe (section 8.8.) are of interest because B51 is a novel marker mapping over 200kb centromeric of DPB1, the most proximal class II gene in which polymorphism has previously been studied in detail. The association of B51 RFLPs with pauciarticular juvenile arthritis and Hodgkin’s lymphoma could be tested to determine whether B51 is more closely associated with these diseases than the DP subregion. This could provide evidence for new disease susceptibility factors mapping centromeric of the class II region.

The RFLPs characterised at the COL11A2 locus may be useful for assessing the role of this gene in a heterogeneous group of joint disorders described as osteoarthritis or chondrodysplasia, which are characterised by progressive degeneration or abnormal development of articular cartilage. In families where the disease shows a clear pattern of inheritance, linkage analyses can be performed using polymorphic probes for candidate genes, such as those encoding cartilage collagens. To date, the COL2A1 locus has been investigated most intensively because type II collagen is quantitatively the major collagen of cartilage, and in one pedigree RFLPs at the COL2A1 locus have been shown to segregate with the inheritance of osteoarthritis with mild chondrodysplasia (Knowlton et al., 1990). However, in other pedigrees close linkage of the inherited defect to this locus was excluded (Knowlton et al., 1990; Sykes et al., 1989), which raises the possibility that minor cartilage components, such as type XI collagen, are involved.
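Linkage analyses of this kind are conventionally summarised by a LOD score: the log10 ratio of the likelihood of the pedigree data at a recombination fraction θ to the likelihood under free recombination (θ = 0.5). The Python sketch below computes a two-point LOD score under the simplifying assumption of phase-known, fully informative meioses, so that each meiosis can be scored directly as recombinant or non-recombinant between the RFLP marker and the disease. The function name and the counts in the example are illustrative only and are not taken from the pedigrees cited above.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (hypothetical helper).

    LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    if theta == 0.0:
        # Complete linkage is impossible if any recombinant was observed.
        return -math.inf if recombinants > 0 else n * math.log10(2)
    # Work in log space to avoid underflow in large pedigrees.
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + n * math.log10(2))


# Ten informative meioses with no recombinants between marker and disease:
print(round(lod_score(0, 10, 0.0), 2))   # 3.01
# A single recombinant rules out theta = 0 but not tight linkage:
print(round(lod_score(1, 9, 0.1), 2))    # 1.6
```

By convention a LOD score of 3 or more is taken as evidence for linkage, and a score of -2 or less as evidence against linkage at the value of θ tested; exclusion of a candidate locus therefore corresponds to strongly negative scores at small θ.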

9.4. More new genes in the class II region?

To date the only interval within the class II region which has been intensively studied for the presence of novel genes is that between DNA and DOB, or its equivalent in rodents (this study; Spies et al., 1990; Deverson et al., 1990; Monaco et al., 1990). In the present study, the strategy taken to isolate novel genes was based on identifying CpG islands, as defined by the presence of clusters of sites for rare-cutter enzymes, in the class II region. This approach proved highly successful: each cluster of rare-cutter sites was associated with at least one gene, confirming that CpG islands are reliable markers for the presence of genes. Cluster 4, between DQB3 and DQB1 (Figure 3.5.), may therefore mark the position of another novel gene. However, there are additional genes in the class II region which are not associated with CpG islands (Spies et al., 1990), so other as yet uncharacterised intervals in the class II region may also contain genes even if they lack CpG islands.
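The idea of defining candidate CpG islands as clusters of rare-cutter sites can be illustrated in silico. In this study the clusters were identified by long-range restriction mapping of genomic DNA rather than from sequence, so the Python sketch below is only an analogue: it scans a DNA sequence for the recognition sites of six rare-cutter enzymes, groups sites lying close together, and reports groups cut by more than one enzyme. The choice of enzymes, the 1 kb gap and the two-enzyme minimum are assumptions made for illustration, not criteria taken from this study.

```python
import re

# Recognition sequences of rare-cutter enzymes with CpG-rich sites.
# All are palindromic, so scanning one strand finds every site.
RARE_CUTTERS = {
    "NotI": "GCGGCCGC",
    "EagI": "CGGCCG",
    "BssHII": "GCGCGC",
    "SacII": "CCGCGG",
    "MluI": "ACGCGT",
    "NruI": "TCGCGA",
}


def find_sites(seq):
    """Return a sorted list of (position, enzyme) for every rare-cutter site."""
    seq = seq.upper()
    sites = []
    for enzyme, motif in RARE_CUTTERS.items():
        # A zero-width lookahead also reports overlapping sites (e.g. GCGCGCGC).
        for m in re.finditer(f"(?={motif})", seq):
            sites.append((m.start(), enzyme))
    return sorted(sites)


def cluster_sites(sites, max_gap=1000, min_enzymes=2):
    """Group sites lying within max_gap bp of the previous site and keep
    groups cut by at least min_enzymes different enzymes (assumed thresholds)."""
    clusters, current = [], []
    for pos, enzyme in sites:
        if current and pos - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append((pos, enzyme))
    if current:
        clusters.append(current)
    return [
        (c[0][0], c[-1][0], sorted({e for _, e in c}))
        for c in clusters
        if len({e for _, e in c}) >= min_enzymes
    ]


# Toy sequence: a NotI site (which contains an EagI site) and a SacII site
# close together, then an isolated MluI site about 4 kb further on.
seq = ("AT" * 500 + "GCGGCCGC" + "AT" * 50 + "CCGCGG"
       + "AT" * 2000 + "ACGCGT" + "AT" * 100)
print(cluster_sites(find_sites(seq)))
# [(1000, 1108, ['EagI', 'NotI', 'SacII'])]
```

The isolated MluI site is discarded because a single rare-cutter site, unlike a cluster, is weak evidence for a CpG island.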

It is interesting that previously undiscovered, highly diverged class II-related genes, including an α/β gene pair, have been found in the mouse class II region (Monaco et al., 1990; Dr. John Monaco, personal communication). Given the striking homology between the human and mouse class II regions (Chapter 7), homologues of these genes would be expected in the human class II region. Indeed, probes from two of the mouse genes have been used in this laboratory to identify a homologous α/β gene pair in the human class II region (Adrian Kelly, personal communication). The product of this gene pair could be the novel class II molecule identified by Carra and Accolla (1987; section 1.6.1.), and may have implications for understanding the association of autoimmune diseases with the class II region (section 1.5.).

There is still no confirmation of the identity of the genes which encode the subunits of the LMP complex and which map between Ob and Pb in the mouse class II region (section 1.6.3.), although these may be among the novel genes described by Monaco et al. (1990). If the LMP complex is the same as the high molecular weight proteinase complex, as discussed in section 1.6.3., it may be possible to identify the Lmp genes on the basis of recently published protein sequence data from proteinase subunits. N-terminal amino acid sequences have been determined for five human proteinase subunits (Lee et al., 1990a), and these could be used to design oligonucleotide probes which in turn could be used to map the genes encoding the subunits.
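The design of oligonucleotide probes from N-terminal sequence usually begins by back-translating the peptide and choosing the stretch whose codons are least degenerate. The Python sketch below does this with the standard genetic code, writing each codon position as an IUPAC ambiguity code. The function names are hypothetical, and the peptide in the example is invented; it is not one of the proteinase subunit sequences reported by Lee et al. (1990a).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(codon)

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "CT": "Y",
         "CG": "S", "AT": "W", "GT": "K", "AC": "M", "CGT": "B",
         "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def back_translate(peptide):
    """Return (degenerate oligo in IUPAC code, number of coding sequences)."""
    oligo, degeneracy = [], 1
    for aa in peptide.upper():
        codons = CODONS_FOR[aa]
        degeneracy *= len(codons)
        for i in range(3):
            # Union of the bases found at codon position i.
            oligo.append(IUPAC["".join(sorted({c[i] for c in codons}))])
    return "".join(oligo), degeneracy


def least_degenerate_window(peptide, length):
    """Return (start, degeneracy) of the least degenerate run of residues."""
    best = None
    for start in range(len(peptide) - length + 1):
        _, deg = back_translate(peptide[start:start + length])
        if best is None or deg < best[1]:
            best = (start, deg)
    return best


print(back_translate("MFK"))                 # ('ATGTTYAAR', 4)
print(least_degenerate_window("MSLWK", 2))   # (3, 2)
```

For the six-codon amino acids (Leu, Ser, Arg) the per-position union is a superset of the true codons: Ser becomes WSN, which also encodes Thr, Cys and other residues. Stretches rich in Met and Trp, which have a single codon each, are therefore preferred.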

It is notable that the mouse and rat class II regions contain two genes, HAM1 and HAM2, about 15kb apart, which are related to the human RING4 gene (Monaco et al., 1990; Deverson et al., 1990). HAM1 and HAM2 share 77% identity at the amino acid level, with HAM1 most closely related to RING4. RING4 probes have been hybridised at reduced stringency to cosmid and YAC clones in the vicinity of the human RING4 gene, and weakly cross-hybridising fragments have been detected in some experiments (Dr. John Trowsdale, personal communication). It remains to be determined whether humans, like rodents, have a second, diverged RING4-like gene.

9.5. Function of the MHC gene cluster

The MHC probably provides an excellent model for a typical region of the mammalian genome, containing a high density of compact genes with a variety of different functions. Constitutively expressed CpG island-associated 'housekeeping' genes are interspersed with tissue-specific genes. Some genes are clearly related to one another by duplication events while others are unique. A number of the genes in the MHC region, such as those for steroid 21-hydroxylase and the α2 chain of type XI collagen, are functionally unrelated to the classical MHC genes. Despite this apparent diversity and complexity, it is interesting to note that some of the recently discovered genes in the MHC region may have functions related to those of the classical MHC genes in mediating the immune response. It is well established that the classical class I and class II gene products function in shaping the T-cell repertoire and in antigen presentation, while the complement gene products play a key effector role in the humoral immune response. There is now good evidence that the MHC also encodes the putative peptide transporter which translocates antigens from the cytoplasm into the ER (RING4). The MHC may also encode subunits of a proteolytic complex which could generate peptides from intracellular antigens (Parham, 1990). The hsp70 gene products may function in the binding and chaperoning of peptides in the cytoplasm and the ER (Flynn et al., 1989; Vanbuskirk et al., 1989). The products of the Tnf genes can modulate the expression of MHC molecules (Collins et al., 1986; Pujol-Borrell et al., 1987). There is also evidence that the class III region encodes a locus which regulates the sensitivity of cells to lysis by natural killer cells (Rembecki et al., 1988).

Another intriguing feature of these genes is that almost all are polymorphic, albeit to varying degrees. The highest level of polymorphism is observed in the products of the classical class I and class II genes (Parham et al., 1988; Marsh and Bodmer, 1989). The C4 gene products are also unusually polymorphic, though less so than those of the class I and class II genes (Carroll and Alper, 1987). Lower levels of polymorphism have been described for the C2 and Bf genes (Campbell, 1987), the rat HAM1 gene (homologous to RING4; Deverson et al., 1990), the TNFB gene (Messer et al., 1991), the hsp70 gene (Caplen et al., 1990), the NK-regulatory gene (Rembecki et al., 1988) and the MHC-encoded Lmp genes (Monaco and McDevitt, 1986). It is tempting to speculate that the products of specific alleles at these loci function more efficiently together, and that the physical clustering of these genes has been favoured by natural selection because it tends to keep advantageous combinations of alleles together at meiosis (Parham, 1990; Bodmer et al., 1986). Although in most cases it is not yet known whether the observed polymorphism is of functional significance, evidence supporting the hypothesis that certain alleles of MHC gene products interact favourably has been obtained from studies of antigen presentation in the rat. Naturally occurring allelic forms of the cim locus (which may encode the rat peptide transporter) in the class II region affect the specificity and efficiency of antigen presentation by class I molecules (Livingstone et al., 1989; Deverson et al., 1990). The idea that the clustering of these genes may have functional consequences attaches even greater significance to the study of the MHC.

10. References

Abe K, Wei J-F, Wei F-S, Hsu Y-C, Uehara H, Artzt K and Bennett D (1988). Searching for coding sequences in the mammalian genome: the H-2K region of the mouse MHC is replete with genes expressed in embryos. EMBO J. 7: 3441-3449.

Aldrich MS (1990). Narcolepsy. N. Engl. J. Med. 323: 389-394.

Amos DB, Gorer PA and Mikulska LB (1955). An analysis of an antigenic system in the mouse (the H-2 system). Proc. Roy. Soc. Lond. B 144: 369-380.

Andersson G, Larhammar D, Widmark E, Servenius B, Peterson PA and Rask L (1987). Class II genes of the human major histocompatibility complex. Organisation and evolutionary relationship of the DRβ genes. J. Biol. Chem. 262: 8748-8758.

Ando A, Kawai J, Masahiro M, Tsuji K, Trowsdale J and Inoko H (1989). Mapping and nucleotide sequence of a new HLA class II light chain gene, DQB3. Immunogenetics 30: 243-249.

Antequera F, Boyes J and Bird A (1990). High levels of de novo methylation and altered chromatin structure at CpG islands in cell lines. Cell 62: 503-514.

Artzt K, Abe K, Uehara H and Bennett D (1988). Intra-H-2 recombination in t haplotypes shows a hot spot and close linkage of tw5 to H-2K. Immunogenetics 28: 30-37.

Auffray C, Lillie JW, Arnot D, Grossberger D, Kappes D and Strominger JL (1984). Isotypic and allotypic variation of human class II histocompatibility antigens. Nature 308: 327-333.

Auffray C, Lillie JW, Korman AJ, Boss JM, Frechin N, Guillemot F, Cooper J, Mulligan RC and Strominger JL (1987). Structure and expression of HLA-DQα and -DXα genes: interallelic alternate splicing of the HLA-DQα gene and functional splicing of the HLA-DXα gene using a retroviral vector. Immunogenetics 26: 63-73.

Ausubel FM, Brent R, Kingston RE, Moore DD, Smith JA, Seidman JG and Struhl K (1987). Current protocols in molecular biology. Wiley Interscience.

Babbitt BP, Allen PM, Matsueda G, Haber E and Unanue ER (1985). Binding of immunogenic peptides to Ia histocompatibility molecules. Nature 317: 359-361.

Bach F and Hirschhorn K (1964). Lymphocyte interaction: a potential histocompatibility test in vitro. Science 143: 813-814.

Bakke O and Dobberstein B (1990). MHC class II-associated invariant chain contains a sorting signal for endosomal compartments. Cell 63: 707-716.

Banerji J, Sands J, Strominger JL and Spies T (1990). A gene pair from the human major histocompatibility complex encodes large proline-rich proteins with multiple repeated motifs and a single ubiquitin-like domain. Proc. Natl. Acad. Sci. USA 87: 2374-2378.

Barlow DP and Lehrach H (1987). Genetics by gel electrophoresis: the impact of pulsed field gel electrophoresis on mammalian genetics. Trends Genet. 3: 167-171.

Begovich AB, Bugawan TL, Nepom BS, Klitz W, Nepom GT and Erlich HA (1989). A specific HLA-DPβ allele is associated with pauciarticular juvenile rheumatoid arthritis but not adult rheumatoid arthritis. Proc. Natl. Acad. Sci. USA 86: 9489-9493.

Bell JI and Todd JA (1989). HLA class II sequences infer mechanisms for major histocompatibility complex-associated disease susceptibility. Mol. Biol. Med. 6: 43-53.

Benacerraf B (1981). Role of the MHC gene products in immune regulation. Science 212: 1229-1238.

Berg JM (1990). Zinc fingers and other metal-binding domains. J. Biol. Chem. 265: 6513-6516.

Berger R, Bernheim A, Sasportes M, Hauptmann G, Hors J, Legrand L and Fellous M (1979). Regional mapping of HLA on the short arm of chromosome 6. Clin. Genet. 15: 245-251.

Bernard M, Yoshioka H, Rodriguez E, Van der Rest M, Kimura T, Ninomiya Y, Olsen BR and Ramirez F (1988). Cloning and sequencing of pro-α1(XI) collagen cDNA demonstrates that type XI belongs to the fibrillar class of collagens and reveals that expression of the gene is not restricted to cartilaginous tissue. J. Biol. Chem. 263: 17159-17166.

Bird AP (1986). CpG-rich islands and the function of DNA methylation. Nature 321: 209-213.

Bird AP (1987). CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 3: 342-347.

Bird AP (1990). Two classes of observed frequency for rare-cutter sites in CpG islands. Nucleic Acids Res. 17: 9485.

Bird AP, Taggart M, Frommer M, Miller OJ and MacLeod D (1985). A fraction of the mouse genome that is derived from islands of non-methylated, CpG rich DNA. Cell 40: 91-99.

Biro PA, Cutbush S, Tiercy JM, Paulsen G, Demopulos J, Acton RT, Van Tonder SV, Semana G and Quillivic F (1989). RFLP standardisation report for DQ alpha/HindIII in Dupont B (ed.), Immunobiology of HLA, vol. I, p824, Springer-Verlag.

Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL and Wiley DC (1987a). Structure of the human class I histocompatibility antigen HLA-A2. Nature 329: 506-512.

Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL and Wiley DC (1987b). The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329: 512-518.

Blanck G and Strominger JL (1988). Molecular organization of the DQ subregion (DO-DX-DV-DQ) of the human MHC and its evolutionary implications. J. Immunol. 141: 1734-1737.

Blanck G and Strominger JL (1990). Cosmid clones in the HLA-DZ and -DP subregions. Human Immunol. 27: 265-268.

Blissard GW, Quant-Russell RL, Rohrmann GF and Beaudreau GS (1989). Nucleotide sequence, transcriptional mapping, and temporal expression of the gene encoding p39, a major structural protein of the multicapsid nuclear polyhedrosis virus of Orgyia pseudotsugata. Virology 168: 354-362.

Bodmer JG, Tonks S, Oza AM, Lister TA and Bodmer WF (1989a). HLA-DP based resistance to Hodgkin's disease. Lancet 1: 1455-1456.

Bodmer JG, Marsh SGE and Albert E (1990). Nomenclature for factors of the HLA system, 1989. Immunol. Today 11: 3-10.

Bodmer WF (1981). HLA structure and function: a contemporary view. Tissue Antigens 17: 9-20.

Bodmer WF, Trowsdale J, Young JAT and Bodmer J (1986). Gene clusters and the evolution of the major histocompatibility system. Phil. Trans. R. Soc. Lond. B 312: 303-315.

Bodmer WF, Albert E, Bodmer JG, Dupont B, Mach B, Mayr WR, Sasazuki T, Schreuder GMT, Svejgaard A and Terasaki PI (1989b). Nomenclature for factors of the HLA system 1987 in Dupont B (ed.) Immunobiology of HLA, vol I, pp72-82, Springer-Verlag.

Bohme J, Andersson M, Andersson G, Moller E, Peterson PA and Rask L (1985). HLA-DR β genes vary in number between different DR specificities, whereas the number of DQ β genes is constant. J. Immunol. 135: 2149-2155.

Boissier M-C, Chiocchia G, Ronziere M-C, Herbage D and Fournier C (1990). Arthritogenicity of minor cartilage collagens (types IX and XI) in mice. Arth. Rheum. 33: 1-8.

Bontrop RE, Carpenter CB, Walford R, Sekiguchi S, Bignon JD and Cohen D (1989). RFLP standardization report for DQ alpha/TaqI in Dupont B (ed.) Immunobiology of HLA, vol. I, pp815-816, Springer-Verlag.

Brown JH, Jardetzky T, Saper MA, Samraoui B, Bjorkman PJ and Wiley DC (1988). A hypothetical model of the foreign antigen binding site of class II histocompatibility molecules. Nature 332: 845-850.

Brown WRA and Bird AP (1986). Long-range restriction site mapping of mammalian genomic DNA. Nature 322: 477-481.

Buus S, Colon S, Smith C, Freed JH and Grey HM (1986). Interaction between a 'processed' ovalbumin peptide and Ia molecules. Proc. Natl. Acad. Sci. USA 83: 3968-3971.

Campbell RD (1987). Molecular genetics and polymorphism of C2 and factor B. Brit. Med. Bull. 43: 37-49.

Caplen NJ, Patel A, Millward A, Campbell RD, Ratanachaiyavong S, Wong FS and Demaine AG (1990). Complement C4 and heat shock protein 70 (HSP70) genotypes and type I diabetes mellitus. Immunogenetics 32: 427-430.

Carle GF and Olson MV (1984). Separation of chromosomal DNA molecules from yeast by orthogonal-field-alternation gel electrophoresis. Nucleic Acids Res. 12: 5647-5664.

Carra G and Accolla RS (1987). Structural analysis of human Ia antigens reveals the existence of a fourth molecular subset distinct from DP, DQ and DR molecules. J. Exp. Med. 165: 47-63.

Carroll MC and Alper CA (1987). Polymorphism and molecular genetics of human C4. Brit. Med. Bull. 43: 50-65.

Carroll MC, Campbell RD, Bentley DR and Porter RR (1984). A molecular map of the human major histocompatibility complex class III region linking complement genes C4, C2 and factor B. Nature 307: 237-241.

Carroll MC, Campbell RD and Porter RR (1985). Mapping of steroid 21-hydroxylase genes adjacent to complement component C4 in HLA, the major histocompatibility complex in man. Proc. Natl. Acad. Sci. USA 82: 521-525.

Carroll MC, Katzman P, Alicot EM, Koller BH, Geraghty DE, Orr HT, Strominger JL and Spies T (1987). Linkage map of the human major histocompatibility complex including the tumor necrosis factor gene. Proc. Natl. Acad. Sci. USA 84: 8535-8539.

Ceppellini R, Curtoni ES, Mattiuz PL, Miggiano V, Scudeller G and Serra A (1967). Genetics of leukocyte antigens: a family study of segregation and linkage in Curtoni ES, Mattiuz PL and Tosi RM (eds.) Histocompatibility testing 1967, pp149-185, Munksgaard.

Cerundolo V, Alexander J, Anderson K, Lamb C, Cresswell P, McMichael A, Gotch F and Townsend A (1990). Presentation of viral antigen controlled by a gene in the major histocompatibility complex. Nature 345: 449-452.

Chaplin DD (1985). Molecular organisation and in vitro expression of murine class III genes. Immunol. Rev. 87: 61-80.

Cheah KSE, Pui MS, Lui VCH, Wong L, Hsu LCS, Pun KK, Priestly LM and Sykes BC (1990a). An EcoRI RFLP in the human α2(XI) (COL11A2) gene. Nucleic Acids Res. 18: 387.

Cheah KSE, Lui VCH, Yu ECL and Hsu LCS (1990b). A StuI RFLP in the human COL11A2 gene. Nucleic Acids Res. 18: 4964.

Chimini G, Boretto J, Marguet D, Lanau F, Lauquin G and Pontarotti P (1990). Molecular analysis of the human MHC class I region using yeast artificial chromosomes. Immunogenetics 32: 419-426.

Christiansen OB, Riisom K, Lauritsen JG, Grunnet N and Jersild C (1989). Association of maternal HLA haplotypes with recurrent spontaneous abortions. Tissue Antigens 34: 190-199.

Collins T, Lapierre LA, Fiers W, Strominger JL and Pober JS (1986). Recombinant human tumor necrosis factor increases mRNA levels and surface expression of HLA-A,B antigens in vascular endothelial cells and dermal fibroblasts in vitro. Proc. Natl. Acad. Sci. USA 83: 446-450.

Cooper DN, Taggart MH and Bird AP (1983). Unmethylated domains in vertebrate DNA. Nucleic Acids Res. 11: 647-658.

Cox DR, Burmeister M, Price E, Kim S and Myers RM (1990). Radiation hybrid mapping: a somatic cell genetic method for constructing high resolution maps of mammalian chromosomes. Science 250: 245-250.

Cresswell P (1987). Regulation of HLA class I and class II antigen expression. Brit. Med. Bull. 43: 66-80.

Dausset J (1958). Iso-leuco-anticorps. Acta Haem. 20: 156-166.

Davis MM and Bjorkman PJ (1988). T-cell antigen receptor genes and T-cell recognition. Nature 334: 395-402.

Davison AJ and Scott JE (1986). The complete DNA sequence of the varicella-zoster virus. J. Gen. Virol. 67: 1759-1816.

DeMars R, Rudersdorf R, Chang C, Petersen J, Strandtmann J, Korn N, Sidwell B and Orr HT (1985). Mutations that impair a posttranscriptional step in the expression of HLA-A and -B antigens. Proc. Natl. Acad. Sci. USA 82: 8183-8187.

Denaro M, Gustafsson K, Larhammar D, Steinmetz M, Petersen PA and Rask L (1985). Mouse MHC class II gene Eβ2 is closely related to Eβ and to HLA-DR. Immunogenetics 21: 613-616.

Deverson EV, Gow IR, Coadwell WJ, Monaco JJ, Butcher GW and Howard JC (1990). MHC class II region encoding proteins related to the multidrug resistance family of transmembrane transporters. Nature 348: 738-741.

Doherty PC, Blanden RV and Zinkernagel RM (1976). Specificity of virus-immune effector T cells for H-2K or H-2D compatible interactions: implications for H-antigen diversity. Transplant. Rev. 29: 89-124.

Doolittle RF (1986). Of URFs and ORFs: a primer on how to analyze derived amino acid sequences. University Science Books.

Dressler GR and Gruss P (1988). Do multigene families regulate vertebrate development? Trends Genet. 4: 214-219.

Dunham I, Sargent CA, Trowsdale J and Campbell RD (1987). Molecular mapping of the human major histocompatibility complex by pulsed field gel electrophoresis. Proc. Natl. Acad. Sci. USA 84: 7237-7241.

Dunham I, Sargent CA, Dawkins RL and Campbell RD (1989). An analysis of variation in the long-range genomic organization of the human major histocompatibility complex class II region by pulsed field gel electrophoresis. Genomics 5: 787-796.

Dunham I, Sargent CA, Kendall E and Campbell RD (1990). Characterization of the class III region in different MHC haplotypes by pulsed field gel electrophoresis. Immunogenetics 32: 175-182.

Duyk GM, Kim S, Myers RM and Cox DR (1990). Exon trapping: a genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc. Natl. Acad. Sci. USA 87: 8995-8999.

Edwards CQ, Griffen LM, Dadone MM, Skolnick MH and Kushner JP (1986). Mapping of the locus for hereditary haemochromatosis: localisation between HLA-B and HLA-A. Am. J. Hum. Genet. 38: 805-811.

Elliott T, Townsend A and Cerundolo V (1990). Naturally processed peptides. Nature 348: 195-197.

Erlich H, Lee JS, Petersen JW, Bugawan T and DeMars R (1986). Molecular analysis of HLA class I and class II antigen loss mutants reveals a homozygous deletion of the DR, DQ, and part of the DP region: implications for class II gene order. Human Immunol. 16: 205-219.

Estivill X, Farrall M, Scambler PJ, Bell GM, Hawley KMF, Lench NJ, Bates GP, Kruyer HC, Frederick PA, Stanier P, Watson EK, Williamson R and Wainwright BJ (1987). A candidate for the cystic fibrosis locus isolated by selection for methylation free islands. Nature 326: 840-845.

Feinberg AP and Vogelstein B (1983). A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.

Feinberg AP and Vogelstein B (1984). Addendum: A technique for radiolabelling DNA restriction fragments to high specific activity. Anal. Biochem. 137: 266-267.

Felmlee T, Pellett S and Welch RA (1985). Nucleotide sequence of an Escherichia coli chromosomal hemolysin. J. Bacteriol. 163: 94-105.

Flavell RA, Allen H, Burkly LC, Sherman DH, Waneck GL and Widera G (1986). Molecular biology of the H-2 histocompatibility complex. Science 233: 437-443.

Flynn GC, Chappell TG and Rothman JE (1989). Peptide binding and release by proteins implicated as catalysts of protein assembly. Science 245: 385-390.

Fu SM, Kunkel HG, Brusman HP, Allen FH and Fotino M (1974). Evidence for linkage between HL-A histocompatibility genes and those involved in the synthesis of the second component of complement. J. Exp. Med. 140: 1108-1111.

Fugger L, Ryder LP, Morling N, Odum N, Friis J, Pedersen FK, Heilmann C, Sandberg-Wollheim M and Svejgaard A (1990). DNA typing for HLA-DPB1*02 and -DPB1*04 in multiple sclerosis and juvenile rheumatoid arthritis. Immunogenetics 32: 150-156.

Furuto DK and Miller EJ (1983). Different levels of glycosylation contribute to the heterogeneity of the α1(II) collagen chains derived from a transplantable rat chondrosarcoma. Arch. Biochem. Biophys. 226: 604-611.

Gardiner-Garden M and Frommer M (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261-282.

Garrett TPJ, Saper MA, Bjorkman PJ, Strominger JL and Wiley DC (1989). Specificity pockets for the side chains of peptide antigens in HLA-Aw68. Nature 342: 692-696.

Gaskins HR, Prochazka M, Nadeau JH, Henson VW and Leiter EH (1990). Localisation of a mouse heat shock Hsp70 gene within the H-2 complex. Immunogenetics 32: 286-289.

Geraghty DE, Wei X, Orr HT and Koller BH (1990). HLA-F: an expressed HLA gene composed of a class I coding sequence linked to a novel transcribed repetitive element. J. Exp. Med. 171: 1-18.

Gorer PA (1937). The genetic and antigenic basis of tumour transplantation. J. Pathol. Bacteriol. 44: 691-697.

Gorer PA, Lyman S and Snell GD (1948). Studies on the genetic and antigenic basis of tumour transplantation. Linkage between a histocompatibility gene and 'fused' in mice. Proc. Roy. Soc. Lond. B 135: 499-505.

Goodfellow PN, Banting G, Trowsdale J, Chambers S and Solomon E (1982). Introduction of a human X-6 translocation chromosome into a mouse teratocarcinoma: investigation of control of HLA-A,B,C expression. Proc. Natl. Acad. Sci. USA 79: 1190-1194.

Gustafsson K, Widmark E, Jonsson A-K, Servenius B, Sachs DH, Larhammar D, Rask L and Peterson PA (1987). Class II genes of the human major histocompatibility complex. Evolution of the DP region as deduced from the nucleotide sequences of the four genes. J. Biol. Chem. 262: 8778-8786.

Hanson IM, Gorman P, Lui VCH, Cheah KSE, Solomon E and Trowsdale J (1989). The human α2(XI) collagen gene (COL11A2) maps to the centromeric border of the major histocompatibility complex on chromosome 6. Genomics 5: 925-931.

Hard T, Kellenbach E, Boelens R, Maler BR, Dahlman K, Freedman LP, Carlstedt-Duke J, Yamamoto KR, Gustafsson J-A and Kaptein R (1990). Solution structure of the glucocorticoid receptor DNA-binding domain. Science 249: 157-160.

Hardy DA, Bell JI, Long EO, Lindsten T and McDevitt HO (1986). Mapping of the class II region of the human major histocompatibility complex by pulsed field gel electrophoresis. Nature 323: 453-455.

Hashimoto K, Nakanishi T and Kurosawa Y (1990). Isolation of carp genes encoding major histocompatibility complex antigens. Proc. Natl. Acad. Sci. USA 87: 6863-6867.

Hastie ND, Porteous DJ, Bickmore W, Maule J and van Heyningen V (1988). Molecular analysis of the aniridia-Wilm's tumor syndrome. Curr. Top. Microbiol. Immunol. 137: 41-46.

Haynes SR, Mozer BA, Bhatia-Dey N and Dawid IB (1989). The Drosophila fsh locus, a maternal effect homeotic gene, encodes apparent membrane proteins. Dev. Biol. 134: 246-257.

Henry I, Bernheim A, Bernard M, Van der Rest M, Kimura T, Jeanpierre C, Barichard F, Berger R, Olsen BR, Ramirez F and Junien C (1988). Mapping of the fibrillar collagen gene, pro-α1(XI) (COL11A1), to the p21 region of chromosome 1. Genomics 3: 87-90.

Hiles ID, Gallagher MP, Jamieson D and Higgins CF (1987). Molecular characterization of the oligopeptide permease of Salmonella typhimurium. J. Mol. Biol. 195: 125-142.

Hood L, Kronenberg M and Hunkapiller T (1985). T cell antigen receptors and the immunoglobulin supergene family. Cell 40: 225-229.

Huang D-H and Dawid IB (1990). The maternal-effect gene fsh is essential for the specification of the central region of the Drosophila embryo. The New Biologist 2: 163-170.

Hyde SC, Emsley P, Hartshorn MJ, Mimmack MM, Gileadi U, Pearce SR, Gallagher MP, Gill DR, Hubbard RE and Higgins CF (1990). Structural model of ATP-binding proteins associated with cystic fibrosis, multidrug resistance and bacterial transport. Nature 346: 362-365.

Inoko H and Trowsdale J (1987). Linkage of the TNF genes to the HLA-B locus. Nucleic Acids Res. 15: 8957-8962.

Inoko H, Tsuji K, Groves V and Trowsdale J (1989). Mapping of the HLA class II genes by pulsed field gel electrophoresis and size polymorphism in Dupont B (ed.) Immunobiology of HLA, vol II, pp83-36, Springer-Verlag.

Jonsson A-K, Hyldig-Nielsen J-J, Servenius B, Larhammar D, Andersson G, Jorgensen F, Peterson PA and Rask L (1987). Class II genes of the human major histocompatibility complex. Comparisons of the DQ and DX α and β chain genes. J. Biol. Chem. 262: 8767-8777.

Juranka PF, Zastawny RL and Ling V (1989). P-glycoprotein: multidrug-resistance and a superfamily of membrane associated transport proteins. FASEB J. 3: 2583-2592.

Kaufman JF, Auffray C, Korman AJ, Shackleford DA and Strominger J (1984). The class II molecules of the human and murine major histocompatibility complex. Cell 36: 1-13.

Kaufman J, Skjoedt K and Salomonsen J (1990). The MHC molecules of nonmammalian vertebrates. Immunol. Rev. 113: 83-118.

Kawai J, Ando A, Sato T, Nakatsuji T, Tsuji K and Inoko H (1989). Analysis of gene structure and antigen determinants of DR2 antigens using DR gene transfer into mouse L cells. J. Immunol. 142: 312-317.

Kelly AP (1988). M. Phil. Thesis, University of London.

Kelly AP and Trowsdale J (1985). Complete nucleotide sequence of a functional HLA-DPβ gene and the region between the DPβ1 and DPα1 genes: comparison of the 5' ends of HLA class II genes. Nucleic Acids Res. 13: 1607-1621.

Kendall E, Sargent CA and Campbell RD (1990). Human major histocompatibility complex contains a new cluster of genes between the HLA-D and complement C4 loci. Nucleic Acids Res. 18: 7251-7257.

Kimura T, Cheah KSE, Chan SD, Lui VCH, Mattei M-G, Van der Rest TM, Ono K, Solomon E, Ninomiya Y and Olsen BR (1989). The human α2(XI) collagen (COL11A2) chain. Molecular cloning of cDNA and genomic DNA reveal characteristics of a fibrillar collagen with differences in genomic organisation. J. Biol. Chem. 264: 13910-13916.

Kinzler KW and Vogelstein B (1989). Whole genome PCR: application to the identification of sequences bound by gene regulatory proteins. Nucleic Acids Res. 17: 3645-3653.

Klein J (1986). Natural history of the major histocompatibility complex. John Wiley and Sons.

Klein J and Takahara N (1990). The major histocompatibility complex and the quest for origins. Immunol. Rev. 113: 5-26.

Knowlton RG, Katzenstein PL, Moskowitz RW, Weaver EJ, Malemud CJ, Pathria MN, Jimenez SA and Prockop DJ (1990). Genetic linkage of a polymorphism in the type II procollagen gene (COL2A1) to primary osteoarthritis associated with mild chondrodysplasia. New Eng. J. Med. 322: 526-530.

Koller BH, Geraghty DE, Shimizu Y, DeMars R and Orr HT (1988). HLA-E: a novel HLA class I gene expressed in resting T-lymphocytes. J. Immunol. 141: 897-904.

Koller BH, Geraghty DE, DeMars R, Duvick L, Rich SR and Orr HT (1989). Chromosomal organization of the human major histocompatibility complex class I gene family. J. Exp. Med. 169: 469-480.

Korman AJ, Knudsen PJ, Kaufman JF and Strominger JL (1982a). cDNA clones for the heavy chain of HLA-DR antigens obtained after immunopurification of polysomes by monoclonal antibodies. Proc. Natl. Acad. Sci. USA 79: 1844-1848.

Korman AJ, Auffray C, Schamboeck A and Strominger JL (1982b). The amino acid sequence and gene origin of the heavy chain of the HLA-DR antigen: homology to immunoglobulins. Proc. Natl. Acad. Sci. USA 79: 6013-6017.

Kovats S, Main EK, Librach C, Stubblebine M, Fisher SJ and DeMars R (1990). A class I antigen, HLA-G, expressed in human trophoblasts. Science 248: 220-223.

Kozak M (1986). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44: 283-292.

Kroemer G, Bernot A, Behar A-M, Gastinel L-N, Guillemot F, Park I, Thoraval P, Zoorob R and Auffray C (1990). Molecular genetics of the chicken MHC: current status and evolutionary aspects. Immunol. Rev. 113: 119-146.

Kulaga H, Sogn JA, Weissman JD, Marche PN, LeGuern C, Long EO and Kindt TJ (1987). Expression patterns of MHC class II genes in rabbit tissues indicate close homology to the human counterparts. J. Immunol. 139: 587-592.

Lamm LU, Jorgensen F and Kissmeyer-Nielsen F (1976). Bf maps between the HLA-A and -D loci. Tissue Antigens 7: 122-124.

Larhammar D, Hammerling U, Rask L and Peterson PA (1985). Sequence of gene and cDNA encoding murine major histocompatibility complex class II gene Aβ2. J. Biol. Chem. 260: 14111-14119.

Lavia P, MacLeod D and Bird A (1987). Coincident start sites for divergent transcripts at a randomly selected CpG-rich island. EMBO J. 6: 2773-2779.

Law ML, Tung L, Morse HG, Berger R, Jones C, Cheah KSE and Solomon E (1986). The human type II collagen gene (COL2A1) assigned to 12q14.3. Ann. Hum. Genet. 50: 131-137.

Law ML, Chan SDH, Berger R, Jones C, Kau FT, Solomon E and Cheah KSE (1990). The gene for the α2 chain of the human fibrillar collagen type XI (COL11A2) assigned to the short arm of chromosome 6. Ann. Hum. Genet. 54: 23-29.

Lawrance SK and Smith CL (1990). Megabase-scale restriction fragment length polymorphisms in the human major histocompatibility complex. Genomics 8: 394-399.

Lee LW, Moomaw CR, Orth K, McGuire MJ, DeMartino GN and Slaughter CA (1990a). Relationships among the subunits of the high molecular weight proteinase, macropain (proteasome). Biochim. Biophys. Acta. 1037: 178-185.

Lee MS, Gippert GP, Soman KV, Case DA and Wright PE (1990b). Three-dimensional solution structure of a single zinc finger DNA-binding domain. Science 245: 635-637.

Levi-Strauss M, Carroll MC, Steinmetz M and Meo T (1988). A previously undetected MHC gene with an unusual periodic structure. Science 240: 201-204.

Lindsay S and Bird AP (1987). Use of restriction enzymes to detect potential gene sequences in mammalian DNA. Nature 327: 336-338.

Little CC and Tyzzer EE (1916). Further experimental studies on the inheritance of susceptibility to a transplantable tumour, carcinoma (JwA) of the Japanese waltzing mouse. J. Med. Res. 33: 393-427.

Livingstone AM, Powis SJ, Diamond AG, Butcher GW and Howard JC (1989). A trans-acting major histocompatibility complex-linked gene whose alleles determine gain and loss changes in the antigenic structure of a classical class I molecule. J. Exp. Med. 170: 777-795.

Londei M, Savill CM, Verhoef A, Brennan F, Leech ZA, Duance V, Maini RN and Feldmann M (1989). Persistence of collagen type II-specific T-cell clones in the synovial membrane of a patient with rheumatoid arthritis. Proc. Natl. Acad. Sci. USA 86: 636-640.

Long EO, Wake CT, Gorski J and Mach B (1983). Complete sequence of an HLA-DRβ chain deduced from a cDNA clone and identification of multiple non-allelic DRβ chains. EMBO J. 2: 389-394.

Lotz M and Vaughan JH (1988). Rheumatoid arthritis in Samter M (ed.) Immunological diseases, pp1365-1416, 4th edition, Little Brown and Co.

Love JM, Knight AM, McAleer MA and Todd JA (1990). Towards construction of a high resolution map of the mouse genome using PCR-analysed microsatellites. Nucleic Acids Res. 18: 4123-4128.

Marcadet A, Dupont B and Cohen D (1989). Organisation and design of the Southern blot component of the tenth histocompatibility workshop in Dupont B (ed.) Immunobiology of HLA, vol. I, pp560-566, Springer-Verlag.

Marrack P and Kappler J (1988). The T-cell repertoire for antigen and MHC. Immunol. Today 9: 308-315.

Marsh SGE and Bodmer JG (1989). HLA-DR and -DQ epitopes and monoclonal antibody specificity. Immunol. Today 10: 305-312.

McDevitt HO, Deak BD, Shreffler DC, Klein J, Stimpfling JH and Snell GD (1972). Genetic control of the immune response: mapping of the IR-1 locus. J. Exp. Med. 135: 1259-1278.

McGrath JP and Varshavsky A (1989). The yeast STE6 gene encodes a homologue of the mammalian multidrug resistance P-glycoprotein. Nature 340: 400-404.

McKeon C, Ohkubo H, Pastan I and de Crombrugghe B (1982). Unusual methylation pattern of the α2(I) collagen gene. Cell 29: 203-210.

Mendler M, Eich-Bender SG, Vaughan L, Winterhalter KH and Bruckner P (1989). Cartilage contains mixed fibrils of collagen types II, IX and XI. J. Cell Biol. 108: 191-197.

Meo T, Krasteff T and Shreffler DC (1975). Immunochemical characterization of murine H-2 controlled Ss (serum substance) protein through identification of its human homologues as the fourth component of complement. Proc. Natl. Acad. Sci. USA 72: 4536-4540.

Messer G, Spengler U, Jung MC, Honold G, Blomer K, Pape GR, Riethmuller G and Weiss EH (1991). Polymorphic structure of the tumour necrosis factor locus: an NcoI polymorphism in the first intron of the TNF-β gene correlates with a variant amino acid in position 26 and a reduced level of TNF-β production. J. Exp. Med. 173: 209-219.

Miller EJ and Gay S (1987). The collagens: an overview and update. Meth. Enzymol. 144: 3-40.

Milner CM and Campbell RD (1990). Structure and expression of three MHC-linked HSP70 genes. Immunogenetics 32: 242-251.

Monaco AP and Kunkel LM (1987). A giant locus for the Duchenne and Becker muscular dystrophy gene. Trends Genet. 3: 33-37.

Monaco AP, Neve RL, Colletti-Feener C, Bertelson CJ, Kurnit DM and Kunkel LM (1986). Isolation of candidate cDNAs for portions of the Duchenne muscular dystrophy gene. Nature 323: 646-650.

Monaco JJ and McDevitt HO (1982). Identification of a fourth class of proteins linked to the murine major histocompatibility complex. Proc. Natl. Acad. Sci. USA 79: 3001-3005.

Monaco JJ and McDevitt HO (1984). H-2-linked low-molecular weight polypeptide antigens assemble into an unusual macromolecular complex. Nature 309: 797-799.

Monaco JJ and McDevitt HO (1986). The LMP antigens: a stable MHC-controlled multisubunit protein complex. Human Immunol. 15: 416-426.

Monaco JJ, Cho S and Attaya M (1990). Transport protein genes in the murine MHC: possible implications for antigen processing. Science 250: 1723-1726.

Morel Y, Bristow J, Gitelman SE and Miller WL (1989). Transcript encoded on the opposite strand of the human 21-hydroxylase/complement component C4 locus. Proc. Natl. Acad. Sci. USA 86: 6582-6586.

Morris NP and Bachinger HP (1987). Type XI collagen is a heterotrimer with the composition (1α, 2α, 3α) retaining non-triple-helical domains. J. Biol. Chem. 262: 11345-11350.

Nagarajan L, Louie E, Tsujimoto Y, Ar-Rushdie A, Huebner K and Croce C (1986). Localisation of the human pim oncogene (PIM) to a region of chromosome 6 involved in translocations in acute leukaemias. Proc. Natl. Acad. Sci. USA 83: 2556-2560.

Neefjes JJ, Stollorz V, Peters PJ, Geuze HJ and Ploegh HL (1990). The biosynthetic pathway of MHC class II but not class I molecules intersects the endocytic route. Cell 61: 171-183.

Nepom GT (1990). MHC genes and HLA-associated diseases. Curr. Opin. Immunol. 2: 588-592.

Odum N, Morling N, Friis J, Heilmann C, Hyldig-Nielsen JJ, Jakobsen BK, Pedersen FK, Platz P, Ryder LP and Svejgaard A (1986). Increased frequency of HLA-DPw2 in pauciarticular onset juvenile rheumatoid arthritis. Tissue Antigens 28: 245-250.

Okada K, Prentice HL, Boss JM, Levy DJ, Kappes D, Spies T, Raghupathy R, Mengler RA, Auffray C and Strominger JL (1985a). SB subregion of the human major histocompatibility complex: gene organization, allelic polymorphism and expression in transformed cells. EMBO J. 3: 739-748.

Okada K, Boss JM, Spies T, Mengler R, Auffray C, Lillie J, Grossberger D and Strominger JL (1985b). Gene organisation of DC and DX subregions of the human major histocompatibility complex. Proc. Natl. Acad. Sci. USA 82: 3410-3414.

O'Neill GJ, Yang SY, Tegoli J, Berger R and Dupont B (1978). Chido and Rodgers blood groups are distinct antigenic components of human complement C4. Nature 273: 668-670.

Orkin S (1986). Reverse genetics and human disease. Cell 47: 845-850.

Orr HT and DeMars R (1983). Class I-like HLA genes map telomeric to the HLA-A2 locus in human cells. Nature 302: 534-536.

Owen MJ and Crumpton MJ (1987). The role of class I and class II antigens in T-cell recognition. Brit. Med. Bull. 43: 228-240.

Parham P, Lomen CE, Lawlor DA, Ways JP, Holmes N, Coppin HL, Salter RD, Wan AM and Ennis PD (1988). Nature of polymorphism in HLA-A, -B and -C molecules. Proc. Natl. Acad. Sci. USA 85: 4005-4009.

Parham P (1990). Transporters of delight. Nature 348: 674-675.

Patarca R, Schwartz J, Singh RP, Kong Q-T, Murphy E, Anderson Y, Sheng F-YW, Singh P, Johnson KA, Guarnagia SM, Durfee T, Blattner F and Cantor H (1988). Rpt-1, an intracellular protein from helper/inducer T cells that regulates gene expression of interleukin 2 receptor and human immunodeficiency virus type 1. Proc. Natl. Acad. Sci. USA 85: 2733-2737.

Pawelec G, Rehbein A, Schaudt K and Busch FW (1989). Frequencies of HLA-DP alleles in four major types of leukaemia. Tissue Antigens 34: 138-140.

Payne R and Rolfs MR (1958). Fetomaternal leukocyte incompatibility. J. Clin. Invest. 37: 1756-1763.

Payne R, Tripp M, Weigle J, Bodmer W and Bodmer J (1964). A new leukocyte isoantigen system in man. Cold Spring Harbor Symp. Quant. Biol. 29: 285-295.

Perry LJ, Rixon FJ, Everett RD, Frame MC and McGeoch DJ (1986). Characterization of the IE110 gene of herpes simplex virus type 1. J. Gen. Virol. 67: 2365-2380.

Pontarotti P, Chimini G, Nguyen C, Boretto J and Jordan BR (1988). CpG islands and HTF islands in the HLA class I region: investigation of the methylation status of class I genes leads to precise physical mapping of the HLA-B and -C genes. Nucleic Acids Res. 16: 6767-6778.

Poustka A and Lehrach H (1986). Jumping libraries and linking libraries: the next generation of molecular tools in mammalian genetics. Trends Genet. 2: 174-179.

Poustka A and Lehrach H (1988). Chromosome jumping: a long range cloning technique in Setlow JK (ed.) Genetic engineering, vol. 10, pp169-193, Plenum Press.

Priestly LM, Sykes BC and Cheah KSE (1990). A BamHI RFLP in the human α2(XI) (COL11A2) gene. Nucleic Acids Res. 18: 689.

Pujol-Borrell R, Todd I, Doshi M, Bottazzo GF, Sutton R, Gray D, Adolf GR and Feldmann M (1987). HLA class II induction in human islet cells by interferon-γ plus tumour necrosis factor or lymphotoxin. Nature 326: 304-306.

Ragoussis J, Bloemer K, Pohla H, Messer G, Weiss E and Ziegler A (1989). A physical map including a new class I gene (cda12) of the human major histocompatibility complex (A2/B13 haplotype) derived from a monosomy 6 mutant cell line. Genomics 4: 301-308.

Ragoussis J, Jones TA, Sheer D, Shrimpton AE, Goodfellow PN, Trowsdale J and Ziegler A (1991a). Isolation of probes specific to chromosomal region 6p21 from immunoselected irradiation-fusion-gene transfer hybrids. Genomics, in the press.

Ragoussis J, Monaco A, Mockridge I, Kendall E, Campbell RD and Trowsdale J (1991b). Cloning of the HLA class II region in YACs. Proc. Natl. Acad. Sci. USA, in the press.

Rao JKM and Argos P (1986). A conformational preference parameter to predict helices in integral membrane proteins. Biochim. Biophys. Acta. 869: 197-214.

Rappold GA, Stubbs L, Labeit S, Crkvenjakov RB and Lehrach H (1987). Identification of a testis-specific gene from the mouse t-complex. EMBO J. 6: 1975-1980.

Reid KBM (1988). The complement system in Hames BD and Glover DM (eds.) Molecular immunology, pp189-241, IRL Press.

Rembecki RM, Kumar V, David CS and Bennett M (1988). Bone marrow transplants involving intra-H-2 recombinant inbred mouse strains. J. Immunol. 141: 2253-2260.

Richardson WD, Roberts BL and Smith AE (1986). Nuclear location signals in polyoma virus large-T. Cell 44: 77-85.

Rivett AJ (1989). The multicatalytic proteinase of mammalian cells. Arch. Biochem. Biophys. 268: 1-18.

Rollini P, Mach B and Gorski J (1985). Linkage map of three HLA-DRβ chain genes: evidence for a recent duplication event. Proc. Natl. Acad. Sci. USA 82: 7197-7201.

Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N, Zsiga M, Buchwald M, Riordan JR, Tsui L-C and Collins FS (1989). Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245: 1059-1065.

Rosenberg WMC, Wordsworth BP, Jewell DP and Bell JI (1989). A locus telomeric to HLA-DPB encodes susceptibility to coeliac disease. Immunogenetics 30: 307-310.

St.-Jacques SB, Han T-H, MacMurray A and Shin H-S (1990). A putative transmembrane protein with histidine-rich charge clusters encoded in the H-2K/tw5 region of mice. Mol. Cell. Biol. 10: 138-145.

Sambrook J, Fritsch EF and Maniatis T (1989). Molecular cloning: a laboratory manual, 2nd edition, Cold Spring Harbor Laboratory Press.

Sanger F, Nicklen S and Coulson AR (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.

Sargent CA, Dunham I, Trowsdale J and Campbell RD (1989a). Human major histocompatibility complex contains genes for the major heat shock protein HSP70. Proc. Natl. Acad. Sci. USA 86: 1968-1972.

Sargent CA, Dunham I and Campbell RD (1989b). Identification of multiple HTF-associated genes in the human major histocompatibility complex class III region. EMBO J. 8: 2305-2312.

Schatz DG, Oettinger MA and Baltimore D (1989). The V(D)J recombination activating gene, RAG-1. Cell 59: 1035-1048.

Schlessinger D (1990). Yeast artificial chromosomes: tools for mapping and analysis of complex genomes. Trends Genet. 6: 248-258.

Schwartz RH (1989). Acquisition of immunologic self tolerance. Cell 57: 1073-1081.

Schwartz DC and Cantor CR (1984). Separation of yeast chromosome sized DNAs by pulsed field gradient gel electrophoresis. Cell 37: 67-75.

Seed B (1987). An LFA-3 cDNA encodes a phospholipid-linked membrane protein homologous to its receptor CD2. Nature 329: 840-842.

Sekiguchi T, Miyata T and Nishimoto T (1988). Molecular cloning of the cDNA of human X chromosomal gene (CCG1) which complements the temperature-sensitive G1 mutants tsBN462 and ts13 of the BHK cell line. EMBO J. 7: 1683-1687.

Servenius B, Gustafsson K, Widmark E, Emmoth E, Andersson G, Larhammar D, Rask L and Peterson P (1984). Class II genes of the human major histocompatibility complex. Molecular map of the human HLA-SB (HLA-DP) region and sequence of an SBα (DPα) pseudogene. EMBO J. 3: 3209-3214.

Servenius B, Rask L and Peterson PA (1987). Class II genes of the human major histocompatibility complex. The DOβ gene is a divergent member of the class II β gene family. J. Biol. Chem. 262: 8759-8766.

Shaw S, Johnson AH and Shearer GM (1980). Evidence for a new segregant series of B cell antigens that are encoded in the HLA-D region and that stimulate secondary allogeneic proliferative and cytotoxic responses. J. Exp. Med. 152: 565-580.

Shaw S, Kavathas P, Pollack MS, Charmot D and Mawas C (1981). Family studies define a new histocompatibility locus, SB, between HLA-DR and GLO. Nature 293: 745-747.

Sheets MD, Ogg SC and Wickens MP (1990). Point mutations in AAUAAA and the poly(A) addition site: effects on the accuracy and efficiency of cleavage and polyadenylation in vitro. Nucleic Acids Res. 18: 5799-5805.

Shreffler DC and David CS (1972). Studies on recombination within the mouse H-2 complex. Tissue Antigens 2: 232-240.

Shreffler DC and Owen RD (1963). A serologically detected variant in mouse serum: inheritance and association with the histocompatibility-2 locus. Genetics 48: 9-26.

Smith CL, Lawrance SK, Gillespie GA, Cantor CR, Weissman SM and Collins FS (1987). Strategies for mapping and cloning macroregions of mammalian genomes. Meth. Enzymol. 151: 461-489.

Spence MA, Spurr NK and Field LL (1989). Report of the committee on the genetic constitution of chromosome 6. Cytogenet. Cell Genet. 51: 149-165.

Spielman R, Lee J, Bodmer WF, Bodmer JG and Trowsdale J (1984). Six HLA-D region α-chain genes on human chromosome 6: polymorphisms and associations of DCα-related sequences in DR types. Proc. Natl. Acad. Sci. USA 81: 3461-3465.

Spies T, Sorrentino R, Boss JM, Okada K and Strominger JL (1985). Structural organisation of the DR subregion of the human major histocompatibility complex. Proc. Natl. Acad. Sci. USA 82: 5165-5169.

Spies T, Morton CC, Nedospasov SA, Fiers W, Pious D and Strominger JL (1986). Genes for the tumor necrosis factors α and β are linked to the human major histocompatibility complex. Proc. Natl. Acad. Sci. USA 83: 8699-8702.

Spies T, Blanck G, Bresnahan M, Sands J and Strominger JL (1989a). A new cluster of genes within the major histocompatibility complex. Science 243: 214-217.

Spies T, Bresnahan M and Strominger JL (1989b). Human major histocompatibility complex contains a minimum of 19 genes between the complement cluster and HLA-B. Proc. Natl. Acad. Sci. USA 86: 8955-8958.

Spies T, Bresnahan M, Bahram S, Arnold D, Blanck G, Mellins E, Pious D and DeMars R (1990). A gene in the human major histocompatibility complex class II region controlling the class I antigen presentation pathway. Nature 348: 744-747.

Sprent J (1978). Role of H-2 complex in induction of T helper cells in vivo. J. Exp. Med. 148: 478-489.

Spring B, Fonatsch C, Muller C, Pawelec G, Kompf J, Wernet P and Ziegler A (1985). Refinement of HLA gene mapping with induced B-cell line mutants. Immunogenetics 21: 277-291.

Squire J, Dryja TP, Dunn J, Goddard A, Hofmann T, Musarella M, Willard HF, Becker AJ, Gallie B and Phillips RA (1986). Cloning of the esterase D gene: a polymorphic gene probe closely linked to the retinoblastoma locus on chromosome 13. Proc. Natl. Acad. Sci. USA 83: 6573-6577.

Steinmetz M, Stephan D and Fischer-Lindahl K (1986). Gene organization and recombinational hotspots in the murine major histocompatibility complex. Cell 44: 895-904.

Stephens HAF, Vaughan RW, Sakkas LI, Welsh KI and Panayi GS (1989). Southern blot analysis of HLA-DP gene polymorphism in Caucasoid rheumatoid arthritis (RA) patients and controls. Immunogenetics 30: 149-155.

Stephens JC, Cavanaugh M, Gradie MI, Mador ML and Kidd KK (1990). Mapping the human genome: current status. Science 250: 237-244.

Strominger JL (1987). Structure of class I and class II HLA antigens. Brit. Med. Bull. 43: 81-93.

Stuart JM, Townes AS and Kang AH (1984). Collagen autoimmune arthritis. Ann. Rev. Immunol. 2: 199-218.

Svejgaard A, Nielsen LS, Ryder LP, Kissmeyer-Nielsen F, Sandberg L, Lindholm A and Thorsby E (1973). Subdivision of HL-A antigens. Evidence of a 'new' segregant series in Dausset J and Colombani J (eds.) Histocompatibility testing 1972, pp465-473, Munksgaard.

Sykes B (1989). Inherited collagen disorders. Mol. Biol. Med. 6: 19-26.

Takahashi M and Cooper GM (1987). ret transforming gene encodes a fusion protein homologous to tyrosine kinases. Mol. Cell Biol. 7: 1378-1385.

Takahashi M, Inaguma Y, Hiai H and Hirose F (1988). Developmentally regulated expression of a human "finger"-containing gene encoded by the 5' half of the ret transforming gene. Mol. Cell. Biol. 8: 1853-1856.

Teyton L, O'Sullivan D, Dickson PW, Lotteau V, Sette A, Fink P and Peterson PA (1990). Invariant chain distinguishes between the exogenous and endogenous antigen presentation pathways. Nature 348: 39-44.

Thiem SM and Miller LK (1989). A baculovirus gene with a novel transcription pattern encodes a polypeptide with a zinc finger and a leucine zipper. J. Virol. 63: 4489-4497.

Tiwari JL and Terasaki PI (1985). HLA and disease associations. Springer-Verlag.

Todd JA, Acha-Orbea H, Bell JI, Chao N, Fronek Z, Jacob CO, McDermott M, Sinha AA, Timmerman L, Steinman L and McDevitt HO (1988). A molecular basis for class II-associated autoimmunity. Science 241: 1003-1009.

Tonnelle C, DeMars R and Long EO (1985). DOβ: a new β chain in HLA-D with a distinct regulation of expression. EMBO J. 4: 2839-2847.

Tosi R, Nobuyuki T, Centis D, Battista G and Pressman D (1978). Immunological recognition of human Ia molecules. J. Exp. Med. 148: 1592-1611.

Townsend A and Bodmer H (1989). Antigen recognition by class I-restricted T lymphocytes. Ann. Rev. Immunol. 7: 601-624.

Townsend ARM, Rothbard J, Gotch FM, Bahadur G, Wraith D and McMichael AJ (1986). The epitopes of influenza nucleoprotein recognised by cytotoxic T lymphocytes can be defined with short synthetic peptides. Cell 44: 959-968.

Townsend A, Ohlen C, Bastin J, Ljunggren H-G, Foster L and Karre K (1989). Association of class I major histocompatibility heavy and light chains induced by viral peptides. Nature 340: 443-448.

Trowsdale J and Kelly A (1985). The human HLA class II α chain gene DZα is distinct from genes in the DP, DQ and DR subregions. EMBO J. 4: 2231-2237.

Trowsdale J, Kelly A, Lee J, Carson S, Austin P and Travers P (1984). Linkage map of two HLA-SBβ and two HLA-SBα-related genes: an intron in one of the SBβ genes contains a processed pseudogene. Cell 38: 241-249.

Trowsdale J, Young JAT, Kelly AP, Austin PJ, Carson S, Meunier H, So A, Erlich H, Spielman RS, Bodmer J and Bodmer WF (1985). Structure, sequence and polymorphism in the HLA-D region. Immunol. Rev. 85: 5-43.

Trowsdale J, Groves V and Arnason A (1989). Limited MHC polymorphism in whales. Immunogenetics 29: 19-24.

Trowsdale J, Hanson I, Mockridge I, Beck S, Townsend A and Kelly A (1990). Sequences encoded in the class II region of the MHC related to the 'ABC' superfamily of transporters. Nature 348: 740-744.

Tsuge I, Shen F-W, Steinmetz M and Boyse EA (1987). A gene in the H-2S:H-2D interval of the mouse major histocompatibility complex which is transcribed in B cells and macrophages. Immunogenetics 26: 378-380.

Uehara H, Abe K, Park C-HT, Shin H-S, Bennett D and Artzt K (1987). The molecular organisation of the H-2K region of two t-haplotypes: implications for the evolution of genetic diversity. EMBO J. 6: 83-90.

Vanbuskirk A, Crump BL, Margoliash E and Pierce SK (1989). A peptide binding protein having a role in antigen presentation is a member of the hsp70 heat shock family. J. Exp. Med. 170: 1799-1809.

van Rood JJ and van Leeuwen A (1963). Leukocyte grouping: a method and its application. J. Clin. Invest. 42: 1382-1390.

van Rood JJ, Eernisse JG and van Leeuwen A (1958). Leukocyte antibodies in sera from pregnant women. Nature 191: 1735-1736.

Vuorio E and de Crombrugghe B (1990). The family of collagen genes. Ann. Rev. Biochem. 59: 837-872.

Weiss EH, Golden L, Fahrner K, Mellor AM, Devlin JJ, Bullman H, Tiddens H, Bud H and Flavell RA (1984). Organization and evolution of the class I gene family in the major histocompatibility complex of the C57BL/10 mouse. Nature 310: 650-655.

Weitkamp L and Lamm LU (1982). Report of the committee on the genetic constitution of chromosome 6. Cytogenet. Cell Genet. 32: 130-143.

White PC, Grossberger D, Onufer BJ, Chaplin DD, New MI, Dupont B and Strominger JL (1985). Two genes encoding steroid 21-hydroxylase are located near the genes encoding the fourth component of complement in man. Proc. Natl. Acad. Sci. USA 82: 1089-1093.

Widera G and Flavell RA (1985). The I region of the C57BL/10 mouse: characterisation and physical linkage to H-2K of an SBβ-like class II pseudogene, ψAβ3. Proc. Natl. Acad. Sci. USA 82: 5500-5505.

Wroblewski JM, Kaminsky SG, Milisauskas VK, Pittman AM, Chaplin DD, Spies T and Nakamura I (1990). The B144-H-2D interval and the location of a mouse homologue of the human D6S81E locus. Immunogenetics 32: 200-204.

Yewdell JW and Bennink JR (1990). The binary logic of antigen processing and presentation to T cells. Cell 62: 203-206.

Young JAT and Trowsdale J (1990). The HLA-DNA (DZA) gene is correctly expressed as a 1.1kb mature mRNA transcript. Immunogenetics 31: 386-388.

Yunis EJ and Amos DB (1971). Three closely linked genetic systems relevant to transplantation. Proc. Natl. Acad. Sci. USA 68: 3031-3035.

Zinkernagel RM and Doherty PC (1975). H-2 compatibility required for T-cell mediated lysis of target cells infected with lymphocytic choriomeningitis virus. J. Exp. Med. 141: 1427-1437.

Appendices

A. Cosmid clones

A.1. Cosmid vector

The cosmid library was constructed using genomic DNA from the human T-cell line HPB.ALL in the cosmid vector cos202 (Figure A.1.). The genomic DNA was partially cleaved with Sau3A and ligated into the BglII cloning site of the vector.

A.2. Cosmid clones

The orientation of the genomic inserts of the cosmid clones isolated in this study is summarised in Table A.1., which shows which end of the vector (the 2.2kb EcoRI fragment or the 0.2kb EcoRI fragment, Figure A.1.) is adjacent to the proximal (centromeric) end of the genomic insert.

COSMID        FIGURE    PROXIMAL END
HPB.ALL 1     5.2       0.2kb
HPB.ALL 8     5.2       0.2kb
HPB.ALL 25    5.2       2.2kb
HPB.ALL 31    5.2       2.2kb
HPB.ALL 33    5.2       2.2kb
HPB.ALL 42    5.2       2.2kb
HPB.ALL 71    5.6       2.2kb
HPB.ALL 51    7.4       2.2kb

Table A.1. Orientation of cosmid clones isolated in this study. The second column shows the Figure in which a map of the genomic insert of the cosmid is presented. The third column shows which end of the vector (Figure A.1.) is adjacent to the proximal end of the genomic insert.

[Figure A.1. Map of the cosmid vector cos202, showing the BglII cloning site and the 2.2kb and 0.2kb EcoRI fragments at the two ends of the vector.]

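[Figure: map of CDM8 marking the CDM8-F and CDM8-B primer sites and the restriction sites flanking the cloning site (XhoI, XbaI, HindIII and BstXI on one side; XhoI, XbaI, PstI, NotI and BstXI on the other). The CDM8 vector carries supF; origins of replication for M13, Py, πVX and SV40; CMV and T7 promoters; and a splice site and poly(A) tail.]

Figure B.1. Restriction map of the cDNA vector CDM8.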

B. cDNA clones

B.1. cDNA vector

All the cDNA clones described in this study were isolated from libraries constructed according to the method of Seed (1987) in the plasmid vector CDM8, or derivatives thereof. The structure of CDM8 is shown in Figure B.1. For cDNA cloning, the 4.8kb vector is cleaved with BstXI to remove the 400bp stuffer fragment and then ligated to cDNA which has been tailed at each end with complementary linkers. The BstXI sites are destroyed in this process, but the cDNA insert can be excised using other enzymes with sites on either side of the cloning site (Figure B.1.).

Sequencing of the cDNA clones isolated in this study was initiated using two primers, CDM8-FORWARD (CDM8-F) and CDM8-BACKWARD (CDM8-B), which are complementary to the vector sequence on either side of the cloning site as shown in Figure B.1. The sequences of these two primers are:

CDM8-F (ICRF No. 10299): 5'-CCCAAGCTTCTAGAGATCCCT-3'
CDM8-B (ICRF No. 5841): 5'-AGGCGCAGAACTGGTAG-3'

B.2. cDNA clones

The orientation of each cDNA insert within the vector is summarised in Table B.1., which shows which of the two primers, CDM8-F or CDM8-B, gives the sequence at the 5' end of a particular insert.

GENE     cDNA     5' PRIMER
RING1    CEM15    ND
RING2    CEM21    CDM8-B
RING3    CEM32    CDM8-B
RING3    CEM35    CDM8-F
RING3    CEM41    CDM8-B
RING3    CEM44    CDM8-B
RING4    2.1      ND
RING5    yU5      CDM8-F

Table B.1. Orientation of cDNA clones isolated in this study. The third column shows which primer, CDM8-F or CDM8-B, gives the sequence at the 5' end of the cDNA insert. ND, not determined: both CEM15 and 2.1 were subcloned into other vectors before sequencing.
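Table B.1 can be used to put a raw sequencing read into mRNA-sense orientation. A read primed from the clone's 5' primer begins at the 5' end of the cDNA and already runs in the sense direction, whereas a read primed from the opposite primer starts at the 3' end of the insert on the antisense strand and must be reverse-complemented. The Python sketch below illustrates this reasoning only; the function names `reverse_complement` and `orient_read`, the `FIVE_PRIME_PRIMER` dictionary and the example read are hypothetical and are not part of the original analysis. Clones marked ND are deliberately left out.

```python
# Hypothetical helper (illustration only): orient a sequencing read from a
# CDM8 cDNA clone so that it reads 5' to 3' along the mRNA-sense strand.

# 5' primer for each clone, transcribed from Table B.1 (ND entries omitted).
FIVE_PRIME_PRIMER = {
    "CEM21": "CDM8-B",
    "CEM32": "CDM8-B",
    "CEM35": "CDM8-F",
    "CEM41": "CDM8-B",
    "CEM44": "CDM8-B",
    "yU5": "CDM8-F",
}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(clone: str, primer: str, read: str) -> str:
    """Return `read` in sense orientation.

    A read from the clone's 5' primer is already sense; a read from the
    other vector primer is antisense and is reverse-complemented.
    """
    if primer not in ("CDM8-F", "CDM8-B"):
        raise ValueError(f"unknown primer: {primer}")
    if clone not in FIVE_PRIME_PRIMER:
        raise KeyError(f"orientation not determined for clone {clone}")
    if primer == FIVE_PRIME_PRIMER[clone]:
        return read
    return reverse_complement(read)


if __name__ == "__main__":
    # Made-up read: CEM21's 5' primer is CDM8-B, so a CDM8-F read is flipped.
    print(orient_read("CEM21", "CDM8-F", "ATGGCC"))  # prints GGCCAT
```

Raising an error for CEM15 and 2.1, rather than guessing, mirrors the table: their orientation relative to the CDM8 primers was not determined because they were subcloned before sequencing.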

C. Gene symbols and database accession numbers

HGM nomenclature for the new genes RING1-5:

RING1    D6S111E
RING2    D6S112E
RING3    D6S113E
RING4    D6S114E
RING5    D6S115E

GenBank accession number for the nucleotide sequence of RING5 (shown in Figure 6.12.): M58660.
