Construction of an integrated map of the genomic locus 1q21 harboring the Human Epidermal Differentiation Complex as a platform for the identification of all genes in this complex, the study of their expression, regulation, function and evolution

by Andrew P. South

Thesis Submitted For The Degree of Doctor of Philosophy University of London 1999

Centre For Applied Molecular Biology
School of Pharmacy

ProQuest Number: 10104217

Published by ProQuest LLC(2016). Copyright of the Dissertation is held by the Author.

All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC.

Abstract

The human Epidermal Differentiation Complex (EDC) on 1q21 consists of three structurally different, yet functionally related, gene families. Members of all three gene families have been shown to play important roles in epidermal differentiation. This thesis initially describes the assembly of a completely contiguous set of overlapping bacterial clones covering 2.45Mb of human genomic DNA from the 1q21 locus, encompassing the entire EDC. All known genes (28) and eight DNA markers within the EDC are precisely localized to EcoRI restriction enzyme fragments that constitute a partial EcoRI and full NotI and SalI restriction enzyme map. The bacterial clones presented in this thesis have been accepted as the substrate for long range genomic sequencing of this region for the sequencing project at the Sanger Centre, UK. In addition to providing a template for large-scale sequencing, the bacterial clones presented here will serve as a molecular resource for the elucidation of all transcripts within the EDC as well as the study of transcriptional regulation, function and evolution of the EDC.

As an evaluation of this resource as a tool for the identification of transcribed sequences, exon trapping has been performed from three PAC clones. The exon trapping experiments described in this thesis have identified 13 putative exons that are shown to derive from the EDC. Searches of the publicly available databases with the sequences of these 13 putative exons have identified a novel cDNA clone that is shown to localize to the EDC. Two of the thirteen putative exons identified are homologous, but not identical, to this novel cDNA. These data, coupled with Northern blot and RT-PCR analysis, suggest that yet another novel family of transcribed sequences has been identified within the EDC.

Thirteen gene members of the S100 family of calcium binding proteins constitute one of the three so-far identified multi-gene families residing in the EDC. By using the contiguous bacterial clone map towards the study of EDC evolution, two findings have been made. Firstly, evidence of an ancestral break-point inversion during the evolution of mammals is supported by the elucidation of the transcriptional orientation of four S100 genes and by the identification of extensive alternative splicing of the 5’ untranslated region of one of these S100 genes. Secondly, a similar clustering of S100 genes to that seen in human and mouse is described for the first non-mammalian vertebrate species, Gallus gallus.

Acknowledgements

Many thanks are due to my supervisor, Dean Nizetic, for his time, support and patience during the writing of this thesis. Respect and thanks also due to the members of Dean’s laboratory, past and present - Jurgen Groet, Rachel Flomen, Pedro Baptista, and Jane Ives.

Appreciation to the 1q21 consortium, especially Ghazala Mirza and Jiannis Ragoussis for fluorescent in situ hybridizations, is given.

With much love and respect I thank my wife to be, Clare, for huge support and encouragement during the past years.

This work has been funded by the Constance Bequest Fund, The School of Pharmacy, and the BMH4-CT96-0319 grant from the Commission of European Communities to the European 1q21 consortium.

Table of Contents

Title
Abstract
Acknowledgements
Table of Contents
Figure finder
Table finder
Glossary of Abbreviations

Aims of This Thesis

Chapter 1: Introduction
1.1 Towards a complete nucleotide sequence of human chromosome 1
1.1.1 Human genome project strategy
1.1.2 Genetic mapping
1.1.3 Somatic cell hybrid lines and radiation hybrid mapping
1.1.4 Long-range physical restriction enzyme mapping
1.1.5 Cloning genomic DNA with vectors capable of propagation within a host organism
1.1.5.1 Yeast artificial chromosomes
1.1.5.2 Bacterial vector/host systems
1.1.5.2.1 Cosmids
1.1.5.2.2 P1 artificial chromosomes and bacterial artificial chromosomes
1.1.6 Sequencing
1.1.6.1 Clone end-sequencing approach
1.1.6.2 Whole genome “shotgun” approach
1.1.6.3 Clone mapping approach
1.1.7 Current status of publicly available genome sequencing
1.1.7.1 Global sequencing project
1.1.7.2 Chromosome 1 sequencing project
1.1.8 Global mapping and local mapping: parallel endeavors

1.2 Epidermal Differentiation
1.2.1 Epidermal Differentiation as a model for the study of cell differentiation in general
1.2.2 Biological markers of epidermal differentiation
1.2.2.1 The keratins
1.2.2.2 Profilaggrin
1.2.2.3 The Cornified Envelope (CE) - protein content
1.2.2.4 The Cornified Envelope (CE) - lipid content
1.2.3 Regulation of epidermal differentiation

1.3 The Epidermal Differentiation Complex
1.3.1 Gene families constituting the EDC
1.3.1.1 Loricrin, involucrin and the SPRR (CE precursor) gene family
1.3.1.2 Profilaggrin, trichohyalin, and repetin (intermediate filament association) gene family
1.3.1.3 S100 gene family
1.3.2 Gene complexes and multi-gene families
1.3.3 Locus control regions
1.3.4 Clustered multi-gene families generally reside in ‘gene-rich’ regions of the genome

1.4 Genetic disorders associated with human chromosomal region 1q21
1.4.1 Disorders of the skin associated with 1q21
1.4.2 Other disorders linked to 1q21
1.4.3 Cancer and 1q21

Chapter 2: Materials and Methods
2.1 Materials
2.1.1 Specialized materials
2.1.2 Chemicals
2.1.3 Enzymes
2.1.4 DNA oligonucleotide primers
2.1.5 Nucleotides
2.1.6 DNA cloning vectors used
2.1.7 DNA size markers
2.1.8 Culture media
2.1.9 Solutions
2.1.10 Sources of genomic DNA
2.1.11 Sources of RNA
2.1.12 Northern blot
2.1.13 Host strains used
2.1.14 Bacterial clone libraries
2.1.15 IMAGE consortium cDNA clones used
2.1.16 COS7 cell line
2.2 Methods
2.2.1 Filter spotting and processing
2.2.2 Restriction enzyme digestion
2.2.3 Agarose gel purification of DNA
2.2.4 Retrieving single clones from microtitre plates
2.2.5 Agarose plug minipreps of S.cerevisiae containing YAC clones
2.2.6 Pulsed Field Gel Electrophoresis
2.2.7 Southern blotting
2.2.8 Radio-labeled probe preparation
2.2.9 Suppression of probe sequence over-represented within the genome
2.2.10 Hybridization of membranes (filters) containing Southern blotted DNA or high/low density gridded colony DNA
2.2.11 Polymerase Chain Reaction (PCR) amplification
2.2.12 Preparation of plasmid DNA
2.2.13 Glycerol stock production
2.2.14 Preparation of probes from bacterial clones
2.2.15 Dot blot construction
2.2.16 Competent cell preparation
2.2.17 Transformation of competent cells
2.2.18 Tissue culture of COS7 cells
2.2.19 Exon Trapping
2.2.19.1 Preparation of pSPL3 vector
2.2.19.2 Preparation of bacterial clone insert DNA
2.2.19.3 Ligation and transformation
2.2.19.4 Electroporation
2.2.19.5 Reverse transcription and PCR
2.2.19.6 Sub-cloning secondary PCR products
2.2.19.7 Generation of 3ry PCR for library screening
2.2.20 Sequencing template preparation
2.2.21 Genomic DNA preparation
2.2.22 cDNA construction
2.2.23 Northern blot hybridization
2.2.24 Sub-cloning cosmid DNA fragments into pBluescript vector
2.2.25 Oligonucleotide DNA primer design and annealing temperature calculation

Chapter 3: Construction of overlapping segments of human DNA cloned in bacteria representing the 1q21 region, specifically the Epidermal Differentiation Complex
3.1 Summary
3.2 Introduction
3.2.1 Advantages of bacterial clone physical mapping
3.2.2 Mapping status of 1q21
3.3 Production of a sub-library of recombinant bacterial clones enriched for 1q21 originating DNA
3.3.1 Starting material
3.3.2 YAC isolation
3.3.3 Verifying the YAC STS/EST content
3.3.4 Screening the cosmid and PAC libraries
3.3.5 Construction of tools for hybridization and PCR based screening of the 1q21 sub-library
3.3.6 First round screens with DNA markers
3.3.7 Contig linking
3.4 Further screening of the whole genome and flow sorted chromosome 1 bacterial clone libraries
3.4.1 Further screening of the main PAC library
3.4.2 Screening BAC and cosmid library
3.4.3 Identifying the contents of the EDC within the contig
3.5 Restriction analysis and fine alignment of clones within the contig
3.5.1 Coverage (contig depth)
3.5.2 Localizing genes and markers
3.5.3 New ESTs and STSs from the region
3.6 Integrated 2.45 mega-base map of human chromosomal region 1q21 encompassing the entire EDC
3.6.1 Submission of the bacterial contig for large scale sequencing
3.6.2 Comparison of the integrated 2.45Mb bacterial map with previous mapping studies of 1q21
3.6.3 Precise ordering of genes and markers within the EDC

Chapter 4: Use of the integrated map as a platform for the identification of novel transcribed sequences within the Human Epidermal Differentiation Complex (via Exon Trapping)
4.1 Summary
4.2 Introduction
4.2.1 cDNA libraries and their screening
4.2.2 cDNA selection
4.2.3 Global cDNA sequencing and mapping
4.2.4 Approaches to identify transcribed sequences directly from genomic DNA without prior knowledge of the tissue of expression
4.2.4.1 Zoo blots
4.2.4.2 Linking libraries
4.2.4.3 Sequence annotation
4.2.4.4 Exon trapping
4.3 Generation of “exon trapped” libraries from selected EDC PAC clones
4.3.1 Starting material
4.3.2 Production of “exon trapped” libraries
4.4 Analysis of “exon trapped” libraries
4.4.1 Screening
4.4.2 Assessment of trapped exon sequence from S100 genes within the “exon trapped” libraries
4.4.3 Selecting clones to pick
4.4.4 PCR amplification
4.4.5 Analysis of PCR products
4.4.6 Sequencing PCR products from “exon trapped” clones
4.4.7 BLAST analysis
4.4.8 Open reading frame (ORF) analysis
4.4.9 Overall statistics relating to putative exons identified
4.5 Analysis of “exon trapped” clones showing homology to database entries
4.5.1 Clone 128L15bH15
4.5.2 Clone 127E12sC3 and related EST
4.5.2.1 Screening the “exon trapped” libraries with IMAGE clone 1676497
4.5.2.2 PCR with primers designed from the 3’ end of IMAGE clone 1676497
4.5.2.3 Direct sequencing of PAC clone 127E12 with primers derived from IMAGE clone 1676497 sequence
4.5.2.4 ORF analysis of IMAGE clone 1676497 and BLASTX 2.0.9 data
4.5.3 Clone 127E12sB4
4.5.4 Fine mapping of IMAGE clone 1676497 and “exon trapped” clones 127E12sC3 and 127E12sB4
4.5.5 Genomic mapping of IMAGE clone 1676497 and “exon trapped” clone 127E12sB4
4.6 Transcriptional analysis of putative exons and novel EST
4.6.1 RT-PCR data
4.6.2 Northern blot analysis

Chapter 5: Use of the integrated map to provide insight into the evolutionary events shaping the Epidermal Differentiation Complex
5.1 Summary
5.2 Introduction
5.2.1 Evolution of multi-gene families
5.2.1.1 Gene duplication
5.2.1.2 Diversification
5.2.1.3 Random genetic drift and natural selection
5.2.1.4 Molecular drive
5.2.1.5 Block duplication
5.2.2 Comparative genomics
5.2.2.1 Other vertebrate models - Fugu rubripes and Gallus gallus
5.2.3 Epidermal Differentiation Complex
5.2.3.1 Evolution
5.2.3.2 Mouse EDC
5.2.3.3 Transcriptional orientation within the EDC located S100 gene family
5.2.3.4 Chicken EDC genes
5.3 S100A1 and S100A13 transcript species and orientation
5.3.1 Library screening with S100A1 and S100A13
5.3.2 Sub-cloning and sequencing of the S100A13 positive EcoRI fragment from cosmid clone ICRF112cP0780
5.3.3 BLASTN 2.0.9 search of GenBank human EST database with genomic sequence derived from cosmid clone ICRF112cP0780
5.3.4 Identifying other S100A13 EST entries with novel 5’ end sequences
5.3.5 Extending the 2,668bp genomic sequence with PCR products amplified using novel S100A13 EST species-specific primers
5.3.6 S100A1 exon 1 identified with additional sequence derived from cosmid ICRFcP0780
5.3.7 Identification of additional S100A13 species of ESTs
5.3.8 RT-PCR analysis of the various identified S100A13 species of transcribed sequences
5.3.9 Sub-cloning and sequencing of an S100A13-positive Sau3AI fragment from the 10q23 localized cosmid clone ICRF112cL0496
5.4 Transcriptional orientation of other S100 genes within the EDC
5.4.1 S100A2 orientation determined from mapping data
5.4.2 S100A12 orientation determined from mapping data
5.5 S100 cross-species comparison: Gallus gallus
5.5.1 Published Gallus gallus S100 genes and human equivalent comparison
5.5.2 Identification of Gallus gallus S100A10
5.5.3 Chicken genomic cosmid library screening using Gallus gallus S100A6 and S100A10
5.5.4 Direct sequencing S100A6, S100A10 and comparison
5.5.5 Analyzing the presence of other S100 genes in chicken cosmid clones identified with chicken S100A6 and S100A10

Chapter 6: Discussion
6.1 Evaluation of the integrated map as a molecular resource
6.1.1 As a substrate for large scale sequencing
6.1.2 As a resource for the characterization of chromosome 1-specific repeats implicated in cancer
6.1.3 As a resource for transfection studies
6.1.4 As a substrate for evolutionary comparisons
6.2 Exon trapping as a method of identifying transcribed sequences within the EDC
6.2.1 Efficiency of exon trapping
6.3 Analysis of “exon trapped” products and a cDNA clone identified
6.3.1 Clone 128L15bH15
6.3.2 Clone 127E12sC3 and related cDNA
6.3.3 Clone 127E12sB4
6.3.4 Evidence for a novel gene family within the EDC?
6.3.5 Other “exon trapped” clones
6.4 Is 1q21 a gene rich region?
6.5 The role of evolutionary comparisons in understanding the EDC
6.5.1 Transcriptional orientation of the S100 genes as an indicator of the evolutionary processes defining the organization of this multi-gene family
6.5.2 Organization of the mouse S100 gene family located on a genomic region syntenic with the human EDC
6.5.3 Implications on the possible presence of locus control regions
6.5.4 The first identification of an S100 gene cluster within the chicken genome
6.5.5 The existence of a non-mammalian EDC?
6.6 Further work
6.6.1 Further characterization of (potential) transcripts identified by exon trapping
6.6.2 Systematic, optimized exon trapping and cDNA selection of bacterial clones presented in the integrated map
6.6.3 Analysis of long-range genomic sequence generated by the Sanger Centre
6.6.4 Further analysis of the chicken S100 cluster

Conclusions

References

Appendix I: Details of vectors used for restriction analysis
Appendix II: Table of EDC bacterial map confirmations
Appendix III: EDC novel sequences
Appendix IV: Publication

Figure finder

Chapter 1: Figures 1.1 to 1.2
Chapter 3: Figures 3.1 to 3.12
Chapter 4: Figures 4.1 to 4.9
Chapter 5: Figures 5.1 to 5.14
Chapter 6: Figures 6.1 to 6.2

Table finder

Chapter 1: Table 1.1
Chapter 2: Tables 2.1 to 2.2
Chapter 3: Tables 3.1 to 3.7
Chapter 4: Tables 4.1 to 4.4
Chapter 5: Tables 5.1 to 5.5

Glossary of Abbreviations

ATP (dATP, ddATP, rATP)  adenosine 5’-triphosphate (deoxy, dideoxy, ribo)

BAC  bacterial artificial chromosome
BIDS  Institute for Scientific Information Inc Science Citation Index Database
bp  base pairs
BSA  bovine serum albumin

cDNA  complementary DNA
CE  cornified envelope
CEPH  Centre d'Etude Polymorphisme Humaine
CGH  comparative genomic hybridization
Ci  curie
cM  centimorgans
cpm  counts per minute
CpG  5’ CG 3’ dinucleotide
CTP (dCTP, ddCTP, rCTP)  cytidine 5’-triphosphate (deoxy, dideoxy, ribo)

DNA  deoxyribonucleic acid
dNTPs  deoxyribonucleoside 5’-triphosphates
DTT  dithiothreitol

EDC  epidermal differentiation complex
EDTA  ethylenediaminetetraacetic acid
EGF  epidermal growth factor
EST  expressed sequence tag
ETOH  ethanol

FISH  fluorescent in situ hybridization

GAPDH  glyceraldehyde-3-phosphate dehydrogenase
GDB  Genome Database (http://www.hgmp.mrc.ac.uk/gdb/)
GTP (dGTP, ddGTP, rGTP)  guanosine 5’-triphosphate (deoxy, dideoxy, ribo)

HEPES  N-2-hydroxyethylpiperazine-N’-2-ethanesulphonic acid
HGMP  human genome mapping project
HGP  human genome project
HS  DNase I hypersensitive site

IMAGE  I.M.A.G.E. Consortium (integrated molecular analysis of genomes and their expression). Lennon et al, 1996

kb  kilobase pairs

LCR  locus control region

Mb  megabase pairs
MHC  major histocompatibility complex
M-MLV  Moloney murine leukaemia virus
MTP  minimal tiling path

NCBI  National Center for Biotechnology Information

OD  optical density
OMIM  OMIM (TM). Center for Medical Genetics, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), 1999. World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/

PAC  P1 artificial chromosome
PCR  polymerase chain reaction
PFGE  pulsed field gel electrophoresis
PPARα  peroxisome proliferator-activated receptor alpha

RNA  ribonucleic acid
RPLB  random primed labeling buffer
rpm  revolutions per minute
β-ME  β-mercaptoethanol

SC  Sanger Centre
SCE  sorbitol/sodium citrate/EDTA
SDS  sodium dodecyl sulphate
SPRR  small proline rich protein

SSC  saline sodium citrate
STC  sequence tagged connector
STS  sequence tagged site

TBE  tris-borate/EDTA
TE  tris/EDTA
TIGR  The Institute for Genomic Research
TG  transglutaminase
TGFα  transforming growth factor alpha
Tris  tris(hydroxymethyl)aminomethane
tRNA  transfer-RNA
TTP (dTTP, ddTTP)  thymidine 5’-triphosphate (deoxy, dideoxy)

rUTP  ribo-uridine 5’-triphosphate
U  units
UK  United Kingdom
USA  United States of America
UV  ultra-violet

v/v  volume/volume

WUGSC  Washington University Genome Sequencing Center
w/v  weight/volume

YAC  yeast artificial chromosome
YLS  yeast lysis solution

Aims of this thesis

The aims of this thesis are described generally by the title “Construction of an integrated map of the genomic locus 1q21 harboring the Human Epidermal Differentiation Complex as a platform for the identification of all genes in this complex, the study of their expression, function and evolution”. At the beginning of this thesis the three main areas of investigation were as follows:

1. Construction of an integrated map in ‘sequence-ready’ bacterial clones encompassing the entire Epidermal Differentiation Complex (EDC).

2. Identification of novel transcribed sequences within the EDC by using the bacterial clone map produced and the hypothesis that the EDC, like other complexes within vertebrate genomes, would reside in a ‘gene-rich’ region.

3. Contribution to the understanding of EDC evolution by use of the bacterial clone map produced. A species comparison of Homo sapiens and Gallus gallus was initiated. Gallus gallus was selected as a model organism on the basis of being a non-mammalian vertebrate that could potentially contain a ‘compressed’ EDC due to a smaller genome with a lower incidence of repetitive sequences.

Chapter 1: Introduction

1.1 Towards a complete nucleotide sequence of human chromosome 1

1.1.1 Human genome project strategy

The complete nucleotide sequence of the human genome will provide an invaluable reference point for the further understanding of human disease and biological function. The mammoth task of sequencing some 3000 million base pairs of DNA is the ultimate goal of the human genome project (HGP). The HGP was initiated in the USA in 1990 by two governmental bodies, the National Institutes of Health and the Department of Energy. The fifteen-year program (estimated to cost $3 billion) was soon joined by the UK, France, Germany and Japan as an internationally collaborative project. Towards the end of 1998 the projected deadline for the completion of the HGP was brought forward by two years, from 2005 to 2003 (Goodman, 1998).

The overall strategy of the HGP begins with the identification of reference points throughout the genome to facilitate mapping each of the 24 individual chromosomes. These reference points are specific stretches of sequence that can be positioned coherently to a genomic region. The specific stretch of sequence, or sequence tagged site (STS), can take the form of an anonymous sequence template for PCR, a cloned genomic fragment, or an identified expressed sequence - formerly known as expressed sequence tagged site or EST (Olsen et al, 1989; Adams et al, 1991). Coherent positioning of these STSs or ESTs can be achieved by a number of mapping techniques (discussed below). Identifying and mapping STSs and ESTs is a circular and spiraling process - identifying STSs and ESTs facilitates mapping, which in turn facilitates the identification of STSs and ESTs. Technological advances increase the ability to achieve both goals.

In the context of the HGP these reference points are used to anchor larger human DNA fragments that provide a sequencing substrate. These larger fragments derive from genomic DNA cloned into artificial vectors capable of propagating the genomic/vector DNA within a host organism, from which it can then be readily purified. Size, stability, and ultimately suitability for sequencing of the cloned genomic fragment are dependent on the vector/host system used.

1.1.2 Genetic mapping

By following the segregation of polymorphic STS alleles through families, or pedigrees, a genetic map is produced (Donis-Keller et al, 1987; Weissenbach et al, 1992). Segregation of STSs through generations can be followed due to the process of homologous recombination between pairs of chromosomes during meiosis - the closer the proximity of two STSs, the more likely their alleles will be co-inherited. The STS in this case has to be polymorphic in order to follow two variant forms (alleles) on a pair of chromosomes as they segregate. The Centre d'Etude Polymorphisme Humaine (CEPH) collected DNA samples from carefully selected large families of at least three generations. It is from this valuable resource (and others like it) that a great deal of genetic mapping information has been produced through collaborative projects (Murray et al, 1994).
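As a hedged illustration of the principle described above, the following Python sketch estimates the recombination fraction between two polymorphic markers from counts of recombinant and non-recombinant meioses, and converts it to a map distance. The function names, the example counts and the choice of Haldane's map function (which assumes no crossover interference) are assumptions made for illustration; they are not taken from the mapping studies cited.

```python
import math


def recombination_fraction(recombinants: int, non_recombinants: int) -> float:
    """Fraction of informative meioses in which the two markers were separated."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("no informative meioses scored")
    return recombinants / total


def haldane_cm(theta: float) -> float:
    """Haldane map distance in centimorgans (assumes no interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


# Hypothetical pedigree data: 8 recombinant and 92 non-recombinant meioses.
theta = recombination_fraction(8, 92)
print(f"theta = {theta:.2f}, distance = {haldane_cm(theta):.1f} cM")
```

Closely linked markers give a small recombination fraction and therefore a small map distance; as the fraction approaches 0.5 the markers behave as unlinked and the distance estimate becomes unbounded.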

1.1.3 Somatic cell hybrid lines and radiation hybrid mapping

Human chromosomes can be separated onto a rodent background by the fusion of two somatic cell lines under chemical conditions, often resulting in uninucleate cells composed of chromosomes from both parental lines. Human chromosomes are preferentially lost or deleted after rounds of culturing, enabling the establishment of cell lines containing few, single, or partial human chromosomes (Kao et al, 1976). By this means human chromosome specific markers can be generated or refined to chromosomal locations (McAlpine et al, 1989).

Fragmenting chromosomes with a high dose of x-rays and then recovering the broken fragments on a rodent background can increase the resolution of somatic cell hybrid mapping. Broken ends of chromosomes are rapidly joined after x-ray exposure, usually resulting in the translocation or insertion of human chromosomes into rodent chromosomes. Irradiated cells are recovered by fusion with non-irradiated cells, as high dosages of x-rays result in cell death. The further the distance between two markers on a chromosome, the greater the chance of separation by irradiation. Marker order and distances can therefore be determined by estimating the frequency of breakage and separation in a manner similar to genetic mapping (Cox et al, 1990). This method, termed radiation hybrid mapping, has facilitated accurate mapping of a large number of STS and EST markers genome wide (Walter et al, 1994; Schuler et al, 1996).
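The estimation of breakage frequency can be sketched as follows. This is a minimal two-point example, assuming that fragments are retained independently after a break, so that hybrids containing only one of the two markers arise only when a break has occurred between them. The function names and hybrid counts are hypothetical, and the estimator is written in the commonly quoted two-point form rather than reproduced from any analysis in this thesis.

```python
import math


def breakage_frequency(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Two-point estimate of the breakage frequency (theta) between markers A and B.

    theta = (A+B- + A-B+) / (T * (R_A + R_B - 2 * R_A * R_B)),
    where T is the number of hybrids scored and R_A, R_B are the
    retention fractions of each marker.
    """
    total = both + only_a + only_b + neither
    if total == 0:
        raise ValueError("no hybrids scored")
    r_a = (both + only_a) / total
    r_b = (both + only_b) / total
    expected_discordant = total * (r_a + r_b - 2.0 * r_a * r_b)
    if expected_discordant == 0:
        raise ValueError("markers retained in all or none of the hybrids")
    return (only_a + only_b) / expected_discordant


def centirays(theta: float) -> float:
    """Convert a breakage frequency below 1 into a distance in centiRays."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


# Hypothetical panel of 100 hybrids scored for two markers.
theta = breakage_frequency(both=30, only_a=5, only_b=5, neither=60)
print(f"theta = {theta:.3f}, distance = {centirays(theta):.1f} cR")
```

As with genetic distances, centiRay values are additive only approximately and depend on the radiation dose used, so distances from different panels are not directly comparable.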

1.1.4 Long-range physical restriction enzyme mapping

The ability to separate large multi-megabase fragments of DNA by pulsed field gel electrophoresis (PFGE) (Schwartz and Cantor, 1984) enables the construction of long-range physical maps by digesting DNA with restriction enzymes whose recognition sites are infrequently located along the length of human chromosomes (Barlow and Lehrach, 1987). The large DNA fragments produced by these ‘rare-cutting’ restriction enzymes are resolved by PFGE, transferred to a membrane (Southern, 1975), and recognized by DNA markers used as hybridization probes. This method has facilitated the construction of ‘restriction maps’ covering large genomic regions, an example being the complete NotI restriction enzyme map of the entire long arm of human chromosome 21 (Ichikawa et al, 1993).
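To make the idea of a rare-cutter digest concrete, the sketch below locates NotI recognition sites (GCGGCCGC, cut after the second base) in a linear sequence and reports the fragment lengths expected from a complete digest. The function names and the toy sequence are hypothetical and serve only as an illustration; real fragments resolved by PFGE are hundreds of kilobases long.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence
NOTI_CUT_OFFSET = 2     # NotI cuts between GC and GGCCGC on the top strand


def site_positions(sequence: str, site: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths, in order, from a complete digest of a linear molecule."""
    cuts = [p + cut_offset for p in site_positions(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy 51 bp sequence containing two NotI sites.
toy = "A" * 10 + NOTI_SITE + "T" * 20 + NOTI_SITE + "C" * 5
print(site_positions(toy, NOTI_SITE))
print(fragment_lengths(toy, NOTI_SITE, NOTI_CUT_OFFSET))
```

An eight-base site is expected once every 4**8 = 65,536 bp in random sequence, which is why enzymes with such sites generate the large fragments that long-range mapping relies on.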

1.1.5 Cloning genomic DNA with vectors capable of propagation within a host organism

Two main vector/host systems are used in direct physical mapping of genomes for sequencing - yeast artificial chromosomes (YACs) and bacterial vectors (λ-bacteriophage, cosmids, P1 artificial chromosomes and bacterial artificial chromosomes).

1.1.5.1 Yeast artificial chromosomes

Yeast artificial chromosomes were first described in 1987 (Burke et al, 1987) and are able to clone large genomic DNA segments of up to 2 Mb (Larin et al, 1991; Chumakov et al, 1992b). This feature facilitates the rapid coverage of large genomic regions by assembling randomly cloned fragments into overlapping, or contiguous, sets of YACs, anchored to a specific chromosome by means of STSs and ESTs. The maps produced from these sets of contiguous clones (or contigs) have been useful bridges between genetic maps and physical overlap maps. The first assembled clone maps of single chromosomes were achieved by this means (Chumakov et al, 1992; Foote et al, 1992; Nizetic et al, 1994; Collins et al, 1995). This strategy has been further extended to produce a 75% coverage of the human genome in 225 YAC contigs with an average size of 10Mb (Chumakov et al, 1995). Higher resolution of an individual YAC clone can be achieved by randomly fragmenting the insert DNA at a specific end, thus producing a number of smaller clone derivatives. By sizing and ordering these derivatives a greater resolution of marker to marker distance and order can be achieved within a single clone or an overlapping set of fragmented clones (Lewis et al, 1992; Lioumi et al, 1998).

The major drawback in using YACs for faithful representation of genomic DNA is that a large proportion of clones suffer structural instability of insert DNA produced by rearrangements, deletions and chimaerisms (Green et al, 1991). With no functional advantage conferred by the insert DNA present within a YAC clone, instability is not selected against. In fact a YAC clone containing a small insert will generally replicate at a faster rate than a YAC clone containing a larger insert; therefore any deletion early in the growth of a YAC clone colony or culture will predominate. Recombination between YACs can result in the deletion, rearrangement or joining of DNA from different origins (chimaerism), thus reducing the advantages of YAC mapping.

Additionally, even YAC clones with no rearrangements do not represent suitable substrates for sequencing, due to their large size and the relative difficulties faced in purifying cloned DNA from yeast genomic DNA. In some instances the sub-cloning of YAC DNA into smaller bacterial vector/host systems (which are more suitable for sequencing) has circumvented this problem. Although the bacterial sub-clones need to be ordered and the possibility of rearrangements can present a problem, this strategy has been employed with early human genomic sequencing such as within the major histocompatibility complex (MHC) on chromosome 6p (Shiina et al, 1999).
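The assembly of clones into contigs by shared marker content, as described in this section, can be sketched in a few lines. In this simplified illustration two clones are assumed to overlap whenever they share at least one STS or EST, and contigs are the resulting connected groups of clones. The clone and marker names are invented, and the function name is hypothetical.

```python
from itertools import combinations


def build_contigs(clone_markers: dict[str, set[str]]) -> list[list[str]]:
    """Group clones into contigs, treating any shared marker as evidence of overlap."""
    parent = {clone: clone for clone in clone_markers}

    def find(clone: str) -> str:
        # Follow parent links to the representative clone, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in combinations(clone_markers, 2):
        if clone_markers[a] & clone_markers[b]:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for clone in clone_markers:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(members) for members in groups.values())


# Invented marker content for four clones.
clones = {
    "yacA": {"sts1", "sts2"},
    "yacB": {"sts2", "sts3"},
    "yacC": {"sts3", "sts4"},
    "yacD": {"sts9"},
}
print(build_contigs(clones))
```

A chimaeric clone carrying markers from two unrelated regions would falsely join two contigs under this rule, which is one reason the instability described above undermines YAC-based maps.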

1.1.5.2 Bacterial vector/host systems

Bacterial vector/host systems present distinct advantages over YAC clones in that plasmid DNA can be more readily purified from host genomic DNA using an alkaline lysis method (Birnboim and Doly, 1979). The smaller size of the insert DNA (30-300kb) facilitates accurate restriction endonuclease mapping (Coulson et al, 1986) and provides a template for large scale sequencing (Sulston et al, 1992).

1.1.5.2.1 Cosmids

Cosmid based vectors were first described in 1978 (Collins and Hohn, 1978) and later modified (Hohn and Collins, 1980; Cross and Little, 1986) to clone up to 40kb of genomic DNA. The reduced size of the cloned insert DNA increases the resolution of a region represented by cosmid clones, but also increases the number of cosmids (and therefore the volume of work) needed to cover that genomic region when compared to YACs. In the case of a total genomic cosmid library, many more clones are needed to cover a similar genomic equivalent than would be needed by YAC clones. For example, a fivefold equivalent genome coverage in cosmid clones would be represented by close to 400,000 clones, whilst the number of YAC clones needed to provide equal coverage would be much smaller (if YACs averaged 800kb, 20 times the size of a single cosmid, then only 20,000 YAC clones, 20 times fewer, are needed to give similar genome coverage).

This increase in the number of clones needed to constitute a library can be reduced if the library is constructed from a specific chromosome of interest. This has a number of distinct advantages over a total genomic library. If the chromosomal location of a specific probe is known then, in some cases, the number of clones needed to be screened is reduced 10-30 fold. If the region of interest has been cloned in YACs and the need arises to increase the resolution of mapping by using bacterial clones (such as for sequencing), then chromosome specific libraries can be screened with the YAC clones. This not only reduces the number of clones to be screened, but also prevents any chimaeric YAC clones identifying a proportion of bacterial clones associated with chromosomes other than the one of interest. A number of chromosome specific cosmid libraries have been constructed and are available for screening (Deaven et al, 1986; de Jong et al, 1989; Nizetic et al, 1991 and 1994; Nizetic and Lehrach, 1995).

Obtaining chromosome specific material can be achieved by a process of ‘flow sorting’. Briefly, cells from a particular cell line of interest are chemically arrested in metaphase; the chromosomes are isolated and stained with one or two fluorescent dyes. The target chromosome is then identified and isolated based on its pattern of fluorescent emissions within a laser detection system. The availability of a cell line that gives an identifiable pattern of fluorescent emissions (or flow-karyotype profile) for the particular chromosome of interest is of paramount importance (Davies et al, 1981). The use of somatic cell hybrid lines containing human chromosomes is particularly useful, as rodent chromosomes are considerably smaller and more uniform in size than human chromosomes.

Although cosmid clones represent an important bacterial host/vector system that has contributed to high-resolution mapping and sequencing, a number of factors present difficulties when mapping large genomes (such as the human genome) with cosmid clones. The sheer size of the human genome means that the number of cosmids needed to provide ultimate coverage is large (400,000 for a five-fold equivalent). In addition, cosmid clones are occasionally susceptible to deletions and rearrangements within the insert DNA. This has been shown to be a result of the high plasmid copy number (up to 50) when using conventional cosmid vectors (Kim et al, 1992). With a large number of identical plasmid copies within a bacterial cell, recombination between plasmid DNA, as a result of repetitive elements (likely to be present within the 40kb insert; Smit, 1996), is thought to mediate rearrangement and deletion events.

1.1.5.2.2 P1 artificial chromosomes and bacterial artificial chromosomes For reasons described above, the development of stable vector systems able to clone intermediate sized DNA fragments represented a breakthrough in the HGP. P1 artificial chromosome and bacterial artificial chromosome vectors are capable of cloning up to 300kb of genomic DNA and maintaining this DNA in a stable, single copy fashion (Ioannou et al, 1994; Shizuya et al, 1992). The manageable size of these vectors means that the resolution of mapping achieved from cosmid clones is maintained with the added advantage of increased size (fewer clones are needed to represent a genomic region). It is on these vector/host systems that the majority of sequencing strategies are now based, including the general strategy of the HGP.

1.1.6 Sequencing

Sequencing of genomic DNA proceeds from small fragments (1.4-2.2kb) derived from bacterial clones (or in some cases total genomic DNA, see below) which are sub-cloned into bacteriophage M13 or plasmid vectors (Bankier et al, 1987) for sequencing by the chain termination method (Sanger et al, 1977). This sub-cloning is random and termed ‘shotgun’ cloning. Sequence reads from random shotgun clones are assembled into contigs on the basis of sequence overlap. These contigs represent an ‘unfinished’ stage where sequence gaps remain. Closing of sequence gaps (or ‘finishing’) is then facilitated by direct sequencing of the bacterial clone (another stage is needed before this can take place when shotgun cloning from total genomic DNA, discussed below), effectively linking the edges of the sequenced contigs (Sulston et al, 1992). This represents the basic strategy for genomic sequencing. Variations on this theme, as well as a more detailed look at the HGP strategy, are discussed below.
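The principle of assembling shotgun reads into contigs by sequence overlap can be illustrated with a toy example. The sketch below uses a simple greedy merge of exact suffix/prefix overlaps; this is an assumption made for illustration and is not the assembly software used by any sequencing centre. The reads are invented, and real assemblers must also handle sequencing errors, reverse complements and repeats.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences sharing the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # no remaining overlaps: what is left are separate contigs
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Hypothetical reads: three overlap, one shares no overlap with the others
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC", "TTTTGGGG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The three overlapping reads collapse into one contig while the fourth remains separate, mirroring the ‘unfinished’ stage in which gaps between contigs still have to be closed by directed sequencing.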

1.1.6.1 Clone end-sequencing approach It is recognized that constructing bacterial clone contigs spanning multi-megabase regions of genomic DNA is costly and time consuming. It has been argued that this is a necessary step for the HGP (see below) but it is also agreed that this is a limiting factor in sequence generation. For this reason, alternative approaches have been suggested and implemented. In 1996 Craig Venter and colleagues proposed a new strategy for genome sequencing (Venter et al, 1996). This strategy circumvented the need for constructing multi-megabase bacterial contigs and was based on randomly sequencing the end sections of bacterial clones. A library of 300,000 BAC clones was constructed, representing a 15-fold coverage of the human genome (total genomic DNA of an average size of 150kb randomly cloned into the BAC vector could represent the 3000Mb genomic DNA 15 times). The ends from all BAC clones (600,000 in total) are sequenced from isolated DNA. BAC clones are fingerprinted (digested with a restriction enzyme) to detect any artifactual clones (clones containing little or no insert DNA). 1% (3,000) of clones are selected for sequencing. These sequenced clones are checked against the database of sequenced ends. Identified BAC clones that overlap with the sequenced clone (end sequence detected in whole clone sequence) are compared by fingerprinting. Clones displaying a minimal overlap based on fingerprint analysis are themselves sequenced. It is proposed that the sequence of the entire genome will be generated from as few as 20,000 completely sequenced BAC clones. Although this project is underway, it has been argued that a proportion of the sequencing completed will be redundant (and therefore wasteful) (the Sanger Centre and the Washington University Genome Sequencing Centre, 1998). The random selection of 3,000 clones is unlikely to be equally spaced throughout the genome. However, recent data suggest this might not be the case (Mahairas et al, 1999). As of February 1999, 314,000 BAC end sequences (known as sequence-tagged connectors, STCs), constituting approximately 135Mb of sequence (4.5% of the human genome), had been produced by The Institute for Genomic Research (TIGR) in collaboration with two groups at the University of Washington. The projected number of STCs to be generated has increased from 600,000 to 900,000 (Venter et al, 1996; Mahairas et al, 1999). Analysis of the current 314,000 STCs gives the following findings. The average length of an STC is 432bp, of which an average of 281bp is non-repetitive sequence (the human genome contains a substantial amount of repetitive sequence, reviewed in Smit, 1996). Over 75,000 DNA fingerprints (representing 77% of the BAC clones used to generate STCs) have been produced. 5% of the fingerprinted clones are seen to contain artifacts. New repetitive elements within the genome have been identified from STC sequence. The estimated random distribution of STCs across the genome is 1 per 6kb; the mean distribution of 275 STCs across more than 1.8Mb of sequence gives a value of 1 per 6.6kb. Non-repetitive STC sequences are rarely identical, suggesting they are not clustered within one genomic region, while the mean GC content is 40.2%, very similar to that of the whole genome (Saccone et al, 1993).
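The coverage and STC totals quoted above can be checked with a few lines of arithmetic. The sketch below uses only figures given in the text (a 3000Mb genome, 300,000 BAC clones of 150kb average insert, and 314,000 STCs averaging 432bp); the variable names are illustrative.

```python
GENOME_MB = 3000          # genome size used in the text

# BAC library: 300,000 clones with an average insert of 150kb
bac_clones = 300_000
bac_insert_kb = 150
library_coverage = bac_clones * bac_insert_kb / 1000 / GENOME_MB  # fold coverage
end_sequences = 2 * bac_clones                                    # two ends per clone

# STCs produced as of February 1999
stc_count = 314_000
stc_mean_bp = 432
stc_total_mb = stc_count * stc_mean_bp / 1_000_000
stc_percent_of_genome = 100 * stc_total_mb / GENOME_MB

print(library_coverage, end_sequences)
print(round(stc_total_mb, 1), round(stc_percent_of_genome, 1))
```

These values reproduce the 15-fold library coverage, the 600,000 clone ends, and the approximately 135Mb (4.5% of the genome) of STC sequence reported in the text.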

1.1.6.2 Whole genome “shotgun” approach In 1997 an argument for changing the current sequencing strategy from clone based mapping to a whole genome shotgun approach was made (Weber and Myers, 1997). The proposed advantages of this strategy focused on speed, cost saving and the identification of genome polymorphisms by sequencing an increased coverage of genomic DNA obtained from a pool of geographically distinct human sources. However, in the same journal issue, a complete rejection of this strategy was made, citing convincing arguments against the supposed advantages (Green, 1997). With the announcement of a technological advance in sequencing equipment from PE Applied Biosystems, coupled with the data produced from the BAC end sequencing project, an actual strategy of whole genome shotgun sequencing was initiated (Venter et al, 1998). The technological advance made by PE Applied Biosystems was the launch of a new sequencing machine, the ABI 3700. The machine is reported to be capable of processing 1000 samples a day with minimal operating time (approximately 15 minutes, compared with 8 hours for the pre-existing equipment). The Institute for Genomic Research (TIGR) and Perkin-Elmer have initiated what they predict to be a 3-year program, costing between $200-250 million (compared to the $300 million of the HGP), which includes laboratory infrastructure costs and all computational equipment needed to analyze the data. The strategy (similar to that proposed by Weber and Myers) is based on fragmenting total genomic DNA into small (2-10kb) fragments and then cloning these fragments into a two vector system. Multiple cloning vectors are needed because some regions of the genome seem to be unclonable in any single vector. A large number of small fragment (2kb) clones (30,000,000 in total) and larger fragment (10kb) clones (5,000,000 in total) are produced with two separate vectors respectively, creating a 31-fold coverage of the genome. Initially, clones are not sequenced completely; only their ends are sequenced. By incorporating the BAC end sequencing data, a total of 46-fold coverage will be achieved. By this approach, it is estimated that 99.9% of the genome will be sequenced by 2003, although the authors accept that they cannot predict with confidence how many gaps, or how many unclonable or unsequenceable regions, will be encountered (Venter et al, 1998).

1.1.6.3 Clone mapping approach The Sanger Centre (SC) and the Washington University Genome Sequencing Centre (WUGSC) are following a strategy in line with the HGP (SC and WUGSC, 1998). A high density of STS markers (average of 15 per Mb) in the region of interest is used to screen libraries of bacterial clones. The identified bacterial clones are ordered into contigs by fingerprinting. Contigs are extended by generating new markers either from the ends of bacterial clones or from YAC clones bridging distances between the original markers used. Bacterial clones are selected for sequencing on the basis of redundancy (determined from fingerprinting), STS content, and chromosomal localization determined by fluorescent in situ hybridization (FISH). As of September 1998, these two groups have mapped approximately 890Mb coverage in bacterial clones and sequenced over 98Mb (3.3% of the genome). Projected finished sequence output is expected to rise to 150-200Mb per year. No marker tested has failed to identify a positive clone within the libraries being used. On chromosome 22q, the longest contig produced is estimated at 14.7Mb and any gaps between contigs are being actively pursued. This general approach is said to present two advantages over the end sequencing and whole genome shotgun approaches. Firstly, neither of these two approaches can be coordinated to minimize overlap of sequencing and secondly, neither utilizes current mapping information to order and verify clone fidelity (SC and WUGSC, 1998; Green, 1997).
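As a rough illustration of how marker content links clones into contigs, the sketch below groups clones that share at least one STS marker. This is a simplification made for illustration: the strategy described above orders clones by restriction fingerprinting, which is not modelled here, and the clone and marker names are hypothetical.

```python
def build_contigs(clone_markers):
    """Group clones into contigs: clones sharing at least one STS marker are linked."""
    parent = {clone: clone for clone in clone_markers}

    def find(clone):
        # Follow parent links to the representative clone of the group
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    first_clone_for = {}
    for clone, markers in clone_markers.items():
        for marker in markers:
            if marker in first_clone_for:
                # Marker already seen on another clone: join the two groups
                parent[find(clone)] = find(first_clone_for[marker])
            else:
                first_clone_for[marker] = clone

    groups = {}
    for clone in clone_markers:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical screening results: clone -> STS markers it was positive for
clones = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts3"},
    "cloneD": {"sts7", "sts8"},
    "cloneE": {"sts8"},
}
contigs = build_contigs(clones)
print(contigs)
```

In this toy data set the five clones fall into two contigs; extending a contig, as described above, corresponds to finding a new marker (for example from a clone end) that joins two such groups.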

1.1.7 Current status of publicly available genome sequencing

1.1.7.1 Global sequencing project As of May 1999, 420.2Mb (13.1%) of finished human genome sequence is available in the public domain, while 328.3Mb (10.2% of the human genome) are awaiting finishing. These statistics were obtained from the National Center for Biotechnology Information (NCBI) at: http://www.ncbi.nlm.nih.gov/genome/seq/pag.cgi?F=HsHome.html&ORG=Hs

1.1.7.2 Chromosome 1 sequencing project As part of the HGP initiative the Sanger Centre (Hinxton, UK) has undertaken sequencing of human chromosome 1. NCBI (May 1999) and Sanger Centre (September 1999) figures for chromosome 1 (quoting chromosomal size as 263Mb) describe 19.3Mb of sequence available (7.3% of total chromosome 1 sequence), with a further 15.7Mb of unfinished sequence available. Figures given by the Sanger Centre describe the total chromosome 1 finished sequence in 1997 as approximately 5Mb, which rose to approximately 15Mb in 1998. The projected date for the complete sequence of chromosome 1 is 2005. The report of the Fourth International Workshop on Human Chromosome 1 Mapping (Gregory et al, 1998) details statistics on chromosome 1 mapping progress. Integration of publicly available datasets showed that 4,024 PAC clones and 1,998 YAC clones had been localized to chromosome 1. Over 100Mb covered by PAC clones had been ‘fingerprinted’ and over 3000 STSs had been hybridized to bacterial clone libraries.

1.1.8 Global mapping and local mapping: parallel endeavors Global mapping initiatives have been implemented by large institutions capable of generating vast quantities of sequence on a weekly basis. These institutions are essential if the projected target of 2003 is to be reached. The overall strategy of the ‘global’ bacterial-clone based mapping approach (adopted by institutions within the HGP framework) has been based on the pioneering work of smaller ‘local’ mapping groups, studying specific regions of the genome that are of particular biological interest (such as the so-called ‘Down syndrome critical region’ on human chromosome 21, Osoegawa et al, 1996). Work being undertaken by ‘local’ mapping groups is actively encouraged by large mapping and sequencing institutions such as the Sanger Centre (Hinxton, UK), who readily incorporate collaborative data into their global programs (Gregory et al, 1998). One such region of biological interest is the human Epidermal Differentiation Complex, which covers some 2Mb of human chromosome 1q21. The complex is so termed as it harbors members of three distinct gene families that are expressed during epidermal differentiation. This thesis describes a ‘local’ mapping project that has covered the human Epidermal Differentiation Complex in bacterial clones. These clones have been integrated into the larger, chromosome 1 project at the Sanger Centre. An introduction to epidermal differentiation and the Epidermal Differentiation Complex follows.

1.2 Epidermal differentiation

The outer epidermis of an organism is the first line of defense between itself and the environment. The epidermis must shield the organism from desiccation and from physical, mechanical, chemical and microbial assault. For example, human infants born prior to 30 weeks gestation (normal gestation period is 40 weeks) lack a skin barrier and can suffer from fluid imbalance, heat loss and infection (Cartlidge and Rutter, 1992). In the mammalian epidermis, the sequential process of keratinization provides this shield function. Keratinization is effected by keratinocytes (the major epidermal cell constituent) during terminal differentiation as they migrate from the epidermal basal layer (point of origin) to the epidermal stratum corneum. Keratinocytes at the outermost layer (stratum corneum), having undergone terminal differentiation, become lifeless, flattened squames, and are eventually sloughed into the environment. During keratinization, epidermal cells follow a programmed regulation of gene activity, thus facilitating the necessary modifications to protein structure and expression. Many genes encoding structural, functional, and transcriptional proteins playing a part in terminal differentiation of the epidermis have been identified. A number of models have been presented to describe the involvement of the wide variety of proteins in keratinization. This introduction aims to present a brief overview of the current understanding of these processes.

1.2.1 Epidermal Differentiation as a model for the study of cell differentiation in general The mammalian epidermis presents an insightful model of differentiation. Cells follow a programmed life cycle beginning at a specific site of origin and migrating through spatially organized layers (directly relating to maturation) until they reach a specific site of termination. This cycle is continuous throughout mammalian adult life, so observations are not limited to early embryonic development, as they are for cell types of other tissues and organs within the body (the central nervous system, for example). Keratinocytes can be grown in culture, where they retain many of their differentiating characteristics (Leigh and Watt, 1994). Much of the current understanding of keratinization has come from investigating cultured keratinocytes under varying conditions. Keratinocytes are common targets for neoplastic transformations. The incidence of skin cancers throughout southern USA and Australia now exceeds that of all other cancers combined, and is still increasing (Brash, 1997). A consistent characteristic of malignant human keratinocytes in culture is aberrant terminal differentiation - it has been proposed that this might be a necessary step in the neoplastic transformation of these cells in vivo (Rheinwald and Becket, 1980; Yuspa and Morgan, 1981). Understanding how terminal differentiation is regulated will not only elucidate the transformation processes, but will also present valuable information regarding cell cycle and tumor progression in general.

1.2.2 Biological markers of epidermal differentiation The morphological state of the keratinocyte can be classified into four phenotypically distinct groups corresponding to four epidermal cellular layers. Figure 1.1 displays a schematic of these cellular layers and indicates the relative location of a number of proteins classically used as biological markers of epidermal differentiation (for reviews see Watt, 1989; Christiano, 1997; Eckert et al, 1997). The following paragraphs detail the history and current understanding of these markers.

1.2.2.1 The keratins The keratins are a class of intermediate filament proteins present as various species throughout the differentiating epidermis. Intermediate filament proteins are, in general, thought to be involved in the physical co-ordination of cell shape and nuclear centration as well as maintaining cell differentiation (Franke, 1987). In the epidermis, keratins are critical in maintaining cytoskeletal integrity and cell-cell interactions, and contribute to specific structures within a differentiating keratinocyte (Corden and McLean, 1996). Keratin intermediate filaments make up as much as 85% of the protein content of a fully differentiated keratinocyte (Fuchs and Cleveland, 1998). Keratins are said to emanate from perinuclear rings (free ribosomes), extend throughout the cell cytoplasm, and terminate at the cellular junctions, termed desmosomes and hemidesmosomes (Roop, 1995). Desmosomes and hemidesmosomes are the major cell surface attachment sites for intermediate filaments at cell to cell and cell to substrate junctions, respectively (reviewed by Green and Jones, 1996).

[Figure 1.1 image. Labels recovered from the figure, top to bottom: EXTERNAL ENVIRONMENT; Stratum Corneum; cornified envelope precursors, e.g. loricrin, SPRRs, involucrin; Granular Layer; involucrin, profilaggrin, keratin 1 + 10; Supra-basal Layer; keratin 1 + 10; Basal Layer; keratin 5 + 14; BASEMENT MEMBRANE.]

Figure 1.1 Schematic diagram of keratinocyte organization within the human epidermis, the four distinct morphological layers, and some of the classical biological markers found within these layers. Ellipsoid shapes depict keratinocytes within the four distinct morphological layers (indicated by large text to the right). Some of the classical biological markers of terminal differentiation are given with smaller text. Arrow depicts direction of migration as differentiating keratinocytes detach from the basement membrane, move through the four cellular layers and are eventually sloughed into the external environment.

Over 30 keratin genes that are differentially expressed in epithelial cells can be found within the human genome (Fuchs and Cleveland, 1998). Keratin protein sequences conform to the basic blueprint of all intermediate filament proteins - an approximately 310 amino acid central alpha-helical rod domain, flanked by end domains highly variable in sequence and structure (Blumenburg, 1993). Two sub-families exist: basic and acidic. One member of each family must be present for the formation of a keratin 'couple', aligned via the interaction of the two alpha-helical regions. Each member has a preferred partner; the K5 and K14 couple, for example, composes 20-25% of the mass of cells at the basal layer and is one of the molecular markers of this layer (Fuchs and Cleveland, 1998).

Comprising the basal layer and attached to the basement membrane are the basal cells, which contain a population of stem cells. These highly proliferative stem cells represent undifferentiated keratinocytes and give rise to the population of terminally differentiating cells. Morphological changes occur once a keratinocyte detaches from the basement membrane, primarily an increase in size and the production of the keratin K1/K10 marker couple. The density of the desmosomes (cellular junctions) also increases, with the overall resulting morphology said to be spinous, coining the term spinous layer, also referred to as the supra-basal layer. Most cells within the supra-basal layer have lost the ability to proliferate but are metabolically active. Other keratins are expressed selectively depending on tissue type. K9 is localized exclusively to the skin of palms and soles, while K6, K16, and K17 are expressed in the supra-basal layer of palms, soles, nails, hairs, and in oral and genital mucosal surfaces (Christiano, 1997; Ekanayake-Mudiyanselage et al, 1998). Above the supra-basal layer lies the granular layer, so termed due to the presence of membrane coating granules, L-granules and keratohyalin granules. Keratohyalin granules are rich in the protein filaggrin (Lee et al, 1993).

1.2.2.2 Profilaggrin Filaggrin is produced during the latter stages of epidermal differentiation from the large polyprotein precursor profilaggrin (McKinley-Grant et al., 1989). Profilaggrin consists of specific terminal domains flanking the block of tandemly repeated filaggrin units, which are individually separated by linker domains. Human profilaggrin contains a variable number of filaggrin units ranging from 10-12 and is highly heterogeneous, with 39% of the 324 amino acid positions variable (out of a sample of 11 cDNA clones; Gan et al., 1990). Evidence suggests that the expression of profilaggrin is suppressed at the transcriptional level until very late on in the granular layer, although the mechanism of suppression is unknown (McKinley-Grant et al., 1989). Profilaggrin accumulates in a non-functional, phosphorylated form within keratohyalin granules. Early in the final layer (stratum corneum), profilaggrin is dephosphorylated and proteolytically cleaved by the excision of the short peptide linker sequences to yield functional filaggrin molecules. Functional filaggrin molecules aggregate keratin intermediate filaments (KIF) into tightly aligned bundles called macrofibrils, which are packed into the flattened squames of the upper stratum corneum (Dale et al, 1994). The process of tightly aligning KIF into macrofibrils severely alters cell shape and facilitates cytoskeletal collapse within a single cell layer between the granular layer and the stratum corneum, indicating that this is a highly regulated event (Kuechle et al, 1999). Less is known about the terminal domains of profilaggrin and their role in epidermal differentiation. Most of the information available is focussed on the amino terminal domain, which can itself be divided into two sub-domains, A and B. The A domain is 81 amino acids in length, is hydrophobic, and contains two calcium binding EF-hand motifs which display homology to those of the S100 protein family (see section 1.3.1.3). The B domain consists of 212 amino acids and is highly hydrophilic. The amino acid sequences of domain A (predicted from cDNA clones) display greater than 90% identity among rat, mouse and human profilaggrin.
Homology between the filaggrin repeats of these three species is lower (42% to 60% identity), leading to the suggestion that domain A is of functional importance (Presland et al, 1995). It has been demonstrated that the two distinct calcium-binding sites at the amino terminus of human profilaggrin do indeed bind calcium and that the protein undergoes conformational changes upon removal of Ca2+. Presland and coworkers (1995) suggest that this functional calcium binding domain plays a role in profilaggrin processing and targeting into keratohyalin granules as well as other calcium dependent processes during terminal differentiation (see 1.2.3).

Distinct conformational changes occur as a keratinocyte migrates through the granular layer and into the stratum corneum. Additional proteins, lipids and cell surface carbohydrates are synthesized while proteins that are no longer needed, nucleic acids, and even whole organelles are successively destroyed. The final terminally differentiated keratinocyte is a flattened squame composed mainly of macrofibrils, enclosed in a highly insoluble protein structure termed the cornified envelope. It has been proposed that unused filaggrin molecules are degraded into mostly free amino acids which are used in maintenance of epidermal osmolarity and flexibility (Dale et al, 1994). However, it has also been demonstrated that filaggrin is incorporated into the cornified envelope (see below).

1.2.2.3 The Cornified Envelope (CE) - protein content The final process of a terminally differentiating keratinocyte is the deposition of a highly insoluble structure on the inside of the plasma membrane, termed the cornified envelope (CE). This structure replaces the plasma membrane in lifeless, flattened squames, and is the mammalian front line defensive shield (for review see Riechert et al., 1993; Hohl and Roop, 1993). The CE is an amalgam of proteins (90%) cross-linked by a membrane bound transglutaminase (TG) via covalent bonds into a rigid scaffold, with lipids (10%) covalently attached to its external surface (Swartzendruber et al., 1988). Three epidermal transglutaminases have been studied in detail, two of which are cytosolic, TGE and TGC, and one of which is membrane bound, TGK (Park et al., 1988). TGC is expressed in the basal layer and is thought to play a role in stabilization of the dermo-epidermal junction (Aeschlimann et al, 1998), rather than in CE formation (TGC is induced by retinoic acid, a potent inhibitor of CE formation; Thacher and Rice, 1985). The membrane bound TGK is certain to participate in envelope formation, while the possibility of TGE involvement has not been ruled out. TGE could play a role, as it is thought that TGK would be incorporated into the CE and not be able to cross-link outer layers (Thacher and Rice, 1985). Expression of TGE has been demonstrated in cultured keratinocytes but not in vivo (Aeschlimann et al, 1998). Recently, an additional TG has been identified, TGX. Early evidence suggests a role in CE formation based on expression increases in differentiating keratinocytes (Aeschlimann et al, 1998). Two distinct types of CE have been identified via Nomarski contrast microscopy: fragile and rigid. CEs recovered by tape stripping from the granular layer/stratum corneum interface show a considerable presence of fragile CEs, while tape stripping from the upper stratum corneum reveals only rigid CEs (Michel et al, 1988). This provided the first clues that CE formation is a process of sequential layering of components. More evidence for such a model comes from the identification of CE components. Elucidating the exact nature and composition of the CE has proven difficult due to the irreversible cross-linked nature of the structure. The isodipeptide bonds cannot be hydrolyzed to release intact proteins without the use of reagents that also cleave peptide bonds. A number of clues can point towards a protein being a constituent of the CE, such as its ability to serve as a transglutaminase substrate, reactivity to antibodies raised against the CE, mRNA abundance in the epidermal upper layers, or its ability to become cross-linked. Table 1.1 lists the majority of CE components cited in the literature. Others exist, particularly a number of unidentified components of known mass (Steinert and Marekov, 1995; Robinson et al, 1997). Two laboratories have studied direct CE components by treating purified CEs with a series of degrading reagents over periods of time (Steinert and Marekov, 1995 + 1997; Robinson et al, 1997).

Protein | Evidence for CE component | Reference
Involucrin | 1, 2, 5 | Thacher and Rice, 1985; Crish et al, 1993; Steinert and Marekov, 1997
Cystatin a | 1, 4, 5 | Takahashi et al, 1992, 1997 + 1998; Steinert and Marekov, 1997
SPRR proteins | 1, 3, 4, 5 | Hohl et al, 1993 + 1995; Steinert and Marekov, 1995 + 1997; Robinson et al, 1997
Loricrin | 1, 2, 4, 5 | Hohl et al, 1991; Yoneda and Steinert, 1993; Steinert and Marekov, 1995 + 1997; Robinson et al, 1997; Hardman et al, 1998
Trichohyalin | 2, 3, 4 | Lee et al, 1993; O’Keefe et al, 1993
Filaggrin | 1, 3, 4 | Markova et al, 1993; Steinert and Marekov, 1995
Keratin intermediate filaments | 1, 5 | Steinert and Marekov, 1995 + 1997
Elafin | 1, 2, 3, 4, 5 | Tezuka and Takahashi, 1987; Molhuizen et al, 1993; Nonomura et al, 1994; Steinert and Marekov, 1995 + 1997
Desmosomal components | 1, 5 | Steinert and Marekov, 1995 + 1997; Robinson et al, 1997
Annexin I | 5 | Robinson et al, 1997
Plasminogen activator inhibitor-2 | 5 | Robinson et al, 1997
S100A10 | 5 | Robinson et al, 1997
S100A11 | 1, 5 | Robinson et al, 1997; Robinson and Eckert, 1998

Table 1.1. Human cornified envelope protein precursors, evidence of such and origin of experimental data. Evidence: 1 - demonstrated cross-linking via Nε-(γ-glutamyl)lysine isodipeptide bonds. 2 - immunogold/immuno-fluorescent electron microscopy. 3 - immunoblotting/staining. 4 - mRNA abundance in upper granular layer and stratum corneum. 5 - sequencing of purified peptides released from digested CE.

Steinert and Marekov state that the ultimate definition of a CE constituent is an identifiable protein sequence cross-linked by a membrane bound transglutaminase-catalyzed isodipeptide bond. In the first of two studies, Steinert and Marekov predicted CE composition by treating purified CEs from human foreskin with low-specificity proteases over a period of two to four days, followed by direct amino acid sequencing of the many peptides recovered and immunogold electron microscopy, using monospecific antibodies. This elucidated much of the cytosolic portion of the CE, but was unable to identify involucrin or cystatin a (two proteins they suggested as initial envelope scaffold - Steinert and Marekov, 1995). The second study identified involucrin with immunogold microscopy after removal of extracellular lipids (see 1.2.2.4) with methanol/KOH and also identified cystatin a (amongst others - see table 1.1) via protein digestion and sequencing of released peptides (Steinert and Marekov, 1997). Robinson and co-workers used CNBr digestion, followed by trypsin and then proteinase K treatment of CEs from cultured keratinocytes to release peptides, which were purified and sequenced to predict CE composition. These two independent studies, using slightly different digestion techniques on CEs from different sources, produced varied results. Steinert and Marekov demonstrated the presence of elafin, filaggrin, KIF, loricrin, involucrin, cystatin a, desmosomal components, SPRR1 and SPRR2. Robinson et al demonstrated the presence of S100A10 and S100A11 (two previously unknown CE components) as well as annexin I, desmosomal proteins, SPRRs, plasminogen activator inhibitor-2, and involucrin. Surprisingly, no evidence of loricrin was found by Robinson et al, 1997, although it is widely accepted that loricrin is a major precursor (see table 1.1). Based on these studies and other data (presented herein), the composition of the CE has been modeled as follows.
The innermost layer (cytosolic side) is almost completely composed of the protein loricrin, but the small proline rich proteins (SPRRs) and filaggrin have also been implicated. Transgenic studies over-expressing human loricrin in mice support the hypothesis that loricrin is the final protein, or one of the final proteins, to be deposited into the CE. A normal phenotype is seen when human loricrin is over-expressed - mice show cross-linking of the human homologue as well as the murine loricrin (Yoneda and Steinert, 1993). The middle layer of the CE includes the proteins elafin, loricrin and members of the SPRR family, while the outer layer includes involucrin, cystatin a and perhaps further unknown proteins. Transgenic mice over-expressing human involucrin display an abnormal phenotype, suggesting that this protein forms part of the essential internal structure (Crish et al., 1993). Although evidence of keratins and filaggrin present in the CE is seen after rounds of proteolysis, it is possible that this is a result of contamination by these proteins from the CE/macrofibril junction. The desmosomal proteins desmocollin 3a/3b, desmoglein 3, desmoplakin I, plakoglobin, envoplakin, and plakophilin are possibly located toward the middle and outermost layers of the CE. Evidence of desmoplakin cross-linked to elafin in the middle layer was seen (Steinert and Marekov, 1995). Based on the amino acid composition of proteins known to isolate to the stratum corneum, Steinert and Marekov used a mathematical model to determine that the CE is composed of the following:
loricrin <70%
filaggrin 8%
elafin 6%
SPRRs 5%
cystatin a 5%
involucrin 2%
keratins 2%

The model is limited to known proteins but is based on the overall amino acid composition of the CE. Further proteins may yet be identified. It is important to note that Robinson et al, 1997, discount mathematical models of CE composition because large negative (non-physical) values were produced from their data. This may be a result of the artificial environment produced by studying envelopes of cultured keratinocytes, rather than envelopes from human foreskin as studied by Steinert and Marekov.
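To illustrate why a composition model can return non-physical negative values, the following is a minimal sketch. It assumes the model is an unconstrained linear mixing fit, which the text does not state; the method actually used by Steinert and Marekov may differ. The two proteins, the three residue classes and all fractions are invented toy values, not measured data.

```python
# Toy sketch of a linear mixing model: the amino acid composition of the
# whole envelope is modeled as a weighted sum of the compositions of its
# candidate proteins. All numbers are hypothetical, not measured data.

def fit_two_proteins(comp_a, comp_b, observed):
    """Unconstrained least-squares weights (w_a, w_b) such that
    w_a*comp_a + w_b*comp_b approximates observed (normal equations)."""
    aa = sum(x * x for x in comp_a)
    bb = sum(y * y for y in comp_b)
    ab = sum(x * y for x, y in zip(comp_a, comp_b))
    ao = sum(x * o for x, o in zip(comp_a, observed))
    bo = sum(y * o for y, o in zip(comp_b, observed))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("protein compositions are collinear")
    w_a = (ao * bb - bo * ab) / det
    w_b = (bo * aa - ao * ab) / det
    return w_a, w_b

# Hypothetical fractions of three residue classes in two proteins.
PROTEIN_A = (0.5, 0.1, 0.4)
PROTEIN_B = (0.1, 0.4, 0.5)

# An envelope that really is 70% A and 30% B is recovered by the fit.
consistent = tuple(0.7 * a + 0.3 * b for a, b in zip(PROTEIN_A, PROTEIN_B))
weights_ok = fit_two_proteins(PROTEIN_A, PROTEIN_B, consistent)

# An envelope that neither protein (nor any mixture of them) can explain
# forces the unconstrained fit to give protein B a negative weight.
inconsistent = (0.7, 0.0, 0.3)
weights_bad = fit_two_proteins(PROTEIN_A, PROTEIN_B, inconsistent)
```

In this toy case a negative weight signals that the candidate protein set, or the measured composition, does not match the model's assumptions, which is consistent with the explanation offered above for the cultured keratinocyte data.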

Three mechanisms for delivering precursor proteins to the site of envelope formation have been proposed. The first describes precursors diffusing passively to the vicinity of a TG and being cross-linked via an intermediary complex formed with the enzyme. The second model proposes vesicle delivery of the precursor, while the third describes ‘envelope organizer proteins’, with annexin I as the given example (Robinson et al, 1997). In this third model it is suggested that, as calcium levels rise within the differentiating cell (calcium regulates differentiation, see below), annexin I (which is known to bind membrane phospholipids in the presence of calcium) forms a complex with S100A11 and moves to the inner surface, facilitating binding to the plasma membrane (Robinson et al, 1997). Analysis of in vivo expression of certain CE precursor proteins reveals that, although expression is restricted to keratinocyte terminal differentiation, differences can be seen in the locality of expression. As with certain keratins (see 1.2.2.1), other CE precursor proteins are differentially regulated. Involucrin is present in all squamous epithelia while loricrin is found solely in keratinizing epithelia. SPRR1 is localized predominantly in follicular epidermis and oral mucosa, SPRR2 is expressed in follicular and inter-follicular epidermis, while SPRR3 is absent in epidermis and strongly expressed in internal squamous epithelia (Lohman et al, 1997).

1.2.2.4 The Cornified envelope (CE) - lipid content

On the intercellular side of the electron dense protein component of the CE, there is an electron lucent band representing the lipid portion, covalently attached via ester bonds to acyl groups of the protein CE (Swartzendruber et al., 1987). These lipids are deposited from the membrane coating granules seen in the granular layer (discussed earlier). The main components of the human lipid monolayer are ω-hydroxyceramides (of which there are two identified types, long-chain ω-hydroxyceramides and ω-hydroxyacylceramides, comprising 54.3% and 24.8% respectively), fatty acids (12.7%) and ω-hydroxy acids (9.4%). ω-Hydroxyceramides consist of ω-hydroxy acids linked by amide bonds to sphingosine. Interaction of the sphingosine chains between adjacent envelopes has been suggested to play a role in stratum corneum cohesion (Riechert et al., 1993).

ω-Hydroxyceramides derive in large part from hydrolysis of glucosylceramides, mediated by the stratum corneum β-glucocerebrosidase enzyme. This hydrolysis is initiated by four distinct sphingolipid activator proteins that are functionally produced via cleavage of a single, large precursor protein, prosaposin (Doering et al, 1999). It has been demonstrated that the targeted deletion of murine prosaposin in transgenic mice produced either a neonatally fatal or a later onset fatal complex pathology that included an ichthyosiform phenotype (see 1.4.1), indicating that prosaposin, and therefore the sphingolipid activator proteins, are required for normal epidermal differentiation (Doering et al, 1999). More recently, a novel function of TGK has been identified in that it can form ester bonds between specific glutaminyl residues of involucrin and long-chain ω-hydroxyceramides (Nemes et al, 1999). This is the first study to identify specific attachments of protein CE components to lipid CE components via TG activity, and confirms suggestions that involucrin forms the initial protein scaffold of the CE.

1.2.3 Regulation of epidermal differentiation

Studies of the regulation of epidermal differentiation have concentrated on two main areas: the induction of terminal differentiation (demonstrated by the expression of markers) and the expression of individual gene and protein markers. Early in vitro studies show that Ca2+ (Yuspa et al, 1989), cell density (Ryle et al, 1989) and depletion of retinoic acid (vitamin A - Fuchs and Green, 1981) induce terminal differentiation, based on the presence of markers such as filaggrin and the keratin couple K1/K10. Transcription of loricrin is induced in the same manner (Hohl et al, 1991). Ca2+ and TPA (12-O-tetradecanoylphorbol-13-acetate) induce involucrin expression (Takahashi and Ezuka, 1993), as do isoforms of protein kinase C (Takahashi et al, 1998) and the transcription factor SP1 (Banks et al, 1999). Keratinocyte proliferation and differentiation are regulated by an increasingly large number of growth factors and cytokines (Gibbs et al, 1998). This is not surprising considering the differential nature of keratin and SPRR expression (see 1.2.2.1 and 1.2.2.3). A number of examples follow. Epidermal growth factor (EGF) and transforming growth factor alpha (TGFα) have been reported to act via the EGF-receptor (King et al, 1990). It has been proposed that platelet activating factor is an intrinsic regulator of keratinocyte proliferation and differentiation (Shimada et al, 1998). Activin A has been implicated in inducing differentiation of cultured keratinocytes and suppressing their proliferation (Seishima et al, 1999). Sulfur mustard induces markers of terminal differentiation and apoptosis in keratinocytes (Rosenthal et al, 1998). Activators of the nuclear hormone receptor PPARα also stimulate keratinocyte differentiation (Hanley et al, 1998). Very recently, a member of the ‘kruppel’ transcription factor family, klf4, was demonstrated to be crucial in the orchestration of correct murine epidermal barrier function (Segre et al, 1999). Mice homozygous for a targeted deletion (knockout) of the murine klf4 gene displayed loss of barrier function without gross alterations to the process of epidermal differentiation, followed by lethality after 15 hours. Loss of barrier function was demonstrated by weight loss, measured hydration of the skin (tested by surface electrical capacitance), and dye exclusion. Ultrastructural data suggested resulting alterations in CE formation and gross defects in the secretion or deposition of lipids onto the CE scaffold. Subtractive hybridizations identified three murine CE precursor proteins as up-regulated in klf4-/- mice: SPRR2a; repetin (Krieg et al, 1997); and plasminogen activator inhibitor 2. These data suggest that removal of transcription factor klf4 up-regulates certain CE components, leading to abnormal CE formation via the normal differentiation pathway (Segre et al, 1999).

Mechanical influences on terminal differentiation have also been identified. UV light induces expression of a number of members of the SPRR gene family; indeed, these genes were first identified as a result of this property (Kartasova et al, 1988). Barrier disruption in hairless mice (by tape stripping, or treating with acetone or SDS) is seen to increase epidermal lipid and DNA synthesis, as well as affecting the expression of a number of epidermal markers. The expression of murine keratins K6, K16, and K17 was induced after barrier disruption. Expression of basal keratins K5 and K14 and suprabasal keratins K1 and K10 was increased after varying degrees of disruption. Loricrin expression was unchanged, while involucrin expression moved from the granular layer to the suprabasal layer (Ekanayake-Mudiyanselage et al, 1998). It has been indicated that temperature gradients in the skin regulate morphogenesis via temperature sensitive regulatory pathways (Gibbs et al, 1998).

1.3 The Epidermal Differentiation Complex

A remarkable clustering of three gene families involved in epidermal terminal differentiation is seen within a genomic segment of human chromosome 1q21. For this reason the segment, an approximately 2Mb stretch of human chromosome 1q21, has been termed the Epidermal Differentiation Complex (Mischke et al, 1996). The Epidermal Differentiation Complex (EDC) contains 28 genes of three families and includes a number of the classical biological markers of terminal differentiation discussed in the previous section (1.2). Figure 1.2 shows a schematic of the EDC and its location on human chromosome 1.

1.3.1 Gene families constituting the EDC

1.3.1.1 Loricrin, involucrin and the SPRR (CE precursor) gene family

The first gene family is composed of the CE precursors loricrin, involucrin and the SPRRs. The protein products of these genes are characterized by homologous N and C terminal domains and specific, repetitive, internal domains (Backendorf and Hohl, 1992). Loricrin and involucrin are single genes while the SPRR family consists of 10 members: 2 SPRR1 genes, 7 SPRR2 genes and 1 SPRR3 gene. Compelling evidence that these genes have evolved from a common ancestor has been documented (Gibbs et al, 1993) and will be reviewed elsewhere (see 5.2.3). The specific roles of these precursor proteins have been presented in section 1.2.2.3.

Figure 1.2. Human Epidermal Differentiation Complex on chromosome 1q21. Ideogram of chromosome 1 highlighting the 2Mb region harboring three gene families of the Epidermal Differentiation Complex (depicted by shading). Not to scale. [Map labels, in order along the 0Mb to 2Mb scale: S100A10, S100A11, THH, FLG, INV, SPRR, LOR, S100A9/12, S100A8, S100A7, S100A6-2, S100A13, S100A1.]

1.3.1.2 Profilaggrin, trichohyalin, and repetin (intermediate filament association) gene family

The second family consists of three genes encoding the intermediate filament association proteins filaggrin (section 1.2.2.2), trichohyalin (McKinley-Grant et al, 1989; O’Keefe et al, 1993), and the less characterized protein repetin (Krieg et al, 1997; Huber et al, 1999). Trichohyalin and repetin are similar to profilaggrin in that they share an almost identical exon structure and contain two functional calcium-binding domains of the E-F hand type. The first exon is short and purely untranslated, followed by an intermediate second exon of both untranslated and translated sequence, and a long final exon containing the majority of the coding sequence and the 3’ untranslated sequence (Markova et al, 1993 and Lee et al, 1993). Both trichohyalin and profilaggrin are thought to aggregate keratin intermediate filaments (KIF), although in different ways and in different tissues. Trichohyalin functions to cross-link KIF into loose networks in the inner root sheath cells of the hair follicle as well as the granular layer of the epidermis (Lee et al, 1993), while filaggrin functions to tightly aggregate KIF into bundles, solely in the granular layer of the epidermis (Ishida-Yamamoto et al, 1997a). Although it has been demonstrated that profilaggrin and trichohyalin are co-expressed, they have distinct expression and processing mechanisms (Ishida-Yamamoto et al, 1997a). Even though both proteins are known substrates for TG, it is still debatable whether or not these proteins are precursors of the CE (Steinert and Marekov, 1995 and Ishida-Yamamoto et al, 1997a). Repetin was first isolated by subtractive cDNA cloning of benign and malignant tumors of mouse skin (Krieg et al, 1997). Cloning and characterization of repetin described a novel gene exhibiting striking similarity to profilaggrin and trichohyalin, and co-localization to mouse chromosome 3. Expression of the repetin gene is seen in stratified internal epithelia and normal skin epidermis, where it is restricted to the differentiated suprabasal layers (see 1.2.2.1). Based on the high proportion of glutamine residues within the central repetitive domain, it has been speculated that repetin could act as a substrate for transglutaminase-mediated cross-linking (section 1.2.2.3). Human repetin has been cloned and localized to human chromosome 1q21 (see chapter 3) and has been demonstrated to bind calcium (Huber et al, 1999).

1.3.1.3 S100 gene family

The third identified gene family residing within the EDC is the S100 family. S100 proteins are small (average 10-kDa in size) calcium binding proteins with two canonical E-F hand domains (for review of E-F hand calcium binding proteins see Persechini et al, 1989) that share at least 50% homology at the amino acid level (for review of S100s see Zimmer et al, 1995). The majority of S100 genes follow a standard exon structure - a short first exon composed of 5’ untranslated sequence; an intermediate second exon of both untranslated and translated sequence; and a larger, final exon containing the majority of the coding sequence and the 3’ untranslated sequence. This exon structure is very similar to that of profilaggrin and trichohyalin (see above). Each member of the family exhibits a specific pattern of expression, with some cell types expressing multiple members. It is thought that S100s act as Ca2+ second messengers, both extracellularly and intracellularly, by binding to target proteins. This interaction appears to involve cysteine residues as well as a conserved 13 amino acid linker region (separating the two calcium-binding domains). It is thought that on binding calcium, the linker domain is exposed and able to bind specific target proteins. Potential functions of S100 family members range from regulation of cell growth, contraction, differentiation, metabolism, and cell structure, to potential anti-inflammatory activity, signal transduction, response to chemotactic factors and regulation of kinase activity and neurotransmitter release (Zimmer et al, 1995). Unlike the two other, previously discussed gene families within the EDC, the S100 family has additional members located at other regions of the genome (S100B, S100P, and Calbindin-3). The 13 members localized to the EDC have been given a logical nomenclature, S100A1-13 (Schaefer et al, 1995; Wicki et al, 1996a + b). In relation to epidermal differentiation, it has been demonstrated that two S100 proteins are CE precursors (Robinson et al, 1997). The fact that calcium levels tightly control epidermal differentiation has led to much speculation that the co-localization of S100 genes with other epidermal genes is not purely coincidental - a functional relationship between S100 proteins and the expression of other EDC genes could exist (Schaefer et al, 1995; Mischke et al, 1996; Marenholz et al, 1996). It is important to note here that such a functional relationship has not yet been demonstrated, although S100A6 expression has been localized to the skin, particularly the keratogenous region of the hair follicle, and linked to expression of hair structural proteins (Wood et al, 1991).

1.3.2 Gene complexes and multi-gene families

Multi-gene families are characterized by a varying number of genes sharing a degree of homology and possessing a similar function. Numerous families exist in vertebrate genomes, consisting of loci at different chromosomal locations (examples such as the annexins, UNIGENE - http://www.ncbi.nlm.nih.gov/UniGene; and transglutaminases, UNIGENE), or loci clustered together in close proximity, such as the β-globin gene family (Hardison et al, 1997) and the opsin gene family (Dulai et al, 1999). A super-gene family is said to be a set of multi-gene families and single copy genes that are structurally related yet functionally different, implying common ancestry, such as the immunoglobulin super-family (Hood et al, 1985). What is rare to find in the genome is the clustering of structurally different, yet functionally similar, gene families. Such clustering is said to form a gene complex. Gene complexes such as those found associated with the immunoglobulin super-family consist of multiple genes belonging to distinct gene families, clustered because they are functionally inter-dependent and/or because somatic cell specific rearrangements of genomic DNA juxtapose different constituents of the complex in order to create new variants. The most studied example to date is the major histocompatibility complex (MHC), where a large number of related and unrelated genes are clustered together within 4Mb of the short arm of human chromosome 6. Briefly, the majority of MHC genes are categorized into MHC class I, II, and III. MHC class I and class II genes code for cell surface glycoproteins involved in cellular recognition processes of the immune system. Class III MHC genes code for complement components such as C2, C4 and Bf (Carroll et al, 1984). An example of structurally different yet functionally related gene families can be seen in the MHC class II region, where gene families encoding sub-units of transporter proteins and protease proteins are found (Monaco JJ, 1992). Protease protein sub-units form part of the low molecular mass polypeptide (LMP) complex, which proteolytically degrades cytoplasmic proteins into peptides. These peptides are delivered to the transporter proteins, which facilitate transportation to MHC class I molecules. MHC class I molecules bind these intracellular peptides and are exported to the cell surface, where this sample of cellular content (bound peptides) can be presented to CD8+ T cells (Monaco JJ, 1992). A growing number of gene families have been identified within the MHC; at least 7 in the MHC class I region alone (Shiina et al, 1999).

The EDC presents an example of structurally different yet functionally related gene families. At least 2 members of each of the three gene families so far identified within the EDC are expressed in epidermis and are structurally important for keratinization. The key question to be answered is whether these genes are co-localized by chance, or whether there are highly significant functional and evolutionary constraints keeping these genes in close proximity and across species barriers. One step towards answering this question is the production of a high-resolution physical map of the complex, thus facilitating gene identification, functional studies and direct species comparisons.

1.3.3 Locus control regions

One identified feature of a number of gene families is the presence of what is termed a “locus control region” (LCR). Locus control regions were first identified in a stretch of sequence that greatly influenced expression of the β-globin gene family (Tuan et al, 1985; Forrester et al, 1986; Grosveld et al, 1987). The β-globin gene family, in humans, contains five β-like-globin genes that are differentially expressed during development (at embryonic, fetal, and adult stages) and are erythroid tissue specific (see Townes and Behringer, 1990, and Hardison et al, 1997, for reviews). It was seen that in certain variants of the disease thalassemia, a deletion 5’ of the gene cluster severely retarded the tissue and developmentally specific gene expression of the intact gene family. Investigation of this deleted sequence revealed 5 ‘major’ DNase I-hypersensitive sites (HS), more prominent than the ‘minor’ DNase I-hypersensitive sites located close to the 5’ end of all the β-like-globin genes (Tuan et al, 1985). It was also noted that these HS were tissue specific (limited to erythroid tissue only) and developmentally stable - present through all stages of development (Forrester et al, 1986). Transgenic experiments revealed that expected levels of β-globin gene expression were only seen when an introduced construct contained these HS. Moreover the levels of expression were copy-number dependent (level of expression related to number of introduced copies of construct), and position-independent (the site of construct integration had no effect on expression - Grosveld et al, 1987). This definition - sequences that confer high level, copy-number dependent and position-independent expression - formed the initial criteria for an LCR. Much work has gone into characterizing the β-globin LCR and other subsequently identified LCRs (a few examples being: MHC class II Eα gene - Carson and Wiles, 1993; CD2 gene - Festenstein et al, 1996; chicken lysozyme gene - Bonifer et al, 1990; T-cell receptor α locus - Ortiz et al, 1997; opsin gene family - Dulai et al, 1999). The general understanding is that an LCR influences expression through chromatin domain opening activity. HS are characteristic of nucleosome free regions, which give transcriptional proteins (enhancer/promoter binding proteins) access to the desired sequence (Udvardy, 1998). It has been seen that copy number dependency is not strictly adhered to in some cases, and that an LCR is not critical for developmentally specific expression (Hardison et al, 1997). Ortiz and co-workers (1997) demonstrated that cis-acting elements (enhancers/silencers) present at an LCR alter chromatin configuration in a tissue specific manner - removal of these specific sequences altered chromatin in all tissues studied. This was in agreement with earlier work demonstrating that a short region of an LCR with no identified enhancers functions to establish and/or maintain an open chromatin domain (Festenstein et al, 1996). Recently it was shown that the β-globin LCR exhibits directionality (Tanimoto et al, 1999). By inverting members of the β-globin gene family, it was seen that proximity to the LCR greatly affected expression levels (and in one case, where a gene was positioned further away from the LCR, complete silencing was demonstrated). In addition, inversion of the LCR itself vastly reduced expression levels, indicating that the action of the β-globin LCR is directional.

It is thought that an LCR would operate in any gene family comprised of multiple members that are differentially regulated through development (Townes and Behringer, 1990; Carson and Wiles, 1993). It has been speculated that the whole EDC or genes within the EDC may be influenced by an LCR (Hardas et al, 1996; Mischke et al, 1996; Zhao and Elder, 1997; South et al, 1999). Members of all three gene families present within the EDC are differentially expressed during the development of the epidermis. It would seem plausible to speculate that an LCR would act on the EDC, especially in view of its peri-centromeric location. In fact, evidence from work identifying a transcription factor crucial for mouse epidermal barrier function led the authors to speculate on LCR activity for an SPRR2 gene (Segre et al, 1999 - see section 1.2.4). A parallel is drawn between clustered globin genes and clustered genes for CE precursors (SPRR2) in that they are both activated by transcription factors of the same family (‘kruppel-like’ transcription family members klf1 and klf4, respectively).

1.3.4 Clustered multi-gene families generally reside in gene-rich regions of the genome

Regions of the vertebrate genome that harbor multi-gene families are generally ‘gene-rich’, containing a concentration of transcriptionally active genes, including genes that are not obviously functionally related to the gene families present (such as is the case for the MHC and other gene complexes of the immunoglobulin super-family). One explanation suggested is a possible evolutionary advantage in concentrating genes to one area - a reduction in transcription and intron splicing (presumably time consuming events) being advantageous (Hughes and Yeager, 1997; Yeager and Hughes, 1999). The mouse and human genomes are predicted to contain between 60,000 and 80,000 genes (Antequera and Bird, 1993; Fields et al, 1994). If these genes were distributed randomly across the human haploid genome (3,000,000kb), one gene would be expected every 37.5-50kb. This, however, is not the case - it is clear that genes are distributed across the genome in a non-random fashion (Saccone et al, 1992; Craig and Bickmore, 1994). Given this clustering of genes, it has been proposed that a genomic region with a concentration of around one gene per 23kb is ‘gene-rich’ (Fields et al, 1994). A search of the literature with this term (using BIDS) identifies 8 entries for 1998. Of these 8, 4 are associated with the human genome. A sample of two of these papers demonstrates the broad use of this term. A ‘gene-rich’ region of human chromosome 12p13 was seen to contain 17 genes, one every 13.4kb (Ansari-Lari, 1998), while transcript mapping in a ‘gene-rich’ region of human chromosome 11q12-q13.1 identified 16 novel ESTs to accompany 15 known genes, giving a density of one gene/EST every 53kb (Cooper et al, 1998). One of the best-characterized ‘gene-rich’ regions of the human genome is the MHC, the complete sequence of which is now finished (Parham, 1999). Gene density over the three MHC regions (4Mb long, class I-III regions) ranges from one gene every 14.3kb to one gene every 25kb, with density in smaller segments being as high as one gene per 12.4kb (Shiina et al, 1999). Another chromosomal region, 21q22.3, is seen to contain an even higher gene density. Based on YAC data it has been proposed that this region of chromosome 21 contains one gene per 10kb. This is in stark contrast to other regions of chromosome 21 thought to be ‘gene-poor’ and so far containing one gene per 6Mb (Gardiner, 1997). The EDC already contains 28 genes over an estimated 2Mb (Marenholz et al, 1996; Wicki et al, 1996a+b; Yamamura et al, 1996; Huber et al, 1999), a density of one gene per 71kb. Prior to the start of this thesis, no attempt at transcript identification had been made. It was therefore hypothesized that the EDC would contain additional transcripts, possibly related to epidermal differentiation and possibly showing no relation to the currently described genes.
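The density figures quoted above follow from simple division, and a short sketch makes the comparison explicit. The function and variable names are illustrative only; the genome size, gene counts and the 23kb ‘gene-rich’ figure are the values given in the text.

```python
# Mean spacing between genes, in kb per gene, for figures quoted in the text.

def kb_per_gene(region_kb, gene_count):
    """Return the mean number of kilobases per gene in a region."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return region_kb / gene_count

HAPLOID_GENOME_KB = 3_000_000  # human haploid genome size used in the text

# Expected spacing if 60,000 to 80,000 genes were spread evenly.
sparse = kb_per_gene(HAPLOID_GENOME_KB, 60_000)  # 50.0 kb per gene
dense = kb_per_gene(HAPLOID_GENOME_KB, 80_000)   # 37.5 kb per gene

# EDC: 28 known genes over an estimated 2Mb (2,000 kb).
edc = kb_per_gene(2_000, 28)                     # about 71.4 kb per gene

# 'Gene-rich' figure of one gene per 23kb (Fields et al, 1994, as cited).
GENE_RICH_THRESHOLD_KB = 23
edc_is_gene_rich = edc <= GENE_RICH_THRESHOLD_KB
```

On the 28 known genes alone the EDC is sparser than both the random-distribution expectation and the 23kb figure, which is the gap behind the hypothesis that further transcripts remain to be identified.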

1.4 Genetic disorders associated with human chromosomal region 1q21

1.4.1 Disorders of the skin associated with 1q21

Keratodermas are a highly heterogeneous group of human disorders characterized by hyperkeratosis and/or thickening of the palms and soles. Some of the disorders are classified by mutations in keratin genes (reviewed by Fuchs and Cleveland, 1998), but the underlying cause for most is unknown. The majority of the molecular evidence points towards a defect in the desmosomes or cornified envelope (Christiano, 1997). A defect in any gene that causes a disruption in the structure of the epidermis, either directly or indirectly, has the potential to produce a harmful phenotype. This would obviously apply to any EDC gene disruption. In fact, a keratoderma disorder involving an EDC gene has been characterized - a point mutation (insertion) in the loricrin gene (causing a frameshift and delayed termination) was identified in a family with an incidence of Vohwinkel syndrome (Maestrini et al, 1996). Vohwinkel syndrome (OMIM No. 124500) is a rare, dominantly inherited disorder, phenotypically characterized by thickening of the palms and soles in a honeycomb appearance, and the constriction of digits, occasionally leading to auto-amputation due to circulatory problems. Separate cases of Vohwinkel syndrome have been attributed to defects in the loricrin gene (Takahashi et al, 1999; Armstrong et al, 1999; Ishida-Yamamoto et al, 1997b; Korge et al, 1997). However, additional cases have been documented where a defect in loricrin, or any linkage to the EDC, has been ruled out (Maestrini et al, 1999; Akiyama et al, 1998). Although the loricrin mutation in Vohwinkel syndrome is the only skin disorder so far directly related to an EDC gene, aberrant expression of EDC genes has been noted in others. Ichthyosis is a term used to describe a subset of keratodermas characterized by excessively dry skin and the accumulation of scale (Bale and Doyle, 1994). A decrease in profilaggrin expression is seen in cases of ichthyosis vulgaris (OMIM 146700), an autosomal dominant disorder (Nirunsuksiri et al, 1995), while lamellar ichthyosis (a congenital recessive disorder, OMIM 242300) is characterized by mutations in TGK (Candi et al, 1998). Although TGK is not an EDC gene, it is directly involved in the construction of the CE (composed of the products of a growing number of EDC genes) and exemplifies the earlier statement that a defect in an EDC gene has the potential to produce a harmful phenotype. S100A7 was first identified as a partially secreted protein that is highly up-regulated in psoriatic skin (Madsen et al, 1991). Psoriasis (OMIM 177900) is a severe skin disorder characterized by epidermal hyperplasia, vascular alterations and inflammation affecting polymorphonuclear leukocytes, activated T-cells, Langerhans cells, and macrophages (Barker, 1991). A number of other EDC genes are shown to be up-regulated in psoriatic skin - S100A2, S100A8, S100A9, and SPRR2a (Hardas et al, 1996).

1.4.2 Other disorders linked to 1q21

A number of other disorders have been assigned to 1q21 from linkage studies. Pycnodysostosis (OMIM 265800), an autosomal recessive skeletal disorder, has been mapped to 1q21 by pedigree analysis of a Mexican family (Polymeropoulos et al, 1995). A number of linkage studies in families suffering from varying lipodystrophy conditions have mapped a susceptibility locus to around 1q21 (Pajukanta et al, 1998; Peters et al, 1998). Familial lipodystrophy (OMIM 151660) is a genetically heterogeneous set of disorders characterised by a total or partial absence of sub-cutaneous adipose tissue (Anderson et al, 1999). In addition, a related disorder, non-insulin-dependent diabetes mellitus (NIDDM), has also been linked to markers around 1q21 (Schaffer et al, 1999).

1.4.3 Cancer and 1q21

The genomic region 1q21 has been implicated in a wide range of chromosomal abnormalities associated with cancer, while individual EDC genes are shown to be up- or down-regulated in cancerous tissues. It has been reported that chromosome 1 is the most frequent site of recurring chromosomal abnormalities across all human solid tumours, including cutaneous malignant melanomas (Zhang et al, 1999). Breakpoints involving chromosome 1 in melanomas appear most frequently at peri-centromeric regions (1p11-12 and 1q21) and to a lesser degree at 1p36. It has been speculated that these regions may harbor genes contributing to the development and progression of these melanomas. Also, the fusion of sequences at 1p11-12 and 1q21 is known to form ring chromosomes (Zhang et al, 1999). Chromosomal region 1q21 is also recurrently gained in desmoid tumors. Using comparative genome hybridization (CGH), 1q21 was identified as the region most frequently gained (39%) in 28 desmoid tumors studied (Larramendy et al, 1998). It has also been reported that 1q21-22 may harbor unknown target genes commonly amplified in human sarcomas (Forus et al, 1996). The break-point of a 1:X translocation in a renal cell carcinoma was narrowed to around S100A4, leading to speculation that the related (S100) sequences in the region are the cause (Weterman et al, 1993). S100 genes, in particular, have been implicated in a number of cancers. S100A6 has been seen to display higher levels of expression among patients with acute myeloid leukemia (Englekamp et al, 1993), while S100A7 was demonstrated to be expressed in breast cancer cell lines, and in cancer cells of some breast carcinomas, but not in any non-cancerous tissue examined, except skin (Moog-Lutz et al, 1995). S100A2 is underexpressed in breast carcinomas when compared to normal breast tissue (Lee et al, 1992; Xia et al, 1997). S100A2 is also underexpressed in several carcinoma cell lines of the skin, oral mucosa and urogenital tract, leading to the hypothesis that S100A2 operates as a tumor suppressor gene (Xia et al, 1997). Other EDC genes are differentially affected by carcinogenic transformation. SPRR gene expression is seen to be differentially affected in cultured cell lines derived from separate squamous cell carcinomas (Lohman et al, 1997; Reddy et al, 1998), as is the case with the non-EDC TGK gene (Monzon et al, 1996).

Chapter 2: Materials and Methods

2.1 Materials

2.1.1 Specialized materials

Material | Supplier
3MM Whatman | Whatman
22cm x 22cm NUNC™ plates | Gibco BRL
Benchkote | Whatman
Blue sensitive X-ray film | Genetic Research Instrumentation Ltd.
Hybond N+ nylon membranes | Amersham International
Mesh sheets | Scientific Laboratory Supplies
Nunc CryoTube™ Vials | Gibco BRL
Polaroid 667 film | Polaroid
QIAGEN midi-tips | QIAGEN
Wizard® DNA Clean-Up System | Promega

2.1.2 Chemicals

Chemical | Supplier
90 OD units/ml of Random Hexamer primers in H2O | Gibco BRL
Dextran T40 | FLUKA
EDTA | Gibco BRL
Ethidium Bromide 1% in solution | FLUKA
Low melting point (LMP) agarose | Gibco BRL
PBS (10X) | ICN Biomedicals Inc.
'Rapid' agarose | Gibco BRL
Tris | Gibco BRL
Yeast extract | Gibco BRL

All other laboratory chemicals were supplied by BDH laboratory supplies, Sigma Chemical Company, and DIFCO laboratories.

2.1.3 Enzymes

Enzyme | Supplier
β-Agarase I (1U/µl) (from P. atlantica) | Calbiochem
DNase I (RNase free) | Boehringer Mannheim
M-MLV Reverse transcriptase, RNase H Minus (from recombinant E. coli) | Promega
Proteinase K | BDH
RNase A (DNase free) | Sigma Chemical Co.
RNasin® | Promega
SP6/T7 RNA polymerase | Promega
SUPER TAQ | HT Biotechnology Ltd.
Zymolyase (100T) 100,000 U/g (from A. luteus) | ICN Biomedicals Inc.

BRL Life Technologies or New England Biolabs supplied all other enzymes.

2.1.4 DNA oligonucleotide primers

Primers were supplied by BRL Life Technologies. The sequence, annealing temperature, cycle used, expected fragment size and any reference for the primer design are given in the following table:

Table 2.1. Primers used: sequence, annealing temperature, origin, and PCR cycle where appropriate. Cycles as follows:
1 = 95°C 5 minutes, then 34 cycles of: 95°C 1 minute, annealing temp. 1 minute, 72°C 2 minutes. Final extension (72°C): 7 minutes.
2 = 94°C 4 minutes, then 35 cycles of: 94°C 1 minute, annealing temp. 1 minute, 72°C 1 minute. Final extension: 5 minutes.
3 = 95°C 30 seconds, then 35 cycles of: 95°C 45 seconds, annealing temp. 45 seconds, 72°C 1 minute 30 seconds. Final extension: 5 minutes.
4 = 33 cycles of: 95°C 30 seconds, annealing temp. 30 seconds, 72°C 1 minute. Final extension: 5 minutes.
5 = 35 cycles of: 95°C 1 minute, annealing temp. 45 seconds, 72°C 1 minute. Final extension: 5 minutes.
* = 'hot start' (see section 2.2.11)
N/A = not applicable (sequencing primer)
0 = annealing temperature varied depending on which primer was used as a pair.

Primer (pair where indicated) | Sequence | Annealing temperature | Cycle | Fragment size | Reference
S100A10 | F: (j GGAAAGAAGTA( j GCAGAAATG; R: AGAACTTTAnTATTGAGGGCAAGG | 60°C | 1 | 246bp | Volz et al, 1993
Involucrin | F: GCCCAAGAGAACACCCCAGAAATACCACC; R: TTCCTGTGATGCTTTTCTGACC | 62°C | 2 | 242bp | Lioumi et al, 1998
S100A6 | F: TGATTTCTGTTCCTCCTTGGCTTAG; R: CAGCGTTTACCtrrCCTTATCCTCTT | 60°C | 1 | 783bp | Volz et al, 1993
Trichohyalin | F: AAGTTTTTATGCAGAGGTCACTCC; R: ATGCAGATGTGGTGTGCATT | 60°C | 1 | 338bp | Section 3.3.6
Profilaggrin | F: AAGAATGAAGCAAGTTCACTTTCA; R: TGnTTATATTTTTGGCTCCTTCG | 62°C | 1 | 337bp | W17217 Whitehead Institute
D1S1664 | F: GGTCTGAGAAAGACGGTGAG; R: ACATCGCAGCTAAGTGTTCC | 60°C | 1 | 150bp | GDB
D1S2278 | F: CATGACAGTGACACCAAGGG; R: AGACTCCACAGAGAGGCCAA | 60°C | 1 | 347bp | GDB
D1S3020 | F: T(3GTGTTTGGTTACATGGAT; R: GTGAAGGCAACATGTATCGT | 60°C | 1 | 151bp | GDB
D1S305 | F: CCAGNCTCGGTATGTTTTTACTA; R: CTGAAACCTCTGTCCAAGCC | 60°C | 1 | 156-176bp | GDB
D1S3625 | F: GTCTGCATGGAAAGACATC; R: CGTCAGAACCATCATGCCAG | 60°C | 1 | 391bp | Section 3.3.6
D1S3626 | F: GCAGCTATGACGGAGGAC; R: GAGAGATGAGGAGTGACTGC | 60°C | 1 | 470bp | Section 3.3.6
S100A8 | F: TCTTATGCTTTTGTGGAATGAGGTT; R: TGAGTTGGGAGTTTGGGAGTGGA | 60°C | 1 | 601bp | Mischke et al, 1996
S100A9 | F: GCCACCCTGCCTCTACCCA; R: TGCCTTTTTGCTTTCTACAGTG | 60°C | 1 | 402bp | Mischke et al, 1996
S100A1 | F: AGACAGCCACATTGGGCAGCGC; R: GATGAGTTGCACXÎCTTGGACCGC | 60°C | 1 | 165bp | Section 3.4.3
S100A13 | F: CAGCAGTTGCCCCATCTGCT; R: GAAGAGTGCGGTTCTGCTCG | 60°C | 1 | 233bp | Section 3.4.3
S100A5 | F: GCATCGATGACITGATGAAG; R: CTGCGATGGAAACmTAnrC | 60°C | 1 | 267bp | Section 3.5.2
M13+ | F+: ACGTTGTAAAACGACGGCCAGTG; R+: CACACAGGTAACAGCTATGACC | 55°C | 3 | N/A | Section 4.4.4
SD6 | TCTGAGTCACCTGGACAACC | 55°C | See 2.2.19.5 | N/A | Section 2.2.19
SA2 | ATCTCAGTGGTATTTGTGAGC | 55°C | See 2.2.19.5 | N/A | Section 2.2.19
SD2 | GTGAACTGCACTGTGACAAGC | 55°C | See 2.2.19.5 | N/A | Section 2.2.19
SA4 | CACCTGAGGAGTGAATTGGTCG | 55°C | See 2.2.19.5 | N/A | Section 2.2.19
EX | EX1: TCCCTTCGGGAGATCTCCAG; EX2: CTGCACGTGCTCTAGAGTAGTCG | 59°C | 4* | N/A | Section 2.2.19
167SEQ1 | GAG TGA GCT GGG AGA TGT TC | N/A | N/A | N/A | Section 4.5.2
E2C3 | F: CACACTCTCCCGGTCAGAG; R: CGGACCCTTGTGAATGAAC | 55°C | 4* | 191bp | Section 4.5.2
167 | F: GGA CCA CAA CGC AGA AAA TGA GG; R: GTG TGG AGG AGG GTG TGT CGC AAG | 55°C | 4* | 305bp | Section 4.5.2.2
167SEQ2 | CGC TTC TTC TGT TGT GTG GTG | N/A | N/A | N/A | Section 4.5.2.3
167SEQ3 | GTG TGT GTT GGG TTC TGG GGA | N/A | N/A | N/A | Section 4.5.2.3
167EX2 | GGT GAT TGG TGG TGG AAG AAA AG | N/A | N/A | N/A | Section 4.5.2.3
167SEQ.V1 | GGT GGA TTC GAG TGG GTT TGG | N/A | N/A | N/A | Section 4.5.2.3
167SEQ2-1 | GTT GTG GGT TGG TGT TTG GG | N/A | N/A | N/A | Section 4.5.2.3
167SEQEX | GTG GGA TGG AGA GTG TGG AG | N/A | N/A | N/A | Section 4.5.2.3
127E12sB4 | F: CACGTCGCACCAGCCTATG; R: CACCATCGTCTCCCGCAAG | 55°C | 4* | 194bp | Section 4.6.1
127E12sC3 | F: E2C3F (see above); R: CCA AGC CTG GAG ACC CAG AAT G | 55°C | 4* | 88bp | Section 4.6.1
128L15sB17 | F: GTCTCT( j GTTTGATGACC; R: ACCACAAGACTTCTCCCA | 50°C | 4* | 135bp | Section 4.6.1
128L15bG14 | F: GAGTTAGAGCCAATTTTGAC; R: CATCATCCCATTGAATCC | 50°C | 4* | 183bp | Section 4.6.1
128L15bH15 | F: CAGACAGACACGTGTTCGATG; R: CAGCAAAGTCTAGCCAAG | 55°C | 4* | 135bp | Section 4.6.1
L4R | ATGATAGAGATAGGATTTAGGTGAG | N/A | N/A | N/A | Section 5.3.2
127F | GAATGGGTGTTGGAGGGGGA | N/A | N/A | N/A | Section 5.3.2
F-1 | GGTTGAGGAGTGTGGGGTG | N/A | N/A | N/A | Section 5.3.2
F-2 | GGTTATGAGGTGTGAGTG | N/A | N/A | N/A | Section 5.3.2
F-3 | GTGAGGTGAGAGTGATGTG | N/A | N/A | N/A | Section 5.3.2
A13-n | GGGAGAGTGTGTTTAGGAGG | N/A | N/A | N/A | Section 5.3.2
A13-1n | GTGAGGGGGTTGTGAGTTTTGTTG | N/A | N/A | N/A | Section 5.3.2
A13-2n | GTGTGAAGAGTGGGTAGGGGGTATG | N/A | N/A | N/A | Section 5.3.2
A13uni-rev | GAAGTGGTTGAGGGTGAGGG | 0 | 0 | 0 | Section 5.3.3
113442F | AAGTGAGATGGGGTTGGGTG | N/A | N/A | N/A | Section 5.3.3
A13sp1F | GGGGTGGTGGAGGAGGGAGTG | 64°C | 5* | | Section 5.3.8
A13sp2F | GGGAGAGGAGGGGGGAAGAT | 64°C | 5* | | Section 5.3.8
A13sp3F | GGATTGTTGAGAGAGAGAGG | 60°C | 5* | | Section 5.3.8
A13sp4F | AGGTGGTGGGAGGGAGAGTG | 60°C | 5* | | Section 5.3.5
A13sp5F | GAAGGGGGTGGATTGGGGTGTG | 64°C | 5* | | Section 5.3.5
80P7di-I | GGTGGAGGGGGTGTTGGTGGA | N/A | N/A | N/A | Section 5.3.5
A13sp6F | GGTGATGTGTGGTGGAGTTG | 60°C | 5* | | Section 5.3.8
96L4F | GCATTGCCCAGAGGGACTCACXrC | N/A | N/A | N/A | Section 5.3.9
96L4F2 | CTTGGACCCCTCACAGAGCACT | N/A | N/A | N/A | Section 5.3.9
96L4R | GCCCACCTCTTGGGTGGGTGC | N/A | N/A | N/A | Section 5.3.9
A12EX3 | F: GGTAGCCATTGCGCTGAAG; R: GCA AGG CTG GGT TTT GGT G | 50°C | 3* | 127bp | Section 5.4.2
CHA6 | F: GGACCTGGACCGCAACAAAG; R: GGGCCCCTTAGAGGCCATTG | 60°C | 3* | 141bp | Section 5.5.4
CHA10 | F: CATGAAGGACCTGGACCAGTGC; R: CTGCGGGGCCTTCTCTCACTTC | 55°C | 3* | 144bp | Section 5.5.4
CHA11 | F: CGCTGTTGTGGACAGGATG; R: CAAGGAGTGGGGATGTTG | 60°C | 3* | 217bp | Section 5.5.4
CHA6D1-1 | CAGGTGACGTCACACCTCGTGTC | N/A | N/A | N/A | Section 5.5.4
CHA6D1-2 | CCGATGGTCAACTCCTTCTGGATC | N/A | N/A | N/A | Section 5.5.4
CHA6D1-3 | CACATCTGGATGGGGCACTCTG | N/A | N/A | N/A | Section 5.5.4
CHA6EX1R | CTGGAGCAGGGCAGGATCCGGGAA | N/A | N/A | N/A | Section 5.5.4
CHA10R1 | CATCAGCGTCTCCATGGCGTGC | N/A | N/A | N/A | Section 5.5.4
CHA10EX2F | GAG GCT GAT GTT CAC CTT CC | N/A | N/A | N/A | Section 5.5.4
CHA10EX2R | GCG CTG GTT CTC CAG GAA TC | N/A | N/A | N/A | Section 5.5.4
A10R1 | ATAGGCTTCAACGGACCACACC | N/A | N/A | N/A | Section 5.5.4
A10EX2F | CTT CAA CGG ACC ACA CCA | N/A | N/A | N/A | Section 5.5.4
A10EX2R | GTA CTC TCA GGT CCT CCT TTG | N/A | N/A | N/A | Section 5.5.4
A10EX1R | GACGCTGGGCGAGCTGGGCGAG | N/A | N/A | N/A | Section 5.5.4
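The five cycle definitions in the legend to Table 2.1 can be written down as data, which makes it easy to compare the programmed run time of each. The following is a minimal sketch only: the class and field names are hypothetical, durations are in seconds, ramp times between temperatures are ignored, and cycles 4 and 5 are given no initial denaturation step because the legend lists none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """One cycle definition from the Table 2.1 legend (times in seconds)."""
    initial_denature_s: int  # 0 where the legend gives no initial step
    n_cycles: int
    denature_s: int
    anneal_s: int
    extend_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of all programmed holds; ramping between steps is not counted."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return (self.initial_denature_s
                + self.n_cycles * per_cycle
                + self.final_extension_s)


# Keys are the cycle numbers used in the "Cycle" column of Table 2.1.
PROGRAMS = {
    1: PcrProgram(300, 34, 60, 60, 120, 420),
    2: PcrProgram(240, 35, 60, 60, 60, 300),
    3: PcrProgram(30, 35, 45, 45, 90, 300),
    4: PcrProgram(0, 33, 30, 30, 60, 300),
    5: PcrProgram(0, 35, 60, 45, 60, 300),
}

for number, program in PROGRAMS.items():
    print(f"cycle {number}: {program.hold_time_s() / 60:.1f} min of programmed holds")
```

On these figures cycle 1 is the longest program (148 minutes of holds) and cycle 4 the shortest (71 minutes).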

2.1.5 Nucleotides

Nucleotide | Supplier
[α-32P]-dCTP aqueous solution (3000Ci/mM) | ICN Biomedicals Inc., Amersham International
[α-32P]-dUTP aqueous solution (3000Ci/mM) | Amersham International
ddNTP set, 2',3'-dideoxynucleoside 5'-triphosphate (ddCTP, ddTTP, ddATP, ddGTP) | Pharmacia Biotech
Ribonucleotides (rCTP, rUTP, rATP, rGTP) | Promega

2.1.6 DNA cloning vectors used

pBluescript® II phagemid vector | Stratagene
pSPL3 splicing vector | Gibco BRL

2.1.7 DNA size markers

1 kb DNA ladder | Life Technologies
High MW DNA markers | Life Technologies
Lambda ladder | BIORAD
S. cerevisiae chromosomal DNA | BIORAD

2.1.8 Culture media

AHC:
1% w/v casein hydrolysate
0.002% w/v adenosine hemisulphate
2% w/v glucose
0.67% N2 base
autoclave

2-YT:
16% w/v tryptone/peptone
10% w/v yeast extract
5% w/v NaCl
Plate: 1.5% w/v agar
Stab: 0.75% w/v agar
autoclave

LB:
10% w/v tryptone/peptone
5% w/v yeast extract
5% w/v NaCl
autoclave

Superbroth:
32% w/v tryptone/peptone
20% w/v yeast extract
5% w/v NaCl
5mM NaOH
autoclave

2.1.9 Solutions

SSC:
150mM NaCl
15mM sodium citrate

Spheroplasting solution:
1M sorbitol
10mM Tris-HCl pH 7.5
20mM EDTA pH 8.0

TE:
1mM EDTA pH 8.0
10mM Tris-HCl pH 7.5
autoclave

T0.1E:
0.1mM EDTA pH 8.0
10mM Tris-HCl pH 7.5
autoclave

Yeast lysis solution:
1% w/v lithium dodecyl sulphate
100mM EDTA
10mM Tris-HCl pH 8.0

Herring sperm DNA solution:
0.5% w/v herring sperm DNA
autoclave

Denhardt's solution (100 x) (Denhardt, 1982):
2% w/v Ficoll 400
2% w/v polyvinylpyrrolidone
2% w/v bovine serum albumin

Pre / hybridization solutions:

Denhardts:
5 x Denhardt's solution
12 x SSC
0.5% w/v SDS
5 x herring sperm DNA solution

Church:
0.5M NaPi (pH 7.2)
7% SDS
1mM EDTA
0.01% tRNA

5 x Random Priming Labeling Buffer (5 x RPLB):
0.25mM Tris-HCl (pH 8.0)
25mM MgCl2
1M HEPES (pH 6.6)
50mM β-mercaptoethanol
0.1mM dATP, dTTP, dGTP
30% v/v 90 OD units/ml Random Hexamers

AL1:
50mM glucose
10mM EDTA
25mM Tris-HCl (pH 8.0)

AL2:
1% SDS
0.4M NaOH

AL3:
5M K+ (pH 4.8)
3M Ac-

Denaturing solution:
0.5M NaOH
1.5M NaCl

Neutralizing solution:
1M Tris-HCl (pH 7.4)
1.5M NaCl

PROPK solution:
50mM Tris-HCl (pH 8.5)
50mM EDTA
100mM NaCl
1% v/v Na-lauroyl-sarcosine

10 x RPE buffer:
400mM Tris-HCl (pH 7.5)
60mM MgCl2
20mM spermidine (sodium salt)
100mM NaCl

TEN:
80mM EDTA
40mM Tris-HCl (pH 9.0)
160mM NaCl

10 x Guy's Buffer:
500mM KCl
100mM Tris-HCl (pH 7.5)
15mM MgCl2

10 x DNase I buffer:
250mM Tris-HCl
50mM MgCl2
1mM EDTA (pH 7.2)
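Several of the solutions above are listed at 10 x strength, so the concentrations a reaction actually sees are a tenth of the listed values. The sketch below shows that arithmetic. It assumes each 10 x stock is used at 1 x, which the section does not state for every buffer, and the function and dictionary names are illustrative only.

```python
# 10 x stock compositions in mM, copied from section 2.1.9.
STOCKS_10X_MM = {
    "Guy's Buffer": {"KCl": 500.0, "Tris-HCl (pH 7.5)": 100.0, "MgCl2": 15.0},
    "RPE buffer": {"Tris-HCl (pH 7.5)": 400.0, "MgCl2": 60.0,
                   "spermidine": 20.0, "NaCl": 100.0},
    "DNase I buffer": {"Tris-HCl": 250.0, "MgCl2": 50.0, "EDTA (pH 7.2)": 1.0},
}


def working_concentrations(stock_mm, stock_fold=10.0, working_fold=1.0):
    """Scale every component from stock strength to working strength (mM)."""
    return {name: conc / stock_fold * working_fold
            for name, conc in stock_mm.items()}


def stock_volume_ul(final_volume_ul, stock_fold=10.0, working_fold=1.0):
    """Volume of concentrated stock needed for a given final reaction volume."""
    return final_volume_ul * working_fold / stock_fold


for buffer_name, stock in STOCKS_10X_MM.items():
    print(buffer_name, working_concentrations(stock))

# A 30 µl reaction needs 3 µl of a 10 x buffer to reach 1 x.
print(stock_volume_ul(30.0))
```

The same ratio accounts for the 3µl of 10 x RPE buffer in the 30µl polymerase reaction of section 2.2.14.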

2.1.10 Sources of genomic DNA Total human genomic DNA was a kind gift from Dr Rachel Flomen, while a flask of transformed human cells (from which genomic DNA was isolated) was a kind gift from B. Chopra.

2.1.11 Sources of RNA Fetal liver, heart, and lung total RNA isolated in Dr Nizetic’s laboratory was kindly supplied by Dr Jane Ives. Commercial polyA+ RNA (adult brain, liver, pancreas, and skeletal muscle) were supplied by CLONTECH.

2.1.12 Northern blot A Multiple Tissue Northern Blot (MTN™) HI was supplied by CLONTECH (catalogue number #7760-1)

2.1.13 Host strains used

E. coli DH5-alpha-mcr (BRL Life Technologies) - cosmid library
E. coli DH10B (BRL Life Technologies) - PAC and BAC library
E. coli XL1 Blue (BRL Life Technologies) - laboratory transformations (sub-cloning cosmid DNA fragments and exon trapping derived DNA fragments)

2.1.14 Bacterial clone libraries

ICRF112 - Flow-sorted chromosome 1 cosmid library (Nizetic et al, 1994)
RPCI-1 - PAC library (Ioannou et al, 1994)
Total Human BAC library (Research Genetics) - (Shizuya et al, 1992)

2.1.15 IMAGE consortium cDNA clones used

IMAGE clone ID | Represented | Used in Chapter
153992 | S100A1 | 3
398802 | S100A2 | 3
377441 | S100A3 | 3
342548 | S100A12 | 3
133442 | S100A13 (novel 5' end) | 5
1676497 | Novel EDC assigned cDNA | 4

Table 2.2. IMAGE clones obtained from HGMP Resource Centre, Hinxton, UK.

2.1.16 COS7 cell line COS7 cells, clonal line A6 were used for exon trapping (see chapter 4).

2.2 Methods

2.2.1 Filter spotting and processing (Nizetic and Lehrach, 1995; Ross et al, 1992)

Spotting.
1. An appropriate number of 22cm x 22cm Nunc™ sterile culture plates with 200ml of appropriate agar containing the desired antibiotic were prepared. A single 22.3cm x 22.3cm Hybond N+ membrane was placed onto each agar plate and incubated for at least 20 minutes at room temperature prior to spotting. Care was taken when handling the membrane (with forceps, only at each corner), ensuring that no air bubbles were present between the membrane and the agar.
2. Filters (Hybond N+ membranes) were spotted individually after being removed from the agar and placed on top of two squares (larger than 22cm²) of Whatman on the laboratory bench. The filter was then marked to indicate the orientation (as a general rule well A1 of the microtitre plate would be spotted onto the corner, with four plates spotted per filter).
3. An appropriate spotting tool was submerged into the wells of the desired microtitre plate for a few seconds before being placed directly onto the filter. Pressure was evenly applied to the spotting tool with enough force to mark the filter before being returned to the microtitre plate for subsequent spotting.
4. The filter was then placed, with care, back onto the agar plate ensuring no air bubbles were present. The plate was inverted and incubated overnight at the appropriate temperature.

Processing. 1. The incubated filter was inspected for any abnormal colony growth, which was noted if present.

YAC colony filter lysis:
2. The filter was placed on a single square of Whatman in a tray, pre-wetted with SCE solution containing 0.02mg/ml zymolyase and 14mM β-mercaptoethanol. The tray was sealed and incubated at 37°C overnight.
3. The filter was placed onto a single square of Whatman, pre-wetted with denaturing solution and incubated for 10 minutes, air dried on a fresh square of Whatman, and submerged into 250ml of neutralizing solution for 5 minutes. At this and all other stages of processing, the filter was handled with forceps, taking great care not to disturb any colony growth. In addition, when the filter was placed onto a pre-wetted square of Whatman, care was taken to ensure no air bubbles were present between filter and Whatman.
4. The solution was replaced with a 1:10 dilution of neutralizing solution and the filter incubated for a further 5 minutes.
5. The solution was replaced with 250ml of 1:10 diluted neutralizing solution containing 250µg/ml proteinase K, and incubated for 30-60 minutes at 37°C.
6. The filter was submerged into a 1:10 dilution of neutralizing solution for five minutes.
7. The solution was replaced with 50mM Tris-HCl pH 7.4 and incubation continued for a further five minutes, after which the filter was removed and placed onto a dry square of Whatman.
8. The filter was covered, to prevent dust settling, and dried overnight at room temperature.
9. The filter was cross-linked with 0.12 J/cm² of energy from a UV cross-linker and stored between dry Whatman.

Bacterial colony filter lysis:
2. The filter was placed onto a single square of Whatman, pre-wetted with denaturing solution and incubated for 4 minutes. At this and all other stages of processing the filter was handled with forceps, taking great care not to disturb any colony growth. In addition, when the filter was placed onto a pre-wetted square of Whatman, care was taken to ensure no air bubbles were present between filter and Whatman.
3. The filter was then placed onto a fresh square of Whatman, pre-wetted with denaturing solution, and then moved onto a dry glass plate, standing in a steaming waterbath. The waterbath lid was secured in place and the filter was incubated for four minutes. Care was taken to wipe the lid of the waterbath prior to securing it in place in order to prevent any condensed water falling onto the filter.
4. The waterbath lid was removed with care (preventing any falling water). The filter was removed and placed onto a fresh square of Whatman pre-wetted with neutralizing solution, for a further four minutes.
5. The filter was then placed onto a fresh, dry, square of Whatman and air dried for one minute only.
6. Whilst being held by two adjacent corners, the filter was submerged into 250ml of pre-warmed PROPK solution containing 150µg/ml proteinase K at 37°C.
7. After incubation at 37°C for 30-50 minutes the filter was retrieved with care and placed onto a fresh, dry, square of Whatman.
8. The procedure then continued from step 8 of YAC filter lysis (above).

2.2.2 Restriction enzyme digestion

1. Restriction enzyme digestion of plasmid DNA was carried out in either 10µl or 18µl volumes (depending on the size of the electrophoresis apparatus used to resolve the resulting DNA fragments and unless otherwise indicated), under the following conditions:
1 x restriction enzyme buffer (supplied by enzyme manufacturer)
1µg RNase A
5-20 Units of enzyme (depending on concentration of enzyme supplied)
+/- 1 x BSA (depending on enzyme manufacturer's specification)
Reactions were incubated at 37°C for a minimum of 2 hours.

2. Restriction enzyme digestion of genomic DNA was carried out in a volume of 150µl under the following conditions:
1 x restriction enzyme buffer (supplied by enzyme manufacturer)
15µg RNase A
150 Units of enzyme
+/- 1 x BSA (depending on enzyme manufacturer's specification)
Reactions were incubated at 37°C for a minimum of 12 hours, after which 15µl of NaOAc and 412.5µl of 100% ethanol were added and the DNA was precipitated at -20°C for a minimum of 2 hours. Digested DNA was recovered by spinning in a microfuge at 13,000 rpm for 10 minutes, washed once with 70% ethanol, air dried and reconstituted in 10-18µl of H2O (depending on the required volume for gel electrophoresis).
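The precipitation volumes used for the genomic digest follow a simple ratio: 15µl of NaOAc is one tenth of the 150µl digest, and 412.5µl of ethanol is 2.5 times the combined 165µl. The sketch below reproduces that arithmetic. The ratios are inferred from the volumes quoted above, not stated as a rule in the text, and the function name is hypothetical.

```python
def precipitation_volumes(sample_ul, salt_volumes=0.1, ethanol_volumes=2.5):
    """Return (salt µl, ethanol µl) for an ethanol precipitation.

    salt_volumes is the salt solution added as a fraction of the sample;
    ethanol_volumes is applied to the sample plus salt. Both defaults are
    inferred from the genomic digest volumes in section 2.2.2.
    """
    salt_ul = sample_ul * salt_volumes
    ethanol_ul = (sample_ul + salt_ul) * ethanol_volumes
    # Round to 0.1 µl, finer than a pipette can be set in practice.
    return round(salt_ul, 1), round(ethanol_ul, 1)


salt_ul, ethanol_ul = precipitation_volumes(150.0)
print(f"NaOAc: {salt_ul} µl, ethanol: {ethanol_ul} µl")
```

With a 400µl sample the same ratios give 40µl of salt and 1100µl of ethanol, the volumes used for electroelution (section 2.2.3) and plasmid preparation (section 2.2.12).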

2.2.3 Agarose gel purification of DNA

1. After electrophoresis the agarose gel was stained for a minimum of 10 minutes in 100µl 1% ethidium bromide / 500ml H2O, and washed twice with one litre of H2O for a minimum of 5 minutes.
2. The DNA was visualized using a UV transilluminator (254nm) and the desired bands were excised with a scalpel.

Agarase digestion (LMP agarose)
3. The gel slice was weighed and placed into a 1.5ml eppendorf tube. To an average slice that weighed 250µg, 150µl of H2O, 40µl of 1M NaCl, and 8µl of 0.5M EDTA (pH 8.0) were added. The agarose slice was then melted at 68°C for 15 minutes followed by incubation at 40°C for 5 minutes.
4. After adding 2µl (2U) of β-Agarase, the sample was incubated at 40°C or 37°C for an excess of 8 hours.
5. After a single phenol and chloroform/isoamyl alcohol (24:1) extraction, 360µl of sample was precipitated with 5.4µl of 10mg/ml Dextran T40, 95µl of 1M NaCl, and 1062µl of 100% ethanol. After centrifuging at 13,000 rpm for 10 minutes the pellet was washed with 70% ethanol, dried and resuspended in 20µl H2O.

Electroelution
3. The gel slice, along with 500µl of TE, was placed into a section of dialysis tubing secured at the bottom with a medi-clip. Any air bubbles were removed and the tubing was secured at the top with a second medi-clip.
4. The dialysis tubing, containing the gel slice and TE, was placed back into the original electrophoresis apparatus (containing the original buffer) and electrophoresed with an appropriate current for 40 minutes.
5. 400µl of the DNA-containing TE were then removed from the dialysis tubing and precipitated with 40µl of 3M NaOAc and 1100µl of 100% ethanol at -20°C for a minimum of 2 hours. After centrifuging at 13,000 rpm for 10 minutes the pellet was washed with 70% ethanol, dried and resuspended in an average of 20µl H2O.

Prep-A-Gene®
3. The agarose slice was treated according to the manufacturer's specifications for the Prep-A-Gene® DNA Purification Systems (BioRad) and resuspended in 20µl H2O.

2.2.4 Retrieving single clones from microtitre plates (Nizetic and Lehrach, 1995)

1. Agar plates containing the desired antibiotic and appropriate media were prepared and labeled as needed. A number of 10cm x 10cm squares of Benchkote punctured with single holes (the size of a single microtitre well) were prepared.
2. The desired microtitre plate was placed on dry ice during the course of retrieval to prevent defrosting. A Benchkote square was placed onto the microtitre plate, shiny side up, so that the punctured hole exposed the desired well.
3. A 2mm wide screwdriver was sterilized by first wiping with tissue and then immersing in ethanol, followed by flaming. A small amount of the frozen material was then scraped from the desired well and streaked onto the agar plate, which was subsequently incubated overnight (14 hours) at the appropriate temperature.

2.2.5 Agarose plug minipreps of S. cerevisiae containing YAC clones (adapted from Anand et al, 1989)

1. 15ml of AHC were inoculated with either cells from a single colony grown on a plate, or from a glycerol stock derived from a single colony. The culture was grown at 30°C with shaking for 48 hours.
2. Low melting point agarose was dissolved to a concentration of 2% in spheroplasting solution, cooled to and held at 60°C. β-mercaptoethanol was added to 14mM (1:1000 from stock).
3. The cells were pelleted and washed once in 50mM EDTA pH 8 (3,200 rpm for 20 minutes).
4. The cells were resuspended in 400µl spheroplasting solution / 14mM β-mercaptoethanol / 100µg/ml zymolyase.
5. Plug moulds (supplied with PFGE apparatus) were prepared by rinsing in ethanol, drying and sealing with a non-porous tape. 500µl of the 2% molten agarose were added to the cells, mixed and dispensed into the plug mould. The plugs were allowed to set upon a glass plate standing on ice for at least 30 minutes.
6. Plugs were expelled from the moulds into 5ml of spheroplasting solution / 14mM β-mercaptoethanol / 100µg/ml zymolyase and incubated at 37°C for 2 hours with gentle shaking.
7. The solution was replaced with yeast lysis solution and incubated at 37°C for 1 hour with gentle shaking.
8. The yeast lysis solution was replaced with a further 5ml and incubated at 37°C overnight with gentle shaking.
9. The plugs were then washed three times at room temperature in 5ml of T0.1E and stored at 4°C in 5ml of 0.5M EDTA pH 8.0.

2.2.6 Pulsed Field Gel Electrophoresis (Schwartz and Cantor, 1984)

Separation of agarose plug mini-preps of YAC DNA.
1. One third of a YAC plug was washed as follows:
T0.1E: 3 x 30 minutes, 37°C
T0.1E: 3 x 30 minutes, 55°C
2. YAC plugs were separated on 1% agarose (LMP or 'rapid' agarose) in 0.5xTBE buffer using CHEF-DR® II apparatus (BIO-RAD) under the following conditions:
Voltage: 6V
Pulse switch time: 60-90 seconds
Duration: 24 hours

Separation of PAC clone insert DNA.
2. NotI restriction enzyme digested PAC DNA was separated on 1% LMP agarose in 0.5xTBE buffer using CHEF-DR® II apparatus (BIO-RAD) under the following conditions:
Voltage: 5.2V
Pulse switch time: 3-15 seconds
Duration: 14 hours

2.2.7 Southern blotting (Southern, 1975)

1. The agarose gel containing the electrophoresed DNA was nicked with 0.25M HCl for 20 minutes.
2. The gel was neutralized with large volumes of distilled water and placed in 0.4M NaOH for 15 minutes.
3. Transfer to filter was carried out in 0.4M NaOH for an excess of 12 hours.
4. The filter was washed with 2 x SSC, dried, wrapped in cling film, and stored at 4°C.

2.2.8 Radio-labeled probe preparation

1. 5µl of 5xRPLB was added to 17.5µl of either TE or H2O containing approximately 25ng of the DNA to be labeled.
2. The DNA mix was denatured in a heating block at 101°C for 5 minutes and snap chilled on ice. The following were then added:
α-32P dCTP (3000Ci/mM): 1.5µl
Klenow subfragment of DNA polymerase (1U/µl): 1.0µl
3. The reaction was mixed by fingertipping and placed at either 37°C for 30 minutes or at room temperature for an excess of 14 hours.
4. Probe DNA was purified from unincorporated nucleotides by ethanol precipitation. The following were added to the reaction:
200mM EDTA (pH 8.0): 10µl
tRNA (10mg/ml): 3µl
H2O: 15µl
1.6M NaCl: 8µl
100% ethanol: 154µl
5. The additional components were mixed thoroughly and placed on dry ice for 15 minutes before centrifuging at 13,000rpm for 10 minutes. The pellet was resuspended in an appropriate volume of H2O.
6. A small proportion of the probe (less than 1%) was removed and monitored in a scintillation counter to assess the incorporation of labeled nucleotide.
7. The remainder of the probe was denatured in a heating block at 101°C for 5 minutes, immediately prior to hybridization.
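The volumes in steps 1, 2 and 4 can be checked by simple addition. The sketch below totals the labelling reaction, confirms that the 5 x buffer ends up at 1 x, and works out the NaCl concentration and the ethanol ratio at the precipitation step. Variable names are illustrative, and the derived figures are arithmetic on the volumes listed above, not values reported in the thesis.

```python
# Volumes in µl, taken from steps 1, 2 and 4 of section 2.2.8.
LABELLING_UL = {
    "5 x RPLB": 5.0,
    "DNA in TE or H2O": 17.5,
    "alpha-32P dCTP": 1.5,
    "Klenow": 1.0,
}
PRECIPITATION_UL = {
    "200mM EDTA": 10.0,
    "tRNA (10mg/ml)": 3.0,
    "H2O": 15.0,
    "1.6M NaCl": 8.0,
}
ETHANOL_UL = 154.0

reaction_ul = sum(LABELLING_UL.values())
# Strength of the labelling buffer once diluted into the full reaction.
rplb_fold = 5.0 * LABELLING_UL["5 x RPLB"] / reaction_ul

aqueous_ul = reaction_ul + sum(PRECIPITATION_UL.values())
# NaCl molarity in the aqueous phase before ethanol is added.
nacl_molar = 1.6 * PRECIPITATION_UL["1.6M NaCl"] / aqueous_ul
ethanol_volumes = ETHANOL_UL / aqueous_ul

print(f"labelling reaction: {reaction_ul:.1f} µl at {rplb_fold:.1f} x RPLB")
print(f"aqueous phase: {aqueous_ul:.1f} µl, NaCl {nacl_molar:.2f} M")
print(f"ethanol added: {ethanol_volumes:.2f} volumes")
```

The reaction totals 25µl, so 5µl of 5 x RPLB gives 1 x buffer, and the ethanol added is close to 2.5 volumes of the 61µl aqueous phase.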

2.2.9 Suppression of probe sequence over-represented within the genome

1. The labeled probe was resuspended in 111.2µl H2O (step 5, section 2.2.8) and 1µl was removed to assess incorporation (step 6, section 2.2.8).
2. The following components were added:
1M NaPi (pH 7.2): 17µl
Total human placental DNA (10.8mg/ml): 9.8µl
3. Prior to hybridization, the components were mixed, denatured in a heating block at 101°C for 5 minutes, and immediately transferred to 65°C for a period of 2-5 hours.
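The final composition of the suppression mix is not stated, but it follows from the volumes above. The sketch below derives it, assuming that nothing else is added and that volumes are additive. The function name and the returned keys are hypothetical.

```python
def suppression_mix(probe_ul=111.2, removed_ul=1.0,
                    napi_ul=17.0, napi_stock_molar=1.0,
                    dna_ul=9.8, dna_stock_mg_per_ml=10.8):
    """Total volume and final concentrations of the 2.2.9 suppression mix."""
    total_ul = (probe_ul - removed_ul) + napi_ul + dna_ul
    return {
        "total_ul": total_ul,
        "napi_molar": napi_stock_molar * napi_ul / total_ul,
        "placental_dna_mg_per_ml": dna_stock_mg_per_ml * dna_ul / total_ul,
    }


mix = suppression_mix()
print(f"total volume: {mix['total_ul']:.1f} µl")
print(f"NaPi: {mix['napi_molar']:.3f} M")
print(f"placental DNA: {mix['placental_dna_mg_per_ml']:.2f} mg/ml")
```

On these figures the mix is 137µl, with NaPi at roughly 0.12M and placental DNA at roughly 0.77mg/ml.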

2.2.10 Hybridization of membranes (filters) containing Southern blotted DNA or high/low density gridded colony DNA

1. Filters were wetted in 6 x SSC, rolled up and placed into the appropriate sized bottle. Where a filter was large enough to roll up on itself, or if more than one filter was being hybridized in a single bottle, a nylon mesh was placed between any surface, including that of the bottle, that came into contact with the DNA side of the filter. In this instance the mesh and all the filters were wetted in 6 x SSC together.
2. An appropriate volume of prehybridization solution (Church or Denhardts) was added to the bottle, which was rotated for a minimum of 1 hour inside a hybridization oven (Stewart Scientific) at 65°C. When a filter was being hybridized for the first time, total human placental DNA was denatured for 5 minutes at 101°C and then added to the prehybridization solution, to an average concentration of 10µg/ml.
3. The radio-labeled probe was added to an appropriate volume of fresh hybridization solution (Church or Denhardts). The prehybridization solution was poured off the filter and the fresh hybridization solution containing the radio-labeled probe was added. Hybridization was carried out overnight (12-18 hours) at 65°C.
4. The filters were washed flat, in a plastic box, in the following way:
2xSSC / 0.1% w/v SDS: room temperature, 15 minutes
0.2xSSC / 0.1% w/v SDS: 65°C, 40 minutes
The quantity of wash solution varied depending on the size of the plastic box used. As a rule up to 7 large filters (22cm x 22cm) were washed with a minimum of 1 litre, with more filters requiring a greater volume. Smaller filters were washed with around 250-750ml of solution depending on the size of the plastic box used.
5. Filters were removed from the plastic box and excess wash solution was allowed to evaporate or soak into 3MM Whatman, where the filters were laid DNA side up. On no account were the filters allowed to dry out completely. Filters were then placed onto a slightly larger piece of cut Benchkote, DNA side up, and wrapped in cling film.
6. Auto-radiography was performed with X-ray film or PhosphorImager screens (Molecular Dynamics).

2.2.11 Polymerase Chain Reaction (PCR) amplification (Saiki et al, 1985 and 1987)

1. Each reaction consisted of (50µl total volume):
5µl 10 x PCR buffer (supplied with enzyme)
1µl 10mM dNTPs
1µl primer 1 (20pmole/µl)
1µl primer 2 (20pmole/µl)
39µl H2O
1µl SUPER TAQ polymerase (HT Biotech)
2µl template DNA
A premix of all components was made for each reaction before adding template DNA, except in the case of a 'hot start', where a premix was made consisting of all components except SUPER TAQ before adding template DNA. In this case reactions (minus SUPER TAQ) were incubated at 95°C for 5 minutes and chilled on ice before adding the SUPER TAQ. Section 2.1.4 describes the specific cycles and temperatures used for each primer, as well as whether a 'hot start' was used.
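The premix described above scales linearly with the number of reactions. The sketch below computes premix volumes for a batch, leaving out SUPER TAQ when a 'hot start' is used. The 10% overage is an assumption added here to allow for pipetting loss and is not taken from the thesis, and the function name is hypothetical.

```python
# Per-reaction volumes in µl from section 2.2.11 (template is added separately).
PER_REACTION_UL = {
    "10 x PCR buffer": 5.0,
    "10mM dNTPs": 1.0,
    "primer 1 (20pmole/µl)": 1.0,
    "primer 2 (20pmole/µl)": 1.0,
    "H2O": 39.0,
    "SUPER TAQ polymerase": 1.0,
}
TEMPLATE_UL = 2.0


def premix_volumes(n_reactions, overage=0.1, hot_start=False):
    """Premix volumes (µl) for n reactions.

    overage is an assumed allowance for pipetting loss. With hot_start=True
    the polymerase is omitted, since it is added after the 95°C incubation.
    """
    scale = n_reactions * (1.0 + overage)
    return {
        name: round(volume * scale, 1)
        for name, volume in PER_REACTION_UL.items()
        if not (hot_start and name == "SUPER TAQ polymerase")
    }


print(premix_volumes(12))
print(premix_volumes(12, hot_start=True))
```

Each reaction then receives 48µl of complete premix (47µl for a 'hot start', with 1µl of SUPER TAQ added later) and 2µl of template to reach the 50µl total.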

2.2.12 Preparation of plasmid DNA (modified alkaline lysis protocol; Sambrook et al, 1989)

Plasmid DNA under 50kb in size:
1. 4ml of 2-YT broth supplemented with the appropriate antibiotic were inoculated from either a single colony or a glycerol stock derived from a single colony. The culture was then grown overnight, an average of 14 hours, at 37°C in a rotating incubator at 200rpm.
2. The cells were transferred to a single 2ml eppendorf tube and harvested in two sequential spins of 9,500rpm in a benchtop microfuge. The media was removed completely and replaced with 150µl of solution AL1.
3. The cells were gently resuspended and placed on ice.
4. 300µl of solution AL2 were added and mixed by gentle inversion. The tube was incubated on ice for 5 minutes.
5. 225µl of pre-chilled solution AL3 (on ice) were added and mixed by very gentle inversion. Care was taken to aggregate the precipitated cell debris when inverting. The sample was left on ice for any period of time between 15 minutes and 4 hours.
6. The sample was spun at 13,000rpm for in excess of 15 minutes.
7. Two volumes of 630µl of the resulting supernatant were removed and placed in separate tubes. To each tube 1260µl (equal to two volumes) of 100% ethanol were added. The plasmid DNA was precipitated by leaving the two tubes at one of the following temperatures for the corresponding minimum time period:
-20°C: 2 hours
-70°C: 45 minutes
8. The DNA was pelleted at 13,000rpm for 10 minutes, washed with 70% ethanol and dried.
9. The DNA in each of the two tubes was resuspended with 225µl of H2O and pooled.
10. An equal volume of phenol (450µl) was added and mixed by vortexing for 1 minute.
11. The sample was centrifuged at 13,000rpm for 3 minutes. 430µl of the aqueous phase was removed and to this an equal volume (430µl) of chloroform/isoamyl alcohol (24:1) was added and mixed by vortexing for 1 minute.
12. The sample was again centrifuged at 13,000rpm for 3 minutes. 400µl of the aqueous phase was removed and placed in a fresh tube. Care was taken not to remove any of the organic phase when removing the aqueous phase in both steps 11 and 12.
13. 40µl of 3M NaOAc and 1,100µl of 100% ethanol were added to the sample to precipitate the DNA at one of the two temperatures and minimum time periods indicated in step 7.
14. The DNA was pelleted at 13,000rpm for 10 minutes, washed once with 70% ethanol, dried, and resuspended in 40µl of H2O.

The DNA yield varied depending on size and copy number of the plasmid. 4µl (10%) of the plasmid DNA preparation was digested with a single restriction enzyme or a combination of restriction enzymes and resolved, along with size/quantity standards, on an appropriate agarose gel. This gave an accurate determination of quantity and quality. Where more plasmid DNA was required, the initial volume of broth (step 1) and the subsequent solutions (steps 2 through to 9) were scaled up accordingly. It was found that harvesting overnight cultures of more than 5ml of inoculated broth required an increase in AL1 solution (step 2) to up to 300µl to avoid a loss in DNA preparation quality.

Plasmids over 40kb in size.
The method was essentially the same as above except that greater care was taken when handling the DNA in order to reduce shearing. Changes are indicated at the following steps:
3. The cells were resuspended with either a large blue tip or a yellow tip whose end had been cut off obliquely with a scalpel to increase the diameter of the channel that the cells passed through.
9. 225µl of H2O were added to each tube and left for 15 minutes at room temperature. The DNA was resuspended by gentle fingertipping.
10 and 11. Mixing was achieved by rapid inversion for 3 minutes, rather than by vortexing for 1 minute.
14. The DNA was resuspended by adding 40µl of H2O and leaving for 15 minutes at 37°C or overnight at 4°C, followed by gentle fingertipping. Manipulation of the final, concentrated, DNA preparation was undertaken using a large blue tip or a yellow tip which had its end cut obliquely with a scalpel to increase the channel size that the large DNA molecule passed through. This was found to decrease shearing of the DNA molecule.

2.2.13 Glycerol stock production

Where a new clone had been produced or isolated from a library, a glycerol stock was made in the following way:
1. 800µl of overnight culture was transferred to a 2ml cryovial (Gibco BRL).
2. 200µl of 75% glycerol was added and mixed. Vials were labeled, flash-frozen on dry ice and stored at -80°C for further use.
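The final glycerol concentration of the stock follows directly from the two volumes. A minimal sketch of the dilution, with a hypothetical function name:

```python
def final_glycerol_percent(glycerol_ul=200.0, glycerol_stock_percent=75.0,
                           culture_ul=800.0):
    """Glycerol concentration (%) after mixing glycerol stock with culture."""
    return glycerol_stock_percent * glycerol_ul / (glycerol_ul + culture_ul)


print(f"{final_glycerol_percent():.1f}% glycerol")
```

With 200µl of 75% glycerol in a 1ml total, the stock is frozen at 15% glycerol.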

2.2.14 Preparation of probes from bacterial clones

PAC and BAC insert DNA probes
1. Approximately 0.75µg of PAC or BAC DNA was digested with NotI restriction enzyme in three separate 15µl volumes (see section 2.2.2).
2. All three digested DNA reactions were resolved using PFGE (section 2.2.6) in the presence of size markers (1kb ladder, Life Technologies; λ-ladder, BioRad; HMW marker, Life Technologies).
3. PAC vector DNA is approximately 17kb in size, while BAC vector DNA is approximately 7kb in size. DNA fragments of a different size to the vector fragment were isolated and purified using the agarase method (section 2.2.3).
4. 5µl from 20µl of the purified insert DNA fragments were used in subsequent preparation of radio-labeled probes (section 2.2.8).

Cosmid Sau3AI probe preparation
1. Approximately 2µg of cosmid DNA was digested with Sau3AI restriction enzyme as described (section 2.2.2), and resolved on 20cm long 0.8% agarose gels under standard conditions.
2. Sau3AI restriction enzyme digests the cosmid vector, Lawrist 4, into multiple DNA fragments all smaller than 900bp (including either side of the multiple cloning site). Therefore, any Sau3AI restriction fragments larger than 900bp are presumed to be of insert DNA origin. DNA fragments larger than 1kb were excised from the agarose gel and purified using the electroelution method (section 2.2.3).
3. 5µl of purified DNA fragments were resolved on a 0.8% agarose gel in order to determine the concentration. 25ng of DNA were used in radio-labeled probe preparation as described (section 2.2.8).

T7 and SP6 RNA polymerase extension end probe preparation (adapted from Gross and Little, 1986)

1. The following reaction was set up, in the order given and using components suspended in dEPC treated H2O wherever possible:
1µg mini-prep plasmid DNA (PAC, BAC or cosmid)
12.5µl dEPC treated H2O (minus the volume of plasmid DNA, depending on concentration)
3µl 10 x RPE buffer
1µl RNasin (Promega)
2µl 100mM DTT
1µl 10mM rNTPs
8µl Yeast tRNA (10mg/ml)
0.5µl Polymerase (T7/SP6, Promega)
2µl α-32P dUTP (3000Ci/mM)
2. Reactions were incubated at 37°C for 75 minutes, after which the following was added:
3µl 0.5M EDTA
6.4µl 2M NaCl
21.6µl dEPC treated H2O
154µl 100% ETOH
3. Reactions were then precipitated on dry ice for 15 minutes, centrifuged at 13,000 rpm for 10 minutes and resuspended in an appropriate volume of dEPC treated H2O. A small proportion of the resuspended probe preparation was removed and monitored in a scintillation counter to assess the incorporation of labeled nucleotide.
4. Resuspended probe preparation was added directly to the hybridization solution.
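
The volumes listed in steps 1 and 2 can be checked with a short tally. The sketch below assumes that the plasmid DNA and the dEPC treated H2O together make up 12.5µl (as the bracketed note in step 1 suggests); the variable names are illustrative only.

```python
# Volumes in microlitres, copied from steps 1 and 2 of the protocol.
# Assumption: plasmid DNA plus water together make up 12.5 ul.
labelling_mix = {
    "plasmid DNA + water": 12.5,
    "10x RPE buffer": 3.0,
    "RNasin": 1.0,
    "100 mM DTT": 2.0,
    "10 mM rNTPs": 1.0,
    "yeast tRNA": 8.0,
    "polymerase": 0.5,
    "labelled nucleotide": 2.0,
}
stop_mix = {"0.5 M EDTA": 3.0, "2 M NaCl": 6.4, "water": 21.6}
ethanol_ul = 154.0

reaction_ul = sum(labelling_mix.values())
aqueous_ul = reaction_ul + sum(stop_mix.values())

# Final strength of the 10x buffer in the labelling reaction
buffer_fold = 10 * labelling_mix["10x RPE buffer"] / reaction_ul
# NaCl molarity in the aqueous phase just before ethanol is added
nacl_molar = 2.0 * stop_mix["2 M NaCl"] / aqueous_ul
# Ethanol expressed as multiples of the aqueous volume
ethanol_volumes = ethanol_ul / aqueous_ul

print(f"labelling reaction: {reaction_ul:.1f} ul, buffer at {buffer_fold:.1f}x")
print(f"before ethanol: {aqueous_ul:.1f} ul, NaCl about {nacl_molar:.2f} M")
print(f"ethanol added: about {ethanol_volumes:.1f} volumes")
```

Under that assumption the labelling reaction totals 30µl, so the 10 x buffer is at 1 x, and the ethanol added corresponds to roughly 2.5 volumes of the 61µl aqueous phase.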

2.2.15 Dot blot construction

1. 2µg of plasmid DNA were denatured in a volume of 15µl of 0.2M NaOH for 15 minutes at room temperature.
2. 3µl was spotted onto Hybond N+ nylon membrane, five times to produce five separate dot blots, and dried at room temperature (the volume in step 1 can be altered depending on the number of dot blots required).
3. Hybond N+ containing DNA was placed on a single sheet of Whatman paper, pre-wetted with denaturing solution, for 5 minutes. Care was taken to ensure there were no air bubbles between the membrane and the Whatman paper.
4. The Hybond N+ membrane (filter) was transferred to a single sheet of Whatman paper, pre-wetted with neutralizing solution, and incubated for 5 minutes.
5. The membrane was floated, DNA side up, on 100mM NaPi pH 7.2 and then dried on a single sheet of Whatman paper at room temperature.
6. The dried membrane was cross-linked with 0.12 J/cm2 of energy from a UV cross-linker and stored between dry sheets of Whatman paper.

2.2.16 Competent cell preparation (adapted from Sambrook et al, 1989)

1. 15ml of LB was inoculated with either a fresh single colony or a glycerol stock derived from a single colony of XL1-Blue cells and incubated at 37°C with shaking (200rpm) overnight.
2. The overnight culture was used to inoculate a fresh 100ml of LB, which was incubated at 37°C with shaking (200rpm) until the OD at 600nm reached between 0.4 and 0.48.
3. The culture was split into two and centrifuged at 4,000rpm, 4°C for 10 minutes.
4. The supernatant was decanted and each cell pellet was resuspended in 10ml of 0.1M CaCl2 and placed on ice for 1 hour.
5. Cells were again centrifuged at 4,000rpm, 4°C for 10 minutes. The supernatant was decanted and the pelleted cells were resuspended in 2ml of 0.1M CaCl2 / 15% v/v glycerol and placed on ice.
6. 1.5ml eppendorf tubes were pre-chilled on dry ice. 200µl aliquots of cells were flash frozen in the pre-chilled eppendorf tubes and stored at -80°C.

2.2.17 Transformation of competent cells (adapted from Sambrook et al, 1989)

1. One or more 200µl aliquots of competent cells were defrosted on ice. While this was taking place the transformation DNA was also placed on ice.
2. 50-100µl of competent cells were added to the transformation DNA and incubated on ice for 30 minutes.
3. The reactions were heat shocked at 42°C for 90 seconds and placed on ice for a further 2 minutes.
4. 450-950µl of Superbroth were added to each reaction, which was then placed at 37°C for 1 hour.
5. An appropriate volume of the reactions was plated onto either LB or 2-YT plates (containing the desired antibiotic concentration) and incubated at 37°C for at least 14 hours.

2.2.18 Tissue culture of COS7 cells

1. A vial of frozen cells stored in liquid N2 was thawed quickly (under a running tap).
2. The cells were transferred to a 15ml falcon tube and centrifuged at 1100rpm for 5 minutes.
3. The supernatant was removed and the cells were resuspended in 2ml of Modified Eagle Medium, supplemented with 10% fetal bovine serum, 2% L-Glutamine and 2% Penicillin-Streptomycin.
4. Cells were again centrifuged and resuspended as before (steps 2 and 3).
5. Cells were added to 7ml of the above media, in a 75cm2 flask pre-warmed at 37°C. The lid was loosened to allow access of CO2 and the cells were incubated at 37°C for at least 2 days with 10% CO2.
6. Once a confluent monolayer of cells was present within the flask, the cells were divided as follows:
Media was removed and cells were washed once in PBS (ICN Biomedicals Inc.).
2ml of trypsin (10X in solution, Sigma Chemical Co.) was washed over the monolayer and removed.
1ml of trypsin was added and left in contact with the monolayer for 1 to 2 minutes.
Cells were loosened by tapping the flask and then washed off with 5ml of pre-warmed media.
The suspension of cells was divided into the required number of flasks, containing pre-warmed media as before (step 5).

2.2.19 Exon Trapping (Duyk et al., 1991, Buckler et al., 1992, Church et al., 1994)

2.2.19.1 Preparation of pSPL3 vector

1. pSPL3 vector DNA was prepared from a bacterial clone containing the plasmid using a QIAGEN midi-tip according to the manufacturer's specifications.
2. 5µg of pSPL3 was completely digested with 50 units of BamHI restriction enzyme, according to the manufacturer's instructions.
3. The digested vector was phenol/chloroform:isoamylalcohol (24:1) extracted, ethanol precipitated (for details see 2.2.12, steps 10-13) and resuspended in 17µl of H2O.
4. 1U of alkaline phosphatase along with the appropriate buffer was added to the sample, which was incubated at 37°C for 45 minutes followed by heat inactivation at 68°C for 10 minutes. The DNA was then phenol/chloroform:isoamylalcohol extracted, ethanol precipitated and resuspended in 50µl of H2O.
5. 1µl of the sample was resolved on a 0.8% agarose gel alongside a size/weight DNA marker to determine the concentration (DNA was visualized as in section 2.2.3, steps 1+2).
6. Three test ligations were carried out with 25ng of the digested, phosphatased vector. The first contained vector, ligation buffer, H2O and 10mM rATP only. The second contained vector, ligation buffer, H2O, 10mM rATP and 1U of ligase. The third reaction contained vector, ligation buffer, H2O, 10mM rATP, 1U of ligase and 1U of polynucleotide kinase. Ligations were carried out overnight at room temperature.
7. Competent cells were transformed with the individual ligations. Polynucleotide kinase rephosphorylates the ends of the vector, while ligase only ligates phosphorylated ends. Test ligations indicated the presence of undigested or unphosphatased vector.

2.2.19.2 Preparation of bacterial clone insert DNA

1. 15µl of plasmid DNA (PAC) was digested with either Sau3AI (partially) or with both BamHI and BglII (fully). Five separate 20µl reactions containing the appropriate buffers and 2µg of RNase were set up with 3µl of plasmid in each. For the Sau3AI partial digestions, 0.5 units of enzyme were added to each of the five reactions, which were incubated for one hour only at 37°C before heat inactivation took place. For the BamHI/BglII complete digestions, 10 units of each enzyme were added to each reaction and incubated for at least 14 hours at 37°C before heat inactivation.
2. Like reactions were pooled, ethanol precipitated and resuspended in 20µl H2O. 2µl of the BamHI/BglII digested sample were electrophoresed on a 0.8% agarose gel in order to determine the quality and quantity. All of the 20µl Sau3AI partially digested sample was run on a 0.8% agarose gel in the following way: 3µl of the sample was electrophoresed next to size standards (1kb, Life Technologies) while the remaining 17µl of the sample was electrophoresed in a neighboring lane of the gel. This was carried out to determine a 3-5kb window by visualizing only the 3µl of sample, so as not to expose the remaining 17µl to UV light, which impedes cloning considerably. The 3-5kb window from the portion of the gel containing the 17µl of sample was excised from the gel and purified using the Prep-A-Gene® protocol (see 2.2.3). The purified Sau3AI sample was resuspended in 30µl, of which 3µl was electrophoresed on a 0.8% agarose gel in order to determine quantity and quality.

2.2.19.3 Ligation and transformation

1. Ligation reactions were set up with 30ng of prepared insert DNA (Sau3AI partial or BamHI/BglII full) and 10ng of prepared pSPL3 vector in a 10µl volume of 1 x ligation buffer, 10mM rATP and 0.5 units of ligase. Reactions were left for 14 hours at room temperature.
2. 100µl of competent cells were transformed with the ligation reactions as described previously, except that all the transformed cells were plated onto one plate only (see 2.2.17).
3. 2ml of LB containing 100µg of carbenicillin were added to the agar plates in order to resuspend the colonies, which were first scraped off.
4. The resuspended cells were added to 20ml of LB containing 50µg/ml carbenicillin and placed in a rotating incubator at 37°C / 200rpm overnight.
5. Plasmid DNA was isolated using a QIAGEN midi-tip (according to the manufacturer's specifications) and resuspended in 50µl of H2O. 3µl of the resulting sample was electrophoresed on a 0.8% agarose gel in order to assess quantity and quality.
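
The ligation in step 1 is specified by mass (30ng insert, 10ng vector). A rough conversion to a molar insert:vector ratio can be sketched as below. The average mass of about 650 g/mol per base pair is a common approximation, and the fragment lengths used in the example (a 4kb insert from the 3-5kb window and a 6kb vector) are hypothetical values chosen for illustration; they are not stated in this protocol.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA


def picomoles(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_to_vector_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Return the molar ratio of insert molecules to vector molecules."""
    return picomoles(insert_ng, insert_bp) / picomoles(vector_ng, vector_bp)


# Hypothetical lengths: 4000 bp insert, 6000 bp vector (illustration only)
ratio = insert_to_vector_ratio(30, 4000, 10, 6000)
print(f"insert:vector molar ratio is about {ratio:.1f}:1")
```

Because the average base-pair mass cancels in the ratio, only the masses and lengths matter; shorter inserts from the same 30ng give a higher molar excess over vector.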

2.2.19.4 Electroporation

1. A sufficient number of 500ml tissue culture flasks of COS7 cells were cultured to generate one flask per sample plus a positive and negative control. Cells were cultured until they were approximately 75% confluent. Cells were then trypsinized as described and pelleted at 1100 rpm for 5 minutes, washed in PBS (ICN Biomedicals Inc.) and repelleted.
2. Cells were resuspended in 700µl of cold PBS (ICN Biomedicals Inc.) and added to 100µl + 20µg of transformant DNA in pre-cooled 0.4cm cuvettes. The cuvettes were placed on ice for 10 minutes and gently resuspended before electroporation was carried out under the following parameters:
Voltage 1.2kV
Capacitance 25µF
Resistance 100Ω
Pulse time 0.5-1 second
3. Cells were placed on ice for a further 10 minutes before being transferred to a 500ml tissue culture flask containing 10ml of pre-warmed media. Cells were incubated for 72 hours with a change of media after 24 hours.

2.2.19.5 Reverse transcription and PCR

1. Electroporated cells, grown for 72 hours, were trypsinized and pelleted as described. The supernatant was removed and the pelleted cells resuspended in 100µl of RNAzol B (Biogene) before being transferred to a 1.5ml eppendorf tube and placed on ice for 1 minute.
2. 10µl of chloroform was added and the reaction was vortexed for 20 seconds and placed on ice for 10 minutes.
3. The reaction was centrifuged at 13,000rpm for 3 minutes, after which the aqueous phase was removed to a fresh 1.5ml eppendorf tube.
4. The RNA was precipitated by adding an equal volume of isopropanol and placed at 4°C overnight.
5. The RNA was pelleted at 15,000g / 4°C for 15 minutes. The supernatant was removed and the pellet was washed briefly with 80% v/v ethanol and dried at room temperature for 5 minutes.
6. The pellet was resuspended in 20µl of dEPC treated H2O.

7. RNA was reverse transcribed by first adding:
7µl 5 x RT buffer
4µl 10mM dNTPs
1µl RNasin (10U/µl)
1µl primer SA2 (100ng/µl)
The reaction was incubated at 65°C for 10 minutes and then placed on ice. 2µl of M-MLV reverse transcriptase was added before incubation at 42°C for 90 minutes.
8. Half of the reaction (17.5µl) was removed and added to:
5µl 10 x Guy’s Buffer
1µl 10mM dNTPs
2.5µl primer SD6 (100ng/µl)
2.5µl primer SA2 (100ng/µl)
1µl Promega Taq
20.5µl H2O
9. Primary PCR was carried out under the following conditions:
6 cycles of:
94°C 1 minute
55°C 1 minute
72°C 5 minutes
then held at 55°C
10. 2µl (20 units) of the restriction enzyme BstXI was added and the reaction was incubated at 55°C overnight.
11. A further 0.5µl (5 units) of BstXI was added and incubation was continued for a further 2 hours at 55°C.
12. A secondary, or nested, PCR was set up in the following way:
10µl primary PCR reaction
5µl 10 x Guy’s Buffer
1µl 10mM dNTPs
2.5µl primer SD2 (100ng/µl)
2.5µl primer SA4 (100ng/µl)
29µl H2O
Cycles were carried out as follows:
30 cycles of:
94°C 1 minute
55°C 1 minute
72°C 3 minutes
13. The PCR reaction was ethanol precipitated to remove any excess nucleotides and primers and resuspended in 20µl.
14. 3µl of the secondary PCR was electrophoresed on a 1% agarose gel in the presence of size and weight standards to determine the concentration and quality.

2.2.19.6 Sub-cloning secondary PCR products

1. Secondary PCR products were cloned using the CLONEAMP® pAMP1 System (Life Technologies) in the following way:
3µl Secondary PCR product
1µl pAMP1 vector
0.5µl 10 x Guy’s Buffer
0.5µl UDG (Uracil DNA Glycosylase)
The reactions were placed at 37°C for 40 minutes before transformation of competent cells (see 2.2.17), which were plated out on LB agar containing 50µg/ml carbenicillin, 20µg/ml IPTG and 20µg/ml X-gal (for blue/white screening) and incubated overnight at 37°C.
2. White colonies (insert containing) were used for subsequent analysis.

2.2.19.7 Generation of 3rd PCR for library screening

1. The following reactions were set up:
36µl H2O
5µl 10 x PCR buffer (supplied with enzyme)
1µl EX1 primer
1µl EX2 primer
1µl 10mM dNTPs
1µl SUPER TAQ polymerase
3µl secondary PCR product (section 2.2.19.5)
Cycles are given in section 2.1.4.
2. 10% of the resulting PCR amplification was resolved with 2% agarose gel electrophoresis. 25ng (on average 1µl of 3rd PCR) was used in subsequent radio-labeled probe preparation (see 2.2.8).

2.2.20 Sequencing template preparation

PCR amplified DNA
1. PCR reactions were ethanol precipitated by adding 1/10 volume of 3M NaOAc and 2.5 volumes of 100% ETOH, and leaving at one of the following temperatures for the given time period:
dry ice, 15 minutes
-80°C, 40 minutes
-20°C, 2 hours
2. Precipitated DNA was centrifuged at 13,000 rpm for 10 minutes, washed once in 70% ethanol, dried at room temperature and resuspended in 20µl H2O.
3. 5µl of the precipitated DNA was resolved with appropriate electrophoresis in the presence of size markers to determine the concentration. 100ng were used as template for sequencing reactions using BigDye™ terminator cycle sequencing according to the manufacturer’s specifications (PE Applied Biosystems). Sequencing was performed on an ABI PRISM™ 310 Genetic Analyzer, according to the manufacturer’s specifications.

Plasmid DNA preparation
1. Where a plasmid was over 15kb in size (such as a PAC or cosmid), approximately 8µg of plasmid DNA was restriction enzyme digested (either EcoRI, BamHI, or HindIII) in a volume of 100µl with 50-100 units of enzyme. Where a plasmid was 15kb or less (such as a sub-cloned cosmid fragment), purification proceeded from step 2, without restriction enzyme digestion.
2. Restriction enzyme digested fragments were then purified using the Wizard® DNA Clean-Up System (Promega), according to the manufacturer’s specifications.
3. Purified DNA was ethanol precipitated by adding 1/10 volume of 3M NaOAc and 2.5 volumes of 100% ETOH, and leaving at one of the following temperatures for the given time period:
dry ice, 15 minutes
-80°C, 40 minutes
-20°C, 2 hours
4. Precipitated DNA was centrifuged at 13,000 rpm for 10 minutes, washed once in 70% ethanol, dried at room temperature and resuspended in 20µl H2O.
5. 5µl (approximately 1-2µg) of the resulting DNA was used as template for sequencing reactions using BigDye™ terminator cycle sequencing according to the manufacturer’s specifications (PE Applied Biosystems). Sequencing was performed on an ABI PRISM™ 310 Genetic Analyzer, according to the manufacturer’s specifications.
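
The ethanol precipitation used throughout this section (1/10 volume of 3M NaOAc followed by 2.5 volumes of 100% ETOH) can be sketched as a small volume calculation. The sketch assumes that the 2.5 volumes of ethanol are calculated on the sample volume after the salt has been added, which matches the worked volumes given in section 2.2.21 (150µl sample, 15µl NaOAc, 412.5µl ethanol); the function name is illustrative only.

```python
def ethanol_precipitation_volumes(sample_ul, salt_fraction=0.1, ethanol_volumes=2.5):
    """Return (salt volume, ethanol volume) in microlitres for a given sample volume.

    Assumes ethanol is measured against the sample volume after salt addition.
    """
    salt_ul = sample_ul * salt_fraction
    ethanol_ul = (sample_ul + salt_ul) * ethanol_volumes
    return salt_ul, ethanol_ul


salt_ul, ethanol_ul = ethanol_precipitation_volumes(150)
print(f"3M NaOAc: {salt_ul:.1f} ul, 100% ethanol: {ethanol_ul:.1f} ul")
```

For a 100µl restriction digest, the same calculation gives 10µl of 3M NaOAc and 275µl of ethanol.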

2.2.21 Genomic DNA preparation

1. Tissue fragments (approximately 2g) or cells (confluent 750ml flask) were suspended in 20ml of TEN contained in a 50ml falcon tube.
2. 1ml of 20% SDS was added and mixed by gentle inversion.
3. 2mg of RNase were added to the falcon tube, which was rotated at a gentle speed for 10 minutes at 37°C.
4. 10mg of proteinase K was added and the tube was rotated at a gentle speed, at 37°C, for a minimum of 24 hours.
5. An equal volume of phenol was added, mixed and rotated at a gentle speed for 1 hour at room temperature.
6. The sample was centrifuged at 4000rpm for 15 minutes at room temperature.
7. The aqueous phase was removed with care to avoid any interphase deposits.
8. Another equal volume of phenol was added to the sample, which was again rotated at room temperature for 1 hour and then centrifuged at 4000rpm for 15 minutes.
9. To the removed aqueous phase an equal volume of chloroform/isoamylalcohol (24:1) was added.
10. The sample was mixed and rotated for a further hour at room temperature, before being centrifuged at 4000rpm for 15 minutes.
11. The aqueous phase was carefully removed and placed into a fresh tube of appropriate size.
12. To this, 0.1 volumes of 3M NaOAc was added and mixed. 2.5 volumes of 100% v/v ethanol were added and mixed briskly by inversion.
13. The visible DNA was spooled onto a sealed Pasteur pipette, removed from the tube and dipped once into 70% v/v ethanol, before being placed into a 1.5ml eppendorf tube containing an appropriate volume of H2O. The Pasteur pipette was broken off to enable the lid of the eppendorf tube to be closed, and the tube was placed at 4°C overnight (enabling resuspension of the DNA).
14. After the DNA had completely resuspended, the broken pipette was removed.
15. The quality of the DNA was then checked. Firstly the absorbance at OD260 and OD280 was recorded. From this the concentration and quality could be determined (Sambrook et al, 1989).

Concentration:

(OD260 x dilution coefficient x 50) / 1000 = concentration in µg/µl

where the dilution coefficient indicates the degree to which the sample was diluted, e.g. if the absorbance of the sample was determined without dilution then the coefficient would be 1. If the sample was diluted 1:1000 the coefficient would be 1000.

Quality:

The OD260 to OD280 ratio indicates the quality. A ratio of 1.75 - 2.0 is good. A ratio below 1.75 indicates too much associated protein. If the OD260 to OD280 ratio fell outside the boundaries of 1.75 - 2.0 then a further phenol / chloroform extraction and ethanol precipitation was performed (steps 5-14).
16. 10µg of the sample was then digested with 50 units of EcoRI restriction enzyme and 15µg of RNase, according to the manufacturer’s specifications, in a volume of 150µl for a period of at least 6 hours. This reaction was then precipitated by adding 15µl (0.1 volumes) of 3M NaOAc and 412.5µl (2.5 volumes) of 100% v/v ethanol, mixing and incubating at -20°C for a minimum of 2 hours. The sample was then centrifuged at 13000rpm for 10 minutes, washed once with 70% v/v ethanol, dried and resuspended in an appropriate volume to be loaded onto electrophoresis apparatus. The digested sample, along with 2µg of the undigested sample, was then electrophoresed at a low voltage (35-50V) for a period of over 18 hours on a 0.8% agarose gel. The DNA was visualized with a UV transilluminator after staining with ethidium bromide solution (0.0002% v/v) for 15 minutes at room temperature and de-staining with H2O for a minimum of 20 minutes. Visualization of the digested sample gave an accurate indication of concentration, which could have been misinterpreted from the OD260 to OD280 ratio due to contaminants in the sample, such as RNA. Visualization of the undigested sample determined the quality of the DNA with respect to any shearing occurring during the isolation process and also determined if any RNA was present.
17. The isolated genomic DNA was then stored at 4°C.
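
The concentration and quality calculations in step 15 can be sketched as follows. The function names and the example absorbance readings are illustrative assumptions; only the conversion factor of 50, the division by 1000 and the acceptable ratio of 1.75 - 2.0 are taken from the text.

```python
def nucleic_acid_concentration(od260, dilution=1.0, factor=50.0):
    """Return concentration in ug/ul from an OD260 reading.

    factor is 50 for double-stranded DNA, as in step 15.
    """
    return od260 * dilution * factor / 1000.0


def purity_ratio(od260, od280):
    """Return the OD260 to OD280 ratio used as the quality measure."""
    return od260 / od280


def dna_is_acceptable(od260, od280, low=1.75, high=2.0):
    """True if the ratio lies within the 1.75 - 2.0 window given in the text."""
    return low <= purity_ratio(od260, od280) <= high


# Hypothetical readings from a sample diluted 1:100
od260, od280 = 0.25, 0.135
print(f"concentration: {nucleic_acid_concentration(od260, dilution=100):.2f} ug/ul")
print(f"OD260/OD280: {purity_ratio(od260, od280):.2f}")
print("acceptable" if dna_is_acceptable(od260, od280) else "repeat steps 5-14")
```

The same form of calculation applies to RNA in section 2.2.22, where the factor is 40 rather than 50.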

2.2.22 cDNA construction

Determination of RNA concentration
1. RNA absorbance at OD260 and OD280 was measured. From this the concentration and quality could be determined (Sambrook et al, 1989). Concentration was determined as follows:

(OD260 x dilution coefficient x 40) / 1000 = concentration in µg/µl

where the dilution coefficient indicates the degree to which the sample was diluted, e.g. if the absorbance of the sample was determined without dilution then the coefficient would be 1. If the sample was diluted 1:1000 the coefficient would be 1000. Quality can be determined by the OD260 / OD280 ratio. The RNA is of good quality if the ratio is greater than 1.9. If the ratio is lower, further phenol/chloroform/IAA (25:24:1) extractions are performed.

DNase I treatment of RNA
1. 5µg of total RNA or 250ng of PolyA+ RNA were treated with DNase I, RNase free. Reactions were set up as follows:
14µl RNA in dEPC treated H2O
4µl 5 x DNase I buffer
1µl RNasin (Promega)
1µl DNase I, RNase free (ROCHE)
Reactions were incubated at 37°C for 10 minutes, after which the following were added:
2µl 100mM TrisHCl pH 7.4/100mM EDTA
25µl H2O
5µl Yeast tRNA (10mg/ml)
2. Two phenol/chloroform/IAA (25:24:1) extractions were performed. To the remaining 50µl the following were added:
6.6µl 2M NaOAc
110µl 100% ETOH
3. Samples were incubated on ice for 30 minutes and then centrifuged at 13,000 rpm, 4°C, for 15 minutes.
4. The RNA pellet was washed with 75% ETOH and centrifuged at 13,000 rpm, 4°C, for 5 minutes, and then dried on the bench at room temperature. RNA was suspended in 20µl dEPC treated H2O.

Reverse transcription
1. Two identical reactions for each RNA sample were set up:
20µl DNase I treated RNA
7µl 5 x M-MLV RT-buffer (supplied by manufacturer, Promega)
4µl 10mM dNTPs
1µl RNasin (Promega)
1µl random primers (Life Technologies)
2. Reactions were incubated at 65°C for 10 minutes and then placed on ice.
3. To one of the two reactions (labeled +RT), 2µl of M-MLV reverse transcriptase (400 units) were added; to the remaining reaction (labeled -RT), 2µl of H2O was added.
4. Reactions were incubated at 42°C for 90 minutes and stored at -20°C.
5. 3-5µl of each reaction (+RT/-RT) were used in each PCR experiment.

2.2.23 Northern blot hybridization 1. A Multiple Tissue Northern Blot (CLONTECH) was hybridized in ExpressHyb™ solution according to the manufacturer specifications. 2. In each case approximately 2 x 10^ cpm/ml of labeled probe were used in each hybridization experiment. 3. The Northern blot was washed, after hybridization, in the following way:

0.2 x SSC/0.05% SDS, 15 minutes, room temperature, x 3
0.1 x SSC/0.1% SDS, 20 minutes, 50°C, x 2
4. Autoradiography was performed using PhosphorImager screens (Molecular Dynamics).

2.2.24 Sub-cloning cosmid DNA fragments into pBluescript® vector

1. pBluescript® vector was prepared in an identical way to pSPL3 as described in section 2.2.19.1, steps 1-7, except that in some cases linearization was carried out with a different restriction enzyme (indicated in text).
2. Cosmid fragment insert DNA was prepared as follows: 8µg of cosmid DNA was digested with the restriction enzyme of interest. Digested DNA was resolved with 0.8% agarose gel electrophoresis. 5% of the digested DNA was resolved in a single gel lane adjacent to a lane resolving a size standard, while the remaining 95% of the digested DNA was resolved over four lanes. Only the portion of the agarose gel which resolved 5% of the DNA digestion and the size marker was stained with ethidium bromide, visualized under UV light, and photographed in the presence of a ruler. The distance the desired DNA fragment (to be cloned) had migrated through the agarose gel was recorded. A gel slice at this distance along the agarose was removed from the four lanes resolving the remaining 95% of the restriction enzyme digested cosmid DNA. DNA present in this gel slice was purified using the electroelution method, phenol/chloroform extracted, and ethanol precipitated. 10% of the purified DNA fragment was resolved with agarose gel electrophoresis to determine the concentration.
3. Ligation reactions were set up with 20ng of prepared insert DNA and 10ng of prepared pBluescript® vector in a 10µl volume of 1 x ligation buffer, 10mM rATP and 0.5 units of ligase. Reactions were left for 14 hours at room temperature.
4. 100µl of competent cells were transformed with the ligation reactions as described previously (see 2.2.17).
5. 8 white colonies were selected for investigation. Plasmid DNA was isolated from these colonies.
6. Dot blots were constructed (see 2.2.15) from the isolated DNA, with pBluescript® DNA and the original cosmid DNA as controls. The hybridization probe that had originally identified the DNA fragment of interest within the cosmid of interest was hybridized to the dot blots produced.
7. DNA from two positively hybridizing plasmid isolations was purified and sequenced as described.

2.2.25 Oligonucleotide DNA primer design and annealing temperature calculation

PCR amplification
1. Oligonucleotide DNA primers were designed in a 5’ to 3’ orientation from the plus (forward) and minus (reverse) strand of the DNA region of interest. The length of oligonucleotides ranged from 18-24 bases depending on the GC content. The GC content of each primer ranged from 45-65%, and the GC content of the two primers in a pair was designed to match as closely as possible.
2. Long stretches of identical nucleotides (>4) were avoided and primers were designed to contain a G or C nucleotide at the beginning and end of each primer sequence where possible. The two primers constituting a pair were compared for possible co-annealing in order to avoid dimerization.
3. Each primer sequence was used in a BLASTN 2.0.9 (Altschul et al, 1997) search of the non-redundant databases available at the NCBI web site (http://www.ncbi.nlm.nih.gov/blast). In the event of homologous matches, primers were re-designed.
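
The design criteria in steps 1 and 2 can be expressed as a simple check on a candidate sequence. The sketch below is an illustration only: the function names and the example primer are hypothetical, and the requirement for a G or C at each end is treated as a reported problem even though the text applies it only where possible.

```python
import re


def gc_percent(seq):
    """Return the percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def primer_problems(seq):
    """Return a list of design criteria (from steps 1 and 2) that a primer fails."""
    seq = seq.upper()
    problems = []
    if not 18 <= len(seq) <= 24:
        problems.append("length outside 18-24")
    if not 45 <= gc_percent(seq) <= 65:
        problems.append("GC content outside 45-65%")
    # A stretch of more than 4 identical nucleotides means 5 or more in a row
    if re.search(r"(.)\1{4,}", seq):
        problems.append("run of more than 4 identical bases")
    if seq[0] not in "GC" or seq[-1] not in "GC":
        problems.append("no G or C at one or both ends")
    return problems


# Hypothetical 20-base primer used purely as an example
candidate = "GCTAGCATCGATCGTACGTC"
print(f"GC content: {gc_percent(candidate):.0f}%")
print("problems:", primer_problems(candidate) or "none")
```

Checks for co-annealing between the two primers of a pair and for database matches (step 3) are not covered by this sketch.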

Calculation of annealing temperature
1. The following equation was supplied by Dr Jane Ives for the determination of primer annealing temperature:

69.3 + (0.41 x GC%) - (650 / oligo. length)

The above equation was used as a guideline only - the actual annealing temperature of a PCR reaction (using a pair of primers) was often not exactly as calculated; see table 2.1 (section 2.1.4) for the actual annealing temperatures used.
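
The guideline equation above can be sketched in a few lines. The function name and the example primer are hypothetical; the constants 69.3, 0.41 and 650 are those of the equation in the text.

```python
def annealing_temperature(seq):
    """Guideline annealing temperature: 69.3 + 0.41 * GC% - 650 / length."""
    seq = seq.upper()
    gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 69.3 + 0.41 * gc - 650.0 / len(seq)


# Hypothetical 20-base primer with 55% GC content
primer = "GCTAGCATCGATCGTACGTC"
print(f"guideline annealing temperature: {annealing_temperature(primer):.1f} C")
```

For a primer pair, the calculation would be repeated for each primer and, as noted above, treated only as a starting point for the temperature actually used.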

Sequencing
1. Oligonucleotide DNA primers were designed in a 5’ to 3’ orientation in the direction of the desired sequence, according to the PE Applied Biosystems recommended specification for use with BigDye™ cycle sequencing ready reaction mix.
2. It was noted that primers over 22 bases long produced longer sequence reads (specifications stipulate that primers should be at least 18 bases long). GC content was, on average, over 50% (specifications stipulate 30-80%) but no greater than 65%.
3. Each primer sequence was used in a BLASTN 2.0.9 (Altschul et al, 1997) search of the non-redundant databases available at the NCBI web site (http://www.ncbi.nlm.nih.gov/blast). In the event of homologous matches, primers were re-designed.

Chapter 3: Construction of overlapping segments of human DNA cloned in bacteria representing the 1q21 region, specifically the Epidermal Differentiation Complex.

3.1 Summary

The construction of a fully overlapping set of bacterial clones, spanning 2.45Mb of human chromosome 1q21 and encompassing the known Epidermal Differentiation Complex, is described. The map consists of genomic DNA cloned as 45 P1 artificial chromosomes, 3 bacterial artificial chromosomes and 34 cosmids. All of the 28 genes so far assigned to the epidermal differentiation complex are positioned to specific fragments on a partial EcoRI restriction endonuclease map and a full NotI and SalI restriction endonuclease map, as too are markers from the region, thus facilitating refinement of order and distances between markers and genes. Of the total map length, 59.3% is covered by overlapping bacterial clones with a depth ranging from 3-9 fold, while only 15.6% is covered by single clones. The NotI and SalI restriction endonuclease map facilitates comparison with previous studies from this region employing long-range restriction endonuclease and yeast artificial chromosome mapping techniques. The distances determined by the restriction endonuclease map are in good agreement with these previous, low resolution mapping studies. The bacterial clones present in this map will provide the template for the large scale sequencing of this region and have been integrated into the chromosome 1 sequencing project at the Sanger Centre, Hinxton, UK. In addition, the bacterial clones presented here will provide a resource for further studies of genomic structure, transcriptional regulation, function and evolution of the epidermal differentiation complex as well as the identification of novel transcribed sequences.
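
The depth figures quoted above (the fraction of the map covered 3-9 fold, or by a single clone only) follow from the start and end coordinates of each clone on the map. A minimal sketch of such a calculation is given below. The function name, the half-open kilobase intervals and the three example clones are illustrative assumptions and are not data from this map; clones are assumed to lie entirely within the map.

```python
from collections import defaultdict


def depth_fractions(clones, map_length):
    """Return {depth: fraction of the map covered at exactly that depth}.

    clones is a list of (start, end) positions, treated as half-open intervals
    that lie within 0..map_length.
    """
    events = defaultdict(int)
    for start, end in clones:
        events[start] += 1   # a clone begins: depth rises by one
        events[end] -= 1     # a clone ends: depth falls by one

    covered = defaultdict(float)
    depth, previous = 0, 0
    for position in sorted(events):
        covered[depth] += position - previous
        depth += events[position]
        previous = position
    covered[depth] += map_length - previous  # stretch after the last clone end

    return {d: length / map_length for d, length in covered.items() if length > 0}


# Three hypothetical clones on a 100 kb map
example = depth_fractions([(0, 60), (40, 100), (50, 70)], 100)
for depth in sorted(example):
    print(f"depth {depth}: {100 * example[depth]:.0f}% of map")
```

In this toy example 70% of the map is covered by a single clone, 20% by two clones and 10% by three.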

3.2 Introduction

3.2.1 Advantages of bacterial clone physical mapping

P1 artificial chromosomes (PACs - Ioannou et al, 1994) and bacterial artificial chromosomes (BACs - Shizuya et al, 1992) have provided a more reliable representation of cloned genomic DNA than previously developed vector systems (such as YACs and cosmids, see chapter 1). PACs and BACs have become the clones of choice for large scale mapping and sequencing projects (Venter et al, 1996; Marra et al, 1997; Gregory et al, 1998; SC and WUGSC, 1998). This is largely due to the stability of the cloned DNA, a manageable size, and ease of isolation. The vector system is based on a very low copy number maintained in Escherichia coli by the replication control elements from factor F and bacteriophage P1 plasmids, thus reducing the possibility of recombination events. Although the cloned DNA fragments are generally smaller than those maintained in yeast artificial chromosomes (YACs - Burke et al, 1977), PAC and BAC DNA is more readily purified from host organism DNA through an alkaline lysis procedure (Birnboim and Doly, 1979). The average size of 100-200kb (as opposed to up to 1Mb in YACs) permits accurate fingerprint analysis (Marra et al, 1997; Gregory et al, 1998). In contrast, cosmid clones (Collins and Hohn, 1978) are much smaller (average insert size 40kb), and are not maintained at a low copy number within E. coli, increasing the risk of deletions and rearrangements (Kim U-J. et al, 1992). Systems for isolating novel transcribed sequences (see chapter 4) that were originally described using YACs and cosmids, such as direct screening of cDNA libraries (Elvin et al, 1990), cDNA selection (Lovett et al, 1991; Parimoo et al, 1991), and exon trapping (Duyk et al, 1990; Buckler et al, 1991), can all be utilised with PAC and BAC clones. Techniques have been described for the modification of PAC and BAC vectors to enable transfection studies of cloned DNA (Mejia and Monaco, 1997; Yang et al, 1997; Kim et al, 1998).
In order to track the successful transfection of a BAC or PAC clone into eukaryotic cells (to facilitate functional studies of the sequences present), a selectable marker and reporter gene need to be introduced into the vector (many existing libraries do not contain these features - Baker and Cotton, 1997). This has been demonstrated by using restriction enzyme digestion followed by ligation (Mejia and Monaco, 1997); by the use of homologous recombination based on known sequences present in the genomic insert (Yang et al, 1997); or by site specific recombination between loxP sites contained within the BAC vector and the modifying construct (Kim et al, 1998). In one instance the method described successfully introduced BAC clone DNA into fertilized mouse zygotes (by pronuclear injection), thus transmitting the sequence in the germ line (Yang et al, 1997). With the large scale sequencing projects now underway, the need for constructing plasmids for transfection studies can be circumvented by modifying a clone identified in a database that contains the sequences of interest. The significant size of the bacterial clone will enable the inclusion of many genes, along with possible flanking or intronic regulatory elements, thus providing a powerful tool in any functional study of transcribed sequences.

3.2.2 Mapping status of 1q21

At the beginning of this study a degree of knowledge concerning the mapping status of human chromosomal region 1q21 had already been established. Initial work had described the isolation of genes within the three multigene families and their specific localization to 1q21 (McKinley-Grant et al, 1989; Morii et al, 1991; Engelkamp et al, 1993; Gibbs et al, 1993; Schaefer et al, 1995; Moog-Lutz et al, 1995). The first physical maps to identify the specific clustering of these multigene families were based on digesting genomic DNA with rare cutting restriction enzymes and resolving the resulting large fragments using pulsed field gel electrophoresis (PFGE) technology. Pulsed field gels were Southern blotted and hybridized with gene specific probes described in earlier work. The first of these studies (Volz et al, 1993) used seven rare cutting restriction enzymes (producing large restriction fragments) to digest the Epstein-Barr virus-transformed normal B-cell line H2LCL. The resulting digested fragments were resolved using rotating field gel electrophoresis (a technique derived from PFGE), Southern blotted, and hybridized with seven gene specific probes. This established a 2.05Mb region of 1q21 harboring members of the three multigene families. This genomic map was further refined and used to establish the term Epidermal Differentiation Complex, with the addition of extra restriction enzyme sites and the localization of four more genes (Mischke et al, 1996). Figure 3.1A displays a schematic of the map produced by Mischke et al, 1996. This data was in agreement with, and to some extent incorporated, two separate studies. The first of these described the isolation of a single YAC clone shown to hybridize with nine S100 genes (Schaefer et al, 1995) and to localize, by fluorescent in situ hybridization (FISH), to 1q21 (figure 3.1B). The second had previously described the presence of approximately 10 SPRR genes (Gibbs et al, 1993) within the 1q21 region (figure 3.1C), giving rise to a total of 24 genes clustered in a 1.9Mb portion of human 1q21. Further refinements to the 1q21 physical map were achieved by the assembly of Yeast Artificial Chromosome (YAC) maps (contigs). The first of these consisted of 24 YAC clones spanning 6Mb of human chromosome 1q21 (Marenholz et al, 1996). This incorporated new markers from the region as well as established polymorphic markers not previously placed on a contiguous physical map. Restriction analysis was not undertaken with this collection of YAC clones, presumably due to the large distance covered and the number of clones analyzed. Restriction analysis was performed with a second, much smaller, 1.2Mb YAC contig covering the distal portion of the EDC (involucrin to S100A2), which identified additional rare cutting restriction enzyme sites not seen with the genomic restriction map (Zhao and Elder, 1997). Figure 3.1D displays a schematic of the region represented by this YAC contig. Since the above physical maps were produced an additional three S100 genes had been localized to the region, either specifically to YAC clones (Wicki et al, 1996a+b) or to 1q21 by FISH (Moog-Lutz et al, 1995; Yamamura et al, 1996).

Figure 3.1. Schematic displaying four NotI and SalI restriction maps of regions included within the Epidermal Differentiation Complex on human chromosome 1q21. In each case (A, B, C, and D) the upper vertical line displays a scale bar in kilobases (kb) relative to A, while the lower bar displays the positions of restriction enzyme sites. Beneath these vertical lines are the relative positions of EDC genes mapped in each region. Grey boxed maps (A and C) were determined from genomic restriction mapping while unboxed maps (B and D) were determined from YAC restriction mapping. Adapted from: A - Mischke et al, 1996; B - Schaefer et al, 1995; C - Gibbs et al, 1993; D - Zhao and Elder, 1997. Construction of the maps in these studies used additional restriction enzymes; here only NotI and SalI restriction enzyme sites are displayed.

3.3 Production of a sub-library of recombinant bacterial clones enriched for 1q21 originating DNA

3.3.1 Starting Material

Sixteen of the 24 clones from the Marenholz YAC contig were kindly donated by Dr Jiannis Ragoussis. These clones represented the entire 6Mb region of human chromosome 1q21 and had been selected to avoid redundancy. A single 96-well microtitre plate was inoculated, grown, used to produce filters (see section 2.2.1) and stored at -80°C. Plasmid DNA containing the cloned loricrin gene was kindly donated by Dr Daniel Hohl. 1 µg of plasmid was digested with EcoRI restriction enzyme, effectively excising the insert cDNA from the vector. The insert and vector DNA were separated by electrophoresis on a 0.8% agarose gel and the insert was purified using the electroelution method (see materials and methods, sections 2.2.2 and 2.2.3). Plasmid DNA containing cloned portions of the SPRR1a, SPRR2e and SPRR3 genes was kindly donated by Dr Claude Backendorf. These plasmids were digested with the appropriate restriction enzymes (to excise the insert DNA from the vector), resolved on 0.8% agarose gels, and purified using the electroelution method.

3.3.2 YAC isolation

Five YAC clones from the Marenholz and co-workers collection (CEPH mega-YAC library) were selected for investigation: 764_a_1, 907_e_6, 874_d_5, 955_e_11, and 950_e_2. These YAC clones encompassed the entire Epidermal Differentiation Complex (EDC). The two outermost YAC clones, 950_e_2 and 764_a_1, were mapped to 1q21 using two color fluorescent in situ hybridization by Marenholz and co-workers. This had not only verified the content of these YAC clones to be non-chimeric but also determined the chromosomal orientation of the EDC. Agarose plug mini-preps from the five YAC clones were prepared. Yeast chromosomes and the respective artificial chromosomes were resolved by pulsed-field gel electrophoresis (PFGE) for the 150-2200kb window. Artificial chromosomes were excised from the gel where visualized, and purified by agarase digestion (materials and methods). Where artificial chromosomes could not be visualized (due to co-running with yeast chromosomes) two identical gels were produced. Resolved DNA present in one of these gels was Southern blotted (transferred to a nylon membrane according to Southern, 1975 - see materials and methods) and hybridized with 50ng of ³²P labeled total human placental DNA, indicating the position of the artificial chromosome. The YACs were then excised from the corresponding position on the remaining gel and purified by the Prep-A-Gene® or agarase method. Table 3.1 details the size of each YAC isolated (based on comparison to size standards from the PFGE) and the method used during isolation.

Table 3.1. Size, purification method, and labeling efficiency of YAC clones used in library screening. nd = not determined, * = not labeled.

YAC address   Size in kb          Size determined by   Size in kb determined by   Purification    CPM/ml     CPM/ml
              from PFGE           Genethon (in kb)     Marenholz et al, 1996      method          (cosmid)   (PAC)

764_a_1       900                 nd                   330-1400                   agarase         0.99m      1.12m
907_e_6       1050                1220                 1040                       Prep-A-Gene®    0.88m      0.8m
874_d_5       co-ran 2200, 1600   1450                 2245                       Prep-A-Gene®    0.98m      *
955_e_11      co-ran 1125         1310                 1165                       agarase         1.57m      1.44m
950_e_2       co-ran 680          340, 930, 1290       240-860                    agarase         0.73m      0.73m

Marenholz and co-workers noted that the size of the YACs 764_a_1 and 950_e_2 varied when isolating multiple clones (15 and 11 respectively) from the main YAC library (indicated by the range of sizes given in table 3.1). In this case the largest YAC from six separate isolations (size indicated in table 3.1) was selected for investigation.

3.3.3 Verifying the YAC STS/EST content

Each of the five selected YAC clones was checked by PCR with three DNA markers from the region (2 µl of a 1:200 dilution of purified YAC DNA was used as template). YAC 764_a_1 and YAC 907_e_6 were positive for S100A10, YAC 874_d_5 and YAC 955_e_11 were positive for involucrin, while YAC 955_e_11 and YAC 950_e_2 were positive for S100A6.

3.3.4 Screening the cosmid and PAC libraries

Radio-labeled YAC DNA was initially hybridized to high density gridded filters from a flow sorted chromosome 1 cosmid library, after suppression of repetitive sequences (see materials and methods, section 2.2.9). Although the ultimate strategy was based on constructing the majority of the map in P1 artificial chromosomes (PACs), using YAC to PAC library hybridizations, the cosmid data provided an additional 1q21 resource. Each YAC was hybridized to cosmid filters individually; table 3.1 indicates the counts per minute per ml of labeled YAC (as determined by a Beckman scintillation counter) used in each hybridization. Cosmids were scored and the data was entered into the xv2well program (Unix operating system) written by Huw Griffiths, ICRF, London (unpublished). Cosmid addresses for each YAC hybridization were calculated and compared using a computer program written by Colin James in Dr Nizetic's laboratory. The program generated a table displaying each positively scored cosmid with respect to the YAC used in each hybridization. Cosmids positive with more than three YAC clones, or with two or more non-overlapping YAC clones (i.e. 764_a_1 and 950_e_2), were scored as non-specific. From this data each positive cosmid could be assigned to a 'bin' or 'pocket', depending on its hybridization pattern. A total of 13 pockets were produced containing 588 specific cosmids. Figure 3.2 displays these pockets in relation to the YAC clones. Two anomalies are seen when these data are compared to the Marenholz YAC contig: YAC clones 907_e_6 and 874_d_5 each hybridize uniquely to a number of cosmid clones, whereas the Marenholz data displays these clones as overlapping. One explanation for this would be that the YAC clone 874_d_5 is known to be rearranged (Marenholz et al, 1996), while the YAC clone 907_e_6 could possibly contain chromosome 1 specific repeats (Zhang et al, 1999), generating a large number of uniquely positive cosmid clones within the library.
Independent screening of the cosmid library by two laboratories studying individual genes within the EDC had identified a number of cosmid clones. These cosmid clones were present within the pockets as indicated by figure 3.2.
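
The pocket assignment described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the unpublished program used in this study: the cosmid names and hybridization scores are invented, and the specificity rule is simplified to the assumption that a clone is non-specific if it is positive with more than three YACs or with both of the outermost YACs (764_a_1 and 950_e_2).

```python
from collections import defaultdict

# Order in which the five YAC probes are listed in the text (assumed to follow the contig).
YAC_ORDER = ["764_a_1", "907_e_6", "874_d_5", "955_e_11", "950_e_2"]

# Assumption: only the two outermost YACs are treated as mutually exclusive.
EXCLUSIVE_PAIRS = [{"764_a_1", "950_e_2"}]


def is_specific(positive_yacs, max_yacs=3):
    """Return True if a hybridization pattern is plausible for a single locus."""
    if not positive_yacs or len(positive_yacs) > max_yacs:
        return False
    return not any(pair <= positive_yacs for pair in EXCLUSIVE_PAIRS)


def assign_pockets(scores):
    """Group specific clones by the exact set of YACs they hybridize with."""
    pockets = defaultdict(list)
    for clone, yacs in scores.items():
        if is_specific(yacs):
            key = tuple(sorted(yacs, key=YAC_ORDER.index))
            pockets[key].append(clone)
    return dict(pockets)


# Hypothetical scoring data: clone name -> set of YACs giving a positive signal.
scores = {
    "cos1": {"764_a_1"},
    "cos2": {"764_a_1", "907_e_6"},
    "cos3": {"907_e_6", "764_a_1"},
    "cos4": {"764_a_1", "950_e_2"},   # both outermost YACs: non-specific
    "cos5": {"955_e_11", "950_e_2"},
}

pockets = assign_pockets(scores)
for pattern, clones in pockets.items():
    print(" + ".join(pattern), "->", clones)
```

Each distinct hybridization pattern becomes one pocket, so the number of pockets depends only on which combinations of YACs are actually observed.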

Figure 3.2. Cosmid pocket map generated using 5 YAC clones covering approximately 3 megabases of human chromosome 1q21. The top horizontal bar displays the scale in kilobases and the orientation (CEN = centromere, TEL = telomere). The thicker horizontal bar below the scale displays a schematic representation of the chromosome with the position of genes and markers indicated by circular shapes (THH - trichohyalin, FLG - profilaggrin, INV - involucrin, LOR - loricrin). White circles depict genes used to screen the flow sorted chromosome 1 library by independent laboratories (Dr J. Ragoussis and Dr D. Hohl). Below the chromosomal bar are horizontal lines representing YAC clones. Dashed lines depict the relative positions of YAC clones as determined by Marenholz et al, 1996 (not to scale) while solid lines depict the relative positions of YAC clones used to generate the pocket map (not to scale). Vertical lines display the relative positions of the cosmid pockets (determined by the hybridisation pattern of each individual YAC clone). The number of cosmids is indicated for each pocket. In the case of YAC 874_d_5, ★ indicates the number of cosmids uniquely positive for this clone (43). Circular shapes present on YAC clones or within pockets indicate either that the YAC clone is positive for the corresponding marker/gene by PCR (grey circular shapes) or that cosmids previously identified in hybridisation screens of the library with the particular gene (white circular shapes) are present within the individual pocket.

Interestingly the pocket data revealed that the YAC clones 907_e_6 and 955_e_11 overlapped. This had not been indicated in the map produced by Marenholz and co-workers, presumably due to the lack of markers for this interval. This enabled the number of YAC to PAC hybridizations to be reduced from five to four by excluding the now redundant, rearranged YAC 874_d_5.

Labeled YAC DNA was hybridized to high density gridded membranes (or filters, herein) of the PAC library RPCI-1, representing the whole human genome. In addition to suppressing human repetitive sequences within the labeled DNA (prior to YAC to PAC hybridization as described), around 1 µg of PAC vector DNA that contained no cloned insert was also added to the competitive hybridization reaction. This was because about one quarter of the clones within the library RPCI-1 contain no insert (HGMP Resource Centre, Hinxton, UK), and a number of these PAC clones have been scored positive in other YAC to PAC hybridizations (Jurgen Groet, Dr Nizetic laboratory, personal communication). Even after this, a number of positive PAC clones were seen that also hybridized with other YACs from unrelated regions of the human genome (J. Groet, personal communication). After removing these non-specific clones, a total of 105 were scored. These PAC clones could be ordered into seven pockets determined by the individual YAC hybridization patterns. Figure 3.3 displays the numbers of positive PAC clones positioned within these pockets. A number of differences can be seen between the cosmid pocket and the PAC pocket maps after excluding YAC 874_d_5 from the cosmid pocket data. The ratio between the numbers of cosmid and PAC clones in each pocket varies widely. In the case of YAC 764_a_1, 34 uniquely positive cosmid clones are seen in comparison to 21 uniquely positive PAC clones, a ratio of 1.62. In the case of the pocket produced by YAC clones 955_e_11 and 950_e_2, 22 cosmid clones and only 2 PAC clones are present, a ratio of 11. In order to assess the accuracy of the PAC pocket data before proceeding with constructing the bacterial map, the expected number of positive PAC clones for each YAC clone in question was determined.

Figure 3.3. PAC pocket map generated using four YAC clones covering approximately 3 megabases of human chromosomal region 1q21 (of which only 2 megabases are shown). The top horizontal line displays the scale in kilobases and the orientation (CEN = centromere, TEL = telomere). The horizontal bar below this shows a schematic representation of the chromosome displaying the position of three genes (rhomboid shapes) used to verify the YAC content by PCR (IVL = involucrin). Horizontal lines below this depict the YAC clones indicated (not to scale). Rhomboid shapes present within these YAC clones indicate the presence of the respective gene within the YAC clone as determined by PCR. Vertical dashed lines delineate the PAC pockets, which are represented by black or white rectangles at the bottom of the figure. Numbers present in these rectangles indicate the number of PAC clones scored positive within the particular pocket.

The average length of a PAC clone from the RPCI-1 library seen when working on human chromosome 21 (Groet et al, 1998; personal communication) is 110kb. The RPCI-1 library was constructed to yield a five fold coverage of the human genome (Ioannou et al, 1994) but the finding that only 75% of the clones contain a cloned insert (HGMP Resource Centre, Hinxton, UK) reduces this to 3.75 fold. With these statistics in mind, and assuming the library coverage is equal genome wide, the expected number of positive PAC clones from a labeled YAC of known size can be worked out using the formula:

\[ \text{expected number of positive PAC clones} = \frac{\text{YAC size in kb} \times 3.75\ \text{(library coverage)}}{110\ \text{(average PAC size in kb)}} \]

Table 3.2 displays the predicted number of PAC clones (as determined with the above formula) and the actual number of PAC clones scored positive with the labeled, PFGE sized, YAC clones.

Table 3.2. Predicted compared to actual number of positively scored, specific, PAC clones identified with four YAC clones (indicated) used as hybridization probes against the total human PAC library, RPCI-1.

YAC         SIZE in kb   PREDICTED   ACTUAL

764_a_1     900          30.7        28
907_e_6     1040         35.5        35
955_e_11    1125         38.4        32
950_e_2     680          23.2        22

As the degree of overlap between YAC clones is not known, the expected distance covered by the positive PAC clones can be worked out by rearranging the above formula and using the known number of positive PAC clones (105):

\[ \text{estimated distance covered in kb} = \frac{\text{number of positive PAC clones} \times 110}{3.75} \]

Therefore the estimated distance covered would be 3080kb, or 3.1Mb.
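
The arithmetic behind table 3.2 and the distance estimate can be checked with a short Python sketch. The constants are the values quoted in the text (five fold coverage, 75% of clones carrying an insert, 110kb average PAC size); the function and variable names are introduced here purely for illustration.

```python
# Values quoted in the text.
LIBRARY_COVERAGE = 5 * 0.75   # five fold coverage, 75% of clones with insert = 3.75
AVERAGE_PAC_KB = 110          # average RPCI-1 PAC insert size in kb


def expected_pac_clones(yac_size_kb):
    """Expected number of positive PAC clones for a YAC probe of the given size."""
    return yac_size_kb * LIBRARY_COVERAGE / AVERAGE_PAC_KB


def estimated_distance_kb(n_positive_pacs):
    """Rearranged formula: distance represented by a number of positive PAC clones."""
    return n_positive_pacs * AVERAGE_PAC_KB / LIBRARY_COVERAGE


# YAC sizes and actual counts as listed in table 3.2.
table_3_2 = {
    "764_a_1": (900, 28),
    "907_e_6": (1040, 35),
    "955_e_11": (1125, 32),
    "950_e_2": (680, 22),
}

for yac, (size_kb, actual) in table_3_2.items():
    predicted = expected_pac_clones(size_kb)
    print(f"{yac}: predicted {predicted:.1f}, actual {actual}")

print(f"estimated distance for 105 clones: {estimated_distance_kb(105):.0f} kb")
```

Both formulas assume uniform library coverage across the genome, so the predicted values are best read as a rough expectation rather than a strict test.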

The number of PAC clones identified from the RPCI-1 PAC library was in good agreement with the predicted number of PAC clones (table 3.2). Therefore, the 105 positively identified PAC clones were selected to constitute the '1q21 sub-library' - a library of recombinant bacterial clones enriched for human chromosomal region 1q21, specifically the Epidermal Differentiation Complex.

3.3.5 Construction of tools for hybridization and PCR-based screening of the 1q21 sub-library

A 1q21 sub-library was produced by picking the 105 positively scored PAC clones from the main library directly into two 96-well microtitre plates containing 2-YT media (plates 1 and 2). These plates were grown overnight, after which the cells were resuspended and divided to constitute three copies, which were grown for an additional night. Three PAC clones failed to grow even after picking for a second time and streaking onto agar. Cells in each plate were resuspended. One of the three copies was immediately frozen on dry ice and stored at -80°C. The second sub-library set was used to produce hybridization membranes (filters) by spotting 40 times onto nylon membranes. The second plate of the two contained only 9 clones and was therefore spotted underneath the first, full, microtitre plate using a multi-channel pipette. This second plate set was stored overnight at 4°C in case any spotting problems were seen, and was subsequently frozen on dry ice and stored at -80°C. This plate set (plates 1 and 2) was labeled 'master' and used to pick clones for further analysis. Spotted filters were grown on agar overnight and processed as described (see section 2.2.1). The third copy of the sub-library was used to produce pools of cells that could be screened by PCR. Half of the well volume from each column of plate 1 was removed and pooled (labeled pools 1-12). Half of the well volume from the 9 remaining PAC clones in plate 2 was pooled and labeled pool 13. Each pool was then divided into 4 aliquots and labeled appropriately. All the pools and the two plates were frozen on dry ice and stored at -80°C. Pools would provide a template in any PCR reaction as follows: 2 µl of defrosted cells from each pool would be used in separate PCR reactions. If a positive pool were identified, the PCR would be repeated using cells scraped from the appropriate wells of the frozen plates as template, indicating which clone or clones were positive.
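
The two-round pooling strategy can be illustrated with a minimal Python sketch. The well layout of plate 2 (nine clones in wells A1 to A9) is an assumption made for the example, and the PCR result is simulated with the trichohyalin-positive wells reported for plate 1 (E4 and F9, see figure 3.4).

```python
ROWS = "ABCDEFGH"


def build_pools():
    """Pools 1-12 are the columns of plate 1; pool 13 holds the clones of plate 2."""
    pools = {n: [(1, f"{row}{n}") for row in ROWS] for n in range(1, 13)}
    # Assumption: the nine clones of plate 2 occupy wells A1 to A9.
    pools[13] = [(2, f"A{col}") for col in range(1, 10)]
    return pools


def two_round_screen(pools, pcr_positive):
    """First round: one PCR per pool. Second round: one PCR per well of each positive pool."""
    positive_pools = [n for n, wells in pools.items()
                      if any(pcr_positive(w) for w in wells)]
    positive_wells = [w for n in positive_pools for w in pools[n] if pcr_positive(w)]
    reactions = len(pools) + sum(len(pools[n]) for n in positive_pools)
    return positive_pools, positive_wells, reactions


# Simulated PCR outcome for the trichohyalin primers (wells reported in figure 3.4).
thh_wells = {(1, "E4"), (1, "F9")}

pools = build_pools()
positive_pools, positive_wells, reactions = two_round_screen(pools, lambda w: w in thh_wells)

print(positive_pools)   # [4, 9]
print(positive_wells)   # [(1, 'E4'), (1, 'F9')]
print(reactions)        # 29
```

In this example 29 reactions (13 pools plus 16 wells) identify the two positive clones, compared with 105 reactions if every clone were tested individually.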

3.3.6 First round screens with DNA markers

Initially, the PAC pools were screened by PCR with a number of markers. Table 3.3 displays the markers used and the number of positively identified pools (first round PCR) and clones (second round PCR). 50ng of total human placental DNA (Sigma) was used in each PCR reaction as a positive control. Figure 3.4 displays the result of screening the pools and clones with the primers that amplified a portion of the trichohyalin gene. Primer sequences, amplification conditions, and sizes of PCR products can be found in section 2.1.4 (materials and methods).

Table 3.3. Primer sets used to screen the 1q21 sub-library pools, and the number of positive pools and clones identified. For primer details see section 2.1.4.

Primer set           Positive pools   Positive clones

Trichohyalin (THH)   2                2
Profilaggrin (FLG)   2                2
Involucrin           3                5
D1S1664              2                2
D1S2278              1                1
D1S3020              0                0
D1S305               0                0
D1S3626              1                1
D1S3625              4                4
S100A6               2                2
S100A8               2                2
S100A9               2                2
S100A10              0                0

Amplified DNA from positive PCR reactions (either control or pool) was gel purified by the electroelution method, providing a hybridization probe for each of the above markers. Purified cDNA inserts representing the loricrin gene and three of the SPRR genes provided corresponding hybridization probes. An average of 25ng of each of these DNA probes was labeled as described and hybridized against the 1q21 sub-library filters.

Figure 3.4. 2% agarose gel electrophoresis displaying the results of screening the 1q21 sub-library PCR pools and corresponding microtitre plate with primers amplifying a portion of the trichohyalin gene. Gels were stained with ethidium bromide and visualised under UV light. A: Results of screening pools 1-13. Pools 1-12 represent the columns 1-12 of microtitre plate 1 while pool 13 represents the clones present in microtitre plate 2 of the 1q21 sub-library. B: Results of screening columns 4 and 9 of microtitre plate 1 of the 1q21 sub-library, displaying the wells E4 and F9 positive.

Figure 3.5. Phosphorimager (Molecular Dynamics) autoradiography of two 1q21 sub-library filters hybridized with separate probes. In each case rows A to H correspond to those of microtitre plate 1 while row I represents row A of microtitre plate 2. A depicts the result of hybridization with a radio-labeled probe representing a portion of the S100A9 gene, displaying clones in wells E2 and E3 of the 1q21 sub-library to be positive. B depicts the result of hybridization with a radio-labeled insert from clone 230 D1, displaying the clones in wells E3, E7, G4 and H6 (230 D1) of the 1q21 sub-library to be positive. After initial hybridisation screens (example seen in A), filters were cut in two (B) in order to fit into smaller hybridisation bottles.

Figure 3.5A displays the hybridization result for a probe representing S100A9 against the sub-library filters. The following table indicates the number of clones positive from hybridization with the given probe.

Table 3.4. Genes and markers used as radio-labeled hybridization probes against the 1q21 sub-library filters and the resulting number of positive clones identified.

PROBE                No. positive clones

Trichohyalin (THH)   2
Profilaggrin (FLG)   2
Involucrin           5
D1S1664              2
D1S2278              1
D1S3020              0
D1S305               0
D1S3626              1
D1S3625              4
S100A6               2
S100A8               2
S100A9               2
S100A10              0
Loricrin             2
SPRR1                4
SPRR2                4
SPRR3                3

The data obtained from these initial PCR and hybridization experiments is depicted in figure 3.6A. This not only identified 'islands' of PAC clones (e.g. the 2 PAC clones positive for the marker D1S1664) but also identified sets of contiguous, overlapping clones (or contigs). An example can be seen in figure 3.6A, where one clone is positive for the three genes involucrin, SPRR1B and SPRR3. This clone links the additional clones positive for involucrin with clones positive for SPRR1a, SPRR3 and SPRR2e.

Figure 3.6. Schematic representation of the bacterial clone map at three separate stages of construction. In each case the upper horizontal bar depicts orientation (CEN - centromere, TEL - telomere) and the scale bar (in kilobases) relative to gene and marker positions within the region. The lower horizontal bar displays a schematic representation of human chromosomal region 1q21, while the shapes represent the positions of genes and markers (indicated) used in bacterial contig construction (THH - trichohyalin, FLG - profilaggrin, IVL - involucrin, LOR - loricrin). Below the representation of the chromosome are horizontal lines depicting the relative position of PAC clones from the 1q21 sub-library (not to scale). The position of genes and markers on PAC clones are shown by corresponding shapes as determined from hybridization and PCR experiments. A: PAC clones identified with initial screens of the 1q21 sub-library. B: Additional PAC clones identified from the 1q21 sub-library with initial insert hybridization experiments are depicted by double horizontal lines. C: Additional PAC clones identified from the 1q21 sub-library and positioned on the map after random analysis and exhaustion of the sub-library are displayed by double horizontal lines. The positions and names of gap facing PAC clones (as determined by the hybridisation pattern and EcoRI fingerprint of these PAC clones) are indicated. Gap facing PAC clones labeled in panel C: 133 O11, 240 I17, 275 I19, 24 P17, 191 N24, 73 N3, 224 G17 and 116 M10.

PAC DNA from all the positive clones was isolated as described. An average of 1 µg of isolated PAC DNA was digested with EcoRI restriction enzyme and resolved on a 20cm long, 0.8% agarose gel. From this, the degree of overlap shared between PAC clones could be roughly determined. In one instance, a PAC 'island' could be linked. This was seen with two PAC clones, one positive for loricrin (clone 140 J1) and one positive for S100A9 (clone 127 E12): from the EcoRI gel (or fingerprint) a number of shared bands can be identified. Figure 3.7 displays a portion of one such EcoRI gel with the two clones in question resolved and the position of the possible shared bands indicated (loricrin positive clones = 190 D1 and 140 J1, S100A9 positive clones = 127 E12 and 128 I15). The EcoRI fingerprint also indicated PAC clones that were possibly redundant (contained identical insert DNA); this was the case for two involucrin positive clones.
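
As a rough illustration of how shared bands between two fingerprints might be counted, the following Python sketch pairs up fragments whose sizes agree within a tolerance. It is not the procedure used in this study, where gels were compared by eye: the band sizes are invented, the 5% tolerance is an arbitrary assumption, and vector-derived fragments are assumed to have been removed from both lists beforehand.

```python
def shared_bands(bands_a, bands_b, tolerance=0.05):
    """Pair fragments (sizes in kb) that agree within a relative tolerance.

    Each band in bands_b is used at most once, so duplicated fragment
    sizes are not counted twice.
    """
    shared = []
    unused = sorted(bands_b)
    for a in sorted(bands_a):
        for b in unused:
            if abs(a - b) <= tolerance * max(a, b):
                shared.append((a, b))
                unused.remove(b)
                break  # stop after the first match for this band
    return shared


# Hypothetical EcoRI fragment sizes (kb) for two clones, vector bands excluded.
clone_x = [12.0, 7.5, 5.1, 3.2, 1.8]
clone_y = [9.0, 7.4, 5.0, 2.4, 1.8]

shared = shared_bands(clone_x, clone_y)
print(shared)
print(f"{len(shared)} of {len(clone_x)} bands of clone_x are possibly shared")
```

Fragments of similar size are not necessarily identical in sequence, which is why shared bands only suggest an overlap that then needs to be confirmed by hybridization.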

3.3.7 Contig linking

To confirm the degree of overlap, and to identify further clones, PAC insert DNA was isolated, labeled, and hybridized back to the 1q21 sub-library filters. Insert DNA was isolated by digesting PAC DNA with NotI restriction enzyme, resolving on a 1% low melting point agarose gel by PFGE, excising, and purifying by the agarase method (see materials and methods for details). The NotI restriction enzyme cuts the PAC vector on either side of the multiple cloning site, effectively excising the insert from the vector. Figure 3.8 displays an example of a pulsed field gel resolving NotI digested PAC DNA. Where PAC insert DNA contained one or more NotI sites, all non-vector DNA fragments were isolated and purified. Three PAC clones identified during the initial screens of the 1q21 sub-library contained a single, internal, NotI site. In each case the two fragments resulting from NotI digestion were of a different size to that of the vector (17kb) fragment.

Figure 3.7. 10 PAC clones identified from the initial screens of the 1q21 sub-library. An average of 1 µg of PAC DNA has been digested with EcoRI restriction enzyme and resolved on a 0.8% agarose gel in the presence of a 1kb marker (Life Technologies). Marker sizes are indicated to the nearest kb. The position of EcoRI PAC vector fragments are indicated for clone 59 H12. The position of EcoRI fragments possibly shared between PAC clones 140 J1 and 127 E12 are indicated by white boxes.

Figure 3.8. Example of Pulsed Field Gel Electrophoresis used to isolate insert DNA from 3 bacterial clones. Approximately 1 µg in three separate reactions from three separate DNA isolations of one BAC clone (B188 A2) and one PAC clone (107 B3) digested with NotI restriction enzyme and resolved on a 1% low melting point agarose (Life Technologies) gel in the presence of three size markers. 1 - Lambda ladder (Bio-Rad). 2 - High molecular weight DNA (Life Technologies). 3 - 1kb ladder (Life Technologies). Sizes are shown to the nearest kb.

Figure 3.5B displays an example of an insert hybridization to the sub-library filters. Insert hybridizations from the initially identified PAC clones detected a further 6 PAC clones that had not been previously scored as positive and also linked clones containing the marker D1S1664 all the way through to clones containing the marker S100A6. These hybridizations also provided data regarding the order of clones within contig islands, confirming data obtained from the EcoRI gels (see above). DNA from the newly identified PAC clones was isolated, and the insert DNA was purified and hybridized to the 1q21 sub-library filters. These hybridization results confirmed the already established overlap but failed to identify new PAC clones. Figure 3.6B displays the relative positions of these newly identified PAC clones. Unidentified PAC clones within the 1q21 sub-library were then selected for analysis. PAC clones were initially selected on the basis of the YAC to PAC hybridization data: 2 PAC clones positive with YAC 764_a_1 only, 2 PAC clones positive for both YACs 764_a_1 and 907_e_6, 2 PAC clones positive for YAC 907_e_6 only, and 2 PAC clones positive with YAC 950_e_2 only were selected. DNA was isolated and, where possible, insert was purified from vector. In the case of 4 of these 8 randomly selected clones no insert DNA was present after NotI restriction enzyme digestion. DNA from the four isolated inserts was hybridized to the 1q21 sub-library filters. With this approach a further 4 new PAC clones were identified; two of the randomly selected PAC clones were seen to be overlapping and linking to the island of PAC clones containing profilaggrin. DNA from these new PAC clones was isolated, and inserts were purified and hybridized to the 1q21 sub-library filters, scoring new positive clones which were analyzed in a similar fashion. DNA was isolated from the remaining unidentified 1q21 sub-library PAC clones. Insert DNA was purified where possible and hybridized back to the 1q21 sub-library filters. Figure 3.6C displays the relative positions of all the PAC clones identified from the sub-library and depicts the gap facing clones determined from individual hybridization patterns and EcoRI fingerprints. Of the remaining unidentified PAC clones, only 52.4% contained insert DNA. Of the entire sub-library 29.4% contained no insert and 32.4% of the clones were not incorporated into the bacterial contig (all with inserts).
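
The way insert hybridizations link individual clones into contigs can be sketched as a grouping of clones connected by positive probe/target results. The Python example below uses a simple union-find structure; the clone names and hybridization results are hypothetical, and the sketch only groups clones, it does not order them within a contig (which in this study relied on hybridization patterns and EcoRI fingerprints).

```python
def build_contigs(hybridizations):
    """Group clones into contigs from probe -> positive target results."""
    parent = {}

    def find(clone):
        # Follow parent links to the representative clone, shortening the path on the way.
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for probe, targets in hybridizations.items():
        find(probe)  # register probes even if nothing hybridized
        for target in targets:
            parent[find(target)] = find(probe)

    contigs = {}
    for clone in parent:
        contigs.setdefault(find(clone), set()).add(clone)
    return sorted((sorted(members) for members in contigs.values()),
                  key=lambda members: members[0])


# Hypothetical results: insert used as probe -> sub-library clones scored positive.
hybridizations = {
    "pacA": ["pacB", "pacC"],
    "pacC": ["pacD"],          # links pacD to the pacA contig through pacC
    "pacE": ["pacF"],
    "pacG": [],                # an isolated clone remains its own island
}

contigs = build_contigs(hybridizations)
for members in contigs:
    print(members)
```

Gaps between the resulting groups correspond to intervals for which no linking clone has yet been found, which is the situation addressed by further library screening.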

3.4 Further screening of the whole genome and flow sorted chromosome 1 bacterial clone libraries

After exhaustion of the 1q21 sub-library a number of gaps still remained. This possibly reflects the inability of YAC clones to faithfully represent cloned DNA from the region without the introduction of deletions and rearrangements. Alternatively, the PAC library itself could contain gaps at these positions.

3.4.1 Further screening of the main PAC library

Identification of clones to extend and link the initial contigs began with hybridizations of PAC insert DNA to the gridded filters of the total human PAC library RPCI-1. Table 3.5 displays the PAC clones used in the initial screens of the main PAC library with insert DNA (see also figure 3.6C).

Table 3.5. PAC clones used to screen the total human PAC library RPCI-1, the region the clones are from (also see figure 3.6C), and the newly identified positive clones.

Insert Used   Marker/area                    New Positive Clones

24 P17        D1S3626                        45 G7
191 N24       S100A6                         107 B3, 148 L21
73 N3         Centromeric of profilaggrin    116 O18
133 O11       trichohyalin                   116 O18, 135 O6, 287 E17
275 I19       Centromeric of 133 O11         135 O6, 287 E17
116 M10       Telomeric of profilaggrin      91 G5, 74 J15
240 I17       D1S1664                        11013
224 G17       Telomeric PAC island           114 B7

A number of the newly identified clones had been scored positive in the YAC to PAC library hybridizations but, upon isolation of DNA, either proved to have no cloned insert (91 G5 and 136 06) or contained a cloned insert that did not hybridize with any probe/target from the region (116 018). These clones were isolated again from the main library and analyzed. DNA was purified from three separate colonies, digested with EcoRI restriction enzyme and resolved on 0.8% agarose gels. These data showed that, in the cases of clones 91 G5 and 116 018, two or three species of PAC insert out of the three isolated were present in the wells of the main library. Southern blots of the agarose gels were produced and hybridized with the original probe that identified the clones from the main library. In each case the PAC clone which truly represented cloned DNA from the region was identified. Four of the new clones were found to have been positive with the initial YAC to PAC library screens, but scored incorrectly. This was the case for clones 114 B7 (scored as 114 A7), 148 L21 (scored as 148 K21), 287 E17 (scored as 287 F17), and 74 J15 (scored as 74 J10). After removing these cases of human error, and the cases of mixed species of clones in the wells of the main library, from the statistics regarding the percentage of clones mapping from the sub-library, we see an increase from 38.2% to 45.1%.

DNA from the newly identified PAC clones was isolated and dot blots were produced as described. Insert DNA was purified and hybridized to these dot blots and to the 1q21 sub-library filters. These data, coupled with the original hybridization data (i.e. 73 N3 and 133 011 both hybridizing to 116 018), linked three contigs, effectively closing two gaps. One clone, 107 B3, contained insert DNA but failed to hybridize either as a probe or as a target with any of the clones already identified, suggesting the cloned DNA was not from this region. Because mixed wells were known to be present within the PAC library, 17 colonies were isolated from this well. Restriction digestion analysis of PAC DNA isolated from the 17 separate colonies revealed the presence of two species of PAC clone in well 107 B3 of the main PAC library. Figure 3.9 displays a 0.8% agarose gel resolving EcoRI digested DNA from these 17 PAC isolates of 107 B3 along with PAC clones 148 L21 and 230 D1. Southern blotting and hybridization analysis confirmed that one of the two species of clone present contained insert DNA similar to 148 L21 (lane 5 of figure 3.9 being the best example). Further analysis of the restriction fingerprint of this clone revealed its insert DNA to be contained within clone 148 L21, so it did not extend the main contig.

Figure 3.9. Example of a mixed well within the PAC library, RPCI-1. 0.8% agarose gel resolving EcoRI digested DNA isolated from 17 colonies of the well 107 B3 from the main PAC library in addition to a size standard marker and EcoRI digested DNA from clones 191 N24 and 148 L21. Lane numbers 1-17 indicate the isolations from well 107 B3. Lane number 5 displays an EcoRI fingerprint similar to 191 N24 and 148 L21 while lanes 8, 14, 16 and 17 display a fingerprint from both species of PAC clone present in well 107 B3. 107 B3 was scored positive with an insert hybridisation of clone 191 N24 to the main PAC library. Subsequent hybridization of the Southern blot of this gel with the insert from PAC clone 148 L21 identified homologous DNA present in lanes 5, 8, 14, 16 and 17.

3.4.2 Screening BAC and cosmid libraries

Gridded filters containing a Bacterial Artificial Chromosome library (BAC, Research Genetics), representing the whole human genome, were screened with the insert probes from PAC clones 148 L21 and 114 B7. One BAC clone was scored positive for the 148 L21 hybridization while two positive BAC clones were scored for the 114 B7 hybridization. DNA was isolated from the BAC clones and analyzed. Insert DNA from these clones failed to identify any new PAC or BAC clones from the respective libraries. EcoRI restriction analysis indicated that the 148 L21 positive BAC clone was contained within 148 L21 and did not extend further. Of the two 114 B7 positive BAC clones, one (B313 M14) extended further by an estimated 8-10kb. Gridded filters representing the flow sorted chromosome 1 cosmid library were screened with insert DNA from the clones 148 L21, 135 06 and 114 B7. 44 and 20 cosmids were scored for clones 135 06 and 114 B7 respectively. Of these, one shared cosmid was identified, ICRF112cB0750. Restriction and hybridization analysis of this cosmid along with 135 06, 114 B7 and B313 M14 identified shared fragments and indicated that 135 06 and B313 M14 overlapped by a small margin, effectively closing the final gap of the main contig. 27 positive cosmids were scored from the 148 L21 hybridization. Restriction analysis followed by hybridizations failed to identify additional clones from all three libraries.

3.4.3 Identifying the contents of the EDC within the contig

Having closed the gaps within the main contig, the next step was to verify that the entire known EDC was encompassed within it. The probe representing S100A10 (the most proximal gene within the EDC) was hybridized against target DNA of the whole contig (1q21 sub-library filters and dot blots of additional clones), identifying the PAC clone 135 06 as positive. PCR primers were designed to amplify portions of the S100A1 and S100A13 genes (materials and methods). Amplification was carried out using fetal heart cDNA (constructed by Dr. Jane Ives, in Dr Nizetic’s laboratory, using the method described in chapter 2, section 2.2.22) as a template. PCR product was purified by the electroelution method in order to provide a hybridization probe. The probes were hybridized against target DNA from the whole contig but failed to identify any positive clones. The probes representing S100A1 and S100A13 were then hybridized against filters gridded with the chromosome 1 cosmid library. A number of positive clones displaying a weak hybridization signal were seen with the S100A1 probe, while two strongly hybridizing positive clones and a number of weaker ones (some of which were shared with S100A1 positive cosmids) were seen with the S100A13 probe. DNA was isolated from these clones, digested with EcoRI restriction enzyme and resolved on 20cm long 0.8% agarose gels. These gels were Southern blotted and re-probed with S100A1 and S100A13. With the S100A1 hybridization a single positive 6kb EcoRI fragment was seen, in clone ICRF112cP0780 only. An approximately 15kb EcoRI fragment present in the two prominent S100A13 positive clones gave strong signals upon hybridization with the S100A13 probe. In addition, a weaker signal was seen in a 3kb EcoRI fragment of cosmid ICRF112cP0780 with the probe representing S100A13. None of the weaker cosmids for either the S100A1 or the S100A13 hybridization, excluding ICRF112cP0780, re-probed upon Southern blot analysis. Sau3AI probes were prepared from the cosmid ICRF112cP0780 (see materials and methods), labeled and hybridized against both the cosmid and the PAC library filters. Two positive cosmid clones (ICRF112cL1963 and ICRF112cE2275) and two positive PAC clones (92 M23 and 178 F15) were seen. DNA was isolated from all the new clones and analyzed. The majority of the insert DNA from the two cosmid clones was contained within ICRF112cP0780, while the two PAC clones overlapped only slightly with each other and each included about half of the original cosmid.
Insert DNA was isolated from the two new PAC clones and used to screen the cosmid library. The insert from PAC clone 178 F15 gave a very weak hybridization signal for seven cosmids, which upon analysis proved to include a number of cosmids positive with the insert of PAC clone 148 L21. Further EcoRI digestion, resolution, blotting and hybridization showed that the two clones (148 L21 and 178 F15) overlap, successfully linking the S100A1 locus to the main contig.

Hybridization experiments with the two S100A13 positive cosmids identified both additional PAC and BAC clones. None of these clones, however, had been seen in previous hybridization experiments with any YAC, PAC, BAC, cosmid, STS or gene from the 1q21 region. Fluorescent in situ hybridization experiments (FISH, performed by Ghazala Mirza in the lab of Dr. Jiannis Ragoussis at UMDS, London) localized one of the strong S100A13 positive cosmid clones to human chromosome 10q23. This suggested that S100A13 had originally been mapped incorrectly (Wicki et al, 1996b), that another locus or pseudogene was located on chromosome 10, or that a novel S100 gene, very similar to S100A13, was located on chromosome 10. In order to assess the localization of the published S100A13 gene, the EcoRI fragment from ICRFc112P0780 that gave a weak hybridization signal with the probe for S100A13 was sequenced (see Chapter 5). These data proved that S100A13 did indeed localize to 1q21 and indicated that the entire EDC was contained within the main bacterial contig.

3.5 Restriction analysis and fine alignment of clones within the contig

A minimal tiling path (MTP) of clones was determined on the basis of the hybridization patterns displayed by either PAC or cosmid insert probes. Clones were selected to constitute the MTP on the basis of coverage; the least redundant clones were chosen where possible. In some instances (i.e. clones B313 M14, ICRF112cB0750, and 135 06) coverage exceeded two fold because the degree of overlap was small. DNA isolated from each clone was digested fourfold: an average of 1µg was digested in each case with EcoRI; EcoRI and NotI; EcoRI and SalI; and EcoRI, NotI, and SalI restriction enzymes. The four reactions for each clone were resolved by electrophoresis in adjacent wells of a 20cm long 0.8% agarose gel, next to the respective adjacent clone(s) in the MTP. Electrophoresis was carried out for an average of 1000 volt-hours (around 40V), and gels were stained with ethidium bromide, visualized under UV light, photographed, and Southern blotted. These photographs and Southern blots formed the basis of the restriction analysis. Figure 3.10A displays a Polaroid photograph of one such gel used for the restriction analysis. Insert DNA from each MTP clone (except in the case of cosmid clones) was systematically labeled and hybridized individually to the respective Southern blot(s) containing the clone itself and any clones known to overlap with it. The hybridization pattern seen on the Southern blots determined not only the orientation of overlapping clones but also the degree of overlap between MTP clones. Figures 3.10B and 3.11A display the Southern blot of the gel shown in figure 3.10A after hybridization with the insert of one of the clones present on the Southern blot. In some instances restriction enzyme digested fragments present on the Southern blots were not seen to be positive upon hybridization with insert DNA from the clone itself. Close scrutiny of figure 3.10 will reveal six EcoRI fragments from clone 19 K8 not positively identified with the hybridization of the insert DNA of clone 19 K8 (most notably two EcoRI fragments around 7kb). In the majority of cases the corresponding fragments in adjacent clones are also not positive with the insert hybridization. It was therefore concluded that, as these negative fragments are common to adjacent clones, a number of repetitive elements would be masking the hybridization signal (insert probes are pre-annealed with repetitive sequences prior to membrane hybridization, see section 2.2.9).

Figure 3.10. 0.8% agarose gel electrophoresis resolving 7 MTP clones (A) and an autoradiograph (B) displaying the corresponding Southern blot hybridized with the radio-labeled insert of clone 19 K8 (next page). A: Outermost lanes of the agarose gel contain 1.2µg of the 1kb ladder (Life Technologies) size standard (size to the nearest 1kb, except for the 1.6kb fragment, indicated next to the far right lane). Seven MTP clones digested fourfold with a combination of restriction enzymes (EcoRI: E, NotI: N, and SalI: S - as indicated for clones 20 N18 and 59 H12) are resolved in four lanes for each clone. The T7 and SP6 end EcoRI fragments for clone 230 D1 are indicated by black and white blocked arrows respectively in the EcoRI digested lane (T7 end fragment also present in the EcoRI and SalI digested lane). Vector fragments are indicated in lane ENS with clear arrows for clone 127 E12 (vector fragments present in all clones). B: PhosphorImager file (Molecular Dynamics) displaying the Southern blot of the gel in A hybridized with the radio-labeled insert of clone 19 K8. Southern blot displays a mirror image of the gel.

[Figure 3.10, panels A and B: gel and blot images not reproduced. Clone order across the gel in A: 20 N18, 59 H12, 19 K8, 140 J1, 127 E12, 230 D1, 148 L21; the order is mirrored in B.]

[Figure 3.11, panels A and B: blot images not reproduced. Clone order as in figure 3.10B. Panel A probe: 20 N18 insert. Panel B probe: cDNA representing a portion of the SPRR2e gene.]

Figure 3.11. Southern blot of the agarose gel in figure 3.10A hybridized with two 32P-labeled probes. (A) Southern blot of the gel shown in figure 3.10A hybridized with the labeled insert from PAC clone 20 N18. (B) Southern blot of the agarose gel shown in figure 3.10A hybridized with a labeled SPRR2 cDNA probe.

The asymmetric location of the SalI site, and the symmetric location of the two NotI sites, with respect to the PAC vector arms, enabled the determination of the orientation of each PAC clone within the MTP (see appendix I for details of restriction sites). NotI, as discussed for insert isolation, digests the vector near to the cloning site on either side (SP6 and T7 ends), effectively excising the vector from insert, while SalI digests the vector close to only one side of the multiple cloning site (SP6 end). Therefore the particular EcoRI fragment for each end (SP6 and T7) could be identified from the gel photograph. Figure 3.10A indicates the T7 and SP6 EcoRI fragments for clone 230 D1. When an adjacent clone insert was hybridized to the equivalent Southern blot, a positive signal was seen for the particular end fragment (SP6 or T7) that overlaps. In addition, the digestions revealed any EcoRI fragments that contained internal NotI or SalI restriction sites, indicating the possible locations of CpG islands (Bird, 1987; Gardiner-Garden and Frommer, 1987) and giving useful mapping reference points when comparing with the original genomic restriction maps of the region (see figure 3.1). End fragments could not be determined in some cases. In PAC clones this was due to co-running of end EcoRI fragments with internal EcoRI fragments. In the case of BAC clones, both NotI and SalI excise either side of the cloning site, and in the case of cosmid clones, no NotI or SalI sites are present in the vector. In these instances T7 and, on occasion, SP6 RNA polymerase extensions were prepared as probes (see section 2.2.14) and hybridized to the respective Southern blot(s).
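The end-fragment reasoning above can be sketched in Python. This is only an illustration, assuming the band sizes (in kb) have already been read from the EcoRI, EcoRI + NotI and EcoRI + SalI lanes of one PAC clone; the function names, the 2% matching tolerance and the example sizes are hypothetical and are not taken from the thesis. Because an internal EcoRI fragment carrying a NotI or SalI site behaves in the same way as an end fragment, the output is a list of candidates that still needs confirmation by hybridization with the adjacent clone.

```python
T7_CANDIDATE = "T7 end candidate (or internal NotI site)"
SP6_CANDIDATE = "SP6 end candidate (or internal NotI and SalI sites)"
INTERNAL_SALI = "internal fragment with SalI site"
INTERNAL_PLAIN = "internal fragment without NotI/SalI site"


def band_present(size_kb, lane, tol=0.02):
    """True if the lane holds a band within a relative tolerance of size_kb."""
    return any(abs(size_kb - band) <= tol * size_kb for band in lane)


def classify_ecori_fragments(e_lane, en_lane, es_lane, tol=0.02):
    """Label each EcoRI band by whether NotI and/or SalI removes it.

    NotI cuts the vector next to both insert ends; SalI cuts next to the
    SP6 end only. A band lost with NotI but kept with SalI is therefore a
    T7 end candidate, and a band lost with both is an SP6 end candidate.
    """
    labels = {}
    for size_kb in e_lane:
        lost_with_noti = not band_present(size_kb, en_lane, tol)
        lost_with_sali = not band_present(size_kb, es_lane, tol)
        if lost_with_noti and lost_with_sali:
            labels[size_kb] = SP6_CANDIDATE
        elif lost_with_noti:
            labels[size_kb] = T7_CANDIDATE
        elif lost_with_sali:
            labels[size_kb] = INTERNAL_SALI
        else:
            labels[size_kb] = INTERNAL_PLAIN
    return labels


# Hypothetical band sizes in kb for a single clone.
ecori = [12.0, 9.2, 7.0, 4.4, 3.1]
ecori_noti = [8.5, 8.0, 6.2, 4.4, 4.0, 3.1]
ecori_sali = [12.0, 9.2, 6.0, 4.4, 3.1]
labels = classify_ecori_fragments(ecori, ecori_noti, ecori_sali)
```

In this made-up example the 9.2kb and 12.0kb bands both fall into the T7 candidate class, which mirrors the ambiguity noted in the text: only the hybridization pattern with the neighbouring clone separates a true end fragment from an internal fragment containing a NotI site.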
After the degree of overlap had been determined between clones of the minimal tiling path, each fragment present on the gels was sized in the following way. Gels were scanned into a PC using the Hewlett Packard DeskScan II software package. The contrast of the image was inverted to resemble a negative and the file was imported into the Molecular Dynamics software package Fragment Analysis. This automatically sized the fragments within the gel based on the known size markers. From this the length of each clone and the respective clone overlap could be determined. Table 3.6 displays the clone length determined from the EcoRI analysis compared to the clone length determined from insert isolation PFGE (figure 3.8) for eight clones of the minimal tiling path (covering centromeric of the SPRR cluster to S100A2).
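How the Fragment Analysis package sizes bands is not described here, so the following Python sketch only illustrates one common approach: interpolating the logarithm of fragment size against migration distance between the two flanking marker bands. The function name, the migration distances and the ladder values are hypothetical.

```python
import math
from bisect import bisect_left


def size_fragment(distance_mm, ladder):
    """Estimate a fragment size in kb from its migration distance.

    ladder is a list of (distance_mm, size_kb) pairs for the marker bands.
    log10(size) is interpolated linearly between the two flanking markers.
    """
    points = sorted(ladder)  # increasing distance means decreasing size
    distances = [d for d, _ in points]
    if not distances[0] <= distance_mm <= distances[-1]:
        raise ValueError("band lies outside the range of the size markers")
    i = bisect_left(distances, distance_mm)
    if distances[i] == distance_mm:
        return points[i][1]
    (d0, s0), (d1, s1) = points[i - 1], points[i]
    fraction = (distance_mm - d0) / (d1 - d0)
    log_size = math.log10(s0) + fraction * (math.log10(s1) - math.log10(s0))
    return 10 ** log_size


# Hypothetical ladder: (migration distance in mm, marker size in kb).
ladder = [(20.0, 12.0), (35.0, 6.0), (50.0, 3.0), (65.0, 1.5)]
band_sizes = [size_fragment(d, ladder) for d in (27.5, 42.5, 50.0)]
clone_length_kb = sum(band_sizes)  # clone length as the sum of its fragments
```

Summing the sized fragments of one digest gives the clone length in the same way that the EcoRI totals in Table 3.6 are obtained, although co-migrating bands would be counted only once by such a simple sum.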

Table 3.6. Comparison of size determined from PFGE with size determined from total EcoRI fragments of 8 clones in the minimal tiling path.

Clone      EcoRI size in kb    PFGE size in kb    % difference
20 N18     163.5               150                +9
59 H12     123                 110                +11.8
19 K8      143                 140                +2.1
140 J1     152                 170                -11.8
127 E12    89                  110                -19.1
138 F6     108                 110                -1.8
230 D1     140.5               140                +0.4
148 L21    130                 140                -7.2

EcoRI fragments were ordered into ‘pockets’ or ‘bins’ based on the insert to Southern blot hybridizations. A degree of refinement to some of these ‘bins’ was achieved by comparing previous EcoRI gels of clones not in the MTP. This also positioned non-MTP clones in relation to MTP clones.

3.5.1 Coverage (contig depth)

The overall size of the region mapped in bacterial clones is 2450kb. Of this, the majority of the coverage was greater than 2 fold. The one area of the contig in which coverage was only single fold for more than 50kb was the most telomeric clone, 92 M23. For this reason the cosmid clones identified with PAC clone 92 M23 (section 3.4.3) were incorporated into the map. 18 positive cosmids were scored, of which 15 were positioned on the map. Two of these cosmids had previously been scored positive with probes derived from ICRF112cP0780 and 178 F15. Table 3.7 details the degree of coverage throughout the map.

Table 3.7. The distance covered and percentage of the total contig in relation to coverage ranging from single to 9-fold.

Coverage    Distance in kb    % of contig
1-fold      440               18.03
2-fold      723               29.64
3-fold      623               25.6
4-fold      393.5             16.11
5-fold      161               6.58
6-fold      33                1.37
7-fold      35.5              1.46
8-fold      21                0.87
9-fold      8                 0.33
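A coverage breakdown of the kind shown in Table 3.7 can be derived from clone positions with a simple sweep along the contig. The Python sketch below is a minimal illustration, assuming each clone is described by hypothetical start and end coordinates in kb; the function name and the four example clones are invented and do not correspond to clones in the map.

```python
from collections import defaultdict


def coverage_by_depth(clones):
    """Return {depth: kb of contig covered by exactly that many clones}.

    clones is an iterable of (start_kb, end_kb) intervals on the contig.
    """
    events = []
    for start, end in clones:
        events.append((start, 1))   # a clone begins: depth rises by one
        events.append((end, -1))    # a clone ends: depth falls by one
    events.sort()

    covered = defaultdict(float)
    depth = 0
    previous = None
    for position, change in events:
        if previous is not None and depth > 0 and position > previous:
            covered[depth] += position - previous
        depth += change
        previous = position
    return dict(covered)


# Four hypothetical overlapping clones spanning 400kb.
example_clones = [(0, 150), (100, 250), (120, 300), (280, 400)]
depths = coverage_by_depth(example_clones)
total_kb = sum(depths.values())
percentages = {d: 100.0 * kb / total_kb for d, kb in depths.items()}
```

For these invented clones 230kb is covered once, 140kb twice and 30kb three times, so the single-fold stretches make up 57.5% of the 400kb example contig.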

3.5.2 Localizing genes and markers

Probes were obtained for all known genes and markers in the region. PCR primers were designed and used to amplify a portion of the S100A5 gene. In addition to this and other probes (used to define the bacterial map, see sections 3.31, 3.33, and 3.36), cDNA clones representing the remaining genes within the EDC were obtained from two sources. cDNA clones representing S100A4, S100A7, and S100A11 from a cultured keratinocyte library were identified in the laboratory of Dr. Dietmar Mischke by Ingo Marenholz and kindly donated (South et al, 1999). In addition, clones representing the markers D1S3623 and D1S3624 were kindly donated by Ingo Marenholz and Dietmar Mischke. IMAGE consortium cDNA clones representing S100A1, S100A2, S100A3 and S100A12 were obtained from the HGMP resource centre (Hinxton, Cambridge, UK). For details of primers and cDNA clones see section 2.1 (materials and methods). A probe representing the recently cloned human repetin gene (Huber et al, 1999) was kindly donated by Marcel Huber, Lausanne. All probes were used in systematic hybridizations against the MTP Southern blots. This determined the specific EcoRI fragments to which these genes and markers localized. In some instances this positioning provided specific information regarding the ordering of certain fragments within the restriction map. For example, the probe representing S100A12 hybridized to a 4,450bp EcoRI fragment present in both clones 127 E12 and 138 F6, as well as to an internal 12,500bp EcoRI fragment of 138 F6 and the 9,200bp SP6 end EcoRI fragment of 127 E12. These data determined that the SP6 end EcoRI fragment of clone 127 E12 overlaps with the 12,500bp EcoRI fragment of clone 138 F6, enabling the exact positioning of the two fragments positive for S100A12. Figure 3.11B displays the Southern blot of the agarose gel in figure 3.10A hybridized with labeled cDNA representing the SPRR2e gene.
Due to the conserved nature of the SPRR2 genes, a number of positively hybridizing EcoRI fragments can be seen, relating to the 7 members of the SPRR2 family.

3.5.3 New ESTs and STSs from the region

So far, three new STS markers have been generated in the lab of Dr. Dietmar Mischke by Ingo Marenholz: y37m-2, y37m-6, and y37m-16. Portions of the YAC 764_a_1 were sub-cloned and sequenced to provide these markers, which were used in hybridization experiments as described for other known markers. The same group isolated a number of new cDNA clones from the region by hybridizing whole YAC clones to a gridded keratinocyte cDNA library (South et al, 1999). One of these clones localized to the proximal end of the EDC contig and was subsequently hybridized to an MTP Southern blot containing clones from this area, identifying positive EcoRI fragments.

3.6 Integrated 2.45 mega-base map of human chromosomal region 1q21 encompassing the entire EDC

Restriction mapping data were collated with gene, marker and EST localization data to produce the final, completely contiguous, integrated map displayed in figure 3.12. Appendix II details all hybridization and PCR-based data used to construct the final map. Fluorescent in situ hybridizations were performed in the laboratory of Dr. Jiannis Ragoussis by Ghazala Mirza with PAC clones 61 J12, 135 06, 14 N1, 43 017, 71 A4 and 148 L21, confirming their specific localization to the 1q21 region (South et al, 1999). In addition, the cosmid clone ICRF112cP0780 was used in a FISH experiment. This cosmid as a probe gave a clear signal on 1q21, confirming its localization to this area, but also produced a faint signal on 1p36. This is in agreement with studies observing homology between chromosomal regions 1p36, 1p13 and 1q21 (Zhang et al, 1999; see also chapter 6).

Figure 3.12. Restriction and transcriptional map of the 2.45 mega-base bacterial contig encompassing the entire EDC. The map is set out in five parts over two pages. Each part is as follows: the lowest horizontal scale bar is in kilobases, 0 being the most centromeric end of the contig. The top (thicker) horizontal bar is a schematic representation of chromosomal DNA showing the localization of markers, genes and ESTs in the region. Rhomboid signs represent genes; ellipsoid signs represent markers, STSs and ESTs. y37m-2, y37m-6, and y37m-16 are new markers and their GenBank accession numbers are X94412, AJ009641, and X95103, respectively. 10198b8 is a new keratinocyte cDNA clone mapping to the region, which has the accession number AJ009640. Underneath the schematic of chromosomal DNA are bacterial clones represented by dark horizontal lines. Clones that make up the minimal tiling path are shown with orientation, restriction enzyme sites and localization of genes/markers or ESTs. Black boxes represent T7 clone ends while white boxes represent SP6 clone ends. Small vertical bars show EcoRI restriction enzyme sites while a large N or S represents NotI and SalI restriction enzyme sites, respectively, within particular EcoRI fragments. Polymorphic EcoRI sites in our map are represented by a small grey vertical bar. The vertical grey rectangles represent EcoRI ‘bins’ - the precise ordering of fragments within the bins has not been determined. BAC clones are shown with the prefix ‘B’ while cosmid clones have the prefix ‘c’ instead of ICRFc112. The remaining clones are PACs. Fluorescent in situ hybridisation has confirmed the following clones to localize to 1q21: 61J12; 13506; 41N1; 43017; 71A4; 148L21; and cP0780.

[Figure 3.12: map image, in five parts over two pages, not reproduced. Gene and marker labels recoverable from the figure, in the order in which they appear across the five parts: 1019b8, D1S3623, y37m-6, y37m-2, S100A10, S100A11; y37m-16, trichohyalin, repetin, D1S3624, profilaggrin; D1S1664, involucrin; SPRR1, SPRR3, SPRR1, SPRR2, loricrin, S100A9, S100A12, S100A8; S100A7/D1S3625, S100A6, S100A5, S100A4, S100A3, S100A2, S100A13/S100A1.]

3.6.1 Submission of the bacterial contig for large scale sequencing

The entire contig (figure 3.12) was submitted to the Sanger Centre, Hinxton, Cambridge, UK to be integrated into the large scale sequencing project of human chromosome one. At the time of submission this contig was the largest completely overlapping bacterial clone map of chromosome 1 to have been accepted. Sequencing progress can be viewed via the Sanger Centre web site, specifically http://webace.sanger.ac.uk/cgi-bin/ace/pic/1ace?name=Chr_1ctg80&class=Map&click=303-8.303-8,382-8.98-8. A more general view of the sequencing progress across chromosome one can be viewed via http://www.webace.sanger.ac.uk, from which the page displaying the sequencing progress of the contig is reached through Chr_1ctg80. The Sanger Centre has predicted the deposition of raw sequence data onto the web site around October 1999.

3.6.2 Comparison of the integrated 2.45Mb bacterial map with previous mapping studies of 1q21

The sum of the lengths of the non-redundant restriction endonuclease fragments displayed in figure 3.12 gives the physical distances between genes and markers present within the 2.45Mb of the integrated map. These measurements are in good agreement with previous studies of the region using long-range restriction endonuclease digestion of genomic DNA (Volz et al, 1993; Mischke et al, 1996; also see figure 3.1). These previous studies found the genes S100A10 and S100A6 on genomic fragments with minimum sizes of 1760kb and 1750kb, respectively, and maximum sizes of 2050kb and 1900kb, respectively. We find the same genes occupying a distance of 1700kb. The small difference of 50-60kb can easily be attributed to the difference in the resolution of the techniques used. A comparable difference between the resolution of the techniques can be seen from the recent 1.8Mb sequence of the MHC class I region (Shiina et al, 1999), where initial estimates from mapping data suggested the same region covered 2Mb. The genes S100A6, S100A5, S100A4 and S100A3 have previously been localized to a 15kb region in genomic DNA-containing phages (Englekamp et al, 1993). In agreement, we see these S100 genes localizing to a similarly sized region. The identified 8.9kb EcoRI fragment positive with the probe representing S100A13 is in good agreement with the 10kb EcoRI fragment identified by Wicki et al, 1996a. The study of a YAC containing the S100 genes A1 and A9 (Schaefer et al, 1995) gives a distance between these two genes of 320kb. Here we describe the maximum distance between these genes as 330kb and also describe the positions of SalI sites identified in that study. A mapping study using mainly YAC clones (Zhao and Elder, 1997) gives the distance between the genes involucrin and S100A2 as 780kb. The map presented here is in agreement, with a maximum distance of 790kb for the same interval, and also confirms the position of a single NotI site over this region.

3.6.3 Precise ordering of genes and markers within the EDC

The study here describes the precise ordering of genes and markers known to be present within the region covered by the integrated 2.45Mb map, in addition to two new markers and one identified keratinocyte cDNA, except in the case of the SPRR gene cluster (see figure 3.12). Here, due to the high number of related genes, we did not attempt a precise ordering. This is in the main part due to a parallel detailed mapping study of the SPRR cluster in Claude Backendorf’s laboratory, Leiden. However, the relative ordering of the three SPRR families (integrating the detailed Leiden study) is given in complete agreement with their precise ordering (see South et al, 1999). The orders of all genes and markers previously localized to low resolution physical maps are in agreement with the data presented here, with two exceptions. One refinement to the previous 6Mb YAC contig of the region (Marenholz et al, 1996) is described. This YAC contig was unable to resolve the marker D1S3625 with respect to S100A6, giving the order cen - S100A9 - S100A6 - D1S3625 - tel. Here we refine this order to cen - S100A9 - D1S3625 - S100A6 - tel. One refinement to the order of the S100 gene cluster is described. A previous mapping study using a single YAC was unable to resolve the order of S100A1 and S100A13 (Schaefer et al, 1995). Here we describe the order as cen - S100A6 - S100A2 - S100A13 - S100A1 - tel. In addition, the positioning of the three new markers, y37m-2, y37m-6, and y37m-16, is described (GenBank accession numbers X94412, AJ009641, and X95103 respectively). The positioning of a novel keratinocyte cDNA to the region, approximately 300-350kb telomeric of the so-far-defined EDC (GenBank accession number AJ009640), is also presented (1019b8 - see figure 3.12).

Chapter 4: Use of the integrated map as a platform for the identification of novel transcribed sequences within the Human Epidermal Differentiation Complex (via Exon Trapping)

4.1 Summary

Exon trapping was performed from an approximately 170kb segment of the EDC using three PAC clones identified in chapter 3. Thirteen putative exons were isolated, comprising more than 13.2% of the total “exon trapped” clones produced. Homologous sequences to three putative exons were identified using BLASTN 2.0.9 and BLAST? 2.0.8 (Altschul et al, 1997) searches of the available databases. Of these homologous sequences, a single cDNA clone is 81% homologous to one identified exon at the nucleotide level, and is shown to map to the “exon trapped” region by hybridization and direct sequencing. An additional “exon trapped” clone showed 46% identity to the same cDNA clone at the amino acid level. RT-PCR and Northern blot data suggest this cDNA is expressed and is related to a number of different transcripts in adult tissues. These data suggest that multiple related transcripts of a novel type are localized within the Epidermal Differentiation Complex.

4.2 Introduction

Estimations of gene content within a mammalian genome vary considerably (Fields et al, 1994) but it is generally accepted that only a small proportion is transcribed to constitute mRNA (3-5%). As discussed previously, the primary goal of the human genome project is to generate a complete nucleotide sequence of the human genome. Over the last decade technological advances have enabled this goal to become a fixed point on our horizons. The main driving force behind the realization of this goal was the search for genes.

Before the introduction of global sequencing of both genomic DNA and transcribed DNA, efforts were (and still are) focussed on identifying transcripts from certain genomic regions of biological interest. Positional cloning projects identify regions associated with a genetic disorder, followed by the hunt for candidate genes that potentially contribute to the disorder. The introduction to this chapter will give a brief outline of some of the strategies successfully employed in the hunt for genes.

4.2.1 cDNA libraries and their screening

The majority of transcript identification methods rely on the construction of libraries of transcribed sequences cloned into a suitable vector-host system. RNA from a cell line or tissue of interest is isolated and reverse transcribed into cDNA. The resulting cDNA is cloned into a suitable vector and is available for transformation of a host organism or immediate analysis. It is important to note that a cDNA library will represent the population of transcripts from the cell line or tissue(s) used for RNA isolation. Housekeeping (essential for cellular function) and tissue specific transcripts will both be present, and in different quantities. Random sequencing of clones from a cDNA library generated from a liver cell line showed that 2.6% of the estimated transcripts constituted 52% of the total mRNA present (Okuba et al, 1992). The realization that the majority of transcripts within a cell derive from a small proportion of the individual transcripts, expressed in greater quantities, prompted the production of ‘normalized’ libraries (Bento Soares et al, 1994). A normalized library aims to reduce the quantity of highly expressed transcripts whilst enriching the number of poorly expressed transcripts present within the library, thus increasing the chances of identifying novel genes as opposed to analyzing expression patterns of different tissues. Normalizing is achieved by annealing a single stranded population of cDNA with an equal amount of mRNA from identical sources. By this means, the more abundant cDNAs will anneal at a faster rate than the less abundant cDNAs. After a degree of annealing, single stranded cDNA is isolated as a ‘normalized’ population.
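The effect of annealing rate on abundance can be illustrated with a short Python sketch. It uses idealised second-order reassociation kinetics, in which the single stranded amount of a species remaining after time t is c0 / (1 + k * c0 * t); this model, the assumption that each cDNA species meets its complementary mRNA at the same concentration, the rate constant and the starting abundances are all simplifying assumptions made for illustration rather than values from the cited work.

```python
def remaining_single_stranded(c0, k, t):
    """Single stranded amount left after time t under second-order kinetics.

    c0 is the starting concentration of the species (arbitrary units),
    k a rate constant and t the annealing time, in matching units.
    """
    return c0 / (1.0 + k * c0 * t)


# Hypothetical starting abundances: one abundant and one rare transcript.
abundant_start, rare_start = 1000.0, 1.0
k, t = 1.0, 1.0

abundant_left = remaining_single_stranded(abundant_start, k, t)
rare_left = remaining_single_stranded(rare_start, k, t)

ratio_before = abundant_start / rare_start
ratio_after = abundant_left / rare_left
```

Under these assumptions a 1000-fold difference in abundance shrinks to roughly 2-fold in the fraction that is still single stranded, which is the population recovered as the ‘normalized’ library.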

Identifying the genomic origin of a particular cDNA within a library can be achieved by mapping the cDNA of interest, constructing a cDNA library from a specific genomic origin or screening a library with genomic DNA of known origin. Direct screening of arrayed cDNA libraries with genomic clones has been demonstrated as a useful way of identifying transcribed sequences from a region of interest (Elvin et al, 1990). In this case repetitive elements within the genomic clone are suppressed before hybridization to avoid identifying a proportion of cDNAs that contain highly repetitive sequences (Crampton et al, 1981) and are not located on the genomic clone used. Libraries of genomic clones themselves can be screened with cDNA populations as hybridization probes (Hochgeschwender et al, 1989). In this case a genomic library of specific origin is constructed (either from flow sorted material or somatic cell hybrids) and arrayed on membranes that can be hybridized with different cDNA populations. Hybridizing different populations of cDNAs to identical membranes can elucidate expression patterns, whilst genomic clones identified to contain differentially expressed sequences of interest can be readily isolated.

4.2.2 cDNA selection

Transcribed sequences from a specific genomic region can be identified and isolated by direct selection using a genomic clone of interest (Lovett et al, 1991; Parimoo et al, 1991; Morgan et al, 1992; Korn et al, 1992). This method identifies cDNA sequences from a whole cellular complement that specifically anneal to a genomic clone (or clones) immobilized on a solid support. In this case suppression of repetitive sequences is carried out in both the cDNA and the genomic clone of interest, prior to hybridization. After hybridization, non-homologous cDNAs are washed from the immobilized genomic clone DNA at high stringency. The remaining, homologous cDNAs are eluted from the solid support and amplified using the polymerase chain reaction. Additional rounds of selection can be undertaken with the amplified products, increasing the specificity of the experiment.

4.2.3 Global cDNA sequencing and mapping

Randomly sequencing cDNA clones within a library provides a useful database of transcribed sequences. Projects sequencing large numbers of cDNA clones from many different libraries have been initiated (Adams et al, 1991; 1992; 1995; Lennon et al, 1996). A number of parallel studies to map a large proportion of the growing number of transcribed sequences have also been undertaken (Schuler et al, 1996; Deloukas et al, 1998) facilitating assignment to genomic regions. This global approach not only assigns possible genes to genomic regions of possible biological significance but also assists the global mapping within the human genome project.

4.2.4 Approaches to identify transcribed sequences directly from genomic DNA without prior knowledge of the tissue of expression

The above strategies define the use of cDNA libraries in identifying transcribed sequences. The major drawback to the use of cDNA libraries is that they are limited to the tissue or cell type of origin and to the levels of transcript expression. Normalization of a cDNA library cannot always enrich for sequences of minimal expression. Techniques to isolate transcribed sequences that circumvent the need for cDNA construction present distinct advantages over these methods.

4.2.4.1 Zoo blots

The premise that transcribed sequences will be more conserved across evolution has enabled their identification by hybridizing genomic fragments to Southern blotted genomic DNA of diverse species. This technique was first used to identify transcribed sequences of the gene implicated in Duchenne muscular dystrophy (Monaco et al, 1986). With the more recent advances in sequencing this exact principle can be exploited by the direct sequence comparison of two syntenic regions of separate species. This has been demonstrated with the identification of regulatory elements within the β-globin locus by comparing the sequences across species (Hardison et al, 1997).

4.2.4.2 Linking libraries

Linking clones are defined by short sequences isolated by the criteria that they contain a cleavage site for restriction enzymes that digest genomic DNA infrequently. A number of such enzymes, such as NotI, are associated with CpG islands, genomic regions containing a high density of non-methylated CG dinucleotides (Bird, 1986). It was noted that such CpG islands are associated with many housekeeping genes and an estimated 40% of tissue specific genes (Gardiner-Garden and Frommer, 1987). Therefore the identification of a linking clone not only provides the means for the generation of an STS for mapping purposes but also has the potential for containing a transcribed sequence. Linking clones are produced by partially or completely digesting genomic DNA and isolating fragments of the desired size, which are subsequently circularized. The circularized DNA is purified and digested using the restriction enzyme of choice (invariably an enzyme associated with CpG islands or infrequent cleavage sites throughout the genome). The linear DNA is then cloned into a suitable vector that has also been digested with the enzyme of choice, producing a population of clones containing sequences flanking the enzyme site of choice.
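The association between rare-cutter sites and CpG islands can be sketched computationally. The example below is illustrative only: it locates NotI sites (GCGGCCGC) and applies the commonly quoted criteria of Gardiner-Garden and Frommer (length of at least 200bp, GC content above 50%, observed/expected CpG ratio above 0.6) to a whole sequence. The function names and the test sequence are hypothetical, and a practical implementation would scan in sliding windows.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence


def noti_sites(seq):
    """Return 0-based start positions of NotI sites in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(NOTI_SITE) + 1)
            if seq.startswith(NOTI_SITE, i)]


def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for seq."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC content and CpG ratio thresholds to the whole sequence."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


# Hypothetical test sequences, not derived from the EDC.
candidate = NOTI_SITE + "CG" * 100
at_rich = "AT" * 150
print(noti_sites(candidate), looks_like_cpg_island(candidate), looks_like_cpg_island(at_rich))
```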

4.2.4.3 Sequence annotation

In 1991 the first efficient example of identifying protein-coding sequences with the use of computer prediction software was described (Uberbacher and Mural, 1991). Since then a number of software packages have been developed that analyze a given stretch of genomic sequence via neural networks, using algorithms based on current knowledge of transcribed sequences. Different algorithms for different genomes are needed depending on complexity. For instance, up to 90% of a prokaryotic genome can code for protein via intronless sequences, compared to the estimated 1-3% in human, interrupted by intronic sequences. Dynamic programming algorithms that consider all combinations of possible transcribed sequences and choose the best using previous information are the basis of many eukaryotic genefinders (Salzberg et al, 1999). These genefinding programs will analyze a stretch of sequence and predict elements such as an exon or regulatory binding site based on known data available. Neural network programs can utilize the data from predictions made to re-analyze the sequence and make the best choice; the data determined will be used in subsequent sequence analysis, thereby facilitating ‘learning’. This can have drawbacks in that if sequences are being predicted incorrectly they will continue to be predicted incorrectly; therefore a degree of updating of systems and programs is necessary as more reliable expression data are determined from predicted sequences. Common computational methods are of the Markov chain family, in which the probability of an event is based on a fixed number of previous events. These methods are being continually updated and improved to cope with the complexities of sequence presented (Burge and Karlin, 1997; Salzberg et al, 1999). Current applications are quoted to identify 50-80% of exons present within a given sequence (Burge and Karlin, 1997), although examples are seen where the rates of false negative and false positive predictions are high (Gardiner, 1997; see below).
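The Markov chain principle described above can be sketched as follows. This is a toy first-order model, not the method of any named genefinder: one chain is trained on example 'coding' sequences and another on 'background' sequences, and a candidate sequence is scored by the log-odds of the two models. The training sequences, pseudocount and function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=1, pseudocount=1.0):
    """Return prob(context, base) estimated from transition counts in seqs."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        row = counts[context]
        total = sum(row.values()) + 4 * pseudocount  # four possible bases
        return (row[base] + pseudocount) / total

    return prob


def log_odds(seq, coding, background, order=1):
    """Sum of log(P_coding / P_background) over every transition in seq."""
    return sum(
        math.log(coding(seq[i - order:i], seq[i]) /
                 background(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )


# Hypothetical training data: GC-rich 'coding' and AT-rich 'background' examples.
coding_model = train_markov(["ATGGCCGCCGCGGCC", "GCCGCGGCGGCC"])
background_model = train_markov(["ATATTTAATATAAT", "TTTAAATATTA"])

gc_score = log_odds("GCCGCG", coding_model, background_model)
at_score = log_odds("ATATTA", coding_model, background_model)
print(gc_score > 0, at_score < 0)
```

A positive score means the sequence resembles the 'coding' training set more than the background. The sketch also shows the dependence on training data noted in the text: a model trained on unrepresentative sequences will go on misclassifying similar sequences until it is retrained.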

4.2.4.4 Exon trapping

Exon amplification systems rely on the presence of functional splice sites within genomic DNA. The first system described used a retroviral shuttle vector to select for 3’ splice sites in genomic DNA (Dyuk et al, 1990) while the second system described selected for sequences flanked by functional 3’ and 5’ splice sites (Buckler et al, 1991). In this second system, genomic DNA fragments are inserted into an HIV tat intron present within a mammalian expression vector, pSPL1. Genomic-vector DNA constructs are transfected into COS-7 cells where an SV40 early promoter drives expression of the construct sequence. Genomic sequences flanked by functional splice sites present in the correct orientation within the pSPL1 construct will be spliced, whereas sequences not flanked by functional 5’ and 3’ sites will be spliced out. Cytoplasmic mRNA is then isolated from the COS-7 cells and reverse transcribed to cDNA. Spliced sequences derived from the pSPL1 construct are amplified using PCR and analyzed. This method was later modified with the introduction of the pSPL3 vector (Church et al, 1994). The advantage provided by the modified vector was the inclusion of a BstXI restriction enzyme half site either side of the tat intron. When the genomic sequence contains a cryptic 3’ splice site, or where no genomic sequence is spliced at all within the construct, a functional BstXI site is formed. Digestion with BstXI restriction enzyme prevents any PCR amplification of sequences containing no identified exons or cryptic 3’ splice sites from the genomic sequence of interest, thus reducing a problem symptomatic of the earlier system. A distinct advantage of the exon trapping system is the ability to identify functionally transcribed sequences regardless of any temporal or spatial expression constraints observed by cDNA based methods. In addition, exon trapping is able to identify exons that are not always predicted by the available software. For example, in analyzing SOOkb of genomic sequence derived from chromosome 21, the sequence prediction software GRAIL (grail2) failed to identify 5 of 19 exons of a characterized gene from the region (Gardiner, 1997).
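The BstXI counter-selection can be illustrated with a short sketch. BstXI recognizes CCAN6TGG, so a product in which the two vector half sites have been joined carries a complete site and is cut before amplification. The sequences below are hypothetical placeholders and are not the actual pSPL3 junction sequences.

```python
import re

# BstXI recognition sequence: CCA, any six bases, TGG.
BSTXI_SITE = re.compile(r"CCA[ACGT]{6}TGG")


def has_bstxi_site(seq):
    """True if seq contains at least one complete BstXI site."""
    return BSTXI_SITE.search(seq.upper()) is not None


def products_surviving_digest(products):
    """Keep only products lacking a BstXI site, i.e. those not cut before PCR."""
    return {name: seq for name, seq in products.items()
            if not has_bstxi_site(seq)}


# Hypothetical spliced products (illustrative sequences only).
products = {
    # Half sites joined directly: a complete site is reconstituted.
    "vector_only": "GTCGACCCAGCAACCTGGAGATC",
    # Trapped sequence separates the half sites: no complete site.
    "trapped_exon": "GTCGACCCAGCAATTTAGGCCTTACCTGGAGATC",
}
print(sorted(products_surviving_digest(products)))
```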

In this chapter exon trapping is assessed as a method of identifying transcribed sequences within the Epidermal Differentiation Complex. At the beginning of this study, no active attempt to identify transcribed sequences within the EDC had been made.

4.3 Generation of “exon trapped” libraries from selected EDC PAC clones

4.3.1 Starting material

Three PAC clones were selected from the integrated map (see chapter 3) as starting material for the exon trapping approach for identifying transcribed sequences. Clones 127 E12, 128 L15, and 138 F6, covering the region centromeric of the S100A9 gene to telomeric of the S100A7 gene, were selected (see figure 3.12 for location). This 175kb region was selected on the basis of a number of criteria as follows. Firstly, a small region was chosen as a pilot study to evaluate the exon trapping strategy of identifying transcribed sequences within the EDC. Secondly, the region was selected to include a number of S100 genes. S100 genes contain three exons and would therefore act as positive controls for the exon trapping strategy. The loricrin, involucrin, and SPRR family consist of genes with only two exons, while the region containing profilaggrin and trichohyalin was being investigated in the laboratory of Dr Dietmar Mischke, Berlin. Template DNA from the three aforementioned PAC clones was isolated and prepared as described in section 2.2.19.2 (materials and methods). Both the given methods of preparing template DNA (completely digesting the PAC DNA with BamHI and BglII restriction enzymes, or partially digesting the PAC DNA with Sau3AI restriction enzyme) were used as two separate experiments for each clone.

4.3.2 Production of “exon trapped” libraries

Exon trapping was carried out as described (section 2.2.19) with Sau3AI partially, or BamHI/BglII completely, digested template DNA for each of the three PAC clones. In addition, H2O and approximately 5µg of pSPL3 vector were used in separate exon trapping experiments, as controls. Final, amplified, products were cloned using the CLONEAMP® pAMP1 system (Life Technologies), following the manufacturer’s specifications. Successful cloning of amplified products was screened by growing transformants on media containing IPTG and X-gal (see materials and methods). The multiple cloning site of the pAMP1 vector is within the α-peptide of the lac Z gene, so insertion of DNA into this multiple cloning site disrupts the lac Z gene, resulting in white colonies (no lac Z expression) rather than blue colonies (lac Z expression). White or light blue colonies were picked into 384-well microtitre dishes containing the appropriate media and antibiotic. Colonies resulting from Sau3AI partially, and BamHI/BglII completely, digested PAC template were picked into one microtitre dish or plate for each PAC clone. A single row of colonies resulting from each of the pSPL3 and H2O control templates was also picked into each plate. Table 4.1 describes the number of colonies picked for each plate.

Microtitre dish number   Starting template          Number of colonies picked
1 (127 E12)              127 E12 Sau3AI partial     144
                         127 E12 BamHI/BglII        144
                         H2O                        24
                         pSPL3                      24
2 (128 L15)              128 L15 Sau3AI partial     120
                         128 L15 BamHI/BglII        144
                         H2O                        24
                         pSPL3                      24
3 (138 F6)               138 F6 Sau3AI partial      144
                         138 F6 BamHI/BglII         144
                         H2O                        24
                         pSPL3                      24

Table 4.1. Numbers of colonies (clones) picked into each of the 3 microtitre dishes from exon trapping experiments with the given template.
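As a simple consistency check on table 4.1, the colony counts can be tallied per plate and across the three libraries. The dictionary layout and variable names are illustrative; the counts are those given in the table.

```python
# Colonies picked per microtitre dish, as listed in table 4.1.
colonies = {
    "127 E12": {"Sau3AI partial": 144, "BamHI/BglII": 144, "H2O": 24, "pSPL3": 24},
    "128 L15": {"Sau3AI partial": 120, "BamHI/BglII": 144, "H2O": 24, "pSPL3": 24},
    "138 F6": {"Sau3AI partial": 144, "BamHI/BglII": 144, "H2O": 24, "pSPL3": 24},
}
CONTROLS = {"H2O", "pSPL3"}
WELLS_PER_DISH = 384

# Total wells used on each plate, controls included.
per_plate = {clone: sum(counts.values()) for clone, counts in colonies.items()}

# Clones derived from PAC template only, controls excluded.
library_clones = sum(
    n for counts in colonies.values()
    for template, n in counts.items() if template not in CONTROLS
)

print(per_plate)        # wells used per dish
print(library_clones)   # 840
```

Each plate fits within a single 384-well dish, and the total excluding controls agrees with the figure of 840 "exon trapped" clones quoted in section 4.4.5.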

One microtitre dish constituted an “exon trapped” library for each of the three PAC clones. Microtitre dishes were grown overnight at 37°C. The resulting cultures were used to spot nylon membranes, which were grown overnight at 37°C on agar containing the appropriate antibiotic. Membranes were processed as described (see section 2.2.1) and the microtitre dishes frozen on dry ice, and stored at -80°C.

4.4 Analysis of “exon trapped” libraries

4.4.1 Screening

The “exon trapped” libraries were screened with a number of different hybridization probes. Figure 4.1 displays a selection of the raw data from hybridization experiments using a number of probe/target combinations that are given in table 4.2. Table 4.2 details the hybridization results of membranes containing spotted microtitre dishes with the given radio-labeled probes (initial library screening). The difference between positive hybridization signal, compared to negative hybridization signal, varied depending on the nature of the probe.

Figure 4.1. Membranes containing spotted “exon trapped” library derived from PAC clone 127 E12 hybridized with separate 32P labeled probes. Probes used as follows: A = HIV intron. B = 127 E12 insert DNA. C = 3rd PCR from 127 E12 Sau3AI exon trapping experiment. Arrows give examples of clones selected for analysis.

Hybridization probe     Microtitre dish 1     Microtitre dish 2     Microtitre dish 3
                        membrane (127 E12)    membrane (128 L15)    membrane (138 F6)
HIV                     11                    10                    2
Cot-1                   17                    8                     0
127 E12 insert          40                    18                    34
128 L15 insert          15                    39                    41
138 F6 insert           2                     15                    15
127 E12 3rd PCR - S     11                    18                    23
127 E12 3rd PCR - B/B   U                     U                     U
128 L15 3rd PCR - S     8                     20                    1
128 L15 3rd PCR - B/B   7                     17                    1
138 F6 3rd PCR - S      12                    3                     37
138 F6 3rd PCR - B/B    1                     4                     24
S100A7                  0                     0                     0
S100A8                  0                     0                     0
S100A9                  0                     0                     0
S100A12                 0                     0                     0

Table 4.2. Number of positive clones identified with the given radio-labeled probe (column 1) hybridized against nylon membranes containing spotted clones of the 3 “exon trapped” libraries (columns 2-4). U represents unreadable results from hybridization experiments (signal too weak).

4.4.2 Assessment of trapped exon sequence from S100 genes within the “exon trapped” libraries

No positive signals were seen on the membranes representing the “exon trapped” libraries from hybridization with probes representing the four S100 genes (positive controls). In order to assess the presence of trapped exon sequences from the S100 genes within the libraries, thereby assessing the effectiveness of the experimental procedure, the products from the 3rd PCR were investigated (see section 2.2.19.7). One fifth of the 3rd PCR products from each exon trapping experiment were resolved on a 0.8% agarose gel.

This gel was Southern blotted and hybridized with the probe representing S100A12. Figure 4.2 displays this result, identifying a positive signal at around 550bp in the lane resolving the 3rd PCR products from the exon trapping experiment using 128 L15 Sau3AI partially digested DNA template. The second exon of S100A12 is 158bp in size which, coupled with 56bp of extra sequence provided by the primer binding sites and vector sequence, gives an expected fragment size of 214bp. S100A12 maps to all three PAC clones used in the exon trapping experiments described (see figure 3.12) and therefore positive signal would be expected from exon trapping experiments of all three clones. Probes representing S100A8 and S100A9 were hybridized to Southern blots of the resolved 3rd PCR products in an identical manner. Positive signals were seen in the lanes resolving 3rd PCR products from 127 E12 Sau3AI and 128 L15 BamHI/BglII template DNA with the S100A9 probe (around 300bp) and positive signal was seen in the lane resolving 3rd PCR products from 138 F6 BamHI/BglII template DNA with the S100A8 probe (around 300bp). The sizes of the second exons of S100A8 and S100A9 are similar to that of S100A12 (164bp and 165bp respectively), therefore in each case positively hybridizing signals were larger than expected. This could be explained by exon-like sequence adjacent to the second exon of these S100 genes being spliced together with it, thus producing a larger product. Positive hybridization signal was seen in at least one of the exon trapping experiments from each PAC clone used as template. The fact that positive signals were seen in some, but not all, of the expected exon trapping experiments did present an anomaly. This could be explained by the fact that each exon trapping experiment represented a different population of trapped exons. The competitive nature of the PCR (amplifying a wide range of products) could selectively amplify a differing population of exons depending on the different starting template. The above data provided evidence that exons from the three S100 genes A8, A9, and A12 (the exon/intron structure of S100A7 was not available and therefore was not investigated) were trapped in a number of the exon trapping experiments performed. It was therefore decided that the “exon trapped” libraries produced would be investigated further.
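The size comparison made above can be set out as a short calculation. The exon sizes and approximate observed band sizes are those given in the text; applying the same 56bp of primer and vector sequence to the S100A8 and S100A9 products is an assumption made here for illustration, since the text states this figure for S100A12 only.

```python
FLANKING_BP = 56  # primer binding sites plus vector sequence (stated for S100A12)

second_exon_bp = {"S100A12": 158, "S100A8": 164, "S100A9": 165}
observed_bp = {"S100A12": 550, "S100A8": 300, "S100A9": 300}  # approximate band sizes


def expected_product(exon_bp, flanking_bp=FLANKING_BP):
    """Expected PCR product size if only the second exon was trapped."""
    return exon_bp + flanking_bp


# Difference between the observed band and the single-exon expectation.
excess = {gene: observed_bp[gene] - expected_product(bp)
          for gene, bp in second_exon_bp.items()}

for gene, bp in second_exon_bp.items():
    print(gene, "expected", expected_product(bp), "bp; observed exceeds this by about",
          excess[gene], "bp")
```

In each case the observed product is larger than a single trapped exon would give, consistent with additional exon-like sequence being co-spliced.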

Figure 4.2. 0.8% agarose gel resolving amplified products of the 3rd PCR from 8 exon trapping experiments, and the corresponding Southern blot hybridized with a radio-labeled probe representing S100A12. 1 = 127 E12 Sau3AI. 2 = 128 L15 Sau3AI. 3 = 138 F6 Sau3AI. 4 = 127 E12 BamHI/BglII. 5 = 128 L15 BamHI/BglII. 6 = 138 F6 BamHI/BglII. 7 = H2O. 8 = pSPL3. M = size marker; the 1018bp and 220bp marker bands are indicated.

4.4.3 Selecting clones to pick

Where possible, three positive clones from each of the original hybridization experiments to the “exon trapped” libraries (detailed in table 4.2) were selected for investigation. Frozen cells from each selected well of the particular microtitre dish were scraped and streaked onto an agar plate containing the appropriate antibiotic, and grown overnight at 37°C, as detailed in section 2.2.4 (materials and methods).

4.4.4 PCR amplification

Single colonies from each agar plate, streaked with cells from a selected well of the desired “exon trapped” library (section 4.4.3), were picked into 25µl of H2O and subsequently used to inoculate 3ml of LB containing the appropriate concentration of carbenicillin. The inoculated broth was incubated at 37°C overnight, with shaking at 200rpm. The H2O containing the scraped cells was heated at 95°C for 5 minutes and 2µl of this was used as template for PCR experiments with the M13+ primer pair. PCR products were ethanol precipitated and 10% of each was resolved by 2% agarose gel electrophoresis. Figure 4.3 shows a Polaroid photograph of one such gel, containing 12 resolved PCR products from “exon trapped” library number 2 (clone 128 L15). Glycerol stocks were made from the inoculated broth, frozen and stored at -80°C.

4.4.5 Analysis of PCR products

Of the PCR products produced, eight were initially chosen for further analysis on the basis of size redundancy: where a number of PCR products were of the same size, only one from each size group was initially chosen. Approximately 25ng of each of the eight selected PCR products was used for radio-labeled probe preparation for subsequent hybridization to filters of all three “exon trapped” libraries and 1q21 sub-library filters (chapter 3), individually. Only 50% of the probes produced positive signals with their corresponding positions on the “exon trapped” library filters, and only two of these produced positive signals with one or more of the three PAC clones used as starting material on the 1q21 sub-library filters.


Figure 4.3. Polaroid photograph of 2% agarose gel electrophoresis resolving M13+ PCR products from 12 selected clones of exon trapped library 2 (128 L15).

It was postulated that this was due to the nature of the M13+ primer PCR. The M13+ primers are located 115bp and 70bp (forward and reverse respectively) away from the cloning site of the pAMP1 vector. PCR products cloned into the pAMP1 vector (from the original exon trapping experiments) contained 159bp of pSPL3 vector sequence. The average size of an M13+ PCR product was about 450bp, 334bp of which would be vector-derived sequence. Therefore, nearly 75% of an average radio-labeled probe would be homologous to each DNA target present on the filters used for hybridization (both “exon trapped” library and 1q21 sub-library filters), thus masking any positive signal seen with a possible exon. For this reason further PCR experiments were performed with the template used in the M13+ PCR experiments, but using primers positioned directly adjacent to possible trapped exon DNA, primer pair EX1 and EX2. The PCR products from the EX1/EX2 amplifications were ethanol precipitated. 10% of the precipitated material was resolved with 2% agarose gel electrophoresis. Of these PCR experiments, only 20% displayed any amplification, compared to 89.5% with the M13+ primer pair. Approximately 25ng of all the EX1/EX2 PCR amplified products were used for radio-labeled probe preparation and were hybridized to filters of all three “exon trapped” libraries and 1q21 sub-library filters, individually. Of these, all produced positive signals with their corresponding positions on the “exon trapped” library filters, and also produced positive signals with one or more of the three PAC clones used as starting material on the 1q21 sub-library filters. Figure 4.4 displays the results of one such hybridization with an EX1/EX2 amplified DNA fragment. Hybridization experiments with PCR products also identified any redundant clones within the “exon trapped” libraries.
Two more rounds of clone selection, streaking, PCR amplification and hybridization experiments were performed with clones identified in the initial screening of the “exon trapped” libraries that were not positive with the EX1/EX2 PCR product hybridizations.
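The masking argument made above for the M13+ probes rests on a simple proportion, sketched here using the figures stated in the text (about 450bp for an average M13+ product, 334bp of which is vector-derived). The function and variable names are illustrative.

```python
def vector_fraction(vector_bp, product_bp):
    """Fraction of a PCR-derived probe that consists of vector sequence."""
    return vector_bp / product_bp


AVERAGE_M13_PRODUCT_BP = 450  # average M13+ PCR product size, as stated
VECTOR_DERIVED_BP = 334       # vector-derived portion, as stated

fraction = vector_fraction(VECTOR_DERIVED_BP, AVERAGE_M13_PRODUCT_BP)
insert_bp = AVERAGE_M13_PRODUCT_BP - VECTOR_DERIVED_BP

print(f"vector-derived: {fraction:.1%}; possible exon sequence: {insert_bp}bp")
```

On these figures only about a quarter of an average M13+ probe could represent trapped sequence, which is why primers lying directly adjacent to the insert (EX1/EX2) give probes with far less vector background.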

Figure 4.4. Membranes containing spotted “exon trapped” libraries and 1q21 sub-library hybridized with a labeled probe representing “exon trapped” clone 128L15sB5 (arrowed). A = “exon trapped” library 2, derived from clone 128 L15, displaying 8 positive clones. B = “exon trapped” library 3, derived from clone 138 F6, displaying 3 positive clones. C = 1q21 sub-library filters displaying clones 127 E12, 128 L15 and 138 F6 (faint) as positive (wells E2, E3 and E7 of the 1q21 sub-library).

The number of “exon trapped” clones within the three libraries, excluding those picked from H2O and pSPL3 exon trapping experiments, totals 840. From the hybridization data, 1.67% of these clones were positive with the HIV probe and 2.74% were positive with the Cot-1 probe. The total number of clones analyzed from all three libraries was 106. Of these, 16% produced a PCR product from the EX1/EX2 primer pair and 85.5% produced a PCR product from the M13+ primer pair. Of the M13+ PCR products, 69% were less than 350bp in size. No EX1/EX2 PCR products or any M13+ PCR products larger than 350bp were seen from “exon trapped” library number 3, produced from clone 138 F6, although positive hybridization signals were seen with probes derived from the other two “exon trapped” libraries (see figure 4.4).

4.4.6 Sequencing PCR products from “exon trapped” clones

PCR products from “exon trapped” clones that produced positive hybridization results (both “exon trapped” and 1q21 sub-library filters in the case of EX1/EX2 PCR or only “exon trapped” filters in the case of M13+ PCR) were sequenced using BigDye™ terminator chemistry on an ABI-310 (PE Applied Bio-Systems) according to the manufacturer’s specifications. Approximately 100ng of the PCR product was used as template DNA and sequenced using either EX1 or M13F+ primers, depending on which primer pair was used to amplify the initial product. Where a PCR product was larger than 750bp, an extra sequencing reaction was performed with the addition of the corresponding primer of the pair (either EX2 or M13R+). In addition a representative M13+ PCR product that was less than 350bp in size was also sequenced.

4.4.7 BLAST analysis

Vector sequence from both pAMP1 and pSPL3 was removed from the generated sequence of the PCR products of “exon trapped” clones. These ‘clipped’ PCR product sequences were then used in BLASTN 2.0.9 (Altschul et al, 1997) searches of the non-redundant and human EST databases available at the NCBI web site (http://www.ncbi.nlm.nih.gov/blast). The sequence of the representative M13+ PCR product that was less than 350bp revealed spliced pSPL3 vector sequence, containing a functional BstXI site.
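The ‘clipping’ step can be sketched as the removal of known flanking vector sequence from each read before database searching. The flank sequences below are hypothetical placeholders, not the real pAMP1 or pSPL3 sequences, and exact string matching is assumed for simplicity; real reads containing sequencing errors would need a tolerant alignment.

```python
def clip_vector(read, left_flank, right_flank):
    """Return the portion of read lying between the two vector flanks.

    If a flank is not found, the read is kept up to that end unchanged.
    """
    read = read.upper()
    start = read.find(left_flank)
    start = start + len(left_flank) if start != -1 else 0
    end = read.find(right_flank, start)
    end = end if end != -1 else len(read)
    return read[start:end]


# Hypothetical flanks and read, for illustration only.
LEFT_FLANK = "GGATCCAC"
RIGHT_FLANK = "CTCGAGGT"
read = "TT" + LEFT_FLANK + "ATGGCGTTAGC" + RIGHT_FLANK + "AA"

clipped = clip_vector(read, LEFT_FLANK, RIGHT_FLANK)
empty = clip_vector(LEFT_FLANK + RIGHT_FLANK, LEFT_FLANK, RIGHT_FLANK)
print(clipped, len(clipped), repr(empty))
```

A read that clips to an empty string corresponds to a product made up of vector sequence alone, as was found for the representative M13+ product of less than 350bp.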

4.4.8 Open reading frame (ORF) analysis

The presence of an open reading frame within a ‘clipped’ PCR product sequence was then investigated using the ORF Finder program available at the NCBI web site (produced by Tatiana Tatusov and Roman Tatusov, NCBI). Due to the directional nature of trapping exons with the pSPL3 vector (5’ to 3’), sequences generated with the EX1 primer should show an expected open reading frame in the positive frame, while sequences generated with the M13F+ primer should show an expected ORF in the negative frame. The majority of the “exon trapped” sequences produced an ORF. Where this was the case, the translated amino acid sequence was used in a BLASTP 2.0.8 (Altschul et al, 1997) search of the non-redundant protein database available at the NCBI web site.
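A six-frame analysis of this kind can be sketched as below. This is not the ORF Finder algorithm: here an 'ORF' is simply the longest stretch of codons free of stop codons in each frame, using the standard genetic code, and the negative frames are numbered by offset from the start of the reverse complement, which may differ from ORF Finder's convention. The example sequence is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq):
    """Longest stop-free stretch (in amino acids) for each of the six frames."""
    seq = seq.upper()
    result = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(strand[offset:])
            result[f"{sign}{offset + 1}"] = max(len(p) for p in protein.split("*"))
    return result


example = "ATGGCCTAAGGC"  # hypothetical sequence
frames = six_frame_orfs(example)
print(frames)
```

For a sequence read with the EX1 primer the frame of interest would be among the positive frames, and for one read with the M13F+ primer among the negative frames, following the directional argument in the text.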

4.4.9 Overall statistics relating to putative exons identified

14 PCR products from “exon trapped” clones were sequenced. Table 4.3 describes the hybridization, BLASTN, and ORF Finder data for the 14 PCR products containing possible trapped exon sequence. Of the fourteen sequenced PCR products, 3 displayed significant homology to database entries not associated with repetitive elements. The PCR product from clone 127E12sB4 displayed an ORF with a translated amino acid sequence similar to gi/3342533 (Homo sapiens peptidoglycan recognition protein precursor) and emb/CAA72803 (Mus musculus TAG7 protein) over 50 and 55 of the 69 amino acid ORF. The PCR product from clone 128L15bH15 displayed similarity, at the nucleotide level (133 out of the total 143bp), to gb/AF120334 (Homo sapiens GTP binding protein NGB mRNA). The PCR product from the clone 127E12sC3 displayed partial similarity, at the nucleotide level (77 out of 95bp over the first 95bp of the PCR product), to gb/AI056693 (Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA). Interestingly the database entry for gb/AI056693 was annotated to be similar to Q62185 TAG7 protein.
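The similarities quoted above can be expressed as percentages with a one-line calculation. The match counts are those stated in the text; the helper name and dictionary keys are illustrative, and the percentages describe only the stretches quoted, not full alignments.

```python
def percent_identity(matches, aligned_length):
    """Percentage of aligned positions that match."""
    return 100.0 * matches / aligned_length


# (matches, length) pairs as quoted in the text.
similarities = {
    "127E12sC3 vs AI056693 (nucleotide)": (77, 95),
    "128L15bH15 vs AF120334 (nucleotide)": (133, 143),
    "127E12sB4 ORF vs gi/3342533 (amino acid)": (50, 69),
    "127E12sB4 ORF vs CAA72803 (amino acid)": (55, 69),
}

for name, (matches, length) in similarities.items():
    print(f"{name}: {percent_identity(matches, length):.1f}%")
```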

Clone        1q21     Library       Length  Primer  BLASTN 2.0.9 results         ORF           BLASTP 2.0.8 results
127E12sB4    127 E12  12 others     209     EX                                   +1 = 70 aa    +2 = gi/3342533 (4e-06),
             128 L15                                                             +2 = 69 aa    emb/CAA72803 (3e-05)
             140 J1
127E12sCJ    127 E12  Inconclusive  222**   EX
             128 L15
             138 F6
127E12sC3    127 E12  29 others     199     EX      gb/AI056693 (2e-05)          +2 = 68 aa
             128 L15                                                             +3 = 67 aa
             140 J1                                                              +1 = *
                                                                                 -2 = 58 aa
                                                                                 -1 = *
127E12sD17   127 E12  19 others     125     EX      160 hits on nr,              -1 = 44 aa*
             128 L15                                similar to LI.bl             -3 = 36 aa*
             138 F6                                 L1 repeat
             140 J1
127E12sD18   127 E12  4 others      104     EX                                   +1 = 42 aa
             128 L15
127E12sF7    127 E12  4 others      137     EX      484 hits on nr,              +2 = 44 aa    pir/S23650 (6e-06),
             128 L15                                repeat                                     emb/CAA36480 (6e-06),
                                                                                               gi/2072953 (1e-05)
128L15sA3    127 E12  Inconclusive  85      EX                                   +2 = 27 aa
             128 L15                                                             -2 = 21 aa
128L15sA16   128 L15  Itself only   157     EX      75 hits on nr,               +2 = 43 aa
             138 F6                                 repeat                       +3 = 47 aa
128L15sB6    127 E12  5 others      81      EX                                   +2 = 25 aa*
             128 L15                                                             +3 = 25 aa
                                                                                 -3 = 25 aa
128L15sB7    128 L15  10 others     80      EX                                   +2 = 26 aa
             138 F6
128L15sB17   128 L15  3 others      142     EX                                   +1 = 46 aa
             138 F6                                                              +3 = 46 aa
                                                                                 -3 = 35 aa
128L15bG14   -        Inconclusive  196     M13+    -                            -             -
128L15bH15            19 others     143     M13+    gb/AF120334 (1e-44),         -3 = 47 aa
                                                    plus 8 EST entries
128L15bJ3    -        Itself only   460     M13+    -                            -             -

Table 4.3. Hybridization, sequence, BLAST, and ORF Finder data for 14 PCR products from the “exon trapped” libraries. Column headings: Clone = “exon trapped” clone; 1q21 = 1q21 hybridization data; Library = “exon trapped” library hybridization data; Length = length of ‘clipped’ sequence generated; Primer = primer pair; ORF = open reading frame. “Exon trapped” clones (column one) display the following nomenclature: the first number-letter-number refers to the library of origin (PAC clone address), the second letter refers to PAC clone template preparation (s = Sau3AI partial or b = BamHI/BglII complete digestion), while the third letter-number refers to the address of the clone in question. aa = amino acids. * = stop codon present in ORF. ** = sequence of poor quality (more than three indeterminable bases or ‘n’s). ‘Inconclusive’ in column 3 refers to an indeterminable number of positive signals seen in the “exon trapped” libraries in addition to the clone itself, due to high background signal.

4.5 Analysis of “exon trapped” clones showing homology to database entries

4.5.1 Clone 128L15bH15

The trapped exon sequence from clone 128L15bH15 displayed a similarity, at the nucleotide level (as described), to the non-redundant database entry for Homo sapiens GTP binding protein NGB mRNA. The homology is seen across the length of the “exon trapped” sequence, with 133 nucleotides identical over a stretch of 146 nucleotides of the database entry (with a 3bp gap). A BLASTN 2.0.9 search of the Genbank human EST database identified 8 entries with an identical homology (identical to the non-redundant database entry rather than the “exon trapped” sequence). Open reading frame analysis identified one ORF in the negative frame (as expected from the directional nature of the exon trapping and final cloning) which showed no significant homology to any entry within the non-redundant protein databases (using BLASTP). Therefore, this “exon trapped” sequence displayed homology to a H. sapiens GTP binding protein at the nucleotide level only, with the ORF amino acid sequence in the opposite frame and displaying no significant homology.

4.5.2 Clone 127E12sC3 and related EST

The sequence derived from “exon trapped” clone 127E12sC3 displayed a partial homology to EST entry AI056693 as described. It was hypothesized that the partial homology between the “exon trapped” clone and the EST entry could be explained as follows. The clone 127E12sC3 contained two separate stretches of genomic DNA flanked by splice donor and splice acceptor sequences, which were spliced together during the exon trapping procedure. The 95bp stretch of sequence homologous to the EST entry would contain a ‘true exon’ while the additional 104bp stretch (showing no homology) would represent a cryptic exon, trapped by the chance occurrence of splice acceptor and donor sequences. The homologous stretch of the EST entry was located towards the end of the sequence: the sequence totaled 540bp and the 81% homology was seen from 433-526bp. The ENTREZ entry for AI056693 (ENTREZ is a browser of nucleotide sequence database entries available at the NCBI web site) describes the high quality sequence as stopping after base 453. Therefore the EST itself could be 100% homologous to the “exon trapped” clone, while the sequence entry (AI056693) could display a 20% error over the stretch seen to be homologous with clone 127E12sC3. In order to assess this hypothesis, the IMAGE clone from which the sequence AI056693 was derived (IMAGE clone 1676497) was obtained from the HGMP resource centre (Hinxton, Cambridge, UK) and, in addition, primers were designed that would amplify the “exon trapped” sequence from clone 127E12sC3 (E2C3F and E2C3R). The 846bp insert of IMAGE clone 1676497 was sequenced from a PCR product template (amplified with the M13+ primer pair). Sequencing was carried out using the BigDye™ terminator method (PE Applied Bio-Systems) on an ABI 310 genetic analyzer, following the manufacturer’s specifications, with the M13+ primers and an additional primer designed from the M13F+ primer sequence (167SEQ1). The 846bp derived sequence displayed 99% homology (535 out of 536bp) to the EST entry AI056693, and 81% homology to the first 95bp of “exon trapped” clone 127E12sC3. PCR with the primer pair E2C3F and E2C3R was performed with template DNA derived from the “exon trapped” clone 127E12sC3 and the PAC clones 127 E12, 138 F6 and 140 J1.
A single DNA fragment of the expected size was seen with agarose gel electrophoresis from the 127E12sC3 template only. Direct sequencing of the PAC clone 127 E12 (see materials and methods for template preparation) with primers E2C3F and E2C3R, using the BigDye™ terminator method (PE Applied Bio-Systems), produced two non-homologous (from BLASTN 2.0.9 comparison) sequences. Figure 4.5 displays these sequences and the homologies seen with the BLASTN 2.0.9 comparison to the sequence of 127E12sC3.

[Figure 4.5 image. Labels: 127E12sC3 sequence; regions of 100% homology; 319bp sequence from PAC clone 127 E12 using primer E2C3F; 396bp sequence from PAC clone 127 E12 using primer E2C3R.]

Figure 4.5. Homology between exon-trapped clone 127E12sC3 and two sequences derived from PAC clone 127 E12 using primers designed from the sequence of 127E12sC3. Horizontal lines represent DNA sequence (except for scale bar). Arrowhead lines represent the direction of sequence derived from PAC clone 127 E12. Dashed lines delineate regions of 100% sequence homology. Drawn to scale. No homology is seen between the two sequences derived from PAC clone 127 E12. Sequence homologies determined using the BLASTN 2.0.9 program (Altschul et al, 1997) available at the NCBI web site (http://www.ncbi.nlm.nih.gov/BLAST). Appendix III details the sequence displayed in this figure.

4.5.2.1 Screening the “exon trapped” libraries with IMAGE clone 1676497

Insert DNA from the IMAGE clone 1676497 (isolated by PCR using the M13+ primer pair) was radio-labeled and hybridized against the “exon trapped” library filters and 1q21 sub-library filters. The hybridization pattern displayed was identical to that seen with 127E12sC3 labeled as a probe. One difference in the intensity of signal was seen: the intensity of the positive signal for PAC clones 127 E12 and 128 L15 was greater than that for PAC clone 140 J1.

4.5.2.2 PCR with primers designed from the 3’ end of IMAGE clone 1676497

To assess the presence of sequence corresponding to the IMAGE clone 1676497 within the PAC clones positively identified with hybridization, primers were designed from the 3’ end of the IMAGE clone sequence (167F and 167R). PCR amplified the expected size product from PAC clones 127 E12 and 138 F6 used as template DNA (as determined by agarose gel electrophoresis). The primers were then used to sequence directly from PAC clone 127 E12 using the BigDye™ terminator method (PE Applied Bio-Systems). Primer 167F generated 588bp of sequence, while primer 167R generated 580bp of sequence. The two sequences overlapped by 191bp, as expected.

4.5.2.3 Direct sequencing of PAC clone 127 E12 with primers derived from IMAGE clone 1676497 sequence

Primers were designed from the sequence of IMAGE clone 1676497 in order to determine the exon/intron structure of the EST by sequencing using PAC clone 127 E12 as a template. Sequencing was carried out using the BigDye™ terminator method (PE Applied Bio-Systems) on PAC DNA template, prepared as described. Figure 4.6 details the positions of primers used and the length of sequence generated. Interestingly, the sequence of IMAGE clone 1676497 homologous to clone 127E12sC3 constitutes a single exon of equivalent size (90bp long).

[Figure 4.6 image. Labels: scale bars in kb; IMAGE 1676497 / EST AI056693 with poly(A) tail; genomic clone 127 E12; sequencing primers including 167SEQ1, 167F, 167SEQ3, 167SEQEX, 167SEQ2-1 and 167R.]

Figure 4.6. Schematic representation of the sequence of IMAGE clone 1676497 (upper thick horizontal line) compared to direct sequencing derived from genomic clone 127 E12 (lower thick horizontal lines). Scales for both sequence representations are in kilobases as indicated. Direction of sequencing, length of sequencing, and primers used are represented by arrows. Sequences derived from PAC clone 127 E12 are not continuous. Appendix III details the sequence displayed in this figure.

4.5.2.4 ORF analysis of IMAGE clone 1676497 and BLASTX 2.0.9 data

The IMAGE clone 1676497 originates from an oligo (dT) primed library (Soares_total_fetus_Nb2HF8_9w), thereby orienting the clone 5’ to 3’. The ORF Finder program (available at the NCBI web site) identifies two open reading frames in the expected frame. The largest ORF spans from base 1 (5’ end of the 846bp sequence) to base 417 (stop codon) and translates to 138 amino acids. The second ORF spans from base 2 to 202 (stop codon) and translates to 66 amino acids. Neither of the two ORFs displayed any significant homology to entries in the protein databases available through the BLASTP 2.0.8 search engine at the NCBI web site. A BLASTX 2.0.9 search of the GenBank CDS translations, PDB, SwissProt, Spupdate, and PIR databases identified three entries showing significant homologies with translated sequence of the IMAGE clone 1676497. All three entries (ref/NP005082.1/PTNFSF3L - TNF superfamily, member 3 LTB; emb/CAA60133 - Mus musculus tag7; and gi/3342531 - peptidoglycan recognition protein precursor) displayed homology over the translated sequence from bases 507 to 716 (base 846 being the 3’ end). This was not in agreement with the identified open reading frames, with the stretch of homology being in the possible 3’ untranslated region of the IMAGE clone (presuming that either ORF is correct).
These data do explain the GenBank entry annotation “TAG7 like”, although it is unlikely that clone 1676497 would translate a protein product homologous to any of the three above database entries.
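The ORF lengths quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the function name `orf_length_aa` is hypothetical, and it assumes the quoted coordinates are 1-based and inclusive, with the end coordinate being the last base of the stop codon.

```python
def orf_length_aa(start: int, stop_end: int) -> int:
    """Amino acids encoded by an ORF.

    Assumes 1-based inclusive coordinates, where stop_end is the
    last base of the stop codon (an assumption, not stated explicitly
    in the text).
    """
    n_bases = stop_end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    # Every codon except the terminal stop codon encodes an amino acid.
    return n_bases // 3 - 1


print(orf_length_aa(1, 417))  # 138, the largest ORF
print(orf_length_aa(2, 202))  # 66, the second ORF
```

Under these assumptions both spans are whole numbers of codons and reproduce the 138 and 66 amino acids reported.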

4.5.3 Clone 127E12sB4

One of the ORFs identified in “exon trapped” clone 127E12sB4 shows homology to database entries similar to those identified with BLASTX 2.0.9 using IMAGE 1676497 sequence - gi/3342533, peptidoglycan recognition protein precursor and emb/CAA60133 Mus musculus tag7. The homology is over the same amino acid sequences of the database entries identified with the IMAGE 1676497 BLASTX 2.0.9 data. For example, BLASTX 2.0.9 data for IMAGE 1676497 shows homology from amino acid 18 to amino acid 87 of emb/CAA60133 (Mus musculus tag7) while the BLASTP 2.0.8 data for the +2 frame ORF of 127E12sB4 displayed homology over amino acid 28 to amino acid 82 of emb/CAA60133. BLASTN 2.0.9 comparison of the two sequences in question shows no significant homology. BLASTP 2.0.9 comparison of the +2 frame ORF of clone 127E12sB4 (69 amino acids) with 70 translated amino acids from the possible 3’ end of IMAGE clone 1676497 displays a greater homology than that of either amino acid sequence to the database entry emb/CAA60133. The +2 ORF of 127E12sB4 (69 aa) displays a 36% identity over 55 aa (20 identical aa) and 46% positive aa (26 aa displaying similarities with respect to properties) with no gaps when compared to the emb/CAA60133 database entry. BLASTX data for the possible 3’ end of IMAGE clone 1676497 displays a 37% identity and 48% positives (over 70 aa) with database entry emb/CAA60133. When these two stretches of amino acids (69 from the +2 ORF of 127E12sB4 and 70 from the 3’ end of IMAGE clone 1676497) are compared with the BLASTP 2.0.9 program, a 46% identity and 62% positives are seen over a 56 amino acid stretch. These data suggest that IMAGE clone 1676497 and “exon trapped” clone 127E12sB4 may be related to some degree: although there is no nucleotide sequence homology, the translated amino acid sequences display homology.
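As an illustration of what a percent identity figure means for an ungapped alignment, the following sketch counts matching positions between two aligned sequences. The function name and the two short peptides are hypothetical and are not taken from the clones discussed; only the 20 out of 55 count comes from the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical peptides differing at two of ten positions.
print(percent_identity("MKTAYIAKQR", "MKSAYLAKQR"))  # 80.0

# The count reported in the text: 20 identical residues over 55 aligned.
print(round(100 * 20 / 55))  # 36
```

The “positives” figure reported by BLAST additionally counts conservative substitutions according to a scoring matrix, which this sketch does not attempt to model.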

4.5.4 Fine Mapping of IMAGE clone 1676497 and “exon trapped” clones 127E12sC3 and 127E12sB4.

Approximately 25ng of DNA from IMAGE clone 1676497 and “exon trapped” clones 127E12sC3 and 127E12sB4 were radio-labeled and hybridized individually to minimal tiling path Southern blots containing DNA from PAC clone 127 E12 and minimal tiling path neighbors (described in Chapter 3). All three probes hybridized to an approximately 18kb EcoRI restriction enzyme fragment present in clones 127 E12 and 140 J1. This EcoRI fragment contains a SalI restriction enzyme site, and digestion with both enzymes produces EcoRI/SalI restriction enzyme fragments of approximately 15.5kb and 1.5kb. Probes representing IMAGE 1676497 and 127E12sC3 hybridized to the 15.5kb EcoRI/SalI fragment present within PAC clones 127 E12 and 140 J1, while the probe representing 127E12sB4 hybridized to the 1.5kb EcoRI/SalI fragment. The probe representing IMAGE 1676497 also hybridized to an 8.9kb EcoRI fragment present in PAC clone 127 E12 only.

4.5.5 Genomic mapping of IMAGE clone 1676497 and “exon trapped” clone 127E12sB4

Radio-labeled probes representing IMAGE clone 1676497 and “exon trapped” clone 127E12sB4 were hybridized to Southern blots containing total human genomic DNA digested with a number of restriction enzymes. Southern blots were produced with total human DNA isolated by Dr R Flomen in Dr Nizetic’s laboratory or with DNA isolated from a single flask of cells kindly donated by B. Chopra in Professor Anne Stephenson’s laboratory, School of Pharmacy (see materials and methods). Neither probe produced a positive signal with any Southern blot screened. To assess the quality of one of the Southern blots used, a radio-labeled probe representing the SPRR2e gene was hybridized. The probe representing the SPRR2e gene identified positively hybridizing DNA fragments. Figure 4.7 displays this result.

[Figure 4.7 image: three panels with lanes labeled M 1 2 3 4 5 6 M, M 6 5 4 3 2 1 M and M 6 5 4 3 2 1 M.]

Figure 4.7. 0.8% agarose gel resolving various restriction enzyme digestions of 15µg of total human genomic DNA (A), which was subsequently Southern blotted and hybridized with two separate radio-labeled probes (B+C). In each case lanes are as follows: M = 1kb ladder (Life Technologies). 1 = BamHI. 2 = XhoI and KpnI. 3 = KpnI. 4 = SfiI. 5 = HindIII. 6 = EcoRI. A = 0.8% agarose gel resolving restriction enzyme digested human genomic DNA. B = Southern blot of gel in A hybridized with a radio-labeled probe representing IMAGE clone 1676497. C = Southern blot of gel in A hybridized with a radio-labeled probe representing the SPRR2e gene.

4.6 Transcriptional analysis of putative exons and novel EST

4.6.1 RT-PCR data

PCR primers were designed from a number of the “exon trapped” clones displayed in table 4.3 and from IMAGE clone 1676497. In the case of “exon trapped” clone 127E12sC3, a single primer was designed to amplify the first 90bp of 127E12sC3 sequence in conjunction with primer E2C3F. This was due to the presence of two functionally trapped sequences contained within the one clone (see section 4.5.2). cDNA was constructed from RNA of a number of tissues as described (materials and methods). PCR was performed with the designed primers on the constructed cDNA and two commercially produced cDNAs. Figure 4.8 shows the results from two RT-PCR experiments, resolved with agarose gel electrophoresis. Table 4.4 describes the results seen.

4.6.2 Northern Blot analysis

Radio-labeled probes representing IMAGE clone 1676497 and “exon trapped” clone 127E12sB4 were hybridized individually to a commercially available multiple tissue Northern blot (CLONTECH) as described. The Northern blot was initially hybridized with a radio-labeled probe representing the β-actin gene to assess the quality of the blot (an experiment performed by Jurgen Groet in Dr Nizetic’s laboratory). Between each hybridization the Northern blot was either stripped (according to the manufacturer’s specifications) or left for a minimum of three 32P half-lives (6 weeks), to prevent any signal from a previous hybridization being present after autoradiography. Figure 4.9 displays the results of various 32P labeled probes hybridized against the human multiple tissue Northern blot (CLONTECH). These data show three clear positive hybridization signals using a probe representing IMAGE clone 1676497: two approximately 6kb transcripts in adult heart and adult skeletal muscle, and an additional, approximately 3kb transcript in adult skeletal muscle. Possible weaker hybridization signals are seen in adult pancreas (approximately 1.5kb and 1kb) and in adult heart, brain, placenta, lung, skeletal muscle, and kidney (7.5kb). The positive control (β-actin probe) showed strong hybridization signals in adult heart, brain, placenta, lung, and skeletal muscle; weaker hybridization signals in adult liver and kidney; and virtually no signal in adult pancreas. The probe representing clone 127E12sB4 identified two weak signals in adult pancreas only, similar in size to weaker signals seen with the 1676497 probe (1kb and 1.5kb).

[Figure 4.8 image: agarose gels. Panel B lanes labeled Sk.M., F.Liver, F.Lung and F.Heart (+ and - RT), P and H2O.]

Figure 4.8. PCR with primers amplifying sequence from a novel EDC-localized cDNA (A) and an EDC exon trapped clone (B). 2% agarose gel electrophoresis resolving PCR products amplified from: A: Primers designed from a novel EDC cDNA, IMAGE 1676497, using as template: 1 = adult bone marrow cDNA, 2 = adult skeletal muscle cDNA, 3 = total fetal cDNA, 4 = IMAGE clone 1676497, 5 = PAC 127 E12, 6 = H2O control. Primers used = 167F and 167R. Lanes 3+4 are positive. cDNA source = Marathon Ready™ cDNA, CLONTECH. B: Primers designed from “exon trapped” clone 128L15bH15 (128L15bH15F+R), using as template: Sk.M. = adult skeletal muscle, F.Liver = fetal liver, F.Lung = fetal lung, F.Heart = fetal heart, P = PAC clone 128L15. ‘+’ = +RT, ‘-’ = -RT. Results detailed in table 4.4. Primer sequences, annealing temperature and cycle used are given in section 2.1.4.
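The waiting period of three 32P half-lives used between hybridizations of the Northern blot can be related to residual signal with a simple decay calculation. This is a minimal sketch assuming a 32P half-life of about 14.3 days (a standard textbook value, not stated in the text); the function name is hypothetical.

```python
P32_HALF_LIFE_DAYS = 14.3  # assumed half-life of 32P, in days


def fraction_remaining(days: float, half_life: float = P32_HALF_LIFE_DAYS) -> float:
    """Fraction of the original radioactivity left after `days` of decay."""
    return 0.5 ** (days / half_life)


three_half_lives = 3 * P32_HALF_LIFE_DAYS
print(round(three_half_lives / 7, 1))                   # 6.1 weeks
print(round(fraction_remaining(three_half_lives), 3))   # 0.125
```

Three half-lives therefore correspond to roughly six weeks, after which about one eighth of the original activity remains.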

Template columns: H2O | PAC | Brain | Liver | Pancreas | Skeletal muscle | Bone marrow | Total fetus | Fetal heart | Fetal lung | Fetal liver (tissue RNA templates subdivided into +rt and -rt)

Primer pair origin, followed by the results recorded:

1676497: + + - + - + - +

127E12sB4: + + + + + + + + + + + + + +

127E12sC3: + + + + + + + + +

128L15sB17: + +

128L15bG14: + +

128L15bH15: + + + + - + - + + (product sizes: s b s s s b s s)

Table 4.4. RT-PCR data using primers designed from a number of “exon trapped” clones and IMAGE clone 1676497. Columns 2 to 12 depict template used in PCR experiments using primers designed from sequenced clones given in column one. Columns divided into + and - rt depict template produced from RNA of the given tissue with (+) or without (-) M-MLV reverse transcriptase (Promega). In the final row, b and s represent two different sized PCR products (b = 280bp, s = 135bp). In the case of total fetus and bone marrow, Marathon Ready™ cDNA (CLONTECH) was used as template.

[Figure 4.9 image: PhosphorImager scans with lanes 1 to 8 and RNA size markers at 9.5kb, 7.5kb, 4.4kb, 2.4kb and 1.35kb. Probes (left to right): β-actin, IMAGE 1676497, 127E12sB4.]

Figure 4.9. Multiple tissue Northern blot (CLONTECH) hybridized with various radio-labeled probes. In each case the probe is given underneath each PhosphorImager scan; arrows indicate the positions of RNA size markers according to the manufacturer’s specifications (sizes given in kilobases, kb). 1 = pancreas, 2 = kidney, 3 = skeletal muscle, 4 = liver, 5 = lung, 6 = placenta, 7 = brain, 8 = heart. All tissues are from adult human origin (CLONTECH). Each image displays the same Northern blot hybridized with the given probes sequentially as seen (left to right). The blot was either washed at high stringency or left for 1 month between subsequent hybridizations.

Chapter 5: Use of the integrated map to provide insight into the evolutionary events shaping the Epidermal Differentiation Complex.

5.1 Summary

By using the integrated map presented in chapter 3 in conjunction with hybridisation and direct sequencing techniques, the previously undetermined transcriptional orientation of four S100 genes is described. In the case of one S100 gene, S100A13, extensive alternative splicing of the 5’ untranslated region is demonstrated. Previously unidentified regions of the genome are shown to contain sequences homologous to S100A10, S100A11, and S100A13. These were determined from database searches (in the case of S100A10 and S100A11), and direct sequencing (in the case of S100A13). Genomic clones containing chicken S100 genes were identified from a total chicken genomic cosmid library. Hybridization of these genomic cosmid clones with probes representing human S100 genes recognized distinct regions of homology with S100A1 and partial homology with S100A2, 3, 4, and 5, suggesting a similar clustering of S100 genes in chicken as seen in mouse and human.

5.2 Introduction

5.2.1 Evolution of multi-gene families

5.2.1.1 Gene duplication

It is widely accepted that gene duplication has been one of the main driving forces behind the evolution of multi-gene families. Convergent evolution cannot be considered a major driving force in the production of the wide range of related genes identified throughout any organism’s genome. In the context of the evolution of an organism as a whole, gene duplication with subsequent diversification of duplicated products provides the simplest way of acquiring new function. It should be noted that alternative splicing and post-translational modification of a single gene offer an alternative source of simple diversity to that provided by gene duplication (discussed by Ohta, 1991). Three identified processes can mediate gene duplication. The most frequently described is homologous recombination between repetitive elements, a process that can delete as well as duplicate sections of DNA (reviewed in Purandare and Patel, 1998). In the context of an initial genomic event producing a gene family via homologous recombination, it is duplication rather than deletion that mediates the event (although deletion of sequence will act to diversify or remove an established family). Duplication can also occur by non-homologous recombination induced by chromosomal double stranded breaks, requiring little or no homology between the sites of breakage and repair (Taghian and Nickoloff, 1997). The third process of gene duplication is via RNA-mediated transposition, whereby transcribed sequences are integrated into the genome (Vanin, 1985). Once a duplication event has occurred, recombination between the duplicated genes through mis-alignment and unequal crossover can increase or decrease the number of genes within a family.
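The effect of mis-alignment and unequal crossover on gene number can be pictured with a toy model. The sketch below is an illustrative simplification, not a description of any analysis in this thesis: each chromosome is a list of tandem gene copies, the function name and gene labels are hypothetical, and the crossover is assumed to fall exactly between copies.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Exchange the distal parts of two tandem gene arrays.

    break_a and break_b are the numbers of copies lying proximal to the
    breakpoint on each chromosome. Unequal breakpoints model mis-alignment.
    """
    if not (0 <= break_a <= len(array_a) and 0 <= break_b <= len(array_b)):
        raise ValueError("breakpoint outside array")
    recombinant_1 = array_a[:break_a] + array_b[break_b:]
    recombinant_2 = array_b[:break_b] + array_a[break_a:]
    return recombinant_1, recombinant_2


parent_a = ["A1", "A2", "A3", "A4"]
parent_b = ["B1", "B2", "B3", "B4"]
expanded, contracted = unequal_crossover(parent_a, parent_b, 3, 1)
print(expanded)    # ['A1', 'A2', 'A3', 'B2', 'B3', 'B4']
print(contracted)  # ['B1', 'A4']
```

In this model two four-copy arrays mis-aligned by two copies yield one recombinant with six copies and one with two, so the total number of copies is conserved while the family expands on one chromosome and contracts on the other.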

5.2.1.2 Diversification

The subsequent duplicated products have either diversified to provide alternative function or, as in the case of ribosomal RNA and histone genes, been retained to provide additional, redundant copies necessary for producing large amounts of the corresponding products. Diversification of duplicated gene products can result from alternative splicing and post-translational modification of genes or from a direct diversification of the genes themselves, or a combination of both. An example of extensive post-translational modification is seen in the proteolytic cleavage of the precursors of opiate peptides (Noda et al, 1982). The pattern of peptide production from the precursor corticotropin-β-lipotropin differs due to alternative sites of cleavage over developmental stages, enabling the release of a number of different hormones from one gene product. Differential splicing is an important mechanism in the production of diversity within the immunoglobulin gene family, an essential component of the vertebrate immune response (Ohta, 1991). Gene conversion, or gene fusion, is the process of non-reciprocal transfer of genetic information between two homologous genes. This process does not affect the number of genes within a family but instead alters individual members by the donation of sequence of one member to another. This has been studied in detail for the β-like globin gene family (reviewed by Papadakis and Patrinos, 1999), where several hemoglobin variants containing fused or hybrid globin chains are described. In discussing the role of recombination in gene family evolution, Schimenti concludes that the relative frequencies of point mutations compared to the relative frequencies of reciprocal versus nonreciprocal recombination, the number of genes within a family, preferences in donor/recipient gene-conversion pairs, and the frequency of each in the germ line determine the outcome.

5.2.1.3 Random genetic drift and natural selection

Recombination provides the mechanism for major changes within a eukaryotic organism at the genomic level. Random genetic drift and natural selection contribute to the gradual change of genetic information across evolutionary time. Genetic drift describes the fixation of nucleotide changes through random processes over time, while natural selection describes the fixation or removal of nucleotide changes as a result of selective pressures. Ohta performed computer simulations to determine how a useful gene family can be attained over evolutionary time (summarized by Ohta, 1991). Ohta concludes that positive Darwinian selection (fixation of advantageous changes) is needed, while the interactions between unequal crossover, mutation, random genetic drift, and natural selection are important, for acquiring gene families.

5.2.1.4 Molecular drive

Dover identifies a third passive (non-recombinational) component of genome evolution, molecular drive (Dover, 1982; reviewed in Dover, 1987). Molecular drive describes the unusual rates of DNA turnover within a genome. This is reflected by accelerated substitution in certain areas of the genome, compared to others and to the predicted rate of nucleotide substitution over time. Ohno hypothesized that a redundant, duplicated gene is free to accumulate nucleotide changes without the constraints limiting those associated with the functional copy (Ohno, 1970). This plausible notion has not been demonstrated. Instead, whilst studying Xenopus genes, it was demonstrated that as long as a gene is expressed, the functional constraints associated with a single copy gene are operating on that of a duplicated gene (Hughes and Hughes, 1993). Accelerated amino acid substitution has been associated with gene duplication (Ohta, 1991). This would support diversification under functional constraints: if a gene product is unable to freely acquire changes over time then directed substitutions would increase the chances of diversification.

5.2.1.5 Block duplication

It has been proposed that complex genomes underwent several rounds of genome duplication via polyploidization during evolution (Ohno, 1970). It was first proposed by Lundin that this polyploidization was responsible for the presence of paralogous regions of the genome harboring similar gene families (Lundin, 1993). This idea was further developed by the identification of ten gene families having members on both human chromosomes 6 and 9 (Kasahara et al, 1996). This paralogy was seen to also extend to chromosome 1, with examples of four members of these ten gene families (Katsanis et al, 1996). However, Hughes and Yeager rejected the idea of simple block duplications by phylogenetic analysis of the gene families (Hughes and Yeager, 1997; Yeager and Hughes, 1999). They demonstrated that in fact the genes had duplicated at widely different time periods, spread over 1.6 billion years. A cautionary comment was made that the original theory of polyploidization (Ohno, 1970) was based on the genome sizes of evolutionarily distant organisms. It was suggested that the relative genome size could have resulted from specialized functions of the particular organisms studied (Hughes and Yeager, 1997).

5.2.2 Comparative genomics

Comparative analysis of other species’ genomes is a powerful tool, with multi-faceted applications relevant to the study of the human genome. The identification of an orthologous counterpart to a human gene within another species generally circumvents any practical, ethical and social problems involved in experimental procedures to define the biological function of that gene. Important elements within a genome, not only the genes themselves but also regulatory sequences, can be conserved across many species. This not only facilitates the identification of these elements but can indicate whether these elements play an essential role in basic cellular processes. On a simplistic level, a gene identified in human that displays homology to genes identified in simple eukaryotic organisms, such as yeasts, or even prokaryotic bacteria, such as E. coli, would indicate a basic cellular function. If a human gene displayed homology to a gene in a vertebrate organism such as mouse but not in an invertebrate, such as C. elegans, it could indicate that the gene in question had a distinct role in vertebrates. This is a basic example, but it also highlights another role of species comparison, that of addressing evolutionary relationships between organisms and their genomes. The vast majority of evolutionary theory has derived from species comparison.

In 1995 the first genome of a self-replicating organism, Haemophilus influenzae, was completely sequenced (Fleischmann et al, 1995). Since then, the number of completely sequenced genomes has increased dramatically (Karlin et al, 1998), with more complex and larger genomes now becoming available - the public release of the Drosophila genome is expected by the end of 1999 (Burtis and Hanley, 1999). It has been projected that human genome sequencing will be complete by the year 2003 (90-99.9% sequenced - SC and WUGSC, 1998; Venter et al, 1998; Goodman, 1998) and this is expected to be followed by mouse, the next best-mapped vertebrate (for review see Carver and Stubbs, 1997). The limiting factor in both human and mouse is the sheer size of the genome and the presence of repetitive elements and regions unstable in current cloning vectors (SC and WUGSC, 1998; Venter et al, 1998). Figures of 90-95% or 99.9% complete are being used because, until the final picture unfolds, the degree of difficulty in finishing is as yet unknown.

5.2.2.1 Other vertebrate models - Fugu rubripes and Gallus gallus

In 1993 Sydney Brenner and colleagues added another vertebrate to the select list of model organisms, the pufferfish Fugu rubripes (Brenner et al, 1993). The rationale behind this project was that, remarkably, the Fugu genome displays an estimated 7.68 fold compression (genome size of 390 - 404Mb) with no evidence of the interspersed highly repetitive sequences that are characteristic of mouse and human DNA (review of human repetitive elements in Smit, 1996). It has been postulated that, rather than having been actively compressed over evolutionary time, the Fugu genome could represent a vertebrate genome that has not expanded by the acquisition of repetitive elements and therefore could be closer to an ancestral genome (Brenner et al, 1993). If synteny between the Fugu and human genomes could be demonstrated then a positional cloning project looking for a disease gene in 1Mb of human DNA could be reduced to 130kb in Fugu. However, by studying the surfeit locus in Fugu it was shown that rather than demonstrating clear regions of synteny, the genes of this family were interspersed across the Fugu genome as opposed to being closely linked in human (Gilley et al, 1997). Nevertheless, a growing number of human loci have been identified in Fugu and some syntenic regions compared (Trower et al, 1996; Tassone et al, 1999). It has become evident that Fugu genes are shorter, with the intron sizes responsible for the majority of compression (Brenner et al, 1993; Elgar, 1996).
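The compression figures quoted for Fugu can be reproduced with a short calculation. The sketch below assumes a human genome size of about 3,000Mb, a round figure that is not stated in this section; the function name is hypothetical.

```python
HUMAN_GENOME_MB = 3000.0   # assumed approximate human genome size
FUGU_COMPRESSION = 7.68    # fold compression quoted for Fugu


def compressed_size(size: float, fold: float = FUGU_COMPRESSION) -> float:
    """Size of a region after uniform compression by `fold`."""
    return size / fold


print(round(compressed_size(HUMAN_GENOME_MB)))  # 391 (Mb), within the quoted 390 - 404Mb
print(round(compressed_size(1000.0)))           # 130 (kb) for a 1Mb (1000kb) human region
```

The calculation treats the compression as uniform across the genome, which is a simplification: as noted above, most of the difference is attributed to intron size, and synteny is not guaranteed for any given locus.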

The identification of economically important traits has been the major goal of constructing genetic maps in livestock species (for review see Georges and Andersson, 1996). Of these livestock species, the domestic chicken (Gallus gallus) presents not only an economically important organism but also one that represents a class of species other than mammals (namely avian). Birds are thought to have diverged from a common ancestry with mammals approximately 300-350 million years ago (Kumar and Hedges, 1998) and are poorly studied in comparison to mammals. Study of an avian species will contribute to an understanding of vertebrate evolution, identifying elements that are distinct amongst diverse vertebrates. One apparent element was the observation that avian karyotypes comprise two classes of chromosomes, micro and macro (Bloom, 1993). In Gallus gallus one quarter of the genome is composed of microchromosomes, cytologically indistinguishable due to their small size. Two thirds of the genome consists of macrochromosomes, designated so by their larger size, which are the only chromosomes larger than the smallest human chromosome. The remainder of the genome consists of intermediate sized chromosomes that have been designated larger microchromosomes (Bloom, 1993). Chicken microchromosomes are hyperacetylated (associated with transcriptionally active DNA), early replicating (another indication of transcriptionally active DNA), and gene rich (McQueen et al, 1998). The chicken genome is also 60% smaller than the human genome (1,200Mb, Bloom, 1993) and it has been proposed that gene density on microchromosomes is comparable to that of Fugu due to compressed intron size (Hughes and Hughes, 1995; McQueen et al, 1998), as has been demonstrated with studies of the chicken MHC (Kaufman et al, 1995). Thus the benefits associated with the Fugu genome could be achieved with the chicken genome. An advantage over Fugu as a model system is the ease of genetic mapping in the chicken. Fugu are large salt-water fish that are absent from European waters and cannot be bred in captivity (Little, 1993), whereas chickens are bred routinely as livestock throughout the world. Genetic maps of the chicken genome have been produced (examples being Bumstead and Palyga, 1992 and Groenen et al, 1998; comprehensive genetic maps are available at http://www.ri.bbscr.ac.uk/genome mapping.html) and linkage groups conserved between chicken and human have been identified (reviewed by Burt et al, 1995).

5.2.3 Epidermal Differentiation Complex

5.2.3.1 Evolution

Speculation on the evolutionary origin of the three so far identified gene families within the EDC has generally been limited to comparisons of these genes within the human. The observation that loricrin, involucrin and SPRR proteins shared homology at the N and C termini, coupled with strikingly homologous exon structures, suggested that these genes had common ancestry (Backendorf and Hohl, 1992). This notion was investigated further after the identification of additional SPRR members and comparisons of the internal repeat structures (Gibbs et al, 1993). Loricrin, involucrin, and the SPRR proteins share homology at the N and C termini, separating gene specific internal repeats. It was noted that only three internal repeats existing in the seven SPRR2 genes suggested inter-genic duplication (creation of additional gene copies), while the 14 internal repeats of the single SPRR3 gene suggested intra-genic duplication (duplication within the single gene). In the SPRR1A gene, 6 repeats were identified; coupled with SPRR1B this demonstrated a process of both inter- and intra-genic duplication (Gibbs et al, 1993). The promoter regions of the human SPRR genes have been examined in detail (Fischer et al, 1996; Sark et al, 1998; Fischer et al, 1999). In all SPRR1 and 2 genes studied, a common transcriptional binding site was identified at the same position, 5’ of the transcriptional initiation site, whilst in the SPRR3 gene this binding site is some 180 nucleotides downstream of the conserved region. This prompted the theory that the rearrangement of this binding site has led to the divergence of the SPRR3 gene (Fischer et al, 1999). More recently, a number of the SPRR genes have been identified in mouse (Kartsova et al, 1996; Reddy et al, 1998; Song et al, 1999). These studies concentrate on the genomic structure and differential expression of the mouse SPRR genes, as opposed to commenting on evolutionary relationships between mouse and man.

Expression of loricrin was demonstrated in a number of mammalian species by the use of monospecific polyclonal antibodies to the carboxy-terminal peptide of human loricrin (which is highly conserved in mouse, Hohl et al, 1993). Immunoblot analysis of total protein extracts from the skin of various species separated by SDS-PAGE demonstrated cross-reactivity with the antibodies to human loricrin in all mammalian species tested (human, mouse, hamster, rat, rabbit, lamb, and cow). Three non-mammalian vertebrate species were investigated in a similar manner and failed to show any significant cross-reactivity with the human loricrin antibody (Xenopus laevis, garter snake and chicken). No cross-reactivity with human involucrin or SPRR proteins was noted, indicating that the presence of any possible ancestral loricrin/involucrin/SPRR proteins in the non-mammalian vertebrate species cannot be discounted (an ancestral protein could possibly be more similar to involucrin or SPRR proteins and therefore would not show cross-reactivity).

Early identification of multiple S100 genes, coupled with the observation that they were homologous, led to the speculation of a gene family (Lagasse and Clerc, 1988). Later, when more S100 genes and proteins had been identified, analysis of homology was performed with human S100A1-10, S100B, S100P, and CALB3 (Schaefer et al, 1995). The resulting dendrogram led the authors to comment that two distinct groups were present within the S100 genes analyzed, but no speculation as to which could be the ancestral gene was made. Furthermore, the authors comment on the fact that a number of S100 genes were identified in other vertebrates but did not analyze the relationship of interspecies S100 genes. The more recent identification of a similar clustering of S100 genes in mouse (see below) has led to the suggestion that the S100 gene cluster on human 1q21 was present in a common ancestor (70 million years ago) and that two rearrangements have occurred since (Ridinger et al, 1999).

The genes profilaggrin, trichohyalin, and repetin all encode related structural proteins containing functional calcium binding domains (see section 1.3.1.2). It has been noted that the internal repeats of trichohyalin share homology with the internal repeats of involucrin (Lee et al, 1993), and that the internal repeats and C terminus of repetin share homology with involucrin (Kreig et al, 1997). All three genes share the same exon structure as the majority of S100 genes (albeit larger, with internal repeats). After the initial elucidation that profilaggrin can functionally bind calcium via two E-F hand domains, it was proposed that this family (now including repetin, which was unidentified at the time) arose during evolution by the fusion of a gene for a structural protein containing internal repeats (of the loricrin, involucrin and SPRR family) with a gene for a calcium binding protein (S100), both of which were co-localized to 1q21 (Markova et al, 1993).

5.2.3.2 Mouse EDC

Mouse genes corresponding to the majority of the human EDC genes have been described; in fact a number of the mouse EDC genes were characterized before their human equivalents (Mehrel et al, 1990; Ohta et al, 1991; Kreig et al, 1997). Initial linkage studies mapped mouse loricrin and profilaggrin to mouse chromosome 3 (Rothnagel et al, 1994) in a region of known synteny with human chromosome 1 (Moseley and Seldin, 1989; Hardas et al, 1994). Other human EDC equivalents to have been localized to mouse chromosome 3 include S100A4 and S100A6 (Debry and Seldin, 1996), S100A10 (Saris et al, 1987) and repetin (Kreig et al, 1997). A recent study has defined the order of mouse SPRRs, loricrin, involucrin, and profilaggrin on mouse chromosome 3 to be identical to the human EDC on chromosome 1q21 (Song et al, 1999). The genetic distance between mouse profilaggrin and loricrin was calculated to be 1.4 cM, an estimated 2.8Mb. A mouse YAC clone has been described that harbors a cluster of the mouse S100 genes (equivalent human) A1, A3, A4, A5, A6, A8, A9 and A13 (Ridinger et al, 1999). The order of the mouse S100 genes is slightly different to that in human, for which two inversion events have been suggested (see chapter 6). This YAC clone was localized to mouse chromosome 3 by fluorescent in situ hybridization (FISH), in agreement with other assignments of individual S100 genes to mouse chromosome 3 (Debry and Seldin, 1996). The degree of linkage between the identified mouse S100 cluster and other mouse EDC-equivalent genes has been described using radiation hybrid mapping (Lueders et al, 1999). S100A6 is mapped closer to loricrin than loricrin is to profilaggrin. Exact distances are not given, as the measurements described are between profilaggrin and a non-EDC gene (97.5 cR away, Lueders et al, 1999). If the distance between profilaggrin and loricrin is estimated to be 2.8Mb from genetic mapping (Song et al, 1999) then a comparable distance between S100A6 and loricrin would be 1.7Mb. It remains to be seen whether these distances, described using principally genetic mapping, will be confirmed by physical mapping.
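The conversion between genetic and physical distance quoted above can be sketched in a few lines. This is only an illustration of the arithmetic: the ratio of 2 Mb per cM is not stated in the text but is the value implied by the two quoted figures (1.4 cM and 2.8Mb), and the function names are hypothetical.

```python
def mb_per_cm(physical_mb: float, genetic_cm: float) -> float:
    """Return the physical-to-genetic ratio implied by a pair of estimates."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_mb / genetic_cm


def cm_to_mb(genetic_cm: float, ratio_mb_per_cm: float) -> float:
    """Convert a genetic distance (cM) to an approximate physical distance (Mb)."""
    return genetic_cm * ratio_mb_per_cm


# Figures quoted in the text: 1.4 cM between mouse profilaggrin and loricrin,
# estimated as 2.8 Mb. The ratio below is derived from them, not quoted.
ratio = mb_per_cm(2.8, 1.4)
print(f"implied ratio: {ratio:.1f} Mb/cM")
print(f"1.4 cM corresponds to about {cm_to_mb(1.4, ratio):.1f} Mb")
```

Such a ratio is a genome-wide average at best; recombination rates vary along a chromosome, which is why the text notes that the distances still need confirmation by physical mapping.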

5.2.3.3 Transcriptional orientation within the EDC located human S100 gene family

One of the ways to assess the evolutionary processes at work in the formation of a gene family is to look at the transcriptional orientation of the gene members. A duplication event generally results in a head to tail array of genes (Ohno, 1970), whereas an inversion (non-homologous recombination), or an insertion within a tandem array of duplicated genes, can be indicated by opposing transcriptional orientation. Currently, the transcriptional orientation of four S100 genes is known (S100A3, 4, 5, and 6). This is based on the isolation of a YAC clone containing 9 S100 genes (Schaefer et al, 1995) and long range physical mapping (Mischke et al, 1996), coupled with sequencing data (Englekamp et al, 1993); see figure 3.1 for details. These four genes are arranged in a head to tail orientation (telomere to centromere).
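The reasoning about orientation can be made concrete with a small sketch that labels neighbouring genes by their relative direction of transcription. It is an illustration only: the '+' strand is taken here to mean transcription running telomere to centromere (an assumed convention), the order of genes in the list is illustrative rather than a statement of map order, and the function name is hypothetical.

```python
from typing import List, Tuple

# A gene is (name, strand). '+' is assumed to mean transcription running
# telomere to centromere; '-' the opposite direction.
Gene = Tuple[str, str]


def classify_neighbours(genes: List[Gene]) -> List[Tuple[str, str, str]]:
    """Label each adjacent pair of genes by relative transcriptional orientation."""
    labels = []
    for (left, s1), (right, s2) in zip(genes, genes[1:]):
        if s1 not in "+-" or s2 not in "+-" or not s1 or not s2:
            raise ValueError("strand must be '+' or '-'")
        if s1 == s2:
            kind = "head-to-tail"   # same direction, as expected after tandem duplication
        elif s1 == "+":
            kind = "tail-to-tail"   # convergent transcription
        else:
            kind = "head-to-head"   # divergent transcription
        labels.append((left, right, kind))
    return labels


# The four genes whose orientation is reported as known, all in the same
# direction. List order is illustrative only.
cluster = [("S100A3", "+"), ("S100A4", "+"), ("S100A5", "+"), ("S100A6", "+")]
for left, right, kind in classify_neighbours(cluster):
    print(left, right, kind)
```

Any pair that is not head-to-tail would flag a candidate inversion or insertion of the kind discussed above.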

5.2.3.4 Chicken EDC genes

In order to assess the possibility of an EDC in the chicken, a search of the available databases was performed using all known EDC gene names. The sequences of Gallus gallus calcyclin (S100A6) mRNA (gb/U76365/GGU76365) and Gallus gallus calgizzarin (S100A11) mRNA (gb/U77733/GGU77733) were identified via an ENTREZ search (a search engine that identifies terms within the text of database entries, available at the NCBI web site - http://www.ncbi.nlm.nih.gov/Entrez). The database entries were submitted by: Allen, E.G., Sutherland, C., Andrea, I.E., Schonekess, B.C., and Walsh, M.P., 1996, unpublished (S100A6); and by Schonekess, B.O. and Walsh, M.P., 1996, unpublished (S100A11).

5.3 S100A1 and A13 transcript species and orientation

5.3.1 Library screening with S100A1 and S100A13

During the course of constructing the bacterial clone map of the EDC, a number of chromosome 1 cosmid clones were identified with probes representing both S100A1 and S100A13 (section 3.4.3). As described, isolated DNA from these clones was digested with EcoRI restriction enzyme and resolved by electrophoresis on a 0.8% agarose gel. DNA fragments resolved on this gel were Southern blotted and hybridized with radio-labeled probes representing each gene individually (S100A1 and S100A13). A single cosmid, ICRF112cP0780, displayed one EcoRI fragment positive with the S100A1 probe. This cosmid clone also displayed a single EcoRI fragment, approximately 3kb in size, positive with the probe for S100A13. Two additional cosmid clones, ICRF112cL0496 and ICRF112cG1487, displayed a larger, approximately 15kb, EcoRI fragment positive for the S100A13 probe with a greater intensity than the 3kb S100A13 positive fragment in cosmid clone ICRF112cP0780. Figure 5.1 displays the digested cosmids resolved on a 0.8% agarose gel and the Southern blot of this gel hybridized with radio-labeled probes representing S100A1 and S100A13 individually. From these hybridization data it was hypothesized that S100A1 localized to cosmid ICRF112cP0780, while S100A13 localized to cosmid clones ICRF112cL0496 and ICRF112cG1487 and showed a degree of homology to sequence present in a 3kb EcoRI fragment from cosmid clone ICRF112cP0780. The single cosmid positive with the probe for S100A1 (ICRF112cP0780) was integrated into the EDC bacterial map using hybridization and restriction-mapping experiments described earlier (3.4.3). The two cosmid clones positive for S100A13 only failed to detect any clones previously identified whilst constructing the bacterial map of the EDC (chapter 3).


Figure 5.1. 0.8% agarose gel (A) resolving restriction enzyme digested cosmid DNA and two autoradiographs (B and C, PhosphorImager files, Molecular Dynamics) displaying the corresponding Southern blot of the 0.8% agarose gel hybridized with radio-labeled probes representing the S100A1 and S100A13 genes. Cosmid clones were identified from screens of the flow sorted chromosome 1 cosmid library with the radio-labeled probes representing the S100A1 and S100A13 genes. In each case (A, B, and C) the numbered lanes indicate: 1 - Sau3AI digested DNA of clone ICRF112cL0496; 2 - EcoRI digested DNA of clone ICRF112cP0780; 3 - EcoRI digested DNA of clone ICRF112cG1487; and 4 - EcoRI digested DNA of clone ICRF112cL0496. A - 0.8% agarose gel, resolving restriction enzyme digested cosmid DNA in the presence of a 1kb marker (Life Technologies). The gel has been stained with ethidium bromide and photographed under UV light (254nm). This gel was subsequently blotted to a nylon membrane according to Southern. B - Southern blot of the gel in A, hybridized with a radio-labeled probe representing the S100A1 gene. Boxed areas indicate hybridization signal from a previous, unrelated, experiment. C - Southern blot of the gel in A, hybridized with a radio-labeled probe representing the S100A13 gene. Boxed areas indicate hybridization signal from a previous, unrelated, experiment.

Fluorescent in situ hybridization (FISH) was carried out in the lab of Dr. Jiannis Ragoussis, by Ghazala Mirza, with cosmid clone ICRF112cL0496. This experiment localized cosmid clone ICRF112cL0496 to human chromosome 10q23, suggesting that the S100A13 gene previously localized to a 1q21 assigned YAC clone (Wicki et al, 1996b) was in fact localized to 10q23, while related S100A13 sequence was present in a 3kb EcoRI fragment localized to the EDC (cosmid clone ICRF112cP0780). Attention was directed towards investigating the apparent related sequence located within the EDC.

5.3.2 Sub-cloning and sequencing of the S100A13 positive EcoRI fragment from cosmid clone ICRF112cP0780.

In order to assess the S100A13 homologous sequence present in cosmid clone ICRF112cP0780, the 3kb EcoRI fragment positive for the S100A13 probe was sub-cloned into an EcoRI digested pBluescript vector as described in section 2.2.24 (materials and methods). Plasmid DNA, containing the sub-cloned 3kb EcoRI fragment, was purified using a QIAGEN™ plasmid midi kit, and used as a template for sequencing. Sequencing was carried out on an ABI 310 using BigDye™ terminator chemistry with primer walking. Figure 5.2 describes the result of performing a BLASTN 2.0.9 (Altschul et al, 1997) search of the non-redundant GenBank, EMBL, DDBJ, and PDB sequences with the determined sequence of the sub-cloned EcoRI fragment, as well as describing the position of primers used and the length of sequence generated. The BLASTN 2.0.9 search was performed via the blast server at the National Center for Biotechnology Information (NCBI, National Institutes of Health, Bethesda, USA) web site (http://www.ncbi.nlm.nih.gov/blast). The blast data determined the 3kb (2,993bp) EcoRI fragment to be the T7 end fragment of cosmid ICRF112cP0780, in addition to containing 216bp of the published S100A13 gene sequence. No exon-intron structure for the S100A13 gene had previously been described. Having identified bases 12 to 227 of the published S100A13 sequence, it was assumed that this sequence would constitute the second exon of the gene. This was based on the general exon structure of the S100 gene family, in that the majority of S100 genes contain three exons (the known exception being the S100A5 gene, containing 4 exons - Englekamp et al, 1993). The first, short, exon contains 5’ untranslated sequence only (in the case of S100A13 this would presumably be the unidentified bases 1 to 11). The second, larger, exon contains both 5’ untranslated and coding sequence, while the third, largest, exon contains coding sequence and all of the 3’ untranslated sequence (Wicki et al, 1996b).

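[Figure 5.2 schematic. Scale: 1kb, 2kb, 3kb. Sequencing primers shown: M13R+, A13-1n, F-2, M13F+, L4R, F-3, 127F, A13-2n, A13-n, F-1. Database matches shown along the sequence: pBluescript (both ends), Lawrist4, ALU like (two regions), S100A13.]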

Figure 5.2. BLASTN 2.0.9 (Altschul et al, 1997) results of searching the non-redundant GenBank, EMBL, DDBJ, and PDB sequences with the sequence of the 3kb EcoRI fragment derived from cosmid clone ICRF112cP0780. Upper horizontal line shows the scale in kilobases. Arrows beneath the scale show the direction and size of BigDye™ terminator (PE Applied Biosystems) sequence using primers indicated (see materials and methods for primer sequence). The lower, thicker, horizontal line shows a schematic representation of the derived sequence and the positions of sequences from the non-redundant GenBank, EMBL, DDBJ and PDB databases identified with the BLASTN 2.0.9 program (black and white rectangles). The blocked arrow displays the 5’ to 3’ direction of the S100A13 sequence. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast), on the 23rd September, 1998.

It was therefore hypothesized that the determined 2,993bp sequence from cosmid ICRF112cP0780 contained the second exon of the S100A13 gene, thus positioning this gene to the Epidermal Differentiation Complex. The PCR probe for the S100A13 gene was generated by amplifying 214bp of 3’ sequence from the 447bp published sequence with primers located at 203bp (S100A13F) and 434bp (S100A13R). Therefore, the probe (constituting bases 203 to 434 of the published S100A13 sequence) could only hybridize to 25bp of sequence present in the 2,993bp EcoRI fragment of cosmid ICRF112cP0780 (S100A13 bases 203 to 227). Cosmid ICRF112cP0780 failed to identify further clones from the flow sorted chromosome 1 cosmid library that extended towards the centromere (figure 3.12). The known orientation of this cosmid (ICRF112cP0780, figure 3.12) determines that the hypothetical third exon (bases 228 to 447) would be found centromeric of the T7 end EcoRI fragment. This explains the failure to identify additional S100A13 positive cosmids (which localized to chromosome 1q21) from the flow sorted chromosome 1 library, with either an ICRF112cP0780 derived probe or the S100A13 probe. The probe representing S100A13 was hybridized to a Southern blot containing EcoRI digested DNA from PAC clone 178 F15 (PAC clone containing S100A2, 3, 4, 5, and 6 - see figure 3.12). A single 8,700bp EcoRI fragment was identified as positive with the S100A13 probe. This is in agreement with the YAC data from Wicki et al, 1996b, who identified a 10kb S100A13 positive EcoRI fragment (the small difference can be attributed to the differing resolution of the techniques used). The above data disproved the initial hypothesis (section 5.3.1) that the two cosmid clones ICRF112cL0496 and ICRF112cG1487 contained the S100A13 gene and that ICRF112cP0780 contained sequence homologous to S100A13. It was now hypothesized that the two cosmid clones, ICRF112cL0496 and ICRF112cG1487, contained sequence homologous to the S100A13 gene - either another copy of the gene (or pseudogene) or a related, novel, S100 gene localizing to chromosome 10q23.
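The coordinate reasoning above (why a probe spanning bases 203 to 434 can only detect 25bp of the hypothetical second exon) can be checked with a short sketch. The intervals are the inclusive base ranges quoted in this chapter for the published 447bp S100A13 sequence; the constant and function names are hypothetical and chosen only for illustration.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive, 1-based (start, end) in the published sequence


def span_length(interval: Interval) -> int:
    """Number of bases in an inclusive interval."""
    start, end = interval
    return end - start + 1


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals (0 if they are disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Hypothetical exon boundaries and probe position as given in the text.
EXON1 = (1, 11)
EXON2 = (12, 227)
EXON3 = (228, 447)
PROBE = (203, 434)

print("exon 2 length:", span_length(EXON2))
print("probe/exon 2 overlap:", overlap(PROBE, EXON2))
print("probe/exon 3 overlap:", overlap(PROBE, EXON3))
```

Most of the probe falls within the hypothetical third exon, which is consistent with the weak signal from the 3kb fragment and the stronger signal expected from a fragment carrying exon 3.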

5.3.3 BLASTN 2.0.9 search of the GenBank human EST database with the genomic sequence derived from cosmid clone ICRF112cP0780.

The non-redundant databases available at the NCBI BLAST server do not contain a large number of unmapped or redundant EST entries. Therefore, a search of the GenBank human EST database was performed with the genomic (non-vector) portion of the derived 3kb EcoRI fragment sequence (2,668bp). Figure 5.3 details the result of this BLASTN 2.0.9 search. The majority of the identified human EST entries (100 out of 101) corresponded to the hypothetical second exon of the published S100A13 gene. Three EST entries showed 97-100% homology with other stretches of the 2,668bp sequence. Two of these (accession numbers AA366085 and AA143022) displayed 100% homology to the hypothetical second exon of S100A13, as well as two separate 5’ regions 100% homologous to portions of the 2,668bp sequence, upstream of the S100A13 homologous region. The third EST (5’ sequence accession number T78567) displayed 97% homology to 234bp of the sequenced EcoRI fragment, upstream of the hypothetical S100A13 exon 2. This 5’ sequence of the EST (T78567) was 389bp long and displayed no significant BLASTN 2.0.9 homology over the final 155bp of sequence. The GenBank entry for this EST showed the cDNA to be an IMAGE clone (IMAGE 113442) with an insert size of 890bp and that the high quality 5’ sequence stopped at base 276. An ENTREZ search (ENTREZ is a browser provided by NCBI) with the IMAGE clone number (113442) identified that the 3’ end of the cDNA had also been sequenced. This 3’ sequence (accession number T78482) was 427bp long and displayed a 95% homology to the published S100A13, 3’- 5’ sequence. The IMAGE clone 113442 was obtained from the HGMP Resource Centre (Hinxton, Cambridge, UK) and sequenced from a PCR amplified template (PCR using the cDNA clone as a template with the M13+ primer pair) using BigDye terminator chemistry and primer walking (2 primers designed and used: 113442F and A13uni-rev). This determined sequence (790bp) was compared to that of the EcoRI fragment from cosmid ICRF112cP0780 using BLASTN 2.0.9 at the NCBI blast server. The 5’ end of cDNA 113442 displayed 100% homology to the EcoRI fragment sequence up to and including the S100A13 homologous stretch. Sequence of the IMAGE clone 113442 after this displayed 100% homology to the remaining, published, S100A13 sequence (bases 228-447) only.

[Figure 5.3 schematic. Scale: 1kb, 2kb, 3kb. Database matches shown along the sequence: S100A13, ALU like (two regions). EST entries shown: T78567 (IMAGE 113442 5’), AA366085 (EST77044 TIGR), AA143022 (IMAGE 505142), and a group of further EST entries.]

Figure 5.3. BLASTN 2.0.9 (Altschul et al, 1997) results of searching the GenBank human EST entries with the 2,668bp genomic sequence derived from cosmid clone ICRF112cP0780. Upper horizontal line shows the scale in kilobases. The horizontal line below the scale shows a schematic representation of the derived sequence and the positions of sequences from the non-redundant GenBank, EMBL, DDBJ and PDB databases identified with the BLASTN 2.0.9 program. The blocked arrow displays the 5’ to 3’ direction of the S100A13 sequence. Underneath this are human EST entries identified by searching the GenBank human EST database with the 2,668bp genomic sequence. Blocked lines are regions showing significant homology; dashed lines link regions of homology within the same EST. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast), on the 23rd September, 1998.

5.3.4 Identifying other S100A13 EST entries with novel 5’ end sequences.

A BLASTN 2.0.9 search was performed with the sequence of the hypothetical second exon of S100A13 (bases 12-227 of the published S100A13 sequence) in an attempt to identify further EST entries with novel 5’ sequences. As was expected, the results were similar to those obtained with the BLASTN 2.0.9 search with the 2,668bp sequence of cosmid clone ICRF112cP0780 (100 EST entries being identified). EST entries that contained additional 5’ sequence to that of the S100A13 sequence used in the BLASTN 2.0.9 search were investigated. Five species of EST that contained a unique 5’ sequence in addition to the published S100A13 homologous sequence were identified. Three of these species had already been identified and are described in figure 5.3. The remaining two species were represented by 13 EST entries. Of these 13 entries, 12 were homologous and contained the remaining eleven, 5’, bases of the published S100A13 sequence. The longest of these 12 EST entries contained 141bp 5’ of the hypothetical S100A13 exon 2 (accession number AA527275). The remaining eleven 5’ bases of the published S100A13 gene were included at the proximal end of this 141bp sequence. The final EST species identified was represented by a single entry (accession number AA364489) and contained a unique 5’ sequence of 83 bases. This particular EST entry (accession number AA364489) was annotated to be ‘similar to S-100 protein, alpha chain’ (S100 alpha is synonymous with S100A1).

5.3.5 Extending the 2,668bp genomic sequence with PCR products amplified using novel S100A13 EST species-specific primers.

In order to assess the relationship between the two newly identified S100A13 EST species and the 1q21 locus, PCR primers were designed from the unique 5’ sequence of the two newly identified S100A13 EST species (primers A13sp4F and A13sp5F respectively). PCR experiments were performed using these two primers in conjunction with a primer designed from the hypothetical second exon of S100A13 (A13uni-rev) using PAC clone 178 F15 as template. The primer combination A13sp4F - A13uni-rev amplified an approximately 1,800bp product from PAC clone 178 F15. This amplified fragment of DNA was sequenced using BigDye™ terminator chemistry. Primer A13sp4F generated 238bp of good quality sequence and a further 277bp of poor quality sequence. Primer A13uni-rev failed to generate good quality sequence. An additional primer, 80P7di-1 (designed from the M13F+ primer sequence of the cloned 3kb EcoRI fragment - see figure 5.2), generated 622bp of good quality sequence from the PCR amplified 1,800bp DNA fragment. This sequence linked the main genomic 2,668bp sequence to that generated by primer A13sp4F and extended the total genomic sequence to 3,426bp.

5.3.6 S100A1 exon 1 identified with additional sequence derived from cosmid ICRF112cP0780

A BLASTN 2.0.9 search of the non-redundant databases with the full 3,426bp of extended genomic sequence identified previously published genomic sequence around the first exon of the S100A1 gene (accession number M65210 - Morii et al, 1991). The final 50bp of the 3,426bp sequence displayed 100% homology to the first 50bp of the S100A1 exon 1 sequence, effectively extending the total genomic sequence identified to 3,776bp. Figure 5.4 displays the results of BLASTN 2.0.9 searches of the non-redundant and human EST databases with the full 3,776bp sequence, as well as the location of the PCR amplified sequence (section 5.3.5) and the primers used to extend the original 2,668bp genomic sequence. The BLASTN 2.0.9 data displayed in figure 5.4 identify all 5 species of S100A13 EST entries (section 5.3.4), as well as the previously published genomic sequence around the first exon of the S100A1 gene. These data not only specifically localize both the S100A1 and S100A13 genes to the bacterial clone map of the EDC, but also, for the first time, determine the relative transcriptional orientation of these two genes. The novel finding of these experimental data is that the S100A13 gene is transcribed in the same orientation as S100 genes A3, A4, A5, and A6 (telomere to centromere) while the S100A1 gene is transcribed in opposing orientation (centromere to telomere).
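The bookkeeping behind the growing genomic sequence (2,668bp, then 3,426bp, then 3,776bp) can be sketched as follows. The three totals and the 50bp overlap are taken from the text; the length of the overlapping database sequence is not stated there and is derived here from those figures, so it should be read as an inference. Function names are hypothetical.

```python
def merged_length(len_a: int, len_b: int, overlap_bp: int) -> int:
    """Length of a contig built from two sequences sharing an end overlap."""
    if overlap_bp < 0 or overlap_bp > min(len_a, len_b):
        raise ValueError("overlap must fit within both sequences")
    return len_a + len_b - overlap_bp


def implied_partner_length(total: int, len_a: int, overlap_bp: int) -> int:
    """Length the second sequence must have to give the stated merged total."""
    return total - len_a + overlap_bp


# Figures from the text.
ORIGINAL = 2668   # genomic portion of the 3kb EcoRI fragment
EXTENDED = 3426   # after adding PCR-derived sequence
FINAL = 3776      # after joining the published S100A1 exon 1 region
OVERLAP = 50      # bases shared with the published sequence

print("bases added by PCR-derived sequence:", EXTENDED - ORIGINAL)
partner = implied_partner_length(FINAL, EXTENDED, OVERLAP)
print("implied length of overlapping published sequence:", partner)
print("check:", merged_length(EXTENDED, partner, OVERLAP))
```

The same subtraction of the shared bases applies wherever overlapping reads are joined, which is why the extension adds fewer bases than the raw length of the sequence being attached.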

[Figure 5.4 schematic. Scale: 1kb, 2kb, 3kb, 4kb. Primers shown: A13sp5F, 80P7di-1. Database matches shown along the sequence: S100A13, S100A1, ALU like (two regions), S100A1 exon 1 (M65210). EST entries shown: IMAGE 113442; 22 EST entries (best hit = AA179240); AA366085 (EST77044 TIGR); AA143022 (IMAGE 505142); W25732, AA527275, AI021993, AA025332, AA708317, AA092278, W61069, AI056032, W77955, W25494, AI084881, W72756; AA364489; AA814441, AI027537, AA833741; 85 EST entries.]

Figure 5.4. BLASTN 2.0.9 (Altschul et al, 1997) results of searching the non-redundant and human EST databases with 3,776bp of genomic sequence around the S100A1 and S100A13 genes. Upper horizontal line shows the scale in kilobases. The horizontal line below the scale shows a schematic representation of the derived sequence. The positions of sequence entries displaying homology to this 3,776bp sequence, identified from the non-redundant GenBank, EMBL, DDBJ and PDB databases with the BLASTN 2.0.9 program, are indicated along the schematic representation of the sequence. Above this, the grey rectangle indicates the region amplified with PCR primers A13uni-rev (not shown) and A13sp5F. Arrows represent sequence used to extend the 2,668bp sequence derived from cosmid ICRF112cP0780 and link this sequence to the S100A1 database entry (M65210). Primers used to generate this sequence are indicated. The blocked arrows display the 5’ to 3’ direction of the S100 genes indicated. Human EST entries identified by searching the GenBank human EST database with the 3,776bp genomic sequence are represented by blocked lines below the schematic of the 3,776bp sequence (accession numbers are given where space permits). Dashed lines link regions of homology (blocked lines) within the same EST. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast), on the 19th of January 1999. Appendix III details the full sequence described here.

5.3.7 Identification of additional S100A13 species of ESTs

The new BLASTN 2.0.9 data (described in figure 5.4) identified a further three EST entries from the GenBank human EST database (accession numbers AA833741, AA814441, and AI027537). All three EST entries displayed sequence homology to the S100A1 exon 1 database entry, while the EST with the longest stretch of sequence (AA833741) displayed homology to the first 30 bases of EST sequence entry AA364489. Further BLASTN 2.0.9 searches were performed with the AA833741 sequence, identifying further EST entries, which were, in turn, used in BLASTN 2.0.9 searches themselves. Figure 5.5 displays the results of these searches.

5.3.8 RT-PCR analysis of the various identified S100A13 species of transcribed sequences

RT-PCR experiments were performed using the S100A13 exon 2 specific primer (A13uni-rev) in conjunction with primers designed from the 5’ sequence of S100A13 EST species identified in section 5.3.4. cDNA was constructed, as described (materials and methods), from 50ng of commercially produced poly-A+ RNA available in the laboratory (CLONTECH). S100A13 expression had previously been identified in human adult brain and pancreas from Northern blot hybridization experiments (Wicki et al, 1996b). One identified EST entry (accession number AI021993) originated from a fetal liver and spleen cDNA library. Human adult brain, adult pancreas and fetal liver cDNA were investigated. Figure 5.6 shows the RT-PCR results for primer pairs designed from EST entries AA143002 and AA364489. Table 5.1 details the results of RT-PCR experiments performed with the S100A13 EST specific primers described. Expression is demonstrated in three of the five primer pair combinations. All positive PCR products were of the expected size (determined from the EST sequence), except for those generated with primers derived from EST AA364489. In this case, at least 3 different PCR products were seen in each positive PCR.

[Figure 5.5 schematic. Known sequence shown: S100A13, S100A1. EST entries shown: AA833741 (cDNA insert = 1161bp), AA814441, AI027537, AA364489 (TIGR), AI185340 (cDNA insert = 922bp), and more than 40 further EST entries (example R60100). Scale bar: 500bp.]

Figure 5.5. EST entries identified with BLASTN 2.0.9 searches of the GenBank human EST database with AA833741, embZ20964, and R60100. The uppermost horizontal lines represent known sequence around the S100A1 and S100A13 genes. Where these lines are not linked, the specific distances and adjoining sequence are not known. Solid lines beneath this represent EST entries from the GenBank human EST database. Dashed lines link sequence within the same EST entry. Greater than 95% homology is displayed between EST entries (solid lines) found in the same vertical plane, except where indicated (EST entry embZ20964). The genomic organisation is known for EST entry AA364489 only; distances and positions of sequences within all other EST entries are unknown, except in the case of AI185340, which displays 100% homology over a small stretch of the S100A1 sequence indicated (drawn in the same vertical plane). Drawn to scale. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast), between the 3rd of February and the 6th March, 1999.

Figure 5.6. PCR products using primer combinations A13uni-rev with A13Sp3F (A), and A13uni-rev with A13Sp5F (B), amplified from the given templates, and resolved on 2% agarose gels. Gels were stained with ethidium bromide and photographed under UV light. In each case (A and B) template (other than negative control, H2O, and positive control, PAC clone 178 F15) was constructed from 50ng of Poly-A+ RNA of the given tissue with M-MLV reverse transcriptase (+RT) or without (-RT) following the manufacturer's specification (Promega). Brain = adult brain, liver = fetal liver, and pancreas = adult pancreas. White, blocked, arrows indicate the minimum of three PCR products seen with primer combination A13uni-rev and A13Sp5F (B).

Figure 5.7. PCR products using primer combination A13uni-rev with A13Sp6F, amplified from the given templates, and resolved on 2% agarose gels. Gels were stained with ethidium bromide and photographed under UV light. Template (other than negative control, H2O) was constructed from 50ng of Poly-A+ RNA of the given tissue with M-MLV reverse transcriptase (+RT) or without (-RT) following the manufacturer's specification (Promega). Brain = adult brain, liver = fetal liver, and pancreas = adult pancreas.

Table columns: EST sequence used for forward primer design; primer combination; H2O; clone 178 F15; brain (+RT, -RT); fetal liver (+RT, -RT); pancreas (+RT, -RT).
IMAGE 113442; A13sp1F / A13uni-rev; +
AA366085; A13sp2F / A13uni-rev; + + 4
AA143002; A13sp3F / A13uni-rev; + + + 4-
W25732; A13sp4F / A13uni-rev; + + + + * + + ! * + + *
AA364489; A13sp5F / A13uni-rev; + + + +

Table 5.1. Results of PCR experiments using the given primer combinations on cDNA constructed from adult brain, adult pancreas, and fetal liver with M-MLV reverse transcriptase (+RT) and without M-MLV reverse transcriptase (-RT). * - indicates DNA contamination (PCR product of expected size from genomic amplification in addition to any (+) product of the expected size from cDNA amplification).

In each case (adult brain, pancreas and fetal liver cDNA) the majority of the PCR amplified product, using primer combination A13sp5F / A13uni-rev, was represented by one of the minimum of three possible products. The major product of the brain cDNA PCR was the smallest, approximately 250bp. The major products of the pancreas and fetal liver cDNAs were approximately 350bp in size. The larger product (or products), of approximately 440bp, was amplified at a much lower level. The PCR products from brain and fetal liver cDNA were ethanol precipitated and sequenced with the primers used in the PCR. The results showed that the PCR product from brain cDNA consisted of sequence 100% homologous to that of EST entry AA364489. The sequence of the PCR product from liver was 100% homologous to the first stretch of EST entry AA364489 (not homologous to the published S100A13) followed by sequence 100% homologous to EST entry W25732. It was therefore hypothesized that any larger products seen after PCR amplification from these cDNA sources would consist of additional sequence from other S100A13 EST species upstream of that from entry AA364489.

Three additional EST entries detailed within section 5.3.7 and figure 5.5 (accession numbers AA814441, AI027537, and AA833741) displayed no homology to the published sequence of S100A13. The sequence present in the database represented only a portion of these cDNA clones, in all cases a 5' sequence read of no more than 444bp. For example, the insert of the cDNA clone for which the 444bp sequence (accession number AA833741) was available was 1161bp. It was hypothesized that these cDNA clones would contain sequence homologous to S100A13. Unfortunately, due to plate contamination at the HGMP Resource Centre, these newly identified cDNA clones were not available for investigation. A primer was designed from the sequence of AA833741 (A13Sp6F) and used in conjunction with A13uni-rev in RT-PCR experiments to determine whether these clones did contain sequence homologous to S100A13. A similar pattern of DNA fragments to those from the PCR with the A13sp5F/A13uni-rev primer combination was seen when the resulting PCR products were resolved on a 2% agarose gel. Figure 5.7 displays this result. Three distinctly sized DNA fragments (approximately 400, 500, and 600bp) were individually excised from the agarose gel and separately purified by electroelution. 2µl of a 1:10 dilution of each of the isolated products was used as template in separate PCR experiments, under the same conditions as before. Of the three different templates (400, 500, and 600bp), the PCR amplification was specific only for the smallest DNA fragment isolated (a single distinct 400bp product was seen with agarose gel electrophoresis). The two larger isolated products produced the full range of all three PCR amplified DNA fragments seen in the original experiment (figure 5.7). The desired sized products in these cases constituted the majority of amplified DNA fragments: although all three sized products were produced from the 500bp isolated template, the majority of the product was 500bp in size, and similarly the majority of the DNA amplified from the 600bp template was 600bp in size. Products from these PCR experiments were ethanol precipitated and sequenced using primers from the PCR amplification. The PCR product from the 600bp template failed to produce good quality sequence, presumably due to the presence of the other, smaller, amplified DNA fragments. The smallest PCR product (400bp) was homologous to only a portion of the EST entry AA833741 and also to the hypothetical S100A13 exon 2. The medium-sized PCR product (500bp) contained sequence homologous to EST entry AA833741, interrupted by sequence homologous to EST entry AI30440, and sequence homologous to the hypothetical S100A13 exon 2. Figure 5.8 displays the sequence homology of this and previous RT-PCR experimental data. The results demonstrated that some of the identified cDNA clones did indeed contain sequence homologous to that of the published S100A13. It was also hypothesized that the additional cDNA clones, which were not investigated, would probably also contain such sequence.

Figure 5.8. RT-PCR products from adult brain and fetal liver, using primer sequences derived from S100A13 related EST sequence, in relation to other EST entries identified in the region as determined from BLASTN 2.0.9 searches. RT-PCR products are highlighted by the white rectangle. RT-PCR products were purified and sequenced. This sequence was used to search the GenBank human EST database using the BLASTN 2.0.9 (Altschul et al, 1997) program. Grey boxed area depicts data displayed in Figure 5.5 (EST entries identified with BLASTN 2.0.9 searches of the GenBank human EST database with AA833741, embZ20964, and R60100) with the addition of EST entry W25732. The uppermost horizontal lines represent known sequence around the S100A1 and S100A13 genes. Where these lines are not linked, the specific distances and adjoining sequence are not known. Solid lines beneath this represent RT-PCR products or EST entries from the GenBank human EST database. Dashed lines link sequence within the same RT-PCR product or EST entry. Greater than 95% homology is displayed between sequences (solid lines) found in the same vertical plane, except where indicated (EST entry embZ20964). The genomic organisation is known for EST entries AA364489 and W25732 only; distances and positions of sequences within all other EST entries are unknown. Drawn to scale. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast) between the 20th of January and the 16th February, 1999.

5.3.9 Sub-cloning and sequencing of an S100A13-positive Sau3AI fragment from the 10q23 localized cosmid clone ICRF112cL0496

To determine the nature of the S100A13 homologous sequence present on chromosome 10q23 (sections 5.3.1 and 5.3.2), the cosmid clones ICRF112cL0496 and ICRF112cG1487 (identified with the hybridization results detailed in section 5.3.1) were investigated. These cosmid clones contained an approximately 15kb EcoRI fragment positive for hybridization with the probe representing S100A13 (figure 5.1). Whilst isolating Sau3AI-digested cosmid insert DNA from clone ICRF112cL0496, a 1.6kb fragment was identified with the S100A13 probe (see figure 5.1). Cosmid DNA from ICRF112cL0496 was subsequently preparatively digested with Sau3AI restriction enzyme. Sau3AI restriction enzyme digests genomic DNA more frequently than EcoRI restriction enzyme due to a shorter recognition sequence (4 bases as opposed to 6). The resulting Sau3AI fragments were partially resolved (electrophoresis time was halved in order to prevent any small DNA fragments migrating out of the gel) on a 0.8% agarose gel and Southern blotted. This Southern blot was hybridized with the radio-labeled probe representing S100A13 in order to identify any possible, additional, Sau3AI fragments other than the 1.6kb fragment seen in figure 5.1. A single 1.6kb positively hybridizing fragment was identified, gel purified, and sub-cloned into a BamHI restriction enzyme digested pBluescript vector as described in section 2.2.24 (materials and methods). Plasmid DNA containing the sub-cloned, 1.6kb, Sau3AI fragment was purified using a QIAGEN™ midi-tip and used as a template for sequencing. Sequencing was carried out on an ABI310 using BigDye™ terminator chemistry with primer walking. Figure 5.9 describes the results of BLASTN 2.0.9 searches of the non-redundant database with the sequence of the 1.6kb Sau3AI fragment in question. The greatest degree of homology was seen with four entries in the non-redundant database, all of which displayed 96% homology over just 25 bases (24 out of 25) or 100% over 21 bases. One of the entries displaying 100% homology over 21 bases was the published sequence of the S100A13 mRNA. A BLASTN 2.0.9 search of the GenBank human EST entries failed to identify any longer sequences or greater homologies than those seen with the BLASTN 2.0.9 search of the non-redundant databases.
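The statement that a 4-base recognition sequence cuts more frequently than a 6-base one can be illustrated with a short calculation. The sketch below assumes random sequence with equal base frequencies, which is an idealization and not a property of the cosmid insert; the function name is illustrative only.

```python
# Expected spacing between recognition sites in random DNA with equal
# base frequencies (an idealizing assumption; real genomic DNA deviates).

def expected_fragment_length(site_length: int) -> int:
    """Mean distance in bp between occurrences of one specific site."""
    return 4 ** site_length

sau3ai_spacing = expected_fragment_length(4)  # 4-base recognition sequence
ecori_spacing = expected_fragment_length(6)   # 6-base recognition sequence

print(sau3ai_spacing, ecori_spacing, ecori_spacing // sau3ai_spacing)
```

Under this assumption a 4-base cutter is expected to cut about 16 times more often than a 6-base cutter, which is in keeping with a 1.6kb Sau3AI fragment being recovered from within an approximately 15kb EcoRI fragment.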

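[Figure 5.9: schematic with scale in kb; sequencing reads from primers M13F, 96L4F, 96L4R and 96L4R2; pBluescript vector sequence at both ends of the derived sequence; regions homologous to database entries gb/M37984/HUMTROC, gb/AC005944 and gb/AC006208.]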

Figure 5.9. Sequence of a 1.6kb Sau3AI fragment from cosmid clone ICRF112cL0496, primers and sequence used to generate data, and the results of a BLASTN 2.0.9 search (Altschul et al, 1997) of the non-redundant GenBank, EMBL, DDBJ, and PDB databases with this sequence. Upper horizontal line shows the scale in kilobases. Arrows beneath the scale show the direction and size of BigDye™ terminator (PE Applied Biosystems) sequence using primers indicated (see materials and methods for primer sequence). The lower, thicker, horizontal line shows a schematic representation of the derived sequence and the positions of vector sequences, identified with the BLASTN 2.0.9 program. Underneath this are the regions displaying greater than 95% homology to sequence entries indicated. BLASTN 2.0.9 search was performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast).

The stronger hybridization signal seen in cosmid clones ICRF112cL0496 and ICRF112cG1487 compared to cosmid clone ICRF112cP0780 could be explained in the following way. Cosmid clone ICRF112cP0780 contained 25 out of 25 bases of sequence homologous to the probe representing S100A13 used in the original hybridization experiments to identify clones from the flow sorted chromosome 1 library. The GC content of this 25bp stretch is 60% (15 of 25bp). Cosmid clone ICRF112cL0496 contained 21 out of 21 bases of sequence homologous to the probe representing S100A13. The GC content of the 21bp homologous stretch is 76.2% (16 out of 21 bases). The PCR amplified probe representing the S100A13 gene contained 214bp of the 447bp published sequence covering bases 203-434. The homology seen in cosmid clone ICRF112cP0780 covered bases 203-227 of the published S100A13 sequence (the very first 25bp of the probe). The homology seen in cosmid clone ICRF112cL0496 was located more centrally within the sequence of the probe (bases 388 to 408). Random primed oligo-nucleotide labeling generates, on average, labeled stretches of 50-300 bases (Dr D. Nizetic, personal observation). Therefore, taking into account the higher GC content of the ICRF112cL0496 sequence compared to that of ICRF112cP0780, more of the labeled probe would have remained hybridized to ICRF112cL0496 than to ICRF112cP0780 after washing (see section 2.2.10). In addition, when comparing the lanes of the ethidium stained gel containing resolved DNA fragments from all three cosmids (figure 5.1), it can be clearly seen that more DNA is present for cosmid clones ICRF112cG1487 and ICRF112cL0496 than for cosmid clone ICRF112cP0780. This would provide more target DNA from clones ICRF112cL0496 and ICRF112cG1487 for the hybridizing probe than from ICRF112cP0780, generating a stronger hybridization signal. The derived sequence of the 1.6kb Sau3AI fragment from cosmid clone ICRF112cL0496 showed no greater homology to S100A13 than the 21bp described above, when compared using BLASTN 2.0.9. No significant database entries were identified with the 1.6kb sequence either. An ORF search was performed with the 1.6kb sequence using the ORF finder program available at NCBI. One ORF was identified that included the 21 homologous bases. The translated amino acid sequence of this ORF showed no homology with protein database entries. It was therefore concluded that the 21bp present on chromosome 10q23, homologous to the S100A13 gene, does not represent a related transcribed sequence (e.g. a novel S100 gene).
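The GC content figures quoted for the two homologous stretches can be reproduced from the base counts given in the text. This is a minimal sketch; the function name is illustrative and only the counts (15 of 25 and 16 of 21) come from the text.

```python
def gc_percent(gc_bases: int, length: int) -> float:
    """GC content as a percentage of the stretch length."""
    return 100.0 * gc_bases / length

p0780_gc = gc_percent(15, 25)  # 25bp stretch matched in the P0780 cosmid
l0496_gc = gc_percent(16, 21)  # 21bp stretch matched in the L0496 cosmid

print(f"{p0780_gc:.1f}% versus {l0496_gc:.1f}%")
```

The calculation only restates the GC difference between the two stretches; it does not by itself model duplex stability, which also depends on the length of the matched stretch and on the washing conditions.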

5.4 Transcriptional orientation of other S100 genes within the EDC

5.4.1 S100A2 orientation determined from mapping data

The exon/intron structure of the S100A2 gene had been determined by sequencing 8,670bp of genomic DNA (Wicki et al, unpublished). This sequence (emb/Y0775/HSS100A2) was searched for EcoRI, NotI and SalI restriction enzyme recognition sequences. Two were identified: one EcoRI consensus sequence and one SalI consensus sequence. Figure 5.10 shows these restriction enzyme sites in relation to the exon and intron structure of the S100A2 gene. This sequence was compared to the final restriction map of the EDC contig (figure 3.12). The probe for S100A2 localized to an approximately 17kb EcoRI fragment contained within clones 148 L21 and 178 F15. Probes for S100A3 and S100A4 also localized to this fragment, in addition to the SP6 end fragment of clone 230 D1 (S100A2 negative). The 8,760bp of sequence data positions the single EcoRI consensus sequence 2,291 nucleotides upstream of S100A2 exon 1. The SalI consensus sequence lies 1,449 nucleotides further upstream from S100A2 than the EcoRI consensus sequence. The restriction map data describe a single SalI site present in clones 148 L21 and 178 F15. This SalI site divides an 8,900bp EcoRI fragment into 7,450bp and 1,500bp fragments (as sized by the Molecular Dynamics software package, FragmeNT Analysis, section 3.5). This is in good agreement with the 1,449bp SalI-EcoRI fragment determined from the 8,670bp sequence and positions this sequence precisely onto the EDC contig restriction map. This in turn establishes the transcriptional orientation of the S100A2 gene to be in concordance with S100A3, S100A4, S100A5, S100A6, and S100A13 (telomere to centromere).
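The search for recognition sequences and the comparison of the predicted SalI-EcoRI distance with the gel-sized fragment can be sketched as follows. The recognition sequences are the standard ones for EcoRI, NotI and SalI; the toy sequence and the function names are hypothetical and stand in for the genomic sequence, which is not reproduced here.

```python
import re
from typing import Dict, List

# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {"EcoRI": "GAATTC", "NotI": "GCGGCCGC", "SalI": "GTCGAC"}

def find_sites(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each recognition sequence."""
    seq = sequence.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for name, site in SITES.items()
    }

def percent_difference(predicted_bp: int, measured_bp: int) -> float:
    """Relative difference between a sequence-derived and a gel-sized fragment."""
    return 100.0 * abs(measured_bp - predicted_bp) / predicted_bp

# Hypothetical toy sequence, for illustration only.
toy = "AAGTCGACTTTTGAATTCCC"
positions = find_sites(toy)
print(positions)

# Values from the text: 1,449bp predicted from sequence, 1,500bp sized from the gel.
print(round(percent_difference(1449, 1500), 1))
```

With the values from the text, the gel-sized fragment differs from the sequence-derived distance by about 3.5%, which is the sense in which the two are described as being in good agreement.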

5.4.2 S100A12 orientation determined from mapping data

The transcriptional orientation of the S100A12 gene was also determined by using the restriction map data and previously determined sequence data (Yamamura et al, 1996). As described (section 3.5.2), the probe representing S100A12 hybridized to a 4,450bp EcoRI fragment present in both clones 127 E12 and 138 F6, as well as an internal 12,500bp EcoRI fragment of 138 F6 and the 9,200bp SP6 end EcoRI fragment of 127 E12. These data determined the SP6 end EcoRI fragment of clone 127 E12 to be overlapping with the 12,500bp EcoRI fragment of clone 138 F6. The S100A12 mRNA and 4,092bp genomic sequence (GenBank accession D83664 and D83657) define a single EcoRI site present in the third exon of the S100A12 gene (base 285 out of the 466bp mRNA). DNA primers (A12EX3F, A12EX3R) were designed to amplify sequence 3' of the EcoRI site in the third exon of the S100A12 gene. PCR was performed with these primers, using PAC clone 127 E12 as a template, amplifying 126bp of the S100A12 gene. This PCR product was purified (electroeluted), radio-labeled, and hybridized to a Southern blot containing DNA from clones 127 E12 and 138 F6. The 4,450bp EcoRI fragment present in both clones was the only EcoRI fragment to positively hybridize with the PCR probe representing the S100A12 exon 3 sequence 3' of the EcoRI site. These data, compared to the restriction map (figure 3.12), determined the transcriptional orientation of S100A12 to be in concordance with S100A2, S100A3, S100A4, S100A5, S100A6, and S100A13 (telomere to centromere).

Figure 5.10. Schematic diagram representing the positions of EcoRI and SalI restriction enzyme sites in relation to the exon/intron structure of S100A2. Top horizontal line depicts the scale in kilobases (kb) while the bottom horizontal bar represents 8.76kb of genomic sequence (Wicki et al, unpublished) and the positions of the S100A2 exons and restriction enzyme sites. Arrow indicates the 5' to 3' transcriptional orientation of the S100A2 gene.

5.5 S100 cross-species comparison: Gallus gallus

5.5.1 Published Gallus gallus S100 genes and human equivalent comparison

The sequences of Gallus gallus calcyclin (S100A6) mRNA (gb/U76365/GGU76365) and Gallus gallus calgizzarin (S100A11) mRNA (gb/U77733/GGU77733) were compared to their Homo sapiens counterparts (gb/J02763/HUMCACY and dbj/D49355/HUMS100CPl, respectively) using the BLASTN 2.0.9 program (Altschul et al, 1997) available at the NCBI web site. Gallus gallus calcyclin mRNA displayed 85% homology to human S100A6 over 278bp with no nucleotide gaps. Gallus gallus calgizzarin mRNA displayed 74% homology to human S100A11 over 261bp with no nucleotide gaps.

5.5.2 Identification of Gallus gallus S100A10

A systematic BLASTN 2.0.9 search of the non-redundant GenBank, EMBL, DDBJ, and PDB databases was performed using each human S100 mRNA sequence (A1-13) in order to identify any additional Gallus gallus sequences present within the databases. Table 5.2 lists the accession numbers for each human S100 gene used. This search identified a single Gallus gallus database entry with the human S100A10 mRNA sequence: gb/M38592/CHKCLANNII, Gallus gallus domesticus cellular ligand of annexin II (p11). The Homo sapiens cellular ligand of annexin II (p11) mRNA was also identified with the BLASTN 2.0.9 search using the Homo sapiens S100A10 mRNA database entry. The human p11 gene is synonymous with S100A10 (Schaefer et al, 1995) and therefore the Gallus gallus p11 gene represents the equivalent S100A10 gene. Gallus gallus domesticus cellular ligand of annexin II displayed 79% homology, over 284bp with no gaps, to the sequence of human S100A10 (gb/M81457/HUMCALPAlL). The database sequence entry for G.gallus p11 (gb/M38592/CHKCLANNII) was submitted by V. Gerke, unpublished. In addition to the identification of the Gallus gallus S100A10 mRNA sequence, the BLASTN 2.0.9 searches also identified the sequence of two bacterial clones containing regions homologous to S100A11. Database entries gb/AC005826/AC005826 (Homo sapiens clone UWGC:rg041a03 from 7p14-15) and gb/AC004668/AC004668 (Homo sapiens BAC clone RG2760003 from 7q22-q31.1) displayed considerable homology to the S100A11 mRNA sequence: 89% over 521bp with 13 nucleotide gaps (468/521bp and 466/521bp respectively). This is in contrast to the data presented by Moog-Lutz et al, 1995, who localized S100A11 to human chromosome 1q21 only, using in situ hybridization with radio-labeled plasmid DNA containing the S100A11 cDNA sequence.
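The relationship between the match counts and the percentages quoted from the BLASTN output can be checked with a short calculation. The sketch below assumes that the reported percentage is the truncated (not rounded) integer value of matches over aligned length; this is an assumption made here because 468/521 would otherwise round to 90%, and the function name is illustrative only.

```python
def truncated_percent_identity(matches: int, aligned_length: int) -> int:
    """Integer percent identity, truncated (assumed reporting convention)."""
    return (100 * matches) // aligned_length

# Match counts quoted in the text for the two chromosome 7 clones.
identity_7p = truncated_percent_identity(468, 521)
identity_7q = truncated_percent_identity(466, 521)

# The 24 out of 25 match quoted for the 10q23 cosmid sequence.
identity_24_of_25 = truncated_percent_identity(24, 25)

print(identity_7p, identity_7q, identity_24_of_25)
```

Both chromosome 7 alignments therefore fall at 89% under this convention, although the underlying values (about 89.8% and 89.4%) differ slightly.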

Table 5.2. Sequence accession numbers for each of the S100 genes used in a systematic BLASTN 2.0.9 (Altschul et al, 1997) search of the non-redundant databases available at the NCBI web site.

Homo sapiens S100 mRNA | Sequence accession number
A1 | ref/NM_006271.1/S100A1
A2 | gb/AF086003/HUMYU63D08
A3 | ref/NM_002960.1/S100A3
A4 | ref/NM_002961.1/S100A4
A5 | ref/NM_002962.1/S100A5
A6 | gb/J02763/HUMCACY
A7 | ref/NM_002963.1/S100A7
A8 | emb/X06234/HSMRP8
A9 | emb/X06233/HSMRP14
A10 | gb/M81457/HUMCALPAlL
A11 | dbj/D49355/HUMS100CPl
A12 | ref/NM_005621.1/S100A12
A13 | ref/NM_005979.1/S100A13

5.5.3 Chicken genomic cosmid library screening using G.gallus S100A6 and S100A10

PCR primers were designed from each of the 3' portions of the Gallus gallus S100 gene sequences and used to amplify the desired sequence from isolated Gallus gallus genomic DNA (approximately 25ng template). PCR products were ethanol precipitated and sequenced, as described in section 2.2.20, in order to verify the content of the amplified DNA fragment. Nylon membranes containing a gridded chicken cosmid library (constructed by Johannes Buitkamp, Leo Schalkwyk and Rudi Fries) were obtained in collaboration with Leo Schalkwyk (library number 125, RessourcenZentrum, Berlin, Germany). These library membranes were screened with radio-labeled probes representing the chicken S100A6 and S100A10 genes as described. It was decided not to screen the library with a probe representing the chicken S100A11 gene, due to the identification of human S100A11 homologous sequence at two genomic loci other than that at human chromosome 1q21. Six strong positive signals were seen from the screen with the probe representing chicken S100A6, while 11 strong positive signals were seen with the probe representing chicken S100A10. Four gridded membranes containing 27,648 clones each, totaling 110,592 clones, provide a genome equivalent coverage of 3.69, based on an average 40kb cosmid insert size and a genome size of 1,200Mb. Six identified positive clones for chicken S100A6 is within the limits for a single locus in the chicken genome. Eleven positive cosmids with chicken S100A10 would suggest either that the S100A10 locus is three-fold over-represented within the cosmid library or that two or three loci, or homologous loci, exist in the chicken genome. Identifying the addresses of the 17 positive cosmid clones was hampered by the poor gridding of the library membranes. Background colony signal (negative clones present on the library filters) was apparent, but the lack of a true grid delineating the positions of the microtitre plates used to stamp the membranes made scoring positive clones difficult. Grid lines were physically drawn onto exposed auto-radiography film using the provided information regarding the numbers of microtitre dishes and clones spotted onto the membranes. Only nine of the 17 cosmid clones were received from the RessourcenZentrum, Berlin (via Leo Schalkwyk).
DNA was isolated from these cosmid clones (as described) and digested individually with EcoRI and individually with BamHI restriction enzymes. The resulting restriction enzyme digested fragments were resolved with 0.8% agarose gel electrophoresis and transferred to nylon membranes as described. These nylon membranes were hybridized with the original labeled DNA probe (chicken S100A6 and S100A10). Of the nine cosmid clones analyzed, three (from the six S100A6 library screen positive clones received) displayed positively hybridizing DNA fragments with the S100A6 probe, while three (from three S100A10 library screen positive clones received) displayed positively hybridizing DNA fragments with the S100A10 probe. Two of the three S100A10 positive cosmid clones were redundant (contained identical insert DNA). Figure 5.13A shows a 0.8% agarose gel resolving the six chicken cosmid clones identified with the S100A6 library screen, and the resulting Southern blot showing the three positively hybridizing cosmids (with S100A6 as a probe). Human S100A10 and S100A11 co-localize within 20-95kb (as determined from the integrated map, figure 3.12); therefore the presence of chicken S100A11 within the identified cosmids was investigated. Southern blots of the EcoRI digested chicken cosmid DNA were screened with a labeled probe representing the chicken S100A11 gene. No positively hybridizing EcoRI fragments were seen.
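The genome equivalent coverage quoted for the library, and the expectation that a single-copy locus should yield roughly that number of positive clones, can be sketched as below. Treating the number of positives as Poisson-distributed is an assumption introduced here for illustration and is not part of the original analysis; function names are illustrative.

```python
import math

def library_coverage(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Genome equivalents represented by a library of n_clones inserts."""
    return n_clones * insert_kb / (genome_mb * 1000.0)

def poisson_tail(k_min: int, mean: float) -> float:
    """P(X >= k_min) for a Poisson variable with the given mean (assumed model)."""
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(k_min))
    return 1.0 - below

n_clones = 4 * 27648                              # four membranes
coverage = library_coverage(n_clones, 40, 1200)   # 40kb inserts, 1,200Mb genome

print(n_clones, round(coverage, 2))
print(poisson_tail(6, coverage), poisson_tail(11, coverage))
```

The expected number of positives for a single-copy locus equals the coverage (about 3.69), so six positives is moderately above expectation and eleven is roughly three times it, as stated in the text.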

5.5.4 Direct sequencing S100A6, S100A10 and comparison

Primers were designed from the chicken S100A6, chicken S100A10, and human S100A10 genes, for direct sequencing (no genomic sequence data was available for human S100A10). PAC clone 135 O6 from the integrated map of the EDC (figure 3.12) was used as template for human S100A10 sequencing, while a combination of the chicken S100A6 and S100A10 positive cosmid clones was used as template for direct sequencing of the chicken genes. Figure 5.11 displays BLASTN 2.0.9 data from the complete sequence of the chicken S100A6 locus. Unfortunately no further sequence upstream of the chicken S100A6 locus could be obtained: sequence from three separate primers terminated abruptly at the same point (see figure 5.11). Table 5.3 compares the exon-intron structure of the human and chicken S100A6 genes.

Organism | Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3
Homo sapiens | 80bp | 586bp | 159bp | 373bp | 210bp
Gallus gallus | 25bp | 534bp | 152bp | 331bp | 198bp

Table 5.3. Comparison of the exon/intron structure of the S100A6 genes in Homo sapiens and Gallus gallus.
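The exon and intron lengths in Table 5.3 can be summed to compare the overall extent of the two genes. This is a minimal sketch using only the values in the table; the dictionary layout and function name are illustrative.

```python
# Exon and intron lengths in bp, taken from Table 5.3.
S100A6_STRUCTURE = {
    "Homo sapiens": {"exons": [80, 159, 210], "introns": [586, 373]},
    "Gallus gallus": {"exons": [25, 152, 198], "introns": [534, 331]},
}

def summarize(structure: dict) -> dict:
    """Total exon, intron and combined length for one gene."""
    exon_total = sum(structure["exons"])
    intron_total = sum(structure["introns"])
    return {"exons": exon_total, "introns": intron_total, "total": exon_total + intron_total}

totals = {organism: summarize(s) for organism, s in S100A6_STRUCTURE.items()}
for organism, t in totals.items():
    print(organism, t)
```

On these figures the largest difference between the two species lies in exon 1 (80bp against 25bp), while exons 2 and 3 and both introns are of broadly similar size.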

Figure 5.12 displays the BLASTN 2.0.9 data of the sequence obtained from the chicken and human S100A10 primers. The exon structure for both published S100A10 genes was determined (as shown in figure 5.12). Intron structures for the published chicken and human S100A10 genes were not determined.

[Figure 5.11: schematic of sequencing reads from the chicken S100A6 primers against a 0.5kb to 2kb scale, with the positions of Exon 1, Exon 2 and Exon 3 marked.]

Figure 5.11. Direct sequencing of chicken cosmid DNA containing the S100A6 locus. Top horizontal bar represents the scale in kilobases (kb). Arrows delineate sequence generated from the given primers (see section 2.1.4 for primer sequence). Thick, lower, horizontal bar represents genomic sequence generated (1,777bp in total), while rectangles depict the positions of chicken S100A6 sequence (gb/U76365/GGU76365 Gallus gallus calcyclin mRNA), as determined by BLASTN 2.0.9 searches of the non-redundant databases via the blast server at the NCBI web site (http://www.ncbi.nlm.nih.gov/blast). Appendix III displays the sequence described here.

Table 5.4 displays the exon structure determined from the published human and chicken S100A10 sequence.

Organism | Exon 1 | Exon 2 | Exon 3
Homo sapiens | 93bp | 150bp | 405bp
Gallus gallus | 14bp | 148bp | 201bp

Table 5.4. Exon structure of the S100A10 genes in Homo sapiens and Gallus gallus deduced from comparison of genomic sequence to database S100A10 sequence.

During the course of running BLASTN 2.0.9 searches with the genomic sequence derived from primers of the human S100A10 gene, homology to a novel database entry was seen. Determined genomic sequence of the human S100A10 exon 2 showed 88% homology over 177bp with database entry AC005921, Homo sapiens chromosome 17 clone hRPK.294_J_22, complete sequence. The determined genomic sequence of the human S100A10 exon 3 also showed homology to this entry, but over different regions: 81% over 272bp (with four gaps) and 85% over 41bp. Figure 5.12 displays the location of this homology between these two genomic sequences (determined human S100A10 and database entry for the human chromosome 17 clone).

[Figure 5.12: schematic in three planes. Gallus gallus genomic DNA with sequencing primers CHA10R1, CHA10F, CHA10EX2F, CHA10R and CHA10EX2R; Homo sapiens genomic DNA with sequencing primers A10EX2R, S100A10R, A10EX1R, A10EX2F, A10R1 and S100A10F; regions of 88%, 81% and 85% homology to gb/AC005921.3/AC005921 (H.sapiens chromosome 17 clone); scale bar 1kb.]

Figure 5.12. Direct sequencing of PAC clone 135 O6 and chicken S100A10 positive cosmids with primers derived from S100A10 mRNA sequences and derived genomic sequence. Thick horizontal lines in three planes represent sequence derived from genomic clones. Top plane represents sequence derived from chicken S100A10 positive cosmids, middle plane represents sequence derived from PAC clone 135 O6, lower horizontal plane represents sequence identified from that of the middle plane with a BLASTN 2.0.9 search of the non-redundant databases performed via the blast server, NCBI (http://www.ncbi.nlm.nih.gov/blast). Thick black lines in this plane (lower) represent regions of homology, indicated below. Dashed lines link regions of homology. Where these dashed lines cross indicates opposing orientation of homologous sequences. Arrows represent sequence derived from the associated primer (for primer sequence see 2.1.4). Scale is in kilobases indicated at the very bottom of the figure. Dark grey rectangles represent 99-100% homology with database entries for chicken S100A10 (top horizontal plane) and human S100A10 and p11 sequences (middle horizontal plane). White rectangle represents a repetitive element (identifies >100 database entries from sequenced genomic clones) which is non-Alu (no Alu like sequences identified with a BLASTN 2.0.9 search of the Alu database). Grey vertical rectangles represent an average of 71% homology between the top and middle plane sequence. Appendix III displays the sequence data described above.

5.5.5 Analyzing the presence of other S100 genes in chicken cosmid clones identified with chicken S100A6 and S100A10

Nylon membranes containing Southern blotted chicken cosmid DNA (digested with either EcoRI or BamHI restriction enzymes) were systematically probed with labeled DNA (see materials and methods) representing the human S100 genes that were localized to the integrated map (see figure 3.12), with the exception of human S100A6, A10 and A11. It was observed that a number of the probes representing human S100 genes hybridized to vector based fragments present on the nylon membranes (either Scos-1 vector or marker DNA). Vector based labeled probe DNA sequences were therefore suppressed by hybridizing with 2µg of vector based DNA (1kb ladder, Life Technologies) prior to membrane hybridization (see materials and methods section 2.2.9). This effectively suppressed the majority of vector based DNA hybridization signal seen on the nylon membranes for the majority of probes used. Figure 5.13B displays the hybridization results for human S100A1, A2 and A4 against Southern blots of the three S100A6 positive chicken cosmids. Table 5.5 displays the results of the hybridization experiments using human S100 genes as probes against Southern blots (nylon membranes) of restriction enzyme digested chicken cosmid DNA.

Chicken S100A6 positive cosmids were mapped from the BamHI digested fingerprints. The identification of vector fragments assisted map construction. The map is based on BamHI fingerprint data and S100 probe hybridization data only; no insert hybridizations were possible due to the inability to identify a restriction enzyme that would reliably separate vector from insert. Figure 5.14 displays the BamHI restriction map of the chicken S100A6 positive cosmids (A) as well as a schematic of the two, non-redundant, chicken S100A10 cosmids (displaying EcoRI sites - B). The positions of positively hybridizing fragments with the probe used are indicated.

Figure 5.13. Agarose gels and resulting Southern blots (hybridized with the given probes) of restriction enzyme digested clones identified from a cosmid library constructed from total Gallus gallus (chicken) genomic DNA, with a probe representing Gallus gallus S100A6. In each case cosmid DNA is resolved in the presence of size markers (1kb ladder, Life Technologies) indicated in A only (by M). 1 = mpmg125N2443; 2 = mpmg125N0477; 3 = mpmg125C09121. A - 0.8% agarose gel resolving BamHI restriction enzyme digested DNA from six separate cosmid clones and the resulting Southern blot hybridized with a probe representing Gallus gallus (G.g) S100A6. B - 0.8% agarose gel resolving three identical sets of BamHI restriction enzyme digested DNA of the three G.g S100A6 positive cosmids identified in A, and the resulting Southern blots hybridized with the given probes representing Homo sapiens (H.s) S100 genes.

S100 probe used | Hybridization results against: Chicken S100A6 positive cosmids (BamHI digestion) | Hybridization results against: Chicken S100A6 positive cosmids (EcoRI digestion) | Hybridization results against: Chicken S100A10 positive cosmids (EcoRI digestion)
A1 | Strong hybridization to 6.1kb fragment in one cosmid only | Strong hybridization to a roughly 20kb fragment in one cosmid only | Weak hybridization to an 8kb fragment in one cosmid only
A2 | Weak hybridization to 5.1kb fragment, all cosmids | Weak hybridization to a roughly 25kb fragment, all cosmids | Negative
A3 | Weak hybridization to 6.1kb fragment in one cosmid only | Not tested | Not tested
A4 | Weak hybridization to 5.1kb fragment, all cosmids | Weak hybridization to a roughly 25kb fragment, all cosmids | Negative
A5 | Weak hybridization to vector based fragments and 6.1kb fragment in one cosmid only | Not tested | Not tested
A7 | Negative | Negative | Negative
A8 | Negative | Negative | Negative
A9 | Strong hybridization to vector based fragments, very weak hybridization to 3.8kb fragment in all cosmids | Not tested | Not tested
A12 | Negative | Negative | Negative
A13 | Negative | Negative | Negative

Table 5.5. Summary of hybridization results from hybridizing radio labeled probes representing human S100 genes against Southern blots of the S100A6 and S100A10 positive chicken cosmids.

[Figure 5.14, panel A. Gene labels: G.g. S100A6, H.s. S100A1, (H.s. S100A2+4), (H.s. S100A3+5). Cosmid clones: mpmgl25N2443, mpmgl25N0477, mpmgl25C09121. Scale bar: 0kb to 10kb.]

[Figure 5.14, panel B. Gene labels: (H.s. S100A1), S100A10. Cosmid clones: mpmgl25O1437, mpmgl25K10210, mpmgl25B14211. Scale bar: 0kb to 10kb.]

Figure 5.14. Partial restriction endonuclease maps of chicken cosmid DNA indicating the relative positions of fragments hybridizing positively with the given probes. In both cases (A and B) the thick upper horizontal bar shows a schematic of the genomic DNA represented by cosmid DNA. Horizontal lines (not arrows) beneath this thick bar represent cosmid DNA with clone addresses given. Vertical lines dissecting these horizontal lines depict restriction endonuclease sites (in the case of A, BamHI; in the case of B, EcoRI). Rhomboid shapes on the thick upper bar represent the relative positions of the S100 genes indicated (G.g. = Gallus gallus, H.s. = Homo sapiens), while rhomboid shapes on the lower horizontal lines indicate corresponding positions within cosmid DNA: black indicates positioning by hybridization and sequencing data for the gene given in bold text, dark grey indicates strong positive hybridization for the gene given in bold text, while white rhomboid shapes and bracketed text indicate a weaker hybridization signal. In the case of A, arrows indicate BamHI ‘bins’, where the order of DNA fragments cannot be specifically determined from the restriction mapping data given.

These data indicate that a clustering of S100 genes at the S100A6 locus, similar to that seen in mouse and human genomic DNA, is present within chicken genomic DNA. Two separate BamHI fragments, other than the chicken S100A6 locus, exhibit positive signals upon hybridization with human S100 gene probes, indicating the presence of other, orthologous chicken S100 genes. The chicken S100A10 locus did not show co-localization with the chicken S100A11 over 50kb of genomic DNA represented by two cosmid clones. However, given the limited number of cosmid clones studied, it cannot be concluded that these genes do not co-localize. The presence of sequence homologous to human S100A1 is demonstrated at the chicken S100A10 locus, suggesting an additional S100 gene in chicken when compared to human, or a different order of S100 genes around the chicken S100A10 locus to that seen in human.

Chapter 6: Discussion

6.1 Evaluation of the integrated map as a molecular resource

As described in chapter 3, a fully overlapping set of bacterial clones spanning 2.45Mb of human chromosome 1q21, and encompassing the known Epidermal Differentiation Complex, is presented (figure 3.12). All identified genes within the EDC are specifically localized on EcoRI restriction enzyme fragments within a partial EcoRI, and full NotI and SalI, restriction enzyme map. In addition, known markers and new markers have been positioned, thus facilitating the refinement of order and distances between markers and genes within the described region. These distances are in good agreement with previous, low resolution, physical mapping studies from a number of sources (see section 3.6.2). The remainder of this section evaluates the fully overlapping set of bacterial clones as a molecular resource.

6.1.1 As a substrate for large-scale sequencing

The contig of bacterial clones presented has been accepted by, and integrated into, the chromosome 1 sequencing project at the Sanger Centre, Hinxton, UK. At the time of acceptance, the map presented here was the longest set of completely contiguous bacterial clones representing a region of chromosome 1 available to the Sanger Centre sequencing project (Simon Gregory, personal communication). Of the total map length, 59.3% is covered by overlapping clones with a depth ranging from 3-9 fold, while only 15.6% is covered by single clones. Of the 15.6% single fold coverage, no one stretch is larger than 50kb (see figure 3.12) and in each case the bacterial clone providing single fold coverage is validated over more than 50% of its length by other, overlapping clones. It may, however, be necessary for the Sanger Centre to identify additional clones over the single fold coverage regions, in order to validate that these clones faithfully represent genomic DNA (McPherson, 1997). The

greater resources of this larger genome centre, such as the availability of human genomic libraries with a larger total coverage, will facilitate this. In addition, the clones that do contain single-fold coverage will themselves provide the means of screening the additional genomic libraries (such as RPCI-11, a library with a coverage of 25.3 genome equivalents; SC and WUGSC, 1998), circumventing any need for further mapping of these regions other than identifying bacterial clones and comparing fingerprints. As is standard procedure at the Sanger Centre, bacterial clones will be fingerprinted and confirmed to localize to a region by FISH before sequencing begins (SC and WUGSC, 1998). FISH has been performed with a sample of six PAC clones and one cosmid clone (see section 3.6) from the integrated map. The six PAC clones clearly localized to 1q21, providing additional confirmation of map integrity (data not shown). FISH with the single cosmid clone (ICRF112cP0780, containing S100A1 and part of S100A13) gave a strong signal with chromosomal region 1q21 but also a weaker signal with chromosomal region 1p36. This result confirmed the localization of cosmid clone ICRF112cP0780 to 1q21 (by the strong FISH signal) and also indicated the presence of chromosome 1-specific repetitive elements within this cosmid, in agreement with other studies observing the presence of such repeats along the length of chromosome 1 (see below).

6.1.2 As a resource for the characterization of chromosome 1-specific repeats implicated in cancer

Previous studies have noted that homologies exist between 1q21 and other regions within chromosome 1. Analysis of clustered repeats in human chromosome 1p36.2 led to the observation of 5 homologous regions of chromosome 1-specific repetitive sequences spread across the length of the chromosome. By performing FISH with a large number of PAC clones, the combined length of homologous sequence seen in the general regions 1p36, 1p12, 1q21 and 1q42 was estimated to be 8Mb (Versteeg et al, 1998). A YAC clone containing S100A6, isolated whilst studying 1q21, was seen to localize to 1q21 and 1p13 by FISH (Hardas et al, 1994). This group, headed by JT Elder, microdissected sequences of 1q21 and used these in FISH experiments, producing

identical results to those seen with the YAC clone. The authors suggested that these results confirm an earlier evolutionary model in which human chromosome 1 arose from the insertion of a centromere and heterochromatin into an ancestral chromosome containing chromosome-specific repetitive elements (Moseley and Seldin, 1989; Hardas et al, 1994). Further confirmation of the presence of chromosome 1-specific repetitive elements has been described, again by using a YAC clone in FISH experiments. Whilst studying the frequency of chromosomal alterations in human malignant melanomas, it was seen that chromosomal regions 1q21, 1p13 and (to a lesser degree) 1p36 were the most frequent sites of alterations observed in chromosome 1. During the course of studying a t(1;6)(q21;q14) translocation, a YAC clone was isolated that gave specific signals on 1q21, 1p13 and 1p36 (Zhang et al, 1999). The authors suggest an etiological role for these three evolutionarily conserved regions in the generation of chromosome 1 rearrangements and that physical and transcriptional mapping of the YAC clone used may provide insights into such rearrangements. The YAC clone identified by Zhang and co-workers can be found within the 6Mb YAC contig produced in the lab of Dietmar Mischke, Berlin (this YAC contig was the starting point for the construction of the bacterial map presented in this study, see section 3.3.1). The YAC clone in question, 954_E_4, has been placed at the very centromeric end of the 6Mb YAC contig (Marenholz et al, 1996) and extends telomerically to an estimated 600kb away from the centromeric end of the bacterial map presented here, and over 2.5Mb from the cosmid clone used in the FISH experiment described here (see previous section, 6.1.1). Therefore, this study identifies an additional specific region of human chromosome 1q21 (namely cosmid ICRF112cP0780, containing S100A1 and S100A13 sequence) that displays homology with another separate region of human chromosome 1.
Moreover, a high resolution physical map of this region and the region around S100A6, as described in this thesis, will form the basis for the construction of a transcriptional map, which has been suggested as a method to provide insight into frequent alterations of chromosome 1 in cancer (Zhang et al, 1999).

6.1.3 As a resource for transfection studies

As well as providing a resource for the identification of new genes (discussed in chapter 4 and below) the bacterial clones will provide the means to study the expression and regulation of genes within the EDC. By modifying PAC and BAC clones to contain selectable markers and reporter genes (Mejia and Monaco, 1997; Yang et al, 1997; Kim et al, 1998), large genomic sections containing the gene or genes of interest, including any flanking regulatory elements, will be available for transfection studies. The production of a full NotI restriction map is particularly useful for the described method of ‘retro-fitting’ selectable markers into PAC clones using restriction enzyme digestion, specifically with NotI (Mejia and Monaco, 1997). Transfection of the modified clone of interest will provide the means to study levels of expression, regulation, and tissue specificity, in a wide range of cell lines and organisms. This has been demonstrated (on a smaller scale) for many of the SPRR genes (Fischer et al, 1996 and 1998; Sark et al, 1998). In these studies, DNA constructs containing the SPRR gene of interest plus varying lengths of 5’ regulatory sequence were transfected into keratinocyte cell lines to study promoter activity. Modification of PAC clones containing a number of SPRR genes (such as 20 N18, 59 H12, and 19 K8; see figure 3.12) and subsequent transfection would enable expression studies of multiple SPRR genes and any possible locus control elements acting on these genes as a whole, as has been demonstrated with the β-globin locus (Grosveld et al, 1987).

Transgenic organisms have demonstrated that the involucrin promoter directs epidermal tissue specific expression in a differentiation-appropriate manner (Carroll et al, 1993; Crish et al, 1993). This not only provides insight into the function of involucrin itself, but also presents an epidermis specific promoter that may be used towards therapeutic gene manipulation in patients with a deficiency of an epidermis specific gene. The majority of transgenic technology using genomic clones has described the use of YACs (reviewed by Peterson et al, 1997), but examples of successful transfection of bacterial clone plasmid DNA have been described (Mullins et al, 1997; Yang et al, 1997). With the complete sequence of the EDC region, common and specific regulatory elements of existing EDC genes (and any resulting novel EDC genes) may be identified;

the PAC clones presented here will provide an accessible resource for the study of these genes and regulatory elements.

6.1.4 As a substrate for evolutionary comparisons

The generation of the complete nucleotide sequence, facilitated by the construction of the integrated bacterial map of the described region, will also enable the identification and comparison of syntenic regions in other species. The comparisons of gene order (discussed in greater detail below, section 6.5) and eventual nucleotide sequence will enable the elucidation of previously unidentified common regulatory elements (as described for the β-globin locus, Hardison et al, 1997) and may suggest evolutionary mechanisms associated with the EDC.

6.2 Exon trapping as a method of identifying transcribed sequences within the EDC

Exon trapping from three PAC clones covering approximately 170kb of DNA within the EDC successfully identified 14 putative exons, 13 of which mapped back to the PAC of origin by hybridization and/or PCR (tables 4.3 and 4.4). Of these 13, three identified entries in the available databases of nucleotide and protein sequences that were not associated with repetitive elements, via BLAST searches (Altschul et al, 1997) performed through the NCBI web site. Of these three database entries, one cDNA clone was identified that contained homologous (but not identical) nucleotide sequence to a putative exon. This cDNA clone was mapped to the region used for exon trapping by hybridization, PCR and direct sequencing. A separate putative exon, of the three that identified database entries, displayed homology to this cDNA at the translated amino acid level. The final putative exon that identified a database entry showed homology, at the nucleotide level only, to a GTP binding protein-coding gene.

Of the 13 putative exons identified, 10 display an ORF in the expected frame, while one (associated with an L1 repetitive element) displays an ORF in the opposite frame to that expected. Of the 10 identified ORFs in the expected orientation, the largest translates to 69 amino acids while the smallest translates to 25 amino acids. The average size of the 10 translated ORFs is 44 amino acids.
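The ORF criterion applied to the trapped exons can be illustrated with a minimal sketch. It assumes that a putative internal exon "displays an ORF" in a given frame when that frame contains no stop codon; the function name and the example sequence are hypothetical and are not taken from any of the trapped clones.

```python
# Minimal sketch: which forward reading frames of a putative exon are open?
# Assumption: "open" means no in-frame stop codon (TAA, TAG, TGA); an internal
# exon need not begin with ATG. The example sequence is invented.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_frames(seq):
    """Return {frame: number of complete codons} for frames 0-2 lacking a stop."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        # Split the sequence into complete codons starting at this frame offset.
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        if codons and not any(codon in STOP_CODONS for codon in codons):
            result[frame] = len(codons)
    return result

example = "GCTGAAGGCTAAACCGGT"  # hypothetical 18bp fragment
print(open_frames(example))
```

In practice the frame of interest would be fixed by the vector-derived flanking exons, so only one of the three frames counts as "expected"; the sketch simply reports all frames that remain open.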

6.2.1 Efficiency of exon trapping

No exons from the four S100 genes present within the PAC clones used as template for the exon trapping experiments were present within the three “exon trapped” libraries produced. It was shown by hybridization that sequences homologous to S100A8, A9 and A12 were present in the PCR amplified material from PAC clones used to produce

the “exon trapped” libraries. This suggested that the final cloning procedure was inefficient or that the number of clones picked to produce the “exon trapped” libraries was not sufficient to represent all exon-like sequences trapped. It was also seen that the majority of clones within one of the three “exon trapped” libraries (number 3, derived from PAC clone 138 F6) contained small lengths of DNA or failed to produce a fragment upon PCR amplification. Sequence analysis of one of these small cloned fragments identified spliced pSPL3 sequence containing a functional BstXI restriction enzyme site (cloned genomic fragments containing no exon-like sequence were presumably spliced out, leaving the two pSPL3 construct exons only). This suggested that the digestion of amplified products from the exon trapping procedure with BstXI restriction enzyme was incomplete. The cloning of this artifact DNA fragment may explain the failure to pick clones containing trapped exons from the S100 genes and the overall low percentage of identified putative exons (14.3% of all clones). It would also be expected that competition among PCR templates would favor this smaller and most abundant DNA template, thereby masking true exon sequence (Church et al, 1994). It should also be noted that direct sequencing of small fragments of the PAC clone 127 E12 (used in the exon trapping experiments, see section 4.3.1) described the exon structure of an identified novel cDNA clone (see figure 4.6). The generated sequence

defined at least 3 internal exons of the novel cDNA. This cDNA also mapped to another PAC clone used in the exon trapping experiments, clone 128 L15 (see section 4.1). None of the three described internal exons were identified by the exon trapping method, although the smallest exon does share 81% homology and is of identical size (90bp) to “exon trapped” clone 127E12sC3. It is possible that this exon (the smallest of the three) could have been trapped but not identified due to cross hybridization with clone 127E12sC3, which identified 29 other exons within the “exon trapped” libraries (see table 4.3). These 29 other exons were thought to be redundant with clone 127E12sC3 and not sequenced, although it is possible that some of these clones could have represented the exon derived from the novel cDNA clone. This hypothesis is supported by the fact that none of the other “exon trapped” clones among the 14 described identified more than 19 other clones within the “exon trapped” libraries.

The original study describing the use of the pSPL3 vector (Church et al, 1994) as an improved method of exon trapping over the original pSPL1 vector reported the identification of one exon per 20-80kb using genomic DNA template ranging in complexity from 30kb to 3Mb. In that study exon trapping was performed with 10 cosmids individually, identifying 31 unique exons, giving an average of 3.1 unique exons per cosmid, or 1 exon per 11.3kb analyzed. Exon trapping experiments performed from pools of cosmids demonstrated that, as the complexity of target increased, the efficiency of identifying unique exons decreased. A report on the 4th International Workshop on the Identification of Transcribed Sequences (Gardiner and Mural, 1995) described exon trapping of 17 PAC clones from a 750kb region, yielding 4-20 exons per clone. A separate study identifying transcribed sequences in a ‘gene-rich’ region of human chromosome 11 (11q12-q13.1) used individual exon trapping of 14 PAC clones with the pSPL3 vector system and identified an average of 2.1 unique exons per clone (Cooper et al, 1998). With an average PAC clone of 120kb this produced a figure of one exon per 50kb. The authors picked 96 “exon trapped” clones per experiment (per PAC clone) and stated that this method was effective and efficient, with the four fold reduction in the number of exons identified per PAC clone (compared to

the original cosmid data) being in line with the increased template complexity demonstrated by the work originally describing the pSPL3 vector system (Church et al, 1994).

In this thesis, a range of 2-11 unique exons was trapped per PAC clone, based on hybridization of exons to membranes containing each exon trap library (all data not shown). Over the 170kb region that has been “exon trapped”, 13 putative exons have been identified and shown to map back, giving an average of one exon per 13.1kb or 4.3 exons per clone. This figure is in line with the original work describing pSPL3-based exon trapping in cosmids, yielding one exon per 11.3kb (Church et al, 1994) and with that described in the two subsequent studies using PAC clones (Gardiner and Mural, 1995; Cooper et al, 1998), discussed above.
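The yield figures quoted above follow from simple division, which the short sketch below reproduces. All input numbers are taken from the text (170kb, 13 exons, three PAC clones; 31 exons from 10 cosmids in Church et al, 1994); the function name is hypothetical.

```python
# Recomputing the exon trapping yields quoted in the text.
# The helper name is an illustrative assumption; the inputs come from the thesis.

def kb_per_exon(region_kb, exons):
    """Average length of template (kb) analysed per unique exon identified."""
    return region_kb / exons

# This study: 13 mapped putative exons from three PAC clones covering ~170kb.
this_study_kb = kb_per_exon(170, 13)
this_study_per_clone = 13 / 3

# Church et al (1994): 31 unique exons from 10 individually trapped cosmids.
church_per_cosmid = 31 / 10

print(f"{this_study_kb:.1f}kb per exon")
print(f"{this_study_per_clone:.1f} exons per PAC clone")
print(f"{church_per_cosmid:.1f} exons per cosmid")
```

Both measures are reported because yield per clone depends on insert size, whereas yield per kb allows cosmid and PAC experiments to be compared directly.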

Therefore, it can be concluded that overall, the exon trapping strategy described here identified a number of exons from the EDC that is not out of line with previous studies of different genomic regions using this technique. However, the efficiency of the method in identifying the maximum number of exons within the region studied was hampered by incomplete BstXI digestion of the PCR amplified products derived from transfected COS7 cells. With effective optimization of the exon trapping strategy it is predicted that an even higher number of exons can be identified within the EDC.

6.3 Analysis of “exon trapped” products and a cDNA clone identified

Of the thirteen exons identified via exon trapping, ten possessed possible open reading frames in the expected orientation, and three identified database entries, other than those associated with repetitive elements (in excess of 100 entries), via BLAST searches of the available databases (clones 128L15bH15, 127E12sC3 and 127E12sB4).

6.3.1 Clone 128L15bH15

The sequence of clone 128L15bH15 showed homology at the nucleotide level to a Homo sapiens GTP binding protein gene (see table 4.3). 133 out of the 143 nucleotides of sequence were homologous, with three gaps in the exon sequence (133 of the 143 nucleotides showed positive matches over 146 nucleotides within the sequence of the

GTP binding protein gene). Based on the directional nature of exon trapping (which traps exons by the presence of splice donor and splice acceptor sites in the correct orientation) it can be determined that the homology was seen in opposite orientations. Indeed, the only open reading frame present in the exon displayed no homology to the GTP binding protein at the amino acid level, as the inferred translated protein would derive from the opposite orientation. It was therefore concluded that this exon displayed homology to the Homo sapiens GTP binding protein gene at the nucleotide level only, with a completely different possible translated protein product. RT-PCR using primers designed from the sequence of clone 128L15bH15 indicated that this exon is transcribed in all tissues tested (see table 4.4), and that two differently sized cDNA fragments are produced (approximately 150bp and 230bp as determined by gel electrophoresis; see figure 4.8). PCR amplifies the expected sized DNA fragment (approximately 150bp) from the 128 L15 PAC clone as template, as well as from fetal heart and fetal lung cDNA. Using cDNA constructed from fetal liver RNA as template, a larger product is amplified (approximately 230bp), while the smaller (150bp) fragment is amplified from the control lane, presumably from DNA contamination. Using cDNA constructed from adult skeletal muscle RNA, both sized bands are amplified, while in the control lane the smaller product is amplified, with the presence of the smaller fragment in both reactions being attributed to DNA contamination. Therefore it was concluded that two different sized transcripts are expressed from this sequence. Determining whether the larger sized transcript originates from the region the exon was trapped from will require further work, beyond the allotted time scale of this thesis.
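The nucleotide match quoted for clone 128L15bH15 at the start of this section (133 matching positions from a 143 nucleotide exon, aligned over 146 nucleotides of the GTP binding protein gene) can be expressed as a percentage in two ways. The sketch below shows both; which denominator the original search report used is not stated in the text, so neither value should be read as the reported figure.

```python
# Percent identity for the clone 128L15bH15 alignment, from counts in the text.
# Assumption: both denominators are shown because the text does not say which
# one the search output used.

matches = 133        # matching nucleotides
query_len = 143      # length of the trapped exon sequence
subject_span = 146   # nucleotides spanned in the GTP binding protein gene

identity_vs_query = 100 * matches / query_len
identity_vs_subject = 100 * matches / subject_span

print(f"{identity_vs_query:.1f}% of the exon sequence")
print(f"{identity_vs_subject:.1f}% of the aligned gene span")
```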

6.3.2 Clone 127E12sC3 and related cDNA

Clone 127E12sC3 contained two exon-like sequences spliced together (see figure 4.5). Of these, the smaller of the two sequences (90bp) displayed 81% homology to a database entry, the sequence of IMAGE cDNA clone 1676497. From direct sequencing of a small region of PAC clone 127 E12, using primers designed from the sequence of IMAGE cDNA clone 1676497, the exon structure of the cDNA was determined (see figure 4.6). This showed that the region of the cDNA clone 81% homologous to the smaller 90bp of sequence from “exon trapped” clone 127E12sC3 constituted an actual exon of equal size (90bp). This strongly indicated that clone 127E12sC3 and IMAGE clone 1676497 were highly related. No “exon trapped” clones, other than those positive with hybridization of clone 127E12sC3, were identified from the libraries when screened with a hybridization probe derived from 1676497, and no expression data for clone 127E12sC3 could be produced in the tissues examined. RT-PCR with primers derived from the sequence of IMAGE clone 1676497 indicated that this cDNA was transcribed (see table 4.4). Northern blot data showed the presence of a large (approximately 6kb) transcript in adult heart and skeletal muscle tissue and a smaller (approximately 3kb) additional transcript in adult skeletal muscle tissue. Additional signals of less intensity were seen in pancreas (approximately 1.5kb and 1kb) as well as a possible 7.5kb signal in heart, brain, placenta, lung, skeletal muscle and kidney, although the intensity of these signals was very low (see figure 4.9). No apparent transcripts were identified from Northern blot analysis in adult liver. RT-PCR using skeletal muscle cDNA as a template failed to identify any expression, suggesting that the transcripts seen with the Northern blot data in skeletal muscle were not an exact match to IMAGE clone 1676497 but instead were highly related. Two open reading frames were identified on analysis of the 846bp IMAGE clone 1676497.
The cDNA was directionally cloned and possessed a poly-A tail, therefore the orientation of the clone could be inferred. The two ORFs identified began directly at the 5’ end (inferred from the directional nature of cloning) and ended at 417bp and 202bp, respectively. No protein matches were seen from BLASTP searches of the available databases (NCBI). A BLASTX search of the database revealed homology to

three database entries over 70 translated amino acids of the 3’ end of the cDNA clone, which did not agree with the identified ORFs. It was therefore concluded that IMAGE clone 1676497 showed a degree of homology (37% identity) to three database entries at the 3’ end of the cDNA that was unlikely to be translated.

6.3.3 Clone 127E12sB4

The +2 frame ORF identified in “exon trapped” clone 127E12sB4 produced 3 matches with protein database entries upon BLASTP analysis. These database entries (detailed in section 4.5.3) were identical to those identified with the BLASTX search using the sequence of IMAGE clone 1676497. Clone 127E12sB4 displayed no homology to IMAGE clone 1676497 at the nucleotide level. When the translated amino acid sequence of the +2 ORF of clone 127E12sB4 was compared to the corresponding translated region of clone 1676497 using BLASTP 2.0.9, a match of 47% identical amino acids was seen. This result suggested that IMAGE clone 1676497 and “exon trapped” clone 127E12sB4 were related. RT-PCR data for clone 127E12sB4 were inconclusive due to DNA contamination in the RNA from which cDNA was prepared. A Northern blot was hybridized with a 32P-labeled probe representing clone 127E12sB4. Only two weak signals were seen, both in pancreas (transcripts of approximately 1kb and 1.5kb), similar to the very weak signals seen on hybridization of the same Northern blot with IMAGE clone 1676497. This provided evidence that sequences homologous or identical to 127E12sB4 were expressed in pancreatic tissue, and that these expressed sequences were also homologous to IMAGE clone 1676497. The other tissues present on the Northern blot screened with the 127E12sB4 probe (adult kidney, skeletal muscle, liver, lung, placenta, brain and heart) were all negative.

6.3.4 Evidence for a novel gene family within the EDC?

The above data could be interpreted to suggest that at least two, and possibly three, related novel transcripts have been identified and localized to the EDC. It has been demonstrated that IMAGE clone 1676497 is expressed in adult and fetal tissues, and

that a number of transcripts have been identified from Northern blot analysis. Whether or not the cDNA actually codes for a functional protein has not been demonstrated and therefore it cannot yet be classified as a gene. Likewise, with the two “exon trapped” clones 127E12sB4 and 127E12sC3, no determination as to whether these sequences represent functional exons within a gene can be made. No solid evidence as to whether these exons represent transcribed sequence can be presented, as RT-PCR failed to give a conclusive result. Clone 127E12sB4 showed weak hybridization to transcripts present on a Northern blot that were similarly identified by IMAGE clone 1676497. However, there is strong evidence that these three sequences are related. Ninety base pairs, representing a functionally trapped unit, from “exon trapped” clone 127E12sC3 are 81% homologous to an identically sized exon of IMAGE clone 1676497, suggesting these clones represent sequences derived from two highly homologous transcriptional units with respect to both nucleotide sequence and exon structure. Sequences from these two clones could represent two members of a previously unidentified gene family residing within the EDC. Clone 127E12sB4 is homologous to a stretch of sequence within the IMAGE clone 1676497 at the amino acid level, suggesting these sequences are related to some degree. The size of the exon 127E12sB4 is not related to the predicted final exon of 1676497. Therefore the evidence that these two sequences represent members of a previously unidentified gene family is based on homology at the amino acid level and the ability of these two sequences to weakly identify transcripts of similar size in pancreatic RNA, when hybridized against an identical Northern blot. In addition, all three sequences map to within 17kb of each other, indicating that they could have evolved from common sequence ancestry via local rearrangement events.
Further investigation will be needed to determine if these sequences truly represent a novel gene family. The nucleotide sequence of PAC clones 127 E12, 128 L15 and 138 F6 may help to resolve some of the above speculation. Figure 6.1 describes the relationship between all three sequences described in this section.

6.3.5 Other “exon trapped” clones

RT-PCR was performed with primers designed from “exon trapped” clones 128L15sB17 and 128L15bG14 on the templates given in table 4.4. Although both these sequences contain an ORF in the expected frame, no evidence of expression was demonstrated in the tissues investigated.

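[Figure 6.1 schematic. A: “Exon trapped” clone 127E12sC3 (scale bar 0bp to 200bp). B: Genomic organization, with the region of homology indicated. C: cDNA clone IMAGE 1676497, with poly-A tail (scale bar 0bp to 200bp). D: Translated amino acid sequence (0aa to 70aa), 46% identity. E: “Exon trapped” clone 127E12sB4 (scale bar 0bp to 200bp).]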

Figure 6.1. Homology between a novel EDC localized cDNA and two “exon trapped” clones. Schematic diagrams are organized in 5 horizontal planes (A-E). First horizontal plane (A) represents the “exon trapped” clone 127E12sC3 (thick horizontal line). Scale bar in base-pairs is above. Second horizontal plane (B) is in two parts and represents the genomic organization of 127E12sC3 (upper) and IMAGE cDNA clone 1676497 (lower, see figure 4.6 for detail). Third horizontal plane (C) represents the nucleotide sequence of IMAGE 1676497. Fourth horizontal plane (D) represents amino acid sequence translated from the nucleotide sequences given in horizontal planes C and E from the positions indicated. Fifth horizontal plane (E) represents nucleotide sequence of “exon trapped” clone 127E12sB4. Grey rectangles indicate degree of homology between nucleotide sequence in plane B and amino acid sequence in plane D, as determined from BLAST searches (Altschul et al, 1997).

6.4 Is 1q21 a gene rich region?

At the beginning of this work 27 genes had been assigned to a 2Mb region of human chromosome 1q21 designated the Epidermal Differentiation Complex, with an average gene density of 1 per 74kb. It was hypothesized that, like other genomic segments harboring multi-gene families such as the MHC, the EDC could contain a high density of hitherto unidentified genes. Over the course of this thesis independent groups have specifically localized a number of additional genes and transcripts to the EDC and neighboring regions. Whilst the integrated bacterial clone map described in chapter 3 was being constructed, an additional member of the ‘fused’ gene family (comprising profilaggrin and trichohyalin), repetin, was identified and localized to the EDC (Huber et al, 1999). A keratinocyte cDNA identified by the lab of Dietmar Mischke, Berlin, was also localized to the bacterial clone map (figure 3.12), around 300kb outside of the defined EDC (S100A10 to S100A1). Three additional keratinocyte cDNA clones were positioned within the EDC on the basis of YAC hybridization data (Mischke et al, 1998). An unidentified transcript similar to “growth arrest inducible gene product” was positioned within 100kb of S100A6 by YAC fragmentation mapping (Lioumi et al, 1998). This transcript was not found to localize to the bacterial map (data not shown) but is positioned to a YAC clone containing the S100A6 gene, indicating close proximity. In the same study, a further 3 transcripts were localized to a YAC clone containing S100A10, at a distance approximately 1Mb away from S100A10. A recent report localized a nicotinic acetylcholine receptor gene, CHRNB2, to 1q21 at a distance of 2.5Mb telomeric of the filaggrin gene (using the 6Mb YAC contig produced by Marenholz et al, 1996) (Lueders et al, 1999).
The distances determined from the integrated bacterial map presented here would place this gene roughly 850kb from the most distal end of the map, although this gene was localized to a YAC clone (950_e_2) used in the initial stages of map construction. Three cDNA clones were selected from epidermal tissues and localized to a YAC clone covering the region of the SPRR and involucrin genes (Zhao and Elder, 1997).

Therefore, an additional 6 cDNA clones have been localized within the EDC (S100A10 to S100A1) while one gene and 5 cDNA clones have been localized within neighboring regions. It is difficult to estimate gene density from YAC clone data, as is demonstrated by the cDNA localized to within 100kb of S100A6 by YAC mapping (Lioumi et al, 1998) but not localized to the bacterial map (which extends some 300kb telomeric of S100A6). Looking at the region S100A10 to S100A1 and including the single cDNA identified in this study whilst exon trapping, the density of identified transcripts is 35 over 1.8Mb or 1 per 51.4kb. This figure already exceeds that of certain regions designated ‘gene rich’ in the literature (such as a region of 11q with a density of 1 transcript per 53kb, Cooper et al, 1998), but is yet to approach the figure of one per 23kb proposed to constitute a ‘gene rich’ region (Fields et al, 1994). The data obtained from the exon trapping experiments described here indicate an above average number of unique exons identified per PAC clone (4.3 per clone) when compared to one study using exon trapping from PAC clones (2.1 per clone, Cooper et al, 1998), but this number is at the lower end of the scale for the figures reported in other studies (4-20 per clone, Gardiner and Mural, 1995). However, considering the inefficient nature of the exon trapping (described above), it is plausible that, had the percentage of artifactual DNA clones been reduced or the number of clones picked increased, more exons would have been identified. The evidence from this and the other investigations detailed indicates that the EDC is indeed a gene rich region, and it is predicted that many more genes will be discovered as more transcriptional studies are initiated and as the sequence of the bacterial clones presented here is determined.
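The density figures quoted in this section can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the input counts are simply those stated in the text (27 genes over 2Mb, 35 transcripts over 1.8Mb, 13 putative exons from three PAC clones).

```python
def kb_per_transcript(n_transcripts: int, region_kb: float) -> float:
    """Average spacing in kb: region size divided by the number of transcripts."""
    return region_kb / n_transcripts


# Counts taken from the text; names are illustrative.
initial_spacing = kb_per_transcript(27, 2000.0)  # start of this work
current_spacing = kb_per_transcript(35, 1800.0)  # S100A10 to S100A1 interval
exons_per_pac = 13 / 3                           # exon trapping yield per PAC clone

# Literature reference points quoted in the text (kb per transcript).
COOPER_11Q_SPACING = 53.0
FIELDS_GENE_RICH_SPACING = 23.0

print(f"initial: 1 per {initial_spacing:.0f}kb")
print(f"current: 1 per {current_spacing:.1f}kb")
print(f"exons per PAC clone: {exons_per_pac:.1f}")
print("denser than the 11q region:", current_spacing < COOPER_11Q_SPACING)
print("reaches the 1 per 23kb figure:", current_spacing <= FIELDS_GENE_RICH_SPACING)
```

A smaller spacing means a higher density, which is why the comparison against the 53kb and 23kb figures uses "less than".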

6.5 The role of evolutionary comparisons in understanding the EDC

6.5.1 Transcriptional orientation of the S100 genes as an indicator of the evolutionary processes defining the organization of this multi-gene family

The transcriptional orientation of 4 of the S100 gene family members within the EDC was determined by comparing the restriction map of DNA fragments from the overlapping bacterial clones presented in chapter 3 with direct sequencing data (Wicki et al, unpublished; Yamamura et al, 1996). Figure 6.2C displays these orientations with respect to the previously determined orientations of S100A3, 4, 5 and 6 (Englekamp et al, 1993). These data alone suggest an ancestral inversion between S100A13 and S100A1, juxtaposing S100A1 in relation to the other S100 genes. Evidence supporting this ancestral inversion comes from the high number of alternatively spliced 5’ untranslated regions from S100A13 (see chapter 5, sections 5.3.3 - 5.3.8). At least 8 alternatively spliced 5’ untranslated regions of the S100A13 gene were identified by directly comparing RT-PCR sequencing and EST database entries (see figures 5.6, 5.7 and 5.8). Numerous additional exon sequences were identified from cDNA clone database entries that were not confirmed by RT-PCR (figure 5.5). Northern blot data presented in the original paper describing S100A13 (Wicki et al, 1996b) suggest that S100A13 is expressed in most tissues, except possibly leukocytes, with higher levels of expression in heart, skeletal muscle, kidney and pancreas. Only a small portion of the Northern blot is shown (around 500bp), so it is difficult to tell whether all lengths of transcripts described here are present in all tissues. The smear of positive signal extends beyond the displayed window of 0.5kb in skeletal muscle, heart and possibly small intestine. The majority of signal in each case is around 0.5kb, suggesting that the majority of transcripts do not contain extensive 5’ untranslated regions.

The pattern of alternatively spliced initial exons has been attributed to alternative promoters acting on transcription initiation (Wang et al, 1997; Cramer et al, 1997). The understanding of splice site selection is still not complete, but it is believed to involve a combination of inhibiting and activating factors that compete to specifically block or enhance different splice sites. The two main functions of alternative splicing are, firstly, to provide an on/off switch by altering the translation stop codon in order to produce a non-functional protein and, secondly, to produce multiple protein isoforms (Wang et al, 1997). In the case described here, the S100A13 alternatively spliced exons seem to be in the 5’ untranslated region, thus having no effect on the translated protein product. Given the ancestral inversion suggested by the described transcriptional orientation, it is proposed here that this inversion could have either destroyed or altered the ancestral promoter of S100A13. An inversion breakpoint altering cis-acting elements (promoter, enhancer, or silencing elements) and affecting the transcriptional initiation of S100A13 could account for the alternatively spliced products. Additionally, an inversion breakpoint could have juxtaposed cis-acting elements that were ancestrally associated with another gene within the EDC, therefore producing the unusual alternative splicing described. A look at the BLASTN 2.0.9 data for another sample S100 gene, S100A12, reveals no apparent alternative splicing. The 466bp mRNA sequence of S100A12 identifies 10 EST database entries with 100% homology and a further 8 EST entries with greater than 98% homology. Comparing the sequence lengths of these EST database entries and the homology seen with the S100A12 mRNA shows no apparent additional 3’ or 5’ sequences, or any additional or missing sequences across the length of the EST database entries.

Figure 6.2. Organization of the S100 gene clusters in mouse and human, suggested inversion events, and the first identification of an S100 gene cluster in chicken (next page). In each case (A, B, C, and D) thick horizontal lines represent genomic DNA. Rectangles represent the relative positions of S100 genes. Scale bar (below D) is in kilobases (kb) (gene width not to scale). A: Organization of the mouse S100 gene cluster (as determined by Ridinger et al, 1999). B: Suggested positions of two inversion breakpoints to explain the rearrangement of the S100 gene order seen from comparison of human and mouse clusters (Ridinger et al, 1999). Vertical arrows describe the positions of breakpoints while horizontal arrows describe the inverted genomic DNA. C: Specific order of the S100 gene cluster in human as determined from the integrated bacterial contig presented in chapter 3. Horizontal arrows describe the transcriptional orientation of S100 genes indicated (S100A1, A2, A3, A4, A5, A6, and A12). D: Chicken genomic segments harboring S100A6 and S100A10 genes (dark rectangles) in relation to identified sequences homologous to human S100 genes (light grey = strong hybridization signal, and white rectangles = weak hybridization signal). H.s. = Homo sapiens. Horizontal arrow above S100A6 and H.s. S100A2+4 describes the two possible orientations of these sequences in relation to H.s. S100A1.

Figure 6.2. See previous page for legend. Panels: A, mouse; B, inversion 1 and inversion 2; C, human; D, chicken S100A10 and chicken S100A6. Scale bar: 200kb.

6.5.2 Organization of the mouse S100 gene family located on a genomic region syntenic with the human EDC

Evidence of an ancestral breakpoint between S100A1 and S100A13 is further supported by the characterization of a YAC clone containing 8 identified mouse S100 genes (Ridinger et al, 1999). This YAC clone shows a clustering of S100 genes similar to that seen in the human EDC, but in a different order. Figure 6.2A displays the order determined by Ridinger and co-workers. The authors suggest that the re-arranged order could have resulted from inversion breakpoints between S100A1 and S100A13, as well as between S100A7 and S100A8. The S100A1 to S100A13 breakpoint is in agreement with the data presented here. The proposed breakpoint between S100A7 and S100A8 is in general agreement with the comparisons of human and mouse orders, although no mouse S100A7 gene has been identified or localized to the YAC clone described by Ridinger and co-workers; therefore whether this gene is a) present in the YAC clone or b) juxtaposed with S100A8 and S100A9 could not be determined. Figure 6.2B displays the proposed inversion breakpoint events required to arrive at the S100 gene orders of the two species.
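The effect of an inversion between two breakpoints on gene order and transcriptional orientation can be sketched as a simple list operation. This is a minimal illustration, not a reconstruction of the events proposed by Ridinger and co-workers: the gene names and the starting order below are hypothetical placeholders, and the function name is an assumption introduced here.

```python
from typing import List, Tuple

# A gene is (name, strand); strand is +1 or -1 for the two transcriptional orientations.
Gene = Tuple[str, int]


def invert(order: List[Gene], start: int, end: int) -> List[Gene]:
    """Invert the segment order[start:end] lying between two breakpoints.

    The genes in the segment appear in reverse order and each one
    changes transcriptional orientation; genes outside are unchanged.
    """
    segment = [(name, -strand) for name, strand in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


# Hypothetical four-gene cluster, all transcribed in the same direction.
ancestral = [("geneA", 1), ("geneB", 1), ("geneC", 1), ("geneD", 1)]

# Breakpoints fall before geneB and after geneC.
derived = invert(ancestral, 1, 3)
print(derived)
```

Applying the same inversion a second time restores the starting order, which is why a comparison of two present-day orders alone cannot say in which lineage the event occurred.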

The evidence presented in this thesis supports the hypothesis that at some point in the phylogenetic lineage leading to the human genome, an inversion breakpoint occurred between S100A1 and S100A13. It is hypothesized here, on the basis of the alternative splicing, that this is likely to be a recent evolutionary event. Unless some evolutionary constraint required the extensive alternative splicing, it can be assumed that cis-acting elements upstream of the ATG codon involved in this alternative splicing would not be positively selected for. It is reasonable to presume that, over the course of evolutionary time, cis-acting elements that are not required for the function of the S100A13 gene would be lost through random drift.

It is not clear whether the order of S100 genes in the mouse represents an organization closer to that of a mammalian ancestral genome. The mouse genome is known to have re-arranged considerably during evolution, and it has even been proposed that the mouse does not provide the simplest model for human genome comparison because of this (Nizetic et al, 1987; Graves, 1996; Carver and Stubbs, 1997). Indeed, the entire mouse EDC is organized in a similar order but with opposing orientation. In human, the interval from profilaggrin to S100A6 is in a centromere to telomere orientation on chromosome 1q21 (see figure 3.1 or 3.12), while the mouse EDC on chromosome 3 is ordered in the opposite direction: the interval from profilaggrin to S100A6 is in a telomere to centromere orientation (Lenders et al, 1999). Therefore it is clear that analysis of other mammals and vertebrates will provide useful information regarding the evolution of the S100 cluster and the EDC as a whole.

6.5.3 Implications on the possible presence of locus control regions

The identification of a recent inversion event in the human S100 gene cluster on 1q21 has implications for the widely speculated possibility of a locus control region (LCR) acting on the EDC as a whole (Mischke et al, 1996; Hardas et al, 1996; Zhao and Elder, 1997; see section 1.3.3). The observation that the β-globin LCR exhibits directionality (Tanimoto et al, 1999) suggests that were a gene under the control of an LCR to be inverted, this control would be lost. Therefore, it is assumed that an LCR would not be essential for S100 gene expression and therefore would not be present. In retrospect this is not a surprising conclusion, as many of the S100 genes exhibit a diverse pattern of expression (Zimmer et al, 1995; see also section 1.3.1.2). It is more likely that genes centromeric of S100A9 (see figure 3.12) could be under the control of an epidermal LCR, given that many of these genes have been identified, or implicated, in the formation of cornified envelopes (including S100A10 and A11, Robinson et al, 1997).

6.5.4 The first identification of an S100 gene cluster within the chicken genome

Chicken genomic clones (cosmids) were isolated that were shown to contain the orthologous S100A6 and S100A10 genes (see sections 5.5.3 and 5.5.4). Direct sequencing determined the complete exon and intron structure of the chicken S100A6 gene. The substantial compression seen within the genes of the pufferfish Fugu rubripes is due for the most part to compacted intron size (Elgar, 1996). It has been suggested that certain regions of the chicken genome (microchromosomes) could display similar compression (Hughes and Hughes, 1995; McQueen et al, 1998). A comparison of the intron sizes between human and chicken S100A6 (table 5.3) reveals an 8.9% reduction of intron 1 and an 11.3% reduction of intron 2 from human to chicken. The overall chicken S100A6 mRNA sequence is 16.5% smaller than in the human, although no attempt has been made to characterize the 5’ end of the chicken gene where the majority of size reduction is seen (exon 1 in human is 80bp, while in chicken this is 25bp). The finding that the intronic sequences of the chicken S100A6 gene are smaller than the human S100A6 gene sequences is in line with the observation that the majority of chicken gene introns (as opposed to exons) are smaller (Hughes and Hughes, 1995). S100 genes may not be the best candidates for demonstrating compression, as they are small. In particular the region around S100A6 in human is gene rich, GC rich, and compact (4 genes within 15kb, GC content approaching 56%). It has been noted that compression between Fugu and human genomes is more evident in regions of the human genome that are gene and GC poor (Gardiner, 1997). Direct sequencing of S100A10 in both human and chicken (exon structure for the human gene had not been previously described) was unsuccessful in determining the intron structure, as the introns were seen to be larger than any sequence generated (figure 5.12). However, the exon structures of the electronically available S100A10 sequences were determined. The electronically available chicken S100A10 mRNA sequence is considerably smaller than the human equivalent (44%). Again, whether this is due to compression or to the fact that the full-length gene has not been cloned cannot be established. Looking at the comparisons in exon size (table 5.4), the first and third exons of the chicken S100A10 gene are considerably smaller, while the second exon (148bp) is very similar to the human S100A10 second exon (150bp). This could suggest that the 3’ and 5’ ends of the chicken S100A10 gene have not been cloned.
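The size comparisons above are percentage reductions from the human to the chicken sequence. The sketch below shows the calculation for the two exon pairs whose lengths are stated in the text; the function name is an assumption introduced here, and the intron values from tables 5.3 and 5.4 are not reproduced because they are not given in this section.

```python
def percent_reduction(human_bp: int, chicken_bp: int) -> float:
    """Size reduction from human to chicken, as a percentage of the human length."""
    return 100.0 * (human_bp - chicken_bp) / human_bp


# Lengths quoted in the text.
s100a6_exon1 = percent_reduction(80, 25)     # S100A6 exon 1: 80bp vs 25bp
s100a10_exon2 = percent_reduction(150, 148)  # S100A10 exon 2: 150bp vs 148bp

print(f"S100A6 exon 1 reduction: {s100a6_exon1:.2f}%")
print(f"S100A10 exon 2 reduction: {s100a10_exon2:.2f}%")
```

The contrast between a large reduction in a terminal exon and a near-identical internal exon is the pattern the text interprets as possibly incomplete cloning of the gene ends rather than compression.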

Chicken cosmids containing S100A6 and S100A10 were restriction mapped with BamHI and EcoRI respectively. Hybridization experiments were performed using probes representing all known human S100 genes co-localizing to human chromosome 1, except in the cases of S100A6, S100A10 and S100A11. Chicken S100A11 was used to screen the chicken cosmids investigated; no positive hybridization signal was seen. Five cross-hybridizing signals were observed with human S100 gene probes. Human S100A1 revealed a strong positive hybridization signal on one chicken S100A6 positive cosmid and a weaker hybridization signal on one chicken S100A10 positive cosmid. Human S100 genes A2, A3, A4 and A5 produced a weak hybridization with chicken S100A6 positive cosmids, comparable to the weak signal given with the S100A1 probe to one chicken S100A10 positive cosmid. The strong hybridization signal to a BamHI restriction enzyme fragment from the human S100A1 probe strongly suggests that this sequence represents an orthologous chicken S100A1 gene. The weaker hybridization signal with a unique EcoRI fragment in a single S100A10 positive cosmid suggests the presence of a novel chicken S100 gene, significantly different to chicken S100A10, S100A11 (shown not to localize), or S100A1. Human S100A2 and S100A4 probes hybridized to a distinct BamHI restriction enzyme fragment in all three S100A6 positive cosmids, suggesting the presence of at least one more novel chicken S100 gene. Human S100A3 and S100A5 hybridized to the identical BamHI restriction enzyme fragment that was identified by a strong positive signal with human S100A1. In this case, it is known that S100A3 and S100A5 are more homologous to S100A1 than S100A2 and S100A4 are (Schaefer et al, 1995). Therefore, whether these weaker hybridization signals are a result of cross-hybridization with the presumed orthologous chicken S100A1 gene, or with another novel chicken S100 gene or genes, could not be determined with the experiments performed within the allotted time scale of this thesis. A summary of the results of cross hybridization from probes representing human S100 genes with chicken genomic cosmids is displayed in figure 6.2D. From these data, it can be concluded that a clustering of S100 genes similar to that present at the human locus 1q21 is seen in chicken genomic DNA. The precise number and order of genes is yet to be determined, but chicken S100A6 and sequence highly homologous to human S100A1 both co-localize to a genomic segment of approximately 30kb. The distance between these two genes represents a 3-fold compression relative to that seen in human and a 10-fold compression relative to that seen in mouse genomic DNA. This could be a result of either additional genomic rearrangement between all three species or an example of compression within the chicken genome. It is of interest to note that no cross hybridization was observed with the human S100A13 gene in the identified chicken cosmids.
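The cross-hybridization results described above can be tabulated and grouped by restriction fragment to show how many distinct candidate chicken S100 sequences the data imply. This is an illustrative sketch only: the fragment labels ("BamHI-1", "BamHI-2", "EcoRI-1") and the function name are hypothetical names introduced here, and the rows simply restate the signals reported in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (human probe, chicken cosmid group, fragment label, signal strength)
# Fragment labels are hypothetical placeholders, not names used in the thesis.
signals = [
    ("S100A1", "S100A6", "BamHI-1", "strong"),
    ("S100A3", "S100A6", "BamHI-1", "weak"),
    ("S100A5", "S100A6", "BamHI-1", "weak"),
    ("S100A2", "S100A6", "BamHI-2", "weak"),
    ("S100A4", "S100A6", "BamHI-2", "weak"),
    ("S100A1", "S100A10", "EcoRI-1", "weak"),
]


def probes_by_fragment(rows) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Group probe signals by (cosmid group, fragment) in input order."""
    grouped = defaultdict(list)
    for probe, cosmid, fragment, strength in rows:
        grouped[(cosmid, fragment)].append((probe, strength))
    return dict(grouped)


fragments = probes_by_fragment(signals)
for (cosmid, fragment), hits in fragments.items():
    print(cosmid, fragment, hits)
```

Three fragments carry signal: one with the strong S100A1 signal, one distinct fragment on the S100A6 cosmids, and one unique fragment on an S100A10 cosmid, which matches the three candidate sequences discussed in the text.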

6.5.5. The existence of a non-mammalian EDC?

S100 genes have been identified in a number of vertebrates, including Xenopus (Zimmer et al, 1996). It is widely believed that the intermediate filament association gene family arose from fusion of an S100-like gene with a cornified envelope precursor-like gene (of the SPRR, loricrin, involucrin family) (Markova et al, 1993; Lee et al, 1993; Krieg et al, 1997). Therefore, a co-localization of S100-like and CE precursor-like genes would be needed in a primordial gene cluster, prior to the divergence of mammals, in order for such a gene family to evolve. The role of profilaggrin is highly specialized and essential for the required shift in cytoskeletal structure during the later stages of keratinization (see section 1.2.2). Although not yet demonstrated, it seems plausible that a common ancestor of the intermediate filament association gene family would be present in the majority of organisms possessing keratinizing epidermal tissues. It has been noted recently that granular deposits, much like keratohyalin granules or L-granules (see section 1.2.2.1), have been observed in the epidermis of reptiles (Alibardi and Thompson, 1999; Alibardi, 1999). It has also been noted that cornified envelope-like structures exist in birds (Peltonen et al, 1998). It is argued here that a non-mammalian EDC is likely to exist in other vertebrates. As has been previously described, the elucidation of the structure of an EDC in organisms other than human would greatly advance current understanding of function and evolution. It is proposed that, due to the conserved nature of the S100 genes, a search for these genes and the demonstration of their co-localization within other species, as has been demonstrated in this thesis, would be the logical first step towards answering the question of whether a non-mammalian EDC exists. The identification of genomic clones containing an S100 gene cluster similar to that seen in the human and mouse EDC would be the starting point for the possible identification of other EDC genes.

6.6 Further work

This thesis describes the production of an integrated high-resolution map of completely contiguous bacterial clones covering the human EDC. This map will be the platform for the identification of all genes, and the study of their expression, regulation and evolution. Preliminary studies have been initiated that demonstrate the use of exon trapping from the bacterial clones presented here as a means of identifying transcribed sequences, and that demonstrate the existence of an S100 cluster in a non-mammalian vertebrate similar to that seen in human and mouse. The extension of these studies goes beyond the allotted time scale of this thesis and a number of key areas of further work are described below.

6.6.1 Further characterization of (potential) transcripts identified by exon trapping.

Thirteen putative exons and one cDNA clone were identified in chapter 4. It was demonstrated that the cDNA clone is transcribed, recognises related transcripts on a Northern blot, and is related to two independent exons identified via exon trapping, which map to within 17kb of each other. It has been speculated that this cDNA and these exons constitute a novel gene family within the EDC. Further experiments are needed to:
a) Determine expression (if any) in a range of tissues using RT-PCR and Northern blot analysis; of particular importance would be epidermal tissues.
b) Clone the full length of any transcripts determined using 3’ and 5’ rapid amplification of cDNA ends (RACE, Frohman, 1993). In the case of the identified cDNA clone, which detected large transcripts (approx. 6kb) in adult skeletal muscle and heart, it is of interest to note that long genes are scarce in GC-rich isochores (Duret et al, 1995). Coupled with the fact that these long transcripts are probably related (due to negative RT-PCR results in skeletal muscle with primers that amplify a portion of the cDNA) and that the EDC is located within a GC rich region (Englekamp et al, 1993; Zhao and Elder, 1997), it may be found that the full length transcript is of smaller size.

6.6.2 Systematic, optimized exon trapping and cDNA selection of bacterial clones presented in the integrated map

Exon trapping described in chapter 4 demonstrated the ability of this technique to identify transcripts. A global approach to exon trapping of the entire bacterial contig will contribute to defining all genes within the EDC. Parallel cDNA selection of primarily epidermal-derived tissues will also contribute to the identification of genes that are functionally related to other EDC members. The S100 gene family demonstrates a wide range of expression, and therefore it would also be prudent to investigate tissues unrelated to epidermal differentiation.

6.6.3. Analysis of long range genomic sequence generated by the Sanger Centre

As the sequence of the bacterial clones within the contig is generated and annotated using gene-prediction programs, more genes will be identified. In the case of the identified cDNA and the related exons, sequence analysis may elucidate further exons homologous to the cDNA clone that may enlarge possible transcriptional units identified from exon trapping.

6.6.4. Further analysis of the chicken S100 cluster

Further analysis of the chicken S100 cluster may contribute to identifying orthologues of other EDC related genes. The next two steps would be to demonstrate linkage between the S100A6 and S100A10 clusters by either genetic mapping or long range restriction mapping, and to characterize the possible orthologous S100 genes identified in chapter 5 by degenerate PCR from the cosmid clones isolated. Linkage between the S100A10 and S100A6 chicken clusters may define the avian EDC. Bacterial clones covering this genomic region could be used for the detection of orthologous EDC genes of all three so-far-identified families and any novel transcripts determined.

Summary of conclusions

1. An integrated partial EcoRI and full NotI and SalI restriction map of 82 completely contiguous bacterial clones encompassing the entire Epidermal Differentiation Complex (EDC) is presented. All 28 so-far-identified genes, 8 DNA markers, and one novel keratinocyte cDNA clone are specifically localized to EcoRI restriction fragments. Accurate distances between genes and markers, which correlate with previous low resolution mapping studies of the region, are described.

2. By the use of exon trapping, 13 putative exons and one novel, transcribed cDNA clone have been isolated from the EDC. Three of the 13 putative exons are significantly homologous to database entries, as determined by BLAST search programs. Of these three exons, one is 81% homologous to a novel cDNA clone at the nucleotide level, while another is 46% homologous to this same cDNA at the amino acid level. Genomic sequencing has determined that the region of the novel cDNA showing nucleotide homology to one of the two exons constitutes an exon itself. This and other evidence is presented that suggests the existence of a novel family of related transcripts within the EDC.

3. Data supporting a recent inversion event within the S100 gene cluster during the evolution of mammals are presented. Eight different transcribed sequences of the S100A13 gene have been described that show alternatively spliced 5’ untranslated regions. Eight additional, homologous database entries suggest that the alternative splicing demonstrated here is extensive and consists of over 15 transcripts. A species comparison of S100 genes in Homo sapiens and Gallus gallus has identified novel sequences homologous to Homo sapiens S100A1, A2, A3, A4, and A5 within the Gallus gallus genome. These novel sequences co-localize with Gallus gallus S100A6 and S100A10. This is the first identified clustering of S100 genes within the Gallus gallus genome, similar to that seen in the EDC of Homo sapiens and Mus musculus.

References

Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., and Venter, J.C. (1991) Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 252: 1651-1656

Adams, M.D., Dubnick, M., Kerlavage, A.R., Moreno, R., Kelley, J.M., Utterback, T.R., Nagle, J., Fields, C., and Venter, J.C. (1992) Sequence identification of 2,375 human brain genes. Nature 355: 632-642

Adams, M.D., Kerlavage, A.R., Fleischmann, R.D., Fulder, R.A. and 81 others. (1995) Initial Assessment of Human Gene Diversity and Expression Patterns Based Upon 83 Million Nucleotides of cDNA Sequence. Nature 377: 3-17

Aeschlimann, D., Koreller, M.K., Allen-Hoffmann, B.L., and Mosher, D.F. (1998) Isolation of a cDNA Encoding a Novel Member of the Transglutaminase Gene Family from Human Keratinocytes. J. Biol. Chem. 273: (6) 3452-3460.

Akaiyama, M., Christiano, A.M., Yoneda, K., and Shimizu, H. (1998) Abnormal Cornified Cell Envelope Formation in Mutilating Palmoplantar Keratoderma Unrelated to Epidermal Differentiation Complex. J. Invest. Dermatol. 111: (1) 133-138

Alibardi, L. (1999) Formation of large micro-ornamentations in developing scales of agamine lizards. J. Morphol. 240: (3) 251-266

Alibardi, L., Thompson, M.B. (1999) Epidermal differentiation in the developing scales of embryos of the Australian scincid lizard Lampropholis guichenoti. J. Morphol. 241: (2) 139-152

Anderson, J.L., Khan, M., David, W.S., Mahdavi, Z., Nuttall, F.Q., Krech, E., West, S.G., Vance, J.M., Pericak Vance, M.A., and Nance, M.A. (1999) Confirmation of linkage of hereditary partial lipodystrophy to chromosome 1q21-22. Am. J. Med. Genet. 82: 161-165

Ansari-Lari, M.A., Oeltjen, J.C., Schwartz, S., Zhang, Z., Muzny, D.M., Lu, J., Gorrell, J.H., Chinault, A.C., Belmont, J.W., Miller, W., and Gibbs, R.A. (1998) Comparative Sequence Analysis of a Gene-Rich Cluster at Human Chromosome 12p13 and its Syntenic Region in Mouse Chromosome 6. Genome Res. 8: 29-40.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 25: (17) 3389-3402.

Anand, R., Villasante, A., and Tyler-Smith, C. (1989). Construction Of Yeast Artificial Chromosome Libraries With Large Inserts Using Fractionation By Pulse-Field Gel Electrophoresis. Nucleic Acids Res. 17: 3425-3433.

Antequera, F., and Bird, A. (1993) Number of CpG Islands and Genes in Human and Mouse. Proc. Natl. Acad. Sci. USA 90: 11995-11999

Armstrong, D.K.B., McKenna, K.E., and Hughes, A.E. (1998) A novel insertional mutation in loricrin in Vohwinkel's keratoderma. J. Invest. Dermatol. 111: 702-704

Backendorf, C., and Hohl, D. (1992) A Common Origin for Cornified Envelope Proteins? Nature Genetics 2: 91

Baker, A. and Cotten, M. (1997) Delivery of bacterial artificial chromosomes into mammalian cells with psoralen-inactivated adenovirus carrier. Nucleic Acids Res. 25: 1950-1956

Bale, S.J., and Doyle, S.Z. (1994) The genetics of Ichthyosis: A primer for epidemiologists. J. Invest. Dermatol. 102: 49S-50S

Bankier, A.T., Weston, K.M., and Barrell, B.G. (1987) Random Cloning and Sequencing by the M13/Dideoxynucleotide Chain Termination Method. Methods Enzymol. 155: 51-53.

Banks, E.B., Crish, J.F., and Eckert, R.L. (1999) Transcription Factor Sp1 Involucrin Promoter Activity in Non-Epithelial Cell Types. Biochem. J. 337: 507-512

Barker, J.N.W.N. (1991) The pathophysiology of Psoriasis. Lancet 338: (8761) 227-230

Barlow, D.P., and Lehrach, H. (1987) Genetics by gel electrophoresis: the impact of pulsed field gel electrophoresis on mammalian genetics. Trends Genet. 3: (6) 167-168

Baxendale, S., Bates, G.P., MacDonald, M.E., Gusella, J.F., and Lehrach, H. (1991) The direct screening of cosmid libraries with YAC clones. Nucleic Acids Res. 19: 6651.

Bento Soares, M., de Fatima Bonaldo, M., Jelene, P., Su, L., Lawton, L., and Efstratiadis, A. (1994) Construction and Characterization of a Normalized cDNA Library. Proc. Natl. Acad. Sci. USA 91: 9228-9232.

Berry, R., Stevens, T.J., Walter, N.A.R., Wilcox, A.S., Rubano, T., Hopkins, J.A., Weber, J., Goold, R., Bento Soares, M., and Sikela, J.M. (1995) Gene-based Sequence-tagged-sites (STSs) as the Basis for a Human Genome Map. Nature Genetics 10: 415-423.

Bird, A.P. (1986) CpG-rich islands and the function of DNA methylation. Nature 321: 209-213

Birnboim, H.C., and Doly, J. (1979) A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7: 1513-1523

Bloom, S.E., Delaney, M.E., and Muscarella, D.E. (1993) Constant and variable features of avian chromosomes. In: Etches, R.L., and Gibbons, A.M.V. (Eds) Manipulation of the avian genome. pp39-59. CRC Press, Boca Raton, FL.

Blumenberg, M. (1993) Molecular Biology of human keratin genes. In: Darmon, M., and Blumenberg, M. (Eds) Molecular Biology of the skin. pp1-32. Academic Press, San Diego.

Bonifer, C., Vidal, M., Grosveld, F., and Sippel, A.E. (1990) Tissue specific and position independent expression of the complete gene domain for the chicken lysozyme in transgenic mice. EMBO J. 9: 2843-2848

Brash, D.E. (1997) Sunlight and the onset of skin cancer. Trends Genet. 13: (10) 410-414.

Brenner S., Elgar, G., Sanford, R., Macrae, A., Venkatesh, B., and Aparicio, S. (1993) Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366: 265-268

Britten, R.J. (1997) Mobile Elements Inserted in the Distant Past Have Taken on Important Functions. Gene 205: 177-182.

Buckler, A.J., Chang, D.D., Graw, S.L., Brook, D., Haber, D.A., Sharp, P.A., and Housman, D.E. (1991) Exon Amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc. Natl. Acad. Sci. USA 88: 4005-4009

Bulger, M., Von Doorninck, J.H., Saitoh, N., Telling, A., Farrell, C., Bender, M.A., Felsenfeld, G., Axel, R., and Groudine, M. (1999) Conservation of Sequence and Structure Flanking the Mouse and Human β-globin Loci: The β-globin Genes are Embedded Within an Array of Odorant Receptor Genes. Proc. Natl. Acad. Sci. USA 96: 5129-5134.

Bumstead, N., and Palyga, J. (1992) A preliminary linkage map of the chicken genome. Genomics 13: 690-697

Burge, C. and Karlin, S. (1997) Prediction of Complete Gene Structures in Human Genomic DNA. J. Mol. Biol. 268: 78-94.

Burke, D.T., Carle, G.F., and Olson, M.V. (1987) Cloning Large Segments of Exogenous DNA into Yeast by Means of Artificial Chromosome Vectors. Science, 236: 806-812

Burt, D.W., Bumstead, N., Bitgood, J.J., Ponce de Leon, F.A. and Crittenden, L.B. (1995) Chicken genome mapping: a new era in avian genetics. Trends Genet. 11: 190-194

Burtis, K.C., and Hawley, R.S. (1999) The millennium flies in. Nature 401: 125-126

Candi, E., Melino, G., Lahm, A., Ceci, R., Rossi, A., Kim, I.G., Ciani, B., and Steinert, P.M. (1998) Transglutaminase 1 Mutations in Lamellar Ichthyosis. J. Biol. Chem. 273: (22) 13693-13702

Carroll, M.C., Campbell, R.D., Bentley, D.R., and Porter, R.R. (1984) A Molecular Map of the Human Major Histocompatibility Complex Class III Region Linking Complement Genes C4, C2 and Factor B. Nature, 307: 237-241.

Carroll, J.M., Albers, K.M., Garlick, J.A., Harrington, R., and Taichman, L.B. (1993) Tissue-specific and stratum-specific expression of the human involucrin promoter in transgenic mice. Proc. Nat. Acad. Sci. USA 90: (21) 10270-10274

Carson, S., and Wiles, M.V. (1993) Far Upstream Regions of Class II MHC Ea are Necessary for Position-Independent, Copy-Dependent Expression of Ea Transgene. Nucleic Acids Res. 21: 2065-2072.

Cartlidge, P.H.T., and Rutter, N. (1992) Skin barrier function. In: Polin, R.A. and Fox, W.W. (eds) Fetal and Neonatal Physiology. Philadelphia: Saunders, W.B. pp569-585.

Carver, E.A., and Stubbs, L. (1997) Zooming in on the Human-Mouse Comparative Map: Genome Conservation Re-examined on a High Resolution Scale. Genome Res. 7: 1123-1137.

Cheng, J.F., Boyartchuk, V., Zhu, Y. (1994) Isolation and mapping of human chromosome 21 cDNA: Progress in constructing a chromosome 21 expression map. Genomics 23: 75-85

Christiano, A.M. (1997) Frontiers in Keratodermas: Pushing the Envelope. Trends Genet. 13: (6) 227-233.

Chumakov, I., Rigault, P., Guillou, S., and others. (1992a) Continuum of overlapping clones spanning the entire human chromosome 21q. Nature 359: 380-387

Chumakov, I.M., Le Gall, I., Billault, A., Ougen, P., Soularue, P., Guillou, S., Rigault, P., Bui, H., De Tand, M.F., Barillot, E., Abderrahim, H., Cherif, D., Berger, R., Le Paslier, D., and Cohen, D. (1992b) Isolation of Chromosome 21-specific Yeast Artificial Chromosomes from a Total Human Genome Library. Nat. Genet. 1: 222-225

*

Chung, J.H., Bell, A.C., and Felsenfeld, G. (1997) Characterization of the Chicken β-globin Insulator. Proc. Nat. Acad. Sci. USA 94: 575-580.

Church, D.M., Stotler, C.J., Rutter, J.L., Murrell, J.R., Trofatter, J.A. and Buckler, A.J. (1994) Isolation of genes from complex sources of mammalian genomic DNA using exon amplification. Nature Genet. 6: 98-105

Collins, J. and Hohn, B. (1978) Cosmids: A Type of Plasmid Gene-Cloning Vector that is Packageable in vitro in Bacteriophage λ Heads. Proc. Natl. Acad. Sci. USA 75: (9) 4242-4246.

Collins, J.E., Cole, C.G., Smink, L.J., and 30 others. (1995) A high density YAC contig map of human chromosome 22. Nature (suppl) 377: 367-371

Corden, L.D., and McLean, W.H.I. (1996) Human Keratin Diseases: hereditary fragility of specific epithelial tissues. Exp. Derm. 5: 297-307

Coulson, A., Sulston, J., Brenner, S., and Karn, J. (1986) Toward a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Nat. Acad. Sci. USA 83: 7821-7825

* Chumakov, I.M., Rigault, P., Le Gall, I., and 59 others (1995) A YAC contig of the human genome. Nature (suppl) 377: 175-298

Coulson, A., Waterston, R., Kiff, J., Sulston, J. and Kohara, Y. (1988) Genome Linking With Yeast Artificial Chromosomes. Nature, 335: 184-186

Cox, D.R., Burmeister, M., Roydon Price, E., Kim, S., and Myers, R.M. (1990) Radiation Hybrid Mapping: A Somatic Cell Genetic Method for Constructing High-Resolution Maps of Mammalian Chromosomes. Science, 250: 245-250

Craig, J.M., and Bickmore, W.A., (1994) The Distribution of CpG Islands in Mammalian Chromosomes. Nature Genetics. 7: 376-381

Cramer, P., Gustavo Pesce, C., Baralle, F.E., and Kornblihtt, A.R. (1997) Functional Association Between Promoter Structure and Transcript Alternative Splicing. Proc. Natl. Acad. Sci. USA 94: 11456-11460

Crampton, J.M., Davies, K.E., and Knapp, T.F. (1981) The occurrence of families of repetitive sequences in a library of cloned cDNA from human lymphocytes. Nucleic Acids Res. 9: 3821-3834

Crish, J.F., Howard, J.M., Zaim, T.M., Murthy, S., and Eckert, R.L. (1993) Tissue-specific and differentiation-appropriate expression of the human involucrin gene in transgenic mice: an abnormal epidermal phenotype. Differentiation 53: 191-200

Cross, S.H., and Little, P.F.R. (1986) A cosmid vector for systematic chromosome walking. Gene, 49: 9-22

Dale, B.A., Resing, K.A., and Presland, R.B. (1994) Keratohyalin granule proteins. In Leigh I, Lane B, Watt F (eds). The Keratinocyte Handbook. Cambridge University Press, Cambridge, UK, pp323-350.

Davies, K.E., Young, B.D., Elles, R.G., Hill, M.E., and Williamson, R. (1981) Cloning of a representative genomic library of the human X chromosome after sorting by flow cytometry. Nature, 293: 374

Deaven, L.L., Van Dilla, M.A., Bartholdi, M.F., Carrano, A.V., Cram, L.S., Fuscoe, J.C., Gray, J.W., Hildebrand, C.E., Moyzis, R.K., and Perlman, J. (1986) Construction of human chromosome specific DNA libraries from flow sorted chromosomes. Cold Spring Harbor Symp. Quant. Biol. 51: 159-168

DeBry, R.W., and Seldin, M.F., (1996) Human / Mouse Homology Relationships. Genomics 33:337-351.

Denhardt, D.T. (1982) A membrane-filter technique for the detection of complementary-DNA. Curr. Contents/life sciences 43: 22

Deloukas, P., Schuler, G.D., Gyapay, G. and 62 others (1998) A Physical Map of 30,000 Human Genes. Science, 282: 744-746.

de Jong, P.J., Yokobata, K., Chen, C., Lohman, F., Pederson, L., McNinch, J., and Van Dilla, M. (1989) Human chromosome specific partial digest libraries in lambda and cosmid vectors. Cytogenet. Cell Genet. 51: 985

Dillon, N., and Grosveld, F. (1993) Transcriptional regulation of multigene loci: multilevel control. Trends Genet. 9: 134-137

Doering, T., Holleran, W.M., Potratz, A., Vielhaber, G., Elias, P.M., Suzuki, K., and Sandhoff, K. (1999) Sphingolipid Activator Proteins are Required for Epidermal Permeability Barrier Formation. J. Biol. Chem. 274: (16) 11038-11045.

Donis-Keller, H., Green, P., Helms, C., and 30 others. (1987) A Genetic Linkage Map of the Human Genome. Cell, 51: 319-337.

Dover, G.A. (1982) Molecular Drive: a cohesive mode of species evolution. Nature, 299: (9) 111-117.

Dover, G.A. (1987) DNA Turnover and the Molecular Clock. J Mol Evol, 26: 47-58

Dulai, K.S., von Dornum, M., Mollon, J.D., and Hunt, D.M. (1999) The Evolution of Trichromatic Colour Vision by Opsin Gene Duplication in New World and Old World Primates. Genome Res. 9: 629-638.

Duret, L., Mouchiroud, D., and Gautier, C. (1995) Statistical Analysis of Vertebrate Sequences Reveals That Long Genes are Scarce in GC-Rich Isochores. J. Mol. Evol. 40: 308-317.

Duyk, G.M., Kim, S., Myers, R.M., and Cox, D. (1990) Exon Trapping: A genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc. Nat. Acad. Sci. USA 87: 8995-8999.

Eckert, R., and Green, H. (1986) Structure and evolution of the human involucrin gene. Cell 46: 583-589

Eckert, R.L., Crish, J.F., and Robinson, N.A. (1997) The Epidermal Keratinocyte as a Model for the Study of Gene Regulation and Cell Differentiation. Physiological Rev. 77: (2) 397-424

Ekanayake-Mudiyanselage, S., Heinrich, A., Schmook, F.P., Jensen, J.M., Meingassner, J.G., and Proksch, E. (1998) Expression of Epidermal Keratins and the Cornified Envelope Protein Involucrin is Influenced by Permeability Barrier Disruption. J. Invest. Derm. 111: (3) 517-522.

Elgar, G. (1996) Quality not quantity: the pufferfish genome. Hum. Mol. Genet. 5: 1437-1442

Elgar, G., Clark, M., Green, A., and Sandford, R. (1997) How Good a Model is Fugu Genome. Nature. 387:140

Elvin, P., Slynn, G., Black, D., Graham, A., Butler, R., Riley, J., Anand, R., and Markham, A.F. (1990) Isolation of cDNA Clones Using Yeast Artificial Chromosome Probes. Nucleic Acids Res. 18: (13) 3913-3917

Engelkamp, D., Schafer, B.W., Mattei, M.G., Erne, P., and Heizmann, C.W. (1993) Six S100 genes are clustered on human chromosome 1q21: identification of two genes coding for the two previously unreported calcium binding proteins S100D and S100E. Proc. Nat. Acad. Sci. USA 90: 6547-6551

Festenstein, R., Tolaini, M., Corbella, P., Mamalaki, C., Parrington, J., Fox, M., Miliou, A., Jones, M., and Kioussis, D. (1996) Locus Control Region Function and Heterochromatin-induced Position Effect Variegation. Science, 271: 1123-1125.

Fields, C., Adams, M.D., White, O., and Venter, C. (1994) How many genes in the human genome? Nature Genetics, 7: 345-346.

Fischer, D.F., Gibbs, S., van de Putte, P., and Backendorf, C. (1996) Interdependent transcription control elements regulate the expression of the SPRR2A gene during keratinocyte terminal differentiation. Mol. Cell. Biol. 16: 5365-5374

Fischer, D.F., Sark, M.W.J., Lehtola, M.M., Gibbs, S., van de Putte, P., and Backendorf, C. (1999) Structure and evolution of the human SPRR3 gene: implications for function and regulation. Genomics 55: 88-99

Fleischmann, R.D., Adams, M.D., White, O. and others. (1995) Whole-genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 269: 496-511

Foote, S., Vollrath, D., Hilton, A. and Page, D.C. (1992) The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science 258: 60-66

Forrester, W.C., Thompson, C., Elder, J.T., and Groudine, M. (1986) A developmentally Stable Chromatin in the Human β-globin Cluster. Proc. Nat. Acad. Sci. USA 83: 1359-1363.

Forrester, W.C., Epner, E., Driscoll, M.C., Enver, T., Brice, M., Papayannopoulou, T., and Groudine, M. (1990) A Deletion of the Human β-globin Locus Activation Region Causes a Major Alteration in Chromatin Structure and Replication Across the Entire β-globin Locus. Genes & Dev. 4: 1637-1649.

Forus, A., Berner, J.M., Meza-Zepeda, L.A., Saeter, G., Mischke, D., and Fodstad, O. (1998) Molecular characterisation of a novel amplicon at 1q21-q22 frequently observed in human sarcomas. Brit. J. Can. 78: 495-503

Franke, W.W. (1987) Nuclear Lamins And Cytoplasmic Intermediate Filament Proteins - A Growing Multigene Family. Cell 48: 3-4

Frohman, M.A. (1993) Rapid Amplification of Complementary DNA Ends for Generation of Full-length Complementary cDNAs: Thermal RACE. Methods Enzymol. 218: 340-358.

Fuchs, E., and Cleveland, D.W. (1998) A Structural Scaffolding of Intermediate Filaments in Health and Disease. Science 279: 514-519

Fuchs, E., and Green, H. (1981) Regulation of Terminal Differentiation of Cultured Human Keratinocytes by Vitamin-A. Cell 25: 617-625

Gan, S-Q., McBride, O.W., Idler, W.W., Markova, N., and Steinert, P.M. (1990) Organization, Structure, and Polymorphisms of the Human profilaggrin Gene. Biochem. 29: 9432-9440

Gardiner, K., and Mural, R.J. (1995) Getting the message: identifying transcribed sequences. Trends Genet. 11: (3) 77-79

Gardiner, K. (1997) Clonability and gene distribution on human chromosome 21: reflections of junk DNA content? Gene, 205: 39-46.

Gardiner-Garden, M., and Frommer, M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261-282

Georges, M., and Andersson, L. (1996) Livestock Genomics Comes of Age. Genome Res. 6: 907-921

Gibbs, S., Fijnemann, R., Wiegant, J., van Kessel, A.G., van de Putte, P., and Backendorf, C. (1993) Molecular characterisation and evolution of the SPRR family of keratinocyte differentiation markers encoding small proline-rich proteins. Genomics 16: 630-637

Gibbs, S., Boelsma, E., Kempenaar, J., and Ponec, M. (1998) Temperature-sensitive Regulation of Epidermal Morphogenesis and the Expression of Cornified Envelope Precursors by EGF and TGFα. Cell Tissue Res. 292: 107-114.

Gilley, J., Armes, N., and Fried, M. (1997) Fugu Genome is Not a Good Mammalian Model. Nature. 385: 305-306

Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tettelin, H., and Oliver, S.G. (1996) Life with 6000 Genes. Science, 274: 546-567.

Goodman, L. (1998) The Human Genome Project Aims for 2003. Genome Res. 8: 997-999.

Graves, J.A.M. (1996) Mammals that Break the Rules: Genetics of Marsupials and Monotremes. Ann. Rev. Genet. 30: 233-260

Green, E.D., Riethman, H.C., Dutchik, J.E. and Olson, M.V. (1991) Detection and characterization of chimeric yeast artificial-chromosome clones. Genomics 11: 658-669

Green, K.J., and Jones, J.C.R. (1996) Desmosomes and Hemidesmosomes: Structure and Function of Molecular Components. The FASEB J. 10: 871-880

Green, P. (1997) Against a Whole-Genome Shotgun. Genome Res. 7: 410-417.

Gregory, S., Vaudin, M., White, P., Vance, J. (1998) Report of the Fourth International workshop on Human Chromosome 1 Mapping 1998. Cytogenet Cell Genet 83: 147-175

Gribnau, J., de Boer, E., Trimborn, T., Wijgerde, M., Milot, E., Grosveld, F., and Fraser, P. (1998) Chromatin Interaction Mechanism of Transcriptional Control in vivo. The EMBO Journal, 17: (20) 6020-6027.

Groenen, M.A.M., Crooijmans, P.M.A., Veenendal, A., Cheng, H.H., Siwek, M., and van der Poel, J.J. (1998) A Comprehensive Microsatellite Linkage Map of the Chicken Genome. Genomics, 49: 265-274.

Groet, J., Ives, J.H., South, A.P., Baptista, P.R., Jones, T.A., Yaspo, M.L., Lehrach, H., Potier, M.C., Van Broeckhoven, C., and Nizetic, D. (1998) Bacterial Contig Map of the 21q11 Region Associated with Alzheimer’s Disease and Abnormal Myelopoiesis in Down Syndrome. Genome Res. 8: 385-398

Grosveld, F., Blom van Assendelft, G., Greaves, D., and Kollias, G. (1987) Position independent, high level expression of the human β-globin gene in transgenic mice. Cell 51: 975-985

Hanley, K., Jiang, Y., He, S.S., Friedman, M., Elias, P.M., Bikle, D.D., Williams, M.L., and Feingold, K.R. (1998) Keratinocyte Differentiation is Stimulated by Activators of the Nuclear Hormone Receptor PPARα. J. Invest. Derm. 110: (4) 368-375.

Hardas, B.D., Zhang, J., Trent, J.M., and Elder, J.T. (1994) Direct Evidence for Homologous Sequences on the Paracentric Regions of Human Chromosome 1. Genomics 21: 359-363.

Hardas, B.D., Zhao, X., Zhang, J., Longqing, X., Stoll, S., and Elder, J.T. (1996) Assignment Of Psoriasin To Human Chromosomal Band 1q21: Coordinate Overexpression Of Clustered Genes In Psoriasis. J. Invest. Dermatol. 106: 753-8

Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., and Miller, W. (1997) Locus Control Regions of Mammalian β-globin Gene Clusters: Combining Phylogenetic Analyses and Experimental Results to Gain Functioning Insights. Gene, 205: 73-94.

Hardman, M.J., Sisi, P., Banbury, D.N., and Byrne, C. (1998) Pattern Acquisition of Skin Barrier Function During Development. Development 125: 1541-1552.

Hasegawa, S., Kato, K., Takahashi, M., Zhu, Y., Obata, K., and Miyake, K. (1993) S100a0 Protein as a Marker for Tissue Damage Related to Extracorporeal Shock Wave Lithotripsy. Eur. Urol. 24: 393-396.

Hatada, S., Kuziel, W., Smithies, O., and Maeda, N. (1999) The Influence of Chromosomal Location on the Expression of Two Transgenes in Mice. The Journal of Biological Chemistry, 274: (2) 948-955.

Hedges, S.B., Parker, P.H., Sibley, C.G., and Kumar, S. (1996) Continental Breakup and the Ordinal Diversification of Birds and Mammals. Nature. 381: 226-229

Hochgeschwender, U., Gregor Sutcliffe, J., and Brennan, M.B. (1989) Construction and Screening of a Genomic Library Specific for Mouse Chromosome 16. Proc. Natl. Acad. Sci. USA 86: 8482-8486

Hofmann, M.A., Drury, S., Fu, C., Taguchi, A., Lu, Y., Avila, C., Kambhan, N., Bierhaus, A., Nawroth, P., Neurath, M.F., Slattery, T., Beach, D., McClary, J., Nagashima, M., Morser, J., Stern, D., and Schmidt, A.M. (1999) RAGE Mediates a Novel Proinflammatory Axis: A Central Cell Surface Receptor for S100/Calgranulin Polypeptides. Cell, 97: 889-901.

Hohl, D., Mehrel, T., Licht, U., Turner, M.L., Roop, D.R. and Steinert, P.M. (1990) Characterization of Human Loricrin. J. Biol. Chem. 266: (10) 6626-6630

Hohl, D., Licht, U., Breitkreutz, D., Steinert, P.M. and Roop, D.R. (1991) Transcription of the Human Loricrin Gene in Vitro is Induced by Calcium and Cell Density and Suppressed by Retinoic Acid. J. Invest. Dermatol. 96: 414-418

Hohl, D., Olano, B.R., de Viragh, P.A., Huber, M., Detrisac, C.J., Schnyder, U.W., and Roop, D.R. (1993) Expression patterns of loricrin in various species and tissues. Differentiation 54: 25-34

Hohl, D., de Viragh, P.A., Amiguet-Barras, F., Gibbs, S., Backendorf, C., and Huber, M. (1995) The small proline-rich proteins constitute a multi-gene family of differentially regulated cornified envelope precursor proteins. J. Invest. Dermatol. 104: 902-909

Hohn, B., and Collins, J. (1980) A small cosmid for efficient cloning of large DNA fragments. Gene 11: 291-298

Hood, L., Kronenberg, M., and Hunkapiller, T. (1985) T Cell Antigen Receptors and the Immunoglobulin Supergene Family. Cell, 40: 225-229.

Huber, M., Siegenthaler, G., and Hohl, D. (1999) Repetin is a novel "fused" protein with calcium-binding and repetitive domains localised to chromosome 1q21. (meeting abstract) J. Invest. Dermatol. 112: 612

Hughes, A.L., and Hughes, M.K. (1995) Small Genomes for Better Flyers. Nature 377: 391.

Hughes, A.L., and Yeager, M. (1997) Molecular Evolution of the Vertebrate Immune System. BioEssays, ICSU Press, 19: (9) 777-786.

Ichikawa, H., Hosoda, F., Arai, Y., Shimizu, K., Ohira, M., and Oki, M. (1993) A NotI restriction map of the entire long arm of human chromosome 21. Nature Genet. 1: 361-365

Ioannou, A.P., Amemiya, J., Garnes, J. et al (1994) A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genetics 6: 84-89.

Ishida-Yamamoto, A., Hashimoto, Y., Manabe, M., O’Guin, W.M., Dale, B.A., and Iizuka, H. (1997a) Distinctive expression of filaggrin and trichohyalin during various pathways of epithelial differentiation. Brit. J. Derm. 137: 9-16

Ishida-Yamamoto, A., Korge, B.P., Puenter, C., Dopping Hepenstal, T., Hohl, D., Iizuka, H., Stephenson, A.M., Eady, R.A.J., and Munro, C.S. (1997b) Abnormal keratinization and loricrin mutation in Vohwinkel's syndrome (VS) with ichthyosis. J. Invest. Dermatol. 108: 642 (meeting abstract)

Ivens, A.C. and Little, P.F.R. (1995) Cosmid clones and their application to genome studies. In: Glover DM and Hames BD (eds.). DNA Cloning 3. A practical approach. IRL Press, Oxford, UK, pp1-46.

Kao, F.T., Jones, C., and Puck, T.T. (1976) Genetics of somatic mammalian cells: Genetic, immunologic, and biochemical analysis with Chinese hamster cell hybrids containing selected human chromosomes. Proc. Nat. Acad. Sci. USA 73: 193-197.

Karlin, S., Campbell, A.M., and Mrazek, J., (1998) Comparative DNA Analysis Across Diverse Genomes. Ann Rev Genet 32: 185-255.

Kartasova, T., and van de Putte, P. (1988) Isolation, characterization, and UV stimulation, of genes encoding polypeptides of related structure in human epidermal keratinocytes. Mol. Cell. Biol. 8: 2195-2203

Kartasova, T., Darwiche, N., Kohno, Y., Koizumi, H., Osada, S., Huh, N., Lichti, U., Steinert, P.M. and Kuroki, T. (1996) J. Invest. Dermatol. 106: 294-304.

Kasahara, M., Hayashi, M., Tanaka, K., Inoko, H., Sugaya, K., Ikemura, T., and Ishibashi, T. (1996) Chromosomal localization of the proteasome Z subunit gene reveals an ancient chromosomal duplication involving the major histocompatibility complex. Proc. Natl. Acad. Sci. USA 93: 9096-9101

Katsanis, N., Fitzgibbon, J., and Fisher, E.M.C. (1996) Paralogy Mapping: Identification of a Region in the Human MHC Triplicated onto Human Chromosome 1 and 9 Allows the Prediction and Isolation of Novel PBX and NOTCH Loci. Genomics, 35: 101-108.

Kaufman, J., Volk, H., and Wallny, H-J. (1995) A “minimal essential Mhc” and an “unrecognized” Mhc: two extremes for selection of polymorphism. Immunol. Rev. 143: 63-88

Kim, C.G., Epner, E.M., Forrester, W.C., and Groudine, M. (1992) Inactivation of the Human β-globin Gene by Targeted Insertion into the β-globin Locus Control Region. Genes & Development, Cold Spring Harbor Laboratory Press, 6: 928-938.

Kim, U.J., Shizuya, H., de Jong, P.J., Birren, B., and Simon, M.I. (1992) Stable Propagation of cosmid sized human DNA inserts in an F factor based vector. Nucleic Acids Research 20: 1083-1085.

Kim, S.Y., Horrigan, S.K., Altenhofen, J.L., Arbieva, Z.H., Hoffman, R., and Westbrook, C.A. (1998) Modification of bacterial artificial chromosome clones using Cre recombinase: introduction of selectable markers for expression in eukaryotic cells. Genome Research 8: 404-412.

King, L., Gates, R., Stoscheck, C., and Nanney, L. (1990) The EGF/TGFα receptor in skin. J. Invest. Dermatol. 94: 164S-170S

Korenberg, J.R., Chen, X.N., Adams, M.D., and Craig Venter, J. (1995) Toward a cDNA Map of the Human Genome. Genomics, 29: 364-370.

Korge, B.P., Ishida-Yamamoto, A., Punter, C. et al (1997) Loricrin mutation in Vohwinkel's keratoderma is unique to the variant with ichthyosis. J. Invest. Dermatol. 109: 604-610

Korn, B., Sedlacek, Z., Manca, A., Kioschis, P., Konecki, D., Lehrach, H. and Poustka, A. (1992) A strategy for the selection of transcribed sequences in the Xq28 region. Hum. Molec. Genet. 1: 235-242.

Krieg, P., Schuppler, M., Koesters, R., Mincheva, A., Lichter, P., and Marks, F. (1997) Repetin (Rptn), a New Member of the “Fused Gene” Subgroup within the S100 Gene Family Encoding a Murine Epidermal Differentiation Protein. Genomics, 43: 339-348.

Krumlauf, R., Jeanpierre, M., and Young, R.D. (1982) Construction and Characterization of Genomic Libraries From Specific Human Chromosomes. Genetics, 79:2971-2975.

Kuechle, M.K., Thulin, C.D., Presland, R.B., and Dale, B.A. (1999) Profilaggrin Requires both Linker and Filaggrin Peptide Sequences to Form Granules: Implications for Profilaggrin Processing In Vivo. J. Invest. Dermatol. 112: (6) 843-851

Kumar, S., and Hedges, S.B. (1998) A molecular timescale for vertebrate evolution. Nature 392: 917-920

Lagasse, E., and Clerc, R.G. (1988) Cloning and Expression of Two Human Genes Encoding Calcium-Binding Proteins That Are Regulated during Myeloid Differentiation. Mol. Cell. Biol. 8: 2402-2410.

Larin, Z., Monaco, A.P., Lehrach, H. (1991) Yeast artificial chromosome libraries containing large inserts from mouse and human libraries. Proc. Natl. Acad. Sci. USA. 88: 4123-4127

Lee, S-C., Kim, I-G, Marekov, L.N., O’Keefe, E.J., Parry, D.A.D., and Steinert, P.M. (1993) The structure of human trichohyalin. J. Biol. Chem. 268: (16) 12164-12176.

*

Lewis, B.C., Shah, N.P., Braun, B.S., and Denny, C.T. (1992) Creation of a yeast artificial chromosome vector on lysine-2. Genet. Anal. 9: (3) 86-90

Larramendy, M.L., Virolainen, M., Tukiainen, E., Elomaa, I., and Knuutila, S. (1998) Chromosome Band 1q21 is Recurrently Gained in Desmoid Tumors. Genes, Chromo & Can. 23: 183-186

Lennon, G., Auffray, C., Polymeropoulos, M. and Bento Soares, M. (1996) The I.M.A.G.E. Consortium: An Integrated Molecular Analysis of Genomes and their Expression. Genomics 33: 151-152.

Liang, F., Han, M., Romanienko, P.J., and Jasin, M. (1998) Homology-directed repair is a major double-strand break repair pathway in mammalian cells. Proc. Natl. Acad. Sci. USA 95: 5172-5177.

Lindsay, S., and Bird, A.P. (1987) Use of Restriction Enzymes to Detect Potential Gene Sequences in Mammalian DNA. Nature. 327: 336-338

Little, P. (1993) Small and Perfectly Formed. Nature, 366: 204-205.

Lioumi, M., Olavesen, M.G., Nizetic, D., and Ragoussis, J. (1998) High-resolution YAC fragmentation map of 1q21. Genomics 49: 200-208.

* Leigh, I.M. and Watt, F.M. (1994) The culture of human keratinocytes. In Leigh I, Lane B, Watt F (eds). The Keratinocyte Handbook, pp1- Cambridge University Press, Cambridge, UK

Lohman, F.P., Medema, J.K., Gibbs, S., Ponec, M., Van de Putte, P., and Backendorf, C. (1997) Expression of the SPRR Cornification Genes is Differentially Affected by Carcinogenic Transformation. Experimental Cell Research, 231: 141-148.

Lovett, M., Kere, J., and Hinton, L.M. (1991) Direct selection: A method for the isolation of cDNAs encoded by large genomic regions. Proc. Natl. Acad. Sci. USA 88: 9628-9632

Lueders, K.K., Elliott, R.W., Marenholz, I., Mischke, D., DuPree, M., and Hamer, D. (1999) Genomic organization and mapping of the human and mouse neuronal β2-nicotinic acetylcholine receptor genes. Mammalian Genome 10: 900-905.

Lundin, L. (1993) Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics 16: 1-19

Madsen, P., Rasmussen, H.H., Leffers, H., Honore, B., Dejgaard, K., Olsen, E., Kiil, J., Walbum, E., Andersen, A.H., Basse, B., Lauridsen, J.B., Ratz, G.P., Celis, A., Vandekerckhove, J., and Celis, J.E. (1991) Molecular Cloning, Occurrence, and Expression of a Novel Partially Secreted Protein “Psoriasin” That Is Highly Up-Regulated in Psoriatic Skin. J. Invest. Derm. 97: 701-712.

Maestrini, E., Monaco, A.P., McGrath, J.A. et al. (1996) A molecular defect in loricrin, the major component of the cornified cell envelope, underlies Vohwinkel's syndrome. Nature Genetics 13: 70-77.

Maestrini, E., Korge, B.P., Ocana-Sierra, J., Calzolari, E., Cambiaghi, S., Scudder, P.M., Hovnanian, A., Monaco, A.P., and Munro, C.S. (1999) A missense mutation in connexin26, D66H, causes mutilating keratoderma with sensorineural deafness (Vohwinkel's syndrome) in three unrelated families. Hum. Mol. Genet. 8: 1237-1243

Mahairas, G.G., Wallace, J.C., Smith, K., Swartzell, S.T., Holzman, T., Keller, A., Shaker, R., Furlong, J., Young, J., Zhao, S., Adams, M., and Hood, L. (1999) Sequence-tagged Connectors: A Sequence Approach to Mapping and Scanning the Human Genome. Proc. Natl. Acad. Sci. USA 96: 9739-9744.

Marenholz, I., Volz, A., Ziegler, A., Davies, A., Ragoussis, I., Korge, B.P., and Mischke, D. (1996) Genetic analysis of the Epidermal Differentiation Complex (EDC) on Human Chromosome 1q21: Chromosomal Orientation, New Markers, and a 6-Mb YAC Contig. Genomics 37: 295-302.

Markova, N.G., Marekov, L.N., O’Keefe, E.J., Parry, D.A.D., and Steinert, P.M. (1993) Profilaggrin is a major epidermal calcium binding protein. Mol. Cell. Biol. 13: 613-625.

Marra, M.A., Kucaba, T.A., Dietrich, N.L., Green, E.D., Brownstein, B., and Wilson, R.K. (1997) High throughput fingerprint analysis of large-insert clones. Genome Res. 7: 1072-1084

Matsuki, M., Yamashita, F., Ishida-Yamamoto, A., Yamada, K., Kinoshita, C., Fushiki, S., Ueda, E., Morishima, Y., Tabata, K., Yasuno, H., Hashida, M., Iizuka, H., Ikawa, M., Okabe, M., Kondoh, G., Kinoshita, T., Takeda, J., and Yamanishi, K. (1998) Defective Stratum Corneum and Early Neonatal Death in Mice Lacking the Gene for Transglutaminase 1 (Keratinocyte Transglutaminase). Proc. Nat. Acad. Sci. USA 95: 1044-1049

McAlpine, P.J., Shows, T.B., Boucheix, C., Stranc, L.C., Berent, T.G., Pakstis, A.J., and Doute, R.G. (1989) Report of the Nomenclature Committee and the 1989 Catalog of Mapped Genes. Cytogenet. Cell Genet. 51: 13-66

McKinley-Grant, L.J., Idler, W.W., Berstein, I.A., Parry, D.A.D., Cannizzaro, L., Croce, C.M., Huebner, K., Lessin, S.R., and Steinert, P.M. (1989) Characterization of a cDNA clone encoding human filaggrin and localization of the gene to chromosome region 1q21. PNAS 86: 4848-4852.

McPherson, J.D. (1997) Sequence Ready - or Not? Genome Res. 7: 1111-1113.

McQueen, H.A., Siriaco, G., and Bird, A.P. (1998) Chicken Microchromosomes are Hyperacetylated, Early Replicating, and Gene Rich. Genome Research, 8: 621-630.

Mehrel, T., Hohl, D., Rothnagel, J.A., Longley, M.A., Bundman, D., Cheng, C., Lichti, U., Bisher, M., Steven, A.C., Steinert, P.M., Yuspa, S.H., and Roop, D.R. (1990) Identification of a major keratinocyte cell envelope protein, loricrin. Cell 61: 1103-1112

Mejia, J.E., and Monaco, A.P. (1997) Retrofitting Vectors for Escherichia coli-Based Artificial Chromosomes (PACs and BACs) with Markers for Transfection Studies. Genome Research 7: 179-186.

Michel, S., Schmidt, R., Shroot, B., and Reichert, U. (1988) Morphological and Biochemical Characterization of the Cornified Envelopes from Human Epidermal Keratinocytes of Different Origin. J. Invest. Dermatol. 91: (1) 11-15

Mischke, D., Korge, B.P., Marenholz, I., Volz, A., and Ziegler, A. (1996) Genes encoding structural proteins of the epidermal cornification and S100 calcium-binding proteins form a gene complex (‘epidermal differentiation complex’) on human chromosome 1q21. J. Invest. Derm. 106: 989-992.

Mischke, D., Zirra, M., Fischer, D., Backendorf, C., Ziegler, A., and Marenholz, I. (1998) Assignment of additional genes expressed in human keratinocytes to the Epidermal Differentiation Complex (EDC) in chromosome region 1q21. Cytogenet. Cell Genet. 83: 172 (meeting abstract)

Molhuizen, H.O.F., Alkemade, H.A.C., Zeeuwen, P.L.J.M., de Jongh, G., Wieringa, B., and Schalkwijk, J. (1993) SKALP/Elafin: An Elastase Inhibitor from Cultured Human Keratinocytes. J. Biol. Chem. 268: (16) 12028-12032.

Monaco, A.P., Bertelson, C.J., Middlesworth, W., Colletti, C.A., Aldridge, J., Fischbeck, K.H., Bartlett, R., Pericak-Vance, M.A., Roses, A.D., and Kunkel, L.M., (1985) Detection of Deletions Spanning the Duchenne Muscular Dystrophy Locus Using a Tightly Linked DNA Segment. Nature 316: (29) 842-844.

Monaco, A.P., Neve, R.L., Colletti-Feener, C., Bertelson, C.J., Kurnit, D.M., and Kunkel, L.M. (1986) Isolation of candidate cDNAs for portions of the Duchenne muscular dystrophy gene. Nature, 323: 646-650.

Monaco, J.J. (1992) A Molecular Model of MHC Class I-restricted Antigen Processing. Immunology Today, 13: (5) 173-179.

Monzon, R.I., McWilliams, N., and Hudson, L.G. (1996) Suppression of Cornified Envelope Formation and Type 1 Transglutaminase by Epidermal Growth Factor in Neoplastic Keratinocytes. Endocrinology 137: 1727-1724.

Moog-Lutz, C., Bouillet, P., Regnier, C.H., Tomasetto, C., Mattei, M-G., Chenard, M-P., Anglard, P., Rio, M-C., and Basset, P. (1995) Comparative Expression Of The Psoriasin (S100A7) And S100C Genes In Breast Carcinoma And Co-Localisation To Human Chromosome 1q21-q22. Int. J. Cancer 63: 297-303.

Morgan, J.G., Dolganov, G.M., Robbins, S.E., Hinton, S.M., and Lovett, M. (1992) The selective isolation of novel cDNAs encoded by the regions surrounding the human interleukin 4 and 5 genes. Nucleic Acids Res. 20: 5173-5179

Morii, K., Tanaka, Y., Takahashi, Y., Minoshima, S., Fukuyama, R., Shimizu, N., and Kuwano, R. (1991) Structure And Chromosome Assignment Of Human S100 α And β Subunit Genes. Biochem. Biophys. Res. Commun. 175: 185-191.

Mosely, W.S., and Seldin, M.F. (1989) Definition of mouse chromosome 1 and 3 gene linkage groups that are conserved on human chromosome 1: evidence that a conserved linkage group spans the centromere of human chromosome 1. Genomics 5: 899-905

Mullins, L.J., Kotelevtseva, N., Boyd, C., and Mullins, J. (1997) Efficient Cre-lox Linearisation of BACs: Applications to Physical Mapping and Generation of Transgenic Animals. Nucleic Acids Res. 25: (12) 2539-2540

Murray, J., Buetow, K.H., Weber, J.L., Ludwigsen, S., and 27 others (1994) A Comprehensive Human Linkage Map with Centimorgan Density. Science. 265: 2049-2054.

Nemes, Z., Marekov, L.N., Fesus, L., and Steinert, P.M. (1999) A novel function for transglutaminase 1: Attachment of long chain hydroxyceramides to involucrin by ester bond formation. PNAS 96: 8402-8407

Nickerson, D.A., Kaiser, R., Lappin, S., Stewart, J., Hood, L., and Landegren, U. (1990) Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay. Genetics 87: 8923-8927.

Nirunsuksiri, W., Presland, R.B., Brumbaugh, S.G., Dale, B.A., and Fleckman, P. (1995) Deceased Profilaggrin Expression in Ichthyosis Vulgaris is a Result of Selectively Impaired Posttranscriptional Contol. J. Bio. Chem. 270: (2) 871-876

Nizetic, D., Figueroa, F., Dembic, Z., Nevo, E., and Klein, J. (1987) Major histocompatobility complex gene organization in the mole rat Spalax ehrenbergi: Evidence for transfer of function between class II genes:. Proc Natl. Acad. Sci. USA, 84: 5828-5832.

Nizetic, D., Zehetner, G., Monaco, A.P., Gellen, L., Young, B.D., and Lehrach, H„ (1991) Construction, arraying, and high-density screening of large insert libraries of human chromosomes X and 21: Their potential use as reference libraries. PNAS 88: 3233-3237.

Nizetic, D., Gellen, L., Hamvas, R.M.J., Mott, R., Grigoriev, A., Vatcheva, R., Zehetner, G., Yaspo, ML., Dutriaux, A., Lopes, C., Delabar, JM., Van Broeckhoven, C., Potier, MC., and Lehrach, H. (1994) An integrated YAC-overlap and ‘cosmid-pocket’ map of the human chromosome 21. Human Molecular Genetics, 3: (5) 759-770.

Nizetic, D., Monard, S., Cotter, F., Young, B.D., Lehrach, H. (1994) Construction of cosmid libraries from flow sorted material of human chromosomes 1, 6, 7, 11, 13 and 18 for use as reference libraries. Mamm Gen 5: 801-802

Nizetic, D. and Lehrach, H. (1995) Chromosome Specific Cosmid Libraries: construction, handling and use in parallel and integrated mapping. In: Glover DM and Hames BD (eds.). DNA Cloning 3. A practical approach IRL Press, Oxford, UK, pp49-79

Noda, M., Furutani, Y., Takahashi, H., Toyosata, M., Hirose, T., Inayama, S., Nakanishi, S., and Numa, S. (1982) Cloning and Sequence Analysis of cDNA for Bovine Adrenal Preproenkephalin. Nature. 295: 202-206

Nonomura, K., Yamanishi, K., Yasuno, H., Nara, K., and Hirose, S. (1994) Up-regulation of Elafin/SKALP gene expression in Psoriatic epidermis. J. Invest. Dermatol. 103: 88-91

O’Brien, S.J., Womack, J.E., Lyons, L.A., Moore, K.J., Jenkins, N.A., and Copeland, N.G. (1993) Anchored reference loci for comparative genome mapping in mammals. Nature Genetics 3: 103.

O'Keefe, E.J., Hamilton, E.H., Lee, S.C., and Steinert, P. (1993) Trichohyalin - A Structural Protein of Hair, Tongue, Nail, and Epidermis. J. Invest. Derm. 101: 65s-71s

Okubo, K., Hori, N., Matoba, R., Niiyama, T., Fukushima, A., Kojima, Y., and Matsubara, K. (1992) Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet. 2: 173-179

Ohno, S. (1970) Evolution by gene duplication. Springer Verlag: New York.

Ohta, T. (1991) Multigene Families and the Evolution of Complexity. J. Mol. Evol. 33: 34-41.

Ohta, H., Sasaki, T., Naka, M., Hiraoka, O., Miyamoto, C., Furuichi, Y., and Tanaka, T. (1991) Molecular cloning and expression of the cDNA coding for a new member of the S100 protein family from porcine cardiac muscle. FEBS Lett. 295: 93-96

Olsen, M.V., Dutchik, J.E., Graham, M.Y., Garrett, G., Brodeur, G.M., Helms, C., Frank, M., MacCollin, M., Scheinman, R., and Frank, T. (1986) Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl. Acad. Sci. USA 83: 7826-7830.

Olsen, M., Hood, L., Cantor, C., and Botstein, D. (1989) A Common Language for Physical Mapping of the Human Genome. Science, 245: 1434-1435.

Ortiz, B.D., Cado, D., Chen, V., Diaz, P.W., and Winoto, A., (1997) Adjacent DNA Elements Dominantly Restrict the Ubiquitous Activity of a Novel Chromatin-Opening Region to Specific Tissues. The EMBO Journal, 16: (16) 5037-5045.

Osoegawa, K., Susukida, R., Okano, S., Kudoh, J., Minoshima, S., Shimizu, N., de Jong, P.J., Groet, J., Ives, J.H., Lehrach, H., Nizetic, D., and Soeda, E. (1996) An integrated map with Cosmid/PAC contigs of a 4-Mb Down syndrome critical region. Genomics 32: 375-387

Pajukanta, P., Nuotio, I., Terwilliger, J.D., Porkka, K.V.K., Ylitalo, K., Pihlajamaki, J., Suomalainen, A.I., Syvanen, A.C., Lehtimaki, T., Viikari, J.S.A., Laakso, M., Taskinen, M.R., Ehnholm, C., and Peltonen, L. (1998) Linkage of familial combined hyperlipidaemia to chromosome 1q21-q23. Nature Genet. 18: (4) 369-373

Papadakis, M.N., and Patrinos, G.P. (1999) Contribution of Gene Conversion in the Evolution of the Human β-like Globin Gene Family. Hum. Genet. 104: 117-125.

Parham, P. (1999) Virtual Reality in the MHC. Immunological Reviews, 167: 5-15.

Parimoo, S., Patanjali, S R., Shukla, H., Chaplin, D.D. and Weissman, S.M. (1991) cDNA selection: Efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc. Natl. Acad. Sci. USA 88: 9623-9627

Peltonen, L., Arieli, Y., Pyornila, A., Marder, J. (1998) Adaptive changes in the epidermal structure of the heat-acclimated rock pigeon (Columba livia): A comparative electron microscopy study. J. Morphol. 235: (1) 17-29.

Persechini, A., Moncrief, N.D., and Kretsinger, R.H. (1989) The EF-hand family of calcium-modulated proteins. Trends Neurosci. 12: (11) 462-467.

Peterson, K.R., Clegg, C.H., Li, Q.L., and Stamatoyannopoulos, G. (1997) Production of transgenic mice with yeast artificial chromosomes. Trends Genet. 13: (2) 61-66.

Polymeropoulos, M.H., Ortiz De Luna, R.I., Ide, S.E., Torres, R., Rubenstein, J., and Francomano, C.A. (1995) The gene for pycnodysostosis maps to human chromosome 1cen-1q21. Nature Genetics 10: 238-239.

Peters, J.M., Barnes, R., Bennett, L., Gitomer, W.M., Bowcock, A.M., and Garg, A. (1998) Localization of the gene for familial partial lipodystrophy (Dunnigan variety) to chromosome 1q21-22. Nature Genet. 18: 292-295

Potts, B.C.M., Smith, J., Akke, M., Macke, T.J., Okazaki, K., Hidaka, H., Case, D.A., and Chazin, W.J. (1995) The Structure Of Calcyclin Reveals A Novel Homodimeric Fold For S100 Ca2+-Binding Proteins. Nat. Struc. Biol. 2: (9) 790-796.

Presland, R.B., Bassuk, J.A., Kimball, J.R., and Dale, B.A. (1995) Characterization of Two Distinct Calcium-Binding Sites in the Amino-Terminus of Human Profilaggrin. J. Invest. Derm. 104: 218-223.

Purandare, S.M., and Patel, P.I. (1997) Recombination Hot Spots and Human Disease. Genome Research, Cold Spring Harbour Laboratory Press, 7: 773-786.

Reddy, S.K.P., Konkin, T., and Wu, R. (1998) Structure and organization of the genes encoding mouse small proline-rich proteins mSPRR1A and 1B. Gene 224: 59-66.

Reichert, U., Michel, S., and Schmidt, R. (1993) The cornified envelope: a key structure of terminally differentiating keratinocytes. In Darmon, M., and Blumenberg, M. (Eds) Molecular Biology of the skin. Academic Press, San Diego, pp107-150

Resing, K.A. and Dale, B.A. (1991) Proteins of keratohyalin. In: Goldsmith LA (ed) Physiology, Biochemistry and Molecular Biology of the skin. Oxford University Press, New York, pp 148-167.

Rheinwald, J.G., and Beckett, M.A. (1980) Defective Terminal Differentiation in Culture as a Consistent and Selectable Character of Malignant Human Keratinocytes. Cell 22: 629-632.

Ridinger, K., Ilg, E.C., Niggli, F.K., Heizmann, C.W., and Schaefer, B.W. (1999) Clustered organization of S100 genes in human and mouse. Biochim. Biophys. Acta 14405: 1-11.

Robinson, N.A., Lapic, S., Welter, J.F., and Eckert, R.L. (1997) S100A11, S100A10, Annexin I, Desmosomal Proteins, Small Proline-rich Proteins, Plasminogen Activator Inhibitor-2 and Involucrin are Components of the Cornified Envelope of Cultured Human Epidermal Keratinocytes. J. Biol. Chem. 272: (18) 12035-12046.

Robinson, N.A., and Eckert, R.L. (1998) Identification of Transglutaminase-reactive Residues in S100A11. J. Biol. Chem. 273: (5) 2721-2728.

Roop, D. (1995) Defects in the Barrier. Science, 267: 474-475.

Rothnagel, J.A., Longley, M.A., Bundman, D.S., Naylor, S.L., Lalley, P.A., Jenkins, N.A., Gilbert, D.J., Copeland, N.G., and Roop, D.R. (1994) Characterization of the mouse loricrin gene: linkage with profilaggrin and the flaky tail and soft coat mutant loci on chromosome 3. Genomics 23: 450-456.

Rosenthal, D.S., Simbulan-Rosenthal, C.M.G., Iyer, S., Spoonde, A., Smith, W., Ray, R., and Smulson, M.E. (1998) Sulfur Mustard Induces Markers of Terminal Differentiation and Apoptosis in Keratinocytes Via a Ca2+-Calmodulin and Caspase-Dependent Pathway. J Invest. Dermatol. 111: (1) 64-71

Ross, M.T., Hoheisel, J.D., Monaco, A.P., Larin, Z., Zehetner, G., and Lehrach, H. (1992) High-density gridded YAC filters: their potential as genome mapping tools. In: Anand, R. (ed) Techniques for the Analysis of Complex Genomes. Academic Press Limited, London, pp137-153.

Ryle, C.M., Breitkreutz, D., Stark, H.J., Leigh, I.M., Steinert, P.M., Roop, D., and Fusenig, N.E. (1989) Density-Dependent Modulation of Synthesis of Keratin-1 And Keratin-10 in the Human Keratinocyte Line HaCaT and in Ras-Transfected Tumorigenic Clones. Differentiation 40: 42-54

Saccone, S., de Sario, A., Della Valle, G., and Bernardi, G. (1992) The highest Gene Concentrations in the Human Genome are in Telomeric Bands of Metaphase Chromosomes. Proc. Natl. Acad. Sci. USA. 89: 4913-4917

Saccone, S., De Sario, A., Weigant, J., Raap, A.K., Della Valle, G., and Bernardi, G. (1993) Correlations Between Isochores and Chromosomal Bands in the Human Genome. Proc. Nat. Acad. Sci. USA. 90: 11929-11933

Saiki, R.K., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A., and Arnheim, N. (1985). Enzymatic amplification of β-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anaemia. Science 230: 1350-1354

Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491

Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J., and Tettelin, H. (1999) Interpolated Markov Models for Eukaryotic Gene Finding. Genomics, 59: 24-31

Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989). Molecular cloning: a laboratory manual. Cold Spring Harbour Laboratory Press, Cold Spring Harbour, NY.

The Sanger Centre and the Washington University Genome Sequencing Centre. (1998) Toward a Complete Human Genome Sequence. Genome Research, Cold Spring Harbor Laboratory Press, 8: 1097-1108.

Sanger, F., Nicklen, S., and Coulson, A.R. (1977) DNA Sequencing with Chain-Terminating Inhibitors. Proc. Natl. Acad. Sci. USA 74:(12) 5463-5467.

Sark, M.W.J., Fischer, D.F., de Meijer, E., van de Putte, P., and Backendorf, C. (1998) AP-1 and Ets transcription factors regulate the expression of the SPRR1A keratinocyte terminal differentiation marker. J. Biol. Chem. 273: 24683-24692

Saris, C.J.M., Kristensen, T., D’Eustachio, P., Hicks, L.J., Noonan, D.J., Hunter, T., and Tack, B. (1987) cDNA Sequence and Tissue Distribution of the mRNA for Bovine and Murine p11, the S100-related Light Chain of the Protein-Tyrosine Kinase Substrate p36 (Calpactin I). J. Biol. Chem. 262: (22) 10663-10671.

Schaefer, B.W., Wicki, R., Engelkamp, D., Mattei, M.G., Heizmann, C.W. (1995) Isolation of a YAC clone covering a cluster of nine S100 genes on human chromosome 1q21: rationale for a new nomenclature of the S100 calcium-binding protein family. Genomics 25: 638-643.

Schaefer, B.W., Heizmann, C.W. (1996) The S100 family of EF-hand calcium-binding proteins: Functions and pathology. Trends Biochem. Sci. 21: 134-140.

Schaffer, A., Orso, E., Palitzsch, K.D., Buchler, C., Drobnik, W., Furst, A., Scholmerich, J., Schmitz, G. (1999) The human apM-1, an adipocyte-specific gene linked to the family of TNF's and to genes expressed in activated T cells, is mapped to chromosome 1q21.3-q23, a susceptibility locus identified for familial combined hyperlipidaemia (FCH). Biochem. Biophys. Res. Commun. 260: 416-425

Schimenti, J.C. (1999) Insights from Model Systems: Mice and the Role of Unequal Recombination in Gene-Family Evolution. Am. J. Hum. Genet., 64: 40-45.

Schuler, G.D., Boguski, M.S., Stewart, E.A., and 56 others (1996) A gene map of the human genome. Science 274: 540-546

Schwartz, D.C., and Cantor, C.R. (1984) Separation of Yeast Chromosome-Sized DNAs by Pulsed Field Gradient Gel Electrophoresis. Cell 37: 67-75.

Segre, J.A., Bauer, C., and Fuchs, E. (1999) Klf4 is a transcription factor required for establishing the barrier function of the skin. Nature Genetics 22: 356-360.

Seishima, M., Nojiri, M., Esaki, C., Yoneda, K., Eto, Y., and Kitajima, Y. (1999) Activin A Induces Terminal Differentiation of Cultured Human Keratinocytes. J Invest. Dermatol. 112: (4) 432-436

Shiina, T., Tamiya, G., Oka, A., Takishima, N., and Inoko, H. (1999) Genome Sequencing Analysis of the 1.8Mb Entire Human MHC class I region. Immunological Reviews, 167: 193-199.

Shimada, A., Ota, Y., Sugiyama, Y., Sato, S., Kume, K., Shimizu, T., and Inoue, S. (1998) In Situ Expression of Platelet-Activating Factor (PAF)-Receptor Gene in Rat Skin and Effects of PAF on Proliferation and Differentiation of Cultured Human Keratinocytes. J Invest. Dermatol. 110: (6) 889-893

Shizuya, H., Birren, B.I., Kim, U-J., Mancino, B., Slepak, T., Tachiiri, Y., and Simon, M.I. (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. USA 89: 8794-8797.

Smit, A.F.A. (1996) The Origin of Interspersed Repeats in the Human Genome. Current Opinion in Genetics and Development, 6: 743-748.

Song, H-J., Poy, G., Darwiche, N., Lichti, U., Kuroki, T., Steinert, P.M., and Kartasova, T. (1999) Mouse Sprr2 genes: a clustered family of genes showing differential expression in epithelial tissues. Genomics 55: 28-42.

South A.P., Cabral A., Ives J.H., James C.H., Mirza G., Marenholz I., Mischke D., Backendorf C., Ragoussis J., and Nizetic D. (1999) Human Epidermal Differentiation Complex in a single 2.5 Mbp long continuum of overlapping DNA cloned in bacteria integrating physical and transcript maps. J. Invest. Derm. 112 (6): 910-918.

Southern, E.M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol., 98: 503-517.

Steinert, P.M., and Marekov, L.N. (1995) The Proteins Elafin, Filaggrin, Keratin Intermediate Filaments, Loricrin, and Small Proline-rich Proteins 1 and 2 are Isodipeptide Cross-linked Components of the Human Epidermal Cornified Cell Envelope. J. Biol. Chem. 270: (30) 17702-17711.

Steinert, P.M., and Marekov, L.N. (1997) Direct Evidence that Involucrin is a Major Early Isopeptide Cross-linked Component of the Keratinocyte Cornified Cell Envelope. J. Biol. Chem. 272: (3) 2021-2030.

Steinert, P.M. and Roop, D.R. (1998) Molecular and Cellular Biology of Intermediate Filaments. Ann. Rev. Biochem. 57: 593-625.

Sulston, J., Du, Z., Thomas, K., Wilson, R., Hillier, L., Staden, R., Halloran, N., Green, P., Thierry-Mieg, J., Qiu, L., Dear, S., Coulson, A., Craxton, M., Durbin, R., Berks, M., Metzstein, M., Hawkins, T., Ainscough, R., and Waterston, R. (1992) The C. elegans genome sequencing project: a beginning. Nature 356: 37-41.

Swartzendruber, D.C., Wertz, P.W., Madison, K.C., and Downing, D.T. (1987) Evidence that the Corneocyte Has a Chemically Bound Lipid Envelope. J. Invest. Dermatol., 88: (6) 709-713.

Taghian, D.G., and Nickoloff, J.A. (1997) Chromosomal Double-Strand Breaks Induce Gene Conversion at High Frequency in Mammalian Cells. Molecular and Cellular Biology, 17: 6386-6393.

Takahashi, M., Tezuka, T., and Katunuma, N. (1992) Phosphorylated Cystatin α is a Natural Substrate of Epidermal Transglutaminase for Formation of Skin Cornified Envelope. FEBS Lett. 308: (1) 79-82.

Takahashi, H., and Iizuka, H. (1993) Analysis Of The 5'-Upstream Promoter Region Of Human Involucrin Gene - Activation By 12-O-Tetradecanoylphorbol-13-Acetate. J. Invest. Dermatol. 100: 10-15

Takahashi, H., Kinouchi, M., Wuepper, K.D., and Iizuka, H. (1997) Cloning of Human Keratolinin cDNA: Keratolinin Is Identical with a Cysteine Proteinase Inhibitor, Cystatin A, and is Regulated by Ca2+, TPA, and cAMP. J. Invest. Dermatol. 108: (6) 843-847.

Takahashi, H., Asano, K., Kinouchi, M., Ishida-Yamamoto, A., Wuepper, K.D., and Iizuka, H. (1998) Structure and Transcriptional Regulation of the Human Cystatin A Gene. J. Biol. Chem. 273: (28) 17375-17380.

Takahashi, H., Asano, K., Manabe, A., Kinouchi, M., Ishida-Yamamoto, A., and Iizuka, H. (1998) The α and η Isoforms of Protein Kinase C Stimulate Transcription of Human Involucrin Gene. J. Invest. Dermatol. 110: (3) 218-223

Takahashi, H., Ishida-Yamamoto, A., Kishi, A., Ohara, K., and Iizuka, H. (1999) Loricrin gene mutation in a Japanese patient of Vohwinkel's syndrome. J. Dermatol Sci 19: 44-47

Talmadge, C.B., Zhen, DK., Wang, JY., Berglund, P., Li, BF., Weston, M.D., Kimerling, W.J., Zabarovsky, E.R., Stanbridge, E.J., Klein, G. and Sumegi, J. (1995) Construction and Characterization of a NotI Linking Library from Human Chromosome Region 1q25-qter. Genomics, 29: 105-114.

Tanimoto, K., Liu, Q., Bungert, J., and Engel, J.D. (1999) Effects of Altered Gene Order or Orientation of the Locus Control Region on Human β-globin Gene Expression in Mice. Nature, 398: 344-348.

Tassone, P., Villard, L., Clancy, K., and Gardiner, K. (1999) Structures, Sequence Characteristics, and Synteny Relationships of the Transcription Factor E4TF1, the Splicing Factor U2AF35 and the Cystathionine beta Synthetase Genes from Fugu rubripes. Gene. 226: 211-223

Tezuka, T. and Takahashi, M. (1987) The Cystine-Rich Envelope Protein from Human Epidermal Stratum Corneum Cells. J Invest Dermatol 88: (1) 47-51.

Thacher, S.M., and Rice, R.H. (1985) keratinocyte-specific transglutaminase of cultured human epidermal cells: relation to cross-linked envelope formation and terminal differentiation. Cell 40: 685-695

Townes, T.M. and Behringer, R.R. (1990) Human globin locus activation region (LAR): Role in temporal control. Trends Genet. 6: (7) 229.

Trower, M.K., Orton, S.M., Purvis, I.J., Sanseau, P., Riley, J., Christodoulou, C., Burt, D., See, C.G., Elgar, G., Sherrington, R., Rogaev, E.I., St. George-Hyslop, P., Brenner, S., and Dykes, C.W. (1996) Conservation of Synteny Between the Genome of the Pufferfish (Fugu rubripes) and the Region on Human Chromosome 14 (14q24.3) Associated with Familial Alzheimer Disease (AD3 locus). Proc. Natl. Acad. Sci. USA. 93: 1366-1369

Tuan, D., Solomon, W., Li, Q., and London, I.M. (1985) The “β-like globin” Gene Domain in Human Erythroid Cells. Proc. Nat. Acad. Sci. USA. 82: 6384-6388.

Uberbacher, E.C., and Mural, R.J. (1991) Locating Protein-Coding Regions in Human DNA Sequences by a Multiple Sensor-Neural Network Approach. Proc. Natl. Acad. Sci. USA 88: 11261-11265.

Udvardy, A. (1999) Dividing the empire: boundary chromatin elements delimit the territory of enhancers. EMBO J. 18: 1-8

Vanin, E.F., (1985) Processed Pseudogenes: Characteristics and Evolution. Ann. Rev. Genet. 19: 253-273.

Venter, J.C., Smith, H.O., and Hood, L. (1996) A New Strategy for Genome Sequencing. Nature, 381: 364-366.

Venter, J.C., Adams, M.D., Sutton, G.G., Kerlavage, A.R., Smith, H.O., and Hunkapiller, M. (1998) Shotgun Sequencing of the Human Genome. Science, 280: 1540-1542.

Versteeg, R., Chan, A., and van der Drift, P. (1998) PAC analysis of 5 homologous clusters of locally repetitive sequences on chromosome 1 reveal a total length of 8Mb. Cytogenet Cell Genet 83: 174

Volz, A., Korge, B.P., Compton, J.G., Zeigler, A., Steinert, P.M., and Mischke, D. (1993) Physical mapping of a functional cluster of epidermal differentiation genes on chromosome lq21. Genomics 18: 92-99.

Walter, M.A., Spillett, D.J., Thomas, P., Weissenbach, J., and Goodfellow, P.N. (1994) A method for constructing radiation hybrid maps of whole genomes. Nature Genetics, 7: 22.

Wang, Y.C., Selvakumar, M., and Helfman, S.M. (1997) Alternative pre-mRNA Splicing. In: Krainer, A.R. (Ed.) Eukaryotic mRNA Processing. Oxford University Press, New York, pp242-278

Watt, F.M. (1989) Terminal Differentiation of Epidermal Keratinocytes. Curr Opin Cell Biol 1: 1107-1115.

Weber, J.L. and Myers, E.W. (1997) Human Whole-Genome Shotgun Sequencing. Genome Res 7: 401-409.

Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G., and Lathrop, M. (1992) A second-generation linkage map of the human genome. Nature, 359: 794-801.

Weterman, M.A.J., Wilbrink, M., Dijkhuizen, T., van den Berg, E., and van Kessel, A.G. (1993) Fine mapping of the 1q21 breakpoint of the papillary renal cell carcinoma-associated (X;1) translocation. Hum. Genet. 98: 16-21.

Wicki, R., Schaefer, B.W., Erne, P., and Heizmann, C.W. (1996a) Characterization Of The Human And Mouse cDNAs Coding For S100A13, A New Member Of The S100 Protein Family. Biochem. Biophys. Res. Commun. 227: 594-599.

Wicki, R., Marenholz, I., Mischke, D., Schaefer, B.W., and Heizmann, C.W. (1996b) Characterization of the human S100A12 (calgranulin C, p6, CAAF1, CGRP) gene, a new member of the S100 gene cluster on chromosome 1q21. Cell Calcium 20: 459-64.

Williams, M.L. and Elias, P.M. (1993) From Basket Weaver to Barrier - Unifying Concepts for the Pathogenesis of the Disorders of Cornification. Arch. Dermatol. 192: 626-629.

Wood, L., Carter, D., Mills, M., Hatzenbuhler, N., and Vogeli, G. (1991) Expression of Calcyclin, a Calcium-Binding Protein, in the Keratogenous Region of Growing Hair Follicles. J. Invest. Derm. 96: 383-387.

Xia, L., Stoll, S.W., Liebert, M., Ethier, S.P., Carey, T., Esclamado, R., Carroll, W., Johnson, T.M., Elder, J.T. (1997) CaN19 Expression in Benign and malignant Hyperplasias of the Skin and Oral Mucosa: Evidence for a Role in Regenerative Differentiation. Cancer Res. 57: 3055-3062.

Yamamura, T., Hitomi, J., Nagasaki, K., Suzuki, M., Takahashi, E-I., Saito, S., Tsukada, T., and Yamaguchi, K. (1996) Human CAAF1 Gene - Molecular Cloning, Gene Structure, and Chromosome Mapping. Biochem. Biophys. Res. Commun. 221: 356-360

Yang, X.W., Model, P., and Heintz, N. (1997) Homologous recombination based modification in Escherichia coli and germline transmission in transgenic mice of a bacterial artificial chromosome. Nature Biotech. 15: 859-865

Yeager, M., and Hughes, A.L. (1999) Evolution of the mammalian MHC: natural selection, recombination, and convergent evolution. Immunol. Rev. 167: 45-58

Yoneda, K. and Steinert, P.M. (1993) Overexpression of human loricrin in transgenic mice produces a normal phenotype. PNAS 90: 10754-10758.

Young Kim, S., Horrigan, S.K., Altenhofen, J.L., Arbieva, Z.K., Hoffman, R., and Westbrook, C.A. (1998) Modification of Bacterial Artificial Chromosome Clones Using Cre Recombinase: Introduction of Selectable Markers for Expression in Eukaryotic Cells. Genome Research 8: 404-412.

Yuspa, S.H., and Morgan, D.L. (1981) Mouse Skin Cells Resistant to Terminal Differentiation Associated With Initiation of Carcinogenesis. Nature, 293: 72-74.

Yuspa, S.H., Kilkenny, A.E., Steinert, P.M., and Roop, D.R. (1989) Expression of murine epidermal differentiation markers is tightly regulated by restricted extracellular calcium concentrations in vitro. J. Cell. Biol. 109: 1207-1217.

Zeeuwen, P.L.J.M., Hendriks, W., de Jong, W.W., and Schalkwijk, J. (1997) Identification and Sequence Analysis of Two New Members of the SKALP/elafin and SPAI-2 Gene Family. J. Biol Chem 272: (33) 20471-20478.

Zhao, X.P. and Elder, J.T. (1997) Positional cloning of novel skin-specific genes from the human epidermal differentiation complex. Genomics 45: 250-258.

Zimmer, D.B., Cornwall, E.H., Landar, A., and Song, W. (1995) The S100 Protein Family: History, Function, and Expression. Brain Res. Bull 37: (4) 417-429.

Zimmer, D.B., Chessher, J., and Song, W. (1996) Nucleotide homologies in genes encoding members of the S100 protein family. Biochim. Biophys. Acta. 1313: 229-238.

Zhang, J., Glatfelter, A.A., Taetle, R., and Trent, J.M. (1999) Frequent Alterations of Evolutionarily Conserved Regions of Chromosome 1 in Human Malignant Melanoma. Cancer Genet. Cytogenet. 111: 119-123

Zong, X.P., and Krangel, M.S. (1997) An Enhancer-Blocking Element Between α and δ Gene Segments Within the Human T Cell Receptor α/δ Locus. Proc. Natl. Acad. Sci. USA. 94: 5219-5224.

Appendix I: Details of vectors used for restriction analysis

The following table details the relevant features of cloning vectors used in the construction of genomic libraries used in this thesis. Libraries and relevant references are described in section 2.1.14.

Library | Vector | Size of recombinant clone vector | Cloning site | T7 promoter | SP6 promoter | EcoRI sites | BamHI sites | NotI sites | SalI sites
PAC RPCI-1 | pCYPAC-2N | 16015bp | BamHI-1bp | 15987bp | 7bp | 4 sites - 5178bp, 13243bp, 13729bp, 13835bp | 0* | 2 sites - 39bp, 15981bp | 1 site - 26bp
Cosmid ICRF112 | Lawrist4 | 5407bp | BamHI-1bp | 5364bp | 12bp | 5 sites - 293bp, 1524bp, 3241bp, 3291bp, 5028bp | 0* | 1 site - 5400bp | 0
BAC | pBeloBAC11 | 7507bp | HindIII-384bp | 311bp | 12bp | 2 sites - 333bp, 1207bp | 1 site - 354bp | 2 sites - 1bp, 631bp | 3 sites - 366bp, 646bp, 7030bp
Cosmid MMPG-c125 (chicken) | SCOS1 | 7937bp | BamHI-32bp | 36bp | 0 | 2 sites - 1bp, 62bp | 0* | 2 sites - 7bp, 57bp | 0

* 1/4 chance of creating a BamHI site.

Appendix II: Summary of hybridization and PCR based confirmations that establish the overlap of the integrated bacterial contig presented in figure 3.12.

Table on following page

(left) Sources of probes, (top) target clones. The exact nature of the probe used in each case was: (0) hybridization to colony; (1) hybridization to Southern blot; (2) gene/marker by PCR; (3) T7 clone end riboprobe. In the case of BAC and PAC probes, hybridization refers to labeled insert. In the case of cosmid clone cP0780 the probe used was isolated from internal Sau3AI fragments. Probes used for genes/markers/ESTs are indicated in text along with relevant primer sequences. Addresses of clones are as in figure 3.12, except that cosmid#1 refers to ICRFc112L1963 and ICRFc112E2275, while cosmid#2 refers to ICRFc112P1146, ICRFc112K182, ICRFc112K1231, ICRFc112D0853, ICRFc112H1166, ICRFc112A0454, ICRFc112O1459, ICRFc112N0667, ICRFc112F124, ICRFc112G0160, ICRFc112N15101, and ICRFc112L0567. ‘nt’ indicates clone not tested.
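The numeric codes in this legend can be read mechanically. The following is a minimal Python sketch, not part of the original analysis: it assumes that a table cell lists one or more codes separated by commas (for example "0,1,2"), that an empty cell means no confirmation was recorded, and that "nt" marks an untested clone. The names `PROBE_METHODS` and `decode_cell` are hypothetical and introduced only for illustration.

```python
# Hypothetical decoder for the probe-type codes listed in the legend.
# Assumption: a cell holds comma-separated codes such as "0,1,2", or "nt".
PROBE_METHODS = {
    0: "hybridization to colony",
    1: "hybridization to Southern blot",
    2: "gene/marker by PCR",
    3: "T7 clone end riboprobe",
}


def decode_cell(cell):
    """Translate one table cell into a list of confirmation methods."""
    cell = cell.strip()
    if cell == "":
        return []  # assumed: empty cell means no confirmation recorded
    if cell.lower() == "nt":
        return ["not tested"]
    # Each remaining token is expected to be one of the codes 0-3.
    return [PROBE_METHODS[int(code)] for code in cell.split(",")]


if __name__ == "__main__":
    for method in decode_cell("0,1,2"):
        print(method)
```

Under these assumptions, a cell reading "0,1,2" would mean that the overlap was confirmed by colony hybridization, by Southern blot hybridization and by PCR for the gene or marker.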

[Table: matrix of hybridization and PCR based confirmations, with probe sources (left) against target clones (top); the cell entries are not legible in this copy.]

Appendix III: EDC novel sequences (ordered as described in text)

EDC NOVEL SEQUENCE 1

“Exon trapped” clone 127E12sB4 (section 4.4.6, table 4.3)

ATACTCCCACCATCGTCTCCCGCAAGGAGTGGGGGCAAGACCGCTCGCCTGCAGGGCCCTGCTGACCCTG CCTGTGGCCTACATCATCACAGACCAGCTCCCAGGGATGCAGTGCCAGCAGCAGAGCGTTTGCAGCCAG ATGCTGCGGGGGTTGCAGTCCCATTCCGTCTACACCATAGGCTGGTGCGACGTGGCGTACAA

EDC NOVEL SEQUENCE 2

“Exon trapped” clone 127E12sC1 (section 4.4.6, table 4.3)

ACGTCTTTGACACATGGTACAAGdTNTTATNCTGNNGGNCCNTCAGCATTATACNGNGCNNGATCATAC NGTGGNGNGNTGANNTGNCGAGGGGANTTTGCTCNTNTGNGNGGNTNAAAANNNGNGNNNNTNGGGNN NNNNTNNGNCTNNGNNCNNNGTNTNGCCTCNAGNTCNTNNTCGNTNNCCGTNNTTCNNGNGANTTNNN CGGNNGNGGNNGNNTG

EDC NOVEL SEQUENCE 3

“Exon trapped” clone 127E12sC3 (section 4.4.6, table 4.3)

GTCCACACTCTCCCGGTCAGAGTCCTGGGACCACATGGGGACGCTGCCATGGCTTCTTGCCTTCTTCATTC TGGGTCTCCAGGCTTGGGGGGATnrCCACAGTTCTCCTGGAGTGAGACCCAAGCCAGAGGdTGTCCCA GAGGCTTATGGACCTGTTTGTCAAGCATCTCACAGTTCATTCACAAGGGTCCGCAATG

EDC NOVEL SEQUENCE 4

“Exon trapped” clone 127E12sD17 (section 4.4.6, table 4.3) E2D17exl

CTTCACNTGGNGNANATnTAGACATCAGGACGGGTTCCAAGATGGCCGAATAGGAACAGCTCCAGnTA CAGCTCCTGGCGTGAGTGATGCAGAAGACAGGCGATTTCTGCATTTCCAACTGAG

EDC NOVEL SEQUENCE 5

“Exon trapped” clone 127E12sD18 (section 4.4.6, table 4.3)

ACCACANTGNTAGATGGAGAGAGAGACTCNGAGCCTCTACCAGTCCCAACTAGTCACGGAGCCCnrCC AAAGATGCCAGGCTGAAGTGGAGCCTCTGGGTGCAG

EDC NOVEL SEQUENCE 6

“Exon trapped” clone 127E12sF7 (section 4.4.6, table 4.3)

AATAATCAACAGAGTAAACAGACAACCTACAGAATGGAGAAAATGTITGCAAACTATGCATCATGACAA AGAGCTAATACCCGGAATCTACAAGGAACTCAAACAACmTAAGGAAAAAAGTCAGCTCATTAAAAA

EDC NOVEL SEQUENCE 7

“Exon trapped” clone 128L15sA3 (section 4.4.6, table 4.3)

GATGGAGAGAGAGCTCTGAGCCTCTACCAGCCAACTAGTCACGGAGCCTTTCCAAGATGCCAGCTGAAG TGGAGCTCTGTCCCAG

EDC NOVEL SEQUENCE 8

“Exon trapped” clone 128L15sA16 (section 4.4.6, table 4.3)

ACCAGCAGGAGCATGCATTAAAGCCCGGACTTCACCACTGTGCAATGTGTTCATATACATGTGTATGCAG GATTNTGTGTGGCATAAGCTTTCAGTTCTTTTGGGTAAATCCCGAAGAGTGTGATTTCTGGATCCTATG

EDC NOVEL SEQUENCE 9

“Exon trapped” clone 128L15sB6 (section 4.4.6, table 4.3)

TCTGAGCCTCTACCAGTCCCAACTAGTCACGGAGCCCnrCCAATAGATGCCAGGCTGAAGTGGTAGCCT CTGTGTGCAG

EDC NOVEL SEQUENCE 10

“Exon trapped” clone 128L15sB7 (section 4.4.6, table 4.3)

AAAAGCAAGAGGAGATCAAAACCTGCAAATGGAAGGCTTCTGATTAGCATTACAGAATCTCAAACCTGA AG

EDC NOVEL SEQUENCE 11

“Exon trapped” clone 128L15sB17 (section 4.4.6, table 4.3)

GTCTCTGGTTTGATGACCTGCCCnTGGCGCCCCATTCTGGAAGATTCAGTCATCAGC CTGTGCAGGCCCTCCCAAGGGAGCTGTGTGGGGTGAAATGGGGCCAAGTAGAAGGG GCCGGCTGGGAGAAGTCTTGTGGTGGGG

EDC NOVEL SEQUENCE 12

“Exon trapped” clone 128L15bG14 (section 4.4.6, table 4.3)

AGAGCACTCTTAGAGTTAGAGCCAATnTGACTCTCCATTCTACCCdTACAGCTCCGTTACTTTGGACAA CTTACTTGGCTTTTCTGGCCCCTCCCTGTCCACTTCCCCCTTTAAAAATGTGAAATGGGAGTGATCTCATT AGTAGCTACCATACAGGGTGCTGTGGGGATTCAATGGGATGATGCACCTGAACA

EDC NOVEL SEQUENCE 13

“Exon trapped” clone 128L15bH15 (section 4.4.6, table 4.3)

GGGAAGAAAGGGGAGGCAGACAGACACGTGTTCGATGTGAAGCCTAAGCACTTGCTCTCTGGGAAGAGG AAAGCTGGCAAAAAGGACAGGAGATAGTATCCGTTTGATTGTGGCTTGGCTAGACTGTrGCTGTTTATTT CCTG

EDC NOVEL SEQUENCE 14

“Exon trapped” clone 128L15bJ3 (section 4.4.6, table 4.3)

ATGGCAGTGTTCTGAAGTTTGTTATATTAAGGAGTGTATTGCCnTCACACATGTGAGGGTCTTGGGACAC AGGGCTGnTTCTGAAGTTCTATGnTGTdTGGAATTTGTTGAGCCCTGGCATGTAGATCACAGTAGCCT GGGTTCAGCTGACTCAGGGCTCCAGTCTTTAGCAGCGGTAACAGCAGCCAAAGCCAGACnTATAGCAGG AGGTTATTACTATCTCTATCCTAGACCCTTCCTCnTGTTCACGAGCGTGGGCAGGGAGGAAGGAGCCCTT GAGGAACTAGACTGnTGTGGACTTTGCTCTTGAGATAGCTGTGGTAAGGGCACTGAGCGGATGGTTCTT TTCACTTAGCAGATACCAGGCCTTACATTGGTACATCGCCTATTTGGGCTGGTGGTCAGAGGATAAGCTG ATAGAC

EDC NOVEL SEQUENCE 15

IMAGE clone 1676497 (section 4.5.2) poly A tail (3-5) GGGGGTnTAAATATTTTATrCATTATAAAACATATAGGACCACAACCCAGAAAATGAGGAAAAGGGATG AGAAGGGACAGGAGAAGCATCCCAGGGAAGCAACACTGACCATGTGCCGTGGGACCCAGACCCAGCTCT TACTTGTAGGCCACATCACACCCACTGTTGTTGTGGACATGATGGGCCTGCAGTTCCCGCAGTCTCTGGCT GCAGACTGTCTGGTCGTGACACTCCAGTCCAGGGACATGGTGTATAACAAGGACATTCACTGGCGTGGTC AGCTGAATACTGCAGCCAACAGCTTCTGCCCCCCATGCCTTGCGAGAGACCGTGGTGGAGACATCTmr CAGTGAGCTGGGAGATGTTCTCAAATAGGTACTGGACCCCTCTGATACCTGTTTAGCTTGTGTTTTGTTCC AGGAGGAATCACCCCAGGCCTGGATACCCAGAGCAGAGAAGACAAGAAGCCACAGCAGCATCCCCACG TGGTCCCAGGACACCAATCTGGAGAGTGTGGATGGCAGCAGATATCTGTGGGTCCTGTGCTGTCCCAGCC CAGCAGGTCAGGGTGCAGTCAGCTTGCCCCTGATTGGCCAGCAGCTCCTTCCCTTCCAGACTTGGCCTCC AGGCCCTGCCCTTCTTCACCAACTGGCACCTTCTCCTCCAGCCTTCACTCTCCCTAGCTCCTTGAGGGCAC TGCCCTCCCTGGGAATGGAGCACTCGGGCTCTAGCTTCCCAGAGAGAAGAGAGCCCAGGATCCCTAACA AAAGATGGGGAGAAGGGAAGACAGAGGGCAAGGCTCTGAGCCAGCTCCTTTCNCAGCACCTGGAGAAAT GCAAGTGG

EDC NOVEL SEQUENCE 16

Direct sequencing of PAC clone 127E12 with primers derived from clone 127E12sC3 (section 4.5.2, figure 4.5)

E2C3F: TGGGGACGCTGCCATGGCTTCTTGCCTTCTTCATTCTGGGTCTCCAGGCTTGGGGTAAGTTTTATrTACTTG CATCAGAAGTGCrnTGATGAGCAAGAGAAGACCGTTCTCAAGTTTAATAATAGAAACCCATCCAAGTGC TTTCACnTGTrAGTGGATAGTGTAGCnTGCAATTTGGAGCCCTGTTCTAGAGnTACAGAnTmTCAA ATCCTTGCCTGTCCTTTGTTATGGTCTTCAANAAATTATTCAAnrCTTGGAATCTTATTTGCTCCCATGAT GAAAGGATAATATTGCTCATTNCTTAGAGTrGC

E2C3R: AACAGGTCCATAAGCCTCTGGGACAAGCCTCTGGCTTGGGTCTCACTCCAGGAGAACTGTGGAAAATCCC CTGCAAGAGAGCACTGGTGCATTGGTGCACTCdTGGAGTGCACATGGCCAGCAAGCTCAGAGGCTGGG CCAGACATCCCACTCTGCAGTGCACAGCCTCTCCTCCTCCCCAGGTCACCTAACGTCTTCTAGCAAGCAC ACAGAGGAGAACCACAGAAGTCTCTGGCdTGGCCTGCCTCTGCdTCCACATCTGCCCAGAGAGGCCTT TGTCCTCCCACCTCCCCCAGCCTCCCTCCATTCCAGGAGCACTGCAGTCTCACCTGCTGCdTATCTGGGC AGTGCCACCnrCCTGGTTCACCGGCAGACCTACTAATCAAACCGT

EDC NOVEL SEQUENCE 17

Direct sequencing of PAC clone 127E12 with primers derived from IMAGE clone 1676497 (section 4.5.2.3, figure 4.6). Sequences presented as seen in figure 4.6, left to right:

ATGTTGCAGCNCAACCCTCAGGCCTTGCAGGCTTTCCCAGTTGGGGCTNAGGACGAGCTCTCCGGAAGTA TTGGCCAATCTTGCCCTTTTCCCAGGAGGGGCTTATTCACAGGCCGGGTACCCAGGTGCTCTCAGCCTGA GTCTGGTCCCACTTGCATTTCTCCAGGTGCTGTGAAAGGAGCTGGCTCAGAGCCTTGCCCTCTGTCTTCCC

TTCTCCCCATCmTGTrAGGGATCCTGGGCTCTCTrCTTCTCTCTGGGAAGCTAGAGCCCGCGTGCTCCA TTCCCAGGGAGGGAGTGCCCTCTAGGGCTAGGGAGAGTGAAGGCTGGAGGAGAAGGCGCCAGTTGGTGA AGAAGGGCAGGGCCTGGAGGCCAAGTCTGGAAGGGAAGGAGCTGCTGGCCAATCAGGGGCAAGCTGAC TGCACCCTGACCTGCTGGGCTGGGACAGCACAGGACCCACATATATCTGTGAGTTGAGTATGTGTTTCTT CTTCAGAAGCCCTCTTCCAGCAAAGGGACTGGAATCCACCAGAAAGAGTGTrAAGGATTGGATTTCCTCT GCCCTATAACTAGGAACCCCTCCTACT

TCCCGTGGGGAGCGTGCGTGGCTrCTrGTCTTCTCTGCTCTGGGTCTCCAGGCCTGGGGTAAGmTCnT CCCTGGGCCAAAGAGCAACCGAGAAGGGTGATCCTAAAGTCTCATAATAGGAATTAAAAAATACCTACT GTCCATCTGAGCCTGAGGAGTrCAAGGCTGCAGTGAGCCGAGATCCCACCACTGCACTCCAGCCTGAGTG ACAGAGGAAGAACTrGTCTCAAAAACAAACAAACAACAACAAACAAAANTACTCTCTCrrGGCCAGGGG TGGTGGCTTATGCATGTAATCCCAGCACnTGGGAGGCCTAGGTGGGCAGATCACGAGGTTAAGAGATCG AGACCATCCTGGCCAACATGGTGAAACCACGTCTCTACTAAAATCAAAAATTAGCTGGGTGTGGTGGCAC TTGCCTCTAGTTCAGCTCTCGGAGGAGGTGAGGTNGATATTGCTrGACCGGAATGATGTGTnTGGGACC ACTTACCCCAGGCCTGGAGACCCAGAGCAGAGAAGACAAGAAGCCACGGCAGCATCCCCACGTGGTCCC AGGACACCAATCTGGAGAGTGTGGATGGCAGCCTGAGAGAGACGCrGACAGTrGTTAACATGACCATrC TCAATCTGCAGAGCATCCCAAACCCTGACTGCAGTGGGTGCCATAGTrGCCTCCTCTTACCATrCCTGTGG CCTCCTGAGGGCCAGCACCTGCCTCAGTCnTCTTNAGGGCCCCAGCATACAGCAGGGTGCTTCAAGCAT TCCTTGTCCTATTATTTACCCGAGGAAACCTCAATTrCATCACGTGCCCACGGNATTGCAAGGCTTGTGTA CACACTTGGCACCAATGCAGCCTTATnTAATTAAGCCCTTACANNTAATTCTGGCACTGTAAAACTrGTC AANTCCCATnTTACNNCCAAGGGAAGTANGANCCAA

AGCCTNTNGTTGAGACCCCCAGATAAAAGCAGTrAGGTCATGGTCATAGAAAGGGCCTCAATrATCAGCT CNNNCCTTGGCCCTCTTCCAGGCAAAGAGATGGTACACCTGCGTACCAGTCGGAGAGCTAAGCCTCATGT ATAGGAAGTCGTTTGCAATGTGTAGTGTGCTGTGCAAGCCAGGGTrGGGGAGGGTAGAGAATACAGTrAT TACTTAACCTTCCTrGCTTCCTCCTGGGAGCTCTCAGAGACAGGCACCTCACAGAAAAATCCAAAGGATT GGGAGTrTCATGGGGGTnTAAATATTTTATTCATTATAAAACATATAGGACCACAACCCAGAAAATGAG GAAAAGGGATGAGAAGGGACAGGAGAAGCATCCCAGGGAAGCAACACTGACCATGTGCCGTGGGACCC AGACCCAGCTCTTACTTGTAGGCCACATCACACCCACTGTTGTTGTGGACATGATGGGCCTGCAGTTCCC GCAGTCTCTGGCTGCAGACTGTCTGGTCGTGACACTCCAGTCCAGGGACATGGTGTATAACAAGGACATT CACTGGCGTGGTCAGCTGAATACTGCAGCCAACAGCTTCTGCCCCCCATGCCTTGCGAGAGACCGTGGTG GAGACATCTGTGGGAAGGCCTTGGGGACGTCACTCAGCGCACCTGCCCCATCACTACCGCATTAGCCCGC CAAGCTGAACATGCAGGCATGGGCTGTGTCGGCCCCTTGGTAGTGGAAGAGCCTrCAGCAGGCAGGTGC ACCTGCCCTGCCCACCTCTACATACCCATCTCTCAACACTCACCCGAACACAGACTCTGCACCTGGCCAG GGAGCCTAATGCTTGCTCCCATCCATAATGTCCCTACTATGTCTrCCTGCTGTGGTCAAGGCTCTGCACTT AAGGAAGCATACTGCTTATCCTGGGAAGCCTGACAGCCTGGCTTGGTGTCACCTCAGCATTCAATGTGTG CTCTNNGATCATTTCCTCTGCGGTATGAGTGTTCCCTAGGGGACTGGATCTTCTCCCACGCAGCAGGGGC TATGTCTCCCCCTCAGTCCTGAGGACTTCATGAGGGCAAGGTTTGGGTCTCCCTCCACAGGGAGTCCCTG TCTGCAGAGGCCTGTTTCCCCTCTTCCTGGGTTTCTAGCCCTCCAGCACCAGCCTTCCATCCCAAGGACCT GGGTAATrAATGCAGGTGGTTTACTGGTACTGCTCTCTGCACTTCAACCCCATnTCCCCCTCCCAGTCAC ATGrGGCTTrGCAGACCTCAGAAACATTTCCGATCACATGAAAAACCCCAGCTCGTCCCAAAACCCCCTT ACTATCCAGATCCAGTTTACCTTTTTCAGTGAGCTGGGAGATGTTCTCAAATAGGTACTGGAGCCCCTCTG ATACCTGnTAGCrrGTGTnTGTTCCAGGAGGAATCACCTGCAAAGGAATATCCCATnTGCATCAGAGA GCACCAATACCAGCCATCATGGTGGCACCAGCTCTGAACAGAAACAGCCATCCCCTCTCCACTGCCCCTC TGCCTCCTCAAATCCCAGAGCCTCCTCTAAGAAATCTCTCTGGGCTGACCAGAGGGCCACCTCTAATAGC AGCAGGTGGCATAACTTCCTTCCATGCCAACTCTGGGAAATCCCCTGGGATAGGCAAGGAGTTGGAAATG TTATCCTTTTCTCATGAGCTGTATCTACACACACAATTGGCGGGGAAAA

EDC NOVEL SEQUENCE 18

IMAGE clone 113442 (section 5.3.3)

GATTCTTGAGACAGACACGCCACAGAGGGGCCACCAGACTTGAAGGACTCCAAGAAGGACCrrCAACTG AAGACCTCCTCTCAAGTAGAGGGGTCTCCAGATCGAGGGATCACCACAGAGGGGTCCCTAAGCCACAGA CATTTAAAAAGGATCCCGGATTGTGAGACCTACAGAAGAGGGGGnTCTGGAGCTGAATCTCCTGCCCCC ACCAAAACTGAGATGCCGTGCCTGGTCTGCTGCCCAAGCTCAACCCCTGTCCTCTCCCTCTTTCCCCATAT GCCCCCTCCAGGCTGAGACTGATGTGATAACACCAAGGCAGGGACCrrrCTGAACCATTCCTGCATCTTGG GGCTCCCTGACTCTCCCTCTCTGCTCCTTCTCTCTTCCACAGGTCAGCCCTGACAAAGGTCAGCTAGCCCC TTGAGGACATCAGCnTGGCCTCAGGGTCCTAATGGCAGCAGAACCACTGACAGAGCTAGAGGAGTCCA TTGAGACCGTGGTCACCACCTTCTTCACCTTTGCAAGGCAGGAGGGCCGGAAGGATAGCCTCAGCGTCAA

CGAGTTCAAAGAGCTGGTTACCCAGCAGTTGCCCCATCTGCTCAAGGATGTGGGCTCTCTTGATGAGAAG ATGAAGAGCTTGGATGTGAATCAGGACTCGGAGCTCAAGTTCAATGAGTACTGGAGATTGATTGGGGAG CTGGCCAAGGAAATCAGGAAGAAGAAAGACCTGAAGATCAGGAAGAAGTAAAGCCGCTGGCTGAGATG GGTGGGCAAGGCAAACTGATCAAG

EDC NOVEL SEQUENCE 19

Entire sequence described in figure 5.4 (sections 5.3.2-5.3.6).

GGGTACTCACGGTTGCCTGGGCCTGAGGTGGGAGCACîGTrCTGGACAGATGGCCAAGGTTGCAAATGTG GCTGCTGGAGCTGTGTGTGGAGACCTGTCTTCAACAGTCCCTCCCCCGCTGACCCCCCCACCGACTCCAC CCCTTCACAGACGAACACCCTGTGGGGCAAGGGGCCAGGAGCAGACAGAGGCGCCCTCTGTCTCTTGGG CTGGAACAGGCGGCCCTTGTCTCnrCCCAGCdTGCTTGCCCCGTTCCnTCTCTCCAGGCAGCAGTGGA TGGTGCAGGGGAAAGAGGTGGGAAGGAGGTCCTGGGAGGGACACTGGATGTCTTACCCCAAGCTGGGCC TTGCAGTACCTGTGGCTGGCTGTGCTGGTTGAGCCCGGTAAGGGGGGAATTCTGCTGAGGGGGTCTTAGT CTGTGGCTCCTCCATGGGGCCTCACTGAGAGGACCCTCAGTGAGAAACTTAGTCTTGGGAGTCTGAGAAC CCACAGAGGGATTTCAATCCAGGACTCTTCTCTGGGGGTCTCAACCTGGATCCCCTCGTGTGTGGGCCTC AGCCTGGGTCTCTAGGATCTTGGTCTGAAGCATCCTTTGTGGATCTCATCCAAGGGCCTCTGTGTGTGTCT CAGATTGGGCCTCTCAGTGTTGAGCTCCCTTGATTGGTTGCCTCCCTCTTCCCCCCTCAGCATACACACTC CTGATTAGACATGGGTATGGTGGCAGCAGCCCCCAGTATTGAGAGGACTAGGACCCTGGGACCCCTGCC GAACGAGATCTGTCCAGGCAGCTCCTCTCCCTGACCAAGTGGGGGCGCTCGTGTGGCTGGCAGGGAGCC CGGGTCCAGACCTGAGGGCTCAGGGCAAGCGGCCTGATTGGGCTGTCTCTGGTCTTCAGAATCGACCACG GAAAnTGACACCTCCGGGCTTGGAAGCAGCTCTCTCCrCdTCCCCGCTGCTTATAAACCTCAGCCCTGA GGCTCCAGCTCACTCTACCCCATCTCCTTGCCGGGTGAGTCTCATGGCTGACCCTGGAGGGAGGCTGGAG ATCATTTnrCTTCCTGGGCTCCCAGGTCCCTCCCCCAGAGCCAGCTCAGACAGGAATTCAGAGAAGAGA GCTTTGGATGATCGAGCAACAGCGCGTGCAGGAGTGTGTGACGGGGTGGACAGAAAAGAGAATGCCTCT TCCAGCCCCACCCGCCGCACCCCAGGGGAGTTGTGAGTGCCTGTACTAAACCCGTCCTGGAGCAGGGAG TCAGAGACGCCCCGTTCCAGGACCCHTGTCACCGGGAGACAGGAGCGGGGAGGATCTGTGGGGTCCTGG GTTCAGCACTCTGCCCTGCTGCCGCTGAGGAGAGGTAAGGGTGGAGCGGGGCAGAGAAAATGGGTCTGT TCAGACGGAGCCAGCTCTGACCACAGGGGACATTTGCCAGACCCTGTGCTTCTCTGCTGGGAGGAGCGGT TAGAGTGTGTGTGGGCGGATGCGAGTGGCCGGAGACCAGGGCCCAACATAAACAAGCTTTGGAGACAAA CAACCTTATGACCTGTGAGTCTGGGTGAGACAAGAAGGATTCTTGAGACAGACACGCCACAGAAGGGCC ACCAGACTTTGAAGGACrrCCAAGAAGGACCTTCAACTGAAGACCTCCTCTCAAGTAGAGGGGTCTCCAG ATCGAGGGATCACCACAGAGGGGTCCCTAAGCCACAGACATTTAAAAAGGATCCCGGATTGTGAGACCT ACAGAAGAGGGGGTTTCrrGGAGCTGAATCTCCTGCCCCACCAAAACTGAGATGCCGTTGCCTGGTCTGCT GCCCAAGCTCAACCCTGTCCTCTCCTCnrCCCATATGCCCCTCAGCTGAGACTGATGTGATACACCAAGC AGGGACCTTCTGAACCATTCCTGCATCTTGGGGCTCCCTGACTCTCCCTCTCTGCTCCTTCTCTCTTCCAC 
AGGTCAGCCCTGACAAAGGTCAGCTAGCCCCTTGAGGACATCAGCnTGGCCTCAGGGTCCTAATGGCAG CAGAACCACTGACAGAGCTAGAGGAGTCCATTGAGACCGTGGTCACCACCnTCTTCACCnTGCAAGGCA GGAGGGCCGGAAGGATAGCCTCAGCGTCAACGAGTTCAAAGAGCTGGTTACCCAGCAGTrGCCCCATCT GCTCAAGGTAGGCAGAAGTCTGGGACTGAGATTTTGTGAGATGATGATTATGGGACACTGTGGTTAGGAC GTAACTGTGCCATACGGTGAGGTGACCACAAGAAGGGTGTGGGTGGCTGCATGTGTCAATGGTTGCAGGT TTTCTnTGnrnTCGAGACAAAGTCTCACTCTGTTGCCCAGGCCGGAGTGCAGTGGTGTGATCrTGGCT CACTGCAACCTCCATCTCCGGGGTTCAAGTGATrCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAG GTGTGTGCCACCACGCCGAGCCTATTTCTGTATmTAGTAGAGACAGGGTTTCACCAGGTTGGCCAGGCT GGTCTCGAACTCCTGACCTCAGGTGATCCACCCGCCTCGGCCTCCCAAAGTGCTGTGGTTACAGGCGCAA GCCACTGCGCCTGACCCGGTTGTGAGTTTTCTTGCCTGGTGTGTACATGTGTGTGnTGTGTGTCCATGCA TATAATTAGGTAAGCAAAGATACTATGTGCCAGTGGGAGTATGGTCTTGTGTCAGAAGATGCTGTTGGGT TCTTGTGTATGCTCTGTGCCTTTCTGGTCTAAnTCCCCAAGACAGGACTGGACCCAGAGCTGGCTTCCAG GAGATTCACAGACTGTCAGAGCrrGGAAGGAGGCCAAGAGAAAACACAGTCCGCATTCCTCACTrCACAT ATATGGAAACTGAGGCCTAGCAGTGGAAAGGGGCTTATCAAAAATGAAAACACAGCCAAATTCATGGAG TCATAGACCTCAGTGTAGGGGGTCTGTGGGTGCATCTGTGAAGACTCCCTACCCGCTATGGGCATCCCCT TACTAGGGTGTAAGAGTCATTCCTTCTCTGCACTCACAGGCTGGAGTCAGGTdTAAAnTCATCGACrCT GGCCTCGAAGAGGCCTGGTCATTCTGGCrGTTGACAGTTTGTTTTACATCCTTTCTGAAAATCCACAGGCA CGTTGAAGTGTGTGAATATGGACTCdTGAACACAGTGAGTCTCTGAAACTCATGGTCnTCATCTGCAA

TGTGCGATCACAGCTCATTGCATCCTTGACCTTCCGGGCTCAGGCAATCCTCCTGCCCCAGCCTCACAAG

GTCACCGAGGCTGGAGTGCAATGGTGCGATCTTGGCTCACTGCAACCTCTGCCTCCTGGGTTCAAGTGGT TCTCTTACCTCAGCCTCATGAGTAGCTGGGATTACAGGCAACCGCCACCATGCCTGGATAATnTTGTATT

TTAGTAGAGACGGGTTTCACTATGTTGGCCAGGCTGGnTCGAACTCCTGATCTCAGGTGATCCACTGCCT CGGCTCTCAAAGTGCTGGAATTACAGGCGTGAGCCACCGCGCCTGGCCTAArmTAAGTTACTTGTAGA GATGTGGTCTCTACTGTGTITCCCAAGCTGGTCTCAAACTCTAGGCCTCAAGTGATCCTAGTAGT

EDC NOVEL SEQUENCE 20

Sequence of 1.6 kb Sau3AI fragment from cosmid clone ICRF112cL0496 (section 5.3.9, figure 5.9).

TATTGATATTGCAATAATTAATTGAATAAAAATGTGGAAATAACTATGCAAACTCTGTCn'AGGArrAAGT TAGTGATACTAACTCCCTGGTGACCAGATAGCGGTGGGGGGGAGTrCTGdTAAGGGGAGCATGGCCCA GATGAGGCCTGGAGGCCCTGCCTGAGGTGGTGTGGACTGATGGGTGGTGCTCGTTTGGCAGGGGCATTGC CCAGAGGGACTCAGCCTGCGGCTGTAGGGGGCCCCCCTGTAAAGATGCCTCACTCTTAACTTGCCAGAAG TGGAGGGGTCCTGATGGCCAGGGCCAGCCAGCGGGTGGAGCCnTGCCCTGAAGGCTAGGACAGAGAGG GAGTTACAAAGAGGAGCCAAAGTCACTTCAGGGAATAGACAGCCTGGGTGAGGTGAGGAGAGGGGCTG GGGGGCCAGGGCATGGTGGACTGAAGGGTCAGAGATAAGGTGAGAGTTCATAGCCAAGGCTGAGAGTGG TAAACAAGTGGGAGCTGGTGCATCTTGTGGGCAACAGGGCdTGGACCCCTCACAGAGCACTGGGCAGT GGCCACTGATTCCAAAGAATGGTGGCTrCTCCCCAGACCCTGTTGCCTAGAC'nTAGCATCAGGCTGTGT CTCCTTGTGTGCTTGGGTAAAGAGGCAGTGTCCATTCAGGCTTCATGTCCCAGGGCCAGTGGGGCCTTCT CAGGGGCTCTGACCCAGCCTGGCAGGGACACCTTCCACGCCnTCCACATTGGCTGCTCCTGCTGCTCTG TGCTTGGCTCCCCGAGGCTGGTCTGGGGCTCCCATGCTGCTATCTCTGGAGGATGGGGACCTGAGGTGAC CACCAGTGAGCCACTGCTTGGCGGCAGCATTGATGCCATGGCTGGAAAGCCACATGTTGTTAGGTrCACC CTGATGCACCTGGTGGCTGCTCCACTGCGTGCAAGCTGCTGCTTTGGACTCCrrATCTGGGCAAGCACATT CCAATAGGGGAGCAGTTGGTCTCTrCTCTTCTGGAGCTAAAATCTGCTnTCAAGGGTCCCAGCTCTGCCC TGCCCACCCCCGCCCCCGCCGCACCCACCCAAGAGGTGGGCCCCTGGACATATCnTGCCCTCTTGCGTAC AATAGCCACAGCTCTTTTTCAGCACCTGCTTCACTGACCTGCTGCCTTCCCTGCACCTTCCTCTCAGTGCC CCTGGCCCTCCAGCATGGGGCTCAGTGCAGAGGAGGCAGCACTCCACTCTCATTCTCCAGATGGTCCCAG AATGCCCCTGTGAGCCAGGCACAATCCCAGGCCAGGTTCTAGCTCCCAGGAAGCTCATNTCCTAGTGATG TGTGTGCTGGAGTGATGGGGGTGGAAAGTGTGGAGGCCCAGGGCCCTGAAACCAGACAGCCAGGCTTGC CTCTCCCCAGCCATGGGCCTCAGGTACAGTGTCCGCTTATGAGCCACAGTTTCCTTATCTGGAAATGTGAT ATTTGGTTTCC

EDC NOVEL SEQUENCE 21

Chicken S100A6 locus (section 5.5.4, figure 5.11)

AGCCTTCCTTCCCGTTCCGGATCCTGCCcTGCTcCAgCGGTAAGTTTCCCnTCCTCCTTCCCATTCCCTTTT CACCAAACCCCTGAGGCTGCCCCGGGGCTCTGTGCTGCCCCCCAGATGTGCAGAGTGCCCCATCCAGATG TGGGACCCCCCCGCCCCCAATGGAAGCCAACAGCTCCCCCACGTCCnrCCCATCTCAGGGCTGGCGAGG CAATGGGGACTGGAGACCCCACGAAGGCTTCTCCCCATCCGCCCCACTCGTGCCCCCCTCTGCCCCCCAT CCCAGCCCCCGGCAGCTCCCACAGCCCCGTGGCTGCGTGCGGAGGCGGCGGGCACTGTGCCCTATGAGT CAGCTCAGCCACAGCCCTGGTCTGTGCAACGTTCCGGGATGGGGGGGGGGGGGGGCACAGAGCAGCCCC TCACCTTTGGGGCTGTGATGAACTGGGACCTCAGGAGCTGTGCCCCCAGCACCGGGCTGATGGGGGGGGT TGGGGGGCTATGGGTGGGCTGAGAGnTGGGAGAGTTGGATrGGGGGnTGGTGGGTACTGAGCTCTGCC CTGGTTCACAGCCCAGCCCGCAGCCATGGCAGCCCCCCTGGACCAAGCCATTGGGCTCCTGGTGGCCACT TTCCACAAGTATFCGGGCAAGGAGGGCGACAAGAACAGCCTGAGCAAGGGCGAGCTGAAGGAGCTGATC CAGAAGGAGTTGACCATCGGGCCGGTGAGTGCTCCCCAACAACGCCACCACTGCTGCACCAACAGCGGA GACACACCCAGAATATGGGCATGGGACGGGCAGGAGGGAGGTGACACGAGGTGTGACGTCACCTGGGG GGGGTGGCACAGCCCTGCCCCCACACCTGCCnTATGGGATGCAGTGTGTACACCGGTGCnTGCTGGGG AGGGGGGACGTGGGTTGGGGATGTGCTCGTGGTCGGGTCCCGTGGTGCCCCTCAGCTGTGTGGTCCCCCT TTTCTGGGTAAGTGTGGGGCCACGTCCCCCCCCTGACCCCCCGATGGTTGATGGCTCTGTGCTCnTGGGG CTGCAGAAACTGAAGGATGCGGAGATCGCGGGGCTGATGGAGGACCTGGACCGCAACAAAGACCAGGA GGTGAACTTCCAGGAGTACGTCACCTTCCTGGGCGCCTrGGCCATGATCTACAACGAGGCCCTGTTGCAG TACAAGTAGGGCTGAGCCCCCCTCCCAATGGCCTCTAAGGGGCCCAATAAAACCTnTGTAAAGATTTGC AGCTGTGTTTGTGGTTGTrGTCGGACCTCAGAGGAATCAAAGCCTCACGGATCAGTGGAGCACCTGAGCG CGGCTGCGCTGTGCTCACATTGCAGGGGTGGGGGGCCACACTGTCCACCCTATGAGCCCTCAGCCCmr GGGACCCGATrTGGGATTATTnTGCATTCAGAACCGGAACGATGCATGATGAAGCATCCCTTCCTGCTG GCTCATGCCCTGCGCCCACCCCGCTGCTATATCCCCATCCCATTCCATCCCCATGGCATCGAGACAGCAG CCCCCAGCAAAGCAGCAGCGTGGTGGGCAGCAGAGGGCAGCCCAGGGCTCACAGTGTGTGCACCCACGT

GTCCTTCTAACTGCCCATGGGACGTCCGTGTCTCTCCACCCACGTGGACAGCGTGTCTTCGACTTCCAGTG TGCATCGTGT

EDC NOVEL SEQUENCE 22

Chicken S100A10 genomic sequences (section 5.5.4, figure 5.12). Sequences in the order of figure 5.12, left to right.

CCCTCGCGGCNGACAAGAACTACCTGAGCAAGGAGGACCTGCGTGCGCTGATGGAGAAGGAGTTCCCCG GATTCCTGGAGGTGAGGGGTTGGGCTGCGTGCGGAGGCGGCGGGCACTGTGCCCTATGAGTCAGCTCAG CCACAGCCCTGGTCTGTGCAACGTTCCGGGATGGGGGGGGGGGGGGGCACAGAGCAGCCCCTCACCnr GGGGCTGTGATGAACTGGGACCTCAGGAGCTGTGCCCCCAG

CACCCACCCCACAGCCCTATAACTGCTGGATTCCTTGAGGTGGTAGGGGACAGAACAGGGTCACACCTA CCCCATGGTCATAAAACCTGGAGGGGGGTGGGGTGGGGGGGGGCACAGCGGGGTCACACCTACCCCGCT GCCCTATAAGCACACCGTGAGGCACTGACCTGCTTCTGCCCCCAGAACCAGCGCGACCCTATGGCGCTGG ATAAGATCATGAAGGACCTGGACCAGTGCCGGGATGGCAAAGTGGGCTTCCAGAGCTrCTTCTCACTGGT GGCTGGACTGACCATCGCCTGCAATGACTACTTCGTGGTGCACATGAAGCAGAAGGGGCGGAAGTGAGA GAAGGCCCCGCAGCCCCAATAAAGTGTnTATATGACTCTGCTGTGGCCTCTGCCTTGGGGTGGGTCCCC TCAGGTCCTCATCACTAGGGGGTTCCCATCAGCCATATCCCTATCAGCACCAAGTCCCCATCACCACGGG GTTCTCATCCCCTAAAGCCCTTTCANTGTGGGGCTCCCGGCCCCATATCNNCGTCCCTGTNCNGTCCNAN TCANCCNANGTCCACACCTTTGTGCTNNCCATCATTATAGGGATCCCTGTTTCCATGTCC

EDC NOVEL SEQUENCE 23

Human S100A10 genomic sequences (section 5.5.4, figure 5.12) as seen in figure 5.12, left to right

GGGTCGCTTAAGGAATCTGCCCCACAGCTTCCCCCATAGAAGGATTTCATGAGCAGATCAGGACACTrAG CAAATGTAAAAATAAAATCTAACTCTCAnTGACAAGCAGAGAAAGAAAAGTTAAATACCAGATAAGCT TTTGAl ri'1'lGTATTGTTTGC

CCCCATCGAATTTGTGAAATGTAAACATCATGGTTTCCATGGCGTGTTCCATTTGAGATGGCATTTTGGTG

CATGAATTTAl l'i ri'1'1 rAAATTGAAAGATAAAATrGTATGTATTTAATGTGTACAACATGATGTnTCAG GTATATATACATTGTGGGATGACTGAATCTAGCTAATrAACATATGCATTACCTCCCATAGTTAAGCACTN GTGTGGTGAGAACACTTAGCCACTCTTAACTAnTGTCAAAGAATACAATACATrGTAATTAACTTATAGT CACGTTGTTGTACAGTAGATCTCTTAAACTTATTCCTCCTAACTGAAATnTCAATCnTTGACCCATAGCT CCTTCAACCATCACTCAAATGCAATCACCCCAGCCCATTCTATCTTTAGTTCTATGAAATTCAGNGGTTAA NNATTCCC

GAGCTTGGGTNTTCTATGGCAGAATCTAAATANNAATACTGAAATGCTCCCGCANGGGCAAAAAANAAA ANATAATACATTTAGATGAACTCCAAAGGTTGCCrrATGACCTANNACAAGAGCTCAAGTAATTGGGAGA TATATACACAGTTCTCAGCAGTACAAATCAAATAAGCTCCTCATTGCAACdTCTGAdTCTAnTCTTGC

GTGTAGAGATGGCAAAGTGGGCTrCCAGAGCTTCTTTTCCCTAATTGCGGGCCTCACCATTGCATGCAAT GACTATTTTGTAGTACACATGAAGCAGAAGGGAAAGAAGTAGGCAGAAATGAGCAGTTCGCTCCTCCCT GATAAGAGTrGTCCCAAAGGGTCGCnTAAGGAATCTGCCCCACAGCTTCCCCCATAGAAGGATTTCATGA GCAGATCAGGACACTTAGCAAATGTAAAAATAAAATCTAACTCnrA'nTGACAAGCAGAGAAAGAAAAG TTAAATACCAGATAAGCmTGATmrGTATTGnTGCATCCCCnTNNCCTCAATAAATAAAGTTCnnT TAGTTCCAAATrrGAGACAGAATGTITGNTGTrCCCTCAGAAATrCTrGTrCCCCAAGAGGCAGCTTGCCC TrGGGCAGCCAGCACACAGCAGAGCmTdTACACAGCAGAGCmTCTCACAGAACTGGCAAAATGAG AAAACACTCTTCATTCAGGCTTGGAGTCTAACTTCTCTAATTGTrCTGCCCTTGTAGCTGTGTCTATATGA AGGACTCAAACTTGAAGGATTAGGAGAAGGCTCTACAGGAATAGATGGGTAAGTATATAArrACTGGGG CACAGTTTTGCCCAGTTAAATCAGTGCnTTTCTAGGCAGCACTTTGCATGCCTGCATGCTGATCATnTCA TAGCAAGTGCCAGCAACCTAGGAGACTGGAACCATCGAGAGG

Appendix IV: publication

South A.P., Cabral A., Ives J.H., James C.H., Mirza G., Marenholz I., Mischke D., Backendorf C., Ragoussis J., and Nizetic D. (1999) Human Epidermal Differentiation Complex in a single 2.5 Mbp long continuum of overlapping DNA cloned in bacteria integrating physical and transcript maps. J. Invest. Derm. 112 (6): 910-918.

Human Epidermal Differentiation Complex in a Single 2.5 Mbp Long Continuum of Overlapping DNA Cloned in Bacteria Integrating Physical and Transcript Maps

Andrew P. South, Adriana Cabral,* Jane H. Ives, Colin H. James, Ghazala Mirza,† Ingo Marenholz,‡ Dietmar Mischke,‡ Claude Backendorf,* Jiannis Ragoussis,† and Dean Nizetic

Center for Applied Molecular Biology, School of Pharmacy, University of London, London, U.K.; *Department of Molecular Genetics, Leiden Institute of Chemistry, University of Leiden, The Netherlands; †St Thomas' Hospitals (UMDS), Guy's Hospital, London, U.K.; ‡Institute for Immunogenetics Charité, Humboldt University, Berlin, Germany

Terminal differentiation of keratinocytes involves the sequential expression of several major proteins which can be identified in distinct cellular layers within the mammalian epidermis and are characteristic for the maturation state of the keratinocyte. Many of the corresponding genes are clustered in one specific human chromosomal region 1q21. It is rare in the genome to find in such close proximity the genes belonging to at least three structurally different families, yet sharing spatial and temporal expression specificity, as well as interdependent functional features. This DNA segment, termed the epidermal differentiation complex, contains 27 genes, 14 of which are specifically expressed during calcium-dependent terminal differentiation of keratinocytes (the majority being structural protein precursors of the cornified envelope) and the other 13 belong to the S100 family of calcium binding proteins with possible signal transduction roles in the differentiation of epidermis and other tissues. In order to provide a bacterial clone resource that will enable further studies of genomic structure, transcriptional regulation, function and evolution of the epidermal differentiation complex, as well as the identification of novel genes, we have constructed a single 2.45 Mbp long continuum of genomic DNA cloned as 45 P1 artificial chromosomes, three bacterial artificial chromosomes, and 34 cosmid clones. The map encompasses all of the 27 genes so far assigned to the epidermal differentiation complex, and integrates the physical localization of these genes at a high resolution on a complete NotI and SalI, and a partial EcoRI restriction map. This map will be the starting resource for the large-scale genomic sequencing of this region by The Sanger Center, Hinxton, U.K. Key words: 1q21/cornified envelope/gene complex/sequence-ready map. J Invest Dermatol 112:910-918, 1999

Manuscript received November 5, 1998; revised January 14, 1999; accepted for publication March 7, 1999.

Reprint requests to: Dr. D. Nizetic, Center for Applied Molecular Biology, School of Pharmacy, University of London, 29/39 Brunswick Square, London WC1N 1AX, U.K.

Abbreviations: BAC, bacterial artificial chromosome; EDC, epidermal differentiation complex; MTP, minimal tiling path; PAC, P1 artificial chromosome; SPRR, small proline rich proteins; YAC, yeast artificial chromosome.

0022-202X/99/$14.00 • Copyright © 1999 by The Society for Investigative Dermatology, Inc.

The Journal of Investigative Dermatology, Vol. 112, No. 6, June 1999

Terminal differentiation in stratified squamous epithelia involves the sequential expression of several major proteins which can be identified in four distinct cellular layers within the mammalian epidermis and are characteristic for the maturation state of the keratinocyte (Resing and Dale, 1991). Terminal differentiation of keratinocytes involves the cessation of mitotic activity and the migration of cells through the four layers with the final differentiation step being the formation of the cornified cell envelope (CE), a 15 nm thick, highly insoluble structure formed on the inner surface of the plasma membrane (Williams and Elias, 1993). The CE is assembled by keratinocyte transglutaminase-catalyzed cross-linking of stratum corneum precursor proteins. At the same time, intermediate filament-associated proteins such as filaggrin and trichohyalin are activated from their precursors and lead to dense packing of keratin filaments inside cornified cells (Steinert and Roop, 1988; Dale et al, 1994). Genes encoding the majority of the above mentioned structural precursor proteins are clustered in a stretch of genomic DNA at 1q21 corresponding to approximately 1% of the length of the largest human chromosome. This DNA region, named the epidermal differentiation complex (EDC) (Mischke et al, 1996) harbors 27 hitherto identified genes, including genes for many of the CE precursor proteins, such as involucrin (Eckert and Green, 1986), loricrin (Hohl et al, 1991), and 10 members of the three subfamilies of small proline-rich proteins (SPRR) (Gibbs et al, 1993), intermediate filament-associated protein precursors profilaggrin and trichohyalin (Markova et al, 1993; O'Keefe et al, 1993), as well as 13 genes of the S100 family (S100A1-13) (Schaefer and Heizmann, 1996; Wicki et al, 1996a, b), encoding small calcium-binding proteins containing two EF-hand motifs with likely roles in calcium-mediated signaling during cell cycle progression and/or cellular differentiation. Co-localization of genes encoding calcium binding proteins with those encoding for structural proteins expressed during terminal epidermal differentiation is particularly interesting as calcium levels tightly control not only epidermal and general epithelial cell differentiation, but also the expression of specific genes encoding structural epidermal proteins (Yuspa et al, 1989; Presland et al, 1995). An intriguing hypothesis is that the spatial and temporal expression of structural genes in the EDC could be tightly and coordinately regulated, possibly by shared locus control elements operating at the level of the whole gene complex (Townes and Behringer, 1990; Hardas et al, 1996; Mischke et al, 1996), as has been shown for other gene complexes such as the globin locus (Dillon and Grosveld, 1993).

Many of these 27 genes within all three families have been implicated, or are known to be directly involved, in a number of wide-ranging disorders. Vohwinkel's keratoderma, a rare dominantly inherited genodermatosis, was the first disease directly related to a mutation in an EDC gene, namely a frameshift and delayed termination codon in loricrin (Maestrini et al, 1996; Korge et al, 1997). Autosomal recessive pycnodysostosis has also been mapped to 1q21 (Gelb et al, 1996; Polymeropoulos et al, 1995). Disorders of keratinization, such as various types of ichthyosis, are marked with a drastic decrease in profilaggrin expression and/or abnormal CE (Resing and Dale, 1991), whereas S100A7, S100A8, and S100A9 (formerly known as psoriasin, calgranulin A, and calgranulin B, respectively), have been found highly upregulated in psoriatic epidermis (Hardas et al, 1996). Moreover, S100 proteins have been associated with tumor development and the metastatic behavior of breast cancer cells (Englekamp et al, 1993). In addition, amplification of the region 1q21-q22 has been detected in sarcomas (Forus et al, 1998). The breakpoint of a papillary renal cell carcinoma associated translocation (X;1) has been mapped to about S100A4, with the presence of related sequence in the region being suggested as a cause (Weterman et al, 1993).

So far, a long range physical map of the EDC using genomic DNA Southern blots, as well as dense yeast artificial chromosome (YAC) and YAC-fragmentation maps have been constructed (Marenholz et al, 1996; Mischke et al, 1996; Zhao and Elder, 1997; Lioumi et al, 1998). YAC clones, although very useful in bridging long distances, frequently suffer from chimerism, deletions, and rearrangements making them imperfect for reliable gene detection and unsuitable for high resolution, faithful, "sequence ready" representation of genomic DNA (Cheng et al, 1994; Groet et al, 1998).

Here we present a single continuum of 2.45 Mbp of overlapping genomic DNA fragments (contig) cloned in bacterial vectors, spanning the entire known EDC. The map integrates the localization of all 27 genes so far placed within the EDC on a high resolution restriction map of P1 (PAC), bacterial artificial chromosome (BAC), and cosmid clones. This map provides an appropriate set of DNA clones to be used as tools for the detailed expression studies of single genes and smaller groups of genes within the EDC using PAC/BAC retrofitting technology in the transient transfection studies (Mejia and Monaco, 1997; Kim et al, 1998). It also represents a resource for the positional derivation of potentially many additional genes, as well as promoters and regulatory elements of single genes and gene families, and for the studies aimed at detecting shared transcriptional locus control elements. The presented map will also be the main substrate for the long-range nucleotide sequence determination of genomic DNA from this region of chromosome 1, as agreed by The Sanger Center, Hinxton, U.K.

MATERIALS AND METHODS

Libraries used. Cosmid ICRFc112 flow-sorted chromosome 1 library (Nizetic et al, 1994). PAC Library RPCI-1 (Ioannou et al, 1994). Total Human BAC library (Shizuya et al, 1992) (Research Genetics).

PAC/BAC insert isolation and hybridizations. Hybridizations were carried out as described by Baxendale et al (1991) with some modifications. PAC/BAC clones were digested at 37°C with NotI (NEB). Pulse field gel electrophoresis was carried out using a CHEF-DR II system (Biorad) according to the manufacturer's recommendations, with 0.5 × TBE (10 × TBE: 108 g Tris base, 55 g boric acid, 40 ml 0.5 M ethylene diamine tetraacetic acid (pH 8.0) per litre), and 1% low melting point agarose (Life Technologies). Electrophoresis conditions for the 4-200 kbp window: 5.2 V per cm, 14 h, switch time 3-15 s. DNA size markers used: Lambda ladder (Biorad), high molecular weight DNA markers (Life Technologies), and 1 kb DNA ladder (Life Technologies). Inserts were cut out of the gel and weighed. To an average slice that weighed 250 mg, 150 µl of H2O, 40 µl of 1 M NaCl, and 8 µl of 0.5 M ethylene diamine tetraacetic acid (pH 8.0) were added. The agarose slice was dissolved at 68°C for 15 min, followed by incubation at 37°C for 15 min. Two microliters of β-Agarase I was added (Calbiochem 1000 U per ml) and incubated at 37°C for 14 h. After phenol and chloroform:iso-amylalcohol (24:1) extraction, 360 µl of sample was precipitated with 5.4 µl 10 mg per ml Dextran T40, 95 µl of 1 M NaCl, and 1062 µl 100% ethanol. The pellet was dissolved in 20 µl H2O. YAC DNA was isolated as described previously (Nizetic and Lehrach, 1995). Picking and spotting of PAC clones, generation of cosmid insert probes and Southern blotting were carried out as recently described (Groet et al, 1998). Fluorescence in situ hybridizations were carried out as described previously (Marenholz et al, 1996).

Pool construction and pool polymerase chain reaction (PCR). One set of duplicated microtiter dishes containing the 1q21 PAC sublibrary was divided in two. One half of the well volume was frozen after adding glycerol whereas the other half was pooled in the following way: the wells in columns from plate 1 were pooled and labeled pools 1-12 whereas the remaining nine clones in plate 2 were pooled as one and labeled 13. Glycerol was added to constitute 25% and the resulting volumes were aliquotted and frozen down. Escherichia coli cells were used as templates in PCR reactions performed using amplimers from the 3' untranslated regions of the following genes: trichohyalin, profilaggrin, involucrin, S100A10, S100A9, S100A8, and S100A6 as given in the Genome Database (Johns Hopkins University, Baltimore). PCR conditions are described below except 2 µl of cells from the PAC pools were used as a template in the first round of screening, whereas cells scraped from individual frozen microtiter plate wells were used in conjunction with 2 µl of sterile water as a template for the final round of screening. The PCR products from these reactions were run on appropriate agarose gels. The bands were purified with Prep-A-Gene (BioRad) according to the manufacturer's specifications and used in subsequent hybridization experiments.

Cultured keratinocyte library construction. cDNA clone 1019b8 (Fig 1) has been detected using the YAC 764_a_1 as a hybridization probe against high density membranes of a cultured keratinocyte library. Poly(A)+ RNA isolated from primary human keratinocytes cultured according to Fischer et al (1996) in both proliferating and differentiating culture conditions was size fractionated on a sucrose gradient. Four pooled fractions ranging from larger than 5 kb to smaller than 1500 bp were reverse transcribed with SuperScript II RNase H- reverse transcriptase (Gibco BRL) and cloned in the Uni-ZAP XR lambda vector (Stratagene). The primary complexity of these libraries varied from 1.3 to 2 million plaques. Equal plaque-forming units of libraries 1 and 2 and libraries 3 and 4 were mixed before in vivo excision with ExAssist helper phage (M13). Single colonies of the two resulting pBluescript SK(-) phagemid libraries (in strain SOLR) were scraped from agar plates and stored at -80°C before picking and filter spotting. Gridding of the library into microtiter plates and the production of high density membranes for screening will be discussed elsewhere (Marenholz et al in preparation).

cDNA and sequence tagged site isolation. Plasmid containing cDNA representing the loricrin gene was kindly donated by Daniel Hohl. Clones from the cultured keratinocyte cDNA library were used to isolate probes for S100A11, S100A4, S100A7, and 1019b8. IMAGE consortium clones 153992, 398802, 377441, and 342548 were obtained from the HGMP resource center and used as probes for S100A1, S100A2, S100A3, and S100A12, respectively. We designed PCR primers for S100A1, S100A5, and S100A13: A1For: AGACAGCCACATTGGGCAGCGC; A1Rev: GATGAGTTGCAGGCTTGGACCGC; A5For: GCATCGATGACTTGATGAAG; A5rev: CTGCGATGGAAACTTTATTTC; A13For: GAAGTAAAGCCGCCTGGCTGAGATG; A13Rev: GAGGAAGCTTTATTTGGGAAGAGTGCG. Conditions as given below except for the case of the A1 primers where 10 ng of the clone ICRFc112P0780 was used as starting material. Subcloned portions of YAC 764_a_1 were used for probes representing Y37 m-2, Y37 m-6, and Y37 m-16. Clones containing D1S3623 and D1S3624 were used as probes. Probes for the sequence tagged site markers D1S3625 and D1S1664 were produced by PCR with amplimers as given in Marenholz et al, 1996 and the Genome Database, respectively. Twenty nanograms of total human placental DNA (Sigma) was used as starting material for the reactions. PCR was carried out under standard conditions for 32 cycles (95°C, 60 s; 60°C, 60 s; 72°C, 120 s). The initial denaturation was 5 min at 95°C, and the final extension was 7 min at 72°C.
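The two-round pool screening described under pool construction in Materials and Methods can be illustrated with a minimal sketch. It assumes a standard 96-well layout (rows A-H, columns 1-12); the function names and the positions of the nine clones on plate 2 are hypothetical, since the text does not state which wells they occupied. A PCR-positive pool in the first round narrows the second round to at most nine individual wells.

```python
from string import ascii_uppercase

ROWS = tuple(ascii_uppercase[:8])  # rows A-H of a 96-well plate (assumed layout)
N_COLUMNS = 12                     # columns 1-12


def pool_for_well(plate, row, column):
    """Return the pool number (1-13) into which a sublibrary well was combined."""
    if row not in ROWS or not 1 <= column <= N_COLUMNS:
        raise ValueError(f"not a 96-well position: {row}{column}")
    if plate == 1:
        return column              # plate 1: one pool per column, pools 1-12
    if plate == 2:
        return 13                  # plate 2: the nine remaining clones share pool 13
    raise ValueError(f"unknown plate: {plate}")


def wells_in_pool(pool, plate2_wells):
    """List the (plate, row, column) wells to test singly once a pool is positive."""
    if 1 <= pool <= N_COLUMNS:
        return [(1, row, pool) for row in ROWS]
    if pool == 13:
        return [(2, row, column) for row, column in plate2_wells]
    raise ValueError(f"unknown pool: {pool}")


# ASSUMPTION: the source does not say where the nine plate-2 clones sat;
# these positions are placeholders for illustration only.
HYPOTHETICAL_PLATE2_WELLS = [(row, 1) for row in ROWS] + [("A", 2)]

if __name__ == "__main__":
    # Hypothetical first-round result: pools 4 and 13 gave a PCR product.
    for positive_pool in (4, 13):
        print(positive_pool, wells_in_pool(positive_pool, HYPOTHETICAL_PLATE2_WELLS))
```

Under these assumptions the 13 pools together cover 12 × 8 + 9 = 105 wells, which matches the size of the 1q21 PAC sublibrary reported in the Results.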


SPRR cosmid mapping. Membranes containing a high density gridded chromosome 1 specific cosmid library, constructed in Lawrist 4 (Nizetic et al, 1994) were screened with a mixture of SPRR1, SPRR2, and SPRR3 subclass-specific probes (Gibbs et al, 1993). Overlapping clones were detected by cosmid walking, using labeled RNA end probes, synthesized with either SP6 or T7 RNA polymerase and [32P]-UTP (Amersham) essentially as described (Ivens and Little, 1995). Restriction mapping was performed on cosmids linearized either with NruI or CpoI (which did not cut inserts) by partial digestion with EcoRI. Restriction fragments were separated by pulse field gel electrophoresis (Gibbs et al, 1993), in the presence of DNA size markers and detected by either one of the two labeled end-probe oligonucleotides (see below), which allowed the mapping of each insert in both directions. Cosmids, totally digested with the above mentioned restriction enzymes, were hybridized with either subclass-specific SPRR probes or, as a control, with the respective cosmid DNA, labeled by random priming.

The following primers were labeled with T4 polynucleotide kinase and γ-32P-ATP and were used as end-probes: bio-1095: TTCAGCTGCTGCCTGAGGCTGGACGACCTCGCGG and bio-1096: TTCCACCATGATATTCGGCAAGCAGGCATCGCCA.

RESULTS

Bacterial clone identification. Four YAC clones from the previously published 6 Mb contig of the region (Marenholz et al, 1996) were used as starting material. The YAC clones used were 764_a_1, 907_e_6, 955_e_11, and 950_e_2; of which 764_a_1 and 950_e_2 had been previously mapped to 1q21.3 by two color fluorescence in situ hybridization on extended metaphase chromosomes displaying the given orientation (Marenholz et al, 1996). Artificial chromosomes (YAC) were separated from the host yeast chromosomes by excision from the gel after pulse-field gel electrophoresis, their DNA purified and used as a hybridization probe against a library representing the whole human genome 5-fold, cloned as PAC in E. coli (Ioannou et al, 1994) and displayed on high-density membranes. After eliminating PAC clones which hybridized to unrelated YAC clones from other regions of the human genome (J. Groet, personal observation) a total of 105 PAC clones, grouped in seven pockets (determined by their pattern of hybridization to individual overlapping YAC clones), were regridded in two 96 well microtiter plates as a sublibrary of PAC clones enriched for the 1q21 region. These plates were duplicated and spotted on to nylon membranes where they were grown as colonies, processed, and used as an immobilized DNA target for subsequent hybridization experiments. In addition, E. coli cells from one of the duplicate sublibrary sets of plates were used to provide template DNA in PCR screening, as described in Materials and Methods.

Genes and sequence tagged site markers already placed on to YAC clones and genomic DNA of the region (Volz et al, 1993; Marenholz et al, 1996) were assigned to individual PAC clones within the 1q21 sublibrary using hybridization (with the 1q21 sublibrary membranes as targets) and/or PCR (using the E. coli cells as templates). Purified DNA of inserts from positive PAC clones (Materials and Methods) were used as probes in further rounds of hybridization experiments to the gridded PAC sublibrary membranes. This detected further overlapping clones from the sublibrary which were digested with EcoRI restriction enzyme, resolved on agarose gels, transferred to membranes by Southern blotting and rehybridized with PAC clone insert DNA probes. This helped establish the initial sets of overlapping PAC clones (PAC contigs) and identified the outermost (gap-facing) clones from initial contigs. The insert DNA from the gap facing clones was used in additional rounds of hybridization experiments to the sublibrary membranes. Remaining PAC clones from the 1q21 sublibrary that had not been recognized by established markers or characterized PAC clones were then randomly selected and their insert DNA isolated and hybridized back to the 1q21 PAC sublibrary membranes. Where no further clones within the sublibrary could be identified, inserts of gap facing PAC clones were hybridized back to the main PAC library (Ioannou et al, 1994), a total human BAC library (Shizuya et al, 1992) and a chromosome 1 specific cosmid library (Nizetic et al, 1994).

Fluorescent in situ hybridizations have been performed using the following six PAC clones: 61J12, 135O6, 14N1, 43O17, 71A4, and 148L21, confirming their specific localization to the 1q21 region (results not shown).

Figure 2 summarizes the results of the hybridization and PCR experiments performed to define the map. Owing to spatial constraints all probe target combinations have not been shown. For easier interpretation of the results, we will give an example. Looking at the probes (first left-hand column, Fig 2) the horizontal row labeled 43O17 represents the results obtained using this PAC as a probe. We can see that five target PAC clones have been detected (gray fields in five vertical columns: 52J10, itself, 47M17, 71A4, and 13P20). All five were detected using labeled 43O17 insert DNA in hybridization to colonies (cells processed and immobilized on a nylon filter), code "0" (see Fig 2 legend). Only two PAC clones (itself and 52J10) were positive when an RNA T7-end probe from 43O17 was used (code "3" in Fig 2 legend), indicating that the PAC 52J10 overlaps the T7 end of the PAC 43O17 (see positioning in Fig 1). Of the remaining three code "0" positive clones, the clone 13P20 was detected with involucrin and also the SPRR cluster containing PAC 20N18 used as a probe. This established the PAC clones 52J10 and 13P20 as the minimum necessary to extend the contig (minimal tiling path or MTP) left and right, respectively, from the starting PAC in this example (43O17). Therefore, only these three PAC clones (52J10, 43O17, and 13P20) were used in the next round of hybridizations to Southern blots (see below). Hence, only these three PAC clones have an additional code "1" positive result in Fig 2 in the row with which we started this example, the 43O17 horizontal row.

In the case of probes representing a number of S100 genes (A2-A6, first column in Fig 2), all known individual S100 genes from this group have been positioned on to PAC clones 148L21 and 178F15, and further mapped on to individual restriction fragments (see below). This was achieved prior to obtaining additional clones (right hand side end columns in Fig 2) that were left not tested (nt) for these S100 gene probes. The testing was considered unnecessary because the probes were already positioned within the MTP. The reason the additional clones were obtained was to bridge the one gap that remained near the right hand side of the map.

Restriction mapping. All clones were initially digested with EcoRI restriction enzyme and separated by electrophoresis on 20 cm long, 0.8% agarose gels. The overlaps, as seen by shared restriction
A6, first column in Fig 2), aU known individual SlOO genes from Genes and sequence tagged site markers already placed on to this group have been positioned on to PAC clones 148L21 and YAC clones and genomic DNA of the region (Volz et al, 1993; 178F15, and further mapped on to individual restriction fragments Marenholz et al, 1996) were assigned to individual PAC clones (see below). This was aclHeved prior to obtaining additional clones within the lq21 subHbrary using hybridization (with the lq21 (right hand side end columns in Fig 2) that were left not tested subHbrary membranes as targets) and/or P C R (using the E . coli (nt) for these SlOO gene probes. The testing was considered ceUs as templates). Purified DNA of inserts from positive PAC unnecessary because the probes were already positioned within the clones {Materials and Methods) were used as probes in further MTP. The reason the additional clones were obtained was to bridge rounds of hybridization experiments to the gridded PAC subHbrary the one gap that remained near the right hand side of the map. membranes. This detected further overlapping clones from the subHbrary which were digested with EcoRI restriction enzyme, Restriction mapping AU clones were initiaUy digested with resolved on agarose gels, transferred to membranes by Southern EcoRI restriction enzyme and separated by electrophoresis on 20 cm blotting and rehybridized with PAC clone insert DNA probes. long, 0.8% agarose gels. The overlaps, as seen by shared restriction

Figure 1. Restriction and transcriptional map of the 2.45 Mbp bacterial contig encompassing the entire EDC. The top horizontal scale bar is in kilobase pairs, 0 being the most centromeric end of the contig. The second, thicker, horizontal bar is a schematic representation of chromosomal DNA showing the localization of markers, genes and EST in the region. Rhomboid signs represent genes; ellipsoid signs represent markers, sequence tagged sites and EST. The y37m-2, m-6 and m-16 are new markers and their GenBank accession numbers are X94412, AJ009641, and X95103, respectively. 1019b8 is a new keratinocyte cDNA clone mapping to the region, which has the accession number AJ009640. Underneath this are the bacterial clones represented by dark horizontal lines. Clones that make up the MTP are shown with orientation, restriction enzyme sites and localization of genes/markers or EST. Filled-in boxes represent T7 clone ends whereas open larger boxes represent SP6 clone ends. Small vertical bars show EcoRI restriction enzyme sites whereas a large N or S represents NotI and SalI restriction enzyme sites within particular EcoRI fragments, respectively. Polymorphic EcoRI sites in our map are represented by a small gray vertical bar. The light gray horizontal bars above the bacterial clones represent EcoRI "bins"; the precise order of fragments within the bins has not been determined. BAC clones are shown with the prefix "B" whereas cosmid clones have the prefix "c" instead of ICRFc112. The remaining clones are PACs. A more detailed and enlarged map of the region marked by the dashed rectangle can be seen in Fig 4, showing the precise ordering of the SPRR gene cluster.
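The EcoRI "bins" in Fig 1 group fragments whose relative order is not resolved. One way to picture such a grouping is to collect fragments by the exact set of clones that contain them, as in the minimal sketch below. The function name `bin_fragments` and all fragment sizes are hypothetical and chosen only for illustration; the clone names follow the text, and the actual binning in the map also drew on hybridization results and NotI/SalI sites.

```python
from collections import defaultdict

def bin_fragments(fragments):
    """Group fragments by the exact set of clones that contain them.

    fragments: iterable of (size_kb, set_of_clone_names).
    Returns a dict mapping frozenset(clone names) -> sorted list of sizes.
    The order of fragments inside one bin is left undetermined.
    """
    bins = defaultdict(list)
    for size_kb, clones in fragments:
        bins[frozenset(clones)].append(size_kb)
    return {members: sorted(sizes) for members, sizes in bins.items()}

# Hypothetical fragment sizes (kb); only the clone names come from the text.
fragments = [
    (4.2, {"59H12", "20N18"}),
    (6.8, {"59H12", "20N18"}),
    (3.1, {"59H12"}),
    (9.5, {"19K8"}),
    (2.7, {"19K8"}),
]

bins = bin_fragments(fragments)
for members, sizes in sorted(bins.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(members), sizes)
```

Keying on a `frozenset` makes "shared by 59H12 and 20N18" and "specific to 59H12" distinct bins without imposing any order on the fragments inside a bin.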


Figure 2. Summary of the hybridization and PCR based confirmations that establish the overlap shown in Fig 1. Left: sources of probes. Top: target clones. The exact nature of probe used in each case was (0) hybridization to colony, (1) hybridization to Southern blot, (2) gene/marker by PCR, (3) T7 clone end riboprobe. In the case of BAC and PAC probes hybridization refers to labelled insert; in the case of cosmid clone cP0780 the probe used was isolated from internal Sau3AI fragments. Probes used for genes/markers/EST are indicated in text along with relevant primer sequences. Markers are taken from the Genome Database (GDB). Addresses of clones are as in Fig 1 except cosmid 1: ICRFc112L1963 and ICRFc112E2275, whereas cosmid 2 refers to ICRFc112P1146, ICRFc112K182, ICRFc112K1231, ICRFc112D0853, ICRFc112H1166, ICRFc112A0454, ICRFc11201459, ICRFc112N0667, ICRFc112F124, ICRFc112G0160, ICRFc112N15101, and ICRFc112L0567. "nt" indicates clone not tested.
VOL. 112, NO. 6 JUNE 1999 HUMAN EDC IN SINGLE BACTERIAL CONTIG 915
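The evidence codes in the Fig 2 legend can be read as a small lookup table. The sketch below encodes only the worked example given in the text for the probe row 43017 and shows how the T7-end neighbour falls out of the codes. The constant names, the dictionary layout and the helper `targets_with` are illustrative assumptions, not part of the original analysis.

```python
# Evidence codes as defined in the Fig 2 legend.
COLONY, SOUTHERN, PCR, T7_END = 0, 1, 2, 3

# Probe row 43017, transcribed from the worked example in the text:
# five targets positive by colony hybridization (0), two by the
# T7-end riboprobe (3), three carried forward to Southern blots (1).
row_43017 = {
    "52J10": {COLONY, SOUTHERN, T7_END},
    "43017": {COLONY, SOUTHERN, T7_END},
    "47M17": {COLONY},
    "71A4": {COLONY},
    "13P20": {COLONY, SOUTHERN},
}

def targets_with(row, code):
    """Return the target clones in one probe row that carry a given code."""
    return sorted(target for target, codes in row.items() if code in codes)

# Clones other than the probe itself that are hit by its T7-end riboprobe
# overlap the T7 end of the probe clone.
t7_neighbors = [t for t in targets_with(row_43017, T7_END) if t != "43017"]

print(targets_with(row_43017, COLONY))
print(targets_with(row_43017, SOUTHERN))
print(t7_neighbors)
```

Storing a set of codes per probe-target pair mirrors the figure, where one cell can carry several kinds of confirmation at once.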

[Figure 3 image, panels A-D. Lane labels: 20N18, 59H12, 19K8, 140J1, 127E12, 230D1, 148L21, flanked by size standards; the Southern blots show the mirror-image lane order. Size standard bands are labeled from 12 kbp down to 0.5 kbp.]

Figure 3. Example of a 0.8% agarose gel and the resulting Southern blot hybridized with various ³²P-labelled probes from the EDC. (A) Agarose gel of seven clones from the MTP (see Fig 1), digested with four different combinations of restriction enzymes and stained with ethidium bromide and photographed under ultraviolet. The restriction enzyme combinations for all clones are shown for the first two; E = EcoRI, N = NotI, and S = SalI. The first and last lane contain a 1 kb ladder size standard, the size of which is indicated for the last. (B) Southern blot of gel shown in (A), hybridized with insert of the PAC 19K8 used as a probe. (C) Southern blot of gel shown in (A), hybridized with insert of the PAC 20N18 used as a probe. (D) Southern blot of gel shown in (A), hybridized with an SPRR2 cDNA probe. Southern blots display mirror image of the gel in (A).
916 SOUTH ET AL THE JOURNAL OF INVESTIGATIVE DERMATOLOGY
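Reading a gel such as Fig 3(A) for overlaps amounts to pairing bands of matching size between adjacent clones. The sketch below illustrates that first step under stated assumptions: the function name `shared_fragments`, the 5% relative size tolerance and all fragment sizes are hypothetical. A size match is only a candidate overlap, because unrelated fragments can co-migrate by coincidence, which is why the blots in panels (B) to (D) are needed to confirm it.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.05):
    """Pair fragments from two clones whose sizes agree within a tolerance.

    sizes_a, sizes_b: fragment sizes in kb for two clones.
    tolerance: allowed relative size difference (assumed value).
    Each fragment is used in at most one pair.
    """
    shared = []
    unused = sorted(sizes_b)
    for a in sorted(sizes_a):
        for b in unused:
            if abs(a - b) <= tolerance * max(a, b):
                shared.append((a, b))
                unused.remove(b)  # safe: we leave the inner loop right away
                break
    return shared

# Hypothetical EcoRI fragment sizes (kb) for two adjacent clones.
clone_x = [10.6, 7.2, 3.4, 1.9]
clone_y = [10.5, 5.0, 3.4]

candidates = shared_fragments(clone_x, clone_y)
print(candidates)
```

Each candidate pair would then be checked by hybridizing one clone's insert to a blot of the other, so that coincidental size matches are discarded.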

[Figure 4 image: scale bar in kbp. Gene labels legible in the extracted map: SPRR1A, SPRR1B, SPRR2D, SPRR2A, SPRR2B, SPRR2F, SPRR2C, SPRR2G. Cosmid clones: ICRFc112K0695, ICRFc112K1632, ICRFc112J1113, ICRFc112G0658, ICRFc112F0823, ICRFc112F0969, ICRFc112D01101, ICRFc112J0123, ICRFc112L0560, ICRFc112C164S, ICRFc112C0111, ICRFc112E0682, ICRFc112L1535.]

Figure 4. Contiguous EcoRI restriction map of the SPRR cosmid contig (covering the region within the dashed rectangle around 1500-1690 kbp of the 1q21 map in Fig 1). PACs from the region are shown underneath the cosmid contig. The three different SPRR subclasses are indicated by black (SPRR1), gray (SPRR2), and white (SPRR3) ovals. The sequences of SPRR1A (L05187), SPRR1B (M84757), SPRR2A (X53064), SPRR2B (L05188), SPRR2C (L05189, L05190), and SPRR3 (AF077374) are available from the Genome Database (GenBank ID in brackets). Sequence analysis of the other SPRR2 genes, which have been denominated according to their respective localization within the map, is in progress (Cabral et al, in preparation). Restriction fragments hybridizing to the SPRR2 probe are indicated with a small open square beneath the bar representing the restriction site pattern. The finding that two genes (SPRR2C and SPRR2E) are identified by two hybridizing fragments is due to the fact that the probe contains an internal EcoRI site.

fragments determined from these gels, coupled with the results from PAC insert hybridization experiments, identified a tentative MTP of overlapping clones. Clones from this MTP were digested with four combinations of restriction enzymes: EcoRI only, EcoRI and NotI, EcoRI and SalI, and EcoRI, NotI, and SalI together. Fragments were again resolved on identical agarose gels which were used to produce Southern blots. Inserts from MTP clones were then systematically hybridized back to the Southern blots in order to define the precise degree of overlap shared by clones. EcoRI fragments of the MTP were sized and grouped together in bins (light gray horizontal bars in Fig 1) depending on which clones contain the fragments, which clones hybridize to the fragments, and how other restriction enzyme sites limit their order and position.

Figure 3 displays an example of one such gel and the resulting Southern blot hybridized with a number of radiolabeled probes. Seven clones from the MTP have been resolved on the agarose gel in Fig 3(A). A number of shared fragments between adjacent clones can clearly be seen in the ethidium bromide stained ultraviolet photo, except in the case of 127E12 and 230D1 which do not overlap. In some instances clones can contain EcoRI bands of the same size by coincidence, which is why it is necessary to perform hybridizations with adjacent clone inserts as probes. This serves the purpose of both verifying the overlapping fragments and measuring the total length of the overlapping segment. Figure 3(B) clearly defines the degree of overlap between 19K8 (insert used as probe) and adjacent clones. Another example of this overlap confirmation can be seen in Fig 3(C) where the insert of clone 20N18 has been used as a probe. Figure 3(D) displays the same Southern blot hybridized with the cDNA-derived 930 bp probe corresponding to SPRR2E. Owing to the highly conserved nature of the SPRR2 genes (90% in the coding region; Gibbs et al, 1993), a number of positive EcoRI fragments can be seen (nine in total) corresponding to the seven SPRR2 genes (see also Fig 4). These positive fragments have been placed in EcoRI bins accordingly: in each of three separate bins (59H12 and 20N18 shared, 59H12 specific, and 19K8 specific), three SPRR2 positive fragments can be placed. As this procedure does not allow the positioning of the three different fragments in one bin and their assignment to one specific gene, the whole SPRR region was mapped in more detail by using a cosmid contig (see below).

Probes representing S100A6, S100A5, S100A4, and S100A3 localize to PAC clones 230D1, 148L21, and 178F15 (see Fig 1). S100A6 and S100A5 localize to a 10.6 kb fragment present in all three clones. S100A4 localizes to this same fragment in all three clones but also to a 17 kb fragment in clones 148L21 and 178F15 along with the 14 kb SP6-end fragment of 230D1, thus indicating an EcoRI site within the S100A4 sequence. S100A3 localizes to the 17 kb fragment of 148L21 and 178F15 as well as the 14 kb SP6 fragment of 230D1, whereas S100A2 is only present on the 17 kb fragment from 148L21 and 178F15. This establishes the order S100A5 + S100A6-S100A4-S100A3-S100A2 and is in full agreement with the genomic sequence data (Englekamp et al, 1993). This sequence also contains the internal EcoRI site of S100A4 and, coupled with our map, unambiguously defines the order as S100A6-S100A5-S100A4-S100A3-S100A2.

Recombinant PAC clones are constructed (Ioannou et al, 1994) in such a way that the random genomic DNA insert (100-150 kb on average) has been inserted in the cloning site between the two RNA polymerase promoters T7 and SP6. The specific orientation of the PAC inserts in relation to the vector could be determined as the restriction enzyme NotI digests the PAC vector either side of the cloning site, whereas the restriction enzyme SalI digests the vector on the SP6 side only. In the case of the one BAC clone, the two cosmid clones, and where PAC end-fragments could not be easily identified within the MTP (due to co-running with internal fragments), T7-RNA end-probes were hybridized to Southern blots, defining clone orientation and the exact positioning of certain EcoRI fragments.

Gene and transcript map. Hybridization probes were produced for all genes and markers shown in Fig 1 and used against the Southern blots of the MTP, thus positioning the respective genes and markers to specific EcoRI fragments (for an example, see Fig 3D). In the centromeric half of the map bacterial clones confirm the order of genes and markers and the respective distances between them from previous mapping studies (Marenholz et al, 1996; Mischke et al, 1996; Lioumi et al, 1998).

Telomeric of S100A9, our map for the first time establishes the order and distances between all known genes and markers in this segment. The S100 genes A11, A12, and A13 have been precisely positioned in relation to the previously published order of S100A1-10. We establish the order as follows (Fig 1): cen-S100A10-S100A11-S100A9-S100A12-S100A8-S100A7-S100A6-S100A5-S100A4-S100A3-S100A2-S100A13-S100A1-tel.

The region occupied by the SPRR gene family (a 170 kbp segment marked by a hashed line rectangle in Fig 1) has been mapped in more detail by assembling a contiguous EcoRI map, using chromosome 1 specific cosmids (Fig 4). This allowed the assignment of the nine SPRR2 specific hybridizing bands (Figs 1 and 3D) to seven individual SPRR2 genes and the determination of the relative position of the various members of the three subfamilies in the SPRR multigene family. A more detailed study is addressing the exon-intron organization, transcriptional orientation, and evolution of the various members of this gene family, and will be published elsewhere (Cabral et al, in preparation).

The three new markers y37m-2, y37m-6, and y37m-16 were produced from subcloning YAC 764_a_1, increasing marker density within the region. One new cDNA clone, 1019b8, was found from YAC hybridizations to a cultured keratinocyte cDNA library, and has been mapped approximately 350 kb telomeric of S100A10.

DISCUSSION

The overall size of the region mapped in Fig 1 is 2450 kbp. Of the total contig length 59.3% is covered by overlapping clones with a depth ranging from 3- to 9-fold, and only 15.6% is covered by single-clone-deep parts of the contig. The average degree of overlap in the MTP is 47.2% of clone length. The sum of the lengths of the nonredundant restriction fragments shown in the map in Fig 1 gives the measurement of the physical distance between the markers for various intervals of the map, which matches well with the previously published physical maps of large genomic DNA fragments separated by pulse-field gel electrophoresis (Volz et al, 1993; Mischke et al, 1996). These maps found the genes S100A10 and S100A6 on genomic DNA fragments with minimum sizes 1760 kbp and 1750 kbp, respectively, and maximum sizes of 2050 kbp and 1900 kbp, respectively. We find the same genes occupying a maximum distance of 1700 kbp which is in good agreement with the data from direct genomic physical mapping. The small difference of 50-60 kb could be easily attributed to the difference in resolution between the two experimental approaches.

One small refinement to previously published work localizing genes and markers to YAC clones can be seen. The previously published 6 Mb YAC contig (Marenholz et al, 1996) was unable to resolve the position of the marker D1S3625 with respect to S100A6 and gave the following order: cen-S100A9-S100A6-D1S3625-tel (see Fig 1, Marenholz et al, 1996). We see from the hybridization pattern in the bacterial contig the order: cen-S100A9-D1S3625-S100A6-tel.

The genes S100A6, S100A5, S100A4, and S100A3 have previously been localized to a 15 kb region in genomic DNA-containing phages (Englekamp et al, 1993). In agreement we see these S100 genes localizing to a region of similar size. S100A13 is shown to hybridize to an EcoRI fragment from digested YAC DNA of about 10 kb (Wicki et al, 1996a). This is in line with our finding, an 8.9 kb EcoRI fragment positive on hybridization of an S100A13 probe. The study of a YAC containing S100 genes A9-A1 (Schaefer et al, 1995) gives a distance between these two genes of 320 kbp. Our contig is in agreement, with a maximum distance of 330 kbp, and also confirms the position of SalI restriction enzyme sites in the region. A mapping study using mainly YAC clones (Zhao and Elder, 1997) gives a distance from involucrin to S100A2 of 780 kbp. Our map is in agreement with a maximum distance of 790 kbp and also confirms the position of a single NotI restriction enzyme site over this region. Clones 22009 and 92M23 could potentially contain as yet uncharacterized CpG islands (both contain two NotI restriction enzyme sites) and are currently the subject of an ongoing investigation.

A multigene family clustered in a small genomic region is not a novelty for complex genomes, but very few of them include structurally different (though distantly related in evolution) classes of genes. Best examples are the major histocompatibility complex, and other gene complexes belonging to the immunoglobulin superfamily. Genomic regions occupied by such complexes are usually transcriptionally very active, and truly "gene-rich", including numerous additional transcripts whose direct evolutionary, structural, and functional link to the multigene family (if any) is not apparent. For example, the genomic segment harboring the human major histocompatibility complex in 6p is one of the "gene-richest" regions in the entire genome with gene densities exceeding 1 per 10 kbp spreading over several Mbp (Milner and Campbell, 1992). This allows for prediction that the 1q21 region may harbor novel transcripts unrelated to the EDC as well as new members of the current gene families. The bacterial clones presented in this map provide optimal material for further isolation of genes using exon trapping, cDNA selection and the screening of cDNA libraries. This study reports one new cDNA, which has been mapped approximately 350 kb telomeric of S100A10 and was obtained by hybridization of a YAC to the keratinocyte cDNA library. This position is outside the current EDC sensu stricto. This cDNA does not show sequence similarity to any of the three EDC gene families. If the expression of the gene underlying this cDNA is found to be regulated during keratinocyte differentiation, it may add further genes (gene families) to the EDC, expanding its physical borders. This is currently an ongoing investigation.

It has been suggested that EDC gene expression could be coordinated from a shared locus control region (Hardas et al, 1996), raising the question of evolutionary history and biologic role. Are the EDC genes kept together coincidentally or are there regulatory and functional constraints keeping the genes in close physical distance over millions of years and across species barriers? What could the study of the EDC gene analogs throughout vertebrate evolution tell us about the role of specific proteins in the evolution of epidermis? A high resolution physical map of ordered bacterial clones is an essential resource in experimental approaches addressing these and other questions. An integrated restriction map, in particular with NotI, renders the clones specially useful for direct transient transfection approaches using PAC/BAC retrofitting technology (Mejia and Monaco, 1997; Kim et al, 1998).

In conclusion, we believe the high resolution bacterial contig spanning the EDC will be a very helpful resource in further studies of this intriguing gene complex. This contig will also be the main starting substrate for the long range sequencing of this region by The Sanger Center, Hinxton, U.K.

We thank Beat Schaefer and Daniel Hohl for some gene-specific probes, Maria Lioumi for sharing corroborative mapping information, the MRC-Human Genome Mapping Project Resource Center U.K. for access to various molecular resources and Juergen Groet for advice in experimental procedures. Thanks to Susan Gibbs and David Fischer (Leiden University) for essential contribution in the construction and propagation of the human keratinocyte cDNA library. We also thank David Bentley, Simon Gregory, and Mark Vaudin for advice and co-ordination to help incorporate this map into the overall chromosome 1 mapping and sequencing project at The Sanger Center. A.P.S. has been partly supported by the Constance Bequest Fund. This work has been mainly supported by the BMH4-CT96-0319 grant from the Commission of European Communities to the European 1q21 Consortium, and partly by the Human Genome Mapping Project Strategic grant C9422614 from the Medical Research Council, U.K.

REFERENCES

Baxendale S, Bates GP, MacDonald ME, Gusella JF, Lehrach H: The direct screening of cosmid libraries with YAC clones. Nucleic Acids Res 19:6651, 1991
Cheng JF, Boyartchuk V, Zhu Y: Isolation and mapping of human chromosome 21 cDNA. Progress in constructing a chromosome 21 expression map. Genomics 23:75-85, 1994
Dale BA, Resing KA, Presland RB. Keratohyalin granule proteins. In: Leigh I, Lane B, Watt F (eds). The Keratinocyte Handbook. Cambridge: Cambridge University Press, 1994, pp 323-350
Dillon N, Grosveld F: Transcriptional regulation of multigene loci: multilevel control. Trends Genet 9:134-137, 1993
Eckert R, Green H: Structure and evolution of the human involucrin gene. Cell 46:583-589, 1986
Englekamp D, Schafer BW, Mattei MG, Erne P, Heizmann CW: Six S100 genes are clustered on human chromosome 1q21: identification of two genes coding for the two previously unreported calcium binding proteins S100D and S100E. Proc Natl Acad Sci USA 90:6547-6551, 1993
Fischer DF, Gibbs S, van de Putte P, Backendorf C: Interdependent transcription control elements regulate the expression of the SPRR2A gene during keratinocyte terminal differentiation. Mol Cell Biol 16:5365-5374, 1996
Forus A, Berner JM, MezaZepeda LA, Saeter G, Mischke D, Fodstad O: Molecular characterisation of a novel amplicon at 1q21-q22 frequently observed in human sarcomas. Br J Cancer 78:495-503, 1998

Gelb BD, Spencer E, Obad S, Edelson GJ, Faure S, Weissenbach J, Desnick RJ: Pycnodysostosis: Refined linkage and radiation hybrid analyses reduce the critical region to 2 cM at 1q21 and map two candidate genes. Hum Genet 98:141-144, 1996
Gibbs S, Fijnemann R, van Wiegant J, Kessel AG, van de Putte P, Backendorf C: Molecular characterisation and evolution of the SPRR family of keratinocyte differentiation markers encoding small proline-rich proteins. Genomics 16:630-637, 1993
Groet J, Ives JH, South AP, et al: Bacterial contig map of the 21q11 region associated with Alzheimer's disease and abnormal myelopoiesis in Down syndrome. Genome Res 8:385-398, 1998
Hardas BD, Zhao X, Zhang J, Longqing X, Stoll S, Elder JT: Assignment of psoriasin to human chromosomal band 1q21: coordinate overexpression of clustered genes in psoriasis. J Invest Dermatol 106:753-758, 1996
Ioannou AP, Amemiya J, Garnes J, et al: A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genet 6:84-89, 1994
Ivens AC, Little PFR. Cosmid clones and their application to genome studies. In: Glover DM, Hames BD (eds). DNA Cloning 3. A Practical Approach. Oxford: IRL Press, 1995, pp 1-46
Kim SY, Horrigan SK, Altenhofen JL, Arbieva ZH, Hoffman R, Westbrook CA: Modification of bacterial artificial chromosome clones using Cre recombinase: introduction of selectable markers for expression in eukaryotic cells. Genome Res 8:404-412, 1998
Korge BP, Ishida-Yamamoto A, Punter C, et al: Loricrin mutation in Vohwinkel's keratoderma is unique to the variant with ichthyosis. J Invest Dermatol 109:604-610, 1997
Lioumi M, Olavesen MG, Nizetic D, Ragoussis J: High-resolution YAC fragmentation map of 1q21. Genomics 49:200-208, 1998
Maestrini E, Monaco AP, McGrath JA, et al: A molecular defect in loricrin, the major component of the cornified cell envelope, underlies Vohwinkel's syndrome. Nature Genet 13:70-77, 1996
Marenholz I, Volz A, Ziegler A, Davies A, Ragoussis I, Korge BP, Mischke D: Genetic analysis of the epidermal differentiation complex (EDC) on human chromosome 1q21: chromosomal orientation, new markers, and a 6-Mb YAC contig. Genomics 37:295-302, 1996
Markova NG, Marekov LN, O'Keefe EJ, Parry DAD, Steinert PM: Profilaggrin is a major epidermal calcium binding protein. Mol Cell Biol 13:613-625, 1993
Mejia JE, Monaco AP: Retrofitting vectors for E. coli-based artificial chromosomes (PACs and BACs) with markers for transfection studies. Genome Res 7:179-186, 1997
Milner CM, Campbell RD: Genes, genes and more genes in the major histocompatibility complex. Bioessays 14:565-571, 1992
Mischke D, Korge BP, Marenholz I, Volz A, Ziegler A: Genes encoding structural proteins of the epidermal cornification and S100 calcium-binding proteins form a gene complex ('epidermal differentiation complex') on human chromosome 1q21. J Invest Dermatol 106:989-992, 1996
Nizetic D, Lehrach H: Chromosome specific cosmid libraries: construction, handling and use in parallel and integrated mapping. In: Glover DM, Hames BD (eds). DNA Cloning 3. A Practical Approach. Oxford: IRL Press, 1995, pp 49-79
Nizetic D, Monard S, Cotter F, Young BD, Lehrach H: Construction of cosmid libraries from flow sorted material of human chromosomes 1, 6, 7, 11, 13 and 18 for use as reference libraries. Mamm Genet 5:801-802, 1994
O'Keefe EJ, Hamilton EH, Lee SC, Steinert P: Trichohyalin: a structural protein of hair, tongue, nail, and epidermis. J Invest Dermatol 101:65s-71s, 1993
Polymeropoulos MH, De Ortiz Luna RI, Ide SE, Torres R, Rubenstein J, Francomano CA: The gene for pycnodysostosis maps to human chromosome 1cen-1q21. Nature Genet 10:238-239s, 1995
Presland RB, Bassuk JA, Kimball JR, Dale BA: Characterization of two distinct calcium-binding sites in the amino-terminus of human profilaggrin. J Invest Dermatol 104:218-223, 1995
Resing KA, Dale BA. Proteins of keratohyalin. In: Goldsmith LA (ed.). Physiology, Biochemistry and Molecular Biology of the Skin. New York: Oxford University Press, 1991, pp 148-167
Schaefer BW, Heizmann CW: The S100 family of EF-hand calcium-binding proteins: Functions and pathology. Trends Biochem Sci 21:134-140, 1996
Schaefer BW, Wicki R, Englekamp D, Mattei MG, Heizmann CW: Isolation of a YAC clone covering a cluster of nine S100 genes on human chromosome 1q21: rationale for a new nomenclature of the S100 calcium-binding protein family. Genomics 25:638-643, 1995
Shizuya H, Birren BI, Kim U-J, Mancino B, Slepak T, Tachiiri Y, Simon MI: Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci USA 89:8794-8797, 1992
Steinert PM, Roop DR: Molecular and cellular biology of intermediate filaments. Annu Rev Biochem 57:593-625, 1988
Townes TM, Behringer RR: Human globin locus activation region (LAR): Role in temporal control. Trends Genet 6:219-229, 1990
Volz A, Korge BP, Compton JG, Ziegler A, Steinert PM, Mischke D: Physical mapping of a functional cluster of epidermal differentiation genes on chromosome 1q21. Genomics 18:92-99, 1993
Weterman MAJ, Wilbrink M, Dijkhuizen T, van den Berg E, van Kessel AG: Fine mapping of the 1q21 breakpoint of the papillary renal cell carcinoma-associated (X;1) translocation. Hum Genet 98:16-21, 1993
Wicki R, Schaefer BW, Erne P, Heizmann CW: Characterization of the human and mouse cDNAs coding for S100A13, a new member of the S100 protein family. Biochem Biophys Res Commun 227:594-599, 1996a
Wicki R, Marenholz I, Mischke D, Schaefer BW, Heizmann CW: Characterization of the human S100A12 (calgranulin C, p6, CAAF1, CGRX) gene, a new member of the S100 gene cluster on chromosome 1q21. Cell Calcium 20:459-464, 1996b
Williams ML, Elias PM: From basket weaver to barrier: unifying concepts for the pathogenesis of the disorders of cornification. Arch Dermatol 192:626-629, 1993
Yuspa SH, Kilkenny AE, Steinert PM, Roop DR: Expression of murine epidermal differentiation markers is tightly regulated by restricted extracellular calcium concentrations in vitro. J Cell Biol 109:1207-1217, 1989
Zhao XP, Elder JT: Positional cloning of novel skin-specific genes from the human epidermal differentiation complex. Genomics 45:250-258, 1997