<<

THE MOLECULAR CLONING AND CHARACTERISATION

OF NORMAL AND BETA-PLUS THALASSAEMIC HUMAN

GLOBIN

by

DAVID ANTHONY WESTAWAY

a thesis submitted for the degree of

Doctor of Philosophy

in the

University of London

Department of Biochemistry

St Mary's Hospital Medical School

London W2 1PG

December, 1980

ABSTRACT

β-thalassaemia is a human hereditary disease characterised by decreased output of β globin chains. Human α-, β- and γ-globin cDNA plasmids, whose identity was confirmed by sequence analysis, were used to investigate the β-globin locus in β-thalassaemia. No gross alteration of the globin locus is associated with the disease. In order to identify the mutation conferring the thalassaemia, a chromosomal β globin gene has been cloned in phage lambda. The starting material was DNA from a Turkish Cypriot compound heterozygote, with a β+ gene on one chromosome and a δβ fusion gene on the other. The DNA was digested with the restriction endonuclease Hsu 1, which generates a 7.5kb fragment from the β globin gene and a 16kb fragment from the fusion gene. The Hsu 1 cleaved DNA was enriched for the 7.5kb size fraction and ligated to the lambda cloning vector λNEM788. 160,000 recombinants were screened and a positive-scoring phage was isolated. Restriction analysis confirmed that this phage contained a β globin gene. The 7.5kb insert was subcloned into the plasmid pAT153, and digestion with seven restriction endonucleases gave a pattern indistinguishable from that of a normal β globin gene. Thus within the 7.5kb of DNA cloned, which includes the β globin gene plus almost 6kb of flanking DNA, there have been no insertions or deletions greater than about 100 base-pairs. Also, no point mutations occur within the hexanucleotide recognition sequences of the restriction endonucleases used here. The complete sequences of the coding blocks and introns of the β+ gene are presented and compared to a normal β globin gene. The gene has a G->A transition in the first intron. This sequence variant is also found in the β+ gene of a Greek Cypriot homozygous β thalassaemic described by Spritz et al (1980).

To my Parents

Acknowledgements.

To Bob Williamson, Peter Little and Ian Jackson for endless useful discussions (Figure 19 is the Peter Little Memorial Diagram). To Ned Mantei and Charles Weissmann for help with DNA sequencing. To Binie Klein and Ken Murray for help with in vitro packaging. To Noreen Murray, David Sherratt and Tom Maniatis for phage and bacterial strains. To Bernadette Modell for pointing out the compound heterozygote patient in our stocks of human DNA. To Julian Crampton for photographic help. To for reading the manuscript, and to my sister for typing it. I was supported by an MRC studentship. To all the Biochemistry department at St. Mary's. A special mention to all the hard-core globinologists (real or honorary): Jack, Tim, Adam, Auntie Janet, Sue, Gill, Mike, Raymond, Rob, Charles, Lesley and Derek.

To my family. To my Croydonian friends for their acerbic wit.

ABBREVIATIONS

bp  base-pair
kb  10^3 base-pairs
Bis  N,N'-methylenebisacrylamide
BSA  bovine serum albumin
cpm  counts per minute
DEAE  diethylaminoethyl
D.melanogaster  Drosophila melanogaster (a fruit fly)
DNA  deoxyribonucleic acid
DNase  deoxyribonuclease
cDNA  complementary DNA
dpm  disintegrations per minute
E.coli  Escherichia coli (a bacterium)
EDTA  ethylene diamine tetraacetic acid
EtOH  ethanol
g  acceleration due to gravity (terran units)
Hb  haemoglobin
µg  microgram
ng  nanogram
pg  picogram
M  molar
moi  multiplicity of infection
pfu  plaque forming units
RNA  ribonucleic acid
hnRNA  heterogeneous nuclear RNA
mRNA  messenger RNA
tRNA  transfer RNA
RNase  ribonuclease
rpm  revolutions per minute
SDS  sodium dodecyl sulphate
TCA  trichloroacetic acid
TEMED  N,N,N',N'-tetramethylethylenediamine
Thal  thalassaemia
Tris  2-amino-2-(hydroxymethyl)propane-1,3-diol
UV  ultraviolet
w/v  weight per volume
X.laevis  Xenopus laevis (a toad)

Experiments that failed too many times,
And transformations that were too hard to find

Blue Oyster Cult, 1974

Figures

1. The Structure of the Human α- and β-Globin mRNAs.

2. The Human α- and β-like Globin Gene Clusters.

3. Structure of the Human α- and β-Globin Genes.

4. Putative Recognition Sequences of the Human β-Globin Gene.

5. Gene Deletions in β Thalassaemia.

6. Human α-, β-, and γ-Globin Plasmids pHα/β/γG1.

7. A Sequencing Gel of the α-Globin cDNA Plasmid pHαG1.

8. Partial Nucleotide Sequences of pHα/β/γG1.

9. The β Thalassaemic Patient.

10. Southern Transfers of the Patient's DNA.

11. Protocol for Cloning the Hsu 1 β Globin Gene Fragment.

12. Optimising UV Irradiation of In Vitro Packaging Lysogens.

13. Analysis of Size-Fractionated Human DNA.

14. The Hind III (Hsu 1) Lambda Vector NEM788.

15. Restriction Endonuclease Analysis of Recombinant Phage.

16. "Benton and Davis" Screening of Recombinant Phage.

17. Plaque-Purification of a Positive-Scoring Recombinant Phage, λPaddington 1.

18. Southern Transfers of the Recombinant Phage λPaddington 1.

19. Mapping the β Globin cDNA Hybridising Fragments in λPaddington 1.

20. Restriction Digests of the Pst 1 and Hsu 1 Subclones.

21. Restriction Digests of the 7.5β Subclone.

22. Restriction Maps of the Plasmids 4.4β and 7.5β.

23. Parallel Restriction Mapping of the Normal and β Thalassaemic Globin Genes.

24. The "Shotgun" Sequencing Protocol.

25. Starting Fragments for Shotgun Sequencing of the β Thalassaemic Human β Globin Gene.

26. Sequencing Gel of an Rsa 1 Fragment spanning the 3' end of the large intron.

27. The Sequencing Protocol.

28. Sequencing Gel of a Hinf 1 Fragment spanning the Gene's 3' flanking region.

29. Sequencing Gel of a Hinf 1 Fragment spanning the region approximately 80 downstream from the Gene's site.

30. Sequencing Gels of the coding and anti-coding strands of an Hph 1 fragment spanning the gene's small intron.

Tables

1. Packaging Efficiencies of Lambda.

2. Packaging Efficiencies of Restricted and Unrestricted Vector DNAs.

3. Construction of λNEM788/Human Recombinants In Vitro.

4. cDNA Hybridising Restriction Fragments of Lambda Paddington 1.

5. Restriction Fragments generated by Double-Digests of the Hsu 1 Subclone (7.5β).

CONTENTS

Abstract

Dedication

Acknowledgements

Abbreviations

Quotation

List of Figures

List of Tables

1. Introduction

1.1 Recombinant DNA and Molecular Cloning

1.2 The Southern Transfer Technique

1.3 The Human Globin Proteins and Messenger RNAs

1.4 Globin Gene Linkage

1.5 Introns and Precursor RNAs

1.6 Globin Pseudogenes and Repeated DNA Sequences

1.7 Fine Structure of the Globin Genes

1.8 The Proteins of Globin Gene Expression

1.9 Inherited Diseases of the Globin Genes

1.10 The Molecular Basis of Thalassaemia

Gene Deletions in Thalassaemia

Non-Deletion Thalassaemias

1.11 Summary and Aims

2. Materials and Methods

2.1 Materials

2.2 Bacterial and Phage Strains

2.3 General Methods

2.3.1 Restriction Enzyme Digestion

2.3.2 Ethanol Precipitation of Nucleic Acids

2.3.3 Agarose Gel Electrophoresis

2.3.4 Acrylamide Gel Electrophoresis

2.3.5 Visualisation of DNA Restriction Fragments

2.3.6 Autoradiography

2.3.7 Recovery of Restriction Fragments

2.3.8 Gel Filtration

2.3.9 Determination of Radioactivity

2.4 Specific Methods

2.4.1 Preparation of Plasmid DNA

2.4.2 Transformation of HB101

2.4.3 Colony Electrophoresis

2.4.4 Labelling Plasmid DNA by Nick-Translation

2.4.5 Southern Transfers

2.4.6 Growth of Phage Lambda in liquid culture

2.4.7 Growth of Phage Lambda on Agar plates

2.4.8 Preparation of Phage DNA

2.4.9 Preparation of Phage Vector "arms"

2.4.10 In Vitro Packaging of Lambdoid DNA

2.4.11 Miniprep Analysis of Lambdoid DNA

2.4.12 Screening of Lambda Recombinants

2.4.13 Maxam and Gilberting in Zurich

2.4.14 Maxam and Gilberting in Paddington

3. Results

3.1 Partial Sequencing of Human α-, β- and γ-Globin cDNA Plasmids

3.2 Molecular Diagnosis of β Thalassaemia in association with Hb Lepore

3.3 In Vitro Packaging of Lambdoid DNA

3.4 Making Recombinant Phage

3.4.1 Enrichment for Human Globin Sequences

3.4.2 Manipulation of Vector DNA

3.5 Screening Recombinant Phage derived from the β/Lepore Human DNA

3.6 Mapping the Recombinant Phage

3.7 Subcloning of the cloned β Globin Gene into the Plasmid pAT153

3.8 Restriction mapping of the Subcloned Recombinant Plasmids

3.9 The Sequence of the β Thalassaemic β Globin Gene

3.9.1 Rationale

3.9.2 Technical Aspects

3.9.3 The Sequence

4. Discussion

4.1 The Analysis of Control Sequences in Eukaryotic Genes

4.2 Naturally occurring mutants of the Human Globin Genes

4.3 The map position of the β+ Thalassaemia Gene

4.4 Molecular Cloning of the β+ Thalassaemic Globin Gene

4.5 Structural Comparison of Normal and β Thalassaemic Globin Genes

5. Summary and Perspectives

6. References

Errata

1 INTRODUCTION

1.1 Recombinant DNA and Molecular Cloning

The genome of a higher eukaryote consists of vast amounts of deoxyribonucleic acid (DNA). Known mutation rates suggest that animals should not survive carrying such a large genetic "target", as they would accumulate many lethal mutations before reaching reproductive age. The implication is that not all of the DNA contains useful information. Biophysical studies of DNA reannealing have shown that some sequences are repeated hundreds or thousands of times in the genome (Britten and Kohne, 1968). It is probable that the highly repeated sequences can sustain multiple mutational hits without any effect on the organism's viability. Genes coding for abundant components of the cell, such as ribosomal RNA (rRNA) or transfer RNA (tRNA), are present hundreds of times in the genome. Genes coding for enzymes, or proteins for export from the cell, are usually present only once in the haploid genome. The three types of DNA sequence (unique, repeated and highly repeated) are probably interspersed.

To investigate the biochemistry of how a gene works, it is necessary to purify that gene sequence. The genome organisation of higher eukaryotes made the purification of single-copy gene sequences extremely difficult (Anderson and Schimke 1976, Woo et al 1977). The situation was reversed with the advent of "genetic engineering" technology. The idea was to evolve a system analogous to bacterial generalised transducing phage. The methodology relies on the ability of bacterial replicons to amplify phenotypically silent tracts of passenger DNA. DNA from (in this case) the eukaryotic genome is fragmented and rejoined to a prokaryotic acceptor chromosome.

These acceptor chromosomes, such as plasmids or phage, are called "vectors". The fragmentation and rejoining manipulations are greatly facilitated by restriction endonucleases. These enzymes cleave DNA at specific nucleotide sequences ("sites") to generate DNA restriction fragments. For instance, the restriction enzyme Eco R1 recognises and cleaves at the sequence 5' GAATTC 3'. However, these enzymes are not an absolute requirement for producing recombinant DNA in vitro.
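As a rough illustration of how a recognition sequence defines restriction fragments, the Python sketch below scans a sequence for the Eco R1 site and for the Hind III site (Figure 14 treats Hind III and Hsu 1 as recognising the same site) and reports the fragment lengths of a linear molecule. The cut offsets, the toy sequence and the function names are assumptions made for illustration. Passing two enzymes simply pools their cut points, which is how a double digest behaves.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one so overlapping sites are found
    return positions


def fragment_lengths(seq_len: int, cut_points: list[int]) -> list[int]:
    """Lengths of the pieces produced by cutting a linear molecule at `cut_points`."""
    bounds = [0] + sorted(set(cut_points)) + [seq_len]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# (recognition sequence, number of bases of the site before the cut on the top strand)
ECO_RI = ("GAATTC", 1)    # G^AATTC
HIND_III = ("AAGCTT", 1)  # A^AGCTT


def digest(seq: str, *enzymes: tuple[str, int]) -> list[int]:
    """Fragment lengths for a single digest, or a double digest if two enzymes are given."""
    cuts = []
    for site, offset in enzymes:
        cuts += [pos + offset for pos in find_sites(seq, site)]
    return fragment_lengths(len(seq), cuts)


toy = "TTGAATTCAAAAGCTTGG"  # hypothetical 18 bp sequence
print(digest(toy, ECO_RI))            # [3, 15]
print(digest(toy, HIND_III))          # [11, 7]
print(digest(toy, ECO_RI, HIND_III))  # [3, 8, 7]
```

Comparing the single- and double-digest fragment sizes in this way is the same logic used to place sites relative to one another in a restriction map.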

The mixture of recombined bacterial and eukaryotic DNA is reintroduced into live host bacteria. This process is inefficient, so a bacterium that takes up DNA almost invariably receives only one recombinant molecule. Once inside the cell the vector DNA will amplify both itself and its eukaryotic passenger DNA sequence.
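The claim that a transformed bacterium nearly always receives a single molecule can be made quantitative if one assumes (an assumption for illustration, not a measurement from this work) that the number of molecules taken up per cell follows a Poisson distribution with a small mean m. The sketch below computes, among cells that received at least one molecule, the fraction that received two or more.

```python
import math


def fraction_multiply_transformed(mean_uptake: float) -> float:
    """Among cells taking up at least one molecule, the fraction that took up two or more,
    assuming molecules per cell are Poisson-distributed with mean `mean_uptake`."""
    p0 = math.exp(-mean_uptake)   # probability of taking up no molecules
    p1 = mean_uptake * p0         # probability of taking up exactly one molecule
    return (1.0 - p0 - p1) / (1.0 - p0)


print(fraction_multiply_transformed(0.01))  # about 0.005, i.e. roughly m/2 for small m
```

With a hypothetical mean of 0.01 molecules per cell, only about one transformant in two hundred would carry two recombinant molecules, which is why each bacterial clone can be treated as carrying a single passenger sequence.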

The inserted sequence is not contaminated with other eukaryotic sequences, the only "background" now being that of the bacterial and vector chromosomes. These contaminants are readily removed by any of a variety of biophysical techniques.

Therefore, if one can identify the bacterium harbouring the sequence of interest, the bacterium can be propagated clonally to provide large amounts of a single eukaryotic sequence. These bacteria are (usually) identified by the ability of their DNA to anneal to a radiolabelled hybridisation probe. The process of identifying a particular recombinant of interest is called

"screening". A sequence purified by these techniques has undergone molecular cloning.

Plasmids and phage have different parameters as cloning vehicles. The latter are more suitable for cloning single-copy eukaryotic genes. The first advantage of phage is that they

accept larger DNA fragments. The larger the inserted fragment

in each recombinant, the fewer recombinants are required to

contain all of the DNA sequences in the genome of the organism

under study. In other words, fewer recombinants are required for a

representative "library" of the organism's genome. Disabled derivatives of phage lambda have been constructed which are capable of accepting DNA fragments of up to 21kb (Murray and

Murray, 1974, Blattner et al, 1977, Leder et al, 1977).
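The relation between insert size and library size can be sketched with the widely used Clarke and Carbon (1976) estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the desired probability that any given sequence is represented. The genome size of 3 x 10^9 base-pairs and the insert sizes below are illustrative round numbers, not values taken from this thesis.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of random recombinants needed so that
    a given sequence is present in the library with probability `prob`."""
    f = insert_bp / genome_bp  # fraction of the genome carried by a single insert
    # log1p(-f) computes ln(1 - f) accurately when f is very small
    return math.ceil(math.log(1.0 - prob) / math.log1p(-f))


HUMAN_HAPLOID_BP = 3e9  # approximate haploid human genome size (illustrative)

print(clones_needed(HUMAN_HAPLOID_BP, 15_000))  # roughly 9.2 x 10^5 for phage-sized inserts
print(clones_needed(HUMAN_HAPLOID_BP, 5_000))   # roughly 2.8 x 10^6 for smaller inserts
```

Tripling the insert size cuts the required library roughly threefold, which is the practical advantage of phage vectors described above.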

Fragments of this size can be introduced into plasmid vectors, but the efficiency of the "transformation" step needed to reintroduce the naked recombinant DNA back into bacteria falls off rapidly with increasing size of the inserted DNA.

Recombinant phage lambda DNA can be reconstituted into viable phage in vitro and then infected directly into E.coli (Hohn and

Murray, 1977). The "in vitro packaging" process will package all recombinant lambda molecules with equal efficiency, as long as the total length of the molecules is between 75% and 105% of that of wild-type lambda (Bellet et al, 1971). In terms of the inserted DNA

fragment, these limits are typically 1-15 kb. Even allowing

for the different "target" DNA size optima of plasmids and phage, the efficiency of formation of recombinants is at least

tenfold higher using phage lambda. Another advantage of phage

lambda is that recombinant phage plaques can be screened at high densities in situ (Benton and Davis, 1977).
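The 75% to 105% packaging window translates into an insert-size range once the length of the vector "arms" is fixed. The sketch below assumes the wild-type lambda genome length of 48,502 base-pairs and a hypothetical arm length of 35 kb; the real arm length of λNEM788 is not given here, so the numbers are illustrative only.

```python
import math

LAMBDA_BP = 48_502  # length of the wild-type lambda genome in base-pairs


def insert_range(arms_bp: int, low: float = 0.75, high: float = 1.05) -> tuple[int, int]:
    """Smallest and largest insert that keeps a recombinant within the packageable
    window of `low` to `high` times the wild-type lambda length."""
    min_total = math.ceil(low * LAMBDA_BP)    # shortest packageable molecule
    max_total = math.floor(high * LAMBDA_BP)  # longest packageable molecule
    if arms_bp > max_total:
        raise ValueError("vector arms alone exceed the packaging limit")
    return max(0, min_total - arms_bp), max_total - arms_bp


print(insert_range(35_000))  # (1377, 15927): inserts of about 1.4 to 15.9 kb
```

With these assumed arms the window reproduces the "typically 1-15 kb" range quoted above; longer arms shift the whole window towards smaller inserts.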

1.2 The Southern Transfer Technique

Another technique which has revolutionised the structural

analysis of the eukaryotic genome is the Southern transfer

(Southern, 1975). With modifications (Botchan et al, 1976, Jeffreys and Flavell, 1977a), the approach allows the

construction of a physical map around a single copy gene. This does not require any molecular cloning. The starting material,

total genomic DNA, is digested with a restriction endonuclease

to generate a set of defined DNA fragments. These are then

fractionated on the basis of size by agarose gel electrophoresis. DNA within the gel matrix is denatured (by soaking

the gel in a NaOH solution) and blotted onto a nitrocellulose

filter. The DNA is now immobilised and can be hybridised to a

radiolabelled hybridisation probe specific for the gene of

interest. After washing off unhybridised material, the filter

is autoradiographed. Restriction fragments containing the gene

sequence will be visualised as bands on the X-ray film and the

electrophoretic mobility (size) of these bands will be

characteristic for the restriction endonuclease used to digest

the starting DNA. By using single and double digests with

various restriction endonucleases, the position of restriction

sites around the gene can be established.
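Converting band mobility into fragment size is usually done with a standard curve of marker fragments, assuming the approximately linear relation between migration distance and log(size) that holds over a limited range in agarose gels. In the sketch below the marker sizes are those of a Hind III digest of lambda DNA, but the migration distances are idealised, hypothetical values generated from an exact log-linear rule, so the fit recovers sizes exactly; real gels deviate from this rule, especially for large fragments.

```python
import math


def fit_standard_curve(sizes_bp, distances_mm):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    xs = list(distances_mm)
    ys = [math.log10(s) for s in sizes_bp]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Interpolate a fragment size (bp) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


# Lambda Hind III marker sizes; distances are hypothetical: 100 - 20 * log10(size) mm.
markers = [23130, 9416, 6557, 4361, 2322, 2027]
distances = [100 - 20 * math.log10(s) for s in markers]
slope, intercept = fit_standard_curve(markers, distances)

band = 100 - 20 * math.log10(7500)  # an idealised band migrating like a 7.5 kb fragment
print(round(estimate_size(band, slope, intercept)))  # 7500
```

In practice only bands lying within the size range spanned by the markers should be interpolated in this way.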

Why is it useful to map the positions of restriction

sites? Although these sites are positioned by molecular

analysis, they can be used as genetic markers, just like, say,

the gene for a fruit-fly eye colour mutant. Markers are useful

for measuring distances along a chromosome. An obvious question

that can be asked is, do these distances change when the gene

is dysfunctional? These molecular markers can also be used to

follow patterns of inheritance.

1.3 The Human Globin Proteins and Messenger RNAs

The new techniques of recombinant DNA can be used to investigate any gene or gene family of interest. The pathway of gene expression is similar for all coding genes, and is discussed with particular reference to the human globin genes

in the following sections. Briefly, the genes are "transcribed"

in the cell nucleus by RNA polymerase to give an accurate copy of one of the DNA strands of the gene. This RNA molecule is modified by a number of enzymes and transported to the cytoplasm. The mature RNA molecule is called a messenger RNA

(mRNA). The coding information contained in the sequence of the four different nucleotides of the mRNA is retrieved by "translation". Ribosomes (ribonucleoprotein complexes) traverse the mRNA, using the triplet code of the RNA sequence to direct the synthesis of the corresponding polypeptide chain.

The particular advantages of the globin gene system are that it is a multigene family, where the synthesis of a given globin protein is limited to a particular stage in human development. The proteins are all well characterised and the primary structure is known in many cases (Dayhoff, 1972). A variety of inherited diseases of globin chain synthesis exist.

Finally, the mRNAs are quite easy to purify. This last point is important as the purified mRNAs can be used as "probes" to study globin gene (DNA) sequences.
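The decoding step described at the start of this section, in which the mRNA is read three nucleotides at a time, can be sketched with the standard genetic code. The table below is built from the usual compact one-letter encoding of the code. Starting at the first AUG is a simplifying assumption (real initiation also depends on the surrounding sequence), and the toy mRNA is the opening codons of human β globin followed by an artificial UAA stop.

```python
BASES = "UCAG"
# Standard genetic code, one-letter amino acids, codons ordered UUU, UUC, UUA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon ('*' in the table)."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # only complete codons are read
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGUGCACCUGACUCCUGAGGAGAAGUAA"))  # MVHLTPEEK
```

The mature β chain begins Val-His-Leu-Thr-Pro-Glu-Glu-Lys because the initiator methionine is removed after translation.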

The globin proteins themselves are the apoprotein components of the oxygen transport protein, haemoglobin. These genes are only expressed in the cells of the blood (erythropoietic) system, though basal levels of expression have been reported in other tissues (Humphries et al, 1976). Only some populations of erythropoietic cells direct the synthesis of globin chains, and lineages of various subpopulations have been described (Paul, 1976). Each haemoglobin molecule consists of four globin chains, each carrying a haem group. The adult haemoglobin, HbA, is composed of two alpha (α) and two beta (β) chains. However, a minor species also exists (HbA2) in which the β chains are replaced by delta (δ) globin. Delta globin is very closely related to β globin, and its low level in the adult is probably determined at the transcriptional level (Wood et al, 1978).

Foetal haemoglobin contains gamma (γ) globin, which is another β-like globin. Two distinct forms exist, Gγ and Aγ, differing only by a single amino acid residue (glycine or alanine) at position 136 (Schroeder et al, 1968). There are no α-like foetal globins, but both α- and β-like embryonic globins exist. These are epsilon (ε) and zeta (ζ) globin, and are found in the embryonic haemoglobins Hb Portland (ζ2γ2), Hb Gower 2 (α2ε2) and Hb Gower 1 (ζ2ε2). The globin system thus provides an opportunity to follow the developmental switching of individual genes in a multigene family.

The mRNAs for α and β globin are abundant in erythrocytes, facilitating their analysis. The size of these molecules suggests that they could code for proteins about 200 amino acids long, whereas α and β globin are only 141 and 146 amino acids long respectively (Dayhoff, 1972). The extra nucleotides are organised in two blocks on either side of the coding region, and these are called the 5' and 3' untranslated regions. The mRNA 5' terminus is blocked with a cap structure.
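The gap between mRNA size and protein length can be made explicit with simple codon arithmetic. The sketch assumes the coding region runs from the AUG to the end of the stop codon and that the initiator methionine is absent from the mature chains; the 600-nucleotide mRNA length is only the figure implied by "about 200 amino acids", not a measured value.

```python
def coding_length_nt(residues: int) -> int:
    """Coding-region length from the first base of AUG to the last base of the stop
    codon, for a mature chain of `residues` amino acids lacking the initiator Met."""
    return 3 * (residues + 2)  # one extra codon for Met and one for the stop codon


def untranslated_nt(mrna_nt: int, residues: int) -> int:
    """Nucleotides left over for the 5' and 3' untranslated regions combined."""
    return mrna_nt - coding_length_nt(residues)


ASSUMED_MRNA_NT = 600  # illustrative: 200 codons' worth of sequence

for name, residues in (("alpha", 141), ("beta", 146)):
    print(name, coding_length_nt(residues), untranslated_nt(ASSUMED_MRNA_NT, residues))
# alpha 429 171
# beta 444 156
```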

This is a methylated guanosine residue linked to the 5'-terminal nucleotide via a 5'-5' triphosphate linkage. This structure is thought to be involved in translation initiation (Shatkin, 1976). The 5' region of the mRNA carries the AUG translation initiation codon, and adjacent to this is a putative ribosome binding site. The 3'-terminal nucleotide has a tract of polyadenylic acid attached to it. This "poly A tail" is added enzymatically after transcription of the mRNA. Comparison of the 3' untranslated regions of a variety of mRNAs reveals only one conserved sequence, an AAUAAA block about twenty nucleotides away from the poly A site (Marotta, 1977, Proudfoot, 1977). This has been proposed as a recognition sequence for the polyadenylation enzyme. The function of the 5' and 3' untranslated regions is not obvious. Removal of the 3' untranslated region from rabbit β globin mRNA does not alter the ability of the mRNA to direct protein synthesis (Kronenburg et al, 1979). Structural features of human α and β globin mRNAs are summarised in Figure 1.

[Figure 1: the β- and α-globin mRNAs drawn 5' to 3' as cap, AUG, coding region (labelled 146 and 141), UAA, poly A; scale bar 0 to 500 nucleotides.]

Figure 1. The Structure of Human α- and β-Globin mRNAs. Coding areas of the molecules are drawn as thick lines. AUG and UAA are the initiation and termination codons respectively of protein synthesis; nucs = nucleotides.
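Locating the AAUAAA block relative to the poly A addition site is a simple string search. The sketch assumes that the input 3' untranslated sequence ends exactly at the poly A addition site (poly A tail excluded); the function name and the toy sequences are hypothetical.

```python
def poly_a_signal_distances(utr3: str, signal: str = "AAUAAA") -> list[int]:
    """For each occurrence of `signal` in a 3' untranslated region that ends at the
    poly A addition site, return the number of nucleotides between the end of the
    signal and that site. DNA letters (T) are accepted and converted to RNA (U)."""
    utr3 = utr3.upper().replace("T", "U")
    distances = []
    pos = utr3.find(signal)
    while pos != -1:
        distances.append(len(utr3) - (pos + len(signal)))
        pos = utr3.find(signal, pos + 1)
    return distances


print(poly_a_signal_distances("GCUAGC" + "AAUAAA" + "G" * 20))  # [20]
```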

Purified mRNA molecules can be used as hybridisation

probes to detect their complementary gene sequences. In

practice this is usually achieved by reverse transcription of

the mRNAs into a complementary DNA (cDNA) product. Radioactive

nucleotides can be included in this reaction to yield a highly

radioactive globin gene specific probe. Such cDNA probes are

always contaminated at low levels by species transcribed from

non-globin RNAs present in the erythrocytes. However, if cDNA

molecules prepared from total erythrocyte (poly A+) RNA are

converted into double-stranded (ds) DNA, then this ds DNA can

be cloned in bacterial plasmids. Most chimaeric plasmids

produced in this way will contain globin-specific inserts, and

can be identified by hybridisation to (relatively) pure globin

cDNA. This technology has been used in this laboratory, and by

others to produce α-, β- and γ-globin "cDNA" plasmids (Little et al, 1978, Wilson et al, 1978). The identity of the human globin cDNA plasmids pHα/β/γG1 was confirmed by myself using direct sequence analysis (Section 3.1).
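Because the cDNA is complementary and antiparallel to its template, its sequence can be derived from the mRNA by complementing and reversing. The sketch below writes both strands 5' to 3'; the function names and the toy mRNA (the first three β globin codons) are illustrative.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")  # A->T, C->G, G->C, U->A
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (written 5' to 3') copied from an mRNA written 5' to 3'."""
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand(cdna: str) -> str:
    """Complementary second strand (5' to 3'), i.e. the mRNA sense in DNA letters."""
    return cdna.upper().translate(DNA_COMPLEMENT)[::-1]


cdna = first_strand_cdna("AUGGUGCAC")
print(cdna)                  # GTGCACCAT
print(second_strand(cdna))   # ATGGTGCAC
```

The double-stranded product formed by the two strands is what is inserted into the plasmid vector.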

1.4 Globin Gene Linkage

Genetic studies imply that the β-like globin genes are

physically linked (Weatherall and Clegg, 1972). This result is

confirmed by molecular analysis using radiolabelled globin cDNA or cDNA plasmids to detect gene sequences. The gene order Gγ-Aγ-δ-β has been established by Southern blotting (Mears et al,

1978, Flavell et al, 1978, Little et al, 1979, Fritsch et al,

1979, Bernards et al, 1979a, Tuan et al, 1979) and by isolation

of recombinant phage lambda containing linked genes (Lawn et al, 1978, Ramirez et al, 1979). The ε gene lies to the 5' side of the γ globin genes (Fritsch et al, 1980; Figure 2). Analogous studies have defined the structure of the α-like globin gene cluster. Both α and ζ genes are duplicated, with the embryonic genes lying to the 5' side of the adult genes (Figure 2; Orkin et al, 1978a, Lauer et al, 1980). In

both gene clusters the order of the genes parallels the order

in which the proteins are expressed in development, with the

adult genes lying on the 3' side of the map.

1.5 Introns and Precursor mRNAs

Three interesting observations arise from structural analysis of these gene clusters. Firstly, the genes are interrupted by tracts of DNA not represented in either the mature protein or the mRNA product of the gene. These tracts are called "introns", or intervening sequences. Introns were first discovered in the ribosomal RNA (rRNA) genes of Drosophila melanogaster and subsequently in the genome of Adenovirus (Glover and Hogness 1977, Chow et al, 1977). They have also been described in the ovalbumin, ovomucoid and lysozyme genes of the chicken, and in a plethora of other genes (Dugaiczyk et al, 1978, Catterall et al, 1979, Nguyen-Huu et al, 1979). However, they are not ubiquitous: histone and 5S rRNA genes are uninterrupted (Schaffner et al, 1978, Korn and Brown, 1978).

[Figure 2: maps of the human β-like cluster (ψβ2, ε, Gγ, Aγ, ψβ1, δ, β; scale 0 to 60 kb) and α-like cluster (ζ genes, ψα1, α2, α1).]

Figure 2. The Human α- and β-like Globin Gene Clusters. The direction of transcription is from left to right. ψ = pseudogene.

The first globin gene intron was discovered in the rabbit

P -globin gene by Jeffreys and Flavell (Jeffreys and Flavell,

1977b). The sequence interrupting the gene is about 0.6kb long and lies towards the 3' end of the gene, in the region encoding the carboxy terminus of the protein. Fine structure analysis of the rabbit β-globin gene has subsequently shown that a second, small intron of 126 base-pairs lies near the 5' end of the gene (Van den Berg et al, 1978).

This pattern of a small and a large intron has been confirmed for every β-like globin gene examined. This includes the mouse β-major and β-minor genes, chicken β-globin, Xenopus laevis β-globin and all the human β-like globin genes (Konkel et al, 1979, Dodgson et al, 1979, Patient et al, 1980, Efstratiadis et al, 1980). The introns of the human β-like globin genes are approximately 130 and 800 base-pairs long, and they occur between codons 30/31 and 104/105 respectively (Lawn et al, 1980). Two introns interrupt the human α-like globin genes in analogous places in the protein coding sequence, but both are small. The structure of the human α and β globin genes is shown in Figure 3.

[Figure 3: scale drawings of the human β- and α-globin genes; scale bar 0, 0.8 and 1.6 kb.]

Figure 3. Structure of the Human α- and β-Globin Genes. Coding regions are in black. Introns (intervening sequences) are in white. The 5' and 3' untranslated regions of the mRNAs are shaded. Transcription is from left to right.

Nucleotide sequencing has shown that introns are less conserved in evolution than their adjacent coding sequences, suggesting that intron DNA may not be particularly important for the functioning of its "host" gene. Konkel et al have suggested that the purpose of introns is to reduce crossover between related genes (Konkel et al, 1979). This is because aligned DNA duplexes of the two related genes will be

interrupted by a large mismatch (corresponding to the diverged

intron sequences), which could prevent the complete exchange of

DNA strands between the two chromosomes. The importance of

reducing crossover between related, but not homologous genes is

that the number of copies of these genes will be stabilised. It has also been suggested that the coding-blocks separated by introns encode discrete protein domains (Gilbert, 1978).

Illegitimate recombination events between introns would create new orders of coding-blocks along the chromosome. This in turn would produce proteins with new combinations of domains, and

these proteins could be tested (in the evolutionary sense) for

their usefulness. The key point here is that the recombination

to produce novel orders of domains occurs in an unexpressed

area of the gene. This is useful because the recombination doesn't have to be perfectly in phase (as it would if the

recombination occurred directly between the two protein coding

blocks). Such (coding-block) shuffling does appear to

happen on a much shorter time scale in the immunoglobulin genes

(Sakano et al, 1979, Seidman et al, 1979).

Introns do not code for any obvious "gene" product, but

these sequences are transcribed. Studies over a number of years

have indicated that mature messenger RNAs are derived from the

"processing" of larger transcripts. The precursor molecules are

found in the heterogeneous nuclear RNA (hnRNA). A great deal of

information is now available for the mouse β-globin gene, where a 15S precursor to the 9S mRNA has been characterised (Curtis and Weissmann, 1976, Kwan et al, 1977, Ross, 1976). This molecule is capped and polyadenylated. R-loop mapping showed that the 15S RNA was co-linear with the chromosomal mouse β-globin gene (Tilghman et al, 1978). This was the first evidence that the major, and probably the minor, mouse globin gene introns were transcribed. Transcription of introns has now been established as a general phenomenon. The intron sequences are removed from the 15S RNA molecule via excision/ligation reaction(s) collectively referred to as "splicing". Several reaction intermediates occur in the conversion of 15S RNA to 9S mRNA, both for mouse and rabbit β-globin (Kinniburgh and Ross, 1979, Flavell et al, 1979a). Analogous intermediates exist in the maturation of ovalbumin gene transcripts (Roop et al, 1978). The 5' terminus of rabbit and mouse 15S RNA is identical with that of the 9S mRNA (Flavell et al, 1979a, Weaver and Weissmann, 1979). Transcripts derived from in vitro transcription of a cloned rabbit β-globin gene again have a 5' terminus (start point) indistinguishable from that of the 9S mRNA (R. Flavell, personal communication).
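Viewed purely as a sequence operation, splicing removes each intron and joins the flanking coding blocks. The sketch below assumes intron coordinates are already known (0-based, end-exclusive) and, as a sanity check, requires each intron to begin GU and end AG, following the prototype junction sequence discussed in section 1.7. The function name and toy precursor are hypothetical; it says nothing about the enzymology or the order in which intermediates form.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as 0-based half-open (start, end) coordinates on the
    precursor, and join the flanking coding blocks in order."""
    rna = pre_mrna.upper().replace("T", "U")
    pieces, previous_end = [], 0
    for start, end in sorted(introns):
        if start < previous_end or end <= start:
            raise ValueError(f"bad or overlapping intron coordinates {start}-{end}")
        intron = rna[start:end]
        # Check the gu...ag ends expected from the prototype splice junction.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not start GU and end AG")
        pieces.append(rna[previous_end:start])
        previous_end = end
    pieces.append(rna[previous_end:])
    return "".join(pieces)


precursor = "AUGGCC" + "GUAAGUCCCUAG" + "GCUUAA"  # hypothetical toy precursor
print(splice(precursor, [(6, 18)]))  # AUGGCCGCUUAA
```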

Data on human β globin gene transcription are sparser. Using a restriction fragment containing the large intron as a hybridisation probe, RNA species (>9S) are detected in the nuclei of erythropoietic cells. These multiple RNA species are probably analogous to those seen for mouse pre-mRNAs (Maquat et al, 1980). A 16S species related to γ globin mRNA has been detected by hybridisation of a γ globin cDNA plasmid to nuclear poly A+ foetal liver RNA. This species was not detected with a yglobin cDNA plasmid (Courtney and Williamson, 1979). None of the human α or β globin pre-mRNAs have been precisely mapped against cloned copies of their cognate genes.

1.6 Globin Pseudogenes and repeated DNA sequences.

During the analysis of cloned globin genes, it was found that certain flanking sequences cross-hybridise with the globin structural genes. These pseudogenes are interspersed amongst the globin genes, and have a gross organisation similar to that of authentic genes (Figure 2). However, features of their nucleotide sequence make it unlikely that they are expressed as proteins (Proudfoot and Maniatis, 1980). One α and two β pseudogenes have been identified so far in humans (Fritsch et al, 1980, Lauer et al, 1980). Pseudogenes have also been described for mouse α globin, rabbit β globin and Xenopus 5S DNA (Vanin et al, 1980, Lacy et al, 1980, Jacq et al, 1978).

The origin and significance of these sequences is still unclear.

Finally, structural analysis has demonstrated that repetitive DNA sequences are interspersed amongst the globin genes. The most detailed analysis has been performed for the rabbit β globin locus (Shen and Maniatis, 1980, Hoeijmakers-van Dommelen et al, 1980), but a broadly similar pattern is seen for the human β-like globin locus (Fritsch et al, 1980, Coggins et al, 1980). Inverted repeat sequences flank the r 7 and bp genes, and a related sequence lies to the 5' side of the ε gene. A long repeated sequence lies to the 3' side of the β globin gene (Kaufman et al, 1980). The sequence 5' of the gene serves as a template in vitro for RNA polymerase III (Duncan et al, 1980, section 1.8). These sequences may be

involved in DNA replication but it is tempting to think that

they play a role in organising higher orders of structure of

the globin linkage groups.

1.7 Fine structure of the globin genes

Technological advances in the last three or four years

have made DNA sequencing accessible to the scientific

proletariat (Maxam and Gilbert, 1977, 1980, Sanger and Coulson,

1978, Sanger et al, 1978). The sequences of all five β-like human globin genes have been determined, as well as the sequences of the mouse β-major and β-minor, and the rabbit β-globin genes (Lawn et al, 1980, Spritz et al, 1980, Proudfoot

et al, 1980, Slightom et al, Konkel et al, 1979, Van Ooyen

et al, 1980). Comparison of all these sequences should reveal

elements of the genes which have been conserved during divergence of the protein coding blocks. It has been argued

that elements identified in this way are recognised by the

enzymes involved in gene expression. By extrapolation

particular elements could be correlated with particular

enzymes, e.g. (transcripts of) conserved intron sequences could

be recognised by the nicking/closing ("spligase") enzyme(s) of

RNA splicing.
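A conserved element of this kind is usually summarised as a consensus over aligned sequences. The sketch below takes equal-length aligned sequences and reports, column by column, the majority base when it reaches a chosen threshold, otherwise "x" for "any nucleotide" (the convention used for the splice-junction prototype in this section). The threshold and the toy sequences are arbitrary illustrations, not real globin junctions.

```python
from collections import Counter


def consensus(aligned: list[str], threshold: float = 0.75) -> str:
    """Column-wise consensus of equal-length aligned sequences: the most common base
    if it occurs in at least `threshold` of the sequences, otherwise 'x'."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(aligned) >= threshold else "x")
    return "".join(result)


toy_junctions = ["TCAGGTA", "CCAGGTG", "ACAGGTA", "GCAGGTA"]  # hypothetical aligned sequences
print(consensus(toy_junctions))                 # xCAGGTA
print(consensus(toy_junctions, threshold=1.0))  # xCAGGTx
```

Raising the threshold shows which positions are invariant and which are merely predominant.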

Sequence comparisons have revealed strongly conserved elements in the genes. Thus two conserved sequence blocks occur to the 5' side of globin (and other eukaryotic) genes. These sequences may be involved in transcription initiation (Efstratiadis et al, 1980). Sequences close to the intron/coding-block junctions are also conserved. The junction sequences of the first intron of the human β-globin gene are 5' CAGGttgg...intron...taggCTGC 3', where the nucleotides of the coding blocks are shown in capitals. However, these sequences are almost perfect direct repeats. Although splicing could occur from the last nucleotide of the 5' coding block to the first nucleotide of the 3' coding block, it could also occur between the penultimate nucleotide of the 5' coding block and the last nucleotide of the intron. Thus a number of splicing "frames" are formally possible. Comparison of the sequences of several splice junctions has suggested a prototype sequence 5' TCAG/gta...intron...txcag/G, where / marks the splice point and x signifies any nucleotide (Breathnach et al, 1978). Sequences of splice junctions which are not perfect direct repeats confirm that this rule is correct (Stein et al, 1980). For the globin splice junction shown above, the correct position of the splice point is therefore 5' CAG/Gttgg...intron...tag/gCTGC 3'. Thus splicing occurs within the codon for amino acid 30.

Figure 4. Putative Recognition Sequences of the Human β Globin Gene.

IVS = intervening sequence (= intron). AUG and TAA are the initiation and termination triplets of protein synthesis. Conserved sequences found in other genes are boxed. Sequences are from Lawn et al (1980). 30, 31, etc. refer to amino acid codons.
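The splice-frame ambiguity described above can be made concrete with a short sketch. The Python below is illustrative only: it takes the first-intron junction as written (5' CAGGttgg...intron...taggCTGC 3'), replaces the unsequenced interior of the intron with a hypothetical run of N placeholders, and lists every position at which an intron-length segment could be removed to give the same spliced product. It then flags the frame(s) in which the removed segment begins with GT and ends with AG, which is a simplified reading of the prototype rule of Breathnach et al (1978). The function name and the placeholder are assumptions, not part of the original analysis.

```python
def equivalent_splice_frames(exon5: str, intron: str, exon3: str):
    """Return the spliced product and every intron-removal position giving it.

    Each frame is (start, gt_ag): the index at which an intron-length segment
    is cut out of exon5 + intron + exon3, and whether that segment starts
    with GT and ends with AG.
    """
    seq = (exon5 + intron + exon3).upper()
    size = len(intron)
    written = len(exon5)                      # junction as written in the text
    product = seq[:written] + seq[written + size:]
    frames = []
    for start in range(len(seq) - size + 1):
        if seq[:start] + seq[start + size:] != product:
            continue                          # this cut gives a different mRNA
        removed = seq[start:start + size]
        gt_ag = removed.startswith("GT") and removed.endswith("AG")
        frames.append((start, gt_ag))
    return product, frames


if __name__ == "__main__":
    # "N" is a hypothetical placeholder for the unsequenced intron interior.
    exon5, intron, exon3 = "CAGG", "ttgg" + "N" * 10 + "tagg", "CTGC"
    product, frames = equivalent_splice_frames(exon5, intron, exon3)
    seq = (exon5 + intron + exon3).upper()
    print("spliced product:", product)
    for start, gt_ag in frames:
        end = start + len(intron)
        print(f"{seq[:start]}/{seq[start:end]}/{seq[end:]}", "GT...AG" if gt_ag else "")
```

Four frames give the same mRNA (CAGGCTGC), which is why the direct-repeat junction alone cannot fix the splice point; only the frame starting at index 3, CAG/GTTGG...TAG/GCTGC, satisfies the GT...AG rule, matching the splice point given in the text.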

The only other highly conserved sequence in the globin (and other) genes is the AAUAAA block previously mentioned (section 1.3). Figure 4 shows the location of these various conserved sequences around the human β-globin gene.

1.8 The proteins of globin gene expression

Some of the proteins of globin gene expression have been alluded to in the previous sections. Three forms of RNA polymerase exist in the nuclei of eukaryotes. The globin genes are transcribed by RNA polymerase II. This is a very complex multi-subunit enzyme, and other proteins may be involved in transcription initiation and termination. The enzymes involved in capping, polyadenylation, or splicing have not yet been purified. The enzymology of mRNA translation is known in great detail for eukaryotes.

In vivo, eukaryotic DNA is associated with a variety of proteins, and this nucleoprotein complex is called chromatin.

Proteins in chromatin have a variety of functions. One class of proteins is responsible for "packing" the DNA into condensed structures. The most abundant proteins of this class are the histones. There are five major histone species, H1, H2A, H2B, H3 and H4. The basic chromatin fibre is believed to consist of a chain of repeating subunits, the nucleosomes, each made up of about 165 base pairs of DNA wrapped around a histone octamer (2 x H2A, 2 x H2B, 2 x H3, 2 x H4). The nucleosomes are separated by lengths of "linker" DNA about 40 base pairs long. This "beads-on-a-string" structure accounts for the first order of DNA packing. Further orders of packing of these chromatin fibres must exist to account for the observed lengths of chromosomes. Other abundant protein species may be involved in this process (Elgin and Weintraub, 1975).
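The figures quoted above give a rough sense of scale. The sketch below simply adds the two lengths (about 165 bp on the octamer plus about 40 bp of linker, giving a repeat of about 205 bp) and divides a stretch of DNA by that repeat. Real repeat lengths vary between tissues and species, so the result is an order-of-magnitude estimate only; the 7.5 kb example is the size of the Hsu I fragment carrying the β-globin gene that was cloned in this work, and the constant and function names are hypothetical.

```python
CORE_BP = 165    # DNA wrapped around each histone octamer (figure from the text)
LINKER_BP = 40   # approximate linker DNA between nucleosomes (figure from the text)
REPEAT_BP = CORE_BP + LINKER_BP


def nucleosomes_on(length_bp: int) -> int:
    """Whole nucleosome repeats that fit in a stretch of DNA (rough estimate)."""
    return length_bp // REPEAT_BP


if __name__ == "__main__":
    print(REPEAT_BP)                # 205
    print(nucleosomes_on(7_500))    # 36
```

So a 7.5 kb region such as the cloned β-globin fragment would be packaged on the order of three dozen nucleosomes in the first-order fibre.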

There is another class of chromosomal proteins: the nonhistone chromosomal proteins (NHC proteins). The NHC proteins are heterogeneous. Some of the NHC proteins will be the enzymes already mentioned: RNA polymerase II, spligase, etc. Different tissues within the same organism contain different NHC proteins. Thus the NHC proteins are suspected to be involved in gene expression in general, and in particular they may control tissue-specific patterns of gene expression. This hypothesis has been tested by reconstituting depleted chromatin with different NHC proteins. Transcripts from the reconstituted template are examined to see if they are tissue-specific (Paul and Gilmour, 1968). These experiments are very difficult.

Another way of investigating structure-function relationships in chromatin is by using a classical "protection" experiment. The ability of chromosomal proteins to protect DNA sequences from endonuclease digestion is tested. DNA susceptible to digestion is released from the bulk chromatin and becomes solubilized. Only a mild digestion with pancreatic DNase I or spleen DNase II is needed to release DNA from the chromatin. Consequently the solubilized DNA is not degraded to oligonucleotides, and gene sequences can be detected by solution hybridisation. Gottesfeld and Partington have shown that (mouse) globin genes in Friend erythroleukaemia cell chromatin are not particularly sensitive to DNase II digestion. However, when the cells are induced with dimethylsulphoxide to produce globin mRNA, the solubilized fractions are enriched for globin sequences (Gottesfeld and Partington, 1977). Similarly, ovalbumin gene sequences are preferentially digested by DNase I only in nuclei derived from oviduct that is actively expressing ovalbumin mRNA (Garel and Axel, 1976). The implication is that transcriptionally active genes are in a more "open" conformation than bulk chromatin. In a more sophisticated variant of this experiment, DNA from the solubilized chromatin fraction is probed via a Southern blot (Stalder et al, 1980). This allows the endpoints of the DNase-sensitive region to be precisely mapped. Human globin chromatin domains could be defined in this way.

1.9 Inherited Diseases of the Globin Genes

Inherited diseases exist in man which affect the type and amount of the globin proteins. The former diseases are the haemoglobinopathies and the latter are the thalassaemias. A number of different mutational events can alter the primary structure of the globin gene products. Transitions or transversions can produce amino acid substitutions, and many of these have been documented. Clinically, the most important example is probably haemoglobin S (β6 Glu→Val), found in sickle-cell anaemia (Weatherall and Clegg, 1972).

Mutations creating or removing termination codons will result in truncated or extended polypeptide chains. Premature termination of β-globin translation occurs in the Chinese patient described by Chang et al (1979), and translation read-through produces the α chain variants found in Hb Constant Spring, Hb Icaria, Hb Koya Dora, and Hb Seal Rock (reviewed by Clegg and Weatherall, 1976). Two elongated β chain variants, Hb Tak and Hb Cranston, are also thought to be due to a mutation in the UAA translation termination codon (Bunn et al, 1975, Lehmann et al, 1975). Illegitimate recombination events between globin genes result in globin proteins with an amino terminus derived from one gene and a carboxy terminus derived from another gene. For example, the abnormal haemoglobin Hb Lepore contains a fusion protein (NH2-)δ/β(-COOH) (Baglioni, 1962). The converse fusion protein, Anti-Lepore, has also been described. Hb Kenya contains another fusion protein, this time with a γ-globin amino terminus and a β-globin carboxy terminus (Huisman et al, 1972).
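Read-through variants of this kind follow from single base changes in a stop codon. As an illustration only, the sketch below enumerates every single-base substitution of the UAA codon and translates each with the standard genetic code, written as a 64-letter string in UCAG order with "*" for stop. Seven of the nine substitutions give a sense codon, and so read-through into the 3' untranslated region; UAG and UGA remain stop codons. Hb Constant Spring, for example, is generally attributed to a UAA to CAA (glutamine) change. The function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG;
# one-letter amino acid codes, "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Translate one RNA codon with the standard genetic code."""
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def point_mutants(codon: str) -> dict:
    """Translate every single-base substitution of a codon."""
    mutants = {}
    for position, original in enumerate(codon):
        for base in BASES:
            if base != original:
                mutant = codon[:position] + base + codon[position + 1:]
                mutants[mutant] = translate(mutant)
    return mutants


if __name__ == "__main__":
    for mutant, aa in point_mutants("UAA").items():
        print(mutant, "stop" if aa == "*" else f"read-through ({aa})")
```

The same enumeration run on a sense codon (for example AAG, lysine) would show which single substitutions create a premature stop, the other class of mutation described above.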

The thalassaemia syndromes are hereditary anaemias. They are characterised by reduced synthesis of either α- or β-globin chains, and the clinical symptoms are a result of unbalanced α/β chain ratios. These diseases behave as classical Mendelian recessive mutations and are common in equatorial and Mediterranean areas. The heterozygous carrier state for thalassaemia is asymptomatic (thalassaemia minor). Homozygous β-thalassaemia, thalassaemia major, is a serious clinical condition and used to be fatal in early adult life. Examination of HbA levels in patients presenting with β-thalassaemia reveals that the disease is genetically heterogeneous. HbA levels can be 5-30% of normal or undetectable. Such thalassaemias are classified as β+ and β0 respectively (Forget, 1978). The implication is that different alleles of β-thalassaemia produce different amounts of protein.

The genetics of α-thalassaemia are more complex, as there are four functional α-globin genes in the diploid genome. Varying degrees of severity of α-thalassaemia are seen, and these can be divided into four classes. In order of increasing clinical severity these are α-thalassaemia 2 and α-thalassaemia 1, which are mild conditions, HbH disease, which is an anaemia, and hydrops foetalis, which is fatal at or before birth.

1.10 The Molecular Basis of Thalassaemia

In most cases of thalassaemia the phenotype is reduced output of α or β globin protein with no corresponding increase in embryonic or foetal globins to balance the deficit. What sort of mutations might one anticipate in thalassaemia? The most obvious mutation is a deletion of protein coding sequences from the gene. Deletions are found in both α- and β-thalassaemias. However, (large) deletions are not ubiquitous, and most β-thalassaemias involve no detectable deletion of coding sequences (Flavell et al, 1979b). In practice, deletions smaller than about 0.2 kb may go unnoticed using Southern transfers of genomic DNA. This parameter (presence or absence of a deletion) is a convenient way of classifying thalassaemias (Wood et al, 1979, Maniatis et al, 1980a). However, this arbitrary division is not meant to imply genetic homogeneity within members of each class.

Gene deletions in thalassaemia

One way of testing for a gene deletion is to "titrate" the number of genes in the genomic DNA of a patient, using radioactive cDNA as a probe. These solution hybridisations are semi-quantitative and have established that most cases of α-thalassaemia are due to gene deletions. α-thalassaemia 2, α-thalassaemia 1, HbH disease and hydrops foetalis are usually the result of deleting one, two, three or four of the α-globin structural genes (Ottolenghi et al, 1974, Kan et al, 1975).

Loss of two α-loci, α-thalassaemia 1, can occur in two ways: either one gene is lost from each chromosome, or both genes are lost from one chromosome. Southern blotting experiments have shown that the single-gene chromosome is common in blacks (Dozy et al, 1979). This explains why α-thalassaemia 1, but not hydrops foetalis, has a high frequency in the black population. In turn, a single α-gene chromosome can originate in two ways, with either the leftward or the rightward locus being deleted. Both types of deletion are found (Embury et al, 1980). The breakpoints of these deletions, either leftward or rightward, are superficially indistinguishable from those found when propagating recombinant phage containing duplicated α-loci (Lauer et al, 1980).
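The correspondence between the number of deleted α genes and the clinical class, together with the argument about chromosome types that follows it, can be summarised in a small model. The sketch assumes that a normal chromosome carries two α genes (written "aa" in the code), and that deleted chromosomes carry one ("-a") or none ("--"); the phenotype labels are the four classes listed above. If a population carries only "aa" and "-a" chromosomes, no genotype can lack all four genes, which is the reasoning behind α-thalassaemia 1 being common, and hydrops foetalis rare, where the single-gene chromosome predominates. The names in the code are hypothetical.

```python
from itertools import combinations_with_replacement

# Functional alpha-globin genes on each chromosome type.
HAPLOTYPES = {"aa": 2, "-a": 1, "--": 0}

# Clinical class by number of alpha genes deleted (out of four).
PHENOTYPE_BY_DELETIONS = {
    0: "normal",
    1: "alpha-thalassaemia 2",
    2: "alpha-thalassaemia 1",
    3: "HbH disease",
    4: "hydrops foetalis",
}


def phenotype(hap1: str, hap2: str) -> str:
    """Clinical class for a pair of chromosomes."""
    deleted = 4 - HAPLOTYPES[hap1] - HAPLOTYPES[hap2]
    return PHENOTYPE_BY_DELETIONS[deleted]


def possible_phenotypes(available):
    """Phenotypes that can arise when only the given chromosome types exist."""
    return {phenotype(a, b) for a, b in combinations_with_replacement(available, 2)}


if __name__ == "__main__":
    # The two routes to alpha-thalassaemia 1: genes lost in trans or in cis.
    print(phenotype("-a", "-a"), "|", phenotype("--", "aa"))
    print(sorted(possible_phenotypes(["aa", "-a"])))
    print(sorted(possible_phenotypes(["aa", "-a", "--"])))
```

Only when the no-gene chromosome ("--") is present can the HbH disease and hydrops foetalis classes arise.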

Why do some α-globin gene deletions keep arising in the human population? The most likely explanation is that misalignment of the α loci may occur at meiosis such that a "leftward" gene lies adjacent to a "rightward" gene. Genetic recombination between the misaligned genes will produce one chromosome with three α-genes and one chromosome with one α-gene. This predicts that some people should have five α genes, and this is observed (Goossens et al, 1980, Higgs et al, 1980). Electron microscopic mapping of duplicated genes cloned in phage lambda has shown that sequence homology between the two α-genes extends considerably beyond the coding portions. The extensive length of homology would make misalignment, and hence gene deletion, relatively frequent (Lauer et al, 1980). The expansion and contraction of α-gene number by unequal crossing-over may also serve as a mechanism whereby drift between duplicated α genes is minimised during evolution (Zimmer et al, 1980). An essentially similar hypothesis has been proposed for the Gγ-Aγ gene pair (Slightom et al, 1980).
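The unequal crossing-over argument can be sketched as a toy model. Each chromosome is represented as a Python list of gene labels ("left" and "right" are hypothetical names for the leftward and rightward α loci). When gene i of one chromosome is misaligned with gene j of its partner, a crossover inside that paired gene joins the front of one chromosome to the back of the other, and the hybrid gene is labelled 5' part/3' part. The model ignores where within the region of homology the exchange takes place; it is a cartoon of the argument, not a description of the actual breakpoints.

```python
def unequal_crossover(chrom_a, chrom_b, i, j):
    """Crossover inside gene i of chrom_a paired (misaligned) with gene j of chrom_b.

    Returns the two reciprocal recombinant chromosomes. A hybrid gene is
    labelled "<5' donor>/<3' donor>", assuming all genes share one orientation.
    """
    recombinant_1 = chrom_a[:i] + [f"{chrom_a[i]}/{chrom_b[j]}"] + chrom_b[j + 1:]
    recombinant_2 = chrom_b[:j] + [f"{chrom_b[j]}/{chrom_a[i]}"] + chrom_a[i + 1:]
    return recombinant_1, recombinant_2


if __name__ == "__main__":
    normal = ["left", "right"]
    # Misalignment: the rightward gene of one chromosome pairs with the
    # leftward gene of the other.
    triple, single = unequal_crossover(normal, normal, 1, 0)
    print(triple)                     # ['left', 'right/left', 'right']
    print(single)                     # ['left/right']
    # A carrier of the three-gene chromosome and a normal one has five genes.
    print(len(triple) + len(normal))  # 5
```

The two products together still carry four genes, so the same event that creates a single-gene (deletion) chromosome also creates the three-gene chromosome whose carriers have five α genes.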

Gene deletions seem to be the cause of certain rare thalassaemia syndromes. These are δβ thalassaemia and the Negro form of hereditary persistence of foetal haemoglobin (HPFH). No β-globin is produced from the affected chromosome in these syndromes. The interesting observation here is that clinically the diseases are mild (or, in the case of HPFH, asymptomatic) because of a compensatory increase in the expression of the foetal globin genes. Gamma-globin expression is more elevated in HPFH than in δβ thalassaemia. The elevation is seen in heterozygotes and homozygotes of both diseases. The elevation of Hb F production could be the result of in vivo selection for proliferation of gamma-globin-producing erythroid cells. In homozygous β0-thalassaemia, which is a severe anaemia, there is a strong selection pressure for gamma-globin production. However, Hb F levels in β-thalassaemia homozygotes are only slightly elevated above normal, so it is unlikely that Hb F overproduction in δβ thalassaemia and HPFH can be attributed to a cellular origin. Hb F overproduction does seem to be the direct result of a mutation, presumably within the β-like globin gene cluster. Solution hybridisations have shown that the β- and possibly the δ-globin genes are deleted in δβ-thalassaemia and HPFH (Ottolenghi et al, 1976, Ramirez et al, 1976). These deletions are probably the mutations causing over-expression of the foetal genes, although the formal possibility of double mutations cannot be excluded. This led to the suggestion that the deletions progressively removed sequences which normally act in cis to repress foetal globin gene expression in adult life (Huisman et al, 1974). It was therefore of considerable interest to map the breakpoints of these deletions. This has now been accomplished using genomic blotting techniques (Fritsch et al, 1979, Tuan et al, 1979, 1980, Ottolenghi et al, 1979, Bernards et al, 1979b, 1980). The 3' breakpoints mapped in two cases of HPFH probably lie over 17 kb beyond the β-globin gene (Kaufman et al, 1980). The 5' breakpoints of various HPFH patients are shown in Figure 5. The significance of these findings has been discussed at length by Fritsch et al (1979) and by Wood et al (1979). The data are not consistent with a single cis-acting element lying between the Aγ and δ genes (Huisman et al, 1974). Most of the findings could be explained if a second repressor element was postulated, lying to the 3' side of the β gene. Deletion of one element produces a δβ thalassaemia phenotype, deletion of both an HPFH phenotype. One exception cannot be satisfactorily explained. Here the DNA containing both of these hypothetical regulatory elements is deleted, but the phenotype is that of δβ thalassaemia. This deletion removes the Aγ gene (the patient has δβ thalassaemia). Perhaps if the Aγ gene was intact then an HPFH phenotype would be observed in the patient. Bank and co-workers have proposed that the δ globin gene itself may be involved in gamma-globin regulation, and have implicated δ globin transcripts as repressor molecules (Bank et al, 1980). These would have to act in cis. Another suggestion is that deletions disrupt the supramolecular structure of the GγAγδβ gene complex. This disruption makes expression of the putative GγAγ transcription unit more favourable. An equilibrium between γ and β globin expression has been suggested (Bernards et al, 1979b), and deletion of the δ and β gene sequences would shift the equilibrium to favour foetal globin gene expression. Analogous phenomena to HPFH and δβ thalassaemia have been observed in the α-like globin gene cluster (Pressley et al, 1980).

[Figure 5: map of the β-like globin gene cluster showing the deletions in β0 thalassaemia, Hb Lepore, δβ thalassaemia, HPFH, Hb Kenya and γβ thalassaemia, with % Hb F in heterozygotes; the Greek form of HPFH is included for comparison.]

Figure 5. Gene Deletions in β Thalassaemia.

Deletions are indicated by open blocks. Where the end-point cannot be precisely mapped the approximate position is shown by shading. % Hb F refers to % foetal haemoglobin in heterozygotes. There is no detectable deletion in the Greek form of HPFH. This figure is after Maniatis et al, 1980a.

Two cases of β-thalassaemia have been found to be associated with deletions. In the first case the 3' end of the β gene is removed by a small deletion of about 600 base pairs. This presumably inactivates the gene, although the possibility of a second, earlier, mutation has been discussed (Orkin et al, 1979a, Flavell et al, 1979b). The second type of deletion is found in γβ thalassaemia and removes the Gγ, Aγ and δ globin genes. The deletion stops approximately 2 kb before the β gene but still depresses synthesis of β-globin chains (Van der Ploeg et al, 1980).

Non-deletion Thalassaemias

Inactivation of α-globin genes is not always synonymous with deletion. Historically, the first non-deletion α-thalassaemia was Haemoglobin Constant Spring. This haemoglobinopathy is the result of a point mutation in the termination codon of the α-gene, with the consequent addition of 31 amino acids to the α-globin chain. It is thought that the quantitative deficiency of this α-chain variant is due to mRNA instability (Weatherall and Clegg, 1975). More recently, superficially normal, though dysfunctional, α-globin loci have been demonstrated in non-Asian patients. In these experiments, genomic blotting or solution hybridisations indicated the presence of two or three α-globin loci, whereas the patients' phenotypes (HbH disease) were consistent with a single active gene (Kan et al, 1977, Orkin et al, 1979b).

More detailed data exist for the β thalassaemias. As β+ thalassaemics produce apparently normal β-globin protein, the genes must be present. Solution hybridisations and Southern blot analysis have also confirmed that β-globin sequences are present in nearly all cases of β0 thalassaemia (Orkin et al, 1979a, Flavell et al, 1979b). As the gross structure of the β-globin genes is ostensibly normal in β+ and β0 thalassaemia, the mutations in these diseases must involve only one or a few nucleotides. How could, say, a point mutation reduce or abolish the expression of a β-globin gene? Mutations could occur in the untranscribed flanking regions of the β-globin gene, so as to interfere with the processes of transcription initiation or termination. Alternatively, mutations may occur within the transcribed region of the gene. These mutations may lie in areas of the β-globin pre-mRNA recognised by the enzymes of RNA processing. A final possibility is that the mutation interferes with mRNA translation. Mutations near the AUG codon might affect translation initiation. Silent mutations in the coding region could change an amino-acid codon from a frequent to a rarer type. If the cognate tRNA for the new codon was rare, this would retard protein chain elongation. Experiments carried out over the last 10 years have been designed to explore these possibilities.

Variable amounts of β-globin mRNA have been demonstrated in the reticulocytes of homozygous β0 thalassaemics (Benz et al, 1978, Old et al, 1978). As anticipated, the mRNA detected by hybridisation cannot be translated in vitro (Pritchard 1976; see however Conconi, 1972). In one homozygous β0 Chinese patient, the failure to direct protein synthesis has been correlated with a premature termination codon in the mRNA (Chang and Kan, 1979). In homozygous β0 patients with no detectable β-globin mRNA, a defect in transcription or processing of pre-mRNA is possible. Comi et al have investigated the high molecular weight nuclear RNA of two such Italian patients by probing with β-globin cDNA. Hybridising material was found in one case but not the other (Comi et al, 1977).

Hybridisation and cell-free translation experiments are inconsistent with the idea that β+ thalassaemia is the result of inefficient mRNA translation. There is a roughly linear correlation between mRNA and protein levels in these patients (Benz et al, 1978). This implies that most β+ thalassaemic β-globin genes produce or process β-globin pre-mRNA at reduced efficiency. Analysis of steady-state levels of globin HnRNA indicated that the α/β ratio was more nearly normal in the nucleus than in the cytoplasm of bone marrow cells from three homozygous β thalassaemics (Nienhuis et al, 1977). Two of the patients have been reinvestigated using cloned hybridisation probes, and an essentially similar result was obtained. One patient appeared to produce an abnormal precursor mRNA sequence not detectable in HnRNA from sickle cell anaemic controls (Kantor et al, 1980). Three different homozygous β+ thalassaemic patients were examined by Ross and co-workers (Maquat et al, 1980). Pulse-chase experiments showed that processing of β-globin pre-mRNA was in all cases abnormal. The pattern of accumulation of pre-mRNA processing intermediates varied from patient to patient. The authors speculate that the mutation producing this phenotype may lie at or near the intron/coding-block junctions. Overall, β+-thalassaemia may often be associated with RNA processing defects. Caution is needed, though, as some of the homozygous thalassaemics in the sample examined may really be compound heterozygotes for β0 and β+ thalassaemia. Even if anomalous processing is the norm in β+-thalassaemia, this may not reflect the primary lesion in the disease. For instance, mutations which alter the start or stop points of transcription may produce a pre-mRNA less amenable to processing.

Some haemoglobinopathies behave like thalassaemias. The α chain variant Hb Constant Spring has already been mentioned. Hb K Woolwich is a β-chain variant which behaves like a β+ thalassaemia allele (Lang et al, 1974; see Section 4.3). The β0 thalassaemia allele which contains a translation stop codon at position 17 produces a truncated protein chain (Chang et al, 1979). In all three cases mRNA instability has been suggested as the origin of the thalassaemic phenotype.

Finally, the Greek form of HPFH may be classed as a non-deletion syndrome. This condition is clinically asymptomatic. Heterozygotes have slightly lower levels of Hb F than their Negro counterparts. The chromosome carrying the determinant probably produces Gγ, Aγ, δ and β globin protein chains, excluding the possibility of a deletion of gene coding sequences (Wood et al, 1979). Genomic blotting has shown that there is also no detectable rearrangement of the intergenic DNA in this syndrome (Tuan et al, 1980).

1.11 Summary and Aims

The human globin proteins are encoded in two separate gene clusters. These two clusters, of α- and β-like globin genes, lie on different chromosomes. A variety of different mutations affecting these genes are found in the human population. These mutations include deletions, gene fusions, and single base substitutions. The mutations may result in alteration to the type (primary structure) or amount of globin proteins, or sometimes both. Where a protein variant is involved, the inherited disease is called a haemoglobinopathy.

When output of α- and β-like chains is imbalanced, the resulting syndrome is called a thalassaemia. Whereas most cases of α-thalassaemia (α chain deficiency) are due to deletions of coding sequences, most β-thalassaemias (β chain deficiency) are not. β-thalassaemia may be the result of point mutations in or around the β-globin gene. The aim of this research project is to use recombinant DNA technology to characterise the mutation(s) found in β+-thalassaemia.

cDNA copies of human α-, β-, and γ-globin mRNA cloned in bacterial plasmids have been characterised by nucleotide sequencing. The plasmids have been used to construct a physical map around the human α, δβ, and GγAγ linkage groups (Whitelaw et al, 1980, Flavell et al, 1978, Little et al, 1979). Having established the arrangement of normal globin genes, one can go on to investigate the globin genes in thalassaemia. The physical maps of the δβ globin locus (and the γ loci) are unaltered in most cases of β0 and β+ thalassaemia (Flavell et al, 1979b, Tuan et al, 1979). This implies that β+-thalassaemia is usually the result of a small (point?) mutation, probably lying close to the β-globin gene itself (Section 4.3). A patient carrying a β+-thalassaemia allele was selected for further study. β+-thalassaemia was chosen because this disease may be a consequence of defective RNA processing (as opposed to a defect in the better-understood processes of mRNA translation). The Southern transfer technique was used to confirm that there were no gross rearrangements of the β-globin gene in this particular patient. In an attempt to find the mutation causing the disease, the affected globin gene from the patient was cloned in phage lambda. The entire gene was sequenced, and this sequence was compared to that found in the gene of a non-thalassaemic subject. The underlying assumption is that one (or more) of the sequence changes found in the "thalassaemic" gene is the cause of the β+-thalassaemic phenotype. The positions of these nucleotide changes may delineate areas of the β-globin gene which are important for the gene's expression in vivo.

2. Materials and Methods

2.1 Materials

Chemicals

With the following exceptions, all chemicals were Analar grade supplied by British Drug Houses (BDH), Poole, England.

Trizma base (Tris), sodium chloride, ethidium bromide and putrescine hydrochloride were from the Sigma Chemical Co., St. Louis, Missouri, U.S.A. Hydrazine hydrate (HZ), spermidine trihydrochloride, and piperidine were from Fluka AG, Buchs, Switzerland.

Isotopes

All radioisotopes were from the Radiochemical Centre, Amersham, England.

Chromatography, electrophoresis, etc.

Acrylamide and bis-acrylamide for preparative gels and Bio-Gel gel filtration matrix were from Bio-Rad Laboratories, Richmond, California, U.S.A. Nitrocellulose filters were from Schleicher and Schuell, Dassel, West Germany. Ficoll was supplied by Pharmacia, Uppsala, Sweden, and polyvinyl pyrrolidone and bovine serum albumin (BSA) were from Sigma Chemical Co. Agarose was from Miles Laboratories Inc., Kankakee, Illinois, U.S.A. DE52 DEAE cellulose was from Whatman Ltd., Maidstone, Kent.

Photography

FP4 film, Phenisol and PQ universal developers, and Hypam fixer were from Ilford, Basildon, England. Rx medical X-ray film and Mach 11 intensifying screens were from the Fuji Photo Co. Ltd., Tokyo, Japan. UV light sources were a short-wave transilluminator and a hand-held short-wave lamp from UV Products Inc., San Gabriel, California, U.S.A.

Restriction Enzymes

Eco RI, Hinf I, and "enzyme grade" BSA for restriction enzyme digests were from Bethesda Research Laboratories Inc. (BRL), Rockville, Md., U.S.A. Hsu I, Bgl II, Hpa I, and Xba I were prepared in this lab by Dr. J. Arrand and co-workers. All other restriction enzymes were from New England Biolabs Inc., Beverly, Mass., U.S.A.

Other Enzymes

DNase I was from the Worthington Biochemical Corp., New Jersey, U.S.A. T4 polynucleotide kinase was from PL-Biochemicals, Milwaukee, Wisconsin, U.S.A. Bacterial alkaline phosphatase was from B.R.L. Inc. E. coli DNA polymerase I and Proteinase K were from the Boehringer Corp. (London) Ltd., Lewes, England. RNase A and lysozyme were from Sigma Chemical Co. T4 DNA ligase was a gift from Lesley Woods, Imperial College, London.

Nucleic Acids

λ cI857 DNA was from Boehringer. Herring sperm DNA and polyadenylic acid were from Sigma Chemical Co.

Bacteriological Materials

BBL trypticase peptone was from Becton Dickinson and Co., Cockeysville, Md., U.S.A. Agar was from Difco Laboratories, Detroit, Michigan, U.S.A. Yeast extract and tryptone were from Oxoid Ltd., Basingstoke, England. Uridine, ampicillin, and chloramphenicol were from Sigma Chemical Co.; tetracycline was from Lederle Laboratories, Gosport, England. 90 mm Petri dishes were from Sterilin Ltd., Teddington, England. Bio-assay dishes were from Nunc, Roskilde, Denmark.

2.2 Bacterial and Phage strains

HB101 was used as the host for all globin recombinant plasmids and was a gift from Herb Boyer. Its genotype is strR, rB-, mB-, recA-, pro-, gal- (Boyer and Roulland-Dussoix, 1969).

LE392 (ED8656) and 259 (ED8654) are derivatives of ED803 (Wood, 1966). These strains of E. coli K12 are isogenic and their genotype is: rK-, mK+, recA-, supE, supF, gal-, met-.

275 is a suppressor-less strain of E. coli, and C600(λimm21) is a supE E. coli strain carrying a λimm21 prophage in its genome.

In vitro packaging lysogens BHB2671 and BHB2673 were a gift from Binie Klein (University of Edinburgh).

BHB2671: N205 (λimm434, cIts, b2, red3, Eam4, Sam7)

BHB2673: N205 (λimm434, cIts, b2, red3, Dam15, Sam7)

The Hind III lambda vector NEM788 was a gift from Noreen Murray, University of Edinburgh. Its genotype is: λ nin5, att-red, cI, Wam, Eam, Sam (Figure 14).

The plasmid vector pAT153 is a derivative of pBR322 and was a gift from Prof. D. Sherratt (University of Glasgow; Twigg and Sherratt, 1980).

2.3 General Methods

2.3.1 Restriction Enzyme digestion

DNA for digestion was dissolved in autoclaved double glass-distilled water at a concentration of 0.2 mg/ml or less. One tenth volume of 100 mM Tris-HCl pH7.5, 60 mM MgCl2, 60 mM 2-mercaptoethanol, 1 mg/ml BSA was added to the DNA solution. This reaction mix was augmented with an appropriate amount of NaCl or KCl as recommended by the restriction enzyme suppliers. Enzymes were added to give an estimated 2-3 fold over-digestion within the allocated digestion time (e.g. one µg of DNA plus three units of enzyme incubated for one hour). Digests were performed in 1.5 ml snap-shut Eppendorf tubes at 37°C. After an appropriate incubation time, the digest was placed on ice and a small sample electrophoresed on an agarose or acrylamide gel to determine the extent of the reaction. Completed digests of human DNA were ethanol precipitated directly. All other completed digests were quenched by phenol extraction. The aqueous phase was ether-washed several times and then ethanol precipitated.
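The digestion recipe reduces to two pieces of arithmetic, sketched below under stated assumptions. The buffer calculation assumes that the ten-fold stock makes up one tenth of the final reaction volume, giving 10 mM Tris-HCl, 6 mM MgCl2, 6 mM 2-mercaptoethanol and 0.1 mg/ml BSA in the digest. The enzyme calculation assumes the usual supplier definition of a unit (the amount of enzyme that digests 1 µg of λ DNA in one hour), so it is only a rough guide for DNA with a different density of sites. The function names are hypothetical.

```python
# Ten-fold concentrated digestion buffer, as listed in the text.
STOCK_10X = {
    "Tris-HCl pH7.5 (mM)": 100.0,
    "MgCl2 (mM)": 60.0,
    "2-mercaptoethanol (mM)": 60.0,
    "BSA (mg/ml)": 1.0,
}


def final_concentrations(stock, stock_fraction=0.1):
    """Concentrations in the digest when the stock is this fraction of the final volume."""
    return {name: value * stock_fraction for name, value in stock.items()}


def units_needed(dna_ug, hours, fold_excess):
    """Enzyme units for a given fold over-digestion.

    Assumes 1 unit digests 1 ug of DNA in 1 hour (the usual definition,
    measured on lambda DNA); real requirements depend on site density.
    """
    return dna_ug * fold_excess / hours


if __name__ == "__main__":
    for name, value in final_concentrations(STOCK_10X).items():
        print(f"{name}: {value:g}")
    print(units_needed(1.0, 1.0, 3))    # 3.0 units, the example in the text
    print(units_needed(10.0, 4.0, 3))   # 7.5 units for 10 ug over 4 hours
```

The second example shows the trade-off implied by "within the allocated digestion time": a longer incubation lets fewer units achieve the same excess.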

2.3.2 Ethanol precipitation of Nucleic Acids.

DNA was routinely precipitated by the addition of 1/10 volume of 4M NaCl and 2 volumes of ethanol. DNA for sequence analysis was precipitated by the addition of 1/10 volume 3M Na acetate pH5.6 plus 2 volumes of ethanol (3M Na acetate stock solution was produced by titration of glacial acetic acid with 5M NaOH). Samples were chilled overnight at -20°C or for 10-15 minutes in a dry ice/ethanol bath. The DNA was pelleted by centrifuging in a Sorvall HB4 head at 16,000 × g for 10 minutes or for 10 minutes in an Eppendorf microfuge (Eppendorf 5412). The supernatant was removed and the pellet washed in 70% ethanol. After re-spinning, the supernatant was removed and the pellet dried under vacuum. The DNA was redissolved in an appropriate volume of water or TE buffer (10 mM Tris-HCl pH8.0, 1 mM EDTA).
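The precipitation volumes are easy to tabulate. The sketch below assumes that both the salt and the ethanol volumes are measured relative to the original DNA solution (the text does not say whether the two volumes of ethanol include the added salt) and reports the salt concentration in the aqueous phase before ethanol is added and the final ethanol fraction. The function name is hypothetical.

```python
def precipitation_mix(sample_ul, salt_stock_m, salt_fraction=0.1, ethanol_volumes=2.0):
    """Volumes and final conditions for an ethanol precipitation.

    Assumes salt and ethanol are both measured relative to the original sample.
    """
    salt_ul = sample_ul * salt_fraction
    ethanol_ul = sample_ul * ethanol_volumes
    aqueous_ul = sample_ul + salt_ul
    total_ul = aqueous_ul + ethanol_ul
    return {
        "salt_ul": salt_ul,
        "ethanol_ul": ethanol_ul,
        "salt_in_aqueous_M": salt_stock_m * salt_ul / aqueous_ul,
        "ethanol_fraction": ethanol_ul / total_ul,
    }


if __name__ == "__main__":
    # Routine precipitation: 1/10 volume 4 M NaCl plus 2 volumes of ethanol.
    routine = precipitation_mix(100, 4.0)
    # DNA for sequencing: 1/10 volume 3 M sodium acetate pH5.6 instead.
    sequencing = precipitation_mix(100, 3.0)
    for label, mix in (("NaCl", routine), ("Na acetate", sequencing)):
        print(label, {k: round(v, 3) for k, v in mix.items()})
```

For a 100 µl sample this gives 10 µl of salt and 200 µl of ethanol, about 0.36 M NaCl (or 0.27 M sodium acetate) in the aqueous phase and roughly 65% ethanol overall.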

2.3.3 Agarose Gel Electrophoresis

Agarose gels for the routine analysis of DNA were electrophoresed in E-buffer (40 mM Tris, 20 mM sodium acetate, 2 mM EDTA, pH7.7). DNA samples which were to be transferred to nitrocellulose filters were electrophoresed in T-buffer (40 mM Tris, 250 mM sodium acetate, 1 mM EDTA, 0.25 mg/ml ethidium bromide, pH7.7 with glacial acetic acid). Gels were made by dissolving the appropriate weight of agarose powder in E-buffer and boiling the solution. The solution was cooled to about 60°C and poured into a gel mould. After the gel had set, the slot-formers were removed. Samples were loaded in 2% (w/v) Ficoll plus 0.2% Orange G (Sigma) as a tracking dye. Electrophoresis was carried out in an electric field of 2-5 V/cm until the tracking dye reached the bottom of the gel.
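Two routine numbers follow from this description: the weight of agarose for a gel of a given percentage and volume, and the voltage that gives the stated field strength. The sketch assumes the percentage is weight per volume (grams per 100 ml) and that the field is the applied voltage divided by the distance between the electrodes; the 1% gel, the 200 ml volume and the 25 cm electrode spacing in the example are hypothetical, as the text does not give them.

```python
def agarose_grams(percent_w_v: float, volume_ml: float) -> float:
    """Agarose needed for a gel of the given % (w/v) and volume."""
    return percent_w_v * volume_ml / 100


def voltage_for_field(field_v_per_cm: float, electrode_gap_cm: float) -> float:
    """Voltage that gives the requested field across the electrode gap."""
    return field_v_per_cm * electrode_gap_cm


if __name__ == "__main__":
    print(agarose_grams(1.0, 200))   # 2.0 g for a hypothetical 1% 200 ml gel
    low, high = (voltage_for_field(e, 25) for e in (2, 5))
    print(f"{low:g}-{high:g} V")     # 50-125 V across a hypothetical 25 cm gap
```

Quoting the field in V/cm rather than a fixed voltage keeps the running conditions comparable between gel tanks of different sizes.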

2.3.4 Acrylamide Gel Electrophoresis

Preparative 30:1 acrylamide:bis-acrylamide gels were formed in a 400x200x1.5 mm gel mould and run in 50 mM Tris-borate (pH8.3), 1 mM EDTA buffer. 5 or 7.5% gels (with respect to the acrylamide concentration) were routinely used for the fractionation of restriction fragments. 50:1 gels were used for fractionation of strand-separated restriction fragments.
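For the preparative gels, the mould dimensions fix the gel volume and the ratio fixes the amount of cross-linker. The sketch reads "5 or 7.5% with respect to the acrylamide concentration" as grams of acrylamide per 100 ml, with bis-acrylamide added at 1/30 of that weight; it ignores the volume displaced by the slot former and any excess made up for pouring. The function names are hypothetical.

```python
def mould_volume_ml(length_mm: float, width_mm: float, thickness_mm: float) -> float:
    """Gel volume in ml (1 ml = 1000 cubic mm)."""
    return length_mm * width_mm * thickness_mm / 1000


def acrylamide_recipe(percent: float, ratio: float, volume_ml: float):
    """Grams of acrylamide and bis-acrylamide for a gel of the given % and ratio."""
    acrylamide_g = percent * volume_ml / 100
    return acrylamide_g, acrylamide_g / ratio


if __name__ == "__main__":
    volume = mould_volume_ml(400, 200, 1.5)   # 120 ml
    for percent in (5, 7.5):
        acryl, bis = acrylamide_recipe(percent, 30, volume)
        print(f"{percent}% 30:1 -> {acryl:.2f} g acrylamide, {bis:.3f} g bis")
```

On these assumptions a 5% gel needs about 6 g of acrylamide and 0.2 g of bis-acrylamide, and a 7.5% gel about 9 g and 0.3 g; the same function with a ratio of 50 gives the less cross-linked strand-separation gels.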

2.3.5 Visualisation of DNA Restriction Fragments

Ethidium Bromide Staining

After electrophoresis, agarose gels were stained in E-buffer containing 0.5 µg/ml ethidium bromide for 10 min. Gels were illuminated with short-wave UV light and photographed on Ilford FP4 film using a Polaroid MP-4 camera.

UV-shadowing

UV irradiation of ethidium bromide-stained DNA may result in nicking of the DNA. UV shadowing was therefore used to visualise DNA fragments which had been electrophoresed on preparative acrylamide gels. Gels were aged overnight prior to electrophoresis. After electrophoresis the gels were covered with Saran wrap and laid onto a UV-fluorescent thin-layer chromatography plate (Kieselgel 60 F254, Merck). The gels were illuminated with a short-wave UV lamp. Restriction fragments appear as dark bands on a green background.

2.3.6 Autoradiography

Samples were autoradiographed at -70°C using intensifying screens. Films were sensitised by pre-flashing to an OD650 of 0.1-0.3.

2.3.7 Recovery of Restriction Fragments

Restriction fragments were eluted from agarose gels by soaking crushed gel fragments in TNE-buffer (50 mM Tris-HCl pH8.0, 150 mM NaCl, 1 mM EDTA) for 24-48 hours at 37°C. The supernatant was filtered through sintered glass and loaded onto a DEAE-cellulose column, poured in a Pasteur pipette and pre-equilibrated with TNE-buffer. The DNA was eluted from the column with 300 ul of 1.5 M NaCl, 50 mM Tris-HCl pH8.0, 1 mM EDTA buffer. The DNA sample was then ethanol precipitated.

Alternatively, agarose gel slices were diced and sealed inside a dialysis bag along with E-buffer. The bag was submerged in E-buffer between two electrodes and electrophoresed for 15 hours or more at 2 V/cm. The current was reversed for 10 min to release the DNA from the dialysis membrane. The E-buffer was recovered from inside the bag, phenol extracted and ethanol precipitated.

Restriction fragments were eluted from the acrylamide gel matrix in 0.5 M ammonium acetate, 0.02% SDS, 1 mM EDTA + tRNA (5 ug/ml). Gel slices were diced and put into a heat-sealed 1 ml disposable pipette tip plugged with siliconised glass-wool (Maxam and Gilbert, 1977). 0.6 ml of elution solution was added and the tops of the tips were sealed with Parafilm. The tips were incubated overnight (>10 hr) at 37°C in an air incubator.

The Parafilm was removed and the sealed end of the tip was cut off. The elution solution was drained out and collected in 15 ml plastic centrifuge tubes. Residual elution solution was blown out by attaching the tip to an automatic pipettor and depressing the plunger 2-3 times. The acrylamide fragments left in the tip were re-washed with 0.2 ml of elution solution.

Ethanol precipitation of the DNA fragments was as described elsewhere (Maxam and Gilbert, 1980). The pellet was dissolved in 200 ul water and spun for 30 sec in a microfuge to remove any residual acrylamide fragments. The supernatant was reprecipitated using sodium acetate/ethanol as described in Section 2.3.2.

2.3.8 Gel Filtration

Gel filtration was used to remove unincorporated radiolabelled nucleotides from nick-translation reactions. A Pasteur pipette was plugged with siliconised glass-wool. Bio-Gel A-1.5m agarose beads were poured into the pipette, and this matrix was pre-equilibrated with >3 column volumes of 3x SSC (Section 2.4.5). The sample was applied to the column in <1/10 column volume, and the column was washed with more 3x SSC. Fractions were collected in 1.5 ml Eppendorf tubes and monitored, and the excluded peak of (radiolabelled) DNA was pooled.

2.3.9 Determination of Radioactivity

Radioactivity was usually measured by Cerenkov counting, with a counting efficiency of approximately 30%.

TCA-precipitated labelled DNA was immobilised on Whatman GF/C filters, dried and counted in toluene-based scintillant [TBS; 0.5% w/v 2,5-diphenyloxazole (PPO), 0.03% w/v 1,4-di-2-(5-phenyloxazolyl)benzene (POPOP) in toluene].
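The counting arithmetic can be made explicit. The minimal sketch below converts a Cerenkov count rate into disintegrations per minute and then into microcuries. It assumes the approximately 30% efficiency quoted above and the standard conversion of 2.22 x 10^6 dpm per uCi, and it also computes the TCA-precipitable fraction of a sample. The function names and example counts are hypothetical.

```python
# Minimal sketch: converting Cerenkov counts to absolute radioactivity.
# Assumption: ~30% counting efficiency for 32P by Cerenkov counting (as in the text).

DPM_PER_UCI = 2.22e6  # 1 microcurie = 2.22 x 10^6 disintegrations per minute


def cerenkov_cpm_to_uci(cpm: float, efficiency: float = 0.30) -> float:
    """Convert an observed count rate (cpm) to microcuries."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    dpm = cpm / efficiency  # correct for the counting efficiency
    return dpm / DPM_PER_UCI


def tca_precipitable_fraction(precipitated_cpm: float, total_cpm: float) -> float:
    """Fraction of the label present in acid-insoluble (polymeric) DNA.

    Both counts must have been measured with the same counting efficiency.
    """
    return precipitated_cpm / total_cpm


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(f"{cerenkov_cpm_to_uci(666_000):.2f} uCi")
    print(f"{tca_precipitable_fraction(4.2e5, 6.0e5):.0%} TCA-precipitable")
```

The TCA fraction is only meaningful if both samples are counted in the same way. Filters counted in scintillant and solutions counted by Cerenkov radiation have different efficiencies and cannot be compared directly.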

2.4 Specific Methods

2.4.1 Preparation of Plasmid DNA

Liquid cultures of bacteria were grown in L broth (5 g/l yeast extract, 10 g/l tryptone, 5 g/l NaCl, 1 g/l D-glucose, pH7.2). The autoclaved broth was supplemented with antibiotics as appropriate:

Strain                                  Antibiotic

LE392, 259                              (none)
HB101                                   200 ug/ml streptomycin
HB101 + pHβG1                           25 ug/ml kanamycin
HB101 + R1-C, + R1-D, + 1.5β            50 ug/ml ampicillin
HB101 + Hβ-IS, + 4.4β                   10 ug/ml tetracycline

50 ml cultures were grown overnight at 37°C and diluted 1 in 100 into 1 l of fresh L broth. The culture was grown at 37°C to an OD650 of 0.1. Uridine was added to a final concentration of 1 mg/ml. Cultures were grown further to OD650 = 0.6, then chloramphenicol was added to 0.3 mg/ml and the plasmids were allowed to amplify overnight. Bacteria were killed with 1% v/v chloroform and harvested in a Sorvall GSA rotor for 10 min at 8000 x g. The bacterial pellets were resuspended in 7 ml of 25% w/v sucrose, 50 mM Tris-HCl pH8.0. Freshly dissolved lysozyme (1.6 ml of a 25 mg/ml solution in 0.25 M Tris-HCl pH8.0) was added, and the mixture was stirred gently on ice for 5 min. 3.2 ml of 0.25 M EDTA was added and the mixture was stirred as above. 1.3 ml of lytic mix (2% Triton X-100, 50 mM Tris-HCl pH8.0, 20 mM EDTA) was added, and the mixture was stirred gently on ice for 20 min. If the solution was not viscous at this stage, it was incubated for 5 min at 37°C. Cell debris (plus chromosomal DNA) was pelleted in a Beckman SW40Ti rotor at 150,000 x g at 4°C for 60 min. The supernatant was extracted three or four times with phenol/chloroform (1:1, buffered with 10 mM sodium acetate, 10 mM sodium chloride, 1 mM EDTA, pH6.0), then once with chloroform/octanol (24:1). The DNA was precipitated with NaCl/ethanol and redissolved in about 10 ml of TE buffer. RNase A was added to 20 ug/ml and the solution was incubated at 37°C for 30 min. After one chloroform/octanol extraction the solution was reprecipitated and resuspended in 10 ml of TE buffer. 0.04 volumes of 0.25 M K2HPO4 pH7.5 were added to give a new volume (X ml). X g of caesium chloride (CsCl) and 0.1X ml of 10 mg/ml ethidium bromide were added. The solution was preclarified by spinning at 15,000 rpm for 60 min in the SW40Ti rotor. The pellet and pellicle were discarded and the supernatant was banded to equilibrium at 50,000 rpm in a Sorvall TV-865B vertical rotor for 16 hr. Plasmid bands were visualised with long-wave UV light and the lower band was side-harvested. Ethidium bromide was removed by 5 or 6 extractions with CsCl-saturated propan-2-ol. The DNA solution was dialysed extensively against TE buffer. The concentration of plasmid DNA was estimated from the OD260 of the solution (OD260 = 1 corresponds to 50 ug/ml DNA). This DNA is substantially free of E. coli DNA and RNA. However, it is contaminated by oligonucleotides that are not visible on agarose gel electrophoresis. These interfere with kinase labelling and may be removed by preparative acrylamide gel electrophoresis prior to labelling.
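Two calculations in this preparation lend themselves to a worked example. The first is the OD260 conversion (OD260 = 1 corresponds to 50 ug/ml DNA). The second is the set of CsCl and ethidium bromide quantities, which scale with the volume X obtained after adding 0.04 volumes of phosphate. The minimal sketch below assumes that X is 1.04 times the TE volume. The function names and example readings are hypothetical.

```python
# Minimal sketch of two calculations from the plasmid preparation.
# Assumptions: OD260 of 1.0 = 50 ug/ml double-stranded DNA (as stated in the text);
# X = TE volume + 0.04 volumes of phosphate; X g CsCl and 0.1 X ml of 10 mg/ml EtBr.

UG_PER_ML_PER_OD260 = 50.0


def dsdna_ug_per_ml(od260: float, dilution_factor: float = 1.0) -> float:
    """Estimate double-stranded DNA concentration from absorbance at 260 nm."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def cscl_gradient_recipe(te_volume_ml: float) -> dict:
    """Quantities for the CsCl/ethidium bromide step, from the TE volume in ml."""
    x_ml = te_volume_ml * 1.04  # volume after adding 0.04 volumes of K2HPO4
    return {
        "k2hpo4_ml": round(te_volume_ml * 0.04, 3),
        "x_ml": round(x_ml, 3),
        "cscl_g": round(x_ml, 3),        # X g of CsCl for X ml of solution
        "etbr_ml": round(0.1 * x_ml, 3),  # 0.1 X ml of 10 mg/ml ethidium bromide
    }


if __name__ == "__main__":
    # Hypothetical reading: a 1 in 20 dilution of the final DNA gave OD260 = 0.25
    print(f"{dsdna_ug_per_ml(0.25, 20):.0f} ug/ml")
    print(cscl_gradient_recipe(10.0))
```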

2.4.2 Transformation of HB101

An overnight culture of HB101 was diluted 1 in 100 into fresh L broth plus streptomycin. This was grown at 37°C to an OD650 of 0.5. Cells were harvested by centrifugation at 6000 x g in a Sorvall HB-4 rotor. The cells were resuspended in an equal volume of filter-sterilised 80 mM CaCl2. After 15 min on ice, the cells were harvested and resuspended in one tenth of the original volume of sterile 80 mM CaCl2, 15% glycerol. Cells were dispensed in 200 ul aliquots into sterile 1.5 ml snap-shut Eppendorf tubes. The aliquots were either snap-frozen in liquid nitrogen and stored at -70°C, or used directly.

DNA (5-50 ng in 100 ul TE buffer) was added to the treated cells. The mixture was left on ice for 10 min, frozen at -50°C for 10 seconds in a dry-ice/propan-2-ol bath, thawed to room temperature, chilled on ice for 5 min, heat-shocked for 1 min at 42°C, and chilled on ice for 1 min. 200 ul of pre-warmed L broth was added and the tubes were incubated at 37°C for 30 min to allow the cells to recover. The solution was plated on L agar plates (L broth with 10 g/l NaCl, 15 g/l agar) containing the appropriate antibiotics.

2.4.3 Colony Electrophoresis

The type of supercoiled plasmid within a bacterium could be analysed with this procedure. A single colony was picked into 0.5 ml of L broth plus the appropriate antibiotics and grown for 120 min at 37°C in a disposable 7 ml bijou bottle (Sterilin). Chloramphenicol was added to 170 ug/ml and the cells were incubated overnight for plasmid amplification. Cells were transferred to 1.5 ml Eppendorf tubes and harvested by spinning for 3 min in a microfuge. The supernatant was removed and the cell pellet was resuspended in 20 ul of 20 mM Tris-HCl pH8.0, 5 mM EDTA. SDS was added to 0.75% and proteinase K to 0.7 mg/ml. The mixture was incubated at 37°C for 120 min, and the lysate was layered into the dry slot of an agarose gel. The sample was overlaid with molten agarose prior to electrophoresis.

2.4.4 Labelling Plasmid DNA by Nick-Translation

α-32P-labelled deoxyribonucleotides were incorporated into double-stranded DNA by nick-translation (Rigby et al, 1977). The reaction mix included 0.5 ug of plasmid DNA, 4 uM each of dATP and dGTP, 2.6 uM each of [α-32P]dCTP and [α-32P]dTTP (>400 Ci/mmol), 5 mM MgCl2 and 50 mM Tris-HCl pH7.5. Four units of DNA polymerase I (Boehringer) and 5.2 pg of DNase I (RNase-free grade, Worthington) were also added. The total reaction volume was 33 ul. Incubation was for 90 min at 15°C. The reaction was quenched by phenol extraction. Unincorporated nucleotides were removed by chromatography of the phenol-extracted aqueous phase on a Bio-Gel A-1.5m column (Section 2.3.8). Excluded fractions were pooled and added directly to the hybridisation reactions.
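The reaction composition above fixes how much label goes into each nick-translation. The minimal sketch below computes the microcuries of each labelled dNTP and, from an assumed incorporation fraction, the probe specific activity. The labelled dNTPs are specified only as >400 Ci/mmol, so figures calculated at 400 Ci/mmol are lower bounds. The 40% incorporation and the function names are hypothetical.

```python
# Minimal sketch: radioactive input and probe specific activity for nick-translation.
# Assumptions: 2.6 uM of each [alpha-32P]dNTP at 400 Ci/mmol (the stated lower bound),
# a 33 ul reaction and 0.5 ug of template; the incorporated fraction is hypothetical.

DPM_PER_UCI = 2.22e6


def label_input_uci(conc_um: float, volume_ul: float, ci_per_mmol: float) -> float:
    """Microcuries of one labelled dNTP present in the reaction."""
    mol = conc_um * 1e-6 * volume_ul * 1e-6  # (mol/l) x (l) -> mol
    mmol = mol * 1e3
    return mmol * ci_per_mmol * 1e6  # Ci -> uCi


def probe_dpm_per_ug(total_uci: float, fraction_incorporated: float, dna_ug: float) -> float:
    """Specific activity of the probe, assuming all template DNA is recovered."""
    return total_uci * fraction_incorporated * DPM_PER_UCI / dna_ug


if __name__ == "__main__":
    per_dntp = label_input_uci(2.6, 33, 400)
    total = 2 * per_dntp  # both dCTP and dTTP were labelled
    print(f"{per_dntp:.1f} uCi per labelled dNTP, {total:.1f} uCi in total")
    # Hypothetical: 40% of the label incorporated into 0.5 ug of DNA
    print(f"{probe_dpm_per_ug(total, 0.40, 0.5):.2e} dpm/ug")
```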

2.4.5 Southern Transfers

Human DNA for Southern transfers was fractionated on 0.8% or 1.0% agarose gels (higher percentages were used for digests of recombinant phage). A hybridisation marker was electrophoresed in parallel with the human DNA. This was Bgl I-digested pHγG1, which hybridises to the β-globin cDNA plasmid pHβG1 and has fragment sizes of 6.6, 4.4, 1.8 and 1.0 kb.

"Visual" size markers, usually lambda DNA digested with Eco RI or Eco RI + Hsu I, were also electrophoresed in parallel with the human DNA. After electrophoresis the gel was photographed and then denatured to render the DNA single-stranded. Denaturation was for 120 min in 0.5 M NaOH, 1.5 M NaCl. The gel was neutralised in 0.5 M Tris-HCl pH5.0, 1.5 M NaCl. (The DNA strands do not renature during the neutralisation step.) DNA was blotted overnight in 20x SSC onto a nitrocellulose filter, using the apparatus described by Southern (1975). SSC is 0.15 M NaCl, 0.015 M Na citrate. The nitrocellulose filter, now with single-stranded DNA bound to it, was washed for 10 min in 2x SSC. DNA was immobilised on the filters by baking at 80°C for 120 min. Non-specific binding sites on the filters were saturated by pre-treating the filters prior to hybridisation. All treatments were at 65°C. Filters were soaked for 30 min in 3x SSC, then for 180 min in 3x SSC plus 0.2% w/v each of Ficoll, polyvinylpyrrolidone and bovine serum albumin. This solution is 3x SSC, 10x Denhardt's (Denhardt, 1966). Filters were washed for a further 60 min in 3x SSC, 10x Denhardt's plus 0.1% w/v SDS and 50 ug/ml sonicated, denatured herring sperm DNA.
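SSC is used at several strengths here: 20x for blotting, 3x for pre-treatment and hybridisation, and 2x for washing. For that reason it is normally kept as a concentrated stock. The minimal sketch below computes the salt weights for a given strength and volume. It assumes the standard formulation (1x SSC = 0.15 M NaCl, 0.015 M trisodium citrate) and that the citrate is weighed as the dihydrate (294.10 g/mol). The function name is hypothetical.

```python
# Minimal sketch: weighing out an SSC stock.
# Assumptions: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate;
# citrate weighed as the dihydrate (Na3C6H5O7.2H2O).

MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "trisodium citrate dihydrate": 294.10}
ONE_X_MOLAR = {"NaCl": 0.15, "trisodium citrate dihydrate": 0.015}


def ssc_recipe(fold: float, volume_l: float) -> dict:
    """Grams of each salt needed for `volume_l` litres of `fold`x SSC."""
    return {
        salt: round(ONE_X_MOLAR[salt] * fold * volume_l * MOLAR_MASS_G_PER_MOL[salt], 1)
        for salt in ONE_X_MOLAR
    }


if __name__ == "__main__":
    print(ssc_recipe(20, 1.0))  # a 1 litre 20x stock for blotting
```

For one litre of 20x SSC this gives about 175.3 g NaCl and 88.2 g trisodium citrate dihydrate. Working strengths are then made by dilution, rather than by weighing each solution separately.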

Additionally, 10 ug/ml polyadenylic acid was included if pHβG1 was the hybridisation probe. The same solution was used for hybridisations. The nick-translated probe was denatured by immersion in boiling water for 5 min and added directly to the above solution. The probe concentration was 2 ng/ml with respect to globin gene sequences. For blots of human genomic DNA the hybridisation time was 60 hr. This time was reduced if the filter-bound DNA was in excess in the hybridisation reaction, e.g. for blots of globin recombinant phage. After hybridisation, the filters were washed repeatedly (>6 times) in 3x SSC plus 0.1% SDS plus herring sperm DNA. The filters were then washed twice for 30 min with the same solution. Where appropriate, a low-salt (high-stringency) wash was added to the protocol (2x 30 min; see text). Filters were then taped onto a glass plate for autoradiography.
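Band sizes are read off the autoradiograph by comparison with the co-electrophoresed markers. The minimal sketch below assumes the usual approximation that log10(fragment length) falls linearly with migration distance over the marker range. It fits that line by least squares to (distance, size) pairs for the markers and then interpolates the size of an unknown band. The Bgl I marker sizes are those given above; the migration distances and function names are hypothetical.

```python
import math

# Minimal sketch: sizing a hybridising band from marker fragments using the
# semi-logarithmic relationship between fragment length and migration distance.
# Assumption: log10(size) is approximately linear in distance over the marker range.


def fit_semilog(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size_kb) = slope * distance + intercept.

    markers: (distance_mm, size_kb) pairs.
    """
    xs = [d for d, _ in markers]
    ys = [math.log10(s) for _, s in markers]
    n = len(markers)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm: float, fit: tuple[float, float]) -> float:
    """Interpolated fragment length (kb) for a band at `distance_mm`."""
    slope, intercept = fit
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Bgl I marker sizes from the text; the distances are invented for illustration
    markers = [(21.0, 6.6), (26.5, 4.4), (38.0, 1.8), (46.0, 1.0)]
    fit = fit_semilog(markers)
    print(f"band at 30.0 mm ~ {estimate_size_kb(30.0, fit):.1f} kb")
```

A single straight line over the whole marker range is only an approximation. Interpolating between the two markers that flank the unknown band is a common alternative, and may partly explain why estimates such as 11 and 12.3 kb can differ for large fragments.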

2.4.6 Growth of Phage Lambda in Liquid Culture

Overnight cultures of LE392 were diluted 1 in 50 into fresh L broth plus 10 mM MgSO4 (200 ml per 2 l flask). Cultures were grown at 37°C to an OD650 of 0.45-0.6 (about 2-3 x 10^8 cells/ml). A high titre stock of phage was added to give a multiplicity of infection of 1 (m.o.i. = one phage for every bacterium). The cultures were shaken vigorously for 4-5 hr. Bacteria were killed by the addition of 2 ml/l chloroform.
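The volume of phage stock needed for a given m.o.i. follows from the cell density and the stock titre. The minimal sketch below takes the cell density from the OD650 estimate above (2-3 x 10^8 cells/ml). The stock titre in the example and the function name are hypothetical.

```python
# Minimal sketch: volume of phage stock for a liquid lysate at a chosen m.o.i.
# Assumptions: cell density estimated from OD650 (about 2-3 x 10^8 cells/ml at
# OD650 = 0.45-0.6, as in the text); the stock titre below is hypothetical.


def phage_volume_ml(cells_per_ml: float, culture_ml: float, moi: float,
                    stock_pfu_per_ml: float) -> float:
    """Millilitres of phage stock giving `moi` phage per bacterium."""
    total_cells = cells_per_ml * culture_ml
    return total_cells * moi / stock_pfu_per_ml


if __name__ == "__main__":
    # 200 ml culture at 2.5 x 10^8 cells/ml, m.o.i. = 1, stock of 1 x 10^10 pfu/ml
    print(f"{phage_volume_ml(2.5e8, 200, 1.0, 1e10):.1f} ml of phage stock")
```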

2.4.7 Growth of Phage Lambda on Agar Plates

Plating Cells. The host bacteria for growing phage on plates are prepared as follows. 1 ml of an overnight culture is diluted into 50 ml of fresh L broth. The culture is grown for 120 min and then harvested by spinning at 6000 x g for 10 min. The bacterial pellet is resuspended in 50 ml of sterile 10 mM MgCl2, aliquoted and stored at 4°C.

Plate Lysates. These are used for small-scale propagation of phage stocks. A single phage plaque was picked into 1.0 ml of phage buffer (21.5 mM KH2PO4, 49 mM Na2HPO4 pH7.15, 85 mM NaCl, 1 mM MgSO4, 0.1 mM CaCl2, 0.001% w/v gelatin) plus 50 ul of CHCl3. The CHCl3 kills bacteria in the solution. The solution, minus the CHCl3, is added to 2.5 ml of sterile BBL top-agar pre-cooled to 45°C (top-agar is 10 g/l BBL trypticase peptone, 5 g/l NaCl, 6.5 g/l agar). The tubes were kept at 45°C for 2-3 min, then 0.2 ml of plating cells was added. The contents of the tube were mixed and poured onto a moist L agar plate (L agar is L broth with 10 g/l NaCl, 15 g/l agar). The top layer was allowed to set and the plates were incubated overnight at 37°C.

Confluent lysis of the bacterial lawn was usually observed in these plate lysates. The plate was overlaid with 5.0 ml of L broth and left overnight at 4°C. This broth was sucked off and a few drops of CHCl3 were added. This solution was used directly as a high titre phage stock (5 x 10… p.f.u./ml).

Titreing. Serial dilutions of phage solutions were made in phage buffer. 0.1 ml of each phage dilution and 0.2 ml of plating cells were pre-incubated together at 37°C for 10 min. This was then added to pre-cooled BBL top-agar and plated onto previously poured BBL plates (BBL plates are as top-agar but with 10 g/l agar). The top-agar was allowed to set, and the plates were inverted and incubated overnight at 37°C.
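The titre is calculated from the plaque count, the dilution and the volume plated. The minimal sketch below assumes that 0.1 ml of each dilution is plated, as described above. The plaque count, dilution and function name are hypothetical. As a rule of thumb, plates carrying roughly 30-300 plaques give the most reliable counts.

```python
# Minimal sketch: phage titre from a plaque count on a serial-dilution plate.
# Assumptions: 0.1 ml of the dilution is plated (as in the text); the plaque
# count and dilution in the example are hypothetical.


def titre_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float = 0.1) -> float:
    """Plaque-forming units per ml of the undiluted stock.

    dilution: fraction of stock in the plated dilution, e.g. 1e-8 for a 10^-8 dilution.
    """
    if plaques < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("plaque count must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_plated_ml)


if __name__ == "__main__":
    # e.g. 150 plaques from 0.1 ml of a 10^-8 dilution
    print(f"{titre_pfu_per_ml(150, 1e-8):.1e} pfu/ml")
```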

2.4.8 Preparation of Phage DNA

2 ml of CHCl3, 40 g of NaCl and 1 mg of DNase and RNase were added to 1 l of high titre phage solution. This was left at 22°C for 60 min. Debris was pelleted at 16,000 x g in a Sorvall GSA head. Polyethylene glycol was added to 10% w/v and the solution was left overnight at 4°C. Phage precipitated by this procedure were pelleted at 16,000 x g for 10 min and resuspended in 20 ml of phage buffer. Contaminating E. coli were removed by re-spinning the solution at 16,000 x g for 10 min in a Sorvall HB-4 head. The supernatant was treated with 10 ug/ml DNase I for 60 min at 22°C. CsCl was then added at 0.71 g per ml of solution.

After the CsCl had dissolved, the solution was preclarified at 15,000 rpm in a Beckman SW40Ti rotor. The pellet and pellicle were removed and the supernatant was re-spun to equilibrium for 24 hr at 33,000 rpm. Phage were visible as a white band. This band was side-harvested with a syringe, added to pre-clarified CsCl solution, and spun to equilibrium again. The phage band was collected and dialysed against 25 mM NaCl, 10 mM Tris-HCl pH8.0, 1 mM MgSO4. The solution was brought to 0.2% SDS, 10 mM EDTA, and heated at 65°C for 15 min. Proteinase K was added to 50 ug/ml and the solution was digested at 37°C for 60 min. The solution was extracted 4 times with re-distilled phenol (buffered with 0.5 M Tris-HCl pH8.0). Phenol was removed by extensive dialysis against TE buffer. DNA was stored at 4°C.

The concentration of the DNA was estimated from its OD 260. A small sample of the DNA was digested with an appropriate restriction enzyme to confirm the phage's identity.

2.4.9 Preparation of Phage Vector "Arms"

λNEM788 DNA was digested to completion with Hsu I (approximately threefold over-digestion), phenol extracted, ether washed and loaded onto 10-40% sucrose gradients (1 M NaCl, 20 mM Tris-HCl pH8.0, 10 mM EDTA; Maniatis et al, 1978). Gradients were 14 ml and the sample was 20-30 ug of phage DNA in a volume of 200 ul. Samples were spun for 15 hr at 33,000 rpm in an SW40Ti rotor at 20°C. 0.6 ml fractions were collected by bottom-harvesting of the gradients.

15 ul of each fraction was electrophoresed on a 0.7% flat-bed agarose gel. Fractions were pooled so as to exclude contamination by the trpE central fragment. Pooled fractions of left and right "arms" were concentrated by ethanol precipitation prior to use.
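Centrifugation conditions in these methods are given sometimes in rpm and sometimes in x g. The two are related by the standard formula RCF = 1.118 x 10^-5 x r x N^2, where r is the radius in cm and N the speed in rpm. The minimal sketch below applies this formula. It assumes a maximum radius of about 15.88 cm for the SW40Ti rotor; that value should be checked against the rotor manual. The function names are hypothetical.

```python
# Minimal sketch: converting between rotor speed and relative centrifugal force.
# RCF = 1.118e-5 * r * N^2, with r in cm and N in rpm.
# Assumption: r_max of ~15.88 cm for the SW40Ti rotor (check the rotor manual).

K = 1.118e-5


def rcf_x_g(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given radius."""
    return K * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF at the given radius."""
    return (rcf / (K * radius_cm)) ** 0.5


if __name__ == "__main__":
    print(f"33,000 rpm at r_max ~ {rcf_x_g(33_000, 15.88):,.0f} x g")
    print(f"150,000 x g at r_max ~ {rpm_for_rcf(150_000, 15.88):,.0f} rpm")
```

Because the RCF varies along the tube, the radius used for a quoted x g value (r_min, r_av or r_max) should be stated whenever the two units are interconverted.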

2.4.10 In Vitro Packaging of Lambdoid DNA

The lysogenic strains BHB2671 and 2673 were grown and induced as described by Collins and Hohn (1978). Cells were harvested at 6000 x g and resuspended in twice the original volume of M9 buffer (21.5 mM KH2PO4, … mM Na2HPO4 pH7.15, 8.5 mM NaCl, 2.5 mM NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2, 0.4% glucose). Cells were UV-irradiated for 3.5 min in 200 ml batches in NUNC bio-assay dishes. The light source was a UV Products Inc. Mineralight S68 (minus filter) 40 cm from the bacterial suspension. Irradiated cells were harvested and concentrated as described. The concentration buffer was 40 mM Tris-HCl pH8.0, 10 mM NaN3, 10 mM MgCl2, 10 mM putrescine dihydrochloride, 10 mM spermidine trihydrochloride, 0.1% 2-mercaptoethanol and 7% dimethylsulphoxide. In vitro packaging of lambda DNA was performed using 40 ul aliquots of the concentrated cells. "Exogenous" lambda DNA was added to a concentration of 7.5 ug/ml (300 ng per 40 ul aliquot). Packaging reactions were performed as described, except that the final concentration of ATP was 8 mM. The reactions were quenched by the addition of 0.5 ml phage buffer plus 50 ul of CHCl3. Phage were titred as described previously.

2.4.11 "Miniprep" Analysis of Lambdoid DNA

Miniprep analysis of single phage plaques was essentially as described by Cameron et al (1977). Single phage plaques for analysis were amplified using the plate lysate technique (Section 2.4.7; the only modification to the protocol was that agar was replaced by agarose). Plates were overlaid with 10 mM Tris-HCl pH7.5, 10 mM EDTA and harvested in the usual manner. To 4 ml of the solution were added 0.4 ml of 0.5 M EDTA pH8.5, 0.2 ml of 2 M Tris base and 0.2 ml of 10% SDS. This was mixed, and 10 ul of diethylpyrocarbonate was added. The solution was incubated at 65°C for 30 min. 1 ml of 5 M potassium acetate was added and the solution was chilled on ice for 1 hr. Debris was pelleted by spinning at 24,000 x g for 10 min. Nucleic acids in the supernatant were precipitated with 2 volumes of ethanol. The pellet was resuspended in 0.4 ml TE buffer, phenol extracted, and re-precipitated prior to restriction enzyme analysis.

2.4.12 Screening of Lambda Recombinants

In situ hybridisation to phage plaques was performed by the method of Benton and Davis (1977). Phage were grown on BBL agarose plates (7 or 10 g/l of agarose replaces agar in the top and bottom layers of the plates, respectively).

Plates were chilled to 4°C. Nitrocellulose filters were labelled with pencil marks, soaked in 3x SSC for 30 min and blotted dry. Filters were laid onto the surface of the plates.

Duplicate filters were blotted from each plate: the first filter for 2 min and the second for 3 min. Filters were then laid phage-side uppermost for 3 min on a pad of Whatman 3MM paper soaked in 0.5 M NaOH, 1.5 M NaCl. Filters were then submerged twice (30 sec each) in 0.5 M Tris-HCl pH7.0, 1.5 M NaCl. The neutralised filters were blotted dry and baked at 80°C for 120 min. Hybridisation conditions are as described in Section 2.4.5.

2.4.13 Maxam and Gilberting in Zurich

60 ug of supercoiled plasmid DNA was digested with the relevant restriction enzyme, phenol extracted and precipitated.
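For end-labelling it is more useful to express the amount of DNA as picomoles of 5' ends than as micrograms. The minimal sketch below makes that conversion. It assumes an average of 660 g/mol per base pair of double-stranded DNA and two 5' ends per linear fragment. The plasmid length and number of cuts in the example are hypothetical and are not values from this work, as are the function names.

```python
# Minimal sketch: picomoles of 5' ends available for dephosphorylation and kinasing.
# Assumptions: 660 g/mol per base pair of double-stranded DNA; the plasmid length
# and number of cuts in the example are hypothetical.

G_PER_MOL_PER_BP = 660.0


def pmol_molecules(ug: float, length_bp: int) -> float:
    """Picomoles of DNA molecules of the given length."""
    grams = ug * 1e-6
    return grams / (length_bp * G_PER_MOL_PER_BP) * 1e12


def pmol_5prime_ends(ug: float, plasmid_bp: int, cuts: int) -> float:
    """A circular plasmid cut `cuts` times gives `cuts` linear fragments,
    each carrying two 5' ends."""
    return pmol_molecules(ug, plasmid_bp) * 2 * cuts


if __name__ == "__main__":
    # 60 ug of a hypothetical 12,000 bp plasmid cut once
    print(f"{pmol_5prime_ends(60, 12_000, 1):.1f} pmol of 5' ends")
```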

The DNA was resuspended in 0.1 M Tris-HCl pH8.0 buffer. This solution was de-ionised by shaking with a few beads of chelating resin. 0.15 units of bacterial alkaline phosphatase (BAP, Worthington) were added. Incubation was for 30 min at 65°C. The reaction was monitored by removing a sample of the DNA solution before and after the addition of BAP. These samples were added to a trace amount of previously kinase-labelled DNA and incubated in parallel with the main reaction. After 30 min the "+" samples were electrophoresed on 3MM paper at 2,500 V in Na citrate buffer pH3.5. In this assay high molecular weight DNA remains at the origin, whereas orthophosphate released by BAP migrates with the tracking dye. No counts should remain at the origin after a successful dephosphorylation. Dephosphorylated DNA was phenol extracted 4 times, ether washed and precipitated. Kinasing was carried out as described by Maxam and Gilbert (1977).

Unincorporated ATP was removed by gel filtration on a Sephadex G-100 column run in 20 mM Tris-HCl pH7.5, 1 mM EDTA. Singly labelled restriction fragments were generated by "re-restriction". These digests were supplemented with 20 mM potassium phosphate to inhibit any contaminating phosphatases in the restriction enzymes. Digests were run on 0.7% preparative agarose gels and fragments were visualised by autoradiography. Fragments were eluted by soaking in TNE-buffer (Section 2.3.7). Recovered fragments were subjected to the A>G, G, C+T and C specific reactions (Maxam and Gilbert, 1977). 400x280x1.5 mm sequencing gels were run in 50 mM Tris-borate pH8.3, 1 mM EDTA buffer at 800 V. Autoradiography was at -70°C using intensifying screens.
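Reading the resulting ladder comes down to a simple rule for each band position. A band in the purine lanes only is G if the G lane shows it and A otherwise. A band in the pyrimidine lanes only is C if the C lane shows it and T otherwise. The minimal sketch below assumes that bands have already been scored as present or absent at each position. Positions are listed from the bottom of the gel upwards, which for 5'-end-labelled fragments reads 5' to 3' away from the label. The Zurich "A>G" lane is treated like the "G+A" lane. The function names are hypothetical.

```python
# Minimal sketch: calling bases from a Maxam-Gilbert sequencing ladder.
# Assumptions: lanes named "G", "G+A" (or the Zurich "A>G"), "C+T" and "C";
# bands scored as present/absent per position, listed bottom-to-top (5' -> 3').


def call_base(lanes: set[str]) -> str:
    """Call one position from the set of lanes that show a band there."""
    lanes = {"G+A" if lane == "A>G" else lane for lane in lanes}
    has_purine = bool(lanes & {"G", "G+A"})
    has_pyrimidine = bool(lanes & {"C", "C+T"})
    if has_purine == has_pyrimidine:
        return "N"  # no band at all, or bands in both purine and pyrimidine lanes
    if has_purine:
        return "G" if "G" in lanes else "A"
    return "C" if "C" in lanes else "T"


def read_ladder(bands: list[set[str]]) -> str:
    """Concatenate base calls for successive band positions."""
    return "".join(call_base(b) for b in bands)


if __name__ == "__main__":
    gel = [{"G", "G+A"}, {"G+A"}, {"C", "C+T"}, {"C+T"}, {"A>G"}]
    print(read_ladder(gel))  # -> GACTA
```

Positions that do not fit any pattern are returned as N. This corresponds to the ambiguities underlined in Figure 8.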

2.4.14 Maxam and Gilberting in Paddington

Digested starting fragments (Section 2.9.2) were dephosphorylated in 0.1 M Tris-HCl pH8.0, 1 mM EDTA buffer. Typically, 1 pmol of 5' termini was dephosphorylated in a volume of 200 ul with 0.2 units of BAP (200 BRL units) at 65°C for 60 min. The reaction was monitored by a "+ BAP" reaction as described in Section 2.4.13. Release of 32P-orthophosphate was measured by TCA precipitation of the "+ BAP" samples. Typically >95% of the input was rendered TCA-soluble by BAP treatment. DNA was recovered and kinase labelled as described before. Labelled samples were electrophoresed on a strand-separation acrylamide gel and eluted as described in Section 2.3.7. Recovered single-stranded fragments were subjected to the G, G+A, C+T and C specific reactions described by Maxam and Gilbert (1980). Latterly these reactions were supplemented by a T-specific reaction (Rubin and Schmidt, 1980). 420x200x0.35 mm 8% and 20% sequencing gels were electrophoresed in 75 mM Tris-borate pH8.3, 1 mM EDTA running buffer. Sample slots were either 4 or 8 mm wide. Gels were run at a constant power of 25-30 W per gel.

Section 3: Results

(The Search for the Elusive Chimaera)

3. Results

3.1 Partial sequencing of human α-, β- and γ-globin cDNA plasmids

In order to study a eukaryotic gene in its normal or diseased state, it is necessary to have a nucleic acid hybridisation probe. Because the globin genes are expressed at high levels in erythroid cells, this requirement is easily met.

Relatively large amounts of globin-specific mRNA can be isolated, and this can either be radiolabelled itself or be reverse transcribed into radiolabelled complementary DNA (cDNA). The cDNA probes have been the most useful, as they will hybridise to other globin RNA or to globin-specific gene sequences. These probes have been used successfully to obtain the copy number of globin genes in normal individuals and an approximation of the coding information deleted in various haemoglobinopathies and thalassaemia syndromes (Williamson, 1976, Forget et al, 1976, Ottolenghi et al, 1976). cDNA titration reactions have also been used to determine the amount of cytoplasmic and nuclear β-globin-specific RNA found in patients with β+ or β0 thalassaemia (Benz et al, 1978, Old et al, 1978, Nienhuis et al, 1977).

In practice, purified globin-specific cDNAs are always contaminated at detectable levels by cDNA molecules transcribed from other RNA species present in erythroid cells. This is not an insuperable problem. It can be accommodated by extra purification steps in the isolation of the hybridisation probe, or by performing cross- or competition-hybridisation reactions.

However, a more elegant solution is to clone double-stranded cDNA molecules into a bacterial plasmid. This approach was first used successfully by Maniatis et al (Maniatis et al, 1976, Efstratiadis et al, 1977) to clone a cDNA copy of rabbit β-globin mRNA. Using the same methodology, double-stranded cDNA from human α-, β- and γ-globin mRNA was inserted into the plasmid pCR1 (Armstrong et al, 1977). This work was carried out in our laboratory in collaboration with P. Curtis and J. Van der Berg of the Institute of … I, Zurich.

Figure 6. Human α, β and γ Globin Plasmids, pHα/β/γG1.

The cDNA inserts are shown as shaded blocks. 3' and 5' refer to the orientation of the cDNA insert. Restriction sites labelled in the sequencing protocol are indicated by shaded arrows. Restriction sites used for secondary restriction are indicated by open arrows. R = Eco RI, K = Kpn I, H = Hind III.

Bacterial clones containing globin-specific sequences were identified by in situ hybridisation to human αβγ-globin cDNA. Putative α-, β- and γ-cDNA plasmids (pHα/β/γG1) were distinguished by solution hybridisation to radiolabelled preparations of α, β, αβγ or αβ globin cDNA. Restriction mapping showed that the inserted fragment of the putative α clone contained sites for the enzymes Hind III, Hha I and Hpa II, and was not cleaved by Eco RI. In contrast, both the putative β- and γ-inserted fragments were cleaved by Eco RI but not by the three enzymes which cleaved the α insert (Little et al, 1978; Figure 6). To differentiate these three plasmids further and to confirm the designations made from the hybridisation reactions, I undertook direct nucleotide sequencing.

The putative β and γ cDNA plasmids were sequenced around the Eco RI site, and the α plasmid around the Hind III site, of the inserted fragment. Rudimentary restriction maps of all three plasmids are shown in Figure 6. A sequencing "ladder" derived from the α plasmid pHαG1 is shown in Figure 7, and partial sequences derived from the plasmids are shown in Figure 8. The sequences were in all cases independently verified by a Zurich G-gnome.

These nucleotide sequences are consistent with the known protein sequences of human α-, β- and γ-globin (Dayhoff, 1972), and cannot be derived from mRNAs for δ-globin or for any known embryonic globin protein. There are a number of ambiguities in the sequence, and these are shown underlined. However, the three proteins differ by several amino acids in the area sequenced, and the ambiguities are such that they could not lead to the assignment of an incorrect identity. The partial sequences of pHαG1 and pHβG1 are in full agreement with published sequences of α- and β-globin mRNA (Marotta et al, 1977, Wilson et al, 1980). The partial sequence of pHγG1 agrees with the γ-globin genomic sequence of Slightom et al (1980). The only discrepancy is at residue 129, where an ambiguous A position in a (serine) codon was shown to be a C residue.

Figure 7. A Sequencing Gel of the α Globin cDNA Plasmid pHαG1.

The figure shows a 20% acrylamide gel of the anti-coding strand of the plasmid. The sequence reads from codon 89 back towards the 5' end of the mRNA.
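The comparison between nucleotide and protein sequences can be reproduced by translating a partial cDNA sequence with the standard genetic code. The minimal sketch below does this for the pHαG1 sequence listed in Figure 8 (codons 75-91 of α-globin, in the RNA form of the coding strand). It also locates the Hind III site (AAGCUU) in codon coordinates. The table construction and function names are illustrative; only the sequence comes from this work.

```python
# Minimal sketch: checking a partial cDNA sequence against a known protein sequence.
# Uses the standard genetic code; the input is the pHaG1 sequence from Figure 8,
# which starts at codon 75 of alpha-globin.

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(rna: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[rna[p:p + 3]] for p in range(0, len(rna) - 2, 3))


def codon_of_site(rna: str, site: str, first_codon: int) -> int:
    """Number of the codon containing the first base of a recognition site."""
    return first_codon + rna.index(site) // 3


if __name__ == "__main__":
    pHaG1 = "GACAUGCCCAACGCGCUGUCCGCCCUGAGCGACCUGCACGCGCACAAGCUU"
    print(translate(pHaG1))                    # -> DMPNALSALSDLHAHKL
    print(codon_of_site(pHaG1, "AAGCUU", 75))  # -> 90 (Hind III site spans codons 90-91)
```

The one-letter output corresponds to Asp-Met-Pro-Asn-Ala-Leu-Ser-Ala-Leu-Ser-Asp-Leu-His-Ala-His-Lys-Leu, which is residues 75-91 of human α-globin.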

The partial sequences locate the positions of the restriction sites used in the sequencing protocol. Thus the Eco RI sites of pHβG1 and pHγG1 lie at codons 120-121, and the Hind III site of pHαG1 lies at codons 90-91, confirming the result of Orkin (1977). In each case the cDNA insert can be orientated with respect to known markers in the parental plasmid, pCR1. For example, secondary restriction of pHγG1 (labelled at the Eco RI site) with Hind III generates two labelled fragments. The small fragment yields sequences from the sense strand, moving from the amino to the carboxy terminus with respect to the protein sequence. Conversely, the large fragment gives anti-sense sequences. Since a 5'-labelled strand is read 5' to 3' away from the labelled Eco RI end, the sense strand of the small fragment runs from the Eco RI site towards the Hind III site. Thus the 3' end of the inserted cDNA fragment must lie closest to the Hind III site of pCR1. Similar arguments can be used to deduce the insert

α-globin, codons 75-91 (pHαG1; Hind III site AAGCUU at codons 90-91):
Asp Met Pro Asn Ala Leu Ser Ala Leu Ser Asp Leu His Ala His Lys Leu
GAC AUG CCC AAC GCG CUG UCC GCC CUG AGC GAC CUG CAC GCG CAC AAG CUU

α-globin, codons 92-101 (pHαG1):
Arg Val Asp Pro Val Asn Phe Lys Leu Leu
CGG GUG GAC CCG GUC AAC UUC AAG CUC CUA

β-globin, codons 105-121 (pHβG1; Eco RI site at codons 120-121):
Leu Leu Gly Asn Val Leu Val Cys Val Leu Ala His His Phe Gly Lys Glu
CUC CUG GGC AAC GUG CUG GUC UGU GUG CUG GCC CAU CAC UUU GGC AA- GAA

γ-globin, codons 105-121 (pHγG1; Eco RI site at codons 120-121):
Leu Leu Gly Asn Val Leu Val Thr Val Leu Ala Ile His Phe Gly Lys Glu
CUC CUG GGA AAU GUG CUG GUG ACC GUU UUG GCA AUC CAU UUC GGC AA- GAA

γ-globin, codons 122-135 (pHγG1; Mbo II site marked; arrowed A in codon 129):
Phe Thr Pro Glu Val Gln Ala Ser Trp Gln Lys Met Val Thr
UUC ACC CCU GAG GUG CAG GCU UCA UGG CAG AAG AUG GUG ACU

Figure 8. Partial Nucleotide Sequences of pHα/β/γG1.

Sequences are shown in the RNA form of the coding strand. Ambiguous nucleotide positions are underlined. The arrow points to an A residue which disagrees with the γ sequence of Slightom et al (1980). Amino acid codons are numbered.

orientation of pHαG1 and pHβG1. With this information, the plasmids can be digested with restriction enzymes to give DNA fragments complementary to either the 5' or the 3' coding sequences of the globin genes. This ability was particularly useful in genomic blotting experiments to deduce a physical map of the human δβ locus (Flavell et al, 1978).

3.2 Molecular diagnosis of β thalassaemia in association with Haemoglobin Lepore

The Southern transfer technique can be used to map single-copy genes from higher eukaryotes. The construction of a physical map around the δβ globin genes was mentioned in the previous section. The globin cDNA plasmids described there have also been used as hybridisation probes in mapping the γ-globin locus (Little et al, 1979) and in establishing the γ-δ globin intergene distance (Bernards et al, 1979). Similar genomic blotting experiments performed on DNA from patients carrying β0 or β+ thalassaemic genes have shown that there is usually no gross rearrangement of the globin locus associated with these syndromes (Flavell et al, 1979). The favoured approach to identifying the lesion in these inherited diseases was therefore molecular cloning of the thalassaemic globin gene, followed by detailed structural and functional analysis (see Introduction). The work here concerns β+ thalassaemia. In order to clone a gene which was unequivocally derived from a chromosome conferring β+ thalassaemia, the starting DNA was obtained from a compound heterozygote. This patient is a β thalassaemic who also carries an Hb Lepore allele. The patient's family is shown in Figure 9a, and the presumed arrangement of his β-globin loci is shown in Figure 9b. In this patient the β-globin gene from the "other" chromosome, a δβ fusion gene, can readily be distinguished structurally from the thalassaemic globin gene and excluded from further study. The advantages of this approach are dealt with more fully in Section 4. Prior to cloning the DNA from this patient, the Southern transfer technique was used to confirm (a) the presence of the Hb Lepore δβ fusion gene (henceforth referred to as the Lepore gene) and (b) that there are no large deletions within the β-like globin gene cluster on the other chromosome.

High molecular weight DNA was prepared from the patient's circulating nucleated cells and digested with the restriction endonucleases Xba I, Hsu I and Pst I. The resulting DNA was fractionated on agarose gels, transferred to a nitrocellulose filter and hybridised with a radiolabelled recombinant plasmid derived from human β-globin cDNA (Little et al, 1978). Restriction fragments containing the β (and δ) globin structural genes were visualised by autoradiography. DNA derived from the placenta of a clinically normal patient of Caucasian ancestry was analysed in parallel. The results are presented in Figure 10. Pst I generates fragments of 4.4 and 2.3 kb from the normal patient (N), and these contain β- and δ-gene sequences respectively (Flavell et al, 1978). In the β/Lepore patient (T) an extra fragment of 2.6 kb is seen, which is diagnostic for the Lepore gene (Flavell et al, 1978). Xba I generates a single hybridising fragment of 11 kb from normal DNA. This fragment has been shown to contain both the β and δ genes and has previously been sized at 12.3 kb. The discrepancy

a o

i / i P - Thai. Hb Lepore

B 25 20 15 10 OkB

5p chr. 1 A A

chr. 2 1L * AP.* T "33 a

T Hsu I v PstI

-j. Figure 9. The p Thalassaemic Patient.

A. The patients immediate family. Hb Lepore was confirmed in the father by starch gel electro- phoresis (B.Modell, personal communication).

B. The presumed arrangement of the patient's globin loci. 62

6-6 H

416 5-1 - 4-2- 411 3-4-

4 7-5 4 2-6 4 2-3 1-98- 4-2- 4 3-8 3-4-

Hsu1 Xba1 Pst1

Figure 10. Southern Transfers on the Patient's DNA.

T = the thalasaemic patient, N = normal patient. Sizes of marker DNA fragments are in kb. Fragments referred to in the text are indicated by arrows. The Pst 1 digests were fractionated on a 1% agarose gel, the Hsu 1 and Xba 1 on a 0.8% gel. The procedure was as described in 2.4.5 except that filters were given 2x 30 min washes in 0.1x SSC prior to autoradiography. 63

between a size of 11 and 12.3 kb is probably not significant for fragments of this size, and larger discrepancies than this exist in the literature for the sizes of globin gene fragments

(compare Flavell et al, 1978 and Mears et al, 1978). An additional band at 3.8 kb is observed in Xba 1 digested P/

Lepore DNA. This agrees well with a previous estimate of 3.9 kb for the 'Lepore' Xba I fragment. Therefore digests with these two enzymes confirm that the patient is heterozygous for the

Lepore gene. The deletion which creates the Lepore gene removes about 7000 base-pairs of DNA. Digestion with Hsu I generates

7.5 and 16.0 kb hybridising bands from both the normal and the

P /Lepore DNA. The smaller band contains the j3-globin gene

(Flavell et al, 1978). Thus for three different restriction enzymes a fragment co-migrating with the normal j3-globin fragment (4.4 kb Pst I, 7.5 kb Hsu I) or normal 6j3-globin fragment (11 kb Xba I) is seen in the P+/Lepore DNA. As the patient has been shown to be heterozygous for the Lepore gene by DNA analysis, the normally sized fragments must be derived from the chromosome conferring the P -thalassaemic phenotype.

This confirms previous results that there is usually no gross rearrangement of the β-globin locus in β° or β+ thalassaemia.

The larger (16 kb) hybridising Hsu I fragment in the normal DNA is derived from the δ globin gene (sized at 18.0 kb by Van der Ploeg et al, 1980). The co-migrating band from the β+/Lepore patient is derived from the Lepore chromosome. The Lepore deletion removes the Hsu I site 5' of the β globin gene to create a new, fused fragment. However, the gain in size of the fused Hsu I fragment, from 16 kb to 16 + 7.5 kb, is offset by the size of the Lepore deletion, 7 kb, to produce a δβ fragment whose size (in this system) is indistinguishable from the 16 kb δ fragment. These blotting experiments confirm the haematological diagnosis of the patient's genotype. The data are summarised in Figure 9b. The absolute positions of the restriction sites shown relative to the gene coding sequences are not established by these experiments, and are taken from Flavell et al (1978).
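The size arguments for the Lepore fragments reduce to simple arithmetic. The sketch below is an illustration only, not a method used in this work: it assumes a single deletion of about 7 kb, as stated above, which either lies within one normal fragment (the Xba I case) or also removes the site separating two adjacent fragments (the Hsu I case). The function name is hypothetical and the inputs are the approximate sizes quoted in this section.

```python
# Predicted sizes (kb) of restriction fragments spanning the Lepore deletion.
# Assumption: a single deletion of about 7 kb, as quoted in the text.
LEPORE_DELETION_KB = 7.0


def lepore_fragment_kb(normal_fragments_kb, deletion_kb=LEPORE_DELETION_KB):
    """Size of the fused fragment on the Lepore chromosome.

    normal_fragments_kb lists the normal fragment(s) the deletion spans: one
    fragment if the deletion lies inside it, two adjacent fragments if the
    deletion also removes the restriction site between them.
    """
    return sum(normal_fragments_kb) - deletion_kb


# Xba I: the delta and beta genes share one 11 kb fragment.
xba_lepore = lepore_fragment_kb([11.0])
# Hsu I: the 16 kb delta and 7.5 kb beta fragments fuse when the site between them is lost.
hsu_lepore = lepore_fragment_kb([16.0, 7.5])

print(f"Xba I Lepore fragment: {xba_lepore:.1f} kb (observed 3.8 kb)")
print(f"Hsu I Lepore fragment: {hsu_lepore:.1f} kb (co-migrates with the 16 kb delta band)")
```

The predicted 4.0 kb Xba I fragment is close to the 3.8 kb band observed, and the predicted 16.5 kb Hsu I fragment differs from 16 kb by only about 3%, which is consistent with the two bands co-migrating.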

3.3. In vitro packaging of phage lambda DNA

I decided to clone the 7.5 kb Hsu I fragment containing the patient's (β+ thalassaemic) globin gene. The cloning protocol is shown in Figure 11. DNA was cloned in phage lambda, and the in vitro packaging technique was used to produce lambda/human recombinants at high efficiencies. This is a system for converting chimaeric lambda/foreign DNA concatemers into viable phage particles (Hohn and Murray, 1977). The method used was that of Collins and Hohn, in which a concentrated lysed mixture of two induced lysogens (lambda-carrying strains of E. coli) is incubated with the ligated mixture of lambda and foreign DNA (Collins and Hohn, 1978). The lysogenised lambda strains cannot form viable phage as they both carry the b2 deletion. However, these genomes will recombine at a high frequency with the lambdoid vector DNA, even though the strains carry lesions in their red (lambda-mediated recombination) genes. It is important to prevent recombination between the lambda genomes, as it could result in loss of the safety mutations in the vector and/or rearrangement of the cloned insert. Formation of viable vector/lysogen recombinants is reduced by UV irradiation of the lysogens prior to lysis: recombination will still occur, but DNA derived from the lysogenised lambda genomes will carry lethal mutations. The dose of UV irradiation necessary has to be determined experimentally, and this is described as follows.

[Figure 11 image: flow diagram in which human DNA and phage DNA are each digested with Hsu I and fractionated, then ligated, packaged in vitro and plaque-purified]

Figure 11. Protocol for Cloning the Hsu I β Globin Gene Fragment.

The two lysogens, BHB2671 and BHB2673, are grown and induced in the normal way. After resuspension in buffer they are either concentrated directly or UV irradiated for varying amounts of time prior to concentration. λimm21 DNA (a gift from Binie Klein) is then packaged in vitro with these aliquots, diluted, and plated on a non-lysogenic strain of E. coli (275) or on a strain of E. coli lysogenic for a λimm21 genome (C600(λimm21)). Phage packaged from the exogenous DNA, λimm21, will grow on E. coli 275 to give plaques and will give no plaques on C600(λimm21) (see Erratum). As the time of UV irradiation increases, the titre (number of plaques) on C600(λimm21) should decrease, whereas the titre on the 275 indicator should not be affected by the UV irradiation time. This is what is observed (Figure 12). Although efficiencies for the same time point (using different batches of in vitro packaging aliquots) may differ by a factor of two or three, the efficiency on C600(λimm21) drops off more rapidly than that on 275. Over-irradiation reduces the packaging efficiency of the exogenous DNA, and a time point of 3 1/2 minutes was chosen as being optimal. This gives a packaging efficiency of about 4 x 10^? plaque-forming units (p.f.u.) per microgram with λimm21 DNA, but increasing the effective ATP concentration to 6 mM gave efficiencies in excess of 10^? p.f.u. per microgram with λimm21 or λNEM788 DNA. Table 1 gives the packaging efficiencies obtained for a number of batches of in vitro packaging aliquots. Variation between aliquots of the same batch of packaging mix is low, but is high between batches, being a factor of 2.1 in the data presented.

Erratum: λimm21 will not grow on C600(λimm21), as this bacterial strain is immune to superinfection by a homo-immune phage. Only phage which have acquired λimm434 immunity via recombination will grow on C600(λimm21). Thus the titre on C600(λimm21) is a measure of recombination frequency in this system.

Figure 12. Optimising UV Irradiation of In Vitro Packaging Lysogens.

Packaging mixtures were prepared and irradiated as described. 350 ng of λimm21 DNA was incubated at an ATP concentration of 1 mM, diluted and titred in duplicate on E. coli 275 or C600(λimm21). The average packaging efficiencies are presented. The packaging efficiency of the endogenous lambda DNA was calculated from the number of lysogenic cells in each packaging aliquot (= the number of lysogenic phage) and by assuming that 1 microgram of lambda DNA is equivalent to 2 x 10^10 phage. Where two values exist for the same time point using different batches of aliquots, both are presented. Efficiency of exogenous DNA = circles, endogenous DNA = squares. P.f.u. = plaque-forming units.

Table 1. Packaging Efficiencies of Lambda DNAs

Packaging aliquots | DNA | 275 | C600(λimm21) | 259
A | λimm21 | 1.3 ± 0.1 x 10^7 | 26 ± 2.0 | ND.
B | λimm21 | 2.8 ± 0.5 x 10^7 | 53 ± 11.0 | ND.
B | λNEM788 | ND. | ND. | 6.0 ± 1.3 x 10^?

Columns 275, C600(λimm21) and 259 are the indicator strains. All packaging reactions were performed at 6 mM ATP. ND. = not done.
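The Figure 12 legend converts a number of lysogenic genomes into a mass of lambda DNA using a figure of about 2 x 10^10 genomes per microgram. That figure can be checked from the genome length and Avogadro's number. The sketch below assumes a genome of 48,502 bp (full-length wild-type lambda) and an average mass of 650 g/mol per base pair; both constants are assumptions rather than values from this work, and the function names are hypothetical. The second function shows how an endogenous packaging efficiency follows once the genome count is converted to micrograms.

```python
# How many lambda genomes are there in 1 microgram of lambda DNA?
# Assumed constants (not from this work): genome length and mean mass per base pair.
AVOGADRO = 6.022e23        # molecules per mole
LAMBDA_GENOME_BP = 48_502  # full-length wild-type lambda
MASS_PER_BP = 650.0        # approximate g/mol per base pair of duplex DNA


def genomes_per_microgram(genome_bp=LAMBDA_GENOME_BP):
    """Number of genome-length duplex molecules in 1 microgram of DNA."""
    grams_per_mole = genome_bp * MASS_PER_BP
    return 1e-6 / grams_per_mole * AVOGADRO


def endogenous_efficiency(plaques, lysogenic_genomes, genomes_per_ug=2e10):
    """Plaques per microgram of endogenous lambda DNA in a packaging aliquot.

    lysogenic_genomes is taken to equal the number of lysogenic cells in the
    aliquot; genomes_per_ug defaults to the rounded figure in the legend.
    """
    micrograms = lysogenic_genomes / genomes_per_ug
    return plaques / micrograms


print(f"{genomes_per_microgram():.2e} genomes per microgram")  # about 1.9e10
```

The hypothetical call endogenous_efficiency(1000, 1e9), for example, corresponds to 0.05 micrograms of endogenous DNA and returns 2 x 10^4 plaques per microgram.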

3.4 Making recombinant phage

3.4.1 Enrichment for human β-globin sequences

45 micrograms of high molecular weight DNA from the β+/Lepore patient was digested with the Hind III isoschizomer Hsu I. A time course of this reaction indicated that the DNA was at least threefold over-digested. The DNA was electrophoresed on a preparative 1% agarose gel in parallel with lambda DNA cleaved with Hsu I. The gel was sliced and the marker DNA tracks stained with ethidium bromide. After reassembly of the gel, a rectangle of agarose was cut out of the human DNA track to include restriction fragments with sizes between 6.0 and 9.5 kb. This size fraction will include the 7.5 kb β-globin fragment. DNA was eluted from the agarose by the "freeze and squeeze" method (Thuring et al, 1975). The estimated recovery was 4.6 micrograms. A sample of the enriched DNA was resuspended in ligase buffer, split two ways and incubated with or without T4 DNA ligase (Cozzarelli et al, 1967). These DNA samples were then electrophoresed in parallel with unfractionated Hsu I digested DNA from the patient (Figure 13). The fractionated DNA migrates in the correct position (Track b) and is "ligatable" by T4 ligase (Track d). However, it is still possible that a subpopulation of these DNA molecules is not religatable. The enrichment can be quantitated by scanning the gel, and was estimated to be greater than sixfold (data not shown). This figure is approximate, as the enriched DNA gave a poor UV spectrum, precluding an accurate assessment of its concentration. The overabundance of sequences at the size limits of the enriched DNA is probably an artifact of the gel elution procedure.

Figure 13. Analysis of Size-Fractionated Human DNA.

All samples were electrophoresed on a 0.7% agarose gel for 90 min at 5 V/cm. Tracks a, e: 0.3 ug λ/Hsu I. Tracks b, d: 0.18 ug of size-fractionated Hsu I digested β+/Lepore DNA, minus and plus incubation with 0.14 Weiss units of T4 DNA ligase for 60 min at 22°C. Track c: 0.96 ug Hsu I digested β+/Lepore DNA.
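A rough upper bound on the enrichment follows from the masses quoted above: if every copy of the 7.5 kb fragment in the 45 micrograms of digested DNA were recovered in the 4.6 micrograms eluted, the fragment could be enriched at most about tenfold. The sketch below is a back-of-envelope check, not the gel-scanning method used here; it computes that bound and the fraction of target copies a given enrichment would imply. It inherits the uncertainty in the recovery estimate noted above, and the function names are hypothetical.

```python
# Back-of-envelope bounds on size-fractionation enrichment.
# Inputs are the masses quoted in Section 3.4.1; the recovered mass is itself an estimate.

def max_enrichment(input_ug, recovered_ug):
    """Fold-enrichment if every copy of the target fragment ends up in the eluate."""
    return input_ug / recovered_ug


def implied_target_recovery(fold_enrichment, input_ug, recovered_ug):
    """Fraction of target copies recovered that a measured enrichment implies."""
    return fold_enrichment * recovered_ug / input_ug


upper = max_enrichment(45.0, 4.6)
recovery_at_sixfold = implied_target_recovery(6.0, 45.0, 4.6)
print(f"maximum enrichment ~{upper:.1f}-fold")
print(f"a sixfold enrichment implies ~{recovery_at_sixfold:.0%} recovery of the target")
```

On these figures the greater-than-sixfold enrichment measured would correspond to recovering well over half of the target fragments, within the roughly tenfold ceiling.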

3.4.2 Manipulation of vector DNA

The Hind III lambda vector λNEM788 was chosen for cloning of genomic restriction fragments from the thalassaemic patient. The phage is a Wam Eam Sam derivative of the phage λNEM760, and its salient features are shown in Figure 14 (Murray et al, 1977). It is a replacement vector with two Hind III sites flanking a non-essential region, the trpE fragment. It has the following safety mutations:-

- W, E and S amber mutations, allowing growth only on SupE, SupF E. coli strains
- att and cI deletions, which prevent lysogenisation of the phage
- a red deletion, which reduces lambda-mediated recombination
- a nin5 deletion, which prevents the phage adopting a plasmid mode of replication
- the incorporation of three deletion safety mutations makes reversion to the wild type effectively impossible.

This phage was certified for use at Category I* containment with recA hosts by the U.K. Genetic Manipulation Advisory Group. Accordingly λNEM788 phage (a gift from Noreen Murray) was plaque purified and grown in liquid culture. The host strains of E. coli used were either 259 (ED8654) or LE392 (ED8656). These strains are isogenic and are SupE SupF derivatives of E. coli K12. Phage particles were precipitated with polyethylene glycol and banded twice on a caesium chloride density gradient (Kaiser and Hogness, 1960). Phage DNA was prepared after the method of Maniatis et al and was stored at 4°C in Tris-EDTA buffer to promote annealing of its cohesive termini (Maniatis et al, 1978). The viability of the phage DNA was checked by in vitro packaging prior to further manipulations. Infective DNA (packaging efficiency > 6 x 10^? p.f.u. per microgram of lambdoid DNA) was incubated with Hsu I, a Hind III isoschizomer. The reaction was terminated by phenol extraction at an estimated 2 1/2- to 3-fold overdigestion. The DNA was recovered and a sample dissolved in T4 ligase buffer. This sample was split into two and incubated with or without an excess of T4 DNA ligase. Half of each sample was electrophoresed on an agarose gel to give the result shown in the inset in Table 2.

[Figure 14 image: genetic map of lambda (0-100%) marking head, tail, late regulation, recombination, replication, regulation and lysis regions; derivation of the Hind III replacement vector NM788 showing the Wam, Eam and Sam point mutations, the replaceable trpE fragment between Hind III sites, and the (att-red), cI and nin5 deletions; after Murray et al, Molec. gen. Genet. 150, 53, 1977]

Figure 14. The Hind III (Hsu I) Lambda Vector λNEM788.

Fifty percent of the Hsu I digested λNEM788 DNA is present as annealed left and right arms (Track a), and addition of T4 DNA ligase converts 95% of the phage DNA to a concatemeric form (Track b). A better way of assaying ligation is the restoration of infectivity to the Hsu I digested λNEM788 DNA. Accordingly, the other halves of the samples with and without ligase (Experiment 1) were packaged in vitro (Table 2). Hsu I digestion reduces the packaging efficiency of the vector DNA by at least two orders of magnitude, and this drop is partially reversed by the addition of T4 DNA ligase. The efficiency with ligated DNA will never equal that of the intact vector, as many ligation events will produce uninfective chimaeric molecules.

Table 2. Packaging efficiencies of restricted and unrestricted vector DNA (p.f.u./microgram of lambda DNA).

Expt | Intact λNEM788 | λNEM788/Hsu I, minus DNA ligase | λNEM788/Hsu I, plus DNA ligase
1 | 1.10 ± 0.11 x 10^? | 1.00 ± 0.13 x 10^? * | 1.10 ± 0.10 x 10^?
2 | 1.00 ± 0.05 x 10^? | <17.5 | 4.29 ± 0.30 x 10^?

Ligations were performed at a DNA concentration of 80 ug/ml, with or without 0.014 units of T4 DNA ligase/ul. DNA input for in vitro packaging was 280 ng for all incubations in preparation 2 and for intact λNEM788 in preparation 1, and 160 ng for the ± ligase incubations in preparation 1. All DNA samples except that marked * were added directly to the packaging aliquots in ligase buffer. After incubation the aliquots were diluted in phage buffer and titred in duplicate on indicator strains 259 or LE392. The inset shows 250 ng of λNEM788/Hsu I ± T4 DNA ligase electrophoresed at 2 V/cm on a 0.7% agarose gel. L = left arm, R = right arm, L + R = annealed arms, and trpE = central restriction fragment.

[Inset image: λNEM788/Hsu I DNA minus (track a) and plus (track b) T4 DNA ligase, with bands L + R, trpE, L and R]

Approximately 20% of the lambda genome is deleted in the derivation of λNEM788 (Figure 14). However, if the central (phenotypically silent) trpE restriction fragment is removed biochemically, then the remainder of the λNEM788 genome, the left and right arms, is too small to form a viable phage. This
question of size has nothing to do with the coding information on the genome, but reflects the preference of lambda's packaging enzymes for a DNA substrate within the size limits of 75-105% of wild-type lambda (Bellet et al, 1971). With the trpE fragment removed, the arms amount to approximately 70% of the length of wild-type lambda, and only those arms which acquire a foreign restriction fragment of 2.5 kb or more (>5% of lambda) in a ligation reaction will exceed the lower size threshold to make a viable phage. Thus removal of the trpE fragment prior to ligation to human DNA will select for formation of recombinants. This "removal" is accomplished by sucrose gradient centrifugation of the Hsu I digested λNEM788 DNA (Maniatis et al, 1978). Gradient fractions are pooled so as to exclude the trpE fragment, and the resulting preparation of purified arms is ligated to Hsu I digested human DNA.

Table 3. Construction of λNEM788/Human Recombinants In Vitro

Preparation of purified "arms" | A | B
Source of Hsu I digested human DNA | RPC-5-fractionated homozygous β° thalassaemic DNA | Size-fractionated β+/Lepore DNA
Packaging efficiency of "arms" plus ligase (p.f.u. per ug of fractionated vector DNA) | 1.17 ± 0.02 x 10^4 | 7.04 ± 0.04 x 10^5
Packaging efficiency of "arms" plus ligase plus human DNA | 1.50 ± 0.13 x 10^5 | 1.26 ± 0.08 x 10^6
Increase with human DNA | 12.8 ± 1.3 fold | 1.79 ± 0.12 fold
Recombinants/total sample (DNA analysis) | 5/5 | 4/8
Sizes of inserts in recombinant phage, kb (see Figure 15) | 1: 4.7; 2: 7.0; 3: 3.9 + 1.8; 4: 5.0 + 2.9 + 2.7; 5: 4.7 + 4.4 | 1, 2: 7.3; 3: 9.0; 4: 5.9 + 6.5; 5-8: 5.9 (= trpE)

0.80 ug of purified "arms" was dissolved in 6.4 ul of ligase buffer. This sample was split two ways. To one was added 0.8 ul of Tris-EDTA buffer, and to the other was added 0.8 ul of 0.5 mg/ml RPC-5 fractionated human DNA (a gift from I. Jackson) or 0.8 ul of 0.5 mg/ml size-fractionated DNA (Figure 13). 1 ul of T4 DNA ligase (0.14 Weiss units) was added to both samples, and ligation was at 22°C for 180 min. 1 ul of each incubation was electrophoresed on a 0.7% agarose gel to confirm ligation, and 3 ul of the sample was packaged in vitro, diluted and titred.

Table 3 shows the result of two such experiments, in which a sample of purified vector arms plus T4 DNA ligase is incubated with or without Hsu I digested human DNA, packaged in vitro and titred. In both cases the titre increases, suggesting that the addition of human DNA allows formation of molecules greater than 75% of the length of wild-type lambda DNA. The background in the "arms plus ligase" incubations is almost certainly due to contaminating trpE fragments in the preparation of vector arms. Unfortunately, recombinant and parental phage are phenotypically indistinguishable in this vector, necessitating direct analysis of the DNA to confirm that lambda/human recombinants are being made. Individual plaques resulting from in vitro packaging of ligated human and vector DNA were amplified by culturing on agar plates. DNA from these phage preparations was digested
with Hsu I (Cameron et al, 1977). Figure 15 shows Hsu I digests of phage derived from RPC-5 fractionated human DNA (Hardies and Wells, 1976). None contain the 5.9 kb trpE fragment; some contain multiple inserts. The results with the size-fractionated β+/Lepore DNA are less spectacular, as ligation of the "arms" alone gives a large number of phage. The increase in titre with human DNA (Table 3) was 1.8-fold, suggesting that about 44% of the phage are recombinants ((1 - 1/1.8) x 100%). The mathematics of the calculation are really more complex than this, as the DNA concentration is not held constant in the experiment. Furthermore, the calculation does not take into account recombinant phage containing multiple inserts in the replaceable region, and such phage are found. Direct DNA analysis shows that four out of eight phage derived from ligation of vector and β+/Lepore DNA were true recombinants (Table 3). This agrees well with the tentative original estimate. Overall, the experiments presented in Table 3 and Figure 15 allow four salient observations to be made:-

- recombination is occurring in vitro
- the Hsu I sites at the vector/human DNA junctions are reconstituted
- the sizes of inserted DNA fragments reflect the source of the DNA. All recombinants derived from the size-fractionated β+/Lepore DNA fall within the 6.0-9.5 kb limit. Recombinants derived from the homozygous β° thalassaemic DNA (Table 3) show a wider range of sizes, in accordance with the heterogeneous size of the RPC-5 DNA fraction (from about 1 kb to over 20 kb).
- in vitro recombination in this system has a high efficiency, with about 5 x 10^5 human recombinants obtained from one microgram of the β+/Lepore DNA. This is important, as it means that any human gene, single-copy genes being a worst case, can be cloned from the amount of DNA obtained from a small blood sample (typical yields being 50 ug DNA from 10 ml of blood, with considerably more from thalassaemic patients).

[Figure 15 image: Hsu I digests of phage DNA; panel A tracks a-f, panel B tracks a-e]

Figure 15. Restriction Endonuclease Analysis of Recombinant Phage.

Individual plaques resulting from the experiment described in Table 3 were picked and amplified on agarose plates. DNA was made from these phage (2.4.11), digested with Hsu I and fractionated on 0.7% agarose gels.

A. Phage derived from ligation of RPC-5 purified DNA to phage vector arms. Tracks a, f: λNEM788. Tracks b-e: recombinant phage.

B. Phage derived from ligation of size-fractionated DNA (Figure 13) to phage vector arms. Track c: λNEM788. Tracks a, e: recombinant phage. Tracks b, d: "parental" phage.
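Three pieces of arithmetic in this section can be made explicit: the insert sizes permitted by the 75-105% packaging window once the arms (about 70% of wild-type length) are purified, the naive estimate of the recombinant fraction from the rise in titre, and the recombinant yield per microgram of human DNA. The sketch below assumes a wild-type lambda length of 48.5 kb and, from the Table 3 legend, equal masses (0.4 ug each) of arms and human DNA in each ligation. Like the estimate in the text, it ignores multiple inserts and the change in DNA concentration. The names are hypothetical.

```python
# Simple arithmetic for the lambda replacement-vector experiments.
# Assumptions: wild-type lambda = 48.5 kb; purified arms ~70% of wild-type length;
# equal masses of arms and human DNA in each ligation (0.4 ug each, Table 3 legend).
LAMBDA_KB = 48.5
ARMS_FRACTION = 0.70
PACKAGING_WINDOW = (0.75, 1.05)  # packageable length as a fraction of wild-type


def insert_size_window_kb(arms_fraction=ARMS_FRACTION, genome_kb=LAMBDA_KB):
    """Smallest and largest total insert that yields a packageable genome."""
    low, high = PACKAGING_WINDOW
    return (low - arms_fraction) * genome_kb, (high - arms_fraction) * genome_kb


def recombinant_fraction(fold_increase):
    """Naive estimate: all titre above the arms-only background is recombinant."""
    return 1.0 - 1.0 / fold_increase


def recombinants_per_ug_human(pfu_per_ug_vector, fraction, human_to_vector_mass=1.0):
    """Recombinant phage per microgram of human DNA in the ligation."""
    return pfu_per_ug_vector * fraction / human_to_vector_mass


low_kb, high_kb = insert_size_window_kb()
frac = recombinant_fraction(1.8)                   # increase quoted in the text
yield_b = recombinants_per_ug_human(1.26e6, frac)  # preparation B, Table 3
print(f"packageable insert: {low_kb:.1f}-{high_kb:.1f} kb")
print(f"recombinant fraction: {frac:.0%}")
print(f"recombinants per ug human DNA: {yield_b:.1e}")
```

The lower limit of about 2.4 kb matches the 2.5 kb threshold quoted in the text, and the estimated yield of roughly 5.6 x 10^5 recombinants per microgram is of the same order as the yield reported for this experiment.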

3.5 Screening recombinant phage derived from β+/Lepore human DNA

Recombinant phage derived from the β+/Lepore human DNA were screened directly without further amplification. Batches of 65,000 phage were plated out on 24 cm square Petri dishes, affectionately known as "megaplates" (Lenhard-Schuller et al, 1978). After the top agar had set, 1 microlitre (5-10 p.f.u.) of a low-titre stock of the recombinant Charon phage λHβG2 was carefully spotted at two positions near the edge of the plate (Figure 16a; Lawn et al, 1978). The plate was dried and then incubated in the usual manner. The spot titres of λHβG2, which contains a genomic human β-globin gene, serve as internal controls in the screening process. (The Charon phage will grow on the same bacterial lawn as λNEM788.) Duplicate nitrocellulose filters were blotted from each plate, pre-washed and hybridised to a denatured nick-translated probe (Benton and Davis, 1977). Preliminary experiments probing with the β-globin cDNA plasmid gave hybridisation to all phage plaques. Later experiments used a 4.4 kb Pst I fragment excised from the plasmid Hβ-IS. This fragment contains a genomic β-globin gene and has no repetitive elements within it (Fritsch et al, 1980; Figure 16b). Screening of 320,000 phage, 44% of which contained a human DNA insert, yielded one positive-scoring recombinant (Figure 16c). The hybridising plaque is present on duplicate filters and gives a signal of similar intensity to the λHβG2 phage on the same filter. Phage from this area of the plate were picked at a number of dilutions and rescreened. Five out of about two hundred phage hybridised intensely to the probe (Figure 17a), but a further cycle of rescreening was necessary before a "spot" on the autoradiograph of a filter could be aligned with a single plaque (Figure 17b). Restriction analysis of this plaque (Figure 17c) showed that it contained an insert of approximately 8 kb, as predicted. The recombinant phage was amplified in liquid culture to provide large amounts of DNA for detailed restriction enzyme analysis.

[Figure 16 image: A) map of the marker phage (5 kb scale); B) map of the probe Hβ-IS with flanking Pst I sites (1 kb scale); C) Screen I autoradiograph with λHβG2 marker spots and the positive plaque λ788-β]

Figure 16. "Benton and Davis" Screening of Recombinant Phage.

A. The recombinant used as an internal marker. B. The subcloned genomic β globin fragment used as hybridisation probe. C. An autoradiograph of duplicate filters blotted off half a megaplate. The positive-scoring phage (λNEM788β+) is arrowed.

[Figure 17 image: panels A, B and C]

Figure 17. Plaque-Purification of a Positive-Scoring Recombinant Phage, λ Paddington 1.

The area of the megaplate containing the positive-scoring phage (Figure 16) was picked into phage buffer and re-screened at a number of dilutions. A. Duplicate filters from a 90 mm plate with about 200 plaques. B. Duplicate filters from a plate with 19 plaques. C. DNA miniprep analysis of the plaque-purified phage. Track 1: λNEM788/Hsu I. Track 2: Paddington 1/Hsu I. Track 3: λ Hsu I/Eco RI.

3.6 Mapping the Recombinant Phage.

The Southern transfer technique was used to generate an accurate physical map of the inserted DNA in the recombinant phage. Single or double restriction enzyme digests of the phage Paddington 1 (tentatively assigned λNEM788β+) were fractionated on agarose gels, transferred to nitrocellulose filters, and probed with the β-globin cDNA plasmids pHβG1 or JW102 (Little et al, 1978; Wilson et al, 1978). Restriction fragments containing structural gene sequences will hybridise to the probe. This is a useful method of analysis, as only 7.5/42.5 kb (18%) of this phage is composed of human DNA sequences.

Figure 18 shows Southern transfers performed on Pad 1 DNA, and the sizes of hybridising fragments are presented in Table 4. A map of restriction sites flanking the coding sequence of the insert can be constructed as follows. Digestion with Bam HI generates two hybridising fragments, 15.5 and 1.95 kb, implying the presence of a Bam HI site within the gene coding region (Figure 19; Orkin, 1977; Flavell et al, 1978). Double digestion with Bgl II reduces the 15.5 kb fragment to 3.0 kb. Bgl II alone generates a single 5.0 kb hybridising fragment, locating a Bgl II site (5.0 - 3.0) = 2.0 kb to the left of the Bam HI site in the coding region. Pst I gives a single fragment of 4.4 kb, which is reduced to 3.9 kb in a double digest with Bgl II. This implies that one Pst I site lies (4.4 - 3.9) = 0.5 kb "outside" one of the Bgl II sites. If the 3' (rightward) Pst I site lay outside the Bgl II site, this would predict a Bam HI/Pst I fragment of 0.9 kb. Fragments of 2.0 and 1.75 kb are obtained, implying that the leftward Pst I site lies 0.5 kb 5' of the leftward Bgl II site (Figure 19a). The other Pst I site lies between 1.1 and 1.35 kb to the left of the 3' Bgl II site. Hsu I generates a single hybridising fragment of about 8 kb, which is cleaved by Bam HI to give 5.0 and 1.94 kb fragments. As the 1.94 kb fragment is common to all Bam HI digests, this places the rightward Hsu I site 5.0 kb from the central Bam HI site, and the leftward Hsu I site (8.0 - 5.0) = 3.0 kb from the central Bam HI site. Digestion with Eco RI generates two hybridising fragments of 8.80 and 3.50 kb, so this enzyme cleaves within the coding region of the gene. In an Eco RI/Pst I double digest these sizes are reduced to 3.6 and 0.76 kb, and with Eco RI/Bgl II to 3.0 and 2.0 kb. This locates the central Eco RI site 0.76 kb to the left of the 3' Pst I site. The outer Eco RI sites can be orientated with respect to the Hsu I sites. If the leftward, outer Eco RI site lay 3.5 kb from the central site, this would predict Eco RI/Hsu I fragments of 4.0 and 3.5 kb. In fact a 3.5 kb doublet is observed. Therefore the alternative orientation is correct, with the leftward Eco RI site lying 8.80 kb from the central Eco RI site. The map of the cDNA-hybridising fragments of Pad 1 (Figure 19b) agrees closely with, and is only consistent with, published restriction maps of the human β globin gene (Flavell et al, 1978; Tuan et al, 1979). The identity is confirmed by the nucleotide sequence data presented in Section 3.9. As the Bam HI and Eco RI sites within the gene lie at codons 90-91 and 120-121 respectively, the orientation of the map is 5' at the left-hand side and 3' at the right-hand side with respect to the anticoding strand of the gene (Little et al, 1978; Lawn et al, 1980). The inserted Hsu I fragment can be orientated with respect to the genome of the lambda vector using the Eco RI, Bam HI and Bgl II sites (data not shown). The restriction map of Pad 1 is inconsistent with it being an "escaped" λHβG2 phage.

[Figure 18 image: autoradiographs (panels A and B) of Bam HI, Eco RI, Hsu I, Bgl II and Pst I single and double digests; marker sizes 5.0, 3.4, 2.0, 1.3 and 0.84 kb]

Figure 18. Southern Transfers of the Recombinant Phage λ Paddington 1.

Density gradient purified DNA from the recombinant phage Paddington 1 was digested with one or more restriction endonucleases and electrophoresed on 1% (a, b) or 1.2% agarose gels. Transfers were performed as described in 2.4.5. Between 0.3 and 1.5 ug was loaded per track, depending on the particular restriction digest. Hsu I/Eco RI was used as a size marker. Sizes are given in kb.

[Figure 19 image: A, hybridising fragments for Bam HI (B), Bgl II (Bg), Pst I (P), Eco RI (E) and Hsu I (H) aligned on a scale from -8 to +8 kb; B, map of the insert showing the gene (5' to 3'), the restriction sites and the vector arms L and R]

Figure 19. Mapping the β Globin cDNA Hybridising Fragments in λ Paddington 1.

A. The derivation of the map. The approximate position of the gene region is bracketed. B. A map of cDNA-hybridising fragments in λ Paddington 1. L and R refer to the phage vector arms, which are shown as thick lines. The inserted fragment is shown with a thin line.

Table 4. cDNA-Hybridising Restriction Fragments of Lambda Paddington 1

Enzyme | Hsu I | Eco RI | Bgl II | Bam HI | Pst I
Hsu I | 8.0 | | | |
Eco RI | 3.50, 3.40 | 8.80, 3.50 | | |
Bgl II | 5.00 | 3.00, 2.00 | 5.00 | |
Bam HI | 5.00, 1.95 | ND. | 3.00, 1.95 | 15.5, 1.95 |
Pst I | 4.40 | 3.60, 0.76 | 3.90 | 2.00, 1.75 | 4.40

Sizes are in kb. Single digests lie on the diagonal and double digests below it. Southern blots and hybridisation probes are as described in Figure 18. ND. = not done.

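The deductions in the mapping paragraph amount to testing a candidate map against every single and double digest in Table 4. The sketch below automates that check. All of its inputs are assumptions chosen to follow the derivation in the text: site positions in kb relative to the central Bam HI site (5' to the left), with the second Pst I site placed at +1.8 kb (inside the window 1.1-1.35 kb left of the 3' Bgl II site) and the central Eco RI site about 0.75 kb to its left; the outer Eco RI sites are omitted; the probe is taken to span roughly -0.5 to +1.2 kb; and a sizing tolerance of 0.2 kb is allowed. A fragment counts as hybridising if it overlaps the probe interval. The names are hypothetical, and the result is a consistency check, not a measurement.

```python
"""Check a candidate restriction map of the lambda Paddington 1 insert
against the cDNA-hybridising fragments listed in Table 4.

Coordinates are illustrative assumptions in kb, measured from the central
Bam HI site, with the 5' side of the gene to the left.
"""
from itertools import chain

SITES = {
    "Hsu I": [-3.0, 5.0],
    "Bgl II": [-2.0, 3.0],
    "Bam HI": [-1.95, 0.0, 15.5],  # the +15.5 kb site lies beyond the insert
    "Pst I": [-2.5, 1.8],
    "Eco RI": [1.05],              # central site only; outer sites omitted
}
PROBE = (-0.5, 1.2)   # assumed extent of cDNA-homologous sequence
TOLERANCE = 0.2       # assumed gel sizing error, kb

# Observed hybridising fragments (kb) from Table 4, largest first.
OBSERVED = {
    ("Hsu I",): [8.0],
    ("Bgl II",): [5.0],
    ("Bam HI",): [15.5, 1.95],
    ("Pst I",): [4.4],
    ("Hsu I", "Bam HI"): [5.0, 1.95],
    ("Hsu I", "Bgl II"): [5.0],
    ("Hsu I", "Pst I"): [4.4],
    ("Bgl II", "Bam HI"): [3.0, 1.95],
    ("Bgl II", "Pst I"): [3.9],
    ("Bam HI", "Pst I"): [2.0, 1.75],
    ("Eco RI", "Bgl II"): [3.0, 2.0],
    ("Eco RI", "Pst I"): [3.6, 0.76],
}


def hybridising_fragments(enzymes, sites=SITES):
    """Sizes of fragments between adjacent cuts that overlap the probe."""
    cuts = sorted(set(chain.from_iterable(sites[e] for e in enzymes)))
    sizes = [round(right - left, 2)
             for left, right in zip(cuts, cuts[1:])
             if left < PROBE[1] and right > PROBE[0]]
    return sorted(sizes, reverse=True)


def consistent(predicted, observed, tol=TOLERANCE):
    """True if the lists have equal length and agree pairwise within tol."""
    return len(predicted) == len(observed) and all(
        abs(p - o) <= tol for p, o in zip(predicted, observed))


for digest, observed in OBSERVED.items():
    predicted = hybridising_fragments(digest)
    status = "ok" if consistent(predicted, observed) else "MISMATCH"
    print(f"{' + '.join(digest):16s} {predicted} vs {observed}: {status}")

# Alternative rejected in the text: 3' Pst I site 0.5 kb beyond the 3' Bgl II site.
ALT_SITES = {**SITES, "Pst I": [-0.9, 3.5]}
print("alternative Bam HI + Pst I:", hybridising_fragments(("Bam HI", "Pst I"), ALT_SITES))
```

With these assumed coordinates every digest in the table is reproduced within the tolerance, while the rejected Pst I placement predicts the 0.9 kb Bam HI/Pst I band that is not observed, which mirrors the argument in the text.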
3.7 Subcloning of the cloned β-globin gene into the plasmid pAT153

Prior to further structural analysis of the cloned chromosomal gene, it was transferred to a plasmid to increase the ratio of insert to vector DNA. The 7.5 kb Hsu I and 4.4 kb Pst I fragments were subcloned. Digestion of λNEM788β+ with Hsu I gives only one fragment with Hsu I "sticky ends" at both termini. (One terminus of both the left and right arms of the phage vector is a "lambda" cohesive end; consequently these fragments cannot be cloned into the Hsu I site of the plasmid.) λNEM788β+ was digested with Hsu I and the integrity of the termini tested by ligation (data not shown). This DNA was added in threefold excess to Hsu I digested pAT153 (a gift from D. Woods; Twigg and Sherratt, 1980). The mixture was transformed into E. coli HB101 (Boyer and Roulland-Dussoix, 1969). Six out of 72 picked colonies showed the ampicillin-resistant, tetracycline-sensitive (amp-r, tet-s) phenotype expected for insertion at the Hsu I site of the plasmid. In colony electrophoresis analysis of the amp-r, tet-s bacteria, 4/6 colonies contained a plasmid larger than wild-type pAT153. Three out of these four contained a very large plasmid, whilst the fourth contained an intermediately sized plasmid. The most likely candidates for plasmids with a 7.5 kb insert were the three which appeared identical.

A different protocol was used for subcloning the 4.4 kb Pst I fragment. λNEM788β+ was digested with Pst I and fractionated on a preparative agarose gel. The 4.4 kb band was visualised by ethidium bromide staining of an aliquot of the digest run in an adjacent track. Agarose containing the fragment was excised and the DNA recovered by electroelution. The DNA was phenol extracted and precipitated, and its ligatability was checked by agarose gel electrophoresis. A threefold excess of the DNA was ligated to Pst I digested pAT153 and transformed into HB101. Nine out of 72 colonies picked gave an amp-s, tet-r phenotype, and 7/9 of these gave a "consensus" size supercoiled plasmid band in colony electrophoresis experiments. This supercoiled species ran more slowly than wild-type pAT153. One of these 7 colonies was picked at random for further study.
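The colony phenotypes used to identify the subclones follow from insertional inactivation in pAT153, a pBR322 derivative: an insert at the Hind III (Hsu I) site typically disrupts tetracycline resistance, while an insert at the Pst I site, which lies within the beta-lactamase gene, disrupts ampicillin resistance. The sketch below, with hypothetical names, encodes only these two cases and reports the screening frequencies given above.

```python
# Expected selection phenotypes for inserts in pAT153 (a pBR322 derivative).
# Assumption: only the two cloning sites used in this section are modelled.
DISRUPTED_MARKER = {
    "Hsu I": "tet",  # Hind III isoschizomer; site in the tetracycline-resistance region
    "Pst I": "amp",  # site within the beta-lactamase (ampicillin-resistance) gene
}


def expected_phenotype(site):
    """Phenotype of a transformant whose plasmid carries an insert at `site`."""
    lost = DISRUPTED_MARKER[site]
    return ", ".join(f"{m}-s" if m == lost else f"{m}-r" for m in ("amp", "tet"))


# Screening results from the text: (site, colonies with the phenotype, colonies picked).
for site, hits, picked in [("Hsu I", 6, 72), ("Pst I", 9, 72)]:
    print(f"{site}: expect {expected_phenotype(site)}; "
          f"{hits}/{picked} colonies ({hits / picked:.0%}) showed it")
```

Colonies with the expected phenotype were then screened further by colony electrophoresis, since a phenotype alone does not show that the insert is the intended fragment.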

Colony electrophoresis of the putative Hsu I and Pst I subclones, followed by Southern transfer, confirmed that the plasmids hybridised to the β-globin cDNA plasmid pHβG1 under conditions where wild-type pAT153 was not detected (data not shown). Bacteria containing the putative subclones were grown in liquid culture, and plasmid DNA was prepared by the method of Norgard et al (1979). Restriction analysis confirmed the identity of the plasmids. Thus Hsu I linearises the supercoiled Pst I subclone (Figure 20), which is consistent with a single Hsu I (Hind III) site in the vector DNA. Digestion with Pst I gives 4.4 and 3.7 kb bands (the high molecular weight band is due to a slightly partial digest). Similar arguments are used to confirm the identity of the Hsu I subclone. Pst I and Bam HI digests of the two plasmids run in parallel suggest that the Pst I subclone (4.4β+) contains a subset of the human DNA sequences found in the Hsu I subclone (7.5β+; Figure 20). Data presented in the following sections confirm that the identities of these two plasmids are correct.

[Figure 20 image: paired tracks A and B of Hsu I, Pst I, Bam HI, Eco RI and Bgl II digests]

Figure 20. Restriction Digests of the Pst I and Hsu I Subclones.

Fragments were fractionated on a 1.4% agarose gel. A = 4.4β+, B = 7.5β+.

3.8 Restriction Mapping of the Subcloned Recombinant Plasmids

Detailed restriction analysis was performed on the recombinant plasmids 4.4β+ and 7.5β+. This could potentially clarify two points.

1) Have any small insertions or deletions occurred in or around the β globin gene in β thalassaemia?

2) Has the cloned fragment gained or lost restriction sites relative to a "normal" β globin gene region? Such restriction enzyme site polymorphisms are potentially useful in the antenatal diagnosis of β thalassaemia (Kan et al, 1980; see Section 5).

To this end the 7.5β+ plasmid was analysed with a battery of restriction enzymes to generate a self-consistent map which could be compared with that of a normal β globin gene. It was not necessary to use the Southern transfer technique to simplify the mapping, as the plasmid vector pAT153 has been deliberately "engineered" to have single sites for many restriction enzymes with hexanucleotide recognition sequences. Also, hybridisation to a cDNA plasmid would not detect DNA fragments arising from the gene's flanking sequences. Sizes of fragments generated from 7.5β+ are shown in Figure 21 and are summarised in Table 5.

Figure 21. Restriction Digests of the 7.5β+ Subclone. A representative mapping gel is shown. M1 = λ Hsu 1, M2 = λ Hsu 1/Eco R1. Five more similar gels were used to compile the data summarised in Table 5.

Table 5. Restriction Fragments Generated by Double Digests of the Hsu 1 Subclone (7.5β+).

Columns: Pst 1, Bam HI, Eco R1, Bgl II, Hpa 1, Xba 1, Hsu 1

Pst 1: 4.40 3.30 3.50 3.55 3.35 4.40 4.30 3.20 2.40 3.00 2.90 2.95 2.95 2.85 2.80 1.85 2.65 2.40 2.85 2.60 2.60 0.25 1.62 0.72 0.92 0.82 0.80 0.78 0.78 0.25 0.67 0.64 0.25 0.25 0.54 0.25 0.25

Bam HI: 7.4 3.55 5.00 7.60 5.30 5.00 1.87 3.15 2.80 1.25 2.60 3.30 0.75 1.90 1.90 0.78 1.85 1.90 0.45 0.88 0.67 0.66 0.75 0.80 0.75 0.64 0.65 0.64 0.64 0.22

Eco R1: 7.3 4.45 4.80 7.70 3.4 2.85 3.60 1.77 ND 2.00 2.15 1.77 1.50 0.61 1.90 0.22

Bgl II: 6.60 6.20 4.75 4.00 4.60 3.50 0.22 0.22 1.60 1.00 0.22

Hpa 1: 6.80 3.85 ND 0.61

Sal 1: 5.0 >6 >7 4.6 5.0 3.00 1.6 0.64

Sizes are in kb. (Sal 1 alone linearises the plasmid.)
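A "self-consistent" map is one whose site positions predict every single and double digest. A minimal sketch of that check for a circular plasmid follows. The function names, the 11.2 kb length and the site positions are invented for illustration and are not the Figure 22 map; the 50 bp default tolerance is an assumption based on the sizing error quoted for these gels.

```python
# Hypothetical sketch: predict fragment sizes from a trial map of a circular
# plasmid, so that single and double digests can be checked for consistency.


def digest_circular(length_kb, cut_positions):
    """Fragment sizes (kb, largest first) from cutting a circle at the given positions."""
    cuts = sorted(set(cut_positions))
    if len(cuts) <= 1:
        # No site leaves the circle intact; a single site only linearises it.
        return [length_kb]
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(length_kb - cuts[-1] + cuts[0])  # fragment spanning the map origin
    return sorted((round(f, 2) for f in frags), reverse=True)


def predict(length_kb, site_map, enzymes):
    """Predicted fragments for a digest with one or more enzymes."""
    positions = [p for enzyme in enzymes for p in site_map[enzyme]]
    return digest_circular(length_kb, positions)


def consistent(predicted, observed, tol_kb=0.05):
    """True if the bands pair up one-to-one within the sizing error."""
    return len(predicted) == len(observed) and all(
        abs(p - o) <= tol_kb for p, o in zip(sorted(predicted), sorted(observed))
    )


# Invented trial map: positions in kb from an arbitrary origin.
site_map = {"Sal 1": [0.60], "Pst 1": [1.00, 5.40, 5.65]}
print(predict(11.2, site_map, ["Pst 1"]))           # [6.55, 4.4, 0.25]
print(predict(11.2, site_map, ["Pst 1", "Sal 1"]))  # [6.15, 4.4, 0.4, 0.25]
print(predict(11.2, site_map, ["Sal 1"]))           # [11.2]
```

A trial map is adjusted until `consistent` holds for every digest in a table such as Table 5; the unique vector sites (for example Sal 1) anchor the map.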

The map of 7.5β+ is shown in Figure 22a. This was constructed in an analogous manner to that of the phage λNEM788β+. Similar data (not presented) were used to construct a map of 4.4β+ (Figure 22b). The accurately known positions of restriction sites within pAT153, in particular the Pst 1, Bam HI, Hsu 1 (Hind III), and Sal 1 sites, were exploited to locate restriction sites within the human DNA insert (Sutcliffe, 1978, Twigg and Sherratt, 1980). Two differences from the published restriction map of a normal β globin gene (Fritsch et al, 1979) were discovered.

Two Pst 1 sites are located in the 3' flanking sequence. The distance from the Eco R1 site in the coding region to the 3' Pst 1 site is about 0.72 kb, and to the Xba 1 site is 1.77 kb. (The Xba 1 site can also be mapped with respect to the Sal 1 site in pAT153.) However, the Pst 1/Xba 1 distance is not 1.77 - 0.72 = 1.05 kb but 0.8 kb, defining a second Pst 1 site about 0.25 kb rightward of the first. The existence of a small Pst 1 fragment is confirmed by acrylamide gel electrophoresis of Pst 1 digested 7.5β+ (Inset, Figure 23d) or 3.6 Eco R1 (the Eco R1 fragment spanning the 3' flanking sequences). A similar argument is used to locate a second Bgl II site 0.20 kb downstream of the 3' Bgl II site (Inset, Figure 23). However, Pst 1 or Bgl II digested subclones of the normal gene also generate these small restriction fragments. Thus the published map of the normal gene is incorrect in that it lacks one Pst 1 and one Bgl II site.

Figure 22. Restriction Maps of the Plasmids 4.4β+ and 7.5β+. A. 7.5β+. B. 4.4β+ (to simplify the diagram, not all of the enzyme sites in 4.4β+ are shown). P = Pst 1, E = Eco R1, Bg = Bgl II, X = Xba 1, Hs = Hsu 1, B = Bam HI, Hp = Hpa 1. Plasmid sequences are shown by parallel lines. Scale bar: 1 kb.

Figure 23. Parallel Restriction Mapping of the Normal and β+ Thalassaemia Globin Genes. A, C, D. Parallel restriction digests (C: Eco R1, Bgl II and Xba 1; D: Hpa 1, Eco R1 and Bam HI). Bands common to the different subclones are sized in the diagram alongside the gel. C and D are 1.4% agarose gels; A is a 7.5% acrylamide gel. B. The subclones of the normal gene.
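The argument used to place the second Pst 1 site reduces to subtraction. A minimal sketch with the distances quoted in the text is given below; positions are in kb to the right of the Eco R1 site in the coding region. The function names are hypothetical, and the 50 bp tolerance is an assumption taken from the ±50 bp sizing error stated for these experiments.

```python
# Sketch of the distance argument used to place a second Pst 1 site.
# Positions are kb to the right of the Eco R1 site in the coding region.


def nearest_site_position(ref_to_landmark, site_to_landmark):
    """Position of the site closest to a landmark (here Xba 1), from two distances."""
    return ref_to_landmark - site_to_landmark


def second_site(ref_to_first, ref_to_landmark, site_to_landmark, tol_kb=0.05):
    """Position of an extra site, or None if a single site explains the data."""
    implied = nearest_site_position(ref_to_landmark, site_to_landmark)
    if abs(implied - ref_to_first) <= tol_kb:
        return None
    return implied


# Figures from the text: Eco R1 -> Pst 1 = 0.72 kb, Eco R1 -> Xba 1 = 1.77 kb,
# observed Pst 1 -> Xba 1 = 0.80 kb (not the 1.05 kb a single site would give).
pos = second_site(0.72, 1.77, 0.80)
print(round(pos, 2), round(pos - 0.72, 2))  # 0.97 0.25
```

The 0.25 kb spacing is the size of the small Pst 1 fragment that the acrylamide gel then confirms; the same reasoning places the extra Bgl II site.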

The cloned normal and thalassaemic globin genes have identical restriction sites for the enzymes Eco R1, Bgl II, Xba 1, Pst 1, Hpa 1, and Bam HI. Sizing of fragments in these experiments is accurate to ±50 base-pairs. Errors in sizing of normal and "thal" fragments could therefore occur such that fragments differing in size by, say, 100 base-pairs are scored as being the same. To exclude this type of error, Bam HI, Hpa 1, Bgl II, and Xba 1 digests of normal and β+ thal globin genes were electrophoresed in parallel. This experiment is complicated to interpret, as the β and β+ globin subclones are in different plasmid vectors and give rise to a variety of junction (vector/insert) fragments. Subclones of the normal β globin gene cloned by Lawn et al are R1-C and R1-D, corresponding to the 5' and 3' Eco R1 fragments respectively (Lawn et al, 1978). Hβ-IS contains the same Pst 1 fragment as 4.4β+. R1-C and R1-D are cloned in the plasmid pMB9, and Hβ-IS is cloned in pBR322. The results of these experiments are presented and interpreted in Figure 23. There were no detectable differences in the sizes of fragments generated by the normal and thalassaemic genes.
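Fragment sizes here are read from gels against marker bands, and the ±50 bp error applies to each band independently, which is why two bands sized separately can differ by about 100 bp and still be scored as the same. A sketch of the usual semi-logarithmic sizing follows, assuming log10(size) is roughly linear in migration distance over the marker range. The λ Hind III (Hsu 1) marker sizes are standard values, but the migration distances and helper names are invented for illustration.

```python
import math

# Sketch: estimate fragment sizes from gel mobility using marker bands.
# Assumes log10(size) is roughly linear in distance over the marker range.


def fit_semilog(markers):
    """Least-squares fit of log10(size_kb) = a + b * distance; returns (a, b)."""
    xs = [d for d, _ in markers]
    ys = [math.log10(s) for _, s in markers]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def size_kb(distance, a, b):
    """Estimated size of a band that migrated `distance` (same units as markers)."""
    return 10 ** (a + b * distance)


def scored_same(size1, size2, error_kb=0.05):
    """Two separately sized bands are indistinguishable if within both errors."""
    return abs(size1 - size2) <= 2 * error_kb


# Lambda Hind III (= Hsu 1) marker sizes in kb, with invented distances in mm.
markers = [(12.0, 9.42), (16.5, 6.56), (22.0, 4.36), (33.0, 2.32), (35.5, 2.03)]
a, b = fit_semilog(markers)
print(round(size_kb(27.0, a, b), 2))
print(scored_same(3.00, 3.09))  # True: a 90 bp difference can go unnoticed
```

Running normal and thalassaemic digests side by side sidesteps this, since a real size difference then appears directly as a mobility difference between adjacent lanes.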

3.9 The Sequence of the β+ Thalassaemic Globin Gene

3.9.1 Rationale

The restriction mapping data presented in Section 3.8 exclude the possibility that this case of β+ thalassaemia is associated with large alterations in the gene's 5' and 3' flanking regions.

There is some evidence that the "thalassaemic" mutation maps close to the structural gene (see Section 4.3). If this is so, then the lesion must be small. It could be located by DNA sequencing or by the S1 nuclease mapping technique (Schenk et al, 1977). In the latter, the two DNA molecules to be compared are annealed together to form a heteroduplex. Mismatches are susceptible to digestion by S1 nuclease, and the S1 treated DNA is then denatured and sized by gel electrophoresis. However, mismatches (base changes) mapped in this way would have to be sequenced anyway, so I decided to adopt nucleotide sequencing from the outset. Subsequently Gannon et al have questioned the ability of S1 nuclease mapping to detect point mutations (Gannon et al, 1980): the technique failed to detect a 9 base-pair mismatch in a pair of cloned ovalbumin genes.

3.9.2 Technical aspects

The chemical sequencing method of Maxam and Gilbert was used (Maxam and Gilbert, 1977, 1980). In this technique a restriction fragment with a single terminal radioactive label is subjected to partial base-specific reactions which modify the purine or pyrimidine rings. The modified DNA is then treated with an excess of piperidine in a reaction which goes to completion. The phosphodiester backbone of the DNA is cleaved at positions corresponding to the modified bases.

Samples of a restriction fragment which have been treated with G, G+A, C+T, and C specific reagents and cleaved with piperidine are fractionated on a denaturing acrylamide gel. This gel is capable of resolving a chain of n nucleotides from a chain of n+1 nucleotides (to an upper size limit of n = 200-250 for an 8% gel). Autoradiography of the gel reveals a "ladder" of partial cleavage products, with each band corresponding to an individual nucleotide in the sequence of the restriction fragment. If there is a band in the G and the G+A tracks at position n = 3, then there is a G residue 3 nucleotides from the 5' terminus of the fragment. If the next band up the gel is in the C+T track, but not in the C track, then this is a T residue. The sequence reads 5'..GT..3' and can be routinely extended 200 nucleotides from the 5' terminus.
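The reading rules just described can be written as a small decision procedure. In this sketch each rung of the ladder is the set of tracks showing a band at that position, listed from the bottom of the gel upwards (5' to 3' from the label). It treats every band as simply present or absent and ignores band intensity; the function names are hypothetical.

```python
# Sketch of reading a Maxam-Gilbert ladder from the four tracks.
# Each rung is the set of tracks with a band at that position, listed from
# the bottom of the gel upwards, i.e. 5' -> 3' away from the labelled end.


def call_base(tracks):
    """Assign a nucleotide from the tracks containing a band at one position."""
    if "G" in tracks:    # G track (a band is also expected in G+A)
        return "G"
    if "G+A" in tracks:  # purine band absent from the G track
        return "A"
    if "C" in tracks:    # C track (a band is also expected in C+T)
        return "C"
    if "C+T" in tracks:  # pyrimidine band absent from the C track
        return "T"
    return "N"           # no band: position unreadable


def read_ladder(rungs):
    """Concatenate the base calls for successive rungs of the ladder."""
    return "".join(call_base(r) for r in rungs)


ladder = [{"G", "G+A"}, {"C+T"}, {"G+A"}, {"C", "C+T"}]
print(read_ladder(ladder))  # GTAC
```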

Restriction fragments were labelled at their 5' termini with polynucleotide kinase and [γ-32P]ATP. This method was chosen as [γ-32P]ATP can be obtained with a specific activity more than 10 times higher than the α-labelled nucleotides used for DNA polymerase I or terminal transferase labelling. The "shotgun" sequencing method was used to generate appropriate restriction fragments (Maxam and Gilbert, 1980; Figure 24). In this approach a restriction fragment spanning the area of interest (X) is isolated and digested with a restriction enzyme (A) that cleaves DNA frequently (in practice an enzyme with a 4 or 5 nucleotide recognition sequence). The resulting fragments are labelled and strand separated, or labelled and redigested with a second enzyme (B). All of the singly labelled fragments are recovered from an acrylamide gel and subjected to the four base-specific reactions. The process is repeated by digesting X with a different enzyme (B) and labelling the termini. Again, singly labelled fragments can be obtained by re-restriction with a second enzyme (A) or by strand separation. In this way "overlaps" are created between the sequenced fragments. The overlaps can be ordered manually or by feeding the sequences into a computer. The advantage of this method is that one does not need a priori knowledge of X's restriction map.

Figure 24. The "Shotgun" Sequencing Protocol. A starting fragment X is shown. Labelled 5' termini are indicated by asterisks. s.s. = strand separation; 2° r. = secondary restriction. A and B are two different restriction enzymes.
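Ordering the overlaps by computer can be done in several ways; one simple approach, sketched here as an assumption rather than the program actually used, is greedy merging by the longest suffix/prefix overlap. The sketch assumes all reads are written on the same strand and are error-free, and it does not detect a read contained wholly inside another. The toy reads are invented.

```python
# Sketch: order overlapping shotgun reads by greedy suffix/prefix merging.
# Assumes same-strand, error-free reads; contained reads are not handled.


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs cannot be ordered
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTGGTCTAT", "TCTATTTTCCCAC", "CCCACCCTTAG"]  # invented toy reads
print(greedy_assemble(reads))  # ['ATTGGTCTATTTTCCCACCCTTAG']
```

Reads taken from the complementary strand would first have to be reverse-complemented, and a real data set needs a minimum overlap long enough to avoid chance matches.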

Strand separation was used to generate singly labelled restriction fragments. This method has three advantages over re-restriction:

1) Re-restriction of double-stranded DNA fragments reduces their length, limiting the distance from the termini which can be sequenced.

2) Re-restriction depends on the presence of enzyme sites within the labelled double-stranded fragment. Suitable sites will not always be present.

3) The enzyme used for re-restriction may be contaminated by phosphatase or exonuclease, which will remove the terminal 32P phosphate residues.

Ideally the starting restriction fragment should span only the area of interest. Three starting fragments were used for sequencing of the "thalassaemic" globin gene (Figure 25). The plasmid 4.4β+ was digested with Eco R1 and Bam HI and fractionated on a 5% acrylamide gel. The 1.9 kb Bam HI fragment, the 1.5 kb Eco R1 fragment, and the 0.9 kb Bam HI/Eco R1 fragment were visualised by UV-shadowing and eluted from the acrylamide matrix (Maxam and Gilbert, 1980).

The 0.9 kb fragment contains the whole of the large intron. This was digested with Rsa 1, Mbo II, or Mnl 1, dephosphorylated, and kinase labelled. The labelled fragments were denatured and loaded onto a strand-separation acrylamide gel. All of the single-stranded labelled fragments visualised by autoradiography were eluted and sequenced.

Figure 25. Starting Fragments for "Shotgun" Sequencing of the β+ Thalassaemia Human Globin Gene. The numbers are the fragment sizes in kilobases (1.9, 0.9 and 1.5 kb); the position of the Ava 6 fragment is also marked. B = Bam HI, R = Eco R1.

The 1.9 kb Bam HI fragment contains about 0.5 kb of the gene's 5' terminus. This fragment was digested with Hae III, Sau 96, Hinf 1, or Hph 1, labelled and strand-separated. The sizes of fragments containing globin gene sequences were already known from the sequence of Lawn et al (Lawn et al, 1980). However, the mobility of single-stranded DNA in these gels does not depend solely on size. Consequently all the strand-separated fragments of approximately the predicted size were eluted and sequenced. Of the 9 fragments identified in this way, 6 gave globin gene sequences. The other 3 fragments must have been derived from the 5' flanking sequences.

The 1.5 kb Eco R1 fragment contains about 0.3 kb of the gene's 3' terminus plus 0.4 kb of flanking sequences plus 0.75 kb of plasmid DNA. This was digested with Hinf 1 or Hph 1. One Hph 1 and two Hinf 1 fragments were used for sequencing this area of the gene. No other fragments containing flanking or plasmid sequences were spuriously isolated. A fourth restriction fragment was isolated from a total digest of 4.4β+. This 0.22 kb Ava II fragment ("Ava 6") spans the Bam HI junction of the 1.9 and 0.9 kb fragments. It was identified by comparing Ava II and Ava II + Bam HI digests of 4.4β+.

DNA eluted from strand-separation gels by the original method of Maxam and Gilbert electrophoresed aberrantly on the sequencing gels. The introduction of an extra step to remove solubilised acrylamide cured this problem.

G, G+A, C+T, and C specific reactions were used for sequencing (Maxam and Gilbert, 1980). In all of the sequencing gels there is "breakthrough" in the pyrimidine specific tracks (C, C+T). This manifested itself as faint bands on the autoradiographs at positions corresponding to all possible chain lengths (Figure 26). This effect cannot be due to a side-reaction of the piperidine cleavage, as the artefactual bands would then also be seen in the purine specific reactions (G, G+A). The effect was observed with two different batches of hydrazine (HZ) from one commercial source (BDH) and also with HZ from Fluka AG. This makes it unlikely that the artefact resulted from impurities in the HZ. Carry-through of HZ to the piperidine reaction gives a different kind of artefact (smeared bands), and, predictably, extra ethanol precipitation steps (to remove the putative contaminating HZ prior to chain cleavage) failed to improve the situation.

Figure 26. Sequencing Gel of an Rsa 1 Fragment Spanning the 3' End of the Large Intron. The autoradiogram of the 8% gel shows the anti-coding strand; the tracks are G, G+A, C+T and C. The splice junction with the third coding block is shown.

The artefact may reflect impurities in the DNA itself rather than impurities in the chemical reagents. However, the "breakthrough" is also seen when restriction fragments are eluted from agarose gels by either of two protocols (electroelution, phenol extraction and ethanol precipitation of the DNA solution; or soaking the gel matrix in an elution buffer followed by DEAE cellulose and ethanol precipitation steps). The breakthrough was not normally problematic, as bands corresponding to true cleavage products were invariably more intense than the background bands. Most of the gene (> 60%) was sequenced on both strands, so that possibly ambiguous C or T residues (pyrimidines) could be derived from the unambiguous purine residues on the complementary strand. Latterly a T-specific reaction was also run on the sequencing gels (Rubin and Schmidt, 1980).
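Using the complementary strand to settle a C versus T call can be sketched as follows. Here "Y" marks a pyrimidine the reader could not assign (IUPAC ambiguity code); both reads are written 5' to 3' and are assumed, for simplicity, to cover exactly the same region. The function names and sequences are illustrative only.

```python
# Sketch: resolve ambiguous pyrimidine calls using the complementary strand.
# "Y" marks a position read as C-or-T; "R" marks A-or-G (IUPAC codes).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "Y": "R", "R": "Y", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def resolve_pyrimidines(strand, other_strand):
    """Replace each 'Y' in `strand` with the base implied by the other strand.

    Both reads are written 5'->3' and must cover the same region, so the
    other strand is reverse-complemented before positions are compared.
    """
    partner = reverse_complement(other_strand)
    if len(partner) != len(strand):
        raise ValueError("reads must span the same region")
    resolved = []
    for mine, theirs in zip(strand, partner):
        # A purine call (G or A) opposite becomes C or T after complementing.
        resolved.append(theirs if mine == "Y" and theirs in "CT" else mine)
    return "".join(resolved)


print(resolve_pyrimidines("GYAYG", "CATGC"))  # GCATG
```

If the opposite strand is also ambiguous at that position it contributes an "R", and the "Y" is left unresolved.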

3.9.3 The Sequence.

The "thalassaemic" β globin gene was sequenced in its entirety. The sequence is 1925 nucleotides long. Its 5' extremity is 154 nucleotides upstream of the "cap" site, that is, the first nucleotide of the mRNA. The sequence finishes 210 nucleotides beyond the polyadenylation site. The sequencing protocol is shown in Figure 27a. More than 60% of the gene has been sequenced on both strands and 86% of the gene has been sequenced at least twice. All of the restriction sites in the gene, with the exception of the Eco R1 site and the outermost Hinf 1 sites, have been "overlapped". The availability of a prototype sequence from a "normal" gene means that any remaining ambiguities could be excluded. Overall the sequence of the "thalassaemic" gene should be highly accurate.

Figure 27. The Sequencing Protocol. A. The protocol. Only restriction sites used for sequencing are shown. The arrows show the distance sequenced from each restriction site. The "blunt" end of the arrow is the labelled 5' terminus. Nucleotides adjacent to the 5' terminus which have not been sequenced are indicated by dashed lines. B. Differences from the sequence of the normal gene. Map positions are shown by stars, and the actual base changes are shown above the normal sequence.

Figure 28. Sequencing gel of a Hinf 1 fragment spanning the gene's 3' flanking region. The anti-coding strand is shown. A difference from the "normal" gene sequence is indicated by a star. The nucleotide marked A/G is clearly visible as a G on the original autoradiograph.

Figure 29. Sequencing gel of a Hinf 1 fragment spanning a region approximately 80 nucleotides downstream from the gene's polyadenylation site. Differences from the normal gene sequence are indicated by stars. Anti-coding sequences are shown.

Figure 30. Sequencing gels of the coding and anti-coding strands of an Hph 1 fragment spanning the gene's small intron. A difference from the normal gene sequence is indicated by a star.
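Coverage figures such as "more than 60% on both strands" and "86% at least twice" can be computed directly from the list of sequencing runs. The sketch below assumes each run is recorded as a (start, end, strand) interval with 0-based, end-exclusive coordinates; the function name and the runs are invented and do not reproduce the Figure 27a protocol.

```python
# Hypothetical sketch: coverage statistics for a set of sequencing runs.
# Each run is (start, end, strand) with 0-based, end-exclusive coordinates.


def coverage_stats(length, runs):
    """Return (fraction read on both strands, fraction read at least twice)."""
    plus = [0] * length
    minus = [0] * length
    for start, end, strand in runs:
        track = plus if strand == "+" else minus
        for i in range(max(0, start), min(length, end)):
            track[i] += 1
    both = sum(1 for p, m in zip(plus, minus) if p and m) / length
    twice = sum(1 for p, m in zip(plus, minus) if p + m >= 2) / length
    return both, twice


# Invented runs on a 100-nucleotide region.
runs = [(0, 60, "+"), (40, 100, "+"), (50, 80, "-")]
print(coverage_stats(100, runs))  # (0.3, 0.4)
```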

Four differences from the normal gene sequence were found. These are shown in Figure 27b. Three lie in the 3' flanking sequence, 83, 89, and 130 nucleotides beyond the polyadenylation site (Figures 28 and 29). A fourth lies within the small intron, about 20 nucleotides from the 3' splice junction (Figure 30). The possible significance of these sequence variants is discussed in the next section.
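Once the variant sequence is aligned against the normal prototype, listing differences and classifying each as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion is mechanical. This sketch assumes an ungapped alignment of equal-length sequences, and the `start` offset is a hypothetical convenience for reporting positions relative to a chosen landmark such as the polyadenylation site. The sequences are toy examples.

```python
# Sketch: list point differences between aligned sequences and classify them.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def point_differences(reference, variant, start=0):
    """(position, reference base, variant base) for every mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    return [
        (start + i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), variant.upper()))
        if ref != alt
    ]


def change_type(ref, alt):
    """'transition' for A<->G or C<->T, otherwise 'transversion'."""
    return "transition" if frozenset((ref, alt)) in TRANSITIONS else "transversion"


diffs = point_differences("ACGTACGTAC", "ACATACGTTC")
print(diffs)                                    # [(2, 'G', 'A'), (8, 'A', 'T')]
print([change_type(r, a) for _, r, a in diffs])  # ['transition', 'transversion']
```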

4. Discussion

4.1 The Analysis of Control Sequences in Eukaryotic Genes

The human globin genes are a paradigm for the study of expression and developmental control of eukaryotic genes (reviewed by Maniatis et al, 1980a). The organisation of the β- and α-like globin gene clusters has now been extensively described (Fritsch et al, 1980). However, knowledge of the specific nucleic acid sequences and protein/RNA species which interact in the expression of these genes is more limited.

Expression can be examined at the protein or RNA level. While protein synthesis is understood in some detail for eukaryotes, this discussion concerns the less well understood processes of RNA synthesis and maturation. In particular, the function of extracistronic DNA in these events remains obscure. Isolating trans-acting diffusible control molecules may be a difficult problem for eukaryotic molecular biology at the moment, but the identification and mapping of "regulatory" DNA sequences within the globin gene cluster is more approachable.

In classical genetics, mutants which quantitatively affect a gene's output define nucleotide sequences involved in the expression of that gene. The map position of loci conferring an over- or under-expressing phenotype ("up" or "down" mutations) can be delimited by linkage analysis with known chromosomal markers. By constructing partial diploids, complementation tests will define whether a control element acts in cis or in trans. Such complementation analysis has been used for the lac operon of E. coli (Pardee et al, 1959) and has recently become available for lower eukaryotes (Hinnen et al, 1978, Nasmyth and Reed, 1980). Rapid DNA sequencing techniques can localise the base changes which occur in these control mutants. This type of study has been used to most effect to define "regulatory" sequences in the lac operon (Miller, 1970) and in the immunity region of bacteriophage lambda (Ptashne et al, 1980, Flashman, 1978). Our concepts of promoters, repressor binding sites, RNA polymerase binding sites and terminators derive from these studies. The approach will identify sequences involved in any of the steps of RNA biosynthesis and should theoretically be applicable to eukaryotes (even though they have additional levels of complexity in these processes, such as RNA splicing).

The starting point for this "classical" approach is a mutant, which is usually obtained by mutagenesis followed by identification of the desired mutant phenotype. The extremely large size of a eukaryotic genome means that a single copy sequence is a very small target for mutagenesis. Therefore a "hit" will be an accordingly rare event, and isolation of the mutant really depends upon being able to select against the wild-type phenotype. Ironically, many biologically interesting eukaryotic genes whose structures have been well defined (e.g. globin, ovalbumin, insulin) are not suitable for this kind of analysis. This is not a coincidence. It has been possible to clone these genes because their high output allows the isolation of mRNA in a sufficiently pure form to be used as a hybridisation probe. However, they are "luxury" genes, in that their product is not an enzyme and is not essential for the metabolism of individual cells. Consequently one cannot apply selective pressure, the prerequisite for isolation of a mutant, for a change in the output of these genes. Even if the appropriate selection could be applied, the diploid nature of most eukaryotic genomes would mean that often both alleles of a gene would need to be mutated before the mutant phenotype could be scored. This would require two mutational hits. (The probability of achieving this is the square of the probability of getting one hit.) Alternatively one could perform a particular genetic cross (difficult in humans) to obtain a homozygote. In the one case studied where strong selective pressure can be applied for over-expression of a single copy gene, the mouse dihydrofolate reductase (DHFR) gene, the cell does not respond by increasing the output of that gene but by duplicating and reduplicating the region of the chromosome containing the gene (Alt et al, 1978, Nunberg et al, 1980). This is an interesting mechanism in itself but does not give us any information about the elusive regulatory elements.
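The two-hit argument is a product of independent probabilities, and it is the screening effort, not only the probability, that becomes prohibitive. The sketch below uses an invented per-allele hit frequency purely for illustration; the function names are hypothetical, and the screening estimate assumes independent cells each with the same event probability.

```python
import math


def both_alleles_hit(p_single):
    """Probability that two independent hits inactivate both alleles."""
    return p_single ** 2


def cells_to_screen(p_event, confidence=0.95):
    """Cells needed for `confidence` of seeing at least one event of probability p_event."""
    return math.ceil(math.log(1 - confidence) / math.log1p(-p_event))


p = 1e-6  # invented per-allele hit frequency, for illustration only
print(both_alleles_hit(p))  # about 1e-12
print(cells_to_screen(p), cells_to_screen(both_alleles_hit(p)))
```

Squaring a small probability multiplies the required screen by roughly 1/p, which is why selection against the wild-type phenotype becomes essential.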

Perhaps the only fruitful genetic data on eukaryotic regulatory elements come from study of the Drosophila melanogaster rosy locus (Chovnick et al, 1977). Rosy mutants have an abnormal eye colour owing to a deficiency in xanthine dehydrogenase (XDH). The molecular structure of this gene is unknown, but genetic analysis has defined a mutant, i409, which elevates the level of XDH but lies outside the structural gene region. This mutation acts in cis, and a variety of fine structure crosses confirm that there is not a trivial explanation for the mutant phenotype. The size of the XDH control element can be estimated from the map distances between i409 and the nearest markers on the same chromosome, and is put at 0.0034 to 0.0054 map units, equivalent to 3.00 to 4.75 kb.
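The conversion factor behind these figures is not stated. The sketch below simply derives the ratio implied by the two quoted pairs of numbers, about 880 kb per map unit in this region; it should be read as an arithmetic check, not as a general genetic-to-physical conversion, and the helper name is hypothetical.

```python
# Figures quoted in the text for the rosy (XDH) control element.
map_units = (0.0034, 0.0054)
kb = (3.00, 4.75)

# Implied physical length per map unit in this region (kb per map unit).
kb_per_map_unit = [k / m for k, m in zip(kb, map_units)]
mean_factor = sum(kb_per_map_unit) / len(kb_per_map_unit)


def map_units_to_kb(units, factor=mean_factor):
    """Convert a genetic distance to an approximate physical length in kb."""
    return units * factor


print([round(f) for f in kb_per_map_unit])  # [882, 880]
```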

Because of the limited feasibility of classical genetics, an alternative methodology has been advocated for probing eukaryotic gene function. In "surrogate" or "reversed" genetics (Birnstiel and Chipchase, 1977, Weissmann, 1978), mutations are introduced into a cloned copy of the gene and their effects on the phenotype are scored by expressing the genes in vitro. In vitro mutagenesis is very efficient, dispensing with the need for a phenotype-dependent "selection" step in isolating the mutants, and, because it is enzymatic, the type and map location of the mutation can be controlled. The approach can be used for RNA as well as DNA and was used to study the function of sequences surrounding the AUG codon in the coat cistron of the RNA phage Qβ (Flavell et al, 1976, Taniguchi and Weissmann, 1978).

However, DNA is an easier substrate for such site-directed mutagenesis owing to the plethora of known DNA modifying enzymes. Restriction endonucleases can be used to generate defined deletions or insertions in a target molecule in vitro (Humayum and Chambers, 1979, Grosschedl and Birnstiel, 1980). If these enzymes are used in concert with "ordinary" mutagens (Shortle and Nathans, 1978) or with chemically synthesised oligonucleotides (Hutchison et al, 1978), single base mutations can be constructed. Deletions generated in vivo by the excision of a transposable genetic element have also been used in the study of a recombinant plasmid containing a Xenopus laevis 5S DNA gene (Fedoroff, 1979). The transposable element lay adjacent to a cloned 5S DNA sequence, and excision of the element created deletions which had one terminus within the 5S DNA.

The technology for investigating the phenotype of mutant genes is still rudimentary. "Phenotype" is normally investigated in terms of the type and amount of RNA molecules produced by the gene, so the problem has been to coax accurate, efficient transcription out of in vitro or quasi-physiological systems. This field has been reviewed recently by Maniatis (1980b). Briefly, cloned DNA can be expressed by transcription in vitro with cellular extracts (Manley et al, 1980), or reintroduced into a living cell by microinjection (Wickens et al, 1980), by cotransformation with a selectable marker (Wold et al, 1979), or by inserting the cloned DNA into the genome of a eukaryotic virus (Hamer and Leder, 1979).

What information has this approach yielded about eukaryotic gene regulatory sequences? The most detailed study has been carried out on the somatic 5S RNA genes of Xenopus borealis (Sakonju et al, 1980, Bogenhagen et al, 1980). A region in the coding sequence has been identified which is necessary for the initiation of transcription (by RNA polymerase III), and the 5' flanking sequences have been shown to be involved in the selection of the exact initiation start point. The data for genes transcribed by RNA polymerase II, such as globin and ovalbumin, are more rudimentary as yet but imply a requirement for the 5' flanking sequences in general and the "TATA" box in particular (Figure 4; the TATA box is in fact a CATA box in the human β globin gene).

4.2 Naturally occurring mutations of the human globin genes

The globin genes are unusual amongst well characterised eukaryotic genes in that a large number of mutants exist affecting both the type and the amount of gene product. Members of the latter class (superficially at least) resemble regulatory mutants. The existence of mutant alleles probably does not reflect a hypermutability of the globin locus but the more prosaic consideration that there is a selective advantage for decreased globin chain output in geographical areas where malaria is prevalent (Haldane, 1949). This bizarre system of selection pressure is fortuitous for the molecular biologist, since the globin genes can thereby supply the first prerequisite for the classical approach to gene regulation: mutant alleles. These are the genes for the α- and β-thalassaemias, where protein output is reduced. Deletion syndromes also exist, both for the α- and β-like gene clusters, where expression of embryonic or foetal globin chains persists beyond the normal developmental stage. The advantage of working with these mutant genes is that their phenotype is established in vivo, whereas the phenotype of an allele engineered in vitro can only be established in a quasi-physiological environment. Thus the thalassaemia syndromes and related disorders provide an alternative to the surrogate genetic approach for studying gene expression. This thesis concerns the molecular analysis of a β+-thalassaemic globin gene. The starting assumption is that identification of the mutation producing this phenotype will delineate a sequence important for the functioning of a normal gene. As both the protein and mRNA products of β+ alleles are ostensibly normal, the chain imbalance observed in the patients may not be due to impaired protein synthesis, and could be the result of decreased β globin RNA synthesis. The "β-thal" gene, that is the mutation conferring the thalassaemic phenotype, is recessive.

4.3 The map position of the β thalassaemia "gene"

Genetic crosses are not readily controllable in the human population, and this precludes detailed mapping of the thalassaemia gene by linkage analysis. The approach described here is to clone a β globin gene from a thalassaemic subject and to map the mutation by comparison with a 'normal' gene. However, one has to make sure that the restriction fragment cloned from the patient includes the mutant sequence(s). This could be a serious problem, as the β thalassaemic lesion may not lie immediately adjacent to the β globin structural gene. The precedents for 'long-range' interactions are syndromes such as HPFH and δβ thalassaemia, where deletions lying distal to the foetal globin genes elevate γ-globin expression (Fritsch et al, 1979, Figure 5). More recently it has been shown that the low level of β globin gene expression in γβ thalassaemia seems to be the result of a deletion several kilobases away from the β globin gene (Van der Ploeg et al, 1980). The difficulty in interpreting such deletions is that the role of extracistronic DNA in globin gene expression may be trivial: there may be a requirement for the extracistronic DNA to be a certain size, rather than a requirement for it to contain specific nucleotide sequences. If the extracistronic DNA does turn out to contain specific "regulatory" sequences, then it is conceivable that a point mutation at these positions could affect output from a gene many kilobases away. Such a mutant could have a thalassaemic phenotype and no detectable change in the sequence coding for, and proximal to, the β globin gene. The Greek form of HPFH could turn out to be such a mutant (Tuan et al, 1980).

So we cannot formally exclude that the β thal mutation lies away from the β globin gene. Conversely, is there any evidence that the mutation maps close to the gene? This has been tested by looking at the children of individuals doubly heterozygous for β thalassaemia and the sickle-cell mutation with unaffected partners. If the genes are not tightly linked, then meiotic crossover will generate zygotes with neither or both mutations. In the families studied (which included cases of both β+ and β0 thalassaemia) only one child out of 62 was a possible "crossover", in that he carried neither mutation. In an analogous study looking at crossover between the β and δ genes, 2 or 3 out of 31 children were good candidates for crossing over (Weatherall and Clegg, 1972). These data imply that the determinant for most β-thalassaemias lies closer to the β gene than to the δ gene.
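The recombination fraction in these family studies is simply the proportion of children who are possible crossovers; for small fractions, multiplying by 100 gives an approximate map distance in centimorgans. With families this small the estimates are imprecise, so the sketch below attaches a Wilson score interval to each count. The Wilson interval is a standard choice for small binomial samples but is an assumption here, not a method used in the studies cited.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts quoted in the text: possible crossovers / children examined.
for label, k, n in [("beta-thal vs sickle", 1, 62), ("beta vs delta", 2, 31), ("beta vs delta", 3, 31)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.3f}, approx. 95% interval {lo:.3f} to {hi:.3f}")
```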

A second line of evidence concerns the abnormal haemoglobin HbK Woolwich (Lys→Gln β132), which behaves like a β thalassaemia in that the mutant protein is produced at lower levels than anticipated (Lang et al, 1974). This is not due to protein instability. The other possibility is that there are two separate mutations, for β-thalassaemia and for HbK Woolwich. This is unlikely, as the under-representation of the protein is seen in unrelated patients with HbK Woolwich. Also, in a myoglobin variant where the equivalent lysine residue is replaced by an asparagine, the variant only accounts for 30% of total myoglobin. Overall it seems that this particular β thalassaemia is due to a mutation in the β globin structural gene.

The other evidence for the β thalassaemia lesion mapping near the β globin gene is more indirect. The mutation in β+ thalassaemia probably affects either globin gene transcription or RNA splicing (Section 1.10). The recognition sequences for these two processes lie adjacent to the gene (Section 1.7, Figure 4). These sequences were identified by comparing the nucleotide sequences of diverged globin genes, and the assignments have now been verified by functional assays of in vitro modified globin genes. Thus deletion of 5' flanking sequences adjacent to the transcription initiation site of rabbit β or mouse β major globin genes abolishes or reduces transcription (Diercks et al; R. Flavell, personal communication). Similarly, expression of mouse β globin gene fragments in a viral vector suggests that only the sequences immediately adjacent to the coding block/intron boundaries are required for accurate splicing of pre-mRNAs (Hamer and Leder, 1979). These three tentative lines of evidence imply that most, or at least some, β thalassaemia mutations will lie near the β globin gene.

4.4 Molecular cloning of the β-thalassaemic globin gene

With these considerations in mind, I decided to clone a chromosomal β-globin gene plus flanking sequences into bacteriophage lambda. Globin cDNA plasmids had already been constructed and characterised (Figure 6; Little et al, 1978), and these were used as hybridisation probes in this process. The availability of a physical map around the δ- and β-globin genes (Flavell et al, 1978; Mears et al, 1978) allowed a suitable restriction fragment for cloning to be selected. At the start of this project, disabled lambda vectors had been constructed for the restriction enzymes Eco RI and Hind III (Leder et al, 1977; Blattner et al, 1977; Murray et al, 1977).

Unfortunately, Eco RI is known to cleave within the β-globin gene, so the gene would have to be cloned in two pieces. This problem could be circumvented by performing non-limit (partial) Eco RI digests, or by ligating synthetic Eco RI linkers to Hae III plus Alu I non-limit digests of human DNA (Maniatis et al, 1978). Both limit and non-limit methodologies have been used by other laboratories to clone β-globin genes (Spritz et al, 1980; Burns et al, 1979).
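Why a non-limit digest helps can be seen with a toy model. The Python sketch below assumes, purely for illustration, that every Eco RI site is cut independently with the same probability p, and that a single internal site splits the gene (consistent with the two Eco RI halves discussed below). An intact gene-spanning fragment then needs both flanking sites cut and every internal site left uncut, so its yield is p^2 (1 - p)^m for m internal sites. The function names are hypothetical.

```python
def intact_fragment_yield(p: float, internal_sites: int = 1) -> float:
    """Fraction of molecules cut at both flanking sites but at no internal site.

    Illustrative model only: each site is cleaved independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p ** 2 * (1.0 - p) ** internal_sites


def best_cut_probability(internal_sites: int = 1) -> float:
    """p maximising p^2 (1 - p)^m; the derivative vanishes at p = 2 / (2 + m)."""
    return 2.0 / (2.0 + internal_sites)


if __name__ == "__main__":
    p_opt = best_cut_probability(1)
    # prints: optimal p = 0.667, yield = 0.148
    print(f"optimal p = {p_opt:.3f}, yield = {intact_fragment_yield(p_opt):.3f}")
    # prints: limit digest (p = 1) yield = 0.000
    print(f"limit digest (p = 1) yield = {intact_fragment_yield(1.0):.3f}")
```

Under these assumptions a limit digest gives no intact fragment at all, and even the best partial digest converts only about 15% (4/27) of molecules into the gene-spanning fragment, which is one reason a single limit-digest fragment containing the whole gene is attractive.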

It is difficult to distinguish a homozygous β+ thalassaemic from a compound heterozygote for β+ and β0 thalassaemia. This ambiguity would not be resolved by molecular cloning alone, as both β0 and β+ globin genes are superficially indistinguishable from normal β-globin genes. For this reason the starting DNA was from a Turkish Cypriot patient doubly heterozygous for β+ thalassaemia and the δβ Lepore fusion gene. This genotype was established from haematological data and also from DNA blotting experiments (Figures 9 and 10). Since the Lepore chromosome cannot make normal β chains, the Hb A must be made from the other chromosome; as the patient makes both Hb Lepore and Hb A, he must therefore carry a β+ allele, not a β0 allele.

I chose to clone the 7.5 kb Hsu I β-globin gene fragment produced by a limit (total) digest of this patient's DNA. This cloning strategy has advantages over the approaches used by other laboratories (Spritz et al, 1980; Burns et al, 1979). For instance, Spritz et al have cloned the two halves of a β-globin gene from a Greek Cypriot thought to be homozygous for β+ thalassaemia. These two gene halves are a 5' 5 kb Eco RI fragment and a 3' 3.5 kb Eco RI fragment (Spritz et al, 1980). The two halves may come from different β-globin alleles. In the "worst case", the 5' fragment could be from a β+ allele and the 3' fragment from a β0 allele. Cloning a single fragment spanning the whole β-globin gene is a simpler and better methodology.

DNA from the compound heterozygote was digested with Hsu I and fractionated by size to enrich for the 7.5 kb β-globin gene fragment. The enriched DNA was ligated to purified arms of the vector λNEM788 and packaged in vitro.

About half of the resulting phage were true recombinants with human DNA inserts (Figure 15; Table 3). [The background was probably due to contaminating trp E fragment in the preparation of phage arms; this problem could be cured by using agarose gel electrophoresis to exclude the trp E fragment.] These recombinant phage were screened using standard methodology, and one positive-scoring recombinant was detected. This phage, which gave a signal roughly equivalent to that of the internal standard phage λHβG2, has a single inserted fragment of the predicted size (7.5 kb). The restriction map of this 7.5 kb Hsu I fragment is not consistent with that of any known non-α globin gene or pseudogene except the β-globin gene itself.
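The odds of recovering a single positive clone can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where N is the number of recombinants screened and f the fraction carrying the target. The Python sketch below is a rough model, not the calculation used in this work. It assumes a haploid genome of about 3 × 10^9 bp, treats f as the target's share of the DNA, counts the 7.5 kb fragment on only one of the patient's two chromosomes (the other yields the larger Lepore fragment), takes as an assumption that about half of the 160,000 recombinants screened carried inserts, and applies purely hypothetical fold-enrichments from size fractionation.

```python
import math

HAPLOID_GENOME_BP = 3.0e9  # approximate human haploid genome size (assumption)


def target_fraction(fragment_bp: float, copies_per_diploid: int = 1,
                    enrichment: float = 1.0) -> float:
    """Fraction of recombinants expected to carry the target fragment.

    Rough model: the target's share of diploid DNA, multiplied by a
    hypothetical fold-enrichment from size fractionation (capped at 1).
    """
    diploid_bp = 2 * HAPLOID_GENOME_BP
    return min(1.0, enrichment * copies_per_diploid * fragment_bp / diploid_bp)


def probability_of_clone(n_recombinants: int, f: float) -> float:
    """Clarke-Carbon: chance that at least one of n recombinants carries the target."""
    return 1.0 - (1.0 - f) ** n_recombinants


def recombinants_needed(p_success: float, f: float) -> int:
    """Smallest n giving at least p_success chance of one target clone (0 < f < 1)."""
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - f))


if __name__ == "__main__":
    n_true = 80_000  # assumption: about half of 160,000 screened carried inserts
    for fold in (1, 10, 50):  # hypothetical enrichment factors
        f = target_fraction(7.5e3, enrichment=fold)
        print(f"{fold:>3}x enrichment: P(>=1 clone) = {probability_of_clone(n_true, f):.2f}")
```

Without any enrichment the model gives only about a 10% chance of a positive among 80,000 recombinants, which illustrates why size fractionation before ligation matters; the real enrichment achieved is not stated here.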

4.5 Structural comparison of normal and β+-thalassaemic globin genes.

Identifying the lesion in the β-thalassaemic globin gene by comparison with a cloned normal gene rests on a number of assumptions for which there are few supporting data. The first assumption is that all types of somatic cell contain equivalent globin genes. This is potentially important, as a variety of cell types in the peripheral blood are used to make nuclear DNA. If different somatic cells carry different "versions" of the same gene, these somatic mutations could be mistaken for the thalassaemic lesion. The general question was addressed in the nuclear transplantation experiments of Gurdon, and of Briggs and King (Gurdon, 1974; Briggs and King, 1952), which demonstrated the genomic "equivalence" of a number of somatic cells. However, doubts about these experiments, together with authenticated DNA rearrangements in B lymphocytes, leave this question essentially open (Sakano et al, 1979). Data from the globin gene system do not really clarify the situation. Using the Southern blot technique, Jeffreys and Flavell (1977a) found no change in the physical map of the rabbit β-globin gene when the starting DNA was isolated from a number of different tissues. This methodology would not detect subtle alterations in or around a gene unless they spanned a restriction enzyme recognition sequence. Tissue-specific DNA methylation patterns do exist in the human γδβ globin locus, but these would not be preserved after cloning in E. coli (Van der Ploeg and Flavell, 1980).

Another assumption is that no mutations are introduced into the gene during cloning. Care was taken not to expose the human DNA to potentially mutagenic agents: the DNA was kept buffered at all times to avoid depurination, and was not exposed to ethidium bromide. Once the human DNA is ligated to a vector and reintroduced into E. coli, the fidelity of its replication is governed by that organism's DNA polymerases. These have very low error rates, about one base in 10. Cloning and sequencing of the genes coding for human α, β, γ and δ globins, human insulin, etc., have shown no discrepancies between the DNA and protein sequences (Efstratiadis et al, 1980; Bell et al, 1980). These comparisons would reveal any errors from the DNA sequencing, as well as any hypothetically introduced during cloning. Therefore, cloning in E. coli does not normally introduce errors into eukaryotic DNA. The only known exception involves cloned sequences containing directly repeated elements, where E. coli-mediated recombination leads to deletion of the intervening DNA. This has been observed in clones of duplicated γ- and α-globin genes, and in a cloned copy of a retroviral provirus (Fritsch et al, 1980; Lauer et al, 1980; Van de Woude et al, 1979). The 7.5 kb fragment cloned here contains no direct repeats and has been propagated extensively without deletion of inserted sequences (data not presented).
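The stability argument rests on the insert lacking direct repeats. A minimal Python sketch of such a check follows: it lists k-mers that occur more than once on the same strand, and models recombination between two copies as the loss of one copy plus the DNA between them. The value of k, the toy sequence and the function names are illustrative assumptions, and the sketch says nothing about how long a repeat must be to recombine efficiently in E. coli.

```python
from collections import defaultdict


def direct_repeats(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer seen at least twice on the same strand to its 0-based starts.

    Only direct (same-orientation) repeats are reported; overlapping hits count.
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def recombination_product(seq: str, first: int, second: int) -> str:
    """Simplified model of recombination between direct repeats starting at
    `first` and `second`: one repeat copy and the DNA between them are lost."""
    return seq[:first] + seq[second:]


if __name__ == "__main__":
    toy = "ACGTACGT" + "TT" + "ACGTACGT"  # invented sequence with an 8 bp direct repeat
    repeats = direct_repeats(toy, 8)
    print(repeats)  # {'ACGTACGT': [0, 10]}
    start_a, start_b = repeats["ACGTACGT"]
    print(recombination_product(toy, start_a, start_b))  # ACGTACGT
```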

The final assumption is that the background "noise" of DNA base changes in the human population is sufficiently low that a "real" mutation can be distinguished from an asymptomatic sequence variant. This problem has been noted previously (Jeffreys, 1979; Bank et al, 1980).

The first type of structural comparison was restriction mapping. The published map of the normal gene is incorrect, as it lacks one Pst I and one Bgl II site. However, the normal and thalassaemic subclones are indistinguishable in these experiments (Figure 23), so there is no gross rearrangement of the β-globin locus in β+ thalassaemia. This result is trivial, but it may have been assumed, rather than rigorously proven, by other workers. Although several groups have cloned β0 and β+ thalassaemic globin genes and mapped them against normal β-globin gene subclones, the extra Pst I and Bgl II sites missing from the map of the normal gene were not discovered (Spritz et al, 1979; Burns et al, 1979).
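Where sequence is available, comparing restriction maps reduces to comparing the positions of recognition sequences. The Python sketch below locates the Pst I (CTGCAG) and Bgl II (AGATCT) sites in two sequences and reports sites found in only one; both recognition sequences are palindromic, so one strand suffices. The toy sequences and function names are hypothetical, and positions are only comparable if the sequences are aligned without insertions or deletions.

```python
import re

# Both recognition sequences are palindromic, so scanning one strand suffices.
SITES = {"PstI": "CTGCAG", "BglII": "AGATCT"}


def site_positions(seq: str, motif: str) -> list[int]:
    """0-based starts of `motif` in `seq`, including overlapping matches."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq.upper())]


def restriction_map(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Positions of every named recognition sequence in `seq`."""
    return {name: site_positions(seq, motif) for name, motif in sites.items()}


def map_differences(seq_a: str, seq_b: str, sites: dict[str, str] = SITES):
    """Sites present in only one of two sequences.

    Positions are compared directly, so the sequences must be aligned
    without insertions or deletions.
    """
    map_a, map_b = restriction_map(seq_a, sites), restriction_map(seq_b, sites)
    diffs = {}
    for name in sites:
        only_a = sorted(set(map_a[name]) - set(map_b[name]))
        only_b = sorted(set(map_b[name]) - set(map_a[name]))
        if only_a or only_b:
            diffs[name] = (only_a, only_b)
    return diffs


if __name__ == "__main__":
    normal = "AAACTGCAGTTTAGATCTAA"   # invented toy sequence
    variant = "AAACTGCAGTTTAGATCAAA"  # same, with the Bgl II site destroyed
    print(restriction_map(normal))    # {'PstI': [3], 'BglII': [12]}
    print(map_differences(normal, variant))  # {'BglII': ([12], [])}
```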

The next step was to compare the nucleotide sequences of the normal and β+ thalassaemic globin genes. The sequencing protocol was extensive (Figure 27a), and the availability of a prototype sequence from the normal gene makes it unlikely that errors have crept into the final version of the "thalassaemic" sequence. The sequence is very similar to that of the normal gene, with only 4 base differences. This result, along with "thalassaemic" gene sequences from other laboratories, establishes that globin genes from unrelated individuals are not "peppered" with random sequence polymorphisms.
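Counting base differences between two sequences is a position-by-position comparison once they are aligned. The Python sketch below lists every mismatch and labels each as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. It assumes the sequences are already aligned and of equal length; the toy sequences and function names are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def substitutions(reference: str, query: str) -> list[tuple[int, str, str]]:
    """(1-based position, reference base, query base) for every mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, q)
        for pos, (r, q) in enumerate(zip(reference.upper(), query.upper()), start=1)
        if r != q
    ]


def classify(ref_base: str, query_base: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    ref_base, query_base = ref_base.upper(), query_base.upper()
    if ref_base == query_base:
        raise ValueError("bases are identical; there is no substitution")
    pair = frozenset((ref_base, query_base))
    return "transition" if pair in TRANSITIONS else "transversion"


if __name__ == "__main__":
    normal = "CTGGGCAGGTTGG"   # invented toy sequences, not the globin gene
    variant = "CTGGGCAGATTGG"
    for pos, r, q in substitutions(normal, variant):
        print(pos, f"{r}->{q}", classify(r, q))  # 9 G->A transition
```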

Three of the four base changes lie in the gene's 3' flanking region, at -83, -89 and -130 nucleotides beyond the polyadenylation site. The fourth lies near the 3' end of the small intron (Figure 27b). Only one of these base differences affects a restriction site: the replacement at -83 creates a new Mbo II site. Two of the changes (-83 and -130) are also found in the β0 thalassaemia gene sequenced by Flavell and co-workers (personal communication). With our present knowledge of gene function, it seems likely that the -83, -89 and -130 sequence variants are asymptomatic.
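Whether a point substitution creates a recognition site can be checked directly. Unlike Pst I and Bgl II, Mbo II recognises the non-palindromic sequence GAAGA, so both strands must be searched (its reverse complement is TCTTC). The Python sketch below compares the sites present before and after a single substitution; the toy sequences and function names are hypothetical and do not reproduce the actual -83 region.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, motif: str) -> set[tuple[int, str]]:
    """(top-strand start, strand) for every match of `motif` on either strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.add((i, "+"))
        if window == rc:
            hits.add((i, "-"))
    return hits


def sites_created(seq: str, position: int, new_base: str, motif: str = "GAAGA"):
    """Sites present after a single-base substitution (0-based position) but not before.

    The default motif is the Mbo II recognition sequence GAAGA.
    """
    mutant = seq[:position] + new_base + seq[position + 1:]
    return find_sites(mutant, motif) - find_sites(seq, motif)


if __name__ == "__main__":
    print(sites_created("TTGAAGTTT", 6, "A"))  # {(2, '+')}: new site on the top strand
    print(sites_created("AATCTAC", 5, "T"))    # {(2, '-')}: new site on the bottom strand
```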

The gene codes for a normal β-globin mRNA, so defective mRNA translation can be excluded as the cause of this thalassaemia. Similarly, the 5' and 3' flanking sequences, extending for 114 and 82 nucleotides beyond the gene, are normal. This makes it unlikely that transcription initiation or termination is perturbed. The gene has not been transcribed in vitro, but the "Spritz" gene, which has identical sequences from +114 to -82 (and possibly further), is transcribed efficiently in vitro (Spritz et al, 1980). Probably the only other process which could be perturbed is RNA splicing, and the only base variant which could conceivably affect this process is the intron sequence variant. This variant is also found in a β+ globin gene isolated from a homozygous Greek Cypriot thalassaemic (Spritz et al, 1980). Unfortunately, the involvement of this sequence change in the thalassaemic phenotype is still equivocal, because the "biology" of introns is not well understood.

Deletions made in intron sequences of the virus SV40 do not affect expression of the "host" gene (Volckaert et al, 1979). Similarly, insertion of a synthetic lac operator fragment into a yeast tRNA gene intron did not affect production of the mature tRNA (Johnson et al, 1980). These experiments suggest that the internal sequences of introns are largely unimportant for gene expression.

However, there is still a possibility that a mutation in an intron sequence could interfere with gene expression. The mutation would have to convert a normally "passive" sequence into an "active" one. Spritz et al suggest that the G->A transition creates a new splice "acceptor" site within the body of the small intron. This site could compete with the authentic acceptor site 20 nucleotides downstream, and consequently retard pre-mRNA processing (Spritz et al, 1980). I scanned sequences derived from the small intron with a computer programme designed to look for RNA secondary structure (kindly provided by R. Semeonoff, University of Leicester). In the normal gene, the G residue lies at the centre of a (predicted) self-complementary hairpin structure 10 nucleotides long. This is the largest such structure in the small intron. The G->A replacement splits the self-complementary region into two blocks of 4 and 5 nucleotides. This is interesting, but the importance of this type of structure cannot be predicted at present.
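The algorithm of the secondary-structure programme used here is not described. As a much cruder stand-in, the Python sketch below finds the longest run of perfect Watson-Crick pairs closing a hairpin loop, and shows how a single central substitution splits such a stem. It is not an energy-based folding method, ignores G·U pairs, and the minimum loop size, toy sequences and function name are all illustrative assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_stem(rna: str, min_loop: int = 3) -> tuple[int, int, int]:
    """Longest run of perfect Watson-Crick pairs closing a hairpin.

    Returns (stem length, 0-based start of 5' arm, 0-based end of 3' arm).
    Pairs (i + k, j - k) are added while at least `min_loop` unpaired bases
    remain between them. DNA input is read as RNA (T -> U).
    """
    rna = rna.upper().replace("T", "U")
    best = (0, -1, -1)
    for i in range(len(rna)):
        for j in range(len(rna) - 1, i, -1):
            k = 0
            while (i + k) < (j - k) - min_loop and (rna[i + k], rna[j - k]) in WATSON_CRICK:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


if __name__ == "__main__":
    normal = "GGGGGAAAACCCCC"  # toy hairpin: 5 bp stem, 4 nt loop
    mutant = "GGAGGAAAACCCCC"  # central G->A in the 5' arm
    print(longest_stem(normal))  # (5, 0, 13)
    print(longest_stem(mutant))  # (2, 0, 13)
```

The mismatch leaves two short paired blocks either side of it, the same qualitative effect described above for the intron variant, although real folding programmes weigh stacking energies rather than simply counting pairs.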

The significance of the base replacement could be established by assaying the "splicability" of the cloned β-globin gene's transcripts in vitro. If the transcripts were spliced less efficiently than those from a normal gene, the G->A replacement would be implicated as the cause of this thalassaemia. Alternatively, finding the same G->A replacement in a clinically normal subject would show that this mutation is neutral.

If the G->A transition is asymptomatic, then it may be a common sequence polymorphism in Mediterranean populations.

This statement is provisional, as the sample of sequenced genes is not random but drawn from a limited section of the population (i.e. thalassaemics). If this β-globin gene proves functionally indistinguishable from a normal gene, then either the functional assays are not sensitive enough, or the "lesion" lies outside the cloned fragment. Either way, molecular biologists still have a few "globin-specific" headaches left.

5. Summary and Perspectives.

This research project has investigated the molecular biology of β+ thalassaemia. This disease is particularly interesting because its phenotype, decreased mRNA production, is reminiscent of some prokaryotic "control" mutants. I have cloned a β-globin gene which definitely lies on the same chromosome as the β+ thalassaemia determinant. Within the gene, its sequence differs from that of a normal β-globin gene by only one nucleotide; three other base changes lie in the 3' flanking region. Fears about "background" base changes, due to polymorphisms or manipulation artefacts, turned out to be unwarranted. The only process that the "mutation" is likely to affect is RNA splicing. This would have to be confirmed experimentally, as our present knowledge of intron "biology" is rudimentary.

I have not placed any emphasis on the clinical aspects of thalassaemia, because the cloning and sequencing techniques take too long to be of any use for antenatal diagnosis. The Southern blot technique is of much more use in these situations. β thalassaemia can be diagnosed in some populations by its "linkage disequilibrium" with restriction enzyme site polymorphisms (Kan et al, 1980). An alternative technique can be used when a homozygote or normal child has already been born to the family at risk (Little et al, 1980).

In the near future it may be possible to "cure" cases of thalassaemia by the stable introduction of functional globin genes into the bone-marrow cells of thalassaemic patients. This type of technology already exists (Wold et al, 1979). As yet, however, "gene therapy" has not been convincingly demonstrated in human subjects.


References

Alt, F.W., Kellems, R.E., Bertino, J.R., and Schimke, R.T. (1978) J. Biol. Chem. 253 1357-1370.

Anderson, J.N., and Schimke, R.T. (1976) Cell 7 331-338.

Armstrong, K.A., Hershfield, V., and Helinski, D.R. (1977) Science 196 172-174

Baglioni, C. (1962) Proc. Natl. Acad. Sci. U.S.A. 48 1880-1884.

Bank, A., Mears, J.G., and Ramirez, F. (1980) Science 207 486-493

Bell, G.I., Pictet, R.L., Rutter, W.J., Cordell, B., and Goodman, H.M. (1980) Nature 284 26-32

Bellet, A.J.D., Busse, H.G., and Baldwin, R.L. (1971) in The Bacteriophage Lambda, ed. Hershey, A.D. (Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.) 501-513.

Benton, W.D., and Davis, R.W. (1977) Science 196 180-182.

Benz, E.J. Jnr., Forget, B.G., Hillman, D.G., Cohen-Solal, M., Pritchard, J., Cavallesco, C., Prensky, W., and Housman, D. (1978) Cell 14 299-312

Bernards, R., Little, P.F.R., Annison, G., Williamson, R., and Flavell, R.A. (1979a) Proc. Natl. Acad. Sci. U.S.A. 76 4827-4831

Bernards, R., Kooter, J.M. and Flavell, R.A. (1979b) Gene 6 265-280.

Bernards, R. and Flavell, R.A. (1980) Nucleic Acids Research 8 1521-1534

Birnstiel, M.L. and Chipchase, M. (1977) Trends Biochem. Sci. 2 149-152

Blattner, F.R., Williams, B.G., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Furlong, L.A., Grunwald, D.J., Kiefer, D.O., Moore, D.D., Schumm, J.W., Sheldon, E.L. and Smithies, O. (1977) Science 196 161-169.

Bogenhagen, D.F., Sakonju, S., and Brown, D.D. (1980) Cell 19 27-35

Botchan, M., Topp, W. and Sambrook, J. (1976) Cell 9 269-280

Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. and Chambon, P. (1978) Proc. Natl. Acad. Sci. USA 75 4853-4857

Britten, R.J., and Kohne, D.E. (1968) Science 161 529-540

Brown, D.D. and Gurdon, J.B. (1978) Proc. Natl. Acad. Sci. U.S.A. 75 2849-2853

Brown, D.D. and Gurdon, J.B. (1977) Proc. Natl. Acad. Sci. U.S.A. 74 2064-2068

Bunn, H.F., Schmidt, G.J., Haney, D.N., and Dluhy, R.G. (1975) Proc. Natl. Acad. Sci. U.S.A. 72 3609-3613

Burns, A.L., Spence, S., Ramirez, F., Mears, J.G., and Bank, A. (1979) Blood 54 51a

Cameron, J.R., Philippsen, P. and Davis, R.W. (1977) Nucleic Acids Research 4 1429-1448

Catterall, J.F., Stein, J.P., Lai, E.C., Woo, S.L.C., Dugaiczyk, A., Mace, M.L., Means, A.R. and O'Malley, B.W. (1979) Nature 278 323-327

Chang, J.C., and Kan, Y.W. (1979) Proc. Natl. Acad. Sci. U.S.A. 76 2886-2889

Chang, J.C., Temple, G.F., Trecartin, R.F., and Kan, Y.W. (1979) Nature 281 602-603

Chovnick, A., Gelbart, W., and McCarron, M. (1977) Cell 11 1-10

Chow, L.T., Gelinas, R.E., Broker, T.R., and Roberts, R.J. (1977) Cell 12 1-8

Clegg, J.B., Weatherall, D.J., and Milner, P.F. (1971) Nature 234 337-340

Clegg, J.B., and Weatherall, D.J. (1976) Br. Med. Bull. 32 262-269

Coggins, L., Grindlay, G.J., Vass, J.K., Slater, A.A., Montague, P., Stinson, M.A., and Paul, J. (1980) Nucleic Acids Res. 8 3319-3333

Collins, J., and Hohn, B. (1978) Proc. Natl. Acad. Sci. U.S.A. 75 4242-4246

Comi, P., Giglioni, B., Barbarano, L., Ottolenghi, S., Williamson, R., Novakova, M., and Masera, G. (1977) Eur. J. Biochem. 79 617-622

Conconi, F., Rowley, P.T., Del Senno, L., Pontremoli, S., and Volpalo, S. (1972) Nat. New Biol. 238 83-87

Courtney, M. and Williamson, R. (1979) Nucleic Acids Res. 7 1121-1130

Cozzarelli, N., Melechen, N.E., Jovin, T.M. and Kornberg, A. (1967) Biochem. Biophys. Res. Commun. 28 578-583

Curtis, P., and Weissmann, C. (1976) J. Mol. Biol. 106 1061-1075

Dayhoff, M.O., ed. (1972) Atlas of Protein Sequence and Structure. Washington, D.C.: National Biomedical Research Foundation

Denhardt, D.T. (1966) Biochem. Biophys. Res. Comm. 23 641-646

Dodgson, J.B., Strommer, J., and Engel, J.D. (1979) Cell 17 879-887

Dozy, A.M., Kan, Y.W., Embury, S.H., Mentzer, W.C., Wang, W.C., Lubin, B., Davis, J.R., and Koenig, H.M. (1979) Nature 280 605-607

Duncan, C., Biro, P.A., Choudary, P.V., Elder, J.T., Wang, R.R.C., Forget, B.G., de Riel, J.K., and Weissman, S.M. (1979) Proc. Natl. Acad. Sci. U.S.A. 76 5095-5099

Dugaiczyk, A., Woo, S.L.C., Lai, E.C., Mace, M.L., McReynolds, L. and O'Malley, B.W. (1978) Nature 274 328-333

Efstratiadis, A., Posakony, J.W., Maniatis, T., Lawn, R.M., O'Connell, C., Spritz, R.A., de Riel, J.K., Forget, B., Weissman, S., Slightom, J.L., Blechl, A.E., Smithies, O., Baralle, F.E., Shoulders, C.C., and Proudfoot, N.J. (1980) Cell 21 653-668

Elgin, S.C.R. and Weintraub, H. (1975) Ann. Rev. Biochem 44 725-774

Embury, S.H., Miller, J.A., Dozy, A.M., Kan, Y.W., Chan, V. and Todd, D. (1980) Manuscript submitted

Fedoroff, N.V. (1979) Cell 16 551-563

Flashman, S.M. (1978) Molec. Gen. Genet 166 61-73

Flavell, R.A., Sabo, D.L., Bandle, E.F., and Weissmann, C. (1974) J. Mol. Biol. 89 255-272

Flavell, R.A., Kooter, J.M., De Boer, E., Little, P.F.R. and Williamson, R. (1978) Cell 15 25-41

Flavell, R.A. (1979a) in "From Gene to Protein: Information Transfer in Normal and Abnormal Cells", eds. Russell, T.R., Brew, K., Faber, H., and Schultz, J. Miami Winter Symposium 16 149-164 (Academic, New York).

Flavell, R.A., Bernards, R., Kooter, J.M., De Boer, E., Little, P.F.R., Annison, G., and Williamson, R. (1979b) Nucleic Acids Res. 6 2749-2760

Forget, B.G., Hillman, D.G., Lazarus, H., Bovell, E.F., Benz, E.J., Caskey, C.T., Huisman, T.H.J., Schroeder, W.A. and Housman, D. (1976) Cell 7 323-329

Forget, B.G. (1978) Trends Biochem. Sci. 86-89

Fritsch, E.F., Lawn, R.M. and Maniatis, T. (1979) Nature 279 598-603

Fritsch, E.F., Lawn, R.M. and Maniatis, T. (1980) Cell 19 959-972

Gannon, F., Jeltsch, J.M., and Perrin, F. (1980) Nuc. Acids Res. 8 4405-4421

Garel, A. and Axel, R. (1976) Proc. Natl. Acad. Sci. USA 73 3966-3970

Gilbert, W. (1978) Nature 271 501

Glover, D.M. and Hogness, D.S. (1977) Cell 10 167-176

Goossens, M., Dozy, A.M., Embury, S., Zachariades, Z., Hadjiminas, M., Stamatoyannopoulos, G., and Kan, Y.W. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 518-521

Gottesfeld, J.M. and Partington, G.A. (1977) Cell 12 953-962

Grosschedl, R. and Birnstiel, M.L. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 1432-1436

Haldane, J.B.S. (1949) Disease and Evolution. La Ricerca Scientifica 19 2

Hamer, D.H., and Leder, P. (1979) Cell 17 737-747

Hardies, S.C. and Wells, R.D. (1976) Proc. Natn. Acad. Sci. U.S.A. 73 3117-3121

Higgs, D.R., Old, J.M., Pressley, L., Clegg, J.B. and Weatherall, D.J. (1980) Nature 284 632-635

Hinnen, A., Hicks, J.B. and Fink, G.R. (1978) Proc. Natl. Acad. Sci. U.S.A. 75 1929-1933

Hoeijmakers-van Dommelen, H.A.M., Grosveld, G.C., De Boer, E., Flavell, R.A., Varley, J.M., and Jeffreys, A.J. (1980) J. Mol. Biol. 140 531-547

Hohn, B. and Murray, K. (1977) Proc. Natl. Acad. Sci. USA 74 3259-3263

Huisman, T.H.J., Wrightstone, R.N., Wilson, J.B., Schroeder, W.A., and Kendall, A.G. (1972) Arch. Biochem. Biophys. 153 850-853

Humayun, M.Z., and Chambers, R.W. (1979) Nature 278 525-529

Humphries, S., Windass, J., and Williamson, R. (1976) Cell 7 267-277

Hutchison, C.A. III, Phillips, S., Edgell, M.H., Gillam, S., Jahnke, P., and Smith, M. (1978) J. Biol. Chem. 253 6551-6560

Jacob, F., and Monod, J. (1961) J. Mol. Biol. 2 318-356

Jacq, C., Miller, J.R., and Brownlee, G.G. (1978) Cell 12 109-120

Jeffreys, A.J. (1979) Cell 18 1-10

Jeffreys, A.J. and Flavell, R.A. (1977a) Cell 12 1097-1108

Jeffreys, A.J. and Flavell, R.A. (1977b) Cell 12 429-439

Johnson, J.D., Ogden, R., Johnson, P., Abelson, J., and Itakura, K. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 2564-2568

Kaiser, A.D. and Hogness, D.S. (1960) J. Mol. Biol. 2 392-415

Kan, Y.W., Dozy, A.M., Varmus, H.E., Taylor, J.M., Holland, J.P., Lie-Injo, L.E., Ganesan, J., and Todd, D. (1975) Nature 255 255-256

Kan, Y.W., Dozy, A.M., Trecartin, R., and Todd, D. (1977) New Engl. J. Med. 297 1081-1084

Kan, Y.W., Lee, K.Y., Furbetta, M., Angus, A., and Cao, A. (1980) New Engl. J. Med. 302 185-188

Kantor, J.A., Turner, P.H., and Nienhuis, A.W. (1980) Cell 21 149-157

Kaufman, R.E., Kretschmer, P.J., Adams, J.W., Coon, H.C., Anderson, W.F. and Nienhuis, A.W. (1980) Proc. Natl. Acad. Sci. USA 77 4229-4233

Konkel, D.A., Maizel, J.V. and Leder, P. (1979) Cell 8 865-873

Korn, L.J. and Brown, D.D. (1978) Cell 15 1145-1156

Kinniburgh, A.J., and Ross, J. (1979) Cell 17 915-921

Kronenberg, H.M., Roberts, B.J. and Efstratiadis, A. (1979) Nucleic Acids Res. 6 153-166

Kwan, S.-P., Wood, T.G., and Lingrel, J.B. (1977) Proc. Natl. Acad. Sci. U.S.A. 178-182

Lacy, E., and Maniatis, T. (1980) Cell 21 545-553

Lauer, J., Shen, C-K., and Maniatis, T. (1980) Cell 20 119-130

Lang, A., Lehmann, H., and King-Lewis, P.A. (1974) Nature 249 467-469

Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G. and Maniatis, T. (1978) Cell 15 1157-1174

Lawn, R.M., Efstratiadis, A., O'Connell, C., and Maniatis, T. (1980) Cell 647-651

Leder, P., Tiemeier, D. and Enquist, L. (1977) Science 196 175-177

Lehmann, H., Casey, R., Lang, A., Stathopoulou, R., Imai, K., Tuchinda, S., Vinai, P., and Flatz, G. (1975) Br. J. Haematol. 31 119-131

Lenhard-Schuller, R., Hohn, B., Brack, C., Hirama, M. and Tonegawa, S. (1978) Proc. Natl. Acad. Sci. U.S.A. 75 4709-4713

Little, P., Curtis, P., Coutelle, Ch., Van Den Berg, J., Dalgleish, R., Malcolm, S., Courtney, M., Westaway, D., and Williamson, R. (1978) Nature 273 640-643

Little, P.F.R., Flavell, R.A., Kooter, J.M., Annison, G. and Williamson, R. (1979) Nature 278 227-231

Maniatis, T., Hardison, R.C., Lacy, E., Lauer, J., O'Connell, C., Quon, D., Sim, G.K. and Efstratiadis, A. (1978) Cell 15 687-701

Maniatis, T., Fritsch, E.F., Lauer, J., and Lawn, R.M. (1980a) The of Human Haemoglobins. Ann. Rev. Genetics In Press

Maniatis, T. (1980b) ", a Comprehensive Treatise" Academic Press eds. Goldstein, L. and Prescott, D.M. 563-608

Manley, J.L., Fire, A., Cano, A., Sharp, P.A., and Gefter, M.L. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 3855-3859

Maquat, L.E., Kinniburgh, A.J., Beach, L.R., Honig, G.R., Lazerson, J., Ershler, W.B. and Ross, J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 4287-4291

Marotta, C.A., Forget, B.G., Cohen-Solal, M., Wilson, J.T. and Weissman, S.M. (1977) J. Biol. Chem. 252 5019-5031

Maxam, A.M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U.S.A. 74 560-564

Maxam, A.M., and Gilbert, W. (1980) Methods in Enzymology In Press

Mears, J.G., Ramirez, F., Leibowitz, D., and Bank, A. (1978) Cell 15 15-23

Miller, J. (1970) in The Lactose Operon 173-188 (Cold Spring Harbor Laboratory, New York).

Murray, K. and Murray, N.E. (1975) J. Mol. Biol. 551-564

Murray, N.E., Manduca de Ritis, P. and Foster, L.A. (1973) Molec. gen. Genet. 120 261-281

Murray, N.E. and Murray, K. (1974) Nature 251 476-481

Murray, N.E., Brammar, W.J. and Murray, K. (1977) Molec. gen. Genet. 150 53-61

Nasmyth, K.A. and Reed, S.I. (1980) Proc. Natl. Acad. Sci. USA 7 2119-2123

Nienhuis, A.W., Turner, P., and Benz, E.J. (1977) Proc. Natl. Acad. Sci. U.S.A. 74 3960-3964

Nguyen-Huu, M.C., Stratmann, M., Groner, B., Wurtz, T., Land, H., Giesecke, K., Sippel, A.E. and Schutz, G. (1979) Proc. Natl. Acad. Sci. USA 76 76-80

Norgard, M.V., Emigholz, K. and Monahan, J.J. (1979) J. Bacteriol. 138 270-272

Nunberg, J.H., Kaufman, R.J., Chang, A.C.Y., Cohen, S.N. and Schimke, R.T. (1980) Cell 19 355-364

Old, J.M., Proudfoot, N.J., Wood, W.G., Longley, J.I., Clegg, J.B., and Weatherall, D.J. (1978) Cell 14 289-298

Orkin, S.H. (1978a) Proc. Natl. Acad. Sci. USA 75 5950-5954

Orkin, S.H. (1978b) J. Biol. Chem. 253 12-15

Orkin, S.H., Old, J.M., Weatherall, D.J., and Nathan, D.G. (1979a) Proc. Natl. Acad. Sci. U.S.A. 76 2400-2404

Orkin, S.H., Old, J.M., Lazarus, H., Altay, C., Gurgey, A., Weatherall, D.J., and Nathan, D.G. (1979b) Cell 17 33-42

Ottolenghi, S., Lanyon, W.G., Paul, J., Williamson, R., Weatherall, D.J., Pritchard, J., Pootrakul, S., and Boon, W.H. (1974) Nature 251 389-392

Ottolenghi, S., Comi, P., Giglioni, B., Tolstoshev, P., Lanyon, W.G., Mitchell, G.J., Williamson, R., Russo, G., Musumeci, S., Schiliro, G., Tsistrakis, G.A., Charache, S., Wood, W.G., Clegg, J.B. and Weatherall, D.J. (1976) Cell 9 71-80

Ottolenghi, S., Giglioni, B., Comi, P., Gianni, A.M., Polli, E., Acquaye, C.T.A., Oldham, J.H., and Masera, G. (1979) Nature 278 654-656

Pardee, A.B., Jacob, F. and Monod, J. (1959) J. Mol. Biol. 1 165-180

Patient, R.K., Elkington, J.A., Kay, R.M., and Williams J. (1980) Cell 21 565-574

Paul, J., and Gilmour, R.S. (1968) J. Mol. Biol. 34 305-316

Paul, J. (1976) Br. Med. Bull. 32 277-281

Pressley, L., Higgs, D.R., Clegg, J.B., and Weatherall, D.J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77 3586-3587

Pritchard, J., Longley, J., Ross, J. (1976) J. Mol. Biol. 106 403-420

Pritchard, J., Longley, J., Clegg, J.B., and Weatherall, D.J. (1976) Br. J. Haematol 32 473-485

Proudfoot, N.J., and Maniatis, T. (1980) Cell 21 537-544

Ptashne, M., Jeffrey, A., Johnson, A.D., Maurer, R., Meyer, B.J., Pabo, C.O., Roberts, T.M., and Sauer, R.T. (1980) Cell 19 1-11

Ramirez, F., O'Donnell, J.V., Marks, P.A., Bank, A., Musumeci, S., Schiliro, G., Pizzarelli, G., Russo, G., Luppis, B., and Gambino, R. (1976) Nature 263 471-475

Ramirez, F., Burns, A.L., Mears, J.G., Spence, S., Starkman, D., and Bank, A. (1979) Nuc. Acids Res. 7 1147-1162

Roop, D.R., Nordstrom, J.L., Tsai, S.Y., Tsai, M-J., and O'Malley, B.W. (1978) Cell 15 671-685

Sakano, H., Huppi, K., Heinrich, G., and Tonegawa, S. (1979) Nature 280 288-294

Sakonju, S., Bogenhagen, D.F., and Brown, D.D. (1980) Cell 19 13-26

Sanger, F., and Coulson, A.R. (1978) FEBS Lett. 87 107-110

Sanger, F., Nicklen, S., and Coulson, A.R. (1978) Proc. Natl. Acad. Sci. U.S.A. 16 4951-4955

Schaffner, W., Kunz, G., Daetwyler, H., Telford, J., Smith, H.O. and Birnstiel, M.L. (1978) Cell 14 655-671

Schenk, T.E., Rhodes, C., Rigby, P.W.J., and Berg, P. (1975) Proc. Natl. Acad. Sci. U.S.A. 72 989-993

Schmidt, C., and Rubin, J. (1980) Nuc. Acids Res. In Press

Seidman, J.G., Max, E.E., and Leder, P. (1979) Nature 280 370-375

Slightom, J., Blechl, A. and Smithies, 0. (1980) Cell In Press

Shatkin, A.J. (1976) Cell 9 645-653

Shen, C-K.J., and Maniatis, T. (1980) Cell 19 379-391

Shortle, D. and Nathans, D. (1978) Proc. Natl. Acad. Sci. USA 75 2170-2174

Southern, E.M. (1975) J. Mol. Biol. 98 503-517

Spritz, R.A., Jagadeeswaran, P., Biro, P.A., Elder, J.T., Choudary, P.V., deRiel, J.K., Manley, J.L., Gefter, M.L., Weissman, S.M., and Forget, B.G. (1980) Proceedings of Haemoglobin Switching Meeting, Airlie House, Virginia. In Press

Stalder, J., Groudine, M., Dodgson, J.B., Engel, J.D. and Weintraub, H. (1980) Cell 19 973-980

Stein, J.P., Catterall, J.F., Kristo, P., Means, A.R., and O'Malley (1980) Cell 21 681-687

Taniguchi, T., and Weissmann, C. (1978) J. Mol. Biol. 118 533-565

Thuring, R.W.J., Sanders, J.P.M. and Borst, P. (1975) Anal. Biochem. 66 213-220

Tiemeier, D., Enquist, L. and Leder, P. (1976) Nature 263 526-527

Tilghman, S.M., Curtis, P.J., Tiemeier, D.C., Leder, P. and Weissmann, C. (1978) Proc. Natl. Acad. Sci. U.S.A. 75 1309-1313

Tonegawa, S., Brack, C., Hozumi, N. and Schuller, R. (1977) Proc. Natl. Acad. Sci. USA 74 3518-3522

Tuan, D., Biro, P.A., deRiel, J.K., Lazarus, H., and Forget, B.G. (1979) Nuc. Acids Res. 6 2519-2544

Tuan, D., Murnane, M.J., deRiel, J.K., and Forget, B.G. (1980) Nature 285 335-337

Van den Berg, J., van Ooyen, A., Mantei, N., Schambock, A., Grosveld, G., Flavell, R.A., and Weissmann, C. (1978) Nature 276 37-44

Van der Ploeg, L.H.T., Konings, A., Oort, M., Roos, D., Bernini, L. and Flavell, R.A. (1980) Nature 283 637-642

Van der Ploeg, L.H.T., and Flavell, R.A. (1980) Cell 19 947-958

Vanin, E.E., Goldberg, G.I., Tucker, P.W., and Smithies, O. (1980) Nature 286 222-226

Van Ooyen, A., van den Berg, J., Mantei, N., and Weissmann, C. (1979) Science 206 337-344

Wasylyk, B., Vedinger, C., Corden, J., Brison, 0. and Chambon, P. (1980) Nature 285 367-373

Weatherall, D.J., and Clegg, J.B. (1972) The Thalassaemia Syndromes (Blackwell, Oxford, ed.2)

Weatherall, D.J., and Clegg, J.B. (1975) Philos Trans R. Soc. London B 271 411-455

Weaver, R.F., and Weissmann, C. (1979) Nuc. Acids Res. 7 1175-1193

Weissmann, C. (1978) Trends Biochem. Sci. 3 109-111

Whitelaw, E., Pagnier, J., Verdier, G., Henni, T., Godet, J. and Williamson, R. (1980) Blood 55 511-516

Wickens, M.P., Woo, S., O'Malley, B.W. and Gurdon, J.B. (1980) Nature 285 628-634

Williamson, R. (1976) Br. Med. Bull. 32 246-250

Wilson, J.T., Wilson, L.B., deRiel, J.K., Villa-Komaroff, L., Efstratiadis, A., Forget, B.G., and Weissman, S.M. (1978) Nuc. Acids Res. 5 563-581

Wold, B., Wigler, M., Lacy, E., Maniatis, T., Silverstein, S., and Axel, R. (1979) Proc. Natl. Acad. Sci. U.S.A. 76 5684-5688

Woo, S.L.C., Monahan, J.J. and O'Malley, B.W. (1977) J. Biol. Chem. 252 5789-5797

Wood, W.B. (1969) J. Mol. Biol. 16 118-133

Wood, W.G., Old, J.M., Roberts, A.V.S., Clegg, J.B. and Weatherall, D.J. (1978) Cell 437-446

Wood, W.G., Clegg, J.B. and Weatherall, D.J. (1979) Brit. J. Haematol. 43 509-520

Zimmer, E.A., Martin, S.L., Beverley, S.M., Kan, Y.W. and Wilson, A.C. (1980) Proc. Natl. Acad. Sci. USA 77 2158-2162

Errata:

Briggs, R., and King, T.J. (1952) Proc. Natl. Acad. Sci. U.S.A. 38 455-463

Dierks, P., Van Ooyen, A., Meyer, F., Mantei, N., and Weissmann, C. (1980) Experientia 36 742

Gurdon, J. (1974) The Control of Gene Expression in Animal Development. (Oxford)

Huisman, T.H.J., Schroeder, W.A., Efremov, G.D., Duma, H., Mladenovsky, B., Hyman, C.B., Rachmilewitz, E.A., Bouver, N., Miller, A., Brodie, A., Shelton, J.B. and Apell, G. (1974) Ann. N.Y. Acad. Sci. 232 107-124

Little, P.F.R., Annison, G., Darling, S., Williamson, R., Camba, L., and Modell, B. (1980) Nature 285 144-147

Sutcliffe, J.G. (1978) Nuc. Acids Res. 5

Rigby, P.W.J., Dieckmann, M., Rhodes, C., and Berg, P. (1977) J. Mol. Biol. 113 237-251

Twigg, A.J. and Sherratt, D. (1980) Nature 283 216-218

Volckaert, G., Feunteun, J., Crawford, L., Berg, P., and Fiers, W. (1979) J. Virol. 30 674-682

(Reprinted from Nature, Vol. 273, No. 5664, pp. 640-643, June 22, 1978) © Macmillan Journals Ltd., 1978

Isolation and partial sequence of recombinant plasmids containing human α-, β- and γ-globin cDNA fragments

P. Little*, P. Curtis*, Ch. Coutelle*, J. Van Den Berg†, R. Dalgleish*, S. Malcolm*, M. Courtney*, D. Westaway & R. Williamson

*Department of Biochemistry, St. Mary's Hospital Medical School, University of London, London W2, UK
†Institute for Molecular Biology I, University of Zurich, Hönggerberg, 8093 Zurich, Switzerland

Human globin cDNA-derived recombinants with plasmid pCR1 have been prepared for use as specific hybridisation probes and for the partial sequencing of α-, β- and γ-globin genes.

To analyse the molecular biology of developmental and pathological systems in humans, one must investigate not only the expression of single genes, but also that of families of genes. One of the most interesting and most easily studied human gene families specifies the β-like globin chains. The gene arrangement is known, and the four β-like genes are thought to be closely linked1. There is a switch during foetal and neonatal development from γ-globin synthesis to β-globin synthesis, while the synthesis of δ-globin remains at a low level throughout2. There is evidence of different half-lives for the otherwise similar δ- and β-globin mRNAs (ref. 3). Finally, there is a wide spectrum of genetic disease, the thalassaemias, in which the expression of the globin genes is affected, and which have been shown to be due to lesions at any one of several levels: gene deletion, lack of transcription, faulty post-transcriptional processing or translation defects4-8. A similar α-gene family exists, although developmental switches in this case are less well understood9. There is also regulated expression of the amounts of α-globin and β-globin, a useful model for coordinate gene expression at the molecular level10.

Preparation and characterisation of chimaeric plasmids

To study the problems listed above, it is essential to prepare nucleic acid probes which are specific for each globin gene. So far, this has been done using either mRNA or cDNA. Preparation of total globin mRNA, and thus its complementary copy, is no longer difficult, but purifying a sequence complementary to only one of the globin genes poses problems, particularly if more than a very small amount of probe is required11,12. To meet the needs for complete purity and large amounts of these probes, we have prepared chimaeric plasmids in which a double-stranded DNA copy of human globin foetal mRNA has formed a recombinant with plasmid pCR1 (ref. 13). The plasmids that have been isolated have been characterised by 'Grunstein-Hogness' hybridisation on a filter to purified polynucleotide kinase-labelled globin mRNA (refs 14, 15), then by solution hybridisation in plasmid excess to chain-specific cDNAs. The plasmids have also been characterised by restriction mapping, and in two cases the globin cDNA length was confirmed by excision and direct measurement. Plasmids were selected containing α-, β- and γ-globin gene sequences, and in each case the plasmid selected contained a cDNA-derived sequence greater than half the length of the mRNA. The availability of this family of human globin gene sequences should facilitate detailed analysis of normal globin gene arrangement and of the causes of dysfunction, particularly in thalassaemia.

Human globin messenger RNA was prepared using two cycles of oligo dT cellulose affinity chromatography (Collaborative Research, grade T3) from 'normal' adult reticulocytes (mRNA (α, β)), from neonatal exchange transfusion blood (mRNA (α, β, γ)), and from the blood of a previously studied β⁰/δβ⁰ thalassaemic double heterozygote with no β-globin mRNA in circulating red blood cells (mRNA (α, γ)) (ref. 16). Messenger RNA for clone identification was used without further purification. A sample of mRNA (α, γ, β) was used to prepare double-stranded cDNA for insertion into plasmids17-21. This mRNA was first prepared by affinity chromatography, as above, treated with ribonuclease-free DNase (Worthington) and then centrifuged on a sucrose gradient; only material sedimenting at 9S was collected. Complementary DNA was prepared using RNA-dependent DNA polymerase. The cDNA was treated with alkali to remove RNA, neutralised and used as a template for the synthesis of a second, complementary strand with RNA-dependent DNA polymerase. The double-stranded product was treated with S1 nuclease and elongated with terminal transferase, a poly dA sequence being added to the 3' termini of the double-stranded cDNA by the method of Maniatis19,20.

Plasmid identification and hybridisation to purified globin cDNA

Plasmid pCR1 was purified and cleaved at the single EcoRI restriction site, and the 3' termini were extended with poly dT sequences using terminal transferase20. 100 ng of tailed plasmid were mixed with 2 ng of tailed double-stranded cDNA, annealed, and used to transfect Escherichia coli HB101. The molar ratio of plasmid to insert was 2.3:1. The numbers of recombinant clones obtained in three experiments, as judged by kanamycin resistance, were 37, 70 and 175. These were picked and plated out in arrays, and sufficient cells were grown to allow transfer to several replica Millipore HAWP filters. All the transfections and growth of recombinant clones before positive identification were carried out under UK category 3 biohazard containment conditions in the Department of Biochemistry, Imperial College, as advised by the UK Genetic Manipulation Advisory Group22.

Replica Millipore filters were prepared for hybridisation to appropriate (32P)-labelled mRNAs essentially by the method of Grunstein and Hogness14. The mRNAs were cleaved into fragments approximately 100 nucleotides long by brief exposure to alkali before terminal labelling with polynucleotide kinase15. Each filter was hybridised to 2.5 × 10⁶ d.p.m. of (32P)-mRNA in a volume of 1.5 ml of 50% formamide hybridisation buffer for 18 h at 42 °C.

After autoradiography of the filters, a set of clones hybridising strongly with (32P)-mRNA (α, γ, β) and not scoring with (32P)-mRNA (α, β) was identified as most likely to contain γ-globin cDNA inserts. Another set, scoring with all three probes (mRNA (α, γ, β), mRNA (α, β) and mRNA (α, γ)), was identified as most likely to contain α-globin inserts. Small cultures of the colonies hybridising most strongly on filters were grown, and DNA was prepared from the supernatant after pelleting the cell debris and the bulk of the chromosomal DNA. This was sonicated to a size of approximately 400 nucleotide pairs and then hybridised to purified globin cDNA probes at a DNA:cDNA ratio of 100,000:1. The probes used were cDNA (α, β) and cDNA (α, γ, β), prepared from purified globin mRNA from adult and neonatal reticulocyte-rich blood, respectively, and cDNA (α) and cDNA (β), prepared using cross-hybridisation with thalassaemic mRNA as previously described23. In every case the plateau hybridisation value obtained is a compound of the purity of the probe with respect to the globin sequence studied and the length of the insert.

Of the total 282 clones picked as kanamycin-resistant, 88 scored sufficiently strongly with (32P)-mRNA to be characterised further; of these, 58 have been tentatively identified as containing γ-globin cDNA inserts, 15 as containing α-globin cDNA inserts, and 15 as containing β-globin cDNA inserts. The other clones did not hybridise to (32P) ribosomal RNA and are thought to contain short globin cDNA inserts. The starting mRNA contained globin mRNA sequences in the approximate ratio α:β:γ of 55:12:32 (ref. 24). The ratio of clones obtained is consistent with previous results from other groups20,25 that more β-globin cDNA clones than α-globin cDNA clones are obtained from an mRNA containing approximately equal amounts of the two globin types. Human γ-globin cDNA behaves like β-globin cDNA in its preferential incorporation into plasmids.

Plasmids pHαG1, pHβG1 and pHγG1 were chosen as containing cDNA inserts of α-, β- and γ-globin gene sequences, respectively, and were grown in 1-litre cultures under UK category 2 biohazard containment conditions, as they had been identified by two independent techniques as containing a known protein-coding sequence for a nontoxic protein. Approximately 2 mg of plasmid DNA was obtained from a 1-litre culture after plasmid amplification with chloramphenicol (Sigma). It was found essential to use chloramphenicol rather than the clinically used chloramphenicol succinate to obtain successful amplification. The plasmid DNA from the gradients was examined with the electron microscope and was found to be entirely in the form of supercoiled or open circles. The plasmid was now available in sufficient amount to hybridise to purified globin cDNAs at various plasmid DNA:cDNA ratios; the hybridisation curves obtained for plasmids pHαG1, pHβG1 and pHγG1 are shown in Fig. 1, and demonstrate that the assignments are correct.

Fig. 1 cDNA-hybridisation curves of the globin chimaeric plasmids pHαG1, pHβG1 and pHγG1. a, Hybridisation of pHγG1 to cDNAαβγ and cDNAαβ, and of pHαG1 to cDNAα and cDNAβ. b, Hybridisation of pHβG1 to cDNAαβ and cDNAα. The plasmid DNA was sheared to approximately 500-700 base pairs and was hybridised at increasing concentrations to 0.2 ng of the indicated cDNA probes in 2 µl formamide hybridisation buffer (50% formamide, 0.5 M NaCl, 0.25 M HEPES pH 7.5, 0.01 M EDTA), sealed into capillaries, boiled for 5 min and then left for 60 h at 42 °C. The percentage of hybridisation was assayed by S1 nuclease treatment as described by Harrison et al.29. The cDNA probes were prepared as described by Jackson et al.30. The cDNAα and cDNAβ probes were prepared by hybridisation of cDNAαβ to an excess of β⁰ mRNA and separation of the single- and double-stranded nucleic acid by hydroxyapatite chromatography5.

Partial sequencing of recombinant plasmids

Each plasmid was then treated with a number of restriction enzymes; HhaI, HpaII and EcoRI proved particularly useful. It was found that HhaI and HpaII digest plasmid pCR1 to give a large fragment containing the region where insertion occurs (the original EcoRI site). HhaI (sequence cleaved, GCGC), HpaII (sequence cleaved, CCGG) and HindIII (sequence cleaved, AAGCTT) (ref. 26) cleave the α-globin cDNA insert, and do not cleave the β- or γ-globin cDNA inserts. EcoRI cleaves both the β- and γ-globin cDNAs at a single site, and does not cleave the α-globin cDNA insert.

It is possible to derive restriction site maps of the inserted fragments by comparing the fragment patterns generated from chimaeric and non-chimaeric plasmids. Figure 2 shows this approach using HpaII. Track b is the fragment pattern for pCR1 with this enzyme. A single fragment at 790 base pairs is removed by digestion with EcoRI (track c) to generate a new band at 690 base pairs. This indicates that the single EcoRI site is 100 base pairs from one end of a 790-base pair HpaII fragment. The α plasmid gives a pattern (track e) with two new bands at 1,000 base pairs and 440 base pairs (this band runs at the same mobility as a pCR1 fragment, but its presence is proved by microdensitometry). This pattern may be interpreted by assuming that there is an HpaII site within the inserted DNA, giving rise to the two new fragments. Thus, the 1,000-base pair fragment contains 690 base pairs of pCR1 and 310 base pairs of inserted DNA, and the 440-base pair fragment contains 100 base pairs of plasmid and 340 base pairs of inserted DNA. This enables the map in Fig. 3 to be derived. The γ plasmid gives a pattern with one new 1,500-base pair fragment (Fig. 2, track h). This consists of 790 base pairs of pCR1 plus 710 base pairs of inserted DNA; there is no internal HpaII site. Subsequent digestion with EcoRI breaks this new fragment to give two new bands (Fig. 2, track i), and the location of the EcoRI site is established in a fashion directly analogous to the α HpaII site. The β plasmid gave a pattern of fragments very similar to that of pHγG1: a new band at 1,550 base pairs was visible, indicating an inserted DNA fragment of 760 base pairs. On digestion with EcoRI, this fragment was split to give new bands at 1,000 base pairs and 550 base pairs (data not presented). Similar experiments were also carried out using HhaI to give the final maps shown in Fig. 3. There is ambiguity in the α HhaI map, where the possible location of a second HhaI site remains obscure. The single EcoRI site in pHβG1 and pHγG1 allows a tentative statement on the orientation of the inserted DNA. The cDNAβ sequence is known27 and has a single EcoRI site toward the 3' end, at amino acids 121-122. Thus, the asymmetric distribution of DNA to the left and right suggests that the cDNA is inserted as shown in Fig. 3. Similar arguments apply to pHγG1. Inspection of the protein sequence indicates only one possible EcoRI site in the coding sequence, again at 121-122. What has not been established from the restriction map alone is the absolute orientation of the HpaII/HhaI fragments within the whole plasmid.

Fig. 2 Digestion of pCR1, pHαG1 and pHγG1 with HpaII and with EcoRI plus HpaII. Track a, 1.0 µg λcI857 digested with EcoRI and HindIII. Track b, 1.0 µg pCR1 digested with HpaII. Track c, 1.0 µg pCR1 digested with HpaII plus EcoRI. Track d, 0.3 µg SV40 digested with HindIII. Track e, 1.0 µg pHαG1 digested with HpaII. Track f, 1.0 µg pHαG1 digested with HpaII plus EcoRI. Track g, as a. Track h, 1.0 µg pHγG1 digested with HpaII. Track i, 1.0 µg pHγG1 digested with HpaII plus EcoRI. Track j, as a. HpaII digests were carried out on 2 µg plasmid DNA in 20 µl of 6 mM Tris, 6 mM 2-mercaptoethanol, pH 7.5, containing 0.5 units of enzyme. Digestion was at 37 °C for 3.5 h. The sample was then split and half was kept in ice. To the other half, 1 µl of 100 mM Tris, 500 mM NaCl, 100 mM MgCl2, pH 7.5, and then two units of EcoRI were added, and digestion continued at 37 °C for half an hour. The reactions were then stopped with 1 µl of 0.5 M EDTA. The whole sample was run in a 2.0% Agarose (Miles) gel. Running buffer was 40 mM Tris, 20 mM NaOAc, 2 mM EDTA, pH 7.5, and electrophoresis was allowed to proceed for 4 h at a voltage gradient of 6 V cm⁻¹. The gel was then stained with ethidium bromide and the ultraviolet-induced fluorescence photographed on Ilford FP4 using a Wratten 25A red filter.

The size of DNA which has been inserted into a plasmid using poly dA-poly dT tailing, as in this case, can also be estimated by excising the insert using S1 nuclease under conditions where only the poly dA-poly dT regions melt28. This was done for the γ-globin plasmid pHγG1, and the results are shown in Fig. 4. The size of the inserted sequence is 500 nucleotide pairs. As this does not include the dA and dT sequences, it is in excellent agreement with the restriction data. Similar experiments for pHβG1 indicate an inserted sequence of 540 base pairs (data not given).

All three plasmids were further characterised by direct sequence analysis around a restriction site in the coding region of the inserted DNA. The results (Fig. 5) confirm the assignments made from the hybridisation data and further differentiate between pHβG1 and pHγG1. Sequencing was carried out according to the method of Maxam and Gilbert32 around the HindIII site in the pHαG1 inserted sequence and around the EcoRI site in the pHβG1 and pHγG1 inserted sequences. Sequencing of the α plasmid shows that, of the five possible HindIII sites predicted from the human α-globin protein sequence, the cDNA is cleaved at a site spanning amino acids 90-91. The nucleotide sequences derived from pHβG1 and pHγG1 confirm that the EcoRI site lies at amino acids 120-121, and in addition the nucleotide sequence for β-globin agrees fully with sequences previously published. The sequencing studies also confirm the proposed orientation of each inserted sequence relative to known pCR1 restriction sites.

The construction of human globin cDNA plasmids will be of use in the study of the human globin gene cluster. Thus, the plasmids may be used as specific hybridisation probes for purification of genomic globin DNA sequences by affinity chromatography, for restriction mapping after the method of Southern (ref. 34 and Flavell et al., in preparation), or for screening recombinant DNA molecules containing genomic globin DNA sequences.

After this work had been completed, a paper was published35 describing the construction of similar human globin cDNA-derived recombinants with plasmid pMB9 and reporting the sequence of the β-globin gene insert. Our data are compatible with, and extend, those reported by Wilson et al.35.

We thank Drs David Glover and Peter Rigby for help in the biohazard facility of Imperial College, Mrs Gillian Annison and Dr Panos Ioannou for help with the plasmid characterisation and mRNA preparation, B. Smith and S. Barrett for HpaII and HhaI, respectively, and Dr Beard for RNA-dependent DNA polymerase. D. W. thanks Professor Charles Weissmann and Dr Ned Mantei for help with the Maxam and Gilbert sequencing technique. This work was funded by the MRC. A grant from EMBO allowed S. M. to visit Zurich, which was most helpful to this project. Ch. C. is the recipient of a WHO fellowship and is on leave from the Central Institute of Molecular Biology of the Academy of Sciences of the GDR.

Fig. 3 Restriction map around the site of insertion of plasmids pCR1, pHαG1, pHβG1 and pHγG1. [R1] indicates the two halves of the EcoRI site of insertion. Approximate map distances are given in base pairs. The HpaII and HhaI sites in pHαG1 can be calculated to be about 30 base pairs apart, but their order relative to other restriction sites cannot be established from the restriction data presented in Fig. 2. The orientation of the inserted DNA in pHαG1 is deduced from the nucleotide sequence shown in Fig. 5. The two sets of map distances presented for pHγG1 are derived from experiments excising the inserted DNA from pCR1 with HpaII (above) and HhaI (below). The inserted DNA sequences in each plasmid (including the poly dA·dT tracts) are signified by shading.

Fig. 4 Excision of the cDNA sequences from pHγG1 with S1 nuclease. Tracks a and d, HaeIII-digested SV40 DNA. Track b, 1.0 µg pHγG1 DNA digested with 24 units S1 nuclease (Sigma) in 20 µl of 45% formamide, 30 mM NaOAc, 1 mM ZnSO4, 0.2 M NaCl, pH 4.5, for 30 min at 55 °C. Track c, 1.0 µg of pCR1 digested as b. The whole reaction mix is run on a 2.0% Agarose gel as described in the legend to Fig. 5. The prominent band seen in b is the inserted cDNA.

Fig. 5 Partial nucleotide sequence of plasmids pHαG1, pHβG1 and pHγG1.

α-globin, residues 75-91 (pHαG1; HindIII site AAGCUU at the 3' end):
Asp Met Pro Asn Ala Leu Ser Ala Leu Ser Asp Leu His Ala His Lys Leu
ACAUGCCCAACGCGCUGUCCGCCCUGAGCGACCUGCACGCGCACAAGCUU

α-globin, residues 92-101 (pHαG1):
Arg Val Asp Pro Val Asn Phe Lys Leu Leu
CGGGUGGACCCGGUCAACUUCAAGCUCCUA

β-globin, residues 105-121 (pHβG1; EcoRI):
Leu Leu Gly Asn Val Leu Val Cys Val Leu Ala His His Phe Gly Lys Glu
CUCCUGGGCAACGUGCUGGUCUGUGUGCUGGCCCAUCACUUUGGCAA-GAA

γ-globin, residues 105-121 (pHγG1; EcoRI):
Leu Leu Gly Asn Val Leu Val Thr Val Leu Ala Ile His Phe Gly Lys Glu
CUCCUGGGAAAUGUGCUGGUGACCGUUUUGGCAAUCCAUUUCGGCAA-GAA

γ-globin, residues 122-135 (pHγG1; MboII):
Phe Thr Pro Glu Val Gln Ala Ser Trp Gln Lys Met Val Thr
UUCACCCCUGAGGUGCAGGCUUCAUGGCAGAAGAUGGUGACU

Plasmids pHβG1 and pHγG1 were restricted with EcoRI and labelled at their 5' termini with γ-32P-ATP using polynucleotide kinase (PL-Biochem). Secondary restriction was with HindIII, which cleaves pCR1 at a single site and does not cleave the inserted DNA. The two resulting restriction fragments were separated on preparative 0.6% agarose gels. After isolation from this gel, each labelled fragment was divided into aliquots and subjected to the four base-specific cleavage reactions described by Maxam and Gilbert32. The products were fractionated on a 20% acrylamide, 7 M urea gel. Autoradiography was carried out at -70 °C using Fuji RX film and Ilford fast tungstate intensifying screens. pHαG1 was cleaved with HindIII before labelling with polynucleotide kinase. Secondary restriction was carried out with KpnI, yielding four labelled fragments, two of which consist solely of pCR1 sequences. The fragments were separated on a 0.6% agarose gel, and only the smallest and largest fragments, corresponding to approximately 1.45 and 7.50 kilobases respectively, were isolated. After isolation, the base-specific reactions and fractionation of cleavage products were carried out as for the pHβG1 and pHγG1 plasmids. Sequences read from the autoradiographs are presented in the RNA form of the coding strand of the inserted DNA. Nucleotides shown underlined signify an ambiguity at this position. Nucleotides lying within the restriction sites of EcoRI and HindIII are deduced from the known specificity of these restriction endonucleases26.

Received 23 December 1977; accepted 20 April 1978.

1. Weatherall, D. J. & Clegg, J. B. The Thalassaemia Syndromes, 2nd edn (Blackwell, Oxford, 1972).
2. Wood, W. G. Br. med. Bull. 32, 282-287 (1976).
3. Clegg, J. B. & Weatherall, D. J. Lancet ii, 133-135 (1974).
4. Ottolenghi, S. et al. Cell 9, 71-80 (1976).
5. Tolstoshev, P. et al. Nature 259, 95-98 (1976).
6. Comi, P. et al. Eur. J. Biochem. 79, 617-622 (1977).
7. Kan, Y. W., Holland, J. P., Dozy, A. M. & Varmus, H. E. Proc. natn. Acad. Sci. U.S.A. 72, 5140-5144 (1975).
8. Conconi, F. et al. Nature 254, 256-259 (1975).
9. Weatherall, D. J. & Clegg, J. B. A. Rev. Genet. 10, 157-178 (1976).
10. Lodish, H. F. & Jacobson, M. J. biol. Chem. 247, 3622-3629 (1972).
11. Ottolenghi, S. et al. Nature 251, 389-392 (1974).
12. Kan, Y. W. et al. Nature 255, 255-256 (1975).
13. Armstrong, K. A., Hershfield, V. & Helinski, D. R. Science 196, 172-174 (1977).
14. Grunstein, M. & Hogness, D. S. Proc. natn. Acad. Sci. U.S.A. 72, 3961-3965 (1975).
15. Maizels, N. Cell 9, 431-438 (1976).
16. Ottolenghi, S. et al. Proc. natn. Acad. Sci. U.S.A. 72, 2294-2299 (1975).
17. Rabbitts, T. H. Nature 260, 221-224 (1976).
18. Rougeon, F., Kourilsky, P. & Mach, B. Nucleic Acids Res. 2, 2365-2378 (1975).
19. Efstratiadis, A., Kafatos, F. C., Maxam, A. M. & Maniatis, T. Cell 7, 279-288 (1976).
20. Maniatis, T., Kee, S. G., Efstratiadis, A. & Kafatos, F. C. Cell 8, 163-182 (1976).
21. Wilson, J. T., Forget, B. G., Wilson, C. B. & Weissman, S. M. Science 196, 200-202 (1977).
22. Williams, R. E. O. Report of the Working Party on the Practice of Genetic Manipulation (HMSO, London, 1976).
23. Williamson, R. Br. med. Bull. 32, 246-250 (1976).
24. Old, J. et al. Cell 8, 13-28 (1976).
25. Rougeon, F. & Mach, B. J. biol. Chem. 252, 2209-2217 (1976).
26. Roberts, R. J. CRC Critical Rev. Biochem., 123-164 (1976).
27. Marotta, C. A., Wilson, J. T., Forget, B. G. & Weissman, S. M. J. biol. Chem. 252, 5040-5053 (1977).
28. Hofstetter, H., Schamböck, A., Van Den Berg, J. & Weissmann, C. Biochim. biophys. Acta 454, 587-591 (1976).
29. Harrison, P. R., Birnie, G. D., Hell, A., Humphries, S., Young, B. D. & Paul, J. J. molec. Biol. 84, 539-554 (1974).
30. Jackson, J. F., Tolstoshev, P., Williamson, R. & Hendrick, D. Nucleic Acids Res. 3, 2019-2026 (1976).
31. Laskey, R. A. & Mills, A. D. FEBS Lett. 82, 314-316 (1977).
32. Maxam, A. M. & Gilbert, W. Proc. natn. Acad. Sci. U.S.A. 74, 560-564 (1977).
33. Schroeder, W. A. et al. Proc. natn. Acad. Sci. U.S.A. 60, 537-544 (1968).
34. Southern, E. M. J. molec. Biol. 98, 503-517 (1975).
35. Wilson, J. T. et al. Nucleic Acids Res. 5, 563-581 (1978).

Printed in Great Britain by Henry Ling Ltd , at the Dorset Press, Dorchester, Dorset