Towards A Mouse Model

for Haemophilia

Raechel Kenny

Haemostasis Research Group,

Clinical Research Centre, Harrow.

Submitted for the degree of Ph.D.

1992

ProQuest Number: 10609858. Published by ProQuest LLC (2017). Copyright of the Dissertation is held by the Author.

ABSTRACT OF THESIS

TOWARDS A MOUSE MODEL FOR HAEMOPHILIA

Haemophilias A and B are X-linked recessive bleeding disorders caused by deficiencies of coagulation factors VIII and IX respectively. Present treatment for the disease, by replacement therapy with products extracted from large volumes of pooled blood, carries a high risk of infection. The aim of this project is to develop a mouse model for haemophilia, in which proposed gene therapy protocols can be studied. To do this I have proposed to use homologous recombination to disrupt the endogenous factor VIII and IX genes in murine embryonic stem cells. Correctly targeted cells would be introduced into a developing mouse embryo where they would contribute to the germline and subsequently to a line of transgenic, haemophilic mice.

As a first step towards this model, genomic bacteriophage libraries have been screened for the murine factor VIII and IX genes. Five bacteriophage clones were isolated and shown to contain the mouse factor IX gene. This locus has subsequently been characterised by dideoxy-sequencing and bacteriophage mapping methods. The mouse factor IX gene spans more than 53kb and shares, in the coding regions, 86% sequence identity with the human gene. The structure of mouse factor IX is discussed.

No factor VIII-containing clones have been isolated, although sequences thought to be part of the murine factor VIII gene have been amplified by reverse transcription/PCR.

Targeting constructs have been produced which contain part of the mouse factor IX gene disrupted by the selectable Neo cassette, and the HSV-tk gene which will allow the use of positive/negative selection procedures. These constructs have been introduced into the Embryonic Stem Cell line, E14, by electroporation, but no targeting events have yet been identified.

TABLE OF CONTENTS

ABSTRACT OF THESIS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
AIMS

1 GENERAL INTRODUCTION
1.1 HAEMOPHILIA
1.2 THE COAGULATION CASCADE
1.3 THE FACTOR IX PROTEIN
1.3.1 Synthesis and Structure of Factor IX
1.3.2 Homology
1.4 THE STRUCTURE OF HUMAN FACTOR VIII
1.5 THE FACTOR IX GENE
1.5.1 The factor IX gene Promoter
1.5.2 Exon Arrangement
1.5.3 Introns And Repetitive Elements
1.5.4 Splice Sites
1.5.5 3' Untranslated Region
1.6 THE FACTOR VIII GENE
1.7 MUTATIONS
1.7.1 Population Genetics
1.7.2 Mutational Hotspots
1.8 TREATMENT
1.8.1 Current Methods
1.8.2 Future Prospects

2. TRANSGENESIS
2.1 INTRODUCTION OF DNA INTO EMBRYOS
2.1.1 Retroviral Infection
2.1.2 Microinjection
2.2 STEM CELLS
2.2.1 Embryonal Carcinoma Cells
2.2.2 Embryonic Stem Cells

3. HOMOLOGOUS RECOMBINATION
3.1 HOMOLOGOUS RECOMBINATION IN MAMMALIAN CELLS
3.1.1 Chromosomal Targeting
3.2 FACTORS AFFECTING THE FREQUENCY OF HOMOLOGOUS RECOMBINATION
3.2.1 Vector DNA
3.2.2 Cell Cycle
3.2.3 Length of Homology
3.2.4 Targeting of Non-expressed Genes
3.3 INSERTION AND REPLACEMENT VECTORS
3.4 METHODS OF TRANSFECTION
3.5 SELECTION PROCEDURES
3.5.1 Enrichment
3.5.2 Selection
3.6 THEORETICAL MECHANISMS
3.7 GERMLINE TRANSMISSION AND EXPRESSION OF A GENE CORRECTED BY HOMOLOGOUS RECOMBINATION IN EMBRYONIC STEM CELLS

4. MATERIALS AND METHODS
4.1 MOLECULAR BIOLOGY
4.1.1 Techniques
4.1.2 Buffers
4.2 TISSUE CULTURE
4.2.1 Techniques
4.2.2 Buffers and Stock Solutions for Tissue Culture

5. ISOLATION OF THE MOUSE FACTOR VIII GENE
5.1 RESULTS
5.1.1 Library Screening
5.1.2 Reverse Transcription/PCR
5.1.3 Amplification from Genomic DNA
5.1.4 Direct Sequencing
5.1.5 Cloning of PCR Products
5.2 DISCUSSION
5.2.1 Library Screening
5.2.2 Reverse Transcription/PCR
5.2.3 Problems of Contamination
5.2.4 Direct Sequencing
5.2.5 Cloning PCR Products
5.2.6 Recent Developments

6. ISOLATION AND CHARACTERISATION OF THE MOUSE FACTOR IX GENE
6.1 RESULTS
6.1.1 Isolation of the Mouse Factor IX Gene
6.1.2 Mapping of the Mouse Factor IX Locus
6.1.3 Attempts to Isolate the Uncloned Regions of the Mouse Factor IX Gene
6.1.4 Sequencing of the Mouse Factor IX Gene
6.2 DISCUSSION
6.2.1 Bacteriophage Isolation
6.2.2 The Mouse Factor IX Gene
6.2.3 Promoter Region
6.2.4 Transcription Start Point
6.2.5 Translation
6.2.6 3' End of the Mouse Factor IX Gene
6.2.7 Attempts to Isolate the Uncloned Regions of the Mouse Factor IX Gene

7. TARGETING VECTOR CONSTRUCTION AND ELECTROPORATION EXPERIMENTS
7.1 RESULTS
7.1.1 Targeting Vector
7.1.2 Determination of Optimal Conditions for ES Cell Growth and Electroporation
7.1.3 Gene Targeting Experiments
7.2 DISCUSSION
7.2.1 Stem Cell Culture
7.2.2 Transfection Results
7.2.3 Factors Affecting the Frequency of Homologous Recombination
7.2.4 Alternative Strategies for Gene Targeting

FUTURE WORK
REFERENCES

APPENDICES
I List of Abbreviations
II List of Suppliers of Reagents
III Addresses of Suppliers
IV Buffers for Restriction Endonuclease Digests
V 1kb Ladder
VI Oligonucleotide Sequences

LIST OF FIGURES

1 Photograph of Haemophilia Patient
2 The Coagulation Cascade
3 Amino Acid Sequence and Structure of Human Factor IX
4 Sequence Homology of the Propeptides of the Vitamin K-dependent Proteins
5 Domain Structure and Processing of Human Factor IX
6 Domain Structure and Processing of Human Factor VIII
7 Map of the X Chromosome
8 The Human Factor IX Gene
9 The Human Factor VIII Gene
10 Disruption of the HPRT Gene by Gene Targeting
11 The Positive/Negative Selection Procedure
12 Strategy for Detection of Homologous Recombinants by the Polymerase Chain Reaction
13 The Holliday Model for Genetic Recombination
14 The Meselson-Radding Model
15 The Double Strand Break Repair Model
16 Outline of Proposed Strategy for Generating a Mouse Model for Haemophilia B (or A)
17 Genomic Southern Blot for Factor VIII
18 Diagram of Exons 8 and 9 of the Human Factor VIII Gene
19 Human Factor VIII cDNA Amplification
20 Mouse Factor VIII cDNA Amplification
21 RsaI Endonuclease Digestion of PCR Products
22 Amplification of Factor VIII Genomic DNA
23 Factor VIII Sequencing Data
24 M8 Sequencing Data
25 M8 Genomic Southern Blot
26 Factor IX Genomic Southern Blot
27 Identification of Factor IX Exons
28 Restriction Maps of the Subclones of the Mouse Factor IX Gene
29 Restriction Map of the Mouse Factor IX Locus
30 The Cloning of Exon 1 by DNA Amplification (strategy)
31 Exon 1 Amplification
32 Southern Blot of Bacteriophage DNA
33 Genomic Southern Blot with Hind 3 Fragment of #326
34 Amplification of Exons 6 and 7
35 Sequence of the Mouse Factor IX Gene
36 Amino Acid Sequence of the Factor IX Protein
37 Diagram of the pMC1Neo and pMC1tk Vectors
38 Diagram of Targeting Constructs #440 and #444
39 Construction of Targeting Vector #444
40 Cystic Embryoid Body
41 E14 Embryonic Stem Cells
42 Results of G418 Toxicity Tests
43 Southern Blot Analysis of Selected ES Clones

LIST OF TABLES

1 Factor IX Exon Arrangement
2 Factor IX Introns and Repetitive Elements
3 Human Factor IX Splice Junction Sequences
4 Mouse Factor IX Splice Junction Sequences
5 Transfection Results Under Different Electroporation Conditions
6 Transfection Results

Acknowledgements

I should like to thank all my colleagues at the Haemostasis Research Group for their help, support and entertainment during my time at the CRC. Thanks also to Dr. Ted Tuddenham, Dr. Liz Simpson, and Dr. Anne McLaren for reading my manuscripts and for their sage advice. Particular thanks are also offered to my supervisor, Dr. John McVey, for all his help and discerning guidance.

A special thank you also to David, for all his support and patience, especially during the writing of this thesis.

Aims of this Project

My aims at the outset of this project were as follows:

1. To isolate and clone the mouse factor VIII and/or IX genes.

2. To characterise the cloned mouse gene(s) by DNA sequencing and restriction endonuclease mapping.

3. To obtain and culture an embryonic stem (ES) cell line in vitro without cellular differentiation.

4. To construct a vector which would be suitable for targeting the mouse factor VIII/IX gene, and which would allow the use of positive/negative selection.

5. To introduce the vector into the ES cells by electroporation. To select for cell clones that have undergone a targeting event, and to characterise the clones.

6. To inject the cloned, targeted cells into a host blastocyst and obtain a germline chimaera.

7. Ultimately, to breed the chimaera and generate offspring carrying a null allele for factor VIII/IX.

CHAPTER 1

GENERAL INTRODUCTION

1 GENERAL INTRODUCTION

Haemophilia is an X-linked recessive bleeding disorder which exists in two forms, A and B. Both forms of haemophilia result in painful haemorrhaging into the joints and muscles. Such bleeding usually occurs in the large, weight-bearing joints such as knees, elbows, ankles and shoulders, and can cause long term problems; recurrent bleeding can lead to arthritis and degenerative changes in the joint structure and to severe deformity and ankylosis. Other complications include bleeding into the kidneys, which can cause permanent impairment of renal function, and intracranial bleeding which can be lethal if not quickly treated.

Figure 1. Photograph of haemophilia patient. Note the long-term damage to joints, especially of the legs.

Present treatment for haemophilia, by replacement therapy with products extracted from large volumes of pooled blood, carries a high risk of infection, and recently developed recombinant products may prove too expensive for normal use. In order to devise new treatments, such as gene therapy, it is necessary to have a suitable animal model on which to test the proposed protocols. The ultimate aim of this project is to develop mouse models of haemophilia A and B which can be used for gene therapy experiments.

1.1 HAEMOPHILIA

Haemophilia has been recognised for many hundreds of years, and as early as 1937 Patek and Taylor (1) had correctly described the function of a clotting factor which is deficient in classical haemophilia, or haemophilia A. This factor is now known as factor VIII. In 1947 Haldane and Smith (2) published a linkage study between haemophilia and colour blindness and suggested that there were, in fact, two forms of haemophilia. This idea, however, was not accepted until 1952, when Aggeler et al (3) and Biggs et al (4) independently defined a second clotting factor. They named the factor "plasma thromboplastin component" and "Christmas factor" (after a young patient called Christmas) respectively, and demonstrated that although the patients deficient in this factor showed symptoms similar to those of classical haemophilia, their prolonged plasma coagulation times could not be corrected by a fraction of the plasma known to be enriched for "antihaemophilic factor" (factor VIII). Haemophilic plasma and plasma from the new category of patients could, however, correct each other's coagulation times. Biggs et al also showed that, like classical haemophilia, Christmas disease was inherited in a sex-linked recessive manner. Christmas disease is now known as haemophilia B, and occurs at a frequency of 1 in 30,000 males, compared with haemophilia A, which occurs in 1 in 6,000 males. Christmas factor is now known as factor IX.
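The male frequencies quoted above can be related to allele and carrier frequencies with a short calculation. The sketch below is illustrative only: it assumes simple Hardy-Weinberg proportions for an X-linked recessive allele, ignoring selection and new mutation, which is a simplification not made in the text. The function name `x_linked_recessive` is hypothetical.

```python
from fractions import Fraction


def x_linked_recessive(male_incidence: Fraction) -> dict:
    """Hardy-Weinberg expectations for an X-linked recessive allele.

    Assumption (not from the text): no selection and no new mutation.
    Males carry one X chromosome, so the incidence in males equals the
    allele frequency q. Females carry two, so carriers are expected at
    2q(1 - q) and affected homozygotes at q squared.
    """
    q = male_incidence
    return {
        "allele_frequency": q,
        "carrier_females": 2 * q * (1 - q),
        "affected_females": q * q,
    }


# Male frequencies as quoted in the text.
haemophilia_a = x_linked_recessive(Fraction(1, 6000))
haemophilia_b = x_linked_recessive(Fraction(1, 30000))

# How much more common haemophilia A is than haemophilia B in males.
ratio_a_to_b = (
    haemophilia_a["allele_frequency"] / haemophilia_b["allele_frequency"]
)
```

Under these assumptions haemophilia A is five times as frequent as haemophilia B in males, and affected females are expected to be very rare for either form.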

1.2 THE COAGULATION CASCADE

Figure 2 is a diagrammatic representation of the coagulation cascade, including the relative positions of factors VIII and IX. The cascade is a complex series of conversions of inactive zymogens to active serine proteases. More than thirty enzymes, cofactors and inhibitors are involved in the cascade, which generates sequentially increasing amounts of the factors further down the cascade, and terminates in the formation of an insoluble fibrin clot from soluble fibrinogen monomers. (For a recent review see ref. 5)

It is now widely accepted that coagulation is initiated in vivo by the exposure of blood to tissue factor on cell surfaces, due to injury or damage of the vascular tissues. Tissue factor (TF) is a cellular receptor and cofactor for coagulation factor VII/VIIa (7). It is expressed on the surface of many cell types but not on those normally in contact with the circulation, such as endothelial cells. Factor VII binds to TF and is rapidly activated. It is not yet known how factor VII is activated in vivo, although Zur et al (8) suggested that "non-activated" factor VII has a small amount of activity which, in association with TF, will initiate the cascade. More recently it has been shown that factor VII can be activated by an autocatalytic mechanism following complex formation with TF (9). Together, factor VIIa and TF form a potent catalytic complex, which is located on the phospholipid cell surface, and which activates the zymogen form of factor IX (10, 11) by cleaving the single chain glycoprotein into the two chain serine protease, factor IXa.

Factor IX can also be activated by factor XIa (12, 13), but since deficiencies of factor XI cause only a mild bleeding disorder, this reaction is thought to be of secondary importance, perhaps only involved when there is severe injury. (In contrast, severe deficiencies of factor VII are thought to cause death in utero (14).) In vitro studies have shown that factor XI can be activated by factor XII and prekallikrein (15), but it has also more recently been shown to be activated by thrombin (16), which would explain the total absence of bleeding disorders where there is a deficiency in factor XII or prekallikrein.

Figure 2. The Coagulation Cascade.

Diagrammatic representation of the coagulation cascade: the major regulatory pathways (6).

KEY: a = activated; i = inactivated; TF = tissue factor; APC = activated protein C; PS = protein S.

Factor IXa and its cofactor, factor VIII, which is activated by thrombin cleavage (17), form a complex, with calcium, on the phospholipid cell surface, and this catalyses the activation of factor X (18). This is thought to be the major route for factor X activation since deficiencies of factor IX and factor VIII result in haemophilia. The TF/factor VIIa complex is, however, also capable of activating factor X directly in what may be an accessory loop (19).

The addition of phospholipid to the factor IXa/factor X mixture in the presence of calcium increases the rate of activation of factor X slightly by decreasing the Km of the enzyme for the substrate. The addition of factor VIIIa to the reaction mixture, however, increases the velocity of factor X activation more than 4000-fold (18, 20). It is not known how factor VIIIa mediates this increase in reaction rate but it is thought to facilitate the correct assembly of the reaction complex on the cell surface. Specific receptors for factor IXa on endothelial cell surfaces have also been reported (21-24) and may serve to promote the assembly of the factor X-activating reaction complex on the phospholipid surface and to localize the coagulation process.
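The difference between a change in Km and a change in maximal velocity can be illustrated with the Michaelis-Menten rate equation. The sketch below is not taken from the cited studies: all parameter values are hypothetical and in arbitrary units, the modest drop in Km stands in for the effect of phospholipid, and the cofactor effect is modelled, purely as an assumption, as a 4000-fold increase in Vmax.

```python
def michaelis_menten_rate(substrate: float, vmax: float, km: float) -> float:
    """Initial reaction velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


# Hypothetical values in arbitrary units, chosen only for illustration.
SUBSTRATE = 0.1   # factor X concentration, well below Km
BASE_VMAX = 1.0
BASE_KM = 10.0

base_rate = michaelis_menten_rate(SUBSTRATE, BASE_VMAX, BASE_KM)

# A modest decrease in Km (assumed here to represent added phospholipid).
lower_km_rate = michaelis_menten_rate(SUBSTRATE, BASE_VMAX, 8.0)

# A 4000-fold increase in Vmax (assumed here to represent added cofactor).
higher_vmax_rate = michaelis_menten_rate(SUBSTRATE, 4000.0 * BASE_VMAX, BASE_KM)

fold_from_km = lower_km_rate / base_rate
fold_from_vmax = higher_vmax_rate / base_rate
```

With these illustrative numbers, lowering Km gives only a small gain in rate at sub-saturating substrate, whereas scaling Vmax scales the rate by the same factor at any substrate concentration.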

Factor Xa, in a surface complex with its cofactor, Va, catalyses the conversion of prothrombin to thrombin (25). Thrombin then cleaves soluble fibrinogen molecules, which form an insoluble fibrin clot, and further activates the cofactors V and VIII, and factor XI in a positive feedback loop. Factor Xa also activates factor VII in a positive feedback loop resulting in the generation of more factor IXa (26).

Thrombin also initiates the anticoagulant pathway by binding to an endothelial cell surface receptor, thrombomodulin (27). The thrombin-thrombomodulin complex then activates its substrate, protein C. Activated protein C, in a complex with its cofactor, protein S, proteolytically inactivates the coagulant cofactors Va and VIIIa, thus limiting the coagulation process. Inactivation of factor IX is thought to occur by forming a complex with antithrombin III, which is then cleared by the liver (28), or by proteolysis. The details of this proteolysis are unknown, although factor IX has a very rapid turnover rate.

1.3 THE FACTOR IX PROTEIN

Factor IX is present in normal human plasma at a concentration of about 3 µg/ml but was first purified to homogeneity from bovine blood (29) and was found to be a single chain glycoprotein with a molecular weight of 54,000 Da. Osterud et al (30) later isolated human factor IX, using similar procedures to those of Fujikawa et al, as a single chain glycoprotein with a molecular weight of 57,000 Da. Katayama et al (31) later published the complete amino acid sequence of bovine factor IX, obtained by Edman degradative sequencing of tryptic peptides. Since then the structure and function of bovine and human factor IX proteins have been well characterised, and those of several other species such as dog, mouse, pig and sheep have been studied to a more limited degree.

Figure 3. Amino Acid Sequence and Structure of Human Factor IX.

The solid arrows indicate the sites of the activation cleavages.

1.3.1 Synthesis and Structure of Factor IX

Factor IX is synthesized in the liver as an inactive zymogen consisting of eight domains (see figure 5). The two most amino terminal domains make up the leader peptide, which is cleaved prior to the secretion of the 415 amino acid mature protein. The first of these domains, the predomain or signal peptide, contains a group of hydrophobic amino acids spanning residues -46 to -19 which are characteristic of secreted proteins. This "signal sequence" binds to the Signal Recognition Particle (SRP) in the cytoplasm. The complex is then targeted to the membrane of the endoplasmic reticulum (ER) by virtue of the SRP's affinity for a membrane receptor, known as the Docking Complex. The immature protein is translocated, via another complex, through the membrane, into the ER, where the signal sequence is cleaved and the protein takes up its mature conformation (32).

The second domain of the leader peptide, spanning amino acids -18 to -1, is called the prodomain. The amino acids in the prodomain show a high degree of sequence identity with other members of the vitamin K-dependent family of blood coagulation proteases which include prothrombin, factor VII, factor X, protein C and protein S. The activity of all of these proteins is dependent on the gamma-carboxylation of glutamic acid residues in one of the other protein domains (33). Amino acid residues Phe-16, Ala-10, Arg-4, and Arg-1 in this domain are invariant between the vitamin K-dependent proteins and several other residues are also highly conserved (see figure 4). It is now known that this region contains a signal which directs the microsomal γ-carboxylase to the first 12 glutamic acid residues of the mature protein, since recombinant factor IX lacking the propeptide is not γ-carboxylated and has reduced coagulation activity (33, 34).

Figure 4. Sequence Homology of the Propeptides of the Vitamin K-dependent Proteins.

The amino-termini of the mature, secreted proteins are aligned at residue 1. The propeptide of factor IX extends from residues -18 to -1. The homologous regions of the other vitamin K-dependent proteins have been aligned (34).

The most N-terminal domain of the mature protein is the "gla" region. This domain, consisting of amino acids 1 to 41, contains the twelve glutamic acid residues which are converted to γ-carboxyglutamic acid (gla) residues by a vitamin K-dependent carboxylase. This post-translational modification occurs in the microsomes of the hepatocytes before protein secretion. The reaction is biochemically coupled to the oxidation of vitamin K and is inhibited by coumarin derivatives such as warfarin (34).

The gla residues provide most of the low affinity calcium binding activity of factor IX. It has been shown (35-38) that in the presence of calcium, the gla region in prothrombin takes up an ordered conformation with an alpha helical configuration at the carboxyl end. In the absence of calcium the region tends to be disordered. It is thought that this calcium-induced conformation also occurs in factor IX and is necessary for the interaction of factor IX with phospholipid membranes (39, 40). It has also been shown that cooperative binding of calcium to sixteen low affinity (dissociation constant (Kd) = 0.6 ± 0.1 mM) sites in the gla domain significantly increased the rate of factor IX activation by factor XIa (41). Sinha et al (42) suggested that this conformation allowed the binding of factor IX to a site on the heavy chain of factor XIa, necessary for optimal factor IX activation.

The second domain of the mature protein, at amino acids 41-47, consists of a short hydrophobic region or aromatic amino acid stack which is highly conserved amongst gla-containing proteins and contains the consensus sequence Phe-Trp-N-N-Tyr (where N = any amino acid residue). This may be involved in binding either to cell membranes or to other proteins (43).

The next two domains of the factor IX protein, amino acids 48 to 85 and 86 to 145, bear limited homology to Epidermal Growth Factor (44, 45). These EGF-like domains also occur in factors VII, X, XII, and protein C (46). In particular, factors IX, X and protein C all contain two tandemly arranged EGF domains, the most N-terminal of which contains a β-hydroxylated aspartic acid residue (Hya). In factor IX this modification occurs in only 30% of the molecules, at residue 64 (47). These Hya-containing domains contain high affinity (Kd 10-100 µM) calcium binding sites independent of the gla regions (48, 49). The hydroxylated Asp 64 is involved in the Ca2+ binding activity, as are Glu 50, Asp 47 and Asp 49 (45, 50), but the β-hydroxylation modification is not a necessary requisite for such binding (49). Calcium binding to this region causes a conformational change in the factor IX molecule which may serve to alter the relative orientation of the molecule with the membrane surface on which it acts (51), or form a bridge with other EGF domains (50). Ryan et al (52) found that the first EGF domain contributes to the binding of factor IX to the endothelial cell receptor. By using factor IX/X chimaeric proteins instead of synthetic peptides in similar experiments, Cheung et al (53), however, concluded that the EGF domain was not involved in these activities.

The function of the second EGF domain in factor IX is not known but it has been suggested that it may be involved in protein and cell membrane interactions.

The sixth domain consists of the activation peptide which is cleaved from the mature protein during the activation of factor IX, by either factor XIa (13) or the TF-factor VIIa complex (10). These cleavages occur between arginine 145 and alanine 146, and between arginine 180 and valine 181. The Arg-Ala bond is thought to be cleaved first, resulting in a two chain intermediate form. The activation peptide is then released by cleavage of the Arg-Val bond, leaving the factor IX protein as a heavy chain of 28,000 Da and a light chain of 18,000 Da (in the human protein) linked by a disulphide bridge between Cys 132 and Cys 289 (54).

The excised activation peptide is acidic in nature and contains N-linked glycosylation sites. Balland et al (55) suggested that many of the sugars contained in factor IX may be N-linked. Although the function of these sugars is unknown it has been suggested that they might protect the protein from non-specific proteases which may otherwise activate the protein (56). The activation peptide is the least conserved of the factor IX domains and contains the only known site where a protein polymorphism can occur without any effect on the function of factor IX (57).

The seventh domain of factor IX is the serine protease or catalytic domain. This contains the characteristic triad of residues that constitute the charge relay system; histidine, aspartate and serine, plus an acidic residue at the bottom of the substrate binding pocket. In factor IX these elements are at positions 221, 269, 365 and 359 respectively. Other regions of this domain, which are not conserved among the serine proteases, are thought to contribute to the high degree of substrate specificity of the factor IX enzyme.

For a review of the structures of the domains of the coagulation proteins see ref. 58.
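The domain boundaries described in this section can be collected into a small lookup table. The sketch below is illustrative only. The residue ranges for the leader peptide, gla, aromatic stack and EGF-like domains are those quoted in the text. The ranges for the activation peptide (146 to 180) and the catalytic domain (181 to 415) are assumptions inferred from the cleavage sites and the 415 amino acid length of the mature protein. The names `Domain`, `FACTOR_IX_DOMAINS` and `domains_at` are hypothetical.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, numbered relative to the mature protein
    end: int    # last residue, inclusive


# Ranges as quoted in the text, except the last two, which are inferred
# from the activation cleavage sites (an assumption, see lead-in).
FACTOR_IX_DOMAINS = [
    Domain("signal peptide (predomain)", -46, -19),
    Domain("prodomain", -18, -1),
    Domain("gla domain", 1, 41),
    Domain("aromatic amino acid stack", 41, 47),
    Domain("first EGF-like domain", 48, 85),
    Domain("second EGF-like domain", 86, 145),
    Domain("activation peptide", 146, 180),
    Domain("catalytic (serine protease) domain", 181, 415),
]


def domains_at(residue: int) -> List[str]:
    """Return the names of all domains whose quoted range covers a residue.

    A list is returned because the quoted ranges for the gla domain and
    the aromatic stack both include residue 41.
    """
    if residue == 0:
        raise ValueError("numbering runs from -1 to +1; there is no residue 0")
    return [d.name for d in FACTOR_IX_DOMAINS if d.start <= residue <= d.end]


# The catalytic triad positions given in the text.
triad_domains = {pos: domains_at(pos) for pos in (221, 269, 365)}
```

For example, `domains_at(64)` places the β-hydroxylated aspartic acid residue in the first EGF-like domain, and all three catalytic triad positions fall within the inferred catalytic domain range.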

Figure 5. Domain Structure and Processing of Human Factor IX.

During secretion the signal peptide is removed from the pre-pro-factor IX zymogen. Mature but inactive factor IX is present in the circulation until either factor XIa or the TF/factor VIIa complex performs the activation cleavages at amino acids 145 and 180. The activation peptide is released and factor IXa exists as a two-chain protein joined by disulphide bonds.

1.3.2 Homology

The intron-exon organisation of factors IX, X, VII and protein C is identical and the organisation of the first three exons of prothrombin is also identical. It appears that factors IX, X, VII and protein C are the homologous products of a recent duplication of a common precursor gene, which is distantly related to the precursor gene of prothrombin.

The family of vitamin K dependent proteins, including factor IX, share a high degree of sequence identity within their amino-terminal regions. The catalytic domain of factor IX, and those of the other serine proteases involved in coagulation, also shows homology with serine proteases such as trypsin and chymotrypsin. This is greatest around the residues involved in the catalytic triad (31, 59-61).

1.4 THE STRUCTURE OF HUMAN FACTOR VIII

Factor VIII is present in the plasma at a concentration of about 0.1 to 0.2 p,g/ml. It is a large protein with a relative molecular weight of about 300,000, and circulates in the plasma bound to a multimeric glycoprotein called von Willebrand's Factor (vWF). The low plasma concentration and inherent instability have made factor VIII a difficult protein to study. Isolation of the protein by its affinity to a monoclonal antibody has (62), however, enabled peptide fragments to be characterised, and the sequence information obtained has been used to isolate cDNA and genomic clones from DNA libraries.

Unlike the vitamin K-dependent proteins, factor VIII is not produced exclusively in the liver. Although hepatocytes are the major sites of synthesis, factor VIII mRNA has also been detected in the, kidney, spleen, lymph nodes and muscle (63) .

Factor VIII is a multidomain protein with a triplicated A domain, separated by a highly glycosylated B domain, and a duplicated C domain (see figure 6) . The three A domain repeats at amino acids 1-329, 380-711, and 1649-2019 show 30% amino acid sequence identity with the copper binding protein, ceruloplasmin (64). This may imply similar binding properties for factor VIII. The B domain is cleaved from the rest of the protein upon activation and is not necessary for procoagulant activity (65) . The two C domains, each consisting of 150 amino acids constitute the carboxy-terminus of the protein. It has been suggested that they may be involved in cell adhesion or recognition as they show some homology to discoidin lectin (66) .

Like many of the proteins of the coagulation cascade, factor VIII is synthesized as a single chain precursor protein, including a 19 amino acid leader peptide which is cleaved when the protein translocates into the endoplasmic reticulum. Analysis of the sequence of this precursor indicates the presence of 25 potential asparagine-linked glycosylation sites, at least some of which are known to be occupied (67). After or during secretion into the plasma, limited proteolysis at a number of sites converts the precursor into a mature two-chain protein. This circulates as a heterogeneous mixture of heterodimers due to variable B domain processing. Thrombin or factor Xa activates the mature protein by cleaving the peptide bonds between Arg 372 and Ser 373, and between Arg 1689 and Ser 1690 (17). This results in the release of activation peptides of varying sizes. Factor VIII is inactivated by activated protein C, which cleaves between Arg 336 and Met 337 (17).

For a review of the structure and function of factor VIII see ref. 68.

Figure 6. Domain Structure and Processing of Human Factor VIII.

Diagram of the factor VIII protein showing the domain structure and the cleavage sites (indicated by arrows) which lead to activation and inactivation of the protein. Proteolysis within the 95kDa fragment results in a heterogeneous two-chain circulating protein. Cleavages at amino acids 372 and 1689 by either thrombin or factor Xa activate factor VIII. Activated protein C then inactivates factor VIII by cleaving at amino acid 336.

1.5 THE FACTOR IX GENE

Chance et al (69) localized the human factor IX gene to the tip of the long arm of the X chromosome by hybridizing the human cDNA to genomic DNA from a panel of human-rodent hybrid cell lines. These cell lines consisted of different human X chromosomes carrying a variety of translocations on a mouse or hamster background. The cDNA hybridized to all of the cell lines, and the only fragment common to all of them was the Xq27-qter region. In situ hybridization of the cDNA to human metaphase chromosomes also showed hybridization to the terminal region of the long arm of the X chromosome. Other authors, using similar techniques on a slightly different panel of X chromosomes, have since mapped the human factor IX gene to Xq26-q27 (70-72), and it is usually quoted as lying between Xq26.3 and Xq27.1 (73; see figure 7). Mullins et al (74) localized the mouse factor IX gene to a syntenic position (Xpl.21) on the mouse X chromosome by RFLP and Southern blot analysis.

Figure 7. Map of the X Chromosome, showing the positions of the factor VIII and factor IX genes.

[Diagram: loci shown between the centromere and the telomere of the long arm (q26 to q28) are factor IX, fragile X, G6PD, factor VIII and PDCB, with marked intervals of approximately 13 cM, 30 cM and 15 cM.]

Key: G6PD, glucose-6-phosphate dehydrogenase; PDCB, red/green colour blindness; fragile X, X-linked mental retardation associated with a fragile site on the X chromosome.

The gene for human factor IX was cloned and subsequently characterised by several groups between 1982 and 1985 (75-79). cDNA fragments were isolated by screening either human or bovine tissue cDNA libraries with oligonucleotides based on the bovine amino acid sequence (31).

The mRNA for human factor IX is 2802 bases long and includes a short 5' non-coding region and a long 3' non-coding region of approximately 1.4kb. The cDNAs were then used to screen human genomic bacteriophage libraries (75, 76, 79). The entire gene, including intron, exon and flanking regions, has now been completely sequenced (79).

1.5.1 The Factor IX Gene Promoter

Experiments with chloramphenicol acetyl transferase reporter genes in transiently transfected cells suggest that the factor IX gene promoter lies between nucleotides -98 and +21 (80). Patients with mutations in this region suffer from a form of haemophilia known as Haemophilia B Leyden. This condition is characterised by low factor IX levels in childhood followed by a steady increase after puberty. The promoter also contains consensus sequences for the binding sites of the transcription factors NF1 (Nuclear Factor 1-Liver; at nucleotides -99 to -77) and C/EBP (CCAAT/Enhancer Binding Protein; at nucleotides +1 to +18) (80).

On the basis of S1 nuclease mapping, Anson et al (75) proposed a transcription initiation site at an adenine (nucleotide number 1), 29 base pairs upstream from the methionine (amino acid residue number -46) which they proposed as the translation initiation point. Salier et al (81), working with a chloramphenicol acetyl transferase (CAT) reporter construct, found a transcription initiation site at nucleotide -150. Reijnen et al (82) report initiation sites at nucleotides +4 and +30, as well as +1, but not at -150, suggesting that the site at -150 was an artifact of the CAT constructs.

No definitive TATA boxes (83) or CAAT boxes (84) have been found in the factor IX gene. Sequences at positions -26 to -34 (75), -265, -411 (79), and -187 (81) have, however, been proposed as possible TATA candidates, and the sequence ATTGG at nucleotide -92 as a potential CCAAT box (79). The gene also lacks a GC-rich region found in the promoter regions of many housekeeping genes.

Figure 8. The Human Factor IX Gene

Diagram of the factor IX gene and the translation product of the eight exons.

1.5.2 Exon Arrangement

Table 1. Factor IX Exon Arrangement

EXON    SIZE (bp)    ENCODING
a/1     117          5' non-coding region + predomain
b/2     164          Prodomain + 11 of the 12 gla residues
c/3     25           Twelfth gla residue + aromatic amino acid stack
d/4     114          First EGF domain
e/5     129          Second EGF domain
f/6     203          Activation peptide + 2 cleavage sites
g/7     115          First part of active site, including His 210
h/8     1935         Second part of active site, including Asp 270 and Ser 366; also the long 3' non-translated region

The gene for human factor IX is 35kb long and contains seven introns ranging in size from 188 nucleotides to 9475 nucleotides. The coding region consists of eight exons corresponding roughly to the protein domains. The sizes of these exons are shown in table 1, along with the corresponding protein domains.
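As a quick consistency check, the exon sizes in table 1 can be summed and compared with the mRNA length of 2802 bases quoted in section 1.5. The following is only a minimal sketch: the sizes are taken exactly as they appear in the table, and the names used (EXON_SIZES, total_exon_length) are illustrative and not part of the original work.

```python
# Exon sizes in base pairs, transcribed from table 1 (exons a/1 to h/8).
EXON_SIZES = {
    "a/1": 117,
    "b/2": 164,
    "c/3": 25,
    "d/4": 114,
    "e/5": 129,
    "f/6": 203,
    "g/7": 115,
    "h/8": 1935,
}

# Length of the human factor IX mRNA as quoted in the text (bases).
MRNA_LENGTH = 2802


def total_exon_length(sizes):
    """Return the summed length of all exons in base pairs."""
    return sum(sizes.values())


total = total_exon_length(EXON_SIZES)
print(f"Sum of exon sizes: {total} bp")
print(f"Matches quoted mRNA length: {total == MRNA_LENGTH}")
```

On these figures the eight exons account for the whole of the quoted mRNA length, which is what would be expected if the mRNA length excludes the poly(A) tail.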

1.5.3 Introns And Repetitive Elements

The sizes of the introns of the human factor IX gene, and the repetitive elements they contain, are listed in table 2. Both intron A and the 3' non-coding region contain sequences of alternating purines and pyrimidines. These could potentially form hairpin loops or a left-handed helical Z DNA structure (85). These Z DNA structures are often found at the 5' regions of eukaryotic genes, where they are thought to be involved in the uncoiling of DNA prior to transcription. They are also found in recombination hotspots where they may have a similar unwinding function (86, 87).

Table 2. Factor IX Introns and Repetitive Elements

NON-CODING REGION    SIZE (base pairs)    REPETITIVE ELEMENTS    REFERENCE
5'                   28                   Kpn, Hind 3            88, 89
INTRON A             6206                 Alu, pu/py             90
INTRON B             188                  /
INTRON C             3689                 /
INTRON D             7163                 Kpn                    88
INTRON E             2565                 /
INTRON F             9473                 3 X Alu                90
INTRON G             668                  /
3'                   1.4Kb                Alu, pu/py             90

1.5.4 Splice Sites

All the intron-exon boundaries were found to comply with the GT-AG rule of Breathnach and Chambon (91) and Shapiro and Senapathy (92; see table 3), and the splice sites occur either between amino acids (type 0) or after the first nucleotide of the triplet (type I; 93). Although the gene contains an additional six open reading frames, there is no evidence for any internal or overlapping genes.
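The GT-AG rule can be checked mechanically against the boundary sequences given in table 3. The sketch below is illustrative only: the sequences are transcribed from the table as reproduced here, the 3' ends of introns B and E are not legible in that table and are therefore left as None rather than guessed, and the division of the intron B entry into exon (ACA) and intron (GTGAGT) is an assumption made for this example.

```python
# Intron boundary sequences (5' end, 3' end) transcribed from table 3.
# None marks a 3' end that is not legible in the table; it is not guessed.
INTRON_ENDS = {
    "A": ("GTTTGT", "TTTCAG"),
    "B": ("GTGAGT", None),  # assumed split of the printed run ACAGTGAGT
    "C": ("GTAAGC", "TCAAAG"),
    "D": ("GTAAGT", "TTTTAG"),
    "E": ("GTCATA", None),
    "F": ("GTACTT", "TCACAG"),
    "G": ("GTAAAT", "TAATAG"),
}


def obeys_gt_ag(five_prime, three_prime):
    """Return True if the intron starts with GT and, where known, ends with AG."""
    starts_ok = five_prime.startswith("GT")
    ends_ok = three_prime is None or three_prime.endswith("AG")
    return starts_ok and ends_ok


results = {name: obeys_gt_ag(*ends) for name, ends in INTRON_ENDS.items()}
for name, ok in results.items():
    print(f"Intron {name}: {'complies' if ok else 'does not comply'}")
```

A sequence that did not begin with GT, for example a hypothetical donor site ATTTGT, would be reported as non-compliant by the same function.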

1.5.5 3' Untranslated Region

Translation initiation at Met -46 (75) results in an open reading frame of 1412 base pairs, ending with a threonine codon followed by the double stop codon UAAUGA. Poly(A) addition occurs at nucleotide number 32757, which is 16 nucleotides downstream from the consensus sequence AATAAA (94). A further 23 bases downstream is the sequence CATTG, which may correspond to the consensus sequence CA(T/C)TG thought to be involved in polyadenylation or splicing of the 3' end of the mRNA (95).

Table 3. Human Factor IX Splice Junction Sequences

INTRON            EXON    5' SPLICE JUNCTION         3' SPLICE JUNCTION       EXON    TYPE (93)
A                 1       CAG GTTTGT.....            TTTCAG T                 2       I
B                 2       ACA GTGAGT.....            A                        3       0
C                 3       TTG GTAAGC.....            TCAAAG A                 4       I
D                 4       TAG GTAAGT.....            TTTTAG A                 5       I
E                 5       CAG GTCATA.....            T                        6       I
F                 6       CAG GTACTT.....            TCACAG G                 7       0
G                 7       CAG GTAAAT.....            TAATAG G                 8       I
CONSENSUS (92)            (C/A)AG GT(A/G)AGT.....    (T/C)(T/C)N(C/T)AG G

1.6 THE FACTOR VIII GENE

The human factor VIII gene was isolated by two different groups (96,97) by making oligonucleotide probes corresponding to amino acid sequence information, which was obtained from either human or porcine purified protein. Gitschier et al (96) used the oligonucleotide probes to screen 500,000 bacteriophage in a genomic library from a 4X individual. Clones spanning 28Kb were initially isolated and then expanded to more than 200Kb by chromosome walking.

Toole et al (97) used their oligonucleotide probes to screen a porcine genomic bacteriophage library. From 4 x 10^5 recombinants screened, one hybridized to a selection of probes. This clone was then used to screen a human genomic bacteriophage library. 8 x 10^5 recombinants were screened and yielded one positive bacteriophage with an insert of approximately 16Kb. Confirmation that this clone contained part of the factor VIII gene was obtained by sequence analysis and Southern blotting techniques. The human genomic clone was then used to isolate the rest of the gene from a bacteriophage library from a 4X cell line.

Toole et al used their original human clone in Northern blot analysis to identify the liver as the major source of human factor VIII. They estimated that the level of factor VIII mRNA in the liver was 20-40 times lower than that of factor IX, and that libraries of more than 200,000 recombinants would be required to isolate a single cDNA clone. They eventually isolated 65 factor VIII cDNA clones after screening more than 3 million recombinants from two different libraries.

The human gene for factor VIII is located at the tip of the long arm of the X chromosome, at Xq28, and spans 186Kb. There are twenty-six exons ranging from 68-3,106 base pairs, encoding an mRNA of 9,029 base pairs. The twenty-five introns range in size from 207-32,400 base pairs; a total of 177Kb. There is a 5' non-coding region of 150 nucleotides, including the sequence GATAAA, 30 base pairs upstream from the mRNA start site, which is a possible TATA box. Translation initiation at the ATG at nucleotide +1 results in an open reading frame of 7,053 nucleotides, encoding 2,351 amino acids. There is also a 3' untranslated region of 1,805 nucleotides, including the polyadenylation signal, AATAAA.
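The relationship between the open reading frame and the protein length quoted above can be verified with simple arithmetic. This is a minimal sketch using only figures given in the text (7,053 nucleotides, and the 19 amino acid leader peptide described in section 1.4); the variable names are illustrative.

```python
# Figures quoted in the text for human factor VIII.
ORF_NUCLEOTIDES = 7053        # open reading frame length
LEADER_PEPTIDE_RESIDUES = 19  # leader peptide removed in the endoplasmic reticulum

# Three nucleotides encode one amino acid.
encoded_residues, leftover = divmod(ORF_NUCLEOTIDES, 3)

# Residues remaining once the leader peptide has been cleaved.
mature_residues = encoded_residues - LEADER_PEPTIDE_RESIDUES

print(f"Residues encoded by the ORF: {encoded_residues} (remainder {leftover})")
print(f"Residues after leader peptide removal: {mature_residues}")
```

The open reading frame divides exactly into codons, in agreement with the 2,351 amino acids stated in the text.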

Figure 9. The Human Factor VIII Gene

Diagram of the factor VIII gene locus showing exon arrangement in relation to the domain structure of the translation product.

1.7 MUTATIONS

The mutations causing haemophilia are highly heterogeneous. The great majority are point mutations involving one or just a few base pairs. The rest are due to gross deletions, insertions or rearrangements. Many of the mutations have been characterised in detail and have provided much information about the molecular pathology of haemophilias A and B (98, 99).

The coagulant activity of normal pooled plasma is defined as 1 U/ml. People with 5-50% of normal activity are classed as mild haemophiliacs, and generally only bleed excessively after major trauma or surgery. Moderate haemophiliacs have 1-5% of normal activity and bleed after minor trauma, whereas severe haemophiliacs have factor VIII or IX activity levels of less than 1% of normal and will bleed spontaneously.
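The severity bands described above can be expressed as a simple classification rule. The sketch below is illustrative rather than a clinical definition: the function name and the returned labels are invented for this example, and the treatment of the boundary values (exactly 1% counted as moderate, exactly 5% counted as mild) is an assumption, since the text gives overlapping ranges.

```python
def classify_severity(activity_percent):
    """Classify haemophilia severity from clotting factor activity.

    activity_percent is factor VIII or IX activity as a percentage of
    normal pooled plasma (defined as 1 U/ml, i.e. 100%).
    Boundary handling (1% -> moderate, 5% -> mild) is an assumption.
    """
    if activity_percent < 0:
        raise ValueError("activity cannot be negative")
    if activity_percent < 1:
        return "severe"
    if activity_percent < 5:
        return "moderate"
    if activity_percent <= 50:
        return "mild"
    return "above haemophilic range"


for level in (0.5, 3, 20, 80):
    print(f"{level}% of normal activity: {classify_severity(level)}")
```

On this scheme a rise in activity from below 1% to only a few per cent of normal is enough to move a patient from the severe to the moderate band.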

Some haemophilic patients produce antibodies when they are treated with the exogenous factor. It was found that many of the haemophilia B patients who developed antibodies had deletions in the gene for factor IX, and it was proposed that the deletions prevented the development of immune tolerance to the protein (101, 102). Not all deletions, however, result in antibody production, and it is thought that other factors in the nature of the mutation or in the patients' immune response genes must mediate this response. A weaker relationship between deletions and antibody production has been shown for haemophilia A (102).

Some patients have factor VIII or IX antigen levels (as measured by immunoassays) within the normal range but are deficient in factor activity. These patients are called CRM+ (cross reacting material positive) and carry mutations that affect the function of the protein but not its synthesis or stability. The study of these CRM+ proteins may, therefore, help us to identify functionally important domains. For example, factor IX Oxford 3 and factor IX 3 and 4 carry mutations at arginine -4, which result in the retention of the leader peptide after secretion (103-107). Patients with reduced antigen to the clotting factor but a disproportionately greater reduction in activity are referred to as CRM reduced. These patients are thought to have mutations which affect either the stability or production of the protein as well as the function of factor IX. Patients with proportionately reduced antigen and activity levels are called CRM- and generally carry mutations that impair protein synthesis or stability. These include splice site mutations like factor IX Oxford 1 (108) and 2 (109), which interfere with mRNA processing, and factor IX London 11 (106), in which a frameshift mutation results in the premature termination of protein synthesis. The promoter mutations which result in Haemophilia B Leyden also belong to this group (110).

1.7.1 Population Genetics

If a disease is to persist in the population, the loss of alleles due to reproductive failure must be compensated for by the gain of new mutant alleles or by heterozygous advantage. It is usually assumed, therefore, that there is an equilibrium between loss and gain of detrimental alleles in a population. Since one third of all X-linked genes are in males, the mutation rate for X-linked recessive genes is given by the formula u = (1/3)sI, where u = mutation rate, I = incidence of the disease and s = relative fertility of haemophilia patients (the selection coefficient). Ferrari and Rizza (111) suggest that before the introduction of replacement therapy the relative fertility was only 0.5, as affected individuals had a much lower chance of reproduction than a normal individual. This means that 1/6 of the haemophilic alleles would have to be replaced at each generation for an equilibrium to be maintained, and that new spontaneous mutations must occur with a high frequency.
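The arithmetic behind the figure of 1/6 can be made explicit. The sketch below reads the formula in the text as u = (1/3)sI and takes s = 0.5 as quoted from Ferrari and Rizza; the function names are illustrative, and the incidence used in the final line is a purely hypothetical value chosen to show the calculation, not a figure from this work.

```python
from fractions import Fraction


def replaced_fraction(s):
    """Fraction of haemophilic alleles that must be replaced by new
    mutations in each generation at equilibrium, i.e. u / I = s / 3."""
    return Fraction(1, 3) * s


def mutation_rate(s, incidence):
    """Mutation rate u = (1/3) * s * I for an X-linked recessive disease."""
    return Fraction(1, 3) * s * incidence


s = Fraction(1, 2)  # value of s quoted in the text
print(f"Fraction of alleles replaced per generation: {replaced_fraction(s)}")

# Hypothetical incidence, for illustration only (not taken from the text).
hypothetical_incidence = Fraction(1, 10000)
print(f"Mutation rate at that incidence: {mutation_rate(s, hypothetical_incidence)}")
```

Exact fractions are used so that the result can be compared directly with the 1/6 given in the text.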

1.7.2 Mutational Hotspots

The above prediction has been largely borne out by the number of new patients with no previous family history of the disease. There do, however, appear to be some mutational hotspots, particularly at CpG dinucleotides (103-106, 112). (About 40% of single base pair substitutions in the factor IX gene are thought to be CpG dinucleotide mutations.) These mutations are thought to be due to the spontaneous deamination of a cytosine methylated at the 5 position of the pyrimidine ring. This will result in a C to T transition. The thymine, which is incorrectly paired with a guanine, may then be repaired by the cellular machinery. Brown and Jiricny (113) showed that 96% of these mismatches may be correctly repaired to a G:C pair. This, however, does not appear to be sufficient to maintain CpG dinucleotides at the same level as other dinucleotides. The high frequency of CpG to TpG transitions is thought to account for the relatively low frequency of CpG dinucleotides in eukaryotic DNA.

Factors VIII and IX have fewer C and G bases in their coding regions than their autosomal homologues, factors VII and X, protein C and factor V. This may be due to hypermethylation of the inactive X chromosome, resulting in a higher rate of conversion of cytosines to thymines than on other chromosomes (114). Detrimental mutations on the X chromosome, however, are particularly exposed to natural selective pressures because of male hemizygosity. The CpG dinucleotides encoding Arg residues that are selectively retained during evolution might, therefore, be expected to be those which occupy functionally important sites. Indeed 15% of CpG mutations in the coding region of factor IX actually block protein synthesis, due to the relevance of the Arg residue to the protein, compared with 0, 1.3 and 1.5% in factors VII, X and protein C. CpG dinucleotides in critical positions have also been found to be highly conserved across species (115). A similarly high mutation rate at CpG dinucleotides has also been reported in the factor VIII and glucose-6-phosphate dehydrogenase genes (116, 117), which are both located on the X chromosome. This mechanism, however, is not restricted to the X chromosome, as approximately 1/3 of all identified mutations causing human disease are associated with CpG dinucleotides (118).
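The consequences of a CpG transition within an arginine codon can be enumerated directly. The sketch below is an illustration based on the standard genetic code rather than on data from this work: it considers the four arginine codons of the form CGN, and models deamination on the coding strand as a C to T change at the first position and deamination on the opposite strand as a G to A change at the second position. The function and table names are invented for this example.

```python
# Subset of the standard genetic code covering only the codons needed here.
CODON_TABLE = {
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "TGT": "Cys", "TGC": "Cys", "TGA": "Stop", "TGG": "Trp",
    "CAT": "His", "CAC": "His", "CAA": "Gln", "CAG": "Gln",
}


def cpg_transition_products(codon):
    """Return the two codons produced by a CpG transition in a CGN codon.

    First item: C -> T at position 1 (deamination on the coding strand).
    Second item: G -> A at position 2 (deamination on the opposite strand,
    read on the coding strand).
    """
    if len(codon) != 3 or not codon.startswith("CG"):
        raise ValueError("expected an arginine codon of the form CGN")
    coding_strand_product = "T" + codon[1:]
    opposite_strand_product = codon[0] + "A" + codon[2]
    return coding_strand_product, opposite_strand_product


for codon in ("CGT", "CGC", "CGA", "CGG"):
    first, second = cpg_transition_products(codon)
    print(
        f"{codon} ({CODON_TABLE[codon]}) -> "
        f"{first} ({CODON_TABLE[first]}) or {second} ({CODON_TABLE[second]})"
    )
```

Every outcome replaces the arginine, and a transition in CGA can create a stop codon, which is one way in which a CpG mutation at an arginine codon could block synthesis of the protein.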

1.8 TREATMENT

1.8.1 Current Methods

Early treatment for the haemophilias consisted of transfusion with whole blood. Levels of the clotting factors in normal blood are, however, generally too low to stem bleeding from a major haemorrhage. Before accurate cross-matching of blood types was available there was also a high risk of severe immune reactions. Whole plasma (or the fibrinogen fraction of plasma for haemophilia A (4)) was also given to patients, but the large volumes necessary because of low factor concentrations often led to hypervolaemia.

Concentrated forms of factors VIII and IX are now available, but the large numbers of blood donations required to prepare these products lead to a high risk of viral infection. The large numbers of haemophiliacs infected with Human Immunodeficiency Virus (50% in the UK, up to 90% in the USA) due to administration of contaminated factor VIII concentrates have been particularly prominent recently. This is not, however, the only infective risk; 90% of patients receiving therapy before the advent of heat-treated concentrates are thought to have been infected with hepatitis B and many of these have permanent liver damage. It is thought that recurrent treatment may also cause immune suppression and leave the patient more vulnerable to these agents. Furthermore, some patients also produce antibodies to the exogenous factors, which severely complicates treatment.

Since 1986 most blood product concentrates have been heat treated. This treatment appears to eliminate the risk of HIV or hepatitis B infection, but there is still some risk of non-A, non-B hepatitis. Now that assays for hepatitis C are becoming available it has been realised that large numbers of patients are also infected with this virus (119). It is hoped that all new cases of haemophilia will be given only heat-treated concentrates and will be vaccinated against hepatitis B as early as possible.

Rather than treating bleeds as they occur it would be desirable to keep haemophiliacs on a regime which would prevent bleeds altogether. Due to the high risk of infection, however, and the lack of knowledge of the side-effects of long-term therapy, such prophylactic treatment has not been encouraged in the UK. (Some other countries have different policies.) As concentrates become safer this may not always be the case. Prophylaxis would lead to a much higher demand for the blood products and it is not known whether this could be met by current methods of blood collection and concentrate production. (For a review of current haemophilia treatment see ref. 120.)

The molecular cloning and sequencing of the genes encoding factors VIII and IX has allowed us to offer carrier detection to families at risk of having a haemophilic child. In the past this has relied on identification of the defective allele by functionally irrelevant variations in DNA sequence which modify its restriction digestion patterns. (There is, however, always the possibility of recombination between the marker and the defective gene, which would lead to incorrect diagnosis.) There are seven such Restriction Fragment Length Polymorphisms (RFLPs) which are used to follow and predict the transmission of a defective factor IX gene in Caucasian families (121). Together they allow successful proband diagnosis in a maximum of 89% of families at risk. (Other ethnic groups have a lower frequency of the polymorphic markers, which results in a lower diagnostic success rate.) The other 11% are either not informative for the markers, that is they are homozygotes, or there is not enough information available to follow the segregation of the markers. There are three intragenic and two extragenic RFLPs commonly used for carrier diagnosis of haemophilia A. Together these are informative in approximately 70% of cases.

More recently, the development of techniques such as DNA amplification, Direct Sequencing, Mismatch Chemical Cleavage and Single Stranded Conformational Polymorphisms has enabled the direct identification of the gene defect. This eliminates both the need for detailed genetic information on family members, and the possibility of recombination. In theory it could provide successful carrier detection and proband diagnosis in virtually every family.

30% of haemophilia A and 15% of haemophilia B cases, however, arise due to spontaneous new mutations. It will, therefore, never be possible to eliminate these genetic diseases from the population by carrier diagnosis and genetic counselling. We must be able to offer effective treatment for the disease.

1.8.2 Future Prospects

The continuing problems of viral contamination and possible shortages of donated blood call for alternative methods of treatment for haemophilia. One possibility is to produce the deficient clotting factor in vitro. This has been successfully achieved for haemophilia A by transfecting a hamster cell line with the human factor VIII DNA (122). Recombinant factor VIII has now entered clinical trials although, at an approximate cost of £25,000 per patient, per year, it may prove too expensive for widespread use.

Several attempts have been made to produce active recombinant factor IX from cells in culture with varying degrees of success (123-128). Despite initial difficulties due to the extensive post-translational modification required for factor IX activity, active protein has been produced in vitro. Commercial production of this recombinant protein, however, would be less economically viable because of the small numbers of haemophilia B patients. If production costs were similar to those of recombinant factor VIII, the cost per patient per year could be approximately £150,000. An alternative approach has been to produce transgenic sheep bearing the human factor IX cDNA under the control of a milk protein promoter. The presence of factor IX in the milk of such animals has been reported (129).

The ultimate therapeutic goal is the correction of the gene defect by providing a permanent endogenous source of factor IX or factor VIII, that is somatic gene therapy (130-134).

The haemophilias would be ideal candidates for treatment by gene therapy. A small increase in the levels of the clotting factor might convert a severe haemophiliac to a moderate case and would greatly enhance the patient's quality of life. The low levels of expression needed, as well as the rapid turnover of the factors and the endogenous regulation of the coagulation cascade, remove the need for tight regulation of expression. Also, although factor IX must be produced at a site which can perform all the necessary post-translational modifications, such as the hepatocytes, factor VIII could be synthesised in a number of more easily accessible tissues. (For a review of recent progress towards gene therapy of haemophilia see ref. 135.)

Any proposed new therapy may be associated with risk. Gene therapy by retroviral mediated gene transfer carries the risk of viral infection, and any method of random gene insertion may cause oncogenesis or disruption of normal gene function. Gene therapy protocols also often prove to be ineffective because of insufficient or inappropriate expression of the recombinant sequences. Alternatively, replacement therapy with new products may have unknown effects on coagulant activity or the immune system, and may contain infectious agents. Before a possible treatment can be tested on humans its safety and efficacy must first be proven in an animal model of the disease. The only existing animal models described for the haemophilias are dogs and cats (Janet Littlewood, personal communication). These are difficult and expensive to maintain and have inconveniently long generation times. A haemophilic mouse would be a more suitable animal model, but unfortunately a naturally occurring haemophilic mouse has not been described to date. It is now possible, however, to introduce gene sequences that have been manipulated in the laboratory into a mouse embryo, and either to disrupt an endogenous gene or to express inserted sequences throughout the lifetime of the animal. The purpose of this project is to employ these "transgenic" techniques in order to produce a haemophilic mouse.

2. TRANSGENESIS

The ability to introduce genes into an organism means that we are no longer dependent on random mutagenesis to provide us with ways of analyzing complex biological processes at the genetic level. Methods for inserting cloned genes into bacterial cells have been extended so that DNA can be introduced into eukaryotic cells either in culture or, more recently, in a developing embryo. The animals which develop from these embryos and carry genes that have been manipulated in vitro in all the cells of their body, are called transgenic animals, and the genes that they carry are referred to as transgenes.

The development of these transgenic techniques has been made possible by advances in methods of removing early embryos from the uterus, manipulating them in vitro, and returning them to a foster mother. Such techniques are most widely practised on the mouse, but can now also be applied to other more economically important animals, such as pigs and sheep.

The first transgenic techniques relied on random integration of the manipulated DNA into the genome. This random insertion of sequences may, however, disrupt normal gene expression and have detrimental effects on the embryo. More recently it has become possible to target the recombinant DNA to a particular site in the genome by a process called homologous recombination or gene targeting. This process should theoretically enable the correction or disruption of any chosen sequence in the genome: used in conjunction with recently developed embryonic stem cells these targeted sequences can be incorporated into a transgenic animal. The ability to target specific sequences is a great advantage when studying gene function, and is essential for the development of gene therapy in conditions where the expression of a harmful product must be disrupted. In addition, any gene inserted into the genome by homologous recombination will be under the same regulatory controls as the endogenous gene. It is, therefore, not necessary to include in the transfection construct all the regulatory elements to achieve correct expression of a gene. Homologous recombination, however, is a very rare event and where targeting of a specific gene is not required the more classical methods are still widely used to introduce DNA into transgenic animals. The advantages and disadvantages of these earlier methods are summarized here before a discussion of the development of stem cell lines, and of the process of homologous recombination and its applications.

2.1 INTRODUCTION OF DNA INTO EMBRYOS

2.1.1 Retroviral Infection

The first successful attempt at transferring DNA into the tissues of a mouse, and transmitting it through the germline, was achieved by injecting DNA from the Moloney Murine Leukaemia Virus (M-MuLV) into the blastocoel cavity of the mouse embryo (136, 137). This retroviral infection can be performed on pre- or even post-implantation embryos (138, 139) and is a fairly simple procedure. It is not, however, possible to infect the embryo at the one-cell stage and the virus infects only a proportion of the embryonic cells. The DNA is, therefore, normally incorporated into a subpopulation of cells throughout the embryo, producing a mosaic animal or chimaera which must be bred in order to produce a true transgenic individual. Although transfection efficiency is high, the efficiency with which the germ cells are infected with viral sequences is low and germline transmission of the introduced DNA is, therefore, rare.

Viral DNA can integrate into the host genome as a single copy without causing any major rearrangements in host sequences. Another advantage of this method is that it facilitates the cloning of the regions flanking the viral insert. This is achieved by the use of probes to identify clones containing the viral Long Terminal Repeat sequences, and is a great advantage when investigating the effects of gene disruption in order to identify the gene involved. The amount of DNA which can be introduced into the genome by this method is, however, limited to about 9-10Kb because of the packaging constraints of the virus particle.

One of the problems of retroviral mediated gene transfer is the infectivity of the viral particles, which can result in a persistent viraemia in the recipient cells or organisms. Although the recombinant retrovirus particles themselves are not capable of replication, they must be used in conjunction with a helper virus to enable them to infect the recipient cells. Packaging-defective helper virus particles, which enable the recombinant virus to infect the cells but cannot themselves replicate, were developed. There is, however, occasional recombination between viral strains, which may result in a packaging-competent particle. In the most recently developed systems two helper virus strains, lacking different components of the replication cycle, are used. Two such rare recombination events would therefore be necessary to produce an infective particle (140, 141).

2.1.2 Microinjection

The most commonly used method of producing transgenic animals is by direct microinjection of DNA into the male pronucleus of the fertilised egg followed by implantation in pseudopregnant foster mothers (142). This technique efficiently produces (from approximately 5% of injected cells) animals which incorporate the transgene into all their tissues. The initial isolation of the fertilised oocyte and the microinjection are, however, complicated procedures.

Although the DNA molecules usually integrate stably into the host chromosome (they can occasionally replicate episomally (143)), they do so as multiple copies arranged in a head to tail array, and often cause gross rearrangements of the host sequences (144). These rearrangements can make it very difficult to clone the true flanking sequences and to interpret any effects of transgene expression.

(Correct expression of the transgene in both these systems depends on the presence of appropriate regulatory sequences.)

2.2 STEM CELLS

A third method of introducing foreign DNA into a mouse embryo involves inserting the sequences of interest into pluripotent embryonic stem cells, and injecting the transformed cells into host blastocysts. The descendants of the stem cells are capable of contributing to all the tissues of the developing chimaera. If they contribute to the germline, some or all of the chimaera's offspring will carry the DNA introduced into the original cell; that is they will be transgenic.

Most of the stem cell lines used to derive transgenic animals are male. There are three reasons for this preference. Firstly, the XY karyotype appears to be maintained more stably in culture than XX. Secondly, it is preferable to have male chimaeras for breeding as they can produce more offspring than females, and only XY stem cells can give rise to functional sperm. Thirdly, XY stem cells can cause sex conversion. This occurs when a significant contribution of XY cells to an initially XX blastocyst causes the embryo to develop as a male. The entire germ line in these chimaeric animals is derived from the exogenously introduced stem cells.

The amount of contribution of the stem cell lineage to the chimaeric animal can be assessed by choosing cell lines containing a marker for coat colour. When a stem cell with a gene for light coat colour is injected into a blastocyst containing genes for a dark coat colour, light coloured patches will appear on the mainly dark coat. The proportion of these patches compared with the blastocyst derived colour reflects the contribution of the stem cell to the whole animal.

Genetic material can be introduced at random into stem cells in culture by a number of methods, including retroviral infection, microinjection, calcium phosphate coprecipitation and electroporation. These cells have, however, become particularly important since the development of techniques for gene targeting, as they enable the incorporation of a targeted gene into a developing embryo. The properties and development of embryonic stem cell lines are discussed below.

2.2.1 Embryonal Carcinoma Cells

Pluripotent embryonic stem cells were first isolated from malignant tumours called teratomas, caused by transplanting early mouse embryos into adult testes (or occurring naturally in some inbred strains of 129 mice) (145). It was noticed that these tumours consisted of a number of differentiated cell types as well as undifferentiated core cells. These core cells were thought to originate from the early embryo before cell development is restricted to certain cell lineages (146). When these core cells were isolated and propagated (147, 148) they retained their ability to differentiate into many cell types and became known as Embryonal Carcinoma (EC) cells. This differentiation can be achieved either in culture (149) or in vivo by injecting the cells into host blastocysts. The EC cells contribute to many of the somatic tissues of the developing chimaeric embryo without causing tumours (150, 151), and there have been reports that they even contribute to the germline and give rise to viable offspring (152). The EC cells can also be transfected with foreign DNA prior to blastocyst injection and can maintain and express this DNA throughout development (153, 154).

Although the developmental potential of EC cells appears to mimic that of the early embryo, attempts to produce transgenic animals by this route have proven disappointing. The contribution of the EC cells to the chimaera is generally small compared to that of normal embryonic cells injected into a host blastocyst (155). The frequency of colonisation of the germ cells is particularly low: only five chimaeric animals have been reported to transmit EC cell progeny through the germline to their offspring, and these were derived from three EC lines, only one of which had been maintained entirely in vitro.

A number of factors may be responsible for these disappointing results. Firstly, the cell lines are remarkably heterogeneous, both in terms of their culture requirements and their growth characteristics, including their differentiation abilities. Secondly, a large number of chromosomal abnormalities have been detected in EC cells and it is thought that tumour growth may select for abnormal cells.

2.2.2 Embryonic Stem Cells

More recent studies have shown that it is possible to isolate pluripotent embryonic stem cells directly from the preimplantation embryo (156, 157). This was first carried out using blastocysts which were prevented from implanting in the uterine wall by ovariectomy and hormonal stimulation of the mother (156). The blastocysts were isolated and cultured in drops of media under paraffin oil on a petri dish for 48 hours. The trophectoderm cells formed giant trophoblast cells and the inner cell mass formed egg cylinder-like structures. These egg cylinder-like structures were isolated, dispersed in trypsin and replated. Colonies with typical EC-like stem cell morphology were passaged on mitomycin C-treated STO feeder cells (158, 159) until pure lines were obtained.

The undifferentiated cells were capable of forming teratocarcinomas when transplanted into a histocompatible host in a non-uterine site. They formed embryoid bodies when grown in suspension (149), and differentiated into a large number of cell types in vitro. They were shown to have a normal XY karyotype, and similar cell surface antigens and protein constituents to normal embryonic cells and to some EC lines. When injected into a blastocyst these cells were capable of contributing to the germ cell lineages and forming germline chimaeras (160, 161).

Another line of embryo-derived stem cells was obtained by isolating the inner cell mass from a normal blastocyst by immunosurgery (162). The cells were then passaged in media conditioned by teratocarcinoma cells. This conditioned media contained a growth factor which prevented the differentiation of the stem cells. Other cell lines often used as feeders, such as STO and BRL fibroblasts, also produce a growth factor which inhibits stem cell differentiation and which may be involved in normal embryonic development in vivo.

These stem cells derived from the mouse embryo are now known as embryonic stem (ES) cells. It has since been realised that EC conditioned media is not necessary for ES culture, and the cells can be isolated and maintained, without differentiation, in the presence of the growth factor Leukaemia Inhibitory Factor (LIF) (163-165), on feeder layers (158, 159), or in media conditioned by Buffalo Rat Liver cells (166, 167). Various ES cell lines have now been isolated, all of which appear to have similar morphologies and differentiating abilities (167-169).

Chapter 3

Homologous Recombination

3. HOMOLOGOUS RECOMBINATION

In 1978 Hinnen et al (170) observed genetic recombination between plasmid DNA introduced into a yeast cell, and the homologous sequence in the yeast genome, which resulted in a swapping over of material from the plasmid into the genome. Since then similar processes have been shown to occur in mammalian cell lines, both between co-introduced plasmids and between a plasmid sequence and the endogenous genome. This homologous recombination is, however, a very rare event in mammalian cells and is much less common than random insertion or non-homologous recombination. This process could be used to specifically change any gene in the genome in a particular manner. Several factors have been found to affect the frequency of homologous recombination, such as the size of the region of homology and the design of the vector DNA. In order to apply this natural phenomenon to usefully target specific genes we must be able to maximise its efficiency and to effectively select for the process. Several groups have now reported successful gene targeting and some progress has been made in understanding the process and in overcoming the problem of low frequency. The following is a review of data regarding gene targeting and the factors that are thought to affect its frequency. The theories that have been proposed to explain the mechanism of homologous recombination will also be briefly discussed.

3.1 HOMOLOGOUS RECOMBINATION IN MAMMALIAN CELLS

Homologous recombination is thought to be the process responsible for the genetic recombination that takes place during meiosis, during sister chromatid exchange and during interchromosomal exchange in mitosis. Such exchanges of genetic material were first observed in prokaryotes, but studies have since been carried out on certain species of yeast where the results of recombination can be easily analyzed in the resulting spores or asci. In 1982 Folger et al (163) proposed that the head to tail concatemers that appear when DNA is microinjected into mammalian cells form by homologous recombination between the introduced molecules. This would suggest that mammalian cells, as well as yeast, contain the necessary machinery to allow homologous recombination to occur. Kucherlapati et al (172) later confirmed this by showing that homologous recombination had occurred between two co-transfected deletion plasmids of the neomycin phosphotransferase (Neo) gene in human bladder carcinoma cells and in mouse LMTK- cells, at frequencies of 7.4% and 1.3% respectively of that obtained with the complete Neo gene (ie random integration). Cells containing the Neo gene repaired by recombination were rendered resistant to the neomycin analogue, G418 (173), and so could be selected.

The first targeting experiments investigated homologous recombination between co-transfected plasmids (174, 175) and many of the factors affecting the frequency of the process have been studied in plasmid X plasmid systems (see below).

3.1.1 Chromosomal Targeting

Smithies et al (176) developed a plasmid X chromosomal plasmid system for investigating homologous recombination. They transfected Chinese Hamster Lung cell lines with a deletion mutant of the pSV2Neo gene by calcium phosphate mediated gene transfer. This produced four transfected cell lines with the plasmid sequences inserted into the genome at different random positions. These sequences were then used as chromosomal targets for homologous recombination with vectors containing 500 base pairs of homology and a different, non-overlapping deletion. Targeted cells containing the repaired Neo gene were selected by their resistance to G418. When the targeting vector was linearized, successfully targeted colonies were obtained at frequencies of 1% and 0.3% of that obtained with the full length gene (ie random insertion).

Lin et al (177) carried out similar experiments with two different deletion constructs of the Herpes Simplex Virus thymidine kinase (HSV-tk) gene, containing 320 base pairs of homology. These vectors were transfected without linearization and produced tk+ colonies at a much lower frequency, about 1/10⁵ of that obtained with the complete tk gene.

The first report of homologous recombination between an endogenous chromosomal gene, the β-globin gene, and a transfected plasmid was published in 1985 by Oliver Smithies and his co-workers (178). A special targeting vector was constructed which contained 4.6Kb of sequence homologous to the human β-globin locus, a pSV2Neo gene which, when expressed, would render cells resistant to G418, and the SupF gene which would allow rescue of vector sequences from the genome of the recipient cells without selection in G418. Following this bacteriophage rescue, Southern blot analysis was used to reveal true targeting events by the presence of a unique XbaI fragment. Using this system, targeting events were detected at a frequency of one in every 300-1100 transfected human EJ bladder carcinoma cells. Similar frequencies were obtained in Hu11 hybrid cells using selection in G418.

Subsequent experiments by a number of groups have shown targeting frequencies between 1 in 5 x 10⁶ (179) and 1 in 500 (180) stably transformed cells.

3.2 FACTORS AFFECTING THE FREQUENCY OF HOMOLOGOUS RECOMBINATION

3.2.1 Vector DNA

Kucherlapati et al (172) found that linearization of the incoming plasmid DNA increased the frequency of recombination in their mouse and human cell lines approximately 9-fold. This effect had earlier been observed in yeast cells by Orr-Weaver et al (181), who found that digesting the incoming DNA within or very close to the region of homology increased the frequency of recombination between 10 and 1000-fold. The authors concluded that the double stranded ends of the cut DNA were highly recombinogenic and interacted directly with the homologous sequences. These results have been confirmed several times and linearization of the plasmid DNA is now routinely used in gene targeting experiments.

In plasmid x plasmid recombination experiments Chang and Wilson (182) demonstrated that linear substrates with complementary sticky ends, blunt ends, or mismatched sticky ends all generated the same ratio of homologous to non-homologous recombination, but that the addition of dideoxynucleotides to the 3' hydroxyls of restricted DNA decreased the amount of non-homologous recombination or end joining about five-fold, increasing the comparative frequency of homologous recombination.

3.2.2 Cell Cycle

In 1987 Wong and Capecchi (183) demonstrated that homologous recombination between co-transfected plasmids peaks in the early part of the S-phase of the cell cycle.

It was soon realised that by integrating previous work on the development of embryonic stem cells and transgenic mice with homologous recombination procedures, it would be possible to specifically alter a gene in the genome of a stem cell and to derive a mouse line from this altered cell. With this in mind Thomas and Capecchi (184) investigated the effects of various parameters on the frequency of gene targeting of the Hypoxanthine Phosphoribosyltransferase (HPRT) gene in ES cells. They chose this gene because it encodes a selectable phenotype and is located on the X chromosome. Since there is only one copy of this gene in male stem cells, one targeting event will change the phenotype of the cell.

3.2.3 Length of Homology

The frequency of gene targeting or homologous recombination in the HPRT gene was very sensitive to the extent of the homology between the target chromosomal gene and the input vector. Increasing the size of the region of homology from 4 to 9Kb increased the targeting frequency 20-fold. Subsequent data, however, have shown that although the proportion of transfected cells that undergo homologous recombination (ie the targeting frequency) generally increases with the length of the region of homology, it also seems to depend on other locus-specific factors, and reasonable targeting rates have been achieved with only small regions of homology.

3.2.4 Targeting of non-expressed genes

In the plasmid X chromosomal plasmid experiments (176) only two of Smithies' four stably transformed cell lines, and one of Lin's ten (177), were able to produce positively selectable colonies by homologous recombination. In the other cell lines, it was thought that the chromosomal environment at the site of integration of the initial transfection vectors was somehow preventing homologous recombination. DNA repair is thought to have a similar mechanism to that of homologous recombination, and is known to be enhanced in transcriptionally active regions. This led to speculation that the rate of homologous recombination may be influenced by transcriptional activity at the target locus.

In their subsequent experiments, however, Smithies et al (178) demonstrated that homologous recombination occurred in the β-globin gene whether or not the target locus was transcriptionally active. They employed a sophisticated bacteriophage rescue system to identify homologous recombinants without dependence on expression of the selectable Neo gene. The targeting construct included 4.6kb of β-globin sequences, the Neo gene, and the SupF gene which would allow the use of the bacteriophage rescue system. When this construct was introduced into EJ bladder carcinoma cells, which do not express the β-globin gene, no recombinants survived the G418 selection. Using the bacteriophage rescue system and Southern blot analysis alone, however, homologous recombinants were isolated at a frequency of one in every 300-1100 cells transfected. This was a similar frequency to that observed when hybrid Hu11 cells, which do express the β-globin gene, were transfected with the same vector, but selected by virtue of the Neo gene expression. It was suggested that where the gene locus is not active, homologous recombination may occur, but expression of the neomycin resistance gene is repressed by sequences in the genome, even when it carries its own transcription signals, and the cell does not survive the selection procedure.

Since then other non-expressed genes have been successfully targeted (185, 180). Although these reports show that some non-expressed genes are targetable, they do not rule out the possibility that a closed chromatin structure or repressor sequences may prevent targeting events or their selection at other target loci.

3.3 INSERTION AND REPLACEMENT VECTORS

Two types of vector, both containing a neomycin phosphotransferase gene cassette which is called pMC1Neo (184) and is efficiently expressed in stem cells, were constructed for the HPRT gene targeting experiments. One was designed to insert the whole vector, including the plasmid and mutated HPRT gene sequences, into the homologous chromosomal locus by a single crossover event; the other was designed to directly replace the chromosomal gene with the mutated sequence, without adding any extra material. This replacement event involves two crossovers (Figure 10). Transfected cells were selected by their resistance to G418, and specifically targeted cells by their resistance to 6-thioguanine (6-tg), conferred by the mutation in the HPRT gene. Under optimal conditions the authors found that 1/1000 G418 resistant cells were also 6-tg resistant. Both types of construct were found to have similar targeting frequencies.

Since insertion vectors result in the addition of extra sequences to the genome, the effects of which are unknown, and are more complicated to construct, the majority of gene targeting experiments are performed with replacement-type vectors.

Figure 10. Disruption of the HPRT Gene by Gene Targeting (184).

A. Sequence Replacement; vector is designed so that upon linearization, the vector HPRT sequences remain colinear with the endogenous sequences. Following homologous pairing between vector and genomic sequences, a recombination event replaces the genomic sequences with the vector sequences containing the Neo gene.

B. Sequence Insertion; vector is designed such that the ends of the linearized vector lie adjacent to one another on the HPRT map. Pairing of these vectors with their genomic homologue, followed by recombination at the double strand break, results in the entire vector being inserted into the endogenous gene. This produces a duplication of a portion of the HPRT gene.

Open boxes indicate introns; closed boxes indicate exons. The crosshatched box indicates the Neo gene.

3.4 METHODS OF TRANSFECTION

There are several methods which could potentially be used to introduce a targeting vector into cells. These include microinjection, retroviral infection, calcium phosphate precipitation and electroporation. Although electroporation (186) is a relatively inefficient method of gene transfer (efficiencies of 10⁻⁵ to 10⁻² have been reported) it is now the method of choice when targeting stem cells. This is because, unlike calcium phosphate precipitation, it is possible to select electroporation conditions that result mainly in single copy, single site insertions of the vector DNA (184, 187). Electroporation can also be conveniently applied to large numbers of cells and, under the correct conditions, will not cause ES cells to differentiate.

Microinjection has also been used for gene targeting of embryonic stem cells (188). This procedure has the advantage of being a very efficient method of gene transfer (10-20% of injected cells become stably transformed), but is technically difficult and not feasible where low targeting frequencies mean that large numbers of cells must be transfected.

The proportion of transfected cells which undergo homologous recombination does not appear to be affected by the method of introduction of DNA into the cells (189).

3.5 SELECTION PROCEDURES

Early experiments targeted genes that were themselves selectable, such as the neomycin phosphotransferase gene (with its own transcriptional signals) or the HPRT gene. These selectable genes are, however, rare and other methods of detecting targeting events had to be developed before other genes could be targeted. Selectable genes, such as the neomycin resistance gene, are either included in the targeting construct, or co-transfected on a separate vector, in order to select for those cells which have been successfully transfected. Random integration is, however, 2-5 orders of magnitude more common than homologous recombination, and since any targeting events must ultimately be confirmed by laborious Southern blotting analysis, procedures for enriching the proportion of targeted colonies in the surviving cell pool have been developed.

3.5.1 Enrichment

Sedivy and Sharp (190) developed a method of positive selection for homologous recombination which relies on an in-frame fusion product of the target sequences and an E. coli neomycin phosphotransferase gene without any of its own transcription signals. When such a fusion is formed, the Neo gene will be under the transcriptional control of the target locus and will be expressed with the target gene, rendering the cells resistant to G418. The targeting construct is designed so that an in-frame fusion product will result from homologous recombination, whereas most non-homologous recombination events will produce frameshifts. This method cannot provide absolute selection for the targeted cells, as occasionally in-frame fusion products will result from random insertion, but it does result in a 100-fold enrichment of targeted cells in the transfected pool. It is also only applicable where the target gene is expressed, and requires that the target gene has been cloned and that the triplet coding sequence is known.
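The reading-frame requirement can be illustrated with a minimal sketch. It assumes a simplified picture in which the promoterless Neo coding sequence is joined directly after a given number of coding nucleotides of the target gene; the function name, the optional linker parameter and the example lengths are hypothetical and are not taken from reference 190.

```python
def fusion_in_frame(upstream_coding_nt: int, linker_nt: int = 0) -> bool:
    """Return True if Neo would be read in frame with the target gene.

    upstream_coding_nt: coding nucleotides of the target gene before the junction
    linker_nt: extra nucleotides between the junction and the first Neo codon
               (illustrative parameter, an assumption of this sketch)
    """
    # The fusion is in frame only if a whole number of codons precedes Neo.
    return (upstream_coding_nt + linker_nt) % 3 == 0


# Designed junction: 90 coding nucleotides (30 codons) precede Neo.
print(fusion_in_frame(90))      # True
# A junction displaced by one nucleotide produces a frameshift.
print(fusion_in_frame(91))      # False
# Two linker nucleotides would restore the frame after 91 nucleotides.
print(fusion_in_frame(91, 2))   # True
```

This is why the triplet coding sequence of the target gene must be known before such a construct can be designed.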

A further method of enriching for targeting events, which can be used in non-expressed genes, is to modify the neomycin phosphotransferase gene in the targeting vector by including its own transcription initiation site but not a polyadenylation signal. The Neo gene will, therefore, only be expressed where it has access to polyadenylation signal sequences in the genome; that is, where the vector has inserted upstream of a genomic polyadenylation signal. This effectively eliminates all those insertion events in intergenic regions.

Mansour, Thomas and Capecchi (191) developed a "selection" method which can enrich 2000-fold for the targeted event and requires only that the target gene be cloned and the intron-exon boundaries be characterised. This method relies on a positive selection for cells that have incorporated the targeting construct anywhere in the genome, and negative selection against those that have integrated it randomly (see figure 11). The neomycin phosphotransferase gene, inserted into the coding sequence of the targeting vector, renders transfected cells resistant to G418 as well as providing the mode of disruption of the target gene for "knock out" experiments. The Herpes Simplex Virus thymidine kinase gene (HSV-tk), which renders cells susceptible to the nucleoside analogues ganciclovir and FIAU (192, 193), is also included in the targeting vector, flanking the region of homology.

Crossing over during homologous recombination will occur within the region of homology so that targeted cells will not incorporate the HSV-tk gene and will be resistant to both G418 and ganciclovir. Random integration, however, will usually occur via the ends of the vector, and will result in the incorporation of the HSV-tk gene into the genome. Non-targeted, transfected cells will, therefore, metabolise the ganciclovir and die. Further enrichment can be obtained using this method by inserting an HSV-tk gene at both ends of the region of homology. Again this method does not provide absolute selection since random recombination or DNA strand breakage may result in non-targeted cells ejecting the tk gene.
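The logic of the positive/negative selection can be written out as a small truth table. The following is only a sketch of the reasoning described above, assuming that survival depends solely on whether the Neo and HSV-tk genes are present; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    has_neo: bool     # neomycin phosphotransferase gene integrated
    has_hsv_tk: bool  # HSV-tk gene integrated


def survives(cell: Cell, g418: bool = True, ganciclovir: bool = True) -> bool:
    """Return True if the cell survives the drugs applied."""
    if g418 and not cell.has_neo:
        return False  # positive selection: no Neo gene, killed by G418
    if ganciclovir and cell.has_hsv_tk:
        return False  # negative selection: HSV-tk makes ganciclovir toxic
    return True


cells = {
    "targeted (homologous recombination)": Cell(has_neo=True, has_hsv_tk=False),
    "random integrant": Cell(has_neo=True, has_hsv_tk=True),
    "untransfected": Cell(has_neo=False, has_hsv_tk=False),
    # Background that escapes selection: a random integrant that lost HSV-tk.
    "random integrant, tk lost": Cell(has_neo=True, has_hsv_tk=False),
}

for name, cell in cells.items():
    print(f"{name}: {'survives' if survives(cell) else 'dies'}")
```

The last entry shows why the selection is not absolute: a non-targeted cell that has lost the tk gene is indistinguishable from a targeted cell by drug resistance alone.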

3.5.2 Selection

The above enrichment procedures do not actually identify the targeted clones within the enriched pools of cells. This can be achieved by rescuing the genomic sequences surrounding the inserted vector from all of the selected colonies, then performing Southern blot analysis with probes which will reveal true targeting events (178). This is, however, a time-consuming and complicated procedure.
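A rough calculation shows why screening is still needed after enrichment. The sketch below assumes that an n-fold enrichment simply reduces the surviving random integrants n-fold; the function name and the starting ratio of 1000 random integrants per targeted event (within the 2-5 orders of magnitude quoted above) are illustrative assumptions, not measured values.

```python
def targeted_fraction(random_per_targeted: float, enrichment: float = 1.0) -> float:
    """Expected fraction of surviving colonies that are truly targeted.

    random_per_targeted: random integrants per targeted event before enrichment
    enrichment: fold reduction of the random background (assumed model)
    """
    remaining_random = random_per_targeted / enrichment
    return 1.0 / (1.0 + remaining_random)


ratio = 1000.0  # assumed: random integration 3 orders of magnitude more common
print(f"{targeted_fraction(ratio):.4f}")          # 0.0010, no enrichment
print(f"{targeted_fraction(ratio, 100):.4f}")     # 0.0909, 100-fold enrichment
print(f"{targeted_fraction(ratio, 2000):.4f}")    # 0.6667, 2000-fold enrichment
```

Even under these favourable assumptions a proportion of the surviving colonies is not targeted, so each candidate clone must still be verified.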

Another method of identifying targeted clones involves the use of the Polymerase Chain Reaction (193-196). Primers are chosen which correspond to unique sequence within the targeting vector, usually to the neomycin phosphotransferase gene, and to an adjacent position in the target genome. Only where homologous recombination has occurred should the two sequences be close enough to each other for amplification to occur. Colonies that are identified as being targeted by this method must also be confirmed by Southern blotting.
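The reasoning behind the PCR screen can be sketched as a distance check. This is a simplified model: positions are hypothetical base-pair coordinates, primer orientation is assumed to be convergent whenever the two sites are close together, and the 3000 bp limit is an assumed practical ceiling for amplification rather than a value from the references.

```python
from typing import Optional

MAX_PRODUCT_BP = 3000  # assumed upper limit for amplification (illustrative)


def pcr_product_size(vector_primer_pos: int, flank_primer_pos: int,
                     same_chromosome: bool) -> Optional[int]:
    """Return the expected product size in bp, or None if no product is expected.

    vector_primer_pos: position of the vector-specific (Neo) primer site
    flank_primer_pos: position of the genomic primer site outside the vector homology
    same_chromosome: whether the vector integrated on the chromosome carrying the target
    """
    if not same_chromosome:
        return None  # primer sites are on different molecules
    size = abs(flank_primer_pos - vector_primer_pos)
    return size if 0 < size <= MAX_PRODUCT_BP else None


# Targeted clone: the Neo primer site lies 1200 bp from the flanking primer site.
print(pcr_product_size(10_000, 11_200, same_chromosome=True))      # 1200
# Random integration on another chromosome: no product.
print(pcr_product_size(10_000, 11_200, same_chromosome=False))     # None
# Random integration on the same chromosome but 2 Mb away: no product.
print(pcr_product_size(2_011_200, 11_200, same_chromosome=True))   # None
```

A product of the predicted size therefore indicates, but does not prove, a targeting event, which is why Southern blot confirmation is still required.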

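[Figure 11 diagram. Panel a, Gene Targeting: the neo gene disrupts gene X and HSV-tk is lost; resulting cells are X-, neo-resistant and HSV-tk- (resistant to G418 and to ganciclovir). Panel b, Random Integration: neo and HSV-tk integrate together; resulting cells are X+, neo-resistant and HSV-tk+ (resistant to G418, sensitive to ganciclovir).]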

Figure 11. The Positive/Negative Selection Procedure (191), showing its use for enrichment of cells containing a targeted disruption of gene X. a) A gene X-replacement vector, that contains an insertion of the Neo gene in an exon of gene X and a linked HSV-tk gene, is shown pairing with a chromosomal copy of gene X. Homologous recombination between the targeting vector and genomic X DNA results in the disruption of one copy of gene X and the loss of HSV-tk sequences. Such cells will be X-, Neo+, and HSV-tk- and will be resistant to both G418 and ganciclovir. b) Non-homologous insertion of exogenous DNA into the genome occurs through the ends of the linearized DNA, thus the HSV-tk gene remains linked to the Neo gene. Such cells will be X+, Neo+, and HSV-tk+ and therefore resistant to G418 but sensitive to ganciclovir.

Open boxes denote introns or flanking DNA sequences, closed boxes denote exons and cross-hatched boxes denote the Neo or HSV-tk genes.

Figure 12. Strategy for Detection of Homologous Recombinants by the Polymerase Chain Reaction.

[Diagram: the targeting vector carrying primer site a, the chromosomal gene with flanking primer site b, and PCR amplification between a and b.]

(A) Primers are synthesized which correspond to a novel sequence in the targeting construct (a) and a sequence present in the genome, flanking the target locus, but not included on the vector (b).

(B) Only after homologous recombination will the two priming sequences be sufficiently close to produce an amplification product. (After random insertion of the vector DNA, primer sequence "a" would not be at the target locus and therefore could not act as a co-primer for amplification with "b".)

3.6 THEORETICAL MECHANISMS

Holliday (197) proposed a model to explain the mechanism of crossing over during meiosis. This involves the formation of a junction between two molecules of DNA which then passes along the strands by a process of "branch migration". The result is the production of complementary pairs of recombinant chromatids and a reciprocal exchange of genetic information (see figure 13). In 1968 electron micrographs showing bacteriophage DNA containing branches were published, which supported Holliday's theory (198).

It was realised that recombination did not always result in a reciprocal exchange of material, and could result in an aberrant distribution of one marker. This non-reciprocal transfer of information from one chromatid to another is referred to as gene conversion. Several models have been proposed to explain this event.

Meselson and Radding (199) proposed that recombination during meiosis is initiated by a single strand break in the DNA. Polymerases act to repair this nick, but in so doing may displace the damaged strand of DNA which then invades another DNA molecule by inducing a single strand break in it. This second displaced strand is degraded enzymatically and results in the formation of a small stretch of asymmetrical heteroduplex DNA near the site of initiation. The branch cross over point can move along the molecule because of the action of exonucleases at the second nick, and results in the enlargement of the region of asymmetric heteroduplex (see figure 14). Isomerisation and branch migration through a Holliday junction then result in the resolution of the strands.

More recent observations of plasmid X chromosomal recombination in yeast have demonstrated the recombinogenic nature of double strand breaks and the occurrence of gene conversion by double strand gap repair. In 1983 Szostak et al formulated the Double-Strand-Break Repair Model for Recombination in yeast (200; see figure 15). A double strand break in one of the DNA duplexes is enlarged to a double strand gap, which is then repaired from the information on the second DNA duplex, resulting in gene conversion. (Postmeiotic segregation results from resolution of the heteroduplex DNA through Holliday junctions.) This has become the most widely accepted model for homologous recombination although several variations have also been postulated.

In 1984 Lin et al (201) proposed a similar model based on 5' exonuclease activity on the cut ends of double stranded DNA, exposing complementary regions. These regions then pair up and any existing gaps are repaired from the information on the other strand. In 1985 Wake et al (202) published a similar model which depended on the action of either exonucleases or helicases to expose the single stranded complementary regions of DNA. These two models attempt to explain the non-conservative event observed in intramolecular recombination, where both potential products of a crossover do not appear to be preserved.


Figure 13. The Holliday Model for Genetic Recombination (197).

Single-strand scissions are made at chemically identical sites on the homologous chromatids. Heteroduplex DNA is then formed symmetrically between h and r on both chromatids. The two pairs of like strands at the site of the crossover are considered by Holliday to be equivalent with respect to recognition by a DNase which terminates the exchange. Cleavage at the points marked p produces two molecules with the flanking arms in the parental configuration, whereas cleavage at the points marked r produces molecules with flanking arms in the recombinant configuration. (Figure from ref. 199.)

Figure 14. The Meselson-Radding Model (199).

Recombination is initiated (a) by a single strand nick on one of the two interacting duplexes. The 3' end of the nicked strand acts as a primer for DNA synthesis, which displaces the strand ahead of it. The displaced single strand invades the other duplex at a homologous site, displacing a D loop (b) and forming a small region of asymmetric heteroduplex DNA. The single-stranded D loop is degraded, and the invading strand is ligated in place. The limited region of asymmetric heteroduplex DNA is expanded (c) by concerted DNA synthesis on the first (donor) duplex, and by exonucleolytic degradation on the second (recipient) duplex. After the enzymatically driven production of asymmetric duplex DNA stops, either branch migration or isomerization can bring the 5' and 3' single-stranded ends into apposition so that they can be ligated. The resulting Holliday junction (d) can move along the duplex by the process of branch migration, generating symmetric heteroduplex DNA (e). Resolution can yield either the crossover (f) or non-crossover (g) configuration. (Figure from ref. 200.)

Figure 15. The Double Strand Break Repair Model (200).

(a) A double-strand cut is made in one duplex, and a gap flanked by 3' single strands is formed by the action of exonucleases. (b) One 3' end invades a homologous duplex, displacing a D loop. (c) The D loop is enlarged by repair synthesis until the other 3' end can anneal to complementary single-stranded sequences. (d) Repair synthesis from the second 3' end completes the process of gap repair, and branch migration results in the formation of two Holliday junctions. Resolution of the two junctions by cutting either inner or outer strands leads to two possible non-crossover (e) and two possible crossover (f) configurations. In the illustrated resolutions, the right hand junction was resolved by cutting the inner, crossed strands.


3.7 GERMLINE TRANSMISSION AND EXPRESSION OF A GENE CORRECTED BY HOMOLOGOUS RECOMBINATION IN EMBRYONIC STEM CELLS

In 1989 Thompson et al (203) reported the first case of germline transmission of a gene manipulated by homologous recombination. The HPRT gene was again chosen for its X-linkage and selectability.

A line of embryonic stem cells with a naturally occurring deletion mutation in the HPRT gene had been isolated and shown to be capable of contributing to the germline (195). Doetschman et al (205) corrected the mutation in these cells by supplying the missing sequences on the incoming targeting vector, which also contained between 2.5 and 5Kb of sequences homologous to the genome. The vector was designed so that homologous recombination would take place by a single crossover event at the precut Xho I site, and result in the insertion of 12.4Kb of DNA.

The vector was introduced into the ES cells by electroporation and corrected cells were selected by their ability to grow on HAT. DNA from these colonies was analyzed by Southern blotting techniques. 14% of the stably transfected cells were shown to have a corrected HPRT gene.

Thompson et al injected the targeted cells, which also carried a genetic marker for light coat colour, into host mouse blastocysts from strains with dark coat colours. Any chimaeras could, therefore, be detected by the presence of light patches. Ninety-three blastocysts were injected. Twenty-six mice were live-born and twelve of the fifteen chimaeras were male. Only one of the eight males used in breeding experiments transmitted the targeted gene to its offspring.

DNA from the F1 generation was tested by Southern blotting techniques and found to give the bands expected for the corrected HPRT gene. Messenger RNA levels for HPRT in these transgenic mice were comparable to those found in normal mice. All analyses appear to confirm that gene function had been totally restored and faithfully transmitted.

Chapter 4

Materials and Methods

4. MATERIALS AND METHODS

CONTENTS

MOLECULAR BIOLOGY
Techniques
Genomic DNA Extraction from Tissues ...... 94
Genomic DNA Extraction from Blood ...... 95
Extraction of RNA ...... 95
Restriction Enzyme Digests ...... 96
Electrophoresis ...... 97
Phenol/Chloroform Extraction ...... 97
Ethanol Precipitation ...... 97
Ligation of DNA ...... 98
Preparation of Competent Cells ...... 98
Transformation of DNA ...... 99
Bacterial Colony Lifts ...... 99
Plasmid DNA Extraction ...... 100
  Mini preps ...... 100
  Maxi preps ...... 101
Electroelution of DNA from Gels ...... 103
Southern Blotting ...... 103
Dot Blots and Slot Blots ...... 104
cDNA Synthesis ...... 104
Stripping of Filters ...... 105
T4 Polynucleotide Kinase-Labelling of Oligonucleotides ...... 105
Radiolabelling Fragments with a Random Primer ...... 105
Radiolabelling of DNA Fragments by Nick Translation ...... 106
Dideoxy Sequencing ...... 107
Bacteriophage λ Library Screening ...... 109
  Plating bacteria ...... 109
  Titering the library ...... 109
  Bacteriophage plaque lifts ...... 109
Isolation of Bacteriophage DNA ...... 111
  Minipreps ...... 111
  Maxipreps ...... 111
Bacteriophage λ Mapping ...... 114
DNA Amplification ...... 115
Preparation of Blood Samples for PCR ...... 115
Bacteria and Bacteriophage Strains ...... 115
Buffers and Stock Solutions ...... 116

TISSUE CULTURE
Techniques ...... 126
Cells ...... 126
Growth Conditions ...... 126
Inactivation of FCS ...... 126
BRL-Conditioned Media ...... 127
Subculturing of ES Cells ...... 127
Splitting of Cell Cultures ...... 127
Freezing of Cell Cultures ...... 128
Thawing Cells ...... 128
Electroporation ...... 129
Buffers and Stock Solutions for Tissue Culture ...... 130

4.1 MOLECULAR BIOLOGY

4.1.1 Techniques

Genomic DNA Extraction From Tissues (206)

Blood and connective tissue were removed from up to 5 grams of fresh mouse liver tissue. The tissue was then Dounce homogenised in 30ml of 5% citric acid and poured through several layers of sterile gauze prewet in 5% citric acid. The homogenate was then centrifuged for 5 min at 2,500g. The pellet was resuspended in 10ml of 5% citric acid by gently pipetting up and down in a 10ml pipette. The homogenate was then placed on top of a 15ml cushion of 30% sucrose in 5% citric acid in an acid-washed Corex tube. This was spun at 5000g for 5 min in a swinging bucket rotor.

Upper layers were removed by aspiration and the nuclear pellet resuspended in 10ml of RSB. This was centrifuged at 2500g for 2 min. The pellet was rewashed with RSB until the suspension pH reached 7.4. Sodium dodecyl sulphate (SDS) was added to this nuclei suspension to a final concentration of 1%, and 11mg of proteinase K powder were added. The mixture was incubated at 37°C for 1-2 hours, then a further 11mg of proteinase K were added and the mixture was incubated for 1-2 hours or until the tissue was completely dissolved.

1/10 of the total volume of 3M sodium acetate pH 5.2 was added and the mixture was extracted with an equal volume of a 50:50 mix of buffered phenol and chloroform. The aqueous layer was transferred to a new tube with a wide-bored pipette, and extracted with 2 volumes of diethyl ether. The top, ether phase was removed by aspiration and the remaining aqueous layer was precipitated in 2.5 vols of ethanol. DNA was then spooled onto a glass rod and transferred to a new tube.

The DNA was suspended in 10ml 0.1XSSC at 4°C overnight. 20µl of DNase-free RNase A were added and the solution was incubated at 37°C for 30 min. The DNA was then extracted with phenol/chloroform and ethanol precipitated, as described below. The optical density at 260nm was read and the yield of DNA calculated from the equation 1 OD260 = 50µg DNA/ml. The DNA was then dissolved in water at approximately 1mg/ml and stored at -20°C.
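The yield calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the dilution factor argument and the example readings are assumptions and do not come from the original protocol. The only value taken from the text is the conversion 1 OD260 = 50µg DNA/ml.

```python
UG_PER_ML_PER_OD260 = 50.0  # from the text: 1 OD260 = 50 ug double-stranded DNA per ml


def dna_conc_ug_per_ml(od260, dilution_factor=1.0):
    """Concentration of the undiluted sample (dilution_factor is a hypothetical cuvette dilution)."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def dna_yield_ug(od260, sample_volume_ml, dilution_factor=1.0):
    """Total DNA in the sample, in micrograms."""
    return dna_conc_ug_per_ml(od260, dilution_factor) * sample_volume_ml


def water_volume_for_target_ml(total_ug, target_mg_per_ml=1.0):
    """Volume in which to dissolve the DNA to reach the target concentration."""
    return total_ug / (target_mg_per_ml * 1000.0)


# Illustrative reading: OD260 of 0.5 on a 20-fold dilution of a 10 ml sample.
total_ug = dna_yield_ug(0.5, 10.0, dilution_factor=20.0)
print(total_ug, water_volume_for_target_ml(total_ug))
```

The same conversion applies to the plasmid maxi prep yields described later in this chapter.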

Genomic DNA Extraction From Blood

40ml blood lysis buffer were added to 10ml blood. The mixture was left on ice for 20 min then centrifuged at 960g for 20 min at 4°C. The supernatant was removed and 4.5ml of 75mM NaCl, 24mM EDTA were added to the pellet. The pellet was then resuspended by vigorous pipetting and the mixture transferred to a polypropylene tube. 0.5ml of 5% SDS, 2mg/ml proteinase K were added and the mixture was incubated overnight at 37°C.
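As a quick check on the digestion conditions above, the final concentrations after adding 0.5ml of the SDS/proteinase K solution to 4.5ml of NaCl/EDTA can be estimated by simple dilution. This is a rough sketch: the function name is hypothetical, and the volume contributed by the cell pellet is ignored, which is an assumption.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Concentration of one component after dilution into the total volume."""
    return stock_conc * stock_volume_ml / total_volume_ml


total_ml = 4.5 + 0.5  # NaCl/EDTA solution plus SDS/proteinase K solution

sds_percent = final_concentration(5.0, 0.5, total_ml)             # from 5% SDS
proteinase_k_mg_per_ml = final_concentration(2.0, 0.5, total_ml)  # from 2 mg/ml
nacl_mM = final_concentration(75.0, 4.5, total_ml)                # from 75 mM NaCl
edta_mM = final_concentration(24.0, 4.5, total_ml)                # from 24 mM EDTA

print(sds_percent, proteinase_k_mg_per_ml, nacl_mM, edta_mM)
```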

The following morning the solution was sequentially extracted with equal volumes of phenol, phenol/chloroform and chloroform, as previously described. 2.5 volumes of ethanol were then added to the solution to precipitate the DNA, which was removed by spooling onto a sterile plastic inoculating loop.

Extraction Of RNA

All solutions for the preparation of RNA, apart from the 1M stock of Tris.Cl, were treated with Diethylpyrocarbonate (DEPC) at a final concentration of 0.01% to destroy RNases. Preparations and solutions of RNA were kept at -20°C or on ice to further prevent degradation. Chemical stocks were kept exclusively for RNA work and always handled with gloves.

Approximately 2g tissue were placed in 25ml 6M urea/3M LiCl, in a 50ml plastic centrifuge tube. This was homogenised with a polytron at Vmax for 1 min on ice. The suspension was then left to precipitate at 4°C overnight.

The following morning the suspension was transferred to a Corex tube and centrifuged at an average of 7691g for 20 min at 0°C. The supernatant was removed and the pellet was washed twice by resuspension in 20ml urea/LiCl and recentrifugation at 7691g for 20 min at 0°C. The tube was then inverted to thoroughly drain the pellet. This was then dissolved in 6ml 10mM Tris.Cl, pH7.5, 0.5% SDS. Proteinase K was added to the solution to a final concentration of 50µg/ml and the mixture was incubated at 37°C for 1-2h.

The mixture was transferred to a polypropylene tube and phenol/chloroform extracted three times, as described. 17ml of ethanol were added and the RNA was precipitated at -20°C overnight.

The following morning the RNA was pelleted by centrifugation at an average of 7691g for 30 min at 0°C. The supernatant was carefully removed, the pellet was washed in 80% ethanol and repelleted as before. The pellet was then drained and dried briefly under vacuum, before dissolving in sterile, RNase-free water.

Restriction Enzyme Digests

DNA was suspended at approximately 0. in the appropriate (see appendix IV) 1 x buffer, and digested with a 1-10-fold excess of the appropriate restriction endonuclease. These digests were incubated at 37°C for 2 hours. Genomic DNA was digested in the presence of 3mM spermidine for 4-5h at 37°C.

Electrophoresis

Horizontal gel electrophoresis equipment was obtained from Gibco BRL. Agarose gels were prestained with ethidium bromide and electrophoresed under current-limiting or voltage-limiting conditions, depending on the purpose of the gel, in 1 X TBE containing ethidium bromide at 0.5µg/ml. Low molecular weight DNA was electrophoresed at a limiting current of about 100mA. DNA of higher molecular weight was electrophoresed for longer periods of time at about 50V, constant voltage, to improve resolution.

Phenol/Chloroform Extraction

The DNA to be extracted was dissolved in a volume of at least 100µl. To the DNA solution was added 1/10 volume of 3M sodium acetate pH5.2 and an equal volume of a 50:50 mix of buffered phenol and chloroform. The mixture was vortexed briefly and microfuged at 13,000g for 2 min. The upper aqueous layer was removed.

Ethanol Precipitation

The DNA to be precipitated was dissolved in a volume of at least 100µl. To the DNA solution was added 1/10 volume of 3M sodium acetate pH5.2 (unless already added in a previous phenol/chloroform extraction) and 2.5 volumes of 100% ethanol. The solution was mixed and microfuged at 13,000g for 10 min. The pellet was washed in 70% ethanol and respun for 5 min. The liquid was carefully poured off and the pellet dried under vacuum.
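The reagent volumes for a precipitation of this kind can be worked out as below. This is a sketch under stated assumptions: the function name is hypothetical, and the 2.5 volumes of ethanol are taken relative to the starting sample volume, since the text does not say whether the added sodium acetate is counted.

```python
def ethanol_precipitation_volumes(sample_ul, salt_already_added=False):
    """Return reagent volumes (in microlitres) for one ethanol precipitation."""
    # 1/10 volume of 3M sodium acetate pH5.2, unless carried over from a
    # previous phenol/chloroform extraction.
    sodium_acetate_ul = 0.0 if salt_already_added else sample_ul / 10.0
    # 2.5 volumes of 100% ethanol (assumed relative to the sample volume).
    ethanol_ul = 2.5 * sample_ul
    return {
        "sodium_acetate_ul": sodium_acetate_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": sample_ul + sodium_acetate_ul + ethanol_ul,
    }


# Minimum sample volume mentioned in the text.
print(ethanol_precipitation_volumes(100.0))
```

For the 100µl minimum volume the total stays well within a 1.5ml tube, which is consistent with the microfuge steps described.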

Ligation of DNA

Ligations were carried out in a total volume of 20µl in 1 x ligase buffer in the presence of 1mM ATP and 1 unit of T4 DNA ligase. The mixture was incubated either at room temperature for 4 hours or at 15°C overnight.

Preparation of Competent Cells

All manipulations were carried out in a cold room and pipettes and solutions were all prechilled.

The desired strain was streaked onto L-broth + 10mM MgSO4 and incubated at 37°C overnight. A single colony was then picked and grown for 2 hours in 5ml L-broth at 37°C in a shaking incubator. The culture was then transferred to either 100ml of prewarmed LB-broth in a 500ml flask or to 500ml of prewarmed LB-broth in a 2 litre flask, and incubated in a shaking incubator until the optical density at 550nm was 0.5.

The cells were then pelleted at 2080g for 10 min at 4°C and resuspended in TFBI (25ml for every 100ml starting culture). The suspension was left on ice for 5-20 min. The cells were then repelleted at 2080g for 10 min at 4°C and resuspended in TFBII (4ml per 100ml starting culture). The suspension was then aliquoted, snap frozen in liquid nitrogen and stored at -70°C.
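The buffer volumes scale linearly with the starting culture, so the two culture sizes mentioned above can be handled by one small helper. The function name is hypothetical; the ratios (25ml TFBI and 4ml TFBII per 100ml of culture) are those given in the text.

```python
def competent_cell_buffer_volumes(culture_ml):
    """Volumes of TFBI and TFBII (ml) needed for a given starting culture volume."""
    return {
        "TFBI_ml": 25.0 * culture_ml / 100.0,   # 25 ml per 100 ml culture
        "TFBII_ml": 4.0 * culture_ml / 100.0,   # 4 ml per 100 ml culture
    }


for culture_ml in (100.0, 500.0):
    print(culture_ml, competent_cell_buffer_volumes(culture_ml))
```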

Transformation of DNA

10µl of the ligation reaction were diluted 5-fold with water and added to 100µl of competent cells. The mixture was incubated on ice for 30 min then heat shocked at 42°C for 90 seconds. 1ml of L-broth was added and the mixture was incubated at 37°C for a maximum of 1 hour, before plating onto L-agar + antibiotic.

Bacterial Colony Lifts

A labelled nitrocellulose filter was placed on top of the bacterial plate. Orientation marks were inserted using a needle, and repeated in exactly the same place at each subsequent lift.

The filter was carefully lifted from the plate and placed face upwards on a piece of Whatman paper soaked in 10% SDS for 5 min. The filter was then placed successively on paper soaked in Southern Solution 1, Southern Solution 2 and 20XSSC, standing for 5 min on each. The filters were air dried then baked in a vacuum oven for 2 hours at 80°C.

Plasmid DNA Extraction (Alkaline Lysis Method)

Mini Preps

3ml of L-Broth + the relevant antibiotics were inoculated with a single colony and incubated overnight at 37°C.

The following morning half of the culture was poured into a 1.5ml flip top tube and the remainder stored at 4°C. The bacteria were pelleted by microcentrifugation at 6500g for 2 min, and the supernatant removed by aspiration. The bacterial pellet was resuspended in 100µl of lysis buffer and incubated at room temperature for 5 min. 200µl of alkaline SDS were added to the mixture and incubated at room temperature for a further 5 min. 150µl of 3M potassium acetate pH4.8 were added to the tube, the contents mixed, and again incubated at room temperature for 5 min.

The contents of the tube were microfuged at 13000g for 10 min and the supernatant was poured into a fresh tube containing 400µl of equilibrated phenol/chloroform (50/50). The mixture was vortexed and then microfuged for 2 min at 13000g. 300µl of the aqueous layer were transferred to another tube and ethanol precipitated as described.

The dried pellet was redissolved in 50µl of RNase A at a concentration of 20µg/ml in sterile, distilled water. Approximately 5µl of this solution were then used for restriction endonuclease analysis.

Maxi Preps

A 5ml starter culture (L-broth + appropriate antibiotic) was inoculated with either a glycerol stock or a single colony from a plate and incubated in a shaking incubator at 37°C overnight. At the same time 2L flasks containing 500ml of L-broth (no antibiotic) were placed in the orbital shaker to prewarm to 37°C.

The following morning the large flasks were inoculated with the whole of the starter culture and left at 37°C until the following morning.

5ml of cells were removed as a glycerol stock, added to 0.5ml of 10x Hogness buffer, left at room temperature for 30 min, aliquoted and stored at -70°C. The rest of the cells were pelleted at an average of 2661g for 10 min.

0.5g of lysozyme was dissolved in 250ml of lysis buffer. The bacterial pellet was resuspended in 40ml of the lysis buffer/lysozyme by gently pipetting the cells up and down using a 10ml pipette. These were left on ice for 5 min and 80ml of alkaline SDS were added. The contents were mixed by swirling and again incubated on ice for 5 min. 60ml of 3M potassium acetate (pH4.8) were added, mixed by inversion and left on ice for a further 10 min. The mixture was centrifuged at an average of 7691g for 10 min and the supernatant was poured into a fresh 500ml bottle through a double layer of sterile muslin.

The contents of the bottle were weighed and 0.6 volumes of 2-propanol were added. The mixture was precipitated at -20°C for 30 min then centrifuged at 7691g for 20 min. The supernatant was poured off and the tube drained in an upright position. The pellet was dissolved in 8ml of sterile water then transferred to a 30ml universal containing 8g of CsCl and 500µl of ethidium bromide stock solution. The mixture was vortexed, transferred to a 15ml siliconised Corex tube and spun at an average of 7796g for 10 min.

The liquid was transferred to a quickseal tube using a 5ml syringe and the tube was filled to the top with CsCl stock solution. The tubes were balanced to within 0.1g of each other and spun in the ultracentrifuge at 25°C at an average of 200,000g for at least 18 hours or at an average of 140,000g for greater than 40 hours.

The plasmid band was harvested and transferred to a fresh quickseal tube. 75µl of ethidium bromide were added, the tube was topped up with CsCl stock solution, rebalanced and respun under the same conditions.

The plasmid band was harvested and the volume increased 5-fold with water. The ethidium bromide was removed by repeated extractions with equal volumes of isoamyl alcohol. The DNA was then precipitated by adding 1/10 volume of sodium acetate pH5.2 and 2 volumes of ethanol, and leaving at -20°C for 60 min. The DNA was recovered by centrifugation at an average of 7796g for 20 min, redissolved in 300µl of sterile water and reprecipitated.

The optical density of the solution at 260nm was read, and the yield was calculated from the equation 1 OD260 = 50µg DNA/ml. The DNA was routinely dissolved at 1mg/ml.

Electroelution of DNA from Gels

The DNA to be separated was run on an ethidium bromide stained agarose gel. The band of interest was located under a UV light (254nm) and cut out using a clean scalpel blade. Electroelution buffer was placed in each of the reservoirs of the electroelution apparatus (from Applied Biosystems Incorporated) and any air bubbles eliminated with a syringe. The excised agarose block was chopped up and placed in the gel chamber of the electroelution apparatus and 120µl of salt cushion were placed in the buffer chamber. The DNA was electrophoresed at 125 volts for 1 hour.

400µl of salt cushion containing the DNA were then removed from the buffer chamber with a needle and syringe and the DNA ethanol precipitated. The DNA was repeatedly extracted with phenol/chloroform and recovered by ethanol precipitation before being used in ligation protocols.

Southern Blotting (208)

After electrophoresis the agarose gel was photographed under UV transillumination (254nm). The DNA was then nicked by exposing the agarose gel to 120 Joules of ultraviolet light in a Stratagene Stratalinker. The gel was washed twice in Southern Solution 1 for 20 min and twice in Southern Solution 2 for 20 min. The blot was then set up in 20 x SSC using "genescreen" (DuPont) nylon membrane (see Maniatis).

Genomic DNA was blotted for at least 48 hours. Plasmid DNA was blotted overnight. The blot was then dismantled and the DNA cross-linked to the nylon membrane by exposure to 120 Joules of UV light.

The blots were prehybridized in hybridization buffer alone for a period of 3 hours. This was then replaced with fresh buffer containing denatured 32P radio-labelled probe.

Hybridization was carried out overnight in bags. DNA fragments were labelled with 32P by either nick translation or randomly primed oligonucleotide labelling; 32P was added to the 5' end of oligonucleotides by the action of T4 polynucleotide kinase. Genomic DNA blots were hybridized in genomic hybridization buffer. Plasmid or bacteriophage DNA blots were hybridized in Hybridization Buffer 1 or in oligo hybridization buffer, depending on the probe.

Dot Blots and Slot Blots

The DNA to be applied to the membrane was denatured by dissolving in 200µl of 0.1M NaOH, and incubated at 70°C for 20 min. The mixture was then neutralized by the addition of 200µl of 0.1M HCl in 10 X SSC.

The nylon membrane was cut to the appropriate size and prepared for blotting by prewetting in water and soaking in 20 X SSC. The membrane was placed inside the apparatus and the DNA samples loaded. A vacuum pump was applied to draw the samples onto the membrane and the wells were then washed with 100µl of 5 x SSC under vacuum.

The filter was fixed either by baking at 80°C for 2 hours or by exposure to 120 J of UV light. The apparatus was soaked in 50mM NaOH for 30 min then rinsed in distilled water before use, in order to reduce background on subsequent blots.

cDNA Synthesis

A cDNA synthesis kit was purchased from Amersham Int. and the manufacturer's instructions were followed. This involved incubating approximately 1µg total or A+ RNA in the presence of a random primer, sodium pyrophosphate, deoxynucleoside triphosphates, [α-32P]dATP, and reverse transcriptase.

Stripping of Filters

Stripping of blots containing DNA for rehybridization was performed by washing the filters in 50mM NaOH for 30 min at room temperature. The NaOH was then removed by rinsing twice in water for 20 min at room temperature.

T4 Polynucleotide Kinase-Labelling of Oligonucleotides

250ng of oligonucleotide in 2.5µl of sterile water were added to 1µl of 10 x kinase buffer, 5µl [γ-32P]ATP (2.25MBq) and 1µl of T4 polynucleotide kinase. The mixture was incubated at 37°C for 40 min and then passed over a nick column (Pharmacia Sephadex G50 gel filtration), according to the manufacturer's instructions, to remove unincorporated material.

Radiolabelling Fragments with a Random Primer

DNA fragments were radiolabelled with 32P using a random primer labelling kit purchased from Boehringer Mannheim.

Approximately 200ng DNA were suspended in 9µl water and boiled for 10 min. The solution was then allowed to cool on ice. To the DNA solution were added 3µl of a mixture containing equal volumes of the dCTP, dGTP and dTTP stocks, 2µl reaction mixture 6 (containing reaction buffer and random primer), 5µl [α-32P]dATP (2.25MBq) and 1µl enzyme. The reaction mixture was incubated at 37°C for 2-3 hours, passed over a nick column and boiled for 10 min. It was then cooled on ice and added in hybridization buffer to nitrocellulose or nylon filters.

Radiolabelling of DNA Fragments by Nick Translation (209)

DNA fragments were labelled with 32P by nick translation using a kit purchased from Amersham International.

50 to 100ng of fragment were resuspended in 4µl sterile water. To this were added 3µl [α-32P]dATP (1.35MBq), 2µl GCT buffer and 1µl enzyme. The reaction mixture was incubated at 15°C for 2-3 hours, then passed over a nick column to eliminate unincorporated material.

Dideoxy Sequencing (210)

2ng of double stranded DNA in a volume of 20µl were denatured by adding 2µl of 2M NaOH, 2mM EDTA, and incubating at room temperature for 5 min. The mixture was then precipitated and dried as previously described.

The sequencing mixes were made up as follows: (For use with [a35S]dATP)

Deoxynucleotide triphosphate stock solutions were diluted to a final concentration of 0.5mM, then the solutions A°-T° were made up as below.

mixture        A°    C°    G°    T°
0.5mM dNTP
  dCTP         20     1    20    20
  dGTP         20    20     1    20
  dTTP         20    20    20     1
H2O            20    20    20    20

The dideoxynucleotide triphosphate stocks were diluted as follows:

ddATP -> 0.1mM
ddCTP -> 0.1mM
ddGTP -> 0.3mM
ddTTP -> 0.5mM

These solutions were then mixed with the deoxynucleotide triphosphate mixes in a 50:50 ratio.
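The nucleotide concentrations in each finished termination mix follow from the table and dilutions above. The sketch below works them out under two assumptions that are not stated in the text: the table entries are relative volumes of 0.5mM stocks and water, and "50:50" means equal volumes. All names in the code are hypothetical.

```python
DNTP_STOCK_MM = 0.5  # each dNTP stock, as stated in the text

# Relative volumes per mix, read from the table (units assumed, e.g. microlitres).
PARTS = {
    "A": {"dCTP": 20, "dGTP": 20, "dTTP": 20, "H2O": 20},
    "C": {"dCTP": 1, "dGTP": 20, "dTTP": 20, "H2O": 20},
    "G": {"dCTP": 20, "dGTP": 1, "dTTP": 20, "H2O": 20},
    "T": {"dCTP": 20, "dGTP": 20, "dTTP": 1, "H2O": 20},
}

# Diluted ddNTP stocks (mM), as stated in the text.
DDNTP_MM = {"A": 0.1, "C": 0.1, "G": 0.3, "T": 0.5}


def termination_mix(base):
    """Concentrations (mM) in the 50:50 mix of one dNTP solution and its ddNTP."""
    parts = PARTS[base]
    total = sum(parts.values())
    # Each dNTP is diluted within its mix, then halved by the 50:50 mixing step.
    conc = {
        name: 0.5 * DNTP_STOCK_MM * volume / total
        for name, volume in parts.items()
        if name != "H2O"
    }
    conc["dd" + base + "TP"] = 0.5 * DDNTP_MM[base]
    return conc


for base in "ACGT":
    print(base, termination_mix(base))
```

Under these assumptions, the dNTP matching the dideoxynucleotide is present at roughly one twentieth of the concentration of the others in the C°, G° and T° mixes. dATP is supplied only as the labelled nucleotide, so it does not appear in the mixes.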

The chase consists of a solution of the four deoxynucleotide triphosphates at a final concentration of 0.5mM each.

The stop solution consists of 80% formamide, 0.1% xylene cyanol, 0.1% bromophenol blue, 1mM EDTA, 10mM NaOH. It was made up fresh or stored at -20°C for 2-3 days.

The denatured DNA was resuspended in 8.5µl of sterile water and the following reagents were added: 1.5µl of 10x low salt restriction endonuclease buffer, 2µl of primer at a concentration of 10µg/ml, 2µl of [α-35S]dATP (0.81MBq). The mixture was allowed to anneal at 37°C for 15 min. 2 units of E. coli DNA polymerase "Klenow fragment" were added to the annealed mixture and 3µl of this mix were then added to each of four empty flip-top tubes. 2µl of the appropriate 50:50 mixes, made up as above, were added to each of these tubes and the reaction mixtures incubated at 50°C. After 15 min, 2µl of the chase were added to the mixture, which was then incubated for a further 15 min at the same temperature. 20µl of the stop solution were added to terminate the reaction.

The sequencing reactions were heated to 90°C for 5 min and 3-4µl were loaded onto an 8% acrylamide/6M urea sequencing gel. The gels were run on constant power at 70 watts for 35x50cm gels.

The gels were fixed in a solution of 10% methanol, 10% glacial acetic acid for 20 min. They were then lifted carefully onto a sheet of Whatman 3MM paper and covered with cling film before drying at 80°C for 30 min.

Bacteriophage λ Library Screening

Plating Bacteria

Plating bacteria (Escherichia coli strain NM538) were prepared by overnight culture in LB-Broth, 0.2% maltose/10mM MgSO4. The following morning the cells were spun down at 2080g for 10 min, then resuspended in 0.4 X the volume of original culture of 0.01M MgSO4. This stock was stored at 4°C for 2-3 weeks.

Titering The Library

Serial dilutions of the bacteriophage λ library stock were made in SM buffer. 100µl of these dilutions were added to 100µl of plating bacteria, and incubated at 37°C for 20 min. 3ml of LB-broth, 0.75% agarose at 52°C were added to the mix and quickly poured on top of a dried 90mm plate.

The plate was incubated at 37°C overnight and the number of plaque forming units per ml of library was calculated the following morning.
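The titre calculation can be sketched as follows. The function name and the example plaque count are illustrative assumptions; the 0.1ml default reflects the 100µl of each dilution plated in the procedure above.

```python
def titre_pfu_per_ml(plaque_count, dilution_fold, volume_plated_ml=0.1):
    """Plaque forming units per ml of undiluted library stock.

    plaque_count     -- plaques counted on one plate
    dilution_fold    -- how many fold the stock was diluted (e.g. 1e6)
    volume_plated_ml -- volume of the dilution added to the plating bacteria
    """
    return plaque_count * dilution_fold / volume_plated_ml


# Illustrative numbers only: 150 plaques from a million-fold dilution.
print(titre_pfu_per_ml(150, 1e6))
```

Plates with too few or too many plaques give unreliable counts, so in practice a dilution yielding a countable plate would be chosen for the calculation.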

Bacteriophage Plaque Lifts

The library was diluted in SM so that 300µl contained approximately 50,000 plaque forming units. For each of twenty 140mm plates, 300µl of library dilution were added to 300µl of plating bacteria. These were incubated and plated as described, except that 7-8ml LB-broth, 0.75% agarose were added to the mixture of bacteriophage and cells.
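The dilution needed to reach about 50,000 plaque forming units per 300µl, and the total number of plaques screened across twenty plates, can be estimated as below. This is a sketch: the function names and the example stock titre are assumptions, while the target figures come from the text.

```python
def library_dilution_fold(stock_titre_pfu_per_ml, target_pfu=50_000, aliquot_ul=300):
    """Fold dilution of the stock so that one aliquot holds the target number of pfu."""
    target_pfu_per_ml = target_pfu / (aliquot_ul / 1000.0)
    return stock_titre_pfu_per_ml / target_pfu_per_ml


def total_plaques_screened(plates=20, pfu_per_plate=50_000):
    """Approximate number of recombinant bacteriophage screened in one round."""
    return plates * pfu_per_plate


# Illustrative stock titre of 1.5e9 pfu/ml (not a value from the text).
print(library_dilution_fold(1.5e9))
print(total_plaques_screened())
```

With the figures in the text, one round of screening covers roughly one million plaques.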

The following morning the plates were placed at 4°C for 1h to allow the agarose to harden. The lifts were then performed in the same manner as bacterial colony lifts but omitting the SDS treatment. (Up to five lifts may be taken from each plate for use either as duplicates or for screening with different probes.) The plates were stored at 4°C.

After baking at 80°C for 2 hours the filters were washed for 30 min in 50mM NaOH, and rinsed twice for 30 min in water to remove any cell debris prior to hybridization.

The filters were hybridized as described in the "Southern Blotting" section in Hybridization Buffer 1.

Once autoradiographed, the films were aligned with the orientation marks on the filters. The films were then placed underneath the bacteriophage plates so as to enable picking of positive plaques. The bacteriophage were picked by scraping the area around the positive plaque and eluting it in 1ml of SM containing a drop of chloroform (to prevent cell growth), for several hours at 4°C.

Isolation of Bacteriophage DNA

Minipreps

Single bacteriophage plaques were picked and eluted in 1ml SM buffer for 30 min. 100µl of this suspension were added to 100µl of plating bacteria stock and incubated at 37°C for 20 min. 5ml LB-broth containing 10mM MgSO4 and 0.2% maltose were added and the culture was placed in a shaking incubator at 37°C overnight.

The following morning 50µl of chloroform were added to the culture in order to obtain complete lysis of the bacterial cells. 500µl of this culture were microfuged at 13000g for 5 min and the remainder was stored at 4°C. To the supernatant were added E. coli DNase I and RNase A to a concentration of 100ng/ml, and the mixture was incubated at 37°C. After 30 min the following were added: 8µl of stock EDTA (final conc. 8mM), 20µl of 10% SDS, and 20µl of proteinase K stock (final conc. 400µg/ml). The mixture was incubated for a further hour at 37°C before being phenol/chloroform extracted and ethanol precipitated as previously described. The whole preparation was usually used in one restriction digest and analysed by agarose gel electrophoresis.

Maxipreps (plate lysis method)

A single bacteriophage plaque was picked and eluted in 1ml of SM buffer for 30 min. 300µl of this suspension were added to 300µl of plating bacteria stock, incubated at 37°C for 20 min and plated onto LB-agar + 10mM MgSO4 and 0.2% maltose. The plates were incubated overnight at 37°C.

The following morning 10ml of SM buffer were placed on each of the plates in order to elute the bacteriophage. The plates were placed at 4°C for 4 hours, rocking occasionally. (The plates should be almost, if not totally, confluently lysed.) The SM buffer + eluate was removed and used as the bacteriophage stock.

300µl of the bacteriophage stock were added to 300µl of plating bacteria, incubated and plated as before. Four 140mm plates were prepared for each bacteriophage DNA preparation.

The following morning the bacteriophage were eluted in SM as before. All the eluates for each bacteriophage were pooled into a 50ml tube and 0.8ml chloroform and 2.3g of NaCl were added. The mixture was swirled until the salt dissolved and the tubes were left on ice for 60 min. The contents were then transferred to 50ml Corex tubes and centrifuged at 7691g for 10 min at 4°C.

The supernatant was removed to fresh 50ml Falcon tubes, and 4g of polyethylene glycol (PEG mol. wt = 8,000) were added to each. The mixture was swirled to dissolve the PEG and left on ice for 2-3 hours. The mixture was again centrifuged in Corex tubes at 7691g for 10 min at 4°C. The tubes were drained, the pellet dissolved in 5ml of SM and the solution was extracted with an equal volume of chloroform.

0.5ml was removed from the solution as a high titre bacteriophage stock, and stored at 4°C. To the remainder of the bacteriophage solution were added DNase I and RNase A to a concentration of 1µg/ml. The mixture was incubated at 37°C for 30 min then phenol/chloroform and chloroform extracted. Proteinase K was added to a concentration of 50µg/ml, EDTA to 20mM, and SDS to 0.5%, before incubation at 42°C for 60 min.

The resulting DNA solution was extracted twice with phenol/chloroform and once with chloroform alone, then dialysed overnight at 4°C against 10mM Tris.HCl pH7.6, 1mM EDTA. The concentration of the DNA was calculated from the OD260.

Bacteriophage λ Mapping

This was performed as instructed in the Amersham kit RPN1721, based on the method by Rackwitz et al. (211).

Oligonucleotides corresponding to the left and right arms of the bacteriophage EMBL3 were 32P-labelled with T4 polynucleotide kinase according to Amersham's instructions.

Conditions for the partial digestion of 250ng bacteriophage DNA were optimised. Aliquots of the reaction mixture were taken and stopped at three different time points. 2/3 of the digested DNA was added to 32P-labelled Right oligo and the remaining 1/3 to the 32P-labelled Left oligo. The mixtures were heated to 70°C for 3 min and allowed to anneal at 42°C for 30 min. The samples were then loaded onto a 0.5% agarose gel and electrophoresed at 50V for approximately 36 hours. The gel was transferred to Whatman DE81 paper, covered with cling film and dried without heat for 1 hour, then for 30 min at 80°C, before autoradiography.
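The way a map is read from such an autoradiograph can be illustrated with a short sketch. It assumes, as is usual for end-labelled partial digests, that each labelled fragment runs from the labelled end of the bacteriophage to one restriction site, so that its length gives the distance of that site from the end. The function name, the clone length and the fragment sizes are invented for illustration and are not data from this work.

```python
def sites_from_partial_digest(left_fragments_kb, right_fragments_kb, total_kb):
    """Convert labelled fragment sizes into site positions (kb from the left end)."""
    # Left-labelled fragments span from the left end to a cut site,
    # so the fragment size is the site position itself.
    from_left = sorted(size for size in left_fragments_kb if size < total_kb)
    # Right-labelled fragments span from a cut site to the right end,
    # so the site position is the total length minus the fragment size.
    from_right = sorted(
        total_kb - size for size in right_fragments_kb if size < total_kb
    )
    return from_left, from_right


# Hypothetical 43 kb clone; the 43 kb band is the uncut molecule and is ignored.
left, right = sites_from_partial_digest([5.0, 12.0, 43.0], [38.0, 31.0, 43.0], 43.0)
print(left, right)
```

Agreement between the two lists provides an internal check, since each site should be detected independently from both labelled ends.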

DNA Amplification / Polymerase Chain Reaction (PCR)

DNA amplification was performed on a Techne Dry Plate Cycler in a total volume of 25µl. The reaction mixture consisted of 1 x PCR buffer, 0.05% W1 detergent (BRL), 100ng of each oligonucleotide primer, approximately 250ng genomic DNA (or approximately 10-100pg cloned DNA) and 1 unit of Promega Thermus aquaticus DNA polymerase. The reaction mixture was overlaid with liquid paraffin before placing in the apparatus.

Preparation of Blood Samples for PCR (202)

200µl aliquots of blood were microfuged at 7,500g for 5 min. The supernatant was discarded and 200µl sterile water were added. The mixture was boiled for 5 min, then respun as before. The supernatant was then removed, and if not already a pale yellow colour, reboiled. This supernatant was then used as the DNA solution for amplification.

Bacteria and Bacteriophage Strains

The Escherichia coli strains DH5 (Genotype: supE44, hsdR17, recA1, endA1, gyrA96, thi-1, relA1.) (ref Hanahan 1983) and DH5αF' (Genotype: endA1, hsdR17(rk-, mk+), supE44, thi-1, λ-, recA1, gyrA96, relA1,

The strain NM538 (Genotype: supF, hsdR, trpR, lacY) (ref Frischauf et al 1983) was used as the plating bacteria for bacteriophage from the Mouse Genomic DNA library obtained from Cambridge Bioscience.

4.1.2 Buffers and Stock Solutions

(All autoclaving carried out for 20 min at 15lbs/sq inch and 121°C unless otherwise specified.)

Acrylamide stock 40%: 38% (w/v) acrylamide, 2% (w/v) N,N'-methylenebisacrylamide.

8% Acrylamide/6M Urea gel: 1xTBE, 42% (w/v) urea, 8% (w/v) acrylamide. Dissolve urea in TBE and water, add acrylamide and filter through Whatman number 1 paper.

Alkaline SDS: 0.2M NaOH, 1% SDS.

Ampicillin: Stock solution 50mg/ml. Filter sterilize through a Nalgene 0.4µm vacuum filter and store at -20°C. Use at a final concentration of 50µg/ml in media.

Blood Lysis Buffer: 0.32M sucrose, 10mM Tris.Cl, pH7.5, 5mM MgCl2, 1% Triton. Sterilize by autoclaving at 10lb/sq.inch for 15 min at 112°C.

Bromophenol blue: Stock solution 10mg/ml.

116 CsCl stock solution 295g CsCl dissolved in exactly 300ml water. RI=1.385.

Denhardt's solution 100X 2% bovine serum albumin, 2% polyvinylpyrolidone, 2% Ficol 400

Dialysis Buffer lOmM Tris.Cl pH 7.6 , ImM diaminoethanetetra- acetic acid disodium salt (EDTA)

Electroelution Buffer 200mM Tris pH8, 20mM EDTA, 50mM NaCl.

Electroelution Salt Cushion 0.1mg/ml bromophenol blue, 7.5M NH4Ac.

Elution Buffer (for oligo(dT) chromatography) 10mM Tris.Cl (pH7.5), 1mM EDTA, 0.05% SDS.

Enzyme Dilution buffer 200mM NaCl, 10mM Tris.Cl pH7.4, 0.1mM EDTA, 1mM dithiothreitol, 50% glycerol.

Ethidium Bromide stock 10mg/ml

Gel loading buffer 25mM EDTA, 0.25% bromophenol blue, 50% glycerol.

Genomic Hybridization Buffer 5xSSC, 10mM Tris.Cl pH7.4, 1% sodium dodecyl sulphate, 10% (w/v) dextran sulphate. Sterilize by passing through a Nalgene 0.4µm vacuum filter and add salmon sperm DNA to 100µg/ml.

Hogness Buffer 10X 36mM K2HPO4.3H2O, 13mM KH2PO4, 20mM Na3citrate, 10mM MgSO4.7H2O, 44% glycerol. Sterilize by autoclaving.

Hybridization Buffer 1 50% (v/v) formamide, 50mM Sodium Phosphate buffer (NaPB) pH6.8, 5xSSC, 0.1% SDS, 5x Denhardt's solution. Filter through a Nalgene 0.4µm vacuum filter then add salmon sperm DNA to 200µg/ml and tRNA to 100µg/ml.

Kinase Buffer 10x 500mM Tris.Cl pH7.4, 100mM MgCl2, 50mM dithiothreitol, 10mM spermidine.

L-Broth (Luria-Bertani medium) To 950ml deionised water, add 10g bacto-tryptone, 5g bacto-yeast extract, 10g NaCl. Shake to dissolve, and pH to 7 with NaOH. Make volume to 1 litre and sterilize by autoclaving.

LB Agar L-Broth + 15g bacto-agar per litre media. Autoclave and cool to 50°C before addition of antibiotics. (Dry plates in 37°C incubator 2-3 hrs before use.)

L agarose L-Broth + 0.75% agarose. Autoclave and cool to 52°C before use.

Ligase Buffer 5x 250mM Tris.Cl pH 7.8, 50mM MgCl2, 25% w/v polyethyleneglycol, 5mM dithiothreitol

Loading Buffer (for oligo(dT) chromatography) 20mM Tris.Cl (pH 7.6), 0.5M NaCl, 1mM EDTA, 0.1% SDS. Add SDS stocks after the other components have been sterilized by autoclaving.

Lysis Buffer 50mM glucose, 25mM Tris.Cl pH8, 10mM EDTA. Sterilize by autoclaving at 10lb/sq.inch for 15 min at 112°C.

Maltose 20% (w/v), sterilize by autoclaving.

MOPS buffer 10x 0.2M MOPS, 50mM NaAc, 5mM EDTA pH8.

NaPB 1M, pH6.8 46.3ml 1M Na2HPO4, 53.7ml 1M NaH2PO4.

Northern Hybridization Buffer 50% formamide, 50mM NaPB, 5xSSC, 1% SDS, 5x Denhardt's solution. Filter sterilize through a 0.4µm vacuum filter. Add salmon sperm DNA to 200µg/ml and tRNA to 100µg/ml.

Oligo Hybridization Buffer 6xSSC, 0.1% SDS, 1x Denhardt's. Filter sterilize through a 0.4µm vacuum filter then add tRNA to 100µg/ml and salmon sperm DNA to 100µg/ml.

PCR Buffer 10x 100mM Tris.Cl pH8.5, 500mM KCl, 25mM MgCl2, 2mM each deoxynucleotide triphosphate.

Phenol for DNA work 50mM Tris.Cl pH7.6, 1mM EDTA, 0.3M NaAc pH7. Equilibrate overnight, then remove aqueous layer. Add 0.05 vols m-cresol, 0.002 vols 2-mercaptoethanol, 0.1g/100ml 8-hydroxyquinoline. Store at 4°C away from light.

Phenol Chloroform 50% (v/v) DNA phenol 50% chloroform

Potassium Acetate 3M, pH 4.8, sterilize by autoclaving

Proteinase K stock solution 10mg/ml

RNA Sample Buffer 720µl formamide, 160µl 10xMOPS, 260µl formaldehyde, 180µl water, 100µl glycerol, 80µl Bromophenol Blue. Made up fresh every 1-2 weeks.

RNase A Stock solution 10mg/ml. Boil for 10 min, snap cool and store at -20°C.

Restriction Enzyme Digest Buffers:

10 X low salt 100mM Tris.Cl pH 7.5, 70mM MgCl2, 70mM 2-Mercaptoethanol.

10 X medium salt 100mM Tris.Cl pH 7.5, 70mM MgCl2, 70mM 2-Mercaptoethanol, 500mM NaCl.

10 X high salt 100mM Tris.Cl pH 7.5, 70mM MgCl2, 70mM 2-Mercaptoethanol, 1M NaCl.

10 X very high salt 10mM Tris.Cl pH 7.5, 70mM MgCl2, 70mM 2-Mercaptoethanol, 1.5M NaCl.

SmaI buffer 200mM KCl, 60mM Tris.HCl pH8, 60mM MgCl2, 60mM 2-Mercaptoethanol.

RSB 10mM Tris.Cl pH7.4, 10mM NaCl, 25mM EDTA

Salmon sperm DNA Dissolve at 10mg/ml. Shear by passing through 18g needle, then sonicate three times for 1 min at maximum power. Add proteinase K to 50µg/ml and incubate at 37°C for 60 min. Phenol chloroform extract, ethanol precipitate and redissolve at 10mg/ml. Boil for 10 min before use.

Size Markers Dilute 1Kb ladder (Gibco BRL) to 1mg/ml. Add 500µl of the dilute solution to 500µl blue juice/gel loading buffer and 4ml sterile water. Load 5µl on the gel.

Sodium Acetate 3M, pH to 5.2 with glacial acetic acid, sterilize by autoclaving.

SM Buffer 100mM NaCl, 8mM MgSO4, 50mM Tris.Cl pH7.5, 0.05% (w/v) gelatin. Sterilize by autoclaving.

Southern Solution 1 1.5M NaCl, 0.5M NaOH

Southern Solution 2 1M Tris.Cl pH8, 1.5M NaCl

SSC stock, 20X 3M NaCl, 0.3M Na citrate

TBE 10x 900mM Tris Base, 900mM Boric Acid, 20mM EDTA (final pH = 8)

TFBI 30mM KAc, 50mM MnCl2, 100mM KCl, 10mM CaCl2, 15% glycerol. pH to 5.8 with 0.1M acetic acid. Sterilize by passing through a 0.4µm vacuum filter.

TFBII 10mM MOPS, 75mM CaCl2, 10mM KCl, 15% glycerol. pH to 7.0 with KOH, filter sterilise by passing through a 0.4µm vacuum filter.

Transcription Buffer 10x 400mM Tris.Cl (pH8.25), 60mM MgCl2, 20mM spermidine, 1mg/ml Bovine Serum Albumin.

tRNA E. coli: Sigma R4251. Dissolve at 100mg/ml, add proteinase K to 100µg/ml, incubate at 37°C for 60 min, phenol chloroform extract and ethanol precipitate. Redissolve at 10mg/ml.

6M Urea/3M Lithium Chloride 180.2g urea, 63.6g LiCl, made up to 500ml in sterile water. Add 50µl Diethylpyrocarbonate to destroy RNases. Do not autoclave.

X-gal/IPTG plates (for blue white selection with DH5αF') To 500ml LB Agar add 2.5ml 2% X-gal in dimethyl formamide, 2.5ml 0.1M isopropyl-β-D-thiogalactopyranoside (IPTG) in water.

Xylene cyanol Stock solution 10mg/ml.

4.2 TISSUE CULTURE

4.2.1 Techniques

Cells The Embryonic Stem cell line E14 was provided, at passage number 13, by Dr Martin Hooper, University of Edinburgh. CCE and CPI cells were obtained from Dr W. Colledge, University of Cambridge, at passage 15. BRL cells were also provided by Dr Colledge. All cell lines were tested regularly for mycoplasma (kit from GenProbe, California).

Cell line    Source               Reference
BRL          Buffalo Rat Liver    213
CCE          129/Sv//Ev           156
CPI          129/Sv//Ev           160
E14          129/Ola              167

Most of the techniques below are based on those described by EJ Robertson in "Teratocarcinomas and Embryonic Stem Cells: A practical approach" (214)

Growth Conditions

Tissue culture flasks and petri dishes for the growth of Embryonic Stem Cells were coated with 0.1% gelatin for a minimum of 2 hours at room temperature before use.

Embryonic Stem Cells were routinely grown in 60% Buffalo Rat Liver Cell-conditioned CMβ media (166), with 20% batch selected fetal calf serum (FCS), in a humidified tissue culture incubator at 37°C and 5% CO2.

Inactivation of FCS

The FCS was batch tested and heat inactivated by incubating at 56°C for 30min. The FCS was then aliquoted and stored at -20°C.

BRL-Conditioned Media

BRL-conditioned media was prepared by incubating a confluent 150cm2 tissue culture flask of Buffalo Rat Liver Cells (166) in 30mls CMβ media for three days. The media was then removed, filtered through a 0.22µm Millipore filter and added to 20mls of unconditioned CMβ. The BRLs were refed with media and incubated for a further 3 days. The confluent BRL culture was maintained in this manner for 2-3 weeks before splitting.

Subculturing of ES Cells

The Stem Cells were maintained at relatively high densities to minimise the level of spontaneous differentiation. Since the cells grow rapidly once established, they were split every few days, and refed each day. Cell viability improves if the cultures are fed 2-3 hours before splitting. It is important to trypsinise the cultures properly to ensure a single cell suspension. Carry over of cellular aggregates may cause the cells to differentiate.

Splitting of Cell Cultures

The cells were split when the plates were almost confluent. The media was removed from the cells and the plates washed three times with Phosphate Buffered Saline A. Versene trypsin was then added so as to coat the culture, and excess was removed. The trypsin was incubated with the culture for about 5 min then inactivated by adding complete medium. The cells were dislodged from the plate and suspended either by sucking the suspension up and down the pipette or by vortexing gently.

Once a single cell suspension was achieved it was split in the desired dilution, between other prepared flasks containing media. The cells were usually replated at a dilution of between 1 in 5 and 1 in 10.

Freezing of Cell Cultures

Cell stocks were frozen so as to ensure the supply of cells at the lowest possible passage numbers.

The cells were harvested by trypsinisation as described above. The cells were then counted and pelleted by low speed centrifugation (150g for 5 min). Freezing was carried out at a density of about 1 x 10^7 cells per ml of media. The cell pellet was therefore resuspended in half the required freezing volume in complete media, and made up to the total required freezing volume by adding 2X freezing media, slowly while shaking the tube.

The cell suspension was then quickly aliquoted into freezing tubes, and placed in the air phase of a liquid nitrogen container overnight. The following day the frozen ampoules were transferred to the liquid compartment.
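The freezing volumes described above follow directly from the cell count. The sketch below is a minimal illustration, assuming the target density of 1 x 10^7 cells per ml and the 2X freezing medium recipe given in section 4.2.2 (2mls dimethylsulphoxide in 10mls total, i.e. 20%); the function name `freezing_plan` and the example cell count are hypothetical.

```python
# Volumes needed to freeze a harvested cell pellet.
# Assumptions: target density 1e7 cells/ml; 2X freezing medium is 20% DMSO
# (8 ml medium + 2 ml DMSO), so a 1:1 mix gives 10% DMSO in the final stock.

def freezing_plan(total_cells, cells_per_ml=1e7, dmso_fraction_in_2x=0.20):
    """Return the volumes (ml) for resuspension and the final DMSO fraction."""
    total_volume_ml = total_cells / cells_per_ml
    half_volume_ml = total_volume_ml / 2
    return {
        "total_volume_ml": total_volume_ml,
        "complete_media_ml": half_volume_ml,       # resuspend the pellet in this
        "freezing_medium_2x_ml": half_volume_ml,   # then add this slowly
        "final_dmso_fraction": dmso_fraction_in_2x / 2,
    }

# Example: a pellet of 3 x 10^7 cells (illustrative number only).
plan = freezing_plan(3e7)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Adding the 2X medium to an equal volume of cell suspension is what brings the dimethylsulphoxide down to its working concentration, which is why the pellet is first resuspended in only half the final volume.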

Thawing Cells

The frozen ampoule was removed from storage and placed in the 37°C water bath until all the ice crystals had disappeared. The ampoule was sterilised in 70% methylated spirit, and the contents transferred to a centrifuge tube. 5mls of complete medium were slowly added while shaking the tube and the cells pelleted by low speed centrifugation. The supernatant was removed and the cells washed and pelleted once more in fresh medium. The cells were resuspended in more fresh media and then transferred onto freshly prepared plates.

Electroporation

The cells were trypsinised as normal and media was added to inactivate the trypsin. After resuspending to a single cell suspension the cells were counted, and washed twice with PBSA. The cells were then resuspended at a minimum density of 1 x 10^7 cells per ml, in complete PBS.

At least 60µg of DNA, digested with the appropriate restriction enzyme, in 50µl of water were placed in a Falcon 2003 tube and 1ml of cells was added. The mixture was left on ice for 10 min then the cells and DNA were transferred to a sterile cuvette for electroporation in a Biorad Gene Pulser. Capacitance and voltage were varied as required for the conditions of the electroporation. The cells were then plated directly into growth media (215) and incubated overnight.

The cultures were refed the following day and selection was started 48hrs after electroporation. G418 was used at a final concentration of 233µg/ml (equivalent to 400µg Geneticin/ml), and ganciclovir at a final concentration of 2µM.
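The two G418 figures quoted above imply a potency for the Geneticin powder, which can then be used to work out how much powder to weigh for a given volume of medium. The sketch below is only an illustration: the potency is derived from the two numbers in the text rather than taken from a supplier's data sheet, and the function name and the 500ml example are assumptions.

```python
# Relating active G418 concentration to weighed Geneticin powder.
# The potency below is inferred from the text (233 µg/ml active G418
# corresponds to 400 µg/ml Geneticin); it is not a quoted batch value.

ACTIVE_UG_PER_ML = 233.0
POWDER_UG_PER_ML = 400.0

potency = ACTIVE_UG_PER_ML / POWDER_UG_PER_ML  # fraction of powder that is active

def powder_mg_for(medium_ml, active_ug_per_ml=ACTIVE_UG_PER_ML,
                  potency_fraction=potency):
    """Milligrams of powder needed to reach the active concentration."""
    powder_ug = medium_ml * active_ug_per_ml / potency_fraction
    return powder_ug / 1000.0

print(f"Implied potency: {potency:.4f}")
print(f"Powder for 500 ml medium: {powder_mg_for(500):.1f} mg")
```

A different batch of powder would have a different potency, in which case only `potency_fraction` needs to change.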

4.2.2 Buffers and Stock Solutions for Tissue Culture

CMβ MEDIA 400mls Dulbecco's Modified Eagle's Medium (DMEM), 100mls inactivated Fetal Calf Serum (selected batches from Imperial) (final concentration 20%), 0.25% sodium bicarbonate, 50,000 IU Penicillin and Streptomycin, 1M L-glutamine, 10mM HEPES, 5mls 100 X non-essential amino acids (Flow), 5mls 2-mercaptoethanol stock, 5mls nucleosides stock. Store at 4°C, warm to 37°C before use.

2-MERCAPTOETHANOL STOCK 10mls PBSA, 7µls 2-mercaptoethanol. Sterilize by passing through a Millipore 0.2µm filter and store at 4°C. Replace at weekly intervals.

NUCLEOSIDES STOCK To 100mls distilled water add 80mg Adenosine, 85mg Guanosine, 73mg Cytidine, 73mg Uridine, 24mg Thymidine. Dissolve by warming to 37°C. Sterilize through a 0.2µm Millipore filter and aliquot while warm. Store at 4°C and rewarm to 37°C before use.

2X FREEZING MEDIUM To 8mls CMβ add 2mls dimethylsulphoxide. Make up fresh before use.

VERSENE TRYPSIN 0.02% EDTA, 0.25% trypsin.

PBSA To 1 litre distilled water add 8g NaCl, 0.2g KCl, 1.15g Na2HPO4.12H2O, 0.2g KH2PO4. Aliquot in glass bottles and autoclave. (Phosphate buffer = 0.01M, pH 7.4)

PBSB To 1 litre distilled water add 20g CaCl2.2H2O, 20g MgCl2.6H2O. Sterilize by autoclaving.

COMPLETE PBS To 500ml PBSA add 2.5ml PBSB.

100X non-essential amino acids (Flow) Per litre: 890mg L-alanine, 1.5g L-asparagine.H2O, 1.33g L-aspartic acid, 750mg glycine, 1.47g L-glutamic acid, 1.15g L-proline, 1.05g L-serine.

[Figure 16 flow diagram: embryonic stem cells containing the factor IX gene (fIX); E14 cells electroporated with a "mutant" copy of the factor IX gene (TKfIX6Neo) to give homologous recombination; selection with G418 + ganciclovir, and PCR; microinjection into blastocyst; incubation in foster mother; germ line chimaera; normal offspring and transgenic offspring with mutant factor IX allele ("mouse model").]

Figure 16. Outline of Proposed Strategy for Generating a Mouse Model for Haemophilia B (or A).

A positive/negative selection procedure is used to select cells that have undergone homologous recombination with a vector containing a mutant factor IX (or VIII) gene. The cloned cell line is then microinjected into a blastocyst and introduced into a foster mother. The offspring of the resulting chimaera will then carry the mutant factor IX (or VIII) allele.

CHAPTER 5

RESULTS AND DISCUSSION

ISOLATION OF THE MOUSE FACTOR VIII GENE

5. ISOLATION OF THE MOUSE FACTOR VIII GENE

The first step towards making a mouse model of the haemophilias was to isolate regions of the mouse genes for factors VIII and IX, from which a targeting construct could be generated. Such important genes might be expected to be highly conserved across mammalian species. Indeed the human genes had originally been isolated by hybridization to probes of bovine and porcine origin. Human factor VIII-dependent clotting assays in several other species, including the mouse, have also shown the existence of factor VIII activity and, therefore, suggested the presence of coagulation systems homologous to those found in man. It was, therefore, proposed that the mouse factor VIII and IX genes might be isolated by hybridization to human cDNA clones.

5.1 RESULTS

5.1.1 Library screening

In order to confirm the presence of factor VIII sequences within the mouse genome, nucleotides 150 to 7210 of the human factor VIII cDNA (Delta Biotechnology) were nick translated in the presence of 32P and used to probe a Southern blot containing human and mouse genomic DNA digested with EcoR 1 and Hind 3 (see figure 17). Hybridization was carried out at 65°C in Genomic Hybridization Buffer and the blot was washed at 65°C in 3xSSC, 1%SDS. The probe clearly hybridizes to homologous sequences in the mouse genome, but the complex restriction pattern differs from that of the human.

Since the human cDNA probe hybridizes to sequences in the mouse genome it was used as a probe to screen two mouse genomic libraries and a mouse liver cDNA library. The cDNA bacteriophage library was obtained from Cambridge Bioscience and contains cDNA from an adult BALB/c mouse cloned into the EcoR 1 site of lambda gt11. 4 million bacteriophage were screened with the human cDNA probe, under varying conditions, but no positive plaques were isolated. The first genomic DNA library was also obtained from Cambridge Bioscience and contains DNA from adult DBA/2J mouse liver, cloned into the BamH 1 site of EMBL-3, with insert sizes ranging from 8 to 21Kb. The cosmid genomic library was obtained from Anna Marie Poustka (216) and contains DNA from the livers of 129/Sv mice, with an insert size of 40-45kb in pCOS2EMBL. 4.5 million bacteriophage and 556,000 cosmids were screened with the human cDNA probe. Several different hybridization temperatures and buffers were tried. Wash stringencies and temperatures were varied from 0.1XSSC to 3XSSC, at 65°C and 42°C, and both the whole human factor VIII cDNA and fragments of it were used as probes. No positive clones were isolated.

Figure 17. Genomic Southern Blot for Factor VIII

Southern blot of mouse (MS) and human male (M) and female (F) genomic DNA digested with EcoR 1 and Hind 3 and probed with nick-translated nucleotides 150 to 7210 of the human factor VIII cDNA. Hybridization was carried out overnight at 65°C in Genomic Hybridization Buffer. The blot was washed twice for 20min at 65°C in 3XSSC, 1%SDS.

5.1.2 Reverse Transcription/PCR

Since library screening had not produced any mouse factor VIII clones, a second strategy was employed. Mutations in important parts of the factor VIII protein, such as the cleavage sites, are known to have devastating effects on its function (217). Such regions of the molecule might be predicted to be particularly highly conserved between species, and it might, therefore, be possible to amplify mouse DNA sequences corresponding to these regions, using primers originally designed to amplify these segments of the human gene. A number of primers had previously been synthesized in order to amplify the region surrounding the thrombin cleavage site at amino acid 372 of the human factor VIII gene (112). Since this site is essential for protein function it should be highly conserved.

It was, therefore, decided to try to amplify this region of the factor VIII gene from both mouse genomic DNA, and cDNA synthesized by reverse transcription from total cellular RNA derived from mouse liver, spleen, kidney, brain, lung and heart. In order to enrich for factor VIII sequences, the cDNA was synthesized from a factor VIII specific primer, JKP37. Amplification was then performed with primers JKP-11, JKP37 and JKP494, which are oligomers of 21, 21, and 22 nucleotides respectively, corresponding to the region surrounding the human factor VIII thrombin cleavage site (see figure 18).

Figure 18. Diagram of Exons 8 and 9 of the Human Factor VIII Gene.

Relative positions of oligonucleotide primers used for amplification of mouse DNA, RsaI restriction sites and introns (I7, 8, 9) are shown. Coordinates are for the most 5' point of the primers, on the human cDNA.

In order to test the feasibility of this approach, RNA was first isolated from human spleen by the lithium chloride method as described, and used as a template for cDNA synthesis by reverse transcription from JKP37. Amplification of this cDNA with primers JKP37 and JKP494 resulted in a product of 434 base pairs as predicted (see figure 19).

Figure 19. Human Factor VIII cDNA Amplification.

Products of the amplification of human cDNA from primers JKP494 and JKP37, undigested (U), and digested (D) with the endonuclease RsaI. 35 amplification cycles were performed, denaturing at 94°C for 30sec, annealing at 55°C for 15sec, and extending at 72°C for 10min. One fifth of the reaction volume was digested with 20U endonuclease RsaI, and loaded onto a 12% acrylamide gel. The fragments were electrophoresed at 250V and the gel stained with ethidium bromide before photography under U.V. illumination. Bands are marked at 434bp, 194bp and 120bp.

Total cellular RNA was isolated from mouse liver, spleen, kidney, brain, lung and heart, and cDNA was then prepared under the same conditions as described for human RNA. These cDNAs were amplified with the human primers. Amplification was eventually achieved with the cDNA from all the tissues investigated. Amplification with oligonucleotides 37 and 494, and -11 and 494, produced fragments of approximately the same size as those observed when amplifying human cDNA (433bp and 339bp respectively) (see figure 20). The amplification products were digested with the restriction endonuclease RsaI. Digestion of the human product resulted in the predicted bands of 193bp and a doublet of 120bp with oligos 494 and 37 (figure 19), and bands at 193bp and 120bp only for oligos 494 and -11 (figure 21). (This second digest would also result in a band of 26bp which would probably not be visible on a gel.) Digestion of the "mouse" amplification products (figure 21) appears to show two sets of bands: one novel set, perhaps corresponding to a mouse sequence containing only one RsaI site, and one set that shows the same pattern as the human product and could perhaps be derived from contamination of the PCR reaction with cloned human cDNA, which was in daily use in the laboratory.
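A simple consistency check on the digests is that the fragment sizes should add up to the size of the undigested product. The sketch below applies this to the human sizes quoted above and to the approximate mouse band sizes given in the legend to figure 21; the function name and the 5bp tolerance allowed for band sizes estimated from a gel are assumptions made for illustration.

```python
# Do the restriction fragments account for the whole PCR product?
# Human sizes are those quoted in the text; mouse sizes are the approximate
# band sizes from the figure 21 legend, so a small tolerance is allowed.

def fragments_account_for_product(product_bp, fragment_bps, tolerance_bp=0):
    """True if the fragment sizes sum to the product size within tolerance."""
    return abs(sum(fragment_bps) - product_bp) <= tolerance_bp

checks = {
    "human 494/37": fragments_account_for_product(433, [193, 120, 120]),
    "human 494/-11": fragments_account_for_product(339, [193, 120, 26]),
    "mouse 494/37": fragments_account_for_product(433, [160, 275], tolerance_bp=5),
    "mouse 494/-11": fragments_account_for_product(339, [160, 175], tolerance_bp=5),
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

Two mouse fragments summing to roughly the full product length is what would be expected if the mouse sequence contains a single RsaI site, as suggested in the text.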

Figure 20. Mouse Factor VIII cDNA Amplification.

Products of the amplification of cDNA from mouse brain (B) and kidney (K) from two pairs of primers, A. JKP494 / 37 and B. JKP494 / -11. Amplification conditions were the same as those described for amplification of human cDNA: 35 amplification cycles were performed, denaturing at 94°C for 30sec, annealing at 55°C for 15sec, and extending at 72°C for 10min. Bands are marked at 430bp (A) and 340bp (B), alongside a 1kb ladder.

Figure 21. RsaI Endonuclease Digestion of PCR Products

A. RsaI endonuclease digestion of the amplification products of mouse brain (B), kidney (K) and lung (L) and human (H) spleen cDNA from primers JKP494 and JKP-11. The mouse bands are at approximately 160 and 175bp. The stronger bands at 195 and 120bp may represent contamination with human cDNA.

B. RsaI endonuclease digestion of the amplification products of mouse brain (B) and kidney (K) cDNA from primers JKP494 and JKP37. The mouse bands are at 160 and 275bp. The fainter bands at 195 and 120bp may represent contamination with human cDNA.

35 amplification cycles were performed, denaturing at 94°C for 30sec, annealing at 55°C for 15sec, and extending at 72°C for 10min. One fifth of the reaction volume was digested with 20U endonuclease RsaI, and electrophoresed on a 2% ethidium bromide-stained agarose gel.

5.1.3 Amplification from genomic DNA

Mouse genomic DNA was also amplified with primers 37 and 494 (figure 18). This amplification spans exons 8 and 9, and an intron of about 300bp in the human. The amplification products consisted of two bands as shown in figure 22. The lower band, of 430bp, is of the correct size to correspond to contamination with cloned human cDNA. The upper band, of 1.5kb, is however of a different size to that derived from human genomic DNA, and may correspond to the mouse factor VIII gene. The differences in size of the products from the two species may be due to differential evolutionary divergence in intron sequences.

Figure 22. Amplification of Factor VIII Genomic DNA

Products of the amplification of mouse (M) and human (H) genomic DNA from primers JKP37 and JKP494. Amplification conditions were the same as those used for amplification of cDNA (see figure 21). Bands are marked at 1.5kb, 750bp and 430bp.

In order to show that the amplification products were indeed mouse factor VIII sequences, I attempted to sequence them by direct sequencing methods.

5.1.4 Direct Sequencing

The amplified fragments were isolated and purified by gel electroelution as described in chapter 4. They were then subjected to numerous direct sequencing strategies, including the Klenow sequencing method used routinely in our laboratory (see chapter 4), and T7 DNA polymerase sequencing using a kit from Pharmacia. Both methods were also attempted using a 32P-labelled primer, and using several different purification procedures such as multiple phenol chloroform extractions, "Geneclean" (Bio 101), and Elutip columns (Schleicher and Schuell) in order to prepare the amplified fragment before sequencing. In order to increase the specificity, primers which corresponded to sequences within the amplification primers were used in the sequencing protocols. These "nested" primers should not hybridize to non-specific amplification products which would generate nonsense sequence data.

Partial sequence data of 106 nucleotides was obtained from a fragment derived from mouse kidney RNA, using the Klenow sequencing protocol. This was compared with the human factor VIII cDNA sequence as shown in figure 23. This sequence shows 75% identity with the human factor VIII cDNA in the exon 8 region.

Figure 23. Factor VIII Sequencing Data

        * * * * * * * ****** **
HUMAN   AAGAAGC GGAAGACTAT GATGATGATC TTACTGATTC
M8      TGAAGAAGA AGAATATTAT GATGATGATG TTAAGGATTC
MSKID   AAGATTATA TAKTGACTCA

1301    ***** *** **** ** * * *****
HUMAN   TGAAATGGAT GTGGTCAGGT TTGATGATGA CAACTCTCCT
M8      TGAAATGGAT GTGTTCAGGT TTGATGATGA CAA CT
MSKID   TAGTCAGAAG AKTATCACAT ATGATTATGA CAG CCC

1351    ***** * * * * * * * * * * *
HUMAN   TCCTT TATCCAAATT CGCTCAG.TT GCCAAGAAG.
M8      CTGCTTCCTA TATCCAAATT TGCTCAG.TT GCCAAGAAA.
MSKID   TCCCX KATCCAAAXX CGCTCAGGTT GCTAA.AAAG

(THR372)
1384    * * * *
HUMAN   CATCCTAAAA CTTGGGTACA TTACATTGCT GCTGAAGAGG
M8      CACCCTAAAA
MSKID   CACCCTAAAA CTAGGATACA TTAAA

Comparison of the data obtained from direct sequencing of the mouse kidney PCR product (MSKID) with the human cDNA sequence and with the sequence of the cloned m8 fragment which was obtained by PCR from genomic mouse DNA (see below). Stars indicate positions where at least one of the sequences differs from the other two. Numbers refer to coordinates of the published human cDNA sequence (EMBL database). Dots indicate "gaps" in the sequence.

NOTE Figure 23 : Please note that, due to contamination of the sequencing template and difficulty in reading the gel, the sequence shown here for MSKID is a "best guess" only, and may not be reliable.

No other sequence was obtained from any of the amplification products using any of the direct sequencing methods. Cloning of the amplification products, as a prerequisite to ordinary sequencing methods, was also attempted using blunt ended ligation methods, but this was not successful. In order to facilitate cloning, oligonucleotide primers which incorporated restriction endonuclease cleavage sites were used.

5.1.5 Cloning of PCR Products

The oligonucleotide primer JKP494 already contains a Hind 3 restriction site. A new primer, JM27, which contains a Pst 1 site was also synthesized (see figure 18). These two primers were used to amplify genomic DNA from boiled mouse blood by the method of Kogan et al (212). 30µl of the boiled mouse serum was amplified with 100ng of each primer, with an annealing temperature of 55°C, and an extension time of 10 min at 72°C. A product of 167bp was obtained, which was named m8 and cloned between the Pst 1 and Hind 3 sites in the plasmid vector pUC19. Dideoxy sequencing of this clone revealed that the fragment shows 87% sequence identity with the human factor VIII gene. Fig 24 shows a comparison of the human cDNA and m8 sequences. Twenty-two nucleotide mismatches result in 9 amino acid changes, 3 of which are conservative. Curiously the activated Protein C cleavage site is conserved, while the thrombin cleavage site contains a mutation which, in the human, is known to result in haemophilia A (217). It remains to be shown whether this is an artifact caused by the infidelity of the Taq DNA Polymerase or a species specific difference between mouse and man.

(The m8 sequence is also included in figure 23 for comparison with the direct sequencing data derived from amplification of mouse kidney cDNA)
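The identity figures quoted here are simple ratios of matching positions to aligned length. The sketch below is an illustration only: it recomputes the overall figure from the 167bp length and 22 mismatches given above, and applies a position-by-position comparison to a 30-nucleotide stretch of the HUMAN and M8 rows of figure 23 (coordinates from 1301). The function name is hypothetical, and the comparison assumes the two sequences are already aligned without gaps.

```python
# Percent identity between two pre-aligned, equal-length sequences.

def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# 30-nucleotide stretch taken from figure 23 (HUMAN and M8 rows).
human = "TGAAATGGAT" + "GTGGTCAGGT" + "TTGATGATGA"
m8 = "TGAAATGGAT" + "GTGTTCAGGT" + "TTGATGATGA"
local_identity = percent_identity(human, m8)

# Overall figure for the cloned fragment: 22 mismatches in 167 bp.
overall_identity = 100.0 * (167 - 22) / 167

print(f"30 nt stretch: {local_identity:.1f}% identity")
print(f"m8 fragment:   {overall_identity:.1f}% identity")
```

The overall value rounds to the 87% reported for m8; the short stretch is more highly conserved than the fragment as a whole, so local figures should not be read as representative.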

Figure 24. M8 Sequencing Data

Comparison of the DNA and predicted amino acid sequences of the cloned m8 fragment and exon 8 of the human factor VIII gene. The thrombin and activated protein C cleavage sites at amino acids 372 and 336 respectively are underlined. Differences between the two sequences are also indicated. (The sequences are drawn in a 3' to 5' direction.)

I decided to use the m8 fragment as a probe to isolate other genomic fragments from genomic DNA libraries. Firstly, in order to show that this fragment would hybridize to factor VIII specific sequences, the purified m8 fragment was 32P-labelled by nick translation and used to probe a Southern blot of EcoR 1-digested human and mouse genomic DNA. The results are shown in figure 25.

Figure 25. M8 Genomic Southern Blot

Southern blot of equal amounts of human male (M), female (F) and mouse (MS) genomic DNA, digested with EcoR 1 and Hind 3 restriction enzymes, and probed with 32P-labelled m8 fragment. Hybridization was carried out overnight at 65°C in Genomic Hybridization Buffer. The filter was washed twice for 20min at 65°C in 3XSSC, 1%SDS. Bands are marked at 6kb and 1.7kb.

M8 hybridizes to two human EcoR 1 fragments of 15Kb and 1.7Kb, and three human Hind 3 fragments of approximately 10Kb, 1.3Kb and 480bp. The published restriction map of human factor VIII suggests that m8 should hybridize to only one EcoR 1 fragment of the human gene. The human restriction map for Hind 3 is not known. Hybridization of m8 to EcoR 1-digested mouse genomic DNA results in a single band of 6Kb. No clear bands are visible in the Hind 3 digest. Hybridization of the fragment to the mouse genomic DNA appears less strong than to the human DNA. The m8 fragment was consequently used to screen the same mouse genomic DNA libraries as originally screened with the human cDNA, under varying conditions. No positive clones were isolated.

5.2 DISCUSSION

Despite several different strategies my attempts to isolate the mouse factor VIII gene have not been successful. Published data suggest that the coagulation systems have been well conserved across species and the human genes for several of the coagulation factors have been isolated by cross species hybridization. The human factor VIII cDNA probe clearly hybridizes to sequences within the mouse genome, producing a banding pattern which, although of obviously different sizes, resembles that seen on the human genomic DNA. Despite this, however, the probe was unable to identify mouse factor VIII sequences in any of the libraries screened. Amplification of mouse genomic and complementary DNA has generated fragments, which are thought to represent the mouse factor VIII gene, but which I was unable to clone.

5.2.1 Library Screening

The scarcity of factor VIII sequences within the cDNA library is perhaps not surprising considering the very low levels of factor VIII message found in human liver. Toole et al (97) had to screen more than 3 million recombinants in two human cDNA libraries with their human genomic clone before isolating 65 factor VIII cDNA clones. In order to generate the human genomic clone they had already screened 4 x 10^5 bacteriophage of a porcine library with oligonucleotides, and 8 x 10^5 recombinants in the human genomic library with the single porcine clone. By quantitative dot blot analysis, Toole et al estimated that the levels of factor VIII mRNA were 20 to 40 times lower than those of factor IX, which are thought to amount to 0.01% of the mRNA in human liver. In support of this I have not been able to demonstrate a signal for either human liver or spleen or mouse liver RNA in a Northern blot probed with either the whole or fragments of the human factor VIII cDNA (data not shown).

The lack of factor VIII-containing sequences in the genomic libraries, however, is more alarming. According to Clark et al (218) the probability of finding any one given sequence in a genomic library is given by:

$$N = \frac{\ln(1-P)}{\ln(1-F)}$$

where P = probability, N = number of recombinants screened and F = fractional proportion of the genome in a single recombinant.

4.5 million bacteriophage, with an average insert size of 16kb, were screened for factor VIII sequences. Substituting N = 4.5 x 10^6 and F = 1.6 x 10^4 / 3 x 10^9:

$$4.5 \times 10^{6} = \frac{\ln(1-P)}{\ln\left(1 - \frac{1.6 \times 10^{4}}{3 \times 10^{9}}\right)}$$

$$4.5 \times 10^{6} = \frac{\ln(1-P)}{-5.3 \times 10^{-6}}$$

$$-23.85 = \ln(1-P)$$

$$4.38 \times 10^{-11} = 1 - P$$

$$P = 1 - 4.38 \times 10^{-11}$$

The probability of finding a factor VIII-containing genomic clone in a library containing such a clone, by screening 4.5 million bacteriophage is, therefore, 1 - 4.38 x 10^-11.

For the cosmid library, 556,000 recombinants with an average insert of 40kb were screened:

$$5.56 \times 10^{5} = \frac{\ln(1-P)}{\ln\left(1 - \frac{4 \times 10^{4}}{3 \times 10^{9}}\right)}$$

$$5.56 \times 10^{5} = \frac{\ln(1-P)}{-1.33 \times 10^{-5}}$$

$$-7.41338 = \ln(1-P)$$

$$6.03 \times 10^{-4} = 1 - P$$

$$P = 1 - 6.03 \times 10^{-4}$$

$$P = 0.99939$$

The probability of finding a factor VIII-containing clone by screening 556,000 cosmids is, therefore, 0.99939.

These figures are both very high probabilities and it is very surprising that no positive clones were isolated.
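The two library calculations can be reproduced with a short script. The sketch below rearranges the formula of Clark et al to P = 1 - (1 - F)^N and uses the same genome size (3 x 10^9 bp) and insert sizes as the text; the function names are illustrative only. Because the hand calculation rounds ln(1 - F) before multiplying, the last digits of the bacteriophage result may differ slightly from the unrounded value printed here, although the conclusion is unchanged.

```python
import math


def probability_of_finding(n_screened, insert_bp, genome_bp):
    """P = 1 - (1 - F)**N, with F = insert_bp / genome_bp."""
    f = insert_bp / genome_bp
    # log1p/expm1 keep precision when F is tiny and P is very close to 1.
    return -math.expm1(n_screened * math.log1p(-f))


def probability_of_missing(n_screened, insert_bp, genome_bp):
    """1 - P: the chance that no screened recombinant carries the sequence."""
    f = insert_bp / genome_bp
    return math.exp(n_screened * math.log1p(-f))


def recombinants_needed(p, insert_bp, genome_bp):
    """N = ln(1 - P) / ln(1 - F), the form quoted from Clark et al."""
    f = insert_bp / genome_bp
    return math.log1p(-p) / math.log1p(-f)


GENOME_BP = 3e9  # genome size used in the text

print(probability_of_missing(4.5e6, 1.6e4, GENOME_BP))  # bacteriophage library, 1 - P
print(probability_of_finding(5.56e5, 4e4, GENOME_BP))   # cosmid library, P
print(recombinants_needed(0.99, 1.6e4, GENOME_BP))      # clones needed for P = 0.99
```

The last line is an additional illustration, not a figure from the experiments: it shows how many 16kb recombinants would be needed for a 99% chance of success under the same equal-representation assumption.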

These calculations, however, are based on the presumption that each sequence present in the genome is equally represented in the libraries. The calculations would not be accurate if the libraries were in some way deficient or the desired sequences were under-represented. The fact that the cosmid library has been successfully screened for several other genes, such as the Hox loci Hox 2.1, 2.6 and 2.7 (219), SPARC (220) and β2 microglobulin (D. B. Palmer, personal communication), both before and after my experiments, suggests that the library itself is of good quality. There must, therefore, be some other explanation for the absence of factor VIII sequences.

Bacterial enzymatic machinery is capable of destroying DNA that is methylated on cytosine residues, as eukaryotic DNA is (221, 222). When bacterial strains possessing this machinery (modified cytosine restriction systems: mcrA, -B, -C) are used during construction of a DNA library they are thought to eliminate such methylated material. This may occur either during the DNA packaging procedure, if Mcr+ strains are contained in the packaging extracts, or during plating of the library if the host strain is Mcr+. If mouse factor VIII sequences were highly methylated, as is common for X-chromosome loci, they may well have been eliminated from the library in this manner during construction. Interestingly this cosmid library was also screened unsuccessfully for the factor IX gene, another X-linked locus.

Another possibility is that factor VIII sequences were deleted during amplification of the library. Bacteria are known to eliminate unstable DNA configurations such as Z DNA during rounds of replication. Also, sequences which are disadvantageous to the bacterial cell will tend to hinder its growth and so be under-represented in the amplified library. One region of the human factor VIII gene has never been cloned, despite many attempts to do so, and the cloned non-coding regions of the gene have not yet been fully characterised. It is possible that these regions may contain unstable DNA sequences. The factor IX gene, however, is known to contain a structure in its 3' untranslated region which could potentially form Z DNA. It is possible that this has also contributed to the absence of factor IX sequences from this library.

5.2.2 Reverse Transcription / PCR

Amplification from the human-derived primers was eventually achieved with cDNA from mouse brain, kidney, spleen, lung, heart and liver. This correlates with the findings of Wion et al, who reported the presence of human factor VIII mRNA in a variety of tissues including the liver, spleen and lymph nodes, and at lower levels in kidney, muscle, placenta and fetal heart. Factor VIII mRNA was not detected in fetal lung or brain, although it may be that levels of the message in these tissues were too low to be detected by Northern blot hybridization and RNase protection mapping. Adult brain and lung tissues were not investigated. Exner et al (223), however, did detect factor VIII in lung tissue using immunohistological techniques, but these do not distinguish between sites of synthesis and storage. Wion et al were also unable to detect factor VIII message in white blood cells or cultured endothelial cells. The wide range of sites of synthesis of factor VIII is in contrast to the vitamin K-dependent coagulation factors, which are synthesised exclusively in the liver.

Since I have been unable to clone and sequence the products of the amplification of mouse cDNA I cannot prove that they are of mouse origin rather than cross-contaminating human DNA. Digestion of the amplification products, however, produced different RsaI restriction patterns to those derived from human cDNA (see figures 19 and 21). The patterns and strengths of the bands are not consistent with partial digests. The proposed mouse bands are at 160bp and 275bp when primers 494 and 37 are used. When primer -11, which is approximately 100bp more 5' than primer 37, is used the larger DNA fragment drops to 175bp, while the smaller fragment remains the same. This pattern is consistent with the presence of only one RsaI site in this region of the mouse gene. A search of the sequences in figure 24 reveals the presence, shortly before the APC cleavage site at amino acid 336, of the sequence CATG - the RsaI recognition sequence, in the m8 data but not the human. This is strong evidence to suggest that at least some of the amplification products were mouse-derived.
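The reasoning about a single restriction site can be illustrated with a small calculation. The sketch below assumes that each product is cut exactly once, 160bp from the end defined by primer 494, and that the full product length equals the sum of the two observed bands; both are assumptions made for illustration, and the function name is hypothetical.

```python
def digest_fragments(amplicon_bp, cut_positions):
    """Return fragment lengths (bp) after cutting a linear amplicon.

    `cut_positions` are distances from the fixed end of the product.
    """
    edges = [0] + sorted(cut_positions) + [amplicon_bp]
    return [right - left for left, right in zip(edges, edges[1:])]


# Assumption: one site, 160bp from the end defined by primer 494.
SITE_FROM_PRIMER_494 = 160

# Assumed product lengths, taken as the sum of the two observed bands.
with_primer_37 = digest_fragments(160 + 275, [SITE_FROM_PRIMER_494])
with_primer_11 = digest_fragments(160 + 175, [SITE_FROM_PRIMER_494])

print(with_primer_37)  # [160, 275]
print(with_primer_11)  # [160, 175]
```

With one fixed site, changing the primer at the far end alters only the fragment that contains that primer, which is the pattern described above. A second site between the two primers would produce a third band.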

Although my direct sequencing data from PCR of kidney cDNA was rather difficult to interpret there were some very definite differences between this and the human sequence. The direct sequencing data, however, also differs from the m8 sequence, and cannot, therefore, prove whether or not the amplification products contained mouse sequence. Considering the figure of the RsaI digests (figure 21) of these amplification products, it seems likely that the sequence was derived from a mixed template of human and mouse DNA. This would also explain the difficulty in interpreting the autoradiograph.

Interestingly the thrombin cleavage site at amino acid 372, which is essential for human factor VIII activation, is present in the human and mouse direct sequencing data, but not in the m8 sequence. The Activated Protein C (APC) cleavage site at amino acid 336 is conserved in the m8 sequence, although a conservative nucleotide change is present. There have also been other reports of the isolation of these regions of mouse factor VIII where the thrombin cleavage site is conserved but the APC site is not (J. Gitschier, personal communication). It remains to be shown whether these differences are true differences in the structures of mouse and human factor VIII molecules or artifacts caused by the infidelity of Taq DNA polymerase. Such misincorporations of nucleotides by Taq polymerase are thought to occur about once every 10^4 bases (224), although the newer recombinant enzymes are said to have higher degrees of fidelity.
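To give a feel for how often polymerase errors might be expected, the quoted figure of about one misincorporation per 10^4 bases can be turned into the fraction of strands expected to be error-free. This is a simplified sketch: it assumes errors are independent, the 435bp length is used only as an example, and `copy_events` is a hypothetical parameter for the number of times a strand's lineage has been copied.

```python
def error_free_fraction(length_bp, error_rate=1e-4, copy_events=1):
    """Fraction of strands expected to carry no misincorporation.

    Assumes independent errors at `error_rate` per base per copying
    event; `copy_events` (hypothetical parameter) is how many times the
    strand's lineage has been copied by the polymerase.
    """
    return (1.0 - error_rate) ** (length_bp * copy_events)


for copies in (1, 10, 30):
    frac = error_free_fraction(435, copy_events=copies)
    print(f"{copies:>2} copying events: {frac:.3f} of 435bp strands error-free")
```

Under these assumptions a noticeable fraction of individual strands could carry at least one change after many rounds of copying, so isolated differences in a single cloned product would need confirmation from independent amplifications.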

Although the m8 fragment failed to identify clones in the bacteriophage library it did hybridize to a clone in a YAC library (Zoia Laren, ICRF, personal communication). This clone, however, has not yet been characterised.

5.2.3 Problems of Contamination

The power of the PCR technique lies in its extraordinary sensitivity; this is also one of its greatest problems and often leads to cross-contamination. My efforts to isolate mouse factor VIII sequences by reverse transcription and PCR have been hampered by contamination with human DNA. At the time of these experiments human cDNA clones were in everyday use in our laboratory and preparations for PCR were carried out on normal laboratory benches, with non-dedicated Gilsons. Although precautions to prevent cross-contamination were taken, such as dedicating solutions, cleaning Gilsons, and endonuclease digestion of primers and starting materials, contamination was still a problem in our laboratory, as in many others.

It has since been realised that more stringent precautions must be taken, particularly in the physical separation of pre-PCR preparation from the post-PCR analysis, and any cloned material (225). Our laboratory has now purchased positive displacement pipettes to eliminate cross-contamination, and dedicated a clone-free room for PCR work only. These steps seem to have successfully eliminated our contamination problems.

5.2.4 Direct Sequencing

Direct sequencing is now carried out routinely in our laboratory. The new protocols use magnetic beads to separate and immobilize single strands of DNA from PCR products (226). A biotinylated primer is incorporated into one DNA strand during the amplification reaction and is then bound to avidin on magnetic beads. After denaturation of the DNA, the beads and the single stranded DNA bound to them are easily removed from solution with a magnet. One strand of DNA is left in solution and the other is eluted from the beads: both portions can then be used in normal single stranded sequencing protocols.

This recently developed technique eliminates the need for elaborate methods of purifying amplification products and, by using nested primers, results in a pure preparation of single stranded DNA, without non-specific amplification products. Sequencing single stranded DNA is also simpler because there is no competition between the primer and second DNA strand for the template, during the annealing reaction, and because there is no need for denaturation.

5.2.5 Cloning PCR Products

Recent advances in technology have also eliminated problems of cloning PCR products. Attempts at incorporating restriction sites into PCR primers in order to facilitate cloning have often proven unsuccessful because restriction enzymes tend not to cleave efficiently where their recognition sequence is located within a few base pairs of the end of the DNA fragment (New England Biolabs Catalogue 1990-91). Blunt-end cloning procedures often fail because of the terminal transferase activity of Taq DNA polymerase. This invariably leads to the addition of a single A to the end of the amplified fragment, due to the enzyme's preference for this base. Thus enzymic removal of the overhang is necessary prior to blunt-end ligation. Marchuk et al (227), however, have exploited this activity in the development of the T-vector. This is a Bluescript vector (Stratagene), digested with EcoR V and incubated with Taq polymerase in the presence of 2mM dTTP. The enzyme is obliged to add a T to the cut ends of the vector and this is then used to ligate to the complementary overhanging A on the PCR product. The T-vector is now routinely used in our laboratory to clone PCR products.

5.2.6 Recent Developments

Since my experiments, mouse factor VIII sequences have been isolated in our laboratory by screening a new bacteriophage library. This newly constructed library was unamplified and was prepared using packaging extracts and host strains which are Mcr- (PLK-17, Stratagene). 985,000 bacteriophage with an average insert size of 20kb were screened with the same human cDNA probe as previously described in section 5.1.1. 37 duplicating positive plaques were isolated.

Two of the bacteriophage, containing sequences from exons 6 and 19 to 22, have so far been partially characterised. These show an overall sequence identity with the human gene of 90% in the region of the A3 to C1 domains and approximately 87% in the A1 domain. Unfortunately it is not possible to compare the sequences derived from PCR of the region surrounding the thrombin cleavage site at amino acid 372 with those isolated from the library, as this region has not yet been characterised.

CHAPTER 6

RESULTS AND DISCUSSION

ISOLATION AND CHARACTERISATION OF THE MOUSE FACTOR IX GENE

6 ISOLATION AND CHARACTERISATION OF THE MOUSE FACTOR IX GENE

6.1 RESULTS

6.1.1 Isolation of the Mouse Factor IX Gene

As a first step towards isolating the mouse factor IX gene, a human factor IX cDNA, termed cVII (75), which spans nucleotide residues 41 to 2026 of the mRNA, was obtained from Professor G. G. Brownlee. This clone was then used to probe a Southern blot of mouse genomic DNA for factor IX sequences. Hybridization to the nick translated probe was carried out overnight at 65°C in genomic hybridization buffer. The filter was washed twice for 20 min at 65°C in 3xSSC and 1% SDS (see figure 26).

Since this probe clearly hybridizes to mouse sequences it was used to screen a mouse genomic DNA bacteriophage library. The library was obtained from Cambridge Bioscience and contains DNA from a partial digest of adult DBA/2J mouse liver, cloned into the BamH 1 site of EMBL-3 (insert sizes range from 8 to 21Kb). 4.5 million bacteriophage were plated and transferred to nitrocellulose filters in duplicate as described (section 4.1). Gel-purified cVII was nick translated and hybridized to the bacteriophage filters in hybridization buffer 1 at 42°C overnight. The filters were then washed twice for 20 min at 65°C in 3xSSC and 0.1% SDS. Hybridizing areas which duplicated were picked, replated and subjected to secondary screening with the same probe at lower bacteriophage densities. Single bacteriophage plaques were then picked. Restriction analysis of mini-scale preparations of these bacteriophage showed that five independent clones had been isolated.

6.1.2 Mapping of the Mouse Factor IX Locus

In order to identify and subclone the factor IX-coding regions and to further characterise the locus, the five bacteriophage were then subjected to Southern blot analysis. The most strongly hybridizing bacteriophage plaque (number 301) was digested with a panel of restriction enzymes, blotted, and probed with the human factor IX fragment, cVII, under the same conditions as the original genomic Southern blot. A hybridizing Bgl II fragment of 4.7Kb was subcloned into the BamH 1 site of the vector pGEM1, and given the number 304. The enzyme Hind 3 was found to cut this subclone within an exonic region. One of the hybridizing Hind 3 fragments was, therefore, isolated and sequenced by Ms. K. Gale by the method of Maxam and Gilbert (228). The mouse sequence obtained shows 73% identity (including the 3' untranslated region) with the published sequence for exon 8, the largest of the exons, of the human factor IX gene (79). The large size of the exon is thought to account for the strength of hybridization seen to the human cDNA.

Figure 26. Factor IX Genomic Southern Blot

[Blot image not reproduced. Lanes: B, H, S, E, T; size marker shown at 14 Kb.]

Southern blot of mouse genomic DNA digested with BamH 1 (B), Hind 3 (H), Sal 1 (S), EcoR 1 (E), Taq 1 (T), and probed with nick translated cVII (75).

Hybridization was carried out at 65°C in Genomic Hybridization Buffer. The filters were washed twice at 65°C for 20min in 3xSSC, 1% SDS.

At this point the cDNA sequence for the mouse factor IX gene was entered into the EMBL database by Wu, Stafford and Ware (229). This sequence was used to design seven 20-mers corresponding to sequences in exons 1 to 7 of the mouse factor IX gene (MIX 1 to 7, see appendix VI). These oligonucleotides were then used to identify the location of the first 7 exons on the isolated bacteriophage. 100ng of each of the five bacteriophage were spotted onto 7 nylon membranes using a dotblot apparatus. The human cDNA and cloning vectors were also included on the filters as positive and negative controls. Each fixed filter was then probed with one of the 7 MIX oligos. Hybridization was carried out at 42°C in oligo-hybridization buffer. The filters were washed at 52°C in 2xSSC and 0.1% SDS (see figure 27).

Bacteriophage 297 hybridized to oligonucleotides corresponding to exons 2, 3, 4 and 5, bacteriophage 298 to exons 5 and 6, and bacteriophage 299 and 301 to exon 7.
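The dot-blot results can be tabulated to show which exons are unaccounted for and which neighbouring exons are not carried together on any one clone. The sketch below encodes only the hybridization results stated above for the clones named; the function names are illustrative, and a pair flagged as unbridged is only a candidate gap, since clones could still overlap within an intron.

```python
# Dot-blot results from the text: bacteriophage -> exons detected by MIX oligos.
hybridization = {
    297: {2, 3, 4, 5},
    298: {5, 6},
    299: {7},
    301: {7},
}

EXONS_PROBED = range(1, 8)  # MIX 1 to 7


def uncovered_exons(results, exons):
    """Exons not detected on any clone."""
    covered = set().union(*results.values())
    return [e for e in exons if e not in covered]


def unbridged_pairs(results, exons):
    """Adjacent exon pairs that no single clone carries together."""
    exons = list(exons)
    return [
        (a, b)
        for a, b in zip(exons, exons[1:])
        if not any({a, b} <= found for found in results.values())
    ]


print(uncovered_exons(hybridization, EXONS_PROBED))  # [1]
print(unbridged_pairs(hybridization, EXONS_PROBED))  # [(1, 2), (6, 7)]
```

The two flagged pairs correspond to the regions that the isolated bacteriophage do not appear to span: the 5' region including exon 1, and the region between exons 6 and 7.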

Figure 27. Identification of Factor IX Exons

[Dot blot images not reproduced. Panels: MIX 5 and MIX 6; bacteriophage 297, 298, 299 and 300 on the bottom row.]

Two of the dotblots used to identify the locations of the exons of mouse factor IX within the bacteriophage clones. The top row of each blot contains 100ng, 50ng, and 25ng of the human factor IX cDNA, cVII, cloned into the pAT vector. The middle row of each blot contains the same series of dilutions of the pAT vector alone, and the bottom row contains 100ng of each of the four bacteriophage. The blots were hybridized to 32P-kinased oligonucleotides, specific for exons 5 and 6, at 42°C in oligo hybridization buffer. The filters were washed the following morning in 2xSSC, 0.1% SDS at 52°C. MIX5 hybridizes to bacteriophage 297 and 298; MIX6 hybridizes only to bacteriophage 298.

All five bacteriophage clones have been mapped for the restriction enzymes Sal 1, Hind 3 and BamH 1, using Southern blotting and bacteriophage mapping techniques (4.1.1). All the exons have also been subcloned and mapped with a variety of restriction enzymes (see figure 28).

The five bacteriophage span more than 65kb, of which approximately 53kb consist of sequences within the mouse factor IX locus. The bacteriophage include exons 2 to 8 of the mouse factor IX gene, but do not carry the sequences for the 5' region including exon 1, which has been cloned separately, or a region of the locus between exons 6 and 7. Figure 29 shows a diagrammatic representation of the mouse factor IX locus, including restriction maps of the bacteriophage and plasmid subclones.

[Restriction map diagrams not reproduced. Subclones shown: 304 (4.7kb), 305 (3.7kb), 326 (4.8kb) and 308 (8.6kb), drawn against a scale of 0 to 9 kb.]

Figure 28. Restriction Maps of the Subclones of the Mouse Factor IX Gene, containing exons 2 to 8.

All clones have been inserted into the multiple cloning site of the vector pGEM1 (Promega, ~3kb). Sizes quoted are insert sizes only. Hatched boxes indicate approximate positions of exons.

Key
E    EcoR 1
P    Pst 1
H3   Hind 3
H2   Hind 2
SL   Sal 1
Sc   Sac 1
B    BamH 1
X    Xba 1
A    Acc 1

[Restriction map diagram not reproduced.]

Figure 29. Restriction Map of the Mouse Factor IX Locus.

Restriction map for the endonucleases BamH 1 (B), Hind 3 (H), and Sal 1 (S). The five bacteriophage subclones, 297-301 and four of the plasmid subclones, containing exons 2-8 are also indicated. The blocked areas are exonic sequences. The area between bacteriophage 298 and 299, indicated by a dotted line has not yet been cloned.

*The bacteriophage are all shown in Left arm (L) - insert- Right arm (R) orientation except bacteriophage 298 which is as shown.

Exon 1 has been cloned by amplification of mouse genomic DNA with oligonucleotide primers RKIX1B and RKIX1C, which correspond to the published sequence of the 3' end of exon 1 (229) and the 5' promoter region (230) (see figure 30). The 230 base pair amplification product (figure 31) was purified by gel electroelution and cloned into a T-vector (227) by virtue of the adenosine added to the 3' end of amplified fragments by Taq DNA polymerase. Sequencing of the clone confirmed that the product was exon 1 of mouse factor IX.

Figure 30. The Cloning of Exon 1 by DNA Amplification

The following shows the published sequence for the 5' region of mouse factor IX (229, 230), including the positions of oligonucleotide primers RKIX1B and RKIX1C. Amplification is predicted to result in a 221 base pair product.

NF1
GATCGAAGAAGCAACTGGAAATAGCCCAAAGATACACCGAGGGAGATGGACAACAATTTCCC
C->   +1   C/EBP
AGAAGTAAGTCCCATTCAGCTTGCACTTTGGAACGATTGATTAGCCCT/GACGCTTGCACAA

CCATCTTCCTTTTAGGATATCTACTCAGTACCGAATGTGCAGTTTTCCTTGATCGTGAAAAT
<-B

Key
sequence in bold - primers for PCR
(cf. 229/230) - The respective published sequences show "extra" C residues in these positions.
+1 - proposed transcription initiation (230)
underlined - possible binding sites for transcription factors C/EBP and NF1, and potential translation initiation sites.

Amplification was carried out at an annealing temperature of 59°C, an extension time of 1.5 min at 72°C, for 35 cycles.

Figure 31. Exon 1 Amplification. PCR product from amplification of mouse genomic DNA with primers RKIX1C and RKIX1B. 35 cycles of amplification were performed with an annealing temperature of 59°C, and an extension time of 2min at 72°C. Lane number 1 is the "0" DNA control. Lanes 2 to 6 are all aliquots of reaction mixtures which contained 250ng mouse genomic DNA. These products were all pooled and purified before cloning into a T-vector. (M) is the 1kb ladder (see appendix V for sizes).

6.1.3 Attempts to Isolate The Uncloned Regions of the Mouse Factor IX Gene

In order to screen the bacteriophage library specifically for the region between exons 6 and 7, probes to the ends of bacteriophage 298 and 299 were required. The 5' BamH 1/Sal 1 fragment of bacteriophage 299 was used to generate a probe for the 3' end of the missing region. This was achieved by digesting the whole bacteriophage with BamH 1, Sal 1 and Hind 3, and shotgun ligating the whole digest into BamH 1, Sal 1 cut pGEM1. According to the restriction map of bacteriophage 299, after digestion with Hind 3 only the most 5' fragment of the bacteriophage should ligate into this vector. This was confirmed by probing a Southern blot of bacteriophage DNA with the cloned product (see figure 32). The probe appears to hybridize extremely strongly to the long arm of the BamH 1 and Hind 3-cut bacteriophage and to the insert band of the Sal 1 digest as predicted. This probe, however, also hybridizes less strongly to other bands of bacteriophage DNA, suggesting the presence of repetitive elements. Southern blots of genomic DNA probed with this fragment showed a high degree of smearing, confirming the presence of highly repetitive sequences. This fragment was not, therefore, suitable for use as a probe for library screens.

Figure 32. Southern blot of Bacteriophage DNA

[Blot image not reproduced.]

Bacteriophage DNA probed with the 5' BamH 1/Sal 1 fragment of bacteriophage 299, labelled with 32P by DNA synthesis from a random primer. The blot contains the five bacteriophage 297-301 (1-5) digested with BamH 1 (B), Hind 3 (H), and Sal 1 (S). The bacteriophage for the three digests are all loaded in the same order: 297 (1), 298 (2), 299 (3), 300 (4), 301 (5). The probe hybridizes very strongly to bacteriophage 299, as expected, but also hybridizes to several bands in other bacteriophage, suggesting the presence of repetitive elements.

In order to isolate a specific probe for the exon 6 region of the mouse factor IX locus the 500-base pair Hind 3 fragment of subclone #326 was isolated and purified by agarose gel electroelution. The fragment was labelled with 32P[dATP] by DNA synthesis from a random primer and used to probe a blot of mouse genomic DNA digested with the restriction enzymes BamH 1 and Hind 3. Hybridization was carried out overnight at 65°C in Genomic Hybridization Buffer. The filter was washed twice for 20min at 65°C in 1xSSC, 1% SDS. As shown in figure 33, although this Hind 3 fragment would be expected to hybridize to only one band in each lane it clearly hybridizes to several such bands. This fragment does not, however, appear to contain highly repetitive sequences and was, therefore, used in conjunction with the cVII fragment to rescreen the bacteriophage genomic DNA library. Bacteriophage that hybridized to both probes would be deemed to contain exon 6 of the mouse factor IX gene and would then be further screened for exon 7 sequences using oligonucleotides.

Figure 33. Genomic Southern Blot with Hind 3 Fragment of #326. Mouse genomic DNA, digested with BamH 1 (B) and Hind 3 (H), probed with the exon 6-containing Hind 3 fragment of #326.

[Blot image not reproduced. Size markers shown at 20, 12 and 5 kb.]

Approximately 1.3 million bacteriophage were plated and transferred to nitrocellulose in quadruplicate as described. One duplicate set of filters was screened with the cVII probe and the other duplicate set with the exon 6-containing Hind 3 fragment. Probes were labelled with 32P[dATP] by DNA synthesis from a random primer, and hybridization was carried out overnight at 42°C in Hybridization Buffer 1. The filters were washed three times for 20min in 0.1xSSC, 1% SDS at 65°C. The exon 6 probe hybridized to a large subset of bacteriophage. The cVII probe hybridized strongly to five bacteriophage, none of which hybridized particularly strongly to the exon 6 probe. These five bacteriophage were subjected to dot blot analysis, and probed with MIX5, 6 and 7. None of the bacteriophage, however, showed hybridization to the oligonucleotides for exons 6 or 7.

Probes for exons 6 and 7 which were specific for the coding regions, and therefore not likely to contain repetitive elements, were generated by amplification of the coding sequences only. Oligonucleotides which had been synthesized for sequencing purposes, RK96.6 / RK96.3 and RK97.3 / RK97.2, were used to amplify the exons from approximately 50pg of subclones #326 and #304. Amplification of both exons was carried out with an annealing temperature of 55°C and an extension time of 2min at 72°C, for 30 cycles. Exon 6 was amplified in the standard 1x PCR buffer, with a final Mg2+ concentration of 2.5mM; exon 7 was amplified in a final Mg2+ concentration of 1.5mM (figure 34). The PCR products were cloned into a T-vector, isolated from the vector, gel purified, 32P-labelled by DNA synthesis from a random primer and then used as probes to screen the bacteriophage library.

The initial screen of 1.3 million bacteriophage was carried out with a mixture of the exon 6 and exon 7 probes. Hybridization was carried out, as in the original library screen for factor IX sequences, overnight in Hybridization Buffer 1 at 42°C. The filters were washed twice for 20min in 3xSSC, 0.1% SDS at 65°C. 20 positive bacteriophage plaques were picked, plated in duplicate, and subjected to secondary screening with separate exon 6 and exon 7 probes. Although the exon 6 probe hybridized to many bacteriophage, no positives were found with the exon 7-specific probe. It was, therefore, concluded that none of these bacteriophage spanned the missing region between exons 6 and 7.

Figure 34. Amplification of Exons 6 and 7

PCR products from the amplification of exons 6 (a) and 7 (b) of the mouse factor IX gene with primers RK96.6/96.3 and RK97.3/97.2. Exon 6 was amplified in the standard PCR buffer (Mg2+ = 2.5mM). Exon 7 was amplified in a final Mg2+ concentration of 1.5mM. For both exons 30 cycles of amplification were performed with an annealing temperature of 55°C and an extension time of 2min at 72°C. The exon 6 product is 291bp; the exon 7 product is 260bp.

Comparisons of the mouse and human factor IX gene maps suggest that the missing region between exons 6 and 7 may be as small as 1-2Kb in length. The possibility of cloning the region by amplification of genomic DNA was, therefore, investigated. Oligonucleotide primers were made which corresponded to the 3' region of exon 6 (RKIXgap3') and the 5' region of the cloned BamH 1/Sal 1 fragment of bacteriophage 299 (RKIXgap5'). PCR was performed under various conditions, altering annealing temperatures, extension times and Mg2+ concentration. Amplification, especially at the lower annealing temperatures, continually resulted in a series of bands or smears. There was, however, also a strong fragment of about 500 base pairs usually present. This fragment was isolated and cloned into a T-vector. Subsequent sequencing of the clone, however, showed no similarity to the known sequence in the region of the oligos.

In an attempt to isolate the missing intronic region between exons 1 and 2 the original bacteriophage library was rescreened with a gel-purified restriction fragment of exon 2, under the same conditions as described for the original library screen. No positive bacteriophage were isolated.

6.1.4 Sequencing of the Mouse Factor IX Gene

All the exons and the intron-exon boundaries have been sequenced on both DNA strands by dideoxy-sequencing methods (210). Figure 35 shows the mouse gene sequence and its protein translation compared with the human cDNA sequence. There is 77.6% sequence identity between the two cDNA sequences. When the comparison is restricted to the coding regions of the sequences, however, there is 86% sequence identity. If the transcription initiation site is presumed to be at the point corresponding to nucleotide +1 of the human gene, the mouse gene consists of 2763 nucleotides of complementary DNA, terminating in the consensus sequence AATAAA followed, 21 bp downstream, by the site for the addition of a polyA tail.
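A percentage identity of this kind is calculated from an alignment of the two sequences. The sketch below shows one simple convention, in which positions with a gap in either sequence count as mismatches; how gaps were treated in arriving at the figures above is not stated, so this convention, the function name and the toy sequences are all assumptions for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions with the same base.

    Both sequences must already be aligned to the same length; a gap
    character ("-") in either sequence counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy aligned fragments (hypothetical, not taken from figure 35).
mouse = "ATGAAGCACC"
human = "ATGCAGCGCC"
print(percent_identity(mouse, human))  # 80.0
```

Restricting the comparison to coding regions, as described above, amounts to applying the same calculation to the aligned exonic positions only, which excludes the less conserved untranslated regions.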

There are two potential translation initiation sites, at Met -46 and Met -39 (nucleotides +30 and +51 respectively), the second of which is preceded by a 5' untranslated region of 50 nucleotides. The double stop codon, TAATGA, at nucleotide 1442 of the cDNA sequence, is followed by a 3' nontranslated region of 1320 nucleotides, which includes a (GA)16 repeat.

Exon 6 is the least conserved of the coding regions. Compared to the human, the mouse gene contains two inserts; one single lysine residue, and a block of 9 amino acids.

Figure 35. Sequence of the Mouse Factor IX Gene.

Comparison of the mouse factor IX gene sequence (M) and its protein translation (P) with the human cDNA sequence (H). Only the nucleotides that differ in the human sequence are shown.

Key

#           Transcription initiation sites corresponding to those proposed for the human gene (75, 82).
Bold        Possible translation initiation codons (and the whole mouse sequence).
/           Intron-exon boundaries.
**          Sites of inserts compared to the human sequence and the (GA)16 repeat.
Underlined  The double stop codon, poly-adenylation signal and possible RNA cleavage signal.
aa          Amino acid.
@           Site of addition of polyA tail.
......      Intron regions for which no sequence data is available.
$           Site of base difference compared to sequence of Yao et al (231) (G cf. T).

Numbering not preceded by aa refers to the nucleotides of the coding sequence, presuming the transcription initiation site, corresponding to that proposed by Anson et al (75) for the human gene, is a G at position 1.

M GCCCAAAGATACAGCGAGGGAGATGGACAACAATTTCCCAGAAGTAAGTCCCATTC (+1)(Exon 1) AGCTTGCACTTTGGAACGATTGATTAGCCCTGACGCTTGCACAATCTCCTAACAAAGGTC # #

H C G G TGA (+51) A A C G M ATGAAGCACCTGAACACCGTCATGGCAGAATCCCCGGCTCTCATCACCATCTTCCTTT P MatLysHisLeuAsnThrValMatAlaGluSerProAlaLeuIleThrllePheLeuL # (aa-39) G T A TAGGATATCTACTCAGTACCGAATGTGCAG...... CCTAATACTAAAGAACTATA euGlyTyrLeuLeuSerThrGluCysAlaV- (EXON 2) 74 T A C A G AT G CTTTTAAATTTCAGTTTTCCTTGATCGTGAAAATGCCACCAAAATTCTTACCCGTCCAAA -alPheLeuAspArgGluAsnAlaThrLysIleLeuThrArgProLy G TTG AG GA GAGATATAATTCAGGAAAACTAGAAGAGTTTGTTCGAGGAAACCTTGAAAGAGAGTGTAT sArgTyrAsnSerGlyLysLeuGluGluPheValArgGlyAsnLeuGluArgGluCysIl aa+1 G G A AGAAGAAAGATGTAGTTTTGAAGAAGCACGAGAAGTTTTTGAAAACACTGAAAAAACTGT eGluGluArgCysSerPheGluGluAlaArgGluValPheGluAsnThrGluLysThr GAGTATACCACATGCATATCTGAAGTAAGTATGTGCCAGAGGCA...... TTTAAACAC 235(EXON 3) TATCATTAAGCTGTCCTCCTTTTTCCTTACAGACTGAATTTTGGAAGCAGTATGTTGGTA ThrGluPheTrpLysGlnTyrValA- ACGAATTGCATTTTATTTTATTTTATTTCCTACCTGCTATATGAAA..... GGGAGGAC 263(EXON 4) TGGGCATTTTAGGCGCTCTCTGATAATTCAATTTCTTAACCTGTCTTAAAGATGGAGATC -spGlyAspG G C A CCGT CA AGTGTGAATCAAATCCTTGTTTAAATGGTGGAATATGCAAGGATGATATTAGTTCCTATG InCysGluSerAsnProCysLeuAsnGlyGlylleCysLysAspAspIleSerSerTyrG T T CCT A AATGCTGGTGCCAAGTTGGATTTGAAGGAAGGAACTGTGAATTAGGTAAGTAACCTTTTA luCysTrpCysGlnValGlyPheGluGlyArgAsnCysGluLeuA-

TGTATTCATATTCAACTTTCCTTTTT......

..TTAAATTTTTTTTAAATTGTTTGTTCAAGTTGAAGCCAATTCGTTTTTTTTTTTTCCT 377(EXON 5) T A TT G A G T CTTCTCTTCTTTTAGATGCAACGTGTAACACCAAAAATGGCAGGTGCAAGCAGTTTTGGA -spAlaThrCysAsnlleLysAsnGlyArgCysLysGlnPheTrpL $ T TG GG C T T G A AAAACAGTCCTGATAACAAGGTAATTTGTTCCTGCACTGAGGGATAGCAACTTGCAGAAG ysAsnSerProAspAsnLysVallleCysSerCysThrGluGlyTyrGlnLeuAlaGluA

189 H 478 G M ACCAGAAGTCCTGTGAACCAACAGGTCACAATCTAAATCACAGTTCTTTCAGAAACTT P spGlnLysSerCysGluProThrV-

GCATCTAAATCCTT...... ATTATTGGAATGCATTTCTATATGCCACGTGG 506(EXON 6) G A T AATGTCAGAATATTCTCTTTTTTCTATTTTTGTAGTTCCATTTCCATGTGGGAGAGCTTC -alProPheProCysGlyArgAlaSe G C A C *** C CC T G TC G G TATTTCATACAGTTCTAAAAAGATCACGAGAGCTGAGACTGTTTTCTCTAATATGGACTA rlleSerTyrSerSerLysLysIleThrArgAlaGluThrValPheSerAsnMetAspTy

T *************************** aaA T GG TGAAAATTCTACTGAAGCTGTATTCATTCAAGATGACATCACTGATGGTGCCATTCTTAA rGluAsnSerThrGluAlaValPhelleGlnAspAspIleThrAspGlyAlalleLeuAs A C C CCC T G G TAACGTCACTGAAAGTAGTGAATCACTTAATGACTTCACTCGAGTTGTTGGTGGAGAAAA nAsnValThrGluSerSerGluSerLeuAsnAspPheThrArgValValGlyGlyGluAs T C A T CGCAAAACCGGGTCAAATCCCTTGGCAGGTACTTTATATTGATCCGTTGACCTGCAGCCC nAlaLysProGlyGlnlleProTrpGln aagcttgtattctatagtgtcacaaat ...... 739(EXON 7) TG G ...... CCTGCAGGTCAACGGTCTTCTTAACTTTATTTCCACAGGTCATTTTAAA VallleLeuAs A G T CT T G TGGTGAAATTGAGGCATTCTGTGGAGGTGCCATCATTAATGAAAAATGGATTGTAACTGC nGlyGluIleGluAlaPheCysGlyGlyAlallelleAsnGluLysTrpIleValThrAl G G A T ACA C A TGCCCACTGTCTTAAACCTGGTGATAAAATTGAGGTTGTTGCTGGTAAGTAAACAAAATA aAlaHisCysLeuLysProGlyAspLysIleGluValValAlaG-

GATAATCCTTAGCAACATTAGTGCATGATGGACATATCACATGTACATTGTCCACCGGTG

TTGTTACTGAGTAA......

...... TGCTAATGATCAGTGAAGCCAACCAGACTGGGGACCATGGGAAATG (EXON 8) 854 C T CATTTATGTGAAGGACTATAAACTATGAGATTTGTTTTCAACAGGTGAATATAACATTGA -lyGluTyrAsnlleAs GGCACT GAC TT CAC TAAGAAGGAAGACACAGAACAAAGGAGAAATGTGATTCGAACTATCCCTCATCACCAGTA pLysLysGluAspThrGluGlnArgArgAsnVallleArgThrlleProHisHisGlnTy G C AC C T CG C CAATGCAACTATTAATAAGTATAGTCATGACATTGCCTTGCTGGAACTGGATAAACCTTT rAsnAlaThrlleAsnLysTyrSerHisAspIleAlaLeuLeuGluLeuAspLysProLe

190 990 G G C T T C A TGCA CGC AATACTAAACAGCTATGTAACACCTATCTGTGTTGCCAATAGGGAATATACAAATATCTT uIleLeuAsnSerTyrValThrProIleCysValAlaAsnArgGluTyrThrAsnllePh A A A G C TC CCTCAAGTTTGGTTCTGGCTATGTCAGTGGCTGGGGAAAAGTCTTCAACAAAGGGAGACA eLeuLysPheGlySerGlyTyrValSerGlyTrpGlyLysValPheAsnLysGlyArgGl 1110 A TAG T T CC T C A GGCTTCCATTCTTCAGTACCTTAGAGTTCCACTGGTGGATAGAGCCACATGCCTTAGGTC nAlaSerlleLeuGlnTyrLeuArgValProLeuValAspArgAlaThrCysLeuArgSe T AG C T T A T G CACAACATTCACTATCTATAACAACATGTTCTGTGCAGGCTACCGTGAAGGAGGCAAAGA rThrThrPheThrlleTyrAsnAsnMetPheCysAlaGlyTyrArgGluGlyGlyLysAs A C G C TTCGTGTGAAGGAGATAGTGGGGGACCCCATGTTACTGAAGTAAAGGGACAAGTTTCTT pSerCysGluGlyAspSerGlyGlyProHisValThrGluValGluGlyThrSerPheLe 1290 A G AACTGGCATTATTAGCTGGGGTGAAGAATGTGGAATGAAAGGCAAATATGGAATATATAC uThrGlyllelleSerTrpGlyGluGluCysAlaMetLysGlyLysTyrGlylleTyrTh C A T C 1400 TAAGGTTTCCCGGTACGTCAACTGGATTAAGGAAAAAACAAAGCTAACTTAATGAAAAAC rLysValSerArgTyrValAsnTrpIleLysGluLysThrLysLeuThrEnd aa425 CTATTTCCAAAGACAATTCAGTGGAATTGAAAATGGGTGATGCCCTTTACAGACTAGTCT TTCTACCTTTTGTTAAATTTAAATATATAAGTTCTACAAACACTGATTTTTCTCTGTGCA TAAGACAAGCCCATCTAGGATCTATATTGTTCTAGAGTAAGTAGGTTAGCAAATATAATC CACTAGAGAAATAGTTTAGTAAGAGATTCACCATTTCTGTAAGTCCAGCCCTTGTTAAAA TTAGAAAGTAAAGCTTTCCGTGTTGCCCATAAGGCGTGATGGTTCTTGATACAGAGATGT ACCCAATTCTCCCTCCTTGGCAGCAATTCATGTTTTAGCTCTTCCTTGCTACTCTCAATT TTATTAGTTTTCTATCCAGAATCTTTAACCCATTTTATGGCCAGAAGAATACAAGAGCAG CTGAAAAATTAAAACTCATCAAAAGCATGACTTCCTCTCCTGATTTTTCTGAATCTTGTA TCTTTTACAACTCCCAAACCACAAATCACTGACCTCTCCGTCATTCTCACCTTCCCTTTC TCCATCACCACTGAAGGAGGAAGCTATATGAGTTCCAGGACAGCCTAGGTACACAGAGAA ACCCGGTCTTGAAAGAAAAGAGAGAGTGGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA * GAGGAGAAAGAAATGATTAATTTAATCATATTGGTAATATATATATATTATATCTCTAAA AAAAAGTCACTAAACCTTACTTGTAACAACTGCCTATTTCTATGGTGTAAATATCCTTAC TTTGGTAGATTTCAAGCTATTAACATGAAGTTACTGGAAAAGGAGTTGAGAAAACATATG GAAAATTACTCTTAAAACTGTTTCAGGCAGTTTTTAACCTAGAAGCAGCTGAACTTTCTA GGAATACTTCAACAGTGCATCTTCAGCCTTCTCCAGTTCCAACCTACCTAAGGGTCATGT 
CTCTCACAGCAGGCTCAAGGCTGCAAGAGTCATTGCAAATGGCCAACTGACTTGCCCATT TATGGTTTTCTTCTCACCGGTAAACTGTTATTGTAATTAACACTGTCATATTGAATTTTC TAGAGGGATGCTGACCATCCGACCCATTTCTCATCTGAGACTTGGTGAACTGGCATTTTA ATACTTATCTGGACCTTTGTAGTGATGCATAATTGGTTTGAACCCCTTGTCACTGCCACC TGCCCCCACCAACACAAAATCCTACTTCATTACTGCTGACTCTGCTAACGTTCCATACTT GTTGCCTCTTTTGTCTTGCAAGAAGTATCAATAAACATCTTTCCAGATTTAQCCCCAAGT GTTTCTTGTTAAATCATTTAGAGCGGATCCCAAAGAACACAATCAACAAAGCTCTGAAAA GAATGTGTCCGATGAGATG.

All the intron-exon boundaries in the mouse factor IX gene comply with the GT/AG rule of Breathnach and Chambon (91). The splice junctions are either type 0 (splicing between codons) or type I (splicing between the first and second nucleotides of a codon) (93); see Table 4.

Table 4

Mouse Factor IX Splice Junction Sequences

Intron  Exon  5' exon   Intron 5' end ... 3' end   3' exon  Exon  Type (93)

A       1     /         /          ... TTTCAG      T        2     I
B       2     ACT       GTGAGT     ... TTACAG      G        3     0
C       3     TTG       GTAACG     ... TTAAAG      A        4     I
D       4     TAG       GTAAGT     ... TTTTAG      T        5     I
E       5     CAG       GTCACA     ... TTGTAG      T        6     I
F       6     CAG       GTACTT     ... CCACAG      G        7     0
G       7     CTG       GTAAGT     ... CAACAG      G        8     I

Consensus (92)  (C/A)AG   GT(A/G)AGT ... (T/C)(T/C)N(T/C)AG   G
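The GT/AG rule can be checked mechanically against the junction sequences in Table 4. The following Python sketch is illustrative only: the dictionary simply transcribes the intron ends listed in the table (intron A is omitted because its 5' end is not available), and the function name is a hypothetical helper rather than part of any analysis package used in this work.

```python
# Intron 5' and 3' ends transcribed from Table 4 (intron A has no 5' data).
INTRON_ENDS = {
    "B": ("GTGAGT", "TTACAG"),
    "C": ("GTAACG", "TTAAAG"),
    "D": ("GTAAGT", "TTTTAG"),
    "E": ("GTCACA", "TTGTAG"),
    "F": ("GTACTT", "CCACAG"),
    "G": ("GTAAGT", "CAACAG"),
}


def obeys_gt_ag(five_prime: str, three_prime: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    return five_prime.startswith("GT") and three_prime.endswith("AG")


# Evaluate every intron for which both ends are listed.
results = {name: obeys_gt_ag(*ends) for name, ends in INTRON_ENDS.items()}
for name, ok in results.items():
    print(f"intron {name}: {'GT...AG' if ok else 'non-canonical'}")
```

The check covers only the invariant dinucleotides; agreement with the wider consensus (92) would need a position-by-position comparison.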

6.2 DISCUSSION

All eight exons of the mouse factor IX locus and the majority of the non-coding regions have been isolated by screening a mouse genomic bacteriophage library with a human cDNA probe. Exons 2 to 8 are contained on five EMBL3 bacteriophage and have been subcloned into the vector pGEM1. Exon 1 has been separately cloned into pBluescript KS M13- (Stratagene). All the coding regions and the intron-exon boundaries of exons 2 to 8 have been sequenced by dideoxy sequencing methods. Exons 3, 4, 6, 7 and 8 have been sequenced on both strands; exons 2 and 5 in one direction only.

6.2.1 Bacteriophage Isolation

Three of the five bacteriophage isolated with the human cDNA probe, cVII (75), contain sequences corresponding to the 3' end of the factor IX gene. This is probably because over half of the cVII probe consists of exon 8 sequence, and would therefore be expected to hybridize most strongly to this region. The other two bacteriophage, numbers 297 and 298, both contain more than one exon and would therefore also hybridize to this probe quite strongly. The fact that cVII lacks the first 41 nucleotides of the human message may partly explain the failure of this probe to identify bacteriophage containing exon 1, which consists of nucleotides 1 to 117.

The five bacteriophage, numbered 297 to 301, range in size from 18 to 21kb and span more than 65kb, with overlaps in the exon 5 and exon 8 regions. No bacteriophage has yet been isolated which spans the intronic region 3' of exon 6 or the 5' regions of the gene including exon 1. Interestingly, the 3' ends of bacteriophage 300 and 301, in the 3' flanking region of the locus, do not appear to have the same restriction map and may have undergone some rearrangement, perhaps as a cloning artifact.

6.2.2 The mouse factor IX gene.

The mouse factor IX locus spans at least 53kb and has an intron-exon arrangement very similar to that of the human gene. As expected for such an important gene, the coding regions of the locus have been highly conserved during mammalian evolution; there is 86% sequence identity between the mouse and human data for these regions (Bestfit, GCG, University of Wisconsin). The non-translated regions have been less well conserved; sequence identity between the two loci drops to 77% when these regions are included in the comparison.

The intron-exon boundaries of the human and mouse genes are very similar. The types of splice junction (either I or 0) are identical at each of the boundaries. Both the human and mouse sequences comply with the GT...AG rule of Breathnach and Chambon (91), and the surrounding sequences, corresponding roughly to the wider consensus sequence of Shapiro and Senapathy (92), have also been highly conserved. Outside the splice junction regions the non-coding sequences are not under selective pressure and are much less well conserved.

The coding sequence of mouse factor IX has already been discussed in several publications after the isolation of clones by screening mouse liver cDNA libraries with the human factor IX cDNA.

My sequence for the DBA/2 mouse agrees with the cDNA sequence published by Yao et al (231) for the AKR/J mouse, and with the partial cDNA of Sarkar et al (Mus musculus, 232); all the nucleotide residues of the coding region and the 3' untranslated region are identical apart from T 418 (numbering of Yao et al), which is a G in my sequence, resulting in an amino acid change of Cys to Trp. My sequence for the 5' untranslated region agrees with that of Pang et al (Mus musculus, 230), but not with the 6 nucleotides of untranslated message in the sequence of Yao et al. The sequence of Pang et al, however, differs from my sequence and that of Yao et al in the region of amino acids -34 to -36. My sequence reads TCCCCGGCT compared with TCCCGGCCT of Pang et al. This results in a change of amino acids from SerProAla (231) to SerArgPro (230), compared to the SerProAla of the human sequence. The cDNA sequence of Wu et al (Mus musculus) differs from my sequence and those of Yao and Sarkar at nucleotides 1110, 1184 and 1185 (numbering of Yao), where they have T, C and T nucleotides respectively compared to G, T and C (Yao, Sarkar) and T, T and C in the human. The sequence of Wu et al also contains an inserted G at nucleotide 1625 and a deleted T at 1803. (These are in the 3' untranslated region of the gene, where the human sequence varies considerably; no comparison with the human is possible.) It remains to be shown whether any of these differences are sequencing errors or polymorphisms, perhaps due to strain differences.
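The Cys to Trp difference at nucleotide 418 is what the standard genetic code predicts for a T to G change at the third position of a TGT codon. A minimal Python sketch of that reasoning follows; it uses only the three relevant entries of the standard code, and the helper name is hypothetical.

```python
# The three entries of the standard genetic code relevant to this site.
CODON_TABLE = {"TGT": "Cys", "TGC": "Cys", "TGG": "Trp"}


def t_to_g_changes(codon: str):
    """List (position, new_codon, new_aa) for each single T->G substitution
    whose product is a codon present in CODON_TABLE."""
    changes = []
    for i, base in enumerate(codon):
        if base == "T":
            mutant = codon[:i] + "G" + codon[i + 1:]
            if mutant in CODON_TABLE:
                changes.append((i + 1, mutant, CODON_TABLE[mutant]))
    return changes


# Only one Cys codon can become Trp through a single T->G change.
for cys_codon in ("TGT", "TGC"):
    print(cys_codon, t_to_g_changes(cys_codon))
```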

The exon sequence of mouse factor IX encodes 464 amino acids. A comparison of the predicted mouse amino acid sequence with the human and canine proteins shows that mouse factor IX consists of a signal sequence of 21 amino acids, a prodomain of 18 amino acids, and a mature protein of 425 amino acids (230).

6.2.3 Promoter Region

Haemophilia B Leyden is a rare form of haemophilia in which patients have less than 10% of normal factor IX activity during childhood, but this increases to 40-80% after puberty. Several mutations in the 5' untranslated region of the human factor IX gene, which is thought to be the promoter region, have been reported to lead to this form of haemophilia (110, 233, 234). In order to look for possible regulatory DNA binding sites, Crossley et al (235) performed DNase I footprinting experiments on the proposed promoter region. Two footprints were found: one at -99 to -76, which was heat sensitive and had a sequence corresponding to the consensus sequence known to bind Nuclear Factor 1-Liver (NF1-L) (236), and one at +1 to +18, which was heat stable and conformed to the consensus sequence known to bind CCAAT/enhancer binding protein (C/EBP) (237). Experiments with the C/EBP and NF1-L binding proteins confirmed that they did indeed bind to these regions of the factor IX gene. Furthermore, the A->G mutation at +13, which is found in some cases of haemophilia B Leyden, abolished the binding of C/EBP to the +1 to +18 region of the gene in vitro.

Sensitive CAT assays, using special low background vectors (pCATOO), showed that the normal factor IX promoter could be transactivated by cotransfecting a vector encoding the C/EBP protein with the CAT construct. Introduction of the A->G mutation at nucleotide +13, however, abolished this transactivation. These CAT assays also confirmed the importance of the NF1-L binding site in vivo, since deletions in this region reduced CAT activity five-fold.

Another mutation found in haemophilia B Leyden patients is a G->A transition at nucleotide -6. This is adjacent to the C/EBP binding site and it was thought that the mutation might interfere with binding of the C/EBP protein. The presence of the -6 mutation, however, did not abolish binding of C/EBP to the promoter region in vitro, but it did reduce CAT activity to background levels. This suggests that a third protein may bind to this region and regulate factor IX transcription.

Comparison of the DNA sequences of the factor IX promoter regions of macaque, dog, rat and mouse (230) with that of the human reveals that neither the C/EBP nor the NF1-L binding sites have been totally conserved in all species, but none of the sequences depart from those of accepted binding sites (239-240). This is consistent with a contribution of C/EBP and NF1-L to factor IX expression in these species. The T residue at nucleotide -20 has been conserved in all the species. A mutation to an A at this position has been found in some cases of haemophilia B Leyden (233), but it is not known if any factor binds to this region. Interestingly, the G residue at nucleotide -6 in the human is not conserved. In fact the dog, rat and mouse genes have an A at this position, a change which interfered with human promoter activity in the CAT assays. It is possible that transcription in these species is regulated differently to that of the human.

The three potential TATA box sequences proposed for the human factor IX gene have not been conserved in the mouse gene. The absence of the TATA box could explain the observations of different transcriptional start points (241).

Many other genes which lack TATA boxes and GC rich regions contain binding sites for large numbers of transcription factors, which are thought to be necessary for strict, specific transcriptional regulation in development or differentiation. In contrast transcription of the factor IX gene is found in many tissues (63) and is thought to be activated fairly late in embryogenesis (231).

6.2.4 Transcription Start Point.

Transcription initiation sites for the mouse factor IX gene have not been defined. On the basis of S1 nuclease mapping and primer extension experiments, Anson et al (75) proposed three possible transcription initiation sites in the human factor IX gene. (These were later confirmed by Reijnen et al (82) and are marked on figure 35 by #.) Since the most 5' of these sites gave the strongest signals, it was assumed that this is the major transcription initiation site for the human factor IX gene. Of the three sites only those corresponding to nucleotide numbers +4 and +30 of the human gene (75) are conserved in the mouse. This suggests either that the first site is less important in the mouse or that transcription might be regulated differently to that of the human gene. This would be consistent with the lack of conservation at residue -6, which is thought to be involved in a transcription factor binding site, as mentioned above. There have also been reports of cDNAs transcribed from other points much further upstream. Salier et al (81) reported the presence of a transcription initiation site at nucleotide -150 (numbering after Anson) in the human gene. Since no other investigators found such a site, this was presumed to be due to an artifact of the CAT construct. Evans et al (242), however, isolated a canine cDNA clone starting at position -179 by the Anson numbering system. It is possible that this is an alternative site for the canine gene. Alternatively this clone may be a rarer uncharacteristic transcript.

6.2.5 Translation

In the 5' region of the mouse factor IX gene there are two Met codons (compared to three in the human) which could possibly be the translation initiation codon. The second of these codons, corresponding to amino acid residue -39, is known to be conserved among human, macaque, dog, rat and mouse factor IX, while the first Met codon, corresponding to amino acid residue -46, is not present in the dog or rat. The DNA sequence surrounding Met -39, GTCATGG, also shows greater homology to the consensus sequence of A/GCCATGG (243) than does the sequence TCAATGA around Met -46. This would suggest that the second Met residue, at amino acid -39, nucleotide residue +51, is the true translation start point of the mouse factor IX gene. The translation initiation codon for the human factor IX gene is often quoted as Met -46, since ribosomes usually initiate translation at the most 5' ATG if it is in a favourable sequence context (243). There is, however, no protein evidence to confirm that this is the case for the factor IX gene, and the conservation of both the surrounding nucleotide sequence and the consequent protein translation suggests that translation may in fact be initiated at Met -39. Only one of the six amino acids between the two Met residues is conserved between mouse and man, a much lower degree of conservation than in the following region.
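The comparison of the two ATG contexts with the consensus A/GCCATGG (243) can be made explicit by counting matches at the four flanking positions. The Python sketch below is a simple illustration, not a validated predictor of initiation efficiency; the scoring scheme (one point per matching flanking base) and the function name are assumptions made for this example.

```python
# Consensus A/GCCATGG (243), written with the IUPAC code R for A or G.
CONSENSUS = "RCCATGG"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def flanking_matches(site: str) -> int:
    """Count consensus matches at the four positions flanking the ATG."""
    if len(site) != len(CONSENSUS) or site[3:6] != "ATG":
        raise ValueError("expected a 7-nucleotide window with ATG at positions 4-6")
    flank = (0, 1, 2, 6)  # three bases before the ATG and one after it
    return sum(site[i] in IUPAC[CONSENSUS[i]] for i in flank)


for name, site in (("Met -39", "GTCATGG"), ("Met -46", "TCAATGA")):
    print(name, site, flanking_matches(site), "of 4 flanking positions match")
```

On this count the Met -39 context matches at three of the four flanking positions and the Met -46 context at one.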

Comparison of the predicted amino acid sequence for mouse factor IX with the human sequence reveals 91% identity (see figure 36). All the protein domains are approximately equally conserved apart from the activation peptide domain, which is less well conserved. All the residues which are thought to be important to the function of the human protein have been highly conserved in the mouse.
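As an illustration of how such an identity figure is obtained, the Python sketch below computes percent identity for one pre-aligned, ungapped block of figure 36 (the 50 residues beginning EEFV in the human and mouse rows). It is a position-by-position count only and does not reproduce the Bestfit alignment procedure; the function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    a = seq_a.replace(" ", "")
    b = seq_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Human (H) and mouse (M) rows of one 50-residue block from figure 36.
HUMAN = "EEFVQGNLER ECMEEKCSFE EAREVFENTE RTTEFWKQYV DGDQCESNPC"
MOUSE = "EEFVRGNLER ECIEERCSFE EAREVFENTE KTTEFWKQYV DGDQCESNPC"

print(f"{percent_identity(HUMAN, MOUSE):.1f}% identity over this block")
```

Four of the 50 positions differ in this block, which is in line with the overall figure quoted above.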

Of the first 21 amino acid residues of the leader peptide (-19 to -39), 13 are identical in the human, canine and murine proteins. This is perhaps not surprising since this is the domain known to be responsible for the secretion of the protein. In the prodomain the residues that have been conserved among the vitamin K-dependent coagulation proteins have also been conserved in the murine and canine proteins. These are Phe-16, Ala-10, the hydrophobic residues at -17, -7 and -6, and the basic residues at -1 to -4 (46). The aspartic acid residue at position -14 and the asparagine residue at -11 have also been conserved among the three species. In the murine and canine proteins this asparagine at -11 is within the sequence Asn-X-Ser/Thr (X = any residue), which forms a potential N-linked glycosylation site.

Three other potential N-linked glycosylation sites occur in the mouse factor IX protein; two, which are conserved in the human, canine and bovine proteins, at residues 159 and 178 of the activation peptide domain, and one, which is present in canine and bovine but not the human protein, at residue 271 in the catalytic domain.
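A scan for the Asn-X-Ser/Thr sequon in the leader region can be sketched as follows. The two strings are the mouse and human rows of the first block of figure 36, and residue numbering assumes that the Tyr at the start of YNSGKL is residue +1, as marked in the figure. The pattern takes X as any residue, following the definition used in the text (the sequon is often defined with proline excluded at X); the function name is hypothetical.

```python
import re

# Leader regions from figure 36; index 44 (the Tyr of YNSGKL) is residue +1.
MOUSE_LEADER = "HLNTVMAESPALITIFLLGYLLSTECAVFLDRENATKILTRPKRYNSGKL"
HUMAN_LEADER = "RVNMIMAESPGLITICLLGYLLSAECTVFLDHENANKILNRPKRYNSGKL"
FIRST_MATURE_INDEX = 44

# Zero-width lookahead so that overlapping sequons would also be reported.
SEQUON = re.compile(r"(?=N.[ST])")


def sequon_positions(seq: str, first_mature_index: int):
    """Return residue numbers (no residue 0) of Asn in Asn-X-Ser/Thr."""
    positions = []
    for match in SEQUON.finditer(seq):
        i = match.start()
        if i < first_mature_index:
            positions.append(i - first_mature_index)      # -1, -2, ...
        else:
            positions.append(i - first_mature_index + 1)  # +1, +2, ...
    return positions


print("mouse:", sequon_positions(MOUSE_LEADER, FIRST_MATURE_INDEX))
print("human:", sequon_positions(HUMAN_LEADER, FIRST_MATURE_INDEX))
```

With these inputs the scan reports a single site at -11 in the mouse leader and none in the human leader.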

In the Gla domain the sequence Gla-X-X-X-Gla-X-Cys, which is thought to be important for the activity of the γ-carboxylase (244), has been conserved among the murine, human, canine and bovine proteins. In the first EGF domain, the consensus sequence Cys-X-Asp/Asn-X-X-X-X-Phe/Tyr-X-Cys-X-Cys (46), containing the Asp or Asn residue in the correct orientation for β-hydroxylation, is present in the human, canine and murine proteins but not the bovine.

Figure 36

-39 Signal Sequence aa+1 D MAEAS GLVTVCLLGY LLSAECAVFL DRENATKILS RPKRYNSGKL H RVNMIMAESP GLITICLLGY LLSAECTVFL DHENANKILN RPKRYNSGKL M HLNTVMAESP ALITIFLLGY LLSTECAVFL DRENATKILT RPKRYNSGKL

Gla Domain EGFl-> 56 D EEFVRGNLER ECIEEKCSFE EAREVFENTE KTTEFWKQYV DGDQCESNPC H EEFVQGNLER ECMEEKCSFE EAREVFENTE RTTEFWKQYV DGDQCESNPC M EEFVRGNLER ECIEERCSFE EAREVFENTE KTTEFWKQYV DGDQCESNPC EGF2-> 106 D LNDGVCKDDI NSYECWCRAG FEGKNCELDV TCNIKNGRCK QFCKLGPDNK H LNGGSCKDDI NSYECWCPFG FEGKNCELDV TCNIKNGRCE QFCKNSADNK M LNGGICKDDI SSYECWCQVG FEGRNCELDA TCNIKNGRCK QFWKNSPDNK $ Activation Peptide Domain-> 156 D WCSCTTGYQ LAEDQRSCEP AVPFPCGRVS VPHISMTRTR AETLFSNMDY P SHSPTTLTR AEIIFSNMDY R GVS VSHASKKITR ATTIFSNTEY S RAS VLHTSKKLTR AETIFSNMNY G RVS IPSVSKEHNR ANAIFSRMGY H WCSCTEGYR LAENQKSCEP AVPFPCGRVS VSQTSK.LTR AETVFPDVDY M VICSCTEGYQ LAEDQKSCEP TVPFPCGRAS ISYSSKKITR AETVFSNMDY

206 D ENSTEV...... EKIL DNVTQPL... NDFTRWGGK DAKPGQFPWQ P ENSTEV...... EPIL DSLTESNQSS DDFIRIVGGE NAKPGQFPWQ R ENFTEA...... ETIR GNVTQRSQSS DDFTRIVGGE NAKPGQFPWQ S ENSSEA___ ..... EIIW DNVTQSNQSFDDFNRWGGE DAARGQFPWQ G VNFTDDETIWDDNDDDETIW DNSTESTKPS DEFFRWGGE DAKPGQFPWQ H VNSTEA...... ETILDNITQSTQSFNDFTRWGGE DAKPGQFPWQ M ENSTEAVFIQ DDITD.GAIL NNVTESSESLNDFTRWGGE NAKPGQIPWQ

Catalytic Domain-> 256 DVLLNGKVDAFCGGSIINEKW WTAAHCIEPDVKITIVAGE HNTEKREHTE PVLLNGKIDAF CGGSIINEKW WTAAHCIEPGVKITWAGE YNTEETEPTE R VLLNGKVEAFCGGSIINEKW WTAAHCIKP DDNITWAGE YNIQETENTE S VLLHGEIAAFCGGSIVNEKWWTAAHCIKPGVKITWAGE HNTEKPEPTE G VLLNGETEAFCGGSIVNEKW IVTAAHCILPGIKIEWAGK HNIEKKEDTE H WLNGKVDAFCGGSIVNEKW IVTAAHCVETGVKITWAGE HNIEETEHTE M VILNGEIEAFCGGAIINEKWIVTAAHCLKPGDKIEWAGE YNIDKKEDTE

306 DQKRNVIRTILHHSYNATINKYNHDIALLELDEPLTLNSYV TPICIADREY P QRRNVIRAIPHHSYNATVNK YSHDIALLELDEPLTLNSYV TPICIADKEY RQKRNVIRIIPYHKYNATINKYNHDIALLELDKPLTLNSYV TPICIANREY S QKRNVIRAIPYHGYNASINKYSHDIALLELDEPLELNSYV TPICIADREY G QRRNVTQIILHHSYNASFNKYSHDIALLELDKPLSLNSYV TPICIANREY HQKRNVIRIIPHHNYNAAINKYNHDIALLELDEPLVLNSYV TPICIADKEY M QRRNVIRTIP HHQYNATINKYSHDIALLEL DKPLILNSYV TPICVANREY

356
D SNIFLKFGSG YVSGWGRVFN KGRSASILQY LKVPLVDRAT CLRSTKFTIY
P TNIFLKFGSG YVSGWGRVFN RGRSATILQY LKVPLVDRAT CLRSTKVTIY
R TNIFLNFGSG YVSGWGRVFN RGRQASILQY LRVPFVDRAT CLRSTKFTIY
S TNIFLKFGYG YVSGWGRVFN RGRSASILQY LKVPLVDRAT CLRSTKFTIY
G TNIFLKFGAG YVSGWGKLFS QGRTASILQY LRVPLVDRAT CLRSTKFTIY
H TNIFLKFGSG YVSGWGRVFH KGRSALVLQY LRVPLVDRAT CLRSTKFTIY
M TNIFLKFGSG YVSGWGKVFN KGRNASILQY LRVPLVDRAT CLRSTTFTIY
$ $

406
D NNMFCAGFHE GGKDSCQGDS GGPHVTEVEG ISFLTGIISW GEECAMKGKY
P SNMFCAGFHE GGKDSCLGDS GGPHVTEVEG TSFLTGIISW GEECAVKGKY
R NNMFCAGFDV GGKDSCEGDS GGPHVTEVEG TSFLTGIISW GEECAIKGKY
S NHMFCAGYHE GGKDSCQGDS GGPHVTEVEG TSFLTGIISW GEECAMKGKY
G NNMFCAGFHE GGRDSCQGDS GGPHVTEVEG TNFLTGIISW GEECAMKGKY
H NNMFCAGFHE GGRDSCQGDS GGPHVTEVEG TSFLTGIISW GEECAMKGKY
M NNMFCAGYRE GGKDSCEGDS GGPHVTEVEG TSFLTGIISW GEECAMKGKY

426
D GIYTKVSRYV NWIKEKTKLTEnd
P GIYTKVSRYV NW
R GVYTRVSWYV NW
S GIYTKVSRYE V
G GIYTKVSRYV NW
H GIYTKVSRYV NWIKEKTKLTEnd
M GIYTKVSRYV NWIKEKTKLTEnd

Amino Acid Sequence of the Factor IX Protein

Comparison of the amino acid sequences of the dog (D), human (H), mouse (M), pig (P), rabbit (R), sheep (S) and guinea pig (G) factor IX proteins. The activation cleavage sites are underlined; the triad of catalytic residues is in bold. (For the single letter amino acid code see appendix I.)

($ Indicates sites where the predicted amino acid sequence of Wu et al differs from my data. The first of these sites, at amino acid 99, is a W only in my data. Since this residue is conserved as a C in the other species as well as the other publications, it is probably the result of a sequencing error.)

The activation peptide domain of the mouse factor IX protein contains two inserts compared to the human sequence. The first insert is a single Lys residue, while the second is a block of nine amino acids. Sarkar et al (232) compared the sequences of the activation peptides and catalytic domains of human, sheep, pig, rabbit, guinea pig, rat and mouse factor IX and found that the single Lys residue has been lost from the human sequence, while the 9 amino acid block has been gained in rodents. They found that 42% of the bases of the activation peptide were invariant in all seven species, compared to 67% of the bases of the catalytic domain. Although the activation peptide is the least conserved of the protein domains, the activation cleavage sites are well conserved; the first site is identical in all seven species, and the second has an isoleucine in place of the valine in the pig and rabbit. The lower degree of conservation may be consistent with the role of the activation peptide as a "protein spacer".

As expected, the catalytic domain of the mouse protein contains the triad of His, Asp and Ser, and an acidic amino acid at the bottom of the substrate binding pocket, characteristic of serine proteases. In the mouse protein these are at residues 231, 279, 375 and 369 respectively. The amino acid residues 373 through to the end of the protein at residue 425 are also totally conserved between the human and mouse proteins. The first part of this region forms part of a loop on the exterior of the factor IX molecule which may possibly be involved in binding to factor VIII. The carboxy-terminus of the protein is also known to form an α-helix which is conserved in many of the serine proteases (S. Pemberton, personal communication).

6.2.6 3' end of the Mouse Factor IX gene

Translation is terminated in the mouse, as in the human, by a double stop codon of UAAUGA in the mRNA. There is a 3' untranslated region of 1320 nucleotides containing the polyadenylation signal AATAAA 16 base pairs upstream from where the polyA tail is added (94). The sequence CATTG, which is present in the human gene 23 nucleotides downstream from the polyadenylation signal and is thought to be involved in cleavage of the 3' end of the mRNA, is replaced in the mouse by CATTT, which does not conform as closely to the CAXUG consensus sequence of Berget (95) (X = any residue). The mouse 3' untranslated region contains no potential hairpin loops or Z DNA, but there is a (GA)16 repeat sequence that has the potential to form H DNA. This is a structure consisting of single and triple stranded DNA (245). Such sequences are common in the 5' regions of many eukaryotic genes (246-248) and have been shown to be capable of binding protein and ribonuclear protein (249). It is thought that they may contribute to transcriptional regulation in these genes. H DNA is also common in sites involved in recombination (250) and may play a similar role to Z DNA in unwinding the DNA helix.
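A tandem repeat such as (GA)16 can be located with a simple pattern search. The Python sketch below is illustrative: the fragment is transcribed in blocks of ten from the 3' untranslated sequence shown in the sequence figure around the repeat, and the function name is hypothetical.

```python
import re

# 70 nucleotides of 3' untranslated sequence spanning the repeat,
# written in blocks of ten for ease of checking against the figure.
UTR_FRAGMENT = (
    "ACCCGGTCTT" "GAAAGAAAAG" "AGAGAGTGGG"
    "GAGAGAGAGA" "GAGAGAGAGA" "GAGAGAGAGA"
    "GAGGAGAAAG"
)


def longest_ga_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted (GA)n run."""
    runs = re.findall(r"(?:GA)+", seq)
    return max((len(run) // 2 for run in runs), default=0)


print("longest (GA)n run:", longest_ga_run(UTR_FRAGMENT), "units")
```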

6.2.7 Attempts to Isolate the Uncloned Regions of the Mouse Factor IX Gene.

The cVII probe consists largely of sequences corresponding to exon 8 of the factor IX gene and lacks the first 41 nucleotides of the mRNA, which correspond to the first half of exon 1. Bacteriophage corresponding to the 3' end of the locus, or those containing multiple exons, would therefore hybridize much more strongly to this probe than bacteriophage containing only one of the first seven exons, especially exon 1. This may explain the failure of this probe to identify clones containing exon 1, and exon 6 and its 3' flanking regions. Interestingly Yoshitake et al (79) were also unable to isolate the 5' region of the human factor IX gene from their original bacteriophage library by probing with a cDNA, and eventually isolated it from a library derived from a 5X cell line.

In order to overcome the bias of the cVII probe for the 3' region of the factor IX gene, more specific probes were generated for the two regions of the locus that had not been cloned. Attempts to use probes containing intron sequences, such as the 5' region of bacteriophage 299, generated problems of cross-hybridization, which are thought to be caused by the presence of highly repetitive sequences. Several such sequences exist in the human factor IX gene, including 3 highly repetitive Alu sequences in intron F (i.e. between exons 6 and 7). It is possible that the murine equivalent of the Alu repeat, the B2 repeat sequence (251), is present in the sixth intron of the mouse gene. Partial sequence data (approximately 500 bases) for the 5' BamH 1/Sal 1 fragment of bacteriophage 299 showed no significant sequence identity with the B2 repeat sequence, but since the fragment is 2kb in length, the possibility of it containing a B2 repeat or other repetitive sequences cannot be ruled out. The presence of repetitive sequences would also contribute to the failure of PCR amplification of this region.

The 500bp Hind 3 fragment of #326, which was used as a probe for exon 6 sequences, contains 200bp of exon sequence. This fragment consistently hybridizes to several bands on genomic Southern blots (see figure 33) and to a subpopulation of bacteriophage in the library. It was thought that this too may contain a repetitive sequence in its non-coding region. When a completely exonic fragment, generated by cloning the product of amplification of #326, was used to probe a Southern blot, the same bands were evident (data not shown).

If the variable bands in figure 33 are presumed to be the result of hybridization to the mouse factor IX gene, and providing there are no BamH 1 or Hind 3 sites within the missing region, these digests would suggest that this region could be as small as 1kb, as might be expected from the comparison with the human gene map. The 12kb band in the BamH 1 digest represents approximately 9kb in bacteriophage 298 and 1.8kb in bacteriophage 299, leaving a gap of approximately 1kb. The 5kb Hind 3 band represents 500bp of bacteriophage 298 and 4kb of 299, and therefore a gap of approximately 500bp. The 10kb band and the two different bands at approximately 20kb might then be explained as the product of a partial digest, as hybridization to other sequences in the genome, or as the result of either a repetitive sequence or contamination from an unexplained source.
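The gap estimates above are simple subtractions of the cloned portions from the genomic band sizes. A short Python sketch of the arithmetic, working in base pairs to avoid rounding artefacts (the function name is hypothetical):

```python
def uncloned_gap_bp(band_bp: int, cloned_segments_bp: list) -> int:
    """Genomic band size minus the parts already accounted for by clones."""
    return band_bp - sum(cloned_segments_bp)


# BamH 1: 12kb band; about 9kb lies in bacteriophage 298 and 1.8kb in 299.
bamh1_gap = uncloned_gap_bp(12_000, [9_000, 1_800])

# Hind 3: 5kb band; 500bp lies in bacteriophage 298 and 4kb in 299.
hind3_gap = uncloned_gap_bp(5_000, [500, 4_000])

print(f"BamH 1 digest: gap of about {bamh1_gap} bp")
print(f"Hind 3 digest: gap of about {hind3_gap} bp")
```

The two digests therefore give estimates of roughly 1.2kb and 0.5kb, both subject to the accuracy of the band sizes read from the blot.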

The amplified exon 6 probe also identified at least 20 strongly positive bacteriophage from the library, none of which also contained exon 7 sequences. Considering the minimum amount of DNA between the two exons (16kb), the probability of finding both exons on one bacteriophage is not high. The numbers of hybridizing plaques and the strength of hybridization, however, suggest that this fragment may be hybridizing to other homologous sequences within the mouse genome, such as pseudogenes, although none have so far been identified. Alternatively there may be contaminating sequences either in the genomic DNA preparation or in the probe. Since the exon-specific probe was generated from amplification of the subclone, it might be expected to contain the same contaminating sequences. It is interesting that the 10kb band is the same size whether the DNA is digested with BamH 1, Hind 3, or Sal 1 (data not shown).

When used as part of a mixed probe, an exon 7-specific probe generated from amplification of #304 failed to identify any positive bacteriophage in the same library as originally screened with cVII. Any weakly hybridizing plaques, however, may have been overpowered in the initial screen by the second probe (to exon 6), which consistently produces strong positive signals in a subpopulation of bacteriophage. Also, since the titre of the bacteriophage library had fallen, only 0.5 million bacteriophage were screened, compared to the 4 million initially screened with cVII. Applying the Clarke and Carbon formulae, the probability of finding a bacteriophage containing exon 7 sequence by screening 0.5 million is 0.929, considerably lower than for previous screens. It is possible that a screen of a higher number of bacteriophage, using the exon 7 probe alone, may identify exon 7-containing clones. Due to the large intron region 5' of exon 7, these clones may not, however, contain the uncloned region. In order to isolate this region from a library, a specific probe which does not contain any cross-hybridizing sequences is required.
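The Clarke and Carbon relationship, N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to genome size, can be rearranged to give the probability P of recovering a given sequence from N clones. The Python sketch below is illustrative only: the insert size (18kb, the lower end of the bacteriophage sizes reported in section 6.2.1) and the genome size (3 x 10^9 bp) are assumptions chosen for this example, so the output is not expected to reproduce the value of 0.929 quoted above, which depends on the parameters actually used.

```python
import math


def probability_of_clone(n_clones: float, insert_bp: float, genome_bp: float) -> float:
    """P = 1 - (1 - f)^N, with f = insert size / genome size."""
    f = insert_bp / genome_bp
    return 1.0 - math.exp(n_clones * math.log1p(-f))


def clones_required(p: float, insert_bp: float, genome_bp: float) -> int:
    """N = ln(1 - P) / ln(1 - f), rounded up to a whole number of clones."""
    f = insert_bp / genome_bp
    return math.ceil(math.log1p(-p) / math.log1p(-f))


# Assumed parameters (not taken from the original calculation).
INSERT_BP = 18_000
GENOME_BP = 3e9

p_half_million = probability_of_clone(0.5e6, INSERT_BP, GENOME_BP)
p_four_million = probability_of_clone(4e6, INSERT_BP, GENOME_BP)

print(f"0.5 million clones: P = {p_half_million:.3f}")
print(f"4 million clones:   P = {p_four_million:.6f}")
print("clones needed for P = 0.99:", clones_required(0.99, INSERT_BP, GENOME_BP))
```

Whatever the exact parameters, the probability falls as fewer clones are screened, which is the point made in the text.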

Despite repeated library screens and attempts at amplification of genomic DNA under a variety of conditions, I have still not isolated parts of the non-coding regions between exons 6 and 7 and between exons 1 and 2. It is possible that these introns could contain unstable, or perhaps highly methylated, sequences, which would affect their representation in the bacteriophage library, and possibly prevent PCR.

CHAPTER 7

RESULTS AND DISCUSSION

TARGETING VECTOR CONSTRUCTION

AND ELECTROPORATION EXPERIMENTS

7. TARGETING VECTOR CONSTRUCTION AND ELECTROPORATION EXPERIMENTS

7.1 RESULTS

7.1.1 Targeting vector

For the factor IX-targeting experiments it was decided to construct a replacement vector which would enable positive and negative selection of homologous recombinants (191). In order to do this it was necessary to have a clone containing coding sequence including a convenient restriction endonuclease site for the insertion of a neomycin phosphotransferase cassette. As an extra level of enrichment for targeted clones it was decided to use the vector pMC1Neo (figure 37). This vector contains the promoter from the viral thymidine kinase gene (to enable expression of the selectable marker even where the disrupted gene is not normally expressed) and coding sequences for the neomycin phosphotransferase gene (to provide resistance to the neomycin analogue G418), but no polyA addition signal. This construct will allow expression of the selectable marker only where it is inserted upstream of a chromosomal polyA addition signal. Under G418 selection this should lead to the elimination of transfected clones that have inserted the construct in intergenic regions. The vector would also have to contain a site outside the region of homology for insertion of the HSV-tk cassette (to provide negative selection against random integration events), and a rare restriction endonuclease site which could be used for linearization purposes.
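The logic of the combined selection scheme described above can be summarised as a small truth table. The Python sketch below is a deliberately simplified model: it assumes that the neomycin phosphotransferase gene is expressed only when a chromosomal polyA signal lies downstream, and that any cell retaining the HSV-tk cassette is killed under negative selection. The class, field and function names are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    """Simplified description of one integration event."""
    label: str
    neo_has_downstream_polya: bool  # Neo lies upstream of a chromosomal polyA signal
    retains_hsv_tk: bool            # HSV-tk cassette is still present in the genome


def survives_selection(event: Integration) -> bool:
    """True if the clone survives both positive and negative selection."""
    g418_resistant = event.neo_has_downstream_polya  # positive selection
    escapes_negative = not event.retains_hsv_tk      # negative selection
    return g418_resistant and escapes_negative


EVENTS = [
    Integration("homologous recombination at the target gene", True, False),
    Integration("random integration in an intergenic region", False, True),
    Integration("random integration within another gene", True, True),
]

for event in EVENTS:
    outcome = "survives" if survives_selection(event) else "eliminated"
    print(f"{event.label}: {outcome}")
```

In this idealised model only the homologous recombinant survives; in practice the enrichment is incomplete, which is why candidate clones still have to be screened.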

The published cDNA sequence (229) showed that exon 6 contains an Xho 1 restriction site close to the 3' end of the exon. Exon 8 also contains a Hinc 2 site at the 3' end of the coding region and a Hind 3 site within the untranslated region. Disruption of such a 3' region of the gene, however, would not affect protein function.

It was decided, therefore, to isolate exon 6 from the bacteriophage clones, and to insert the pMC1Neo cassette at the Xho 1 site as a basis for the construct. The insertion of the Neo cassette close to the end of the clone would also facilitate identification of targeted clones by DNA amplification across the novel junction at the 3' end of the construct, into the genomic sequence (see figure 12).

In order to subclone exon 6, bacteriophage 298 was subjected to Southern blot analysis, digesting with a panel of restriction endonucleases and probing with the oligonucleotide MIX6. The Pst 1 digest produced a hybridizing fragment of 4.8Kb. None of the other digests produced hybridizing fragments that were both of a convenient size for cloning into a vector and large enough to provide sufficient sequence for homologous recombination. The Pst 1 fragment was therefore cloned into the Pst 1 site of pGEM1 by repeating the digest of the bacteriophage, ligating the whole digest into the vector, transforming, and picking the exon-containing clones by probing colony lifts with MIX6. The isolated subclone, number 326 (see figure 28), forms the basis of my targeting vector.

Restriction digest mapping showed that subclone #326 contains 4.8Kb of mouse genomic sequence including the whole of exon 6 (232bp). The Xho 1 restriction cleavage site is 49 base pairs from the 3' end of the exon, and 68 base pairs from the end of the subclone. The whole exonic region can be conveniently excised from the subclone in a 500 base pair Hind 3 fragment.

The neomycin phosphotransferase gene was excised from the pMC1Neo vector (figure 37) with the restriction enzymes Sal 1 and Xho 1, and ligated into phosphatased, Xho 1-cut #326. The vector region of pMC1Neo was also destroyed by digestion with Bgl 1 in order to decrease background ligation events. After transformation, colonies containing the Neo cassette were selected by hybridization to the purified, 32P-labelled 1.1Kb Neo fragment. A clone containing the expected 1.1Kb insert was digested with several restriction endonucleases to confirm that the Neo was in a 5' to 3' orientation relative to the factor IX gene.

For linearization purposes a unique restriction endonuclease site was required in the construct. The only rare cutter endonuclease found not to cut within the exon 6-Neo construct was Not 1. No site for this enzyme is included in the polylinker of the vector. A pair of 18-base oligonucleotides which, when annealed together, would form a Not 1 site were, therefore, synthesized (Not1RK and Not2RK). The oligonucleotides were made with Xba 1 sticky ends, so that they could be inserted into the construct at this site in the polylinker. Only one of the ends was designed to reform an Xba 1 site upon ligation, as this site would be used to insert the HSV-tk gene. 2µg of each of the Not 1 oligonucleotides were kinased separately in the presence of non-radioactive 10mM ATP for 30 min at 37°C, in a total volume of 10µl, to allow subsequent end-ligation. The two kinased oligonucleotides were then mixed together and heated to 65°C for 15 min. 5µl of this mixture (containing 500ng of each oligonucleotide) were then annealed to 100ng of the exon 6-Neo construct at room temperature for 30 min. These DNA concentrations were approximations of the ratios recommended by Sambrook et al (207). The mixture was ligated under the standard conditions as described, at 15°C overnight. Transformed colonies were picked at random and DNA prepared. Restriction analysis revealed the presence of a Not 1 site in several clones. A clone containing the oligonucleotides with the reformed Xba 1 site 3' of the Not 1 site was required so that the linearization site would be outside the selectable markers. (This strategy should reduce the area of the construct in which illegitimate recombination can take place and still result in a phenotype selectable by the positive/negative procedure.) Several of the clones were sequenced before one with the oligonucleotides inserted in the correct orientation was found. This clone was given the number 440 (see figure 38).
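The quantities used in the linker ligation can be converted into an approximate molar ratio, as in the rough sketch below. Several values are assumptions and are not taken from the text: the average masses of 330 g/mol per nucleotide (single-stranded) and 650 g/mol per base pair (double-stranded), and a construct size of about 8.8kb (4.8kb genomic insert plus 1.1kb Neo plus an assumed vector backbone of about 2.9kb). The function names are hypothetical.

```python
def pmol_single_stranded(mass_ng, length_nt, g_per_mol_nt=330.0):
    """Picomoles of a single-stranded oligonucleotide.
    ng divided by g/mol gives nmol; multiply by 1000 for pmol."""
    return mass_ng / (length_nt * g_per_mol_nt) * 1000.0

def pmol_double_stranded(mass_ng, length_bp, g_per_mol_bp=650.0):
    """Picomoles of a double-stranded DNA molecule."""
    return mass_ng / (length_bp * g_per_mol_bp) * 1000.0

# From the text: 500ng of each 18-base oligonucleotide, 100ng of construct.
oligo_pmol = pmol_single_stranded(500.0, 18)

# Assumed construct size (see lead-in); not stated in the text.
ASSUMED_CONSTRUCT_BP = 8_800
vector_pmol = pmol_double_stranded(100.0, ASSUMED_CONSTRUCT_BP)

# Each pair of annealed oligonucleotides forms one linker molecule.
molar_excess = oligo_pmol / vector_pmol
print(f"linker: {oligo_pmol:.1f} pmol, construct: {vector_pmol:.4f} pmol")
print(f"approximate molar excess of linker over construct: {molar_excess:.0f}-fold")
```

Under these assumptions the linker is present in a very large molar excess over the construct, which is the usual aim when ligating short synthetic linkers into a plasmid.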

I initially attempted to insert the HSV-tk cassette (figure 37) into this construct by excising it from the vector with endonucleases Xho 1 and Sal 1. Oligonucleotides were then synthesized which, when annealed together, would contain an Xba 1 restriction site and have Xho 1 "sticky ends". These overhanging ends anneal to both sides of the purified tk fragment, thus flanking this fragment with Xba 1 restriction sites. Ligations were performed and the product was digested with Xba 1, so that it could be inserted into the Xba 1 site of the construct. This approach, however, was not successful due to the difficulty of restricting a site near to the end of a strand of DNA (New England Biolabs catalogue, 1990-1991). The HSV-tk gene was eventually inserted into the construct by sequentially cloning it into two other vectors, picking up a single Xba 1 site each time (figure 39). The HSV-tk gene was then excised by cutting with Xba 1 and cloned into the Xba 1 site of #440. The completed construct was given the number 444.

Figure 37. Diagram of the pMC1Neo (184) and pMC1tk (191) Vectors.

[Plasmid maps. Labelled features: pMC1Neo, with TK promoter, Neo, Amp and Ori; pMC1-TK (4550 bp), with polyoma virus enhancer and HSV-TK.]

Figure 38. Diagram of Targeting Constructs #440 and #444 (TKfIX, 10.8 kb).

#440 contains 4.8kb of sequence homologous to the mouse factor IX gene, interrupted by the Neo gene, in the vector pGEM1. Any cell containing this construct will be selected for by growth in G418. #444 is the same as #440, except that the HSV-tk gene has been added outside of the factor IX sequences. This construct will allow selection for transfected cells and against those that have integrated the vector at random, as described in section 3.5.1.

Figure 39. Construction of Targeting Vector

Outline of the strategy used to complete the construction of a vector to allow positive/negative selection of homologous recombinants, by inserting the HSV-tk gene into #440. The gel-purified Hind 3/Xho 1 tk fragment was first cloned into the vector pGEM3. The BamH 1/Hind 3 fragment of the resultant vector was then isolated and cloned into pGEM7ZF+. The Xba 1 fragment of this cloning event was then isolated and cloned into the Xba 1 site of #440.

7.1.2 Determination of Optimal Conditions for ES Cell Growth and Electroporation

Before gene targeting experiments could be performed it was necessary to establish optimal conditions for the routine growth and manipulation of Embryonic Stem cells. The potential ability of the cultured cells to contribute to a chimaera was assessed, and optimal electroporation conditions were determined. Finally gene targeting experiments with two targeting vectors were performed.

To avoid differentiation it was found to be necessary to culture the cells at high densities, feeding every day and splitting, at a dilution of 1/5 to 1/10, every two days. Initially, stem cells were grown in 10% BRL-conditioned media supplemented with commercial LIF at 10^3 U/ml (Amrad Corp. Ltd., Melbourne, Australia). By increasing the BRL-conditioned media to 60%, however, it was found to be unnecessary to supplement with LIF. Unless stated otherwise, in all the following experiments the stem cells were grown in 60% BRL-conditioned media without LIF. Batch testing of fetal calf serum (FCS) samples was carried out, and the heat inactivated, selected batch was used at a final concentration of 20% in BRL-conditioned CMβ (166).

Batch Testing Of Fetal Calf Serum

Buffalo Rat Liver (BRL) cells and the Embryonic Stem Cell lines, CCE and CPI, were obtained from Dr W. Colledge (University of Cambridge). The ES cells were grown in CMβ medium containing 10% BRL conditioned media and commercial LIF at 10^3 U/ml. Batch testing of FCS was carried out by seeding approximately 2000 cells in gelatin-coated microtitre plates, in a volume of 800µl CMβ + LIF, without FCS. 200µl of the different samples of FCS were then added to the wells and the appearance of the colonies noted over several days. Batch number 0830023 from Imperial Laboratories Ltd sustained the most proliferation and the most undifferentiated phenotype. This batch was ordered in bulk and used throughout this project.

Assessing Potency of Stem Cell Lines

In order for the stem cells to contribute to the germline of a chimaeric mouse they must be totipotent. This ability to differentiate can be assessed by the cells' tendency to form Embryoid bodies (149) and by their karyotypic status.

Embryoid Bodies

In order to allow Embryoid Body formation all the ES cell lines were grown in roller bottle flasks which were not gelatin-coated, in CMβ media without LIF or BRL.

After 3-4 days "balls" of cells had formed which consisted of an outer layer of endoderm cells surrounding undifferentiated cells. A layer of Reichert's membrane could also be seen underneath the endodermal cells. After 4 days of roller-bottle culture these simple embryoid bodies were removed to a tissue culture dish in 10ml media for continued growth. The media was replaced daily for 8 days. At this point the majority of the embryoid bodies had greatly increased in size and had developed a fluid filled cavity. These cystic Embryoid Bodies (see figure 40) were cultured for a further two weeks under the same conditions. Complex areas of several differentiated cell types, including areas of differentiated dark tissue, which may have been blood islands, were observed inside the embryoid bodies, but no beating muscle was ever observed.

Differentiation

In a separate experiment simple embryoid bodies were plated onto a gelatin-coated culture dish and allowed to settle. Continued growth in the absence of BRL-conditioned media resulted in the differentiation of the stem cells into a number of different cell types (see figure 41).

Karyotype Analysis

CCE and CPI cells were sent to the Sir William Dunn School of Pathology for karyotype analysis. Both cell lines were found to contain translocations of chromosomes 6 and 11, resulting in effective trisomy for most of chromosome 11. It was, therefore, unlikely that the CCE and CPI cells would be suitable for derivation of a germline chimaera because of these gross abnormalities.

E14 cells, which had been shown to be capable of contributing to the mouse germline, were obtained from Dr Martin Hooper, University of Edinburgh. Cells from this line have been karyotyped at passage 12 and 25 (167) and found to have normal G-banding patterns, and a modal chromosome number of 40. (Between passages 12 and 25 the number of metaphase spreads with the modal number of chromosomes had dropped from 16 to 13 out of 20.) These cells were grown in 60% BRL-conditioned CMβ media as described in Smith and Hooper (166), except that FCS was used at a final concentration of 20%.


Figure 40 Cystic Embryoid Body

The embryoid body consists of an outer layer of endoderm cells surrounding other differentiated cell types and a fluid filled cavity.

Figure 41 E14 Embryonic Stem Cells

a) Normal E14 cells - note the compact colonies of small rounded cells with large nuclei.
b) - d) Differentiated cells including: (b) fibroblast-like cells; (c) elongated cells with pointed outgrowths, possibly neuronal cells; (d) large multinucleate cells, possibly trophectodermal cells.


In order to achieve maximum transfection and targeting efficiencies, the electroporation and selection conditions for the ES cells were investigated.

Selection with G418

Toxicity tests for the effects of the neomycin analogue G418 on the stem cells were performed.

Experiments were carried out to determine the concentration of G418 necessary to kill all the non-transfected cells in about 5-6 days, while not damaging the Neo-containing colonies. Approximately 1x10^6 stem cells were seeded into each of six 25cm² flasks, containing 10ml 60% BRL-conditioned CMβ. A stock solution of Geneticin at 100mg/ml (active ingredient: 583µg G418/mg Geneticin) was made by dissolving the powder in sterile water by warming to 37°C. This solution was sterilized by filtering through a 0.2µm filter and stored at -20°C. The following day the media on the cells was replaced with similar media containing Geneticin at final concentrations of 0.25, 0.5, 0.75, 1, 1.25 and 1.5 mg/ml. The media was replaced daily, maintaining the same concentration of Geneticin in each flask. The condition of the cell cultures was observed for a further 6 days. By the 6th day of selection the cells in all concentrations of Geneticin apart from 0.25mg/ml had died (see figure 42). The cells at 0.5mg/ml Geneticin were all dead by the 5th day of selection. It was, therefore, decided to select for cells successfully transfected with the Neo gene with Geneticin at a final concentration of 400µg/ml of media. This is equivalent to 233µg G418/ml media. This G418-selection regime forms part of the positive/negative selection procedure which I propose to use to select for targeted stem cells.
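The conversion between Geneticin and active G418 quoted above is simple arithmetic, sketched here for convenience. The potency (583µg G418 per mg Geneticin), stock concentration (100mg/ml) and working concentration (400µg/ml) are taken from the text; the function names and the 10ml example volume for the stock dilution are illustrative assumptions.

```python
POTENCY_UG_PER_MG = 583.0   # active G418 per mg of Geneticin powder (from the text)
STOCK_MG_PER_ML = 100.0     # Geneticin stock solution (from the text)

def active_g418_ug_per_ml(geneticin_ug_per_ml, potency=POTENCY_UG_PER_MG):
    """Active G418 concentration for a given Geneticin concentration."""
    return geneticin_ug_per_ml * potency / 1000.0

def stock_volume_ul(final_ug_per_ml, medium_ml, stock_mg_per_ml=STOCK_MG_PER_ML):
    """Microlitres of stock needed; 1 mg/ml of stock equals 1 µg/µl."""
    return final_ug_per_ml * medium_ml / stock_mg_per_ml

working = 400.0  # µg Geneticin per ml of medium
print(f"{working:.0f} µg/ml Geneticin = {active_g418_ug_per_ml(working):.1f} µg/ml active G418")
print(f"stock required for 10 ml of medium: {stock_volume_ul(working, 10.0):.0f} µl")
```

Because potency varies between batches of Geneticin, expressing the dose as active G418 makes results comparable between batches.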

Figure 42 Results of G418 Toxicity Tests

Plot of % cells alive against days of selection, for Geneticin concentrations of 0.25, 0.5, 0.75, 1.0 and 1.25 mg/ml. Geneticin was added to the media for the first time on day 0.

Electroporation conditions

In order to establish optimal electroporation conditions and transfection efficiencies stem cells were transfected with 60µg pMC1NeoPolyA DNA. Transfected colonies were then selected with Geneticin. Electroporation was performed in a BioRad Gene Pulser apparatus as described, under varying conditions. The DNA was linearized before electroporation by digestion with restriction endonuclease Hind 3. For each transfection 1x10^7 cells were electroporated in 1ml complete PBS.

The capacitance was varied between 3, 25 and 500µFd, while the potential difference was varied between 100, 200 and 400 Volts. All of the cells from each transfection were plated onto an 11cm petri dish, and Geneticin was added to a final concentration of 400µg/ml, 48h after the transfection. Selection was continued for approximately 10 days. The different cell lines generated very similar results, and the table below incorporates results from electroporations with E14, CCE and CPI cells. Electroporation conditions of 500µFd and 200V consistently produced the most G418 resistant colonies, with a transfection efficiency of approximately 1 in 5.66x10^5 cells, per 60µg DNA.

Table 5 Transfection Results Under Different Electroporation Conditions

Cells transfected   Capacitance (µFd)   Volts   G418r colonies
2x10^7              3                   400     2
2x10^7              25                  200     0
3x10^7              25                  400     42
1x10^7              500                 100     0
3x10^7              500                 200     53
3x10^7              500                 400     41
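The transfection efficiencies implied by Table 5 can be recomputed as in the short sketch below, which expresses each result as the number of cells electroporated per G418r colony and picks the most efficient condition. The data are those of Table 5; the function and variable names are illustrative only.

```python
# (cells electroporated, capacitance in µFd, volts, G418r colonies), from Table 5
TABLE_5 = [
    (2e7, 3, 400, 2),
    (2e7, 25, 200, 0),
    (3e7, 25, 400, 42),
    (1e7, 500, 100, 0),
    (3e7, 500, 200, 53),
    (3e7, 500, 400, 41),
]

def cells_per_colony(cells, colonies):
    """Cells electroporated per G418r colony; None when no colonies grew."""
    return cells / colonies if colonies > 0 else None

def best_condition(rows):
    """Return (cells per colony, capacitance, volts) for the most efficient row."""
    scored = [(cells_per_colony(c, n), cap, volts) for c, cap, volts, n in rows]
    scored = [s for s in scored if s[0] is not None]
    return min(scored)  # fewest cells needed per colony = highest efficiency

for cells, cap, volts, colonies in TABLE_5:
    eff = cells_per_colony(cells, colonies)
    label = f"1 in {eff:.3g}" if eff is not None else "no colonies"
    print(f"{cap:>3} µFd, {volts} V: {label}")

best = best_condition(TABLE_5)
print("most efficient:", best[1], "µFd,", best[2], "V")
```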

Selection in ganciclovir

In order to allow the use of positive and negative selection the final construct (#444) includes the HSV-tk gene as well as the Neo cassette. This renders cells sensitive to the antiviral drug, ganciclovir, and as explained in section 3.5.1, will provide selection against cells that have incorporated the construct by random integration. Under the positive-negative selection regime ganciclovir was added to the medium after transfection with #444 at a final concentration of 2µM, as recommended by Mansour et al (191).

7.1.3 Gene Targeting Experiments

For the gene targeting experiments ES cells were transfected with constructs containing either the Neo cassette alone (#356, #440) or both the Neo and HSV-tk cassettes (#444). These targeting constructs contain the Neo gene without its polyadenylation signal, which should provide enrichment for those cells that have incorporated the construct upstream from an endogenous signal. In order to test the ability of this cassette to provide selection in G418, and to assess the transfection efficiency, cells were also transfected with pMC1Neo and pMC1NeopolyA vectors. The total numbers of transfected and G418-resistant cells for all the electroporated constructs are shown in table 6.

Table 6 Transfection Results

Vector         No of cells electroporated   No of G418r colonies   Transfection efficiency
pMC1NeopolyA   1x10^8                       500                    1/2x10^5
pMC1Neo        7x10^7                       56                     1/1.25x10^6
#356 *         9x10^7                       164                    1/5.5x10^5
#440           1.2x10^8                     0                      /
#444           1.5x10^8                     59 (0 Gcr)             1/2.5x10^6

* #356 was an earlier preparation of #440. It was found to contain an extra Not 1 restriction endonuclease site, thought to be due to a multiple ligation event during the cloning of the Not 1 oligonucleotides, making it unsuitable for further development of the positive/negative selection construct. This site should not influence the vector's transfection efficiency, selectability in G418, or its ability to target the mouse factor IX locus.

In the experiments quoted in table 6 the colonies transfected with construct number 444 were first selected for 10 days in G418, then ganciclovir was added for a further 10 days. Although 59 colonies were found to be resistant to G418, none of these survived selection in ganciclovir.

Two other transfections were performed, which are not included in table 6. These cells showed very poor plating efficiencies after electroporation. The plates were not confluent by 48h after the electroporation and selection consequently yielded very few colonies. Out of a total of 3x10^7 cells electroporated with pMC1NeopolyA under the standard conditions of 500µFd and 200V, only 2 colonies were resistant to G418. A total of 11x10^7 cells were transfected with construct number 444 and doubly selected under G418 and ganciclovir. No resistant colonies were observed.

Southern Blot Analyses

Since there was a possibility that targeting construct #356 would produce homologous recombinants, E14 colonies which had been transfected with this vector, and which were G418-resistant, were cloned and subsequently screened for targeting events by Southern blot analysis.

Cloning was performed by placing a perspex cloning ring, with its base lightly dipped in sterile petroleum jelly, around the G418r colony. The colony was washed twice with drops of PBSA then trypsinized by placing a drop of versene trypsin inside the ring. The colony was then removed to a microtitre plate by sucking it up and down a capillary tube. 25 colonies were successfully cloned. When sufficient cells had been cultured, aliquots were frozen until they were required for Southern blot analysis.

Although aliquots of 8 transfected cell lines were initially thawed, only 6 survived. One 75cm² flask (approximately 4x10^7 cells) of each of the 6 G418r, #356-transfected cell lines was used for genomic DNA extraction as described for blood cells, except that the initial step of lysis in a sucrose solution was omitted. DNA from a clone which had been transfected with the pMC1Neo vector, and from non-transfected cells, was also prepared. An average of 1mg of genomic DNA was obtained for each cell line. 10µg of each DNA was digested with the restriction endonuclease Hind 3 and run on a 1.2% agarose gel at 37V for 24h. The gel was blotted by the method of Southern as described, and the filter was screened with a probe derived by PCR of exon 6 of the mouse factor IX gene (figure 43a). The same blot was later stripped in 50mM NaOH and rehybridized, first to the gel-purified 1kb Neo fragment of pMC1Neo (figure 43b), and then to pGEM1 (figure 43c), under the same conditions as described for the exon 6 probe.

Figure 43. Southern Blot Analysis of Selected ES Clones

Southern blot of genomic DNA from seven G418r, transfected E14 clones, and a non-transfected control, probed with a) 291 bp exon6-specific probe, b) 1kb Neo cassette from pMC1NeopA, c) pGEM1.

For a) and b) the gel-purified fragment was 32P-labelled by DNA synthesis from a random primer and hybridized overnight to the blot in genomic hybridization buffer at 65°C. The blot was then washed twice for 20min at 65°C in 3xSSC, 1% SDS. For c) 100ng pGEM1 vector was 32P-labelled and hybridized and washed under the same conditions as described for a) and b).

Lane 8 contains the non-transfected control, lane 7 contains the clone transfected with the pMC1Neo vector only. Lanes 1 to 6 contain the clones transfected with construct #356. Differences in intensities between the lanes correspond to different DNA concentrations.

The exon 6-specific probe hybridized to three bands of DNA in lanes 1 to 6. The 1.6kb band corresponds to the 1.6kb Hind 3 fragment of the vector, containing exon 6 and Neo sequences, and represents vector sequences that have not undergone gene replacement events. As expected these bands are also present when the blot is probed with the Neo fragment, but not in the non-transfected or pMC1Neo-transfected controls. The 5kb band present in all lanes probably represents the genomic DNA fragment, which also appears on figure 33, but not 43b. This band appears to be the same size in all of the transfected cell lines, indicating that they have not undergone a gene replacement event, which would cause an increase in size of 1kb due to the insertion of the Neo gene. The two bands in lane 7 of figure 43b represent random insertion of pMC1Neo. The band at 3kb, which hybridizes to the vector-derived probe, is probably due to contamination with vector sequences. Other bands on figure 43c represent the vector-derived fragments of construct #356. These would also be present in the other cell lines but are outside the area of this photograph.

7.2 DISCUSSION

Two constructs which are suitable for targeting the mouse factor IX gene have been generated. #440/356 contains 4.8kb of sequences homologous to exon 6 of the chromosomal gene, a Not 1 restriction site for linearization, and a Neo cassette to allow selection of transfected colonies. #444 contains the same sequences as #440 and the HSV-tk cassette which allows positive/negative selection of homologous recombinants. Unfortunately the region of the gene which is 3' of the sequences included in the short arm of the construct has not been isolated. Identification of homologous recombinants by PCR is, therefore, not yet possible with my constructs, and must be performed by Southern blot analysis.

7.2.1 Stem Cell Culture

I have successfully cultured Embryonic Stem cells in vitro without differentiation, by the use of BRL-conditioned medium. BRL cells are known to secrete a growth factor, differentiation inhibitory activity (DIA) (158), which inhibits the differentiation of ES cells. This growth factor has been discovered independently by several groups of workers and, because of its very wide-ranging functional effects, has been given several names, such as Leukemia Inhibitory Factor (LIF) (252), DIA (158), Differentiation Inducing Factor (DIF) (253), and Human Interleukin for DA cells (HILDA) (254). Most strikingly, the factor induces differentiation of leukemia cells and inhibits differentiation of ES cells, but does not stimulate proliferation. Sequence comparisons of these factors have confirmed that they are all identical. A wide range of cell types, including fibroblasts, osteoblasts, T lymphocytes and embryonic blastocyst cells, are capable of expressing LIF/DIA under the correct stimuli. It is also constitutively expressed by the metrial gland, developing in the uterine wall at the site of blastocyst implantation, suggesting an important role in early embryonic development. (For a review on LIF see ref 255.) The differentiation inhibitory activities from STO fibroblast feeder layers, BRL-conditioned medium, and bacterial or eukaryotic recombinant LIF are all commonly used in ES cell culture.

The stem cell lines CCE and CPI were found to be unsuitable for targeting experiments after culturing in my hands. Although the CPIs have been shown to be capable of forming germline chimaeras, the change in culture conditions from one laboratory to another, and the high number of passages, may have caused them to undergo chromosomal translocations. These gross genetic abnormalities are thought to prevent the stem cells from contributing to embryonic cell lineages, and particularly from forming viable germ cells. The stem cell line E14 is also known to be capable of contributing to germline chimaeras and has been used in several gene targeting experiments. This cell line was obtained from Dr. Martin Hooper at passage number 14, and passages 15-18 have been used in my electroporation experiments. I have been unable to send this cell line for karyotype analysis, but have shown that it is capable of forming cystic embryoid bodies. The possibility that the E14 cells also carry karyotype abnormalities due to high passage numbers cannot be excluded. A new ES cell line has recently been developed at our institute, and has been shown to contribute to germline chimaeras (Graham Kay, personal communication). It is hoped that this cell line will be available at low passage numbers for future gene targeting of factors IX and VIII.

My attempts at gene targeting have not yet identified any clones with targeted modifications of the mouse factor IX gene. None of the cells electroporated with targeting construct #444 has survived the positive/negative selection procedure. Six of the 25 cloned neomycin resistant E14 cell lines, electroporated with construct #356, have been characterised by Southern blot analysis. These have clearly integrated the targeting construct but have not undergone homologous recombination events. 19 other neomycin resistant clones are still to be characterised. The possibility that a targeted clone is awaiting analysis and identification cannot be eliminated.

7.2.2 Transfection Results

Reported targeting frequencies vary greatly depending on the amount of homology, target locus, and even strain differences between construct DNA and ES cell line. An additional complication when analysing results is that targeting frequencies are often expressed as a proportion of the G418r cells obtained. This can, however, be misleading, as it depends on the expression of the selectable marker, which varies according to the exact cassette employed and is influenced by its flanking sequences. pMC1NeopolyA, for example, is known to be susceptible to position effects (256, 257). Expressing targeting events as a proportion of the total number of cells electroporated (i.e. the absolute frequency) eliminates the effects of marker expression, but depends on transfection efficiency and on cell survival of the transfection procedures.

Transfection Efficiency

Comparing my transfection data to those of Hasty et al (258), who used similar electroporation conditions (25µg DNA, 575V, 500µFd in a BioRad Gene Pulser in PBS), reveals that my transfection efficiency is rather low. With the pMC1NeopolyA vector I obtained a total of 500 G418r colonies from a total of 10x10^7 cells electroporated, i.e. one transfected cell for every 2x10^5 cells electroporated. With targeting constructs based on the pMC1NeopolyA cassette, Hasty obtained an average of approximately 1,500 G418r colonies for every transfection of 10^7 cells, i.e. 1 in 6.6x10^3. Reported electroporation efficiencies, however, vary greatly; Soriano et al (257) reported a transfection efficiency of 10^-6 with pMC1NeopolyA (20µg DNA, 500µFd, 230V) alone, which increases 10-fold when in the context of the c-src targeting construct. McMahon et al (256) also reported an 11-fold increase in transformation efficiency when the pMC1NeopolyA cassette was inserted into the context of the targeting vector. The transfection efficiency reported by Hasty et al may be artificially high due to position dependent enhancement of pMC1NeopolyA expression in his targeting vectors.

Nonetheless it is clear that my transfection efficiency must be improved in order to increase the probability of isolating homologous recombinants. Plating efficiency after my electroporation experiments appeared to be fairly poor, compared to that observed after cells had been split during routine culture, and in some experiments no transfected cells survived electroporation. This poor survival rate may account for the low numbers of G418r colonies observed in my experiments. It is possible that small variations in the contents of the PBS, which is made in bulk in the central media service, may affect cell viability. No improvements were observed, however, when I prepared my own solution. In preliminary experiments an improvement in survival rates was observed when cells were refed 3-4h prior to electroporation. This practice was continued during all the gene targeting experiments. Alternatively the relatively high passage numbers of my cell cultures at electroporation, typically p15-16, may affect cell survival. At the time of my experiments it was not possible to obtain any ES cells at lower passage numbers.

Targeting Efficiencies

Hasty et al obtained 43 HPRT-targeted ES clones from 34,390 G418r colonies, with a vector containing 4.2kb of homology; an absolute frequency of approximately 1.5 targeting events in every 10^7 cells electroporated, and approximately 1 in every 1000 G418r cells. A similar targeting frequency might be expected with my targeting constructs, containing 4.8kb homology. My constructs, however, contain the pMC1Neo cassette without the polyadenylation signal. From table 6 it can be seen that this cassette provides an approximately 10-fold enrichment for targeting events, due to its failure to express the Neor gene where there is no endogenous downstream polyadenylation signal. Extrapolating from the figures of Hasty et al, my construct might be expected to produce one targeting event in every 100 G418r cells. Since I have only isolated 25 G418r colonies from a total of 164, and have so far characterised only 6, the probability of identifying a targeted clone is small. The possibility that a targeted clone is awaiting analysis cannot be ruled out.
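The statement that the probability of identifying a targeted clone is small can be put in rough numerical terms. The sketch below is an illustration only: it assumes that each G418r clone is independently targeted with the same probability (taken here as the 1 in 100 extrapolated in the text), which is a simplification, and the function name is hypothetical.

```python
def p_at_least_one(n_screened, targeting_frequency):
    """Probability that at least one of n_screened G418r clones is targeted,
    assuming each clone is targeted independently with the same frequency."""
    return 1.0 - (1.0 - targeting_frequency) ** n_screened

# Frequency reported by Hasty et al: 43 targeted clones in 34,390 G418r colonies.
hasty_frequency = 43 / 34_390

# Frequency extrapolated in the text for the polyA-less Neo cassette.
assumed_frequency = 1 / 100

print(f"Hasty et al: about 1 in {1 / hasty_frequency:.0f} G418r colonies")
for n in (6, 25, 164):
    p = p_at_least_one(n, assumed_frequency)
    print(f"{n:>3} clones screened: P(at least one targeted) = {p:.2f}")
```

On these assumptions, screening all 164 G418r colonies, rather than the 6 analysed so far, would be needed before a targeted clone became more likely than not.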

Vector Differences

The transfection efficiencies of vectors pMC1Neo, #356, #440 and #444 differ by a factor of 2-4. Despite the presence of the same selectable marker, these small differences are to be expected since the constructs contain different sequences and have been produced in separate plasmid preparations.

Although vector #356 and #444 contain very similar constructs and would be expected to generate approximately the same numbers of G418r colonies, they actually give very different results. When digested with the Not 1 enzyme #356 appears linear; the extra Not 1 site which is known to be present (see table 6) must, therefore, be very close to the first site, and does not excise any part of the construct which might be expected to affect its transfection efficiency. Although plasmid preparation procedures were not changed, and the appearance of the digested DNA on an ethidium bromide-stained agarose gel was normal, it is probable that the lack of G418r cells obtained with #440 is due to poor DNA quality.

Southern Blot Analysis of the G418r clones.

None of the 6 cell lines transfected with the targeting construct show the 1kb shift in the exon6-specific chromosomal Hind 3 band, which should be observed if an accurate gene replacement event had occurred. Recent publications (184, 203, 259) demonstrate, however, that many gene targeting events do not occur by the predicted replacement mechanism and may result in the insertion of the entire vector into the genome (see later). Berinstein et al (260) also show that a non-homologous recombination beyond the short arm of the vector may occur along with the predicted homologous recombination at the end of the long arm. If either of these events had occurred in my G418r clones, digestion of the locus with Hind 3 would still result in a 1.6kb band derived from the vector, but the presence of the "normal" 5kb chromosomal band would not be expected since the cells only contain one factor IX gene.

The 3kb band in figure 43c suggests that there is vector contamination in the genomic DNA samples, probably originating from the stock solutions used in the DNA preparation. The presence of this band in figure 43a is unexplained, since any normal vector contamination in the DNA samples would not be expected to hybridize to the gel-purified, 291bp, exon6-specific probe. If the probe were similarly contaminated with vector sequence (which is unlikely, since it has been gel-purified and would easily separate from vector sequences), it would then be expected to hybridize to the other bands apparent on figure 43c which correspond to random integration of the construct.

7.2.3 Factors Affecting the Frequency of Homologous Recombination

The following is a brief review of some of the most recent publications concerning factors which affect the frequency of gene targeting. Many of these factors would be likely to affect the ability of my targeting constructs to undergo homologous recombination with the mouse factor IX locus and generate a targeted ES cell line.

Length Of Homology

The length of the region of the construct which is homologous to the chromosomal locus greatly influences the frequency of gene targeting. There have been many publications on this subject. Rubnitz and Subramani (261) demonstrated a linear relationship between intramolecular recombination frequency and increasing length of homology from 0.25 to 5kb, and a steep reduction in frequency below 0.25kb. In this system low recombination frequencies could still be observed with as little as 14bp of homology.

Thomas and Capecchi (179) observed an exponential relationship between length of homology and targeting frequency for insertion and replacement vectors, when homology was increased from 4 to 9kb. Hasty et al (258) also investigated the effects of changes in the length of homology in insertion and replacement vectors and observed results similar to those of Thomas and Capecchi. For insertion vectors, an increase in the region of homology from 1.3 to 6.8kb increased the targeting frequency 250-fold. For replacement vectors a 190-fold increase in targeting efficiency was observed when the length of homology was increased from 1.7 to 6.8kb. No gene replacement events were observed when the homologous region was only 1.3kb, suggesting a possible minimum requirement for length of homology.
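As a rough illustration of what an exponential relationship implies, the fold increases quoted above can be converted into a rate per kb of added homology. This is only a sketch: it assumes that targeting frequency is proportional to exp(k x length) over the whole range tested, which the cited authors may not have claimed in this form, and the function name rate_per_kb is hypothetical.

```python
import math


def rate_per_kb(fold_increase: float, start_kb: float, end_kb: float) -> float:
    """Exponent k (per kb) if frequency is proportional to exp(k * homology length)."""
    return math.log(fold_increase) / (end_kb - start_kb)


# Figures quoted from Hasty et al in the text above.
k_insertion = rate_per_kb(250, 1.3, 6.8)    # insertion vectors
k_replacement = rate_per_kb(190, 1.7, 6.8)  # replacement vectors

print(f"insertion vectors:   k = {k_insertion:.3f} per kb, "
      f"{math.exp(k_insertion):.2f}-fold per extra kb")
print(f"replacement vectors: k = {k_replacement:.3f} per kb, "
      f"{math.exp(k_replacement):.2f}-fold per extra kb")
```

On this simple model both vector types give an exponent close to 1 per kb, that is, a little under a 3-fold gain in targeting frequency for each additional kilobase of homology.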

My construct contains a total of 4.8kb of sequences homologous to the mouse factor IX locus. This is well above the observed minimum requirements and several other workers have reported targeting events with constructs of this size (259, 178, 262-264). It is not likely, therefore, that there is too little total homology in my construct. The short arm of the construct, on the 3' side of the Neo cassette, however, is only 68 basepairs in length. It is possible that this region is too short to mediate a crossover event. There have been several publications regarding the effect of the size of the short arm of targeting constructs on the frequency of homologous recombination.

"Short Arm” Homology

Berinstein et al (260) reported cases of gene targeting in the μ immunoglobulin locus with replacement vectors with unilateral regions of homology. Interestingly they found that targeting with homologous sequences on only one side of the vector was as efficient as targeting with vectors with bilateral homology. By analysing the junctions between the chromosomal locus and vector DNA the authors found that although homologous recombination had taken place as expected in the homologous sequences, a non-homologous crossover event had occurred at the other end of the construct, in the long (6kb) region of non-homology. They proposed that where there is a long region of DNA in which non-homologous crossovers can occur, the frequency of recombination is determined by the total length of homology, rather than its distribution. Other experiments showed, however, that the μ locus is particularly susceptible to non-homologous insertion events, which could facilitate this type of "semi-homologous" recombination.

The results of Berinstein et al conflict with those of Smith and Kalogerakis (265) who found that the frequency of homologous recombination was severely restricted when the size of the short arm was reduced to 200 base pairs. These experiments, however, were performed with isolated restriction fragments which would prevent the one-sided non-homologous integration observed by Berinstein et al. Accili et al (263) used a replacement vector with 4kb homology on one side of the Neo cassette and 750bp homology on the other side, and obtained a good targeting efficiency of 2 correctly targeted clones from 2.5x10^6 cells electroporated. Hasty et al (259) also decreased the size of the short arm of their HPRT-targeting replacement vector from 1.2 to 0.472kb and found no significant change in the frequency of targeting events. They suggested that the total amount of homology in a construct is more important than its distribution. In one of their insertion vectors, however, a low targeting frequency of 1 in 537 G418r colonies was attributed to a limiting amount of total homology. This construct, however, contained only 132bp homology at the 3' end which, alone, could account for the low recombination frequency. It may be that the critical size for the short arm of the construct, beyond which any further decrease will affect targeting efficiency, lies between 200bp which gave minimal recombination (265) and 472bp which did not decrease targeting frequency. If this is the case it may well be that my targeting construct contains insufficient homologous sequences in this region. The design of my construct, however, would allow non-homologous integration of the type seen by Berinstein et al in the 3kb of vector sequences.

Non-Homology

Until recently it was generally believed that the rate of homologous recombination is inversely proportional to the length of the region of non-homology in targeting constructs. Mansour et al (266), however, have targeted the HPRT gene with a construct containing different sized inserts of non-homologous DNA. They demonstrated that targeting was as efficient with 12kb of non-homology as with 1kb. They concluded that targeting at the HPRT locus was insensitive to the length of non-homologous DNA introduced into the cell.

Gene Expression

Another factor which may affect possible targeting events is transcriptional activity and chromosomal conformation at the target locus. Factor IX is known to be expressed only at very low levels during the later stages of murine gestation (231). Although the relevant expression studies have not been performed, factor IX expression would not be expected in early embryonic stages or embryonic stem cell lines, due to the absence of the vascular system. Non-expression per se is not thought to prevent gene targeting (section 3.1.4) but it is still possible that local chromatin structure could interfere with the mechanism of homologous recombination. Alternatively, local repressor sequences may hinder expression of the selectable marker and prevent identification of the targeted clones.

7.2.4 Alternative Strategies for Gene Targeting

Insertion/Replacement Vectors

Following initial reports of similar targeting frequencies for replacement and insertion vectors (184) and the development of the positive/negative selection procedure, most gene targeting experiments have utilised replacement vectors. There have, however, been several reports of non-predicted integration events with such vectors (179, 203, 259).

In order to investigate possible integration patterns and gain a deeper understanding of the mechanisms and criteria for homologous recombination, Hasty et al (259) compared the effects of targeting the HPRT locus with insertion and replacement vectors. They chose this locus because all targeted clones could be selected by the absence of HPRT regardless of integration pattern. They found that insertion vectors targeted up to nine times more frequently than replacement vectors with the same region of homology. Also, seven of nine clones targeted with one of the replacement vectors were found, by Southern blot analysis, to have inserted the whole vector into the target locus. The majority of gene replacement vectors were found to target the locus by integrating concatamers of end-to-end ligated vectors into the locus. This had occurred either by a replacement-like double crossover / gene conversion event or a single reciprocal recombination event like that which occurs in insertion vectors. "Replacement" type events were not observed even when the targeting fragment of the construct was released from the vector region.

Positive/negative selection relies on the elimination of the HSV-tk gene from the targeted locus. It would appear, however, that true gene replacement events are rare, and that the majority of clones, targeted by vector insertion events, will incorporate the tk gene into the target locus. The positive/negative procedure, while enriching for true replacement events, will also eliminate the majority of targeted clones. Selection procedures reliant on the polymerase chain reaction will also eliminate the majority of clones targeted by "replacement" vectors. PCR selection strategies identify only crossover events within the short arm of homology but any integration events occurring via a single reciprocal recombination are unlikely to occur in this region due to the relatively small amount of homologous sequence. In experiments where the exact scope of the insertion is not important, such as when generating null mutants, these selection procedures may therefore be inappropriate.

These results have far reaching implications for the design and analysis of future gene targeting experiments.

The Hit and Run Approach

To date most of the gene targeting experiments in non-selectable genes have aimed to generate null mutations by the insertion of selectable markers into the target locus. The presence of these selectable markers or the vector sequences introduced by insertion vectors, however, may interfere with local transcription and make the resultant phenotype difficult to interpret. Also, for detailed analysis of gene function and to mimic human diseases caused by small mutations it would be desirable to introduce subtle changes into the target locus without leaving extraneous sequences in the genome. Zimmer and Gruss (188) described the insertion of just 20bp of sequence into the Hox1.1 locus by microinjection and PCR screening, but there have been no further reports of successful targeting using this demanding technique.

In 1979 Scherer and Davis described a two-step procedure for targeting the yeast his3 locus with subtle mutations (267). The targeting insertion vector carried a mutated his3 gene and a selectable URA3 gene. Targeted yeast colonies were selected by their URA3+ phenotype, then screened for reverse mutations. These second mutations were caused by the loss of vector sequences and caused the colony to revert to the URA3- phenotype. These colonies retained only the region of the vector containing sequences homologous to the genome, i.e. the mutant his3 gene.

Valancius and Smithies (268), and Hasty et al (269) have independently developed two-step procedures, similar to that of Scherer and Davis, which can be applied to mammalian cell lines including ES cells.

Valancius and Smithies targeted the partially deleted HPRT locus in E14TG2a cells with a 4bp insertion. This mutation was included on an insertion vector which was also capable of repairing the mutant locus. Targeted clones, which contained duplicated regions of the locus, were selected by their ability to grow on hypoxanthine-aminopterin-thymidine (HAT) medium. These were then expanded and selected in 6GT medium for HPRT- revertants. The isolated colonies were screened by Southern blot analysis to confirm that the vector sequences had been correctly excised, and to detect any colonies which had retained the 4bp insertion. All 4 of the HATr colonies reverted to the HPRT-, 6GTr phenotype, and 88% of the 6GTr clones accurately excised the vector sequences. 19 of the 20 accurate revertants were found to have retained the 4bp insertion. This recombination of sequences inserted into the genome could have occurred either via intrachromatid recombination or by unequal sister chromatid exchange. It was concluded, however, that this particularly high rate of retention of the mutant 4bp was specific to this region of the HPRT locus, perhaps due to a recombination hotspot. Without such a hotspot the recombination event could occur anywhere within the duplicated sequences and might be expected to produce equal proportions of mutated and wild type revertants.
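How far the observed 19 of 20 departs from the equal proportions expected without a hotspot can be gauged with a simple binomial calculation. This sketch is an illustration only and does not come from the cited paper: it assumes that each revertant independently retains the insertion with probability 0.5, and the helper binom_tail is a hypothetical name.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 19 of the 20 accurate revertants retained the 4bp insertion.
p_value = binom_tail(20, 19, 0.5)

print(f"P(19 or more of 20 retain the insertion | p = 0.5) = {p_value:.2e}")
```

With these assumptions the probability is about 2 in 100,000, which is consistent with the authors' view that the retention rate reflects a local bias rather than chance.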

Although these experiments were performed at a selectable locus, it was proposed that this "In-Out" strategy could be applied to a non-selectable gene by including an HPRT minigene on the targeting vector. Targeted and revertant mutants could then be selected by growth on the relevant medium.

Hasty et al (269) tested a very similar protocol, which they described as "Hit and Run", at the HPRT locus, and then applied it to the non-selectable Hox2.6 locus. They mutated the Hox2.6 locus by introducing a small mutation which generated a Nhe 1 site and a stop codon. Instead of selecting for HPRT phenotypes, as Smithies had proposed, they included the Neo and HSV-tk cassettes in the insertion vector, but outside of the region of homology. Transfected cells were selected by growth in G418, and targeted clones were identified by Southern blot analysis (or by growth in 6GT for the HPRT targeting experiments). Revertants, which excised tk sequences, were selected by growth in FIAU and screened by Southern blotting procedures to confirm the absence of vector sequences and the presence of the diagnostic Nhe 1 site. One in 31 G418r colonies was found to be targeted at the Hox2.6 locus. One FIAUr revertant was found in approximately every 263 targeted cells. Two revertant colonies were found to have correctly excised the vector but retained the Nhe 1 mutation.

The development of these two new strategies should make it possible to introduce small mutations into any locus and to carry out in vivo studies of protein structure and function.

Cotransformation

There have been several reports of the use of cotransformation of selectable and nonselectable genes in order to select for cells that have incorporated the nonselectable vector (270, 271). Reid et al (272) investigated the possibility of using this protocol to select for ES cell clones that had been targeted at a nonselectable gene without the introduction of extraneous sequences into the target locus.

In order to attempt to prevent concatamerisation of electroporated fragments by homologous recombination, and subsequent integration into the genome as one molecule, the HPRT and Neo genes were coelectroporated on separate fragments of DNA without homologous regions (187, 273). 75-100% of co-electroporated cells were shown to undergo cotransformation, that is they integrated both DNA fragments into the genome at random sites. Of the clones that had been targeted at the HPRT locus, however, only 4% integrated a second fragment of DNA anywhere in the genome. Analysis of DNA integration patterns revealed frequent concatamers of the electroporated fragments at the random integration sites. The occurrence of the concatamers at the target locus could not be excluded, although they were not observed in targeted, non-cotransformed cells. It was suggested that the high levels of concatamer formation compared to those described by Boggs et al (187) were due to higher concentrations of electroporated DNA.

It was calculated that 1 targeted clone was present in approximately every 5,500 electroporated cells. When the electroporated cells were first selected with G418, however, targeted clones were isolated approximately every 70 G418r cells. Despite the rarity of simultaneous targeting and cotransformation with Neo, this procedure thus provided an 80-fold enrichment for gene targeting events at the HPRT locus. Shulman et al (274) used cotransformation to target immunoglobulin genes and obtained a similar 100-fold enrichment of targeted clones in the G418r population. The procedure does not, however, exclude the addition of extraneous sequences into the target locus. (Reid et al (272) were unable to select targeted colonies by this method of enrichment and concluded, on the basis of previously reported targeting frequencies, that this was due to different efficiencies at the two loci.)
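The 80-fold figure follows directly from the two frequencies quoted for the HPRT experiments. The arithmetic is sketched below; the function name fold_enrichment is illustrative only, and the inputs are the rounded values given in the text rather than the original colony counts.

```python
def fold_enrichment(unselected_one_in: float, selected_one_in: float) -> float:
    """Ratio of targeting frequencies after and before selection.

    Each argument is the 'N' in a '1 targeted clone in N cells' frequency.
    """
    return unselected_one_in / selected_one_in


# 1 targeted clone in ~5,500 electroporated cells without selection,
# 1 targeted clone in ~70 cells after G418 selection.
enrichment = fold_enrichment(5500, 70)

print(f"Enrichment from G418 co-selection: about {enrichment:.0f}-fold")
```

The ratio comes to a little under 79, in line with the approximately 80-fold enrichment reported.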

Strain Differences

There have also been recent reports that the rate of homologous recombination can be increased by using DNA from the same strain of mouse as the ES cells (Dr. S. Rastan and Dr. TeRiele, personal communication). Small strain-specific sequence differences may affect the homologous pairing of the strands of DNA. Unfortunately my construct is derived from the DBA/2 mouse; the E14 cells, like most of the ES cell lines, are derived from 129 mouse embryos.

FUTURE WORK

Due to time constraints there are several areas of this project that require further work. Perhaps the most urgent of these is the characterisation of the 19 ES cell clones containing the targeting vector #440. Southern blot analyses, similar to those described in chapter 7, should reveal if any of these clones have undergone homologous recombination at the factor IX locus.

It would also be highly desirable to isolate the "missing" regions of the mouse factor IX gene in order to complete the restriction endonuclease map of the locus, allow detection of targeting events by PCR, and to extend the short arm of the targeting vector. The latter step should eliminate the possibility of targeting events being prevented by a limiting amount of homologous sequence in one arm of the construct.

Screening of the original bacteriophage library with a probe specific for exon 7 alone, so that weak signals will not be swamped by signals from other probes, may reveal exon 7-containing clones. These could then be screened with a second probe, for exon 6, in order to isolate the region between these two exons. This library could also be rescreened with probes specific for exons 1 or 2.

If these screens were not successful, implying that the sequences are not present in this library, an unamplified library, such as that used recently to isolate mouse factor VIII clones, might be screened.

It would also be desirable to increase transfection efficiency, and to karyotype the E14 cells to confirm that they have not acquired translocations or other chromosomal aberrations during culture in my hands.

Work towards a mouse model for haemophilia is continuing at the Haemostasis Research Group. Several factor VIII-containing targeting vectors with varying amounts of homology are being constructed. It is hoped that the ES cell line isolated by Dr. Graham Kay at the CRC will be used for factor VIII-targeting experiments. These cells are at low passage numbers and are known to be capable of contributing to the germline.

The isolation and characterisation of the mouse factor VIII gene is also continuing. It will be interesting to see if the sequences of exons 8 and 9 correspond to my data derived from PCR of the region.

To summarise:

I have amplified mouse sequences from genomic DNA, which I believe to be from the mouse factor VIII gene. Unfortunately I have been unable to definitively prove the identity of these sequences.

I have isolated and characterised over 53kb of mouse factor IX DNA, and sequenced all the exons and intron-exon boundaries. I have compared the structure of the mouse factor IX gene with those of other species.

I have constructed two targeting vectors, including sequences from exon 6 of the mouse factor IX gene, enabling two different levels of selection for targeting events.

I have introduced the targeting vectors into undifferentiated embryonic stem cells in culture, and analysed 6 selected stem cell clones by Southern blotting procedures. No targeting events have been identified.

I have discussed several factors thought to affect the frequency of homologous recombination, and two recent developments in gene targeting strategies.

I have thus achieved, in part, aims 1 to 5 (see page 10), but I have been unable to achieve numbers 6 and 7 due to time limitations.

REFERENCES

1. Patek, A.J., and Taylor F.H.L. (1937) J. Clin. Invest. 16, 113-124

2. Haldane, J.B.S., and Smith, C.A.B. (1947) Annals of Genetics. 14, 10-31

3. Aggeler, P.M., White, S.G., Glendening, M.B., Page, E.W., Leake, T.B., Bates, G. (1952) Proc. Soc. Exp. Biol. and Med. 79, 692-694

4. Biggs, R., Douglas, A.S., Macfarlane, R.G. (1952) Brit. Med. J. 2, 1378-1382

5. Davie, E.W., Fujikawa, K., Kisiel, W. (1991) Biochemistry 30, 10363-10370

6. O'Brien, D.P., McVey, J.H. Oxford University Press In Press.

7. Nemerson, Y. (1988) Blood 71, 1-8

8. Zur, M., Radcliffe, R.D., Oberdick, J., Nemerson, Y. (1982) J. Biol. Chem. 257, 5623-5631

9. Nakagaki, T., Foster, D., Berkner, K., Kisiel, W. (1991) Biochemistry 30, 10819-10824

10. Osterud, B., Rapaport, S.I. (1977) Proc. Natl. Acad. Sci. USA. 74, 5260

11. Bauer, K.A., Kass, B.L., Cate, H.T., Hawiger, J.J., Rosenberg, R.D. (1990) Blood 76, 731-736

12. Ratnoff, O.D., Davie, E.W. (1962) Biochemistry 1, 677-685

13. Fujikawa, K., Legaz, M. E., Kato, H., Davie, E.W. (1974) Biochemistry 13, 4508-4516

14. Ragni, M.V., Lewis, J.H., Spero, J.A., Hasiba, U. (1981) Am. J. Haematol. 10, 79-88

15. Ratnoff, O.D., Davie E.W., Mallett D.L. (1961) J. Clin. Invest. 10, 803

16. Naito, K. and Fujikawa, K. (1991) J. Biol. Chem. 266, 7353-7358

17 Eaton, D., Rodriguez, H., Vehar, G. (1986) Biochemistry 25, 505-512

18 van Dieijen, G., Tans, G., Rosing, J., Hemker, H.C. (1981) J. Biol. Chem. 256, 3433-3442

19 Morrison, S.A., Jesty, J. (1984) Blood 63, 1338-47

20 Griffith, M.J., Reisner, H.M., Lundblad, R.L., Roberts H.R. (1982) Thromb. Res. 27, 289-301

21 Stern, D.M., Drillings, M., Nossel, H.L., Hurlet-Jensen, A., LaGamma, K.S., Owen, J. (1983) Proc. Natl. Acad. Sci. USA. 80, 4119-4123

22 Stern, D.M., Nawroth, P.P., Kisiel, W., Vehar, G., Esmon, C.T. (1985) J. Biol. Chem. 260, 6717-6722

23 Rimon, S., Melamed, R., Savion, N., Scott, T., Nawroth, P., Stern, D. (1987) J. Biol. Chem. 262, 6023-6031

24 Heimark, R.L., and Schwartz, S.M. (1983) Biochem. Biophys. Res. Commun. 111, 723-731

25 Jackson, C.M. (1987) In Haemostasis and Thrombosis. Eds. Bloom, A.L., Thomas, D.P. 2nd ed. Churchill Livingstone, London. pp165-91.

26 Radcliffe, R., Nemerson, Y. (1976) J. Biol. Chem. 251, 4749

27 Dittman, W., Majerus, P. (1990) Blood 75, 329-336

28 Fuchs, H.E., Trapp, H.G., Griffith, M.J., Roberts, H.R., Pizzo, S.V. (1984) J. Clin. Invest. 73, 1696-1703

29 Fujikawa, K., Thompson, A.R., Legaz, M.E., Meyer, R.G., Davie, E.W. (1973) Biochemistry. 12, 4938-4945

30 Osterud, B., Bouma B.N., Griffin, J.H. (1978) J. Biol. Chem. 253, 5946-5951

31 Katayama, K., Ericsson, L.H., Enfield, D.L., Walsh, K.A., Neurath, H., Davie, E.W., Titani, K. (1978) Proc. Natl. Acad. Sci. USA. 76, 4990-4994

32 Gilmore, R. (1991) Curr. Opin. Cell Biol. 3, 580-584

33 Furie, B. and Furie, B.C. (1990) Blood 75, 1753-1762

34 Jorgensen, M.J., Cantor, A.B., Furie, B.C., Brown, C.L., Shoemaker, C. (1987) Cell 48, 185-191

35 Nelsestuen, G.L., (1976) J. Biol. Chem. 251, 5648-5656

36 Bloom, J.W., Mann, K.G. (1978) Biochemistry 17, 4430-4438

37 Tai, M.M., Furie, B.C., Furie, B. (1984) J. Biol. Chem. 259, 4162-4168

38 Borowski, M., Furie, B.C., Bauminger, S., Furie, B. (1986) J. Biol. Chem. 261, 14969-14975

39 Bloom, J.W. (1989) Thromb. Res. 54, 261-268

40 Nelsestuen, G.L., Kisiel, W., DiScipio, R.G. (1978) Biochemistry 17, 2134-2138

41 Bajaj, S.P. (1982) J. Biol. Chem. 257, 4127-4132

42 Sinha, D., Seaman, F.S., Walsh, P.N. (1987) Biochemistry 26, 3768-3775

43 Harlos, K., Holland, S.K., Boys, C.W.G., Burgess, A.I., Esnouf, M.P., Blake, C.C.C.F. (1987) Nature 330, 82-84

44 Stenflo, J. (1991) Blood 78, 1637-1650

45 Rees, D.J.G., Jones, I.M., Handford, P.A., Walter, S.J., Esnouf, M.P., Smith, K.J., Brownlee, G.G. (1988) EMBO J 7, 2053-2061.

46 Furie, B., Furie, B.C. (1988) Cell 53, 505-518

47 Fernlund, P. and Stenflo, J. (1983) J. Biol. Chem. 258, 12509-12512

48 Huang, M.N., Kasper, C.K., Roberts, H.R., Stafford, D.W., High, K.A. (1989) Blood 73, 718-721

49 Handford, P.A., Baron, M., Mayhew, M., Willis, A., Beesley, T., Brownlee, G.G. (1990) EMBO. J. 9, 475-480

50 Handford, P.A., Mayhew, M., Baron, M., Winship, P.R., Campbell, I.D., Brownlee, G.G. (1991) Nature 351, 164-167

51 Huang, L.H., Cheng, H., Pardi, A., Tam, J.P., Sweeney, W.V. (1991) Biochemistry 30, 7402-7409

52 Ryan, J., Wolitzky, B., Heimer, E., Lambrose, T., Felix, A., Tam, J.M., Huang, L.H., Nawroth, P., Wilner, G., Kisiel, W., Nelsestuen, G.L., Stern, D.M. (1989) J. Biol. Chem. 264, 20283-87

53 Cheung, W.-F., Straight, D.L., Smith, K.J., Lin, S.-W., Roberts, H.R., Stafford, D.W. (1991) J. Biol. Chem. 266, 8797-8800

54 DiScipio, R.G., Kurachi, K., Davie, E.W. (1978) J. Clin. Invest. 61, 1528-1538

55 Balland, A., Faure, T., Carvallo, D., Cordier, P., Ulrich, P., Fournet, B., De La Salle, H., Lecocq, J-P. (1988) Eur. J. Biochem. 172, 565-572

56 Mizuochi, T., Taniguchi, T., Fujikawa, K., Titani, K., Kobata, A. (1983) J. Biol. Chem. 258, 6020-6024

57 McGraw, R.A., Davis, L.M., Noyes, C.M., Lundblad, R.L., Roberts, H.R., Graham, J.B., Stafford, D.W. (1985) Proc. Natl. Acad. Sci. USA. 82, 2847-2851

58 Tulinsky, A. (1991) Thromb. Haemostas. 66, 16-31

59 Jackson, C.M., Nemerson, Y. (1980) Ann. Rev. Biochem. 49, 765-811

60 Furie, B., Bing, D.H., Feldman, R.J., Robison, D.J., Burnier, J.P. (1982) J. Biol. Chem. 257, 3875-3882

61 De Haen, C., Neurath, H., Teller, D.C. (1975) J. Mol. Biol. 92, 225-259

62 Rotblat, F., O'Brien, D.P., O'Brien, F.J., Goodall, A.H., Tuddenham, E.G.D. (1984) Biochemistry 24, 4294-4300

63 Wion, K.L., Kelly, D., Summerfield, J.A., Tuddenham, E.G.D., Lawn, R.M. (1985) Nature 317, 726-729

64 Vehar, G. A., Keyt, B., Eaton, D., Rodriguez, H., O'Brien, D.P., Rotblat, F., Opperman, H., Keck, R., Wood, W.I., Harkins, N., Tuddenham, E.G.D., Lawn, R., Capon, D.J. (1984) Nature 312, 337-342

65 Toole, J.J., Pittman, D.D., Orr, E.C., Murtha, P., Wasley, L.C., Kaufman, R.J. (1986) Proc. Natl. Acad. Sci. USA. 83, 5939-5942

66 Poole, S., Firtel, R.A., Lawar, E., Rowekamp, W. (1981) J. Mol. Biol. 153, 273-289

67 Tuddenham, E.G.D., Trabold, N.C., Collins, J.A., Hoyer, L.W. (1979) J. Lab. Clin. Med. 93, 40-53

68 Pittman, D.D., Kaufman, R.J. (1989) Thromb. Haemostas. 61, 161-165

69 Chance, P.F., Dyer, K.A., Kurachi, K., Yoshitake, S., Ropers, H-H, Wieacker, P., Gartler, S.M. (1983) Hum. Genet. 65, 207-208

70 Boyd, Y., Buckle, V.J., Munroe, E.A., Choo, K.H., Migeon, B.R., Craig, I. (1984) Ann. Hum. Genet. 48, 145-152

71 Camerino, G., Grzeschik, K.H., Jaye, M., De La Salle, H., Tolstoshev, P., Lecocq, J-P., Heilig, R., Mandel, J.L. (1984) Proc. Natl. Acad. Sci. USA. 81, 498-502

72 Avner, P., Amar, L., Arnaud, D., Hanauer, A., Cambrou, J. (1987) Proc. Natl. Acad. Sci. USA. 84, 1629-1633

73 Cytogenetics and Cell Genetics. (1989) Human Gene Mapping, 10 (51), 844

74 Mullins, L.J., Grant, S.G., Stephenson, D.A., Chapman, V.M. (1988) Genomics 3, 187-194

75 Anson, D.S., Choo, K.H., Rees, D.J.G., Giannelli, F., Gould, K., Huddleston, J.A., Brownlee, G.G. (1984) EMBO J. 3, 1053-1060

76 Choo, K.H., Gould, K.G., Rees, D.J.G., Brownlee, G.G. (1982) Nature 299, 178-180

77 Kurachi, K., Davie, E.W. (1982) Proc. Natl. Acad. Sci. USA. 79, 6461-6464

78 Jaye, M., De La Salle, H., Schamber, F., Balland, A., Kohli, V., Findeli, A., Tolstochev, P., Lecocq, J-P. (1983) Nuc. Acid. Res. 11, 2325-2335

79 Yoshitake, S., Schach, B.G., Foster, D.C., Davie, E.W., Kurachi, K. (1985) Biochemistry 24, 3736-3750

80 Crossley, M., Brownlee, G.G. (1990) Nature 345, 444-445

81 Salier, J.-P., Hirosawa, S., Kurachi, K. (1990) J. Biol. Chem. 265, 7062-7068

82 Reijnen, M.J., Bertina, R. M., Reitsma, P.H. (1990) FEBS Lett. 270, 207-210

83 Goldberg, M.L. 1979 PhD thesis, Stanford Univ.

84 Efstratiadis, A., Posakony, J.W., Maniatis, T., Lawn, R., O'Connell, C., Spritz, R.A., DeRiel, J.K., Forget, B.G., Weissman, S.M., Slightom, J.L., Blechl, A.E., Smithies, O., Baralle, F.E., Shoulders, C.C., Proudfoot, N.J. (1980) Cell 21, 653-668

85. Nordheim, A., Tesser, P., Azorin, F., Kwon, Y.H., Moller, A., Rich, A. (1982) Proc. Natl. Acad. Sci. 79, 7729-7733

86. Droge, P., Nordheim, A. (1991) Nuc. Acid. Res. 19, 2941-2946

87. Wahl, W.P., Wallace, L.J., Moore, P.D. (1990) Mol. Cell. Biol. 10, 785-793

88 Lerman, M.I., Thayer, R.E., Singer, M.F. (1983) Proc. Natl. Acad. Sci. USA. 80, 3966-3970

89 Manueldis, L. (1982) Nuc. Acid. Res. 10, 3211-3219

90 Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedman, T., Schmid, C. (1981) J. Mol. Biol. 151, 17-33

91 Breathnach, R. and Chambon, P. (1981) Ann. Rev. Biochem. 50, 349-83

92 Shapiro, M.B., Senapathy, P. (1987) Nuc. Acid. Res. 15, 7155-7176

93 Sharp, P.A. (1981) Cell 23, 643-646

94 Proudfoot, N.J., Brownlee, G.G. (1976) Nature 263, 211-214

95 Berget, S.M. (1984) Nature 309, 179-182

96 Gitschier, J., Wood, W.I., Goralka, T.M., Wion, K.L., Chen, E.Y., Eaton, D.H., Vehar, G.A., Capon, D.J., Lawn, R.M. (1984) Nature 312, 326-330

97 Toole, J.J., Knopf, J.L., Wozney, J.M., Sultzman, L.A., Buecker, J.L., Pittman, D.D., Kaufman, R.J., Brown, E., Shoemaker, C., Orr, E.C., Amphlett, G.W., Foster, W.B., Coe, M.L., Knutson, G.J., Fass, D.N., Hewick, R.M. (1984) Nature 312, 342-347

98 Giannelli, F., Green, P.M., High, K.A., Sommer, S., Lillicrap, D.P., Ludwig,. (1991) Nuc. Acid. Res. 19 (SUP) 2193-2219

99. Tuddenham, E.G.D., Cooper, D.N., Gitschier, J., Higuchi, M., Hoyer, L.W., Yoshioke, A., Peake, I.R., Schwaab, R., Olek, K., Kazazian, H.H., Lavergne, J.M., Giannelli, F., Antonarakis, S.E. (1991) Nuc. Acid. Res. 19, 4821-4833

100 Giannelli, F. and Brownlee, G.G. (1986) Nature 320, 196

101 Giannelli, F., Choo, K.H., Rees D.J.G., Boyd, Y., Rizza, C.R., Brownlee, G.G. (1983) Nature 303, 181-182

102 Millar, D.S., Steinbrecker, K.A., Wieland, K., Grundy C.B., Martinowitz, U., Krawczak, M., Zoll, B., Whitmore, D., Stephenson, J., Mibashan, R.S., Kakkar, V.V., Cooper, D.N. (1990) Hum. Genet. 86, 219-227

103 Bentley, A.K., Rees, D.J.G., Rizza, C., Brownlee, G.G. (1986) Cell 45, 343-348

104 Liddell, M.B., Lillicrap, D.P., Peake, I.R., Bloom, A.L. (1988) Brit. J. Haematol. 69, 120

105 Green, P.M., Bentley, D.R., Mibashan, R.S., Nilsson, I.M., Giannelli F. (1989) EMBO J. 8, 1067-1072

106 Tsang, T.C., Bentley, D.R., Mibashan, R.S., Giannelli, F. (1988) EMBO J. 7, 3009-3015

107 Liddell, M.B., Lillicrap, D.P., Peake, I.R., Bloom, A.L. (1988) Brit. J. Haematol. 69, 120

108 Rees, D.J.G., Rizza, C.R., Brownlee, G.G. (1985) Nature 316, 643-645

109 Winship, P.R. (1986) D. Phil. Thesis, Oxford University.

110 Hirosawa, S., Fahner, J.B., Salier, J.-P., Wu, C.-T., Lovrien, E.W. (1990) Proc. Natl. Acad. Sci. USA. 87, 4421-4425

111 Ferrari, N. and Rizza, C.R. (1986) Braz. J. Genet. 9, 87-99

112 Pattinson J.K., Millar D.S., McVey, J.H., Grundy C.B., Wieland, K., Mibashan, R.S., Martinowitz, U., Tan-Un, K., Vidaud, M., Goossens, M., Sanpietro, P.M., Manucci, M., Krawczak, M., Reiss, J., Zoll, B., Whitmore, D., Bowcock, S., Wensley, R., Ajani, A., Mitchell, V., Rizza, C., Maia, R., Winter, P., Mayne, E.E., Schwartz, M., Green, P.J., Kakkar, V.V., Tuddenham, E.G.D., Cooper, D.N. (1990) Blood 76, 2242-2248

113 Brown, T.C. and Juricny, J. (1988) Cell 54, 705-711

114 Cullen, C.R., Hubberman, P., Kaslow, D.C., Migeon, B.R. (1986) EMBO J. 5, 2223-2229

115 Green, P.M., Montandon, A.J., Bentley, D.R., Ljung, R., Nilsson, I.M., Giannelli, F. (1990) Nuc. Acid. Res. 18, 3227-3231

116 Youssoufian, H., Antonarakis, S.E., Bell, W., Griffin, A.M., Kazazian, H.H. Jr. (1988) Am. J. Hum. Genet. 42, 718-725

117 Vulliamy, T.J., D'Urso, M., Battistuzzi, G., Estrada, M., Foulkes, N.S., Martini, G., Calabro, V., Pogg, V., Giordano, R., Town, M., Luzzatto, L., Persico, M.G. (1988) Proc. Natl. Acad. Sci. USA. 85, 5171-5175

118 Cooper, D.N., Krawczak, M. (1990) Hum. Genet. 85, 55-74

119 Allain, J.-P., Dailey, S.H., Laurian, Y., Vallari, D.S., Rafowicz, A., Desai, S.M., Devare, S.G. (1991) J. Clin. Invest. 88, 1672-1679

120 Bloom, A.L. (1991) Thromb. Haemostas. 66, 166-177

121 Giannelli, F. (1989) in The Molecular Biology of Blood Coagulation, Baillieres Clinical Haematology 2, 821-848 Ed. Tuddenham, E.G.D.

122 Wood, W.I., Capon, D.J., Simonsen, C.C., Eaton, D.L., Gitschier, J., Keyt, B., Seeburg, P.H., Smith, D.H., Hollingshead, P., Wion, K.L., Delwart, E., Tuddenham, E.G.D., Vehar, G.A., Lawn, R.M. (1984) Nature 312, 330-337

123 Kaufman, R.J., Wasley, L.C., Furie, B.C., Furie, B., Shoemaker, C.B. (1986) J. Biol. Chem. 261, 9622-9628

124 Anson, D.S., Austen, D.E.G., Brownlee, G.G. (1985) Nature, 315, 683-685

125 De La Salle, Altenberger, W., Elkaim, R., Dott, K., Dieterle, A., Drillien, R., Cazenave, J.-P., Tolstoshev, P., Lecocq, J.-P. (1985) Nature 316, 268-270

126 Busby, S., Kumar, A., Joseph, M., Halfpap, L., Insley, M., Berkner, K., Kurachi, K., Woodbury, R. (1985) Nature 316, 271-273

127 Balland, A., Faure, T., Carvallo, D., Cordier, P., Ulrich, P., Fournet, B., De la Salle, H., Lecocq, J.-P. (1988) Eur. J. Biochem. 172, 565-572

128 Armentano, D., Thompson, A.R., Darlington, G., Woo, S.L.C. (1990) Proc. Natl. Acad. Sci. USA. 87, 6141-6145

129 Clarke, A.J., Ali, S., Archibald, A.L., Bessos, H., Brown, P., Harris, S., McClenaghan, M., Prowse, C., Simons, J.P., Whitelaw, C.B.A., Wilmut, I. (1989) Genome 31, 950-955

130 Verma, I.M. (1990) Sci. Amer. 263, 34-41

131 Weatherall, D.J. (1991) Nature 348, 275-276

132 Thompson, A.R. (1991) Thromb. Haemostas. 66, 119-122

133 Vega, M.A. (1991) Hum. Genet. 87, 245-253

134 St. Louis, D., Verma, I.M. (1988) Proc. Natl. Acad. Sci. USA. 85, 3150-3154

135 Thompson, A.R. (1991) Thromb. Haemostas. 66, 119-122

136 Jaenisch, R. (1977) Cell 12, 691

137 Jaenisch, R. (1976) Proc. Natl. Acad. Sci. USA. 73, 1260-1264

138 Jahner, D., Jaenisch, R. (1980) Nature 287, 456

139 Jaenisch, R. (1980) Cell 19, 181

140 Danos, O., Mulligan, R.C. (1988) Proc. Natl. Acad. Sci. USA. 85, 6460-6464

141 Morgenstern, J.P., Land, H. (1990) Nuc. Acid. Res. 18, 3587-3596

142 Gordon, J.W., Scangos, G.A., Plotkin, D.J., Barbosa, J.A., Ruddle, F.H. (1980) Proc. Natl. Acad. Sci. USA. 77, 7380-7384

143 Lacey, M., Alpert, S., Hanahan, D. (1986) Nature 322, 609

144 Mahon, K.A., Overbeek, P.A., Westphal, H. (1988) Proc. Natl. Acad. Sci. USA. 85, 1165

145 Stevens, L.C. (1964) Proc. Natl. Acad. Sci. USA. 52, 654-662

146 Kleinsmith, L.J., Pierce, G.B. Jr. (1964) Cancer Res. 24, 1544

147 Evans, M. J. (1972) J. Emb. Exp. Morph. 28, 163-176

148 Rosenthal, M., Wishnow, R.N., Sato, G.H. (1970) J. Nat. Canc. Inst. 44, 1001-1014

149 Martin, G.R., Evans, M.J. (1975) Proc. Natl. Acad. Sci. USA. 72, 1441-1445

150 Brinster, R.L. (1974) J. Exp. Med. 140, 1049-1056

151 Papaioannou, V.E., McBurney, M.W., Gardner, R.L., Evans, M.J. (1975) Nature 258, 70-73

152 Mintz, B., Illmensee, K. (1975) Proc. Natl. Acad. Sci. USA. 72

153 Pellicer, A., Wagner, E.F., Kareh, A.E., Dewey, M.J., Reuser, A.J. (1980) Proc. Natl. Acad. Sci. USA. 77, 2098-2101

154 Stewart, C.L., Vanek, M., Wagner, E.F. (1985) EMBO J. 4, 3701-3709

155 Gardner, R.L. (1968) Nature 220, 596-597

156 Evans, M.J., Kaufman, M.H. (1981) Nature 292, 154-156

157 Martin, G.R. (1981) Proc. Natl. Acad. Sci. USA. 78, 7634-7638

158 Smith, T.A., Hooper, M.L. (1983) Exp. Cell Res. 145, 458-462

159 Koopman, P., Cotton, R.G. (1984) Exp. Cell Res. 154, 233-242

160 Bradley, A., Evans, M., Kaufman, M.H., Robertson, E. (1984) Nature 309, 255-256

161 Robertson, E., Bradley, A., Kuehn, M., Evans, M. (1986) Nature 323, 445-447

162 Solter, D., Knowles, B.B. (1975) Proc. Natl. Acad. Sci. USA. 72, 5099-5102

163 Williams, R.L., Hilton, D.J. (1988) Nature 336, 684-687

164 Metcalf, D. (1991) Int. J. Cell Clon. 9, 95-108

165 Nichols, J., Evans, E.P., Smith, A.G. (1990) Development 110, 1341-1348

166 Smith, A.G., Hooper, M.L. (1987) Devl. Biol. 121, 1-9

167 Handyside, A.H., O'Neill, G.T., Jones, M., Hooper, M.L. (1989) Roux Arch. Dev. Biol. 198, 48-55

168 Axelrod, H.R. (1984) Devl. Biol. 101, 225-228

169 Doetschman, T.C., Eistetter, H., Katz, M., Schmidt, W., Kemler, R. (1985) J. Emb. Exp. Morph. 87, 27-45

170 Hinnen, A., Hicks, J.B., Fink, G.R. (1978) Proc. Natl. Acad. Sci. USA. 75, 1929-1933

171 Folger, K.R., Wong, A.A., Wahl, G., Capecchi, M.R. (1982) Mol. Cell. Biol. 2, 1372-1387

172 Kucherlapati, R.S., Eves, E.M., Song, K.Y., Morse, B.S., Smithies, O. (1984) Proc. Natl. Acad. Sci. USA. 81, 3153-3157

173 Paludan, K., Duch, M., Jorgensen, P., Kjeldgaard, N.O., Pedersen, F.S. (1989) Gene 85, 421-426

174 De Saint Vincent, B.R., Wahl, G.M. (1983) Proc. Natl. Acad. Sci. USA. 80, 2002-2006

175 Small, J., Scangos, G. (1983) Science 219, 174-176

176 Smithies, O., Koralewski, M.A., Song, K.Y., Kucherlapati, R.S. (1984) Cold Spr. Harb. Symp. Quant. Biol. 49, 161-170

177 Lin, F.-L., Sperle, K., Sternberg, N. (1985) Proc. Natl. Acad. Sci. USA. 82, 1391-1395

178 Smithies, O., Gregg, R.G., Boggs, S.S., Koralewski, A., Kucherlapati, R.S. (1985) Nature 317, 230-234

179 Thomas, K.R., Capecchi, M.R. (1990) Nature 346, 847-850

180 DeChiara, T.M., Efstratiadis, A., Robertson, E.J. (1990) Nature 345, 78-80

181 Orr-Weaver, T.L., Szostak, J.W., Rothstein, R.J. (1981) Proc. Natl. Acad. Sci. USA. 78, 6354-6358

182 Chang, X.H., Wilson, J.H. (1987) Proc. Natl. Acad. Sci. USA. 84, 4959-4963

183 Wong, E.A., Capecchi, M.R. (1987) Mol. Cell Biol. 7, 2294-2295

184 Thomas, K.R., Capecchi, M.R. (1987) Cell 51, 503-512

185 Johnson, R.S., Sheng, M., Greenberg, M.E., Kolodner, R.D., Papaioannou, V.E., Spiegelman, B.M. (1989) Science 245, 1234-1236

186 Potter, H. (1988) Analyt. Biochem. 17, 361-373

187 Boggs, S.S., Gregg, R.G., Borenstein, N., Smithies, O. (1986) Exp. Haematol. 14, 988-994

188 Zimmer, A., Gruss, P. (1989) Nature 338, 150-156

189 Song, K.-Y., Schwartz, F., Maeda, N., Smithies, O., Kucherlapati, R. (1987) Proc. Natl. Acad. Sci. USA. 84, 6820-6824

190 Sedivy, J.M., Sharp, P.A. (1989) Proc. Natl. Acad. Sci. USA. 86, 227-231

191 Mansour, S.L., Thomas, K.R., Capecchi, M.R. (1988) Nature 336, 348-352

192 St. Clair, M.H., Lambe, C.U., Furman, P.A. (1987) Antimicrob. Agents Chemother. 31, 844-849

193 Borrelli, E., Heyman, R., Hsi, M., Evans, R.M. (1988) Proc. Natl. Acad. Sci. USA. 85, 7572-7576

194 Saiki, R.K., Gelfand, G.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., Erlich, H.A. (1986) Science 239, 487-491

195 Mullis, K., Faloona, F., Scharf, S., Saiki, R., Horn, G., Erlich, H. (1986) Cold Spr. Harb. Symp. Quant. Biol. 51, 263-273

196 Kim, H.-S., Smithies, O. (1988) Nuc. Acid. Res. 16, 8887-8902

197 Holliday, R. (1964) Genet. Res. Camb. 5, 282-304

198 Huberman, J. (1968) Cold Spr. Harb. Symp. Quant. Biol. 33, 509

199 Meselson, M.S., Radding, C.M. (1975) Proc. Natl. Acad. Sci. USA. 72, 358-361

200 Szostak, J.W., Orr-Weaver, T.L., Rothstein, R.J., Stahl, F.W. (1983) Cell 33, 25-35

201 Lin, F.-L., Sperle, K., Sternberg, N. (1984) Mol. Cell Biol. 4, 1020-1034

202 Wake, C.T., Vernaleone, F., Wilson, J.H. (1985) Mol. Cell. Biol. 5, 2080-2089

203 Thompson, S., Clarke, A.R., Pow, A.M., Hooper, M.L., Melton, D. (1989) Cell 56, 313-321

204 Hooper, M., Hardy, K., Handyside, A., Hunter, S., Monk, M. (1987) Nature 326, 292-295

205 Doetschman, T., Gregg, R.G. (1987) Nature 330, 576-578

206 Davis, L.G., Dibner, M.D., Battey, J.F. (1986) Basic Methods in Molecular Biology. Elsevier

207 Sambrook, J., Fritsch, E.F., Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edition. Cold Spring Harbor Laboratory Press

208 Southern, E. Methods in Enzymol. 68, 152-164

209 Rigby, P.W., Dieckman, M., Rhodes, C., Berg, P. (1977) J. Mol. Biol. 113, 237-251

210 Sanger, F., Nicklen, S., Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA. 74, 5463-5467

211 Rackwitz, H-R., Zehetner, G., Frischauf, A.-M., Lehrach, H. (1984) Gene 30, 195-200

212 Kogan, S.C., Doherty, M., Gitschier, J. (1987) N. Eng. J. Med. 317, 16

213 Coon, H. G. (1968) J. Cell. Biol. 39, 29a

214 Robertson, E.J. Embryo-derived Stem Cell Lines, in Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, p72-112. Ed. E.J. Robertson. IRL Press, Oxford

215 Bahnson, A.B., Boggs, S. (1990) Biochem. Biophys. Res. Commun. 171, 752-757

216 Poustka, A-M., Rackwitz, H-R., Frischauf, A-M., Hohn, B., Lehrach, H. (1984) Proc. Natl. Acad. Sci. USA. 81, 4129-4133

217 O'Brien, D.P., Tuddenham, E.G.D. (1989) Blood 73, 2117-2122

218 Clark, L., Carbon, J. (1976) Cell 9, 91

219 Graham, A., Papalopulu, N., Lorimer, J., McVey, J.H., Tuddenham, E.G.D., Krumlauf, R. (1988) Genes and Devel. 2, 1424-1428

220 McVey, J.H., Nomura, S., Kelly, P., Mason, I.J., Hogan, B.L. (1988) J. Biol. Chem. 263, 11111-16

221 Grant, S.G., Jessee, J., Hanahan, D. (1992) Focus 13, 115-119

222 Raleigh, E.A., Murray, N.E., Revel, H., Blumenthal, R.M. (1988) Nucl. Acid Res. 16, 1563

223 Exner, T., Rickard, K.A., Kronenberg, H. (1983) Thromb. Res. 32, 427-436

224 Vosberg, H-P. (1989) Hum. Genet. 83, 1-15

225 McVey, J.H., Pattinson, J.K., Tuddenham, E.G.D. (1991) Molecular Biology In Blood Coagulation in Recent Advances in Blood Coagulation 5, 79-92 Ed. Poller, R.

226 Gibbs, R.A., Nguyen, P.-N., Edwards, A., Civitello, A.B., Caskey, C.T. (1990) Genomics 7, 235-244

227 Marchuk, D., Drumm, M., Saulino, A., Collins, S. (1991) Nuc. Acid. Res. 19, 1154

228 Maxam, A.M., Gilbert, W. (1980) Proc. Natl. Acad. Sci. USA. 74, 560-564

229 Wu, S.-M., Stafford, D.W., Ware, J. (1990) Gene 86, 275-278

230 Pang, C.-P., Crossley, M., Kent, G., Brownlee, G.G. (1990) Nuc. Acid. Res. 18, 6731-6732

231 Yao, S.-N., DeSilva, A.H., Kurachi, M., Samuelson, L.C., Kurachi, K. (1991) Thromb. Haemost. 65, 52-58

232 Sarkar, G., Koeberl, D.D., Sommer, S.S. (1990) Genomics 6, 133-143

233 Reitsma, P.H., Bertina, R.M., Ploos van Amstel, J.K., Riemens, A., Briet, E. (1988) Blood 72, 1074-1076

234 Crossley, M., Winship, P.R., Austen, D.E., Rizza, C.R., Brownlee, G.G. (1990) Nuc. Acid. Res. 18, 4633

235 Crossley, M., Brownlee, G.G. (1990) Nature 345, 444-445

236 Paonessa, G., Gounari, F., Frank, R., Cortese, R. (1988) EMBO J. 7, 3115-3123

237 Landschulz, W.H., Johnson, P.F., Adeshi, E.Y., Graves, B.J., McKnight, S.L. (1988) Genes. Devel. 2, 786-800

238 Ryden, T.-A., Beamon, K. (1989) Mol. Cell. Biol. 9, 1155-1164

239 Nowock, J., Borgmeyer, U., Puschel, A.W., Rupp, R.A.W., Sippel, A.E. (1985) Nuc. Acid. Res. 13, 2045-2061

240 Gronostajski, R.M. (1987) Nuc. Acid. Res. 15, 5545-5559

241 Grosveld, G.C., Shewmaker, C.K., Jat, P., Flavell, R.A. (1981) Cell 25, 215-226

242 Evans, J.P., Watzke, H.H., Ware, J.L., Stafford, D.W., High, K.A. (1989) Blood 74, 207-212

243 Kozak, M. (1987) Nuc. Acid Res. 15, 8125-8132

244 Price, P.A. (1987) Proc. Natl. Acad. Sci. USA. 84, 8335-8339

245 Htun, H., Dahlberg, J.E. (1988) Science 241, 1791-96

246 Larsen, A., Weintraub, H. (1982) Cell 29, 609-622

247 Nickol, J.M., Felsenfeld, G. (1983) Cell 35, 467-477

248 Kilpatrick, M.W., Tom, A., Kang, D.S., Engler, J.A., Wells, R.D. (1986) J. Biol. Chem. 261, 11350-11354

249 Davis, T.L., Firulli, A.B., Kinniburgh, A.J. (1989) Proc. Natl. Acad. Sci. USA. 86, 9682-9686

250 Htun, H., Lund, E., Dahlberg, J.E. (1984) Proc. Natl. Acad. Sci. USA. 81, 7288-7292

251 Krayev, A.S., Markusheva, T.V., Kramerov, D.A., Ryskov, A.P., Skryabin, K.G., Bayev, A.A., Georgiev, G.P. (1982) Nuc. Acid. Res. 10, 7461-7475

252 Hilton, D.J., Nicola, N.A., Metcalf, D. (1988) Anal. Biochem. 173, 359-367

253 Tomida, M., Yamamoto-Yamaguchi, Y., Hozumi, M. (1984) J. Biol. Chem. 259, 10978-10982

254 Moreau, J.-F., Donaldson, D.D., Bennett, F., Witek-Gianotti, J.A., Clark, S.C., Wong, G.G. (1988) Nature 336, 690-692

255 Metcalf, D. (1991) Int. J. Cell Clon. 9, 95-108

256 McMahon, A.P., Bradley, A. (1990) Cell 62, 1073-1085

257 Soriano, P., Montgomery, C., Geske, R., Bradley, A. (1991) Cell 64, 693-702

258 Hasty, P., Rivera-Perez, J., Bradley, A. (1991) Mol. Cell. Biol. 11, 5586-5591

259 Hasty, P., Rivera-Perez, J., Chang, C., Bradley, A. (1991) Mol. Cell. Biol. 11, 4509-4517

260 Berinstein, N., Pennell, N., Ottaway, C.A., Shulman, M.J. (1992) Mol. Cell. Biol. 12, 360-367

261 Rubnitz, J., Subramani, S. (1984) Mol. Cell. Biol. 4, 2253-2258

262 Zheng, H., Wilson, J.H. (1990) Nature 344, 170-173

263 Accili, D., Taylor, S.I. (1991) Proc. Natl. Acad. Sci. USA. 88, 4708-4712

264 Shesely, E.G., Kim, H.-S., Shehee, W.R., Papayannapoulou, T., Smithies, O., Popovich, B.W. (1991) Proc. Natl. Acad. Sci. USA. 88, 4294-4298

265 Smith, A.J.H., Kalogerakis, B. (1990) J. Mol. Biol. 213, 415-435

266 Mansour, S.L., Thomas, K.R., Deng, C., Capecchi, M.R. (1990) Proc. Natl. Acad. Sci. USA. 87, 7688-7692

267 Scherer, S., Davis, R.W. (1979) Proc. Natl. Acad. Sci. USA. 76, 4951-55

268 Valancius, V., Smithies, O. (1991) Mol. Cell. Biol. 11, 1402-1408

269 Hasty, P., Ramirez-Solis, R., Krumlauf, R., Bradley, A. (1991) Nature 350, 243-246

270 Kuhn, L.C., McClelland, A., Ruddle, F.H. (1984) Cell 37, 95-103

271 Newman, R., Domingo, D., Trotter, J., Trowbridge, I. (1983) Nature 304, 643-645

272 Reid, L.H., Shesely, E.G., Kim, H.-S., Smithies, O. (1991) Mol. Cell. Biol. 11, 2769-2777

273 Toneguzzo, F., Keating, A., Glynn, S., McDonald, K. (1988) Nuc. Acid. Res. 16, 5515-5532

274 Shulman, M.J., Nissen, L., Collins, C. (1990) Mol. Cell. Biol. 10, 4466-4472

APPENDICES

Appendix I

List of abbreviations used in this thesis

aa        amino acid
BRL       buffalo rat liver
CAT       chloramphenicol acetyl transferase
CRM       cross reacting material
DEPC      diethylpyrocarbonate
EB        embryoid body
EC cells  embryonal carcinoma cells
EDTA      diaminoethanetetra-acetic acid disodium salt
EGF       epidermal growth factor
ER        endoplasmic reticulum
ES cells  embryonic stem cells
FCS       fetal calf serum
FIAU      1-(2-deoxy-2-fluoro-β-D-arabinofuranosyl)-5-iodouracil
G6PD      glucose-6-phosphate dehydrogenase
HAT       hypoxanthine aminopterin thymidine
HEPES     N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid
HPRT      hypoxanthine phosphoribosyl transferase
HSV-tk    Herpes Simplex Virus thymidine kinase gene
IPTG      isopropyl-β-D-thiogalactopyranoside
LB        Luria-Bertani
LIF       leukaemia inhibitory factor
Mcr       modified cytosine restriction
NaPB      sodium phosphate buffers
Neo       neomycin phosphotransferase
OD        optical density
PBS       phosphate buffered saline
PCR       polymerase chain reaction
PDCB      Protan + Deutan colour blindness (red-green)
PEG       polyethylene glycol
RE        restriction endonuclease
RFLP      restriction fragment length polymorphism
SDS       sodium dodecyl sulphate
SRP       signal recognition particle
TF        tissue factor
UV        ultraviolet
vWF       von Willebrand's Factor
6-tg      6-thioguanine

Nucleotide bases

A  adenine     ATP    adenosine triphosphate
C  cytosine    NTP    any nucleotide
G  guanine     dNTP   deoxynucleotide
T  thymine     ddNTP  dideoxynucleotide
U  uracil

Amino Acids

amino acid                three-letter code  one-letter code

Glycine                   Gly                G
Alanine                   Ala                A
Valine                    Val                V
Leucine                   Leu                L
Isoleucine                Ile                I
Serine                    Ser                S
Threonine                 Thr                T
Aspartic Acid             Asp                D
Asparagine                Asn                N
Lysine                    Lys                K
Glutamic Acid             Glu                E
Glutamine                 Gln                Q
Arginine                  Arg                R
Histidine                 His                H
Phenylalanine             Phe                F
Cysteine                  Cys                C
Tryptophan                Trp                W
Tyrosine                  Tyr                Y
Methionine                Met                M
Proline                   Pro                P
γ-carboxyglutamic acid    Gla
β-hydroxyaspartic acid    Hya

Units of Measurement

sec  second
min  minute
h    hour
l    litre
g    gram
Da   Dalton
Mol  mole
Bq   Becquerel
°C   degrees centigrade
J    Joule
g    acceleration of gravity
V    Volt
A    Ampere
U    units

(Prefixes: p: pico-, µ: micro-, n: nano-, m: milli-, k: kilo-, M: Mega-.)

Appendix II

List of Suppliers of Reagents

REAGENT    SUPPLIER

Chemicals

(Most chemicals used in this study were obtained from either Sigma (Poole, Dorset), Pharmacia (Milton Keynes) or Boehringer BCL (Lewes, Sussex) and were of the highest purity available. The suppliers of some other chemicals and equipment are listed below.)

Acrylamide                    Gibco
Agarose                       Seakem
Ammonia                       Merck
Ammonium Acetate              Merck
Boric acid                    Merck BDH
Glycerol                      Merck
Formamide                     Fluka, Glossop, Derbyshire
Hydrogen Peroxide             Merck
M-Cresol                      Aldrich, Gillingham, Dorset
N,N'-methylenebisacrylamide   Gibco
Phenol                        Gibco BRL Life Technologies
Potassium Acetate             Merck
Siliconising fluid            Merck BDH
Sodium Hydroxide              FSA Laboratory Supplies
Sodium Dodecyl Sulphate       Merck
Urea                          Gibco

Biochemicals

1 kb DNA size markers             Gibco BRL
Alkaline phosphatase              Pharmacia
Bgl II                            Pharmacia
Bgllll                            Pharmacia
DNase I                           Gibco
Hinc II                           Pharmacia
Lysozyme                          Sigma
Sal I                             Sigma
T4 DNA Polymerase                 Pharmacia
T4 Polynucleotide Kinase          Pharmacia
Thermus aquaticus DNA Polymerase  Promega Biotech

Kits and other reagents

Nitrocellulose filters            Schleicher and Schuell
Random primed DNA labelling kit   Boehringer BCL
Genescreen                        Dupont UK
Mouse Genomic Library             Cambridge Bioscience, Cambridge, GB
Mycoplasma Detection Kit          Gen Probe Inc.
Nick Columns                      Pharmacia
Collodion dialysis bags           Sartorius
Nick Translation Kit              Amersham
pGEM7ZF vector                    Promega Biotech.
E. coli DH5αF' cells              Gibco BRL

Appendix III

Addresses of suppliers

Aldrich                                    Gillingham, Dorset
Amersham                                   Aylesbury, Buckinghamshire
Applied Biosystems Inc. (ABI)              Warrington, Merseyside
Bio 101                                    La Jolla, California
BioRad                                     Hemel Hempstead, Herts
Boehringer                                 Lewes, East Sussex
Denley Instruments                         Billingshurst, Sussex
DuPont UK / NEN                            Stevenage
Fluka                                      Glossop, Derbyshire
Gen-Probe                                  San Diego, California
Gibco BRL                                  Paisley, Scotland
Grant                                      Cambridge, UK
Imperial Laboratories Ltd                  Andover, Hampshire
Merck BDH                                  Dagenham, Essex
Pharmacia                                  Milton Keynes
Promega Biotech.                           Southampton
Sartorius                                  Belmont, Surrey
Schleicher and Schuell (through Anderman)  London
Scotlabs                                   Bellhill, Scotland
Sigma                                      Poole, Dorset

Appendix IV

Buffers for Restriction Endonuclease Digests (as recommended in the New England Biolabs catalogue).

For contents of buffers see chapter 4

VH- very high, H-high, M-medium, L-low.

enzyme          10x buffer   enzyme   10x buffer

Acc I           L, M         Nar I    L, M
BamH I          H, VH        Not I    H, VH
Bgl I           M, H, VH     Pst I    M, H, VH
Bgl II          M, H, VH     Pvu I    H, VH
Cla I           L, M, H      Rsa I    L, M
EcoR I          M, H, VH     Sac I    L, M
EcoR V          VH           Sal I    VH
Hinc (Hind) II  M, H, VH     Taq I    L, M, H
Hind III        M, H         Xba I    M, H, VH
Kpn I           L            Xho I    M, H, VH
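One practical use of the table above is finding a buffer that suits two enzymes at once. The sketch below is illustrative only: the thesis does not describe choosing buffers this way, and the names `BUFFERS` and `shared_buffers` are hypothetical. It encodes the table as a lookup and returns the salt strengths common to every enzyme requested.

```python
# Buffer compatibility transcribed from the Appendix IV table.
# L = low, M = medium, H = high, VH = very high salt.
BUFFERS = {
    "Acc I": {"L", "M"},          "Nar I": {"L", "M"},
    "BamH I": {"H", "VH"},        "Not I": {"H", "VH"},
    "Bgl I": {"M", "H", "VH"},    "Pst I": {"M", "H", "VH"},
    "Bgl II": {"M", "H", "VH"},   "Pvu I": {"H", "VH"},
    "Cla I": {"L", "M", "H"},     "Rsa I": {"L", "M"},
    "EcoR I": {"M", "H", "VH"},   "Sac I": {"L", "M"},
    "EcoR V": {"VH"},             "Sal I": {"VH"},
    "Hinc II": {"M", "H", "VH"},  "Taq I": {"L", "M", "H"},
    "Hind III": {"M", "H"},       "Xba I": {"M", "H", "VH"},
    "Kpn I": {"L"},               "Xho I": {"M", "H", "VH"},
}

SALT_ORDER = ["L", "M", "H", "VH"]


def shared_buffers(*enzymes):
    """Return the buffers (in order of salt strength) listed for every enzyme."""
    common = set(SALT_ORDER)
    for enzyme in enzymes:
        common &= BUFFERS[enzyme]
    return [buffer for buffer in SALT_ORDER if buffer in common]


if __name__ == "__main__":
    print(shared_buffers("EcoR I", "BamH I"))
    print(shared_buffers("Kpn I", "Sal I"))
```

An empty result, as for Kpn I with Sal I, means the table lists no common buffer for that pair.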

Appendix V

1 kb Ladder (Gibco BRL)

Sizes of bands in descending order (bp)

12,000
11,000
10,000
9,000
8,000
7,000
6,000
5,000
4,000
3,000
2,000
1,600
1,000
500
400
350
200 (doublet)
150 (doublet)
74

Appendix VI

Oligonucleotide Sequences

NAME 5' SEQUENCE 3' TD °C

JKP4 94 ATGGCATGGAAGCTTATGTCAA 66.5

JKP37 CAACAGTGTGTCTCCAACTTC

JKP-11 GTATGCCATAAATCAGACTTT

JM27 TACATTACATTGCTGCTGCAGA 64.3

RKIX1B TGCACATTCGGTACTGAGTA 59.1

RKIX1C CCAAGAACCAACTGGAAATA 59.8

RKIXgap5' ACTCGAGTTGTTGGTGGAGA 62

gap299B/S ATCACAGATCCGTTGACCTG 62.2

NOT1RK CTAGACCGCGGCCGCGGG

NOT2RK CTAGCCCGCGGCCGCGGT

The following oligonucleotides were used mainly for sequencing. MIX1-7 correspond to sequences in the first 7 exons of mouse factor IX. For oligos beginning RK-, the first digit (9) refers only to factor IX, and the second digit (1-8) refers to the exon in which the sequence appears. Oligos that have 9 or 0 as the second digit also correspond to exon 8. Oligos denoted (i) correspond to intronic sequence. The dissociation temperature (TD) is shown only for those oligos used in the polymerase chain reaction.
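The RK- naming convention can be written as a small decoder. This is a minimal sketch based only on the rules stated in the paragraph above; the function name `describe_rk_oligo` and the returned dictionary are hypothetical and do not come from the thesis.

```python
import re

# "RK", the factor IX digit 9, the exon digit, an optional serial suffix
# such as ".4" or "1", and an optional "(i)" intron marker.
RK_PATTERN = re.compile(r"RK9(\d)([\d.]*)\s*(\(i\))?")


def describe_rk_oligo(name):
    """Decode an RK- oligo name into its exon number and intron flag."""
    match = RK_PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not an RK9 oligo name: {name!r}")
    second_digit = int(match.group(1))
    # A second digit of 9 or 0 also denotes exon 8.
    exon = 8 if second_digit in (9, 0) else second_digit
    return {"exon": exon, "intronic": match.group(3) is not None}


if __name__ == "__main__":
    for oligo in ("RK92.4(i)", "RK96", "RK99.0", "RK901"):
        print(oligo, describe_rk_oligo(oligo))
```

Names outside the RK9 series, such as MIX1 or RKIX1B, raise a `ValueError` rather than being guessed at.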

MIX1 GCTCTCATCACCATCTTCCT

MIX2 CCTTGATCGTGAAAATGCCA

MIX3 CTGAATTTTGGAAGCAGTAT


MIX4 TGGAGATCAGTGTGAATCAA

MIX5 TGCAACGTGTAACATTAAAA

MIX6 TTCCATTTCCATGTGGGAGA

MIX7 GCAACAACCTCAATTTTATC

RK91.1 GGATATCTACTCAGTACCGA

RK92 GGTTTCCTCGAACAAACTC

RK92.4(i) GCCTCTGGCACATACTTAC

RK92.5(i) CCTAATACTAAAGAACTATAC

RK92.6 GGTATTTCTACTCTCTGCC

RK92.7(i) TTCTACTCTCTGCCTCTGGC

RK93(i) CTGTACAACCTCTCTATATAGG

RK93.3(i) CACTATCATTAAGCTGTCC

RK94 GGCACCAGCATTCATAGGAAC

RK94.4(i) GGAAAGTTGAATATGAATAC

RK94.5(i) TCCTACCCTCTTAAATCTC

RK94.6(i) TCTGAATTAGGTAAGTAAC

RK95 CTGGTCTTATGCAACTTGG

RK95.2 TTGGTATCCCTCAGTGCAG

RK95.3(i) GGATTTAGATGCAAGTTTCTG

RK95.4(i) TCAAGTTGAAGCCAATTCG

RK96 AGATGACATCACTGATGG

RK96.2 CAGTGACGTTATTAAGAATG

RK96.3(i) GCATTTCTATATGCCACGG 61.2

RK96.4 GCCCAAGGGATTTGACCCGG


RK96.5 CCGGGTCAAATCCCTTGGCAG

RK96.6 ATAAAGTACCTGCCAAGGGA 61.2

RK97 ATTCTGTGGAGGTGCCATCA

RK97.2(i) CGACTCTAGAGGATCTGGG 60.1

RK97.3(i) CTAAGTAACAACACCGGTG 60

RK98 AGCAAGGCAATGTCATGACT

RK98.2 GGGAAAAGTCTTCAACAAAGG

RK98.3 GGGACAAGTTTCTTAACTGGC

RK98.4 GAACCATCACGCCTTATGGG

RK98.5 CAATTCTCCCTCCTTGGCAGC

RK98.6 CAGGAGAGGAAGTCATGCT

RK98.7 CTTTTCCAACTCCCAAACC

RK98.8 ACCGGGTTTCTCTGTGTACC

RK98.9 CTATATGAGTTCCAGGACA

RK99.0 CAGGCAGTTTTTAACCTAG

RK99.1 GGAACGTTAGCAGAGTCAGC

RK99.2 GGGATGCTGACCATCCGACC

RK99.3 GGCAGTTGTTACAAGTAAGG

RK99.4 CCTTACTTGTAACAACTGCC

RK99.5 GGTTGGAACTGGAGAAGGC

RK99.6 GGTCGGATGGTCAGCATCCC

RK99.7 GTCACTGCCACCTGCCCCC

RK99.8 GCAAATGGCCAACTGACTTG

RK99.9(i) CATCTCATCGGACACATTCT


RK901 CATCACCCATTTTCAATTCC

RK902 CGTGTAAGTCCAGCCCTTG

RK903 CATGTGGCTCTATCCACCA

RK904 GAACCAAACTTGAGGAAGA

RK905 CATTGCCTTGCTGGAACTG

RK906 CATGGGGTCCCCCACTATCT

RK907 GATCCTAGATGGGCTTGTCT

RK908 ATCCAGAATCTTTAACCC

RK909 CGAACTATCCCTCATCACCA
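Some of the oligos listed in this appendix are exact reverse complements of one another, for example RK99.3 with RK99.4 and RK99.2 with RK99.6. That would be consistent with sequencing the same region on both strands, although the appendix does not say so. The following minimal sketch checks a few of the listed sequences; the helper names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Four sequences copied from the oligonucleotide list.
OLIGOS = {
    "RK99.2": "GGGATGCTGACCATCCGACC",
    "RK99.3": "GGCAGTTGTTACAAGTAAGG",
    "RK99.4": "CCTTACTTGTAACAACTGCC",
    "RK99.6": "GGTCGGATGGTCAGCATCCC",
}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def complementary_pairs(oligos):
    """List name pairs whose sequences are exact reverse complements."""
    names = sorted(oligos)
    return [
        (first, second)
        for index, first in enumerate(names)
        for second in names[index + 1:]
        if reverse_complement(oligos[first]) == oligos[second]
    ]


if __name__ == "__main__":
    print(complementary_pairs(OLIGOS))
```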
