University of Windsor Scholarship at UWindsor

Electronic Theses and Dissertations Theses, Dissertations, and Major Papers

2006

Characterization of cathepsin L genes and their cDNAs in the brine shrimp, Artemia franciscana.

Jian Ping Cao University of Windsor


Recommended Citation Cao, Jian Ping, "Characterization of cathepsin L genes and their cDNAs in the brine shrimp, Artemia franciscana." (2006). Electronic Theses and Dissertations. 1398. https://scholar.uwindsor.ca/etd/1398

This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please contact the repository administrator via email ([email protected]) or by telephone at 519-253-3000 ext. 3208.

CHARACTERIZATION OF CATHEPSIN L GENES AND THEIR cDNAs IN THE BRINE SHRIMP, ARTEMIA FRANCISCANA

By:

JianPing Cao

A Thesis

Submitted to the Faculty of Graduate Studies and Research

through the Department of Biological Sciences

in Partial Fulfillment of the Requirements for

the Degree of Master of Science at the

University of Windsor

Windsor, Ontario, Canada

2006

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Library and Archives Canada
Published Heritage Branch
395 Wellington Street
Ottawa ON K1A 0N4
Canada

ISBN: 978-0-494-17101-1

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Cao JianPing

Copyright© 2006

All rights reserved

ABSTRACT

Embryos of the brine shrimp, Artemia franciscana, contain a novel cysteine protease composed of a cathepsin L-like catalytic subunit of 28.5 kDa and a 31.5 kDa cell adhesion protein of the FAS-1 family. The cathepsin L-like subunit is encoded by a cDNA derived from the CL-1 gene in A. franciscana, which has been shown to be intron-less. The CL-1 gene was detected in genomic DNA prepared from nuclei of Artemia, and a second CL gene (CL-2) was obtained from a genomic library in EMBL3. Screening of Artemia adult and embryo cDNA libraries yielded sequences homologous to the CL-2 gene and confirmed that the CL-2 gene is also intron-less. Artemia adult CL-2 cDNA was found to contain two distinct open reading frames encoding the pro-peptide and the mature protease. Several potential transcription factor binding sites were identified in the 5’ upstream sequence of the CL-2 gene, suggesting that this region may contain a functional promoter.

Dedicated to Dr. A. H. Warner

ACKNOWLEDGEMENT

I wish to thank my supervisor, Dr. A. H. Warner, very much for providing me with the opportunity to work in his laboratory as a Master’s student. His advice and training during this period are very much appreciated.

I would also like to thank my committee members, Dr. J. Hudson from the Department of Biological Sciences and Dr. B. Mutus from the Department of Chemistry and Biochemistry, for taking the time to review my thesis.

Many thanks are also due to the graduate students and the technical support staff of this department for their help.

TABLE OF CONTENTS

ABSTRACT

DEDICATION

ACKNOWLEDGEMENT

LIST OF FIGURES

INTRODUCTION

1. Artemia

2. Cysteine protease

3. Papain family

4. Cathepsin L

5. Pro-region of cathepsin L

6. Intracellular protease targeting

6.1 Mannose-6-phosphate dependent process

6.2 Mannose-6-phosphate independent process

7. Human cathepsin L gene

8. Cathepsin L genes of the shrimp Penaeus vannamei

9. Cathepsin L-like cysteine protease genes of Leishmania

10. Cathepsin L gene of Fasciola gigantica

11. Artemia franciscana cathepsin L

12. Cysteine protease inhibitors

13. Objectives

MATERIALS

METHODS

1. Isolation of cDNA clones coding for Artemia franciscana embryo cathepsin L

2. Construction of [32P]-labeled CL cDNA probe

3. Purification of PCR products

4. Cloning of PCR products

4.1 DNA ligation reaction

4.2 Transformation of competent Escherichia coli cells

5. Characterization of Artemia cathepsin L-1 gene isolated from genomic DNA

5.1 Isolation of Artemia nuclei

5.2 Isolation of Artemia franciscana genomic DNA

6. Screening of recombinant DNA clones for Artemia franciscana cathepsin L nucleotide sequences

7. Isolation of plasmid DNA from CL-positive Artemia franciscana clones

8. Screening of an Artemia franciscana genomic DNA library in EMBL3

9. Isolation of lambda EMBL3 phage containing putative cathepsin L genomic DNA

10. Restriction analysis and Southern blotting of putative CL clones derived from the genomic DNA EMBL3 library

11. Polymerase chain reaction using phage DNA clones as substrate

12. Amplification of Artemia cathepsin L genes

13. DNA sequencing of Artemia cathepsin L clones

14. PCR analysis of Artemia franciscana adult cDNA library for the presence of cathepsin L sequence

15. PCR analysis of Artemia embryonic cDNA library for additional CL cDNAs

16. Analysis of Artemia franciscana genomic DNA in phage EMBL3 for additional CL genes

17. Attempt to identify the putative promoter sequence of Artemia franciscana CL genes

RESULTS

1. Isolation of cDNA clone coding for Artemia embryo cathepsin L

2. Isolation and analysis of cathepsin L-like clones from an Artemia franciscana genomic DNA library

3. Identification of DNA sequence in putative CL genomic clones from Artemia franciscana

4. PCR analysis of the Artemia franciscana phage DNA library

5. Attempt to amplify Artemia franciscana cathepsin L-1 and L-2 genes from different preparations of genomic DNA

6. Isolation of a cathepsin L cDNA from an Artemia adult cDNA library

7. Isolation of a cathepsin L cDNA representing the CL-2 gene from the Artemia embryo cDNA library

8. Identity of 5’ upstream sequences of Artemia franciscana CL genes

DISCUSSION

Appendix 1: Primers used in PCR experiments

Appendix 2: Primers designed on Artemia embryo CL-1 cDNA sequence

Appendix 3: Primers designed on Artemia genomic clone 9C sequence

REFERENCES

VITA AUCTORIS

LIST OF FIGURES

Figure 1. Analysis of a cDNA in vector pCR2.1 coding for Artemia franciscana embryo cathepsin L

Figure 2. EcoR I digestion of Artemia franciscana genomic DNA clones in λ EMBL3

Figure 3. PCR products from the use of putative CL phage DNA clones as substrate

Figure 4. Alignment of genomic clone A1 with Artemia embryo cathepsin L cDNA

Figure 5. Alignment of genomic clone A1 with Artemia cathepsin L genomic clone 9C (CL-2 gene)

Figure 6. Comparison of gene structure of Artemia cathepsin L cDNA and genomic clone 9C

Figure 7. PCR analysis of total DNA from Artemia franciscana EMBL3 genomic DNA library

Figure 8. EcoR I restriction endonuclease digestion of PCR-generated fragments from EMBL3 DNA cloned into pCR2.1

Figure 9. Alignment of PCR-derived clone 818 with Artemia cathepsin L genomic clone 9C (CL-2 gene)

Figure 10. Sequence alignment of PCR-derived clone 818 from EMBL3 genomic DNA library with Artemia embryo cathepsin L cDNA

Figure 11. PCR products from Artemia genomic DNA prepared from various sources

Figure 12. EcoR I digestion of PCR product shown in Fig. 11 after cloning in pCR2.1

Figure 13. Sequence alignment of DNA genomic clone 4271 with Artemia embryo cathepsin L cDNA

Figure 14. Comparison of PCR products obtained from an Artemia adult cDNA library and Artemia clone 9C representing the CL-2 gene

Figure 15. Comparison of DNA sequences derived from an Artemia franciscana adult cDNA library and Artemia CL-2 gene (clone 9C) by PCR

Figure 16. Open reading frames in Artemia adult cathepsin L cDNA sequence

Figure 17. PCR products derived from Artemia embryonic cDNA library

Figure 18. Comparison of DNA sequences derived from an Artemia franciscana embryo cDNA library and Artemia CL-2 gene (clone 9C) by PCR

Figure 19. Analysis of open reading frame in Artemia embryo cathepsin L-2 cDNA sequence

Figure 20. PCR products obtained using degenerate primers and Artemia genomic clone 9C as substrate

Figure 21. EcoR I digestion of PCR product using OPC-4 and CLR10 as primers shown in Fig. 20 after cloning in pCR2.1

Figure 22. Comparison of DNA sequences of Artemia genomic clone 9C and its 5’ upstream region with Artemia franciscana adult CL-2 cDNA and embryo CL cDNA

Figure 23. Analysis of 5’ upstream sequence of Artemia genomic clone 9C representing the CL-2 gene

Figure 24. Putative transcription factor binding sites in the 5’ upstream region of Artemia genomic clone 9C

Figure 25. Comparison of the deduced partial amino acid sequence of Artemia CL-2 with Artemia CL-1 cDNA

Figure 26. Comparison of the deduced amino acid sequence of Artemia CL-1 and CL-2 cDNA with cathepsin L sequences from other organisms

Introduction

1. Artemia

Artemia franciscana is an anostracan crustacean, commonly known as the brine shrimp. The entire body of Artemia is covered with a thin, flexible exoskeleton of chitin to which muscles are attached internally. Artemia is found on six continents and comprises bisexual and parthenogenetic strains. In North America only bisexual species are found, whereas in Europe, Asia and Africa both types of population occur (Criel and MacRae, 2002). Females of most Artemia strains reproduce either ovoviviparously or oviparously, releasing nauplius larvae or encysted embryos, respectively (Jackson and Clegg, 1996; Liang and MacRae, 1999). Artemia lives in saline environments and tolerates large changes in salinity, ionic composition, temperature and oxygen tension. Under ideal environmental conditions Artemia tends to reproduce ovoviviparously, whereas under adverse conditions it reproduces oviparously (Criel and MacRae, 2002).

In advance of over-wintering conditions, adult Artemia females secrete a chitinous material that forms a hard-shelled cyst around the fertilized egg in the ovisac. Encysted embryos enter a state known as diapause, during which metabolic activity is arrested and the embryo undergoes dehydration (Clegg and Conte, 1980). When environmental conditions become suitable, the cysts rehydrate and resume metabolic activity and their developmental program.

Among the macromolecules of Artemia embryos, proteolytic enzymes play important roles in development. They are key enzymes in hatching, yolk platelet degradation, control of protein synthesis, and proenzyme activation reactions (Criel and MacRae, 2002). In general, proteolytic enzymes are divided into five groups based on the reactive nucleophile in the catalytic site of the enzyme: cysteine, serine, aspartic and metallo-proteases, and a group of unclassified peptidases. In embryos of Artemia franciscana, a cysteine protease is the dominant proteolytic enzyme.

2. Cysteine protease

Cysteine proteases can be isolated from prokaryotic and eukaryotic organisms. They are part of a family of hydrolytic enzymes that require an SH group in their active site. Highly conserved Cys and His residues in the active site form a thiolate-imidazolium ion pair that mediates catalysis (Storer and Menard, 1994). The structure is stabilized by a highly conserved Asn, while a highly conserved Gln forms the oxyanion hole, a crucial element in forming an electrophilic center that stabilizes the tetrahedral intermediate during hydrolysis (Sajid and McKerrow, 2001).

The cysteine proteases have been evolving for at least three billion years (Barrett and Rawlings, 2001). During this period, an ancestral peptidase may have diversified first into a family with detectable similarity in amino acid sequence, and then into a cluster of families that differ in many ways. Despite these differences in amino acid sequence, the families in a cluster are recognizably related through the conservation of their protein fold. Such a group of related families is called a “clan”.

Conventionally, proteases are assigned to clans and families according to a number of characteristics, including sequence similarity, the possession of inserted peptide loops, and biochemical specificity for small peptide substrates. More recent classification relies on sequence homology in the regions directly spanning the catalytic cysteine and histidine residues (Sajid and McKerrow, 2001).

Clan CA is the largest clan of cysteine proteases, containing about half of all cysteine protease families. It includes the families of papain (C1), calpain (C2), streptopain (C10) and the ubiquitin-specific peptidases (C12, C19). Among them, the papain family is the largest family of cysteine proteases identified (Barrett and Rawlings, 2001).

3. Papain family

Cysteine peptidases of clan CA family C1 (papain family) are found in the animal and plant kingdoms as well as in some viruses and prokaryotes (Bernd 2003). They exist predominantly within the endosomal and lysosomal compartments of cells, and include cathepsins B, L, H, S, K, F, V, X, W, O and C. These cathepsins fall into two subfamilies: the cathepsin B-like subfamily and the cathepsin L-like subfamily. The main differences between the two subfamilies are the insertions in cathepsin B, including the “occluding loop”, and the much shorter proregion of cathepsin B compared with other cysteine proteases (Carmona et al. 1996; Musil et al. 1991).

These enzymes share a common architecture with three catalytic residues: Cys25, His159 and Asn175. The nucleophilic active-site cysteine is ionized independently of substrate binding, which makes these and other cysteine proteases a priori active (Polgar and Halasz, 1982). Except for cathepsin C, all cysteine cathepsins are monomers consisting of two domains, R (right) and L (left), which are arranged in a V-shaped configuration as shown below. The active-site cysteine and histidine residues are positioned at the top of the cleft between the domains. The left domain (towards the N-terminus) is largely α-helical, with a long helix running through the middle of the molecule. The right domain (towards the C-terminus) supports the catalytic histidine and contains a β-barrel structure (Barrett and Rawlings, 2001).

Structure of mammalian cathepsin L.

A, catalytic residue Cys25; B, catalytic residue His159; C, β-barrel structure in the right domain; D, α-helix structure in the left domain (taken from Turk and Guncar, 2003).

Most enzymes of the papain family are relatively small proteins with Mr values in the range 20,000-35,000 (Brocklehurst et al. 1987; Polgar, 1989; Rawlings and Barrett 1994; Berti and Storer, 1995), whereas cathepsin C is an oligomeric enzyme with an Mr of approximately 200,000 (Metrione et al. 1970; Dolenc et al. 1995). Cathepsin C is a dipeptidyl aminopeptidase rather than an endopeptidase like the other cathepsins (Kirschke et al. 1995). Cathepsins B, H and L are ubiquitous in mammalian lysosomes, whereas cathepsin S has a more restricted localization (Barrett and Kirschke, 1981; Kirschke and Wiederanders, 1994).

Most lysosomal cysteine proteases are synthesized as 30-50 kDa prepro-enzymes, processed in the ER, and then directed into lysosomes, where they function in protein hydrolysis. After processing, the mature enzymes have molecular masses in the range of 20-35 kDa. Conversion of the prepro-enzyme into the active enzyme involves two steps: removal of the prepro-region, and one or more limited proteolytic cleavages within the polypeptide backbone and at the N- and C-termini (Mach et al. 1994; Menard et al. 1998).

The functions of papain-like cysteine proteases differ among organisms. Plant papain proteases are mainly used to mobilize storage proteins in seeds. Protein bodies in seeds contain both storage proteins and protease precursors; the latter become activated after germination and begin to degrade the stored proteins (Schlereth et al. 2001). Most parasitic papain-like cysteine proteases act extracellularly, and they are important in the parasite life cycle for invading tissues and cells, obtaining nutrients, hatching and even evading the host immune system (Sajid and McKerrow, 2002). Primitive organisms that depend on phagocytosis use papain-like proteases to digest phagocytosed proteins; in these organisms the enzymes are already packaged into lysosomes or acidified lysosome-like structures (Volkel et al. 1996; Krasko et al. 1997; Gotthardt et al. 2002). In mammals, papain-like proteases are primarily lysosomal enzymes; only cathepsin W seems to be retained in the endoplasmic reticulum (ER) (Wex et al. 2001). Cathepsins B, C, H, L and O are found in the lysosomes of nearly all tissues and cells and thus probably fulfil housekeeping functions. Other cysteine proteases show a more restricted tissue distribution, suggesting more specialized functions (Wiederanders, 2003).

Several mechanisms regulate the activity of papain-like cysteine proteases, the most important being pH and endogenous inhibitors. At low pH (2.8-3.8), mature cysteine proteases, mainly cathepsins B, S and L, are irreversibly denatured (Turk et al. 1995). Endogenous protein inhibitors include the cystatins (stefins, cystatins and kininogens) (Turk et al. 1997), the thyropins (thyroglobulin type-1 domain inhibitors) (Lenarcic et al. 1998), and the general protease inhibitor α2-macroglobulin (Mason et al. 1989).

Among members of the papain family, cathepsins B, H, L and S have been studied extensively because they have been implicated in a variety of physiological processes, such as proenzyme activation (Eeckhout and Vaes, 1997; Shinagawa et al. 1990; Kobayashi et al. 1991), enzyme inactivation (Bond and Barrett, 1980), antigen presentation (Takahashi et al. 1989; Roche and Cresswell, 1991; Michalek et al. 1992), hormone maturation (Uchiyama et al. 1989), and tissue remodelling and bone matrix resorption (Delaisse et al. 1980; Maciewicz et al. 1990; Guinec et al. 1993; Blondeau et al. 1993).

4. Cathepsin L

Cathepsin L (CL) is a lysosomal cysteine protease in mammals. It is the most active of all cysteine proteases in degrading protein substrates such as azocasein, collagen or elastin in vitro (Barrett and Kirschke, 1981; Maciewicz et al. 1987; Mason et al. 1989). CL was first purified from rat liver lysosomes by Bohley and colleagues (1971) and was subsequently designated cathepsin L, the “L” denoting lysosomes.

Some cathepsin L-like cysteine proteases have been detected in non-lysosomal compartments of eukaryotic cells and embryos. Cathepsin L-like cysteine proteases occur in the cytoplasm of unfertilized eggs and around yolk granules in amphibian and fish embryos (Miyata et al. 1995, 1998; Kestemont, 1999). A cytoplasmic cathepsin L-like protease is required for gastrulation in Xenopus (Miyata and Kubo, 1997). CL is also found in embryos of the flesh fly, Sarcophaga peregrina, where it is involved in the differentiation of (cultured) imaginal disks (Homma and Natori, 1996), and in Artemia franciscana embryos (Warner et al. 1995; Warner and Matheson, 1998). A cathepsin L isoform devoid of a signal peptide has been detected in the nucleus of murine NIH3T3 cells (Goulet et al. 2004).

As a member of the papain family, cathepsin L has been shown to play a role in a variety of intracellular and extracellular processes, including antigen presentation (Villadangos et al., 1999), prohormone activation (Marx, 1987), sperm maturation (Erickson-Lawrence et al., 1991), bone resorption (Delaisse and Vaes, 1992) and extracellular matrix (ECM) remodeling (Yamada et al., 2000).

Over-expression of cathepsin L has been implicated in several diseases. Many human tumors, including cancers of the kidney, testis, lung, colon, breast, adrenal gland and ovary, express very high levels of cathepsin L (Chauhan et al., 1991). Cathepsin L can hydrolyze ECM components such as collagen IV, fibronectin and laminin, suggesting that it can degrade the ECM and thereby enable tumor cell proliferation and invasion into surrounding tissues and the vasculature. After entering blood vessels, malignant cells can metastasize by extravasation into other tissues (Koblinski et al. 2000; Szpaderska and Frankfater, 2000; Sever et al. 2002). Over-expression of cathepsin L has also been implicated in a variety of inflammatory diseases, such as inflammatory myopathies, rheumatoid arthritis and periodontitis (Berdowska, 2004).

5. Pro-region of cathepsin L

The cysteine proteases known as cathepsin L are synthesized as prepro-enzymes that require processing (cleavage of the N-terminal fragment) to become catalytically active at pH 3.0-6.5. The pro-region contains two conserved motifs. The first is the GNFD motif (Gly-X1-Asn-X1-Phe-X1-Asp); a similar motif is also found in the pro-region of the cathepsin B group (Ishidoh et al. 1987a; Vernet et al. 1995). Some researchers have proposed that a change in the charge state of the GNFD motif could trigger processing of the proenzyme, and that the motif may participate in the pH regulation of processing (Vernet et al. 1995). The second conserved motif is the ERFNIN motif (Glu-X3-Arg-X3-Phe-X2-Asn-X1-Tyr-X2-Asp), which distinguishes cathepsin L from cathepsin B (Karrer et al. 1993). The ERFNIN motif is required to maintain the three-dimensional structure of the CL pro-region (Coulombe et al. 1996; Cygler and Mort, 1997).
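The motif notation used above (three-letter residue codes separated by hyphens, with Xn standing for exactly n residues of any kind) maps directly onto a regular expression. The following is a minimal Python sketch of that translation and of a motif search in a protein sequence. It is illustrative only: the motif strings are copied from the text, the helper names `motif_to_regex` and `find_motif` are hypothetical, matching is strict (real pro-regions often deviate at one or more conserved positions, so practical comparisons tolerate substitutions), and the test sequence is an invented toy string, not a cathepsin L sequence.

```python
import re

# Standard three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Motifs as written in the text ("Xn" = exactly n residues of any kind).
GNFD = "Gly-X1-Asn-X1-Phe-X1-Asp"
ERFNIN = "Glu-X3-Arg-X3-Phe-X2-Asn-X1-Tyr-X2-Asp"


def motif_to_regex(motif: str) -> str:
    """Turn hyphen-separated motif notation into a regex string."""
    parts = []
    for token in motif.split("-"):
        if token.startswith("X"):
            # Xn: any n residues.
            parts.append(".{%d}" % int(token[1:]))
        else:
            parts.append(THREE_TO_ONE[token])
    return "".join(parts)


def find_motif(motif: str, protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MKAGWNKFSDLL"  # invented toy sequence, not real data
    print(motif_to_regex(GNFD))    # G.{1}N.{1}F.{1}D
    print(motif_to_regex(ERFNIN))  # E.{3}R.{3}F.{2}N.{1}Y.{2}D
    print(find_motif(GNFD, toy))   # [(3, 'GWNKFSD')]
```

A regex keeps the fixed residues and variable spacers explicit; a search that allows mismatches at conserved positions would need a scoring approach instead of exact matching.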

The pro-region of cathepsin L can inhibit CL activity by covering the active-site cleft in a non-productive orientation (Carmona et al. 1996; Volkel et al. 1996). Inhibition by the pro-region shows a pH dependence very similar to that required for processing of procathepsin L, and the N-terminal part of the pro-region is more important for inhibition than the C-terminal part (Carmona et al. 1996).

6. Intracellular protease targeting

6.1 Mannose-6-phosphate dependent process

Mannose-6-phosphate markers that target enzymes to lysosomes or acidified vesicles can be found in the pro-peptide domain as well as in the catalytic domain of the mature protease. The marker is generated when UDP-GlcNAc:lysosomal enzyme N-acetylglucosamine-1-phosphotransferase transfers phospho-GlcNAc (N-acetylglucosamine phosphate) to mannose residues of cathepsin L. Removal of the terminal GlcNAc then exposes the mannose-6-phosphate group, which binds cathepsin L to mannose-6-phosphate receptors for transport to the lysosomes (Wiederanders, 2003).

6.2 Mannose-6-phosphate independent process

A mannose-6-phosphate independent process for trafficking of lysosomal proteins has also been suggested. A conserved nine-amino-acid peptide motif, located close to the N-terminus of the pro-peptides, was identified in this alternative trafficking process (Huete-Perez et al. 1999). No receptor recognizing the motif has been identified so far, although a 43 kDa integral lysosomal membrane protein has been described that binds mouse procathepsin L in a pH-dependent manner (McIntyre and Erickson, 1993).

7. Human cathepsin L gene

Human cathepsin L is involved in many diseases, and high levels of expression of the cathepsin L gene are found in various types of tumors and cancers (Izabela 2004). The human cathepsin L gene is located on chromosome 9q21-22 (Chauhan et al. 1993) and encodes five mRNA species, namely hCATL A, AI, AII, AIII and hCATL B, which differ in their 5’ untranslated regions (5’-UTRs). Of these, hCATL A, AI, AII and AIII are produced by alternative splicing of the same primary transcript (Arora and Chauhan, 2002). The hCATL AI, AII and AIII variants lack 27, 90 and 145 bases, respectively, from the 3’ end of exon 1 (Rescheleit et al. 1996), whereas the hCATL B variant includes 182 bases from the 3’ end of intron 1 of the primary transcript (Chauhan et al. 1993). Interestingly, hCATL AIII is the most efficiently translated isoform, and its predominance in malignant cells suggests that it plays a key role in the over-expression of human cathepsin L in cancer (Arora and Chauhan, 2002).

A 5’-flanking region of the human cathepsin L gene extending 3263 bp upstream of the translation start site was identified previously (Jean et al. 2002). The promoter of the human cathepsin L gene is TATA-less, and this 5’ upstream region contains about forty-three potential transcription factor binding sites (Seth et al. 2003). One 50 bp region located 60 bp from the putative transcription initiation site was found to contain transcription factor binding sites required for cathepsin L promoter activity. This region contains a CCAAT motif and two GC boxes. The CCAAT motif can bind the transcription factor NF-Y (also named CBF or CP1), and the two GC boxes can bind the transcription factors Sp1, a general activator of transcription, and Sp3, which acts either as an activator or as a repressor of Sp1-mediated activation.
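Searching an upstream region for elements such as the CCAAT motif and GC boxes can be sketched as a simple string scan on both DNA strands (NF-Y recognizes CCAAT in either orientation, which appears as ATTGG on the given strand). The Python sketch below makes several simplifying assumptions: it uses exact-match core sequences (CCAAT for NF-Y and GGGCGG as a minimal GC-box core for Sp1/Sp3), whereas dedicated binding-site prediction scores candidate sites with position weight matrices and would report more, and weaker, sites. The names `ELEMENTS` and `scan_promoter` are hypothetical, and the toy sequence is invented for illustration; it is not part of the human or Artemia promoter.

```python
import re

# Simplified exact-match core sequences (assumption: real binding-site
# prediction scores sites with position weight matrices instead).
ELEMENTS = {
    "CCAAT box (NF-Y)": "CCAAT",
    "GC box (Sp1/Sp3)": "GGGCGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_promoter(seq: str) -> list[tuple[str, str, int]]:
    """Report (element, strand, 0-based start) for every core-site match.

    '-' strand hits are found by searching the given sequence for the
    reverse complement of the core, so positions are forward-strand
    coordinates. A zero-width lookahead lets overlapping sites be reported.
    """
    seq = seq.upper()
    hits = []
    for name, core in ELEMENTS.items():
        rc = reverse_complement(core)
        # Palindromic cores would otherwise be counted twice.
        strands = [("+", core)] if rc == core else [("+", core), ("-", rc)]
        for strand, pattern in strands:
            for m in re.finditer(f"(?={pattern})", seq):
                hits.append((name, strand, m.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    toy = "TTCCAATGGGGCGGGATTGGA"  # invented toy sequence, not a real promoter
    for hit in scan_promoter(toy):
        print(hit)
```

For the toy sequence, the scan reports a CCAAT box on the forward strand at position 2, a GC box at position 8, and a reverse-orientation CCAAT box (ATTGG) at position 15.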

8. Cathepsin L genes of the shrimp Penaeus vannamei

The structure of the P. vannamei cathepsin L gene was determined by Le Boulay et al. (1998). This gene expresses a cathepsin L-like enzyme in the hepatopancreas and consists of six exons, which together with their intervening introns span 1792 bp of genomic DNA. Exon 1 encodes 14 amino acids of the ER signal sequence and the first 12 amino acid residues of the pro-peptide. The remainder of the pro-peptide is encoded by exon 2 and the first 90 bp of exon 3, while the last 90 bp of exon 3 and all of exons 4-6 code for the mature region of the protease. Sequence polymorphism was also found in the last intron of the gene, giving rise to three variants of the enzyme. The structure of the P. vannamei cathepsin L gene is homologous to that of the rat cathepsin L gene (63%), but it contains fewer introns; three intron positions are conserved between the two genes (Le Boulay et al. 1998). In contrast, little or no similarity has been found between Drosophila or Plasmodium cathepsin L-like genes and the rat cathepsin L gene.

9. Cathepsin L-like cysteine protease genes of Leishmania

The Leishmania donovani complex causes a variety of diseases, such as visceral leishmaniasis, which is a serious health problem in tropical and subtropical countries (Badaro et al. 1986; Evan et al. 1986; Tselentis et al. 1994). Cathepsin L-like cysteine proteases of the Leishmania donovani complex were first characterized by Mundodi et al. (2001). In Leishmania chagasi, the CL gene cluster has five tandemly arranged copies of the CL gene. The sequences coding for the pre-pro and mature regions of the protease are conserved in all five genes of the cluster except the last gene (Ldccys1E). The Ldccys1E gene is identical to the first gene, Ldccys1A, except for a deletion of 39 bases coding for 13 amino acids in the mature region, including an active-site histidine residue, and a truncated carboxyl-terminal extension. The CL gene organization of Leishmania chagasi is similar to that of Leishmania donovani. L. donovani possesses six CL genes, two of which (Ldccys1F and Ldccys1E) are identical except that Ldccys1E lacks the 39 bases in the mature region described above. The Ldccys1A and Ldccys1F proteases show cysteine protease activity in gelatin gels, whereas Ldccys1E is inactive.

10. Cathepsin L gene of Fasciola gigantica

The liver flukes Fasciola gigantica and Fasciola hepatica are causative agents of fascioliasis in humans, which has been considered an increasingly important chronic disease since 1980 (Chen and Mott, 1990; Esteban et al. 1998; Mas-Coma et al. 1999). Fasciola cathepsin L is involved in crucial biological functions such as host protein degradation, tissue penetration and immune evasion. For these reasons, the cathepsin L-like cysteine proteases of liver flukes have been investigated as immunodiagnostic antigens for fascioliasis (Yamasaki et al. 1989; Fagbemi and Guobadia, 1995; O'Neill et al. 1999) and as vaccine candidates (Wijffels et al. 1994a; Dalton et al. 1996b).

The structure of the Fasciola gigantica cathepsin L gene was characterized by Yamasaki et al. (2002). The gene consists of four exons and three introns spanning approximately 2.0 kb of genomic DNA. Exon 1 encodes 15 amino acid residues of the pro-region and the first 21 residues of the mature enzyme. Two of the three introns are in the same positions as in the mammalian cathepsin L gene. In the promoter region, two TATA boxes are located upstream of the transcription initiation site. The 5'-upstream region of the cathepsin L gene transcript is generated by cis-splicing. The ERFNIN motif was also found in F. gigantica cathepsin L, and the processing of procathepsin L is consistent with that found for mammalian cathepsin L.

11. Artemia franciscana cathepsin L

The major protease in Artemia embryos was identified twenty years ago as a cysteine protease, based on its inhibition by cysteine protease inhibitors (Warner and Shridhar 1985). Since then, research using Artemia embryos and larvae has shown that the cysteine protease activity is found mostly in non-lysosomal structures (Warner and Shridhar 1985; Lu and Warner 1991; Warner et al. 1995). The non-lysosomal Artemia cysteine protease has been implicated in yolk utilization and in remodeling of the extracellular matrix during early development. It has also been implicated in the regulation of larval molting (Warner et al. 1995; Warner and Matheson 1998).

Further study confirmed that the Artemia cysteine protease belongs to the cathepsin L group (Butler et al. 2001). This conclusion was based on assays using L-leucine-β-naphthylamide and two class-specific substrates. The cathepsin B substrate was N-α-Cbz-Arg-Arg-4-methoxy-β-naphthylamide. The cathepsin L substrate was N-Cbz-Phe-Arg-4-methoxy-β-naphthylamide. Artemia embryos contain at least seven isoforms of cysteine proteases, with pI values ranging from 4.6 to 6.2 (Butler et al. 2001). Using HPLC and isoelectric focusing, the Artemia embryo cysteine protease was found to be a unique heterodimeric protease of about 60 kDa. It is composed of two tightly associated polypeptides: a catalytic subunit of 28.5 kDa and a non-catalytic subunit of 31.5 kDa (Butler et al. 2001; Warner et al. 2004).

A cDNA encoding the 28.5 kDa catalytic subunit was isolated. Its nucleotide sequence was shown to have high homology with other cathepsin L-like cysteine proteases (Butler et al. 2001). The proenzyme has 338 amino acids, while the mature protein contains 217 amino acids. The first 20 amino acids of the proenzyme encode an endoplasmic reticulum secretory signal (Watson 1984; Nielsen et al. 1997; Butler et al. 2001).
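Taken together, these numbers imply a pro-region of about 101 residues. The sketch below lays out the implied domain boundaries. It assumes that the signal peptide, pro-region and mature enzyme are contiguous and that the mature protein occupies the C-terminal 217 residues; this is our assumption, not a statement from Butler et al. The sketch also notes that the 1014 bp coding length listed for A. franciscana CL-1 in the intron-exon illustration equals 338 codons. That equivalence holds if the figure excludes the stop codon, which is also an assumption.

```python
PROENZYME_AA = 338  # full-length preproenzyme (Butler et al. 2001)
MATURE_AA = 217     # mature catalytic enzyme
SIGNAL_AA = 20      # ER secretory signal peptide

def domain_boundaries(total, signal, mature):
    """Return 1-based inclusive (start, end) residue ranges for the signal
    peptide, pro-region and mature enzyme, assuming they are contiguous and
    the mature enzyme runs to the C-terminus."""
    pro = total - signal - mature
    if pro < 0:
        raise ValueError("signal + mature exceed total length")
    return {
        "signal": (1, signal),
        "pro": (signal + 1, signal + pro),
        "mature": (signal + pro + 1, total),
    }

domains = domain_boundaries(PROENZYME_AA, SIGNAL_AA, MATURE_AA)
print(domains)   # {'signal': (1, 20), 'pro': (21, 121), 'mature': (122, 338)}
print(1014 / 3)  # 338.0 codons, i.e. one per proenzyme residue
```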

Previous work in our lab demonstrated that the cathepsin L gene coding for the embryo CL is intron-less in Artemia franciscana (Matt Shaw, unpublished). The polymerase chain reaction was used to amplify the cathepsin L gene from Artemia franciscana genomic DNA, and the product had a sequence identical to that of the cDNA. More recently, the cathepsin L gene was identified in Artemia parthenogenetica, a parthenogenetic relative of Artemia franciscana, and found to contain one intron (Shamoon, unpublished). The A. parthenogenetica gene shares 98% identity with the Artemia franciscana cDNA. The exception is that the A. parthenogenetica CL gene contains a 1085 bp intron in the prepro-coding region. Artemia parthenogenetica is believed to have evolved more recently (5-6 mya) than Artemia franciscana. These observations therefore tend to support the "intron-late" theory of intron evolution.
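The comparison underlying these observations can be sketched as follows: it finds where a genomic sequence carries extra bases relative to its cDNA. This illustrative sketch assumes that the two sequences differ by exactly one inserted intron and nothing else. Real data, such as sequences with 98% identity, would need a proper alignment first. The toy sequences are hypothetical. Bases flanking an intron can repeat, so several placements may fit equally well. The sketch therefore prefers the placement that obeys the canonical GT...AG splice-site rule.

```python
def candidate_intron_positions(cdna, genomic):
    """All (start, end) placements, 0-based and end-exclusive in the genomic
    sequence, for which removing genomic[start:end] yields the cDNA exactly."""
    extra = len(genomic) - len(cdna)
    if extra <= 0:
        raise ValueError("genomic sequence must be longer than the cDNA")
    hits = []
    for start in range(len(cdna) + 1):
        if genomic[:start] == cdna[:start] and genomic[start + extra:] == cdna[start:]:
            hits.append((start, start + extra))
    return hits

def locate_intron(cdna, genomic):
    """Pick one placement, preferring the canonical GT...AG splice signals."""
    hits = candidate_intron_positions(cdna, genomic)
    if not hits:
        raise ValueError("sequences do not differ by a single insertion")
    for start, end in hits:
        if genomic[start:start + 2] == "GT" and genomic[end - 2:end] == "AG":
            return start, end
    return hits[0]

# Hypothetical toy sequences (not real Artemia data).
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTCCTTTCAG", "TTCGATTAA"
cdna = exon1 + exon2
genomic = exon1 + intron + exon2

print(candidate_intron_positions(cdna, genomic))  # [(7, 21), (8, 22), (9, 23)]
start, end = locate_intron(cdna, genomic)
print(start, end, genomic[start:end])  # 9 23 GTAAGTCCTTTCAG
```

The three equally valid placements show why splice-site consensus is needed to fix intron boundaries exactly.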

The illustration on page 13 shows the intron-exon structure of the cathepsin L gene in various organisms.

12. Cysteine protease inhibitors

An imbalance between endogenous proteases and protease inhibitors may lead to pathologies such as rheumatoid arthritis, multiple sclerosis, neurological disorders, osteoporosis and tumors (Berdowska and Siewinski, 2000). In addition, many studies have shown that endogenous protease inhibitors play an important role in development (Thiery 1984; Montesano et al. 1990; Matrisian and Hogan 1990). These inhibitors act by sterically blocking substrate access to the catalytic center of the enzyme (Rzychon et al. 2004). The cysteine protease inhibitors include cystatins, thyropins, chagasin, inhibitors of apoptosis proteins (IAPs) and staphostatins (Rzychon et al. 2004).

Cystatins inhibit the activity of the papain family of cysteine proteases found in viruses, plants and animals. On the basis of sequence homology, the cystatin superfamily is divided into three groups: stefins (family I), cystatins (family II) and kininogens (family III) (Otto and Schirmeister, 1997; Barrett et al. 1998; Grzonka et al. 2001). The cystatins bind to amino acids adjacent to the protease active site and obstruct the access of substrate. However, they do not interact directly with the catalytic center of the enzyme (Bode and Huber, 2000). Thyropins are protease inhibitors whose structure contains an arrangement designated as the thyroglobulin type-1 domain (Lenarcic et al. 1999). They have a similar mechanism of inhibition but are more specific than the relatively non-selective cystatins (Guncar et al. 1999; Bode and Huber 2000).

Intron and Exon Structure in the Cathepsin L Gene in Various Organisms

Organism: total gene length, bp (total exon length, bp); exon sizes, bp; intron sizes, bp
Homo: 5117 (1605); exons 280, 136, 123, 147, 225, 163, 118, 305; introns 1186, 345, 100, 166, 665, 550, 500
Mus: 7329 (1262); exons 57, 137, 123, 147, 225, 163, 118, 403; introns 1069, 715, 105, 765, 213, 2100, 1110
Penaeus: 1792 (976); exons 79, 153, 180, 192, 166, 206; introns 141, 97, 101, 166, 311
Dros.: 6836 (1418); exons 129, 117, 215, 957; introns 1309, 4049, 60
C. elegans: 2596 (1011); exons 258, 102, 447, 204; introns 474, 540, 571
A. par. (CL-1): ~2000 (1014); one intron of 1086
Leishmania**: 1326 (1326); single exon
Metapenaeus: 1094 (1094); single exon
A. fran. (CL-1): 1014 (1014); single exon

The organisms shown here are as follows (from top to bottom): human, mouse, Penaeus, Drosophila, C. elegans, A. parthenogenetica, Leishmania, Metapenaeus and A. franciscana. ** Present in five copies in tandem, each as a single exon, with slight differences at the 3'-end. The gene shown here is LclA.

Chagasin is a Trypanosoma cruzi protein that was recently characterized as a tight-binding inhibitor of papain-like cysteine proteases (Santos et al. 2005). It inhibits papain-like proteinases of bacterial, protozoan and mammalian origin (Monteiro et al. 2001; Rigden et al. 2002; Sanderson et al. 2003). IAPs (the inhibitor of apoptosis protein family) could function through direct inhibition of caspases (Rzychon et al. 2004). Staphostatin is a newly identified cysteine protease inhibitor. It has high specificity towards staphopains, which are bacterial papain-like cysteine proteases (Rice et al. 2001; Massimi et al. 2002; Rzychon et al. 2003). Recent studies have also shown that some novel inhibitor proteins are homologous to the pro-region of papain-like cysteine proteases. Examples include mouse cytotoxic T-lymphocyte antigen-2 (CTLA-2), which is homologous to the pro-region of mouse cathepsin L. Another example is Bombyx cysteine protease inhibitor (BCPI) in the silkmoth Bombyx mori (Yamamoto et al. 2002).

In dormant Artemia embryos, two kinds of cysteine (thiol) protease inhibitors have been identified, one dialyzable and the other non-dialyzable (Nagainis and Warner 1979; Warner and Shridhar 1985). Nothing is known about the nature of the dialyzable inhibitor, but the non-dialyzable inhibitors have been partially characterized (Nagainis and Warner, 1979). Using gel filtration and HPLC, three thiol protease inhibitors were identified and named TPI-1, TPI-2 and TPI-3 (Warner and Sonnenfeld-Karcz, 1992). TPI-2 and TPI-3 were found to be homogeneous by electrophoresis and by chromatography on a C-18 column. TPI-1 appeared to be heterogeneous and was composed of two components with molecular masses of 11.8 and 13.6 kDa. The Artemia embryo TPI proteins belong to the cystatin superfamily and are similar to the members of the type I cystatin family (stefins) (Warner and Sonnenfeld-Karcz, 1992).

13. Objectives

The objectives of this thesis are as follows:

1) to further elucidate the structure of the cathepsin L gene(s) in Artemia franciscana and determine whether more than one functional cathepsin L gene exists;
2) to characterize the cathepsin L gene(s) expressed in Artemia adult tissue, and determine whether they have properties similar to those of the cathepsin L gene(s) expressed in the Artemia franciscana embryo; and
3) to begin characterization of the promoter region of the cathepsin L genes in Artemia, to help elucidate the requirements for transcription.

Materials and Methods

Materials:

The Artemia franciscana genomic DNA library and cDNA libraries were prepared by L. Sastre (Madrid, Spain). The Artemia genomic DNA library is in bacteriophage lambda EMBL3 and was maintained in E. coli K802 cells. The Artemia adult cDNA library is in bacteriophage lambda gt11 and was maintained in E. coli Y1090r- cells. The Artemia embryo cDNA library is in bacteriophage lambda ZAP II and was maintained in E. coli K802 cells.

Artemia franciscana genomic DNA was isolated from Artemia cysts obtained from the Great Salt Lake, Utah, and purchased from the Sanders Brine Shrimp Company (Ogden, UT).

The Artemia franciscana cDNA clone representing the DNA sequence coding for Artemia embryo cathepsin L-1 was obtained previously in our lab (NCBI accession AF147207) (Butler et al. 2001). A genomic clone (9C) representing the cathepsin L-2 gene was isolated from the Artemia franciscana EMBL3 library. It was provided by Matt Shaw in our lab (NCBI accession AY557372; unpublished).

All PCR primers were synthesized by Sigma-Genosys (Oakville, ON). The TA Cloning Kit was purchased from Invitrogen (Burlington, ON). The 32P-dCTP used to make the CL probe was from PerkinElmer (Boston, MA). PCR reagents and the Wizard Miniprep Kit were both from Promega (Madison, WI). PCR products were purified using the Wizard DNA Clean-Up system (Promega) and the QIA Gel Extraction Kit (Qiagen, Mississauga, ON). Molecular grade chemicals were from Sigma Chem. Co. (Mississauga, ON). Restriction enzymes used in this study were obtained from Promega. The DNAzol Reagent was from GIBCO-BRL (Burlington, ON). Nitrocellulose membranes were obtained from Pall Life Sciences (East Hills, NY). X-ray film was obtained from Kodak (Rochester, NY). Sequencing was performed in one of two ways. In the first, the Thermo Sequenase Cy5.5 Dye Terminator Cycle Sequencing Kit (Amersham Biosciences, Baie d'Urfé, QC) was used with the departmental DNA sequencer (Visible Genetics). In the second, samples were sequenced by the Robarts Institute (London, ON).

Methods:

1. Isolation of cDNA clones coding for Artemia franciscana

embryo cathepsin L

A cDNA clone in Bluescript coding for Artemia franciscana embryo cathepsin L was obtained from our laboratory stock (Butler et al., 2001) and amplified in E. coli strain JM109 cells as follows. The transformed JM109 cells were grown in LB broth with ampicillin, and 1.5 ml stocks were stored at -80°C (see Butler et al. 2001). Aliquots of similar size were used to isolate plasmid DNA containing the embryo cathepsin L cDNA. Isolation used the small-scale alkaline lysis method (Birnboim and Doly, 1979; Ish-Horowicz and Burke, 1981), with modifications as described more recently (Sambrook et al. 1989).

Cultures (1.5 ml) were transferred to microcentrifuge tubes and centrifuged at 13,000 rpm for 2 minutes. The supernatant was removed, and the pellet was resuspended in 200 µl ice-cold Solution I (50 mM glucose, 25 mM Tris-Cl, pH 8.0, 10 mM EDTA, pH 8.0). Next, 200 µl Solution II (0.2 N NaOH, 1% SDS) was added and the tube was inverted five times. Then 200 µl Solution III (5 M potassium acetate 60 ml, glacial acetic acid 11.5 ml, H2O 28.5 ml) was added. The tube was inverted several times and then centrifuged at 13,000 rpm for 10 minutes. The supernatant was transferred to a clean tube, an equal volume of phenol:chloroform (1:1) was added, and the contents were mixed by vortexing. After centrifugation at 13,000 rpm for 2 minutes, the supernatant was transferred to a clean tube, and 2 volumes of 95% ethanol were added to precipitate the DNA. After standing for 30 minutes at room temperature, the plasmid DNA was collected by centrifugation (5 minutes, 13,000 rpm). It was then washed with 70% ethanol and air-dried for 10 minutes. Finally, the DNA pellet was resuspended in 50 µl TE (pH 8.0) containing RNase (20 µg/ml). It was then purified using the Wizard DNA Clean-Up system for use as template in making a [32P]-labeled CL cDNA probe.
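Solution III is formulated so that the final mix is 3 M with respect to potassium and 5 M with respect to acetate (Sambrook et al. 1989). The short sketch below reproduces that arithmetic with C1V1 = C2V2. It assumes that glacial acetic acid is about 17.4 M. That molarity is our assumption, based on a density of roughly 1.05 g/ml and a molar mass of 60.05 g/mol; it is not a figure from this thesis.

```python
# Solution III recipe from the plasmid miniprep above (volumes in ml).
KOAC_STOCK_M = 5.0        # 5 M potassium acetate stock
KOAC_ML = 60.0
GLACIAL_ACETIC_M = 17.4   # assumed molarity of glacial acetic acid
ACETIC_ML = 11.5
WATER_ML = 28.5

total_ml = KOAC_ML + ACETIC_ML + WATER_ML  # 100 ml final volume

def final_molarity(stock_m, stock_ml, final_ml):
    """C1V1 = C2V2 rearranged to give the final concentration C2."""
    return stock_m * stock_ml / final_ml

potassium_m = final_molarity(KOAC_STOCK_M, KOAC_ML, total_ml)
# Acetate comes from both the potassium acetate and the free acetic acid.
acetate_m = potassium_m + final_molarity(GLACIAL_ACETIC_M, ACETIC_ML, total_ml)

print(f"K+ = {potassium_m:.2f} M, acetate = {acetate_m:.2f} M")  # K+ = 3.00 M, acetate = 5.00 M
```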

2. Construction of [32P]-labeled CL cDNA probe

A [32P]-labeled cathepsin L probe was constructed by the polymerase chain reaction, using a previously cloned Artemia franciscana embryo cathepsin L cDNA as template. The PCR used primers CLF and CLR (see Appendices 1 and 2 for sequence details), each at a final concentration of 1 pmol/µl. These primers were designed from published nucleotide sequence information for Artemia cathepsin L cDNA (Butler et al. 2001). The reaction also contained 2 µl [α-32P]dCTP with 2 mM dGTP, dATP and dTTP and 0.4 mM dCTP. The other components of the reaction mixture were as follows:

- Bluescript plasmid containing the Artemia embryo cathepsin L cDNA as template (250 ng);
- 1x PCR reaction buffer containing 2.5 mM MgCl2;
- 5 units of Taq DNA polymerase.

The final volume was 50 µl. The PCR was performed under the following conditions: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 50°C for 1 minute and 72°C for 2 minutes; then 72°C for 10 minutes.

After the reaction, the PCR product was purified on a 1 x 5 cm Sephadex G-50 column using buffer E (10 mM Tris-Cl, pH 8.0, 50 mM NaCl, 1 mM EDTA, pH 8.0). The radioactivity in the PCR product was determined by liquid scintillation counting. The [32P]-labeled probe consisted of 1018 bp representing the prepro- and mature regions of the Artemia embryo cathepsin L cDNA (see Butler et al. 2001).
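The thermal profile above can be written down as data, which makes it easy to compare programs and estimate run length. This is a minimal sketch. The Step class and the function name are hypothetical, and the estimate covers only the programmed hold times.

```python
from dataclasses import dataclass

@dataclass
class Step:
    temp_c: float
    seconds: int

# Thermal profile used for the [32P]-labeled probe (section 2).
initial = [Step(94, 5 * 60)]
cycle = [Step(94, 60), Step(50, 60), Step(72, 120)]  # denature, anneal, extend
final = [Step(72, 10 * 60)]
N_CYCLES = 35

def hold_time_minutes(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; ramp times between temperatures,
    which depend on the thermocycler, are ignored."""
    seconds = sum(s.seconds for s in initial)
    seconds += n_cycles * sum(s.seconds for s in cycle)
    seconds += sum(s.seconds for s in final)
    return seconds / 60

print(hold_time_minutes(initial, cycle, N_CYCLES, final))  # 155.0
```

The programs in sections 11 and 12 differ only in the annealing step (53°C and 54°C). They can be described by changing Step(50, 60) accordingly; their total hold time is the same.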

3. Purification of PCR products

All PCR products were purified using the Wizard DNA Clean-Up system. At least 50 µl of the PCR reaction mix was added to a microcentrifuge tube with 1 ml Clean-Up resin, and the resin and sample were mixed by gentle inversion several times. The whole mixture was transferred to a syringe barrel and pushed through the minicolumn. The resin was washed with 2 ml 80% isopropanol. The minicolumn was then transferred to a 1.5 ml microcentrifuge tube and centrifuged for 2.5 minutes to dry the resin. Next, the minicolumn was transferred to a new microcentrifuge tube and 50 µl distilled water was added to it. After 1 minute, the minicolumn was centrifuged for 30 seconds to collect the eluted DNA. The concentration and purity of the eluted DNA were determined by spectrophotometry at 260 nm and 280 nm.

4. Cloning of PCR products

4.1 DNA ligation reaction

The TA Cloning Kit was used for cloning all PCR products. DNA ligation with the vector (pCR2.1) was done in a total volume of 10 µl. The reaction contained 1 µl 10x ligation buffer, 2 µl pCR2.1 vector (25 ng/µl) and 1 µl T4 DNA ligase (4.0 Weiss units). The amount of vector (pCR2.1) used in the reaction depended on the amount of PCR fragment available. It was chosen to give a vector:insert ratio of about 1:5 or 1:10. The ligation reaction was carried out overnight at 14°C.
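To reach a chosen vector:insert ratio, the mass of PCR product needed scales with its length relative to the vector. The sketch below makes two assumptions. First, it treats the ratio as molar, since the text does not say whether it is molar or by mass. Second, it takes pCR2.1 to be about 3.9 kb. The sketch uses 50 ng of vector (2 µl at 25 ng/µl) and, as an example insert, a fragment the length of the 1018 bp probe. It computes the insert mass needed for ratios of 1:5 and 1:10.

```python
VECTOR_NG = 2 * 25.0   # 2 µl of pCR2.1 at 25 ng/µl
VECTOR_BP = 3900       # assumed approximate size of pCR2.1

def insert_ng(vector_ng, vector_bp, insert_bp, insert_per_vector):
    """Insert mass giving `insert_per_vector` insert molecules per vector
    molecule; for double-stranded DNA, moles are proportional to ng / bp."""
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector

for ratio in (5, 10):
    print(f"1:{ratio} -> {insert_ng(VECTOR_NG, VECTOR_BP, 1018, ratio):.0f} ng insert")
```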

4.2 Transformation of competent Escherichia coli cells

Competent E. coli cells, INVαF' (Invitrogen), were thawed on ice before transformation with the pCR2.1 vector containing the insert. To one vial of competent cells (50 µl), 2 µl β-mercaptoethanol and 2 µl of the ligation reaction were added. The cells were mixed gently and incubated on ice for 30 minutes. They were then heat-shocked in a water bath for 30 seconds at 42°C. The vial was placed on ice for several minutes. Then 250 µl of S.O.C. medium (2% tryptone; 0.5% yeast extract; 10 mM NaCl; 2.5 mM KCl; 10 mM MgCl2; 10 mM MgSO4; 20 mM glucose) was added. The vial was shaken horizontally at 225 rpm on a platform shaker at 37°C for 1 hour. After incubation, 100 µl of the transformed cells were spread on an LB plate containing 40 µg/ml X-gal (Sigma) and 50 µg/ml ampicillin (Sigma). The LB plate was incubated at 37°C overnight.
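The S.O.C. recipe above mixes percentages and millimolar concentrations. The sketch below converts it to grams per litre, using molar masses of the anhydrous salts. Those molar masses are our assumption, because the thesis does not say which salt forms were used. Hydrated forms, such as MgSO4·7H2O, would need their own molar masses.

```python
# S.O.C. composition from the transformation protocol above.
PERCENT_WV = {"tryptone": 2.0, "yeast extract": 0.5}              # % (w/v)
MILLIMOLAR = {"NaCl": 10, "KCl": 2.5, "MgCl2": 10, "MgSO4": 10, "glucose": 20}
# Molar masses (g/mol) of the anhydrous forms (assumed; see lead-in).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "MgCl2": 95.21,
              "MgSO4": 120.37, "glucose": 180.16}

def grams_per_litre():
    """Mass of each component needed for 1 litre of medium."""
    grams = {name: pct * 10 for name, pct in PERCENT_WV.items()}  # 1% w/v = 10 g/L
    for name, mm in MILLIMOLAR.items():
        grams[name] = mm / 1000 * MOLAR_MASS[name]                 # mol/L x g/mol
    return grams

for name, g in grams_per_litre().items():
    print(f"{name:14s} {g:7.3f} g")
```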

5. Characterization of Artemia cathepsin L-1 gene isolated from genomic DNA

5.1 Isolation of Artemia nuclei

Initially, nuclei were isolated from newly hatched nauplius larvae of Artemia franciscana following the method of Squires and Acey (1989), as described by Clegg et al. (1994). The starting material was 1 g (wet weight) of nauplii obtained from Artemia cysts incubated at 28°C for 15-18 hours. The nauplii were homogenized in 8 ml ice-cold homogenization buffer (HB) (10 mM Tris-HCl, pH 7.5; 10 mM MgCl2; 0.1% Nonidet P40). The homogenate was centrifuged briefly to remove fragments of exoskeleton. The sediment was washed twice with 5 ml HB and centrifuged at 2500 rpm for 10 minutes at 4°C to sediment nuclei and residual yolk platelets. The nuclei-rich pellet was resuspended in 2-3 ml HB lacking Nonidet P40. It was then layered over a 75% Percoll solution containing 0.15 M NaCl, 0.01 M MgCl2 and 0.01 M Tris-Cl, pH 7.5, in a centrifuge tube. The Percoll-nuclei preparation was centrifuged at 14,500 rpm for 30 minutes at 4°C. The white fluffy zone just beneath the surface was collected and diluted with 5 ml HB lacking Nonidet P40. It was then centrifuged at 10,000 rpm to recover the nuclei.
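Speeds in this section are given in rpm. These translate into relative centrifugal force (RCF, x g) only once the rotor radius is known, via RCF = 1.118 x 10⁻⁵ x r(cm) x rpm². The rotors used for these steps are not stated, so the radius below is a placeholder. The value of 10.7 cm is roughly the maximum radius of a Sorvall SS-34 rotor, the rotor named in section 9, and is used purely as an example.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius."""
    return 1.118e-5 * radius_cm * rpm ** 2

def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse: rotor speed needed to reach a given RCF at this radius."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5

RADIUS_CM = 10.7  # example radius only; the rotors used here are not specified

for rpm in (2500, 10_000, 14_500):
    print(f"{rpm:>6} rpm ~ {rcf_from_rpm(rpm, RADIUS_CM):,.0f} x g")
```

Reporting the x g value alongside rpm makes a step reproducible on a different centrifuge.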

5.2 Isolation of Artemia franciscana genomic DNA

Artemia genomic DNA was isolated from purified nuclei using the DNAzol Reagent as follows. DNAzol (1 ml) was added to about 1-3 x 10⁷ isolated nuclei, and the tube was inverted several times to lyse the nuclei. The tube was centrifuged at 10,000 rpm. Genomic DNA was then precipitated from the supernatant by adding 0.5 ml of 100% ethanol per 1 ml of DNAzol used. The visible DNA precipitate was removed by spooling with a pipette tip and washed twice with 0.8-1 ml 75% ethanol. The DNA was air-dried for about 15-60 seconds to remove the ethanol, then dissolved in 0.2-0.3 ml of 8 mM NaOH. The concentration and purity of the DNA were determined by UV absorption at 260 nm and 280 nm.

6. Screening of recombinant DNA clones for Artemia

franciscana cathepsin L nucleotide sequences

Nitrocellulose filters used for screening were placed on agar plates containing 50 µg/ml ampicillin (Sigma). White bacterial colonies were transferred onto the filter, and then onto a master agar plate with ampicillin but no filter, with colonies streaked in identical positions on both plates. The plates were incubated overnight at 37°C. The next day, both plates were marked with India ink in three positions, sealed with Parafilm and stored at 4°C.

The nitrocellulose filter was placed on Whatman 3MM paper saturated with 10% SDS for 3 minutes, then treated as described below (see section 8).

Finally, hybridization with the [32P]-labeled CL probe was carried out as described in section 8. The filter was then exposed to X-ray film at -80°C until a signal of the desired strength was obtained.

7. Isolation of plasmid DNA from CL-positive Artemia franciscana clones

Bacterial clones showing a positive signal when probed with 32P-labeled CL cDNA were suspended in 500 µl LB. They were incubated with shaking at 37°C for 30 minutes, then mixed with 3 ml LB broth containing 50 µg/ml ampicillin. The liquid cultures were incubated overnight, and then 1 ml of each culture was centrifuged for 2 minutes at 13,000 rpm. The supernatant was removed, and plasmid DNA was isolated from the pellet using the Wizard Miniprep Kit according to the manufacturer's directions.

The plasmid DNA was digested with the restriction enzyme EcoR I. The reaction mixture was analyzed on a 1% agarose gel to confirm the presence of an insert in the vector.

8. Screening of an Artemia franciscana genomic DNA library in EMBL3

An Artemia franciscana genomic DNA library, constructed in bacteriophage lambda EMBL3, was prepared and kindly supplied by L. Sastre (Madrid, Spain). To screen the library for CL DNA sequence(s), E. coli K802 cells were grown overnight in LB medium supplemented with 10 mM MgSO4 and 0.2% maltose. Various dilutions of the EMBL3 phage were then added to 0.5 ml K802 cells, and each tube was incubated at 37°C for 40 minutes. The infected cells were then added to 2.5 ml LB containing 0.7% agarose and poured onto 90 mm LB/agar plates. The plates were incubated overnight. Plates with distinct but non-confluent plaques were used for plaque lifts onto nitrocellulose membranes, as described by Benton and Davis (1977) in Sambrook et al. (1989).

The membranes were placed on Whatman 3MM filter paper and treated sequentially with the following solutions, each for 3 minutes:

1) denaturing solution (1.5 M NaCl, 0.5 M NaOH);
2) neutralizing solution (1.5 M NaCl, 0.5 M Tris-Cl, pH 7.0);
3) 2x SSC.

The membranes were then air-dried for 30 minutes and baked in a vacuum oven at 80°C for one hour. Before hybridization with the 32P-labeled CL probe, the membranes were prehybridized in 6x SSC containing 5x Denhardt's solution and 0.1% SDS at 64°C for one hour. Next, hybridization was carried out at 64°C overnight in a hybridization chamber (Fisher Scientific). The hybridization solution was fresh prehybridization solution containing heat-denatured 32P-labeled CL probe (1-2 x 10⁶ cpm).

After hybridization, the membranes were washed in three steps:

- 6x SSC containing 1% SDS for 20 minutes;
- 2x SSC containing 1% SDS for 20 minutes;
- 0.2x SSC containing 1% SDS for 10 minutes.

They were then exposed to X-ray film at -80°C for up to 4 days. Plaques giving positive signals after hybridization were "cored" with a sterile pipette tip. Each core was placed in 0.5 ml SM (0.1 M NaCl; 0.008 M MgSO4·7H2O; 0.05 M Tris-Cl, pH 7.5; 0.01% gelatin), and one drop of chloroform was added to each tube. Phage released from the agar core was used to re-infect E. coli K802 cells to purify the phage further to homogeneity.
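Choosing a dilution that gives distinct but non-confluent plaques implies titering the library. A standard way to back-calculate titer from a plate count is pfu/ml = plaques / (volume plated in ml x dilution). The plate count, dilution and volume below are hypothetical, used only to show the arithmetic.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Phage titer of the undiluted stock.

    dilution is the fraction of the stock present in the plated sample,
    e.g. 1e-6 for a 10^-6 dilution; volume_ml is the volume plated.
    """
    if plaques <= 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("inputs must be positive")
    return plaques / (dilution * volume_ml)

def volume_for_plaques(target_plaques, titer, dilution):
    """Volume (ml) of a given dilution to plate for a target plaque count."""
    return target_plaques / (titer * dilution)

# Hypothetical example: 150 plaques from 0.1 ml of a 10^-6 dilution.
titer = titer_pfu_per_ml(150, 1e-6, 0.1)
print(f"{titer:.2e} pfu/ml")
```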

9. Isolation of lambda EMBL3 phage containing putative

cathepsin L genomic DNA

E. coli K802 cells were grown overnight in LB medium supplemented with 10 mM MgSO4 and 0.2% maltose. Approximately 30 µl of a purified EMBL3 clone containing the putative CL sequence, in SM buffer, was used for infection. It was added to 0.1 ml K802 cells mixed with 0.1 ml of 10 mM CaCl2 and 10 mM MgCl2. The mixture of cells and phage was incubated at 37°C for 60 minutes, then added to 25 ml LB medium supplemented with 10 mM MgCl2. The culture was incubated at 37°C overnight with shaking until lysis had occurred. Next, 5 drops of chloroform were added, and the culture was shaken for another 5 minutes. Cellular debris was then removed by centrifugation for 20 minutes at 10,000 rpm and 4°C (Sorvall SS-34 rotor). The supernatant was centrifuged at 40,000 rpm (Beckman L5 ultracentrifuge) for 2.5 hours at 4°C.

The resulting pellet was resuspended in SM containing DNase I (1 µg/ml final) and RNase A (10 µg/ml final), and incubated at 37°C for 1.5 hours. Solid NaCl and polyethylene glycol (PEG 8000) were then added to final concentrations of 1 M and 10% (w/v), respectively, and the mixture was stored on ice overnight. The bacteriophage particles were recovered by centrifugation at 10,000 rpm for 15 minutes at 4°C. The phage pellet was resuspended in SM. EDTA and proteinase K were then added to final concentrations of 0.04 M and 0.09 mg/ml, respectively, and the mixture was incubated at 65°C for one hour. Proteins were removed using standard procedures, by successive extractions with phenol, phenol:chloroform (1:1) and chloroform.

After the final centrifugation, the aqueous layer was treated with 1/10 volume of 3 M sodium acetate, pH 5.2, and two volumes of cold 95% ethanol to precipitate the phage DNA. After 3 hours at -20°C, the ethanol-insoluble material was recovered by centrifugation at 10,000 rpm. The DNA pellet was washed twice with cold 70% ethanol, dried in a Speedvac and resuspended in distilled water. The DNA concentration and purity were determined by UV absorption at 260 and 280 nm, where a 50 µg/ml DNA solution gives an absorbance of 1.0 at 260 nm. A 260/280 ratio of 1.9-2.0 was taken to represent "pure" DNA.
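The concentration and purity criteria stated above translate directly into a small calculation. In this sketch the A260 and A280 readings and the dilution factor are hypothetical. The conversion of 50 µg/ml per A260 unit and the 1.9-2.0 purity window are the ones given in the text.

```python
A260_UNIT_UG_PER_ML = 50.0  # 1.0 A260 unit = 50 µg/ml double-stranded DNA

def dna_quant(a260, a280, dilution_factor=1.0):
    """Return (concentration in µg/ml, A260/A280 ratio, passes purity check)."""
    conc = a260 * A260_UNIT_UG_PER_ML * dilution_factor
    ratio = a260 / a280
    return conc, ratio, 1.9 <= ratio <= 2.0

# Hypothetical readings of a 1:50 dilution of a phage DNA prep.
conc, ratio, pure = dna_quant(a260=0.25, a280=0.13, dilution_factor=50)
print(f"{conc:.0f} µg/ml, A260/A280 = {ratio:.2f}, pure: {pure}")
```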

10. Restriction analysis and Southern blotting of putative

CL clones derived from the genomic DNA EMBL3 library

The Artemia embryo CL-1 cDNA contains an EcoR I restriction site just before the mature protease coding sequence. Recombinant EMBL3 clones containing the putative CL genes were therefore analyzed with EcoR I. Typically, the reactions contained 1-2 µg phage DNA, buffer H (90 mM Tris-Cl, 10 mM MgCl2, 50 mM NaCl, pH 7.5) and 12 units of EcoR I in a total volume of 20 µl. The reactions were incubated at 37°C overnight and combined with loading dye. They were then subjected to electrophoresis on a 1.0% agarose gel in 1x TAE buffer. At the end of the run (about 2 hours at 60 volts), the gel was stained in 1x TAE buffer containing ethidium bromide. The DNA was then visualized by UV transillumination.
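The logic of the EcoR I screen, predicting where the enzyme cuts and what fragment sizes to expect on the gel, can be sketched as follows. EcoR I recognizes GAATTC and cuts between the G and the first A. The site is palindromic, so scanning one strand finds every site. The demo sequence is hypothetical. A real analysis would run this over the full clone or cDNA sequence, for example AF147207.

```python
import re

ECORI_SITE = "GAATTC"  # EcoR I recognition site; cut falls after the G

def ecori_cut_positions(seq: str):
    """0-based positions at which EcoR I cuts the top strand (after the G)."""
    return [m.start() + 1 for m in re.finditer(ECORI_SITE, seq.upper())]

def linear_fragments(seq: str):
    """Fragment lengths (bp) from a complete digest of a linear molecule."""
    cuts = ecori_cut_positions(seq)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical sequence with two EcoR I sites.
demo = "ATGCGAATTCCTAGGTTAACGGAATTCAT"
print(ecori_cut_positions(demo), linear_fragments(demo))  # [5, 22] [5, 17, 7]
```

Lambda EMBL3 DNA is linear, which is why the sketch treats the molecule as linear rather than circular.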

The EcoR I digestion products were transferred to a nitrocellulose membrane by Southern blotting, according to the standard protocol (Southern, 1975). After transfer, the membrane was rinsed in distilled water for several minutes, then baked at 80°C for 90 minutes under vacuum. Prehybridization and hybridization were performed as described above (see section 8). Finally, the membrane was exposed to X-ray film at -80°C for at least 18 hours.

11. Polymerase chain reaction using phage DNA clones as

substrate

Cathepsin L (CL)-positive EMBL3 clones were analyzed by the polymerase chain reaction, using primers CLF and CLR3 (see Appendices 1 and 2). The reaction contained the following components (final concentrations where applicable):

- 150 ng of a purified CL EMBL3 clone as template;
- 1x PCR reaction buffer;
- 2.5 mM MgCl2;
- 0.2 mM dNTPs (dATP, dTTP, dCTP, dGTP);
- 5 units of Taq DNA polymerase.

PCR was performed under the following conditions: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 53°C for 1 minute and 72°C for 2 minutes. A final incubation at 72°C for 10 minutes completed unfinished chain extensions.

The PCR product(s) were subjected to electrophoresis on a 1% agarose gel in 1x TAE buffer. The bands of interest were visualized with ethidium bromide and excised from the gel with a clean scalpel. They were then purified further using the QIA Gel Extraction Kit. The concentration and purity of the DNA were determined by UV absorption at 260 nm and 280 nm.

12. Amplification of Artemia cathepsin L genes

The polymerase chain reaction was used to amplify the cathepsin L genes from genomic DNA as follows. The primers were CLF13 and CLR18 (see Appendices 1 and 2), each at a final concentration of 1 pmol/µl. The template DNA was diluted to 100 ng/µl, and 2.5 µl (250 ng) was added to a reaction tube. The tube also contained the following (final concentrations where applicable):

- 1x PCR reaction buffer;
- 2.5 mM MgCl2;
- 0.2 mM dNTPs (dATP, dTTP, dCTP, dGTP);
- 5 units of Taq DNA polymerase;
- 1 mM betaine at 5 mg/ml (Sigma).

PCR was performed as follows: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 54°C for 1 minute and 72°C for 2 minutes; then 72°C for 10 minutes.

13. DNA sequencing ofArtemia cathepsin L clones

DNA sequencing was conducted in two ways. Initially the Thermo Sequenase

25

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Cy5.5 Dye Terminator Cycle Sequencing Kit and departmental DNA sequencer were

used. For each clone to be sequenced, the master mix contained 3.5 pi reaction buffer,

2 pi primer (20 pmol/pl), and 2 pi Thermo Sequenase DNA polymerase (10 U/pl).

PCR products and plasmid DNA were used as template. The total volume of the initial

reaction mix was 31.5 pi. The primers used in the sequencing reactions were CLF and

CLR3. Aliquotes of the initial reaction mix (7 pi) were placed in four different

sequencing reaction vessels, each containing 2.5 pi of different ddNTPs (ddATP,

ddCTP, ddTTP, ddGTP). The reaction vessels were incubated for 2.5 minutes at 94°C,

then 35 cycles of 45 seconds at 94°C, 45 seconds at 52°C, 2 minutes at 72°C, and 10

minute at 72°C. After completion of the cycling program, 2 pi of 7.5 M ammonium

acetate and 30 pi of cold 95% ethanol were added to each of the four reaction tubes.

The tubes were placed at -80°C for at least 18 hours. Next, the tubes were centrifuged

(4°C) at 13,000 rpm for 45 minutes. The supernatant was removed and the pellet was

washed with 70% ice-cold ethanol. The tubes were centrifuged again for 5 minutes.

The supernatant was removed and the pellet dried under vacuum. Formamide loading

dye (6 pi) was added to each pellet and the tubes were vortexed vigorously.

After heating at 70°C for 2.5 minutes, the products were subjected to electrophoresis on a polyacrylamide gel in the Long Read Tower system under the following conditions: gel temperature, 60°C; gel voltage, 2000 V; laser power, 50%. The electrophoresis buffer was 1x TBE. Before the samples (2.5 µl) were loaded, the gel was pre-run for 20 minutes. The electrophoresis running time was one hour.
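Each of the four tubes contains a single ddNTP, so every band in a lane marks a chain that stopped at that base. The read is obtained by merging the four lanes in order of fragment length. The Python sketch below shows only that logic, using made-up fragment lengths. It does not describe how the Long Read Tower software works internally: that software also corrects for mobility differences and calls bases from fluorescence traces. `read_from_lanes` is a hypothetical helper.

```python
def read_from_lanes(lanes):
    """Merge four dideoxy lanes into a read.

    `lanes` maps each base (A, C, G, T) to the lengths, in nucleotides past the
    primer, of chains terminated by that base's ddNTP. Shorter chains run
    further, so sorting every band by length spells out the newly synthesized
    strand 5'->3' from the primer.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    lengths = [length for length, _ in bands]
    if len(set(lengths)) != len(lengths):
        raise ValueError("two lanes show a band of the same length; call is ambiguous")
    # This sketch insists on exactly one band at every length from 1 upward.
    if lengths != list(range(1, len(lengths) + 1)):
        raise ValueError("missing or extra band lengths; read would contain gaps")
    return "".join(base for _, base in bands)


# Hypothetical band lengths, for illustration only
lanes = {"A": [1, 4], "C": [2], "G": [3], "T": [5]}
print(read_from_lanes(lanes))
```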

Apart from the clones above, all subsequent plasmid DNA clones were sent to the Robarts Institute for sequencing. The primers used in the sequencing reactions were provided by the Robarts Institute as follows:

T7 promoter (5’-TAATACGACTCACTATAGGG-3’)
M13 Forward (5’-CGCCAGGGTTTTCCCAGTCACGAC-3’)
M13 Reverse (5’-TCACACAGGAAACAGCTATGAC-3’)
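The GC content and approximate melting temperature of these three primers can be estimated directly from their sequences. The sketch below uses the simple formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N. This is a rough textbook approximation that ignores salt, primer concentration and nearest-neighbour effects. The values it gives are illustrations only and are not the conditions used by the sequencing facility.

```python
def gc_content(seq):
    """Fraction of G + C in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tm_basic(seq):
    """Rough Tm (°C) = 64.9 + 41 * (nG + nC - 16.4) / N.

    Ignores salt, primer concentration and nearest-neighbour effects,
    so treat the value only as a first estimate.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


primers = {
    "T7 promoter": "TAATACGACTCACTATAGGG",
    "M13 Forward": "CGCCAGGGTTTTCCCAGTCACGAC",
    "M13 Reverse": "TCACACAGGAAACAGCTATGAC",
}
for name, seq in primers.items():
    print(f"{name:12s} {len(seq):2d} nt  GC {gc_content(seq):.0%}  Tm ~{tm_basic(seq):.1f} °C")
```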

After sequencing, the data were aligned with the Artemia franciscana cathepsin L cDNA (AF147207) and genomic DNA clone 9C (AY557372), whose sequences are in GenBank at the National Center for Biotechnology Information (NCBI). The alignments were performed using the Clustal W (1.82) program (Thompson et al. 1994) in the EBI (European Bioinformatics Institute) toolbox.
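The identity values reported in the results (for example 90% and 99%) come from Clustal W. To illustrate what such a figure measures, the sketch counts identical bases over the aligned columns in which both sequences have a base, skipping gap and unsequenced columns. This is one common definition. It is an assumption here, because Clustal W's own scoring details are not described in the text. The example rows are bp 121-180 of clone 9C and the matching row of PCR clone 818 from Figure 9.

```python
def percent_identity(row_a, row_b, gap_chars="-."):
    """Percent identity over columns where both aligned rows carry a base."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(row_a.upper(), row_b.upper())
        if a not in gap_chars and b not in gap_chars
    ]
    if not pairs:
        raise ValueError("the rows share no aligned bases")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), identical, len(pairs)


# bp 121-180 of clone 9C and the corresponding row of clone 818 (Figure 9)
clone_9c = "TATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA"
clone_818 = "-ATCTATTCAAGGCTACACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA"
pct, same, compared = percent_identity(clone_9c, clone_818)
print(f"{same}/{compared} identical = {pct:.1f}%")
```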

14. PCR analysis of Artemia franciscana adult cDNA library for the presence of cathepsin L sequence

An adult Artemia franciscana cathepsin L cDNA library constructed in bacteriophage λgt11 was obtained from L. Sastre (Madrid, Spain). E. coli Y1090r⁻ cells were used as host and were grown overnight in LB medium supplemented with 10 mM MgSO4 and 0.2% maltose. Approximately 9 × 10⁵ λgt11 phage from the library were added to 0.5 ml Y1090r⁻ cells and incubated at 37°C for 60 minutes. The mixture was then added to 2.5 ml LB with 0.7% agarose and poured onto a 90 mm LB/agar plate. Six plates were incubated overnight, and those showing confluent plaques were saved. SM solution (5 ml) was added to each plate, and the plates were shaken at room temperature for 4 hours. The SM solution on the plates was collected and centrifuged at 10,000 rpm for 15 minutes at 4°C. The supernatant was used for phage DNA isolation, following the protocol described in section 4 above. The phage DNA was analyzed at 260 nm and 280 nm, and the purity and concentration of the DNA were calculated as described above.
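The calculation referred to here is described in an earlier section and is not repeated in this one. The sketch below is a minimal version of it. It assumes the usual conversion of 50 µg/ml per A260 unit for double-stranded DNA in a 1 cm cuvette, and it uses the common rule of thumb that an A260/A280 ratio of about 1.8 indicates clean DNA. The absorbance readings and dilution factor are hypothetical.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for double-stranded DNA, 1 cm path


def quantify_dna(a260, a280, dilution_factor=1.0):
    """Return (concentration in µg/ml, A260/A280 ratio) for a dsDNA sample."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    concentration = a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor
    return concentration, a260 / a280


# Hypothetical readings of phage DNA diluted 1:50
conc, ratio = quantify_dna(a260=0.25, a280=0.135, dilution_factor=50)
print(f"{conc:.0f} µg/ml, A260/A280 = {ratio:.2f}")
if not 1.7 <= ratio <= 2.0:
    print("ratio outside ~1.7-2.0: possible protein or other contamination")
```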

Using total DNA prepared from an Artemia franciscana adult cDNA library, PCR was performed with different pairs of primers designed to detect sequence matching the Artemia CL-2 gene: CL9CF11, CLR11, CLF11 and CLR10b (see Appendix 1 and 3). The reaction vessels contained 500 ng template DNA, 1x PCR reaction buffer, 2.5 mM MgCl2, 0.2 mM dNTP (final concentration; dATP, dTTP, dCTP, dGTP), 5 units Taq DNA polymerase and 1 mM betaine. PCR was performed as follows: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 51 or 54°C (depending on the primer pair) for 1 minute and 72°C for 2 minutes; 72°C for 10 minutes.

The PCR products were analyzed on a 1% agarose gel and the major bands were purified using the Wizard DNA Clean-Up system. The PCR-derived DNA fragments were ligated into pCR 2.1, transformed into INVαF′ cells and grown on LB plates with ampicillin as described above. During screening, DNA clones showing a positive signal with 32P-labeled CL cDNA were collected, grown in LB broth with ampicillin and isolated using the Wizard Miniprep Kit. Clones containing putative CL inserts were sent to the Robarts Institute for DNA sequencing.

The sequencing data were compared with Artemia franciscana embryo cathepsin L cDNA (AF147207), representing the Artemia CL-1 gene, and with Artemia franciscana genomic DNA clone 9C (AY557372), representing the Artemia CL-2 gene. The program used was Clustal W (1.82) (Thompson et al. 1994).

15. PCR analysis of Artemia embryonic cDNA library for additional CL cDNAs

Bluescript phagemid DNA was prepared, as described previously (Butler et al. 2001), from the Artemia franciscana cDNA library in λZAPII using a protocol from the supplier.

Using total DNA prepared from an Artemia embryonic cDNA library, PCR was performed with the primer pair TP-7F and CLR11 (see Appendix 1 and 3). The reaction vessels contained 276 ng template DNA, 1x PCR reaction buffer, 2.5 mM MgCl2, 0.2 mM dNTP (dATP, dTTP, dCTP, dGTP), 5 units Taq DNA polymerase and 1 mM betaine in a 50 µl final volume. PCR was performed as follows: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 51 or 54°C (depending on the primer pair) for 1 minute and 72°C for 2 minutes; 72°C for 10 minutes.
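The reaction above gives final concentrations and amounts in a 50 µl volume, and the pipetting volumes follow from C1V1 = C2V2. The sketch below works them out. The text does not give the stock concentrations, so every stock value in the sketch is a hypothetical choice made for illustration: 10x buffer, 25 mM MgCl2, 10 mM dNTP mix, 5 U/µl Taq, 100 mM betaine and template at 92 ng/µl.

```python
def pcr_recipe(total_ul, by_concentration, by_amount):
    """Volumes (µl) to pipette for one reaction.

    by_concentration: name -> (stock concentration, final concentration), same units;
                      volume = final * total / stock   (C1V1 = C2V2)
    by_amount:        name -> (stock per µl, amount wanted); volume = amount / stock
    Water makes up the remainder.
    """
    recipe = {name: final * total_ul / stock
              for name, (stock, final) in by_concentration.items()}
    recipe.update({name: amount / stock
                   for name, (stock, amount) in by_amount.items()})
    water = total_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    recipe["water"] = water
    return recipe


recipe = pcr_recipe(
    total_ul=50.0,
    by_concentration={                       # hypothetical stock concentrations
        "PCR buffer (x)": (10.0, 1.0),
        "MgCl2 (mM)": (25.0, 2.5),
        "dNTP mix (mM each)": (10.0, 0.2),
        "betaine (mM)": (100.0, 1.0),
    },
    by_amount={
        "Taq polymerase (U)": (5.0, 5.0),    # 5 U/µl stock, 5 U per reaction
        "template DNA (ng)": (92.0, 276.0),  # 92 ng/µl stock, 276 ng per reaction
    },
)
for name, volume in recipe.items():
    print(f"{name:22s} {volume:5.2f} µl")
```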

The PCR products were analyzed on a 1% agarose gel and the main bands were purified using the Wizard DNA Clean-Up system. The PCR-derived DNA was ligated into pCR 2.1, INVαF′ cells were transformed, and the cells were grown on LB plates with ampicillin as described above. DNA clones showing a positive signal when probed with 32P-labeled CL cDNA were collected, grown in LB broth with ampicillin and isolated using the Wizard Miniprep Kit. Clones containing putative CL inserts were sent to the Robarts Institute for DNA sequencing.

The sequencing data were compared with Artemia franciscana genomic clone 9C (AY557372), representing the Artemia CL-2 gene, and with Artemia franciscana embryo cathepsin L cDNA (AF147207). The program used was Clustal W (1.82) (Thompson et al. 1994).

16. Analysis of Artemia franciscana genomic DNA in phage EMBL3 for additional CL genes

EMBL3 phage carrying Artemia franciscana genomic DNA were grown in E. coli K802 cells and purified as described above. The phage DNA was then used as template for the polymerase chain reaction with the primers CLF10 and CLR8 (see Appendix 1, 2 and 3). The reaction contained 500 ng total phage DNA as template, 1x PCR reaction buffer, 2.5 mM MgCl2, 0.2 mM dNTP (dATP, dTTP, dCTP, dGTP), 5 units Taq DNA polymerase and 1 mM betaine in a 50 µl final volume. PCR was performed under the following conditions: 94°C for 5 minutes; 35 cycles of 94°C for 1 minute, 56°C for 1 minute and 72°C for 2 minutes; 72°C for 10 minutes.

The PCR product was purified using the Wizard DNA Clean-Up system and ligated into vector pCR 2.1 as described above. DNA clones showing a positive signal with our 32P-labeled CL cDNA probe were collected and purified, and the PCR product within the plasmid was sequenced.

The sequencing data were analyzed for similarity to Artemia franciscana embryo cathepsin L cDNA (AF147207) and Artemia franciscana genomic clone 9C (AY557372).

17. Attempt to identify the putative promoter sequence of Artemia franciscana CL genes

Degenerate PCR was performed in an attempt to identify the upstream (5’) promoter sequence of one of the Artemia franciscana CL genes. Artemia franciscana genomic clone 9C contains the nucleotide sequence that we have designated CL-2, so it was used as template in the PCR reaction. The degenerate primers used were those described by Badaracco et al. (1995), together with an internal primer: OPC-2, OPC-4, OPC-8, OPC-9 and CLR10 (see Appendix 1 and 3).

The reaction contained 250 ng template, 1x PCR reaction buffer, 2.5 mM MgCl2, 0.2 mM dNTP (dATP, dTTP, dCTP, dGTP; final concentration) and 5 units Taq DNA polymerase. PCR was performed in two consecutive steps. Step 1: 94°C for 5 minutes, then 10 cycles of 94°C for 1 minute, 35°C for 1 minute and 45°C for 12 minutes. Step 2: 30 cycles of 94°C for 1 minute, 56°C for 1 minute and 72°C for 2 minutes, followed by 72°C for 7 minutes to complete all extension.
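A degenerate primer is a mixture of sequences, one for each combination of the ambiguity codes it contains. Its degeneracy is therefore the product of the number of bases allowed at each position. The sketch below uses the standard IUPAC nucleotide codes. The primer `GGNCARTGYGG` is a made-up example and is not one of the OPC primers from Badaracco et al. (1995), whose sequences are given in the appendices.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]


example = "GGNCARTGYGG"  # hypothetical primer, for illustration only
print(degeneracy(example), expand(example)[:4])
```

The example encodes 4 × 2 × 2 = 16 sequences. In a highly degenerate primer pool each individual sequence is present at only a small fraction of the total. This is one reason degenerate PCR often starts with low-stringency cycles, such as the 35°C annealing of Step 1 above, before switching to a higher annealing temperature.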

The PCR products were transferred to a nitrocellulose membrane by Southern blotting (Southern, 1975), and the membrane was processed for hybridization with a [32P]-labeled CL probe as described above.

CL-positive PCR products were separated by electrophoresis on a 1% agarose gel. The band showing a positive signal was cut from the gel, purified using the QIA Gel Extraction Kit, then ligated into vector pCR 2.1 and cloned as described above. Positive clones were collected and purified, and the plasmid insert containing the PCR product was sequenced.

Results:

1. Isolation of cDNA clone coding for Artemia embryo cathepsin L

The plasmid pCR2.1 containing a cDNA coding for Artemia franciscana cathepsin L (CL) used in this experiment was isolated by alkaline lysis and digested with the restriction endonuclease EcoR I, as shown in Fig 1. The cloned CL cDNA contained 1085 bp, and digestion cut it into fragments of 335 bp and 750 bp. This cloned cDNA was described previously (Butler et al. 2001). It was used to construct a [32P]-labeled probe to detect similar sequences in genomic DNA and cDNA libraries prepared from Artemia franciscana larvae and adults.
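The two fragment sizes follow from where EcoR I cuts: its recognition site is GAATTC, and it cuts between the G and the first A (G^AATTC). The sketch below finds sites in a sequence and converts cut positions into fragment lengths for a linear molecule, ignoring the 4-nt overhangs. The first test piece is bp 319-347 of the embryo cDNA as numbered in Figure 10, which contains the internal EcoR I site. The second is the corresponding stretch of clone 9C from Figure 5. `find_sites` and `linear_fragments` are hypothetical helpers.

```python
import re

ECORI_SITE = "GAATTC"  # EcoR I cuts G^AATTC; the site is its own reverse complement


def find_sites(seq, site):
    """0-based start of every occurrence of `site` (overlapping matches included)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def linear_fragments(length, cut_positions):
    """Fragment lengths after cutting a linear molecule at the given positions."""
    edges = [0, *sorted(cut_positions), length]
    return [right - left for left, right in zip(edges, edges[1:])]


# Embryo cDNA around its internal EcoR I site (bp 319-347, Figure 10)
cdna_piece = "CAACATAAGAAACAGAATTCCTCAAGAGC"
# Corresponding stretch of genomic clone 9C (Figure 5)
clone_9c_piece = "GGATATAAGAAATGAACTTCACCC"
print(find_sites(cdna_piece, ECORI_SITE), find_sites(clone_9c_piece, ECORI_SITE))

# A 1085 bp insert cut once, 335 bp from one end, gives the two reported fragments
print(linear_fragments(1085, [335]))
```

Because GAATTC is palindromic, searching one strand is enough to find every site on both strands.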

2. Isolation and analysis of cathepsin L-like clones from an Artemia franciscana genomic DNA library

Ten putative CL-positive clones (A1 to A10) were isolated from a genomic DNA library constructed in phage EMBL3 and analyzed by EcoR I digestion. Three of these clones (A1, A2, A3) are shown in Fig 2. All clones showed the same EcoR I restriction pattern as Artemia franciscana genomic clone 9C, isolated previously (Fig 2), whose sequence is about 80% identical to Artemia embryo CL cDNA. The restriction patterns of all the clones were similar in that they yielded three bands: 10,000 bp, 8000 bp and 2400 bp. These results indicated that there are two EcoR I digestion sites in the cloned sequence; however, only one band (8000 bp) gave a signal with the 32P-labeled CL probe. These results demonstrated that, while all clones appeared to represent CL sequences, they were "identical" to genomic clone 9C but not to the Artemia embryo CL cDNA shown in Fig 1.

[Figure 1 image: agarose gel, lanes 1-2; size markers at 4000, 2000, 750 and 250 bp; labeled bands: pCR2.1 vector, 750 bp and 335 bp]

Fig. 1. Analysis of a cDNA in vector pCR2.1 coding for Artemia franciscana embryo cathepsin L.

Lane 1, 1 kb ladder; lane 2, products of EcoR I digestion. The bands were visualized by ethidium bromide staining.

[Figure 2 image: Panel A, agarose gel, lanes 1-9; Panel B, autoradiograph of the same lanes; size labels at 8000, 3000 and 2000 bp]

Fig. 2. EcoR I digestion of Artemia franciscana genomic DNA clones in λ EMBL3.

Panel A, the restriction pattern of all the clones analyzed. The restriction enzyme used was EcoR I. Lane 1, 1 kb ladder; lane 2, clone 9C (control); lane 3, EcoR I treatment of clone 9C; lane 4, clone A1; lane 5, EcoR I treatment of clone A1; lane 6, clone A2; lane 7, EcoR I treatment of clone A2; lane 8, clone A3; and lane 9, EcoR I treatment of clone A3. Panel B, X-ray film of the clones in panel A after Southern blotting and hybridization with the 32P-labeled CL cDNA probe. Lanes 1-9 contain the same DNAs as in panel A. All clones showed a strong hybridization signal at about 8000 bp after EcoR I digestion.

3. Identification of DNA sequence in putative CL genomic clones from Artemia franciscana

The polymerase chain reaction was used to amplify part of the insert in the CL-positive clones (A1, A2, A3) shown in Fig 2, so that it could be sequenced. The primers for the PCR reaction were CLF and CLR3 (see Appendix 1 and 2). They were designed from the sequence of Artemia embryo cDNA and were expected to yield a fragment of 578 bp. The PCR products of these reactions are shown in Fig 3. The reactions yielded products of approximately 600 bp, similar to the product obtained with genomic clone 9C as substrate (lane 2).
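Band sizes such as "approximately 600 bp" are read off against the 1 kb ladder. One common way to do this is to interpolate log10(size) linearly between the two ladder bands that flank the unknown band. The sketch below shows that method. It is an assumption, not necessarily the procedure used here, and the migration distances are hypothetical.

```python
import math


def estimate_size(distance_mm, ladder):
    """Estimate a band's size (bp) from its migration distance.

    ladder: list of (size_bp, distance_mm) for the marker bands. log10(size) is
    interpolated linearly between the two ladder bands that flank the unknown.
    """
    points = sorted(ladder, key=lambda band: band[1])  # nearest to the well first
    for (size_1, dist_1), (size_2, dist_2) in zip(points, points[1:]):
        if dist_1 <= distance_mm <= dist_2:
            fraction = (distance_mm - dist_1) / (dist_2 - dist_1)
            log_size = math.log10(size_1) + fraction * (math.log10(size_2) - math.log10(size_1))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


# Hypothetical distances for three bands of a 1 kb ladder
ladder = [(1000, 30.0), (750, 35.0), (500, 42.0)]
print(f"{estimate_size(38.0, ladder):.0f} bp")
```

Working on a log scale reflects the roughly linear relationship between migration distance and log(fragment size) over the resolving range of an agarose gel.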

Based on the EcoR I restriction and hybridization patterns of all ten putative CL genomic clones isolated from the EMBL3 library, all clones appeared to be identical, so only one clone (A1) was sequenced. The PCR product shown in lane 3 of Figure 3 was cloned into pCR2.1 and then sequenced as described in the methods. The primers used in separate sequencing reactions were CLF and CLR3, the same as the PCR primers. Only a partial sequence was therefore obtained, because the primer sites themselves, and the bases immediately next to them, are usually not read clearly on the sequencing electropherogram. Thus, a 482-base sequence was obtained and compared with Artemia cathepsin L cDNA (AF147207) using the Clustal W (1.82) program (Thompson et al. 1994) (Fig 4). The sequence of the PCR product from clone A1 was 90% identical to the embryo CL cDNA. Based on the embryo CL cDNA sequence, this stretch covered the prepro-region and part of the mature region of the cysteine protease. The most notable differences from the embryo cDNA clone were a six-base gap (bp 318-323) in the prepro-coding region and the loss of the EcoR I site upstream of the mature protease coding region. The partial sequence of clone A1 was also compared with Artemia franciscana genomic clone 9C (AY557372), and the identity was 99%, as shown in Figure 5. These results suggest that genomic clone A1 is nearly identical to clone 9C except for a few nucleotide polymorphisms. Overall, the CL gene in genomic clone 9C has 1049 base pairs and was about 80% identical to Artemia embryo cathepsin L cDNA. Both clones A1 and 9C have the same restriction pattern, but they differ from embryonic CL cDNA as

[Figure 3 image: agarose gel, lanes 1-5; primer-dimer band indicated]

Fig. 3. PCR products obtained using putative CL phage DNA clones as substrate.

PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in the methods. The primers used were CLF and CLR3 (see Appendix 1 and 2). Lane 1, 1 kb ladder; lane 2, PCR product using genomic clone 9C as substrate; lanes 3-5, PCR products using genomic clones A1, A2 and A3, respectively, as substrate. The major band in all reactions was approximately 600 bp.

Fig. 4. Alignment of genomic clone A1 with Artemia embryo cathepsin L cDNA.

The sequence of Artemia embryo cathepsin L cDNA (AF147207) was obtained from the GenBank database. The alignment was performed using the program Clustal W (1.82). The two sequences were 90% identical in the region compared. Asterisks indicate identical base pairs. The arrowhead indicates the potential cleavage site between the prepro- and mature regions. The boxed bases represent the EcoR I restriction site in Artemia embryo cDNA that is lacking in genomic clone A1 (and in clone 9C). The dashes in clone A1 at bp 318-323 of the cDNA indicate missing sequence; other dashes indicate regions of the clone that were not sequenced.

Figure 4.

EcDNA1 CATCTTGTGGCAGACAATTACACAATGAAGCAGATTACTTTGATATTTTTACTGGGAGCTGTACTTGTGCAGTTAAGTGCTGCACTATCA 90
A1     ------

EcDNA1 CTGACAAATTTACTTGCTGATGAATGGCATCTATTCAAGGCTACACACAAGAAAGAATATCCAAGCCAACTTGAGGAGAAATTTAGAATG 180
A1     ------AATTTGCTTGCTGATGAATGGTATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGAATG 84

EcDNA1 AAGATTTATTTGGAAAATAAACACAAAGTTGCCAAACATAACATCCTTTATGAAAAAGGCGAAAAGTCTTATCAAGTCGCAATGAATAAG 270
A1     AAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGCGAAAAGTCTTATCAAGTTGCAATGAATCAG 174

EcDNA1 TTTGGAGATCTTCTTCATCATGAATTTAGATCTATCATGAATGGATACCAACATAAGAAACA|GAATTC|CTCAAGAGCTGAGAGCACTTTC 360
A1     TTTGGAGATCTTCTTCATCATGAATTTACATCTATCATGATTGGATA------TAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTT 258

EcDNA1 ACTTTTATGGAGCCTGCTAATGTTGAAGTTCCAGAATCTGTTGACTGGAGGGTAAAAGGAGCCATAACTCCTGTAAAAGACCAAGGACAG 450
A1     ACTTTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAGTAACTCCTGTAAAATACCAAGGACAG 348

EcDNA1 TGTGGTTCATGCTGGGCTTTCTCATCTACTGGTGCCTTGGAAGGTCAAACCTTCAGAAAAACAGGGAAGCTCATTTCTTTGAGTGAACAG 540
A1     TGTGCTTCTTGCTTGGCTTTTTCACCTACTGGTGCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAA 438

EcDNA1 AACTTGATTGATTGTTCTGGAAAATATGGAAATGAAGGATGCAATGGAGGATTAATGGACCAAGCTTTCCAGTATATCAAGGATAACAAG 630
A1     AACTTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAATGGAGGATTAATGGA------ 497

EcDNA1 GGAATTGACACTGAAAATACGTACCCTTATGAAGCTGAAGACAATGTCTGTCGTTATAATCCAAGGAACCGAGGTGCCATTGACCGTGGC 720
A1     ------

EcDNA1 TTTGTCCATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACTGTTGGACCTGTATCTGTTGCCATCGATGCCTCTCAT 810
A1     ------

EcDNA1 GAAAGTTTCCAATTCTATTCTAAAGGTGTTTACTATGAGCCATCATGTGACTCTGATGACCTAGACCACGGAGTTCTTGTGGTTGGCTAT 900
A1     ------

EcDNA1 GGTTCTGATAATGGCAAAGACTATTGGCTCGTTAAAAACTCGTGGTCTGAGCACTGGGGAGACGAAGGGTATATCAAGATTGCTCGCAAT 990
A1     ------

EcDNA1 CGCAAGAACCATTGTGGTATTGCTACTGCAGCTAGCTATCCACTTGTATAGATAGGGTTGTGGTAATTTTTGTGGATGTGTGTAATTGCA 1080
A1     ------

EcDNA1 TACGTTAAATTCTTATTCTCTTGATAGGTTTAGAGAGTTCTAGTTTTCAGTTTGATTCCGTAGATGACAGATTTTGTGACCATATTCGAG 1170
A1     ------

EcDNA1 AATAAAGCGTTTTTTTTACCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1229
A1     ------

Fig. 5. Alignment of genomic clone A1 with Artemia cathepsin L genomic clone 9C (CL-2 gene).

The sequence of Artemia cathepsin L genomic clone 9C (AY557372) was obtained from the GenBank database. The alignment was performed using the program Clustal W (1.82). The two sequences were 99% identical. Asterisks indicate identical base pairs. Dashes indicate regions not yet sequenced.

Figure 5.

9C     CAAATGAAGCAGATTACTTTGACATATTTACTAACAGCTGTAATGATATTTTTACTGTCAGTTGTACTTGTGCAGTTAAGTGCTACACAA 90
A1     ------

9C     TCACAGTCAAATTTGCTTGCTGATGAATGGTATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA 180
A1     ---------AATTTGCTTGCTGATGAATGGTATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA 81

9C     ATGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGCGAAAAGTCTTATCAAGTTGCAATGAAT 270
A1     ATGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGCGAAAAGTCTTATCAAGTTGCAATGAAT 171

9C     CAGTTTGGAGATCTTCTTCATCATGAATTTACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACT 360
A1     CAGTTTGGAGATCTTCTTCATCATGAATTTACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACT 261

9C     TTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAGTAACTCCTGTAAAATACCCAGGACAGTGT 450
A1     TTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAGTAACTCCTGTAAAATACCAAGGACAGTGT 351

9C     GCTTCTTGCTTGGCTTTTTCACCTACTGGTGCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAAC 540
A1     GCTTCTTGCTTGGCTTTTTCACCTACTGGTGCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAAC 441

9C     TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAAGCTTTTGAGTATATCAAGGATAACAAAGGA 630
A1     TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAATGGAGGATTAATGGA------ 497

9C     ATTGACACTGAAAATAAATATCATTATGAAGCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTT 720
A1     ------

9C     GTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCTGTTTCCGCTGTTATTGATGTCTCTCATGAA 810
A1     ------

9C     GGTTTTCAATTCTATTCTAAGGGTGTTTACTATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCACGAAGTTCTTGTAATTGGC 900
A1     ------

9C     TGTGGTTCTGATAATGGCGAAGACTATTGGCTCGTTAAAAACTCATGGTCTAAGCACTGGGGAGACGAAGGGTACCTCAAGATTGCTCGC 990
A1     ------

9C     AATCGCAAGAACCATTGTGGTGTTGCTACTGCAGCTCTCTATCCAATTGTATAGATAGGGTTGTGGTACTTTTTGTGATGTGTGTAATTG 1080
A1     ------

9C     ACCACGGTACATCT 1094
A1     ------

[Figure 6 diagram. A. Artemia franciscana embryo cathepsin L cDNA: map marked (left to right) m, E, m, HIII, HIII, PstI, stop. B. Artemia franciscana cathepsin L genomic clone 9C (CL-2 gene): map marked m, stop, m, HIII, PstI, stop.]

Fig. 6. Comparison of the gene structure of Artemia cathepsin L cDNA and genomic clone 9C.

E indicates an EcoR I site, HIII a Hind III site and PstI a Pst I site; m indicates potential translation start sites (methionine), and stop indicates a stop codon. Panel A represents the structure of Artemia embryo cathepsin L cDNA, which contains one EcoR I site, two Hind III sites and one Pst I site. The cDNA has one open reading frame of 1014 bp. Panel B represents the structure of genomic phage clone 9C (CL-2 gene), which contains one Hind III site, one Pst I site and two stop codons; it contains two open reading frames of 328 bp and 680 bp, respectively.

shown in Fig 6. Since all clones isolated from the EMBL3 phage library have the same restriction digestion pattern with EcoR I and Hind III, they all appear to be identical to clone 9C. The sequence contained in genomic clone 9C has now been designated the Artemia franciscana cathepsin L-2 gene (CL-2 gene).
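One way to see why clone 9C carries two shorter open reading frames where the cDNA has one is to walk codons from an in-frame ATG until a stop codon appears. The sketch below does this on two short pieces taken from the alignments. The first is the embryo cDNA starting at its ATG at bp 307 (Figure 10), which by the figure numbering lies in frame with the cDNA start codon at bp 25. The second is the stretch of clone 9C aligned with it across the six-base gap (Figure 5). Treating these pieces as read in the CL coding frame is an assumption of the example. `first_stop` and `longest_orf` are illustrative helpers that scan the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq, start=0):
    """Index just past the first in-frame stop codon read from `start`, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def longest_orf(seq):
    """(start, end) of the longest ATG...stop stretch on the forward strand, or None."""
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        end = first_stop(seq, i)
        if end is not None and (best is None or end - i > best[1] - best[0]):
            best = (i, end)
    return best


# Embryo cDNA from its ATG at bp 307 (Figure 10): ATG AAT GGA TAC CAA CAT AAG AAA CAG AAT TCC
cdna_piece = "ATGAATGGATACCAACATAAGAAACAGAATTCC"
# Clone 9C across the six-base gap (Figure 5): ATG ATT GGA TAT AAG AAA TGA ...
clone_9c_piece = "ATGATTGGATATAAGAAATGAACTTC"

print(first_stop(cdna_piece))      # None: no stop codon in this piece
print(first_stop(clone_9c_piece))  # 21: TGA ends the seventh codon
print(longest_orf("CCATGAAATTTTAGCC"))
```

Where the cDNA reads AAG AAA CAG, clone 9C reads AAG AAA TGA, which introduces a premature stop of the kind marked in Fig. 6B.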

4. PCR analysis of the Artemia franciscana phage DNA library

Since we were unable to isolate a phage from the EMBL3 library with a sequence identical to the embryo cDNA coding for cathepsin L, we used PCR to determine whether the phage (EMBL3) library contains a CL gene matching the Artemia embryo cDNA sequence. This alternative to conventional screening of the phage library should detect and amplify the desired CL gene sequence if it is present in the library. The Artemia franciscana genomic DNA library constructed in EMBL3 was purified as described in the methods. The primers chosen were designed from the embryo CL cDNA sequence and were designated CLF10 and CLR8 (see Appendix 1 and 2). The sequence between the two primers covers the prepro- and mature regions of the protease. These primers also amplified the Artemia CL-2 gene efficiently, because they are similar in sequence to the CL-2 gene. The results in Figure 7 show that PCR on total DNA from the EMBL3 library produced a single band of about 800 bp. The PCR product shown in Fig 7 was cloned in vector pCR 2.1, and several white colonies were analyzed by EcoR I digestion and by probing with 32P-labeled CL cDNA. As shown in Fig 8, each of the four clones analyzed contained an insert of about 800 bp, and all lacked the internal EcoR I site found in Artemia embryo CL cDNA. All four clones were sequenced and showed about 97% identity, including the expected amount of polymorphism. Figure 9 compares the sequence of genomic clone 818, obtained using PCR, with that of clone 9C, obtained by screening a genomic DNA library prepared in EMBL3. Figure 10 compares genomic clone 818 with the CL cDNA isolated from an Artemia embryo cDNA library constructed in phage λZAPII. All four clones compared well (99% identical)

[Figure 7 image: agarose gel, lanes 1-2; size markers at 1000 and 750 bp; PCR product of ~800 bp indicated]

Fig. 7. PCR analysis of total DNA from the Artemia franciscana EMBL3 genomic DNA library.

PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in the methods. Lane 1 is a 1 kb ladder, and lane 2 is the PCR product generated from the Artemia EMBL3 genomic library using the primer pair CLF10 and CLR8 (see Appendix 1 and 2).

[Figure 8 image: agarose gel, lanes 1-5; size markers including 4000 and 750 bp; vector (pCR2.1) band and ~800 bp insert indicated]

Fig. 8. EcoR I restriction endonuclease digestion of PCR-generated fragments from EMBL3 DNA cloned into pCR2.1.

Lane 1, 1 kb ladder; lane 2, clone 818; lane 3, clone 819; lane 4, clone 820; lane 5, clone 821. All four clones have inserts of about 800 base pairs.

Fig. 9. Alignment of PCR-derived clone 818 with Artemia cathepsin L genomic clone 9C (CL-2 gene).

The sequence of Artemia cathepsin L genomic clone 9C (AY557372) was obtained from the GenBank database. The alignment was performed using the program Clustal W (1.82). Asterisks indicate identical base pairs. Dashes indicate regions not sequenced. The sequences used as primers in PCR (CLF10 and CLR8) are underlined.

Figure 9.

9C     CAAATGAAGCAGATTACTTTGACATATTTACTAACAGCTGTAATGATATTTTTACTGTCA 60
818    ------

9C     GTTGTACTTGTGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGG 120
818    ------

9C     TATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA 180
818    -ATCTATTCAAGGCTACACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA 59
        *************** *******************************************

9C     ATGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAA 240
818    ATGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAA 119
       ************************************************************

9C     GGCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTT 300
818    GGCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTT 179
       ************************************************************

9C     ACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACT 360
818    ACATCTATCATGATTGGATATAAGAAACGAACTTCACCCTTTGCTAAGAGCACTTTTACT 239
       *************************** ********************************

9C     TTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCA 420
818    TTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCA 299
       ************************************************************

9C     GTAACTCCTGTAAAATACCCAGGACAGTGTGCTTCTTGCTTGGCTTTTTCACCTACTGGT 480
818    GTAACTCCTGTAAAATACCAAGGACAGTGTGCTTCTTGCTTGGCTTTTTCACCTACTGGT 359
       ******************* ****************************************

9C     GCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAAC 540
818    GCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAAC 419
       ************************************************************

9C     TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAA 600
818    TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAA 479
       ************************************************************

9C     GCTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAA 660
818    GCTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAA 539
       ************************************************************

9C     GCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTT 720
818    GCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTT 599
       ************************************************************

9C     GTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCT 780
818    GTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCT 659
       ************************************************************

9C     GTTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTAC 840
818    GTTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTAC 719
       ************************************************************

9C     TATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCACGAAGTTCTTGTAATTGGC 900
818    TATGAGGCATCATGTAAAACATCATTTGAACACCTAAACCACGCAGTTCTTGTAATTGGC 779
       ****** ************************************ ****************

9C     TGTGGTTCTGATAATGGCGAAGACTATTGGCTCGTTAAAAACTCATGGTCTAAGCACTGG 960
818    TGTGGTTCTGATAATGGCGAAGACTAT------ 806
       ***************************

9C     GGAGACGAAGGGTACCTCAAGATTGCTCGCAATCGCAAGAACCATTGTGGTGTTGCTACT 1020
818    ------

9C     GCAGCTCTCTATCCAATTGTATAGATAGGGTTGTGGTACTTTTTGTGATGTGTGTAATTG 1080
818    ------

9C     ACCACGGTACATCT 1094
818    ------

Fig. 10. Sequence alignment of PCR-derived clone 818 from the EMBL3 genomic DNA library with Artemia embryo cathepsin L cDNA.

The sequence of Artemia embryo cathepsin L cDNA (AF147207) was obtained from the GenBank database. The alignment was performed using Clustal W (1.82). Asterisks indicate identical base pairs. Dashes at bp 318-323 of the cDNA indicate sequence missing in clone 818. Other dashes indicate regions not sequenced. The EcoR I restriction site in Artemia embryo CL cDNA is boxed. See Fig 4 for a similar comparison. The sequences used as primers in PCR (CLF10 and CLR8) are underlined.

Figure 10.

EcDNA1 CATCTTGTGGCAGACAATTACACAATGAAGCAGATTACTTTGATATTTTTACTGGGAGCT 60
818    ------

EcDNA1 GTACTTGTGCAGTTAAGTGCTGCACTATCACTGACAAATTTACTTGCTGATGAATGGCAT 120
818    ----------------------------------------------------------AT 2
                                                                 **

EcDNA1 CTATTCAAGGCTACACACAAGAAAGAATATCCAAGCCAACTTGAGGAGAAATTTAGAATG 180
818    CTATTCAAGGCTACACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGAATG 62
       ************************** ******************** ************

EcDNA1 AAGATTTATTTGGAAAATAAACACAAAGTTGCCAAACATAACATCCTTTATGAAAAAGGC 240
818    AAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGC 122
       *********** ********* ***** ************************* ******

EcDNA1 GAAAAGTCTTATCAAGTCGCAATGAATAAGTTTGGAGATCTTCTTCATCATGAATTTAGA 300
818    GAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTTACA 182
       ***************** ********* ****************************** *

EcDNA1 TCTATCATGAATGGATACCAACATAAGAAACA|GAATTC|CTCAAGAGCTGAGAGCACTTTC 360
818    TCTATCATGATTGGATA------TAAGAAACGAACTTCACCCTTTGCTAAGAGCACTTTT 236
       ********** ******      ********  * ***  *    *** **********

EcDNA1 ACTTTTATGGAGCCTGCTAATGTTGAAGTTCCAGAATCTGTTGACTGGAGGGTAAAAGGA 420
818    ACTTTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGA 296
       ************************  ************************** *******

EcDNA1 GCCATAACTCCTGTAAAAGACCAAGGACAGTGTGGTTCATGCTGGGCTTTCTCATCTACT 480
818    GCAGTAACTCCTGTAAAATACCAAGGACAGTGTGCTTCTTGCTTGGCTTTTTCACCTACT 356
       **  ************** *************** *** **** ****** *** *****

EcDNA1 GGTGCCTTGGAAGGTCAAACCTTCAGAAAAACAGGGAAGCTCATTTCTTTGAGTGAACAG 540
818    GGTGCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAA 416
       ************ ******* ************** ***********************

EcDNA1 AACTTGATTGATTGTTCTGGAAAATATGGAAATGAAGGATGCAATGGAGGATTAATGGAC 600
818    AACTTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGC 476
       ***************** **  ***********  ********* ** ****  **   *

EcDNA1 CAAGCTTTCCAGTATATCAAGGATAACAAGGGAATTGACACTGAAAATACGTACCCTTAT 660
818    CAAGCTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTAT 536
       ********  ******************* *******************  ** * ****

EcDNA1 GAAGCTGAAGACAATGTCTGTCGTTATAATCCAAGGAACCGAGGTGCCATTGACCGTGGC 720
818    GAAGCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGC 596
       ****** **** *** ******** ********** ***********  *** ** ****

EcDNA1 TTTGTCCATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACTGTTGGA 780
818    TTTGTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGA 656
       ****** **** ******** *********** ******************** ******

EcDNA1 CCTGTATCTGTTGCCATCGATGCCTCTCATGAAAGTTTCCAATTCTATTCTAAAGGTGTT 840
818    CCTGTTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTT 716
       ***** ** * **  ** **** ********** **** ************** ******

EcDNA1 TACTATGAGCCATCATGTGAC------TCTGATGACCTAGACCACGGAGTTCTTGTGGTT 894
818    TACTATGAGGCATCATGTAAAACATCATTTGAACACCTAAACCACGCAGTTCTTGTAATT 776
       ********* ******** *       * ***  ***** ****** *********  **

EcDNA1 GGCTATGGTTCTGATAATGGCAAAGACTATTGGCTCGTTAAAAACTCGTGGTCTGAGCAC 954
818    GGCTGTGGTTCTGATAATGGCGAAGACTAT------------------------------ 806
       **** **************** ********

EcDNA1 TGGGGAGACGAAGGGTATATCAAGATTGCTCGCAATCGCAAGAACCATTGTGGTATTGCT 1014
818    ------

EcDNA1 ACTGCAGCTAGCTATCCACTTGTATAGATAGGGTTGTGGTAATTTTTGTGGATGTGTGTA 1074
818    ------

EcDNA1 ATTGCATACGTTAAATTCTTATTCTCTTGATAGGTTTAGAGAGTTCTAGTTTTCAGTTTG 1134
818    ------

EcDNA1 ATTCCGTAGATGACAGATTTTGTGACCATATTCGAGAATAAAGCGTTTTTTTTACCTAAA 1194
818    ------

EcDNA1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1229
818    ------

with Artemia genomic clone 9C, but less so (87%) with Artemia embryo cathepsin L cDNA. The results showed that all four clones giving positive signals with the 32P-labeled CL cDNA probe contained sequence identical to the Artemia franciscana CL-2 gene, but none was identical to the Artemia embryo cDNA. These results suggest that the genomic DNA library in EMBL3 is devoid of the CL-1 gene, for reasons discussed later.
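The identity figures quoted in this section (87%, 97%) are percentages of identical bases in a pairwise alignment. As a minimal sketch, assuming two rows that are already aligned (equal length, '-' for gaps), the Python below counts identical bases only over columns where neither row has a gap; this is one common convention and may differ from how Clustal W normalizes its scores. The function names are illustrative. Applied to the EcDNA1 and clone 818 rows covering EcDNA1 positions 721-780 of the alignment above, it regenerates the printed asterisk line.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned rows, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip any column that contains a gap in either row
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def conservation_line(a: str, b: str) -> str:
    """Clustal-style marker line: '*' where both rows carry the same base."""
    return "".join("*" if x == y and x != "-" else " "
                   for x, y in zip(a.upper(), b.upper()))


# EcDNA1 (positions 721-780) and clone 818 rows copied from the alignment.
ecdna1 = "TTTGTCCATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACTGTTGGA"
clone818 = "TTTGTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGA"

print(conservation_line(ecdna1, clone818))
print(round(percent_identity(ecdna1, clone818), 2))
```

Over this single row 55 of 60 bases match (about 91.7%); identity for whole sequences depends on the complete alignment and on how gaps and unsequenced regions are counted.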

5. Attempts to amplify Artemia franciscana cathepsin L-1 and L-2 genes from different preparations of genomic DNA

Genomic DNA was prepared in our lab from nauplii of Artemia franciscana and used as template to search for the cathepsin L genes, using PCR primers CLF13 and CLR18 (see Appendix 1 and 2), synthesized from the sequence of Artemia embryo cathepsin L cDNA. DNA prepared from genomic clone 9C and from the EMBL3 library was also used as template for comparison. Primers were designed to distinguish between Artemia embryo CL cDNA and genomic clone 9C, representing the CL-1 gene and the CL-2 gene, respectively. The PCR results shown in Figure 11 indicate that only genomic DNA prepared from Artemia nauplii gave a PCR product. DNA prepared from clone 9C and total DNA from the EMBL3 library did not yield any PCR products with these primers, while genomic DNA yielded a product of about 600 bp, as expected. The PCR-derived DNA fragment (lane 4, Fig 11) was purified and cloned into vector pCR 2.1. It gave a positive signal when hybridized with 32P-labeled CL cDNA, and when treated with EcoR I it yielded one detectable band of the size predicted from the data in Figure 11 (see Figure 12). Sequencing yielded 565 bp of sequence that was nearly identical (97%) with the Artemia franciscana embryo cathepsin L cDNA sequence (AF147207) (see Fig 13). The sequence of the genomic clone also contained one EcoR I restriction site and two Hind III sites at the same positions as in Artemia embryo CL cDNA. These results indicate that while freshly prepared Artemia genomic DNA contains the CL-1 gene as predicted, the genomic DNA library constructed in EMBL3 is deficient in the gene coding for CL-1 (for

[Gel image: lanes 1-4; size markers at 750 bp and 500 bp; the product of about 600 bp and the primer dimer are marked.]

Fig. 11. PCR products from Artemia genomic DNA prepared from various sources.

PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in methods. The primers used in the PCR were CLF13 and CLR18 (see Appendix 1 and 2). Lane 1, 1 kb ladder; lane 2, PCR reaction with Artemia genomic clone 9C (CL-2 gene); lane 3, PCR reaction with total Artemia DNA in EMBL3; lane 4, PCR product from Artemia franciscana genomic DNA. Only Artemia genomic DNA yielded a product of around 600 bp with the primers used. Betaine (10 µl) was added to each reaction to increase the yield of the products.

[Gel image: lanes 1-3; size markers at 4000, 1000, 750, 500 and 250 bp; the insert band is marked.]

Fig. 12. EcoR I digestion of the PCR product shown in Fig 11 after cloning in pCR2.1. Lane 1, 1 kb ladder; lane 2, plasmid containing Artemia embryo cathepsin L cDNA; lane 3, Artemia genomic DNA clone (4271) derived from PCR as shown in Fig 11.

Fig. 13. Sequence alignment of genomic DNA clone 4271 with Artemia embryo cathepsin L cDNA.

The sequence of Artemia embryo cathepsin L cDNA (AF147207) was obtained from the GenBank database. The alignment was performed using the Clustal W program (1.82). The sequence of genomic DNA clone 4271 was 97% identical with the Artemia embryo cathepsin L cDNA sequence. Asterisks indicate identical bases. Dashes indicate regions not sequenced. The EcoR I restriction site is boxed. Hind III restriction sites are indicated with a light underline. The sequences used as primers in PCR are indicated with a bold underline. The arrowhead indicates the potential cleavage site for the mature region.

Figure 13.

EcDNA1 CATCTTGTGGCAGACAATTACACAATGAAGCAGATTACTTTGATATTTTTACTGGGAGCTGTACTTGTGCAGTTAAGTGCTGCACTATCA 90

4271 ------

EcDNA1 CTGACAAATTTACTTGCTGATGAATGGCATCTATTCAAGGCTACACACAAGAAAGAATATCCAAGCCAACTTGAGGAGAAATTTAGAATG 180

4271 ------

EcDNA1 AAGATTTATTTGGAAA&TAAACACAAAGTTGCCAAACATAACATCCTTTATGAAAAAGGCGAAAAGTCTTATCAAGTCGCAATGAATAAG 270

4271 ------

EcDNA1 TTTGGAGATCTTCTTCATCATGAATTTAGATCTATCATGAATGGATACCAACATAAGAAACA[GAATTC]CTCAAGAGCTGAGAGCACTTTC 360

4271 ------CCAACATAAGAAACA[GAATTC]CTCAAGAGCTGAGAGTACTTTC 43

EcDNA1 ACTTTTATGGAGCCTGCTAATGTTGAAGTTCCAGAATCTGTTGACTGGAGGGTAAAAGGAGCCATAACTCCTGTAAAAGACCAAGGACAG 450

4271 ACTTTTATGGAGCCTGCTAATGTTGAAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCCATAACTCCTGTAAAGGACCAAGGACAG 133

EcDNA1 TGTGGTTCATGCTGGGCTTTCTCATCTACTGGTGCCTTGGAAGGTCAAACCTTCAGAAAAACAGGGAAGCTCATTTCTTTGAGTGAACAG 540

4271 TGTGGTTCATGCTGGGCTTTCTCATCTACTGGTGCCCTGGAAGGTCAAACCTTCAGAAAAACAGGGAAGCTCATTTCTTTGAGTGAACAG 223

EcDNA1 AACTTGATTGATTGTTCTGGAAAATATGGAAATGAAGGATGCAATGGAGGATTAATGGACCAAGCTTTCCAGTATATCAAGGATAACAAG 630

4271 AACTTGATTGATTGTTCTGGAAAATATGGAAATGAAGGATGCAATGGAGGATTGATGGACCAAGCTTTCCAGTATATCAAGGATAACAAG 313

EcDNA1 GGAATTGACACTGAAAATACGTACCCTTATGAAGCTGAAGACAATGTCTGTCGTTATAATCCAAGGAACCGAGGTGCCATTGACCGTGGC 720

4271 GGAATTGACACTGAAAATACGTATCCTTATGAAGCTGAAGACGATGTCTGTCGTTATAATCCAAGGAACCGAGGTGCAGTTGACCGCGGC 403

EcDNA1 TTTGTCCATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACTGTTGGACCTGTATCTGTTGCCATCGATGCCTCTCAT 810

4271 TTTGTCGATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACGGTTGGACCTGTATCTGTTGCCATCGATGCCTCTCAT 493

EcDNA1 GAAAGTTTCCAATTCTATTCTAAAGGTGTTTACTATGAGCCATCATGTGACTCTGATGACCTAGACCACGGAGTTCTTGTGGTTGGCTAT 900

4271 GAAAGTTTCCAATTCTATTCTAAAGGTGTTTAgTATGAGCCATgATGTGAgTgTGATGACCTAGACCACGGA------ 565

EcDNA1 GGTTCTGATAATGGCAAAGACTATTGGCTCGTTAAAAACTCGTGGTCTGAGCACTGGGGAGACGAAGGGTATATCAAGATTGCTCGCAAT 990

4271 ------

EcDNA1 CGCAAGAACCATTGTGGTATTGCTACTGCAGCTAGCTATCCACTTGTATAGATAGGGTTGTGGTAATTTTTGTGGATGTGTGTAATTGCA 1080

4271 ------

EcDNA1 TACGTTAAATTCTTATTCTCTTGATAGGTTTAGAGAGTTCTAGTTTTCAGTTTGATTCCGTAGATGACAGATTTTGTGACCATATTCGAG 1170

4271 ------

EcDNA1 AATAAAGCGTTTTTTTTACCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1229

4271 ------

possible reasons discussed later). These observations explain why at least three individuals in our lab were unable to isolate the CL-1 gene from the Artemia genomic library constructed in EMBL3.
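The EcoR I and Hind III checks used to compare clone 4271 with the embryo cDNA amount to locating each enzyme's 6-bp recognition sequence (G^AATTC for EcoR I, A^AGCTT for Hind III, both cutting after the first base). As a minimal sketch with illustrative function names, the Python below scans one 90-nt row of clone 4271 as printed in Figure 13 (positions 404-493) and predicts fragment lengths for a linear molecule; it ignores vector sequence and any methylation effects.

```python
# Recognition sequence and cut offset on the top strand. Both sites are
# palindromic, so scanning one strand finds every site.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every site of enzyme."""
    site, offset = ENZYMES[enzyme]
    cuts = [p + offset for p in find_sites(seq, site)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Clone 4271, positions 404-493, copied from Figure 13.
row = ("TTTGTCGATATCCCATCTGGAGAAGAAGATAAGCTTAAGGCAGCTGTTGCCACGGTTGGACC"
       "TGTATCTGTTGCCATCGATGCCTCTCAT")
print(find_sites(row, "AAGCTT"), find_sites(row, "GAATTC"))
print(digest(row, "HindIII"))
```

The Hind III site found at index 30 of this row corresponds to position 434 of clone 4271; the other sites mentioned in the text lie in rows not used here.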

6. Isolation of a cathepsin L cDNA from an Artemia adult cDNA library

In an attempt to determine whether the Artemia CL-2 gene is functional, an Artemia franciscana adult cDNA library in λgt11 was analyzed by PCR for a cDNA matching the CL-2 gene sequence. Total DNA from the adult cDNA library in λgt11 was isolated following the procedures described in the methods, and the DNA was used as template in a PCR reaction. This approach avoided the need to screen large numbers of phage (plaques), which is very time consuming and not always successful. The PCR primers were designed based on the sequence of Artemia genomic clone 9C, as shown in Appendix 1 and 3. Primers CL9CF1 and CLR11 covered the prepro-region of the CL-2 gene, while primers CLF11 and CLR10b covered part of the region coding for the mature protease of the CL-2 gene. Artemia genomic clone 9C (CL-2 gene) was used as a positive control template. The PCR products were separated by electrophoresis on a 1% agarose gel, stained with ethidium bromide and visualized using UV transillumination. The results in Figure 14 show that the adult cDNA library yielded PCR products identical in size to those obtained using DNA from genomic clone 9C. Primer pair CL9CF1 and CLR11 yielded a fragment of around 300 bp, while primer pair CLF11 and CLR10b yielded a product of about 600 bp. The PCR products were purified and cloned into vector pCR 2.1, and the clones that gave a signal with 32P-labeled CL cDNA were sequenced.

Sequences of the two PCR products derived from Artemia adult cDNA (see Fig 14) were aligned with Artemia cathepsin L genomic clone 9C as shown in Fig 15. Overall, the clones were 97% identical with genomic clone 9C (CL-2) and had a restriction pattern identical to that found in clone 9C. These results demonstrate that

[Gel images: panels A and B, lanes 1-3 each; size markers at 750, 500 and 250 bp; products of about 600 bp (panel A) and about 300 bp (panel B) are marked.]

Fig. 14. Comparison of PCR products obtained from an Artemia adult cDNA library and Artemia clone 9C representing the CL-2 gene.

PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in methods. Panel A shows PCR products obtained using primers CLF11 and CLR10b. Lane 1, 1 kb ladder; lane 2, genomic clone 9C; lane 3, Artemia adult cDNA library. Both the control (genomic clone 9C) and the adult cDNA library yielded a product of similar size, about 600 bp. Panel B shows PCR products obtained using primers CL9CF1 and CLR11. Lanes 1, 2 and 3 represent the same DNAs as given for panel A. The PCR product was about 300 bp in both lanes 2 and 3.

Fig. 15. Comparison of DNA sequences derived by PCR from an Artemia franciscana adult cDNA library with the Artemia CL-2 gene (clone 9C).

The sequence of Artemia cathepsin L genomic clone 9C (CL-2 gene) (AY557372) was obtained from the GenBank database. Products obtained from the two PCR experiments shown in Fig 14 were cloned into pCR2.1, sequenced, then combined for presentation here. In total, 874 bp of sequence was obtained using Artemia adult cDNA as template. The alignment was performed using the program Clustal W (1.82). The sequence of clones derived from Artemia adult cDNA was 97% identical with the Artemia CL-2 gene. Asterisks indicate identical bases. Dashes indicate regions not yet sequenced. The sequences used as primers in PCR are underlined. The overlapping area of the two clones is indicated with bold letters.

Figure 15.

9C CAAATGAAGCAGATTACTTTGACATATTTACTAACAGCTGTAATGATATTTTTACTGTCA 60

ACDNA2 --------GCAGATTACTTTGACATATTTACTGGAAGCTGTACTGATATTTTTACTGTCA 52

        ************************   ******* *****************

9C GTTGTACTTGTGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGG 120

ACDNA2 GTTGTACTTGTGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGG 112

************************************************************

9C TATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGA 180

ACDNA2 TATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGGGGAAAAATTTAGA 172

********************************************** *************

9C ATGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAA 240

ACDNA2 ATGAAGATTTATTTTGGAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAA 232

**************** *******************************************

9C GGCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTT 300

ACDNA2 GGCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTT 292

************************************************************

9C ACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACT 360

ACDNA2 ACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACT 352

************************************************************

9C TTTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCA 420

ACDNA2 TTTATGGAGCCTGCTAACGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCA 412

***************** ******************************************

9C GTAACTCCTGTAAAATACCCAGGACAGTGTGCTTCTTGCTTGGCTTTTTCACCTACTGGT 480

ACDNA2 GTAACTCATGTAAAATACCAAGGACAGTGTGCTTCTTGCTGGGCTTTTTCATCTACTGGT 472

******* *********** ******************** ********** ********

9C GCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAAC 540

ACDNA2 GCCTTGAAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAGAAC 532

****** ************************************************* ***

9C TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAA 600

ACDNA2 TTGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGAGGGATGGATAAGCCAA 592

******************************************* ****************

9C GCTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAA 660

ACDNA2 GCTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAA 652

************************************************************

9C GCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTT 720

ACDNA2 GCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAATTGCCCTTGGCTTT 712

********************************************* **************

9C GTCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCT 780

ACDNA2 GTCAATATTCAATCTGGGGAAGAAGATAAACTTCAGGCAGCTGTTGCCACGGTTGGACCT 772

********** ********************** **************************

9C GTTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTAC 840

ACDNA2 GTTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTAC 832

************************************************************

9C TATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCACGAAGTTCTTGTAATTGGC 900

ACDNA2 TATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCAC------ 874

******************************************

9C TGTGGTTCTGATAATGGCGAAGACTATTGGCTCGTTAAAAACTCATGGTCTAAGCACTGG 960

ACDNA2

9C GGAGACGAAGGGTACCTCAAGATTGCTCGCAATCGCAAGAACCATTGTGGTGTTGCTACT 1020

ACDNA2

9C GCAGCTCTCTATCCAATTGTATAGATAGGGTTGTGGTACTTTTTGTGATGTGTGTAATTG 1080

ACDNA2

9C ACCACGGTACATCT 1094

ACDNA2

the Artemia franciscana CL-2 gene is expressed in adult Artemia franciscana and probably represents a functional gene rather than a pseudogene. Further, analysis of the PCR product obtained using the Artemia adult cDNA library revealed two open reading frames in the cDNA. Fig 16 shows these two open reading frames in the Artemia adult cathepsin L cDNA sequence, identified using the NCBI ORF Finder; they encode 49 and 173 amino acids, respectively (the second reading frame remains open at the 3' end of the sequenced region).
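ORF Finder scans each reading frame for an ATG followed by an in-frame stop codon. The Python below is a simplified stand-in for that search, not NCBI's implementation: it uses the standard genetic code, reads only the given strand, and takes the first ATG after each stop as the start. The function names and the 30-codon minimum are illustrative assumptions. The test sequence is the first open reading frame of Figure 16 (positions 173-322).

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, one amino acid each.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unreadable codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(seq: str, min_aa: int = 30, allow_partial: bool = False):
    """Return sorted (start, end, protein) tuples for ATG-initiated ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG after the previous stop in this frame
            elif CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])
                if len(protein) >= min_aa:
                    orfs.append((start, i + 3, protein))
                start = None
        if start is not None and allow_partial:
            # Frame still open at the 3' end: report the partial ORF.
            protein = translate(seq[start:])
            if len(protein) >= min_aa:
                orfs.append((start, start + 3 * len(protein), protein))
    return sorted(orfs)


ORF_A = ("atgaagatttattttggaaataaagacaaaattgccaaacataac"
         "atcctttatgagaaaggcgaaaagtcttatcaagttgcaatgaat"
         "cagtttggagatcttcttcatcatgaatttacatctatcatgatt"
         "ggatataagaaatga")

for start, end, protein in find_orfs(ORF_A):
    print(start, end, len(protein), protein)
```

Positions are 0-based within the supplied string, so adding 172 converts the example to the numbering of Figure 16. With allow_partial=True, a frame that reaches the end of the sequence without a stop codon, like the second reading frame of Figure 16, is also reported.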

7. Isolation of a cathepsin L cDNA representing the CL-2 gene from the Artemia embryo cDNA library

An Artemia embryo cDNA library in λZAPII was screened previously in our lab (Butler, 2001), and an Artemia cathepsin L cDNA of 1229 bp was isolated from the library. The embryo CL cDNA was about 80% identical with genomic clone 9C and therefore could not have been derived from this genomic clone. To search the Artemia embryo cDNA library for a cDNA identical with the CL-2 gene (clone 9C), PCR was performed. Total DNA from the Artemia embryo cDNA library in λZAPII was converted into the pBluescript phagemid as described previously (Butler et al. 2001), and 276 ng of DNA was used as template in PCR. The vector primer TP-7F (T7 promoter) was used with the internal primer CLR11 in PCR reactions (see Appendix 1); the internal primer was specific for the CL-2 gene. The PCR products were analyzed on a 1% agarose gel as shown in Fig 17. The PCR yielded several products of different sizes, presumably because the T7 promoter primer can anneal to the vector sequence flanking every cDNA insert in the library.

The gel was then blotted to a nitrocellulose membrane and probed with 32P-labeled CL cDNA. The membrane was washed and exposed to X-ray film at -80 °C. The PCR product obtained with TP-7F and CLR11 showed a strong positive signal for a band of about 500 bp (Fig 17). This PCR product was purified using the Wizard DNA Clean-Up system (Promega), then ligated into plasmid vector pCR 2.1. After screening plasmid DNA with 32P-labeled CL cDNA, the clones giving a positive signal were collected, and their DNA was isolated using the Wizard Miniprep Kit (Promega) and sequenced.

Fig. 16. Open reading frames in Artemia adult cathepsin L cDNA sequence.

Panel A, first open reading frame in Artemia adult CL cDNA. The putative translation start codon, ATG, is underlined. The asterisk indicates the stop codon. Panel B, second open reading frame in adult CL cDNA. The deduced amino acid sequence of adult CL cDNA is shown under the nucleotide sequence. The numbers at the left indicate the positions of the nucleotides and amino acids in the complete sequence.

Figure 16.

A.

173 atgaagatttattttggaaataaagacaaaattgccaaacataac

MKIYFGNKDKIAKHN

218 atcctttatgagaaaggcgaaaagtcttatcaagttgcaatgaat

ILYEKGEKSYQVAMN

263 cagtttggagatcttcttcatcatgaatttacatctatcatgatt

QFGDLLHHEFTSIMI

308 ggatataagaaatga 322

GYKK*

B.

356 atggagcctgctaacgttacagttccagaatctgttgactggagg

MEPANVTVPESVDWR

401 gaaaaaggagcagtaactcatgtaaaataccaaggacagtgtgct

EKGAVTHVKYQGQCA

446 tcttgctgggctttttcatctactggtgccttgaaaagtcaaact

SCWAFSSTGALKSQT

491 ttcagaaaaacaggaaagctcatttctttgagtgaacagaacttg

FRKTGKLISLSEQNL

536 attgattgttccggtgaatatggaaatttaggatgcaaagaggga

IDCSGEYGNLGCKEG

581 tggataagccaagcttttgagtatatcaaggataacaaaggaatt

WISQAFEYIKDNKGI

626 gacactgaaaataaatatcattatgaagctaaagaaaatttctgt

DTENKYHYEAKENFC

671 cgtgataatccaagaaaccgaggtgcaattgcccttggctttgtc

RDNPRNRGAIALGFV

716 aatattcaatctggggaagaagataaacttcaggcagctgttgcc

NIQSGEEDKLQAAVA

761 acggttggacctgtttccgctgttattgatgtctctcatgaaggt

TVGPVSAVIDVSHEG

806 tttcaattctattctaagggtgtttactatgagccatcatgtaaa

FQFYSKGVYYEPSCK

851 acatcatttgaacacctaaaccac 874

TSFEHLNH

[Gel and autoradiograph images: panels A and B; size markers at 500 bp and 250 bp; the hybridization signal is marked.]

Fig. 17. PCR products derived from Artemia embryonic cDNA library.

PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in methods. Panel A, PCR product using primers TP-7F and CLR11 (see Appendix 1). Panel B, X-ray film after Southern blotting of the gel. Lanes 1 and 2 represent the same DNAs as given for Panel A. In lane 2 the band of about 500 bp hybridized strongly with the 32P-labeled CL cDNA probe.

A total of 340 bp of sequence was obtained (excluding sequence associated with the T7 promoter and vector) and aligned with Artemia cathepsin L genomic clone 9C as shown in Figure 18. The 340-bp insert is 97% identical with genomic clone 9C (CL-2), demonstrating that the Artemia CL-2 gene is also expressed in Artemia franciscana embryos. Figure 19 (panel A) shows the amino acid sequence of the open reading frame in the Artemia embryo cDNA sequence representing the cathepsin L-2 gene. This 49-amino-acid sequence compares well (97%) with the first open reading frame of Artemia adult CL cDNA, as shown in Figure 19 (panel B).

8. Identity of 5’ upstream sequences of Artemia franciscana CL genes

As described above, the Artemia franciscana CL-2 gene was confirmed to be functional, or at least transcribed, so we attempted to identify the 5’ upstream part of the CL-2 gene to help understand the transcriptional regulation of Artemia cathepsin L genes. To obtain the 5’ flanking sequence of the CL-2 gene, PCR was performed using degenerate primers (OPC-2, OPC-4, OPC-8, OPC-9) together with an internal primer (CLR10) (see Appendix 1) in the coding region of the CL-2 gene. Since genomic clone 9C, containing an insert of about 8 kb, represented the CL-2 gene in Artemia, it was used as template for the PCR analysis. The PCR reactions were performed in two consecutive steps according to the nature of the degenerate primers, as described in the methods. The PCR products were separated by electrophoresis on a 1% agarose gel; each primer pair yielded one intense band and several minor bands ranging in size from 250 to 1500 bp (see Fig 20). The complex pattern was attributed to the degenerate primers, which may have multiple binding sites in the template DNA.
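Why a short degenerate primer can give several products is easiest to see by counting its potential priming sites on both strands of a template. As a minimal sketch, the Python below assumes the primer is written with standard IUPAC ambiguity codes and requires exact matches (real annealing tolerates mismatches, especially in a low-stringency first step); the primer and template are hypothetical and are not the OPC sequences of Appendix 1. A hit on the '-' strand means the primer's reverse complement occurs in the given strand, which is the same kind of check that shows whether one primer sequence lies on opposite strands at the two ends of a product.

```python
import re
from math import prod

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences a degenerate primer represents."""
    return prod(len(IUPAC[b]) for b in primer.upper())


def binding_sites(template: str, primer: str) -> list[tuple[int, str]]:
    """0-based positions of exact primer matches on either strand."""
    template = template.upper()
    hits = []
    for strand, seq in (("+", primer.upper()), ("-", reverse_complement(primer))):
        pattern = "".join(f"[{IUPAC[b]}]" for b in seq)
        # Zero-width lookahead so that overlapping matches are all reported.
        hits += [(m.start(), strand)
                 for m in re.finditer(f"(?={pattern})", template)]
    return sorted(hits)


primer = "GGATCR"                    # hypothetical 6-nt degenerate primer
template = "AAGGATCATTTTGATCCAA"     # hypothetical template strand
print(degeneracy(primer), binding_sites(template, primer))
```

Each additional ambiguous position multiplies the number of sequences in the primer mixture, and with it the chance of finding a matching site somewhere in an 8-kb insert.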

The gel containing the PCR products was blotted to a nitrocellulose membrane and probed with 32P-labeled CL cDNA. Among the four primer pairs used in PCR, only one gave a CL-positive product, namely primer pair OPC-4

Fig. 18. Comparison of DNA sequences derived by PCR from an Artemia franciscana embryo cDNA library with the Artemia CL-2 gene (clone 9C).

The sequence of Artemia cathepsin L genomic clone 9C (CL-2 gene) (AY557372) was obtained from the GenBank database. The alignment was performed using the program Clustal W (1.82). Asterisks indicate identical bases. Dashes indicate regions not yet sequenced. The internal sequence used as primer in PCR is underlined in bold, while the TP-7F primer of the vector is not shown.

Figure 18.

9C CAAATGAAGCAGATTACTTTGACATATTTACTAACAGCTGTAATGATATTTTTACTGTCAGTTGTACTTGTGCAGTTAAGTGCTACACAAT 91

ECDNA2 --CAAGAAGCAGATTACTTTGAAATATTTACTGGAAGCTGTACTGATATTTTTACTGTCAGTTGTACTTGTGCAGTTAAGTGCTACACAAT 90

9C CACAGTCAAATTTGCTTGCTGATGAATGGTATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGAA 181

ECDNA2 CACAGTCAAATTTGCTTGCTGATGAATGGTATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGGGGAAAAATTTAGAA 180

9C TGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGCGAAAAGTCTTATCAAGTTGCAATGAATC 271

ECDNA2 TGAAGATTTATTTTGGAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAGGCGAAAAGTCTTATCAAGTTGCAATGAATC 270

9C AGTTTGGAGATCTTCTTCATCATGAATTTACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACTT 361

ECDNA2 AGTTTGGAGATCTTCTTCATCATGAATCTACATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTT------ 341

9C TTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAGTAACTCCTGTAAAATACCCAGGACAGTGTG 451

ECDNA2

9C CTTCTTGCTTGGCTTTTTCACCTACTGGTGCCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAACT 541

ECDNA2

9C TGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAAGCTTTTGAGTATATCAAGGATAACAAAGGAA 631

ECDNA2

9C TTGACACTGAAAATAAATATCATTATGAAGCTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTTG 721

ECDNA2

9C TCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCTGTTTCCGCTGTTATTGATGTCTCTCATGAAG 811

ECDNA2

9C GTTTTCAATTCTATTCTAAGGGTGTTTACTATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCACGAAGTTCTTGTAATTGGCT 901

ECDNA2

9C GTGGTTCTGATAATGGCGAAGACTATTGGCTCGTTAAAAACTCATGGTCTAAGCACTGGGGAGACGAAGGGTACCTCAAGATTGCTCGCA 991

ECDNA2

9C ATCGCAAGAACCATTGTGGTGTTGCTACTGCAGCTCTCTATCCAATTGTATAGATAGGGTTGTGGTACTTTTTGTGATGTGTGTAATTGA 1081

ECDNA2

9C CCACGGTACATCT 1094

ECDNA2

A.

179 atgaagatttattttggaaataaagacaaaattgccaaacataac

MKIYFGNKDKIAKHN

224 atcctttatgagaaaggcgaaaagtcttatcaagttgcaatgaat

ILYEKGEKSYQVAMN

269 cagtttggagatcttcttcatcatgaatctacatctatcatgatt

QFGDLLHHESTSIMI

314 ggatataagaaatga 328

GYKK*

B.

EcDNA MKIYFGNKDKIAKHNILYEKGEKSYQVAMNQFGDLLHHESTSIMIGYKK 49

AcDNA MKIYFGNKDKIAKHNILYEKGEKSYQVAMNQFGDLLHHEFTSIMIGYKK 49

Fig. 19. Analysis of the open reading frame in Artemia embryo cathepsin L-2 cDNA sequence.

Panel A, open reading frame in Artemia embryo CL-2 cDNA. The putative translation start codon, ATG, is underlined. The numbers at the left indicate the positions of the nucleotides and amino acids in the complete sequence. The asterisk indicates the stop codon. Panel B, comparison of the open reading frame in Artemia embryo CL-2 cDNA with the first open reading frame in Artemia adult CL-2 cDNA. Asterisks indicate identical amino acids.

and CLR10. This primer pair yielded a positive signal for the band at about 400 bp (Fig 20, panel B, lane 5). The band was cut from the gel, purified, and cloned into plasmid vector pCR 2.1. After screening plasmid DNA with 32P-labeled CL cDNA, the clones with a positive signal were purified and treated with EcoR I restriction endonuclease to check for the presence of inserts, and one of these clones was sequenced. A total of 404 bp of DNA sequence was obtained, including 260 bp of sequence 5’ to the sequence determined previously for Artemia clone 9C. The newly obtained 5’ sequence was combined with the previously determined sequence for clone 9C and compared with the Artemia adult CL cDNA and Artemia embryo CL-2 cDNA sequences as shown in Figure 22. As expected, most of the 3’ end of the PCR product was nearly identical (97%) with both the adult CL cDNA and the embryo CL cDNA derived from the CL-2 gene, but the 5’ end of the PCR product had a surprisingly new and different sequence. First, primer CLR10 appeared at both ends of the PCR product, but on opposite (complementary) strands. Second, the 5’ end of the PCR product contained an open reading frame of 69 amino acids, possibly representing another gene (a DEAD-box helicase), as shown in Figure 23. The amino acid sequence of this open reading frame is about 30% identical with DEAD-box helicases. The 227-bp 5’ upstream sequence, excluding the adjacent OPC-4 and CLR10 primer sequences, was analyzed by Match-Public 1.0 (core similarity 0.95, matrix similarity 0.95) using the TRANSFAC 6.0 database (Wingender et al. 2000; Wingender et al. 2001; Matys et al. 2003) to identify putative transcription factor binding sites, as shown in Figure 24. Several putative binding sites were found, including GATA-1, GATA-3, CDP CR3+HD, C/EBP, CDP CR1, NF-Y and Lmo2 complex. Some of them have been identified previously in the promoter regions of genes coding for proteases (Bakhshi et al. 2001).
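Match reports a putative site when both the core similarity and the matrix similarity of a sequence window against a TRANSFAC position frequency matrix reach the chosen cut-offs (0.95 here). As a minimal sketch of the matrix-similarity part only, and as we understand the scoring scheme described for Match, each matrix position i is weighted by an information term I(i) = Σ_b f(i,b) ln(4 f(i,b)), and the weighted score of a window is rescaled between the worst and best possible scores. The 4-column GATA-like matrix below is invented for illustration and is not a TRANSFAC matrix; core similarity would apply the same idea to the matrix's most conserved positions, and Match also scans the reverse strand, which this sketch omits.

```python
import math

# Hypothetical position frequency matrix (one dict per position), GATA-like.
MATRIX = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.10, "T": 0.80},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
]


def information(column: dict[str, float]) -> float:
    """I(i) = sum_b f(b) * ln(4 f(b)); zero-frequency bases contribute nothing."""
    return sum(f * math.log(4 * f) for f in column.values() if f > 0)


def weighted_sum(freqs: list[float], matrix: list[dict[str, float]]) -> float:
    return sum(information(col) * f for col, f in zip(matrix, freqs))


def matrix_similarity(site: str, matrix: list[dict[str, float]]) -> float:
    """Rescale the information-weighted score of site to the range 0..1."""
    current = weighted_sum([col[b] for col, b in zip(matrix, site.upper())], matrix)
    best = weighted_sum([max(col.values()) for col in matrix], matrix)
    worst = weighted_sum([min(col.values()) for col in matrix], matrix)
    return (current - worst) / (best - worst)


def scan(sequence: str, matrix: list[dict[str, float]], cutoff: float = 0.95) -> list[int]:
    """0-based start positions (given strand only) reaching the cut-off."""
    w = len(matrix)
    return [i for i in range(len(sequence) - w + 1)
            if matrix_similarity(sequence[i:i + w], matrix) >= cutoff]


print(round(matrix_similarity("GATG", MATRIX), 3))
print(scan("TTGATAGG", MATRIX))
```

Because highly conserved positions carry larger I(i) weights, a mismatch at a conserved position lowers the score more than one at a variable position, which is why a 0.95 cut-off tolerates only minor deviations from the matrix consensus.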

[Gel and hybridization images: panels A and B, lanes 1-5; size markers at 1500, 750, 500 and 250 bp; the hybridization signal is marked.]

Fig. 20. PCR products obtained using degenerate primers and Artemia genomic clone 9C as template.

Panel A, PCR products were separated by electrophoresis on 1% agarose and stained with ethidium bromide as described in methods. Lane 1, 1 kb ladder; lane 2, PCR products using primers OPC-8 and CLR10; lane 3, PCR products using primers OPC-9 and CLR10; lane 4, PCR products using primers OPC-2 and CLR10; lane 5, PCR products using primers OPC-4 and CLR10. Panel B, hybridization with 32P-labeled CL cDNA after Southern blotting of the gel. Lanes 1-5 represent the same DNAs as given for Panel A. Only one band, of about 400 bp, derived from use of OPC-4 and CLR10 as primers showed a positive signal.

Fig. 21. EcoR I digestion of the PCR product obtained with OPC-4 and CLR10 as primers (Fig 20) after cloning in pCR2.1.

Lane 1, 1 kb ladder; lane 2, DNA clone (9C33) derived from PCR using OPC-4 and CLR10 as shown in Fig 20.

Fig. 22. Comparison of DNA sequences of Artemia genomic clone 9C and its 5’ upstream sequence with Artemia franciscana adult CL-2 cDNA and embryo CL cDNA.

The sequence of Artemia cathepsin L genomic clone 9C (CL-2 gene) (AY557372), lacking the newly acquired 5’ sequence, was obtained from the GenBank database, and the 5’ upstream sequence obtained in this experiment was added (indicated with bold letters). The alignment was performed using the program Clustal W (1.82). Asterisks indicate identical bases. Dashes indicate regions not sequenced. The sequences used as primers in PCR to obtain the 5’-extended sequence are underlined in bold. The primer CLR10 sequence appeared at the 5’ end of the sequence next to primer OPC-4, as shown in the box, as well as at positions 381-405 (underlined). The arrowhead indicates the potential cleavage site for the mature region. The open reading frame identified in the 5’ upstream sequence of genomic clone 9C is indicated with a light underline.

71

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 22.

EC D N A 2

A CD N A2

9C [TCTTGTSTGTAGCCTTQAATAj 2 0

E C D N A 2

A CD N A2

9C |gat|ccgcatctactggatcagataaaacqctggcccatatgctqccatcqatcgtccaca8 0

EC D N A 2

A CD N A2

9C TTAAAAACAAGGCAAATCQTAAAAAAQGAQATQGAACCATAGCTTTTATTTCCGCTCAAG 1 4 0

E C DN A 2

A CD N A 2

9C CTAGAGAATTGGCAAAACAGATCAAAGATGTGGCAGAAAAATATGGAGCAGTATCTTGCA 2 0 0

E C DN A 2

A CD N A2

9C TAAGAGGTACATGTGTCTTCGGTGGGTCTCCAAAGAAAGAAACTGAACATAATACTTGCA 2 6 0

E CDNA2 CAA-GAAGCAGATTACTTTGAAATATTTACTGGAAGCTGTACTGATATTTTTACTGTCAG 5 9

A CD N A2 GCAGATTACTTTGACATATTTACTGGAAGCTGTACTGATATTTTTACTGTCAG 5 3

9C CAATGAAGCAGATTACCTTGACATATTTACTAACAACTGTAATGATATTTTTACTGTCAG 320

* * * * * * * * * **** Hr******** * ***** ******************

E C DN A 2 TTGTACTTGTGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGGT 119

ACDNA2 TTGTACTTGTGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGGT 113

9C TTGTACTTGAGCAGTTAAGTGCTACACAATCACAGTCAAATTTGCTTGCTGATGAATGGT 3 80

********* **************************************************

ECDNA2 ATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGGGGAAAAATTTAGAA 17 9

ACDNA2 ATCTATTCAAGGCTAGACACAAGAAAGATTATCCAAGCCAACTTGGGGAAAAATTTAGAA 173

9C ATCTATTCAAGGCTACACACAAGAAAGATTATCCAAGCCAACTTGAGGAAAAATTTAGAA 44 0

*************** ***************************** **************

ECDNA2 TGAAGATTTATTTTGGAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAG 23 9

ACDNA2 TGAAGATTTATTTTGGAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAG 233

9C TGAAGATTTATTTTGAAAATAAAGACAAAATTGCCAAACATAACATCCTTTATGAGAAAG 500

*************** ********************************************

72

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ECDNA2 GCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATCTA 299

ACDNA2 GCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTTA 293

9C GCGAAAAGTCTTATCAAGTTGCAATGAATCAGTTTGGAGATCTTCTTCATCATGAATTTA 560

********************************************************* * *

ECDNA2 CATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTT------3 4 0

ACDNA2 CATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACTT 353

9C CATCTATCATGATTGGATATAAGAAATGAACTTCACCCTTTGCTAAGAGCACTTTTACTT 620

*****************************************

ECDNA2 ------

ACDNA2 TTATGGAGCCTGCTAACGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAG 413

9C TTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAG 680 '

ECDNA2 ------

ACDNA2 TAACTCATGTAAAATACCAAGGACAGTGTGCTTCTTGCTGGGCTTTTTCATCTACTGGTG 473

9C TAACTCCTGTAAAATACCCAGGACAGTGTGCTTCTTGCTTGGCTTTTTCACCTACTGGTG 74 0

ECDNA2 ------

ACDNA2 CCTTGAAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAGAACT 533

9C CCTTGGAAAGTCAAACTTTCAGAAAAACAGGAAAGCTCATTTCTTTGAGTGAACAAAACT 800

ECDNA2 ------

ACDNA2 TGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGAGGGATGGATAAGCCAAG 593

9C TGATTGATTGTTCCGGTGAATATGGAAATTTAGGATGCAAAGGGGGATGGATAAGCCAAG 860

ECDNA2 ------

ACDNA2 CTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAAG 653

9C CTTTTGAGTATATCAAGGATAACAAAGGAATTGACACTGAAAATAAATATCATTATGAAG 92 0

EcDNA2 ------

ACDNA2 CTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAATTGCCCTTGGCTTTG 713

9C CTAAAGAAAATTTCTGTCGTGATAATCCAAGAAACCGAGGTGCAGTTGCCCTTGGCTTTG 980

ECDNA2 ------

ACDNA2 TCAATATTCAATCTGGGGAAGAAGATAAACTTCAGGCAGCTGTTGCCACGGTTGGACCTG 773

9C TCAATATTCCATCTGGGGAAGAAGATAAACTTAAGGCAGCTGTTGCCACGGTTGGACCTG 1040

ECDNA2 ------

ACDNA2 TTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTACT 833

9 C TTTCCGCTGTTATTGATGTCTCTCATGAAGGTTTTCAATTCTATTCTAAGGGTGTTTACT 1100

73

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ECDNA2 ------

ACDNA2 ATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCAC------8 7 4

9C ATGAGCCATCATGTAAAACATCATTTGAACACCTAAACCACGAAGTTCTTGTAATTGGCT 116 0

ECDNA2 ------

AcDNA2 ------

9C GTGGTTCTGATAATGGCGAAGACTATTGGCTCGTTAAAAACTCATGGTCTAAGCACTGGG 122 0

ECDNA2 ------

ACDNA2 ------

9C GAGACGAAGGGTACCTCAAGATTGCTCGCAATCGCAAGAACCATTGTGGTGTTGCTACTG 1280

ECDNA2 ------

ACDNA2 ------

9C CAGCTCTCTATCCAATTGTATAGATAGGGTTGTGGTACTTTTTGTGATGTGTGTAATTGA 1340

ECDNA2 ------

ACDNA2 ------

9C CCACGGTACATCT 1353

A.

50 ctggcccatatgctgccatcgatcgtccacattaaaaacaaggca

LAHMLPSIVHIKNKA

95 aatcgtaaaaaaggagatggaaccatagcttttatttccgctcaa

NRKKGDGTIAFISAQ

140 gctagagaattggcaaaacagatcaaagatgtggcagaaaaatat

ARELAKQIKDVAEKY

185 ggagcagtatcttgcataagaggtacatgtgtcttcggtgggtct

GAVSCIRGTCVFGGS

230 ccaaagaaagaaactgaacataatacttgc 260

PKKETEHNTC

B.

DEAD LAHMLPSIVHIKNKANRKKGDGTIAFISAQARELAKQIKDVAEKYGAVSCIRGTCVFGGS 60

5'-9C AAFLIPILEKLDP------SPKKDGPQALILAPTREIiALQIAEVARKLGKHTNLKWVIYGGT 107

* * **************** * *

DEAD PKKE 64

5'-9C SIDK 111

Fig. 23. Analysis of the 5’ upstream sequence of Artemia genomic clone 9C

representing the CL-2 gene.

Panel A, open reading frame in the 5’ upstream sequence of revised Artemia genomic

clone 9C. The numbers at the left indicate the positions of the nucleotides and amino

acids in the complete sequence; Panel B, BLAST analysis of the amino acid sequence

of the open reading frame in panel A. The best match is a DEAD-box helicase.

Asterisks indicate identical amino acids.
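The amino acid rows in Panel A are the codon-by-codon translation of the nucleotide rows above them. As an illustrative sketch only (not the software used to prepare the figure), the Python function below translates DNA in frame 1 with the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unrecognized codon."""
    dna = dna.upper().replace(" ", "")
    # Ignore a trailing partial codon, if any
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First row of Panel A (nucleotides 50-94 of the upstream sequence)
print(translate("ctggcccatatgctgccatcgatcgtccacattaaaaacaaggca"))  # LAHMLPSIVHIKNKA
```

The same call reproduces the other rows of Panel A, for example `NRKKGDGTIAFISAQ` for the row starting at position 95.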

[Figure 24 graphic: the 227 bp sequence below, annotated with putative binding sites for BR-C Z4, CDP CR1, CDP CR3+HD, GATA-1, GATA-2, GATA-3, GATA-X, Lmo2 complex, NF-Y, Elk-1, Barbie box, C/EBP and FOXJ2; arrows mark the orientation of each site.]

TGGATCAGATAAAACGCTGGCCCATATGCTGCCATCGATCGTCCACATTAAAAACAAGGC 60

AAATCGTAAAAAAGGAGATGGAACCATAGCTTTTATTTCCGCTCAAGCTAGAGAATTGGC 120

AAAACAGATCAAAGATGTGGCAGAAAAATATGGAGCAGTATCTTGCATAAGAGGTACATG 180

TGTCTTCGGTGGGTCTCCAAAGAAAGAAACTGAACATAATACTTGCA 227

Fig. 24. Putative transcription factor binding sites in the 5’ upstream sequence of Artemia genomic clone 9C.

The 227 bp 5’ upstream sequence of genomic clone 9C was analyzed by Match™

Public 1.0 using the TRANSFAC 6.0 database. The putative transcription factor binding

sites are indicated by arrows showing their orientation.
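Match scores candidate sites against TRANSFAC position weight matrices, which is more than a short example can reproduce. As a much simpler, hypothetical illustration of the same idea, the Python sketch below scans a sequence for an IUPAC consensus string. The GATA consensus WGATAR is a commonly cited consensus used here as an assumption, not a matrix taken from TRANSFAC. Only the given strand is searched; the reverse strand would require scanning its reverse complement. The names `IUPAC` and `find_motif` are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of a consensus motif on the given strand."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets overlapping matches be reported as well
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


upstream_start = "TGGATCAGATAAAACGCTGG"  # first 20 bp of the sequence in Fig. 24
print(find_motif(upstream_start, "WGATAR"))  # [6]
```

On these first 20 bp, the scan reports a single consensus match, the AGATAA beginning at position 6 (0-based).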

Discussion:

Cysteine proteases such as cathepsin L have been studied extensively because

they play essential roles in intracellular protein degradation in animal cells. Most

cysteine proteases are synthesized as prepro-enzymes, then undergo proteolytic

processing in the endoplasmic reticulum (ER). In most eukaryotes the mature enzymes are

sent to lysosomes for storage (Ishidoh and Kominami, 1995); however, in Artemia

embryos and larvae, most of the cathepsin L-like cysteine protease activities appear to

be non-lysosomal (Warner and Shridhar 1985; Lu and Warner 1991; Warner et al.

1995). Recently, some cathepsin L cysteine proteases have been identified in

non-lysosomal regions of several organisms. A cathepsin L isoform in murine NIH3T3

cells, which is devoid of an ER signal peptide in the prepro-sequence, has been found

in the nucleus during the G1-S transition phase of the cell cycle, and experimental data

suggest that it functions in the regulation of cell cycle progression through proteolytic

processing of the CDP/Cux transcription factor (Goulet et al. 2004). In the shrimp

Metapenaeus ensis, a cathepsin L encoded by an intron-less gene was found in the

germinal vesicle and is thought to be involved in male chromatin remodeling (Hu and

Leung, 2003). Non-lysosomal cathepsin L has also been identified in Xenopus embryos

(Miyata and Kubo, 1997), Sarcophaga peregrina (Homma and Natori, 1996) and

Onchocerca volvulus (Lustigman et al. 1996).

An Artemia franciscana cathepsin L cDNA of 1229 bp, coding for a protease of

217 amino acids, was isolated previously in our lab from an

embryo cDNA library (Butler et al. 2001). The cDNA encodes the catalytic subunit of

cathepsin L found in eggs and young larvae of the brine shrimp. This cDNA was used

as template to make a [32P]-labeled probe in search of genomic sequences that might

provide clues as to how the CL gene is regulated in Artemia. An Artemia franciscana

genomic library in EMBL3 was screened using the probe and several clones were

isolated. Prior to this study, the Artemia genomic library in EMBL3 had been searched

by a previous worker (Matt Shaw) and several putative CL clones isolated as well. All

clones isolated from the library showed an identical restriction pattern and one clone

(9C) was sequenced. The sequence of clone 9C was entered into the GenBank database

in October 2004. Since clone 9C showed only 85% identity with the Artemia embryo

CL cDNA, further screening of the genomic library in EMBL3 was carried out to

isolate a genomic clone matching the embryo cDNA in order to study the promoter

region of the gene. Since the genomic library constructed in phage EMBL3 should

represent all cathepsin L genes in Artemia, several months were spent screening and

analyzing putative CL gene clones. Ten additional CL-positive clones were isolated,

and upon restriction enzyme analysis all showed the same pattern, which differed from

that observed for the Artemia embryo CL cDNA. Three clones were chosen for further

analysis using PCR, and one cloned PCR product was sequenced. This 482 bp PCR

fragment (see Figure 4) showed that the newly isolated clones were only 90% identical

with the Artemia embryo CL cDNA, but 99% identical with genomic DNA clone 9C

previously isolated in our lab. Clone 9C was therefore designated as the Artemia

franciscana CL-2 gene.
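Identity values such as 85%, 90% and 99% depend on how an alignment is scored, and programs differ in how they treat gaps. The Python sketch below shows one simple convention for two sequences that are already aligned: the number of identical columns divided by all columns except those gapped in both sequences. The function name `percent_identity` is hypothetical, and this convention is not necessarily the one used to obtain the values reported in this thesis.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# One 60-column row of the ACDNA2 / 9C alignment in Fig. 22 (a single C/T mismatch)
acdna2 = "TTATGGAGCCTGCTAACGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAG"
clone_9c = "TTATGGAGCCTGCTAATGTTACAGTTCCAGAATCTGTTGACTGGAGGGAAAAAGGAGCAG"
print(round(percent_identity(acdna2, clone_9c), 2))  # 98.33
```

Counting gaps opposite residues as mismatches gives lower values than discarding those columns, which is one reason identities from different programs are not directly comparable.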

To analyze further the Artemia franciscana EMBL3 genomic library for the

presence of a CL gene matching the embryonic cDNA, PCR was performed with

total EMBL3 DNA as template and the primer pair CLF10 and CLR8. Primers CLF10 and

CLR8 were designed based on the sequence of the CL cDNA; however, the amplified PCR

fragment was identical to the CL-2 gene (clone 9C). In fact, all PCR products derived

from the EMBL3 library DNA were 99% identical with clone 9C

(CL-2), and only 87% identical with the CL cDNA.

As genomic DNA clones matching the embryo CL cDNA could not be found

in the Artemia EMBL3 genomic DNA library, I performed PCR using

genomic DNA isolated from Artemia larvae as template to test for the

presence of the CL-1 gene and to confirm the observation of Matt Shaw in our lab. This

experiment yielded a DNA fragment of 565 bp, and sequencing showed it to be 97%

identical with the embryo CL cDNA. While the PCR product did not cover the entire CL

cDNA, it included most of the mature region and part of the prepro-region of cathepsin L.

The 565 bp fragment also contained one EcoRI restriction site, two HindIII

sites and one PstI site at the same positions as found in the Artemia embryo CL cDNA. These

results showed that the gene coding for the Artemia embryo CL cDNA was indeed present

in Artemia genomic DNA.
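Comparing restriction sites between a PCR fragment and a cDNA amounts to locating each enzyme's recognition sequence in both sequences and comparing the positions. A minimal Python sketch follows. The recognition sequences for EcoRI (GAATTC), HindIII (AAGCTT) and PstI (CTGCAG) are standard, while `ENZYMES` and `find_sites` are hypothetical names. Because all three sites are palindromic, scanning one strand finds every site.

```python
# Recognition sequences of the three enzymes named in the text (all palindromic)
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "PstI": "CTGCAG"}


def find_sites(seq: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, list[int]]:
    """Return the 1-based start position of every recognition site for each enzyme."""
    seq = seq.upper()
    sites = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        sites[name] = positions
    return sites


print(find_sites("AAGAATTCAAGCTTCTGCAG"))
# {'EcoRI': [3], 'HindIII': [9], 'PstI': [15]}
```

Running the same function on both the PCR fragment and the cDNA, and aligning the two sequences, would show whether the sites fall at corresponding positions.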

Thus, while conventional screening of a phage DNA library and PCR analysis of

the library DNA did not yield a clone matching the Artemia embryo CL cDNA, the gene

coding for Artemia embryo CL (the CL-1 gene) was identified in Artemia genomic DNA

isolated from purified nuclei. Given that the Artemia franciscana genomic library in

EMBL3 has been screened in our lab by myself and Matt Shaw without finding a gene

matching the Artemia embryo CL cDNA, it appears either that the gene matching the embryo CL

cDNA (CL-1) was lost during the construction of the Artemia genomic library in EMBL3

(or during its re-amplification), or that the cloned CL-1 gene expressed a protease that was

toxic to (killed) the host cells carrying it.

In comparing Artemia genomic DNA clones with the Artemia embryo CL cDNA, no

introns were found in the CL gene sequences. This observation is consistent with the

conclusion obtained previously in our lab (Matt Shaw, unpublished) that the

Artemia CL-1 gene is intronless.

Introns were discovered in 1977 (Berget et al. 1977; Chow et al. 1977; Jeffreys

and Flavell, 1977). By definition introns do not code for protein, but they have been proposed

to protect an organism’s coding regions of DNA from

damage by environmental factors. However, in 1990 Liu and Maxwell showed

that intronic sequences in the mouse hsc70 heat shock gene are the source of U14, a

small nucleolar RNA (snoRNA). It has also been shown that the second intron in the

human apolipoprotein B gene is required for expression of this gene in liver (Brooks et

al. 1994). Introns may also increase the rate of meiotic crossing over within a coding

sequence (Fedorova and Fedorov, 2003). Introns are present in most eukaryotic

organisms, including Homo, Mus, Penaeus, Drosophila and Fasciola, and even in

single-cell organisms such as the yeast S. cerevisiae, which contains about 300 introns (Fedorova

and Fedorov, 2003).

Initially, the lack of introns in the Artemia franciscana cathepsin L

gene sequence was thought to be unique, but recently cathepsin L-like cysteine protease genes in

Leishmania donovani (Mundodi et al. 2001) and the shrimp Metapenaeus ensis

cathepsin L (MeCatL) (Hu and Leung, 2003) have been shown to be intron-less. By

comparing MeCatL with the CL genes PCP1 and PCP2 from the marine shrimp

Penaeus vannamei, each containing five introns (Le Boulay et al. 1998), Hu and

Leung found that MeCatL shares a high degree of sequence identity with PCP1 and

PCP2, suggesting that they share a common ancestor and diverged from one another

recently. They hypothesized that the MeCatL gene has lost all five introns during

evolution. They also suggested that the double-strand-break repair (DSBR) machinery

might play a role in cDNA-mediated homologous recombination (cDMHR) that causes

the loss of introns (Hu and Leung, 2005).

The “intron-early” theory, first proposed in 1978, hypothesized that introns are

very ancient genetic elements which existed at the beginning of life before the

divergence of eukaryotes and prokaryotes (Doolittle, 1978; Darnel, 1978). The theory

suggests that introns made an essential contribution to the evolution of genes via “exon

shuffling” which created genes from exon “pieces” by recombination within introns

(Doolittle, 1978; Darnel, 1978; Roy et al. 1999; Gilbert, 1987; Fedorov, 2001).

Accordingly, many biologists believe that introns have been lost in the course of evolution

(Gilbert et al. 1986). The prokaryotic lineage completely lost its introns, whereas early

introns were retained in the eukaryotes (Fedorov et al. 2002). In 2003, Roy et al.

compared 10,020 introns in human-mouse orthologs and 1,459 in mouse-rat orthologs, and

found evidence of intron loss in mammals, but no intron gain during evolution.

In contrast, the intron-late theory, formulated in 1991, hypothesizes that introns

appeared relatively recently in the genomes of eukaryotes, long after their

divergence from prokaryotes (Cavalier-Smith, 1991; Palmer and Logsdon, 1991). This

theory is supported by the fact that introns can behave as transposable elements that

can be inserted into a gene or deleted from it, and by the

observation that intron positions vary in homologous genes of different organisms

(Logsdon et al. 1995; Logsdon, 1998; Logsdon, Stoltzfus and Doolittle, 1998). The

proponents of this theory have suggested that introns arose as “selfish” elements that

play no constructive role in evolution and that they spread as mobile

elements that invade genes by insertion into short 4- to 5-nt-long “proto-splice sites”

(Dibb and Newman, 1989). Studies of the triosephosphate isomerase (TPI) gene

support an insertional origin of all its known introns (Logsdon et al. 1995;

Kwiatowski et al. 1995). Also, three introns in the Drosophila Xdh gene were shown to

be recent insertions (Tarrio et al. 1998). The divergent structures of Caenorhabditis

elegans cytochrome P450 genes also suggest intron insertions (Gotoh, 1998).

The Artemia franciscana cathepsin L genes (CL-1, CL-2) appear to fit the

intron-late theory. Cysteine proteases arose early in evolution, most likely before the

divergence of eukaryotes and prokaryotes (Berti and Storer, 1995). Artemia is

considered to be an ancient eukaryote, similar in age to prokaryotes, whose genes

lack introns. Rita Shamoon in our lab identified the cathepsin L-1 gene of Artemia

parthenogenetica, a parthenogenetic relative of Artemia franciscana that evolved more

recently (5-6 mya) than Artemia franciscana. The CL-1 gene sequence of Artemia

parthenogenetica shares 98% identity with the cDNA of Artemia franciscana, but the

CL gene in Artemia parthenogenetica contains an intron of 1085 bp in the

prepro-region, whereas the CL-1 gene in Artemia franciscana is intron-less. At this

time we have no information on the structure of the CL-2 gene (if it exists) in Artemia

parthenogenetica.

In contrast with the CL genes, the actin genes in two Artemia species, Artemia

franciscana and Artemia parthenogenetica, contain several introns, like those found in

mammals (Ortega et al. 1996). One actin gene isolated from Artemia parthenogenetica

contains 3 introns, while two other actin genes isolated from Artemia franciscana

contain 5 and 6 introns, respectively. In the gene coding for the Artemia franciscana

Na/K-ATPase α1 subunit, ten of the 14 introns are located in positions identical to those in the

human Na/K-ATPase α3 subunit gene (Garcia-Saez et al. 1997). These findings

suggest that the actin and ATPase genes and the CL genes may have evolved differently in

Artemia, and more genes need to be analyzed in Artemia to better understand the

mechanisms of intron loss or intron gain in this species.

To confirm the functionality of the CL-2 gene in Artemia franciscana, both adult

and embryo cDNA libraries were analyzed using PCR in search of a cDNA matching

the CL-2 gene. The Artemia adult cDNA library in λgt11 was amplified using internal

primers prepared from sequence in the CL-2 gene, and two fragments with a combined

total of 874 bp of sequence were obtained, representing most of the CL-2 cDNA. The

Artemia embryo cDNA library was also analyzed using PCR and yielded products that

represented both the CL-1 and CL-2 genes. The latter was unexpected because no

cathepsin L matching the CL-2 gene sequence has ever been isolated or identified in

Artemia embryos and larvae.

The 874 bp sequence of CL-2 cDNA obtained from the adult cDNA library was

analyzed with the NCBI Open Reading Frame Finder, and two open reading frames

were identified, suggesting that the Artemia CL-2 cDNA might encode two proteins.

The deduced amino acid sequence of the Artemia adult cDNA (CL-2) was compared with the

amino acid sequence of Artemia franciscana embryo cathepsin L (ECL-1) (see Fig 25)

as well as with cathepsin L sequences from other organisms (see Fig 26). The deduced

amino acid sequence from the adult CL-2 clone (ACL-2) shows 77% identity

overall with the amino acid sequence of ECL-1. High amino acid identity with other

cathepsins L was also observed for the fruit fly Drosophila melanogaster (57%), the

flesh fly Sarcophaga peregrina (57%) and the shrimp Metapenaeus ensis (51%). The

Artemia cathepsin L coded by the CL-2 cDNA (ACL-2) shares the active-site Cys and

His with other cathepsin L-like proteases. As the cDNA sequence is not complete, we

have no information on the C-terminal amino acids, including the active-site Asn. As shown in

Figure 25, the first open reading frame of the CL-2 cDNA (49 amino acids) appears to

encode part of the pro-peptide, while the second open reading frame (173 amino

acids) encodes most (79%) of the mature region of the protease. The mature CL

proteases of Drosophila melanogaster and Sarcophaga peregrina show high identity

(60% and 59%, respectively) with the second open reading frame of ACL-2 (see

Figure 26). As well, the mature region of ECL-1 is 76% identical with the second open

reading frame of ACL-2, while the pro-region of ECL-1 is 81% identical with the first

open reading frame of ACL-2. The first open reading frame was also compared with

Drosophila melanogaster (46%), the flesh fly Sarcophaga peregrina (48%) and

ACL-2 ------MKIYFGNKD 9

ECL-1 MKQITLIFLLGAVLVQLSAALSLTNLLADEWHLFKATHKKEYPSQLEEKFRMKIYLENKH 60

**** * *

ACL-2 KIAKHNILYEKGEKSYQVAMNQFGDLLHHEFTSIMIGYKK------MEPANV 55

ECL-1 KVAKHNILYEKGEKSYQVAMNKFGDLLHHEFRSIMNGYQHKKQNSSRAESTFTFMEPANV 120

* ******************* ********* *** ** ******

ACL-2 TVPESVDWREKGAVTHVKYQGQCASCWAFSSTGALKSQTFRKTGKLISLSEQNLIDCSGE 115

ECL-1 EVPESVDWRVKGAITPVKDQGQCGSCWAFSSTGALEGQTFRKTGKLISLSEQNLIDCSGK 180

ACL-2 YGNLGCKEGWISQAFEYIKDNKGIDTENKYHYEAKENFCRDNPRNRGAIALGFVNIQSGE 175

ECL-1 YGNEGCNGGLMDQAFQYIKDNKGIDTENTYPYEAEDNVCRYNPRNRGAIDRGFVHIPSGE 240

*** ** * *** ************ * *** * ** ******** *** * ***

ACL-2 EDKLQAAVATVGPVSAVIDVSHEGFQFYSKGVYYEPSCKTS-FEH--LNH------ 219

ECL-1 EDKLKAAVATVGPVSVAIDASHESFQFYSKGVYYEPSCDSDDLDHGVLWGYGSDNGKDY 300

ACL-2 ------ 222

ECL-1 WLVKNSWSEHWGDEGYIKIARNRKNHCGIATAASYPLV 338

Fig. 25. Comparison of the deduced partial amino acid sequence of Artemia CL-2 with Artemia CL-1 cDNA.

The sequence of Artemia embryo cDNA (ECL-1) (AF147207) was obtained from the

GenBank database. The alignment was performed using the program Clustal W (1.82).

Asterisks indicate identical amino acids. Dashes are gaps introduced to optimize the alignment.

The active-site Cys and His are boxed. The arrowhead indicates the putative cleavage site

of the pro-peptide from the mature enzyme. The first open reading frame of Artemia

adult cDNA (ACL-2) is underlined, and the rest of the amino acid sequence belongs to the

second open reading frame. The ER signal peptide in Artemia embryo cDNA is

boxed. The double-underlined area marks the sequence between the two open reading

frames, which is not translated.

Fig. 26. Comparison of the deduced amino acid sequence of Artemia CL-1 and CL-2 cDNA with cathepsin L sequences from other organisms.

Sequences used in the alignments were obtained from the GenBank database, and are the

CP1 of the fruit fly Drosophila melanogaster (DCP1) (AF012089); the flesh fly

Sarcophaga peregrina (Flesh) (D16533); the shrimp Metapenaeus ensis (Met) (Y126713); Artemia

embryo cDNA (ECL-1) (AF147207) and Artemia adult cDNA (ACL-2). The

alignment was performed using the program Clustal W (1.82). Asterisks indicate

identical amino acids. Dashes are gaps introduced to optimize the alignment. The active-site Cys

and His are boxed. The arrowhead indicates the putative cleavage site of the pro-peptide

from the mature enzyme. The first open reading frame of Artemia adult cDNA

(ACL-2) is underlined, and the remaining amino acid sequence belongs to the second open

reading frame.

Figure 26.

ACL-2 ------MKIYFGNKD 9

ECL-1 MKQITLIFLLGAVLVQLSAALSLTNLLADEWHLFKATHKKEYPSQLEEKFRMKIYLENKH 60

DCP1 --MRTAVLLPLLALLAVAQAVSFADWMEEWHTFKLEHRKNYQDETEERFRLKIFNENKH 58

Flesh --MRT-VLVALLALVALTQAISPLDLIKEEWHTYKLQHRKNYANEVEERFRMKIFNENRH 57

Met --MKALSVLACWAVAVASP WQDFKVQYGRHYGTAREDLYRQSVFEQNQQ 48

*

ACL-2 KIAKHNILYEKGEKSYQVAMNQFGDLLHHEFTSIMIGYKK ME 51

ECL-1 KVAKHNILYEKGEKSYQVAMNKFGDLLHHEFRSIMNGYQHKKQNSSRAEST FTFME 116

DCP1 KIAKHNQRFAEGKVSFKLAVNKYADLLHHEFRQLMNGFNYTLHKQLRAADESFKGVTFIS 118

Flesh KIAKHNQLFAQGKVSYKLGLNKYADMLHHEFKETMNGYNHTLRQLMRERTG-LVGATYIP 116

Met FIEDHNAKFENGEVTFTLKMNQFGDMTSEEFAATMNGFLNVPTRHP VAILE 99

* * * ******

ACL-2 PANVTVPESVDWREKGAVTHVKYQGQCASCWAFSSTGALKSQTFRKTGKLISLSEQNLID 111

ECL-1 PANVEVPESVDWRVKGAITPVKDQGQCGSCWAFSSTGALEGQTFRKTGKLISLSEQNLID 176

DCP1 PAHVTLPKSVDWRTKGAVTAVKDQGHCGSCWAFSSTGALEGQHFRKSGVLVSLSEQNLVD 178

Flesh PAHVTVPKSVDWREHGAVTGVKDQGHCGSCWAFSSTGALEGQHFRKAGVLVSLSEQNLVD 176

Met ADDETLPKHVDWRTKGAVTPVKDQKQCGSCWAFSTTGSLEGQHFLKDGKLVSLSEQNLVD 159

* * * * * ** * ** * * ****** ** * * * * * * ******* *

ACL-2 CSGEYGNLGCKEGWISQAFEYIKDNKGIDTENKYHYEAKENFCRDNPRNRGAIALGFVNI 171

ECL-1 CSGKYGNEGCNGGLMDQAFQYIKDNKGIDTENTYPYEAEDNVCRYNPRNRGAIDRGFVHI 236

DCP1 CSTKYGNNGCNGGLMDNAFRYIKDNGGIDTEKSYPYEAIDDSCHFNKGTVGATDRGFTDI 238

Flesh CSTKYGNNGCNGGLMDNAFRYIKDNGGIDTEKSYPYEGIDDSCHFNKATIGATDTGFVDI 236

Met CSGKFGNMGCCGGLMDQAFKYIKENKGIDTEESYPYEAQDGKCRFDSSNVGATDTGFVDI 219

** ** ** * *** * ***** * ** * ** ** *

ACL-2 QSGEEDKLQAAVATVGPVSAVIDVSHEGFQFYSKGVYYEPSCKTS-FEH--LNH------ 219

ECL-1 PSGEEDKLKAAVATVGPVSVAIDASHESFQFYSKGVYYEPSCDSDDLDHGVLWGYGSD- 295

DCP1 PQGDEKKMAEAVATVGPVSVAIDASHESFQFYSEGVYNEPQCDAQNLDHGVLWGFGTDE 298

Flesh PEGDEEKMKKAVATMGPVSVAIDASHESFQLYSEGVYNEPECDEQNLDHGVLWGYGTDE 296

Met AHGEENSLMKAVANIGPISVAIDASHPSFQFYHQGVYYEKECSSTMLDHGVLAIGYGETD 279

* * *** ** * ** ** ** * *** * * * *

ACL-2 222

ECL-1 NGKDYWLVKNSWSEHWGDEGYIKIARNRKNHCGIATAASYPLV 338

DCP1 SGEDYWLVKNSWGTTWGDKGFIKMLRNKENQCGIASASSYPLV 341

Flesh SGMDYWLVKNSWGTTWGEQGYIKMARNQNNQCGIATASSYPTV 339

Met DGKEYWLVKNSWNTSWGDKGFIQMSRNKKNNCGIASQASYPLV 322

shrimp Metapenaeus ensis (34%) in the pro-region. However, it should be noted that

Western blot analysis of CL from various life cycle stages of Artemia revealed a

cathepsin L of about 24 kDa after 14 days in culture, representing a late juvenile stage

of Artemia development (unpublished data). These results suggest that the second open

reading frame of the CL-2 gene, which codes for at least 173 amino acids of the mature protein,

may code for a cathepsin L specific to adult tissue and not to embryos and larvae. This

conclusion is supported by HPLC and SDS-PAGE analysis of cysteine proteases of

Artemia franciscana in our lab, which has indicated the presence of only one cysteine

protease in Artemia embryos, arising from the CL-1 gene (Butler et al. 2001).
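The two open reading frames discussed above were identified with NCBI ORF Finder. As a simplified sketch of the underlying idea (not that tool's implementation, which also searches the reverse strand and offers further options), the Python function below scans the three forward frames for stretches that begin with ATG and end at a stop codon. The names `CODON_TABLE` and `forward_orfs` and the default minimum length are hypothetical.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def forward_orfs(dna: str, min_aa: int = 30) -> list[tuple[int, int, str]]:
    """Find ATG-initiated ORFs in the three forward reading frames.

    Returns (frame, 0-based start, protein) tuples; the stop codon is not included.
    An ORF still open at the end of the sequence is also reported, because a
    partial cDNA may lack its stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start, protein = None, ""
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons containing N etc.
            if start is None:
                if aa == "M":  # open a new ORF at the first ATG in this frame
                    start, protein = i, "M"
            elif aa == "*":  # a stop codon closes the current ORF
                if len(protein) >= min_aa:
                    orfs.append((frame, start, protein))
                start, protein = None, ""
            else:
                protein += aa
        if start is not None and len(protein) >= min_aa:
            orfs.append((frame, start, protein))
    return orfs


print(forward_orfs("CCATGAAACCCTAGGG", min_aa=1))  # [(2, 2, 'MKP')]
```

Reporting ORFs that run to the end of the sequence matters for an incomplete cDNA such as the 874 bp CL-2 sequence, whose second reading frame lacks its 3’ end.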

In mammals the prepro-region of the cysteine protease is responsible for proper

targeting of the enzymes (Hanewinkel et al. 1987; Cuozzo et al. 1995) and correct

folding of the enzymes (Smith and Gottesman 1989; Coulombe et al. 1996), as well as

inhibiting the activity of the mature proteases in vitro (Cygler and Mort 1997). Recently,

several studies have focused on the auto-catalytic processing of the prepro-enzyme

in an acidic environment (Turk et al. 2000). The activation is triggered by

a drop in pH from 8.0 to 5.3, which weakens the interactions between the propeptide

and the catalytic domain (Carmona et al. 1996; Fox et al. 1992). Under acidic

conditions, the pro-peptide is less tightly bound to the active site and can be cleaved to

release the mature enzyme (Menard et al. 1998; Rozman et al. 1999). Since the second open

reading frame of adult CL-2 cDNA lacks the sequence coding for the prepro-region of the

protease, including the ER localization signal peptide, the Artemia cathepsin L

encoded by the second open reading frame would not enter the ER for proteolytic

processing and subsequent localization in the lysosomes. This is consistent with the

fact that in Artemia franciscana most of the cysteine protease activity is found in

non-lysosomal areas of embryos, but whether this occurs in adult tissues is not known.

In murine NIH3T3 cells, where a cathepsin L isoform devoid of a signal peptide has

been detected in the nucleus, the authors speculate that this cathepsin L isoform

contains a very short pre-domain, which does not bind tightly to the active-site groove of the

mature protease, thereby allowing auto-catalytic processing to occur at higher pH

(Goulet et al. 2004). As well, a cytosolic chaperone could serve as a substitute for the

pro-peptide (Frydman, 2001). A novel form of cathepsin B associated with

intracellular membranes in human tumors was shown to lack a signal peptide and part

of the pro-peptide, but it could still become active (Mehtani et al. 1998). Artemia

cathepsin L coded by the CL-2 gene could follow a similar pathway. It is also

possible that the 49 amino acids encoded by the first open reading frame of Artemia

adult cDNA could substitute for the pro-peptide normally present in the

Artemia embryo cathepsin L coded by the CL-1 gene, since this sequence is 81%

identical with the pro-peptide encoded by Artemia embryo CL-1 cDNA. Given the

lack of hydrophobicity in the 49 amino acid prepro-enzyme fragment, it is unlikely that

this fragment could enter the ER for degradation, as is thought to happen with

the pro-peptide coded by Artemia embryo CL-1 cDNA. However, since no matching

protein for the first open reading frame of CL-2 cDNA has been identified on Western

blots of Artemia franciscana embryo or adult preparations, if one is synthesized it must

be degraded rapidly after translation.
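The argument above rests on the 49-residue fragment lacking the hydrophobic stretch typical of an ER signal peptide. One common way to quantify this, which is not stated as the method used in this thesis, is a Kyte-Doolittle hydropathy profile. The Python sketch below compares the first ORF of ACL-2 with the first 20 residues of ECL-1, which appear to correspond to the boxed signal peptide in Fig. 25. Both sequences are taken from that figure. The 9-residue window and the names `gravy` and `max_window_hydropathy` are assumptions for illustration.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: the mean Kyte-Doolittle value of all residues."""
    return sum(KD[aa] for aa in protein) / len(protein)


def max_window_hydropathy(protein: str, window: int = 9) -> float:
    """Highest mean hydropathy over any run of `window` consecutive residues."""
    return max(gravy(protein[i:i + window]) for i in range(len(protein) - window + 1))


ACL2_ORF1 = "MKIYFGNKDKIAKHNILYEKGEKSYQVAMNQFGDLLHHEFTSIMIGYKK"  # first ORF of ACL-2
ECL1_NTERM = "MKQITLIFLLGAVLVQLSAA"  # first 20 residues of ECL-1

for name, seq in [("ACL-2 ORF1", ACL2_ORF1), ("ECL-1 N-terminus", ECL1_NTERM)]:
    print(name, round(gravy(seq), 2), round(max_window_hydropathy(seq), 2))
```

The ACL-2 fragment has a negative overall hydropathy and no strongly hydrophobic window, whereas the ECL-1 N-terminus contains a window with a much higher mean value. This is consistent with, though not proof of, the reasoning in the text.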

While analyzing the Artemia embryo cDNA library, a PCR product of 340 bp was found

that is identical to the 5’ end of the Artemia adult cDNA and represents the CL-2 gene, as

shown in Figure 19. Since a cysteine protease encoded by the CL-2 cDNA

has not been found in Artemia embryos, it remains unknown whether this (CL-2)

cDNA is functional. Clearly more work needs to be done with both Artemia

embryo and adult cDNA libraries to obtain the full sequence of the cDNA coded by the

CL-2 gene. Towards this goal, the 3’ end of embryo CL-2 cDNA has to be identified.

Eventually, the cysteine protease observed on Western blots of Artemia juveniles needs

to be isolated, sequenced and compared with the adult cDNA sequence to test the above

hypothesis.

In Artemia franciscana, regulation of promoter occupancy in the actin 302 gene

and the sarco/endoplasmic reticulum Ca2+-ATPase-encoding gene has been identified

(Martinez-Lamparero et al. 2003). Transcriptional regulation of these genes appears to

be associated with deactivation of cryptobiosis, leading to enhanced metabolism and

development of Artemia embryos. To understand the mechanism involved in

transcriptional regulation of the Artemia cathepsin L genes, the 5’ upstream sequence

of the cathepsin L genes must be identified. Toward this objective I performed PCR

using a degenerate primer together with an internal primer, with Artemia genomic clone 9C,

representing the CL-2 gene, as template. The degenerate primers, also called

random primers, were taken from other studies and were used along with a primer designed to

a conserved region of the cysteine protease. Random primers have been used in PCR

for universal amplification of the DNA present in a sample or for amplification of unknown

intervening sequences that are not generally defined in length or sequence (Eggeling

and Spielvogel, 1995). This technique has been widely used for the identification

of viral genomes (Rose, 2005), for cloning and sequencing of the Lactococcus lactis

subsp. lactis recA gene (Duwart et al. 1992), and for obtaining sequence from yeast

artificial chromosome (YAC) insert ends (Swensen, 1996).

Our degenerate primers were designed according to primers used by Badaracco

et al. (1995) to randomly amplify polymorphic DNA in a phylogenetic study of

bisexual Artemia. In my experiments four degenerate primers were each used with

conserved primers in PCR reactions, yielding multiple bands in each case. Of the several

PCR products generated (see Fig 20), only one showed a positive signal on a Southern

blot when probed with [32P]-labeled CL cDNA. This PCR product was ligated into a

plasmid vector, cloned and sequenced. Sequencing yielded a product of 404 bp

including 260 bp of 5’ upstream sequence of genomic clone 9C. The degenerate primer

OPC-4 was identified at one end of the PCR product, but the conserved primer

(CLR10) appeared at the 5’ end of both strands of newly synthesized DNA for reasons not

yet clear. The sequence of (genomic) clone 9C is 97% identical with Artemia adult and

embryo CL-2 cDNA, and no intron or putative splicing sites were identified in the

genomic clone 9C sequence. This observation indicates that the Artemia CL-2

gene, like the CL-1 gene in Artemia franciscana, is intron-less, which supports the

conclusion that Artemia cathepsin L genes lack introns, as mentioned above.

Analysis of the 5’ end of clone 9C, after PCR extension, revealed an open reading

frame of 69 amino acids upstream of the coding sequence for the CL-2 gene. When

analyzed by BLAST (NCBI), the sequence was similar to a gene coding for

DEAD-box helicases (see Figure 23), with an E value of 2E-08. DEAD-box helicases

belong to a family of proteins involved in ATP-dependent RNA unwinding, needed in a

variety of cellular processes including splicing, ribosome biogenesis and RNA

degradation. The name derives from the amino acid sequence (Asp-Glu-Ala-Asp) of the Walker B motif (motif II),

which contains the ATP-binding region (Marchler-Bauer, 2005). It would be unusual for the

putative promoter region to lie within another gene, especially as the amino acid

sequence of this open reading frame is only 30% identical with the DEAD-box helicase.

However, as the 5’ upstream sequence of clone 9C obtained here is not extensive, the

meaning of this sequence is not clear. On the other hand, several putative transcription

factor binding sites were identified in the 227 bp 5’ upstream sequence, as shown in

Figure 24, indicating that this area could include the (putative) promoter region of the

Artemia CL-2 gene. However, the 227 bp sequence was found to have a high AT

content (59.5%), in contrast with the promoter regions of many lysosomal

enzymes, for which a high GC content has been found, e.g., human cathepsin L (Bakhshi et

al. 2001), the rat cathepsin L gene (Charron et al. 2002), human cathepsin B (Yan et al.

2000) and human (Shi et al. 1994). Clearly, additional 5’ upstream

sequence of Artemia genomic clone 9C is needed to characterize more fully the

promoter region of the Artemia franciscana cathepsin L-2 gene. As well, functional

analysis will be needed to identify the core promoter region of the CL genes in Artemia

to better understand the transcriptional mechanisms regulating these genes.
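The AT content quoted above can be recomputed directly from the 227 bp sequence shown in Fig. 24. The short Python sketch below does so. The sequence is transcribed from that figure, and the names `UPSTREAM_9C` and `at_content` are hypothetical. It counts 135 A or T bases out of 227, which is 59.5%.

```python
# 227 bp 5' upstream sequence of genomic clone 9C, as shown in Fig. 24
UPSTREAM_9C = (
    "TGGATCAGATAAAACGCTGGCCCATATGCTGCCATCGATCGTCCACATTAAAAACAAGGC"
    "AAATCGTAAAAAAGGAGATGGAACCATAGCTTTTATTTCCGCTCAAGCTAGAGAATTGGC"
    "AAAACAGATCAAAGATGTGGCAGAAAAATATGGAGCAGTATCTTGCATAAGAGGTACATG"
    "TGTCTTCGGTGGGTCTCCAAAGAAAGAAACTGAACATAATACTTGCA"
)


def at_content(seq: str) -> float:
    """Percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


print(len(UPSTREAM_9C), round(at_content(UPSTREAM_9C), 1))  # 227 59.5
```

The GC content of the same region is simply 100 minus this value, that is about 40.5%.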

Future work will focus on the following points: first, completion of the

sequences of CL-2 cDNA from the Artemia adult and embryo libraries using PCR

or a conventional screening procedure; second, isolation of the cysteine protease

observed on Western blots of Artemia juveniles; third, a search for

additional 5’ upstream sequence of the Artemia CL-2 gene, and identification of the core promoter

region, using deletion analysis, and of the transcription initiation site; and fourth,

completion of the sequence of the CL-1 gene and a search for its 5’ promoter sequence, to

understand the expression and regulation of the cathepsin L-1 gene in Artemia

embryos.

Appendix 1: Primers used in PCR experiments.

Primer Sequence Tm

CLF 5 ’ -C AATGAAGC AGATTACTTTGA-3 ’ 57°C

CL9CF1 5 ’ -GC AG ATTACTTTG AC ATATTTACT-3 ’ 55°C

CLF 11 5 ’ -TATAAGAAATGAACTTCACCCTTT-3 ’ 58.7°C

CLF 13 5 ’ -CCAAC ATAAGAA ACAGAATTCCTC-3 ’ 61.5°C

CLF 10 5 ’ -ATCTATTC AAGGCTAC AC ACAAGA-3 ’ 60.6°C

CLR 5 ’ -TATAC AAGTGGATAGCTAGCT-3 ’ 52.5°C

CLR3 5 ’-TCCATTAATCCTCCATTGCAT-3 ’ 62.9°C

CLR8 5 ’ -ATAGTCTTCGCC ATTATCAGAACC-3 ’ 63°C

CLR10 5 ’ -TCTTGTGTGTAGCCTTGAATAGAT-3 ’ 60.6°C

CLR 10b 5’-GTGGTTTAGGTGTTCAAATGATGT-3’ 62.6°C

CLR 11 5’-AAAGGGTGAAGTTCATTTCTTATA -3’ 58.7°C

CLR 18 5 ’ -TCCGTGGTCTAGGTCATCAGAGTC-3 ’ 67.5°C

OPC-2 5 ’-GTCAGGCGTC-3 ’ 33.4°C

OPC-4 5 ’ -CCGCATCTAG-3 ’ 30°C

OPC-8 5 ’ -TGG ACCGGTG-3 ’ 40.7°C

OPC-9 5 ’ -CTCACCGTCC-3 ’ 33.TC

TP-7F 5 ’ -TTGTAATACGACTC ACTATAGGGC-3 ’ 60°C
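Some of the primers in this table were designed as complementary forward/reverse pairs: CLF10/CLR10 and CLF11/CLR11 are exact reverse complements of each other. The sketch below checks this and adds a rough melting-temperature estimate. The Wallace rule, Tm = 2(A+T) + 4(G+C) °C, is an assumption introduced here for illustration only. The Tm values in the table do not follow it exactly (OPC-2 gives 34 °C by the Wallace rule versus 33.4 °C in the table), so they were presumably calculated by another method.

```python
# Reverse-complement and rough Tm helpers for short DNA primers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence written 5'->3'."""
    s = seq.upper()
    if set(s) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    return s.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees C.

    A rough estimate for short oligonucleotides only; it ignores salt and
    primer concentration, so it will not match nearest-neighbour values.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primers = {
        "CLF10": "ATCTATTCAAGGCTACACACAAGA",
        "CLR10": "TCTTGTGTGTAGCCTTGAATAGAT",
        "CLF11": "TATAAGAAATGAACTTCACCCTTT",
        "CLR11": "AAAGGGTGAAGTTCATTTCTTATA",
    }
    # Each forward/reverse pair is an exact reverse complement.
    print(revcomp(primers["CLF10"]) == primers["CLR10"])  # True
    print(revcomp(primers["CLF11"]) == primers["CLR11"])  # True
    print(wallace_tm("GTCAGGCGTC"))  # OPC-2: 34
```

For a closer match to tabulated values, a nearest-neighbour model that accounts for salt and oligo concentration would be the usual choice.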

Appendix 2: Primers designed on Artemia embryo CL-1 cDNA sequence.

Start codon
1 catcttgtgg cagacaatta caca|atg|aag cagattactt tgatattttt actgggagc
------► CLF
61 gtacttgtgc agttaagtgc tgcactatca ctgacaaatt tacttgctga tgaatggcat
◄------ CLR10
121 ctattcaagg ctacacacaa gaaagaatat ccaagccaac ttgaggagaa atttagaatg
------► CLF10
181 aagatttatt tggaaaataa acacaaagtt gccaaacata acatccttta tgaaaaaggc
241 gaaaagtctt atcaagtcgc aatgaataag tttggagatc ttcttcatca tgaatttaga
EcoR I
301 tctatcatga atggatacca acataagaaa ca|gaattc|ct caagagctga gagcactttc
------► CLF13
361 acttttatgg agcctgctaa tgttgaagtt ccagaatctg ttgactggag ggtaaaagga
421 gccataactc ctgtaaaaga ccaaggacag tgtggttcat gctgggcttt ctcatctact
481 ggtgccttgg aaggtcaaac cttcagaaaa acagggaagc tcatttcttt gagtgaacag
◄------ CLR3
541 aacttgattg attgttctgg aaaatatgga aatgaaggat gcaatggagg attaatggac
601 caagctttcc agtatatcaa ggataacaag ggaattgaca ctgaaaatac gtacccttat
661 gaagctgaag acaatgtctg tcgttataat ccaaggaacc gaggtgccat tgaccgtggc
721 tttgtccata tcccatctgg agaagaagat aagcttaagg cagctgttgc cactgttgga
781 cctgtatctg ttgccatcga tgcctctcat gaaagtttcc aattctattc taaaggtgtt
◄------ CLR18
841 tactatgagc catcatgtga ctctgatgac ctagaccacg gagttcttgt ggttggctat
◄------ CLR8
901 ggttctgata atggcaaaga ctattggctc gttaaaaact cgtggtctga gcactgggga
961 gacgaagggt atatcaagat tgctcgcaat cgcaagaacc attgtggtat tgctactgca
◄------ CLR
1021 gctagctatc cacttgt|ata| gatagggttg tggtaatttt tgtggatgtg tgtaattgca
Stop codon
1081 tacgttaaat tcttattctc ttgataggtt tagagagttc tagttttcag tttgattccg
1141 tagatgacag attttgtgac catattcgag aataaagcgt tttttttacc taaaaaaaaa
1201 aaaaaaaaaa aaaaaaaaaa aaaaaaaaa
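The start and stop codons marked in Appendix 2 bound the open reading frame of the CL-1 cDNA. The sketch below shows one simple way to locate such a frame computationally: take the first ATG and read codons in that frame until the first TAA, TAG or TGA. Using only the first ATG is a simplifying assumption made here, not the procedure used in the thesis. In this cDNA no ATG occurs before the one marked as the start codon at position 25, so a search of this kind would begin there.

```python
# Find the open reading frame that starts at the first ATG of a transcript.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str):
    """Return (start, end, orf_sequence) or None.

    start and end are 1-based and inclusive; end is the last base of the
    stop codon. Simplification: only the first ATG is tried, and reading
    continues in that frame until the first in-frame stop codon.
    """
    s = seq.upper()
    start = s.find("ATG")
    if start == -1:
        return None
    for j in range(start, len(s) - 2, 3):
        if s[j:j + 3] in STOP_CODONS:
            return start + 1, j + 3, s[start:j + 3]
    return None  # no in-frame stop codon before the end of the sequence


if __name__ == "__main__":
    # First 30 nt of the Appendix 2 cDNA with a toy TAG stop appended.
    demo = "catcttgtggcagacaattacaca" + "atgaag" + "tag"
    print(first_orf(demo))  # (25, 33, 'ATGAAGTAG')
```

Returning `None` when no in-frame stop is found keeps partial sequences (such as an incomplete CL-2 cDNA) from being reported as complete ORFs.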

Appendix 3: Primers designed on Artemia genomic clone 9C sequence.

1 caaatgaagc agattacttt gacatattta ctaacagctg taatgatatt tttactgtca
------► CL9CF1
61 gttgtacttg tgcagttaag tgctacacaa tcacagtcaa atttgcttgc tgatgaatgg
◄------ CLR10
121 tatctattca aggctagaca caagaaagat tatccaagcc aacttgagga aaaatttaga
------► CLF10
181 atgaagattt attttgaaaa taaagacaaa attgccaaac ataacatcct ttatgagaaa
241 ggcgaaaagt cttatcaagt tgcaatgaat cagtttggag atcttcttca tcatgaattt
◄------ CLR11
301 acatctatca tgattggata taagaaatga acttcaccct ttgctaagag cacttttact
------► CLF11
361 tttatggagc ctgctaatgt tacagttcca gaatctgttg actggaggga aaaaggagca
421 gtaactcctg taaaataccc aggacagtgt gcttcttgct tggctttttc acctactggt
481 gccttggaaa gtcaaacttt cagaaaaaca ggaaagctca tttctttgag tgaacaaaac
541 ttgattgatt gttccggtga atatggaaat ttaggatgca aagggggatg gataagccaa
601 gcttttgagt atatcaagga taacaaagga attgacactg aaaataaata tcattatgaa
661 gctaaagaaa atttctgtcg tgataatcca agaaaccgag gtgcagttgc ccttggcttt
721 gtcaatattc catctgggga agaagataaa cttaaggcag ctgttgccac ggttggacct
781 gtttccgctg ttattgatgt ctctcatgaa ggttttcaat tctattctaa gggtgtttac
◄------ CLR10b
841 tatgagccat catgtaaaac atcatttgaa cacctaaacc acgaagttct tgtaattggc
◄------ CLR8
901 tgtggttctg ataatggcga agactattgg ctcgttaaaa actcatggtc taagcactgg
961 ggagacgaag ggtacctcaa gattgctcgc aatcgcaaga accattgtgg tgttgctact
1021 gcagctctct atccaattgt atagataggg ttgtggtact ttttgtgatg tgtgtaattg
1081 accacggtac atct
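The primer annotations in Appendices 2 and 3 place forward primers on the sequence as written and reverse primers on the opposite strand. A minimal sketch of that mapping follows. Forward primers are matched directly. Reverse primers are matched through their reverse complement, because a reverse primer anneals to the plus strand shown here. This sketch is an illustration, not the design software used in the thesis, and it accepts exact matches only.

```python
# Locate exact primer-binding sites on a plus-strand template.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper- or lower-case A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_sites(template: str, primer: str, reverse: bool = False) -> list[int]:
    """Return 1-based plus-strand start positions of exact primer matches.

    Forward primers are searched as written. Reverse primers anneal to the
    plus strand, so their reverse complement is searched instead.
    This sketch tolerates no mismatches.
    """
    target = revcomp(primer) if reverse else primer.upper()
    s = template.upper()
    hits = []
    i = s.find(target)
    while i != -1:
        hits.append(i + 1)          # convert 0-based index to 1-based
        i = s.find(target, i + 1)   # continue searching; overlaps allowed
    return hits


if __name__ == "__main__":
    # Positions 1-40 and 301-360 of genomic clone 9C (Appendix 3).
    first_40 = "caaatgaagcagattactttgacatatttactaacagctg"
    line_301 = "acatctatcatgattggatataagaaatgaacttcaccctttgctaagagcacttttact"
    print(primer_sites(first_40, "GCAGATTACTTTGACATATTTACT"))  # CL9CF1: [9]
    print(primer_sites(line_301, "AAAGGGTGAAGTTCATTTCTTATA", reverse=True))  # CLR11: [19]
```

Positions are relative to each fragment, so the CLR11 site at position 19 of the line starting at 301 corresponds to position 319 of clone 9C. Because the search is exact, it would not report CLR10 on the clone 9C sequence. That site differs from the primer at one base: the genomic clone reads aggctagaca where the CL-1 cDNA reads aggctacaca. Tolerating such mismatches would require an alignment-based search.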


References:

Arora, S., Chauhan, S. S. 2002. Identification and characterization of a novel human cathepsin L splice variant. Gene 293: 123-131.

Badaracco, G., Bellorini, M., Landsberger, N. 1995. Phylogenetic study of bisexual Artemia using random amplified polymorphic DNA. J. Mol. Evol. 41: 150-154.

Badaro, R., Jones, T. C., Lorenco, R. 1986. A prospective study of visceral leishmaniasis in an endemic area of Brazil. J. Infect. Dis. 154: 639-649.

Barrett, A. J., and Kirschke, H. 1981. Cathepsin B, cathepsin H, and cathepsin L. Methods Enzymol. 80: 535-561.

Barrett, A. J. and Rawlings, N. D. 2001. Evolutionary lines of cysteine peptidases. Biol. Chem. 382: 727-733.

Barrett, A. J., Rawlings, N. D., Woessner, J. F. 1998. Handbook of Proteolytic Enzymes. Academic Press, San Diego, California.

Benton, W. D. and Davis, R. W. 1977. Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196: 180.

Berdowska, I. 2003. Cysteine proteases as disease markers. Clinica Chimica Acta 342: 41-69.

Berget, S. M., Moore, C., and Sharp, P. A. 1977. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. USA 74: 3171-3175.

Berti, P. J., Storer, A. C. 1995. Alignment/phylogeny of the papain superfamily of cysteine proteases. J. Mol. Biol. 246(2): 273-283.

Birnboim, H. C. and Doly, J. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7: 1513.

Blondeau, X., Vidmar, S. L., Emod, I., Pagano, M., Turk, V. and Keil-Dlouha, V. 1993. Generation of matrix-degrading proteolytic system from fibronectin by cathepsins B, G, H and L. Biol. Chem. Hoppe-Seyler 374: 651-656.

Bode, W., Huber, R. 2000. Structural basis of the endoproteinase-protein inhibitor interaction. Biochim. Biophys. Acta 1477: 241-252.

Bromme, D., Nallaseth, F. S., and Turk, B. 2004. Production and activation of recombinant papain-like cysteine proteases. Methods 32(2): 199-206. Review.

Britton, C., Murray, L. 2002. A cathepsin L protease essential for Caenorhabditis elegans embryogenesis is functionally conserved in parasitic nematodes. Mol. Biochem. Parasitol. 122: 21-33.

Brocklehurst, K., Kowlessur, D., O'Driscoll, M., Patel, G., Quenby, S., Salih, E., Templeton, W., Thomas, E. W., Willenbrock, F. 1987. Substrate-derived two-protonic-state electrophiles as sensitive kinetic specificity probes for cysteine proteinases. Activation of 2-pyridyl disulphides by hydrogen-bonding. Biochem. J. 244(1): 173-181.

Brooks, A. R., Nagy, B. P., Taylor, S., Simonet, W. S., Taylor, J. M., and Levy-Wilson, B. 1994. Sequences containing the second-intron enhancer are essential for transcription of the human apolipoprotein B gene in the livers of transgenic mice. Mol. Cell. Biol. 14: 2243-2256.

Butler, A. M., Aiton, A. L. and Warner, A. H. 2001. Characterization of a novel heterodimeric cathepsin L-like protease and cDNA encoding the catalytic subunit of the protease in embryos of Artemia franciscana. Biochem. Cell Biol. 79: 43-56.

Carmona, E., Dufour, E., Plouffe, C., Takebe, S., Mason, P., Mort, J. S., and Menard, R. 1996. Potency and selectivity of the cathepsin L propeptide as an inhibitor of cysteine proteases. Biochemistry 35: 8149-8157.

Cavalier-Smith, T. 1991. Intron phylogeny: a new hypothesis. Trends Genet. 7: 145-148.

Chauhan, S. S., Popescu, N. C., Ray, D., Fleischmann, R., Gottesman, M. M., Troen, B. R. 1993. Cloning, genomic organization and chromosomal localization of human cathepsin L. J. Biol. Chem. 268: 1039-1045.

Chen, M. G., Mott, K. E. 1990. Progress in morbidity due to Fasciola hepatica infection. Trop. Dis. Bull. 87: 1-37.

Chow, L. T., Gelinas, R. E., Broker, J. R., and Roberts, R. J. 1977. An amazing sequence arrangement at the 5’ ends of adenovirus 2 messenger RNA. Cell 12: 1-8.

Clegg, J. S. and Conte, F. P. 1980. A review of the cellular and developmental biology of Artemia. In: The Brine Shrimp Artemia. Vol. 2. (Persoone, G., Sorgeloos, P., Roels, O., and Jaspers, E., editors), pp. 11-54, Universa Press, Wetteren.

Collette, J., Bocock, J. P., Ahn, K., Chapman, R. L., Godbold, G., Yeyeodu, S., and Erickson, A. H. 2004. Biosynthesis and alternative targeting of the lysosomal cysteine protease cathepsin L. Int. Rev. Cytol. 1: 241.

Criel, G. R. J. and Macrae, T. H. 2002. Chapter 1. Artemia morphology and structure. In: Artemia: Basic and Applied Biology. (Abatzopoulos, J., Beardmore, J. S., and Sorgeloos, P., editors), pp. 1-38, Kluwer Academic Publishers, London.

Criel, G. R. J. and Macrae, T. H. 2002. Chapter 2. Reproductive biology of Artemia. In: Artemia: Basic and Applied Biology. (Abatzopoulos, J., Beardmore, J. S., and Sorgeloos, P., editors), pp. 39-128, Kluwer Academic Publishers, London.

Cygler, M. and Mort, J. S. 1997. Proregion structure of members of the papain superfamily. Mode of inhibition of enzymatic activity. Biochimie 79: 645-652.

Dalton, J. P., McGonigle, S., Rolph, T. P., Andrews, S. J. 1996. Induction of protective immunity in cattle against infection with Fasciola hepatica by vaccination with cathepsin L proteinase and hemoglobin. Infect. Immun. 64: 5066-5074.

Darnell, J. E. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202: 1257-1260.

Delaisse, J. M., Vaes, G. 1992. In: Griffin, B. R., Gay, C. V. (Eds.). Biology and Physiology of the Osteoclast, CRC Press, Boca Raton, FL, p. 290.

Delaisse, J. M., Eeckhout, Y. and Vaes, G. 1980. Inhibition of bone resorption in culture by inhibitors of thiol proteinases. Biochem. J. 192: 365-368.

Dibb, N. J., Newman, A. J. 1989. Evidence that introns arose at proto-splice sites. EMBO J. 8(7): 2015-2021.

Dolenc, I., Turk, B., Pungercic, G., Ritonja, A., Turk, V. 1995. Oligomeric structure and substrate induced inhibition of human cathepsin C. J. Biol. Chem. 270(37): 21626-21631.

Doolittle, W. F. 1978. Genes in pieces: were they ever together? Nature 272: 581-582.

Duwat, P., Ehrlich, S. D., and Gruss, A. 1992. Use of degenerate primers for polymerase chain reaction cloning and sequencing of the Lactococcus lactis subsp. lactis recA gene. Appl. Environ. Microbiol. 58(8): 2674-2678.

Eeckhout, Y. and Vaes, G. 1977. Further studies on the activation of procollagenase, the latent precursor of bone collagenase. Effects of lysosomal cathepsin B, plasmin and kallikrein, and spontaneous activation. Biochem. J. 166: 21-31.

Erickson-Lawrence, M., Zabludoff, S. D., Wright, W. W. 1991. Cyclic protein 2, a secretory product of rat Sertoli cells, is the proenzyme form of cathepsin L. Mol. Endocrinol. 5: 1789-1798.

Esteban, J. G., Bargues, M. D., Mas-Coma, S. 1998. Geographical distribution, diagnosis and treatment of human fascioliasis: a review. Res. Rev. Parasitol. 57: 309-318.

Evans, T. G., Teixeira, M. J., McAuliffe, I. T. 1986. Epidemiology of visceral leishmaniasis in northeast Brazil. J. Infect. Dis. 166: 1124-1132.

Fagbemi, B. O., Guobadia, E. E. 1995. Immunodiagnosis of fascioliasis in ruminants using a 28-kDa cysteine protease of Fasciola gigantica. Vet. Parasitol. 57: 309-318.

Fedorov, A., Merican, A. F., Gilbert, W. 2002. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc. Natl. Acad. Sci. USA 99(25): 16128-16133.

Fedorova, J., and Fedorova, A. 2003. Introns in gene evolution. Genetica 118: 123-131.

Frydman, J. 2001. Folding of newly translated proteins in vivo: the role of molecular chaperones. Annu. Rev. Biochem. 70: 603-647.

Garci, A., Perona, R. and Sastre, L. 1997. Polymorphism and structure of the gene coding for the α1 subunit of the Artemia franciscana Na/K-ATPase. Biochem. J. 321: 509-518.

Gilbert, W., Marchionni, M., McKnight, G. 1986. On the antiquity of introns. Cell 46: 151-154.

Gotoh, O. 1998. Divergent structures of Caenorhabditis elegans cytochrome P450 genes suggest the frequent loss and gain of introns during the evolution of nematodes. Mol. Biol. Evol. 15(11): 1447-1459.

Gotthardt, D., Warnatz, H. J., Henschel, O., Bruckert, F., Schleicher, M., Soldati, T. 2002. High-resolution dissection of phagosome maturation reveals distinct membrane trafficking phases. Mol. Biol. Cell 13: 3508-3520.

Goulet, B., Baruch, A., Moon, N.-S., Poirier, M., Sansregret, L., Erickson, A., Bogyo, M., and Nepveu, A. 2004. A cathepsin L isoform that is devoid of a signal peptide localizes to the nucleus in S phase and processes the CDP/Cux transcription factor. Mol. Cell 14: 207-219.

Grzonka, Z., Jankowska, E., Kasprzykowski, F., Kasprzykowska, R., Lankiewicz, L., Wiczk, W., Wieczerzak, E., Ciarkowski, J., Drabik, P., Jankowski, R., Kozak, M., Jaskolski, M., Grubb, A. 2001. Structural studies of cysteine proteases and their inhibitors. Acta Biochim. Polon. 48: 1-20.

Guinec, N., Dalet-Fumeron, V. and Pagano, M. 1993. "In vitro" study of basement membrane degradation by the cysteine proteinases, cathepsins B, B-like and L. Digestion of collagen IV, laminin, fibronectin, and release of gelatinase activities from basement membrane fibronectin. Biol. Chem. Hoppe-Seyler 374: 1135-1146.

Guncar, G., Pungercic, G., Klemencic, I., Turk, V., Turk, D. 1999. Crystal structure of MHC class II-associated p41 Ii fragment bound to cathepsin L reveals the structural basis for differentiation between cathepsins L and S. EMBO J. 18: 793-803.

Homma, K., and Natori, S. 1996. Identification of substrate proteins for cathepsin L that are selectively hydrolyzed during the differentiation of imaginal discs of Sarcophaga peregrina. Eur. J. Biochem. 240: 443-447.

Hu, K. G., and Leung, P. C. 2004. Shrimp cathepsin L encoded by an intronless gene has predominant expression in hepatopancreas, and occurs in the nucleus of oocyte. Comp. Biochem. Physiol. 137B: 21-33.

Hu, K. G., and Leung, P. C. 2006. Complete, precise, and innocuous loss of multiple introns in the currently intronless, active cathepsin L-like genes, and inference from this event. (Article in press).

Huete-Perez, J. A., Engel, J. C., Brinen, L. S., Mottram, J. C., McKerrow, J. H. 1999. Protease trafficking in two primitive eukaryotes is mediated by a prodomain protein motif. J. Biol. Chem. 274: 16249-16256.

Hughes, A. L. 1994. Evolution of cysteine protease in eukaryotes. Mol. Phylogenet. Evol. 3(4): 310-321.

Ish-Horowicz, D. and Burke, J. F. 1981. Rapid and efficient cosmid cloning. Nucleic Acids Res. 9: 2989.

Ishidoh, K., Towatari, T., Imajoh, S., Kawasaki, H., Kominami, E., Katunuma, N., Suzuki, K. 1987. Molecular cloning and sequencing of cDNA for rat cathepsin L. FEBS Lett. 223(1): 69-73.

Ishidoh, K., and Kominami, E. 1995. Procathepsin L degrades extracellular matrix proteins in the presence of glycosaminoglycans in vitro. Biochem. Biophys. Res. Commun. 217: 624-631.

Ishidoh, K., Imajoh, S., Emori, Y., Ohno, S., Kawasaki, H., Minami, Y., Kominami, E., Katunuma, N., and Suzuki, K. 1987. Molecular cloning and sequencing of cDNA for rat cathepsin H. Homology in pro-peptide regions of cysteine proteinases. FEBS Lett. 226: 33-37.

Jackson, S. A. and Clegg, J. S. 1996. Ontogeny of low molecular weight stress protein p26 during early development of the brine shrimp, Artemia franciscana. Development Growth and Differentiation 32: 41-49.

Jean, D., Guillaume, N., and Frade, R. 2002. Characterization of human cathepsin L promoter and identification of binding sites for NF-Y, Sp1 and Sp3 that are essential for its activity. Biochem. J. 361: 173-184.

Jeffreys, A. J. and Flavell, R. A. 1977. The rabbit beta-globin gene contains a large insert in the coding sequence. Cell 12: 1097-1108.

Karrer, K. M., Peiffer, S. L., DiTomas, M. E. 1993. Two distinct gene subfamilies within the family of cysteine protease genes. Proc. Natl. Acad. Sci. USA 90(7): 3063-3067.

Kestemont, P., Cooremans, J., Ayad, A. A., and Melard, C. 1999. Cathepsin L in eggs and larvae of perch Perca fluviatilis. Fish Physiol. Biochem. 21: 59-64.

Kirschke, H., Barrett, A. J., Rawlings, N. D. 1995. Proteinases 1: lysosomal cysteine proteinases. Protein Profile 2(14): 1581-1643.

Kirschke, H., Wiederanders, B. 1994. Cathepsin S and related lysosomal endopeptidases. Methods Enzymol. 244: 500-511.

Kobayashi, H., Schmitt, M., Goretzki, L., Chucholowski, N., Calvete, J., Kramer, M., Gunzaler, W. A., Janicke, F. and Graeff, H. 1991. Cathepsin B efficiently activates the soluble and the tumor cell receptor-bound form of the proenzyme urokinase-type plasminogen activator (Pro-uPA). J. Biol. Chem. 266: 5147-5152.

Koblinski, J. E., Ahram, M., Sloane, B. F. 2000. Unraveling the role of proteases in cancer. Clin. Chim. Acta 291: 113-135.

Krasko, A., Gamulin, V., Seack, J., Steffen, R., Schroder, H. C., Muller, W. E. 1997. Cathepsin, a major protease of the marine sponge Geodia cydonium: purification of the enzyme and molecular cloning of cDNA. Mol. Mar. Biol. Biotechnol. 6: 296-307.

Kuipers, A. G., and Jongsma, M. A. 2004. Isolation and characterization of cathepsin L-like cysteine protease cDNAs from western flower thrips (Frankliniella occidentalis). Comp. Biochem. Physiol. 139B: 65-75.

Kwiatowski, J., Krawczyk, M., Kornacki, M., Bailey, K., Ayala, F. J. 1995. Evidence against the exon theory of genes derived from the triosephosphate isomerase. Proc. Natl. Acad. Sci. USA 92: 8503-8506.

Lamparero, A. M., Casero, M. C., Caro, J. O., and Sastre, L. 2003. Regulation of promoter occupancy during activation of cryptobiotic embryos from the crustacean Artemia franciscana. J. Exp. Biol. 206: 1565-1573.

Lazzarino, D. and Gabel, C. A. 1990. Protein determinants impair recognition of procathepsin L phosphorylated oligosaccharides by the cation-independent mannose 6-phosphate receptor. J. Biol. Chem. 265: 11864-11871.

Le Boulay, C., Sellos, D., Van Wormhoudt, A. 1998. Cathepsin L gene organization in crustaceans. Gene 218(1-2): 77-84.

Lenarcic, B., Turk, V. 1999. Thyroglobulin type-1 domains in equistatin inhibit both papain-like cysteine proteinases and cathepsin D. J. Biol. Chem. 274: 563-566.

Liang, P. and MacRae, T. H. 1999. The synthesis of a small heat shock/α-crystallin protein in Artemia and its relationship to stress tolerance during development. Developmental Biology 207: 445-456.

Liu, J. and Maxwell, E. S. 1990. Mouse U14 snRNA is encoded in an intron of the mouse cognate hsc70 heat shock gene. Nucleic Acids Res. 18: 6565-6571.

Logsdon, J. M. 1998. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8: 637-648.

Logsdon, J. M., Stoltzfus, A., and Doolittle, W. F. 1998. Molecular evolution: recent cases of spliceosomal intron gain? Curr. Biol. 8: R560-R563.

Logsdon, J. M., Tyshenko, M. G., Dixon, C., Jafari, J. D., Walker, V. K., and Palmer, J. D. 1995. Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns-late theory. Proc. Natl. Acad. Sci. USA 92: 8507-8511.

Lu, J. and Warner, A. H. 1991. Immunodetection of thiol protease levels in various populations of Artemia cysts and during development. Biochem. Cell Biol. 69: 96-101.

Lustigman, S., McKerrow, J. H., Shah, K., Lui, J., Huima, T., Hough, M., and Brotman, B. 1996. Cloning of a cysteine protease required for the molting of Onchocerca volvulus third stage larvae. J. Biol. Chem. 271: 3081-3089.

Maciewicz, R. A., and Etherington, D. J. 1988. Enzyme immunoassay for cathepsin B and cathepsin L in synovial fluids from patients with arthritis. Biochem. Soc. Trans. 16: 812-813.

Maciewicz, R. A., Wotton, S. F., Etherington, D. J. and Duance, V. C. 1990. Susceptibility of the cartilage collagens types II, IX and XI to degradation by the cysteine proteinases, cathepsins B and L. FEBS Lett. 269: 189-193.

Maniatis, T., and Reed, R. 2002. An extensive network of coupling among gene expression machines. Nature 416: 499-506.

Marchler-Bauer, A., Anderson, J. B., Cherukuri, P. F., DeWeese-Scott, C., Geer, L. Y., Gwadz, M., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Marchler, G. H., Mullokandov, M., Shoemaker, B. A., Simonyan, V., Song, J. S., Thiessen, P. A., Yamashita, R. A., Yin, J. J., Zhang, D., Bryant, S. H. 2005. CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res. 33: D192-196.

Marx, J. L. 1987. A new wave of enzymes for cleaving prohormones. Science 235: 285-286.

Mas-Coma, S., Esteban, J. G., Bargues, M. D. 1999. Epidemiology of human fascioliasis: a review and proposed new classification. Bull. WHO 77: 340-346.

Massimi, I., Park, E., Rice, K., Muller-Esterl, W., Sauder, D., McGavin, M. J. 2002. Identification of a novel maturation mechanism and restricted substrate specificity for the SspB cysteine protease of Staphylococcus aureus. J. Biol. Chem. 277: 41770-41777.

Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A. E., Kel-Margoulis, O. V. 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31: 374-378.

McGrath, M. E. 1999. The lysosomal cysteine proteases. Annu. Rev. Biophys. Biomol. Struct. 28: 181-204.

McIntyre, G. F. and Erickson, A. H. 1993. The lysosomal proenzyme receptor that binds procathepsin L to microsomal membranes at pH 5 is a 43-kDa integral membrane protein. Proc. Natl. Acad. Sci. USA 90: 10588-10592.

Menard, R., Carmona, E., Takebe, S., Dufour, E., Plouffe, C., Mason, P., and Mort, J. S. 1998. Autocatalytic processing of recombinant human procathepsin L. J. Biol. Chem. 273(8): 4478-4484.

Mehtani, S., Gong, G., Panella, J., Subbiah, S., Peffley, D. M., and Frankfater, A. 1998. In vivo expression of an alternatively spliced human tumor message that encodes a truncated form of cathepsin B. J. Biol. Chem. 273: 13236-13244.

Metrione, R. M., Okuda, Y., Fairclough, G. F., Jr. 1970. Subunit structure of dipeptidyl . Biochem. 9(12): 2427-2432.

Michalek, M. T., Benacerraf, B. and Rock, K. L. 1992. The class II MHC-restricted presentation of endogenously synthesized ovalbumin displays clonal variation, requires endosomal/lysosomal processing, and is up-regulated by heat shock. J. Immunol. 148: 1016-1024.

Miyata, S., and Kubo, T. 1997. Inhibition of gastrulation in Xenopus embryos by an antibody against a cathepsin L-like protease. Dev. Growth Differ. 39: 111-115.

Miyata, S., Nishibe, Y., and Kihara, H. K. 1995. Effects on properties of a thiol protease from Xenopus embryos in substrate assay conditions. Cell Biol. Int. 19: 33-38.

Monteiro, A. C. S., Abrahamson, M., Lima, A. P. C. A., Vannier-Santos, M. A., Scharfstein, J. 2001. Identification, characterization and localization of chagasin, a tight-binding cysteine protease inhibitor in Trypanosoma cruzi. J. Cell Sci. 114: 3933-3942.

Mundodi, V., Somanna, A., Farrell, P. J., Gedamu, L. 2002. Genomic organization and functional expression of differentially regulated cysteine protease genes of Leishmania donovani complex. Gene 282: 257-265.

Musil, D., Zucic, D., Turk, D., Engh, R. A., Mayr, I., Huber, R., Popovic, T., Turk, V., Towatari, T., Katunuma, N. 1991. The refined 2.15 Å X-ray crystal structure of human liver cathepsin B: the structural basis for its specificity. EMBO J. 10(9): 2321-2330.

Nagainis, P. A., and Warner, A. H. 1979. Evidence for the presence of an acid protease and protease inhibitors in dormant embryos of Artemia salina. Dev. Biol. 68: 259-270.

Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10: 1-6.

O’Neil, S. M., Parkinson, M., Dowd, A. J., Strauss, W., Angles, R., Dalton, J. P. 1999. Immunodiagnosis of human fascioliasis using recombinant Fasciola hepatica cathepsin L1 cysteine protease. Am. J. Trop. Med. Hyg. 60: 749-751.

Ortega, M. A., Diaz-Guerra, M., Sastre, L. 1996. Actin gene structure in two Artemia species, A. franciscana and A. parthenogenetica. J. Mol. Evol. 143(3): 224-235.

Otto, H. H., and Schirmeister, T. 1997. Cysteine proteases and their inhibitors. Chem. Rev. 97: 133-171.

Palmer, J. D., and Logsdon, J. M. 1991. The recent origin of introns. Curr. Opin. Genet. Dev. 1: 470-477.

Polgar, L. and Halasz, P. 1982. Current problems in mechanistic studies of serine and cysteine proteinases. Biochem. J. 207(1): 1-10.

Quandt, K., Frech, K., Karas, H., Wingender, E., Werner, T. 1995. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23: 4878-4884.

Reed, R., and Magni, K. 2001. A new view of mRNA export: separating the wheat from the chaff. Nat. Cell Biol. 3: E201-E204.

Rescheleit, D. K., Rommerskirch, W. J., Weideranders, B. 1996. Sequence analysis and distribution of two new human cathepsin L splice variants. FEBS Lett. 394: 345-348.

Rice, K., Perlata, R., Bast, D., Azavedo, J., McGavin, M. J. 2001. Description of staphylococcus serine protease (ssp) operon in Staphylococcus aureus and nonpolar inactivation of sspA-encoded serine protease. Infect. Immun. 69: 159-169.

Rigden, D., Moscolov, V. V., Galperin, M. 2002. Sequence conservation in the chagasin family suggests a common trend in cysteine proteinase binding by unrelated protein inhibitors. Protein Sci. 11: 1971-1977.

Roche, P. A. and Cresswell, P. 1991. Proteolysis of the class II-associated invariant chain generates a peptide in intracellular HLA-DR molecules. Proc. Natl. Acad. Sci. USA 88: 3150-3154.

Rose, T. M. 2005. CODEHOP-mediated PCR: a powerful technique for the identification and characterization of viral genomes. Virol. J. 2: 20.

Roy, S. W., Fedorov, A., and Gilbert, W. 2003. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc. Natl. Acad. Sci. USA 100(12): 7158-7162.

Roy, S. W., Nosaka, M., de Souza, S. J., Gilbert, W. 1999. Centripetal modules and ancient introns. Gene 238(1): 85-91.

Rzychon, M., Sabat, A., Kosowska, K., Dubin, A., Potempa, J. 2003. Staphostatins: an expanding new group of proteinase inhibitors with a unique specificity for the regulation of staphopains, Staphylococcus spp. cysteine proteinases. Mol. Microbiol. 49: 1051-1066.

Rzychon, M., Chmiel, D. and Niemczyk, J. S. 2004. Modes of inhibition of cysteine proteases. Acta Biochimica Polonica 51(4): 861-873.

Sahagian, G. G., and Gottesman, M. M. 1982. The predominant protein of transformed murine fibroblasts carries the lysosomal mannose 6-phosphate recognition marker. J. Biol. Chem. 257: 11145-11150.

Sajid, M., and McKerrow, J. H. 2002. Cysteine proteases of parasitic organisms. Mol. Biochem. Parasitol. 120: 1-21.

Sambrook, J., Fritsch, E. F., Maniatis, T. 1989. Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor Laboratory Press.

Sandersen, S. J., Westrop, G. D., Scharfstein, J., Mottram, J. C., Coombs, G. H. 2003. Functional conservation of a natural cysteine peptidase inhibitor in protozoan and bacterial pathogens. FEBS Lett. 542: 12-16.

Santos, C. C., Sant'anna, C., Terres, A., Cunha-e-Silva, N. L., Scharfstein, J., de A Lima, A. P. 2005. Chagasin, the endogenous cysteine-protease inhibitor of Trypanosoma cruzi, modulates parasite differentiation and invasion of mammalian cells. J. Cell Sci. 118: 901-915.

Sastre, L. 1999. Isolation and characterization of the gene coding for Artemia franciscana TATA-binding protein: expression in cryptobiotic and developing embryos. Biochim. Biophys. Acta 1445: 271-282.

Schlereth, A., Standhardt, D., Mock, H. P., Muntz, K. 2001. Stored cysteine proteinases start globulin mobilization in protein bodies of embryonic axes and cotyledons during vetch (Vicia sativa L.) seed germination. Planta 212(5-6): 718-727.

Seth, P., Mahajan, V. S., Chauhan, S. S. 2003. Transcription of human cathepsin L mRNA species hCATL B from a novel alternative promoter in the first intron of its gene. Gene 321: 83-91.

Sever, N., Filipic, M., Brzin, J., Lah, T. T. 2002. Effect of cysteine proteinase inhibitors on murine B16 melanoma cell invasion in vitro. Biol. Chem. 383: 839-842.

Shi, G. P., Webb, A. C., Foster, K. E., Knoll, J. H., Lemere, C. A., Munger, J. S., Chapman, H. A. 1994. Human cathepsin S: chromosomal localization, gene structure, and tissue distribution. J. Biol. Chem. 269(15): 11530-11536.

Shinagawa, T., Do, Y. S., Baxter, J. K., Carilli, C., Schilling, J. and Hsueh, W. A. 1990. Identification of an enzyme in human kidney that correctly processes prorenin. Proc. Natl. Acad. Sci. USA 87: 1927-1933.

Smith, S. M., Gottesman, M. M. 1989. Activity and deletion analysis of recombinant human cathepsin L expressed in Escherichia coli. J. Biol. Chem. 264(34): 20487-20495.

Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98: 503.

Stearns, N. A., Dong, J., Pan, J.-X., Brenner, D. A., and Sahagian, G. G. 1990. Comparison of cathepsin L synthesized by normal and transformed cells at the gene, message, protein and oligosaccharide levels. Arch. Biochem. Biophys. 283: 447-457.

Storer, A. C. and Menard, R. 1994. Catalytic mechanism in papain family of cysteine peptidases. Methods Enzymol. 244: 486-500.

Swensen, J. 1996. PCR with random primers to obtain sequence from yeast artificial chromosome insert ends or plasmids. BioTechniques 20: 486-491.

Szpaderska, A. and Frankfater, A. 2001. An intracellular form of cathepsin B contributes to invasiveness in cancer. Cancer Res. 61: 3493-3500.

Takahashi, H., Cease, K. B. and Berzofsky, J. A. 1989. Identification of proteases that process distinct epitopes on the same protein. J. Immunol. 142: 2221-2229.

Tarrio, R., Rodriguez-Trelles, F., Ayala, F. J. 1998. New Drosophila introns originate by duplication. Proc. Natl. Acad. Sci. USA 95: 1658-1662.

Tselentis, Y., Gilkas, A., Chaniotis, B. 1994. Kala-azar in Athens basin. Lancet 343(8913): 1635.

Turk, B., Bieth, J. G., Dolenc, I., Turk, D., Cimerman, N., Kos, J., Colic, A., Stoka, V., Turk, V. 1995. Regulation of the activity of lysosomal cysteine proteinases by pH-induced inactivation and/or endogenous protein inhibitors, cystatins. Biol. Chem. 376(4): 225-230.

Turk, B., Turk, V., Turk, D. 1997. Structural and functional aspects of papain-like cysteine proteinases and their protein inhibitors. Biol. Chem. 378: 141-150.

106

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Turk, B., Turk, D., Turk, V. 2000. Lysosomal cysteine proteases: more than scavengers.Biochim. Biophy. Acta 1477: 98-111.

Uchiyama,Y., Watanabe,T. Watanabe,M., Ishii,I., Matsuba,H., Waguri,S. and Kominami,E. 1989. Immunocytochemical localization of prorenin, renin, and cathepsins B, H, and L in juxtaglomerular cells of rat kidney.J. Histochem. Cytochem., 37: 691-696.

Vemet, T., Berti, P.J., de Montigny, C., Musil, R., Tessier, D.C., Menard, R., Magny, M.C., Storer, A.C., Thomas, D.Y. 1995. Processing of the papain precursor. The ionization state of a conserved amino acid motif within the Pro region participates in the regulation of intramolecular processing. J. Biol. Chem. 270(18): 10838-10846.

Villadangos, J.A., Bryant, R.A., Deussing, J., Driessen, C., Lennon-Dumenil, A.M., Riese, R.J., Saftig, P., Shi, G.P., Chapman, H.A., Peters, C. and Ploegh, H.L. 1999. Proteases involved in MHC class II antigen presentation. Immunol. Rev. 172: 109-120.

Volk, H., Kurz, U., Linder, J., Klumpp, S., Gnau, V., Jung, G. and Schultz, J. 1996. Cathepsin L is an intracellular and extracellular protease. Eur. J. Biochem. 238: 198-206.

Von Eggeling, F. and Spielvogel, H. 1995. Applications of random PCR. Cell. Mol. Biol. (Noisy-le-grand) 41(5): 653-670.

Warner, A.H. and Shridhar, V. 1985. Purification and characterization of a cytosol protease from dormant cysts of the brine shrimp Artemia. J. Biol. Chem. 260: 7008-7014.

Warner, A.H. and Sonnenfeld-Karcz, M.J. 1992. Purification and partial characterization of thiol protease inhibitors from embryos of the brine shrimp Artemia. Biochem. Cell Biol. 70: 1020-1029.

Warner, A.H. and Matheson, C. 1998. Release of proteases from larvae of the brine shrimp Artemia franciscana and their potential role during the molting process. Comp. Biochem. Physiol. B 119: 255-263.

Warner, A.H., Perz, M.J., Osahan, J.K. and Zielinski, B.S. 1995. Potential role in development of the major cysteine protease in larvae of the brine shrimp Artemia franciscana. Cell Tissue Res. 282: 221-231.

Watson, M.E. 1984. Compilation of published signal sequences. Nucleic Acids Res. 12: 5145-5164.

Wex, T., Buhling, F., Wex, H., Gunther, D., Malfertheiner, P., Weber, E. and Bromme, D. 2001. Human cathepsin W, a cysteine protease predominantly expressed in NK cells, is mainly localized in the endoplasmic reticulum. J. Immunol. 167: 2172-2178.

Wiederanders, B. 2003. Structure-function relationships in class CA1 cysteine peptidase propeptides. Acta Biochim. Pol. 50: 691-713.

Wijffels, G.L., Panaccio, M., Salvatore, L., Wilson, L., Walker, I.D. and Spithill, T.W. 1994. The second cathepsin L-like proteinase of the trematode, Fasciola hepatica, contain 3-hydroxyproline residues. Biochem. J. 299: 781-790.

Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhäuser, R. et al. 2001. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29: 281-283.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Prüss, M., Reuter, I. and Schacherer, F. 2000. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28: 316-319.

Wolters, P.J. and Chapman, H.A. 2000. Importance of lysosomal cysteine proteases in lung disease. Respir. Res. 1: 170-177.

Yamasaki, H., Aoki, T. and Oya, H. 1989. A cysteine proteinase from the liver fluke Fasciola spp.: purification, characterization, localization and application to immunodiagnosis. Jpn. J. Parasitol. 38: 373-384.

Yamasaki, H., Mineki, R., Murayama, K., Ito, A. and Aoki, T. 2002. Characterization and expression of the Fasciola gigantica cathepsin L gene. Int. J. Parasitol. 32: 1031-1042.

Yan, S., Berquin, I.M., Troen, B.R. and Sloane, B.F. 2000. Transcription of human cathepsin B is mediated by Sp1 and Ets family factors in glioma. DNA Cell Biol. 19(2): 79-91.

VITA AUCTORIS

Name: Cao JianPing

Born: February 13, 1980, Shanghai, P.R. China.

Education: 2003-2006

Master of Science Program, Department of Biological Sciences, University of Windsor, Windsor, Ontario, Canada.

1998-2003 Bachelor of Medicine, Department of Medicine, Shanghai Second Medical University, Shanghai, P.R. China.

Publication:

Abstract: Structure of the cathepsin L gene in two species of the crustacean, Artemia. A.H. Warner, M.F. Shaw, R. Shamoon, and P.J. Cao. The International Proteolysis Society, Quebec City, Canada, October 15-19, 2005.
