Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 7588-7592, August 1994
Biochemistry

Enterokinase, the initiator of intestinal digestion, is a mosaic protease composed of a distinctive assortment of domains
(serine proteases/trypsinogen activation)

YASUNORI KITAMOTO*, XIN YUAN†, QINGYU WU*, DAVID W. MCCOURT†, AND J. EVAN SADLER*†‡

†Howard Hughes Medical Institute, *Departments of Medicine and Biochemistry and Molecular Biophysics, The Jewish Hospital of St. Louis, Washington University School of Medicine, St. Louis, MO 63110

Communicated by Earl W. Davie, April 19, 1994

ABSTRACT Enterokinase is a protease of the intestinal brush border that specifically cleaves the acidic propeptide from trypsinogen to yield active trypsin. This cleavage initiates a cascade of proteolytic reactions leading to the activation of many pancreatic zymogens. The full-length cDNA sequence for bovine enterokinase and partial cDNA sequence for human enterokinase were determined. The deduced amino acid sequences indicate that active two-chain enterokinase is derived from a single-chain precursor. Membrane association may be mediated by a potential signal-anchor sequence near the amino terminus. The amino terminus of bovine enterokinase also meets the known sequence requirements for N-myristoylation. The amino-terminal heavy chain contains domains that are homologous to segments of the low density lipoprotein receptor, complement components C1r and C1s, the macrophage scavenger receptor, and a recently described motif shared by the metalloprotease meprin and the Xenopus A5 neuronal recognition protein. The carboxyl-terminal light chain is homologous to the trypsin-like serine proteases. Thus, enterokinase is a mosaic protein with a complex evolutionary history. The amino acid sequence surrounding the amino terminus of the enterokinase light chain is ITPK-IVGG (human) or VSPK-IVGG (bovine), suggesting that single-chain enterokinase is activated by an unidentified trypsin-like protease that cleaves the indicated Lys-Ile bond. Therefore, enterokinase may not be the "first" enzyme of the intestinal digestive hydrolase cascade. The specificity of enterokinase for the DDDDK-I sequence of trypsinogen may be explained by complementary basic-amino acid residues clustered in potential S2-S5 subsites.

All animals need to digest exogenous macromolecules without destroying similar endogenous constituents. The regulation of digestive enzymes is, therefore, a fundamental requirement (1). Vertebrates have solved this problem, in part, by using a two-step enzymatic cascade to convert pancreatic zymogens to active enzymes in the lumen of the gut. The basic features of this cascade were described in 1899 by N. P. Schepovalnikov, working in the laboratory of I. P. Pavlov (2). Extracts of the proximal small intestine were shown to strikingly activate the latent hydrolytic enzymes in pancreatic fluid. Pavlov considered this intestinal factor to be an enzyme that activated other enzymes, or a "ferment of ferments," and named it "enterokinase." The importance of this protease cascade is emphasized by the life-threatening intestinal malabsorption that accompanies congenital deficiency of enterokinase (3, 4).

Enterokinase activates bovine trypsinogen by cleaving after the sequence VDDDDK, releasing an amino-terminal activation peptide (5, 6). The acidic DDDDK sequence of the trypsinogen-activation peptide is conserved among vertebrates (7), except for the similar sequences of trypsinogens from lungfish (IEEDK and LEDDK) and African clawed frog (FDDDK). Enterokinase prefers substrates with the sequence DDDDK, whereas the presence of aspartate residues markedly inhibits the ability of trypsin to cleave such substrates (8). For example, toward bovine trypsinogen the catalytic efficiency of enterokinase is 12,000-fold (porcine) (9) or 34,000-fold (bovine) (10) greater than that of bovine trypsin. This reciprocal specificity protects trypsinogen against autoactivation by trypsin and promotes activation by enterokinase in the gut.

Enterokinase has been purified from porcine (11), bovine (10, 12, 13), human (14), and ostrich intestine (15). With the possible exception of human enterokinase, which was suggested to be a heterotrimer (14), enterokinase appears to be a disulfide-linked heterodimer with a heavy chain of 82-140 kDa and a light chain of 35-62 kDa. Mammalian enterokinases contain 30-50% carbohydrate, which may contribute to the apparent differences in polypeptide masses. The heavy chain is postulated to mediate association with the intestinal brush border membrane (16), although no direct evidence for this function has been reported. The light chain contains the catalytic center. Based on susceptibility to inhibition by chemical modification of the active-site serine and histidine residues (9-11, 17) and on the partial amino acid sequence (18) and cDNA sequence of the bovine enterokinase light chain (19), enterokinase is a member of the trypsin-like family of serine proteases.

Enterokinase stands at or near the top of a regulatory enzyme cascade that successfully limits the activity of digestive hydrolases to the gut, but there is no structural explanation for enterokinase membrane localization, substrate specificity, or expression specifically in the proximal small intestine. To address these questions we have characterized cDNA clones for bovine and human enterokinase.§

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

‡To whom reprint requests should be addressed at: Howard Hughes Medical Institute, Washington University, 660 South Euclid, Box 8022, St. Louis, MO 63110.

§The sequences reported in this paper have been deposited in the GenBank data base (accession nos. U09859 and U09860).

MATERIALS AND METHODS

Materials. Purified calf enterokinase (EK-3, 131 units/µg) was from Biozyme Laboratories (San Diego). Fresh bovine tissues were from a local abattoir.

Amino Acid Sequencing. Enterokinase (16 µg) was reduced with 0.5% (vol/vol) 2-mercaptoethanol, separated by electrophoresis (20), transferred to an Immobilon P membrane (Millipore) by electroblotting, and stained with Coomassie brilliant blue. The excised light-chain band (≈47 kDa) was subjected to automated Edman degradation with an Applied Biosystems model 470A sequencer (21) equipped with a model 120A phenylthiohydantoin analyzer.

Isolation of cDNA Clones. RNA was extracted (22) from bovine duodenum and proximal small intestine. Single-stranded cDNA was prepared from total RNA (10 µg) using avian myeloblastosis virus reverse transcriptase and an oligo(dT) primer (cDNA cycle kit, Invitrogen). The cDNA was used for PCR amplification (30 cycles of 2-min annealing at 58°C, 2-min extension at 72°C, and 1-min denaturation at 94°C) with sense primer 5'-TAY GAR GGI GCI TGG CCI TGG GT-3' and antisense primer 5'-AAT GGG ACC GCC IGA RTC ICC-3'. Products were analyzed by Southern blotting and hybridization with ³²P-labeled oligonucleotide probe 5'-STI WCI GCI GCC CAC TG-3'. The positive 572-bp product was cloned to yield pBEK1.

Additional clones were obtained by radiolabeling the cDNA insert of pBEK1 with [³²P]dCTP (23) and screening of bovine or human small intestine λgt11 cDNA libraries (Clontech) or by using oligonucleotides to screen 5' rapid amplification of cDNA ends (RACE) libraries (24). RACE libraries were constructed with the 5' RACE system (GIBCO/BRL) using bovine intestinal RNA and one of two sets of enterokinase-specific primers: set 1, 5'-TTA TTG TCT TCA TCA GAG CCA TC-3', 5'-TGG ACA GTT TAA TTC TCC ATC ACA-3', 5'-ATC AAT TGC TAT GTA CTT TAG AGC-3'; set 2, 5'-ATT GAG ACA TTT CCT GTG ATA TCA ATG CTG-3', 5'-TGT GGA AAG TGA CCA GTT GGC TGG ATT TAT-3', 5'-GCC TTG AAT CAG TTC TTC TT-3'. DNA sequences were determined on both strands (25).

DNA Sequence Analysis. Sequences were compared to GenBank and EMBL data bases at the National Center for Biotechnology Information using the BLAST network server (26). Sequence alignments and consensus sequences were prepared and analyzed with the programs PILEUP and GAP of the Genetics Computer Group (version 7.1, Madison, WI). The significance of GAP alignments was evaluated by comparing the optimal alignment score (x) to the mean (µ) and SD (σ) of scores obtained for 30 alignments of randomized sequences, using the normal distribution to estimate the probability that the alignment could occur by chance.

RESULTS AND DISCUSSION

Isolation of cDNA Clones. The bovine enterokinase light chain was reported to contain the motif YEGAWPWVV at residues 8-16 (18); the underlined residues are not conserved in other serine proteases. Thirty-one residues of the amino-terminal sequence of the bovine enterokinase light chain were determined, and the previously reported sequence was confirmed, except that arginine rather than tyrosine was identified at cycle 8. This sequence was used to design a degenerate 23-mer "sense" primer that would be relatively specific for enterokinase. A degenerate 21-mer "antisense" primer was based on the conserved GDSGGPL motif that contains the active-site serine of serine proteases. Upon PCR with a bovine small intestine single-stranded cDNA template, the major product hybridized to a probe based on the conserved sequence near the active-site histidine. The corresponding clone pBEK1 was used to isolate overlapping cDNAs from bovine and human small intestine cDNA libraries.

The composite cDNA sequence for bovine enterokinase spans 3923 nt. Beginning at nt 113 there is an ATG codon and open reading frame of 3105 nt, a stop codon plus 3' untranslated region of 643 nt, and a poly(A) tail of 63 nt. A polyadenylylation signal of AATAAA is present 25 nt before the poly(A) tail. The open reading frame encodes a polypeptide of 1035 amino acids with a calculated mass of 114.9 kDa. The translated amino acid sequence after residue 800 (Fig. 1) was identical to the 31 residues determined by Edman degradation of the enterokinase light chain, confirming that the cDNA encodes enterokinase. A segment of 81 nt that encodes amino acid residues Ala-166-Pro-192 was present in three cDNA clones but absent in one (Fig. 1). This sequence is not delimited by splice sites and therefore may be encoded by an exon that is occasionally absent due to alternative splicing. This segment also could represent a length polymorphism.

The partial cDNA sequence for human enterokinase corresponds to amino acids 765-1035 encoded by the bovine sequence. In the region of overlap, the open reading frames of the bovine and human nucleotide sequences are ≈85% identical, and the encoded amino acid sequences are ≈84% identical. The 3' untranslated regions are less conserved, exhibiting ≈67% sequence identity over 572 nt.

By Northern blotting, an enterokinase mRNA species of ≈4.4 kb was detected in human small intestine, but not in leukocytes, colon, ovary, testis, prostate, thymus, spleen, pancreas, kidney, skeletal muscle, liver, lung, placenta, brain, or heart (data not shown). This result is consistent with the studies of Pavlov on the distribution of enterokinase (2) and the immunohistochemical localization of enterokinase in the brush border of duodenum and jejunum (27).

Structure of the Enterokinase Catalytic Domain. In agreement with LaVallie et al. (19), amino acid residues 801-1035 correspond to the enterokinase light chain, which has a predicted mass of 26.3 kDa, compared with 47 kDa observed for purified bovine intestinal enterokinase (data not shown). The difference reflects glycosylation of the light chain. There are three and four potential N-linked glycosylation sites, respectively, in the bovine and human enterokinase light chains, and digestion of bovine enterokinase with peptide:N-glycosidase F reduces the apparent mass of the light chain from 47 kDa to 35 kDa (data not shown).

The enterokinase protease domain was compared with other serine proteases for characteristic disulfide bond patterns and sequence similarity. Enterokinase is most similar to a subfamily of two-chain serine proteases that share 10 conserved cysteine residues and in which the activation peptide remains attached to the protease domain by a disulfide bond. The archetype of this group is chymotrypsin. By analogy to chymotrypsin (28, 29) and related proteases for which the disulfide bonds have been determined directly, the most likely pairings in enterokinase are as follows: Cys-788-Cys-912, Cys-826-Cys-842, Cys-926-Cys-993, Cys-957-Cys-972, and Cys-983-Cys-1011. The first of these disulfide bonds joins the heavy chain and light chain.

The amino acid sequence of the enterokinase protease domain is strikingly similar to the blood coagulation proteases factor XI (30) and prekallikrein (31) and to hepsin, an unusual serine protease with a possible transmembrane domain near the amino terminus (32). Enterokinase exhibits the expected conservation of serine protease sequence motifs; in particular, the active-site residues can be identified as His-841, Asp-892, and Ser-987 (Fig. 1). Compared with factor XI, hepsin, and chymotrypsin, the human enterokinase light chain has 41%, 44%, and 35% identical amino acid residues. The percentages for the bovine enterokinase comparisons are similar. Enterokinase and factor XI appear to share two potential N-linked glycosylation sites, whereas hepsin has no N-linked glycosylation sites.

The specificity of enterokinase for cleavage after lysine is consistent with the presence of Asp-981 at the base, and Gly-1008 and Gly-1018 at the sides of the specificity pocket or S1 subsite that binds the substrate P1 residue (Fig. 1). The requirement for aspartate in the P2-P5 positions suggests that the surface of enterokinase should provide electrostatic complementarity to negatively charged side chains. Examination of the homologous three-dimensional structure of chymotrypsin suggests that several exposed surface loops of enterokinase (Fig. 1, segments a-d) might contact these substrate residues. Within these segments, there are a few positively charged residues that are present in both bovine and human enterokinase but absent from related proteases with different specificity for the P2-P5 substrate residues. In particular, the RRRK (human) or KRRK (bovine) sequences between residues 886-889 (Fig. 1, segment b) may interact directly with the aspartate residues in enterokinase substrates.

The synthesis of enterokinase as a single-chain protein poses a conceptual problem because it indicates that "proenterokinase" itself must be activated by proteolytic cleavage. The responsible protease could act on proenterokinase intracellularly during biosynthesis or extracellularly. Although the reaction could be autocatalytic, the participation of a separate protease seems more likely. In that case, enterokinase would not be strictly at the top of the digestive hydrolase cascade but would be in the second position at best. The amino-terminal isoleucine of the enterokinase light chain is preceded by Ser-Pro-Lys (bovine) or Thr-Pro-Lys (human), suggesting that enterokinase is activated by a trypsin-like enzyme. The identity and location of the proenterokinase activator may indicate another level in the control of digestion.

Structural Motifs of the Enterokinase Heavy Chain. The nucleotide sequence around the codon for Met-1 is AAAATGG, and that for Met-20 is GTCATGT. Only the former sequence matches at both positions -3 and +4 the consensus sequence proposed for translation initiation in vertebrate mRNAs (33), suggesting that initiation at Met-1 is more likely. There is no in-frame termination codon within the available 112 nt of putative 5' untranslated sequence, so it is possible that the initiation codon remains to be cloned. However, initiation at Met-1 predicts a bovine enterokinase heavy chain of 800 amino acids with a mass of 88.6 kDa (Fig. 1), and this is consistent with the ≈763 amino acids and ≈84 kDa estimated by compositional analysis of purified enterokinase (12). By SDS/gel electrophoresis, the apparent mass of the heavy chain was 116 kDa, decreasing to ≈82 kDa after removal of N-linked oligosaccharides with peptide:N-glycosidase F (data not shown). This decrease in mass is consistent with the reported carbohydrate composition of enterokinase (10, 12), and there are 17 potential N-linked glycosylation sites in the sequence of the heavy chain (two are concatenated) (Fig. 1).

The hydrophobic 29-residue sequence from Val-19 through Val-47 could serve as a signal peptide. If it were not cleaved by signal peptidase, this segment could function as a signal-anchor sequence and account for the membrane association of enterokinase. The amino-terminal sequence also is compatible with the substrate specificity of myristoylCoA:protein N-myristoyltransferase (34), suggesting that Gly-2 may be myristoylated and thereby provide another mechanism for membrane targeting during biosynthesis.

The heavy chain of enterokinase contains five domains that are related to four different structural motifs found in other

EKbov MGSKRSVPSR HRSLTTYEVM FAVLFVILVA LCAGLIAVSW LSIQGSVKDA AFGKSHEARG TLKIISGATY NPHLQDKLSV DFKVLAFDIQ QMIDDIFQSS 100

EKbov NLKNEYKNSR VLQFEINBII VIFDLLFDQW VSDKNVKEEL IQGIEAMS QLVTFHIDLN SIDITASLENIFTISPATTS EKLTTSIPLA TPGMIECP 200 ...... 0001- EKbov PDSRLCADAL KYIAIDLFCD GELNCPDGSD EDTCATAC DGRFLLTGSS GSFEALHYPK PSMWAVCR WIIRVNQGLS IQLNFDYFNT YYADVLNIYE 300

EKbov GMGSSKILRA SLWSNNPGII RIFSNQVTAT FLIQSDESDY IGFKVTYTAF NSKELNNYEK INCNFEDGFC FWIQDLNDDN EWERTQGSTF PPSTGPTFDH 400 2 EKbov TFGMMGFYI STPTGPGGRR ERVGLLTLPL DPTPEQACLS FWYYMYGENV YKLSIUSD QNMEKTIFQK EGNYGQNWNY GQVTLin5VE FKVSFYGFKN 500

EKbov QILSDIALDD ISLTYGICMN-OVYPEPTLVP TPPPELPTDC GGPHDLWEPNL.FTSINFPN SYPNQAFCIW NLNAQKGKNI QLHFQEFDLE NIADVVEIRD 600 3 EKbov GEGDDSLFLA VYTGPGPVND VFSTTNRMTV LFITDNMLAK QGFKA4TTG YGLGIPEPCK EDNFQCKDGE CIPLVNLCDG FPHCKDGSDE AHCVRLFMf 700 4 5 -

...... EKhu ...... LILTPSQQCLQDSLIR LQCNHKSCGKKLA ..AQDIT EKbov TDSSGLVQFR IQSIWHVACA E3MTQISDD VCQLLGLGTG NWVPTFSTG GGPYVNLNTA PNQOLILTPS QQCLEDSLIL LQCNYKSCGK KLV..TQEVS 798

FXI ...... KIK ......

Hepsin ...... PCGRRKL ..... PV

Chta ...... CGV PAIQPVLSGL Consensus ------CG- K------

EKhu PKIVGGSNAK EGAWPWVVGL YY.... GGRLL CGASLVSSDW LVSAAHCVYG RNLEPSKWTA ILGLHMKSEZLTSPQTVPRLI DEIVINPHY. NRRRK EKbov PKIVGGSDSR EGAWPWVVAL YF..... DDQQV CGASLVSRDW LVSAAHCVYG RNMEPSKWKA VLGLHMASNIL.TSPQIETRLI DQIVINPHY. NKRRK 889 FXI PRIVGGTASV RGEWPWQVTL HTTSPTQRHL CGGSIIGNQW ILTAAHCFYG ..VESPKILR VYSGILIQIE IKEDTSFFGV QEIIIHDQY. KMAES Hepsin DRIVGGRDTS LGRWPWQVSL RY.... DGAHL CGGSLLSGDW VLTAAHCFPE RNRVLSRWRV FAG... AVAQ ASPHGLQLGV QAVVYHGGYL PFRDPNSEEN

Chta SRIVNGEEAV PGSWPWQVSL ..QDKTGFHF CGGSLINENW VVTAAHC ...... GVTTSDV VVAGEFDQGS SSEKIQKLKI AKVFKNSKY. NSLTI Consensus PRIVGG-D-- -G-WPWQV-L -Y----G-HL CGGSL-S-DW VVTAAHC-YG RN-E-SKW-- V-G--M---- -SP----L-I --IVIN--Y------N---- A * - a -

EKhu DNDIAMMHLE FKVMDYIQ PICLPEENQV FPPGRNJIA GWGTVVY.QG TTANILQEAD VPLLSNERCQ .QQHPEYM ENMICAGYEE GGIDSCQGDS EKbov NNDIAMMHLE MKVIWDYIQ PICLPEENQV FPPGRICSIA GWGALIY.QG STADVLQEAD VPLLSNEKCQ .QQMPEYEW ENMVCAGYEA GGVDSCQGDS 987 FXI GYDIALLKLE TTV5DSQR PICLPSKGDR NVIYTDCWVT GWGYRKL.RD KIQNTLQKAK IPLVTNEECQ .KRYRGHKIT HKMICAGYRE GGKDACKGDS Hepsin SNDIALVHLS SPLPLTEYIQ PVCLPAAGQA LVDGKICTVT GWGNTQY.YG QQAGVLQEAR VPIISNDVCN GADFYGNQIK PKMFCAGYPE GGIDACQGDS Chta NNDITLLKLS TAASFSQTVS AVCLPSASDD FAAGTTCVTT GWGLTRYTNA NTPDRLQQAS LPLLSNTNC. .KKYWGTKIK DAMICAG..A SGVSSCMGDS Consensus -NDIAL-HLE --VNYTDYIQ PICLP---Q- F--G--C--T GWG---Y--G -TA-VLQEA- VPLLSNE-CQ ---Y-G--IT E-MICAGY-E GG-DSCQGDS

~c ~ *

EKhu GGPLMCQEN. ... NRWFLAG VTSFGYK.CA LPNRPGVYAR VSRFTEWIQS FLH ......

EKbov GGPLMCQEN. NRWLLAG VTSFGYQ. CA LPNRPGVYAR VPRFTEWIQS FLH ...... 1035

FXI GGPLSCKHN. EVWHLVG ITSWGEG. CA QRERPGVYTN VVEYVDWILE KTQAV ...... Hepsin GGPFVCEDSI SRTPRWRLCG IVSWGTG.CA LAQKPGVYTK VSDFREWIFQ AIKTHSEASG MVTQL

Chta GGPLVCKKN. GAWTLVG IVSWGSSTCS .TSTPGVYAR VTALVNWVQQ TLAAN ...... Consensus GGPL-C-EN- ----RW-L-G ITSWG---CA L--RPGVYAR V--F-EWIQ- -L------*-~r

FIG. 1. Translated amino acid sequence of enterokinase cDNA clones and alignment with other serine proteases. The aligned sequences include human enterokinase (EKhu), bovine enterokinase (EKbov), human factor XI (FXI), human hepsin (Hepsin), bovine chymotrypsinogen A (Chta), and a consensus sequence. Numbering at right refers to the translated sequence of bovine enterokinase. Cysteine residues are in boldface type. Potential N-linked glycosylation sites are in boldface underlined type. The potential signal-anchor sequence is double underlined. The potential alternative exon is indicated by a dotted underline. Sequence motifs in the heavy chain are indicated by numbered underlines. Segments of the protease domain that may interact with substrate amino acids are indicated by lettered underlines (a-d). The cleavage site for zymogen activation (A), active site residues (*), and residues in the specificity pocket or S1 subsite (†) are indicated below the consensus sequence.
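The potential N-linked glycosylation sites marked in Fig. 1 can be located programmatically. The Python sketch below is only an illustration: it assumes the conventional sequon definition (Asn-X-Ser/Thr, where X is any residue except Pro), which the paper does not state explicitly, and the function name `find_sequons` and the toy peptide are hypothetical. It is not the procedure the authors used.

```python
import re

# Assumed sequon rule: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq, first_residue=1):
    """Return positions of Asn residues that start an N-X-S/T sequon.

    first_residue is the number assigned to the first character of seq,
    so a fragment can be reported in full-length protein numbering.
    """
    return [m.start() + first_residue for m in SEQUON.finditer(seq.upper())]

# Toy peptide (hypothetical, not taken from the enterokinase sequence).
toy = "MKNGTLLNPSANASW"
print(find_sequons(toy))  # N3 and N12 qualify; N8 is followed by Pro
```

Passing `first_residue=801` would report positions of a light-chain fragment in the bovine enterokinase numbering used in Fig. 1.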

[Fig. 2A schematic: signal-anchor sequence (SA), alternative exon (AE), heavy chain domains 1-5 (LDLR, meprin, C1r/s, LDLR, MSCR), and serine protease domain]

B 1 52

EK1(199-239) C.PPDSRLCA D.ALKYIAID LFCDGELNCP DGSDEDNKTC ATA ...... EK4(659-693) D ..GECIPLV NLCDGFPHCK DGSDE C.KEDNFQCK ..AHC ......

LDLR1(6-46) D .. GKCISYK C.ERNEFQCQ WVCDGSAECQ DGSDESQETC LSVT ...... LDLR2 (47-87) C.KSGDFSCG GRVNRCIPQF WRCDGQVDCD NGSDE ..QGC PPKT LDLR3(88-126) C.SQDEFRCH D ..GKCISRQ FVCDSDRDCL DGSDE ..ASC PVLT LDLR4(127-175) C.GPASFQCN S..STCIPQL WACDNDPDCE DGSDEWPQRC RGLYVFQGDS SP

LDLR5(176-214) C.SAFEFHCL S ..GECIHSS WRCDGGPDCK DKSDE ..ENC AVAT .. LDLR6(215-254) C.RPDEFQCS D ..GNCIHGS RQCDREYDCK DMSDE ..VGC VNVTL LDLR7(255-296) CEGPNKFKCH S ..GECITLD KVCNMARDCR DWSDEPIKEC GTNE ...... Consensus C----EF-C- D--G-CI--- W-CDG--DC- DGSDE----C ----T---

C 1 90 EK2 (358-443) YEKINCNF.. ..EDGFCFWI QDLNDDNEWE RTQGSTFPPS TGPTFDHTFG NESGFYISTP TGPGGRRERV GLLTLPLDPT PEQACLSFWY

A5xen(646-727) HSDLDCKFGW GSQKTVCNWQ HDISSDLKWA VLNSKTGP ...... VQD.H TGDGNFIYSE ADERHEGRAA RLMSPVVSSS RSAHCLTFWY

MeprinA(276-360) TLLDHCDFEK ... TNVCGMI QGTRDDADWA H.GDSSQPEQ VDHTLVEQ.C KGAGYFMFFN TSLGARGEAA LLESRILYPK RKQQCLQFFY

MeprinB(261-346) SFMDSCDFEL ... ENICGMI QSSQDSADWQ RLSQVLSGPE NDHSNMGQ.C KDSGFFMHFN TSTGNGGITA MLESRVLYPK RGFQCVEFYL Consensus ---D-CDFE- ----NVCG-I QD--DDADWA RL--ST-PP- -DHT-V-Q-C K-SGFF--FN TS-G-RGEAA -L-SRVLYPK R-QQCL-FWY

91 179

EK2(444-520) YMYGENVYKL SINISSDQNM EKT .... IF QKEGNYGQNW NYGQVTLNET VEFKVSFYGF ... KNQILSD IALDDISL.. ..TYGICNV A5xen(728-812) HMDGSHVGTL SIKLKYEMEE DFDQTL... W TVSGNQGDQW KEARVVLHKT MKQYQVIVEG TVGKG.SAGG IAVDDIIIAN HISPSQCRA MeprinA(361-445) KMTGSPADRF EVWVRRDDNA GKVRQLAKIQ TFQGDSDHNW KIAHVTLNEE KKFRYVFLGT KGDPGNSSGG IYLDDITL.. ..TETPCPA MeprinB(347-430) YNSGSGNGQL NVYTREYTAG HQDGVLTLQR EIRDIPTGSW QLYYVTLQVT EKFRVVFEGV .GGPGASSGG LSIDDINL.. ..SETRCPH Consensus YM-GS-VG-L S---R-D-N- -KD--L--I- T--GN-G-NW K-A-VTLNET -KFRVVF-G- -GG-G-SSGG IA-DDI-L-- ---ET-CPA

D 1 90

EK3(540-619) CGGPHDLWEP NTTFTSINFP NSYPNQAFCI WNLNAQKGKN IQLHF.QEFD LEN . IADVVEIRD GEGDDS.LFL AVYTGPGPVN

Tolloid2(468-550) CGGDLKLTKD QSI.DSPNYP MDYMPDKECV WRITAPDNHQ VALKF.QSFE LEK ... HDG CAYDFVEIRD GNHSDS.RLI GRFCGDKLPP Tolloid3(624-712) CGGVVDATKS NGSLYSPSYP DVYPNSKQCV WEVVAPPNHA VFLNF.SHFD LEGTRFHYTK CNYDYLIIYS KMRDNRLKKI GIYCGHELPP Tolloid4(787-868) CKFEI..TTS YGVLQSPNYP EDYPRNIYCY WHFQTVLGHR IQLTF.HDFE VES.... .HQE CIYDYVAIYD GRSENS.STL GIYCGGREPY

Clrl(18-95) ..SIPIPQKL FGEVTSPLFP KPYPNNFETT TVITVPTGYR VKLVF.QQFD LEPS EG CFYDYVKISA DKKS .....L GRFCGQLGSP

Clr2(193-274) CSSELY.TEA SGYISSLEYP RSYPPDLRCN YSIRVERGLT LHLKFLEPFD IDDHQQVH.. CPYDQLQIYA NGKN .....I GEFCGKQRPP Consensus CGGE---TK- -G-L-SPNYP --YPN---CV W-I-AP-GH- V-L-F-Q-FD LE-----H-- C-YDYV-IYD G--D-S---- GI-CG---PP

91 130

EK3 (620-651) ...... DV FSTTNRMTVL FITDNMLAKQ GFKANFTTGY

Tolloid2(551-581) ...... NI KTRSNQMYIR FVSDSSVQKL GFSAALMLD.

Tolloid3(713-743) ...... VV NSEQSILRLE FYSDRTVQRS GFVAKFVID.

Tolloid4(869-899) ...... AV IASTNEMFMV LATDAGLQRK GFKATFVSE. Clrl(96-135) LGNPPGKKEF MSQGNKMLLT FHTDFSNEEN GTIMFYKGFL

Clr2 (275-306) ...... DL DTSSNAVDLL FFTDESGDSR GWKLRYTTEI Consensus ------DV -S--N-M-L- F-TD-S-Q-- GFKA-F----

E 1 90 EK5(694-782) VRLFNGTTDS SGLVQFRIQS IWHVACAENW TTQISDDVCQ LLGLGTGNSS VPTFSTGGGP YVNLNTAPNG SLILTPSQQC LE.DSLILLQ MSCR(349-428) VRLVGGSGPH EGRVEILHSG QWGTICDDRW EVRVGQVVCR SLGYPGV... ..QAVHKAAH F.GQGTGP.. ..IWLNEVFC FGRE.SSIEE Speractl(43-121) IRLIHGRTEN EGSVEIYHAT RWGGVCDWWW HMENANVTCK QLGFPGARQ. ....FYRRAY F.GAHVTT.. ..FWVYKMNC LGNE.TRLED Speract2(153-232) LRMILGDVPN EGTLETFWDG AWGSVCHTDF GTPDGNVACR QMGYSRGVK. ...SIKTDGH F.GFSTGP.. ..IILDAVDC EGTE.AHITE Speract3(264-344) IRLMDGSGPH EGRVEIWHDD AWGTICDDGW DWADANVVCR QAGYRGAVK. ..ASGFKGED F.GFTWAP.. ..IHTSFVMC TGVE.DRLID Speract4(382-464) VRIV.GMGQG QGRVEVSLGN GWGRVCDPDW SDHEAKTVCY HAGYKWGASR AAGSAEVSAP F.DLE.AP.. ..FIIDGITC SGVENETLSQ Consensus VRL--G-GP- EGRVEI-H-- -WG-VCDD-W ---DANVVCR QLGY-GG------S----A- F-G--TAP-- --I----V-C LG-E---L--

91 116

EK5(783-787) CNYKS ......
MSCR(429-451) CKIR.QWGTR AC..SHSEDA GVTCTL
Speract1(122-145) CYHRPYGRPW LC..NAQWAA GVECLP
Speract2(233-258) CNMPVTPYQH ACPYTHNWDV GVVCKP
Speract3(345-367) CILR.DGWTH SC..YHVEDA SVVCAT
Speract4(465-486) CQMKV.SADM TC... ATGDV GVVCEG
Consensus C-MR------C---H--DA GVVC--

FIG. 2. Structural motifs in enterokinase. Numbers in parentheses refer to the amino acid residues represented in each aligned sequence. Bovine enterokinase (EK) residues are numbered as in Fig. 1. (A) Schematic structure of enterokinase, indicating the proposed signal-anchor sequence (SA), alternative exon (AE), numbered heavy chain domains (LDLR, low-density-lipoprotein receptor; MSCR, macrophage scavenger receptor), and serine protease domain with active-site residues histidine (H), aspartate (D), and serine (S). The cleavage site between the heavy and light chains (arrowhead) and disulfide bond connecting them are shown. (B) Alignment of EK domains 1 and 4 with cysteine-rich motifs of the LDL receptor (LDLR) (35). (C) Alignment of EK domain 2 with segments of Xenopus laevis A5 antigen (A5xen) (36), mouse meprin A (37), and rat meprin B (38). (D) Alignment of EK domain 3 with selected C1r/s-like domains of Drosophila melanogaster tolloid (39), and complement component C1r (40). (E) Alignment of EK domain 5 with repeated domains of the mouse macrophage scavenger receptor type I (MSCR) (41) and the speract crosslinking protein from sea urchin sperm (42). The significance of alignments was estimated as described under Materials and Methods: EK1 or EK4 versus LDLR motifs, P < 10⁻¹⁴; EK2 versus meprin motifs, P < 10⁻²³; EK3 versus C1r/s motifs, P < 10⁻²³; EK5 versus MSCR motifs, P = 3.7 × 10⁻⁵.

protein families, indicating that enterokinase is a mosaic protein with a complex evolutionary history. The particular combination of motifs is specific and surprising (Fig. 2A).

Enterokinase domains 1 and 4 are homologous to an ≈40-amino acid cysteine-rich repeat found in the amino-terminal domain of the low-density lipoprotein receptor and also in several complement components (Fig. 2B) (35).

Enterokinase domain 2 (Fig. 2C) is homologous to ≈170-amino acid segments of meprins A and B, which are membrane-bound metalloproteases of renal glomeruli (37, 38). This domain also is homologous to a segment of the A5 protein of X. laevis (36), which may mediate neuronal recognition. For this structural motif, identified in four distinct vertebrate proteins, we propose the name "meprin domain."

Enterokinase domain 3 (Fig. 2D) is homologous to a family of 120-amino acid repeats reported in complement serine protease C1r (40) and subsequently found in many proteins including the product of the Drosophila dorsal-ventral patterning gene tolloid (39). Interestingly, tolloid also encodes a separate metalloprotease domain that is homologous to the metalloprotease domains of meprins A and B.

Enterokinase domain 5 (Fig. 2E) is homologous to 110-amino acid cysteine-rich motifs that are found in the macrophage scavenger receptor (41), the sea urchin spermatozoa speract receptor (42), and several lymphocyte cell-surface antigens (41). This domain in enterokinase is truncated at the carboxyl end.

The structural domains of the enterokinase heavy chain are found in proteins of the complement cascade, in endocytic receptors for diverse ligands including lipoproteins, in proteins that regulate development, in receptors that contribute to the specificity of egg fertilization, and in proteins of unknown function. The particular combination of structural motifs observed in the enterokinase heavy chain is unprecedented. The presence of potential ligand-binding domains suggests that interaction with other macromolecules, either in the or in the lumen of the gut, might modulate enterokinase activation, substrate specificity, or inhibition.

For nearly a century enterokinase has been known as the principal activator of digestive hydrolases, and the same basic regulatory mechanism appears to be conserved among all vertebrates. The physiologic importance of this mechanism is emphasized by the severe malabsorption that accompanies human enterokinase deficiency (3, 4). The apparent requirement for proteolytic activation of proenterokinase suggests that yet another protease is required for the normal regulation of pancreatic zymogens. The isolation of cDNA clones for human and bovine enterokinase provides the means to address the regulation and structure-function relationships of this ancient, essential protease.

We thank Dr. Anja Schweizer and Dr. Jack Rorer (Washington University) for translations of articles from the original German, Dr. Heidi Hope (Washington University) for assistance with protein blotting, Lisa Westfield (Howard Hughes Medical Institute) for synthesis of oligonucleotides, and Cecil Buchanan (Washington University) for assistance in obtaining fresh bovine tissues. We also thank Prof. Tatsuo Sato (Kumamoto University, Kumamoto, Japan) for his encouragement and support. This research was supported, in part, by National Institutes of Health Grant HL14147 (Specialized Center of Research in Thrombosis).

1. Neurath, H. (1984) Science 224, 350-357.
2. Pavlov, I. P. (1902) The Work of the Digestive Glands (Charles Griffin, London), 1st Ed.
3. Hadorn, B., Tarlow, M. J., Lloyd, J. K. & Wolff, O. H. (1969) Lancet 1, 812-813.
4. Haworth, J. C., Gourley, B., Hadorn, B. & Sumida, C. (1971) J. Pediatr. 78, 481-490.
5. Davie, E. W. & Neurath, H. (1955) J. Biol. Chem. 212, 515-529.
6. Yamashina, I. (1956) Acta Chem. Scand. 10, 739-743.
7. Bricteux-Gregoire, S., Schyns, R. & Florkin, M. (1972) Comp. Biochem. Physiol. B 42, 23-39.
8. Abita, J. P., Delaage, M., Lazdunski, M. & Savrda, J. (1969) Eur. J. Biochem. 8, 314-324.
9. Maroux, S., Baratti, J. & Desnuelle, P. (1971) J. Biol. Chem. 246, 5031-5039.
10. Anderson, L. E., Walsh, K. A. & Neurath, H. (1977) Biochemistry 16, 3354-3360.
11. Baratti, J., Maroux, S., Louvard, D. & Desnuelle, P. (1973) Biochim. Biophys. Acta 315, 147-161.
12. Liepnieks, J. J. & Light, A. (1979) J. Biol. Chem. 254, 1677-1683.
13. Fonseca, P. & Light, P. (1983) J. Biol. Chem. 258, 14516-14520.
14. Magee, A. I., Grant, D. A. W. & Hermon-Taylor, J. (1981) Clin. Chim. Acta 115, 241-254.
15. Naude, R. J., Da Silva, D., Edge, W. & Oelofsen, W. (1993) Comp. Biochem. Physiol. B 105, 591-595.
16. Fonseca, P. & Light, A. (1983) J. Biol. Chem. 258, 3069-3074.
17. Baratti, J. & Maroux, S. (1976) Biochim. Biophys. Acta 452, 488-496.
18. Light, A. & Janska, H. (1991) J. Protein Chem. 10, 475-480.
19. LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A., Ferenz, C., Grant, K. L., Light, A. & McCoy, J. M. (1993) J. Biol. Chem. 268, 23311-23317.
20. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
21. Hewick, R. M., Hunkapillar, M. W., Hood, L. E. & Dreyer, W. J. (1981) J. Biol. Chem. 256, 7990-7997.
22. Chomczynski, P. & Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
23. Feinberg, A. P. & Vogelstein, B. (1983) Anal. Biochem. 132, 6-13.
24. Frohman, M. A., Dush, M. K. & Martin, G. R. (1988) Proc. Natl. Acad. Sci. USA 85, 8998-9002.
25. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
26. Gish, W. & States, D. J. (1993) Nature Genet. 3, 266-272.
27. Hermon-Taylor, J., Perrin, J., Grant, D. A. W., Appleyard, A. & Magee, A. I. (1977) Gut 18, 259-265.
28. Hartley, B. S. & Kauffman, D. L. (1966) Biochem. J. 101, 229-231.
29. Brown, J. R. & Hartley, B. S. (1966) Biochem. J. 101, 214-228.
30. Fujikawa, K., Chung, D. W., Hendrickson, L. E. & Davie, E. W. (1986) Biochemistry 25, 2417-2424.
31. Chung, D. W., Fujikawa, K., McMullen, B. A. & Davie, E. W. (1986) Biochemistry 25, 2410-2417.
32. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K. & Davie, E. W. (1988) Biochemistry 27, 1067-1074.
33. Kozak, M. (1991) J. Cell Biol. 115, 887-903.
34. Rudnick, D. A., McWherter, C. A., Gokel, G. W. & Gordon, J. I. (1993) Adv. Enzymol. 67, 375-430.
35. Sudhof, T. C., Goldstein, J. L., Brown, M. S. & Russell, D. W. (1985) Science 228, 815-822.
36. Takagi, S., Hirata, T., Agata, K., Mochii, M., Eguchi, G. & Fujisawa, H. (1991) Neuron 7, 295-307.
37. Jiang, W., Gorbea, C. M., Flannery, A. V., Beynon, R. J., Grant, G. A. & Bond, J. S. (1992) J. Biol. Chem. 267, 9185-9193.
38. Johnson, G. D. & Hersh, L. B. (1992) J. Biol. Chem. 267, 13505-13512.
39. Shimell, M. J., Ferguson, E. L., Childs, S. R. & O'Connor, M. B. (1991) Cell 67, 469-481.
40. Leytus, S. P., Kurachi, K., Sakariassen, K. S. & Davie, E. W. (1986) Biochemistry 25, 4855-4863.
41. Freeman, M., Ashenas, J., Rees, D. J. G., Kingsley, D. M., Copeland, N. G., Jenkins, N. A. & Krieger, M. (1990) Proc. Natl. Acad. Sci. USA 87, 8810-8814.
42. Dangott, L. J., Jordan, J. E., Beliet, R. A. & Garbers, D. L. (1989) Proc. Natl. Acad. Sci. USA 86, 2128-2132.
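The alignment significance test described under Materials and Methods (optimal score x compared with the mean µ and SD σ of 30 randomized alignments, then converted to a probability with the normal distribution) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GCG implementation: `identity_score` is a hypothetical stand-in for the GAP score, the choice to shuffle only the second sequence is an assumption (the paper does not say which sequence was randomized), and the one-sided upper tail is assumed for the probability.

```python
import math
import random
import statistics

def identity_score(a, b):
    """Hypothetical stand-in for a GAP score: count of identical positions."""
    return sum(x == y for x, y in zip(a, b))

def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def alignment_significance(a, b, score_fn=identity_score, n_random=30, seed=0):
    """Return (z, P) for the score of a vs b against shuffled versions of b."""
    rng = random.Random(seed)
    x = score_fn(a, b)                      # observed score
    scores = []
    for _ in range(n_random):               # 30 randomizations, as in the paper
        shuffled = list(b)
        rng.shuffle(shuffled)               # assumption: shuffle sequence b only
        scores.append(score_fn(a, "".join(shuffled)))
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)        # sample SD of randomized scores
    if sigma == 0:
        raise ValueError("randomized scores have zero spread")
    z = (x - mu) / sigma
    return z, upper_tail_p(z)

# Toy usage: a 20-residue peptide scored against itself.
z, p = alignment_significance("CGGPHDLWEPNTTFTSINFP", "CGGPHDLWEPNTTFTSINFP")
print(f"z = {z:.1f}, P = {p:.2e}")
```

With only 30 randomizations the estimates of µ and σ are coarse, so very small probabilities obtained this way are best read as order-of-magnitude indications.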