
Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 4218-4222, June 1988
Biochemistry

Primary structure of the human follistatin precursor and its genomic organization
(ovary/testis/pituitary/prehormone/alternative splicing)

SHUNICHI SHIMASAKI*, MAKOTO KOGA*, FREDERICK ESCH†, KAREN COOKSEY*, MALUZ MERCADO*, ANN KOBA*, NAOTO UENO*, SHAO-YAO YING*, NICHOLAS LING*, AND ROGER GUILLEMIN*

*Laboratories for Neuroendocrinology, The Salk Institute, 10010 North Torrey Pines Road, La Jolla, CA 92037; and †Athena Neurosciences, Inc., 887-D Industrial Road, San Carlos, CA 94070

Contributed by Roger Guillemin, February 23, 1988

ABSTRACT Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution.

The granulosa cells of the ovary secrete into the follicular fluid many proteins that can modify the secretion of pituitary follicle-stimulating hormone (FSH). Two such proteins, named inhibins A and B, which specifically inhibit the secretion of pituitary FSH, have been isolated and characterized from porcine (1-3) and bovine (4, 5) follicular fluid. The inhibins are heterodimeric proteins of Mr 32,000, composed of a common glycosylated α subunit of Mr 18,000 and one of two similar β subunits of Mr 14,000 (1, 6). In addition, homo- and heterodimers of the β subunits of inhibins, named activins (7, 8) or FRP (9), have also been identified in porcine follicular fluid, which can specifically stimulate the secretion of pituitary FSH. Throughout the purification of the inhibins and activins, we consistently observed one side fraction that also has FSH-release inhibitory activity and appeared unrelated to the inhibins. The fractions containing this activity have been purified to homogeneity to yield two glycosylated single-chain proteins of Mr 35,000 and 32,000, respectively (10). The two proteins have similar amino acid compositions and identical amino-terminal amino acid sequences and were named follistatins. Subsequently, three proteins with FSH release-inhibitory activity and having the same amino-terminal amino acid sequence as porcine follistatins were isolated from bovine follicular fluid by Robertson et al. (11). By use of amino acid sequence information derived from trypsin digestion of the Mr 35,000 protein, two populations of cDNA clones encoding, respectively, a follistatin precursor of 344 amino acids or its carboxyl-terminal truncated form with 317 residues were identified from a porcine ovarian cDNA library (12). The deduced precursor sequences bear no homology with the α, βA, and βB chains of the previously characterized porcine inhibins A and B. By use of the porcine cDNA as a probe, we report herein the cloning and sequencing of the human follistatin precursor as well as the determination of its genomic organization.‡

MATERIALS AND METHODS

Cloning and DNA Sequencing. An oligo(dT)-primed human testicular cDNA library constructed with the vector λgt11 was purchased from Clontech (Palo Alto, CA). The mRNA used to produce the library was derived from the testis of a healthy 50-year-old man. The library was screened with a porcine cDNA probe encoding the follistatin precursor. Hybridization was done in 1 M NaCl/1% NaDodSO4/10% dextran sulfate/50% (vol/vol) formamide containing 0.2 mg of denatured salmon sperm DNA per ml at 37°C for 15 hr. Filters were washed at 50°C for 10 min with 15 mM NaCl/1.5 mM sodium citrate/0.1% NaDodSO4. Twelve positive clones were detected that were divided into four groups according to the length of the inserted cDNA fragment: the 1.00-kilobase (kb) group of clones HTF 101, 103, and 108; the 1.50-kb group containing only clone HTF 109; the 1.40-kb group of clones HTF 102, 104, 105, 107, and 110-112; and the 1.85-kb group containing only clone HTF 106. Eight clones (HTF 102 and HTF 106-112) were subcloned into the M13mp19 vector and sequenced in both directions by the dideoxy chain-termination method (13) with synthetic primers based on the porcine follistatin cDNA sequence (12).

For the follistatin gene cloning, a human lymphocyte genomic library, constructed in the vector EMBL 3 with 15- to 20-kb DNA fragments produced by partial digestion with Mbo I, was purchased from Stratagene (San Diego, CA). It was screened with the same porcine cDNA probe as above. Three positive clones (HFG 101-103) were identified. The insert fragments of the follistatin gene clones were excised to the appropriate length, subcloned into pUC18 or M13mp19 vectors, and sequenced in both directions by the same method as above. Compressed sequence ladders were clarified by the addition of Escherichia coli single-strand binding protein to the sequencing reaction (Pharmacia).

Ovarian RNA Preparation and RNA Blot Hybridization. Human ovaries were obtained from a patient with radical endometriosis. The tissues were quick frozen in liquid nitrogen immediately after excision and stored at -80°C until ready for RNA preparation. Total RNA was prepared by the conventional guanidine isothiocyanate method (14) and

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: FSH, follicle-stimulating hormone; EGF, epidermal growth factor; preFS317 and preFS344, follistatin precursors of 317 and 344 amino acids, respectively.

‡The sequence reported in this paper is being deposited in the EMBL/GenBank data base (IntelliGenetics, Mountain View, CA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03771).
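The two precursor lengths defined above, together with the 29-residue putative signal sequence reported in the Results, imply a few simple residue counts. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative assumptions and are not part of the paper.

```python
# Residue counts quoted in the paper.
PRE_FS344 = 344        # longer precursor (preFS344)
PRE_FS317 = 317        # shorter precursor (preFS317)
SIGNAL_SEQUENCE = 29   # putative signal sequence, as reported in the Results


def mature_length(precursor_len: int, signal_len: int = SIGNAL_SEQUENCE) -> int:
    """Return the chain length left after removing the signal sequence.

    Hypothetical helper: it assumes the signal sequence is the only
    segment removed from the precursor.
    """
    if signal_len >= precursor_len:
        raise ValueError("signal sequence must be shorter than the precursor")
    return precursor_len - signal_len


# Extra carboxyl-terminal residues carried only by the longer precursor.
extra_c_terminal = PRE_FS344 - PRE_FS317

print("extra C-terminal residues:", extra_c_terminal)
print("mature chain from preFS344:", mature_length(PRE_FS344))
print("mature chain from preFS317:", mature_length(PRE_FS317))
```

Under these assumptions the shorter mature chain has 288 residues, which agrees with the paper's statement that Asn-288 is the last amino acid encoded by exon IV.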

Downloaded by guest on September 30, 2021 4218 Biochemistry: Shimasaki et al. Proc. Natl. Acad. Sci. USA 85 (1988) 4219

poly(A)+ RNA was selected by oligo(dT)-cellulose column chromatography.

Five micrograms of poly(A)+ RNA was electrophoresed on a formaldehyde-agarose gel and then transferred onto a nylon membrane filter. Hybridization with the 32P-labeled HTF 109 insert fragment was done under the same conditions as those that were used for the plaque screening. The filters were washed with vigorous agitation in 300 mM NaCl/30 mM sodium citrate/0.1% NaDodSO4 for 15 min at room temperature and then incubated in 15 mM NaCl/1.5 mM sodium citrate/0.1% NaDodSO4 for 1 hr at 65°C. Autoradiography was performed on Kodak XAR-5 film with an intensifying screen for 6 hr at -80°C.

RESULTS AND DISCUSSION

From 8.2 × 10^5 phage plaques of an adult human testicular λgt11 cDNA library, 12 positive clones were obtained by hybridization with a porcine follistatin cDNA probe, and 8 positive clones were selected for sequencing. From the deduced nucleotide sequences, two populations of cDNAs that differ in their 3'-terminal coding regions were detected. One population consists of three clones (HTF 106, HTF 108, and HTF 109, Fig. 1a) that encode a follistatin precursor of 344 amino acids (preFS344), whereas the other consists of five clones of the same length (HTF 102, HTF 107, and HTF 110-112, Fig. 1b) whose open reading frame encodes an identical precursor that lacks 3 amino acids at the amino terminal and 27 residues at the carboxyl terminal of preFS344. Shortage of the three amino-terminal amino acids in the latter precursor may be caused by termination of the reverse transcriptase at that position during cDNA synthesis.

To further delineate the origin of the two precursors, we have cloned and sequenced the follistatin gene from a human genomic library. From 1 million phage plaques, three positive clones, HFG 101, HFG 102, and HFG 103 (Fig. 1d), were obtained by hybridization with the porcine cDNA probe. Restriction mapping and nucleotide-sequence analyses revealed that clone HFG 102, which is 19.1 kb in length, encodes nearly the whole human follistatin gene containing five exons. These five exons encode, respectively, the signal sequence and the first four domains (I-IV) of follistatin. Clone HFG 103, which is 14.6 kb long, contains only the first exon encoding the signal sequence, whereas the 15.0-kb clone HFG 101 contains only the last exon (V) encoding the extra 27 carboxyl-terminal residues of preFS344. The 5'-terminal sequence of HFG 101 could not be merged with the 3'-end of HFG 102 to yield the complete gene sequence. Nevertheless, the overall genomic organization of human follistatin can be postulated to be composed of a DNA sequence that is >5 kb in length with six exons interrupted by five introns (Fig. 1c). Aside from the first exon, which encodes the signal sequence, the following four exons encoding domains I-IV are of approximately equal size. The first four introns, where sequences have been determined completely, contain 2009, 430, 346, and 702 base pairs, respectively, all of which possess the consensus 5'-GT...AG-3' donor and acceptor splicing sites (Fig. 2). The exact length of the last intron could not be determined precisely.

Fig. 2 shows the composite cDNA and the corresponding amino acid sequence of the follistatin precursor as well as the five intron sequences in the follistatin gene. The upper portion of the figure (nucleotides 1-979) presents the complete cDNA sequence encoding the follistatin precursor of 317 amino acids (preFS317). Termination of the coding

[Figure 1, panels a-d; scale bars: 1 kb (c), 5 kb (d).]

FIG. 1. Schematic representation of the human follistatin cDNAs and follistatin genomic organization. (a and b) Structures of the human follistatin (FS) cDNAs encoding a precursor of 344 (a) and 317 amino acids (b). Untranslated sequences are represented by a line; coding sequences are boxed. The filled region represents the signal sequence, and the shaded regions correspond to the domains found in mature proteins. cDNA clones used in the sequence determination are shown above the respective prefollistatin structures. (c) Structure of the human follistatin gene. Filled box, exon encoding signal sequence; shaded boxes, other exons; open boxes, introns; slashed box, stretch of shaded nucleotides in Fig. 2. The gap in intron five represents the sequence that was missing in the follistatin genomic clones. (d) Structure of the human follistatin genomic clones; exons are represented by black bars.
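The Results state that the four completely sequenced introns (2009, 430, 346, and 702 base pairs) all carry the consensus 5'-GT...AG-3' donor and acceptor sites. A minimal sketch of how such a boundary check might be expressed is given below; the function name and the two toy sequences are hypothetical and are not taken from the paper's sequence data.

```python
def has_gt_ag_boundaries(intron: str) -> bool:
    """Return True if an intron starts with GT and ends with AG.

    Illustrative helper only: it tests the two boundary dinucleotides
    and nothing else (no branch point, no polypyrimidine tract).
    """
    seq = intron.strip().upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Intron lengths (base pairs) reported for the four fully sequenced introns.
REPORTED_INTRON_LENGTHS_BP = {1: 2009, 2: 430, 3: 346, 4: 702}

# Toy sequences, invented for illustration.
toy_introns = {
    "canonical": "GTAAGTCCTTTTCAG",
    "non_canonical": "ATAAGTCCTTTTCAC",
}

for name, seq in toy_introns.items():
    print(name, has_gt_ag_boundaries(seq))

# Combined length of the four fully sequenced introns.
total_sequenced_bp = sum(REPORTED_INTRON_LENGTHS_BP.values())
print("introns 1-4 total (bp):", total_sequenced_bp)
```

The fifth intron is left out of the total because its exact length was not determined.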

[Figure 2 sequence listing: composite cDNA sequence (nucleotides 1-2020) with the deduced amino acid sequence, followed by the sequences of introns 1 (2009 bp), 2 (430 bp), 3 (346 bp), 4 (702 bp), and 5 (incomplete). The sequence is deposited under accession no. J03771.]

FIG. 2. Nucleotide and deduced protein sequences of human follistatin cDNAs encoding the 317 and 344 amino acid precursors and the human follistatin gene. The nucleotides are numbered at the left and the right, and amino acids, in one-letter code, are numbered throughout. The cDNA encoding preFS317 contains the stretch of shaded nucleotides, whereas in the cDNA encoding preFS344 that stretch of shaded nucleotides is spliced out. Two potential N-linked glycosylation sites in the precursor protein are marked by stars. Six amino acid substitutions were found between the porcine (in parentheses) and the human sequences.
Arrowheads indicate the positions where the five introns are inserted in the follistatin gene. All eight cDNAs (HTF 102 and HTF 106-112) that were sequenced started at the underlined oligo(dA) region at nucleotides 1366-1386, which might have functioned like the poly(A) tail during cDNA synthesis, even though there are two potential polyadenylylation sites, AATATA and AAATAA, immediately after the stretch of oligo(dA) in the gene. All cDNA sequences were identical to the corresponding sequences of the gene with only one exception, marked with an asterisk at nucleotide 1060; in the cDNAs the identified nucleotide was adenosine, whereas in the gene this nucleotide was thymidine. Intron sequences are shown below.

sequence for preFS317 is caused by the splicing out of intron five, while leaving the stretch of shaded nucleotides, to generate a stop codon immediately following the last amino acid, Asn-288, of exon IV. On the other hand, in cDNA clones HTF 106, HTF 108, and HTF 109, the stretch of shaded nucleotides that also has the consensus acceptor site at the 3' end is spliced out together with intron five to generate a precursor of 344 amino acids. This finding thus suggests an alternative splicing event in the RNA processing to generate two follistatin precursors.

The first 29 amino acid residues of the precursor correspond to the putative signal sequence followed by four contiguous domains specified by the total of 36 cysteines. The putative initiation Met at nucleotides 28-30 is located within a sequence favored for eukaryotic initiation sites, A/G-X-X-A-U-G-G (15), and assignment of the signal sequence and the domains are based on homology with the porcine sequence (12). Each domain is clearly delineated by the insertion of introns marked by the arrowheads in Fig. 2.

RNA analysis of a human ovarian mRNA preparation with a follistatin cDNA probe demonstrated a single mRNA band of about 2.4 kb in length (Fig. 3). However, this band may contain two follistatin mRNAs of different sizes, produced by alternative splicing, if they differ only by 263 nucleotides (represented by the shaded stretch in Fig. 2). Such a small difference could not be discriminated by our electrophoretic analysis.

FIG. 3. RNA analysis of human ovarian follistatin mRNA. Positions of the 28S and 18S ribosomal RNAs are marked. Transcript size was determined by comparison with an RNA ladder marker (Bethesda Research Laboratories).

Although human follistatin has not yet been isolated from follicular fluid, its structure can be predicted from the cloned cDNA sequence. Comparison with the porcine structure shows that there are only six conservative amino acid substitutions between the two species with two of the substitutions occurring within the signal sequence (Fig. 2). The precursor protein predicts two potential N-linked glycosylation sites at residues 95 and 259. In porcine follistatin, glycosylation at Asn-259 has been tentatively identified (12), whereas no carbohydrate chain was detected at Asn-95. Whether this mode of glycosylation will apply to human follistatin must await the isolation of the human protein.

Mature preFS317 and preFS344 contain four contiguous domains with the last three domains being highly similar to each other as shown in Fig. 4. Moreover, the four domains also have sequence similarity with human pancreatic secretory trypsin inhibitor (16, 17) and human epidermal growth factor (EGF) (18, 19), suggesting that the homologous regions may have originated from the same ancestral gene. Each of the coding regions for the domains of follistatin is separated precisely by introns. This type of genomic organization is also present in the EGF precursor gene structure, in which the coding regions of eight repeated EGF-like domains are also separated precisely by introns (19). This finding strongly supports the notion of exon shuffling (20): introns can facilitate gene evolution by recruitment of exons from other genes in the genome. It is also interesting to note that the hexapeptide sequence, Ile-Ser-Ser-Ile-Leu-Glu, at the carboxyl terminal of preFS344 is also present in the tyrosine kinase domain of the EGF receptor (21, 22).

The significance of the 27 extra amino acids at the carboxyl terminal of preFS344 is at present unclear. Because two molecular forms of follistatin with Mr 35,000 and 32,000 have been isolated from porcine follicular fluid (10), it is tempting to assign the mature preFS344 of the human to the analogous Mr 35,000 species and the 317-residue precursor to the analogous Mr 32,000 species. One possible explanation for the function of the extra carboxyl-terminal 27 amino acids is to facilitate the transport of the Mr 35,000 species to the peripheral circulation because addition of the highly acidic carboxyl-terminal tail (Glu-Asp-Thr-Glu-Glu-Glu-Glu-Glu-Asp-Glu-Asp-Gln-Asp) will make the protein more water-soluble. The shorter Mr 32,000 species may be confined to the follicular fluid to play a paracrine and/or autocrine role for folliculogenesis. This hypothesis could be ascertained by preparation of a carboxyl-terminal-directed antibody of the Mr 35,000 species to detect the protein in the plasma.

The physiological significance of human follistatin needs further investigation. Preliminary studies with the two porcine proteins showed that they can specifically inhibit the secretion of pituitary FSH in rat anterior pituitary primary culture with a potency of approximately one-third that of inhibin A on a molar basis (23). In addition, the inhibitory effect is additive to that of inhibin to produce the same maximal inhibition of FSH release exerted by either inhibin or follistatin alone. Further physiological studies must await the purification of a large quantity of this material from follicular fluid or its production by gene expression. With the unexpected finding that the activins of ovarian origin are the same molecules as those produced by the human leukemia cell line THP-1, which induce differentiation of the erythroblasts (24), it is quite possible that the follistatin proteins may be similarly involved as regulating factors in physiological systems unrelated to ovarian-pituitary functions.

We thank Drs. Tony Hunter and Dave Schubert of the Salk Institute for suggestions to improve the manuscript and Ms. D. Higgins for secretarial assistance. This research was supported by

National Institute of Child Health and Human Development Contract NO1HD-6-2944, Program Project Grants HD-09690 and DK-18811 from the National Institutes of Health, and a grant from the Robert J. and Helen C. Kleberg Foundation.

[Figure 4 alignment rows: follistatin domain I (residues 1-63), domain II (65-136), domain III (138-211), domain IV (215-288), hPSTI (1-56), and EGF (1-46).]

FIG. 4. Alignment of amino acid sequences of the four domains of human follistatin, human pancreatic secretory trypsin inhibitor (hPSTI), and human EGF. Amino acids are shown in one-letter code, and identical residues are shaded. Numbers refer to the amino acid positions occurring in the mature protein.

1. Ling, N., Ying, S.-Y., Ueno, N., Esch, F., Denoroy, L. & Guillemin, R. (1985) Proc. Natl. Acad. Sci. USA 82, 7217-7221.
2. Miyamoto, K., Hasegawa, Y., Fukuda, M., Nomura, M., Igarashi, M., Kangawa, K. & Matsuo, H. (1985) Biochem. Biophys. Res. Commun. 129, 396-403.
3. Rivier, J., Spiess, J., McClintock, R., Vaughan, J. & Vale, W. (1985) Biochem. Biophys. Res. Commun. 133, 120-127.
4. Fukuda, M., Miyamoto, K., Hasegawa, Y., Nomura, M., Igarashi, M., Kangawa, K. & Matsuo, H. (1986) Mol. Cell. Endocrinol. 44, 55-60.
5. Robertson, D. M., deVos, F. L., Foulds, L. M., McLachlan, R. E., Burger, H. G., Morgan, F. J., Hearn, M. T. W. & deKretser, D. M. (1986) Mol. Cell. Endocrinol. 44, 271-277.
6. Mason, A. J., Hayflick, J. S., Ling, N., Esch, F., Ueno, N., Ying, S.-Y., Guillemin, R., Niall, H. & Seeburg, P. (1985) Nature (London) 318, 659-663.
7. Ling, N., Ying, S.-Y., Ueno, N., Shimasaki, S., Esch, F., Hotta, M. & Guillemin, R. (1986) Nature (London) 321, 779-782.
8. Ling, N., Ying, S.-Y., Ueno, N., Shimasaki, S., Esch, F., Hotta, M. & Guillemin, R. (1986) Biochem. Biophys. Res. Commun. 138, 1129-1137.
9. Vale, W., Rivier, J., Vaughan, J., McClintock, R., Corrigan, A., Woo, W., Karr, D. & Spiess, J. (1986) Nature (London) 321, 776-779.
10. Ueno, N., Ling, N., Ying, S.-Y., Esch, F., Shimasaki, S. & Guillemin, R. (1987) Proc. Natl. Acad. Sci. USA 84, 8282-8286.
11. Robertson, D. M., Klein, R., deVos, F. L., McLachlan, R. I., Wettenhall, R. E. H., Hearn, M. T. W., Burger, H. G. & deKretser, D. M. (1987) Biochem. Biophys. Res. Commun. 149, 744-749.
12. Esch, F. S., Shimasaki, S., Mercado, M., Cooksey, K., Ling, N., Ying, S.-Y., Ueno, N. & Guillemin, R. (1987) Mol. Endocrinol. 1, 849-855.
13. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
14. Chirgwin, J., Przybyla, A., MacDonald, J. & Rutter, W. (1979) Biochemistry 18, 5294-5299.
15. Kozak, M. (1981) Nucleic Acids Res. 9, 5233-5252.
16. Bartelt, D. C., Shapanka, R. & Greene, L. J. (1977) Arch. Biochem. Biophys. 179, 189-199.
17. Yamamoto, T., Nakamura, Y., Nishide, T., Emi, M., Ogawa, M., Mori, T. & Matsubara, K. (1985) Biochem. Biophys. Res. Commun. 132, 605-612.
18. Gregory, H. (1975) Nature (London) 257, 325-327.
19. Bell, G. I., Fong, N. M., Stempien, M. M., Wormsted, M. A., Caput, D., Ku, L., Urdea, M. S., Rall, L. B. & Sanchez-Pescador, R. (1986) Nucleic Acids Res. 14, 8427-8446.
20. Gilbert, W. (1978) Nature (London) 271, 501.
21. Ullrich, A., Coussens, L., Hayflick, J. S., Dull, T. J., Gray, A., Tam, A. W., Lee, J., Yarden, Y., Libermann, T. A., Schlessinger, J., Downward, J., Mayes, E. L. V., Whittle, N., Waterfield, M. D. & Seeburg, P. H. (1984) Nature (London) 309, 418-425.
22. Lin, C. R., Chen, W. S., Kruiger, W., Stolarsky, L. S., Weber, W., Evans, R. M., Verma, I. M., Gill, G. N. & Rosenfeld, M. G. (1984) Science 224, 843-848.
23. Ying, S.-Y., Becker, A., Swanson, G., Tan, P., Ling, N., Esch, F., Ueno, N., Shimasaki, S. & Guillemin, R. (1987) Biochem. Biophys. Res. Commun. 149, 133-139.
24. Eto, Y., Tsuji, T., Takezawa, M., Takano, S., Yokogawa, Y. & Shibai, H. (1987) Biochem. Biophys. Res. Commun. 142, 1095-1103.