Modular evolution of egg case silk genes across orb-weaving superfamilies

Jessica E. Garb* and Cheryl Y. Hayashi

Department of Biology, University of California, Riverside, CA 92521

Edited by M. T. Clegg, University of California, Irvine, CA, and approved June 13, 2005 (received for review March 25, 2005)

Spider silk proteins (fibroins) are renowned for their extraordinary mechanical properties and biomimetic potential. Despite extensive evolutionary, ecological, and industrial interest in these fibroins, only a fraction of the known silk types have been characterized at the molecular level. Here we report cDNA and genomic sequences of the fibroin TuSp1, which appears to be the major component of tubuliform gland silk, a fiber exclusively synthesized by female spiders for egg case construction. We obtained TuSp1 sequences from 12 spider species that represent the extremes of phylogenetic diversity within the Orbicularia (orb-weaver superfamilies Araneoidea and Deinopoidea), together with finer-scale sampling within genera. TuSp1 encodes tandem arrays of an ≈200-aa-long repeat unit, and individual repeats are readily aligned, even among species that diverged >125 million years ago. Analyses of these repeats across species reveal the strong influence of concerted evolution, resulting in intragenic homogenization. However, deinopoid TuSp1 repeats also contain insertions of coding, minisatellite-like sequences, an apparent result of replication slippage and nonreciprocal recombination. Phylogenetic analyses of 37 spider fibroin sequences support the monophyly of TuSp1 within the spider fibroin gene family, consistent with a single origin of this ortholog group. The diversity of taxa and silks examined here confirms that repetitive architecture is a general feature of this gene family. Moreover, we show that TuSp1 provides a clear example of modular evolution across a range of phylogenetic levels.

concerted evolution | fibroin | gene family | spider silk | TuSp1

EVOLUTION

This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: A.ar., Argiope argentata; D.s., Deinopis spinosa; L.he., Latrodectus hesperus; U.d., Uloborus diversus; L.g., Latrodectus geometricus; N.c., Nephila clavipes; P.t., Plectreurys tristis; TuSp1, tubuliform spidroin 1; MaSp, major ampullate spidroin; MiSp, minor ampullate spidroin; ML, maximum likelihood.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AY953070–AY953093).
*To whom correspondence should be addressed. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA

A defining feature of the diverse order Araneae (>37,000 described species) (1) is the ability to spin multiple task-specific silks. Spiders use different combinations of silk proteins (fibroins) to construct structures for reproduction, prey capture, and locomotion (2, 3). Silk fibers are assembled from one or more fibroins that are expressed in abdominal glands connected to the spinnerets (4). Silk gland morphology varies extensively across species. For example, members of the infraorder Mygalomorphae possess one or two types of globular silk glands (5, 6). In contrast, an individual araneoid spider (ecribellate orb-web weaver) has seven morphologically distinct types of silk glands (3), each of which synthesizes different suites of silk proteins (4, 7).

The amino acid sequences of spider fibroins (spidroins) share a number of distinctive features. Iterations of four simple amino acid motifs characterize the majority of sequenced spider silks: (i) polyalanine (An), (ii) alternating glycine and alanine [(GA)n], (iii) GGX (X = subset of residues), and (iv) GPGXn. Previous studies indicate that these motifs correspond to distinct structural modules [e.g., An and (GA)n form crystalline β-sheets, and GPGXn forms β-spirals], and it is hypothesized that different proportions of these modules dictate fiber mechanics (8). Combinations of these amino acid motifs are organized into a larger unit, termed the ensemble repeat, which is tandem-arrayed throughout the silk sequence (7, 8). In addition to their similar repetitive architecture, all spidroins have a nonrepetitive C terminus exhibiting length and sequence conservation (4, 7, 9, 10). These shared features are consistent with spidroins being encoded by members of a gene family that functionally diversified after gene duplication events (4, 7). However, many presumed members of the spidroin gene family remain uncharacterized at the nucleotide level.

Egg cases, protective cocoons in which eggs undergo development, are constructed primarily from silk fibers synthesized by the tubuliform (cylindrical) silk glands, and tubuliform fibers are used exclusively for egg case construction (6, 11). Tubuliform glands are found only in female spiders, and silk synthesis is initiated in these glands at sexual maturation (6, 12). Guerette et al. (4) cloned a putative component of egg case silk (ADF-2) expressed in tubuliform glands of the orb-weaving spider Araneus diadematus. Examination of ADF-2, in comparison with additional spidroins, subsequently identified it as a member of the major ampullate spidroin 1 (MaSp1) silk ortholog group (7), whose products are used in dragline fibers (13, 14). Recently, Hu et al. (15) reported another putative component of egg case silk (ECP-1) expressed in silk glands of the black widow spider, Latrodectus hesperus. However, the translated sequence of neither ADF-2 nor ECP-1 could account for the amino acid composition of tubuliform silk fibers or the gland's contents (4, 11, 15, 16, 17), indicating that the dominant silk expressed by tubuliform glands must be a different, unidentified protein (4).

Here we report cDNA and genomic sequences of a spider silk gene that exhibits features consistent with its protein product being a major component of tubuliform silk fibers. The tubuliform gene (TuSp1) shares the distinctive molecular architecture characteristic of members of the spidroin gene family. We found the TuSp1 transcript in high copy number in tubuliform silk gland cDNA libraries constructed from two araneoid species, the garden spider Argiope argentata (Araneidae) and the black widow spider L. hesperus (Theridiidae), as well as in the total silk gland cDNA libraries made from two species of the divergent Deinopoidea, a feather-legged spider, Uloborus diversus (Uloboridae), and an ogre-faced spider, Deinopis spinosa (Deinopidae). Genomic fragments of the tubuliform silk gene were sequenced from Argiope argentata and L. hesperus in addition to eight other araneoid species. In view of the diversity of species from which we obtained TuSp1 sequences, we investigated the evolution of the gene's modular organization and the phylogenetic relationship of TuSp1 to other members of the spidroin gene family.

Materials and Methods

Spider Collection and Dissections. We obtained Argiope argentata (A.ar.) from Encinitas, CA; Deinopis spinosa (D.s.) from Gainesville, FL; and L. hesperus (L.he.) and U. diversus (U.d.) from Riverside, CA. Tubuliform glands were dissected from A.ar. and L.he. and frozen in liquid nitrogen. Total silk glands (tubuliform and other silk glands) dissected from U.d. and D.s. were similarly frozen. Taxa used for PCR amplification were obtained from the
following localities: (i) family Araneidae: Argiope aurantia, Columbus, OH; Cyrtophora moluccensis, Queensland, Australia; Gea heptagon, Loudon, TN; and (ii) family Theridiidae: L.he., Yarnell, AZ; Latrodectus mactans, Dominican Republic; Latrodectus geometricus (L.g.), San Diego, CA; Latrodectus hasselti, Perth, Australia; Latrodectus tredecimguttatus, Israel; Steatoda grossa, Haleakala National Park, HI.

cDNA Library Construction and Screening. Total RNA was extracted from silk gland tissue (tubuliform glands or all glands) by homogenization in TRIzol (Invitrogen), followed by purification with an RNeasy Mini Kit (Qiagen). mRNA was extracted from total RNA with oligo(dT)25-tagged magnetic beads (Dynal, Brown Deer, WI). cDNA was synthesized by using the SuperScript II Choice protocol (Invitrogen) with an anchored oligo(dT)18V primer. ChromaSpin 1000 columns (Clontech) were used to select cDNA fragments ≥1,000 bp. cDNA fragments were ligated into EcoRV-digested (New England Biolabs) pZErO-2 plasmids (Invitrogen) and electroporated into TOP10 Escherichia coli cells (Invitrogen). Four libraries of ≈1,200 (tubuliform glands only) or 1,800 (all silk glands) recombinant colonies were arrayed and replicated onto nylon membranes. Libraries were screened with the γ-³²P-labeled oligonucleotide probes (sequences shown from 5′ to 3′) An: GCDGCDGCDGCDGCDGC (all libraries) and (GA)n: CCWGCWCCWGCWCCWGCWCC (D.s. and U.d. libraries), designed from amino acid motifs common to published spider silks. Additional probes were designed from preliminary tubuliform sequences: (i) GCATTGGAAACGGCATTAGC, GAGAGGGCACTGGCAAAAGACG, and TGAAGCCYAGTTGATAAGC (A.ar. library); (ii) ACTDGCTCCBACRCCRAC (A.ar., L.he., and U.d. libraries); and (iii) AACTGGASGATGCTAAGSG, SGCMGCTYGACTGAAGGC, and GAYTGGCTTGCGGCTTGRCT (L.he. and U.d. libraries). About 20% of each library was scored for insert size (18) to locate silks possibly missed by probes. Silk probe positives and colonies with inserts ≥500 bp were sequenced by using T7 and Sp6 primers. Restriction digests were used to identify the longest insert of each silk type, and these inserts were entirely sequenced in both directions by using the GPS-1 Genome Priming System (New England Biolabs).

Tubuliform PCR Amplification. Genomic DNA was extracted from individual spiders by using the DNeasy Tissue Kit (Qiagen). Primers designed from A.ar. tubuliform sequences were used to amplify internal repeats (GCTTTCTCCAGTGYCTTCTC with GCTTGTGCGAAGGAAGAGGCACT) and the last repeat and C terminus (GTGCCTCTTCCTTCGCACAAG with GCCAAAGCCTGTTGAGCATTCG) from araneid species. Primers designed from L.he. sequences were used to amplify internal repeats (CATCGTCCAGTTCTTTCTCCAGTG and GAGAGAATGCCGATTGACTTGC) or the last repeat to C terminus (GCAAGTCAATCGGCATTCTCTC or CATCGTCCAGTTCTTTCTCCAGTG with GCAGGTAAAGTCGCAGCATCATAAG) from theridiids. PCR products were directly sequenced in both directions after purification.

Amino Acid Composition Analyses. Proteins were extracted from A.ar. and L.he. tubuliform glands by grinding the tissue in 2% SDS buffer and rinsing the pellets twice with acetone. Samples were sent to the University of California, Davis, Molecular Structure Facility, where, after hydrolysis in 6 M hydrochloric acid for 24 h at 110°C, molar fractions of amino acid residues were determined with a Beckman 6300 amino acid analyzer using ion exchange chromatography and a ninhydrin reaction detection system.

Sequence Analyses. Nucleotide sequences were edited by using SEQUENCHER 3.1 (Gene Codes, Ann Arbor, MI) and submitted to BLASTX searches (www.ncbi.nlm.nih.gov/BLAST). Translated amino acid repeat units, as found in tandem within individual TuSp1 sequences and across species, were aligned by CLUSTAL W (MACVECTOR 7.2, Accelrys, San Diego), refined by eye, and used to align the encoding nucleotides. A neighbor-joining tree of TuSp1 repeats was constructed by using PAUP* (19), based on uncorrected genetic distance; support was evaluated by 10,000 bootstrap replicates. Translations of the silk C termini were combined with published silk cDNA C termini in a phylogenetic analysis of the spider fibroin gene family. C termini were aligned by CLUSTAL W, refined by eye, and used to align encoding nucleotide sequences. The A.ar. TuSp1 genomic C-terminal sequence was identical to the A.ar. TuSp1 cDNA sequence at the DNA level; thus, the genomic sequence was not included in phylogenetic analyses. Heuristic parsimony and maximum likelihood (ML) tree searches were conducted in PAUP*. Parsimony searches treated characters as unweighted with gaps as a 5th state, and included 1,000 random taxon addition replicates; clade support was assessed by 10,000 bootstrap replicates with 100 random taxon addition replicates and by decay indices (20). Heuristic ML searches were conducted with a Hasegawa–Kishino–Yano model plus among-site rate variation, assuming a gamma distribution of rates among sites (HKY+Γ) (21), with estimated parameters selected by likelihood ratio tests implemented in MODELTEST (22). ML tree searches included 100 random taxon additions, followed by 100 bootstrap replicates (1 random taxon addition per bootstrap replicate). Bayesian tree searches were executed with MR. BAYES 3.0 (23); three independent searches were done to verify convergence of stationary log likelihood values and clade posterior probabilities. Runs were conducted by using the HKY+Γ substitution model with four Markov chain Monte Carlo (MCMC) chains for 3 × 10⁶ generations. Clade posterior probabilities were computed from trees retained after a burn-in of 2 × 10⁵ generations.

www.pnas.org/cgi/doi/10.1073/pnas.0502473102 | PNAS | August 9, 2005 | vol. 102 | no. 32 | 11379–11384 Downloaded by guest on October 1, 2021

Results and Discussion

Tubuliform Sequences Identify a Unique Silk Gene. Forty-eight silk clones were found in the A.ar. tubuliform gland cDNA library. These clones were sequenced and found to be partial transcripts, varying in length toward the 5′ end, of the same gene. A subset of 79 L.he. silk-probe positives was sequenced, and 45% of these clones were portions of a single gene transcript that was similar to the A.ar. tubuliform silk clones in their amino acid translation. Size-screening and probing found copies of a putatively orthologous transcript in the total gland libraries of U.d. (6 clones) and D.s. (10 clones). The far greater number of clones of this gene identified from cDNA libraries made exclusively from tubuliform glands, relative to those constructed from multiple glands, suggests this gene is highly expressed in tubuliform glands.

Like other spidroins, translations of the tubuliform silk cDNAs featured iterated repeat units flanked by a nonrepetitive C terminus (Fig. 1A). Translated BLAST searches found both repetitive and C-terminal regions of the sequences most similar to other spider fibroins. Following the nomenclature for published spider fibroins (24), we refer to this gene as TuSp1 (a contraction of "tubuliform spidroin 1"). The longest TuSp1 transcript found in each library was as follows: A.ar., 2.7 kb; L.he., 4.2 kb; U.d., 3.9 kb; and D.s., 2.5 kb. Each sequence represents a partial cDNA transcript, as are all published silk cDNA transcripts (e.g., refs. 4, 7, 9, 13, 14, and 24–27). Genomic fragments of TuSp1 obtained by PCR were not interrupted by introns. The genomic organization of TuSp1 therefore may resemble that of silk genes containing a single enormous exon (13, 14, 24), in contrast to the multiexonic capture spiral silk gene (28). In addition to the numerous TuSp1 sequences, a putative homolog of ECP-1 was identified from the L.he. tubuliform cDNA library. However, ECP-1, which was recently proposed to code for an abundant component of Latrodectus tubuliform silk (15), was not found in the A.ar. tubuliform cDNA library or the deinopoid (U.d. and D.s.) total silk gland cDNA libraries.
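The screening probes and PCR primers listed in Materials and Methods are written with IUPAC ambiguity codes (e.g., D = A/G/T, W = A/T, Y = C/T). The Python sketch below is an illustration only, not part of the authors' workflow, and its helper names (`degeneracy`, `expand`, `reverse_complement`, `codons`) are hypothetical. It expands a degenerate oligonucleotide into the concrete sequences it stands for, counts that degeneracy, and reverse-complements primers. For example, every variant of the An probe GCDGCDGCDGCDGCDGC reads in frame as alanine codons, the reverse complement of the (GA)n probe reads as alternating GGW (Gly) and GCW (Ala) codons, and the L.he. primers GAGAGAATGCCGATTGACTTGC and GCAAGTCAATCGGCATTCTCTC are exact reverse complements of each other.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code: R (A/G) pairs with Y (C/T), K with M, B with V,
# D with H; S, W, and N are their own complements.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

ALA_CODONS = {"GCA", "GCC", "GCG", "GCT"}


def degeneracy(oligo: str) -> int:
    """Number of distinct concrete sequences encoded by a degenerate oligo."""
    return prod(len(IUPAC[base]) for base in oligo.upper())


def expand(oligo: str):
    """Yield every concrete sequence encoded by a degenerate oligo."""
    for bases in product(*(IUPAC[base] for base in oligo.upper())):
        yield "".join(bases)


def reverse_complement(seq: str) -> str:
    """Reverse complement that preserves ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, frame: int = 0) -> list[str]:
    """Complete codons read from the given frame; trailing bases are dropped."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    an_probe = "GCDGCDGCDGCDGCDGC"
    ga_probe = "CCWGCWCCWGCWCCWGCWCC"
    print(degeneracy(an_probe), degeneracy(ga_probe))
    # Does every expanded An-probe variant encode only alanine in frame 0?
    print(all(c in ALA_CODONS for s in expand(an_probe) for c in codons(s)))
    print(codons(reverse_complement(ga_probe)))
```

Keeping the ambiguity codes symbolic in `reverse_complement`, rather than expanding first, avoids enumerating hundreds of variants just to read a probe on the opposite strand.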

Fig. 1. Organization of TuSp1. (A) Schematic view of TuSp1 primary structure. Ovals, individual repeat units numbered starting from the C terminus; arrows, additional sequence toward the N terminus. (B) Alignment of individual repeat units, with unit number and color as in A; gen., repeats obtained by PCR; *, sites in alignment with invariant residue; amino acids colored as follows: A, blue; S, yellow; L, dark purple; G, light purple; V, green; Q, pink; T, light green; numbered carets, positions of expansions; white space, data missing from upstream cDNA sequence or not obtained by PCR; Xs, codons with ambiguous nucleotides; L.m., L. mactans; L.t., L. tredecimguttatus; L.ha., L. hasselti; S.g., S. grossa; A.au., Argiope aurantia; G.h., Gea heptagon.
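The TuSp1 columns of Table 1 are compositions predicted from translated sequences and expressed in mol % (each residue's share of all residues, times 100). The Python sketch below shows that calculation under stated assumptions; it is not the authors' code, and the function name `mol_percent` is hypothetical. Because acid hydrolysis converts asparagine and glutamine to aspartic and glutamic acid, analyzer data report Asx (Asn + Asp) and Glx (Gln + Glu), so the sketch pools those residues the same way to keep predicted and measured values comparable.

```python
from collections import Counter

# One-letter codes mapped to the three-letter labels used in Table 1.
# Asn/Asp are pooled as Asx and Gln/Glu as Glx, as in acid-hydrolysis data.
TABLE_LABEL = {
    "A": "Ala", "R": "Arg", "N": "Asx", "D": "Asx", "C": "Cys",
    "Q": "Glx", "E": "Glx", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def mol_percent(protein: str) -> dict[str, float]:
    """Predicted composition (mol %) of a translated sequence, rounded to 0.1.

    Characters that are not standard residues (e.g., X for codons with
    ambiguous nucleotides, * for stops) are ignored.
    """
    counts = Counter(TABLE_LABEL[aa] for aa in protein.upper() if aa in TABLE_LABEL)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues in sequence")
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical toy peptide, not a TuSp1 sequence.
    print(mol_percent("AASAQESGXN"))
```

A predicted profile from this kind of function can then be set residue by residue against measured gland or fiber values, which is the comparison Table 1 makes.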

TuSp1 Amino Acid Composition Matches Gland and Silk Composition. Tubuliform fibers are unusual in their serine-rich and glycine-poor amino acid composition relative to other spider silks (16), a feature that may correlate with the unique secondary structural features (29) and mechanical properties of this silk (30). The contents of A.ar. and L.he. tubuliform glands were largely composed of alanine (26%) and serine (22%) (Table 1). These compositions closely matched those reported for tubuliform glands (16) and egg case silk fibers (11, 17) of other araneoid spiders. Moreover, the predicted amino acid compositions of the A.ar. and L.he. TuSp1 gene products closely match the compositions we report for their tubuliform glands (Table 1), which is consistent with TuSp1 being the major component of tubuliform fibers.

MaSp1 and MaSp2 Are Expressed in L.he. Tubuliform Glands. In addition to locating TuSp1 transcripts, the An probe found transcripts of major ampullate spidroins 1 and 2 (MaSp1, 7 clones; MaSp2, 4 clones) in the L.he. tubuliform gland cDNA library. MaSp1 and MaSp2 are highly expressed in spider major ampullate silk glands, and their products compose the bulk of dragline silk (13, 14). Nucleotide BLAST searches with L.he. MaSp1 and MaSp2 found highest sequence identity with congeneric L.g. MaSp1 and MaSp2, respectively. Translations of L.he. MaSp1 and MaSp2 were also similar to published translations of L.he. MaSp1 and MaSp2 obtained by PCR amplification of total silk gland cDNA (31), but these translations did not include C termini. We confirmed the expression of MaSp1 and MaSp2 in L.he. major ampullate glands with RT-PCR amplification (data not shown). The occurrence of multiple MaSp1 and MaSp2 copies in the L.he. tubuliform cDNA library suggests that these genes are expressed in both major ampullate and tubuliform glands, consistent with the finding of Guerette et al. (4) of ADF-2 (a MaSp1 ortholog; ref. 7) in Araneus

Table 1. Amino acid composition of tubuliform glands, egg case fibers, and predicted TuSp1

            Tubuliform gland*       Egg case fiber          TuSp1§
                                    Outer†  Inner†  Both‡
Amino acid  A.ar.   L.he.   A.d.    A.au.   A.au.   L.he.   A.ar.   L.he.
Ala         26.5    25.6    24.4    25.8    26.7    23.3    25.3    25.1
Ser         22.3    22.1    27.6    22.4    24.0    25.7    28.0    25.9
Gly         9.8     8.1     8.6     10.3    8.7     7.7     7.7     6.6
Leu         7.7     5.3     5.7     6.1     6.6     5.0     7.8     5.8
Glx         7.3     10.7    8.2     7.3     7.6     11.0    7.6     11.2
Asx         6.0     5.0     6.3     6.9     7.0     6.3     5.6     3.9
Val         5.2     5.1     6.0     4.2     4.8     NA      5.5     5.5
Thr         4.6     5.0     3.4     4.1     3.6     5.0     4.4     4.9
Phe         4.2     4.1     3.2     3.7     4.0     NA      3.9     3.7
Tyr         1.5     1.5     1.0     1.8     1.4     1.7     1.0     1.7
Cys         1.5     1.3     0       NA      NA      NA      0       0
Arg         1.4     1.7     1.5     3.1     2.1     1.3     1.1     1.0
Ile         1.1     2.6     1.7     1.9     1.7     2.0     1.1     2.7
Pro         0.7     1.7     0.6     1.3     0.9     2.0     0.9     2.0
Lys         0.2     0.2     1.8     0.6     0.6     NA      0.1     0

All values are mol %. NA, not available. *A.ar. and L.he. values are from this study; A.d. (A. diadematus) data are from ref. 16. †Fiber values are from A. aurantia (A.au.) (11). ‡Fiber values are from L.he. (17). §Predicted from TuSp1 translations.

diadematus tubuliform gland cDNA. Consequently, it appears that morphologically distinct silk glands predominantly express one fibroin type but also may express smaller amounts of fibroins primarily synthesized by other silk glands.

Homogeneity of Intragenic Repeats Provides Evidence of Concerted Evolution. Consecutive repeats found within each translated TuSp1 sequence exhibit a remarkable degree of homogeneity: the first five of six repeats found in L.he. TuSp1 consisted of 184 aa differing by 1–4 residues and were ≥98% similar at the DNA level. Four consecutive repeats found in A.ar. TuSp1 were 180 aa long and ≥97% similar at the DNA level. Although L.he. and A.ar. repeats were of equal length within a species, deinopoid tubuliform repeats exhibited length variation: U.d. TuSp1 repeats ranged from 262 to 294 aa, whereas D.s. TuSp1 repeats were 188–206 aa long. Moreover, whereas upstream U.d. repeats (Fig. 1A, repeats 2–4) were nearly identical, with >98% similarity, D.s. repeats 2 and 3 (Fig. 1A) were only 88% similar at the DNA level.

The level of sequence similarity between consecutive intragenic repeats of TuSp1 suggests a history of nonreciprocal recombination between units, leading to homogenization through gene conversion, a process termed concerted evolution (32). A specific prediction of concerted evolution is that repeat units occurring within a gene (e.g., in Fig. 1A, red repeats 1–6 in L.he. TuSp1) become more similar to each other than to corresponding regions in related species (i.e., in Fig. 1A, unit 1s in L.he., A.ar., U.d., and D.s. TuSp1 in contrast to unit 2s in L.he., A.ar., U.d., and D.s. TuSp1, etc.) (33). In addition to similarity among repeat units within a gene, we observed substantial sequence conservation of the repeat unit across species (Fig. 1B), enabling us to test this prediction. Genetic similarity among TuSp1 repeat units, illustrated by a neighbor-joining tree (Fig. 2), provides evidence of concerted evolution. Rather than grouping repeats by position across species, we found that repeats within a single cDNA sequence cluster together and generally are connected by short branch lengths indicating minor genetic differentiation. As observed in other silk sequences (28), the final repeat in each TuSp1 sequence (repeat 1s, Fig. 1A) was least similar to upstream repeats, consistent with predictions of the unequal exchange model for tandem repeat evolution (34, 35). Instead of clustering A.ar. TuSp1 cDNA repeats to the exclusion of all others, the TuSp1 repeat amplified from Argiope aurantia fell among the A.ar. TuSp1 repeats. Similarly, genomic fragments from L. mactans were among L.he. cDNA repeats, suggesting that complete homogenization of variant repeat units does not necessarily accompany speciation events but will presumably occur given a sufficient period of reproductive isolation (36). Although repeats within a gene are not evolving independently, the branching order of repeat unit clades reflected higher-level phylogenetic relationships: (i) deinopoids are united (37); (ii) araneoids are united (38); (iii) a theridiid cluster depicts Steatoda as sister to Latrodectus (39); and (iv) the araneid cluster unites Argiope to the exclusion of Gea (40). The sequence conservation of these repeats across Deinopoidea and Araneoidea, taxa that diverged at least 125 million years ago (41), suggests the operation of strong functional constraints on TuSp1, possibly explaining its retention of phylogenetic signal.

The modular organization of TuSp1, with extreme homogeneity of intragenic units in both sequence composition and length (particularly for L.he. and A.ar.), differs substantially from the arrangement of repetitive modules that typifies most other spidroins. The majority of characterized spidroins are MaSp1, MaSp2, Flag, and MiSp (minor ampullate spidroin; ref. 24). These silks are dominated by a small number of amino acid motifs [e.g., An, (GA)n, GGX, or GPGXn], which compose larger ensemble repeats (7). However, the subrepetitive organization of consecutive intragenic ensembles is highly variable because of differences in the number of short amino acid motifs (28), making their alignment problematic. Recent investigations discovered two spidroins composed of highly homogeneous ensemble repeats largely devoid of such characteristic amino acid motifs (7, 9). TuSp1 appears more similar to this recently discovered class of fibroins, being composed of long, nearly identical ensembles that contain no An, (GA)n, or GPGXn motifs. Although several lines of evidence indicate that silk production arose independently in spiders and insects (42), many lepidopteran fibroins similarly contain the amino acid motifs An and (GA)n arrayed in length-variable repetitive units (43). Although this length variability was presumed to characterize lepidopteran silks, recent studies found that silks of certain pyralid moths are instead composed of long, highly homogeneous repeats (43, 44). Accordingly, it appears that the similarities of certain amino acid motifs and modular organizations in lepidopteran and spider silks have evolved convergently.
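The repeat-unit tree (Fig. 2) was built by neighbor joining in PAUP* from uncorrected genetic distances. To illustrate that technique only, and not to reimplement PAUP*, the Python sketch below computes uncorrected p-distances between aligned repeats and joins them with the standard Saitou–Nei neighbor-joining algorithm. The function names are hypothetical; gaps are handled by pairwise deletion (an assumption), and bootstrapping and midpoint rooting are omitted. Under concerted evolution, repeats from the same gene would be expected to join each other before joining repeats at the same position in other species.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of compared sites that differ.

    Sites where either aligned sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def distance_table(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric p-distance table keyed by (name1, name2) in both orders."""
    table = {}
    for x, y in combinations(seqs, 2):
        table[x, y] = table[y, x] = p_distance(seqs[x], seqs[y])
    return table


def neighbor_joining(names: list[str], dist: dict[tuple[str, str], float]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format.

    Negative branch-length estimates are reported as computed, not corrected.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i, k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        branch_i = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i, j] - branch_i
        new = f"({i}:{branch_i:g},{j}:{branch_j:g})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[new, k] = d[k, new] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a},{b}:{d[a, b]:g});"


if __name__ == "__main__":
    # Hypothetical toy repeats (two repeats from each of two genes), not TuSp1 data.
    repeats = {
        "gene1_rep2": "GCTGCATCAGCTTCA",
        "gene1_rep3": "GCTGCATCAGCTTCG",
        "gene2_rep2": "GCAGCTTCTGCTTCA",
        "gene2_rep3": "GCAGCTTCTGCATCA",
    }
    print(neighbor_joining(list(repeats), distance_table(repeats)))
```

Ties in the Q criterion are broken by the order of the input names, so a different input order can return a different but equally valid join sequence.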

Fig. 2. Neighbor-joining tree of repeat units, based on DNA sequences of alignment in Fig. 1B. Numbered and colored ovals, repeat units as labeled in Fig. 1; values at nodes, percent times recovered in 10,000 bootstrap replicates; mid-point rooted; horizontal branch lengths proportional to number of changes; L.m., L. mactans; L.t., L. tredecimguttatus; S.g., S. grossa; A.au., Argiope aurantia; G.h., Gea heptagon.
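The codon-usage figures given in the subsection that follows (e.g., 81% of alanine codons in the U.d. TuSp1 translation ending in A or T) amount to a tally over in-frame codons. The Python sketch below shows one way to compute such third-position A/T percentages; it is an illustration, not the authors' method, the function name `at3_percent` is hypothetical, the codon sets come from the standard genetic code, and the input is assumed to be an in-frame cDNA fragment (codons for other residues or with ambiguous bases are skipped).

```python
from collections import Counter

# Standard-genetic-code codons for the three residues whose bias is reported.
CODONS = {
    "Ala": ("GCT", "GCC", "GCA", "GCG"),
    "Gly": ("GGT", "GGC", "GGA", "GGG"),
    "Gln": ("CAA", "CAG"),
}
CODON_TO_AA = {codon: aa for aa, codon_set in CODONS.items() for codon in codon_set}


def at3_percent(cds: str) -> dict[str, float]:
    """Percent of Ala, Gly, and Gln codons in cds that end in A or T.

    cds must be in frame; codons not listed in CODON_TO_AA (other residues,
    ambiguous bases such as N) are ignored.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("in-frame coding sequence length must be a multiple of 3")
    total, at_ending = Counter(), Counter()
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            continue
        total[aa] += 1
        if codon[2] in "AT":
            at_ending[aa] += 1
    return {aa: 100 * at_ending[aa] / total[aa] for aa in total}


if __name__ == "__main__":
    # Hypothetical toy fragment: GCT GCA GCC GGA GGG CAA CAG.
    print(at3_percent("GCTGCAGCCGGAGGGCAACAG"))
```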

Deinopoid TuSp1 Repeats Contain Subrepetitive Expansion Regions. Although the deinopoid (D.s. and U.d.) TuSp1 repeats were easily aligned to araneoid TuSp1 repeats, the deinopoid TuSp1s also contained distinctive expansion regions amidst conserved blocks of sequence. The U.d. TuSp1 expansion regions contained various numbers of the nearly perfect repeats (i) ASQAGSQA and (ii) GSQA, whereas the expansion region in D.s. TuSp1 is composed of a variable number of the repeat GAXAGA, where X = S, V, or I (Fig. 1B). Similar length variation between repeats has been observed in silks largely composed of few amino acids (7, 25), but not in silks with longer, more complicated repeat units, such as aciniform silk (AcSp1; ref. 9). Simple amino acid composition, along with the extreme codon bias noted in spidroin genes (9, 13, 26), may promote replication slippage and aberrant recombination events, resulting in pronounced length variation between repeats (25, 28). TuSp1 sequences also exhibit bias in codon usage. For example, 81% of alanine, 89% of glycine, and 87% of glutamine residues in the U.d. TuSp1 translation were encoded by codons with either adenine or thymine in the third position. Differences in amino acid composition and codon usage between araneoid and deinopoid TuSp1 may therefore account for the presence of expansion regions unique to deinopoid TuSp1.

Fig. 3. Phylogeny of spidroin gene family: one of two ML trees of C termini (−ln L = 8,378.76). Genomic sequences are labeled with superscript G, and all others are cDNAs. Three circles at branches denote heuristic parsimony bootstrap, ML bootstrap, and clade posterior probability support, respectively, with coloration indicating percentage values: white, <50; green, 50–69; yellow, 70–89; red, ≥90. Number below branch indicates decay index. The tree is rooted with mygalomorph E.c. fibroin 1; other sequences are from Araneomorphae (sister to Mygalomorphae). GenBank accessions for additional sequences: N.c. MaSp1, U20329; Araneus diadematus ADF1–4, U47853–U47856; N.c. MaSp2, M92913; N.c. MiSp1, AF027735; N.c. MiSp, AF027736; Argiope trifasciata (A.t.) AcSp1, AY426339; N.c. Flag, AF027973; A.t. Flag, AF350264; A.t. MaSp1–2, AF350266–AF350267; Dolomedes tenebrosus (D.t.) fibroin 1, D.t. fibroin 2, E. chisoseus (E.c.) fibroin 1, AF350269–AF350271; L.g. MaSp1, AF350273; L.g. MaSp2, AF350275; P.t. fibroin 1–4, AF350281–AF350284; Agelenopsis aperta (A.ap.) MaSp, AY566305.

C Termini Phylogeny Confirms That TuSp1 Belongs to the Spidroin Gene Family. Phylogenetic analyses of the spidroin gene family examined relationships between the C termini of our TuSp1, MaSp1, and MaSp2 sequences and previously reported spidroin cDNAs cloned from eight species: Argiope trifasciata, Araneus diadematus, Nephila clavipes (N.c.), L.g., Agelenopsis aperta, Dolomedes tenebrosus, Plectreurys tristis (P.t.), and Euagrus chisoseus (4, 7, 9, 13, 14, 24–28). These sequences encode multiple functionally distinct proteins: (i) Flag, which forms the orb-web capture spiral (26); (ii) MaSp, used in orb-web radii, frame, and dragline (13, 14); (iii) AcSp, used in sperm webs, prey wrapping, and web decorations (9); and (iv) MiSp, used in the temporary orb-web spiral (24). Also included were C termini of silks that cannot presently be assigned to a functional category (e.g.,

Garb and Hayashi | PNAS | August 9, 2005 | vol. 102 | no. 32 | 11383

P.t. fibroins 1–4). In total, 37 C-terminal sequences were aligned, and the resulting matrix of their encoded nucleotides consisted of 366 characters with 312 variable sites. The three approaches used in phylogenetic tree construction (heuristic parsimony, ML, and Bayesian) resulted in topologically similar phylogenies. Heuristic parsimony analyses resulted in nine equally parsimonious trees (length, 2,485; consistency index, 0.383; retention index, 0.575). ML searches (Hasegawa–Kishino–Yano model plus among-site rate variation, assuming a gamma distribution of rates among sites; parameter estimates: frequency of A, 0.25; frequency of C, 0.21; frequency of G, 0.21; frequency of T, 0.33; transitions-to-transversions ratio, 0.968; α, 1.4575) retained two trees of −ln L = 8378.76. Three Bayesian runs independently converged on similar −ln L values at stationarity (run averages of −ln L ranged from 8407.2 to 8427.2), and the post-burn-in consensus trees of the three runs had identical topologies. We present one of the two ML trees (Fig. 3), with support values for the three methods labeled on branch nodes. The two ML trees differed only in two effectively zero-length branches that determine the placements of P.t. fibroin 3 and Flag.

The spidroin phylogeny (Fig. 3) indicates that relationships within this gene family are organized at multiple hierarchical levels. Most broadly, C termini group according to their ortholog type rather than by species. For example, Latrodectus TuSp1 sequences group with all other TuSp1, rather than with Latrodectus MaSp1 and MaSp2. A MaSp1–3 clade is also recovered in the phylogeny and contains ADF-2, proposed by Guerette et al. (4) as a putative component of egg case silk because of its expression in tubuliform glands. Similarly, the MaSp1 and MaSp2 sequences we cloned from L.he. tubuliform glands fall into the MaSp1–3 clade, where they are most closely related to L.g. MaSp1 and MaSp2, respectively. However, expression of major ampullate fibroin genes is presumed to be greater in major ampullate silk glands than in tubuliform glands, given the close correspondence between the amino acid compositions of tubuliform silk glands and TuSp1 (Table 1) and the abundance of TuSp1 in the L.he. cDNA library.

Within an ortholog group, evidence suggests that topology reflects species phylogeny: deinopoid TuSp1 C termini are united, and topology among the Latrodectus and araneid TuSp1 is consistent with expected phylogenetic relationships in these two groups (39, 40). However, rather than grouping with the araneid TuSp1 C termini, Latrodectus TuSp1 unexpectedly appeared more closely related to the deinopoid sequences. These taxa represent highly divergent lineages, and resolving these relationships will likely require TuSp1 from additional species. Although egg case construction is characteristic of virtually all spiders, tubuliform gland spigots are considered a synapomorphy of the Entelegynae, a large clade of ≈70 spider families that includes diverse taxa such as jumping, orb-weaving, and wolf spiders (45). The presence of tubuliform glands in Entelegynae spiders outside of the Orbicularia suggests that TuSp1 is also expressed in the tubuliform glands of these spiders. Therefore, further assessment of the TuSp1 phylogeny should investigate this gene in an array of Entelegynae spiders.

The overall phylogenetic pattern of the spidroin gene family, in which multiple distinct ortholog groups contain relationships mirroring species phylogeny, implies that the different silk orthologs diverged before the origin of many spider families. The nested, monophyletic position of the tubuliform C termini is consistent with a single origin of this ortholog group within the spidroin gene family. However, it is presently difficult to determine the order in which ortholog groups arose, because of limited resolution among the different silk genes and the absence of sequences from other, as yet unknown, silk orthologs. TuSp1 appears most closely related to P.t. fibroin 3 and the Flag ortholog, with relationships among these three groups being equivocal. The repetitive sequences of P.t. fibroin 3 and Flag are extremely different from TuSp1, and aligning their repeat sequences to one another is highly tenuous. Specifically, the translated repetitive sequence of Flag is composed of iterations of GGX and GPGXn motifs within ensemble repeats that exhibit extensive length variation. This organization makes alignment of repetitive sequences difficult, even between closely related Flag genes (28). P.t. fibroin 3 presents an entirely different repetitive architecture, composed of tandem arrays of long repeats (≈200 aa long) followed by 15 iterations of a shorter ensemble unit (≈40 aa) before the C terminus (7). Accordingly, silk repetitive sequences, which encode structural motifs, can rapidly diversify through extensive mutations that are propagated by intragenic homogenization.

The C-terminal phylogeny presented here shows that the vast majority of characterized silks belong to the MaSp1–3 clade, with additional spider silk ortholog groups represented by cDNAs from only one to three species. By cloning TuSp1 from a diverse array of species, we have greatly extended knowledge of the spidroin gene family. However, additional putative members of this gene family remain unknown. The range of genetic features found across spider fibroins, including different levels of intragenic homogenization and various modular organizations, underscores the utility of the silk gene family as a system for investigating the evolution of molecules that are especially critical to organismal ecology. The antiquity and immense size of the order Araneae, with all members expressing different suites of silk genes, hint at a vast diversity of silk genes awaiting discovery.

M. Andrade, T. Blackledge, S. Crews, M. Hedin, C. Kristensen, P. Krushelnycky, and M. Stowe helped in obtaining spiders. G. Maduro, N. Nguyen, and V. Vo assisted in laboratory work. N. Ayoub, J. Gatesy, A. Lancaster, and B. Swanson provided comments on previous drafts. This work was supported by National Science Foundation Award DEB-0236020 and U.S. Army Research Office Award DAAD19-02-1-0358.
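The ML model reported above (Hasegawa–Kishino–Yano plus gamma-distributed rates) is fully specified by four base frequencies, a transition/transversion parameter, and the gamma shape α. The Python sketch below is illustrative only and is not the authors' pipeline. It builds an HKY85 rate matrix from the quoted estimates. One assumption is not stated in the text: we treat the reported transitions-to-transversions ratio (0.968) as the expected ts/tv ratio, and we convert it to the per-pair rate ratio κ. All function names are hypothetical.

```python
"""Sketch: HKY85 rate matrix built from the ML estimates quoted in the text.

ASSUMPTION (not stated in the source): the reported "transitions-to-
transversions ratio" is the expected ts/tv ratio R, so the HKY rate ratio is
kappa = R * piR * piY / (piA*piG + piC*piT). Function names are hypothetical.
"""

BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    # For x != y, a transition is A<->G or C<->T (both purines or both pyrimidines).
    return (x in PURINES) == (y in PURINES)


def kappa_from_tstv(ratio: float, pi: dict) -> float:
    """Invert the expected ts/tv ratio into the HKY rate ratio kappa."""
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    return ratio * pi_r * pi_y / (pi["A"] * pi["G"] + pi["C"] * pi["T"])


def hky_q(pi: dict, kappa: float) -> dict:
    """Return Q as a nested dict, scaled to a mean rate of 1 substitution/site."""
    q = {x: {} for x in BASES}
    for x in BASES:
        for y in BASES:
            if x != y:
                # Rate x->y is the target frequency, boosted by kappa for transitions.
                q[x][y] = pi[y] * (kappa if is_transition(x, y) else 1.0)
        q[x][x] = -sum(q[x][y] for y in BASES if y != x)  # rows sum to zero
    mu = -sum(pi[x] * q[x][x] for x in BASES)  # mean substitution rate
    return {x: {y: q[x][y] / mu for y in BASES} for x in BASES}


def expected_tstv(q: dict, pi: dict) -> float:
    """Expected ratio of transitions to transversions at stationarity."""
    ts = sum(pi[x] * q[x][y] for x in BASES for y in BASES
             if x != y and is_transition(x, y))
    tv = sum(pi[x] * q[x][y] for x in BASES for y in BASES
             if x != y and not is_transition(x, y))
    return ts / tv


if __name__ == "__main__":
    pi = {"A": 0.25, "C": 0.21, "G": 0.21, "T": 0.33}  # estimates from the text
    kappa = kappa_from_tstv(0.968, pi)
    q = hky_q(pi, kappa)
    alpha = 1.4575  # gamma shape; rates have mean 1 and variance 1/alpha
    print(f"kappa = {kappa:.3f}")
    print(f"ts/tv recovered from Q = {expected_tstv(q, pi):.3f}")
    print(f"coefficient of variation of site rates = {alpha ** -0.5:.3f}")
```

Under this assumption κ comes out at roughly 1.97. The expected ts/tv ratio can sit below 1 while κ is near 2 because each base has two transversion targets but only one transition target. The gamma shape does not enter Q itself. It only scales rates across sites, with a coefficient of variation of 1/√α.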

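The consistency index and retention index quoted for the parsimony trees summarize homoplasy. Under the usual definitions for unordered characters, CI = M/S and RI = (G − S)/(G − M). Here S is the tree length (changes required on the tree, computed per character with Fitch's algorithm), M is the minimum conceivable number of changes (observed states minus one), and G is the number of changes required on a star tree (taxa minus the count of the most common state). The Python sketch below illustrates these definitions on a hypothetical four-taxon tree and matrix. It ignores gaps and missing data, and it makes no claim about the exact variant PAUP* reported (for example, whether uninformative characters were excluded).

```python
"""Sketch: Fitch tree length plus ensemble CI and RI on hypothetical toy data.

Assumptions: rooted binary tree as nested 2-tuples, unordered characters,
no gaps or missing data. Not the procedure used to obtain the reported values.
"""
import math
from collections import Counter


def fitch_length(tree, states):
    """Minimum number of changes for one character on a rooted binary tree.

    tree: nested 2-tuples with taxon names (str) at the leaves.
    states: mapping taxon -> character state.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its observed state, no cost
            return {states[node]}, 0
        (left, left_cost), (right, right_cost) = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:  # children can agree: keep the intersection, no change
            return shared, left_cost + right_cost
        # children disagree: take the union and count one change
        return left | right, left_cost + right_cost + 1

    return visit(tree)[1]


def ci_ri(tree, matrix):
    """Return (S, CI, RI) summed over all characters of an aligned matrix."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    s = m = g = 0
    for i in range(n_chars):
        column = {t: matrix[t][i] for t in taxa}
        counts = Counter(column.values())
        s += fitch_length(tree, column)        # changes needed on this tree
        m += len(counts) - 1                   # minimum conceivable changes
        g += len(taxa) - max(counts.values())  # changes needed on a star tree
    ci = m / s if s else math.nan              # undefined if nothing varies
    ri = (g - s) / (g - m) if g != m else math.nan
    return s, ci, ri


if __name__ == "__main__":
    toy_tree = (("t1", "t2"), ("t3", "t4"))  # hypothetical topology
    toy_matrix = {"t1": "AAG", "t2": "AAT", "t3": "CAG", "t4": "CGT"}
    length, ci, ri = ci_ri(toy_tree, toy_matrix)
    print(f"length = {length}, CI = {ci:.2f}, RI = {ri:.2f}")
```

On this toy matrix the tree length is 4, CI = 0.75, and RI = 0.5. All of the homoplasy comes from the third character, whose states conflict with the ((t1,t2),(t3,t4)) grouping.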
1. Coddington, J., Giribet, G., Harvey, S., Prendini, L. & Walter, D. (2004) in Assembling the Tree of Life, eds. Cracraft, J. & Donoghue, M. (Oxford Univ. Press, New York), pp. 296–318.
2. Gosline, J., DeMont, M. & Denny, M. (1986) Endeavour 10, 37–43.
3. Vollrath, F. (1992) Sci. Am. 266 (March), 70–76.
4. Guerette, P., Ginzinger, D., Weber, B. & Gosline, J. (1996) Science 272, 112–115.
5. Palmer, J., Coyle, F. & Harrison, F. (1982) J. Morphol. 174, 269–274.
6. Kovoor, J. (1987) in The Ecophysiology of Spiders, eds. Nentwig, W. & Heimer, S. (Springer, New York), pp. 160–186.
7. Gatesy, J., Hayashi, C., Motriuk, D., Woods, J. & Lewis, R. (2001) Science 291, 2603–2605.
8. Hayashi, C., Shipley, N. & Lewis, R. (1999) Int. J. Biol. Macromol. 24, 271–275.
9. Hayashi, C., Blackledge, T. & Lewis, R. (2004) Mol. Biol. Evol. 21, 1950–1959.
10. Beckwitt, R. & Arcidiacono, S. (1994) J. Biol. Chem. 269, 6661–6663.
11. Foradori, M., Kovoor, J., Moon, M. & Tillinghast, E. (2002) J. Morphol. 252, 218–226.
12. Candelas, G., Ortiz, A. & Molina, C. (1986) J. Exp. Zool. 237, 281–285.
13. Xu, M. & Lewis, R. (1990) Proc. Natl. Acad. Sci. USA 87, 7120–7124.
14. Hinman, M. & Lewis, R. (1992) J. Biol. Chem. 267, 19320–19324.
15. Hu, X., Kohler, K., Falick, A., Moore, A., Jones, P., Sparkman, O. & Vierra, C. (2005) J. Biol. Chem. 280, 21220–21230.
16. Andersen, S. (1970) Comp. Biochem. Physiol. 35, 705–711.
17. Casem, M., Turner, D. & Houchin, K. (1999) Int. J. Biol. Macromol. 24, 103–108.
18. Beuken, E., Vink, C. & Bruggeman, C. (1998) BioTechniques 24, 748–750.
19. Swofford, D. (2002) PAUP* (Sinauer, Sunderland, MA), Version 4.0b10.
20. Bremer, K. (1988) Evolution 42, 795–803.
21. Hasegawa, M., Kishino, H. & Yano, T. (1985) J. Mol. Evol. 22, 160–174.
22. Posada, D. & Crandall, K. (1998) Bioinformatics 14, 817–818.
23. Huelsenbeck, J. & Ronquist, F. (2001) Bioinformatics 17, 754–755.
24. Colgin, M. & Lewis, R. (1998) Protein Sci. 7, 667–672.
25. Beckwitt, R., Arcidiacono, S. & Stote, R. (1998) Insect Biochem. Mol. Biol. 28, 121–130.
26. Hayashi, C. & Lewis, R. (1998) J. Mol. Biol. 275, 773–784.
27. Tian, M., Liu, C. & Lewis, R. (2004) Biomacromolecules 5, 657–660.
28. Hayashi, C. & Lewis, R. (2000) Science 287, 1477–1479.
29. Barghout, J., Thiel, B. & Viney, C. (1999) Int. J. Biol. Macromol. 24, 211–217.
30. Stauffer, S., Coguill, S. & Lewis, R. (1994) J. Arachnol. 22, 5–11.
31. Lawrence, B., Vierra, C. & Moore, A. (2004) Biomacromolecules 5, 689–695.
32. Zimmer, E. A., Martin, S. L., Beverley, S. M., Kan, Y. W. & Wilson, A. C. (1980) Proc. Natl. Acad. Sci. USA 77, 2158–2162.
33. Swanson, W. & Vacquier, V. (1998) Science 281, 710–712.
34. Charlesworth, B., Sniegowski, P. & Stephan, W. (1994) Nature 371, 215–220.
35. McAllister, B. & Werren, J. (1999) J. Mol. Evol. 48, 469–481.
36. Meeds, T., Lockard, E. & Livingston, B. (2001) J. Mol. Evol. 53, 180–190.
37. Coddington, J. (1989) J. Arachnol. 17, 71–95.
38. Griswold, C., Coddington, J., Hormiga, G. & Scharff, N. (1998) Zool. J. Linn. Soc. 123, 1–99.
39. Garb, J., Gonzalez, A. & Gillespie, R. (2004) Mol. Phylogenet. Evol. 31, 1129–1142.
40. Scharff, N. & Coddington, J. (1997) Zool. J. Linn. Soc. 120, 355–434.
41. Selden, P. (1990) Paleontology 33, 257–285.
42. Craig, C. (1997) Annu. Rev. Entomol. 42, 231–267.
43. Žurovec, M. & Sehnal, F. (2002) J. Biol. Chem. 277, 22639–22647.
44. Fedič, R., Žurovec, M. & Sehnal, F. (2003) J. Biol. Chem. 278, 35255–35264.
45. Platnick, N., Coddington, J., Forster, R. & Griswold, C. (1991) Am. Mus. Novit. 3016, 1–73.

11384 | www.pnas.org/cgi/doi/10.1073/pnas.0502473102 | Garb and Hayashi