bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Multicomponent nature underlies the extraordinary mechanical properties of dragline silk

5 Nobuaki Konoa,b, Hiroyuki Nakamurac, Masaru Moria,b, Yuki Yoshidaa,b, Rintaro Ohtoshid, Ali D Malayd, Daniel A Pedrazzoli Moranc, Masaru Tomitaa,b, Keiji Numatad,e, Kazuharu Arakawaa,b,*

aInstitute for Advanced Biosciences, Keio University, 403-1 Nihonkoku, Daihouji, Tsuruoka, Yamagata, 997-0017, . bSystems Biology Program, Graduate School of Media and Governance, Keio University, 5322 Endo, 10 Fujisawa, Kanagawa, 252-0882, Japan. cSpiber Inc., 234-1 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan. dRIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan. eDepertment of Material Chemistry, Kyoto University, Kyotodaigaku-Katsura, Nishikyo-ku, Kyoto, 615- 8510, Japan 15 *Kazuharu Arakawa Email: [email protected] Keywords: Artificial spider silk, Orb-weaver spider, Mechanical property, Genome

20 Abstract

Dragline silk of golden orb-weaver () is noted for its unsurpassed toughness, combining extraordinary extensibility and tensile strength, suggesting industrial application as a sustainable biopolymer material. To pinpoint the molecular composition of dragline silk and the roles of its constituents in achieving its mechanical properties, we report a multiomics approach combining high- 25 quality genome sequencing and assembly, silk gland transcriptomics, and dragline silk proteomics of four Nephilinae spiders. We observed the consistent presence of the MaSp3B spidroin unique to this subfamily, as well as several non-spidroin SpiCE proteins. Artificial synthesis and combination of these components in vitro showed that the multicomponent nature of dragline silk, including MaSp3B and SpiCE, along with MaSp1 and MaSp2, is essential to realize the mechanical properties of spider dragline 30 silk.

1

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Main Text

Introduction 5 Spider silk is a typical natural high-performance structural protein and has potential for numerous applications as a protein biopolymer with biodegradability and biocompatibility (1). Spider silk has a tensile strength superior to that of steel, yet it is highly elastic, showing greater toughness than aramid fibers such as Kevlar (2); thus, it has received interest for use in industrial applications (1, 3). Orb-weaver spiders, 10 especially those belonging to the family Araneidae, are often used as models in natural spider silk research. The average mechanical properties of their dragline silks reach approximately 1 GPa for breaking strength, 30% for breaking strain, and 130 MJ/m3 for toughness (4). Numerous works have reported the use of recombinant protein and artificial fiber spinning to produce artificial spider silk in genetically optimized organisms (5-10), but it remains challenging to fully reproduce and rival the mechanical properties of natural 15 silks (6, 11-13). The difficulties are multifactorial; the unusually large protein size and the repetitive nature of its sequence are inherent challenges for synthesis, and the natural conditions of spinning are only beginning to be fully uncovered (13-15). However, one primary reason that full reproduction has not been successful is that the previous recombinant approaches employed only MaSp1, only MaSp2, or at most a combination of the two (11, 16-18). In fact, it is often suggested that dragline silk is composed primarily of 20 two components, MaSp1 and MaSp2 (5, 19-23). However, recent proteome analyses suggest the existence of additional components in spider silks, such as a cysteine-rich protein (CRP) in black widows (24-26). Genome and transcriptome analyses have identified many MaSp families (27, 28), and in the Araneus, it is known that dragline silk contains nearly equal amounts of MaSp3 and MaSp1/2 (28). Furthermore, low molecular weight (LMW) non-spidroin proteins, such as spider-silk constituting element 25 (SpiCE), have been found by transcriptomic and proteomic analyses. SpiCE is a protein of unknown function that is commonly highly expressed in the silk gland and in spider silk. It is becoming apparent that dragline silk is a complex multicomponent material containing much more than MaSp1 and MaSp2.

To pinpoint the protein constituents of dragline silk through quantitative proteomics, a high-quality reference 30 genome and full-length coding sequence annotation are essential to allow the correct and comprehensive identification of the proteins corresponding to the detected peptide fragments. Since spider fibroin genes are extremely long (~10 kbp) and consist almost entirely of repeat sequences (29, 30), genome sequencing using PCR-free long reads is critical, and in order to eliminate false-positive annotations, predicted coding sequences need to be confirmed on the bases of conservation in closely related species and actual mRNA 35 expression in the silk gland. Hence, we took a multiomics approach to quantitatively identify the dragline protein constituents and the genomes of four golden silk orb-weavers (subfamily Nephilinae): Trichonephila clavata, , Trichonephila inaurata madagascariensis, and pilipes. These spiders are reported to produce high-performance dragline silk with average toughness values of 169, 131, 285, and 292 MJ/m3, respectively (Fig. S1). T. clavipes genome data have already been reported (27), but

2

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

we chose to construct ab initio assemblies, including for this species, since the existing genome is based on PCR-amplified sequencing for fibroins, and some fibroin gene sequences remain incomplete. Moreover, the existing T. clavipes assembly is suspected to be substantially contaminated, as the entirety of the longest scaffold has been identified as bacterial in origin (31).

5

Results

Draft genomes of four Nephilinae spiders 10 Spider genomes are large and complex, and de novo sequencing is challenging. We extracted genomic DNA from dissected leg and cephalothorax tissue of adult female individuals, conducted hybrid sequencing with short- and long-read sequence technologies (32) (see Methods). The 10X Genomics linked sequencing produced 702.57 M reads (106.09 Gb), 1,090 M reads (164.60 Gb), 1,010.42 M reads (152.57 Gb), and 900.42 M reads (135.96 Gb) from T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes, 15 respectively (S1 Table). PacBio sequencing for the T. clavata genome yielded 74.81 Gb of data with an average length of 8.24 kb and an N50 of 11.1 kb (S1 Table). Furthermore, 2.19 M reads (11.03 Gb), 4.70 M reads (41.18 Gb), 5.88 M reads (24.33 Gb), 2.92 M reads (8.48 Gb) were generated by Nanopore sequencing of gDNA from T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes, respectively (S1 Table). Total coverage was approximately 80X. Obtained sequence reads were assembled, polished, 20 corrected, and contaminant eliminated (see Methods) the resulting the number of scaffolds and N50 length in the T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes draft genomes were 39,792 (N50 of 112,957 bp), 21,438 (N50 of 738,925 bp), 28,263 (N50 of 231,352 bp), and 132,899 (N50 of 292,147 bp), respectively (Table 1, Fig. S3). The completeness of each assembled result was assessed by BUSCO (33). The completeness scores were 90.6%, 93.0%, 89.8%, and 92.9% for the eukaryote lineage (Table 1), 25 indicating the high quality of the obtained draft genomes. The tRNAs and ribosomal RNAs were annotated using tRNAscan-SE v2.0 (34) and Barrnap (https://github.com/tseemann/barrnap) with default parameters. In total, 8,008, 11,433, 8,679, and 1,837 tRNAs, and 248 (49 18S, 49 28S, 43 5.8S, and 107 5S), 106 (15 18S, 5 28S, 9 5.8S, and 77 5S), 128 (11 18S, 10 28S, 5 5.8S, and 102 5S), and 167 (8 18S, 8 28S, 6 5.8S, and 29 5S) rRNA copies were predicted in the T. clavata, T. clavipes, T. inaurata madagascariensis, and 30 N. pilipes genomes, respectively. The T. clavipes genome improves upon the previous assembly (27) in quality and completeness (longest contig, 4.65 Mbp; N50 length, 738 kbp) (Fig. S4 and SI Text).

The number of protein-coding genes predicted by BRAKER was initially 70,418, 517,373, 51,581, and 72,271. Redundant genes were eliminated by CD-HIT-EST (35) clustering with a nucleotide identity of 97%. 35 To obtain a functional gene set, we removed the genes with an expression level of less than 0.1 and unannotated genes. Finally, 18,822, 45,304, 16,899, and 17,127 functional protein-coding gene sets were obtained for T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes, respectively (Table 1).

3

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

BUSCO (v4.0.5) was used to determine the quality of our functional gene set using the eukaryote lineage, and the completeness scores were 91.4%, 87.5%, 91.0%, and 93.4% for the eukaryote lineage (Table 1).

Catalog of spidroins in four Nephilinae spiders 5 Spidroin genes are structured with a long repeat domain sandwiched between the nonrepetitive N/C- terminal domains and an extreme length, of 10 kbp or more (29, 30, 36, 37). Short-read sequencing and PCR amplification technology alone often cause fragmentation, collapse, and chimerization of spidroin genes. Therefore, long-read sequencing using MinION or GridION is an essential technique for finding spidroins because these technologies can sequence a region covering the entire length of the spidroin- 10 coding gene. However, the sequence of the repeat region is very delicate, and the sequence quality of long reads alone is not sufficiently high. We have contributed to the construction of the spidroin catalog using hybrid sequencing, which combines various sequencing technologies and has been reported previously (28, 32, 38). In a previous study, we reported the spidroin catalog, including complete spidroin CDS in A. ventricosus, and showed that known spidroin genes that were sequenced without using long reads were 15 mostly short (28). Furthermore, the collection of partial or repeat-domain-collapsed spidroin genes omitted at least one gene family. The new gene family of MaSp in A. ventricosus was found in a comparison of full- length sequences. Here, we used a hybrid method that combines the gDNA sequencing reads used for genome assembly and the a few hundred million reads obtained by cDNA and direct RNA sequencing to catalog the spidroin diversity in four Nephilinae spiders (S1 Table and Materials and Methods). Seven 20 orthologous groups (MaSp: major ampullate spidroin, MiSp: minor ampullate spidroin, AcSp: aciniform spidroin, Flag: flagelliform spidroin, AgSp: aggregate spidroin, PySp: pyriform spidroin, and CySp: cylindrical/tubuliform spidroin) are known as the typical spidroins in Araneoid orb-weaving spiders. We obtained all seven groups contain 2-3 families or subfamilies from four Nephilinae draft genomes, five of which were full length (MaSp, MiSp, AcSp, CySp, and PySp) (Fig. 1, Fig. S5 for details). 25 The spidroin genes were well conserved among Nephilinae spiders and followed the established phylogenetic relationship (Fig. S6 and S7). Interestingly, the spidroin catalog revealed a group of major ampullate spidroins distinct from MaSp families 1 and 2 that form a unique clade within Nephilinae. The N- terminal phylogeny suggests that this group forms a clade with the MaSp family 3 sequence of Araneidae 30 (Fig. S7), reflected by the characteristic DGGR motif, yet the N-terminus conservation is sufficiently low to define this as a unique group; thus, we named this gene MaSp3B based on a spidroin nomenclature (SI Text). Previously identified partial sequences (Sp-74867 and Sp-907) (27) belong to this group.

Expression profile of spidroin and SpiCE 35 With this high-quality reference of full-length spidroins, the existence of a large proportion of MaSp3B in the forcibly reeled dragline silk of T. clavata was immediately observable as a distinct band in the high molecular

4

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

weight (HMW) fraction on SDS-PAGE (Fig. 2A). The abundance ratios of the proteins were then quantified by intensity-based absolute quantification (iBAQ) (39) with shotgun proteomics of selectively reeled dragline silks (Fig. S9). These analyses showed consistent composition not only within the replicates but also among the four species analyzed (Fig. 2B). Among the spidroins, only MaSp family members were detected; 5 MaSp1, 2, and 3 accounted for approximately 90 wt% of the total in any spider species (Fig. 2B). Transcriptome analysis of silk glands confirmed the specific expression of MaSp3B in the major ampullate silk gland, similar to that of MaSp1/2 (Fig. 2C, Fig. S10, and Table S2 to S4). Both of these omics analyses also clearly demonstrated the consistent presence of non-spidroins, which was also confirmed by SDS- PAGE of the LMW fraction (Fig. 2A). Across the four Nephilinae species, these genes are consistently 10 expressed as mRNAs at an level equivalent to that of MaSp genes in the major ampullate silk gland and at lower levels in the minor ampullate silk gland (Fig 2C, Fig. S10). Three such proteins highly expressed in both silk (as protein) and the silk gland (as mRNA) were annotated as SpiCEs, following the identification of non-spidroin constituents in A. ventricosus dragline silk (28). These SpiCEs are conserved in Nephilinae (Fig. S11), but their sequences are considerably nonhomologous to those of A. ventricosus; therefore, we 15 named them SpiCE-NMa1, 2, and 4 (SpiCE Nephilinae Major Ampullate). Among them, SpiCE-NMa2 and SpiCE-NMA4 were CRP type with high cysteine content. The most abundant non-spidroin protein among the dragline silks was SpiCE-NMa1, at approximately 1 wt% (Fig. 2B).

SpiCE doubles the tensile strength of artificial silk-based material 20 To investigate the role of these proteins in the expression of the mechanical properties of Nephilinae dragline silks, we first produced composite films of recombinant MaSp family proteins. Recombinant MaSp was synthesized based on shortened amino acid sequences of approximately 50 kDa (Fig. S12 and S13). The transparency of silk-based films did not drastically decrease even when the composite films were prepared from two or three spidroins, suggesting that heterogeneous interactions among MaSps did not 25 induce specific aggregation (Fig. 3A). Wide-angle X-ray scattering (WAXS) profiles demonstrated that MaSp1, MaSp2, and MaSp3 predominantly form β-sheet structures with different d-spacings based on the peak intensities at 6 and 18 nm-1 (Fig. 3B); however, the composite films of MaSps showed similar β-sheet structures. There were noticeable differences among the β-sheet structures with different protein compositions, that is, with MaSp1 used alone, with the mix of three families, and with the natural 30 composition (Fig. 3B). We further tested the impact of SpiCE inclusion by producing composite films of recombinant MaSp and SpiCE-NMa1, which was approximately 5 times more abundant than other SpiCEs (Fig. 2B). Recombinant SpiCE-NMa1 was added in the range of 0 ~ 5 wt% based on proteome quantification to produce composite films, and the film transparency increased to 76.1 with 5 wt% SpiCE-NMa1 with MaSp compared to 72.5 with MaSp alone, suggesting a close interaction of the proteins in the composite film (Fig. 35 4A, Fig. S14). Surprisingly, the addition of SpiCE-NMa1 dramatically changed the mechanical properties of the composite film. Tension tests showed a twofold increase in the tensile strength of the MaSp film

5

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

containing SpiCE-NMa1 (Fig. 4, Fig. S15 and S16). The impact of SpiCE was also clearly observed in the recombinant composite silk mixed with recombinant MaSp and SpiCE-NMa1. Composite silk with SpiCE- NMa1 showed an increase in elongation at break, and notably, the stress-strain curve showed a clear drastic yield point only in the composite silk mixed with SpiCE-NMa1 (Fig. 4C). Natural spider silk is 5 characterized by a nonlinear response to external pressure, and softening at the yield point allows for the mechanical robustness of the spider web (40-44). Therefore, including a small percentage (~2%) of SpiCE- NMa1 in artificial silks could potentially be used to enhance their mechanical properties.

10 Discussion

MaSp3 is widely conserved in the large-web-forming group of the family Araneidae and has been found by proteomics to be a major component of dragline silk in A. ventricosus. The conservation of MaSp3 has been thought to be limited to closely related species of the genus Araneus (28), and it was believed that 15 Nephilinae spiders do not have MaSp3 even though they also build large webs. However, our high-quality genome analyses demonstrate that MaSp3B, a distinct subtype of MaSp3 in the genus Araneus, is conserved in the genera Trichonephila and Nephila. Further, our analyses suggest that MaSp3B is one of the major components of dragline silk and may associate with other MaSps at equivalent stoichiometry. This finding demonstrates the limitation of the conventional MaSp1/2-based strategy for artificial silk 20 synthesis and the importance of including other components, such as MaSp3, for the purpose of producing araneoid spider silk, which has particularly strong mechanical properties. Although the specific role of MaSp3 is still unclear, the key residues contributing to the N-terminal domain association, such as D40, K65, E79, and E119, are conserved in MaSp1-3 (Fig. S17). Therefore, it is possible that the formation of a heteromer with MaSp1/2 may stabilize the silk structure and improve its toughness.

25

The contribution of the low molecular weight component of the spider dragline silk, SpiCE, to its mechanical properties was also confirmed in vitro, in that inclusion of SpiCE doubled the tensile strength of the artificial film, but the exact mechanism responsible for this phenomenon is still an area of future work. Accessory components of structural proteins are often used to stabilize the structure; for example, elastin, another 30 known structural protein, binds with fibulin proteins to produce elastic fibers (45). Likewise, while a self- assembly system involving pH-induced fibrillization of the N-terminal domain and liquid-liquid phase separation (LLPS) of the C-terminal domain (14) is known to participate in spider silk fibril formation, SpiCE could serve to support the interaction between repeat regions rather than the terminal domains. Further observation of the interaction between SpiCE and spidroin with detailed biophysical analysis, such as 35 nuclear magnetic resonance (NMR) spectroscopy or atomic force microscopy (AFM) scanning, is desirable to elucidate the functions of SpiCE. The evolutionary origin of SpiCE also requires further analyses, as these proteins seem to be specific to a narrow range of spider clades. Unlike MaSp3, the conservation of

6

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

SpiCE has been confirmed only within very closely related groups (genus or subfamily at most), and CRP seems to be used instead of SpiCE in black widow spiders (26). The MaSps retain homology across different spider clades, but there is no sequence homology among SpiCEs, suggesting that MaSps and SpiCEs did not coevolve. The orb web size evolved in response to body size, environment, and prey, and 5 the dragline silk made by MaSp used for the orb web frame needed to be strong enough to match its size (46). To produce silks with stronger properties, mutations in the nonconserved repeat domain of the spidroin gene and diversification of multiple MaSp families by gene duplication must have occurred. In fact, multiple lineage-specific duplications of MaSp paralogs are observed even among orb-weaving spiders (27, 28, 47- 49). The evolution of SpiCE seems to follow this lineage-specific evolutionary pattern.

10

Considering the contributions of MaSp3 and SpiCE to the mechanical performance of spider silk, it is essential to consider silk as not just a product of MaSp1 and MaSp2 but rather a multicomponent material that also includes MaSp3 and SpiCE. The contribution of multiple paralogs of MaSp1 and MaSp2, which is not covered in this work, may also contribute to the specific ecological adaptation of Nephilinae spiders. 15 The availability of four closely related Nephilinae genomes and their gland transcriptomes as well as the dragline silk proteomes presented in this work will contribute to the research in this direction, as well as other phylogenomic work of .

20 Materials and Methods

Spider sample and rearing environment Trichonephila clavata and Nephila pilipes samples were collected in Japan, and Trichonephila inaurata madagascariensis was collected in . Trichonephila clavipes samples were purchased from 25 Spider Pharm Inc. All the spiders used in this study were adult females. The identification of spider specimens was performed based on morphological characteristics and sequence identification of cytochrome c oxidase subunit 1 (COX1) in the Barcode of Life Data System (BOLD: http://barcodinglife.org). The spiders used for dissection or silk reeling were kept in a PET cup (129 x 97 mm, PAPM340: RISUPACK CO., LTD) in a climate-controlled laboratory with 25.1 °C, 57.8% humidity and 12-12 h light-dark cycles. 30 The spiders were fed every two days with crickets purchased from Mito-korogi Farm, and water was given daily by softly spraying inside the cup. The legs and cephalothoraxes were used for gDNA extraction, and abdomens and silk glands were used for RNA extraction. All of the tissues were dissected from adult female spiders. Dissected samples were immersed in liquid nitrogen (LN2) and stored at -80 °C until the next process. 35 High molecular weight genomic DNA extraction Spider genomic DNA was extracted from dissected leg and cephalothorax tissue from adult female individuals using Genomic-tip 20/G (QIAGEN) following the manufacturer’s protocol. To obtain high

7

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

molecular weight (HMW) genomic DNA, all steps were conducted as gently as possible. The legs and cephalothorax were separated from flash-frozen spider specimens and then homogenized with a BioMasher II (Funakoshi) and mixed with 2 mL of Buffer G2 (QIAGEN) containing 200 µg/mL RNase A. After the addition of 50 µL Proteinase K (20 mg/mL), the lysate was incubated at 50 °C for 12 h on a shaker 5 (300 rpm) and centrifuged at 5,000 x g for 5 min at 4 °C. The aqueous phase was loaded onto a pre- equilibrated QIAGEN Genomic-tip 20/G (QIAGEN) by gravity flow and washed three times. The DNA was eluted in high-salt buffer (Buffer QF) (QIAGEN). Using isopropanol precipitation, we desalted and concentrated the eluted DNA and resuspended it in 10 mM Tris-HCl (pH 8.5). The extracted genomic DNA was checked for quality using a TapeStation 2200 instrument with genomic DNA Screen Tape (Agilent 10 Technologies) and quantified using a Qubit Broad Range dsDNA assay (Life Technologies). Size selection (> 10 kb) was performed with a BluePippin with High Pass Plus Gel Cassette (Sage Science).

RNA extraction, mRNA selection, and cDNA preparation Spiders and silk gland samples were stored in LN2 at -80 °C until RNA extraction. RNA isolation from the 15 dissected abdomen or silk glands in the abdomen was conducted based on a spider transcriptome protocol (50). Flash-frozen dissected abdomen tissue was immersed in 1 mL TRIzol Reagent (Invitrogen) along with a metal cone and homogenized with the Multi-Beads Shocker (Yasui Kikai). Phase separation was performed by the addition of chloroform, and the upper aqueous phase containing RNA was purified automatically with an RNeasy Plus Mini Kit (QIAGEN) in a QIAcube instrument (QIAGEN). The RNA was 20 quantified and checked for quality using a Qubit Broad Range RNA assay (Life Technologies) and NanoDrop 2000 (Thermo Fisher Scientific). The integrity was estimated by electrophoresis using a TapeStation 2200 instrument with RNA ScreenTape (Agilent Technologies). mRNA selection was performed using oligo d(T). For direct RNA sequencing, mRNA was prepared using the NucleoTrap mRNA Mini Kit (Clontech) from 500 ng of total RNA. cDNA was synthesized from mRNA isolated from 100 µg of

25 total RNA by NEBNext Oligo d(T)25 beads (skipping the Tris buffer wash step). The first- and second-strand cDNA were synthesized using ProtoScript II Reverse Transcriptase and NEBNext Second Strand Synthesis Enzyme Mix.

Library preparation and sequencing 30 For cDNA sequencing was carried out on the abdomens of T. clavata (n = 14), T. clavipes (n = 2), T. inaurata madagascariensis (n = 1) and N. pilipes (n = 14); in the major ampullate silk glands of T. clavata (n = 6), T. clavipes (n = 7), and N. pilipes (n = 5); and in the minor ampullate silk glands of T. clavata (n = 6), T. clavipes (n = 8), and N. pilipes (n = 6). All cDNA libraries were constructed according to the standard protocol of the NEBNext Ultra RNA Library Prep Kit for Illumina (New England BioLabs). The synthesized 35 double-stranded cDNA was end-repaired using NEBNext End Prep Enzyme Mix before ligation with NEBNext Adaptor for Illumina. After USER enzyme treatment, cDNA was amplified by PCR with the following conditions: 20 µL cDNA, 2.5 µL Index Primer, 2.5 µL Universal PCR Primer, 25 µL NEBNext Q5

8

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Hot Start HiFi PCR Master Mix 2X; 98 °C for 30 s and 12 cycles each of 98 °C for 10 s, 65 °C for 75 s and 65 °C for 5 min. cDNA sequencing was conducted with a NextSeq 500 instrument (Illumina) using 150-bp paired-end reads with a NextSeq 500 High Output Kit (300 cycles).

5 The synthetic long-read sequencing using 10X Genomics was carried out in all four Nephilinae spiders. Purified genomic DNA fragments longer than 60 kb (10 ng) were used to prepare the 10X GemCode library with the Chromium instrument and Genome Reagent Kit v2 (10X Genomics) following the manufacturer’s protocol. The 10X GemCode library sequencing was conducted with a NextSeq 500 instrument (Illumina) using 150-bp paired-end reads with a NextSeq 500 High Output Kit (300 cycles). 10 For PacBio sequencing, the library preparation was performed with gDNA fragments of more than 10 kb in size (selected by BluePippin) and sequenced in PacBio RSII using P6-C4 chemistry. PacBio library construction and sequencing were conducted at Genomic Information Research Center, Osaka University.

15 The gDNAs of T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes were also sequenced by Nanopore technology. Library preparation was completed following the 1D library protocol (SQK-LSK109, Oxford Nanopore Technologies). The quality of the libraries was estimated by TapeStation 2200 with D1000 Screen Tape (Agilent Technologies). Direct RNA sequencing was performed for T. clavata, T. clavipes, and N. pilipes. Libraries constructed from 500 ng of total RNA were prepared following the direct RNA 20 sequencing protocol (SQK-RNA001, Oxford Nanopore Technologies). Sequencing was performed using a GridION instrument with Spot On Flow Cell Rev D (FLO-MIN106D, Oxford Nanopore Technologies). The base calls were performed by Guppy basecalling software (version 3.2.10+aabd4ec).

Genome assembly 25 We adopted a hybrid assembly strategy for spider draft genomes that combined the various reads produced by each sequencing technology. Natural long reads were produced by direct sequencing of HMW genomic DNA with Nanopore or PacBio technology. The synthetic long reads were generated using a combination of Illumina and 10X Genomics technologies. The details of the assembly methods used for each of the four spider genomes are described below. 30 T. clavata: Long reads produced by PacBio RSII were assembled with Canu 1.4 (51) with default parameters. First, polishing was performed with Quiver in SMRT analysis 2.3.0 and further polished with pilon 1.22 (52). Synthetic long reads of the 10X GemCode library sequenced by a NextSeq 500 instrument were assembled by Supernova 2.1 with pseudohap2 output mode (53). The above two assemblies were 35 merged using SSPACE-LongRead v.1.1 with the parameters (-i 95 -o 2000 -l 1 -k 1) (54).

9

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

T. clavipes: Long reads from Nanopore sequencing were first quality filtered using NanoFilt with options -q 7 --headcrop 50 (55), and then assembled using Flye 2.7 with three polishing iterations (56). The assembly was further polished by pilon 1.23 (52) using 10X GemCode library sequencing reads for four rounds until the BUSCO score was saturated. 5 T. inaurata madagascariensis: Synthetic long reads of the 10X GemCode library sequenced by a NextSeq 500 instrument were assembled by Supernova 2.1.1 with pseudohap2 output mode (53). Assembled contigs were gap-filled using PBJelly (support option -m 45, blasr option -minMatch 8 -minPctIdentity 70 - bestn 1 -nCandidates 20 -maxScore -500 -nproc 32 -noSplitSubreads) (57) and polished by pilon v.1.23 10 (52) for three rounds. Long reads from Nanopore sequencing were first quality filtered using NanoFilt with options -q 7 --headcrop 50 (55), assembled using wtdbg v2.0 (58), and then further polished by pilon v.1.23 for three rounds. Two assemblies were merged using quickmerge with options -hco 5.0 -c 1.5 -l 100000 - ml 5000 (59).

15 N. pilipes: Synthetic long reads from the 10X GemCode library sequenced by a NextSeq 500 instrument were assembled by Supernova 2.0.0 with pseudohap2 output mode (53). The resulting assemblies were further merged and gap-filled using PBJelly (blasr option -minMatch 8 -minPctIdentity 70 -bestn 1 - nCandidates 20 -maxScore -500 -nproc 32 -noSplitSubreads) (57) and polished by pilon 1.22 (52).

20 Contaminant elimination and genome size estimation To detect possible contaminants in the genome, we submitted each genome assembly to BlobTools analysis (60). The genome sequence was submitted to a Diamond (61) BLASTX search (--sensitive --max- target-seqs 1 --evalue 1e-25) against the UniProt Reference proteome database (downloaded on 2018 Nov.). Each contig was classified into a lineage with BlobTools taxify. We then mapped DNA-Seq reads to 25 the genome by BWA MEM and conducted SAM to BAM conversion with SAMtools. These taxonomic annotation and coverage data were visualized by BlobTools following the protocol. Contigs that were classified as from bacteria, plants, or fungi were removed from the assembly, and the filtered genome assembly was assayed by Benchmarking Universal Single-Copy Orthologs (BUSCO) v4.0.5 (33) (eukaryote lineage) to validate genome completeness. Genome heterozygosity, repeat content, and size 30 were estimated based on the k-mer distribution with Jellyfish and GenomeScope (62).

Gene prediction and annotation Gene prediction was performed using a gene model created by cDNA-seq mapping data with HISAT2 and BRAKER (63, 64). cDNA-seq reads were mapped to the genome with HISAT2 (v 2.1.0), and the resulting 35 SAM file was converted, sorted, and indexed by SAMtools (v1.4). Repeat sequences were detected by RepeatModeler (1.0.11) and soft-masked by RepeatMasker (v4.0.7). The soft-masked genome was submitted to gene prediction with BRAKER (v2.1.4, --softmasking --gff3). The amino acid sequences were

10

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

submitted to BLASTP or Diamond BLASTP searches against public databases (UniProt TrEMBL, UniProt Swiss-Prot). Redundant genes were eliminated by CD-HIT-EST (35) clustering with a nucleotide identity of 97%. To obtain a functional gene set, we removed the genes with an expression level of less than 0.1 and unannotated genes. BUSCO (v4.0.5) was used to determine the quality of our functional gene set using the 5 eukaryote lineage. The tRNAs were annotated using tRNAscan-SE v2.0 (34) with default parameters. Ribosomal RNAs (rRNAs) were predicted using Barrnap (https://github.com/tseemann/barrnap).

Spidroin gene catalog We used a hybrid method that combines short- and long-read sequencing to catalog the spidroin diversity 10 in four Nephilinae spiders. The short reads were obtained from Illumina sequencing of cDNA or gDNA libraries, and the long reads were obtained by Nanopore or PacBio sequencing of the gDNA library. The typically used de Bruijn graph algorithm is not suitable for the assembly of long repeat domains, so the SMoC (Spidroin Motif Collection) algorithm (28), developed based on the OLC (Overlap-Layout- Consensus) algorithm, was used. The SMoC algorithm collects as many patterns of repeat motifs as 15 possible, scaffolds these repeat motifs based on long reads, and then provides the full-length spidroin gene sequence. SMoC first finds the N/C-terminal region (nonrepetitive region) contigs from the assembled contigs with a homology search. These terminal fragments were used as seeds for screening of the short reads obtained by an Illumina sequencer harboring an exact match of extremely large k-mers (approximately 100) up to the 5′-end. The obtained short reads were aligned on the seed sequence to 20 construct a PWM (position weight matrix) on the 3′-side of the matching k-mer, and the seed sequence was extended based on the PWM until there was a split in the graph using stringent thresholds. Therefore, neighboring repeats are not resolvable. After the comprehensive collection of repeat motif patterns, scaffolding was performed using long reads. The long reads were obtained by direct sequencing, so the actual gene length was guaranteed. Finally, mapping the repeat motifs to the corresponding long read and 25 curating the sequence yields the complete spidroin genes.

Phylogenetic tree of spidroin The phylogenetic tree in Fig. S6 was constructed based on a core ortholog gene set obtained from BUSCO 30 arthropoda_obd9 (33) using four Nephilinae draft genomes (this study) and the A. ventricosus draft genome (Ave_3.0) (28). The phylogenetic tree of spidroin (Fig. S7) was constructed by FastTree (v2.1.10) (65) as an approximately maximum-likelihood phylogenetic tree from aligned and trimmed N-terminal sequences (150 residues from the start codon) with MAFFT (v7.273) (66) and trimAl (1.2rev59) (67).

35 Accession numbers for known spidroin sequences are as follows: major ampullate spidroin 2-like (AAZ15322.1), and flagelliform silk protein (AAF36091.1) in T. inaurata madagascariensis; major ampullate spidroin 1A precursor, partial (ACF19411.1), major ampullate spidroin 1B precursor, partial (ACF19412.1),

11

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

major ampullate spidroin 2 precursor, partial (ACF19413.1), major ampullate spidroin protein MaSp-a (PRD23950.1), major ampullate spidroin protein MaSp-b (PRD18716.1), major ampullate spidroin protein MaSp-c (PRD18936.1), major ampullate spidroin protein MaSp-d (PRD23750.1), major ampullate spidroin protein MaSp-f isoform 1 (PRD27696.1), major ampullate spidroin protein MaSp-g (PRD24320.1), major 5 ampullate spidroin protein MaSp-h (PRD20448.1), minor ampullate spidroin protein MiSp-a (PRD23654.1), minor ampullate spidroin protein MiSp-b (PRD30268.1), minor ampullate spidroin protein MiSp-c (PRD24510.1), minor ampullate spidroin protein MiSp-d (PRD18914.1), flagelliform silk protein (AAC38846.1), aciniform spidroin protein AcSp (PRD26201.1), aggregate spidroin protein AgSp-a (PRD23399.1), aggregate spidroin protein AgSp-b (PRD22063.1), aggregate spidroin protein AgSp-c 10 (PRD26655.1), aggregate spidroin protein AgSp-d (PRD23989.1), tubuliform spidroin protein TuSp (PRD35275.1), flagelliform spidroin protein FLAG-a (PRD27227.1), flagelliform spidroin protein FLAG-b (PRD24772.1), piriform spidroin protein PiSp (PRD25616.1), and spidroin protein Sp-907 (PRD35000.1), and spidroin protein Sp-74867 (PRD19552.1) in T. clavipes; cylindrical silk protein 1 (BAE54451.1) in T. clavate; major ampullate spidroin 3 variant 1, partial (AWK58729.1) in Argiope argentata; Major ampullate 15 spidroin 3 (GBN25680.1) in A. ventricosus; aggregate spidroin 1 (QDI78451.1), aggregate spidroin 2 (QDI78449.1) in Argiope trifasciata; eggcase silk protein (ACI23395.1) in Nephila antipodiana; egg case silk protein 1 (BAE86855.1) in Argiope bruennichi.

Gene expression analysis 20 Gene expression profiling was conducted with multiple biological replicates from mRNA extracted from the abdomen or silk glands. The gene expression levels were quantified and normalized as transcripts per million (TPM) by mapping processed reads to our assembled draft genome references with Kallisto version 0.42.1 (68) (S2 to S4 Table).

25 Silk collection The natural spider silks were sampled directly from adult female T. clavata restrained using two pieces of sponge and locked with rubber bands. Silk reeling from the spider was performed at a constant speed (1.28 m/m for 1 h) with a reeling machine developed by Spiber Inc. with six biological replicates and three technical replicates each. Since the spinnerets are very close together and many silks may be spun at the 30 same time, simple silk reeling may involve minor ampullate silk as well. Therefore, to identify the proteins that were significantly present in minor ampullate silk and remove them as background, we separated the dragline and minor ampullate silks clearly from their spinnerets using a stereomicroscope (Leica S4E, Leica microsystems GmbH, Wetzlar, Germany) (Fig. S9).

35 SDS-PAGE analysis SDS-PAGE analysis of dragline silk was conducted at IDEA Consultants, Inc. The dragline silk reeled from T. clavate was dissolved in 2 mL of ionic liquid (1-butyl-3-methylimidazolium acetate) per mg. After vortexing

12

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

for 2 min, the sample was incubated at 100 °C for 15 min. The ionic liquid solution was dialyzed into lysis buffer (6 M urea, 2 M thiourea, 2% CHAPS, 1% DTT) using a 3k MWCO cellulose membrane (Amicon Ultra, Millipore).

5 The high molecular weight fraction of 0.33 mg dragline silk was dissolved in the ionic liquid and filtered through a 50k MWCO membrane (Amicon Ultra, Millipore), collected in 100 µL of lysis buffer, and incubated at 95 °C for 5 min. Then, 2.5 µL and 5 µL samples were applied to a 4% SDS-polyacrylamide gel. Electrophoresis was performed at 20 mA for 80 min in running buffer [25 mM Tris, 192 mM glycine, 0.1% SDS], and a HiMark Unstained Standard (LC5688, Life Technologies) was used as a molecular marker. 10 After electrophoresis, the SDS gel was fixed in fixing solution [30% MeOH, 5% acetate acid], stained with SYPRO Ruby overnight, and washed with a wash solution [10% MeOH, 7% acetic acid]. Silver staining was conducted after fixation with 50% MeOH and sensitization with 0.005% sodium thiosulfate (Fig. 2A).

A solution of 3.97 mg dragline silk dissolved in 80 µL of Tris-HCl (pH 8.6), 1% Triton-X, and 2% SDS was 15 vortexed, incubated for 1 h on ice, and freeze-dried. After resuspension, the total volume was applied to a 12.5% SDS-polyacrylamide gel. Electrophoresis was performed at 20 mA for 80 min, with Broad Range Protein Molecular Weight Markers (V8491, Promega) used as molecular markers. After electrophoresis, the SDS gel was fixed in fixing solution [30% MeOH, 5% acetic acid], stained with SYPRO Ruby overnight, and washed with a wash solution [10% MeOH, 7% acetic acid]. Silver staining was conducted after fixation 20 with 50% MeOH and sensitization with 0.005% sodium thiosulfate (Fig. 2A).

Liquid chromatography mass spectrometry (LC-MS) analysis Approximately 1.0 mg of dragline silk was used for the LC-MS analysis. After washing the T. clavipes or N.

pilipes dragline silk with 100 µL of washing buffer [50 mM NH4HCO3, 0.1% SDS], the silk was immersed in

25 50 µL of solution A [50 mM NH4HCO3, 500 mM DTT] and incubated at 60 °C for 1 h. The silk was further

incubated with 50 µL of solution B [50 mM NH4HCO3, 500 mM IAA] at room temperature for 30 min in the

dark and washed with 100 µL of 50 mM NH4HCO3. Silk protein digestion was carried out in 50 µL of

digestion buffer [10 ng/µL trypsin, 50 mM NH4HCO3] at 37 °C overnight. The digested sample was incubated with 250 µL of 0.5% formic acid on a rotator and purified using a MonoSpin C18 column (GL Science). 30 The dragline silk of T. clavata or T. inaurata madagascariensis was immersed in 200 µL of lysis buffer [6 M guanidine-HCl, pH 8.5] per mg dragline silk and frozen in liquid nitrogen. After 10 cycles of sonication with 60 sec on/off at high level using Bioruptor II (BM Equipment), the protein was quantified with Qubit Protein assay (Life Technologies) and Pierce BCA Protein Assay Kit (Thermo Scientific). The lysate [50 µg 35 protein/50 µL] was incubated with 0.5 µL of 1 M DTT solution at 37 °C for 30 min followed by 2.5 µL of 1 M

IAA solution at 37 °C for 30 min in the dark. After five-fold dilution with 50 mM NH4HCO3, protein digestion was performed by 1 µg of Lys-C (Wako) at 37 °C for 3 h following 1 µg of trypsin (Promega, Madison, WI,

13

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

USA) at 37 °C for 16 h. The sample was acidified by TFA and desalted by C18-StageTips (69). LC-MS analysis of T. clavipes (n = 6) and N. pilipes (n = 9) samples was performed with an UltiMate 3000 NanoLC Pump (Dionex Co., Sunnyvale, CA, USA) and an LTQ Orbitrap XL ETD (Thermo Electron, San Jose, CA, USA). A 5 µg aliquot of digested sample was injected into a spray needle column [ReproSil-Pur C18-AQ, 3 5 µm, Dr. Maisch, Ammerbuch-Entringen, Germany, 100 µm i.d.×130 mm] and separated by linear gradient elution with three mobile phases, A [0.5% acetic acid in water], B [0.5% acetic acid in acetonitrile] and C [0.5% acetic acid in DMSO], at a flow rate of 500 nL/min. The composition of mobile phase C was kept at 4%. The composition of mobile phase B was changed from 0-4% in 5 min, 4-24% in 60 min and 24-76% in 5 min, followed by keeping at 76% in 10 min. The separated peptides were ionized at 2600 V and detected 10 as peptide ions (scan range: m/z 300-1500, mass resolution: 60000 at m/z 400). The top 10 peaks of multiple charged peptide ions were subjected to collision-induced dissociation (isolation width: 2, normalized collision energy: 35 V, activation Q: 0.25, activation time: 30 s).

The samples of T. clavata (n = 7) and T. inaurata madagascariensis (n = 1) were analyzed with a nanoElute 15 and a timsTOF Pro (Bruker Daltonics, Bremen, Germany). A 200 ng aliquot of digested sample was injected into a spray needle column [ACQUITY UPLC BEH C18, 1.7 µm, Waters, Milford, MA, 75 µm i.d.×250 mm] and separated by linear gradient elution with two mobile phases, D [0.1% formic acid in water] and E [0.1% formic acid in acetonitrile], at a flow rate of 280 nL/min. The composition of mobile phase E was increased from 2% to 35% in 100 min, changed from 35% to 80% in 10 min and kept at 80% for 10 min. The separated 20 peptides were ionized at 1600 V and analyzed by parallel accumulation serial fragmentation (PASEF) scan (70). Briefly, the PASEF scan was performed at ion mobility coefficients (1/K0) ranging from 0.6 Vs/cm2 to 1.6 Vs/cm2 within a ramp time of 100 msec, keeping the duty cycle at 100%. An MS scan was performed in the mass range from m/z 100 to m/z 1700, followed by 8 PASEF-MS/MS scans per cycle. Precursor ions were selected from the 12 most intense ions in a TIMS-MS survey scan (precursor ion charge: 0-5, intensity 25 threshold: 1250, target intensity: 10000). In addition, a polygon filter was applied to the m/z and ion mobility plane to select the most likely representative peptide precursors without singly charged ions. CID was performed with the default settings (isolation width: 2 Th at m/z 700 and 3 Th at m/z 800, collision energy: 20 eV at 1/k0 0.6 Vs/cm2 and 59 eV at 1/k0 1.6 Vs/cm2). LC-MS data of T. clavipes (n = 6) and N. pilipes (n = 9) samples were analyzed using PEAKS studio X+ (Bioinformatics Solutions Inc., Canada) with the 30 following conditions (71). Briefly, de novo sequencing and database searches were performed with an error tolerance of 6 ppm for precursor ions and 0.5 Da for fragment ions. The enzyme was set to trypsin, and up to 2 missed cleavages were allowed. Carbamidomethylation at cysteine residues was set as a fixed modification. N-acetylation at the protein N-terminus and oxidation at the methionine residue were set as variable modifications, allowing for up to 3 positions per peptide. A protein sequence database generated 35 from our draft genome was used for identification. The MaxQuant (version 1.6.10.43) (72) contaminant database (245 entries, major experimental contaminants) was used to detect contaminants. The criterion of identification was set to less than 1% FDR at the peptide-spectrum match (PSM) level. The feature area

14

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

of each identified peptide ion was calculated automatically with the PEAKS software algorithm. The intensity-based absolute quantification (iBAQ) (39) value of each identified protein was calculated from the feature area values. LC-MS data of T. clavata (n = 7) and T. inaurata madagascariensis (n = 1) were also analyzed using the same PEAKS software. De novo sequencing and database search were performed with 5 an error tolerance of 20 ppm for precursor ions and 0.05 Da for fragment ions. The protein sequence database obtained from our draft genome was used for protein identification. Other conditions were the same as described above.

To eliminate the possibility that other silks might have contaminated reeled dragline silk samples, we 10 removed background data. We separated the dragline and minor ampullate silks clearly from the spinneret by microscopy and used them for LC-MS analysis to identify the background proteins. Using the iBAQ score, we identified the proteins specifically expressed in the minor ampullate silk based on significance (FDR < 0.01) and removed them from the result of the dragline silk proteome analysis as contaminants. The percentage of each protein in silk was calculated based on the iBAQ multiplied by the amino acid length. 15 Recombinant protein expression and purification Sequence-optimized recombinant T. clavata MaSp family and SpiCE-NMa1 proteins were used for the composite film and silk. Targeting a size of approximately 50 kDa, we reduced the number of repeat units to 14 for MaSp1, 9 for MaSp2, and 8 for MaSp3 while keeping the N- and C-terminal domains (Fig. S12). 20 The recombinant spidroin genes are composed of a 6x His tag (MHHHHHH), a linker (SSGSS), an HRV 3C protease recognition site (LEVLFQGP), an N-terminal domain, repeat units, and a C-terminal domain. The SpiCE-NMa1 gene is also composed of a 6x His tag, a linker (SSGSS), an HRV 3C protease recognition site and a T. clavata comp1999 protein sequence. Each signal peptide was removed based on SignalP-5.0 (73). Nucleotide sequences were adjusted to the codon usage of E. coli. Fragmented 25 sequences were chemically synthesized by Fasmac Co., Ltd. (Atsugi, Japan). The synthesized sequences were assembled with overlap extension PCR (74). Assembled sequences were cloned into pET-22b(+) and

transformed into E. coli BLR (DE3). A 100 mL bacterial culture (5.0 g/L glucose, 4.0 g/L KH2PO4, 6.0 g/L

yeast extract and 0.1 g/L ampicillin) was grown at 30 °C to an OD600 of 5.0 and used for inoculation. The cells were cultured in a 10 L jar fermenter (Takasugi Seisakusho Co. Ltd., Tokyo, Japan) containing 5.7 L

30 of medium (12.0 g/L glucose, 9.0 g/L KH2PO4, 15 g/L yeast extract, 0.04 g/L FeSO4 · 7H2O, 0.04 MnSO4 ·

5H2O, 0.04 CaCl2 · 2H2O, GD-113 (Nof Corporation, Japan)) at 37 °C. The initial OD600 was 0.05. After the initial glucose was consumed, a feeding solution containing 65.9 w/v% glucose was pumped into the fermenter at a feeding rate of 50 mL/h. The dissolved oxygen concentration was kept at 20% air saturation. The pH was kept at pH 6.9. After 24 h, protein expression was induced by adding isopropyl-β-D- 35 thiogalactoside (IPTG) to a final concentration of 0.1 mM. The cells were harvested by centrifugation at 11,000 x g for 15 min at 24 h after induction. For the recombinant spidroins, the harvested cells were washed out with 20 mM Tris-HCl, pH 7.4, and centrifuged at 11,000 x g, and the supernatant was discarded.

15

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

The cells were resuspended in 20 mM Tris-HCl, pH 7.4, 100 mM NaCl, pH 8.0 with 0.9 µg of DNase per wet cell weight (g), 82 µg of lysozyme per wet cell weight (g) and 0.2 mL of phenylmethylsulfonyl fluoride (PMSF) per wet cell weight (g). Cells were agitated overnight at 37 °C. The cells were resuspended in buffer solution (50 mM Tris-HCl, 100 mM NaCl, pH 8.0) with 3 w/v% sodium dodecyl sulfate (SDS), and the cell 5 suspension was centrifuged at 11,000 x g for 30 min at room temperature. The pellets were dissolved in DMSO containing 1 M LiCl at 60 °C for 30 min. The solution was centrifuged at 11,000 x g for 30 min at room temperature. Ethanol precipitation was performed at room temperature. The mixture was centrifuged at 11,000 x g for 10 min. The precipitate was washed with reverse-osmosis purified (RO) water three times, followed by lyophilization. For SpiCE-NMa1, the harvested cells were suspended in sodium phosphate 10 buffer (50 mM sodium phosphate, 300 mM NaCl, pH 7) with 7.5 M urea, PMSF and DTT and then lysed by a high-pressure homogenizer (GEA Niro Soavi), followed by centrifugation. The supernatant was filtered through a 0.44 µm filter and loaded onto a column packed with Nuvia IMAC (Bio-Rad). After loading, the resin was washed with 100 mM sodium phosphate buffer with 7.5 M urea, DTT, Triton-X100 and 15 mM imidazole and then eluted using a linear gradient of imidazole concentrations from 15 mM to 500 mM. The 15 eluents were collected and dialyzed against buffer A (20 mM Tris-HCl, 7.5 M urea, 50 mM NaCl, DTT and Triton-X100, pH 7.4). The dialyzed sample was loaded onto a 5 mL Bio-Scale Macro-Prep High Q Column (Bio-Rad). The resin was washed with 20 mM Tris-HCl buffer containing 7.5 M urea, 80 mM NaCl, 1 mM DTT and Triton-X100, pH 7.4, and then eluted using a linear gradient of NaCl concentration from 80 mM to 1 M. The eluents were collected and loaded onto Nuvia IMAC resin again and eluted using a linear gradient 20 of imidazole concentration from 15 mM to 500 mM. The eluents were dialyzed against RO water and then lyophilized.

Artificial recombinant films HIFP was used as the solvent for the film synthesis. Recombinant MaSps were mixed in 8 mL HFIP to a 25 total of 100 mg and then dried at room temperature for 16 h in a dish (Fig. S13). Composite films containing SpiCE-NMa1 were produced by mixing the MaSp2 solution with recombinant SpiCE-NMa1 at 1, 3, or 5 wt% each.

Artificial recombinant silks 30 Recombinant MaSp2 protein of 22 wt% (1.66 g) was dissolved in 5.89 g DMSO (4 wt% LiCl) and incubated at 90 °C for 8 h. For composite silk containing SpiCE-NMa1, recombinant MaSp2 protein of 22 wt% and

recombinant SpiCE-NMa1 of 0.012 g were dissolved in 4.2 g DMSO. The dope was aspirated with a N2 pump and D=0.1 mm needle at 70 °C and spun directly into baths containing MeOH/DMSO 30 v/v% (5 °C), MeOH (25 °C), and water (25 °C). Only MaSp2/SpiCE-NMa1 composite silk could be produced because of 35 the low yield of recombinant MaSp1 and MaSp3 proteins.

Wide-angle X-ray scattering (WAXS) measurement

16

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

The crystalline samples were characterized by a synchrotron WAXS measurement at the BL45XU beamline of SPring-8, Harima, Japan (75). The X-ray energy was set at 12.4 keV (wavelength: 0.1 nm), and the distance between the sample and the detector was approximately 257 mm. For the diffraction patterns, an exposure time of 10 sec was used. Using Fit2D software (76), the obtained data were converted into one- 5 dimensional radial integration profiles and corrected by subtracting the background scattering. The degree of crystallinity was calculated from the area of the crystal peaks divided by the combined area of the crystal peaks and the amorphous halo by fitting the Gaussian function using Igor Pro 6.3.

Measurement of mechanical properties 10 The tensile properties of the fibers were measured using an EZ-LX universal tester (Shimadzu, Kyoto, Japan) with a 1 N load cell at a strain rate of 10 mm/min (0.033 s-1) at 25 °C and 48% relative humidity. For each tensile test, the cross-sectional area of an adjacent section of the fiber was calculated based on the SEM images. The transparency of the film was measured at selected wavelengths between 350 and 600 nm using a UV instrument. Each fiber was attached to a rectangular piece of cardboard with a 5 mm 15 aperture using 95% cyanoacrylate.

Tensile tests of the recombinant silk fibers were performed using a mechanical testing apparatus (Automatic Single-Fiber Test System FAVIMAT+, Textechno H. Stein GmbH & Co. KG, Mönchengladbach, Germany) with a 210 cN load cell at a strain rate of 10 mm/min (0.016 s-1) at 20 °C and RH 65%. From each fiber 20 sample, diameters were measured at different sites (n = 6) by microscopic observation (Nikon Eclipse LV100ND, lens 150x0) to calculate the cross-sectional area of an adjacent section of the fibers.

Acknowledgments 25 The authors thank Akio Tanikawa for morphological identification of spiders and helpful comments about phylogenetic discussion and thank the Malagasy Institute for the Conservation of Tropical Environments (MICET), the Ministry of Environment and Sustainable Development of Madagascar, Mention Zoologie et Biologie Animale (MZBA), Université d’Antananarivo, and Madagascar National Parks (MNP) for their 30 cooperation and permission to conduct sampling in Madagascar. The authors also thank Yuki Takai, Nozomi Abe, Yuki Onozawa, and Naoko Ishii for technical support in the sequencing and proteome analysis, Hongfang Chi, Haruka Funayama, Kaori Yaosaka, Ryota Sato, and Hironori Yamamoto for technical support in the fiber spinning and analysis of fibers, Hiroshi Kano, Ryoko Sato, and Hideki Nishijima for the development of reeling machines, and Hiroyasu Masunaga for his technical support in 35 SPring-8 measurements. H.N., M.T., K.N., and A.K. are supported by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). N.K. is supported by a Nakatsuji Foresight Foundation Research Grant, the Sumitomo Foundation (190426), The Uehara Memorial Foundation, Grant-in-Aid for Scientific Research (B). N.K., M.M., M.T., and A.K. are supported by research funds from the Yamagata Prefectural Government and Tsuruoka City, Japan. Y.Y. is 40 supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP18J21155, and K.N. is financially supported by a Grant-in-Aid for Transformative Research Areas (B).

References

17

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1. N. C. Abascal, L. Regan, The past, present and future of protein-based materials. Open Biol 8 (2018). 2. F. G. Omenetto, D. L. Kaplan, New opportunities for an ancient material. Science 329, 528-531 5 (2010). 3. K. Numata, How to define and study structural proteins as biopolymer materials. Polymer Journal 52, 1043-1056 (2020). 4. B. O. Swanson, T. A. Blackledge, J. Beltrán, C. Y. Hayashi, Variation in the material properties of spider dragline silk across species. Applied Physics A 82, 213-218 (2006). 10 5. F. Teule et al., A protocol for the production of recombinant spider silk-like proteins for artificial fiber spinning. Nat. Protoc. 4, 341-355 (2009). 6. X. X. Xia et al., Native-sized recombinant spider silk protein produced in metabolically engineered Escherichia coli results in a strong fiber. Proc. Natl. Acad. Sci. U. S. A. 107, 14059- 14063 (2010). 15 7. S. R. Fahnestock, L. A. Bedzyk, Production of synthetic spider dragline silk protein in Pichia pastoris. Appl. Microbiol. Biotechnol. 47, 33-39 (1997). 8. D. Huemmerich et al., Novel assembly properties of recombinant spider dragline silk proteins. Curr. Biol. 14, 2070-2074 (2004). 9. J. Scheller, K. H. Guhrs, F. Grosse, U. Conrad, Production of spider silk proteins in tobacco and 20 potato. Nat. Biotechnol. 19, 573-577 (2001). 10. A. Lazaris et al., Spider silk fibers spun from soluble recombinant silk produced in mammalian cells. Science 295, 472-476 (2002). 11. C. Guo, C. Li, X. Mu, D. L. Kaplan, Engineering silk materials: From natural spinning to artificial processing. Applied Physics Reviews 7, 011313 (2020). 25 12. C. H. Bowen et al., Recombinant Spidroins Fully Replicate Primary Mechanical Properties of Natural Spider Silk. Biomacromolecules 19, 3853-3860 (2018). 13. M. Saric, L. Eisoldt, V. Doring, T. Scheibel, Interplay of Different Major Ampullate Spidroins during Assembly and Implications for Fiber Mechanics. Adv Mater 10.1002/adma.202006499, e2006499 (2021). 30 14. A. D. Malay et al., Spider silk self-assembly via modular liquid-liquid phase separation and nanofibrillation. Sci Adv 6 (2020). 15. M. Andersson, J. Johansson, A. Rising, Silk Spinning in Silkworms and Spiders. Int. J. Mol. Sci. 17 (2016). 16. T. Scheibel, Spider silks: recombinant synthesis, assembly, spinning, and engineering of synthetic 35 proteins. Microb Cell Fact 3, 14 (2004). 17. M. B. Hinman, J. A. Jones, R. V. Lewis, Synthetic spider silk: a modular fiber. Trends Biotechnol. 18, 374-379 (2000). 18. C. Belbeoch, J. Lejeune, P. Vroman, F. Salaun, Silkworm and spider silk electrospinning: a review. Environ. Chem. Lett. 10.1007/s10311-020-01147-x, 1-27 (2021). 40 19. S. J. Blamires, T. A. Blackledge, I. M. Tso, Physicochemical Property Variation in Spider Silk: Ecology, Evolution, and Synthetic Production. Annu. Rev. Entomol. 62, 443-460 (2017). 20. J. E. Garb, T. Dimauro, V. Vo, C. Y. Hayashi, Silk genes support the single origin of orb webs. Science 312, 1762 (2006). 21. O. Tokareva, V. A. Michalczechen-Lacerda, E. L. Rech, D. L. Kaplan, Recombinant DNA production 45 of spider silk proteins. Microb Biotechnol 6, 651-663 (2013). 22. Z. Jaleel et al., Expanding Canonical Spider Silk Properties through a DNA Combinatorial Approach. Materials (Basel) 13 (2020).

18

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

23. L. Eisoldt, A. Smith, T. Scheibel, Decoding the secrets of spider silk. Materials Today 14, 80-86 (2011). 24. R. C. Chaw, S. M. Correa-Garhwal, T. H. Clarke, N. A. Ayoub, C. Y. Hayashi, Proteomic Evidence for Components of Spider Silk Synthesis from Black Widow Silk Glands and Fibers. J. Proteome 5 Res. 14, 4223-4231 (2015). 25. C. Larracas et al., Comprehensive Proteomic Analysis of Spider Dragline Silk from Black Widows: A Recipe to Build Synthetic Silk Fibers. Int. J. Mol. Sci. 17 (2016). 26. T. Pham et al., Dragline silk: a fiber assembled with low-molecular-weight cysteine-rich proteins. Biomacromolecules 15, 4073-4081 (2014). 10 27. P. L. Babb et al., The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression. Nat. Genet. 49, 895-903 (2017). 28. N. Kono et al., Orb-weaving spider Araneus ventricosus genome elucidates the spidroin gene catalogue. Sci. Rep. 9, 8380 (2019). 29. J. Gatesy, C. Hayashi, D. Motriuk, J. Woods, R. Lewis, Extreme diversity, conservation, and 15 convergence of spider silk fibroin sequences. Science 291, 2603-2605 (2001). 30. C. Y. Hayashi, R. V. Lewis, Molecular architecture and evolution of a modular spider silk protein gene. Science 287, 1477-1479 (2000). 31. M. Steinegger, S. L. Salzberg, Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020). 20 32. N. Kono, K. Arakawa, Nanopore sequencing: Review of potential applications in functional genomics. Dev. Growth Differ. 10.1111/dgd.12608 (2019). 33. F. A. Simao, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, E. M. Zdobnov, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212 (2015). 25 34. T. M. Lowe, S. R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997). 35. W. Li, A. Godzik, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006). 36. N. A. Ayoub, J. E. Garb, R. M. Tinghitella, M. A. Collin, C. Y. Hayashi, Blueprint for a high- 30 performance biomaterial: full-length spider dragline silk genes. PLoS One 2, e514 (2007). 37. C. Y. Hayashi, T. A. Blackledge, R. V. Lewis, Molecular and mechanical characterization of aciniform silk: uniformity of iterated sequence modules in a novel member of the spider silk fibroin gene family. Mol. Biol. Evol. 21, 1950-1959 (2004). 38. N. Kono et al., The bagworm genome reveals a unique fibroin gene that provides high tensile 35 strength. Commun Biol 2, 148 (2019). 39. B. Schwanhausser et al., Global quantification of mammalian gene expression control. Nature 473, 337-342 (2011). 40. S. W. Cranford, A. Tarakanova, N. M. Pugno, M. J. Buehler, Nonlinear material behaviour of spider silk yields robust webs. Nature 482, 72-76 (2012). 40 41. G. G. Kerr et al., Mechanical properties of silk of the Australian golden orb weavers Nephila pilipes and Nephilaplumipes. Biol Open 7 (2018). 42. R. Madurga et al., Material properties of evolutionary diverse spider silks described by variation in a single structural parameter. Sci. Rep. 6, 18991 (2016). 43. J. Sirichaisit, R. J. Young, F. Vollrath, Molecular deformation in spider dragline silk subjected to 45 stress. Polymer 41, 1223-1227 (2000). 44. F. Vollrath, D. Porter, Spider silk as archetypal protein elastomer. Soft Matter 2, 377-385 (2006). 45. J. E. Wagenseil, R. P. Mecham, New insights into elastic fiber assembly. Birth Defects Res C Embryo Today 81, 229-240 (2007).

19

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

46. I. Agnarsson, M. Kuntner, T. A. Blackledge, Bioprospecting finds the toughest biological material: extraordinary silk from a giant riverine orb spider. PLoS One 5, e11234 (2010). 47. N. Kono, H. Nakamura, M. Mori, M. Tomita, K. Arakawa, Spidroin profiling of cribellate spiders provides insight into the evolution of spider prey capture strategies. Sci. Rep. 10, 15721 (2020). 5 48. J. E. Garb et al., The transcriptome of Darwin's bark spider silk glands predicts proteins contributing to dragline silk toughness. Commun Biol 2, 275 (2019). 49. M. M. Sheffer et al., Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation. Gigascience 10 (2021). 10 50. N. Kono, H. Nakamura, Y. Ito, M. Tomita, K. Arakawa, Evaluation of the impact of RNA preservation methods of spiders for de novo transcriptome assembly. Mol. Ecol. Resour. 16, 662-672 (2016). 51. S. Koren et al., Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722-736 (2017). 15 52. B. J. Walker et al., Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014). 53. N. I. Weisenfeld, V. Kumar, P. Shah, D. M. Church, D. B. Jaffe, Direct determination of diploid genome sequences. Genome Res. 27, 757-767 (2017). 54. M. Boetzer, W. Pirovano, SSPACE-LongRead: scaffolding bacterial draft genomes using long read 20 sequence information. BMC Bioinformatics 15, 211 (2014). 55. W. De Coster, S. D'Hert, D. T. Schultz, M. Cruts, C. Van Broeckhoven, NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666-2669 (2018). 56. M. Kolmogorov, J. Yuan, Y. Lin, P. A. Pevzner, Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540-546 (2019). 25 57. A. C. English et al., Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7, e47768 (2012). 58. J. Ruan, H. Li, Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155-158 (2020). 59. M. Chakraborty, J. G. Baldwin-Brown, A. D. Long, J. J. Emerson, Contiguous and accurate de novo 30 assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 44, e147 (2016). 60. D. Laetsch, M. Blaxter, BlobTools: Interrogation of genome assemblies [version 1; peer review: 2 approved with reservations]. F1000Research 6 (2017). 61. B. Buchfink, C. Xie, D. H. Huson, Fast and sensitive protein alignment using DIAMOND. Nat 35 Methods 12, 59-60 (2015). 62. G. W. Vurture et al., GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202-2204 (2017). 63. D. Kim, J. M. Paggi, C. Park, C. Bennett, S. L. Salzberg, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907-915 (2019). 40 64. K. J. Hoff, S. Lange, A. Lomsadze, M. Borodovsky, M. Stanke, BRAKER1: Unsupervised RNA-Seq- Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767-769 (2016). 65. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010). 45 66. K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772-780 (2013). 67. S. Capella-Gutierrez, J. M. Silla-Martinez, T. Gabaldon, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973 (2009).

20

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

68. N. L. Bray, H. Pimentel, P. Melsted, L. Pachter, Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525-527 (2016). 69. J. Rappsilber, Y. Ishihama, M. Mann, Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. 5 Chem. 75, 663-670 (2003). 70. F. Meier et al., Online Parallel Accumulation-Serial Fragmentation (PASEF) with a Novel Trapped Ion Mobility Mass Spectrometer. Mol. Cell. Proteomics 17, 2534-2545 (2018). 71. J. Zhang et al., PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics 11, M111 010587 (2012). 10 72. J. Cox, M. Mann, MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367-1372 (2008). 73. H. Nielsen, K. D. Tsirigos, S. Brunak, G. von Heijne, A Brief History of Protein Sorting Prediction. Protein J 38, 200-216 (2019). 15 74. A. V. Bryksin, I. Matsumura, Overlap extension PCR cloning: a simple and reliable way to create recombinant plasmids. Biotechniques 48, 463-465 (2010). 75. K. Yazawa, K. Ishida, H. Masunaga, T. Hikima, K. Numata, Influence of Water Content on the beta-Sheet Formation, Thermal Stability, Water Removal, and Mechanical Properties of Silk Materials. Biomacromolecules 17, 1057-1066 (2016). 20 76. J. Chang, G. S. Mageras, C. C. Ling, Evaluation of rapid dose map acquisition of a scanning liquid- filled ionization chamber electronic portal imaging device. Int. J. Radiat. Oncol. Biol. Phys. 55, 1432-1445 (2003).

Figures and Tables 25

21

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Trichonephila inaurata Trichonephila clavata Trichonephila clavipes Nephila pilipes madagascariensis

MaSp1A1 MaSp1B1 MaSp1C1 MaSp1D1 MaSp2A1 MaSp2B1 MaSp3B1 MaSp3C1 MiSp1A1 MiSp1B1 MiSp1B2

PySp1A1

AcSp1A1

AgSp1A1

AgSp2B1

TuSp1A1

Flag1A1

Figure 1. Catalog of spidroins in four Nephilinae spiders

This summary panel shows the spidroin sequence characteristics and structures obtained from four Nephilinae draft genomes. The icons in the first column represent spidroin groups. The second

5 column represents the motif variety in the domains, as follows: b-sheet (An, (GA)n, and AS), blue;

b-turn (XQQ, GPGQQ, and GPGXX), yellow; spacer (SSX and TTX), green; 310 helix (GGX), orange; and MaSp motif (DGGR, GGYGGRF, and GGYGGL), red. The spidroin structure columns show the N/C-terminal (yellow and blue box) and repeat domains, and each structure is drawn to scale. The number, width, and color of stripes reflect the number, size, and motif identify 10 of the repeats. Assembly gaps are represented by “N” (see Fig. S5 for details).

22

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Figure 2. Expression profile of spidroin and SpiCE in silk and silk glands

(A) SDS-PAGE of HMW and LMW fractions from 3.97 mg of T. clavata dragline silk dissolved in 80 µL Tris-HCl (pH 8.6) containing 1% Triton-X and 2% SDS. SDS-polyacrylamide gels (4%) of the HMW fraction 5 were stained with SYPRO Ruby or Silver after electrophoresis for 80 min at 20 mA. HiMark Unstained Standard (LC5688, Life Technologies) was used as a molecular marker. SDS-polyacrylamide gels (12.5%) of the LMW fraction were stained with SYPRO Ruby or Silver after electrophoresis for 80 min at 20 mA. Broad Range Protein Molecular Weight Marker (V8491, Promega Technologies) used as the molecular marker. (B) Average percentage of protein contents in dragline silk of T. clavata, T. clavipes, T. inaurata 10 madagascariensis, and N. pilipes. Each SpiCE-NMa1 was indicated in yellow. (C) Expression profiles from the major ampullate silk glands in T. clavata (n = 6), T. clavipes (n = 7), and N. pilipes (n = 5). The x-axes represent the median expression level in TPM (transcripts per million). The y-axes show the highly expressed genes in each gland. The yellow boxes indicate SpiCE-NMa1. The photomicrographs show typical silk gland samples dissected from each spider (see S2 to S4 Table for details).

15

23

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Figure 3. Physical properties of the artificial MaSp composite film

(A) Seven films were produced by the combination of recombinant MaSp family proteins (MaSp1, MaSp2, and MaSp3). The sequences of recombinant proteins are indicated in Fig. S11. T% 5 indicates transparency at a wavelength of 500 nm. (B) The WAXS profiles of the films with different MaSp compositions showing the one-dimensional radial integration profiles and two- dimensional patterns.

24

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Representative A Stress-strain curve Tensile testTensile testTensile testTensile test 50 + 1 wt% + 3 wt% + 5 wt% MaSp2 MaSp2 + 1 wt% Only MaSp2Only MaSp2Only MaSp2Only MaSp2 SpiCE-NMa11 wt% 19991 wt% 19991 wt% 19991 wt% 1999 3 wt% 1999SpiCE-NMa13 wt% 19993 wt% 19993 wt% 1999 5 wt% 19995 wt% 1999SpiCE-NMa15 wt% 19995 wt% 1999 40 + 3 wt% + 5 wt%

30

20

10 Tensile strength [MPa] strength Tensile T = 72.5 T = 77.4 T = 77.8 T = 76.1 0 0 2 4 6 Elongation [%] B MaSp2 (x5.5) MaSp2 + SpiCE-NMa1 (x5.5) MaSp2 + SpiCE-NMa1 (x6.0) #2784 #2776 #2777

→→→→Tensile strength increased by 2 times by adding MaSp2.Tensile strength increased by 2 times by adding MaSp2.Tensile strength increased by 2 times by adding MaSp2.Tensile strength increased by 2 times by adding MaSp2.

#2777 300

[MPa] 200 strength 100 Tensile Tensile strength [MPa] strength Tensile Tensile strength [MPa] strength Tensile [MPa] strength Tensile

0 0 10 20 30 40 50 Elongation [%] Elongation [%] Elongation[%]Elongation [%] Figure 4. SpiCEd film and silk

(A) Composite films mixing MaSp2 and SpiCE-NMa1. The stress-strain curve represents the typical tensile engineering stress-strain in MaSp2 and SpiCE composite films under different 5 SpiCE concentrations (0, 1, 3, 5 wt%). The transparency (%) of these films is shown as T in each picture (see Fig. S13-S15 for details). (B) Recombinant spider silks mixing MaSp2 and SpiCE- NMa1 with different draw ratios (5.5 or 6.0). Scanning electron microscopy (SEM) images show the cylindrical and smooth surface and cross-section of each recombinant spider silk. (C) Stress- strain curve of each recombinant spider silk. Black arrows indicate the yield points.

10

25

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Table 1. Assembly and annotation statistics of four spider draft genomes (T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes). Statistics Trichonephila clavata Trichonephila clavipes Trichonephila inaurata Nephila pilipes madagascariensis Genome v4 v4 v4 v4 Estimated genome 2.46 2.72 2.32 2.12 size (Gbp)a Assembly size (Gbp) 2.50 2.87 2.51 2.69 Scaffold number 39,792 21,438 28,263 132,899 Average scaffold 62,773 134,077 88,704 20,274 length (bp) Longest scaffold 1,797,358 4,645,559 2,667,260 4,220,928 length (bp) Shortest scaffold 1,010 150 1,150 451 length (bp) N50 (bp) (# of 112,957 (#5,596) 738,925 (#1,112) 231,352 (#2,910) 292,147 (#1,975) scaffolds in N50) N90 (bp) (# of 22,952 (#25,683) 104,003 (#4,898) 40,466 (#12,765) 6,691 (#34,939) scaffolds in N90) BUSCO v4.0.5 90.6 93.0 89.8 92.9 completeness (%)b C:90.6%[S:85.5%,D:5.1%],F:5. C:93.0%[S:92.2%,D:0.8%],F:2. C:89.8%[S:86.7%,D:3.1%],F:2. C:92.9%[S:89.8%,D:3.1%],F:3. 5%,M:3.9%,n:255 7%,M:4.3%,n:255 7%,M:7.5%,n:255 5%,M:3.6%,n:255 Genes Number of protein- 18,822 45,304 16,899 17,127 coding genes tRNAs 8,008 11,433 8,679 1,837 rRNAs 248 106 128 167 BUSCO v4.0.5 91.4 87.5 91.0 93.4 completeness (%)b C:91.4%[S:89.0%,D:2.4%],F:5. C:87.5%[S:85.5%,D:2.0%],F:8. C:91.0%[S:89.8%,D:1.2%],F:4. C:93.4%[S:91.0%,D:2.4%],F:5. 9%,M:2.7%,n:255 6%,M:3.9%,n:255 3%,M:4.7%,n:255 1%,M:1.5%,n:255 a Genome size was estimated based on k-mer (GenomeScope) b BUSCO analysis based on eukaryota lineage of protein-coding genes 5

26 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Supplementary Information for Multicomponent nature underlies the extraordinary mechanical properties of spider dragline silk

Nobuaki Kono, Hiroyuki Nakamura, Masaru Mori, Yuki Yoshida, Rintaro Ohtoshid, Ali D Malay, Daniel A Pedrazzoli Moran, Masaru Tomita, Keiji Numata, Kazuharu Arakawa

Kazuharu Arakawa Email: [email protected]

Supplementary Information Text

Improvement of the T. clavipes (NepCla1.0) genome and spidroins Genome improvement The first T. clavipes genome was reported in (1). This paper made a significant contribution to spidroin research, with the spidroin catalog of orb-weaving spiders constructed from the draft genome demonstrating that spidroins constitute a complex gene family. However, the sequencing of large spider genomes is very challenging, and classic technologies alone can obtain only limited information. In addition, the risk of contamination must always be considered in genome sequencing. Steinegger and Salzberg conducted a comprehensive analysis of public databases and reported millions of contaminants in genome sequences (2). In that report, they described that all 504 contigs (over 24 M residues) in the T. clavipes genome (NepCla1.0) were identified as possible contaminants (2). We also submitted the T. clavipes genome (NepCla1.0) to BlobTools analysis (3) to detect possible contaminants and showed that 4,935 scaffolds (over 44 Mb), including the top two long scaffolds (MWRG01000001.1 and MWRG01000002.1 and, total 2.89 Mbp), may be of bacterial origin (Fig. S4). In addition to contamination, the completeness of the genome assembly and the predicted gene set must be carefully evaluated. N50 and BUSCO are de facto benchmarks for evaluating the completeness and credibility of genome assembly or gene sets and should always be evaluated. The BUSCO completeness of the T. clavipes genome (NepCla1.0: GCA_002102615.1_NepCla1.0) and T. clavipes gene set (NepCla1.0: 50,846 genes published as MWRG01_proteins_060117) were 67.4% and 62.2%, respectively, for the Arthropoda lineage; they were 72.1% and 60.4%, respectively, even for the eukaryote lineage. The N50 and N90 scaffold sizes were 62,959 bp (#9,532) and 5,317 (#65,231), respectively. The T. clavipes genome we report here is the result of the hybrid assembly using newly sequenced reads with short-read sequencing to guarantee accuracy and a long-read sequencing

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

to guarantee structure. The removal of contaminants was strictly conducted using BlobTools based on homology searches (Fig. S2). We then presented in this paper a sufficiently improved genome with an N50 of 738,925 bp (#1,112) and BUSCO of 93.0%. The BUSCO score of the predicted gene set was 88.2%, which is considered to be high enough quality to be used. Spidroin improvement All known T. clavipes spidroins have been isolated based on short-read data obtained from gDNA or cDNA sequencing. The typical spidroin structure is one in which repeat domains are sandwiched between the nonrepeated terminal domains, and a reasonable spidroin ortholog group lineage has been shown. However, full-length spidroin genes have not been sequenced directly, and classic assemblies from short reads alone may have collapsed the repeat domains. We confirmed all spidroin lengths based on the long reads obtained by Nanopore direct sequencing and validated the full-length structures. Accordingly, a new proposal for nomenclature is now possible. The spidroin gene set (NepCla1.0) was systematized, and the spidroin genes were also given appropriate names. In particular, our comparative and phylogenetic analysis showed that Sp-907 and Sp-74867 are MaSp family 3 spidroins.

Spidroin nomenclature According to N-terminal domain similarity, we defined the following nomenclature system for spidroins in this paper. Names of individual spidroin members are represented as “SPnXm”, where SP is the root name (MaSp, MiSp, AgSp, PySp, Flag, CySp, or AcSp), n is an integer representing a family, X is a single letter denoting a subfamily, and m is an integer representing an individual family member (isoform) (Fig. 1 and Fig. S5). There have been many previous reports of the existence of multiple MaSp families. In addition to the classic MaSp families 1 and 2, a recent high-throughput sequencing approach has found a third family (family 3) in A. ventricosus, Araneus diadematus, Argiope argentata, Latrodectus hesperus, and Parasteatoda tepidariorum (4, 5) and MaSps a-h in T. clavipes (1). Furthermore, families 4 and 5 have also been reported in Caerostris darwini with unique repeat motifs (6). These sequences were also sorted into appropriate families by our comparative analysis (Fig. S7). For example, from previous studies, MaSp-abcdefgh and MiSp-abcd had been reported in alphabetical order, and Sp-907 (PRD35000.1) and Sp-74867 (PRD19552.1) had been reported as just spidroin proteins from the previous T. clavipes genome (NepCla1.0). Here, all of these proteins except for the very short MaSp-e were classified based on the phylogenetic analysis (Fig. S7), as follows: MaSp-b/c (PRD18716.1/PRD18936.1), MaSp-a (PRD23950.1), MaSp-d (PRD23750.1), MaSp-h (PRD20448.1), MaSp-g (PRD24320.1), MaSp-f (PRD27696.1), MiSp-b/c/d (PRD30268.1/PRD24510.1/PRD18914.1), MiSp-a (PRD23654.1), Sp-907 (PRD35000.1), and Sp-

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

74867 (PRD19552.1) are MaSp1A3, MaSp1C, MaSp1B1, MaSp2A2, MaSp2A3, MaSp2B1, MiSp1A, MiSp1B, MaSp3C1, and MaSp3B1, respectively.

Trichonephila clavata Trichonephila clavata

Trichonephila inaurata madagascariensis Nephila pilipes

Fig. S1. Stress-strain curves of dragline silk in four golden silk orb-weavers (subfamily Nephilinae).

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

The results of tensile tests of dragline silks from Trichonephila clavata (toughness: 169 ± 39 MJ/m3; Young's modulus 11.80 ± 0.82 GPa; tensile strength 1.51 ± 0.10 GPa; strain at break: 19.2 ± 3.3%), Trichonephila clavipes (toughness: 131 ± 36 MJ/m3; Young's modulus: 10.40 ± 2.86 GPa; tensile strength: 1.56 ± 0.24 GPa; strain at break: 17.1 ± 3.3%), Trichonephila inaurata madagascariensis (toughness: 285 ± 41 MJ/m3; Young's modulus: 14.60 ± 1.30 GPa; tensile strength: 1.73 ± 0.07 GPa; strain at break: 27.2 ± 3.2%), and Nephila pilipes (toughness: 292 ± 40 MJ/m3; Young's modulus: 7.00 ± 2.50 GPa; tensile strength: 1.72 ± 0.34 GPa; strain at break: 41.8 ± 6.8%).

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Trichonephila clavata Trichonephila clavata

Trichonephila inaurata madagascariensis Nephila pilipes

Fig. S2. Graphical representation of the contaminant analyses Results from the contamination check of the T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes genome assemblies using BlobTools (3) and Diamond (7). The BlobTools plot shows the taxonomic affiliation at the phylum rank level, distributed according to GC% and coverage.

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Trichonephila clavata Trichonephila clavipes (Illumina sequence reads) (Illumina sequence reads)

Trichonephila inaurata madagascariensis Nephila pilipes (Illumina sequence reads) (Illumina sequence reads)

Fig. S3. Graphical results of genome size estimation The size of each genome was estimated by GenomeScope (8), providing a k-mer analysis (k = 61) based on raw Illumina sequence reads obtained from genome sequencing.

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Trichonephila clavipes (NepCla1.0)

T. clavipes (NepCla1.0 with bacterial T. clavipes (this study) T. clavipes (NepCla1.0) and fungal scaffolds removed)

GENOME

General

Scaffold number 21,438 180,236 175,217

Total scaffold length 2,874,350,602 2,439,301,466 2,392,713,721

Average scaffold length 134,077 13,533 13,655

Longest scaffold 4,645,559 1,655,743 925,373

Shortest scaffold length 150 200 200

N50 length 738,925 62,959 63,195

BUSCO (Compelte[Single,Duplicated], Fragmented, Missing)

Eukaryote (n=255) 93.0%[92.2%,0.8%],2.7%,4.3% 72.1%[69.0%,3.1%],11.8%,16.1% 71.4%[71.4%,0.0%],15.7%,12.9%

Gene

General

# of protein-coding genes 45,304 47,907 30,897 BUSCO (Compelte[Single,Duplicated], Fragmented, Missing) Eukaryote (n=255) 87.5%[85.5%,2.0%],8.6%,3.9% 60.4%[51.0%,9.4%],22.4%,17.2% 57.3%[52.2%,5.1%],24.7%,18.0% Fig. S4. Comparison of T. clavipes genome statistics from this and a previous study. The panel shows the results of contamination detection in the previously reported T. clavipes draft genome (NepCla1.0), and the table shows the genome statistics. The large orange circles and many pink circles indicate the scaffolds classified as Proteobacteria (38.26 Mb) and Bacteroidetes (6.0 Mb), respectively. The bottom table shows the genome statistics of our T. clavipes draft genome (“this study”, Table 1) and the previous T. clavipes draft genome with and without contamination scaffolds (“NepCla1.0” and “NepCla1.0 with bacterial and fungal scaffolds removed”).

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Fig. S5. Spidroin and SpiCE catalog. This panel shows the spidroin and SpiCE sequence characteristics and structures obtained from four Nephilinae draft genomes. The icons in the first column represent spidroin groups. The second column represents the motif variety in the repeat domains: b-sheet (An, (GA)n, and AS), blue; b-turn (XQQ, GPGQQ, and GPGXX), yellow; spacer (SSX and TTX), green; 310 helix (GGX), orange; and MaSp motif (DGGR, GGYGGRF, and GGYGGL), red. The spidroin structure columns show the N/C-terminal (yellow and blue box) and repeat domains, and each structure is drawn to scale. The number, width, and color of stripes reflect the number, size, and motif of repeats. Assembly gaps are indicated as “N”.

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Nephila_inaurataTrichonephila inaurata madagascariensis

100 Nephila_clavipes 100 Trichonephila clavipes 100 Nephila_clavipes_babb

Nephila_clavataTrichonephila clavata

Nephila_pilipesNephila pilipes

Araneus_ventricosusAraneus ventricosus

0.6 Fig. S6. Phylogenetic tree of four sequenced Nephilinae species. Phylogenetic tree based on a protein sequence corresponding to the 1,031 core orthologous gene set in (9).

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

A Spidroin group Family Subfamily Isoform Spider Accession no. SP n X m Nct|MiSp1A2|completeTnct MiSp 1 A 2 this study Babb_MiSp-b_NCL1_26946-RATncv MiSp 1 A 3 PRD30268.1 Ncv|MiSp1A3|completeTncv MiSp 1 A 3 this study 0.999 Nct|MiSp1A1|completeTnct MiSp 1 A 1 this study Nin|MiSp1A2|completeTnin MiSp 1 A 2 this study Babb_MiSp-c_NCL1_43544-RATncv MiSp 1 A 2 PRD24510.1 Babb_MiSp-d_NCL1_59043-RATncv MiSp 1 A 1 PRD18914.1 0.832 Ncv|MiSp1A1|completeTnin MiSp 1 A 1 this study Ncv|MiSp1A2|completeTncv MiSp 1 A 1 this study Ncv|MiSp1A4|completeTncv MiSp 1 A 2 this study 0.834 Nin|MiSp1A1|completeTncv MiSp 1 A 4 this study Nin|MiSp1B1|completeTnin MiSp 1 B 1 this study Babb_MiSp-a_NCL1_45814-RA 0.986 Tncv MiSp 1 B 1 PRD23654.1 Ncv|MiSp1B1|completeTncv MiSp 1 B 1 this study Npi|MiSp1B2|completeNpil MiSp 1 B 2 this study Npi|MiSp1B1|completeNpil MiSp 1 B 1 this study 1.000 Ncv|MaSp1C1|completeTncv MaSp 1 C 1 this study Babb_MaSp-a_NCL1_45030-RATncv MaSp 1 C 1 PRD23950.1 Nin|MaSp1C1|completeTnin MaSp 1 C 1 this study Nct|MaSp1C1|completeTnct MaSp 1 C 1 this study Npi|MaSp1B1|completeNpil MaSp 1 B 1 this study 0.999 Nin|MaSp1B1|completeTnin MaSp 1 B 1 this study Babb_MaSp-d_NCL1_45564-RATncv MaSp 1 B 1 PRD23750.1 0.995 Ncv|MaSp1B1|completeTncv MaSp 1 B 1 this study Nct|MaSp1B1|completeTnct MaSp 1 B 1 this study Nct|MaSp1B3|completeTnct MaSp 1 B 3 this study Nct|MaSp1B2|completeTnct MaSp 1 B 2 this study Nct|MaSp1B4|partTnct MaSp 1 B 4 this study 0.942 0.526 1.000 Nin|MaSp1A4|completeTnin MaSp 1 A 4 this study Nct|MaSp1A4|completeTnin MaSp 1 A 3 this study Nin|MaSp1A3|completeTnct MaSp 1 A 4 this study MaSp1A|EU599238|Nephila_clavipesTncv MaSp 1 A 1 EU599238.1 Ncv|MaSp1A1|completeTncv MaSp 1 A 1 this study Ncv|MaSp1A2|completeTncv MaSp 1 A 2 this study 0.97 Nin|MaSp1A2|completeTnin MaSp 1 A 2 this study Nct|MaSp1A1|completeTnct MaSp 1 A 1 this study Nct|MaSp1A3|completeTnct MaSp 1 A 3 this study Npi|MaSp1A1|completeNpil MaSp 1 A 1 this study Nct|MaSp1A2|completeTnct MaSp 1 A 2 this study Nct|MaSp1A5|partTnct MaSp 1 A 5 this study 0.795 Nin|MaSp1A1|completeTnin MaSp 1 A 1 this study MaSp1B|EU599239|Nephila_clavipesTncv MaSp 1 A 3 EU599239.1 Babb_MaSp-b_NCL1_59687-RATncv MaSp 1 A 3 PRD18716.1 Babb_MaSp-c_NCL1_58967-RATncv MaSp 1 A 3 PRD18936.1 Ncv|MaSp1A3|completeTncv MaSp 1 A 3 this study Npi|MaSp2B1|complete 0.737 Npil MaSp 2 B 1 this study Nct|MaSp2B1|completeTnct MaSp 2 B 1 this study Nin|MaSp2B1|completeTnin MaSp 2 B 1 this study 0.986 Ncv|MaSp2B1|completeTNcv MaSp 2 B 1 this study Babb_MaSp-f-1_NCL1_34622-RBTNcv MaSp 2 B 1 PRD27696.1 0.966 Babb_MaSp-f-2_NCL1_34622-RATNcv MaSp 2 B 1 PRD27696.1 Npi|MaSp2A1|completeNpil MaSp 2 A 1 this study Npi|MaSp2A2|completeNpil MaSp 2 A 2 this study Npi|MaSp2A3|completeNpil MaSp 2 A 3 this study MaSp2|DQ059135|Nephila_inaurataTnin MaSp 2 A 2 DQ059135.1 0.966 Nin|MaSp2A2|completeTnin MaSp 2 A 2 this study Babb_MaSp-g_NCL1_44126-RATNcv MaSp 2 A 3 PRD24320.1 Ncv|MaSp2A3|completeTNcv MaSp 2 A 3 this study Nin|MaSp2A1|completeTnin MaSp 2 A 1 this study Nct|MaSp2A1|completeTnct MaSp 2 A 1 this study Nct|MaSp2A2|completeTnct MaSp 2 A 2 this study Ncv|MaSp2A1|completeTNcv MaSp 2 A 1 this study MaSp2|EU599240|Nephila_clavipesTNcv MaSp 2 A 2 EU599240.1 Babb_MaSp-h_NCL1_54604-RATNcv MaSp 2 A 2 PRD20448.1 Ncv|MaSp2A2|completeTNcv MaSp 2 A 2 this study 0.927 Ave|MaSp3Aven MaSp 3 A 1 GBN25680.1 MaSp3|AWK58729.1|Argiope_argentataAarg MaSp 3 A 1 AWK58729.1 Npi|MaSp3C1|completeNpil MaSp 3 C 1 this study 0.744 0.98 Nct|MaSp3C1|completeTnct MaSp 3 C 1 this study Nin|MaSp3C1|completeTnin MaSp 3 C 1 this study 1.000 Babb_Sp-907_NCL1_12843-RA|MaSp3Tncv MaSp 3 C 1 PRD35000.1 Ncv|MaSp3C1|completeTncv MaSp 3 C 1 this study Npi|MaSp3B1|completeNpil MaSp 3 B 1 this study 0.924 Babb_Sp-74867_NCL1_57024-RA|MaSp3Tncv MaSp 3 B 3 PRD19552.1 Ncv|MaSp3B3|completeTncv MaSp 3 B 3 this study Ncv|MaSp3B1|completeTncv MaSp 3 B 1 this study Ncv|MaSp3B2|completeTncv MaSp 3 B 2 this study Nct|MaSp3B1|completeTnct MaSp 3 B 1 this study Nin|MaSp3B1|completeTnin MaSp 3 B 1 this study Nin|MaSp3B2|completeTnin MaSp 3 B 2 this study

0.3

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

B

Spidroin group Family Subfamily Isoform Spider Accession no. SP n X m

Npil AcSp 1 A 1 this study Npi|AcSp1A1|complete

0.992 Tnct AcSp 1 A 1 this study Nct|AcSp1A1|complete

Tnin AcSp 1 A 1 this study Nin|AcSp1A1|complete

Tncv AcSp 1 A 1 this study Ncv|AcSp1A1|complete

Tncv AcSp 1 A 1 PRD26201.1 Babb_AcSp_NCL1_38763-RA

Npil CySp 1 A 1 this study Npi|CySp1A1|complete

Abru CySp 1 A 1 AB242144.1 0.998 CySp1|AB242144|Argiope_bruennichi

CySp|EU730637|Nephila_antipodianaTant CySp 1 A 1 EU730637.1

Nct|CySp1A1|completeTnct CySp 1 A 1 this study

CySp1|AB218974|Nephila_clavataTnct CySp 1 A 1 AB218974.1

Nin|CySp1A1|completeTnin CySp 1 A 1 this study

Ncv|CySp1A1|completeTncv CySp 1 A 1 this study

Babb_CySp_NCL1_12038-RATncv CySp 1 A 1 PRD35275.1 0.992 Npi|PySp1A1|completeNpil PySp 1 A 1 this study 1.000 Nct|PySp1A1|completeTnct PySp 1 A 1 this study

Nin|PySp1A1|completeTnin PySp 1 A 1 this study

Ncv|PySp1A1|completeTncv PySp 1 A 1 this study

Babb_PiSp_NCL1_40387-RATncv PySp 1 A 1 PRD25616.1

Babb_FLAG-b_NCL1_42776-RATncv Flag 1 A 1 PRD24772.1 0.936 Npi|Flag1A1|partNpil Flag 1 A 1 this study 0.983

Nct|Flag1A1|completeTnct Flag 1 A 1 this study

Ncv|Flag1A1|completeTncv Flag 1 A 1 this study

Babb_FLAG-a_NCL1_35963-RATncv Flag 1 A 1 PRD27227.1

Nin|Flag1A1|partTnin Flag 1 A 1 this study

Flag|AF218623.1|Nephila_madagascariensisTnin Flag 1 A 1 AF218623.1

Flag|AF027972|Trichonephila_clavipesTncv Flag 1 A 1 AF027972.1 0.571 AgSp1|lcl|MK138561.1_prot_QDI78451.1_1|Argiope_trifasciataAtri AgSp 1 B 1 QDI78451.1

1.000 Nin|AgSp1B1|partTnin AgSp 1 B 1 this study

Babb_AgSp-d_NCL1_44933-RATncv AgSp 1 B 1 PRD23989.1

Npi|AgSp1A1|completeNpil AgSp 1 A 1 this study

Nct|AgSp1A1|completeTnct AgSp 1 A 1 this study

Babb_AgSp-c_NCL1_37576-RATncv AgSp 1 A 1 PRD26655.1 0.929

Ncv|AgSp1A1|completeTncv AgSp 1 A 1 this study

Nin|AgSp1A1|completeTnin AgSp 1 A 1 this study

Ncv|AgSp2A1|completeTncv AgSp 2 B 1 this study

Babb_AgSp-b_NCL1_50123-RATncv AgSp 2 B 1 PRD22063.1 0.992 AgSp2|lcl|MK138559.1_prot_QDI78449.1_1|Argiope_trifasciataAtri AgSp 2 A 1 QDI78449.1

Npi|AgSp2A1|completeNpil AgSp 2 A 1 this study

Nct|AgSp2A1|completeTnct AgSp 2 A 1 this study

Nin|AgSp2A1|completeTnin AgSp 2 A 1 this study

Ncv|AgSp1B1|completeTncv AgSp 2 A 1 this study

Babb_AgSp-a_NCL1_46464-RATncv AgSp 2 A 1 PRD23399.1 0.3 Fig. S7. Systemic relationships underlying new nomenclature. These phylogenetic trees are constructed on the basis of the 150 aa N-terminal regions of (A) the ampullate spidroins (MaSp and MiSp) and (B) the other spidroins. The spidroin nomenclature (SPnXm) in this paper is based on this phylogenetic tree. Tnct: Trichonephila clavata,

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Tncv: Trichonephila clavipes, Tnin: Trichonephila inaurata madagascariensis, Npil: Nephila pilipes, Aven: Araneus ventricosus, Aarg: Araneus argentata, Tant: Trichonephila antipodiana, Abru: Argiope bruennichi, Atri: Argiope trifasciata.

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

MView 2020/05/24 15(23

Reference sequence (1): MaSp3|AWK58730.1|Latrodectus_hes Identities normalised by aligned length. Colored by: property N-terminal region cov pid 1 [ . . . . : . . . . 1 . . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus MTWSTRLALSLLSVLLSQAICALGQTNTPWS-SKANADAFIQAFMKDASQSGAFSSDQIDDMSMISNTILSAIENM--GGVITQHKLQAL 2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata MAWITRLPLLVLVALCTQSIIVHGQDSHPWS-DVRTTESFMKNFVECIRQSSYFNTDDIESIRDLSDTMIQSLNGMTAIGKTSHQMLQAL 3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus MTWTARLVLLTLVALCTLSAVARGESNHPWK-DMKSMETFMDIATEKLSSSGRFNDDDVDAVREVGDTLKQSAENMWKKGKMSPKQLGMM 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes MGWLTNILFVAL--FCTQTTFVRGSLQELMRGGLGNMESFINIFSDSLVSSGQIREEYIDDVYTMRQTLINSINDMKATKNLNPHKLQLW 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata MGWITNILFVAL--FCSQTSLVRASLQELMSGGLGNMEGFINTFSDSLVSSGQLKPDLVDEVYSMKDTLINSINEMKTSKNLNQHKLQLF 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes MGWITNILFVAL--FCSQTSFVRASLQELMSGGLGNMEGFINTFSDSLVSSGQLNADLVDEVYSMKDTLINSINEMKTSKNLNPHKLQLF 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% MGWITNILFVAL--FCSQTSFVSASLQELMSGGLGNMEGFINTFSDSLVSSGELNADLVDEVYSMKDTLINSINEMKTSKNLNQHKLQLF consensus/100% MsW.spl.h.hL..hho.s.hs.up.pp.hp.shtsh-sFhp.h.cph.pSu.hp.-.l-th..htpThhpuhptM...t.hs.p.Lthh consensus/90% MsW.spl.h.hL..hho.s.hs.up.pp.hp.shtsh-sFhp.h.cph.pSu.hp.-.l-th..htpThhpuhptM...t.hs.p.Lthh consensus/80% MsWhTplhhlsL..hCoQo.hspup.pc.hp.shtshEuFhphFs-slspSGthpsD.lDplhphpsTllpSlppMhtttphs.+pLQhh consensus/70% MuWlTpllhlsL..hCoQoshV+uphpc.hS.uluNMEuFIssFo-sLsSSGphssD.lD-VhshpDTLIsSINpMpsstsls.HKLQhh

cov pid 121 . . : . . . . 2 . . . . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus QSAFYQ-TTGVVNNQFINEIKSLISMFRKASANAISYDSSGSGVYSPSSASVSVSVPSAPSVSNQILNQQPVSVAFTQQQPGIQRNLLT 2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata RNSLIQ-TTGVVNESFMNEMDKLMQMFSQINVLNEDSGGNGASAAS------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus SAALVA-SIGKIDPIFMNEINTLMSTFGQSNQLGNESGGYGAESAAVAASSSNGFGSGPQTLSSQGRGAPSISVSAT-GAGAFPQGPVG 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes ASAFQQFGMGQ-DYKLAREMEELIRVFAE-NQDTNSVESSAAAASSGIGG-----GYGPVSGYGNGAGAAAAAAGSGEGGRGIGYGPGGG 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata ASAFAQFGMGQ-DYKLAREMEELIRVFAE-NQDTNFVESTGAAAAAAGGGGYGGIGYGPGGASG-----AAAAAAAEGGNRGIEYGPGG 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes ASAFAQFGMGQ-DYKLAREMEELIRVFAE-NQDTNFVESTGAAAAAAGGG--GGIGYGPGGGYGGGSGSAAASAAAEGGYGGIGYGPGG 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% ASAFGQFGMGQ-DYKLAREMEELIRVFAE-NQDTNFVESSGAAAAAAGGG--GGIGYGPGGGYGGGTGSAAASAAAERSYGGIGYGPGG consensus/100% tsuh.t.shG..s..hhpEhppLhphFtp.s..s...tu.uuts.u...... consensus/90% tsuh.t.shG..s..hhpEhppLhphFtp.s..s...tu.uuts.u...... consensus/80% tuAhhQ.shG..s.phhpEhcpLhphFup.Nt.sp.stusGAuuuus.uu.....s.us.sh.s...... shusu.t.t..uh..s.hst consensus/70% uSAFhQ.uhGp.D.phhpEM-cLIpsFuc.NQssN.s-SoGAuAAusuuu..ss.G.GPsus.uth.stsususuuptu.tGIthGPsG

cov pid 241 : . . . . 3 . . . . : . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus ------SETVSYRPGQSIQSSGTTSES------YDQGNYVQSGSGNSVAIAAG-GTTQRRYDQDTTGS------2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata ------SASASNVPGIGQSLGPQGQGS------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus ------GGQYGSGLGAGQGGNSGSAASAEAT--GSGGAGSPIQGYGGAVG--SAAASGIGGYG-GLGSLGAGG------4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes GAGAAAAAAAASGEGGRGIGYGAGGASGAAAAASGAAGGRGIGYGPGSGYGGGAGAAAAGAAGEGGRGIGYGP-GGGS------5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata DSGAAAAAAGEGN-GGF--GYGPGGASGAAAAAAAADGGRGGLYGRGRGLGGDS--AAAAAAADGGRG-GYGGRGGGDSGAAAAAAAAG 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes ASGTAAAAAADGGRGGYG-GLG-GGSDAAAAAAAAADSGRGGLYGGGRGLGGDSSAAAAAAAANGGRG-GYGGFGGGS------7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% ASAAAAAAAADGGRGGYG-GLGRGGDSGAAAAAAASDGGRGGYGGNGRGLGGESGAAAAAAAADGGRG-GYGGLGGGSDAAAAAAAAAG consensus/100% ...... sA.Aus.ssht.ths..stGs...... consensus/90% ...... sA.Aus.ssht.ths..stGs...... consensus/80% ...... u...uht.Gts.tuuusstts...... hs.hphhtuts...uAuAuu.GGhG.uhGs.GsGu...... consensus/70% ...... us.hG.s.GhG.GGssuuAAuAuuu..GpGhhhu.GpGhGGsuu.uuAuAAu.GGpG.GYGs.GuGS......

cov pid 361 . . . 4 . . . . : . . . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus AAVATENGVYEA------GQSGYG-QYGA------ALAGNGANQGEFG------2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata V------GSGSYE------YP------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus AGLGGQEDTAAAAAAAGGLGGQGGYG-GLGYQGSGGV------GQGG-YGSGLGAAGQSAASAAVASASGGGRGGLGGRGYGPGGYGR 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes SGYGGGS--GAAAAAAGS--GEGGRGVGYGPGGASGA-AAAADGGRGIGYGPGSGYGGGAGAAAA-GAAGEGGRGIGYG----PGG---- 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata RGLGGDS--AAAAAAADG--GQGGYG-GRGFGGDSGAAAAAAGEGNGGL---GYGPRGAS-GAAA-AAAADGGRGGLYGRRRGYGG---- 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes RGLGGDSGAAAAAAAADG--GRGGYG-GFGGGSDAAAAAAAADGGRGGLYGRGRGLGGDSGAAAA-AAAADGGRGG-YG---GFGG---- 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% RGLGGDSGAAAAAAAADG--GRGGYG-GLGGGSDAAAAAAAADGGRGGLYGRGRGLGGEL-VQQA-AAAADGGRGG-YG---GLGG---- consensus/100% ...... Gpuuht...... hs...... consensus/90% ...... Gpuuht...... hs...... consensus/80% tuhustp..htA...... GpGGYG.thG...... u.AutGupth.aG...... consensus/70% tGlGGps..uAAAAAAsu..GpGGYG.GhGhtusuus...... GpGh....G.G.tGt..sttA.AuAusGGRGG.YG.....GG....

cov pid 481 . 5 . . . . : . . . . 6 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus ------NYDNGSAGQGNAGFVAIAANGVGQG----GYGP------EQGSVTPNSYTQT----GYAQGSVGQGGDKRYGPSSA 2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata ------LSVNSIGGS----GYGPGGSGTGGSGAAAAAASS------GAGPGYGGRGR------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus ------GQG-GPGQGGASAAAAAASGGGRGGLGGRGYGPGGYGRDGSGSAAAAASSDGSGSDGYGGGYGGQGGPGQGGASAAAAAA 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes AAAAGS----GEGGRGVGYGPGGGSGAAAAAAAGSGEGGRGIGYGPGG----GSGAAAAAAGADG----GRGVGYGPGGR---YGSEFA 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata AAAAGE----GNGGLGYGPGGASGAAAAAEAADGGRGGLYGRGRGLGG------DSAAAAAAADG----GQG-GYGGRGF---GGDSGAAAAAA 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes AAAAAAAADGGRGGYG-GLAGGSD--AAAAAADGGRGGLYGRGRGLGG----ESGAAAAAA-ADG----GRG-GYGGLGG---GSD-AAAAAA 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% AAAAAAAADGGRGGYG-GLGGGSDAAAAAAAADGGRGGLYGRGRGLGG----ESGAAAAAAAADG----GRG-GYGGLGG---GSDAAAAAAAA consensus/100% ...... ussu.ttu....GhG...... sussssu.s...... G.u.G.ss.Gh...... consensus/90% ...... ussu.ttu....GhG...... sussssu.s...... G.u.G.ss.Gh...... consensus/80% ...... s.s.G.u.tus..hsAhAusGsttG....GhG.GG....tpGuAAAAA.ups....GhG.GYGG.Gt...hus..A consensus/70% ...... GhG.G.usuusuuAAAtAAsGutGGhhGhGhG.GG....tSGuAAAAAuuDG....GtG.GYGGtGt...hustuA

cov pid 601 . . . . : . . . . 7 . . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus SPSVTVTEQVTYAPQRQVSYSETISYIPQDSTKMALVPIAT------GADETSYEQIGQGPTTVTIAT------2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata IKKIIVRRKG------GSQYDTEAFEIE------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus AAASAASSDGSGSDGYGGGYGGQGGPGQGGDSAAAAAASGGGRGGLGGR------GYGPGGYGRDGSGSAAAAAASSDGSGS- 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes SASLSLARN------YALQSGLSAAASNEMLNLVNRY------ISNVGAYAEAEAYANALAQAIAEGLGNA 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata AAAAAAAADG----GRGGLYGRGRGLGGDSAAAAAAAA------DGGREGYGGLGGDSGAAAAAAAAGEGN- 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes AAAAAAAADG----GRGG-YG---GLGGGSDAAAAAAADGGRGGFYGRGRGAGVGAAGRSSDGGRGGYGGLGGGSDAAAAAAAAD-GR- 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% AAAAAAAADG----GRGG-YG---GLGGGSDAAAAAAA------DGGRGGYGGLGGDSGAAAAAAAADGGR- consensus/100% ..t..http...... t.ssthh.h...... consensus/90% ..t..http...... t.ssthh.h...... consensus/80% usuhssstps...... Yu...u.ustsssthhths...... s.tuYtt.tt.ssshs.A...... consensus/70% uAususutcG....thts.Yu...GhGususstAAsss...... sus.suYuthGususAsAtAsuts.Gp.

cov pid 721 . . : . . . . 8 . . . . 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus ADQG-----GYGQGSAGQFGAGTAADTTFGSAGQGN------QFSYGQGEYV-QVGNGQDGTGAAATVSGWTGQD 2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata ------GNNYGPQG----GSGS------3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus GRGGL-GGRGYGPGGYGRDGSGSAAASAASSDGSGS------DGYGGG-YGGQGGPGQGGDSAAAAAASGGGRGG 4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes GEGGIEYGPVYGREG----GSGAAAAAAAGEGGEGGYGTGYGREGGSGAAAAAAAAEEGGAGG-YGNEVGAGG---ASAAAASGAGGRR 5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata GRGGL-YGRGRGLGG----DSAAAAACVYGPGGASG------AAAAAAAADGGRGGLYGRGRGLGG--DSAAAAAAADGGRGG 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes GRGGL-YGRGRGLGG----ESGAAAAAAAAEGGRGGYG---GLGGGSDAAAAAAAAADGGRGG-YG---GLGGGSDAAAAAAAADGGRGG 7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% GRGGL-YGRGRGLGG----DSGAAAAAAAADGGRGGYG---GLGGGSD-AAAAAAAADGGRGGLYGRGRGLGGGSGAAAAAAAADGGRGG consensus/100% ...... shG.tu....tuus...... consensus/90% ...... shG.tu....tuus...... MView consensus/80% uctG...G.shG.tG....sSGuAAssshu.sGtus...... uhttG.Ys...G.Gt2020/05/24...uuAAssuu 15(23.sGpt consensus/70% GcGGl.hGpGhG.GG....sSGuAAAsshupuGpGu...... t-GhtGG.YG.thG.Gt..suAAAAAuusGGRs

cov pid 841 : . . . . ] 900 1 MaSp3|AWK58730.1|Latrodectus_hesMaSp3 (AWK58730.1) 100.0% Latrodectus 100.0% hesperus YG--G---GNAGQGGAG----AA------https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20200524-072255-0424-52958852-p1m/aln-html2 MaSp3|AWK58729.1|Argiope_argentaMaSp3 (AWK58729.1) 49.6% Argiope 15.0% argentata SGSAGY--GEQGESGPG----GAA------1 / 2 ページ 3 Ave|MaSp3 MaSp3 98.2% Araneus 20.8% ventricosus SGSDGYGGGYGGQGGPGQGGDSAAAAAASGGGRGG------4 Npi|MaSp3B1|complete MaSp3B1 93.4% Nephila 19.4% pilipes EG--GL--GYGGEGGSG----AAAAAAAADGA-GGRRGYGR------5 Nct|MaSp3B1|complete MaSp3B1 Trichonephila91.7% 17.4% clavata NG--GF--GYGPGGASG----AAAAAAAADGGRGGLYGRGRGLGGDSAEAAAAAAAAGEG 6 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 92.5% 17.6% clavipes RG--GYG-GFGGGSD------AAAAAAAADGGRGGLYG------7 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 93.0% madagascariensis 17.5% NG--GI--GYGPGGASG------AAAAADGGRGGL------consensus/100% .G..G...G.tstus...... consensus/90% .G..G...G.tstus...... consensus/80% pG..Gh..G.ustuusG....uA...... consensus/70% pG..Gh..GaGGtGusG....uAAAAAAusGu.GG......

MView 1.63, Copyright © 1997-2018 Nigel P. Brown

15

https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20200524-072255-0424-52958852-p1m/aln-html 2 / 2 ページ bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. MView 2020/05/24 15(25

Reference sequence (1): Ave|MaSp3 Identities normalised by aligned length. Colored by: property C-terminal region cov pid 1 [ . . . . : . . . . 1 . . 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus ------GGQGGPGQGGASAAAAAASGGGRGGLGG-RGYGPGGYGRDGSGSAAAAAASSDGSGSDGYGGGYGGQGGPGQGGDSAAAAAA 2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus ------GGYG------3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes ------GS--GAAAAAAAAEEGGAGGYGNEVGAG------GASAAAASGAGGRRGPGYGRGYGP------GSGSA 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata AAADGGRGGLYGRGRGLGG--DSAAAAAAAADGGQGGYGG-RGLG------GDSGAAAAAGEGNG--GLGYGPGG------ASGAAAAAAAA 5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes ------GLYGRGRGPGGESGGAAAAAAAADGGRGGYGG-LGRG------SDAAAAAAADGGRG--GYGGLVGG------SDAAAAAAAA 6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% -ASDGGRGGYGGLG---GG--SDAAAAAAAADGGRGGLYG-RGRGLGG----DSGAAAAAAAADGGRG--GYG-GLGG------GSDAAAAAAAA 7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus ------GRYGQ-SGFG------8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum ------consensus/100% ...... consensus/90% ...... consensus/80% ...... Gthh...... consensus/70% ...... GGhGt..GhG......

cov pid 121 . . : . . . . 2 . . . . 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus AAAADSADGSGGNGYGSGFREEAGDAAATGNALSLGTAEGQRDIYKRIVIIRRGGPGSSSAAAASSDGSGSDGGQGGLGQGGDSAAAAAA 2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus ------RG------3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes AAAAAADGAGGRRGY-G------RGYGPGSGAGAAGAAGEGGR-GYGPGYGGEGAAAAAAAAAA 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata AAAAAAD--GGRGGYGG------RGLGGDSGAAAAAAAADGGRGGYGGRGLGGD--SAAAAA 5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes AAAAAAD--GGRGGYGG------LG-RGSDAAAAAAAAADGGRGGYGGLGGGSDAAAAAAAA 6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% AAAAAAD--GGRGGYGG------LG-GGSD-AAAAAAAADGGRGGYGGLGGGSDAAAAAAAA 7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus ------QEGAGQGGAGSVATAGESGQGNTG------INVKVA 8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum ------consensus/100% ...... consensus/90% ...... consensus/80% ...... t...... consensus/70% ...... hG..sts.Auusussupuup.s.G...... ssts

cov pid 241 : . . . . 3 . . . . : . 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus ASSDGSGSDGYG------GGYGGQGGPGQGGDSAAAAAASGGGRGGIGGRGYGPGGYGRDGSGSAAAAAASSDGSGSDGYGGG------2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus ------3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes AADGAGGRRGYG------RGYGPGS----GAGAAAAAGSGEGGRG--YGSGYGGEG----ASG---AAAAAGAGEGGLGYGGEGGS 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata AADG--GRGGYG------GLGRDG----ESGAAAAAAAADGGRGG-YG-GLGRGG----DSGAAAAAAAAGEGNGGLGYGPGGAS 5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes AADG--GRGGYGGLESWLHGRGGYGGLG----GGSDAAAAAAADGGRGGLYGRGRGPGG----ESGVAAAAAAAEGGQGGYGGIGGGS------6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% AADG--GIGGYG------GLG--G----GSDAAAAAAAADGGRGGLYGRGRGLGG----DSGAAAAAAAADGVRGGYG-GLGGGS 7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus RNAGQVNVVGYR------PSSASISISVPQVSRVTYQVPVQQPVS------VSFPQQPLTQTYSSGPSVT------8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum ------consensus/100% ...... consensus/90% ...... consensus/80% ...... consensus/70% tsss..s..GYt...... ussuhuhusstsuRs...s.s.t..u...... suhstt.hpts.u.h.t......

cov pid 361 . . . 4 . . . . : . . . 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus AAAASGGGRGGIGGRGYGPG------GYGRD--GSGSAAAAAASSDGSGSDG-YGVGYG------2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus ------GSGSSAAAAGSSDDSGSEG-YGRGFG------3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes AGAAGEGGR-GY-GPGYGGEGASAAAAAAGEGGEGGLGYGGEG-GSGAAAAAAAADGSGGRRG-YGRGYGPGSGAGAAAAGAAGEAAAAAAA 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata AAAAADGGRGGYGGRGLGGDSAAAAAAAAADDGRG--GYGGLGRGGDSAAAAAAAAADGGQGG-YG-GRGLGGDSGAAAAAAAAGEDN------5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes AAAAADGGRGGYGGLGRGSD-AAAAAAAAADGGRG--GYGGL--GGGSDAAAAAAAADGGRGGLYGRGRGLGGDSGAAAAASASGEGN------6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% AAAAADGGRGGYGGLG-GGSD-AAAAAAAADGGRG--GYGGL--GGDSDAAAAAAAADGGRGGLYGRGRGLGGDSSAAAAGAASGEGN------7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus QRQVSYSETASY-DLGVSPQSTSSAASKEYVPVDN--GY-----GQETTSTATTTTVSGKQER-----YRQSGSESAAARGI------8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum ------consensus/100% ...... consensus/90% ...... consensus/80% ...... GttssusAssssssutptt.....ht...... consensus/70% tttsu.utp.uh.s.G.ust...... GY.....GusosAAAAuusssuGptG.YG.GhG......

cov pid 481 . 5 . . . . : . . . . 6 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus ------AASSASGGGSGGYGAGQGVFGGDLQEPAVAAASAA------SSAASRMSSPSSQSRVSSAASIFLDN 2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus ------ATSSASGGGSGGYGAGPGLYGGGVQEPAAASAATA------SSAASRMSSPSSQSRVASAVSIFSEN 3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes ------AAAAGAAGEGGR-GYGPG-YGGEGAAAAAAAAGGRRGPVGYGGSSSASAAEVISNLAPSIPRVTSVTTDLVSGG 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata ------AAAAAADGGRGGLYGRRRG-YGGGSGAAAAAAAAGEGDLGSFGA----PAITLIRSVSPSVSRVNSFTTSLVSGG 5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes GEDYEGLGYGPGGASAAASAADGGRGGRYGRRRG-YGGGSGAAAASAAAGEGEFGSFGG----PAI------6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% ------AAAAAEGGRGGRYGRRRG-FGGGSGAAAASAAAGEGELGSFGG----PAITLIRSVTPSVSRVNSFTTSLVSGG 7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus ------DAALAVSSGAGQRY------YGTKLVAAPSANAAYA------LATPDSSARISSLTSSLLST 8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum ------GESGAAAASAAASA------ISSPASTSRISFVASKLVSGG consensus/100% ...... st..tsssusAu.t...... consensus/90% ...... st..tsssusAu.t...... consensus/80% ...... As.ustuGtuth...... aGGt.ttsAuuuAust...... s.ssS.sRls.hso.h.pss consensus/70% ...... AAuuusuGtuGtYGht.G.aGGtssAAAAuAAAut...... suh...husssS.uRlsShso.hlssss

cov pid 601 . . . . ] 645 1 Ave|MaSp3 MaSp3 100.0% Araneus 100.0% ventricosus LVQSLTELLVA--TIMAATGLDS------SSATDVVRSSFY 2 MaSp3|AWK58637.1|Araneus_diademaMaSp3 (AWK58637.1) 33.2% Araneus 25.1% diadematus LVQVLLEILAA--TISRYSGLDS------GSSMEYVSSALSSSLY 3 Npi|MaSp3B1|complete MaSp3B1 82.2% Nephila 32.1% pilipes SSQVLYDIIVSLFHILRYSNVGEVDFNTIGETSRYLSSVIGG--Y 4 Nct|MaSp3B1|complete MaSp3B1 Trichonephila82.8% 36.1% clavata SVQVLLDIVASLLHILRYSSIGVVDYSTVGETNEYLSNALRG--Y 5 Ncv|MaSp3B1|complete MaSp3B1 Trichonephila 80.2% 36.5% clavipes SVQVLLDIVASLLHILRYSSIGVVDYSTVGETNGYLSNTLRG--Y 6 Nin|MaSp3B1|completeMaSp3B1 Trichonephila inaurata 83.6% madagascariensis 38.2% SVQVLLDIVASLLHILRYSSIGVVDYSTVGETNGYLSNALRG--Y 7 MaSp3|AWK58638.1|Latrodectus_hesMaSp3 (AWK58638.1) 57.1%Latrodectus 12.9% hesperus LVQAVLELVTALLSIIGSSNIGDVNYDAAGQYAQMVSQSVQNAFI 8 MaSp3|AWK58639.1|Parasteatoda_teMaSp3 (AWK58639.1) Parasteatoda 20.8% 22.8%tepidariorum TIQALVELIAALIHILGSASIGNVNYGSAAQSAAVVSESFQSAFH consensus/100% ..Qslh-llsu..pI.t.ssls...... t.hopshts..h consensus/90% ..Qslh-llsu..pI.t.ssls...... t.hopshts..h consensus/80% .lQsLh-llsu..pIht.oslss...... up..thlSpslpu..a consensus/70% .VQsLl-llsuLhpIlthSulGsVsasshGposthlSsslpu..Y Fig. S8. Alignment results of N/C-terminal regions in the MaSp family 3 genes. Two MView 1.63, Copyright © 1997-2018 Nigel P. Brown panels show the alignment results of the N- or C-terminal regions of MaSp family 3 genes newly found in subfamily Nephilinae and the known MaSp3 genes. The terminal regions are well conserved and share the same characteristics, with many arginine residueshttps://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20200524-072424-0530-77073497-p2m/aln-html in the repeat domain. 1 / 1 ページ

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

DraglineDragline silkssilks

MiSp silk

Fig. S9. Spinnerets producing dragline (MaSp) and MiSp silk. Dragline silks were reeled directly from the major ampullate gland spigots using a stereomicroscope to avoid mixing with other silks coming out of other spinnerets. This picture is of T. clavata.

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

T. clavata T. clavipes N. pilipes 1 100 10000 1e+03 2e+03 5e+03 1e+04 2e+04 5e+04 1e+05 1e+03 2e+03 5e+03 1e+04 2e+04 5e+04 1e+05 g43994.t3 MiSp1A1 MiSp1B1 g43994.t3 MiSp1A1 MiSp1B1 MiSp1A1 g286931.t1 MiSp1B2

g50170.t1 g208208.t1 g52079.t1SpiCE.NMa1 MiSp1A4 g60416.t1g1881.t1 g32081.t1 g47301.t1 g1881.t1 MiSp1A4 SpiCE-NMa1 g3280.t1 g483385.t1 g67374.t1

g66747.t1SpiCE.NMa1 g387525.t1 g47285.t1 g335078.t1 g31891.t1 g304464.t1 g67953.t1g22501.t1 SpiCE-NMa1 g335078.t1 g22501.t1 g35940.t1 g143386.t1 g32790.t1 g57655.t1 g270551.t1 g47303.t1 g266220.t1 g51381.t1 g38328.t1 g299811.t2 g32791.t1g9444.t1 g51381.t1 g266220.t1 g9444.t1 g27778.t1 g13752.t1 g7266.t1 g48872.t1 g334326.t1 g47300.t2g9651.t1 g445643.t1 g15475.t1 g6022.t1 g435224.t1 g9651.t1 g15475.t1 g445643.t1 g64564.t1 g72188.t1 g311.t1 g5963.t1 g58411.t1 SpiCE-NMi1 g36609.t1g61242.t1 g43994.t1 MiSp1A2 g61242.t1 Fig. S10. Expression profiling of the minor ampullate silk gland. Expression profiles from the minor ampullate silk gland in T. clavata (n = 6), T. clavipes (n = 8), and N. pilipes (n = 6). The X-axes represent the median of the quantified expression level in TPM (transcripts per million). The y-axes show the top 20 most highly expressed genes in each gland.

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

MView 2021/02/28 14(53

Reference sequence (1): Npi|SpiCE-NMa1|complete Identities normalised by aligned length. Colored by: property SpiCE-NMa1 cov pid 1 [ . . . . : . . . . 1 . . 120 1 Npi|SpiCE-NMa1|completeSpiCE-NMa1A1 100.0% Nephila 100.0% pilipes MSKILTIFVVFATLSYTSSLKLKRFEKYLALLGPDGLECDGVANLIKSMSSLIGTVKDAGGVMQGALGDFKDDLREAINDPNVRHKLRGLKGQMK------DI 2 Ncv|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 96.1% 30.9% clavipes MSKILSILVVFTTLSYTSANHIKKIEKLLSKVDKEGMQCDDVRNVIRSYSNLLATVMDAGSVLKGAFGDFRDNIIEGISNPEMQKKLKSVKHGLASTIKRFMNFESSSQVLQNVLN--GL 3 Nct|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 97.8% 41.7% clavata MSKILSILVVFTTLSYTSASHIKRIEKLLSKVDKEGMQCDDVKNVIKSYSSLLGTLMDAGNVIKGAFGDFRANMLEAIHNPEMLKKLKSVRHGLTSSIKRLMNFEGSSQALMNALQ--GL SpiCE-NMa1A14 Nin|SpiCE-NMa1|complete Trichonephila inaurata 98.2% madagascariensis 39.5% MSKILSILVVFTTLSYTSANHIKTIEKLLSKVDKEGMQCADVKNVIKSYSSLLGTVMDAGNVLKGAFADFRANIMEAITNPEMQKKLKSVKHGLTSSIGKLMNFESSTQAIQDALSGAGL consensus/100% MSKILoIhVVFsTLSYTSu.+lKphEKhLuhls.-GhpCssVtNlI+ShSsLluTlhDAGsVhpGAhuDF+sshhEuIpsPph.+KL+ul+ttht...... sl consensus/90% MSKILoIhVVFsTLSYTSu.+lKphEKhLuhls.-GhpCssVtNlI+ShSsLluTlhDAGsVhpGAhuDF+sshhEuIpsPph.+KL+ul+ttht...... sl consensus/80% MSKILoIhVVFsTLSYTSu.+lKphEKhLuhls.-GhpCssVtNlI+ShSsLluTlhDAGsVhpGAhuDF+sshhEuIpsPph.+KL+ul+ttht...... sl consensus/70% MSKILSILVVFTTLSYTSAsHIK+IEKLLSKVDKEGMQCDDV+NVIKSYSSLLGTVMDAGsVlKGAFGDFRsNlhEAIsNPEMpKKLKSVKHGLsSoIt+hMNFEuSoQsl.ssLp..GL

cov pid 121 . . : . . . . 2 . . . . 240 1 Npi|SpiCE-NMa1|completeSpiCE-NMa1A1 100.0% Nephila 100.0% pilipes GNLMGGMSGNLKDLGSGLQDLGGSMKGAFGPPVLNLLGNGYSMMDNLAGGGTMDSLSDIFNNARDLGGNMMSMDG------2 Ncv|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 96.1% 30.9% clavipes GGSMQGFMGDMTSLGGQMEDLKGSMKDAFAPPLMNMVGTGYSMLDNMASGGLMNNVQDFGKNAMSLGKKMMFTIGTAENHDTAAHIMTDPPSCLTVGKIQSGSNACKGVLQTSTHPIVGE 3 Nct|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 97.8% 41.7% clavata GDSMSGFMGDMKSLAGSMGDLKGSMKDAFAPPLMNMVGTGYSMLDNMANGGLMNNVQDFGQNAMSLGKKMMSTDG------SpiCE-NMa1A14 Nin|SpiCE-NMa1|complete Trichonephila inaurata 98.2% madagascariensis 39.5% GDSMKGFMSDIKSLAGQVGDLKGSMKDAFAPPLMNMVGTGYSMLDNMANGGLMNNVQDFGKNAMSLGKKMMSTDG------consensus/100% Gs.MtGh.ushpsLuuthtDLtGSMKsAFuPPlhNhlGsGYSMhDNhAsGGhMsslpDhhpNAhsLGtpMM.h.G...... consensus/90% Gs.MtGh.ushpsLuuthtDLtGSMKsAFuPPlhNhlGsGYSMhDNhAsGGhMsslpDhhpNAhsLGtpMM.h.G...... consensus/80% Gs.MtGh.ushpsLuuthtDLtGSMKsAFuPPlhNhlGsGYSMhDNhAsGGhMsslpDhhpNAhsLGtpMM.h.G...... consensus/70% GsSMpGFMGDhKSLuGphtDLKGSMKDAFAPPLMNMVGTGYSMLDNMAsGGLMNNVQDFGpNAMSLGKKMMSTDG......

cov pid 241 : . . . . 3 . . ] 328 1 Npi|SpiCE-NMa1|completeSpiCE-NMa1A1 100.0% Nephila 100.0% pilipes ------AGDLANEIFGDLSSGLGPMLGNLN-----LGGLG--EIGIKKIRNQRIRLLRKGVGDENSSHI 2 Ncv|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 96.1% 30.9% clavipes SVMKKKGTTLQTILLSSTYQPTWFCAGDLASEIMNDLGGVMGPLMSGMDGLTGKVGGLS---GGLDGTG----GGLSGGLG----GGL MView 2021/02/28 14(54 3 Nct|SpiCE-NMa1|completeSpiCE-NMa1A1 Trichonephila 97.8% 41.7% clavata ------AGDLANEIMGDFGGAMGPLLSGMDGMTGKLGGLM---GGLGGDGGDTSGGLQGSLD----SSL SpiCE-NMa1A14 Nin|SpiCE-NMa1|complete Trichonephila inaurata 98.2% madagascariensis 39.5% ------AGDLANEIMGDMGGVVGPLMSGVGGLSDKMSGLMGGDGGLSGGLGGPAGGLSGGLG----GSL consensus/100% ...... AGDLAsEIhsDhuushGPhhushs.....huGL....hGltt.h....thLptuls....utl consensus/90% ...... AGDLAsEIhsDhuushGPhhushs.....huGL....hGltt.h....thLptuls....utl Reference consensus/80% sequence (1): Nct|SpiCE-NMa2A1|complete ...... AGDLAsEIhsDhuushGPhhushs.....huGL....hGltt.h....thLptuls....utl Identities consensus/70% normalised by aligned length...... AGDLANEIMGDhGGshGPLhSGhsGhosKlGGLh...GGLsGstst..GGLpGGLG....uuL Colored by: property

SpiCE-NMa2MView 1.63, Copyright(CRP type) © 1997-2018 Nigel P. Brown cov pid 1 [ . . . . : . . . . 1 . . 120 1 Nct|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila100.0% 100.0% clavata MKVIVALFVLISVVEFGDAK------RIANRGPPIAGRSSGPAMCDSFPICPPSCVTDVTGPCPTCDCSGFCDLNC 2 Ncv|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila 53.8% 34.5% clavipes MIFINTQEEIQTLLESLDAR------SpiCE-NMa2A2 3 Nin|SpiCE-NMa2A2|complete Trichonephila inaurata 55.0% madagascariensis 41.0% MKFLIALFVLIAAVKFGDAK------KQANRGPPNAEGGSTPA------SpiCE-NMa2A4 4 Nin|SpiCE-NMa2A4|complete Trichonephila inaurata 55.0% madagascariensis 41.0% MKFLIALFVLIAAVKFGDAK------KQANRGPPNAEGGSTPA------5 Nct|SpiCE-NMa2B1|completeSpiCE-NMa2B1 Trichonephila 63.1% 46.6% clavata MKVLVALFVVIAVVEFGDAR------KQAVSAVANAGARSGPA------6 Ncv|SpiCE-NMa2A2|completeSpiCE-NMa2A2 Trichonephila 63.1% 38.1% clavipes MKVLAELLVLGILLKFGADAARTNTAVARSGKLPSVNRGGFSEKCVGGCGVN------VAEPPPTNAKRVANRGPSNAGGSADPP------7 Ncv|SpiCE-NMa2A3|completeSpiCE-NMa2A3 Trichonephila 53.8% 41.3% clavipes ------MEVDTVWEKG------GSADPP------8 Ncv|SpiCE-NMa2A4|completeSpiCE-NMa2A4 Trichonephila 63.5% 47.2% clavipes MKVLVALFILVAALEFGDAK------KQANRSPSNAGGCSNPA------SpiCE-NMa2A3 9 Nin|SpiCE-NMa2A3|complete Trichonephila inaurata 63.1% madagascariensis 46.2% MKFLIALFVLIAAVEFGDAK------KIAYRGPSNAGGSSNPP------SpiCE-NMa2A110 Nin|SpiCE-NMa2A1|complete Trichonephila inaurata 63.5% madagascariensis 50.2% MKVLVALFVLIAVVEFGDAK------RAANRGPSNGGGSSNPP------11 Npi|SpiCE-NMa2A5|completeSpiCE-NMa2A5 42.6% Nephila 21.5% pilipes MKFFIAFLVLGVILAFSDA------DEQPA------12 Npi|SpiCE-NMa2A1|completeSpiCE-NMa2A1 51.0% Nephila 26.1% pilipes MKVQIALLVLGAVLVLSDAK------RVRKSKPAATGGNDQPA------13 Npi|SpiCE-NMa2A4|completeSpiCE-NMa2A4 61.4% Nephila 23.1% pilipes MKILIAVIVLGTVLEFGSTDDKSIRCRNVPNCEPPCVLDVTGSCPTCDCEIMKVAIVFLVLGAVLEFGDAKKHHKNPPVPKGNSAPA------14 Npi|SpiCE-NMa2A2|completeSpiCE-NMa2A2 47.0% Nephila 25.2% pilipes MKFFIAVIVLGAVFAFGNAK------QFRDHRPPVTRAEENPI------15 Npi|SpiCE-NMa2A3|completeSpiCE-NMa2A3 47.0% Nephila 24.2% pilipes MKVLIAVIVLGSVLELGNAK------RTRDHGPPNTSADEIPI------consensus/100% ...... l..hh...... https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20210228-055213-0648-72055554-p1m/aln-html 1 / 1 consensus/90% Mhh..t.h.l.shhthuss...... tt.P...... ページ consensus/80% MKhhlAlhVLhsslchGsAc...... p.t.ptss.stusptPs...... consensus/70% MKhllAlhVLhsllcFGsA+...... +.t.ptPssstusssPs......

cov pid 121 . . : . . . . 2 . . . . 240 1 Nct|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila100.0% 100.0% clavata AGDSCKVVRNGNACACACTSGDDGDDGNSDACAITCPDGCTAIANNGQCSCLCNGGNSGPATCDSFPICPPSCVTDVSGPCPSCDCSGFCDLDCAGDSCKVVRNGNACACACTSG----- 2 Ncv|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila 53.8% 34.5% clavipes ------VSQYVLC--IVYWSHFGPCPKCDCSGYCDVNCVGDNCKIVRNGNASSCACTSG----- SpiCE-NMa2A2 3 Nin|SpiCE-NMa2A2|complete Trichonephila inaurata 55.0% madagascariensis 41.0% ------QNALCDSFPVC-LDCTLDVTGPCPTCDCSGFCDVNCDGDSCKIVRNGNACACACASG----- SpiCE-NMa2A4 4 Nin|SpiCE-NMa2A4|complete Trichonephila inaurata 55.0% madagascariensis 41.0% ------QNALCDSFPVC-LDCTLDVTGPCPTCDCSGFCDVNCDGDSCKIVRNGNACACACASG----- 5 Nct|SpiCE-NMa2B1|completeSpiCE-NMa2B1 Trichonephila 63.1% 46.6% clavata ------QNARCQSIPVCDPSCVLDITGPCPKCDCSGYCDVSCVGDSCKIVRNGNACSCACTSG----- 6 Ncv|SpiCE-NMa2A2|completeSpiCE-NMa2A2 Trichonephila 63.1% 38.1% clavipes ------GGARCQSFPVCPSNCVTDITGPCPKCDCSGYCDVNCVGDNCKIVRNGNACSCACTSG----- 7 Ncv|SpiCE-NMa2A3|completeSpiCE-NMa2A3 Trichonephila 53.8% 41.3% clavipes ------GVARCQSFPICPSNCVTDITGPCPKCDCSGFCDVNCVGDNCKIVRNGNACACACTSG----- 8 Ncv|SpiCE-NMa2A4|completeSpiCE-NMa2A4 Trichonephila 63.5% 47.2% clavipes ------PDAICDSVPVCPPYCTLDVTGPCPRCDCSNFCSVNCAGDSCKIVRNGNACACACTSG----- SpiCE-NMa2A3 9 Nin|SpiCE-NMa2A3|complete Trichonephila inaurata 63.1% madagascariensis 46.2% ------GVARCQSFPVCAPNCVTDITGPCPTCDCSGYCEVNCVGDSCKIVRNGTACSCACTSG----- SpiCE-NMa2A110 Nin|SpiCE-NMa2A1|complete Trichonephila inaurata 63.5% madagascariensis 50.2% ------GVARCQSIPVCPPYCSLDVTGPCPRCDCTGFCDVNCVGDSCKIVRNGNACACACASG----- 11 Npi|SpiCE-NMa2A5|completeSpiCE-NMa2A5 42.6% Nephila 21.5% pilipes ------AETICQSIPVCNPPCVLDRTGPCPRCDCRNTCNVSCTGQNCKIIRNGGQCSCVCGGG----- 12 Npi|SpiCE-NMa2A1|completeSpiCE-NMa2A1 51.0% Nephila 26.1% pilipes ------PNTICENVPRCNPPCILDISGPCPRCDCSNTCSVSCSGTNCKVIRNGRQCSCVCSGG----- 13 Npi|SpiCE-NMa2A4|completeSpiCE-NMa2A4 61.4% Nephila 23.1% pilipes ------QPARCNSFPVCDPTCVLDVSGPCPYCDCANVCSVSCSGTNCKIVRNGRQCSCVCNGGGGGNV 14 Npi|SpiCE-NMa2A2|completeSpiCE-NMa2A2 47.0% Nephila 25.2% pilipes ------LCRSIPQCDPPCVLDVSGYCPKCDCKNTCVVSCAGRNCKYLRNGKQCSCVCNGG----- 15 Npi|SpiCE-NMa2A3|completeSpiCE-NMa2A3 47.0% Nephila 24.2% pilipes ------TCRNISQCDSPCILDVSGLCPKCDCKNTSSVSCTGPNCKIIRNGKQCSCVCNGG----- consensus/100% ...... spphs.C...s.hsh.G.CP.CDCtshs.lsCsG.sCKhlRNGttsuCsCsuG..... consensus/90% ...... hCpshs.C...ChhDhoG.CPpCDCpshCsVsCsGpsCKllRNGptCuCsCsuG..... consensus/80% ...... shCpShPlCsssCshDloGPCPpCDCsshCsVsCsGssCKIlRNGptCuCsCsuG..... consensus/70% ...... sshCpShPlCsssCshDloGPCPpCDCoshCsVsCsGcsCKIVRNGstCuCsCsuG.....

cov pid 241 : . . . . 3 . . . . ] 349 1 Nct|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila100.0% 100.0% clavata ------DDDDAGTSDA-CAITCPDGCT-AIANNGQCSCLCNGNDGDDVCVIDCPPQCQVTRDCN-GNCQCVCF MView 2 Ncv|SpiCE-NMa2A1|completeSpiCE-NMa2A1 Trichonephila 53.8% 34.5% clavipes ------DDDATTSGS-CSITCPGGCK-AIANNGQCYCLCNGNDGDDVCVIDCPPQCTVTRDRF-GKCQCVCF 2021/02/28 14(54 SpiCE-NMa2A2 3 Nin|SpiCE-NMa2A2|complete Trichonephila inaurata 55.0% madagascariensis 41.0% ------DDDAGTSGS-CSITCPDGCK-AIANNGQCYCLCNDNDGDDVC------D-GDD----- SpiCE-NMa2A4 4 Nin|SpiCE-NMa2A4|complete Trichonephila inaurata 55.0% madagascariensis 41.0% ------DDDAGTSGS-CSITCPDGCK-AIANNGQCYCLCNDNDGDDVC------D-GDD----- 5 Nct|SpiCE-NMa2B1|completeSpiCE-NMa2B1 Trichonephila 63.1% 46.6% clavata ------DDDATTSGS-CSISCPGGCK-AIANNGQCYCLCNGNDGDDVCVIDCPPQCDVTRDSN-GNCQCVCF 6 Ncv|SpiCE-NMa2A2|completeSpiCE-NMa2A2 Trichonephila 63.1% 38.1% clavipes ------DDDATTSGS-CSITCPGGCK-AIANNGQCYCLCNGNDGDDVCVIDCPPQCTVTRDRF-GKCQCVCF https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20210228-055340-0408-83452560-p1m/aln-html 7 Ncv|SpiCE-NMa2A3|completeSpiCE-NMa2A3 Trichonephila 53.8% 41.3% clavipes ------DDDAGTSGS-CSITCPGGCK-AIANNGQCYCLCNG--GDDVCAIDCPPQCTVTRDRY-GNCQCVCF 1 / 2 ページ 8 Ncv|SpiCE-NMa2A4|completeSpiCE-NMa2A4 Trichonephila 63.5% 47.2% clavipes ------DDGDAGTSGTGCAITCPGGCK-AIANNGQCYCLCNGNDGDDVCVIDCPPQCEVSRDSN-GQCQCICF SpiCE-NMa2A3 9 Nin|SpiCE-NMa2A3|complete Trichonephila inaurata 63.1% madagascariensis 46.2% ------NDDARASGS-CSITCPGGCK-AIANNGQCYCLCNGDDGDDVCVIDCPPQCEVTRDRN-GKCQCVCF SpiCE-NMa2A110 Nin|SpiCE-NMa2A1|complete Trichonephila inaurata 63.5% madagascariensis 50.2% ------DDDDAGTSGS-CSITCPGGCK-AIANNGQCYCLCNGNDGDDVCVIDCPPQCEVTRDSN-GNCQCVCF 11 Npi|SpiCE-NMa2A5|completeSpiCE-NMa2A5 42.6% Nephila 21.5% pilipes ------NV-CGIDCPPQCRVEFNEYGQCNCICF------12 Npi|SpiCE-NMa2A1|completeSpiCE-NMa2A1 51.0% Nephila 26.1% pilipes ------NNGGGGNNNV-CAVACSPGCR-KVTDNGQCACYCN------13 Npi|SpiCE-NMa2A4|complete 61.4% 23.1% CAISCPRGCRVIRNNGQCACDCSRGGGINNDDGGAVADAVAVSDDGASDAGA-CAIVCDPGCT-PVTTNGQCYCNCNGR-----CNINCPPGCPPRYDENDGVCRCVCF MView SpiCE-NMa2A4 Nephila pilipes 2021/02/28 14(47 14 Npi|SpiCE-NMa2A2|completeSpiCE-NMa2A2 47.0% Nephila 25.2% pilipes ------SKA-CTIVCNPGCR-PVTTNGQCACICK------15 Npi|SpiCE-NMa2A3|completeSpiCE-NMa2A3 47.0% Nephila 24.2% pilipes ------SKAICPIICPPDCM-PLTTNKQCACICK------consensus/100% ...... ts.Csl.CsstCh..hsp.tQC.C.C...... consensus/90% ...... sts.CsIsCsssCp..lssNGQC.ChCp...... Referenceconsensus/80% sequence (1): Npi|SpiCE-NMa4|complete ...... sssutsSsu.CuIsCPsGC+.slssNGQChClCN...... Identitiesconsensus/70% normalised by aligned length...... sssAssSsu.CuIsCPsGC+.slssNGQChClCNs...... C...... G.s..... Colored by: property

SpiCE-NMa4MView 1.63, Copyright (CRP © type)1997-2018 Nigel P. Brown cov pid 1 [ . . . . : . . . . 1 . . 120 1 Npi|SpiCE-NMa4|completeSpiCE-NMa4A1 100.0% Nephila 100.0% pilipes MMRLEILAV-AFTVLALSSVTAFKG------YSEDLVSFTIKADTCVGNSGDQKFCDDFISCVDKFPQKLQDIFHQCVKQAYGEEGLGKCNDKETLFRSQKQLKEYVSCFLTKLPNKSDL 2 Nct|SpiCE-NMa4|completeSpiCE-NMa4A1 Trichonephila100.0% 37.4% clavata MFEITQKRMGSLRRFHFKSSRAPDGERTSARYDNDFVRFNYRSDECISTSGNQNLCDKLQSCIQKLPTTLLNNFNECVSQVYPHGKLGKCNDQENLFGTNKQLKQYFHCFINKLPQKSSL 3 Ncv|SpiCE-NMa4|completeSpiCE-NMa4A1 Trichonephila 100.0% 40.7% clavipes MNRLPIQIL-VVSVLAVSSVIAIKG------YDNDFVNFNYRSDKCISTSGDQSLCDELQSCIQKLPITLLNNFNECVSQIYPDGMLGKCNDQENLFGTMEQLNQYFHCFLTKLPQKSSL SpiCE-NMa4A14 Nin|SpiCE-NMa4|complete Trichonephila inaurata 82.6% madagascariensis 37.5% MNRLPIQVL-VLSVLAVSSVIASKG------YDNDFVNFNYRSDKCISTSGDQNLCDELQSCAQKLPITLLNNFNECVSQIYPDGMLGKCNDQENLFGTMKQFNQ-----VRKIVTFSHI consensus/100% M.cl...hh.shphhthpSshA.cG...... YspDhVpFsh+uDpClusSGsQphCDch.SChpKhP.pL.s.FppCVpQhYscthLGKCNDpEsLFto.cQhpp.....lpKlsphSpl consensus/90% M.cl...hh.shphhthpSshA.cG...... YspDhVpFsh+uDpClusSGsQphCDch.SChpKhP.pL.s.FppCVpQhYscthLGKCNDpEsLFto.cQhpp.....lpKlsphSpl consensus/80% M.cl...hh.shphhthpSshA.cG...... YspDhVpFsh+uDpClusSGsQphCDch.SChpKhP.pL.s.FppCVpQhYscthLGKCNDpEsLFto.cQhpp.....lpKlsphSpl consensus/70% M.RLsIphl.sloVLAlSSVhA.KG...... YDNDFVsFNYRSDcCISTSGDQsLCD-LQSClQKLPhTLLNNFNECVSQlYP-GhLGKCNDQENLFGT.KQLpQYhpCFlsKLPpKSsL

cov pid 121 . . : ] 157 1 Npi|SpiCE-NMa4|completeSpiCE-NMa4A1 100.0% Nephila 100.0% pilipes SDSDQSKVDD-FKKCEYKVAGKCNKQ------2 Nct|SpiCE-NMa4|completeSpiCE-NMa4A1 Trichonephila100.0% 37.4% clavata SPEEQRQFDHPYKRCMLKVGHECLDNQ------3 Ncv|SpiCE-NMa4|completeSpiCE-NMa4A1 Trichonephila 100.0% 40.7% clavipes SPAEHRQFDHPYKAVDGSAELSCFVPTDELSWFLLPS SpiCE-NMa4A14 Nin|SpiCE-NMa4|complete Trichonephila inaurata 82.6% madagascariensis 37.5% N------LLRAM------consensus/100% s...... hhts...... consensus/90% s...... hhts...... consensus/80% s...... hhts...... consensus/70% Sst-ppphDc.aK+s.hpsthpC...... MView 1.63, Copyright © 1997-2018 Nigel P. Brown

19

https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20210228-055340-0408-83452560-p1m/aln-html 2 / 2 ページ

https://www.ebi.ac.uk/Tools/services/rest/mview/result/mview-I20210228-054347-0684-63302604-p2m/aln-html 1 / 1 ページ bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Fig. S11. Alignment of SpiCE proteins. These figures show the alignment results of two SpiCE proteins (SpiCE-NMa1, SpiCE-NMa2, and SpiCE-NMa4) in T. clavata, T. clavipes, T. inaurata madagascariensis, and N. pilipes.

20 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

>Nct|SpiCE-NMa1|PRT#974 MHHHHHHSSGSSLEVLFQGPSHIKRIEKLLSKVDKEGMQCDDVKNVIKSYSSLLGTLMDAGNVIKGAFGDFRANMLEAIHNPEMLKKLK SVRHGLTSSIKRLMNFEGSSQALMNALQGGGLGDSMSGFMGDMKSLAGSMGDLKGSMKDAFAPPLMNMVGTGYSMLDNMANGGLMNNVQ DFGQNAMSLGKKMMSTDGAGDLANEIMGDFGGAMGPLLSGMDGMTGKLGGLMGGLGGDGGDTSGGLQGSLDSSL >Nct|MaSp1|PRT#967 MHHHHHHSSGSSLEVLFQGPAQGQATPWSSTELADAFINAFMNEAGRTGAFTADQLDDMSTIGDTIKTAMDKMARSNKSSKGKLQALNM AFASSMAEIAAVEQGGLSVDAKTNAIADSLNSAFYQTTGAANPQFVNEIRSLINMFAQSSANEVSYGGGYSGGQGGQSAAAAAAVGSAG QGGYGGLGGQGAGAAAAAAGGARQGGYGTLGGQGAGASAAAAAAGGTGQGGYGGLGSQGGYGGQGAGAAAAAAGGAGQGGYGGQNAGRG AGAAAAAAGGAGQGGYGGQGAGQGAGAAAAAAGGAGQGGYGGLTGQGAGAAAAAAGGAGQGGYGGLGAAAAAAGGAGQGGYGGVGAGAS AASAAAAGQGYGGLGGQGAGQGAGAAAAATGGAGQGYGGLGGQGAGAAAAAAGGAGQGGYGGLSGQGAGAAAAAAGGAGQGGYGGLGAA AAAAGGAGQGGYGGVGAGASAASAAASRLSSPEASSRVSSAVSNLVSSGPTNSAALSSTISNVVSQIGASNPGLSGCDVLVQALLEVVS ALIHILGSSSIGQVNYGTAGQATQIVGQSVYQALG >Nct|MaSp3|PRT#972 MHHHHHHSSGSSLEVLFQGPSRLHENKKEPELMENHIFERYPKQLDSAMGWITNILFVALFCSQTSLVRASLQELMSGGLGNMEGFINT FSDSLVSSGQLKPDLVDEVYSMKDTLINSINEMKTSKNLNQHKLQLFQMMFNSGIAEIAAQNGGDVNNIINSVQNSMASAFAQFSMGQD YKLAREMEELIRVFAENQDTNFAESTGAAAAAAGGRGYGGIGYGPGGASGAAAAAAAAEEGNRGLGYGPAGAAGAAAAAAAADGGQGGY GGRGFGRESGAAAAAAAAEGERGGLYGGGRGLGGDSAAAAAAAAADGGRGGYGGRGFGGNSGAAAAAAAAGEGNGGLGYGPGGASAAAA AADGGRGGLYGRGRGLGGDSAAAAAAAAADGGRGGYGGRGFGGDSGGEGDLGSFGAPAITLIRSVSPSVSRVNSFTTSLVSGGSLNVGA LPEIISRGIDEISSYSSGVSECEISVQVLLDIVASLLHILRYSSIGVVDYSTVGETNEYLSNALRGY >Nct|MaSp2|PRT#977 MHHHHHHSSGSSLEVLFQGPAQARSPWSDTATADAFIQNFLAAVSGSGAFSSDQLDDMSTIGDTIMSAMDKMARSNKSSQHKLQALNMA FASSMAEIAAVEQGGMSMGVKTNAIADALNSAFYMTTGAANPQFVNEMRSLISMISAASANEVSYGGGASAAAAAAAGGYGQGPSGYGQ GPSASSAAAAAPSGYGPSQQGPSGSAAAAAAAAGPAQQGPSAYGPSGPGAAAATAAAGPGGYGPQQGPGPQGPSGYGPSGPGSAAAAAA AAGAGPGGYGPQQQGPGQQGPSGYGPSGPSGPGSAAAVAAATGAGPGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAAAGAGPRGYGP GQQGPAGQGPTGYGPSGPGSAAAAAAGAGPGQQGPGGYRPSGQGSAAAAAAAAGAGPGGYGPGQQGPGQQGPGGYGPSGPGSAAAAAAA AGAPDSGARVASAVSNLVSSGPTSSAALSSVISNAVSQIGASNPGLSGCDVLIQALLEIVSACVTILSSSSIGQVNYGAASQFAQVVSQ SILSAF

Fig. S12. Shortened recombinant amino acid sequences. The optimized amino acid sequences used to produce recombinant proteins (MaSp1, MaSp2, MaSp3, and SpiCE- NMa1). The iteration of the repeat domain in each MaSp gene was modified to be shorter than in the native protein sequences. These genes are composed of a 6x His tag (MHHHHHH), a linker (SSGSS), and an HRV 3C protease recognition site (LEVLFQGP).

21

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

MaSp1, MaSp2, or MaSp3 Drying for 16 h.

The solution was MaSp1, 2, and/or 3 poured into a dish blended film 8 ml HFIP

Fig. S13. Film formation procedure. This figure represents the composite film production procedure. Recombinant MaSps were mixed in 8 mL HFIP to a total of 100 mg and dried at room temperature for 16 h in a dish. The blending of the recombinant proteins was conducted on the dish.

22 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Solution Film 35 85

30 80

25 75

20 70 T% (500 nm) T% (500 nm) MaSp2 : 24.1 MaSp2 : 72.5 T [%] T 15 [%] T 65 + SpiCE-NMa1 + SpiCE-NMa1 10 1 wt% : 24.4 60 1 wt% : 77.4 3 wt% : 26.3 3 wt% : 77.8 5 5 wt% : 26.4 55 5 wt% : 76.1 Concentration: 1.25 wt%/vol% (HFIP) Film thickness: 10 µm 0 50 350 400 450 500 550 600 350 400 450 500 550 600 Wavelength [nm] Wavelength [nm] Fig. S14. Recombinant protein film transparency. These graphs show the total transparency (%) of MaSp2 and SpiCE composite films with different SpiCE concentrations (0, 1, 3, 5 wt%). The thickness of the used films was 10 µm.

23 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

50 0.5 0.5 50 0.45 45 40 0.4 0.4 40 ) 3 0.35 35 30 0.3 0.3 30 0.25 25 20 0.2 0.2 20 0.15 15 0.1 10

Toughness (MJ/m Toughness 0.1 10

0.05 strength (MPa) Tensile 5 0 0 0.0 1 wt% 3 wt% 5 wt% 0 1 wt% 3 wt% 5 wt% MaSp2 only + SpiCE-NMa1 MaSp2 only Ony.Masp21 X1999.1wt.2 X1999.3wt.3 X1999.5wt.4 OnyMasp21 X19991wt.2 + SpiCE-NMa1X19993wt.3 X19995wt.4

2.5 2.5 2.5 2500

2 2 2.0 2000

1.5 1.5 1.5 1500

1 1 1.0 1000 Strainatbreak (%) 0.5 0.5 500 0.5 Young's modulus (GPa) Young's 0 0 1 wt% 3 wt% 5 wt% 0 0.0 1 wt% 3 wt% 5 wt% MaSp2 only MaSp2 only OnyMasp21 X19991wt.2 + SpiCE-NMa1X19993wt.3 X19995wt.4 Ony.Masp21 X1999.1wt.2 + SpiCE-NMa1X1999.3wt.3 X1999.5wt.4

Fig. S15. Mechanical properties of recombinant composite films. Graphs showing the tensile properties, such as modulus, strain, strength, and toughness, of MaSp2 and SpiCE protein composite films with different SpiCE concentrations (0, 1, 3, 5 wt%).

24

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

MaSp2 only MaSp2 only

1 wt% + SpiCE-NMa1 (1 wt%) 3 wt% + SpiCE-NMa1 (3 wt%) + SpiCE-NMa1

+ SpiCE-NMa1 (5 wt%) 5 wt%

30

25

20

15

10

5 Degreecrystallinityof (%) 0 MaSp2Only 1%1 wt% 1999 3%3 wt%1999 5%5 wt%1999 MaSp2only + SpiCE-NMa1

Fig. S16. WAXS pattern and crystallinity of recombinant composite films. WAXS measurements of MaSp2 and SpiCE protein composite films under different SpiCE concentrations (0, 1, 3, 5 wt%). The top figure shows one-dimensional radial integration profiles and two-dimensional profiles. The bottom graph shows the degree of crystallinity in each composite film.

25 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Reference sequence (1): Npi|MaSp3B1|complete Identities normalised by aligned length. ColoredReference by: sequence consensus (1): group/70% Npi|MaSp3B1|complete Identities normalised by aligned length. Colored by: consensus group/70% cov pid 1 [ . . . . : . . . . 1 . . 120 1 Npi|MaSp3B1|complete 100.0% 100.0% MGWLT--NILFVALFCTQTTFVRGSL-QELMRGGLGNMESHelixFINIFSDSLVSSGQIREEYIDDVYTHelixMRQTLINSINDMKATKNLNPHKLQLWEHelixMLFNSGIAEIAA--QNGGDVNIHelixIVNSVQ 2 Nct|MaSp3B1|complete 97.5% cov 71.7% pid 1 M[G W I T -- NI L.FV A LFC S Q T S.LV R A S L - Q E L.M SGGLG NM E G.F I N T F S DSLV :S S G QL KP D L V.D E V YS MK D T LINSIN. E M K T S.KN L N QH K L Q LF. Q L L F N S AI A1E IAA -- Q N G GDINNI. I N SVQ . 120 31 Ncv|MaSp3B1|completeNpi|MaSp3B1|complete 100.0%98.5%N. pilipes 100.0% 76.0%(MaSp3B1) MGWILT--NILFVALFCSTQTSTFVFVRAGSL-QELMMRSGGLGGGLGNMEGSFINTIFSDSLVSSGQLNAQIREEYDLVIDDDEVYSYTMKMRDQTLINSINEDMKTATSKNLNPHKLQLFLWQEMLFNSGIAEIAA--QNGGDVNNIGDVNIIVINSVQ 42 Nin|MaSp3B1|completeNct|MaSp3B1|complete 98.5%97.5%T. clavata 73.5%71.7%(MaSp3B1) MGWIT--NILFVALFCSQTSFVLVSRASL-QELMSGGLGNMEGFINTFSDSLVSSGELNAQLKPDLVDEVYSMKDTLINSINEMKTSKNLNQHKLQLFQLLFNSGIAIAEIAA--QNGGDVNNIGDINNIIKINSVQ 53 Npi|MaSp2A1|completeNcv|MaSp3B1|complete T.91.5%98.5% clavipes 27.2%76.0%(MaSp3B1) MSGWSIT-L--ALNIALILFVAVLSLFCTSQTILASFVARGAQSALR-SQPEWLSMDTATASGGLG--NMDEAGFIQNNTFLSGAVSDSLVNSSGAFTAQLNADQLLVDDDEMVSYSTVGMKDTIMSAMDLINSINKEMAKRTSNKKNSLSNQSPHKLQALLFNQMALFANSSMGIAEIAAVE--QGNGQGDVNNISVGVKTINAIASVQ 64 Ncv|MaSp2A1|completeNin|MaSp3B1|completeT. inaurata madagascariensis 97.0%98.5% 25.7%73.5%(MaSp3B1) MSGWSIT-L--ALNIALILFVAVLSLFCTSQSTIYASFVSAQSALR-SQPEWLSMDTATASGGLG--NMDEAGFIQNNTFLSAAISDSLVGSSGAFTSELNADQLLVDDDEMVSYSTIGMKDTIMSAMDLINSINKEMAKRTSNKKNSLSNQHKLQALLFNQMLALFANSSMGIAEIAAVE--QGNGMSGDVNNIMAVKTIKNAISVQV 75 Nct|MaSp2A1|completeNpi|MaSp2A1|complete 97.5%91.5%N. pilipes 27.3%27.2%(MaSp2A1) MSWST-LALAILAVLSTQSTIYAILASAAGQARSPWSDTATA--DAFIQNFLAAVSGAVSGNSGAFSSAFTADQLDDMSTIGVGDTIMSAMDKMARSNKSSQHQSKLQALNMAFASSMAEIAAVEQGGMSQSVGVKTMGVKTNAIA 86 Nin|MaSp2A1|completeNcv|MaSp2A1|complete T.97.0% clavipes 27.0%25.7%(MaSp2A1) MSWST-LALAILAVLSTQSIYASAQARSPWSDTATA--DAFIQNFLGAVSAAISGSGAFTAFTSPDQLDDMSTVGIGDTIMSAMDKMARSNKSSQHKLQALNMAFASSMAEIAAVEQGGMSMAVKTNAIAAIV 97 Npi|MaSp1A1|completeNct|MaSp2A1|complete 93.0%97.5%T. clavata 27.8%27.3%(MaSp2A1) MTSWTSATRL-LALSALLILAVLCVLSTQGSMLAIYAQSAQN-ARTSPWSSTALADTATA--DAFINQANFLNEAGAAVSRGTSGAFTAAFSSDQLDDMSTIGDTLKGAMDIMSAMDKMARSNKSSKSQHKLQALNMAFASSMAEIAAVEQGGMGVMSMEGVKTAKTNAIA 10 8 Nin|MaSp1A1|completeNin|MaSp2A1|completeT. inaurata madagascariensis 92.0%97.0% 28.1%27.0%(MaSp2A1) MTSWTSATRL-LALSAILAVLCVLSTQGSLFAIYAQSGAQN-ARTSPWSSTDTATAELA--DAFINQANFLNEAGGAVSRGSGAFTAAFTPDQLDDMSTIGVGDTLKTAMDIMSAMDKMARSNKSSQSQHKLQALNMAFASSMAEIAAVEQGGLSVAMSMAVKTEKTNAIA 11 9 Nct|MaSp1A1|completeNpi|MaSp1A1|complete 93.0%N. pilipes 28.1%27.8%(MaSp1A1) MTWTARLALSLLAVLCTQSGLLAMLAQGAQAN--TPWSSTSTALAELA--DAFINAFMLNEAGRTGAFTADQLDDMSTIGDTIKTAMDLKGAMDKMARSNKSSKKSGKLQALNMAFASSMAEIAAVEQGGLSVDAKTMGVEAKTNAIA 1210 Ncv|MaSp1A1|completeNin|MaSp1A1|completeT. inaurata madagascariensis 93.5%92.0% 30.5%28.1%(MaSp1A1) MTWTARLALSFLILAVICVLCTQSGLFAQGQN-TPWSSTELA--DAFINAFMLNEAGRTSGAFTADQLDDMSTIGDTIKTAMDLKTAMDKMARSNKSSKQSGKLQALNMAFASSMAEIAAVEQGGLSVDAKTLSVAEKTNAIA 11 consensus/100%Nct|MaSp1A1|complete 93.0%T. clavata 28.1%(MaSp1A1) MsTW.TsA..RLslAL.ShlLLAlhsVLCoTQsS.LLAhstuQGpQ..A-pT.PhW.SpSTsthuELA..---DuAFIpN.AF.MsthstNEAGRoTGthAFTAp.-D.QlLDDD-hM.SoThtIGpDTlhsuhsIKTAMDcKMttARoppSNK.SsS.KtGKLQhhALpNhhMAFsASuhSMAEIAA..VEQsGG.LSVDAKTsht.hhpNulAIA. 12 consensus/90%Ncv|MaSp1A1|complete T. 93.5% clavipes 30.5%(MaSp1A1) MsTW.TsA..RLslAL.ShlFLAlhsVICoTQsS.LFAhspQuGpQ..N-pT.PhW.SssthuSTELA..---DuAFIpNsAF.MsthstNEAGRoTGthAFTApsD.QlLDDD-hM.SoThtIGDTlhsuhsIKTAMDcKMtApRSppNK.SsS.KtGKLQhhALpNhhMAFsASuhSMAEIAA..VEQsGGhshsLSVDAKT.hhNultAIA consensus/80%consensus/100% MsW.s..sl.hlAlhsoQos.hshstupup..p.h.ssthupsthu..-uFIps.F.sthstoGthssthp.D-.lD-h.ohtDpTlhsuhscMtttpSopppp.spp.tKLQhhpMhhhFsSuhAEIAA..QsGhshsshh.sht.hhNpultul. consensus/70%consensus/90% MsW.s..sl.hlAlhsToQoshhs.hspuph...p.h.ssuhussthu..-uFIpsF.sthssthstpSoGthsuthpsD.lDDD-h.TohtDTlhsuhscMtpSpp.spp.tKLQhhpMhhhFsSuhAEIAA..QsGhslsshshshs.hhNult consensus/80% MsW.s..sl.hlAlhsoQo.hspup..p.h.ssthu..-uFIpsF.sthstoGthssD.lD-h.ohtDTlhsuhscMtpSpp.sppKLQhhpMhFsSuhAEIAA..QsGhshsshhNult Helix consensus/70% cov pid 121 Ms W . s .. sl .hl A lhs T Q o hhs. p u Helixp h . p . h:. ssuhu .. - u.F I p s F . sths .p S G thsu D . l.DD h . T ht D T lhsuhs. c M t p S2pp . s pp K L Q hh. p M h F s S uh A.E IAA .. Q s G hslsshs.] 231 Nult 1 Npi|MaSp3B1|complete 100.0%N. pilipes 100.0% (MaSp3B1) NAMASAFQQFGMGQDYKLAREMEELIRVFAENQDTNSVESSA------AAASSGIGG-----GYGPVS------GYG--NGAGAAAAAAGSGEGGRGIGYGPGG 2 Nct|MaSp3B1|complete 97.5% T. clavatacov 71.7%(MaSp3B1) pid 121 NSMA S A F A Q FGMGQD. Y K LA R.E M E E LI R VF A:E N Q DT N FV E S.T G ------. AA A A AA GG .G GYGGI G Y G P.G G------AS 2 G ------. A AAAAAA E GG. NRGIEYGPGG .] 231 31 Ncv|MaSp3B1|completeNpi|MaSp3B1|complete 100.0%T.98.5% clavipes 100.0% 76.0%(MaSp3B1) NSMANAMASAFAQQQFGMGQDYKLAREMEELIRVFAENQDTNFVSVESSSTGA------AAAASAASGIGGGG--GGI-----GYGPPVS------GG------GYG--GGSGSANGAGAAAAAAAASAAAGSGEGGEGYGGIGYG---GRGIGYGPGG 42 Nin|MaSp3B1|completeNct|MaSp3B1|completeT. inaurata madagascariensis 98.5%97.5% 73.5%71.7%(MaSp3B1) NSMASAFGAQFGMGQDYKLAREMEELIRVFAENQDTNFVESSSTG------AAAAAAGGG--GGIGYGGIGYGPGG------G------ASGYG------GGTGSAAAAAAAAAASAAAERSGGYGGIGYG---NRGIEYGPGG 53 Npi|MaSp2A1|completeNcv|MaSp3B1|complete 91.5%98.5%N. pilipes 27.2%76.0%(MaSp2A1) DALNNSMASAFYAMQTTGAANFGMGQDAYQKFVLANREIMRESELISRMLVFAQEANTQVNDTENIGFVYEGGSTGAAAS------ASAAASAAAAAGGG------GGIGYGQPGPSAPVQQGAVG------GYG--PGGGSGSAPQVSSAAAAASAAA------EGGYGGIGYG--- 64 Ncv|MaSp2A1|completeNin|MaSp3B1|complete T.97.0%98.5% clavipes 25.7%73.5%(MaSp2A1) DGLNNSMASAFYGMQTTGAANFGMGQDPYQKFVLANREMRESELISRMIVFSAAEANSQANDTENVSFVYEGGSSG------ASAAAAAAAAGGG------GGIGYGQPGPS------G------GYGP---GASASSGGTGSAAAAAAAAASAAAPESGRSYGPSQQ----YGGIGYG--- 75 Nct|MaSp2A1|completeNpi|MaSp2A1|complete 97.5%91.5%T. clavata 27.3%27.2%(MaSp2A1) DALNSAFYMTTGAANPAQFVNEMIRSLISMIMLSAAQASTANVNEVSIGYGGG------AAAS------ASAASSAAAAAG-----GYGQGPS------PSAPVQQGAVGYG--Q--PGGPPQVSASSSAAAAAAAA------PSGYGPSQQGPSG 86 Nin|MaSp2A1|completeNcv|MaSp2A1|completeT. inaurata madagascariensis 97.0% 27.0%25.7%(MaSp2A1) DALNDGLNSAFYMTTGAANPQFVNEMRSLISMISAASSNANEVSYGGG------ASASAAAAAG-----GYGQGPS------GYGP-GGASASSTSASSAAAAAAPSGYGPSQQGP--YGPSQQ---- 97 Npi|MaSp1A1|completeNct|MaSp2A1|complete 93.0%97.5%N. pilipes 27.8%27.3%(MaSp1A1) DSLNDALNASAFMYQMTTGSTTGAANINSPQFVNEIMRSLISMFMIASAQASANEVSYGGSGGGGGYASAGG------ASAAAASAASAAG------GGYGQ------QGPS------GYG----QGGRGAGGPSASTSAAAAAAAAASGPAGSGQ------YGPSQQGPSG 10 8 Nin|MaSp1A1|completeNin|MaSp2A1|completeT. inaurata madagascariensis 92.0%97.0% 28.1%27.0%(MaSp1A1) DSLNDALNSAFYQMTTGATTGAANVNVPQFVNEIMRSLISMFMIASAQASANSNEVSYGGGYGGGQGGQSA------GAAASAAAAATG-----GAYGQG------PS------GYGGLP-GNTRGGGSASSAAAAAAAAATAA------PSGYGPSQQGP-- 11 9 Nct|MaSp1A1|completeNpi|MaSp1A1|complete 93.0%T. clavata 28.1%27.8%(MaSp1A1) DSLNSAAFYMQTTGAANTTGSINPSQFVNEIRSLINSMFAQSASANEVSYGGGSGGGGGYASAGYSGGQGGQ--SAASAAAAAASVG-----S------AGGQQ------G------GYGGL--GGQGAGGGRGAGAAAAAATAAAASGGGAG------Q------1210 Ncv|MaSp1A1|completeNin|MaSp1A1|complete T.93.5%92.0% clavipes 30.5%28.1%(MaSp1A1) DSLNSAFYQTTGAANTTGAVNPVQFVNEIRSLITSMFAQSASANEVSYGGGYGGQSAG---YGGGQGGQSAAAGAASAAAAAATG-----GGAGQG------GYGNLGLGGQGAGGNRGGGAAAAAAAAATAAA------AG------11 consensus/100%Nct|MaSp1A1|complete 93.0% 28.1% suhsuDSLNSAF..YQhshuTTGAAN.s.PpQhsFVpNEhIcRpSLIpNhhutsMFAQSpSssANpE.VSs.YusuGGG...... YSGGQGGQ--SAAuuuhsAAAAVG...... -----ShAG...... QG------uG.YG...... GLGGQGAGusAAAAAAAus...... GG------12 consensus/90%Ncv|MaSp1A1|complete 93.5% 30.5% suhsDSLNSAFhY.QhshuTTGAAN.s.PpQhsFVpNEhIcRpSLIpThhutsMFAQSpSssANpEhsVS.YusuGGG...... YGGQSAG---AAAuuSAAAusAAG.....-----uhGGG.QsG...... ------GYG...NLGGQGAGs.tsuuAAAAAAAuAuu...... AAG------consensus/80%consensus/100% suhssuhsuSAFh...hshutshshu.s.phspEhcpLIphhutspssphs.s.uuusuG...... uuAuuuuhsAAuuG...... GhG...... G...... GuY.G...... ss.uuuuusAAAAAus...... u.t...... consensus/70%consensus/90% suhsSAFh.hshutshshu.s.phspEhcpLIphhhhutsApspssphs.uuusuG...... uuAuuuAAAuuusG.....GuhhG.Gs...... GYG.....sutuuus.tsuuAAAAAAAuAuu...... tu...... consensus/80% suhsSAFh.hshuts.phspEhcpLIphhutspssphs.uuG...... uuAuAAuuG.....GhG.G...... GYG..ss.uuuuAAAAu.t...... consensus/70%Fig. S17 . Sequence alignment suhsSAFh.hshuts .ofphs pMaSpEhcpLIphhAp sNpss-pterminalhs.uuG...... regions.uuAuAAuuG..... ThisGhG.G...... alignmentGYG..sutuuu figureAAAAAA.tu ...... MView 1.63,confirms Copyright © 1997-2018 which Nigel P. Brown residues in the N-terminal domain (NTD) are shared among MaSp1, 2, MView 1.63, Copyright © 1997-2018 Nigel P. Brown and 3. Red triangles indicate the residues that have been suggested to contribute to the NTD-assisted association as D40, K65, E79, and E119 (10-12).

26 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.22.441049; this version posted April 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Table S1. Raw sequencing data

Table S2. Transcriptome data of Trichonephila clavata

Table S3. Transcriptome data of Trichonephila clavipes

Table S4. Transcriptome data of Nephila pilipes

SI References

1. P. L. Babb et al., The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression. Nat. Genet. 49, 895-903 (2017). 2. M. Steinegger, S. L. Salzberg, Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020). 3. D. Laetsch, M. Blaxter, BlobTools: Interrogation of genome assemblies [version 1; peer review: 2 approved with reservations]. F1000Research 6 (2017). 4. N. Kono et al., Orb-weaving spider Araneus ventricosus genome elucidates the spidroin gene catalogue. Sci. Rep. 9, 8380 (2019). 5. M. A. Collin, T. H. Clarke, 3rd, N. A. Ayoub, C. Y. Hayashi, Genomic perspectives of spider silk genes through target capture sequencing: Conservation of stabilization mechanisms and homology-based structural models of spidroin terminal regions. Int. J. Biol. Macromol. 113, 829-840 (2018). 6. J. E. Garb et al., The transcriptome of Darwin's bark spider silk glands predicts proteins contributing to dragline silk toughness. Commun Biol 2, 275 (2019). 7. B. Buchfink, C. Xie, D. H. Huson, Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59-60 (2015). 8. G. W. Vurture et al., GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202-2204 (2017). 9. F. A. Simao, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, E. M. Zdobnov, BUSCO: assessing genome assembly and annotation completeness with single- copy orthologs. Bioinformatics 31, 3210-3212 (2015). 10. J. H. Atkison, S. Parnham, W. R. Marcotte, Jr., S. K. Olsen, Crystal Structure of the Nephila clavipes Major Ampullate Spidroin 1A N-terminal Domain Reveals Plasticity at the Dimer Interface. J. Biol. Chem. 291, 19006-19017 (2016). 11. S. Schwarze, F. U. Zwettler, C. M. Johnson, H. Neuweiler, The N-terminal domains of spider silk proteins assemble ultrafast and protected from charge screening. Nat Commun 4, 2815 (2013). 12. N. Kronqvist et al., Sequential pH-driven dimerization and stabilization of the N- terminal domain enables rapid spider silk formation. Nat Commun 5, 3254 (2014).

27