Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. odce ymkn h eto h eoeifrain(ue l,21;Nsiaae l,21;Zaget those Zhang on 2015; analysis al., genetic et forward Nishikawa and 2019; strains conducted. al., mutant been et few have not (Gu above have information mentioned species genome species the However, as of such 2019). best al., species, the lepidopteran making other by of conducted of sequences sequence genome of genome Whole usability , whole the the increased. 2008), and Since significantly Consortium, Beadle discoveries. has Genome When those Silkworm of to species. (International mutants reached contributed determined colour simultaneously strains an almost egg mutant for (1941) using of laws Kikkawa dreds by Mendel’s (1941), concept of for hypothesis’ similar validity valid enzyme the the is gene–one proved heredity ‘one which of proposed case laws Tatum Mendel’s first confirmed the (1906) was Toyama this example, For discoveries. able mori Bombyx Introduction silkmoth, Eri such assembly, genes, some genome were mori, novo genes B. ricini. was De coding of S. ricini that protein in Keywords: S. from 16,702 evolved different of specifically so Mb. have genome not 21 to assembled was approximately seemed ricini The was genes, S. assembly fibroin of ricini. and the facilitate repertoire S. genes to of gene chorion order of the length In as Although sequence N50 analysis. assembly. genome the genetic the and whole forward S. in for scaffolds, the predicted of target 155 different determined the eco-races with are and be we 26 Mb can countries, etc. ricini, least traits Asian 458 pattern, S. at those throughout spot that in larval distributed mori: so colour, research are species B. integument which genetic preference, for wild species, the food those and Samia to as ricini exceed compared wild such S. even reared with between traits or Physiological easily hybridise rival more can be progenies. much ricini can ricini fertile is former S. S. produce ricini the and for S. mori, available Since reported Bombyx resources are species. species ricini genetic model model lepidopteran addition, current novel In the a than be latter. diseases to potential to the resistant has and , tough saturniid gigantic a ricini, Samia Abstract 2020 27, April 5 4 3 2 1 Shimada Toru Lee of Jung species model new a ricini, lepidopteran Samia of sequence genome The h nvriyo Tokyo of University The available not Biology Affiliation Basic for Institute Center National Research Science Advanced University Kanazawa University Gakushuin aasplexippus Danaus 1 ook Nishiyama Tomoaki , aebe h et‘oe raim nLpdpeaadalwdrsacest aeremark- make to researchers allowed and in organism’ ‘model best the been have 1 uuuKatsuma Susumu , , yati dispar Lymantria 2 hj Shigenobu Shuji , 5 n oo,aenwaalbeadsm osdrbesuiswere studies considerable some and available now are on, so and n aah Kiuchi Takashi and , .mori B. 1 3 assiYamaguchi Katsushi , ai ricini Samia hr sn ob htaalblt fhun- of availability that doubt no is There . 5 , .mori B. 4 uaaSuzuki Yutaka , smdlspecies model as aii polytes Papilio .mori B. .mori B. and was 5 , Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. iona S)wr mlyd o aBo 0k irr a rprdadfu MTclswr used were were cells libraries SMRT mate-pair four and and paired-end prepared Illumina Ca- was obtained. (Illumina, library finally HiSeq 20-kb were Illumina subreads a and 3,267,255 PacBio, USA) For sequencing; California, employed. for Bioscience, were (Pacific USA) System lifornia, Sequel PacBio Genomic-tip WGS, For using sequencing prepared genome was protocol. and DNA manufacturer’s preparation the Genomic Library to larvae. according fifth-instar Germany) from Hilden, sampled (QIAGEN, 100/G was (WGS) gland sequencing silk genome Posterior whole for preparation sample DNA 25 at dark) h light/8 25 BC at and dark) individuals h light/8 h (16 altissima. conditions long-day under leaves (NBRP; Project of strain UT the identify Methods to and attempted Materials we not, or feasible is in genes phenotypes several trait-related whether for identifying confirm chromosomes to at responsible completed, was aiming assembly the research After genetic assembly. 2003). genome high-quality Naumann, construct to & HiSeq1500 Peigler, of 2015; research genetic ricini al., facilitate to et host order and (Brahma In colour populations cocoon among genus patterns, in observed marking diversity larval be Genetic colour, can integument larval preference as plant such nature endemic as different such of 2003), Naumann, populations & the Peigler, addition, 2015; In al., 2017). et (Brahma al., et (Singh all, races of achievable.eco now First is diversity. interest genetic of genes of analysis Utilising already functional effector have Shimada, activator-like that Kawamoto, we Transcription meaning Kiuchi, Thus, using (Lee, 2018), lines species production. Katsuma, knockout gene egg this & several efficient in obtained in successfully system and resulting (TALENs) genome-editing Also, nucleases scale, a 2017). large establishing Pathania, in in & synchronously of succeeded Ahmed, research reared Kumar, that be means (Singh, can which limitation and 1984), seasonal Waldbauer, from & Although free Sternburg, production, is 2015; regions. silk Dutta, other of & and Swargiary, purpose countries (Brahma, the Asian for many in domesticated to lost are transferred been artificially has been than species has better this and is India, that species Assam, demonstrating ‘Which in traits, question that these the obvious of to ricini is basis answer It genetic possible on. the taxon. One so own elucidate and to its ability attempting in flight researches using species adult researches for ordinary ability, suitable foraging an not colour, being is integument mean larval necessarily the not as does deprived organism had model Domestication a being However, Fg 1). (Fig. eepoe ohln-edadsotra eunes aeyPci eulsse n illumina and system Sequel Pacbio namely sequencers, short-read and long-read both employed We . .ricini S. .mori B. F 1 .ricini S. neseichbiswr bandb crossing by obtained were hybrids interspecific .ricini S. http://shigen.nig.ac.jp/wildmoth/ .mori B. o eei eerhhsaohravnae twl nbeu oacs osbtnilyhigh substantially to access to us enable will It advantage. another has research genetic for . .canningi S. 1 .ricini S. niiul eerae on reared were individuals ° C. n aaosri of strain Nagano and lokona Eislmt, n fteStridmt pce hc a originated was which species moth Saturniid the of one silkmoth,’ ‘Eri as known also , osntawy edt etrudrtnigo eiotrninsects. lepidopteran of understanding better to lead always not does Samia samliotn pce hl aoiyo aunisaeuiotn rbivoltine or univoltine are saturniids of majority while species multivoltine a is .mori B. .ricini S. and iaso vnecesta in that exceeds even or rivals .c pryeri c. S. fmn riswihmjrt flpdpea pce oss,such possess, species lepidopteran of majority which traits many of .ricini S. eotdycnit fa es wnysxmrhlgclydifferent morphologically twenty-six least at of consists reportedly .ricini S. .c pryeri c. S. r itiue hogotsuhades sa countries, Asian east and south throughout distributed are iiu communis edcddt eemn h hl eoesqec of sequence genome whole the determine to decided we , .While ). .ricini S. sal opouefriehbiswt wild with hybrids fertile produce to able is 2 avewr rvddfo h ainlBioResource National the from provided were larvae ai canningi Samia .ricini S. ° and C, .c pryeri c. S. .mori B. .c pryeri c. S. .c pryeri c. S. .mori B. evsudrln-a odtos(6h (16 conditions long-day under leaves avewr erdon reared were larvae .ricini S. . or ai yti pryeri cynthia Samia eaeand female avewr erdon reared were larvae ngntcrsac? is research?’ genetic in . tl ean h risthat traits the retains still .ricini S. .ricini S. iiu communis Ricinus rw uniformly grows Samia Because . ae F male. .ricini S. .mori B. species Samia S. 1 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. h eoi C rga a sflos 0cce f1 t98 at s 10 of cycles 40 follows: as was BC eight program One of PCR Mbp legs genomic 1 the The than from longer extracted scaffolds was thirty-five DNA detect specifically distinguish can molecularly can which and markers genetic thirty-five designed BC We in chromosomes backcross species, obtained or lepidopteran we heterozygotic of First, ovaries approach. in genetic occur ( the was adapted scheme we Crossing (BC scaffolds, 1 between linkages generation the clarify To lepidopteran scaffolds 4 of with of 2018) analysis assembly utilized Linkage al., genome was et latest 2014) (Waterhouse the al., software comparison, v3 et For to BUSCO. (Walker assembly. BUSCO sufficient the Pilon to were assembly, of including submitted repeats species, completeness was the four case, assembly the polish this final assess to In to The order run. reads. output previous in the short until the Then, Illumina repeated of results. was that treatment final Racon from employed. obtain difference was Naga- no Sovic, 2016) (Vaser, showed (Li, Racon minimap file HGAP4, FASTA with from 2016) contigs Sikic, draft & (https://www.pacb.com/support/software- from rajan, assembler sequences HGAP4 consensus the using construct assembled To were downloads/). System numbers Sequel from accession derived reads under Long available assessment were completeness Jellyfish and for assembly used Genome of data in heterozygosity read heterozygosity comparison, Short 2017), For estimated. ( estimated. al., also DRR213145 was was et individuals 2018) (Vurture sequenced al., GenomeScope et the and of 2011) one Kingsford, the (Mar¸cais, & Jellyfish Using Loh- (Bolger, assessment v0.36 Heterozygosity Trimmomatic using conducted were 2014). reads Usadel, of & trimming se, results, check (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). quality FastQC using the examined on was Based reads. reads short paired-end Illumina of 100-bp quality The with platform trimming RNA-seq. 2500 and of Prep check HiSeq results Library Quality Illumina the RNA the on and Specific gland- information using Strand silk summerizes sequenced SureSelect anterior S3 for were using Table library prepared and The were (Agilent) reads. samples paired-end Kit RNA 76-bp sequenced gland-derived with and silk System (Illumina) and library Kit IIx middle The Prep (Illumina) Analyser reads. Library Genome paired-end Kit RNA TruSeq 101-bp the Prep using and using prepared 100-bp Library was with samples RNA platform RNA 2500 TruSeq midgut-derived HiSeq for using Illumina the prepared using were sequenced count were RNA-seq read for of length, libraries sequencing read Embryo-derived e.g. v4 WGS, Illumina of by (RNA-seq) results obtained sequencing the were RNA on reads information mixed 511,569,649 summarize and total, S2 bases. index Table In total different and and had lane. to S1 libraries same Table 3-kbp the libraries the lanes. slices All Mate-pair in 2 gel ZO). process. sequenced subsequent from (female recovered the bp be with for DNA 310-530 to Hyper separated used at and Kapa and were Sage-ELF tagmentation and 40-kbp S2 with after approximately Covaris and kit, CHEF-electrophoresis with ZZ) prep with fragmented (male separated library 200–250-bp DNA were Pair at from Mate gel constructed Nextera agarose were an kit, libraries Paired-end prep kit. library PCR-Free prep Illumina using prepared TM C atrMx(OOO a sdt efr eoi C.Te,teall obntosof combinations allele the Then, PCR. genomic perform to used was (TOYOBO) Mix Master PCR .ricini S. .ricini S. 1 .mori B. niiul between individuals ) ai yti pryeri cynthia Samia n R5445( SRR5641445 and ) - , .ricini S. aii xuthus Papilio .ricini S. homozygotic. and .ricini S. .yamamai A. , .c pryeri c. S. aasplexippus Danaus x 1 .ricini S. aveuigteDes lo n iseKt(QIAGEN). Kit Tissue and Blood DNeasy the using larvae and 3 Fg 1)adte efr eoi C.Genomic PCR. genomic perform then and S1A) (Fig. .c pryeri c. S. x ) ). .ricini. S. 1 and niiul hudbe should individuals ltlaxylostella Plutella ° lsl eae pce to species related closely a , ,5sa 60 at s 5 C, ic eoi eobnto osnot does recombination meiotic Since ° n t68 at s 5 and C .ricini S. a losmte to sumitted also was , .yamamai A. - .c pryeri c. S. ° .ricini. S. .KOD C. (Kim Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. eesbitdt QTE e.161 Nue,Shit aslr ih 05 on,Chernomor, Hoang, 2015; Minh, & Haeseler, Schmidt, (Nguyen, 1.6.11 ver. IQ-TREE alignments to by incom- trimmed aligned were submitted Finally, alignments were were sequence used. OrthoFinder, where of was regions Gabald´on, (Capella-Guti´errez, outputs remove Silla-Mart´ınez, 2009) To & trimAl the 2002). of plete, Miyata, one & Ortholog, Kuma, Misawa, Copy (Katoh, Single MAFFT of sequences acid amino Extracted tree species Constructing of data proteome tokyo.ac.jp/cgi-bin/download.cgi). The 2016). Blaxter, & to Jiggins, Regarding S5). (Table analysis stella BUSCO including for used species, was which assembly, multiple among xylostella groups P. ortholog software identify OmicsBox To in done program were program analysis BLASTP analyses InterProScan genome These with by Comparative G achieved 2019). database & were (El-Gebali, Uniprot (Conesa, search database mode to Pfam domain trial with and aligned through 2017) classification were al., Protein genes et (Finn 2009). predicted al., the et of (Camacho sequences acid alignment Amino spliced were protein exonerate obtain and to annotation StringTie 2008). Functional PASA, 2005) al., BRAKER2, et Birney, (Haas by & EvidenceModeler generated (Slater, by region predictions integrated exon v2.2.0 multiple for Finally, exonerate data using of information. pro- transcriptome sequence tr2aacds.pl sequences multiple (UniProt, to genome annotated merge database addition to manually to Resource In of used Protein regions. sequences Universal also exon using the the was acid identifying sequence 2015) amino for genome Furthermore, al., 2013) the prediction. et al., to (Pertea et aligned Haas StringTie were 2008; gram, assemblies al., suite transcriptome multiple et EvidentialGene reads merged (Haas from in RNA-seq PASA The bundled assemblies the sets. program the assembled tr2aacds.pl data Hi- merge we the by transcriptome to Parallelly, Then, generated used genome option. 2013). files was the al., ‘–bam’ BAM (http://arthropods.eugenes.org/EvidentialGene/evigene/) et to using resultant (Haas mapped The by assembler was generate 2015). Trinity BRAKER2 S3) To using Salzberg, to (Table soft-masked. & reads submitted were Langmead, RNA-seq were RepeatMasker (Kim, of SAT2 HiSAT2 by sets prediction. using identified eleven Hoff, gene by genome prediction, for sequence the 2016; gene employed in for was Stanke, 2008) evidence sequences & Sch Haussler, extrinsic repetitive Stanke, & Borodovsky, all, Baertsch, 2014; of Lomsadze, Borodovsky, Diekhans, First Stanke, & Lange, 2006; Burns, Waack, Hoff, Lomsadze, & 2019; 2009; genstern, Stanke, al., & Borodovsky, et Lomsadze, (Camacho pipeline BRAKER2 (http://www.repeatmasker.org/RMBlast.html). initio RMBlast Ab with conducted was annotate RepeatMasker and mask 2005). To al., 1999). (Benson, v4.0.4 TRF ( and 2005) in Pevzner, sequences & repetitive 2002), Jones, Eddy, & (Price, (Bao, v1.0.5 v1.0.8 RepeatScout RECON with (http://www.repeatmasker.org/RepeatModeler/) v1.0.11 the Modeler of elements linkage repeat the for identify To used analysis primers comparative and all identification listed Repeat S4 Table Tokyo). Kyoto, (Shimadzu, analysis. system electrophoresis microchip Yoshido to according groups BC 8 in scaffolds http://www.repeatmasker.org/RMDownload.html h rtoedt a bandfo ebs ( Lepbase from obtained was data proteome the , eeprediction gene rhFne Em,&Kly 05 a sd ahgn e orsoddt h genome the to corresponded set gene Each used. was 2015) Kelly, & (Emms, OrthoFinder , 1 niiul eeeaie.Acrigt h eut edsgae h dnie linkage identified the designated we result, the to According examined. were individuals .ricini S. tal. et h osrce utmrpa irr a tlsdb eetakrv4.0.7 RepeatMasker by utilised was library repeat custom constructed the , t,2008). ¨ otz, 21) lcrpoei a odce sn .%aaoeglo MultiNA or gel agarose 2.0% using conducted was Electrophoresis (2011). .ricini S. eoe utmrpa irr a osrce sn Repeat- using constructed was library repeat custom a genome, .mori B. aal-roa,&Ce,20)wt ebs Jraet (Jurka Repbase with 2009) Chen, & Tarailo-Graovac, ; http://www.uniprot.org 4 http://lepbase.org a bandfo ikae(http://silkbase.ab.a.u- SilkBase from obtained was .mori B. .plexippus D. Cals ua,Dasmahapatra, Kumar, (Challis, ) , Btmn 09 eealigned were 2019) (Bateman, ) .plexippus D. , .xuthus P. .ricini S. , .xuthus P. ffan Mor- ¨ offmann, eoie in deposited and .xylo- P. and Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. h nsue nBSOaayi TbeS) stetasrpsof transcripts the As S5). p25 (Table Fib-L, analysis Fib-H, BUSCO as in reported used previously ones not were the sequences acid in amino sericin or and transcripts with any conducted because queries was in as assemblies genes genome encoding (NP protein the p25 silk Fib-L, against of search repertoire TBLASTN the , comprehend putative to confirm tried of to we conducted not. sequences When was or TBLASTN acid present found, amino were not were transcripts deduced models both The gene whether corresponding the S7). because (Table LC001870, respectively and gland, silk middle sericin and against gland search silk TBLASTN performed BLASTP we to when addition, Tsubota order In added In search. always query. TBLASTN were as and yamamai sericin) options ‘No BLASTP A. p25, for no’ being Fib-L, query result ‘-seg (Fib-H, as BLASTP proteins and utilised of silk of 1e-5’ homolog with cases ‘-evalue the conducted In investigate were report, search option. of this of TBLASTN no’ sequences In models and ‘-seg nucleotide gene and against parameter. 16,702 1e-5 search against same than TBLASTN search less BLASTP found,’ e-value query, hits an as with sequences conducted those was using of thus LC001865.1) and Genbank, LC001864.1 in (LC001863.1, p25 and (BAQ55621.1) Fib-H Identifying program. on BLASTP based via constructed genes was of whether protein tree chorion check phylogenetic IQ-TREE high-cysteine The to using replicates order 2015). 1000 In Minh, with & 2013) Haeseler, Haeseler, model. Schmidt, & likelihood PMB+F+R5 (Nguyen, maximum Nguyen, by 1.5.5 (Minh, analysis phylogenetic methods ver. to bootstrap subjected were ultrafast sequences and Aligned 2004). (Edgar, sequences (evm.model.Sr of gene analysis Phylogenetic 24 genes, S6). chorion (Table ricini identified S. also (evm.model.Sr were models gene and 1137) non-chorion 80 within five and models cluster, result, this gene a Within predicted As the cluster. using region. gene 1e-5, gene the than to chorion less addition e-value the in an around database, with protein 2019) non-redundant (Bateman, against database search Uniprot BLASTP performed we information, tailed of that analysis phylogenetic found the and OrthoFinder in cluster scaffolds gene short orthologs, chorion ideogram, Single-copy the the 2009). Identifying simplify al., To et connected. (Krzywinski were program genome, mori Circos each B. the in with OrthoFinder 2015) by Ng, identified & Yap, of Tan, similarity (Cheong, the assess to & order In Haeseler, replicates. Wong, The bootstrap for Minh, 1,000 alignment. ideogram Kalyaanamoorthy, on each circular based (S. for Drawing evaluated ModelFinder lengths were (Chernomor, by branch supports option and determined Branch ‘-sp’ model was 2017). with substitution partition Jermiin, reconstruction, different each allow tree of to likelihood model 2016) maximum Minh, for & 2018) Haeseler, Vinh, & Minh, Haeseler, rncit eesbitdt h eemdlstof set model gene the to submitted were transcripts eoeasml hc eentasge o2 hoooe,wr lee out. filtered were chromosomes, 28 to assigned not were which assembly genome tal et D . eoeusing genome fibroin .xylostella P. 21)adDong and (2015) . plexippus 0191.)adsrcn1 ,3(B1091 NP (AB112019.1, 3 2, sericin-1, and 001139413.1) ee,121 genes, HGAP and hro ee eitrda h npo n CIdtbs n n non-chorion one and database NCBI and Uniprot the at registered genes chorion chorion Fib-L and sericin .mori B. S JL .mr chorion mori B. . .plexippus D. scaf .mori B. ricini speeto o in not or present is tal et ee eetnel rae ncrmsm Cr ) o oede- more For 1). (Chr. 1 chromosome on arrayed tandemly were genes .mori B. .mori B. ee in genes .15 sotru.Msl a sdt eeaeainet fprotein of alignments generate to used was Muscle outgroup. as 2.1135) i- eunea query. as sequence Fib-L 21)rpre ht5ad4 and 5 that reported (2015) . a ihcsen hro eeo o,aioai eune f38 of sequences acid amino not, or gene chorion high-cysteine has a lge oddcdaioai eune f80 of sequences acid amino deduced to aligned was and eoeasmle hc eeue o BAT erhwas search TBLASTN for used were which assemblies Genome . and .ricini S. ee,21 genes, .ricini S. chorion .ricini S. .ricini S. 5 eoe,acrua dormwsdanuigClico using drawn was ideogram circular a genomes, ee eefudo h.1adteegnsmd a made genes these and 1 Chr. on found were genes genome P .ricini S. . xylostella genome, genomes .ricini S. 0168.,NP 001166287.1, Fib-H hog LSP eadn LC001867 Regarding BLASTP. through chorion sericin .mori B. hro ee,29 genes, chorion .mori B. , eoewscnutdwt the with conducted was genome Fib-L chorion .ricini S. HGAP ee r xrse nanterior in expressed are genes .plexippus D. ee a odce ih80 with conducted was genes i- (NP Fib-L and i- (NP Fib-H 0181.)sequences 001108116.1) JL eearayregistered already were genes p25 scaf P of 0078.)was 001037488.1) and .ricini S. . 2.1123,1128,1135,1136 .xuthus P. xuthus 001106733.1), .xylostella P. .ricini S. chorion chorion were Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. A ) h opnnso aiiso IEwr ieet al 8sostecp ubro ahLINE each of number copy the shows S8 Table different. were LINE of largest between families similar the were of is genome components the LINE the in 43.5% repeats, sequences B), repetitive ‘unclassified’ occupy 2A, all for to elements proportion Except its repeat 1). and in (Table that sequences genome repetitive estimated assembled of that superfamily the 2009) meaning of Chen, bp, bp) & 443,618,927 (196,045,652 (Tarailo-Graovac, up program genome counted assembled RepeatMasker 2). scaffolds the (Table thirty-five in chromosomes the These found to sequences 27). assigned Repetitive was = genome 2n the female: of 2011) 98.5% 28, Sahara, approximately & = Yasukochi, (Yoshido, 2n that report concluded previous (male: and the conducted to was of corresponds BAC-FISH line which where inbred S1B), ( an Fig. scaffolds 2, establish (Table indicates thirty-five to linkages above required of mentioned in are strategy analysis difference diapause periods Linkage of longer the difference times from The six year. result least per might at generations that six This least 2018). 31) at al., generate = can et (k Kim analysis S2; distribution because (Fig. k-mer Saturniidae assembly: between family the voltinism successful the of the to that in of to than belonging heterozygosity individual the key lower Low male are the considerably one 2018). scores S2), be Kawahara, in statistic heterozygosity might & these that & Cinel, project knowledge, Triant, estimated Jiggins, were this our 2018; Dasmahapatra, BUSCOs for of al., Kumar, of best et used (Challis, Kim the assemblies 97.8% duplicated) To 2016; genome and that S5). Blaxter, lepidopteran complete revealed was Table 8, ever–constructed species single-copy; length (see among and 42 BUSCOs scaffold best complete from tested (1614, longest genome 1,658 BUSCOs BUSCO The assembled among 1,658 using the 34.3%. analysis including in was detected (BUSCO) odb9, content completely Orthologs insecta GC Single-Copy with Universal 1). v3.0 Benchmarking (Table Mbp. Mb 33 21 approximately approximately was assembly of assembly Final of Overview Discussion and BC Results and of (QIAGEN) 98 PCR at kit designed genomic min Tissue markers and 2 and Genetic follows: extraction as Blood DNA was DNeasy trait. for program PCR used each 98 genomic S4). was for The Table (TaKaRa) respectively. chromosomes (see Ver.3 individuals, responsible utilized Polymerase were identify DNA analysis MightyAmp to linkage order scaffold in for up picked were phenotypes cocoon’ ‘Red BC and ‘Spot’ ‘Yellow’, ‘Blue’, BC MEGAX to of in using subjected chromosomes replicates were responsible 1,000 sequences 2018). Identifying with Tamura, Aligned methods & Knyaz, bootstrap 2004). Li, and (Edgar, Stecher, likelihood sequences (Kumar, maximum protein by of analysis alignments phylogenetic generate to used was of seven sericin with case yamamai conducted the was as sericin procedure of same the the to thus mapped Regarding were sequences presence. those the 3), Table (see registered already ° ,1 t60 at s 15 C, 1 niiul,wihhsayoeo ormrhlgcltat,‘le’‘elw’‘pt n Rdcocoon,’ ‘Red and ‘Spot’ ‘Yellow,’ ‘Blue,’ traits, morphological four of one any has which individuals, 1 .yamamai A. individuals .ricini S. ° n i t68 at min 1 and C ee L057 C88,L059 C89 n C89;Zrvce l,21) Muscle 2016). al., et Zurovec LC08591; and LC08590 LC08589, LC08588, (LC08587, genes .ricini S. .ricini S. sauiotn pce n mreol neprya,weesmultivoltine whereas year, per once only emerge and species univoltine a is eoeassembly genome sericin and eoews404945b ogwt 5 cffls h 5 egho the of length N50 The scaffolds. 155 with long bp 450,479,495 was genome .yamamai A. ee in genes ° C. .ricini S. .ricini S. .xylostella P. > nhre yamamai Antheraea .xuthus P. b)rvae httesaod r rue nofourteen into grouped are scaffolds the that revealed Mbp) 1 ti ut icl oetbiha nrdln of line inbred an establish to difficult quite is It : putative Fg.2,B.Itrsigy lhuhtettllnt fLINE of length total the although Interestingly, B). 2A, (Figs. .ricini S. osqecswr rvosyrgsee nGenbank, in registered previously were sequences no , 6 and sericin .ricini S. .plexippus D. a hrenatsmsadoeZchromosome Z one and autosomes thirteen has ee,three genes, genome a .46t .42 Tbe1 e Fig. see 1, (Table 0.0472% to 0.0466 was .xuthus P. , 087t .0% eae species related a 0.808%) to (0.807 a ae.Pyoeei analysis Phylogenetic taken. was , .mr sericin mori B. eoesqec oconfirm to sequence genome .yamamai A. .ricini S. ° n 0cce f1 at s 10 of cycles 40 and C and ee n five and genes .ricini S. . .mori B. .yamamai A. .ricini S. strain (Figs. A. 1 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. aeyhaycan(i-) ih-hi FbL n boeaei p5 Ioee l,20) twsbio- was it 2000), al., of et fibroin (Inoue Although (p25) fibrohexamerin protein. and silk (Fib-L) light-chain of (Fib-H), component heavy-chain major namely the is Fibroin 1982). sericin Kafatos, and Fibroin & that to plausible (Rodakis, seemed highly genes winter proteins is protein chorion the chorion it Hc Hc in species, analysis, the because diapause high-cysteine phylogenetic in diapause, survive and absent subfamilies, embryonic search to be three BLAST for embryos the the role to for Among important according eggshells Interestingly, an 2015). of play al., hardness to et increase considered Papantonis is 1986; chorion al, (Hc) et (Lecanidou respectively 63 duplication apparent above-mentioned high the the to of addition ground In the be conversion. can gene which or cluster, duplication specific gene tandem These a through as respectively. rate 1 consist genes, are chromosome protein OG0000131) and on chorion Ortholog (OG0000113 proximity 30 OGs 3C). two and OGs, (Fig. these 33 Of species of 159. 5 were OGs the related tree among non-retrotransposon 205 phylogenetic identified relationships a 2015) and genetic Kelly, 3B, & ricini the Fig. (Emms, explains in OrthoFinder shown orthologs using is copy analysis species single Lepidoptera 5 3,907 almost among of that (OGs) suggesting orthogroups of plot, number the The in observed hardly were rearrangements, regions. links chromosomal specific’ few frequent of extremely despite ancestor However, of or the regions links 2009). entire in no al., happened of et Krzywinski regions fusion, 2015; genomic chromosome al., and et translocation (Cheong 3A) as such among orthologs chromosomes, copy among single of links domains which plot populous circos 2). of The 2 proportion Fig. large in top LTR 000477) the al., the LINE, (IPR that et (SINE, are domain reflect Stanke elements transcriptase may (IPR001584) 2014; Reverse which of domain al., that S3), assembly et shows core genome (Fig. analysis Lomsadze the catalytic 2017) 2019; in Integrase al., genes al., et protein-coding and et (Finn 16,702 Hoff InterProScan predicted 2016; software 1). al., 2008) (Table et al., et Hoff Stanke 2009; 2006; al., et (Camacho BRAKER2 initio Ab sequences. repetitive of expansion independent on in independently SINEs and the the parallel While that 2A). in (Fig. was occurred feature have noteworthy to Another seems sequences branches. repetitive phylogenetic of own expansion their the 2A), (Fig. and in family in family aao,18;Ppnoi,Sees aru 05 oai,&Kfts 92.Bsdo euneho- sequence on Based ( groups & 1982). two Eickbush, Kafatos, into Rodakis, & categorized (Lecanidou, chorion Rodakis, environment be that 2015; the can suggesting Iatrou, to proteins environment, & adaptations chorion the Swevers, reflect mology, Papantonis, from to embryos evolve 1986; 3D). to Kafatos, protect (Fig. clades likely and distinct are eggshell into proteins fell comprise OG0000131 proteins and OG0000113 Chorion from genes chorion because plexippus proteins D. chorion and xuthus P. xylostella, in present .mori B. seicOs ot-i G r eae ortornpsbeeeet Fg ) Thus, 2). (Fig. elements retrotransposable to related are OGs forty-six OGs, -specific chorion .ricini S. eepeito n oprtv eoeanalysis genome comparative and prediction gene .ricini S. .ricini S. .ricini S. aelre mut frpttv eune ntegnm hnohrlpdpea pce do species lepidopteran other than genome the in sequences repetitive of amounts larger have ee,17 genes, .ricini S. .ricini S. eoeocpe ny008% hsfidn lospotdtehptei fprle and parallel of hypothesis the supported also finding This 0.0588%. only occupied genome eoe hlgntcaayi fteegnsaogwt hro ee of genes chorion with along genes these of analysis phylogenetic A genome. h ags aiyin family largest the , and .mori B. .mori B. chorion and eoe(i.3,TbeS n al 9.Gvnthat Given S9). Table and S6 Table 3D, (Fig. genome .mori B. eoesoe ag rprino IE(94 falrpttv sequences), repetitive all of (19.4% SINE of proportion large a showed genome eoe.Freape hl h R-eo aiywstelretLINE largest the was family CR1-Zenon the while example, For genomes. ee eefudi hscutr al 6smaie l 80 all summarized S6 Table cluster. this in found were genes .ricini S. eoe r eirclycrepnigadteebrl xs ‘species- exist barely there and corresponding reciprocally are genomes ugssta eedpiaincudhv eutdi iesfiainof diversification in resulted have could duplication gene that suggests .mori B. .ricini S. ak ccoingenes. chorion Hc lacks a oky ie hs eut,atog both although results, these Given Jockey. was eoecnan osdrbysalaonso SINE of amounts small considerably contains genome 7 .mori B. .ricini S. .ricini S. and α .ricini S. pcfi hro ee r oae nclose in located are genes chorion specific and eoei cuidb retrotransposable by occupied is genome .ricini S. .mori B. β ,wihicuetresubfamilies, three include which ), seicOs(i.3) f205 Of 3B). (Fig. OGs -specific hw ag cl rearrangement scale large shows ossstrepolypeptides, three consists .ricini S. sanon-diapause a is .ricini S. .ricini S. .ricini S. chorion .mr,P. mori, B. .ricini S. .ricini S. .ricini S. specific genes genes (Fig. S. - Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. iha661mlclrrto hc scniee ob nipnal o rprsceino bon mutations fibroin: of secretion proper for indispensable be to considered is which in ratio, molecular 6:6:1 a of with fibroin silk above, described lepidopteran As other . Because saturniid S4B). of ancestor (Fig. common search TBLASTN possess through 2010), moth, saturniid including another species, 2018), al., et ai n eetvl rnpr h bonplppiecmlxt ue nsl ln.Snestridspecies saturniid Since gland. molecular silk 6:6:1 in with lumen in to polypeptides complex three polypeptide that the fibroin lack hypothesized the by transport been assembled selectively has fibroin and ratio it of Therefore, structure three-dimensional sericin. recognizes of composed mainly are in deletions model that lacks gene revealed but the also construct information properly genome of to The sequence determined unable coding already was near-complete the was prediction detected (SrFib-H) gene our Fib-H but of (2014), sequence for Yukuhiro acid and amino Sezutsu complete by 1988).The Kubota, & (Tamura, of fibroin that confirmed chemically rsn ncco Tks,Ht,Uhn,&Zag 00.Topoen eie rmatraiesplicing alternative from derived proteins Two 2010). is Zhang, sericin & fibroin, Uchino, Unlike Hata, of silk. (Takasu, fibroin. of cocoon layer following in outer present protein, most silk the of consisting and and proportion of water largest multi-copies to of second soluble presence the The occupies fibroin. Sericin of structure in complex undetectable of is part protein of the ricini p25 knockout being Since Whether than other unclear. answered. function still be is to p25 remains of not function biological far, So ssoni is Bad4,tepeoye rgnlydrvdfrom derived originally phenotypes the 4C, (BC and 1 4B generation Figs. in shown As in genes analysis. trait-related identifying at using analysis aiming between genetic research forward in of out feasibility phenotypes carry the to cocoon examine and to larval order the In of chromosomes responsible the of on Identification analysis Proteomic indigenous composition. putative their elucidated. protein of seven be the differences by the to reflect encoded may remains saturniids proteins cocoons these while whether among identified genes However, not sericin were of environments. diversity genes The class Ser2 S5). and (Fig. class Ser1/3 to belong in in multiplicated to of included seemed repertoires were gene genes sericin them three (Saturniidae), of other all that the confirmed and successfully analysis BLAST 2015). Sericin al., in et Tsubota present as 2015; are al., genbank et NCBI (Dong S8) to (Table registered are transcripts Fib-H .mori B. Ser2 SrFib-H Ser3 Fib-L eoeaepsn h osblt ffntoa ieetainaogparalogous among differentiation functional of possibility the posing are genome ee.Pyoeei nlsssoe htfu u fsvngnsaectgrzdinto categorized are genes seven of out four that showed analysis Phylogenetic genes. a efudi avlsl rdcddrn h rwn tgs(aaue l,21) ofr nine far, So 2010). al., et (Takasu stages growing the during produced silk larval in found be can .ricini S. Fib-L or Tuoae l,21) hl e1adSr eetecmoet fcco rti,Sr snot is Ser2 protein, cocoon of components the were Ser3 and Ser1 While 2015). al., et (Tsubota ee bontasotto n erto ytmi auni pce utb ieetfo that from different be must species saturniid in system secretion and transportation fibroin gene, . Fib-L Fib-H anybcueo t eeiiesqecs oee,TLSNsac sn ri- squery as SrFib-H using search TBLASTN However, sequences. repetitive its of because mainly , Fib-L .ricini S. Tbe3.I diin ecnre that confirmed we addition, In 3). (Table 1 and as bonsceindfiiny(nu ta. 00 ae l,2014). al., et Ma 2000; al., et (Inoue deficiency secretion fibroin cause .mori B. niiul hc eeotie ytecosn ( crossing the by obtained were which individuals ) or ee bec of absence gene, Fib-L .c pryeri c. S. eoeadtasrbdfo ee ou,maigthat meaning locus, seven from transcribed and genome .yamamai A. , antpoel ert bonpoent ue nsl ln,adtercocoons their and gland, silk in lumen to protein fibroin secrete properly cannot .xylostella P. .mori B. Fg.1ad4) hssc riscnb odtresfrfradgenetic forward for targets good be can traits such thus 4A), and 1 (Figs. .ricini S. Fib-L hlgntcaayi eeldta l eii ee in genes sericin all that revealed analysis Phylogenetic . .yamamai A. osssHcan -hi n 2.3firi oyetdsassemble polypeptides fibroin 3 P25. and L-chain H-chain, consists , .ricini S. .xuthus P. .ricini S. nstridmtscnb srbdt h osof loss the to ascribed be can moths saturniid in ak i- n 2 n tcnit fFbHFbHhomodimer Fib-H/Fib-H of consists it and p25 and Fib-L lacks Ser2 SrFib-H Sericin and ls Fg 5.Dsieblnigt h aefamily same the to belonging Despite S5). (Fig. class eoehstrecpe of copies three has genome and 8 and Fg 4) uprigteacrc fteassembly. the of accuracy the supporting S4A), (Fig. .c pryeri c. S. ecdn ee or genes -encoding Fib-L ocr cephalonica Corcyra .ricini S. .ricini S. .mori B. sasn ntegnm of genome the in absent is .ricini S. oon hudb are u oreveal to out carried be should cocoons oemrhlgcltat r different are traits morphological Some . eeqiedffrn:Sr/ ls genes class Ser1/3 different: quite were .ricini S. .c pryeri c. S. a three has p25 .ricini S. Ser .mori B. ffcstesceino bonor fibroin of secretion the affects ee in genes silk, Sericin oss he e2casgenes class Ser2 three possess Catna Dutta-Gupta, & (Chaitanya, sericin p25 x p25 eeioae nbackcross in isolated were .c pryeri c. S. .ricini S. a ehns which mechanism a has nadto to addition in , lk ee in genes -like ol aeo different on take could .ricini S. .mori B. genes, p25 .ricini S. .yamamai A. Tbe3). (Table s a putative 7 has r rsn in present are Ser1 Fib-L .yamamai A. x ) Ser1/3 tan with strains .ricini S. etried we , .ricini S. p25 .ricini S. , Fib-H Ser2 nthe in (Kim in class S. , , Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. rmohrlpdpea pce,btsm rhlggop,sc as such groups, ortholog some but species, lepidopteran of other sequence from genome whole the determined We are We step. cocoons, degumming in condition. Conclusion the formation degumming colour before the red on be of improvement to basis about Therefore, appeared genetic information the much they S7). reveal with are as (Fig. and us pigments red pigment step provide red as red degumming will because the not which the identify cocoons during is to away red cocoons attempting swept from now red easily silk are from red and made produce layer Eri-silk to sericin difficult in is present it mainly However, individuals. cocoon’ single a in never has side. found on phenotype each be spots cocoon’ ‘Red individuals. on black can Interestingly, F ‘Spot’ of spots in spots colour. to patterns red observed three eleven similar shining various been have brightly patterns total, show exhibits segment spot In phenotype which uniform cocoon’ body strains have ‘Red spots. larval silkworm strains five of single those have number of a none segment a of integument, a are larval sides there of lateral Although side S6, dorsal segment. the Fig. addition, colour. in ‘green’ In shown larval As of formation of body. of has larvae basis of lethal’ genetic pattern are ‘lemon the Spot lethal’ reveal and to ‘lemon ‘lemon’ us and allow of ‘lemon’ will colour Furthermore, phenotype of yellow are phenotypes carotenoid. The mutant There ricini not two S. these identified. xanthopterin, 2009). indicating be be al., phenotypes, to to recessive et yet elucidated (Meng has been colour example already ‘Yellow’ one form represent which of lethal’ substances strains exact mutant few the a ‘Blue’, be to contrast to In elucidated of integument was phenotype. larval ‘Blue’ colour the for in namely blue gene amount pigments, forms protein blue BBP-II which ricini However, and S. substance (BBP-II). yellow in- II the greenish of protein their mixture ‘Blue,’ Binding of is Regarding larvae because IX identified. caterpillar’ lepidopteran biliverdin were bilins. ‘green in those and called before found long sometimes carotenoids colour be are Greenish not demonstrated species will which tegument. lepidopteran it report yet, many first identified of the been Larvae not is have This phenotypes in ‘Blue,’ 5). four achievable for the is (Fig. responsible of analysis respectively were genetic chromosomes traits, forward those cocoon’ that that ‘Red meaning and individuals, ‘Spot’ distinguish cocoon’ ‘Yellow,’ ‘Red molecularly and can ‘Spot’ ‘Yellow,’ which markers, chromosome-specific pryeri with PCR BC Genomic all BC in the from heterozygotic of derived be chromosomes phenotypes should four all chromosomes above-mentioned females, the lepidopteran that in Considering occur be which not should cocoons uals does of refers recombination colour ‘Spot’ meiotic the respectively. illustrates ‘Spot’ Since integument, ‘Yellow,’ literally larval phenotype ‘Blue,’ yellow cocoon’ namely, and ‘Red phenotypes, blue BC integument. 4 to larval some for refer on ‘Yellow’ chromosomes spots and black responsible ‘Blue’ to the identify cocoon.’ to ‘Red tried and we Here, . Ge oon steeittctatt Rdcco’peoye(i.4) Rdcco’i n fteeconomic the of one is cocoon’ ‘Red 4C). (Fig. phenotype cocoon’ ‘Red to of trait traits epistatic the is cocoon’ ‘Grey eeldta hoooe8 3 n 2wr nfrl eeoyoi naleaie ‘Blue,’ examined all in heterozygotic uniformly were 12 and 3 13, 8, chromosome that revealed , osntsgicnl ie Sio,21) niaigta B-Iecdn eei o responsible a not is gene encoding BBP-II that indicating 2011), (Saitoh, differ significantly not does eas Ylo’i oiatpeoye dniyn h epnil ee o Bu’ad‘Yellow’ and ‘Blue’ for genes responsible the Identifying phenotype. dominant a is ‘Yellow’ because .ricini S. 1 niiul produce. individuals γ Sio,21) ntelra neuetof integument larval the In 2011). (Saitoh, .ricini S. oeeorcsof eco-races Some . 1 niiul.Clu fccosof cocoons of Colour individuals. .mori B. - .c pryeri c. S. .c pryeri c. S. hc hwylo neuetclu hntp,ad‘eo’ad‘lemon and ‘lemon’ and phenotype, colour integument yellow show which .ricini S. n So’idvdasi nfrl omdi vr emn flarval of segment every in formed uniformly is individuals ‘Spot’ and eeoyoi or heterozygotic .ricini S. .ricini S. nIdaaekont pnrdccos iia oor‘Red our to similar cocoons, red spin to known are India in 1 individuals. 9 .c pryeri c. S. (and h eerpror of repertoire gene The . .ricini S. .c pryeri c. S. .c pryeri c. S. .mori B. - .ricini S. n F and chorion .c pryeri c. S. iiednIX Biliverdin , 1 .Atog h epnil genes responsible the Although ). a ohn od ih‘elw of ‘Yellow’ with do to nothing has niiul sge,adtu,this thus, and grey, is individuals ee,and genes, ooyoi,adntchimeric. not and homozygotic, .ricini S. r oiat responsible dominant, are γ sericin id oBiliverdin- to binds .ricini S. a o odistinct so not was .c pryeri c. S. ee,seemed genes, 1 and individ- .c. S. and Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. hroo,O,VnHeee,A,&Mn,B .(06.TraeAaeDt tutr o Phylogenomic for Structure Data of Aware Terrace service https://doi.org/10.1093/sysbio/syw037 web-based (2016). 997–1008. interactive Q. 65(6), An Biology, B. Systematic FS: Minh, ClicO Supermatrices. & from (2015). A., Inference P. Haeseler, K. the Von Ng, of O., & analysis Chernomor, comprehensive J., A S. 3685–3687. (2015). Yap, K. 31(22), C., Mita, Bioinformatics, Y. Circos. Tan, . . . H., 1–11. Y., W. 5, Lepi- Guo, Cheong, Reports, The J., Scientific Liu, Lepbase: silkmoth. S., (2016). Li, in M. H., locus Blaxter, Guo, chorion & J., D., Nohata, Z., C. Chen, Jiggins, https://doi.org/10.1101/056994 K., 056994. cephalo- BioRxiv, K. Corcyra database. Dasmahapatra, genome S., of dopteran Kumar, genes J., 20- P25 R. Endocrino- Challis, and and Comparative and developmental fibroin General development. synchronous chain 113–121. larval 167(1), expression, instar Light logy, last tissue-specific the (2010). during characterization, A. regulation hydroxyecdysone cloning, Dutta-Gupta, Molecular & nica: K., R. Chaitanya, 1972–1973. alignment 25(15), Bioinformatics, automated analyses. for phylogenetic tool large-scale (2009). in A L. trimming trimAl: T. (2009). Gabald´on, T. Madden, & M., & Silla-Mart´ınez,Capella-Guti´errez, J. S., K., of Bealer, 1–9. Journal 10, J., Bioinformatics, Papadopoulos, plants. BMC N., 10-421 applications. food Ma, and different V., Architecture perfor- Avagyan, to rearing BLAST+: G., reference and Coulouris, morphology with C., on crossbreed Camacho, 12–19. study canningi 3(5), comparative Studies, Samia A Zoology and and (2015). ricini Entomology K. Samia Dutta, data. of sequence & Illumina mance A., for Swargiary, trimmer D., flexible a Brahma, Trimmomatic: (2014). Research, B. 2114–2120. Usadel, Acids 30(15), & Nucleic Bioinformatics, M., sequences. Lohse, DNA M., 47(D1), A. analyze Bolger, Research, to program Acids a Nucleic finder: 573–580. repeats knowledge. 27(2), Tandem protein (1999). of G. 47(D1), hub Benson, Research, worldwide Acids a Nucleic UniProt: knowledge. D506–D515. (2019). protein A. of Bateman, hub worldwide A UniProt: Sequenced D506–D515. in Families (2019). Sequence A. Repeat of https://doi.org/10.1101/gr.88502 Bateman, Identification Novo 1269–1276. De 12(8), Automated Research, Genome (2002). R. Genomes. S. Eddy, & Z., Bao, JSPS References numbers by grant supported KAKENHI was JSPS study 18H03949. and This and 17H06431 strategy. 17H05047 and 15H02482, assemble numbers the JP22128004 grant on number KAKENHI advice grant valuable KAKENHI for MEXT Kasahara Masahiro thank We silkmoth.’ wild of genetics Acknowledgements ‘forward for way the Now, paved traits. has certain report the for this chromosomes that responsible anticipating the are identified successfully we we thus applicable, is analysis in specifically evolve to https://doi.org/10.1093/nar/gky1049 https://doi.org/10.1093/nar/gky1049 https://doi.org/10.1093/nar/27.2.573 https://doi.org/10.1016/j.ygcen.2010.02.007 .ricini S. h ult ftegnm sebymtteciei hc owr genetic forward which criteria the met assembly genome the of quality The . https://doi.org/10.1093/bioinformatics/btu170 https://doi.org/10.1093/bioinformatics/btv433 10 https://doi.org/10.1038/srep16424 https://doi.org/10.1186/1471-2105- https://doi.org/10.1093/bioinformatics/btp348 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. o,K . osde . ooosy . tne .(09.WoeGnm noainwt BRAKER with Annotation Whole-Genome (2019). Unsupervised M. Stanke, BRAKER1: & 65–95). M., Borodovsky, (pp. (2016). A., Lomsadze, M. J., K. Stanke, 32(5), Hoff, Bioinformatics, & 1. M., Table Borodovsky, AUGUSTUS: and A., GeneMark-ET Improving767–769. Lomsadze, with UFBoot2: Annotation S., Genome Lange, (2018). RNA-Seq-Based S. J., L. K. Evolution, Vinh, and Hoff, & Biology Molecular Q., evolution. B. novo https://doi.org/10.5281/zenodo.854445 and Minh, De biology 518–522. A., Molecular 35(2), Haeseler, Approximation. (2013). and von Bootstrap A. O., Ultrafast generation the Regev, Chernomor, reference T., & for from D. N., platform Westerman, Retrieved Hoang, Trinity Friedman, N., 1494. Weeks, the R.D., 8, F., using Protocols, LeDuc, Strozzi, Eccles, RNA-seq Nature N., R., M.B., Pochet, from analysis. Couger, Henschel, J., reconstruction J., Orvis, C.N., sequence Bowden, M., Dewey, D., transcript Ott, P. T., M.D., Blood, MacManes, William, M., M., R., Grabherr, Lieber, Wortman, M., B., Yassour, B., Li, to C. A., D., Robin, Program Papanicolaou, O., the J., White, and B. of 1–22. J., EVidenceModeler Haas, 9(1), Dichotomy Orvis, using Biology, E., annotation Genome J. (2019). structure Allen, Alignments. gene R. M., Spliced eukaryotic 29(23), Pertea, Assemble J. Automated Biology, W., Walters, Zhu, (2008). Current L., & R. S. J. Butterfly. P., Salzberg, Monarch Andolfatto, J., the B. D., Haas, of R. Chromosome Reed, Z J., Neo https://doi.org/10.1016/j.cub.2019.09.056 J. A. the 4071-4077.e3. Mitchell, Lewis, along & F., Compensation Y., P. Dosage S. Reilly, 45(D1), Young, L., Research, L., S., Acids Richardson, Gu, L. Nucleic N., Yeh, annotations. Redaschi, domain I., N., Orengo, and D., Xenarios, Thanki, family G., H., G., N. protein Nuka, Cathy D190–D199. Sutton, 2017-beyond Rawlings, M., Wu, S., in C., InterPro Necci, Squizzato, C.E., (2017). B., S. A., Silvio L. Smithers, Tosatto, D. Potter, I., D., Natale, D., Sillitoe, P. C., J., Piovesan, Thomas, Sigrist, Mistry, Letunic, S., X., A., H., Sangrador-Vegas, Pesseat, Huang, Yu Mi, C., H., Y., Hsin Rivoire, A., Huang, Chang, Park, L., Marchler-Bauer, J., G. A., S., Holliday, A. D., Lu, C. Bridge, Haft, P., R., J., Bork, Gough, Lopez, A., M., I., Fraser, Bateman, S., C., El-Gebali, P. Z., Babbitt, Dosztanyi, K., T. Attwood, D., R. Finn, dra- comparisons genome S. 47(D1), whole 1–14. Tosatto, Research, in 16(1), 015-0721-2 Biology, biases D., Acids Genome fundamental accuracy. Piovesan, Nucleic solving inference OrthoFinder: 2019. orthogroup L., improves (2015). Richardson, in matically Paladin, S. Kelly, database M., L., & families M., Qureshi, Hirsh, D. protein Emms, C., L., Pfam S. L. The Potter, (2019). E. D. A., Sonnhammer, R. Luciani, A., D427–D432. Finn, Smart, R., & A., S. E., Eddy, G. C. A., Salazar, Bateman, J., (2015). J., L. Mistry, H. S., Xiang, Nucleic El-Gebali, & throughput. high W., and Wang, accuracy 1792–1797. high structure 32(5), X., with silk alignment Research, Li, sequence of Acid multiple Y., basis MUSCLE: (2004). genetic Liu, C. the P., R. Edgar, imply Yang, 1–14. silkmoths 16(1), L., six Genomics, Chen, of BMC H., glands coloration. silk Liu, and on Y., analyses Ren, transcriptome F., Comparative Dai, Y., assembly. Dong, genome 2008. for Genomics, Plant selection of size Journal k-mer International G automated & A., and Conesa, Informed (2014). 31–37. P. 30(1), Medvedev, Bioinformatics, & R., Chikhi, https://doi.org/10.1093/bioinformatics/btv661 https://doi.org/10.1093/nar/gkw1107 https://doi.org/10.1093/nar/gky995 https://doi.org/10.1007/978-1-4939-9173-0 t,S 20) ls2O opeesv ut o ucinlaayi npatgenomics. plant in analysis functional for suite comprehensive A Blast2GO: (2008). S. ¨ otz, https://doi.org/10.1093/bioinformatics/btt310 https://doi.org/10.1093/nar/gkh340 https://doi.org/10.1186/s12864-015-1420-9 https://doi.org/10.1155/2008/619832 https://doi.org/10.1038/nprot.2013.084 11 5 https://doi.org/10.1186/gb-2008-9-1-r7 https://doi.org/10.1186/s13059- Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. ih .Q,Nue,M .T,&VnHeee,A 21) lrfs prxmto o phylogenetic for approximation Ultrafast (2013). A. Haeseler, Von https://doi.org/10.1093/molbev/mst024 & 1188–1195. 30(5), T., Evolution, A. and Biology M. Molecular Nguyen, bootstrap. Q., B. reductase sepiapterin 11698–11705. Minh, human T. 284(17), for Shimada, Chemistry, model & insect K., Biological potential Mita, of a T., Tamura, is Journal H., lethal) deficiency. Sezutsu, (lemon K., lemon Uchino, mutant silkworm Y., The occurrences Banno, (2009). T., of Daimon, counting S., Katsuma, parallel Y., efficient Meng, for 764–770. approach 27(6), lock-free Bioinformatics, fast, k-mers. A Genome of (2011). (2014). C. Q. Scientific Kingsford, Xia, bioreactor. & & efficient P., Mar¸cais, highly G., Zhao, a J., for Zhang, gland W., silk 1–7. Lu, mori Bombyx J., 4, empty Gao, Reports, an J., provides Chang, gene Y., BmFib-H Liu, of X., editing Wang, automatic R., into 1–8. Shi, reads 42(15), S., RNA-Seq Research, Ma, mapped Acids Nucleic of Bioin- Integration algorithm. sequences. finding (2014). gene M. long eukaryotic Borodovsky, of noisy & training for D., P. assembly Burns, novo A., de Lomsadze, and mapping Fast 2103–2110. epi- miniasm: the 32(14), and in formatics, Minimap acid e0205758. uric 13(10), (2016). of ONE, H. Accumulation PLOS Li, larvae. (2018). ricini S. Samia Katsuma, of & integument United white T., the the Shimada, forms of M., chorion dermis Kawamoto, Sciences moth T., of silk Kiuchi, Academy the J., National of Lee, the Evolution of 6514–6518. (1986). Proceedings 83(17), C. CB. America, F. and of Kafatos, CA States & families H., gene T. genetics Eickbush, superfamily: evolutionary C., gene Molecular G. X: Rodakis, MEGA 1547–1549. R., 35(6), (2018). 1639– Lecanidou, Evolution, M. K. and 19(9), Tamura, Biology Marra, & Molecular Research, & C., platforms. Knyaz, Genome computing J., M., across S. analysis Li, genomics. G., Jones, Stecher, comparative D., S., Japanese for Kumar, Horsman, the aesthetic R., of information Gascoyne, 7(1), sequence An J., GigaScience, Genome W., Connors, 1645. Circos: S. Saturniidae. I., (2018). Kim, family Birol, (2009). W. H., the J., A. S. K. in Schein, Park, Choi, genome M., & draft B., Krzywinski, W., S. first T. the Kim, Goo, Y., yamamai: I., K. Antheraea 1–11. Kim, Kim, moth, require- K., M., silk memory Caetano-Anolles, Kim, low oak with H., S., aligner Kim, J. spliced W., Hwang, fast Kwak, a HISAT: S.R., from Retrieved Kim, (2015). 357. L. 12, S. Methods, Salzberg, Nature & ments. B., Langmead, D., 587–607. 26(6), Kim, Genetics, Drosophila. and Bombyx http://www.ncbi.nlm.nih.gov/pubmed/17247024 in Formation from Pigment sequence Retrieved multiple of Mechanism rapid for 3059–3066. (1941). method 30(14), H. novel Research, Kikkawa, a Acids MAFFT: Nucleic (2002). transform. Fourier T. ModelFinder: fast Miyata, on & (2017). based K., alignment S. https://doi.org/10.1038/nmeth.4285 Kuma, L. K., 587–589. Jermiin, Misawa, 14(6), & K., Methods, A., Katoh, Nature Haeseler, estimates. von phylogenetic F., accurate 462– for K. selection T. 110(1–4), model Wong, fast Research, Repbase Q., Genome B. (2005). Minh, and J. S., Kalyaanamoorthy, Walichiewicz, Cytogenetic & elements. O., repetitive Kohany, P., eukaryotic Klonowski, of 467. A., database Pavlicek, a V., Update, a V. with Kapitonov, P25, J., and mori 40517–40528. L-chain, Bombyx Jurka, 275(51), H-chain, of Chemistry, of fibroin Biological Silk consisting of Journal unit (2000). elementary ratio. S. mass molar Mizuno, molecular 6:6:1 & high K., a Ohtomo, assembling S., secreted, Kimura, is F., Arisaka, K., Tanaka, S., Inoue, https://doi.org/10.1159/000084979 https://doi.org/10.1101/gr.092759.109 https://doi.org/10.1093/gigascience/gix113 https://doi.org/10.1038/srep06867 https://doi.org/10.1093/bioinformatics/btw152 https://doi.org/10.1073/pnas.83.17.6514 https://doi.org/10.1093/bioinformatics/btr011 https://doi.org/10.1038/nmeth.3317 12 https://doi.org/10.1074/jbc.M900485200 https://doi.org/10.1074/jbc.M006897200 https://doi.org/10.1093/nar/gku557 https://doi.org/10.1093/nar/gkf436 https://doi.org/10.1371/journal.pone.0205758 https://doi.org/10.1093/molbev/msy096 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. aal-roa,M,&Ce,N 20) sn eetakrt dniyrpttv lmnsi eoi se- genomic in elements the repetitive 1–14. in identify 25), to (SUPPL. polypeptides Bioinformatics, RepeatMasker in Using fibroin Protocols Current (2009). quences. of N. Chen, In weight PAGE. & SDS molecular M., Tarailo-Graovac, by of ricini Silkmoths. cynthia sericin determination Wild Philosamia 40(4), for and major A Biology, Society pernyi as (1988). International Antheraea Molecular yamamai, proteins T. Antheraea and silkworms, Ser2 Kubota, Biochemistry saturnid of & Insect Identification T., mori. (2010). Tamura, Bombyx Q. of Zhang, silk Tine & non-cocoon Bivol 339–344. K., the and Uchino, in Univoltine 8. T., in components 17(3), Hata, Patterns Entomologist, Lakes Emergence Y., Great Takasu, and The Diapause Saturniidae). (Lepidoptera: (1984). Promethea G. of Waldbauer, Populations cDNA 7(1), & mapped syntenically J., Bioinformatics, and 637–644. native Sternburg, BMC 24(5), Using Bioinformatics, (2008). finding. sources. D. gene novo Haussler, de external & improve to R., from alignments Baertsch, M., hints Diekhans, M., uses Stanke, that model Markov 62. hidden generalized comparison. a Sch sequence biological M., for heuristics Stanke, of : generation LEPIDOPTERA Automated 31. 43–52. ( (2005). 6(5), 30(1), utilization E. Bioinformatics, Science, Birney, for BMC & Insect species prospects C., of of their S. Journal clarification G. and PROSPECTS. their Slater, India and THEIR Diversity in AND ) (2017). INDIA C. Saturniidae IN P. ) Pathania, : SATURNIIDAE & Lepidoptera cynthia A., ( S. (Samia 59–70. Samia Ahmed, eri-silkworm Genus 83(3), R., the Sericology, Kumar, of and K., Biotechnology sequence B. Singh, Insect nucleotide of complete Journal The gene. (2014). fibroin of K. ricini) Yukuhiro, States from & Retrieved United H., 26–34. (23), the Sezutsu, Kankyo, of hihu- seisitu. high-cysteine no to Sciences ketugoutanpakusitu a –sinjusanyoutyutaieki Biliverdin aoirosikisoketugoutanpakusitu how okeru of ni no sosiki youtyu Academy proteins: Yamamayuga-ka in National (2011). novelty H. the evolutionary Saitoh, of of Origin Proceedings 3551–3555. (1982). 79(11), evolved. C. America, has F. genomes. large protein Kafatos, in & chorion families (2015). repeat C., L. of G. identification S. novo Rodakis, De Salzberg, (2005). 1(suppl & A. Suppl P. 21 Pevzner, T., England), & (Oxford, J. C., Bioinformatics Biotechnology, N. Mendell, Jones, Nature L., -C., A. reads. Price, T. RNA-seq from Chang, from transcriptome M., Retrieved Incarnate a C. 290. the of 33, of Antonescu, reconstruction University improved M., Samia. enables G. genus StringTie Pertea, silkmoth the M., of Pertea, revision A (2003). S. Naumann, Struc- Word. & Evolution, S., Their R. of Peigler, Landscape A 177–194. 60(1), Genes: Entomology, Chorion A of Review (2015). 010814-020810 (2015). Annual K. H. Regulation. Iatrou, Fujiwara, and & ture, L., . Swevers, . . A., Y., Papantonis, 405–409. 47(4), Suzuki, Genetics, T., Nature butterfly. Ando, Papilio Effective J., in and https://doi.org/10.1038/ng.3241 mimicry Yamaguchi, Batesian Fast R., female-limited A for Kajitani, mechanism IQ-TREE: genetic T., Iijima, (2015). Evolution, H., Q. and Nishikawa, B. Biology Minh, Molecular & Phylogenies. A., 268–274. Maximum-Likelihood Haeseler, 32(1), Estimating von for A., Algorithm H. Stochastic Schmidt, L.T., Nguyen, https://doi.org/10.1186/1471-2105-7-62 https://doi.org/10.1016/j.ibmb.2010.02.010 https://doi.org/10.1093/molbev/msu300 ffan . ogntr,B,&Wak .(06.Gn rdcini uaytswith eukaryotes in prediction Gene (2006). S. Waack, & B., Morgenstern, O., ¨ offmann, https://doi.org/10.1038/nbt.3122 https://doi.org/10.1073/pnas.79.11.3551 https://doi.org/10.1186/1471-2105-6-31 ) i351-8. 1), 13 https://doi.org/10.1093/bioinformatics/bti1018 https://doi.org/10.1002/0471250953.bi0410s25 https://doi.org/10.1146/annurev-ento- https://doi.org/10.1093/bioinformatics/btn013 http://hdl.handle.net/10212/2037 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. al h euto ikg analysis. linkage of result The 2 Table of Features 1 Table results. Figures the for discussed and library J.L. the Table Illumina and prepared prepared S.K. Y.S. T.K., T.N. runs. T.S., sequence manuscript. runs. the the sequence performed the wrote S.S. performed and and and data K.Y. sequencing the RNA sequencing. analyzed genome and for experiments libraries the designed J.L. Contributions DRA009718 and Author DRA009717 numbers accession the under available https://datadryad.org/stash/share/5hfxrylqOiDgzbXoM at available are is study assembly Genome this respectively. (DDBJ), in obtained data NGS Biomacro- Accessibility yamamai. Antheraea Data Strnad, of L., Silk Kucerova, the C., in L. Composition Vieira, Sericin D., (2016). Kodrik, 1776–1787. H. 17(5), F., (2019). Sehadova, molecules, B. Sehnal, & P., D. B., Konik, Gammon, Kludkiewicz, the & H., of https://doi.org/10.1073/pnas.1818283116 N., V., Yonemura, Proceedings 1669–1678. N. 116(5), M., interactions. America, Grishin, Zurovec, virus–host of H., States and Insect United D. capability the of Lepidoptera. Janzen, flight Sciences W., into of of Academy Hallwachs, insights National species provides A., genome E. model moth Rex, gene the Gypsy Q., Comparative and Cong, mori: J., karyotype Bombyx Zhang, low-number 370–377. versus and 41(6), a cynthia Biology, prediction with Samia Molecular gene and (2011). species to Biochemistry K. a assessments Sahara, between quality Kriventseva, & mapping G., from Y., Klioutchnikov, Yasukochi, applications P., A., BUSCO 543–548. Yoshido, Ioannidis, 35(3), (2018). M., Evolution, M. and Manni, Biology E. A., Molecular Zdobnov, F. phylogenomics. & Simao, V., M., Seppey, E. de- variant M., microbial comprehensive Wort- R. for Q., tool Waterhouse, Zeng, integrated 9(11). A., An ONE, C. Pilon: PLoS improvement. Cuomo, (2014). assembly M. S., Schatz, genome A. Sakthikumar, and Earl, A., & tection & Abouelliel, K., J., M., S. Priest, Young, Gurtowski, T., J., 33(14), Shea, man, H., T., Bioinformatics, Fang, Abeel, reads. J., J., short B. Walker, C. from profiling Underwood, from genome M., assembly reference-free Fast genome Nattestad, 2202–2204. novo GenomeScope: J., de (2017). F. C. accurate Sedlazeck, M. and Fast W., (2016). G. M. Vurture, Sikic, 068122. & BioRxiv, N., reads. uncorrected gland Nagarajan, silk long I., larval 1–14. the Sovic, Science, in R., analysis Insect Vaser, expression ricini. Gene Samia and (2015). silkworm H. gaps eri Sezutsu, knowledge, & the current K., of genomes: Mita, 99–105. K., Lepidoptera Yamamoto, 25, (2018). T., Science, Y. Tsubota, Insect A. in Kawahara, Opinion 259– & Current 7, D., University., directions. S. Imperial future Tokyo Cinel, Agriculture, A., of D. reference Triant, College special the with of crosses, silkworm Bulletin some The On heredity. 393. I. of insects. law of Mendel’s hybridology the to on Studies (1906). K. Toyama, https://doi.org/10.1192/bjp.111.479.1009-a https://doi.org/10.1093/bioinformatics/btx153 .ricini S. https://doi.org/10.1021/acs.biomac.6b00189 genome. https://doi.org/10.1101/068122 14 https://doi.org/10.1016/j.ibmb.2011.02.005 https://doi.org/10.1111/1744-7917.12251 https://doi.org/10.1371/journal.pone.0112963 https://doi.org/10.1016/j.cois.2017.12.004 https://doi.org/10.1093/molbev/msx319 8esv8rQGyOO056G9X732hKlP8 Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. h rtsqecn eentsffiin o seby ons1ad2ma h rtadscn sequencing, second and first the mean 2 and 1 via Rounds reads assembly. obtained of for amounts sufficient because not respectively. libraries were same sequencing the data. first re-using the read twice conducted short was illumina sequencing of *Illumina statistics of Summary S1 Table Information Supporting phenotypes. BC (D)’ the cocoon ‘Red in cocoon.’ and markers ‘Red (C),’ PCR-based and ‘Spot of ‘Spot’ ‘Yellow,’ patterns ‘Blue,’ Segregation of mapping Linkage 5. Fig of Cocoon (C) x of larvae 5th-instar (B) of larva Fifth-instar (A) of view Orange–DpCho. Graphical Green–PxuCho; 4. Purple–PxyCho; Fig Blue–SrCho; BmCho; Red– are: colors Branch (PxyCho), proteins chorion of orthologs. tree copy Phylogenetic single (D) 3,907 of alignments on the based indicates species lepidopteran section xylostella five each of in tree Number Phylogenetic (C) species. lepidopteran five in orthogroups. orthogroups of protein number of diagram Venn (B) al. et of chromosomes putative of chromosomes represents ‘Sr ideogram of ricini side Left (A) of presence The 3 Table mut(p )adpooto % )o eeiiesqecsi v eiotrnseis including species, lepidopteran five in between sequences Comparison repetitive 3. of Fig B) (%; proportion the in and ricini sequences A) repeat (bp; of Amount proportion and Amount 2. Fig of moth male Adult (B) similarity. high of with larva region Fifth-instar any (A) identify of to view failed Graphical search the 1. BLAST to Fig means similarity high marks showing Question region 1e-5. genomic than one less least of at columns has the species in Circles Genbank. of at column of the in column numbers the in numbers The genomes. .ricini S. HGAP (2011). , ‘bm . .mori B. a sda h ugopfrroigtete.Bosrpvle r hw ntebranches. the on shown are values Bootstrap tree. the rooting for outgroup the as used was . JL ’t ‘bm to 1’ .ricini S. scaf , .plexippus D. (sr 1 8 orsod otecrmsmsof chromosomes the to corresponds 28’ .ricini S. , .ricini S. .ricini S. .c pryeri c. S. .ricini S. .xuthus P. .c pryeri c. S. .ricini S. ) o‘Sr to 1)’ .xuthus P. Fib-H .c pryeri c. S. .ricini S. , .ricini S. .xuthus P. .ricini S. n he omo BC of form three and hro rtis(SrCho), proteins chorion . . , h hoooenmesof numbers chromosome The . eedrvdfo h rncit ftecrepniggns registered genes, corresponding the of transcripts the from derived were Fib-L HGAP . hro rtis(xCo and (PxuCho) proteins chorion F , . .xylostella P. and 1 niiul n Rdcco’idvdasi BC in individuals cocoon’ ‘Red and individuals and , fibrohexamerin and n yrdprogenies. hybrid and JL .mori B. .xylostella P. .mori B. scaf and 15 5(sr 35 tn o oynme fec ee.Teaccession The genes. each of number copy for stand 1 .plexippus D. 1 .mori B. bandb h rsig( crossing the by obtained rgne hc hwd‘le()’‘elw(B),’ ‘Yellow (A),’ ‘Blue showed which progenies genome .mori B. . , 5’aeson ue ig(lc)indicates (black) ring Outer shown. are 35)’ p25 .mori B. n ih ierpeet cfflsof scaffolds represents side right and .ricini S. .ricini S. and hro rtis(mh) .xylostella P. (BmCho), proteins chorion .plexippus D. niaeta eoeasml feach of assembly genome that indicate .mori B. sericin sfor As . r ie codn oYoshido to according given are genome. ikpoen iha e-value an with proteins silk ee n5lepidopteran 5 in genes hro rtis(DpCho). proteins chorion .ricini S. .ricini S. 1 . x 5scaffolds, 35 , .c pryeri c. S. Plutella S. S. ) Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. i.S.Pyoeei reo eiisof sericins of tree Phylogenetic S5. of assembly Fig. genome the query. against as search sequence acid TBLASTN of Result of (B) assembly genome the query. against as search sequence TBLASTN of search Result (A) TBLASTN of Results S4. Fig. distribution IDs (blue). InterProScan frequencies (k=31) S3. k-mer Fig. observed the to (black) the model of GenomeScope plots the profile k-mer of GenomeScope analysis distribution k-mer S2. Fig. BC in patterns segregation Scaffold distinguish ( (B) analysis scaffolds to linkage 35 markers to of genetic specific result using is PCR the marker genomic and of analysis Electrophoresis linkage (A) for markers Genetic S1. Fig. of chorion Hc with search shown. BLASTP of results best-hit The of genomes the B.mori in elements S9 LINE were of Table LC001870 amount to and and variants LC001867 able splicing Proportion were Because was S8 LC001870 Table and search LC001867 1e-5. that TBLASTN than concluded gene. we but less single locus, e-value a models, same the of an gene to with mapped corresponding be regions to the elucidated genomic identify identical not the could find evm.model.Sr search outgroup. Because BLASTP an * as ‘chorion.’ of gene in hit this be utilized top Putative we to S7 The sequences, Table gene registered this shown. any to for were similarity decided database no we showed protein so non-redundant ‘chorion,’ to as search BLASTP of evm.model.Sr cluster. results gene best-hit chorion in The genes of assemblies. Annotations genome S6 Table lepidopteran of analysis. assessment linkage BUSCO for S5 cell markers Table SMRT genetic per data. of bases RNA-seq Sequences total Illumina S4 and of Table subreads statistics of of number Summary The S3 Table reads. long obtaining for shown. data. used are read were long cells Pacbio SMRT of 4 statistics of Summary S2 Table osl(,C n aea B iw fa of views (B) lateral of and integument C) the (A, on Dorsal spots black Larval S6. Fig. i.S.Pgetetat f‘e cocoon.’ ‘Red of extracts Pigment S7. spiracle. Fig. a indicates arrowhead blue the while .ricini S. HGAP sericin JL hro hwn h ihs iiaiyt ihcsen H)coinof chorion (Hc) High-cysteine to similarity highest the showing chorion scaf .01wsntanttda coin’btsm ueirht eeannotated were hits superior some but ‘chorion,’ as annotated not was 2.1091 ee of genes > Mbp). 1 1 .ricini S. .ricini S. individuals. .c pryeri c. S. .ricini S. .ricini S. A and (A) eitrdi CIgenbank. NCBI in registered 16 av.Terdarwed niaetebakspots black the indicate arrowheads red The larva. .c pryeri c. S. and B.mori .yamamai A. and .mori B. .yamamai A. .ricini S. sqeyto query as .yamamai A. . . B eoe hwn h to the of fit the showing genomes (B) .ricini S. using .ricini S. using genomes. .ricini S. .ricini S. and HGAP .mori B. hro rtiswere proteins chorion .c pryeri c. S. i- mn acid amino Fib-H and JL i- amino Fib-L scaf .mori B. Each . 2.1135 . Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. a itr f10m fsdu abnt,20m fTio- n 0m fwater. of mL 50 and solution Triton-X Extraction of mg solution. 250 extraction in carbonate, min sodium 30 of for mg boiled 100 were of cocoons mixture red a pigments, was extract to order In 0mm 10 Length [Mbp] 100 150 200 250 50 0 A ricini S. mori B. Species plexippus .D. xuthus P. xylostella P. Satellites Unclassified N elements DNA T elements LTR LINE SINE o complexity Low repeats Simple RNA Small i.1 Fig. i.2 Fig. 17

Length [%] B 100% 25% 50% 75% 0% ricini S. mori B. Species plexippus D. xuthus P. xylostella P. SINE Unclassified N elements DNA o complexity Low elements LTR LINE iperepeats Simple RNA Small Satellites Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. A 0mm 10 C

A bm23

bm19 bm16

bm18 bm15

bm25 bm4 bm22

0 0

0

0

0 bm5

bm13 0

0

bm17 0 0

bm3 bm28

0 0

bm14

0

0 bm7 0

bm9 0 bm10 0

bm24 0

bm6 0

0

bm11 bm27

0

0 bm20

bm21 0

bm26 0 0

bm12

bm2 0 0 100

bm8 0 0.04 bm1 0 100

sr29 0 25

3 0

Z

1

.

sr5 sr13

r

c

h h

0 r

C

0 sr20

B

sr30 0 0

sr16

0

sr2 C

2 25 h

1

r .

sr12

. r

0

0 1 h

0

C sr26 sr34 0

0

sr33

0

sr32

0 sr10

sr31

Blue 0 0

sr28

1 0

1 sr22

. sr17

r 0

C

0

h

sr18

0 h

C sr21 r

0 .

sr24 2

0 sr9 25

sr8

0 0

1 .

r

0 sr7

h 0

sr14 C 25 0

sr19

C

0

25 h

sr3

9 r

. 0

.

0 r 0 sr4 3

h

0

0 25

C 25 sr15 0

0

0 sr25 sr11

sr1

sr35

sr6 sr23

8 sr27 .

r

C

h

h C

ai ricini Samia r ltlaxylostella Plutella xuthus Papilio aasplexippus Danaus obxmori Bombyx . 4

7

. r

h

C

Yellow C h r

.

5 6 . r h C B D i.3 Fig. i.4 Fig. 18 ai ricini Samia ai ricini Samia Spot aii xuthus Papilio

0mm 10

DPOGS206741-PA DPOGS212791-PA

DPOGS206760-PA DPOGS201057-PA

DPOGS206758-PA DPOGS212796-PA

DPOGS206754-PA DPOGS214026-PA DPOGS206753-PA DPOGS214025-PA DPOGS206752-PA 205 DPOGS205110-PA

DPOGS212797-PA

XM_011566961.1

A0A194PUT0

XM_011561864.1 A0A194PT51

A0A194PU55 DPOGS212795-PA A0A194PZC5

XM_011554906.1 A0A0K2S2W8 A0A194PSV0

XM_011559410.1 A0A194PSU1 A0A194PSU6

XM_011557436.1 A0A194PU40

XM_011569393.1 A0A0N0P9J7

A0A194PT61

DPOGS206740-PA A0A194PU45 XM_011569391.1 A0A194PUR5

A0A194PU35

DPOGS206761-PA XM_011560172.1 XM_011560171.1

2.1175 A0A194Q5Y9

2.1092 XM_011566431.1

2.1091 A0A194Q567 mori Bombyx XM_011566430.1

H9JJD2 A0A194PST0 XM_011565373.1 XM_011565370.1 A0A194PUT4 XM_011565368.1 XM_011565371.1A0A0K2S2P4

XM_011565377.1 XM_011565384.1 A0A0K2S2Z0

XM_011563823.1 H9J162

XM_011566435.1 A0A0K2S2Q6 H9JJM0

2.1094

A0A194PT66 H9JJL4

C

A0A194PT46 2.1174 A0A194PST5 H9J161

1012 85

A0A194PZB0 2.1142

A0A194PUS5 2.1145

A0A194PZB5 2.1151

A0A194PUS0 2.1149

2.1150

A0A0N1PG09 A0A0K2S351 2.1148

DPOGS206743-PA A0A0K2S2X8 48 77 A0A194PU50 H9JJM1 150

DPOGS206744-PAA0A194PT56 A0A0K2S3R0 A0A0K2S2Y2 93

0mm 10 DPOGS206759-PA DPOGS206755-PA

DPOGS206751-PA

DPOGS212790-PA 2.1122 447 101

2.1140 A0A194PZA5

ai ricini Samia 2.1095

A0A194Q703 XM_011556205.1 2.1127 247

81

2.1119 200

H9JJD1 A0A0K2S2Y9 A0A0K2S3462.1093 A0A0K2S3Q5

42

2.1144 2.1120 O 2.1147 2.1124 G 183

0 2.1161 2.1138

0

8604

2.1163 2.1125 0 0

2.1157 2.1129 1

1

2.1154 2.1131 3 166

O

2.1164 2.1132

Q17215 174

G 2.1172 42

2.1170 A0A0K2S2Z1

0 2.1168 A0A0K2S3P5

0 2.1166 A0A0K2S306

0

2.1152 A0A0K2S3H1

0

2.1158 A0A0K2S2Z4 83 1

2.1116 2.1097 3

2.1106 2.1099 353

1

2.1111 2.1098

2.1113 2.1109 90

2.1104 2.1117

2.1102 2.1114

2.1108 2.1107 49 60

2.1118 2.1112

2.1100 2.1101

302

2.1096 2.1115

A0A0K2S2W5

2.1103 O

A0A0K2S2Y8 2.1105 127 108 G

A0A0K2S2S0 2.1110

0

A0A0K2S2S1 2.1141

0 116

A0A0K2S2T8 2.1143

0 1322

A0A0K2S2R5 2.1146 105 0

A0A0K2S2Q9

2.1173 1

B A0A0K2S3K4

2.1165 1

o A0A0K2S309

2.1167 3 323

m A0A0K2S2S9 2.1169

b A0A0K2S2V5 2.1171

y A0A0K2S2T7 2.1160

BC1 x A0A0K2S2R2 2.1162

m A0A0K2S3J6 2.1159 plexippus Danaus F1

A0A0K2S2U0 2.1153

o A0A0K2S2U6 2.1155 xylostella Plutella r A0A0K2S2Z8 2.1156 106

i

H A0A0K2S2S8 A0A0K2S2Z9

A0A0K2S2T6 A0A0K2S354

c A0A0K2S2U3 A0A0K2S323

A A0A0K2S2Y4 A0A0K2S3G9

c A0A0K2S3I6 A0A0K2S3R4

h A0A0K2S2Z5 A0A0K2S326

o A0A0K2S2W2 A0A0K2S372

r A0A0K2S2V2 A0A0K2S3M6

i A0A0K2S2X5 A0A0K2S333

o A0A0K2S3L8 A0A0K2S321

n A0A0K2S3H2 A0A0K2S314

A0A0K2S2Q7 A0A0K2S2S6

A0A0K2S3N0 A0A0K2S2U8

A0A0K2S301 A0A0K2S2Q3

A0A0K2S300 A0A0K2S2R7

A0A0K2S305 A0A0K2S2V6

A0A0K2S375 A0A0K2S2S2

A0A0K2S303 A0A0K2S2U4

A0A0K2S2P5 A0A0K2S2X0

A0A0K2S2W7 A0A0K2S2V8

A0A0K2S2X1 A0A0K2S2X9

S

A0A0K2S2W1 A0A0K2S2W6

A0A0K2S2V4 A0A0K2S2U9

A0A0K2S3M2 A0A0K2S2P9

A0A0K2S328 A0A0K2S2Q4

A0A0K2S2Y6 A0A0K2S2R0

A0A0K2S338 A0A0K2S2S4 macynthia amia

A0A0K2S2P8 A0A0K2S2R6

A0A0K2S2R4 A0A0K2S3J1

2.1130 A0A0K2S2Y0

2.1126 A0A0K2S2T3

2.1139 A0A0K2S3K9

2.1134 A0A0K2S3L4

2.1133 A0A0K2S2W0

2.1121 A0A0K2S2V1

A0A0K2S2Z6 A0A0K2S2T4

A0A0K2S310 A0A0K2S2T0

A0A0K2S368 A0A0K2S318

A0A0K2S2Y7 A0A0K2S2S5

A0A0K2S2X3 A0A0K2S2Z3

A0A0K2S2Y3 A0A0K2S2U2

A0A0K2S359 A0A0K2S3K0 XM_011561028.1 A0A0K2S2T2

DPOGS206742-PA 2.1135(outgroup)

1.0 pryeri

1

3

1

0

0

0

0 G O

B

o

m

b

y

x

m

o

r

i

H

c

B

c h o r i o n Posted on Authorea 27 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158802211.15119012 — This a preprint and has not been peer reviewed. Data may be preliminary. of-samia-ricini-a-new-model-species-of-lepidopteran-insect Table3.xlsx file Hosted of-samia-ricini-a-new-model-species-of-lepidopteran-insect Table2.xlsx file Hosted of-samia-ricini-a-new-model-species-of-lepidopteran-insect Table1.xlsx file Hosted vial at available at available at available A Chr13 https://authorea.com/users/315693/articles/446018-the-genome-sequence- https://authorea.com/users/315693/articles/446018-the-genome-sequence- https://authorea.com/users/315693/articles/446018-the-genome-sequence- Chr1 Chr9 Chr5 Chr6 Chr2 Chr10 ChrZ i.5 Fig. 19 Chr7 Chr3 Chr11 Chr8 Chr4 Chr12