GigaScience

A highly contiguous genome assembly of the bat hawkmoth Hyles vespertilio (Lepidoptera: Sphingidae) --Manuscript Draft--

Manuscript Number: GIGA-D-19-00361R1
Full Title: A highly contiguous genome assembly of the bat hawkmoth Hyles vespertilio (Lepidoptera: Sphingidae)
Article Type: Data Note

Funding Information:
Max-Planck-Gesellschaft (-): Dr. Gene Myers
Deutsche Forschungsgemeinschaft (HU 1561/5-1 & RE 603/25-1 & HI 1423/3-1): Not applicable
Bundesministerium für Bildung und Forschung (01IS18026C): Dr. Martin Pippel

Abstract: Adapted to different ecological niches, species belonging to the Hyles genus exhibit a spectacular diversity of larval color patterns. These species diverged about 7.5 Mya, making this rather young genus an interesting system to study a wide range of questions including the process of speciation, ecological adaptation and adaptive radiation. Here we present a high-quality genome assembly of the bat hawkmoth Hyles vespertilio, the first reference genome of a member of the Hyles genus. We generated 51X PacBio long reads with an average read length of 8.9 kb. PacBio reads longer than 4 kb were assembled into contigs, resulting in a 651.4 Mb assembly consisting of 530 contigs with an N50 value of 7.5 Mb. The circular mitochondrial contig has a length of 15,303 bases. The H. vespertilio genome is very repeat-rich and exhibits a higher repeat content (50.3%) than other Bombycoidea species such as Bombyx mori (45.7%) and Manduca sexta (27.5%). We developed a comprehensive gene annotation workflow to obtain consensus gene models from different types of evidence including gene projections, protein homology, transcriptome data, and ab initio predictions. The resulting gene annotation is highly complete with 94.5% of BUSCO genes being completely present, which is higher than the BUSCO completeness of the B. mori (92.2%) and M. sexta (90%) annotations. Our gene annotation strategy has general applicability to other genomes and the H. vespertilio genome provides a valuable molecular resource to study a range of questions in this genus, including phylogeny, incomplete lineage sorting, speciation and hybridization. A genome browser displaying the genome, alignments and annotations is available at https://genome-public.pks.mpg.de/cgi-bin/hgTracks?db=HLhylVes1.

Corresponding Author: Michael Hiller

GERMANY

First Author: Martin Pippel

Order of Authors: Martin Pippel, David Jebb, Franziska Patzold, Sylke Winkler, Gene Myers, Heiko Vogel, Michael Hiller, Anna K. Hundsdoerfer

Response to Reviewers: We have uploaded a Word file that provides a point-by-point response ("PointByPointResponse"). The file is labeled as Supplementary Material.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation

A cover letter to the editor is also uploaded.

Additional Information:

Question: Are you submitting this manuscript to a special series or article collection?
Response: No

Question: Experimental design and statistics. Full details of the experimental design and statistical methods used should be given in the Methods section, as detailed in our Minimum Standards Reporting Checklist. Information essential to interpreting the data presented should be made available in the figure legends. Have you included all the information requested in your manuscript?
Response: No

Question (follow-up to "Experimental design and statistics"): If not, please give reasons for any omissions below.
Response: not applicable as no experiments were performed

Question: Resources. A description of all resources used, including antibodies, cell lines, and software tools, with enough information to allow them to be uniquely identified, should be included in the Methods section. Authors are strongly encouraged to cite Research Resource Identifiers (RRIDs) for antibodies, model organisms and tools, where possible. Have you included the information requested as detailed in our Minimum Standards Reporting Checklist?
Response: Yes

Question: Availability of data and materials. All datasets and code on which the conclusions of the paper rely must be either included in your submission or deposited in publicly available repositories (where available and ethically appropriate), referencing such data using a unique identifier in the references and in the “Availability of Data and Materials” section of your manuscript. Have you met the above requirement as detailed in our Minimum Standards Reporting Checklist?
Response: Yes


A highly contiguous genome assembly of the bat hawkmoth Hyles vespertilio (Lepidoptera: Sphingidae)

Martin Pippel 1,2,a, David Jebb 1,2,3,a, Franziska Patzold 4, Sylke Winkler 1, Gene Myers 1, Heiko Vogel 5, Michael Hiller 1,2,3,#, Anna K. Hundsdoerfer 4,#

1 Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
2 Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
3 Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
4 Senckenberg Natural History Collections Dresden, Königsbrücker Landstr. 159, 01109 Dresden, Germany
5 Department of Entomology, Max Planck Institute for Chemical Ecology, Hans-Knoell-Strasse 8, 07745 Jena, Germany

a Joint first authorship

# Corresponding authors: [email protected] & [email protected]

ABSTRACT

Adapted to different ecological niches, moth species belonging to the Hyles genus exhibit a spectacular diversity of larval color patterns. These species diverged about 7.5 Mya, making this rather young genus an interesting system to study a wide range of questions including the process of speciation, ecological adaptation and adaptive radiation. Here we present a high-quality genome assembly of the bat hawkmoth Hyles vespertilio, the first reference genome of a member of the Hyles genus. We generated 51X PacBio long reads with an average read length of 8.9 kb. PacBio reads longer than 4 kb were assembled into contigs, resulting in a 651.4 Mb assembly consisting of 530 contigs with an N50 value of 7.5 Mb. The circular mitochondrial contig has a length of 15,303 bases. The H. vespertilio genome is very repeat-rich and exhibits a higher repeat content (50.3%) than other Bombycoidea species such as Bombyx mori (45.7%) and Manduca sexta (27.5%). We developed a comprehensive gene annotation workflow to obtain consensus gene models from different types of evidence including gene projections, protein homology, transcriptome data, and ab initio predictions. The resulting gene annotation is highly complete with 94.5% of BUSCO genes being completely present, which is higher than the BUSCO completeness of the B. mori (92.2%) and M. sexta (90%) annotations. Our gene annotation strategy has general applicability to other genomes and the H. vespertilio genome provides a valuable molecular resource to study a range of questions in this genus, including phylogeny, incomplete lineage sorting, speciation and hybridization. A genome browser displaying the genome, alignments and annotations is available at https://genome-public.pks.mpg.de/cgi-bin/hgTracks?db=HLhylVes1.

KEYWORDS

Genome assembly, PacBio long reads, hawkmoth - silk moth comparison, gene annotation

INTRODUCTION

Bombycoidea are a speciose superfamily of Lepidoptera, comprising ten families, more than 500 genera [1] and 6,092 species that are mostly diversified in the intertropical region of the globe [2]. This superfamily includes the two well-known macrolepidoptera families Saturniidae and Sphingidae. The larvae of at least eight saturniid species are eaten as an important source of protein in rural Africa [3]. With wingspans of 4 to 10 cm, sphingids are large pollinators with excellent flying abilities. They are important prey for bats and some species can produce ultrasound to divert attacks by echolocating bats [4]. Furthermore, Bombycoidea not only comprise some of the largest moth species, exemplified by the giant silk moth Attacus atlas with a wingspan measuring 25-30 cm, but also include several model organisms, such as the domestic silkmoth Bombyx mori, a bombycid of great economic importance for silk production, and the tobacco hornworm Manduca sexta, which is a common pest sphingid species causing considerable damage to tobacco, tomato, pepper, eggplant and plantations of other crops [5]. Since these species have been extensively studied, they take a leading role in the fields of Lepidoptera genetics and physiology. To date, genomes of only four Bombycoidea species have been published (Bombyx mori [6] together with its two closely-related congeners B. huttoni and B. mandarina of the Bombycidae, and Manduca sexta [7] of the Sphingidae), which represents a tiny fraction of the diversity of Bombycoidea.

Sphingidae include the hawkmoth genus Hyles. This genus originated in South America and comprises 32 recognized species [8-10] with representatives native to all continents and major islands (except Antarctica). As a rather young genus, estimated to have diverged about 7.5 Mya [4], species from different continents are still able to hybridize. This makes Hyles an interesting genus to study a wide range of questions including the process of speciation, ecological adaptation, adaptive radiation, genetics of reproduction and evolution in action. However, such studies are hampered by the lack of suitable molecular resources. In particular, a well-assembled reference genome of any Hyles species is currently missing.

Here we present a high-quality nuclear and mitochondrial genome assembly of the bat hawkmoth Hyles vespertilio (NCBI:txid283848) (Figure 1), the first reference genome assembly of a member of the Hyles genus. Compared with other members of the Hyles genus that have broad species distributions, H. vespertilio is restricted to mountainous river valleys in the Western Palearctic. This rather restricted distribution is likely associated with a lower degree of hybridization, which makes H. vespertilio a good target for assembly of a reference genome. In the following, we report the assembly of the H. vespertilio genome using PacBio long sequencing reads, the annotation of this genome and a comparative analysis with B. mori and M. sexta. All data can be visualized and downloaded from a genome browser instance at https://genome-public.pks.mpg.de/cgi-bin/hgTracks?db=HLhylVes1.

RESULTS

Assembly of Hyles vespertilio from long sequencing reads

We generated 51X PacBio long reads with an average read length of 8.9 kb and an N50 read length of 16.5 kb. PacBio reads longer than 4 kb were assembled into contigs with a customized assembler called DAmar. DAmar is a hybrid of our MARVEL approach, which was used to assemble the Axolotl [11] and Schmidtea [12] genomes, and the Dazzler method (Supplementary Text 1). This resulted in a 651.4 Mb assembly consisting of 530 contigs. The contig N50 value is 7.5 Mb (Figure 2A).
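For readers less familiar with the N50 statistic reported here for reads and contigs, it is the length L such that sequences of length L or longer together contain at least half of all bases. The following is a minimal sketch of that calculation; the function name and the toy lengths are illustrative assumptions and are not taken from DAmar or from the H. vespertilio data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical contig lengths in Mb, not real data)
toy_contigs = [10, 8, 7, 3, 2]
print(n50(toy_contigs))
```

Because the statistic is weighted by sequence length, a few long contigs dominate it, which is why it is commonly reported alongside the total assembly size and the number of contigs.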

Compared with the assemblies of B. mori (460.3 Mb) and M. sexta (419.4 Mb), the H. vespertilio assembly is substantially larger, by 191 and 232 Mb, respectively. As shown in Figure 2, our contigs are >100 times longer than the contigs of the M. sexta assembly (N50 value 52 kb) but shorter than the contigs of the B. mori assembly (N50 value 12.2 Mb), which was based on a combination of different sequencing technologies including 80X PacBio long reads, 60X Illumina reads and complete sequences of BAC and Fosmid clones. To assess and compare genome completeness, we used BUSCO [13] and the set of 2,442 conserved, single-copy Endopterygota genes. As shown in Supplementary Table 1, the H. vespertilio assembly (95.58% complete genes) shows a similar level of completeness as B. mori (96.48%) and M. sexta (95.21%).

We also assembled the mitochondrial genome of H. vespertilio using PacBio long reads and the DAmar assembler. The resulting circular mitochondrial contig has a length of 15,303 bases.

Repeat content

To assess the extent to which the H. vespertilio assembly consists of repetitive sequences, we modelled and masked repeats using RepeatModeler and RepeatMasker. To compare repeat content with B. mori and M. sexta, we applied the same procedure to these genomes as well. We found that the H. vespertilio genome has a high repeat content of 50.3% (Figure 3, Supplementary Table 2). B. mori also has a repeat-rich genome (45.7%), while the M. sexta assembly is less repeat-rich (27.5%), which may be an underestimate, as similar repeat copies may not be properly assembled for this species. This analysis also suggests that expansions of transposons and other repeats contributed to the larger genome size of H. vespertilio, as the H. vespertilio assembly comprises 117 Mb more repetitive sequence than that of B. mori (328 vs. 210.5 Mb). Long Interspersed Nuclear Elements (LINEs) constitute the largest repeat class in the H. vespertilio assembly, followed by Short Interspersed Nuclear Elements (SINEs) and DNA transposons (Figure 3). This is similar to B. mori and M. sexta, except that the latter exhibits fewer LINEs.

Gene annotation

To annotate genes in the H. vespertilio assembly, we used Evidence Modeler [14] to produce consensus gene models from multiple sources of predictions. A flowchart visualizing the gene annotation strategy is shown in Figure 4. First, we generated pairwise genome alignments between the H. vespertilio assembly and B. mori and M. sexta, and used these alignments to project annotated genes from B. mori and M. sexta to the H. vespertilio genome using CESAR [15, 16]. This resulted in 11,721 and 19,760 gene predictions that were projected from B. mori and M. sexta, respectively. Second, coding sequences (CDS) from 22 available lepidopteran species were downloaded from Lepbase. CDS and translated protein sequences were aligned to the H. vespertilio assembly using GenomeThreader [17]. The number of significant alignments is listed in Supplementary Table 3. Third, RNA-seq data from H. euphorbiae, a closely related species, were mapped to the H. vespertilio assembly and assembled into transcripts using StringTie [18]. TAMA and GenemarkS-T [19] were used to predict open reading frames in assembled transcripts, predicting 45,488 and 19,776 coding transcripts, respectively. Fourth, four ab initio gene finders, SNAP, GlimmerHMM, Genemark-ES and Augustus [20-23], predicted 67,351, 108,570, 92,757 and 61,225 gene models, respectively. All evidence was passed to Evidence Modeler to produce a consensus gene set. Likely missing genes were recovered by keeping the M. sexta and B. mori CESAR projections, generating a set of 50,612 loci. Finally, models for which more than 50% of the coding sequence was contained within repeat sequence were removed. This produced a final set of 23,768 genes.

To assess the completeness of our H. vespertilio gene annotation, we used BUSCO [13] and the set of 2,442 conserved, single-copy Endopterygota genes. We found that our annotation is highly complete with 94.5% (2,306 of 2,442) of these BUSCO genes being completely present (91.97% single copy, 2.46% duplicated genes). Importantly, this completeness is higher than that of the gene annotations of both B. mori (92.2% complete BUSCO genes) and M. sexta (90% complete BUSCO genes).

DISCUSSION

Here we present a high-quality genome assembly for the bat hawkmoth Hyles vespertilio. PacBio long reads have been instrumental in assembling long contigs, in particular since the H. vespertilio genome is larger and more repeat-rich than those of other Bombycoidea species. With a contig N50 value of 7.5 Mb, our assembly is the second-most contiguous Bombycoidea genome to date.

To annotate coding genes, we developed a strategy that integrates multiple types of gene evidence. First, we used genome alignments to project genes annotated in related species. Gene projection generally produces very accurate annotations, but is by definition limited by the completeness of the gene sets of related species. Evolutionary distance is another factor influencing the completeness of results obtained from gene projections, with closely-related species generally allowing for more complete projections [16], consistent with our result that substantially more genes were projected using M. sexta as a reference. Nevertheless, we were able to project a large number of genes (>11,000) also from the more distantly-related B. mori. To supplement genes predicted by homology-based approaches, we additionally aligned proteins and CDS from Lepbase. In the absence of available RNA-seq data of H. vespertilio, we used RNA-seq data from the related species H. euphorbiae to obtain transcriptomic evidence for gene models. Finally, since homology and transcriptomic evidence may miss lineage-specific or lowly-expressed genes, we aimed at increasing gene annotation completeness by employing four ab initio gene prediction methods, aided by a large training set available from our high-quality gene projections. After integrating and filtering all evidence, this strategy produced a gene annotation with a higher BUSCO gene completeness than for other Bombycoidea species. Since all employed methods are usable for other species, our integrative gene annotation strategy likely has general applicability to many other genomes.

To make our data accessible to the community and enable efficient use of the H. vespertilio genome for future studies, we provide a freely-available genome browser instance (https://genome-public.pks.mpg.de/cgi-bin/hgTracks?db=HLhylVes1) for data visualization and exploration. The genome browser visualization of the alignments to B. mori and M. sexta, the gene annotation, and the underlying gene evidence is shown in Figure 5 for an exemplary genomic locus.

The H. vespertilio genome provides a valuable molecular resource to study speciation and hybridization processes in this genus. In particular, together with newly generated molecular data, the genome will help to infer the phylogeny of the 32 recognized species, which is not yet resolved due to a high degree of hybridization between species and incomplete lineage sorting [24]. Furthermore, Hyles species exhibit a spectacular diversity of color patterns, exemplified by many different colorations of larvae and adult wings, e.g. [25], and are adapted to different ecological niches. Thus, the genome of H. vespertilio will facilitate a multitude of studies ranging from the genetic basis of morphological evolution and ecological adaptation to fundamental evolutionary processes such as speciation and hybridization.

METHODS

Ethics, consent, and permissions

The DNA sample was derived from a single male individual of Hyles vespertilio, collected by Alberto Zilli in Latium (Province Rieti), Mt. Terminillo, Vallonina (950 m), Italy, on 31.V.2018 (specimen accession number LG2117), in accordance with the EU’s environmental and scientific legislation. A second co-captured male was deposited as a voucher in the Museum of Zoology (Senckenberg Natural History Collections Dresden; with the DNA/tissue voucher number MTD-TW-12562).

DNA extraction, library preparation and sequencing

High molecular weight (HMW) genomic DNA for the PacBio library was isolated after lysis of the liquid N2-ground abdomen in home-made lysis buffer (400 mM NaCl, 20 mM Tris base pH 8.0, 30 mM EDTA pH 8.0, 0.5% SDS, 100 µg/ml Proteinase K) and standard phenol-chloroform extraction. HMW genomic DNA was precipitated by centrifugation after adding ice-cold ethanol and dissolved in Tris-EDTA, pH 8.0. RNA was removed by RNase A treatment. Pulsed-field gel electrophoresis (SAGE Pippin Pulse) showed that the resulting DNA molecules were around 50-150 kb long. The gDNA concentration was 413 ng/µl (Qubit dsDNA BR assay kit, Thermo Fisher Scientific).

Pacific Biosciences continuous long read (CLR) libraries were prepared as described in the ‘Guidelines for preparing size-selected 20 kb SMRTbell™ templates’. In brief, long gDNA was sheared to 60 kb with the Megaruptor™ device (Diagenode). PacBio SMRTbell™ libraries were size selected for fragments larger than 20 kb with the SAGE BluePippin™ device. SMRT sequencing was performed on the SEQUEL system using sequencing chemistry 2.1; movie time was 10 hours for all SMRT cells. A total of 6 SMRT cells were sequenced with an average unique molecular yield of 5.3 Gb.

Genome assembly

De novo genome assembly was performed with DAmar (https://github.com/MartinPippel/DAmar). This assembler is based on an improved MARVEL assembler [11, 12] and integrates parts of the Dazzler suite (Supplementary Text 1) and DACCORD [26, 27].

To assemble the genome, we performed the following four steps: setup, read patching, assembly and error polishing. In the setup phase, PacBio reads were filtered by choosing only the longest read of each zero-mode waveguide and subsequently requiring a minimum read length of 4 kb. The resulting 2.2 million reads (45X coverage) were stored in a Dazzler database (https://github.com/thegenemyers/DAZZ_DB).

The patch phase detects and corrects read artefacts including missed adapters, polymerase strand jumps, chimeric reads and long low-quality read segments that are the primary impediments to long contiguous assemblies. To this end, we first computed local alignments of all raw reads. Since local alignment computation is by far the most time- and storage-consuming part of the pipeline, we reduced runtime and storage by masking repeats in the reads as follows. First, low complexity intervals, such as microsatellites or homopolymers, were masked with DBdust (https://github.com/thegenemyers/DAZZ_DB/). Second, tandem repeats were masked by using datander and TANmask (https://github.com/thegenemyers/DAMASKER). Third, we used a read alignment step to detect repeats (Supplementary Text 2). To this end, we first split all reads into groups representing 1X read coverage. For each group, we then aligned all reads against all others in the same group with daligner [28] (https://github.com/thegenemyers/DALIGNER) and masked all local regions in each read where at least 10 other reads aligned. The repeat masks were subsequently used to prevent k-mer seeding in repetitive regions when computing all local alignments between all reads. Since masking repeats can lead to missing low quality or noisy regions within PacBio reads, we used LAseparate to find proper alignment chains that prematurely end in repeat regions. For those alignment chains, we recomputed local alignments with the repcomp tool without using the repeat mask. Then we applied LAfix to detect and correct read artefacts.

Manual inspection of the overlap graph (Supplementary Figure 1) revealed that chimeric reads were passed on to the assembly phase because chimeric breaks within large repeat regions were missed. Therefore, we improved the detection of chimeric reads by re-analyzing repetitive regions up to a length of 8 kb for chimeras. Any subread which includes a repetitive region that could not be spanned by at least three proper alignment chains was excluded.
This additional step led to a final overlap graph that was much cleaner, which made manual validation easier (Supplementary Figure 2).

In the assembly phase, we first calculated all overlaps between patched reads using the same alignment strategy as in the patch phase. The subsequent steps of (i) computing a quality track for all reads, (ii) computing a detailed repeat mask, (iii) filtering overlap piles, (iv) computing the overlap graph, and (v) touring the overlap graph to obtain primary contigs follow the steps of the original MARVEL assembly pipeline [11, 12].

In the error polishing phase, we polished all contigs using the raw PacBio reads and two rounds of Arrow (https://github.com/PacificBiosciences/GenomicConsensus.git). All commands and parameters of all steps in the assembly are provided in Supplementary Data files 1-3.

To assess completeness of the genome assembly, we used BUSCO v3 [13] and the Endopterygota dataset comprising 2,442 genes. To assess potential contamination, we used BlobTools [29] with default parameters except “max_target_seqs 10” in the blastn step and the NCBI nt database (2019/10/31) for the classification step. As shown in Supplementary Table 4, BlobTools classified only 2.17 Mb (0.33% of the 651.4 Mb assembly) as contamination, showing that contamination is not a major issue that could explain the genome size expansion compared to other Bombycoidea.

To investigate whether some contigs may represent alternative haplotypes, we determined and plotted the per-base read coverage. As shown in Supplementary Figure 3, this revealed a large peak around 40X, which is consistent with our sequencing coverage, and a small hump around 20X, indicating that some contigs may be alternative haplotypes. Therefore, we used purge_dups [30] to detect alternative haplotypes based on read coverage. This tool assigned 622.7 Mb (95.6% of the 651.4 Mb assembly) as the purged primary assembly and assigned 28.7 Mb (4.4% of the assembly) as alternative haplotypes (Supplementary Table 5). Haplotig contigs are mostly small and often repeat-rich, which complicates accurate read coverage determination as unique read mappings are harder to obtain.
Furthermore, repeating the BUSCO analysis on the purged primary assembly resulted in a smaller percentage of complete but duplicated genes (0.3% vs. 2.8%) but also a 0.5% decrease in the total number of complete BUSCO genes. This suggests that while some contigs are indeed alternative haplotypes, others contain unique genes and should not be classified as alternative haplotypes. Overall, this analysis shows that alternative haplotypes cannot explain the larger genome size of H. vespertilio compared to other Bombycoidea.

To assemble the mitochondrial genome, the corresponding PacBio reads were extracted by mapping them with daligner to the mitochondrial reference sequence of the related Ampelophaga rubiginosa (NCBI Acc. No. NC_035431.1; unpublished). The resulting overlaps were filtered for proper circular alignment chains (chain lengths 4-14 kb, max unaligned bases 1500) with the tool LAfilterMito. The filtered reads were then processed according to the general assembly pipeline (read patching, assembly, error polishing). After read patching, the reads were split into shorter reads with a 1500 bp overlap to ensure that the assembly creates a circular contig that consists of more than a single read. Error polishing was done by running Arrow (https://github.com/PacificBiosciences/GenomicConsensus.git) with filtered PacBio reads. We used Circlator [31] to circularize the mitochondrial contig, map it to itself and trim back the overlapping part.
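The read-splitting step used for the mitochondrial assembly can be illustrated with a short sketch. The 1500 bp overlap is taken from the text; the piece length of 4,000 bp, the function name and the toy sequence are assumptions made for illustration only, and this is not the DAmar implementation.

```python
def split_with_overlap(seq, piece_len, overlap=1500):
    """Split a read into consecutive pieces that share `overlap` bases.

    Each piece starts `piece_len - overlap` bases after the previous one,
    so neighbouring pieces overlap by exactly `overlap` bases (the last
    piece may be shorter than `piece_len`).
    """
    if piece_len <= overlap:
        raise ValueError("piece_len must be larger than overlap")
    step = piece_len - overlap
    pieces = []
    start = 0
    while True:
        end = min(start + piece_len, len(seq))
        pieces.append(seq[start:end])
        if end == len(seq):
            break
        start += step
    return pieces


# Toy example: a 10 kb read split into hypothetical 4 kb pieces
toy_read = "ACGTTGCA" * 1250
pieces = split_with_overlap(toy_read, piece_len=4000)
print([len(p) for p in pieces])
```

Splitting a long read that spans most of the roughly 15 kb mitochondrial genome in this way yields several overlapping pieces, so the assembler has to join multiple reads to close the circle rather than reporting a single read as the contig.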

Repeat annotation

We first used RepeatModeler (http://www.repeatmasker.org/) with the parameter ‘-engine ncbi’ to identify repeat families in the genomes of H. vespertilio, B. mori and M. sexta. We then used RepeatMasker with default parameters to soft-mask the three genomes with their respective repeat libraries. Tandem Repeat Finder (https://tandem.bu.edu/trf/trf.html) was used to detect simple and tandem repeats.
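Given a soft-masked genome, an overall repeat content such as the percentages reported in the Results can be estimated as the fraction of lowercase bases. The sketch below assumes that repeats are marked in lowercase in the FASTA file and that all bases (including N) count towards the total; the function name and toy records are hypothetical, and the reported values may have been derived differently (e.g. from the RepeatMasker summary table).

```python
def softmasked_fraction(fasta_lines):
    """Return the fraction of lowercase (soft-masked) bases in a FASTA.

    `fasta_lines` is any iterable of lines, e.g. an open file handle.
    Header lines (starting with '>') are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Toy example (hypothetical records, not real data)
toy_fasta = [">contig1", "ACGTacgt", "NNac"]
print(f"{100 * softmasked_fraction(toy_fasta):.1f}% soft-masked")
```

For a real assembly the same function can be applied to an open file handle, which keeps memory use independent of genome size.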

Genome alignment

The H. vespertilio genome was aligned to the genomes of M. sexta (Sphingidae) and Bombyx mori (Bombycidae; sequence data was downloaded from LepBase at http://ensembl.lepbase.org/index.html [32]). Pairwise genome alignments were produced using lastz [33] with parameters K = 2400, L = 3000 and the default scoring matrix, followed by axtChain [34], chainCleaner [35] and RepeatFiller [36] (all with default parameters).

Gene annotation

Consensus gene models were produced from gene projections, protein homology, transcriptome data, and ab initio predictions. Evidence types were ranked and weighted following the guidelines of the EVidenceModeler manual (https://evidencemodeler.github.io/). The ab initio predictors were given the lowest rank, followed by the spliced alignments. As transcript assembly was done using data from another species, this was ranked second after the gene projections.

As the first line of evidence, we used TOGA (Tool to find Orthologs from Genome Alignments, last commit: 02/05/2019) to project annotations of coding genes from multiple reference genomes to a query genome. Briefly, TOGA takes as input pairwise genome alignment chains between a designated reference (here B. mori or M. sexta) and query genome (here H. vespertilio), coding transcript annotations for the reference species and a file linking genes and transcript isoforms. For each gene, TOGA identifies the chain(s) that align the putative ortholog in the query using synteny and the amount of aligning exonic and intronic sequence. To obtain the locations of coding exons of this gene, TOGA then extracts the genomic region corresponding to the gene on this chain from the query assembly and uses CESAR 2.0 (Coding Exon Structure Aware Realigner) [16] in multi-exon mode. B. mori gene models from http://silkbase.ab.a.u-tokyo.ac.jp/ and M. sexta models from Lepbase were projected to H. vespertilio. Using a 10% overlap, 15,169 (77%) of the 19,760 M. sexta projections overlap a B. mori projection and 9,996 (85%) of the 11,721 B. mori projections overlap an M. sexta projection. 3,336 of the M. sexta and B. mori projections are identical. Projected genes were assigned a weight of 8 and classed as “Other prediction” within Evidence Modeler.
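The overlap comparison between the two projection sets can be sketched as follows. The text does not specify how the 10% overlap was measured, so this example assumes it is the fraction of the query projection's genomic span that is shared with a target projection on the same contig; the function names, the half-open coordinates and the toy intervals are likewise assumptions.

```python
def overlaps_enough(query, target, min_frac=0.10):
    """True if `target` covers at least `min_frac` of the `query` span.

    Intervals are (contig, start, end) tuples with half-open coordinates.
    """
    q_contig, q_start, q_end = query
    t_contig, t_start, t_end = target
    if q_contig != t_contig:
        return False
    shared = min(q_end, t_end) - max(q_start, t_start)
    return shared > 0 and shared / (q_end - q_start) >= min_frac


def count_overlapping(queries, targets, min_frac=0.10):
    """Count query intervals that overlap at least one target interval."""
    return sum(
        any(overlaps_enough(q, t, min_frac) for t in targets)
        for q in queries
    )


# Toy example (hypothetical projections, not real data)
msexta_proj = [("ctg1", 0, 100), ("ctg1", 200, 300), ("ctg2", 0, 100)]
bmori_proj = [("ctg1", 90, 150), ("ctg1", 295, 400)]
print(count_overlapping(msexta_proj, bmori_proj))
```

The all-against-all comparison is quadratic and is used here only for clarity; for tens of thousands of projections, a sorted sweep or an interval tree per contig would be the usual choice.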
As the second line of evidence, coding sequences (CDS) from 22 available lepidopteran species were downloaded from Lepbase. CDS were translated to the corresponding peptide sequences using Prank (v.170427) [37]. CDS and peptide sequences were co-aligned to the assembly of H. vespertilio using GenomeThreader [38] with the parameters “-gcmincoverage 70 -paralogs -species drosophila”. Species used and number of significant alignments are detailed in Supplementary Table 3. Alignments were passed to Evidence Modeler as “Protein” alignments and assigned a weight of 4.

As the third line of evidence, we used short read RNA sequencing data that were generated from larval tissue of the closely related H. euphorbiae, since RNA sequencing data of H. vespertilio were not available. Reads were mapped to the H. vespertilio assembly using hisat2 (v 2.0.0) with parameters “--dta --no-unal --mp 4,1 --score-min L,0,-0.125”, which resulted in mapping 65.82% of the reads. Transcripts were assembled with StringTie [18] with default parameters. Fasta sequences for each transcript were extracted using bedtools. Open reading frames were predicted for each transcript using GenemarkS-T (v5.1) [19] with parameters “--strand direct”. GenemarkS-T transcripts were given a weight of 7 and classed as “Other prediction” within Evidence Modeler. TAMA (https://github.com/GenomeRIK/tama.git) was also used to identify ORFs within assembled transcripts. Briefly, ORFs are predicted from all forward frames of a transcript. Predicted peptide sequences are then queried using Blastp [39] against a blast database of the downloaded Lepbase proteins and further classified as full length or partial hits. The highest scoring ORF for each transcript is mapped back to the transcript, and putative nonsense mediated decay (NMD) targets were determined and excluded. Full length and non-NMD target transcripts were provided to Evidence Modeler as class “Other prediction” with a weight of 7. The remaining transcripts were provided to Evidence Modeler as “Transcript” alignments with a weight of 4.

As the fourth line of evidence, four ab initio gene prediction tools were used to predict genes in the H. vespertilio genome.
As training data, we used a set of non-overlapping, full-length and intact genes that were projected from B. mori and M. sexta and resulted in an identical gene model in H. vespertilio (2504 genes). This set was randomly divided into 80% training data and 20% test data. SNAP [20] and GlimmerHMM [21] were trained as per the available manuals, and genes were predicted. To run Augustus [22], we first mapped the RNA-seq data again to the genome using hisat2 with strict mapping parameters (--no-mixed --no-discordant --dta --no-unal --n-ceil L,0,0.05; read mapping rate of 46.79%) to generate hints. Intron positions as predicted by spliced alignments were extracted using the bam2hints module from Augustus (v3.3.1). The heliconius_melpomene1 model provided with Augustus was optimized for H. vespertilio using optimize_augustus.pl (--cpus=12 --kfold=12) and the training gene set, and Augustus was further trained using these parameters with the etraining tool. Genes were then predicted with Augustus, providing the intron positions as extrinsic hints. Finally, intron hints were provided to Genemark-ES [23] for self-training and gene prediction. Gene predictions were evaluated against the test gene set using ParsEval [40]. SNAP and GlimmerHMM were subsequently given a weight of 1, while Augustus and Genemark-ES predictions were given a weight of 2. All were provided as type “ab initio prediction”.

Evidence Modeler [14] was then run using the above evidences and the described weights. Full-length, functional TOGA projections from M. sexta and B. mori with no CDS overlap to any consensus model were included in the consensus set. Consensus gene models with greater than 50% CDS overlap within a single repeat region, as annotated by RepeatMasker, were removed. To assess completeness of the gene annotation, we applied BUSCO v3 [13] in protein mode to our final H. vespertilio protein set and the annotated B. mori and M. sexta proteins, using the Endopterygota dataset comprising 2,442 genes.
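The repeat filter described above can be illustrated with a minimal Python sketch. It is not the code used for the annotation. It assumes that CDS exons and RepeatMasker regions are available as half-open (start, end) intervals on the same contig, that the CDS exons of a model do not overlap each other, and that "CDS overlap within a single repeat region" means the fraction of a model's CDS bases covered by any one repeat interval. The function names and the interval representation are hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single contig (assumed representation).
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def max_repeat_fraction(cds: List[Interval], repeats: List[Interval]) -> float:
    """Largest fraction of CDS bases covered by any single repeat interval."""
    total = sum(end - start for start, end in cds)
    if total == 0:
        return 0.0
    # For each repeat region, sum its overlap with all CDS exons of the model,
    # then keep the largest value (0 if there are no repeats).
    best = max((sum(overlap(c, r) for c in cds) for r in repeats), default=0)
    return best / total


def keep_model(cds: List[Interval], repeats: List[Interval],
               threshold: float = 0.5) -> bool:
    """Keep a model unless more than `threshold` of its CDS lies in one repeat."""
    return max_repeat_fraction(cds, repeats) <= threshold


if __name__ == "__main__":
    model_cds = [(100, 200), (300, 400)]   # hypothetical two-exon model
    repeat_regions = [(90, 320)]           # hypothetical repeat annotation
    print(keep_model(model_cds, repeat_regions))
```

With this interpretation, a model whose CDS is covered exactly 50% by one repeat is retained, because the text specifies removal only for overlap greater than 50%.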

AVAILABILITY OF SUPPORTING DATA

All raw sequencing data and the genome assembly of H. vespertilio are available at the National Center for Biotechnology Information under the Bioproject ID PRJNA574010. H. euphorbiae RNA-seq data have been submitted to the EBI short read archive (accession numbers: ERS4198286-ERS4198293). The genome, our gene annotations including the gene evidences, and genome alignments to B. mori and M. sexta are available for download at https://bds.mpi-cbg.de/hillerlab/HylesGenomeData/ and for genome browser visualization and exploration at https://genome-public.pks.mpg.de/cgi-bin/hgTracks?db=HLhylVes1. Other data further supporting this work are openly available in the GigaScience repository, GigaDB [41].

ADDITIONAL FILES

Supplementary Tables 1-5, Supplementary Figures 1-3, Supplementary Text 1-2, Supplementary Data files 1-3

COMPETING INTERESTS

The authors declare that they have no competing interests.

FUNDING

This work was funded by the Max Planck Gesellschaft (Michael Hiller, Gene Myers), the Federal Ministry of Education and Research (grant 01IS18026C) and the German Research Foundation (grants HI 1423/3-1, HU 1561/5-1 and RE 603/25-1). It benefitted from the sharing of expertise within the DFG priority program SPP 1991 Taxon-Omics.

ACKNOWLEDGEMENTS

We thank Alberto Zilli (London) for the collection of the two Hyles vespertilio moths and the Long Read platform of the DRESDEN-concept Genome Center, DFG NGS Competence Center, c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Dresden, Germany for DNA isolation and PacBio long read sequencing.

REFERENCES

1. van Nieukerken EJ, Kaila L, Kitching IJ, Kristensen NP, Lees DC, Minet J, et al. Order Lepidoptera Linnaeus, 1758. Zootaxa. 2011;3148:212-21.
2. Kitching IJ, Rougerie R, Zwick A, Hamilton CA, St Laurent RA, Naumann S, et al. A global checklist of the Bombycoidea (Insecta: Lepidoptera). Biodiversity Data Journal. 2018;6.
3. Lautenschläger T, Neinhuis C, Monizi M, Mandombe JL, Förster A, Henle T, et al. Edible insects of Northern Angola. African Invertebrates. 2017;58:55.
4. Kawahara AY and Barber JR. Tempo and mode of antibat ultrasound production and sonar jamming in the diverse hawkmoth radiation. Proceedings of the National Academy of Sciences. 2015;112 20:6407-12.
5. del Campo C ML and Renwick JAA. Dependence on host constituents controlling food acceptance by Manduca sexta larvae. Entomologia Experimentalis et Applicata. 1999;93 2:209-15.
6. International Silkworm Genome C. The genome of a lepidopteran model insect, the silkworm Bombyx mori. Insect Biochemistry and Molecular Biology. 2008;38 12:1036-45.
7. Kanost MR, Arrese EL, Cao X, Chen Y-R, Chellapilla S, Goldsmith MR, et al. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta. Insect Biochemistry and Molecular Biology. 2016;76:118-47.
8. Hundsdoerfer AK, Päckert M, Kehlmaier C, Strutzenberger P and Kitching IJ. Museum archives revisited: Central Asiatic hawkmoths reveal exceptionally high late Pliocene species diversification (Lepidoptera, Sphingidae). Zoologica Scripta. 2017;46 5:552-70.
9. Hundsdoerfer AK, Rubinoff D, Attié M, Kitching IJ and Wink M. A revised molecular phylogeny of the globally distributed hawkmoth genus Hyles (Lepidoptera: Sphingidae), based on mitochondrial and nuclear DNA sequences. Molecular Phylogenetics and Evolution. 2009;52:852-65.
10. Hyles. Sphingidae taxonomic inventory. Scratchpads. Biodiversity online, 2019. http://sphingidae.myspecies.info/taxonomy/term/1276. Accessed 5.7.2019.
11. Nowoshilow S, Schloissnig S, Fei JF, Dahl A, Pang AWC, Pippel M, et al. The axolotl genome and the evolution of key tissue formation regulators. Nature. 2018;554 7690:50-5.
12. Grohme MA, Schloissnig S, Rozanski A, Pippel M, Young GR, Winkler S, et al. The genome of Schmidtea mediterranea and the evolution of core cellular mechanisms. Nature. 2018;554 7690:56-61.
13. Waterhouse RM, Seppey M, Simao FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular Biology and Evolution. 2017.
14. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology. 2008;9 1:R7.
15. Sharma V, Elghafari A and Hiller M. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Res. 2016;44 11:e103. doi:10.1093/nar/gkw210.
16. Sharma V, Schwede P and Hiller M. CESAR 2.0 substantially improves speed and accuracy of comparative gene annotation. Bioinformatics. 2017;33 24:3985-7.
17. Jung S, Pausch H, Langenmayer MC, Schwarzenbacher H, Majzoub-Altweck M, Gollnick NS, et al. A nonsense mutation in PLD4 is associated with a zinc deficiency-like syndrome in Fleckvieh cattle. BMC Genomics. 2014;15:623.

18. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT and Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 2015;33 3:290-5.
19. Tang S, Lomsadze A and Borodovsky M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Research. 2015;43 12:e78.
20. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59.
21. Majoros WH, Pertea M and Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20 16:2878-9.
22. Stanke M, Schoffmann O, Morgenstern B and Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62.
23. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO and Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Research. 2008;18 12:1979-90.
24. Mende MB and Hundsdoerfer AK. Mitochondrial lineage sorting in action - historical biogeography of the Hyles euphorbiae complex (Sphingidae, Lepidoptera) in Italy. BMC Evolutionary Biology. 2013;13 1:83.
25. Hundsdoerfer AK, Mende MB, Harbich H, Pittaway AR and Kitching IJ. Larval pattern morphotypes in the Western Palaearctic Hyles euphorbiae complex (Lepidoptera: Sphingidae: Macroglossinae). Insect Systematics and Phylogeny. 2011;42:41-86.
26. Tischler-Höhle G. Haplotype and Repeat Separation in Long Reads. In: Cham, 2019, pp.103-14. Springer International Publishing.
27. Tischler G and Myers EW. Non hybrid long read consensus using local de Bruijn graph assembly. bioRxiv. 2017:106252.
28. Myers G. Efficient Local Alignment Discovery amongst Noisy Long Reads. In: Berlin, Heidelberg, 2014, pp.52-67. Springer Berlin Heidelberg.
29. Laetsch DR and Blaxter ML. BlobTools: Interrogation of genome assemblies. F1000Research. 2017;6:1287. doi:10.12688/f1000research.12232.1.
30. Guan D, McCarthy SA, Wood J, Howe K, Wang Y and Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. bioRxiv. 2019:729962. doi:10.1101/729962.
31. Hunt M, Silva ND, Otto TD, Parkhill J, Keane JA and Harris SR. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 2015;16:294. doi:10.1186/s13059-015-0849-0.
32. Challis RJ, Kumar S, Dasmahapatra KK, Jiggins CD and Blaxter M. Lepbase: the Lepidopteran genome database. bioRxiv. 2016:056994.
33. Harris RS. Improved pairwise alignment of genomic DNA. The Pennsylvania State University, Pennsylvania 2007.
34. Kent WJ, Baertsch R, Hinrichs A, Miller W and Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America. 2003;100 20:11484-9.
35. Suarez HG, Langer BE, Ladde P and Hiller M. chainCleaner improves genome alignment specificity and sensitivity. Bioinformatics. 2017;33 11:1596-603.


36. Osipova E, Hecker N and Hiller M. RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements. Gigascience. 2019;8 11. doi:10.1093/gigascience/giz132.
37. Loytynoja A. Phylogeny-aware alignment with PRANK. Methods in Molecular Biology. 2014;1079:155-70.
38. Gremme G, Brendel V, Sparks ME and Kurtz S. Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology. 2005;47 15:965-78.
39. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10 1:421.
40. Standage DS and Brendel VP. ParsEval: parallel comparison and analysis of gene structure annotations. BMC Bioinformatics. 2012;13 1:187.
41. Pippel M, Jebb D, Patzold F, Winkler S, Myers G, Vogel H, Hiller M and Hundsdoerfer AK. Supporting data for "A highly contiguous genome assembly of the bat hawkmoth Hyles vespertilio (Lepidoptera Sphingidae)". GigaScience Database. 2020. http://dx.doi.org/10.5524/100697.

FIGURES

Figure 1: Hyles vespertilio.
(A) The lectotype specimen, originally described as Sphinx vespertilio by Esper in 1779, was collected in the "area of Verona" (Italy) and deposited in the "Landesmuseum für Kunst und Natur" (Wiesbaden, Germany).
(B) The specimen collected in Vallonina (Italy) in 2018. Nearly all tissue was used to sequence the genome. The wings are deposited in the Museum of Zoology (Senckenberg Natural History Collections Dresden, Germany).
Scale bars: 1 cm.

Figure 2: Assembly contiguity.
(A) The N(x)% graph shows the contig or scaffold sizes on the y-axis, where x% of the genome assembly consists of contigs or scaffolds of at least that size. Contigs are shown as solid curves, scaffolds (only for B. mori and M. sexta) as dashed curves. The N50 value is marked as a vertical dashed line.
(B) Treemap comparison between the DAmar H. vespertilio assembly and the assemblies of B. mori [6] and M. sexta [7]. Squares encode the relative contributions of individual scaffolds or contigs to assembly size.

Figure 3: Comparison of genomic repeat content.
Stacked bar charts represent the portion of the genome assembly covered by major classes of repetitive elements. Grey indicates non-repetitive genomic regions. Simple repeats comprise tandem repeats, low complexity regions and satellite repeats. The total assembly size is provided right of the bars.
[Figure 3 panel: x-axis, fraction of genome assembly (in Mb); bars for Hyles (651.4 Mb), Manduca (419.4 Mb) and Bombyx (460.3 Mb); legend: SINE, LINE, LTR, DNA, simple repeats, other, non-repetitive.]

Figure 4: Gene annotation strategy.
First, genome alignments were used to project coding gene annotations from B. mori and M. sexta to H. vespertilio with CESAR 2.0 (red box). Second, protein and coding sequences from 22 lepidopteran species were downloaded from Lepbase and aligned to the H. vespertilio genome (blue box). Third, RNA-seq reads from a related Hyles species were aligned to the genome and used to assemble transcripts, followed by predicting open reading frames with two methods (purple box). Fourth, high quality gene projections and/or RNA-seq evidence were used to train four ab initio gene prediction tools (green box). All evidences were combined into a consensus set using EVidenceModeler and filtered for models that overlap genomic repeats. Finally, these filtered gene models were combined with full-length gene projections that did not overlap consensus gene models to produce the final gene set.

Figure 5: Genome browser visualization of annotations generated for the H. vespertilio genome.
A UCSC genome browser instance visualizes the final gene annotation (blue), together with the underlying gene evidences, and pairwise genome alignment chains to B. mori and M. sexta.
